EGEN5202



General Information

Expected time required: 15-20 hours per phase
Group project (2 students)
Submission format: Zipped file (.zip) on Brightspace
• Only one team member needs to submit the project files
• No late submissions are allowed
• No email submissions are allowed

Project Overview

This project focuses on designing, implementing, and analyzing a secure and fault-tolerant distributed system using Python. The system will simulate monitoring critical infrastructure data, such as environmental sensors or healthcare metrics. You will integrate key concepts from the secure systems engineering course, including fault tolerance, cryptography, network security, system reliability, and secure software development. Your final system should be resilient against failures and security threats while ensuring high reliability and availability.

Objectives

The project aims to apply theoretical knowledge to real-world challenges by creating a distributed system that is both secure and fault-tolerant. You will implement cryptographic techniques to secure communication between distributed nodes, embed fault-tolerant mechanisms for system reliability, and simulate failure scenarios to test the system's robustness. Additionally, you will conduct research on an advanced topic related to secure systems, deepening your understanding of contemporary issues in the field. Finally, you will enhance your Python programming skills, focusing on network programming, multithreading, and using security libraries.

Project Phases

Phase 1: System Design, Research, and Proposal

In the initial phase, your focus will be on research and planning. Begin by delving into distributed system architectures suitable for critical data monitoring. This involves studying existing models and understanding how they address challenges related to scalability, reliability, and security. Examine best practices in secure and fault-tolerant system design, paying close attention to how these systems are implemented in real-world applications.

With this foundational knowledge, proceed to design your system. Create a detailed architecture diagram that illustrates all components, their interactions, and the data flow within the system. Clearly define the security requirements, emphasizing confidentiality (protecting data from unauthorized access), integrity (preventing data tampering), and availability (ensuring system accessibility when needed). Determine the fault-tolerance strategies you will employ, such as replication, redundancy, or failover mechanisms, to ensure the system can handle failures gracefully without significant impact on performance or data loss.

Concurrently, select a research topic that aligns with advanced challenges in secure and fault-tolerant systems. Possible areas include quantum-resistant cryptography, security considerations in cloud-based distributed systems, or vulnerabilities associated with fault-tolerant mechanisms. Conduct a literature review, sourcing at least five recent academic references. Your analysis should explore the current state of the art, identify existing challenges, and discuss potential solutions or mitigations.

By the end of this phase, compile a comprehensive document (approximately 3–5 pages). This document should include an introduction outlining the project objectives and significance. Provide a detailed description of your system design, accompanied by architecture diagrams and explanations of your security and fault-tolerance strategies. Outline your research plan, highlighting key findings from your preliminary literature review and how you intend to delve deeper into the topic. Lastly, include a project plan that maps out your timeline, key milestones, and deliverables for the implementation and testing phases.

Phase 2: Implementation, Testing, and Analysis

The second phase transitions from planning to execution. Begin by setting up your development environment: install Python 3.x, which includes the standard-library modules socket (network communication), ssl (secure connections), and threading (concurrency), and install the third-party cryptography package (for example with pip install cryptography) for implementing cryptographic functions. Generate SSL certificates to facilitate secure communication between nodes. Self-signed certificates are acceptable for this project, but ensure they are properly configured.

Familiarize yourself with the provided starting code, which offers a foundational structure for a distributed system with secure communication. This code includes basic functionalities like setting up SSL/TLS-secured sockets and demonstrates symmetric encryption using the cryptography library's Fernet class. Use this as a springboard to build your system, expanding and customizing it to meet the project's requirements.
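For reference, a Fernet round trip looks roughly like the sketch below. It assumes the cryptography package is installed; the payload and variable names are illustrative and are not taken from the starting code. Fernet tokens are authenticated (AES-128-CBC plus an HMAC-SHA256 tag), so a modified token is rejected with InvalidToken instead of decrypting to garbage.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

# Both peers need the same key (for example delivered through the RSA key
# exchange required later); never hard-code or log it in the real system.
key = Fernet.generate_key()
cipher = Fernet(key)

reading = b'{"sensor": "temp-01", "value": 21.7}'   # illustrative payload
token = cipher.encrypt(reading)                      # authenticated, base64-encoded token
assert cipher.decrypt(token) == reading

# Simulate tampering in transit: flip one bit of the decoded token and re-encode it.
raw = bytearray(base64.urlsafe_b64decode(token))
raw[-1] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))

try:
    cipher.decrypt(tampered)
    tamper_detected = False
except InvalidToken:                                 # HMAC verification failed
    tamper_detected = True
print("tamper detected:", tamper_detected)           # tamper detected: True
```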

Implement secure communication within your distributed system. Utilize SSL/TLS protocols to establish secure socket communications between nodes, ensuring that data transmitted over the network is encrypted and secure from eavesdropping or interception. Incorporate cryptographic methods by using symmetric encryption algorithms like AES for data encryption, providing efficient and secure data handling. For key exchange and establishing secure channels, implement asymmetric encryption methods such as RSA. Additionally, implement data integrity checks using hashing algorithms like SHA-256 to verify that data has not been altered during transmission.
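A minimal sketch of this hybrid scheme is shown below. It assumes the cryptography package. Node A wraps a fresh AES-256 session key with node B's RSA public key (RSA-OAEP with SHA-256), and both nodes then use AES-GCM for the data. The node names, the `seal`/`open_sealed` helpers, and the use of the sender ID as associated data are assumptions for illustration. AES-GCM's authentication tag already detects tampering. If you add a separate SHA-256 integrity check, use it as an HMAC keyed with a shared secret, because an attacker who modifies a message can simply recompute a bare hash.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Node B: long-term RSA key pair; the public key is distributed to peers.
b_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
b_public = b_private.public_key()

# Node A: fresh AES-256 session key, wrapped with B's public key before sending.
session_key = AESGCM.generate_key(bit_length=256)
wrapped_key = b_public.encrypt(session_key, OAEP)

# Node B: unwrap the session key with its private key.
unwrapped = b_private.decrypt(wrapped_key, OAEP)


def seal(key: bytes, plaintext: bytes, sender_id: bytes) -> bytes:
    """Encrypt with AES-GCM and prepend the 12-byte random nonce."""
    nonce = os.urandom(12)  # a nonce must never repeat under the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, sender_id)


def open_sealed(key: bytes, blob: bytes, sender_id: bytes) -> bytes:
    """Decrypt and verify; raises cryptography.exceptions.InvalidTag if altered."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, sender_id)


message = seal(session_key, b"heart_rate=72", b"node-A")
print(open_sealed(unwrapped, message, b"node-A"))   # b'heart_rate=72'
```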

Integrate fault-tolerance mechanisms to enhance system reliability and availability. Implement data replication across multiple nodes to prevent data loss in the event of a node failure, ensuring that the system can continue to operate seamlessly. Develop failure detection mechanisms, such as heartbeat messages, to monitor the health and responsiveness of nodes within the system. These mechanisms should allow your system to detect when a node is down and trigger appropriate responses. Create recovery procedures that enable automatic failover to standby nodes, minimizing downtime, and implement state synchronization processes so that nodes recovering from failure can rejoin the system without causing inconsistencies.
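A heartbeat-based failure detector can be as small as a table of last-seen timestamps. In the sketch below, `HeartbeatMonitor`, the 3-second timeout, and the node names are hypothetical. The clock is injectable so that the behaviour can be tested without sleeping. In a running node, message handlers would call `record_heartbeat`, and a background thread would call `check()` periodically and trigger failover (for example, promoting a standby) for each node it returns.

```python
import threading
import time


class HeartbeatMonitor:
    """Marks a peer as failed when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock               # injectable time source for testing
        self.last_seen = {}              # node_id -> time of last heartbeat
        self.failed = set()
        self.lock = threading.Lock()     # heartbeats arrive from handler threads

    def record_heartbeat(self, node_id):
        with self.lock:
            self.last_seen[node_id] = self.clock()
            self.failed.discard(node_id)  # a recovered node rejoins the live set

    def check(self):
        """Return the nodes that crossed the timeout since the previous check."""
        now = self.clock()
        newly_failed = []
        with self.lock:
            for node_id, seen in self.last_seen.items():
                if now - seen > self.timeout and node_id not in self.failed:
                    self.failed.add(node_id)
                    newly_failed.append(node_id)
        return newly_failed

    def alive(self):
        with self.lock:
            return sorted(n for n in self.last_seen if n not in self.failed)


# Simulated timeline with a fake clock: the primary stops sending heartbeats.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_now[0])
monitor.record_heartbeat("primary")
monitor.record_heartbeat("standby")
fake_now[0] = 2.0
monitor.record_heartbeat("standby")
fake_now[0] = 4.0
print(monitor.check())   # ['primary']
print(monitor.alive())   # ['standby']
```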

Enhance your system's functionality by incorporating advanced features. While optional, implementing consensus protocols like Paxos or Raft can provide extra credit and demonstrate a deeper understanding of distributed systems. These protocols help maintain consistency across distributed nodes, especially in the presence of failures. Other features could include leader election algorithms, load balancing to distribute workloads efficiently, or dynamic node management that allows nodes to join or leave the network seamlessly.
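As a simpler starting point than Paxos or Raft, the sketch below simulates the classic bully algorithm for leader election: a node sends ELECTION to every node with a higher ID, and if none answers it becomes the coordinator. Otherwise a higher node takes over, so the highest live ID wins. It assumes integer node IDs and a reliable failure detector, with the hypothetical `is_alive` callback standing in for "answered an ELECTION message before a timeout". Unlike Raft, the bully algorithm can elect two leaders during a network partition, so it is not a consensus protocol.

```python
def bully_election(initiator, node_ids, is_alive):
    """Simulate a bully election started by `initiator`; return the new coordinator.

    node_ids: all known node ids (integers).
    is_alive(n): True if node n would answer an ELECTION message in time.
    """
    higher = [n for n in node_ids if n > initiator and is_alive(n)]
    if not higher:
        # Nobody with a higher id answered, so the initiator announces COORDINATOR.
        return initiator
    # A live higher node answered and runs its own election; following the chain
    # always ends at the highest live id.
    return bully_election(min(higher), node_ids, is_alive)


# Node 5 (the old leader) has crashed and node 2 notices first.
print(bully_election(2, [1, 2, 3, 4, 5], is_alive=lambda n: n != 5))   # 4
```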

Conduct comprehensive testing and validation of your system. Perform functional testing to ensure that all components operate correctly under normal conditions, verifying that nodes can communicate securely, and data is exchanged accurately. Simulate node failures by intentionally disabling nodes or network connections, observing how your system handles these disruptions. Assess the effectiveness of your fault-tolerance mechanisms and recovery procedures in maintaining system integrity and availability. For security testing, attempt to intercept or tamper with data transmissions to evaluate the robustness of your encryption and integrity checks. Ensure that unauthorized access is effectively prevented, and that the system behaves as expected under various threat scenarios.
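One way to automate the tampering part of security testing is a small unit test suite. The sketch below assumes a hypothetical message format (payload followed by a 32-byte HMAC-SHA256 tag) and a pre-shared test key. `protect` and `verify` are illustrative names, not part of the starting code. It uses HMAC rather than a bare SHA-256 digest because an attacker who can modify a message can also recompute a plain hash.

```python
import hashlib
import hmac
import unittest

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def protect(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect modification."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, message: bytes) -> bytes:
    """Return the payload if the tag is valid, otherwise raise ValueError."""
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # constant-time comparison
        raise ValueError("integrity check failed")
    return payload


class IntegrityTests(unittest.TestCase):
    KEY = b"k" * 32  # test-only key

    def test_untouched_message_passes(self):
        msg = protect(self.KEY, b"pressure=101.3")
        self.assertEqual(verify(self.KEY, msg), b"pressure=101.3")

    def test_flipped_bit_is_rejected(self):
        msg = bytearray(protect(self.KEY, b"pressure=101.3"))
        msg[0] ^= 0x01  # simulate an attacker altering one bit in transit
        with self.assertRaises(ValueError):
            verify(self.KEY, bytes(msg))

    def test_wrong_key_is_rejected(self):
        msg = protect(self.KEY, b"pressure=101.3")
        with self.assertRaises(ValueError):
            verify(b"x" * 32, msg)
```

Saved as, for example, test_integrity.py, the suite runs with python -m unittest test_integrity.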

Carry out a quantitative reliability analysis of your system. Calculate key reliability metrics such as Mean Time Between Failures (MTBF), which measures the average time between system failures, and Mean Time To Recovery (MTTR), which assesses how quickly the system can recover from a failure. Determine the overall system availability, reflecting the proportion of time the system is operational and accessible. Use Python scripts to simulate different failure and recovery scenarios, analyzing the impact on system performance and reliability. Document your findings thoroughly, presenting results through graphs, tables, or charts, and provide insightful interpretations. Discuss any observed patterns or anomalies and propose potential improvements based on your analysis.
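If MTBF is taken to mean the mean operating time between failures, availability can be computed as A = MTBF / (MTBF + MTTR). The sketch below derives all three metrics from an outage log, given as pairs of failure and recovery times within an observation window. It also includes a synthetic failure generator that assumes exponentially distributed up and repair times. The function names, the hour units, and the exponential model are assumptions for illustration. Replace the synthetic log with timestamps recorded during your own failure-injection runs.

```python
import random
from dataclasses import dataclass


@dataclass
class ReliabilityReport:
    mtbf: float          # mean operating time between failures (hours)
    mttr: float          # mean time to recovery (hours)
    availability: float  # fraction of the window the system was up


def reliability_metrics(window_hours, outages):
    """outages: list of (failed_at, recovered_at) hours inside the window."""
    if not outages:
        raise ValueError("no failures observed; MTBF is undefined for this window")
    downtime = sum(up - down for down, up in outages)
    uptime = window_hours - downtime
    n = len(outages)
    mtbf, mttr = uptime / n, downtime / n
    return ReliabilityReport(mtbf, mttr, mtbf / (mtbf + mttr))


def simulate_outages(window_hours, mean_uptime, mean_repair, seed=0):
    """Synthetic outage log with exponential up/repair times (a modelling assumption)."""
    rng = random.Random(seed)
    t, outages = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean_uptime)      # time until the next failure
        if t >= window_hours:
            break
        repaired = min(t + rng.expovariate(1.0 / mean_repair), window_hours)
        outages.append((t, repaired))
        t = repaired
    return outages


# Hand-made log: two outages of 10 h and 30 h in a 1000 h window.
print(reliability_metrics(1000.0, [(100.0, 110.0), (500.0, 530.0)]))
# ReliabilityReport(mtbf=480.0, mttr=20.0, availability=0.96)

simulated = reliability_metrics(10_000.0, simulate_outages(10_000.0, 200.0, 5.0))
print(simulated)   # availability should land near 200 / 205
```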

Update your Phase 1 document to include all your findings and observations (approximately an extra 3–5 pages).

Phase 3: Finalization, Reporting, and Presentation

The research paper should provide an in-depth exploration of your selected research topic. Begin with an introduction that outlines the significance of the topic and its relevance to secure and fault-tolerant systems. The literature review should synthesize findings from your academic references, highlighting current advancements, challenges, and gaps in the field. In the discussion section, delve into potential solutions or mitigations for the challenges identified, offering critical analysis and personal insights. Conclude the paper by summarizing key takeaways and suggesting areas for future research. Ensure that all sources are properly cited and referenced according to IEEE formatting guidelines.

Prepare a comprehensive final project report that encapsulates all aspects of your project. Start with an executive summary that provides a concise overview of the project objectives, methodologies, and outcomes. In the system design and architecture section, include updated diagrams and provide detailed explanations of each component, their roles, and interactions within the system. The implementation details should describe your development process, elaborating on code modules, functionalities, and the rationale behind your design choices. Discuss any challenges encountered during implementation—such as technical hurdles or unforeseen issues—and how you addressed them. 

Include a section on testing and validation, describing the methodologies used and presenting the results of your functional, failure, and security tests. Provide evidence of your system's performance, reliability, and security through data, logs, or screenshots. In the reliability analysis section, present your calculated metrics, interpreting what they indicate about your system's robustness and areas for enhancement. The security analysis should evaluate your system's security features, identify any potential vulnerabilities, and discuss how they could be mitigated or addressed in future iterations.

Conclude your report by summarizing your achievements, reflecting on the learning outcomes, and offering recommendations for future improvements or extensions of your system. Appendices can include supplementary material such as code snippets, additional data sets, or detailed logs that support your findings but are too extensive for the main report.

Submit your source code along with a README file that provides clear instructions on how to set up and run your system. The README should describe the code structure, functionalities, dependencies, and any configuration required. Ensure that your code is well-documented with comments explaining key sections and that it adheres to best practices in coding standards.

You must include a video demonstration showcasing your system in action, highlighting its key features, how it handles failures, and the effectiveness of its security measures. The video must be between 4 and 7 minutes long.

Starting Code

Refer to the starting code provided on Brightspace to help you begin the implementation of your distributed system. This code provides a basic structure for secure communication between nodes using SSL/TLS and symmetric encryption.

Notes on the Starting Code

  • SSL/TLS Communication:
    • The code sets up SSL/TLS-secured sockets using self-signed certificates.
    • You need to generate your own node_cert.pem and node_key.pem files.
  • Cryptography:
    • Uses the cryptography library's Fernet class for symmetric encryption.
    • A cipher suite is established upon connecting to a peer.
  • Node Class Structure:
    • Each node can accept incoming connections and connect to other peers.
    • Includes methods for handling clients and broadcasting messages.
  • Multithreading:
    • Server and client handlers run in separate threads to allow concurrent connections.
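To make this threading structure concrete, the sketch below shows one way such a node could be organised. It is not the provided starting code: the `Node` class, its method names, and the `on_message` callback are hypothetical. It uses plain TCP by default. Passing a server-side context, such as the one returned by `create_ssl_context` in Appendix A, wraps each accepted connection in TLS.

```python
import socket
import threading


class Node:
    """Hypothetical minimal node: one accept thread, one handler thread per peer."""

    def __init__(self, host="127.0.0.1", port=0, ssl_context=None, on_message=None):
        self.server = socket.create_server((host, port))
        self.server.settimeout(0.5)                # lets the accept loop notice stop()
        self.port = self.server.getsockname()[1]   # actual port when port=0
        self.ssl_context = ssl_context
        self.on_message = on_message or (lambda addr, data: None)
        self.peers = []
        self.lock = threading.Lock()               # guards self.peers across threads
        self.running = threading.Event()

    def start(self):
        self.running.set()
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def stop(self):
        self.running.clear()

    def _accept_loop(self):
        while self.running.is_set():
            try:
                conn, addr = self.server.accept()
            except socket.timeout:
                continue                           # re-check the running flag
            threading.Thread(target=self._handle_client, args=(conn, addr),
                             daemon=True).start()
        self.server.close()

    def _handle_client(self, conn, addr):
        try:
            if self.ssl_context is not None:
                # Handshake here, not in the accept loop, so a slow peer cannot block it.
                conn = self.ssl_context.wrap_socket(conn, server_side=True)
            with self.lock:
                self.peers.append(conn)
            while True:
                data = conn.recv(4096)
                if not data:                       # peer closed the connection
                    break
                self.on_message(addr, data)
        except OSError:
            pass                                   # reset or TLS failure: treat as peer failure
        finally:
            with self.lock:
                if conn in self.peers:
                    self.peers.remove(conn)
            conn.close()

    def broadcast(self, data):
        with self.lock:
            targets = list(self.peers)
        for peer in targets:
            try:
                peer.sendall(data)
            except OSError:
                pass                               # that peer's handler thread cleans it up
```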

Your Tasks with the Starting Code

1. Certificate Management:
  • Generate unique certificates for each node to simulate real-world scenarios.
  • Implement certificate verification to prevent man-in-the-middle attacks.
2. Enhance Security Features:
  • Implement asymmetric encryption for key exchange before establishing symmetric encryption.
  • Use digital signatures to ensure message authenticity.
3. Implement Fault-Tolerant Mechanisms:
  • Modify the code to handle node failures gracefully.
  • Implement data replication across nodes to prevent data loss.
4. Expand Networking Capabilities:
  • Allow dynamic discovery of nodes in the network.
  • Implement message routing for nodes that are not directly connected.
5. Improve the Node Class:
  • Add logging mechanisms for monitoring and debugging.
  • Implement configuration files or command-line arguments for flexibility.
6. Testing and Validation:
  • Write unit tests for critical components of the code.
  • Simulate network partitions and observe how the system handles them.
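For the routing task in item 4, one simple approach is a breadth-first search that builds a next-hop table. This assumes each node learns the full topology, for example from discovery messages. `next_hops` and the topology dictionary below are hypothetical. Distance-vector or link-state protocols are alternatives when the topology changes often. A destination missing from the table is unreachable, which also gives a quick way to observe a simulated network partition.

```python
from collections import deque


def next_hops(topology, source):
    """Return {destination: neighbour to forward to} using fewest-hop paths.

    topology: dict mapping node id -> iterable of directly connected neighbours
    (links are assumed to be listed in both directions).
    """
    table = {}
    visited = {source}
    queue = deque()
    for neighbour in topology.get(source, ()):
        if neighbour not in visited:
            visited.add(neighbour)
            table[neighbour] = neighbour          # direct link: send straight to it
            queue.append(neighbour)
    while queue:
        node = queue.popleft()
        for neighbour in topology.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                table[neighbour] = table[node]    # reuse the first hop that reached `node`
                queue.append(neighbour)
    return table


# Chain A-B-C-D with a branch B-E: C has no direct link to A or E.
topology = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
print(next_hops(topology, "C"))   # {'B': 'B', 'D': 'D', 'A': 'B', 'E': 'B'}
```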
Deliverables Summary
1. Phase 1 (11:00pm Oct 31st, 2024):
  • Proposal Document (including system design and research plan)
2. Phase 2 (11:00pm Nov 14th, 2024):
  • Implemented Distributed System with Fault-Tolerance and Security
  • Testing and Reliability Analysis Documentation
3. Phase 3 (11:00pm Dec 6th, 2024):
  • Final Project Report (10-15 pages)
  • Source Code and README File
  • Demonstration Video of your code in action

Assessment Criteria

  • Functionality (40%): The system meets the specified requirements and functions correctly.
  • Security Implementation (20%): Effective use of cryptographic methods and secure communication.
  • Fault Tolerance (15%): Implementation of fault-tolerant features and handling of failures.
  • Reliability Analysis (10%): Accurate calculation and interpretation of reliability metrics.
  • Research Quality (15%): Depth of research, analysis, and relevance in the research paper.
  • Code Quality and Documentation: Clean, well-documented code following best practices.

Project submission

  • The deadline for each deliverable is listed in the Deliverables Summary section.
  • The files and documents required for each submission are listed in the corresponding submission link description on Brightspace.
  • Since this is a group project, only one team member needs to submit the files on behalf of the team.
  • You are allowed to submit as many times as you wish before the deadline. Keep in mind that each submission overrides the previous one.

Resources

  • Python Documentation: https://docs.python.org/3/
  • Cryptography Library: https://cryptography.io/en/latest/
  • Socket Programming in Python: https://realpython.com/python-sockets/
  • SSL/TLS in Python: https://docs.python.org/3/library/ssl.html
  • Generating SSL Certificates: Refer to Appendix A

Appendix A: Generate SSL Certificate

In this project, SSL/TLS is used to secure the communication between nodes in the distributed system. To implement SSL/TLS, each node requires an SSL certificate and a private key. You can generate “self-signed SSL certificates” for this purpose using OpenSSL, which is a widely used tool for creating certificates.

Below are step-by-step instructions for generating SSL certificates and keys for your nodes.

Step 1. Install OpenSSL (if necessary)

OpenSSL is likely already installed on most Linux and macOS systems. You can check if OpenSSL is installed by typing the following command in your terminal or command prompt:

openssl version

If OpenSSL is installed, it will return the version number. If not, follow the instructions below to install it.
➢ For Linux (Ubuntu/Debian):
sudo apt update
sudo apt install openssl
➢ For macOS (using Homebrew):
brew install openssl
➢ For Windows:
You can download OpenSSL for Windows from [OpenSSL for Windows](https://slproweb.com/products/Win32OpenSSL.html) and follow the installation instructions on that site.

Step 2. Generate the Private Key and SSL Certificate

1. Navigate to your project directory (or any directory where you want to store your certificates).
▪ Open a terminal/command prompt in that directory.
2. Generate a Private Key:
▪ Run the following command to generate a new 2048-bit RSA private key. The node uses this key to prove its identity during the TLS handshake, so keep it private.
openssl genrsa -out node_key.pem 2048
▪ This will create a file called node_key.pem, which contains the private key.
3. Create a Self-Signed Certificate:
▪ Run the following command to generate a self-signed certificate using the private key you just created. The certificate will be valid for 365 days.
openssl req -new -x509 -key node_key.pem -out node_cert.pem -days 365
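▪ You will be prompted to enter some information for the certificate, such as the country name, organization name, and common name (CN). For local development, these values can be anything, but the “Common Name (CN)” should typically match the hostname of your node (e.g., localhost or 127.0.0.1). If you enable hostname verification, note that Python's ssl module (via OpenSSL) checks an IP address such as 127.0.0.1 only against the certificate's subjectAltName extension, not the CN. With OpenSSL 1.1.1 or later you can add that extension by appending -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" to the command above.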
▪ Example inputs:
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:Ontario
Locality Name (eg, city) []:Ottawa
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Carleton University
Organizational Unit Name (eg, section) []: IT
Common Name (e.g., server FQDN or YOUR name) []: 127.0.0.1
Email Address []:
▪ This will generate a self-signed certificate file called node_cert.pem.

Step 3. Verifying the Certificate

Once you’ve generated the certificate and key, you can verify that they are valid by running the following command:
openssl x509 -text -noout -in node_cert.pem
This command will display the details of the certificate, including the validity period and the Common Name (CN) you entered.

Step 4. Use the Certificate and Key in Your Python Code

You now have two files:
  • node_key.pem (private key)
  • node_cert.pem (self-signed certificate)
These files can be used in your Python code to establish SSL/TLS communication between nodes. In your Python code, you can reference these files as shown in the following example:
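import ssl

def create_ssl_context(certfile, keyfile):
    # Purpose.CLIENT_AUTH builds a server-side context (one that may authenticate clients).
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Present this node's certificate and matching private key during the TLS handshake.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context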
In this case, certfile will be node_cert.pem and keyfile will be node_key.pem. Use these to set up a secure SSL socket in your distributed system.

Step 5. Repeat for Multiple Nodes

If your system has multiple nodes, each node can use its own unique self-signed certificate and key, or for simplicity in local development, you can reuse the same certificate and key across all nodes. For real-world deployment, each node should have its own certificate signed by a trusted Certificate Authority (CA), but for development purposes, self-signed certificates are sufficient.
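The context in Step 4 presents the node's certificate. Because it is created with ssl.Purpose.CLIENT_AUTH, it does not request or verify the peer's certificate, so on its own it does not stop a man-in-the-middle. A minimal mutual-TLS sketch is shown below. It assumes a hypothetical trusted_peers.pem file containing the self-signed certificates of all nodes, or a single CA certificate if you sign node certificates with your own CA. Both sides then reject any peer whose certificate does not chain to an entry in that file. The function names are illustrative.

```python
import ssl


def create_server_context(certfile, keyfile, cafile):
    """Server side of mutual TLS: present our certificate and require a trusted client one."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.verify_mode = ssl.CERT_REQUIRED          # reject peers without a trusted cert
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    context.load_verify_locations(cafile=cafile)     # trusted peer or CA certificates
    return context


def create_client_context(certfile, keyfile, cafile):
    """Client side: PROTOCOL_TLS_CLIENT already enables CERT_REQUIRED and hostname
    checking; we also load our own certificate so the server can verify us."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    context.load_verify_locations(cafile=cafile)
    return context
```

A server would then call create_server_context("node_cert.pem", "node_key.pem", "trusted_peers.pem").wrap_socket(conn, server_side=True). A client would pass a server_hostname to wrap_socket that the peer's certificate actually covers, such as localhost, or 127.0.0.1 if the certificate includes that IP address in its subjectAltName.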
