Fundamental Concepts of Distributed Systems
CAP & PACELC Theoremsβ
The PACELC theorem extends the CAP theorem by stating that:
if (network Partition exists)
: a distributed system must trade off between Availability and Consistency (as per CAP).else (there is no partition)
: the system still has to trade off between low Latency and Consistency
π This highlights that even under normal conditions, trade-offs between performance and consistency exist.
Core Distributed Systems Conceptsβ
Concept | Description |
---|---|
Data Durability | Data durability ensures stored data remains intact and retrievable despite hardware failures, power outages, or other disruptions. |
Data Consistency | Data consistency ensures that all copies of data across a distributed system are identical and reflect the most recent updates, preventing discrepancies or conflicting versions. |
Replication | Replication is the process of duplicating data across multiple systems or locations to ensure data availability, fault tolerance, and improved performance by providing backup copies in case of failure. |
Partitioning | The process of dividing a large dataset into smaller, manageable pieces to optimize storage and improve performance. |
Sharding | A form of partitioning where data is distributed across multiple servers to enhance scalability and load balancing. |
Consensus | Consensus in Distributed Systems refers to the process of achieving agreement among multiple nodes on a single data value or decision, even in the presence of failures or network delays. It ensures consistency and coordination across distributed components.Challenges in achieving consensus include network latency, node failures, and conflicting updates, making it a fundamental problem in distributed computing. Efficient consensus mechanisms help maintain data integrity, system availability, and fault tolerance. |
Distributed Transactions | A distributed transaction is an operation spanning multiple databases or systems, ensuring consistency across them. It follows ACID principles in databases and requires coordination protocols like two-phase commit, saga, etc. |
Heartbeat | A heartbeat message is used in distributed systems to detect failures. - With a central server: Servers periodically send heartbeat messages to indicate they are alive. - Without a central server: Servers randomly select peers and send heartbeat messages. |
Communication Fundamentalsβ
AJAX pollingβ
Client repeatedly sends requests to the server to check if new data is available. Useful when real-time updates are required but the server can't push updates to the client directly.
Pros:
- Simplicity: Easy to implement with basic HTTP requests.
- Compatibility: Works with most web servers and clients. Doesnβt require WebSockets or long-polling configuration.
Cons:
- Inefficient: Frequent requests even if no data has changed, leading to wasted resources.
- Server load: High frequency of requests can strain the server.
- Latency: Delayed updates depending on the polling interval.
- Network overhead: Increased network traffic from repeated requests and empty responses.
HTTP Long-Pollingβ
- The client sends a request to the server, but the server may hold the request if no data is available.
- The server waits until new data is available and then responds.
- Once the client receives a response, it immediately re-requests information, ensuring the server always has a pending request to deliver updates when data becomes available.
WebSocketsβ
- Full-duplex communication: Enables bidirectional communication between client and server over a single TCP connection. Both client and server can send data at any time.
- Persistent connection: The connection remains open, allowing continuous data exchange.
- Handshake process: The client initiates a connection with a handshake, and if successful, both parties can start exchanging data.
Server-sent events (SSE)β
SSE is ideal for real-time notifications, live feeds, and data streaming where only the server needs to send updates
- One-way communication: The server can push updates to the client, but the client cannot send data back.
- Persistent connection: Uses a single HTTP connection that remains open for continuous updates.
- Event-driven: The server sends messages when new data is available.
- Lightweight: Simpler and more efficient than WebSockets for one-way real-time updates.
- Built-in support: Natively supported in modern browsers via the
EventSource
API.