Well-Architected Framework

AWS Design Principles

Stop guessing your capacity needs

Avoid idle resources and the performance problems associated with under-provisioned infrastructure. Instead, scale automatically based on demand and adjust capacity using real-time monitoring and alerts.
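
As a rough illustration of scaling on demand instead of guessing, the sketch below derives a desired instance count from an observed metric and a target value using a simple proportional rule. The function name, the capacity bounds, and the 50% CPU target are assumptions made for illustration and are not taken from AWS documentation; real auto scaling policies also apply cooldowns and metric smoothing.

```python
import math


def desired_capacity(current: int, observed: float, target: float,
                     minimum: int = 1, maximum: int = 10) -> int:
    """Proportional scaling rule (hypothetical helper).

    Sizes the fleet so that the per-instance metric moves back toward
    the target value.
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    wanted = math.ceil(current * observed / target)
    # Clamp to the configured bounds so a metric spike cannot scale without limit.
    return max(minimum, min(maximum, wanted))


if __name__ == "__main__":
    # 4 instances averaging 80% CPU against an assumed 50% target.
    print(desired_capacity(4, 80.0, 50.0))  # prints 7
```

The same rule scales in when the observed metric falls below the target, which is what removes idle capacity.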

Test systems at production scale

With an infrastructure as code (IaC) approach, you can simulate production in the cloud cost-effectively. Run full-scale tests, then decommission the resources, so you pay only for what you use.

Automate to make architectural experimentation easier

Automate to reduce costs, ensure consistency, and simplify changes. The cloud enables easy tracking, auditing, and rollback of decisions.

Allow for evolutionary architecture

Design your system so that it can evolve over time to meet changing business needs and take advantage of new innovations.

Drive architecture using data

By collecting insights on your cloud infrastructure, you can continuously improve architecture through facts rather than assumptions. This approach enables real-time adjustments based on data, something not feasible with traditional on-premises systems.

Improve through game days

Regularly test your architecture and processes during game days that simulate events in production (for example, business continuity and disaster recovery, or BCDR, scenarios). This helps you identify where improvements are needed.

The Six Pillars

Operational Excellence

Focuses on continuous improvement, monitoring, and automation. Key topics: automating changes and defining standards for daily operations.
Principles:

  • Perform operations as code - Automate infrastructure and operations using code to ensure consistency, reduce human error, and improve efficiency in cloud environments.
  • Make frequent, small, reversible changes - Implement small, incremental updates that can be quickly reversed if needed, enabling faster innovation, easier troubleshooting, and minimal impact on users.
  • Refine operations procedures frequently - Continuously improve and adapt operational procedures to align with evolving workloads. Regularly test and validate processes to ensure effectiveness and team readiness.
  • Anticipate failure - Identify and mitigate potential failure points through testing and simulations. Regularly validate response procedures to enhance system reliability, fault tolerance, and resilience.
  • Learn from all operational failures - Continuously improve by analyzing failures and sharing lessons learned across teams and the entire organization.

Reliability

Designs systems for fault tolerance, automated recovery, and resilience.
Principles:

  • Automatically recover from failure - Monitor KPIs to trigger automated recovery and prevent failures.
  • Test recovery procedures - Simulate failures to validate and improve resilience. 
  • Scale horizontally to increase aggregate workload availability - Distribute requests across multiple smaller resources to avoid single points of failure.
  • Stop guessing capacity - Monitor demand and workload utilization and automate the addition or removal of resources to maintain the optimal level to satisfy demand without over- or under-provisioning.
  • Manage change in automation - Automate infrastructure changes and track modifications for review.
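
The horizontal scaling principle above can be made concrete with a little arithmetic. The sketch below estimates the probability that at least one of several replicas is up. It assumes replica failures are independent and that a single healthy replica can serve the workload; both are simplifications, and the function name is hypothetical.

```python
def aggregate_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of `instances` independent replicas is up."""
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be >= 1")
    # All replicas are down at once with probability (1 - a) ** n.
    return 1.0 - (1.0 - instance_availability) ** instances


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{aggregate_availability(0.99, n):.6f}")
```

Under these assumptions, two replicas at 99% each give roughly 99.99% aggregate availability. Correlated failures, such as a shared Availability Zone outage, reduce that benefit.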

Performance Efficiency

Optimizes resources for scalability and efficiency by selecting resource types and sizes suited to workload requirements, monitoring performance, and maintaining efficiency as business needs evolve.
Principles:

  • Democratize advanced technologies - Offload complex tech tasks to your cloud provider.
  • Go global in minutes - Deploy in multiple regions for lower latency and better user experience.
  • Use serverless architecture - Serverless architectures can improve elasticity and reduce operational overhead and costs.
  • Experiment more often - Run comparative tests using different types of instances, storage, and databases. Make data-driven decisions.
  • Consider mechanical sympathy - Understand how cloud services are consumed and use the technology approach that aligns best with your workload goals.

Cost Optimization

Controls spending while maximizing business value. Closely related to the practice of FinOps.
Principles:

  • Implement cloud financial management - Build FinOps expertise and processes for cost efficiency. Your organization must dedicate the necessary time and resources to build the capability to become cost-efficient.
  • Adopt a consumption model - Pay only for what you use, scaling as needed. 
  • Measure overall efficiency - Track cost vs. business output to optimize value. 
  • Stop spending money on undifferentiated heavy lifting - Focus on business, not IT infrastructure.
  • Analyze and attribute expenditure - Attribute IT costs to the workloads and teams that incur them to provide transparency; optimize spending and improve ROI.
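
The attribution and efficiency principles above can be sketched in a few lines. The example groups billing line items by an owner tag and computes a cost per unit of business output. The record layout, the `team` tag key, and all figures are hypothetical; a real billing export has a different schema.

```python
from collections import defaultdict


def attribute_costs(line_items, tag_key="team"):
    """Sum cost per owner tag; items without the tag land in 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


def unit_cost(total_cost, business_output):
    """Cost per unit of business output, for example per order processed."""
    if business_output <= 0:
        raise ValueError("business_output must be positive")
    return total_cost / business_output


if __name__ == "__main__":
    # Hypothetical line items, not real billing data.
    items = [
        {"cost": 10.0, "tags": {"team": "checkout"}},
        {"cost": 2.5, "tags": {"team": "checkout"}},
        {"cost": 5.0, "tags": {}},
    ]
    totals = attribute_costs(items)
    print(totals)  # {'checkout': 12.5, 'untagged': 5.0}
    print(unit_cost(totals["checkout"], 5))  # 2.5
```

A large "untagged" bucket is itself a useful signal: it shows how much spend cannot yet be attributed to an owner.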

Sustainability

Reduces environmental impact through efficient cloud resource usage.
Principles:

  • Understand your impact - Track resource use, emissions, and productivity to set KPIs and optimize efficiency.
  • Establish sustainability goals - Reduce resource use per transaction and plan for sustainable growth.
  • Maximize utilization - Optimize workloads to improve energy efficiency and minimize idle resources.
  • Anticipate and adopt new, more efficient hardware and software offerings - Stay flexible to integrate new, energy-efficient hardware and software.
  • Use managed services - Managed services share infrastructure across many customers, which reduces overall resource consumption.
  • Reduce the downstream impact of your cloud workloads - Optimize services to lower energy use and device requirements.

Security

Ensures data protection, risk assessment, and access management.
Principles:

  • Implement a strong identity foundation - Implement least privilege and centralize identity management. Eliminate reliance on long-term static credentials.
  • Enable traceability - Monitor, alert, and audit environment changes in real time.
  • Apply security at all layers - Use a defense-in-depth approach across all components.
  • Automate security best practices - Implement security controls as code to scale securely.
  • Protect data in transit and at rest - Encrypt and control access to sensitive data in transit and at rest.
  • Keep people away from data - Minimize manual data handling to reduce risk.
  • Prepare for security events - Establish incident management processes and run simulations.
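
As an illustration of least privilege and of treating security controls as code, the sketch below scans an IAM-style policy document for Allow statements that use a bare wildcard. The helper names and the bucket name are hypothetical, and the check is deliberately simple: it does not catch partial wildcards such as `s3:*`, and it is not a substitute for a real policy analyzer.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list:
    """Return Allow statements whose Action or Resource is a bare '*'."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    # Narrowly scoped example; the bucket name is made up.
    scoped = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }],
    }
    broad = {
        "Version": "2012-10-17",
        "Statement": {"Effect": "Allow", "Action": "*", "Resource": "*"},
    }
    print(len(overly_broad_statements(scoped)))  # prints 0
    print(len(overly_broad_statements(broad)))   # prints 1
```

A check like this could run in a deployment pipeline so that overly broad policies are rejected before they reach an account.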