# Airbnb Architecture
# Overview
Airbnb is a website that operates an online marketplace and hospitality service where people can lease or rent short-term lodging. The challenges for the engineering team include high availability and rapid scaling, among others. This post gathers the architecture of the Airbnb website into a single article. Please tweet to @enqueuezero if you think anything is incorrect or outdated.
Disclaimer: I'm not on the Airbnb team and don't know anybody at Airbnb. All information here is publicly available on the Internet, mainly on the Airbnb engineering blog.
# Solutions
# AWS Stack
Airbnb uses the following AWS services:
- It uses EC2 instances for its application, Memcached, and search servers.
- It uses RDS as its main MySQL database.
- It used ELB for traffic load balancing (note: ELB seems to be no longer in use; see the Load Balancer section below).
- It uses EMR for daily data processing and analysis (note: this seems somewhat outdated; see the Data Warehouse section below).
- It uses S3 for backups and static files, including user pictures.
- It uses Amazon CloudWatch to monitor its EC2 instances.
# Load Balancer
Charon is Airbnb's front-facing load balancer. It replaced Amazon's ELB, which was clunky and hard to troubleshoot.
With Charon, traffic from Akamai hits the Nginx servers directly and is then routed to the backend services through Synapse and HAProxy.
# Service Discovery
- SmartStack is an open-source service discovery framework with two components: Nerve and Synapse. It relies on ZooKeeper to store discovery data and on HAProxy for routing.
- Nerve manages the registration life cycle of each microservice instance based on health checks.
- Synapse looks up microservice instances and automatically updates the HAProxy configuration.
- ZooKeeper stores a znode for each service name and propagates changes to the set of microservice instances through ZooKeeper watches.
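To make the division of labor above concrete, here is a minimal in-process sketch of the SmartStack flow. It is not the real Nerve or Synapse code: `Registry` is a hypothetical stand-in for ZooKeeper, health checks are plain callables, and the rendered text only mimics an HAProxy `backend` section. It also ignores ZooKeeper details such as sessions and the fact that real watches fire once and must be re-registered.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class Registry:
    """Hypothetical stand-in for ZooKeeper: one 'znode' (set of instances) per service."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Set[str]] = defaultdict(set)
        self.watchers: Dict[str, List[Callable[[Set[str]], None]]] = defaultdict(list)

    def watch(self, service: str, callback: Callable[[Set[str]], None]) -> None:
        # Unlike a real (one-shot) ZooKeeper watch, this one fires on every change.
        self.watchers[service].append(callback)
        callback(set(self.nodes[service]))

    def _notify(self, service: str) -> None:
        for callback in self.watchers[service]:
            callback(set(self.nodes[service]))

    def register(self, service: str, instance: str) -> None:
        if instance not in self.nodes[service]:
            self.nodes[service].add(instance)
            self._notify(service)

    def deregister(self, service: str, instance: str) -> None:
        if instance in self.nodes[service]:
            self.nodes[service].discard(instance)
            self._notify(service)


class Nerve:
    """Plays Nerve's role: keeps an instance registered only while it is healthy."""

    def __init__(self, registry: Registry, service: str, instance: str,
                 health_check: Callable[[], bool]) -> None:
        self.registry, self.service, self.instance = registry, service, instance
        self.health_check = health_check

    def tick(self) -> None:
        if self.health_check():
            self.registry.register(self.service, self.instance)
        else:
            self.registry.deregister(self.service, self.instance)


class Synapse:
    """Plays Synapse's role: watches a service and re-renders an HAProxy-style backend."""

    def __init__(self, registry: Registry, service: str) -> None:
        self.service = service
        self.config = ""
        registry.watch(service, self._render)

    def _render(self, instances: Set[str]) -> None:
        lines = [f"backend {self.service}"]
        for i, addr in enumerate(sorted(instances)):
            lines.append(f"    server {self.service}_{i} {addr} check")
        self.config = "\n".join(lines)
```

In this sketch, once an instance's health check fails, its Nerve deregisters it, the watch fires, and Synapse renders a backend without that server, so HAProxy would stop routing to it after the new configuration is loaded.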
# Web Tier
Airbnb uses Ruby on Rails for the front end.
# Data Tier
Airbnb uses Amazon RDS as its main MySQL database, deployed across multiple availability zones (Multi-AZ). The basic pattern is a three-tier architecture in which the application servers reach the MySQL servers through a database proxy (dbproxy) layer. There are separate databases for different scenarios, such as `airmaster`, `calendar`, and `message`. As a result, over a dozen dbproxy services and hundreds of database instances are deployed.
- The database servers run the community edition of MySQL Server.
- Each MySQL server uses a one-thread-per-connection model.
- Airbnb forked and modified MariaDB MaxScale to build its database proxy.
- The main functions of this proxy layer include connection pooling, request throttling, and a query blocklist.
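The three proxy duties listed above can be illustrated with a small sketch. This is not Airbnb's MaxScale fork; every class here is hypothetical. It assumes a token bucket for throttling, literal-stripping fingerprints for the blocklist, and a fixed-size pool, which matters because under one-thread-per-connection each backend connection costs the MySQL server a thread.

```python
import re
import time
from collections import deque
from typing import Any, Callable, Deque, Set


class Rejected(Exception):
    """Raised when the hypothetical proxy refuses a query."""


def fingerprint(sql: str) -> str:
    """Replaces literals with '?' so 'id = 1' and 'id = 42' share one blocklist entry."""
    normalized = re.sub(r"'[^']*'", "?", sql)         # string literals
    normalized = re.sub(r"\b\d+\b", "?", normalized)  # numeric literals
    return " ".join(normalized.lower().split())       # case and whitespace


class TokenBucket:
    """Request throttling: refills `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, float(burst), clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ConnectionPool:
    """A fixed set of backend connections shared by all clients, capping server threads."""

    def __init__(self, size: int, connect: Callable[[], Any]) -> None:
        self.idle: Deque[Any] = deque(connect() for _ in range(size))

    def acquire(self) -> Any:
        if not self.idle:
            raise Rejected("pool exhausted")  # a real proxy would queue instead
        return self.idle.popleft()

    def release(self, conn: Any) -> None:
        self.idle.append(conn)


class DbProxy:
    def __init__(self, pool: ConnectionPool, bucket: TokenBucket,
                 blocklist: Set[str], execute: Callable[[Any, str], Any]) -> None:
        self.pool, self.bucket, self.execute = pool, bucket, execute
        self.blocklist = {fingerprint(q) for q in blocklist}

    def query(self, sql: str) -> Any:
        if fingerprint(sql) in self.blocklist:
            raise Rejected("query is blocklisted")
        if not self.bucket.allow():
            raise Rejected("throttled")
        conn = self.pool.acquire()
        try:
            return self.execute(conn, sql)
        finally:
            self.pool.release(conn)  # connection goes back for the next client
```

Checking the blocklist before the throttle means a blocked query never consumes a token; whether the real proxy orders these checks the same way is not stated in the sources.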
# Infrastructure as code
Airbnb manages infrastructure with Chef.
# Data Warehouse
The Airbnb data infrastructure handles metrics, trains machine learning models, and runs business analytics.
- Kafka acts as the broker for event logs.
- Sqoop ingests production database dumps.
- The Gold and Silver Hive clusters are the data sinks. The Gold cluster replicates its data to Silver and has a stricter SLA.
- A Spark cluster is used for machine learning and stream processing.
- A Presto cluster serves ad hoc queries.
- Airflow handles job scheduling.
- S3 provides long-term storage for HDFS data.
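Airflow models pipelines as DAGs of tasks and runs each task only after its upstream tasks finish. The sketch below shows that scheduling idea with Python's standard `graphlib` module rather than Airflow's API; the task names and dependencies are hypothetical, loosely following the flow above (ingest, load Gold, replicate to Silver, train, archive to S3).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical daily pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "load_gold_hive": {"ingest_kafka_event_logs", "import_sqoop_db_dumps"},
    "replicate_to_silver": {"load_gold_hive"},
    "train_spark_models": {"load_gold_hive"},
    "archive_hdfs_to_s3": {"replicate_to_silver"},
}


def schedule_waves(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Groups tasks into waves; tasks in one wave can run in parallel because
    all of their upstream tasks finished in earlier waves."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every task in the wave succeeded
    return waves
```

For `PIPELINE`, the two ingestion tasks form the first wave, `load_gold_hive` the second, replication and training the third, and the S3 archive the last.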
# Microservices
Airbnb's service framework is built on Dropwizard, with a customized Thrift IDL for defining services.
- Developers can choose between JSON over HTTP and Thrift over HTTP.
- Downstream services install the RPC clients generated from their upstream services.
- Downstream services also apply standard timeout, retry, and circuit-breaker logic.
- The framework adds request and response metrics on both the server side and the client side.
- The framework attaches request context, including the request ID, to all underlying service requests.
- The framework supports alerts based on metrics such as `p95_latency` and `p99_latency`.
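The client-side policies above (timeout, retry, circuit breaker, and a propagated request ID) can be sketched as follows. This is a hypothetical illustration, not Airbnb's generated client: `send` stands for any transport call that enforces the `timeout` it is given and raises `TimeoutError` or `ConnectionError`, and the `X-Request-Id` header name is an assumption.

```python
import time
import uuid
from typing import Any, Callable, Dict, Optional


class CircuitOpenError(Exception):
    """Raised without calling upstream while the breaker is open."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; after `reset_after` seconds
    it lets trial calls through (half-open) and closes again on a success."""

    def __init__(self, threshold: int = 3, reset_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold, self.reset_after, self.clock = threshold, reset_after, clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open the circuit


def call_with_policy(send: Callable[..., Any], request: Any, breaker: CircuitBreaker, *,
                     timeout: float = 0.5, retries: int = 2, backoff: float = 0.1,
                     request_id: Optional[str] = None,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retries with exponential backoff behind a circuit breaker; the same
    request ID is sent on every attempt so logs can be correlated."""
    headers: Dict[str, str] = {"X-Request-Id": request_id or str(uuid.uuid4())}
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        if not breaker.allow():
            raise CircuitOpenError("upstream circuit is open")
        try:
            result = send(request, headers=headers, timeout=timeout)
        except (TimeoutError, ConnectionError) as exc:
            breaker.record(False)
            last_error = exc
            if attempt < retries:
                sleep(backoff * 2 ** attempt)
        else:
            breaker.record(True)
            return result
    assert last_error is not None  # only reached after at least one failed attempt
    raise last_error
```

Failing fast while the circuit is open keeps a struggling upstream from being hammered by retries from every downstream caller.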
# Search Service
- Nebula is a schema-less, versioned data store service that supports both real-time random data access and offline batch data management.
- The search backend adds only search-indexing logic on top of Nebula.
- A snapshot is generated daily as part of the offline data merge.
- The search index is built from the snapshot and then periodically deployed to the search servers, just like an ordinary binary deploy.
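A toy version of this flow is sketched below: a versioned store serving the latest value of each key, a snapshot taken as of a given version (the offline merge), and an inverted index built from that snapshot. None of this is Nebula's API; the class, the `listing:` keys, and the `description` field are hypothetical, and deletions, sharding, and deployment are left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class VersionedStore:
    """Toy stand-in for a schema-less, versioned store: each write appends a
    new (version, document) pair instead of overwriting the old one."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[int, dict]]] = defaultdict(list)
        self.version = 0

    def put(self, key: str, doc: dict) -> int:
        self.version += 1
        self.history[key].append((self.version, doc))
        return self.version

    def get(self, key: str) -> Optional[dict]:
        """Real-time random access: the latest version of one key."""
        versions = self.history.get(key)
        return versions[-1][1] if versions else None

    def snapshot(self, as_of: int) -> Dict[str, dict]:
        """Offline merge: the latest document per key at or before version `as_of`."""
        snap: Dict[str, dict] = {}
        for key, versions in self.history.items():
            visible = [doc for v, doc in versions if v <= as_of]
            if visible:
                snap[key] = visible[-1]
        return snap


def build_index(snapshot: Dict[str, dict], field: str = "description") -> Dict[str, Set[str]]:
    """Builds an inverted index (token -> keys) from a snapshot."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for key, doc in snapshot.items():
        for token in str(doc.get(field, "")).lower().split():
            index[token].add(key)
    return dict(index)
```

Because the index is built from a fixed snapshot, it can be packaged and shipped like any other build artifact, while writes made after the snapshot remain visible through real-time access until the next daily build.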
# References
- Data Infrastructure at Airbnb
- Scaling Airbnb's Experimentation Platform
- What is the Airbnb Software Architecture
- Airbnb Case Study
- BinaryAlert: Real-time Serverless Malware Detection
- Alerting Framework at Airbnb
- Scaling Airbnb Payment Platform
- Measuring Transactional Integrity in Airbnb's Distributed Payment Ecosystem
- Tracking the Money - Scaling Financial Reporting at Airbnb
- Building Services, Part 1, Part 2
- How Airbnb manages to monitor customer issues at scale
- Experiment Reporting Framework
- Streamalert: Real-time Data Analysis and Alerting
- Nebula as a Storage Platform to build Airbnb's Search Backends
- Unlocking Horizontal Scalability in Web Serving Tier
- Smartstack service discovery in the cloud
- Service Discovery with Smartstack and Docker