Maintaining the Front Door to Netflix (presentation)

Contents

There are copious notes attached to each slide in this presentation. Please read those notes to get the full context of the presentation.

Slide 1: Maintaining the Front Door to Netflix
Daniel Jacobson
@daniel_jacobson
http://www.linkedin.com/in/danieljacobson
http://www.slideshare.net/danieljacobson


Slide 2: There are copious notes attached to each slide in this presentation. Please read those notes to get the full context of the presentation.

Slide 3: Global Streaming Video for TV Shows and Movies


Slide 4: More than 44 Million Subscribers
More than 40 Countries


Slide 5: Netflix Accounts for ~33% of Peak Internet Traffic in North America

Netflix subscribers are watching more than 1 billion hours a month

Slide 8: Team Focus: Build the Best Global Streaming Product
Three aspects of the Streaming Product:
Non-Member
Discovery
Streaming

Slide 9: Key Responsibilities
Broker data between services and UIs
Maintain a resilient front-door
Scale the system vertically and horizontally
Maintain high velocity

Slide 10: But Before Streaming…


Slide 13: Monolithic Application
In Netflix Data Centers

Slide 14: The bigger the ship…
the slower it turns

Slide 15: Distributed Architecture


Slide 17: 1000+ Device Types


Slide 18:
Personalization Engine
User Info
Movie Metadata
Movie Ratings
Similar Movies
Reviews
A/B Test Engine

Dozens of Dependencies


Slide 19:
Personalization Engine
User Info
Movie Metadata
Movie Ratings
Similar Movies
API
Reviews
A/B Test Engine


Slide 20: Dependency Relationships


Slide 21: 2,000,000,000
Requests Per Day to the Netflix API

Slide 22: 30
Distinct Dependent Services for the Netflix API

Slide 23: ~500
Dependency jars Slurped into the Netflix API

Slide 24: 14,000,000,000
Netflix API Calls Per Day to those Dependent Services

Slide 25: 0
Dependent Services with 100% SLA


Slides 26-27: (99.99%)^30 = 99.7%
0.3% of 2B = 6M failures per day
2+ Hours of Downtime Per Month

Slide 28: (99.9%)^30 = 97%
3% of 2B = 60M failures per day
20+ Hours of Downtime Per Month
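The arithmetic on slides 26-28 can be reproduced with a short Python sketch. It assumes the 30 on these slides is the number of dependent services (slide 22) used as an exponent, that failures are independent, and that a month has 30 days; the function names are illustrative only.

```python
REQUESTS_PER_DAY = 2_000_000_000   # from slide 21
HOURS_PER_MONTH = 30 * 24          # assumption: 30-day month


def compound_availability(per_service: float, n_services: int) -> float:
    """Chance that a request succeeds when it needs every one of
    n_services to succeed (assumes independent failures)."""
    return per_service ** n_services


def summarize(per_service: float, n_services: int = 30) -> dict:
    """Translate per-service availability into system-level impact."""
    availability = compound_availability(per_service, n_services)
    failure_rate = 1 - availability
    return {
        "availability": availability,
        "failures_per_day": failure_rate * REQUESTS_PER_DAY,
        "downtime_hours_per_month": failure_rate * HOURS_PER_MONTH,
    }


if __name__ == "__main__":
    for sla in (0.9999, 0.999):
        print(sla, summarize(sla))
```

The point of the calculation is that even very reliable individual dependencies compound into noticeable downtime once a request depends on all of them.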


Slides 29-33:
Personalization Engine
User Info
Movie Metadata
Movie Ratings
Similar Movies
API
Reviews
A/B Test Engine


Slide 35: Circuit Breaker Dashboard


Slide 37: Call Volume and Health / Last 10 Seconds

Slide 38: Call Volume / Last 2 Minutes

Slide 39: Successful Requests

Slide 40: Successful, But Slower Than Expected

Slide 41: Short-Circuited Requests, Delivering Fallbacks

Slide 42: Timeouts, Delivering Fallbacks

Slide 43: Thread Pool & Task Queue Full, Delivering Fallbacks

Slide 44: Exceptions, Delivering Fallbacks

Slide 45: Error Rate
(# + # + # + #) / (# + # + # + # + #) = Error Rate
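The "#" symbols on the error-rate slide stand in for the dashboard's color-coded counters, and the transcript does not say which counter is which. The Python sketch below is one plausible reading, assuming the four numerator terms are the fallback-producing outcomes from slides 41-44 and the denominator adds successful requests; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request outcomes counted over one rolling window (assumed mapping)."""
    success: int
    short_circuited: int
    timeout: int
    rejected: int      # thread pool / task queue full
    exception: int

    def error_rate(self) -> float:
        # Failed outcomes divided by all outcomes in the window.
        failed = (self.short_circuited + self.timeout
                  + self.rejected + self.exception)
        total = self.success + failed
        return failed / total if total else 0.0


if __name__ == "__main__":
    window = WindowCounts(success=90, short_circuited=2, timeout=3,
                          rejected=1, exception=4)
    print(f"{window.error_rate():.1%}")
```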



Slide 46: Status of Fallback Circuit

Slide 47: Requests per Second, Over Last 10 Seconds

Slide 48: SLA Information


Slides 49-51:
Personalization Engine
User Info
Movie Metadata
Movie Ratings
Similar Movies
API
Reviews
A/B Test Engine

Slides 52-53:
Personalization Engine
User Info
Movie Metadata
Movie Ratings
Similar Movies
API
Reviews
A/B Test Engine
Fallback
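The idea behind these diagrams, an API that answers with a fallback when a dependency misbehaves, can be sketched as a minimal circuit breaker in Python. This is an illustration under assumed thresholds, not the implementation behind the dashboard; the class name, parameters, and the injectable clock are all hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures and serves fallbacks while open."""

    def __init__(self, failure_threshold=3, reset_after=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after      # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()           # short-circuit: skip the dependency
            # Otherwise fall through and allow one trial call.
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # (re)open the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(fetch_reviews, lambda: [])`, so that a failing Reviews service degrades to an empty list instead of failing the whole request (`fetch_reviews` is a made-up name).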


Slide 54: Scaling the Distributed System

Slide 56: AWS Cloud

Slides 58-59: Autoscaling


Slide 60: Amazon Auto Scaling Limitations
Hard to fit policies to variable traffic patterns (weekday vs weekend)
Limited control over capacity adjustments (absolute value or %)

Slide 61: The Impact of AAS Limitations
Traffic drop can lead to scale downs during outage
Performance degradation between new instance launch and taking traffic
Excess capacity at peak and trough

Slide 62: Scryer: Predictive Auto Scaling
Not yet…


Slide 63: Typical Traffic Patterns Over Five Days

Slide 64: Predicted RPS Compared to Actual RPS

Slide 65: Scaling Plan for Predicted Workload

Slide 66: What is Scryer Doing?
Evaluating needs based on historical data
Week over week, month over month metrics
Adjusts instance minimums based on algorithms
Relies on Amazon Auto Scaling for unpredicted events
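The slides do not describe Scryer's algorithms, so the following Python sketch only illustrates the general shape of the approach: predict load per time slot from earlier weeks, then turn the prediction into instance minimums. The plain averaging, the headroom factor, and all names here are assumptions made for the example.

```python
import math
from statistics import mean


def predict_rps(history):
    """history: past weeks, each a list of RPS samples for the same time slots.
    Returns the per-slot average as a naive week-over-week prediction."""
    return [mean(slot) for slot in zip(*history)]


def instance_minimums(predicted_rps, rps_per_instance, headroom=1.25):
    """Convert predicted RPS into a minimum instance count per slot.
    Reactive auto scaling would still handle anything above the minimum."""
    return [
        max(1, math.ceil(rps * headroom / rps_per_instance))
        for rps in predicted_rps
    ]


if __name__ == "__main__":
    weeks = [[100, 400], [200, 600]]       # two weeks, two time slots each
    plan = instance_minimums(predict_rps(weeks), rps_per_instance=100)
    print(plan)
```

Setting only the minimum leaves room for reactive scaling to add capacity during events the history did not predict.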

Slide 67: Results

Slide 68: Results: Load Average
Reactive
Predictive

Slide 69: Results: Response Latencies
Reactive
Predictive

Slides 70-71: Results: Outage Recovery

Slide 72: Results: AWS Costs

Slide 73: Scaling Globally


Slide 74: More than 44 Million Subscribers
More than 40 Countries

Slide 75: Zuul Gatekeeper for the Netflix Streaming Application

Slide 76: Zuul *
Multi-Region Resiliency
Insights
Stress Testing
Canary Testing
Dynamic Routing
Load Shedding
Security
Static Response Handling
Authentication
* Most closely resembles an API proxy

Slide 77: Isthmus


Slide 79: All of these approaches are designed to prevent failures…

Slide 80: But sometimes the best way to prevent failures is to force them!

Slide 82: Chaos Monkey
I randomly terminate instances in production to identify dormant failures.
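As a rough illustration of random instance termination, the Python sketch below selects at most one victim per service group with a configurable probability. It only picks instance IDs; actually terminating them is left to the caller. The grouping scheme, probability parameter, and names are assumptions for this example rather than details from the deck.

```python
import random


def pick_victims(instances_by_group, probability, rng=None):
    """Return at most one randomly chosen instance per group.

    instances_by_group: mapping of group name -> list of instance IDs.
    probability: chance that a given group loses an instance this run.
    """
    rng = rng or random.Random()
    victims = []
    for _group, instances in sorted(instances_by_group.items()):
        if instances and rng.random() < probability:
            victims.append(rng.choice(instances))
    return victims


if __name__ == "__main__":
    fleet = {"api": ["i-1", "i-2"], "reviews": ["i-3"]}
    print(pick_victims(fleet, probability=0.5))
```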


Slide 83: Chaos Gorilla
I simulate an outage of an entire Amazon availability zone.

Slide 84: Chaos Kong
I simulate an outage in an AWS region.

Slide 85: Conformity Monkey
I find instances that don’t adhere to best practices.

Slide 86: Security Monkey
I extend Conformity Monkey to find security violations.

Slide 87: Doctor Monkey
I detect unhealthy instances and remove them from service.

Slide 88: Janitor Monkey
I clean up the clutter and waste that runs in the cloud.

Slide 89: Latency Monkey
I induce artificial delays and errors into services to determine how upstream services will respond.
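Injecting artificial delays and errors can be sketched in Python as a wrapper around a service call. This is a toy illustration: the wrapper name, the exception type, and the parameters are hypothetical, and the sleep function is injectable so the behavior can be exercised without real waiting.

```python
import random
import time


class InjectedFault(RuntimeError):
    """Raised in place of a real response when a fault is injected."""


def with_faults(func, delay_s=0.0, error_rate=0.0, rng=None, sleep=time.sleep):
    """Wrap func so each call is delayed and may fail artificially."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if delay_s > 0:
            sleep(delay_s)                      # artificial latency
        if rng.random() < error_rate:
            raise InjectedFault("injected fault")
        return func(*args, **kwargs)

    return wrapper
```

Callers of the wrapped function can then be observed to see whether their timeouts and fallbacks behave as intended.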


Slide 91: Deployments in the Cloud

Slide 92: Dependency Relationships

Slide 94: Testing Philosophy: Act Fast, React Fast

Slide 95: That Doesn’t Mean We Don’t Test

Slide 96: Automated Delivery Pipeline

Slide 97: Cloud-Based Deployment Techniques



Slide 98:
Current Code In Production
API Requests from the Internet

Slide 99:
Single Canary Instance To Test New Code with Production Traffic (around 1% or less of traffic)
Current Code In Production
API Requests from the Internet

Slide 100: Canary Analysis Automation
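The deck does not say which metrics or thresholds the automated canary analysis uses. As a hedged Python sketch, one simple form compares the canary's error rate and latency against the current production baseline and fails the canary if either is clearly worse; the metric names, ratios, and error floor below are all assumptions.

```python
def error_rate(stats):
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return stats["errors"] / stats["requests"] if stats["requests"] else 0.0


def canary_verdict(baseline, canary, max_error_ratio=1.5,
                   max_latency_ratio=1.2, error_floor=0.001):
    """Compare canary metrics to the baseline.

    Both arguments are dicts with 'requests', 'errors', and 'latency_ms'.
    Returns ("pass", []) or ("fail", [names of the metrics that regressed]).
    """
    # The floor keeps a near-zero baseline from failing every canary.
    allowed_error = max(error_rate(baseline) * max_error_ratio, error_floor)
    allowed_latency = baseline["latency_ms"] * max_latency_ratio

    reasons = []
    if error_rate(canary) > allowed_error:
        reasons.append("error rate")
    if canary["latency_ms"] > allowed_latency:
        reasons.append("latency")
    return ("fail", reasons) if reasons else ("pass", reasons)
```

Because the canary takes only a small share of traffic, a failing verdict limits the impact of bad code to that share before the rollout is stopped.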


Slide 101:
Single Canary Instance To Test New Code with Production Traffic (around 1% or less of traffic)
Current Code In Production
API Requests from the Internet
Error!

Slides 102-103:
Current Code In Production
API Requests from the Internet

Slide 104:
Current Code In Production
API Requests from the Internet
Perfect!


Slide 105: Stress Test with Zuul

Slides 106-107:
Current Code In Production
API Requests from the Internet
New Code Getting Prepared for Production

Slide 108:
Error!
Current Code In Production
API Requests from the Internet
New Code Getting Prepared for Production

Slide 109:
Current Code In Production
API Requests from the Internet
New Code Getting Prepared for Production

Slide 110:
Current Code In Production
API Requests from the Internet
Perfect!

Slide 111: Stress Test with Zuul

Slides 112-113:
Current Code In Production
API Requests from the Internet
New Code Getting Prepared for Production

Slide 114:
API Requests from the Internet
New Code Getting Prepared for Production


Slide 115: Brokering Data to 1,000+ Device Types

Slide 118: Screen Real Estate

Slide 119: Controller

Slide 120: Technical Capabilities

Slide 121: One-Size-Fits-All API
Request (label repeated 16 times in the diagram)


Slide 122: Courtesy of South Florida Classical Review


Slide 124: Resource-Based API vs. Experience-Based API


Slide 125: Resource-Based Requests
/users//ratings/title
/users//queues
/users//queues/instant
/users//recommendations
/catalog/titles/movie
/catalog/titles/series
/catalog/people


Slide 126:
REST API
RECOMMENDATIONS
MOVIE DATA
SIMILAR MOVIES
AUTH
MEMBER DATA
A/B TESTS
START-UP
RATINGS
Network Border
Network Border

Slide 127:
RECOMMENDATIONS
MOVIE DATA
SIMILAR MOVIES
AUTH
MEMBER DATA
A/B TESTS
START-UP
RATINGS
OSFA (One-Size-Fits-All) API
Network Border
Network Border
SERVER CODE
CLIENT CODE

Slide 128:
RECOMMENDATIONS
MOVIE DATA
SIMILAR MOVIES
AUTH
MEMBER DATA
A/B TESTS
START-UP
RATINGS
OSFA (One-Size-Fits-All) API
Network Border
Network Border
DATA GATHERING, FORMATTING, AND DELIVERY
USER INTERFACE RENDERING


Slide 131: Experience-Based Requests
/ps3/homescreen
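The contrast with the resource-based requests listed earlier is that a single experience-based request returns everything one screen needs. The Python sketch below illustrates that idea with stubbed backend calls fanned out in parallel and merged into one device-shaped response. The deck's later diagrams show a Java API with a Groovy layer; Python is used here only for illustration, and every function name and field in the sketch is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for backend services; real ones would be remote calls.
def fetch_user(user_id):
    return {"id": user_id, "name": "example-user"}


def fetch_recommendations(user_id):
    return [{"title": "Title A", "synopsis": "long text"},
            {"title": "Title B", "synopsis": "long text"}]


def fetch_queue(user_id):
    return [{"title": "Title C", "synopsis": "long text"}]


def ps3_homescreen(user_id):
    """Sketch of a handler for a /ps3/homescreen-style request: several
    backend calls run concurrently and are merged into one payload that
    carries only the fields this screen needs."""
    with ThreadPoolExecutor() as pool:
        user = pool.submit(fetch_user, user_id)
        recs = pool.submit(fetch_recommendations, user_id)
        queue = pool.submit(fetch_queue, user_id)
        return {
            "greeting": user.result()["name"],
            "rows": [
                {"name": "recommendations",
                 "titles": [t["title"] for t in recs.result()]},
                {"name": "queue",
                 "titles": [t["title"] for t in queue.result()]},
            ],
        }


if __name__ == "__main__":
    print(ps3_homescreen("u1"))
```

The device makes one network call across the border instead of several, and the formatting work happens on the server side.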


Slide 132:
JAVA API
Network Border
Network Border
RECOMMENDATIONS
MOVIE DATA
SIMILAR MOVIES
AUTH
MEMBER DATA
A/B TESTS
START-UP
RATINGS
Groovy Layer

Slide 134:
RECOMMENDATIONS
MOVIE DATA
SIMILAR MOVIES
AUTH
MEMBER DATA
A/B TESTS
START-UP
RATINGS
JAVA API
SERVER CODE
CLIENT CODE
CLIENT ADAPTER CODE (WRITTEN BY CLIENT TEAMS, DYNAMICALLY UPLOADED TO SERVER)
Network Border
Network Border

Slide 135:
RECOMMENDATIONS
MOVIE DATA
SIMILAR MOVIES
AUTH
MEMBER DATA
A/B TESTS
START-UP
RATINGS
JAVA API
DATA GATHERING
DATA FORMATTING AND DELIVERY
USER INTERFACE RENDERING
Network Border
Network Border


Slide 137: https://www.github.com/Netflix


Slide 138: Maintaining the Front Door to Netflix
Daniel Jacobson
@daniel_jacobson
http://www.linkedin.com/in/danieljacobson
http://www.slideshare.net/danieljacobson

