Humans By The Hundred (presentation)


Slide 1: Humans By The Hundred
Scaling Big Data for Big Team Growth

Slide 2: $ whoami
SRE Manager at Yelp
CWRU Alum
Pittsburgh native

Slide 3: Yelp’s Mission
Connecting people with great local businesses.

Slide 4: Yelp Stats
As of Q2 2015

Slide 5: What is Yelp?
Many sites: www, m, biz, api
Mobile apps
Partner platform
Hundreds of developers
Thousands of servers

Slide 6: Why Am I Here?

Slide 7

Slide 8: DATA

Slide 9: This talk is about people

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17: The Goal

Slide 18: Iterate as fast as possible

Slide 19: Regardless of how many people are participating

Slide 20: Deployment

Slide 21: How It Starts

Slide 22: Deployment: the early days
Get a few people together in slack/irc/etc.
Merge up the code
Run the tests
Manually test it in stage
Cross your fingers

Slide 23

Slide 24

Slide 25: Things get slower...
Tests take longer to run
More hosts = longer downloads
More developers = more eyeballs
More features = more code

Slide 26: The Problem: Humans Are Fallible

Slide 27: The Problem: Humans Are Fallible
“…oh @$#&”

Slide 28

Slide 29: The Problem, With Math
Assume:
Every change has a chance of success: 98%
That means no test failures, no reverts, etc.
Every deploy has a number of changes: n
Any failure in the pipeline invalidates the deploy
Let’s figure out the probability of a successful deployment: p

Slide 30: The Problem, With Math
Only you (n = 1)
p = .98 (98%)
You and a friend (n = 2)
p = .98 * .98 = .96 (96%)
You and nine co-workers (n = 10)
p = .98 * .98 * .98 * … * .98 = .82 (82%)

Slide 31: The Problem, With Math
p = (.98)^n

Slide 32: The Problem, With Math
p = (.98)^n
exponential decay!
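The arithmetic on these slides can be reproduced with a short sketch. It assumes, as the slides do, that every change succeeds independently with probability 0.98 and that one failed change invalidates the whole deploy. The function names and the 50% threshold are illustrative and do not come from the talk.

```python
import math


def deploy_success_probability(n_changes: int, per_change_success: float = 0.98) -> float:
    """Probability that all n independent changes in a deploy succeed."""
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    return per_change_success ** n_changes


def max_changes_for(target: float, per_change_success: float = 0.98) -> int:
    """Largest n for which the deploy still succeeds with probability >= target."""
    return math.floor(math.log(target) / math.log(per_change_success))


if __name__ == "__main__":
    # The three cases from the slides: you alone, you and a friend, ten people.
    for n in (1, 2, 10):
        print(f"n={n:>2}  p={deploy_success_probability(n):.2f}")
    # Illustrative threshold: how many changes before a deploy is a coin flip?
    print("changes until p drops below 50%:", max_changes_for(0.5) + 1)
```

Under these assumptions a deploy with 35 changes already fails more often than it succeeds, which is the decay the slide points at.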

Slide 33

Slide 34: This doesn’t scale!
More developers = more changes
More changes = longer deploys
Longer deploys = less time to develop
Less time to develop = slower to iterate
Slower to iterate != the goal

Slide 35: Mitigating Exponential Decay
p = (.98)^n

Slide 36: Mitigating Exponential Decay
p = (.98)^n

Slide 37

Slide 38: Making it harder to screw up
Write more tests
Write better tests
Get better code reviews
Get better infrastructure
Switch programming languages
Use better tools

Slide 39: Just write better software and stop making mistakes!

Slide 40: PROBLEM SOLVED

Slide 41

Slide 42: The Real World
Testing builds confidence in our changes
Testing does not protect you from failure
Better tools, tests, and infrastructure can raise our success rates

Slide 43: Mitigating Exponential Decay
p = (.98)^n

Slide 44: Mitigating Exponential Decay
p = (.98)^n

Slide 45: Service-Oriented Architecture
Large monolith → smaller services
Services communicate over network
Usually HTTP, but you can do RPC, SOAP, etc.
Service = independent code base
Independent deployments

Slide 46: Service-Oriented Architecture
Benefits
Smaller code bases = upper bound on n
Failure domains become isolated
Technology independence
Federated responsibility
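To see why bounding n matters, here is a rough sketch comparing one large deploy with several smaller, independent ones. It reuses the 98% per-change success rate from the earlier slides. It also assumes that changes are spread evenly across services and that service deploys fail independently. Neither assumption is stated in the talk, and the function name is hypothetical.

```python
def expected_changes_shipped(total_changes: int, n_services: int,
                             per_change_success: float = 0.98) -> float:
    """Expected number of changes that reach production in one round of deploys.

    Each service deploys its own share of the changes; a failure in one
    service's deploy discards only that service's changes.
    """
    if total_changes % n_services != 0:
        raise ValueError("this sketch assumes changes divide evenly across services")
    per_service = total_changes // n_services
    p_deploy = per_change_success ** per_service
    return n_services * per_service * p_deploy


if __name__ == "__main__":
    print(f"monolith, 50 changes:     {expected_changes_shipped(50, 1):.1f} shipped")
    print(f"5 services, 10 each:      {expected_changes_shipped(50, 5):.1f} shipped")
```

With the same 50 changes, the split deploys are expected to ship roughly twice as many changes per round, because each failure domain only covers its own ten changes.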

Slide 47: Service-Oriented Architecture
Drawbacks
Everything becomes decoupled
Function calls start looking like HTTP requests
Versioning can be a nightmare
Tracking dependencies is hard
Data consistency becomes challenging
End-to-end testing becomes hard(er), if not impossible

Slide 48: SOA scales people, not code.

Slide 49: Conquering SOA
With the monolith, it’s easy to focus on mean time between failures (MTBF)

Slide 50: Conquering SOA
In a SOA, focus on mean time to recovery (MTTR)

Slide 51: Conquering SOA
Fail fast
Anticipate failure
Leverage iteration speed to recover fast

Slide 52: Conquering SOA
Treat everything as distributed
That means everything will fail
Use timeouts, retries
Find ways to degrade gracefully
Fail fast & isolated
Don’t rely on synchronous processes
Prepare for eventual consistency
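One way to combine retries with graceful degradation is sketched below. This is an illustration and not Yelp's implementation: `call_with_retries`, its parameters, and the backoff schedule are all assumptions. The wrapped operation is expected to enforce its own timeout (for example by passing `timeout=` to `urllib.request.urlopen`), so a slow dependency fails fast instead of hanging.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], fallback: T,
                      attempts: int = 3, base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call a remote operation, retrying on network-style errors.

    Returns `fallback` once every attempt has failed, so the caller can
    render a degraded response instead of propagating the failure.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:  # covers TimeoutError, ConnectionError, urllib's URLError
            if attempt < attempts - 1:
                # Exponential backoff between attempts: 0.1s, 0.2s, 0.4s, ...
                sleep(base_delay * 2 ** attempt)
    return fallback


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_service() -> str:
        # Simulated dependency that times out twice before answering.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TimeoutError("simulated slow dependency")
        return "fresh data"

    print(call_with_retries(flaky_service, fallback="cached data"))
```

Returning a fallback value keeps the failure isolated to one feature. Whether a stale or empty response is acceptable depends on the call site.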

Slide 53: Reaping the Benefits
Smaller failure domains
Fewer people & changes to manage
Deploys get smaller
Deploys get faster
Deploys become continuous

Slide 54: Reaping the Benefits
Smaller changes
means smaller code reviews
means faster validation
means smaller blast radius
means faster iteration

Slide 55: Continuous Delivery
Everyone works against master branch
Master is deployed whenever commits are added
Deployment gated by tests
Monitoring knows something is wrong before you do!

Slide 56: PROBLEM SOLVED

Slide 57: Testing

Slide 58: Tests are hard to get right.

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65: How can we do better?

Slide 66

Slide 67: “Not Recommended” Tests

Slide 68: “Not Recommended” Tests
If a test fails on master, either:
a feature is broken on the live website, or
your test sucks and you should ditch it
In either case, we disable it
Ticket is created
Developers can fix it later or just bin it and start fresh
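The slides do not show how tests are disabled, so the following is only a sketch of the idea using Python's standard `unittest` module. The `not_recommended` decorator, the ticket identifier, and the test names are hypothetical. The point is that a disabled test stays visible in the run as skipped and carries a pointer to its ticket.

```python
import unittest


def not_recommended(ticket: str):
    """Disable a test that failed on master, keeping a pointer to its ticket."""
    return unittest.skip(f"not recommended, see {ticket}")


class SearchTest(unittest.TestCase):
    @not_recommended("TICKET-123")  # hypothetical ticket id
    def test_flaky_ranking(self):
        self.fail("unreliable on master")

    def test_stable_behaviour(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])


def run_suite() -> unittest.TestResult:
    """Run the suite and return the result without exiting the interpreter."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SearchTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    print("successful:", outcome.wasSuccessful())
    for test, reason in outcome.skipped:
        print("skipped:", test.id(), "-", reason)
```

Because the skip reason names the ticket, the list of disabled tests doubles as a backlog for the developers who will later fix or delete them.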

Slide 69: Reliable tests >> test coverage.

Slide 70: Don’t always run all the tests!

Slide 71: Tests of external services should be monitoring

Slide 72: Define your boundaries.

Slide 73: yelp.com/dataset_challenge
61K businesses
61K checkin-sets
481K business attributes
1.6M reviews
366K users
2.8M edge social-graph
495K tips

Your academic project, research or visualizations, submitted by Dec 31, 2015
= $5,000 prize + $1,000 for publication + $500 for presenting*

*See full terms on website

Academic dataset from 10 cities in 4 countries!

Slide 74
@YelpEngineering
YelpEngineers
engineeringblog.yelp.com
github.com/yelp

Slide 75
yelp.com/careers

Slide 76: Questions?
