Scale Your Data Collection on the Cloud Like a Champ (presentation)

Slide 1: SCALE YOUR DATA COLLECTION ON THE CLOUD LIKE A CHAMP
Moty Michaely, VP R&D, Xplenty

Slide 2: SCALING DATA COLLECTION = A PAIN
Plenty of companies are limited by their data collection methods when it comes to scalability.
Once they need more detailed data, and in larger quantities, scaling the system can become a major pain.

Slide 3: THREE COMMON METHODS FOR COLLECTING BIG DATA... IS YOUR COMPANY USING THE RIGHT ONE?

Storing directly in the DB
Keeping it in a local file
S3/CloudFront logging

Slide 4: STORING DIRECTLY IN THE DB
This is what companies usually start with. As the name suggests, data is inserted right into the DB.
There are two ways to do it:
Row by row: each record is added to the DB as a row in real time.
Bulk insert: multiple rows are added to the DB in one transaction. (It's faster than row by row, but if the insert fails, the entire batch fails and a big chunk of data has to be re-inserted.)
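The difference between the two insert styles can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module and an in-memory database; the `events` table, its columns, and the helper names are assumptions made for the example, not something the slides specify.

```python
import sqlite3

INSERT_SQL = "INSERT INTO events (user_id, action) VALUES (?, ?)"

def make_db():
    """Create a throwaway in-memory DB with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER NOT NULL, action TEXT NOT NULL)")
    return conn

def insert_row_by_row(conn, events):
    """Insert each event on its own and commit it immediately."""
    for event in events:
        conn.execute(INSERT_SQL, event)
        conn.commit()  # the row is visible to readers right away

def insert_bulk(conn, events):
    """Insert all events in one transaction; any failure rolls back the whole batch."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(INSERT_SQL, events)
        return True
    except sqlite3.Error:
        return False  # nothing from this batch was stored; the caller must retry it

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With this sketch, a batch that contains a single invalid row (for example a `None` action) leaves the table unchanged, which is the re-insert cost the slide warns about.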

Slide 5: PROS FOR STORING DIRECTLY IN THE DB
Better insert performance than the other methods.
Real-time data is available when adding row by row.

Slide 6: CONS FOR STORING DIRECTLY IN THE DB
Schema changes are required to add new types of data.
Scaling is required in two layers: application and database. Scaling the application is usually easier (using a network load balancer, for example), but scaling the database requires hiring an expert DBA, partitioning the DB, and scaling up the server. (Relational DBs that scale out to multiple nodes are expensive and require a lot of maintenance.)
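The schema-change point can be illustrated with a small sketch, again using `sqlite3` in memory. The `events` table and the new `mouse_x` column are hypothetical names chosen for the example; a production database would need a proper migration rather than an ad hoc `ALTER TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'click')")

def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

NEW_EVENT_SQL = "INSERT INTO events (user_id, action, mouse_x) VALUES (2, 'move', 120)"

# A new type of data is rejected until the schema knows about it.
try:
    conn.execute(NEW_EVENT_SQL)
    insert_before_migration_ok = True
except sqlite3.OperationalError:
    insert_before_migration_ok = False

# The schema change: existing rows get NULL in the new column.
conn.execute("ALTER TABLE events ADD COLUMN mouse_x INTEGER")
conn.execute(NEW_EVENT_SQL)
```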

Slide 7: BOTTOM LINE
Storing directly in the DB gives you fast performance, but it doesn't scale.

Slide 8: KEEPING IT LOCAL
Data is dumped in big local files. These files are periodically uploaded via a program to S3 or inserted in batches into a NoSQL DB such as Amazon DynamoDB, or into a data warehouse such as Amazon Redshift.

Slide 9: PROS FOR KEEPING IT IN A LOCAL FILE
New types of data can be added easily since no schema changes are required.
Compatible with all applications because any file format can be used.
Quicker filtering via customized directory/file names, e.g. with date/time indication.
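A rough sketch of the local-file approach with date/time-based names follows. The directory layout (`<root>/<event type>/<date>/<hour>.jsonl`), the JSON Lines format, and all function names are assumptions made for illustration; the slides do not prescribe a layout or format.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def event_file(root, event_type, when):
    """Build a path such as <root>/click/2014-03-01/13.jsonl (hypothetical layout)."""
    day = when.strftime("%Y-%m-%d")
    hour = when.strftime("%H")
    return Path(root) / event_type / day / f"{hour}.jsonl"

def append_event(root, event_type, when, payload):
    """Append one event as a JSON line; new payload fields need no schema change."""
    path = event_file(root, event_type, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": when.isoformat(), **payload}) + "\n")
    return path

def files_for_day(root, event_type, day):
    """Select one day's files by name alone, without opening any of them."""
    return sorted((Path(root) / event_type / day).glob("*.jsonl"))
```

Filtering by day or hour then only requires listing a directory, which is the "quicker filtering" advantage named above. Uploading the finished files to S3 or a database is left out of this sketch.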

Slide 10: CONS FOR KEEPING IT IN A LOCAL FILE
You need to develop a tracking program to deal with the files: rotating logs while more data is incoming, handling failures, and ensuring transactionality. Even if you have the manpower, time, and money, such a program is hard to develop.
Scaling means adding more servers, more maintenance, and more money.
Data is not as queryable as it is when stored in a DB.
Staging and production environments require extra servers.

Slide 11: BOTTOM LINE
More flexible than direct DB storage, but requires more development, and scaling is still an issue.

Slide 12: S3/CLOUDFRONT LOGGING
This old school solution goes back to the early days when visitor counters and burning "hot!" animations ruled the web. To track an event, an HTTP request is sent for a 1x1 pixel image in a relevant S3 directory. Accessing the image automatically generates a W3C log entry with the HTTP request details: IP address, browser, date/time, etc. Extra session-level data, like username or mouse position, is passed via the query string. To differentiate between event types, images are placed in accordingly named directories, e.g. /click/.
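The tracking request described above could be built along these lines. This is a sketch only: the host `example.cloudfront.net`, the image name `pixel.gif`, and the parameter names are hypothetical, and `decode_event` merely mimics what a later log-processing step might do with the logged URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def pixel_url(base, event_type, **params):
    """Build the URL of a 1x1 image whose directory names the event type."""
    return f"{base.rstrip('/')}/{event_type}/pixel.gif?{urlencode(params)}"

def decode_event(url):
    """Recover the event type and session data from a logged pixel URL."""
    parts = urlsplit(url)
    event_type = parts.path.strip("/").split("/")[0]  # first path segment, e.g. "click"
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return event_type, params
```

For example, `pixel_url("https://example.cloudfront.net", "click", username="dana", x=120)` produces a URL under `/click/` whose query string carries the username and mouse position. Note that values come back from the log as strings.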

Slide 13: PROS FOR S3/CLOUDFRONT LOGGING
No tracking server required - data reaches S3 automatically.
No file management - Amazon handles all file monkey business.
No servers - Amazon provides them.
Cost effective - only log storage and bandwidth are paid for. The logs take little space since they are all gzipped, and the bandwidth for 1x1 pixel images is marginal.

Slide 14: PROS FOR S3/CLOUDFRONT LOGGING CONTINUED
Easily scalable with practically infinite space and firepower.
Quick and easy to implement.
Simple setup for staging/production environments via additional distributions and a prefix.
Web application performance unharmed, especially using the CloudFront CDN.

Slide 15: CONS FOR S3/CLOUDFRONT LOGGING
Slower filtering performance compared to a local setup. Amazon handles log file/directory names automatically and no customization is available.
Not suitable for real time or for the impatient. Data is aggregated into a new file in the bucket only once per hour, and that's Amazon's best effort, so it could take longer.
Data is not as queryable as it is when stored in a DB.
Vendor dependent. Having your servers outside of Amazon will decrease performance.
No control over the file format. W3C Extended Log File Format is mandatory and some applications may not like that.
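Since the log format is fixed, downstream code has to adapt to it. A minimal parsing sketch is shown below; it relies on the `#Fields:` directive that the W3C Extended Log File Format uses to name columns and assumes tab-separated records. The sample lines are a shortened, made-up subset for illustration, and real logs carry more fields and arrive gzipped.

```python
from urllib.parse import parse_qs

def parse_w3c_log(lines):
    """Yield one dict per record, using the #Fields directive for column names."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            # Assumption: record values are tab-separated.
            yield dict(zip(fields, line.split("\t")))

# Hypothetical sample, not copied from a real log.
SAMPLE = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-uri-stem cs-uri-query",
    "2014-03-01\t13:05:00\t192.0.2.10\t/click/pixel.gif\tusername=dana&x=120",
]
```

Each record's `cs-uri-query` value can then be passed to `parse_qs` to recover the session-level data that was sent with the pixel request.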

Slide 16: BOTTOM LINE
Quick, cheap, and scalable, though it doesn't provide the best performance or customization.

Slide 17: WHAT'S RIGHT FOR YOU?
So much emphasis has been put on the technologies used for processing, analyzing, and visualizing data. But the importance of collecting this data often gets lost in the shuffle. The two go hand in hand. To get good output from your data, you must first have proper input.
Only once you have achieved synergy between the two will you be able to fully tap into your data's potential.

Slide 18: XPLENTY WWW.XPLENTY.COM
