Thinking Big presentation


Slide 1: Thinking Big
An Introduction to Big Data


Slide 2: About Me
Shawn Hermans
Data Engineer/Scientist
Technology consultant
Physics, math, data geek


Slide 3: About this Talk
Non-technical introduction to Big Data
Not focused on any technology or platform
Focus on concepts

Slide 4: Should you believe the hype?


Slide 5: Big Data Promises
No need for scientific method
Predict disease outbreaks before the CDC
Cure cancer
Innovating healthcare
Solve world hunger
Bring about world peace


Slide 7: Big Data Criticism
Garbage in, garbage out
Ignores the role of the scientific method
Lots of questions don’t require large amounts of data to get good stats
Privacy issues

Slide 8: Big Data is just another way to think about data


Slide 9: Mental Models
“A mental model is simply a representation of an external reality inside your head. Mental models are concerned with understanding knowledge about the world.”
- Farnam Street Blog

Slide 10: Examples
Occam's razor
Mind maps
Law of supply and demand
Never get in a land war in Asia

Slide 11: All models are wrong, but some are useful


Slide 12: Relational Resistance
Resistance to big data concepts, technologies, and techniques because of the belief that the relational model is the only way to think about data.

See also: Theory-induced blindness

Slide 14: Data Mental Models
Relational
Linked
Object Oriented
Geospatial
Temporal
Semantic
Event Based
Data as Code
Bayesian
Unstructured


Slide 15: What is Big Data?


Slide 16: According to Gartner
“Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”


Slide 17: According to Me
Big data is the Bazaar to traditional data’s Cathedral



Slide 18: Cathedral and Bazaar
Traditional Data
Clean
Top down
Carefully collected
Scales vertically
One true way
Big Data
Disorderly
Bottom up
Randomly collected
Scales horizontally
More than one way

Slide 19: Big Data Differences
Relational
Normalization
ACID
SQL/Query
Structured/Schema
Big Data
Denormalization
BASE
MapReduce/Other
Loosely Structured
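
To make the "MapReduce/Other" entry concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is an illustration only, not code from the talk: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are assumptions, and a real platform would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample records, chosen only for illustration.
counts = word_count(["big data", "Big ideas"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, the work can in principle be split across machines, which is the horizontal scaling the previous slide contrasts with the relational approach.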


Slide 20: Integrating all available data is the promise of Big Data


Slide 21: Why should you care?


Slide 23: Information as an Asset
Target specific customers' needs rather than broad segments
Just-in-time inventory management
Evaluate demand for a product
Predict and track traffic patterns

Slide 24: Big Data and You
What information do you have that no one else has?
Can you easily integrate your data, or is it locked in silos?
What data don’t you collect?
What data don’t you archive?

Slide 25: Big Data Technology


Slide 26: Big Data Platforms
Cloud
AWS
Google
Microsoft

Hadoop
Cloudera
MapR
Hortonworks

This isn’t an all-inclusive list, but a sample of the big players in the space.

Slide 27: Big Data Stack
Batch Processing
Data Collection
SQL/Query
Search
Machine Learning
Serialization
Security
Stream Processing
File Storage
Resource management
Online NoSQL
Data Pipeline


Slide 29: What about data science?


Slide 30: What IS Data Science?
Data science is statistics on a Mac
A data scientist is a statistician who lives in San Francisco
A person who is better at statistics than any software engineer and better at software engineering than any statistician


Slide 32: The need for Data Science
There is a LOT of data
Too much data for people to look at it all
Probabilistic models help extract signal from the noise
Need to automate the analysis and exploitation of data
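
As one small illustration of how a probabilistic model separates signal from noise, the sketch below applies Bayes' rule to a noisy alert. It is not from the talk: the function name `posterior` and all three rates are made-up assumptions chosen only to show the arithmetic.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: probability an alert is real, given that it fired.

    prior               -- P(event), how often the event actually happens
    true_positive_rate  -- P(alert | event)
    false_positive_rate -- P(alert | no event)
    """
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: a 1% base rate, a 90% detection rate,
# and a 5% false-alarm rate.
p = posterior(prior=0.01, true_positive_rate=0.9, false_positive_rate=0.05)
print(f"{p:.3f}")  # prints 0.154
```

Under these assumed rates, most alerts are still noise even though the detector is right 90% of the time, which is why the base rate has to be part of any automated analysis.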

Slide 33: Big Data has its limits


Slide 34: Black Swans and Big Data
There are fundamental limits to prediction
Hard to predict rare events where no prior data exists (i.e., Black Swans)
Complex systems often have feedback loops (e.g., the stock market)

Slide 35: What’s next?


Slide 36: Getting Started

Business
Identify some unresolved questions
Figure out what data could answer those questions
Pick the easiest and test out your hypothesis

Technology
Pick a technology you know or want to learn
Pick a platform
Pick a data set and identify some basic problems to solve


Slide 37: My Info
Twitter: @shawnhermans
Github: github.com/shawnhermans
Blog: http://shawnhermans.github.io/ (In Progress)
Slideshare: www.slideshare.net/shawnhermans/
Quora: http://www.quora.com/Shawn-Hermans


Slide 38: Backup Slides


Slide 40: The Fourth Quadrant and the Failure of Statistics


Slide 41: Soothsayer
Simple HTTP/JSON API for training/classifying data
Lots of built-in classifier statistics

https://github.com/shawnhermans/soothsayer

