Leveraging Customer Data to Enhance Relevancy in Personalization (presentation)

Contents

1. Leveraging Customer Data to Enhance Relevancy in Personalization
2. Big Data Analytics Track
3. Agenda For This Session
4. High Level Personalization Process
5. Evolution of a Profile (1)
6. Evolution of a Profile (n+1)
7. One size/document fits all?
8. Separation of Concerns
9. Benefits
10. Result
11. Analytics and Personalization
12. Separation of Concerns
13. Separation of Concerns
14. Architecture revised
15. Advice for Developers
16. Data Processing
17. Hadoop in a Nutshell
18. Spark in a Nutshell
19. Flink in a Nutshell
20. Latency of query operations
21. Iterative Algorithms / Clustering
22. K-Means in Pictures
23. K-Means as a Process
24. Iterations in Hadoop and Spark
25. Iterations in Flink
26. Demo
27. Result
28. More…?
29. Takeaways
30. Next Steps
31. Thank you!


Slide 1: Leveraging Customer Data to Enhance Relevancy in Personalization
“Using Apache Data Processing Projects on top of MongoDB”

Marc Schwering
Sr. Solution Architect – EMEA
marc@mongodb.com
@m4rcsch

Slide 2: Big Data Analytics Track
Driving Personalized Experiences Using Customer Profiles
Leveraging Data to Enhance Relevancy in Personalization
Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB

Slide 3: Agenda For This Session
Personalization Process Review
The Life of an Application
Separation of Concerns / Real World Architecture
Apache Spark and Flink Data Processing Projects
Clustering with Apache Flink
Next Steps

Slide 4: High Level Personalization Process
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions

Batch analytics

Public data

Common technologies
R
Hadoop
Spark
Python
Java
Many other options

Personas change much less often than tagging occurs

Slide 5: Evolution of a Profile (1)
{
  "_id" : ObjectId("553ea57b588ac9ef066428e1"),
  "ipAddress" : "216.58.219.238",
  "referrer" : "kay.com",
  "firstName" : "John",
  "lastName" : "Doe",
  "email" : "johndoe@gmail.com"
}

Originating IP
Demographic info
Location
Name
Sex
Email

Slide 6: Evolution of a Profile (n+1)
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "john.doe@mongodb.com",
  "twitterHandle" : "johndoe",
  "gender" : "male",
  "interests" : [
    "electronics",
    "basketball",
    "weightlifting",
    "ultimate frisbee",
    "traveling",
    "technology"
  ],
  "visitedCounts" : {
    "watches" : 3,
    "shirts" : 1,
    "sunglasses" : 1,
    "bags" : 2
  },
  "purchases" : [
    {
      "id" : 1,
      "desc" : "Power Oxford Dress Shoe",
      "category" : "Mens shoes"
    },
    {
      "id" : 2,
      "desc" : "Striped Sportshirt",
      "category" : "Mens shirts"
    }
  ],
  "persona" : "shoe-fanatic"
}
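As a rough illustration of how a field such as "persona" might end up on the profile, the sketch below turns the "visitedCounts" sub-document into a numeric vector and tags the profile with the nearest persona centroid. The persona names, the centroid values, and the helper names `to_vector` and `tag_persona` are hypothetical and not taken from the talk; in practice the centroids would come from the batch clustering step.

```python
import math

# Fixed feature order so every profile maps to a comparable vector.
CATEGORIES = ["watches", "shirts", "sunglasses", "bags"]

# Hypothetical persona centroids (assumed output of a batch clustering job).
PERSONAS = {
    "watch-collector": [5.0, 0.0, 1.0, 0.0],
    "bag-lover": [0.0, 1.0, 1.0, 5.0],
}


def to_vector(profile):
    """Turn the visitedCounts sub-document into a numeric feature vector."""
    counts = profile.get("visitedCounts", {})
    return [float(counts.get(category, 0)) for category in CATEGORIES]


def tag_persona(profile, personas=PERSONAS):
    """Return the name of the persona whose centroid is closest (Euclidean)."""
    vector = to_vector(profile)
    return min(personas, key=lambda name: math.dist(vector, personas[name]))


profile = {
    "firstName": "John",
    "visitedCounts": {"watches": 3, "shirts": 1, "sunglasses": 1, "bags": 2},
}
profile["persona"] = tag_persona(profile)
print(profile["persona"])
```

Tagging is a cheap lookup that can run on every profile update, whereas the centroids themselves only change when the clustering is recomputed.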

Slide 7: One size/document fits all?
Profile Data
Preferences
Personal information
Contact information
DOB, gender, ZIP...
Customer Data
Purchase History
Marketing History
“Session Data”
View History
Shopping Cart Data
Information Broker Data
Personalisation Data
Persona Vectors
Product and Category recommendations

Application

Batch analytics

Slide 8: Separation of Concerns
Profile Data
Preferences
Personal information
Contact information
DOB, gender, ZIP...
Customer Data
Purchase History
Marketing History
“Session Data”
View History
Shopping Cart Data
Information Broker Data
Personalisation Data
Persona Vectors
Product and Category recommendations

Batch analytics Layer
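One way to picture this separation is to split a single large profile document into smaller documents, one per concern, that all point back to the same profile. The sketch below is only an assumption about how such a split could look: the field-to-concern mapping, the `profileId` link field, and the function name `split_by_concern` are hypothetical.

```python
# Hypothetical mapping from a top-level field to the concern that owns it.
FIELD_OWNERS = {
    "firstName": "profile",
    "lastName": "profile",
    "email": "profile",
    "gender": "profile",
    "zipCode": "profile",
    "purchases": "customer",
    "visitedCounts": "session",
    "persona": "personalization",
}


def split_by_concern(doc, owners=FIELD_OWNERS, default="profile"):
    """Split one big profile document into one smaller document per concern.

    Every resulting document carries a profileId that refers back to the
    original _id, so the parts can be joined again when needed.
    """
    parts = {}
    for field, value in doc.items():
        if field == "_id":
            continue
        concern = owners.get(field, default)
        parts.setdefault(concern, {"profileId": doc["_id"]})[field] = value
    return parts


big_document = {
    "_id": "553e7dca588ac9ef066428e0",
    "firstName": "John",
    "purchases": [{"id": 1, "category": "Mens shoes"}],
    "visitedCounts": {"watches": 3},
    "persona": "shoe-fanatic",
}
for concern, part in split_by_concern(big_document).items():
    print(concern, part)
```

Each part could then live in its own collection and be owned by a different team, with the batch analytics layer writing only the personalization part.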

Slide 9: Benefits
Code does less; document and code stay focused
Split ability
Different Teams
New Languages
Defined Dependencies

Slide 10: Result
Code does less; document and code stay focused
Split ability
Different Teams
New Languages
Defined Dependencies

KISS => Keep it simple and save!

=> Clean Code <=

Robert C. Martin: https://cleancoders.com/
M. Fowler, B. Meyer, et al.: Command Query Separation

Slide 11: Analytics and Personalization
From Query to Clustering

Slide 12: Separation of Concerns
Profile Data
Preferences
Personal information
Contact information
DOB, gender, ZIP...
Customer Data
Purchase History
Marketing History
“Session Data”
View History
Shopping Cart Data
Information Broker Data
Personalisation Data
Persona Vectors
Product and Category recommendations

Batch analytics Layer

Slide 13: Separation of Concerns
Profile Data
Preferences
Personal information
Contact information
DOB, gender, ZIP...
Customer Data
Purchase History
Marketing History
“Session Data”
View History
Shopping Cart Data
Information Broker Data
Personalisation Data
Persona Vectors
Product and Category recommendations

Batch analytics Layer

Slide 14: Architecture revised

Data Processing

Slide 15: Advice for Developers
OWN YOUR DATA! (but only relevant data)
Say no! (to direct data access, i.e. DB access)

Slide 16: Data Processing

Slide 17: Hadoop in a Nutshell
An open source distributed storage and distributed batch-oriented processing framework

Hadoop Distributed File System (HDFS) to store data on commodity hardware
YARN as resource management platform
MapReduce as programming model working on top of HDFS
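To make the MapReduce programming model concrete, here is a single-process sketch of its three phases (map, shuffle, reduce) that counts purchases per category over profile documents. It only illustrates the model: Hadoop distributes these phases across a cluster, and none of the function names below belong to a Hadoop API.

```python
from collections import defaultdict


def map_phase(profile):
    """Emit one (category, 1) pair per purchase in a profile document."""
    for purchase in profile.get("purchases", []):
        yield purchase["category"], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run(profiles):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for profile in profiles for pair in map_phase(profile))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


profiles = [
    {"purchases": [{"category": "Mens shoes"}, {"category": "Mens shirts"}]},
    {"purchases": [{"category": "Mens shoes"}]},
    {},  # a profile without purchases emits nothing
]
print(run(profiles))
```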

Slide 18: Spark in a Nutshell
Spark is a top-level Apache project
Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB

Fast and general engine for large-scale data processing and analytics
Advanced DAG execution engine with support for data locality and in-memory computing

Slide 19: Flink in a Nutshell
Flink is a top-level Apache project
Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB

A distributed streaming dataflow engine
Streaming and batch
Iterative in-memory execution and handling
Cost-based optimizer

Slide 20: Latency of query operations

Slide 21: Iterative Algorithms / Clustering

Slide 22: K-Means in Pictures
Source: Wikipedia K-Means

Slide 23: K-Means as a Process
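Viewed as a process, K-means alternates between two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The following is a minimal single-machine sketch of that loop (Lloyd's algorithm), assuming Euclidean distance and caller-supplied initial centroids; it is not the Flink job shown in the demo.

```python
import math


def assign(points, centroids):
    """Assignment step: put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters


def update(clusters, old_centroids):
    """Update step: move each centroid to the mean of its cluster.

    An empty cluster keeps its previous centroid.
    """
    new_centroids = []
    for cluster, old in zip(clusters, old_centroids):
        if cluster:
            new_centroids.append(
                tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iterations=100):
    """Repeat assign and update until the centroids no longer change."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        new_centroids = update(assign(points, centroids), centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(kmeans(points, centroids=[(0, 0), (10, 0)]))
```

Every pass of the loop reads the full data set again, which is why the way an engine handles iterations matters for this kind of algorithm.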

Slide 24: Iterations in Hadoop and Spark

Slide 25: Iterations in Flink
Dedicated iteration operators
Tasks keep running for the iterations, not redeployed for each step
Caching and optimizations done automatically

Slide 26: Demo

Slide 27: Result

Slide 28: More…?

Slide 29: Takeaways
Stay focussed => Start and stay small
Evaluate with BigDocuments but do a PoC focussed on the topic
Extending functionality is easy
Aggregation, MapReduce
Hadoop Connector opens a new variety of Use Cases
Extending functionality could be challenging
Evolution is outpacing help channels
A lot of options (Spark, Flink, Storm, Hadoop…)
More than just a binary

Slide 30: Next Steps
Next Session => Hands on Spark and Watson Content!
“Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB”
RDD Examples
Try out Spark and Flink
http://bit.ly/MongoDB_Hadoop_Spark_Webinar
http://flink.apache.org/
https://github.com/mongodb/mongo-hadoop
https://github.com/m4rcsch/flink-mongodb-example
Participate and ask Questions!
@m4rcsch
marc@mongodb.com

Slide 31: Thank you!

Marc Schwering
Sr. Solutions Architect – EMEA
marc@mongodb.com
@m4rcsch
