Face Recognition: From Scratch to Hatch (presentation)

Slide 2: Face Recognition in Cloud@Mail.ru
Users upload photos to Cloud
Backend identifies persons on photos, tags them and shows clusters

Slide 3: Social networks

Slide 5: edges
object parts (combination of edges)
object models

Slide 7: Face detection

Slide 8: Auxiliary task: facial landmarks
Face alignment: rotation
Goal: make it easier for Face Recognition

Slide 9: Train Datasets
WIDER FACE
32k images
494k faces

CelebA
200k images, 10k persons
Landmarks, 40 binary attributes

Slide 10: Test Dataset: FDDB
Face Detection Data Set and Benchmark
2845 images
5171 faces

Slide 11: Old school: Viola-Jones
Haar Feature-based Cascade Classifiers

Slide 12: Viola-Jones algorithm: training
Face or Not


Slide 13: Viola-Jones algorithm: inference
Stages
Stage 1 -> Yes -> Stage 2 -> Yes -> ... -> Stage N -> Face
Optimization
Features are grouped into stages
If a patch fails any stage => discard
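
The early-discard rule above can be sketched in a few lines. This is a minimal illustration, not the OpenCV implementation: the stages here are hypothetical threshold tests on precomputed feature values, and `make_threshold_stage` and `run_cascade` are names invented for this example.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is any function that looks at a patch's feature values and says pass/fail.
Stage = Callable[[Sequence[float]], bool]


def make_threshold_stage(feature_index: int, threshold: float) -> Stage:
    """Hypothetical stage: passes if one feature value exceeds a threshold."""
    def stage(features: Sequence[float]) -> bool:
        return features[feature_index] > threshold
    return stage


def run_cascade(features: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (is_face, stages_evaluated), stopping at the first failed stage."""
    for evaluated, stage in enumerate(stages, start=1):
        if not stage(features):
            return False, evaluated  # early discard: later stages never run
    return True, len(stages)


stages = [
    make_threshold_stage(0, 0.5),
    make_threshold_stage(1, 0.2),
    make_threshold_stage(2, 0.9),
]
print(run_cascade([0.4, 0.3, 0.95], stages))  # rejected by the first stage
print(run_cascade([0.6, 0.3, 0.95], stages))  # passes all stages
```

Most patches in a photo are background, so rejecting them after one or two cheap stages is what keeps the detector fast on CPU.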

Slide 14: Viola-Jones results
OpenCV implementation
Fast: ~100ms on CPU
Not accurate

Slide 15: New school: Region-based Convolutional Networks
Faster RCNN, algorithm
Pre-trained network: extracting features
Region proposal network
RoI-pooling: extract corresponding tensor
Classifier: classes and the bounding box (Face?)

Slide 16: Comparison: Viola-Jones vs R-FCN
Results (FDDB)
92% accuracy (R-FCN)
40ms on GPU (slow)

Slide 17: Face detection: how fast
We need a faster solution at the same accuracy!
Target: < 10ms



Slide 18: Alternative: MTCNN
Cascade of 3 CNNs
Resize to different scales
Proposal -> candidates + b-boxes
Refine -> calibration
Output -> b-boxes + landmarks
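
The "resize to different scales" step can be sketched as follows. The defaults are assumptions taken from common public MTCNN implementations (12x12 proposal-network input, 20 px minimum face, scale factor 0.709), not values confirmed by the slides; `pyramid_scales` is a name invented for this example.

```python
from typing import List


def pyramid_scales(width: int, height: int, min_face_size: int = 20,
                   factor: float = 0.709, net_input: int = 12) -> List[float]:
    """List the resize factors at which the proposal network would be run.

    The first scale maps a face of min_face_size pixels onto the network input;
    each next scale shrinks the image further until its shorter side would no
    longer cover one network input window.
    """
    scale = net_input / min_face_size
    min_side = min(width, height) * scale
    scales = []
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


for s in pyramid_scales(640, 480):
    print(f"scale {s:.3f} -> {round(640 * s)}x{round(480 * s)}")
```

Each scale produces an image of a different size, which is why plain MTCNN is awkward to run in fixed-size batches.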


Slide 19: Comparison: MTCNN vs R-FCN
MTCNN
+ Faster
+ Landmarks
- Less accurate
- No batch processing

Slide 21: What is TensorRT
NVIDIA TensorRT is a high-performance deep learning inference optimizer

Features
Improves performance for complex networks
FP16 & INT8 support
Effective at small batch sizes

Slide 22: TensorRT: layer optimizations
Horizontal fusion
Concat elision
Vertical layer fusion

Slide 23: TensorRT: downsides
Only Caffe and TensorFlow models supported
Fixed input/batch size
Only basic layers supported


Slide 24: Batch processing
Problem
Image size is fixed, but MTCNN works at different scales
Solution
Pyramid on a single image
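
One way to put a pyramid on a single image is to paste every scaled copy side by side on one canvas and remember where each copy sits. The slides do not describe the actual layout, so the left-to-right packing below is only an assumption; `pack_pyramid` and `to_original` are names invented for this example.

```python
from typing import List, Tuple

Placement = Tuple[int, int, int, int]  # x offset, y offset, width, height


def pack_pyramid(width: int, height: int,
                 scales: List[float]) -> Tuple[Tuple[int, int], List[Placement]]:
    """Place each scaled copy of the image left to right on one canvas."""
    placements = []
    x = 0
    canvas_height = 0
    for s in scales:
        w = max(1, int(width * s))
        h = max(1, int(height * s))
        placements.append((x, 0, w, h))
        x += w
        canvas_height = max(canvas_height, h)
    return (x, canvas_height), placements


def to_original(cx: float, cy: float, placement: Placement,
                scale: float) -> Tuple[float, float]:
    """Map a point found on the canvas back to original image coordinates."""
    px, py, _, _ = placement
    return (cx - px) / scale, (cy - py) / scale


canvas, levels = pack_pyramid(100, 40, [1.0, 0.5, 0.25])
print(canvas, levels)
print(to_original(110, 10, levels[1], 0.5))
```

With a layout like this the network sees one fixed-size input per photo, so all scales are processed in a single run and photos can be batched.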



Slide 25: Batch processing
Results
Single run
Enables batch processing

Slide 26: TensorRT: layers
Problem
No PReLU layer => default pre-trained model can’t be used
Retrained with ReLU from scratch
-20%
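
For reference, the two activations differ only on negative inputs: PReLU scales them by a learned slope, ReLU zeroes them. A minimal scalar sketch (the slope value 0.25 is an arbitrary illustration):

```python
def relu(x: float) -> float:
    """ReLU: negative inputs become zero."""
    return max(0.0, x)


def prelu(x: float, slope: float) -> float:
    """PReLU: negative inputs are scaled by a learned slope instead of zeroed."""
    return x if x > 0 else slope * x


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), prelu(value, 0.25))
```

Because the pre-trained weights were learned with non-zero negative slopes, swapping in ReLU changes the function the network computes, which is why retraining was needed.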


Slide 27: Face detection: inference
Target: < 10 ms
Result: 8.8 ms

Ingredients
MTCNN
Batch processing
TensorRT

Slide 29: Face recognition task
Goal: to compare faces
How? Learn a metric
Enables zero-shot learning
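
Once a metric is learned, comparing two faces reduces to comparing their embedding vectors, and this works for people never seen in training. A minimal sketch, assuming cosine similarity as the metric and a hypothetical decision threshold of 0.5 (a real threshold would be tuned on a validation set):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Decide identity by thresholding similarity; no per-person classifier needed."""
    return cosine_similarity(a, b) >= threshold


print(same_person([1.0, 0.1], [0.9, 0.2]))
print(same_person([1.0, 0.0], [0.0, 1.0]))
```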

Slide 30: Training set: MSCeleb
Top 100k celebrities
10 million images, 100 per person
Noisy: constructed by leveraging public search engines

Slide 31: Small test dataset: LFW
Labeled Faces in the Wild
13k images from the web
1680 persons have >= 2 photos

Slide 32: Large test dataset: Megaface
Identification under up to 1 million “distractors”
530 people to find

Slide 33: Megaface leaderboard
~83%
~98% (cleaned)

Slide 34: Metric Learning

Slide 35: Classification
Train CNN to predict classes
Pray for good latent space

Slide 36: Softmax
Learned features are only separable, not discriminative
The resulting features are not sufficiently effective

Slide 37: We need metric learning
Tightness of the cluster
Discriminative features


Slide 38: Triplet loss
Features
Identity -> single point
Enforces a margin between persons
positive + α < negative
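
The inequality above becomes a loss by penalizing any triplet that violates it. A minimal sketch, assuming squared Euclidean distances and a margin α = 0.2 (both as in the FaceNet paper; the talk's own values are not given):

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Zero when positive + margin < negative, otherwise the size of the violation."""
    d_positive = squared_distance(anchor, positive)
    d_negative = squared_distance(anchor, negative)
    return max(0.0, d_positive - d_negative + margin)


anchor, positive = [0.0, 0.0], [0.1, 0.0]
print(triplet_loss(anchor, positive, [1.0, 0.0]))  # far negative: no penalty
print(triplet_loss(anchor, positive, [0.3, 0.0]))  # close negative: penalized
```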

Slide 39: Choosing triplets
Crucial problem
How to choose triplets? Useful triplets = hardest errors

Solution
Hard-mining within a large mini-batch (>1000)

Slide 40: Choosing triplets: trap

Slide 41: Choosing triplets: trap
positive ~ negative

Slide 42: Choosing triplets: trap
Instead

Slide 43: Choosing triplets: trap
Selecting the hardest negative may lead to collapse early in training

Slide 44: Choosing triplets: semi-hard
positive < negative < positive + α
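
The semi-hard condition can be applied as a filter over the negatives in a mini-batch. The sketch below assumes distances to the anchor are already computed; picking the closest negative inside the band is a choice made for this example, and `pick_semi_hard_negative` is an invented name.

```python
from typing import Optional, Sequence


def pick_semi_hard_negative(d_positive: float, negative_distances: Sequence[float],
                            margin: float = 0.2) -> Optional[int]:
    """Return the index of a negative with positive < negative < positive + margin.

    Among the candidates the closest one is chosen; None means the batch
    holds no semi-hard negative for this anchor-positive pair.
    """
    candidates = [
        (distance, index)
        for index, distance in enumerate(negative_distances)
        if d_positive < distance < d_positive + margin
    ]
    if not candidates:
        return None
    return min(candidates)[1]


print(pick_semi_hard_negative(0.5, [0.3, 0.55, 0.65, 0.9]))
print(pick_semi_hard_negative(0.5, [0.3, 0.9]))
```

Negatives closer than the positive (0.3 here) are skipped on purpose: these are the hardest cases that can drive the embedding to collapse early in training.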


Slide 45: Triplet loss: summary
Overview
Requires large batches, margin tuning
Slow convergence

Open-source code
OpenFace (Torch): suboptimal implementation
FaceNet (TensorFlow): not original

Slide 46: Center loss
Idea: pull points to class centroids

Slide 47: Center loss: structure
Without classification loss it collapses
Softmax Loss
Center Loss
Final loss = Softmax loss + λ Center loss
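
The combined objective can be sketched as below. The center term is half the squared distance from each feature to its class center; averaging over the batch and the λ value in the example are assumptions, and in real training the centers are updated alongside the network rather than fixed.

```python
from typing import Dict, Sequence


def center_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                centers: Dict[int, Sequence[float]]) -> float:
    """Mean over the batch of 0.5 * ||feature - center_of_its_class||^2."""
    total = 0.0
    for feature, label in zip(features, labels):
        center = centers[label]
        total += 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))
    return total / len(features)


def final_loss(softmax_loss: float, center_loss_value: float, lam: float) -> float:
    """Final loss = Softmax loss + lambda * Center loss."""
    return softmax_loss + lam * center_loss_value


features = [[1.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}
c = center_loss(features, labels, centers)
print(c, final_loss(2.0, c, lam=0.1))
```

If the softmax term were dropped, moving every feature and every center to the same point would give zero loss, which is the collapse the slide warns about.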

Slide 48: Center Loss: different lambdas
λ = 10^-7

Slide 49: Center Loss: different lambdas
λ = 10^-6

Slide 50: Center Loss: different lambdas
λ = 10^-5

Slide 51: Center loss: summary
Overview
Intra-class compactness and inter-class separability
Good performance at several other tasks

Open-source code
Caffe (original, Megaface - 65%)


Slide 52: Tricks: augmentation
Test time augmentation
Flip image
Compute 2 embeddings
Average embeddings
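
The flip trick can be sketched as follows. The image is a nested list of pixel rows, and `toy_embed` is a stand-in invented for this example; in practice `embed` would be the face recognition network.

```python
from typing import Callable, List

Image = List[List[float]]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def tta_embedding(image: Image, embed: Callable[[Image], List[float]]) -> List[float]:
    """Embed the image and its mirror, then average the two embeddings."""
    original = embed(image)
    mirrored = embed(flip_horizontal(image))
    return [(a + b) / 2 for a, b in zip(original, mirrored)]


def toy_embed(image: Image) -> List[float]:
    """Hypothetical embedding: sums of the leftmost and rightmost columns."""
    return [sum(row[0] for row in image), sum(row[-1] for row in image)]


print(tta_embedding([[1.0, 2.0], [3.0, 4.0]], toy_embed))
```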


Slide 53: Tricks: alignment
Rotation
Kabsch algorithm: finds the optimal rotation matrix that minimizes the RMSD
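
For 2D landmarks the optimal rotation has a closed form, which makes a compact illustration. The general Kabsch algorithm uses an SVD of the covariance matrix; the sketch below is only the planar special case, and the landmark coordinates are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def optimal_rotation_angle(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Angle (radians) of the rotation that best maps centered source onto centered target."""
    cs, ct = centroid(source), centroid(target)
    cross = dot = 0.0
    for (sx, sy), (tx, ty) in zip(source, target):
        px, py = sx - cs[0], sy - cs[1]
        qx, qy = tx - ct[0], ty - ct[1]
        cross += px * qy - py * qx
        dot += px * qx + py * qy
    return math.atan2(cross, dot)


def rotate_and_shift(points: Sequence[Point], angle: float, shift: Point) -> List[Point]:
    """Helper for the demo: rotate points about the origin, then translate them."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


landmarks = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
tilted = rotate_and_shift(landmarks, 0.5, (3.0, -1.0))
print(optimal_rotation_angle(landmarks, tilted))
```

Rotating the detected face by the negative of this angle brings its landmarks in line with the reference layout before the recognition network sees it.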

Slide 54: Angular Softmax
On sphere
Angle discriminates
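
In A-Softmax (SphereFace) the target-class logit uses a function ψ(θ) in place of cos θ, where θ is the angle between the feature and the class weight and the integer m sets the angular margin. A minimal sketch of that function as defined in the SphereFace paper, with m = 4 chosen only as an example default:

```python
import math


def psi(theta: float, m: int = 4) -> float:
    """psi(theta) = (-1)^k * cos(m * theta) - 2k for theta in [k*pi/m, (k+1)*pi/m].

    It equals cos(theta) when m = 1 and decreases monotonically from 1 to
    -(2m - 1) over [0, pi], so a larger m demands a smaller angle to the
    correct class.
    """
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k


for degrees in (0, 30, 60, 90, 180):
    theta = math.radians(degrees)
    print(degrees, round(math.cos(theta), 3), round(psi(theta), 3))
```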


Slide 55: Angular Softmax

Slide 56: Angular Softmax: different «m»

Slide 57: Angular softmax: summary
Overview
Works only on small datasets
Slight modification of the loss yields 74.2%
Various modifications of the loss function

Slide 58: Metric learning: summary
Softmax < Triplet < Center < A-Softmax

A-Softmax
With bells and whistles better than center loss

Overall
Rule of thumb: use Center loss
Metric learning may improve classification performance

Slide 59: Fighting errors

Slide 60: Errors after MSCeleb: children
Problem
Children all look alike

Consequence
Average embedding ~ single point in the space

Slide 61: Errors after MSCeleb: Asian faces
Problem
Face recognition performs poorly on Asian faces

Reason
Dataset doesn’t contain enough photos of these categories

Slide 62: How to fix these errors?
It’s all about data, we need a diverse dataset!
Natural choice: avatars of social networks


Slide 63: A way to construct dataset
Cleaning algorithm
Face detection
Face recognition -> embeddings
Hierarchical clustering algorithm
Pick the largest cluster as a person
Iterate after each model improvement
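
The clustering and "largest cluster" steps can be sketched as below. The slides do not say which linkage or threshold was used; this example assumes single linkage cut at a fixed distance (implemented with union-find, which yields the same groups as cutting a single-linkage dendrogram at that height) and a made-up threshold.

```python
import math
from collections import Counter
from typing import List, Sequence


def cluster_single_linkage(embeddings: Sequence[Sequence[float]],
                           threshold: float) -> List[int]:
    """Label points so that any two closer than the threshold share a cluster."""
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if math.dist(embeddings[i], embeddings[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(embeddings))]


def largest_cluster(embeddings: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of the photos kept as 'the person'; the rest are treated as noise."""
    labels = cluster_single_linkage(embeddings, threshold)
    if not labels:
        return []
    top_label, _ = Counter(labels).most_common(1)[0]
    return [i for i, label in enumerate(labels) if label == top_label]


photos = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(largest_cluster(photos, threshold=0.15))
```

A better recognition model gives cleaner embeddings, hence cleaner clusters, which is why the procedure is repeated after each model improvement.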


Slide 64: MSCeleb dataset’s errors
MSCeleb is constructed by leveraging search engines
A public confrontation between Joe Eszterhas and Mel Gibson leads to the error: Joe Eszterhas = Mel Gibson

Slide 65: MSCeleb dataset’s errors
Female + Male

Slide 66: MSCeleb dataset’s errors
Asia
Mix

Slide 67: MSCeleb dataset’s errors
Dataset has been shrunk from 100k to 46k celebrities
Random
Search engine

Slide 68: Results on new datasets
Datasets
Train:
MSCeleb (46k)
VK-train (200k)

Test:
MegaVK
Sets for children and Asians

Slide 69: How to handle big dataset
It seems we can add more data infinitely, but no.

Problems
Memory consumption (Softmax)
Computational costs
A lot of noise in gradients


Slide 70: Softmax Approximation
Algorithm
Perform K-Means clustering using current FR model

Slide 71: Softmax Approximation
Algorithm
Perform K-Means clustering using current FR model
Two Softmax heads:
Predicts cluster label
Class within the true cluster
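
With two heads, the probability of a class factorizes into P(cluster) times P(class | cluster), so the second softmax only needs the logits of classes inside the true cluster. A minimal sketch of the resulting loss for one sample; the function names and the way logits are passed in are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_loss(cluster_logits: Sequence[float],
                   class_logits_in_true_cluster: Sequence[float],
                   true_cluster: int, true_class_in_cluster: int) -> float:
    """Negative log-likelihood: -log P(cluster) - log P(class | cluster)."""
    p_cluster = softmax(cluster_logits)[true_cluster]
    p_class = softmax(class_logits_in_true_cluster)[true_class_in_cluster]
    return -math.log(p_cluster) - math.log(p_class)


# 2 clusters with 4 classes each: uniform logits give log(2) + log(4) = log(8)
print(two_level_loss([0.0, 0.0], [0.0, 0.0, 0.0, 0.0], 0, 2), math.log(8))
```

Since the second head competes only against classes from the same cluster, which K-Means grouped because they look alike, every step compares a sample against its most confusable identities.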


Slide 72: Softmax Approximation
Pros
Prevents fusing of the clusters
Does hard-negative mining
Clusters can be specified
Children
Asian
Results
Doesn’t improve accuracy
Decreases memory consumption (K times)

Slide 73: Fighting errors on production

Slide 74: Errors: blur
Problem
Detector yields blurry photos
Recognition forms «blurry clusters»

Solution
Laplacian: 2nd order derivative of the image
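
A common way to turn the Laplacian into a blur score is to take the variance of its response: sharp images have strong edges and a high variance, blurry ones a low variance. A minimal sketch on a grayscale image stored as nested lists, using the 4-neighbour Laplacian kernel; the decision threshold is a hypothetical value that would need tuning on real data.

```python
from typing import List

Image = List[List[float]]


def laplacian_variance(image: Image) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(image: Image, threshold: float = 100.0) -> bool:
    """Hypothetical rule: flag the face crop when the score falls below the threshold."""
    return laplacian_variance(image) < threshold


flat = [[10.0] * 4 for _ in range(4)]
checkerboard = [[255.0 * ((x + y) % 2) for x in range(4)] for y in range(4)]
print(laplacian_variance(flat), laplacian_variance(checkerboard))
```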

Slide 75: Laplacian in action
Low variance
High variance

Slide 76: Errors: body parts
Detection mistakes form clusters

Slide 77: Errors: diagrams & mushrooms

Slide 78: Fixing trash clusters
There is similarity between “no faces”!

Slide 79: Workaround
Algorithm
Construct «trash» dataset
Compute average embedding
Every point inside the sphere is trash

Results
ROC AUC 97%
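
The workaround amounts to a distance test against the mean embedding of known non-faces. A minimal sketch, assuming Euclidean distance; the radius is a hypothetical value that would be chosen from a ROC curve on labeled data.

```python
import math
from typing import List, Sequence


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise average of the «trash» embeddings: the sphere center."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def is_trash(embedding: Sequence[float], center: Sequence[float], radius: float) -> bool:
    """A detection is discarded when its embedding falls inside the sphere."""
    return math.dist(embedding, center) <= radius


trash_set = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = mean_embedding(trash_set)
print(center)
print(is_trash((1.2, 1.1), center, radius=1.5))
print(is_trash((5.0, 5.0), center, radius=1.5))
```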


Slide 80: Spectacular results

Slide 81: Fun: new governors
Recently appointed governors are almost twins, but FR distinguishes them

Slide 82: Over years
Face recognition algorithm captures similarity across years
Although we didn’t focus on the problem

Slide 83: Over years

Slide 84: Summary
Use TensorRT to speed up inference
Metric learning: use Center loss by default
Clean your data thoroughly
Understanding CNN helps to fight errors

Slide 86: Auxiliary

Slide 87: Best avatar
Problem
How to pick an avatar for a person?
Solution
Train model to predict awesomeness of photo

Slide 88: Predicting awesomeness: how to approach
Social networks: not only photos, but likes too

Slide 89: Predicting awesomeness: dataset
Awesomeness (A) = likes/audience
A=18%
A=27%
A=75%

Slide 90: Predicting awesomeness: summary
Results
Mean Average Precision @5: 25%
Data and metric are noisy => human evaluation

Slide 91: Predicting awesomeness: incorporating into FR
One more branch in Face Recognition CNN
Small overhead


