
How cameras are watching us on the streets of Russian cities. And how to fool them

Articles about face recognition methods appear on Habré with enviable regularity. We decided not only to support this fine topic but to publish our internal document, which covers, if not all, then many approaches to face recognition, with their strengths and weaknesses. It was compiled by our engineer Andrey Gusak for young employees of the machine vision department, for educational purposes, so to speak. Today we offer it to everyone. At the end of the article there is an impressive list of references for the most curious.

So, let's begin.
Despite the wide variety of algorithms, one can single out a general structure of the face recognition process:

The general process of processing a face image during recognition

At the first stage, the face is detected and localized in the image. At the recognition stage, the face image is aligned (geometrically and photometrically), features are computed, and the recognition itself is performed: the computed features are compared with the reference templates stored in the database. The main difference between the algorithms presented here lies in how they compute the features and compare the resulting feature sets.

1. Elastic graph matching

The essence of the method comes down to elastic matching of graphs that describe face images. Faces are represented as graphs with weighted vertices and edges. At the recognition stage, one of the graphs, the reference one, remains unchanged, while the other is deformed to best fit the first. In such recognition systems the graph can be a rectangular lattice or a structure formed by characteristic (anthropometric) points of the face.

An example of a graph structure for face recognition: a) a regular lattice; b) a graph based on the anthropometric points of the face.

At the vertices of the graph, feature values are computed; most often these are the complex responses of Gabor filters or ordered sets of them, Gabor wavelets (Gabor jets), which are computed in a local neighborhood of the vertex by convolving the pixel brightness values with Gabor filters.


Set (bank, jet) of Gabor filters


An example of a convolution of a face image with two Gabor filters
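As an illustration, here is a minimal sketch, assuming OpenCV and NumPy are available, of computing such a jet of Gabor responses at one graph vertex; the kernel parameters and the file name face.png are illustrative choices, not values from the cited systems.

import cv2
import numpy as np

def gabor_jet(gray, x, y, wavelengths=(4, 8, 16), orientations=8):
    # Convolve the image with a bank of Gabor filters and sample the
    # responses at the vertex (x, y); the resulting vector is the "jet".
    jet = []
    for lambd in wavelengths:                  # wavelength of the carrier
        for k in range(orientations):
            theta = k * np.pi / orientations   # filter orientation
            kernel = cv2.getGaborKernel(
                (31, 31), sigma=lambd / 2.0, theta=theta,
                lambd=lambd, gamma=0.5, psi=0)
            response = cv2.filter2D(gray.astype(np.float32),
                                    cv2.CV_32F, kernel)
            jet.append(response[y, x])
    return np.array(jet)                       # feature vector of the vertex

gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
print(gabor_jet(gray, 64, 64)[:5])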

The edges of the graph are weighted by the distances between adjacent vertices. The difference (distance, discriminative measure) between two graphs is computed with a deformation cost function that accounts both for the difference between the feature values computed at the vertices and for the degree of deformation of the graph's edges.
The graph is deformed by shifting each of its vertices some distance in various directions from its original location and choosing the position at which the difference between the feature values (Gabor filter responses) at the vertex of the deformed graph and the corresponding vertex of the reference graph is minimal. This operation is performed in turn for all vertices of the graph until the smallest total difference between the features of the deformable graph and the reference graph is reached. The value of the deformation cost function at this position of the deformable graph is the measure of difference between the input face image and the reference graph. This "relaxation" deformation procedure must be performed for all reference faces stored in the system's database. The recognition result is the reference with the best value of the deformation cost function.


An example of deformation of a graph in the form of a regular lattice
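A toy sketch of one pass of this relaxation, under assumed simplifications: jet_fn computes the probe image's features at a point (for instance with the Gabor bank above), edges lists the neighbors of each vertex, and the weighting alpha and search radius are made-up illustration values.

import numpy as np

def relax_once(vertices, ref_vertices, ref_jets, jet_fn, edges,
               alpha=0.5, radius=2):
    # vertices, ref_vertices: (N, 2) integer point arrays;
    # ref_jets: (N, D) reference features; edges[i]: neighbors of vertex i.
    for i in range(len(vertices)):
        best_cost, best_pos = np.inf, vertices[i].copy()
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                cand = vertices[i] + [dx, dy]
                # feature term: distance to the reference jet of this vertex
                feat = np.linalg.norm(jet_fn(*cand) - ref_jets[i])
                # edge term: how much incident edges stretch vs. the reference
                deform = sum(
                    abs(np.linalg.norm(cand - vertices[j]) -
                        np.linalg.norm(ref_vertices[i] - ref_vertices[j]))
                    for j in edges[i])
                cost = feat + alpha * deform
                if cost < best_cost:
                    best_cost, best_pos = cost, cand
        vertices[i] = best_pos
    return vertices

Running such passes until the total cost stops decreasing, once per reference face, gives exactly the linear scan over the database described above.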

Some publications report 95-97% recognition accuracy even in the presence of various emotional expressions and with the face turned by up to 15 degrees. However, the developers of elastic graph matching systems cite the high computational cost of this approach. For example, comparing an input face image with 87 reference images took approximately 25 seconds on a parallel computer with 23 transputers (note: the publication is dated 1993). In other publications on the topic the time is either not given, or it is said to be long.

Disadvantages: high computational complexity of the recognition procedure; poor scalability when enrolling new references; linear dependence of running time on the size of the face database.

2. Neural networks

Currently, there are about a dozen types of neural networks (NN). One of the most widely used variants is a network based on the multilayer perceptron, which classifies the input image/signal in accordance with the network's prior tuning/training.
Neural networks are trained on a set of training examples. The essence of training comes down to adjusting the weights of the inter-neuron connections while solving an optimization problem by gradient descent. In the process of training, the key features are extracted automatically, their importance is weighed, and the relationships between them are established. It is assumed that a trained NN will be able to apply the experience gained during training to unknown images thanks to its ability to generalize.
The best results in face recognition (according to an analysis of publications) have been shown by the convolutional neural network (hereinafter CNN), which is a logical development of the ideas behind such architectures as the cognitron and the neocognitron. Its success is due to the ability to take into account the two-dimensional topology of the image, in contrast to the multilayer perceptron.
The distinctive features of the CNN are local receptive fields (providing local two-dimensional connectivity of neurons), shared weights (allowing a feature to be detected anywhere in the image), and hierarchical organization with spatial subsampling. Thanks to these innovations, the CNN is partially robust to changes in scale, shifts, rotations, changes of perspective, and other distortions.


A schematic representation of the architecture of a convolutional neural network

Testing a CNN on the ORL database, which contains face images with small variations in lighting, scale, rotation, pose, and emotion, showed 96% recognition accuracy.
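For concreteness, here is a minimal PyTorch sketch of such an architecture, with illustrative layer sizes fitted to the ORL images (112 x 92 pixels, 40 subjects), not those of any published model:

import torch
import torch.nn as nn

class SmallFaceCNN(nn.Module):
    def __init__(self, n_classes=40):                 # ORL has 40 subjects
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(),    # local receptive fields
            nn.MaxPool2d(2),                              # spatial subsampling
            nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(),   # shared weights
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 25 * 20, n_classes)

    def forward(self, x):                             # x: (N, 1, 112, 92)
        return self.classifier(self.features(x).flatten(1))

model = SmallFaceCNN()
print(model(torch.randn(1, 1, 112, 92)).shape)        # torch.Size([1, 40])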
CNNs were further developed in DeepFace, which Facebook acquired to recognize the faces of its social network's users. All architectural details are closed.


How DeepFace works

Disadvantages of neural networks: adding a new reference person to the database requires completely retraining the network on the entire available set (a rather lengthy procedure: depending on the sample size, from an hour to several days). Mathematical problems related to training: getting stuck in a local optimum, choosing an optimal optimization step, overfitting, etc. The stage of choosing a network architecture (number of neurons, layers, nature of connections) is hard to formalize. Summarizing all of the above, we can conclude that a NN is a "black box" whose results are hard to interpret.

3. Hidden Markov Models (HMM)

One of the statistical methods of face recognition is hidden Markov models (HMM) with discrete time. HMMs use the statistical properties of signals and directly take their spatial characteristics into account. The elements of the model are: a set of hidden states, a set of observable states, a matrix of transition probabilities, and the initial state probabilities. Each person has their own Markov model. When recognizing an object, the Markov models generated for the given base of objects are checked, and the maximum probability is sought that the observation sequence for this object was generated by the corresponding model.
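A minimal sketch of this scheme, assuming the open-source hmmlearn package; the strip-based observation sequence and the number of hidden states are illustrative choices:

import numpy as np
from hmmlearn import hmm

def image_to_sequence(gray, strip_height=8):
    # One observation per horizontal strip, scanned top to bottom.
    h = (gray.shape[0] // strip_height) * strip_height
    return gray[:h].reshape(-1, strip_height * gray.shape[1]).astype(float)

def train_models(faces_by_person, n_states=5):
    # One Markov model per person, as described above.
    models = {}
    for person, images in faces_by_person.items():
        seqs = [image_to_sequence(img) for img in images]
        X, lengths = np.vstack(seqs), [len(s) for s in seqs]
        models[person] = hmm.GaussianHMM(
            n_components=n_states, covariance_type="diag").fit(X, lengths)
    return models

def recognize(models, gray):
    # Pick the model with the maximum log-likelihood of the observations.
    seq = image_to_sequence(gray)
    return max(models, key=lambda person: models[person].score(seq))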
To date, no example of a commercial application of HMMs for face recognition could be found.

Disadvantages:
- it is necessary to select the parameters of the model for each database;
- HMM does not have discriminating ability, that is, the learning algorithm only maximizes the response of each image to its own model, but does not minimize the response to other models.

4. Principal component analysis (PCA)

One of the most well-known and well-developed methods is principal component analysis (PCA), based on the Karhunen-Loève transform.
Initially, principal component analysis was applied in statistics to reduce the feature space without significant loss of information. In face recognition it is mainly used to represent a face image with a low-dimensional vector (of principal components), which is then compared with the reference vectors stored in the database.
The main goal of the principal component method is to significantly reduce the dimensionality of the feature space in such a way that it describes the "typical" images belonging to the set of faces as well as possible. Using this method it is possible to identify the various sources of variability in the training set of face images and to describe this variability in a basis of several orthogonal vectors, which are called eigenfaces.

The set of eigenvectors obtained once on the training set of face images is used to encode all other face images, which are represented by a weighted combination of these eigenvectors. Using a limited number of eigenvectors, it is possible to obtain a compressed approximation of the input face image, which can then be stored in the database as a vector of coefficients, which simultaneously serves as a search key in the face database.

The essence of the principal component method is as follows. First, the entire training set of faces is converted into one common data matrix, where each row is one face image unrolled into a row. All faces in the training set must be scaled to the same size and have normalized histograms.


Transforming the training set of faces into one common matrix X

Then the data is normalized: the rows are brought to zero mean and unit variance, and the covariance matrix is computed. For this covariance matrix, the problem of finding the eigenvalues and the corresponding eigenvectors (eigenfaces) is solved. Next, the eigenvectors are sorted in descending order of their eigenvalues, and only the first k vectors are kept, by the rule that the sum of the first k eigenvalues should constitute a sufficiently large fraction of the sum of all eigenvalues.

PCA algorithm
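A compact NumPy sketch of the procedure just described; the variance threshold of 0.95 and the array sizes are illustrative:

import numpy as np

def eigenfaces(X, var_kept=0.95):
    # X: (n_faces, n_pixels), one unrolled face image per row.
    mean = X.mean(axis=0)
    Xc = X - mean                                  # zero-mean rows
    # SVD of the centered data yields the eigenvectors of the covariance
    # matrix without forming the huge n_pixels x n_pixels matrix itself.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = S ** 2 / (len(X) - 1)
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_kept)) + 1
    return mean, Vt[:k]                            # rows of Vt are eigenfaces

def project(face, mean, W):
    return W @ (face - mean)                       # coefficients = search key

X = np.random.rand(100, 64 * 64)                   # stand-in training set
mean, W = eigenfaces(X)
print(project(X[0], mean, W).shape)                # (k,)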


An example of the first ten eigenvectors (eigenfaces) obtained on the training set of faces

face ≈ 0.956 × (eigenface 1) − 1.842 × (eigenface 2) + 0.046 × (eigenface 3)

An example of constructing (synthesizing) a human face as a combination of eigenfaces and principal components


Principle of choosing a basis from the first best eigenvectors


An example of mapping a face into a three-dimensional metric space obtained from three eigenfaces, and subsequent recognition

Principal component analysis has proven itself well in practical applications. However, in cases of significant changes in illumination or facial expression, the method's effectiveness drops considerably. The point is that PCA chooses a subspace so as to approximate the input data set as closely as possible, not to discriminate between classes of faces.

As a solution to this problem, the linear Fisher discriminant was proposed (in the literature it appears as "Eigen-Fisher", "Fisherface", or LDA). LDA chooses a linear subspace that maximizes the ratio

J(W) = |W^T S_B W| / |W^T S_W W|,

where S_B is the between-class scatter matrix, S_W is the within-class scatter matrix, and m is the number of classes in the database.

LDA looks for a data projection that makes the classes as linearly separable as possible (see the figure below). For comparison, PCA looks for the data projection that maximizes the spread across the entire face database (ignoring classes). Experiments with strong side and bottom shading of face images showed Fisherface reaching 95% accuracy versus 53% for Eigenface.
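This difference is easy to see with scikit-learn, here on random stand-in data with ten hypothetical identities; note that LDA yields at most m − 1 components for m classes:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))        # stand-in face feature vectors
y = rng.integers(0, 10, size=200)      # 10 hypothetical identities

X_pca = PCA(n_components=9).fit_transform(X)       # maximizes overall spread
X_lda = LinearDiscriminantAnalysis(
    n_components=9).fit_transform(X, y)            # maximizes separability
print(X_pca.shape, X_lda.shape)                    # (200, 9) (200, 9)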


The fundamental difference between the formation of projections PCA and LDA


5. Active Appearance Models (AAM) and Active Shape Models (ASM)

Active Appearance Models (AAM)
Active Appearance Models (AAM) are statistical image models that can be fitted to a real image through various deformations. This type of 2D model was proposed by Tim Cootes and Chris Taylor in 1998. Initially, active appearance models were used to estimate the parameters of face images.
The active appearance model contains two types of parameters: parameters related to shape (shape parameters) and parameters related to the statistical pixel model of the image, or texture (appearance parameters). Before use, the model must be trained on a set of pre-labeled images. The images are labeled manually; each label has its own number and defines a characteristic point that the model will have to find when adapting to a new image.


An example of a face image labeled with 68 points forming the AAM shape.

The AAM training procedure begins by normalizing the shapes in the labeled images to compensate for differences in scale, tilt, and offset. For this, so-called generalized Procrustes analysis is used.


Coordinates of points of the face shape before and after normalization
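A small sketch, assuming SciPy, of the pairwise core of this normalization; the synthetic rotation, scale, and offset applied to the second shape are for illustration:

import numpy as np
from scipy.spatial import procrustes

shape_a = np.random.rand(68, 2)                    # 68 landmarks, as in AAM
R = np.array([[0.8, 0.3], [-0.3, 0.8]])            # rotation-plus-scale matrix
shape_b = shape_a @ R + 1.5                        # same shape, transformed
mtx1, mtx2, disparity = procrustes(shape_a, shape_b)
print(disparity)                                   # near 0: aligned shapes match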

The principal components are then extracted from the entire set of normalized points using the PCA method.


The AAM shape model consists of a triangulation lattice s0 and a linear combination of displacements si relative to s0

Next, a matrix is formed from the pixels inside the triangles defined by the shape points, such that each of its columns contains the pixel values of the corresponding texture. It is worth noting that the textures used for training can be single-channel (grayscale) or multi-channel (for example, RGB or another color space). In the case of multi-channel textures, the pixel vectors are formed separately for each channel and then concatenated. After finding the principal components of the texture matrix, the AAM is considered trained.

AAM's appearance model consists of a base view A0 defined by pixels within a base lattice s0 and a linear combination of the offsets of Ai relative to A0

An example of instantiating an AAM. The shape parameter vector p = (p_1, p_2, …, p_m)^T = (−54, 10, −9.1, …)^T is used to synthesize the shape model s, and the parameter vector λ = (λ_1, λ_2, …, λ_m)^T = (3559, 351, −256, …)^T to synthesize the model's appearance. The final face model M(W(x; p)) is obtained as a combination of the two models, shape and appearance.

Fitting the model to a specific face image is performed by solving an optimization problem whose essence is to minimize the functional measuring the discrepancy between the synthesized model and the observed image, using gradient descent. The model parameters found in this way reflect the position of the model on the specific image.
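A toy sketch of this kind of fitting with SciPy, on a synthetic one-dimensional "appearance" instead of a real image, so the optimization step is visible in a few lines; the model and its true parameters are invented for illustration:

import numpy as np
from scipy.optimize import minimize

t = np.linspace(0, 3, 50)
observed = 1.3 * np.sin(t + 0.4)                   # stand-in observed "image"

def synthesize(params):
    shift, scale = params
    return scale * np.sin(t + shift)               # stand-in model instance

def objective(params):
    # Squared difference between the synthesized model and the observation.
    return np.sum((synthesize(params) - observed) ** 2)

fit = minimize(objective, x0=[0.0, 1.0])           # gradient-based minimization
print(fit.x)                                       # approx. [0.4, 1.3]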




An example of fitting a model to a specific image in 20 iterations of the gradient descent procedure.

AAM can be used to model images of objects subject to both rigid and non-rigid deformation. It consists of a set of parameters, some of which represent the shape of the face while the rest set its texture. Deformation is usually understood as a geometric transformation composed of translation, rotation, and scaling. When localizing a face in an image, a search is performed for the AAM parameters (location, shape, texture) that represent the synthesized image closest to the observed one. Based on how close the fitted AAM is to the image, a decision is made as to whether a face is present or not.

Active Shape Models (ASM)

The essence of the ASM method is to take into account the statistical relationships between the locations of anthropometric points. A sample of face images shot in frontal view is taken, and on each image an expert marks the locations of the anthropometric points; in every image the points are numbered in the same order.




An example of a face shape representation using 68 points

To bring the coordinates in all images to a unified system, so-called generalized Procrustes analysis is usually applied, as a result of which all points are brought to the same scale and centered. Then, for the entire set of images, the mean shape and the covariance matrix are computed. From the covariance matrix, eigenvectors are computed and sorted in descending order of their corresponding eigenvalues. The ASM is defined by the matrix Φ and the mean shape vector s̄.
Any shape can then be described by the model and its parameters: s = s̄ + Φb, where b is the vector of shape parameters.
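In code, synthesizing a shape from the model is one line; a tiny NumPy sketch with made-up dimensions:

import numpy as np

n_points, k = 68, 10                       # 68 (x, y) landmarks, 10 modes
s_mean = np.random.rand(2 * n_points)      # stand-in mean shape vector
Phi = np.linalg.qr(np.random.rand(2 * n_points, k))[0]   # orthonormal modes
b = np.zeros(k)
b[0] = 3.0                                 # excite the first mode of variation
s = s_mean + Phi @ b                       # synthesized shape s = s_mean + Phi*b
print(s.shape)                             # (136,)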

Localization of the ASM model on a new image not included in the training sample is carried out in the process of solving the optimization problem.


Illustration of the process of localizing the ASM model on a specific image: a) initial position; b) after 5 iterations; c) after 10 iterations; d) the model has converged

However, the main goal of AAM and ASM is not face recognition itself but the precise localization of the face and its anthropometric points in the image for further processing.

In almost all algorithms, an obligatory step preceding classification is alignment: rotating the face image to a frontal position relative to the camera, or bringing a set of faces (for example, a training sample for a classifier) into a single coordinate system. To implement this stage, anthropometric points common to all faces must be localized in the image; most often these are the centers of the pupils or the corners of the eyes. Different researchers single out different groups of such points. To reduce computational cost in real-time systems, developers usually allocate no more than 10 such points.

The AAM and ASM models are designed to precisely localize these anthropometric points in the facial image.

6. The main problems associated with the development of face recognition systems

Illumination problem

Head position problem (face is, after all, a 3D object).

In order to assess the effectiveness of the proposed face recognition algorithms, the DARPA agency and the US Army Research Laboratory have developed the FERET (face recognition technology) program.

Algorithms based on elastic graph matching and various modifications of principal component analysis (PCA) took part in the large-scale tests of the FERET program. The efficiency of all the algorithms was approximately the same, so it is difficult or even impossible to draw a clear distinction between them (especially if the testing dates are coordinated). For frontal images taken on the same day, acceptable recognition accuracy is typically 95%. For images taken with different devices and under different lighting, accuracy usually drops to 80%. For images taken a year apart, recognition accuracy was approximately 50%. Note that even 50 percent is more than acceptable accuracy for systems of this kind.

FERET annually publishes comparative test reports of modern face recognition systems on databases of over a million faces. Unfortunately, the latest reports do not disclose how the recognition systems are built; only the results of commercial systems are published. Today the leading system is NeoFace, developed by NEC.

References (each can be found via the first search hit)
1. Image-based Face Recognition: Issues and Methods
2. Face Detection: A Survey
3. Face Recognition: A Literature Survey
4. A Survey of Face Recognition Techniques
5. A Survey of Face Detection, Extraction and Recognition
6. Overview of methods for identifying people based on facial images
7. Methods for recognizing a person by face image
8. Comparative analysis of face recognition algorithms
9. Face Recognition Techniques
10. On one approach to the localization of anthropometric points
11. Face recognition in group photos using segmentation algorithms
12. Research report, 2nd stage, on face recognition
13. Face Recognition by Elastic Bunch Graph Matching
14. Algorithms for identifying a person from a photograph based on geometric transformations (thesis)
15. Distortion Invariant Object Recognition in the Dynamic Link Architecture
16. Facial Recognition Using Active Shape Models, Local Patches and Support Vector Machines
17. Face Recognition Using Active Appearance Models
18. Active Appearance Models for Face Recognition
19. Face Alignment Using Active Shape Model and Support Vector Machine
20. Active Shape Models: Their Training and Application
21. Fisher Vector Faces in the Wild
22. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
23. Eigenfaces and Fisherfaces
24. Dimensionality Reduction
25. ICCV 2011 Tutorial on Parts-Based Deformable Registration
26. Constrained Local Model for Face Alignment: A Tutorial
27. Who Are You? Learning Person-Specific Classifiers from Video
28. Recognition of a person by facial image using neural network methods
29. Face Recognition: A Convolutional Neural Network Approach
30. Face Recognition Using Convolutional Neural Network and Simple Logistic Classifier
31. Face Image Analysis with Convolutional Neural Networks
32. Methods of face recognition based on hidden Markov processes (author's abstract)
33. Application of hidden Markov models for face recognition
34. Face Detection and Recognition Using Hidden Markov Models
35. Face Recognition with GNU Octave/MATLAB
36. Face Recognition with Python
37. Anthropometric 3D Face Recognition
38. 3D Face Recognition
39. Face Recognition Based on Fitting a 3D Morphable Model
40. Face Recognition
41. Robust Face Recognition via Sparse Representation
42. The FERET Evaluation Methodology for Face-Recognition Algorithms
43. Search for faces in electronic collections of historical photographs
44. Design, Implementation and Evaluation of Hardware Vision Systems Dedicated to Real-Time Face Recognition
45. An Introduction to the Good, the Bad, & the Ugly Face Recognition Challenge Problem
46. Research and development of methods for detecting a human face in digital images (diploma thesis)
47. DeepFace: Closing the Gap to Human-Level Performance in Face Verification
48. Taking the Bite out of Automated Naming of Characters in TV Video
49. Towards a Practical Face Recognition System: Robust Alignment and Illumination by Sparse Representation
50. Algorithms for detecting a person's face for solving applied problems of image analysis and processing
51. Face detection and localization in the image
52. Modified Viola-Jones method
53. Development and analysis of algorithms for the detection and classification of objects based on machine learning methods
54. Overview of the Face Recognition Grand Challenge
55. Face Recognition Vendor Test (FRVT)
56. On the efficiency of using the SURF algorithm in the problem of identifying persons

More than three thousand video cameras of the city's video surveillance network have been connected to the face recognition system. The video stream is analyzed automatically in real time: the system can identify a person in the video, as well as their gender and age.

The Moscow video surveillance system has been taught to recognize faces. Thanks to an algorithm based on neural networks, video from city cameras is analyzed in real time. The faces in the recordings are scanned so that they can be compared, if necessary, with information in various databases, for example the photo databases of law enforcement agencies when searching for an offender. Such an analytical system can also help law enforcement agencies, when apprehending a criminal, to reconstruct the route of his movement around the city: the system itself selects the necessary recordings from different surveillance cameras, identifying the suspect in the video. The metropolitan network consists of 160 thousand cameras and covers 95 percent of the entrances of residential buildings. By the end of the year, citizens will be able to install cameras on their homes themselves and connect them to the unified video surveillance system.

"The introduction of video analytics is a powerful driver for increasing the efficiency of both private and urban video surveillance systems. Residents of the city get an additional level of protection," said Artem Ermolaev, head of the Moscow Department of Information Technology. "Of course, all these capabilities must be implemented very responsibly. Our priority is the balance between privacy and security, and we adhere to a strict internal control policy to ensure that citizens' rights are respected."

About 16 thousand users are now connected to the city surveillance system: employees of law enforcement agencies and of state and municipal organizations. Each has their own access level, which maintains the confidentiality of information. Law enforcement officers can obtain the necessary data on request within the framework of current legislation, while employees of state institutions get access only to cameras in the territories and routes for which they are responsible. Every access to the surveillance system is logged.

The face recognition function works online; the identification process takes a few seconds. If the algorithm detects a person whose face has been uploaded to the database, it sends an alert to law enforcement agencies.

The Department also noted that the introduction of face recognition has already increased the efficiency of investigating offenses and searching for criminals. During pilot tests, more than 50 percent of the lawbreakers flagged by the analytical algorithms were identified and detained. Some of them had evaded capture for years.

Muscovites will be able to connect their own surveillance cameras to the citywide network. This option will be implemented before the end of the year. Video from such cameras will be transmitted to the unified data storage and processing center (ECDC), and their recordings can be used as legally significant evidence in court.

This year, more than 3.5 thousand additional cameras were connected to the unified data storage and processing center. The unified system now includes entrance cameras and cameras installed in and around schools and kindergartens, at MCC stations, stadiums, public transport stops and bus stations, and in parks. In addition, video surveillance cameras will appear in 25 underground pedestrian crossings in the capital by June 2018. The recording devices will be installed in underpasses not connected to metro stations and under the jurisdiction of GBU Gormost.

Perhaps there is no other technology today surrounded by so many myths, lies, and incompetence. Journalists talking about the technology lie, politicians talking about successful deployments lie, and most vendors lie. Every month I see the consequences of people trying to bolt face recognition onto systems that cannot work with it.

The topic of this article became painful long ago, but I was somehow too lazy to write it up: a lot of text that I have already repeated twenty times to different people. But after reading yet another pile of trash, I finally decided it was time. From now on I will just give a link to this article.

So. In this article, I will answer a few simple questions:

Where do you think the creators of the algorithms got these databases?

A small hint. The first product of what is now NTech is FindFace, a search for people on VKontakte. I think no explanation is needed. Of course, VKontakte fights the bots that scrape all the open profiles. But as far as I have heard, people keep scraping. And Odnoklassniki. And Instagram.

It's like that with Facebook too, only everything is more complicated there. But I'm pretty sure they've come up with something as well.
So yes, if your profile is open, you can be proud: it was used to train the algorithms ;)

About solutions and about companies

You can be proud of this: of the five leading companies in the world, two are now Russian, N-Tech and VisionLabs. Half a year ago the leaders were NTech and Vocord; the former worked much better on turned faces, the latter on frontal ones.

Now the rest of the leaders are one or two Chinese companies and one American one, while Vocord has slipped somewhat in the ratings.

Also Russian in the rating are ITMO, 3DiVi, and IntelliVision. Synesis is a Belarusian company, although part of it was once in Moscow; about three years ago they had a blog on Habré. I also know of several solutions that belong to foreign companies but whose development offices are in Russia. There are also several Russian companies that do not take part in the competition but seem to have good solutions, for example ЦРТ. Obviously, Odnoklassniki and VKontakte also have good ones of their own, but those are for internal use.

In short, yes: it is mostly we and the Chinese who are crazy about faces.

NTech was the first in the world to show good numbers at a new level, sometime at the end of 2015. VisionLabs has only now caught up with NTech. In 2015 they were the market leader, but their solution belonged to the previous generation, and they began trying to catch up with NTech only at the end of 2016.

To be honest, I don't like either of these companies: very aggressive marketing. I have seen people who were sold a clearly inappropriate solution that did not solve their problems.

In this respect I liked Vocord much more. I once consulted some guys to whom Vocord said, very honestly: "Your project will not work with such cameras and installation points." NTech and VisionLabs happily tried to make the sale. But Vocord has somehow gone quiet lately.

Conclusions

In conclusion, I would like to say the following. Face recognition is a very good and powerful tool. It really does make it possible to find criminals today. But its deployment requires a very careful analysis of all the parameters. There are applications where an open-source solution is enough. There are applications (recognition in a crowd at stadiums) where only VisionLabs or NTech will do, and where you must also maintain a team for support, analysis, and decision-making. Open source will not help you there.

Today one should not believe the tales that it is possible to catch all criminals or to watch everyone in a city. But it is important to remember that such things can help catch criminals: for example, stopping in the metro not everyone in a row, but only those whom the system considers similar to wanted persons, and placing cameras so that faces are better recognized, with the appropriate infrastructure built around them. Although I personally am against this: the cost of a mistake, if you are recognized as someone else, may be too high.


Complex passwords, two-factor authentication, fingerprint scanners: these are all ways of protecting user data. In the past few years, smartphone manufacturers have begun actively promoting a new trend, automatic face recognition systems. Let's figure out where they came from, how they work, and why they are needed.

A bit of history

The first closed experiments on computer recognition of human faces began in the 1960s. The main problems for scientists then were the inability of computers to handle different expressions and age-related changes in a person's face, as well as the low automation of the process. Research moved to a new level at the end of the 20th century: computers began to be taught to "recognize" people in photographs from several angles and to ignore beards, mustaches, makeup, and other "interference". This process continues to this day; there is no system in the world that works in 100% of cases and provides perfect recognition accuracy. Nevertheless, at the beginning of the 21st century the technology took a step forward, and a new face identification method based on three-dimensional scanning appeared. It is the one we will focus on today.

How face recognition systems work on smartphones

Face recognition on modern gadgets, like any other biometric user identification process, can be roughly divided into four stages (a toy sketch of the matching step follows the list):

  1. Initial face scan. Using a special sensor or camera, the system performs a three-dimensional face scan and processes the information received.
  2. Extracting unique data and creating a template based on it. At this stage, the system determines a set of features of a particular face: the contours of the eye socket, the width of the nose and the shape of the cheekbones.
  3. Matching the finished template with new input, for example, another person's face.
  4. Search for matches. The system decides whether the set of features of the new sample matches the ready-made template and performs a specific action. In our case, it will unlock the screen or leave it locked.
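Here is that sketch of stages 3 and 4: comparing a stored template against a new descriptor by cosine similarity. The 128-dimensional vectors and the 0.7 threshold are stand-ins for whatever a particular device actually computes:

import numpy as np

THRESHOLD = 0.7                                    # hypothetical acceptance bar

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unlock(template, descriptor):
    # Stage 4: does the new sample match the enrolled template?
    return cosine_similarity(template, descriptor) >= THRESHOLD

enrolled = np.random.rand(128)                     # stage 2: stored template
probe = enrolled + 0.05 * np.random.rand(128)      # stage 3: new scan
print("unlocked" if unlock(enrolled, probe) else "stays locked")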

Disadvantages

On modern smartphones, scanning takes less than a second. However, the 3D sensor cannot yet completely replace other methods of user identification, such as the fingerprint scanner. There are several reasons:

  • the system is unstable in low-light conditions;
  • it does not cope well with varying facial expressions, hairstyles, facial hair, and other obstructions;
  • the system does not always accurately compare the template with new input data, which means the device can sometimes be unlocked with a photo of the owner.

Where are face recognition systems used?

Previously, face recognition and identification systems were used exclusively by law enforcement agencies, at airports, and at customs. In recent years the focus has shifted toward personal computers, smartphones, and wearable devices, where face scanners serve as an additional tool for user authentication. Thus, the Galaxy S8 presented in March is equipped with a 3D sensor that can unlock the device. Notably, to make payments or work with confidential folders, users still have to use a more reliable method of biometric verification: a fingerprint.

Another application area for face recognition technology is identifying people in photographs. This feature works in Google Photos albums and in the Photos app on iPhone and Mac. In the latter, the system recognizes the people in pictures that the user adds to the library and then lets you attach names and contact details, making the pictures easier to find.

Once Apple finishes refining Siri, we will be able to open the photos we need without touching the device and share them on social networks, call old friends we spotted in an album of university photos, or ask the assistant to show how our appearance has changed over the past five years. And that is just what comes to mind first.

Face recognition in Russia

Where and why they want to apply it

Public events

NtechLab has developed a camera system for public events. It recognizes intruders and sends their photos to the police. The police will also have hand-held cameras to photograph suspicious people, recognize their faces, and find out from databases who they are.

Cameras with face recognition are being tested in the Moscow metro. They scan the faces of 20 people per second and check them against databases of wanted persons. If there is a match, the cameras send the data to the police. The system has been checking against the wanted list for 2.5 months. Such cameras are known to exist, but they may have been installed at other stations as well.

Otkritie Bank launched a face recognition system at the beginning of 2017. It compares a visitor's face with a photograph in its database. The system is meant to serve customers faster, though exactly how is not specified. In the future Otkritie wants to use the system for remote identification. In 2018, a similar system, but developed by Rostelecom, is expected to appear.

The main thing is the algorithm

What technology allows machines to recognize faces

Sergey Milyaev

Computer vision is a set of algorithms that extract high-level information from images and video, thereby automating some aspects of human visual perception. For a machine, computer vision, like ordinary vision for a person, is a means of measuring and obtaining semantic information about the observed scene. With its help, the machine learns how large an object is, what shape it is, and what it is.

An OpenCV computer vision algorithm monitors children on a playground

Everything works on the basis of neural networks

How exactly face recognition works, with an example

Sergey Milyaev: Machines do this most effectively with machine learning, that is, when they make decisions based on a parametric model rather than on program code that explicitly spells out all the decision rules. For example, for face recognition a neural network extracts features from the image and produces a unique representation of each person's face that is unaffected by the orientation of the head in space, the presence or absence of a beard or makeup, lighting, age-related changes, and so on.
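The same idea in a few lines, using the open-source face_recognition package rather than VisionLabs' proprietary stack; the image file names are placeholders:

import face_recognition

known = face_recognition.load_image_file("known_person.jpg")
frame = face_recognition.load_image_file("camera_frame.jpg")

known_vec = face_recognition.face_encodings(known)[0]   # 128-D face descriptor
for vec in face_recognition.face_encodings(frame):      # every face in frame
    same = face_recognition.compare_faces([known_vec], vec, tolerance=0.6)[0]
    print("match" if same else "different person")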

Computer vision does not reproduce the human visual system, but only simulates some aspects to solve various problems

Sergey Milyaev

Lead Researcher, VisionLabs

The most common computer vision algorithms are now based on neural networks, which, with the growth in processor performance and data volumes, have demonstrated high potential for solving a wide range of problems. Each fragment of the image is analyzed with filters whose parameters the neural network tunes to search for characteristic features of the image.

Example

The layers of the neural network process the image sequentially; each subsequent layer computes increasingly abstract features, and the filters in the last layers can see the entire image. When recognizing faces, on the first layers the neural network detects simple features such as edges and facial contours; on deeper layers the filters can pick out more complex features: for example, two circles next to each other most likely mean eyes, and so on.

OpenCV's computer vision algorithm determines how many fingers are shown

The computer knows when it is being cheated

Can a person fool a very smart computer, three examples

Oleg Grinchuk

VisionLabs Lead Researcher

Fraudsters can try to either impersonate another person in order to gain access to his accounts and data, or deceive the system so that it cannot recognize them in principle. Let's consider both options.

Another person's photo, video, or printed mask

The VisionLabs platform combats these methods of deception by checking for liveness, that is, it checks that the object in front of the camera is alive. This can be, for example, interactive liveness, when the system asks a person to smile, blink, or bring the camera or smartphone closer to the face.

The set of checks cannot be predicted, since the platform generates a random sequence from tens of thousands of combinations; it is unrealistic to record thousands of videos with the required combinations of smiles and other emotions. And if the camera is equipped with near-infrared sensors or a depth sensor, they transmit additional information to the system, which helps determine from a single frame whether the person in front of it is real.

In addition, the system analyzes the reflection of light from different textures, as well as the environment of the object. So it is almost impossible to deceive the system in this way.

Makeup

In this case, in order to reproduce a likeness sufficient to gain access, the fraudster would need access to the source code and, guided by the system's reactions to changes in appearance, gradually adjust the makeup until becoming an exact copy of the other person.

An attacker would need to crack the very logic and principle of the verification. But to an outside user it is just a camera, a black box; looking at it, one cannot tell which version of the check is inside. Moreover, the verification factors differ from case to case, so no universal algorithm can be used for a break-in.

If several recognition errors occur, the system sends a warning signal to the server, after which the attacker's access is blocked. So even in the unlikely event of having access to the code, it is hard to break the system, since an attacker cannot endlessly change his appearance until recognition succeeds.

Big sunglasses, a cap, a scarf, or covering the face with a hand

The system will not be able to recognize a person if most of the face is hidden, even though a neural network recognizes faces much better than a person does. But to hide completely from a face recognition system, a person must hide their face from the cameras at all times, and that is quite difficult in practice.

Computer vision is superior to human vision

What exactly and why, with an example

Yuri Minkin

Computer vision systems are similar in their basic principles to human vision. Like a person, they have devices responsible for collecting information (video cameras, an analogue of the eyes) and for processing it (a computer, an analogue of the brain). But computer vision has a significant advantage over human vision.

A person has a certain threshold on what he can see and what information he can extract from an image. This threshold cannot be exceeded for purely physiological reasons. Computer vision algorithms, however, will only keep getting better: they have endless opportunities to learn.

Yuri Minkin

Head of Department, Cognitive Technologies

A good example is computer vision technology in self-driving cars. While one person can pass on his knowledge of traffic situations to only a small, strictly limited number of people, cars can transfer all their accumulated experience of detecting particular objects at once to all the new systems installed on a fleet of thousands or even a million vehicles.

Example

At the end of last year, Cognitive Technologies specialists ran experiments comparing the abilities of humans and artificial intelligence to detect road scene objects. Even now, the AI in some cases not only matched but exceeded human capabilities: for example, it was better at recognizing road signs partially obscured by tree foliage.

Can a computer testify against a person

Sergey Izrailit: At present, the use of data "obtained from computers" as evidence of legally significant circumstances, including offenses, is specifically regulated only for certain cases. For example, the use of cameras that recognize the license plates of cars violating the speed limit is regulated.

In general, such data can be used on an equal footing with any other evidence, which the investigation or the court can either take into account or reject. At the same time, procedural legislation establishes a general procedure for working with evidence: an examination that determines whether the submitted recording really confirms certain facts or whether the information has been distorted in some way.