
Real-time body contour recognition. Computer vision with OpenCV

Evgeny Borisov Monday, 24 July 2017

This article provides an overview of methods for finding an object in an image.

1. Introduction

Many practical tasks, from automating production control to designing self-driving vehicles, are directly related to the problem of finding objects in an image. It can be approached with two different strategies, depending on the shooting conditions: background modeling and object modeling.
  1. Background modeling - this approach can be used if the camera is stationary, i.e. we have a background that changes little, so we can build a model of it. All image points that deviate significantly from the background model are considered foreground objects. In this way it is possible to solve the problems of object detection and tracking.
  2. Object modeling - this approach is more general and is used when the background changes constantly and significantly. Unlike the previous case, here we need to know exactly what we want to find, i.e. we must build a model of the object and then check points of the image for compliance with this model.
Sometimes the conditions of the problem allow both approaches to be combined, which can significantly improve the results. A solution to the background-modeling problem for a stationary camera can be found in [1]; a minimal sketch of the idea is given below. In what follows, we consider the second strategy, i.e. modeling the object we are searching for.
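As an illustration of the first strategy, here is a minimal Python sketch (not from the referenced article) of background subtraction with OpenCV's MOG2 model; the video path and parameter values are placeholders.

import cv2

# Build a background model for a stationary camera and extract foreground objects.
# "video.mp4" is a placeholder path; history/varThreshold are illustrative values.
cap = cv2.VideoCapture("video.mp4")
backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = backsub.apply(frame)        # pixels that deviate from the background model
    fg_mask = cv2.medianBlur(fg_mask, 5)  # suppress isolated noise pixels
    cv2.imshow("Foreground mask", fg_mask)
    if cv2.waitKey(1) == 27:              # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()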

2. Overview of methods

In this section, we present a list of approaches that can be used to successfully solve the problem of finding an object in an image, in order of increasing complexity.
  1. Color filters - if the object stands out significantly from the background in color, then you can choose the appropriate filter.
  2. Selection and analysis of contours - if we know that the object has the shape of, for example, a circle, then we can look for circles in the image.
  3. Pattern matching - we have an image of an object, we are looking for areas in another image that match this image of an object.
  4. Working with feature points - in the picture with the object, we look for feature points (for example, corners), which we then try to match with the same kind of features in another image.
  5. Machine learning methods - we train the classifier on pictures with an object, in some way divide the image into parts, check each part with the classifier for the presence of an object.
We'll look at these methods in more detail below.

3. Color filters

The color filter method can be used when the object differs significantly from the background in color and the lighting is uniform and does not change. You can read more about the color filter method in [2].
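A minimal sketch of the idea, assuming an illustrative HSV range (here roughly red) and a placeholder file name:

import cv2
import numpy as np

# Keep only the pixels whose color falls inside a given HSV range.
img = cv2.imread("image.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

lower = np.array([0, 120, 70])     # lower HSV bound (illustrative)
upper = np.array([10, 255, 255])   # upper HSV bound (illustrative)
mask = cv2.inRange(hsv, lower, upper)

result = cv2.bitwise_and(img, img, mask=mask)  # object pixels only
cv2.imshow("Mask", mask)
cv2.imshow("Filtered", result)
cv2.waitKey(0)
cv2.destroyAllWindows()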

4. Extraction and analysis of contours

If the object does not stand out significantly against the background in color and/or has a complex coloring, then color filters will not give good results. In this case you can try the method of contour extraction and analysis. To do this, we detect edges in the image. Edges are places where the brightness changes abruptly (the gradient is large), and they can be found using the Canny method. Next, we can check the detected edges for compliance with the geometric contours of the object; this can be done with the Hough transform, for example by searching for circles among the edges.





Fig. 4: finding circles

This method can also be used in conjunction with color filters. You can read more about contour extraction and analysis in [3]. The source code for the circle search example is available for download.
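A rough sketch of the circle search (not the downloadable example itself); the file name and Hough parameters are illustrative:

import cv2
import numpy as np

img = cv2.imread("coins.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)   # reduce noise before edge detection

# The Hough circle transform (param1 is the upper threshold of the internal Canny detector)
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
                           param1=100, param2=50, minRadius=10, maxRadius=100)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)

cv2.imshow("Circles", img)
cv2.waitKey(0)
cv2.destroyAllWindows()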

5. Pattern matching

If the image has a lot of small details, edge analysis can be difficult. In this case you can use the template matching method. It works as follows: we take a picture of the object (Fig. 5) and look for regions in the larger image that match this image of the object (Fig. 6, 7).


Fig 5: object to search

A more detailed lecture on the template matching method is available [4]. The source code for the example is available for download.
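A minimal template-matching sketch, assuming placeholder file names for the scene and the object template:

import cv2

scene = cv2.imread("scene.jpg")
template = cv2.imread("object.jpg")
h, w = template.shape[:2]

# Normalized cross-correlation: the best match has the highest score
res = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(scene, top_left, bottom_right, (0, 255, 0), 2)

cv2.imshow("Match", scene)
cv2.waitKey(0)
cv2.destroyAllWindows()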

6. Working with special points

The template matching method described in the previous section looks for exact matches between template pixels and image pixels. If the object in the image is rotated or scaled relative to the template, the method does not work well. To overcome these limitations, methods based on so-called feature points are used; we consider them below.

A feature point (keypoint) is a small region that stands out noticeably in the image. There are several methods for detecting such points: they can be corners (the Harris corner detector [5]) or blobs [6], i.e. small regions of roughly uniform brightness with a fairly clear border that stand out against the general background. For a feature point, a so-called descriptor is computed - a characteristic of the point. The descriptor is computed from a given neighborhood of the feature point, as the directions of the brightness gradients in different parts of that neighborhood. There are several methods for computing descriptors: SIFT, SURF, ORB, etc. Note that some descriptor methods are patented (for example, SIFT), so their commercial use is restricted. A more detailed lecture on feature points and methods of working with them is available [7].

Feature points can be used to find an object in an image. To do this, we need an image of the desired object, and then we perform the following steps.
  1. In the image of the object, find the feature points and compute their descriptors.
  2. In the analyzed image, likewise find feature points and compute their descriptors.
  3. Compare the descriptors of the object's feature points with the descriptors of the feature points found in the image.
  4. If a sufficient number of matches is found, mark the region with the corresponding points.
Figure 8 below shows the results of searching for an object by feature points.


Fig 8: object detector based on feature points

The source code for the example can be downloaded.
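A minimal sketch of this pipeline using ORB descriptors (a free alternative to the patented SIFT/SURF); the file names and the number of drawn matches are illustrative:

import cv2

obj = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(obj, None)    # feature points + descriptors of the object
kp2, des2 = orb.detectAndCompute(scene, None)  # feature points + descriptors of the scene

# Brute-force matching of binary descriptors by Hamming distance
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Draw the best matches; a sufficient number of good matches suggests the object is present
result = cv2.drawMatches(obj, kp1, scene, kp2, matches[:30], None, flags=2)
cv2.imshow("Matches", result)
cv2.waitKey(0)
cv2.destroyAllWindows()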

7. Machine learning methods

The method of searching for objects by matching sets of feature points has its drawbacks, one of which is poor generalization. If the task is, for example, to detect people's faces in a photo, this method will look for the one specific face from which the feature points were extracted; other faces will be detected much worse, because they most likely produce different sets of feature points. The results can be even worse if the camera angle changes. To solve these problems we need machine learning methods and, instead of a single picture of the object, whole training sets of hundreds (and in some cases hundreds of thousands) of different images of the object under different conditions. We will look at the application of machine learning methods to finding objects in an image in the second part of this article.

Literature

  1. E.S. Borisov. Object detector for fixed cameras.
    - http://site/cv-backgr.html
  2. E.S. Borisov. Video processing: an object detector based on color filters.
    - http://site/cv-detector-color.html
  3. E.S. Borisov. Basic methods of image processing.
    - http://site/cv-base.html
  4. Anton Konushin. Computer vision (2011). Lecture 3. Simple methods of image analysis. Template matching.
    - http://www.youtube.com/watch?v=TE99wDbRrUI
  5. OpenCV documentation: Harris Corner Detection
    - http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
  6. Wikipedia: Blob detection
    - http://en.wikipedia.org/wiki/Blob_detection
  7. Anton Konushin. Computer vision (2011). Lecture 5. Local features
    - http://www.youtube.com/watch?v=vFseUICis-s

OpenCV is an open-source computer vision and machine learning library. It includes more than 2,500 algorithms, both classical and modern, for computer vision and machine learning. The library has interfaces for several languages, including Python (which we use in this article), Java, C++ and MATLAB.

Installation

Installation instructions for Windows and Linux are available in the official OpenCV documentation.

Importing and viewing an image

import cv2

image = cv2.imread("./path/to/image.extension")
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Note: the image read by the method above is not in the RGB color space (as everyone is used to) but in BGR. This may not matter much at first, but once you start working with color it is worth knowing about this feature. There are two solutions:

  1. Swap the 1st channel (R - red) with the 3rd channel (B - blue); then red will be (0, 0, 255) rather than (255, 0, 0).
  2. Change the color space to RGB: rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

     And then work with rgb_image instead of image in the code.

Note: press any key to close the window in which the image is displayed. Using the window's close button may cause freezes.

Throughout the article, the following code will be used to display images:

import cv2

def viewImage(image, name_of_window):
    cv2.namedWindow(name_of_window, cv2.WINDOW_NORMAL)
    cv2.imshow(name_of_window, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Cropping

Doggie after cropping

import cv2

# Cropping by NumPy slicing: image[y1:y2, x1:x2]; the values below are illustrative
cropped = image[10:500, 500:2000]
viewImage(cropped, "Doggie after cropping")

Here image[y1:y2, x1:x2] is a slice of the image: the pixel rows from y1 to y2 and the columns from x1 to x2.

Resizing

After resizing by 20%

import cv2

scale_percent = 20  # percentage of the original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
viewImage(resized, "After resizing by 20%")

This snippet preserves the aspect ratio of the original image, because the width and height are scaled by the same factor. Other resizing options are described in the OpenCV documentation.

Rotation

Dog after turning 180 degrees

import cv2

(h, w, d) = image.shape
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, 180, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
viewImage(rotated, "Doggie after 180 degrees rotation")

image.shape returns the height, width and number of channels. M is the rotation matrix - it rotates the image 180 degrees around the center. A negative angle rotates the image clockwise and a positive angle counterclockwise.

Grayscale and black-and-white thresholding

Grayscale doggie

Black and white doggie

import cv2

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, threshold_image = cv2.threshold(gray_image, 127, 255, 0)
viewImage(gray_image, "Grayscale doggie")
viewImage(threshold_image, "Black and white doggie")

gray_image is the single-channel version of the image.

The threshold function returns an image in which all pixels that are darker (less than) 127 are replaced by 0, and all pixels that are brighter (greater than) 127 are replaced by 255.

For clarity, another example:

ret, threshold = cv2.threshold(gray_image, 150, 200, cv2.THRESH_BINARY)

Here everything brighter than 150 is replaced by 200 and everything darker by 0 (the second argument is the threshold, the third is the value assigned to pixels above it, and the last selects the threshold type).

The remaining threshold types are described in the OpenCV documentation.

Blurring / smoothing

Blurry doggie

import cv2

blurred = cv2.GaussianBlur(image, (51, 51), 0)
viewImage(blurred, "Blurred doggie")

The GaussianBlur function takes 3 parameters:

  1. Original image.
  2. A tuple of 2 positive odd numbers. The higher the numbers, the greater the strength of the smoothing.
  3. sigmaX and sigmaY. If these parameters are left equal to 0, their values are calculated automatically from the kernel size.

Drawing rectangles

Draw a rectangle around the dog's face

import cv2

output = image.copy()
cv2.rectangle(output, (2600, 800), (4100, 2400), (0, 255, 255), 10)
viewImage(output, "Draw a rectangle around the dog's face")

This function takes 5 parameters:

  1. The image itself.
  2. Top-left corner coordinate (x1, y1).
  3. Bottom-right corner coordinate (x2, y2).
  4. Rectangle color (BGR by default, or RGB if you have converted the color space).
  5. Line width of the rectangle.

Line drawing

2 dogs separated by a line

import cv2

output = image.copy()
cv2.line(output, (60, 20), (400, 200), (0, 0, 255), 5)
viewImage(output, "2 dogs separated by a line")

The line function takes 5 parameters:

  1. The actual image on which the line is drawn.
  2. The coordinate of the first point (x1, y1).
  3. The coordinate of the second point (x2, y2).
  4. Line color (BGR by default, or RGB if you have converted the color space).
  5. Line width.

Text on image

Image with text

import cv2

output = image.copy()
cv2.putText(output, "We <3 Dogs", (1500, 3600), cv2.FONT_HERSHEY_SIMPLEX, 15, (30, 105, 210), 40)
viewImage(output, "Image with text")

The putText function takes 7 parameters:

  1. The image itself.
  2. The text for the image.
  3. The coordinate of the bottom-left corner where the text starts (x, y).
  4. The font type.
  5. The font scale.
  6. The text color (BGR by default, or RGB if you have converted the color space).
  7. The line thickness of the text.

Face detection

Faces detected: 2

import cv2

image_path = "./path/to/photo.extension"
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(10, 10))

faces_detected = "Faces detected: " + format(len(faces))
print(faces_detected)

# Draw squares around the faces
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 0), 2)

viewImage(image, faces_detected)

detectMultiScale is a general function for recognizing both faces and objects. To make the function search specifically for faces, we pass it the appropriate cascade.

The detectMultiScale function takes 4 parameters:

  1. The processed image in grayscale.
  2. The scaleFactor parameter. Some faces may be larger than others because they are closer to the camera; this parameter compensates for perspective.
  3. The recognition algorithm uses a sliding window. The minNeighbors parameter sets how many neighboring detections are required around a candidate region before it is accepted as a face: the larger the value, the more overlapping detections are needed. Too small a value increases the number of false positives, while too large a value makes the detector overly strict.
  4. minSize is the minimum size of a detected face region.

Contours - object recognition

Object recognition is performed using color segmentation of the image. There are two functions for this: cv2.findContours and cv2.drawContours.

A separate article describes object detection using color segmentation in detail; everything you need for it is there.
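A rough sketch of the two functions in action, assuming a simple threshold-based segmentation and a placeholder file name (it reuses the viewImage helper defined earlier):

import cv2

img = cv2.imread("image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# findContours returns (contours, hierarchy) in OpenCV 2.4/4.x
# and (image, contours, hierarchy) in OpenCV 3.x
result = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = result[0] if len(result) == 2 else result[1]

cv2.drawContours(img, contours, -1, (0, 255, 0), 2)
viewImage(img, "Contours")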

Saving an image

import cv2

image = cv2.imread("./import/path.extension")
cv2.imwrite("./export/path.extension", image)

Conclusion

OpenCV is an excellent library with lightweight algorithms that can be used in 3D rendering, advanced image and video editing, tracking and identifying objects and people in videos, finding identical images in a set, and much, much more.

This library is very important for anyone developing projects related to machine learning on images.

This article will show you how to create a Python script to count the number of books in an image using OpenCV.

What will we do?

Let's take a look at the image in which we will search for books:

We can see that there are four books in the image, as well as distracting items such as a coffee mug, a Starbucks cup, a few magnets, and a piece of candy.

Our goal is to find the four books in the image without identifying any other item as a book.

What libraries do we need?

To write a system for searching and discovering books on images, we will use OpenCV for computer vision and image processing. We also need to install NumPy for OpenCV to work correctly. Make sure you have these libraries installed!

Searching for books on images with Python and OpenCV

Translator's note: you may notice that the source code in our article differs from the code in the original. The author probably installed the required libraries from system repositories; we suggest using pip, which is much easier. To avoid errors, we recommend using the version of the code given in our article.

Open up your favorite code editor, create a new file called find_books.py and start:

# -*- coding: utf-8 -*-

# import the required packages
import numpy as np
import cv2

# load the image, convert it to grayscale and blur it
image = cv2.imread("example.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (3, 3), 0)
cv2.imwrite("gray.jpg", gray)

Let's start by importing the OpenCV library. Loading an image from disk is handled by the cv2.imread function. Here we simply load the image and then convert it from color (BGR) to grayscale.

We also blur the image a little to reduce high frequency noise and improve the accuracy of our application. After executing the code, the image should look like this:

We loaded an image from disk, converted it to grayscale and blurred it a bit.

Now let's define the edges (i.e. outlines) of objects in the image:

# edge detection
edged = cv2.Canny(gray, 10, 250)
cv2.imwrite("edged.jpg", edged)

Our image now looks like this:

We found the outlines of objects in the image. However, as you can see, some of the contours are not closed - there are gaps in them. To remove the gaps between the white pixels of the image, we will use the morphological "closing" operation:

# construct and apply a closing kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
closed = cv2.morphologyEx(edged, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("closed.jpg", closed)

Now the gaps in the outlines are closed:

The next step is to actually detect the outlines of objects in the image. For this we will use the cv2.findContours function:

# find the contours in the image and initialize the book counter
# (findContours returns (contours, hierarchy) in OpenCV 2.4/4.x and
#  (image, contours, hierarchy) in OpenCV 3.x)
cnts = cv2.findContours(closed.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
total = 0

Consider the geometry of the book.

The book is a rectangle. The rectangle has four vertices. Therefore, if we look at the outline and find that it has four vertices, then we can assume that this is a book, and not another object in the image.

To check whether a contour is a book or not, we need to loop over each contour:

# loop over the contours
for c in cnts:
    # approximate (smooth) the contour
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)

    # if the contour has 4 vertices, we assume it is a book
    if len(approx) == 4:
        cv2.drawContours(image, [approx], -1, (0, 255, 0), 4)
        total += 1

For each of the contours, we compute the perimeter using cv2.arcLength and then approximate (smooth) the contour using cv2.approxPolyDP.

The reason we approximate the contour is that it may not be a perfect rectangle. Due to noise and shadows in the photo, it is unlikely that a book will have exactly 4 vertices. By approximating the contour, we solve this problem.

Finally, we check that the contour to be approximated does indeed have four vertices. If so, then we draw an outline around the book and then increment the counter for the total number of books.

Let's wrap up this example by showing the resulting image and the number of books found:

# show the resulting image and the number of books found
print("I found {0} books in this picture".format(total))
cv2.imwrite("output.jpg", image)

At this stage, our image will look like this:

Let's summarize

In this article, you learned how to find books in images using simple image processing and computer vision techniques with Python and OpenCV.

Our approach was to:

  1. Load an image from disk and convert it to grayscale.
  2. Blur the image a little.
  3. Apply the Canny edge detector to find the edges of objects in the image.
  4. Close any gaps in the contours.
  5. Find the contours of the objects in the image.
  6. Apply contour approximation to determine whether a contour is a rectangle and therefore a book.

You can download the source code of the script and the image that is used in this article.

The main idea is to take into account the statistical relationships between the location of the anthropometric points of the face. On each face image, the points are numbered in the same order. Comparison of faces is carried out according to their relative position.

To compare faces, the same face position relative to the camera should be used; the frontal (full-face) position is the most suitable for this.

Capturing a video stream from a camera and highlighting a face

#include <opencv2/opencv.hpp>

using namespace cv;

int main()
{
    // Load the face cascade (.xml file)
    CascadeClassifier face_cascade;
    face_cascade.load("haarcascade_frontalface_alt2.xml");

    Mat img;
    VideoCapture cap(0);

    while (true)
    {
        cap >> img;
        // cvtColor(img, img, CV_BGR2GRAY);

        // Detect faces
        std::vector<Rect> faces;
        face_cascade.detectMultiScale(img, faces, 1.1, 2, 0 | CV_HAAR_SCALE_IMAGE, Size(30, 30));

        // Draw ellipses on the detected faces
        for (int i = 0; i < faces.size(); i++)
        {
            Point center(faces[i].x + faces[i].width * 0.5, faces[i].y + faces[i].height * 0.5);
            ellipse(img, center, Size(faces[i].width * 0.5, faces[i].height * 0.5),
                    0, 0, 360, Scalar(255, 0, 255), 4, 8, 0);
        }

        imshow("Detected Face", img);
        waitKey(1);
    }
    return 0;
}

The cascade files are located in the C:\opencv\build\etc\ directory. Place the required cascade in the project directory, next to the main.cpp source file.

Highlighting facial points

The application is based on the C++ code for OpenCV Facemark.

#include <opencv2/opencv.hpp>
#include <opencv2/face.hpp>
#include "drawLandmarks.hpp"

using namespace std;
using namespace cv;
using namespace cv::face;

int main(int argc, char** argv)
{
    // Load Face Detector
    CascadeClassifier faceDetector("haarcascade_frontalface_alt2.xml");

    // Create an instance of Facemark
    Ptr<Facemark> facemark = FacemarkLBF::create();

    // Load landmark detector
    facemark->loadModel("lbfmodel.yaml");

    // Set up webcam for video capture
    VideoCapture cam(0);

    // Variable to store a video frame and its grayscale
    Mat frame, gray;

    // Read a frame
    while (cam.read(frame))
    {
        // Find faces
        vector<Rect> faces;

        // Convert frame to grayscale because faceDetector requires grayscale image
        cvtColor(frame, gray, COLOR_BGR2GRAY);

        // Detect faces
        faceDetector.detectMultiScale(gray, faces);

        // Variable for landmarks.
        // Landmarks for one face is a vector of points.
        // There can be more than one face in the image, hence we use a vector of vectors of points.
        vector< vector<Point2f> > landmarks;

        // Run landmark detector
        bool success = facemark->fit(frame, faces, landmarks);

        if (success)
        {
            // If successful, render the face rectangles and landmarks
            for (size_t i = 0; i < faces.size(); i++)
            {
                cv::rectangle(frame, faces[i], Scalar(0, 255, 0), 3);
            }
            for (size_t i = 0; i < landmarks.size(); i++)
            {
                drawLandmarks(frame, landmarks[i]);
                /*for (size_t j = 0; j < landmarks[i].size(); j++)
                    circle(frame, Point(landmarks[i][j].x, landmarks[i][j].y), 1, Scalar(255, 0, 0), 2);*/
            }
        }

        // Display results
        imshow("Facial Landmark Detection", frame);

        // Exit loop if ESC is pressed
        if (waitKey(1) == 27)
            break;
    }
    return 0;
}

In the application project, the files haarcascade_frontalface_alt2.xml, drawLandmarks.hpp and lbfmodel.yaml referenced in the code are placed next to main.cpp. The cascade files are located in the C:\opencv\build\etc\ directory; drawLandmarks.hpp and lbfmodel.yaml are available in the Facemark_LBF.rar archive.

After pasting the code, errors appeared because OpenCV 3.4.3-vc14-vc15 lacks a number of libraries required to run the application. I attached my own build of the library (download opencv_new.zip) and installed it in the root of the C drive (C:\opencv-new).

Now, all the settings that were performed must be done for opencv-new:

Making the settings in Windows. I go to the "Edit environment variable" window (Windows button -> System Tools -> Control Panel -> System and Security -> System -> Advanced System Settings -> Environment Variables -> Path -> Edit). In this window I add the path C:\opencv-new\x64\vc14\bin. Restart Windows.

In project properties also refer to the opencv_new library (instead of opencv). In the "Property Pages" window I do the following:

  • C/C++ -> General -> Additional Include Directories -> C:\opencv-new\include
  • Linker -> General -> Additional Library Directories -> C:\opencv-new\x64\vc14\lib
  • Linker -> Input -> Additional Dependencies -> opencv_core400.lib; opencv_face400.lib; opencv_videoio400.lib; opencv_objdetect400.lib; opencv_imgproc400.lib; opencv_highgui400.lib

At startup, the program generates an error in the Debug configuration; in Release, it launches successfully.


Selecting features for image filtering and face recognition

The dotted frame of the face is displayed in different ways depending on objective and subjective factors.

Objective factors are the position of the face relative to the camera.

Subjective factors - uneven or poor lighting, facial distortion due to emotions, squinting eyes, etc. In these cases, the wireframe may not be correct, the points may even be torn off the face:

Sometimes such images are skipped during video capture. They need to be filtered out - both during training and recognition.

Some of the points are the most stable and informative. They are rigidly attached to the face, regardless of its position relative to the camera. In addition, they characterize well the specifics of the face. These points can be used as a basis for modeling the feature system.

You can use a 2D point wireframe of the same face position to compare faces. What is the most informative position of your face relative to the camera? Obviously frontal. It is not for nothing that forensic science takes a full face and profile photo. For now, we will restrict ourselves to the full face.

All features (distances) must be dimensionless (normalized), i.e. related to some reference size (distance). I believe the most suitable size for this is the distance between the midpoints of the corner points of the eyes. Why not, for example, the outer corner points of the eyes, which are actually present in the landmarks array? The fact is that the corner points of the eyes move apart (or closer together) when reacting to changes in lighting, an expression of surprise, blinking, etc. The distance between the centers of the eyes neutralizes these fluctuations and is therefore preferable.

What sign will we take as a basis in the first approximation? I assume the distance from the top of the bridge of the nose to the bottom of the chin. Judging by the photo, this sign may differ significantly for different persons.
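A standalone Python sketch of this feature (the author's own implementation is in C++ below); the landmark indices follow the 68-point scheme used later in the text: 27 is the bridge of the nose, 8 is the bottom of the chin, 36/39 and 42/45 are the eye corners.

import numpy as np

def chin_nose_feature(points):
    # points: array of 68 (x, y) landmark coordinates
    pts = np.asarray(points, dtype=float)
    left_eye = (pts[45] + pts[42]) / 2.0    # midpoint of the left eye
    right_eye = (pts[39] + pts[36]) / 2.0   # midpoint of the right eye
    eye_dist = np.linalg.norm(right_eye - left_eye)
    # distance from the bridge of the nose to the chin, normalized by the eye distance
    return np.linalg.norm(pts[27] - pts[8]) / eye_dist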

So, before forming features for training and comparison, it is necessary to filter out the point frames of faces obtained by video capture, which, for subjective or objective reasons, are not the correct frontal image of the face (full face).

We leave only those point wireframes that follow the following criteria:

  • The straight line that goes through the extreme points of the eyes (line of the eyes) is perpendicular to the line that goes through the extreme points of the nose (line of the nose).
  • The line of the eyes is parallel to the straight line that passes through the corners of the mouth (mouth line).
  • The above points are symmetrical about the nose line.
  • The corner points of the eyes (external and internal) are on the same straight line.

An example of frontal images that show all signs:

An example of images that are filtered:

Try to determine for yourself which of the criteria these images fail.

How are the features that provide filtering and face recognition formalized? Basically, they are based on conditions on the distances between points and on conditions of parallelism and perpendicularity. The task of formalizing such criteria is discussed below.

Algorithm for face recognition by 2D-wireframe of points

The coordinates of the points of the face wireframe are initially set in the coordinate system, which is anchored to the upper left point of the window. In this case, the Y axis is directed downward.

For the convenience of identifying features, we use a custom coordinate system (UCS), the X axis of which passes through the segment between the midpoints of the eyes, and the Y axis is perpendicular to this segment through its middle in the upward direction. UCS coordinates (from -1 to +1) are normalized - correlated with the distance between the midpoints of the eyes.

UCS provides convenience and simplicity in identifying features. For example, the position of the face in frontal view is determined by the sign of symmetry of the corresponding points of the eyes relative to the line of the nose. This feature is formalized by the coincidence of the nose line with the Y axis, i.e. X1 = X2 = 0, where X1 and X2 are the coordinates of the extreme points of the nose (27 and 30) in the UCS.

Determining the UCS relative to the window coordinate system

The coordinates of the midpoints of the left and right eyes (Left and Right):

XL = (X45 + X42) / 2; YL = (Y45 + Y42) / 2; XR = (X39 + X36) / 2; YR = (Y39 + Y36) / 2;

Start of UCS:

X0 = (XL + XR) / 2; Y0 = (YL + YR) / 2;

Distances between the midpoints of the eyes along the X and Y axes:

DX = XR - XL; DY = YR - YL;

The actual distance L between the midpoints of the eyes (according to the Pythagorean theorem):

L = sqrt (DX ** 2 + DY ** 2)

Trigonometric functions of the UCS rotation angle:

sin_AL = DY / L; cos_AL = DX / L

To go from coordinates in the window CS to coordinates in the UCS, we use the parameters X0, Y0, L, sin_AL and cos_AL:

X_User_0 = 2 (X_Window - X0) / L;

Y_User_0 = - 2 (Y_Window - Y0) / L;

X_User = X_User_0 * cos_AL - Y_User_0 * sin_AL;

Y_User = X_User_0 * sin_AL + Y_User_0 * cos_AL;
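A minimal Python sketch of this transform (the article's own implementation is in C++ below); landmark indices 36, 39, 42 and 45 are the eye-corner points used in the formulas above:

import numpy as np

def to_ucs(points):
    # points: array of 68 (x, y) landmark coordinates in the window CS (x right, y down)
    pts = np.asarray(points, dtype=float)

    left = (pts[45] + pts[42]) / 2.0    # midpoint of the left eye
    right = (pts[39] + pts[36]) / 2.0   # midpoint of the right eye

    origin = (left + right) / 2.0       # X0, Y0
    dx, dy = right - left               # DX, DY
    L = np.hypot(dx, dy)                # distance between the eye midpoints

    sin_al, cos_al = dy / L, dx / L     # rotation of the eye line

    # translate, normalize and flip the Y axis (the UCS Y axis points up)
    x0 = 2 * (pts[:, 0] - origin[0]) / L
    y0 = -2 * (pts[:, 1] - origin[1]) / L

    # rotate into the UCS
    x = x0 * cos_al - y0 * sin_al
    y = x0 * sin_al + y0 * cos_al
    return np.stack([x, y], axis=1)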

Image filtering is implemented by checking the criteria one after another:

1. Perpendicularity of the nose and eye lines, and symmetry of the corner points of the eyes. The nose line is defined by points 27 and 30 (see the figure). Both criteria are satisfied if the UCS coordinates of these points are X1 = X2 = 0 (i.e. the nose line coincides with the Y axis).

2. Parallelism of the eye line and the mouth line. The mouth line is defined by points 48 and 54 (see the figure). The criterion is satisfied if, in the UCS, Y1 - Y2 = 0.

3. Symmetry of the corner points of the mouth. The mouth line is defined by points 48 and 54 (see the figure). The criterion is satisfied if, in the UCS, X1 + X2 = 0.

4. "The corner points of the eyes lie on one straight line". The lines are defined by the pairs of points (36, 45) and (39, 42). Since check 1 has already passed, it is enough to verify the condition Y2 - Y1 = 0 in the UCS only for points 36 and 39.

Absolute equality to zero is impossible, so the criteria are compared against a suitably small tolerance.
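These checks can be sketched in Python on top of the to_ucs helper above; the tolerance eps is an assumed value:

def is_frontal(points, eps=0.1):
    u = to_ucs(points)

    # 1. Nose line (points 27 and 30) coincides with the Y axis: X27 = X30 = 0
    if abs(u[27][0]) > eps or abs(u[30][0]) > eps:
        return False
    # 2. Mouth line (points 48 and 54) is parallel to the eye line: Y48 - Y54 = 0
    if abs(u[48][1] - u[54][1]) > eps:
        return False
    # 3. Mouth corners are symmetric about the nose line: X48 + X54 = 0
    if abs(u[48][0] + u[54][0]) > eps:
        return False
    # 4. Eye corner points lie on one straight line: Y36 - Y39 = 0 (given that check 1 passed)
    if abs(u[36][1] - u[39][1]) > eps:
        return False
    return True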

The program for comparing faces by one characteristic

The distance between the points of the bridge of the nose and the chin is taken as a sign (Landmarks points 27 and 8, see figure c). The feature, normalized, is defined in the UCS by the ratio: (Y1 - Y2) / L, where L is the distance between the centers of the eyes. When training the program, the sign for a specific person is determined by the number that is displayed next to the tracked person (this part of the code is commented out in the program). During recognition, the value of a feature is compared with a specific feature entered into the program for each person. If the comparison is positive, its identifier appears next to the face.

The program also recognizes a photo in which I am 15 years younger, and even with a mustache. The difference is significant and not everyone would catch it, but a computer program cannot be fooled.

Control tasks:

  1. Get familiar with the program.
  2. Determine the meaning of the feature for your face and several of your colleagues.
  3. Test the program for the identification of individuals (your own and colleagues).

#include <opencv2/opencv.hpp>
#include <opencv2/face.hpp>
#include "drawLandmarks.hpp"

using namespace std;
using namespace cv;
using namespace cv::face;

int main(int argc, char** argv)
{
    // Load Face Detector
    CascadeClassifier faceDetector("haarcascade_frontalface_alt2.xml");

    // Create an instance of Facemark
    Ptr<Facemark> facemark = FacemarkLBF::create();

    // Load landmark detector
    facemark->loadModel("lbfmodel.yaml");

    // Set up webcam for video capture
    VideoCapture cam(0);

    // Variable to store a video frame and its grayscale
    Mat frame, gray;

    // Read a frame
    while (cam.read(frame))
    {
        // Find faces
        vector<Rect> faces;

        // Convert frame to grayscale because faceDetector requires grayscale image
        cvtColor(frame, gray, COLOR_BGR2GRAY);

        // Detect faces
        faceDetector.detectMultiScale(gray, faces);

        // Landmarks for one face is a vector of points;
        // there can be more than one face, hence a vector of vectors of points
        vector< vector<Point2f> > landmarks;

        // Run landmark detector
        bool success = facemark->fit(frame, faces, landmarks);

        if (success)
        {
            // If successful, render the face rectangles and landmarks
            for (size_t i = 0; i < faces.size(); i++)
            {
                cv::rectangle(frame, faces[i], Scalar(0, 255, 0), 3);
            }
            for (size_t i = 0; i < landmarks.size(); i++)
            {
                //if ((i >= 30) && (i <= 35))
                drawLandmarks(frame, landmarks[i]);
                for (size_t j = 0; j < landmarks[i].size(); j++)
                {
                    circle(frame, Point(landmarks[i][j].x, landmarks[i][j].y), 1, Scalar(255, 0, 0), 2);
                }

                // Nose line: points 27 (bridge of the nose) and 30
                line(frame, Point(landmarks[i][27].x, landmarks[i][27].y),
                     Point(landmarks[i][30].x, landmarks[i][30].y), Scalar(0, 0, 255), 2);

                // Midpoints of the left eye (points 45, 42) and right eye (points 39, 36)
                float XL = (landmarks[i][45].x + landmarks[i][42].x) / 2;
                float YL = (landmarks[i][45].y + landmarks[i][42].y) / 2;
                float XR = (landmarks[i][39].x + landmarks[i][36].x) / 2;
                float YR = (landmarks[i][39].y + landmarks[i][36].y) / 2;
                line(frame, Point(XL, YL), Point(XR, YR), Scalar(0, 0, 255), 2);

                // Distance L between the midpoints of the eyes
                float DX = XR - XL;
                float DY = YR - YL;
                float L = sqrt(DX * DX + DY * DY);

                // Feature: distance L1 between the bridge of the nose (27) and the chin (8)
                float X1 = landmarks[i][27].x;
                float Y1 = landmarks[i][27].y;
                float X2 = landmarks[i][8].x;
                float Y2 = landmarks[i][8].y;
                float DX1 = abs(X1 - X2);
                float DY1 = abs(Y1 - Y2);
                float L1 = sqrt(DX1 * DX1 + DY1 * DY1);

                // Origin and rotation of the user coordinate system (UCS)
                float X0 = (XL + XR) / 2;
                float Y0 = (YL + YR) / 2;
                float sin_AL = DY / L;
                float cos_AL = DX / L;

                // UCS coordinates of the nose-line points 27 and 30
                float X_User_0 = (landmarks[i][27].x - X0) / L;
                float Y_User_0 = -(landmarks[i][27].y - Y0) / L;
                float X_User27 = X_User_0 * cos_AL - Y_User_0 * sin_AL;
                float Y_User27 = X_User_0 * sin_AL + Y_User_0 * cos_AL;

                X_User_0 = (landmarks[i][30].x - X0) / L;
                Y_User_0 = -(landmarks[i][30].y - Y0) / L;
                float X_User30 = X_User_0 * cos_AL - Y_User_0 * sin_AL;
                float Y_User30 = X_User_0 * sin_AL + Y_User_0 * cos_AL;

                // Frontal pose: the nose line must coincide with the UCS Y axis
                // (labels are anchored at the chin point 8 - an assumed position)
                if (abs(X_User27 - X_User30) <= 0.1)
                {
                    //putText(frame, std::to_string(abs(L1 / L)), Point(landmarks[i][8].x, landmarks[i][8].y), 1, 2, Scalar(0, 0, 255), 2);
                    if (abs((L1 / L) - 1.6) < 0.1)
                    {
                        putText(frame, "Roman", Point(landmarks[i][8].x, landmarks[i][8].y), 1, 2, Scalar(0, 0, 255), 2);
                    }
                    if (abs((L1 / L) - 1.9) < 0.1)
                    {
                        putText(frame, "Pasha", Point(landmarks[i][8].x, landmarks[i][8].y), 1, 2, Scalar(0, 0, 255), 2);
                    }
                    if (abs((L1 / L) - 2.1) < 0.1)
                    {
                        putText(frame, "Svirnesvkiy", Point(landmarks[i][8].x, landmarks[i][8].y), 1, 2, Scalar(0, 0, 255), 2);
                    }
                }
                else
                {
                    putText(frame, "Incorrect", Point(landmarks[i][8].x, landmarks[i][8].y), 1, 2, Scalar(0, 0, 255), 2);
                }
            }
        }

        // Display results
        imshow("Facial Landmark Detection", frame);

        // Exit loop if ESC is pressed
        if (waitKey(1) == 27)
            break;
    }
    return 0;
}

The most important sources of information about the outside world for a robot are its optical sensors and cameras. After receiving the image, it is necessary to process it to analyze the situation or make a decision. As I said earlier, computer vision combines many methods of working with images. During the operation of the robot, it is assumed that the video information from the cameras is processed by some program running on the controller. In order not to write code from scratch, you can use ready-made software solutions. At the moment, there are many ready-made computer vision libraries:

  • Matrox Imaging Library
  • Camellia library
  • Open eVision
  • HALCON
  • libCVD
  • OpenCV
  • etc…
These SDKs vary greatly in functionality, licensing terms, and supported programming languages. We will focus on OpenCV. It is free for both educational and commercial use. It is written in optimized C/C++, supports C, C++, Python and Java interfaces, and includes over 2,500 algorithms. In addition to standard image processing functions (filtering, blurring, geometric transformations, etc.), this SDK lets you solve more complex tasks, including detecting an object in a photograph and "recognizing" it. It should be understood that detection and recognition can be quite different tasks:
  • search and recognition of a specific object,
  • search for objects of the same category (without recognition),
  • only object recognition (a ready-made image with it).
To detect features in an image and check for a match, OpenCV has the following methods:
  • Histogram of Oriented Gradients (HOG) - can be used to detect pedestrians
  • Viola-Jones algorithm - used to find faces
  • SIFT (Scale Invariant Feature Transform) feature detection algorithm
  • SURF (Speeded Up Robust Features) feature detection algorithm
For example, SIFT detects sets of points that can be used to identify an object. In addition to the techniques listed above, OpenCV has other algorithms for detection and recognition, as well as a set of machine learning algorithms such as k-nearest neighbors, neural networks and support vector machines. In general, OpenCV provides a toolkit sufficient for solving the vast majority of computer vision problems; if an algorithm is not included in the SDK, it can usually be programmed without much trouble. In addition, there are many custom versions of algorithms written by users on top of OpenCV. It should also be noted that in recent years OpenCV has grown a lot and become somewhat "heavyweight". In this regard, various groups of enthusiasts are creating "lightweight" libraries based on OpenCV, for example SimpleCV, liuliu ccv, tinycv.
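As an illustration of the first item in the list above, here is a minimal sketch of pedestrian detection with OpenCV's built-in HOG people detector; the image path and detection parameters are illustrative:

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")
rects, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)

# Draw a box around each detected pedestrian
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Pedestrians", img)
cv2.waitKey(0)
cv2.destroyAllWindows()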
Useful sites

  1. http://opencv.org/ - the main project site
  2. http://opencv.willowgarage.com/wiki/ - the old project site with documentation for older versions