Human Crowd Detection With Public Webcam Part 3

DataValley Team
January 9, 2022
1:06 am

Introduction To YOLO — You Only Look Once

All of the previous object detection algorithms use regions to localize the object within the image, so the network never looks at the complete image at once. YOLO instead looks at the whole image in a single forward pass:

It takes an image and divides it into an S × S grid of cells (where S is a natural number).
Each grid cell predicts a fixed number B of bounding boxes (5 in our case). A cell is responsible for an object when the center of that object falls inside the cell. Each cell is responsible for detecting only one object: of its predicted boxes, the one that best matches the object is kept and the other detections are rejected.
Each cell also predicts C conditional class probabilities, one per class, giving the likelihood of each class given that the cell contains an object.
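The "responsible cell" rule can be sketched in a few lines of Python. This is a minimal illustration, not Darknet code: it assumes box centers are given in coordinates normalized to [0, 1] and uses S = 7 (the grid size from the original YOLO paper); the helper name `responsible_cell` is hypothetical.

```python
import math
from typing import Tuple


def responsible_cell(x_center: float, y_center: float, S: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the object's center.

    Hypothetical helper: assumes x_center and y_center are normalized to
    [0, 1] relative to the image width and height.
    """
    # Clamp to S - 1 so a center lying exactly on the right/bottom edge (1.0)
    # maps to the last cell instead of falling outside the grid.
    col = min(math.floor(x_center * S), S - 1)
    row = min(math.floor(y_center * S), S - 1)
    return row, col


# A person whose box center sits at 52% of the width and 80% of the height
# is assigned to row 5, column 3 of a 7 x 7 grid.
print(responsible_cell(0.52, 0.80))  # (5, 3)
```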

Notes:

Total number of values predicted per image = S × S × (B × 5 + C)

S × S = total number of grid cells YOLO divides the input image into
B = number of bounding boxes predicted per grid cell (before any confidence threshold is applied)
B × 5 = for each bounding box, 5 values are predicted:
1. Center coordinates of the detected object (x, y)
2. Width and height (w, h)
3. Confidence score
C = number of classes (one conditional class probability per class)
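Plugging numbers into the formula makes the size of YOLO's output concrete. The sketch below assumes the settings from the original YOLO paper (S = 7, B = 2, C = 20 for the PASCAL VOC classes), which give the well-known 7 × 7 × 30 output tensor; the settings used in our own runs may differ, and `yolo_output_size` is a hypothetical helper name.

```python
def yolo_output_size(S: int, B: int, C: int) -> int:
    """Number of values YOLO predicts per image: S x S x (B * 5 + C).

    Each of the S * S grid cells predicts B boxes with 5 values each
    (x, y, w, h, confidence) plus C conditional class probabilities.
    """
    per_cell = B * 5 + C
    return S * S * per_cell


# Original YOLO settings: 7 x 7 grid, 2 boxes per cell, 20 classes.
print(yolo_output_size(S=7, B=2, C=20))  # 1470 (= 7 * 7 * 30)
```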

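– What is the confidence threshold?

It is the minimum box confidence score a detection must reach to be kept; boxes that score below it are discarded.

– How is the box confidence score calculated?

The box confidence score reflects both how likely the box is to contain an object and how accurate the box is. It is defined as Pr(Object) × IoU(pred, truth), where IoU is the intersection over union between the predicted box and the ground-truth box. Multiplying it by a conditional class probability gives the class-specific confidence score: Pr(Class_i | Object) × Pr(Object) × IoU = Pr(Class_i) × IoU.

Most of the predicted boxes have low confidence scores, so if we set a threshold of, say, 30% confidence, we can remove most of them.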
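The scoring and filtering step can be sketched in plain Python. This is an illustrative sketch, not Darknet's code: boxes are assumed to be (x1, y1, x2, y2) corner tuples, each detection is assumed to carry its box confidence and its list of conditional class probabilities, and the 0.3 threshold mirrors the 30% example in the text. At inference time the network outputs the box confidence directly (it is trained to approximate Pr(Object) × IoU), so `iou` is shown only to make that definition concrete. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    detections: Sequence[Tuple[Box, float, Sequence[float]]],
    threshold: float = 0.3,
) -> List[Tuple[Box, int, float]]:
    """Keep detections whose class-specific score clears the threshold.

    Each detection is (box, box_confidence, class_probs), where class_probs
    holds Pr(Class_i | Object). Score = Pr(Class_i | Object) * box_confidence.
    """
    kept = []
    for box, box_conf, class_probs in detections:
        best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
        score = class_probs[best_class] * box_conf
        if score >= threshold:
            kept.append((box, best_class, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9, [0.8, 0.2]),  # score 0.8 * 0.9 = 0.72 -> kept
    ((5, 5, 8, 8), 0.4, [0.5, 0.5]),    # score 0.5 * 0.4 = 0.20 -> dropped
]
print(len(filter_detections(detections)))  # 1
```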

YOLO requires a neural network framework for training; for this we used Darknet, the open-source framework (written in C and CUDA) in which YOLO was originally implemented.

Let’s learn about the different YOLO versions:

– YOLOv1:

It has 26 layers in total: 24 convolutional layers followed by 2 fully connected layers. The major problem with YOLOv1 is its inability to detect very small objects.

– YOLO9000 / YOLOv2:

Batch normalization layers added after each convolutional layer
30 layers, compared with YOLOv1’s 26
Anchor boxes introduced

Notes:

Anchor boxes are predefined box shapes, supplied to Darknet by the user, that give the network a prior on the dimensions (width and height) of the objects to be detected. They should be calculated from the objects in the training set; YOLOv2 does this with k-means clustering on the training-set box dimensions.
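The YOLOv2 paper obtains anchors by running k-means on the widths and heights of the training-set boxes, using d(box, centroid) = 1 − IoU(box, centroid) as the distance so that large boxes do not dominate the clustering. The sketch below illustrates that idea in plain Python; the function names, the naive initialization (first k boxes), and the toy box list are assumptions for illustration, not Darknet's own anchor-calculation code.

```python
from typing import List, Tuple

WH = Tuple[float, float]  # (width, height) of a box


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes that share the same center (only their sizes matter)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: List[WH], k: int, iters: int = 100) -> List[WH]:
    """Cluster box sizes with distance d = 1 - IoU(box, centroid)."""
    centroids = list(boxes[:k])  # naive initialization: first k boxes (assumption)
    for _ in range(iters):
        # Assign each box to the centroid with the smallest 1 - IoU distance.
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for box in boxes:
            nearest = min(range(k), key=lambda i: 1 - wh_iou(box, centroids[i]))
            clusters[nearest].append(box)
        # Move each centroid to the mean width/height of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[i])
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Toy training-set box sizes in pixels: a few small and a few large people.
boxes = [(10, 20), (12, 22), (11, 19), (100, 200), (110, 190), (95, 210)]
print(kmeans_anchors(boxes, k=2))  # roughly (11, 20.3) and (101.7, 200)
```

Using 1 − IoU rather than Euclidean distance means a 10-pixel error matters much more for a small box than for a large one, which is what we want from a shape prior.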

No fully connected layers (the network is fully convolutional)
Multi-scale training: the input size is changed randomly during training, between 320 and 608 pixels
Multiple labels may be given to the same object, but it is still treated as a multi-class problem (WordTree concept), i.e. either the parent or the child becomes the final label, not both
Still weak with small objects
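In the YOLOv2 paper, multi-scale training works like this: every 10 batches the network picks a new input size, and because the network downsamples by a factor of 32, the sizes are the multiples of 32 from 320 to 608. A minimal sketch of that schedule follows; `multiscale_sizes` and `pick_size` are hypothetical helpers, and the actual resizing and training step are left as a comment.

```python
import random
from typing import List, Optional


def multiscale_sizes(low: int = 320, high: int = 608, stride: int = 32) -> List[int]:
    """Every input size from low to high (inclusive) that is a multiple of stride."""
    return list(range(low, high + 1, stride))


def pick_size(batch_idx: int, current: int, sizes: List[int],
              every: int = 10, rng: Optional[random.Random] = None) -> int:
    """Draw a new input size every `every` batches; otherwise keep the current one."""
    chooser = rng if rng is not None else random
    if batch_idx % every == 0:
        return chooser.choice(sizes)
    return current


sizes = multiscale_sizes()
print(sizes)                     # [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
print([s // 32 for s in sizes])  # output grid sizes: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

rng = random.Random(0)
size = 416
for batch_idx in range(1, 31):
    size = pick_size(batch_idx, size, sizes, rng=rng)
    # ... resize this batch to size x size and run one training step ...
```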

– YOLOv3:

106-layer neural network
Detection at 3 scales, to detect objects from small to very large size
9 anchor boxes, 3 per scale, so more bounding boxes are predicted than in YOLOv2 and YOLOv1
The multi-class problem became a multi-label problem: each class gets an independent logistic (sigmoid) classifier instead of a shared softmax
Changes to the loss function: binary cross-entropy is used for the class predictions
Quite good with small objects
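Both changes can be checked with a little arithmetic. The sketch assumes a 416 × 416 input, YOLOv2's single 13 × 13 output grid with 5 anchors per cell, and YOLOv3's three output strides of 32, 16 and 8 with 3 anchors each; the logits in the second half are made-up numbers for two overlapping labels such as "person" and "woman", and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def boxes_per_image(input_size: int, strides: Sequence[int], anchors_per_scale: int) -> int:
    """Total predicted boxes: sum over scales of (grid cells * anchors per cell)."""
    return sum((input_size // s) ** 2 * anchors_per_scale for s in strides)


# YOLOv2: one 13 x 13 grid (stride 32) with 5 anchors per cell.
print(boxes_per_image(416, strides=[32], anchors_per_scale=5))  # 845
# YOLOv3: 13 x 13, 26 x 26 and 52 x 52 grids with 3 anchors each.
print(boxes_per_image(416, strides=[32, 16, 8], anchors_per_scale=3))  # 10647


def softmax(logits: Sequence[float]) -> List[float]:
    """Multi-class: probabilities compete and always sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Multi-label: each class is scored independently in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


logits = [2.0, 2.0]  # hypothetical scores for "person" and "woman"
print(softmax(logits))               # [0.5, 0.5]: the two labels compete
print([sigmoid(z) for z in logits])  # both about 0.88: both labels can be true
```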

– YOLOv4:

1. There is a huge number of features (training tricks and architectural modules) that are said to improve CNN accuracy
2. Some features work only with certain models, for certain problems, or only on small-scale datasets; YOLOv4 combines the ones that are broadly applicable
3. ~65 frames per second (FPS) on a Tesla V100 GPU

– YOLOv5:

0.007 seconds per image
~140 frames per second (FPS)
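The two YOLOv5 figures are the same measurement expressed two ways, since frame rate is the reciprocal of per-image latency. The quick check below assumes images are processed one at a time with no other overhead (a real webcam pipeline also spends time decoding frames and drawing boxes), and the 30 FPS webcam stream is an assumed example; the YOLOv4 and YOLOv5 figures may come from different hardware, so comparing them is only indicative.

```python
def fps_from_latency(seconds_per_image: float) -> float:
    """Frames per second achievable when each image takes `seconds_per_image`."""
    return 1.0 / seconds_per_image


def latency_from_fps(fps: float) -> float:
    """Seconds available (or spent) per frame at a given frame rate."""
    return 1.0 / fps


print(round(fps_from_latency(0.007)))         # 143, reported as ~140 FPS for YOLOv5
print(round(latency_from_fps(65) * 1000, 1))  # 15.4 ms per frame for YOLOv4
print(round(latency_from_fps(30) * 1000, 1))  # 33.3 ms budget per frame for a 30 FPS webcam
```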

Conclusion:

YOLOv5 is much faster (~140 frames per second) than the region-based object detection algorithms. The limitation of the YOLO algorithm is that it struggles with small objects that appear in groups, for example a flock of birds. This is due to the spatial constraints of the algorithm: each grid cell predicts only a small number of boxes and a single set of class probabilities, so it can detect only a limited number of nearby objects.
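The spatial constraint can be illustrated by counting how many object centers fall into each grid cell: if each cell can predict at most B boxes, extra objects packed into the same cell cannot all be detected. The sketch assumes normalized center coordinates, the original YOLO settings S = 7 and B = 2, and made-up bird positions; `max_detectable` is a hypothetical helper that gives an upper bound, not a model of YOLOv5's actual behaviour.

```python
from collections import Counter
from typing import List, Tuple


def max_detectable(centers: List[Tuple[float, float]], S: int = 7, B: int = 2) -> int:
    """Upper bound on detectable objects when each cell predicts at most B boxes."""
    # Count object centers per (row, col) cell; clamp edge values into the grid.
    cells = Counter(
        (min(int(y * S), S - 1), min(int(x * S), S - 1)) for x, y in centers
    )
    # A cell can report at most B of the objects whose centers fall inside it.
    return sum(min(count, B) for count in cells.values())


# Ten hypothetical birds packed into a small patch of sky (normalized x, y).
birds = [(0.50 + 0.01 * i, 0.30 + 0.005 * i) for i in range(10)]
print(len(birds), max_detectable(birds))  # 10 4
```

Here eight birds share one cell and two share its neighbour, so at most four of the ten can be reported.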

In the next part, we will implement human crowd detection and see how to solve the problem of detecting small objects with YOLOv5.


