Papers


Towards an Error-free Deep Occupancy Detector for Smart Camera Parking System

CVCIE (ECCV Workshop)
Abstract:

Although the smart camera parking system concept has existed for decades, few approaches have fully addressed the system's scalability and reliability. Since the cornerstone of a smart parking system is the ability to detect occupancy, traditional methods use a classification backbone to predict spots from a manually labeled grid. This is time-consuming and limits the system's scalability. Additionally, most approaches use deep learning models, making them neither error-free nor reliable at scale. Thus, we propose an end-to-end smart camera parking system that performs autonomous occupancy detection with an object detector called OcpDet. Our detector also provides meaningful information from contrastive modules: training and spatial knowledge, which avert false detections during inference. We benchmark OcpDet on the existing PKLot dataset and reach competitive results compared to traditional classification solutions. We also introduce an additional dataset, SNU-SPS, on which we estimate system performance from various views and evaluate the system on parking assignment tasks. The results on our dataset show that our system is promising for real-world applications.

SNU-SPS Dataset

Overview of SNU-SPS dataset
The SNU-SPS dataset contains nearly 3,500 images to support our system. The images are captured from various views, heights (1-3 m), and lighting conditions in indoor and outdoor parking lots. Each parking lot has a different parking-spot background color. All images were manually checked, labeled, and tagged with the GPS coordinates of the corresponding parking slot. The protocol used to construct the SNU-SPS dataset is as follows:
Image Acquisition:
All images are captured at full-HD resolution. The training set was captured at random times over one month across 15 parking lots, while the test set was captured consecutively in 6 other parking lots from 3-6 pm over 5 working days. None of these 6 parking lots appear in the training set. Moreover, the test samples cover various weather conditions (sunny/rainy/cloudy) and come with the corresponding surrounding traffic measurements from the open government website.
Image Labeling:
For each parking sector, every parking space was labeled as available, occupied, illegal, or restricted. Each annotation consists of four keypoints that localize a parking spot; from these keypoints we derive the enclosing bounding boxes for the detector, as sketched below. In addition, we provide optional image masks for the test set to filter out overlapping areas and unimportant regions across parking lots. The intention is to maintain the system's constraints and preserve a better parking assignment benchmark.
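As a minimal sketch of that conversion (the annotation layout is an assumption here, stored as four (x, y) corners), the enclosing box is simply the min/max over the keypoints:

```python
import numpy as np

def keypoints_to_bbox(keypoints):
    """Enclose the four annotated corner keypoints of a parking spot
    in an axis-aligned box (xmin, ymin, xmax, ymax) for the detector."""
    pts = np.asarray(keypoints, dtype=np.float32)
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    return float(xmin), float(ymin), float(xmax), float(ymax)

# Example: a slightly skewed parking spot seen from an angled camera
print(keypoints_to_bbox([(10, 20), (110, 25), (108, 70), (8, 66)]))
# -> (8.0, 20.0, 110.0, 70.0)
```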
Dataset access is available upon request.
Details of SNU-SPS dataset

Fault Tolerance Parking System

The overall architecture is designed for full web-service integration, which forms my thesis and will be released this November. Through the overall architecture, we demonstrate the integration of the spatial module and the training-error module in filtering wrong detections and storing samples for annotation. Images flagged by this grading are marked in the system so that their results are not counted in the aggregation process. In the following part, the main ingredients of training and building OcpDet are introduced.
Overall Architecture

OcpDet Detector

We aim to capture both the high-level and low-level features of the input image throughout the model, combining information about the parking borders/lines with the objects inside them. Our OcpDet therefore inherits the structure of the RetinaNet detector. However, we make additional modifications to this architecture. First, instead of regressing only the center and size of the bounding boxes, we add four extra keypoints as new outputs of the model (provided by our dataset), as sketched below.
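A schematic sketch of the extra head, written directly in tf.keras rather than through the TensorFlow Object Detection API used for the actual training; the channel widths, layer counts, and names are illustrative, not the paper's exact configuration:

```python
import tensorflow as tf

def build_heads(num_anchors, num_classes, num_keypoints=4):
    """Schematic RetinaNet-style heads: next to the usual class and
    box branches, a keypoint branch regresses four corner points
    (eight values) per anchor."""
    def conv_stack(out_channels, name):
        layers = [tf.keras.layers.Conv2D(256, 3, padding="same",
                                         activation="relu")
                  for _ in range(4)]
        layers.append(tf.keras.layers.Conv2D(out_channels, 3, padding="same"))
        return tf.keras.Sequential(layers, name=name)

    cls_head = conv_stack(num_anchors * num_classes, "cls")
    box_head = conv_stack(num_anchors * 4, "box")                  # cx, cy, w, h
    kpt_head = conv_stack(num_anchors * num_keypoints * 2, "kpt")  # 4 x (x, y)
    return cls_head, box_head, kpt_head

# Each head is shared across FPN levels, as in RetinaNet
cls_head, box_head, kpt_head = build_heads(num_anchors=9, num_classes=4)
feat = tf.random.normal([1, 32, 32, 256])  # one FPN feature map
print(kpt_head(feat).shape)                # (1, 32, 32, 72)
```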
Modules Architecture: each block is attached to the FPN feature level for its training and prediction
Moreover, we attach two new modules, the Spatial Estimator and the Training Error Estimator, as described in the abstract.
Training Error Estimator:
For training error, we integrate the Learning Loss approach to predict the loss an input image would incur, i.e., how large the loss would be if the image were labeled and used in training. However, to make Learning Loss behave better with the dataset statistics, we sample the predicted Learning Loss over the dataset to obtain its mean and variance, and compare inference samples to these statistics with a distribution distance, as sketched below.
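A minimal sketch of this screening step; a simple z-score test stands in for the paper's distribution distance, and the threshold k is an assumed value:

```python
import numpy as np

def fit_loss_statistics(predicted_losses):
    """Mean and standard deviation of the loss module's predictions,
    sampled over the training set (one scalar per image)."""
    losses = np.asarray(predicted_losses, dtype=np.float64)
    return losses.mean(), losses.std()

def is_unreliable(pred_loss, mean, std, k=2.0):
    """Flag a sample whose predicted loss deviates too far from the
    training statistics; a z-score test stands in for the paper's
    distribution distance, with an assumed threshold k."""
    return abs(pred_loss - mean) > k * std

# Example: fit on training predictions, then screen a new capture
mean, std = fit_loss_statistics([0.31, 0.28, 0.35, 0.30, 0.33])
print(is_unreliable(0.95, mean, std))  # True -- routed for annotation
```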
Error training samples behavior (on the left) & Error determination/Spatial error (on the right)
Spatial Estimator:
Most modern detectors operate on grids of anchor boxes, and their predictions are filtered by a score threshold and the background score. This can cause a situation where a correctly activated anchor box has a score slightly below the threshold, or slightly below the background confidence. Therefore, we provide an easily integrated method that predicts the active anchor boxes in the scene: we attach another CNN module beside the Training Error Estimator that predicts a mask of active anchors. This can be done reliably because the parking spots tile the scene evenly and are well covered by the different anchor levels. From this mask, we can suppress wrong detections by decreasing their confidence and enhance high-confidence anchors.
Formally, each block of the spatial module predicts a 2D mask corresponding to the anchors of the same prediction branch. This mask is then reduced by a confidence threshold before being combined with the confidence prediction branch for suppression and enhancement, as sketched below. In the result picture further down, blue anchors are suppressed (their confidence is decreased), while yellow anchors are enhanced (their confidence is increased). Note that this is applied to all classes except the background.
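A minimal sketch of the suppression/enhancement step, assuming a (H, W, A, C) confidence tensor with the background as the last class; the threshold and adjustment values are illustrative:

```python
import numpy as np

def apply_spatial_mask(conf, active_mask, mask_thresh=0.5, delta=0.2):
    """Suppress or enhance per-anchor class confidences with the
    spatial module's active-anchor mask.

    `conf` has shape (H, W, A, C) with the background class assumed to
    be the last channel; `active_mask` has shape (H, W, A)."""
    conf = conf.copy()
    # +delta for anchors the mask marks active, -delta otherwise
    adjust = np.where(active_mask > mask_thresh, delta, -delta)
    # Apply to every class except the background (last channel)
    conf[..., :-1] = np.clip(conf[..., :-1] + adjust[..., None], 0.0, 1.0)
    return conf
```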
Combining the training-error and spatial estimators, we can determine which samples are unreliable due to training and which samples require heavy spatial recovery and adjustment. From this information, we can identify failures and improve our detector efficiently.
The results of these two modules

Training Description

The full framework was trained with the TensorFlow Object Detection API; please follow the instructions in my GitHub repository for installation and reproduction.
Jun 15, 2022

GreedySlide: An Efficient Sliding Window for Improving Edge-Object Detectors

RICE (Vietnamese Local conference)
Abstract:

Recent developments in deep learning and edge hardware architecture have given artificial intelligence applications a robust foundation for moving into real-life use, allowing a model to run inference directly on the edge. With a well-trained edge object detection (OD) model, scenarios such as autonomous driving, autonomous hospital management, or a self-shopping cart become achievable. However, for a model to run well on the edge, it needs to be quantized to reduce its size and speed up inference. This quantization scheme degrades the model: each layer is restricted to lower-precision representations, leaving the output layer fewer options for bounding an object. It also limits model generalization, since part of the dataset's behavior is cut off at each activation layer. To address this problem, we propose GreedySlide, a novel sliding-window method that divides a capture into windows so that objects fit better within the quantization bounds. Although the technique sounds simple, it increases the number of options for bounding an object and clips the variance incurred by scanning the whole image. In our experiments, GreedySlide improves an original edge model on its corresponding benchmark and increases model generalization on other related datasets without retraining.

GreedySlide

GreedySlide takes advantage of any model's existing inference performance and improves its generalization. The GreedySlide algorithm proceeds in three phases: Sliding Windows Detection, Bounding Boxes Suppression, and Greedy Bounding Boxes Selection, sketched below.
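A minimal sketch of the three phases; the `detect_fn` callable, window size, stride, and IoU threshold are all assumptions, and greedy NMS stands in for the paper's exact suppression and selection rules:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def window_positions(size, win, stride):
    """Window origins along one axis; the last window is flushed
    against the image border so nothing is left uncovered."""
    last = max(size - win, 0)
    pos = list(range(0, last + 1, stride))
    if pos[-1] != last:
        pos.append(last)
    return pos

def greedy_slide(image, detect_fn, win=320, stride=256, iou_thresh=0.5):
    """Three-phase GreedySlide sketch. `detect_fn` is an assumed
    callable mapping an image crop to (boxes, scores) in window
    coordinates; window size, stride, and threshold are illustrative."""
    h, w = image.shape[:2]
    boxes, scores = [], []
    # Phase 1: Sliding Windows Detection -- detect inside each window
    # and shift the boxes back into full-image coordinates.
    for y in window_positions(h, win, stride):
        for x in window_positions(w, win, stride):
            b, s = detect_fn(image[y:y + win, x:x + win])
            for (x1, y1, x2, y2), sc in zip(b, s):
                boxes.append((x1 + x, y1 + y, x2 + x, y2 + y))
                scores.append(float(sc))
    # Phases 2 & 3: suppress duplicates from overlapping windows, then
    # greedily keep the highest-scoring box among each overlap group.
    boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4)
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return boxes[keep], np.asarray(scores)[keep]
```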

Demonstration of the GreedySlide algorithm: the top picture shows the bounding boxes from the original model; the middle picture shows the "partition" bounding boxes after Sliding Windows Detection with the same model; the bottom picture shows the final bounding boxes after Bounding Boxes Suppression and Greedy Bounding Boxes Selection.

May 25, 2021