
Monocular Object Distance Estimation

Using a single camera to estimate the distances of objects reduces costs compared to stereo vision and LiDAR. Although monocular distance estimation has been studied in the literature, previous methods all rely on knowing an object’s class in some way.
In this paper, we aim to circumvent the potential downsides of such approaches and provide an alternative technique that does not use any information about an object’s class.
We propose a method that combines the change in an object’s appearance over time with the camera’s motion to estimate the object’s distance from the camera.
Furthermore, we have designed our model to be adaptable to new environments: it maintains performance across different object detectors and can easily be extended to new object classes.
We evaluate our model under different combinations of training and testing on the KITTI MOTS dataset’s ground-truth annotations and on TrackRCNN and EagerMOT outputs.
These detections are then used to measure the model’s detection-source-agnostic and class-agnostic properties.
Our results show that our method outperforms the IPM and SVR baselines, including in test environments with multi-class detections.

Q: Why is it not desirable to memorize distances according to object type? Isn’t this how we often estimate depth, by knowing approx. size of objects, and how big they look?
A: This method is desirable, and our term-project results (and other research) show it is highly effective. The caveat, however, is that it is tied to the object classification scheme of its training dataset. For example, KITTI has 7 object classes while nuScenes has 23. A model built this way can only operate in environments very similar to its training data: every object whose distance it estimates must belong to one of the training classes. If the user wishes to expand the model to predict the distances of a new type of object, the model will need to be:

  1. remade with an extra input parameter
  2. retrained with a new dataset that has that additional class of object

In short, this approach is very specific and offers no flexibility in what it can be used for.

Q: Is it possible to train using TrackRCNN detections (but still with GT depth values)?
A: Yes! The results are identical (within the margin of error) to those of the model trained on GT detections. This indicates that the approach is portable across datasets, and that if we can improve its fundamental accuracy, it will be usable on other systems as well.

Q: How much does using IMU data help?
A: The current approach is based on the mathematical formulation from the presentation. If we know:

  1. how much the camera has moved
  2. how much the object’s appearance has changed

we can predict how far away the object is.
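
A minimal sketch of that geometric idea, assuming a pinhole camera, a static object, and pure forward translation (the function and its interface here are illustrative, not the paper’s implementation): an object of true size H at distance Z appears with size h = fH/Z, so after the camera moves forward by dz the apparent-size ratio s = h2/h1 = Z1/(Z1 − dz), which can be solved for the original distance Z1 = s·dz/(s − 1).

```python
def distance_from_motion(h1: float, h2: float, dz: float) -> float:
    """Estimate the object's distance at the first frame.

    h1, h2 : apparent size of the object (e.g. bounding-box height
             in pixels) in frames 1 and 2.
    dz     : camera translation toward the object between the two
             frames (e.g. integrated from IMU/odometry), in metres.
    """
    s = h2 / h1                   # apparent-size scale factor
    if abs(s - 1.0) < 1e-6:       # no measurable change -> no depth cue
        return float("inf")
    return s * dz / (s - 1.0)     # Z1 = s * dz / (s - 1)

# Example: a box grows from 50 px to 55 px while the camera moves 1 m
# forward, so s = 1.1 and the object was ~11 m away at the first frame.
print(distance_from_motion(50.0, 55.0, 1.0))
```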

Q: Main contributions I am trying to claim:
A: A monocular object distance estimator with the following properties:

  1. Class agnosticity: instead of just memorizing how big certain objects appear at certain distances (and then interpolating/extrapolating), estimate distances with no object-specific information
  2. Dataset agnosticity: maintain performance across GT annotations and the outputs of third-party object-tracking networks

Besides that, in the current method I would also like to claim the preprocessing technique that represents the changes between 3 intervals with 3 channels, but we’ll see if this works out.
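
One plausible reading of that idea (a hedged sketch only; the exact intervals, crop handling, and function name below are assumptions, since the post does not spell them out) is to stack the per-interval differences of the tracked object’s crops as the three channels of a single input image:

```python
import numpy as np

def change_channels(crops):
    """Encode appearance change over three intervals as three channels.

    crops: four same-size grayscale crops of the tracked object at
    times t-3, t-2, t-1, t (the step size is an assumption here).
    Returns an H x W x 3 float array whose channels hold the change
    over one, two, and three steps respectively.
    """
    c0, c1, c2, c3 = (c.astype(np.float32) for c in crops)
    diffs = [c3 - c2, c3 - c1, c3 - c0]  # change over 1, 2, 3 steps
    return np.stack(diffs, axis=-1)
```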

(Embedded project video.)

Here is our GitHub page:
