Improving the Object Detection Ability of Artificial Intelligence – AZoRobotics

Object detection in 3D spaces is a key component of many computer vision applications, such as image sensing in autonomous driving or robot navigation. However, sophisticated, high-performance systems such as Lidar can be expensive and computationally demanding.
Now, a team of researchers at North Carolina State University has developed a new system dubbed MonoCon that enhances the ability of object detection of artificial intelligence (AI) programs using 2D images.
We live in a 3D world, but when you take a picture, it records that world in a 2D image. 
Tianfu Wu, Corresponding Author and Assistant Professor of Electrical and Computer Engineering at North Carolina State University
One of the main questions the team had to address was whether it was possible to recover information on 3D structures and the surrounding environment using 2D images that have lost some of the relevant depth information.
AI programs that receive relevant visual information from standard cameras convert 2D images into information that allows them to navigate 3D spaces by placing certain objects, such as vehicles, people, traffic structures, etc., into their surroundings.
While the researchers were particularly interested in the use of the MonoCon system for application in autonomous driving, the system could also find application potential in robotics and manufacturing systems.
While the majority of today’s self-driving or autonomous systems employ Lidar to navigate 3D environments, these systems are expensive. Lidar, an acronym for light detection and ranging, uses a series of eye-safe lasers to map an area and create a 3D representation of an environment.
However, Lidar is expensive, and this leaves little room for redundancy, as it is not economical to include multiple Lidar scanners on a vehicle.
But if an autonomous vehicle could use visual inputs to navigate through space, you could build in redundancy.
Tianfu Wu, Corresponding Author and Assistant Professor of Electrical and Computer Engineering at North Carolina State University
This is because cameras are much cheaper than Lidar, meaning it would be economically viable to incorporate multiple cameras and build redundancy into the system, making it safer and more robust.
However, the possibility of extracting 3D data from 2D input also offers another exciting opportunity and exemplifies one of MonoCon's key capabilities: telling an AI system where the exterior edges of a relevant object lie by placing objects into 'bounding boxes'.
The AI is trained by reading a series of 2D images and placing these bounding boxes around certain objects in the image. Each one of these boxes is a cuboid defined by eight corner points. The AI is then fed the relevant information for each of the points, allowing it to understand the height, length and width of the box as well as the distances between its corners.
The AI can then make its predictions about the objects in relation to one another and their distance from the camera. Once the AI has made its calculations, the researchers can fine-tune the data by correcting any mistakes, which over time improves the performance of the AI.
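The cuboid representation described above can be sketched in code. The following is a minimal illustration (not MonoCon's actual implementation) of a common way to parameterize a 3D bounding box by its center, dimensions and heading, from which the eight corner points, and hence the heights, widths and corner-to-corner distances the AI learns, can be derived. The function name and example values are illustrative assumptions.

```python
import numpy as np

def box_corners(center, dims, yaw):
    """Return the 8 corner points of a 3D bounding box (cuboid).

    center: (x, y, z) coordinates of the box center
    dims:   (length, width, height) of the box
    yaw:    rotation about the vertical axis, in radians
    """
    l, w, h = dims
    # Corner offsets in the box's own frame: every +/- combination
    xs = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * l / 2.0
    ys = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * w / 2.0
    zs = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * h / 2.0
    corners = np.stack([xs, ys, zs])                  # shape (3, 8)
    # Rotate about the vertical axis, then shift to the box center
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (R @ corners).T + np.asarray(center)       # shape (8, 3)

# A car-sized box: 4 m long, 1.8 m wide, 1.5 m high, centered at (2, 0.5, 1)
corners = box_corners(center=(2.0, 0.5, 1.0), dims=(4.0, 1.8, 1.5), yaw=0.0)
```

With the heading set to zero, the extent of the eight corners along each axis recovers exactly the box's length, width and height.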
“What sets our work apart is how we train the AI, which builds on previous training techniques,” Wu says.
The proposed method is motivated by a well-known theorem in measure theory, the Cramér–Wold theorem. It is also potentially applicable to other structured-output prediction tasks in computer vision.
Tianfu Wu, Corresponding Author and Assistant Professor of Electrical and Computer Engineering at North Carolina State University
As well as asking the AI to predict the distances between the camera and various objects and the dimensions of the bounding boxes, the AI was also given the objective of predicting the position of each of the eight corner points of the box and its distance from the center of the bounding box.
This process is known as ‘auxiliary context’ and allows the AI to detect and predict the location of 3D objects more accurately using 2D images.
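This auxiliary supervision can also be sketched: given a box's eight 3D corners and a pinhole camera model, one can compute the projected 2D corner locations and their offsets from the projected box center, which is the kind of extra target the AI is asked to predict alongside the main detection outputs. The intrinsic matrix values and the helper function below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def auxiliary_targets(corners_3d, K):
    """Project the 8 corners of a 3D box into the image plane and
    return the projected box center plus each corner's 2D offset
    from that center (the flavor of 'auxiliary context' described above).

    corners_3d: (8, 3) array of corners in camera coordinates (z forward)
    K:          3x3 camera intrinsic matrix
    """
    center_3d = corners_3d.mean(axis=0)
    pts = np.vstack([corners_3d, center_3d])   # (9, 3): corners + center
    uvw = (K @ pts.T).T                        # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]              # divide by depth
    center_2d = uv[-1]
    offsets = uv[:-1] - center_2d              # 8 corner offsets from center
    return center_2d, offsets

# Illustrative intrinsics and a car-sized box 10 m in front of the camera
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[sx * 2.0, sy * 0.9, 10.0 + sz * 0.75]
                    for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
center_2d, offsets = auxiliary_targets(corners, K)
```

During training, these projected corners and offsets serve as additional regression targets; at inference, only the main 3D box outputs are needed.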
The MonoCon system was then tested on an extensive dataset called KITTI, where it outperformed many of the other AI programs that have been developed to extract 3D data from 2D images. While MonoCon also performed well when asked to identify bicycles and pedestrians, other programs still fared better on those categories.
The next step is fine-tuning MonoCon using larger datasets in order to make it scalable for use in autonomous driving applications.
“We also want to explore applications in manufacturing, to see if we can improve the performance of tasks such as the use of robotic arms,” Wu concludes.
Shipman, M., (2022) Technique Improves AI Ability to Understand 3D Space Using 2D Images. [online] NC State News. Available at: https://news.ncsu.edu/2022/01/monocon-ai-3d/
Written by
David is an academic researcher and interdisciplinary artist. David's current research explores how science and technology, particularly the internet and artificial intelligence, can be put into practice to influence a new shift towards utopianism and the reemergent theory of the commons.
Please use one of the following formats to cite this article in your essay, paper or report:
Cross, David. (2022, February 07). Improving the Object Detection Ability of Artificial Intelligence. AZoRobotics. Retrieved on February 08, 2022 from https://www.azorobotics.com/News.aspx?newsID=12729.
Cross, David. "Improving the Object Detection Ability of Artificial Intelligence". AZoRobotics. 08 February 2022. <https://www.azorobotics.com/News.aspx?newsID=12729>.
Cross, David. "Improving the Object Detection Ability of Artificial Intelligence". AZoRobotics. https://www.azorobotics.com/News.aspx?newsID=12729. (accessed February 08, 2022).
Cross, David. 2022. Improving the Object Detection Ability of Artificial Intelligence. AZoRobotics, viewed 08 February 2022, https://www.azorobotics.com/News.aspx?newsID=12729.