
Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a question, please post a legible, complete question with enough detail for us to help you properly!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group


80,093 Subscribers


Most popular LiDAR dataset / scene?

I’m planning to visualize a LiDAR dataset with a new tool I’ve been using. Curious about people’s opinions: what's the most popular LiDAR dataset / scene? KITTI? Waymo Open Dataset? ISPRS?
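For reference, KITTI's Velodyne scans (probably the most common choice) are flat binary files of float32 `(x, y, z, reflectance)` records, so loading one for visualization is only a couple of lines:

```python
import numpy as np

def load_kitti_bin(path):
    """Load a KITTI Velodyne scan: a flat binary file of
    float32 (x, y, z, reflectance) records."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Round-trip a tiny synthetic scan to show the layout.
demo = np.array([[1.0, 2.0, 3.0, 0.5],
                 [4.0, 5.0, 6.0, 0.9]], dtype=np.float32)
demo.tofile("demo_scan.bin")
cloud = load_kitti_bin("demo_scan.bin")
print(cloud.shape)  # (2, 4)
```

Waymo Open Dataset is less plug-and-play: its lidar returns ship as range images inside TFRecord files and need the `waymo-open-dataset` tooling to decode.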

1 Comment
20:20 UTC


Project help needed

I have to do a project on 'text-based sign detection' using OpenCV and Python. How do I go about doing it? Can someone please suggest a solution?

09:33 UTC


Do the latest real-time object recognition models identify fooling attacks like printed images of the object or an image displayed on a phone screen?

I just wanted to know if these kinds of physical adversarial attacks, like printed images of a target object, are still a problem for object recognition models right now, even with deep neural networks.


I'm doing research and I want to focus on the computer vision area for my computer science degree. One of the questions I have encountered in the past, when I proposed a real-time object recognition system, is: if there is a printed image of the target object (e.g. a plastic bottle) in front of the camera, will it create a false positive?

05:10 UTC


What's the SOTA Camera Raw Processor (demosaic/denoise/sharpen/deblur) available for licensing (or open source)?

We're looking to improve the perceptual image quality of our image processing workflow. Right now we have a home-grown system doing all the standard textbook stuff plus some application-specific processing, but we're struggling to match the "secret sauce" we get when processing our raws in e.g. Adobe Camera Raw, so we're thinking of bringing on a vendor. We have decent experience in-house and excellent hardware, but we're a small team.

Can anybody recommend a vendor who can offer this kind of solution to integrate into our product?
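For context, the "standard textbook stuff" usually begins with demosaicing, and when comparing against vendor output it helps to have a reference implementation of each stage. A minimal bilinear demosaic for an RGGB Bayer layout, written as a normalized convolution in pure NumPy (illustrative only, nowhere near Adobe-grade; RGGB is an assumed layout):

```python
import numpy as np

def conv3x3(img, k):
    """3x3 convolution with edge replication (pure NumPy, no SciPy needed)."""
    p = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def demosaic_bilinear_rggb(bayer):
    """Bilinear demosaic of an RGGB Bayer mosaic (H x W) into H x W x 3 RGB."""
    h, w = bayer.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1.0
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask
    k_diag = np.array([[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]])
    k_cross = np.array([[0.0, 0.25, 0.0], [0.25, 1.0, 0.25], [0.0, 0.25, 0.0]])
    out = np.empty((h, w, 3))
    for i, (mask, k) in enumerate([(r_mask, k_diag), (g_mask, k_cross),
                                   (b_mask, k_diag)]):
        # Normalized convolution: weighted average of the known samples only.
        out[..., i] = conv3x3(bayer * mask, k) / np.maximum(conv3x3(mask, k), 1e-8)
    return out

rgb = demosaic_bilinear_rggb(np.full((4, 4), 0.5))
print(rgb.shape)  # (4, 4, 3)
```

The commercial "secret sauce" mostly lives in what comes after this stage: edge-directed interpolation, joint denoising, and tone rendering, which is where a vendor library earns its licensing fee.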

22:58 UTC


Hand keypoint detection

Hello Reddit,

I have a question regarding the right tool. I'm looking for a tool / model to detect hand keypoints in a video stream of a person assembling things. I know OpenPose is one option, as is Google MediaPipe.

I’m not really getting along with OpenPose, and MediaPipe doesn't show very good results.

In my project, I would like to detect hand keypoints in assembly scenarios. It would be ok to use 2 cameras or a depth camera if necessary.

Does anybody know any models / tools to use?

Thanks in advance :)

21:31 UTC


Stereo Camera Calibration

If I perform a calibration to get the intrinsic and extrinsic parameters for a stereo camera setup, I can undistort and rectify, then calibrate again. Would that be useful at all, and could I combine the matrices to get a more accurate calibration? Has anyone ever done an iterative calibration process like this?

I looked around and couldn’t find anything. My depth map doesn’t look good at all after stereoBM or stereoSGBM with a WLS filter, so I’m trying to improve my calibration.

19:44 UTC


(Job) Contract Computer Vision Developer

Mousetrappe is seeking a contract Computer Vision developer with OpenCV and Python experience. Additional experience with Unreal Engine and TouchDesigner is also beneficial. The project involves automatic camera and projector calibration/localization using sensors. We are located in Burbank, CA. A candidate with the potential for short-term on-site travel would be ideal, but entirely remote work is possible. The project will run for roughly two months.

Cedar Connor
cedarconnor (at) mousetrappe.com

17:01 UTC


Good models to use for multimodal object detection when both the modalities are image based?

So basically I have a dataset with top-down images of vehicles in both RGB and IR. What are some models I can use for both unimodal and multimodal object detection, so I can compare their performance? Links to GitHub repos would be helpful. Thanks!
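Whatever detectors you pick, one cheap multimodal baseline is early fusion: stack the registered RGB and IR images into a single 4-channel input and widen only the detector's first conv layer. Assuming the two modalities are pixel-aligned, the data side is just:

```python
import numpy as np

def early_fuse(rgb, ir):
    """Stack a registered RGB image (H x W x 3) and a single-channel
    IR image (H x W) into one H x W x 4 input."""
    if rgb.shape[:2] != ir.shape[:2]:
        raise ValueError("RGB and IR must be registered to the same resolution")
    return np.concatenate([rgb, ir[..., None]], axis=-1)

fused = early_fuse(np.zeros((480, 640, 3), np.float32),
                   np.ones((480, 640), np.float32))
print(fused.shape)  # (480, 640, 4)
```

Mid-level (feature-map) and late (detection-level) fusion are the usual alternatives when the modalities are poorly registered, and comparing all three against the RGB-only and IR-only unimodal runs gives a clean ablation table.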

16:24 UTC


Ripeness of vegetables

Hi all,

So my in-laws have a small farm and I am considering a hobby ML project to try to determine the ripeness stage of vegetables.

After the vegetable becomes of a decent size, the farmer looks at the color/shade of the vegetable to determine if it is ripe enough to pluck. It is also possible to hold the vegetable in your hand and feel the firmness but that's not something I can do with a vision model.

For something like eggplants, it is basically different shades of light green. An experienced farmer knows it instinctively whereas I am usually clueless.

So I wonder if it is reasonable to try and fine-tune a vision model like DETR or YOLO to do this kind of thing?

Does anyone have any other suggestions / starting points for me to investigate?

I haven't yet decided if I want to do it, but I started learning ML recently and wanted to try my hands at something new but solvable.
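Before reaching for DETR or YOLO, it may be worth checking whether simple color statistics on cropped vegetables already separate ripe from unripe; if they do, a tiny classifier on those features can beat an underfed fine-tune. A rough NumPy sketch, where the greenness thresholds are made-up placeholders (not agronomy) that you would fit to labeled crops:

```python
import numpy as np

def greenness(img_rgb):
    """Mean ratio of the green channel to total intensity over a crop."""
    img = img_rgb.astype(float) + 1e-8
    return float((img[..., 1] / img.sum(axis=-1)).mean())

def ripeness_score(img_rgb, ripe_green=0.30, unripe_green=0.45):
    """Map greenness to a crude 0 (unripe) .. 1 (ripe) score.
    The two thresholds are placeholder values to calibrate on real data."""
    g = greenness(img_rgb)
    return float(np.clip((unripe_green - g) / (unripe_green - ripe_green),
                         0.0, 1.0))

light_green = np.full((32, 32, 3), [120, 200, 120], dtype=np.uint8)  # unripe-ish
dull = np.full((32, 32, 3), [140, 130, 110], dtype=np.uint8)         # riper-ish
s_unripe, s_ripe = ripeness_score(light_green), ripeness_score(dull)
print(s_unripe, s_ripe)
```

If the color baseline fails, that is itself useful information: it suggests the cue is subtle, which justifies collecting labels and fine-tuning a detection or classification model.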

14:32 UTC


How to load an SSD MobileNet TensorFlow object detection custom-trained model into Unity 3D?

12:45 UTC


Speeding Up My Augmented Reality College Project with Pre-built Solutions

Hey, guys! I need to start an augmented reality college project and present the MVP by the end of the year. As the deadline is a bit tight, my advisor suggested that, for now, I explore pre-built solutions to speed up the development process. Do you know of any libraries or resources that could help me? The project involves rendering 3D objects through the smartphone camera.

21:49 UTC


Datumaro can't import roboflow_voc

I must be doing something wrong, but I am following the Basic Skills example from the Datumaro user guide and getting errors at the very first step.

I exported my dataset from Roboflow using VOC format. I unzipped the zip into its own dir at /root/voc. It has train/, test/, and valid/ subdirs.

I tried both the CLI and the Python REPL:

Tree output

╰─❯ tree -d ~/voc
├── test
├── train
└── valid


╰─❯ datum project import -n cm-v7 -f roboflow_voc -p ../datamuro/projects/roboflow ~/voc
2023-10-01 14:50:07,936 INFO: Checking the source...
2023-10-01 14:50:08,981 ERROR: Failed to find dataset 'roboflow_voc' at '/root/datamuro/projects/roboflow/cm-v7'
Traceback (most recent call last):
  File "/root/datamuro/projects/venv/bin/datum", line 8, in <module>
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/cli/__main__.py", line 150, in main
    retcode = args.command(args)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/util/scope.py", line 158, in wrapped_func
    ret_val = func(*args, **kwargs)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/cli/commands/require_project/modification/import_.py", line 176, in import_command
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 1556, in make_dataset
    return ProjectBuilder(self._project, self).make_dataset(pipeline)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 379, in make_dataset
    dataset = self._get_resulting_dataset(pipeline)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 444, in _get_resulting_dataset
    graph, head = self._run_pipeline(pipeline)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 441, in _run_pipeline
    return self._init_pipeline(pipeline, working_dir_hashes=wd_hashes)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 573, in _init_pipeline
    dataset = _try_load_from_disk(stage_name, stage_config)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 552, in _try_load_from_disk
    return ProjectSourceDataset(
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 99, in __init__
    dataset = Dataset.import_from(rpath, env=tree.env, format=config.format, **config.options)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/dataset.py", line 700, in import_from
    else importer(path, **kwargs)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/importer.py", line 67, in __call__
    raise DatasetNotFoundError(path, self.NAME)
datumaro.components.errors.DatasetNotFoundError: Failed to find dataset 'roboflow_voc' at '/root/datamuro/projects/roboflow/cm-v7'


╰─❯ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datumaro.components.dataset import Dataset
>>> data_path = '/root/voc'
>>> data_format="roboflow_voc"
>>> dataset = Dataset.import_from(data_path, data_format)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/dataset.py", line 700, in import_from
    else importer(path, **kwargs)
  File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/importer.py", line 67, in __call__
    raise DatasetNotFoundError(path, self.NAME)
datumaro.components.errors.DatasetNotFoundError: Failed to find dataset 'roboflow_voc' at '/root/voc'

I am trying to import and then export to CVAT so I can import my dataset into CVAT. Any advice would be greatly appreciated.


20:51 UTC


Reinforcement Learning + Computer Vision listing papers

Hello everyone!

A while back, I stumbled upon an interesting paper that applied Reinforcement Learning to Object Localization. I was fascinated by how computer vision tasks could be transformed into a reinforcement learning problem, making them feel like a Markov decision process!

So, I've decided to create a repository to compile all the existing (published) papers that delve into Reinforcement Learning in Computer Vision: https://github.com/rayanramoul/RLCV-Papers

If you have any papers in mind or recommendations to enhance the repository, please don't hesitate to share them. Your input would be greatly appreciated!

Thank you! :)

16:24 UTC


Best way to understand the basics

Hi CV community,

I was wondering what the best resources (textbooks / lectures on YouTube) are to get into Computer Vision. My purpose is to get a working understanding of the theory behind it in order to build my own models. Many thanks!

15:08 UTC


Best depth cameras for small objects? (e.g. sewing needle or <1cm sized objects)

Looking for a depth camera for:

- small workspace (15x15x15cm) and small objects (e.g. sub-centimeter scale)

- reconstruction accuracy <1mm (ideally)

- works in real-time (~10Hz)

- will be used with a robotic arm

Cameras from Zivid looked promising, but I wasn't sure if there were other options out there. Any suggestions would be great.

12:08 UTC


Auto Roto and Remove in Nuke with XMem and ProPainter

10:51 UTC


Fastest way to trigger a single-frame image from any type of camera

Hi All,

I'm looking at a setup that would have a hardware trigger that would send a signal to a camera to capture a single still image.

Web cameras are bound by their FPS, so they're not great, and DSLRs have 100-150 ms of shutter lag.

I will be using a laser that, when broken, needs to trigger an image capture; this will happen every 5-6 seconds. The object will be moving at ~100 mph, and there is a TON of light available / I can add more if needed.

How can I trigger a camera using python, c++/c#, or a raspberry pi with <10ms response time?
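For what it's worth, the usual answer at this latency budget is an industrial machine-vision camera with a hardware trigger input (global shutter, triggered exposure), since webcam and DSLR software paths can't guarantee <10 ms. It still helps to know how much of the budget the software side eats; a stdlib-only sketch that measures how quickly a waiting capture thread wakes after a trigger event (a stand-in for the laser-break signal) fires:

```python
import threading
import time

def measure_trigger_latency(n_trials=50):
    """Measure, in milliseconds, the delay between firing an Event
    (the 'laser broken' signal) and a waiting capture thread waking up."""
    latencies = []
    for _ in range(n_trials):
        trigger = threading.Event()
        result = {}

        def capture_worker():
            trigger.wait()                       # stand-in for 'grab a frame now'
            result["woke_at"] = time.perf_counter()

        t = threading.Thread(target=capture_worker)
        t.start()
        time.sleep(0.001)                        # let the worker reach wait()
        fired_at = time.perf_counter()
        trigger.set()
        t.join()
        latencies.append((result["woke_at"] - fired_at) * 1000.0)
    return latencies

lat = measure_trigger_latency()
print(f"median wake-up latency: {sorted(lat)[len(lat) // 2]:.3f} ms")
```

On a typical machine the software wake-up is a tiny fraction of the 10 ms budget, which means the bottleneck is the camera's exposure/readout path, and that is exactly what a hardware trigger line on a machine-vision camera bypasses.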

04:50 UTC


What are some good pairs of transfer learning source and target datasets for image classification ?

As the title says, I'm curious about some well used transfer learning tasks.

ImageNet to other datasets is common, but what are some good pairs I can try and mess around with?

Like CIFAR10 to MNIST, or CIFAR10 to CIFAR100, or CIFAR10 to SVHN.

While ImageNet is 224x224 and some of the above are 32x32, I also want to look at 64x64 (like Tiny ImageNet).

P.S.: I know that transfer learning is mainly used for situations where the target dataset is quite small, but I want to use some standard popular datasets and see how things work.

1 Comment
20:25 UTC


How to load SSD mobilenet tensorflow object detection custom trained model into unity 3D?

18:51 UTC


Struggling to get interviews: what to do?

I have 4 YOE at a startup, a PhD, four publications (2 in high-level math journals and 2 CV/DL papers in A journals), and 4 patents. I have experience with most common CV tasks, e.g. object detection, multi-object tracking, 2D/3D human pose estimation, and monocular depth estimation. I'm well versed in typical network building blocks, e.g. conv nets, FFNs, transformers, diffusion, etc. I have a little experience with NLP, like NLTK and TTS networks, and with some other general dev technologies like EC2, S3, SQL, Mongoose, etc.

That all being said, I can't seem to even get interviews these days: just straight rejections, without even talking to recruiters. On the other hand, in 2020 I was just searching for jobs passively and had something like a 75% success rate at getting interviews. I know the job market has changed, but I'm a lot more experienced now than I was then and am having abysmal luck.

Does anyone have any advice? I'd be happy to share my resume if that would make it easier. I'm also open to hearing what other technologies I should/could learn.

16:49 UTC


Highlights for every ICCV 2023 paper

Here is the list of all ICCV 2023 (International Conference on Computer Vision) papers, with a short highlight for each of them. Among the ~2,100 papers, the authors of around 800 also made their code or data available. The 'related code' link under a paper's title will take you directly to the code base.


In addition, here is the link of "search by venue" page that can be used to find papers within ICCV-2023 related to a specific topic, e.g. "diffusion model":


ICCV 2023 will take place in Paris starting Oct 2nd, 2023.

1 Comment
12:05 UTC


Which ML model does Apple use to generate stickers in iOS 17?

I've experimented with semantic segmentation models like DeeplabV3 on Core ML to try to generate stickers from images. However, the results from DeeplabV3 don't look nearly as clean and accurate as the stickers generated in the Messages app on iOS 17.

On the Apple Machine Learning models page, DeeplabV3 is the only image segmentation model they offer. However, it seems they must be using a more advanced model to achieve such high quality sticker generation directly from the camera roll.

Does anyone have any insight into what kind of model Apple might be using behind the scenes? I'd like to try and replicate their level of accuracy if possible. Any information or suggestions would be greatly appreciated!

06:40 UTC


Has anyone deployed YOLO model on sagemaker

I have trained a YOLOv8 model and deployed it on AWS SageMaker.

But when accessing the endpoint to make predictions, it gives a timeout error.

The logs show that it is trying to install ultralytics from the requirements.txt file; this is required to load the YOLO model (`from ultralytics import YOLO`).

Any suggestions on how to deploy it?

04:09 UTC


Which Real Time Instance Segmentation to use in 2023?

I really like the user experience of YOLOv8, but the license is restrictive. I tried RTMDet-Ins, but in practice the inference time was really underwhelming (0.3 s per 640 px frame): https://paperswithcode.com/paper/rtmdet-an-empirical-study-of-designing-real

Any interesting recent projects that I missed?

22:34 UTC


ROScribe is now autogenerating both ROS1 and ROS2

We are pleased to announce that we have released a new version of ROScribe that supports ROS2 as well as ROS1.

ROScribe is an open source project that uses a human language interface to capture the details of your robotic project and creates entire ROS packages for you.

ROScribe motivates you to learn ROS
Learning ROS can feel intimidating for robotics enthusiasts, college students, or professional engineers using it for the first time. Sometimes this skill barrier forces them to give up on ROS altogether and opt for non-standard options. We believe ROScribe helps students learn ROS better and encourages them to adopt it for their projects.
ROScribe eliminates the skill barrier for beginners, and saves time and hassle for skilled engineers.

Using LLM to generate ROS
ROScribe combines the power and flexibility of large language models (LLMs) with prompt tuning techniques to capture the details of your robotic design and to automatically create an entire ROS package for your project. As of now, ROScribe supports both ROS1 and ROS2.

Keeping a human in the loop
Inspired by GPT-Synthesizer, the design philosophy of ROScribe is rooted in the core belief that a single prompt is not enough to capture the details of a complex design. Attempting to include every bit of detail in a single prompt, if not impossible, would reduce the efficiency of the LLM engine. Powered by LangChain, ROScribe captures the design specification, step by step, through an AI-directed interview that explores the design space with the user in a top-down approach. We believe that keeping a human in the loop is crucial for creating a high-quality output.

Code generation and visualization
After capturing the design specification, ROScribe helps you with the following steps:

  1. Creating a list of ROS nodes and topics, based on your application and deployment (e.g. simulation vs. real-world)
  2. Visualizing your project in an RQT-style graph
  3. Generating code for each ROS node
  4. Writing launch file and installation scripts

Source code and demo
For further detail of how to install and use ROScribe, please refer to our Github and watch our demo:
ROScribe open source repository
TurtleSim demo

Version v0.0.3 release notes
ROS2 integration:

  • Now ROScribe supports both ROS1 and ROS2.
  • Code generation for ROS2 uses rclpy instead of rospy.
  • Installation scripts for ROS2 use setup.py and setup.cfg instead of CMakeLists.txt.

ROScribe supports both ROS1 and ROS2 with Python code generation. We plan to support the following features in the upcoming releases:

  1. C++ code generation
  2. ROS1 to ROS2 automated codebase migration
  3. ROS-Industrial support
  4. Verification of an already existing codebase
  5. Graphic User Interface
  6. Enabling and integrating other robotic tools

Call for contributions
ROScribe is free and open source software. We encourage all of you to try it out and let us know what you think. We have a lot of plans for this project, and we intend to support and maintain it regularly. We welcome all robotics enthusiasts to contribute to ROScribe. With each release, we will announce the list of new contributors.

21:29 UTC



I am interested in programming and am looking for a good laptop. I am willing to spend around $1,800, or even a little more if necessary. I want the computer to be capable of gaming, programming (electronics-compatible), video editing, and the like, because I want to be able to do pretty much anything, although I know that may not be possible. Could you recommend any particular part or model that meets these requirements?

21:11 UTC


Please help me identify digits on top right corner of vehicle

I was involved in a hit and run. All help would be much appreciated. See image below:


19:49 UTC


python vs c++ for YOLO for jetson nano

I know this question has been beaten to death, but I want to know about the current state as of 2023. We are using YOLOv5, YOLOv8, and YOLO-NAS at my company. We have to deploy them on a Jetson Nano.

We are now discussing whether we should switch to C++ instead of using the YOLO libraries. My view is that since PyTorch is already well optimized, we should not observe any tangible benefit from switching to C++. I also think these libraries are very well built, and writing code from scratch that achieves parity with them would take quite some time.

So what are your thoughts? Are there any advantages to writing YOLOv5 from scratch in C++? Or is Python sufficient?

16:07 UTC

Back To Top