/r/computervision


Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question that includes details so we can help you properly!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group


109,401 Subscribers

2

Icon detection in blueprint

I'm new to CV and have been trying to do template matching to find all instances of a template in an image.

Image

Template

Result

I can get a fair number of matches, but there are a lot more that I would expect it to find.

I am using cv2.matchTemplate with cv2.TM_CCOEFF_NORMED, but have tried pretty much every option. To help with rotation variants, I rotate and flip the template image and do multiple passes, e.g. cv2.flip(cv2.rotate(cv2.imread(image_path), cv2.ROTATE_180), -1)

I've tried SIFT / ORB / MSER matching but get no matches. If I try more angles (currently only 90-degree increments work), I end up with no matches, e.g.

rotation_matrix = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1)
rotated_template = cv2.warpAffine(scaled_template, rotation_matrix, (cols, rows))

I'm kind of at a loss for what to do next.

If I lower the match threshold, I end up with a ridiculous number of false positives.

Is there a different approach I should be taking here?
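A standard fix for both the near-duplicate hits and the false-positive flood is to threshold the matchTemplate score map and then run non-maximum suppression, keeping only the strongest match in each template-sized neighbourhood. A minimal pure-NumPy sketch (the synthetic score map below stands in for a real TM_CCOEFF_NORMED result, so the numbers are illustrative):

```python
import numpy as np

def nms_peaks(score_map, tmpl_h, tmpl_w, threshold=0.8):
    """Greedy NMS over a matchTemplate-style score map.

    Returns (y, x, score) tuples for kept peaks; any candidate whose
    box overlaps an already-kept, stronger peak's box is suppressed.
    """
    ys, xs = np.where(score_map >= threshold)
    scores = score_map[ys, xs]
    order = np.argsort(-scores)          # strongest matches first
    kept = []
    for i in order:
        y, x = int(ys[i]), int(xs[i])
        if any(abs(y - ky) < tmpl_h and abs(x - kx) < tmpl_w
               for ky, kx, _ in kept):
            continue                     # overlaps a stronger match
        kept.append((y, x, float(scores[i])))
    return kept

# Synthetic score map: two true peaks plus one near-duplicate response.
sm = np.zeros((50, 50))
sm[10, 10] = 0.95
sm[10, 11] = 0.90   # near-duplicate of the first peak
sm[30, 40] = 0.85
peaks = nms_peaks(sm, tmpl_h=8, tmpl_w=8, threshold=0.8)
print(peaks)        # two peaks survive; the (10, 11) duplicate is suppressed
```

With this in place you can afford a lower threshold (catching the missed instances) and let NMS clean up the resulting clutter; the same loop works unchanged on the score maps from each rotated/flipped template pass.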

0 Comments
2025/02/04
19:37 UTC

0

Extend a video

I have a 10s video that I want to extend to 30s. For the project I need to show code to my teacher. What open-source AI models are there? I tried taking the last and first frames and creating an animation between them, but that doesn't look natural.

2 Comments
2025/02/04
19:28 UTC

1

YOLO food image detection with dataset

Are there any easily followable implementations of food detection using YOLO (or an alternative model) and a food dataset that anyone could recommend? The few I've found are either outdated and difficult to implement, or have too few classes.

Ideally, I need it to function like YOLO, where multiple food items on a plate can each be identified, and if possible with more than 50 food classes.

Any suggestions?

2 Comments
2025/02/04
18:28 UTC

5

3D reconstruction from RGBD images.

I am working on a 3D reconstruction task. I have tried the tutorials from Open3D, but no matter the algorithm, the reconstruction quality is poor: there is always pose drift, or things get misaligned in weird ways. I have also tried global pose optimization, but nothing improves the results.

Are there any resources that I can look into or repos that have a good guide on this subject?

3 Comments
2025/02/04
14:43 UTC

1

Clarification on 17 Keypoints in MPI-INF-3DHP Dataset

Hello,

I'm currently working with the MPI-INF-3DHP dataset and have encountered references to a subset of 17 keypoints used for 3D human pose estimation. These 17 keypoints typically include major joints such as the nose, neck, shoulders, elbows, wrists, hips, knees, and ankles.

However, the MPI-INF-3DHP dataset provides 3D annotations for 28 keypoints. I'm interested in understanding which specific 17 keypoints are commonly selected from the full set for tasks like 3D pose estimation.

Could anyone provide insights or resources detailing the selection of these 17 keypoints from the MPI-INF-3DHP dataset? Any guidance or references would be greatly appreciated.

Thank you!

0 Comments
2025/02/04
14:33 UTC

2

Camera recommendations: optical zoom that is adjustable from software

Hi all,

Looking for a camera that has optical zoom, but want to be able to control the zoom level through code. Anyone have any recommendations for such a camera?

6 Comments
2025/02/04
14:11 UTC

0

Is it possible to combine different best.pt into one model?

My friends and I are planning a project that uses the YOLO algorithm. We want to divide the dataset to speed up training, but we can't find any tutorial on how to do this.

13 Comments
2025/02/04
11:44 UTC

6

Google Coral TPU vs Offboard-processing

Hi everybody,

I am doing research for my ambitious project of a CV-capable drone. Context: 5-inch drone (10 inches motor-mount to motor-mount), aiming for < 300 g all-up weight. I have a rover that carries the drone aircraft-carrier style, and I'd like to precision-land on it.

So far I have narrowed down my options to 2:

- Off-board processing. I'd just get a powerful Jetson Nano on the rover and have the drone stream to it and do the CV task there. I have been streaming live video between Pis and the latency seems low (<150ms), but that still might not be enough.

- On-board processing: since weight and size are quite constrained, I'm thinking a Pi Zero 2 W + Coral USB accelerator.

The CV tasks should be extremely light, I'll just put markers on the rover.

I'd love to hear what you all think. If you have any other suggestions, that would be of great help too (a board the same size as a Pi Zero but with an integrated NPU, Wi-Fi, and video-encoding hardware would be ideal, but I spent hours searching and came up fruitless).

Somehow DJI pulls off CV tasks on their absolutely minuscule 135 g Neo, which I find absurd. What kind of wizardry is that?

2 Comments
2025/02/04
11:11 UTC

0

How to implement automatic image capture based on object orientation in camera view?

Hi everyone,

I'm working on an app that needs to automatically capture images when objects appear in a specific orientation within the camera view. For example, when an object rotates to a particular angle or position, the app should automatically take a photo.

Technical requirements:

  • Need to detect object orientation in real-time through the camera feed
  • Trigger automatic image capture when specific orientation criteria are met

Has anyone implemented something similar? I'm looking for suggestions on:

  1. Best approaches for real-time orientation detection
  2. Recommended libraries or frameworks that could help with this
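One classic approach for both points, assuming you can segment the object into a binary mask first: read the orientation off the mask's second-order image moments (the angle of its principal axis) and trigger the capture when that angle enters a tolerance band. A small NumPy sketch, where the mask and the 5-degree tolerance are placeholder assumptions:

```python
import numpy as np

def principal_angle(mask):
    """Orientation (radians) of a binary mask's principal axis,
    computed from central second-order moments."""
    ys, xs = np.nonzero(mask)
    xbar, ybar = xs.mean(), ys.mean()
    mu20 = ((xs - xbar) ** 2).mean()
    mu02 = ((ys - ybar) ** 2).mean()
    mu11 = ((xs - xbar) * (ys - ybar)).mean()
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

def should_capture(mask, target_deg, tol_deg=5.0):
    """Fire the camera when the object's orientation is within tolerance."""
    angle = np.degrees(principal_angle(mask))
    return abs(angle - target_deg) <= tol_deg

# A horizontal bar: its principal axis is at ~0 degrees.
mask = np.zeros((40, 40), dtype=bool)
mask[18:22, 5:35] = True
print(np.degrees(principal_angle(mask)))   # ~0.0
print(should_capture(mask, target_deg=0))  # True
```

The same moments are available from cv2.moments if you are already in OpenCV; for textured, non-symmetric objects a feature- or learning-based pose estimator would be needed instead, since moments only give the axis up to 180-degree ambiguity.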
0 Comments
2025/02/04
09:50 UTC

0

Which 3D Object Detection Model is Best for Volumetric Anomaly Detection?

I am working on a 3D object detection task using a dataset composed of stacked sequential 2D images that together form a volumetric representation (Grayscale images). Each instance consists of 1024×1024×2000 (H×W×D) image stacks, and I have 3D bounding box annotations available for where the anomaly exists (So 6 coordinates for each bounding box). My GPU has 24GB VRAM, so I need to be mindful of computational efficiency.

I am considering the following 3D deep learning architectures for detecting objects/anomalies in this volumetric data:

3D ResNet, 3D Faster R-CNN, 3D YOLO, 3D VGG

I plan to experiment with only two models of which one would be a simple baseline model. So, which of these models would be best suited? Or are there any other models that I haven't considered that I should look into?

Additionally, I would prefer models that have existing PyTorch/TensorFlow implementations rather than coding from scratch. That's why I'm a bit more inclined to start with PyTorch's 3D ResNet (https://pytorch.org/hub/facebookresearch_pytorchvideo_resnet/)

My approach with the 3D ResNet is a sliding window (128 × 128 × 128), but I'm not sure this would be computationally viable. That's why I was looking into 3D Faster R-CNN, but I can't find any package for it. Are there any existing PyTorch/TensorFlow implementations of 3D Faster R-CNN or 3D YOLO?
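For a rough feasibility check on the sliding-window idea, it helps to count the windows first. A quick sketch, assuming a 128³ window with stride 64 (50% overlap) and a tail window flush with each volume edge (both choices are assumptions to tune):

```python
def window_starts(dim, win, stride):
    """Start offsets along one axis so every voxel is covered and the
    last window ends exactly at the volume boundary."""
    starts = list(range(0, dim - win + 1, stride))
    if starts[-1] != dim - win:
        starts.append(dim - win)   # tail window flush with the edge
    return starts

def count_windows(shape, win=128, stride=64):
    """Number of sliding-window crops over an H x W x D volume."""
    n = 1
    for dim in shape:
        n *= len(window_starts(dim, win, stride))
    return n

print(count_windows((1024, 1024, 2000)))  # 15 * 15 * 31 = 6975 windows
```

Nearly 7,000 forward passes per volume is why stride and batch size dominate the runtime here; a coarser stride (or a cheap first-stage filter that discards empty windows) cuts the count roughly cubically before the 3D ResNet ever runs.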

10 Comments
2025/02/04
09:09 UTC

3

Minimizing Drift in Stitched Images

Hey guys, I'm working on image-stitching software to stitch upwards of 100+ pictures taken of a flat road while moving in a straight line. Visually, I have a good-looking stitch, but for longer sequences the result starts to distort. This is due to accumulated drift in the estimated homographies, and I'm looking for ways to minimize these errors. My current plan is to calculate pair-wise homographies, optimize them jointly using Levenberg-Marquardt, and then chain them together. Before that, though, I want to reduce the reprojection error in the pairwise homographies themselves. One homography had a reprojection error of ~15 px, yet after warping the images aligned well, which might indicate an issue with the inliers (?).
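On the ~15 px error that still warps well: it is worth checking whether a few outlier correspondences are inflating the mean while the bulk of the points fit tightly. A sketch that computes per-point reprojection errors for a homography so you can inspect the median and the tail rather than a single average (the identity H and the point set are made-up inputs):

```python
import numpy as np

def reprojection_errors(H, src, dst):
    """Per-point Euclidean error of dst vs. H applied to src.
    H: 3x3 homography; src, dst: (N, 2) matched pixel coordinates."""
    src_h = np.hstack([src, np.ones((len(src), 1))])  # to homogeneous
    proj = src_h @ H.T
    proj = proj[:, :2] / proj[:, 2:3]                 # back to pixels
    return np.linalg.norm(proj - dst, axis=1)

# Identity homography with one gross outlier correspondence:
H = np.eye(3)
src = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 7.0], [50.0, 50.0]])
dst = src.copy()
dst[3] += 60.0                                        # outlier match
err = reprojection_errors(H, src, dst)
print(err.mean())      # mean is dragged up by the single outlier
print(np.median(err))  # median stays at zero
```

If the median is small while the mean is large, re-running the inlier selection (e.g. a tighter RANSAC threshold) before the joint LM optimization usually helps more than tweaking the optimizer itself.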

Lmk your thoughts, thanks!

4 Comments
2025/02/04
06:16 UTC

2

gaze estimation models

Hi there, I am trying to classify pictures by which of 9 tiles they should be placed into. We receive 9 pictures out of order and then use those classifications to arrange them. I'm not super experienced with computer vision, but I have general Python experience and some data science.

I tried out a pretrained model via https://blog.roboflow.com/gaze-direction-position/, but I found it only worked with pictures that were more zoomed out, showing the whole head. Does anyone know of a model that could work for this task? I've seen a number of APIs and models with weights available, but as far as I can tell everything is focused on webcam-distance video, which makes sense, as that's probably more useful generally.

https://preview.redd.it/txmpggnca2he1.png?width=850&format=png&auto=webp&s=7a941bff7bb0472848c025e30ae4b24d29981030
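Whatever gaze model ends up working, the 9-tile part is just binning the estimated gaze angles into a 3×3 grid. A sketch, where the yaw/pitch ranges are hypothetical and would need calibrating to the actual camera setup:

```python
def gaze_to_tile(yaw, pitch, max_yaw=30.0, max_pitch=20.0):
    """Map gaze angles (degrees) to a tile index 0-8, row-major in a
    3x3 grid. Angle ranges are assumed and depend on the camera setup."""
    def bin3(value, max_abs):
        if value < -max_abs / 3:
            return 0
        if value > max_abs / 3:
            return 2
        return 1
    col = bin3(yaw, max_yaw)      # left / centre / right
    row = bin3(pitch, max_pitch)  # up / centre / down
    return row * 3 + col

print(gaze_to_tile(0.0, 0.0))      # 4: centre tile
print(gaze_to_tile(-25.0, -15.0))  # 0: top-left
print(gaze_to_tile(25.0, 15.0))    # 8: bottom-right
```

Since you receive all 9 pictures together, a useful refinement is assigning pictures to tiles jointly (each tile used exactly once, e.g. by ranking the per-tile scores) instead of classifying each picture independently.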

1 Comment
2025/02/04
06:04 UTC

4

Using RTMPose for multi-object detection

I'm using MMLab to deploy RTMPose for bee pose estimation.
I have deployed the model, but it only detects one bee and ignores the rest. How can I adapt it to multi-bee pose estimation?

1 Comment
2025/02/04
02:42 UTC

3

Best solution to construct an accurate 3D human body from 2D images?

What models out there do this really well? I am looking for something accurate that captures the small details.

3 Comments
2025/02/04
00:59 UTC

2

Training paddleocr on my custom dataset

Hello guys, can you help me with training PaddleOCR on my custom dataset? I have a folder containing images (lines of handwritten and printed text) and a label file (a txt file containing each image name and its label). How do I train it and export the model so I can run inference with it?

0 Comments
2025/02/04
00:13 UTC

1

SAM on instance segmentation

If you want to segment objects in a stack, and assuming you know the maximum number of objects that can be stacked, can you segment using classes ordered from top to bottom (item 1, item 2, item 3)?

0 Comments
2025/02/03
23:54 UTC

4

Looking for pose network recommendation

Hi, I've been researching CV quite intensively for about a month now and know my way around the basics. I have a business case in mind and a small working prototype, but I am looking for a definitive network/platform to implement my use case:

Requirements:

  • Permissive license for commercial closed source (project is self-funded, can't afford license fees atm)
  • Custom dataset training
  • Multi-class and multi-instance simultaneously on keypoints
  • 5-10 FPS on an edge device is acceptable, preferably with TFLite conversion. The classes have 4 and 2 keypoints respectively, so a simple architecture

Networks I am looking at:

YOLOv11 from Ultralytics seems to work technically, but there's the license issue.

RTMPose is only single-instance (AFAIK); RTMO is only single-class (AFAIK).

Currently looking at Detectron2, which checks the boxes but can be heavy for mobile resources.

I also considered MMRotate, because the position of said classes is important for my use case, but I have to check further.

My current knowledge is also quite limited, so any general advice is appreciated. Thanks!

11 Comments
2025/02/03
18:07 UTC

0

Quantum optimization for image segmentation and more algos

We’ve recently updated our free QML library with additional tools to help better understand and analyze quantum models, including: 

  • Quantum optimization for image segmentation – Provides the graph mapping for image segmentation and the formulation as a QUBO problem. Many quantum and quantum-inspired algorithms, such as quantum annealing and QAOA, can then be used to find the optimal segmentation mask.

  • Tensor network decomposition – One of the most effective tensor decompositions for compressing convolutional layers is the Tucker decomposition. This method breaks down the original four-dimensional weight tensor of a convolutional layer into multiple smaller tensors.

  • Quantum neural network statistics – Provides metrics to evaluate the balance between performance and complexity in quantum neural networks, including Expressibility and Entangling Capacity.

  • Quantum state visualizations – Explore quantum states with state space, q-sphere, phase disk, and Bloch sphere representations. 
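As a toy illustration of the first bullet (not the library's own code): binary segmentation can be posed as minimizing x^T Q x over x in {0,1}^n, with unary costs on the diagonal and smoothness penalties off-diagonal, here brute-forced on a hypothetical 4-pixel chain with made-up weights:

```python
import itertools
import numpy as np

# Hypothetical 4-pixel chain: unary costs push pixels 0,1 towards label 1
# and pixels 2,3 towards label 0; pairwise terms penalize disagreement
# between chain neighbours. (Illustrative numbers only.)
unary = np.array([-2.0, -2.0, 1.0, 1.0])   # negative = prefers x_i = 1
edges = [(0, 1), (1, 2), (2, 3)]
lam = 0.5                                  # smoothness weight

n = len(unary)
Q = np.diag(unary).astype(float)
for i, j in edges:
    # lam * (x_i - x_j)^2 = lam * (x_i + x_j - 2 x_i x_j) for binary x
    Q[i, i] += lam
    Q[j, j] += lam
    Q[i, j] += -lam
    Q[j, i] += -lam

# Brute force over all 2^n assignments; an annealer/QAOA replaces this step.
best = min(itertools.product([0, 1], repeat=n),
           key=lambda x: np.array(x) @ Q @ np.array(x))
print(best)   # pixels 0,1 labelled 1; pixels 2,3 labelled 0
```

For real images the brute-force minimizer is replaced by quantum annealing or QAOA over the same Q, which is exactly the QUBO handoff the library's graph mapping provides.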

Our goal is to demystify quantum algorithms and help scientists build more effective models for computation. The library is free to access, and for those looking to go deeper, there are also paid courses on QML Fundamentals and QML for Medical Imaging that accompany it. 

Check it out here: https://www.ingenii.io/library 

0 Comments
2025/02/03
17:18 UTC

8

Best Practices for Monitoring Object Detection Models in Production ?

Hey !

I’m a Data Scientist working in tech in France. My team and I are responsible for improving and maintaining an Object Detection model deployed on many remote sensors in the field. As we scale up, it’s becoming difficult to monitor the model’s performance on each sensor.

Right now, we rely on manually checking the latest images displayed on a screen in our office. This approach isn’t scalable, so we’re looking for a more automated and robust monitoring system, ideally with alerts.

We considered using Evidently AI to monitor model outputs, but since it doesn’t support images, we’re exploring alternatives.

Has anyone tackled a similar challenge? What tools or best practices have worked for you?

Would love to hear your experiences and recommendations! Thanks in advance!
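One lightweight pattern that works without image-aware tooling: track a cheap proxy signal per sensor (detections per hour, mean confidence, etc.) and alert when it drifts outside a rolling baseline. A hedged sketch; the window size, warm-up length, and 3-sigma threshold are placeholders to tune:

```python
from collections import deque

class DriftAlert:
    """Alert when a per-sensor metric leaves mean +/- k*std of a rolling window."""
    def __init__(self, window=48, k=3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def update(self, value):
        fires = False
        if len(self.history) >= 10:      # need a minimal baseline first
            mean = sum(self.history) / len(self.history)
            var = sum((v - mean) ** 2 for v in self.history) / len(self.history)
            fires = abs(value - mean) > self.k * var ** 0.5
        self.history.append(value)
        return fires

alert = DriftAlert(window=24, k=3.0)
# Stable hourly detection counts, then a sudden drop (e.g. a fouled lens):
readings = [20, 22, 19, 21, 20, 23, 18, 21, 20, 22, 19, 21, 0]
results = [alert.update(r) for r in readings]
print(results)   # the alert fires only on the final 0
```

One such monitor per sensor, feeding a paging/Slack hook, scales much better than eyeballing screens, and catches the common failure modes (dead camera, occlusion, severe drift) even though it never looks at pixels.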

9 Comments
2025/02/03
17:05 UTC

2

Target inference HW selection?

Question for the community:
When looking for inference HW what do you look for and where do you look for the information?
Or do you start with HW and size the SW/models/algos appropriately?

Full disclosure I work at Intel and am trying to learn how people select HW, say between things like Pi5, Lattepanda Mu, Jetson, other...?

Market research in the open :)

0 Comments
2025/02/03
16:46 UTC

2

Beginner Project: Tracking a Golf Clubhead Without Painful Data Labeling?

Hey everyone! I’m pretty new to computer vision and haven’t kept up with the latest literature. I’m working on a project to track a golf clubhead in real-time (or near real-time) across a sequence of images or videos. However, I’d rather not go through the painstaking process of labeling huge amounts of data if there’s a way around it.

I’ve been exploring existing datasets on Roboflow and even tried training something like YOLOv8 on this dataset (https://universe.roboflow.com/fp-fwgwb/golf-batch-12-skjfm), but I haven’t been able to get the results I’m looking for. Does anyone have suggestions for alternative approaches or resources that might help?

Any tips, references, or insights into more streamlined methods (that don’t involve massive manual labeling) would be greatly appreciated. Thanks in advance!

3 Comments
2025/02/03
16:19 UTC

13

How to become a Computer Vision engineer at BigTech?

Hi, I'm a fresher in computer vision. I primarily work on perception systems for unmanned vehicles, and I really want to join a Big Tech company eventually.

Can any insider tell me what separates a BigTech computer vision engineer from the rest?

Thanks in Advance!!

7 Comments
2025/02/03
15:54 UTC

1

Beginner learning CV. Suggestions for project topics

I'm looking for good project topics in CV where datasets are also available. I want to do something more unique than the ones already widely available.

4 Comments
2025/02/03
14:38 UTC

2

Can a YOLO pose estimation model also perform object recognition for classes without keypoints?

Hello, I couldn't find a solution in the ultralytics documentation. If I train a YOLO pose model to recognize keypoints for one class, can it also perform object detection for other classes without keypoints?

So, e.g., the class "chessboard" tracks the corners of a chessboard, and there are additional classes for all the pieces, like "White King" and "White Queen", which do not have keypoints themselves; only object detection is performed on them.
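A commonly suggested workaround (worth verifying against the current Ultralytics docs, since pose datasets fix one kpt_shape for the whole dataset) is to pad the keypoint-free classes with zeroed, visibility-0 keypoints in the label files. A sketch of writing such label lines, with hypothetical box and keypoint values:

```python
def pose_label_line(cls_id, box, keypoints, n_kpts=4):
    """One YOLO-pose label line: class cx cy w h, then n_kpts (x, y, v)
    triplets. Classes without keypoints are padded with (0, 0, 0),
    i.e. visibility 0, so every object has the same keypoint count."""
    kpts = list(keypoints) + [(0.0, 0.0, 0)] * (n_kpts - len(keypoints))
    fields = [cls_id, *box] + [f for kpt in kpts for f in kpt]
    return " ".join(str(f) for f in fields)

# class 0 = "chessboard" with 4 corner keypoints (normalized, visible):
board = pose_label_line(0, (0.5, 0.5, 0.9, 0.9),
                        [(0.1, 0.1, 2), (0.9, 0.1, 2),
                         (0.9, 0.9, 2), (0.1, 0.9, 2)])
# class 1 = "White King": box only, keypoints padded out:
king = pose_label_line(1, (0.3, 0.4, 0.05, 0.1), [])
print(board)
print(king)
```

The model then still predicts detection boxes for every class; the padded keypoints simply contribute nothing to the keypoint loss for the piece classes.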

4 Comments
2025/02/03
11:34 UTC

10

What is this colortag?

5 Comments
2025/02/03
09:57 UTC

29

I made an algorithm which detects the lane you're driving in! Details about the algorithm inside

Link to example video: Video. The light blue area represents the lane's region, as detected by the algorithm.

Hi! I'm Ari Barzilai. As part of a university CV course in my Bachelor's degree, my colleague Avi Lazerovich and I developed a lane detection algorithm. One of the criteria was that we were not allowed to use neural networks: this uses only classic CV techniques and an algorithm we developed along the way.

If you'd like to read more about how we made this, you can check out the (not academically published) paper we wrote as part of the project, which goes into detail about the algorithm and why we made it the way we did: Link to Paper

I'd be eager to hear feedback from people in the field. Please let me know what you think!

If you'd like to collab or discuss anything else, I'm best reached via LinkedIn; I'll only be checking this account periodically.

Cheers, Ari!

5 Comments
2025/02/03
08:49 UTC

3

Help: Streaming Jetson screen to PC using TCP/RTSP with GStreamer

Hello everyone,

I’m currently learning GStreamer and would like to stream my Jetson screen to my PC. I’ve managed to achieve this using UDP, but I’m encountering some challenges with TCP and RTSP. Here’s what I’ve done so far:

UDP Setup

Server-side command:

gst-launch-1.0 ximagesrc ! "video/x-raw" ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=32000000 ! h264parse ! rtph264pay ! udpsink host=192.168.100.4 port=8554 -e

Client side:

gst-launch-1.0 udpsrc port=8554 ! application/x-rtp ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink 

However, when using UDP, I experience a lot of artifacts when moving windows around.

UDP Streaming with artifacts.

Trying TCP: I attempted to switch to TCP by replacing the sink and source elements with tcpserversink and tcpclientsrc. Here’s what I used:

Server-side command:

gst-launch-1.0 ximagesrc ! "video/x-raw" ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1' ! nvv4l2h264enc maxperf-enable=1 bitrate=32000000 ! h264parse ! rtph264pay ! tcpserversink host=0.0.0.0 port=8554 -e 

Client-side command:

gst-launch-1.0 tcpclientsrc host=192.168.100.20 port=8554 ! application/x-rtp, encoding-name=H264, payload=96 ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink

However, on the client side, I get the following error:

Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
ERROR: from element /GstPipeline:pipeline0/GstTCPClientSrc:tcpclientsrc0: Internal data stream error.
Additional debug info:
../libs/gst/base/gstbasesrc.c(3177): gst_base_src_loop (): /GstPipeline:pipeline0/GstTCPClientSrc:tcpclientsrc0:
streaming stopped, reason error (-5)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

I also attempted to use RTSP, referencing this post: https://community.hailo.ai/t/sending-gstreamer-pipeline-output-over-rtsp/135 , but I couldn’t get it to work with the provided examples. I’ve also checked other forums, such as the NVIDIA developer forums, but the solutions I found didn’t help much.

Question: Is there a way to stream the Jetson screen to my second PC using TCP or RTSP? If so, could someone guide me on how to set up the pipelines correctly? Any suggestions or examples would be greatly appreciated!

Additional Question:
On the Jetson, I've used NVIDIA hardware-accelerated encoding and managed to achieve around 100 ms latency; without hardware acceleration, it was around 300 ms. I don't have much experience with video encoding and decoding (yes, I know Wi-Fi latency has an impact; I get 100/80 Mbps down/up and my ping is stable at 4 ms), but is this level of performance expected when using hardware acceleration? On my PC I haven't (yet) set up hardware-accelerated decoding.

For reference, my PC has an Intel i7-14th Gen CPU and an NVIDIA RTX 4060 Mobile GPU.

Thank you in advance for your help!

1 Comment
2025/02/03
08:36 UTC

5

Can Disaster Management and Rescue Problems Be Solved Using Computer Vision and Imaging Science?

I am a beginner in computer vision, but I have implemented some basic applications and developed an interest in the field. I am planning to pursue a master's in Computer Vision and Imaging Science, and for my thesis, I want to research a topic related to disaster management and rescue. However, while searching for existing research papers, I couldn’t find many studies in this area. This made me wonder whether disaster management and rescue can effectively integrate with computer vision and imaging science.

8 Comments
2025/02/03
07:16 UTC

12

MOT library recommendations

I am working on an object-tracking application in which the object detector gives me the bounding boxes, classes, and confidences, and I would like to track them. It can miss objects sometimes and detect them again some frames later. I tried IoU-based methods like ByteTrack and BoT-SORT, which are integrated in the Ultralytics library, but since the FPS is not that great (it's edge inference on a Jetson) and the objects sometimes move randomly, there is little or no overlap between bounding boxes in consecutive frames. So I feel a distance-based approach would be best. I tried the DeepSORT tracker, but that adds substantial delay to the system, as it is another neural network running after the detector. Plus, the objects are mostly visually similar in appearance.

I also implemented my own tracker using bipartite graph matching with the Hungarian algorithm, with IoU / pixel Euclidean distance / a mix of the two as the cost matrix, but there is no thresholding as of now. So it looks like I'd end up writing my own tracking library, which feels intimidating.

I have started using Norfair, which does motion compensation and uses a Kalman filter, after learning about it on Reddit/ChatGPT. I've found it fairly good, but feel that some features are missing and that more documentation could be added to help understand it.

I want to know what are folks using in such a case.

Summary of solutions that I have tried.

ByteTrack and BoT-SORT from Ultralytics, DeepSORT, Hungarian matching (IoU / pixel Euclidean distance / a mix of the two as the cost matrix), Norfair.

Thanks a lot in advance!
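On the missing thresholding in the Hungarian matcher: the usual trick is gating, i.e. forbid any track-detection pair whose cost exceeds a cutoff and report those tracks/detections as unmatched. A dependency-free sketch using a greedy matcher (with SciPy available, `linear_sum_assignment` plus the same gate gives the optimal version); the cost values and the gate of 1.0 are made up:

```python
import numpy as np

def gated_match(cost, gate):
    """Greedy min-cost matching; pairs with cost > gate stay unmatched.
    Returns (matches, unmatched_rows, unmatched_cols)."""
    cost = cost.copy().astype(float)
    matches = []
    while True:
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        if cost[i, j] > gate:
            break                       # everything left is too costly
        matches.append((int(i), int(j)))
        cost[i, :] = np.inf             # row and column are consumed
        cost[:, j] = np.inf
    rows = set(range(cost.shape[0])) - {i for i, _ in matches}
    cols = set(range(cost.shape[1])) - {j for _, j in matches}
    return matches, rows, cols

# 3 tracks x 3 detections; track 2 has no nearby detection (all costs big):
cost = np.array([[0.2, 5.0, 9.0],
                 [4.0, 0.3, 8.0],
                 [7.0, 6.0, 9.5]])
print(gated_match(cost, gate=1.0))
# tracks 0,1 matched; track 2 and detection 2 are left unmatched
```

Unmatched tracks then age out after N missed frames and unmatched detections spawn tentative tracks, which is essentially what ByteTrack/Norfair do internally; adding the gate is usually all that separates a home-grown Hungarian matcher from a usable tracker.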

8 Comments
2025/02/03
05:52 UTC
