/r/computervision


Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question that includes details so we can help you in a proper manner!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group


109,143 Subscribers

2

Segment anything for small objects

If I want to segment out individual chairs in an image of a stack of chairs (like in a cafeteria after cleanup), could I use Unity or some other 3D engine to generate training data for the mask-prediction part of the SAM model? Since SAM already segments well at small scales, would a little guidance from supervised fine-tuning help it converge?

I assume the sim-to-real gap for synthetic data isn’t too bad, given how capable the model is and the fact that you can give it prompts.

5 Comments
2025/02/01
21:55 UTC

0

Instant-NGP: 3D Reconstruction in Seconds with Optimized NeRF

NeRF has shown some impressive 3D reconstruction results, but there’s one problem: it’s slow. NVIDIA released Instant-NGP, which solves this problem by optimizing NeRF and other neural graphics primitives (using a multiresolution hash encoding) so that they run significantly faster. With this method, you can do 3D reconstruction in a matter of seconds. Check it out!
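For readers wondering where the speed-up comes from: the Instant-NGP paper stores trainable feature vectors in hash tables, one per grid resolution level, and finds the grid vertices around each sample point with a spatial hash. Below is a minimal pure-Python sketch of just that indexing step, assuming the paper's per-dimension primes and its geometric schedule of grid resolutions. The feature tables, trilinear interpolation and small MLP are left out, and the function names and example values are illustrative, not the official code.

```python
import math

# Per-dimension primes from the Instant-NGP paper's spatial hash.
PRIMES = (1, 2654435761, 805459861)

def level_resolutions(n_min, n_max, n_levels):
    """Grid resolution per level: N_l = floor(N_min * b**l), with b chosen so the last level is ~N_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return [math.floor(n_min * b ** level) for level in range(n_levels)]

def spatial_hash(vertex, table_size):
    """h(x) = (x_1*pi_1 XOR x_2*pi_2 XOR x_3*pi_3) mod T for an integer grid vertex."""
    h = 0
    for coord, prime in zip(vertex, PRIMES):
        h ^= coord * prime
    return h % table_size

def corner_slots(point, resolution, table_size):
    """Hash-table slots of the 8 grid vertices around a point in [0, 1]^3 at one level."""
    base = [math.floor(x * resolution) for x in point]
    return [
        spatial_hash((base[0] + dx, base[1] + dy, base[2] + dz), table_size)
        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
    ]

# Example setup (values assumed for illustration): 16 levels from 16 to 512, T = 2**19.
resolutions = level_resolutions(16, 512, 16)
slots = corner_slots((0.3, 0.6, 0.9), resolutions[-1], 2 ** 19)
```

Because T is a power of two here, Python's unbounded integers give the same low bits as fixed-width unsigned arithmetic would. In the paper, coarse levels whose dense grid fits in the table are indexed directly instead of hashed.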

0 Comments
2025/02/01
21:09 UTC

2

CV applied to spacecraft

Hello,

For those of you who work in robotics and on spacecraft, can you talk about the techniques you use and the challenges you face?

I am doing a project to estimate the pose of a spacecraft for docking, using classical CV.

2 Comments
2025/02/01
20:45 UTC

1

Learning Material on Image Acquisition

Hey everyone,

I'm just getting started with Basler cameras for a computer vision project, and I'm pretty new to image acquisition. There are a lot of concepts I need to learn to properly set up the camera and environment for optimal results—like shutter speed, which I only recently discovered.

Does anyone know of any good courses or structured learning materials that cover image acquisition settings and techniques?

1 Comment
2025/02/01
19:49 UTC

1

Chessboard dimensions (camera calibration)

I'm calibrating my camera with a square 9×9 chessboard, but I have noticed that many articles use a rectangular 9×6 one. Does the shape matter for the quality of the calibration?
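One detail that often causes confusion when comparing articles: OpenCV's `patternSize` counts inner corners per row and column, not squares, so the "9×6" board in the tutorials has 10×7 squares. A minimal sketch of the usual object-point grid for either board, assuming NumPy and a square size in whatever unit you want the calibration results in (e.g. millimetres):

```python
import numpy as np

def chessboard_object_points(pattern_size, square_size):
    """Planar 3D coordinates (Z = 0) of the inner corners, row by row
    (x varies fastest), matching OpenCV's documented corner ordering.

    pattern_size: (columns, rows) of inner corners, e.g. (9, 6) or (9, 9).
    square_size:  edge length of one square in your chosen unit.
    """
    cols, rows = pattern_size
    objp = np.zeros((cols * rows, 3), np.float32)
    # mgrid gives shape (2, cols, rows); transposing makes x vary fastest
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size
    return objp
```

For each view, this array goes into `objpoints` and the corners from `cv2.findChessboardCorners(gray, pattern_size)` go into `imgpoints`; both lists are then passed to `cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)`.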

3 Comments
2025/02/01
17:05 UTC

3

Corner detection: which method is suitable for this image?

Given the following image

https://preview.redd.it/qwoxz1mlijge1.png?width=956&format=png&auto=webp&s=31efbc9943454d5848ec9e04f0a1d427c590c89d

When using Harris corner detection (from scikit-image), it finds most of the corners but misses the two center points, maybe because the angle there is too wide to be considered a corner.

https://preview.redd.it/z4ekdbt1jjge1.png?width=956&format=png&auto=webp&s=9fbeb7b178ccc05f9a36d4d3e4ca7ff62706abe4

The question is: can this be done with a corner-based approach, or should I detect lines instead? (I have tried some sample code, but haven't gotten good results yet.)

https://preview.redd.it/w1q36waejjge1.png?width=864&format=png&auto=webp&s=efd5dec446771fad012bd1501e94d6e634cf50c0

Edit, additional info: the small line segment outside the polygon is a reference of known length, so I can later calculate the area of the polygon.
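An alternative to a corner response is to take the polygon's outline (for example an ordered contour) and simplify it with the Ramer-Douglas-Peucker algorithm: a wide-angle vertex survives as long as it deviates from the straight chord by more than a tolerance. A minimal pure-Python sketch, assuming the outline is already available as an ordered list of (x, y) pixel points that returns to its start, and that the reference segment's length in pixels has been measured separately. The area step uses the shoelace formula and assumes the camera looks roughly straight down (no perspective correction).

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b (or to a, if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep only points deviating more than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    dists = [_point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= epsilon:
        return [points[0], points[-1]]
    split = i + 1  # index of the farthest point within `points`
    left = rdp(points[:split + 1], epsilon)
    right = rdp(points[split:], epsilon)
    return left[:-1] + right  # drop the duplicated split point

def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order (either direction)."""
    s = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def area_in_real_units(vertices_px, ref_length_px, ref_length_real):
    """Scale a pixel area using a reference segment of known length."""
    scale = ref_length_real / ref_length_px  # real units per pixel
    return shoelace_area(vertices_px) * scale ** 2
```

With OpenCV, `cv2.findContours` followed by `cv2.approxPolyDP` performs the same Douglas-Peucker simplification on a binary mask. The tolerance `epsilon` trades robustness to jagged edges against the ability to keep very shallow vertices.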

4 Comments
2025/02/01
14:51 UTC

1

Bird's-eye view wireframing

Hi, are there any algorithms you would recommend for placing wireframes on a person seen from a bird's-eye view? The algorithms I’ve tried so far don’t seem very robust.

0 Comments
2025/02/01
11:01 UTC

1

How to batch images without padding while keeping the aspect ratio?

Given a set of images with different sizes and aspect ratios, I want to put them into one batch, but with:

- the aspect ratio kept;

- no padding.

Does anyone know a way to do this?

(Or how is Qwen2-VL able to do this?)
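One way to batch images without padding or distortion is to stop treating the batch as a 4D tensor: cut each image into fixed-size patches, concatenate all patch tokens into one long sequence, and record where each image starts so attention stays within each image. As far as I understand, this is the idea behind Qwen2-VL's "naive dynamic resolution", but check the official implementation for its exact patch size, resizing and token-merging rules. A minimal NumPy sketch, assuming each image has been resized (keeping its aspect ratio as closely as possible) so its height and width are multiples of the patch size; `snap_to_patches` is a hypothetical helper for that step.

```python
import numpy as np

def snap_to_patches(h, w, patch=14):
    """Round (h, w) to the nearest multiples of the patch size, approximately keeping the aspect ratio."""
    return max(patch, round(h / patch) * patch), max(patch, round(w / patch) * patch)

def pack_images(images, patch=14):
    """Flatten each (H, W, C) image into patch tokens and concatenate them into one sequence.

    Returns (tokens, cu_seqlens, grids): tokens has shape (total_patches, patch*patch*C),
    cu_seqlens[i]:cu_seqlens[i+1] are image i's tokens, grids[i] is its (rows, cols) of patches.
    """
    tokens, lengths, grids = [], [], []
    for img in images:
        h, w, c = img.shape
        if h % patch or w % patch:
            raise ValueError("resize so height and width are multiples of the patch size first")
        gh, gw = h // patch, w // patch
        # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c): one row per patch
        p = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
        tokens.append(p.reshape(gh * gw, patch * patch * c))
        lengths.append(gh * gw)
        grids.append((gh, gw))
    cu_seqlens = np.concatenate([[0], np.cumsum(lengths)])
    return np.concatenate(tokens, axis=0), cu_seqlens, grids

def block_diagonal_mask(cu_seqlens):
    """Boolean attention mask: token i may attend to token j only within the same image."""
    total = int(cu_seqlens[-1])
    mask = np.zeros((total, total), dtype=bool)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        mask[start:end, start:end] = True
    return mask
```

In practice, variable-length attention kernels (such as FlashAttention's varlen functions) take the `cu_seqlens` array directly instead of a dense mask; the mask here is only for illustration. Position information per image can be recovered from `grids`.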

1 Comment
2025/02/01
10:46 UTC

2

Novel view synthesis, NeRF vs Gaussian splatting

Hello everyone.

For context, I am currently working on a project about evaluating SfM methods in various ways, and one of them involves something new to me called novel view synthesis.

I am exploring NeRF and Gaussian Splatting, but I am not sure which is the best approach in the context of evaluating novel view synthesis.

Does anyone have any advice or experience in this area?

7 Comments
2025/02/01
09:37 UTC

2

Best service for cropping/segmenting images?

I'm building a tool where you upload a photo of a bunch of video games, and GPT-4 extracts the title of each game from the image. Then it gets price data for each game.

I'm running into a problem and need some help. When the image contains too many games, GPT-4 starts to perform poorly. I've found that when I manually crop those same images and send in just one game at a time, it works perfectly.

How can I do pre-processing so that it will crop or segment each game and increase the accuracy? Is there a good service for this?

Btw, here is the tool so you can see how it works:
https://frontend-production-bca1.up.railway.app/

1 Comment
2025/02/01
01:32 UTC

1

Can we accelerate Stable Video Diffusion generation of a single video with multiple GPUs?

Hi everyone. Is it possible to speed up Stable Video Diffusion generation of a single video with multiple GPUs? I have been reading papers and trying to figure out this problem for a few days. It seems the video generation process is strongly sequential, both in the denoising process and in the video generation sequence, making it impossible to accelerate it by, for example, using different GPUs to generate different frames.

It seems the only possibility is to accelerate the denoising process through something like tensor parallelism, but this also seems hard since the U-Net's blocks are not regular attention blocks (MLP + multi-head attention).

Does anyone have some related experience? Any suggestion helps. Thank you!

0 Comments
2025/01/31
22:53 UTC

6

Crowd Sourcing Computer Vision Dataset Needs

Hi All,

I've been following this channel for months, and have loved seeing the amazing work happening here. As someone deeply involved in synthetic data generation, I want to give back to this awesome community.

I work for a company that specializes in creating synthetic datasets, and I'm reaching out to understand exactly what you need. Our recent pose estimation dataset was released to help the community, and now we want to tackle the datasets that will truly move your projects forward.

Some areas we're particularly interested in exploring:

  • Object detection in challenging environments
  • Semantic segmentation for complex scenes
  • Multi-object tracking scenarios
  • Anomaly detection datasets
  • Domain-specific imaging (Offroad autonomous driving, UAV, etc.)

Your input is crucial. What datasets would make your CV work easier, faster, or more precise? What specific challenges are you facing in data collection?

https://huggingface.co/posts/DualityAI-RebekahBogdanoff/175052732651947 - This is the post we shared on HF to get more information.

For the comments that get traction, I will update and share the datasets on HF and our site. Drop in your requests; I would love to help!

4 Comments
2025/01/31
22:24 UTC

2

I am working on real-time semantic segmentation models and would like to know where to find recent temporally consistent models.

I see a lot of repositories from 5-6 years ago, such as FlowNet + semantic segmentation.

Does anyone know of any recent open-source models that are temporally consistent? Thank you!

1 Comment
2025/01/31
20:30 UTC

101

Computer vision feeling stagnant in the age of LLMs? Am I the only one?

I've been following the rapid progress of LLMs with a mix of excitement and, honestly, a little bit of unease. It feels like the entire AI world is buzzing about them, and rightfully so: their capabilities are mind-blowing. But I can't shake the feeling that this focus has inadvertently cast a shadow on the field of Computer Vision.

Don't get me wrong, I'm not saying CV is dead or dying. Far from it. But it feels like the pace of groundbreaking advancements has slowed down considerably compared to the explosion of progress we're seeing in NLP and LLMs. Are we in a bit of a lull?

I'm seeing so much hype around LLMs being able to "see" and "understand" images through multimodal models. While impressive, it almost feels like CV is now just a supporting player in the LLM show rather than the star of its own.

Is anyone else feeling this way? I'm genuinely curious to hear the community's thoughts on this. Am I just being pessimistic? Are there exciting CV developments happening that I'm missing? How are you feeling about the current state of Computer Vision? Let's discuss! I'm hoping to spark a productive conversation.

43 Comments
2025/01/31
20:16 UTC

0

Pre-trained weights

Hi! Can anyone help me find weights trained to localize and classify blood cells for an RT-DETR-based detection algorithm?

0 Comments
2025/01/31
14:46 UTC

3

How is computer vision related to graphics and images?

CV noob here. I may have to take a course in CV next, and I was wondering: is working with CV the same as working with graphical representations (like in games and animations, with rotations, translations, matrices, etc.)? I didn’t really enjoy working with games and graphics, so if it's too similar, then CV is not for me.

8 Comments
2025/01/31
13:22 UTC

3

How to Handle Image Reflection and Dirty Camera Artifacts

Hey everyone,

I'm working on an image classification and object detection model, but I’m running into issues with image reflections and dirty camera artifacts (e.g., sand, dust, smudges). These distortions are causing a lot of false positives and impacting model performance.

I'm trying to add new data augmentation techniques to simulate these distortions, but the results are still not good.

Has anyone dealt with similar problems before? Do you know any other technique that can help me in this situation?
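As one possible way to simulate the dirt part (not reflections), here is a minimal NumPy sketch of a "dirty lens" augmentation: soft Gaussian blobs where the image is locally blurred and tinted toward a dull grey/brown. The blob count, sizes, opacities and tint range are arbitrary assumptions, not tuned values, and the function names are hypothetical.

```python
import numpy as np

def _box_blur(img, k=7):
    """Mean filter with edge padding; img is float (H, W, C)."""
    pad = k // 2
    p = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def add_lens_smudges(image, n_blobs=3, max_radius_frac=0.15, max_opacity=0.6, rng=None):
    """Overlay soft, blurred, tinted blobs on a uint8 (H, W, 3) image to mimic dirt on the lens."""
    rng = np.random.default_rng() if rng is None else rng
    img = image.astype(np.float32) / 255.0
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    alpha = np.zeros((h, w), np.float32)
    for _ in range(n_blobs):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        r = rng.uniform(0.03, max_radius_frac) * min(h, w)
        blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * r ** 2))  # Gaussian falloff
        alpha = np.maximum(alpha, rng.uniform(0.2, max_opacity) * blob)
    tint = rng.uniform(0.2, 0.5, size=3).astype(np.float32)  # dull grey/brown dirt colour
    dirt = 0.5 * _box_blur(img) + 0.5 * tint                  # blurred, darkened content under the smudge
    a = alpha[..., None]
    out = (1.0 - a) * img + a * dirt
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```

Because bounding boxes and labels are unchanged by this transform, it can be applied on top of an existing detection pipeline; whether it helps is something only validation on real dirty-lens frames can show.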

9 Comments
2025/01/31
11:42 UTC

3

A newbie trying to get advice

I am new to ML, and I'm making a project for vehicle detection using drone videos captured at a height of about 200 meters as input, so I am thinking about which models I should train for this application. Processing is done after the flight. I am currently thinking of training YOLOv8x on VisDrone data and later training it on custom data after collecting it. The final output is going to be the entire trajectory of each vehicle in the video.

Can someone help me out: is this the correct direction, or do I need to train a different model? Accuracy is a priority. Please give some general advice on how you would approach this, or things I need to watch out for.
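Since the final output is a trajectory per vehicle, the detector's per-frame boxes still have to be linked over time. A minimal pure-Python sketch of a greedy IoU tracker, assuming boxes as (x1, y1, x2, y2) pixel tuples per frame; this is a simplified stand-in for trackers such as SORT or ByteTrack, which add motion models and optimal assignment, and the thresholds are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def build_trajectories(frames, iou_threshold=0.3, max_missed=5):
    """Greedy IoU tracker.

    frames: list (one entry per video frame) of lists of detection boxes.
    Returns {track_id: [(frame_index, (cx, cy)), ...]}.
    """
    active = {}        # track_id -> {"box": last matched box, "missed": consecutive misses}
    trajectories = {}
    next_id = 0
    for t, boxes in enumerate(frames):
        unmatched = list(range(len(boxes)))
        for tid, track in list(active.items()):
            # Greedily take the unmatched detection with the highest IoU above the threshold.
            best, best_iou = None, iou_threshold
            for j in unmatched:
                score = iou(track["box"], boxes[j])
                if score >= best_iou:
                    best, best_iou = j, score
            if best is None:
                track["missed"] += 1
                if track["missed"] > max_missed:
                    del active[tid]  # give up on tracks lost for too long
                continue
            unmatched.remove(best)
            track["box"], track["missed"] = boxes[best], 0
            trajectories[tid].append((t, center(boxes[best])))
        # Every detection left over starts a new track.
        for j in unmatched:
            active[next_id] = {"box": boxes[j], "missed": 0}
            trajectories[next_id] = [(t, center(boxes[j]))]
            next_id += 1
    return trajectories
```

With small vehicles seen from 200 m, a few pixels of motion between frames can already drop the IoU sharply, so a lower threshold, a motion model, or compensating for drone motion may be needed.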

2 Comments
2025/01/31
08:44 UTC

5

Segmentation of overlapping objects

I have this image containing overlapping objects. I want to find out the mask of each object.

What I tried:
- SAM doesn't segment properly when given only the image. It segments properly when some points covering each part of the object are given as input along with the image.
- I trained YOLO and Detectron models on my data. YOLO doesn't even detect each object properly. Detectron detects objects and gives bounding boxes better than YOLO (but still not the best), but fails at segmentation. I have a dataset of 100 images, which I augmented to thousands of images to train the models.
- I could take the segmentation points from Detectron and give them to SAM as input along with the image. But Detectron doesn't segment well enough to cover each part of the overlapping objects, so SAM can't perform well.

Help me approach this problem. Any suggestions or links to research papers related to this are appreciated.
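Since SAM works when the prompt points cover every part of the object, one option is to turn a coarse mask (e.g. from Detectron) into a handful of well-spread points automatically. A minimal NumPy sketch using farthest point sampling, assuming a boolean (H, W) mask per object; the function name and the choice of starting point are illustrative assumptions.

```python
import numpy as np

def sample_prompt_points(mask, k=5):
    """Pick up to k well-spread (x, y) points inside a binary mask using farthest point sampling.

    Starts from the mask pixel nearest the centroid, then repeatedly adds the pixel
    farthest from all points chosen so far, so the prompts cover every part of the object.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return np.empty((0, 2), dtype=np.float32)
    pts = np.stack([xs, ys], axis=1).astype(np.float32)  # SAM point prompts are (x, y)
    first = int(np.argmin(np.linalg.norm(pts - pts.mean(axis=0), axis=1)))
    chosen = [first]
    dist = np.linalg.norm(pts - pts[first], axis=1)
    for _ in range(1, min(k, len(pts))):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(pts - pts[idx], axis=1))
    return pts[chosen]
```

With the `segment_anything` package, the points for one object could be passed as `predictor.predict(point_coords=pts, point_labels=np.ones(len(pts)), multimask_output=False)` after `predictor.set_image(image)`, optionally with the Detectron box via `box=`. Eroding the coarse mask first keeps the points away from its unreliable boundary, and points sampled from neighbouring objects' masks could be added with label 0 as negative prompts.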


3 Comments
2025/01/31
06:02 UTC

1

OAK-D Pro

How do I get ROS 2 packages onto a Raspberry Pi? I don't get how it works. I have a project building a search and rescue robot using an OAK-D Pro 9782, and we're going to use Linux. Any suggestions?

BTW, any advice on how to categorize the data types for a stereo depth camera? I'm a volunteer on a Senior Design Project and I don't understand what the professor is saying. Any assistance is appreciated, thank you!

2 Comments
2025/01/31
05:18 UTC

4

DINOv2 for Semantic Segmentation


https://debuggercafe.com/dinov2-for-semantic-segmentation/

Training semantic segmentation models is often time-consuming and compute-intensive. However, with the powerful self-supervised DINOv2 backbones, we can drastically reduce the training compute and time. Using DINOv2, we can just add a semantic segmentation head on top of the pretrained backbone and train a few thousand parameters for good performance. This is exactly what we are going to cover in this article. We will modify the DINOv2 backbone, add a simple pixel classifier on top of it, and train DINOv2 for semantic segmentation.

https://preview.redd.it/3mqlsl9958ge1.png?width=1000&format=png&auto=webp&s=d426e349ddb470991a259def1d795cfbf4d2fca9
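To put "a few thousand parameters" in perspective: assuming the head is a single linear (1×1 convolution) layer on DINOv2 ViT-S/14 patch features, whose embedding dimension is C = 384, and K = 21 classes (e.g. Pascal VOC's 20 classes plus background), the trainable parameter count is the weights plus one bias per class. The class count is an assumption for illustration, not taken from the article.

```latex
\text{params} = C \cdot K + K = 384 \cdot 21 + 21 = 8064 + 21 = 8085
```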

7 Comments
2025/01/31
00:31 UTC

5

What's the Current State of Computer Vision in Medical Imaging?

Hi,

I’ve been thinking a lot about this lately and wanted to get some insights on the current state of computer vision and image processing in medical imaging. I was recently offered an internship in cardiac video segmentation, but I’m wondering if this field still has a strong future given the rapid advancements in AI.

2 Comments
2025/01/30
23:45 UTC

5

Favourite Computer Vision Papers

What are your favorite computer vision papers?

Gotta travel a bit and need something nice to read.

It can be any paper, including ones that are just nice and fun to read.

2 Comments
2025/01/30
23:13 UTC

10

Understanding Vision Transformers

I want to start learning about vision transformers. What prior knowledge do you recommend having before I start learning about them?

I have worked with and understand CNNs, and I am currently learning about text transformers. What else do you think I would need to understand vision transformers?

Thanks for the help!

9 Comments
2025/01/30
22:41 UTC

14

Created a background remover arena like LMSYS to benchmark APIs

5 Comments
2025/01/30
21:11 UTC

4

Siamese Neural Network for Object Detection / Template Matching

Is it possible to use a Siamese Neural Network for finding all instances of a given template in an image? Similar to the Template Matching with Multiple Objects at the end of https://docs.opencv.org/4.x/d4/dc6/tutorial_py_template_matching.html but also output a score of how similar the object is to the template?

My thought is that SNNs are more robust to changes in object appearance, size, and lighting than classical template matching. The actual objects I'm looking for are arbitrary items in a video feed, i.e., a person draws a bounding box around an object (one that may not be part of the training set of a normal object detector/classifier like YOLO) in one of the first frames, and then the tracker searches for that item in subsequent frames. I know there are trackers like DaSiamRPN, which works pretty well as long as the object stays in the frame and is maybe only briefly occluded (I tested the implementation in OpenCV), but I want to account for the object completely leaving the frame, possibly for hundreds of frames.

I played around with DeepSORT, and it supposedly uses "Re-ID", but it seems they use Re-ID to mean association via visual similarity between a detection and the tracks it's maintaining, as opposed to long-term re-identification.
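A minimal NumPy sketch of the SiamFC-style idea: embed the template and the search image with the same (shared-weight) network, then slide the template embedding over the search embedding and read similarity off the resulting score map. The embedding network is assumed and not shown; the features passed in could come from any backbone. SiamFC itself uses a plain cross-correlation; the normalized (cosine) score here is a choice made so that a fixed threshold can decide whether the object has re-entered the frame.

```python
import numpy as np

def cross_correlation_map(search_feat, template_feat):
    """Slide template features (Ht, Wt, C) over search features (Hs, Ws, C).

    Returns a (Hs - Ht + 1, Ws - Wt + 1) map of cosine similarities.
    """
    ht, wt, _ = template_feat.shape
    hs, ws, _ = search_feat.shape
    t = template_feat.ravel()
    t = t / (np.linalg.norm(t) + 1e-8)
    out = np.empty((hs - ht + 1, ws - wt + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = search_feat[y:y + ht, x:x + wt].ravel()
            out[y, x] = window @ t / (np.linalg.norm(window) + 1e-8)
    return out

def find_instances(score_map, threshold=0.8):
    """Return (y, x, score) for every position at or above the threshold, best first."""
    ys, xs = np.nonzero(score_map >= threshold)
    hits = [(int(y), int(x), float(score_map[y, x])) for y, x in zip(ys, xs)]
    return sorted(hits, key=lambda h: -h[2])
```

On raw pixels, this score is the same quantity that `cv2.matchTemplate` computes with `cv2.TM_CCORR_NORMED`; the difference is using learned features. For long-term re-ID, the map could be computed over the whole frame while the tracker reports the object as lost, re-acquiring it once the best score passes the threshold. Overlapping hits around one peak would still need non-maximum suppression, and 0.8 is an arbitrary placeholder.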

5 Comments
2025/01/30
21:02 UTC

0

Resume Review for FT CV/perception roles starting summer 2025.

Hi all, I have been getting only rejections from all the relevant CV/perception roles I have been applying to. Some require PhDs or papers in top conferences. It seems like my resume might not be up to the mark.

So I would like an honest roast or review of my resume, and any suggestions for improving my profile.
Thank you for your time. ANY SUGGESTION IS GREATLY APPRECIATED!

https://preview.redd.it/lpa1h6mcp6ge1.png?width=788&format=png&auto=webp&s=5b3166a198895a53aeb39e079435dd38a4528bac

1 Comment
2025/01/30
19:40 UTC

6

Looking for Co-Founder in CV/AR: Augmented Endodontics

I am looking for a computer vision AI engineer to join me as a co-founder on a project that I am calling Augmented Endodontics.

The overarching goal is to take stereo image data of surgical microscope videos from root canal procedures, re-project the scene depth and use depth to register 3D CBCT model overlays. (see video)

I am a DDS/PhD and Endodontist and have been working on this project for many years now. If you are interested in discussing the project more, please email me at jsimon.endo@gmail.com

https://reddit.com/link/1idulfe/video/94aze60ji6ge1/player
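For the depth re-projection step, the basic relation for a calibrated, rectified stereo pair is Z = f·B/d (focal length in pixels, baseline, disparity in pixels), followed by pinhole back-projection of each pixel. A minimal sketch assuming rectified images and known intrinsics; the numbers below are made up for illustration, not surgical-microscope measurements, and the registration to the CBCT model (e.g. ICP on the resulting points) is not shown.

```python
def disparity_to_depth(disparity_px, focal_px, baseline):
    """Depth Z = f * B / d for a rectified stereo pair; Z comes out in the units of the baseline."""
    if disparity_px <= 0:
        return float("inf")  # no valid match / point at infinity
    return focal_px * baseline / disparity_px

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth to camera coordinates (X, Y, Z)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

# Illustrative numbers only: f = 1000 px, baseline = 25 mm, disparity = 50 px
z = disparity_to_depth(50, 1000, 25)                    # 500.0 (mm)
point = backproject(420, 240, z, 1000, 1000, 320, 240)  # (50.0, 0.0, 500.0)
```

OpenCV provides the dense, per-pixel version of this as `cv2.reprojectImageTo3D(disparity, Q)`, with `Q` obtained from `cv2.stereoRectify`.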

0 Comments
2025/01/30
19:03 UTC

9

Giving ppl access to free GPUs - would love beta feedback🦾

Hello! I’m the founder of a YC backed company, and we’re trying to make it very easy and very cheap to train ML models. Right now we’re running a free beta and would love some of your feedback.

If it sounds interesting feel free to check us out here: https://github.com/tensorpool/tensorpool

TLDR; free GPUs😂

15 Comments
2025/01/30
18:27 UTC
