/r/computervision

Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question with enough detail for us to help you properly!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group

104,418 Subscribers

1

Model Deployment Cost Estimate

I developed a model that performs detections on video streams, and I want to deploy it on CCTV in practical scenarios. Say I take the stream input as RTSP and create a pipeline; I want a rough estimate of how much it will cost to deploy this model on AWS. To give a broader perspective, the end result would be, say, a website where you can put in your RTSP link and it starts performing detections and gives you the detection outputs. How do I calculate these costs?
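
A rough monthly estimate usually comes down to GPU instance hours plus data transfer. A minimal sketch, where the instance type, per-hour price, and streams-per-GPU figure are all assumptions you should replace with measured numbers and current AWS pricing:

```python
import math

# Back-of-the-envelope AWS cost sketch for GPU inference on RTSP streams.
# All prices below are illustrative assumptions -- check current AWS pricing.
def monthly_cost(n_streams, streams_per_gpu, gpu_hourly_usd,
                 egress_gb, egress_usd_per_gb=0.09):
    """Rough monthly estimate: GPU instances running 24/7, plus data egress."""
    instances = math.ceil(n_streams / streams_per_gpu)
    compute = instances * gpu_hourly_usd * 24 * 30
    egress = egress_gb * egress_usd_per_gb
    return compute + egress

# e.g. 10 streams, 5 streams per g4dn.xlarge (~$0.526/h, assumed), 100 GB out
estimate = monthly_cost(10, 5, 0.526, 100)
print(f"~${estimate:.0f}/month")
```

Benchmark how many streams one GPU actually sustains with your model first; that single number dominates the estimate far more than egress or storage.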

1 Comment
2024/12/02
16:27 UTC

1

Dev-Friendly Inside-Out VR Headset w/ Accessible API for Auxiliary Cameras?

I'm hoping someone here has experience working with VR headset APIs. Both Apple and soon Meta offer an API to access the front camera of their respective headsets, while other brands (HP Reverb series, Vive) appear to offer no access at all. I understand this is largely for privacy reasons, but I am working on a project with important applications in VR (fast optical flow / localization stuff) and would really benefit from access to the camera streams of any inside-out tracked VR headset.

If I cannot access these streams directly, I think I will attempt to simulate these cameras and their FOVs in a CGI environment. This endeavor would benefit from documentation of their relative positions and FOVs (which of course vary from headset to headset).

TL;DR - Know of any dev-friendly VR headsets with an open API for the inside-out cameras? Alternatively, any headset with documented inside-out camera intrinsics/relative extrinsics for calibration or simulation?

0 Comments
2024/12/02
15:35 UTC

7

Handling 70 Hikvision camera streams to run them through a model

I am trying to set up my system using DeepStream. I have 70 live camera streams and 2 models (action recognition, tracking). My system is a single RTX 4090 (24 GB VRAM) machine running Ubuntu 22.04.5 LTS. I don't know where to start.
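
Before wiring up DeepStream, it is worth checking whether one GPU can keep up at all. A crude capacity sketch, where the batch latency is an assumption you should replace with a real benchmark of your two models (e.g. via `trtexec`):

```python
# Rough capacity check: will one 4090 keep up with 70 streams?
# The batch latency below is an assumption -- benchmark your models and
# plug in real numbers.
def per_stream_fps(n_streams, ms_per_batch, batch_size):
    """Approximate per-stream FPS when frames from all streams share
    one batched inference engine (as DeepStream's nvstreammux does)."""
    total_fps = batch_size * 1000.0 / ms_per_batch
    return total_fps / n_streams

# e.g. a model that processes a batch of 16 frames in ~20 ms (assumed)
print(round(per_stream_fps(70, 20.0, 16), 1))  # ~11.4 FPS per stream
```

If the number comes out below what you need, the usual levers are frame skipping (running inference on every Nth frame per stream), smaller input resolution, TensorRT INT8 engines, or a second GPU; video decode itself is offloaded to NVDEC, so it rarely becomes the bottleneck before inference does.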

16 Comments
2024/12/02
12:03 UTC

1

YOLO-NAS Performance -- What is the best model to deploy?

https://preview.redd.it/kdlmuz6v6f4e1.png?width=752&format=png&auto=webp&s=683aa523a9decf283af472db58e4e7fc4f3a65fb

I'm wondering what the best model to deploy is for a detection tool I'm working on. I'm currently working with the YOLO-NAS-S architecture to minimise latency, but am I right in saying the quantised version of YOLO-NAS-L should perform just as well from a latency point of view, but with improved accuracy?

3 Comments
2024/12/02
11:25 UTC

2

About Basler camera operation time

I wonder whether this camera can be kept on for 24 hours over PoE. I'm using an a2A640-240gmSWIR,

and this camera gets hot very easily, reaching 60-69 degrees (Celsius) quite quickly.

1 Comment
2024/12/02
09:48 UTC

2

Surround View

I have 4 fisheye cameras, one located at each corner of a car. I want to stitch the camera outputs to create a 360-degree surround view. So far, I have managed to undistort the fisheye images, at the cost of some FoV. Because of that, my system fails when stitching them, since the overlap regions between the cameras may not contain enough features to match. Are there any other methods you might propose? Thanks

10 Comments
2024/12/02
05:57 UTC

2

Is running 10 different NeRF/3DGS methods going to help me?

I joined my uni's research program as an undergrad. My team of 3 and I have to collect numerous videos of objects under different lighting conditions etc. and run NeRF/3DGS on them. As you know, researchers don't maintain their code after their papers are accepted, so I have to run everything in Docker containers to make it work. (Btw, nerfstudio doesn't work at all.) There are also so many bugs that I have to read the code and fix them myself. The two other team members don't contribute at all. They literally do nothing, so I have to do all of this by myself. Now my professor is asking us to read through the code and insert custom camera positions to test the novel view synthesis near the ground-truth image. This means I would have to understand the abysmal code the researchers wrote better than they do themselves... She is asking me to get the PSNR/LPIPS etc., but that part won't be too hard.

I asked my prof if this is going to be published and she told me that this project lacks novelty so it probably won't be published. She told me this project is for her to better understand these models and that's it.

I was originally interested in 3D reconstruction and novel view synthesis, but this project is making me hate it. This is just grunt work with no real novel ideas to try, and it is eating up so much of my time. I recently talked to the professor I really want to work with, and he told me he will let me into his lab if I do well in his class next semester. I am worried that this project, which I have no passion for anymore, will waste too much time that would be better spent on doing well in that class...

What do you think? Should I put in 20+ hours/week and the entirety of my winter break for the project that only serves to enhance the practical knowledge of my professor with absolutely no help from teammates?

* I am on a time crunch, so I don't have enough time to build firm knowledge of the models' foundations; I just skim the papers multiple times to understand the code.

2 Comments
2024/12/02
05:42 UTC

1

Proposed solutions for multi-frame license-plate super-resolution

Currently, I am working on a project to recover license plates from many initial input frames. Can someone suggest some source code or related articles?

Thanks everyone

1 Comment
2024/12/02
03:16 UTC

1

Looking for a tool to extract text from images or PDFs using OCR. What do you use?

3 Comments
2024/12/02
00:03 UTC

12

What's the future for inference? On the edges or on the cloud?

I'm a seasoned software developer but a pretty late comer to the computer vision field. It's been just a few weeks, but I'm truly loving it. Here is an important question I have in mind; I couldn't find an "obvious" answer, so I wanted to check what the community thinks in general.

What do you think about the future of inference? Is it possible to expect that inference will mostly move to edges? Or will it be mostly cloud based? For example; for robotics it must have been on the edge somehow, right? But for web use cases probably mostly cloud?

I'm guessing that there wouldn't be a one-size-fits-all answer to this question but I'm hoping to see some informative replies about different aspects to it.

Here is an example sub-question:

I'm thinking of creating a custom-trained object-detection/classification model for a certain industry, and I'm going to create a mobile app for users to use it. I'm guessing the common practice would be "use local inference whenever possible"; is this correct? But if the implementation is easier or more feasible, cloud inference can be used as well? So the sub-question is: what are the main questions you ask when deciding on an inference infrastructure for a project?

11 Comments
2024/12/01
20:21 UTC

3

Advice on how to make it in this field

I'm a second year PhD student doing my research in animal behavioral genetics. My project has taken a turn where most of my job is now to create models that can extract behavioural features from videos of animals interacting.

I want to stay in this field in the future and try to make it in academia/industry but focusing on computer vision for animal behavior quantification.

What skills should I make sure to acquire and where in the world should I try to apply for jobs after my PhD?

Anyone else who went through the same thing as me? All advice is welcome!

Thanks in advance!

2 Comments
2024/12/01
20:10 UTC

2

Identifying and parsing form data with nested tables

I work for a health-insurance adjacent company in the United States. As a part of our workflow, we need to ingest claims reports from various insurance providers. The meat of these PDF-formatted reports is the “Claims” table.

The “Claims” table has a “row” for each claimant. However, this “row” is actually two rows. For each claimant, the top row has cells with basic identifying information. The bottom row is actually a second subtable with one or more claims.

I’ve included a mockup in Excel of what a “Claims” table might look like. Actual claims reports are PDFs with headers/footers/page breaks/etc. The mockup has two claimants: the first has two claims, the second just one.

What approach can I take to extract claim and claimant data from this nested table? I’ve tried form parsing software like Azure Form Recognizer/Document Intelligence, but these solutions don’t support nested tables. I also tried a multi-modal LLM-based approach with Claude but got terrible results.

Do I need to build a custom parser for this? Maybe use a Python OCR library to iterate over rows and apply a different parsing model to the claimant info and the claims info?

TIA
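
The custom-parser route the post suggests can be quite simple once the PDF is flattened into rows: classify each row as a claimant header or a claim sub-row by its column signature, and attach sub-rows to the most recent claimant. A minimal sketch, where the column layouts and the 9-digit member-ID rule are invented for illustration:

```python
# One workable pattern: OCR/parse the PDF into flat rows first, then group.
def split_claimants(rows):
    """Group flat rows into claimant records with attached claim sub-rows."""
    claimants = []
    for row in rows:
        if looks_like_claimant(row):           # claimant header row
            claimants.append({"claimant": row, "claims": []})
        elif claimants:                        # claim sub-rows belong to the
            claimants[-1]["claims"].append(row)  # last claimant seen
    return claimants

def looks_like_claimant(row):
    # Hypothetical rule: claimant rows start with a 9-digit member ID.
    return bool(row) and row[0].isdigit() and len(row[0]) == 9

table = [
    ["123456789", "Jane Doe"], ["CLM-1", "2024-01-02"], ["CLM-2", "2024-02-10"],
    ["987654321", "John Roe"], ["CLM-3", "2024-03-05"],
]
grouped = split_claimants(table)
print(len(grouped), len(grouped[0]["claims"]), len(grouped[1]["claims"]))  # 2 2 1
```

The fragile part is the row classifier; anchoring it on a field with a rigid format (ID, SSN-style number, date column) tends to survive layout drift better than positional rules.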

6 Comments
2024/12/01
19:20 UTC

4

motion detection model

Hi guys.

I am developing a model that detects upper-body fidgeting while sitting (e.g., rocking left to right or back and forth in a chair). The goal of the project is to see whether a student shows ADHD-related behaviors.

Currently, I am thinking of using a rule-based system with a threshold on the left and right shoulder landmarks. But this approach does not seem to work for students with larger frames. I want to ask Reddit for help, and hopefully find the right answer.

Thank you for reading
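
One common fix for the body-size problem: express the motion in units of shoulder width instead of raw pixels, so the same threshold works regardless of build or distance from the camera. A sketch, assuming landmarks arrive as (x, y) per frame (e.g. from MediaPipe Pose):

```python
import numpy as np

# Normalizing lateral shoulder motion by shoulder width makes the
# threshold roughly invariant to body size and camera distance.
def fidget_score(left_sh, right_sh):
    """Std-dev of the shoulder-midpoint x, in units of shoulder width."""
    left_sh = np.asarray(left_sh, float)
    right_sh = np.asarray(right_sh, float)
    mid_x = (left_sh[:, 0] + right_sh[:, 0]) / 2
    width = np.linalg.norm(left_sh - right_sh, axis=1).mean()
    return mid_x.std() / width

# Synthetic check: same sway pattern at two body scales -> same score
t = np.linspace(0, 4 * np.pi, 100)
sway = 10 * np.sin(t)
zeros = np.zeros(100)
small = fidget_score(np.c_[sway - 50, zeros], np.c_[sway + 50, zeros])
large = fidget_score(np.c_[2 * sway - 100, zeros], np.c_[2 * sway + 100, zeros])
print(round(small, 4), round(large, 4))
```

A threshold on this normalized score (tuned on labeled clips) replaces the raw pixel threshold; the same trick applies to front-back rocking using the vertical coordinate or apparent shoulder-width change.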

2 Comments
2024/12/01
17:55 UTC

1

Sugarcane lodging detection dataset

Hello everyone, I am working on a project where I have to label data to segment lodging and weeds in sugarcane crops. I have some drone images, and I am trying to label them using traditional computer vision techniques, like using the height data available in the drone imagery to highlight lodged crops. I am still not getting very accurate results due to the varying height of the crops, and now I feel stuck.

I would love to know if anyone has done something similar and how they overcame the issue.
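
Since absolute height varies across the field, comparing each pixel against the canopy height of its local neighborhood tends to be more robust than a global cutoff. A crude sketch on a canopy height model (CHM); the tile size and the 0.6 lodging ratio are assumptions to tune against your labels:

```python
import numpy as np

# Flag pixels well below their neighborhood's canopy height. Tile-based
# statistics are a crude stand-in for a proper local-percentile filter.
def lodged_mask(chm, tile=32, ratio=0.6):
    """Boolean mask of pixels below ratio * (local 90th-percentile height)."""
    h, w = chm.shape
    mask = np.zeros_like(chm, dtype=bool)
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            block = chm[i:i + tile, j:j + tile]
            top = np.percentile(block, 90)   # local "standing crop" height
            mask[i:i + tile, j:j + tile] = block < ratio * top
    return mask

chm = np.full((64, 64), 3.0)    # synthetic CHM: healthy 3 m crop
chm[10:20, 10:20] = 1.0         # a lodged patch lying near the ground
print(lodged_mask(chm).sum())   # flags the 100 lodged pixels
```

This assumes the CHM is already terrain-corrected (DSM minus a terrain model); otherwise slope will leak into the mask and mimic lodging.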

4 Comments
2024/12/01
17:03 UTC

33

Why do papers never…?

...try to characterize the images within a dataset that their model performs better (or worse) on compared to other models?

For all we know the author's contributions might have a huge impact on images with certain characteristics and hurt performance on other images...but we never see any analysis towards that.

This seems so obvious...why can't I find any papers on the topic?

9 Comments
2024/12/01
16:27 UTC

21

How Do You Handle Massive Image Datasets?

Hi everyone!

I was checking out some cool computer vision projects recently (like that insane DALL-E training data and those medical imaging projects) and wow - the amount of images needed for training these models is mind-blowing!

How do you all handle this? How much data are you typically working with? What's your setup for managing it all? I'm really curious about the costs too - do you use cloud storage or have your own setup? What kind of challenges do you run into with all this data?

Would love to hear your stories and solutions!

12 Comments
2024/12/01
16:24 UTC

3

Recommendation for Multi Crack Detection

Hey guys, I was given a dataset of several different types of construction cracks, and I need to create a model that identifies each one. I’m a beginner in CV, and none of the images are labeled.

The goal is to take this to production. I have a background in ML and do backend work using FastAPI, but what algorithm should I use for such a use case, and what do I need to consider when deploying a project like this to production?

10 Comments
2024/12/01
12:30 UTC

1

Easy Eye-Tracking Method Using Deep Learning for Wheelchair Control

Hey Redditors,

I’ve been exploring a simple and effective way to implement eye-gaze tracking using deep learning. My goal is to use this system to control a wheelchair, making it more accessible for individuals with limited mobility.

Here’s the approach I’m thinking about:

  1. Pre-trained models: Leverage pre-trained deep learning models like OpenCV's gaze tracking tools or custom-trained networks for detecting eye movement and mapping gaze direction.
  2. User-friendly interface: The system converts gaze directions into wheelchair commands, such as forward, backward, left, or right.

Benefits:

  • Cost-effective compared to commercial systems.
  • Easy to set up with off-the-shelf hardware.
  • Can be customized for different users' needs.

I am new to Python, so I’m still learning the basics. If anyone has advice or suggestions on how to start this project, or even some Python-friendly libraries or tutorials that could help, please let me know!

I’d love to hear your thoughts, feedback, or any resources you think would be useful. Let’s make mobility accessible for everyone!
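
The gaze-to-command mapping from step 2 can be prototyped in a few lines. A sketch, assuming the gaze model outputs (horizontal, vertical) offsets in [-1, 1]; the dead-zone size and the convention that "up" means forward are assumptions:

```python
# Map a gaze direction estimate to a wheelchair command.
# A dead zone keeps small, involuntary eye drift from moving the chair.
def gaze_to_command(h, v, dead_zone=0.3):
    """Return 'stop' inside the dead zone, else the dominant direction."""
    if abs(h) < dead_zone and abs(v) < dead_zone:
        return "stop"
    if abs(h) >= abs(v):
        return "right" if h > 0 else "left"
    return "forward" if v > 0 else "backward"

print(gaze_to_command(0.1, 0.05))   # stop
print(gaze_to_command(-0.8, 0.2))   # left
print(gaze_to_command(0.1, 0.9))    # forward
```

For a real wheelchair, add a dwell-time confirmation (the user must hold a direction for, say, half a second before motion starts) and an independent hardware stop; a raw per-frame mapping like this is only safe for simulation.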

2 Comments
2024/12/01
10:59 UTC

0

Water tracking to identify its speed

How can I do this task? Can anyone guide me, please?

5 Comments
2024/12/01
10:52 UTC

1

Face Identification

I am working on a CV problem where I have to detect all the faces present in an image...

  1. The quality of the images is not that good; the faces in the images are very small, and their alignment is not straight.
  2. I have tried RetinaFace but am not getting accurate results: it only extracts some faces, and those come out very blurred.

I've attached an example image for reference... I have images exactly like this: photos of a camera roll, and I have to extract the faces from them.

Please suggest a way to achieve this with good accuracy and speed. Also, should I do any image preprocessing?

https://preview.redd.it/0o0eg445a64e1.jpg?width=1300&format=pjpg&auto=webp&s=5803b86183766e5ff8588e179f37e071e0fa5b51

1 Comment
2024/12/01
05:26 UTC

2

Satellite Aerial Recognition of Landmarks and Objects

I am new to all this. I have to work on a project that entails landmark and object detection via satellite images.

Any advice or guidance on courses, videos, or code would be really appreciated; I'm really at a loss.

0 Comments
2024/11/30
21:30 UTC

4

Multi-camera-multi-object near-infrared tracking of reflective QR codes

Hello,

I'm looking for recommendations on a project we are working on.

  • the space we are working in is 60-120 sqm
  • high contrast, pretty bad light conditions, but only in the visible light spectrum
  • we can flood the room with NIR light from the ceiling
  • we can put several cameras on the ceiling with their own raspberries
  • we want to track 6 objects, with unique, large IDs made with reflective paint
  • losing the objects to occlusions is okay, 30-60 FPS is all we need

We need to feed the direction and the location of the objects with low precision (5 cm, 5 degree error is okay) into a central computing unit at 30+ FPS.

We are thinking about using

  • 2 or 3 Raspberry Pi Camera Module v3s
  • with no internal IR filter,
  • an 850 nm band pass filter,
  • OpenCV feeding the extracted data into the central computing unit

I need feedback about

  • better setups to achieve the same goal at lower price (DIY is okay)
  • good sources in Spain to buy the HW and the pass filter
  • any misc suggestion, critique, idea

Thank you guys!

9 Comments
2024/11/30
21:04 UTC

9

Smoothing a Python and OpenCV AR App with a Kalman Filter

Long time in the making; I finally found time to finish writing the last post on this project. This final post is about including a Kalman filter to improve tracking and smooth and stabilize the projection:

https://bitesofcode.wordpress.com/2024/11/30/augmented-reality-with-python-and-opencv-part-3/

Code: https://github.com/juangallostra/augmented-reality/tree/kf-tracking/src

Video: https://www.youtube.com/watch?v=3LSitteyw4Y
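
The core idea from the post in miniature: a constant-velocity Kalman filter smoothing a noisy 1-D track (think one coordinate of a detected corner). This is a standalone NumPy sketch, not the post's code, and the noise parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.arange(50, dtype=float)      # target moving at 1 px/frame
meas = truth + rng.normal(0, 0.7, 50)   # noisy per-frame detections

F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity transition
H = np.array([[1.0, 0.0]])              # we only observe position
Q = np.eye(2) * 1e-3                    # process noise (assumed small)
R = np.array([[0.5]])                   # measurement noise (assumed)

x, P = np.zeros(2), np.eye(2) * 10.0    # state [pos, vel] and covariance
smoothed = []
for z in meas:
    x, P = F @ x, F @ P @ F.T + Q                    # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)              # update
    P = (np.eye(2) - K @ H) @ P
    smoothed.append(x[0])

print(round(float(np.abs(np.array(smoothed)[10:] - truth[10:]).mean()), 3))
```

For a 2-D projection you run the same machinery with a 4-D state [x, y, vx, vy]; tuning Q versus R trades responsiveness against smoothness, which is exactly the jitter-vs-lag knob for AR overlays.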

1 Comment
2024/11/30
20:24 UTC

1

clarification about mAP metric in object detection.

Hi everyone.

So, I am confused about this mAP metric.
Let's consider AP@50. Some sources say that I have to label my predictions, regardless of any confidence threshold, as tp,fp, or fn, then sort them by confidence (with respect to iou threshold of course). Next, I start at the top of the sorted table and compute the accumulated precision and recall by adding predictions one by one. This gives me a set of pairs. After that, I must compute the area under the PR Curve, which is resulted from a unary function of f(precision)=recall_per_precision (for each class).

And then for mAP@0.5:0.95:0.05, I do the steps above for each IoU threshold and compute their mean.

Others, on the other hand, say that I have to compute precision and recall at every confidence threshold, for every class, and compute the AUC over those points. For example, I take thresholds 0.1:0.9:0.1, compute precision and recall for each class at these points, and then average them. This gives me 9 points to form a curve, and I simply compute the AUC after that.

Which one is correct?

I know KITTI uses one thing, VOC another, and COCO something totally different, but they all agree on what AP is. So which of the above is correct?

EDIT: Seriously guys? not a single comment?
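
For reference, the first procedure (all predictions sorted by confidence, accumulated precision/recall, area under the interpolated PR curve) is what the VOC and COCO reference implementations do; the benchmarks differ mainly in how the curve is interpolated and which IoU thresholds are averaged. A minimal per-class sketch with VOC-style all-point interpolation (toy numbers):

```python
import numpy as np

def average_precision(confidences, is_tp, n_gt):
    """AP for one class at a fixed IoU threshold: sort every prediction by
    confidence, accumulate precision/recall, integrate the PR curve."""
    order = np.argsort(-np.asarray(confidences, float))
    tp = np.asarray(is_tp, float)[order]
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # All-point interpolation: pad the curve, make precision monotonically
    # decreasing, then sum precision * recall-step.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# 4 predictions against 4 ground-truth boxes: TP, FP, TP, TP by confidence
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], n_gt=4))  # 0.625
```

Note that FNs never appear as table rows; they enter only through `n_gt` in the recall denominator. COCO then repeats this at IoU 0.50:0.95:0.05 (using 101-point interpolation) and averages, which is the mAP@0.5:0.95 figure.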

4 Comments
2024/11/30
18:12 UTC

0

What hardware do I need for this project?

I have a camera module (attached to eyewear) that scans text (using OCR?). E.g., if the scanned text is "abc", I need to output the ASCII value of each letter, i.e., "a = 97, b = 98, c = 99".
P.S. It needs to be fast.

  1. What camera module do I need?
  2. Do I need a Raspberry Pi (if yes, what model)?
  3. How much should I expect to spend on this?

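
On the software side, the ASCII-mapping step is trivial and essentially free; the speed budget goes entirely to the OCR stage (e.g. Tesseract on a Raspberry Pi). A sketch of the mapping:

```python
# Convert recognized text into "char = code" pairs using ord().
def to_ascii_pairs(text):
    return ", ".join(f"{ch} = {ord(ch)}" for ch in text)

print(to_ascii_pairs("abc"))  # a = 97, b = 98, c = 99
```

So when costing the hardware, benchmark OCR latency first; the camera and Pi model only matter insofar as they keep that stage fast enough.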
2 Comments
2024/11/30
18:03 UTC

1

Creating a Python program that detects faces and objects, drawing a box with a text label for each

As the title says, I need a step-by-step approach since I am very new to this.

2 Comments
2024/11/30
17:53 UTC

4

uploading CV project to github?

I trained a YOLOv7 model to detect damage on roads. I want to publish my work on GitHub, but looking at it, it's not really my code: the only things I changed were the necessary config files to point at my training set, plus one Python file with a single line of code to run the model.

Should I just publish my results folder? If I upload the whole project, it looks like I'm stealing someone else's work, since I barely changed anything.

3 Comments
2024/11/30
16:36 UTC

5

Suggestions for text identification and extraction from engineering drawings

I need a way to extract the circled text (in the REV box there will usually be a single letter or number). The boxes will often change size and position. The structure of the text will also change along with position, size, and font. Generally, the text will always be in the bottom right.

Problem is that I cannot rely on keywords, positions or regex.

I’m using Tesseract and OpenCV, but I am open to other stuff (I can only use Azure for cloud computing).

I’m just looking for suggestions on how y’all would tackle this. I am a beginner.
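
Since the post says the text is generally in the bottom right, one keyword-free first step is cropping a fixed fraction of the page before OCR: it speeds up Tesseract and removes most false matches elsewhere on the drawing. A sketch; the 25% fractions are assumptions to tune per drawing set:

```python
import numpy as np

def bottom_right_roi(page, frac_h=0.25, frac_w=0.25):
    """Crop the bottom-right corner of a page image (NumPy array)."""
    h, w = page.shape[:2]
    return page[int(h * (1 - frac_h)):, int(w * (1 - frac_w)):]

page = np.zeros((2200, 1700), np.uint8)  # stand-in for a rendered PDF page
roi = bottom_right_roi(page)
print(roi.shape)  # (550, 425)
```

Within the ROI, detecting the title-block grid lines with OpenCV morphology (long horizontal/vertical kernels) and OCRing each resulting cell separately handles the varying box sizes better than whole-region OCR, since the REV cell becomes just one small crop regardless of font or layout.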

8 Comments
2024/11/30
16:34 UTC
