/r/computervision


Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a question, please post a legible, complete question with enough detail for us to help you properly!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group

/r/computervision

90,773 Subscribers

1

Video lecture links for CS131 Computer Vision: Foundations and Applications by Stanford?

I just wanted to ask if anyone has a video link for the lectures of CS131 Computer Vision: Foundations and Applications by Stanford. I found video lectures for CS231n, which is CNNs for visual recognition, but CS131 covers all the traditional techniques used in CV.

link for CS231n: https://www.youtube.com/playlist?list=PLf7L7Kg8_FNxHATtLwDceyh72QQL9pvpQ

0 Comments
2024/05/08
17:59 UTC

2

What Eye Tracking model do you use?

Hello!

I have a project where I need an eye-tracking algorithm to detect the screen coordinates the user gazes at. I tried some models available on GitHub, but they either lack accuracy or require overly difficult calibration (the OpenCV chessboard calibration was mind-blowing). I want to integrate something accurate and easy to use =) Sooo... do you have a modern eye-tracking algorithm/repository in mind? =)
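
One lightweight starting point, sketched under assumptions: MediaPipe Face Mesh with refine_landmarks=True exposes iris landmarks (indices 468-477), which many DIY gaze trackers map to screen coordinates through a short per-user calibration (look at a few known screen points and fit a regression). The camera index and landmark index below are illustrative.

```python
import cv2
import mediapipe as mp

# Extract an iris center with MediaPipe Face Mesh (refine_landmarks adds
# the iris points). Mapping this to screen coordinates still requires a
# per-user calibration step, which is not shown here.
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

cap = cv2.VideoCapture(0)          # assumed default webcam
ok, frame = cap.read()
if ok:
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        iris = lm[468]             # one of the left-iris landmarks
        print(f"iris center (normalized): ({iris.x:.3f}, {iris.y:.3f})")
cap.release()
```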

0 Comments
2024/05/08
17:04 UTC

2

Resources for learning computer vision using C++

Does anyone have any good resources for learning computer vision using C++? I have always solved vision problems using Python, but after seeing the job market and talking to one of my interviewers, I was told that most of their vision tasks are done in C++. If anyone has good material they can share, I would really appreciate it. Thank you!!

EDIT
Hey everyone I found a reddit thread that explained my problem better and has some detailed information which you might find useful https://www.reddit.com/r/computervision/comments/12cyzwa/practical_c_for_computer_vision_jobs/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

This gives a better explanation of what I meant and good insight into the practicality of using C++ in CV, thanks to u/CowBoyDanIndie. It does not include any resources per se, though, so if you have any to share, it would be really appreciated.
Thank you!!


0 Comments
2024/05/08
15:29 UTC

4

Distance Estimation in Real world Coordinates

https://preview.redd.it/tau7j3rna7zc1.jpg?width=1280&format=pjpg&auto=webp&s=1a733f6b094b87e24a460c710db707e87061f497

https://preview.redd.it/97h7i3rna7zc1.jpg?width=1280&format=pjpg&auto=webp&s=1f2dd8f21de8a05d39e02815eabbf67811809597

https://preview.redd.it/8da4g3rna7zc1.jpg?width=1280&format=pjpg&auto=webp&s=9334dca0bbeb926e147307d434b5b394db5a214a

Hello,
I have three cameras and I'd like to find the distance in meters from point A to point B within the frame, as you can see in the uploaded images with the ground truth values.

Can someone please guide/advise me on how to tackle this problem?

What have I tried?

  1. I calibrated each camera using OpenCV and also the MATLAB calibrator tool, and I have a reprojection error of less than 0.5 pixels. I have the intrinsic and extrinsic parameters. Using these parameters I applied the DLT algorithm to find the distance between two points, but the values are way off.
  2. I tried using a known reference of 0.45 m (human width) when there are people in the frame. I tried to get the distance from camera 1 to person 1 and from camera 1 to person 2. Using the lengths of these two sides I tried to get the third side, but I don't have the angle. I tried to get the depth and angle using SIFT and triangulation, but the values I got were around 7,000-8,000 m.
  3. I tried segmenting and detecting the poses of each human to get the shoulder-to-shoulder distance, but couldn't get values anywhere close to the ground truth.

Please guide and advise. Thanks a lot.

Camera Details - Unifi G4 Pro.

Lens: 4.1–12.3 mm; ƒ/1.53–ƒ/3.3
View angle: Wide: H 109.9°, V 60°, D 127.7°; Zoom: H 35°, V 19.8°, D 40°
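
For a multi-camera setup like the one described above, one standard route is to triangulate points A and B from two calibrated views and take the Euclidean distance between the resulting 3D points. A minimal sketch, assuming the calibration outputs are already available (all matrices and pixel coordinates below are placeholders, not the poster's values):

```python
import cv2
import numpy as np

# --- Assumed placeholders: substitute your own calibration results ---------
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])  # intrinsics
R1, t1 = np.eye(3), np.zeros((3, 1))                # camera 1 (world -> cam)
R2, t2 = np.eye(3), np.array([[1.0], [0.0], [0.0]]) # camera 2, 1 m baseline

# Projection matrices P = K [R | t]
P1 = K @ np.hstack([R1, t1])
P2 = K @ np.hstack([R2, t2])

# Undistorted pixel coordinates of points A and B in each view (2 x N).
# These example values are made up.
pts1 = np.array([[310.0, 520.0],    # u of A, u of B in camera 1
                 [240.0, 255.0]])   # v of A, v of B in camera 1
pts2 = np.array([[290.0, 505.0],
                 [242.0, 257.0]])

# Triangulate to homogeneous 3D points, then dehomogenize.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)     # 4 x N
X = X_h[:3] / X_h[3]                                # 3 x N, world units

# Metric distance A-B (meters, if t was expressed in meters).
print(f"distance A-B: {np.linalg.norm(X[:, 0] - X[:, 1]):.3f} m")
```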

1 Comment
2024/05/08
13:00 UTC

0

Depth Measurement from Single Image or Videos

Hello everyone,

I would like to develop a project for measuring specific objects in real-world units, in particular extracting depth. Note that I do not intend to measure the distance to the camera; instead I want to find the height, width, and depth relative to the object's plane.

I have previously experimented with Structure from Motion (SfM) for 3D reconstruction; then, through point cloud manipulation and by knowing the dimensions of a reference square that I placed within the scene, I was able to roughly extract the dimensions. However, the results were not great and I would like to try more state-of-the-art approaches.

I have been keeping an eye on recent developments in depth estimation (namely https://github.com/prs-eth/Marigold and https://github.com/LiheYoung/Depth-Anything). Is it a good idea to use these kinds of models to generate 3D models and apply the same approach I mentioned earlier, or would you suggest something else?

I mostly work in developing segmentation and detection deep learning models, so your help to dive into this world would be much appreciated!
Thank you in advance :)
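
For reference, the "known reference square" scaling idea mentioned above applies to any reconstruction, whether it comes from SfM or from back-projecting a learned depth map (note that models like Marigold and Depth Anything predict relative depth, so a metric reference is still needed). A minimal sketch, where the point cloud path, corner indices, and edge length are all assumed placeholders:

```python
import numpy as np

# Recover metric scale from a reference square of known size, assuming the
# reconstruction and the indices of two reconstructed corners are known.
points = np.load("cloud.npy")                 # hypothetical (N, 3) cloud
corner_a, corner_b = points[120], points[133] # hypothetical corner indices

REF_EDGE_METERS = 0.10                        # the square's known edge length
scale = REF_EDGE_METERS / np.linalg.norm(corner_a - corner_b)

points_metric = points * scale                # distances are now in meters
# Any object measurement is then a plain Euclidean distance:
height = np.linalg.norm(points_metric[10] - points_metric[42])
```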

1 Comment
2024/05/08
11:35 UTC

2

Need help with breast cancer findings (multi-label) classification

I have been asked to extract embeddings of mammographies that may have various findings. I have the mammographies and a table of the findings present in each of them, but I don't have any information about where the findings are located.

The idea of the project is to fine-tune a convolutional model to get more informative embeddings in the mammography domain and deal with the domain gap between the dataset used to train the backbone and my dataset. My main problem is that, lacking positional information about the findings, the models I try to train do not learn well, or collapse to predicting the same label distribution for every image.

Reviewing the literature, I have not found any work that performs multi-label classification with just the mammographies, without using ROIs or any other kind of positional information. What can I do to take advantage of my dataset and get more informative embeddings?
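
For context, the usual image-level baseline here is a multi-label head trained with binary cross-entropy, with per-class pos_weight so rare findings are not drowned out (which can produce exactly the "same label distribution for every image" collapse described above). A rough sketch, where the backbone choice, finding count, and weights are all assumptions:

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_FINDINGS = 5                                   # hypothetical finding count
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_FINDINGS)  # multi-label head

# pos_weight up-weights rare findings so the network cannot minimize the loss
# by simply predicting the marginal label frequencies.
pos_weight = torch.tensor([8.0, 3.0, 12.0, 5.0, 2.0])  # hypothetical ratios
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

images = torch.randn(4, 3, 512, 512)               # dummy batch
targets = torch.randint(0, 2, (4, NUM_FINDINGS)).float()
loss = criterion(model(images), targets)
loss.backward()
```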

0 Comments
2024/05/08
11:19 UTC

0

How to construct a 3d image from a 2d image of a wheel?

Hello everyone, I hope everyone is doing well. I'm working on a 3D reconstruction project using AI, where I have been given thousands of wheel images in PNG format. How can I set up a pipeline that takes the images one by one and reconstructs a 3D model? And how can I create a high-resolution, accurate 3D wheel model? Are there any good models available? If not, how can I build one for this?

11 Comments
2024/05/08
08:06 UTC

2

Need help with synthetic dataset for BGA defect detection using Blender/BlenderProc

Hey folks,

I'm working on detecting defects in ball grid array (BGA) solder balls, but collecting and annotating enough real images is a pain cost- and time-wise. So I'm trying to generate a synthetic dataset using Blender and BlenderProc instead.

Problem is, I'm not too familiar with Blender's Python API. Here's what a real BGA image looks like compared to one of my synthetic ones. https://imgur.com/a/5a2PdKU

I trained YOLO v8 on the synthetic data, but it's performing pretty badly on real images. Any tips on how I can improve the quality and realism of my synthetic BGA dataset? I'd really appreciate any advice or pointers from those more experienced with using Blender for this kind of thing.

Let me know if you need any other details! Thanks in advance for the help.
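
One lever that often helps sim-to-real transfer is per-frame domain randomization of lighting and camera pose. A rough BlenderProc 2.x sketch of that idea (run via `blenderproc run script.py`; the asset path and parameter ranges are assumptions, not the poster's setup):

```python
import blenderproc as bproc
import numpy as np

bproc.init()
board = bproc.loader.load_obj("bga_board.obj")[0]   # hypothetical asset

light = bproc.types.Light()
for i in range(200):
    # Randomize light position and intensity on every frame.
    light.set_location(np.random.uniform([-1, -1, 1], [1, 1, 2]), frame=i)
    light.set_energy(np.random.uniform(100, 800), frame=i)

    # Randomize a top-down camera pose around the board (looks along -Z).
    cam_pose = bproc.math.build_transformation_mat(
        np.random.uniform([-0.1, -0.1, 0.4], [0.1, 0.1, 0.7]),
        [0, 0, np.random.uniform(0, 2 * np.pi)],
    )
    bproc.camera.add_camera_pose(cam_pose, frame=i)

data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)
```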

7 Comments
2024/05/08
07:25 UTC

6

Choosing HW for home CCTV image classification

My dad stays on a farm where trespassers and attempted break-ins have become more frequent. He already has a standard 8 camera Hikvision setup at home and I would like to help him out by adding image classification to his CCTV feeds.

I saw a product called "Spotbot" which is exactly what we are looking for, but my dad is cash-strapped and you don't really have control over your models.

I have some experience in the field and thus want to give DIY a go.

I am hoping someone can point me in the right direction w.r.t hardware selection, criteria:

  • price <$250
  • image classification at 2-4 frames/sec x 8 cameras
  • HW/SW support making it "hobbyist friendly"

I looked at the following:

  • RPI5 - Affordable but not the best for CV
  • Jetson Nano - Performance is there but EOL and thus not great support. I think this is what Spotbot is using.
  • Jetson Xavier - Too expensive
  • Orange Pi - Great performance and price
  • N100 mini PC - Great performance and price

So far the Orange Pi and N100 mini PC look like great options, but given that this is not my field of expertise, I would just like to find out what the community thinks/suggests.

P.S. The selected HW will be on the same network as the 8 CCTV cameras, it will then process 2-4 frames from each camera every second and once a human is detected, notify my dad.
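
For scale, the pipeline in the P.S. can be prototyped per camera with OpenCV before committing to hardware. A single-camera sketch, with the RTSP URL and the detector left as placeholders:

```python
import time
import cv2

# Pull an RTSP feed, sample ~2 frames/sec, run a person detector on each
# sample. On an N100 or Orange Pi you would plug in a lightweight model
# (e.g. a YOLO nano variant) where detect_person is stubbed below.
RTSP_URL = "rtsp://user:pass@192.168.1.10:554/stream1"   # hypothetical
cap = cv2.VideoCapture(RTSP_URL)

SAMPLE_INTERVAL = 0.5   # seconds -> ~2 fps per camera
last_sample = 0.0

def detect_person(frame) -> bool:
    # Placeholder: swap in your actual detector here.
    return False

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    now = time.monotonic()
    if now - last_sample < SAMPLE_INTERVAL:
        continue            # drop frames between samples to save CPU
    last_sample = now
    if detect_person(frame):
        print("person detected - send notification")
```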

4 Comments
2024/05/08
07:01 UTC

1

Need Advice for University Presentation on "SNNs on Embedded Devices and Applications"

Hi everyone,

I'm preparing a presentation and paper for my university course on "Spiking Neural Networks (SNNs) on Embedded Devices and Applications." It's a fascinating topic, but I'm tasked with introducing it to an audience unfamiliar with the concept.

I'm looking for:

  • Advice: How can I break down this complex topic in a way that's easy to understand?
  • Sources: Any must-read papers or articles?
  • Presentation tips: Any engaging ways to present this to first-time listeners?

Any help or guidance would be greatly appreciated!

0 Comments
2024/05/08
06:56 UTC

2

HELP project 3D reconstruction

Hi guys. I’m working on a project about 3D road reconstruction from a static single camera (no stereo).

https://stackoverflow.com/questions/78435888/road-cloud-points-estimation-from-static-monocular-camera

Please help me! Thanks.

0 Comments
2024/05/08
06:49 UTC

3

CV Experiment with Industrial-Grade Camera in a Confidential Factory

Hey everyone. I'm preparing for my first on-site experiment, at a factory. They value their confidentiality highly, so this will be my only chance to go in there for now. They sent me one of the cameras planned to be used in the project.

There will be varied lighting conditions on site and the product has a shiny finish, so I'm thinking about using a polarizing filter to prevent uniformity problems caused by glare. Apart from this, what important preparations should I not overlook? For example, should I bring a chessboard-like target to calibrate lens distortion? How can I best manage lighting and environmental challenges? Are there specific software tools or libraries you'd recommend for quick setup and analysis? What advice do you have for handling unexpected issues on the spot?

I would really appreciate any advice to ensure things run smoothly. Thanks in advance!
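
On the chessboard question: a printed target plus a few dozen captures is usually enough to recover intrinsics and distortion on site, and the result can be reused after you leave. A minimal OpenCV sketch, assuming a 9x6 inner-corner board with 25 mm squares (match your actual print):

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)        # inner corners (assumption: match your board)
SQUARE_MM = 25.0        # square size (assumption: match your print)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):       # hypothetical capture folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        # Refine corner locations to sub-pixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
np.savez("calibration.npz", K=K, dist=dist)  # reuse after leaving the site
```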

2 Comments
2024/05/08
05:35 UTC

3

Modern best practices for image segmentation tasks

I have been annotating medical images for an image segmentation task using a UNet. However, I find the entire process painstaking and slow, and unfortunately a lot of data needs to be annotated because of the poor signal-to-noise ratio in the images. I recently came across MONAI and concepts like interactive segmentation and the Segment Anything paper, and it made me realize that my current annotation approach may be the wrong way to go about this. I have been using 3D Slicer since it is easy to download and segment in, but it is very difficult to export the data from, and it did not seem to support things like interactive segmentation or model-in-the-loop annotation.

I am wondering if I need to revamp the whole annotation procedure. I was thinking of applying pre-trained medical segmentation models (I am not sure whether SAM is the right foundation model, or whether it is possible to adjust the number of channels in its input) and fine-tuning on my medical task. I was also thinking that maybe 3D Slicer is not the best tool, and that there might be a tool that is more user friendly and makes it easy to save and export the data, while also bringing my UNet model into the loop to help with the annotations as it simultaneously gets better.
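
On the SAM question: the released `segment_anything` package supports point-prompt prediction, which is the building block for click-to-annotate workflows. A sketch of one click producing candidate masks (the checkpoint path is a placeholder; note vanilla SAM expects 3-channel 2D inputs, so volumetric data needs slicing or a medical variant such as MedSAM instead):

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint and wrap it in a point-prompt predictor.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)   # stand-in for an image slice
predictor.set_image(image)

# One positive click inside the structure of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,          # returns 3 candidates; keep the best
)
best_mask = masks[int(scores.argmax())]   # correct this instead of drawing
```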

1 Comment
2024/05/08
05:11 UTC

0

New techniques and Novelty in pose estimation

I want to do relative camera pose estimation from 2D images. The most traditional method seems to be keypoint correspondence plus bundle adjustment. Since I don't have access to, say, a CAD model or depth, I just have relative poses and 2D images. What are the newer methods for solving this problem? What areas can be looked into for novelty?
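
For reference, the classical correspondence-based baseline mentioned above looks roughly like this in OpenCV: match keypoints, estimate the essential matrix with RANSAC, and decompose it into R and t (translation only up to scale). The intrinsics and image paths below are assumptions:

```python
import cv2
import numpy as np

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed pinhole

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical pair
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect and match ORB keypoints between the two views.
orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
pts2 = np.float32([k2[m.trainIdx].pt for m in matches])

# Essential matrix with RANSAC, then recover the relative pose.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("relative rotation:\n", R, "\ntranslation direction:", t.ravel())
```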

5 Comments
2024/05/08
03:15 UTC

2

Seeking advice for OBB detection in a vehicle damage assessment CV project

I’m currently developing a computer vision system aimed at assessing vehicle damage, focusing on detecting and analyzing issues with paint, tires, rims, and more through video feeds. The goal is to create robust oriented bounding boxes (OBB) around the damaged areas.

I've set up an ensemble of two different YOLO models combined with SAHI to improve object detection accuracy. While the YOLO models provide a broad detection scope, I'm leveraging SAHI for its ability to detect smaller, easily missed objects. However, I'm encountering challenges with SAHI introducing significant over-classification, and the generalization of smaller objects remains an issue.

One model assesses vehicle type; the other assesses the unique types of damage.

I'm seeking guidance from those who have tackled similar challenges in computer vision. Specifically, I'm curious about:

  1. How can I refine my approach to reduce over-classification while maintaining the ability to detect smaller objects? (One SAHI knob is sketched after this post.)
  2. Are there alternative models or techniques that could complement or replace parts of my current setup to improve accuracy and generalization?
  3. Any specific tuning tips for YOLO or SAHI when dealing with complex scenarios like damage assessment?

Any feedback, resource recommendations, or shared experiences with similar projects would be incredibly valuable as I navigate these challenges.

Thank you!
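
On point 1, one concrete knob in SAHI is tightening the confidence threshold and the slice-merge threshold, so small-object recall is kept without flooding the output with low-confidence duplicates. A sketch with an assumed model path and illustrative values:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="damage_yolov8.pt",      # hypothetical damage model
    confidence_threshold=0.45,          # raise to suppress over-classification
)

result = get_sliced_prediction(
    "car.jpg",                          # hypothetical test image
    model,
    slice_height=640,                   # larger slices = fewer spurious crops
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_match_threshold=0.5,    # merge duplicate boxes across slices
)
result.export_visuals(export_dir="out/")
```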

1 Comment
2024/05/08
02:15 UTC

6

Extremely accurate position finding - camera question

I'm looking to build an application where I need to find the location of a black square to within +/- 0.13 mm (0.005"). It is a black square on a white background, so a pretty simple image. However, its size can vary from 75-200 mm square. My initial thought is to get a monochrome camera + lens combo that will let me see the image at 1 pixel = 0.01 mm real-world size. This, however, starts to require a huge number of pixels for the largest square sizes; there are not a lot of 400 MP cameras going around. Maybe it's good enough to do 1 pixel = 0.03 mm or 0.05 mm? Even those are still pretty high-resolution cameras. Is my only option to get a high-resolution camera and a proper lens for it? Or are there other ways to do accurate and repeatable center finding to within such tight tolerances? If the size didn't vary so much I could use 4 smaller cameras, one at approximately each corner. This is a work project for in-house use, not for sale, if that matters. Thank you!
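
Worth noting: sub-pixel corner refinement can localize well below one pixel, so a sensor resolving ~0.03-0.05 mm/px may already meet the +/- 0.13 mm spec without a 400 MP camera; that has to be verified on the real optics. A rough OpenCV sketch of the idea:

```python
import cv2
import numpy as np

# Sub-pixel center finding for a dark square on a light background.
gray = cv2.imread("square.png", cv2.IMREAD_GRAYSCALE)   # hypothetical capture
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
square = max(contours, key=cv2.contourArea)

# Coarse corners from the rotated bounding rect, then sub-pixel refinement.
corners = cv2.boxPoints(cv2.minAreaRect(square))
corners = corners.astype(np.float32).reshape(-1, 1, 2)
cv2.cornerSubPix(gray, corners, (15, 15), (-1, -1),
                 (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 50, 1e-4))

center = corners.reshape(-1, 2).mean(axis=0)    # sub-pixel center estimate
print(f"center: ({center[0]:.3f}, {center[1]:.3f}) px")
```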

9 Comments
2024/05/07
22:12 UTC

1

Face classification accuracy low

I've taken the LFW images from sklearn, and I'm trying to analyze classification accuracy as more classes/people are introduced. But the models seem really inaccurate, almost guesswork, and I'm trying to see what I'm doing wrong.

I take the images (in color, with the original sizes), preprocess them by resizing to some shape like (64, 64, 3) (I've tried larger and smaller sizes, not much difference) and normalizing the values (I've also tried aligning the orientation of the images); no centering or cropping was needed because the faces from sklearn come centered and cropped. Then I take a model (I tried VGGFace, DeepFace, ResNet50, etc.) to compute the embeddings, and classify each image according to the minimum Euclidean distance (I've tried cosine distance too).

I wanted to go for a one-shot learning setup (in this case there are around 1680 individuals/classes), but I'm even getting poor results incorporating 10 images per person before the test/train split (marginally better than the one-shot case; here there are around 150 individuals/classes). In the latter setup, for k <= 5 classes, accuracy is around 0.6-0.7, and then it drops sharply as k increases (around 0.3 for k=10, 0.2 for k=20, 0.15 for k=40, 0.1 for k=140).

I’m wondering if I’m doing something wrong or this is expected? I’m thinking about maybe taking the 68 facial landmarks from dlib, then transforming each face nonlinearly according to some facial template (defined by some 68 points), but I don’t know of a way to find the transformations and warp the images accordingly. Additionally, I don’t know whether this would even sufficiently improve the accuracy.

Am I doing something wrong? Is this expected? Any tips or suggestions?
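
On the landmark-warping idea: the standard lightweight version is not a nonlinear 68-point warp but a similarity transform (rotation, scale, translation) estimated from a few stable landmarks to a fixed template, which typically helps embedding quality noticeably. A sketch, assuming dlib's 68-point landmark layout and a 64x64 template (template coordinates are illustrative):

```python
import cv2
import numpy as np

# Eye centers and nose tip on a 64x64 canvas (illustrative template).
TEMPLATE = np.float32([[22.0, 26.0], [42.0, 26.0], [32.0, 40.0]])

def align_face(image, landmarks68):
    # dlib indices: 36-41 left eye, 42-47 right eye, 30 nose tip.
    src = np.float32([
        landmarks68[36:42].mean(axis=0),
        landmarks68[42:48].mean(axis=0),
        landmarks68[30],
    ])
    # Similarity transform (rotation + scale + translation) to the template.
    M, _ = cv2.estimateAffinePartial2D(src, TEMPLATE)
    return cv2.warpAffine(image, M, (64, 64))
```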

1 Comment
2024/05/07
17:28 UTC

0

Computer Vision in Sports

I hope you find this article interesting.
The OpenCV.ai team has gathered some of the most interesting examples of using computer vision. Read in the blog about how AI helps athletes run faster and score more often, and referees enforce the rules with millimeter accuracy.
In this article, you will see different use cases of computer vision implemented in various types of sports.

0 Comments
2024/05/07
16:48 UTC

1

Remove duplicate images from Polycam image export

I'm trying to run object detection on a set of images collected by a Polycam room/LiDAR scan, for example to count the number of electrical wall outlets with YOLO.

The export gives you a set of images, some of which contain the same frames and therefore the same objects.

Any ideas on the best way to filter out duplicate images so objects are only detected once?
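
One common answer is perceptual hashing: near-identical frames hash to nearby codes, so a frame within a small Hamming distance of an already-kept frame can be dropped. A sketch using the `imagehash` package (directory and threshold are assumptions to tune against your exports):

```python
from pathlib import Path
from PIL import Image
import imagehash

THRESHOLD = 8           # max Hamming distance to count as a duplicate
kept_hashes = []
unique_files = []

for path in sorted(Path("polycam_export").glob("*.jpg")):  # hypothetical dir
    h = imagehash.phash(Image.open(path))
    # "h - prev" is the Hamming distance between the two hashes.
    if all(h - prev > THRESHOLD for prev in kept_hashes):
        kept_hashes.append(h)
        unique_files.append(path)

print(f"kept {len(unique_files)} unique frames")
```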

3 Comments
2024/05/07
14:35 UTC

0

I have to write an LBP descriptor in Python from scratch with no predefined functions, can someone help please?

Thanks in advance
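
A from-scratch sketch of the basic 8-neighbor LBP using only NumPy array operations (whether NumPy counts as a "predefined function" is for the course to decide): each pixel becomes an 8-bit code built from comparisons with its neighbors, and the descriptor is the normalized histogram of those codes.

```python
import numpy as np

def lbp_descriptor(gray):
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:h-1, 1:w-1]
    # Clockwise neighbor offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1+dy:h-1+dy, 1+dx:w-1+dx]
        # Set this bit wherever the neighbor is >= the center pixel.
        codes |= (neighbor >= center).astype(np.int32) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()    # normalized 256-bin descriptor
```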

0 Comments
2024/05/07
09:51 UTC

2

Object Detection in Sewer Condition

Hi everybody,
I am new to the computer vision field. My project is about object detection to detect defects in sewers from CCTV footage.

With the first dataset (3,500 images), training went well with YOLOv2 and YOLOv4, reaching an mAP of 0.8 to 0.9. But when I tried YOLOX with the same augmentation, dataset split, and training options, the average precision was really low, around 0.001, and sometimes the detector could not detect the defect in the images at all. The options include a mini-batch size of 16, a learning rate of 0.0001, and 100 epochs. I am not really sure why. Is it because YOLOv2 and v4 are anchor-based models while YOLOX is anchor-free?

For testing the models, I also collected a public image dataset of around 1,853 images covering 3 defect types. The label counts for the defects are 707, 561, and 719, respectively. With this dataset I trained YOLOv2 and v4 as with the first dataset, with the same augmentation and training options (mini-batch size of 16, learning rate of 0.0001, 100 epochs). However, the average precision is always low, around 0.2 to 0.3, and the validation loss decreased to 3 and then increased to 5-7 until training finished. I am thinking this is an overfitting problem? Do you have any solutions or advice?

Thank you so much.

1 Comment
2024/05/07
08:03 UTC

0

The 2nd ACMMM UAVs in Multimedia workshop: Capturing the World from a New Perspective

We are organizing the 2nd workshop on "UAVs in Multimedia: Capturing the World from a New Perspective" at ACM Multimedia 2024 (Melbourne, Australia). You are welcome to show your insights from a new perspective!

Workshop link: https://www.zdzheng.xyz/ACMMM2024Workshop-UAV/

Call for Paper:

The list of possible topics includes, but is not limited to:

• Video-based UAV Navigation

• UAV Swarm Coordination

• UAV-based Object Detection and Tracking

• UAV-based Sensing and Mapping

• UAV-based Delivery and Transportation

Workshop Papers Submission: 5 July 2024

Workshop Papers Notification: 30 July 2024

Call for Challenge

We also provide a multi-weather cross-view geo-localization dataset, called University160k-WX, and we welcome your participation in the competition.

Challenge Platform: https://codalab.lisn.upsaclay.fr/competitions/18770

0 Comments
2024/05/07
01:42 UTC

0

Meet other innovators incorporating computer vision and edge AI in products at the Embedded Vision Summit, coming up May 21-23 in Santa Clara, California.

Our main conference program on May 22-23 features nearly 100 presenters in four tracks, covering every aspect of practical computer vision and edge AI. Plus there are our Technology Exhibits, featuring 70+ innovative providers of key building blocks for perceptual AI—they’ll be showing hundreds of demos, and their top experts will be available to answer your questions. And on Tuesday, May 21 (the day before the main Summit program starts) don't miss Qualcomm’s Edge AI Deep Dive Session: “Accelerating Model Deployment with Qualcomm AI Hub”.

0 Comments
2024/05/06
21:46 UTC

5

how to utilize my time?

I have two months before my university starts and I want to use them productively. With five years of experience in deep learning and computer vision, how can I best utilize this time? Should I take up competitive programming or a project, or brush up on my basics?

2 Comments
2024/05/06
20:35 UTC

1

Embedded vision summit 2024

Hi, can anyone help me with a discount code, or maybe a free pass to the exhibition? Please, I am kind of broke and I need to be there to do some journaling.

0 Comments
2024/05/06
20:23 UTC

3

Computer Vision Recommendation

Hello, I am interested in the realm of computer vision research. Can you recommend books, videos, or any other resources about the theory of computer vision so I can begin my journey? Thanks in advance.

6 Comments
2024/05/06
19:31 UTC

9

3D CV resources/roadmap

Hi all, I have decent knowledge of CV and around 2 years of work experience with image processing/classification. But I have only worked with 2D images and videos so far.

I need to get started with 3D CV, along with camera calibration/rendering.

Can someone suggest good resources or a roadmap for how to proceed with this?

Thanks in advance

9 Comments
2024/05/06
18:29 UTC

1

Automatically choose an image within the same cluster

I'm doing image similarity assessment: for an input image, I want to get the reference image closest to it, to reduce noise. My dataset's images fall into 3 clusters when I embed them with ResNet-50 and visualize with t-SNE. I tried to automatically choose the image within the same cluster by picking the image with the smallest embedding distance (Euclidean). However, this approach is not consistent, as the reference image can still be chosen from the other 2 clusters (instead of the cluster the input image belongs to). Is there an alternative approach I can take?
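
One alternative, sketched: cluster the original embeddings (t-SNE distorts distances, so it is better kept for visualization only), then restrict the nearest-neighbor search to the query's assigned cluster. The file name and cluster count below are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

embeddings = np.load("embeddings.npy")   # hypothetical (N, D) ResNet features
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)

def closest_in_cluster(query_emb):
    # Assign the query to a cluster, then search only within that cluster.
    cluster = kmeans.predict(query_emb[None])[0]
    idx = np.flatnonzero(kmeans.labels_ == cluster)
    dists = np.linalg.norm(embeddings[idx] - query_emb, axis=1)
    return idx[dists.argmin()]            # index of best same-cluster match
```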

0 Comments
2024/05/06
18:26 UTC

2

Need help with OCR’ing scanned documents with Python

As the title says, I've been trying to write a program that will OCR PDF files and simply add the OCR layer onto the PDF. I don't need anything extracted, just to make the PDF searchable, e.g. when a document is text heavy and someone wants to search for a specific word within it. I've watched tons of tutorials and none of them work the way I need them to. Thank you.
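
One tool built for exactly this workflow is OCRmyPDF, which overlays an invisible text layer on the scanned pages so the PDF becomes searchable without changing its appearance (requires Tesseract installed). A minimal sketch with example paths and two commonly useful flags:

```python
import ocrmypdf

# Add a searchable text layer to a scanned PDF.
ocrmypdf.ocr(
    "scanned_input.pdf",
    "searchable_output.pdf",
    skip_text=True,       # leave pages that already contain text untouched
    deskew=True,          # straighten skewed scans before OCR
)
```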

5 Comments
2024/05/06
18:01 UTC

0

Need AI to recognise facial features

Hey! I need to make an AI that recognises facial features as:

  • Eye color
  • Haircut
  • Roman, bulbous, snub, hawk noses....
  • Convex, wavy foreheads, ...
  • .....

How should it be done?

5 Comments
2024/05/06
17:41 UTC
