/r/computervision
Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.
We welcome everyone from published researchers to beginners!
Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).
If you want an answer to a query, please post a legible, complete question that includes details, so we can help you properly!
I just wanted to ask if anyone has a video link for the lectures of CS131: Computer Vision: Foundations and Applications by Stanford. I found video lectures for CS231n, which is CNNs for Visual Recognition, but CS131 covers all the traditional techniques used in CV.
link for CS231n: https://www.youtube.com/playlist?list=PLf7L7Kg8_FNxHATtLwDceyh72QQL9pvpQ
Hello!
I have a project where I need an eye-tracking algorithm to detect the screen coordinates the user gazes at. I tried some models available on GitHub, but they lack accuracy or require overly difficult calibration (OpenCV calibration using a chessboard was mind-blowing). I want to integrate something accurate and easy to use =) Sooo... do you have a modern eye-tracking algorithm/repository in mind? =)
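For what it's worth, most gaze pipelines separate tracking (pupil/iris localization, e.g. via a face-landmark model) from calibration, and the calibration step can be as simple as a least-squares fit from a few on-screen targets, no chessboard needed. A minimal numpy sketch, assuming some tracker already gives you (x, y) pupil coordinates (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def fit_gaze_mapping(pupil_xy, screen_xy):
    """Fit an affine map pupil -> screen via least squares.

    pupil_xy, screen_xy: (N, 2) arrays collected during a calibration
    phase where the user looks at N known on-screen targets (N >= 3).
    """
    n = len(pupil_xy)
    A = np.hstack([pupil_xy, np.ones((n, 1))])  # rows are [x, y, 1]
    # Solve A @ W ~= screen_xy for the 3x2 weight matrix W.
    W, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)
    return W

def predict_gaze(W, pupil_xy):
    """Map pupil coordinates to screen coordinates with a fitted W."""
    p = np.atleast_2d(pupil_xy)
    return np.hstack([p, np.ones((len(p), 1))]) @ W
```

In practice you would show 5-9 calibration dots and use a quadratic polynomial instead of an affine map for better accuracy, but the fitting machinery is the same.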
Does anyone have any good resources for learning computer vision using C++? I have personally always solved vision problems using Python, but after seeing the job market and talking to one of my interviewers, I was told that most of their vision tasks are done in C++. If anyone has good material they can share, I would really appreciate it. Thank you!!
EDIT
Hey everyone I found a reddit thread that explained my problem better and has some detailed information which you might find useful https://www.reddit.com/r/computervision/comments/12cyzwa/practical_c_for_computer_vision_jobs/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
This gives a better explanation of what I meant and good insight into the practicality of using C++ in CV, thanks to u/CowBoyDanIndie, but it does not have any resources per se. If you have any to share, it will be really appreciated.
Thank you!!
Hello,
I have three cameras and I'd like to find the distance in meters from point A to point B within the frame, as you can see in the uploaded images with the ground-truth values.
Can someone please guide/advise me on how to tackle this problem?
What have I tried?
Please guide and advise. Thanks a lot.
Camera Details - Unifi G4 Pro.

| | |
|---|---|
| Lens | 4.1–12.3 mm; ƒ/1.53–ƒ/3.3 |
| View angle | Wide: H 109.9°, V 60°, D 127.7°; Zoom: H 35°, V 19.8°, D 40° |
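If both points lie on the ground plane, one standard approach is a pixel-to-ground homography estimated from four or more reference points whose real-world positions you measure on site (e.g. with a tape measure). A minimal numpy sketch of that idea (the DLT fit below is a bare-bones version of what `cv2.findHomography` does; all coordinates in the test are made up for illustration):

```python
import numpy as np

def fit_homography(img_pts, world_pts):
    """Estimate the 3x3 homography mapping image pixels to ground-plane
    metres from >= 4 point correspondences (direct linear transform)."""
    A = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        A.append([-x, -y, -1, 0, 0, 0, X * x, X * y, X])
        A.append([0, 0, 0, -x, -y, -1, Y * x, Y * y, Y])
    # The homography is the null vector of A (last right-singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def pixel_to_metres(H, pt):
    """Project a pixel onto the ground plane, in metres."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

def ground_distance(H, a, b):
    """Metric distance between two pixels, assuming both are on the plane."""
    return float(np.linalg.norm(pixel_to_metres(H, a) - pixel_to_metres(H, b)))
```

This only works for points on the calibrated plane; for points off the ground you would need multi-view triangulation across your three cameras instead.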
Hello everyone,
I would like to develop a project for measurement of specific objects in real-world units, in particular to extract depth. Note that I do not intend to measure the distance to the camera, instead I want to find the height, width and depth relative to the object's plane.
I have previously experimented with Structure from Motion (SfM) for 3D reconstruction; then, through point cloud manipulation and knowing the dimensions of a reference square placed within the scene, I was able to roughly extract the dimensions. However, the results were not great, and I would like to try more state-of-the-art approaches.
I have been keeping an eye on recent developments in depth estimation (namely https://github.com/prs-eth/Marigold and https://github.com/LiheYoung/Depth-Anything). Is it a good idea to use these kinds of models to generate 3D models and apply the same approach I mentioned earlier, or would you suggest something else?
I mostly work in developing segmentation and detection deep learning models, so your help to dive into this world would be much appreciated!
Thank you in advance :)
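For the reference-square step itself, the scale recovery is a one-liner once the square's corners are located in the (up-to-scale) reconstruction, and it applies equally to point clouds from SfM or from monocular depth models, which are also only defined up to scale. A minimal numpy sketch:

```python
import numpy as np

def metric_scale(ref_corners, ref_side_m):
    """Recover a metres-per-unit scale factor for an up-to-scale
    reconstruction from the 4 corners of a reference square of known
    side length (corners given in reconstruction coordinates, in order)."""
    p = np.asarray(ref_corners, float)
    # Average the four side lengths to be robust to reconstruction noise.
    sides = [np.linalg.norm(p[i] - p[(i + 1) % 4]) for i in range(4)]
    return ref_side_m / np.mean(sides)

def to_metric(dims_recon, scale):
    """Convert measured dimensions from reconstruction units to metres."""
    return np.asarray(dims_recon, float) * scale
```

The weak point is usually locating the reference corners accurately in the cloud, not the arithmetic, so a high-contrast, well-textured reference helps more than a fancier model.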
I have been asked to extract embeddings of mammograms that may contain various findings. I have the mammograms and a table of the findings present in each of them, but I don't have any information about where the findings are located.

The idea of the project was to fine-tune a convolutional model to get more informative embeddings in the mammography domain and deal with the domain gap between the dataset used to train the backbone and my dataset. My main problem is that, lacking positional information about the findings, the models I try to train do not learn well, or collapse to predicting the same label distribution for every image.

Reviewing the literature, I have not found any work that performs multi-label classification with whole mammograms alone, without using ROIs or any other kind of positional information. What can I do to take advantage of my dataset and get more informative embeddings?
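One common fix for the "same label distribution for every image" collapse is weighting the positive term of the loss per class, since medical findings are usually rare; in PyTorch this corresponds to `BCEWithLogitsLoss(pos_weight=...)`. A minimal numpy sketch of the weight computation and the weighted loss (illustrative, not a full training recipe):

```python
import numpy as np

def pos_weights(labels):
    """Per-class positive weights: (#negatives / #positives) per finding.

    labels: (N, C) binary multi-label matrix, one column per finding."""
    pos = labels.sum(axis=0)
    neg = len(labels) - pos
    return neg / np.maximum(pos, 1)  # guard against all-negative classes

def weighted_bce(logits, labels, w):
    """Multi-label binary cross-entropy with per-class positive weights."""
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12
    loss = -(w * labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return loss.mean()
```

With rare findings up-weighted this way, predicting the marginal label distribution stops being a low-loss solution, which is often enough to get the model learning at all.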
Hello everyone, I hope everyone is doing well. I'm working on a 3D reconstruction project using AI, where I have been given thousands of wheel images in PNG format. How can I set up a pipeline that takes the images one by one and reconstructs a 3D model? And how do I create a high-resolution, accurate 3D wheel model? Are there any good models available? If not, how can I build one for this?
Hey folks,
I'm working on detecting defects in ball grid array (BGA) solder balls, but collecting and annotating enough real images is a pain, cost- and time-wise. So I'm trying to generate a synthetic dataset using Blender and BlenderProc instead.
Problem is, I'm not too familiar with Blender's Python API. Here's what a real BGA image looks like compared to one of my synthetic ones. https://imgur.com/a/5a2PdKU
I trained YOLO v8 on the synthetic data, but it's performing pretty badly on real images. Any tips on how I can improve the quality and realism of my synthetic BGA dataset? I'd really appreciate any advice or pointers from those more experienced with using Blender for this kind of thing.
Let me know if you need any other details! Thanks in advance for the help.
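One cheap lever before reworking the Blender materials is heavier photometric domain randomization of the rendered images, so the detector can't latch onto the renderer's overly clean statistics. A minimal numpy sketch (the jitter ranges are illustrative guesses, not tuned values; libraries like albumentations do this more thoroughly):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize(img):
    """Random contrast/brightness jitter plus sensor noise, applied to a
    float image in [0, 1], to roughen up clean synthetic renders."""
    out = img.astype(float)
    out = out * rng.uniform(0.7, 1.3) + rng.uniform(-0.1, 0.1)  # contrast + brightness
    out = out + rng.normal(0.0, 0.02, size=out.shape)           # Gaussian sensor noise
    return np.clip(out, 0.0, 1.0)
```

Randomizing lighting, camera pose, and surface roughness inside BlenderProc per render, plus this kind of post-hoc jitter, usually narrows the sim-to-real gap more than chasing one perfectly photorealistic scene.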
My dad stays on a farm where trespassers and attempted break-ins have become more frequent. He already has a standard 8 camera Hikvision setup at home and I would like to help him out by adding image classification to his CCTV feeds.
I saw a product called "Spotbot" which is exactly what we are looking for but my dad is cash strapped and you don't really have control over your models.
I have some experience in the field and thus want to give DIY a go.
I am hoping someone can point me in the right direction w.r.t hardware selection, criteria:
I looked at the following:
So far the Orange Pi and the N100 mini PC look like great options, but given that this is not my field of expertise, I would just like to find out what the community thinks/suggests.

P.S. The selected hardware will be on the same network as the 8 CCTV cameras. It will process 2-4 frames from each camera every second and, once a human is detected, notify my dad.
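The 2-4 frames/second requirement boils down to sampling every Nth frame of each stream rather than decoding everything. A minimal sketch of that sampling logic in pure Python (the helper name is hypothetical; in practice you would pull frames over RTSP, e.g. with OpenCV, and run a person detector only on the sampled ones):

```python
def frames_to_process(stream_fps, target_fps):
    """Frame indices within one second of video to run detection on, so an
    N-fps stream is sampled at roughly target_fps without a decode backlog."""
    step = max(1, round(stream_fps / target_fps))
    return list(range(0, stream_fps, step))
```

Sizing-wise, this is the number that matters: 8 cameras x 2-4 sampled frames/s is 16-32 inferences per second, which is what the Orange Pi NPU or N100 needs to sustain for a small person-detection model.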
Hi everyone,
I'm preparing a presentation and paper for my university course on "Spiking Neural Networks (SNNs) on Embedded Devices and Applications." It's a fascinating topic, but I'm tasked with introducing it to an audience unfamiliar with the concept.
I'm looking for:
- Advice: How can I break down this complex topic in a way that's easy to understand?
- Sources: Any must-read papers or articles?
- Presentation tips: Any engaging ways to present this to first-time listeners?

Any help or guidance would be greatly appreciated!
Hi guys. I’m working on a project about 3D road reconstruction from a static single camera (no stereo).
Please help me! Thanks.
Hey everyone. I'm preparing for my first on-site experiment. It will be at a factory. They value their confidentiality highly, so this will be my only chance to go in there for now. They sent me one of the cameras planned to be used in the project.

There will be varied lighting conditions on site and the product has a shiny finish, so I'm thinking about using a polarizing filter to prevent uniformity problems caused by glare. Apart from this, what important preparations should I not overlook? Should I bring a chessboard-like target to calibrate lens distortion? How can I best manage lighting and environmental challenges? Are there specific software tools or libraries you'd recommend for quick setup and analysis? What advice do you have for handling unexpected issues on the spot?

I would really appreciate any advice to ensure things run smoothly. Thanks in advance!
I have been working on annotating medical images for an image segmentation task using a UNet. However, I find the entire process painstaking and long. Unfortunately, there is also a lot of data that needs to be annotated due to the poor signal-to-noise ratio in the images. I recently came across MONAI and concepts like interactive segmentation and the Segment Anything paper. It made me realize that maybe my current approach to annotating is the wrong way to go about this. I have been using 3D Slicer, since it is easy to download and segment in the software, but it is also very difficult to export the data, and it did not seem to support things like interactive segmentation or model-in-the-loop annotation.

I am wondering if I need to revamp the whole annotation procedure. I was thinking of applying pre-trained medical segmentation models (not sure if SAM is the right foundation model, or if it is possible to adjust the number of channels in its input) to fine-tune on my medical task. I was also thinking that maybe 3D Slicer is not the best tool, and that there might be one that is more user-friendly and makes it easy to save and export the data, while also bringing my UNet model into the loop to help with the annotations while it simultaneously gets better.
I want to do relative camera pose estimation using 2D images. The most traditional method seems to be keypoint correspondence followed by bundle adjustment. I don't have access to, say, a CAD model or depth; I just have relative poses and 2D images. What are the new methods for solving this problem? What areas could be explored for novelty?
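As a baseline to compare any new method against, the classical two-view relation is worth having in hand: the relative pose (R, t) defines an essential matrix E = [t]x R that every pair of normalized image correspondences must satisfy. A minimal numpy sketch of that constraint (in practice E is estimated from matches, e.g. with the five-point algorithm, and (R, t) recovered by decomposing it, as in `cv2.recoverPose`):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential(R, t):
    """Essential matrix for camera 2 at pose (R, t) relative to camera 1:
    every correspondence satisfies x2^T E x1 = 0 (normalized coordinates)."""
    return skew(t) @ R
```

Since you have ground-truth relative poses, this also gives you a sanity check: the epipolar residual |x2^T E x1| of your matches under the true pose should be near zero before you trust any learned estimator on the same data.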
I’m currently developing a computer vision system aimed at assessing vehicle damage, focusing on detecting and analyzing issues with paint, tires, rims, and more through video feeds. The goal is to create robust oriented bounding boxes (OBB) around the damaged areas.
I've set up an ensemble of two different YOLO models combined with SAHI to improve object detection accuracy. While the YOLO models provide a broad detection scope, I'm leveraging SAHI for its ability to detect smaller, easily missed objects. However, I'm encountering challenges with SAHI introducing significant over-classification, and generalization on smaller objects remains an issue.

One model assesses the vehicle type; the other assesses the distinct types of damage.
I’m seeking guidance from those who have tackled similar challenges in computer vision. Specifically I’m curious about:
Any feedback, resource recommendations, or shared experiences with similar projects would be incredibly valuable as I navigate these challenges.
Thank you!
I'm looking to build an application where I need to find the location of a black square to within +/- 0.13 mm (0.005"). It is a black square on a white background, so a pretty simple image. However, its size can vary from 75 to 200 mm square.

My initial thought is to get a monochrome camera+lens combo that will let me see the image at 1 pixel = 0.01 mm real-world size. This, however, starts to require a huge number of pixels for the largest square sizes; there are not a lot of 400 MP cameras going around. Maybe it's good enough to do 1 pixel = 0.03 mm or 0.05 mm? Even those are still pretty high-resolution cameras.

Is my only option to get a high-resolution camera and a proper lens for it? Or are there other ways to do accurate and repeatable center finding to within such tight tolerances? If the size didn't vary so much, I could use four smaller cameras, one at approximately each corner.

This is a work project for in-house use, not for sale, if that matters. Thank you!
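One reason 1 pixel may not need to equal your tolerance: the centroid of a thresholded blob averages over thousands of pixels, so it localizes the center to a small fraction of a pixel. A minimal numpy sketch of the idea (global threshold on a synthetic image; a real setup needs controlled lighting and possibly subpixel edge fitting for the corners):

```python
import numpy as np

def square_center(img, thresh=0.5):
    """Subpixel centre of a dark square on a light background, as the
    centroid of all below-threshold pixels. Averaging over the whole blob
    pushes localization noise well below one pixel."""
    ys, xs = np.nonzero(img < thresh)  # coordinates of dark pixels
    return float(xs.mean()), float(ys.mean())
```

With centroiding (or subpixel corner/edge fitting), 1 pixel = 0.03-0.05 mm is often plenty for a +/- 0.13 mm spec, which brings the sensor requirement back into normal territory; the usual caveats are lens distortion and perspective, so a calibration target is still worthwhile.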
I’ve taken the images from lfw from sklearn, and I’m trying to analyze the classification accuracy as you introduce more classes/people. But the models seem really inaccurate, almost guesswork, and I’m trying to see what I’m doing wrong.
I take the images (in color, at the original sizes), preprocess them by resizing to some shape like (64, 64, 3) (I've tried larger and smaller sizes, without much difference) and normalizing the values (I've also tried aligning the orientation of the images); no centering or cropping was needed because the faces from sklearn come centered and cropped. Then I take a model (I tried VGGFace, DeepFace, ResNet50, etc.) to compute the embeddings, and I classify each image according to the minimum Euclidean distance (I've tried cosine distance too).

I wanted to go for a one-shot learning setup (in this case there are something like 1680 individuals/classes), but I'm getting poor results even incorporating 10 images per person before the test/train split (marginally better than the one-shot case; here there are around 150 individuals/classes). In the latter setting, for k <= 5 classes, accuracy is around 0.6-0.7, and then it drops sharply as k increases (around 0.3 for k=10, 0.2 for k=20, 0.15 for k=40, 0.1 for k=140).
I’m wondering if I’m doing something wrong or this is expected? I’m thinking about maybe taking the 68 facial landmarks from dlib, then transforming each face nonlinearly according to some facial template (defined by some 68 points), but I don’t know of a way to find the transformations and warp the images accordingly. Additionally, I don’t know whether this would even sufficiently improve the accuracy.
Am I doing something wrong? Is this expected? Any tips or suggestions?
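For what it's worth, the nearest-neighbour step itself is rarely the problem; most of the accuracy comes from (a) using embeddings from a network trained with a face-recognition loss (e.g. an ArcFace-style model) rather than a generic ImageNet classifier like plain ResNet50, and (b) aligning the crops before embedding. A minimal numpy sketch of the classification step, so you can isolate it from the embedding quality:

```python
import numpy as np

def classify(query, gallery, labels):
    """Nearest-neighbour identification on L2-normalized embeddings.

    After normalization, the largest dot product equals the smallest
    cosine distance, so one matrix product scores the whole gallery."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return labels[int(np.argmax(g @ q))]
```

If this step is correct and accuracy still decays sharply with k, the embeddings are the culprit: ImageNet features are not identity-discriminative, which matches the near-guesswork numbers you are seeing.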
I hope you find this article useful.

The OpenCV.ai team has gathered some of the most interesting examples of using computer vision in sports. Read in the blog how AI helps athletes run faster and score more often, and helps referees enforce the rules with millimeter accuracy.
In this article, you will see different use cases of computer vision implementation in various types of sports.
I'm trying to run object detection on a set of images that are collected by Polycam room/lidar scan. For example, count the number of electrical wall outlets with Yolo
The export mode gives you a set of images, some of which contain the same frames and therefore the same objects.

Any ideas on the best way to filter out duplicate images so objects will only be detected once?

Thanks in advance
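One lightweight option is a perceptual hash: near-identical frames hash to the same value, so exact-duplicate exports can be dropped before running YOLO. A minimal numpy sketch of a difference hash (dHash) with a crude nearest-neighbour resize; note this only catches identical or nearly identical frames, while "same outlet, different viewpoint" needs feature matching or the scan's camera poses instead:

```python
import numpy as np

def dhash(gray, size=8):
    """Difference hash of a grayscale image (2D numpy array): shrink to
    (size x size+1), then record which horizontal neighbour is brighter."""
    h, w = gray.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size + 1) * w // (size + 1)
    small = gray[np.ix_(ys, xs)].astype(float)  # crude subsampling resize
    bits = small[:, 1:] > small[:, :-1]
    return bits.tobytes()

def dedupe(images):
    """Return indices of images to keep, one per distinct hash."""
    seen, keep = set(), []
    for i, img in enumerate(images):
        h = dhash(img)
        if h not in seen:
            seen.add(h)
            keep.append(i)
    return keep
```

Comparing hashes by Hamming distance (instead of exact equality) lets you also drop frames that differ only by compression noise, at the cost of picking a distance threshold.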
Hi everybody,
I am new to the computer vision field. My project is about object detection for detecting defects in sewers from CCTV footage.

With the first dataset (3,500 images), training went well with YOLOv2 and YOLOv4, with mAP of 0.8 to 0.9. But when I tried YOLOX with the same augmentation, dataset split, and training options, the returned average precision was really low, around 0.001, and sometimes the detector could not detect the defect in the images at all. The options include a minibatch size of 16, a learning rate of 0.0001, and 100 epochs. I am not really sure why. Is it because YOLOv2 and v4 are anchor-based models while YOLOX is anchor-free?

For testing, I also collected a public image dataset of around 1,853 images covering 3 defect types; the number of labels for each defect is 707, 561 and 719, respectively. With this dataset, I trained YOLOv2 and v4 as with the first dataset, using similar augmentation and training options (minibatch size 16, learning rate 0.0001, 100 epochs). However, the returned average precision is always low, around 0.2 to 0.3, and the validation loss dropped to 3 and then rose to 5-7 by the end of training. I am thinking this is an overfitting problem? Do you have any solutions or advice?
Thank you so much.
We are organizing the 2nd workshop on "UAVs in Multimedia: Capturing the World from a New Perspective" at ACM Multimedia 2024 (Melbourne, Australia). You are welcome to show your insights from a new perspective!
Workshop link: https://www.zdzheng.xyz/ACMMM2024Workshop-UAV/
Call for Paper:
The list of possible topics includes, but is not limited to:
• Video-based UAV Navigation
• UAV Swarm Coordination
• UAV-based Object Detection and Tracking
• UAV-based Sensing and Mapping
• UAV-based Delivery and Transportation
Workshop Papers Submission: 5 July 2024
Workshop Papers Notification: 30 July 2024
Call for Challenge
We also provide a multi-weather cross-view geo-localization dataset, called University160k-WX, and we welcome your participation in the competition.
Challenge Platform: https://codalab.lisn.upsaclay.fr/competitions/18770
Our main conference program on May 22-23 features nearly 100 presenters in four tracks, covering every aspect of practical computer vision and edge AI. Plus there are our Technology Exhibits, featuring 70+ innovative providers of key building blocks for perceptual AI—they’ll be showing hundreds of demos, and their top experts will be available to answer your questions. And on Tuesday, May 21 (the day before the main Summit program starts) don't miss Qualcomm’s Edge AI Deep Dive Session: “Accelerating Model Deployment with Qualcomm AI Hub”.
I have two months before my university starts and I want to use them productively. With five years of experience in deep learning and computer vision, how can I best utilize this time? Should I take up competitive programming, start a project, or brush up on my basics?
Hi, can anyone help me with a discount code, or maybe a free pass to the exhibition? Please I am kind of broke and I need to be there to do some journaling
Hello, I am interested in the realm of computer vision research. Can you recommend books, videos, or any other resources about the theory of computer vision so I can commence my journey? Thanks in advance.
Hi all, I have decent knowledge of CV and around 2 years of work experience with image processing/classification, but I have only worked with 2D images and videos so far.

I now need to get started with 3D CV, along with camera calibration/rendering.

Can someone suggest good resources or a roadmap for how to proceed with this?
Thanks in advance
I'm doing image similarity assessment: for an input image, I'm trying to get the reference image that is closest to it, to reduce noise. My dataset's images fall into 3 clusters when I embed them using ResNet50 and t-SNE. I tried to automatically choose the reference image from within the same cluster by taking the image with the smallest embedding distance (Euclidean). However, this approach is not consistent: the reference image can still be chosen from one of the other 2 clusters, instead of the cluster the input image belongs to. Is there an alternative approach I can take?
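One way to make the choice consistent is to restrict the nearest-neighbour search to the query's own cluster: first assign the query to its nearest cluster centroid (e.g. from k-means on the embeddings), then search only among references with that cluster id. A minimal numpy sketch, assuming you already have the embeddings, their cluster labels, and the centroids:

```python
import numpy as np

def nearest_in_cluster(query, embeddings, cluster_ids, centroids):
    """Index of the reference with the smallest embedding distance to the
    query, restricted to the query's own cluster."""
    # Assign the query to the cluster with the nearest centroid.
    c = int(np.argmin(np.linalg.norm(centroids - query, axis=1)))
    idx = np.nonzero(cluster_ids == c)[0]
    # Nearest neighbour among that cluster's members only.
    d = np.linalg.norm(embeddings[idx] - query, axis=1)
    return int(idx[np.argmin(d)])
```

One caveat: cluster in the original embedding space, not the t-SNE projection, since t-SNE distorts distances and is not applicable to new query points.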
As the title says, I've been trying to write a program that will OCR PDF files and just add the OCR layer onto the PDF. I don't need anything to be extracted; I just want to make the PDF searchable, e.g. if the document is text-heavy and someone wants to search for a specific word within it. I've watched tons of tutorials and none of them work the way I need them to. Thank you.
Hey! I need to make an AI that recognises facial features such as:
How should it be done?