/r/computervision

Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question that includes details so we can help you in a proper manner!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group

90,042 Subscribers

0

Open Source Computer Vision App with UI

Hello legends, I’ve recently stumbled into your community and I’m blown away by what some of you are able to accomplish. I’ve been tasked with implementing an open source computer vision project to detect missing labels on our assembly line. I know there are a lot of really powerful builds out there, but I’m mainly asking whether any of them come with a user interface, for example to retrain the model or to click a button that takes a picture and runs the detection.

I’m sorry for the lack of detail; I’m new to much of this field. This is quite a stretch for me and any information you could provide would be greatly appreciated!

J

3 Comments
2024/04/25
18:35 UTC

0

Rabbit R1 AI Real World Uses First Impressions

0 Comments
2024/04/25
17:50 UTC

2

Help with Object Detection and Tracking

I have a college project to build software that does the following:

  1. Count red blood cells (RBCs) in a video.
  2. Count white blood cells (WBCs) in a video.

I'm new to this field, but I managed to gather some information.

I used YOLOv9 for object detection and trained it on my custom data (I'll leave a link to the dataset on Roboflow).

I'm using supervision for the tracking part, but for some reason I can't track the cells correctly. The problems I've faced are:

  1. Object IDs keep repeating for different cells of the same type.
  2. New cells are detected as old ones and given an old ID.

Notes for background:

The cells' shape changes throughout the video because they can deform.

The best tracking method I can think of is to track the center of each cell regardless of its shape, but I can't do that either.
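For illustration, the center-tracking idea can be sketched in plain Python: reduce each detection box to its center, greedily match centers between consecutive frames by distance, and give every unmatched detection a fresh ID that is never reused. This is only a sketch under assumptions, not the supervision or CenterTrack API; the class name `CentroidTracker` and the `max_dist` threshold are hypothetical, and the boxes are made-up values.

```python
import math
from itertools import count


def box_center(box):
    """Center (cx, cy) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


class CentroidTracker:
    """Hypothetical greedy nearest-centroid tracker (illustration only)."""

    def __init__(self, max_dist=30.0):
        self.max_dist = max_dist  # assumed pixel threshold; tune per video
        self.tracks = {}          # track_id -> last known center
        self._ids = count(1)      # monotonically increasing, never reused

    def update(self, boxes):
        """Match this frame's boxes to existing tracks; return one ID per box."""
        centers = [box_center(b) for b in boxes]
        # All (distance, track_id, detection_index) pairs within range.
        pairs = sorted(
            (math.dist(c, t), tid, i)
            for i, c in enumerate(centers)
            for tid, t in self.tracks.items()
            if math.dist(c, t) <= self.max_dist
        )
        assigned, used_tracks = {}, set()
        for _, tid, i in pairs:  # closest pairs are matched first
            if i not in assigned and tid not in used_tracks:
                assigned[i] = tid
                used_tracks.add(tid)
        for i in range(len(centers)):  # unmatched detections start new tracks
            if i not in assigned:
                assigned[i] = next(self._ids)
        # Keep only tracks seen in this frame (no occlusion handling).
        self.tracks = {assigned[i]: centers[i] for i in range(len(centers))}
        return [assigned[i] for i in range(len(centers))]


tracker = CentroidTracker(max_dist=30.0)
ids_frame1 = tracker.update([(10, 10, 30, 30), (100, 100, 120, 120)])
ids_frame2 = tracker.update([(14, 12, 36, 30), (300, 300, 320, 320)])
unique_cells_seen = max(ids_frame1 + ids_frame2)
```

Because IDs only ever increase, the highest ID issued doubles as a running count of distinct cells. A real video would likely need a per-class tracker (RBC vs. WBC) and some tolerance for missed detections, which this sketch leaves out.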

I've tried CenterTrack, but I can't get my head around it. It's complicated and old enough that I'd have to change a lot of the source code myself.

I'm open to using totally different models from the ones I've selected, so if you have an idea that would work better in my case, I'm all ears :)

Thanks in advance <3

The dataset used for training

My work on Google Colab

Sample of the input video

1 Comment
2024/04/25
16:27 UTC

3

Recent work on two stage object detection models?

I’ve been working with YOLO models for the past few years, but I’m thinking about trying some new stuff out and getting back up to date with what’s been going on in the field. There have been new YOLO advances every year or so, but Faster R-CNN is from 2016 and it seems like it’s still the standard in that family. Is that correct? I was reading about G-RCNN, but even that is from 2021, which isn’t exactly new. Has that line of research turned to DETR and other transformer models? Overall I’m trying to get a better sense of where the field is, and I’m having some trouble sorting through the information.

1 Comment
2024/04/25
14:24 UTC

1

Methods to filter segmentation masks.

An object is positioned on a rotating platform, allowing a stationary camera to capture multiple images. These images are then fed into Segment Anything Model to generate segmentation masks.

SAM produces high-quality masks, but some masks don't meet the standard and I have to filter them out. Currently, I go through the masks manually and discard the bad ones. Are there methods to automate this?

Current solution: counting contours and filtering by contour area.
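As a rough sketch of how the contour-count and area checks could be automated, the snippet below scores each binary mask by its foreground fraction and its number of connected regions, and keeps only masks that pass both. It is plain Python on 2D lists of 0/1 rather than SAM output, and the function names and thresholds (`min_frac`, `max_frac`, `max_components`) are hypothetical values that would need tuning on real data.

```python
from collections import deque


def mask_area(mask):
    """Number of foreground pixels in a 2D list of 0/1 values."""
    return sum(sum(row) for row in mask)


def count_components(mask):
    """Count 4-connected foreground regions with a breadth-first flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                components += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def keep_mask(mask, min_frac=0.05, max_frac=0.9, max_components=1):
    """Accept a mask only if its size and fragmentation look plausible."""
    frac = mask_area(mask) / (len(mask) * len(mask[0]))
    return (min_frac <= frac <= max_frac
            and count_components(mask) <= max_components)


good = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
fragmented = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
full_frame = [[1] * 4 for _ in range(4)]
kept = [keep_mask(m) for m in (good, fragmented, full_frame)]
```

If the masks come from SAM's automatic mask generator, each result also carries `predicted_iou` and `stability_score` fields, which may be worth thresholding alongside these geometric checks.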

3 Comments
2024/04/25
11:24 UTC

1

Help with no-identification object detection

Hi everyone, I am doing a college project with little knowledge of AI, and especially of computer vision, and I need some help. Our goal is to take a picture of several ingredients, identify them, and suggest recipes. I already have a good model trained to identify the ingredient in an image containing a single ingredient. Now I want to take an image with many ingredients, isolate each one with a bounding box, and feed the crops into the model so that it predicts them one by one.

I tried to find models that just create a bounding box for any object in the image. I don't want the model to identify the objects; it doesn't even need to know what they are, only that there is something there.

I tried YOLO, but apparently it only detects the object classes it was trained on, so in a picture with apples, oranges, and strawberries it will not detect the strawberries as objects, for example. I tried other packages and even OpenCV, but I can't seem to get it working the way I want. Any suggestions? Just a package or a link to some examples would be great, as I can't seem to find a tool that does exactly what I need.

Thanks in advance!

2 Comments
2024/04/25
10:57 UTC

86

Computer vision on an MCU, and I got this fan that follows my every single move! No more manual adjustments or stagnant air!!!

12 Comments
2024/04/25
09:55 UTC

0

Detection of Smoke and Fire

Hi everyone,

I was researching ways to detect fire and smoke using computer vision and wanted to know what approaches are available. So far I have found that object detection and segmentation are two such approaches.

I also think edge detection based methods can work but they would need to be scene specific, right? Are there other methods of solving this issue?

An important assumption: This method should work for indoor environments with controlled light conditions.

0 Comments
2024/04/25
09:40 UTC

1

Text Line Dewarping Dataset

I'm looking for any publicly available dataset that contains curved text lines (preferably one per image), like those from "Alignment of Curved Text Strings for Enhanced OCR Readability". I created an algorithm of my own (good enough for a paper, according to my professor) and I need a dataset to test its performance.

0 Comments
2024/04/25
07:42 UTC

1

Different master's degrees and job titles?

My career goal is to work in research/R&D in robotic perception. I'm currently working towards a master's with a focus on perception while working full time. My degree is a Master of Science in Computer Science.

My questions are:

  • Is this a good degree for my career goal? Would an M.S. in AI, autonomous systems/robotics, or CV be better?
  • My current role is senior ML engineer with a focus on CV; however, I have another opportunity as an AI solution architect. If I want to stay on the technical manager side, is ML engineer or AI SA the better role?

Thank you in advance!!

0 Comments
2024/04/25
05:28 UTC

14

How do we accurately measure objects in the real world using Computer Vision without any reference object?

The problem is the following: given an image like this, how do we estimate foot length and width using image processing or deep learning techniques?

  1. I have tried image processing techniques, but the results weren't good at all.
  2. I have used YOLOv8 + FastSAM for zero-shot segmentation along with a reference object, and it works perfectly, but the client doesn't want to use any reference object. What would be the solution in that case?

I know the solution to such a problem is complicated, but I want to hear your ideas.
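For context on why a reference object helps: under the pinhole camera model, a pixel length converts to a metric length only if the distance to the object and the focal length in pixels are known. The sketch below shows that relation. It assumes the foot is roughly parallel to the image plane and that depth is available from some other source (for example a depth sensor); the function names and every number in it are made up for illustration.

```python
def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixels."""
    return focal_mm * image_width_px / sensor_width_mm


def pixels_to_metric(length_px, depth_m, focal_px):
    """Pinhole model: metric length of a fronto-parallel segment at depth_m."""
    return length_px * depth_m / focal_px


# Hypothetical camera: 4 mm lens, 6 mm wide sensor, 3000 px wide image.
f_px = focal_length_px(focal_mm=4.0, sensor_width_mm=6.0, image_width_px=3000)

# Hypothetical measurement: the segmented foot spans 1040 px at 0.5 m.
foot_length_m = pixels_to_metric(length_px=1040, depth_m=0.5, focal_px=f_px)
```

Without a reference object, the depth term is the missing piece, so any approach has to recover it some other way, such as a depth camera, stereo, or a known camera height above the floor.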

https://preview.redd.it/e1b3nmmlmjwc1.png?width=1425&format=png&auto=webp&s=54e74b09ed4db3437c8ee5e44100c46bafd67946

https://preview.redd.it/ke8h6d4lljwc1.jpg?width=1200&format=pjpg&auto=webp&s=8ce1beeb91b4f0e97ece3b0d1117c3cf1801520e

33 Comments
2024/04/25
03:20 UTC

1

Will perceptual hash image recognition detect two similar selfies?

So basically I’ve got two images. They’re selfies of me and my friend rock climbing. In one of them we’re smiling, and the other was taken 10 seconds later with us not smiling. The phone was kept in more or less the same spot for both photos. Would this get flagged by duplicate image recognition algorithms and software? Thanks 🙏
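To make the question concrete, here is a minimal sketch of one simple perceptual hash (average hash) and the Hamming distance used to compare two hashes. It assumes the images have already been shrunk to a tiny grayscale grid; the 4x4 grids, pixel values, and variable names are toy data, not real photos. Whether two photos get flagged depends on the hash used and the distance threshold chosen by the particular tool.

```python
def average_hash(gray):
    """aHash of a small grayscale grid: one bit per pixel, set if pixel > mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of bit positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy 4x4 "images": the second differs from the first in a single region.
smiling = [[10, 10, 200, 200],
           [10, 10, 200, 200],
           [10, 200, 200, 10],
           [10, 10, 10, 10]]
not_smiling = [[10, 10, 200, 200],
               [10, 10, 200, 200],
               [10, 10, 200, 10],
               [10, 10, 10, 10]]
unrelated = [[200, 200, 10, 10],
             [200, 200, 10, 10],
             [10, 10, 10, 10],
             [200, 200, 200, 200]]

d_similar = hamming(average_hash(smiling), average_hash(not_smiling))
d_unrelated = hamming(average_hash(smiling), average_hash(unrelated))
```

Here the near-identical pair differs in 1 of 16 bits, while the unrelated pair differs in 14. Two selfies taken seconds apart from the same spot would plausibly land close together under a hash like this, because the overall layout of light and dark areas barely changes.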

2 Comments
2024/04/25
02:06 UTC

1

Which are the current technologies for joint detection?

I have been seeing some projects using Kinect and joint detection, and I was wondering what is behind that. How can you implement something like it? What is the current state of the art?

6 Comments
2024/04/24
22:28 UTC

1

Need help with TensorFlow not detecting Nvidia GPU (GTX 1650) despite having both Nvidia and AMD GPUs

Hey everyone,

I'm encountering an issue with TensorFlow where it's not detecting my Nvidia GeForce GTX 1650 GPU for acceleration, despite having both an Nvidia GPU and an AMD Radeon Graphics card installed in my system. I've tried various troubleshooting steps, but I'm still unable to get TensorFlow to recognize the Nvidia GPU.

If anyone has experienced a similar issue or has any suggestions on how to resolve this, I would greatly appreciate your help. I'm also open to alternative solutions or recommendations for frameworks that support both Nvidia and AMD GPUs for machine learning tasks.

And yes, this was the code I used to search for GPUs

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

11 Comments
2024/04/24
19:16 UTC

0

Where are my keys? computer vision app

Hi there, this is Antonio from Italy, and I'm working on my startup, which integrates computer vision. I'm looking for people skilled in computer vision to help me complete my project. The project consists of a Raspberry Pi that recognises objects in a drawer. I want to bring this concept to consumers and integrate it into a mobile app. The app side is covered; I just need a team member for the computer vision. If interested, you can contact me on Telegram @ Hairlatte

This is not my first startup. I worked in a startup accelerator for almost 5 years, and I'd like to bring this project to market.

5 Comments
2024/04/24
17:32 UTC

10

Explanation of Cross-Validation

Hi guys, I was interviewing with a company for a computer vision engineer position. One stage was a take-home assignment with a few questions. One question involved classifying fruits; the images were fairly high resolution (1024x768) but few in number (approx. 700 images). I tried transfer learning with data augmentation but finally picked an SVM on hand-extracted features owing to its better performance.

In their feedback, they objected to the use of a traditional method, and when I explained my reasoning they replied: "The images don't have the same resolution" (meaning I had reduced the resolution before processing the images). "Also, the results obtained with cross validation cannot be used to measure generalisation performance" (I had given better generalization as my reason for using cross-validation). Of course it was a rejection, but I really want to know whether their feedback is right. I believe reducing image quality, and hence the computational cost, while still producing good results is considered good practice. Also, as far as I know, the whole purpose of cross-validation is to measure the generalized performance of the model.

I would be really grateful if someone could explain this to me or correct me if I am wrong here.
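One common reading of that feedback is that cross-validation scores used to choose between models or hyperparameters become optimistically biased, so the final generalisation estimate should come from data that played no part in the selection. The sketch below shows the usual split under that reading: hold out a test set first, then run k-fold cross-validation on the rest. It uses only index bookkeeping in plain Python; the function names are hypothetical, and the 700 simply mirrors the dataset size mentioned in the post.

```python
import random


def train_test_split_indices(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and split off a held-out test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]  # (development, test)


def kfold(indices, k=5):
    """Yield (train, val) index lists; each sample is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# Hold out 20% of the 700 images before any model selection happens.
dev_idx, test_idx = train_test_split_indices(700, test_frac=0.2, seed=0)

# Use cross-validation on the development set to compare models/settings.
splits = list(kfold(dev_idx, k=5))

# Only the final chosen model is evaluated once on test_idx.
```

With this layout, cross-validation answers "which model should I pick?", and the untouched test set answers "how well does the picked model generalise?". Nested cross-validation is an alternative when the dataset is too small to spare a test set.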

6 Comments
2024/04/24
14:45 UTC

5

Creating Segmentation Masks of Rings (i.e. circles with holes in them)

Hello

I am a beginner. I have tried to format this as clearly as possible, but if something is confusing please let me know.

I am working to build a fairly basic object classifier that will distinguish between lines and rings. My first step is to build an annotated library for training. I've sped this process up slightly by using a generalist segmentation algorithm to create masks around the objects. The problem is that the masks generated for the rings seem to detect the edges and then fill in the hole in the center, so the result looks like a solid circle (image attached).

https://preview.redd.it/urk6qhuaqfwc1.png?width=660&format=png&auto=webp&s=a0a838058942dbcf17c0ef34695ad77804ab39a3

https://preview.redd.it/ioelactbqfwc1.png?width=793&format=png&auto=webp&s=37953eca984a8d053203666a72829a195aaa9e13

When I did the segmentation myself, either with cv2.findContours() or just with edge detection, I was able to make some fairly accurate masks for the lines. However, I still had the issue that the mask for the rings was completely filled in.

Is there a way to avoid this so that I can get masks that retain the hole in the center?

One of the goals of the project is to get information out of the ring structures, such as the thickness of the ring, which is why a filled-in mask isn't ideal. Additionally, my biggest bottleneck at the moment is going in and redefining the holes in these rings, so this would speed up my annotating by a LOT.
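One possible way to recover the hole, sketched below in plain Python, is to intersect the filled mask with a simple intensity threshold on the original image, then estimate thickness from the areas. This assumes the ring material is brighter than the hole (flip the comparison otherwise) and that the ring is roughly circular; the function names, the threshold, and the 5x5 toy image are all hypothetical.

```python
import math


def ring_mask(filled_mask, gray, threshold):
    """Keep pixels inside the filled mask whose intensity exceeds threshold."""
    return [
        [1 if m and g > threshold else 0 for m, g in zip(mrow, grow)]
        for mrow, grow in zip(filled_mask, gray)
    ]


def ring_thickness(filled_mask, ring):
    """Estimate thickness as outer radius minus hole radius (equal-area circles)."""
    outer_area = sum(map(sum, filled_mask))
    hole_area = outer_area - sum(map(sum, ring))
    return math.sqrt(outer_area / math.pi) - math.sqrt(hole_area / math.pi)


# Toy 5x5 example: bright border (200) around a dark 3x3 hole (20).
gray = [[200] * 5] + [[200, 20, 20, 20, 200] for _ in range(3)] + [[200] * 5]
filled = [[1] * 5 for _ in range(5)]  # the filled-in mask from the segmenter

ring = ring_mask(filled, gray, threshold=100)
thickness_px = ring_thickness(filled, ring)
```

If you stay with `cv2.findContours`, note that the `cv2.RETR_EXTERNAL` retrieval mode returns only outer boundaries, whereas `cv2.RETR_CCOMP` and `cv2.RETR_TREE` also return hole contours through the hierarchy output, which may be enough to keep the center open.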

Thanks!

3 Comments
2024/04/24
14:08 UTC

1

Using pretrained pose estimation model for MPII TRB prediction

Hi :) I want to use a pretrained pose estimation model to further train it on the MPII-TRB-Dataset. Most of the time, those models are trained with the COCO Dataset containing 17 keypoints.

I tried to do this with the detectron2 framework and MMPose, but in both cases I can’t find a way to adapt the pretrained models so that they predict 40 keypoints instead of the 17 keypoints from the COCO dataset.

For detectron2, I wrote my code based on this tutorial and changed the values of “cfg.MODEL.ROI_KEYPOINTS_HEAD.NUM_KEYPOINTS” and “cfg.TEST.KEYPOINT_OKS_SIGMAS” to match my dataset, but it still only predicts the 17 keypoints from COCO.

I also had a look at MMPose but didn’t get any results. Am I correct in assuming that I only need to adapt the head of a 2D keypoint detection model? And if so, how can I adapt it to my problem? Help is very much appreciated.

2 Comments
2024/04/24
11:05 UTC

8

Uncensored auto-captioning libraries that work well for NSFW image datasets

I have a large (>2.5 million files) dataset of NSFW images that I would like to auto-generate detailed (~100-150 token) captions for, using a visual language model similar to CogVLM or Llava.

I have tried both CogVLM and Llava, and unfortunately both models are far too heavily censored to complete the task. The responses range either from outright refusal to caption the images, or captions that are so heavily filtered for "appropriateness" that they fail to describe the important features of the image. Also, it appears that they struggle to comprehend the actual contents of the images, due to having a skewed training dataset that deliberately excluded this class of images.

I am wondering if there are any image understanding models that work in a similar way to CogVLM but that are less heavily censored and work well with captioning NSFW images and are comfortable writing output with "vulgar" language? Basically, I want to provide a text prompt (e.g. "Describe what these people are doing in this image.") and an image file, and have the model provide a caption with the specified # of tokens.

Or alternatively, if there are no such models that already exist, do you know of any work that people have done to create fine-tunes of CogVLM/Llava that would work for this purpose, and what this fine tuning process would look like for the task described above? Thanks!

8 Comments
2024/04/24
04:40 UTC

1

Adapting pre trained semantic/panoptic segmentation on unseen datasets

I have a project where I need to adapt pretrained segmentation models to new unlabeled datasets, so it falls under unsupervised domain adaptation. However, I am struggling to find proper resources on how to do it. All of the tutorials and documentation I am seeing are for classification. If anyone knows of a good post, tutorial, or GitHub repo beyond Roboflow, please help 🙂

1 Comment
2024/04/24
01:58 UTC

1

Medical CT + Radiomics

Hi everyone! I’m a beginner at working with 3D CT, and I have tried extracting features from it using radiomics. I first segmented the lungs from the CT using a prebuilt deep learning model and used the mask to tell PyRadiomics which area to extract features from. However, the features I got did not separate my groups when I plugged them into UMAP. Any thoughts, suggestions, or alternatives to this process?

2 Comments
2024/04/23
21:41 UTC

0

How Long Do You Think It Will Be Until We Get FDVR

1 Comment
2024/04/23
20:23 UTC

0

Denoising diffusion model for image-to-image translation

I’ve been working on a project to do paired image-to-image translation and have been using GANs (pix2pix) so far. However, I wanted to know if I could use denoising diffusion models for this purpose, and if so, if there are any GitHub repos where people have already implemented this for 2D or 3D (like MRI/CT) images? Thanks!

1 Comment
2024/04/23
19:16 UTC

9

Why do most Computer Vision startups prefer iOS to Android?

I was researching some computer vision startups, and I noticed that the majority of them are iOS first, with Android at a later stage.

I understand the ANE (Apple Neural Engine) in iPhones is one factor. Are there any others?

18 Comments
2024/04/23
18:27 UTC

5

Where to start

Hello everyone, I’m a mechanical engineer and lately I’ve been dealing with computer vision at my company. One of our customers asked us to integrate a system to detect problems in the foundry process (e.g. lack of material, scratches). I was involved in the project because I have the knowledge to establish the cause of a lack of material, and because I already use vision systems in very simple applications to recognise geometries or boundaries. Since my knowledge of cameras, lighting, and algorithms is not enough this time, I’m interested in studying these topics from both a theoretical and a programming point of view (I know the rudiments of Python and C++).

For the theory I’ve discovered the YouTube channel “first principles of computer vision”. I wonder if there are other YouTube channels (or suggested courses on Udemy, Udacity, or similar) that focus on implementing simple algorithms (maybe with examples, projects, exercises).

I’m not really interested in neural networks because for my job classic edge detection, filtering, and so on are more than enough (at least for now).

11 Comments
2024/04/23
16:46 UTC
