/r/computervision

Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question that includes details so we can help you in a proper manner!

101,944 Subscribers

2

nnU-NetV2 pre-trained?

Hello everyone,

I was reading the nnU-Net paper (https://www.nature.com/articles/s41592-020-01008-z , or the arXiv version at https://arxiv.org/abs/1809.10486 ) and was wondering whether a pre-trained version of their model is available. Specifically, I'm looking for the nnU-Net that the authors themselves trained on their dataset, since I plan to work with the same Medical Decathlon dataset, specifically for Knowledge Distillation.

I found their GitHub, https://github.com/MIC-DKFZ/nnUNet , which gives general details on how to train, run inference, etc. I was originally going to train an nnU-Net on the existing Medical Decathlon dataset, but that would repeat work that has already been done. Does anyone know how to find the trained model instance they worked with? I thought about emailing them, but I don't know how acceptable that kind of request is in the CV community.

nnUNetV1 would also be fine.

Would be grateful for any advice.

1 Comment
2024/11/01
09:31 UTC

2

Papers that discuss these terms

Hi, I’m looking for papers that go into more depth on the components mentioned in my professor's lecture. The closest thing I have found is the data association discussed in the ORB-SLAM3 paper, but it does not mention loop closure in the local map or the global map.

My professor said this is discussed in many papers, but I have so far not found one.

1 Comment
2024/11/01
06:24 UTC

1

Train S3D Video Classification Model using PyTorch

https://debuggercafe.com/train-s3d-video-classification-model/

PyTorch (Torchvision) provides a host of pretrained video classification models. Training and fine-tuning these models can prove to be an invaluable asset in building many real-life applications. However, preparing the right code to start with custom video classification training can be difficult. In this article, we will train the S3D video classification model from PyTorch. Along the way, we will discuss the pitfalls, caveats, and optimization techniques specific to the model.

0 Comments
2024/11/01
00:38 UTC

2

How to calculate distance from a profile view?

Hello, everyone! I'm currently working on a YOLO vision project using the pose model, and I'm having some issues estimating distance using only a camera. I know it might sound a bit arbitrary, but this is the solution we have for now while I wait for a LiDAR sensor I ordered last week. Since I'm in a Latin American country, it may take a month to arrive.

Right now, we're estimating distance by using the focal length with the person facing the camera, and it seems to be working well, with an error margin of around 20-25 cm. Here’s the code we're using:

```
float YoloV8::estimateDistance(const cv::Rect_<float>& bbox, const std::vector<float>& keypoints) {
    // Keypoints are assumed to be flattened (x, y, confidence) triplets in COCO order,
    // so keypoint k starts at index k * 3.
    // If we have valid keypoints, use the shoulder distance
    if (keypoints.size() >= 21) {  // Need indices up to 20 (second shoulder's confidence)
        // Shoulders are keypoints 5 (left) and 6 (right) in COCO format
        float shoulder1X = keypoints[15];     // Left shoulder X (5 * 3)
        float shoulder2X = keypoints[18];     // Right shoulder X (6 * 3)
        float shoulder1Conf = keypoints[17];  // Left shoulder confidence (5 * 3 + 2)
        float shoulder2Conf = keypoints[20];  // Right shoulder confidence (6 * 3 + 2)

        // If both shoulders are detected with sufficient confidence
        if (shoulder1Conf > KPS_THRESHOLD && shoulder2Conf > KPS_THRESHOLD) {
            float shoulderWidth = std::abs(shoulder2X - shoulder1X);
            if (shoulderWidth > 0) {
                return (AVERAGE_SHOULDER_WIDTH * CAMERA_FOCAL_LENGTH_SHOULDERS) / shoulderWidth;
            }
        }
    }

    // Fallback to the original method if we cannot use shoulders
    return (AVERAGE_PERSON_WIDTH * CAMERA_FOCAL_LENGTH) / bbox.width;
}
```

The issue I’m currently facing is with profile views; the distance calculation becomes inaccurate, returning values that don't make sense.
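One possible reason for the profile-view failure, sketched under assumptions: with a pinhole model the projected shoulder width shrinks roughly with the cosine of the body's yaw, so dividing by it inflates the distance. The Python sketch below (Python rather than the post's C++, for brevity) shows the effect and one hedged alternative, the vertical shoulder-to-hip length, which changes much less when the person turns. `FOCAL_PX`, `SHOULDER_WIDTH_M` and `TORSO_LENGTH_M` are hypothetical placeholder values, not values from the post.

```python
import math

# Hypothetical calibration values; replace with your own measurements.
FOCAL_PX = 900.0          # focal length in pixels (assumed)
SHOULDER_WIDTH_M = 0.40   # assumed average shoulder width in metres
TORSO_LENGTH_M = 0.50     # assumed average shoulder-to-hip length in metres


def distance_from_size(real_size_m, size_px, focal_px=FOCAL_PX):
    """Pinhole model: distance = real size * focal length / projected size."""
    if size_px <= 0:
        raise ValueError("projected size must be positive")
    return real_size_m * focal_px / size_px


def projected_shoulder_px(distance_m, yaw_deg, focal_px=FOCAL_PX):
    """Projected shoulder width for a person rotated yaw_deg away from frontal."""
    frontal_px = SHOULDER_WIDTH_M * focal_px / distance_m
    return frontal_px * abs(math.cos(math.radians(yaw_deg)))


true_distance = 3.0
for yaw in (0, 45, 80):
    width_px = projected_shoulder_px(true_distance, yaw)
    est = distance_from_size(SHOULDER_WIDTH_M, width_px)
    print(f"yaw={yaw:2d} deg  shoulder estimate={est:.2f} m")

# The vertical torso length is roughly unaffected by yaw, so it gives a
# steadier estimate when the person is seen in profile.
torso_px = TORSO_LENGTH_M * FOCAL_PX / true_distance
print(f"torso estimate={distance_from_size(TORSO_LENGTH_M, torso_px):.2f} m")
```

In a pose model the torso length could be taken from the shoulder and hip keypoints on whichever side of the body is visible; whether that is accurate enough would need checking against measured distances.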

4 Comments
2024/10/31
19:51 UTC

0

Looking for contract-based projects (CV, ML, Robotics and IoT)

Background: Worked in the research labs of McGill University and IISc Bangalore in the fields of CV, ML, Robotics and IoT

Tech stacks: PyTorch, OpenCV, Mediapipe, ROS, Pure Data, C++

I am currently looking for contract-based projects. If you are a professional looking to delegate your work, or a college student looking to get your final-year project done at an industrial level, feel free to contact me for my portfolio/profile.

1 Comment
2024/10/31
19:09 UTC

1

Creating a robot for you all and I am hoping we can collaborate on it together.

I am really trying to find my target market, and it would be extremely helpful if some of you could fill out this survey for me. We will be releasing more information about the robot in the future. I think you all will love it, developers and hobbyists alike. Survey: https://forms.gle/6KzCHZskboepSpWQ6

1 Comment
2024/10/31
18:26 UTC

2

Experimental Design for Multi-Channel Imaging via Task-Driven Feature Selection (ICLR)

The paper aims to shorten acquisition time, reduce costs, and accelerate the deployment of imaging devices.

https://openreview.net/pdf?id=MloaGA6WwX

Contributions:

  • A novel method for supervised feature selection that performs task-based image channel selection.
  • Results shorten the acquisition time in MRI, reconstruct image cubes of remotely-sensed multispectral ground images with few sensors, and estimate tissue oxygenation from hyperspectral medical devices.
  • Results show improvement on i) classical experimental design, ii) recent application-specific published results, iii) state-of-the-art approaches in supervised feature selection.

We expect further applications to similar data types, e.g. data efficiency on multi-channel images, other hyperspectral/multispectral applications, cell microscopy, and weather and climate data.

Code is available, PM me if interested.

0 Comments
2024/10/31
16:59 UTC

1

MBA students researching machine and computer vision

Hi all. I am an MBA student at Temple University, and we are doing our final project on machine and computer vision. I would be grateful if you could fill out this survey and, if possible, send it to anyone else who works in manufacturing. We are looking for opinions from both those who currently use vision systems and those who do not. Here is the link to the survey: https://fox.az1.qualtrics.com/jfe/form/SV_0cEBnNUQ9jnxZpI

Alternatively, if you would like to do a short interview about your experiences, that would also be much appreciated.

Thanks so much!

1 Comment
2024/10/31
16:35 UTC

2

Need help with conditional random fields

Hi all:

I am reading a paper and need to thoroughly understand it. This is the paper: https://ieeexplore.ieee.org/abstract/document/6983606

I can pay. If anyone here is well versed in this and can read through and thoroughly understand/help me implement this, please DM me. Thanks!

0 Comments
2024/10/31
15:11 UTC

4

Alternatives to Grounding DINO?

I’m looking for a model like Grounding DINO (gdino) that has some sort of open-vocabulary/zero-shot support but is preferably faster (and maybe smaller/less resource-intensive). I looked into YOLO-World, but it didn't support the open-vocab part quite the way I wanted (e.g. instead of detecting all apples in a scene, I would want to detect “apple on table”, which gdino is much better at than YOLO-World from what I’ve tested).

Or should I maybe just fine-tune YOLO-World to do what I want?

6 Comments
2024/10/31
14:36 UTC

5

Anyone got ideas on how Claude Computer Use picks coordinates?

Title

5 Comments
2024/10/31
09:35 UTC

3

Any better alternatives to OmniParser?

Tried out OmniParser and it's pretty decent, but it misses some stuff. Also, I'd like something that can recognize boxes / layouts instead of just icons / text.

1 Comment
2024/10/31
02:02 UTC

0

Take this survey to help out an intern at a robotics startup <3

We are making a robot for you: https://forms.gle/ggVetcDios9m15yV8

2 Comments
2024/10/30
23:15 UTC

1

Problems with opening a video

Hi!

I recently started my adventure with computer vision. I wrote some code that was supposed to run the YOLO algorithm on the GPU (I am using NVIDIA CUDA), but I ran into a whole lot of errors trying to open video files with it, and I'm still having problems: it seems to have trouble reading the frames. The code is below. I spent a couple of hours with ChatGPT and scrolling through the internet in search of help, but nothing worked :( I also checked the directory and the video resolution, and they seem to be fine. Do you have any idea how to fix it? I will be grateful for any kind of help!

P.S. ffmpeg seems to have no problems locating and opening the file from the command prompt.

```
import random
import threading
import cv2 as cv
import numpy as np
from ultralytics import YOLO
import torch
import time
import subprocess as sp
import os

cv.setNumThreads(1)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the class list
with open("utils/coco.txt", "r") as my_file:
    class_list = my_file.read().strip().split("\n")

detection_colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) for _ in range(len(class_list))]
model = YOLO("weights/yolov8n.pt", "v8").to(device)

def read_video_ffmpeg(path, frame_wid=640, frame_hyt=480):
    command = ['ffmpeg', '-loglevel', 'error', '-i', path, '-f', 'image2pipe', '-pix_fmt', 'bgr24', '-vcodec', 'rawvideo', '-']
    print("Running FFmpeg command:", " ".join(command))  # Debug print
    pipe = sp.Popen(command, stdout=sp.PIPE, stderr=sp.PIPE, bufsize=10**8)

    while True:
        raw_image = pipe.stdout.read(frame_wid * frame_hyt * 3)
        print(f"Raw image length: {len(raw_image)}")

        # Check if FFmpeg gave any error
        err = pipe.stderr.read().decode()
        if err:
            print("FFmpeg error:", err)
            break

        if not raw_image:
            print("End of video stream or no data received.")
            break

        try:
            frame = np.frombuffer(raw_image, dtype='uint8').reshape((frame_hyt, frame_wid, 3))
            yield frame
        except ValueError as e:
            print(f"Error reshaping frame: {e}")
            break

    pipe.stdout.close()
    pipe.stderr.close()
    pipe.terminate()

class Video:
    def __init__(self, src="D:/OPENCV/videos/DJI_0302.MP4"):
        self.src = src

        # Check if the video file exists
        if not os.path.isfile(self.src):
            print(f"Error: Video file does not exist at path: {self.src}")
            return  # Stop initializing if file does not exist

        self.frame_wid = 2720  # Update frame width
        self.frame_hyt = 1536  # Update frame height
        self.frame_gen = read_video_ffmpeg(self.src, self.frame_wid, self.frame_hyt)
        self.frame = None
        self.running = True
        threading.Thread(target=self.update, daemon=True).start()

    def update(self):
        while self.running:
            try:
                self.frame = next(self.frame_gen)
                print("Frame read successfully")
            except StopIteration:
                self.running = False
            except Exception as e:
                print(f"Error updating frame: {e}")
                self.running = False

    def read(self):
        return self.frame

    def stop(self):
        self.running = False

video_stream = Video(src="D:/OPENCV/videos/DJI_0302.MP4")
fps_limit = 10

while video_stream.running:
    start_time = time.time()
    frame = video_stream.read()

    if frame is None:
        print("No frame to display")
        break

    detect_params = model.predict(source=[frame], conf=0.25, save=False)

    if detect_params:
        boxes = detect_params[0].boxes
        for box in boxes:
            clsID = int(box.cls.cpu().numpy()[0])
            conf = box.conf.cpu().numpy()[0]
            bb = box.xyxy.cpu().numpy()[0]

            cv.rectangle(
                frame,
                (int(bb[0]), int(bb[1])),
                (int(bb[2]), int(bb[3])),
                detection_colors[clsID],
                5,
            )

            cv.putText(
                frame,
                f"{class_list[clsID]} {round(conf * 100, 2)}%",
                (int(bb[0]), int(bb[1]) - 10),
                cv.FONT_HERSHEY_COMPLEX,
                1,
                (255, 255, 255),
                2,
            )

    cv.imshow("Object Detection", frame)
    elapsed_time = time.time() - start_time
    frame_delay = max(1, int((1 / fps_limit - elapsed_time) * 1000))
    if cv.waitKey(frame_delay) == ord("q"):
        break

video_stream.stop()
cv.destroyAllWindows()
```
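
A few likely culprits, offered as guesses rather than a confirmed diagnosis: `pipe.stderr.read()` with no size argument blocks until FFmpeg exits, so the read loop can stall; the main loop breaks as soon as `read()` returns `None`, which happens before the thread has produced its first frame; and `np.frombuffer` over `bytes` gives a read-only array, which drawing calls may reject. The sketch below isolates the frame-splitting logic so it can be checked without FFmpeg. `iter_raw_frames` and `ffmpeg_rawvideo_command` are hypothetical helper names, and the `scale` filter is an assumption used to force the output size to match the number of bytes read per frame.

```python
import io


def ffmpeg_rawvideo_command(path, width, height):
    """Build an FFmpeg command that writes raw BGR frames to stdout."""
    return [
        "ffmpeg", "-loglevel", "error", "-i", path,
        "-vf", f"scale={width}:{height}",  # force a known frame size
        "-f", "rawvideo", "-pix_fmt", "bgr24", "-",
    ]


def iter_raw_frames(stream, width, height, channels=3):
    """Yield one bytes object per complete frame from a binary stream."""
    frame_bytes = width * height * channels
    while True:
        chunk = stream.read(frame_bytes)
        if len(chunk) < frame_bytes:
            # Empty read means end of stream; a short read is a truncated frame.
            return
        yield chunk


# Self-check with an in-memory stream standing in for pipe.stdout:
fake = io.BytesIO(bytes(2 * 4 * 2 * 3) + b"\x01\x02")  # two 4x2 frames + leftovers
frames = list(iter_raw_frames(fake, width=4, height=2))
print(len(frames), len(frames[0]))  # 2 frames of 24 bytes each
```

With a real process, one option is to pass `stderr=subprocess.DEVNULL` to `Popen` (or read stderr on a separate thread) and convert each chunk with `np.frombuffer(chunk, dtype=np.uint8).reshape((height, width, 3)).copy()`; the `.copy()` makes the array writable. The main loop would also need to wait for the first frame instead of breaking on `None`.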
2 Comments
2024/10/30
22:26 UTC

2

Book recommendations

Does anyone have a recommendation for a theory-based, up-to-date book on Computer Vision based on deep learning techniques? My main topic of interest is object detection.

0 Comments
2024/10/30
21:03 UTC

11

I'm building an online platform for people in AI who want to build and collaborate on innovative projects!

Hi there :)

I've got something cool to share with you. Over the past few months I have been running around trying to find a way to make a dream come true.

I'm creating an online hub for people in AI who care about technological innovation and about having a positive impact by building and contributing to projects.

This hub will be a place to find like-minded people to connect with and work on passion projects with.

Currently we are coding a platform so that everyone can find and get to know each other.

Once we have some initial users, we will start with short builder programs where individuals and teams can compete in an online competition, and the projects that stand out the most can earn a prize :)

Our goal is to make the world a better place by helping others to do the same.

If you like our initiative, please sign up below on our website!

https://www.yournewway-ai.com/

And in a few weeks, once we're ready, we will send you an invite to join our platform :)

8 Comments
2024/10/30
17:15 UTC

3

Camera rotation degree

Hi, given 2 camera2world matrices, I am trying to compute the rotation angle of the camera from the first image to the second. To do this, I calculated the relative transformation between the matrices (multiplying the second matrix by the inverse of the first) and took the rotation sub-matrix ([:3, :3] of the 4x4 relative transform). I have the ground-truth rotation value, but for some reason it does not match the Euler angles I compute using SciPy's rotation package. Any clue what I am doing wrong mathematically?

*The cam2world values are the output obtained from DUSt3R, if that makes a difference.
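
One possible explanation, assuming the ground truth is a single rotation angle: Euler angles depend on the axis order and on the frame in which the relative rotation is expressed, whereas the axis-angle magnitude, arccos((trace(R) - 1) / 2), does not. For camera-to-world rotations R1 and R2, the orientation of camera 2 in camera 1's frame is R1^T R2; the product R2 R1^T (second times inverse of first) describes the same motion in the world frame and has the same angle but generally different Euler angles. A plain-Python sketch with illustrative function names:

```python
import math


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r):
    """Angle of a 3x3 rotation matrix via its trace (axis-angle magnitude)."""
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def relative_rotation(r1_c2w, r2_c2w):
    """Orientation of camera 2 expressed in camera 1's frame: R1^T @ R2."""
    return matmul(transpose(r1_c2w), r2_c2w)


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


r1, r2 = rot_z(10.0), rot_z(40.0)
print(round(rotation_angle_deg(relative_rotation(r1, r2)), 6))  # 30.0
```

With SciPy, `Rotation.from_matrix(R).magnitude()` should give the same angle in radians. If the two numbers still disagree, it may be worth checking whether the ground truth is an angle about one specific axis rather than the total rotation.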

4 Comments
2024/10/30
16:29 UTC

0

I hate my AMD GPU

Hello guys, this is my first post here and I just want to say I freaking hate my AMD GPU (running on Windows) so damn much. I have been trying for 6 weeks now to train a simple face detection model using a public dataset, but my AMD GPU refuses to cooperate! I wish I had known how bad AMD was for machine learning and computer vision before I bought it 😔😔 I can’t even install Linux due to other reasons. I also tried DirectML, but that failed miserably for some reason. I'm not really looking for help, but if anyone is considering buying a build for computer vision (which I was not when I got mine), please avoid AMD at all costs.

25 Comments
2024/10/30
15:36 UTC

16

I created a course on Coursera called Hands-on Data Centric Visual AI and made a series of cringey videos to promote it.

3 Comments
2024/10/30
12:41 UTC

5

Jetson Orin Nano 8GB: different YOLO fps with same configurations

I want to measure fps to benchmark different versions of YOLO. I do this by running inference 5 times on a video and then averaging the per-frame fps. To make sure the task is not interrupted by the scheduler, I put sudo nice -n -20 before yolo predict and check processes with jtop (and ofc the power mode is fixed). However, under these conditions I sometimes get big differences for the same model (e.g. 50 vs. 75 fps).

Do you know what the reason could be? Temperature? Or is there a more robust way to achieve my goal?
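
A more robust measurement, sketched under assumptions: discard a few warm-up iterations, time each frame with a monotonic clock, and report the median and a high percentile rather than a mean of per-frame fps, which a handful of slow frames can skew. `infer` is a hypothetical stand-in for one inference call and `fake_infer` is a dummy workload, not a YOLO API.

```python
import statistics
import time


def benchmark(infer, frames, warmup=10):
    """Time infer(frame) per frame and return latency statistics in ms."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up runs are not timed

    latencies_ms = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    total_s = sum(latencies_ms) / 1000.0
    return {
        "frames": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": sorted(latencies_ms)[int(0.95 * (len(latencies_ms) - 1))],
        "throughput_fps": len(latencies_ms) / total_s if total_s > 0 else float("inf"),
    }


def fake_infer(frame):
    # Hypothetical workload standing in for a model call.
    return sum(i * i for i in range(2000))


stats = benchmark(fake_infer, frames=list(range(100)))
print(stats)
```

If the model call returns before the GPU has finished, the timed region needs a synchronization point (for PyTorch, `torch.cuda.synchronize()`). Logging clocks and temperature next to each run would help confirm or rule out thermal throttling, which is only one plausible cause.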

11 Comments
2024/10/30
10:56 UTC

1

Gathering training data from Google Maps

Hello CV,

I'm currently in the process of training YOLO to identify which industrial complexes do NOT have solar panels on their roofs. I want to feed it training data from Google Maps satellite images, but I'm unsure how to go about this.
The questions that I have:

- How do I determine the correct size (in pixels) for my training data?

- Is there any available API that can make the process easier?

- Is there a way to use the globe/3D view to help the model identify whether the roof is flat or slanted?

Thank you, hope someone can help me
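
For the first question, one way to reason about image size is ground resolution. Assuming the imagery follows the standard Web Mercator tiling with 256-pixel tiles (an assumption about the source, not something stated above), metres per pixel at a given latitude and zoom level can be estimated as follows. The latitude of 52 degrees and the 200 m extent are made-up example values.

```python
import math

EARTH_CIRCUMFERENCE_M = 40075016.686  # WGS84 equatorial circumference
TILE_SIZE_PX = 256                    # standard Web Mercator tile size


def metres_per_pixel(lat_deg, zoom):
    """Approximate ground resolution of a Web Mercator tile pixel."""
    return (EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg))
            / (TILE_SIZE_PX * 2 ** zoom))


def crop_size_px(extent_m, lat_deg, zoom):
    """Pixels needed to cover extent_m metres on the ground."""
    return math.ceil(extent_m / metres_per_pixel(lat_deg, zoom))


for zoom in (17, 18, 19, 20):
    print(zoom, round(metres_per_pixel(52.0, zoom), 3), crop_size_px(200.0, 52.0, zoom))
```

A reasonable starting point is to pick a zoom level where a single solar panel spans several pixels, then choose a crop size that fits a typical building, and resize to the detector's input size from there.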

1 Comment
2024/10/30
08:28 UTC

3

Android: MobileFaceNet performance is abysmal. What can I do to improve it?

I was tasked with a project at work to build a facial recognition app that runs on Android tablets for one of our clients on a tight deadline. The first thing I did was detect the face on the device, send it to a local server, have dlib create an embedding from the captured face, and THEN compare that embedding with the list of saved face embeddings. This worked (albeit with the maximum achievable latency), and the effective accuracy was about 50-60%.

After deploying this solution I started working on the app again to enable on-device recognition using TFLite and MobileFaceNet (normalized embeddings and L2 normalization). It works, BUT the accuracy is like -30%.

At the moment I am using one frontal picture of each employee. Can I increase the number of comparison (base) pictures per employee?
I think I noticed that base pictures taken in front of a dark background tend to yield more accurate comparisons. Is this the case (theoretically)?
Any other suggestions would be bloody appreciated. Oh, and by the way, prior to this project I had no knowledge of CV, so please explain things like you are talking to a four-year-old.
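
On the first question, keeping several reference pictures per employee is a common approach. A minimal sketch, assuming embeddings are plain float vectors compared by cosine similarity against a tuned threshold; the function names, the `0.6` threshold and the toy 3-dimensional vectors are placeholders, not MobileFaceNet values.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(query, gallery, threshold=0.6):
    """Return (name, score) of the best match, or (None, score) below threshold.

    gallery maps each person's name to a list of reference embeddings;
    a person's score is the best similarity over all of their references.
    """
    best_name, best_score = None, -1.0
    for name, references in gallery.items():
        score = max(cosine_similarity(query, ref) for ref in references)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


gallery = {
    "alice": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "bob": [[0.0, 1.0, 0.0]],
}
print(identify([0.8, 0.2, 0.0], gallery))
print(identify([0.0, 0.0, 1.0], gallery))
```

Taking the best score over a few references (different angles, lighting, with and without glasses) tends to be more forgiving than a single frontal photo; the threshold would need tuning on real pairs of matching and non-matching faces.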

1 Comment
2024/10/30
08:02 UTC

71

Control Gimbal (reCamera) using LLMs (locally deployed on NVIDIA Jetson Orin)! Say "turn left at 40 degrees", and it works!

8 Comments
2024/10/30
07:54 UTC

3

Basler camera - OpenCV

Hi, I’m developing my first project with OpenCV using a Basler camera, but I cannot get image acquisition to work: it opens the image and crashes instantly (doesn’t respond anymore).

Is there any guide anywhere I can use?

Also, I can’t see the camera in the pylon Viewer, but it runs in my Python code in Spyder (the one that crashes).

1 Comment
2024/10/30
06:30 UTC

4

Question about related GitHub projects or possible approaches to point detection for lane line detection

Hi everyone, I was assigned the task of lane detection. After searching the internet I found many methods, mainly lane segmentation or polyline-based detection, but all I want is to predict the dots on the lane, as in the attached image. Can you suggest any model or method that has already been used for this?

Thank you very much

0 Comments
2024/10/30
02:31 UTC

1

Help - 360 degree and CV

I want to start a small academic project that takes a street view from a Maps API and then does some processing on it. For example, if we are passing a monument or any building of some importance and a crowd is covering up most of the space, I would like to erase them, be it cars or people, and content-fill the gaps to give a clear view. I need help with which papers to read, and would like to know if anyone has done anything similar. Mainly, how should I project the 360 view? On what sort of plane should I perform all the desired operations? Any other help would also be appreciated.
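
On the projection question: if the street-view panorama is stored in equirectangular form (an assumption; the post does not say which API or format is used), each pixel maps to a longitude and latitude on the viewing sphere, and a flat perspective view can be rendered by sampling along those directions. Below is a sketch of the pixel-to-direction mapping and its inverse, with an illustrative axis convention (x right, y up, z forward).

```python
import math


def pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel to a unit direction (x right, y up, z forward)."""
    lon = (u / width - 0.5) * 2.0 * math.pi   # -pi..pi, 0 at image centre
    lat = (0.5 - v / height) * math.pi        # +pi/2 at top, -pi/2 at bottom
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z


def direction_to_pixel(x, y, z, width, height):
    """Inverse mapping: unit direction back to equirectangular pixel coordinates."""
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v


w, h = 4096, 2048
d = pixel_to_direction(w / 2, h / 2, w, h)
print(d)  # centre pixel looks straight ahead: (0.0, 0.0, 1.0)
print(direction_to_pixel(*d, w, h))
```

One workable plan would be to render perspective crops with these mappings, run detection and inpainting on the crops where standard models behave well, and then write the results back into the panorama with the inverse mapping.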

0 Comments
2024/10/29
23:50 UTC

1

Preferred Computer Vision Models - Open Source?

Planning to make an identifier for manufacturing parts kept in a storage line with nuts and bolts of different sizes. Any recommendations?

5 Comments
2024/10/29
21:13 UTC

3

Orange Pi 5, RK3588 and YOLOv9

This is my experience so far with the Orange Pi 5, and my attempts up to now at making YOLOv9 work on the Orange Pi 5 / RK3588 SoC.

Our company uses the Orange Pi 5 4GB (RK3588 SoC) as the main processing unit of our traffic cameras. These boards are packed with an NPU, which is very useful considering the processing that goes on behind the scenes of the whole detection process.

I decided to make 3 different models: one for detecting vehicles, one for detecting license plates, and another for reading the plates. I chose YOLOv9 since it had more accuracy compared with YOLOv10 and more speed compared with YOLOv8. I also chose the t variant of the YOLOv9 models, since they are the lightest and probably faster on edge devices.

After building a good dataset based on company data and doing my best to normalize it, I got an acceptable accuracy of above 70% in the test environment (and 60-82% in real life soon after).

After 3 working days on the Orange Pi, I was able to boot an OS. The company gave me a board that already had an OS (some old version of PiOH, the specialized Ubuntu for Orange Pi boards), but it had some old dependencies such as onnx 1.13.0, and my newer models weren't compatible. After checking multiple ARM Linux distributions (Armbian, Arch, PiOH, etc.) I got my hands on https://github.com/Joshua-Riek/ubuntu-rockchip/wiki , which helped me boot the Orange Pi correctly. (During this process I even thought I had damaged a board, since these shitty boards are moody: sometimes they simply don't want to boot from the SD card or NVMe, or they only show a red light, which is how we found out they were alive.)

After that, I wrote simple Python code to take frames from the cameras and detect objects with my models: vehicle detection -> crop the vehicle image -> send to the license plate detection model -> detect the license plate -> crop the license plate -> send to the OCR model -> read the license plate, and then save the images of the car and the license plate along with the OCR output.

After a week of trying different approaches to converting my .pt model to .rknn, I found out that YOLOv9 models are simply not compatible with the RK3588 NPU, since only models saved with torch.jit.trace can be used and YOLOv9 isn't. You also can't use any YOLO models other than those customized to be convertible to RKNN. This was my experience; I hope it helps others avoid falling into this shitty hole of not understanding wtf the docs and manuals of rknn-toolkit2 say.
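
The three-stage cascade described above (vehicle detection, plate detection, OCR) can be sketched independently of any runtime. The detector and OCR callables below are hypothetical stubs, not RKNN or YOLO APIs, and images are nested lists so the example runs without dependencies.

```python
def crop(image, box):
    """Crop a row-major image (list of rows) to box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def read_plates(frame, detect_vehicles, detect_plates, read_text):
    """Run vehicle -> plate -> OCR and collect one record per plate found."""
    results = []
    for vehicle_box in detect_vehicles(frame):
        vehicle_img = crop(frame, vehicle_box)
        for plate_box in detect_plates(vehicle_img):  # coordinates relative to the crop
            plate_img = crop(vehicle_img, plate_box)
            results.append({
                "vehicle_box": vehicle_box,
                "plate_box": plate_box,
                "text": read_text(plate_img),
            })
    return results


# Stub models standing in for the three networks:
frame = [[0] * 8 for _ in range(6)]
records = read_plates(
    frame,
    detect_vehicles=lambda img: [(1, 1, 7, 5)],
    detect_plates=lambda img: [(2, 2, 5, 3)],
    read_text=lambda img: f"{len(img)}x{len(img[0])}",
)
print(records)
```

Keeping the stages behind plain callables means each model can be swapped (for example, for a variant that does convert to the NPU format) without touching the cropping and bookkeeping logic.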

14 Comments
2024/10/29
18:45 UTC

9

Best YOLO Model for Detecting on Raspberry Pi with Video Streaming?

Hey everyone! For my capstone project, I'm building a system to detect people in wheelchairs through video streaming, but here's the catch: it has to run on a single-board computer like a Raspberry Pi 4 or 5. I’m pretty new to machine learning and YOLO models, so I could really use some advice on a few things:

  1. Best YOLO Version: Which YOLO version is best suited for the Raspberry Pi that won’t lag or stutter?
  2. Video Stream Compatibility: If I train a YOLO model on a dataset of wheelchair images, will that also work effectively on a live video stream?
  3. Dataset Annotation: I have a 10,000-image dataset. Do I need to manually annotate every single image, or can I label a few, and the model will learn the rest on its own?
  4. Training on Colab: Do I need Colab Pro to train a YOLO model, or can I get by with the free version?
  5. C++ vs Python for YOLO: Will there be a noticeable performance difference if I run YOLO in C++ compared to Python on the Raspberry Pi?

Thanks in advance for any help! Any advice or resources would be really appreciated.

7 Comments
2024/10/29
16:08 UTC
