/r/computervision

Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question that includes details so we can help you in a proper manner!

102,003 Subscribers

0

Rubik’s cube solver

Does anyone have this project?? It’s for my project plz help

1 Comment
2024/11/02
03:20 UTC

0

Segmentation Error on YOLO

Hi! I’m new to deep learning and working on a project using YOLOv8 to detect insects in museum display images. I’m running this on my school’s HPC, which has an NVIDIA A100 GPU and uses Slurm. The training stops around the 4th epoch with a "segmentation error." Oddly enough, it works fine with a much smaller dataset (around 10 images, training for 1 epoch). Has anyone encountered this before, or have any tips on troubleshooting this?
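A "segmentation error" usually means the crash happened in native code, so Python exits without a traceback. One way to narrow it down, sketched here with the standard-library `faulthandler` module, is to dump the Python stack of every thread to a log file when the process receives a fatal signal. The function name and log path are made up for illustration; the call would go at the top of the training script, before training starts.

```python
import faulthandler


def enable_crash_tracebacks(log_path="crash_traceback.log"):
    """Write Python tracebacks to log_path if the process hits a fatal signal.

    Hypothetical helper: the name and default path are illustrative only.
    """
    log_file = open(log_path, "w")
    # Covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL, for all threads.
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this reference so the file stays open.
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_tracebacks()
    # ... start training here ...
```

Setting the environment variable `PYTHONFAULTHANDLER=1` in the Slurm job script should have a similar effect without touching the code, with the traceback going to stderr.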

4 Comments
2024/11/02
00:31 UTC

6

Calling all ML developers!

I am working on a research project which will contribute to my PhD dissertation. 

This is a user study where ML developers answer a survey to understand the issues, challenges, and needs of ML developers to build privacy-preserving models.

 If you work on ML products or services or you are part of a team that works on ML, please help me by answering the following questionnaire:  https://pitt.co1.qualtrics.com/jfe/form/SV_6myrE7Xf8W35Dv0.

For sharing the study:

LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7245786458442133505?utm_source=share&utm_medium=member_desktop

Please feel free to share the survey with other developers.

Thank you for your time and support!

 

Mary

0 Comments
2024/11/01
22:04 UTC

2

3d models of DTU dataset

Are the 3D models available for the DTU Dataset for all the scans? Obj, ply files etc

0 Comments
2024/11/01
20:59 UTC

1

Low latency classification help

I'm currently working at a startup that focuses on drone detection using RF sensors and radars, but my boss recently asked me to explore using computer vision to identify and track drones with our camera system. We’re using a solid camera setup (a 1/2" sensor with 31x optical zoom), so we’ve got some good hardware to work with.

I've done a few ML projects in school and played around with YOLOv8 before, but I'm trying to figure out if it’s really the best fit for this. Is YOLO good enough for this kind of task, or should I try architecting a custom model specifically tuned for tracking drones in the sky? My priority is to make it fast enough to keep up with the drones while being accurate enough to identify them reliably. We will probably be using edge computing too, a Jetson or something similar.

Any advice?

3 Comments
2024/11/01
19:03 UTC

3

HTR and OCR in Portuguese

Goal: to digitalize a few notebooks and family history letters (some English, some Portuguese) using HTR run locally (I want to eventually package this into an open-source app with some other features).

Resources: no GPU, 16GB of RAM on an Intel MacBook Pro. Extra CPU can be provided by my home cluster.

I've tried: Microsoft's handwritten OCR models on Hugging Face (the most accurate that I used, but only good for English), TrOCR, EasyOCR, and uploading to ChatGPT-4o. ChatGPT has been the most successful, with insane accuracy; I just uploaded the entire page. I tried running a local version of LLaVA to imitate the same idea as ChatGPT, but it wasn't successful at all. I haven't tried separating segmentation from transcription; I've mostly used tools that do both, or tiny data samples (a photo cropped to one line).

I really want something that I can run from the command line.
I feel a bit overwhelmed by how many tools there are and how different they are.

Here's some of the attempts:

Original text:

"A fim de ser um lider que estimula crescimento e" (note that "estimula" is spelled incorrectly as "estumula")

https://preview.redd.it/0j24tk6agbyd1.png?width=2194&format=png&auto=webp&s=f02a3633e66b2ce54ba98fc63b330d77118f90ab

EasyOCR was a bust (see result on last line)

https://preview.redd.it/ynbt6on6gbyd1.png?width=3442&format=png&auto=webp&s=a9de4ded82ac44e632016c782c5785596f75a002

Microsoft handwritten OCR was impressive:

https://preview.redd.it/4qni049pgbyd1.png?width=1130&format=png&auto=webp&s=bbb7b4af40226b0b8dc98f042fdb299788ed7901

0 Comments
2024/11/01
16:21 UTC

1

point cloud segmentation, help needed

Hello Everyone,
I need to perform point cloud segmentation. I have many scans of a rock surface that I want to segment into 3 categories. The scans come from different locations, and every point is labelled with one of the 3 categories, so I can use supervised deep learning. I have some experience with machine learning in TensorFlow, but I am new to computer vision.

My two questions are:

Do I need part segmentation or semantic segmentation? The difference really confuses me, sorry!

I have spent a lot of time going through the examples in PointNet and PointNet++; however, there are many somewhat newer projects like DGCNN and PointCNN. Should I be using these newer packages, or the older, more established ones? Also, many of these packages are only available in TF1, although in some cases there are open pull requests for TF2. If anyone has a lot of experience, could you advise me on the best place to start? I have labelled XYZRGB data.

0 Comments
2024/11/01
15:10 UTC

221

Dear researchers, stop this non-sense

Dear researchers (myself included), please stop acting like we are releasing a software package. I've been working with RT-DETR for my thesis and it took me a WHOLE FKING DAY just to figure out what is going on in the code. Why do some of us think that we are releasing a super complicated standalone package? I see this all the time: we take a super simple task like inference or training and make it super duper complicated by using decorators, creating multiple unnecessary classes, and putting every single hyperparameter in YAML files. The author of RT-DETR has created over 20 source files for something that could have been done in fewer than 5. The same goes for ultralytics and many other repos. Please stop this. You are violating the simplest cause of research. This makes it very difficult for others to take your work and improve it. We use Python for development because of its simplicityyyyyyyyyy. Please understand that there is no need for 25 different function calls just to load a model. And don't even get me started on the ridiculous trend of state dicts, damn they are stupid. Please, please, for God's sake, stop this nonsense.

97 Comments
2024/11/01
14:11 UTC

8

nnU-NetV2 pre-trained?

Hello everyone,

I was reading the nnU-Net paper: https://www.nature.com/articles/s41592-020-01008-z or arxiv version https://arxiv.org/abs/1809.10486 and I was wondering if I can find the pre-trained version of their model? Specifically I'm looking for the nnU-Net that they themselves trained on their dataset, since I am looking to work with the same Medical Decathlon dataset, conducting Knowledge Distillation specifically.

I found their GitHub: https://github.com/MIC-DKFZ/nnUNet. They provide general details on how to train, run inference, etc. I was originally going to train an nnU-Net on the existing Medical Decathlon dataset, but that would be repeating work that has already been done. Does anyone know how to find the trained model instance that they worked with? I thought about emailing them, but I don't know how acceptable that kind of request is in the CV community.

nnUNetV1 would also be fine.

Would be grateful for any advice.

1 Comment
2024/11/01
09:31 UTC

9

Papers that discuss these terms

Hi, I’m looking for papers that go into more depth on the components mentioned in my professor's lecture. The closest thing I have found is the data association discussed in the ORB-SLAM3 paper, but it does not mention loop closure in the local map or global map.

My professor said this is discussed in many papers, but I have so far not found one.

4 Comments
2024/11/01
06:24 UTC

0

Train S3D Video Classification Model using PyTorch

https://debuggercafe.com/train-s3d-video-classification-model/

PyTorch (Torchvision) provides a host of pretrained video classification models. Training and fine-tuning these models can prove to be an invaluable asset in building many real-life applications. However, preparing the right code to start with custom video classification training can be difficult. In this article, we will train the S3D video classification model from PyTorch. Along the way, we will discuss the pitfalls, caveats, and optimization techniques specific to the model.

https://preview.redd.it/6vkg4vwhr6yd1.png?width=1000&format=png&auto=webp&s=33eff9aa999a466cf3c16a2b8d29a379264eb5d9

0 Comments
2024/11/01
00:38 UTC

5

How to calculate distance from a profile view?

Hello, everyone! I'm currently working on a YOLO vision project using the pose model, and I'm having some issues estimating distance using only a camera. I know it might sound a bit arbitrary, but this is the solution we have for now while I wait for a LiDAR sensor I ordered last week. Since I'm in a Latin American country, it may take a month to arrive.

Right now, we're estimating distance by using the focal length with the person facing the camera, and it seems to be working well, with an error margin of around 20-25 cm. Here’s the code we're using:

```
float YoloV8::estimateDistance(const cv::Rect_<float>& bbox, const std::vector<float>& keypoints) {
    // If we have valid keypoints, use the shoulder distance.
    // Keypoints are assumed to be a flat [x, y, confidence] array in COCO order.
    if (keypoints.size() >= 21) { // Ensure we have enough keypoints (highest index used is 20)
        // Get shoulder coordinates (5 and 6 in COCO format)
        float shoulder1X = keypoints[5 * 3];        // Left shoulder X
        float shoulder2X = keypoints[6 * 3];        // Right shoulder X
        float shoulder1Conf = keypoints[5 * 3 + 2]; // Left shoulder confidence
        float shoulder2Conf = keypoints[6 * 3 + 2]; // Right shoulder confidence

        // If both shoulders are detected with sufficient confidence
        if (shoulder1Conf > KPS_THRESHOLD && shoulder2Conf > KPS_THRESHOLD) {
            float shoulderWidth = std::abs(shoulder2X - shoulder1X);
            if (shoulderWidth > 0) {
                return (AVERAGE_SHOULDER_WIDTH * CAMERA_FOCAL_LENGTH_SHOULDERS) / shoulderWidth;
            }
        }
    }

    // Fallback to the original method if we cannot use shoulders
    return (AVERAGE_PERSON_WIDTH * CAMERA_FOCAL_LENGTH) / bbox.width;
}
```

The issue I’m currently facing is with profile views; the distance calculation becomes inaccurate, returning values that don't make sense.
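In a profile view the two shoulders line up along the camera axis, so their horizontal pixel separation shrinks toward zero and the shoulder-width formula returns very large distances. One possible workaround, sketched here in Python rather than the post's C++, is to use a body measurement that does not foreshorten when the person turns, such as the torso length from the shoulder midpoint to the hip midpoint. The constants, the function name, and the flat `[x, y, confidence]` keypoint layout are assumptions for illustration; the values would need calibrating for the actual camera.

```python
import math

# Hypothetical calibration values, not taken from the post.
ASSUMED_TORSO_LENGTH_M = 0.50     # shoulder midpoint to hip midpoint, in metres
ASSUMED_FOCAL_LENGTH_PX = 900.0   # focal length expressed in pixels
KPS_THRESHOLD = 0.5

# COCO keypoint indices
LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP, RIGHT_HIP = 5, 6, 11, 12


def estimate_distance_from_torso(keypoints):
    """Pinhole distance estimate from torso length.

    keypoints: flat sequence [x0, y0, c0, x1, y1, c1, ...] in COCO order.
    Returns the distance in metres, or None if a joint is unreliable.
    """
    def point(index):
        base = index * 3
        return keypoints[base], keypoints[base + 1], keypoints[base + 2]

    joints = [point(i) for i in (LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP, RIGHT_HIP)]
    if any(conf < KPS_THRESHOLD for _, _, conf in joints):
        return None

    (lsx, lsy, _), (rsx, rsy, _), (lhx, lhy, _), (rhx, rhy, _) = joints
    shoulder_mid = ((lsx + rsx) / 2.0, (lsy + rsy) / 2.0)
    hip_mid = ((lhx + rhx) / 2.0, (lhy + rhy) / 2.0)

    torso_px = math.dist(shoulder_mid, hip_mid)
    if torso_px <= 0:
        return None
    # Same pinhole relation as the shoulder formula: real size * focal / pixel size.
    return ASSUMED_TORSO_LENGTH_M * ASSUMED_FOCAL_LENGTH_PX / torso_px
```

The torso still foreshortens when the person leans toward or away from the camera, so this is a stopgap rather than a replacement for a depth sensor.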

6 Comments
2024/10/31
19:51 UTC

0

Looking for contract based projects (CV, ML, Robotics and IoT)

Background: Worked in the research labs of McGill University and IISC Bangalore in the fields of CV, ML, Robotics and IoT

Tech stacks: PyTorch, OpenCV, Mediapipe, ROS, puredata, C++

Currently looking for contract-based projects. If you are a professional looking to delegate your work, or a college student looking to get your final-year project done at an industrial level, feel free to contact me for my portfolio/profile.

3 Comments
2024/10/31
19:09 UTC

1

Creating a robot for you all and I am hoping we can collaborate on it together.

I am trying to find my target market, and it would be extremely helpful if some of you could fill out this survey for me. We will be releasing more information about the robot in the future. I think you all will love it, developers and hobbyists alike. https://forms.gle/6KzCHZskboepSpWQ6

2 Comments
2024/10/31
18:26 UTC

2

Experimental Design for Multi-Channel Imaging via Task-Driven Feature Selection (ICLR)

The paper aims to shorten acquisition time, reduce costs, and accelerate the deployment of imaging devices.

https://openreview.net/pdf?id=MloaGA6WwX

Contributions:

  • A novel method for supervised feature selection that performs task-based image channel selection.
  • Results shorten the acquisition time in MRI, reconstruct image cubes of remotely-sensed multispectral ground images with few sensors, estimate tissue oxygenation from hyperspectral medical devices.
  • Results show improvement on i) classical experimental design, ii) recent application-specific published results, iii) state-of-the-art approaches in supervised feature selection.

We expect further applications to similar data types, e.g. data efficiency on multi-channel images, other hyperspectral/multispectral applications, cell microscopy, weather and climate data, etc.

Code is available, PM me if interested.

0 Comments
2024/10/31
16:59 UTC

2

MBA students researching machine and computer vision

Hi all. I am an MBA student at Temple University and we are doing our final project on machine and computer vision. I would be grateful if you could fill out this survey and, if possible, send it to anyone else who works in manufacturing. We are looking for opinions from both those who currently use vision systems and those who do not. Here is the link to the survey: https://fox.az1.qualtrics.com/jfe/form/SV_0cEBnNUQ9jnxZpI

 Alternatively if you would like to do a short interview on your experiences, this would also be much appreciated.

Thanks so much!

1 Comment
2024/10/31
16:35 UTC

2

Need help with conditional random fields

Hi all:

I am reading a paper and need to thoroughly understand it. This is the paper: https://ieeexplore.ieee.org/abstract/document/6983606

I can pay. If anyone here is well versed in this and can read through and thoroughly understand/help me implement this, please DM me. Thanks!

0 Comments
2024/10/31
15:11 UTC

6

Alternatives for grounding Dino?

I’m looking for a model like Grounding DINO (gdino), with some sort of open-vocabulary/zero-shot support, but preferably faster (and maybe smaller/less resource intensive). I looked into YOLO-World, but it didn't support the open-vocab part quite like I wanted (e.g. instead of detecting all apples in a scene, I would want to detect “apple on table”, which gdino is much better at than YOLO-World from what I’ve tested).

Or should I just maybe fine-tune yolo world to do what I want it to do?

7 Comments
2024/10/31
14:36 UTC

5

Anyone got ideas on how Claude Computer Use picks coordinates?

Title

7 Comments
2024/10/31
09:35 UTC

2

Any better alternatives to OmniParser?

Tried out omniparser and it's pretty decent, but it misses some stuff. Also, I'd like something that can recognize boxes / layouts instead of just icons / text

2 Comments
2024/10/31
02:02 UTC

0

Take this survey to help out an intern at a robotics startup <3

We are making a robot for you: https://forms.gle/ggVetcDios9m15yV8

2 Comments
2024/10/30
23:15 UTC

1

Problems with opening a video

Hi!

I recently started my adventure with computer vision. I wrote some code that was supposed to run the YOLO algorithm on the GPU (I am using NVIDIA CUDA). I ran into a whole lot of errors trying to open video files with it, and I'm still having problems: it seems to have trouble reading the frames. The code is below. I spent a couple of hours with ChatGPT and searching the internet for help, but nothing worked :(. I also checked the directory and the video resolution, and they seem to be fine. Do you have any idea how to fix it? I will be grateful for any kind of help!

P.S. ffmpeg seems to have no problems locating and opening the file from the command prompt

import random
import threading
import cv2 as cv
import numpy as np
from ultralytics import YOLO
import torch
import time
import subprocess as sp
import os

cv.setNumThreads(1)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the class list
with open("utils/coco.txt", "r") as my_file:
    class_list = my_file.read().strip().split("\n")

detection_colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) for _ in range(len(class_list))]
model = YOLO("weights/yolov8n.pt", "v8").to(device)

def read_video_ffmpeg(path, frame_wid=640, frame_hyt=480):
    command = ['ffmpeg', '-loglevel', 'error', '-i', path, '-f', 'image2pipe', '-pix_fmt', 'bgr24', '-vcodec', 'rawvideo', '-']
    print("Running FFmpeg command:", " ".join(command))  # Debug print
    pipe = sp.Popen(command, stdout=sp.PIPE, stderr=sp.PIPE, bufsize=10**8)

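    frame_bytes = frame_wid * frame_hyt * 3  # One bgr24 frame; must match the video's real resolution

    while True:
        raw_image = pipe.stdout.read(frame_bytes)
        print(f"Raw image length: {len(raw_image)}")

        # Only read stderr once stdout is exhausted: stderr.read() blocks until
        # FFmpeg exits, so calling it on every iteration deadlocks before the
        # first frame is yielded.
        if len(raw_image) < frame_bytes:
            err = pipe.stderr.read().decode()
            if err:
                print("FFmpeg error:", err)
            else:
                print("End of video stream or no data received.")
            break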

        try:
            # frombuffer returns a read-only view; copy so OpenCV can draw on the frame
            frame = np.frombuffer(raw_image, dtype='uint8').reshape((frame_hyt, frame_wid, 3)).copy()
            yield frame
        except ValueError as e:
            print(f"Error reshaping frame: {e}")
            break

    pipe.stdout.close()
    pipe.stderr.close()
    pipe.terminate()

class Video:
    def __init__(self, src="D:/OPENCV/videos/DJI_0302.MP4"):
        self.src = src
        self.frame = None
        self.running = False  # Defined up front so callers can test it even if the file is missing

        # Check if the video file exists
        if not os.path.isfile(self.src):
            print(f"Error: Video file does not exist at path: {self.src}")
            return  # Stop initializing if file does not exist

        self.frame_wid = 2720  # Update frame width
        self.frame_hyt = 1536  # Update frame height
        self.frame_gen = read_video_ffmpeg(self.src, self.frame_wid, self.frame_hyt)
        self.frame = None
        self.running = True
        threading.Thread(target=self.update, daemon=True).start()

    def update(self):
        while self.running:
            try:
                self.frame = next(self.frame_gen)
                print("Frame read successfully")
            except StopIteration:
                self.running = False
            except Exception as e:
                print(f"Error updating frame: {e}")
                self.running = False

    def read(self):
        return self.frame

    def stop(self):
        self.running = False

video_stream = Video(src="D:/OPENCV/videos/DJI_0302.MP4")
fps_limit = 10

while video_stream.running:
    start_time = time.time()
    frame = video_stream.read()

    if frame is None:
        # The reader thread may not have produced its first frame yet; wait instead of quitting
        time.sleep(0.01)
        continue

    detect_params = model.predict(source=[frame], conf=0.25, save=False)

    if detect_params:
        boxes = detect_params[0].boxes
        for box in boxes:
            clsID = int(box.cls.cpu().numpy()[0])
            conf = box.conf.cpu().numpy()[0]
            bb = box.xyxy.cpu().numpy()[0]

            cv.rectangle(
                frame,
                (int(bb[0]), int(bb[1])),
                (int(bb[2]), int(bb[3])),
                detection_colors[clsID],
                5,
            )

            cv.putText(
                frame,
                f"{class_list[clsID]} {round(conf * 100, 2)}%",
                (int(bb[0]), int(bb[1]) - 10),
                cv.FONT_HERSHEY_COMPLEX,
                1,
                (255, 255, 255),
                2,
            )

    cv.imshow("Object Detection", frame)
    elapsed_time = time.time() - start_time
    frame_delay = max(1, int((1 / fps_limit - elapsed_time) * 1000))
    if cv.waitKey(frame_delay) == ord("q"):
        break

video_stream.stop()
cv.destroyAllWindows()
2 Comments
2024/10/30
22:26 UTC

3

Book recommendations

Does anyone have a recommendation for a theory-based, up-to-date book on Computer Vision based on deep learning techniques? My main topic of interest is object detection.

0 Comments
2024/10/30
21:03 UTC

10

I'm building an online platform for people in AI who want to build and collaborate on innovative projects!

Hi there :)

I've got something cool to share with you. Over the past few months I have been running around trying to find a way to make a dream come true.

I'm creating an online hub for people in AI who care about technological innovation and about having a positive impact by building and contributing to projects.

This hub will be a place to find like-minded people to connect with and work on passion projects with.

Currently we are coding a platform so that everyone can find and get to know each other.

Once we have some initial users, we will start short builder programs where individuals and teams can compete in an online competition, and the projects that stand out the most can earn a prize :)

Our goal is to make the world a better place by helping others to do the same.

If you like our initiative, please sign up below on our website!

https://www.yournewway-ai.com/

In a few weeks, once we're ready, we will send you an invite to join our platform :)

8 Comments
2024/10/30
17:15 UTC

3

Camera rotation degree

Hi, given 2 camera2world matrices, I am trying to compute the rotation angle of the camera from the first image to the second. To do this, I calculated the relative transformation between the matrices (multiplying the second matrix by the inverse of the first) and took the rotation sub-matrix ([:3, :3] of the 4x4 relative transform). I have the ground-truth rotation value, but for some reason it does not match the Euler angles I compute using SciPy's rotation package. Any clue what I am doing wrong mathematically?

*the values of cam2world are the output obtained from Dust3r if that makes a difference
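Euler angles depend on the axis order and on the intrinsic/extrinsic convention, so they can disagree with a ground-truth value even when the relative rotation itself is correct. A convention-free check, sketched below, is the single rotation angle recovered from the trace of the relative rotation, $\theta = \arccos\big((\operatorname{tr}(R_a^\top R_b) - 1)/2\big)$. The function and helper names are hypothetical, and the sketch assumes the 3x3 blocks are proper orthonormal rotations with no scale factor, which may be worth verifying for Dust3r output.

```python
import math


def relative_rotation_deg(cam2world_a, cam2world_b):
    """Angle in degrees of the rotation taking camera a's orientation to camera b's.

    Accepts 4x4 (or 3x3) matrices as nested lists or NumPy arrays.
    """
    # trace(R_a^T R_b) equals the sum of element-wise products of the 3x3 blocks.
    trace = sum(
        cam2world_a[i][j] * cam2world_b[i][j]
        for i in range(3)
        for j in range(3)
    )
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def rot_z(deg):
    """Illustrative helper: 4x4 pose rotated about the z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [
        [c, -s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    print(relative_rotation_deg(rot_z(10.0), rot_z(40.0)))
```

This angle is the same whether the relative transform is formed as the second matrix times the inverse of the first or the other way round; only the rotation axis changes frame. If it matches the ground truth while the Euler angles do not, the mismatch is probably in the Euler convention rather than in the matrices.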

4 Comments
2024/10/30
16:29 UTC

0

I hate my amd gpu

Hello guys, this is my first post here and I just want to say I freaking hate my AMD GPU (running on Windows) so damn much. I have been trying for 6 weeks now to train a simple face detection model using a public dataset, but my AMD GPU refuses to cooperate! I wish I had known how bad AMD was for machine learning and computer vision before I bought it 😔😔 I can’t even install Linux due to other reasons. I also tried DirectML, but that failed miserably for some reason. I'm not really looking for help, but if anyone is considering buying a build for computer vision (which I was not when I got mine), please avoid AMD at all costs.

25 Comments
2024/10/30
15:36 UTC

16

I created a course on Coursera called Hands-on Data Centric Visual AI and made a series of cringey videos to promote it.

3 Comments
2024/10/30
12:41 UTC

5

Jetson Orin Nano 8GB: different YOLO fps with same configurations

I want to measure fps to benchmark different versions of YOLO and I do this by running inference 5 times on a video and then averaging fps for each frame. To be sure that this task is not interrupted by the scheduler, I put sudo nice -n -20 before yolo predict and I check processes with jtop (and ofc power mode is fixed). However, under these conditions I sometimes get big differences for the same model (i.e. 50<->75 fps).

Do you know what the reason is? Temperature? Or is there a more robust way to achieve my goal?
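One way to make such measurements more robust, sketched below under the assumption that inference can be wrapped in a plain callable, is to discard warm-up iterations and report the median and a tail percentile instead of the mean, so a few slow frames do not dominate the result. `benchmark` and `run_once` are hypothetical names, not part of any YOLO tooling.

```python
import statistics
import time


def benchmark(run_once, warmup=10, runs=100):
    """Time a callable and return robust per-call statistics.

    run_once: hypothetical zero-argument callable that processes one frame.
    When timing GPU work, it should block until the result is ready
    (for PyTorch, e.g. by calling torch.cuda.synchronize() inside it).
    """
    # Warm-up calls absorb one-off costs such as lazy initialisation and caching.
    for _ in range(warmup):
        run_once()

    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)

    durations.sort()
    median = statistics.median(durations)
    p95 = durations[min(len(durations) - 1, int(0.95 * len(durations)))]
    return {
        "median_fps": 1.0 / median,
        "p95_latency_ms": p95 * 1000.0,
        "worst_fps": 1.0 / durations[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real single-frame inference call.
    print(benchmark(lambda: time.sleep(0.002), warmup=2, runs=20))
```

If the median itself still jumps between runs, clock scaling or thermal throttling is a plausible suspect; locking the clocks with `jetson_clocks` and logging the temperature next to each run may help tell the two apart.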

11 Comments
2024/10/30
10:56 UTC

1

Gathering training data from google maps

Hello CV,

I'm currently in the process of training YOLO to identify which industrial complexes do NOT have solar panels on their roofs. I want to feed it training data from Google Maps satellite images, but I'm unsure how to go about this.
The questions that I have:

- How do I determine the correct size (in pixels) for my training data?

- Is there any available API that can help make the process easier?

- Is there a way to use the globe/3D view to help the model identify whether the roof is flat or slanted?
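For the image-size question, a rough starting point is the ground resolution of Web Mercator imagery, which for 256-pixel tiles is about $156543.03392 \cdot \cos(\text{latitude}) / 2^{\text{zoom}}$ metres per pixel. The sketch below turns that into a crop size for a building of a given footprint; the function names and the 20% margin are illustrative assumptions, and the real resolution of any particular imagery source may differ.

```python
import math

# Metres per pixel at the equator, zoom level 0, for 256-pixel Web Mercator tiles.
EQUATOR_M_PER_PX_ZOOM0 = 156543.03392


def ground_resolution_m_per_px(latitude_deg, zoom):
    """Approximate metres covered by one pixel at the given latitude and zoom."""
    return EQUATOR_M_PER_PX_ZOOM0 * math.cos(math.radians(latitude_deg)) / (2 ** zoom)


def crop_size_px(footprint_m, latitude_deg, zoom, margin=1.2):
    """Pixels needed to cover a building of footprint_m metres.

    margin is a hypothetical padding factor that leaves some context around the roof.
    """
    resolution = ground_resolution_m_per_px(latitude_deg, zoom)
    return math.ceil(footprint_m * margin / resolution)


if __name__ == "__main__":
    # Example: a 100 m wide complex at 52 degrees north, zoom level 19.
    print(ground_resolution_m_per_px(52.0, 19), crop_size_px(100.0, 52.0, 19))
```

Picking the zoom level so that a typical roof spans a few hundred pixels, then resizing to the detector's input size, is one reasonable approach. On the API question, Google's Maps Static API can return satellite images for a given centre, zoom, and size, but it is worth checking its terms of service before using the imagery as training data.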

Thank you, hope someone can help me

1 Comment
2024/10/30
08:28 UTC
