
Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.

We welcome everyone from published researchers to beginners!

Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).

If you want an answer to a query, please post a legible, complete question that includes details so we can help you properly!

Related Subreddits

Computer Vision Discord group

Computer Vision Slack group


93,280 Subscribers


Fun interactive ideas

I am doing my first exhibition showcasing our facial recognition services. I have a 3x3 m stand that will have a demonstration of our solution, but I was thinking of having three little kiosks set up, each with its own monitor and camera, that people can walk in front of and interact with to showcase computer vision.

I have a couple of ideas, like age detection, where people can walk in front of the camera and it displays their age and gender. Another idea is a Where's Wally finder: hold a page in front of the camera and Wally will be located on the screen.

Looking for some other creative ideas that will be fun for people visiting our stand.

03:49 UTC


Creating a custom yolox model using coco script

I want to train my custom yolox model. I have two different folders for images, one is for training and one is for testing.

I have annotated both the training and the testing folder.

I am using labelme2coco script: https://github.com/fcakyon/labelme2coco

And I copied the file and executed it:

# import functions
from labelme2coco import get_coco_from_labelme_folder, save_json

# set labelme training data directory
labelme_train_folder = "tests/data/labelme_annot"

# set labelme validation data directory
labelme_val_folder = "tests/data/labelme_annot"

# set path for coco json to be saved
export_dir = "tests/data/"

# set category ID start value
category_id_start = 1

# create train coco object
train_coco = get_coco_from_labelme_folder(labelme_train_folder, category_id_start=category_id_start)

# export train coco json
save_json(train_coco.json, export_dir+"train.json")

# create val coco object
val_coco = get_coco_from_labelme_folder(labelme_val_folder, coco_category_list=train_coco.json_categories, category_id_start=category_id_start)

# export val coco json
save_json(val_coco.json, export_dir+"val.json")

which generated train and val JSON files. Now I need to create a dataset file. I ran labelme2coco path/to/labelme/dir, which did generate the dataset file. However, my question is: am I supposed to generate the train and val JSON files before I generate the dataset file, or vice versa, or does it not matter?

And another question: when running labelme2coco path/to/labelme/dir, the directory refers to the training images, NOT the validation images, right?

01:58 UTC


Computer Vision AI Development for Sports

Hey guys, my team and I have been building computer vision AI for sports for a while now. We've developed a lot of infrastructure and tooling for video analysis: re-identification, automated event recognition for stats, ball tracking, and 3D scene reconstruction, for use cases like analysis for sports facilities, broadcasting, and advertising.

We get a lot of questions and interest, so I'm happy to connect with anyone with similar interests and inquiries on this topic!

19:26 UTC


Not sure what's happening - custom yolov8n model

I trained a yolov8n model with a custom dataset of 100 annotated boggle boards. It amounted to something around 1900 individual annotations in total. The images were converted to greyscale before training and had their "ICC color profile" removed so that it would get rid of the torch warning. 100 epochs, 1024 image size, 4 batch size. The images were also resized during this pre-processing, but the aspect ratio was maintained and the labels fit perfectly:

This image is from the runs folder that is made during training:


The dataset contains 31 classes: A-Z, DL, DW, TL, TW, and the SUBMIT checkmark on the game boards.

Everything looks good so far, no?

Here's where things get weird:

It detects small artifacts in VS Code (little icons, for example).

Here is the github repo folder for the model results - It also has the train.ipynb that I used.

These are pics of the detection window (from detect.py):


And it doesn't detect the letters and tags I thought it would.



import cv2
import numpy as np
import pyautogui
from ultralytics import YOLO

model = YOLO('runs/detect/boggle-model-8n/weights/best.pt')

def capture_screen(bbox=(0, 0, 1024, 1024)):
    # pyautogui returns RGB; Ultralytics expects BGR numpy arrays (like cv2),
    # so convert -- an RGB/BGR mismatch alone can hurt detections badly
    screen = cv2.cvtColor(np.array(pyautogui.screenshot(region=bbox)), cv2.COLOR_RGB2BGR)
    return screen

def run_inference(model, image):
    results = model(image)
    return results

def main():
    while True:
        screen = capture_screen()

        results = run_inference(model, screen)

        for result in results:
            boxes = result.boxes.xyxy.cpu().numpy()
            scores = result.boxes.conf.cpu().numpy()
            class_ids = result.boxes.cls.cpu().numpy()

            for box, score, class_id in zip(boxes, scores, class_ids):
                if score > 0.7:
                    x1, y1, x2, y2 = map(int, box)
                    cv2.rectangle(screen, (x1, y1), (x2, y2), (0, 255, 0), 2)
                    label = f'{model.names[int(class_id)]}: {int(score * 100)}%'
                    cv2.putText(screen, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        cv2.imshow('Object Detection', screen)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()
The code is super boilerplate - it's just for detecting and drawing bounding boxes.

Is there an obvious reason why this isn't working? I'm training a yolov8x model right now to compare it.

Unrelated, but with this project, I plan on drawing lines from the midpoints of each letter to the next in each row. I'll add some code for the bonus tiles. Then, I'll be able to extract the letters from the board and feed it into the main algorithm.

EDIT: this is a yolov8x prediction picture from validation (before there was an error that stopped it)


It's garbage 😭😭

100 epochs completed in 1.868 hours.
Optimizer stripped from runs\detect\boggle-model-8x\weights\last.pt, 136.8MB
Optimizer stripped from runs\detect\boggle-model-8x\weights\best.pt, 136.8MB

Validating runs\detect\boggle-model-8x\weights\best.pt...
Ultralytics YOLOv8.2.32  Python-3.11.9 torch-2.3.1 CUDA:0 (NVIDIA GeForce GTX 1660 Ti, 6144MiB)
Model summary (fused): 268 layers, 68153421 parameters, 0 gradients, 257.6 GFLOPs

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95):  33%|███▎      | 1/3 [00:02<00:04,  2.25s/it]c:\Users\evank\anaconda3\envs\pytorch_cuda\Lib\site-packages\ultralytics\utils\plotting.py:858: RuntimeWarning: invalid value encountered in cast
  for j, box in enumerate(boxes.astype(np.int64).tolist()):
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95):  67%|██████▋   | 2/3 [00:04<00:02,  2.23s/it]Exception in thread Thread-141 (plot_images):
OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\evank\anaconda3\envs\pytorch_cuda\Lib\threading.py", line 1045, in _bootstrap_inner
  File "c:\Users\evank\anaconda3\envs\pytorch_cuda\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users\evank\anaconda3\envs\pytorch_cuda\Lib\site-packages\ultralytics\utils\plotting.py", line 864, in plot_images
    annotator.box_label(box, label, color=color, rotated=is_obb)
  File "c:\Users\evank\anaconda3\envs\pytorch_cuda\Lib\site-packages\ultralytics\utils\plotting.py", line 173, in box_label
    self.draw.rectangle(box, width=self.lw, outline=color)  # box
  File "c:\Users\evank\anaconda3\envs\pytorch_cuda\Lib\site-packages\PIL\ImageDraw.py", line 318, in rectangle
    self.draw.draw_rectangle(xy, ink, 0, width)
SystemError: <method 'draw_rectangle' of 'ImagingDraw' objects> returned a result with an exception set
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:05<00:00,  1.87s/it]

                   all         20        355          0          0          0          0

Speed: 0.9ms preprocess, 271.0ms inference, 0.0ms loss, 2.4ms postprocess per image
Results saved to runs\detect\boggle-model-8x
18:24 UTC


Decoding a Data Matrix from a "rough" image

Hello, hope you're doing well.

I have this task that I'm working on, unfortunately I'm an intern without a mentor.

I need to make a software solution that takes an image (usually captured with a phone), locates the 2D barcode within it, deskews it, and decodes it.

I did manage to do the first two. However, I couldn't make a reliable program to decode the Data Matrices. I did try off-the-shelf solutions like pyzbar and pylibdmtx, the latter being more reliable.

Please, any help is appreciated.
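For whatever it's worth, phone photos usually have uneven lighting, and pylibdmtx tends to do much better on a cleanly binarized crop. A mean-based adaptive threshold is a common first step; here is a minimal numpy-only sketch (the window size and offset are arbitrary values to tune, and cv2.adaptiveThreshold does the same thing if OpenCV is available):

```python
import numpy as np

def adaptive_threshold(gray: np.ndarray, win: int = 31, offset: float = 10.0) -> np.ndarray:
    """Binarize by comparing each pixel to the mean of its local window."""
    pad = win // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # integral image for fast local sums: ii[y, x] = sum of padded[:y, :x]
    ii = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))
    h, w = gray.shape
    y, x = np.mgrid[0:h, 0:w]
    local_sum = ii[y + win, x + win] - ii[y, x + win] - ii[y + win, x] + ii[y, x]
    local_mean = local_sum / (win * win)
    # dark modules -> 0, light background -> 255
    return np.where(gray.astype(np.float64) < local_mean - offset, 0, 255).astype(np.uint8)
```

The binarized image can then be passed to pylibdmtx's decode; upscaling the crop first often helps too, since decoders generally want each module to span several pixels.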

17:26 UTC


Cheap ip-ptz programmable camera

Hello guys,

Can somebody help me choose a cheap IP PTZ camera?

I already did this for my Master's thesis, but that camera is rather large and expensive, and I'd like to try it out with some smaller cameras around 50 bucks.

I don't have much experience connecting to cameras, so I would like to avoid the possibility that I buy a camera and it turns out to have some security layer that won't allow me to connect to it directly and control it with my own code.

14:54 UTC


Conversion from YOLOv8 .pt model to .tflite for android app

Hello! I'm struggling with my bachelor thesis, which requires developing an Android app for indoor object detection. I've trained on my dataset using YOLOv8 and got a model, best.pt. I tested it on various test images and I am fairly satisfied with the outputs. The problem is that for the Android app I need to convert the best.pt model to a TFLite model to be compatible with Android Studio. I did that, but when I tested the converted model I saw that it wasn't capable of recognizing any image I gave it (even images with only one object in perfect conditions of luminance etc.).

For the conversion I followed some YouTube tutorials and did exactly the same (the conversion consisted of 3 or 4 lines of code and went directly from .pt to .tflite, not in steps via ONNX and TensorFlow), but I don't get the results they get. Can you help me with any ideas of how I could make this work? Would it be better to train a YOLOv5 model (I read that it is commonly used for developing Android apps)? Any help is welcome. Thank you!

14:48 UTC


Point cloud from multiple fisheye cameras (stereo photogrammetry)

Hey there,

I am investigating how I can improve the stereo photogrammetry generation starting from multiple fisheye cameras.

The goal is to record outdoor videos in daylight conditions and then postprocess 3 streams (or more if needed) to generate a point cloud of what's in front of the cameras.

I tried to do my own research, but a lot of content points to depth cameras, which I can't use because of the limited depth they can perceive (well, unless spending $$$$$$$$).

It would probably be easier to use a depth camera, but I need distances on the order of 40-50 m or more (if possible). I don't need super high precision in distance, and the data can be corrected with successive frames anyway.

I want to install this hardware, together with an RK3588S-based SBC to handle the recording and some minimal processing to hide faces and license plates, in a car behind the rear-view mirror, to film the front. I was thinking (but maybe I am very wrong) that I could enrich the data with a couple of ToF sensors, which would provide a specific depth for specific points so that the system has a specific reference. I'm not sure if that would actually help, or what else I can do to improve the accuracy.

Also, if fisheye cameras are a bad idea, I'm happy to use normal cameras.

Any suggestion or code reference would be really appreciated!
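For a sanity check on the 40-50 m requirement: with a stereo pair, depth is Z = f·B/d (f = focal length in pixels, B = baseline, d = disparity in pixels), and the depth error for a disparity error δd grows quadratically, δZ ≈ Z²·δd/(f·B). A quick sketch with assumed example numbers (all values are placeholders to swap for your own calibration):

```python
def stereo_depth_error(z_m: float, focal_px: float, baseline_m: float,
                       disparity_err_px: float = 0.1) -> tuple[float, float]:
    """Return (disparity in px at depth z, depth error in m for the given
    disparity error), from Z = f*B/d and dZ = Z^2 * dd / (f*B)."""
    disparity = focal_px * baseline_m / z_m
    depth_err = z_m ** 2 * disparity_err_px / (focal_px * baseline_m)
    return disparity, depth_err

# e.g. f = 1000 px, baseline 1.0 m, target depth 50 m, 0.1 px matching error:
# disparity = 20 px, depth error = 0.25 m
d, err = stereo_depth_error(50.0, 1000.0, 1.0, 0.1)
```

So with a roughly 1 m baseline (feasible behind a windscreen) and sub-pixel matching, 40-50 m looks reachable; widening the baseline improves accuracy linearly. Note fisheye streams would need to be rectified before disparity matching.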

1 Comment
14:16 UTC


Label me / yolov8 training

Hi, I'm training YOLOv8 to segment deformities on a blade. Currently all the training images have the blade in a vertical orientation, so the trained model works well on blades in the same orientation but badly when they aren't. I'm trying to figure out whether I should automate rotating the images and their linked segmented areas to increase the size of the training database, or whether there are settings within YOLO to do this.
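If I remember right, Ultralytics exposes a degrees rotation-augmentation hyperparameter at training time (0 by default), which may cover this without touching the dataset. If you'd rather augment offline, rotating a segmentation label is just applying the same rotation to its polygon points; a minimal sketch, assuming labels as (N, 2) arrays of pixel coordinates rotated about the image centre:

```python
import numpy as np

def rotate_polygon(points: np.ndarray, angle_deg: float,
                   center: tuple[float, float]) -> np.ndarray:
    """Rotate an (N, 2) array of (x, y) points by angle_deg around center."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    c = np.asarray(center)
    # row vectors: (p - c) @ rot.T is the same as rot @ (p - c) per point
    return (points - c) @ rot.T + c

# a point one pixel right of the centre (100, 100), rotated 90 degrees,
# lands at (100, 101)
p = rotate_polygon(np.array([[101.0, 100.0]]), 90.0, (100.0, 100.0))
```

The image itself would be rotated with the matching transform (e.g. cv2.warpAffine or scipy.ndimage.rotate), and the YOLO-format labels re-normalised afterwards.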


10:50 UTC


Stereo RGB-D directly from camera (OAK-D-LR), visualized with Rerun

08:25 UTC


Need help in building something like this ...


I want to build something like this to demonstrate computer vision with AI to some kids. I searched online for details about this demo but couldn't find anything on it. So I am wondering if this is something that can be done with a cheaper SBC (single-board computer) or an old laptop, while still achieving near real-time on-device learning for image classification. Any help is much appreciated.

1 Comment
05:50 UTC


I created CLIPPyX: local image search with content-based, text, and visual similarity system-wide search

The tool was made using Apple's MobileCLIP and nomic embeddings (GGUF, through llama.cpp; it can be used with 🤗 transformers too).

  • Search by Image Caption: Enter descriptive text or phrases, using CLIP, CLIPPyX will return all images related to that semantic meaning or caption.
  • Search by Textual Content in Images: Provide descriptive text or phrases, and using Optical Character Recognition (OCR) and text embedding model, CLIPPyX will return all images with text semantically similar to the provided text.
  • Search by Image Similarity: Provide an existing image as a reference, and CLIPPyX will find visually similar images using CLIP
  • The CLIPPyX server can be accessed through any UI (I made a simple web UI and a plugin for Flow Launcher)

GitHub Repo


Video at 1x speed on my 1660 Ti Laptop (16GB RAM)

01:52 UTC


Is this better than pytorch or is it the same?

01:23 UTC


Working with ReduNets guides?

Are there any good guides to working with ReduNets? I am doing my master's and am supposed to discuss this with my research team.

I am reading the paper by Yi Ma, and it makes sense that they are more efficient and sparse compared to ResNets (I mostly have experience with ResNets, so this is sort of difficult).

Also, do I understand correctly that they are similar to GANs in their game-based adversarial approach to learning?

1 Comment
00:59 UTC


Video to point cloud

This shit is fucking hard, that is all I wanted to say. I don’t need help or anything.

00:45 UTC


Created an open source version of "Math Notes" from Apple with GPT-4o!

00:22 UTC


MSc Thesis Resources Recommendations

Hello everyone,

I am a student starting my MSc thesis in September, and I'd appreciate recommendations for papers, books, or other resources to read over the summer to better prepare for my research.

My thesis aims to improve the mapping of underwater cave systems using modern AI techniques. Specifically, I will explore SLAM combined with techniques like Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting for dense 3D visualizations.

Thanks in advance!

1 Comment
23:29 UTC


Is there a way to see which direction this circle is viewed from?


That's just a simulation, but I want to do it in real time.

What are the best calculations for it?
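Not a full pose solution, but the standard starting point: a circle seen off-axis projects to an ellipse, and the angle between the viewing direction and the circle's normal follows from the axis ratio alone, tilt = arccos(b/a). A sketch (the axes could come from e.g. cv2.fitEllipse):

```python
import math

def circle_tilt_deg(major_axis: float, minor_axis: float) -> float:
    """Tilt between the camera axis and the circle's plane normal,
    recovered from the aspect ratio of the projected ellipse."""
    return math.degrees(math.acos(minor_axis / major_axis))

# a circle squashed to a 2:1 ellipse is tilted 60 degrees from face-on
tilt = circle_tilt_deg(2.0, 1.0)
```

Note the ambiguity: two mirrored orientations project to the same ellipse, so this recovers the tilt magnitude (and the tilt axis from the ellipse's orientation in the image), but not the sign without extra information such as perspective distortion or a second view.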

22:29 UTC


Toshiba Telicam site login

Anyone else having trouble with this? I need to download their SDK/Viewer for a few machine vision cameras that I bought late last year, and it appears my old site login doesn't work; I've been waiting almost 4 business days for a response to reset / re-register.

I tried e-mailing some people on their team that I used to work with but my e-mails are coming back as undeliverable/delayed.

Anyone have some insight on the situation or, even better, a working login? The registration process was always nothing but a formality in the past.

** Yes, I know they were bought out by an investment group, but it's been 6 months and their group still seems to be posting blog posts as recently as late May this year **

19:42 UTC


Blur detection for segmented eye image

I have segmented eye images and I want to build an algorithm that tells me whether the image is in focus, by first generating a focus score and then determining a threshold on it. I have tried using the Laplacian, but my hypothesis is that edge detection will not work well, since I am essentially removing all the edges by using a segmented image: it removes the eyelids and all parts outside the eye, and mainly just contains the portion inside the main eye, including the sclera and the iris. I am looking for suggestions on different algorithms, and for research papers/articles on detecting blur in images with few edges, or in segmented images. Thanks!
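For what it's worth, one family of focus measures that doesn't rely on edges is spectral: sharp iris texture puts energy into high spatial frequencies, and defocus strips it away. A minimal numpy sketch (the cutoff fraction is an arbitrary knob to tune, and the input is assumed to be a greyscale array):

```python
import numpy as np

def high_freq_energy_ratio(gray: np.ndarray, cutoff_frac: float = 0.1) -> float:
    """Fraction of spectral energy beyond a cutoff radius; drops as blur increases."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    power = np.abs(f) ** 2
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)  # distance from the DC component
    cutoff = cutoff_frac * min(h, w)
    return float(power[r > cutoff].sum() / power.sum())
```

One caveat: the hard boundary of the segmentation mask itself injects high frequencies, so it may work better to score a patch cropped fully inside the iris/sclera region rather than the whole masked image.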

1 Comment
18:00 UTC


is c++'s opencv dead?

I have seen that OpenCV has a C++ version as well as Python, and many companies use computer vision, for example Tesla's Autopilot. Since C++ is high performance, using it for computer vision seems great, but I rarely see coding tutorials, videos, and books about OpenCV in C++, while there are lots of videos about OpenCV in Python.
What I am trying to ask is: do big companies doing computer vision necessarily use C++ for their computer vision or OpenCV? If not, why, and what are they using?

17:34 UTC


3D Arena Leaderboard

1 Comment
16:51 UTC


Accessibility Consultant Seeking Assistance For Computer Vision Proof Of Concept

Hello all, just found this subreddit, though I've been looking into this topic for a long time. I'm an award-winning, multi-credited accessibility consultant and gamer without sight (having never had any sight whatsoever) from the UK. As a part of my work, I've been very much interested in leveraging the potential of computer vision, capture cards, and audio to add cues to games that didn't previously have them. Think, for instance, of the Batman Arkham series, or any other game where enemy attacks are only cued visually. This means that even though a sighted player can counter fluidly, a gamer without sight (GWS) has no way to do so other than just spamming the button and hoping they are fortunate in their unintended timing.

I wondered if anyone would be interested in assisting, or being able to provide a starting point to construct a project that would be able to read such counter or interactivity cues from a game and, rather than, say, automatically interacting with them, provide a directionally based audio cue for the player to know that there is a cue there and where it is, as we've seen with some more playable titles over the past few years?

I have no real experience with Python programming etc, but am willing to learn for projects like this. Additionally, I have access to an Elgato HD 60 Pro capture card if that would be of use.
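Not a complete answer, but for reading a fixed visual cue from a capture-card frame, a starting point that needs no machine learning is template matching: save one screenshot of the cue icon, then search each frame for it and map the match's x position to stereo panning. A toy numpy-only sketch (the frame and icon here are synthetic placeholders; OpenCV's matchTemplate does the same search much faster):

```python
import numpy as np

def find_template(frame: np.ndarray, template: np.ndarray) -> tuple[int, int, float]:
    """Return (y, x, score) of the best match; lower score = closer match (SSD)."""
    fh, fw = frame.shape
    th, tw = template.shape
    best = (0, 0, float("inf"))
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            # sum of squared differences between this window and the template
            ssd = float(((frame[y:y+th, x:x+tw] - template) ** 2).sum())
            if ssd < best[2]:
                best = (y, x, ssd)
    return best

# toy greyscale frame with the "icon" pasted at (5, 12);
# panning could then be x / frame_width
rng = np.random.default_rng(0)
frame = rng.random((32, 48))
icon = rng.random((8, 8))
frame[5:13, 12:20] = icon
y, x, score = find_template(frame, icon)
```

In a real game you'd add a match-score threshold (so silence when the cue is absent) and grab frames from the capture card via something like OpenCV's VideoCapture.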

If anyone has further questions, please feel free to ask and I'll answer what I can.

Thanks in advance for any assistance!

1 Comment
15:49 UTC


Data Order FP16 Confusion Luxonis Oak-D Lite YOLOv5s

I have built a custom model to detect objects using YOLOv5s. The model was converted to ONNX and then to a .blob, and I set it up to run on an OAK-D Lite camera. The dataset used was very small (200 images), as this was a proof of concept.

The results were good: it was able to detect some of the objects with 60% confidence. But there are two things happening that are causing issues in reading the data.

After turning the data into a readable format, I was able to determine there are seven points to each detection, and I have print results that read like this:

Raw Detection Data: [5.44000000e+02 5.65000000e+02 8.25000000e+01 8.63125000e+01 6.06445312e-01 1.05102539e-01 8.80371094e-01]

Processed Data: Xcenter[0] 544.0 Ycenter[1] 565.0 Width[2] 82.5 Height[3] 86.3125 Confidence[4] 0.6064453125 Label[5]*Maybe* 0 unkown[6] 0.88037109375

This is not the data format that was expected, and it took some trial and error to determine that those were the values. By drawing the bounding boxes on the screen, it is in fact picking up the objects as intended, with up to 70% confidence, which is likely due to a few factors in the dataset and testing environment.

So my main issue is that the label for the object it's finding is actually class 1, but it will always put out class 0, unless I am supposed to round this up, which seems unlikely.

Further, I don't know what the seventh data point is… unless it's actually the class, and in that case I don't know what the sixth data point is…

Currently the model fails to pick up any items from class 0 and only returns class 1. But the detections from getFirstLayerFp16() always come back full, at a length of 25200, with the average confidence across the 25200 very stable around 0.2%. Even with no objects in the frame I get 25200 results, but with an average confidence of 1.6e-7%.

Is it common to have so many results which often overlap very closely?

Any resources on parsing this unusual data or advice on lowering the incoming results to a more reasonable level would be appreciated.
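If it helps: for a YOLOv5 head at 640×640 input, 25200 rows is expected ((80² + 40² + 20²) × 3 anchors), and each raw row is laid out as [cx, cy, w, h, objectness, class-0 score, class-1 score, …]. So with two classes you get exactly seven values: point 5 is objectness (not the final confidence), and points 6-7 are per-class scores you argmax over rather than read as a label. A sketch of decoding one row under that assumption:

```python
import numpy as np

def decode_row(row: np.ndarray, conf_thres: float = 0.5):
    """Decode one raw YOLOv5 output row: [cx, cy, w, h, obj, cls0, cls1, ...]."""
    cx, cy, w, h, obj = row[:5]
    class_scores = row[5:]
    class_id = int(np.argmax(class_scores))
    confidence = float(obj * class_scores[class_id])  # obj * best class score
    if confidence < conf_thres:
        return None
    # corner-format box for drawing
    return class_id, confidence, (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# the row from the post: argmax over (0.105, 0.880) gives class 1, not class 0
row = np.array([544.0, 565.0, 82.5, 86.3125,
                6.06445312e-01, 1.05102539e-01, 8.80371094e-01])
decoded = decode_row(row)
```

That would also explain both symptoms: the "label" read at index 5 is really the class-0 score, and the full 25200 rows are normal; you filter on objectness × class score and then apply NMS to collapse the overlapping survivors.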

I could post the code, but I am not sure that it would be of any consequence.

15:12 UTC


Benchmarking in PoseTracking

Hey guys,

A lot of the 2017/18 PoseTrack benchmark links seem to be dead. Does anyone know the best method to benchmark pose tracking nowadays?

14:47 UTC


os module isn't working, and can anyone also check the os.system(f"cd yolov9/ && python train.py --batch {self.model_trainer_config.batch_size} --epochs … line for what's wrong with it?

import os, sys
import shutil
import yaml
import zipfile
from PlantDiseaseDetection.utils.main_utils import read_yaml_file

from PlantDiseaseDetection.logger import logging
from PlantDiseaseDetection.exception import AppException
from PlantDiseaseDetection.entity.config_entity import ModelTrainerConfig
from PlantDiseaseDetection.entity.artifacts_entity import ModelTrainerArtifact


class ModelTrainer:
    def __init__(self,  # "self" was missing, so the config landed in the wrong parameter
                 model_trainer_config: ModelTrainerConfig):
        self.model_trainer_config = model_trainer_config

    def initiate_model_trainer(self) -> ModelTrainerArtifact:
        logging.info("Entered initiate_model_trainer method of ModelTrainer class")
        try:
            logging.info("Unzipping data")

            with zipfile.ZipFile('data.zip', 'r') as zip_ref:
                zip_ref.extractall()

            file_to_remove = "data.zip"
            try:
                os.remove(file_to_remove)
                print(f"File '{file_to_remove}' deleted successfully.")
                logging.info(f"File '{file_to_remove}' deleted successfully.")
            except OSError as e:
                print(f"File '{file_to_remove}' not found.")
                raise AppException(e, sys)

            with open("data.yaml", 'r') as stream:
                num_classes = str(yaml.safe_load(stream)['nc'])

            model_config_file_name = self.model_trainer_config.weight_name.split(".")[0]

            config = read_yaml_file(f"yolov9/models/detect/{model_config_file_name}.yaml")

            config['nc'] = int(num_classes)

            with open(f'yolov9/models/custom_{model_config_file_name}.yaml', 'w') as f:
                yaml.dump(config, f)

            # NOTE: the original command chained the flags with "&&", which the
            # shell treats as command separators, so train.py never saw the
            # --data/--weights/--cfg/--hyp flags. Everything must be one command,
            # with paths relative to the yolov9/ directory we cd into:
            os.system(
                f"cd yolov9/ && python train.py"
                f" --batch {self.model_trainer_config.batch_size}"
                f" --epochs {self.model_trainer_config.no_epochs}"
                f" --img 640 --device 1 --min-items 0 --close-mosaic 1"
                f" --data ../data.yaml"
                f" --weights {self.model_trainer_config.weight_name}"
                f" --cfg models/custom_{model_config_file_name}.yaml"
                f" --hyp data/hyps/hyp.scratch-high.yaml"
            )

            # leftover from a yolov5 variant of the same pipeline:
            # os.system(f"cd yolov5/ && python train.py --img 416 --batch {self.model_trainer_config.batch_size} --epochs {self.model_trainer_config.no_epochs} --data ../data.yaml --cfg ./models/custom_yolov5s.yaml --weights {self.model_trainer_config.weight_name} --name yolov5s_results --cache")

            source = "yolov9/runs/train/exp/weights/best.pt"  # path of the file to copy
            destination = "yolov9"  # directory to copy it into

            try:
                # shutil.copy actually copies; os.rename would move the file and
                # fails on Windows when the destination already exists
                shutil.copy(source, destination)
                print(f"File '{source}' copied to '{destination}' successfully.")
            except OSError as e:
                print(f"Error copying file: {e}")

            os.makedirs(self.model_trainer_config.model_trainer_dir, exist_ok=True)
            shutil.copy(source, self.model_trainer_config.model_trainer_dir)

            # optional cleanup
            # shutil.rmtree("yolov9/runs", ignore_errors=True)
            # shutil.rmtree("train", ignore_errors=True)
            # shutil.rmtree("valid", ignore_errors=True)
            # os.remove("data.yaml")

            model_trainer_artifact = ModelTrainerArtifact(
                # field name assumed from artifacts_entity; adjust to match
                trained_model_file_path=os.path.join(
                    self.model_trainer_config.model_trainer_dir, "best.pt")
            )

            logging.info("Exited initiate_model_trainer method of ModelTrainer class")
            logging.info(f"Model trainer artifact: {model_trainer_artifact}")

            return model_trainer_artifact

        except Exception as e:
            raise AppException(e, sys)
1 Comment
14:20 UTC


Need help implementing computer vision to detect the encircled markings on the gears in the picture.

14:11 UTC


What Happened to OpenMMLab?


Looks like they suddenly halted all development towards the end of Q3 2023.

Apart from some maintenance-like commits in a handful of repos, it seems that the rest have been stale. I tried to reach out to them but didn't get any response.

Did their research group disband or something? Just wondering if anybody knows.

Perhaps this post could even serve as a starting point to see if any other research groups elsewhere in the world would be open to taking over further development of some of the repos.

12:34 UTC
