/r/computervision
Computer Vision is the scientific subfield of AI concerned with developing algorithms to extract meaningful information from raw images, videos, and sensor data. This community is home to the academics and engineers both advancing and applying this interdisciplinary field, with backgrounds in computer science, machine learning, robotics, mathematics, and more.
We welcome everyone from published researchers to beginners!
Content which benefits the community (news, technical articles, and discussions) is valued over content which benefits only the individual (technical questions, help buying/selling, rants, etc.).
If you want an answer to a query, please post a legible, complete question that includes details so we can help you in a proper manner!
I’m planning to visualize a LiDAR dataset with a new tool I’ve been using. Curious about people’s opinions on the most popular LiDAR dataset / scene: KITTI? Waymo Open Dataset? ISPRS?
I have to do a project on "text-based sign detection" using OpenCV and Python. How do I go about doing it? Can someone please suggest a solution?
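One common starting point (only a sketch, not the only route): use OpenCV for preprocessing and Tesseract, via the pytesseract package, to find and read the text regions. The image path and confidence threshold below are placeholders.

# Minimal sketch: find text boxes on a sign photo with OpenCV + pytesseract.
# Assumes Tesseract and the pytesseract package are installed; 'sign.jpg' is a placeholder.
import cv2
import pytesseract

img = cv2.imread("sign.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Light preprocessing (Otsu threshold) often helps on signage photos.
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

data = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
for i, word in enumerate(data["text"]):
    if word.strip() and float(data["conf"][i]) > 60:  # keep confident detections only
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        print(word)

cv2.imwrite("sign_detected.jpg", img)

For detecting text regions in natural scenes (as opposed to reading them), a dedicated scene-text detector such as OpenCV's EAST model is another option.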
I just wanted to know whether physical adversarial attacks like printed images of a target object are still a problem for object recognition models right now, even for deep neural networks.
Purpose:
I'm doing research for my computer science degree and want to focus on computer vision. One question I encountered in the past, when I proposed a real-time object recognition system, was: if there is a printed image of the target object (e.g. a plastic bottle) in front of the camera, will it create a false positive?
We're looking to improve the perceptual image quality of our image processing workflow. Right now we have a home-grown system doing all the standard textbook stuff plus some application-specific processing, but we're struggling to match the "secret sauce" we get when processing our raws in e.g. Adobe Camera Raw, so we're thinking of bringing on a vendor. We have decent experience in house and excellent hardware, but we're a small team.
Can anybody recommend a vendor who can offer this kind of solution to integrate into our product?
Hello Reddit,
I have a question regarding the right tool. I'm looking for a tool / model to detect hand keypoints in a video stream of a person assembling things. I know OpenPose is one possibility, and also Google MediaPipe.
I'm not really getting along with OpenPose, and MediaPipe doesn't show really good results.
In my project, I would like to detect hand keypoints in assembly scenarios. It would be ok to use 2 cameras or a depth camera if necessary.
Does anybody know of any models / tools to use?
Thanks in advance :)
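For what it's worth, a minimal MediaPipe Hands loop (assuming the mediapipe and opencv-python packages) looks roughly like the sketch below; raising the detection/tracking confidence thresholds or cropping the stream to the work area sometimes helps more than switching models.

# Baseline hand-keypoint sketch with MediaPipe Hands; the video source is a placeholder.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # or a path to the assembly video
with mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                    min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()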
If I perform a calibration to get the intrinsic and extrinsic parameters for a stereo camera setup, I can then undistort and rectify the images and calibrate again. Would that be useful at all, and would I be able to combine the matrices to get a more accurate calibration? Has anyone ever done an iterative calibration process like this?
I looked around and couldn't find anything. My depth map doesn't look good at all after stereoBM or stereoSGBM with a WLS filter, so I'm trying to improve my calibration.
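One related refinement (not necessarily the exact loop described above) is to calibrate each camera individually and then seed the stereo calibration with those estimates via OpenCV's CALIB_USE_INTRINSIC_GUESS flag. A rough sketch, assuming objpoints, imgpoints_l, imgpoints_r, and image_size already exist:

# Calibrate each camera on its own, then refine jointly using the previous
# estimates as the initial guess (or lock them with CALIB_FIX_INTRINSIC).
import cv2

_, K1, d1, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, image_size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, image_size, None, None)

flags = cv2.CALIB_USE_INTRINSIC_GUESS
rms, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_l, imgpoints_r,
    K1, d1, K2, d2, image_size,
    flags=flags,
    criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-6),
)
print("stereo RMS reprojection error:", rms)  # a large value usually means bad corner detections

If the RMS reprojection error is already low but the disparity map is still poor, the problem is more likely in rectification or the matcher parameters than in the calibration itself.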
Mousetrappe is seeking a candidate with OpenCV and Python experience. Experience using Unreal Engine and TouchDesigner is also beneficial. The project involves automatic camera and projector calibration/localization using sensors. We are located in Burbank, CA. A candidate with the potential for short-term on-site travel would be ideal, but entirely remote work is possible. The project will run for roughly two months.
Cedar Connor
cedarconnor (at) mousetrappe.com
www.mousetrappe.com
So basically I have a dataset of top-down images of vehicles in both RGB and IR. What are some models I can use for both unimodal and multimodal object detection to compare their performance? Links to GitHub repos would be helpful. Thanks!
Hi all,
So my in-laws have a small farm and I am considering a hobby ML project to try to determine the ripeness stage of vegetables.
After the vegetable reaches a decent size, the farmer looks at the color/shade of the vegetable to determine whether it is ripe enough to pluck. It is also possible to hold the vegetable in your hand and feel the firmness, but that's not something I can do with a vision model.
For something like eggplants, it is basically different shades of light green. An experienced farmer knows it instinctively whereas I am usually clueless.
So I wonder: is it reasonable to try to fine-tune a vision model like DETR or YOLO to do this kind of thing?
Does anyone have any other suggestions / starting points for me to investigate?
I haven't yet decided if I want to do it, but I started learning ML recently and wanted to try my hand at something new but solvable.
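For reference, if the fine-tuning route looks appealing, one minimal sketch is an image classifier over ripeness classes using the Ultralytics package; all dataset paths and class names below are placeholders, not anything from the post.

# Fine-tune a small pretrained classifier on ripeness classes. Assumes an image-folder
# dataset laid out as ripeness_data/train/<class>/*.jpg and ripeness_data/val/<class>/*.jpg
# with classes such as "unripe", "ripe", "overripe" (placeholders).
from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")                   # small pretrained classifier
model.train(data="ripeness_data", epochs=50, imgsz=224)

results = model("some_eggplant_photo.jpg")       # predict on a new image (placeholder path)
print(results[0].probs)                          # per-class probabilities

A plain classifier is often enough when each photo contains a single vegetable; a detector (YOLO detection or DETR) only becomes necessary when several vegetables must be localized in one frame.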
Hey, guys! I need to start an augmented reality college project and present the MVP by the end of the year. As the deadline is a bit tight, my advisor suggested that, for now, I explore pre-built solutions to speed up the development process. Do you know of any libraries or resources that could help me? The project involves rendering 3D objects through the smartphone camera.
I must be doing something wrong: I am following the Basic Skills example from the Datumaro user guide and getting errors at the very first step.
I exported my dataset from Roboflow using VOC. I unzipped the archive into its own dir at /root/voc. It has train/, test/, and valid/ subdirs.
I tried CLI and IDLE:
╰─❯ tree -d ~/voc
/root/voc
├── test
├── train
└── valid
╰─❯ datum project import -n cm-v7 -f roboflow_voc -p ../datamuro/projects/roboflow ~/voc
2023-10-01 14:50:07,936 INFO: Checking the source...
2023-10-01 14:50:08,981 ERROR: Failed to find dataset 'roboflow_voc' at '/root/datamuro/projects/roboflow/cm-v7'
Traceback (most recent call last):
File "/root/datamuro/projects/venv/bin/datum", line 8, in <module>
sys.exit(main())
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/cli/__main__.py", line 150, in main
retcode = args.command(args)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/util/scope.py", line 158, in wrapped_func
ret_val = func(*args, **kwargs)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/cli/commands/require_project/modification/import_.py", line 176, in import_command
project.working_tree.make_dataset(name)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 1556, in make_dataset
return ProjectBuilder(self._project, self).make_dataset(pipeline)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 379, in make_dataset
dataset = self._get_resulting_dataset(pipeline)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 444, in _get_resulting_dataset
graph, head = self._run_pipeline(pipeline)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 441, in _run_pipeline
return self._init_pipeline(pipeline, working_dir_hashes=wd_hashes)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 573, in _init_pipeline
dataset = _try_load_from_disk(stage_name, stage_config)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 552, in _try_load_from_disk
return ProjectSourceDataset(
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/project.py", line 99, in __init__
dataset = Dataset.import_from(rpath, env=tree.env, format=config.format, **config.options)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/dataset.py", line 700, in import_from
else importer(path, **kwargs)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/importer.py", line 67, in __call__
raise DatasetNotFoundError(path, self.NAME)
datumaro.components.errors.DatasetNotFoundError: Failed to find dataset 'roboflow_voc' at '/root/datamuro/projects/roboflow/cm-v7'
╰─❯ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datumaro.components.dataset import Dataset
>>> data_path = '/root/voc'
>>> data_format="roboflow_voc"
>>> dataset = Dataset.import_from(data_path, data_format)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/dataset.py", line 700, in import_from
else importer(path, **kwargs)
File "/root/datamuro/projects/venv/lib/python3.9/site-packages/datumaro/components/importer.py", line 67, in __call__
raise DatasetNotFoundError(path, self.NAME)
datumaro.components.errors.DatasetNotFoundError: Failed to find dataset 'roboflow_voc' at '/root/voc'
I am trying to import the dataset and then export it so I can load it into CVAT. Any advice would be greatly appreciated.
TIA!
Hello everyone!
A while back, I stumbled upon an interesting paper that applied reinforcement learning to object localization. I was fascinated by how computer vision tasks could be transformed into a reinforcement learning problem, framed as a Markov decision process!
So, I've decided to create a repository to compile all the existing (published) papers that delve into reinforcement learning in computer vision: https://github.com/rayanramoul/RLCV-Papers
If you have any papers in mind or recommendations to enhance the repository, please don't hesitate to share them. Your input would be greatly appreciated!
Thank you! :)
Hi CV community,
I was wondering what the best resources are (textbooks / lectures on YouTube) to get into computer vision. My goal is to get a working understanding of the theory behind it in order to make my own models. Many thanks!
Looking for a depth camera for:
- small workspace (15x15x15cm) and small objects (e.g. sub-centimeter scale)
- reconstruction accuracy <1mm (ideally)
- works in real-time (~10Hz)
- will be used with a robotic arm
Cameras from Zivid looked promising, but I wasn't sure if there are other options out there. Any suggestions would be great.
Hi All,
I'm looking at a setup that would have a hardware trigger that would send a signal to a camera to capture a single still image.
All the web cameras would be limited by their FPS, so they're not great, and DSLRs are at 100-150 ms.
I will be using a laser beam that, when broken, needs to trigger an image capture; this will happen every 5-6 seconds. The object will be going ~100 mph, and there is a TON of light available (I can add more if needed).
How can I trigger a camera using python, c++/c#, or a raspberry pi with <10ms response time?
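A hedged sketch of the software-trigger route on a Raspberry Pi, assuming the picamera2 and gpiozero packages and a beam-break sensor wired to GPIO 17 (placeholder pin). Keeping the camera streaming continuously means the trigger only has to grab the latest frame; a guaranteed sub-10 ms, motion-frozen capture at ~100 mph more likely calls for a machine-vision camera with a hardware trigger input and a very short exposure.

# Grab a still when the beam-break sensor fires, from an already-running camera stream.
import time
from gpiozero import Button
from picamera2 import Picamera2

picam2 = Picamera2()
picam2.configure(picam2.create_still_configuration())
picam2.start()
time.sleep(1)  # let exposure / white balance settle

trigger = Button(17)  # beam-break sensor pulls this pin low when broken (placeholder wiring)

def grab():
    t0 = time.monotonic()
    picam2.capture_file(f"capture_{t0:.3f}.jpg")  # pulls from the running stream
    print(f"saved in {(time.monotonic() - t0) * 1000:.1f} ms")

trigger.when_pressed = grab
input("waiting for trigger, press Enter to quit\n")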
As the title says, I'm curious about some well-used transfer learning tasks.
ImageNet to other datasets is common, but what are some good pairs I can try and mess around with?
Like CIFAR10 to MNIST, or CIFAR10 to CIFAR100, or CIFAR10 to SVHN.
While ImageNet is 224x224 and some of the above are 32x32, I also want to look at 64x64 (like Tiny ImageNet).
P.S.: I know that transfer learning is mainly used in situations where the target dataset is quite small, but I want to use some standard popular datasets and see how things work.
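As one concrete example of such a pair (CIFAR-10 to SVHN, both 32x32 so no resizing is needed), a minimal PyTorch fine-tuning sketch might look like the following; the checkpoint name is a placeholder and a ResNet-style model with a .fc head is assumed.

# Load a CIFAR-10-trained model (placeholder checkpoint), swap the head, fine-tune on SVHN.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

model = models.resnet18(num_classes=10)                    # same architecture assumed for CIFAR-10
model.load_state_dict(torch.load("cifar10_resnet18.pt"))   # assumed CIFAR-10 state_dict checkpoint
model.fc = nn.Linear(model.fc.in_features, 10)             # fresh head for SVHN's 10 digit classes

train_set = datasets.SVHN(root="data", split="train", download=True,
                          transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # small LR so pretrained weights move slowly
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:                        # one pass over SVHN as the fine-tuning step
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()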
I have 4 years of experience at a startup and a PhD, with four publications (2 in high-level math journals and 2 CV/DL papers in A-ranked journals) and 4 patents. I have experience with most common CV tasks, e.g. object detection, multi-object tracking, 2D/3D human pose estimation, and monocular depth estimation. I'm well versed in typical network building blocks, e.g. conv nets, FFNs, transformers, diffusion, etc. I have a little experience with NLP, like NLTK and TTS networks, and also some other general dev technologies like EC2, S3, SQL, Mongoose, etc.
That all being said, I can't seem to even get interviews these days, just straight rejections without talking to recruiters. On the other hand, in 2020 I was just searching for jobs passively and had something like a 75% success rate at getting interviews. I know the job market has changed, but I'm a lot more experienced now than I was then, yet I'm having abysmal luck.
Anyone have any advice? I'd be happy to share my resume if that would make it easier to give advice. Also open to hearing what other technologies I should/could learn.
Here is the list of all ICCV 2023 (International Conference on Computer Vision) papers, with a short highlight for each of them. Among all ~2,100 papers, the authors of around 800 also made their code or data available. The "related code" link under each paper title will take you directly to the code base.
https://www.paperdigest.org/2023/09/iccv-2023-highlights/
In addition, here is the link of "search by venue" page that can be used to find papers within ICCV-2023 related to a specific topic, e.g. "diffusion model":
https://www.paperdigest.org/search/?topic=iccv&year=2023&q=diffusion_model
ICCV 2023 will take place in Paris, starting Oct 2nd, 2023.
I've experimented with semantic segmentation models like DeeplabV3 on Core ML to try to generate stickers from images. However, the results from DeeplabV3 don't look nearly as clean and accurate as the stickers generated in the Messages app on iOS 17.
On the Apple Machine Learning models page, DeeplabV3 is the only image segmentation model they offer. However, it seems they must be using a more advanced model to achieve such high quality sticker generation directly from the camera roll.
Does anyone have any insight into what kind of model Apple might be using behind the scenes? I'd like to try and replicate their level of accuracy if possible. Any information or suggestions would be greatly appreciated!
I have trained a YOLO v8 model and have deployed it on AWS Sagemaker.
But while accessing the endpoint to make predictions, it gives a timeout error.
The logs show that it is trying to install ultralytics from the requirements.txt file, which is required to load the YOLO model (from ultralytics import YOLO).
Any suggestions on how to deploy it?
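One common pattern with the SageMaker PyTorch serving containers is to package a code/inference.py (plus code/requirements.txt) inside model.tar.gz; if the pip install of ultralytics at container start-up is what causes the timeout, pre-building a custom image with ultralytics already installed is the usual workaround. Below is a rough, unverified sketch of such an inference script; the function names follow the SageMaker inference-toolkit convention, while the weight filename and content type are assumptions.

# code/inference.py -- sketch for a SageMaker PyTorch serving container.
# Assumes best.pt sits at the root of model.tar.gz and ultralytics is importable.
import io
import json
import os

from PIL import Image
from ultralytics import YOLO


def model_fn(model_dir):
    # Load the trained YOLO weights unpacked from model.tar.gz.
    return YOLO(os.path.join(model_dir, "best.pt"))


def input_fn(request_body, content_type):
    if content_type == "application/x-image":
        return Image.open(io.BytesIO(request_body))
    raise ValueError(f"unsupported content type: {content_type}")


def predict_fn(image, model):
    boxes = model(image)[0].boxes
    return {"boxes": boxes.xyxy.tolist(),
            "classes": boxes.cls.tolist(),
            "scores": boxes.conf.tolist()}


def output_fn(prediction, accept):
    return json.dumps(prediction)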
I really like the user experience of YOLOv8, but the license is restrictive. I tried RTMDet-Ins, but in practice the inference time was really underwhelming (0.3 s per 640 px frame): https://paperswithcode.com/paper/rtmdet-an-empirical-study-of-designing-real
Any interesting recent projects that I missed?
We are pleased to announce that we have released a new version of ROScribe that supports ROS2 as well as ROS1.
ROScribe
ROScribe is an open source project that uses a natural-language interface to capture the details of your robotic project and creates the entire set of ROS packages for you.
ROScribe motivates you to learn ROS
Learning ROS might feel intimidating for robotics enthusiasts, college students, or professional engineers who are using it for the first time. Sometimes this skill barrier forces them to give up on ROS altogether and opt for non-standard alternatives. We believe ROScribe helps students learn ROS better and encourages them to adopt it for their projects.
ROScribe eliminates the skill barrier for beginners, and saves time and hassle for skilled engineers.
Using LLM to generate ROS
ROScribe combines the power and flexibility of large language models (LLMs) with prompt tuning techniques to capture the details of your robotic design and to automatically create an entire ROS package for your project. As of now, ROScribe supports both ROS1 and ROS2.
Keeping a human in the loop
Inspired by GPT-Synthesizer, the design philosophy of ROScribe is rooted in the core belief that a single prompt is not enough to capture the details of a complex design. Attempting to include every bit of detail in a single prompt, if not impossible, reduces the efficiency of the LLM engine. Powered by LangChain, ROScribe captures the design specification, step by step, through an AI-directed interview that explores the design space with the user in a top-down approach. We believe that keeping a human in the loop is crucial for creating a high-quality output.
Code generation and visualization
After capturing the design specification, ROScribe helps you with the following steps:
Source code and demo
For further details on how to install and use ROScribe, please refer to our GitHub and watch our demo:
ROScribe open source repository
TurtleSim demo
Version v0.0.3 release notes
ROS2 integration:
Roadmap
ROScribe supports both ROS1 and ROS2 with Python code generation. We plan to support the following features in the upcoming releases:
Call for contributions
ROScribe is free and open source software. We encourage all of you to try it out and let us know what you think. We have a lot of plans for this project and intend to support and maintain it regularly. We welcome all robotics enthusiasts to contribute to ROScribe. With each release, we will announce the list of new contributors.
I am interested in programming and am looking for a good laptop. I am willing to spend around $1,800, or even a little more if necessary. I want the computer to be capable of gaming, programming (including working with electronics), video editing, and the like, because I want to be able to do pretty much anything, although I know that may not be possible. Could you recommend any particular parts or models that meet these requirements?
I was involved in a hit and run. Any help would be much appreciated. See image below:
I know this question has been beaten to death, but I want to know about the current state as of 2023. We are using YOLOv5, YOLOv8, and YOLO-NAS at my company. We have to deploy them on a Jetson Nano.
We are now discussing whether we should switch to C++ instead of using the YOLO libraries. My point of view is that, since PyTorch is already well optimized, we should not observe any tangible benefits from switching to C++. I also think these libraries are very well put together, and writing code from scratch that achieves parity with them would take quite some time.
So what are your thoughts? Are there any advantages to rewriting YOLOv5 from scratch in C++? Or is Python sufficient?
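If the real goal is inference speed on the Nano rather than a language change, one option that stays entirely in Python is exporting to TensorRT through the Ultralytics API; a sketch, assuming TensorRT is installed on the Jetson (the .engine file must be built on the target device, and the model/image names are placeholders).

# Export a YOLOv8 model to a TensorRT engine and run it, all from Python.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="engine", half=True, device=0)   # writes yolov8n.engine (FP16)

trt_model = YOLO("yolov8n.engine")                   # load the TensorRT engine
results = trt_model("test.jpg")                      # placeholder test image
print(results[0].boxes)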