/r/AskComputerScience
Ask Computer Science Questions And Get Answers!
Before posting please read our rules.
This subreddit is intended for questions about topics that might be taught by a computer science department at a university. Questions do not have to be expert-level - it's perfectly fine to ask beginner questions. We all start somewhere. All we ask is that your questions be on-topic.
If you have questions that aren't directly about computer science, here are some other subreddits that can likely offer you more help:
If your post is off-topic, or violates one of our other rules, your post may be removed.
I'm currently taking a digital logic design course, and I watched a video about overflow cases, where you have (let's say) 4 bits of signed memory. If you add two negative numbers like -7 and -4, you end up with an extra fifth bit that won't fit in the 4 bits. Similarly, if you add 7 and 1, you get 4 bits, but they represent the wrong value, -8.
My question is: by that logic, does overflow refer to any case where you get an incorrect answer overall? In the second case there wasn't an extra fifth bit that wouldn't fit in the 4 bits; -8 fits perfectly in 4 bits, yet it is the incorrect answer.
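For concreteness, a tiny Python sketch (not from the course) of 4-bit signed addition: overflow is exactly the case where the stored 4-bit result differs from the true sum, whether or not a literal fifth bit shows up.

    # Minimal sketch: 4-bit two's-complement addition with overflow detection.
    # Representable range for 4 signed bits is -8..7.

    def add_4bit_signed(a, b):
        raw = a + b                      # true mathematical sum
        wrapped = ((raw + 8) % 16) - 8   # what the 4-bit hardware actually stores
        overflow = raw != wrapped        # overflow = stored value is not the true sum
        return wrapped, overflow

    print(add_4bit_signed(-7, -4))  # (5, True)  -> -11 doesn't fit in 4 bits
    print(add_4bit_signed(7, 1))    # (-8, True) -> +8 doesn't fit, wraps to -8
    print(add_4bit_signed(3, 2))    # (5, False) -> no overflow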
Does there exist an algorithm that is designed to halt and produces meaningful output used for practical purposes, but for which we cannot definitively prove that it halts for all inputs? The closest thing I can think of is the Collatz iteration, but it doesn't produce meaningful output. I am studying whether it is necessary to be able to test algorithms like Collatz, since they don't appear to have many practical applications. My best guess is that such an algorithm would have something to do with manipulating numbers in a loop based on conditions on them, as with the Collatz conjecture.
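For reference, the Collatz iteration mentioned above, as a minimal Python sketch (purely illustrative):

    def collatz_steps(n):
        """Count iterations of the Collatz map until n reaches 1.

        Termination for every positive n is exactly the unproven conjecture:
        the loop is designed to halt, but nobody can prove it always does.
        """
        steps = 0
        while n != 1:
            n = 3 * n + 1 if n % 2 else n // 2
            steps += 1
        return steps

    print(collatz_steps(27))  # 111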
I'm trying to solve this: (A ∧ B) → C ⊢ A → (B → C)
And I wonder if what I came up with is correct. Is it possible to start with the A ∧ B assumption?
1. (A ∧ B) → C   (premise)
2. A ∧ B         (assumption)
3. A             (∧E 2)
4. B             (∧E 2)
5. C             (→E 1, 2)
6. B → C         (→I 4, 5)
Is it correct?
The given solution starts with the assumptions A and then B to form A ∧ B.
I have a pub/sub (Kafka) setup which sends data from machine A to machine B. The issue is that it's overloading the consumer, which is causing my code and Linux to crash.
Just wondering what the best practice is here for pub/sub things: how can I turn this into a solid pipeline?
Do note I cannot use the cloud for a pipeline; I have a network but not necessarily access to the internet.
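Not a definitive answer, but a hedged sketch of one common pattern, assuming the kafka-python client (topic, server, and group names are made up): bound how much the consumer pulls per poll and commit offsets only after the work is done, so unprocessed data stays buffered in Kafka instead of overwhelming machine B.

    from kafka import KafkaConsumer

    def process(payload):
        pass  # placeholder for the real per-message work

    consumer = KafkaConsumer(
        "machine-a-data",                 # hypothetical topic name
        bootstrap_servers="machine-a:9092",
        group_id="machine-b-workers",
        enable_auto_commit=False,         # commit only after work is actually done
        max_poll_records=100,             # cap the batch size per poll
    )

    while True:
        batch = consumer.poll(timeout_ms=1000)
        for _, records in batch.items():
            for record in records:
                process(record.value)
        consumer.commit()                 # acknowledge only what was processed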
Reading Computer Organization by Patterson and Hennessy, and they mention that lowering the instruction count of a program may lead to an increase in clock cycle time. Therefore, improving performance isn't as straightforward as lowering the instruction count. Could someone explain how lowering the instruction count affects clock speed and why it would decrease it?
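For reference, the book's CPU-time equation makes the trade-off concrete: CPU time = instruction count × cycles per instruction (CPI) × clock cycle time. A toy Python sketch with made-up numbers:

    # Illustrative numbers only: fewer instructions can still lose if each one
    # forces more cycles or a longer cycle time.

    def cpu_time(instr_count, cpi, cycle_time_ns):
        return instr_count * cpi * cycle_time_ns  # in nanoseconds

    # Design A: more, simpler instructions on a fast clock.
    a = cpu_time(instr_count=1_000_000, cpi=1.0, cycle_time_ns=0.5)
    # Design B: fewer, more complex instructions that force a slower clock.
    b = cpu_time(instr_count=700_000, cpi=1.2, cycle_time_ns=0.8)

    print(a, b)  # 500000.0 ns vs 672000.0 ns: fewer instructions, yet slower overall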
An addition (+) operation is represented by a binary code ....
Where can I find all these instructions and the binary code representations for them?
My professor wants our class to make a 4-bit program counter out of logic gates in Digital Simulator, but he didn't teach us how to use it. I know how a program counter works, but I don't know how to express it in logic gates.
Also, the instructions don't include a jump, just Load and Increment.
He didn't tell us anything else about how to do it. Please help.
Clarification: We're making a CPU in class, but I only understand the wider scope and nothing about the logic gates inside.
I'm requesting any resources or images that show the logic gates.
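Not the gate-level answer, but a behavioural Python sketch of what the gates have to implement (the signal names are made up): each tick stands for one clock edge, and the gate version replaces the arithmetic with four D flip-flops plus an incrementer and load/mux logic.

    # Behavioural sketch of a 4-bit program counter with Load and Increment.

    class ProgramCounter4:
        def __init__(self):
            self.value = 0                      # the 4 flip-flop outputs, as an int

        def tick(self, load=False, data=0, increment=False):
            if load:                            # Load takes priority over Increment
                self.value = data & 0b1111
            elif increment:
                self.value = (self.value + 1) & 0b1111  # wraps 15 -> 0
            return self.value

    pc = ProgramCounter4()
    pc.tick(load=True, data=0b1010)             # load 10
    print(pc.tick(increment=True))              # 11
    print(pc.tick(increment=True))              # 12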
I regret not learning seriously
Hi folks, I hope you’re doing well.
I am a student currently studying Computer Science at university.
I studied very shallowly in the first third of the curriculum.
I regret not taking everything seriously from the beginning because I have now become passionate and interested in computer science as a field beyond getting a qualification for a job.
In the first few modules I crammed and retained very little knowledge. I have been more diligent with my more recent work and plan on continuing to do so.
How can I overcome the knowledge gaps I created?
I am also working part time so going back to each of those subjects is going to be challenging.
How would you deal with this situation if you were me?
I'm trying to apply the A* algorithm in a game, using a set of test nodes in a 3D environment with a start node at one extreme of the map and a goal node at the other extreme.
The nodes in between were scrambled, then sorted into a list in ascending order based on their H cost. The problem I ran into is calculating the G cost. It's not a simple 2D grid where I can just move to an adjacent cell; these nodes are spread apart, and their 3D positions are randomized on the X and Y axes.
I had thought this algorithm would be straightforward to implement, but once I realized I had to define what counts as a neighbor for each node in a set of nodes at varying distances from each other, the task became daunting. I don't think it's very efficient to manually assign the neighbors for each node, even if it's just a set of 8 nodes on the map.
I'm still new to pathfinding algorithms, and I'm sure there is a way to dynamically assign neighbors to nodes based on some estimates or sound assumptions. How do you go about doing this?
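A common way to get neighbor lists without hand-assigning them (a sketch, not the only option): connect each node to its k nearest nodes by Euclidean distance, optionally capped by a maximum radius. The names and thresholds below are made up.

    import math

    def build_neighbors(nodes, k=3, max_dist=None):
        """nodes: list of (x, y, z) tuples. Returns {index: [neighbor indices]}."""
        def dist(a, b):
            return math.dist(a, b)  # Euclidean distance in 3D

        neighbors = {}
        for i, p in enumerate(nodes):
            others = sorted(
                (j for j in range(len(nodes)) if j != i),
                key=lambda j: dist(p, nodes[j]),
            )
            chosen = others[:k]
            if max_dist is not None:
                chosen = [j for j in chosen if dist(p, nodes[j]) <= max_dist]
            neighbors[i] = chosen
        return neighbors

    nodes = [(0, 0, 0), (1, 2, 0), (4, 1, 3), (2, 2, 2), (5, 5, 0)]
    print(build_neighbors(nodes, k=2))

You'd usually also make the edges symmetric (if i lists j, make sure j lists i) so A* can traverse them in both directions, and use the straight-line distance between connected nodes as the G-cost increment.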
I have roughly 20k images and some of them are thumbnails of each other. I'm trying to write a program to find which ones are duplicates, but I can't directly compare the hash of the contents because the thumbnail versions have different pixel data. I tried scaling them to 16x16, quantizing them to 6 bits/pixel, and calculating a CRC, but that doesn't work since it amplifies small differences in the images. Does anyone know of a fast algorithm (O(n log n) or better) that can find which images are visually similar to each other?
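One standard approach, sketched below under the assumption that Pillow is available: compute a small perceptual hash (here a difference hash) for every image, bucket by the exact 64-bit value, and only compare Hamming distances within or near buckets, which keeps the pass close to O(n).

    # Rough sketch of a difference hash (dHash) with Pillow; thumbnails of the
    # same image tend to land on identical or nearly identical hashes.
    from PIL import Image

    def dhash(path, hash_size=8):
        img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
        pixels = list(img.getdata())
        bits = 0
        for row in range(hash_size):
            for col in range(hash_size):
                left = pixels[row * (hash_size + 1) + col]
                right = pixels[row * (hash_size + 1) + col + 1]
                bits = (bits << 1) | (left > right)
        return bits

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # Usage idea: hashes = {path: dhash(path) for path in image_paths}
    # then group paths by exact hash, falling back to small Hamming distances.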
Obviously the POSIX API means that most non-GUI applications, if they don't depend on nonstandard OS-specific behavior, can be quite readily cross-compiled, and shell scripts are usually fairly cross-compatible. But to what extent would binary compatibility work? Obviously this assumes that all of these operating systems are running on the same architecture, but could a non-GUI program distributed in binary-only form and written for, say, AIX run on Linux without significant issues? Back when Linux was first replacing commercial UNIX, could a user's old UNIX programs run on Linux? If the answer is "absolutely not," how difficult would it be to translate the ABIs? Would it be as difficult as, for example, WINE translating the Windows ABI to Linux, or would it be simpler due to the common APIs of the original source code? Also, if anyone knows, is FreeBSD's binary compatibility with Linux in any way native, or is it a WINE-like translation layer?
Hello, I'm a computer science student from Germany, and we have to create a Turing machine that accepts every word (containing only zeros and ones) and outputs its length in binary. Can somebody please help me? I'm completely stuck. Thanks.
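Not a full machine, but the core subroutine such a machine needs is a binary increment performed once per input symbol read, on a separate counter region of the tape. A Python sketch of just that idea (illustrative only, not a transition table):

    # Binary increment, done the way a Turing machine would: walk from the least
    # significant bit, flipping 1s to 0s until a 0 (or blank) can become a 1.

    def increment_binary(tape):
        """tape: list of '0'/'1', most significant bit first."""
        i = len(tape) - 1
        while i >= 0 and tape[i] == "1":
            tape[i] = "0"          # carry propagates left
            i -= 1
        if i >= 0:
            tape[i] = "1"
        else:
            tape.insert(0, "1")    # counter grew by one digit
        return tape

    counter = ["0"]
    for _symbol in "0110101":      # any word over {0, 1}
        increment_binary(counter)
    print("".join(counter))        # 111 = 7, the length of the input in binary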
Hi there, I'm a physicist working in AI research, mostly on the theoretical model development side, but increasingly my work involves training models in a distributed fashion across many GPUs, along with issues such as using CPUs to feed data to the GPUs, and trying to do all this efficiently. So I'd really like to learn more about computing at a low level: how the code I write in Python or C++ actually gets executed, the difference between CPUs and GPUs, etc. Can anyone recommend some online courses that go into this? I guess this falls under computer engineering? I would like to invest some time into this. For context, I have about a decade of experience working in Python, with some C++, mostly on HPC clusters running Linux. However, these have always just been tools to solve research problems, so my understanding of how these systems actually work has stayed pretty shallow. Thanks!
Hello, I am using scikit-learn to train a random forest regression model. I have to read a lot of scientific articles for a project, but I don't have time to read them all. Hence, I am trying to train a random forest model to read them for me and only forward high-scoring articles that are likely to contain information relevant to me. Once the model is trained, I intend to take random chunks of the Google Scholar database and run the model on them.
Now, my question is: when making my training set, how should the scores be distributed? Do I want equal parts good, bad, and medium data points? Do I want a skew toward more examples of good data? Or a skew toward teaching it to recognize bad articles? While I appreciate all answers, bonus points for providing some sort of source for your answer. So far, I lean toward making the training set uniform in how good and bad I have scored the data points; in other words, a healthy mix of good, bad, and mediocre articles.
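This doesn't answer the distribution question, but as a sketch of the plumbing being described (toy data, made-up scores, and TF-IDF features as an assumption the post doesn't specify):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.pipeline import make_pipeline

    # The real training set would be article text plus hand-assigned scores.
    texts = [
        "deep learning for protein folding",          # relevant -> high score
        "catalog of 19th century shipping routes",    # irrelevant -> low score
        "graph neural networks in chemistry",
        "municipal water billing report",
    ]
    scores = [0.9, 0.1, 0.8, 0.05]

    model = make_pipeline(TfidfVectorizer(), RandomForestRegressor(n_estimators=200))
    model.fit(texts, scores)

    print(model.predict(["neural networks for molecular property prediction"]))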
Could someone remind me of the name of the YouTube video about modern software performance testing methods? I feel like it's a fairly well-known video, as I think I've seen someone reference it in a Reddit post before. It's a talk that covers how small changes in the way you test a software program's performance can cause misleading results: for example, changes in the order in which you run the tests, the name of the software on the file system, the location of the software in the file system, etc. The basic point of the talk is that you need to scientifically account for these variables that can affect the cache's performance, so that you can get an accurate idea of how any change to your source code changes the performance. The video is in English, of course. If it helps to narrow it down, I think that at some point the presenter references image-processing software and also promotes his company's products. Thank you.
Hey all, I'm a freshman in CS and have no experience with coding. I just finished my "project": a basic Python program that allows users to sign up with a username and password, which are stored in a text file for when they try to log in later. Is this something that I should think about posting on GitHub/LinkedIn, or should I wait until I have more advanced projects? I'm brand new to this and am unsure whether this kind of project is too basic, or whether it even counts as a project.
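For what it's worth, a rough sketch of that kind of project (not the poster's actual code; the filename is made up, and hashing the password with the standard-library hashlib is an optional upgrade that would make it nicer to publish):

    import hashlib

    USERS_FILE = "users.txt"   # hypothetical filename

    def _hash(password):
        return hashlib.sha256(password.encode()).hexdigest()

    def sign_up(username, password):
        with open(USERS_FILE, "a") as f:
            f.write(f"{username}:{_hash(password)}\n")

    def log_in(username, password):
        try:
            with open(USERS_FILE) as f:
                return f"{username}:{_hash(password)}\n" in f.readlines()
        except FileNotFoundError:
            return False

    sign_up("alice", "hunter2")
    print(log_in("alice", "hunter2"))  # True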
Hey everyone,
I'm currently working on a project to build a PDF query bot using LLaMA or Hugging Face open-source APIs, but I'm facing a lot of challenges. I've tried various code snippets from GitHub, but I still can't get it to work properly. My project review is coming up fast, and I'm starting to get a bit worried.
If anyone here has experience with LLaMA or Hugging Face APIs and has worked on a similar project, I would really appreciate any guidance or suggestions. Also, if you know of any reliable resources or tutorials that could help me better understand how to implement this, please do share.
Thank you in advance!
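For anyone sketching an answer: a minimal illustration of one way to wire this up, assuming pypdf for text extraction and a Hugging Face question-answering pipeline (the file name and question are placeholders, and the pipeline's default model is a small extractive one, not LLaMA).

    from pypdf import PdfReader
    from transformers import pipeline

    def load_pdf_text(path):
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    context = load_pdf_text("paper.pdf")          # hypothetical input file
    qa = pipeline("question-answering")           # downloads a default extractive model

    answer = qa(question="What dataset was used?", context=context[:4000])
    print(answer["answer"], answer["score"])

For long PDFs you'd chunk the text and query each chunk (or retrieve the most relevant chunks first) rather than truncating the context as this sketch does.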
Given that most software engineers likely wouldn’t appreciate introducing flaws or limitations on purpose, I’m curious if there are cases where companies deliberately design software to become obsolete or incompatible over time. Have you come across it yourselves or heard about such practices?
Everything I've ever heard is that it's never intentional: software should be made to be sustainable and efficient™, since people actively need to use it, and things like planned obsolescence sound like something you'd only ever do to annoy someone.
I'm a CS student, and I'd like suggestions for resources that provide articles (software/hardware) suitable for my level as a junior student.
The understanding I have about this question is this:
When I compile code, the OS loads the compiler program for that code into main memory.
Then the compiler program is executed, and the code it is supposed to compile gets translated into the necessary format using the CPU.
Meaning: OS executable code (already present in RAM) runs on the CPU and schedules the compiler, then the CPU carries out the compilation as instructed by the compiler's executable file.
I understand other processes might get a chance to execute in between the compilation, and I/O interrupts might happen.
Now, I could be totally wrong here; the picture I have of this process may be entirely wrong. In that case, please enlighten me by providing a clearer picture.
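A tiny way to see that picture in action: a compiler is just another program the OS loads and schedules. Here the Python bytecode compiler is run as a separate process (illustrative only):

    import subprocess
    import sys

    with open("hello.py", "w") as f:
        f.write("print('hello')\n")

    # The OS loads the interpreter/compiler into memory, schedules it on the CPU,
    # it reads hello.py, writes the compiled output, and exits.
    subprocess.run([sys.executable, "-m", "py_compile", "hello.py"], check=True)
    print("compiled to __pycache__/")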
I've been getting really into DSA recently and was looking for a book that covers topics like Bloom filters or tries beyond the traditional data structures. Thanks in advance!
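For a taste of the kind of structure meant here, a toy Bloom filter in Python (illustrative only, not tuned):

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1024, num_hashes=3):
            self.size = size_bits
            self.k = num_hashes
            self.bits = 0                 # the bit array, stored as one big int

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(h, 16) % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits |= 1 << pos

        def might_contain(self, item):
            # False means definitely absent; True means probably present.
            return all(self.bits & (1 << pos) for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("trie")
    print(bf.might_contain("trie"), bf.might_contain("heap"))  # True, (almost surely) False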
Does anyone know how to scrape data from those apps? Any free APIs? I need the data for engagement prediction and influence classification for my personal comparative-analysis experiment on how influential a person is based on how active they are on social media.
PS: I'm a broke undergraduate, so I can't afford those tokens from Facebook and X.
I had this lingering thought while waiting in traffic. It's nothing serious, but I just want to know. I know that Google Maps is able to take into account real-time traffic data for its pathfinding, along with average speed and road conditions.
What I want to know is if they estimate the traffic of a given section of road depending on day and hour. If they do, do they take it into account in their pathfinding? How do/would they optimize it?
As an example: Let's say there's two paths to choose from and each path contains two sections:
At timestep t=0: The first path has both sections of the road estimated to take around 5 units of time.
The second path has the first section take around 5 units as well. However, the second section is a bit more congested and is estimated to take around 10 units of time.
At timestep t=5: Let's say the first section of both paths doesn't fluctuate, and that if you had taken either path at t=0, you would have cleared it by now.
However, the second sections do fluctuate: the second section of the first path starts to enter its rush hour and now gives an ETA of 7 units of time.
On the other hand, the second section of the second path just finished its rush hour and the road is basically empty. Now it has an ETA of 4 units of time.
Would Google's algorithm have taken the first path (the shortest path at t=0) or the second path (the true shortest path)?
Note: let's say that these paths fork out so you can't just switch paths mid journey without making the trip longer.
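Nobody outside Google knows their exact algorithm, but the textbook formulation of this is a time-dependent shortest path: each edge's cost is a function of the time you enter it, and the search relaxes edges using that function. A rough Python sketch built around the example above (node names and cost functions are made up to mirror it):

    import heapq

    # Time-dependent Dijkstra: graph maps node -> list of (neighbor, cost_fn),
    # where cost_fn(t) is the predicted travel time if you enter the edge at time t.

    def earliest_arrival(graph, start, goal, depart_time=0):
        best = {start: depart_time}
        queue = [(depart_time, start)]
        while queue:
            t, node = heapq.heappop(queue)
            if node == goal:
                return t
            if t > best.get(node, float("inf")):
                continue
            for nxt, cost_fn in graph.get(node, []):
                arrival = t + cost_fn(t)
                if arrival < best.get(nxt, float("inf")):
                    best[nxt] = arrival
                    heapq.heappush(queue, (arrival, nxt))
        return float("inf")

    # Both first sections cost 5; path 1's second section costs 5 early but 7 at t=5,
    # path 2's costs 10 early but only 4 at t=5.
    graph = {
        "S": [("A1", lambda t: 5), ("B1", lambda t: 5)],
        "A1": [("G", lambda t: 5 if t < 5 else 7)],
        "B1": [("G", lambda t: 10 if t < 5 else 4)],
    }
    print(earliest_arrival(graph, "S", "G"))  # 9: path 2 wins once timing is modeled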
I was asked this question on an exam, and the professor gave me 0 marks because I proved the language irregular using the pumping lemma. He said that it is the same as O^2n, whose DFA can be made, but from how the question is structured and how it fails the pumping lemma, it seems irregular to me.
I am learning how to use logic gates as part of my computer science major. I am attempting to construct a 7-segment decoder using only 24 gates, but I can't seem to get below 27! Any suggestions? I am using logicsim.
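One thing that often brings the gate count down is exploiting the don't-care inputs 10-15. A sketch using sympy to get a minimal sum-of-products for a single segment (segment 'a' of the standard layout is assumed here; adapt the minterms to your pinout):

    from sympy import symbols
    from sympy.logic import SOPform

    w, x, y, z = symbols("w x y z")            # w is the MSB of the BCD input

    def bits(n, width=4):
        return [(n >> i) & 1 for i in reversed(range(width))]

    # Segment 'a' lights up for digits 0, 2, 3, 5, 6, 7, 8, 9.
    minterms = [bits(d) for d in [0, 2, 3, 5, 6, 7, 8, 9]]
    dontcares = [bits(d) for d in range(10, 16)]  # inputs 1010-1111 never occur

    print(SOPform([w, x, y, z], minterms, dontcares))

Running this per segment, then factoring terms shared between segments by hand, is usually how the count drops from "every segment minimized separately" to something near the target.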
Can someone help me with these? (Regular expressions)
Consider the language consisting of all strings containing exactly two a’s, with Σ = {a, b}. Give an RE for this language.
• b*ab*ab*
• a*ba*b
• ab*ab*
• (a*b*)aa(a*b*)*
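A quick, non-authoritative way to sanity-check the options is to run them against a few test strings with Python's re module (this can only rule options out, not prove one correct; note Python needs fullmatch for anchoring, unlike the textbook notation):

    import re

    options = [r"b*ab*ab*", r"a*ba*b", r"ab*ab*", r"(a*b*)aa(a*b*)*"]

    # Strings over {a, b} and whether they contain exactly two a's.
    tests = {
        "aa": True, "bab": False, "babab": True, "aba": True,
        "baab": True, "a": False, "bbabba": True,
    }

    for pattern in options:
        ok = all(bool(re.fullmatch(pattern, s)) == want for s, want in tests.items())
        print(f"{pattern}: {'consistent with the tests' if ok else 'fails some test'}")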
With the advent of language models purportedly able to do math and programming, the time it takes to 'generate' a solution is orders of magnitude larger than the time it takes to verify it for correctness.
What are your views on the implications of this 'reversed' P vs NP problem, with AGI? For the truly massive complex problems that it is expected to solve, without a robust and efficient way to verify that solution, how would one even know if they've built an AGI?
Answers appreciated.
Hi,
I'm trying to implement a sinusoidal positional encoding for a DDPM. I found two solutions that compute different embeddings for the same position/timestep with the same embedding dimension. I am wondering if one of them is wrong or if both are correct. The official DDPM source code does not use the original sinusoidal positional encoding from the Transformer paper... why? Is the newer solution better?
I noticed the sinusoidal positional encoding used in the official DDPM code implementation was borrowed from tensor2tensor. The difference in implementations was even highlighted in one of the PR submissions to the official tensor2tensor repository. Why did the authors of DDPM use this implementation rather than the original from the Transformer paper?
PS: If you want to check the code, it's here: https://stackoverflow.com/questions/79103455/should-i-interleave-sin-and-cosine-in-sinusoidal-positional-encoding
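For anyone comparing the two, a NumPy sketch of the structural difference: the Transformer paper interleaves sin and cos channels, while the tensor2tensor/DDPM style concatenates all the sines and then all the cosines. This is not a line-for-line copy of either codebase (they also differ in small details like the frequency denominator), just the layout difference:

    import numpy as np

    def interleaved(pos, dim):          # "Attention Is All You Need" layout
        pe = np.zeros(dim)
        freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
        pe[0::2] = np.sin(pos * freqs)
        pe[1::2] = np.cos(pos * freqs)
        return pe

    def concatenated(pos, dim):         # tensor2tensor / DDPM layout
        freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
        return np.concatenate([np.sin(pos * freqs), np.cos(pos * freqs)])

    print(interleaved(10, 8))
    print(concatenated(10, 8))
    # Same values, just permuted across channels:
    print(np.allclose(np.sort(interleaved(10, 8)), np.sort(concatenated(10, 8))))  # True

When the frequencies match, the two layouts differ only by a fixed channel permutation, which is presumably why either works in practice as long as the downstream layers are trained with a consistent choice.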
So, let's say you're at an event where you have to go from point to point as fast as possible, but there's a catch: every point has a partner, such that if you are at one point of a pair, you have to go to its partner before continuing on to the next vertex. It's almost the traveling salesman problem, but the twist is these forced edges that must be traversed at each point before the next arbitrary vertex can be chosen. What would this variant be called?