/r/compsci
Computer Science Theory and Application. We share and discuss any content that computer scientists find interesting. People from all walks of life welcome, including hackers, hobbyists, professionals, and academics.
Welcome Computer Science researchers, students, professionals, and enthusiasts!
We share and discuss content that computer scientists find interesting.
Self-posts and Q&A threads are welcome, but we prefer high quality posts focused directly on graduate level CS material. We discourage most posts about introductory material, how to study CS, or about careers. For those topics, please consider one of the subreddits in the sidebar instead.
Read the original free Structure and Interpretation of Computer Programs (or see the Online conversion of SICP )
Other topics are likely better suited for:
Other online communities:
If you are new to Computer Science please read our FAQ before posting. A list of book recommendations from our community for various topics can be found here.
/r/compsci
Suppose we have a function f that takes (x,y) as input and outputs 1 only if \phi_x(x) converges, and undefined otherwise. Well, I understand that f is partial computable. But how do I use a universal Turing machine argument to show it? Here \phi_e is the partial function computed by the e-th Turing program P_e, as usual.
Can anyone explain to me better the difference between these two concepts, from the point of view of the multiplexing that the CPU performs?
I understood, so far, that Pipeline concerns several internal units, each with its own register, in order to run different instructions (execute, fetch, decode...) in parallel.
Therefore, would Time-Sharing be just an alternation between processes, in order to create the illusion that they are simultaneous?
Is it correct?
scale sharp provide work squash dog party boat piquant shrill
This post was mass deleted and anonymized with Redact
Take them with a grain of salt. These animations give an idea of the algorithms’ processing. YMMV.
I have a yearly subscription to O'Reilly Media through work, which is phenomenal. I read ~4 books per year with a book club and sample from many others. The stuff from O'Reilly proper tends to be high quality (emphasis on tends), and there are other publishers I see consistently putting out high quality content as well like Manning and Springer.
I often see really interesting titles from Packt publishers too, but I've read a few books from them and was left with the impression that they are much lower quality in terms of content. In addition, some part of this impression is because, when I was a newer engineer years ago, I reviewed several books for them, for which I was basically given free books, and the process seemed really fluffy and without rigour. After reviewing a couple of books, I was asked, without any real insight into my credentials, if I would like to write a book for them. I had no business writing books on engineering subjects at the time.
Maybe I'm soured on them by just an unfortunate series of cooincidences that led to this impression, but the big CS publishers do seem to fall on some hierarchy of quality. Sometimes Packt is the only publishers with titles on newer tech, or they are the publisher with the most recent publications on certain topics, so it would be great if I were wrong about this. How do you all see this?
Hi everyone! I’m working on designing a highly efficient, minimal-core processor tailored for basic neural network computations. The primary goal is to keep the semiconductor component count as low as possible while still performing essential operations for neural networks, such as multiplication, addition, non-linear activation functions, and division for normalization (1/n). I’d love any feedback or suggestions for improvements!
Objective
My goal is to build an ultra-lightweight core capable of running basic neural network inference tasks. To achieve this, I’m incorporating a lookup table approximation for activation functions, an L-Mul linear-complexity multiplier to replace traditional floating-point multipliers, and a specialized 1/n calculation module for normalization.
Core Design Breakdown
Lookup Table (ROM) for Activation Functions
• Purpose: The ROM stores precomputed values for common neural network activation functions (like ReLU, Sigmoid, Tanh). This approach provides quick lookups without requiring complex runtime calculations.
• Precision Control: Storing 4 to 6 bits per value allows us to keep the ROM size minimal while maintaining sufficient precision for activation function outputs.
• Additional Components:
• Address Decoding: Simple logic for converting the input address into ROM selection signals.
• Input/Output Registers: Registers to hold input/output values for stable data processing.
• Control Logic: Manages timing and ensures correct data flow, including handling special cases (e.g., n = 0 ).
• Output Buffers: Stabilizes the output signals.
• Estimated Components (excluding ROM):
• Address Decoding: ~10-20 components
• Input/Output Registers: ~80 components
• Control Logic: ~50-60 components
• Output Buffers: ~16 components
• Total Additional Components (outside of ROM): Approximately 156-176 components.
L-Mul Approximation for Multiplication (No Traditional Multiplier)
• Why L-Mul? The L-Mul (linear-complexity multiplication) technique replaces traditional floating-point multiplication with an approximation using integer addition. This saves significant power and component count, making it ideal for minimalistic neural network cores.
• Components:
• L-Mul Multiplier Core: Uses a series of additions for approximate mantissa multiplication. For an 8-bit setup, around 50-100 gates are needed.
• Adders and Subtracters: 8-bit ALUs for addition and subtraction, each requiring around 80-120 gates.
• Control Logic & Buffering: Coordination logic for timing and operation selection, plus output buffers for stable signal outputs.
• Total Component Estimate for L-Mul Core: Including multiplication, addition, subtraction, and control, the L-Mul section requires about 240-390 gates (or roughly 960-1560 semiconductor components, assuming 4 components per gate).
1/n Calculation Module for Normalization
• Purpose: This module is essential for normalization tasks within neural networks, allowing efficient computation of 1/n with minimal component usage.
• Lookup Table (ROM) Approximation:
• Stores precomputed values of 1/n for direct lookup.
• ROM size and precision can be managed to balance accuracy with component count (e.g., 4-bit precision for small lookups).
• Additional Components:
• Address Decoding Logic: Converts input n into an address to retrieve the precomputed 1/n value.
• Control Logic: Ensures data flow and error handling (e.g., when n = 0, avoid division by zero).
• Registers and Buffers: Holds inputs and outputs and stabilizes signals for reliable processing.
• Estimated Component Count:
• Address Decoding: ~10-20 components
• Control Logic: ~20-30 components
• Registers: ~40 components
• Output Buffers: ~10-15 components
• Total (excluding ROM): ~80-105 components
Overall Core Summary
Bringing it all together, the complete design for this minimal neural network core includes:
1. Activation Function Lookup Table Core: Around 156-176 components for non-ROM logic.
2. L-Mul Core with ALU Operations: Approximately 960-1560 components for multiplication, addition, and subtraction.
3. 1/n Calculation Module: Roughly 80-105 components for the additional logic outside the ROM.
Total Estimated Component Count: Combining all three parts, this minimal core would require around 1196-1841 semiconductor components.
Key Considerations and Challenges
• Precision vs. Component Count: Reducing output precision helps keep the component count low, but it impacts accuracy. Balancing these factors is crucial for neural network tasks.
• Potential Optimizations: I’m considering further optimizations, such as compressing the ROM or using interpolation between stored values to reduce lookup table size.
• Special Case Handling: Ensuring stable operation for special inputs (like n = 0 in the 1/n module) is a key part of the control logic.
Conclusion
This core design aims to support fundamental neural network computations with minimal hardware. By leveraging L-Mul for low-cost multiplication, lookup tables for quick activation function and 1/n calculations, and simplified control logic, the core remains compact while meeting essential processing needs.
Any feedback on further reducing component count, alternative low-power designs, or potential improvements for precision would be highly appreciated. Thanks for reading!
Hope this gives a clear overview of my project. Let me know if there’s anything else you’d add or change!
L-mul Paper source: https://arxiv.org/pdf/2410.00907
Heyo there!
I wrote and illustrated a computer science comic book (which was published by the Stanford University Press 🎉), and I'm wondering places (both online and offline) to share it that people might find enjoyable and meaningful. My goal is to share it with communities that would appreciate CS "edutainment". I was inspired by my experiences teaching intro CS and my love for visual thinking.
The comic book touches upon beginner computer science concepts in Python and C++ with characters like Fabulous Function, Sir Python, and Mama If alongside text blocks of code, and is supplemental to an intro CS course.
So far, I've had the chance to share it at my university and a few other places, and the response has been great! I'm curious if anyone has other places/platforms (online and offline) that might find joy in this type of content. I've looked into SIGCSE, a few educational forums, design places, and any other other suggestions are much appreciated :)
I've a B.Sc. in Computer Science, with a track in Software Engineering.
When I was in university, I wanted to somehow address device drivers in my thesis, but my professors rejected it since they claimed it was too hardware related.
I found it strange. I mean, they taught me computer architecture and operating systems, yet DDs were out of scope?
For me, it is sun-light clear that Computer Engineers can develop such software modules, but what about CS?
I've made some research about it and, thus far, I've come up with the conclusion that CS grads actually can develop DDs (they're software modules after all), but, unlike CEs, it is not a given.
What do you think about this? Did I come up with the right conclusion?
Did anybody of you ever develop a device driver?
How can I?
Hey,
In Variational Autoencoders (VAEs) we try to learn the distribution of some data. For that we have "two" neural networks trained end-to-end. The first network, the encoder, models the distribution q(z|x), i.e., predicts z given x. The second network models an approximation of the posterior q(x|z), p_theta(x|z), i.e., models the distribution that samples x given the latent variable z.
Reading the literature it seems the optimisation objective of VAEs is to maximize the ELBO. And that means maximizing p_theta(x). However, I'm wondering isn't p_theta(x) the prior? Is is the evidence?
My doubt is simply regarding jargon. Let me explain. For a given conditional probability with two random variables A and B we have:
p(B|A) = p(A|B)*p(B)/P(A)
- p(B|A) is the posterior
- p(A|B) is the likelihood
- p(B) is the prior
- P(A) is the evidence
Well, for VAEs the decoder will try to approximate the posterior q(x|z). In VAEs the likelihood is q(z|x), which means the posterior is q(x|z), the evidence is q(z) and the prior is q(x). Well if the objective of VAE is to maximize the ELBO (Evidence lower bound) and p_theta(x|z) is an approximation of the posterior q(x|z) then the evidence should be p_theta(z) given that q(z) is the evidence, right? That's what I don't get, because they say p_theta(x) is the evidence now... but that was the prior in q...
Are q and p_theta distributions different and they have distinct likelihoods, priors, evidences and posteriors? What are the likelihoods, priors, evidences and posteriors for q and p_theta?
Thank you!
I'm working on implementing dynamic segment trees with lazy propagation for a specific application, but my understanding of dynamic data structures is a bit rusty. My segment tree will store a numerical value called "weight" with each segment and will use AVL trees as the underlying structure. Given that there will be lazy updates to the weights of certain segments on internal nodes, I'm concerned about how the rebalancing operations in AVL trees might affect these lazy updates.
Could the rebalancing of AVL trees interfere with lazy propagation in a way that is not fixable by just transferring the lazy update information to the relevant nodes when rotations occur , and if so, how can this issue be mitigated?
For example, isn't an undergraduate course on approximation algorithms that provide provable performance guarantees more useful than one on group theory?
Hello, Reddit community! I’m very new to Turing machines and could really use some guidance. I’m struggling to understand how an enumerator works – a Turing machine with an attached printer. I'm attempting to construct the language defined below, but I feel like I might have a logical issue in my approach. Could anyone review it and let me know if it looks correct or if there are any mistakes? Thanks so much for your help! I attached a picture of what I have constructed a diagram[![enter image description here][1]][1]"**
**Present an enumerator with four states (including q_print and q_halt) for the language L={c^2n ∣ n≥0}.
The language's words are: {ϵ,cc,cccc,cccccc,…}
Set of states: Q={q1,q2,q_print,q_halt}
Input alphabet: Σ={0}
Output alphabet: Γ={x,y,0}
Describe this enumerator using a diagram (see Example 3.10 in the book – it is possible to omit the drawing of the qhalt state and all transitions connected to it). You may omit the drawing of impossible transitions and indicate these only as labels. For further details, refer to the student guide.
the book I'm reading is Micheal Sipser's
picture's writing here :
q_0 = we either print epsilon or we print when we have an even number of C's or we put x and send it to q1 to return us another C .
q_1 = we return a C back to q_0 to achieve an even number of C's
q_print = new line and rest the cycle and go back to q_0.
I also ask further questions :
Question 1: want to know with q_print if going back to q_0 and left is legal/correct?
question 2 : does it ever stop? does it need to stop?
According to teachyourselfcs.com, “Most of the code you write is run by an operating system, so you should know how those interact.” Since I started studying from this list, I’ve begun to question if (and to what extent) I should dive deeper into OS concepts.
I’ve been working as a fullstack web dev and recently asked ChatGPT if fullstack/backend devs need a solid understanding of OS concepts. The answer was yes, as knowledge of areas like:
…are all relevant to backend app development. I agree these are important; however, topics like memory management, concurrency, and file system management are often addressed differently depending on the programming language. (e.g. C# and Go each offer distinct approaches to concurrency)
If your goal is to create more performant backend applications, you may only need a basic understanding of OS concepts. After that, it might be more practical to focus on language-specific techniques for handling concurrency, file management, and similar tasks. What do you think?
I'm interested in the intersection of linguistics and computer science, I've been reading on Chomsky hierarchy, and would like to know if there exist lambda calculus types that are equivalent to the Chomsky types, especially the type-1 that's context-sensitive and has the linear-bounded non-deterministic Turing machine as an automation equivalent on the wiki.
The output of transition functions of NTMs are powersets and of DTMs are sets. If I interpret time complexity as the difficulty of simulating NTMs by DTMs, then it seems that due to Cantor’s theorem, there can’t ever be a bijection between these. Therefore P != NP. Can anybody please give me any feedback on this? I’ve exchanged a few mails with Scott Aaronson and Joshua Grochow, but the issue has not yet been resolved. Thanks in advance. Jan
For context, I am starting grad school in January with a Data Science concentration. I want to learn as much as possible in the next 2 months.
I have roughly 20k images and some of them are thumbnails of each other. I'm trying to write a program to find which ones are duplicates, but I can't directly compare the hash of the contents because the thumbnail versions have different pixel data. I tried scaling them to 16x16, quantizing them to 6 bits/pixel, and calculating a CRC, but that doesn't work since it amplifies small differences in the images. Does anyone know of a fast algorithm (O(n log n) or better) that can find which images are visually similar to each other?
I was playing a bit with Generative AI (using NotebookML and podcastfy) and created a podcast using Human Computer Interaction (HCI) publications.
Some MobileHCI, Ubicomp, ISWC and UIST papers are posted. Next is ISMAR.
Paper requests and feedback is welcome.
Hello everyone! I'm new to this sub, and I'm thankful for your time helping me out a bit.
I've been interested for a while on the correctness guarantees one can get for programs depending on the semantics in use on the language. Memory-safety as popularized by Rust, or type-safety as introduced by many languages (nominal vs structural typing), serve to me as examples of ways in which semantics make programs easier to reason about (albeit at some cost of making them somewhat harder to write).
Lately I've been asking myself if some semantics were already well-established for not only writing powerful-yet-decidable programs, but additionally for reasoning about worst-case time complexity.
I've glanced over Primitive-Recursive Functions as a formal strict subset of the General Recursive Total Functions, and found it interesting for formalizing decidable programs (BlooP and FlooP being language implementations). However, I haven't found much information about whether one could compute the worst-case time complexity of these in an efficient manner besides running the programs or attempting to formalize a closed-form-solvable recurrence relation.
I've been glancing over Predicate Transformer Semantics a little bit as well, especially on some of the guarantees that can be given on the correctness of Floyd-Hoare triples. Haven't found much on strong asymptotic guarantees though.
What literature do you recommend for the fundamentals on algorithm analysis and formal semantics on languages? I'm a last year compsci student and sadly we don't study semantics or paradigms at my college besides the basics of OOP :')
Thank you for your time!
I’ve combined an Integer Linear Program (ILP)( which is optimal) with a greedy approach for a set covering problem. It worked faster than using only ILP and returned an optimal solution, but I'm unsure if this will consistently be the case. I have tried this on a few examples, and it finds optimal solutions. Any insights?
Code: Colab Link
I know conway's game of life is not reversible, but can any one state be found that generates a given board and has a min number of cells? Or at least close to min as possible?
Given a configuration like this:
0 0 0 0 0 0
0 0 1 1 0 0
0 0 0 1 1 0
0 0 0 0 0 0
A possible state that generates it is:
0 0 0 0 0 0
0 0 1 1 0 0
0 0 0 1 0 0
0 0 0 1 0 0
Another one is:
0 0 0 0 0 0
0 0 1 1 0 0
0 0 0 1 0 0
1 0 0 1 0 0
The first one being better in this case because has a few number of live cells in it.
I basically need to write an algorithm that finds a solution (the best one i can) in at max 300 secs and 8GB of memory and, if possible, faster than that. Right now I tried some forms of simulated annealing with little to no success and am trying to transition into some form of A* but cant find a good enough heuristic
It must be 100% free to upload my paper because my university is fucked up. And please explain to me how the publication procedure works.