/r/algorithms
Computer Science for Computer Scientists
Computer Science for Computer Scientists
Other subreddits you may like:
Does this sidebar need an addition or correction? Tell me here
/r/algorithms
This paper claims to solve the Euclidean Hamiltonian path problem in Polynomial time, with proof. Emperical testing so far is proving it right. What are your thoughts? You can run the code yourself using the given python notebook.
Hey, I am struggling to find introductory material on half approximation problems online. Kindly share resources for the same. Most of the resources are for 2 approximation problems and I cannot start with Vazirani.
Also tell me whether my understanding of it is correct. A half approximation algorithm gives me half of what the optimal algorithm would give and thus it is for maximization problems. Whereas a 2 approximation algorithm gives me twice the size of the solution than the optimal will give so it's for minimization problems.
Thanks.
Here are some tricks and techniques for solving problems, debugging, and optimizing your solutions, especially for dynamic programming (DP) and general algorithmic challenges. Missing these might slow you down or make problems unnecessarily hard.
dp[i][j]
might equal dp[j][i]
.bisect
in Python to efficiently find positions for binary search.sys.setrecursionlimit()
for deep recursions or switch to bottom-up.text1 = ""
, text2 = ""
text1 = "a"
, text2 = "a"
text1 = "abc"
, text2 = "xyz"
I recently encountered a fascinating problem where I had to optimize the timing of traffic signals at a set of crossroads to minimize congestion. The problem involved several intersections, each with traffic flow coming from multiple directions. The objective was to dynamically adjust signal timings to reduce overall wait times and keep traffic moving efficiently.
I found this type of problem fascinating and want to dive deeper into similar problems. Specifically, I'm looking for:
Optimization problems that involve maximizing or minimizing an objective.
Heuristic or randomized problem-solving techniques, especially those used in real-world scenarios.
Any courses, books, websites, or platforms where I can practice these kinds of challenges.
For context, I've explored competitive programming platforms like Codeforces and CodeChef but find they rarely feature these open-ended optimization problems. I’m also aware of contests like Google Hash Code but am looking for additional resources.
Does anyone have recommendations for learning and practicing topics like this?
Hi all,
I'm currently learning about tree data structures, and I'm exploring how AVL trees handle deletion. From my understanding, when a node is deleted, its in-order successor should replace the deleted node.
I was experimenting with the Tree Visualizer tool on https://www.cs.usfca.edu/~galles/visualization/Algorithms.html, and I noticed something odd. Here's the scenario:
In this case, the tool replaces node 0006 with node 0005.
However, shouldn't node 0007 replace 0006 instead? Based on the AVL tree deletion rules I've read, the in-order successor (the smallest node in the right subtree) should take the deleted node's place, and in this case, 0007 seems to fit that criteria.
Am I misunderstanding something about AVL deletion, or is this a bug/misrepresentation in the tool?
Looking forward to insights from the community.
Ok so hear me out:
In binary code, every uneven byte has the digit that represents 1 activated. For example:
01100001=1/A 01100010=2/B 01100011=3/C 01100100=4/D 01100101=5/E
See how the last digit switches on and off every step?
In the Tower of Hanoi, in every uneven move you have to move the smallest disc in order to get the minimum amount of moves. I'm not sure how to give an example of this though, just try it out.
I tried to look up whether someone had seen this before but i didn't get any results.
I might be insane but i think i could be onto something.
reservoir sampling on wikipedia: https://en.wikipedia.org/wiki/Reservoir_sampling
Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. **The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.** The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far.
Why does it say that the population n is not known to the algorithm, but then in the source code it is known?
They literally use the length n to do the loop.
(* S has items to sample, R will contain the result *)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i := 1 to k
R[i] := S[i]
end
// replace elements with gradually decreasing probability
for i := k+1 to n
(* randomInteger(a, b) generates a uniform integer from the inclusive range {a, ..., b} *)
j := randomInteger(1, i)
if j <= k
R[j] := S[i]
end
end
end
Looking for a rigorous proof without assuming distinct edge weights. I attempted using induction, can someone check my work?
My Attempt:
Let G be a connected, undirected, weighted, finite graph with at least 2 vertices. We will show that Kruskal's algorithm produces a minimum spanning tree (MST) of G.
Outline: Use induction to prove the statement: There exists a MST of G containing the first k selected edges (1 <= k <= V-1).
Proof of correctness of Kruskal's algorithm (V >= 2):
Base case (k = 1):
Let M be an MST of G. Let e be the first edge selected by Kruskal’s algorithm. Then e is crossing a cut C such that all other edges crossing C (if any) are unprocessed. This means weight(e) is less than or equal to all other edges crossing C (if any).
Case 1: e is the only smallest edge crossing C Then M contains e by the Edge Exchange Argument.
Case 2: e is one of multiple smallest edges crossing C, then by the Edge Exchange Argument: a) Either M contains e, or b) M does not contain e, in which case M contains an equal-weight edge f crossing C, such that it can be exchanged for e to obtain a new MST.
Therefore there exists a MST of G containing the first selected edge.
I.H.: There exists a MST of G containing all the first k selected edges (1 <= k < V-1).
I.S.: Suppose I.H., WTS there exists a MST of G containing the first k+1 selected edges.
By I.H., there exists a MST of G, call it M, containing the first k selected edges.
Let e be the (k+1)th edge the algorithm selects, namely the (k+1)th edge that do not form a cycle with previously selected edges. Since e does not form a cycle with previous selected edges, e is crossing a cut C such that all other edges crossing C (if any) are unprocessed. This means weight(e) is less than or equal to all other edges crossing C (if any).
Case 1: e is the only smallest edge crossing C Then M contains e by the Edge Exchange Argument.
Case 2: e is one of multiple smallest edges crossing C. By the Edge Exchange Argument: a) Either M contains e, or b) M does not contain e, in which case M contains an equal-weight edge f crossing C, such that it can be exchanged for e to obtain another MST M'. M' contains the first k selected edges and e, as the swapped-out edge f is unprocessed, ensuring it was not among the selected edges.
Therefore there exists a MST of G that the first k+1 selected edges.
Conclusion: By the inductive proof, there exists a MST of G containing the first k selected edges (1 <= k <= V-1). This proves that Kruskal's algorithm produces a minimum spanning tree (MST) of G.
Hey fellow programmers,
I'm preparing for coding interviews and struggling with a specific aspect of problem-solving. When presenting solutions, I'm unsure about the best brute force approach to explain, especially when constraints are involved.For example, consider merging two sorted arrays into one sorted array with no extra space allowed. Two possible brute force approaches come to mind:
I've seen people use the first approach (extra space) as a brute force example, but it technically violates the constraint. Should I:
A) Explain the extra space approach as brute force, acknowledging the constraint violation
B) Choose a different, less efficient brute force method (like O(n^2) insertion sort) that adheres to constraints
C) Other considerations?
What's your take on this? How do you choose the right brute force approach in similar situations?
Given a path between two nodes on a graph (in my case: a 2d grid with only cost-1 or missing edges), I would like to have an indication of how efficient / how close that path might be to a shortest path. This has to work without knowing the shortest path.
Background: I obtain this path from some approximate shortest path algorithm and would like to decide whether its worth calculating the actual shortest path. Bonus points if you can provide a way to improve the non-optimal path in a computationally cheap way.
One obvious heuristic would be to compare the paths length to the direct path (eg on the grid that would be the manhattan distance between the two points). But this doesnt take into account the graph structure at all.
Any pointers?
Suppose that the background is black, objects in it are white, and objects cannot have a hole inside them, I think the best way is DFS, as described here: My DFS Solution
Hi everyone,
I’m working on a problem where I need to identify the critical node in a flow network (imagine a "brain" simulation) composed of three types of nodes: initiators, calculators, and executors. The goal is to find the best calculator node to block in order to maximize the reduction in messages that reach the executor nodes.
The brute-force approach would be to compute the maximum flow, remove each calculator node individually, and recalculate the flow to see the impact. This means calculating the maximum flow for each node, which is computationally intense and impractical, given the network can contain up to 10510^5105 nodes. I’ve been struggling to find a way to reduce the complexity here, as my current methods just don’t scale.
I feel there might be a way to avoid recalculating the maximum flow for each node individually, perhaps by using dynamic programming or memoization to save some partial results, but I’m not sure how to apply it here without compromising accuracy.
Does anyone have experience with similar network flow problems, or can suggest an optimized approach or algorithm?
Thanks in advance for any insights!
I know that first part has used kmp to find lsp and I also thought of using kmp when I saw this question but couldn't think how to use KMP to solve it.After reading this I can get how it's doing it but it's like it's not something that I can think my myself like finding lsp then checking if n-len(lsp) divides n if yes return true else false..how can I think of this part by myself and within the time limit?
Hello everyone! I have an Introduction to algorithm design exam tomorrow and I cannot understand backward substitution. Is there anyone who get this concept clearly. NOTE: I looked for Abdul Bari's video but he is solving very basic problems. My teacher uses it in a different way I guess.
Hello everyone,
I’m a student researching renewable energy, specifically working on a system to convert mechanical energy from ocean waves into electricity. My objective is to design a system that maintains a tangential alignment with the ocean wave surface at all times (meaning the angle between the system and the wave surface is always zero). This alignment should optimize energy transfer.
I’m looking for advice on the best way to determine the ideal shape for this system. One idea I have is to create a Python program that simulates different shapes and tests how well they maintain a tangential alignment with waves in a simulated environment. Additionally, I’d like to explore if I could use AI to automatically generate and test shapes, ultimately helping me find the most effective design. While I have some Python experience, so any guidance would be helpful.
Thank you for your help!
I'm having a hard time understanding the amortized analysis for insertion of a value in a dynamic doubling array. Doubling array is an array of size k that grows 2k each time a new value is being inserted into an array of k capacity and k values already exist inside that array.
I know that phi = number of values - free space left
Please help me understand why and how to gain the intuition for that kind of analysis.
i'm new to the subject and just got stuck on this test question:
"If all heuristic values of every node in a graph equal the minimum total cost to the goal node then Heuristic Depth-first Search and Best-first Search behave the same way." True or False?
could you explain why it's true/false? i'm a bit confused about the "minimum total cost" part.
I recently saw I question that uses kmp to solve it in O(n) TC but the problem is that algo like KMP are not something very intuitive.So how do you remember them?
Hi! Let me know if this post is OK :)
Summary: Working on an encryption based on using a number to seed keystream generation from physical objects.
The Problem: You have a number C that is a concatenation of all whole numbers [1, N] randomly ordered. Develop a process for deconcatenating any C such that there is exactly 1 possible order of [1, N].
Intro Example: N = 12, a possible C = 123456789101112. We need a way to know if it begins with 1, 2 or with 12, but the same process should work for any mix of C and higher N
Deeper Example: If N = 21, C could = 121212345678910111314151617181920 so the beginning could be {1, 21, 2, 12} or {12, 1, 21, 2} etc
Notes: For someone who intercepts C with no context at all, it should not be immediately apparent what N is, or even than N would be important. The recipient knows N and should be able to reliably decipher the randomized order of [1, N] using only C and N, ideally for N<100 on pencil & paper.
Other approach: We could constrain the random ordering -> concatenation process such that a simple deconcatenation process removes ambiguity only if those constraints would not make N obvious from C or require N to be smaller than ~50.
I wrote a demonstration in JS on how to make sure a fixed number of rare events is found in a limited and fixed number of trials.
Then I analysed the outcome and discussed the pros and cons, in the setting of a trading card game, of using a raffle system instead.
I would like to know if there are well establish methods for this purpose that I wasn't able to find.
Thanks.
The input is a full rank n by n binary matrix. You have one type of operation which is to add one row to another mod 2. The goal is to find the minimum number of operations to transform the matrix into a permutation matrix. That is with exactly one 1 in each column and each row.
It doesn't seem a million miles from https://cstheory.stackexchange.com/questions/10396/0-1-vector-xor-problem so I was wondering if it was np-hard.
Been trying to get a deeper understanding of how the DFT and FFT work. So far I've watched several videos on this (Reducible's videos on the DFT and FFT, 3Blue1Brown's Fourier Transform, Steve Brunton's Fourier Analysis playlist, Erik Demaine's Divide & Conquer FFT) as well as worked out various examples and coded my own implementations of each. I understand how the DFT is a matrix multiplication of complex exponentials with a signal, how the roots of unity are used to reduce the amount of computations to evaluate a polynomial, how the FFT is an exact computation of the DFT and so on. The only part I don't quite get is why in the divide and conquer step of the FFT the signal is divided based on even and odd indices instead of simply in half. This is a step that seems to be glossed over most of the times and simply assumed to work. It makes sense viewing it from the perspective of solving polynomial multiplication and evaluation (as explained in this previous Reddit post and the previous videos) since we use the roots of unity to reduce the amount of evaluations due to its periodic nature of said roots. However, the FFT is used for more than polynomial multiplication/evaluation so I assume there is another way to explain this particular way of splitting the signal. That's why I've been trying to figure out another reason outside of that as to why the FFT algorithm does not work when splitting it down the middle.
Unfortunately I have not been able to come up with any. Based on reading the Wikipedia articles and some other sources (including an explanation from ChatGPT and chapter 30 from Intro to Algorithms by CLRS), I have a general idea of why it does not work but cannot quite pin down the specifics without relying on polynomial multiplication. It seems like splitting a signal down the middle changes the phase shift in a way that prevents both halves from being recombined properly or alters some of the data of the original signal leading to wrong results. Visually, I can kind of see why it is better to split along even/odd indices (intuition tells me half the data points through the same range of values may keep the original frequency and shape better than half the points in half the range) but mathematically it still doesn't click despite being able to follow it. What am I missing? How can you explain without using the problem of polynomial multiplication/evaluation that splitting down the middle doesn't work but splitting by even & odd indices does? Or is my original assumption wrong so the FFT cannot be explained without using polynomial multiplication/evaluation? Perhaps the FFT and DFT always carry out some sort of polynomial multiplication/evaluation which is why it cannot be explained any othe way. Any tips or explanations would be appreciated.
Hi everyone,
I’m a second-year computer science student, looking for advice on selecting a topic for my seminar paper, which focuses on algorithms. Honestly, I’m feeling pretty desperate at this point to find a suitable topic. My professor has already rejected a number of topics (listed below) because they were either too simple or already taken, so I’d love your input or suggestions based on the following requirements.
This is the only instructions we got on the paper:
- a) A description of the data structure or algorithm being analyzed
- b) An implementation of this data structure or algorithm in a standard programming language
- c) A detailed complexity analysis for algorithm-based topics
- d) Detailed operation complexity analysis for data structure topics, including amortized complexity
I don’t have much experience with these topics, so I’d prefer something not overly complicated. Ideally, I’m looking for a topic with plenty of publications and resources available to make it easier to research and write.
"Non-Rejected" topics (with his comments)
With the professor’s comments, I’m starting to feel like it might be better to skip these remaining topics altogether and look for something new.
Rejected Topics:
Note: The professor mentioned not to send any more sorting algorithms—unless they're exceptionally interesting. In that case, there might still be a chance.
If you have any experience with these algorithms or recommendations that could fit the requirements, I’d appreciate your insights. Thank you so so much, sorry for the long post!
Extending the cuGraph RAPIDS library for GPU, NVIDIA has recently launched the cuGraph backend for NetworkX (nx-cugraph), enabling GPUs for NetworkX with zero code change and achieving acceleration up to 500x for NetworkX CPU implementation. Talking about some salient features of the cuGraph backend for NetworkX:
You can try the cuGraph backend for NetworkX on Google Colab as well. Checkout this beginner-friendly notebook for more details and some examples:
Google Colab Notebook: https://nvda.ws/networkx-cugraph-c
NVIDIA Official Blog: https://nvda.ws/4e3sKRx
YouTube demo: https://www.youtube.com/watch?v=FBxAIoH49Xc
I’m starting to see algorithms in my college and I was wondering: are there more than one way to build an algorithm and if you can always improve it? And, what should I be looking for when learning about an algorithm, Insertion sort for example? Should I be able to look at it and say what it does, should I be able to calculate how many times the while loop is executed?
I've been practicing dynamic programming problems for a few weeks now.
On some simple cases that have *a lot of* similarities with problems I've already encountered (basically same problem, just presented differently) I can find the solution on my own. I understand how recursion works, I understand memoization, tabulation, I understand how, in some cases, reduce the space complexity of my tabulated problems...
I know how I'm supposed to get started, when presented with a new problem, trying to represent the problem as a graph, trying a naive sub-optimal implementation before moving to optimization etc.
I fee like from a theoretical standpoint, I've got it.
Yet every single time I discover a new problem, I am stuck. I spend hours implementing a super lengthy solution that half-works (when I'm lucky...), and after a while, I give up, look up the solution and realize I don't understand it fully. I break it down into pieces to understand what's going on, and I end up figuring it out.
But then I realize: there's just now freaking way I could have come up with that solution!
The last example to date was with the CountConformingBitmasks problem from Codility exercises... And it's supposed to be a "medium difficulty problem"...
I don't understand what I'm missing. Maybe my brain just isn't wired for this..?
Does it really get easier? Is there really a way to extract some rules out of these problems, to improve over time, or should I just learn all the solutions to all these problems by heart..?
I want to sort an array of words alphabetically and output them grouped in a table with X columns. All columns should be the same length, except for the rightmost column, which serves as a balancer. The rightmost column may be shorter, but not longer.
I am still in the concept phase before I program this and the letter headings in between are causing me problems. For visual reasons, these headings must not be at the very beginning and end of a column. If I want to calculate the maximum height of all columns, I don't yet know where these letters will be. They are deleted at the beginning of the columns because the column heading already exists. And they must not be at the end either, as in the example with 'S'. In this case, I would have to increase the total height of the columns by 1 and rebuild the table, but then it can happen that a bold letter is at the end of other columns and in the worst case I end up in an endless loop.
Can this be solved at all?
A-De | Do-G | H-K | L-Oc | Or-R |
---|---|---|---|---|
Ant | Dog | Hat | Lamp | Orange |
Apple | E | I | Lion | P |
B | Egg | Ice | M | Penguin |
Ball | Elephant | Island | Monkey | Piano |
Bridge | F | J | Mountain | Q |
C | Fish | Jack | N | Question |
Car | Flower | Juice | Nest | R |
Cat | G | K | Nose | Rabbit |
D | Garden | Kite | O | Rose |
Desk | Goat | King | Ocean | S (ERROR) |
Test-Array:
words = [ "Ant", "Apple", "Ball", "Bridge", "Car", "Cat", "Desk", "Dog", "Egg", "Elephant", "Fish", "Flower", "Garden", "Goat", "Hat", "Ice", "Island", "Jack", "Juice", "Kite", "King", "Lamp", "Lion", "Monkey", "Mountain", "Nest", "Nose", "Ocean", "Orange", "Penguin", "Piano", "Question", "Rabbit", "Rose", "Snake", "Sun", "Tiger", "Tree", "Umbrella", "Van", "Victory", "Water", "Whale", "Xylophone", "Yellow", "Yard" "Zoo"];