/r/algorithms

Computer Science for Computer Scientists

Note: this subreddit is not for homework advice. Requests for assistance with coursework may be removed.

112,292 Subscribers

1

What is the point of proof of correctness of NP-completeness?

In most of the problems I am tasked with proving that a problem A is NP-complete. I show that B is in NP, then I reduce the NP-hard problem A to B. Then I am required to prove that a yes instance in B is a yes instance in A. But it also says that I need to prove that a yes instance in A will be a yes instance in B. This is a bit confusing because isn't it basically the same thing from another angle?

I also got this understanding that all yes instances in A will not be yes instances in B. Given that the reduction is from B to A, all yes instance inputs of A won't even be defined for B unless I also reduce A to B. What am I supposed to do when asked to prove that yes in A -> yes in B?
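
For reference, with a reduction f from the known NP-hard problem A to the problem B being shown NP-complete, the standard correctness requirement is an if-and-only-if, which is why both directions have to be argued:

\[
x \in A_{\text{yes}} \iff f(x) \in B_{\text{yes}},
\qquad\text{i.e.}\qquad
\bigl(x \in A_{\text{yes}} \Rightarrow f(x) \in B_{\text{yes}}\bigr)
\ \text{and}\
\bigl(f(x) \in B_{\text{yes}} \Rightarrow x \in A_{\text{yes}}\bigr).
\]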

3 Comments
2024/12/03
01:02 UTC

1

Merge Sort in a sports context - problem context, constraints, and an attempt at a solution

I am not at all a specialist in sorting algorithms, so I am wondering if there is some gold standard solution for this very specific case, where the constraints are not the usual ones. I am going to present the problem context, its constraints, and an attempt at a solution. I would appreciate any feedback, both positive and negative.

The problem context:

  • 1: There is a sporting competition, where the entrants are club teams from various countries.

  • 2: The federations of all countries with entered club teams have their own intra-national club rankings.

  • 3: This initial sorting, based on the match results in the initial rounds, should result in an initial cross-national ranking which is then used for the subsequent rounds. We do not have to concern ourselves with those subsequent rounds, that is a matter for another day. Also, that is a far easier problem.

  • 4: In each round, n matches are played. The total number of entered teams is significantly higher than 2*n. Each match is played on exactly one pitch/court.

The constraints:

  • 1: The sorting algorithm must be explainable to non-mathematicians.
  • 2: The sorting algorithm must be acceptable by non-mathematicians.
  • 3: The sorting algorithm must be understandable by non-mathematicians.
  • 4: The initial intra-national rankings must be treated as gospel. If team A is ranked better than team B in the intra-national ranking, this must also be the case in the initial sorting.
  • 5: All matches that are played in the initial ranking, and thus count for the initial sorting, must be played between teams from different countries. The teams do not want to travel internationally just to start out by playing their neighbors.
  • 6: All matches end with a win for one team, and a loss for the other. Tiebreakers are used if necessary to achieve this.
  • 7: All inputs are in the form of A>B, or A<B. Point differentials, or anything else than win/loss data, are not used. This is due to a hard demand that runaway results should not skew overall rankings, and to keep things simple.
  • 8: The intra-national ranking systems are not comparable to each other. They have been constructed by the individual national federations in an ad-hoc fashion. It is not possible to normalize the various intra-national ranking systems, so a team having X points in one country's system says nothing about how good it is compared to a team with X points in another country's system. It is furthermore not possible, politically, to start an overall normalization program intended to create normalized ranking systems in the future.
  • 9: No team will play more than one match in any one round.
  • 10: There can be multiple initial rounds played in order to achieve the initial sorting.
  • 11: It is desired to avoid matchups which realistically will result in blowouts, as much as possible.
  • 12: The competition leadership should have a limited input on which teams are pitted against each other. This is due to a desire to avoid the possibility of corruption.
  • 13: The initial sorting should be reached in as few rounds as possible.
  • 14: If several competitions are run concurrently (men's and women's events, for example) it should be possible to change the number of pitches/courts assigned to one competition from round to round without breaking the whole competition structure.
  • 15: If there are teams from two countries with wildly differing overall capabilities present, the competition structure should not entail a lot of unnecessary matches.
  • 16: If team A has won over team B in the initial rounds, then team A must be ranked better than team B in the overall initial sorting.

Given the constraint list above, the following is what I have come up with:

  • 1: All teams from country A and all from country B are initially assigned to a merge-sort which produces an initially sorted list, which is the ranking of the synthetic country AB. Likewise with countries C, D, and so on. The competition leadership assigns the countries to those pair-ups. The merge-sort is done so that it fulfills all constraints above (a bare-bones sketch of the underlying merge step follows this list).
  • 2: Once the rankings of the synthetic countries AB, CD and so on have been created, the competition leadership assigns them into pairs, which results in rankings of the synthetic countries ABCD, EFGH, and so on. This is done iteratively until we have an overall ranking which contains all teams from all countries.
  • 3: Each pairing is done so that the teams from country A are listed in the left column, in the order of their intra-national ranking. The teams from country B are listed in the right column, likewise ordered according to their intra-national ranking.
  • 4: In round #1, Team A1 selects an opponent among the teams from country B. If team A1 wins that match, they can only select opponents that were initially ranked higher than their initial opponent, and vice versa. Once team A1 has won a match, teams from country B which initially were ranked lower than the team that A1 won over cannot subsequently select A1 as an opponent in later rounds.
  • 5: In round #1, the team from B that was initially ranked just below the team selected by A1 selects an A team for its opponent in the first round. This team must be ranked lower than A1, which at this stage is not a limitation.
  • 6: In round #1, the lowest ranked team from B chooses an opponent from A which is lower than the A team chosen in step#5.
  • 7: In round #1, the A team which initially is ranked just above the A team chosen in step #6 selects a B team as its opponent. This team must be ranked below the B team in step#5, and also above the B team in step #6.
  • 8: Steps 4-7 are repeated, with the constraints that no team can select an opponent if there is any possible match outcome which would lead to a forbidden outcome – two teams from the same country having an initial sorting which does not coincide with their intra-national ranking. This means that subsequently created matchups after those from steps #4-7 involve teams closer and closer to the middle of the intra-national rankings of their respective countries.
  • 9: Matchups are created until all allotted pitches/courts have been used, or no more matchups can be created that do not break the criterion outlined in #8 above.
  • 10: The match results from round #1 are used to create the first iteration of the ranking for the synthetic nation AB. This ranking consists of three parts: one or more teams from one country that are at the top of the ranking and are done, one or more teams from one country that are at the bottom of the ranking and are done, and the remainder in the middle. Example: If team A1 selects B1 as its opponent and then wins that match, then A1 is at the top of the ranking of the synthetic nation AB and will remain so, no matter what the results of the subsequent A-B matches. No A teams can select A1 as an opponent, and since B1 lost against A1, neither can teams B2-Blast. Should any other B team play a match against A1 and then win, that would require that that B team is placed better than A1, which must be placed better than B1 – which would lead to a conflict among the B teams. Example: If A1 selects B3 as its opponent and then loses, then teams B1, B2, and B3 are at the top of the ranking of the synthetic nation AB.
  • 11: Steps #4-10 are repeated for round #2, but only the remainder teams in the middle of the ranking are eligible for matchups.
  • 12: Steps #4-11 are repeated for rounds #3 and beyond, until there are no more remainder teams. At that stage, we have a complete ranking for the synthetic nation AB.
  • 13: Steps #4-12 are repeated for the synthetic nations AB and CD, until a complete ranking of the synthetic nation ABCD is created.
  • 14: Steps #4-13 are repeated until there is an overall ranking for all teams, A through Z, which is the initial sorting mentioned in point #3 of the problem context.
  • 15: The initial sorting is then used for the latter parts of the competition. In those latter parts, the prohibition against matchups featuring two teams from the same country is removed. Teams with similar rankings from the initial sorting are divided into poules. All possible matchups between any two teams in the same poule that have not yet been played are done, and all the match results featuring two teams in the same poule are used to create the overall ranking in that poule, according to a round-robin system.
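
For reference, a bare-bones sketch of the plain merge step the scheme above is built around, with a hypothetical plays(x, y) callback standing in for "these two teams play a match"; the round batching and opponent-selection rules from steps #4-9 are not modeled:

def merge_rankings(a, b, plays):
    """Merge two intra-national rankings (best first) into one list,
    preserving each country's internal order and using match results
    as the comparator. plays(x, y) must return True if x beats y."""
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if plays(a[i], b[j]):          # a[i] beats b[j]
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])               # leftovers keep their internal order
    merged.extend(b[j:])
    return merged

Every comparison is between teams from different countries (constraint #5), a winner is always placed above the loser (constraint #16), and the intra-national orders of the two input lists are never violated (constraint #4).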

After all of this, let me give a couple of examples which will hopefully make the whole thing clearer.

Let us, for the sake of the example, assume that we have a floorball competition. Assume that we have ten teams each from Sweden, Finland, USA, and Canada, and also lesser numbers of teams from other countries. Assume that we have ten floorball courts available. The choice of floorball as the example sport is intentional, for reasons which will hopefully become apparent soon.

Assume that some of you are tasked with creating an overall ranking which fulfills all the listed constraints. You are – unless you come from a small number of countries, not including USA, Canada, and most of the rest of the world – well versed in sorting algorithm usage, selection and optimization, but completely ignorant of the specifics of floorball.

If you select Sweden and Finland to play in the beginning, and match them up so that court #1 will feature the match between SWE1 – FIN1, court #2 having SWE2 – FIN2, and so on, you will have created a set of matches that will overall be a fairly good set of matches, and that without knowing anything about floorball. Likewise if you create a set of USA – CAN matchups. Starting from those results, anyone with a reasonable knowledge of sorting algorithms would reach the desired initial sorting of those synthetic nations in short order, even without any knowledge of floorball.

However, the same idea would break down – massively – if you allotted all ten courts to matchups featuring teams from one side of the Atlantic versus the other. You would get ten blowouts, and waste a lot of time and court space on getting information that anyone knowledgeable about floorball could have told you beforehand.

A quicker way to arrive at a transatlantic ranking would be to pit SWE10 against USA1 (or, for that matter, CAN1) and watch the carnage on the court when the stars & stripes get absolutely shellacked by the also-rans of the big blond machine. Yes, there would be a blowout, but only one game, and then we would have an initial sorting of the synthetic nation of Greater Minnesota which looks like this: SWE1---SWE10-USA1---USA10.

(As an aside: There have been several matches featuring Sweden and USA national teams, in both genders and for both age categories. USA has never won a match. USA has never reached a tied result. USA has never lost a nailbiter. USA has never lost by merely clear and convincing numbers. Every single match has ended in an absolute slamdanger, with blue&yellow on top. USA would not have a realistic chance of winning a game featuring USA 20+ age category players against SWE juniors, provided that both countries play with teams of the same gender. Testosterone is one h-ll of a drug, so a game featuring USA men versus SWE women is not a foregone conclusion. However, your men would have their hands absolutely full against our women, and I would hold our team as the slight favorite. We are that dominant. End of aside.)

However, that facile matchup, even if it results in a quick sorting, is not acceptable. People would be livid about the competition leadership creating matchups in which a planeload of players end up not playing a single match in the beginning, without them having any say in the matter. So that is a non-starter with regard to stakeholder acceptance.

If one instead transfers the decision power regarding which teams play against each other to the respective team captains, then one bypasses that problem. It is more difficult to accuse someone else of corrupt choices, if you yourself are making said choices. Foist the decision on the team captain for USA 1 team, and no one else is responsible.

That, and the other constraints/criteria listed above, is why I came up with the system described above. Has anyone seen this set of constraints/criteria (or something similar) before? Do you see any faults that I have overlooked?

A related optimization problem: Assuming that the mergesort-adjacent idea outlined above is not fatally flawed, what is the best way to pair up countries? If one does (USA/CAN)-(SWE/FIN) one will have two mergings in the beginning that will require a bit of match resources, but in the end one will have two larger lists which will be quickly sorted into the final list. In either of the two other possible ways to pair those countries up, one will start out with very little resources used to get the two larger lists, but then there will be more work to get the final merging right.

Any idea on what is the right approach, and how one would find that right approach (or at least one that is not especially bad) in the more general case? Assume that the person doing that deciding has good knowledge of national team results – which are indicative of club team results – but no useful data on club team performances against teams outside of their country aside from the very top teams.

0 Comments
2024/12/02
18:46 UTC

0

Need help with this Bresenham line drawing algorithm.

0 Comments
2024/12/02
15:41 UTC

1

NP problem proof checking

Given the 3D bin packing problem (a special case where the box is a cube, everything you want to fill it with are cuboids, and there should be no empty space once you are done). The input is the side length of the cube box and the dimensions of the cuboids.

I decided to reduce subset sum to this bin packing problem. I make all the cuboids have the same sides except for the width, which is given by the numbers from the subset sum instance. So if I can get a subset summing to the target t, then I can fill the cube with those cuboids. To sum up, each cuboid's sides will be: subset element, target, target. Does it sound logical or am I missing something?
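
Written out, the construction (assuming every number is positive, each a_i <= t, and placements are axis-aligned): for a subset-sum instance with numbers a_1, ..., a_n and target t, take a cube of side t and one cuboid of size a_i x t x t per number. Each cuboid spans the full cube in two axes, so an exact, gap-free filling is a stack along the remaining axis:

\[
\text{the cube of side } t \text{ can be filled with no empty space}
\iff
\exists\, S \subseteq \{1,\dots,n\}:\ \sum_{i \in S} a_i = t .
\]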

4 Comments
2024/12/02
00:50 UTC

3

removing near-duplicate lists

Let's define a playlist as a collection of song IDs: [20493, 19840, 57438, 38572, 09281]. Order doesn't matter. All playlists are the same length.

Playlists are considered near-duplicates if, for some minimum distance D, they differ by less than D elements. For the example above, [20493, 19840, 57438, 47658, 18392] is a near-duplicate for minimum distance 3, since the two playlists differ by two elements (the last two songs). All playlists are the same length, so the near-duplicate relationship is reciprocal: if A is a near-duplicate of B, then B is a near-duplicate of A.

The playlists are sorted by some metric (let's say popularity) and I want to remove all near-duplicates, leaving a list that is still sorted by popularity but that has playlists that are meaningfully different from each other.

Right now I'm just doing a quadratic algorithm where I compare each new playlist to all of the non-near-duplicate playlists that came before it. Sorting the song IDs ahead of time lets you do a fairly efficient linear comparison of any two playlists, but that part of the algorithm isn't the problem anyway, since playlists tend to be short (<10 songs). The issue is that as the number of playlists gets very high (tens of thousands), the quadratic part is killing the performance.

However, I don't see an easy way to avoid comparing to all previous non-near-duplicates. Order doesn't matter, so I don't think a trie would help. The song IDs have no relation to each other, so I don't see how I could use something like an r-tree to do a nearest neighbors search.

My other idea was to store a map from song ID -> collection of playlists that contain it. Then for each new playlist, I could maintain a temporary map from candidate near-duplicate playlists to the number of overlapping songs. As I pass through the new playlist's song IDs, I'd look up the playlists that contain each one, adjust my temporary map, and then eventually be able to tell if the new playlist is a near-duplicate of a previous one. Does that seem reasonable? Are there other approaches?
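
A minimal sketch of that inverted-index idea (assuming playlists arrive already sorted by popularity and near-duplicates are simply dropped):

from collections import defaultdict

def dedupe_playlists(playlists, min_distance):
    """Keep playlists in the given (popularity) order, dropping any playlist
    that is a near-duplicate of an already kept one. Two equal-length
    playlists differing in fewer than min_distance songs are near-duplicates,
    i.e. they share more than len(playlist) - min_distance songs."""
    song_to_kept = defaultdict(list)        # song ID -> indices into kept
    kept = []
    for pl in playlists:
        songs = set(pl)
        overlap = defaultdict(int)          # kept index -> number of shared songs
        for song in songs:
            for idx in song_to_kept[song]:
                overlap[idx] += 1
        threshold = len(pl) - min_distance  # sharing more than this => near-dup
        if any(count > threshold for count in overlap.values()):
            continue
        idx = len(kept)
        kept.append(pl)
        for song in songs:
            song_to_kept[song].append(idx)
    return kept

This only touches the kept playlists that share at least one song with the new one, so with ~3k distinct songs and short playlists it should stay well below the all-pairs comparison in practice.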

# of distinct songs: ~3k

# of playlists: up to 100k

# of songs per playlist: up to 15

12 Comments
2024/12/01
22:30 UTC

1

Time complexity of graph algorithm

0 Comments
2024/11/30
11:33 UTC

0

Consolidated list of similar problems of all patterns in LeetCode, Check it out!

Consolidated list of similar problems of all patterns in LeetCode, Check it out! https://grid47.xyz/sheets/

0 Comments
2024/11/30
02:43 UTC

16

Which algorithms/data structures that are taught in degrees should you never use in practical coding on real problems?

I know Fibonacci trees is one. What else?

30 Comments
2024/11/28
18:34 UTC

1

Bellman ford optimization

So I recently came across the Bellman-Ford shortest path algorithm.

I visited some online blogs, where they say,

First relax the edges v - 1 times and then check for the negative cycle by doing this one more time.

But if the updates stop in an earlier iteration, shouldn't we just return from there? Or is there a catch?
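
A minimal sketch of the early-exit variant in question (not from the blogs): if a full pass relaxes nothing, the distances can no longer change, so the remaining passes and the negative-cycle check can be skipped.

def bellman_ford(n, edges, source):
    """Bellman-Ford over vertices 0..n-1 with edges as (u, v, w) triples.
    Returns (dist, has_reachable_negative_cycle)."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0

    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:           # no edge relaxed: distances are final,
            return dist, False    # so no further pass can change anything

    # One extra pass: any relaxation now means a reachable negative cycle.
    for u, v, w in edges:
        if dist[u] != INF and dist[u] + w < dist[v]:
            return dist, True
    return dist, False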

2 Comments
2024/11/27
17:45 UTC

1

Genomic data compressor - using RLE and Huffman combined (modified Huffman coding)

Hello everyone! I am currently making a project to compress genomic data using various algorithms and then compare their compression metrics. I have implemented RLE and Huffman coding in my project and am looking to also add a combination: RLE first to compress the data, then Huffman on the already encoded data. I have even already implemented it, but my implementation treats every symbol in the RLE output as a single symbol, with no interdependencies. By this I mean that if the original string is 'AAAAAAAAAAAAAATTTTGGCCCCA', using RLE it becomes 'A14T4G2C4A1'. Then when I use Huffman on it, I count the frequencies as '4' - 3, 'A' - 2, '1' - 2, etc., and create the nodes and tree accordingly. However, I saw online that there is also the option of using the symbol-number pair as one "symbol" and encoding accordingly (meaning A14 is one symbol, T4 is another, etc.). In my mind this doesn't make any sense, as the frequency distribution will always be even. Could someone explain to me if my approach is correct or how to improve it in some way?
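
To make the two tokenizations concrete, a small sketch of the frequency counting for both options on the example string (the actual Huffman tree building is left out):

from collections import Counter
from itertools import groupby

def rle(seq):
    """Run-length encode: 'AAAATT' -> [('A', 4), ('T', 2)]."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(seq)]

pairs = rle("AAAAAAAAAAAAAATTTTGGCCCCA")

# Option 1 (what the post does): flatten to 'A14T4G2C4A1' and treat every
# character of that string as a Huffman symbol.
flat = "".join(f"{ch}{count}" for ch, count in pairs)
char_freqs = Counter(flat)     # e.g. '4': 3, 'A': 2, '1': 2, ...

# Option 2: treat each (base, run length) pair as one Huffman symbol.
pair_freqs = Counter(pairs)    # here every pair occurs exactly once, hence the
                               # "flat" distribution the post worries about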

0 Comments
2024/11/27
09:57 UTC

4

What does it mean when f sometimes goes down in A* search?

I have an admissible heuristic but notice that sometimes f decreases and then later increases when it is running. That is, when I pop from the priority queue, sometimes f is smaller than a value that was popped before, and then later on it is larger again.

How is this possible or must it be a bug in my code?

7 Comments
2024/11/26
11:44 UTC

0

DFS recursive backtracking issue

I'm trying to write some algorithms for learning and I'm hitting a brick wall on my recursive DFS backtracking.
This is what my code looks like, and its output leaves these cells unvisited and I don't know why: 'Unvisited cells: [(2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]'

It works pretty well for almost all other seeds and all other grid sizes that I tested, so I'm not really sure why this is breaking; any insight would be greatly appreciated.

import random

# Cell and validate_maze_step are assumed to be defined elsewhere in the
# poster's project (not shown): Cell(x, y) holds .x, .y, .visited and a
# .walls dict keyed by "top"/"bot"/"left"/"right"; validate_maze_step()
# is a debugging hook.

WIDTH = 5
HEIGHT = 5
DIRECTIONS = [
    ("top", -1, 0),
    ("bot", 1, 0),
    ("left", 0, -1),
    ("right", 0, 1),
]
OPPOSITES = {"top": "bot", "bot": "top", "left": "right", "right": "left"}
random.seed(69)
grid = [[Cell(x, y) for y in range(WIDTH)] for x in range(HEIGHT)]


def generate_maze(cell: Cell):
    grid[cell.x][cell.y].visited = True
    print(f"Visiting cell: ({cell.x}, {cell.y})")
    validate_maze_step()

    random.shuffle(DIRECTIONS)

    for direction, x, y in DIRECTIONS:
        new_x, new_y = cell.x + x, cell.y + y
        if new_x < 0 or new_x >= HEIGHT or new_y < 0 or new_y >= WIDTH:
            print(f"Neighbor ({new_x}, {new_y}) out of bounds")
            continue

        neighbor = grid[new_x][new_y]
        # If neighbor is unvisited, remove walls and recurse
        if not neighbor.visited:
            cell.walls[direction] = False
            neighbor.walls[OPPOSITES[direction]] = False
            generate_maze(neighbor)
1 Comment
2024/11/25
23:20 UTC

0

Alternate Sorting Algorithm?

Let's say you were to sort an array of natural numbers. Why not just find the smallest and biggest numbers, generate an already sorted array spanning that range, then pick out the extras?

For example, let's say you have the array [1,4,6,5,2,8,10]. You find its min=1 and max=10, then generate [1,2,3,4,5,6,7,8,9,10]. Then you check whether each element is an extra. For example, 1 is OK, 2 is OK, 3 isn't, so you crop out 3. Thus you throw out 3, 7, 9, resulting in [1,2,4,5,6,8,10], which is indeed sorted.

I think there must be something seriously wrong with my method but I just really couldn't figure it out

PS: Let's just assume that numbers are evenly distributed, so arrays such as [1,222,45678,29] won't have to be sorted.
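
A minimal sketch of the method as described (assuming the values are distinct), which is essentially a counting sort over the value range:

def range_filter_sort(arr):
    """Generate min..max and keep only the values present in the input.
    Runs in O(n + (max - min)) time and space."""
    present = set(arr)
    lo, hi = min(arr), max(arr)
    return [x for x in range(lo, hi + 1) if x in present]

print(range_filter_sort([1, 4, 6, 5, 2, 8, 10]))  # [1, 2, 4, 5, 6, 8, 10]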

5 Comments
2024/11/25
15:51 UTC

7

Minimum swaps to transform order of list, if the list contains duplicates?

Say permutation A = [D,K,T,W,Y,Y,K,K,K] and permutation B = [T,Y,K,K,K,W,D,K,Y]. How can we determine the minimum number of letter-pair swaps to transform permutation A into permutation B?

Is it different if the list contains no duplicates?

Struggling to understand this & would appreciate any insight pls
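
For the duplicate-free case there is a standard answer via cycle decomposition (minimum swaps = n minus the number of cycles); with duplicates you additionally get to choose which copy of a letter goes to which position, which is what makes it harder. A minimal sketch of the distinct case:

def min_swaps_distinct(a, b):
    """Minimum pairwise swaps turning list a into list b,
    assuming all elements are distinct."""
    pos_in_b = {v: i for i, v in enumerate(b)}
    target = [pos_in_b[v] for v in a]      # where each position of a must end up
    seen = [False] * len(a)
    swaps = 0
    for i in range(len(a)):
        length = 0
        j = i
        while not seen[j]:                 # walk one cycle of the permutation
            seen[j] = True
            j = target[j]
            length += 1
        if length:
            swaps += length - 1            # a cycle of length L costs L - 1 swaps
    return swaps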

10 Comments
2024/11/24
18:07 UTC

18

How can I find the closest word in a dictionary to a misspelled word given by input?

The closest thing I found on the internet to what I want to achieve is to store the dictionary in a BK-tree and then create a list of candidates (i.e. similar words) given a tolerance.

The tolerance is the maximum distance a word from the dictionary can have to the misspelled word in order to be one of the candidates, and it basically is what makes the BK-tree useful since a lower tolerance means more branches are pruned.

My goal though is to only find the closest word of all with virtually infinite tolerance. My initial idea was to use the "native" lookup algorithm multiple times, starting from tolerance = 0 and increasing it until at least one candidate was found, but that's probably bad because I'd end up visiting the same nodes multiple times.

Does anyone have any idea how to fix this? I know there's probably a hilariously straightforward solution to this but I'm a noob. Thanks! :)

P.S.: By distance I mean Levenshtein's edit-distance.
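
One common approach, sketched under the assumption that only the single closest word is needed: keep a best-so-far distance and use it as a shrinking tolerance while walking the tree, pruning with the triangle inequality so each node is visited at most once.

def levenshtein(a, b):
    """Classic O(len(a) * len(b)) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

class BKNode:
    def __init__(self, word):
        self.word = word
        self.children = {}          # edit distance -> child BKNode

def bk_insert(root, word):
    d = levenshtein(word, root.word)
    if d in root.children:
        bk_insert(root.children[d], word)
    else:
        root.children[d] = BKNode(word)

def bk_nearest(root, query):
    """Return (closest word, distance), visiting each node at most once."""
    best_word, best_dist = None, float("inf")
    stack = [root]
    while stack:
        node = stack.pop()
        d = levenshtein(query, node.word)
        if d < best_dist:
            best_word, best_dist = node.word, d
        for edge, child in node.children.items():
            # Every word under this child is at distance `edge` from node.word,
            # so by the triangle inequality it is at least |d - edge| from the
            # query; prune the subtree when that already exceeds best_dist.
            if abs(d - edge) <= best_dist:
                stack.append(child)
    return best_word, best_dist

Compared to restarting the lookup with tolerance 0, 1, 2, ..., this never revisits a node, because the tolerance only shrinks as better candidates are found.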

15 Comments
2024/11/23
12:33 UTC

4

Resources for half approximation problems.

Hey, I am struggling to find introductory material on half-approximation problems online. Kindly share resources for the same. Most of the resources are for 2-approximation problems and I cannot start with Vazirani.

Also tell me whether my understanding of it is correct: a half-approximation algorithm gives me half of what the optimal algorithm would give, and thus it is for maximization problems, whereas a 2-approximation algorithm gives me a solution twice the size of what the optimal would give, so it's for minimization problems.
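
For reference, the convention as usually stated, with ALG the algorithm's solution value and OPT the optimum on the same instance:

\[
\text{maximization, } \tfrac{1}{2}\text{-approximation: } \mathrm{ALG} \ge \tfrac{1}{2}\,\mathrm{OPT},
\qquad
\text{minimization, } 2\text{-approximation: } \mathrm{ALG} \le 2\,\mathrm{OPT}.
\]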

Thanks.

2 Comments
2024/11/22
05:08 UTC

0

This advice from ChatGPT about DP is great.

Here are some tricks and techniques for solving problems, debugging, and optimizing your solutions, especially for dynamic programming (DP) and general algorithmic challenges. Missing these might slow you down or make problems unnecessarily hard.

1. Reverse Thinking in Bottom-Up DP

  • For many top-down solutions, the key to converting them into bottom-up is reverse thinking:
    • In top-down, you solve the problem from the start and rely on future (recursive) results.
    • In bottom-up, start with the base cases and build up results layer by layer.
  • Trick: Write the recursive solution first. Look at the recursive calls and figure out which smaller subproblems are computed first. Those subproblems form the base cases for bottom-up.

2. Use Rolling Arrays for Space Optimization

  • Many DP problems only require the previous row (or column) to calculate the current one.
  • Instead of storing the entire table, use just 2 arrays (or even 1, depending on the recurrence).
  • Example: LCS space optimization (reduce O(n × m) space to O(m)); see the sketch below.
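
A minimal sketch of that LCS space optimization (it recovers only the length; reconstructing the subsequence itself needs the full table or extra tricks):

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b,
    keeping only two rows of the DP table."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]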

3. Precompute and Reuse Results

  • Subproblem Overlap: Check if a result can be reused across multiple iterations.
    • Example: In interval problems, precompute values for ranges or gaps (e.g., sum of ranges in prefix sums).
  • Trick: Precompute helper data structures like prefix sums, difference arrays, or hash maps for fast lookups.

4. Analyze Symmetry in Problems

  • Many problems have symmetry or mirroring:
    • Example: In a palindrome problem, dp[i][j] might equal dp[j][i].
    • Use this property to halve the number of computations.

5. Binary Search in DP

  • Sometimes, DP states don't need to be iterated linearly and can be optimized with binary search.
  • Example: Longest Increasing Subsequence (LIS) can be solved in O(n log n) using a combination of DP and binary search.
  • Trick: Use bisect in Python to efficiently find positions for binary search (see the sketch below).
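
A minimal sketch of that LIS idea with bisect (length only):

import bisect

def lis_length(nums):
    """Longest strictly increasing subsequence length in O(n log n).
    tails[k] is the smallest possible tail of an increasing
    subsequence of length k + 1 seen so far."""
    tails = []
    for x in nums:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)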

6. Reduce DP Dimensions with Coordinate Compression

  • If the DP table depends on large input sizes but not all indices are relevant, use coordinate compression:
    • Example: Given indices spanning millions, map them to a smaller range (e.g., sorted indices).

7. Visualize the Problem

  • Draw the DP table or recursion tree to see overlapping subproblems, dependencies, and transitions.
  • Trick: Use pen and paper or a tool like Excel to fill out the DP table for small inputs. This often reveals patterns.

8. Recognize Subproblems from Problem Structure

  • DP problems often involve choices:
    • Take it or leave it.
    • Add or skip.
    • Move left or right.
  • Identify these choices early and write transitions for each.

9. Base Cases and Edge Cases

  • Edge cases are where your solution is most likely to fail:
    • Empty strings or arrays.
    • Single-element inputs.
    • Large inputs that test performance.
  • Trick: Always think about the smallest possible input and the largest input.

10. Debugging DP Efficiently

  • Print the DP Table:
    • For bottom-up, print the table at every step to ensure transitions are correct.
    • For top-down, print the states being memoized and reused.
  • Trick: Compare your DP table against brute force results for small inputs.

11. Exploit Patterns in Recurrence

  • Many problems involve simple arithmetic or geometric progressions in their recurrence. Recognizing these can save computation.
    • Example: Sliding Window in DP reduces recalculating overlapping ranges.
    • For problems like "maximum subarray," the recurrence directly translates into an O(n) solution.

12. Memorize Common Problems and Transitions

  • Many DP problems boil down to similar templates:
    • Subsequence problems → LCS, LIS, Knapsack.
    • Partition problems → Subset Sum, Palindrome Partitioning.
    • Grid problems → Unique Paths, Minimum Path Sum.
  • Trick: Familiarize yourself with these patterns and transitions. Many new problems are just slight variations.

13. Use Efficient Data Structures

  • Use the right data structures for optimal transitions:
    • Priority Queues: For state expansions in Dijkstra-like DP.
    • Fenwick Trees/Segment Trees: For range queries in DP.
    • Hash Maps: For sparse DP tables.

14. Optimize Recursion Depth

  • Recursive solutions can hit Python's recursion limit for large inputs.
  • Trick: Use sys.setrecursionlimit() for deep recursions or switch to bottom-up.

15. Think Greedy First, Prove Later

  • For some problems, greedy methods work instead of full DP.
    • Example: Activity Selection is solved greedily, though it resembles knapsack.
  • Trick: Test greedy approaches and prove correctness (e.g., with invariants).

16. Simulate Small Examples

  • For any unclear DP transition, write out a few small test cases by hand.
  • Trick: If it doesn't work for small cases, it won't work for larger ones.

17. Write Tests to Validate Your Solution

  • Common edge cases:
    • text1 = "", text2 = ""
    • text1 = "a", text2 = "a"
    • text1 = "abc", text2 = "xyz"
  • Automate test case generation to catch subtle bugs.

18. Compare Top-Down vs. Bottom-Up

  • If you're unsure about a transition or base case in bottom-up DP, compare it with the recursive top-down memoization approach.
1 Comment
2024/11/22
04:02 UTC

21

Math for programmers 2024 book bundle. Manning

3 Comments
2024/11/21
11:28 UTC

1

Analysis of randomized algorithm by number of "failing" inputs

Hi all, I have a question more related to proof techniques.

Assume I have a problem P (of type yes/no) and I devise some randomized algorithm A. I then prove that in expectation, the number of inputs for which my algorithm gives a wrong answer is o(1). Does this imply that my algorithm is correct with high probability (1 - o(1))?

Formally I would say yes, because by Markov's inequality we can bound the probability that there exists some bad input by o(1). However, intuitively I'm not sure if it makes sense, since this kinda seems like having the input over some distribution (rather than having, let's say, an adversary which can try to always choose something bad). I have also never seen proofs following this technique, which I find weird. What do you think?
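
Written out, the Markov step referred to above, with X the number of inputs (of a given size) on which the algorithm answers wrongly:

\[
\Pr[X \ge 1] \;\le\; \mathbb{E}[X] \;=\; o(1),
\]

so with probability 1 - o(1) the algorithm is simultaneously correct on every input of that size; the probability here is over the algorithm's random coins, not over a distribution on inputs.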

0 Comments
2024/11/19
22:48 UTC

7

Seeking Resources to Study Optimization and Heuristic-Based Problem Solving

I recently encountered a fascinating problem where I had to optimize the timing of traffic signals at a set of crossroads to minimize congestion. The problem involved several intersections, each with traffic flow coming from multiple directions. The objective was to dynamically adjust signal timings to reduce overall wait times and keep traffic moving efficiently.

I found this type of problem fascinating and want to dive deeper into similar problems. Specifically, I'm looking for:

Optimization problems that involve maximizing or minimizing an objective.

Heuristic or randomized problem-solving techniques, especially those used in real-world scenarios.

Any courses, books, websites, or platforms where I can practice these kinds of challenges.

For context, I've explored competitive programming platforms like Codeforces and CodeChef but find they rarely feature these open-ended optimization problems. I’m also aware of contests like Google Hash Code but am looking for additional resources.

Does anyone have recommendations for learning and practicing topics like this?

6 Comments
2024/11/18
10:23 UTC

2

Is the Tree Visualizer on https://www.cs.usfca.edu Showing an Incorrect AVL Tree Representation after deleting node?

Hi all,

I'm currently learning about tree data structures, and I'm exploring how AVL trees handle deletion. From my understanding, when a node is deleted, its in-order successor should replace the deleted node.

I was experimenting with the Tree Visualizer tool on https://www.cs.usfca.edu/~galles/visualization/Algorithms.html, and I noticed something odd. Here's the scenario:

  1. Initial tree state: (Screenshot: https://i.imgur.com/sy5MMGh.png)
  2. After deleting node 0006: (Screenshot: https://i.imgur.com/cPVCsXD.png)

In this case, the tool replaces node 0006 with node 0005.

However, shouldn't node 0007 replace 0006 instead? Based on the AVL tree deletion rules I've read, the in-order successor (the smallest node in the right subtree) should take the deleted node's place, and in this case, 0007 seems to fit that criteria.

Am I misunderstanding something about AVL deletion, or is this a bug/misrepresentation in the tool?

Looking forward to insights from the community.

5 Comments
2024/11/16
06:11 UTC

0

Are the Tower of Hanoi and binary code related?

Ok so hear me out:

In binary code, every odd value has the bit that represents 1 set. For example:

01100001=1/A 01100010=2/B 01100011=3/C 01100100=4/D 01100101=5/E

See how the last bit switches on and off at every step?

In the Tower of Hanoi, on every odd-numbered move you have to move the smallest disc in order to achieve the minimum number of moves. I'm not sure how to give an example of this though, just try it out.

I tried to look up whether someone had seen this before but I didn't get any results.

I might be insane, but I think I could be onto something.
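
For what it's worth, the pattern is a known one: in the optimal solution, the disc moved on turn m is given by the position of the lowest set bit of m, so the smallest disc moves on every odd-numbered turn, just like the last bit of a binary counter. A small sketch that checks this:

def hanoi_moves(n, src="A", dst="C", aux="B", moves=None):
    """Standard optimal Tower of Hanoi; records which disc (1 = smallest)
    is moved on each turn."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi_moves(n - 1, src, aux, dst, moves)
    moves.append(n)                           # disc n moves on this turn
    hanoi_moves(n - 1, aux, dst, src, moves)
    return moves

for m, disc in enumerate(hanoi_moves(4), 1):
    lowest_set_bit = (m & -m).bit_length()    # 1 for odd m, 2 for m % 4 == 2, ...
    assert disc == lowest_set_bit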

4 Comments
2024/11/14
22:30 UTC

3

Why does the reservoir sampling Wikipedia article say that the algorithm doesn't know the list's size beforehand, when in the source code the size is known?

reservoir sampling on wikipedia: https://en.wikipedia.org/wiki/Reservoir_sampling

Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. **The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.** The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far.

Why does it say that the population n is not known to the algorithm, but then in the source code it is known?

They literally use the length n to do the loop.

(* S has items to sample, R will contain the result *)
ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i := 1 to k
      R[i] := S[i]
  end

  // replace elements with gradually decreasing probability
  for i := k+1 to n
    (* randomInteger(a, b) generates a uniform integer from the inclusive range {a, ..., b} *)
    j := randomInteger(1, i)
    if j <= k
        R[j] := S[i]
    end
  end
end
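
For contrast, the same Algorithm R written against a stream (a Python sketch): the loop only needs the running index i of the current item, never the total length, so it works on an iterator whose size is unknown; the Wikipedia pseudocode just happens to present the input as an array S[1..n].

import random

def reservoir_sample(stream, k):
    """Uniform sample of k items from an iterable of unknown length,
    in one pass (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream, 1):
        if i <= k:
            reservoir.append(item)
        else:
            j = random.randint(1, i)       # uniform over {1, ..., i}
            if j <= k:
                reservoir[j - 1] = item
    return reservoir
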
2 Comments
2024/11/14
20:44 UTC

1

Correctness of Kruskal's algorithm without assuming distinct edges

Looking for a rigorous proof without assuming distinct edge weights. I attempted using induction, can someone check my work?

My Attempt:

Let G be a connected, undirected, weighted, finite graph with at least 2 vertices. We will show that Kruskal's algorithm produces a minimum spanning tree (MST) of G.

Outline: Use induction to prove the statement: There exists a MST of G containing the first k selected edges (1 <= k <= V-1).

Proof of correctness of Kruskal's algorithm (V >= 2):

Base case (k = 1):

Let M be an MST of G. Let e be the first edge selected by the algorithm. Then e is crossing a cut C such that weight(e) is less than or equal to all other edges crossing C (if any).

Case 1: e is the only smallest edge crossing C Then M contains e by the Edge Exchange Argument (or the connectivity argument if e is the only edge crossing C).

Case 2: e is one of multiple smallest edges crossing C, then by the Edge Exchange Argument: a) Either M contains e, or b) M does not contain e, in which case M contains an equal-weight edge f crossing C, such that it can be exchanged for e to obtain another MST.

Therefore there exists a MST of G containing the first selected edge.

I.H.: There exists a MST of G containing all the first k selected edges (1 <= k < V-1).

I.S.: Suppose I.H., WTS there exists a MST of G containing the first k+1 selected edges.

By I.H., there exists a MST of G, call it M, containing the first k selected edges.

Let e be the (k+1)th edge the algorithm selects, namely the (k+1)th edge that does not form a cycle with previously selected edges. Since e does not form a cycle with previously selected edges, e is crossing a cut C such that all other edges crossing C (if any) are unprocessed (use proof by contradiction to show that all edges between the disjoint components must be unprocessed). This means weight(e) is less than or equal to all other edges crossing C (if any).

Case 1: e is the only smallest edge crossing C Then M contains e by the Edge Exchange Argument (or the connectivity argument if e is the only edge crossing C).

Case 2: e is one of multiple smallest edges crossing C. By the Edge Exchange Argument: a) Either M contains e, or b) M does not contain e, in which case M contains an equal-weight edge f crossing C, such that it can be exchanged for e to obtain another MST M'. M' contains the first k selected edges and e, as the swapped-out edge f is unprocessed, ensuring it was not among the selected edges.

Therefore there exists a MST of G containing the first k+1 selected edges.

Conclusion: By the inductive proof, there exists a MST of G containing the first k selected edges (1 <= k <= V-1). This proves that Kruskal's algorithm produces a minimum spanning tree (MST) of G.

4 Comments
2024/11/14
05:02 UTC

0

Choosing the Right Brute Force Approach in Coding Interviews

Hey fellow programmers,
I'm preparing for coding interviews and struggling with a specific aspect of problem-solving. When presenting solutions, I'm unsure about the best brute force approach to explain, especially when constraints are involved. For example, consider merging two sorted arrays into one sorted array with no extra space allowed. Two possible brute force approaches come to mind:

  1. Create a new array, merge and sort (O(n log n) time, O(n) space)
  2. Use an O(n^2) approach like insertion sort, comparing and shifting elements in-place

I've seen people use the first approach (extra space) as a brute force example, but it technically violates the constraint. Should I:
A) Explain the extra space approach as brute force, acknowledging the constraint violation
B) Choose a different, less efficient brute force method (like O(n^2) insertion sort) that adheres to constraints
C) Other considerations?

What's your take on this? How do you choose the right brute force approach in similar situations?

3 Comments
2024/11/13
13:45 UTC

1

Lattice-Based Hydraulic Erosion For River/Valley/Canyon Formation?

0 Comments
2024/11/13
13:14 UTC

4

How close is a given path to being a shortest path?

Given a path between two nodes on a graph (in my case: a 2d grid with only cost-1 or missing edges), I would like to have an indication of how efficient / how close that path might be to a shortest path. This has to work without knowing the shortest path.

Background: I obtain this path from some approximate shortest path algorithm and would like to decide whether it's worth calculating the actual shortest path. Bonus points if you can provide a way to improve the non-optimal path in a computationally cheap way.

One obvious heuristic would be to compare the path's length to the direct path (e.g., on the grid that would be the Manhattan distance between the two points). But this doesn't take the graph structure into account at all.

Any pointers?

5 Comments
2024/11/13
11:54 UTC

1

What is the best way to find the background of an image?

Suppose that the background is black, objects in it are white, and objects cannot have a hole inside them. I think the best way is DFS, as described here: My DFS Solution

  • What is the best way to find the background of an image?
  • Is there any way better than DFS?
1 Comment
2024/11/13
16:54 UTC

2

Need Help Finding the Critical Node in a Flow Network for Maximum Message Reduction

Hi everyone,

I’m working on a problem where I need to identify the critical node in a flow network (imagine a "brain" simulation) composed of three types of nodes: initiators, calculators, and executors. The goal is to find the best calculator node to block in order to maximize the reduction in messages that reach the executor nodes.

What I've Tried So Far

The brute-force approach would be to compute the maximum flow, remove each calculator node individually, and recalculate the flow to see the impact. This means calculating the maximum flow for each node, which is computationally intensive and impractical, given the network can contain up to 10^5 nodes. I've been struggling to find a way to reduce the complexity here, as my current methods just don't scale.

Problem

I feel there might be a way to avoid recalculating the maximum flow for each node individually, perhaps by using dynamic programming or memoization to save some partial results, but I’m not sure how to apply it here without compromising accuracy.

Does anyone have experience with similar network flow problems, or can suggest an optimized approach or algorithm?

Thanks in advance for any insights!

3 Comments
2024/11/12
21:19 UTC

1

Can anyone tell me how I can think of the last part of this algo?

https://www.geeksforgeeks.org/find-given-string-can-represented-substring-iterating-substring-n-times/

I know that the first part uses KMP to find the lps (longest proper prefix which is also a suffix), and I also thought of using KMP when I saw this question, but I couldn't think of how to use it to solve the problem. After reading this I can see how it's doing it, but it's not something I could come up with myself, like finding the lps and then checking whether n - len(lps) divides n; if yes return true, else false. How can I think of this part by myself and within the time limit?
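
For reference, a compact sketch of the check that the article describes: build the KMP failure/lps array, then test whether n - lps[n-1] divides n.

def is_repetition_of_substring(s):
    """True iff s can be written as some shorter substring repeated >= 2 times."""
    n = len(s)
    if n == 0:
        return False
    lps = [0] * n                 # lps[i]: length of the longest proper prefix
    k = 0                         # of s[:i+1] that is also a suffix of it
    for i in range(1, n):
        while k and s[i] != s[k]:
            k = lps[k - 1]
        if s[i] == s[k]:
            k += 1
        lps[i] = k
    period = n - lps[-1]
    return lps[-1] > 0 and n % period == 0

print(is_repetition_of_substring("abab"))   # True  (= "ab" repeated twice)
print(is_repetition_of_substring("abcab"))  # False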

0 Comments
2024/11/12
22:06 UTC
