/r/algorithms
Computer Science for Computer Scientists
Hello everyone,
So I want to build a rent-a-car app, with a focus on the auto-scheduling part. To explain what I mean by auto-scheduling, I'll give some background first.
Now, how would prioritisation work - how do we differentiate between multiple cars of the same category? First I would need the input (a from-to date range and the category of car, e.g. Audi Q5). Rules:
I would repeat these steps (1-2) whenever a new rental is added.
To me this looks like a DP problem. I'd like guidance on whether this is the right way to approach it, because it seems to offer the best precision/time-cost ratio. I considered heuristics or a genetic algorithm; they would of course get the job done faster, but I think I might not get the best outcome.
If you've come this far, thanks for reading. If you could confirm my thinking, make a suggestion, or point me to a paper that gives a better idea, I'd appreciate it. I just want to get to the best approach possible that is also time efficient.
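Since the concrete rules aren't spelled out above, here is only a minimal availability-check sketch (all names and data shapes are my own placeholders, not a prescription): given existing bookings per car, it lists the cars of the requested category that are free for the requested date range, which is the candidate list any prioritisation rules or DP on top would then rank.

from datetime import date

def overlaps(a_start, a_end, b_start, b_end):
    # Two closed date ranges overlap unless one ends before the other starts.
    return a_start <= b_end and b_start <= a_end

def available_cars(fleet, bookings, category, start, end):
    # fleet: {car_id: category}; bookings: {car_id: [(start, end), ...]}
    return [
        car for car, cat in fleet.items()
        if cat == category
        and all(not overlaps(start, end, s, e) for s, e in bookings.get(car, []))
    ]

fleet = {"Q5-01": "Audi Q5", "Q5-02": "Audi Q5"}
bookings = {"Q5-01": [(date(2023, 7, 1), date(2023, 7, 5))]}
print(available_cars(fleet, bookings, "Audi Q5", date(2023, 7, 3), date(2023, 7, 8)))
# ['Q5-02']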
I made a previous post about this, https://www.reddit.com/r/algorithms/comments/14e7vfo/algorithm_to_remove_temperature_influence_on/
I'm going to try again since I didn't explain myself well.
I have a system which measures liquid volume. In this test the liquid volume is fixed, and hence the measurement should be constant. But the measurement has a temperature dependence, and so the value varies over temperature. Both the liquid and the measuring device have a temperature dependence.
In addition, the system also measures a fixed reference channel which physically cannot change, but also shows some correlation to temperature. The reference channel uses the same measuring device, so any temperature dependence in the measuring device will be compensated for, by using this reference channel.
For ease, I’ve put the whole system in our kitchen fridge and gathered data over many hours (mostly over night and weekends when the fridge stays shut).
Here you can see the fridge temperature (measured by a digital thermometer) and the reference channel:
https://i.imgur.com/HciOUvP.png
Considering that on the cooling cycle of the fridge the air temperature changes quite rapidly, and the thermal mass of the measured items results in some thermal lag, I think that's a reasonable degree of correlation. As a first pass, I'm going to assume 100% correlation.
Here is the measurement channel and temperature:
https://i.imgur.com/QFskX5X.png
I'm wondering how to develop a formula to compensate the liquid signal so that, for a fixed volume of liquid over a wide temperature range, it gives a near-constant result (once the system reaches temperature). Something that "looks" like this - nearly a flat line (I've faked it by making the y-axis range wider):
https://i.imgur.com/MfSqG7k.png
Even reducing the variation in the liquid signal would be good.
I would imagine this algorithm would use:
Initially I thought I could compensate without needing to measure the temperature, by using the reference channel, but on reflection, the reference channel can compensate the temperature effect on the measurement device, but not the temperature effect on the liquid.
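For what it's worth, a first-pass sketch of the compensation I'd try (this is an assumption on my part, not a validated model): fit the liquid channel against temperature with ordinary least squares and subtract the fitted temperature term. The arrays below are synthetic stand-ins for your logged columns; a degree-2 fit, or a lagged/filtered temperature to account for the thermal lag you mention, are natural refinements.

import numpy as np

rng = np.random.default_rng(0)
temp = rng.uniform(2.0, 8.0, 500)                        # stand-in fridge temperatures
liquid = 100.0 + 0.8 * temp + rng.normal(0, 0.05, 500)   # stand-in liquid channel

# Fit liquid ~ a*temp + b, then remove the temperature-dependent part,
# re-centred on the mean temperature so the compensated value keeps its scale.
a, b = np.polyfit(temp, liquid, 1)
compensated = liquid - a * (temp - temp.mean())

print("raw spread:", liquid.std(), "compensated spread:", compensated.std())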
Thanks for reading.
I have a knapsack problem: finding the maximum points with the maximum money. But you can only choose the maximum points from an odd-even combination (e.g. 1 4, 2 3 6, etc.). How can I implement DP for this problem? I've tried backtracking and it's very slow.
Hi, I was rejected from a job offer because I could not get the optimal solution for this problem:
Given a list of numbers, each number represents the number of days before an alien gives birth. An adult alien gives birth once every 7 days indefinitely. A baby alien reaches adulthood after two days. Compute the final number of aliens given an initial list of aliens and a number of days.
Here is a small example:
For the list [3, 2], and 5 days, this is how the list should evolve:
[3, 2]
[2, 1]
[1, 0]
[0, 6, 8]
[6, 5, 7, 8]
I got a naive solution by computing each day for one alien and pushing baby aliens to the end of the resulting list, then counting the total elements of the resulting list. The problem is that this takes an exponential amount of time and space. The interviewers said the optimal solution is O(n) for n the number of elements in the list, but I simply can't get a solution with that complexity. I thought of mathematically computing how many babies each alien will have, but I would also need to do this for each baby recursively, and that does not seem O(n) at all. Do you have any clue?
Edit: reformatted example
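In case it helps as a clue rather than a spoiler: one common way to avoid simulating every alien (my reading of the rules from the example above: a counter at 0 resets to 6 the next day and adds a new alien at 8) is to keep only a count of aliens per counter value 0-8. Strictly speaking this is O(n + days) rather than O(n), but it sidesteps the exponential blow-up.

def total_aliens(initial, days):
    counts = [0] * 9                  # counts[c] = number of aliens whose counter is c
    for c in initial:
        counts[c] += 1
    for _ in range(days):
        giving_birth = counts[0]
        counts = counts[1:] + [0]     # every counter ticks down one day
        counts[6] += giving_birth     # parents reset to 6
        counts[8] += giving_birth     # each parent adds one baby at 8
    return sum(counts)

print(total_aliens([3, 2], 5))        # 4, consistent with the last state listed above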
Can anyone please give me a subtle hint on this curious puzzle: given a string consisting of only parentheses, e.g. (()(()))()
we want to tell how many different subsequences with a proper (valid) arrangement of parentheses are possible, e.g. (), (()), ((())), ()(), and so on.
Note, it is about subsequences rather than substrings. For example (()()()) contains the subsequence ((())) even though it contains no such substring.
I started to code this as a typical DP problem but suddenly realized that a subsequence, say `()`, could be produced in multiple ways, and we want to disregard duplicates. It doesn't look like I can simply put all results into a hashtable, as the limit (e.g. 300 parentheses) means the result is a very large number.
(This problem was shown to me as "similar but simpler" than another one; regretfully, I feel too dense and this simpler one doesn't enlighten me yet - so ideally I'd like another hint rather than the solution explained.)
Hi, I have been trying to make a robot face that will smoothly track objects or people as they move around the room. I have a test setup of a camera on a pan-tilt rig driven by two servos (HWonder high torque 25kg/cm servos). The controller is a raspberry pi 4 and computer vision processing is done on a tensor processing unit.
The main issue I have been having is that if I want faster movement, the servos overshoot the subject and there is a lot of thrashing back and forth before it gets close. If I slow down how quickly it moves then it works OK, but it is still not very smooth.
I have tried three approaches. All of them first calculate the difference between the center of the object and the center of the camera frame, which I call error_x or error_y. Using this value I compute the new angle that the servo must move to in order to close the distance, and this keeps happening in a loop with the image processing. These are new_angle_tilt and new_angle_pan.
new_angle_pan = kit.servo[1].angle + error_x*.05
This works when I use a very small factor (i.e. .05), but that causes it to move very slowly and not smoothly. If I increase the factor, the camera overshoots the center too much and thrashes all over.
new_angle_pan = kit.servo[1].angle + sign(error_x)*2.5
This works better than using a fraction of error_x but is super slow and not very realistic head movement.
pid_pan = PIDController(kp=0.05, ki=0.01, kd=0.01, max_output=5, min_output=-5)
new_angle_pan = kit.servo[1].angle + pid_pan.compute(0, error_x)
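Two things that often help with this kind of thrashing are clamping the per-update correction (a slew limit) and ignoring tiny errors (a deadband), on top of the PID terms. A sketch below; the class, gains, and pixel thresholds are my own guesses, not your PIDController, so they would need tuning:

import time

class PanPID:
    def __init__(self, kp=0.08, ki=0.0, kd=0.02, out_limit=8.0, deadband=5.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit      # max degrees of correction per update (slew limit)
        self.deadband = deadband        # pixel errors smaller than this are ignored
        self.integral = 0.0
        self.prev_error = None
        self.prev_time = None

    def update(self, error_px):
        now = time.monotonic()
        dt = (now - self.prev_time) if self.prev_time else 1e-3
        self.prev_time = now
        if abs(error_px) < self.deadband:   # close enough: don't twitch the servo
            self.prev_error = error_px
            return 0.0
        self.integral += error_px * dt
        derivative = 0.0 if self.prev_error is None else (error_px - self.prev_error) / dt
        self.prev_error = error_px
        out = self.kp * error_px + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))

# usage sketch, following your snippet:
# pan_pid = PanPID()
# new_angle_pan = kit.servo[1].angle + pan_pid.update(error_x)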
I can share my full code if people need it, but I think I am looking more for a conceptual answer than for someone to troubleshoot my code as I know it works when it's moving slowly.
Another user suggested doing a y=mx+b calibration; I am going to look into this.
At this point I am wondering if I need to implement some type of reinforcement learning to teach the system how to move correctly, though that seems like a big time investment.
Or I give up on having the camera on the pan-tilt and I instead have the camera fixed and map distances across the screen to angles on the servo using something like remap in python. Or maybe there is something I haven't thought of yet and this group can save me a ton of headaches lol
for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; j += 1) {
        if (j >= 1000001) {
            i += ceil(N / 10);
            break;
        }
    }
}
When I try to analyze this, I think the time complexity should be O(N^2), because what if j never reaches 1000001 and we loop through the whole inner loop, assuming the worst case? But my TA says it is O(N), which I don't quite understand. Could someone please explain the correct logic and help me out, please!
I was wondering if anyone knows of an algorithm that produces Voronoi-type diagrams. I am currently using Fortune's sweep algorithm in my project for generating terrains. However, due to the size of my terrains (~1 million seed points) it takes quite a long time. As stated, it doesn't have to be exact, just something that looks similar and is pretty fast. Thanks.
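One option that isn't exact but is usually fast enough and visually indistinguishable for terrain work (a sketch, assuming a rasterised diagram is acceptable): label every grid cell with its nearest seed using a k-d tree. SciPy's cKDTree handles around a million seeds comfortably; workers=-1 needs a reasonably recent SciPy. The jump flooding algorithm is another popular approximate option if you can run it on a GPU.

import numpy as np
from scipy.spatial import cKDTree

n_seeds, width, height = 1_000_000, 2048, 2048
seeds = np.random.rand(n_seeds, 2) * (width, height)

tree = cKDTree(seeds)
ys, xs = np.mgrid[0:height, 0:width]
pixels = np.column_stack([xs.ravel(), ys.ravel()])
_, region = tree.query(pixels, workers=-1)   # index of the nearest seed per pixel
region = region.reshape(height, width)       # region[y, x] = Voronoi cell id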
Hello everyone
I very much doubt I picked the right subreddit for my question/request, but let's see where this goes; please redirect me to a more appropriate sub if needed.
One part of my thesis is dedicated to "Route Planning Systems" (as in vehicle navigation on public roads), more specifically how a route from A to B is constructed from a technical standpoint and how these kinds of systems can be optimised (idk... caching between commonly queried nodes?), but I'm struggling to do proper research on this topic.
Road navigation sounds like a typical example of the shortest-path problem. Information on algorithms to solve this type of problem is widely available, yet technical analysis documents and research papers (no, blogs don't count) on the topic are not.
Are there any specific keywords I should focus on finding material for? Can somebody point me in the right direction?
Thanks in advance
Can someone explain to me intuitively the difference between Counterfactual Regret Minimization and Monte Carlo tree search (or IS-MCTS for imperfect-information games)?
Also, when is it best to apply which algorithm, and are both suited for real-time decision making?
I read online that to solve max flow for a single source / sink undirected graph, one must turn this into a directed graph by replacing every undirected edge with capacity c between nodes u, v and two directed edges (u,v) and (v,u) both with the same capacity c, and that once solved the flow in one of these two edges must be zero. Is there some proof with explanation on why the two problems are equivalent?
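The usual argument is flow cancellation: if an optimal flow uses both (u,v) and (v,u), subtracting the smaller of the two amounts from both leaves a flow of the same value that still respects the capacities and sends zero on at least one of the pair, and that cancelled flow corresponds exactly to a feasible flow on the original undirected edge. Not a proof, but here is a quick empirical check of the construction (a sketch assuming networkx):

import networkx as nx

undirected = nx.Graph()
undirected.add_edge("s", "a", capacity=3)
undirected.add_edge("a", "t", capacity=2)
undirected.add_edge("s", "t", capacity=1)

# Replace each undirected edge {u, v} of capacity c with (u, v) and (v, u), both of capacity c.
directed = nx.DiGraph()
for u, v, data in undirected.edges(data=True):
    directed.add_edge(u, v, capacity=data["capacity"])
    directed.add_edge(v, u, capacity=data["capacity"])

flow_value, flow = nx.maximum_flow(directed, "s", "t")
print(flow_value)  # 3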
Say you have 3 columns of data and each column comes in 3 versions, so:
X y z x1 y1 z1 x2 y2 z2
I need to compare everything in a row to check whether each value is higher or lower. So x is compared with x1 and x2, then x1 is compared with x2, and the same for y and z.
There may also be more than 3 columns, so I need an algorithm that iterates through every pairing in a row.
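A sketch of the iteration, assuming the data sits in a pandas DataFrame and that the groups are x/x1/x2, y/y1/y2, z/z1/z2 (adjust the groups dict to however many columns you actually have); itertools.combinations generates every pairing within a group:

from itertools import combinations
import pandas as pd

df = pd.DataFrame({
    "x": [5, 1], "y": [2, 2], "z": [9, 3],
    "x1": [4, 7], "y1": [2, 0], "z1": [1, 8],
    "x2": [6, 6], "y2": [3, 1], "z2": [2, 4],
})

groups = {"x": ["x", "x1", "x2"], "y": ["y", "y1", "y2"], "z": ["z", "z1", "z2"]}

for idx, row in df.iterrows():
    for name, cols in groups.items():
        for a, b in combinations(cols, 2):      # every pair within the group
            if row[a] > row[b]:
                relation = "higher than"
            elif row[a] < row[b]:
                relation = "lower than"
            else:
                relation = "equal to"
            print(f"row {idx}: {a} is {relation} {b}")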
I make “5” my pivot but by the time I get to the end, 6 is the last number in the array and 5 has nowhere to go to partition it.
[5, 6, 7, 2, 3, 4, 1]
(Swap 6 with 2)
[5, 2, 7, 6, 3, 4, 1]
(Swap 6 with 3)
[5, 2, 7, 3, 6, 4, 1]
(Swap 6 with 4)
[5, 2, 7, 3, 4, 6, 1]
(Swap 6 with 1)
[5, 2, 7, 3, 4, 1, 6]
And now five can’t go anywhere to partition the array.
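For comparison, here is a minimal Lomuto-style partition that uses the first element as the pivot (a generic sketch, not tied to any particular textbook's version): smaller elements are swapped forward behind a moving boundary, and only at the end is the pivot swapped into the boundary position, which is where 5 ends up.

def partition_first_pivot(a, lo, hi):
    pivot = a[lo]
    boundary = lo                       # last index known to hold a value < pivot
    for i in range(lo + 1, hi + 1):
        if a[i] < pivot:
            boundary += 1
            a[boundary], a[i] = a[i], a[boundary]
    a[lo], a[boundary] = a[boundary], a[lo]   # drop the pivot between the two halves
    return boundary

arr = [5, 6, 7, 2, 3, 4, 1]
print(partition_first_pivot(arr, 0, len(arr) - 1), arr)
# 4 [1, 2, 3, 4, 5, 6, 7]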
Context of the problem: I have the following entities: stations, programs, project managers, days, and time slots. The goal is to decide which program, with which project manager, to assign on which day and at which time slot for each station.
Some notes:
Each program has its own duration. Say Program A lasts for 50 minutes, Program B is 30 minutes, etc.
Each project manager has qualifications on what program he/she can handle. This constraint is a hard constraint.
Time slots run from 6AM to 6PM. This means that if a program which lasts 30 minutes is assigned to start at 6:00AM, then it will end at 6:30AM. Only one program can run at a time in each station, so there should be no overlap between programs in terms of day and time per station.
There is a forecasting model that takes as input the studio, program, project manager, day, and time set and can give the expected revenue. We want to maximize this while satisfying the project manager qualification constraint, as well as the no schedule overlap constraint.
It is alright to have blank time slots. Example: Program A is from 6:00AM to 6:30AM. Program B can start at 6:40AM. That means 6:30 AM to 6:40 AM has no schedule.
What was tried so far:
Google OR-Tools: We tried representing a solution as a 5-dimensional matrix with hour granularity. The dimensions are station, program, project manager, day, and time. If matrix[station][program][project manager][day][time] = 1, then that combination is assigned; otherwise it is not. The main issue encountered here is the time slots, as they are not necessarily on a per-hour basis. We tried 5-minute time slot intervals; however, constructing constraints that adhere to each program's duration proved to be difficult.
Pyomo: We tried Pyomo with the same matrix representation as above (5-minute time slot intervals), but encountered the same difficulty expressing program durations as constraints. I don't seem to be able to write a condition inside the constraint declaration saying that if this matrix entry is 1, then a given restriction applies.
Genetic Algorithm: We also tried a GA, and it seems to be the most viable option we currently have, although the response time is really long. The solution representation is an array of days, and each day has an array of schedules. Each schedule is a combination of station, program, project manager, and start time.
We are just not sure whether a GA is the only option we have, or whether there are other algorithms or Python tools that could help with a problem of this complexity, where the solution space seems to be very large. Any advice or thoughts would be useful. Thanks!
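In case it is useful, here is a rough sketch of how CP-SAT's optional interval variables can carry each program's own duration, which was the part that fought back in the matrix formulation. Everything concrete here (the data shapes, the revenue() stub, the qualification sets) is a placeholder, not your actual model, and revenue is approximated per assignment rather than per exact start minute; if revenue really does vary with the exact start time, discretizing candidate starts to a coarse grid and creating one boolean per candidate keeps the interval trick while letting the forecast be looked up per slot.

from ortools.sat.python import cp_model

stations = ["S1", "S2"]
days = ["Mon", "Tue"]
programs = {"A": 50, "B": 30}                  # program -> duration in minutes
managers = {"PM1": {"A", "B"}, "PM2": {"B"}}   # manager -> programs they can handle
HORIZON = (18 - 6) * 60                        # minutes between 6AM and 6PM

def revenue(station, program, manager, day):
    return 100                                 # stand-in for the forecasting model

model = cp_model.CpModel()
objective_terms = []

for s in stations:
    for d in days:
        day_intervals = []
        for p, dur in programs.items():
            for m, qualified in managers.items():
                if p not in qualified:
                    continue                   # hard qualification constraint
                present = model.NewBoolVar(f"x_{s}_{d}_{p}_{m}")
                start = model.NewIntVar(0, HORIZON - dur, f"start_{s}_{d}_{p}_{m}")
                end = model.NewIntVar(0, HORIZON, f"end_{s}_{d}_{p}_{m}")
                day_intervals.append(model.NewOptionalIntervalVar(
                    start, dur, end, present, f"iv_{s}_{d}_{p}_{m}"))
                objective_terms.append(revenue(s, p, m, d) * present)
        model.AddNoOverlap(day_intervals)      # no schedule overlap per station per day

model.Maximize(sum(objective_terms))
solver = cp_model.CpSolver()
solver.parameters.max_time_in_seconds = 30
status = solver.Solve(model)
print(status in (cp_model.OPTIMAL, cp_model.FEASIBLE), solver.ObjectiveValue())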
I came across this problem on the web the other day.
Find the number of subarrays in an array that contain an odd number of odd integers.
Eg:
-----------------------------------------------------------------------
Array: [2, 2, 1, 2, 2, 2, 1, 2, 2, 1]
Some of the valid subarrays: [2, 2, 1], [2, 2, 1, 2], [1, 2, 2, 2, 1, 2, 2, 1].........
You get the point.
---------------------------------------------------------------------------
I know how to iteratively do this.
Iterative approach:
What's a more efficient way to approach this problem? I'm guessing it's got something to do with keeping note of the indexes of where the odd values are and breaking those down into smaller ranges and then dealing with them individually?
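One standard O(n) idea, for what it's worth (prefix parity counting, a cousin of the index-tracking you describe): a subarray has an odd number of odd elements exactly when the parities of the two prefixes bounding it differ, so the answer is just the product of the number of even-parity prefixes and the number of odd-parity prefixes.

def count_odd_subarrays(arr):
    even_prefixes, odd_prefixes = 1, 0   # the empty prefix has even parity
    parity = 0
    for x in arr:
        parity ^= x & 1                  # parity of odd elements seen so far
        if parity:
            odd_prefixes += 1
        else:
            even_prefixes += 1
    return even_prefixes * odd_prefixes

print(count_odd_subarrays([2, 2, 1, 2, 2, 2, 1, 2, 2, 1]))  # 30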
I would like to modify the "classic" Reddit story ranking algorithm described here (I don't think it's used anymore, but never mind):
https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
New requirements to introduce:
Any ideas?
I was asked this one today, and have spent the last few hours trying to think of a way to model/program a solution. Anyone know of a good way to think about this?
Problem: There are 24 volunteers. Over the next 3 weeks, each volunteer is assigned to a different task. There are 8 tasks. Each week, the volunteers switch tasks. Each task has 3 volunteers assigned to it. Volunteers cannot be assigned to the same task more than once, and volunteers cannot share the same task more than once.
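One way to think about it is as a small resolvable design rather than a search problem. A constructive sketch (my own suggestion, with made-up labels and helper names): give each volunteer a label (a, b) with a in 0..2 and b in 0..7, and in week t send (a, b) to task (b + t*(a+1)) mod 8. A short brute-force check confirms both "no repeats" conditions hold for 3 weeks. For messier variants (uneven group sizes, absences) a constraint solver is the more flexible way to model it.

from itertools import combinations

def schedule(weeks=3):
    plan = []                              # plan[t][task] = list of volunteer ids
    for t in range(weeks):
        tasks = [[] for _ in range(8)]
        for a in range(3):
            for b in range(8):
                tasks[(b + t * (a + 1)) % 8].append(8 * a + b)
        plan.append(tasks)
    return plan

def verify(plan):
    seen_tasks = {}                        # volunteer -> set of tasks already done
    seen_pairs = set()                     # pairs that have already shared a task
    for tasks in plan:
        for task, members in enumerate(tasks):
            assert len(members) == 3
            for v in members:
                assert task not in seen_tasks.setdefault(v, set())
                seen_tasks[v].add(task)
            for u, v in combinations(members, 2):
                assert (u, v) not in seen_pairs
                seen_pairs.add((u, v))
    return True

print(verify(schedule()))                  # True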
Say you're building a wiki/document product like Notion/GDocs/Coda. Notion and Coda seem to split the document into blocks (new block starts when you press return).
If we want to search across title and document body, how hard of a problem is this? What makes it hard? For starters, let's assume the blocks of text are stored in postgres.
The search in all these products sucks, especially searching within the document. For example, I have a document in Notion which has the words "will work on" in the body. When I search "will work on", I see the document in the results because it matched "will" but it wasn't able to find the full phrase because it's not highlighted in the results. I understand there's a limit to the number of "multiple word" combinations you can extract out of a document for reasons of scale, performance, and resource constraints.
I'd still like to understand why these startups with tens/hundreds of millions in funding can't do better w.r.t. search. How do solutions like Algolia, Meilisearch, or Typesense fare in this regard?
Should a young startup that's building a knowledge base/wiki platform just accept that search will suck?
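Phrase search itself isn't the hard part; the trap in a block-based model is indexing each block separately, because then a phrase spanning two blocks can never match. A rough sketch of per-document phrase search in Postgres, assuming a hypothetical blocks(document_id, block_index, body) table (in practice you'd materialize the per-document tsvector in a column with a GIN index rather than aggregate at query time):

import psycopg2

conn = psycopg2.connect("dbname=wiki")
query = "will work on"

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT document_id, ts_rank(doc_tsv, q) AS rank
        FROM (
            SELECT document_id,
                   to_tsvector('english',
                               string_agg(body, ' ' ORDER BY block_index)) AS doc_tsv
            FROM blocks
            GROUP BY document_id
        ) AS docs,
        phraseto_tsquery('english', %s) AS q
        WHERE doc_tsv @@ q
        ORDER BY rank DESC
        LIMIT 20;
        """,
        (query,),
    )
    for document_id, rank in cur.fetchall():
        print(document_id, rank)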
I've found myself in a situation today where this is quite literally what I need. Implementing such a thing is fairly trivial, but I'm shocked I can't find an implementation or a library in any language doing such a thing (I'm working in Go, so I know I'll need to do it myself; I'm not looking for a lib or anything).
My thought is that maybe I’m calling it by the wrong name or something? Anyone familiar with a similar DS?
I am curious to see if my understanding of Depth First Search is correct. In my classes at university we were taught that DFS prioritizes going as deep as possible into a tree or graph before returning to add unvisited paths/edges. So doesn't that mean that, in a graph drawn with some level of interconnectivity, we could technically have multiple "correct" search trees from DFS, and that which one we use is an arbitrary decision we could make?
Example Graph:
Example I had in mind
I'm not talking about a specific implementation here, btw. I know that how you write the code might change this, but assume we are doing this by hand. Hopefully this makes sense.
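Yes; the DFS tree depends on the order in which neighbours are listed (or whatever tie-breaking rule you apply by hand), and any of those trees is a "correct" DFS tree. A tiny sketch with made-up adjacency lists showing two different trees from the same graph and start vertex:

def dfs_tree(adj, start):
    visited, tree = {start}, []
    def visit(u):
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                tree.append((u, v))   # tree edge chosen by neighbour order
                visit(v)
    visit(start)
    return tree

adj1 = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
adj2 = {'A': ['C', 'B'], 'B': ['A', 'C'], 'C': ['A', 'B']}
print(dfs_tree(adj1, 'A'))   # [('A', 'B'), ('B', 'C')]
print(dfs_tree(adj2, 'A'))   # [('A', 'C'), ('C', 'B')]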
"Incremental backup" is a well-known concept. Whenever we commit, only the increments will be recorded to save all the versions and save disk space.
While increments in source codes (plain texts) are very easy to identify, increments in binary files (like a painting, when we add one additional stroke to it) are harder. It is hard in general to identify what exactly is changing in the document and encode that in the most efficient way.
Is there a nice way to precisely define and handle "increments" theoretically?
I am trying to design an algorithm. For now, I am trying to give myself a large overview.
Assume I have K (for example, K can equal 8) perfect slave machines, and one master machine. Each slave has a perfect HDD. I want to save the string "ABCDEFGHIJKLMNOP". My wishes are:
Note that I am differentiating between compromised and inaccessible. A perfect adversary compromises a slave without rendering it inaccessible. However, a slave can become inaccessible with or without the involvement of an adversary.
How should I go about designing an algorithm that can do it?
My attempted solution was to experiment with Reed-Solomon encoding, but I can't find a way that provides both the fault tolerance and significant difficulty for an attacker.
As an answer, I would like detailed pseudocode with a step-by-step explanation. In this question I have not set any limit on K or p, because I don't know whether that is necessary. However, if a fully general answer isn't possible, then maybe you can set suitable limits on these to develop a special solution.
Thank you.
Context: I'm brushing back up on my algorithms after taking a year off. I've also been playing a bit of D4 lately, so the inspiration for this problem stems from the paragon board. I also realized I'm far too rusty for this to be my first problem back 😅
Given a graph containing a starting city, a number of cities with their respective populations, and miles between cities, as well as an integer i corresponding to the resources available to pave i miles of road, generate the subgraph such that the greatest number of people can be reached by paved road from your starting city. TL;DR: generate the connected subgraph containing the starting node (I guess it'd end up being a tree) with maximum total node weight and total edge weight less than or equal to i.
Seems like this is going to be one of those things that at face value is easy but then turns out to be NP-hard. But again, hella rusty, so figured I'd ask the folks here 😅
I'm fighting a O(n!) problem.
Things I tried.
Try an evolutionary code instead. Fail. Always runs into useless side minima. (And I have no idea how to avoid that.)
Improve the code. Yay, brought the inner loop from n^2 to n*(log n)^2. But for my n=10-20, no actual improvement (more array operations needed, or whatever the reason).
The next step would have been actual parallel code, but alas, I'm no longer a student and can only parallelize by running my second computer, both of them being rather sluggish anyway.
Which brought me to an idea. If I would find 1000 gullible twits^W generous benefactors to run my code in parallel I could go about 3 numbers up with my n and maybe finally see the pattern. (A few subpatterns I already utilized, e.g. if the first number of the permutation is i, the second will be i+1 or i-1 for the optimum.) And with 1000000, I surely will crack it.
Which brings me to my question. There are numerous projects like Seti@Home, which parallelize tasks for anyone to join. Now, finding alien life is probably more interesting than analyzing the worst runtime of the Splay algorithm :-), but is there any chance I can kick off my own project as a poor nobody?
I was working on a computer science problem that requires me to calculate and keep track of the average of an array of values. The first approach is what came to my mind first. However, the autograder for the problem did not accept the first approach, no matter how many little things I tried. The second approach, however, seems to work. I just can't figure out why.
The main difference I can see is that one keeps track of only the average, and the other keeps track of the cumulative sum. I was reading online that keeping track of the cumulative sum and dividing by the new number of elements could be a better approach because the double will be more accurate. But would the difference be significant?
I don't know how big the test cases were, so maybe that's why the first one did not work. But keeping a double average, and multiplying the number of old elements just makes sense - at least mathematically. So, why does this not work when programming?
Everything else in the code is the same, from reading in data to printing. So, I think it should be this part that makes the difference. Could the inaccuracy due to not keeping track of the running sum and just using an average be the issue?
Thanks!
Approach 1 (Doesn't work):
double average = 0;
int numWealthy = 0;
for (int j = index; j >= 0; --j) {
    average = ((average * numWealthy) + people[j]) / (numWealthy + 1);
    if (average >= wealthLimit) {
        ++numWealthy;
    } else {
        break;
    }
}
Approach 2 (Works):
double cumulativeWealth = 0;
int potentialNumWealthy = 0;
for (int j = index; j >= 0; --j) {
    double newCumulativeWealth = cumulativeWealth + people[j];
    int newPotentialNumWealthy = potentialNumWealthy + 1;
    double newAverage = newCumulativeWealth / newPotentialNumWealthy;
    if (newAverage >= wealthLimit) {
        cumulativeWealth = newCumulativeWealth;
        potentialNumWealthy = newPotentialNumWealthy;
    } else {
        break;
    }
}
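A small illustration of the difference (with synthetic data, not the grader's): folding a value into a running average does a multiply and a divide on every step, so rounding error compounds, while the running-sum version divides once per step from inputs that carry less accumulated error. The gap is usually tiny, but a check against a strict threshold can be sensitive to it.

import random

random.seed(0)
values = [random.uniform(1e6, 1e7) for _ in range(10_000)]

avg = 0.0       # approach-1 style: keep only the running average
total = 0.0     # approach-2 style: keep the running sum
for n, v in enumerate(values, start=1):
    avg = (avg * (n - 1) + v) / n
    total += v

print(avg, total / len(values), abs(avg - total / len(values)))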
Hey guys.
I'm confused on how the calculation of low link values during a dfs works.
According to Geeks for Geeks and several other sources, the algorithm for calculating the low link value of a node u is as follows:
Let v be a child of u.
The algorithm makes sense to me, but I watched a video about it that gets a different result for this graph. According to the video, when doing the DFS in the marked order of visits, the low link value of every vertex will be zero.
But for me, following the algorithm described above, I obtain the values 0, 0, 0, 2, 1, 3, 4, respectively.
What is wrong? The video? My understanding?
Note that I am not interested in the stack version used by the Tarjan's Algorithm to find SCCs. I am just confused about the default computation for low link values.
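For reference, here is the generic computation I would expect (a sketch of the undirected, no-stack flavour, with my own variable names): low[u] starts at disc[u], takes min(low[u], low[v]) over tree children v, and min(low[u], disc[w]) over back edges (u, w) to already-visited vertices other than the parent. Without seeing the graph from the video I can't say which side slipped, but if every vertex lies on a cycle through the root (e.g. a single cycle), every low-link really does come out 0.

def low_links(adj):
    disc, low = {}, {}
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v not in disc:                  # tree edge
                dfs(v, u)
                low[u] = min(low[u], low[v])
            elif v != parent:                  # back edge to an earlier vertex
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return disc, low

# e.g. a 4-cycle: every vertex can reach the root, so every low-link is 0
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(low_links(cycle))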
This feels very NP-hard-like, but nonetheless I'll try asking anyway.
Given M marbles, C buckets, each bucket having capacity K, where M = C * K, find some distribution of the marbles that maximizes the number of colours that are shared. Colour A and colour B are considered to be shared if they appear together in the same bucket.
The marbles and their distribution may be seen before you start distributing them. That is to say, they may be added to the buckets in any order you desire.
An example:
Bucket1 : A C
Bucket2 : A C
Bucket3: B D
Bucket4: B D
This is M = 8, C = 4, K = 2
Obviously this is not optimal since C can be swapped with D to give a better distribution of colours.
Sometimes there might be multiple optimal solutions, and I'd prefer the optimum that "has a better spread", such that each pair of colours is close to the expected mean number of colours each colour connects to, but that might be harder to prove NP-hardness for, so I'm happy to drop that for now (and give more details in the comments if anyone requests them).
Can we prove this is NP-hard? No greedy algorithm I've written has reached the optimum so far (the best I've gotten is trying each available swap and repeating until I can't improve the score).
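Not an answer on the hardness (it does smell like a packing-flavoured NP-hard problem), but since swap-until-stuck keeps getting trapped, a small step up is the same swap move wrapped in simulated annealing so worsening swaps are occasionally accepted. This is only a sketch under my assumption that the score is the number of distinct colour pairs co-occurring in at least one bucket; all names and parameters are made up and the score is recomputed from scratch on each step:

import math, random
from itertools import combinations

def score(buckets):
    # number of distinct colour pairs that appear together in at least one bucket
    pairs = set()
    for bucket in buckets:
        pairs.update(frozenset(p) for p in combinations(set(bucket), 2))
    return len(pairs)

def anneal(marbles, C, K, steps=100_000, t0=2.0, t1=0.01, seed=0):
    rng = random.Random(seed)
    flat = list(marbles)
    rng.shuffle(flat)
    buckets = [flat[i * K:(i + 1) * K] for i in range(C)]
    cur = best = score(buckets)
    best_buckets = [bk[:] for bk in buckets]
    for s in range(steps):
        t = t0 * (t1 / t0) ** (s / steps)            # geometric cooling
        i, j = rng.sample(range(C), 2)
        a, b = rng.randrange(K), rng.randrange(K)
        buckets[i][a], buckets[j][b] = buckets[j][b], buckets[i][a]
        new = score(buckets)
        if new >= cur or rng.random() < math.exp((new - cur) / t):
            cur = new
            if cur > best:
                best, best_buckets = cur, [bk[:] for bk in buckets]
        else:
            buckets[i][a], buckets[j][b] = buckets[j][b], buckets[i][a]   # undo the swap
    return best_buckets, best

print(anneal(list("AACCBBDD"), C=4, K=2))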
public static objectReturn MergeandCountSplitInv(int[] C, int[] D) {
    int i = 0;
    int j = 0;
    int splitInvs = 0;
    int n = C.length + D.length;
    int[] B = new int[n];

    for (int k = 0; k < n; k++) {
        // Once one half is exhausted, copy the remainder of the other half.
        if (i >= C.length) {
            B[k] = D[j++];
        } else if (j >= D.length) {
            B[k] = C[i++];
        } else if (C[i] <= D[j]) {
            B[k] = C[i++];
        } else {
            // D[j] jumps ahead of every element still waiting in C,
            // so it forms a split inversion with each of them.
            B[k] = D[j++];
            splitInvs += C.length - i;
        }
    }

    return new objectReturn(B, splitInvs);
}
Hello guys,
I was solving a problem along the lines of: given Big-Oh and Theta bounds for the worst-case and best-case runtimes, is it possible to conclude O(_) for all inputs?
I couldn't wrap my head around what 'worst case and best case' means here. I thought Big-Oh and Theta apply to the algorithm in general, across all cases.