programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
• Campbell Ritchie
• Jeanne Boyarsky
• Bear Bibeault
• Knute Snortum
• Liutauras Vilda
Sheriffs:
• Tim Cooke
• Devaka Cooray
• Paul Clapham
Saloon Keepers:
• Tim Moores
• Frits Walraven
• Ron McLeod
• Ganesh Patekar
• salvin francis
Bartenders:
• Tim Holloway
• Carey Brown
• Stephan van Hulst

Search for max sum of number pair in an array

Ranch Hand
Posts: 579
6
Hi,

I have a algorithm problem.  We have an int array, we have to find if two numbers in the array exist whose sum will be equal to the MaxSum value.

Trying to find an optimal solution. I think I have found half of the solution. but it is still not the best solution.

Here is the code:

With the current implementation, my best solution performs better than my brute solution. But if I can find a way to get rid of this sorting and binary search, I think it will be much more faster.

Any ideas are welcome.

Marshal
Posts: 62231
193
Don't run a small loop and use nanoTime on it. The optimisations in the JVM will make the timings inaccurate. Run the loop through several tens of thousands of assessments before you do your timings, then run the same loop 1000000×. That will give you a better estimate of the timings.
You cannot work out algorithms in code; you need to work them out on paper first. Write down what you have, on paper, and show us it. See if you can work out the big‑O value of each algorithm.
What is happening with the binary search call?

Can you work out an algorithm which runs in linear time? Can you get it to work when you have two largest values in your array?

Bartender
Posts: 639
9
• 1
First, your term "MaxSum" is misleading - it implies you are trying to find the maxim sum of two elements. If you sort the elements, the maximum sum will be the sum of the last two elements of the sorted array.

But you seem to be looking for two elements that add up to a particular value, not a maximum value.
For your brute force method you can halve the time by having the inner loop start at innerCount=outerCount+1 so you don't try b+a when you have already tried a+b.

lowercase baba
Posts: 12628
50
• 1

Fred Kleinschmidt wrote: If you sort the elements, the maximum sum will be the sum of the last two elements of the sorted array.

They could be the first two...

author
Posts: 23811
140
• 1

Besides what everyone has stated so far, this is also not a fair comparison. In the straw man case, it is measuring the full time. In the "improved" case, it is ignoring the time to copy and sort the array. Should this time not count? That is like saying that a plane is always faster than a car, for every case... because I am not measuring the time to get to the airport, the time to go through checkin, the time for security, etc. etc. etc.

Henry

Fred Kleinschmidt
Bartender
Posts: 639
9

fred rosenberger wrote:They could be the first two...

If the array is sorted in ascending order, the sum of the last two elements will always be the maximum sum of any two items. I agree that if there are repeated values, there may be other elements that add up to that number, but that won't change what the maximum sum is, and there is  no reason to do any searching if you just use the final two. But again, I don't think that is what the OP is really looking for - the title should be changed to reflect what the real question is.

Master Rancher
Posts: 3036
106
And, in the binary search version, beware not to count an element twice.
For instance, if there is an element 7, and maxsum = 14, then your solution would be 7 + 7, even though 7 might be present only once.

fred rosenberger
lowercase baba
Posts: 12628
50

Fred Kleinschmidt wrote:

fred rosenberger wrote:They could be the first two...

If the array is sorted in ascending order...

You didn't say "sorted in ascending order".  you just said "If you sort the elements".

:-)

Fred Kleinschmidt
Bartender
Posts: 639
9

fred rosenberger wrote:[You didn't say "sorted in ascending order".  you just said "If you sort the elements".

Well, I was just looking at the OP's code for his BestSolution method, which calls Arrays.sort(), which is an ascending sort. :-)

Campbell Ritchie
Marshal
Posts: 62231
193
Shall we wait for OP to come back with the updated solution?

s ravi chandran
Ranch Hand
Posts: 579
6
Sorry for delayed response.

First, your term "MaxSum" is misleading - it implies you are trying to find the maxim sum of two elements. If you sort the elements, the maximum sum will be the sum of the last two elements of the sorted array

Agreed. I will rename it to something relevant. I have already solved the other case you mentioned.

Don't run a small loop and use nanoTime on it. The optimisations in the JVM will make the timings inaccurate. Run the loop through several tens of thousands of assessments before you do your timings, then run the same loop 1000000

Was just trying to create the logic part with small value, then test it will last value range.

Besides what everyone has stated so far, this is also not a fair comparison. In the straw man case, it is measuring the full time. In the "improved" case, it is ignoring the time to copy and sort the array.

In my actual code, both the array copy and sort are included in time calculation. I just moved it out here.

Like I said, Finding the first number by subtracting the current number with given sum value, looks fine.  It is only finding the second number which is not looking too clear.  Once I figure out how to get the second value, the solution should be linear time. As I am having a single loop.
The array sorting and binary search is not the solution. Hence the question posted.

I am not that good with planning, I always solve problems when I am actually running it. For me, its an incremental proces. Solving it step by step.

s ravi chandran
Ranch Hand
Posts: 579
6
One of my friend suggested a solution to find the second value.

I have to keep a hashset( or hashmap) with all the values in the array. Now when I am searching for possibleNumSeq, I just need to do a lookup in this set or map. It is present, the problem is solved.

Piet Souris
Master Rancher
Posts: 3036
106
Don't use a HashSet. And if you go the HashMap way, make a frequency table (see my earlier reply).

Campbell Ritchie
Marshal
Posts: 62231
193

s ravi chandran wrote:. . .

Besides what everyone has stated so far, this is also not a fair comparison. In the straw man case, it is measuring the full time. In the "improved" case, it is ignoring the time to copy and sort the array.

In my actual code, both the array copy and sort are included in time calculation. I just moved it out here.  . . .

Please have a look at this FAQ. If you show us something different from what you are actually using, you will mislead us into giving advice which doesn't help you.

I am not that good with planning, I always solve problems when I am actually running it. For me, its an incremental proces. Solving it step by step.

A bad way to solve programs. You will spend much more time and effort on that sort of process than you would on thinking about the problem first. Thought and planning would lead you to an elegant solution.

You seem to be changing your requirements, and I am not certain what you want, but there is a simple and elegant solution which runs in linear time, which traverses the array once, which finds the largest and next‑to‑largest values and adds them. It can easily be enhanced to cope with arrays with two largest values which are the same, which your solution with a Set will not. Please show us what you have got now.

Bartender
Posts: 10575
66

Piet Souris wrote:Don't use a HashSet. And if you go the HashMap way, make a frequency table (see my earlier reply).

Agreed. And it only needs to be created once, no matter how many times you call findSum() on a particular array.

Thereafter, any sum can be found - or not - in O(n) time (worst case) whether your array is sorted or not.

Winston

Ranch Hand
Posts: 239
12
Rather than a HashMap or HashSet, a BitSet is the best data structure in my opinion since it will usually use less memory.

The duplicate (e.g. 7+7) case is not an issue if we are only trying to find that any pair of numbers summing to a target exists in the input. When the first 7 is encountered, (or whatever half of the target is) the corresponding bit is flipped indicating its presence. When the second 7 is encountered, the slot is checked and returns true, indicating the sum exists. The function then exits returning true.

By the variable name "MaxSum" it is possible that the OP is trying to solve a variant of the problem where the sum containing the highest value is requested, i.e. "Find and return the pair with the highest value that sums to 14 in this array" in which case (8,6) would be a greater pair than (7,7) since 8 > 7. To resolve this difficulty a second BitSet could be used to record any duplicate values should they be encountered.

Piet Souris
Master Rancher
Posts: 3036
106

The array might contain Integer.MAX_VALUE; The Bitset would then be quite large, and what about MIN_VALUE?

And I'm not sure I understand your reasoning why duplicates are not an issue. Although OP's intention is indeed not quite clear, the question is if there exists two numbers in the array such that ...
Well, let's wait if OP can give a definition that removes all remaining doubts.

s ravi chandran
Ranch Hand
Posts: 579
6
okay, let me re-define the problem

The objective is to find if a pair of number exists in an array, whose sum is equal to a given number.

Time calculation is to just see relative difference. It is not the objective of this problem. I just kept the code as I am using it, with aforementioned change.

From my initial understanding, my brute method will have time O(n^2) and am trying to solve best method with O(n).  Considering that with the solution involving Set, I have to iterate the whole array once to populate it, apart from iterating array to find the pairs. It may come to O(n) with some added constants.

I will check out frequency table and Bitset.

Meanwhile, how is frequency table be better than hashset ? It store the frequency of each numbers, right? Won't a direct hashset.contains ( possibleNumSeq ) return me direct result?

Bartender
Posts: 5303
55

we have to find if two numbers in the array exist whose sum will be equal to the MaxSum

Leave me to wonder about them because if you are looking for an equivalent value then the name should be more like, "matchSum".
Are we really matching "MaxSum" or are we supposed to look for two values whose sum is >= MaxSum?

Piet Souris
Master Rancher
Posts: 3036
106
Well, say we have int[] array = {7}, and that we give the sum as 14. Does your question have a solution here? And when array = {7,7}, does that make any difference?
Now, if there is no solution to the first question, can you tell why a HashSet would not be suitable?

s ravi chandran
Ranch Hand
Posts: 579
6
Okay, I see your point with the set. How about a list? It would also have the contains() method, right?

MaxSum is definitely a wrong term. Did not get a suitable name so far. Maybe I can say it is MatchPairSum.

Scott Shipp
Ranch Hand
Posts: 239
12

Piet Souris wrote:Not sure about Bitsets.

The array might contain Integer.MAX_VALUE; The Bitset would then be quite large, and what about MIN_VALUE?

And I'm not sure I understand your reasoning why duplicates are not an issue. Although OP's intention is indeed not quite clear, the question is if there exists two numbers in the array such that ...
Well, let's wait if OP can give a definition that removes all remaining doubts.

You're right that BitSet should not be used in either case you mention. It really is about the problem space. Whenever I have typically heard the question it is for small positive integers in which case you would typically use the same memory as one or two "long" variables, 64 or 128 bits.

s ravi chandran
Ranch Hand
Posts: 579
6
An array  int[] array = {7} would result in a false. This can be one of the base conditions to be applied before actual processing.

Fred Kleinschmidt
Bartender
Posts: 639
9
If it is not allowed to add a number to itself unless there are at least two instances of that number in the set, you can use a List and use indexOf() to see if the desired value is in the set, and lastIndexOf() to see if there is at least one other entry with the same value.

Scott Shipp
Ranch Hand
Posts: 239
12

s ravi chandran wrote:Okay, I see your point with the set. How about a list? It would also have the contains() method, right?

MaxSum is definitely a wrong term. Did not get a suitable name so far. Maybe I can say it is MatchPairSum.

If it really is just a provided input x and we want to find any two numbers that add up to x in our input array, then I like to call it targetSum, as it is the sum we are "target"ting.

Piet souris wrote:Well, say we have int[] array = {7}, and that we give the sum as 14. Does your question have a solution here? And when array = {7,7}, does that make any difference?

For array = {7} there are no pairs even resulting from such an input, so that seems like a case you don't even have to worry about.

Also I just realized that Integer.MAX_VALUE can (and should) just be ignored if it is encountered. In fact any value larger than the targetSum can't by definition be in a pair of values that will add up to targetSum. The target sum and zero would still be a valid pair though.

Carey Brown
Bartender
Posts: 5303
55
• 1

Scott Shipp
Ranch Hand
Posts: 239
12

Carey Brown wrote:

Exactly what I had in mind . . .

s ravi chandran
Ranch Hand
Posts: 579
6

Scott Shipp wrote:

Carey Brown wrote:

Exactly what I had in mind . . .

Thanks Carey. This looks like the solution. My solution would have been close to O(2n) + C. This one does it in O(n) + C.

Carey Brown
Bartender
Posts: 5303
55

Sometimes I'm torn about how much information to give out. Ideally, I'd like to help you figure it out for yourself. I also find English frustrating in conveying an algorithm. By the time you are precise enough it becomes more difficult to understand. I prefer pseudo code but I feel it doesn't give people as much opportunity to puzzle it out for themselves. Oh well.

Winston Gutkowski
Bartender
Posts: 10575
66

Carey Brown wrote:

Very nice. However, it means rebuilding the Set for every call to findSum().

If you do as Piet suggests, and build a "frequency table" of the numbers in the array (my preference would be a LinkedHashMap<Integer, AtomicInteger>), you only have to do it once for any given array, which you can then throw away altogether, and the logic becomes

Winston

Piet Souris
Master Rancher
Posts: 3036
106
Indeed, very much simpler that what I had in mind, and, contrary to what I stated, a Set is used as well... cow!

In Scotts topic on a functional way I gave my idea, using a HashMap and some predicates that are exactly as Winston described.
But it is very much verbose, and is much less "O(n)" than the Set way.

To create a HashMap only once is irrelevant here. In Careys algo you simply can simply create a Set once and clear it when searching for a new target.

Winston Gutkowski
Bartender
Posts: 10575
66

Piet Souris wrote:To create a HashMap only once is irrelevant here...

We'll have to agree to disagree there, because it's entirely dependent on HOW the method is likely to be used. Carey's solution is beautifully elegant (cow from me too), but essentially procedural; whereas I think yours may be more "objective".

You could for example, create a SumFinder object for an array, and then run findSum(value) on it as many times as you like with no overhead ... and perhaps even "enhance" it to deal with sums of more than 2 numbers without changing the API.

Now you could also do that with Carey's solution, but there wouldn't be much point since all its "working information" is provided as parameters.

Winston

Scott Shipp
Ranch Hand
Posts: 239
12

Winston Gutkowski wrote:

Piet Souris wrote:To create a HashMap only once is irrelevant here...

We'll have to agree to disagree there, because it's entirely dependent on HOW the method is likely to be used. Carey's solution is beautifully elegant (cow from me too), but essentially procedural; whereas I think yours may be more "objective".

You could for example, create a SumFinder object for an array, and then run findSum(value) on it as many times as you like with no overhead ... and perhaps even "enhance" it to deal with sums of more than 2 numbers without changing the API.

Now you could also do that with Carey's solution, but there wouldn't be much point since all its "working information" is provided as parameters.

Winston

You're right! Not only is the "working information" provided as parameters, it also relies on discovering that there's a sum during its iteration through the loop. It never tracks how many of any number there are, only that a sum exists. A frequency table would be useful to answer the variations of this question like "which pair of numbers is the latest occurring in the array", earliest occurring, how many permutations of "4 + 4" are there, etc. Basically, a frequency table has wider applicability.

I think both are elegant. Really just depends on exactly what you're trying to solve.

Carey Brown
Bartender
Posts: 5303
55
• 1

Winston Gutkowski wrote:

Carey Brown wrote:

Very nice. However, it means rebuilding the Set for every call to findSum().

True but you wouldn't have to build the entire set (except for worst case), you'd bail out as soon as you found an answer.

P.S. Thanks for the cows, which also means I got a pie. My wife wants to know if it's a cow-pie.

Winston Gutkowski
Bartender
Posts: 10575
66

Carey Brown wrote:True but you wouldn't have to build the entire set (except for worst case), you'd bail out as soon as you found an answer.

Which will only happen when you get to the second number of the sum - since the sum won't be "found" until at least one of its components is part of the Set.

Which suggests that average run time would be ≈O(3n/4) (that may be wrong; I'm no mathematician, as Piet will attest to ), whereas once you have a frequency table (O(n) fixed), each call to findSum() will take only O(n/2) on average, and won't involve any updates.

As Scott says. which is best depends very much on how you're likely to use the method - and kudos for coming up with such an elegant solution.

Winston

Bartender
Posts: 9558
189

Winston Gutkowski wrote:O(3n/4)
O(n/2)

Both of these are O(n). Constant multipliers have no effect on complexity cases.

Winston Gutkowski
Bartender
Posts: 10575
66

Stephan van Hulst wrote:Both of these are O(n). Constant multipliers have no effect on complexity cases.

Yes, but O notation is generally used to work out scalability. Here, we're discussing which "O(n)" might be better and why - rather like what the best key density is for a skiplist, or what load factor is best for a HashMap.

In both cases there are tradeoffs of space and time, so there's no "right" answer; merely ones that work best for applicable situations.

Winston

Scott Shipp
Ranch Hand
Posts: 239
12
It could be interesting to build a reverse-lookup map out of the array. For keys you'd use the values themselves. But for values in the map, you'd store a set of array indices. Taken together, this would give you a unique identity for each value in the array while allowing fast lookup for simpler cases (like "exists").

e.g.

Input Array {1,2,3,4,1,2,3,4,0,0}

Map {
0 -> {8,9}
1 -> {0,4}
2 -> {1,5}
3 -> {2,6}
4 -> {3,7}
}

This probably seems somewhat over the top but it could make sense for complicated real-world scenarios like figuring out unique combinations. For example, I can quickly determine from this map that unique combinations which add to 2 are found at index pairs (0,4}, {1,8}, and {5,9}. It's hard to imagine why you'd want to do that in the case of bare integers and their sums but there are real situations where this makes perfect sense.

For example, let's imagine you had a small company that gives hourly tours of the parthenon between the hours of 8am and 6pm. 8am-6pm could become an array of length 10 where index 0 represented 8am-9am, index 1 represents 9am-10am, etc.

Let's say there are three tour guides on staff all day. Only one of them should give a tour at any given time. None of them can ever give the hour-long tour twice in a row because the next hour's tour needs to start already when they are still finishing up. Not to mention they need to take a break after each hour-long tour. So your array of who is giving each hours tour turns into something like:

{30271, 40568, 10982, 30271, 40568, 30271, 40568, 10982, 30271, 10982}

Now you want to know which two staff will be at the tour desk each hour to cover answering the phones and signing people up for tours?

You would construct the map from this array:

Map {
30271 -> {0, 3, 5, 8}
40568 -> {1, 4, 6}
10982 -> {2, 7, 9}
}

And then you could easily find out from the map which hours each person is covering the desk using set operations on the map values.

For example, at hour "0" (8am-9am) we only do one iteration through the map entries, calling !value.contains(0) to determine that 40568 and 10982 are covering the desk. Etc..

In other words it seems there could be a lot of problems you could solve by indexing the array this way.

 Surfs up space ponies, I'm making gravy without this lumpy, tiny ad: RavenDB is an Open Source NoSQL Database thatâ€™s fully transactional (ACID) across your database https://coderanch.com/t/704633/RavenDB-Open-Source-NoSQL-Database