I have an algorithm problem. We have an int array, and we have to find whether two numbers in the array exist whose sum is equal to the MaxSum value.

Trying to find an optimal solution. I think I have found half of the solution, but it is still not the best one.

Here is the code:

With the current implementation, my best solution performs better than my brute-force solution. But if I can find a way to get rid of the sorting and binary search, I think it will be much faster.

Any ideas are welcome.

You cannot work out algorithms in code; you need to work them out on paper first. Write down what you have, on paper, and show it to us. See if you can work out the big‑O value of each algorithm.

What is happening with the binary search call?

Can you work out an algorithm which runs in linear time? Can you get it to work when you have two largest values in your array?


But you seem to be looking for two elements that add up to a particular value, not a maximum value.

For your brute force method you can halve the time by having the inner loop start at innerCount=outerCount+1 so you don't try b+a when you have already tried a+b.
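The halved brute force suggested above might be sketched like this (the class name and method signature are my assumptions; the loop counters follow the names used in the post):

```java
public class PairSum {
    // O(n^2) brute force, but each unordered pair is tried only once:
    // the inner loop starts at outerCount + 1, so b + a is never retried
    // after a + b has already been checked.
    public static boolean bruteForcePairSum(int[] arr, int targetSum) {
        for (int outerCount = 0; outerCount < arr.length - 1; outerCount++) {
            for (int innerCount = outerCount + 1; innerCount < arr.length; innerCount++) {
                if (arr[outerCount] + arr[innerCount] == targetSum) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

This halves the work compared with two full loops, though the complexity is still O(n^2).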


Besides what everyone has stated so far, this is also not a fair comparison. In the straw man case, it is measuring the full time. In the "improved" case, it is ignoring the time to copy and sort the array. Should this time not count? That is like saying that a plane is always faster than a car, for every case... because I am not measuring the time to get to the airport, the time to go through checkin, the time for security, etc. etc. etc.

Henry

fred rosenberger wrote:They could be the first two...

If the array is sorted in ascending order, the sum of the last two elements will always be the maximum sum of any two items. I agree that if there are repeated values, there may be other elements that add up to that number, but that won't change what the maximum sum is, and there is no reason to do any searching if you just use the final two. But again, I don't think that is what the OP is really looking for - the title should be changed to reflect what the real question is.

Fred Kleinschmidt wrote:

fred rosenberger wrote:They could be the first two...

If the array is sorted in ascending order...

You didn't say "sorted in ascending order". You just said "If you sort the elements".

:-)

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors

First, your term "MaxSum" is misleading - it implies you are trying to find the maximum sum of two elements. If you sort the elements, the maximum sum will be the sum of the last two elements of the sorted array.

Agreed. I will rename it to something relevant. I have already solved the other case you mentioned.

Don't run a small loop and use nanoTime on it. The optimisations in the JVM will make the timings inaccurate. Run the loop through several tens of thousands of assessments before you do your timings, then run the same loop 1000000 times.
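That warm-up advice might be applied like this (a sketch; the workload, loop counts, and class name are mine, chosen to keep the example small - it also puts the array copy and sort inside the timed region, per the fair-comparison point made elsewhere in this thread):

```java
import java.util.Arrays;
import java.util.Random;

public class Benchmark {
    // One "improved" run, with the array copy and the sort both inside
    // the timed region so they count towards the measurement.
    static long timedSortedCopy(int[] arr) {
        long start = System.nanoTime();
        int[] copy = arr.clone();
        Arrays.sort(copy);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(100).toArray();

        // Warm-up: give the JIT tens of thousands of runs before timing anything,
        // otherwise nanoTime() mostly measures interpreted or half-compiled code.
        for (int i = 0; i < 20_000; i++) {
            timedSortedCopy(data);
        }

        long total = 0;
        final int RUNS = 100_000;   // timed runs; the advice above suggests up to 1000000
        for (int i = 0; i < RUNS; i++) {
            total += timedSortedCopy(data);
        }
        System.out.println("average ns per sorted copy: " + total / RUNS);
    }
}
```

For serious measurements a harness such as JMH is the usual tool, but even this hand-rolled warm-up avoids the worst of the JIT distortion.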

I was just trying to create the logic part with a small value range, then test it with a larger value range.

Besides what everyone has stated so far, this is also not a fair comparison. In the straw man case, it is measuring the full time. In the "improved" case, it is ignoring the time to copy and sort the array.

In my actual code, both the array copy and sort are included in the time calculation. I just moved it out here.

Like I said, finding the first number by subtracting the current number from the given sum value looks fine. It is only finding the second number which is not too clear. Once I figure out how to get the second value, the solution should be linear time, as I am using a single loop.

The array sorting and binary search are not the solution; hence the question posted.

I am not that good with planning; I always solve problems when I am actually running the code. For me, it's an incremental process: solving it step by step.

s ravi chandran wrote:. . . In my actual code, both the array copy and sort are included in the time calculation. I just moved it out here. . . .

Please have a look at this FAQ. If you show us something different from what you are actually using, you will mislead us into giving advice which doesn't help you.

s ravi chandran wrote:. . . I am not that good with planning; I always solve problems when I am actually running the code. For me, it's an incremental process: solving it step by step.

That is a bad way to solve problems. You will spend much more time and effort on that sort of process than you would on thinking about the problem first. Thought and planning would lead you to an elegant solution.

You seem to be changing your requirements, and I am not certain what you want, but there is a simple and elegant solution which runs in linear time, which traverses the array once, which finds the largest and next‑to‑largest values and adds them. It can easily be enhanced to cope with arrays with two largest values which are the same, which your solution with a Set will not. Please show us what you have got now.
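The linear-time solution hinted at here isn't shown in the thread; one way to sketch the single-pass largest/next-to-largest idea (class and method names are mine):

```java
public class MaxPairSum {
    // Single traversal: track the largest and next-to-largest values seen.
    // A duplicated maximum (e.g. {5, 9, 9}) is handled correctly, which a
    // Set-based approach would lose by collapsing the two 9s into one.
    public static int maxPairSum(int[] arr) {
        if (arr.length < 2) {
            throw new IllegalArgumentException("need at least two elements");
        }
        int largest = Integer.MIN_VALUE;
        int second = Integer.MIN_VALUE;
        for (int x : arr) {
            if (x > largest) {        // new maximum; old maximum becomes runner-up
                second = largest;
                largest = x;
            } else if (x > second) {  // between runner-up and maximum, or equal to maximum
                second = x;
            }
        }
        return largest + second;
    }
}
```

One array traversal, no sorting, no extra data structure: O(n) time and O(1) space.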

Piet Souris wrote:Don't use a HashSet. And if you go the HashMap way, make a frequency table (see my earlier reply).

Agreed. And it only needs to be created once, no matter how many times you call

`findSum()`on a particular array.

Thereafter, any sum can be found - or not - in O(n) time (worst case) whether your array is sorted or not.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

The duplicate (e.g. 7+7) case is not an issue if we are only trying to find whether any pair of numbers summing to a target exists in the input. When the first 7 (or whatever half of the target is) is encountered, the corresponding bit is flipped, indicating its presence. When the second 7 is encountered, the slot is checked and found set, indicating the sum exists, and the function exits returning true.
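A sketch of the BitSet approach described above (class and method names are mine; per a later post in this thread, it assumes small non-negative values so the BitSet stays compact):

```java
import java.util.BitSet;

public class BitSetPairSum {
    public static boolean hasPairSum(int[] arr, int target) {
        if (target < 0) {
            return false; // no two non-negative values sum to a negative target
        }
        BitSet seen = new BitSet(target + 1);
        for (int x : arr) {
            if (x < 0 || x > target) {
                continue; // can never be half of a pair summing to target
            }
            if (seen.get(target - x)) {
                return true; // the complement was encountered earlier
            }
            seen.set(x); // flip the bit marking x's presence
        }
        return false;
    }
}
```

Note how the 7+7 case falls out naturally: the first 7 sets bit 7, and the second 7 finds it.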

By the variable name "MaxSum" it is possible that the OP is trying to solve a variant of the problem where the sum containing the highest value is requested, i.e. "Find and return the pair with the highest value that sums to 14 in this array" in which case (8,6) would be a greater pair than (7,7) since 8 > 7. To resolve this difficulty a second BitSet could be used to record any duplicate values should they be encountered.

The array might contain Integer.MAX_VALUE; The Bitset would then be quite large, and what about MIN_VALUE?

And I'm not sure I understand your reasoning why duplicates are not an issue. Although OP's intention is indeed not quite clear, the question is if there exists two numbers in the array such that ...

Well, let's wait if OP can give a definition that removes all remaining doubts.

The objective is to find whether a pair of numbers exists in an array whose sum is equal to a given number.

The time calculation is just to see the relative difference; it is not the objective of this problem. I just kept the code as I am using it, with the aforementioned change.

From my initial understanding, my brute method will take O(n^2) time, and I am trying to get the best method down to O(n). Considering that with the solution involving a Set I have to iterate the whole array once to populate it, apart from iterating the array to find the pairs, it may come to O(n) with some added constants.

I will check out the frequency table and BitSet.

Meanwhile, how would a frequency table be better than a HashSet? It stores the frequency of each number, right? Won't a direct hashset.contains(possibleNumSeq) return the result directly?

we have to find if two numbers in the array exist whose sum will be equal to the MaxSum

Leaves me wondering about it, because if you are looking for an

*equivalent*value then the name should be more like "matchSum".

Are we really matching "MaxSum" or are we supposed to look for two values whose sum is >= MaxSum?

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Piet Souris wrote:Not sure about Bitsets.

The array might contain Integer.MAX_VALUE; The Bitset would then be quite large, and what about MIN_VALUE?

And I'm not sure I understand your reasoning why duplicates are not an issue. Although OP's intention is indeed not quite clear, the question is if there exists two numbers in the array such that ...

Well, let's wait if OP can give a definition that removes all remaining doubts.

You're right that BitSet should not be used in either case you mention. It really is about the problem space. Whenever I have heard the question, it has typically been for small positive integers, in which case you would use about the same memory as one or two "long" variables: 64 or 128 bits.

s ravi chandran wrote:Okay, I see your point with the set. How about a list? It would also have the contains() method, right?

MaxSum is definitely the wrong term. I haven't found a suitable name so far. Maybe I can say it is MatchPairSum.

If it really is just a provided input

*x*and we want to find any two numbers that add up to

*x*in our input array, then I like to call it targetSum, as it is the sum we are "target"ting.

Piet Souris wrote:Well, say we have int[] array = {7}, and that we give the sum as 14. Does your question have a solution here? And when array = {7,7}, does that make any difference?

For array = {7} there are no pairs at all resulting from such an input, so that seems like a case you don't even have to worry about.

Also, I just realized that Integer.MAX_VALUE can (and should) just be ignored if it is encountered. In fact, for non-negative values, anything larger than the targetSum can't by definition be in a pair of values that adds up to targetSum. The target sum and zero would still be a valid pair, though.
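The single-pass Set solution discussed throughout this thread isn't reproduced in it; a sketch of the idea, including the pruning of values larger than targetSum described above (names are mine; the pruning assumes non-negative values):

```java
import java.util.HashSet;
import java.util.Set;

public class SetPairSum {
    // Single pass: for each value, check whether its complement was already
    // seen, otherwise remember the value. Duplicates work naturally: the
    // first 7 goes into the set, the second 7 finds it when the target is 14.
    public static boolean findSum(int[] arr, int targetSum) {
        Set<Integer> seen = new HashSet<>();
        for (int x : arr) {
            if (x > targetSum) {
                continue; // pruning: only valid when all values are non-negative
            }
            if (seen.contains(targetSum - x)) {
                return true; // x plus an earlier element hits the target
            }
            seen.add(x);
        }
        return false;
    }
}
```

It bails out as soon as a pair is found, so the full Set is only built in the no-match (worst) case.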

Sometimes I'm torn about how much information to give out. Ideally, I'd like to help you figure it out for yourself. I also find English frustrating in conveying an algorithm. By the time you are precise enough it becomes more difficult to understand. I prefer pseudo code but I feel it doesn't give people as much opportunity to puzzle it out for themselves. Oh well.

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Carey Brown wrote:

Very nice. However, it means rebuilding the Set for every call to

`findSum()`.

If you do as Piet suggests, and build a "frequency table" of the numbers in the array (my preference would be a

`LinkedHashMap<Integer, AtomicInteger>`), you only have to do it

*once*for any given array, which you can then throw away altogether, and the logic becomes

Your solution is, admittedly, much more elegant though.
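The frequency-table logic referred to above isn't shown in the thread; a hedged sketch using the suggested LinkedHashMap<Integer, AtomicInteger> might look like this (class and method names are mine):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class FrequencyPairSum {
    private final Map<Integer, AtomicInteger> freq = new LinkedHashMap<>();

    // Build the table once per array; findSum() can then be called any
    // number of times without touching the array again.
    public FrequencyPairSum(int[] arr) {
        for (int x : arr) {
            freq.computeIfAbsent(x, k -> new AtomicInteger()).incrementAndGet();
        }
    }

    public boolean findSum(int target) {
        for (Map.Entry<Integer, AtomicInteger> e : freq.entrySet()) {
            int x = e.getKey();
            int complement = target - x;
            if (x == complement) {
                // e.g. 7 + 7 == 14 needs at least two 7s in the array
                if (e.getValue().get() >= 2) {
                    return true;
                }
            } else if (freq.containsKey(complement)) {
                return true;
            }
        }
        return false;
    }
}
```

Building the table is a one-off O(n) cost; each findSum() call then iterates over at most the distinct values, with no updates.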

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

In Scott's topic on a functional way I gave my idea, using a HashMap and some predicates that are exactly as Winston described.

But it is very verbose, and is much less "O(n)" than the Set way.

To create a HashMap only once is irrelevant here. In Carey's algo you can simply create a Set once and clear it when searching for a new target.

Piet Souris wrote:To create a HashMap only once is irrelevant here...

We'll have to agree to disagree there, because it's entirely dependent on HOW the method is likely to be used. Carey's solution is beautifully elegant (cow from me too), but essentially procedural; whereas I think yours may be more "objective".

You could for example, create a

`SumFinder`object for an array, and then run

`findSum(value)`on it as many times as you like with no overhead ... and perhaps even "enhance" it to deal with sums of more than 2 numbers without changing the API.

Now you could also do that with Carey's solution, but there wouldn't be much point since all its "working information" is provided as parameters.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

Winston Gutkowski wrote:

Piet Souris wrote:To create a HashMap only once is irrelevant here...

We'll have to agree to disagree there, because it's entirely dependent on HOW the method is likely to be used. Carey's solution is beautifully elegant (cow from me too), but essentially procedural; whereas I think yours may be more "objective".

You could for example, create a SumFinder object for an array, and then run findSum(value) on it as many times as you like with no overhead ... and perhaps even "enhance" it to deal with sums of more than 2 numbers without changing the API.

Now you could also do that with Carey's solution, but there wouldn't be much point since all its "working information" is provided as parameters.

Winston

You're right! Not only is the "working information" provided as parameters, it also relies on discovering that there's a sum during its iteration through the loop. It never tracks how many of any number there are, only that a sum exists. A frequency table would be useful to answer the variations of this question like "which pair of numbers is the latest occurring in the array", earliest occurring, how many permutations of "4 + 4" are there, etc. Basically, a frequency table has wider applicability.

I think both are elegant. Really just depends on exactly what you're trying to solve.


Winston Gutkowski wrote:

Carey Brown wrote:

Very nice. However, it means rebuilding the Set for every call to findSum().

True but you wouldn't have to build the entire set (except for worst case), you'd bail out as soon as you found an answer.

P.S. Thanks for the cows, which also means I got a pie. My wife wants to know if it's a cow-pie.

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Carey Brown wrote:True but you wouldn't have to build the entire set (except for worst case), you'd bail out as soon as you found an answer.

Which will only happen when you get to the

*second*number of the sum - since the sum won't be "found" until at least one of its components is part of the

`Set`.

Which suggests that

*average*run time would be

`≈O(3n/4)`(that may be wrong; I'm no mathematician, as Piet will attest to ), whereas once you have a frequency table (

`O(n)`fixed), each call to

`findSum()`will take only

`O(n/2)`on average, and won't involve any updates.

As Scott says, which is best depends very much on how you're likely to use the method - and kudos for coming up with such an elegant solution.

Winston

Articles by Winston can be found here

Winston Gutkowski wrote:

O(3n/4)

O(n/2)

Both of these are

`O(n)`. Constant multipliers have no effect on complexity classes.

*The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.*

Stephan van Hulst wrote:Both of these are

O(n). Constant multipliers have no effect on complexity classes.

Yes, but O notation is generally used to work out

*scalability*. Here, we're discussing which "O(n)" might be better and why - rather like what the best key density is for a skiplist, or what load factor is best for a

`HashMap`.

In both cases there are tradeoffs of space and time, so there's no "right" answer; merely ones that work best for applicable situations.

Winston

Articles by Winston can be found here

e.g.

Input Array {1,2,3,4,1,2,3,4,0,0}

Map {

0 -> {8,9}

1 -> {0,4}

2 -> {1,5}

3 -> {2,6}

4 -> {3,7}

}

This probably seems somewhat over the top, but it could make sense for complicated real-world scenarios like figuring out unique combinations. For example, I can quickly determine from this map that unique combinations which add to 2 are found at index pairs {0,4}, {1,8}, and {5,9}. It's hard to imagine why you'd want to do that in the case of bare integers and their sums, but there are real situations where this makes perfect sense.

For example, let's imagine you had a small company that gives hourly tours of the Parthenon between the hours of 8am and 6pm. 8am-6pm could become an array of length 10, where index 0 represents 8am-9am, index 1 represents 9am-10am, etc.

Let's say there are three tour guides on staff all day. Only one of them should give a tour at any given time. None of them can ever give the hour-long tour twice in a row, because the next hour's tour needs to start while they are still finishing up. Not to mention they need to take a break after each hour-long tour. So your array of who is giving each hour's tour turns into something like:

{30271, 40568, 10982, 30271, 40568, 30271, 40568, 10982, 30271, 10982}

Now, suppose you want to know which two staff members will be at the tour desk each hour to cover answering the phones and signing people up for tours.

You would construct the map from this array:

Map {

30271 -> {0, 3, 5, 8}

40568 -> {1, 4, 6}

10982 -> {2, 7, 9}

}

And then you could easily find out from the map which hours each person is covering the desk using set operations on the map values.

For example, at hour "0" (8am-9am) we only do one iteration through the map entries, calling !value.contains(0) to determine that 40568 and 10982 are covering the desk. Etc..
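Building the value-to-indices map and the desk-coverage query described above might be sketched like this (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexMap {
    // Build a map from each value to the ordered list of indices where it occurs.
    public static Map<Integer, List<Integer>> indexMap(int[] arr) {
        Map<Integer, List<Integer>> map = new HashMap<>();
        for (int i = 0; i < arr.length; i++) {
            map.computeIfAbsent(arr[i], k -> new ArrayList<>()).add(i);
        }
        return map;
    }

    // Who is covering the desk during a given hour slot? Everyone whose
    // index list does NOT contain that hour, i.e. who is not giving a tour.
    public static List<Integer> atDesk(Map<Integer, List<Integer>> schedule, int hour) {
        List<Integer> free = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : schedule.entrySet()) {
            if (!e.getValue().contains(hour)) {
                free.add(e.getKey());
            }
        }
        return free;
    }
}
```

With the staff array from the post, atDesk(schedule, 0) yields the two guides not giving the 8am tour.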

In other words it seems there could be a lot of problems you could solve by indexing the array this way.