# Efficient algorithm for string manipulation

best algorithm for this:

1. We have two strings s1 and s2. We want to subtract every character

that exists in s2 from s1.

String s1 = "Misinterpret";

String s2 = "interpret";

We want to write a method that will return string "Mis" (in any order

"siM", "iMs" etc. is acceptable);

2. The order of characters does not matter.

I know there are java String API's to do that, but we will assume in

our case that we are manipulating using basic arrays.

What is the best approach to do this and what is its efficiency in

Big(O) ?

Sheriff

But if you have to do it using arrays, here is one way:

Create a boolean array with one boolean value for each possible character value. Set every value to false initially.

Now, for every char in the first string, set the corresponding boolean value to true. Then for every char in the second string, set it to false.

Every char that now has a correspending boolean value of true is one that needs to be in your output.

Out of the head I'd say that this has a complexity of O(n) (where n is the sum of characters in the two strings).

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

Originally posted by Ilja Preuss:

Well, I'd probably put the characters into two Sets and apply a removeAll operation.

But if you have to do it using arrays, here is one way:

Create a boolean array with one boolean value for each possible character value. Set every value to false initially.

Now, for every char in the first string, set the corresponding boolean value to true. Then for every char in the second string, set it to false.

Every char that now has a correspending boolean value of true is one that needs to be in your output.

Out of the head I'd say that this has a complexity of O(n) (where n is the sum of characters in the two strings).

grab the first String and push it into the 'map', then grab the second String and do removes. A final walk down the map would finish you off with the remainder.

That should still be O(n), well O(n + m), the two Strings, but basically the same.

My only question would be what if you are trying to eleminate a char that wasn't in the first String? I suppose you would just ignore it.

[added code tags - Ilja]

[ February 09, 2005: Message edited by: Ilja Preuss ]

James Carman, President<br />Carman Consulting, Inc.

Sheriff

Originally posted by Steven Bell:

I would make a 'map' out of the first string with a counter for each character.

Such a structure is typically called a Bag.

http://jakarta.apache.org/commons/collections/ implements it - James' algorithm would probably be simplified quite a bit by using it.

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

Originally posted by Ilja Preuss:

Such a structure is typically called a Bag.

http://jakarta.apache.org/commons/collections/ implements it - James' algorithm would probably be simplified quite a bit by using it.

Yes it would and if I actually wrote it for my use I would use Commons Collections.

James Carman, President<br />Carman Consulting, Inc.

Sheriff

Originally posted by James Carman:

Yes it would and if I actually wrote it for my use I would use Commons Collections.

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

Ranch Hand

Sheriff

Originally posted by Kirk Pepperdine:

which is O(n)

Only if n > m, isn't it?

Sheriff

[ February 28, 2005: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, *Twister*

Sheriff

- we have two inputs into the algorithm

- the two inputs are independent of each other, and

- the algorithm is linear over both inputs

That looks differently to me than O(n), but I might simply have stopped my studies too early, I guess...

On the other hand, if it weren't linearly dependent on both inputs, I guess you could consider them seperately, for example O(n^2,m^3). Then again, one of the inputs usually overshadows the other, and the problems we were given in school always had simple inputs. In a collection we dealt with just n, the number of items.

Sheriff

Originally posted by David Harkness:

Similarly, an O(xN + y) algorithm is also O(N).

In my understanding, that's only true if y is a constant, or if you can prove that y < xN for all N greater some value.

Originally posted by Ilja Preuss:

In my understanding, that's only true if y is a constant, or if you can prove that y < xN for all N greater some value.

This isn't quite right, either. If we assume x is 1 and y is 2N then

y > xN, but O(xN+y) is still O(N). I think David meant x and y

to be constant, though.

I think it helps to keep in mind that the big O notation describes

categories of behavior (e.g., linear, quadratic, logarithmic, etc.).

It's pretty easy to determine which category by recalling your first

year calculus, in particular L'Hospital's Rule.

Cheers,

Glenn

Glenn Murray

Author of Yo Soy Una Vaca De Hoy

Originally posted by Ilja Preuss:

In my understanding, that's only true if y is a constant, or if you can prove that y < xN for all N greater some value.

y is constant wrt N. If y is dependent of some other factor, then that factor should be considered as an input to your functions

So, function funcA takes time T(funcA) = an + y, and y = bm (where a and b are constants) then T= an + bm, and m is an input to your function. If y

*is*dependent on n, then you are expressing T in a wrong way. The correct way to express T is this

T = a(p)n^p + a(p-1)n^(p-1) + ......... a(1)n + a(0) + b(p)m^p + b(p-1)b^(p-1) + ......... b(1)m + b(0)

where n and m are inputs to your function and a(p), a(p-1)....a(1), a(0), b(p), b(p-1)....b(1), b(0) are constants

To simplify computing performance, we always take highest order of the equation. So, if your T is

T = 2n^2 + 10 n + 20 + 5m + 50

for a really large n, 2n^2 will be much larger than other parts of the equation. So, large that 10n + 20 + 5m + 50 becomes negligible

So, for large n, T = 2n^2

When we denote the Order, we ignore the coefficient of that equation . So,

O(funcA) = n^2

[ March 01, 2005: Message edited by: Jayesh Lalwani ]

Yes, I could have been more clear. I had just finished watching Shaun of the Dead and had a difficult time getting my mind into a mathematical frame.Originally posted by Glenn Murray:

I think David meant x and y to be constant, though.

Jayesh, thank you for posting a much more rigorous description.

Sheriff