posted 12 years ago

Hello,

There is a technique for implementing a multiplicative linear congruential random number generator

x[n+1] = ( a * x[n] ) mod m (eq. 1)

in 32 bit integers that eliminates the potential for overflow in the intermediate product (a * x[n]), known as Schrage's method, and popularized in Numerical Recipes in C

http://www.library.cornell.edu/nr/bookcpdf/c7-1.pdf

Briefly:

*********************************

Schrage�s algorithm is based on an approximate factorization of m,

m = aq + r, i.e., q = [m/a], r = m mod a (7.1.4)

with square brackets denoting integer part. If r is small, specifically

r < q, and 0 < z < m - 1,

it can be shown that both a(z mod q) and r[z/q]lie in the range

0,...,m - 1, and that

az mod m = a(z mod q) - r[z/q] if it is = 0, (7.1.5)

a(z mod q) - r[z/q] + m otherwise

********************************

Applying Schrage to the LCG, in C code, Numerical Recipes achieves

x = a * (x - (x / q) q) - r (x / q) (eq. 2)

by substituting the modulo arithmetic equivalence

x mod q = x - q[x/q]

into a preceding step, which is

x = a * (x % q) - r * (x / q) (eq. 3)

which you get by applying Numerical Recipe's (7.1.5) to the LCG (eq. 1)

So, why keep going past (eq. 3) ??

Doing so replaces the mod operator in (eq. 3) with a subtraction, a multiplication, and a division, in (eq. 2).

Is mod that expensive??

Which expression is faster in Java? Note that this code can be iterated up to near 10^18 repetitions, with a combined pair of certain such LCGs, so every cycle counts bigtime.

Thanks!!

Paul

There is a technique for implementing a multiplicative linear congruential random number generator

x[n+1] = ( a * x[n] ) mod m (eq. 1)

in 32 bit integers that eliminates the potential for overflow in the intermediate product (a * x[n]), known as Schrage's method, and popularized in Numerical Recipes in C

http://www.library.cornell.edu/nr/bookcpdf/c7-1.pdf

Briefly:

*********************************

Schrage�s algorithm is based on an approximate factorization of m,

m = aq + r, i.e., q = [m/a], r = m mod a (7.1.4)

with square brackets denoting integer part. If r is small, specifically

r < q, and 0 < z < m - 1,

it can be shown that both a(z mod q) and r[z/q]lie in the range

0,...,m - 1, and that

az mod m = a(z mod q) - r[z/q] if it is = 0, (7.1.5)

a(z mod q) - r[z/q] + m otherwise

********************************

Applying Schrage to the LCG, in C code, Numerical Recipes achieves

x = a * (x - (x / q) q) - r (x / q) (eq. 2)

by substituting the modulo arithmetic equivalence

x mod q = x - q[x/q]

into a preceding step, which is

x = a * (x % q) - r * (x / q) (eq. 3)

which you get by applying Numerical Recipe's (7.1.5) to the LCG (eq. 1)

So, why keep going past (eq. 3) ??

Doing so replaces the mod operator in (eq. 3) with a subtraction, a multiplication, and a division, in (eq. 2).

Is mod that expensive??

Which expression is faster in Java? Note that this code can be iterated up to near 10^18 repetitions, with a combined pair of certain such LCGs, so every cycle counts bigtime.

Thanks!!

Paul

Paul Fenerty

Greenhorn

Posts: 23

posted 12 years ago

They also replace (x/q), which appears twice in (eq. 2) with another variable, and so there is a savings there, evaluating (x/q) only the once per iteration, in a separate expression.

Javaworld claims:

"divide remains the slow operation on most if not all architectures."

in

http://www.javaworld.com/javaworld/jw-04-1997/jw-04-optimize.html

Anybody know the relative efficiency of * / and mod ??

Thanks again!

Javaworld claims:

"divide remains the slow operation on most if not all architectures."

in

http://www.javaworld.com/javaworld/jw-04-1997/jw-04-optimize.html

Anybody know the relative efficiency of * / and mod ??

Thanks again!

posted 12 years ago

If that question was bothering me, as a first step I would write some test code and then use the javap tool to look at the bytecodes written by the compiler.

Bill

Bill

Paul Fenerty

Greenhorn

Posts: 23

posted 12 years ago

Thanks for that.

The new info that javap provides is the following:

* -> imul

/ -> idiv

% -> irem

Digging around, I (only) found a lisp implementation (ACL2), by J. Strother Moore:

http://www.cs.utexas.edu/users/moore/

that looks like this:

http://www.cs.utexas.edu/users/moore/publications/marktoberdorf-02/m5.lisp

-----------------------------------------------------------------------------

; (IDIV) Instruction

(defun execute-IDIV (inst th s)

(modify th s

c (+ (inst-length inst) (pc (top-frame th s)))

:stack (push (int-fix

(truncate (top (pop (stack (top-frame th s))))

(top (stack (top-frame th s)))))

(pop (pop (stack (top-frame th s)))))))

; -----------------------------------------------------------------------------

; (IMUL) Instruction

(defun execute-IMUL (inst th s)

(modify th s

c (+ (inst-length inst) (pc (top-frame th s)))

:stack (push (int-fix

(* (top (pop (stack (top-frame th s))))

(top (stack (top-frame th s)))))

(pop (pop (stack (top-frame th s)))))))

; -----------------------------------------------------------------------------

; (IREM) Instruction

(defun execute-IREM (inst th s)

(let* ((val1 (top (pop (stack (top-frame th s)))))

(val2 (top (stack (top-frame th s))))

(result (- val1 (* (truncate val1 val2) val2))))

(modify th s

c (+ (inst-length inst) (pc (top-frame th s)))

:stack (push result

(pop (pop (stack (top-frame th s))))))))

; -----------------------------------------------------------------------------

Don't know how typical this is among JVMs, but irem here is busier than the other two.

Taking some data on a P3, three samples of 2 x 10^10 iterations across each of the three versions of Schrage's version of Lehmer's mLCG come in as follows, and confirm that the modulo expression is the slowest approach:

**************

x = a * (x - (x / q) * q) - r * (x / q); // (eq. 2)

8m 44s

**************

int k = x / q;

x = a * (x - k * q) - r * k; // (eq. 2b)

8m 43s

**************

x = a * (x % q) - r * (x / q); // (eq. 3)

9m 50s

... so looks like they got it right in Numerical Recipes.

The new info that javap provides is the following:

* -> imul

/ -> idiv

% -> irem

Digging around, I (only) found a lisp implementation (ACL2), by J. Strother Moore:

http://www.cs.utexas.edu/users/moore/

that looks like this:

http://www.cs.utexas.edu/users/moore/publications/marktoberdorf-02/m5.lisp

-----------------------------------------------------------------------------

; (IDIV) Instruction

(defun execute-IDIV (inst th s)

(modify th s

c (+ (inst-length inst) (pc (top-frame th s)))

:stack (push (int-fix

(truncate (top (pop (stack (top-frame th s))))

(top (stack (top-frame th s)))))

(pop (pop (stack (top-frame th s)))))))

; -----------------------------------------------------------------------------

; (IMUL) Instruction

(defun execute-IMUL (inst th s)

(modify th s

c (+ (inst-length inst) (pc (top-frame th s)))

:stack (push (int-fix

(* (top (pop (stack (top-frame th s))))

(top (stack (top-frame th s)))))

(pop (pop (stack (top-frame th s)))))))

; -----------------------------------------------------------------------------

; (IREM) Instruction

(defun execute-IREM (inst th s)

(let* ((val1 (top (pop (stack (top-frame th s)))))

(val2 (top (stack (top-frame th s))))

(result (- val1 (* (truncate val1 val2) val2))))

(modify th s

c (+ (inst-length inst) (pc (top-frame th s)))

:stack (push result

(pop (pop (stack (top-frame th s))))))))

; -----------------------------------------------------------------------------

Don't know how typical this is among JVMs, but irem here is busier than the other two.

Taking some data on a P3, three samples of 2 x 10^10 iterations across each of the three versions of Schrage's version of Lehmer's mLCG come in as follows, and confirm that the modulo expression is the slowest approach:

**************

x = a * (x - (x / q) * q) - r * (x / q); // (eq. 2)

8m 44s

**************

int k = x / q;

x = a * (x - k * q) - r * k; // (eq. 2b)

8m 43s

**************

x = a * (x % q) - r * (x / q); // (eq. 3)

9m 50s

... so looks like they got it right in Numerical Recipes.

Paul Fenerty

Greenhorn

Posts: 23

posted 12 years ago

um, make that 2 x 10^9 iterations ...

posted 12 years ago

Maybe implement both and then run some profiling tests, i.e. how many calculations it can make in 10 min using each method?

It is sorta covered in the JavaRanch Style Guide. |