问题描述:

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.

The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)

Clumping means that there's a higher probability that the next number will be close rather than far.

Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4

I suspect this would be more obvious with a larger stream, but difficult to enter by hand.

Update:

I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:

- Non-repeating
- Non-Tracking
- "Clumped"

Cascade a few LFSRs with periods smaller than you need, combining them to get a result such than the fastest changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will obviously generate 3 clumped values, then jump and general another 3 clumped values. Use something other than multiplication ( such as mixing some of the bits of the higher period registers ) or different periods to make the clump spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.

By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.

One way to get "clumpy" numbers would be to use a normal distribution.

You start the random list with your "initial" random value, then you generate a random number with the mean of the previous random value and a constant variance, and repeat as necessary. The overall variance of your entire list of random numbers should be approximately constant, but the "running average" of your numbers will drift randomly with no particular bias.

```
>>> r = [1]
>>> for x in range(20):
r.append(random.normalvariate(r[-1], 1))
>>> r
[1, 0.84583267252801408, 0.18585962715584259, 0.063850022580489857, 1.2892164299497422,
0.019381814281494991, 0.16043424295472472, 0.78446377124854461, 0.064401889591144235,
0.91845494342245126, 0.20196939102054179, -1.6521524237203531, -1.5373703928440983,
-2.1442902977248215, 0.27655425357702956, 0.44417440706703393, 1.3128647361934616,
2.7402744740729705, 5.1420432435119352, 5.9326297626477125, 5.1547981880261782]
```

I know it's hard to tell by looking at the numbers, but you can *sort of* see that the numbers clump together a little bit - the 5.X's at the end, and the 0.X's on the second row.

If you need only integers, you can just use a very large mean and variance, and truncate/divide to obtain integer output. Normal distributions by definition are a continuous distribution, meaning all real numbers are potential output - it is not restricted to integers.

Here's a quick scatter plot in Excel of 200 numbers generated this way (starting at 0, constant variance of 1):

scatter data http://img178.imageshack.us/img178/8677/48855312.png

Ah, I just read that you want non-repeating numbers. No guarantee of that in a normal distribution, so you might have to take into account some of the other approaches others have mentioned.

I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:

```
RANGE = (1..1000)
CLUMP_ODDS = .5
CLUMP_DIST = 10
last = rand(RANGE)
while still_want_numbers
if rand(CLUMP_ODDS) # clump!
next = last + rand(CLUMP_DIST) - (CLUMP_DIST / 2) # do some boundary checking here
else # don't clump!
next = rand(RANGE)
end
print next
last = next
end
```

It's a little rudimentary, but would something like that suit your needs?

In the range [0, 10] the following should give a uniform distribution. `random()`

yields a (pseudo) random number `r`

with `0 <= r < 1`

.

```
x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10
```

You can get your desired behavior by delinearizing `random()`

- for example `random()^k`

will be skewed towards small numbers for `k > 1`

. An possible function could be the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd, if you use the following function... ;)

```
x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
```

How about (psuedo code)

```
// clumpiness static in that value retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0
method getNextvalue(int lastValue)
float r = rand(); // float from 0 to 1
int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness);
clumpiness += 0.1 * rand() ;
if (clumpiness >= 1.0) clumpiness -= 1.0;
// -----------------------------------------
return Round(lastValue + change);
```

Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.

For example, if you find 3 values *a,b,c* in the sequence such that *a*>*b* and *a*>*c*, then with some probability you could swap elements *a* and *b* or elements *a* and *c*.

**EDIT** in response to comment:

Yes, you could have a buffer on the stream that is whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible psuedo-random sequence.

Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?

This is the output of a small ruby program (explanations below):

```
#!/usr/bin/env ruby
require 'digest/md5'
$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10 # mixing factor (higher means less clumping)
def pseudo_random_bit(k, n)
Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1] & 1
end
def sequence(x)
h = $n/2
$k.times do |k|
# maybe exchange 1st with 2nd, 3rd with 4th, etc
x ^= pseudo_random_bit(k, x >> 1) if x < 2*h
# maybe exchange 1st with last
if [0, $n-1].include? x
x ^= ($n-1)*pseudo_random_bit(k, 2*h)
end
# move 1st to end
x = (x - 1) % $n
# maybe exchange 1st with 2nd, 3rd with 4th, etc
# (corresponds to 2nd with 3rd, 4th with 5th, etc)
x ^= pseudo_random_bit(k, h+(x >> 1)) if x < 2*(($n-1)/2)
# move 1st to front
x = (x + 1) % $n
end
x
end
puts (0..99).map {|x| sequence(x)}.join(', ')
```

The idea is basically to start with the sequence 0..n-1 and disturb the order by passing *k* times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.

A single number x can be mapped modulo n after *k* passes to any number of the interval [x-*k*, x+*k*] and is approximately binomial distributed around x. Pairs (x, x+1) of numbers are not independently modified.

As pseudo-random generator I used only the last of the 128 output bits of the hash function MD5, choose whatever function you want instead. Thanks to the clumping one won't get a "secure" (= unpredictable) random sequence.

~~For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp~~, and I hope some simple though experiments will shed some light. This is not formal proof by any means. Perhaps someone will shore it up.

So, I can generate a sequence that has some randomness easily:

Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.

For example:

if x_i = 6, x_(i+1) is random choice from (6+epsilon, some_other_real>6). This guarantees non-repeating, but at the cost that the distribution is monotonically increasing.

Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness *without* carrying state?

**Edit**: So after researching RBarryYoung's claim of "Linear Congruential *Generators*" (not differentiators... is this what RBY meant), and clearly, I was wrong! These sequences exist, and by necessity, any PRNG whose next number is dependent only on the current number and some global, non changing state can't have repeats within a cycle (after some initial burn it period).

Maybe you can chain together 2 or more LCGs in a similar manner described for the LSFRs described here. Incement the least-significant LCG with its seed, on full-cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repititions in the 'clumped' LstSig part you can randomly reseed the LCG on each full cycle.