Problem description:

I'm trying to find a counterexample to the Pólya Conjecture, which should be somewhere in the 900 million range. I'm using a very efficient algorithm that doesn't even require any factorization (similar to a Sieve of Eratosthenes, but with even more information). So, a large array of ints is required.

The program is efficient and correct, but requires an array as large as the x I want to check up to (it checks all numbers from 2 to x). So, if the counterexample is in the 900 million range, I need an array with roughly that many entries. Java won't allow me anything over about 20 million. Is there anything I can possibly do to get an array that large?

Answer:

You may want to extend the maximum size of the JVM heap. You can do that with a command-line option.

I believe it is -Xmx3600m (3600 megabytes).
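
For example (PolyaSearch here is just a placeholder for your own main class):

java -Xmx3600m PolyaSearch

Keep in mind that a 32-bit JVM usually cannot grow its heap much past roughly 1.5-2 GB; for anything larger you need a 64-bit JVM.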

Answer:

Java will allow up to about 2 billion array entries. It's your machine (and your limited memory) that cannot handle such a large amount.

Answer:

Java arrays are indexed by int, so an array can't have more than 2^31 − 1 entries (there are no unsigned ints, and array lengths must be non-negative ints). So, the maximum size of an array is 2147483647 entries, which for a plain int[] consumes about 8589934588 bytes (≈ 8 GiB).

Thus, the int index is usually not the limitation, since you would run out of memory anyway.

In your algorithm, you should use a List (or a Map) as your data structure instead, and choose an implementation of List (or Map) that can grow beyond 2^31 entries. This can get tricky, since the "usual" implementations ArrayList (and HashMap) use arrays internally. You will have to implement a custom data structure, e.g. by using a 2-level array (an array of arrays); a minimal sketch follows. While you are at it, you can also try to pack the bits more tightly.
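
A minimal sketch of such a two-level structure, assuming plain int values (the class name, chunk size, and eager allocation are illustrative choices, not a fixed API):

// A long-indexed int "array" backed by fixed-size chunks, so no single
// Java array ever needs more than CHUNK entries.
class BigIntArray {
    private static final int CHUNK = 1 << 20;  // 1M ints per chunk (4 MB each)
    private final int[][] chunks;

    BigIntArray(long size) {
        int numChunks = (int) ((size + CHUNK - 1) / CHUNK);
        chunks = new int[numChunks][];
        for (int i = 0; i < numChunks; i++) {
            chunks[i] = new int[CHUNK];        // eager; could be allocated lazily
        }
    }

    int get(long index) {
        return chunks[(int) (index / CHUNK)][(int) (index % CHUNK)];
    }

    void set(long index, int value) {
        chunks[(int) (index / CHUNK)][(int) (index % CHUNK)] = value;
    }
}

For 900 million ints this still needs about 3.4 GB of heap; the two-level layout only removes the single-array length limit, not the memory requirement.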

Answer:

900 million 32-bit ints with no further overhead (and there will always be more overhead) would require a little over 3.35 GiB. The only way to get that much memory is with a 64-bit JVM (on a machine with at least 8 GB of RAM) or to use some disk-backed cache.

Answer:

If you don't need it all loaded in memory at once, you could segment it into files and store it on disk.

Answer:

What do you mean by "won't allow"? You are probably getting an OutOfMemoryError, so add more memory with the -Xmx command-line option.

Answer:

You could define your own class which stores the data in a 2D array whose dimensions would be closer to sqrt(n) by sqrt(n). Then use an index function to map a single logical index onto the two indices of the array (see the sketch below). This can be extended to more dimensions, as needed.

The main problem you will run into is running out of RAM. If you approach this limit, you'll need to rethink your algorithm or consider external storage (i.e. a file or database).
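
A minimal sketch of that index function, assuming a square layout (names and sizing are illustrative):

// n entries laid out on a roughly sqrt(n)-by-sqrt(n) grid.
class GridIntArray {
    private final int side;
    private final int[][] data;

    GridIntArray(long n) {
        side = (int) Math.ceil(Math.sqrt((double) n));
        data = new int[side][side];
    }

    int get(long i) {
        return data[(int) (i / side)][(int) (i % side)];
    }

    void set(long i, int v) {
        data[(int) (i / side)][(int) (i % side)] = v;
    }
}

This is the same trick as the two-level array above, just with square chunks.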

Answer:

If your algorithm allows it:

  • Compute it in slices which fit into memory (see the sketch after this list).

    You will have to redo the computation for each slice, but it will often be fast enough.

  • Use an array of a smaller numeric type, such as byte.
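
A minimal sketch of the slicing idea, using a byte[] buffer since that quarters the memory; fillSlice and scanSlice are hypothetical stand-ins for your actual sieve computation:

// Process [2, limit] in fixed-size slices, reusing one buffer.
class SliceSearch {
    static final int SLICE = 10_000_000;  // 10M entries per pass

    static void search(long limit) {
        byte[] buffer = new byte[SLICE];  // byte instead of int: 1/4 the memory
        for (long base = 2; base <= limit; base += SLICE) {
            long end = Math.min(base + SLICE - 1, limit);
            fillSlice(buffer, base, end); // hypothetical: (re)compute this slice's values
            scanSlice(buffer, base, end); // hypothetical: update running totals, test for a counterexample
        }
    }

    // Stubs standing in for the per-slice work.
    static void fillSlice(byte[] buf, long base, long end) { /* ... */ }
    static void scanSlice(byte[] buf, long base, long end) { /* ... */ }
}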

Answer:

For efficiently stored large arrays of primitives (boolean, byte, ..., double) I recommend our library JLargeArrays, available on GitHub (https://github.com/IcmVis/JLargeArrays). It stores arbitrarily large arrays provided sufficient memory is available, e.g. a 12 GB byte array on a 16 GB PC, and has been tested on Oracle and IBM JVMs with good multithreaded efficiency.

Answer:

I wrote a version of the Sieve of Eratosthenes for Project Euler which worked on chunks of the search space at a time. It processes the first 1M integers (for example), but keeps each prime number it finds in a table. After you've iterated over all the primes found so far, the array is re-initialised and the primes found already are used to mark the array before looking for the next one.

The table maps a prime to its 'offset' from the start of the array for the next processing iteration.

This is similar in concept (if not in implementation) to the way functional programming languages perform lazy evaluation of lists (although in larger steps). Allocating all the memory up-front isn't necessary, since you're only interested in the parts of the array that pass your test for primeness. Keeping the non-primes hanging around isn't useful to you.

This method also provides memoisation for later iterations over prime numbers. It's faster than re-scanning your sparse sieve data structure for the surviving entries every time.
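
A minimal sketch of this segmented approach (names and the window size are illustrative; here each prime's next offset is recomputed with a division rather than kept in a table, but it plays the same role):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Segmented Sieve of Eratosthenes: sieve [2, limit] one window at a time,
// so only SEGMENT flags are ever held in memory at once.
class SegmentedSieve {
    static final int SEGMENT = 1 << 20;  // ~1M numbers per window

    static List<Long> primesUpTo(long limit) {
        List<Long> primes = new ArrayList<>();      // the table of primes found so far
        boolean[] composite = new boolean[SEGMENT]; // reused for every window
        for (long low = 2; low <= limit; low += SEGMENT) {
            long high = Math.min(low + SEGMENT - 1, limit);
            Arrays.fill(composite, false);          // re-initialise the window
            // Mark multiples of every prime found in earlier windows.
            for (long p : primes) {
                if (p * p > high) break;
                for (long m = Math.max(p * p, ((low + p - 1) / p) * p); m <= high; m += p) {
                    composite[(int) (m - low)] = true;
                }
            }
            // Anything still unmarked in this window is a new prime.
            for (long n = low; n <= high; n++) {
                if (!composite[(int) (n - low)]) {
                    primes.add(n);
                    for (long m = n * n; m <= high; m += n) {
                        composite[(int) (m - low)] = true;
                    }
                }
            }
        }
        return primes;
    }
}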

Answer:

I second @sfossen's idea and @Aaron Digulla's. I'd go for disk access. If your algorithm can accept a List interface rather than a plain array, you could write an adapter from the List to a memory-mapped file.
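
A minimal sketch of the adapter's core, assuming int values (names are illustrative; a single MappedByteBuffer is int-indexed and capped at 2 GB, so the file is split across several mappings). Wrapping this in the List interface is then mostly boilerplate:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// An int "array" backed by a memory-mapped file.
class MappedIntArray implements AutoCloseable {
    private static final long INTS_PER_MAP = 1L << 28;  // 256M ints = 1 GB per mapping
    private final RandomAccessFile file;
    private final MappedByteBuffer[] maps;

    MappedIntArray(String path, long size) throws Exception {
        file = new RandomAccessFile(path, "rw");
        int n = (int) ((size + INTS_PER_MAP - 1) / INTS_PER_MAP);
        maps = new MappedByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long offset = i * INTS_PER_MAP * 4L;
            long length = Math.min(INTS_PER_MAP, size - i * INTS_PER_MAP) * 4L;
            maps[i] = file.getChannel().map(FileChannel.MapMode.READ_WRITE, offset, length);
        }
    }

    int get(long i) {
        return maps[(int) (i / INTS_PER_MAP)].getInt((int) ((i % INTS_PER_MAP) * 4));
    }

    void set(long i, int v) {
        maps[(int) (i / INTS_PER_MAP)].putInt((int) ((i % INTS_PER_MAP) * 4), v);
    }

    public void close() throws Exception { file.close(); }
}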

Answer:

Use Tokyo Cabinet, Berkeley DB, or any other disk-based key-value store. They're much faster than a conventional database and let you use the disk instead of memory.

Answer:

Depending on how you need to access the array, you might find that a RandomAccessFile will allow you to use a file which is larger than will fit in memory. However, the performance you get is very dependent on your access pattern.
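
A minimal sketch of that approach (the class and method names are illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;

// Treat a file as a giant int array: index i lives at byte offset i * 4.
// Fine for mostly-sequential scans; slow if access is truly random.
class FileIntArray implements AutoCloseable {
    private final RandomAccessFile file;

    FileIntArray(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
    }

    int get(long i) throws IOException {
        file.seek(i * 4L);
        return file.readInt();
    }

    void set(long i, int v) throws IOException {
        file.seek(i * 4L);
        file.writeInt(v);
    }

    public void close() throws IOException { file.close(); }
}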

Answer:

Could you get by with 900 million bits? (Maybe stored as a byte array.)
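
A minimal sketch of a bit-packed array (names are illustrative; java.util.BitSet would also work, since 900 million is comfortably below the int index limit):

// One bit per number: 900 million entries take ~110 MB instead of ~3.4 GB.
class BitArray {
    private final byte[] bits;

    BitArray(long size) {
        bits = new byte[(int) ((size + 7) / 8)];
    }

    boolean get(long i) {
        return (bits[(int) (i >> 3)] & (1 << (int) (i & 7))) != 0;
    }

    void set(long i) {
        bits[(int) (i >> 3)] |= (1 << (int) (i & 7));
    }
}

This only works if your per-number state really fits in a bit; the question mentions an array of ints, so check whether that extra information can be compressed or recomputed.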

Answer:

You can try splitting it up into multiple arrays or lists.

// Assuming java.util.List and java.util.ArrayList are imported.
// Note: each boxed Integer carries object overhead, so primitive arrays are far more compact.
List<Integer> myFirstList = new ArrayList<>();
List<Integer> mySecondList = new ArrayList<>();

for(int x = 0; x <= 1000000; x++){
    myFirstList.add(x);
}
for(int x = 1000001; x <= 2000000; x++){
    mySecondList.add(x);
}

Then iterate over them.

for(int x: myFirstList){
    for(int y: myFirstList){
        //Remove multiples
    }
}
//repeat for second list

Answer:

Use a memory-mapped file (java.nio, available since Java 1.4) instead. Or move the sieve into a small C library and call it via JNI.
