Problem description:

Hi all, please help me with this scenario: I have multiple files, such as aaa.txt, bbb.txt, and ccc.txt, with data like this:

aaa.txt:

    100110,StringA,22
    200110,StringB,2
    300110,StringC, 12
    400110,StringD,34
    500110,StringE,423

bbb.txt:

    100110,StringA,20.1
    200110,StringB,2.1
    300110,StringC, 12.2
    400110,StringD,3.2
    500110,StringE,42.1

and ccc.txt:

    100110,StringA,2.1
    200110,StringB,2.1
    300110,StringC, 11
    400110,StringD,3.2
    500110,StringE,4.1

Now I have to read all three files (they are huge) and report the result for each key in this form:

    100110: (22, 20.1, 2.1)

The issue is the size of the files. How can I do this in an optimized way?

Answer:

I assume you already have code that reads the files line by line, so I'll write pseudocode with one scanner per file (`aaa`, `bbb`, `ccc`) that keeps pulling lines.

The easiest way to handle this would be to use a Map. In this case, I'll just use a HashMap.

    // aaa, bbb and ccc are assumed to be java.util.Scanner instances, one per file
    HashMap<String, String[]> map = new HashMap<>();

    while (aaa.hasNextLine()) {
        String[] lineContents = aaa.nextLine().split(",");
        String[] array = new String[3];   // one slot per file
        array[0] = lineContents[2].trim();
        map.put(lineContents[0], array);
    }

    while (bbb.hasNextLine()) {
        String[] lineContents = bbb.nextLine().split(",");
        String[] array = map.get(lineContents[0]);
        if (array == null) {
            // this key did not appear in aaa.txt, so create its entry now
            array = new String[3];
            map.put(lineContents[0], array);
        }
        array[1] = lineContents[2].trim();
    }

    // same for ccc, writing to index 2

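To read the files concurrently, you would need a thread-safe map (for example `ConcurrentHashMap` or a map wrapped with `Collections.synchronizedMap`).

Then you'd create 3 threads, one per file, that just read and put.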
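A rough sketch of the threaded version, in case it helps. It assumes `ConcurrentHashMap` as the thread-safe map, and the names `ConcurrentLoad`, `load`, and `loadAll` are made up for illustration. The demo feeds in-memory strings instead of real files so that it runs on its own. With real files you would pass something like `Files.newBufferedReader(Path.of("aaa.txt"))` instead. It also assumes every non-blank line has at least three comma-separated fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentLoad {

    // Reads "key,name,value" lines from one source and stores each value in the given slot.
    static void load(Reader source, int slot, int slots, Map<String, String[]> map) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue; // skip empty lines
                String[] parts = line.split(",");
                // computeIfAbsent is atomic on ConcurrentHashMap, so each key gets exactly one array;
                // every thread writes to its own slot, so the threads never touch the same element.
                map.computeIfAbsent(parts[0].trim(), k -> new String[slots])[slot] = parts[2].trim();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts one thread per source and waits for all of them to finish.
    static Map<String, String[]> loadAll(List<? extends Reader> sources) throws InterruptedException {
        Map<String, String[]> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[sources.size()];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> load(sources.get(slot), slot, sources.size(), map));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // after join, the values written by that thread are visible here
        }
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String[]> map = loadAll(List.of(
                new StringReader("100110,StringA,22\n200110,StringB,2\n"),
                new StringReader("100110,StringA,20.1\n200110,StringB,2.1\n"),
                new StringReader("100110,StringA,2.1\n200110,StringB,2.1\n")));
        System.out.println("100110: (" + String.join(", ", map.get("100110")) + ")");
    }
}
```

Note that this still holds every key in memory, so it speeds up loading at best; it does nothing for the memory problem with huge files.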

Answer:

Unless you are doing a lot of processing while loading these files, or are reading a lot of smaller files, this might work better as a sequential operation.

Answer:

If your files are all sorted by key, simply maintain an array of Scanners, one per file, read the lines one by one, and write each result to the output file as you go.

Doing so, you will only keep in memory as many lines as the number of files. It is both time and memory efficient.
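Something along these lines could work as a starting point. It is only a sketch: `SortedMerge`, `next`, and `merge` are names invented here, and it uses `BufferedReader` rather than `Scanner`. It assumes every input is sorted by key under plain `String.compareTo` ordering and has at least three fields per line. A key missing from one file just leaves that slot empty. The demo uses in-memory strings so that it runs without the real files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.util.List;

class SortedMerge {

    // Returns the next non-blank line split into trimmed fields, or null at end of input.
    static String[] next(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank()) continue;
            String[] parts = line.split(",");
            for (int i = 0; i < parts.length; i++) parts[i] = parts[i].trim();
            return parts;
        }
        return null;
    }

    // Merges inputs that are sorted by key, holding only one record per input in memory.
    static void merge(List<BufferedReader> inputs, PrintWriter out) throws IOException {
        int n = inputs.size();
        String[][] current = new String[n][];
        for (int i = 0; i < n; i++) current[i] = next(inputs.get(i));

        while (true) {
            // Find the smallest key among the records currently held.
            String key = null;
            for (String[] rec : current) {
                if (rec != null && (key == null || rec[0].compareTo(key) < 0)) key = rec[0];
            }
            if (key == null) break; // every input is exhausted

            StringBuilder row = new StringBuilder(key).append(": (");
            for (int i = 0; i < n; i++) {
                if (i > 0) row.append(", ");
                if (current[i] != null && current[i][0].equals(key)) {
                    row.append(current[i][2]);
                    current[i] = next(inputs.get(i)); // advance only the inputs that matched
                }
            }
            out.println(row.append(")"));
        }
    }

    public static void main(String[] args) throws IOException {
        PrintWriter out = new PrintWriter(System.out);
        merge(List.of(
                new BufferedReader(new StringReader("100110,StringA,22\n200110,StringB,2\n")),
                new BufferedReader(new StringReader("100110,StringA,20.1\n200110,StringB,2.1\n")),
                new BufferedReader(new StringReader("100110,StringA,2.1\n200110,StringB,2.1\n"))), out);
        out.flush(); // first line printed: 100110: (22, 20.1, 2.1)
    }
}
```

If the keys need numeric rather than string ordering, the comparison in `merge` has to match whatever ordering was used to sort the files.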

If your files are not sorted, you can use the `sort` command to sort them first.
