问题描述:

If you have one billion numbers and one hundred computers, what is the best way to locate the median of these numbers?

One solution which I have is:

- Split the set equally among the computers.
- Sort them.
- Find the medians for each set.
- Sort the sets on medians.
- Merge two sets at a time from the lowest to the highest median.

If we have `m1 < m2 < m3 ...`

then first merge `Set1`

and `Set2`

and in the resulting set we can discard all the numbers lower than the median of `Set12`

(merged). So at any point of time we have equal sized sets. By the way this cannot be done in a parallel manner. Any ideas?

```
sort -g numbers | head -n 500000001 | tail -n 2 | dc -e "1 k ? ? + 2 / p"
```

Ah, my brain has just kicked into gear, I have a sensible suggestion now. Probably too late if this had been an interview, but never mind:

Machine 1 shall be called the "control machine", and for the sake of argument either it starts with all the data, and sends it in equal parcels to the other 99 machines, or else the data starts evenly distributed between the machines, and it sends 1/99 of its data to each of the others. The partitions do not have to be equal, just close.

Each other machine sorts its data, and does so in a way which favours finding the lower values first. So for example a quicksort, always sorting the lower part of the partition first[*]. It writes its data back to the control machine in increasing order as soon as it can (using asynchronous IO so as to continue sorting, and probably with Nagle on: experiment a bit).

The control machine performs a 99-way merge on the data as it arrives, but discards the merged data, just keeping count of the number of values it has seen. It calculates the median as the mean of the 1/2 billionth and 1/2 billion plus oneth values.

This suffers from the "slowest in the herd" problem. The algorithm cannot complete until every value less than the median has been sent by a sorting machine. There's a reasonable chance that one such value will be quite high within its parcel of data. So once the initial partitioning of the data is complete, estimated running time is the combination of the time to sort 1/99th of the data and send it back to the control computer, and the time for the control to read 1/2 the data. The "combination" is somewhere between the maximum and the sum of those times, probably close to the max.

My instinct is that for sending data over a network to be faster than sorting it (let alone just selecting the median) it needs to be a pretty damn fast network. Might be a better prospect if the network can be presumed to be instantaneous, for example if you have 100 cores with equal access to RAM containing the data.

Since network I/O is likely to be the bound, there might be some tricks you can play, at least for the data coming back to the control machine. For example, instead of sending "1,2,3,.. 100", perhaps a sorting machine could send a message meaning "100 values less than 101". The control machine could then perform a modified merge, in which it finds the least of all those top-of-a-range values, then tells all the sorting machines what it was, so that they can (a) tell the control machine how many values to "count" below that value, and (b) resume sending their sorted data from that point.

More generally, there's probably a clever challenge-response guessing game that the control machine can play with the 99 sorting machines.

This involves round-trips between the machines, though, which my simpler first version avoids. I don't really know how to blind-estimate their relative performance, and since the trade-offs are complex, I imagine there are much better solutions out there than anything I'll think of myself, assuming this is ever a real problem.

[*] available stack permitting - your choice of which part to do first is constrained if you don't have O(N) extra space. But if you do have enough extra space, you can take your pick, and if you don't have enough space you can at least use what you do have to cut some corners, by doing the small part first for the first few partitions.

I hate to be the contrarian here, but I don't believe sorting is required, and I think any algorithm involving sorting a billion/100 numbers is going to be slow. Let's consider an algorithm on one computer.

1) Select 1000 values at random from the billion, and use them to get an idea of the distribution of the numbers, especially a range.

2) Instead of sorting the values, allocate them to buckets based on the distribution you just calculated. The number of buckets is chosen so that the computer can handle them efficiently, but should otherwise be as large as convenient. The bucket ranges should be so that approximately equal numbers of values go in each bucket (this isn't critical to the algorithm, but it helps efficiency. 100,000 buckets might be appropriate). Note the number of values in each bucket. This is an O(n) process.

3) Find out which bucket range the median lies. This can be done by simply examining the total numbers in each bucket.

4) Find the actual median by examining the values in that bucket. You can use a sort here if you like, since you are only sorting maybe 10,000 numbers. If the number of values in that bucket is large then you can use this algorithm again until you have a small enough number to sort.

This approach parallelizes trivially by dividing the values between the computers. Each computer reports the totals in each bucket to a 'control' computer which does step 3. For step 4 each computer sends the (sorted) values in the relevant bucket to the control computer (you can do both of those algorithms in parallel too, but it probably isn't worth it).

The total process is O(n), since both steps 3 and 4 are trivial, provided the number of buckets is large enough.

The *estimation* of order statistics like median and 99th percentile can be efficiently distributed with algorithms like t-digest or Q-digest.

Using either algorithm, each node produces a digest, which represents the distribution of the values stored locally. The digests are collected at a single node, merged (effectively summing the distributions), and the median or any other percentile can then be looked up.

This approach is used by elasticsearch and, presumably, BigQuery (going by the description of the QUANTILES function).

One billion is actually quite a boring task for a modern computer. We're talking about 4 GB worth of 4 byte integers here ... 4 GB ... that's the RAM of some smartphones.

```
public class Median {
public static void main(String[] args) {
long start = System.currentTimeMillis();
int[] numbers = new int[1_000_000_000];
System.out.println("created array after " + (System.currentTimeMillis() - start) + " ms");
Random rand = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = rand.nextInt();
}
System.out.println("initialized array after " + (System.currentTimeMillis() - start) + " ms");
Arrays.sort(numbers);
System.out.println("sorted array after " + (System.currentTimeMillis() - start) + " ms");
if (numbers.length % 2 == 1) {
System.out.println("median = " + numbers[numbers.length / 2 - 1]);
} else {
int m1 = numbers[numbers.length / 2 - 1];
int m2 = numbers[numbers.length / 2];
double m = ((long) m1 + m2) / 2.0;
System.out.println("median = " + new DecimalFormat("#.#").format(m));
}
}
```

Output on my machine:

```
created array after 518 ms
initialized array after 10177 ms
sorted array after 102936 ms
median = 19196
```

So this completes on my machine within less than two minutes (1:43 of which 0:10 are to generate random numbers) using a single core and it's even doing a full sort. Nothing fancy really.

This surely is an interesting task for larger sets of numbers. I just want to make a point here: one billion is peanuts. So think twice before you start throwing complex solutions at surprisingly simple tasks ;)

The median for this set of numbers

2, 3, 5, 7, 11, 13, 67, 71, 73, 79, 83, 89, 97

is 67.

The median for this set of numbers

2, 3, 5, 7, 11, 13, 67, 71, 73, 79, 83, 89

is 40.

Assuming the question was about 1,000,000,000 integers(x) where 0 >= x <= 2,147,483,647 and that the OP was looking for (element(499,999,999) + element(500,000,000)) / 2 (if the numbers were sorted). **Also assuming that all 100 computers were all equal.**

using my laptop and GigE...

What I found was that my laptop can sort 10,000,000 Int32's in 1.3 seconds. So a rough estimate would be that a billion number sort would take 100 x 1.3 seconds(2 minutes 10 seconds) ;).

An estimate of a one-way file transfer of a 40MB file on a gigabit Ethernet is .32 seconds. This means that the sorted results from all computers will be returned in approximately 32 seconds(computer 99 didn't get his file until 30 seconds after the start). From there it shouldn't take long to discard the lowest 499,999,998 numbers, add the next 2 and divide by 2.

Oddly enough, I think if you have enough computers, you're better off sorting than using `O(n)`

median-finding algorithms. (Unless your cores are very, very slow, though, I'd just use one and use an `O(n)`

median-finding algorithm for merely 1e9 numbers; if you had 1e12, though, that might be less practical.)

Anyway, let's suppose we have more than log n cores to deal with this problem, and we don't care about power consumption, just getting the answer fast. Let's further assume that this is a SMP machine with all the data already loaded in memory. (Sun's 32-core machines are of this type, for instance.)

One thread chops the list up blindly into equal sized pieces and tells the other M threads to sort them. Those threads diligently do so, in `(n/M) log (n/M)`

time. They then return not only their medians, but, say, their 25th and 75th percentiles as well (perverse worst cases are better if you choose slightly different numbers). Now you have 4M ranges of data. You then sort these ranges and work upwards through the list until you find a number such that, if you throw out *every* range that is smaller than or contains the number, you will have thrown out half your data. That's your lower bound for the median. Do the same for the upper bound. This takes something like `M log M`

time, and all cores have to wait for it, so it's really wasting `M^2 log M`

potential time. Now you have your single thread tell the others to toss all data outside the range (you should throw out about half on each pass) and repeat--this is a trivially fast operation since the data is already sorted. You shouldn't have to repeat this more than `log(n/M)`

times before it's faster to just grab the remaining data and use a standard `O(n)`

median finder on it.

So, total complexity is something like `O((n/M) log (n/M) + M^2 log M log (n/M))`

. Thus, this is faster than `O(n)`

median sort on one core if `M >> log(n/M)`

and `M^3 log M < n`

, which is true for the scenario you've described.

I think this is a *really bad idea* given how inefficient it is, but it is faster.

This might surprise people, but if the numbers are integers small enough to fit inside 32-bit (or smaller) - Just do a bucket sort! Only needs 16GB of ram for any number of 32-bit ints and runs in O(n), which should outperform any distributed systems for reasonable n, e.g. a billion.

Once you have the sorted list, it's trivial to pick out the median. In fact, you do not need to construct the sorted list, but only looking at the buckets should do it.

A simple implementation is shown below. Only works for 16-bit integers, but extension to 32-bit should be easy.

```
#include <stdio.h>
#include <string.h>
int main()
{
unsigned short buckets[65536];
int input, n=0, count=0, i;
// calculate buckets
memset(buckets, 0, sizeof(buckets));
while (scanf("%d", &input) != EOF)
{
buckets[input & 0xffff]++;
n++;
}
// find median
while (count <= n/2)
{
count += buckets[i++];
}
printf("median: %d\n", i-1);
return 0;
}
```

Using a text file with a billion (10^{9}) numbers and running with `time`

like so

```
time ./median < billion
```

yields a running time on my machine 1m49.293s. Most of the running time is probably disk IO aswell.

One computer is more than enough to solve the problem.

But let's assume that there are 100 computers. The only complex thing you should do is to sort the list. Split it to 100 parts, send one part to each computer, let them be sorted there, and merge parts after that.

Then take number from the middle of the sorted list (i.e. with index 5 000 000 000).

It depends on your data. The worst case scenario is that it's uniformly distributed numbers.

In this case you can find the median in O(N) time like in this example:

Suppose your numbers are 2,7,5,10,1,6,4,4,6,10,4,7,1,8,4,9,9,3,4,3 (range is 1-10).

We create 3 buckets: 1-3, 4-7, 8-10. Note that top and bottom have equal size.

We fill the buckets with the numbers, count how many fall in each, the max and the min

- low (5): 2,1,1,3,3, min 1, max 3
- middle (10): 7,5,6,4,4,6,4,7,4,4, min 4, max 7
- high (5): 10, 10, 8, 9, 9, min 8, max 10

The mean falls in the middle bucket, we disregard the rest

We create 3 buckets: 4, 5-6, 7. Low will start with a count of 5 and with a max of 3 and high with a min of 8 and a count of 5.

For each number we count how many fall in the low and high bucket, the max and the min, and keep the middle bucket.

- old low (5)
- low (5): 4, 4, 4, 4, 4, max 4
- middle (3): 5,6,6
- high (2): 7, 7, min 7
- old high (5)

Now we can calculate the median directly: we have a situation like this

```
old low low middle high old high
x x x x x 4 4 4 4 4 4 5 6 6 7 7 x x x x x
```

so the median is 4.5.

Assuming you know a little about the distribution, you can fine tune how to define the ranges to optimize speed. In any case, the performance should go with O(N), because 1 + 1/3 + 1/9... = 1.5

You need min and max because of edge cases (e.g. if the median is the average between the max of old low and the next element).

All of these operations can be parallelized, you can give 1/100 of the data to each computer and calculate the 3 buckets in each node, then distribute the bucket you keep. This again makes you use the network efficiently because each number is passed on average 1.5 times (so O(N)). You can even beat that if you only pass the minimal numbers among nodes (e.g. if node 1 has 100 numbers and node 2 has 150 numbers, then node 2 can give 25 numbers to node 1).

Unless you know more about the distribution, I doubt you can do better than O(N) here, because you actually need to count the elements at least once.

Split the 10^9 numbers, 10^7 to each computer ~ 80MB on each. Each computer sorts its numbers. Then computer 1 merge-sorts its own numbers with those from computer 2, computer 3 and 4, etc ... Then computer 1 writes half of the numbers back to 2, 3 to 4, etc. Then 1 merge sorts the numbers from computers 1,2,3,4, writes them back. And so on. Depending on the size of RAM on the computers you may get away with not writing all the numbers back to the individual computers at each step, you might be able to accumulate the numbers on computer 1 for several steps, but you do the maths.

Oh, finally get the mean of the 500000000th and 500000001st values (but check there are enough 00s in there, I haven't).

EDIT: @Roman -- well if you can't believe it even it it's true then there's no point in my revealing the truth or falsehood of the proposition. What I meant to state was that brute force sometimes beats smart in a race. It took me about 15 seconds to devise an algorithm which I am confident that I can implement, which will work, and which will be adaptable to a wide range of sizes of inputs and numbers of computers, and tunable to the characteristics of the computers and networking arrangements. If it takes you, or anyone else, say 15 minutes to devise a more sophisticated algorithm I have a 14m45s advantage to code up my solution and start it running.

But I freely admit this is all assertion, I haven't measured anything.

I think Steve Jessop's answer will be the fastest.

If the network data transfer **size** is the bottleneck, here is another approach.

```
Divide the numbers into 100 computers (10 MB each).
Loop until we have one element in each list
Find the meadian in each of them with quickselect which is O(N) and we are processing in parallel. The lists will be partitioned at the end wrt median.
Send the medians to a central computer and find the median of medians. Then send the median back to each computer.
For each computer, if the overall median that we just computed is smaller than its median, continue in the lower part of the list (it is already partitioned), and if larger in the upper part.
When we have one number in each list, send them to the central computer and find and return the median.
```

This can be done faster than the algorithm voted (n log n)

- Order statistics distributed selection algorithm - O(n)

Simplify the problem to the original problem of finding the kth number in an unsorted array.

- Counting sort histogram O(n)

You have to assume some properties about the range of the numbers - can the range fit in the memory?
- External merge sort - O(n log n) - described above

You basically sort the numbers on the first pass, then find the median on the second.

- If anything is known about the distribution of the numbers other
algorithms can be produced.

For more details and implementation see:

http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html

This could be done on nodes using data that is not sorted across nodes (say from log files) in the following manner.

There is 1 parent node and 99 child nodes. The child nodes have two api calls:

- stats(): returns min, max and count
- compare(median_guess): returns count matching value, count less than value and count greater than value

The parent node calls stats() on all child nodes, noting the minimum and maximum of all nodes.

A binary search may now be conducted in the following way:

- Bisect the minimum and maximum rounding down - this is the median 'guess'
- If the greater than count is more than the less than count, set the minimum to the guess
- If the greater than count is less than the less than count, set the maximum to the guess
- If count is odd finish when minimum and maximum are equal
- If count is even finish when maximum <= minimum + guess.match_count This could be done on nodes using unsorted data (say from log files) in the following manner.

There is 1 parent node and 99 child nodes. The child nodes have two api calls:

- stats(): returns min, max and count
- compare(median_guess): returns count matching value, count less than value and count greater than value

The parent node calls stats() on all child nodes, noting the minimum and maximum of all nodes.

A binary search may now be conducted in the following way:

- Bisect the minimum and maximum rounding down - this is the median 'guess'
- If the greater than count is more than the less than count, set the minimum to the guess
- If the greater than count is less than the less than count, set the maximum to the guess
- If count is odd finish when minimum and maximum are equal
- If count is even finish when maximum <= minimum + guess.match_count

If the stats() and compare() could be pre-calculated with a O(N/Mlogn/M) sort, then a O(N/M) pre-calculation with a memory complexity of O(N) for the pre-calculation. Then you could do compare() in constant time, so the whole thing (including pre-calculation) would run in O(N/MlogN/M)+O(logN)

Let me know if I have made a mistake!

An easier method is to have weighted numbers.

- Split the large set among computers
- Sort each set
- iterate through the small-set, and calculate weights to repeated elements
- merge each 2 sets into 1 (each is sorted already) updating weights
- keep merging sets until you get only one set
- iterate through this set accumulating weights until you reach OneBillion/2

How about this:- each node can take 1Billion/100 numbers. At each node the elements can be sorted and median can be found. Find the median of medians. we can, by aggregating the counts of numbers less than median-of-median on all nodes find out x%:y% split which the median-of-medians makes. Now ask all nodes to delete elements less than the median of medians( taking example of 30%:70% split).30% numbers are deleted. 70% of 1Billion is 700million. Now all nodes which deleted less than 3million nodes can send those extra nodes back to a main computer. The main computer redistributes in such a way that now all nodes will have almost equal number of nodes(7million). Now that the problem is reduced to 700million numbers.... goes on until we have a smaller set which can be computed on one comp.

Let's first work out how to find a median of n numbers on a single machine: I am basically using partitioning strategy.

**Problem :selection(n,n/2) :** Find n/2 th number from least number.

You pick say middle element k and partition data into 2 sub arrays. the 1st contains all elements < k and 2nd contains all elements >= k.

if sizeof(1st sub-array) >= n/2, you know that this sub-array contains the median. You can then throw-off the 2nd sub-array. Solve this problem **selection(sizeof 1st sub-array,n/2)**.

In else case, throw off this 1st subarray and solve **selection(2nd subarray , n/2 - sizeof(1st subarray))**

Do it recursively.

**time complexity is** O(n) expected time.

Now if we have many machines, in each iteration, we have to process an array to split, we distribute the array into diff machines. Each machine processes their chunk of array and **sends back the summary to hub controlling machine i.e. size of 1st subarray and size of 2nd subarray.** The hub machines adds up summaries and decide which subarray (1st or 2nd) to process further and 2nd parameter of selection and sends it back to each machine.
and so on.

This algorithm can be implemented very neatly using map reduce?

How does it look?

I would do it like this:

in the beginning all 100 work to find the highest and the lowest number; each of the computer has his part of the database/file which it queries;

when the highest and lowest numbers are found, one computer reads the data, and distributes each number, evenly, to the rest of the 99; the numbers are distributed by equal intervals; (one may take from -100 million to 0, another - from 0 to 100 million, etc);

While receiving numbers, each of the 99 of the computers already sorts them;

Then, it's easy to find the median... See how many numbers has each computer, add all of them (the sum of how many numbers there are, not the numbers themselves), divide by 2; calculate in which computer is the number, and at which index;

:) voilla

P.S. Seems there's a lot of confusion here; the MEDIAN - is the NUMBER IN THE MIDDLE OF A SORTED LIST OF NUMBERS!

You can use the tournament tree method for finding the median. We can create a tree with 1000 leave nodes such that each leaf node is an array. We then conduct n/2 tournaments between the different arrays.The value on the root after the n/2 tournaments is the result.

http://www.geeksforgeeks.org/tournament-tree-and-binary-heap/

If the numbers are not distinct, and only belong to a certain range, that is they are repeated, then a simple solution that comes to my mind is to distribute the numbers among 99 machines equally, and keep one machine as the master. Now every machine iterates over its given numbers, and stores the count of each number in a hash set. Each time the number gets repeated in the set of numbers allotted to that particular computer, it updates its count in the hash set.

All the machines then return their hash set to the master machine. The master machine combines the hash sets, summing the count of the same key found in a hash set. For example machine#1's hash set had an entry of ("1",7), and machine#2's hash set had an entry of ("1",9), so the master machine when combing the hash sets makes an entry of ("1", 16), and so on.

Once the hash sets have been merged, then just sort the keys, and now you can easily find the (n/2)th item and the (n+2/2)th item, from the sorted hash set.

This method won't be beneficial if the billion numbers are distinct.

Well, suppose you know that the number of distinct integers is (say) 4 billion, then you can bucket them into 64k buckets and get a distributed count for each bucket from each machine in the cluster(100 computers). Combine all these counts. Now, find the bucket which has the median, and this time only ask for buckets for the 64k elements that would lie in your target bucket. This requires O(1) (specifically 2) queries over your "cluster". :D

My penny worth, after all that has already been brought up by others:

Finding the median on a single machine is O(N): https://en.wikipedia.org/wiki/Selection_algorithm.

Sending N numbers to 100 machines is also O(N). So, in order to make using 100 machines interesting, either the communication must be relatively fast, or N is so large that a single machine cannot handle it while N/100 is doable, or we just want to consider the mathematical problem without bothering about datacommunication.

To cut things short I'll assume therefore that, within reasonable limits, we can send/distribute the numbers without affecting the efficiency analysis.

Consider then the following approach, where one machine is assigned to be the "master" for some general processing. This will be comparatively fast, so the "master" also participates in the common tasks that each machine performs.

- Each machine receives N/100 of the numbers, computes its own median and sends that information to the master.
- The master compiles a sorted list of all distinct medians and sends that back to each machine, defining an ordered sequence of buckets (on each machine the same), one for each median value (a single-value bucket) and one for each interval between adjacent medians. Of course there are also the lower-end and higher-end buckets for values below the lowest median and above the hightest.
- Each machine computes how many numbers fall in each bucket and communicates that information back to the master.
- The master determines which bucket contains the median, how many lower values (in total) fall below that bucket, and how many above.
- If the selected bucket is a single-value bucket (one of the medians) orelse the selected bucket contains only 1 (N odd) or 2 (N even) values we're done. Otherwise we repeat the steps above with the following (obvious) modifications:
- Only the numbers from the selected bucket are (re)distributed from the master to the 100 machines, and moreover
- We're not going to compute (on each machine) the median, but the k-th value, where we take into account how many higher numbers have been discarded from the total, and how many lower numbers. Conceptually each machine has also its share of the discarded low/high numbers and takes that into account when computing the new median in the set that (conceptually) includes (its share of) the discarded numbers.

Time-complexity:

- A little thinking will convince you that on each step the total number of values to analyse is reduced by a factor at least two (2 would be a rather sick case; you may expect a significantly better reduction). From this we get:
- Assuming that finding the median (or k-th value), which is O(N), takes c*N time where the prefactor c does not vary too wildly with N so that we can take it as a constant for the moment, we'll get our final result in at most 2*c*N/100 time. Using 100 machines gives us, therefore, a speedup factor of 100/2 (at least).
- As remarked initially: the time involved in communicating the numbers between the machines may make it more attractive to simply do everything on one machine. However, IF we go for the distributed approach, the total count of numbers to be communicated in all steps together will not exceed 2*N (N for the first time, <=N/2 the second time, <= half of that the third, and so on).

Divide the 1 billion numbers into 100 machines. Each machine will have 10^7 numbers.

For each incoming number to a machine, store the number in a frequency map, number -> count. Also store the min number in each machine.

Find median in each machine: starting from min number in each machine, sum the counts until median index is reached. The median in each machine, will be the approx. lesser and greater than 5*10^6 numbers.

Find median of all medians, which will be lesser and greater than approx. 50*10^7 numbers, which is the median of 1 billion numbers.

Now some optimization of 2nd step: Instead of storing in a frequency map, store the counts in a variable bit array. For example: Lets say starting from min number in a machine, these are frequency counts:

```
[min number] - 8 count
[min+1 number] - 7 count
[min+2 number] - 5 count
```

The above can be stored in bit array as:

```
[min number] - 10000000
[min+1 number] - 1000000
[min+2 number] - 10000
```

Note that altogether it will cost about 10^7 bits for each machine, since each machine only handles 10^7 numbers. 10^7bits = 1.25*10^6 bytes, which is 1.25MB

So with the above approach each machine will need 1.25MB of space to compute local median. And median of medians can be computed from those 100 local medians, resulting in median of 1 billion numbers.

I suggest a method to calculate approximately the Median. :) If these one billion numbers are in a randomly order, I think I can pick 1/100 or 1/10 of one billion number randomly, sort them with 100 machine, then pick the median of them. Or let's split billion numbers in 100 parts, let each machine pick 1/10 of each part randomly, calculate the median of them. After that we have 100 numbers and we can calculate the median of the 100 number easier. Just a suggestion, I'm not sure if it's mathematically correct. But I think you can show the result to a not-so-good-at-math manager.

Steve Jessop's answer is wrong:

consider the following four groups:

{2, 4, 6, 8, 10}

{21, 21, 24, 26, 28}

{12, 14, 30, 32, 34}

{16, 18, 36, 38, 40}

The median is 21, which is contained in the second group.

The median of the four groups are 6, 24, 30, 36, The total median is 27.

So after the first loop, the four groups will become:

{6, 8, 10}

{24, 26, 28}

{12, 14, 30}

{16, 18, 36}

The 21 is already wrongly discarded.

This algorithm only support the case when there are two groups.