Problem description:

I am planning to use mincemeat.py for a map reduce task on a ~100GB file. Looking at the example code that ships with mincemeat, it seems I need to supply an in-memory dictionary as the data source. So, what is the right way to provide my huge file as the data source for mincemeat?

Link to mincemeat: https://github.com/michaelfairley/mincemeatpy

Answer:

Looking at the example and the concept, I would have thought that you would ideally:

  1. Produce an iterator for the data source (see the sketch after this list),
  2. Split the file into a number of smaller files across a number of servers, and then
  3. Merge the results.
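
Here is a minimal sketch of step 1, assuming (based on the bundled example) that mincemeat's server only iterates over the datasource's keys and looks values up by key, so a lazy dict-like object should work in place of an in-memory dict. `FileChunkSource`, the chunk size, and the file path are hypothetical names for illustration; real code would also want to align chunk boundaries to line breaks so records are not split across map tasks.

```python
import os
import mincemeat

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per map task; tune to taste


class FileChunkSource(object):
    """Hypothetical lazy data source: keys are byte offsets, values are
    chunks read from the big file on demand, so the whole file never
    has to sit in memory at once."""

    def __init__(self, path, chunk_size=CHUNK_SIZE):
        self.path = path
        self.chunk_size = chunk_size
        self.size = os.path.getsize(path)

    def __iter__(self):
        # One key per chunk; the server iterates over keys to schedule map tasks.
        return iter(range(0, self.size, self.chunk_size))

    def __getitem__(self, offset):
        # Read the requested chunk only when a worker asks for it.
        # Note: a naive fixed-size read may split a record across two chunks.
        with open(self.path, 'rb') as f:
            f.seek(offset)
            return f.read(self.chunk_size)


def mapfn(k, v):
    # Example map: word count over the chunk.
    for word in v.split():
        yield word, 1


def reducefn(k, vs):
    return sum(vs)


s = mincemeat.Server()
s.datasource = FileChunkSource('/path/to/huge_file')  # hypothetical path
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password='changeme')
```

Workers would then connect to this server the same way as in the stock example (running the mincemeat client with the matching password), and steps 2 and 3 amount to hosting different slices of the file on different machines and merging the per-slice results afterwards.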