Question:

We have a requirement to design our MapReduce architecture so that it (MR) does not depend on the input pattern. There should be some technique or logic whereby the MapReduce code stays constant and changes to the input pattern are handled by custom configurable logic only. Can we do this using custom annotations, or are there better approaches?

Any suggestions would be of great help. Many thanks.

Answer:

This is already a feature of MapReduce thanks to FileInputFormat and RecordReader. I can't give a much better example here than https://hadoopi.wordpress.com/2013/05/27/understand-recordreader-inputsplit/, but essentially these two classes sit outside the core map() and reduce() logic. The FileInputFormat splits the input data into InputSplits and creates a RecordReader for each split; the RecordReader then parses the raw input and hands single key-value pairs to the mapper.
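As a minimal sketch of that idea (using the newer org.apache.hadoop.mapreduce API), the input format below wraps the stock LineRecordReader and reshapes each record according to a rule read from the job configuration. The class names and the "input.pattern.delimiter" property are invented for illustration; the point is that all pattern-specific parsing lives here, not in the mapper:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // Hypothetical input format: all knowledge of the input pattern lives
    // here, so map() and reduce() never change when the input layout does.
    public class ConfigurablePatternInputFormat
            extends FileInputFormat<LongWritable, Text> {

        @Override
        public RecordReader<LongWritable, Text> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new ConfigurablePatternRecordReader();
        }

        // Wraps the stock LineRecordReader and reshapes each raw line into
        // the fixed key-value form the mapper expects. The parsing rule is
        // read from job configuration rather than hard-coded.
        public static class ConfigurablePatternRecordReader
                extends RecordReader<LongWritable, Text> {

            private final LineRecordReader lineReader = new LineRecordReader();
            private final Text normalized = new Text();
            private String delimiter;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context)
                    throws IOException {
                lineReader.initialize(split, context);
                // "input.pattern.delimiter" is a made-up property name; this
                // is the "custom configurable logic" from the question.
                delimiter = context.getConfiguration()
                        .get("input.pattern.delimiter", ",");
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (!lineReader.nextKeyValue()) {
                    return false;
                }
                // Any pattern-specific parsing happens here; as a stand-in we
                // just re-join the configured fields with tabs.
                String[] fields =
                        lineReader.getCurrentValue().toString().split(delimiter);
                normalized.set(String.join("\t", fields));
                return true;
            }

            @Override
            public LongWritable getCurrentKey() {
                return lineReader.getCurrentKey();
            }

            @Override
            public Text getCurrentValue() {
                return normalized;
            }

            @Override
            public float getProgress() throws IOException {
                return lineReader.getProgress();
            }

            @Override
            public void close() throws IOException {
                lineReader.close();
            }
        }
    }

A job would then opt in with job.setInputFormatClass(ConfigurablePatternInputFormat.class) and supply the delimiter through the job Configuration, e.g. conf.set("input.pattern.delimiter", ";"), while the mapper code stays untouched.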

So the mapper has no idea where its key-value pair came from or how it was produced (not entirely true, since context.getInputSplit() exposes the split). This means you can mix and match input types within the same job: each mapper is bound to a single InputFormat, but you can run several different mappers, each with its own input format, as long as they all emit the same key-value types for a single reducer, as the driver sketch below shows.
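MultipleInputs (org.apache.hadoop.mapreduce.lib.input.MultipleInputs) is the standard way to wire that up. The driver below is a sketch under assumptions: CsvMapper, TsvMapper, and SumReducer are invented for the example, and the only contract between the two input paths is that both mappers emit (Text, IntWritable):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MixedInputDriver {

        // Hypothetical mapper for comma-separated lines: the first field
        // becomes the key.
        public static class CsvMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new Text(value.toString().split(",")[0]), ONE);
            }
        }

        // Hypothetical mapper for tab-separated key/value files; the
        // KeyValueTextInputFormat has already done the splitting.
        public static class TsvMapper
                extends Mapper<Text, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(Text key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(key, ONE);
            }
        }

        // One reducer serves both inputs because the mappers agree on types.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                    Context ctx) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "mixed-input-job");
            job.setJarByClass(MixedInputDriver.class);

            // Each input path gets its own InputFormat and mapper pairing.
            MultipleInputs.addInputPath(job, new Path(args[0]),
                    TextInputFormat.class, CsvMapper.class);
            MultipleInputs.addInputPath(job, new Path(args[1]),
                    KeyValueTextInputFormat.class, TsvMapper.class);

            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(args[2]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }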
