Problem description:

What is the best way to calculate the right number of Hadoop mappers and reducers to use, depending on the instances used/available on Amazon Elastic MapReduce? (Using RecommenderJob from the mahout-core-0.7 distribution.)

Answer:

The generic Hadoop answer applies:

  • Let Hadoop pick the number of mappers
  • Set the number of reducers equal to the number of reduce slots in your cluster

For EMR, look up the number of reducers that are run by default on the instance type that you're using: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HadoopMemoryDefault_AMI2.3.html

Then multiply that by the number of worker nodes you are using. That's a pretty good number of reducers -- or even a small multiple of it.

Until you have a specific reason to think these aren't optimal, I'd go with this.
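As a sketch of the sizing rule above: look up the per-node reduce slot count for your instance type, multiply by the number of workers, and pass the result to the job. The slot count and S3 paths below are placeholders, not values from the question; substitute the real defaults from the AWS page linked above.

```shell
#!/bin/sh
# Hypothetical example: 2 reduce slots per node (check the EMR AMI
# defaults page for your actual instance type), 10 worker nodes.
REDUCE_SLOTS_PER_NODE=2
NUM_WORKERS=10
NUM_REDUCERS=$((REDUCE_SLOTS_PER_NODE * NUM_WORKERS))
echo "reducers: $NUM_REDUCERS"   # prints "reducers: 20"

# Pass the result to RecommenderJob via the standard Hadoop property
# (paths are illustrative):
# hadoop jar mahout-core-0.7-job.jar \
#   org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
#   -Dmapred.reduce.tasks=$NUM_REDUCERS \
#   --input s3://mybucket/input \
#   --output s3://mybucket/output
```

Hadoop takes `mapred.reduce.tasks` as a hint it honors directly for reducers, while the number of mappers is driven by the input splits, which is why the answer says to let Hadoop pick the mappers.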

PS Don't forget to use spot instances for your workers to save money and/or deploy more workers.

Ad break: if you are interested in Mahout, and recommendations, and running on EMR, you should probably be looking at Myrrix. I'm the founder, and also the author of some of the Mahout code you're running now. This is a "next-gen" Hadoop-based recommender product that, among other things, is already well optimized for EMR.
