Question:

I am new to Hadoop and MapReduce programming. I have a dataset that contains movie ratings from 943 users. Each user has rated up to 20 movies. I want the output of my mapper to be the user id and a custom class holding two lists: the movies (ids of the movies the user rated) and the ratings (the rating for each movie). But I am unsure how to output these values from the map method in this scenario. Code snippets below:

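import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.WritableComparable;

public class UserRatings implements WritableComparable<UserRatings> {

    private List<String> movieId = new ArrayList<String>();
    private List<String> movieRatings = new ArrayList<String>();

    public List<String> getMovieRatings() { return movieRatings; }

    public void setMovieRatings(List<String> movieRatings) { this.movieRatings = movieRatings; }

    public List<String> getMovieId() { return movieId; }

    public void setMovieId(List<String> movieId) { this.movieId = movieId; }

    @Override
    public int compareTo(UserRatings o) {
        return 0; // no meaningful ordering defined yet
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        // Serialize each list as its size followed by its elements.
        dataOutput.writeInt(movieId.size());
        for (String id : movieId) {
            dataOutput.writeUTF(id);
        }
        dataOutput.writeInt(movieRatings.size());
        for (String rating : movieRatings) {
            dataOutput.writeUTF(rating);
        }
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        // Hadoop reuses Writable instances, so rebuild the lists on every read,
        // in exactly the order write() emitted them.
        movieId = new ArrayList<String>();
        int idCount = dataInput.readInt();
        for (int i = 0; i < idCount; i++) {
            movieId.add(dataInput.readUTF());
        }
        movieRatings = new ArrayList<String>();
        int ratingCount = dataInput.readInt();
        for (int i = 0; i < ratingCount; i++) {
            movieRatings.add(dataInput.readUTF());
        }
    }
}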

And here is the map method:

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GenreMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Logic for parsing the file and extracting the data. Can be ignored...
        String[] input = value.toString().split("\t");
        Map<String, UserRatings> mapData = new HashMap<String, UserRatings>();

        for (int i = 0; i < input.length; i++) {
            List<String> tempList = new ArrayList<String>();
            UserRatings userRatings = new UserRatings();
            tempList.add(input[3]);
            List<String> tempMovieId = new ArrayList<String>();
            tempMovieId.add(input[1]);

            for (int j = 4; j < input.length; j++) {
                if (input[i].contentEquals(input[j])) {
                    tempMovieId.add(input[j + 1]);
                    tempList.add(input[j + 3]);
                    j = j + 4;
                }
            }

            userRatings.setMovieId(tempMovieId);
            userRatings.setMovieRatings(tempList);
            mapData.put(input[i], userRatings);
        }

        // context.write();
    }
}

Answer:

I think you're missing the point of the mapper function. The mapper should not emit a list in its output. The key point of the mapper is to produce key/value tuples that the reducer receives grouped by key, so that the reducer can make the necessary calculations to produce the final output. Given this, the output format of the mapper should be as simple as possible.
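To illustrate why the mapper does not need to build the per-user lists itself, here is a minimal plain-Java simulation of map, shuffle and reduce (no Hadoop involved). It assumes each input line looks like `userId \t movieId \t rating \t timestamp`, which is an assumption about your file, and the class and method names are hypothetical. The "shuffle" step stands in for the grouping that the framework performs between the map and reduce phases:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of map -> shuffle -> reduce; no Hadoop APIs.
class MapReduceSimulation {

    // "Map": one input line in, one simple (key, value) pair out.
    // Assumed line layout: userId \t movieId \t rating \t timestamp
    static String[] map(String line) {
        String[] f = line.split("\t");
        return new String[] {f[0], f[1] + ":" + f[2]};
    }

    // "Shuffle": group every emitted value under its key, as the framework would.
    static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] kv = map(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // "Reduce": sees all values for one user at once; here it just counts them.
    static int reduce(String userId, List<String> values) {
        return values.size();
    }
}
```

Note that in a real Hadoop job the order of values within one key is not guaranteed; this simulation simply keeps input order.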

In this case, I think the right approach would be to emit on the mapper a key value pair of:

user_id, custom_class

The custom class should hold only a movie_id and a rating, not a list. To be more specific, I would need to know what you want as the end result of this MapReduce cycle. Note that, if needed, you can run a second MapReduce job on the results of the first.
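A minimal sketch of such a value class, holding exactly one movie id and one rating. To keep it runnable without Hadoop on the classpath, it implements a hypothetical stand-in interface `WritableLike` with the same two methods as `org.apache.hadoop.io.Writable`; in a real job you would implement Hadoop's interface instead. The class name `MovieRating` and the `int` field types are assumptions:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same methods as org.apache.hadoop.io.Writable (assumption: used
// only so this sketch compiles without Hadoop).
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value class: one movie id and one rating, no lists.
class MovieRating implements WritableLike {
    private int movieId;
    private int rating;

    public MovieRating() {} // Hadoop instantiates values via a no-arg constructor

    public MovieRating(int movieId, int rating) {
        this.movieId = movieId;
        this.rating = rating;
    }

    public int getMovieId() { return movieId; }
    public int getRating() { return rating; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(movieId);
        out.writeInt(rating);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read fields in exactly the order write() emitted them.
        movieId = in.readInt();
        rating = in.readInt();
    }

    // Serializes and deserializes a copy, to check write/readFields agree.
    static MovieRating roundTrip(MovieRating original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();
            MovieRating copy = new MovieRating();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String toString() { return movieId + ":" + rating; }
}
```

With a class like this, the mapper would emit one (user id, MovieRating) pair per input record, and the reducer would receive all of a user's MovieRating values together.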

Answer:

You may consider using Text and MapWritable as the output key and value types of your mapper class.

Here the user id will be the key (Text), and a MapWritable composed of the movie ids and ratings of that user will be the value object.

The MapWritable value object should use the movie id as the key and the user's rating as the value.

Consider this example code snippet:

MapWritable result=new MapWritable();
result.put(new Text("movie1") , new Text("user1_movie1_rating"));
result.put(new Text("movie2") , new Text("user1_movie2_rating"));

Text key = new Text("user_1_id");

context.write(key, result);
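One thing to keep in mind: if the input has one rating per line (assumed layout `userId \t movieId \t rating \t timestamp`), each map() call sees only one movie, so each emitted map holds a single entry and the reducer has to merge them per user. Below is a plain-Java sketch of that idea, where `java.util.Map<String, String>` stands in for MapWritable so it runs without Hadoop; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; java.util.Map plays the role of MapWritable (no Hadoop APIs).
class PerUserRatings {

    // Mapper side: one input line yields a single-entry map, movie id -> rating.
    static Map<String, String> mapValue(String line) {
        String[] f = line.split("\t");
        Map<String, String> value = new TreeMap<>();
        value.put(f[1], f[2]);
        return value;
    }

    // Reducer side: merge all the single-entry maps emitted for one user id.
    static Map<String, String> reduce(List<Map<String, String>> values) {
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, String> v : values) {
            merged.putAll(v);
        }
        return merged;
    }
}
```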

Hope this helps :) ..
