Hadoop mapper writes directly to the output (the reducer writes out the mapper's output)

Asked by: 小点点

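I have the following Hadoop code. When it runs, the mapper's output is written out as the reducer's output; the reducer essentially does nothing. The two input files have the following form:

File A: Jan-1#starwar,17115 (every line looks like this). The VALUE is the number 17115.

File B: #starwar,2017/1/1 5696 (every line looks like this). The VALUE is the number 5696.

The Mapper class processes these files and outputs the following key/value pairs:

JAN#STARWARS 17115/A, where the key is JAN#STARWARS

JAN#STARWARS 5696/B, where the key is JAN#STARWARS

The reducer should do the following:

All identical keys go to the same reducer (please correct me if I am wrong, I am new to Hadoop), and each reducer splits the value into two parts, a key and a value:

Key: A, value: 17115

Key: B, value: 5696

For now, it should simply add up all the values, regardless of whether they come from A or B, and write:

JAN#STARWARS 22811 (22811 = 17115 + 5696)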
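To make the intended arithmetic concrete, here is a minimal plain-Java sketch of the summing step described above. It involves no Hadoop types; the class name `SumSketch` and the method `sumValues` are made up for illustration, and the values are assumed to have the shape `number/platform`.

```java
import java.util.Arrays;
import java.util.List;

class SumSketch {
    /** Sums the numeric part of values shaped like "17115/A"; the platform tag is ignored. */
    static int sumValues(List<String> values) {
        int sum = 0;
        for (String v : values) {
            String[] parts = v.split("/"); // parts[0] = number, parts[1] = platform tag
            sum += Integer.parseInt(parts[0].trim());
        }
        return sum;
    }

    public static void main(String[] args) {
        // The two values that share the key JAN#STARWARS in the example above.
        List<String> grouped = Arrays.asList("17115/A", "5696/B");
        System.out.println("JAN#STARWARS " + sumValues(grouped)); // prints "JAN#STARWARS 22811"
    }
}
```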

So why is the mapper output written out without the reducer doing what it is supposed to do? I did not set the number of reducers to zero.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Partitioner;

public class WordCount {

  public static class TokenizerMapper
  extends Mapper<Object, Text, Text, Text>{

  //private final static IntWritable result = new IntWritable();
  private Text word = new Text();


public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
  StringTokenizer itr = new StringTokenizer(value.toString(),"\n");
  while (itr.hasMoreTokens()) {

    String nextWord = itr.nextToken().toUpperCase();

    //System.out.println("'"+nextWord+"'");
    if(isFromPlatformB(nextWord)){
    //Procedure for words of Platform B.
        String[] split1 = nextWord.split("(,)|(/)|(\\s)");
        String seriesTitle = split1[0];
        String numOfMonth = split1[2];
        String numOfDay = split1[3];
        String number = split1[4];//VALUE

        int monthInt = Integer.parseInt(numOfMonth);
        String monthString;

        switch (monthInt) {
            case 1:  monthString = "JAN";
                 break;
            case 2:  monthString = "FEB";
                 break;
            case 3:  monthString = "MAR";
                 break;
            case 4:  monthString = "APR";
                 break;
            case 5:  monthString = "MAY";
                 break;
            case 6:  monthString = "JUN";
                 break;
            case 7:  monthString = "JUL";
                 break;
            case 8:  monthString = "AUG";
                 break;
            case 9:  monthString = "SEP";
                 break;
            case 10: monthString = "OCT";
                 break;
            case 11: monthString = "NOV";
                 break;
            case 12: monthString = "DEC";
                 break;
            default: monthString = "ERROR";
                 break;
            }

         //result.set(numberInt);
         word.set(monthString + " " + seriesTitle);
         System.out.println("key: "+monthString + " " + seriesTitle + ",  value: "+number+"/B");
         context.write(word, new Text(number + "/B"));
         //FORMAT : <KEY,VALUE/B>
    }
    else{
         //Procedure for words of Platform A.
         String[] split5 = nextWord.split("(-)|( )|(,)");
         String month = split5[0];
         String seriesTitle = split5[2];
         String value2 = split5[3];//OUTVALUE
         String finalWord = month + " " + seriesTitle;//OUTKEY   KEY: <APR #WESTWORLD>

         word.set(finalWord);
         //result.set(valueInt);
         System.out.println("key: "+finalWord + ",  value: "+value2+"/A");
         context.write(word, new Text(value2 + "/A"));
         //FORMAT : <KEY,VALUE/A>
    } 
  }
}

 /*
 *This method takes the next token and returns true if the token is taken from platform B file,
 *Or it returns false if the token comes from platform A file.
 *
 */
 public boolean isFromPlatformB(String nextToken){
   // B platform has the form of : "#WestWorld ,2017/1/2){
   if(nextToken.charAt(0) == '#'){  
      return true;
   }
   return false;
}
}

  public static class IntSumReducer
   extends Reducer<Text,IntWritable,Text,Text> {
//private IntWritable result = new IntWritable();

public void reduce(Text key, Iterable<Text> values,
                   Context context
                   ) throws IOException, InterruptedException {
  int sum = 0;
  for (Text val : values) {
      String valToString = val.toString();

      String[] split = valToString.split("/");
      //String keyOfValue;
      String valueOfValue;
      int intValueOfValue = 0;

      // FORMAT : <KEY,VALUE/platform>  [<KEY,VALUE>,VALUE = <key,value>]
      //                 [0]      [1]

      if(split.length>1){
             //keyOfValue = split[1];
          valueOfValue = split[0];
          //System.out.println(key);
          //System.out.println(valueOfValue);
          //System.out.println(keyOfValue);
          intValueOfValue = Integer.parseInt(valueOfValue);
          /*if(keyOfValue.equals("A")){//If value is from platform A
              counterForPlatformA += intValueOfValue;
              System.out.println("KEY = 'A' " + "VALUE :"     +intValueOfValue);
              System.out.println("counter A: "+ counterForPlatformA +"||     counter B: "+ counterForPlatformB + "||----||");
          }
          else if(keyOfValue.equals("B")){//If value is from platform B
                 counterForPlatformB += intValueOfValue;
                 System.out.println("KEY = 'B' " + "VALUE :" +intValueOfValue);
                 System.out.println("counter A: "+ counterForPlatformA +"|| counter B: "+ counterForPlatformB + "||----||");
              }
              else{
                    //ERROR
                    System.out.println("Not equal to A or B");
                  }*/

      }
      sum += intValueOfValue;
  }
  context.write(key, new Text(sum));
  }
  }

public static void main(String[] args) throws Exception{

 if (args.length != 3 ){
    System.err.println ("Usage :<inputlocation1> <inputlocation2>   <outputlocation> >");
    System.exit(0);
}
Configuration conf = new Configuration();
String[] files=new GenericOptionsParser(conf,args).getRemainingArgs();
Path input1=new Path(files[0]);
Path input2=new Path(files[1]);
Path output=new Path(files[2]);
//If OUTPUT already exists -> Delete it
FileSystem fs = FileSystem.get(conf);
if(fs.exists(output)){
    fs.delete(output, true);
} 

Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, input1, TextInputFormat.class);
MultipleInputs.addInputPath(job, input2, TextInputFormat.class);
FileOutputFormat.setOutputPath(job, output);
System.exit(job.waitForCompletion(true) ? 0 : 1);

}
}

2 answers

Anonymous user

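It looks like your reducer receives pairs of Text objects and outputs Text. If that is the case, you have a few problems:

In your main you have:

job.setOutputValueClass(IntWritable.class), which should probably be job.setOutputValueClass(Text.class).

You have also defined your reducer as:

public static class IntSumReducer extends Reducer<Text,IntWritable,Text,Text>

The reducer receives Text values, not IntWritables.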
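A likely reason the mismatch shows up as "mapper output written unchanged" rather than as an error: to my understanding, Hadoop's `Reducer.reduce` has a default implementation that writes every value through as-is, and a `reduce` method whose parameter types disagree with the declared generics is an overload, not an override, so the framework never calls it. The plain-Java sketch below reproduces that effect with hypothetical stand-in classes (`BaseReducer`, `MismatchedReducer`, `MatchingReducer` are illustrative names, not Hadoop classes).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverrideSketch {
    /** Hypothetical stand-in for a framework base class; not a Hadoop class. */
    static class BaseReducer<K, V> {
        // Default behavior: pass every value through unchanged.
        void reduce(K key, Iterable<V> values, List<String> out) {
            for (V value : values) {
                out.add(key + "\t" + value);
            }
        }

        // The "framework" always calls reduce(K, Iterable<V>, List<String>).
        final List<String> run(K key, Iterable<V> values) {
            List<String> out = new ArrayList<>();
            reduce(key, values, out);
            return out;
        }
    }

    /** Declares Integer values but implements reduce for String values: an overload. */
    static class MismatchedReducer extends BaseReducer<String, Integer> {
        void reduce(String key, Iterable<String> values, List<String> out) {
            out.add(key + "\t" + sum(values));
        }
    }

    /** Declared types and method parameters agree, so @Override compiles. */
    static class MatchingReducer extends BaseReducer<String, String> {
        @Override
        void reduce(String key, Iterable<String> values, List<String> out) {
            out.add(key + "\t" + sum(values));
        }
    }

    static int sum(Iterable<String> values) {
        int total = 0;
        for (String v : values) {
            total += Integer.parseInt(v.split("/")[0].trim());
        }
        return total;
    }

    // Raw types mimic what a framework sees at run time after type erasure.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> runErased(BaseReducer reducer, String key, List<String> values) {
        return reducer.run(key, values);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("17115/A", "5696/B");
        // Pass-through: the subclass method is never called.
        System.out.println(runErased(new MismatchedReducer(), "JAN #STARWARS", values));
        // Summed: the override is called.
        System.out.println(runErased(new MatchingReducer(), "JAN #STARWARS", values));
    }
}
```

Putting `@Override` on `MismatchedReducer.reduce` would turn the silent pass-through into a compile error, which is why annotating `reduce` in a real reducer is a cheap safeguard.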


                        

                
Anonymous user

Finally, there is the combiner. If you also set your reducer as the combiner, then your mapper and your reducer cannot use different types.
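The reason is that a combiner's output is fed to the reducer in place of the raw map output, so it has to produce exactly what it consumes. The same contract applies to the text format inside the values. The plain-Java sketch below is only an illustration with made-up names (`CombinerSketch`, `combine`, the `COMBINED` tag); its `reduce` mirrors the `split.length>1` check from the question's reducer.

```java
import java.util.Arrays;
import java.util.List;

class CombinerSketch {
    /** Mirrors the reducer logic in the question: only values containing "/" are counted. */
    static String reduce(List<String> values) {
        int sum = 0;
        for (String v : values) {
            String[] parts = v.split("/");
            if (parts.length > 1) {
                sum += Integer.parseInt(parts[0].trim());
            }
        }
        return Integer.toString(sum); // final output: a plain number without "/tag"
    }

    /** Hypothetical dedicated combiner: emits the same "number/tag" shape it consumes. */
    static String combine(List<String> values) {
        return reduce(values) + "/COMBINED"; // "COMBINED" is an illustrative tag
    }

    public static void main(String[] args) {
        List<String> mapOutput = Arrays.asList("17115/A", "5696/B");

        // Reducer reused as combiner: its output is no longer valid reducer input.
        String reused = reduce(Arrays.asList(reduce(mapOutput)));
        // Dedicated combiner: its output still matches the map output format.
        String dedicated = reduce(Arrays.asList(combine(mapOutput)));

        System.out.println("reducer as combiner: " + reused);    // 0
        System.out.println("dedicated combiner:  " + dedicated); // 22811
    }
}
```

In the job from the question, this corresponds to either dropping the `job.setCombinerClass(IntSumReducer.class)` call or supplying a separate combiner class whose input and output types both equal the map output types.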


		      