
Hadoop MultipleOutputs: Writing Output to Multiple Files

July 19, 2019 | 移动技术网 IT Programming


1. Writing to multiple files or multiple directories:

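For this form of output the driver needs no extra changes; just add the following code to the Mapper or Reducer class. (Named outputs, used in the example below, are the exception: they must also be registered in the driver with MultipleOutputs.addNamedOutput.)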

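// The type parameters must match this class's output key/value types.
private MultipleOutputs<Text, IntWritable> mos;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    mos = new MultipleOutputs<Text, IntWritable>(context);
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close(); // closes every writer opened through mos
}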

You can then call mos.write(KEYOUT key, VALUEOUT value, String baseOutputPath) in place of context.write(key, value).

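This works in either the Mapper or the Reducer. The default output files part-m-nnnnn or part-r-nnnnn are still created, but if every record is written through mos they contain nothing (size 0). Also note that only the records a mapper emits with context.write() are passed on to the reducer; records a mapper writes through mos go straight to the output directory.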

Note: the third parameter of MultipleOutputs.write(key, value, baseOutputPath) determines where that output is written, relative to the output directory specified by the user.

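If baseOutputPath contains no file separator "/", the output file is named baseOutputPath-r-nnnnn (name-r-nnnnn; -m- instead of -r- when written from a map task).
If it does contain "/", for example baseOutputPath = "029070-99999/1901/part", the output file is 029070-99999/1901/part-r-nnnnn.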
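The naming rule can be mirrored in a few lines of plain Java, which makes it easy to predict file names before running a job. This is a sketch of the convention only, assuming the default TextOutputFormat (no compression extension) and a five-digit, zero-padded partition number; OutputNames and fileName are hypothetical names, not part of the Hadoop API.

```java
// Hypothetical helper mirroring how MultipleOutputs names its files:
// <baseOutputPath>-<m|r>-<5-digit partition>, relative to the job output directory.
// Assumes TextOutputFormat without compression (no file extension).
final class OutputNames {
    private OutputNames() {}

    // taskType is 'm' for a map task and 'r' for a reduce task.
    static String fileName(String baseOutputPath, char taskType, int partition) {
        if (taskType != 'm' && taskType != 'r') {
            throw new IllegalArgumentException("taskType must be 'm' or 'r'");
        }
        return String.format("%s-%c-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileName("allpart", 'r', 0));
        System.out.println(fileName("029070-99999/1901/part", 'r', 3));
        System.out.println(fileName("1512/", 'r', 0));
    }
}
```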

2. Example: requirements

Requirement: below is some test data. The records should be written to the output directory grouped by category (the first field):

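1512,iPhone5s,4-inch,fingerprint recognition,A7 processor,64-bit,M7 coprocessor,low power consumption
1512,iPhone5,4-inch,A6 processor,iOS7
1512,iPhone4s,3.5-inch,A5 processor,dual-core,classic
50019780,iPad,9.7-inch,Retina screen,rich selection of apps
50019780,Yoga,Lenovo,18-hour standby,distinctive design
50019780,Nexus 7,ASUS&Google,7-inch
50019780,iPad mini 2,Retina display,Apple,7.9-inch
1101,MacBook Air,ultra-thin Apple,OS X Mavericks
1101,MacBook Pro,Apple,OS X Lion
1101,ThinkPad Yoga,Lenovo,Windows 8,ultrabook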
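Before writing the job it can help to pin down the expected result. The following plain-Java sketch (no Hadoop involved; CategorySplit and groupByCategory are hypothetical names) groups lines like the ones above by their first field, which is the grouping the job should reproduce with one directory per category.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the requirement: group CSV lines by their leading category id.
final class CategorySplit {
    private CategorySplit() {}

    static Map<String, List<String>> groupByCategory(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines, as the mapper does
            }
            String category = line.split(",", 2)[0];
            groups.computeIfAbsent(category, k -> new ArrayList<>()).add(line);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "1512,iPhone5s,4-inch", "1512,iPhone5,4-inch", "1512,iPhone4s,3.5-inch",
            "50019780,iPad,9.7-inch", "50019780,Yoga,Lenovo",
            "50019780,Nexus 7,7-inch", "50019780,iPad mini 2,7.9-inch",
            "1101,MacBook Air,OS X Mavericks", "1101,MacBook Pro,OS X Lion",
            "1101,ThinkPad Yoga,Windows 8");
        groupByCategory(sample).forEach((k, v) -> System.out.println(k + " -> " + v.size()));
    }
}
```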

3. Mapper:

package cn.edu.bjut.multioutput;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (category id, whole line) so that all lines of one category reach the same reduce() call.
public class MultiOutputMapper extends Mapper<LongWritable, Text, IntWritable, Text> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString().trim();
    if (!line.isEmpty()) { // skip blank lines
      String[] arr = line.split(",");
      // The first field is the category id, e.g. 1512.
      context.write(new IntWritable(Integer.parseInt(arr[0])), value);
    }
  }

}
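Integer.parseInt in the mapper throws NumberFormatException when the first field is not numeric (a header row, a corrupted record), which fails the map task. If the input may contain such lines, one option is to skip them. The sketch below isolates that parsing step in plain Java so it can be unit-tested without a cluster; CategoryKey and parseCategory are hypothetical names.

```java
import java.util.OptionalInt;

// Hypothetical, cluster-free version of the mapper's key extraction.
final class CategoryKey {
    private CategoryKey() {}

    // Returns the leading category id, or empty for blank or malformed lines.
    static OptionalInt parseCategory(String rawLine) {
        if (rawLine == null) {
            return OptionalInt.empty();
        }
        String line = rawLine.trim();
        if (line.isEmpty()) {
            return OptionalInt.empty();
        }
        String first = line.split(",", 2)[0].trim();
        try {
            return OptionalInt.of(Integer.parseInt(first));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // not a numeric category id
        }
    }
}
```

Inside map() you would then write `OptionalInt id = CategoryKey.parseCategory(value.toString());` and call `context.write(new IntWritable(id.getAsInt()), value)` only when `id.isPresent()`. A plain if statement is used rather than ifPresent with a lambda, because context.write throws checked exceptions.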

4. Reducer:

package cn.edu.bjut.multioutput;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MultiOutputReducer extends
    Reducer<IntWritable, Text, NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> multipleOutputs = null;

  @Override
  protected void reduce(IntWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text text : values) {
      // Named output "keyspilt": one directory per category, <output>/<key>/
      multipleOutputs.write("keyspilt", NullWritable.get(), text, key.toString() + "/");
      // Named output "allpart": every record again, in <output>/allpart-r-nnnnn
      multipleOutputs.write("allpart", NullWritable.get(), text);
    }
  }

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    if (null != multipleOutputs) {
      multipleOutputs.close(); // closes all writers opened by multipleOutputs
      multipleOutputs = null;
    }
  }

}
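To see which files a single reduce task ends up writing, the reducer's two write calls can be modeled in plain Java. This is a simulation only, assuming one reduce task, TextOutputFormat without compression, and the naming rule described earlier (baseOutputPath-r-nnnnn); ReducerLayout and layout are hypothetical names, and the returned map stands in for files in the output directory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models which output files the reducer's two write(...) calls produce.
final class ReducerLayout {
    private ReducerLayout() {}

    private static String fileName(String baseOutputPath, int partition) {
        return String.format("%s-r-%05d", baseOutputPath, partition);
    }

    // groups: category key -> records, as delivered to reduce() after the shuffle.
    static Map<String, List<String>> layout(Map<Integer, List<String>> groups, int partition) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> e : groups.entrySet()) {
            for (String record : e.getValue()) {
                // write("keyspilt", ..., key + "/") -> "<key>/-r-nnnnn"
                files.computeIfAbsent(fileName(e.getKey() + "/", partition), k -> new ArrayList<>())
                     .add(record);
                // write("allpart", ...) -> "allpart-r-nnnnn"
                files.computeIfAbsent(fileName("allpart", partition), k -> new ArrayList<>())
                     .add(record);
            }
        }
        return files;
    }
}
```

Because the key-based path ends with "/", the file inside each category directory has an empty prefix and is named -r-00000. Passing key.toString() + "/part" instead would give names such as 1512/part-r-00000.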

5. Driver:

package cn.edu.bjut.multioutput;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MainJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "aaa"); // new Job(conf, name) is deprecated
    job.setJarByClass(MainJob.class);

    job.setMapperClass(MultiOutputMapper.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(Text.class);

    job.setReducerClass(MultiOutputReducer.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Register the named outputs used by the reducer.
    MultipleOutputs.addNamedOutput(job, "keyspilt", TextOutputFormat.class, NullWritable.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "allpart", TextOutputFormat.class, NullWritable.class, Text.class);

    // Delete an existing output directory; otherwise the job refuses to start.
    Path outPath = new Path(args[1]);
    FileSystem fs = outPath.getFileSystem(conf);
    if (fs.exists(outPath)) {
      fs.delete(outPath, true);
    }
    FileOutputFormat.setOutputPath(job, outPath);
    // Optional: LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    // creates the default part-r-nnnnn files only when something is written to them.

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
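Named outputs registered with addNamedOutput need simple names. Based on a reading of the Hadoop source (worth double-checking for the version you run), MultipleOutputs rejects a name that is empty, contains anything other than ASCII letters and digits, or equals "part", the default output name. The plain-Java check below mirrors those assumed rules so names such as keyspilt and allpart can be validated up front; NamedOutputNames and isValid are hypothetical names.

```java
// Hypothetical pre-flight check mirroring (as an assumption) the name rules
// MultipleOutputs applies when a named output is added.
final class NamedOutputNames {
    private NamedOutputNames() {}

    static boolean isValid(String name) {
        if (name == null || name.isEmpty() || name.equals("part")) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            boolean letterOrDigit = (ch >= 'A' && ch <= 'Z')
                || (ch >= 'a' && ch <= 'z')
                || (ch >= '0' && ch <= '9');
            if (!letterOrDigit) {
                return false; // e.g. '_', '-', '/' or non-ASCII characters
            }
        }
        return true;
    }
}
```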
