Counting Associated Products with Hadoop in Java

July 22, 2019 | 移动技术网 IT编程

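I have spent the last few days reading books on Hadoop and am starting to get a feel for it, so I wrote a program modeled on WordCount that counts associated products.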

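Requirement:

Given a supermarket's sales receipts, compute how strongly products are associated, that is, count how many times product A and product B are bought together.

Data format:

Each sales receipt is simplified to one line, with the products on it separated by ",", as shown in the figure below: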

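Analysis:

The computation uses MapReduce in Hadoop.

The map function splits each receipt into pairs of associated products, emitting product A as the key and product B as the value. The split results for the first and third records are shown in the figure below:

So that the output can list both the products associated with A and the products associated with B, each relationship between A and B is emitted twice: once as A-B and once as B-A.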
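The pairing step can be tried out without a cluster. The following is a minimal plain-Java sketch of the same double loop, assuming a receipt line such as "a,b,c". The class name `PairSketch`, the method `pairs`, and the sample line are made up for illustration and are not part of the job itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the pairing step; no Hadoop classes are involved. */
class PairSketch {

    /** Returns every {key, value} pair that one receipt line would produce. */
    static List<String[]> pairs(String line) {
        List<String[]> out = new ArrayList<>();
        if (line == null || line.isEmpty()) {
            return out;
        }
        String[] items = line.split(",");
        for (int i = 0; i < items.length - 1; i++) {
            if (items[i].isEmpty()) {
                continue; // skip empty fields such as the middle one in "a,,b"
            }
            for (int j = i + 1; j < items.length; j++) {
                if (items[j].isEmpty()) {
                    continue;
                }
                // Emit both directions so each product can act as the key
                out.add(new String[] {items[i], items[j]});
                out.add(new String[] {items[j], items[i]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical receipt with three products
        for (String[] pair : pairs("a,b,c")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

A receipt with n non-empty products produces n * (n - 1) records, so the hypothetical three-product line above yields six.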

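The reduce function groups and counts the products associated with product A, that is, it counts how many times each product appears among the values. It emits "product A|product B" as the key and the number of times that combination occurs as the value. Taking the 5 records mentioned above, consider the map output whose key is r:

After the map function has run, the records look like those in the figure below:

In reduce, the values from the map output are grouped and counted, giving the result in the figure below:

The product pair is then written out as the key and the number of occurrences as the value, as shown in the figure below: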
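The grouping and counting for a single key can be sketched in plain Java as well. The example below assumes the key r arrives with a hypothetical list of values; the class name `CountSketch`, the method `countPairs`, and the sample values are illustrative only. It uses a `TreeMap` purely so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the counting step for one key; no Hadoop classes are involved. */
class CountSketch {

    /**
     * Counts how often each value occurs for the given key and returns
     * "key|value" -> count, keeping only counts of at least minCount.
     */
    static Map<String, Integer> countPairs(String key, List<String> values, int minCount) {
        Map<String, Integer> perProduct = new TreeMap<>();
        for (String value : values) {
            perProduct.merge(value, 1, Integer::sum);
        }
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : perProduct.entrySet()) {
            if (entry.getValue() >= minCount) {
                out.put(key + "|" + entry.getKey(), entry.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical values grouped under key "r"
        List<String> values = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(countPairs("r", values, 1)); // prints {r|a=3, r|b=1, r|c=1}
        System.out.println(countPairs("r", values, 2)); // prints {r|a=3}
    }
}
```

The third argument plays the role of a minimum-count threshold: pairs that occur fewer times than that are dropped from the output.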

That completes the analysis of how the requirement is implemented. The concrete code follows.

Code:

I won't walk through the code in detail; see the comments in it.

package com;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {

  /**
   * Mapper: preprocesses the data.
   * Emits product A as the key and an associated product B as the value.
   * @author lulei
   */
  public static class MapT extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
      String line = value.toString();
      if (!(line == null || "".equals(line))) {
        // Split the line into products
        String[] vs = line.split(",");
        // Combine the products two at a time; each combination yields records
        for (int i = 0; i < (vs.length - 1); i++) {
          if ("".equals(vs[i])) { // skip empty fields
            continue;
          }
          for (int j = i + 1; j < vs.length; j++) {
            if ("".equals(vs[j])) {
              continue;
            }
            // Emit both directions: A-B and B-A
            context.write(new Text(vs[i]), new Text(vs[j]));
            context.write(new Text(vs[j]), new Text(vs[i]));
          }
        }
      }
    }
  }

  /**
   * Reducer: counts the data.
   * Emits "A|B" as the key and the number of times the pair occurs as the value.
   * @author lulei
   */
  public static class ReduceT extends Reducer<Text, Text, Text, IntWritable> {
    private int count;

    /**
     * Initialization
     */
    @Override
    public void setup(Context context) {
      // Read the minimum count from the job configuration
      String countStr = context.getConfiguration().get("count");
      try {
        this.count = Integer.parseInt(countStr);
      } catch (Exception e) {
        this.count = 0;
      }
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
      String keyStr = key.toString();
      HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
      // Use a hash map to count how many times each product B occurs
      for (Text value : values) {
        String valueStr = value.toString();
        if (hashMap.containsKey(valueStr)) {
          hashMap.put(valueStr, hashMap.get(valueStr) + 1);
        } else {
          hashMap.put(valueStr, 1);
        }
      }
      // Write out the results
      for (Entry<String, Integer> entry : hashMap.entrySet()) {
        if (entry.getValue() >= this.count) { // only emit pairs that reach the minimum count
          context.write(new Text(keyStr + "|" + entry.getKey()), new IntWritable(entry.getValue()));
        }
      }
    }
  }

  @Override
  public int run(String[] arg0) throws Exception {
    Configuration conf = getConf();
    // Minimum count, read back in ReduceT.setup
    conf.set("count", arg0[2]);

    Job job = new Job(conf);
    job.setJobName("jobtest");
    // Lets Hadoop locate the jar that contains the mapper and reducer
    job.setJarByClass(Test.class);

    job.setOutputFormatClass(TextOutputFormat.class);
    // The mapper emits Text/Text; the reducer emits Text/IntWritable
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MapT.class);
    job.setReducerClass(ReduceT.class);

    FileInputFormat.addInputPath(job, new Path(arg0[0]));
    FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

    job.waitForCompletion(true);

    return job.isSuccessful() ? 0 : 1;
  }

  /**
   * @param args input path, output path, minimum count
   */
  public static void main(String[] args) {
    if (args.length != 3) {
      System.exit(-1);
    }
    try {
      int res = ToolRunner.run(new Configuration(), new Test(), args);
      System.exit(res);
    } catch (Exception e) {
      e.printStackTrace();
      System.exit(1);
    }
  }

}

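Upload and run:

Package the program as a jar file and upload it to the cluster. Upload the test data to HDFS as well.

A screenshot of the command being run is shown in the figure below:

After the job finishes, check the corresponding files in HDFS, as shown in the figure below: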
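Once the result file has been copied out of HDFS, its lines can be read back in plain Java. `TextOutputFormat` by default writes each record as the key, a tab, and the value, so a line from this job should look like "A|B", a tab, then the count. The sketch below parses such lines and sorts them by count. It assumes product names contain neither "|" nor a tab, and the class name `OutputSketch`, its methods, and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch for reading the job's text output; no Hadoop classes are involved. */
class OutputSketch {

    /** One parsed result line: the two products and how often they were bought together. */
    static final class PairCount {
        final String productA;
        final String productB;
        final int count;

        PairCount(String productA, String productB, int count) {
            this.productA = productA;
            this.productB = productB;
            this.count = count;
        }
    }

    /** Parses a line of the form "A|B<TAB>count". */
    static PairCount parse(String line) {
        int tab = line.lastIndexOf('\t');
        int bar = line.indexOf('|');
        if (tab < 0 || bar < 0 || bar > tab) {
            throw new IllegalArgumentException("unexpected line: " + line);
        }
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return new PairCount(line.substring(0, bar), line.substring(bar + 1, tab), count);
    }

    /** Parses all non-empty lines and orders them from most to least frequent. */
    static List<PairCount> sortedByCount(List<String> lines) {
        List<PairCount> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                out.add(parse(line));
            }
        }
        out.sort((x, y) -> Integer.compare(y.count, x.count));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical output lines
        List<String> lines = Arrays.asList("r|b\t1", "r|a\t3", "a|r\t3");
        for (PairCount pc : sortedByCount(lines)) {
            System.out.println(pc.productA + " & " + pc.productB + ": " + pc.count);
        }
    }
}
```

Because the mapper emits every pair in both directions, each combination appears twice in the output, once as "A|B" and once as "B|A", with the same count.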

That completes a full MapReduce program. I will keep studying Hadoop. Thanks for reading, and I hope this helps.
