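Spark is yet another much-talked-about distributed computing framework; for details see http://spark-project.org/.
Installation is not covered here, since Chinese-language guides are already available online. This post is a getting-started introduction. As with Hadoop, the first thing to study is the word-count example that ships with Spark: JavaWordCount.
The original code is as follows: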
Java code:
import scala.Tuple2;
import spark.api.java.JavaPairRDD;
import spark.api.java.JavaRDD;
import spark.api.java.JavaSparkContext;
import spark.api.java.function.FlatMapFunction;
import spark.api.java.function.Function2;
import spark.api.java.function.PairFunction;

import java.util.Arrays;
import java.util.List;

public class JavaWordCount {
  public static void main(String[] args) throws Exception {
    if (args.length < 2) {
      System.err.println("Usage: JavaWordCount <master> <file>");
      System.exit(1);
    }

    // args[0] is the master (e.g. "local"), args[1] is the input file.
    JavaSparkContext ctx = new JavaSparkContext(args[0], "JavaWordCount",
        System.getenv("SPARK_HOME"), System.getenv("SPARK_EXAMPLES_JAR"));
    JavaRDD<String> lines = ctx.textFile(args[1], 1);

    // Split each line on spaces and flatten the pieces into one RDD of words.
    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      public Iterable<String> call(String s) {
        return Arrays.asList(s.split(" "));
      }
    });

    // Pair every word with the count 1.
    JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    });

    // Sum the counts of identical words.
    JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
      }
    });

    // Bring the results back to the driver and print them.
    List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<String, Integer> tuple : output) {
      System.out.println(tuple._1 + ": " + tuple._2);
    }
    System.exit(0);
  }
}
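To make the three transformations in the listing above easier to follow, here is a minimal plain-Java sketch of the same logic that runs in memory without Spark. The class `LocalWordCount` and its method `countWords` are hypothetical names introduced only for illustration; they are not part of Spark. The sketch assumes, like the example, that words are separated by single spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: mirrors the steps of JavaWordCount in memory.
class LocalWordCount {
  static Map<String, Integer> countWords(List<String> lines) {
    // Step 1 (like flatMap): split every line on spaces and flatten into one word list.
    List<String> words = new ArrayList<String>();
    for (String line : lines) {
      words.addAll(Arrays.asList(line.split(" ")));
    }

    // Steps 2 and 3 (like map to (word, 1) followed by reduceByKey with +):
    // each occurrence contributes 1, and contributions for the same word are summed.
    Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
    for (String word : words) {
      Integer current = counts.get(word);
      counts.put(word, current == null ? 1 : current + 1);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = countWords(Arrays.asList("Hello World Bye World goole"));
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue());
    }
  }
}
```

The sketch uses a single in-memory map, whereas Spark's `reduceByKey` performs the same per-key summation across partitions, so the order in which Spark prints the words may differ from the order produced here.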
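Run it with: ./run spark/examples/JavaWordCount local input.txt
local: the <master> argument. It tells Spark to run locally on this machine rather than on a cluster; see the Spark documentation for the other possible values.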
input.txt: the input file, with the following contents:
Hello World Bye World goole
The output is the same as that of the word-count example run on Hadoop:
goole: 1
World: 2
Hello: 1
Bye: 1