Counting words
I tried counting words with the sample program.
First, prepare a sample file.
[nashuaki@hostname /usr/local/hadoop]$ mkdir inputs
[nashuaki@hostname /usr/local/hadoop]$ echo hoge hoge hoge fuge hage hage fuge hoge hoge > inputs/file1
Copy it to HDFS
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop dfs -copyFromLocal inputs inputs
Check that it's there
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - nashuaki supergroup 0 2011-03-31 15:56 /user/nashuaki/inputs
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop dfs -ls inputs
Found 1 items
-rw-r--r-- 1 nashuaki supergroup 44 2011-03-31 15:56 /user/nashuaki/inputs/file1
Run the sample program
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount inputs outputs
11/03/31 16:12:37 INFO input.FileInputFormat: Total input paths to process : 1
11/03/31 16:12:38 INFO mapred.JobClient: Running job: job_201103311518_0001
11/03/31 16:12:39 INFO mapred.JobClient: map 0% reduce 0%
11/03/31 16:12:47 INFO mapred.JobClient: map 100% reduce 0%
11/03/31 16:12:59 INFO mapred.JobClient: map 100% reduce 100%
11/03/31 16:13:01 INFO mapred.JobClient: Job complete: job_201103311518_0001
11/03/31 16:13:01 INFO mapred.JobClient: Counters: 17
11/03/31 16:13:01 INFO mapred.JobClient: Job Counters
11/03/31 16:13:01 INFO mapred.JobClient: Launched reduce tasks=1
11/03/31 16:13:01 INFO mapred.JobClient: Launched map tasks=1
11/03/31 16:13:01 INFO mapred.JobClient: Data-local map tasks=1
11/03/31 16:13:01 INFO mapred.JobClient: FileSystemCounters
11/03/31 16:13:01 INFO mapred.JobClient: FILE_BYTES_READ=39
11/03/31 16:13:01 INFO mapred.JobClient: HDFS_BYTES_READ=44
11/03/31 16:13:01 INFO mapred.JobClient: FILE_BYTES_WRITTEN=110
11/03/31 16:13:01 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=21
11/03/31 16:13:01 INFO mapred.JobClient: Map-Reduce Framework
11/03/31 16:13:01 INFO mapred.JobClient: Reduce input groups=3
11/03/31 16:13:01 INFO mapred.JobClient: Combine output records=3
11/03/31 16:13:01 INFO mapred.JobClient: Map input records=1
11/03/31 16:13:01 INFO mapred.JobClient: Reduce shuffle bytes=0
11/03/31 16:13:01 INFO mapred.JobClient: Reduce output records=3
11/03/31 16:13:01 INFO mapred.JobClient: Spilled Records=6
11/03/31 16:13:01 INFO mapred.JobClient: Map output bytes=81
11/03/31 16:13:01 INFO mapred.JobClient: Combine input records=9
11/03/31 16:13:01 INFO mapred.JobClient: Map output records=9
11/03/31 16:13:01 INFO mapred.JobClient: Reduce input records=3
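The record counters in this log line up with how a word count flows through map, combine, and reduce. Here is a rough Python sketch of that flow on the same input line. It is not the code inside hadoop-0.20.2-examples.jar. The function names (`map_phase`, `combine_phase`, `reduce_phase`) are made up for illustration, and it assumes whitespace tokenization and a single map task.

```python
from collections import Counter

# The single line written to inputs/file1 earlier in this post.
LINE = "hoge hoge hoge fuge hage hage fuge hoge hoge"


def map_phase(lines):
    """Emit one (word, 1) pair per whitespace-separated token."""
    return [(word, 1) for line in lines for word in line.split()]


def combine_phase(pairs):
    """Pre-aggregate on the map side: one record per distinct word."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())


def reduce_phase(pairs):
    """Sum the partial counts per word and return them sorted by key."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())


mapped = map_phase([LINE])
combined = combine_phase(mapped)
reduced = reduce_phase(combined)

print("Map output records    :", len(mapped))
print("Combine input records :", len(mapped))
print("Combine output records:", len(combined))
print("Reduce input records  :", len(combined))
print("Reduce output records :", len(reduced))
print(reduced)
```

Under those assumptions the sketch gives 9 map output records and 3 records after combining and reducing, the same numbers the job reported.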
Check the results
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - nashuaki supergroup 0 2011-03-31 15:56 /user/nashuaki/inputs
drwxr-xr-x - nashuaki supergroup 0 2011-03-31 16:12 /user/nashuaki/outputs
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop dfs -ls outputs
Found 2 items
drwxr-xr-x - nashuaki supergroup 0 2011-03-31 16:12 /user/nashuaki/outputs/_logs
-rw-r--r-- 1 nashuaki supergroup 21 2011-03-31 16:12 /user/nashuaki/outputs/part-r-00000
It worked! о(ж>▽<)y ☆
Fetch the output to the local filesystem
[nashuaki@hostname /usr/local/hadoop]$ ./bin/hadoop dfs -get outputs outputs
Check the contents
[nashuaki@hostname /usr/local/hadoop]$ cat outputs/part-r-00000
fuge 2
hage 2
hoge 5
Heh heh γ(▽´ )ツヾ( `▽)ゞ
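If you want to use the result from a script, the part file is easy to parse. The sketch below assumes each line is a word and a count separated by a tab (the default separator for Hadoop's text output, as far as I know), and the whitespace in the transcript above may not show that exactly. `parse_wordcount_output` is a hypothetical helper name, and the sample string is typed in by hand rather than read from the real file.

```python
def parse_wordcount_output(text):
    """Turn 'word<TAB>count' lines into a dict of word -> int."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        counts[word] = int(count)
    return counts


# Hand-written stand-in for outputs/part-r-00000 (tab separator assumed).
sample = "fuge\t2\nhage\t2\nhoge\t5\n"

counts = parse_wordcount_output(sample)
print(counts)

# Three lines of "4 letters + tab + 1 digit + newline" come to 21 bytes.
print(len(sample.encode("utf-8")))
```

The 21 bytes agree with the size of part-r-00000 in the listing and with HDFS_BYTES_WRITTEN=21 in the job log, which fits the tab-plus-newline assumption.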