CDH3u2 was released on 2011/10/21, so I went ahead and upgraded.
CDH3 Installation Guide - Cloudera Support
Upgrading to CDH3 - Cloudera Support
$ hadoop version
Hadoop 0.20.2-cdh3u1
Subversion file:///tmp/nightly_2011-07-18_07-57-52_3/hadoop-0.20-0.20.2+923.97-1~maverick -r bdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638
Compiled by root on Mon Jul 18 09:40:07 PDT 2011
From source with checksum 3127e3d410455d2bacbff7673bf3284c
CDH3u1 is currently installed.
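Before upgrading, it can help to double-check which Hadoop-related packages (and versions) are installed. A quick dpkg check:

$ dpkg -l | grep -i hadoop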
$ for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
[sudo] password for h-akanuma:
Stopping Hadoop datanode daemon: no datanode to stop
hadoop-0.20-datanode.
Stopping Hadoop jobtracker daemon: no jobtracker to stop
hadoop-0.20-jobtracker.
Stopping Hadoop namenode daemon: no namenode to stop
hadoop-0.20-namenode.
Stopping Hadoop secondarynamenode daemon: no secondarynamenode to stop
hadoop-0.20-secondarynamenode.
Stopping Hadoop tasktracker daemon: no tasktracker to stop
hadoop-0.20-tasktracker.
Stopping Hadoop HBase master daemon: no master to stop because kill -0 of pid 2271 failed with status 1
hbase-master.
Stopping Hadoop HBase regionserver daemon: stopping regionserver........
hbase-regionserver.
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Stopping zookeeper ... STOPPED
$
$ jps
9534 Jps
$
$ ps aux | grep hadoop
1000      9544  0.0  0.0   5164   788 pts/0    S+   21:56   0:00 grep --color=auto hadoop
Stop all the Hadoop-related processes.
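As an extra sanity check that nothing is left running, bracketing the first letter of the grep pattern keeps the grep process itself out of the results (a small variant of the ps check above):

$ ps aux | grep '[h]adoop'    # no output means no Hadoop processes remain
$ ps aux | grep '[h]base'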
$ sudo dpkg -i ダウンロード/cdh3-repository_1.0_all.deb
Selecting previously unselected package cdh3-repository.
(Reading database ... 262400 files and directories currently installed.)
Unpacking cdh3-repository (from .../cdh3-repository_1.0_all.deb) ...
Setting up cdh3-repository (1.0) ...
gpg: keyring `/etc/apt/secring.gpg' created
gpg: keyring `/etc/apt/trusted.gpg.d/cloudera-cdh3.gpg' created
gpg: key 02A818DD: public key "Cloudera Apt Repository" imported
gpg: Total number processed: 1
gpg:               imported: 1
Install the downloaded repository package.
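To confirm the repository was actually registered with APT, you can look at the source list the package drops in place (assuming it installs it as /etc/apt/sources.list.d/cloudera.list):

$ cat /etc/apt/sources.list.d/cloudera.list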
$ sudo apt-get update
...
Update the APT package index.
$ apt-cache search hadoop
ubuntu-orchestra-modules-hadoop - Modules mainly used by orchestra-management-server
flume - reliable, scalable, and manageable distributed data collection application
hadoop-0.20 - A software platform for processing vast amounts of data
hadoop-0.20-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-0.20-datanode - Data Node for Hadoop
hadoop-0.20-doc - Documentation for Hadoop
hadoop-0.20-fuse - HDFS exposed over a Filesystem in Userspace
hadoop-0.20-jobtracker - Job Tracker for Hadoop
hadoop-0.20-namenode - Name Node for Hadoop
hadoop-0.20-native - Native libraries for Hadoop (e.g., compression)
hadoop-0.20-pipes - Interface to author Hadoop MapReduce jobs in C++
hadoop-0.20-sbin - Server-side binaries necessary for secured Hadoop clusters
hadoop-0.20-secondarynamenode - Secondary Name Node for Hadoop
hadoop-0.20-source - Source code for Hadoop
hadoop-0.20-tasktracker - Task Tracker for Hadoop
hadoop-hbase - HBase is the Hadoop database
hadoop-hbase-doc - Documentation for HBase
hadoop-hbase-master - HMaster is the "master server" for a HBase
hadoop-hbase-regionserver - HRegionServer makes a set of HRegions available to clients
hadoop-hbase-thrift - Provides an HBase Thrift service
hadoop-hive - A data warehouse infrastructure built on top of Hadoop
hadoop-hive-metastore - Shared metadata repository for Hive
hadoop-hive-server - Provides a Hive Thrift service
hadoop-pig - A platform for analyzing large data sets using Hadoop
hadoop-zookeeper - A high-performance coordination service for distributed applications.
hadoop-zookeeper-server - This runs the zookeeper server on startup.
hue-common - A browser-based desktop interface for Hadoop
hue-filebrowser - A UI for the Hadoop Distributed File System (HDFS)
hue-jobbrowser - A UI for viewing Hadoop map-reduce jobs
hue-jobsub - A UI for designing and submitting map-reduce jobs to Hadoop
hue-plugins - Plug-ins for Hadoop to enable integration with Hue
hue-shell - A shell for console based Hadoop applications
libhdfs0 - JNI Bindings to access Hadoop HDFS from C
libhdfs0-dev - Development support for libhdfs0
mahout - A set of Java libraries for scalable machine learning.
oozie - A workflow and coordinator sytem for Hadoop jobs.
sqoop - Tool for easy imports and exports of data sets between databases and HDFS
cdh3-repository - Cloudera's Distribution including Apache Hadoop
Search for the available Hadoop packages.
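The list is long, so piping it through grep narrows it to the packages of interest, e.g. the HBase ones:

$ apt-cache search hadoop | grep hbase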
$ sudo apt-get install hadoop-0.20
...
$ hadoop version
Hadoop 0.20.2-cdh3u2
Subversion file:///tmp/nightly_2011-10-13_20-02-02_3/hadoop-0.20-0.20.2+923.142-1~maverick -r 95a824e4005b2a94fe1c11f1ef9db4c672ba43cb
Compiled by root on Thu Oct 13 21:52:18 PDT 2011
From source with checksum 644e5db6c59d45bca96cec7f220dda51
Install the Hadoop core package.
CDH3u2 is now installed, and the Hadoop daemon packages were upgraded along with it.
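apt-cache policy shows the installed and candidate versions side by side, which is a quick way to confirm the daemon packages really came along (hadoop-0.20-namenode here is just one example):

$ apt-cache policy hadoop-0.20 hadoop-0.20-namenode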
$ sudo apt-get install hadoop-hbase-master
...
$ sudo apt-get install hadoop-zookeeper-server
...
$ hbase shell
11/10/26 22:36:54 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011

hbase(main):001:0>
Upgrade HBase and ZooKeeper as well. Both are now on CDH3u2.
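The HBase build info can also be printed without entering the shell; this should show the same version string as the shell banner:

$ hbase version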
$ sudo /etc/init.d/hadoop-0.20-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-h-akanuma-CF-W4.out
hadoop-0.20-namenode.
$
$ sudo /etc/init.d/hadoop-0.20-datanode start
Starting Hadoop datanode daemon: starting datanode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-h-akanuma-CF-W4.out
hadoop-0.20-datanode.
$
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
Starting Hadoop secondarynamenode daemon: starting secondarynamenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-secondarynamenode-h-akanuma-CF-W4.out
hadoop-0.20-secondarynamenode.
$
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-jobtracker-h-akanuma-CF-W4.out
hadoop-0.20-jobtracker.
$
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
Starting Hadoop tasktracker daemon: starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-h-akanuma-CF-W4.out
hadoop-0.20-tasktracker.
$
$ sudo jps
12799 SecondaryNameNode
12672 DataNode
12552 NameNode
12895 JobTracker
13029 Jps
11574 QuorumPeerMain
12996 TaskTracker
Start each of the Hadoop daemons.
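Instead of starting them one by one, the same glob loop used for stopping works for starting; restricting it to hadoop-0.20-* skips the HBase init scripts:

$ for x in /etc/init.d/hadoop-0.20-* ; do sudo $x start ; done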
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u2-*examples.jar pi 10 10000
Number of Maps = 10
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
11/10/26 23:09:21 INFO mapred.FileInputFormat: Total input paths to process : 10
11/10/26 23:09:22 INFO mapred.JobClient: Running job: job_201110262307_0001
11/10/26 23:09:23 INFO mapred.JobClient:  map 0% reduce 0%
11/10/26 23:09:42 INFO mapred.JobClient:  map 20% reduce 0%
11/10/26 23:09:57 INFO mapred.JobClient:  map 40% reduce 0%
11/10/26 23:10:12 INFO mapred.JobClient:  map 60% reduce 0%
11/10/26 23:10:14 INFO mapred.JobClient:  map 60% reduce 13%
11/10/26 23:10:20 INFO mapred.JobClient:  map 80% reduce 20%
11/10/26 23:10:26 INFO mapred.JobClient:  map 100% reduce 20%
11/10/26 23:10:29 INFO mapred.JobClient:  map 100% reduce 33%
11/10/26 23:10:32 INFO mapred.JobClient:  map 100% reduce 100%
11/10/26 23:10:34 INFO mapred.JobClient: Job complete: job_201110262307_0001
11/10/26 23:10:35 INFO mapred.JobClient: Counters: 23
11/10/26 23:10:35 INFO mapred.JobClient:   Job Counters
11/10/26 23:10:35 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/26 23:10:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=113667
11/10/26 23:10:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/26 23:10:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/10/26 23:10:35 INFO mapred.JobClient:     Launched map tasks=10
11/10/26 23:10:35 INFO mapred.JobClient:     Data-local map tasks=10
11/10/26 23:10:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=49553
11/10/26 23:10:35 INFO mapred.JobClient:   FileSystemCounters
11/10/26 23:10:35 INFO mapred.JobClient:     FILE_BYTES_READ=226
11/10/26 23:10:35 INFO mapred.JobClient:     HDFS_BYTES_READ=2420
11/10/26 23:10:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=609632
11/10/26 23:10:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
11/10/26 23:10:35 INFO mapred.JobClient:   Map-Reduce Framework
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce input groups=2
11/10/26 23:10:35 INFO mapred.JobClient:     Combine output records=0
11/10/26 23:10:35 INFO mapred.JobClient:     Map input records=10
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce shuffle bytes=280
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce output records=0
11/10/26 23:10:35 INFO mapred.JobClient:     Spilled Records=40
11/10/26 23:10:35 INFO mapred.JobClient:     Map output bytes=180
11/10/26 23:10:35 INFO mapred.JobClient:     Map input bytes=240
11/10/26 23:10:35 INFO mapred.JobClient:     Combine input records=0
11/10/26 23:10:35 INFO mapred.JobClient:     Map output records=20
11/10/26 23:10:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1240
11/10/26 23:10:35 INFO mapred.JobClient:     Reduce input records=20
Job Finished in 74.586 seconds
Estimated value of Pi is 3.14120000000000000000
Run a Hadoop job as a test.
It completed successfully.
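As one more post-upgrade check, fsck reports on the health of HDFS; a HEALTHY status at the end of its output means no blocks were lost across the upgrade:

$ hadoop fsck /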
$ sudo /etc/init.d/hadoop-hbase-master start
Starting Hadoop HBase master daemon: starting master, logging to /usr/lib/hbase/logs/hbase-hbase-master-h-akanuma-CF-W4.out
hbase-master.
$
$ sudo /etc/init.d/hadoop-hbase-regionserver start
Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /usr/lib/hbase/logs/hbase-hbase-regionserver-h-akanuma-CF-W4.out
hbase-regionserver.
$
$ sudo jps
14202 Jps
12799 SecondaryNameNode
12672 DataNode
14134 HRegionServer
13996 HMaster
12552 NameNode
12895 JobTracker
11574 QuorumPeerMain
12996 TaskTracker
Start the HBase daemons too.
Since this is a pseudo-distributed setup, I don't start ZooKeeper here; the QuorumPeerMain in the jps output above is the standalone ZooKeeper server, which is already running.
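To confirm the standalone ZooKeeper is healthy, its four-letter-word commands work over the client port (assuming the default port 2181):

$ echo ruok | nc localhost 2181    # replies "imok" if the server is running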
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011

hbase(main):001:0>
hbase(main):002:0* list
TABLE
courses
scores
2 row(s) in 2.0210 seconds

hbase(main):003:0>
Verify operation with the list command in the hbase shell.
This succeeds as well.
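To go one step further and confirm the table data survived the upgrade, scanning one of the existing tables from the shell shows its rows ('courses' is one of the two tables listed above):

hbase(main):004:0> scan 'courses'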