In this article series we will look at the steps in creating a Whirr base instance, which will be used to launch a Hadoop cluster over your own custom-created instances using AWS VPC with BYON (Bring Your Own Network), i.e., on machine instances that are already set up in AWS VPC. This article focuses on how to create the Whirr base instance.
The key challenge in getting a Hadoop cluster running through Whirr BYON on Amazon AWS VPC is that each instance in the cluster must have a hostname that resolves by both forward and reverse DNS lookup within its network. An AWS VPC instance is not assigned a DNS name (only an IP address) and is not associated with a local DNS server. So we have to set up a local DNS server on the Whirr base instance and then set up Whirr with OpenJDK. In the local DNS we configure the DNS names and IP addresses of the instances that will be used for the Hadoop cluster launch.
CREATING WHIRR BASE TEMPLATE WITH ABILITY TO LAUNCH BYON.
To start with, launch a default Ubuntu 12.04.1 instance (small) and become root:
$ sudo su
Step1: Install DNS Server: bind9
$ apt-get install bind9
$ cd /etc/bind
Step2: Forward dns lookup setup: db.<dns domain>
Decide on your local DNS domain name, say “ck.local”. The current machine will be the SOA and name server (NS) for the domain. I call the current machine (hostname) “dc”, so its full name is dc.ck.local. Make a copy of the db.local file as db.ck.local:
$ cp db.local db.ck.local
Edit the copy so that the SOA is dc.ck.local and the administrator is root.ck.local (a zone file writes the administrator's address with a dot in place of the @, instead of a form like email@example.com).
I wanted to configure 3 more machines in DNS besides the current dc.ck.local (10.0.1.80 in my case), so add an address record for each of them: node1.ck.local (10.0.1.220), node2.ck.local (10.0.1.221) and node3.ck.local (10.0.1.222).
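The forward zone described in this step might look like the sketch below. It is a reconstruction from the names and addresses given above, not the author's exact file: the serial and timer values are assumed to be the stock db.local defaults, and the heredoc writes into the current directory (assumed to be the bind configuration directory, as root).

```shell
# Sketch: forward zone for ck.local, written to ./db.ck.local.
# Host names and addresses are the ones listed in the article;
# serial/refresh/retry/expire values are assumed db.local defaults.
# The quoted 'EOF' keeps the shell from expanding $TTL.
cat > db.ck.local <<'EOF'
$TTL    604800
@       IN      SOA     dc.ck.local. root.ck.local. (
                              2         ; Serial (increase on every edit)
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
@       IN      NS      dc.ck.local.
dc      IN      A       10.0.1.80
node1   IN      A       10.0.1.220
node2   IN      A       10.0.1.221
node3   IN      A       10.0.1.222
EOF
```

If the bind9 utilities are installed, running named-checkzone ck.local db.ck.local is one way to catch syntax mistakes before restarting the server.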
Step3: Reverse dns lookup setup: db.<ip address range in reverse>
Now we need to create the reverse lookup zone. Copy db.0 to db.1.0.10 (all my machines are going to be in the IP range 10.0.1.*) and edit it so that it has a PTR record mapping each machine's address back to its name.
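A reverse zone matching the four machines above could be sketched as follows. Again this is a reconstruction under assumptions: the timer values are taken to be the stock defaults, and each record's left-hand label is the last octet of the address inside the 10.0.1.* range.

```shell
# Sketch: reverse zone for 10.0.1.*, written to ./db.1.0.10.
# Each PTR label is the final octet of the host's address; the
# targets are fully qualified and end with a dot.
cat > db.1.0.10 <<'EOF'
$TTL    604800
@       IN      SOA     dc.ck.local. root.ck.local. (
                              1         ; Serial (increase on every edit)
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
@       IN      NS      dc.ck.local.
80      IN      PTR     dc.ck.local.
220     IN      PTR     node1.ck.local.
221     IN      PTR     node2.ck.local.
222     IN      PTR     node3.ck.local.
EOF
```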
Step4: Include the zone files in the name configuration
Next step is to update the bind name configuration to include all these files.
$ vi named.conf.default-zones
Update it to include db.ck.local (instead of db.local) and add a new entry for db.1.0.10.
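One way to express these two entries is shown below. It is a sketch under an assumption: it declares ck.local and the reverse range as their own master zones appended to the file, rather than reproducing the author's exact edit, and it assumes the zone files live in /etc/bind.

```shell
# Sketch: append two master zones to ./named.conf.default-zones.
# "1.0.10.in-addr.arpa" is the reverse zone name for 10.0.1.*.
# The /etc/bind paths are an assumption about where the files live.
cat >> named.conf.default-zones <<'EOF'

zone "ck.local" {
        type master;
        file "/etc/bind/db.ck.local";
};

zone "1.0.10.in-addr.arpa" {
        type master;
        file "/etc/bind/db.1.0.10";
};
EOF
```

The new zones take effect only after the server reloads its configuration, for example with service bind9 restart.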
Step5: Update External DNS forwarder
The next update configures the forwarder for external DNS names, pointing it at the default DNS server for the VPC (10.0.0.2).
$ vi named.conf.options
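The forwarder setting might look like the sketch below. It writes to a scratch file (the name named.conf.options.sketch is hypothetical) so that the forwarders block can be reviewed and merged into the real options block by hand; any other options already present in the file are assumed to stay as they are.

```shell
# Sketch: an options block with the VPC DNS server as forwarder.
# Written to a scratch file; merge the forwarders block into the
# existing options { ... } in named.conf.options.
cat > named.conf.options.sketch <<'EOF'
options {
        directory "/var/cache/bind";

        // Names outside ck.local are passed on to the VPC DNS server.
        forwarders {
                10.0.0.2;
        };
};
EOF
```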
Step6: Update the current machine hostname & nameserver configurations.
On this Ubuntu release these settings are controlled indirectly by the DHCP client configuration, so editing /etc/resolv.conf directly is not the right approach.
$ vi /etc/dhcp/dhclient.conf
Update the “send host-name” line to the machine's hostname, uncomment “supersede domain-name” and set it to the chosen domain name, then uncomment “prepend domain-name-servers” and set it to the current machine as the DNS server, “10.0.1.80”. (I also listed the VPC DNS server, 10.0.0.2, which is not really required.)
Now make sure bind9 is started as part of the startup services. I did that using
$ chkconfig bind9 on
If you don’t have chkconfig, install it first using apt-get install chkconfig.
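For the DHCP client changes in this step, the three edited lines could end up looking like the sketch below. The values assume the hostname “dc”, the domain “ck.local” and the DNS server address 10.0.1.80 used in this article; the scratch file name is hypothetical, and the lines are meant to be merged into /etc/dhcp/dhclient.conf by hand.

```shell
# Sketch: the three dhclient.conf statements after editing.
# Written to a scratch file; merge into /etc/dhcp/dhclient.conf.
cat > dhclient.conf.sketch <<'EOF'
send host-name "dc";
supersede domain-name "ck.local";
prepend domain-name-servers 10.0.1.80;
EOF
```

These settings are applied the next time the DHCP lease is renewed or the instance reboots.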
Now you have a DNS Server ready.
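Since the whole point is that every name resolves in both directions, it may be worth checking that before moving on. The helper functions below are a hypothetical sketch: they assume dig is installed (on Ubuntu it comes with the dnsutils package) and that the server answers on 10.0.1.80.

```shell
# Sketch: forward and reverse lookup checks against the local server.
# DNS_SERVER defaults to the address used in this article.
DNS_SERVER="${DNS_SERVER:-10.0.1.80}"

# Succeeds only if NAME ($1) resolves to EXPECTED_IP ($2).
check_forward() {
    [ "$(dig +short "$1" @"$DNS_SERVER")" = "$2" ]
}

# Succeeds only if IP ($1) maps back to NAME ($2).
# dig prints the PTR target with a trailing dot, hence "$2.".
check_reverse() {
    [ "$(dig +short -x "$1" @"$DNS_SERVER")" = "$2." ]
}
```

A typical call would be check_forward node1.ck.local 10.0.1.220 && check_reverse 10.0.1.220 node1.ck.local && echo "node1 OK", repeated for each machine in the zone.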
Step7: Download Whirr and install OpenJDK as below.
$ wget http://apache.techartifact.com/mirror/whirr/stable/whirr-0.8.1.tar.gz
$ tar -xzf whirr-0.8.1.tar.gz
OpenJDK: Follow the instructions at http://www.mkyong.com/java/how-to-install-java-jdk-on-ubuntu-linux/
That is it; you are now ready with the Whirr base. Create an Amazon Machine Image (AMI) out of this instance so that you can get a Whirr base instance on demand. The next article will cover how to launch a Hadoop cluster via the Whirr BYON service provider.
Reference:
http://aws.amazon.com/vpc
http://whirr.apache.org