Marc J. Greenberg

Codemarc's Blog

Big Data Floating in the Clouds

For the last few months I have been building a prototype on top of an Apache Hadoop 1.0.4 cluster that I built from scratch out of six virtual machines running Ubuntu Server 12.04.2 LTS. It has been an interesting experience. Simply put, this is the learning process that every hacker goes through on every new project, whether it's a programming language, platform or technology. Now that I have a handle on the basics, I can take an earnest look at other people's packaging.

Today I am checking out the current offering from Cloudera. I found the download named Cloudera Manager 4.5 Free Edition and proceeded with the installation. Of course I need to install it on a few nodes, so I am back to setting up some more servers.

Cluster up

This time I decided to use my Mac Pro server configured with VirtualBox. I planned on running a three-server cluster (cloud1, cloud2, cloud3), so I set it up and ran into a few networking problems. I got my ops department to fix my port to allow multiple MAC addresses. Here are some of the issues and solutions I encountered when setting up the environment:

For each cloned virtual server I needed to change (persistently) its host name and MAC address. The tooling (VirtualBox in this case) should have handled this properly. It did NOT. So I did the following by hand on each machine.

  1. sudo vi /etc/hosts
  2. sudo vi /etc/hostname
    (remove cloud definition from each)
  3. sudo vi /etc/dhcp/dhclient.conf
  4. sudo rm /etc/udev/rules.d/70-persistent-net.rules
    sudo mkdir /etc/udev/rules.d/70-persistent-net.rules
    (thank you Peter Mount)
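
With more than a couple of clones, the host-name edits in steps 1 and 2 could be scripted rather than done in vi. Here is a minimal sketch in Python, under a few assumptions: the function name rename_host and the old name cloud-template are hypothetical, and the function only transforms the text of a file, so reading and writing /etc/hosts and /etc/hostname (as root) is left to the caller.

```python
import re


def rename_host(text: str, old: str, new: str) -> str:
    """Replace whole-name occurrences of the old host name with the new one."""
    # The lookarounds keep "cloud1" from matching inside "cloud10" or "mycloud1",
    # while still allowing a trailing domain such as "cloud1.example.com".
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old) + r"(?![\w-])")
    return pattern.sub(new, text)


# Sample /etc/hosts content from a freshly cloned machine (hypothetical names).
sample_hosts = (
    "127.0.0.1 localhost\n"
    "127.0.1.1 cloud-template\n"
)

# Only the line naming the clone is changed; localhost is left alone.
updated_hosts = rename_host(sample_hosts, "cloud-template", "cloud1")
```

The same function works on the single-line content of /etc/hostname. Keeping it as a pure text transformation makes it easy to try on a copy of the file before touching the real one.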

Install Cloudera Manager (Free Edition)

My first installation attempt was from my remote desktop Linux machine to my cluster, and it failed. I then decided to allocate another local instance (cloud0) and try again. The installer runs OK, and I point my web browser at http://cloud0:7180, log in as admin/admin, and away we go:

This installer will deploy the following services on your cluster:

  • Apache Hadoop (MapReduce, HDFS, Common)
  • Apache HBase
  • Apache ZooKeeper
  • Apache Oozie
  • Apache Hive
  • Hue (Apache licensed)
  • Apache Flume NG
  • Cloudera Impala (Apache licensed)

You are using Cloudera Manager (Free Edition) to install and configure your system.
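
Since my first attempt from a remote machine failed, a quick check that the admin console port is reachable from wherever the browser runs can save some head scratching. This is a minimal sketch using only the Python standard library. The function name port_open is hypothetical; the host cloud0 and port 7180 come from the URL above. It only tests that a TCP connection can be opened, not that the login page actually works.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call for the setup described here:
#   port_open("cloud0", 7180)
```

A False result from the remote desktop and a True result from a machine on the same network as the cluster would point at networking or name resolution rather than at the installer itself.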

I specify cloud[1-3] and get the following results:

Expanded Query   Hostname (FQDN)   IP Address   Currently Managed   Result
cloud1                                          No                  Host ready: 9 ms response time.
cloud2                                          No                  Host ready: 7 ms response time.
cloud3                                          No                  Host ready: 16 ms response time.
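
The cloud[1-3] pattern is shorthand for the three host names in the results. To illustrate how such a pattern expands, here is a small Python sketch. It is my own illustration of a single numeric range, not Cloudera Manager's implementation, and the function name expand_hosts is hypothetical.

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand one bracketed numeric range, e.g. 'cloud[1-3]'."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No range in the pattern: treat it as a single literal host name.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    # The range is inclusive at both ends, so add 1 to the upper bound.
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


hosts = expand_hosts("cloud[1-3]")
print(hosts)  # ['cloud1', 'cloud2', 'cloud3']
```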

While it took a few tries, I finally got the following:


So now it asks me to decide which CDH4 services I should install. I pick Core Hadoop for my first attempt, with an embedded PostgreSQL database setup:

Database Host Name   Database Type   Database Name   Username   Password
cloud0:7432          PostgreSQL      hive            hive       aflhU8ZThz

and all defaults for the rest. Thirteen steps later, and voilà:


Now What

OK, so it's installed, and we can see it. I guess I have to spend some time installing my parts and working with this version to see what happens and how it behaves. But that's for another day.