Marc J. Greenberg

Codemarc's Blog

Category Archives: BigData

Big Data Floating in the Clouds

For the last few months I have been building a prototype on top of an Apache Hadoop 1.0.4 cluster that I built from scratch out of six virtual machines running Ubuntu Server 12.04.2 LTS. It has been an interesting experience. Simply put, this is the actual learning process that every hacker goes through on every new project, whether it's a programming language, platform, or technology. Now that I have a handle on the basics, I can take an earnest look at other people's packaging.

Today I am checking out the current offering from Cloudera. I found the download named Cloudera Manager 4.5 Free Edition and proceeded with the installation. Of course I need to install it on a few nodes, so I am back to setting up some more servers.

Cluster up

This time I decided to use my Mac Pro server configured with VirtualBox. I planned on running a three-server cluster (cloud1, cloud2, cloud3), so I set it up and ran into a few networking problems. I got my ops department to fix my port to allow for multiple MAC addresses. Here are some of the issues and solutions I encountered when setting up the environment:

For each cloned virtual server I needed to change (persistently) its host name and MAC address. The tool (VirtualBox in this case) should have handled this properly. It did NOT. So I did the following by hand on each machine:

  1. sudo vi /etc/hosts
  2. sudo vi /etc/hostname
    (remove the cloud 127.0.0.1 definition from each)
  3. sudo vi /etc/dhcp/dhclient.conf
  4. sudo rm /etc/udev/rules.d/70-persistent-net.rules
    sudo mkdir /etc/udev/rules.d/70-persistent-net.rules
    (thank you Peter Mount)
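
The hosts-file cleanup in steps 1 and 2 can be sketched in a few lines of Python. This is only my reading of the step: it assumes the template VM was named cloud and that the stale entry is a loopback (127.x.x.x) line in /etc/hosts. The function name strip_loopback_alias and the sample text are hypothetical, and the function works on a string so nothing on disk is touched.

```python
def strip_loopback_alias(hosts_text: str, name: str) -> str:
    """Remove `name` from loopback (127.x.x.x) lines of hosts-file text.

    A loopback line that maps only to `name` is dropped entirely.
    All other lines are returned unchanged.
    """
    kept = []
    for line in hosts_text.splitlines():
        # Look only at the part before any trailing comment.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0].startswith("127.") and name in fields[1:]:
            names = [f for f in fields[1:] if f != name]
            if not names:
                continue  # the line existed only for the stale name
            # Rewritten lines lose their trailing comment, if any.
            line = " ".join([fields[0]] + names)
        kept.append(line)
    return "".join(entry + "\n" for entry in kept)


# Hypothetical hosts file as a clone of the template VM might have it.
sample = (
    "127.0.0.1 localhost\n"
    "127.0.1.1 cloud\n"
    "172.30.240.110 cloud1.ibi.com cloud1\n"
)
print(strip_loopback_alias(sample, "cloud"), end="")
```

Writing the result back to /etc/hosts still needs root, so I would review the output first and then copy it into place by hand.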

Install Cloudera Manager (Free Edition)

So my first installation was from my remote Linux desktop to my cluster, and it failed. I then decided to allocate another local instance (cloud0) and try again. The installer runs OK, and I point my web browser at http://cloud0:7180, log in as admin/admin, and away we go:

This installer will deploy the following services on your cluster:

  • Apache Hadoop (MapReduce, HDFS, Common)
  • Apache HBase
  • Apache ZooKeeper
  • Apache Oozie
  • Apache Hive
  • Hue (Apache licensed)
  • Apache Flume NG
  • Cloudera Impala (Apache licensed)

You are using Cloudera Manager (Free Edition) to install and configure your system.

I specify cloud[1-3] and get the following results:

Expanded Query  Hostname (FQDN)  IP Address      Currently Managed  Result
cloud1          cloud1.ibi.com   172.30.240.110  No                 Host ready: 9 ms response time.
cloud2          cloud2.ibi.com   172.30.240.111  No                 Host ready: 7 ms response time.
cloud3          cloud3.ibi.com   172.30.240.112  No                 Host ready: 16 ms response time.
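
For anyone wondering what cloud[1-3] expands to, here is a rough Python sketch of the idea. It is not Cloudera Manager's actual parser, which I have not looked at; the function name expand_hosts is made up, and it only handles the single numeric range form used above.

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand one numeric range such as 'cloud[1-3]' into host names.

    Patterns without a [start-end] range are returned as a single name.
    """
    match = re.fullmatch(r"(.*?)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    # The range is inclusive at both ends, matching the three hosts above.
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


print(expand_hosts("cloud[1-3]"))  # ['cloud1', 'cloud2', 'cloud3']
```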

While it took a few tries I finally got the following:

[screenshot: cm3]

So now it asks me to decide which CDH4 services I should install. I pick core Hadoop for my first attempt, with an embedded PostgreSQL database setup:

Database Host Name  Database Type  Database Name  Username  Password
cloud0:7432         PostgreSQL     hive           hive      aflhU8ZThz

and all defaults for the rest. 13 steps later, and voilà:

[screenshot: cm8]

Now What

[screenshot: cm]

OK, so it's installed, and we can see it. I guess I have to spend some time installing my parts and working with this version to see what happens and how it behaves. But that's for another day.

Remembering

Memorial Day weekend (NYC 2015). Everyone should make it his or her business to visit New York City during Fleet Week, as it is a powerful reminder of why Memorial Day is so important. Visiting active duty American servicemen and women, shaking their hands, thanking them for their service, and remembering those who have given (for real) is something that every real American citizen should feel they must do.

For me, one of the highlights of this weekend was a visit aboard the USS San Antonio (LPD-17). The sailors, Marines, and Coast Guardsmen were some of the finest examples of young American men and women that I have ever had the pleasure to meet. The lessons and meaning of that day were not lost on a single person that I met.

"On this Memorial Day, we honor the sacrifices of prior generations. We honor the sacrifices of the men and women next door who have served or continue to serve our country. And we pledge never to forget the true meaning of Memorial Day. We would not have the privilege of celebrating this day and honoring so many memories without the sacrifices of those who gave their last full measure of devotion."