Asked: Nov 16th, 2016

Question description

A3: Hadoop


Installing the VM Image

Setting up a Shared Folder

Virtual Machine Tips

The Hadoop application


We offer a standardized development system for UDC 398 that is based on a virtual machine image. This image contains a small Linux installation and the tools you will need for the homework assignments. Having such a standardized system has several advantages:

  1. You do not have to worry about incompatibilities between the software we will be using and other software that may be installed on your machine;
  2. If something goes wrong, you have a 'safe' configuration to go back to - just reinstall the virtual machine image;
  3. The virtual machine image will be the 'gold standard' for grading; thus, if your solution works in the VM image, you can be sure that it will also work on the graders' machines.

The virtual machine image works with VirtualBox, which has been installed on lab machines. If you prefer using your own computer, VMware Player is available for free (Windows and Linux versions; non-commercial use only). If you are a Mac user, try VirtualBox or VMware Fusion (faster, but not free).

Of course, you are free to use a different development system (e.g., a different operating system, your favorite editor or IDE, ...). However, we will only grade solutions based on their performance in the virtual machine image, and we can only offer limited support if you decide to use your own setup.

Installing the VM Image

Your first task is to install the VM image for this assignment. The following step-by-step guide assumes that you are using a Mac OS X, Windows (7+), or Linux (recent Ubuntu) machine.

  1. Download the VM image:
  2. Double-click on the udc398.ova file you downloaded. This should trigger VirtualBox. Leave the settings as they are and choose Import.
  3. Choose Ubuntu (the VM with the Storage: SATA: SATA Port 0: udc398.disk1.vmdk)
  4. Click on the start button in the toolbar.
  5. The Linux in the VM image should boot now.
  6. Log in as the user “udc398”; whenever a password is requested, “udc398” should work.
  7. You should now have a running Virtual Machine with Ubuntu 16.04.

Setting up a Shared Folder

The next step is to ensure that you can share files between your “base” operating system and your virtual machine.

  1. Open VirtualBox, select your VM on the left, click Settings at the top, and then choose Shared Folders.
  2. Click on the green cross to add a new shared folder.
  3. Fill out the form by selecting the folder you want to share between your machines. Check Auto-mount and click Save/OK.
  4. Start up your Ubuntu VM and navigate to /media; your shared folder appears there as “sf_FolderName”.
  5. If you get a permissions error when you try to open the shared folder in the VM, open a Terminal and run the following command, then log out and back in so that the new group membership takes effect: sudo usermod -a -G vboxsf udc398
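Before changing group membership, it can help to check whether the current user already belongs to the vboxsf group. The following is a minimal sketch; the helper name in_group is made up for illustration, and it looks up the logged-in user with id rather than hard-coding a name.

```shell
#!/usr/bin/env bash
# in_group USER GROUP: return 0 if USER is a member of GROUP, non-zero otherwise.
# (in_group is a hypothetical helper, not part of VirtualBox or Ubuntu.)
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

user="$(id -un)"
if in_group "$user" vboxsf; then
    echo "$user is in vboxsf; shared folders under /media should be readable"
else
    echo "$user is not in vboxsf; run: sudo usermod -a -G vboxsf $user (then log out and back in)"
fi
```

If the script still reports that the user is missing from the group right after running usermod, that is expected until the next login.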

Virtual Machine Tips

A few tips regarding the virtual machine:

  • Always shut down the VM properly before closing VirtualBox (by clicking on the power switch icon in the upper right corner and selecting 'Shut down...'). If you do not do this, the VM image can become corrupted, just like your operating system can become corrupted if you switch off your computer before shutting it down properly.
  • If the screen of your VM is tiny, try resizing the window or switching to full-screen mode by pressing the right Control key together with F. This is a toggle: pressing right Control and F again returns to windowed mode, e.g., if you want to see the VirtualBox menus.
  • The data you store inside the VM is persistent (i.e., will survive reboots), and we will be using a version control system. Nevertheless, we strongly recommend that you make occasional backups, e.g., by copying your data files to a place in your /media/sf_shared folder.
  • You can edit text files with Eclipse, gedit, or nano. To install another editor, use apt-get. For instance, to install vim, run sudo apt-get install vim in a terminal window.
  • If the VM runs slowly on your machine, try increasing its memory size to 1.5GB (under Virtual Machine / Virtual Machine Settings..., or under "Edit virtual machine settings").
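The backup tip above can be scripted. This is a minimal sketch, assuming tar is available in the VM; the function name backup_dir and the project path in the example are hypothetical, while /media/sf_shared is the shared folder mentioned in the tips.

```shell
#!/usr/bin/env bash
# backup_dir SRC DEST: write a timestamped .tar.gz of directory SRC into DEST
# and print the path of the archive that was created.
backup_dir() {
    local src="$1" dest="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    local archive
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the paths inside the archive relative to the parent of SRC.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" && echo "$archive"
}

# Example (hypothetical project directory):
# backup_dir "$HOME/a3" /media/sf_shared/backups
```

Because each archive name carries a timestamp, repeated backups do not overwrite one another.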

The Hadoop application

Your task is to use the VM to run a Hadoop job in standalone mode.

Step 1: Download the code:

Step 2: Compile the code based on the instructions in the README

Step 3: Create & populate input directory

  • The input directory is configured in the Driver via addInputPath()
  • Put the input file(s) into this directory (it is OK to have more than one)
  • The output directory must not exist yet
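Step 3 can be sketched as a small shell helper. The function name prepare_job_dirs is hypothetical, and the directory names you pass must match whatever paths the Driver actually configures; this sketch only creates the input directory, copies files into it, and stops if the output directory is already present.

```shell
#!/usr/bin/env bash
# prepare_job_dirs INPUT_DIR OUTPUT_DIR FILE...
# Creates INPUT_DIR, copies the given files into it, and refuses to continue
# if OUTPUT_DIR already exists (the output directory must not exist yet).
prepare_job_dirs() {
    local input="$1" output="$2"
    shift 2
    if [ -e "$output" ]; then
        echo "output directory '$output' already exists; remove or rename it first" >&2
        return 1
    fi
    mkdir -p "$input" || return 1
    local f
    for f in "$@"; do
        cp "$f" "$input/" || return 1
    done
}

# Example (hypothetical directory and file names):
# prepare_job_dirs input output data1.txt data2.txt
```

Checking for the output directory up front avoids discovering the problem only after the job has started.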

Step 4: Run Hadoop

  • As simple as this: hadoop jar <jarName> <driverClassName>
  • Example: hadoop jar foo.jar FooDriver
  • In verbose mode, Hadoop will print statistics while running
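A thin wrapper around the command in Step 4 can catch two common mistakes before Hadoop starts: a missing jar file and a hadoop executable that is not on the PATH. This is only a sketch; run_job is a hypothetical name, and foo.jar and FooDriver are the placeholder names from the example above.

```shell
#!/usr/bin/env bash
# run_job JAR DRIVER_CLASS: check prerequisites, then run "hadoop jar JAR DRIVER_CLASS".
# Returns 1 if the jar is missing and 2 if hadoop cannot be found.
run_job() {
    local jar="$1" driver="$2"
    [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
    command -v hadoop >/dev/null 2>&1 || { echo "hadoop is not on PATH" >&2; return 2; }
    hadoop jar "$jar" "$driver"
}

# Example, using the placeholder names from the text:
# run_job foo.jar FooDriver
```

When both checks pass, the wrapper's exit status is whatever hadoop itself returns.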

Step 5: Submit the output files (as .zip file) via Blackboard.
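Packaging the output for Step 5 might look like the sketch below. It assumes the zip utility is installed (if not, sudo apt-get install zip should provide it) and that the job wrote its results to a directory you pass in; zip_output is a hypothetical helper name. By default Hadoop leaves an empty _SUCCESS marker file in the output directory of a job that completed, so the sketch warns when that marker is absent.

```shell
#!/usr/bin/env bash
# zip_output OUTPUT_DIR ZIP_FILE: archive a finished job's output directory.
# Returns 1 if OUTPUT_DIR is missing and 2 if zip is not installed.
zip_output() {
    local output="$1" zipfile="$2"
    [ -d "$output" ] || { echo "no such directory: $output" >&2; return 1; }
    # Warn (but continue) if the job's success marker is not there.
    [ -e "$output/_SUCCESS" ] || echo "warning: no _SUCCESS marker in $output" >&2
    command -v zip >/dev/null 2>&1 || { echo "zip is not installed" >&2; return 2; }
    zip -r -q "$zipfile" "$output"
}

# Example (hypothetical directory and archive names):
# zip_output output a3-output.zip
```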
