Hadoop - Project


Computer Science

Capella University

Description

Hadoop has several subprojects. Pick one of the related sub-projects, such as Flume, Oozie, Knox, Storm, or Kafka. Install the subproject in the VM, get it working, and demonstrate your knowledge by solving a problem. Record a video (without audio) and document how to use the system with screenshots.

Deadlines:

Project Proposal (1-2 page Word document) - Saturday (10/29)

Final Project - Friday (11/18)


Explanation & Answer



Website Activity Tracking

Name of Student
University
Course
Name of Professor
Date of Submission


A KAFKA (HADOOP) PROJECT
Background
This project seeks to implement Apache Kafka, a member of the Hadoop ecosystem, to track activity on a website, including metrics collection and monitoring.
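As a concrete illustration of what "activity tracking" means here, a page-view event could be serialized as JSON before being published to a Kafka topic. The field names and the `make_page_view_event` helper below are hypothetical, chosen only to sketch the idea:

```python
import json
import time

def make_page_view_event(user_id, page, referrer=None):
    """Build a hypothetical page-view event for an activity-tracking topic."""
    return {
        "event_type": "page_view",
        "user_id": user_id,
        "page": page,
        "referrer": referrer,
        "timestamp": time.time(),  # event time, seconds since the epoch
    }

event = make_page_view_event("u123", "/products/42", referrer="/home")
payload = json.dumps(event).encode("utf-8")  # Kafka messages are byte arrays
```

Each such payload would become one record on the topic, ready for downstream consumers to aggregate into metrics.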
Kafka has recently risen in popularity among websites that handle large volumes of user data, because it offers practical solutions to common problems website engineers face when they need to make sense of real-time data.
One may think that many other software packages work in a similar way to Kafka and perform the same tasks. However, Kafka stands out for its operational simplicity, which makes it a favorite for many teams. It is easy to set up and use, and it does not require extensive tutorials to get the hang of. This simplicity does not make it any less powerful: it outperforms comparable systems while remaining stable and reliable under whatever tasks one throws at it.
Software that performs such powerful operations often comes at the cost of processing resources. Kafka, however, maintains a high level of resource efficiency. Even when the workload would overwhelm an average server, Kafka batches records and compresses each batch, which speeds up processing and makes efficient use of network and disk space. It also relies on partitioning (sharding), spreading topics across many servers and thus enabling it to handle massive loads seamlessly.
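The two techniques above can be sketched briefly. The configuration keys below are real kafka-python producer options (shown as a plain dict; the broker address is an assumption), and the partitioner is a deliberately simplified stand-in for Kafka's default key-based partitioning, which actually uses a murmur2 hash:

```python
# Batching and compression are producer-side configuration in Kafka.
# These are kafka-python KafkaProducer options, collected here as a dict.
producer_config = {
    "bootstrap_servers": "localhost:9092",  # assumed broker address
    "compression_type": "gzip",  # compress whole batches of records
    "batch_size": 16384,         # accumulate up to 16 KB per partition batch
    "linger_ms": 10,             # wait up to 10 ms to fill a batch
}

def assign_partition(key: bytes, num_partitions: int) -> int:
    """Simplified key-based partitioner: the same key always maps to the
    same partition. Kafka's default producer hashes the key with murmur2;
    a plain byte sum is used here only to illustrate the idea."""
    return sum(key) % num_partitions

p1 = assign_partition(b"user-123", 6)
p2 = assign_partition(b"user-123", 6)
# p1 == p2: records keyed by the same user land on the same partition,
# so per-user ordering is preserved while load spreads across partitions.
```

Because each partition can live on a different broker, adding partitions (and brokers) is how Kafka scales write and read throughput.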
Generally, Kafka uses streaming techniques to handle records as they arrive, allowing large amounts of user data to be processed in real time in an efficient, resource-conscious way.
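The kind of real-time aggregation this enables can be sketched with a plain consumer loop. The hard-coded event list below is a stand-in for records consumed from a hypothetical page-views topic; in a real deployment the loop body would run inside a Kafka consumer or a Kafka Streams application:

```python
from collections import Counter

def stream_page_counts(events):
    """Consume a stream of page-view events one at a time and keep running
    per-page counts, the simplest real-time metric over the activity data."""
    counts = Counter()
    for event in events:
        counts[event["page"]] += 1
    return counts

# Stand-in for records arriving from a 'page-views' topic:
events = [
    {"page": "/home"},
    {"page": "/products"},
    {"page": "/home"},
]
counts = stream_page_counts(events)
# counts: Counter({'/home': 2, '/products': 1})
```

Because the state is updated record by record, the counts are available continuously rather than only after a nightly batch job, which is the point of the streaming approach described above.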

Problem Statement
The cost of processing user data in real time on websites that exper...

