Description
Hadoop has several subprojects. Please pick one of the other subprojects, such as FLUME, OOZIE, KNOX, STORM, or KAFKA. Install the subproject in the VM, make it work, and demonstrate your knowledge by solving a problem. Record a video without audio, and document how to use the system with screenshots.
Deadlines:
Project Proposal (1-2 page Word document) - Saturday (10/29)
Final Project - Friday (11/18)
Explanation & Answer
View attached explanation and answer. Let me know if you have any questions.
Website Activity Tracking
Name of Student
University
Course
Name of Professor
Date of Submission
A KAFKA (HADOOP) PROJECT
Background
This project sets out to implement Apache Kafka, a tool commonly used alongside Hadoop, to
track activity on a website, including metrics collection and monitoring.
Kafka has recently risen in popularity among websites that handle large volumes of user data.
This is because it offers practical solutions to common problems that website engineers face
when they need to make sense of real-time data.
Many other software packages appear to work in a similar way to Kafka and to perform the
same tasks. However, Kafka stands out for its operational simplicity, which makes it a
favorite for many engineers. It is easy to set up and use, and it does not require extensive
tutorials to learn. This simplicity does not make it any less powerful: it is designed for
high throughput, and it remains stable and reliable under heavy workloads.
Software that performs such powerful operations often comes at the cost of processing
resources. Kafka, however, maintains a high level of resource efficiency. Even when the
workload would overwhelm an average processor, Kafka uses batching, which groups records so
that they can be compressed and written together, making for faster processing and more
efficient use of disk space. It also relies on sharding (partitioning), which spreads data
across many servers, potentially hundreds, enabling it to handle massive loads seamlessly.
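The two techniques described above can be illustrated with a minimal sketch that uses only the Python standard library and does not call Kafka itself. It compares the compressed size of 1,000 records compressed one at a time against the same records compressed as a single batch, and then assigns each record to a partition by hashing its key. The event fields, the function names, and the choice of four partitions are assumptions made for illustration. CRC32 stands in for the murmur2 hash used by Kafka's default Java partitioner.

```python
import gzip
import json
import zlib


def make_event(i):
    # Hypothetical page-view event; the field names are illustrative only.
    return {"user_id": f"user-{i % 50}", "page": f"/products/{i % 10}", "action": "view"}


def compress_individually(events):
    # Compress every record on its own, as if each were sent separately.
    return sum(len(gzip.compress(json.dumps(e).encode("utf-8"))) for e in events)


def compress_as_batch(events):
    # Join the records into one batch and compress the whole batch once.
    payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return len(gzip.compress(payload))


def pick_partition(key, num_partitions):
    # A stable hash of the key, so the same user always maps to the same partition.
    # CRC32 is a stand-in here, not the hash Kafka actually uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [make_event(i) for i in range(1000)]
individual_bytes = compress_individually(events)
batch_bytes = compress_as_batch(events)
partitions = {pick_partition(e["user_id"], 4) for e in events}

print("compressed one by one:", individual_bytes, "bytes")
print("compressed as a batch:", batch_bytes, "bytes")
print("partitions used:", sorted(partitions))
```

The batch is smaller because the compressor can reuse patterns that repeat across records, and each record no longer pays its own compression header. Hashing on the key keeps all events for one user in a single partition, which preserves their order while still spreading the total load.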
Overall, Kafka uses streaming techniques to handle records as they arrive, allowing large
amounts of user data to be processed in real time in an efficient and resource-conscious way.
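As a rough illustration of stream processing for website activity tracking, the sketch below updates page-view counts one record at a time instead of waiting for a complete data set. The `event_stream` generator is a hypothetical stand-in for records arriving from a Kafka topic, and the event fields are assumed for the example. No Kafka client library is used.

```python
from collections import Counter


def event_stream():
    # Hypothetical stand-in for records arriving from a Kafka topic.
    yield {"user_id": "u1", "page": "/home"}
    yield {"user_id": "u2", "page": "/cart"}
    yield {"user_id": "u1", "page": "/home"}
    yield {"user_id": "u3", "page": "/checkout"}


def track_page_views(stream):
    # Update the running counts as each record arrives and emit a snapshot,
    # so the metrics are current after every event.
    counts = Counter()
    for event in stream:
        counts[event["page"]] += 1
        yield dict(counts)


snapshots = list(track_page_views(event_stream()))
for snapshot in snapshots:
    print(snapshot)
```

Because each snapshot depends only on the records seen so far, the same loop would work on an unbounded stream, which is the property that makes real-time monitoring possible.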
Problem Statement
The cost of processing user data in real time on websites that exper...