Apache Flume: Distributed Log Collection for Hadoop - Second Edition
| By: | Steven Hoffman |
| Publisher: | Packt Publishing |
| Print ISBN: | 9781784392178 |
| eText ISBN: | 9781784399146 |
| Edition: | 2 |
| Copyright: | 2015 |
| Format: | Reflowable |
Book Description
If you are a Hadoop programmer who wants to learn Flume so that you can move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.

What you will learn
- Understand the Flume architecture, and learn how to download and install open source Flume from Apache
- Follow along with a detailed example of transporting weblogs in Near Real Time (NRT) to Kibana/Elasticsearch and archiving them in HDFS
- Learn tips and tricks for transporting logs and data in your production environment
- Understand and configure the Hadoop Distributed File System (HDFS) Sink
- Use a morphline-backed Sink to feed data into Solr
- Create redundant data flows using sink groups
- Configure and use various sources to ingest data
- Inspect data records and move them between multiple destinations based on payload content
- Transform data en route to Hadoop and monitor your data flows