Hadoop Data Analytics


The Hadoop Data Analytics training course explains how to apply data analytics and business intelligence skills to Big Data. It emphasizes Apache Pig, Hive, and Cloudera Impala, and walks you through distributed processing of large data sets across clusters of computers, along with administering Hadoop. Participants learn how to handle heterogeneous data from different sources, both structured and unstructured, including communication records, log files, audio files, pictures, and videos.

Course Duration: 40 hours
Upon completion of the course, candidates will be able to do the following:
Explain the fundamentals of Apache Hadoop, data ETL (extract, transform, load), and data processing using Hadoop tools
Perform data analysis and process complex data using Pig
Perform data management and text processing using Hive
Extend, troubleshoot, and optimize Pig and Hive
Analyze data with Impala
Compare MapReduce, Pig, Hive, Impala, and relational databases

KEY FEATURES

Accredited Training Partner

Teaches real programming skills

Builds a solid understanding

Educated Staff

Timesheets

Video Lessons


Modules / Levels

1. Introduction

About this Course

About Big Data

Course Logistics

Introductions

2. Hadoop Fundamentals

The Motivation for Hadoop

Hadoop Overview

HDFS

MapReduce

The Hadoop Ecosystem

Lab Scenario Explanation

Hands-On Exercise: Data Ingest with Hadoop Tools

3. Introduction to Pig

What Is Pig?

Pig’s Features

Pig Use Cases

Interacting with Pig

4. Basic Data Analysis with Pig

Pig Latin Syntax

Loading Data

Simple Data Types

Field Definitions

Data Output

Viewing the Schema

Filtering and Sorting Data

Commonly Used Functions

Hands-On Exercise: Using Pig for ETL Processing

5. Processing Complex Data with Pig

Storage Formats

Complex/Nested Data Types

Grouping

Built-in Functions for Complex Data

Iterating Grouped Data

Hands-On Exercise: Analyzing Ad Campaign Data with Pig

6. Multi-Dataset Operations with Pig

Techniques for Combining Data Sets

Joining Data Sets in Pig

Set Operations

Splitting Data Sets

Hands-On Exercise: Analyzing Disparate Data Sets with Pig

7. Extending Pig

Adding Flexibility with Parameters

Macros and Imports

UDFs

Contributed Functions

Using Other Languages to Process Data with Pig

Hands-On Exercise: Extending Pig with Streaming and UDFs

8. Pig Troubleshooting and Optimization

Troubleshooting Pig

Logging

Using Hadoop’s Web UI

Optional Demo: Troubleshooting a Failed Job with the Web UI

Data Sampling and Debugging

Performance Overview

Understanding the Execution Plan

Tips for Improving the Performance of Your Pig Jobs

9. Introduction to Hive

What Is Hive?

Hive Schema and Data Storage

Comparing Hive to Traditional Databases

Hive vs. Pig

Hive Use Cases

Interacting with Hive

10. Relational Data Analysis with Hive

Hive Databases and Tables

Basic HiveQL Syntax

Data Types

Joining Data Sets

Common Built-in Functions

Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

11. Hive Data Management

Hive Data Formats

Creating Databases and Hive-Managed Tables

Loading Data into Hive

Altering Databases and Tables

Self-Managed Tables

Simplifying Queries with Views

Storing Query Results

Controlling Access to Data

Hands-On Exercise: Data Management with Hive

12. Text Processing with Hive

Overview of Text Processing

Important String Functions

Using Regular Expressions in Hive

Sentiment Analysis and N-Grams

Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis

13. Hive Optimization

Understanding Query Performance

Controlling the Job Execution Plan

Partitioning

Bucketing

Indexing Data

14. Extending Hive

SerDes

Data Transformation with Custom Scripts

User-Defined Functions

Parameterized Queries

Hands-On Exercise: Data Transformation with Hive

15. Introduction to Impala

What Is Impala?

How Impala Differs from Hive and Pig

How Impala Differs from Relational Databases

Limitations and Future Directions

Using the Impala Shell

16. Analyzing Data with Impala

Basic Syntax

Data Types

Filtering, Sorting, and Limiting Results

Joining and Grouping Data

Improving Impala Performance

Hands-On Exercise: Interactive Analysis with Impala

17. Choosing the Best Tool for the Job

Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

Which to Choose?

What You Get

  • 24/7 e-Learning Access
  • Certified Trainers and Industry Experts
  • Assessments and Mock Tests