Hadoop development


Big Data Hadoop Developer training delivers the key concepts and expertise needed to develop robust data processing applications using Apache Hadoop. Interactive sessions and demonstrations led by an industry expert help participants grasp the features and programming skills involved. The Hadoop developer course covers the fundamentals and advanced topics of Hadoop, MapReduce, the Hadoop Distributed File System (HDFS), Hadoop clusters, Pig, Hive, HBase, ZooKeeper, Sqoop, and Flume.

Course Duration: 40 hours
Upon completion of the Hadoop developer course, candidates will be able to do the following:
Describe the concepts of Apache Hadoop, Hadoop Ecosystem, MapReduce, and HDFS
Develop, debug, and implement MapReduce applications
Set up different configurations of a Hadoop cluster
Maintain and monitor a Hadoop cluster, taking optimal hardware and networking settings into account
Leverage Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, and other projects from the Apache Hadoop ecosystem

KEY FEATURES

Accredited Training Partner

To teach real programming skills

Build a solid understanding

Educated Staff

Timesheets

Video Lessons

Modules / Levels

1. Meet Hadoop

Data

Data Storage and Analysis

Comparison with Other Systems

RDBMS

Grid Computing

Volunteer Computing

A Brief History of Hadoop

Apache Hadoop and the Hadoop Ecosystem

Hadoop Releases

2. MapReduce

A Weather Dataset

Data Format

Analyzing the Data with Unix Tools

Analyzing the Data with Hadoop

Map and Reduce

Java MapReduce

Scaling Out

Data Flow

Combiner Functions

Running a Distributed MapReduce Job

Hadoop Streaming

Compiling and Running

3. The Hadoop Distributed File System (HDFS)

The Design of HDFS

HDFS Concepts

Blocks

Namenodes and Datanodes

HDFS Federation

HDFS High-Availability

The Command-Line Interface

Basic Filesystem Operations

Hadoop Filesystems

Interfaces

The Java Interface

Reading Data from a Hadoop URL

Reading Data Using the FileSystem API

Writing Data

Directories

Querying the Filesystem

Deleting Data

Data Flow

Anatomy of a File Read

Anatomy of a File Write

Coherency Model

Parallel Copying with distcp

Keeping an HDFS Cluster Balanced

Hadoop Archives

4. Hadoop I/O

Data Integrity

Data Integrity in HDFS

LocalFileSystem

ChecksumFileSystem

Compression

Codecs

Compression and Input Splits

Using Compression in MapReduce

Serialization

The Writable Interface

Writable Classes

File-Based Data Structures

SequenceFile

MapFile

5. Developing a MapReduce Application

The Configuration API

Combining Resources

Variable Expansion

Configuring the Development Environment

Managing Configuration

GenericOptionsParser, Tool, and ToolRunner

Writing a Unit Test

Mapper

Reducer

Running Locally on Test Data

Running a Job in a Local Job Runner

Testing the Driver

Running on a Cluster

Packaging

Launching a Job

The MapReduce Web UI

Retrieving the Results

Debugging a Job

Hadoop Logs

Tuning a Job

Profiling Tasks

MapReduce Workflows

Decomposing a Problem into MapReduce Jobs

JobControl

6. How MapReduce Works

Anatomy of a MapReduce Job Run

Classic MapReduce (MapReduce 1)

Failures

Failures in Classic MapReduce

Failures in YARN

Job Scheduling

The Capacity Scheduler

Shuffle and Sort

The Map Side

The Reduce Side

Configuration Tuning

Task Execution

The Task Execution Environment

Speculative Execution

Output Committers

Task JVM Reuse

Skipping Bad Records

7. MapReduce Types and Formats

MapReduce Types

The Default MapReduce Job

Input Formats

Input Splits and Records

Text Input

Binary Input

Multiple Inputs

Database Input (and Output)

Output Formats

Text Output

Binary Output

Multiple Outputs

Lazy Output

Database Output

8. MapReduce Features

Counters

Built-in Counters

User-Defined Java Counters

User-Defined Streaming Counters

Sorting

Preparation

Partial Sort

Total Sort

Secondary Sort

Joins

Map-Side Joins

Reduce-Side Joins

Side Data Distribution

Using the Job Configuration

Distributed Cache

MapReduce Library Classes

9. Setting Up a Hadoop Cluster

Cluster Specification

Network Topology

Cluster Setup and Installation

Installing Java

Creating a Hadoop User

Installing Hadoop

Testing the Installation

SSH Configuration

Hadoop Configuration

Configuration Management

Environment Settings

Important Hadoop Daemon Properties

Hadoop Daemon Addresses and Ports

Other Hadoop Properties

User Account Creation

YARN Configuration

Important YARN Daemon Properties

YARN Daemon Addresses and Ports

Security

Kerberos and Hadoop

Delegation Tokens

Other Security Enhancements

Benchmarking a Hadoop Cluster

Hadoop Benchmarks

User Jobs

Hadoop in the Cloud

Hadoop on Amazon EC2

10. Administering Hadoop

HDFS

Persistent Data Structures

Safe Mode

Audit Logging

Tools

Monitoring

Logging

Metrics

Java Management Extensions

Routine Administration Procedures

Commissioning and Decommissioning Nodes

Upgrades

11. Pig

Installing and Running Pig

Execution Types

Running Pig Programs

Grunt

Pig Latin Editors

An Example

Generating Examples

Comparison with Databases

Pig Latin

Structure

Statements

Expressions

Types

Schemas

Functions

Macros

User-Defined Functions

A Filter UDF

An Eval UDF

A Load UDF

Data Processing Operators

Loading and Storing Data

Filtering Data

Grouping and Joining Data

Sorting Data

Combining and Splitting Data

Pig in Practice

Parallelism

Parameter Substitution

12. Hive

Installing Hive

The Hive Shell

An Example

Running Hive

Configuring Hive

Hive Services

Comparison with Traditional Databases

Schema on Read Versus Schema on Write

Updates, Transactions, and Indexes

HiveQL

Data Types

Operators and Functions

Tables

Managed Tables and External Tables

Partitions and Buckets

Storage Formats

Importing Data

Altering Tables

Dropping Tables

Querying Data

Sorting and Aggregating

MapReduce Scripts

Joins

Subqueries

Views

User-Defined Functions

Writing a UDF

Writing a UDAF

13. HBase

Backdrop

Concepts

Whirlwind Tour of the Data Model

Implementation

Installation

Test Drive

Clients

Java

Avro, REST, and Thrift

Schemas

Loading Data

Web Queries

HBase Versus RDBMS

Successful Service

14. ZooKeeper

Installing and Running ZooKeeper

Group Membership in ZooKeeper

Creating the Group

Joining a Group

Listing Members in a Group

Deleting a Group

The ZooKeeper Service

Data Model

Operations

Implementation

Consistency

Sessions

States

15. Sqoop

Getting Sqoop

A Sample Import

Generated Code

Additional Serialization Systems

Database Imports: A Deeper Look

Controlling the Import

Imports and Consistency

Direct-mode Imports

Working with Imported Data

Imported Data and Hive

Importing Large Objects

16. Flume

Introduction

Overview

Architecture

Data flow model

Reliability

Building Flume

Getting the source

Compile/test Flume

Developing custom components

Client

Client SDK

RPC client interface

RPC clients - Avro and Thrift

Failover Client

Load Balancing RPC client

Embedded agent

Transaction interface

Sink

Source

Channel

What You Get

  • 24/7 e-Learning Access
  • Certified Trainers and Industry Experts
  • Assessments and Mock Tests