DATA ANALYTICS - BIG DATA/HADOOP

The big data industry is still in its nascent stage, and there is a dearth of quality resources. It has huge potential, and big data skills will be in high demand in the near future. Test Triangle offers a comprehensive big data and analytics package that equips young professionals with these sought-after skills.

Our new Big Data – Hadoop course is designed for complete beginners with no experience in data analytics. It is well suited to anyone who wants to start a career in data analytics.

The training course can be tailored to your organisation’s or your individual needs. Please contact us for details and prices of private in-house training services.

Scheduled start date: Contact us for dates

Introduction to Big Data

Rise of Big Data
Compare Hadoop vs traditional systems
Hadoop Master-Slave Architecture
Understanding HDFS Architecture
NameNode, DataNode, Secondary NameNode
Learn about JobTracker and TaskTracker

Hadoop Configuration & Daemon Logs

Hadoop Configuration and Daemon Logs
Hadoop Daemons and Roles

Hadoop Cluster Setup and Working

Cluster Setup and Working
Getting Virtualization Software and Linux Disk Images
Adding Machines to your VM Box
Installing Linux into Machines
Preparing your Linux Machines to install Hadoop
Cluster Management Solution
Setting up an Apache Hadoop Cluster
Writing Data to Cluster and Checking Replication Status
Setting up Linux machines in AWS EC2 for a Cloudera Cluster
Setting up a Cloudera Cluster on your machines in AWS EC2

Hadoop Cluster Maintenance and Administration

Commissioning and Decommissioning of DataNodes in a Cloudera Cluster
Decommissioning and Commissioning Nodes in an Apache Hadoop Cluster
Balancing a Cluster
Managing Services
Managing Software Packages with Apache Hadoop
Managing Role Instances
Improvements in Hadoop Version 2

HDFS & MapReduce Architecture

Core components of Hadoop
Understanding Hadoop Master-Slave Architecture
Learn about NameNode, DataNode, Secondary NameNode
Understanding HDFS Architecture
Anatomy of Read and Write data on HDFS
MapReduce Architecture Flow

Hadoop Configuration

Hadoop Modes
Hadoop Terminal Commands
Cluster Configuration
Web Ports
Hadoop Configuration Files
Reporting, Recovery

Understanding Hadoop MapReduce Framework

Overview of the MapReduce Framework
Use cases of MapReduce
MapReduce Architecture
Anatomy of MapReduce Program
Mapper/Reducer Class, Driver code
Understand Combiner and Partitioner

Advanced MapReduce Part 1

Write your own Partitioner
Writing Map and Reduce in Python
Map-side/Reduce-side Join
Distributed Join
Distributed Cache
Counters
Joining Multiple datasets in MapReduce

Advanced MapReduce Part 2

MapReduce internals
Understanding Input Format
Custom Input Format
Using Writable and Comparable
Understanding Output Format
Sequence Files
JUnit and MRUnit Testing Frameworks

Apache Pig

Pig vs MapReduce
Pig Architecture & Data Types
Pig Latin Relational Operators
Pig Latin Join and CoGroup
Pig Latin Group and Union
Describe, Explain, Illustrate

Apache Hive and HiveQL

What is Hive
Hive DDL – Create/Show Database
Hive DDL – Create/Show/Drop Tables
Hive DML – Load Files & Insert Data
Hive SQL – Select, Filter, Join, Group By
Hive Architecture & Components
Difference between Hive and RDBMS

Advanced HiveQL

Multi-Table Inserts
Joins
Grouping Sets, Cubes, Rollups
Custom Map and Reduce scripts
Hive SerDe
Hive UDF
Hive UDAF

Apache Kafka

Kafka – How Kafka works
Kafka Architecture
Apache Kafka and real world use cases
What are the various components of Apache Kafka
Kafka Cluster configuration
Kafka Broker, Producer and Consumer configuration
Practice lab exercises using Apache Kafka

Apache Flume

Flume – How it works
Flume Architecture
Flume Complex Flow – Multiplexing
Apache Flume and real world use cases
What are the various components of Apache Flume
Flume agent configuration
Practice lab exercises using Apache Flume

Apache Sqoop, Oozie

Sqoop – How Sqoop works
Sqoop Architecture
Oozie – Simple/Complex Flow
Oozie Service/Scheduler
Use Cases – Time and Data triggers

NoSQL Database

CAP theorem
RDBMS vs NoSQL
Key-Value stores: Memcached, Riak
Key-Value stores: Redis, DynamoDB
Column Family: Cassandra, HBase

Apache HBase

When/Why to use HBase
HBase Architecture/Storage
HBase Data Model
HBase Families/Column Families
HBase Master
HBase vs RDBMS
Access HBase Data

Apache ZooKeeper

What is ZooKeeper
ZooKeeper Data Model
ZNode Types
Sequential ZNodes
Installing and Configuring
Running ZooKeeper
ZooKeeper use cases

Hadoop 2.0, YARN, MRv2

Hadoop 1.0 Limitations
MapReduce Limitations
HDFS 2: Architecture
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN
YARN multitenancy

Please enroll for the course by submitting your details to: afshan_alurkar@testtriangle.com
