Description:
This hands-on training program equips system administrators, DevOps engineers, and data platform operators with the knowledge and practical skills required to deploy, manage, and support Apache Hadoop clusters using the Hortonworks Data Platform (HDP). Participants gain a deep understanding of the Hadoop ecosystem, beginning with foundational Big Data concepts and cluster planning, followed by the installation and configuration of core services such as HDFS and YARN using Apache Ambari. The training emphasizes best practices for resource management, service tuning, and troubleshooting, preparing attendees to handle production-scale deployments.
Beyond core administration, the course explores the wider Hadoop ecosystem, including essential tools and services like Hive for data warehousing, HBase for NoSQL storage, Pig for data transformation, and Apache Spark for in-memory processing. Participants will learn to secure the Hadoop environment using Kerberos, Knox, and Ranger, implement high-availability configurations, monitor cluster health using Ambari dashboards, and perform critical maintenance operations such as scaling, upgrades, and backups. With guided labs and enterprise-relevant use cases, this course prepares learners to manage high-performing, secure, and scalable Hadoop environments in real-world settings.
Duration: 5 Days
Course Code: BDT 504
Learning Objectives:
By the end of this course, participants will be able to:
- Plan and deploy a Hadoop cluster using Hortonworks and Ambari.
- Configure, manage, and troubleshoot HDFS, YARN, and ecosystem services.
- Connect clients and visualize Hadoop workloads using Hue.
- Perform maintenance, security (Kerberos, Knox, Ranger), and recovery operations.
- Monitor cluster performance and integrate with ticketing tools.
This course is ideal for:
- System Administrators and Infrastructure Engineers
- Hadoop/Big Data Administrators
- DevOps Engineers supporting Hadoop clusters
- Data Platform Engineers
Prerequisites:
- Basic Linux/Unix administration
- Understanding of networking fundamentals
- Exposure to distributed computing concepts is helpful
Course Outline:
Module 1: Linux Foundations for Hadoop
- Linux/Unix essentials for setting up Hadoop clusters
- OS-level configuration and tuning for Hadoop deployments
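The OS-level tuning bullet above usually covers kernel settings that Hadoop is sensitive to; a minimal sketch of the commonly recommended settings (values and paths are typical Linux defaults, not HDP mandates, and all commands require root):

```shell
# Reduce swapping so the kernel keeps Hadoop daemons in memory
# (a value of 1 is the commonly recommended setting for Hadoop nodes)
sysctl -w vm.swappiness=1

# Disable transparent huge pages, which can cause CPU stalls under
# Hadoop workloads (sysfs path varies slightly by distribution)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Raise the open-file limit for the current shell (example value;
# persistent limits go in /etc/security/limits.conf)
ulimit -n 65536
```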
Module 2: Planning Hadoop Cluster Architecture
- General planning and sizing best practices
- DataNode/NameNode hardware requirements
- Network and virtualization considerations
- Node configuration approaches
Module 3: Big Data and Hadoop Fundamentals
- What is Big Data?
- 3Vs and 4Vs of Big Data
- CAP Theorem, NoSQL databases
- Big Data problems and Hadoop's role
- Structured, semi-structured, and unstructured data handling
- Core Hadoop concepts and Gen 1 vs. Gen 2
- Why Hadoop is essential for modern businesses
Module 4: HDFS Administration
- HDFS architecture and file flow (read/write)
- High Availability (HA) in HDFS
- Rack awareness configuration
- HDFS command-line interaction and quotas
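The command-line interaction and quota topics above can be sketched with the standard `hdfs` tools (directory and file names are illustrative; quota commands require HDFS superuser privileges):

```shell
# Basic file operations against HDFS
hdfs dfs -mkdir -p /data/sales
hdfs dfs -put sales.csv /data/sales/
hdfs dfs -ls /data/sales

# Limit the directory to 1 million names and 1 TB of raw space
hdfs dfsadmin -setQuota 1000000 /data/sales
hdfs dfsadmin -setSpaceQuota 1t /data/sales

# Verify quota usage in human-readable form
hdfs dfs -count -q -h /data/sales
```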
Module 5: Cluster Installation Using Ambari
- Overview of Apache Ambari, the HDP cluster management console
- Cluster setup using Ambari
- Parameter tuning during installation
- Clustering internals in Hortonworks (HDP)
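An Ambari-driven install begins with the Ambari server itself; a minimal sketch of the standard commands (RHEL/CentOS package manager assumed, run as root):

```shell
# Install, configure, and start the Ambari server
yum install -y ambari-server
ambari-server setup --silent   # silent mode accepts defaults (embedded DB, default JDK)
ambari-server start
```

The cluster install wizard is then completed in the Ambari web UI, by default at port 8080 on the Ambari host.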
Module 6: Manual Apache Hadoop Setup
- Apache Hadoop tarball installation
- Pre-installation checklist
- Configuration files
- Passwordless SSH setup for cluster connectivity
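The passwordless SSH step above can be sketched with standard OpenSSH tooling (host names are illustrative):

```shell
# On the node driving the installation, generate a key pair
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public key to every cluster node (example hosts)
for host in node1 node2 node3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done

# Verify that login no longer prompts for a password
ssh node1 hostname
```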
Module 7: MapReduce and YARN Resource Management
- MapReduce concepts and execution phases
- Limitations of Gen 1 Hadoop and the need for YARN
- YARN architecture: ResourceManager & NodeManager
- Tuning memory and CPU allocation
- Capacity Scheduler configuration
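The Capacity Scheduler bullet above is configured in `capacity-scheduler.xml`; a minimal sketch splitting the cluster between two queues (queue names and percentages are illustrative; the property names are the standard YARN ones):

```xml
<!-- capacity-scheduler.xml: two queues under root, splitting capacity 60/40.
     Queue names and percentages are illustrative. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,analytics</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>40</value>
</property>
```

Queue changes of this kind are applied without a ResourceManager restart via `yarn rmadmin -refreshQueues`.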
Module 8: Introduction to Apache Spark
- Spark architecture and advantages
- How Spark differs from MapReduce and how it runs on YARN
- Suitable use cases for Spark in Hadoop environments
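Running Spark on an existing YARN cluster can be sketched with the SparkPi example application that ships with Spark (the jar path shown follows the typical HDP layout and may differ on your install):

```shell
# Submit the bundled SparkPi example to YARN in cluster mode
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark2-client/examples/jars/spark-examples*.jar 100
```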
Module 9: Hadoop Clients and Interfaces
- Hadoop client tools and connectivity
- Installing and configuring clients
- Hue: overview, installation, and configuration
Module 10: Configurations, Logs & Troubleshooting
- Configuring HDFS and YARN services
- Daemon log locations and structure
- Diagnosing issues from logs
- Real-world YARN deployment examples
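Diagnosing issues from logs typically means tailing the daemon logs and pulling aggregated application logs; a sketch using typical HDP log locations (paths and the application ID are illustrative):

```shell
# NameNode daemon log (typical HDP location; adjust for your install)
tail -f /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log

# ResourceManager daemon log
tail -f /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-*.log

# Aggregated logs for a finished YARN application (example ID)
yarn logs -applicationId application_1700000000000_0001 | less
```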
Module 11: Ecosystem Toolset Overview
- Hive: installation and data warehouse usage
- HBase: NoSQL database and deployment
- Zookeeper: coordination service
- Pig and Grunt shell
- Kudu vs. HDFS comparison
- Logs and troubleshooting for Hive, HBase, and Pig
Module 12: Backup and Recovery Strategy
- Backup types and retention
- What data should be backed up
- Tools and scheduling for recovery planning
Module 13: Hadoop Security Essentials
- Security needs in Hadoop
- Kerberos authentication flow
- Configuring secured Hadoop clusters
- Knox: perimeter security gateway
- Ranger: policy-based access control and auditing
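The Kerberos flow covered above is usually verified from the shell; a sketch (the realm, principal, and keytab path are illustrative, though the path follows the usual HDP convention):

```shell
# Obtain a ticket using a service keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM

# Inspect the ticket cache to confirm the ticket was granted
klist

# With a valid ticket, commands against the secured cluster succeed;
# without one they fail with a GSSException
hdfs dfs -ls /
```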
Module 14: Cluster Maintenance Operations
- Adding and removing cluster nodes
- Copying data between clusters
- Upgrade strategy and stepwise approach
- HDFS rebalancing and directory snapshots
- Ensuring cluster high availability
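Several of the maintenance tasks above map directly to stock Hadoop tooling; a sketch (NameNode addresses and paths are illustrative):

```shell
# Copy a directory between clusters with DistCp
hadoop distcp hdfs://nn-a:8020/data hdfs://nn-b:8020/backup/data

# Rebalance block placement after adding nodes
# (moves blocks until each DataNode is within 10% of mean utilization)
hdfs balancer -threshold 10

# Directory snapshots: enable, take, and list
hdfs dfsadmin -allowSnapshot /data
hdfs dfs -createSnapshot /data before-upgrade
hdfs dfs -ls /data/.snapshot
```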
Module 15: Monitoring and Reporting
- Monitoring features in Apache Ambari
- Configuring alerts and dashboards
- High Availability monitoring
- Integration with ticketing tools
- Known issues and remediation patterns
Training Material Provided:
- Course slides and reference guides



