Understand the basics of Scala that are required for programming Spark applications
Learn about the basic constructs of Scala such as variable types, control structures, collections, and more.
What is Scala?
Why Scala for Spark?
Introduction to Scala REPL
Basic Scala operations
Variable types in Scala
Loops and collections: Arrays, Maps, Lists, Tuples
Functions and procedures in Scala
Eclipse with Scala
Scala REPL Detailed Demo
Configuring Scala with Eclipse
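The constructs listed above can be sketched in a single runnable snippet; this is an illustrative example (the values and names are made up), and it can be pasted into the Scala REPL or run as a script.

```scala
// A minimal sketch of the basic Scala constructs covered in this module.
object Basics {
  def main(args: Array[String]): Unit = {
    // Variables: val is immutable, var is mutable
    val language: String = "Scala"
    var count: Int = 0

    // Collections: Array, Map, List, Tuple
    val nums: Array[Int] = Array(1, 2, 3, 4)
    val caps: Map[String, String] = Map("India" -> "Delhi", "France" -> "Paris")
    val fruits: List[String] = List("apple", "banana", "cherry")
    val pair: (String, Int) = ("spark", 3) // a Tuple2

    // Control structures: a for-loop with a guard
    for (n <- nums if n % 2 == 0) count += n
    println(s"$language even sum = $count") // 2 + 4 = 6

    // A function (returns a value) vs. a procedure (returns Unit)
    def square(x: Int): Int = x * x
    def greet(name: String): Unit = println(s"Hello, $name")

    println(square(5))        // 25
    println(caps("India"))    // Delhi
    println(fruits.head + " " + pair._1)
    greet("learner")
  }
}
```

Running each line individually in the REPL (as in the demo above) shows the inferred type of every expression as you go.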
Learn about object oriented programming and functional programming techniques in Scala
Introduction to object oriented programming
Different OOP concepts
Constructors, getters, setters, singletons, overloading, and overriding
Nested Classes, Visibility Rules
Functional programming constructs
Call by Name, Call by Value
Create a list of employee objects and sort them by first name
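One way to sketch the hands-on task above: a case class gives the employee objects a constructor, getters, and value-based equality for free, and `sortBy` takes the sort key as a function. The field names and sample data are assumptions.

```scala
// Employee as a case class: constructor and accessors are generated.
case class Employee(firstName: String, lastName: String, salary: Double)

object EmployeeSort {
  def main(args: Array[String]): Unit = {
    val employees = List(
      Employee("Ravi", "Kumar", 55000),
      Employee("Anita", "Sharma", 62000),
      Employee("Vikram", "Singh", 48000)
    )

    // Functional style: pass the sort key as a function, get a new sorted list
    val sorted = employees.sortBy(_.firstName)

    sorted.foreach(e => println(s"${e.firstName} ${e.lastName}"))
    // Anita Sharma, Ravi Kumar, Vikram Singh
  }
}
```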
Understand what big data is, the challenges associated with it, and the different frameworks available
Introduction to big data
Challenges with big data
Batch vs. real-time big data analytics
Batch Analytics – Hadoop Ecosystem Overview
Real-time Analytics Options
Streaming data – Spark
In-memory data – Spark
What is Spark?
Modes of Spark
Spark installation demo
Overview of Spark on a cluster
Spark Standalone cluster
Spark Web UI
Configuring Spark in Eclipse
Running a Spark project in Eclipse
Running Spark in standalone mode
Running word count program
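The word count program above is normally written against Spark's RDD API, but its flatMap → map → reduce shape can be sketched with plain Scala collections so it runs without a cluster; `groupBy` plus a per-group sum stands in for `reduceByKey` here, and the input lines are made up.

```scala
object WordCount {
  def main(args: Array[String]): Unit = {
    val lines = List("to be or not to be", "to see or not to see")

    // The Spark equivalent would be roughly:
    //   sc.textFile(path).flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    val counts: Map[String, Int] =
      lines
        .flatMap(_.split("\\s+"))                       // split lines into words
        .map(word => (word, 1))                         // emit (word, 1) pairs
        .groupBy(_._1)                                  // group pairs by word
        .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum the 1s per word

    counts.toList.sortBy(p => -p._2).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```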
Learn how to invoke Spark Shell and use it for various common operations.
Play with Spark shell
Execute Scala and Java statements in the shell
Understand SparkContext and the driver
Read data from local filesystem
Integrate Spark with HDFS
Cache the data in memory for further use
Executing examples in Spark Shell
Running word count program
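Reading data from the local filesystem, as in the shell exercises above, can be sketched without a Spark installation using `scala.io.Source`; in spark-shell the equivalent would be `sc.textFile("file:///path/to/file")`. The snippet writes its own temporary file first so it is self-contained; the file contents are made up.

```scala
import java.nio.file.Files
import scala.io.Source

object LocalRead {
  def main(args: Array[String]): Unit = {
    // Create a small sample file to stand in for real local data
    val path = Files.createTempFile("sample", ".txt")
    Files.write(path, "line one\nline two\nline three\n".getBytes("UTF-8"))

    // Read it back line by line, as sc.textFile would on a cluster
    val source = Source.fromFile(path.toFile)
    try {
      val lines = source.getLines().toList
      println(s"read ${lines.size} lines; first = '${lines.head}'")
    } finally source.close()

    Files.delete(path)
  }
}
```

In spark-shell, calling `.cache()` on the resulting RDD keeps it in memory for further use, as described above.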
Learn one of the fundamental building blocks of Spark – RDDs – and related manipulations for implementing business logic.
Transformations in RDD
Actions in RDD
Loading data in RDD
Saving data through RDD
Key-Value Pair RDD
MapReduce and Pair RDD Operations
Spark and Hadoop integration – HDFS
Handling Sequence Files and Partitioner
Analyze NASA Apache web logs and find the top servers
Find the median salary of developers in different countries using the Stack Overflow survey data
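The median-salary exercise above is a key-value pair computation: key by country, group, then aggregate each group. This standalone sketch uses plain Scala `(country, salary)` pairs and `groupBy` where a pair RDD would use `groupByKey`; the sample figures are made up, not the actual survey data.

```scala
object MedianSalary {
  def main(args: Array[String]): Unit = {
    // Tiny made-up sample standing in for Stack Overflow survey rows
    val rows: List[(String, Double)] = List(
      ("India", 20000), ("India", 30000), ("India", 25000),
      ("Germany", 60000), ("Germany", 70000),
      ("USA", 90000), ("USA", 110000), ("USA", 100000)
    )

    // Median: middle element of the sorted values (average of the two
    // middle elements when the count is even)
    def median(xs: Seq[Double]): Double = {
      val s = xs.sorted
      val n = s.size
      if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
    }

    // On a cluster: pairRDD.groupByKey().mapValues(median)
    val medians: Map[String, Double] =
      rows.groupBy(_._1).map { case (country, pairs) =>
        (country, median(pairs.map(_._2)))
      }

    medians.toList.sortBy(_._1).foreach { case (c, m) => println(s"$c: $m") }
  }
}
```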
Understand techniques of executing SQL queries in Spark
Loading DBMS data into Spark
Introduction to Apache Spark SQL
The SQL context
Importing and saving data
Processing text, JSON, and Parquet files
Local Hive Metastore server
Explore price trends in the California real estate data
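In Spark SQL, the exploration above would be a grouped aggregation over a DataFrame, roughly `spark.read.json("realestate.json").groupBy("city").avg("price")`. This standalone sketch runs the same `GROUP BY` / `AVG` logic over case-class rows with plain collections; the field names and figures are assumptions, not the course dataset.

```scala
// One row of the (hypothetical) real estate table
case class Listing(city: String, beds: Int, price: Double)

object PriceTrend {
  def main(args: Array[String]): Unit = {
    val listings = List(
      Listing("Sacramento", 3, 350000), Listing("Sacramento", 2, 250000),
      Listing("San Jose", 3, 900000), Listing("San Jose", 4, 1100000)
    )

    // SELECT city, AVG(price) FROM listings GROUP BY city
    val avgByCity: Map[String, Double] =
      listings.groupBy(_.city).map { case (city, ls) =>
        (city, ls.map(_.price).sum / ls.size)
      }

    avgByCity.toList.sortBy(_._1).foreach { case (c, p) =>
      println(s"$c: avg price ${p.round}")
    }
  }
}
```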
Work on Spark streaming which is used to build scalable fault-tolerant streaming applications
Learn about DStreams and various Transformations performed on it
Learn about the main streaming operators, Sliding Window Operators, and Stateful Operators.
What is Spark Streaming?
Spark Streaming Features
Spark Streaming Workflow
Streaming Context & DStreams
Transformations on DStreams
WordCount Program using Spark Streaming
Important Windowed Operators
Slice, Window and ReduceByWindow Operators
Perform word count using Spark Streaming
Transformations and Actions performed on DStreams
Output Operations in DStreams
Sliding Window Operations
Word count analysis
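The sliding window idea above can be illustrated without a StreamingContext: Spark Streaming's `window(windowLength, slideInterval)` and `reduceByWindow(_ + _, ...)` keep the last N batches and aggregate them as the window slides, and `List.sliding` mimics that shape on a fixed sequence. The batch counts below are made up.

```scala
object SlidingWordCount {
  def main(args: Array[String]): Unit = {
    // Pretend each element is the word count produced by one micro-batch
    val batchCounts = List(4, 7, 2, 9, 5, 1)

    // windowLength = 3 batches, slideInterval = 1 batch
    val windowLength = 3
    val slideInterval = 1

    // Each window is the last 3 batches; summing it is the
    // reduceByWindow(_ + _) analogue
    val windowedTotals =
      batchCounts.sliding(windowLength, slideInterval)
                 .map(_.sum)
                 .toList

    println(windowedTotals) // List(13, 18, 16, 15)
  }
}
```

In a real DStream the windows keep arriving as new batches come in; here the input is finite, so the result is just the list of window totals.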
Understand Kafka and Kafka Architecture
Learn how to configure different types of Kafka Cluster
Need for Kafka
What is Kafka?
Core Concepts of Kafka
Where is Kafka Used?
Understanding the Components of Kafka Cluster
Configuring Kafka Cluster
Producer and Consumer
Configuring Single Node Single Broker Cluster
Configuring Single Node Multi Broker Cluster
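A producer talking to the single-node single-broker cluster configured above needs only a few settings; this sketch builds the minimal `KafkaProducer` configuration (the broker address assumes Kafka's default port 9092 on localhost, and the topic name in the comment is hypothetical).

```scala
import java.util.Properties

object ProducerConfigSketch {
  def main(args: Array[String]): Unit = {
    // Minimal KafkaProducer configuration: where the broker is, and how
    // to serialize keys and values on the wire
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    // With kafka-clients on the classpath, one would continue:
    //   val producer = new KafkaProducer[String, String](props)
    //   producer.send(new ProducerRecord("my-topic", "key", "value"))
    //   producer.close()
    println(props.getProperty("bootstrap.servers"))
  }
}
```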
Understand Apache Flume and its basic architecture
Integrate Flume with Apache Kafka for event processing
Need for Apache Flume
What is Apache Flume?
Basic Flume Architecture
Integrating Apache Flume and Apache Kafka
Setting up Flume Agent
Streaming Access Logs into HDFS
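A Flume agent for the access-log exercise above can be sketched as a properties file with one source, one channel, and one sink; the agent name, log path, and HDFS path here are assumptions, not the course's exact configuration.

```properties
# Tail a web server access log and stream the events into HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow the access log as new lines are appended
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/access-logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

The agent would then be started with something like `flume-ng agent -n a1 -f access-logs.conf -c conf`, where the agent name must match the `a1` prefix used in the file.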
Analyze the YouTube data and generate insights such as the top 10 videos in various categories, user demographics, number of views, ratings, etc. The data contains fields such as Id, Age, Category, Length, Views, Ratings, Comments, etc.
The Titanic was one of the biggest disasters in history, caused by a combination of natural events and human error. The objective is to analyze the Titanic data set and generate insights related to age, gender, survival, class, port of embarkation, etc.
Collect Twitter data in real time and find out what is currently trending on Twitter in various categories. In this project, we will collect live Twitter streams and analyze them using Spark Streaming to generate insights such as the current trends in politics, finance, entertainment, etc.