The labs were meant for Big Data professionals who work on the Azure Cloud but are new to Apache Spark.
The labs enabled them to:
- Understand the value proposition of Apache Spark over other Big Data technologies like Hadoop.
- Understand the similarities between Hadoop & Spark, their differences, and respective nuances.
- Decide which technology to use, and why, for a given business use case.
The following labs were developed:
- Lab 1 – SparkSQL – Introduction, Analyze & Visualize
Seamlessly mixed SQL queries with Spark programs, demonstrated uniform data access, and showcased Hive compatibility.
- Lab 2 – Real-time analytics using Spark Streaming
Implemented classic ‘Sessionization’ techniques that let web marketers build highly personalized marketing campaigns for website visitors in near real time.
- Lab 3 – Building Recommendation Systems using Spark ML
- Implemented the ‘Frequently Bought Together’ use case with the “Frequent Pattern Mining” algorithm.
- Implemented the ‘People Like You Also Like This’ use case with the “Collaborative Filtering” algorithm.
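To make the ‘Frequently Bought Together’ idea from Lab 3 concrete, here is a minimal sketch in plain Python. It is not the lab's Spark code: it counts only co-occurring item pairs, which is a simplified stand-in for full frequent pattern mining, and the function names, the `min_support` threshold, and the sample baskets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support=2):
    """Count how often each pair of items appears in the same basket.

    Returns only the pairs seen in at least `min_support` baskets.
    (`min_support` is a hypothetical threshold chosen for this sketch.)
    """
    pair_counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


def bought_together(item, baskets, min_support=2):
    """List items frequently bought with `item`, most frequent first."""
    suggestions = Counter()
    for (a, b), n in frequent_pairs(baskets, min_support).items():
        if a == item:
            suggestions[b] = n
        elif b == item:
            suggestions[a] = n
    return [other for other, _ in suggestions.most_common()]


# Illustrative data only; not taken from the labs.
sample_baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "butter"],
]
```

Calling `bought_together("bread", sample_baskets)` suggests only `milk`, because `butter` and `eggs` each appear with `bread` in a single basket and fall below the assumed support threshold. A full frequent pattern mining algorithm would also consider item sets larger than pairs.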
All implementations were integrated into a fictitious website so that students could see the effect of each lab's work.
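The sessionization technique from Lab 2 can also be illustrated outside Spark. The sketch below, in plain Python, groups a visitor's clicks into sessions by splitting whenever the visitor has been idle longer than a timeout. The 30-minute gap, the `(visitor_id, timestamp, url)` event shape, and the function name are assumptions for this example, not details from the labs.

```python
from collections import defaultdict

# Assumed inactivity timeout in seconds; the labs' actual value is not stated.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (visitor_id, timestamp_seconds, url) events into sessions.

    A new session starts whenever a visitor has been idle for more than
    `gap` seconds. Returns {visitor_id: [[(timestamp, url), ...], ...]}.
    """
    by_visitor = defaultdict(list)
    for visitor_id, ts, url in events:
        by_visitor[visitor_id].append((ts, url))

    sessions = {}
    for visitor_id, clicks in by_visitor.items():
        clicks.sort()  # order each visitor's clicks by timestamp
        visitor_sessions = [[clicks[0]]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > gap:
                visitor_sessions.append([cur])    # idle too long: new session
            else:
                visitor_sessions[-1].append(cur)  # continue current session
        sessions[visitor_id] = visitor_sessions
    return sessions


# Illustrative click stream only; not taken from the labs.
sample_events = [
    ("a", 0, "/home"),
    ("a", 600, "/cart"),
    ("a", 2401, "/home"),  # 1801 s after the previous click
    ("b", 100, "/home"),
]
```

With the sample events, visitor `a` ends up with two sessions and visitor `b` with one. A streaming version would keep per-visitor state across micro-batches instead of sorting a complete list, but the splitting rule is the same.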
The labs were released to the field training team.
Technologies Used: HDInsight Spark, SparkSQL, Spark Streaming, Spark ML, Jupyter, Zeppelin, R, Scala, C#, SQL, HDFS, Power BI.