Stream Analytics Pipeline

Final Project for CSCI E-88c: Programming in Scala for Big Data Systems

(Spring 2022)

Project Goals

The goal of this project was to provide insight into the game data generated by Project Ogre's player vs. player system. I wanted to gather data by character class in near-real time so that I could make informed decisions about balance changes. The insights from this project are used together with test player feedback to decide on those changes.

Specifically, I wanted to answer a few starting questions for this project:

System Architecture

This system is similar to my Big Data pipeline except that it uses stream processing via Spark instead of batch processing in Elasticsearch. Given the similarity, I had this approved by course staff before submitting the project.

Azure Function App

The Azure Function App is triggered when a blob is created in Azure Storage. The function deserializes the battle JSON and emits a batch of comma-separated event strings derived from the per-class results. The events are written to an Azure Event Hubs topic through Event Hubs' Kafka-compatible endpoint.
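As a rough illustration of the "battle results to event strings" step, here is a minimal sketch. It is written in Scala only to match the rest of the project; the language of the actual function is not stated here. The `ClassResult` fields, the `BattleEvents` object, and the column order of the event string are all hypothetical and are not the project's real schema.

```scala
// Hypothetical per-class outcome of one battle. The field names are
// illustrative assumptions, not the project's actual schema.
final case class ClassResult(characterClass: String, damageDealt: Int, won: Boolean)

object BattleEvents {
  // Render one result as a comma-separated event string:
  // battleId,characterClass,damageDealt,won(1 or 0)
  def toEvent(battleId: String, r: ClassResult): String =
    Seq(battleId, r.characterClass, r.damageDealt.toString, if (r.won) "1" else "0")
      .mkString(",")

  // One battle yields a batch of events, one per class result.
  def toBatch(battleId: String, results: Seq[ClassResult]): Seq[String] =
    results.map(toEvent(battleId, _))
}
```

Each string in the batch would then be sent as one message to the Event Hubs topic. Keeping the formatting in a pure function like this makes it easy to check separately from the blob trigger and the Kafka producer.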

Azure Event Hubs

This is a Microsoft Azure PaaS resource which I configured for this project.

Azure Databricks Service

I wrote an Apache Spark stream analytics program in Scala to compute aggregates and averages from the events. The program consumes the events from Azure Event Hubs and uses them to answer the questions posed in the project goals. I created a Spark SQL DataFrame from the input stream and query it with Databricks' built-in display() function for visualization.
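The consume-parse-aggregate flow described above might look roughly like the following Structured Streaming sketch. It is not the project's actual code. The `ClassStats` object, the column names, and the `battleId,characterClass,damageDealt,won` event layout are assumptions made for illustration, and the namespace, topic, and JAAS configuration are left as parameters because their real values are not given here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, lit, split}

object ClassStats {
  // Read the Event Hubs topic through its Kafka-compatible endpoint (port 9093).
  // jaasConfig carries the SASL PLAIN credentials built from the Event Hubs
  // connection string; it is passed in rather than hard-coded.
  def readEvents(spark: SparkSession, namespace: String,
                 topic: String, jaasConfig: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", s"$namespace.servicebus.windows.net:9093")
      .option("subscribe", topic)
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config", jaasConfig)
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

  // Split each comma-separated event (assumed layout) into typed columns.
  def parse(raw: DataFrame): DataFrame = {
    val parts = split(col("value"), ",")
    raw.select(
      parts.getItem(1).as("characterClass"),
      parts.getItem(2).cast("int").as("damageDealt"),
      parts.getItem(3).cast("int").as("won")
    )
  }

  // Per-class aggregates: event count, average damage, and win rate.
  def aggregate(events: DataFrame): DataFrame =
    events.groupBy("characterClass").agg(
      count(lit(1)).as("battles"),
      avg("damageDealt").as("avgDamage"),
      avg("won").as("winRate")
    )
}
```

In a Databricks notebook, the result of `ClassStats.aggregate(ClassStats.parse(ClassStats.readEvents(spark, namespace, topic, jaasConfig)))` could be passed to `display` to chart the per-class figures as they update. Because `parse` and `aggregate` take plain DataFrames, they can also be exercised against a small static DataFrame without a live stream.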