Spark Stream

1. People with a virtual machine or laptop may be limited in their choice of optimizations (e.g. you will not be able to
deal with placement).
2. Please make appropriate assumptions:
a. Define an appropriate workload: what data you will use, how you will input it, etc.
b. Define how you will measure performance, e.g. throughput, selectivity, or accuracy.
3. To reproduce the graph, it may be acceptable to use Spark (and not yet Spark Streaming).
Please make these assumptions clear in your write-ups. Please also include a plot of the performance graph you are
trying to reproduce.
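
As an illustration of the measurement assumptions above, the following is a minimal sketch in plain Python (not Spark) of how throughput and selectivity could be recorded for a single filter operator. The function name measure_filter and the synthetic integer workload are hypothetical and chosen only for this example; in your own setup you would wrap the equivalent Spark job instead.

```python
import time
from typing import Callable, Iterable, List, Tuple


def measure_filter(records: Iterable[int],
                   predicate: Callable[[int], bool]) -> Tuple[List[int], float, float]:
    """Run a filter over records and return (output, selectivity, throughput).

    selectivity = output records / input records
    throughput  = input records processed per second of wall-clock time
    """
    n_in = 0
    out: List[int] = []
    start = time.perf_counter()
    for r in records:
        n_in += 1
        if predicate(r):
            out.append(r)
    elapsed = time.perf_counter() - start
    selectivity = len(out) / n_in if n_in else 0.0
    throughput = n_in / elapsed if elapsed > 0 else float("inf")
    return out, selectivity, throughput


# Hypothetical synthetic workload: the integers 0..99,999; keep multiples of 10.
workload = range(100_000)
kept, selectivity, throughput = measure_filter(workload, lambda x: x % 10 == 0)
print(f"selectivity={selectivity:.2f}, throughput={throughput:,.0f} records/s")
```

Wall-clock throughput varies from run to run, so it is probably worth averaging several runs per data point before plotting.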

1. Operator Reordering
2. Load Shedding
For each of the two optimizations above, try to reproduce the graph from the paper (also on the slides).
3. Bonus: pick any of the other optimizations and try to reproduce its graph.
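
To make the operator reordering item concrete, here is a small sketch in plain Python (not Spark) that runs two filters in both orders and counts the work done. The per-record cost units, the two predicates, and the helper run_pipeline are all assumptions made for illustration; they stand in for real operator costs that you would measure on your own workload.

```python
from typing import Callable, Iterable, List, Tuple

Predicate = Callable[[int], bool]


def run_pipeline(records: Iterable[int],
                 stages: List[Tuple[Predicate, int]]) -> Tuple[List[int], int]:
    """Apply filter stages in order; return (output, total cost units spent).

    Each stage is (predicate, cost), where cost is a hypothetical per-record
    charge standing in for CPU time.
    """
    total_cost = 0
    out: List[int] = []
    for r in records:
        keep = True
        for predicate, cost in stages:
            total_cost += cost          # pay for evaluating this stage
            if not predicate(r):
                keep = False            # record dropped; later stages are skipped
                break
        if keep:
            out.append(r)
    return out, total_cost


# Hypothetical operators: an expensive filter that keeps half the records
# and a cheap filter that keeps one record in ten.
expensive = (lambda x: x % 2 == 0, 10)   # selectivity 0.5, cost 10
cheap = (lambda x: x % 10 == 0, 1)       # selectivity 0.1, cost 1

data = range(1000)
out_a, cost_a = run_pipeline(data, [expensive, cheap])  # original order
out_b, cost_b = run_pipeline(data, [cheap, expensive])  # reordered

print(out_a == out_b)   # True: reordering the two filters does not change the result
print(cost_a, cost_b)   # 10500 2000
```

Both orders produce the same output because the two filters commute, but the reordered pipeline spends far fewer cost units. Varying the selectivity of the first operator and plotting cost (or measured throughput) for both orders is one possible way to approach the graph.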

A Catalog of Stream Processing Optimizations
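
For the load shedding item listed above, the following plain-Python sketch (again not Spark) drops records at random and reports how the accuracy of a running mean degrades as fewer records are processed. The helper names shed_load and relative_error, the synthetic readings, and the choice of random dropping as the shedding policy are assumptions for illustration only; other shedding policies are possible.

```python
import random
from statistics import mean
from typing import List, Sequence


def shed_load(records: Sequence[float], keep_prob: float, seed: int = 0) -> List[float]:
    """Randomly drop records, keeping each one with probability keep_prob."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < keep_prob]


def relative_error(exact: float, estimate: float) -> float:
    """Relative error of an estimate against a nonzero exact value."""
    return abs(estimate - exact) / abs(exact)


# Hypothetical workload: 10,000 synthetic sensor readings in [0, 100].
data_rng = random.Random(42)
readings = [data_rng.uniform(0.0, 100.0) for _ in range(10_000)]
exact_mean = mean(readings)

for keep_prob in (1.0, 0.5, 0.1):
    kept = shed_load(readings, keep_prob)
    err = relative_error(exact_mean, mean(kept))
    print(f"keep_prob={keep_prob:.1f} processed={len(kept)} rel_error={err:.4f}")
```

Fewer processed records means less work per unit of input, at the price of a less accurate answer; plotting accuracy against the fraction of records kept is one way to show that trade-off.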