A4 Spring 2024
GT CS 7280: Network Science
Assignment 4: Modeling Epidemics
Spring 2024
The objective of this assignment is to experiment with the concepts we covered in Module4
about network epidemics, and see how the theoretical results that were derived in class
compare to simulation results.
Submission
Please submit your Jupyter Notebook A4YOURGTUSERNAME.ipynb with requirements.txt
so that we may be able to replicate your Python dependencies to run your code as needed.
With Anaconda, you can do this by running:
conda list e requirements.txt
Ensure all graphs and plots are properly labeled with unit labels and titles for x y axes.
Producing readable, interpretable graphics is part of the grade as it indicates understanding of
the content there may be point deductions if plots are not properly labeled.
Getting Started
Assignment 4 requires Epidemics on Network python module available here. You can download
this module and run the examples given in the documentation to become familiar with it.
You can install the library using: pip install EoN
This homework, especially parts 2 and 3, might take several minutes to run. Be aware of
this and plan to complete it accordingly.
IMPORTANT As with prior assignments the structure has been designed
to have several subsections in each part. The first few subsections are
meant to just define useful functions and the final subsection of each part
is where the functions are called and the analysis is done. If you are
confused about how a function is meant to be used, check the final
subsection in each part to see how they are being called. This should clear
up a lot of potential points of confusion early on.
https:epidemicsonnetworks.readthedocs.ioenlatestEoN.html
GT CS 7280: Network Science
Part 1: Outbreak Modeling 40 Points
The file fludata.txt has the list of students, teachers and staff at a school. The interaction
between them was measured based on the proximity of the sensors that people were carrying
on them. The data file has three columns, the first two columns are the IDs of the people and
the third column is the number of interactions.
Construct an undirected graph from that text file using functions in the networkX module. For
the purpose of this assignment, we consider only the unweighted graph i.e., you can ignore
the third column.
1. 10 points Suppose there is a pathogen with transmission rate of 0.01 and recovery
rate of 0.5. Suppose that an outbreak started at node 325 patient0. Complete the
simulateoutbreak function to simulate an outbreak under the SIS model using the
provided parameters. The function should return a list of length niter containing
simulation runs where niter is an argument to the function.
Important: When running your simulations, you will want to discard the outbreaks that
died out stochastically. To do this, check whether the number of infected nodes at the
last time step is 0 and ignore them. Additionally, you will find that the EoN package uses
different parameter names than we use in lecture. Reading the documentation should
make it clear which values to provide each of the parameters.
Additionally, complete the plotoutbreaks function to visualize the results of the
simulateoutbreak function. Show the results for each of the simulations on a single
plot and break each simulation into 2 lines, one for the number of infected and the other
for number of susceptible over time. Make sure to properly label these lines and to
create a legend identifying which lines are which.
2. 10 points In the lecture we modeled the initial exponential increase of the number of
infected nodes as , where is a time constant. Note that here as only
one node was infected initially. Now, complete the getexponent function to fit an
exponent to the curve of the number of infections. Choose only the initial portion of the
outbreak, say for 100 the exponential region of the outbreak, where the number of
infected is less than or equal to 100 and return the estimated time constant .
Hint: scipy.optimize.curvefit is a helpful function to fit the exponent to a curve.
Additionally, complete the plotcurvefit function to plot both the actual number of
infected and the theoretical curve given a value of for values of Infected 100. This
function should also compute the rsquared between the two curves and print the value
for and rsquared in the title of the plot. Again, make sure to label both curves and
create a legend identifying which is which.
3. 5 points In the lecture and textbook we discussed theoretical values for that can be
calculated from properties of the graph and the dynamics of the infection spread.
Complete the calculatetheoreticaltaus function to compute:
GT CS 7280: Network Science
The random distribution shown in the Lesson 9 Canvas lecture SIS Model.
The arbitrary distribution from the Canvas lectures shown in the Lesson 9
Canvas lecture Summary of SI, SIS, SIR Models with Arbitrary Degree
Distribution.
The arbitrary distribution from the textbook found in Ch. 10, Equation 10.21.
Additionally, complete the comparetaus function to show a boxplot of the distribution of
sample s calculated from simulation runs see 1.5 to understand where these come
from. Visualize the theoretical calculations as dots on the box plot. Again, label each of
these dots with the calculation used to generate them.
4. 10 points Complete the calculatetheoreticalendemicsize function to compute the
size of the population that remains infected at the endemic state.
Then, complete the compareendemicsizes function to plot the distribution of
endemic sizes across several simulation runs as a boxplot, and compare it with the
theoretical calculation for endemic size as a single dot, similarly to the previous
subsection.
5. 5 points Run the code provided in cell 1.5 and look at the resulting figures. How good of
a fit is the exponential curve in section 1.2? Explain how the theoretical estimates in 1.3
1.4 compare to the empirical distribution and indicate which you would consider a
reasonable fit for the data.
Part 2: Transmission Rate 25 Points
Next, let us vary the transmission rate and see how it affects the spread of infection. Since we
know that only the ratio of the transmission rate and the recovery rate matters, let us keep the
recovery rate constant at 0.5 and vary only the transmission rate.
1. 10 points Complete the simulatebetasweep function to vary the transmission rate
over a range of beta values between betamin, betamax with betasamples number of
points. For each value of the transmission rate, compute 5 simulations to avoid outliers.
You can reuse your simulateoutbreak function from Part 1 in this function.
Next, complete the extractaveragetau function to return a list of the average value
calculated over the five simulation runs for EACH beta value. You may reuse the
getexponent function from Part 1.
Finally, complete the plotbetataucurves function to show the exponential curve
given by the values for each beta value. The xaxis is time and yaxis is the number of
infected people. Use a log scale on the yaxis and make sure that each line has its own
color. This function should be similar to the plotcurvefit function in part 1.2, but you
GT CS 7280: Network Science
will be showing a series of exponentials instead of comparing an experimental with a
theoretical curve.
2. 10 points Complete the extractaverageendemicsize function to return a list of the
average endemic size calculated over the five simulation runs for EACH beta value.
Next, complete the calculatetheoreticalendemic function to find the minimum
theoretical beta values of the transmission rate for an epidemic to occur. Calculate this
minimum based on the equations derived in lecture for both the random distribution and
the arbitrary distribution. Also, calculate the theoretical endemic size for each value of
beta under the assumption of random distribution.
Finally, complete the compareendemicsizesvsbeta function to plot the average
endemic sizes and theoretical endemic sizes as a curve vs beta. Additionally, plot the
minimum values for beta to start an endemic as vertical lines. Make sure to label each
line and provide a legend.
3. 5 points Run the code provided in cell 2.3 and look at the resulting figures. How similar
is the theoretical to experimental endemic sizes? How closely do the minimum beta
values provide a reasonable lower bound for the start of an endemic?
Part 3: Patient0 Centrality 30 Points
Now, let us see how the choice of patient0 affects the spread of an outbreak. Consider every
node of the network as patient0, and run the SIS model using the parameters in Part 1 to
compute . Run the simulation with each node in the simulation as patient0. Hint: You can skip
cases where the infection quickly diminishes to 0.
1. 10 points Complete the sweepinitialinfected function to complete a single
simulation run for each node in the graph as the initial infected. Check for runs that
stochastically die out and do not save those. Return the list of simulation run results and
a list of nodes integer IDs where the simulation was successful.
Additionally, complete the computecentrality function to calculate the: degree
centrality, closeness centrality with wfimprovedfalse, betweenness centrality, and
eigenvector centrality of the graph. Remember to use the unweighted centrality metrics.
Return the centralities for each node where the simulation was kept in the previous
Hint: We provide nodes as an argument which is meant to represent the second output
of the previous function. You can use this to filter for centralities of only these nodes
before you return them. Check the cell for 3.3 to see exactly how this is used.
GT CS 7280: Network Science
2. 15 points Complete the calculatepearsoncorrelation to compute the Pearson
correlation coefficient between each centrality metric and , along with a pvalue for that
correlation.
Additionally, complete the plotcentralityvstau function to plot a scatter plot between
the value that corresponds to each node, and different centrality metrics of that node:
degree centrality, closeness centrality, betweenness centrality, and eigenvector
centrality. Do this all as one figure with four subfigures. Include the Pearson correlation
values as well as the corresponding pvalues in the title for each scatter plot. Remember
to use the unweighted centrality metrics.
3. 5 points Rank these centrality metrics based on Pearsons correlation coefficient, and
determine which metrics can be a better predictor of how fast an outbreak will spread
from the initial node. Analyze your results. That is, do the results match your intuition? If
they differ, why might that be?
Part 4: Knowledge Question 5 Points
Answer the following food for thought question from Lesson 10 Submodularity of Objective
Prove that a nonnegative linear combination of a set of submodular functions is also a
submodular function.
Hint: Make sure you understand the definition of linearity.