Module 2: Epidemic propagation
We are starting a new module on modelling epidemic propagation.
Let's start off by analysing some of the data that is now available on the current COVID-19 pandemic.
Exploring COVID-19 data
In this notebook we will explore and analyse data on the COVID-19 pandemic. The aim is to use Julia's tools to analyse and visualise the data in different ways.
Here is an example of the kind of visualisation we will be able to produce:
Download and load data
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
"covid_data.csv"
We will need a couple of new packages. The data is in CSV (Comma-Separated Values) format, a common data format in which each observation, i.e. each data point, is on its own line, and the values within each line are separated by commas (or other delimiters, such as tabs or spaces).
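A sketch of this setup step, assuming the file name shown above (the exact cells in the notebook may differ):

```julia
# Install the packages used below (only needed once per environment)
using Pkg
Pkg.add(["CSV", "DataFrames", "PlutoUI", "Plots", "Shapefile", "ZipFile"])

# Download the raw CSV file from the Johns Hopkins CSSE repository
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
download(url, "covid_data.csv")
```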
We can load the data from the CSV file using the `File` function from the CSV.jl package, and then convert it to a `DataFrame`:
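For example, naming the resulting table `covid_data` (the variable name is an assumption, used consistently in the sketches below):

```julia
using CSV, DataFrames

# Read the CSV file and convert it to a DataFrame
covid_data = DataFrame(CSV.File("covid_data.csv"))
```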
| Row | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | missing | "Afghanistan" | 33.9391 | 67.71 | 0 | 0 | 0 | 0 | |
| 2 | missing | "Albania" | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | |
| 3 | missing | "Algeria" | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | |
| 4 | missing | "Andorra" | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | |
| 5 | missing | "Angola" | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | |
| 6 | missing | "Antarctica" | -71.9499 | 23.347 | 0 | 0 | 0 | 0 | |
| 7 | missing | "Antigua and Barbuda" | 17.0608 | -61.7964 | 0 | 0 | 0 | 0 | |
| 8 | missing | "Argentina" | -38.4161 | -63.6167 | 0 | 0 | 0 | 0 | |
| 9 | missing | "Armenia" | 40.0691 | 45.0382 | 0 | 0 | 0 | 0 | |
| 10 | "Australian Capital Territory" | "Australia" | -35.4735 | 149.012 | 0 | 0 | 0 | 0 | |
| ⋮ | | | | | | | | | |
| 289 | missing | "Zimbabwe" | -19.0154 | 29.1549 | 0 | 0 | 0 | 0 | |
A `DataFrame` is a standard way of storing heterogeneous data in Julia, i.e. a table consisting of columns with different types. As you can see from the display of the `DataFrame` object above, each column has an associated type, and different columns can have different types, reflecting the kind of data in that column.

In our case, country names are stored as `String`s, their latitude and longitude as `Float64`s, and the (cumulative) case counts for each day as `Int64`s.
Using the data
Since we need to manipulate the columns, let's rename them to something shorter. We can do this either in place, i.e. modifying the original `DataFrame`, or out of place, creating a new `DataFrame`. The convention in Julia is that functions which modify their argument have a name ending with `!` (often pronounced "bang").
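A sketch of the renaming step; the new names match the tables shown further below, and renaming in place is a choice, not necessarily what the notebook does:

```julia
# Rename the first four columns in place (note the trailing `!`)
rename!(covid_data, 1 => "province", 2 => "country",
                    3 => "latitude", 4 => "longitude")
```

The out-of-place version is `rename`, which returns a new `DataFrame` and leaves the original untouched.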
We can use the `first` function to see only the first few rows of the data (older versions of DataFrames.jl provided a `head` function, which is no longer defined).
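For instance, to look at the first five rows:

```julia
first(covid_data, 5)
```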
Extracting useful information
How can we extract the list of all the countries? The country names are in the second column.
For some purposes we can think of a `DataFrame` as a matrix and use similar syntax. For example, we can extract the second column:
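For example (calling the result `all_countries`, the name used later in this notebook):

```julia
# Extract the second column, i.e. the country names, as a vector
all_countries = covid_data[:, 2]
# equivalently, by name: covid_data[:, "country"] or covid_data.country
```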
"Afghanistan"
"Albania"
"Algeria"
"Andorra"
"Angola"
"Antarctica"
"Antigua and Barbuda"
"Argentina"
"Armenia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Austria"
"Azerbaijan"
"Bahamas"
"Uruguay"
"Uzbekistan"
"Vanuatu"
"Venezuela"
"Vietnam"
"West Bank and Gaza"
"Winter Olympics 2022"
"Yemen"
"Zambia"
"Zimbabwe"
It turns out that some countries are divided into provinces, so there are repetitions in the `country` column, which we can eliminate with the `unique` function:
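For instance:

```julia
countries = unique(all_countries)   # country names without repetitions
```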
"Afghanistan"
"Albania"
"Algeria"
"Andorra"
"Angola"
"Antarctica"
"Antigua and Barbuda"
"Argentina"
"Armenia"
"Australia"
"Austria"
"Azerbaijan"
"Bahamas"
"Bahrain"
"Bangladesh"
"Barbados"
"Belarus"
"Belgium"
"Belize"
"Benin"
"Uruguay"
"Uzbekistan"
"Vanuatu"
"Venezuela"
"Vietnam"
"West Bank and Gaza"
"Winter Olympics 2022"
"Yemen"
"Zambia"
"Zimbabwe"
[Here we used string interpolation with `$` to put the text into a Markdown string.]
You can also use the `Select` widget (from PlutoUI) to get a dropdown instead:
How can we extract the data for a particular country? First we need to know the exact name of the country. E.g. is the US written as "USA", or "United States"?
We could scroll through to find out, or filter the data to only look at a sample of it, for example those countries that begin with the letter "U".
One way to do this is with an array comprehension:
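A sketch using `startswith` over the `all_countries` vector from above:

```julia
# true for each row whose country name begins with "U"
u_mask = [startswith(country, "U") for country in all_countries]
```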
(output: a 289-element vector of Bools, true only at the positions of countries whose names begin with "U")
Note that this returns an array of booleans of the same length as the vector `all_countries`. We can now use this to index into the `DataFrame`:
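For example, with the mask just computed:

```julia
covid_data[u_mask, :]   # keep only the rows whose country starts with "U"
```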
| Row | province | country | latitude | longitude | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | missing | "US" | 40.0 | -100.0 | 1 | 1 | 2 | 2 | |
| 2 | missing | "Uganda" | 1.37333 | 32.2903 | 0 | 0 | 0 | 0 | |
| 3 | missing | "Ukraine" | 48.3794 | 31.1656 | 0 | 0 | 0 | 0 | |
| 4 | missing | "United Arab Emirates" | 23.4241 | 53.8478 | 0 | 0 | 0 | 0 | |
| 5 | "Anguilla" | "United Kingdom" | 18.2206 | -63.0686 | 0 | 0 | 0 | 0 | |
| 6 | "Bermuda" | "United Kingdom" | 32.3078 | -64.7505 | 0 | 0 | 0 | 0 | |
| 7 | "British Virgin Islands" | "United Kingdom" | 18.4207 | -64.64 | 0 | 0 | 0 | 0 | |
| 8 | "Cayman Islands" | "United Kingdom" | 19.3133 | -81.2546 | 0 | 0 | 0 | 0 | |
| 9 | "Channel Islands" | "United Kingdom" | 49.3723 | -2.3644 | 0 | 0 | 0 | 0 | |
| 10 | "Falkland Islands (Malvinas)" | "United Kingdom" | -51.7963 | -59.5236 | 0 | 0 | 0 | 0 | |
| ⋮ | | | | | | | | | |
| 21 | missing | "Uzbekistan" | 41.3775 | 64.5853 | 0 | 0 | 0 | 0 | |
We see that the correct spelling is `"US"`. (And note how the different provinces of the UK appear as separate rows.)
Now we would like to extract the data for the US alone. How can we access the correct row of the table? We can again filter on the country name. A nicer way to do this is to use the `filter` function.

This is a higher-order function: its first argument is itself a function, which must return `true` or `false`. `filter` will return all the rows of the `DataFrame` that satisfy that predicate:
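A sketch matching the output below:

```julia
# All rows whose country is the United Kingdom
filter(row -> row.country == "United Kingdom", covid_data)
```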
| Row | province | country | latitude | longitude | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | "Anguilla" | "United Kingdom" | 18.2206 | -63.0686 | 0 | 0 | 0 | 0 | |
| 2 | "Bermuda" | "United Kingdom" | 32.3078 | -64.7505 | 0 | 0 | 0 | 0 | |
| 3 | "British Virgin Islands" | "United Kingdom" | 18.4207 | -64.64 | 0 | 0 | 0 | 0 | |
| 4 | "Cayman Islands" | "United Kingdom" | 19.3133 | -81.2546 | 0 | 0 | 0 | 0 | |
| 5 | "Channel Islands" | "United Kingdom" | 49.3723 | -2.3644 | 0 | 0 | 0 | 0 | |
| 6 | "Falkland Islands (Malvinas)" | "United Kingdom" | -51.7963 | -59.5236 | 0 | 0 | 0 | 0 | |
| 7 | "Gibraltar" | "United Kingdom" | 36.1408 | -5.3536 | 0 | 0 | 0 | 0 | |
| 8 | "Guernsey" | "United Kingdom" | 49.4482 | -2.58949 | 0 | 0 | 0 | 0 | |
| 9 | "Isle of Man" | "United Kingdom" | 54.2361 | -4.5481 | 0 | 0 | 0 | 0 | |
| 10 | "Jersey" | "United Kingdom" | 49.2138 | -2.1358 | 0 | 0 | 0 | 0 | |
| 11 | "Montserrat" | "United Kingdom" | 16.7425 | -62.1874 | 0 | 0 | 0 | 0 | |
| 12 | "Pitcairn Islands" | "United Kingdom" | -24.3768 | -128.324 | 0 | 0 | 0 | 0 | |
| 13 | "Saint Helena, Ascension and Tristan da Cunha" | "United Kingdom" | -7.9467 | -14.3559 | 0 | 0 | 0 | 0 | |
| 14 | "Turks and Caicos Islands" | "United Kingdom" | 21.694 | -71.7979 | 0 | 0 | 0 | 0 | |
| 15 | missing | "United Kingdom" | 55.3781 | -3.436 | 0 | 0 | 0 | 0 | |
Here we have used an anonymous function with the syntax `x -> ⋯`. This is a function which takes the argument `x` and returns whatever is on the right of the arrow (`->`).
To extract a single row we need the index of the row (i.e. which number row it is in the `DataFrame`). The `findfirst` function finds the first row that satisfies the given predicate:
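For example, using the `all_countries` vector from above:

```julia
US_row = findfirst(==("US"), all_countries)   # index of the first "US" row
```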
261
(output: row 261 of the `DataFrame`: province = missing, country = "US", latitude = 40.0, longitude = -100.0, followed by the daily cumulative counts 1, 1, 2, 2, 5, 5, 5, 6, 6, 8, … for every recorded date)
Now we can extract the data into a standard Julia `Vector`:
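A sketch; the case counts start in column 5, after the four metadata columns:

```julia
US_data = Vector(covid_data[US_row, 5:end])
```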
(output: a vector of cumulative US case counts, starting 1, 1, 2, 2, 5, 5, 5, 6, 6, 8, … and ending …, 103755771, 103802702)
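A minimal sketch of the plotting cell, assuming Plots.jl and the `US_data` vector defined above:

```julia
using Plots

# Cumulative confirmed cases in the US; with a single vector the
# x coordinates default to 1, 2, 3, …
scatter(US_data, label="US confirmed cases", legend=:topleft,
        xlabel="day number", ylabel="cumulative cases")
```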
Note that we are only passing a single vector to the `scatter` function, so the x coordinates default to the integers 1, 2, 3, …, i.e. the day number since the start of the data.

Also note that the y values are the cumulative case counts, i.e. the total number of confirmed cases recorded up to each day.
This is an example of a time series, i.e. a single quantity that changes over time.
Using dates
We would like to use actual dates instead of just the number of days since the start of the recorded data. The dates are given in the column names of the `DataFrame`:
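We can get at them with the `names` function (a sketch; `date_strings` is an assumed name):

```julia
column_names = names(covid_data)
date_strings = names(covid_data)[5:end]   # just the date columns
```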
"province"
"country"
"latitude"
"longitude"
"1/22/20"
"1/23/20"
"1/24/20"
"1/25/20"
"1/26/20"
"1/27/20"
"1/28/20"
"1/29/20"
"1/30/20"
"1/31/20"
"2/1/20"
"2/2/20"
"2/3/20"
"2/4/20"
"2/5/20"
"2/6/20"
"2/28/23"
"3/1/23"
"3/2/23"
"3/3/23"
"3/4/23"
"3/5/23"
"3/6/23"
"3/7/23"
"3/8/23"
"3/9/23"
"1/22/20"
"1/23/20"
"1/24/20"
"1/25/20"
"1/26/20"
"1/27/20"
"1/28/20"
"1/29/20"
"1/30/20"
"1/31/20"
"2/1/20"
"2/2/20"
"2/3/20"
"2/4/20"
"2/5/20"
"2/6/20"
"2/7/20"
"2/8/20"
"2/9/20"
"2/10/20"
"2/28/23"
"3/1/23"
"3/2/23"
"3/3/23"
"3/4/23"
"3/5/23"
"3/6/23"
"3/7/23"
"3/8/23"
"3/9/23"
Now we need to parse the date strings, i.e. convert from a string representation into an actual Julia type provided by the `Dates` standard library package:
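A sketch; the format string follows the month/day/two-digit-year layout of the column names:

```julia
using Dates

date_format = Dates.DateFormat("m/d/Y")
parse(Date, date_strings[1], date_format)   # gives 0020-01-22: the year is wrong
```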
"1/22/20"
dateformat"m/d/Y"
0020-01-22
Since the year was not correctly represented in the original data, we need to manually fix it:
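One way to do this, assuming the names above, is to parse all the strings and then add 2000 years:

```julia
dates = parse.(Date, date_strings, date_format) .+ Year(2000)
```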
(output: the corrected dates 2020-01-22, 2020-01-23, …, 2023-03-08, 2023-03-09)
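With the dates parsed we can plot the time series against real dates (a sketch):

```julia
plot(dates, US_data, label="US", legend=:topleft,
     xlabel="date", ylabel="cumulative cases")
```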
Exploratory data analysis
Working with cumulative data is often less intuitive. Let's look at the actual number of daily cases. Julia has a `diff` function to calculate the difference between successive entries of a vector:
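For example:

```julia
daily_cases = diff(US_data)   # new cases on each day
plot(dates[2:end], daily_cases, m=:circle, label="US daily cases")
```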
Note that discrete data should always be plotted with points. The lines are just to guide the eye.
Cumulating data corresponds to taking the integral of a function and is a smoothing operation. Note that the cumulative data is indeed visually smoother than the daily data.
The oscillations in the daily data seem to be due to a lower incidence of reporting at weekends. We could try to smooth this out by taking a moving average, say over the past week:
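A sketch of a trailing 7-day mean (the helper name and exact windowing are assumptions, chosen to match the output below):

```julia
using Statistics

# Entry i of the result is the mean of the 7 daily values ending at day i
running_mean(data, window=7) =
    [mean(data[i-window+1:i]) for i in window:length(data)]

running_daily = running_mean(daily_cases)
```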
(output: the 7-day moving average, starting 0.714286, 0.714286, 0.857143, … and ending …, 31699.9, 30420.7)
Exponential growth
Simple models of epidemic spread often predict a period with exponential growth. Do the data corroborate this?
A visual check for this is to plot the data with a logarithmic scale on the $y$ axis (but a linear scale on the $x$ axis).

The reason for this is that if we observe a straight line on such a semi-logarithmic plot, we have

$$\log(y) \approx \alpha \, x + \beta,$$

where we are using $x$ for the day number and $y$ for the number of cases. Hence, taking exponentials of both sides, we have

$$y \approx c \, e^{\alpha x}$$

for some constant $c = e^{\beta}$, i.e. exponential growth.
Since the data contains some zeros, we need to replace those with `NaN`s ("Not a Number"), which Plots.jl interprets as a signal to break the line:
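A sketch:

```julia
# Zeros cannot be drawn on a log scale; convert them to NaN so the line breaks
log_ready = replace(Float64.(US_data), 0.0 => NaN)
plot(log_ready, yscale=:log10, label="US",
     xlabel="day number", ylabel="cumulative cases")
```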
Let's zoom in on the part where the growth seems linear on this semi-log plot:
We see that there is a period lasting from around day 38 to around day 60 when the curve looks straight on the semi-log plot. This corresponds to the following date range:
38:60
(output: the dates 2020-02-28 through 2020-03-21)
i.e. the first 3 weeks of March. Fortunately the imposition of lockdown during the last 10 days of March (on different days in different US states) significantly reduced transmission.
We can fit a straight line to this portion of the data using linear regression.
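A least-squares sketch on the semi-log scale, using the window identified above (variable names are illustrative):

```julia
window = 38:60
xs = collect(window)
ys = log10.(US_data[window])

# Fit log10(y) ≈ α x + β by least squares using the backslash operator
A = [xs ones(length(xs))]
α, β = A \ ys

daily_growth_factor = 10^α   # multiplicative growth per day during this period
```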
Geographical data
Our data set contains more information: the geographical locations (latitude and longitude) of each country (or, rather, of a particular point that was chosen as being representative of that country).
| Row | province | country | latitude | longitude | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | missing | "Afghanistan" | 33.9391 | 67.71 | 0 | 0 | 0 | 0 | |
| 2 | missing | "Albania" | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | |
| 3 | missing | "Algeria" | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | |
| 4 | missing | "Andorra" | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | |
| 5 | missing | "Angola" | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | |
| 6 | missing | "Antarctica" | -71.9499 | 23.347 | 0 | 0 | 0 | 0 | |
| 7 | missing | "Antigua and Barbuda" | 17.0608 | -61.7964 | 0 | 0 | 0 | 0 | |
| 8 | missing | "Argentina" | -38.4161 | -63.6167 | 0 | 0 | 0 | 0 | |
| 9 | missing | "Armenia" | 40.0691 | 45.0382 | 0 | 0 | 0 | 0 | |
| 10 | "Australian Capital Territory" | "Australia" | -35.4735 | 149.012 | 0 | 0 | 0 | 0 | |
| ⋮ | | | | | | | | | |
| 19 | missing | "Azerbaijan" | 40.1431 | 47.5769 | 0 | 0 | 0 | 0 | |
Let's extract and plot the geographical information. To reduce the visual noise a bit we will only use a subset of the locations. If the `province` field is missing we should use the country name instead:
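A sketch using the renamed columns (the marker options are arbitrary):

```julia
# Use the province name when it is present, otherwise the country name
place_names = [ismissing(row.province) ? row.country : row.province
               for row in eachrow(covid_data)]

scatter(covid_data.longitude, covid_data.latitude,
        ms=2, alpha=0.5, label="reported locations")
```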
(output: the latitude column, 33.9391, 41.1533, 28.0339, …, -13.1339, -19.0154)
Adding maps
We would also like to see the outlines of each country. For this we can use, for example, the data from Natural Earth, which comes in the form of shape files, giving the outlines in terms of latitude and longitude coordinates.
These may be read in using the Shapefile.jl package.
```julia
# Download the Natural Earth country outlines (a zip archive of shapefiles);
# `download` returns the path of a temporary file such as "/tmp/jl_q0uI3h"
zipfile = download("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/ne_110m_admin_0_countries.zip")
```
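The archive then needs to be unpacked so that the `.shp` and `.dbf` files are available locally; a sketch using ZipFile.jl:

```julia
using ZipFile

# Extract every file from the downloaded archive into the current directory
reader = ZipFile.Reader(zipfile)
for file in reader.files
    write(file.name, read(file))
end
close(reader)
```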
```julia
# Read the shapefile and extract the country outline geometries
shp_countries = Shapefile.shapes(Shapefile.Table("./ne_110m_admin_0_countries.shp"));
```
Now we would like to combine the geographical and temporal (time) aspects. One way to do so is to animate time:
(intermediate outputs for the animation: 1, 1.2304489213782739, 2020-01-22, "Day 1", and a vector of log-scaled values, mostly 0.0 with a single 0.30103)
(the notebook defines two helper functions, `set_points` and `make_features`)
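A minimal sketch of such an animation; the notebook's `set_points` and `make_features` helpers presumably do something more refined, and sizing the markers by the logarithm of the case count is an assumption:

```julia
# One frame per week; marker size grows with the log of the case count
anim = @animate for day in 1:7:length(dates)
    cases = covid_data[:, day + 4]             # the date columns start at 5
    sizes = 2 .* log10.(max.(cases, 1))        # 0 cases gives size 0
    scatter(covid_data.longitude, covid_data.latitude,
            ms=sizes, alpha=0.5, label=false,
            title="Day $day: $(dates[day])")
end
gif(anim, "covid_spread.gif", fps = 10)
```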
However, we should always be wary about visualisations such as these. Perhaps we should be plotting cases per capita instead of absolute numbers of cases. Or should we divide by the area of the country? Some countries, such as China and Canada, are divided into states or regions in the original data set – but others, such as the US, are not. You should always check exactly what is being plotted!
Unfortunately, published visualisations often hide some of this information. This emphasises the need to be able to get our hands on the data, create our own visualisations and draw our own conclusions.