Module 2: Epidemic propagation


We are starting a new module on modelling epidemic propagation.

Let's start off by analysing some of the data that is now available on the current COVID-19 pandemic.


Exploring COVID-19 data


In this notebook we will explore and analyse data on the COVID-19 pandemic. The aim is to use Julia's tools to analyse and visualise the data in different ways.

Here is an example of the kind of visualisation we will be able to produce:


Download and load data

url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
download(url, "covid_data.csv")

We will need a couple of new packages. The data is in CSV format, i.e. Comma-Separated Values. This is a common plain-text data format in which each observation, i.e. data point, is on its own line, and the values within each line are separated by commas (or, in some variants, by other delimiters such as tabs or semicolons).
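To make the format concrete, here is a minimal Base-Julia sketch (using hypothetical toy data, not the COVID file) of how comma-separated lines decompose into observations and values:

```julia
# A tiny CSV file held in memory (hypothetical example data):
csv_text = """
country,lat,cases
Albania,41.15,0
Andorra,42.51,2
"""

lines = split(strip(csv_text), '\n')   # one observation per line
header = split(lines[1], ',')          # column names: country, lat, cases
fields = split(lines[2], ',')          # values of the first observation
```

The CSV.jl package used below does the same job far more robustly, handling quoting, missing values and type inference.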

begin
using Pkg
Pkg.add.(["CSV", "DataFrames", "PlutoUI", "Shapefile", "ZipFile", "JSON"])

using CSV
using DataFrames
using PlutoUI
using Shapefile
using ZipFile
using JSON
end

We can load the data from the CSV file using the File function from the CSV.jl package, and then convert it to a DataFrame:

289×1147 DataFrame (first columns shown; remaining date columns omitted)

Row  Province/State                Country/Region       Lat       Long      1/22/20  1/23/20  1/24/20  1/25/20
     String?                       String               Float64?  Float64?  Int64    Int64    Int64    Int64
1    missing                       Afghanistan           33.9391   67.71    0        0        0        0
2    missing                       Albania               41.1533   20.1683  0        0        0        0
3    missing                       Algeria               28.0339    1.6596  0        0        0        0
4    missing                       Andorra               42.5063    1.5218  0        0        0        0
5    missing                       Angola               -11.2027   17.8739  0        0        0        0
6    missing                       Antarctica           -71.9499   23.347   0        0        0        0
7    missing                       Antigua and Barbuda   17.0608  -61.7964  0        0        0        0
8    missing                       Argentina            -38.4161  -63.6167  0        0        0        0
9    missing                       Armenia               40.0691   45.0382  0        0        0        0
10   Australian Capital Territory  Australia            -35.4735  149.012   0        0        0        0
⋮
289  missing                       Zimbabwe             -19.0154   29.1549  0        0        0        0
begin
csv_data = CSV.File("covid_data.csv");
data = DataFrame(csv_data) # it is common to use `df` as a variable name
end

A DataFrame is a standard way of storing heterogeneous data in Julia, i.e. a table consisting of columns with different types. As you can see from the display of the DataFrame object above, each column has an associated type, but different columns have different types, reflecting the type of the data in that column.

In our case, country names are stored as Strings, their latitude and longitude as Float64s, and the (cumulative) case counts for each day as Int64s.


Using the data


Since we need to manipulate the columns, let's rename them to something shorter. We can do this either in place, i.e. modifying the original DataFrame, or out of place, creating a new DataFrame. The convention in Julia is that functions that modify their argument have a name ending with ! (often pronounced "bang").

We can use the first function to see only the first few rows of the data.
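As a quick illustration of the ! convention with a pair of built-in functions: sort returns a new sorted copy, while sort! sorts its argument in place:

```julia
v = [3, 1, 2]

w = sort(v)   # out of place: w is a new sorted vector, v is unchanged
sort!(v)      # in place: v itself is now sorted
```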

begin
data_2 = rename(data, 1 => "province", 2 => "country", 3 => "latitude", 4 => "longitude")
first(data_2, 5)
end
begin
rename!(data, 1 => "province", 2 => "country", 3 => "latitude", 4 => "longitude")
first(data, 5)
end

Extracting useful information


How can we extract the list of all the countries? The country names are in the second column.

For some purposes we can think of a DataFrame as a matrix and use similar syntax. For example, we can extract the second column:

all_countries = data[:, "country"]

It turns out that some countries are divided into provinces, so there are repetitions in the country column that we can eliminate with the unique function:

countries = unique(all_countries)
@bind i Slider(1:length(countries), show_value=true)
Afghanistan

[Here we used string interpolation with $ to put the text into a Markdown string.]


You can also use Select to get a dropdown instead:

@bind country Select(countries)

How can we extract the data for a particular country? First we need to know the exact name of the country. E.g. is the US written as "USA", or "United States"?

We could scroll through to find out, or filter the data to only look at a sample of it, for example those countries that begin with the letter "U".

One way to do this is with an array comprehension:


Array comprehension:

U_countries = [startswith(country, "U") for country in all_countries]
289
length(U_countries)

Note that this returns an array of booleans of the same length as the vector all_countries. We can now use this to index into the DataFrame:

21×1147 DataFrame (first columns shown; remaining date columns omitted)

Row  province                     country               latitude  longitude  1/22/20  1/23/20  1/24/20  1/25/20
     String?                      String                Float64?  Float64?   Int64    Int64    Int64    Int64
1    missing                      US                     40.0     -100.0     1        1        2        2
2    missing                      Uganda                  1.37333   32.2903  0        0        0        0
3    missing                      Ukraine                48.3794    31.1656  0        0        0        0
4    missing                      United Arab Emirates   23.4241    53.8478  0        0        0        0
5    Anguilla                     United Kingdom         18.2206   -63.0686  0        0        0        0
6    Bermuda                      United Kingdom         32.3078   -64.7505  0        0        0        0
7    British Virgin Islands       United Kingdom         18.4207   -64.64    0        0        0        0
8    Cayman Islands               United Kingdom         19.3133   -81.2546  0        0        0        0
9    Channel Islands              United Kingdom         49.3723    -2.3644  0        0        0        0
10   Falkland Islands (Malvinas)  United Kingdom        -51.7963   -59.5236  0        0        0        0
⋮
21   missing                      Uzbekistan             41.3775    64.5853  0        0        0        0
data[U_countries, :]

We see that the correct spelling is "US". (And note how the different provinces of the UK are separated.)


Now we would like to extract the data for the US alone. How can we access the correct row of the table? We can again filter on the country name. A nicer way to do this is to use the filter function.

This is a higher-order function: its first argument is itself a function, which must return true or false. filter will return all the rows of the DataFrame that satisfy that predicate:

15×1147 DataFrame (first columns shown; remaining date columns omitted)

Row  province                                      country         latitude  longitude  1/22/20  1/23/20  1/24/20  1/25/20
     String?                                       String          Float64?  Float64?   Int64    Int64    Int64    Int64
1    Anguilla                                      United Kingdom   18.2206   -63.0686  0        0        0        0
2    Bermuda                                       United Kingdom   32.3078   -64.7505  0        0        0        0
3    British Virgin Islands                        United Kingdom   18.4207   -64.64    0        0        0        0
4    Cayman Islands                                United Kingdom   19.3133   -81.2546  0        0        0        0
5    Channel Islands                               United Kingdom   49.3723    -2.3644  0        0        0        0
6    Falkland Islands (Malvinas)                   United Kingdom  -51.7963   -59.5236  0        0        0        0
7    Gibraltar                                     United Kingdom   36.1408    -5.3536  0        0        0        0
8    Guernsey                                      United Kingdom   49.4482    -2.58949 0        0        0        0
9    Isle of Man                                   United Kingdom   54.2361    -4.5481  0        0        0        0
10   Jersey                                        United Kingdom   49.2138    -2.1358  0        0        0        0
11   Montserrat                                    United Kingdom   16.7425   -62.1874  0        0        0        0
12   Pitcairn Islands                              United Kingdom  -24.3768  -128.324   0        0        0        0
13   Saint Helena, Ascension and Tristan da Cunha  United Kingdom   -7.9467   -14.3559  0        0        0        0
14   Turks and Caicos Islands                      United Kingdom   21.694    -71.7979  0        0        0        0
15   missing                                       United Kingdom   55.3781    -3.436   0        0        0        0
filter(x -> x.country == "United Kingdom", data)

Here we have used an anonymous function with the syntax x -> ⋯. This is a function which takes the argument x and returns whatever is on the right of the arrow (->).
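A couple of toy examples of the syntax:

```julia
square = x -> x^2   # an anonymous function, here bound to a name
square(4)           # evaluates to 16

# Anonymous functions are most often passed directly to higher-order functions:
filter(x -> x > 2, [1, 2, 3, 4])   # keeps only the elements greater than 2
```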


To extract a single row we need the index of the row (i.e. which number row it is in the DataFrame). The findfirst function finds the first row that satisfies the given predicate:

261
US_row = findfirst(==("US"), all_countries)
DataFrameRow (1147 columns; most omitted)

Row  province  country  latitude  longitude  1/22/20  1/23/20  1/24/20  1/25/20
261  missing   US       40.0      -100.0     1        1        2        2

(The remaining columns hold the daily cumulative US counts, rising from 1 on 1/22/20 to 975537 on 4/26/20.)
data[US_row, :]

Now we can extract the data into a standard Julia Vector:

US_data = Vector(data[US_row, 5:end])
using Plots
scatter(US_data, m=:o, alpha=0.5, ms=3, xlabel="day", ylabel="cumulative cases", leg=false)

Note that we are only passing a single vector to the scatter function, so the x coordinates are taken as the natural numbers 1, 2, etc.

Also note that the y-axis in this plot gives the cumulative case numbers, i.e. the total number of confirmed cases since the start of the epidemic up to the given date.


This is an example of a time series, i.e. a single quantity that changes over time.


Using dates


We would like to use actual dates instead of just the number of days since the start of the recorded data. The dates are given in the column names of the DataFrame:

column_names = names(data)
date_strings = String.(names(data)[5:end]) # apply String function to each element

Now we need to parse the date strings, i.e. convert from a string representation into an actual Julia type provided by the Dates.jl standard library package:

using Dates
"1/22/20"
date_strings[1]
date_format = Dates.DateFormat("m/d/Y")
0020-01-22
parse(Date, date_strings[1], date_format)

Since the year was not correctly represented in the original data, we need to manually fix it:

dates = parse.(Date, date_strings, date_format) .+ Year(2000)
begin
plot(dates, US_data, xrotation=45, leg=:topleft,
label="US data", m=:o, ms=3, alpha=0.5)
xlabel!("date")
ylabel!("cumulative US cases")
title!("US cumulative confirmed COVID-19 cases")
end

Exploratory data analysis


Working with cumulative data is often less intuitive. Let's look at the actual number of daily cases. Julia has a diff function to calculate the difference between successive entries of a vector:
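For a small vector, diff behaves as follows (toy numbers, not the real data):

```julia
cumulative = [0, 2, 5, 5, 9]   # hypothetical cumulative counts
daily = diff(cumulative)       # successive differences; one element shorter than the input
```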

begin
daily_cases = diff(US_data)
plot(dates[2:end], daily_cases, m=:o, leg=false, xlabel="days", ylabel="daily US cases", alpha=0.5) # use "o"-shaped markers
end

Note that discrete data should always be plotted with points. The lines are just to guide the eye.

Cumulating data corresponds to taking the integral of a function and is a smoothing operation. Note that the cumulative data is indeed visually smoother than the daily data.

The oscillations in the daily data seem to be due to a lower incidence of reporting at weekends. We could try to smooth this out by taking a moving average, say over the past week:

begin
using Statistics
running_mean = [mean(daily_cases[i-6:i]) for i in 7:length(daily_cases)]
end
begin
plot(daily_cases, label="raw daily cases")
plot!(running_mean, m=:o, label="running weekly mean")
end

Exponential growth

Simple models of epidemic spread often predict a period with exponential growth. Do the data corroborate this?


A visual check for this is to plot the data with a logarithmic scale on the y axis (but a standard scale on the x axis).

The reason for this is that if we observe a straight line on such a semi-logarithmic plot, then we have

$$\log(y) \approx a x + b,$$

where we are using $\approx$ to denote approximate equality.

Hence, taking exponentials of both sides, we have

$$y \approx \exp(a x + b) = c \, e^{a x},$$

for some constant $c = e^b$.
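We can check this reasoning numerically on synthetic exponential data (not the COVID series): the successive differences of log(y) should all equal the growth rate a.

```julia
a, c = 0.3, 5.0
y = [c * exp(a * x) for x in 1:10]   # exactly exponential synthetic data

diffs = diff(log.(y))                # every entry is (approximately) a = 0.3
```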


Since the data contains some zeros, we need to replace those with NaNs ("Not a Number"), which Plots.jl interprets as a signal to break the line:

begin
plot(replace(daily_cases, 0 => NaN),
yscale=:log10,
leg=false, m=:o)
xlabel!("day")
ylabel!("confirmed cases in US")
title!("US confirmed COVID-19 cases")
end

Let's zoom in on the part where the growth seems linear on this semi-log plot:

begin
plot(replace(daily_cases, 0 => NaN),
yscale=:log10,
leg=false, m=:o,
xlims=(1, 100))
xlabel!("day")
ylabel!("confirmed cases in US")
title!("US confirmed COVID-19 cases")
end

We see that there is a period lasting from around day 38 to around day 60 when the curve looks straight on the semi-log plot. This corresponds to the following date range:

exp_period = 38:60
dates[exp_period]

i.e. the first 3 weeks of March. Fortunately the imposition of lockdown during the last 10 days of March (on different days in different US states) significantly reduced transmission.


We can fit a straight line using linear regression to this portion of the data.
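A minimal sketch of such a fit, using Julia's backslash (least-squares) operator on a design matrix with an intercept column. The ys below are a synthetic, noise-free stand-in; in the notebook you would use log.(daily_cases[exp_period]) instead:

```julia
xs = collect(38:60)        # the candidate exponential period, in days
ys = 0.25 .* xs .- 2.0     # synthetic stand-in for log.(daily_cases[exp_period])

A = [ones(length(xs)) xs]  # design matrix: a column of 1s (intercept) and a column of xs
b, a = A \ ys              # least-squares solution: intercept b, slope a
```

The slope a then estimates the exponential growth rate, and exp(b) the prefactor c.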


Geographical data


Our data set contains more information: the geographical locations (latitude and longitude) of each country (or, rather, of a particular point that was chosen as being representative of that country).

19×1147 DataFrame (first columns shown; remaining date columns omitted)

Row  province                      country              latitude  longitude  1/22/20  1/23/20  1/24/20  1/25/20
     String?                       String               Float64?  Float64?   Int64    Int64    Int64    Int64
1    missing                       Afghanistan           33.9391   67.71     0        0        0        0
2    missing                       Albania               41.1533   20.1683   0        0        0        0
3    missing                       Algeria               28.0339    1.6596   0        0        0        0
4    missing                       Andorra               42.5063    1.5218   0        0        0        0
5    missing                       Angola               -11.2027   17.8739   0        0        0        0
6    missing                       Antarctica           -71.9499   23.347    0        0        0        0
7    missing                       Antigua and Barbuda   17.0608  -61.7964   0        0        0        0
8    missing                       Argentina            -38.4161  -63.6167   0        0        0        0
9    missing                       Armenia               40.0691   45.0382   0        0        0        0
10   Australian Capital Territory  Australia            -35.4735  149.012    0        0        0        0
⋮
19   missing                       Azerbaijan            40.1431   47.5769   0        0        0        0
filter(x -> startswith(x.country, "A"), data)

Let's extract and plot the geographical information. To reduce the visual noise a bit, we will label each point with its province name, falling back to the country name when the province is missing.

province = data.province;

If the province is missing we should use the country name instead:
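The cell below does this with boolean indexing; an equivalent one-liner uses coalesce, which returns its first non-missing argument:

```julia
# Toy vectors mirroring the structure of the real columns:
province = [missing, "Hubei", missing]
country  = ["France", "China", "Uganda"]

labels = coalesce.(province, country)   # missing entries replaced by the country name
```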

begin
indices = ismissing.(province)
province[indices] .= all_countries[indices]
end;
begin
scatter(data.longitude, data.latitude, leg=false, alpha=0.5, ms=2)

for i in 1:length(province)
annotate!(data.longitude[i], data.latitude[i], text(province[i], :center, 5, color=RGBA{Float64}(0.0,0.0,0.0,0.3)))
end
plot!(axis=false)
end
data.latitude

Adding maps


We would also like to see the outlines of each country. For this we can use, for example, the data from Natural Earth, which comes in the form of shape files, giving the outlines in terms of latitude and longitude coordinates.

These may be read in using the Shapefile.jl package.

begin
zipfile = download("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/ne_110m_admin_0_countries.zip")

r = ZipFile.Reader(zipfile);
for f in r.files
println("Filename: $(f.name)")
open(f.name, "w") do io
write(io, read(f))
end
end
close(r)
end
download("https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries")
shp_countries = Shapefile.shapes(Shapefile.Table("./ne_110m_admin_0_countries.shp"));

# plot!(shp_countries, alpha=0.2)

Now we would like to combine the geographical and temporal (time) aspects. One way to do so is to animate time:

daily = max.(1, diff(Array(data[:, 5:end]), dims=2));
@bind day_ticks Clock(0.5)
1
day = min(day_ticks, size(daily, 2))
1.2304489213782739
log10(maximum(daily[:, day]))
2020-01-22
dates[day]
#=
begin
plot(shp_countries, alpha=0.2)
scatter!(data.longitude, data.latitude, leg=false, ms=2*log10.(daily[:, day]), alpha=0.7)
xlabel!("latitude")
ylabel!("longitude")
title!("daily cases per country")
end
=#
Resource("https://api.mapbox.com/mapbox-gl-js/v1.12.0/mapbox-gl.css")
Resource("https://api.mapbox.com/mapbox-gl-js/v1.12.0/mapbox-gl.js")

Day 1

html"""
<div id="map" style="height: 500px"></div>
<script>
mapboxgl.accessToken = 'pk.eyJ1Ijoic2hhc2hpNTMiLCJhIjoiY2ppMG5vZmpuMWEyNjNwb2I5dWhveTkyZCJ9.dQ67jXuhU3DGz7QFR35alw';
var map = new mapboxgl.Map({
container: 'map',
style: 'mapbox://styles/mapbox/light-v10',
zoom: 1,
center: [0, 0]
});

var elem = document.getElementById("map");

elem.mapbox = map;

map.on('load', function () {

// Add a GeoJSON source with 2 points
map.addSource('points', {
'type': 'geojson',
'data': {
'type': 'FeatureCollection',
'features': [
{
'type': 'Feature',
'geometry': {
'type': 'Point',
'coordinates': [
-77.03238901390978,
38.913188059745586
]
}
},
{
'type': 'Feature',
'geometry': {
'type': 'Point',
'coordinates': [-122.414, 37.776]
}
}
]
}
});
map.addLayer({
'id': 'points',
'type': 'circle',
'source': 'points',
'paint': {
'circle-color': {
"property": "size",
"stops": [
[0, "#fff5f0"],
[1, "#fee0d2"],
[2, "#fcbba1"],
[3, "#fc9272"],
[4, "#fb6a4a"],
[7, "#ef3b2c"],
[8, "#cb181d"],
[9, "#a50f15"],
[10, "#67000d"]
]
},
// make circles larger as the user zooms from z12 to z22
'circle-radius': [
'interpolate',
['linear'], ["zoom"],
0, ['get', 'size']
]
}
});
});
</script>
"""
log10.(daily[:, day])
set_points (generic function with 1 method)
function set_points(data, fotoday)
jsondata = sprint(io->JSON.print(io, make_features(data, fotoday)));
HTML("""
<script>
var elem = document.getElementById("map");
elem.mapbox.getSource('points').setData($jsondata)
</script>
""")
end
make_features (generic function with 1 method)

However, we should always be wary about visualisations such as these. Perhaps we should be plotting cases per capita instead of absolute numbers of cases. Or should we divide by the area of the country? Some countries, such as China and Canada, are divided into states or regions in the original data set – but others, such as the US, are not. You should always check exactly what is being plotted!

Unfortunately, published visualisations often hide some of this information. This emphasises the need to be able to get our hands on the data, create our own visualisations and draw our own conclusions.
