Module 2: Epidemic propagation
We are starting a new module on modelling epidemic propagation.
Let's start off by analysing some of the data that is now available on the current COVID-19 pandemic.
Exploring COVID-19 data
In this notebook we will explore and analyse data on the COVID-19 pandemic. The aim is to use Julia's tools to analyse and visualise the data in different ways.
Here is an example of the kind of visualisation we will be able to produce:
Download and load data
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
"covid_data.csv"
We will need a couple of new packages. The data is in CSV (Comma-Separated Values) format, a common data format in which each observation, i.e. each data point, is on its own line, and the values within each line are separated by commas (or other delimiters, such as tabs or spaces).
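A sketch of this setup step, assuming the file name shown above (the exact cells in the notebook may differ):

```julia
# Install the packages used below (only needed once per environment)
using Pkg
Pkg.add(["CSV", "DataFrames", "PlutoUI", "Plots", "Shapefile", "ZipFile"])

# Download the raw CSV file from the Johns Hopkins CSSE repository
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
download(url, "covid_data.csv")
```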
We can load the data from the CSV file using the `File` function from the CSV.jl package, and then convert it to a `DataFrame`:
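For example, naming the resulting table `covid_data` (the variable name is an assumption, used consistently in the sketches below):

```julia
using CSV, DataFrames

# Read the CSV file and convert it to a DataFrame
covid_data = DataFrame(CSV.File("covid_data.csv"))
```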
| Row | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | missing | "Afghanistan" | 33.9391 | 67.71 | 0 | 0 | 0 | 0 | |
| 2 | missing | "Albania" | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | |
| 3 | missing | "Algeria" | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | |
| 4 | missing | "Andorra" | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | |
| 5 | missing | "Angola" | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | |
| 6 | missing | "Antarctica" | -71.9499 | 23.347 | 0 | 0 | 0 | 0 | |
| 7 | missing | "Antigua and Barbuda" | 17.0608 | -61.7964 | 0 | 0 | 0 | 0 | |
| 8 | missing | "Argentina" | -38.4161 | -63.6167 | 0 | 0 | 0 | 0 | |
| 9 | missing | "Armenia" | 40.0691 | 45.0382 | 0 | 0 | 0 | 0 | |
| 10 | "Australian Capital Territory" | "Australia" | -35.4735 | 149.012 | 0 | 0 | 0 | 0 | |
| ⋮ | | | | | | | | | |
| 289 | missing | "Zimbabwe" | -19.0154 | 29.1549 | 0 | 0 | 0 | 0 | |
A `DataFrame` is a standard way of storing heterogeneous data in Julia, i.e. a table consisting of columns with different types. As you can see from the display of the `DataFrame` object above, each column has an associated type, and different columns can have different types, reflecting the kind of data in that column.

In our case, country names are stored as `String`s, their latitude and longitude as `Float64`s, and the (cumulative) case counts for each day as `Int64`s.
Using the data
Since we need to manipulate the columns, let's rename them to something shorter. We can do this either in place, i.e. modifying the original `DataFrame`, or out of place, creating a new `DataFrame`. The convention in Julia is that functions which modify their argument have a name ending with `!` (often pronounced "bang").
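A sketch of the renaming step; the new names match the tables shown further below, and renaming in place is a choice, not necessarily what the notebook does:

```julia
# Rename the first four columns in place (note the trailing `!`)
rename!(covid_data, 1 => "province", 2 => "country",
                    3 => "latitude", 4 => "longitude")
```

The out-of-place version is `rename`, which returns a new `DataFrame` and leaves the original untouched.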
We can use the `first` function to see only the first few rows of the data (older versions of DataFrames.jl provided a `head` function, which is no longer defined).
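For instance, to look at the first five rows:

```julia
first(covid_data, 5)
```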
Extracting useful information
How can we extract the list of all the countries? The country names are in the second column.
For some purposes we can think of a `DataFrame` as a matrix and use similar syntax. For example, we can extract the second column:
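For example (calling the result `all_countries`, the name used later in this notebook):

```julia
# Extract the second column, i.e. the country names, as a vector
all_countries = covid_data[:, 2]
# equivalently, by name: covid_data[:, "country"] or covid_data.country
```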
"Afghanistan"
"Albania"
"Algeria"
"Andorra"
"Angola"
"Antarctica"
"Antigua and Barbuda"
"Argentina"
"Armenia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Australia"
"Austria"
"Azerbaijan"
"Bahamas"
"Uruguay"
"Uzbekistan"
"Vanuatu"
"Venezuela"
"Vietnam"
"West Bank and Gaza"
"Winter Olympics 2022"
"Yemen"
"Zambia"
"Zimbabwe"
It turns out that some countries are divided into provinces, so there are repetitions in the `country` column, which we can eliminate with the `unique` function:
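For instance:

```julia
countries = unique(all_countries)   # country names without repetitions
```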
"Afghanistan"
"Albania"
"Algeria"
"Andorra"
"Angola"
"Antarctica"
"Antigua and Barbuda"
"Argentina"
"Armenia"
"Australia"
"Austria"
"Azerbaijan"
"Bahamas"
"Bahrain"
"Bangladesh"
"Barbados"
"Belarus"
"Belgium"
"Belize"
"Benin"
"Uruguay"
"Uzbekistan"
"Vanuatu"
"Venezuela"
"Vietnam"
"West Bank and Gaza"
"Winter Olympics 2022"
"Yemen"
"Zambia"
"Zimbabwe"
[Here we used string interpolation with `$` to put the text into a Markdown string.]
You can also use the `Select` widget (from PlutoUI) to get a dropdown instead:
How can we extract the data for a particular country? First we need to know the exact name of the country. E.g. is the US written as "USA", or "United States"?
We could scroll through to find out, or filter the data to only look at a sample of it, for example those countries that begin with the letter "U".
One way to do this is with an array comprehension:
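A sketch using `startswith` over the `all_countries` vector from above:

```julia
# true for each row whose country name begins with "U"
u_mask = [startswith(country, "U") for country in all_countries]
```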
(output: a 289-element vector of Bools, true only at the positions of countries whose names begin with "U")
Note that this returns an array of booleans of the same length as the vector `all_countries`. We can now use this to index into the `DataFrame`:
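For example, with the mask just computed:

```julia
covid_data[u_mask, :]   # keep only the rows whose country starts with "U"
```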
| Row | province | country | latitude | longitude | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | missing | "US" | 40.0 | -100.0 | 1 | 1 | 2 | 2 | |
| 2 | missing | "Uganda" | 1.37333 | 32.2903 | 0 | 0 | 0 | 0 | |
| 3 | missing | "Ukraine" | 48.3794 | 31.1656 | 0 | 0 | 0 | 0 | |
| 4 | missing | "United Arab Emirates" | 23.4241 | 53.8478 | 0 | 0 | 0 | 0 | |
| 5 | "Anguilla" | "United Kingdom" | 18.2206 | -63.0686 | 0 | 0 | 0 | 0 | |
| 6 | "Bermuda" | "United Kingdom" | 32.3078 | -64.7505 | 0 | 0 | 0 | 0 | |
| 7 | "British Virgin Islands" | "United Kingdom" | 18.4207 | -64.64 | 0 | 0 | 0 | 0 | |
| 8 | "Cayman Islands" | "United Kingdom" | 19.3133 | -81.2546 | 0 | 0 | 0 | 0 | |
| 9 | "Channel Islands" | "United Kingdom" | 49.3723 | -2.3644 | 0 | 0 | 0 | 0 | |
| 10 | "Falkland Islands (Malvinas)" | "United Kingdom" | -51.7963 | -59.5236 | 0 | 0 | 0 | 0 | |
| ⋮ | | | | | | | | | |
| 21 | missing | "Uzbekistan" | 41.3775 | 64.5853 | 0 | 0 | 0 | 0 | |
We see that the correct spelling is `"US"`. (And note how the different provinces of the UK appear as separate rows.)
Now we would like to extract the data for the US alone. How can we access the correct row of the table? We can again filter on the country name. A nicer way to do this is to use the `filter` function.

This is a higher-order function: its first argument is itself a function, which must return `true` or `false`. `filter` will return all the rows of the `DataFrame` that satisfy that predicate:
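A sketch matching the output below:

```julia
# All rows whose country is the United Kingdom
filter(row -> row.country == "United Kingdom", covid_data)
```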
| Row | province | country | latitude | longitude | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | "Anguilla" | "United Kingdom" | 18.2206 | -63.0686 | 0 | 0 | 0 | 0 | |
| 2 | "Bermuda" | "United Kingdom" | 32.3078 | -64.7505 | 0 | 0 | 0 | 0 | |
| 3 | "British Virgin Islands" | "United Kingdom" | 18.4207 | -64.64 | 0 | 0 | 0 | 0 | |
| 4 | "Cayman Islands" | "United Kingdom" | 19.3133 | -81.2546 | 0 | 0 | 0 | 0 | |
| 5 | "Channel Islands" | "United Kingdom" | 49.3723 | -2.3644 | 0 | 0 | 0 | 0 | |
| 6 | "Falkland Islands (Malvinas)" | "United Kingdom" | -51.7963 | -59.5236 | 0 | 0 | 0 | 0 | |
| 7 | "Gibraltar" | "United Kingdom" | 36.1408 | -5.3536 | 0 | 0 | 0 | 0 | |
| 8 | "Guernsey" | "United Kingdom" | 49.4482 | -2.58949 | 0 | 0 | 0 | 0 | |
| 9 | "Isle of Man" | "United Kingdom" | 54.2361 | -4.5481 | 0 | 0 | 0 | 0 | |
| 10 | "Jersey" | "United Kingdom" | 49.2138 | -2.1358 | 0 | 0 | 0 | 0 | |
| 11 | "Montserrat" | "United Kingdom" | 16.7425 | -62.1874 | 0 | 0 | 0 | 0 | |
| 12 | "Pitcairn Islands" | "United Kingdom" | -24.3768 | -128.324 | 0 | 0 | 0 | 0 | |
| 13 | "Saint Helena, Ascension and Tristan da Cunha" | "United Kingdom" | -7.9467 | -14.3559 | 0 | 0 | 0 | 0 | |
| 14 | "Turks and Caicos Islands" | "United Kingdom" | 21.694 | -71.7979 | 0 | 0 | 0 | 0 | |
| 15 | missing | "United Kingdom" | 55.3781 | -3.436 | 0 | 0 | 0 | 0 | |
Here we have used an anonymous function with the syntax `x -> ⋯`. This is a function which takes the argument `x` and returns whatever is on the right of the arrow (`->`).
To extract a single row we need the index of the row (i.e. which number row it is in the `DataFrame`). The `findfirst` function finds the first row that satisfies the given predicate:
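For example, using the `all_countries` vector from above:

```julia
US_row = findfirst(==("US"), all_countries)   # index of the first "US" row
```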
261
(output: row 261 of the `DataFrame`: province = missing, country = "US", latitude = 40.0, longitude = -100.0, followed by the daily cumulative counts 1, 1, 2, 2, 5, 5, 5, 6, 6, 8, … for every recorded date)
Now we can extract the data into a standard Julia `Vector`:
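A sketch; the case counts start in column 5, after the four metadata columns:

```julia
US_data = Vector(covid_data[US_row, 5:end])
```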
(output: a vector of cumulative US case counts, starting 1, 1, 2, 2, 5, 5, 5, 6, 6, 8, … and ending …, 103755771, 103802702)
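A minimal sketch of the plotting cell, assuming Plots.jl and the `US_data` vector defined above:

```julia
using Plots

# Cumulative confirmed cases in the US; with a single vector the
# x coordinates default to 1, 2, 3, …
scatter(US_data, label="US confirmed cases", legend=:topleft,
        xlabel="day number", ylabel="cumulative cases")
```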
Note that we are only passing a single vector to the `scatter` function, so the x coordinates default to the integers 1, 2, 3, …, i.e. the day number since the start of the data.

Also note that the y values are the cumulative case counts, i.e. the total number of confirmed cases recorded up to each day.
This is an example of a time series, i.e. a single quantity that changes over time.
Using dates
We would like to use actual dates instead of just the number of days since the start of the recorded data. The dates are given in the column names of the `DataFrame`:
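We can get at them with the `names` function (a sketch; `date_strings` is an assumed name):

```julia
column_names = names(covid_data)
date_strings = names(covid_data)[5:end]   # just the date columns
```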
"province"
"country"
"latitude"
"longitude"
"1/22/20"
"1/23/20"
"1/24/20"
"1/25/20"
"1/26/20"
"1/27/20"
"1/28/20"
"1/29/20"
"1/30/20"
"1/31/20"
"2/1/20"
"2/2/20"
"2/3/20"
"2/4/20"
"2/5/20"
"2/6/20"
"2/28/23"
"3/1/23"
"3/2/23"
"3/3/23"
"3/4/23"
"3/5/23"
"3/6/23"
"3/7/23"
"3/8/23"
"3/9/23"
"1/22/20"
"1/23/20"
"1/24/20"
"1/25/20"
"1/26/20"
"1/27/20"
"1/28/20"
"1/29/20"
"1/30/20"
"1/31/20"
"2/1/20"
"2/2/20"
"2/3/20"
"2/4/20"
"2/5/20"
"2/6/20"
"2/7/20"
"2/8/20"
"2/9/20"
"2/10/20"
"2/28/23"
"3/1/23"
"3/2/23"
"3/3/23"
"3/4/23"
"3/5/23"
"3/6/23"
"3/7/23"
"3/8/23"
"3/9/23"
Now we need to parse the date strings, i.e. convert from a string representation into an actual Julia type provided by the `Dates` standard library package:
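A sketch; the format string follows the month/day/two-digit-year layout of the column names:

```julia
using Dates

date_format = Dates.DateFormat("m/d/Y")
parse(Date, date_strings[1], date_format)   # gives 0020-01-22: the year is wrong
```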
"1/22/20"
dateformat"m/d/Y"
0020-01-22
Since the year was not correctly represented in the original data, we need to manually fix it:
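One way to do this, assuming the names above, is to parse all the strings and then add 2000 years:

```julia
dates = parse.(Date, date_strings, date_format) .+ Year(2000)
```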
(output: the corrected dates 2020-01-22, 2020-01-23, …, 2023-03-08, 2023-03-09)
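With the dates parsed we can plot the time series against real dates (a sketch):

```julia
plot(dates, US_data, label="US", legend=:topleft,
     xlabel="date", ylabel="cumulative cases")
```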
Exploratory data analysis
Working with cumulative data is often less intuitive. Let's look at the actual number of daily cases. Julia has a `diff` function to calculate the difference between successive entries of a vector:
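For example:

```julia
daily_cases = diff(US_data)   # new cases on each day
plot(dates[2:end], daily_cases, m=:circle, label="US daily cases")
```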
Note that discrete data should always be plotted with points. The lines are just to guide the eye.
Cumulating data corresponds to taking the integral of a function and is a smoothing operation. Note that the cumulative data is indeed visually smoother than the daily data.
The oscillations in the daily data seem to be due to a lower incidence of reporting at weekends. We could try to smooth this out by taking a moving average, say over the past week:
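A sketch of a trailing 7-day mean (the helper name and exact windowing are assumptions, chosen to match the output below):

```julia
using Statistics

# Entry i of the result is the mean of the 7 daily values ending at day i
running_mean(data, window=7) =
    [mean(data[i-window+1:i]) for i in window:length(data)]

running_daily = running_mean(daily_cases)
```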
(output: the 7-day moving average, starting 0.714286, 0.714286, 0.857143, … and ending …, 31699.9, 30420.7)
Exponential growth
Simple models of epidemic spread often predict a period with exponential growth. Do the data corroborate this?
A visual check for this is to plot the data with a logarithmic scale on the $y$ axis (but a linear scale on the $x$ axis).

The reason for this is that if we observe a straight line on such a semi-logarithmic plot, we have

$$\log(y) \approx \alpha \, x + \beta,$$

where we are using $x$ for the day number and $y$ for the number of cases. Hence, taking exponentials of both sides, we have

$$y \approx c \, e^{\alpha x}$$

for some constant $c = e^{\beta}$, i.e. exponential growth.
Since the data contains some zeros, we need to replace those with `NaN`s ("Not a Number"), which Plots.jl interprets as a signal to break the line:
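A sketch:

```julia
# Zeros cannot be drawn on a log scale; convert them to NaN so the line breaks
log_ready = replace(Float64.(US_data), 0.0 => NaN)
plot(log_ready, yscale=:log10, label="US",
     xlabel="day number", ylabel="cumulative cases")
```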
Let's zoom in on the part where the growth seems linear on this semi-log plot:
We see that there is a period lasting from around day 38 to around day 60 when the curve looks straight on the semi-log plot. This corresponds to the following date range:
38:60
(output: the dates 2020-02-28 through 2020-03-21)
i.e. the first 3 weeks of March. Fortunately the imposition of lockdown during the last 10 days of March (on different days in different US states) significantly reduced transmission.
We can fit a straight line to this portion of the data using linear regression.
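A least-squares sketch on the semi-log scale, using the window identified above (variable names are illustrative):

```julia
window = 38:60
xs = collect(window)
ys = log10.(US_data[window])

# Fit log10(y) ≈ α x + β by least squares using the backslash operator
A = [xs ones(length(xs))]
α, β = A \ ys

daily_growth_factor = 10^α   # multiplicative growth per day during this period
```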
Geographical data
Our data set contains more information: the geographical locations (latitude and longitude) of each country (or, rather, of a particular point that was chosen as being representative of that country).
| Row | province | country | latitude | longitude | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | ⋯ |
|---|---|---|---|---|---|---|---|---|---|
| | String? | String | Float64? | Float64? | Int64 | Int64 | Int64 | Int64 | |
| 1 | missing | "Afghanistan" | 33.9391 | 67.71 | 0 | 0 | 0 | 0 | |
| 2 | missing | "Albania" | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | |
| 3 | missing | "Algeria" | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | |
| 4 | missing | "Andorra" | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | |
| 5 | missing | "Angola" | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | |
| 6 | missing | "Antarctica" | -71.9499 | 23.347 | 0 | 0 | 0 | 0 | |
| 7 | missing | "Antigua and Barbuda" | 17.0608 | -61.7964 | 0 | 0 | 0 | 0 | |
| 8 | missing | "Argentina" | -38.4161 | -63.6167 | 0 | 0 | 0 | 0 | |
| 9 | missing | "Armenia" | 40.0691 | 45.0382 | 0 | 0 | 0 | 0 | |
| 10 | "Australian Capital Territory" | "Australia" | -35.4735 | 149.012 | 0 | 0 | 0 | 0 | |
| ⋮ | | | | | | | | | |
| 19 | missing | "Azerbaijan" | 40.1431 | 47.5769 | 0 | 0 | 0 | 0 | |
Let's extract and plot the geographical information. To reduce the visual noise a bit we will only use a subset of the locations. If the `province` field is missing we should use the country name instead:
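A sketch using the renamed columns (the marker options are arbitrary):

```julia
# Use the province name when it is present, otherwise the country name
place_names = [ismissing(row.province) ? row.country : row.province
               for row in eachrow(covid_data)]

scatter(covid_data.longitude, covid_data.latitude,
        ms=2, alpha=0.5, label="reported locations")
```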
(output: the latitude column, 33.9391, 41.1533, 28.0339, …, -13.1339, -19.0154)
Adding maps
We would also like to see the outlines of each country. For this we can use, for example, the data from Natural Earth, which comes in the form of shape files, giving the outlines in terms of latitude and longitude coordinates.
These may be read in using the Shapefile.jl package.
```julia
# Download the Natural Earth country outlines (a zip archive of shapefiles);
# `download` returns the path of a temporary file such as "/tmp/jl_q0uI3h"
zipfile = download("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/ne_110m_admin_0_countries.zip")
```
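The archive then needs to be unpacked so that the `.shp` and `.dbf` files are available locally; a sketch using ZipFile.jl:

```julia
using ZipFile

# Extract every file from the downloaded archive into the current directory
reader = ZipFile.Reader(zipfile)
for file in reader.files
    write(file.name, read(file))
end
close(reader)
```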
```julia
# Read the shapefile and extract the country outline geometries
shp_countries = Shapefile.shapes(Shapefile.Table("./ne_110m_admin_0_countries.shp"));
```
Now we would like to combine the geographical and temporal (time) aspects. One way to do so is to animate time:
(intermediate outputs for the animation: 1, 1.2304489213782739, 2020-01-22, "Day 1", and a vector of log-scaled values, mostly 0.0 with a single 0.30103)
(the notebook defines two helper functions, `set_points` and `make_features`)
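A minimal sketch of such an animation; the notebook's `set_points` and `make_features` helpers presumably do something more refined, and sizing the markers by the logarithm of the case count is an assumption:

```julia
# One frame per week; marker size grows with the log of the case count
anim = @animate for day in 1:7:length(dates)
    cases = covid_data[:, day + 4]             # the date columns start at 5
    sizes = 2 .* log10.(max.(cases, 1))        # 0 cases gives size 0
    scatter(covid_data.longitude, covid_data.latitude,
            ms=sizes, alpha=0.5, label=false,
            title="Day $day: $(dates[day])")
end
gif(anim, "covid_spread.gif", fps = 10)
```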
However, we should always be wary about visualisations such as these. Perhaps we should be plotting cases per capita instead of absolute numbers of cases. Or should we divide by the area of the country? Some countries, such as China and Canada, are divided into states or regions in the original data set – but others, such as the US, are not. You should always check exactly what is being plotted!
Unfortunately, published visualisations often hide some of this information. This emphasises the need to be able to get our hands on the data, create our own visualisations and draw our own conclusions.