Visualizing Global Data: A Guide to Mapping with Python
Chapter 1: Introduction to Geospatial Data Visualization
Creating visual representations of geographic data can seem daunting at first, but with the right approach it becomes a straightforward task. A bar graph listing values by country name may convey your information, but it lacks the vibrancy that a map can provide. By overlaying your data on a world map, you make the spatial patterns in your data much clearer and more engaging. This guide walks you through the process step by step so you can produce impressive geospatial visuals.
In this article, I assume you have a basic familiarity with Python. However, if you find yourself puzzled at any point, please feel free to ask questions in the comments. I'm eager to help! For this demonstration, we will use the fireball dataset from the Center for Near Earth Object Studies (CNEOS), part of NASA's Jet Propulsion Laboratory. If you wish to follow along, download the CSV file from the provided link and save it as cneos_fireball_data.csv in your working directory, since that is the path the code reads. Now, let's dive into the code.
Section 1.1: Required Libraries
To plot the data, we must first import several libraries:
- Matplotlib: Its pyplot module (conventionally imported as plt) provides the plotting functions we need.
- Pandas: We will use this library (usually imported as pd) to read and manipulate our dataset.
- GeoPandas: This library (usually imported as gpd) lets us load and plot our world map.
# Importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
Many users have faced challenges when installing GeoPandas, and I encountered similar issues. Fortunately, I found a helpful tutorial that resolved my installation problems. If you haven’t installed GeoPandas yet, I suggest you consult that guide.
Section 1.2: Importing the Data
Next, we read our dataset using Pandas. The usecols parameter of read_csv lets us choose which columns to load from the cneos_fireball_data.csv file. The original headers are long and include units, so we rename them for better readability.
# Reading CSV file using Pandas
df = pd.read_csv('cneos_fireball_data.csv',
                 usecols=["Peak Brightness Date/Time (UT)",
                          "Calculated Total Impact Energy (kt)",
                          "Latitude (deg.)", "Longitude (deg.)"])
df = df.rename(columns={"Peak Brightness Date/Time (UT)": 'Datetime',
                        "Calculated Total Impact Energy (kt)": 'Impact Energy [kt]',
                        "Latitude (deg.)": 'Latitude',
                        "Longitude (deg.)": 'Longitude'})
After loading the data, we can print the dataset along with its data types:
print(df)
print('\n')
print(df.dtypes)
print('\n')
It's worth noting that the Datetime, Latitude, and Longitude columns are currently stored as object (text) data types: the dates are plain strings, and each coordinate ends with a direction letter such as N or W. We must convert them before visualizing the data.
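To see what usecols and rename do, and why the coordinate columns are not numeric yet, here is a small sketch that reads a made-up CSV from memory with io.StringIO instead of the real file. The rows are hypothetical and only mimic the shape of the CNEOS columns used above; the extra "Altitude (km)" column is also an illustrative assumption, included to show that usecols drops it.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the CNEOS headers; all values are made up.
# "Altitude (km)" is included only to show that usecols drops it.
sample_csv = io.StringIO(
    "Peak Brightness Date/Time (UT),Latitude (deg.),Longitude (deg.),"
    "Altitude (km),Calculated Total Impact Energy (kt)\n"
    "2020-01-01 00:00:00,10.5N,20.0W,30.0,0.5\n"
    "2021-06-15 12:30:00,5.2S,100.1E,,1.2\n"
)

wanted = ["Peak Brightness Date/Time (UT)",
          "Calculated Total Impact Energy (kt)",
          "Latitude (deg.)", "Longitude (deg.)"]
sample = pd.read_csv(sample_csv, usecols=wanted)

# Shorter, friendlier names for the columns we kept.
sample = sample.rename(columns={
    "Peak Brightness Date/Time (UT)": "Datetime",
    "Calculated Total Impact Energy (kt)": "Impact Energy [kt]",
    "Latitude (deg.)": "Latitude",
    "Longitude (deg.)": "Longitude",
})

print(sample.columns.tolist())
# ['Datetime', 'Latitude', 'Longitude', 'Impact Energy [kt]']
print(sample.dtypes)  # Latitude/Longitude are text, Impact Energy is float64
```

Note that usecols keeps the columns in the order they appear in the file, not the order of the list you pass, and that values such as 10.5N stay as text until we clean them.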
Chapter 2: Data Cleaning and Preparation
The first cleaning step converts the Datetime column to a proper datetime type using Pandas. Next, we convert the Latitude and Longitude columns to floating-point numbers. This requires stripping the directional suffix (N, E, S, W) from each value and making southern latitudes and western longitudes negative. Finally, we convert the Impact Energy [kt] column to numbers and keep only events below a 20 kt threshold.
# Converting to a datetime datatype
df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce')
# Adjusting directional values: strip the suffix, make W longitudes negative
for x in range(len(df['Longitude'])):
    if str(df.loc[x, 'Longitude'])[-1] == 'E':
        df.loc[x, 'Longitude'] = str(df.loc[x, 'Longitude'])[:-1]
    elif str(df.loc[x, 'Longitude'])[-1] == 'W':
        df.loc[x, 'Longitude'] = '-' + str(df.loc[x, 'Longitude'])[:-1]
# Same for latitudes: strip the suffix, make S latitudes negative
for x in range(len(df['Latitude'])):
    if str(df.loc[x, 'Latitude'])[-1] == 'N':
        df.loc[x, 'Latitude'] = str(df.loc[x, 'Latitude'])[:-1]
    elif str(df.loc[x, 'Latitude'])[-1] == 'S':
        df.loc[x, 'Latitude'] = '-' + str(df.loc[x, 'Latitude'])[:-1]
df['Longitude'] = pd.to_numeric(df['Longitude'], errors='coerce')
df['Latitude'] = pd.to_numeric(df['Latitude'], errors='coerce')
# Converting Impact Energy to numbers first, then filtering it
threshold = 20
df['Impact Energy [kt]'] = pd.to_numeric(df['Impact Energy [kt]'],
                                         errors='coerce')
df = df[df['Impact Energy [kt]'] < threshold]
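The row-by-row loops above work, but pandas string methods can perform the same conversion on a whole column at once. The sketch below is one possible vectorized alternative, not the method used in this walkthrough. The helper name to_signed_degrees is hypothetical, and it assumes every valid value is a number followed by a single uppercase N, S, E, or W; it is meant to replace the loops, not to run after them, because values without a suffix become NaN.

```python
import pandas as pd


def to_signed_degrees(coords: pd.Series) -> pd.Series:
    """Convert strings like '12.3N' or '45.6W' to signed floats (hypothetical helper).

    S and W values become negative. Values without a trailing N/S/E/W,
    or whose remaining part is not a number, become NaN.
    """
    text = coords.astype(str).str.strip()
    direction = text.str[-1]  # last character: the direction letter
    magnitude = pd.to_numeric(text.str[:-1], errors="coerce")  # the number before it
    # N and E keep their sign, S and W flip it; any other letter maps to NaN.
    sign = direction.map({"N": 1.0, "E": 1.0, "S": -1.0, "W": -1.0})
    return magnitude * sign


# Small demonstration with made-up values.
print(to_signed_degrees(pd.Series(["12.5N", "3.25S", "100.1E", "20.0W"])).tolist())

# Usage on the article's DataFrame, in place of the loops (assumes df from above):
# df['Latitude'] = to_signed_degrees(df['Latitude'])
# df['Longitude'] = to_signed_degrees(df['Longitude'])
```

Multiplying the magnitude by a sign column keeps the rule in one place, and any value that does not fit the expected pattern surfaces as NaN, which the dropna step then removes.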
To finalize the cleaning process, we drop any rows containing values that failed to convert (these are now NaN or NaT) and reset the index of our dataset.
# Dropping errors and resetting index
df = df.dropna()
df = df.reset_index(drop=True)
# Displaying cleaned data and types
print(df)
print('\n')
print(df.dtypes)
print('\n')
The output shows the cleaned dataset: Datetime now has a datetime64 data type, and Latitude, Longitude, and Impact Energy [kt] are float64.
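A detail that is easy to miss in the cleaning step: dropna() returns a new DataFrame and leaves the original unchanged unless you assign the result (or pass inplace=True). The small sketch below uses made-up values to show the difference, and how reset_index(drop=True) renumbers the surviving rows from 0 without keeping the old index as a column.

```python
import numpy as np
import pandas as pd

# Made-up values; NaN stands in for entries that failed to convert.
frame = pd.DataFrame({
    "Latitude": [np.nan, 10.5, -5.2, 33.0],
    "Longitude": [45.0, -20.0, np.nan, 120.0],
    "Impact Energy [kt]": [0.3, 1.0, 0.8, 1.2],
})

frame.dropna()  # returns a new DataFrame, which is discarded here
print(len(frame))  # 4: frame itself is unchanged

# Assign the result, then renumber the surviving rows from 0.
cleaned = frame.dropna().reset_index(drop=True)
print(len(cleaned))  # 2
print(cleaned.index.tolist())  # [0, 1]
```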
Chapter 3: Mapping the Data
Now that our data is prepared, we can use GeoPandas to load a world map. We then create a figure, draw the map, and plot our fireball data on top of it.
# Importing world map data from GeoPandas
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0;
# on newer versions, download a Natural Earth 1:110m countries file and
# pass its path to gpd.read_file instead.
worldmap = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# Creating figure and plotting world map
fig, ax = plt.subplots(figsize=(12, 6))
worldmap.plot(color="lightgrey", ax=ax)
# Plotting Impact Energy data: marker area and color both scale with energy
x = df['Longitude']
y = df['Latitude']
z = df['Impact Energy [kt]']
points = ax.scatter(x, y, s=20*z, c=z, alpha=0.6, vmin=0, vmax=threshold,
                    cmap='autumn')
fig.colorbar(points, ax=ax, label='Impact Energy [kt]')
# Setting axis limits and title
ax.set_xlim([-180, 180])
ax.set_ylim([-90, 90])
first_year = df["Datetime"].min().strftime("%Y")
last_year = df["Datetime"].max().strftime("%Y")
ax.set_title("NASA: Reported Fireballs from Government Sensors\n"
             + first_year + " - " + last_year)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
plt.show()
If everything ran correctly, the plot shows each fireball as a circle at its reported latitude and longitude, with marker size and color both scaled by impact energy.
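An alternative, sketched here under assumptions rather than taken from the walkthrough above, is to turn the cleaned table into a GeoDataFrame with gpd.points_from_xy and let GeoPandas draw the points on the same Axes as the world map. The helper name plot_fireballs is hypothetical. It expects a DataFrame with the Longitude, Latitude, and Impact Energy [kt] columns built above plus any polygon GeoDataFrame as the base map, and it assumes both use longitude and latitude in degrees (EPSG:4326).

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd


def plot_fireballs(events: pd.DataFrame, basemap: gpd.GeoDataFrame,
                   threshold: float = 20):
    """Plot fireball events over a base map (hypothetical helper)."""
    # Build point geometries from the longitude/latitude columns.
    points = gpd.GeoDataFrame(
        events,
        geometry=gpd.points_from_xy(events["Longitude"], events["Latitude"]),
        crs="EPSG:4326",
    )
    fig, ax = plt.subplots(figsize=(12, 6))
    basemap.plot(ax=ax, color="lightgrey")
    # Color by energy, scale marker area by energy, and add a colorbar.
    points.plot(
        ax=ax,
        column="Impact Energy [kt]",
        markersize=20 * points["Impact Energy [kt]"],
        cmap="autumn",
        alpha=0.6,
        vmin=0,
        vmax=threshold,
        legend=True,
        legend_kwds={"label": "Impact Energy [kt]"},
    )
    ax.set_xlim(-180, 180)
    ax.set_ylim(-90, 90)
    return fig, ax, points


# Usage with the article's data (assumes df and worldmap from above):
# fig, ax, points = plot_fireballs(df, worldmap, threshold=threshold)
# plt.show()
```

Keeping the points in a GeoDataFrame with an explicit CRS makes it easier to reproject them later (for example with to_crs) so they stay aligned with a base map in a different projection.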
Acknowledgments
I appreciate you taking the time to read through this article! If you enjoyed it, please consider following my work for more insights on Python, space exploration, and orbital mechanics. Should you have any questions or feedback, don't hesitate to reach out!