plot NaNs in gray color and use log color bar by ValentinGebhart · Pull Request #929 · CLIMADA-project/climada_python

Changes proposed in this PR:

In the plot function geo_im_from_array, NaN values in the data are now plotted in gray. Previously, NaN values were not plotted (i.e., they were transparent), making them indistinguishable from plot regions for which there is no data (no centroids).
In the plot function plot_from_gdf, the colorbar is shown on a logarithmic scale if a) the gdf contains return periods or impacts, b) there are no zeros in the data, and c) the data's values span at least two orders of magnitude.
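The three conditions for the logarithmic colorbar can be sketched as a small standalone check. This is only an illustration, not the code from this PR: the function name should_use_log_scale, the keyword matching on the colorbar label, the skipping of NaNs, and the rejection of negative values are all assumptions made for the example.

```python
import math


def should_use_log_scale(values, colorbar_name,
                         keywords=("return period", "impact")):
    """Hypothetical helper: decide whether a log colorbar makes sense.

    Assumption: the kind of data is guessed from the colorbar label.
    """
    # a) the data is about return periods or impacts (guessed from the label)
    name = colorbar_name.lower()
    if not any(key in name for key in keywords):
        return False

    # Assumption: NaNs are ignored when inspecting the value range
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return False

    # b) no zeros in the data (negative values are rejected too, since a
    #    logarithmic scale needs strictly positive values)
    vmin, vmax = min(finite), max(finite)
    if vmin <= 0:
        return False

    # c) the values span at least two orders of magnitude
    return vmax / vmin >= 100


print(should_use_log_scale([1.0, 50.0, 250.0, float("nan")], "Return Period (years)"))
print(should_use_log_scale([1.0, 50.0], "Return Period (years)"))
print(should_use_log_scale([0.0, 1.0, 500.0], "Impact (USD)"))
```

With these made-up inputs, only the first call satisfies all three conditions; the second spans less than two orders of magnitude and the third contains a zero.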

PR Author Checklist

Read the Contribution Guide
Correct target branch selected (if unsure, select develop)
Descriptive pull request title added
Source branch up-to-date with target branch
Documentation updated
Tests updated
Tests passing
No new linter issues
Changelog updated

PR Reviewer Checklist

@ValentinGebhart Thank you for that contribution. Can you share an example and compare the resulting plots before and after your changes?

This is an example of plotting the return periods of a hazard object in which some values are NaN (because the centroid never saw the given threshold intensity, so the return period is NaN) and some centroids have been removed (bottom-left corner). This is the code:

import numpy as np
from climada.hazard import Hazard
from climada.util import HAZ_DEMO_H5 # CLIMADA's Python file
haz_tc_fl = Hazard.from_hdf5(HAZ_DEMO_H5) # Historic tropical cyclones in Florida from 1990 to 2004
haz_tc_fl.check() # Always use the check() method to see if the hazard has been loaded correctly

centroids_mask = np.array(
    [(i + j > 10) for j in range(50) for i in range(50)]
)
haz_tc_fl.centroids = haz_tc_fl.centroids.select(sel_cen=centroids_mask)
haz_tc_fl.intensity = haz_tc_fl.intensity[:, -2434:]

return_periods, label, column_label = haz_tc_fl.local_return_period([30, 40])

from climada.util.plot import plot_from_gdf
plot_from_gdf(return_periods, colorbar_name=label, title_subplots=column_label)

old plots

new plots

Note that if the hazard return periods spanned more than two orders of magnitude (and contained no zeros), the color scale in the new plots would also be logarithmic.

I very much like the overall contribution, but I have to say I dislike the approach. Calling griddata again on basically the same data, casting to bool, plotting with a weird colormap...

I think the same thing can be achieved much more easily using Matplotlib's own tools. You can set "bad" and "over/under" colors for a colormap. Choosing the right vmin should then give you the expected outcome with a single call to pcolormesh:

# ...
if "norm" in kwargs:
    min_value = kwargs["norm"].vmin
    vmin = None  # We will pass norm
else:
    min_value = np.nanmin(array_im)
    vmin = kwargs.pop("vmin", min_value)

grid_im = griddata(
    (coord[:, 1], coord[:, 0]),
    array_im,
    (grid_x, grid_y),
    fill_value=min_value-1,  # Values outside the grid
)

# ...
cmap = plt.get_cmap(kwargs.pop("cmap", "viridis"))
cmap.set_bad("gray")  # For NaNs and infs
cmap.set_under("white", alpha=0)  # For values below vmin

axis.pcolormesh(
    grid_x - mid_lon,
    grid_y,
    np.squeeze(grid_im),
    transform=proj,
    cmap=cmap,
    vmin=vmin,
    **kwargs
)
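To see how the "bad" and "under" colors behave without drawing a map, the following minimal sketch pushes a few sample values through a colormap directly. The sample values and the vmin/vmax are made up for illustration; it is not part of the PR and only uses standard Matplotlib calls (set_bad, set_under, Normalize).

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_rgba

# Work on a copy so the registered "viridis" colormap stays untouched
cmap = plt.get_cmap("viridis").copy()
cmap.set_bad("gray")              # masked / NaN values
cmap.set_under("white", alpha=0)  # values below vmin become transparent

# Made-up data: a NaN, a value below vmin, and two in-range values
values = np.array([np.nan, -1.0, 0.0, 10.0])
norm = Normalize(vmin=0.0, vmax=10.0)

# Mask the NaN, normalize, and look up the RGBA color of every value
rgba = cmap(norm(np.ma.masked_invalid(values)))

for value, color in zip(values, rgba):
    print(value, np.round(color, 2))
```

The NaN is mapped to gray, the value below vmin is mapped to a fully transparent color, and the remaining values receive regular viridis colors. This is the same mechanism the suggestion above relies on when it fills cells outside the grid with a value below vmin.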

Comment on lines +927 to +928

		gdf = gdf[['geometry', *[col for col in gdf.columns if col != 'geometry']]]
		gdf_values = gdf.values[:,1:].T

		):
		kwargs.update(
		{'norm': mpl.colors.LogNorm(
		vmin=gdf.values[:,1:].min(), vmax=gdf.values[:,1:].max()

-	vmin=gdf.values[:,1:].min(), vmax=gdf.values[:,1:].max()
+	vmin=gdf_values.min(), vmax=gdf_values.max()

Thanks for the advice! I agree that the way you describe is easier. I implemented and tested it (example plots from above didn't change), with a small modification for the case of the log colorscale (min_value - 1 did not seem to work, so I used min_value/2).
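The thread does not say why min_value - 1 failed with the log color scale. One plausible explanation, offered here as an assumption: LogNorm masks non-positive inputs, so a fill value of min_value - 1 that is zero or negative is treated as "bad" (gray) instead of "under" (transparent), whereas min_value / 2 stays positive and below vmin. The sketch below uses made-up values to show the difference.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm, to_rgba

# Made-up value range; the minimum is chosen so that min_value - 1 is negative
min_value, max_value = 0.5, 100.0
norm = LogNorm(vmin=min_value, vmax=max_value)

cmap = plt.get_cmap("viridis").copy()
cmap.set_bad("gray")              # masked / invalid values
cmap.set_under("white", alpha=0)  # values below vmin become transparent

# Two candidate fill values for cells outside the data region
fills = np.array([min_value - 1, min_value / 2])
normed = norm(fills)  # LogNorm masks the non-positive entry
rgba = cmap(normed)

print("masked by LogNorm:", np.ma.getmaskarray(normed))
print("RGBA for min_value - 1:", np.round(rgba[0], 2))
print("RGBA for min_value / 2:", np.round(rgba[1], 2))
```

In this sketch the first fill value ends up gray, just like a NaN, while the second one is rendered transparent. If min_value were larger than 1, min_value - 1 would remain positive and both fill values would be treated as "under".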

Okay, this is looking much better now, thanks for the update! I have a few nitpicky suggestions still 🙈 We can merge once these are resolved!