cleaning up data api file cache by emanuel-schmid · Pull Request #737 · CLIMADA-project/climada_python

chahank

That was a very quick fix, thanks! Just a few small comments, but otherwise it looks great to me (once the documentation is added :)).

Comment on lines +599 to +600

" or remove cache entry from database by calling"
f" `Client.purge_cache_db(Path('{local_path}'))`!")

I think this is a bit too difficult to understand. When should a user purge the cache? When should a user wait for the download? Is there any way to guide the user better here?

The exception is more verbose now.

Very clear now!
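For readers following along, here is a hypothetical sketch of what such a verbose error can look like. The helper name and exact wording are illustrative, not the text merged in this PR; only the `Client.purge_cache_db(Path(...))` hint is taken from the quoted lines above:

```python
from pathlib import Path

def file_in_the_way_error(local_path: Path) -> Exception:
    # Hypothetical helper: the wording is illustrative, not CLIMADA's actual message.
    return Exception(
        f"The file at {local_path} does not match its cache database entry."
        " If another process is currently downloading this file, simply wait"
        " until that download has finished. Otherwise remove the stale cache"
        " entry from the database by calling"
        f" `Client.purge_cache_db(Path('{local_path}'))` and try again."
    )

print(file_in_the_way_error(Path("data/cache/example.hdf5")))
```

The point of the rewrite discussed above: the message should tell the user both when to wait and when to purge, not just name the purge call.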

with the API client, if they are beneath the given directory and if one of the following
is the case:
- their status is neither 'active' nor 'test_dataset'
= their status is 'test_dataset' and keep_testfiles is set to False

Suggested change (the stray `=` should be a `-` list marker):

before: `= their status is 'test_dataset' and keep_testfiles is set to False`
after: `- their status is 'test_dataset' and keep_testfiles is set to False`
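Read together, the two purge conditions from the docstring amount to a simple predicate. A minimal sketch (`should_purge` is a hypothetical helper name; the real client works on dataset info objects, not bare status strings):

```python
def should_purge(status: str, keep_testfiles: bool) -> bool:
    # Files are removed if their dataset status is neither 'active' nor
    # 'test_dataset', or if it is 'test_dataset' and keep_testfiles is False.
    if status not in ('active', 'test_dataset'):
        return True
    return status == 'test_dataset' and not keep_testfiles

# 'expired' datasets are always purged, 'active' ones never:
print(should_purge('expired', True))        # → True
print(should_purge('test_dataset', True))   # → False
print(should_purge('test_dataset', False))  # → True
print(should_purge('active', False))        # → False
```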

# collect urls from datasets that should not be removed
test_datasets = self.list_dataset_infos(status='test_dataset') if keep_testfiles else []
test_urls = set(filinf.url for dsinf in test_datasets for filinf in dsinf.files)

I do not understand the short variables `dsinf` and `filinf`. Are these some standard abbreviations?

They're called `ds_info` and `file_info` now.
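With the renaming applied, the comprehension reads as below. This standalone sketch uses simplified stand-in classes (`FileInfo` and `DatasetInfo` here are illustrative, not the real CLIMADA API objects, and the hard-coded list stands in for `self.list_dataset_infos(status='test_dataset')`):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileInfo:
    url: str

@dataclass
class DatasetInfo:
    files: List[FileInfo]

keep_testfiles = True
# In the client, this list would come from list_dataset_infos(status='test_dataset')
test_datasets = [
    DatasetInfo(files=[FileInfo("https://example.org/a.hdf5"),
                       FileInfo("https://example.org/b.hdf5")]),
    DatasetInfo(files=[FileInfo("https://example.org/a.hdf5")]),  # duplicate url
] if keep_testfiles else []

# collect urls from datasets that should not be removed
test_urls = set(file_info.url for ds_info in test_datasets for file_info in ds_info.files)
print(sorted(test_urls))  # duplicate urls collapse into the set
```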

rm_empty_dirs(subdir)
try:
directory.rmdir()
except OSError:

What error is ignored here? I do not understand.

I've added an inline comment: `# raised when directory is not empty`
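For context, the helper under discussion can be sketched roughly like this (a reconstruction from the quoted lines, not the exact PR code):

```python
from pathlib import Path

def rm_empty_dirs(directory: Path):
    """Recursively remove empty directories beneath (and including) directory."""
    for subdir in (d for d in directory.iterdir() if d.is_dir()):
        rm_empty_dirs(subdir)
    try:
        directory.rmdir()
    except OSError:  # raised when directory is not empty
        pass
```

`Path.rmdir` only succeeds on empty directories, so catching `OSError` lets non-empty directories survive while every empty subtree is pruned bottom-up.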

Path(temp_dir).joinpath('hazard/tropical_cyclone/rename_files2/v1').is_dir()
)
self.assertEqual( # test files are still there
3,

Why are there 3 test files?

That's just the nature of that test dataset; datasets can have any number of files.

thanks. I meant, why does this particular test result in 3 files?

Beats me. I picked it for being small (in file size) and expired. I suspect it's an experimental dataset that was used to explore the data API itself.

Should we use a better-known test file then? The test looks rather mysterious like this.

Oh, wrong, sorry. The one with 3 files is used in TestStormEurope.test_icon_read. Reading ICON files takes a directory as input and collects data from there, so having more than one file makes complete sense. And the test is fairly well known. Apart from that, I think it doesn't really matter which dataset we pick, as long as the size is acceptable and status and version make a difference.

All right, if it is clear to you, I am fine with it.