Means by LukeMathWalker · Pull Request #20 · rust-ndarray/ndarray-stats
A basic implementation of arithmetic, harmonic and geometric mean.
Even though ndarray exposes mean_axis it does not provide mean, hence I have added it to the PR. Should I contribute it back to ndarray @jturner314?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I've added some comments.
A couple more things:
-
I do think we should add
.mean()tondarraysincendarrayalready has.sum()and.mean_axis(). -
It's unfortunate that
harmonic_meanandgeometric_meanallocate a temporary array for the result of themap. However, for the time being, I don't see a good way to avoid this while still taking advantage of the pairwise implementation of.sum()(rust-ndarray/ndarray#577).
I do think we should add .mean() to ndarray since ndarray already has .sum() and .mean_axis().
I agree - but the semantic of mean_axis is different at the moment: it panics instead of returning a None if the array is empty. That might have to be changed.
It's unfortunate that harmonic_mean and geometric_mean allocate a temporary array for the result of the map. However, for the time being, I don't see a good way to avoid this while still taking advantage of the pairwise implementation of .sum() (rust-ndarray/ndarray#577).
It's exactly what I thought (and mean calculation is why I drafted the pairwise summation PR in the first place). I think it can be optimized to use less memory (we could use a lazy version of map and re-engineer pairwise summation to consume the array in blocks, thus only allocating a fixed size buffer to hold the current block it's working on and partial results), but I wouldn't pursue it right now.
CI on Rust 1.30 is failing with error: Edition 2018 is unstable and only available for nightly builds of rustc. for noisy_floats. I am bumping Rust version to 1.31 in CI to fix it.
Everything looks good, so I merged the PR. If .mean() gets added to ndarray, we can remove it from ndarray-stats.
I agree - but the semantic of
mean_axisis different at the moment: it panics instead of returning aNoneif the array is empty. That might have to be changed.
Will you please submit a PR to ndarray to add .mean()? I lean towards changing mean_axis to return None instead of making mean divide by zero, because of this reasoning. However, for floating-point values, division by zero would be more convenient since it would just result in NaN instead of panicking. What do you think? I'll be interested to hear @bluss's opinion too.
@jturner314 Your rule makes sense, sounds good to me to return an Option. I think it's quite frequent anyway that you want to compute a mean and if empty, get zero, and the option makes it simple to do that.
I do agree with that reasoning and I still think that it's better to return None even for float values: it's much easier to track down an unwrap or an expect that panicked instead of a NaN that might have propagated way down the line when it actually gets detected.
I'll draft the PR for ndarray.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters