Means by LukeMathWalker · Pull Request #20 · rust-ndarray/ndarray-stats

@LukeMathWalker

A basic implementation of arithmetic, harmonic and geometric mean.

Even though ndarray exposes mean_axis it does not provide mean, hence I have added it to the PR. Should I contribute it back to ndarray @jturner314?

@LukeMathWalker

jturner314

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I've added some comments.

A couple more things:

  1. I do think we should add .mean() to ndarray since ndarray already has .sum() and .mean_axis().

  2. It's unfortunate that harmonic_mean and geometric_mean allocate a temporary array for the result of the map. However, for the time being, I don't see a good way to avoid this while still taking advantage of the pairwise implementation of .sum() (rust-ndarray/ndarray#577).

@LukeMathWalker

@LukeMathWalker

@LukeMathWalker

I do think we should add .mean() to ndarray since ndarray already has .sum() and .mean_axis().

I agree - but the semantic of mean_axis is different at the moment: it panics instead of returning a None if the array is empty. That might have to be changed.

It's unfortunate that harmonic_mean and geometric_mean allocate a temporary array for the result of the map. However, for the time being, I don't see a good way to avoid this while still taking advantage of the pairwise implementation of .sum() (rust-ndarray/ndarray#577).

It's exactly what I thought (and mean calculation is why I drafted the pairwise summation PR in the first place). I think it can be optimized to use less memory (we could use a lazy version of map and re-engineer pairwise summation to consume the array in blocks, thus only allocating a fixed size buffer to hold the current block it's working on and partial results), but I wouldn't pursue it right now.

@LukeMathWalker

CI on Rust 1.30 is failing with error: Edition 2018 is unstable and only available for nightly builds of rustc. for noisy_floats. I am bumping Rust version to 1.31 in CI to fix it.

@LukeMathWalker

@jturner314

Everything looks good, so I merged the PR. If .mean() gets added to ndarray, we can remove it from ndarray-stats.

I agree - but the semantic of mean_axis is different at the moment: it panics instead of returning a None if the array is empty. That might have to be changed.

Will you please submit a PR to ndarray to add .mean()? I lean towards changing mean_axis to return None instead of making mean divide by zero, because of this reasoning. However, for floating-point values, division by zero would be more convenient since it would just result in NaN instead of panicking. What do you think? I'll be interested to hear @bluss's opinion too.

@bluss

@jturner314 Your rule makes sense, sounds good to me to return an Option. I think it's quite frequent anyway that you want to compute a mean and if empty, get zero, and the option makes it simple to do that.

@LukeMathWalker

I do agree with that reasoning and I still think that it's better to return None even for float values: it's much easier to track down an unwrap or an expect that panicked instead of a NaN that might have propagated way down the line when it actually gets detected.

I'll draft the PR for ndarray.