[MRG]: add fast_dot function calling BLAS directly and consume only twice the memory of your data by dengemann · Pull Request #2248 · scikit-learn/scikit-learn
Hi there, I finally got it running.
This implements a feature 'advocated' on this scipy page (section on large arrays and linalg):
http://wiki.scipy.org/PerformanceTips
When calling BLAS directly instead of going through np.dot, it's possible to avoid copying when the data are passed in F-contiguous order. In addition, I've added chunking to the _logcosh function, which avoids an extra copy.
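To illustrate the chunking idea, here is a minimal sketch rather than the code in this PR. It assumes row-wise chunks and a tanh nonlinearity with a scaling parameter alpha; the name logcosh_chunked is hypothetical. The tanh is applied in place, and the derivative is reduced one row at a time, so the only temporaries are the size of a single row.

```python
import numpy as np


def logcosh_chunked(x, alpha=1.0):
    """Hypothetical sketch: tanh nonlinearity plus per-row mean derivative.

    Overwrites the float array ``x`` in place to avoid a full-size copy.
    """
    x *= alpha
    gx = np.tanh(x, out=x)  # in place: gx is the same buffer as x
    g_x = np.empty(x.shape[0])
    # Process one row (chunk) at a time so that the temporary created by
    # ``1 - row ** 2`` is only as large as a single row.
    for i, row in enumerate(gx):
        g_x[i] = (alpha * (1.0 - row ** 2)).mean()
    return gx, g_x


rng = np.random.RandomState(0)
X = rng.randn(3, 50)
alpha = 1.5

# Naive reference that allocates full-size temporaries.
ref_gx = np.tanh(alpha * X)
ref_g_x = (alpha * (1.0 - ref_gx ** 2)).mean(axis=1)

gx, g_x = logcosh_chunked(X.copy(), alpha=alpha)
```

Because the input is overwritten, callers that still need the original data have to pass a copy, as the usage lines do.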
This is how it now looks on 1 GB of test data:
This is how the same test looked on the current master (plot from the last memory PR):
To make this functionality available for other use cases, I've added a fast_dot function to utils.extmath, along with deliberately simple but explicit tests that exemplify the mapping between np.dot and fast_dot, which can be tricky to get right.
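The mapping can be sketched roughly as follows. This is not the fast_dot implementation from this PR: blas_dot is a hypothetical helper, and it assumes 2-D C-contiguous float arrays of the same dtype. The trick is that the transpose of a C-contiguous array is an F-contiguous view, so gemm can consume it without copying, and A·B is recovered as (Bᵀ·Aᵀ)ᵀ.

```python
import numpy as np
from scipy.linalg import get_blas_funcs


def blas_dot(a, b):
    """Hypothetical sketch of np.dot(a, b) via a direct BLAS gemm call."""
    if (a.ndim != 2 or b.ndim != 2
            or not (a.flags.c_contiguous and b.flags.c_contiguous)):
        # Fall back for anything this sketch does not handle.
        return np.dot(a, b)
    gemm, = get_blas_funcs(("gemm",), (a, b))
    # a.T and b.T are F-contiguous views of the C-contiguous inputs.
    # gemm returns b.T @ a.T, whose transpose is a @ b.
    return gemm(1.0, b.T, a.T).T


rng = np.random.RandomState(42)
A = rng.rand(4, 3)
B = rng.rand(3, 5)
E = rng.rand(4, 5)
F = rng.rand(5, 3)
gemm, = get_blas_funcs(("gemm",), (A, B))

# np.dot(A, B)   <->  gemm(B.T, A.T).T
ok_plain = np.allclose(blas_dot(A, B), np.dot(A, B))
# np.dot(A.T, E) <->  gemm(E.T, A.T, trans_b=True).T
ok_trans_first = np.allclose(gemm(1.0, E.T, A.T, trans_b=True).T,
                             np.dot(A.T, E))
# np.dot(A, F.T) <->  gemm(F.T, A.T, trans_a=True).T
ok_trans_second = np.allclose(gemm(1.0, F.T, A.T, trans_a=True).T,
                              np.dot(A, F.T))
```

The swapped argument order and the trans_a/trans_b flags are what make the mapping easy to get wrong, which is why spelling out each case in a test is worthwhile.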
Finally, I've made sure that downstream applications still work. For example, with this local branch the mne-python ICA looks as good as it did before.

