How fast can we multiply?
tim_one at email.msn.com
Wed Jul 21 19:06:47 EDT 1999
[Jeffrey Chang <jefftc at leland.Stanford.EDU>]
> Oh, shoot.  I'm 2 days into a week long calculation of normalizing
> a dense ~6000x6000 matrix.  Do you mean I'm the only person doing
> this kind of thing? ;)

Yes <wink>.

> I've been toying around with the idea of working on optimized
> matrix functions for NumPy (while waiting for the computations to
> finish!).  Since I often work with large matrices, there could
> be big wins for me.  However, it's a little disheartening to hear
> that few other people would benefit from it.
>
> I know the O(n**log2(7)) algorithm for matrix multiply and am
> aware that there are faster ones.  Does anyone know what they are?
> Can anyone recommend a good general book for optimized matrix
> algorithms?  Is Knuth 2 the best source?

It's probably the best accessible source for a quick overview of the
theory.  In practice, though, you're pretty much hosed: I don't know
what your 6000x6000 matrices *contain*, but if it's doubles or even
floats then each consumes a significant fraction of a gigabyte.  So
unless you're running on a real-memory machine, cache effects are
likely at least as important as operation count; on many platforms,
much more important.  When high performance requires coding to the
most obscure details of a platform's memory hierarchy, portability
goes out the window.

There's an interesting paper about how all this relates to
Strassen-Winograd matmult (the simplest of the "advanced" methods) at:

    http://www.cs.duke.edu/~alvy/papers/sc98/index.htm

Here's part of the abstract:

    Strassen's algorithm for matrix multiplication gains its lower
    arithmetic complexity at the expense of reduced locality of
    reference, which makes it challenging to implement the algorithm
    efficiently on a modern machine with a hierarchical memory
    system.  We report on an implementation of this algorithm that
    uses several unconventional techniques to make the algorithm
    memory-friendly.  ...
    Performance comparisons of our implementation with that of
    competing implementations show that our implementation often
    outperforms the alternative techniques (up to 25%).  However,
    we also observe wide variability across platforms and across
    matrix sizes, indicating that at this time, no single
    implementation is a clear choice for all platforms or matrix
    sizes.  ...

Par for the course.

buy-an-old-cray-while-they're-still-cheap<wink>-ly y'rs  - tim
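[Archive note: the O(n**log2(7)) algorithm Jeffrey mentions is Strassen's. As a rough illustration of the tradeoff the abstract describes, here is a minimal pure-Python sketch, assuming square matrices whose size is a power of two. The CUTOFF constant is a stand-in for the platform-tuned crossover point real implementations use to fall back to a conventional multiply, since Strassen's recursion trades locality of reference for fewer multiplications; nothing here is production code.]

```python
# Minimal sketch of Strassen's O(n**log2(7)) matmult on nested lists.
# Assumes square matrices whose size is a power of two.  CUTOFF is a
# hypothetical tuning knob: below it we use the conventional O(n**3)
# triple loop, as cache-aware implementations do in practice.

CUTOFF = 2

def classical(A, B):
    # Conventional triple-loop multiply, used below the cutoff.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def add(A, B):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n <= CUTOFF:
        return classical(A, B)
    h = n // 2
    # Split each operand into four h x h quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Seven half-size products instead of eight -- the whole trick.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine the quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

Note that each level of recursion touches the whole input three times (splits, seven products, recombination), which is exactly the locality-of-reference cost the Duke paper's "memory-friendly" techniques are attacking.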