ARROW-1392: [C++] Add GPU IO interfaces for CUDA by wesm · Pull Request #985 · apache/arrow

@wesm

This makes it easy to write from host to device and read from device to host. We also need a zero-copy device reader for IPC purposes (where we don't want to move any data to the host), can do that in a subsequent patch.

Change-Id: I3ada86d70d3bf93d1e5dd02a17bbaa5f2faebc34
Change-Id: I205b2861797c57756a38f4c695937d354d497b53

@wesm

I'm going to start a quick cuda-benchmark since I'm curious if buffering writes affects performance at all

@wesm

@wesm

Write buffering is important when the size of individual writes are small; this will be important to enable when writing to GPU memory on IPC hot paths.

In this benchmark, a total of 128MB is written to a GPU buffer using a chunk size varying from 256 bytes to 64K. In the Buffered case, an 8MB host buffer created with cudaMallocHost is used

$ ./release/cuda-benchmark 
Run on (8 X 4200 MHz CPU s)
2017-08-22 15:33:53
Benchmark                                                  Time           CPU Iterations
----------------------------------------------------------------------------------------
BM_Writer_Buffered/256/min_time:1.000/real_time     24026339 ns   24027197 ns         58   5.20262GB/s
BM_Writer_Buffered/4k/min_time:1.000/real_time      23202300 ns   23202817 ns         60    5.3874GB/s
BM_Writer_Buffered/64k/min_time:1.000/real_time     23167510 ns   23168351 ns         60   5.39549GB/s
BM_Writer_Unbuffered/256/min_time:1.000/real_time 1691322141 ns 1691377175 ns          1   75.6804MB/s
BM_Writer_Unbuffered/4k/min_time:1.000/real_time   125177161 ns  125181662 ns         11   1022.55MB/s
BM_Writer_Unbuffered/64k/min_time:1.000/real_time   26685025 ns   26685828 ns         54   4.68428GB/s
Change-Id: Idd283920e38682a85f42621cc5123ad08837cd28
Change-Id: I7acd582f4f4ea3dba28ecd4ddb2d19c0101d2409

@wesm

I still need to add reader tests. On contemplation I think this should be zero-copy and return device pointers; that would be much more useful at the moment.

@wesm

cc @m1mc, I can copy you on GPU related PRs if you want to help review

@wesm

…it tests

Change-Id: I84b55e37a26989f4e99dd9d4fa5ae7ed5a4ad484

@wesm

OK, I will merge this on a green build. These interfaces will see hardening when used on the IPC code paths. The idea is that arrow::gpu::CudaBufferReader can be used for zero-copy IPC reads, and the reconstructed record batches will contain device pointers internally. The user can unbox these pointers into whatever data structure they want

@wesm

Change-Id: Id0c05025d2c5a72c0461a63b91d2f2bbb53f030e

@wesm

@xhochy I added const to the Tell method on files. This probably should have been done in the first place. The impact of this on downstream users is nil, but if any third parties have created their own implementations they will have to add const. This seems unlikely, so I don't think it's needed to go through a deprecation cycle with this

@wesm

+1. One of the Travis CI entries had an unrelated transient failure

@wesm wesm deleted the ARROW-1392 branch

August 23, 2017 13:42