ARROW-31: Python: prototype user object model, add PyList conversion path with type inference by wesm · Pull Request #19 · apache/arrow

added 21 commits

March 7, 2016 14:44
…k toward converting Python lists to Arrow arrays

@wesm wesm deleted the ARROW-31 branch

March 7, 2016 23:02

wesm added a commit to wesm/arrow that referenced this pull request

Sep 8, 2018
I adapted this functionality from Apache Kudu (incubating). There are no real unit tests, yet, but you can now run `ctest` after building to run all tests that have been created with `ADD_PARQUET_TEST`.

Author: Wes McKinney <wes@cloudera.com>

Closes apache#19 from wesm/googletest-infra and squashes the following commits:

758328f [Wes McKinney] BLD: disable fixed OSX deployment target. Compile gtest with -fPIC
61cc5bb [Wes McKinney] Remove 'set -e' from setup_build_env.sh
6435970 [Wes McKinney] Fix setup_build_env.sh script
a54a219 [Wes McKinney] Add googletest to thirdparty and add ADD_PARQUET_TEST cmake helper and support scripts for using ctest after make

Change-Id: I7e86dfc9ae2590d8053ffe3dc687f78008faf3b2

zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request

Mar 27, 2020
[C++] Use one buffer for merge instead of multiple slices

kou pushed a commit that referenced this pull request

May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>

projjal pushed a commit to projjal/arrow that referenced this pull request

Oct 29, 2020
…ression (apache#19)

Closes apache#8530 from pprudhvi/stringalloc

Authored-by: Prudhvi Porandla <prudhvi.porandla@icloud.com>
Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>

icexelloss pushed a commit to icexelloss/arrow that referenced this pull request

Oct 28, 2022
* ARROW-18008: Added a use_threads option to run_query

* ARROW-18008: pylint

kou pushed a commit that referenced this pull request

Dec 18, 2024
…zone (#45051)

### Rationale for this change

If the timezone database is present on the system, but does not contain a timezone referenced in a ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen for example on Ubuntu 24.04 where some timezone aliases have been removed from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read a ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #40633

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request

Oct 24, 2025
* apacheGH-19: Add CONTRIBUTING.md

* Update CONTRIBUTING.md

Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>

---------

Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>