chore(deps): update dependency pyarrow to v23 by renovate[bot] · Pull Request #242 · A-aung/python-docs-samples

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package | Change
pyarrow | ==3.0.0 -> ==23.0.1

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

apache/arrow (pyarrow)

v6.0.1

Bug Fixes

  • ARROW-14437 - [Python] Make CSV cancellation test more robust
  • ARROW-14492 - [JS] Fix export for browser bundles
  • ARROW-14513 - [Release][Go] Add /v6 suffix to release-6.0.0
  • ARROW-14519 - [C++] joins segfault when data contains list column
  • ARROW-14523 - [C++] Fix potential data loss in S3 multipart upload
  • ARROW-14538 - [R] Work around empty tr call on Solaris
  • ARROW-14550 - [Doc] Remove the JSON license; a non-free one.
  • ARROW-14583 - [R][C++] Crash when summarizing after filtering to no rows on partitioned data
  • ARROW-14584 - [Python][CI] Python sdist installation fails with latest setuptools 58.5
  • ARROW-14620 - [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
  • ARROW-14630 - [C++] DCHECK in GroupByNode when error encountered
  • ARROW-14739 - [JS][Docs] Point to wrong source
  • ARROW-15071 - [C#] Fixed a bug in Column.cs ValidateArrayDataTypes method
  • ARROW-15072 - [R] Error: This build of the arrow package does not support Datasets

New Features and Improvements

  • ARROW-13156 - [R] bindings for str_count
  • ARROW-14181 - [C++][Compute] Hash Join support for dictionary
  • ARROW-14189 - [Docs] Add version dropdown to the sphinx docs
  • ARROW-14310 - [R] Make expect_dplyr_equal() more intuitive
  • ARROW-14365 - [R] Update README example to reflect new capabilities
  • ARROW-14390 - [Packaging][Ubuntu] Add support for Ubuntu 21.10
  • ARROW-14433 - [Release][APT] Skip arm64 Ubuntu 21.04 verification
  • ARROW-14450 - [R] Old macos build error
  • ARROW-14459 - [Doc] Update the pinned sphinx version to 4.2
  • ARROW-14480 - [R] Expose arrow::dataset::ExistingDataBehavior to R
  • ARROW-14486 - [Packaging][deb] Add missing libthrift-dev dependency
  • ARROW-14490 - [Doc] Regenerate CHANGELOG.md to include all versions
  • ARROW-14496 - [Docs] Create relative links for R / JS / C/Glib references in the sphinx toctree using stub pages
  • ARROW-14499 - [Docs] Version dropdown side-by-side with search box
  • ARROW-14514 - [C++][R] UBSAN error on round kernel
  • ARROW-14580 - [Python] update trove classifiers to include Python 3.10
  • ARROW-14623 - [Packaging][Java] Upload not only .jar but also .pom
  • ARROW-14628 - [Release][Python] Use python -m pytest
  • ARROW-15058 - [Java] Remove log4j2 dependency in performance module

v6.0.0

Bug Fixes

  • ARROW-6946 - [Go] Run tests with assert build tag enabled to ensure safety
  • ARROW-8452 - [Go] support proper nested nullable flags
  • ARROW-8453 - [Go][Integration] Support and enable recursive nested type integration tests
  • ARROW-8999 - [Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build
  • ARROW-9948 - [C++] Fix scale handling in Decimal{128, 256}::FromString
  • ARROW-10213 - [C++] Temporal cast from timestamp to date rounds instead of extracting date component
  • ARROW-10373 - [C++] Validate null_count in Array::ValidateFull()
  • ARROW-10773 - [R] parallel as.data.frame.Table hangs indefinitely on Windows
  • ARROW-11518 - [C++][Parquet] Fix buffer allocation when reading/skipping boolean columns
  • ARROW-11579 - [R] read_feather hanging on Windows
  • ARROW-11634 - [C++][Parquet] Parquet statistics (min/max) for dictionary columns are incorrect
  • ARROW-11729 - [R] Add examples to datasets documentation
  • ARROW-12011 - [C++] Fix crashes and incorrect results when printing extreme date values
  • ARROW-12072 - [Go] Fix panics in ipc writer for sliced records
  • ARROW-12087 - [C++] Allow sorting durations, timestamps with timezones
  • ARROW-12321 - [R][C++] Arrow opens too many files at once when writing a dataset
  • ARROW-12513 - [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls
  • ARROW-12540 - [C++] Implementing casting support from date32/date64 to utf8/large_utf8
  • ARROW-12636 - [JS] ESM Tree-Shaking produces broken code
  • ARROW-12700 - [R] Read/Write_feather stuck forever after bad write, R, Win32
  • ARROW-12837 - [C++] Do not crash when printing invalid arrays
  • ARROW-13134 - [C++][CI] Unpin conda package for aws-sdk-cpp
  • ARROW-13151 - [C++][Parquet] Propagate schema changes from selection all the way up the stack
  • ARROW-13198 - [C++][Dataset] Async scanner occasionally segfaulting in CI
  • ARROW-13293 - [R] open_dataset followed by collect hangs (while compute works)
  • ARROW-13304 - [C++] Unable to install nightly on Ubuntu 21.04 due to day of week options
  • ARROW-13336 - [Doc] Make clean in docs should clean generated docs
  • ARROW-13422 - [R] Clarify README about S3 support on Windows
  • ARROW-13424 - [C++] Remove needless workaround for conda and benchmark
  • ARROW-13425 - [Archery] Avoid importing PyArrow indirectly
  • ARROW-13429 - [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type
  • ARROW-13430 - [Go] fix handling of zero value for FromBigInt
  • ARROW-13436 - [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
  • ARROW-13437 - [C++] Relax FixedSizeList validation to allow excess child values
  • ARROW-13441 - [C++][CSV] Skip empty batches in column decoder
  • ARROW-13443 - [C++] : Fix the incorrect mapping from flatbuf::MetadataVersion to arrow::ipc::MetadataVersion
  • ARROW-13445 - [Java][Packaging] Fix artifact patterns for the Java jars
  • ARROW-13446 - [Release] Fix verification on amazon linux
  • ARROW-13447 - [Release] Verification script for arm64 and universal2 macOS wheels
  • ARROW-13450 - [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
  • ARROW-13469 - [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h
  • ARROW-13474 - [Python] Fix crash in take/filter of empty ExtensionArray
  • ARROW-13477 - [Release] Pass ARTIFACTORY_API_KEY to the upload script
  • ARROW-13484 - [Release] Add support for uploading Amazon Linux 2 packages
  • ARROW-13490 - [R][CI] Need to gate duckdb examples on duckdb version
  • ARROW-13492 - [R][CI] Move r tools 35 build back to per-commit/pre-PR
  • ARROW-13493 - [C++] Anonymous structs in an anonymous union are a GNU extension
  • ARROW-13495 - [C++][Compute] Fixing unaligned memory access in GrouperFastImpl
  • ARROW-13496 - [CI][R] Repair r-sanitizer job
  • ARROW-13497 - [C++][R] FunctionOptions not used by aggregation nodes
  • ARROW-13499 - [R] Aggregation on expression doesn't NSE correctly
  • ARROW-13500 - [C++] Fix using '-Wno-unknown-warning-option' with GCC
  • ARROW-13504 - [Python] Move marks from fixtures to individual tests/params
  • ARROW-13507 - [R] LTO job on CRAN fails
  • ARROW-13509 - [C++] Take kernel with empty inputs
  • ARROW-13522 - [C++] Fix regression in UTF8 trim functions
  • ARROW-13523 - [C++] Normalize test executable name
  • ARROW-13524 - [C++] Fix description for ApplicationVersion::VersionEq
  • ARROW-13529 - [Go] Fixing too many releases in IPC writer
  • ARROW-13538 - [R][CI] Don't test DuckDB in the minimal build
  • ARROW-13543 - [R] Handle summarize() with 0 arguments or no aggregate functions
  • ARROW-13556 - [C++] Add protobuf to linking for flight
  • ARROW-13559 - [CI][C++] Move the test-conda-cpp-valgrind nightly build to azure
  • ARROW-13560 - [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys
  • ARROW-13580 - [C++] quoted_strings_can_be_null only applied to string columns
  • ARROW-13597 - [C++][Compute] Remove AddOnLoad helper
  • ARROW-13600 - [C++] Fix maybe uninitialized warnings
  • ARROW-13602 - [C++] Fix strict aliasing warning in bit util test
  • ARROW-13603 - [GLib] Fix typos in GARROW_VERSION_CHECK()
  • ARROW-13605 - [C++] Capture node with shared_ptr to avoid TSan warning
  • ARROW-13608 - [R] vendor cpp11 to fix segfault under LTO
  • ARROW-13611 - [C++] Scanning datasets does not enforce back pressure
  • ARROW-13624 - [R] readr short type mapping has T and t backwards
  • ARROW-13628 - [Format][C++][Java] Add MONTH_DAY_NANO interval type
  • ARROW-13630 - [CI][C++][s390x] Reduce parallelism to build Arrow library
  • ARROW-13632 - [C++] Fix filtering of sliced FixedSizeList array
  • ARROW-13638 - [C++] Hold owned copy of function options in GroupByNode
  • ARROW-13639 - [C++] Fix out-of-bounds access in Concatenate with null slots and empty dictionary
  • ARROW-13654 - [C++][Parquet] Avoid infinite loop when appending a FileMetaData to itself
  • ARROW-13655 - [C++][Parquet] Disable Thrift message size protections
  • ARROW-13662 - [CI] Fix failing strftime test with older pandas
  • ARROW-13662 - [CI] Failing test test_extract_datetime_components with pandas 0.24
  • ARROW-13669 - [C++] Fix variant emplace methods (add brackets)
  • ARROW-13671 - [Dev] Fix conda recipe on Arm 64k page system
  • ARROW-13676 - [C++][Parquet] Avoid potential invalid access.
  • ARROW-13681 - [C++] Fix list_parent_indices behaviour on chunked array
  • ARROW-13685 - [C++] Cannot write dataset to S3FileSystem if bucket already exists
  • ARROW-13689 - [C#][Integration] Initial commit of C# Integration tests
  • ARROW-13694 - [R] Arrow filter crashes (R aborted session)
  • ARROW-13743 - [CI] OSX job fails due to incompatible git and libcurl
  • ARROW-13744 - [CI] c++14 and 17 nightly job fails
  • ARROW-13747 - [Python][CI] Requiring s3fs >= 2021.8
  • ARROW-13755 - [Python] Allow writing datasets using a partitioning that only specifies field_names
  • ARROW-13761 - [R] arrow::filter() crashes (aborts R session)
  • ARROW-13784 - [Python] Table.from_arrays should raise an error when array is empty but names is not
  • ARROW-13786 - [R][CI] Don't fail the RCHK build if arrow doesn't build
  • ARROW-13788 - [C++] Temporal component extraction functions don't support date32/64
  • ARROW-13792 - [Java] : The toString representation is incorrect for unsigned integer vectors
  • ARROW-13799 - [R] case_when error handling is capturing strings
  • ARROW-13800 - [R] Use divide instead of divide_checked
  • ARROW-13812 - [C++] Fix Valgrind error in Grouper.BooleanKey test
  • ARROW-13814 - [CI] Fix Spark master integration tests
  • ARROW-13819 - [C++] Initialize subseconds in value_parsing.h
  • ARROW-13846 - [C++] Fix crashes on invalid IPC file
  • ARROW-13850 - [C++] Fix crashes on invalid Parquet data
  • ARROW-13860 - [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
  • ARROW-13865 - [C++][R] Writing moderate-size parquet files of nested dataframes from R slows down/process hangs
  • ARROW-13872 - [Java] ExtensionTypeVector does not work with RangeEqualsVisitor
  • ARROW-13876 - [C++] Add trivial null kernels to arithmetic, sort functions
  • ARROW-13877 - [C++] Support FixedSizeList in generic list kernels
  • ARROW-13878 - [C++] Implement fixed-size-binary support for several kernels
  • ARROW-13880 - [C++] Compute function sort_indices does not support timestamps with time zones
  • ARROW-13881 - [C++][FlightRPC][Packaging] Ensure Flight is packaged with advanced TLS options on Windows
  • ARROW-13882 - [C++] Improve min_max/hash_min_max type support
  • ARROW-13884 - [JS] Move source files into a separate directory
  • ARROW-13912 - [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
  • ARROW-13913 - [C++] Don't segfault if IndexOptions omitted
  • ARROW-13915 - [R][CI] R UCRT C++ bundles are incomplete
  • ARROW-13916 - [C++] Implement strftime on date32/64 types
  • ARROW-13921 - [Python][Packaging] Pin minimum setuptools version for the macos wheels
  • ARROW-13940 - [R] Turn on multithreading with Arrow engine queries
  • ARROW-13961 - [C++] Fix use of non-const references, declaration without initialization
  • ARROW-13976 - [C++] Add path to libjvm.so in ARM CPU
  • ARROW-13978 - [C++] Bump gtest to 1.11 to unbreak builds with recent clang
  • ARROW-13981 - [Java] VectorSchemaRootAppender doesn't work for BitVector
  • ARROW-13982 - [C++] Don't stall in async scanner if a fragment generates no batches
  • ARROW-13983 - [C++] Avoid raising error if fadvise() isn't supported
  • ARROW-13996 - [Go][Parquet] Fix file offsets in go impl
  • ARROW-13997 - [C++] restore exec node based query performance
  • ARROW-14001 - [Go] Fixing AppendBoolean function in BitmapWriter
  • ARROW-14004 - [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
  • ARROW-14014 - [Java] Fix Flight parseTrailers for :status keys
  • ARROW-14017 - [C++] NULLPTR is not included in type_fwd.h
  • ARROW-14020 - [R] Writing dataframes with list columns is slow and scales poorly with nesting level
  • ARROW-14024 - [C++] Test that batch size is respected for IPC/CSV
  • ARROW-14026 - [C++] Enable batch parallelism in Parquet scanner
  • ARROW-14027 - [C++] Handle scalars in Grouper
  • ARROW-14040 - [C++] Fix result order dependence in scanner test
  • ARROW-14053 - [C++][CSV] Use atomic counter for async tests
  • ARROW-14057 - [C++] Bump aws-c-common version
  • ARROW-14063 - [R] open_dataset() does not work on CSVs without header rows
  • ARROW-14076 - Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
  • ARROW-14090 - [C++][Parquet] rows_written_ should be int64_t instead of int
  • ARROW-14103 - [R] [C++] Allow min/max in grouped aggregation
  • ARROW-14109 - [C++] Fix segfault when parsing JSON with duplicate keys.
  • ARROW-14124 - [R] Timezone support in R <= 3.4
  • ARROW-14129 - [C++][Python] Fix unique/value_counts on empty dictionary arrays
  • ARROW-14139 - [IR][C++] Table flatbuffer object fails to compile on older GCCs
  • ARROW-14141 - [IR][C++] Join missing from RelationImpl
  • ARROW-14156 - [C++] Properly synthesize validity buffer in StructArray::Flatten
  • ARROW-14162 - [R] Simple arrange %>% head does not respect ordering
  • ARROW-14173 - [IR] Allow typed null literals to be represented
  • ARROW-14179 - [C++][C] Do not export/import null bitmap for union and null types
  • ARROW-14184 - [C++] allow joins where the keys include new columns on the left
  • ARROW-14192 - [C++][Dataset] Backpressure broken on ordered scans
  • ARROW-14195 - [R] Fix ExecPlan binding annotations
  • ARROW-14197 - [C++][Compute] Fixing wrong buffer size in GrouperFastImpl
  • ARROW-14200 - [R] strftime on a date should not use or be confused by timezones
  • ARROW-14203 - [C++] Fix description of ExecBatch.length for Scalars in aggregate kernels
  • ARROW-14204 - [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
  • ARROW-14206 - [Go][Parquet] Clean up s390x and arm build code
  • ARROW-14206 - [Go][CI] Fix build on s390x and ARM
  • ARROW-14208 - [C++] Fix compilation on Windows
  • ARROW-14210 - [C++] Add AR and RANLIB flags to bzip2
  • ARROW-14211 - [C++][Compute] Fixing thread sanitizer problems in hash join node
  • ARROW-14214 - [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
  • ARROW-14216 - [R] Disable auto-cleaning of duckdb tables
  • ARROW-14219 - [R][CI] DuckDB valgrind failure
  • ARROW-14220 - [C++] Missing ending quote in thirdpartyversions
  • ARROW-14221 - [R][CI] DuckDB tests fail on R < 4.0
  • ARROW-14223 - [C++] add missing third-party dependency
  • ARROW-14224 - [C++] Try to reduce build time/memory usage
  • ARROW-14226 - [R] Handle n_distinct() (and others) with args != 1
  • ARROW-14237 - [R][CI] Disable altrep in R <= 3.5
  • ARROW-14240 - [C++] Fix wrong nlohmann-json header path
  • ARROW-14246 - [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()
  • ARROW-14247 - [C++] Fix Valgrind errors in parquet-arrow-test
  • ARROW-14249 - [R] Slow down in dataframe-to-table benchmark
  • ARROW-14252 - [R] Partial matching of arguments warning
  • ARROW-14255 - [Python] Fix FlightClient.do_action
  • ARROW-14257 - [Python][Docs] Fix usage of sync scanner in dataset writing docs
  • ARROW-14260 - [C++] GTest linker error with vcpkg and Visual Studio 2019
  • ARROW-14283 - [CI][C++] Use LLVM 12 on macOS GHA builds
  • ARROW-14285 - [C++] Fix crashes when pretty-printing data from valid IPC file
  • ARROW-14299 - [Dev][CI] Avoid downloading MinIO multiple times
  • ARROW-14300 - [C++][R][CI] Work around missing include in xsimd
  • ARROW-14301 - [C++] use consistent CMAKE_CXX_STANDARD definition
  • ARROW-14302 - [C++] Valgrind errors
  • ARROW-14305 - [C++][Compute] Fixing Valgrind errors in hash join node tests
  • ARROW-14307 - [R] crashes when reading empty feather with POSIXct column
  • ARROW-14313 - [Doc] Make Archery installation docs more accurate
  • ARROW-14321 - [R] segfault converting dictionary ChunkedArray with 0 chunks
  • ARROW-14340 - [C++] Bump xsimd to fix build error on Apple M1
  • ARROW-14370 - [C++] Fix memory leak in SeqMergedGeneratorTestFixture.ErrorItem
  • ARROW-14373 - [Packaging][Java] Missing LLVM dependency in the macOS java-jars build
  • ARROW-14377 - [Packaging][Python] Python 3.9 installation fails in macOS wheel build
  • ARROW-14381 - [CI][Python] Fix Spark integration failures
  • ARROW-14382 - [C++][Compute] Remove duplicated ThreadIndexer definition
  • ARROW-14392 - [C++] Bundled gRPC misses bundled Abseil include path
  • ARROW-14393 - [C++] GTest linking errors during the source release verification
  • ARROW-14397 - [C++] Fix valgrind error in test utility
  • ARROW-14406 - [CI] Skip failing test on dask-master nightly build
  • ARROW-14411 - [Release][Integration] Go integration tests fail for 6.0.0-RC1
  • ARROW-14417 - [R] Joins ignore projection on left dataset
  • ARROW-14423 - [Python] Fix version constraints in pyproject.toml
  • ARROW-14424 - [Packaging][Python] Disable windows wheel testing for python 3.6
  • ARROW-14434 - R crashes when making an empty selection for Datasets with DateTime
  • ARROW-14439 - [Python][C++] Segfault with read_json when a field is missing
  • PARQUET-2067 - [C++][Parquet] Fix Parquet null count stats for enclosing null lists
  • PARQUET-2089 - [C++] Align RowGroup file_offset with specification

New Features and Improvements

  • ARROW-1565 - [C++] Implement TopK/BottomK
  • ARROW-1568 - [C++] Implement Drop Null Kernel for Arrays
  • ARROW-4333 - [C++] Sketch out design for kernels and "query" execution in compute layer
  • ARROW-4700 - [C++] Added support for decimal128 and decimal256 json converted
  • ARROW-5002 - [C++] Implement Hash Aggregation query execution node
  • ARROW-5244 - [C++] Remove experimental marker from some APIs
  • ARROW-6072 - [C++] Implement casting List <-> LargeList
  • ARROW-6607 - [Python] Support for set/list columns when converting from Pandas
  • ARROW-6626 - [Python] Support converting nested sets when converting to arrow
  • ARROW-6870 - [C#] Add Support for Dictionary Arrays and Dictionary Encoding
  • ARROW-7102 - [Python] Make filesystems compatible with fsspec
  • ARROW-7179 - [C++][Python][R] Consolidate coalesce/fill_null
  • ARROW-7901 - [Go][Integration] enable integration tests for null case
  • ARROW-8022 - [C++] Add static and small vector implementations
  • ARROW-8147 - [C++] add GCS library to ThirdpartyToolchain
  • ARROW-8379 - [R] Investigate/fix thread safety issues (esp. Windows)
  • ARROW-8621 - [Release] Add post release step to add tags for Go versioning
  • ARROW-8780 - [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
  • ARROW-8928 - [C++] Add microbenchmarks to help measure ExecBatchIterator overhead
  • ARROW-9226 - [Python] Support core-site.xml default filesystem.
  • ARROW-9434 - [C++] Store type code in UnionScalar
  • ARROW-9719 - [Python] Improve HadoopFileSystem docstring
  • ARROW-10094 - [Python][Doc] Document missing pandas to arrow conversions
  • ARROW-10415 - [R] Support for dplyr::distinct()
  • ARROW-10898 - [C++] Improve table sort performance
  • ARROW-11238 - [Python] Make SubTreeFileSystem print method more informative
  • ARROW-11243 - [C++] Recognize time types in CSV files
  • ARROW-11460 - [R] Use system libraries if present on Linux
  • ARROW-11691 - [Developer][CI] Provide a consolidated .env file for benchmark-relevant environment variables
  • ARROW-11748 - [C++] Ensure Decimal fields are in native endian order
  • ARROW-11828 - [C++] Expose CSVWriter object in api
  • ARROW-11885 - [R] Turn off some capabilities when LIBARROW_MINIMAL=true
  • ARROW-11981 - [C++] Implement Union ExecNode
  • ARROW-12063 - [C++] Add null placement option to sort functions
  • ARROW-12181 - [C++][R] The "CSV dataset" in test-dataset.R is failing on RTools 3.5
  • ARROW-12216 - [R] Proactively disable multithreading on RTools3.5 (32bit?)
  • ARROW-12359 - [C++] Deprecate FileSystem::OpenAppendStream
  • ARROW-12388 - [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva
  • ARROW-12410 - [C++][Gandiva] Implement regexp_replace function on Gandiva
  • ARROW-12479 - [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions
  • ARROW-12563 - [C++][Gandiva] Add add_months and datediff functions for string
  • ARROW-12615 - [C++] Add options for handling NAs to stddev and variance
  • ARROW-12650 - [Doc][Python] Improve documentation regarding dealing with memory mapped files
  • ARROW-12657 - [C++] Adding String hex to numeric conversion
  • ARROW-12669 - [C++][Python] Implement a new scalar function: list_element
  • ARROW-12673 - [C++] Add callback to handle incorrect column counts
  • ARROW-12688 - [R] Use DuckDB to query an Arrow Dataset
  • ARROW-12714 - [C++] String title case kernel
  • ARROW-12725 - [C++][Compute] Column at a time hash and comparison in group by
  • ARROW-12728 - [C++] Implement count_distinct/distinct hash aggregate kernels
  • ARROW-12744 - [C++][Compute] Add rounding kernel
  • ARROW-12759 - [C++][Compute] Add ExecNode for group by
  • ARROW-12763 - [R] Optimize dplyr queries that use head/tail after arrange
  • ARROW-12846 - [Release] Reduce download/upload bandwidth for APT/Yum repositories
  • ARROW-12866 - [C++][Gandiva] Implement STRPOS function on Gandiva
  • ARROW-12871 - [R] upgrade to testthat 3e
  • ARROW-12876 - [R] Fix build flags on Raspberry Pi
  • ARROW-12944 - [C++] String capitalize kernel
  • ARROW-12946 - [C++] String swap case kernel
  • ARROW-12953 - [C++][Compute] Refactor CheckScalar* to take Datum arguments
  • ARROW-12959 - [C++][R] Option for is_null(NaN) to evaluate to true
  • ARROW-12965 - [Java] C Data Interface implementation
  • ARROW-12980 - [C++] Kernels to extract datetime components should be timezone aware
  • ARROW-12981 - [R] Install source package from CRAN alone
  • ARROW-13033 - [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time)
  • ARROW-13056 - [MATLAB] Add a matlab label for dev Pull Requests
  • ARROW-13067 - [C++][Compute] Implement integer to decimal cast
  • ARROW-13089 - [Python] Allow creating RecordBatch from Python dict
  • ARROW-13112 - [R] altrep vectors for strings and other types
  • ARROW-13132 - [C++] Add Scalar validation
  • ARROW-13138 - [C++][R] Implement extract temporal components (year, month, day, etc) from date32/64 types
  • ARROW-13141 - [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
  • ARROW-13163 - [C++][Gandiva] Implement REPEAT function on Gandiva
  • ARROW-13164 - [R] altrep vectors from Array with nulls
  • ARROW-13172 - [Java] Make TYPE_WIDTH publicly accessible
  • ARROW-13174 - [C++][Compute] Add strftime kernel
  • ARROW-13202 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Linux
  • ARROW-13218 - [Format] Clarify interpretation of timestamp values
  • ARROW-13220 - [C++] Implement 'choose' function
  • ARROW-13222 - [C++] Improve type support for case_when
  • ARROW-13227 - [Documentation][Compute] Document ExecNode
  • ARROW-13257 - [Java][Dataset] Allow passing empty columns for projection
  • ARROW-13268 - [C++][Compute] Add ExecNode for semi and anti-semi join
  • ARROW-13279 - [R] Use C++ DayOfWeekOptions in wday implementation instead of manually calculating via Expression
  • ARROW-13287 - [C++] [Dataset] FileSystemDataset::Write should use an async scan
  • ARROW-13295 - [C++] add hash_mean, hash_variance, hash_stddev kernels
  • ARROW-13298 - [C++] Implement any/all hash aggregate kernels
  • ARROW-13307 - [C++] Remove reflection-based enums
  • ARROW-13311 - [C++][Documentation] Document hash aggregate kernels
  • ARROW-13317 - [Python] Improve documentation on what 'use_threads' does in 'read_feather'
  • ARROW-13326 - [R][Archery] Add linting to dev CI
  • ARROW-13327 - [C++][Python] Improve consistency of explicit C++ types in PyArrow files
  • ARROW-13330 - [Go][Parquet] Add the rest of the Encoding package
  • ARROW-13344 - [R] Initial bindings for ExecPlan/ExecNode
  • ARROW-13345 - [C++] Add basic implementation for log to base b
  • ARROW-13358 - [C++] Improve type support in if_else
  • ARROW-13379 - [Dev][Docs] Improvements to archery docs
  • ARROW-13390 - [C++] Implement coalesce for remaining types
  • ARROW-13397 - [R] Update arrow.Rmd vignette
  • ARROW-13399 - [R] Update dataset.Rmd vignette
  • ARROW-13402 - [R] Update flight.Rmd vignette
  • ARROW-13403 - [R] Update developing.Rmd vignette
  • ARROW-13404 - [Doc][Python] Improve PyArrow documentation for new users
  • ARROW-13405 - [Doc] Guide users to the documentation for their own platform
  • ARROW-13416 - [C++] Implement mod compute function
  • ARROW-13420 - [JS] Update dependencies
  • ARROW-13421 - [C++][Python] Add CSV convert option to change decimal point
  • ARROW-13433 - [R] Remove CLI hack from Valgrind test
  • ARROW-13434 - [R] group_by() with an unnamed expression
  • ARROW-13435 - [R] Add function arrow_table() as alias for Table$create()
  • ARROW-13444 - [C++] Remove usage of deprecated std::result_of
  • ARROW-13448 - [R] Bindings for strftime
  • ARROW-13453 - [R] DuckDB has not yet released 0.2.8
  • ARROW-13455 - [C++][Docs] Typo in RecordBatch::SetColumn
  • ARROW-13458 - [C++][Docs] Typo in RecordBatch::schema
  • ARROW-13459 - [C++][Docs] Missing param docs for RecordBatch::SetColumn
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • ARROW-13463 - [Release][Python] Verify python 3.8 macOS arm64 wheel
  • ARROW-13465 - [R] to_arrow() from duckdb
  • ARROW-13466 - [R] make installation fail if Arrow C++ dependencies cannot be installed
  • ARROW-13468 - [Release] Fix binary download/upload failures
  • ARROW-13472 - [R] Remove .engine = "duckdb" argument
  • ARROW-13475 - [Release] Don't consider rust tarballs when cleaning up old releases
  • ARROW-13476 - [Doc][Python] Switch ipc/io doc to use context managers
  • ARROW-13478 - [Release] Unnecessary rc-number argument for the version bumping post-release script
  • ARROW-13480 - [C++] Fix possible deadlock when dataset produces an error
  • ARROW-13482 - [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry
  • ARROW-13485 - [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh
  • ARROW-13488 - [Website] Update Linux packages install information for 5.0.0
  • ARROW-13489 - [R] Bump CI jobs after 5.0.0
  • ARROW-13501 - [R] Bindings for count aggregation
  • ARROW-13502 - [R] Bindings for min/max aggregation
  • ARROW-13503 - [GLib][Ruby][Flight] Add support for DoGet
  • ARROW-13506 - [C++][Java] Upgrade ORC to 1.6.9
  • ARROW-13508 - [C++] Support custom retry strategies in S3Options
  • ARROW-13510 - [CI][R][C++] Add -Wall to fedora-clang-devel as-cran checks
  • ARROW-13511 - [CI][R] Fail in the docker build step if R deps don't install
  • ARROW-13516 - [C++] Detect --version-script flag availability
  • ARROW-13519 - [R] Make doc examples less noisy
  • ARROW-13520 - [C++] Implement hash_aggregate tdigest kernel
  • ARROW-13521 - [C++][Docs] Add note about tdigest in compute functions docs
  • ARROW-13525 - [Python] Mention alternative deprecation message for ParquetDataset.partitions
  • ARROW-13528 - [R] Bindings for mean, var, sd aggregation
  • ARROW-13532 - [C++][Compute] - adding set membership type filtering to hash table interface
  • ARROW-13534 - [C++] Improve csv chunker
  • ARROW-13540 - [C++] Add order by sink node
  • ARROW-13541 - [C++][Python] Implement ExtensionScalar
  • ARROW-13542 - [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk
  • ARROW-13544 - [Java] : Remove APIs that have been deprecated for long (Changes to ArrowBuf)
  • ARROW-13544 - [Java] : Remove APIs that have been deprecated for long (Changes to JDBC)
  • ARROW-13544 - [Java] : Remove APIs that have been deprecated for long (Changes to Vectors)
  • ARROW-13548 - [C++] Implement temporal difference kernels
  • ARROW-13549 - [C++] Add casts from timestamp to date/time
  • ARROW-13550 - [R] Support .groups argument to dplyr::summarize()
  • ARROW-13552 - [C++] Remove deprecated APIs
  • ARROW-13557 - [Packaging][Python] Skip test_cancellation test case on M1
  • ARROW-13561 - [C++] Implement week kernel that accepts WeekOptions
  • ARROW-13562 - [R] Styler followups
  • ARROW-13565 - [Packaging][Ubuntu] Drop support for 20.10
  • ARROW-13572 - [C++][Datasets] Add ORC support to Datasets API
  • ARROW-13573 - [C++] Support dictionaries natively in case_when
  • ARROW-13574 - [C++] Add 'count all' option to count kernels
  • ARROW-13575 - [C++] Add hash_product kernel
  • ARROW-13576 - [C++] Replace ExecNode::InputReceived with ::MakeTask
  • ARROW-13577 - [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
  • ARROW-13585 - [GLib] Add support for C ABI interface
  • ARROW-13587 - [R] Handle --use-LTO override
  • ARROW-13595 - [C++] Add debug mode check for compute kernel output type
  • ARROW-13604 - [Java] : Remove deprecation annotations for APIs representing unsupported operations
  • ARROW-13606 - [R] Actually disable LTO
  • ARROW-13613 - [C++] Add decimal support to (hash) sum/mean/product
  • ARROW-13614 - [C++] Add decimal support to min_max/hash_min_max
  • ARROW-13618 - [R] Use Arrow engine for summarize() by default
  • ARROW-13620 - [R] Binding for n_distinct()
  • ARROW-13626 - [R] Bindings for log base b
  • ARROW-13627 - [C++] Fully support ScalarAggregateOptions in (hash) any/all/sum/product/mean
  • ARROW-13629 - [Ruby] Add support for building/converting map
  • ARROW-13633 - [Packaging][Debian] Add support for bookworm
  • ARROW-13634 - [R] Update distro() in nixlibs.R to map from "bookworm" to 12
  • ARROW-13635 - [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
  • ARROW-13637 - [Python] Fix docstrings
  • ARROW-13642 - [C++][Compute] Hash join node supporting all semi, anti, inner, outer join types
  • ARROW-13645 - [Java] : Allow NullVectors to have distinct field names
  • ARROW-13646 - [Go][Parquet] adding the parquet metadata package
  • ARROW-13648 - [Dev] Use #!/usr/bin/env instead of #!/bin where possible
  • ARROW-13650 - [C++] Create dataset writer to encapsulate dataset writer logic
  • ARROW-13651 - [Ruby][Symbol] to Arrow array
  • ARROW-13652 - [Python] Expose copy_files in pyarrow.fs
  • ARROW-13660 - [C++] Remove seq_num from ExecNode::InputReceived
  • ARROW-13670 - [C++] add virtual destructors
  • ARROW-13674 - [CI] PR checks should check for JIRA components
  • ARROW-13675 - [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
  • ARROW-13679 - [GLib][Ruby] Add support for group aggregation
  • ARROW-13680 - [C++] Create an asynchronous nursery to simplify capture logic
  • ARROW-13682 - [C++] Add TDigest API to merge one TDigest
  • ARROW-13684 - [C++][Compute] Strftime kernel follow-up
  • ARROW-13686 - [Python] Update deprecated pytest yield_fixture functions
  • ARROW-13687 - [Ruby] Add support for loading table by Arrow Dataset
  • ARROW-13691 - [C++] Support skip_nulls/min_count in VarianceOptions
  • ARROW-13693 - [Website] arrow-site should pin down a specific Ruby version and leverage toolings like rbenv
  • ARROW-13696 - [Python] Support for MapType with Fields
  • ARROW-13699 - [Python][Docs] Improve filesystem documentation
  • ARROW-13700 - [Docs][C++] Clarify DayOfWeekOptions args
  • ARROW-13702 - [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
  • ARROW-13704 - [C#] Add support for reading streaming format delta dictionaries
  • ARROW-13705 - [Website] Pin node version
  • ARROW-13721 - [Doc][Cookbook] Specifying Schemas - Python
  • ARROW-13733 - [Java] : Allow JDBC adapters to

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Never, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.