pyarrow.csv.ParseOptions — Apache Arrow v23.0.1

class pyarrow.csv.ParseOptions(delimiter=None, *, quote_char=None, double_quote=None, escape_char=None, newlines_in_values=None, ignore_empty_lines=None, invalid_row_handler=None)

Bases: _Weakrefable

Options for parsing CSV files.

Parameters:

delimiter (1-character str, optional, default ','): The character delimiting individual cells in the CSV data.
quote_char (1-character str or False, optional, default '"'): The character used optionally for quoting CSV values (False if quoting is not allowed).
double_quote (bool, optional, default True): Whether two quotes in a quoted CSV value denote a single quote in the data.
escape_char (1-character str or False, optional, default False): The character used optionally for escaping special characters (False if escaping is not allowed).
newlines_in_values (bool, optional, default False): Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
ignore_empty_lines (bool, optional, default True): Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as containing a single empty value (assuming a one-column CSV file).
invalid_row_handler (callable, optional, default None): If not None, this object is called for each CSV row that fails parsing (because of a mismatching number of columns). It should accept a single InvalidRow argument and return either "skip" or "error" depending on the desired outcome.

Examples

Define an example file as an in-memory bytes buffer:

>>> import io
>>> s = (
...     "animals;n_legs;entry\n"
...     "Flamingo;2;2022-03-01\n"
...     "# Comment here:\n"
...     "Horse;4;2022-03-02\n"
...     "Brittle stars;5;2022-03-03\n"
...     "Centipede;100;2022-03-04"
... )
>>> print(s)
animals;n_legs;entry
Flamingo;2;2022-03-01
# Comment here:
Horse;4;2022-03-02
Brittle stars;5;2022-03-03
Centipede;100;2022-03-04
>>> source = io.BytesIO(s.encode())

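Read the data from the file, setting the delimiter and skipping comment rows. The comment line contains no delimiter, so it parses as one column instead of three and is passed to the handler: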

>>> from pyarrow import csv
>>> def skip_comment(row):
...     if row.text.startswith("# "):
...         return 'skip'
...     else:
...         return 'error'
...
>>> parse_options = csv.ParseOptions(delimiter=";", invalid_row_handler=skip_comment)
>>> csv.read_csv(source, parse_options=parse_options)
pyarrow.Table
animals: string
n_legs: int64
entry: date32[day]
----
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
n_legs: [[2,4,5,100]]
entry: [[2022-03-01,2022-03-02,2022-03-03,2022-03-04]]

__init__(*args, **kwargs)

Methods

Attributes

delimiter: The character delimiting individual cells in the CSV data.

double_quote: Whether two quotes in a quoted CSV value denote a single quote in the data.

equals(self, ParseOptions other)

Parameters:

other (pyarrow.csv.ParseOptions)

Returns:

bool

escape_char: The character used optionally for escaping special characters (False if escaping is not allowed).

ignore_empty_lines: Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as containing a single empty value (assuming a one-column CSV file).

invalid_row_handler

Optional handler for invalid rows.

If not None, this object is called for each CSV row that fails parsing (because of a mismatching number of columns). It should accept a single InvalidRow argument and return either "skip" or "error" depending on the desired outcome.

newlines_in_values: Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.

quote_char: The character used optionally for quoting CSV values (False if quoting is not allowed).

validate(self)