- class pyarrow.csv.ParseOptions(delimiter=None, *, quote_char=None, double_quote=None, escape_char=None, newlines_in_values=None, ignore_empty_lines=None, invalid_row_handler=None)
Bases: _Weakrefable
Options for parsing CSV files.
- Parameters:
- delimiter : 1-character str, optional (default ',')
The character delimiting individual cells in the CSV data.
- quote_char : 1-character str or False, optional (default '"')
The character used to quote CSV values, or False if quoting is not allowed.
- double_quote : bool, optional (default True)
Whether two quotes in a quoted CSV value denote a single quote in the data.
- escape_char : 1-character str or False, optional (default False)
The character used to escape special characters, or False if escaping is not allowed.
- newlines_in_values : bool, optional (default False)
Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
- ignore_empty_lines : bool, optional (default True)
Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as a row containing a single empty value, which is only meaningful for a one-column CSV file.
- invalid_row_handler : callable, optional (default None)
If not None, this object is called for each CSV row that fails parsing because its number of columns does not match the expected number. It should accept a single InvalidRow argument and return either "skip" or "error" depending on the desired outcome.
Examples
Define an example in-memory file from a bytes object:
>>> import io
>>> s = (
...     "animals;n_legs;entry\n"
...     "Flamingo;2;2022-03-01\n"
...     "# Comment here:\n"
...     "Horse;4;2022-03-02\n"
...     "Brittle stars;5;2022-03-03\n"
...     "Centipede;100;2022-03-04"
... )
>>> print(s)
animals;n_legs;entry
Flamingo;2;2022-03-01
# Comment here:
Horse;4;2022-03-02
Brittle stars;5;2022-03-03
Centipede;100;2022-03-04
>>> source = io.BytesIO(s.encode())
Read the data, skipping comment rows and setting the delimiter:
>>> from pyarrow import csv
>>> def skip_comment(row):
...     if row.text.startswith("# "):
...         return 'skip'
...     else:
...         return 'error'
...
>>> parse_options = csv.ParseOptions(delimiter=";", invalid_row_handler=skip_comment)
>>> csv.read_csv(source, parse_options=parse_options)
pyarrow.Table
animals: string
n_legs: int64
entry: date32[day]
----
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
n_legs: [[2,4,5,100]]
entry: [[2022-03-01,2022-03-02,2022-03-03,2022-03-04]]
- __init__(*args, **kwargs)
Methods
Attributes
- delimiter
The character delimiting individual cells in the CSV data.
- double_quote
Whether two quotes in a quoted CSV value denote a single quote in the data.
- equals(self, ParseOptions other)
- Parameters:
other : pyarrow.csv.ParseOptions
- Returns:
equal : bool
- escape_char
The character used to escape special characters, or False if escaping is not allowed.
- ignore_empty_lines
Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as a row containing a single empty value, which is only meaningful for a one-column CSV file.
- invalid_row_handler
Optional handler for invalid rows.
If not None, this object is called for each CSV row that fails parsing because its number of columns does not match the expected number. It should accept a single InvalidRow argument and return either "skip" or "error" depending on the desired outcome.
- newlines_in_values
Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
- quote_char
The character used to quote CSV values, or False if quoting is not allowed.
- validate(self)
pyarrow.csv.ParseOptions — Apache Arrow v23.0.1