Add support for newlines, backslashes, trailing comments and unquoted UTF-8 by bbc2 · Pull Request #148 · theskumar/python-dotenv

@bbc2 deleted the improve-parser branch on December 5, 2018 at 21:13.

johnbergvall pushed a commit to johnbergvall/python-dotenv that referenced this pull request

Aug 13, 2021
… UTF-8 (theskumar#148)

* Fix deprecation warning for POSIX variable regex

This was also caught by Flake8 as:

    ./dotenv/main.py:19:2: W605 invalid escape sequence '\$'
    ./dotenv/main.py:19:4: W605 invalid escape sequence '\{'
    ./dotenv/main.py:19:8: W605 invalid escape sequence '\}'
    ./dotenv/main.py:19:12: W605 invalid escape sequence '\}'
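
The usual fix for W605 is to write the pattern as a raw string, so that the backslashes reach the regex engine untouched. A minimal sketch, assuming a pattern that matches `${NAME}`-style references (the pattern and the names `POSIX_VARIABLE` and `find_variables` are illustrative, not copied from `dotenv/main.py`):

```python
import re

# Assumed pattern for ${NAME}-style references; not copied from dotenv/main.py.
# The r"" prefix keeps \$, \{ and \} as regex escapes instead of (invalid)
# string escapes, which is what W605 complains about.
POSIX_VARIABLE = re.compile(r"\$\{[^\}]*\}")


def find_variables(value):
    """Return every ${...} reference found in value, in order."""
    return POSIX_VARIABLE.findall(value)
```

For example, `find_variables("${HOME}/bin:${PATH}")` returns the two references in order.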

* Turn get_stream into a context manager

This avoids the use of the `is_file` class variable by abstracting away
the difference between `StringIO` and a file stream.
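
One way such a context manager could look is sketched below. This is an illustration under assumptions, not the code from this commit: the function takes the source as an argument, and a missing file is treated as empty input.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def get_stream(source):
    """Yield a readable text stream for a StringIO or a file path.

    Sketch only: the argument name and the empty-stream fallback are
    assumptions made for this example.
    """
    if isinstance(source, io.StringIO):
        # Already a stream: hand it over and leave closing to the caller.
        yield source
    elif os.path.isfile(source):
        # A real file: open it here so it is closed when the block exits.
        with io.open(source, encoding="utf-8") as stream:
            yield stream
    else:
        # Nothing to read: behave like an empty file.
        yield io.StringIO("")
```

Callers then write `with get_stream(source) as stream:` in both cases and never need a flag telling them whether the stream has to be closed.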

* Deduplicate parsing code and abstract away lines

Parsing .env files is a critical part of this package.  To make the
parser easier to change and test, parsing must happen in only one
place.

Also, code that uses the parser no longer depends on each key-value
binding spanning exactly one line.  This will make it easier to handle
multiline bindings in the future.
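
The idea can be sketched as a single generator that yields bindings, with every consumer iterating over bindings rather than lines. The `Binding` tuple, `as_dict` and `set_key` below are hypothetical names for this example, and the parser body is a deliberately naive line-based stand-in. Consumers would stay the same if it were swapped for a parser that reads multiline bindings.

```python
import io
from collections import namedtuple

# Hypothetical record: `original` keeps the raw text so that writers can
# copy untouched bindings back verbatim.
Binding = namedtuple("Binding", ["key", "value", "original"])


def parse_stream(stream):
    """The only place that knows the file syntax (naive stand-in here)."""
    for line in stream:
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            # Blank lines, comments and junk are passed through without a key.
            yield Binding(key=None, value=None, original=line)
            continue
        key, _, value = stripped.partition("=")
        yield Binding(key=key.strip(), value=value.strip(), original=line)


def as_dict(stream):
    """Reader: collect all bindings that have a key."""
    return {b.key: b.value for b in parse_stream(stream) if b.key is not None}


def set_key(stream, key, value):
    """Writer: return new content with `key` set, other text unchanged."""
    out = []
    replaced = False
    for b in parse_stream(stream):
        if b.key == key:
            out.append("{}={}\n".format(key, value))
            replaced = True
        else:
            out.append(b.original)
    if not replaced:
        out.append("{}={}\n".format(key, value))
    return "".join(out)
```

Neither `as_dict` nor `set_key` splits text into lines itself, so neither would need to change when a binding starts spanning several lines.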

* Parse newline, UTF-8, trailing comment, backslash

This adds support for:

* multiline values (i.e. containing newlines or escaped \n), fixes theskumar#89
* backslashes in values, fixes theskumar#112
* trailing comments, fixes theskumar#141
* UTF-8 in unquoted values, fixes theskumar#147

Parsing is no longer line-based.  That's why `parse_line` was replaced
by `parse_binding`.  Thanks to the previous commit, users of
`parse_stream` don't have to deal with this change.
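
To illustrate what parsing by binding rather than by line means, here is a rough sketch that matches one binding at a time against the whole text. The regular expression and the names `_BINDING`, `_unescape` and `parse_text` are assumptions made for this example and are much simpler than a real parser. In particular, applying escapes only to quoted values is a choice made here, not a statement about the library.

```python
import codecs
import re

# Simplified, assumed grammar: KEY = value, optional trailing comment.
_BINDING = re.compile(
    r"""
    [ \t]*(?P<key>[^=\#\s]+)[ \t]*=[ \t]*   # key and equals sign
    (?:
        '(?P<single>(?:\\'|[^'])*)'          # single-quoted, may span lines
      | "(?P<double>(?:\\"|[^"])*)"          # double-quoted, may span lines
      | (?P<bare>[^\#\r\n]*)                 # unquoted, any non-comment text
    )
    [ \t]*(?:\#[^\r\n]*)?                    # optional trailing comment
    (?:\r\n|\n|\r|$)                         # end of the binding
    """,
    re.VERBOSE,
)


def _unescape(value):
    """Decode a few single-character escapes such as \\n and \\\\."""
    return re.sub(
        r"\\[\\'\"abfnrtv]",
        lambda m: codecs.decode(m.group(0), "unicode-escape"),
        value,
    )


def parse_text(text):
    """Return a dict of bindings found in text, skipping unparsable lines."""
    result = {}
    pos = 0
    while pos < len(text):
        match = _BINDING.match(text, pos)
        if match is None:
            # Not a binding (comment, blank line, junk): skip to the next line.
            newline = text.find("\n", pos)
            pos = len(text) if newline == -1 else newline + 1
            continue
        if match.group("single") is not None:
            value = _unescape(match.group("single"))
        elif match.group("double") is not None:
            value = _unescape(match.group("double"))
        else:
            value = match.group("bare").rstrip()
        result[match.group("key")] = value
        pos = match.end()
    return result
```

Because the match position advances by whole bindings, a quoted value containing a literal newline is consumed in one step instead of being cut at the line break.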

This supersedes a previous pull request, theskumar#142, which would have added
support for multiline values in `Dotenv.parse` but not in the CLI (`dotenv get`
and `dotenv set`).

The key-value binding regular expression was inspired by
https://github.com/bkeepers/dotenv/blob/d749366b6009126b115fb7b63e0509566365859a/lib/dotenv/parser.rb#L14-L30

Parsing of escapes was fixed thanks to
https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python/24519338#24519338
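
The approach described in that answer is to decode only the substrings that look like escape sequences, so that non-ASCII characters elsewhere in the value are not mangled by the `unicode-escape` codec. A sketch along those lines follows. The names `ESCAPE_SEQUENCE_RE` and `decode_escapes` are illustrative, and the code is not claimed to be identical to what was merged.

```python
import codecs
import re

# Only these substrings are handed to the codec; everything else is kept.
ESCAPE_SEQUENCE_RE = re.compile(
    r"""
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # single-character escapes
    )
    """,
    re.UNICODE | re.VERBOSE,
)


def decode_escapes(value):
    """Replace backslash escape sequences in value with their characters."""

    def decode_match(match):
        return codecs.decode(match.group(0), "unicode-escape")

    return ESCAPE_SEQUENCE_RE.sub(decode_match, value)
```

Decoding the whole string with `unicode-escape` would instead treat the input as Latin-1 and corrupt unquoted UTF-8 text, which is why the replacement is applied match by match.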