Breaking changes in 0.10.0 and future changes

As reported by several users in #148, version 0.10.0 changed how some .env files are parsed in a breaking way, especially with regard to escapes. I think some of those changes were necessary, but others may have been unexpected.

Experiment

It was unclear what exactly broke and whether python-dotenv is consistent with other parsers, so I ran an experiment to compare versions and packages.

The scripts to generate these tables are in bbc2/dotenv-parser-comparisons. I may update them if I find new interesting behavior.
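To give an idea of how such a comparison can be produced, here is a minimal sketch. It is not the code from the repository mentioned above. The two parsers (`naive_parse`, `shell_like_parse`) are hypothetical stand-ins built on the standard library, and the `foo=...` input lines are assumptions, since the tables below only show outputs. Each parsed value is printed one character at a time so that a stray backslash is easy to spot.

```python
import shlex


def naive_parse(line):
    """Hypothetical stand-in: keep everything after the first '=' verbatim."""
    _, _, value = line.partition("=")
    return value


def shell_like_parse(line):
    """Hypothetical stand-in for a shell: apply POSIX quoting rules via shlex.

    Only the first word is kept, mimicking an assignment followed by other words.
    """
    _, _, value = line.partition("=")
    tokens = shlex.split(value, posix=True)
    return tokens[0] if tokens else ""


# Parser names are illustrative only; a real comparison would call the
# actual packages and versions instead.
PARSERS = {"naive": naive_parse, "shell-like": shell_like_parse}


def compare(line):
    """Return {parser name: list of characters in the parsed value}."""
    return {name: list(parse(line)) for name, parse in PARSERS.items()}


if __name__ == "__main__":
    # Assumed inputs: unquoted escape, single-quoted escape, unescaped space.
    for case in [r"foo=a\zb", r"foo='a\zb'", "foo=a b"]:
        print(case)
        for name, chars in compare(case).items():
            print(f"  {name:<12} {' '.join(chars)}")
```

Adding a parser to the comparison would only mean registering another callable in `PARSERS`.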

Basic

parser output
python-dotenv-0.9.1 a b
python-dotenv-0.10.1 a b
bash-5.0.0 a b
js-dotenv-6.2.0 a b
ruby-dotenv-2.6.0 a b

Escaped z

parser output
python-dotenv-0.9.1 a \ \ z b
python-dotenv-0.10.1 a \ z b
bash-5.0.0 a z b
js-dotenv-6.2.0 a \ z b
ruby-dotenv-2.6.0 a \ z b

Escaped and single-quoted z

parser output
python-dotenv-0.9.1 a \ z b
python-dotenv-0.10.1 a \ z b
bash-5.0.0 a \ z b
js-dotenv-6.2.0 a \ z b
ruby-dotenv-2.6.0 a \ z b

Escaped and double-quoted z

parser output
python-dotenv-0.9.1 a \ z b
python-dotenv-0.10.1 a \ z b
bash-5.0.0 a \ z b
js-dotenv-6.2.0 a \ z b
ruby-dotenv-2.6.0 a z b

Escaped n

parser output
python-dotenv-0.9.1 a \ \ n b
python-dotenv-0.10.1 a \ n b
bash-5.0.0 a n b
js-dotenv-6.2.0 a \ n b
ruby-dotenv-2.6.0 a \ n b

Escaped and single-quoted n

parser output
python-dotenv-0.9.1 a \ n b
python-dotenv-0.10.1 a \n b
bash-5.0.0 a \ n b
js-dotenv-6.2.0 a \ n b
ruby-dotenv-2.6.0 a \ n b

Escaped and double-quoted n

parser output
python-dotenv-0.9.1 a \ n b
python-dotenv-0.10.1 a \n b
bash-5.0.0 a \ n b
js-dotenv-6.2.0 a \n b
ruby-dotenv-2.6.0 a \n b

Quoted newline

parser output
python-dotenv-0.9.1 " a
python-dotenv-0.10.1 a \n b
bash-5.0.0 a \n b
js-dotenv-6.2.0 a
ruby-dotenv-2.6.0 a \n b

Non-escaped space

parser output
python-dotenv-0.9.1 a b
python-dotenv-0.10.1 a b
bash-5.0.0 a
js-dotenv-6.2.0 a b
ruby-dotenv-2.6.0 a b

Non-escaped #

parser output
python-dotenv-0.9.1 a # b
python-dotenv-0.10.1 a
bash-5.0.0 a # b
js-dotenv-6.2.0 a # b
ruby-dotenv-2.6.0 a

Non-escaped spaced #

parser output
python-dotenv-0.9.1 a # b
python-dotenv-0.10.1 a
bash-5.0.0 a
js-dotenv-6.2.0 a # b
ruby-dotenv-2.6.0 a

Escaped #

parser output
python-dotenv-0.9.1 a # b
python-dotenv-0.10.1 a # b
bash-5.0.0 a # b
js-dotenv-6.2.0 a # b
ruby-dotenv-2.6.0 a # b

UTF-8

parser output
python-dotenv-0.9.1 \ x e 9
python-dotenv-0.10.1 é
bash-5.0.0 é
js-dotenv-6.2.0 é
ruby-dotenv-2.6.0 é

Quoted UTF-8

parser output
python-dotenv-0.9.1 é
python-dotenv-0.10.1 é
bash-5.0.0 é
js-dotenv-6.2.0 é
ruby-dotenv-2.6.0 é

Conclusion

  1. Non-quoted escapes (valid or invalid): All up-to-date parsers except Bash do the same thing and keep the \. Bash removes the \. python-dotenv 0.9.1 added an extra \, which was obviously incorrect.
  2. Single-quoted invalid escapes: Everything is fine.
  3. Single-quoted valid escapes: 0.10.1 is the only parser that interprets them as control characters.
  4. Double-quoted invalid escapes: Everything is fine except for Ruby, which removes the \.
  5. Double-quoted valid escapes: All up-to-date parsers except Bash do the same thing. Bash and 0.9.1 keep the original characters instead of interpreting them as control characters.
  6. Pound sign #: Interpreted as a comment delimiter since 0.10.0 if unquoted, even if there is no whitespace preceding it. When quoted or prefixed with whitespace, everything is fine except for JavaScript.
  7. Non-quoted UTF-8: Fixed in 0.10.0. When quoted, everything is fine.
  8. Non-escaped space: Only Bash ignores everything after it (or treats the rest as a command). Other parsers include everything until the end of the line.

My opinion:

  • (2, 4, 7) are OK.
  • (1, 5, 8) are where Bash differs from other parsers. It isn't obvious what we should do.
  • (3, 6) are where python-dotenv 0.10.0 is quite obviously broken and should be fixed.
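
To make the behavior argued for in (3) and (6) concrete, here is a rough sketch of a value parser that follows it. It is not python-dotenv's implementation: `parse_value` and `_ESCAPES` are hypothetical names, the set of "valid" escapes is an assumption, and trailing comments after a quoted value are deliberately not handled. Single-quoted values stay literal, double-quoted values have valid escapes decoded and invalid ones kept, and in unquoted values `#` only starts a comment when whitespace precedes it.

```python
import re

# Escapes treated as "valid" in this sketch; the exact set is an assumption.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}


def _decode_double_quoted(body):
    """Decode valid escapes; keep invalid ones such as \\z verbatim."""
    def repl(match):
        char = match.group(1)
        return _ESCAPES.get(char, "\\" + char)

    return re.sub(r"\\(.)", repl, body, flags=re.DOTALL)


def parse_value(raw):
    """Parse the right-hand side of a KEY=value line (simplified)."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == "'":
        # Single quotes: everything is literal, including \n.
        return raw[1:-1]
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        # Double quotes: interpret valid escapes only.
        return _decode_double_quoted(raw[1:-1])
    # Unquoted: backslashes are literal, and '#' starts a comment only
    # when it is preceded by whitespace.
    return re.split(r"\s+#", raw, maxsplit=1)[0].rstrip()
```

With these rules, `a#b` keeps its `#`, while `a #b` is cut down to `a`. Whether unquoted escapes and spaces should instead follow Bash (points 1, 5 and 8) is left open here, as it is above.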