Issue 33850: Json.dump() bug when using generator

Created on 2018-06-13 08:44 by biloup, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)

msg319436 - Author: Clément Boyer (biloup) Date: 2018-06-13 08:44
I use a class to easily write JSON from a generator.
```python
class StreamArray(list):
    def __init__(self, generator):
        super().__init__()
        self.generator = generator
    def __iter__(self):
        return self.generator
    def __len__(self):
        return 1
```
Below is a test comparing json.dump and json.dumps.
```
>>> import json
>>> class StreamArray(list):
...     def __init__(self, generator):
...         super().__init__()
...         self.generator = generator
...     def __iter__(self):
...         return self.generator
...     def __len__(self):
...         return 1
... 
>>> g = (i for i in range(0))
>>> json.dumps({"a": StreamArray(g)})
'{"a": []}'
>>> f = open("/tmp/test.json", "w+")
>>> g = (i for i in range(0))
>>> json.dump({"a": StreamArray(g)}, f)
>>> f.close()
>>> print(open("/tmp/test.json").read())
{"a": ]}
```
I don't know if it's me or if there is actually a problem. Could you help me?
msg319442 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-13 09:44
You should ask your question on this mailing list: https://mail.python.org/mailman/listinfo/python-list

The bug tracker is not a place for asking how to use Python. If you actually find a bug in Python, you can re-open this issue. I do not believe that what you show here is a Python bug.

Good luck!
msg319459 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2018-06-13 13:00
The problem here is that StreamArray lies about the length of the iterator. This confuses json.encoder._make_iterencode._iterencode_list() (which is called by json.dump()), because it first checks "if not lst" and then assumes that the loop will be entered at least once.

(Note that json.dumps() doesn't have that problem, because it calls JSONEncoder.encode(), which calls iterencode() with _one_shot=True, and that leads to a totally different code path.)
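
To illustrate the "lies about the length" point, here is a minimal sketch of a wrapper whose `__len__` is honest about emptiness. It assumes that only the truthiness of the list matters to the check described above. The names `HonestStreamArray` and `dump_to_string` are hypothetical and not part of the original report or of the json module.

```python
import io
import itertools
import json


class HonestStreamArray(list):
    """Hypothetical variant of StreamArray whose __len__ reflects emptiness."""

    def __init__(self, iterable):
        super().__init__()
        iterator = iter(iterable)
        try:
            # Peek at the first item to learn whether the source is empty.
            first = next(iterator)
        except StopIteration:
            self._empty = True
            self._iterator = iter(())
        else:
            self._empty = False
            # Put the peeked item back in front of the remaining items.
            self._iterator = itertools.chain([first], iterator)

    def __iter__(self):
        return self._iterator

    def __len__(self):
        # 0 for an empty source, 1 otherwise (assumption: only truthiness is used).
        return 0 if self._empty else 1


def dump_to_string(obj):
    """Run json.dump() against an in-memory buffer and return the text."""
    buffer = io.StringIO()
    json.dump(obj, buffer)
    return buffer.getvalue()
```

With this variant, an empty generator makes the "if not lst" check succeed, so json.dump() should no longer reach the loop with nothing to iterate over. Like the original class, the wrapper can only be serialized once.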

We could declare that bug as "don't do that then", but the problem is easily solvable, because we can check whether the loop was entered. The attached patch should do the trick.
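
The idea of checking whether the loop was entered can be sketched outside the json module as follows. This is not the attached patch, only a standalone illustration; the function name `iterencode_array` is hypothetical, and each item is encoded with plain `json.dumps()` for simplicity.

```python
import json


def iterencode_array(items):
    """Yield JSON text chunks for an iterable without trusting its len()."""
    entered = False
    for item in items:
        # Open the bracket before the first item, use a separator afterwards.
        yield ("[" if not entered else ", ") + json.dumps(item)
        entered = True
    # If the loop body never ran, no "[" has been emitted yet.
    yield "]" if entered else "[]"


# Usage: join the chunks, or write them to a file one by one.
print("".join(iterencode_array(i for i in range(3))))
print("".join(iterencode_array(i for i in range(0))))
```

Because the opening bracket is only emitted inside the loop, an empty iterable can never produce a lone "]".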

An even better approach would IMHO be for the encoder to support a special flag that enables JSON serialization of generators directly, so that it's no longer required to masquerade generators as lists.
msg319461 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-13 13:12
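
No such flag is shown in this thread. As a rough workaround that needs no masquerading, a `json.JSONEncoder` subclass can override `default()` to handle generators. The sketch below (the class name `GeneratorEncoder` is hypothetical) materializes the generator into a list, so unlike the proposed flag it does not stream items one by one.

```python
import json
import types


class GeneratorEncoder(json.JSONEncoder):
    """Hypothetical encoder that accepts generator objects as JSON arrays."""

    def default(self, o):
        if isinstance(o, types.GeneratorType):
            # Materialize the generator; this gives up streaming.
            return list(o)
        # Let the base class raise TypeError for unsupported types.
        return super().default(o)


print(json.dumps({"a": (i for i in range(3))}, cls=GeneratorEncoder))
print(json.dumps({"a": (i for i in range(0))}, cls=GeneratorEncoder))
```

The same `cls=GeneratorEncoder` argument can be passed to json.dump() when writing to a file.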
This is a duplicate of issue27613.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:01 | admin | set | github: 78031 |
| 2018-06-13 13:12:52 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg319461; resolution: duplicate; superseder: Empty iterator with fake __len__ is rendered as a single bracket ] when using json's iterencode |
| 2018-06-13 13:00:02 | doerwalter | set | status: closed -> open; files: + json-dump-generators-bug.diff; keywords: + patch; nosy: + doerwalter; messages: + msg319459; resolution: not a bug -> (no value) |
| 2018-06-13 09:44:58 | eric.smith | set | status: open -> closed; nosy: + eric.smith; messages: + msg319442; resolution: not a bug; stage: resolved |
| 2018-06-13 08:44:21 | biloup | create | |