bigquery: please support non-legacy-sql nested data type, and preserve order in json
report in googleapis/google-cloud-node#1593 (comment) from a nodejs app, but since my app can also be written in Python I have a test of Python BigQuery API as well, it seems have a same problem:
$ bq.py --format=prettyjson query --nouse_legacy_sql 'SELECT ARRAY[STRUCT("zx83" AS id, 1 AS a, 2 AS b), ("f8f7", 4, 7)] d' |& cat -An ... 2 [$ 3 {$ 4 "d": [$ 5 {$ 6 "a": "1", $ 7 "b": "2", $ 8 "id": "zx83"$ 9 }, $ 10 {$ 11 "a": "4", $ 12 "b": "7", $ 13 "id": "f8f7"$ 14 }$ 15 ]$ 16 }$ 17 ]$
the problems:
- nested value types are not interpreted, the value types of a and b can be inferred as Integers (from bigquery api response, there are schema including nested field's data type), should be converted to python integers; (the way to fix will need somewhat recursively interpret data types)
- keys order are not preserved, someone here might argue following reasons but there are ways workaround:
- JSON object's key order doesn't matter to machine; but when we print a prettyjson format, it's for human, and keys order does matter
- python's default
json.loadsconvert a JSON object to python dict which uses key's internal hash code somewhat unpredictable order, way to fix is to load withOrderedDict, and drop sort_keys parameter when callingjson.dumps
- trailing space in object key value pair lines, this is trivial and some Python JSON API's own problem, there are many careless python doing this including this project; please read below expected output, can be fixed by
json.dumps(..., separators=(',', ': '))
many tools written in Python suffered the same problem, like what I commented in the popular httpie tool: httpie/cli#427 (comment)
HOW to Fix 2 & 3
In [1]: import json In [2]: from collections import OrderedDict In [3]: data = json.loads( '{"d": [ { "id": "zx83", "a": 1, "b": 2}, { "id": "f8f7", "a": 4, "b": 7 } ]}', object_pairs_hook=OrderedDict) In [4]: data Out[4]: OrderedDict([(u'd', [OrderedDict([(u'id', u'zx83'), (u'a', 1), (u'b', 2)]), OrderedDict([(u'id', u'f8f7'), (u'a', 4), (u'b', 7)])])]) In [5]: json.dumps(data, indent=2, separators=(',', ': ')) Out[5]: '{\n "d": [\n {\n "id": "zx83",\n "a": 1,\n "b": 2\n },\n {\n "id": "f8f7",\n "a": 4,\n "b": 7\n }\n ]\n}' In [6]: print json.dumps(data, indent=2, separators=(',', ': ')) { "d": [ { "id": "zx83", "a": 1, "b": 2 }, { "id": "f8f7", "a": 4, "b": 7 } ] }
Expected output:
2 [$ 3 {$ 4 "d": [$ 5 {$ 6 "id": "zx83",$ 7 "a": 1,$ 8 "b": 2$ 9 },$ 10 {$ 11 "id": "f8f7",$ 12 "a": 4,$ 13 "b": 7$ 14 }$ 15 ]$ 16 }$ 17 ]$