[Task]: Spark runner flatMap output should not be required to fit in the memory
What needs to happen?
Currently, on the Spark runner, if a single processElement call produces multiple output elements, they all need to fit in memory [1]. This is problematic, e.g., for ParquetIO, which uses a DoFn instead of a Source<>-based read and lets the reader inside the DoFn push all elements to the output. The same happens with JdbcIO, as was discussed here [2].
The goal is to remove this constraint and allow a DoFn to produce large output on the Spark runner.
[2] https://www.mail-archive.com/dev@beam.apache.org/msg16806.html
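The constraint can be illustrated with a stdlib-only Java sketch. It is not Beam or Spark runner code. `processElement` here is a hypothetical push-style producer standing in for a DoFn. The bounded `ArrayBlockingQueue` handoff to a pull-style `Iterator` is just one possible way, assumed for illustration, to let downstream code consume a large output without holding all of it in memory at once.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class BoundedOutputSketch {

    // Hypothetical push-style producer standing in for a DoFn's processElement:
    // for one input it pushes `count` outputs to the given consumer.
    static void processElement(long count, Consumer<Long> output) {
        for (long i = 0; i < count; i++) {
            output.accept(i);
        }
    }

    // Marker object signalling that the producer has finished.
    private static final Object END = new Object();

    // Bridges the push-style producer to a pull-style Iterator. At most `capacity`
    // outputs are buffered; the producer thread blocks while the queue is full.
    // Sketch only: exceptions thrown by processElement are not propagated.
    static Iterator<Long> boundedOutputs(long count, int capacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                processElement(count, value -> {
                    try {
                        queue.put(value); // blocks until the consumer makes room
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(e);
                    }
                });
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        return new Iterator<Long>() {
            private Object nextItem = null;

            // Takes the next item from the queue once and caches it.
            private Object peek() {
                if (nextItem == null) {
                    try {
                        nextItem = queue.take();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(e);
                    }
                }
                return nextItem;
            }

            @Override
            public boolean hasNext() {
                return peek() != END;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Long value = (Long) nextItem;
                nextItem = null;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        // One million outputs flow through a queue that never holds more than 16.
        Iterator<Long> it = boundedOutputs(1_000_000L, 16);
        long n = 0;
        long sum = 0;
        while (it.hasNext()) {
            sum += it.next();
            n++;
        }
        System.out.println(n + " " + sum); // prints "1000000 499999500000"
    }
}
```

The trade-off in this sketch is one extra thread per processElement call in exchange for memory bounded by `capacity` instead of by the number of outputs. A real runner change would also need error propagation and cancellation, which are deliberately omitted here.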
Issue Priority
Priority: 2
Issue Component
Component: runner-spark