[WIP][POC] Use pipelining mode by abenhamdine · Pull Request #2706 · brianc/node-postgres
Work in progress to adress #2646
Base on the previous work of @jusou here #662, thx to him !
Principle
In current behaviour of node-postgres, a query is not sent to the database until the previous query has completed and the associated result received.
With this PR, pipelining mode can be enabled, it allows the client to send all of the queries to the server up front, minimizing time spent by one side waiting for the other to finish sending data.
Important : pipelining has nothing to do with query execution concurrency/parallelism. With or without pipelining mode, the PostgreSQL server is executing the queries sequentially (while using parallelism capabilities if enabled), pipelining just allows both sides of the connection to work concurrently when possible, and to minimize round-trip time.
The higher the latency is between the driver and the posgres serveur and the number of requests sent, the higher the performance gain will be.
API changes
- new property
client.pipeliningcan be set to true to enable pipelining mode (default false)
const client = await pg.connect() client.pipelining = true
Implementation
- add tests
- add unit tests
- queries ready to emit
- other ?
- add integration tests
- TBD
- add unit tests
- currently it adress only js driver, not native one (I know nothing about the native driver, so I doubt I will be able to adress that)
- prepared statements are broken (probably because of race conditions in parsing), need to be fixed
- the default behaviour has to be defined (probably optout until we are sure it's safe) false by default in this PR
- allow to optin (or optout, depending on the default) at the pool/client level client.pipelining = true
While it's not finished, we are currently testing it with our product, with massive usage of queries (50 tasks in parallel sending each thousands of queries) and it seems to work (except of prepared statements).
I'm of course open to suggestions/review but I'm very busy nowadays thus don't be frustrated if I don't answer to remarks in a decent delay.
More details concerning the issue with the prepared statements broken in the PR :
If we send immediatly 2 queries with the same named Foo, we got the error Prepared statement Foo already exists
Looks like the driver try to prepare the statement twice, probably because the event parseComplete has not been received yet for the 1st query, when the 2nd query is sent.
I'm a bit confused about how to find the proper place to fix it.
First I thought that it would be simple to debounce the parsing in
by checking if connection.preparedStatements[this.name] exists
but then I realized this check is already performed here :
| if (!this.hasBeenParsed(connection)) { |
so I don't truly understand why this error occurs 🤔
If someone can help/give some hints, it would be useful !
Perf gain in our use case (⚠️ YMMV)
In our real application (payroll processing) :
165 tasks, concurrency between 1 and 40 :
distributed on 4 clients
each task is sending thousands of DML queries, mostly SELECT
gain : 1 second (no gain expected because of no concurrency)
concurrency 1 w/o pipeline : 2m28s
concurrency 1 with pipeline : 2m27s
gain : 18 seconds
concurrency 5 w/o pipeline : 2m20s
concurrency 5 with pipeline : 2m02s
gain : 25 seconds (test n°2 : 26 seconds)
concurrency 20 w/o pipeline : 2m22s (test 2 : 2m24s)
concurrency 20 with pipeline : 1m57s (test 2 : 1m58s)
gain : 27 seconds (test n°2 : 28 seconds)
concurrency 40 w/o pipeline : 2m20s (test 2 2m27s)
concurrency 40 with pipeline : 1m53s (test 2 1m59s
Conclusion :
Very interesting gain of 20% with concurrency 40 (our default in production)
It's also important to note that we have disabled prepared statements with pipelining mode due to the bug, so
the gain is perhaps even greater (by <5% more I would say).