querystring: improve parse() and escape() performance by mscdex · Pull Request #5012 · nodejs/node

@mscdex added the querystring

Issues and PRs related to the built-in querystring module.

label

Jan 31, 2016
This commit improves parse() performance by ~20-200% with the various
querystring-parse benchmarks.

Some optimization strategies used in this commit include:
* Combining multiple searches (for '&', '=', and '+') on the same
   string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the
   default string decoder is being used
Before this, v8 would deopt when an out of bounds `inIndex` would get
passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we
directly emulate that behavior as well.

Also, calls to charCodeAt() for constant strings have been replaced
by the raw character codes and parser state is now stored as an
integer instead of a string. Both of these provide a slight
performance increase.
This commit improves escape() performance by up to 15% with the
existing querystring-stringify benchmarks by reducing the number
of string concatentations. A potential deopt is also avoided by
making sure the index passed to charCodeAt() is within bounds.

mscdex added a commit that referenced this pull request

Feb 13, 2016
This commit improves parse() performance by ~20-200% with the various
querystring-parse benchmarks.

Some optimization strategies used in this commit include:
* Combining multiple searches (for '&', '=', and '+') on the same
   string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the
   default string decoder is being used

PR-URL: #5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

mscdex added a commit that referenced this pull request

Feb 13, 2016
Before this, v8 would deopt when an out of bounds `inIndex` would get
passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we
directly emulate that behavior as well.

Also, calls to charCodeAt() for constant strings have been replaced
by the raw character codes and parser state is now stored as an
integer instead of a string. Both of these provide a slight
performance increase.

PR-URL: #5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

mscdex added a commit that referenced this pull request

Feb 13, 2016
This commit improves escape() performance by up to 15% with the
existing querystring-stringify benchmarks by reducing the number
of string concatentations. A potential deopt is also avoided by
making sure the index passed to charCodeAt() is within bounds.

PR-URL: #5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

@mscdex mscdex deleted the perf-querystring branch

February 13, 2016 01:33

rvagg pushed a commit that referenced this pull request

Feb 15, 2016
This commit improves parse() performance by ~20-200% with the various
querystring-parse benchmarks.

Some optimization strategies used in this commit include:
* Combining multiple searches (for '&', '=', and '+') on the same
   string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the
   default string decoder is being used

PR-URL: #5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

rvagg pushed a commit that referenced this pull request

Feb 15, 2016
Before this, v8 would deopt when an out of bounds `inIndex` would get
passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we
directly emulate that behavior as well.

Also, calls to charCodeAt() for constant strings have been replaced
by the raw character codes and parser state is now stored as an
integer instead of a string. Both of these provide a slight
performance increase.

PR-URL: #5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

rvagg pushed a commit that referenced this pull request

Feb 15, 2016
This commit improves escape() performance by up to 15% with the
existing querystring-stringify benchmarks by reducing the number
of string concatentations. A potential deopt is also avoided by
making sure the index passed to charCodeAt() is within bounds.

PR-URL: #5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

@rvagg rvagg mentioned this pull request

Feb 18, 2016

rvagg added a commit that referenced this pull request

Feb 21, 2016
* buffer:
  - You can now supply an encoding argument when filling a
    Buffer Buffer#fill(string[, start[, end]][, encoding]), supplying
    an existing Buffer will also work with
    Buffer#fill(buffer[, start[, end]]). See the API documentation for
    details on how this works. (Trevor Norris) #4935
  - Buffer#indexOf() no longer requires a byteOffset argument if you
    also wish to specify an encoding:
    Buffer#indexOf(val[, byteOffset][, encoding]).
    (Trevor Norris) #4803
* child_process: spawn() and spawnSync() now support a 'shell' option
  to allow for optional execution of the given command inside a shell.
  If set to true, cmd.exe will be used on Windows and /bin/sh
  elsewhere. A path to a custom shell can also be passed to override
  these defaults. On Windows, this option allows .bat. and .cmd files
  to be executed with spawn() and spawnSync(). (Colin Ihrig) #4598
* http_parser: Update to http-parser 2.6.2 to fix an unintentionally
  strict limitation of allowable header characters.
  (James M Snell) #5237
* dgram: socket.send() now supports accepts an array of Buffers or
  Strings as the first argument. See the API docs for details on how
  this works. (Matteo Collina) #4374
* http: Fix a bug where handling headers will mistakenly trigger an
  'upgrade' event where the server is just advertising its protocols.
  This bug can prevent HTTP clients from communicating with HTTP/2
  enabled servers. (Fedor Indutny) #4337
* net: Added a listening Boolean property to net and http servers to
  indicate whether the server is listening for connections.
  (José Moreira) #4743
* node: The C++ node::MakeCallback() API is now reentrant and calling
  it from inside another MakeCallback() call no longer causes the
  nextTick queue or Promises microtask queue to be processed out of
  order. (Trevor Norris) #4507
* tls: Add a new tlsSocket.getProtocol() method to get the negotiated
  TLS protocol version of the current connection. (Brian White) #4995
* vm: Introduce new 'produceCachedData' and 'cachedData' options to
  new vm.Script() to interact with V8's code cache. When a new
  vm.Script object is created with the 'produceCachedData' set to true
  a Buffer with V8's code cache data will be produced and stored in
  cachedData property of the returned object. This data in turn may be
  supplied back to another vm.Script() object with a 'cachedData'
  option if the supplied source is the same. Successfully executing a
  script from cached data can speed up instantiation time. See the API
  docs for details. (Fedor Indutny) #4777
* performance: Improvements in:
  - process.nextTick() (Ruben Bridgewater) #5092
  - path module (Brian White) #5123
  - querystring module (Brian White) #5012
  - streams module when processing small chunks (Matteo Collina) #4354

rvagg added a commit that referenced this pull request

Feb 21, 2016
* buffer:
  - You can now supply an encoding argument when filling a
    Buffer Buffer#fill(string[, start[, end]][, encoding]), supplying
    an existing Buffer will also work with
    Buffer#fill(buffer[, start[, end]]). See the API documentation for
    details on how this works. (Trevor Norris) #4935
  - Buffer#indexOf() no longer requires a byteOffset argument if you
    also wish to specify an encoding:
    Buffer#indexOf(val[, byteOffset][, encoding]).
    (Trevor Norris) #4803
* child_process: spawn() and spawnSync() now support a 'shell' option
  to allow for optional execution of the given command inside a shell.
  If set to true, cmd.exe will be used on Windows and /bin/sh
  elsewhere. A path to a custom shell can also be passed to override
  these defaults. On Windows, this option allows .bat. and .cmd files
  to be executed with spawn() and spawnSync(). (Colin Ihrig) #4598
* http_parser: Update to http-parser 2.6.2 to fix an unintentionally
  strict limitation of allowable header characters.
  (James M Snell) #5237
* dgram: socket.send() now supports accepts an array of Buffers or
  Strings as the first argument. See the API docs for details on how
  this works. (Matteo Collina) #4374
* http: Fix a bug where handling headers will mistakenly trigger an
  'upgrade' event where the server is just advertising its protocols.
  This bug can prevent HTTP clients from communicating with HTTP/2
  enabled servers. (Fedor Indutny) #4337
* net: Added a listening Boolean property to net and http servers to
  indicate whether the server is listening for connections.
  (José Moreira) #4743
* node: The C++ node::MakeCallback() API is now reentrant and calling
  it from inside another MakeCallback() call no longer causes the
  nextTick queue or Promises microtask queue to be processed out of
  order. (Trevor Norris) #4507
* tls: Add a new tlsSocket.getProtocol() method to get the negotiated
  TLS protocol version of the current connection. (Brian White) #4995
* vm: Introduce new 'produceCachedData' and 'cachedData' options to
  new vm.Script() to interact with V8's code cache. When a new
  vm.Script object is created with the 'produceCachedData' set to true
  a Buffer with V8's code cache data will be produced and stored in
  cachedData property of the returned object. This data in turn may be
  supplied back to another vm.Script() object with a 'cachedData'
  option if the supplied source is the same. Successfully executing a
  script from cached data can speed up instantiation time. See the API
  docs for details. (Fedor Indutny) #4777
* performance: Improvements in:
  - process.nextTick() (Ruben Bridgewater) #5092
  - path module (Brian White) #5123
  - querystring module (Brian White) #5012
  - streams module when processing small chunks (Matteo Collina) #4354

rvagg added a commit that referenced this pull request

Feb 23, 2016
* buffer:
  - You can now supply an encoding argument when filling a
    Buffer Buffer#fill(string[, start[, end]][, encoding]), supplying
    an existing Buffer will also work with
    Buffer#fill(buffer[, start[, end]]). See the API documentation for
    details on how this works. (Trevor Norris) #4935
  - Buffer#indexOf() no longer requires a byteOffset argument if you
    also wish to specify an encoding:
    Buffer#indexOf(val[, byteOffset][, encoding]).
    (Trevor Norris) #4803
* child_process: spawn() and spawnSync() now support a 'shell' option
  to allow for optional execution of the given command inside a shell.
  If set to true, cmd.exe will be used on Windows and /bin/sh
  elsewhere. A path to a custom shell can also be passed to override
  these defaults. On Windows, this option allows .bat. and .cmd files
  to be executed with spawn() and spawnSync(). (Colin Ihrig) #4598
* http_parser: Update to http-parser 2.6.2 to fix an unintentionally
  strict limitation of allowable header characters.
  (James M Snell) #5237
* dgram: socket.send() now supports accepts an array of Buffers or
  Strings as the first argument. See the API docs for details on how
  this works. (Matteo Collina) #4374
* http: Fix a bug where handling headers will mistakenly trigger an
  'upgrade' event where the server is just advertising its protocols.
  This bug can prevent HTTP clients from communicating with HTTP/2
  enabled servers. (Fedor Indutny) #4337
* net: Added a listening Boolean property to net and http servers to
  indicate whether the server is listening for connections.
  (José Moreira) #4743
* node: The C++ node::MakeCallback() API is now reentrant and calling
  it from inside another MakeCallback() call no longer causes the
  nextTick queue or Promises microtask queue to be processed out of
  order. (Trevor Norris) #4507
* tls: Add a new tlsSocket.getProtocol() method to get the negotiated
  TLS protocol version of the current connection. (Brian White) #4995
* vm: Introduce new 'produceCachedData' and 'cachedData' options to
  new vm.Script() to interact with V8's code cache. When a new
  vm.Script object is created with the 'produceCachedData' set to true
  a Buffer with V8's code cache data will be produced and stored in
  cachedData property of the returned object. This data in turn may be
  supplied back to another vm.Script() object with a 'cachedData'
  option if the supplied source is the same. Successfully executing a
  script from cached data can speed up instantiation time. See the API
  docs for details. (Fedor Indutny) #4777
* performance: Improvements in:
  - process.nextTick() (Ruben Bridgewater) #5092
  - path module (Brian White) #5123
  - querystring module (Brian White) #5012
  - streams module when processing small chunks (Matteo Collina) #4354

PR-URL: #5295

stefanmb pushed a commit to stefanmb/node that referenced this pull request

Feb 23, 2016
This commit improves parse() performance by ~20-200% with the various
querystring-parse benchmarks.

Some optimization strategies used in this commit include:
* Combining multiple searches (for '&', '=', and '+') on the same
   string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the
   default string decoder is being used

PR-URL: nodejs#5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

stefanmb pushed a commit to stefanmb/node that referenced this pull request

Feb 23, 2016
Before this, v8 would deopt when an out of bounds `inIndex` would get
passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we
directly emulate that behavior as well.

Also, calls to charCodeAt() for constant strings have been replaced
by the raw character codes and parser state is now stored as an
integer instead of a string. Both of these provide a slight
performance increase.

PR-URL: nodejs#5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

stefanmb pushed a commit to stefanmb/node that referenced this pull request

Feb 23, 2016
This commit improves escape() performance by up to 15% with the
existing querystring-stringify benchmarks by reducing the number
of string concatentations. A potential deopt is also avoided by
making sure the index passed to charCodeAt() is within bounds.

PR-URL: nodejs#5012
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

rvagg added a commit that referenced this pull request

Feb 24, 2016
* buffer:
  - You can now supply an encoding argument when filling a
    Buffer Buffer#fill(string[, start[, end]][, encoding]), supplying
    an existing Buffer will also work with
    Buffer#fill(buffer[, start[, end]]). See the API documentation for
    details on how this works. (Trevor Norris) #4935
  - Buffer#indexOf() no longer requires a byteOffset argument if you
    also wish to specify an encoding:
    Buffer#indexOf(val[, byteOffset][, encoding]).
    (Trevor Norris) #4803
* child_process: spawn() and spawnSync() now support a 'shell' option
  to allow for optional execution of the given command inside a shell.
  If set to true, cmd.exe will be used on Windows and /bin/sh
  elsewhere. A path to a custom shell can also be passed to override
  these defaults. On Windows, this option allows .bat. and .cmd files
  to be executed with spawn() and spawnSync(). (Colin Ihrig) #4598
* http_parser: Update to http-parser 2.6.2 to fix an unintentionally
  strict limitation of allowable header characters.
  (James M Snell) #5237
* dgram: socket.send() now supports accepts an array of Buffers or
  Strings as the first argument. See the API docs for details on how
  this works. (Matteo Collina) #4374
* http: Fix a bug where handling headers will mistakenly trigger an
  'upgrade' event where the server is just advertising its protocols.
  This bug can prevent HTTP clients from communicating with HTTP/2
  enabled servers. (Fedor Indutny) #4337
* net: Added a listening Boolean property to net and http servers to
  indicate whether the server is listening for connections.
  (José Moreira) #4743
* node: The C++ node::MakeCallback() API is now reentrant and calling
  it from inside another MakeCallback() call no longer causes the
  nextTick queue or Promises microtask queue to be processed out of
  order. (Trevor Norris) #4507
* tls: Add a new tlsSocket.getProtocol() method to get the negotiated
  TLS protocol version of the current connection. (Brian White) #4995
* vm: Introduce new 'produceCachedData' and 'cachedData' options to
  new vm.Script() to interact with V8's code cache. When a new
  vm.Script object is created with the 'produceCachedData' set to true
  a Buffer with V8's code cache data will be produced and stored in
  cachedData property of the returned object. This data in turn may be
  supplied back to another vm.Script() object with a 'cachedData'
  option if the supplied source is the same. Successfully executing a
  script from cached data can speed up instantiation time. See the API
  docs for details. (Fedor Indutny) #4777
* performance: Improvements in:
  - process.nextTick() (Ruben Bridgewater) #5092
  - path module (Brian White) #5123
  - querystring module (Brian White) #5012
  - streams module when processing small chunks (Matteo Collina) #4354

PR-URL: #5295

This was referenced

Apr 24, 2023