module: implement NODE_COMPILE_CACHE for automatic on-disk code caching by joyeecheung · Pull Request #52535 · nodejs/node

added 2 commits

April 14, 2024 23:00
Original commit message:

    [compiler] reset script details in functions deserialized from code cache

    During the serialization of the code cache, V8 would wipe out the
    host-defined options, so after a script is deserialized from the
    code cache, the host-defined options need to be reset on the script
    using what's provided by the embedder when doing the deserializing
    compilation, otherwise the HostImportModuleDynamically callback
    can't get the data it needs to implement dynamic import().

    Change-Id: I33cc6a5e43b6469d3527242e083f7ae6d8ed0c6a
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5401780
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Commit-Queue: Joyee Cheung <joyee@igalia.com>
    Cr-Commit-Position: refs/heads/main@{#93323}

Refs: v8/v8@cd10ad7

This patch implements automatic on-disk code caching that can be enabled
via an environment variable NODE_COMPILE_CACHE.

When set, whenever Node.js compiles a CommonJS or an ECMAScript module,
it will use the on-disk [V8 code cache][] persisted in the specified
directory to speed up the compilation. This may slow down the first load
of a module graph, but subsequent loads of the same module graph may get
a significant speedup if the contents of the modules do not change.
Locally, this speeds up loading of test/fixtures/snapshot/typescript.js
from ~130ms to ~80ms.

To clean up the generated code cache, simply remove the directory.
It will be recreated the next time the same directory is used for
`NODE_COMPILE_CACHE`.

A compilation cache generated by one version of Node.js cannot be used
by a different version of Node.js. Caches generated by different versions
of Node.js are stored separately if the same directory is used
to persist them, so they can co-exist.

Caveat: currently when using this with V8 JavaScript code coverage, the
coverage being collected by V8 may be less precise in functions that are
deserialized from the code cache. It's recommended to turn this off when
running tests to generate precise coverage.

Implementation details:

There is one cache file per module on disk. The directory layout
is:

- Compile cache directory (from NODE_COMPILE_CACHE)
  - 8b23c8fe: CRC32 hash of CachedDataVersionTag + NODE_VERSION
  - 2ea3424d:
     - 10860e5a: CRC32 hash of filename + module type
     - 431e9adc: ...
     - ...
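
As a rough illustration of the layout above, the following Python sketch derives a cache file path from the two CRC32 hashes. It is not the Node.js implementation: the helper names, the plain string concatenation, the UTF-8 encoding and the 8-digit lowercase hex formatting are all assumptions made for the example.

```python
import os
import zlib


def crc32_hex(data: bytes) -> str:
    """Format a CRC32 checksum as 8 lowercase hex digits (assumed naming scheme)."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def cache_file_path(cache_dir: str, cached_data_version_tag: int,
                    node_version: str, filename: str, module_type: str) -> str:
    """Hypothetical helper: map a module to its cache file location.

    Assumption: the hashed inputs are simple string concatenations. The exact
    bytes that Node.js hashes are not specified in the description above.
    """
    # One sub-directory per (V8 cache version tag, Node.js version) pair.
    version_dir = crc32_hex(f"{cached_data_version_tag}{node_version}".encode("utf-8"))
    # One file per (filename, module type) pair inside that sub-directory.
    entry_name = crc32_hex(f"{filename}{module_type}".encode("utf-8"))
    return os.path.join(cache_dir, version_dir, entry_name)
```

Under these assumptions, the same module loaded by two different Node.js versions maps to the same file name in two different sub-directories, which is what lets caches from several versions share one directory.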

Inside the cache file, there is a header followed by the actual
cache content:

```
[uint32_t] code size
[uint32_t] code hash
[uint32_t] cache size
[uint32_t] cache hash
... compile cache content ...
```
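
To make the header layout concrete, here is a minimal Python sketch that writes and reads four `uint32_t` fields followed by the cache payload. The little-endian byte order, the use of CRC32 for both hash fields, and the function names are assumptions for illustration only; the description above does not specify them.

```python
import struct
import zlib

# Four unsigned 32-bit integers. Little-endian is an assumption of this sketch.
HEADER = struct.Struct("<4I")


def build_cache_file(source: bytes, cache: bytes) -> bytes:
    """Hypothetical writer: prepend the 16-byte header to the cache payload."""
    header = HEADER.pack(
        len(source),         # code size
        zlib.crc32(source),  # code hash (CRC32 assumed)
        len(cache),          # cache size
        zlib.crc32(cache),   # cache hash (CRC32 assumed)
    )
    return header + cache


def read_header(blob: bytes) -> dict:
    """Hypothetical reader: decode the four header fields from a cache file."""
    code_size, code_hash, cache_size, cache_hash = HEADER.unpack_from(blob, 0)
    return {
        "code_size": code_size,
        "code_hash": code_hash,
        "cache_size": cache_size,
        "cache_hash": cache_hash,
    }
```

With this layout the payload always starts at byte offset 16, so a reader can check the header before handing the remaining bytes to the engine.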

When reading the cache file, we'll also check whether the code size
and code hash match the code that the module loader is loading,
and whether the cache size and cache hash match the file content
read. If they don't match, or if V8 rejects the cache passed,
we'll ignore the mismatched cache, regenerate it after
compilation succeeds, and rewrite it to disk.
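
The checks described above could be sketched in Python as follows. This is only an illustration under assumptions: the function name is hypothetical, the header is assumed to be little-endian, and CRC32 is assumed for both hash fields. The additional rejection that V8 itself may perform is not modelled.

```python
import struct
import zlib
from typing import Optional

# Header layout: code size, code hash, cache size, cache hash (byte order assumed).
HEADER = struct.Struct("<4I")


def load_usable_cache(blob: bytes, source: bytes) -> Optional[bytes]:
    """Return the cache payload if the header matches, otherwise None.

    A None result means the caller should compile without a cache and
    rewrite the cache file once compilation succeeds.
    """
    if len(blob) < HEADER.size:
        return None  # too short to even contain a header
    code_size, code_hash, cache_size, cache_hash = HEADER.unpack_from(blob, 0)
    # The cache must have been produced from exactly this source text.
    if code_size != len(source) or code_hash != zlib.crc32(source):
        return None
    payload = blob[HEADER.size:]
    # The payload on disk must be intact (not truncated or corrupted).
    if cache_size != len(payload) or cache_hash != zlib.crc32(payload):
        return None
    return payload
```

Checking the source fields first means a changed module is detected from the header alone, before the payload is hashed.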

@joyeecheung added the request-ci label and removed the lib / src and needs-ci labels on Apr 15, 2024.

@richardlau added the semver-minor label on Apr 15, 2024.


nodejs-github-bot pushed a commit that referenced this pull request

Apr 19, 2024
Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52293
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>

targos pushed a commit to targos/node that referenced this pull request

Apr 19, 2024
Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>

marco-ippolito pushed a commit that referenced this pull request

Apr 19, 2024
Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52293
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>

targos pushed a commit to targos/node that referenced this pull request

Apr 22, 2024
Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>

nodejs-github-bot pushed a commit that referenced this pull request

Apr 22, 2024
Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52465
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>

RafaelGSS pushed a commit that referenced this pull request

Apr 22, 2024
Refs: v8/v8@cd10ad7
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: #52465
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>

aduh95 pushed a commit that referenced this pull request

Apr 29, 2024
PR-URL: #52535
Refs: #47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>

aduh95 added a commit that referenced this pull request

Apr 30, 2024
Notable changes:

buffer:
  * improve `base64` and `base64url` performance (Yagiz Nizipli) #52428
dns:
  * (SEMVER-MINOR) add order option and support ipv6first (Paolo Insogna) #52492
events,doc:
  * mark CustomEvent as stable (Daeyeon Jeong) #52618
lib, url:
  * (SEMVER-MINOR) add a `windows` option to path parsing (Aviv Keller) #52509
module:
  * (SEMVER-MINOR) implement NODE_COMPILE_CACHE for automatic on-disk code caching (Joyee Cheung) #52535
net:
  * (SEMVER-MINOR) add CLI option for autoSelectFamilyAttemptTimeout (Paolo Insogna) #52474
src:
  * (SEMVER-MINOR) add `string_view` overload to snapshot FromBlob (Anna Henningsen) #52595
src,permission:
  * throw async errors on async APIs (Rafael Gonzaga) #52730
test_runner:
  * (SEMVER-MINOR) add --test-skip-pattern cli option (Aviv Keller) #52529
url:
  * (SEMVER-MINOR) implement parse method for safer URL parsing (Ali Hassan) #52280

PR-URL: TODO

aduh95 added a commit that referenced this pull request

May 1, 2024
PR-URL: #52768

aduh95 added a commit that referenced this pull request

May 1, 2024
PR-URL: #52768

aduh95 added a commit that referenced this pull request

May 1, 2024
PR-URL: #52768

aduh95 added a commit that referenced this pull request

May 2, 2024
PR-URL: #52768

targos pushed a commit that referenced this pull request

May 2, 2024
PR-URL: #52768

Ch3nYuY pushed a commit to Ch3nYuY/node that referenced this pull request

May 8, 2024
PR-URL: nodejs#52768

sophoniie pushed a commit to sophoniie/node that referenced this pull request

Jun 20, 2024
PR-URL: nodejs#52768

bmeck pushed a commit to bmeck/node that referenced this pull request

Jun 22, 2024
Refs: v8/v8@cd10ad7
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR-URL: nodejs#52465
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>

bmeck pushed a commit to bmeck/node that referenced this pull request

Jun 22, 2024
PR-URL: nodejs#52768

joyeecheung added a commit to joyeecheung/node that referenced this pull request

Jan 22, 2025
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>

joyeecheung added a commit to joyeecheung/node that referenced this pull request

Jan 23, 2025
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>

joyeecheung added a commit to joyeecheung/node that referenced this pull request

Jan 23, 2025
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>

joyeecheung added a commit to joyeecheung/node that referenced this pull request

Jan 23, 2025
PR-URL: nodejs#52535
Refs: nodejs#47472
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>