`yew-macro`: optimise macros' memory usage by its-the-shrimp · Pull Request #3845 · yewstack/yew

@its-the-shrimp

Description

Removes most of .to_string() calls from the impl of yew-macro by utilising the Display impls of tokens & AST values, analysing them chunk by chunk instead of allocating all of them into a string

Checklist

  • I have reviewed my own code
  • I have added tests

@github-actions

Size Comparison

Details
examples master (KB) pull request (KB) diff (KB) diff (%)
async_clock 100.197 100.197 0 0.000%
boids 168.080 168.080 0 0.000%
communication_child_to_parent 93.469 93.469 0 0.000%
communication_grandchild_with_grandparent 105.250 105.250 0 0.000%
communication_grandparent_to_grandchild 101.606 101.606 0 0.000%
communication_parent_to_child 90.900 90.900 0 0.000%
contexts 105.149 105.149 0 0.000%
counter 86.282 86.282 0 0.000%
counter_functional 88.272 88.272 0 0.000%
dyn_create_destroy_apps 90.321 90.321 0 0.000%
file_upload 99.346 99.346 0 0.000%
function_delayed_input 94.375 94.375 0 0.000%
function_memory_game 172.939 172.939 0 0.000%
function_router 405.547 405.547 0 0.000%
function_todomvc 164.148 164.148 0 0.000%
futures 235.159 235.159 0 0.000%
game_of_life 104.718 104.718 0 0.000%
immutable 255.868 255.868 0 0.000%
inner_html 80.803 80.803 0 0.000%
js_callback 109.374 109.374 0 0.000%
keyed_list 179.724 179.724 0 0.000%
mount_point 84.146 84.146 0 0.000%
nested_list 113.058 113.058 0 0.000%
node_refs 91.525 91.525 0 0.000%
password_strength 1728.826 1728.826 0 0.000%
portals 93.035 93.035 0 0.000%
router 376.134 376.134 0 0.000%
suspense 113.452 113.452 0 0.000%
timer 88.634 88.634 0 0.000%
timer_functional 98.875 98.875 0 0.000%
todomvc 142.088 142.088 0 0.000%
two_apps 86.146 86.146 0 0.000%
web_worker_fib 136.224 136.224 0 0.000%
web_worker_prime 187.438 187.438 0 0.000%
webgl 83.223 83.223 0 0.000%

✅ None of the examples has changed their size significantly.

@github-actions

@github-actions

Benchmark - SSR

Yew Master

Details
Benchmark Round Min (ms) Max (ms) Mean (ms) Standard Deviation
Baseline 10 274.902 275.558 275.156 0.244
Hello World 10 446.465 457.306 449.554 3.846
Function Router 10 38648.527 39278.662 39027.108 217.668
Concurrent Task 10 1005.465 1007.205 1006.415 0.497
Many Providers 10 1012.830 1053.982 1030.057 12.488

Pull Request

Details
Benchmark Round Min (ms) Max (ms) Mean (ms) Standard Deviation
Baseline 10 274.846 275.492 275.112 0.211
Hello World 10 447.667 472.227 452.007 7.402
Function Router 10 38422.460 39145.468 38966.073 206.549
Concurrent Task 10 1005.145 1006.495 1005.813 0.494
Many Providers 10 1024.339 1061.253 1039.619 10.936

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes macro memory usage in the yew-macro crate by eliminating many unnecessary allocations using the new DisplayExt trait methods. Key changes include:

  • Replacing .to_string() comparisons with non-allocating DisplayExt methods (repr_eq, repr_eq_ignore_ascii_case, etc.).
  • Removing the custom non_capitalized_ascii helper in favor of a method on Ident and similar types.
  • Updating diff files across props, html_tree, hook, and function component modules to use the new DisplayExt APIs.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
packages/yew-macro/src/props/prop_macro.rs Added import for DisplayExt and updated associated properties checking logic.
packages/yew-macro/src/props/prop.rs Replaced .to_string() comparisons with DisplayExt's repr_eq methods.
packages/yew-macro/src/lib.rs Removed non_capitalized_ascii and added DisplayExt trait implementation.
packages/yew-macro/src/html_tree/tag.rs Updated error replacement to pass the original error instead of its string version.
packages/yew-macro/src/html_tree/mod.rs Simplified imports and replaced non_capitalized_ascii with DisplayExt methods.
packages/yew-macro/src/html_tree/lint/mod.rs Updated attribute comparisons using repr_eq_ignore_ascii_case.
packages/yew-macro/src/html_tree/html_node.rs Updated logic to use DisplayExt for literal comparisons.
packages/yew-macro/src/html_tree/html_element.rs Replaced string comparisons with DisplayExt methods and cleaned up related logic.
packages/yew-macro/src/html_tree/html_dashed_name.rs Removed redundant eq_ignore_ascii_case helper in favor of DisplayExt-based checks.
packages/yew-macro/src/html_tree/html_component.rs Added is_component_name helper using DisplayExt, and updated component name logic.
packages/yew-macro/src/hook/mod.rs Updated hook naming checks to use starts_with directly on Identifier.
packages/yew-macro/src/hook/body.rs Replaced to_string() checks with non-allocating starts_with comparisons.
packages/yew-macro/src/function_component.rs Updated attribute filtering using DisplayExt methods for consistency.
Comments suppressed due to low confidence (1)

packages/yew-macro/src/lib.rs:88

  • Consider adding unit tests for the new DisplayExt methods to verify their correctness and prevent regressions.
trait DisplayExt: Display {

WorldSEnder

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've done any measurement on performance impact, feel free to add this to the PR, but since you already did the work, I still nevertheless think it's a nice idea that can serve some purpose. (in general more performance is left on the table in yew-macro by quoting, then parsing and quoting again into different token streams but that's a different issue).

Our CI doesn't measure anything inside the macro or during compilation, only of the resulting binaries, so will be a bit useless here for performance.

is_ide_completion: is_ide_completion(),
empty: true,
};
write!(writer, "{s}").is_ok_and(|_| !writer.empty)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a better comment and improved naming, as it is very hard to follow currently.

A name is, at the moment, assumed to be a component name if it

  • starts with an ascii uppercase letter
  • OR ide completion is enable and it contains any ascii uppercase

Both of these are rough approximations, for example I think the second condition could be changed to match a specific identifier passed via the RUST_IDE_PROC_MACRO_COMPLETION_DUMMY_IDENTIFIER env variable, and I'm not even sure how much adoption that experiment got at this point.

/// Needed to check the plentiful token-like values in the impl of the macros, which are
/// `Display`able but which either correspond to multiple source code tokens, or are themselves
/// tokens that don't provide a reference to their repr.
trait DisplayExt: Display {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea! But alas, the current implementation is hard to read when you first see it.

Let me phrase the idea differently. We use write! to feed chunks to an accumulator instead of writing them into a new allocation via to_string. This could be more clear, in my opinion, if the implementations would go through a common method expressing this.

/// A pattern can consume successive chunks of a String to determine
// if it matches, until the end of string is reached.
trait Pattern {
    type Match; // bool for all in this PR. Could one imagine more uses?
    // Returns a Result to enable short-curcuit. Another idea would be std::ops::ControlFlow<Self::Match>
    fn feed_chunk(&mut self, chunk: &str) -> Result<(), Self::Match>;
    fn end_of_string(self) -> Self::Match;
}
// Or add this as a method to DisplayExt
fn match_repr<P: Pattern>(this: impl Display, pattern: P) -> P::Match {
    struct PatternWrite<P: Pattern> { pattern: P, r#match: Option<P::Match> }
    impl<P: Pattern> Write for PatternWrite<P> {
        fn write_str(&mut self, chunk: &str) -> std::fmt::Result {
            debug_assert!(self.r#match.is_none());
            self.pattern.feed_chunk(chunk).map_err(|m| {
                self.r#match = Some(m);
                std::fmt::Error // Short-curcuit further chunks
            })
        }
    }
    let mut writer = PatternWrite { pattern, r#match: None };
    match write!(writer, "{this}") {
        Ok(()) => writer.pattern.end_of_string(),
        Err(_) => writer.r#match.unwrap(),
    }
}

With this machinery in place, I think a lot of the checks become a lot more digestible. For example

fn split_off_start<'s>(s: &mut &'s str, mid: usize) -> Option<&'s str> {
    if mid <= s.len() {
        let start;
        (start, *s) = s.split_at(mid);
        Some(start)
    } else {
        None
    }
}

struct EqualsIgnoreAsciiCase<'src>(&'src str);
impl Pattern for EqualsIgnoreAsciiCase<'_> {
    type Match = bool;
    fn feed_chunk(&mut self, chunk: &str) -> Result<(), bool> {
        match split_off_start(&mut self.0, chunk.len()) {
            Some(start) if start.eq_ignore_ascii_case(chunk) => Ok(()),
            _ => Err(false),
        }
    }
    fn end_of_string(self) -> bool {
        self.0.is_empty()
    }
}

struct EqualsExact<'src>(&'src str);
impl Pattern for EqualsExact<'_> {
    type Match = bool;
    fn feed_chunk(&mut self, chunk: &str) -> Result<(), bool> {
        match split_off_start(&mut self.0, chunk.len()) {
            Some(start) if chunk == start => Ok(()),
            _ => Err(false),
        }
    }
    fn end_of_string(self) -> bool {
        self.0.is_empty()
    }
}
trait DisplayExt: Display {
    fn repr_eq_ignore_ascii_case(&self, pattern: &str) -> bool {
        match_repr(self, EqualsIgnoreAsciiCase(pattern))
    }
    // Etc...
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having types & traits that exist solely to implement other traits seems like overengineering that wouldn't be easier to manage, I think with a good explanation & testing, my admittedly cryptic impl will be manageable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main point is that the trait and function would translate between two different semantic domains that is worth the overhead.

Write returns a very generic std::fmt::Error "pretending" it failed. At a different point in the code (what follows write!() basically) that error is translated back into the intended result of returning early. So basically when reviewing I had to go jump from inside the write_str to some other point in the code and do that translation in my head. Writing it out explicitly makes it just so much easier to understand what's going on.
At the same time, the part after the write!() also handles the path when no early return happened and we had to look at all the chunks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But your suggested impl still has to use the overly generic std::fmt::Error, is still broken up into multiple parts, but with extra concepts on top, so the reader has to keep more in the head, not less. I think this algorithm will be hard to read anyway, because it has to account for chunked input

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different impls of Pattern do not have to keep this in mind, only the single instance of the glue code in match_repr has to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they still do, your suggested Pattern trait just renames write_str to feed_chunk, but the difficulty of handling chunked input is still there

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming is not the part that makes it meaningful, changing the return type from Result<(), std::fmt::Error> to ControlFlow<bool> (the more appropriate type than Result the more I think about it) is. This is only 1-to-1 if the impl either only short-curcuits on true or false, not possible either. Feel free to bikeshed over the name of the method or just keep write_str if that doesn't lead to ambiguities.

Just because the transformation is simple doesn't mean it's not meaningful, and I still think the dozen lines of glue code make every impl much more approachable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madoshakalaka feel free to chime in. Maybe i'm too critical or not smart enough to read the current code :p

@its-the-shrimp

I also snuck in 1 change from quote! {#tag_name}.span() to tag_name.span(), the former seemed utterly redundant and wasteful, but please tell me if I missed some detail

@its-the-shrimp

Documented the impl & added tests

@WorldSEnder

Additional tests look good, thanks.

@Madoshakalaka

This comment was marked as resolved.

@Madoshakalaka

I used cargo build --timings and ran the function_router example crate on both 3b5c891 (before) and the PR itself (after)

The function router crate was picked becasuse it had the most #[function_component] and html! macro usage.

I repeated it 20 times and on both branch and compared the build duration of each crates

📦 yew
  Before mean: 1.3910 s
  After mean:  1.3900 s
  p-value (after < before): 0.4756

📦 yew-macro
  Before mean: 1.7035 s
  After mean:  1.7190 s
  p-value (after < before): 1

📦 yew-router
  Before mean: 0.4755 s 
  After mean:  0.4735 s
  p-value (after < before): 0.2595

📦 yew-router-macro
  Before mean: 0.4785 s 
  After mean:  0.4830 s
  p-value (after < before): 0.9524

📦 function_router
  Before mean: 1.2670 s
  After mean:  1.2705 s
  p-value (after < before): 0.7968

There doesn't seem to be perceivable compile speed increase (for reference, the entire cargo build takes 16 seconds)

@its-the-shrimp

That's interesting, in that case maybe there's not much merit to overcomplicating the codebase this way, Rust is good enough to optimise it by itself. Unless there's a better way to benchmark the changes, feel free to close this PR

@its-the-shrimp @Madoshakalaka

    - Removed most of `.to_string()` calls
    - Factored out the check for a component-like name

@Madoshakalaka

Revisiting! my previous benchmarks were too hacky

I rebased this onto current master and ran some more targeted benchmarks. The previous benchmarks (psrecord on the whole cargo build) measured RSS across the entire build, which I think dilutes the signal from proc-macro changes since most memory/time is spent compiling dependencies and codegen, not macro expansion.

I created a stress-test crate (not commited) with 201 #[function_component] + 201 html! invocations (~4x the density of function_router) and used two methods:

  1. Incremental compile time (touch source, rebuild, 15 runs)
  2. DHAT heap profiling (valgrind --tool=dhat on the rustc invocation for just the stress-test crate)

Compile time

Benchmark master PR diff p-value
Stress test (201 components) 1946.8ms 1931.3ms -0.80% p<0.001
function_router (57 macros) 506.8ms 504.2ms -0.51% p=0.05

Small but statistically significant for the stress test.

Heap allocations (DHAT)

Running DHAT on just the rustc invocation that compiles the stress-test crate (where the proc-macros run):

Metric master PR diff
Total allocation blocks 18,758,803 18,734,198 -24,605 blocks (-0.13%)
Peak heap size 394.7 MB 389.1 MB -5.6 MB (-1.41%)
Peak live blocks 989,765 951,786 -37,979 blocks (-3.84%)

The .to_string() calls produce small strings so total bytes saved is negligible, but the allocation count and peak memory reductions are real.

I also think we should have such macro benchmarks in the CI. If somebody wants to standardized these tests please do so in separate issue and PRs

@Madoshakalaka