Add MERGE ON CREATE SET / ON MATCH SET support (#1619) by gregfelice · Pull Request #2347 · apache/age

@gregfelice

Implements the openCypher-standard ON CREATE SET and ON MATCH SET
clauses for the MERGE statement. This allows conditional property
updates depending on whether MERGE created a new path or matched
an existing one:

  MERGE (n:Person {name: 'Alice'})
    ON CREATE SET n.created = timestamp()
    ON MATCH SET n.updated = timestamp()
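
The create-versus-match behavior of the query above can be modeled in a few lines of plain C. This is only a toy sketch of the semantics, not AGE code: `toy_graph`, `toy_node`, and `toy_merge_person` are hypothetical names, and the `now` argument stands in for `timestamp()`.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_NODES 16

/* Hypothetical stand-in for a :Person vertex with two optional properties. */
typedef struct toy_node {
    char name[32];
    long created;   /* 0 means "property not set" */
    long updated;   /* 0 means "property not set" */
} toy_node;

typedef struct toy_graph {
    toy_node nodes[TOY_MAX_NODES];
    int      count;
} toy_graph;

/*
 * Toy MERGE: look the node up by name. If it exists, run the
 * "ON MATCH SET" action; otherwise create it and run the
 * "ON CREATE SET" action. Returns NULL only if the graph is full.
 */
toy_node *toy_merge_person(toy_graph *g, const char *name, long now)
{
    assert(g != NULL && name != NULL);

    for (int i = 0; i < g->count; i++) {
        if (strcmp(g->nodes[i].name, name) == 0) {
            g->nodes[i].updated = now;      /* ON MATCH SET n.updated = ... */
            return &g->nodes[i];
        }
    }

    if (g->count >= TOY_MAX_NODES)
        return NULL;

    toy_node *n = &g->nodes[g->count++];
    strncpy(n->name, name, sizeof(n->name) - 1);
    n->name[sizeof(n->name) - 1] = '\0';
    n->created = now;                       /* ON CREATE SET n.created = ... */
    n->updated = 0;
    return n;
}
```

Calling `toy_merge_person` twice with the same name sets `created` on the first call and only `updated` on the second, which is the same split the regression tests exercise with a first and second run.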

Implementation spans parser, planner, and executor:

- Grammar: new merge_actions_opt/merge_actions/merge_action rules
  in cypher_gram.y, with ON keyword added to cypher_kwlist.h
- Nodes: on_match/on_create lists on cypher_merge, corresponding
  on_match_set_info/on_create_set_info on cypher_merge_information,
  and prop_expr on cypher_update_item (all serialized through
  copy/out/read funcs)
- Transform: cypher_clause.c transforms ON SET items and stores
  prop_expr for direct expression evaluation
- Executor: cypher_set.c extracts apply_update_list() from
  process_update_list(); cypher_merge.c calls it at all merge
  decision points (simple merge, terminal, non-terminal with
  eager buffering, and first-clause-with-followers paths)

Key design choice: prop_expr stores the Expr* directly in
cypher_update_item rather than indexing the scan tuple via
prop_position. The planner strips target list entries for SET
expressions that CustomScan doesn't need, which leaves
prop_position references dangling. Storing the expression
directly (only for MERGE ON SET items) lets us evaluate it with
ExecInitExpr/ExecEvalExpr independently of the scan tuple layout.
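
To illustrate why a stored expression survives where a tuple position does not, here is a toy model in plain C. It does not use PostgreSQL types: `toy_update_item`, `toy_tuple`, and `toy_eval_item` are hypothetical, and a function pointer stands in for an `Expr*`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an expression tree: a function yielding a value. */
typedef long (*toy_expr_fn)(void);

typedef struct toy_update_item {
    int         prop_position; /* 1-based index into the scan tuple */
    toy_expr_fn prop_expr;     /* stored expression; NULL for plain SET */
} toy_update_item;

typedef struct toy_tuple {
    int         natts;         /* number of columns that survived planning */
    const long *values;
} toy_tuple;

/*
 * Evaluate one SET item. A stored expression is evaluated directly and
 * never touches the tuple. Otherwise the value is read by position, which
 * fails (-1) if the planner removed that column from the tuple.
 */
int toy_eval_item(const toy_update_item *item, const toy_tuple *tuple,
                  long *out)
{
    assert(item != NULL && tuple != NULL && out != NULL);

    if (item->prop_expr != NULL) {
        *out = item->prop_expr();
        return 0;
    }

    if (item->prop_position < 1 || item->prop_position > tuple->natts)
        return -1;              /* dangling position */

    *out = tuple->values[item->prop_position - 1];
    return 0;
}

/* Example expression used by callers of this sketch. */
long toy_const_42(void)
{
    return 42;
}
```

With a one-column tuple, an item that only carries `prop_position = 2` cannot be evaluated, while the same item with `prop_expr` set evaluates regardless of how many columns the tuple has.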

Includes regression tests covering: basic ON CREATE SET, basic
ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items,
expression evaluation, interaction with WITH clause, and edge
property updates.

All 31 regression tests pass.

@gregfelice

…sor test

- Move ExecInitExpr for ON CREATE/MATCH SET items from per-row
  execution in apply_update_list() to plan initialization in
  begin_cypher_merge(). Follows the established pattern used by
  cypher_target_node (id_expr_state, prop_expr_state).
- Add prop_expr_state field to cypher_update_item with serialization
  support in outfuncs/readfuncs/copyfuncs.
- apply_update_list() uses the pre-initialized state when available
  and falls back to per-row init for plain SET callers.
- Fix misleading comment: "ON MATCH SET" → "ON CREATE SET" for Case 1
  first-run test.
- Add Case 1 second-run test that triggers ON MATCH SET with a
  predecessor clause (MATCH ... MERGE ... ON MATCH SET).
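
The init-once-with-fallback pattern described in this commit can be sketched as follows. This is a simplified model in plain C, not the executor code: `toy_item`, `toy_begin`, and `toy_apply` are hypothetical names, and `toy_init_count` exists only to make the number of initializations observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression and its "compiled" state. */
typedef struct toy_expr       { long constant; }        toy_expr;
typedef struct toy_expr_state { const toy_expr *expr; } toy_expr_state;

typedef struct toy_item {
    const toy_expr *prop_expr;
    toy_expr_state  state;        /* valid only when state_ready != 0 */
    int             state_ready;
} toy_item;

/* Counts how often expression state gets initialized. */
int toy_init_count = 0;

/* Plan-initialization step: set up state once per item. */
void toy_begin(toy_item *items, size_t nitems)
{
    for (size_t i = 0; i < nitems; i++) {
        items[i].state.expr = items[i].prop_expr;
        items[i].state_ready = 1;
        toy_init_count++;
    }
}

/* Per-row step: reuse the prepared state, or initialize on the spot. */
long toy_apply(const toy_item *item)
{
    assert(item != NULL && item->prop_expr != NULL);

    if (item->state_ready)
        return item->state.expr->constant;

    /* Fallback for callers that skipped toy_begin(): per-row init. */
    toy_expr_state tmp;
    tmp.expr = item->prop_expr;
    toy_init_count++;
    return tmp.expr->constant;
}
```

In this model, an item prepared by `toy_begin` is initialized once no matter how many rows are processed, while an unprepared item pays the initialization cost on every call.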

@gregfelice

1. Add ON to safe_keywords in cypher_gram.y so that property keys
   and labels named 'on' still work (e.g., n.on, MATCH (n:on)).
   All other keywords added as tokens are also in safe_keywords.

2. Add chained (non-terminal) MERGE regression tests exercising the
   eager-buffering code path with ON CREATE SET and ON MATCH SET.
   First run creates both nodes (ON CREATE SET fires), second run
   matches both (ON MATCH SET fires).

All regression tests pass (cypher_merge: ok).

@gregfelice @claude

…N keyword test

- Call ExecStoreVirtualTuple unconditionally before apply_update_list in
  the Case 1 non-terminal and terminal MERGE paths, matching the pattern
  at Case 3 (line 994). This ensures tts_nvalid is set for downstream
  ExecProject even when ON CREATE SET is absent.

- Add resolve_merge_set_exprs() helper to deduplicate the prop_expr
  resolution loops for ON MATCH SET and ON CREATE SET. The helper calls
  ereport when a target entry is missing (internal error, should never
  happen).

- Add regression test for ON keyword as label name, confirming backward
  compatibility via safe_keywords grammar path.
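
A shared resolution helper of the kind described above might look roughly like this. The sketch is plain C with hypothetical types (`toy_target_entry`, `toy_set_item`) and assumes resolution means looking up a target entry by position and copying its expression; a `-1` return stands in for the ereport on a missing entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target list entry: a result number and its expression. */
typedef struct toy_target_entry {
    int         resno;
    const char *expr;
} toy_target_entry;

/* Hypothetical SET item that still needs its expression resolved. */
typedef struct toy_set_item {
    int         prop_position;
    const char *prop_expr;     /* filled in by the resolver */
} toy_set_item;

/* Find the target entry with the given result number, or NULL. */
const toy_target_entry *toy_find_target(const toy_target_entry *tlist,
                                        int ntargets, int resno)
{
    for (int i = 0; i < ntargets; i++) {
        if (tlist[i].resno == resno)
            return &tlist[i];
    }
    return NULL;
}

/*
 * One loop shared by both lists: call it once for the ON MATCH SET items
 * and once for the ON CREATE SET items. Returns -1 if any item points at
 * a missing target entry (the real helper reports an internal error).
 */
int toy_resolve_set_exprs(toy_set_item *items, int nitems,
                          const toy_target_entry *tlist, int ntargets)
{
    assert(nitems == 0 || items != NULL);

    for (int i = 0; i < nitems; i++) {
        const toy_target_entry *te =
            toy_find_target(tlist, ntargets, items[i].prop_position);

        if (te == NULL)
            return -1;

        items[i].prop_expr = te->expr;
    }
    return 0;
}
```

Keeping the lookup and the error path in one function means both SET lists fail the same way if the target list is ever inconsistent.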

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>