Add MERGE ON CREATE SET / ON MATCH SET support (#1619) #2347
gregfelice wants to merge 2 commits into apache:master
Implements the openCypher-standard ON CREATE SET and ON MATCH SET
clauses for the MERGE statement. This allows conditional property
updates depending on whether MERGE created a new path or matched
an existing one:
MERGE (n:Person {name: 'Alice'})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.updated = timestamp()
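To make the semantics concrete, here is a minimal Python sketch of the create-or-match decision. It is a toy in-memory model, not AGE code: the `merge_node` and `merge_alice` helpers, the list-of-dicts store, and the callback parameters are all hypothetical, and a counter stands in for `timestamp()`.

```python
from itertools import count

_clock = count(1)  # stand-in for timestamp(); yields 1, 2, 3, ...


def merge_node(store, label, match_props, on_create=None, on_match=None):
    """Toy MERGE: return the matching node, or create one.

    on_create runs only when a new node is created; on_match runs only
    when an existing node is found. Both receive the node dict.
    """
    for node in store:
        if node["label"] == label and all(
            node["props"].get(k) == v for k, v in match_props.items()
        ):
            if on_match is not None:
                on_match(node)
            return node, "matched"

    # No existing node satisfied the pattern, so create it.
    node = {"label": label, "props": dict(match_props)}
    store.append(node)
    if on_create is not None:
        on_create(node)
    return node, "created"


def merge_alice(store):
    """Mirror of the MERGE ... ON CREATE SET ... ON MATCH SET example."""
    return merge_node(
        store,
        "Person",
        {"name": "Alice"},
        on_create=lambda n: n["props"].update(created=next(_clock)),
        on_match=lambda n: n["props"].update(updated=next(_clock)),
    )
```

Calling `merge_alice` twice on an empty store creates the node on the first call (setting `created`) and matches it on the second (setting `updated`), leaving a single node in the store.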
Implementation spans parser, planner, and executor:
- Grammar: new merge_actions_opt/merge_actions/merge_action rules
in cypher_gram.y, with ON keyword added to cypher_kwlist.h
- Nodes: on_match/on_create lists on cypher_merge, corresponding
on_match_set_info/on_create_set_info on cypher_merge_information,
and prop_expr on cypher_update_item (all serialized through
copy/out/read funcs)
- Transform: cypher_clause.c transforms ON SET items and stores
prop_expr for direct expression evaluation
- Executor: cypher_set.c extracts apply_update_list() from
process_update_list(); cypher_merge.c calls it at all merge
decision points (simple merge, terminal, non-terminal with
eager buffering, and first-clause-with-followers paths)
Key design choice: prop_expr stores the Expr* directly in
cypher_update_item rather than using prop_position into the scan
tuple. The planner strips target list entries for SET expressions
that CustomScan doesn't need, making prop_position references
dangling. By storing the expression directly (only for MERGE ON
SET items), we evaluate it with ExecInitExpr/ExecEvalExpr
independent of the scan tuple layout.
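The failure mode can be illustrated with a toy model. The sketch below is plain Python, not PostgreSQL internals: `prune_target_list`, the tuple-as-list layout, and the lambda used as a stored expression are assumptions made for illustration, and the toy surfaces the stale position as an `IndexError` purely for visibility.

```python
def build_scan_tuple(target_list, row):
    """Evaluate each target-list expression against the input row."""
    return [expr(row) for expr in target_list]


def prune_target_list(target_list, needed):
    """Toy planner step: keep only the entries the scan node references."""
    return [expr for i, expr in enumerate(target_list) if i in needed]


def eval_by_position(scan_tuple, prop_position):
    """Position-based lookup: only valid if the layout was not changed."""
    return scan_tuple[prop_position]


def eval_by_expr(prop_expr, row):
    """Direct evaluation: independent of the scan tuple layout."""
    return prop_expr(row)


def demo():
    row = {"name": "Alice", "visits": 4}
    set_expr = lambda r: r["visits"] + 1           # the SET right-hand side
    target_list = [lambda r: r["name"], set_expr]  # SET expr sits at index 1

    # The toy planner decides only entry 0 is needed and drops the rest.
    pruned = prune_target_list(target_list, needed={0})
    scan_tuple = build_scan_tuple(pruned, row)

    try:
        by_position = eval_by_position(scan_tuple, 1)
    except IndexError:
        by_position = None  # the saved position no longer exists

    return by_position, eval_by_expr(set_expr, row)
```

In this model the position saved before pruning no longer points at anything, while the stored expression still evaluates against the row.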
Includes regression tests covering: basic ON CREATE SET, basic
ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items,
expression evaluation, interaction with WITH clause, and edge
property updates.
All 31 regression tests pass.
|
Friendly ping — this PR adds MERGE ON CREATE SET / ON MATCH SET support (issue #1619), one of the most requested Cypher features for AGE. This is critical for users migrating from Kuzu (recently archived) and Neo4j. The implementation adds grammar rules, executor support, and full regression test coverage. Would really appreciate a review when someone has bandwidth. Thanks! |
Pull request overview
This PR implements the ON CREATE SET and ON MATCH SET sub-clauses for MERGE statements (openCypher-standard feature, issue #1619). These allow conditional property updates depending on whether MERGE created a new path or matched an existing one.
Changes:
- New grammar rules (`ON` keyword, `merge_actions_opt`/`actions`/`action` rules), new node fields (`on_match`/`on_create` on `cypher_merge`; `on_match_set_info`/`on_create_set_info` on `cypher_merge_information`; `prop_expr` on `cypher_update_item`) with full serialization support
- Extracted shared `apply_update_list()` from the `SET` executor and wired it into all four MERGE execution paths (simple merge, terminal non-first clause, non-terminal eager-buffering path, first-clause-with-followers)
- Regression tests covering basic ON CREATE SET, ON MATCH SET, combined clauses, multiple items, reverse ordering, duplicate clause error detection, and edge property updates
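The reverse-ordering and duplicate-clause behaviour mentioned in the tests can be sketched as a small validation step. This is a hypothetical Python model, not the grammar or transform code from the PR: the `collect_merge_actions` name, the `(kind, items)` input shape, and the error text are assumptions.

```python
def collect_merge_actions(actions):
    """Split parsed (kind, items) pairs into on_create / on_match lists.

    kind is "create" or "match". Either order is accepted, but each kind
    may appear at most once.
    """
    result = {"create": None, "match": None}
    for kind, items in actions:
        if kind not in result:
            raise ValueError(f"unknown merge action: {kind}")
        if result[kind] is not None:
            # Hypothetical message; the real error text is not shown here.
            raise ValueError(f"ON {kind.upper()} SET specified more than once")
        result[kind] = list(items)
    return result["create"] or [], result["match"] or []
```

With this shape, `ON MATCH SET ... ON CREATE SET ...` and `ON CREATE SET ... ON MATCH SET ...` produce the same pair of lists, and a repeated clause is rejected before execution.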
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/include/parser/cypher_kwlist.h` | Adds `on` as a RESERVED_KEYWORD |
| `src/backend/parser/cypher_gram.y` | Adds ON token and merge_actions_opt/merge_actions/merge_action grammar rules |
| `src/include/nodes/cypher_nodes.h` | Adds on_match/on_create to cypher_merge; on_match_set_info/on_create_set_info to cypher_merge_information; prop_expr to cypher_update_item |
| `src/backend/nodes/cypher_copyfuncs.c` | Copies new prop_expr, on_match_set_info, on_create_set_info fields |
| `src/backend/nodes/cypher_outfuncs.c` | Serializes new fields; fixes wrong comment (cypher_delete → cypher_merge) |
| `src/backend/nodes/cypher_readfuncs.c` | Deserializes new prop_expr, on_match_set_info, on_create_set_info fields |
| `src/backend/parser/cypher_clause.c` | Transforms ON MATCH/CREATE SET item lists and stores prop_expr for direct evaluation |
| `src/include/executor/cypher_utils.h` | Declares apply_update_list(); adds on_match_set_info/on_create_set_info to scan state |
| `src/backend/executor/cypher_set.c` | Extracts apply_update_list() from process_update_list(); adds prop_expr-based direct evaluation path |
| `src/backend/executor/cypher_merge.c` | Calls apply_update_list() at all four merge decision points; reorders mark_tts_isnull/ExecStoreVirtualTuple for correctness |
| `regress/sql/cypher_merge.sql` | Adds regression tests for ON CREATE SET, ON MATCH SET, errors, and cleanup |
| `regress/expected/cypher_merge.out` | Expected output for new test cases |
|
@gregfelice Please see Copilot's suggestions. |
|
Will do
|
…sor test
- Move ExecInitExpr for ON CREATE/MATCH SET items from per-row execution in apply_update_list() to plan initialization in begin_cypher_merge(). This follows the established pattern used by cypher_target_node (id_expr_state, prop_expr_state).
- Add prop_expr_state field to cypher_update_item with serialization support in outfuncs/readfuncs/copyfuncs.
- apply_update_list() uses pre-initialized state when available and falls back to per-row init for plain SET callers.
- Fix misleading comment: "ON MATCH SET" → "ON CREATE SET" for Case 1 first-run test.
- Add Case 1 second-run test that triggers ON MATCH SET with a predecessor clause (MATCH ... MERGE ... ON MATCH SET).
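The initialize-once, evaluate-per-row pattern described in this commit can be sketched in Python. This is an analogy only, assuming Python's built-in `compile`/`eval` as a stand-in for expression initialization and evaluation; the `UpdateItem` class, `begin_merge`, and the `INIT_LOG` counter are hypothetical names, not the C structures from the PR.

```python
INIT_LOG = []  # records every expression initialization, for inspection


class UpdateItem:
    """Toy counterpart of an update item holding an expression."""

    def __init__(self, prop_name, prop_expr):
        self.prop_name = prop_name
        self.prop_expr = prop_expr    # source text of the expression
        self.prop_expr_state = None   # filled in at plan initialization


def init_expr(source):
    """Stand-in for expression initialization: prepare once, evaluate often."""
    INIT_LOG.append(source)
    return compile(source, "<prop_expr>", "eval")


def begin_merge(items):
    """Plan initialization: prepare every ON CREATE/ON MATCH SET expression."""
    for item in items:
        item.prop_expr_state = init_expr(item.prop_expr)


def apply_update_list(items, row):
    """Per-row execution: reuse prepared state, else initialize on the spot."""
    for item in items:
        state = item.prop_expr_state
        if state is None:             # plain SET callers take this path
            state = init_expr(item.prop_expr)
        row[item.prop_name] = eval(state, {"__builtins__": {}}, dict(row))
    return row
```

Running `begin_merge` once and then `apply_update_list` over many rows adds a single entry to `INIT_LOG`, whereas items that were never prepared add one entry per row.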
|
Addressed all three Copilot suggestions:
All regression tests pass ( |
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
  LIMIT
  MATCH MERGE
  NOT NULL_P
- OPERATOR OPTIONAL OR ORDER
+ ON OPERATOR OPTIONAL OR ORDER
The ON token is added to the %token list here (making it a lexical keyword), but it is NOT added to the safe_keywords production rule later in the file (around lines 2431-2477). All other keywords added as tokens in this file are also listed in safe_keywords so they can be used as identifiers, property names, or labels via the schema_name → reserved_keyword → safe_keywords path.
By omitting ON from safe_keywords, property keys or node labels named on (e.g., n.on, MATCH (n:on)) will now fail to parse with a syntax error. This is a breaking change for any graph using on as a property name or label. Adding | ON { $$ = KEYWORD_STRDUP($1); } to the safe_keywords rule would fix this.
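The effect of the missing `safe_keywords` entry can be shown with a toy identifier rule. The Python below is a simplified model, not the Bison grammar: the `KEYWORDS` set, the `parse_schema_name` function, and the error text are assumptions for illustration.

```python
KEYWORDS = {"match", "merge", "on", "set", "return"}


def parse_schema_name(token, safe_keywords):
    """Toy identifier rule: plain identifiers are always accepted;
    keyword tokens are accepted only if they are listed as safe."""
    lowered = token.lower()
    if lowered not in KEYWORDS:
        return token      # ordinary identifier
    if lowered in safe_keywords:
        return lowered    # keyword reused as a property key or label
    raise SyntaxError(f'syntax error at or near "{token}"')
```

In this model, `on` parses as a name only when it appears in the safe list, which mirrors why a property such as `n.on` would stop parsing once `ON` becomes a token without a matching `safe_keywords` entry.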
ON MATCH SET n.b = 2
RETURN n
$$) AS (n agtype);
There is no test covering the interaction of ON CREATE SET / ON MATCH SET with chained (non-terminal) MERGE statements, which exercises the eager-buffering code path added in PR #2344. The non-terminal MERGE path (lines 664-750 in cypher_merge.c) has ON CREATE SET and ON MATCH SET logic, but the tests only cover Case 1 with a MATCH predecessor (not a MERGE-then-MERGE chain). A test like:
MERGE (a:A {name: 'X'}) ON CREATE SET a.new = true
MERGE (b:B {name: 'Y'}) ON CREATE SET b.new = true
RETURN a, b
or with a chained terminal MERGE following a non-terminal MERGE with ON SET would validate that the eager buffering path handles ON SET correctly.
-- Chained MERGE with ON CREATE SET in non-terminal and terminal clauses
SELECT * FROM cypher('merge_actions', $$
MERGE (a:Person {name: 'ChainCreateA'})
ON CREATE SET a.new = true
MERGE (b:Person {name: 'ChainCreateB'})
ON CREATE SET b.new = true
RETURN a.name, a.new, b.name, b.new
$$) AS (a_name agtype, a_new agtype, b_name agtype, b_new agtype);
-- Chained MERGE with non-terminal ON CREATE SET and terminal ON MATCH SET
-- Setup an existing node to be matched by the second MERGE
SELECT * FROM cypher('merge_actions', $$
CREATE (p:Person {name: 'ChainMatch', seen: false})
$$) AS (n agtype);
SELECT * FROM cypher('merge_actions', $$
MERGE (a:Person {name: 'ChainCreateOnce'})
ON CREATE SET a.created = true
MERGE (b:Person {name: 'ChainMatch'})
ON MATCH SET b.seen = true
RETURN a.name, a.created, b.name, b.seen
$$) AS (a_name agtype, a_created agtype, b_name agtype, b_seen agtype);
|
@gregfelice Please see the last 2 Copilot reviews |
Summary
Implements the openCypher-standard `ON CREATE SET` and `ON MATCH SET` clauses for the MERGE statement, resolving #1619. This allows conditional property updates depending on whether MERGE created a new path or matched an existing one.
Design
The implementation spans parser, planner, and executor:
- Parser: New grammar rules (`merge_actions_opt`, `merge_actions`, `merge_action`) in `cypher_gram.y`. The `ON` keyword is added to `cypher_kwlist.h`.
- Nodes: `on_match`/`on_create` lists on `cypher_merge`, corresponding `on_match_set_info`/`on_create_set_info` on `cypher_merge_information`, and `prop_expr` on `cypher_update_item`. All fields are serialized through copy/out/read funcs.
- Transform: `cypher_clause.c` transforms ON SET items and stores `prop_expr` for direct expression evaluation.
- Executor: `apply_update_list()` is extracted from `process_update_list()` in `cypher_set.c` as reusable SET logic. `cypher_merge.c` calls it at all merge decision points (simple merge, terminal, non-terminal with eager buffering, and first-clause-with-followers paths).
Why prop_expr?
The PostgreSQL planner strips target list entries for SET expressions that the CustomScan doesn't reference. This makes `prop_position` references into the scan tuple dangling. The solution: store the `Expr*` directly in `cypher_update_item->prop_expr` and evaluate it with `ExecInitExpr`/`ExecEvalExpr`, independent of scan tuple layout. This is only done for MERGE ON SET items; regular SET continues to use `prop_position` unchanged.
Files changed (12)
- `src/include/parser/cypher_kwlist.h`: `ON` keyword
- `src/backend/parser/cypher_gram.y`
- `src/include/nodes/cypher_nodes.h`
- `src/backend/nodes/cypher_copyfuncs.c`
- `src/backend/nodes/cypher_outfuncs.c`
- `src/backend/nodes/cypher_readfuncs.c`
- `src/backend/parser/cypher_clause.c`
- `src/include/executor/cypher_utils.h`: `apply_update_list` declaration
- `src/backend/executor/cypher_set.c`: `apply_update_list()` from `process_update_list()`
- `src/backend/executor/cypher_merge.c`: `apply_update_list` at all merge decision points
- `regress/sql/cypher_merge.sql`
- `regress/expected/cypher_merge.out`
Test plan
Closes #1619