refactor(core): removed old non-deterministic core engine by HarshK97 · Pull Request #4 · HarshK97/diffmantic.nvim

HarshK97 · 2026-02-28T10:27:03Z

The current engine has diverted too much from Gumtree and is in a non-deterministic state will be writing the core engine again ground up

The current engine has diverted too much from Gumtree and is in a non-deterministic state will be writing the core engine again group up

Added a new rules.lua file that basically tells the engine how to handle different node types per language. Some nodes like semicolons and commas are just noise so we ignore them, some nodes like string literals are better treated as a single unit so we flatten them, and some node types are just different names for the same thing so we alias them. Did this for c, cpp, go, javascript, typescript, python and lua.

Hooked the new rules module into preprocess_tree. Now when we walk the tree we skip ignored nodes entirely, flatten nodes that should be treated as leaves, and use canonical type names before hashing. This makes hashes a lot more stable across code that is structurally the same but written slightly differently.

Added a pre-matching step that runs before the main top-down phase. The idea is simple, use vim.diff to find lines that did not change, then match AST nodes that sit on those unchanged lines by type and column. This gives the later phases a better starting point and reduces the chance of nodes getting matched to the wrong counterpart. Based on: Matsumoto et al., Beyond GumTree: A Hybrid Approach to Generate Edit Scripts, MSR 2019.

Rewrote the top-down phase from scratch. Instead of the old parent bucket scan we now use a height queue that processes nodes tallest first. When multiple nodes share the same hash we defer them and resolve largest first to avoid bad early matches. Also added an optional function pre-matching pass that uses name similarity and child dice score to match functions before the main loop runs. Based on: Falleri and Martinez, Fine-grained Accurate and Scalable Source Code Differencing, ICSE 2024. https://doi.org/10.1145/3597503.3639148

Rewrote the bottom-up phase. Dropped the roles dependency and replaced the old name lookup candidate search with dice similarity over mapped descendants. When similarity scores tie we use child position distance as a tiebreaker. Also pulled simple_recovery out as a shared function and exposed it as _simple_recovery along with _prune_mappings so the recovery phase can reuse them. Based on: Falleri and Martinez, ICSE 2024. https://doi.org/10.1145/3597503.3639148

Replaced the old LCS worklist loop with something much simpler. We now just run simple_recovery and prune_mappings in a loop until the mapping set stops changing. The max_size threshold is gone entirely which was always a pain to tune anyway. Based on: Falleri and Martinez, ICSE 2024. https://doi.org/10.1145/3597503.3639148

The old roles module was doing a lot. Stripped it down to just what we actually need which is looking up a name for a given node kind. build_index now only records outer and name capture pairs and get_kind_name_text just walks up the ancestors to find a match. Much easier to reason about.

Rewrote generate_actions to work directly from mapping metadata instead of going through the roles and semantic machinery. Update detection now checks parent consistency and leaf labels directly. Move detection uses LIS to find reordered siblings and parent change detection for structural moves. Removed semantic.lua entirely since nothing uses it anymore.

Rewrote the hunk computation in analysis.lua. The old version had a lot of moving parts with semantic leaf changes and multiple merge and refine passes. New version just calls vim.diff with result_type indices and then does a word level diff for single line changes. Full token types like identifiers and literals get one hunk covering the whole node which is usually what you want anyway.

Added M.diff() to core.lua which wires together all the phases in order: pre_match, top_down_match, bottom_up_match, recovery_match and generate_actions. Also wrapped each phase in its own function with a cleaner signature so callers can run individual phases if they want. Accepts either positional args or an options table and returns either a flat tuple or a structured table.

Added hunks.lua with normalize_hunk and normalize_list. Basically a safety layer so rendering code does not have to defensively check every field on every hunk. Fills in missing rows and cols, infers kind from whichever side is present, and clamps zero width ranges to at least one character wide.

Plugged normalize_list into both renderer.lua and filler.lua before any hunk rendering happens. Added a guard that skips insert and delete hunks inside update actions when they overlap an explicit insert or delete action to avoid double highlighting. Switched init.lua from the old manual four step pipeline to the new core.diff() call.

Copilot

Pull request overview

This PR refactors Diffmantic’s core diff pipeline toward a new deterministic engine by introducing rule-driven Treesitter preprocessing, new prematching, and a redesigned action/hunk generation + rendering flow.

Changes:

Replaces the previous multi-stage core API (top_down_match/bottom_up_match/recovery_match/generate_actions) with a consolidated core.diff() pipeline and new prematch + recovery behavior.
Introduces language rules (core/rules.lua) and hunk normalization (ui/hunks.lua) to stabilize hashing/hunk ranges and rendering.
Reworks action generation (core/actions.lua) and analysis (core/analysis.lua) to rely on vim.diff-based hunks and updated rendering logic.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
lua/diffmantic/ui/renderer.lua	Normalizes hunks before rendering and avoids double-rendering update-hunks that overlap explicit insert/delete actions.
lua/diffmantic/ui/hunks.lua	New utility to normalize/compat ranges and hunk shape for renderer/filler consumers.
lua/diffmantic/ui/filler.lua	Normalizes update hunks before computing uncovered ranges and filler layout.
lua/diffmantic/treesitter.lua	Adds rule-driven node ignore/flatten/canonicalization during preprocessing and updates hash computation.
lua/diffmantic/init.lua	Switches to `core.diff()` returning a result table with actions/mappings/info.
lua/diffmantic/core/top_down.lua	Major rewrite of top-down mapping logic with optional function prematch and ambiguous handling.
lua/diffmantic/core/semantic.lua	Removes the old semantic leaf-diff/rename heuristics module.
lua/diffmantic/core/rules.lua	New per-language rules for ignored/flattened/aliased node types and label handling.
lua/diffmantic/core/roles.lua	Simplifies role indexing to map `diff.<kind>.outer/name` captures into a lightweight lookup.
lua/diffmantic/core/recovery.lua	Replaces prior recovery implementation with repeated passes delegating to bottom-up helpers + pruning.
lua/diffmantic/core/line_prematch.lua	New unchanged-line prematch to seed initial mappings.
lua/diffmantic/core/bottom_up.lua	Major rewrite of bottom-up mapping + recovery helpers + pruning exported for recovery pass.
lua/diffmantic/core/analysis.lua	Replaces semantic/token diff logic with `vim.diff` hunk generation and range construction helpers.
lua/diffmantic/core/actions.lua	Replaces action generation (insert/delete/update/move) and adds reorder-move detection.
lua/diffmantic/core.lua	Introduces the new public `diff()` orchestration and wraps matching stages + analysis enrichment.

Comments suppressed due to low confidence (1)

lua/diffmantic/treesitter.lua:85

This if parent_id then ... pi.parent_id = pi.parent_id end block is a no-op (it assigns the field to itself) and can be removed. Keeping it makes the flattening branch harder to follow and suggests there is missing logic.

			if parent_id then
				local pi = info[parent_id]
				if pi then pi.parent_id = pi.parent_id end -- already set
			end

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T10:11:09Z

lua/diffmantic/core/line_prematch.lua

+	for src_line, dst_line in pairs(line_map) do
+		local src_nodes = src_by_line[src_line] or {}
+		local dst_nodes = dst_by_line[dst_line] or {}


prematch_unchanged iterates line_map using pairs(), which is not deterministic in Lua. Because this loop builds mappings (and consumes candidates via table.remove(candidates, 1)), the resulting pre-mappings can vary between runs even for identical inputs. Iterate line_map keys in a stable order (e.g., collect src_line keys into an array, sort numerically, then loop with ipairs).

Copilot · 2026-03-05T10:11:09Z

lua/diffmantic/treesitter.lua

+
+	local lang = opts.lang
+	if not lang then
+		lang = vim.bo[bufnr] and vim.bo[bufnr].filetype or ""


Rules selection uses vim.bo[bufnr].filetype as the language key. For many Neovim setups the Treesitter language differs from filetype (e.g. typescriptreact/tsx, javascriptreact/tsx, etc.), so rules_mod.get(lang) may silently fall back to EMPTY and disable the intended canonicalization/ignores. Consider resolving the Treesitter language via vim.treesitter.language.get_lang(filetype) (as done elsewhere) or requiring callers to pass opts.lang consistently.

Suggested change

lang = vim.bo[bufnr] and vim.bo[bufnr].filetype or ""

local ft = vim.bo[bufnr] and vim.bo[bufnr].filetype or ""

-- Prefer Treesitter language mapping when available, fall back to filetype

lang = vim.treesitter.language.get_lang(ft) or ft

Copilot · 2026-03-05T10:11:10Z

lua/diffmantic/core/top_down.lua

+local function build_height_queue(info)
+	local by_height = {}
+	for id, v in pairs(info) do
+		if v.height >= MIN_HEIGHT then
+			by_height[v.height] = by_height[v.height] or {}
+			table.insert(by_height[v.height], id)
 		end


build_height_queue populates by_height by iterating pairs(info), which is unordered. Since the resulting per-height ID lists are later used to drive matching decisions, this can reintroduce non-determinism when there are multiple same-hash candidates. To keep matching deterministic, sort the ID lists per height (e.g. using node_pos_lt) before returning/consuming them.

Copilot · 2026-03-05T10:11:10Z

lua/diffmantic/core/top_down.lua

+	for _, ids in pairs(src_by_h) do
+		for _, id in ipairs(ids) do
+			local h = src_info[id].height
+			src_open[h] = src_open[h] or {}
+			table.insert(src_open[h], id)
 		end


src_open is seeded by iterating pairs(src_by_h), so insertion order of IDs into src_open[h] depends on Lua hash iteration order. Because later logic preserves this order when building hash buckets, the final mappings can vary between runs. Consider iterating heights in sorted order and appending IDs in a stable order (or sorting src_open[h]/dst_open[h]).

I was checking unstable matches and noticed row/col was not enough in a few cases. This stores start_byte and end_byte from node:range(true) so later steps can sort and compare nodes more consistently.

I was seeing occasional order drift in mappings when tie cases happened. This adds deterministic tie-break sorting in prematch and top-down, and normalizes mappings in core.diff so repeated runs stay identical.

I cleaned up a few noisy action cases so updates are represented more consistently. I also removed filler/alignment wiring from the active UI path for the alpha cut, keeping rendering focused on core highlights and signs.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-10T03:28:09Z

lua/diffmantic/treesitter.lua

+	local lang = opts.lang
+	if not lang then
+		lang = vim.bo[bufnr] and vim.bo[bufnr].filetype or ""
+	end
+	local rules = rules_mod.get(lang)


preprocess_tree derives lang from vim.bo[bufnr].filetype, but Treesitter languages often differ from filetype (e.g. typescriptreact -> tsx). That will cause rules_mod.get(lang) to miss rules and change hashing/matching behavior depending on filetype. Resolve lang via vim.treesitter.language.get_lang(filetype) (with a fallback) so rule lookup matches the parser language.

Copilot · 2026-03-10T03:28:09Z

lua/diffmantic/core/analysis.lua

+clamp_non_empty_range = function(range, lines)
+	if not range then
 		return nil
 	end
-	local first = line_value:find("%S")
-	if not first then
-		return nil
-	end
-	local rev_last = line_value:reverse():find("%S")
-	local last = #line_value - rev_last + 1
-	return make_range(row, base_col + first - 1, base_col + last)
-end
-
-local function hunk_change(src_range, dst_range)
-	return { kind = "change", src = src_range, dst = dst_range }
-end
-
-local function hunk_insert(dst_range)
-	return { kind = "insert", src = nil, dst = dst_range }
-end
-
-local function hunk_delete(src_range)
-	return { kind = "delete", src = src_range, dst = nil }
-end
-
-local function range_contains(outer, inner)
-	if not outer or not inner then
-		return false
-	end
-	if outer.start_row == nil or outer.end_row == nil or outer.start_col == nil or outer.end_col == nil then
-		return false
-	end
-	if inner.start_row == nil or inner.end_row == nil or inner.start_col == nil or inner.end_col == nil then
-		return false
+	range.start_row = range.start_row or 0
+	range.end_row = range.end_row or range.start_row
+	range.start_col = range.start_col or 0
+	if range.end_col == nil then
+		local rel_end = (range.end_row - range.start_row) + 1
+		range.end_col = line_len(lines or {}, rel_end)
 	end
-	if inner.start_row < outer.start_row or inner.end_row > outer.end_row then
-		return false
+	if range.start_row == range.end_row and range.end_col <= range.start_col then
+		range.end_col = range.start_col + 1
 	end


clamp_non_empty_range forces end_col = start_col + 1 when end_col <= start_col. For empty lines this can push end_col past the actual line length (0), which can make extmark highlights fail or behave inconsistently. Consider representing empty-line spans using end_row = start_row + 1, end_col = 0 (or setting hl_eol = true at render time) instead of forcing a 1-column range.

Copilot · 2026-03-10T03:28:09Z

lua/diffmantic/core/top_down.lua

+local function build_maps(mappings)
+	local s2d, d2s = {}, {}
+	for _, m in ipairs(mappings) do
+		s2d[m.src] = m.dst
+		d2s[m.dst] = m.src
+	end
+	return s2d, d2s
 end


build_maps is defined but not referenced anywhere in this module. Removing it (or using it where appropriate) would avoid dead code and keep the matching implementation easier to follow.

Copilot · 2026-03-10T03:28:09Z

lua/diffmantic/treesitter.lua

+			if parent_id then
+				local pi = info[parent_id]
+				if pi then pi.parent_id = pi.parent_id end -- already set
+			end


This block is a no-op (pi.parent_id = pi.parent_id) and doesn’t affect behavior. It can be removed to reduce noise and avoid suggesting there’s some important parent-linking logic here.

Copilot · 2026-03-10T03:28:09Z

lua/diffmantic/init.lua

 	local src_role_index = roles.build_index(root1, buf1)
 	local dst_role_index = roles.build_index(root2, buf2)

-	local mappings, src_info, dst_info = core.top_down_match(root1, root2, buf1, buf2, {
-		adaptive_mode = true,
-	})
-	-- print("Top-down mappings: " .. #mappings)
-
-	-- local before_bottom_up = #mappings
-	mappings = core.bottom_up_match(mappings, src_info, dst_info, root1, root2, buf1, buf2, {
-		src_role_index = src_role_index,
-		dst_role_index = dst_role_index,
-		adaptive_mode = true,
-	})
-	-- print("Mappings after Bottom-up: " .. #mappings .. " (+" .. (#mappings - before_bottom_up) .. " new)")
-
-	-- local before_recovery = #mappings
-	local src_count = 0
-	local dst_count = 0
-	for _ in pairs(src_info) do
-		src_count = src_count + 1
-	end
-	for _ in pairs(dst_info) do
-		dst_count = dst_count + 1
-	end
-	local max_nodes = math.max(src_count, dst_count)
-	mappings = core.recovery_match(root1, root2, mappings, src_info, dst_info, buf1, buf2, {
-		recovery_lcs_cell_limit = max_nodes >= 25000 and 1500 or 6000,
-		adaptive_mode = true,
-	})
-	-- debug_utils.print_recovery_mappings(mappings, before_recovery, src_info, dst_info, buf1, buf2)
-
-	local actions = core.generate_actions(root1, root2, mappings, src_info, dst_info, {
+	local result = core.diff(root1, root2, {
 		src_buf = buf1,
 		dst_buf = buf2,
 		src_role_index = src_role_index,
 		dst_role_index = dst_role_index,
-		adaptive_mode = true,
 	})


src_role_index/dst_role_index are computed and passed into core.diff, but the new core pipeline doesn’t read these options anymore (no references under lua/diffmantic/core.lua or matching modules). This adds unnecessary Treesitter query work on every :Diffmantic invocation; consider removing these computations/payload until the indices are actually consumed.

Copilot

Pull request overview

Copilot reviewed 18 out of 22 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

lua/diffmantic/core/actions.lua

+				metadata = {
+					node_type = emit_info.type,
+					new_name  = node_label(emit_info, dst_buf),
+					from_line = range and (range.start_row + 1) or nil,


lua/diffmantic/core/actions.lua

+	for sid, si in pairs(src_info or {}) do
+		if not is_source_property(si) or s2d[sid] then
+			goto continue_src
+		end
+		local mapped_parent = si.parent_id and s2d[si.parent_id] or nil
+		if not mapped_parent then
+			goto continue_src
+		end
+
+		local src_key = property_key_text(si, src_buf)
+		if src_key == "" then
+			goto continue_src
+		end
+
+		local best_di = nil
+		local best_dist = nil
+		for did, di in pairs(dst_info or {}) do
+			if used_dst[did] or d2s[did] then
+				goto continue_dst
 			end
-		elseif t == "update" then
-			local metadata = action.metadata or {}
-			table.insert(summary.updates, {
-				node_type = metadata.node_type,
-				from_line = metadata.from_line,
-				to_line = metadata.to_line,
-				src_range = action.src,
-				dst_range = action.dst,
-			})
-		elseif t == "insert" then
-			local metadata = action.metadata or {}
-			table.insert(summary.inserts, {
-				node_type = metadata.node_type,
-				line = metadata.to_line,
-				dst_range = action.dst,
-			})
-		elseif t == "delete" then
-			local metadata = action.metadata or {}
-			table.insert(summary.deletes, {
-				node_type = metadata.node_type,
-				line = metadata.from_line,
-				src_range = action.src,
-			})
+			if not di or di.type ~= "pair" or di.parent_id ~= mapped_parent then
+				goto continue_dst
+			end
+			local dst_key = property_key_text(di, dst_buf)
+			if dst_key ~= src_key then
+				goto continue_dst
+			end
+			local dist = math.abs((si.start_row or 0) - (di.start_row or 0))
+			if not best_di or dist < best_dist then
+				best_di = did
+				best_dist = dist
+			end
+			::continue_dst::
 		end


HarshK97 added 13 commits February 28, 2026 15:56

refactor(core): removed old non-deterministic core engine

dd8c073

The current engine has diverted too much from Gumtree and is in a non-deterministic state will be writing the core engine again group up

HarshK97 requested a review from Copilot March 5, 2026 10:02

Copilot started reviewing on behalf of HarshK97 March 5, 2026 10:03 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

HarshK97 added 3 commits March 6, 2026 10:15

core(ts): track byte ranges in preprocess info

480faf9

I was checking unstable matches and noticed row/col was not enough in a few cases. This stores start_byte and end_byte from node:range(true) so later steps can sort and compare nodes more consistently.

core(match): stabilize mapping order across runs

89051ca

I was seeing occasional order drift in mappings when tie cases happened. This adds deterministic tie-break sorting in prematch and top-down, and normalizes mappings in core.diff so repeated runs stay identical.

core/ui: refine action output and prep alpha rendering

69161c4

I cleaned up a few noisy action cases so updates are represented more consistently. I also removed filler/alignment wiring from the active UI path for the alpha cut, keeping rendering focused on core highlights and signs.

HarshK97 requested a review from Copilot March 10, 2026 03:22

Copilot started reviewing on behalf of HarshK97 March 10, 2026 03:23 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

HarshK97 added 5 commits March 12, 2026 10:14

feat(core): add rename promotion pipeline

d08f5c4

fix(core): refine action shaping and word-level hunks

9e0367f

fix(ts): resolve parser language rule lookup

d33c940

fix(core): preserve visible string updates

c11630f

docs: refresh README and move assets

e516a05

HarshK97 requested a review from Copilot March 13, 2026 10:15

Copilot started reviewing on behalf of HarshK97 March 13, 2026 10:17 View session

Copilot AI reviewed Mar 13, 2026

View reviewed changes

fix(core): tighten object pair update matching

feda20b

HarshK97 marked this pull request as ready for review March 13, 2026 11:52

HarshK97 merged commit d65dd9f into main Mar 13, 2026

HarshK97 deleted the rewrite/core branch March 13, 2026 13:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(core): removed old non-deterministic core engine#4

refactor(core): removed old non-deterministic core engine#4
HarshK97 merged 22 commits intomainfrom
rewrite/core

HarshK97 commented Feb 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-		lang = vim.bo[bufnr] and vim.bo[bufnr].filetype or ""
+		local ft = vim.bo[bufnr] and vim.bo[bufnr].filetype or ""
+		-- Prefer Treesitter language mapping when available, fall back to filetype
+		lang = vim.treesitter.language.get_lang(ft) or ft

Conversation

HarshK97 commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

HarshK97 commented Feb 28, 2026 •

edited

Loading