Git development
 help / color / mirror / Atom feed
From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "D. Ben Knoble" <ben.knoble@gmail.com>,
	Michael Montalbo <mmontalbo@gmail.com>,
	Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH 2/7] diff: simplify the line-range filter by classifying removals immediately
Date: Thu, 18 Jun 2026 18:16:27 +0000	[thread overview]
Message-ID: <5602b7976a0fcb6eeb582d4cd88ca379f48dd98d.1781806593.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2152.git.1781806593.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

The filter buffered '-' lines in a pending_rm strbuf, deferring their
classification until a '+' or ' ' line revealed the post-image
position.  That buffering is unnecessary: a removal occupies no
post-image line, so it does not advance lno_in_postimage, and xdiff
emits removals before additions within a change.  A '-' therefore
arrives while lno_in_postimage already holds the index the following
'+'/' ' will occupy, and can be classified against the ranges as it
arrives.

The buffering also hid a bug: flush_range_hunk() drained pending_rm into
the range hunk whenever the hunk was active, even after lno_in_postimage
had advanced past the tracked range, so a deletion just after the
tracked function leaked into the patch.  Classifying each line as it
arrives removes the pending_rm buffer, the discard_pending_rm() helper,
three struct fields, and makes that bug impossible by construction.

With every line classified on arrival, the buffered lines are the
hunk's single source of truth, so the old/new counts need not be kept
alongside them: flush_range_hunk() derives the counts (and whether the
hunk holds any change) from the buffer when it builds the header.  Drop
the per-line counting and the old_count, new_count, and has_changes
fields; there is no longer a second tally that could fall out of sync
with the buffer.

Add begin_range_hunk() to open the accumulator at the first in-range
line, seeding both begins from the live image cursors, as the
counterpart to flush_range_hunk().  With the counting gone too,
line_range_line_fn() now only appends an in-range line.

Document the coordinate model: a block comment on struct
line_range_filter states it (the pre/post-image cursors, the 0-based
idx_in_postimage, removals classified by the following line) with a
worked example.

Add tests for the leaked trailing deletion this fixes, the symmetric
leading-deletion case, and the filter's range boundaries (a change at
the first and last line of a range, and a pure in-range deletion).

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 diff.c              | 215 ++++++++++++++++++++++++--------------------
 t/t4211-line-log.sh | 125 ++++++++++++++++++++++++++
 2 files changed, 243 insertions(+), 97 deletions(-)

diff --git a/diff.c b/diff.c
index 1e043c959f..ee765d7ac2 100644
--- a/diff.c
+++ b/diff.c
@@ -610,18 +610,58 @@ struct emit_callback {
 };
 
 /*
- * State for the line-range callback wrappers that sit between
- * xdi_diff_outf() and fn_out_consume().  xdiff produces a normal,
- * unfiltered diff; the wrappers intercept each hunk header and line,
- * track post-image position, and forward only lines that fall within
- * the requested ranges.  Contiguous in-range lines are collected into
- * range hunks and flushed with a synthetic @@ header so that
- * fn_out_consume() sees well-formed unified-diff fragments.
+ * Line-range filter: scopes "git log -L" output to the tracked ranges.
  *
- * Removal lines ('-') cannot be classified by post-image position, so
- * they are buffered in pending_rm until the next '+' or ' ' line
- * reveals whether they precede an in-range line (flush into range hunk) or
- * an out-of-range line (discard).
+ * It sits between xdi_diff_outf() and an output callback (fn_out_consume,
+ * diffstat_consume, checkdiff_consume).  xdiff produces a normal diff; the
+ * filter forwards only the lines inside the requested ranges, collecting
+ * contiguous in-range lines into a "range hunk" emitted with a synthetic
+ * @@ header so the callback sees well-formed unified-diff fragments.
+ *
+ * A diff describes the change from a pre-image to a post-image.  Each
+ * line is context (' ', in both), a removal ('-', pre-image only), or
+ * an addition ('+', post-image only).  -L tracks ranges in the
+ * post-image, so a line is in range by its post-image position.
+ *
+ * Two 1-based cursors track the next line in each image, named as in
+ * struct emit_callback and seeded from the xdiff hunk header:
+ *
+ *	lno_in_postimage  advances on '+' and ' '   (lines in the post-image)
+ *	lno_in_preimage   advances on '-' and ' '   (lines in the pre-image)
+ *
+ * Ranges are 0-based half-open [start, end), so a line is tested at the
+ * 0-based index idx_in_postimage = lno_in_postimage - 1.
+ *
+ * A '-' is not present in the post-image, so it has no post-image line
+ * number of its own.  Since it does not advance lno_in_postimage, it is
+ * classified at the idx_in_postimage that the following '+'/' ' will
+ * occupy.  xdiff emits a change's removals before its additions, so that
+ * index is already known when the '-' arrives.
+ *
+ * The synthetic "@@ -<old> +<new> @@" header has two sides, old (the
+ * pre-image) and new (the post-image), matching the xdiff_emit_hunk_fn
+ * callback; the hunk.old_begin / hunk.new_begin fields below hold those
+ * begins, and flush_range_hunk() derives the counts from the buffered
+ * lines.
+ *
+ * Example, tracking post-image line 2 (range [1, 2)) of:
+ *
+ *	pre-image  post-image
+ *	1 a        1 a
+ *	2 b        2 X      (b -> X)
+ *	3 c        3 c
+ *
+ *	classify each line by idx_in_postimage.  The pre and post columns
+ *	are each cursor's value while that line is classified, i.e. before
+ *	the line advances them (pre = lno_in_preimage,
+ *	post = lno_in_postimage, idx = idx_in_postimage):
+ *	' a'  pre 1  post 1  idx 0  ->  before start, skip
+ *	'-b'  pre 2  post 2  idx 1  ->  keep (removal)
+ *	'+X'  pre 3  post 2  idx 1  ->  keep (addition)
+ *	' c'  pre 3  post 3  idx 2  ->  past end, flush
+ *
+ * -b and +X share idx = 1 because -b did not advance lno_in_postimage;
+ * both land in the range hunk, flushed when ' c' crosses the range end.
  */
 struct line_range_filter {
 	xdiff_emit_line_fn orig_line_fn;
@@ -640,20 +680,18 @@ struct line_range_filter {
 	char func[80];
 	long funclen;
 
-	/* The range hunk being accumulated for the current range. */
+	/*
+	 * The range hunk being accumulated.  At most one is live at a time:
+	 * it is flushed and reset as the cursor leaves each range (and once
+	 * more at end of diff), then reused for the next range.
+	 */
 	struct {
 		struct strbuf lines;	/* buffered in-range diff lines */
-		long old_begin, old_count;
-		long new_begin, new_count;
+		long old_begin;
+		long new_begin;
 		int active;
-		int has_changes;	/* any '+' or '-' line? */
 	} hunk;
 
-	/* Removal lines not yet known to be in-range */
-	struct strbuf pending_rm;
-	int pending_rm_count;
-	long pending_rm_pre_begin;	/* pre-image line of first pending */
-
 	int ret;			/* latched error from orig_line_fn */
 };
 
@@ -2542,26 +2580,48 @@ static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED
 	return 1;
 }
 
-static void discard_pending_rm(struct line_range_filter *filter)
+/*
+ * Begin a range hunk at the first in-range line.  Its position fixes the
+ * hunk's begins, taken from the two image cursors before they advance:
+ * new_begin from the post-image, old_begin from the pre-image.  The line
+ * counts are not tracked here; flush_range_hunk() derives them from the
+ * buffered lines.
+ */
+static void begin_range_hunk(struct line_range_filter *filter)
 {
-	strbuf_reset(&filter->pending_rm);
-	filter->pending_rm_count = 0;
+	filter->hunk.active = 1;
+	filter->hunk.new_begin = filter->lno_in_postimage;
+	filter->hunk.old_begin = filter->lno_in_preimage;
+	strbuf_reset(&filter->hunk.lines);
 }
 
 static void flush_range_hunk(struct line_range_filter *filter)
 {
 	struct strbuf hdr = STRBUF_INIT;
 	const char *p, *end;
+	long old_count = 0, new_count = 0;
+	int has_changes = 0;
 
 	if (!filter->hunk.active || filter->ret)
 		return;
 
-	/* Drain any pending removal lines into the range hunk */
-	if (filter->pending_rm_count) {
-		strbuf_addbuf(&filter->hunk.lines, &filter->pending_rm);
-		filter->hunk.old_count += filter->pending_rm_count;
-		filter->hunk.has_changes = 1;
-		discard_pending_rm(filter);
+	/*
+	 * Derive the hunk's geometry from the buffered lines: a ' '
+	 * counts on both sides, a '-' on the old side, a '+' on the new.
+	 * A '-' or '+' marks a real change; the "\ No newline at end of
+	 * file" marker (line[0] == '\\') counts on neither side.
+	 */
+	p = filter->hunk.lines.buf;
+	end = p + filter->hunk.lines.len;
+	while (p < end) {
+		const char *eol = memchr(p, '\n', end - p);
+		if (*p == ' ' || *p == '-')
+			old_count++;
+		if (*p == ' ' || *p == '+')
+			new_count++;
+		if (*p == '-' || *p == '+')
+			has_changes = 1;
+		p = eol ? eol + 1 : end;
 	}
 
 	/*
@@ -2570,15 +2630,15 @@ static void flush_range_hunk(struct line_range_filter *filter)
 	 * ctxlen causes xdiff to emit context covering a range that
 	 * has no changes in this commit.
 	 */
-	if (!filter->hunk.has_changes) {
+	if (!has_changes) {
 		filter->hunk.active = 0;
 		strbuf_reset(&filter->hunk.lines);
 		return;
 	}
 
 	strbuf_addf(&hdr, "@@ -%ld,%ld +%ld,%ld @@",
-		    filter->hunk.old_begin, filter->hunk.old_count,
-		    filter->hunk.new_begin, filter->hunk.new_count);
+		    filter->hunk.old_begin, old_count,
+		    filter->hunk.new_begin, new_count);
 	if (filter->funclen > 0) {
 		strbuf_addch(&hdr, ' ');
 		strbuf_add(&hdr, filter->func, filter->funclen);
@@ -2618,11 +2678,6 @@ static void line_range_hunk_fn(void *data,
 	 * When count > 0, begin is 1-based.  When count == 0, begin is
 	 * adjusted down by 1 by xdl_emit_hunk_hdr(), but no lines of
 	 * that type will arrive, so the value is unused.
-	 *
-	 * Any pending removal lines from the previous xdiff hunk are
-	 * intentionally left in pending_rm: the line callback will
-	 * flush or discard them when the next content line reveals
-	 * whether the removals precede in-range content.
 	 */
 	filter->lno_in_postimage = new_begin;
 	filter->lno_in_preimage = old_begin;
@@ -2638,88 +2693,56 @@ static void line_range_hunk_fn(void *data,
 static int line_range_line_fn(void *priv, char *line, unsigned long len)
 {
 	struct line_range_filter *filter = priv;
-	const struct range *cur;
-	long idx_in_postimage, cur_pre;
+	long idx_in_postimage;
+	int in_range;
 
 	if (filter->ret)
 		return filter->ret;
 
-	if (line[0] == '-') {
-		if (!filter->pending_rm_count)
-			filter->pending_rm_pre_begin = filter->lno_in_preimage;
-		filter->lno_in_preimage++;
-		strbuf_add(&filter->pending_rm, line, len);
-		filter->pending_rm_count++;
-		return filter->ret;
-	}
-
 	if (line[0] == '\\') {
-		if (filter->pending_rm_count)
-			strbuf_add(&filter->pending_rm, line, len);
-		else if (filter->hunk.active)
+		if (filter->hunk.active)
 			strbuf_add(&filter->hunk.lines, line, len);
-		/* otherwise outside tracked range; drop silently */
 		return filter->ret;
 	}
 
-	if (line[0] != '+' && line[0] != ' ')
+	if (line[0] != '+' && line[0] != ' ' && line[0] != '-')
 		BUG("unexpected diff line type '%c'", line[0]);
 
+	/*
+	 * idx_in_postimage is this line's 0-based post-image index (see the model on
+	 * struct line_range_filter).  The cursors are advanced only after
+	 * the line is classified, so a '-' is tested at the same idx_in_postimage as
+	 * the '+'/' ' that follows it.
+	 */
 	idx_in_postimage = filter->lno_in_postimage - 1;
-	cur_pre = filter->lno_in_preimage;	/* save before advancing for context lines */
-	filter->lno_in_postimage++;
-	if (line[0] == ' ')
-		filter->lno_in_preimage++;
 
-	/* Advance past ranges we've passed */
+	/* Retire ranges we have passed, flushing the one we leave. */
 	while (filter->cur_range < filter->ranges->nr &&
 	       idx_in_postimage >= filter->ranges->ranges[filter->cur_range].end) {
 		if (filter->hunk.active)
 			flush_range_hunk(filter);
-		discard_pending_rm(filter);
 		filter->cur_range++;
 	}
 
-	/* Past all ranges */
-	if (filter->cur_range >= filter->ranges->nr) {
-		discard_pending_rm(filter);
-		return filter->ret;
-	}
+	in_range = filter->cur_range < filter->ranges->nr &&
+		   idx_in_postimage >= filter->ranges->ranges[filter->cur_range].start &&
+		   idx_in_postimage < filter->ranges->ranges[filter->cur_range].end;
 
-	cur = &filter->ranges->ranges[filter->cur_range];
+	if (in_range) {
+		if (!filter->hunk.active)
+			begin_range_hunk(filter);
 
-	/* Before current range */
-	if (idx_in_postimage < cur->start) {
-		discard_pending_rm(filter);
-		return filter->ret;
+		strbuf_add(&filter->hunk.lines, line, len);
 	}
 
-	/* In range so start a new range hunk if needed */
-	if (!filter->hunk.active) {
-		filter->hunk.active = 1;
-		filter->hunk.has_changes = 0;
-		filter->hunk.new_begin = idx_in_postimage + 1;
-		filter->hunk.old_begin = filter->pending_rm_count
-			? filter->pending_rm_pre_begin : cur_pre;
-		filter->hunk.old_count = 0;
-		filter->hunk.new_count = 0;
-		strbuf_reset(&filter->hunk.lines);
-	}
-
-	/* Flush pending removals into range hunk */
-	if (filter->pending_rm_count) {
-		strbuf_addbuf(&filter->hunk.lines, &filter->pending_rm);
-		filter->hunk.old_count += filter->pending_rm_count;
-		filter->hunk.has_changes = 1;
-		discard_pending_rm(filter);
-	}
-
-	strbuf_add(&filter->hunk.lines, line, len);
-	filter->hunk.new_count++;
-	if (line[0] == '+')
-		filter->hunk.has_changes = 1;
-	else
-		filter->hunk.old_count++;
+	/*
+	 * Advance each image's cursor: a line present in that image (see
+	 * the model) consumes one of its line numbers.
+	 */
+	if (line[0] != '-')
+		filter->lno_in_postimage++;
+	if (line[0] != '+')
+		filter->lno_in_preimage++;
 
 	return filter->ret;
 }
@@ -4097,7 +4120,6 @@ static void builtin_diff(const char *name_a,
 			lr_state.orig_cb_data = &ecbdata;
 			lr_state.ranges = line_ranges;
 			strbuf_init(&lr_state.hunk.lines, 0);
-			strbuf_init(&lr_state.pending_rm, 0);
 
 			/*
 			 * Inflate ctxlen so that all changes within
@@ -4132,7 +4154,6 @@ static void builtin_diff(const char *name_a,
 				die("unable to generate diff for %s",
 				    one->path);
 			strbuf_release(&lr_state.hunk.lines);
-			strbuf_release(&lr_state.pending_rm);
 		} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
 					 &ecbdata, &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index ca4eb7bbc7..e9691066de 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -738,6 +738,131 @@ test_expect_success '-L with -G filters to diff-text matches' '
 	grep "F2 + 2" actual
 '
 
+test_expect_success 'setup for trailing deletion test' '
+	git checkout --orphan trailing-del &&
+	git reset --hard &&
+	cat >file.c <<-\EOF &&
+	void tracked()
+	{
+	    return 1;
+	}
+	// trailing comment
+	EOF
+	git add file.c &&
+	test_tick &&
+	git commit -m "add file with trailing comment" &&
+	# Modify tracked() AND delete the trailing comment in
+	# one commit, so the commit touches the tracked range
+	# and is not filtered out by the revision walker.
+	cat >file.c <<-\EOF &&
+	void tracked()
+	{
+	    return 2;
+	}
+	EOF
+	git commit -a -m "modify tracked and delete trailing comment"
+'
+
+test_expect_success '-L does not include deletions past end of tracked range' '
+	git log -L:tracked:file.c --format= -1 -p >actual &&
+	# The trailing comment deletion is outside the tracked
+	# range and should not appear in the patch output.
+	test_grep "return 2" actual &&
+	test_grep ! "trailing comment" actual
+'
+
+test_expect_success '-L includes leading deletions resolved by in-range line' '
+	git checkout --orphan leading-del &&
+	git reset --hard &&
+	cat >file.c <<-\EOF &&
+	// leading comment
+	void tracked()
+	{
+	    return 1;
+	}
+	EOF
+	git add file.c &&
+	test_tick &&
+	git commit -m "add file with leading comment" &&
+	cat >file.c <<-\EOF &&
+	void tracked()
+	{
+	    return 2;
+	}
+	EOF
+	git commit -a -m "modify tracked and delete leading comment" &&
+	git log -L:tracked:file.c --format= -1 -p >actual &&
+	# The leading comment deletion is resolved by the next
+	# non-removal line (void tracked), which is in range: a
+	# removal is classified by the position of the following
+	# line, so it joins the range that line falls in.
+	test_grep "return 2" actual &&
+	test_grep "leading comment" actual
+'
+
+test_expect_success 'setup for line-range filter edge cases' '
+	git checkout --orphan filter-edge &&
+	git reset --hard &&
+	cat >file.c <<-\EOF &&
+	void before()
+	{
+	    return 0;
+	}
+
+	void tracked()
+	{
+	    int a = 1;
+	    int b = 2;
+	    int c = 3;
+	    return a + b + c;
+	}
+
+	void after()
+	{
+	    return 9;
+	}
+	EOF
+	git add file.c &&
+	test_tick &&
+	git commit -m "initial"
+'
+
+test_expect_success '-L change at exact first line of range' '
+	git checkout filter-edge &&
+	# Change the function signature (first line of range)
+	sed "s/void tracked/int tracked/" file.c >tmp &&
+	mv tmp file.c &&
+	git commit -a -m "change first line" &&
+	git log -L:tracked:file.c -p --format=%s -1 >actual &&
+	test_grep "change first line" actual &&
+	test_grep "+int tracked" actual &&
+	test_grep "\\-void tracked" actual
+'
+
+test_expect_success '-L change at exact last line of range' '
+	git checkout filter-edge &&
+	git reset --hard HEAD~1 &&
+	# Change the closing brace line (last line of range)
+	sed "s/^}$/} \/\/ end tracked/" file.c >tmp &&
+	mv tmp file.c &&
+	git commit -a -m "change last line" &&
+	git log -L:tracked:file.c -p --format=%s -1 >actual &&
+	test_grep "change last line" actual &&
+	test_grep "end tracked" actual
+'
+
+test_expect_success '-L pure deletion in range (no additions)' '
+	git checkout filter-edge &&
+	git reset --hard HEAD~1 &&
+	# Delete a line inside tracked() without adding anything
+	sed "/int c/d" file.c >tmp &&
+	mv tmp file.c &&
+	git commit -a -m "pure deletion" &&
+	git log -L:tracked:file.c -p --format=%s -1 >actual &&
+	test_grep "pure deletion" actual &&
+	test_grep "\\-.*int c" actual
+'
+
 test_expect_success '-L with --diff-filter=M excludes root commit' '
 	git checkout parent-oids &&
 	git log -L:func2:file.c --diff-filter=M --format=%s --no-patch >actual &&
-- 
gitgitgadget


  parent reply	other threads:[~2026-06-18 18:16 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-18 18:16 [PATCH 0/7] line-log: range-scope stat, check, and -G under -L Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 1/7] diff: rename and group the line-range filter for clarity Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` Michael Montalbo via GitGitGadget [this message]
2026-06-18 18:16 ` [PATCH 3/7] diff: emit -L hunk headers via xdiff's formatter Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 4/7] diff: extract a line-range diff helper for reuse Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 5/7] line-log: support diff stat formats with -L Michael Montalbo via GitGitGadget
2026-06-18 22:00   ` Junio C Hamano
2026-06-18 18:16 ` [PATCH 6/7] diff: support --check with -L line ranges Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 7/7] diffcore-pickaxe: scope -G to the -L tracked range Michael Montalbo via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5602b7976a0fcb6eeb582d4cd88ca379f48dd98d.1781806593.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=ben.knoble@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=mmontalbo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox