All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "D. Ben Knoble" <ben.knoble@gmail.com>,
	Michael Montalbo <mmontalbo@gmail.com>,
	Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH 2/7] diff: simplify the line-range filter by classifying removals immediately
Date: Thu, 18 Jun 2026 18:16:27 +0000	[thread overview]
Message-ID: <5602b7976a0fcb6eeb582d4cd88ca379f48dd98d.1781806593.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2152.git.1781806593.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

The filter buffered '-' lines in a pending_rm strbuf, deferring their
classification until a '+' or ' ' line revealed the post-image
position.  That buffering is unnecessary: a removal occupies no
post-image line, so it does not advance lno_in_postimage, and xdiff
emits removals before additions within a change.  A '-' therefore
arrives while lno_in_postimage already holds the index the following
'+'/' ' will occupy, and can be classified against the ranges as it
arrives.

The buffering also hid a bug: flush_range_hunk() drained pending_rm into
the range hunk whenever the hunk was active, even after lno_in_postimage
had advanced past the tracked range, so a deletion just after the
tracked function leaked into the patch.  Classifying each line as it
arrives removes the pending_rm buffer, the discard_pending_rm() helper,
three struct fields, and makes that bug impossible by construction.

With every line classified on arrival, the buffered lines are the
hunk's single source of truth, so the old/new counts need not be kept
alongside them: flush_range_hunk() derives the counts (and whether the
hunk holds any change) from the buffer when it builds the header.  Drop
the per-line counting and the old_count, new_count, and has_changes
fields; there is no longer a second tally that could fall out of sync
with the buffer.

Add begin_range_hunk() to open the accumulator at the first in-range
line, seeding both begins from the live image cursors, as the
counterpart to flush_range_hunk().  With the counting gone too,
line_range_line_fn() now only appends an in-range line.

Document the coordinate model: a block comment on struct
line_range_filter states it (the pre/post-image cursors, the 0-based
idx_in_postimage, removals classified by the following line) with a
worked example.

Add tests for the leaked trailing deletion this fixes, the symmetric
leading-deletion case, and the filter's range boundaries (a change at
the first and last line of a range, and a pure in-range deletion).

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 diff.c              | 215 ++++++++++++++++++++++++--------------------
 t/t4211-line-log.sh | 125 ++++++++++++++++++++++++++
 2 files changed, 243 insertions(+), 97 deletions(-)

diff --git a/diff.c b/diff.c
index 1e043c959f..ee765d7ac2 100644
--- a/diff.c
+++ b/diff.c
@@ -610,18 +610,58 @@ struct emit_callback {
 };
 
 /*
- * State for the line-range callback wrappers that sit between
- * xdi_diff_outf() and fn_out_consume().  xdiff produces a normal,
- * unfiltered diff; the wrappers intercept each hunk header and line,
- * track post-image position, and forward only lines that fall within
- * the requested ranges.  Contiguous in-range lines are collected into
- * range hunks and flushed with a synthetic @@ header so that
- * fn_out_consume() sees well-formed unified-diff fragments.
+ * Line-range filter: scopes "git log -L" output to the tracked ranges.
  *
- * Removal lines ('-') cannot be classified by post-image position, so
- * they are buffered in pending_rm until the next '+' or ' ' line
- * reveals whether they precede an in-range line (flush into range hunk) or
- * an out-of-range line (discard).
+ * It sits between xdi_diff_outf() and an output callback (fn_out_consume,
+ * diffstat_consume, checkdiff_consume).  xdiff produces a normal diff; the
+ * filter forwards only the lines inside the requested ranges, collecting
+ * contiguous in-range lines into a "range hunk" emitted with a synthetic
+ * @@ header so the callback sees well-formed unified-diff fragments.
+ *
+ * A diff describes the change from a pre-image to a post-image.  Each
+ * line is context (' ', in both), a removal ('-', pre-image only), or
+ * an addition ('+', post-image only).  -L tracks ranges in the
+ * post-image, so a line is in range by its post-image position.
+ *
+ * Two 1-based cursors track the next line in each image, named as in
+ * struct emit_callback and seeded from the xdiff hunk header:
+ *
+ *	lno_in_postimage  advances on '+' and ' '   (lines in the post-image)
+ *	lno_in_preimage   advances on '-' and ' '   (lines in the pre-image)
+ *
+ * Ranges are 0-based half-open [start, end), so a line is tested at the
+ * 0-based index idx_in_postimage = lno_in_postimage - 1.
+ *
+ * A '-' is not present in the post-image, so it has no post-image line
+ * number of its own.  Since it does not advance lno_in_postimage, it is
+ * classified at the idx_in_postimage that the following '+'/' ' will
+ * occupy.  xdiff emits a change's removals before its additions, so that
+ * index is already known when the '-' arrives.
+ *
+ * The synthetic "@@ -<old> +<new> @@" header has two sides, old (the
+ * pre-image) and new (the post-image), matching the xdiff_emit_hunk_fn
+ * callback; the hunk.old_begin / hunk.new_begin fields below hold those
+ * begins, and flush_range_hunk() derives the counts from the buffered
+ * lines.
+ *
+ * Example, tracking post-image line 2 (range [1, 2)) of:
+ *
+ *	pre-image  post-image
+ *	1 a        1 a
+ *	2 b        2 X      (b -> X)
+ *	3 c        3 c
+ *
+ *	classify each line by idx_in_postimage.  The pre and post columns
+ *	are each cursor's value while that line is classified, i.e. before
+ *	the line advances them (pre = lno_in_preimage,
+ *	post = lno_in_postimage, idx = idx_in_postimage):
+ *	' a'  pre 1  post 1  idx 0  ->  before start, skip
+ *	'-b'  pre 2  post 2  idx 1  ->  keep (removal)
+ *	'+X'  pre 3  post 2  idx 1  ->  keep (addition)
+ *	' c'  pre 3  post 3  idx 2  ->  past end, flush
+ *
+ * -b and +X share idx = 1 because -b did not advance lno_in_postimage;
+ * both land in the range hunk, flushed when ' c' crosses the range end.
  */
 struct line_range_filter {
 	xdiff_emit_line_fn orig_line_fn;
@@ -640,20 +680,18 @@ struct line_range_filter {
 	char func[80];
 	long funclen;
 
-	/* The range hunk being accumulated for the current range. */
+	/*
+	 * The range hunk being accumulated.  At most one is live at a time:
+	 * it is flushed and reset as the cursor leaves each range (and once
+	 * more at end of diff), then reused for the next range.
+	 */
 	struct {
 		struct strbuf lines;	/* buffered in-range diff lines */
-		long old_begin, old_count;
-		long new_begin, new_count;
+		long old_begin;
+		long new_begin;
 		int active;
-		int has_changes;	/* any '+' or '-' line? */
 	} hunk;
 
-	/* Removal lines not yet known to be in-range */
-	struct strbuf pending_rm;
-	int pending_rm_count;
-	long pending_rm_pre_begin;	/* pre-image line of first pending */
-
 	int ret;			/* latched error from orig_line_fn */
 };
 
@@ -2542,26 +2580,48 @@ static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED
 	return 1;
 }
 
-static void discard_pending_rm(struct line_range_filter *filter)
+/*
+ * Begin a range hunk at the first in-range line.  Its position fixes the
+ * hunk's begins, taken from the two image cursors before they advance:
+ * new_begin from the post-image, old_begin from the pre-image.  The line
+ * counts are not tracked here; flush_range_hunk() derives them from the
+ * buffered lines.
+ */
+static void begin_range_hunk(struct line_range_filter *filter)
 {
-	strbuf_reset(&filter->pending_rm);
-	filter->pending_rm_count = 0;
+	filter->hunk.active = 1;
+	filter->hunk.new_begin = filter->lno_in_postimage;
+	filter->hunk.old_begin = filter->lno_in_preimage;
+	strbuf_reset(&filter->hunk.lines);
 }
 
 static void flush_range_hunk(struct line_range_filter *filter)
 {
 	struct strbuf hdr = STRBUF_INIT;
 	const char *p, *end;
+	long old_count = 0, new_count = 0;
+	int has_changes = 0;
 
 	if (!filter->hunk.active || filter->ret)
 		return;
 
-	/* Drain any pending removal lines into the range hunk */
-	if (filter->pending_rm_count) {
-		strbuf_addbuf(&filter->hunk.lines, &filter->pending_rm);
-		filter->hunk.old_count += filter->pending_rm_count;
-		filter->hunk.has_changes = 1;
-		discard_pending_rm(filter);
+	/*
+	 * Derive the hunk's geometry from the buffered lines: a ' '
+	 * counts on both sides, a '-' on the old side, a '+' on the new.
+	 * A '-' or '+' marks a real change; the "\ No newline at end of
+	 * file" marker (line[0] == '\\') counts on neither side.
+	 */
+	p = filter->hunk.lines.buf;
+	end = p + filter->hunk.lines.len;
+	while (p < end) {
+		const char *eol = memchr(p, '\n', end - p);
+		if (*p == ' ' || *p == '-')
+			old_count++;
+		if (*p == ' ' || *p == '+')
+			new_count++;
+		if (*p == '-' || *p == '+')
+			has_changes = 1;
+		p = eol ? eol + 1 : end;
 	}
 
 	/*
@@ -2570,15 +2630,15 @@ static void flush_range_hunk(struct line_range_filter *filter)
 	 * ctxlen causes xdiff to emit context covering a range that
 	 * has no changes in this commit.
 	 */
-	if (!filter->hunk.has_changes) {
+	if (!has_changes) {
 		filter->hunk.active = 0;
 		strbuf_reset(&filter->hunk.lines);
 		return;
 	}
 
 	strbuf_addf(&hdr, "@@ -%ld,%ld +%ld,%ld @@",
-		    filter->hunk.old_begin, filter->hunk.old_count,
-		    filter->hunk.new_begin, filter->hunk.new_count);
+		    filter->hunk.old_begin, old_count,
+		    filter->hunk.new_begin, new_count);
 	if (filter->funclen > 0) {
 		strbuf_addch(&hdr, ' ');
 		strbuf_add(&hdr, filter->func, filter->funclen);
@@ -2618,11 +2678,6 @@ static void line_range_hunk_fn(void *data,
 	 * When count > 0, begin is 1-based.  When count == 0, begin is
 	 * adjusted down by 1 by xdl_emit_hunk_hdr(), but no lines of
 	 * that type will arrive, so the value is unused.
-	 *
-	 * Any pending removal lines from the previous xdiff hunk are
-	 * intentionally left in pending_rm: the line callback will
-	 * flush or discard them when the next content line reveals
-	 * whether the removals precede in-range content.
 	 */
 	filter->lno_in_postimage = new_begin;
 	filter->lno_in_preimage = old_begin;
@@ -2638,88 +2693,56 @@ static void line_range_hunk_fn(void *data,
 static int line_range_line_fn(void *priv, char *line, unsigned long len)
 {
 	struct line_range_filter *filter = priv;
-	const struct range *cur;
-	long idx_in_postimage, cur_pre;
+	long idx_in_postimage;
+	int in_range;
 
 	if (filter->ret)
 		return filter->ret;
 
-	if (line[0] == '-') {
-		if (!filter->pending_rm_count)
-			filter->pending_rm_pre_begin = filter->lno_in_preimage;
-		filter->lno_in_preimage++;
-		strbuf_add(&filter->pending_rm, line, len);
-		filter->pending_rm_count++;
-		return filter->ret;
-	}
-
 	if (line[0] == '\\') {
-		if (filter->pending_rm_count)
-			strbuf_add(&filter->pending_rm, line, len);
-		else if (filter->hunk.active)
+		if (filter->hunk.active)
 			strbuf_add(&filter->hunk.lines, line, len);
-		/* otherwise outside tracked range; drop silently */
 		return filter->ret;
 	}
 
-	if (line[0] != '+' && line[0] != ' ')
+	if (line[0] != '+' && line[0] != ' ' && line[0] != '-')
 		BUG("unexpected diff line type '%c'", line[0]);
 
+	/*
+	 * idx_in_postimage is this line's 0-based post-image index (see the model on
+	 * struct line_range_filter).  The cursors are advanced only after
+	 * the line is classified, so a '-' is tested at the same idx_in_postimage as
+	 * the '+'/' ' that follows it.
+	 */
 	idx_in_postimage = filter->lno_in_postimage - 1;
-	cur_pre = filter->lno_in_preimage;	/* save before advancing for context lines */
-	filter->lno_in_postimage++;
-	if (line[0] == ' ')
-		filter->lno_in_preimage++;
 
-	/* Advance past ranges we've passed */
+	/* Retire ranges we have passed, flushing the one we leave. */
 	while (filter->cur_range < filter->ranges->nr &&
 	       idx_in_postimage >= filter->ranges->ranges[filter->cur_range].end) {
 		if (filter->hunk.active)
 			flush_range_hunk(filter);
-		discard_pending_rm(filter);
 		filter->cur_range++;
 	}
 
-	/* Past all ranges */
-	if (filter->cur_range >= filter->ranges->nr) {
-		discard_pending_rm(filter);
-		return filter->ret;
-	}
+	in_range = filter->cur_range < filter->ranges->nr &&
+		   idx_in_postimage >= filter->ranges->ranges[filter->cur_range].start &&
+		   idx_in_postimage < filter->ranges->ranges[filter->cur_range].end;
 
-	cur = &filter->ranges->ranges[filter->cur_range];
+	if (in_range) {
+		if (!filter->hunk.active)
+			begin_range_hunk(filter);
 
-	/* Before current range */
-	if (idx_in_postimage < cur->start) {
-		discard_pending_rm(filter);
-		return filter->ret;
+		strbuf_add(&filter->hunk.lines, line, len);
 	}
 
-	/* In range so start a new range hunk if needed */
-	if (!filter->hunk.active) {
-		filter->hunk.active = 1;
-		filter->hunk.has_changes = 0;
-		filter->hunk.new_begin = idx_in_postimage + 1;
-		filter->hunk.old_begin = filter->pending_rm_count
-			? filter->pending_rm_pre_begin : cur_pre;
-		filter->hunk.old_count = 0;
-		filter->hunk.new_count = 0;
-		strbuf_reset(&filter->hunk.lines);
-	}
-
-	/* Flush pending removals into range hunk */
-	if (filter->pending_rm_count) {
-		strbuf_addbuf(&filter->hunk.lines, &filter->pending_rm);
-		filter->hunk.old_count += filter->pending_rm_count;
-		filter->hunk.has_changes = 1;
-		discard_pending_rm(filter);
-	}
-
-	strbuf_add(&filter->hunk.lines, line, len);
-	filter->hunk.new_count++;
-	if (line[0] == '+')
-		filter->hunk.has_changes = 1;
-	else
-		filter->hunk.old_count++;
+	/*
+	 * Advance each image's cursor: a line present in that image (see
+	 * the model) consumes one of its line numbers.
+	 */
+	if (line[0] != '-')
+		filter->lno_in_postimage++;
+	if (line[0] != '+')
+		filter->lno_in_preimage++;
 
 	return filter->ret;
 }
@@ -4097,7 +4120,6 @@ static void builtin_diff(const char *name_a,
 			lr_state.orig_cb_data = &ecbdata;
 			lr_state.ranges = line_ranges;
 			strbuf_init(&lr_state.hunk.lines, 0);
-			strbuf_init(&lr_state.pending_rm, 0);
 
 			/*
 			 * Inflate ctxlen so that all changes within
@@ -4132,7 +4154,6 @@ static void builtin_diff(const char *name_a,
 				die("unable to generate diff for %s",
 				    one->path);
 			strbuf_release(&lr_state.hunk.lines);
-			strbuf_release(&lr_state.pending_rm);
 		} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
 					 &ecbdata, &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index ca4eb7bbc7..e9691066de 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -738,6 +738,131 @@ test_expect_success '-L with -G filters to diff-text matches' '
 	grep "F2 + 2" actual
 '
 
+test_expect_success 'setup for trailing deletion test' '
+	git checkout --orphan trailing-del &&
+	git reset --hard &&
+	cat >file.c <<-\EOF &&
+	void tracked()
+	{
+	    return 1;
+	}
+	// trailing comment
+	EOF
+	git add file.c &&
+	test_tick &&
+	git commit -m "add file with trailing comment" &&
+	# Modify tracked() AND delete the trailing comment in
+	# one commit, so the commit touches the tracked range
+	# and is not filtered out by the revision walker.
+	cat >file.c <<-\EOF &&
+	void tracked()
+	{
+	    return 2;
+	}
+	EOF
+	git commit -a -m "modify tracked and delete trailing comment"
+'
+
+test_expect_success '-L does not include deletions past end of tracked range' '
+	git log -L:tracked:file.c --format= -1 -p >actual &&
+	# The trailing comment deletion is outside the tracked
+	# range and should not appear in the patch output.
+	test_grep "return 2" actual &&
+	test_grep ! "trailing comment" actual
+'
+
+test_expect_success '-L includes leading deletions resolved by in-range line' '
+	git checkout --orphan leading-del &&
+	git reset --hard &&
+	cat >file.c <<-\EOF &&
+	// leading comment
+	void tracked()
+	{
+	    return 1;
+	}
+	EOF
+	git add file.c &&
+	test_tick &&
+	git commit -m "add file with leading comment" &&
+	cat >file.c <<-\EOF &&
+	void tracked()
+	{
+	    return 2;
+	}
+	EOF
+	git commit -a -m "modify tracked and delete leading comment" &&
+	git log -L:tracked:file.c --format= -1 -p >actual &&
+	# The leading comment deletion is resolved by the next
+	# non-removal line (void tracked), which is in range: a
+	# removal is classified by the position of the following
+	# line, so it joins the range that line falls in.
+	test_grep "return 2" actual &&
+	test_grep "leading comment" actual
+'
+
+test_expect_success 'setup for line-range filter edge cases' '
+	git checkout --orphan filter-edge &&
+	git reset --hard &&
+	cat >file.c <<-\EOF &&
+	void before()
+	{
+	    return 0;
+	}
+
+	void tracked()
+	{
+	    int a = 1;
+	    int b = 2;
+	    int c = 3;
+	    return a + b + c;
+	}
+
+	void after()
+	{
+	    return 9;
+	}
+	EOF
+	git add file.c &&
+	test_tick &&
+	git commit -m "initial"
+'
+
+test_expect_success '-L change at exact first line of range' '
+	git checkout filter-edge &&
+	# Change the function signature (first line of range)
+	sed "s/void tracked/int tracked/" file.c >tmp &&
+	mv tmp file.c &&
+	git commit -a -m "change first line" &&
+	git log -L:tracked:file.c -p --format=%s -1 >actual &&
+	test_grep "change first line" actual &&
+	test_grep "+int tracked" actual &&
+	test_grep "\\-void tracked" actual
+'
+
+test_expect_success '-L change at exact last line of range' '
+	git checkout filter-edge &&
+	git reset --hard HEAD~1 &&
+	# Change the closing brace line (last line of range)
+	sed "s/^}$/} \/\/ end tracked/" file.c >tmp &&
+	mv tmp file.c &&
+	git commit -a -m "change last line" &&
+	git log -L:tracked:file.c -p --format=%s -1 >actual &&
+	test_grep "change last line" actual &&
+	test_grep "end tracked" actual
+'
+
+test_expect_success '-L pure deletion in range (no additions)' '
+	git checkout filter-edge &&
+	git reset --hard HEAD~1 &&
+	# Delete a line inside tracked() without adding anything
+	sed "/int c/d" file.c >tmp &&
+	mv tmp file.c &&
+	git commit -a -m "pure deletion" &&
+	git log -L:tracked:file.c -p --format=%s -1 >actual &&
+	test_grep "pure deletion" actual &&
+	test_grep "\\-.*int c" actual
+'
+
 test_expect_success '-L with --diff-filter=M excludes root commit' '
 	git checkout parent-oids &&
 	git log -L:func2:file.c --diff-filter=M --format=%s --no-patch >actual &&
-- 
gitgitgadget


  parent reply	other threads:[~2026-06-18 18:16 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-18 18:16 [PATCH 0/7] line-log: range-scope stat, check, and -G under -L Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 1/7] diff: rename and group the line-range filter for clarity Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` Michael Montalbo via GitGitGadget [this message]
2026-06-18 18:16 ` [PATCH 3/7] diff: emit -L hunk headers via xdiff's formatter Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 4/7] diff: extract a line-range diff helper for reuse Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 5/7] line-log: support diff stat formats with -L Michael Montalbo via GitGitGadget
2026-06-18 22:00   ` Junio C Hamano
2026-06-18 18:16 ` [PATCH 6/7] diff: support --check with -L line ranges Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 7/7] diffcore-pickaxe: scope -G to the -L tracked range Michael Montalbo via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5602b7976a0fcb6eeb582d4cd88ca379f48dd98d.1781806593.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=ben.knoble@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=mmontalbo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.