From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "D. Ben Knoble" <ben.knoble@gmail.com>,
Michael Montalbo <mmontalbo@gmail.com>,
Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH 2/7] diff: simplify the line-range filter by classifying removals immediately
Date: Thu, 18 Jun 2026 18:16:27 +0000 [thread overview]
Message-ID: <5602b7976a0fcb6eeb582d4cd88ca379f48dd98d.1781806593.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2152.git.1781806593.gitgitgadget@gmail.com>
From: Michael Montalbo <mmontalbo@gmail.com>
The filter buffered '-' lines in a pending_rm strbuf, deferring their
classification until a '+' or ' ' line revealed the post-image
position. That buffering is unnecessary: a removal occupies no
post-image line, so it does not advance lno_in_postimage, and xdiff
emits removals before additions within a change. A '-' therefore
arrives while lno_in_postimage already holds the index the following
'+'/' ' will occupy, and can be classified against the ranges as it
arrives.
The buffering also hid a bug: flush_range_hunk() drained pending_rm into
the range hunk whenever the hunk was active, even after lno_in_postimage
had advanced past the tracked range, so a deletion just after the
tracked function leaked into the patch. Classifying each line as it
arrives removes the pending_rm buffer, the discard_pending_rm() helper,
three struct fields, and makes that bug impossible by construction.
With every line classified on arrival, the buffered lines are the
hunk's single source of truth, so the old/new counts need not be kept
alongside them: flush_range_hunk() derives the counts (and whether the
hunk holds any change) from the buffer when it builds the header. Drop
the per-line counting and the old_count, new_count, and has_changes
fields; there is no longer a second tally that could fall out of sync
with the buffer.
Add begin_range_hunk() to open the accumulator at the first in-range
line, seeding both begins from the live image cursors, as the
counterpart to flush_range_hunk(). With the counting gone too,
line_range_line_fn() now only appends an in-range line.
Document the coordinate model: a block comment on struct
line_range_filter states it (the pre/post-image cursors, the 0-based
idx_in_postimage, removals classified by the following line) with a
worked example.
Add tests for the leaked trailing deletion this fixes, the symmetric
leading-deletion case, and the filter's range boundaries (a change at
the first and last line of a range, and a pure in-range deletion).
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
diff.c | 215 ++++++++++++++++++++++++--------------------
t/t4211-line-log.sh | 125 ++++++++++++++++++++++++++
2 files changed, 243 insertions(+), 97 deletions(-)
diff --git a/diff.c b/diff.c
index 1e043c959f..ee765d7ac2 100644
--- a/diff.c
+++ b/diff.c
@@ -610,18 +610,58 @@ struct emit_callback {
};
/*
- * State for the line-range callback wrappers that sit between
- * xdi_diff_outf() and fn_out_consume(). xdiff produces a normal,
- * unfiltered diff; the wrappers intercept each hunk header and line,
- * track post-image position, and forward only lines that fall within
- * the requested ranges. Contiguous in-range lines are collected into
- * range hunks and flushed with a synthetic @@ header so that
- * fn_out_consume() sees well-formed unified-diff fragments.
+ * Line-range filter: scopes "git log -L" output to the tracked ranges.
*
- * Removal lines ('-') cannot be classified by post-image position, so
- * they are buffered in pending_rm until the next '+' or ' ' line
- * reveals whether they precede an in-range line (flush into range hunk) or
- * an out-of-range line (discard).
+ * It sits between xdi_diff_outf() and an output callback (fn_out_consume,
+ * diffstat_consume, checkdiff_consume). xdiff produces a normal diff; the
+ * filter forwards only the lines inside the requested ranges, collecting
+ * contiguous in-range lines into a "range hunk" emitted with a synthetic
+ * @@ header so the callback sees well-formed unified-diff fragments.
+ *
+ * A diff describes the change from a pre-image to a post-image. Each
+ * line is context (' ', in both), a removal ('-', pre-image only), or
+ * an addition ('+', post-image only). -L tracks ranges in the
+ * post-image, so a line is in range by its post-image position.
+ *
+ * Two 1-based cursors track the next line in each image, named as in
+ * struct emit_callback and seeded from the xdiff hunk header:
+ *
+ * lno_in_postimage advances on '+' and ' ' (lines in the post-image)
+ * lno_in_preimage advances on '-' and ' ' (lines in the pre-image)
+ *
+ * Ranges are 0-based half-open [start, end), so a line is tested at the
+ * 0-based index idx_in_postimage = lno_in_postimage - 1.
+ *
+ * A '-' is not present in the post-image, so it has no post-image line
+ * number of its own. Since it does not advance lno_in_postimage, it is
+ * classified at the idx_in_postimage that the following '+'/' ' will
+ * occupy. xdiff emits a change's removals before its additions, so that
+ * index is already known when the '-' arrives.
+ *
+ * The synthetic "@@ -<old> +<new> @@" header has two sides, old (the
+ * pre-image) and new (the post-image), matching the xdiff_emit_hunk_fn
+ * callback; the hunk.old_begin / hunk.new_begin fields below hold those
+ * begins, and flush_range_hunk() derives the counts from the buffered
+ * lines.
+ *
+ * Example, tracking post-image line 2 (range [1, 2)) of:
+ *
+ * pre-image post-image
+ * 1 a 1 a
+ * 2 b 2 X (b -> X)
+ * 3 c 3 c
+ *
+ * classify each line by idx_in_postimage. The pre and post columns
+ * are each cursor's value while that line is classified, i.e. before
+ * the line advances them (pre = lno_in_preimage,
+ * post = lno_in_postimage, idx = idx_in_postimage):
+ * ' a' pre 1 post 1 idx 0 -> before start, skip
+ * '-b' pre 2 post 2 idx 1 -> keep (removal)
+ * '+X' pre 3 post 2 idx 1 -> keep (addition)
+ * ' c' pre 3 post 3 idx 2 -> past end, flush
+ *
+ * -b and +X share idx = 1 because -b did not advance lno_in_postimage;
+ * both land in the range hunk, flushed when ' c' crosses the range end.
*/
struct line_range_filter {
xdiff_emit_line_fn orig_line_fn;
@@ -640,20 +680,18 @@ struct line_range_filter {
char func[80];
long funclen;
- /* The range hunk being accumulated for the current range. */
+ /*
+ * The range hunk being accumulated. At most one is live at a time:
+ * it is flushed and reset as the cursor leaves each range (and once
+ * more at end of diff), then reused for the next range.
+ */
struct {
struct strbuf lines; /* buffered in-range diff lines */
- long old_begin, old_count;
- long new_begin, new_count;
+ long old_begin;
+ long new_begin;
int active;
- int has_changes; /* any '+' or '-' line? */
} hunk;
- /* Removal lines not yet known to be in-range */
- struct strbuf pending_rm;
- int pending_rm_count;
- long pending_rm_pre_begin; /* pre-image line of first pending */
-
int ret; /* latched error from orig_line_fn */
};
@@ -2542,26 +2580,48 @@ static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED
return 1;
}
-static void discard_pending_rm(struct line_range_filter *filter)
+/*
+ * Begin a range hunk at the first in-range line. Its position fixes the
+ * hunk's begins, taken from the two image cursors before they advance:
+ * new_begin from the post-image, old_begin from the pre-image. The line
+ * counts are not tracked here; flush_range_hunk() derives them from the
+ * buffered lines.
+ */
+static void begin_range_hunk(struct line_range_filter *filter)
{
- strbuf_reset(&filter->pending_rm);
- filter->pending_rm_count = 0;
+ filter->hunk.active = 1;
+ filter->hunk.new_begin = filter->lno_in_postimage;
+ filter->hunk.old_begin = filter->lno_in_preimage;
+ strbuf_reset(&filter->hunk.lines);
}
static void flush_range_hunk(struct line_range_filter *filter)
{
struct strbuf hdr = STRBUF_INIT;
const char *p, *end;
+ long old_count = 0, new_count = 0;
+ int has_changes = 0;
if (!filter->hunk.active || filter->ret)
return;
- /* Drain any pending removal lines into the range hunk */
- if (filter->pending_rm_count) {
- strbuf_addbuf(&filter->hunk.lines, &filter->pending_rm);
- filter->hunk.old_count += filter->pending_rm_count;
- filter->hunk.has_changes = 1;
- discard_pending_rm(filter);
+ /*
+ * Derive the hunk's geometry from the buffered lines: a ' '
+ * counts on both sides, a '-' on the old side, a '+' on the new.
+ * A '-' or '+' marks a real change; the "\ No newline at end of
+ * file" marker (line[0] == '\\') counts on neither side.
+ */
+ p = filter->hunk.lines.buf;
+ end = p + filter->hunk.lines.len;
+ while (p < end) {
+ const char *eol = memchr(p, '\n', end - p);
+ if (*p == ' ' || *p == '-')
+ old_count++;
+ if (*p == ' ' || *p == '+')
+ new_count++;
+ if (*p == '-' || *p == '+')
+ has_changes = 1;
+ p = eol ? eol + 1 : end;
}
/*
@@ -2570,15 +2630,15 @@ static void flush_range_hunk(struct line_range_filter *filter)
* ctxlen causes xdiff to emit context covering a range that
* has no changes in this commit.
*/
- if (!filter->hunk.has_changes) {
+ if (!has_changes) {
filter->hunk.active = 0;
strbuf_reset(&filter->hunk.lines);
return;
}
strbuf_addf(&hdr, "@@ -%ld,%ld +%ld,%ld @@",
- filter->hunk.old_begin, filter->hunk.old_count,
- filter->hunk.new_begin, filter->hunk.new_count);
+ filter->hunk.old_begin, old_count,
+ filter->hunk.new_begin, new_count);
if (filter->funclen > 0) {
strbuf_addch(&hdr, ' ');
strbuf_add(&hdr, filter->func, filter->funclen);
@@ -2618,11 +2678,6 @@ static void line_range_hunk_fn(void *data,
* When count > 0, begin is 1-based. When count == 0, begin is
* adjusted down by 1 by xdl_emit_hunk_hdr(), but no lines of
* that type will arrive, so the value is unused.
- *
- * Any pending removal lines from the previous xdiff hunk are
- * intentionally left in pending_rm: the line callback will
- * flush or discard them when the next content line reveals
- * whether the removals precede in-range content.
*/
filter->lno_in_postimage = new_begin;
filter->lno_in_preimage = old_begin;
@@ -2638,88 +2693,56 @@ static void line_range_hunk_fn(void *data,
static int line_range_line_fn(void *priv, char *line, unsigned long len)
{
struct line_range_filter *filter = priv;
- const struct range *cur;
- long idx_in_postimage, cur_pre;
+ long idx_in_postimage;
+ int in_range;
if (filter->ret)
return filter->ret;
- if (line[0] == '-') {
- if (!filter->pending_rm_count)
- filter->pending_rm_pre_begin = filter->lno_in_preimage;
- filter->lno_in_preimage++;
- strbuf_add(&filter->pending_rm, line, len);
- filter->pending_rm_count++;
- return filter->ret;
- }
-
if (line[0] == '\\') {
- if (filter->pending_rm_count)
- strbuf_add(&filter->pending_rm, line, len);
- else if (filter->hunk.active)
+ if (filter->hunk.active)
strbuf_add(&filter->hunk.lines, line, len);
- /* otherwise outside tracked range; drop silently */
return filter->ret;
}
- if (line[0] != '+' && line[0] != ' ')
+ if (line[0] != '+' && line[0] != ' ' && line[0] != '-')
BUG("unexpected diff line type '%c'", line[0]);
+ /*
+ * idx_in_postimage is this line's 0-based post-image index (see the model on
+ * struct line_range_filter). The cursors are advanced only after
+ * the line is classified, so a '-' is tested at the same idx_in_postimage as
+ * the '+'/' ' that follows it.
+ */
idx_in_postimage = filter->lno_in_postimage - 1;
- cur_pre = filter->lno_in_preimage; /* save before advancing for context lines */
- filter->lno_in_postimage++;
- if (line[0] == ' ')
- filter->lno_in_preimage++;
- /* Advance past ranges we've passed */
+ /* Retire ranges we have passed, flushing the one we leave. */
while (filter->cur_range < filter->ranges->nr &&
idx_in_postimage >= filter->ranges->ranges[filter->cur_range].end) {
if (filter->hunk.active)
flush_range_hunk(filter);
- discard_pending_rm(filter);
filter->cur_range++;
}
- /* Past all ranges */
- if (filter->cur_range >= filter->ranges->nr) {
- discard_pending_rm(filter);
- return filter->ret;
- }
+ in_range = filter->cur_range < filter->ranges->nr &&
+ idx_in_postimage >= filter->ranges->ranges[filter->cur_range].start &&
+ idx_in_postimage < filter->ranges->ranges[filter->cur_range].end;
- cur = &filter->ranges->ranges[filter->cur_range];
+ if (in_range) {
+ if (!filter->hunk.active)
+ begin_range_hunk(filter);
- /* Before current range */
- if (idx_in_postimage < cur->start) {
- discard_pending_rm(filter);
- return filter->ret;
+ strbuf_add(&filter->hunk.lines, line, len);
}
- /* In range so start a new range hunk if needed */
- if (!filter->hunk.active) {
- filter->hunk.active = 1;
- filter->hunk.has_changes = 0;
- filter->hunk.new_begin = idx_in_postimage + 1;
- filter->hunk.old_begin = filter->pending_rm_count
- ? filter->pending_rm_pre_begin : cur_pre;
- filter->hunk.old_count = 0;
- filter->hunk.new_count = 0;
- strbuf_reset(&filter->hunk.lines);
- }
-
- /* Flush pending removals into range hunk */
- if (filter->pending_rm_count) {
- strbuf_addbuf(&filter->hunk.lines, &filter->pending_rm);
- filter->hunk.old_count += filter->pending_rm_count;
- filter->hunk.has_changes = 1;
- discard_pending_rm(filter);
- }
-
- strbuf_add(&filter->hunk.lines, line, len);
- filter->hunk.new_count++;
- if (line[0] == '+')
- filter->hunk.has_changes = 1;
- else
- filter->hunk.old_count++;
+ /*
+ * Advance each image's cursor: a line present in that image (see
+ * the model) consumes one of its line numbers.
+ */
+ if (line[0] != '-')
+ filter->lno_in_postimage++;
+ if (line[0] != '+')
+ filter->lno_in_preimage++;
return filter->ret;
}
@@ -4097,7 +4120,6 @@ static void builtin_diff(const char *name_a,
lr_state.orig_cb_data = &ecbdata;
lr_state.ranges = line_ranges;
strbuf_init(&lr_state.hunk.lines, 0);
- strbuf_init(&lr_state.pending_rm, 0);
/*
* Inflate ctxlen so that all changes within
@@ -4132,7 +4154,6 @@ static void builtin_diff(const char *name_a,
die("unable to generate diff for %s",
one->path);
strbuf_release(&lr_state.hunk.lines);
- strbuf_release(&lr_state.pending_rm);
} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
&ecbdata, &xpp, &xecfg))
die("unable to generate diff for %s", one->path);
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index ca4eb7bbc7..e9691066de 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -738,6 +738,131 @@ test_expect_success '-L with -G filters to diff-text matches' '
grep "F2 + 2" actual
'
+test_expect_success 'setup for trailing deletion test' '
+ git checkout --orphan trailing-del &&
+ git reset --hard &&
+ cat >file.c <<-\EOF &&
+ void tracked()
+ {
+ return 1;
+ }
+ // trailing comment
+ EOF
+ git add file.c &&
+ test_tick &&
+ git commit -m "add file with trailing comment" &&
+ # Modify tracked() AND delete the trailing comment in
+ # one commit, so the commit touches the tracked range
+ # and is not filtered out by the revision walker.
+ cat >file.c <<-\EOF &&
+ void tracked()
+ {
+ return 2;
+ }
+ EOF
+ git commit -a -m "modify tracked and delete trailing comment"
+'
+
+test_expect_success '-L does not include deletions past end of tracked range' '
+ git log -L:tracked:file.c --format= -1 -p >actual &&
+ # The trailing comment deletion is outside the tracked
+ # range and should not appear in the patch output.
+ test_grep "return 2" actual &&
+ test_grep ! "trailing comment" actual
+'
+
+test_expect_success '-L includes leading deletions resolved by in-range line' '
+ git checkout --orphan leading-del &&
+ git reset --hard &&
+ cat >file.c <<-\EOF &&
+ // leading comment
+ void tracked()
+ {
+ return 1;
+ }
+ EOF
+ git add file.c &&
+ test_tick &&
+ git commit -m "add file with leading comment" &&
+ cat >file.c <<-\EOF &&
+ void tracked()
+ {
+ return 2;
+ }
+ EOF
+ git commit -a -m "modify tracked and delete leading comment" &&
+ git log -L:tracked:file.c --format= -1 -p >actual &&
+ # The leading comment deletion is resolved by the next
+ # non-removal line (void tracked), which is in range: a
+ # removal is classified by the position of the following
+ # line, so it joins the range that line falls in.
+ test_grep "return 2" actual &&
+ test_grep "leading comment" actual
+'
+
+test_expect_success 'setup for line-range filter edge cases' '
+ git checkout --orphan filter-edge &&
+ git reset --hard &&
+ cat >file.c <<-\EOF &&
+ void before()
+ {
+ return 0;
+ }
+
+ void tracked()
+ {
+ int a = 1;
+ int b = 2;
+ int c = 3;
+ return a + b + c;
+ }
+
+ void after()
+ {
+ return 9;
+ }
+ EOF
+ git add file.c &&
+ test_tick &&
+ git commit -m "initial"
+'
+
+test_expect_success '-L change at exact first line of range' '
+ git checkout filter-edge &&
+ # Change the function signature (first line of range)
+ sed "s/void tracked/int tracked/" file.c >tmp &&
+ mv tmp file.c &&
+ git commit -a -m "change first line" &&
+ git log -L:tracked:file.c -p --format=%s -1 >actual &&
+ test_grep "change first line" actual &&
+ test_grep "+int tracked" actual &&
+ test_grep "\\-void tracked" actual
+'
+
+test_expect_success '-L change at exact last line of range' '
+ git checkout filter-edge &&
+ git reset --hard HEAD~1 &&
+ # Change the closing brace line (last line of range)
+ sed "s/^}$/} \/\/ end tracked/" file.c >tmp &&
+ mv tmp file.c &&
+ git commit -a -m "change last line" &&
+ git log -L:tracked:file.c -p --format=%s -1 >actual &&
+ test_grep "change last line" actual &&
+ test_grep "end tracked" actual
+'
+
+test_expect_success '-L pure deletion in range (no additions)' '
+ git checkout filter-edge &&
+ git reset --hard HEAD~1 &&
+ # Delete a line inside tracked() without adding anything
+ sed "/int c/d" file.c >tmp &&
+ mv tmp file.c &&
+ git commit -a -m "pure deletion" &&
+ git log -L:tracked:file.c -p --format=%s -1 >actual &&
+ test_grep "pure deletion" actual &&
+ test_grep "\\-.*int c" actual
+'
+
test_expect_success '-L with --diff-filter=M excludes root commit' '
git checkout parent-oids &&
git log -L:func2:file.c --diff-filter=M --format=%s --no-patch >actual &&
--
gitgitgadget
next prev parent reply other threads:[~2026-06-18 18:16 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-18 18:16 [PATCH 0/7] line-log: range-scope stat, check, and -G under -L Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 1/7] diff: rename and group the line-range filter for clarity Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` Michael Montalbo via GitGitGadget [this message]
2026-06-18 18:16 ` [PATCH 3/7] diff: emit -L hunk headers via xdiff's formatter Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 4/7] diff: extract a line-range diff helper for reuse Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 5/7] line-log: support diff stat formats with -L Michael Montalbo via GitGitGadget
2026-06-18 22:00 ` Junio C Hamano
2026-06-18 18:16 ` [PATCH 6/7] diff: support --check with -L line ranges Michael Montalbo via GitGitGadget
2026-06-18 18:16 ` [PATCH 7/7] diffcore-pickaxe: scope -G to the -L tracked range Michael Montalbo via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5602b7976a0fcb6eeb582d4cd88ca379f48dd98d.1781806593.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=ben.knoble@gmail.com \
--cc=git@vger.kernel.org \
--cc=mmontalbo@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox