Segfault: git show-branch --reflog refs/pullreqs/1

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Segfault: git show-branch --reflog refs/pullreqs/1
@ 2024-02-21  1:48 Yasushi SHOJI
  2024-02-21  8:42 ` Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Yasushi SHOJI @ 2024-02-21  1:48 UTC (permalink / raw)
  To: Git Mailing List

Hi all,

Does anyone see a segfault on `git show-branch --reflog refs/pullreqs/1`?

It seems files_reflog_path() creates a wrong path with the above command
using REF_WORKTREE_SHARED.

refs/pullreqs is from GitHub if you have the following line in .git/config:

fetch = +refs/pull/*/head:refs/pullreqs/*

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>,
signo=signo@entry=6, no_tid=no_tid@entry=0)
    at ./nptl/pthread_kill.c:44
#1  0x00007ffff7e101cf in __pthread_kill_internal (signo=6,
threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7dc2472 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x00007ffff7dac4b2 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7dad1ed in __libc_message (fmt=fmt@entry=0x7ffff7f1f78c
"%s\n") at ../sysdeps/posix/libc_fatal.c:150
#5  0x00007ffff7e19ae5 in malloc_printerr
(str=str@entry=0x7ffff7f1d22c "free(): invalid pointer") at
./malloc/malloc.c:5658
#6  0x00007ffff7e1b864 in _int_free (av=<optimized out>, p=<optimized
out>, have_lock=have_lock@entry=0) at ./malloc/malloc.c:4432
#7  0x00007ffff7e1e1df in __GI___libc_free (mem=<optimized out>) at
./malloc/malloc.c:3367
#8  0x0000555555625ae2 in cmd_show_branch (ac=<optimized out>,
av=<optimized out>, prefix=<optimized out>)
    at builtin/show-branch.c:801
#9  0x0000555555572250 in run_builtin (argv=0x7fffffffe080, argc=3,
p=0x5555558ca198 <commands+2712>) at git.c:469
#10 handle_builtin (argc=3, argv=0x7fffffffe080) at git.c:724
#11 0x00005555555731b4 in run_argv (argcp=argcp@entry=0x7fffffffde6c,
argv=argv@entry=0x7fffffffde60) at git.c:788
#12 0x0000555555573d38 in cmd_main (argc=<optimized out>,
argc@entry=4, argv=<optimized out>, argv@entry=0x7fffffffe078)
    at git.c:923
#13 0x0000555555571f10 in main (argc=4, argv=0x7fffffffe078) at common-main.c:62

Best,
-- 
             yashi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-21  1:48 Segfault: git show-branch --reflog refs/pullreqs/1 Yasushi SHOJI
@ 2024-02-21  8:42 ` Jeff King
  2024-02-21 10:05   ` Patrick Steinhardt
  2024-02-21 17:44   ` Junio C Hamano
  2024-02-21  9:52 ` Segfault: git show-branch --reflog refs/pullreqs/1 Patrick Steinhardt
  2024-02-21  9:56 ` [PATCH 0/2] Detect empty or missing reflogs with `ref@{0}` Patrick Steinhardt
  2 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2024-02-21  8:42 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: Denton Liu, Junio C Hamano, Git Mailing List

On Wed, Feb 21, 2024 at 10:48:25AM +0900, Yasushi SHOJI wrote:

> Does anyone see a segfault on `git show-branch --reflog refs/pullreqs/1`?
> 
> It seems files_reflog_path() creates a wrong path with the above command
> using REF_WORKTREE_SHARED.

I can trigger a segfault, but I think the issue is simply a ref that has
no reflog. Here's a simple reproduction:

  $ git init
  $ git commit --allow-empty -m foo
  $ rm -rf .git/logs
  $ git show-branch --reflog
  Segmentation fault

The bug is in read_ref_at(). When asked for the reflog at position "0",
it calls refs_for_each_reflog_ent_reverse() with a special callback, but
does not check that it actually found anything! So we return "0" for
success, but all of the returned fields are garbage (including the
pointer to reflog message, which is where I see the segfault).

The bug was introduced by 6436a20284 (refs: allow @{n} to work with
n-sized reflog, 2021-01-07). Probably the fix is something like:

diff --git a/refs.c b/refs.c
index 03968ad787..c2a48f8188 100644
--- a/refs.c
+++ b/refs.c
@@ -945,6 +945,8 @@ static int read_ref_at_ent_newest(struct object_id *ooid, struct object_id *noid

 	set_read_ref_cutoffs(cb, timestamp, tz, message);
 	oidcpy(cb->oid, noid);
+	cb->reccnt++;
+	cb->found_it = 1;
 	/* We just want the first entry */
 	return 1;
 }
@@ -980,12 +982,10 @@ int read_ref_at(struct ref_store *refs, const char *refname,
 	cb.cutoff_cnt = cutoff_cnt;
 	cb.oid = oid;

-	if (cb.cnt == 0) {
+	if (cb.cnt == 0)
 		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
-		return 0;
-	}
-
-	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
+	else
+		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);

 	if (!cb.reccnt) {
 		if (flags & GET_OID_QUIETLY)

but that breaks t1508.35, which explicitly tests for branch@{0} to work
with an empty reflog file (added by that same commit). The code in
get_oid_basic() to parse reflogs doesn't suffer from the same bugs: it
checks up front that the reflog file exists, it preloads the output oid
with the current ref value, and it doesn't look at other fields (like
the reflog message).

So I'm not sure if read_ref_at() needs to be made safer, or if
cmd_show_branch() needs to learn the same tricks as get_oid_basic().
Those are the only two callers of read_ref_at().

Beyond that confusion, I noticed we do not have many tests for
show-branch, and none for "--reflog". So I thought to add a basic one
where we _do_ have an actual reflog to show. But wow, this has been
broken for some time. I found at least two issues trying to run a test
like:

diff --git a/t/t3207-show-branch-reflog.sh b/t/t3207-show-branch-reflog.sh
new file mode 100755
index 0000000000..7f52c8dcb1
--- /dev/null
+++ b/t/t3207-show-branch-reflog.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+test_description='show-branch reflog tests'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit base &&
+	git checkout -b branch &&
+	test_commit one &&
+	git reset --hard HEAD^ &&
+	test_commit two &&
+	test_commit three
+'
+
+test_expect_success '--reflog shows reflog entries' '
+	cat >expect <<-\EOF &&
+	! [branch@{0}] (0 seconds ago) commit: three
+	 ! [branch@{1}] (60 seconds ago) commit: two
+	  ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^
+	   ! [branch@{3}] (2 minutes ago) commit: one
+	----
+	+    [refs/heads/branch@{0}] three
+	++   [refs/heads/branch@{1}] two
+	   + [refs/heads/branch@{3}] one
+	++++ [refs/heads/branch@{2}] base
+	EOF
+	# the output always contains relative timestamps; use
+	# a known time to get deterministic results
+	GIT_TEST_DATE_NOW=$test_tick \
+	git show-branch --reflog branch >actual &&
+	test_cmp expect actual
+'
+
+test_done

The first is that "show-branch" does not print the correct reflog
message, and you get output like this:

  ! [branch@{0}] (0 seconds ago) (none)
   ! [branch@{1}] (0 seconds ago) (none)
    ! [branch@{2}] (60 seconds ago) (none)
     ! [branch@{3}] (2 minutes ago) (none)

Once upon a time, read_ref_at() returned the whole reflog line, and
show-branch had to find the tab-separator. But since 4207ed285f (refs.c:
change read_ref_at to use the reflog iterators, 2014-06-03), it returns
just the actual message (curiously, with the newline still attached). So
we need something like this to fix it:

diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index d6d2dabeca..b678b9fedb 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -761,7 +761,7 @@ int cmd_show_branch(int ac, const char **av, const char *prefix)
 		for (i = 0; i < reflog; i++) {
 			char *logmsg;
 			char *nth_desc;
-			const char *msg;
+			char *eol;
 			timestamp_t timestamp;
 			int tz;

@@ -771,15 +771,13 @@ int cmd_show_branch(int ac, const char **av, const char *prefix)
 				reflog = i;
 				break;
 			}
-			msg = strchr(logmsg, '\t');
-			if (!msg)
-				msg = "(none)";
-			else
-				msg++;
+			eol = strchr(logmsg, '\n');
+			if (eol)
+				*eol = '\0';
 			reflog_msg[i] = xstrfmt("(%s) %s",
 						show_date(timestamp, tz,
 							  DATE_MODE(RELATIVE)),
-						msg);
+						logmsg);
 			free(logmsg);

 			nth_desc = xstrfmt("%s@{%d}", *av, base+i);

Easy enough. But the output is still subtly wrong! Now we're back to
6436a20284 (refs: allow @{n} to work with n-sized reflog, 2021-01-07)
again. Before that commit, applying the fix above gives the expected
output from my test:

  ! [branch@{0}] (0 seconds ago) commit: three
   ! [branch@{1}] (60 seconds ago) commit: two
    ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^
     ! [branch@{3}] (2 minutes ago) commit: one

but afterwards, entries higher than one are all shifted (so 1 is a
duplicate of 0, 2 is the old 1, and so on):

  ! [branch@{0}] (0 seconds ago) commit: three
   ! [branch@{1}] (0 seconds ago) commit: three
    ! [branch@{2}] (60 seconds ago) commit: two
     ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^

I am still trying to wrap my head around how it can get such wrong
results for show-branch when asking for "git rev-parse branch@{0}", etc,
are correct. I think it is that "rev-parse branch@{0}" is only looking
at the output oid and does not consider the reflog message at all. So I
think it is subtly broken, but in a way that happens to work for that
caller. But I'm not sure of the correct fix. At least not at this time
of night.

Cc-ing folks involved in 6436a20284.

-Peff

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-21  1:48 Segfault: git show-branch --reflog refs/pullreqs/1 Yasushi SHOJI
  2024-02-21  8:42 ` Jeff King
@ 2024-02-21  9:52 ` Patrick Steinhardt
  2024-02-21  9:56 ` [PATCH 0/2] Detect empty or missing reflogs with `ref@{0}` Patrick Steinhardt
  2 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2024-02-21  9:52 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

On Wed, Feb 21, 2024 at 10:48:25AM +0900, Yasushi SHOJI wrote:
> Hi all,
> 
> Does anyone see a segfault on `git show-branch --reflog refs/pullreqs/1`?

TIL. I didn't even know that git-show-branch(1) is a thing.

By default, no refs but "HEAD" and refs starting with "refs/heads/*"
have a reflog. Thus, your ref "refs/pullreqs/1" won't have a reflog, and
git-show-branch(1) then happily segfaults when it didn't find a reflog
at all. This is of course a bug that needs fixing.

I'll send a patch in a bit, thanks for your report!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 0/2] Detect empty or missing reflogs with `ref@{0}`
  2024-02-21  1:48 Segfault: git show-branch --reflog refs/pullreqs/1 Yasushi SHOJI
  2024-02-21  8:42 ` Jeff King
  2024-02-21  9:52 ` Segfault: git show-branch --reflog refs/pullreqs/1 Patrick Steinhardt
@ 2024-02-21  9:56 ` Patrick Steinhardt
  2024-02-21  9:56   ` [PATCH 1/2] object-name: detect and report empty reflogs Patrick Steinhardt
  2024-02-21  9:56   ` [PATCH 2/2] builtin/show-branch: detect " Patrick Steinhardt
  2 siblings, 2 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2024-02-21  9:56 UTC (permalink / raw)
  To: git; +Cc: Yasushi SHOJI

[-- Attachment #1: Type: text/plain, Size: 1120 bytes --]

Hi,

this patch series addresses some shortcomings when parsing `ref@{n}`
syntax via `read_ref_at()` when the reflog is missing or empty:

  - First, as reported by Yasushi, git-show-branch(1) would segfault
    because the function does not report when the 0th entry wasn't
    found.

  - Second, `ref@{0}` would fall back to return the object ID of ref
    itself in case the reflog is empty or missing. This behaviour is
    quite confusing and only works by chance.

The series addresses both of these issues by detecting and reporting the
case where the reflog is empty or missing.

Patrick

Patrick Steinhardt (2):
  object-name: detect and report empty reflogs
  builtin/show-branch: detect empty reflogs

 builtin/show-branch.c          |  2 ++
 object-name.c                  | 10 ++++++----
 refs.c                         |  3 ++-
 t/t1506-rev-parse-diagnosis.sh |  8 ++++++++
 t/t1508-at-combinations.sh     | 29 +++++++++++++++++++++++++----
 t/t3202-show-branch.sh         | 25 +++++++++++++++++++++++++
 6 files changed, 68 insertions(+), 9 deletions(-)

-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] object-name: detect and report empty reflogs
  2024-02-21  9:56 ` [PATCH 0/2] Detect empty or missing reflogs with `ref@{0}` Patrick Steinhardt
@ 2024-02-21  9:56   ` Patrick Steinhardt
  2024-02-21 10:37     ` Kristoffer Haugsbakk
                       ` (2 more replies)
  2024-02-21  9:56   ` [PATCH 2/2] builtin/show-branch: detect " Patrick Steinhardt
  1 sibling, 3 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2024-02-21  9:56 UTC (permalink / raw)
  To: git; +Cc: Yasushi SHOJI

[-- Attachment #1: Type: text/plain, Size: 7090 bytes --]

The `ref@{n}` syntax allows the user to request the n'th reflog entry
for a ref. This syntax is zero-indexed, meaning that entry zero refers
to the first entry in the reflog. The behaviour here is quite confusing
though and depends on the state of the corresponding reflog:

  - If the reflog is not empty, we return the first reflog entry.

  - If the reflog is empty, we return the object ID of the ref.

  - If the reflog is missing, we return an error.

This is inconsistent and quite misleading.

This behaviour goes back to 6436a20284 (refs: allow @{n} to work with
n-sized reflog, 2021-01-07), which fixed a bug that wouldn't allow a
user to return the n'th reflog entry with an n-sized reflog. With this
commit, `read_ref_at()` started to special case reading the first entry
of the reflog via a separate `read_ref_at_ent_newest()` function. The
problem here is that we forgot to check whether the callback was invoked
at all, and thus we don't notice empty reflogs.

The commit in question added a test for `ref@{0}` when the reflog is
empty. But that test only works by chance: while `read_ref_at()` won't
initialize the object ID passed in by the pointer, all callers of this
function happen to call `repo_ref_dwim()` and thus pre-populate the
object ID. Thus, the consequence is that we indeed return the object ID
of the refname when the reflog is empty.

This behaviour is documented nowhere, and the fact that we return a
somewhat sensible result to the caller by sheer luck further stresses
the point that this behaviour is only accidental, even if there is a
test covering it.

Furthermore, this behaviour causes problems for the git-show-branch(1)
command. When executing `git show-branch --reflog` for a ref that either
has no or an empty reflog we run into a segfault. This is because the
`read_ref_at()` function doesn't report the error to us, and thus parts
of its out-parameters are not initialized.

Start to detect and report empty or missing reflogs in `read_ref_at()`
and report them to the caller. This results in a change in behaviour
when asking for `ref@{0}` with an empty or missing reflog because we now
die instead of returning the object ID of the ref itself. This adapted
behaviour should lead to less surprises as we now really only report
object IDs to the caller that actually come from the reflog, thus making
the user experience a whole lot more consistent.

This change also fixes the segfault in git-show-branch(1). Note that
this commit does not add a test yet -- this will be handled in the next
commit.

Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c                  | 10 ++++++----
 refs.c                         |  3 ++-
 t/t1506-rev-parse-diagnosis.sh |  8 ++++++++
 t/t1508-at-combinations.sh     | 29 +++++++++++++++++++++++++----
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/object-name.c b/object-name.c
index 3a2ef5d680..e2a6c9d2ec 100644
--- a/object-name.c
+++ b/object-name.c
@@ -994,8 +994,8 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	if (reflog_len) {
 		int nth, i;
 		timestamp_t at_time;
-		timestamp_t co_time;
-		int co_tz, co_cnt;
+		timestamp_t co_time = 0;
+		int co_tz = 0, co_cnt = 0;

 		/* Is it asking for N-th entry, or approxidate? */
 		for (i = nth = 0; 0 <= nth && i < reflog_len; i++) {
@@ -1020,6 +1020,7 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 				return -1;
 			}
 		}
+
 		if (read_ref_at(get_main_ref_store(r),
 				real_ref, flags, at_time, nth, oid, NULL,
 				&co_time, &co_tz, &co_cnt)) {
@@ -1035,9 +1036,10 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 						show_date(co_time, co_tz, DATE_MODE(RFC2822)));
 				}
 			} else {
-				if (flags & GET_OID_QUIETLY) {
+				if (flags & GET_OID_QUIETLY)
 					exit(128);
-				}
+				if (!co_cnt)
+					die(_("log for '%.*s' is empty"), len, str);
 				die(_("log for '%.*s' only has %d entries"),
 				    len, str, co_cnt);
 			}
diff --git a/refs.c b/refs.c
index c633abf284..a2369e7835 100644
--- a/refs.c
+++ b/refs.c
@@ -1084,6 +1084,7 @@ static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
 	struct read_ref_at_cb *cb = cb_data;

 	set_read_ref_cutoffs(cb, timestamp, tz, message);
+	cb->found_it = 1;
 	oidcpy(cb->oid, noid);
 	/* We just want the first entry */
 	return 1;
@@ -1123,7 +1124,7 @@ int read_ref_at(struct ref_store *refs, const char *refname,

 	if (cb.cnt == 0) {
 		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
-		return 0;
+		return !cb.found_it;
 	}

 	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index ef40511d89..9d147c4ade 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -140,6 +140,14 @@ test_expect_success 'incorrect file in :path and :N:path' '
 	test_grep "path .disk-only.txt. exists on disk, but not in the index" error
 '

+test_expect_success '@{0} reference with empty reflog' '
+	git checkout -B empty-reflog main &&
+	git reflog expire --expire=now refs/heads/empty-reflog &&
+	test_must_fail git rev-parse empty-reflog@{0} >output 2>error &&
+	test_must_be_empty output &&
+	test_grep "log for ${SQ}empty-reflog${SQ} is empty" error
+'
+
 test_expect_success 'invalid @{n} reference' '
 	test_must_fail git rev-parse main@{99999} >output 2>error &&
 	test_must_be_empty output &&
diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
index e841309d0e..020106a1cc 100755
--- a/t/t1508-at-combinations.sh
+++ b/t/t1508-at-combinations.sh
@@ -110,10 +110,31 @@ test_expect_success '@{1} works with only one reflog entry' '
 	test_cmp_rev newbranch~ newbranch@{1}
 '

-test_expect_success '@{0} works with empty reflog' '
-	git checkout -B newbranch main &&
-	git reflog expire --expire=now refs/heads/newbranch &&
-	test_cmp_rev newbranch newbranch@{0}
+test_expect_success '@{0} fails with empty reflog' '
+	git checkout -B empty-reflog main &&
+	git reflog expire --expire=now refs/heads/empty-reflog &&
+	cat >expect <<-EOF &&
+	fatal: Needed a single revision
+	EOF
+	test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success '@{0} fails with missing reflog' '
+	git -c core.logAllRefUpdates=false checkout -B missing-reflog main &&
+	cat >expect <<-EOF &&
+	fatal: Needed a single revision
+	EOF
+	test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success '@{0} favors first reflog entry with diverged reflog' '
+	git checkout -B diverged-reflog main &&
+	test_commit A &&
+	test_commit B &&
+	git reflog delete diverged-reflog@{0} &&
+	test_cmp_rev diverged-reflog~ diverged-reflog@{0}
 '

 test_done
-- 
2.44.0-rc1

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] builtin/show-branch: detect empty reflogs
  2024-02-21  9:56 ` [PATCH 0/2] Detect empty or missing reflogs with `ref@{0}` Patrick Steinhardt
  2024-02-21  9:56   ` [PATCH 1/2] object-name: detect and report empty reflogs Patrick Steinhardt
@ 2024-02-21  9:56   ` Patrick Steinhardt
  2024-02-21 17:35     ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2024-02-21  9:56 UTC (permalink / raw)
  To: git; +Cc: Yasushi SHOJI

[-- Attachment #1: Type: text/plain, Size: 2525 bytes --]

The `--reflog=n` option for git-show-branch(1) allows the user to list
the reflog of a specific branch, where `n` specifies how many reflog
entries to show. When there are less than `n` entries we handle this
gracefully and simply report less entries.

There is a special case though when the ref either has no reflog or when
its reflog is empty. In this case, we end up printing nothing at all
while returning successfully. This is rather confusing as there is no
indicator why we didn't print anything.

Adapt the behaviour so that we die instead of leaving no user visible
traces. This change in behaviour should be fine given that this case
used to segfault before the preceding commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/show-branch.c  |  2 ++
 t/t3202-show-branch.sh | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index b01ec761d2..8837415031 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -785,6 +785,8 @@ int cmd_show_branch(int ac, const char **av, const char *prefix)
 			if (read_ref_at(get_main_ref_store(the_repository),
 					ref, flags, 0, base + i, &oid, &logmsg,
 					&timestamp, &tz, NULL)) {
+				if (!i)
+					die(_("log for %s is empty"), ref);
 				reflog = i;
 				break;
 			}
diff --git a/t/t3202-show-branch.sh b/t/t3202-show-branch.sh
index 6a98b2df76..e5b74cc55f 100755
--- a/t/t3202-show-branch.sh
+++ b/t/t3202-show-branch.sh
@@ -264,4 +264,29 @@ test_expect_success 'error descriptions on orphan branch' '
 	test_branch_op_in_wt -c new-branch
 '
 
+test_expect_success 'show-branch --reflog with empty reflog' '
+	git checkout -B empty-reflog &&
+	git reflog expire --expire=now refs/heads/empty-reflog &&
+
+	cat >expect <<-EOF &&
+	fatal: log for refs/heads/empty-reflog is empty
+	EOF
+	test_must_fail git show-branch --reflog=1 empty-reflog 2>err &&
+	test_cmp expect err &&
+	test_must_fail git show-branch --reflog empty-reflog 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'show-branch --reflog with missing reflog' '
+	git -c core.logAllRefUpdates=false checkout -B missing-reflog &&
+
+	cat >expect <<-EOF &&
+	fatal: log for refs/heads/missing-reflog is empty
+	EOF
+	test_must_fail git show-branch --reflog=1 missing-reflog 2>err &&
+	test_cmp expect err &&
+	test_must_fail git show-branch --reflog missing-reflog 2>err &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-21  8:42 ` Jeff King
@ 2024-02-21 10:05   ` Patrick Steinhardt
  2024-02-21 17:38     ` Jeff King
  2024-02-21 17:44   ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 10:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Yasushi SHOJI, Denton Liu, Junio C Hamano, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1462 bytes --]

On Wed, Feb 21, 2024 at 03:42:50AM -0500, Jeff King wrote:
> On Wed, Feb 21, 2024 at 10:48:25AM +0900, Yasushi SHOJI wrote:
> 
> > Does anyone see a segfault on `git show-branch --reflog refs/pullreqs/1`?
> > 
> > It seems files_reflog_path() creates a wrong path with the above command
> > using REF_WORKTREE_SHARED.
> I am still trying to wrap my head around how it can get such wrong
> results for show-branch when asking for "git rev-parse branch@{0}", etc,
> are correct. I think it is that "rev-parse branch@{0}" is only looking
> at the output oid and does not consider the reflog message at all. So I
> think it is subtly broken, but in a way that happens to work for that
> caller. But I'm not sure of the correct fix. At least not at this time
> of night.
> 
> Cc-ing folks involved in 6436a20284.

Ah, our mails crossed, but we came to the same conclusion. Things indeed
are subtly broken here and work just by chance because all callers pre
initialize the object ID. So in the case where the reflog is missing or
empty we'd use that pre-initialized object ID because `read_ref_at()`
does not indicate the failure to the callers.

I think that this behaviour is not sensible in the first place. When
asking for the reflog, we should only ever return object IDs parsed from
the reflog. Falling back to parsing the ref itself does not make much
sense. I've thus sent a patch series that changes the behaviour here.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] object-name: detect and report empty reflogs
  2024-02-21  9:56   ` [PATCH 1/2] object-name: detect and report empty reflogs Patrick Steinhardt
@ 2024-02-21 10:37     ` Kristoffer Haugsbakk
  2024-02-21 16:48     ` Eric Sunshine
  2024-02-21 17:31     ` Jeff King
  2 siblings, 0 replies; 27+ messages in thread
From: Kristoffer Haugsbakk @ 2024-02-21 10:37 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Yasushi SHOJI, git

On Wed, Feb 21, 2024, at 10:56, Patrick Steinhardt wrote:
> The `ref@{n}` syntax allows the user to request the n'th reflog entry
> for a ref. This syntax is zero-indexed, meaning that entry zero refers
> to the first entry in the reflog. The behaviour here is quite confusing
> though and depends on the state of the corresponding reflog:

Maybe this is obvious to other readers, but I sometimes get tripped up
when reading about such logs: what’s the “first entry”? The oldest one
or newest one? How about:

  “ The `ref@{n}` syntax allows the user to request the n'th reflog entry
    for a ref, starting from `ref@{0}` which points to the commit that
    `ref` points to (zero-indexed). The behaviour here is quite confusing
    though and depends on the state of the corresponding reflog:

-- 
Kristoffer Haugsbakk

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] object-name: detect and report empty reflogs
  2024-02-21  9:56   ` [PATCH 1/2] object-name: detect and report empty reflogs Patrick Steinhardt
  2024-02-21 10:37     ` Kristoffer Haugsbakk
@ 2024-02-21 16:48     ` Eric Sunshine
  2024-02-21 17:31     ` Jeff King
  2 siblings, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2024-02-21 16:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Yasushi SHOJI

On Wed, Feb 21, 2024 at 5:03 AM Patrick Steinhardt <ps@pks.im> wrote:
> [...]
> Start to detect and report empty or missing reflogs in `read_ref_at()`
> and report them to the caller. This results in a change in behaviour
> when asking for `ref@{0}` with an empty or missing reflog because we now
> die instead of returning the object ID of the ref itself. This adapted
> behaviour should lead to less surprises as we now really only report
> object IDs to the caller that actually come from the reflog, thus making
> the user experience a whole lot more consistent.
> [...]
> Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
> @@ -110,10 +110,31 @@ test_expect_success '@{1} works with only one reflog entry' '
> +test_expect_success '@{0} fails with empty reflog' '
> +       git checkout -B empty-reflog main &&
> +       git reflog expire --expire=now refs/heads/empty-reflog &&
> +       cat >expect <<-EOF &&
> +       fatal: Needed a single revision
> +       EOF

Typically, we would use <<-\EOF rather than <<-EOF to signal to the
reader that no variable interpolation is happening in the here-doc
body.

A simpler alternative in this case would, of course, be to use `echo`:

    echo "fatal: Needed a single revision" >expect &&

> +       test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
> +       test_cmp expect err
> +'
> +
> +test_expect_success '@{0} fails with missing reflog' '
> +       git -c core.logAllRefUpdates=false checkout -B missing-reflog main &&
> +       cat >expect <<-EOF &&
> +       fatal: Needed a single revision
> +       EOF

Ditto.

> +       test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
> +       test_cmp expect err
> +'

Probably neither comment is worth a reroll, but if you reroll for some
other reason, perhaps consider them.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] object-name: detect and report empty reflogs
  2024-02-21  9:56   ` [PATCH 1/2] object-name: detect and report empty reflogs Patrick Steinhardt
  2024-02-21 10:37     ` Kristoffer Haugsbakk
  2024-02-21 16:48     ` Eric Sunshine
@ 2024-02-21 17:31     ` Jeff King
  2 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2024-02-21 17:31 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Yasushi SHOJI

On Wed, Feb 21, 2024 at 10:56:36AM +0100, Patrick Steinhardt wrote:

> This behaviour goes back to 6436a20284 (refs: allow @{n} to work with
> n-sized reflog, 2021-01-07), which fixed a bug that wouldn't allow a
> user to return the n'th reflog entry with an n-sized reflog. With this
> commit, `read_ref_at()` started to special case reading the first entry
> of the reflog via a separate `read_ref_at_ent_newest()` function. The
> problem here is that we forgot to check whether the callback was invoked
> at all, and thus we don't notice empty reflogs.

I'm on the fence no whether the current @{0} behavior is sensible and
should be preserved. I agree it mostly worked by luck, but the presence
of the test makes me think at least the intention was there.

But assuming that is a good direction, there's one thing that puzzles me
about your patch:

> @@ -1084,6 +1084,7 @@ static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
>  	struct read_ref_at_cb *cb = cb_data;
>  
>  	set_read_ref_cutoffs(cb, timestamp, tz, message);
> +	cb->found_it = 1;
>  	oidcpy(cb->oid, noid);
>  	/* We just want the first entry */
>  	return 1;

OK, so we note whether the callback was invoked, which is good...

> @@ -1123,7 +1124,7 @@ int read_ref_at(struct ref_store *refs, const char *refname,
>  
>  	if (cb.cnt == 0) {
>  		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
> -		return 0;
> +		return !cb.found_it;
>  	}

...but here we just return without an error message. Whereas later in
the function, we have logic to produce the "log for %s is empty"
message. So now we will produce a message if you ask for branch@{1} in
an empty reflog, but not for branch@{0}, and the caller is responsible
for printing an error in the latter case.

If we instead set reccnt for branch@{0} as we would for branch@{1}, then
we can fall through and share the error handling (like it was before
6436a20284, when they used the same callback):

diff --git a/refs.c b/refs.c
index c633abf284..7d5e7a9ba6 100644
--- a/refs.c
+++ b/refs.c
@@ -1084,6 +1084,8 @@ static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
 	struct read_ref_at_cb *cb = cb_data;
 
 	set_read_ref_cutoffs(cb, timestamp, tz, message);
+	cb->reccnt++;
+	cb->found_it = 1;
 	oidcpy(cb->oid, noid);
 	/* We just want the first entry */
 	return 1;
@@ -1121,12 +1123,10 @@ int read_ref_at(struct ref_store *refs, const char *refname,
 	cb.cutoff_cnt = cutoff_cnt;
 	cb.oid = oid;
 
-	if (cb.cnt == 0) {
+	if (cb.cnt == 0)
 		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
-		return 0;
-	}
-
-	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
+	else
+		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
 
 	if (!cb.reccnt) {
 		if (flags & GET_OID_QUIETLY)

And it all just works without having to touch get_oid_basic() or
cmd_show_branch() at all. Do note that one of the tests needs to be
updated to account for the slightly different format of the error
message; but again, I think that is showing off the inconsistency in
having the error message in two places.

-Peff

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] builtin/show-branch: detect empty reflogs
  2024-02-21  9:56   ` [PATCH 2/2] builtin/show-branch: detect " Patrick Steinhardt
@ 2024-02-21 17:35     ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2024-02-21 17:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Yasushi SHOJI

On Wed, Feb 21, 2024 at 10:56:40AM +0100, Patrick Steinhardt wrote:

> diff --git a/builtin/show-branch.c b/builtin/show-branch.c
> index b01ec761d2..8837415031 100644
> --- a/builtin/show-branch.c
> +++ b/builtin/show-branch.c
> @@ -785,6 +785,8 @@ int cmd_show_branch(int ac, const char **av, const char *prefix)
>  			if (read_ref_at(get_main_ref_store(the_repository),
>  					ref, flags, 0, base + i, &oid, &logmsg,
>  					&timestamp, &tz, NULL)) {
> +				if (!i)
> +					die(_("log for %s is empty"), ref);
>  				reflog = i;
>  				break;
>  			}

There's another call to read_ref_at() earlier in the function that does
not check the return code. I think it is OK, though, because it always
passes a timestamp rather than a count, so it never hits the
special-case added by 6436a20284.

But I think this highlights my confusion with the first patch; most of
the time read_ref_at() will print the appropriate message for us and
die, but for this one weird case we have to do it ourselves. It makes
more sense to me for it to behave consistently for an empty reflog,
whether we were looking for branch@{0} or branch@{1}.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-21 10:05   ` Patrick Steinhardt
@ 2024-02-21 17:38     ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2024-02-21 17:38 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Yasushi SHOJI, Denton Liu, Junio C Hamano, Git Mailing List

On Wed, Feb 21, 2024 at 11:05:56AM +0100, Patrick Steinhardt wrote:

> > I am still trying to wrap my head around how it can get such wrong
> > results for show-branch when asking for "git rev-parse branch@{0}", etc,
> > are correct. I think it is that "rev-parse branch@{0}" is only looking
> > at the output oid and does not consider the reflog message at all. So I
> > think it is subtly broken, but in a way that happens to work for that
> > caller. But I'm not sure of the correct fix. At least not at this time
> > of night.
> > 
> > Cc-ing folks involved in 6436a20284.
> 
> Ah, our mails crossed, but we came to the same conclusion. Things indeed
> are subtly broken here and work just by chance because all callers pre
> initialize the object ID. So in the case where the reflog is missing or
> empty we'd use that pre-initialized object ID because `read_ref_at()`
> does not indicate the failure to the callers.
> 
> I think that this behaviour is not sensible in the first place. When
> asking for the reflog, we should only ever return object IDs parsed from
> the reflog. Falling back to parsing the ref itself does not make much
> sense. I've thus sent a patch series that changes the behaviour here.

I left some comments on your patches. But assuming we are OK to change
the branch@{0} behavior for the empty log, the approach is sound.

That still leaves us with the bug in showing the message (which is
easily fixed), and the weird off-by-one output caused by 6436a20284.
Obviously the segfault fix can be taken independently of the rest, but I
wonder if some deeper refactoring of what 6436a20284 did will be
necessary.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-21  8:42 ` Jeff King
  2024-02-21 10:05   ` Patrick Steinhardt
@ 2024-02-21 17:44   ` Junio C Hamano
  2024-02-22  9:02     ` Patrick Steinhardt
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2024-02-21 17:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Yasushi SHOJI, Denton Liu, Git Mailing List

Jeff King <peff@peff.net> writes:

> with an empty reflog file (added by that same commit). The code in
> get_oid_basic() to parse reflogs doesn't suffer from the same bugs: it
> checks up front that the reflog file exists, it preloads the output oid
> with the current ref value, and it doesn't look at other fields (like
> the reflog message).

It is a usability hack to allow foo@{0} to resolve successfully,
instead of causing a failure, when there is no reflog entries for
foo, I would think.  As to the "show-branch -g", the intent is to
see the recent evolution of the branch, so I am fine if we do not
give any output when no reflog entries exist (i.e. "no evolution
behind the current state---it just is"), or just one entry output
for "foo@{0}" to say "there is only one recent state".

Even though it may feel wrong to successfully resolve foo@{0} when
reflog for foo does not exist at the mechanical level (read: the
implementors of reflog mechanism may find the usability hack a bad
idea), I suspect at the end-user level it may be closer to what
people expect out of foo@{0} (i.e. "give me the latest").

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-21 17:44   ` Junio C Hamano
@ 2024-02-22  9:02     ` Patrick Steinhardt
  2024-02-22 16:32       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2024-02-22  9:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Yasushi SHOJI, Denton Liu, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2889 bytes --]

On Wed, Feb 21, 2024 at 09:44:09AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > with an empty reflog file (added by that same commit). The code in
> > get_oid_basic() to parse reflogs doesn't suffer from the same bugs: it
> > checks up front that the reflog file exists, it preloads the output oid
> > with the current ref value, and it doesn't look at other fields (like
> > the reflog message).
> 
> It is a usability hack to allow foo@{0} to resolve successfully,
> instead of causing a failure, when there is no reflog entries for
> foo, I would think.  As to the "show-branch -g", the intent is to
> see the recent evolution of the branch, so I am fine if we do not
> give any output when no reflog entries exist (i.e. "no evolution
> behind the current state---it just is"), or just one entry output
> for "foo@{0}" to say "there is only one recent state".
> 
> Even though it may feel wrong to successfully resolve foo@{0} when
> reflog for foo does not exist at the mechanical level (read: the
> implementors of reflog mechanism may find the usability hack a bad
> idea), I suspect at the end-user level it may be closer to what
> people expect out of foo@{0} (i.e. "give me the latest").

Hum, I dunno. I don't really understand what the benefit of this
fallback is. If a user wants to know the latest object ID of the ref
they shouldn't ask for `foo@{0}`, they should ask for `foo`. On the
other hand, if I want to know "What is the latest entry in the ref's
log", I want to ask for `foo@{0}`.

For one, I think that this is adding complexity to the user interface.
If you were to explain this feature to a user who has never encountered
it, you need to now also explain special cases: "It gives you the latest
reflog entry, except when the reflog doesn't exist or is missing, then
we return the object ID of the corresponding ref." This is a lot more
mental overhead than "It gives you the latest reflog entry."

We also have to consider a potential future where we stop deleting
reflogs together with their ref in the "reftable" backend. What do we
return when the reflog is empty and the ref is missing? "It gives you
the latest reflog entry, except when the reflog doesn't exist or is
missing, then we give you the ref except if that is missing, too". It's
getting even harder to explain now.

Another angle to me is scripting. If I really want to get the latest
reflog entry, then I now have to execute two commands. First I have to
check whether the reflog is non-empty, and only then can I ask for the
latest reflog entry. Otherwise, I might not get a reflog entry but the
resolved ref instead. And I wouldn't even know how to check whether the
reflog is non-empty.

So overall, I think the interface is a lot easier to understand and use
correctly if we didn't have this fallback.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-22  9:02     ` Patrick Steinhardt
@ 2024-02-22 16:32       ` Junio C Hamano
  2024-02-22 17:22         ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2024-02-22 16:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jeff King, Yasushi SHOJI, Denton Liu, Git Mailing List

Patrick Steinhardt <ps@pks.im> writes:

>> Even though it may feel wrong to successfully resolve foo@{0} when
>> reflog for foo does not exist at the mechanical level (read: the
>> implementors of reflog mechanism may find the usability hack a bad
>> idea), I suspect at the end-user level it may be closer to what
>> people expect out of foo@{0} (i.e. "give me the latest").
>
> Hum, I dunno. I don't really understand what the benefit of this
> fallback is. If a user wants to know the latest object ID of the ref
> they shouldn't ask for `foo@{0}`, they should ask for `foo`. On the
> other hand, if I want to know "What is the latest entry in the ref's
> log", I want to ask for `foo@{0}`.

The usability hack helps small things like "List up to 4 most recent
states from a branch", e.g.

    for nth in $(seq 0 3)
    do
	git rev-parse --quiet --verify @$nth || break
	git show -s --format="@$nth %h %s" @$nth
    done

vs

    for rev in HEAD @{1} @{2} @{3}
    do
	git rev-parse --quiet --verify "$rev" || break
	git show -s --format="$rev %h %s" "$rev"
    done

by not forcing you to special case the "current".

Ideally, "foo@{0}" should have meant "the state immediately before
the current state of foo" so that "foo" is the unambiguous and only
way to refer to "the current state of foo", but that was not how we
implemented the reflog, allowing a subtle repository corruption
where the latest state of a branch according to the reflog and the
current commit pointed by the branch can diverge.  But that wasn't
what we did, and instead both "foo@{0}" and "foo" mean to refer to
"the latest state of foo".  We can take advantage of that misdesign
and allow "foo@{0}" to refer to the same commit as "foo", at least
at the get_oid_basic() level, whether a reflog actually exists or
not, and that would make the whole thing more consistent.

In any case, I do not know how this "usability" actually helps in
the field, and I wouldn't personally shed tears if it gets removed.
The above is just an explanation why it exists.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Segfault: git show-branch --reflog refs/pullreqs/1
  2024-02-22 16:32       ` Junio C Hamano
@ 2024-02-22 17:22         ` Jeff King
  2024-02-26 10:00           ` [PATCH 0/3] show-branch --reflog fixes Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2024-02-22 17:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

On Thu, Feb 22, 2024 at 08:32:06AM -0800, Junio C Hamano wrote:

> > Hum, I dunno. I don't really understand what the benefit of this
> > fallback is. If a user wants to know the latest object ID of the ref
> > they shouldn't ask for `foo@{0}`, they should ask for `foo`. On the
> > other hand, if I want to know "What is the latest entry in the ref's
> > log", I want to ask for `foo@{0}`.
> 
> The usability hack helps small things like "List up to 4 most recent
> states from a branch", e.g.
> 
>     for nth in $(seq 0 3)
>     do
> 	git rev-parse --quiet --verify @$nth || break
> 	git show -s --format="@$nth %h %s" @$nth
>     done
> 
> vs
> 
>     for rev in HEAD @{1} @{2} @{3}
>     do
> 	git rev-parse --quiet --verify "$rev" || break
> 	git show -s --format="$rev %h %s" "$rev"
>     done
> 
> by not forcing you to special case the "current".

In those examples, though, it is useful precisely because you _do_ have
a reflog, and ref@{0} is conceptually the top entry (which brought us to
the same state as just "ref").

The question to me is more "is ref@{0} useful on its own, even when you
do not necessarily have a reflog". That I am less sure of.

> Ideally, "foo@{0}" should have meant "the state immediately before
> the current state of foo" so that "foo" is the unambiguous and only
> way to refer to "the current state of foo", but that was not how we
> implemented the reflog, allowing a subtle repository corruption
> where the latest state of a branch according to the reflog and the
> current commit pointed by the branch can diverge.  But that wasn't
> what we did, and instead both "foo@{0}" and "foo" mean to refer to
> "the latest state of foo".  We can take advantage of that misdesign
> and allow "foo@{0}" to refer to the same commit as "foo", at least
> at the get_oid_basic() level, whether a reflog actually exists or
> not, and that would make the whole thing more consistent.

I think there is some confusion here between how get_oid_basic() behaves
and how read_ref_at() is used for something like show-branch. In the
former case, we only care about getting an oid as output, but in the
latter we actually want the reflog entry (because we care about its
timestamp, message, and so on).

So in terms of reflog entries, ref@{0} should refer to the most recent
entry. And the oid it returns should be the end-result of that entry,
which (in a non corrupted repository) is identical to the current ref
value. And that "should" is reinforced by stuff like:

  git log -g "%gd %gs"

which shows the most recent entry as HEAD@{0}.

I think 6436a20284 (refs: allow @{n} to work with n-sized reflog,
2021-01-07) confused things mightily by having read_ref_at() with a
count of "n" find entry "n-1" instead, and then return the oid for the
"old" value. That makes get_oid_basic() work, because it doesn't care
about which entry we found, only the oid. But for show-branch, now we
are confused about which reflog entry ref@{1}, etc, refers to (but
ref@{0} still works because of the weird special-casing done by that
commit).

I think we should fix that (and I have the start of some patches to do
so). But in that world-view, having read_ref_at() return anything for a
count of "0" when there is no reflog does not make sense. There is no
such entry!

OTOH, we face the same problem when asking about ref@{N} when there are
only N entries. We can provide an oid (based on the "old" value from the
oldest entry we did see), but we have to "fake" the reflog entry data
(like the messsage), since there wasn't one.

So the open questions to me are:

  - should this faking happen in read_ref_at(), just returning a dummy
    reflog message? Or should we keep read_ref_at() purely about finding
    the entry, and put the special-casing into get_oid_basic(), which
    only cares about the oid result?

  - wherever we put the faking, should we only fake ref@{N} when N > 0?
    Or should we also fake ref@{0} when there is no reflog at all?

If none of this makes sense, it is because I am only now untangling what
is going on with 6436a20284. ;) I will try to polish my proposed patches
and hopefully that will explain it a bit more clearly (I may not get to
it until tomorrow though).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 0/3] show-branch --reflog fixes
  2024-02-22 17:22         ` Jeff King
@ 2024-02-26 10:00           ` Jeff King
  2024-02-26 10:02             ` [PATCH 1/3] Revert "refs: allow @{n} to work with n-sized reflog" Jeff King
                               ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jeff King @ 2024-02-26 10:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

On Thu, Feb 22, 2024 at 12:22:52PM -0500, Jeff King wrote:

> If none of this makes sense, it is because I am only now untangling what
> is going on with 6436a20284. ;) I will try to polish my proposed patches
> and hopefully that will explain it a bit more clearly (I may not get to
> it until tomorrow though).

OK, so here's what I came up with. One thing I did not realize, as I
was writing my patches directly atop 6436a20284, is that we actually
fixed the reflog message bug back in f2463490c4 (show-branch: show
reflog message, 2021-12-02). And several tests have been added since
then.

So I gave up on trying to build on top of the source of the bug, and
just rebased onto the tip of master. It should apply to recent "maint"
as well, I'd think.

  [1/3]: Revert "refs: allow @{n} to work with n-sized reflog"
  [2/3]: get_oid_basic(): special-case ref@{n} for oldest reflog entry
  [3/3]: read_ref_at(): special-case ref@{0} for an empty reflog

 object-name.c          |  9 ++++++
 refs.c                 | 65 +++++++++++++++++++-----------------------
 refs.h                 | 15 +++++++++-
 t/t3202-show-branch.sh | 49 +++++++++++++++++++++----------
 4 files changed, 87 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/3] Revert "refs: allow @{n} to work with n-sized reflog"
  2024-02-26 10:00           ` [PATCH 0/3] show-branch --reflog fixes Jeff King
@ 2024-02-26 10:02             ` Jeff King
  2024-02-26 10:04             ` [PATCH 2/3] get_oid_basic(): special-case ref@{n} for oldest reflog entry Jeff King
  2024-02-26 10:08             ` [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog Jeff King
  2 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2024-02-26 10:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526.

The idea of that commit is that if read_ref_at() is counting back to the
Nth reflog but the reflog is short by one entry (e.g., because it was
pruned), we can find the oid of the missing entry by looking at the
"before" oid value of the entry that comes after it (whereas before, we
looked at the "after" value of each entry and complained that we
couldn't find the one from before the truncation).

This works fine for resolving the oid of ref@{n}, as it is used by
get_oid_basic(), which does not look at any other aspect of the reflog
we found (e.g., its timestamp or message). But there's another caller of
read_ref_at(): in show-branch we use it to walk over the reflog, and we
do care about the reflog entry. And so that commit broke "show-branch
--reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1}
as ref@{2}, and so on.

For example, in the new test in t3202 we produce:

  ! [branch@{0}] (0 seconds ago) commit: three
   ! [branch@{1}] (0 seconds ago) commit: three
    ! [branch@{2}] (60 seconds ago) commit: two
     ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^

instead of the correct:

  ! [branch@{0}] (0 seconds ago) commit: three
   ! [branch@{1}] (60 seconds ago) commit: two
    ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^
     ! [branch@{3}] (2 minutes ago) commit: one

But there's another bug, too: because it is looking at the "old" value
of the reflog after the one we're interested in, it has to special-case
ref@{0} (since there isn't anything after it). That's why it doesn't
show the offset bug in the output above. But this special-case code
fails to handle the situation where the reflog is empty or missing; it
returns success even though the reflog message out-parameter has been
left uninitialized. You can't trigger this through get_oid_basic(), but
"show-branch --reflog" will pretty reliably segfault as it tries to
access the garbage pointer.

Fixing the segfault would be pretty easy. But the off-by-one problem is
inherent in this approach. So let's start by reverting the commit to
give us a clean slate to work with.

This isn't a pure revert; all of the code changes are reverted, but for
the tests:

  1. We'll flip the cases in t1508 to expect_failure; making these work
     was the goal of 6436a2028, and we'll want to use them for our
     replacement approach.

  2. There's a test in t3202 for "show-branch --reflog", but it expects
     the broken output! It was added by f2463490c4 (show-branch: show
     reflog message, 2021-12-02) which was fixing another bug, and I
     think the author simply didn't notice that the second line showed
     the wrong reflog.

     Rather than fixing that test, let's replace it with one that is
     more thorough (while still covering the reflog message fix from
     that commit). We'll use a longer reflog, which lets us see more
     entries (thus making the "off by one" pattern much more clear). And
     we'll use a more recent timestamp for "now" so that our relative
     dates have more resolution. That lets us see that the reflog dates
     are correct (whereas when you are 4 years away, two entries that
     are 60 seconds apart will have the same "4 years ago" relative
     date). Because we're adjusting the repository state, I've moved
     this new test to the end of the script, leaving the other tests
     undisturbed.

     We'll also add a new test which covers the missing reflog case;
     previously it segfaulted, but now it reports the empty reflog).

Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c                     | 48 +++++++++++--------------------------
 t/t1508-at-combinations.sh |  4 ++--
 t/t3202-show-branch.sh     | 49 ++++++++++++++++++++++++++------------
 3 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/refs.c b/refs.c
index c633abf284..ba1a4db754 100644
--- a/refs.c
+++ b/refs.c
@@ -1038,55 +1038,40 @@ static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
 			   const char *message, void *cb_data)
 {
 	struct read_ref_at_cb *cb = cb_data;
-	int reached_count;
 
 	cb->tz = tz;
 	cb->date = timestamp;
 
-	/*
-	 * It is not possible for cb->cnt == 0 on the first iteration because
-	 * that special case is handled in read_ref_at().
-	 */
-	if (cb->cnt > 0)
-		cb->cnt--;
-	reached_count = cb->cnt == 0 && !is_null_oid(ooid);
-	if (timestamp <= cb->at_time || reached_count) {
+	if (timestamp <= cb->at_time || cb->cnt == 0) {
 		set_read_ref_cutoffs(cb, timestamp, tz, message);
 		/*
 		 * we have not yet updated cb->[n|o]oid so they still
 		 * hold the values for the previous record.
 		 */
-		if (!is_null_oid(&cb->ooid) && !oideq(&cb->ooid, noid))
-			warning(_("log for ref %s has gap after %s"),
+		if (!is_null_oid(&cb->ooid)) {
+			oidcpy(cb->oid, noid);
+			if (!oideq(&cb->ooid, noid))
+				warning(_("log for ref %s has gap after %s"),
 					cb->refname, show_date(cb->date, cb->tz, DATE_MODE(RFC2822)));
-		if (reached_count)
-			oidcpy(cb->oid, ooid);
-		else if (!is_null_oid(&cb->ooid) || cb->date == cb->at_time)
+		}
+		else if (cb->date == cb->at_time)
 			oidcpy(cb->oid, noid);
 		else if (!oideq(noid, cb->oid))
 			warning(_("log for ref %s unexpectedly ended on %s"),
 				cb->refname, show_date(cb->date, cb->tz,
 						       DATE_MODE(RFC2822)));
+		cb->reccnt++;
+		oidcpy(&cb->ooid, ooid);
+		oidcpy(&cb->noid, noid);
 		cb->found_it = 1;
+		return 1;
 	}
 	cb->reccnt++;
 	oidcpy(&cb->ooid, ooid);
 	oidcpy(&cb->noid, noid);
-	return cb->found_it;
-}
-
-static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
-				  struct object_id *noid,
-				  const char *email UNUSED,
-				  timestamp_t timestamp, int tz,
-				  const char *message, void *cb_data)
-{
-	struct read_ref_at_cb *cb = cb_data;
-
-	set_read_ref_cutoffs(cb, timestamp, tz, message);
-	oidcpy(cb->oid, noid);
-	/* We just want the first entry */
-	return 1;
+	if (cb->cnt > 0)
+		cb->cnt--;
+	return 0;
 }
 
 static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid,
@@ -1121,11 +1106,6 @@ int read_ref_at(struct ref_store *refs, const char *refname,
 	cb.cutoff_cnt = cutoff_cnt;
 	cb.oid = oid;
 
-	if (cb.cnt == 0) {
-		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
-		return 0;
-	}
-
 	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
 
 	if (!cb.reccnt) {
diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
index e841309d0e..3e5f32f604 100755
--- a/t/t1508-at-combinations.sh
+++ b/t/t1508-at-combinations.sh
@@ -103,14 +103,14 @@ test_expect_success 'create path with @' '
 check "@:normal" blob content
 check "@:fun@ny" blob content
 
-test_expect_success '@{1} works with only one reflog entry' '
+test_expect_failure '@{1} works with only one reflog entry' '
 	git checkout -B newbranch main &&
 	git reflog expire --expire=now refs/heads/newbranch &&
 	git commit --allow-empty -m "first after expiration" &&
 	test_cmp_rev newbranch~ newbranch@{1}
 '
 
-test_expect_success '@{0} works with empty reflog' '
+test_expect_failure '@{0} works with empty reflog' '
 	git checkout -B newbranch main &&
 	git reflog expire --expire=now refs/heads/newbranch &&
 	test_cmp_rev newbranch newbranch@{0}
diff --git a/t/t3202-show-branch.sh b/t/t3202-show-branch.sh
index 6a98b2df76..35f35f8091 100755
--- a/t/t3202-show-branch.sh
+++ b/t/t3202-show-branch.sh
@@ -4,9 +4,6 @@ test_description='test show-branch'
 
 . ./test-lib.sh
 
-# arbitrary reference time: 2009-08-30 19:20:00
-GIT_TEST_DATE_NOW=1251660000; export GIT_TEST_DATE_NOW
-
 test_expect_success 'error descriptions on empty repository' '
 	current=$(git branch --show-current) &&
 	cat >expect <<-EOF &&
@@ -187,18 +184,6 @@ test_expect_success 'show branch --merge-base with N arguments' '
 	test_cmp expect actual
 '
 
-test_expect_success 'show branch --reflog=2' '
-	sed "s/^>	//" >expect <<-\EOF &&
-	>	! [refs/heads/branch10@{0}] (4 years, 5 months ago) commit: branch10
-	>	 ! [refs/heads/branch10@{1}] (4 years, 5 months ago) commit: branch10
-	>	--
-	>	+  [refs/heads/branch10@{0}] branch10
-	>	++ [refs/heads/branch10@{1}] initial
-	EOF
-	git show-branch --reflog=2 >actual &&
-	test_cmp actual expect
-'
-
 # incompatible options
 while read combo
 do
@@ -264,4 +249,38 @@ test_expect_success 'error descriptions on orphan branch' '
 	test_branch_op_in_wt -c new-branch
 '
 
+test_expect_success 'setup reflogs' '
+	test_commit base &&
+	git checkout -b branch &&
+	test_commit one &&
+	git reset --hard HEAD^ &&
+	test_commit two &&
+	test_commit three
+'
+
+test_expect_success '--reflog shows reflog entries' '
+	cat >expect <<-\EOF &&
+	! [branch@{0}] (0 seconds ago) commit: three
+	 ! [branch@{1}] (60 seconds ago) commit: two
+	  ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^
+	   ! [branch@{3}] (2 minutes ago) commit: one
+	----
+	+    [branch@{0}] three
+	++   [branch@{1}] two
+	   + [branch@{3}] one
+	++++ [branch@{2}] base
+	EOF
+	# the output always contains relative timestamps; use
+	# a known time to get deterministic results
+	GIT_TEST_DATE_NOW=$test_tick \
+	git show-branch --reflog branch >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--reflog handles missing reflog' '
+	git reflog expire --expire=now branch &&
+	test_must_fail git show-branch --reflog branch 2>err &&
+	grep "log .* is empty" err
+'
+
 test_done
-- 
2.44.0.rc2.424.gbdbf4d014b


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/3] get_oid_basic(): special-case ref@{n} for oldest reflog entry
  2024-02-26 10:00           ` [PATCH 0/3] show-branch --reflog fixes Jeff King
  2024-02-26 10:02             ` [PATCH 1/3] Revert "refs: allow @{n} to work with n-sized reflog" Jeff King
@ 2024-02-26 10:04             ` Jeff King
  2024-02-26 15:59               ` Junio C Hamano
  2024-02-26 10:08             ` [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog Jeff King
  2 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2024-02-26 10:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog,
2021-01-07) was that if we have "n" entries in a reflog, we should still
be able to resolve ref@{n} by looking at the "old" value of the oldest
entry.

Commit 6436a20284 tried to put the logic into read_ref_at() by shifting
its idea of "n" by one. But we reverted that in the previous commit,
since it led to bugs in other callers which cared about the details of
the reflog entry we found. Instead, let's put the special case into the
caller that resolves @{n}, as it cares only about the oid.

read_ref_at() is even kind enough to return the "old" value from the
final reflog; it just returns "1" to signal to us that we ran off the
end of the reflog. But we can notice in the caller that we read just
enough records for that "old" value to be the one we're looking for, and
use it.

Note that read_ref_at() could notice this case, too, and just return 0.
But we don't want to do that, because the caller must be made aware that
we only found the oid, not an actual reflog entry (and the call sites in
show-branch do care about this).

There is one complication, though. When read_ref_at() hits a truncated
reflog, it will return the "old" value of the oldest entry only if it is
not the null oid. Otherwise, it actually returns the "new" value from
that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in
oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a
ref created recently will produce the initial value as a convenience
(even though technically it did not exist 20 years ago).

But this convenience is only useful for time-based cutoffs. For
count-based cutoffs, get_oid_basic() has always simply complained about
going too far back:

  $ git rev-parse HEAD@{20}
  fatal: log for 'HEAD' only has 16 entries

and we should continue to do so, rather than returning a nonsense value
(there's even a test in t1508 already which covers this). So let's have
the d1a4489a56 code kick in only when doing timestamp-based cutoffs.

Signed-off-by: Jeff King <peff@peff.net>
---
 object-name.c              | 9 +++++++++
 refs.c                     | 2 +-
 t/t1508-at-combinations.sh | 2 +-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/object-name.c b/object-name.c
index 3a2ef5d680..511f09bc0f 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1034,6 +1034,15 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 						len, str,
 						show_date(co_time, co_tz, DATE_MODE(RFC2822)));
 				}
+			} else if (nth == co_cnt && !is_null_oid(oid)) {
+				/*
+				 * We were asked for the Nth reflog (counting
+				 * from 0), but there were only N entries.
+				 * read_ref_at() will have returned "1" to tell
+				 * us it did not find an entry, but it did
+				 * still fill in the oid with the "old" value,
+				 * which we can use.
+				 */
 			} else {
 				if (flags & GET_OID_QUIETLY) {
 					exit(128);
diff --git a/refs.c b/refs.c
index ba1a4db754..6b826b002e 100644
--- a/refs.c
+++ b/refs.c
@@ -1083,7 +1083,7 @@ static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid

 	set_read_ref_cutoffs(cb, timestamp, tz, message);
 	oidcpy(cb->oid, ooid);
-	if (is_null_oid(cb->oid))
+	if (cb->at_time && is_null_oid(cb->oid))
 		oidcpy(cb->oid, noid);
 	/* We just want the first entry */
 	return 1;
diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
index 3e5f32f604..370bf7137e 100755
--- a/t/t1508-at-combinations.sh
+++ b/t/t1508-at-combinations.sh
@@ -103,7 +103,7 @@ test_expect_success 'create path with @' '
 check "@:normal" blob content
 check "@:fun@ny" blob content

-test_expect_failure '@{1} works with only one reflog entry' '
+test_expect_success '@{1} works with only one reflog entry' '
 	git checkout -B newbranch main &&
 	git reflog expire --expire=now refs/heads/newbranch &&
 	git commit --allow-empty -m "first after expiration" &&
-- 
2.44.0.rc2.424.gbdbf4d014b

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-26 10:00           ` [PATCH 0/3] show-branch --reflog fixes Jeff King
  2024-02-26 10:02             ` [PATCH 1/3] Revert "refs: allow @{n} to work with n-sized reflog" Jeff King
  2024-02-26 10:04             ` [PATCH 2/3] get_oid_basic(): special-case ref@{n} for oldest reflog entry Jeff King
@ 2024-02-26 10:08             ` Jeff King
  2024-02-26 10:10               ` Jeff King
  2024-02-26 17:25               ` Junio C Hamano
  2 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2024-02-26 10:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

The previous commit special-cased get_oid_basic()'s handling of ref@{n}
for a reflog with n entries. But its special case doesn't work for
ref@{0} in an empty reflog, because read_ref_at() dies when it notices
the empty reflog!

We can make this work by special-casing this in read_ref_at(). It's
somewhat gross, for two reasons:

  1. We have no reflog entry to describe in the "msg" out-parameter. So
     we have to leave it uninitialized or make something up.

  2. Likewise, we have no oid to put in the "oid" out-parameter. Leaving
     it untouched is actually the best thing here, as all of the callers
     will have initialized it with the current ref value via
     repo_dwim_log(). This is rather subtle, but it is how things worked
     in 6436a20284 (refs: allow @{n} to work with n-sized reflog,
     2021-01-07) before we reverted it.

The key difference from 6436a20284 here is that we'll return "1" to
indicate that we _didn't_ find the requested reflog entry. Coupled with
the special-casing in get_oid_basic() in the previous commit, that's
enough to make looking up ref@{0} work, and we can flip 6436a20284's
test back to expect_success.

It also means that the call in show-branch which segfaulted with
6436a20284 (and which is now tested in t3202) remains OK. The caller
notices that we could not find any reflog entry, and so it breaks out of
its loop, showing nothing. This is different from the current behavior
of producing an error, but it's just as reasonable (and is exactly what
we'd do if you asked it to walk starting at ref@{1} but there was only 1
entry).

Thus nobody should actually look at the reflog entry info we return. But
we'll still put in some fake values just to be on the safe side, since
this is such a subtle and confusing interface. Likewise, we'll document
what's going on in a comment above the function declaration. If this
were a function with a lot of callers, the footgun would probably not be
worth it. But it has only ever had two callers in its 18-year existence,
and it seems unlikely to grow more. So let's hold our noses and let
users enjoy the convenience of a simulated ref@{0}.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c                     | 15 +++++++++++++++
 refs.h                     | 15 ++++++++++++++-
 t/t1508-at-combinations.sh |  2 +-
 t/t3202-show-branch.sh     |  4 ++--
 4 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 6b826b002e..8eaec5eca7 100644
--- a/refs.c
+++ b/refs.c
@@ -1109,6 +1109,21 @@ int read_ref_at(struct ref_store *refs, const char *refname,
 	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
 
 	if (!cb.reccnt) {
+		if (cnt == 0) {
+			/*
+			 * The caller asked for ref@{0}, and we had no entries.
+			 * It's a bit subtle, but in practice all callers have
+			 * prepped the "oid" field with the current value of
+			 * the ref, which is the most reasonable fallback.
+			 *
+			 * We'll put dummy values into the out-parameters (so
+			 * they're not just uninitialized garbage), and the
+			 * caller can take our return value as a hint that
+			 * we did not find any such reflog.
+			 */
+			set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
+			return 1;
+		}
 		if (flags & GET_OID_QUIETLY)
 			exit(128);
 		else
diff --git a/refs.h b/refs.h
index 303c5fac4d..37116ad2b2 100644
--- a/refs.h
+++ b/refs.h
@@ -440,7 +440,20 @@ int refs_create_reflog(struct ref_store *refs, const char *refname,
 		       struct strbuf *err);
 int safe_create_reflog(const char *refname, struct strbuf *err);
 
-/** Reads log for the value of ref during at_time. **/
+/**
+ * Reads log for the value of ref during at_time (in which case "cnt" should be
+ * negative) or the reflog "cnt" entries from the top (in which case "at_time"
+ * should be 0).
+ *
+ * If we found the reflog entry in question, returns 0 (and details of the
+ * entry can be found in the out-parameters).
+ *
+ * If we ran out of reflog entries, the out-parameters are filled with the
+ * details of the oldest entry we did find, and the function returns 1. Note
+ * that there is one important special case here! If the reflog was empty
+ * and the caller asked for the 0-th cnt, we will return "1" but leave the
+ * "oid" field untouched.
+ **/
 int read_ref_at(struct ref_store *refs,
 		const char *refname, unsigned int flags,
 		timestamp_t at_time, int cnt,
diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
index 370bf7137e..e841309d0e 100755
--- a/t/t1508-at-combinations.sh
+++ b/t/t1508-at-combinations.sh
@@ -110,7 +110,7 @@ test_expect_success '@{1} works with only one reflog entry' '
 	test_cmp_rev newbranch~ newbranch@{1}
 '
 
-test_expect_failure '@{0} works with empty reflog' '
+test_expect_success '@{0} works with empty reflog' '
 	git checkout -B newbranch main &&
 	git reflog expire --expire=now refs/heads/newbranch &&
 	test_cmp_rev newbranch newbranch@{0}
diff --git a/t/t3202-show-branch.sh b/t/t3202-show-branch.sh
index 35f35f8091..a1139f79e2 100755
--- a/t/t3202-show-branch.sh
+++ b/t/t3202-show-branch.sh
@@ -279,8 +279,8 @@ test_expect_success '--reflog shows reflog entries' '
 
 test_expect_success '--reflog handles missing reflog' '
 	git reflog expire --expire=now branch &&
-	test_must_fail git show-branch --reflog branch 2>err &&
-	grep "log .* is empty" err
+	git show-branch --reflog branch >actual &&
+	test_must_be_empty actual
 '
 
 test_done
-- 
2.44.0.rc2.424.gbdbf4d014b

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-26 10:08             ` [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog Jeff King
@ 2024-02-26 10:10               ` Jeff King
  2024-02-26 17:25                 ` Junio C Hamano
  2024-02-26 17:25               ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2024-02-26 10:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

On Mon, Feb 26, 2024 at 05:08:03AM -0500, Jeff King wrote:

> Thus nobody should actually look at the reflog entry info we return. But
> we'll still put in some fake values just to be on the safe side, since
> this is such a subtle and confusing interface. Likewise, we'll document
> what's going on in a comment above the function declaration. If this
> were a function with a lot of callers, the footgun would probably not be
> worth it. But it has only ever had two callers in its 18-year existence,
> and it seems unlikely to grow more. So let's hold our noses and let
> users enjoy the convenience of a simulated ref@{0}.

Obviously I'm sympathetic to Patrick's position that this empty-reflog
special case is kind of gross. ;)

That's one of the reasons I split this out from patch 2; we can see
exactly what must be done to make each case work. And in fact I had
originally started to write a patch that simply changed t1508 to expect
failure. I could still be persuaded to go that way if anybody feels
strongly.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] get_oid_basic(): special-case ref@{n} for oldest reflog entry
  2024-02-26 10:04             ` [PATCH 2/3] get_oid_basic(): special-case ref@{n} for oldest reflog entry Jeff King
@ 2024-02-26 15:59               ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2024-02-26 15:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

Jeff King <peff@peff.net> writes:

> the reflog entry we found. Instead, let's put the special case into the
> caller that resolves @{n}, as it cares only about the oid.

Thanks.  I am quite puzzled why we didn't do this obvious thing from
the start.

> There is one complication, though. When read_ref_at() hits a truncated
> reflog, it will return the "old" value of the oldest entry only if it is
> not the null oid. Otherwise, it actually returns the "new" value from
> that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in
> oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a
> ref created recently will produce the initial value as a convenience
> (even though technically it did not exist 20 years ago).
>
> But this convenience is only useful for time-based cutoffs. For
> count-based cutoffs, get_oid_basic() has always simply complained about
> going too far back:
>
>   $ git rev-parse HEAD@{20}
>   fatal: log for 'HEAD' only has 16 entries
>
> and we should continue to do so, rather than returning a nonsense value
> (there's even a test in t1508 already which covers this). So let's have
> the d1a4489a56 code kick in only when doing timestamp-based cutoffs.

Makes perfect sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object-name.c              | 9 +++++++++
>  refs.c                     | 2 +-
>  t/t1508-at-combinations.sh | 2 +-
>  3 files changed, 11 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-26 10:08             ` [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog Jeff King
  2024-02-26 10:10               ` Jeff King
@ 2024-02-26 17:25               ` Junio C Hamano
  2024-02-27  8:05                 ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2024-02-26 17:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

Jeff King <peff@peff.net> writes:

>  	if (!cb.reccnt) {
> +		if (cnt == 0) {

Style "if (!cnt)" ?  In this particular case I do not think it
actually is an improvement, though, simply because zero is really
special in this logic.

> +			/*
> +			 * The caller asked for ref@{0}, and we had no entries.
> +			 * It's a bit subtle, but in practice all callers have
> +			 * prepped the "oid" field with the current value of
> +			 * the ref, which is the most reasonable fallback.
> +			 *
> +			 * We'll put dummy values into the out-parameters (so
> +			 * they're not just uninitialized garbage), and the
> +			 * caller can take our return value as a hint that
> +			 * we did not find any such reflog.
> +			 */
> +			set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
> +			return 1;
> +		}

The dummy value I 100% agree with ;-).

You mentioned the convenience special case for time-based reflog
query for a time older than (e.g. @{20.years.ago}) the reflog
itself, and perhaps this one should be treated as its counterpart,
that is only useful for count-based access.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-26 10:10               ` Jeff King
@ 2024-02-26 17:25                 ` Junio C Hamano
  2024-02-27  8:07                   ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2024-02-26 17:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

Jeff King <peff@peff.net> writes:

> That's one of the reasons I split this out from patch 2; we can see
> exactly what must be done to make each case work. And in fact I had
> originally started to write a patch that simply changed t1508 to expect
> failure. I could still be persuaded to go that way if anybody feels
> strongly.

I do not feel strongly either way myself.  It just is interesting
that the older end of the history is with @{20.years.ago} special
case that is only for time-based query, while the newer end of the
history is with @{0} special case.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-26 17:25               ` Junio C Hamano
@ 2024-02-27  8:05                 ` Jeff King
  2024-02-27 17:03                   ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2024-02-27  8:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

On Mon, Feb 26, 2024 at 09:25:28AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >  	if (!cb.reccnt) {
> > +		if (cnt == 0) {
> 
> Style "if (!cnt)" ?  In this particular case I do not think it
> actually is an improvement, though, simply because zero is really
> special in this logic.

Yeah, I didn't really choose "== 0" as a conscious decision. But I do
tend to write "!cnt" when available, so I think I sub-consciously chose
this as a better description of the intent.

> 
> > +			/*
> > +			 * The caller asked for ref@{0}, and we had no entries.
> > +			 * It's a bit subtle, but in practice all callers have
> > +			 * prepped the "oid" field with the current value of
> > +			 * the ref, which is the most reasonable fallback.
> > +			 *
> > +			 * We'll put dummy values into the out-parameters (so
> > +			 * they're not just uninitialized garbage), and the
> > +			 * caller can take our return value as a hint that
> > +			 * we did not find any such reflog.
> > +			 */
> > +			set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
> > +			return 1;
> > +		}
> 
> The dummy value I 100% agree with ;-).
> 
> You mentioned the convenience special case for time-based reflog
> query for a time older than (e.g. @{20.years.ago}) the reflog
> itself, and perhaps this one should be treated as its counterpart,
> that is only useful for count-based access.

I think the true counterpart to that would be extending what's added by
patch 2 to get_oid_basic() to stop checking that we hit the count
exactly.

Let me try to lay out my thinking. If you _do_ have a reflog and the
request (whether time or count-based) goes too far back, read_ref_at()
will give you the oldest entry and return "1". And then in
get_oid_basic():

  - if it was a time based cutoff (like @{20.years.ago}), we will issue
    a warning ("log only goes back to 2024-01-01") but return the value
    anyway.

  - before this series, if it is a count based cutoff (like @{9999}), we
    fail and say "there are only have N entries".

  - after this series, we special case asking for @{9999} when there are
    9998 entries by returning the "old" oid but not issuing any warning
    (we do not need to, because we found the right answer for what it
    would have been had that old entry not been pruned).

  - what we _could_ do (but this series does not), and what would be the
    true counterpart to the @{20.years.ago} case, is to allow @{9999}
    for a reflog with only 20 entries, returning the old value from 20
    (or the new value if it was a creation!?) and issuing a warning
    saying "well, it only went back 20, but here you go".

I'm not so sure about that last one. It is pretty subjective, but
somehow asking for timestamps feels more "fuzzy" to me, and Git
returning a fuzzy answer is OK. Whereas asking for item 9999 in a list
with 20 items and getting back an answer feels more absolutely wrong. I
could be persuaded if there were a concrete use case, but I can't really
think of one. It seems more likely to confuse and hinder a user than to
help them.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-26 17:25                 ` Junio C Hamano
@ 2024-02-27  8:07                   ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2024-02-27  8:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

On Mon, Feb 26, 2024 at 09:25:32AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > That's one of the reasons I split this out from patch 2; we can see
> > exactly what must be done to make each case work. And in fact I had
> > originally started to write a patch that simply changed t1508 to expect
> > failure. I could still be persuaded to go that way if anybody feels
> > strongly.
> 
> I do not feel strongly either way myself.  It just is interesting
> that the older end of the history is with @{20.years.ago} special
> case that is only for time-based query, while the newer end of the
> history is with @{0} special case.

I laid out my thinking on the 20.years.ago special case in another
reply, but I wanted to say one more thing. The special case here in
patch 3 is making @{0} work for the empty reflog, but there is no
matching special case for time-based timestamps. If you have an empty
reflog and ask for @{20.years.ago}, you will get the usual "nope, the
reflog is empty" response (as opposed to having a non-empty reflog that
cuts off before 20 years ago).

Obviously we could make that work, but I think the point is that "@{0}"
is special magic for "the current value" in a way that a timestamp isn't
really.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog
  2024-02-27  8:05                 ` Jeff King
@ 2024-02-27 17:03                   ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2024-02-27 17:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, Yasushi SHOJI, Denton Liu, Git Mailing List

Jeff King <peff@peff.net> writes:

> Let me try to lay out my thinking. If you _do_ have a reflog and the
> request (whether time or count-based) goes too far back, read_ref_at()
> will give you the oldest entry and return "1". And then in
> get_oid_basic():
> ...
>   - what we _could_ do (but this series does not), and what would be the
>     true counterpart to the @{20.years.ago} case, is to allow @{9999}
>     for a reflog with only 20 entries, returning the old value from 20
>     (or the new value if it was a creation!?) and issuing a warning
>     saying "well, it only went back 20, but here you go".

Ah, I wasn't drawing *that* similarity.  My thinking was more like

 - When you have two entries in reflog, ref@{0} will use and find
   the latest entry whose value is the same as the ref itself.

 - When you have one entry, @{0} will use and find the latest entry
   whose value is the same as the ref itself.

 - When you have zero entry, @{0} can do the same by taking
   advantage of the fact that its value is supposed to be the same
   as the ref itself anyway.

that happens near the youngest end of a reflog, contrasting with the
@{20.years.ago} that happens near the oldest end.

> I'm not so sure about that last one. It is pretty subjective, but
> somehow asking for timestamps feels more "fuzzy" to me, and Git
> returning a fuzzy answer is OK. Whereas asking for item 9999 in a list
> with 20 items and getting back an answer feels more absolutely wrong. I
> could be persuaded if there were a concrete use case, but I can't really
> think of one. It seems more likely to confuse and hinder a user than to
> help them.

I do not think anybody misses @{9999} not giving the oldest
available, simply because "oldest" is a concept that fits better
with time-based queries than count-based queries.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-02-27 17:04 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-21  1:48 Segfault: git show-branch --reflog refs/pullreqs/1 Yasushi SHOJI
2024-02-21  8:42 ` Jeff King
2024-02-21 10:05   ` Patrick Steinhardt
2024-02-21 17:38     ` Jeff King
2024-02-21 17:44   ` Junio C Hamano
2024-02-22  9:02     ` Patrick Steinhardt
2024-02-22 16:32       ` Junio C Hamano
2024-02-22 17:22         ` Jeff King
2024-02-26 10:00           ` [PATCH 0/3] show-branch --reflog fixes Jeff King
2024-02-26 10:02             ` [PATCH 1/3] Revert "refs: allow @{n} to work with n-sized reflog" Jeff King
2024-02-26 10:04             ` [PATCH 2/3] get_oid_basic(): special-case ref@{n} for oldest reflog entry Jeff King
2024-02-26 15:59               ` Junio C Hamano
2024-02-26 10:08             ` [PATCH 3/3] read_ref_at(): special-case ref@{0} for an empty reflog Jeff King
2024-02-26 10:10               ` Jeff King
2024-02-26 17:25                 ` Junio C Hamano
2024-02-27  8:07                   ` Jeff King
2024-02-26 17:25               ` Junio C Hamano
2024-02-27  8:05                 ` Jeff King
2024-02-27 17:03                   ` Junio C Hamano
2024-02-21  9:52 ` Segfault: git show-branch --reflog refs/pullreqs/1 Patrick Steinhardt
2024-02-21  9:56 ` [PATCH 0/2] Detect empty or missing reflogs with `ref@{0}` Patrick Steinhardt
2024-02-21  9:56   ` [PATCH 1/2] object-name: detect and report empty reflogs Patrick Steinhardt
2024-02-21 10:37     ` Kristoffer Haugsbakk
2024-02-21 16:48     ` Eric Sunshine
2024-02-21 17:31     ` Jeff King
2024-02-21  9:56   ` [PATCH 2/2] builtin/show-branch: detect " Patrick Steinhardt
2024-02-21 17:35     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).