* Bug report: Strange behavior with `git gc` and `reference-transaction` hook @ 2021-11-18 0:41 Waleed Khan 2021-11-18 0:52 ` Bryan Turner 0 siblings, 1 reply; 5+ messages in thread From: Waleed Khan @ 2021-11-18 0:41 UTC (permalink / raw) To: git Hi all, I'm seeing unusual behavior on Git built from source at cd3e606211bb1cf8bc57f7d76bab98cc17a150bc, but which appears to extend back to Git v2.29. This is a repro script: ``` #!/bin/sh temp_dir=$(mktemp -d) echo "git dir is: $temp_dir" mkdir -p "$temp_dir" trap "rm -rf $temp_dir" EXIT cd "$temp_dir" || exit 1 git init -q date='Thu, 07 Apr 2005 22:13:13 +0200' GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" git commit -q --allow-empty -m 'Initial commit' mkdir -p '.git/hooks' cat >.git/hooks/reference-transaction <<'EOT' #!/bin/sh [[ "$1" != 'committed' ]] && exit 0 echo 'New reference-transaction invocation' while read old new ref; do echo " old: $old, new: $new, ref: $ref" done EOT chmod +x .git/hooks/reference-transaction git gc --prune=now -q git show-ref refs/heads/master ``` And this is the output it produces: ``` git dir is: /var/folders/gn/gdp9z_g968b9nx7c9lvgy8y00000gp/T/tmp.b3Jc6qnb New reference-transaction invocation old: 0000000000000000000000000000000000000000, new: e197d18c017d4038418be8b1cd38f4503e289165, ref: refs/heads/master New reference-transaction invocation old: e197d18c017d4038418be8b1cd38f4503e289165, new: 0000000000000000000000000000000000000000, ref: refs/heads/master e197d18c017d4038418be8b1cd38f4503e289165 refs/heads/master ``` These hooks are invoked a few milliseconds one after another. The expected behavior would be that the latest reference transaction hook refers to the state of the references on disk. That is, either `master` should point to 0 (be deleted), or it should have said that `master` pointed to `e197d1`. But if we actually examine `master`, it's set to `e197d1`, just as you would expect. The GC should have been a no-op overall. Best, Waleed ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook 2021-11-18 0:41 Bug report: Strange behavior with `git gc` and `reference-transaction` hook Waleed Khan @ 2021-11-18 0:52 ` Bryan Turner [not found] ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com> 2021-11-18 18:08 ` Jeff King 0 siblings, 2 replies; 5+ messages in thread From: Bryan Turner @ 2021-11-18 0:52 UTC (permalink / raw) To: Waleed Khan; +Cc: git On Wed, Nov 17, 2021 at 4:42 PM Waleed Khan <me@waleedkhan.name> wrote: > > Hi all, > > I'm seeing unusual behavior on Git built from source at > cd3e606211bb1cf8bc57f7d76bab98cc17a150bc, but which appears to extend > back to Git v2.29. > > This is a repro script: > > ``` > #!/bin/sh > > temp_dir=$(mktemp -d) > echo "git dir is: $temp_dir" > mkdir -p "$temp_dir" > trap "rm -rf $temp_dir" EXIT > > cd "$temp_dir" || exit 1 > git init -q > date='Thu, 07 Apr 2005 22:13:13 +0200' > GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" git commit -q > --allow-empty -m 'Initial commit' > > mkdir -p '.git/hooks' > cat >.git/hooks/reference-transaction <<'EOT' > #!/bin/sh > > [[ "$1" != 'committed' ]] && exit 0 > echo 'New reference-transaction invocation' > > while read old new ref; do > echo " old: $old, new: $new, ref: $ref" > done > EOT > chmod +x .git/hooks/reference-transaction > > git gc --prune=now -q > git show-ref refs/heads/master > ``` > > And this is the output it produces: > > ``` > git dir is: /var/folders/gn/gdp9z_g968b9nx7c9lvgy8y00000gp/T/tmp.b3Jc6qnb > New reference-transaction invocation > old: 0000000000000000000000000000000000000000, new: > e197d18c017d4038418be8b1cd38f4503e289165, ref: refs/heads/master > New reference-transaction invocation > old: e197d18c017d4038418be8b1cd38f4503e289165, new: > 0000000000000000000000000000000000000000, ref: refs/heads/master > e197d18c017d4038418be8b1cd38f4503e289165 refs/heads/master > ``` > > These hooks are invoked a few milliseconds one after another. There are two built-in "ref backends" that Git uses out of the box: A packed backend, which manages the "packed-refs" file, and a loose backend, which manages other files under "$GIT_DIR/refs". What you're seeing is a reference transaction for the packed backend which is adding the new value for "refs/heads/master" to "packed-refs" (packed backend reference-transactions rarely, if ever, include an actual old hash, as far as I can tell), and then a second reference transaction for the loose backend to delete the loose "refs/heads/master" file, now that it's packed. > > The expected behavior would be that the latest reference transaction > hook refers to the state of the references on disk. That is, either > `master` should point to 0 (be deleted), or it should have said that > `master` pointed to `e197d1`. > > But if we actually examine `master`, it's set to `e197d1`, just as you > would expect. The GC should have been a no-op overall. One of the subtasks of "git gc" is "git pack-refs". If you inspect in more detail, I suspect you'll find that "refs/heads/master" was loose before "git gc" ran (as in, there was a file "$GIT_DIR/refs/heads/master") and "packed-refs" either didn't have a "refs/heads/master" entry or had a different hash. (Loose refs always "win" over packed, since ref updates only write loose refs.) Hope this helps, Bryan Turner > > Best, > Waleed ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com>]
* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook [not found] ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com> @ 2021-11-18 1:28 ` Bryan Turner 0 siblings, 0 replies; 5+ messages in thread From: Bryan Turner @ 2021-11-18 1:28 UTC (permalink / raw) To: Waleed Khan; +Cc: git (Please don't top-post on the list) On Wed, Nov 17, 2021 at 5:01 PM Waleed Khan <me@waleedkhan.name> wrote: > > So what should users of the `reference-transaction` hook do in general? It sounds like we can never actually rely on the new value corresponding to the state of the reference on disk. Is that correct? I wish I had a good answer here, but unfortunately I don't. My comments are coming from a similar place: The end result of my own experiences and learnings trying to use "reference-transaction" for something. The dual backends, and the fact that "reference-transaction" receives _no hints_, as far as I'm aware, of what backend a callback is for, makes some use cases quite difficult in practice. It would be nice if there was an environment variable, or something, to indicate what backend a transaction was for. But for cases like this, where a ref is being "moved" between backends, the split nature of the backends, and the fact that each has its own separate transaction, makes working in a "simple" script, which has no more state than its current invocation, extremely difficult. Of course, since Git doesn't really have an ACID mechanism for atomically updating both "packed-refs" and loose ref files, trying to have a single reference transaction wouldn't really work either. > > I suppose if we want the real new value of the reference, we should always look it up freshly? If so, the githooks documentation should be updated. Currently, it has a remark on the validity when the old reference is zero, but not on the validity of the new reference. I'm not sure the "validity" of the new reference is the problem here. Both reference transactions _are_ giving you the correct new value; "packed-refs" is inserting "refs/heads/master" at e197d18, and the loose ref for it is being deleted. To me the difficulty isn't that the new value is _wrong_ (it's actually right), it's that, because there are two reference transactions and no shared context between them or hints as to what backend they're for, it's not really possible to put 2+2 together and realize that 4 is a ref being moved from loose to packed--but otherwise unchanged. Best regards, Bryan Turner ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook 2021-11-18 0:52 ` Bryan Turner [not found] ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com> @ 2021-11-18 18:08 ` Jeff King 2021-12-07 8:24 ` Patrick Steinhardt 1 sibling, 1 reply; 5+ messages in thread From: Jeff King @ 2021-11-18 18:08 UTC (permalink / raw) To: Bryan Turner; +Cc: Patrick Steinhardt, Waleed Khan, git [+cc pks] On Wed, Nov 17, 2021 at 04:52:46PM -0800, Bryan Turner wrote: > > The expected behavior would be that the latest reference transaction > > hook refers to the state of the references on disk. That is, either > > `master` should point to 0 (be deleted), or it should have said that > > `master` pointed to `e197d1`. > > > > But if we actually examine `master`, it's set to `e197d1`, just as you > > would expect. The GC should have been a no-op overall. > > One of the subtasks of "git gc" is "git pack-refs". If you inspect in > more detail, I suspect you'll find that "refs/heads/master" was loose > before "git gc" ran (as in, there was a file > "$GIT_DIR/refs/heads/master") and "packed-refs" either didn't have a > "refs/heads/master" entry or had a different hash. (Loose refs always > "win" over packed, since ref updates only write loose refs.) It seems totally broken to me that we'd trigger the reference-transaction hook for ref packing. The point of the hook is to track logical updates to the refs. But during ref packing that does not change at all; the value remains the same. So I don't think we should be triggering the hook at all, let alone with confusing values. This snippet shows a simple case that I think is wrong: -- >8 -- git init -q repo cd repo cat >.git/hooks/reference-transaction <<\EOF #!/bin/sh echo >&2 "==> reference-transaction $*" sed 's/^/ /' EOF chmod +x .git/hooks/reference-transaction echo >&2 "running commit..." git commit --allow-empty -qm foo echo >&2 "running pack-refs..." git pack-refs --all --prune -- >8 -- It produces: running commit... ==> reference-transaction prepared 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main ==> reference-transaction committed 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main running pack-refs... ==> reference-transaction prepared 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main ==> reference-transaction committed 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main ==> reference-transaction prepared 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main ==> reference-transaction committed 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main I think the final four invocations should be skipped entirely. They're pointless at best (nothing actually changed), and extremely misleading at worst (they look like the ref ended up deleted!). -Peff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook 2021-11-18 18:08 ` Jeff King @ 2021-12-07 8:24 ` Patrick Steinhardt 0 siblings, 0 replies; 5+ messages in thread From: Patrick Steinhardt @ 2021-12-07 8:24 UTC (permalink / raw) To: Jeff King; +Cc: Bryan Turner, Waleed Khan, git [-- Attachment #1: Type: text/plain, Size: 4393 bytes --] On Thu, Nov 18, 2021 at 01:08:41PM -0500, Jeff King wrote: > [+cc pks] > > On Wed, Nov 17, 2021 at 04:52:46PM -0800, Bryan Turner wrote: > > > > The expected behavior would be that the latest reference transaction > > > hook refers to the state of the references on disk. That is, either > > > `master` should point to 0 (be deleted), or it should have said that > > > `master` pointed to `e197d1`. > > > > > > But if we actually examine `master`, it's set to `e197d1`, just as you > > > would expect. The GC should have been a no-op overall. > > > > One of the subtasks of "git gc" is "git pack-refs". If you inspect in > > more detail, I suspect you'll find that "refs/heads/master" was loose > > before "git gc" ran (as in, there was a file > > "$GIT_DIR/refs/heads/master") and "packed-refs" either didn't have a > > "refs/heads/master" entry or had a different hash. (Loose refs always > > "win" over packed, since ref updates only write loose refs.) > > It seems totally broken to me that we'd trigger the > reference-transaction hook for ref packing. The point of the hook is to > track logical updates to the refs. But during ref packing that does not > change at all; the value remains the same. So I don't think we should be > triggering the hook at all, let alone with confusing values. > > This snippet shows a simple case that I think is wrong: > > -- >8 -- > git init -q repo > cd repo > > cat >.git/hooks/reference-transaction <<\EOF > #!/bin/sh > echo >&2 "==> reference-transaction $*" > sed 's/^/ /' > EOF > chmod +x .git/hooks/reference-transaction > > echo >&2 "running commit..." > git commit --allow-empty -qm foo > echo >&2 "running pack-refs..." > git pack-refs --all --prune > -- >8 -- > > It produces: > > running commit... > ==> reference-transaction prepared > 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD > 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main > ==> reference-transaction committed > 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD > 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main > running pack-refs... > ==> reference-transaction prepared > 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main > ==> reference-transaction committed > 0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main > ==> reference-transaction prepared > 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main > ==> reference-transaction committed > 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main > > I think the final four invocations should be skipped entirely. They're > pointless at best (nothing actually changed), and extremely misleading > at worst (they look like the ref ended up deleted!). > > -Peff Yeah, I agree that this is something that is totally misleading. For what it's worth, we also hit a similar case in production at GitLab, where we use the hook to do voting on ref updates across different nodes. Sometimes we observed different votes on a subset of nodes, and it took me quite some time to figure out that this was dependent on whether a ref was packed or not. We're now filtering out transactions which consist only of force-deletions [1], which are _likely_ to be cleanups of such packed refs. But this is very clearly a hack, and I agree that calling the hook for In the end, these really are special cases of how the "files" backend works and thus are implementation details which shouldn't be exposed to the user at all. With these implementation details exposed, we'll start to see different behaviour of when the hook is executed depending on which ref backend you use, which is even worse compared to the current state where it's at least consistently misleading. As Peff said, the hook should really only track logical changes and not expose any implementation details. Patrick [1]: https://gitlab.com/gitlab-org/gitaly/-/blob/3ef55853e9e161204464868390d97d1a1577042d/internal/gitaly/hook/referencetransaction.go#L58 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-12-07 8:24 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-11-18 0:41 Bug report: Strange behavior with `git gc` and `reference-transaction` hook Waleed Khan 2021-11-18 0:52 ` Bryan Turner [not found] ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com> 2021-11-18 1:28 ` Bryan Turner 2021-11-18 18:08 ` Jeff King 2021-12-07 8:24 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).