git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo
@ 2024-08-02  7:31 Han Young
  2024-08-02  7:31 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
                   ` (10 more replies)
  0 siblings, 11 replies; 67+ messages in thread
From: Han Young @ 2024-08-02  7:31 UTC (permalink / raw)
  To: git; +Cc: Han Young

We use --filter=blob:none to clone our large monorepo.
After a while we started getting reports from engineers complaining 
that their local repository was broken. Upon further investigation, 
we found that broken repositories are missing objects that created 
in that particular local repository. git fsck reports "bad object: xxx".

Here are the minimal steps to recreate issue.
    # create a normal git repo, add one file and push to remote
    $ mkdir full && cd full && git init && touch foo
    $ git add foo && git commit -m "commit 1" && git push

    # partial clone a copy of the repo we just created
    $ cd ..
    $ git clone git@example.org:example/foo.git --filter=blob:none partial

    # create a commit in partial cloned repo and push it to remote
    $ cd partial && echo 'hello' > foo && git commit -a -m "commit 2"
    $ git push

    # run gc in partial repo
    $ git gc --prune=now

    # in normal git repo, create another commit on top of the
    # commit we created in partial repo
    $ cd ../full && git pull && echo ' world' >> foo
    $ git commit -a -m "commit 3" && git push

    # pull from remote in partial repo, and run gc again
    $ cd ../partial && git pull && git gc --prune=now

The last `git gc` will error out on fsck with error message like this:

  error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94
  error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94

Note that disabling commit graph on the partial repo will cause 
`git gc` to exit normally, but will still not solve the 
underlying problem. And in more complex situations, 
disabling commit graph will not avoid the error.

The problem is caused by the wrong result returned by setup_revision
with `--exclude-promisor-objects` enabled.
`git gc` will call `git repack`, which will call `git pack-objects`
twice on a partially cloned repo. The first call to pack-objects 
combines all the promisor packfiles, and the second pack-objects 
command packs all reachable non-promisor objects into a normal packfile.
However, a bug in setup_revision caused some non-promisor objects 
to be mistakenly marked as in promisor packfiles in the second call 
to pack-objects. These incorrectly marked objects are never repacked, 
and were deleted from the object store as a result. In revision.c, 
`process_parents()` recursively marks commit parents as UNINTERESTING 
if the commit itself is UNINTERESTING. `--exclude-promisor-objects` 
is implemented as "iterate all objects in promisor packfiles, 
mark them as UNINTERESTING". So when we find a commit object in 
a promisor packfile, we also set its ancestors as UNINTERESTING, 
whether the ancestor is a promisor object or not. In the example above, 
"commit 2" is a normal commit object, living in a normal packfile, 
but marked as a promisor object and gc'ed from the object store.

Han Young (1):
  revision: don't set parents as uninteresting if exclude promisor

 revision.c               |  2 +-
 t/t0410-partial-clone.sh | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

-- 
2.46.0.rc0.107.gae139121ac.dirty


^ permalink raw reply	[flat|nested] 67+ messages in thread
* [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set
@ 2024-07-19  9:34 Han Young
  2024-07-19  9:34 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
  2024-07-20  5:28 ` [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
  0 siblings, 2 replies; 67+ messages in thread
From: Han Young @ 2024-07-19  9:34 UTC (permalink / raw)
  To: git; +Cc: Han Young

We use --filter=blob:none to clone our large monorepo. After a while we started getting reports from engineers complaining that their local repository was broken. Upon further investigation, we found that broken repositories are missing objects that created in that particular local repository. `git fsck` reports "bad object: xxxx".

Here are the minimal steps to recreate issue.
  # create a normal git repo, add one file and push to remote
  mkdir full && cd full && git init && touch foo && git add foo && git commit -m "commit 1" && git push

  # partial clone a copy of the repo we just created
  cd .. && git clone git@example.org:example/foo.git --filter=blob:none partial

  # create a commit in partial cloned repo and push it to remote
  cd partial && echo 'hello' > foo && git commit -a -m "commit 2" && git push

  # run gc in partial repo
  git gc --prune=now

  # in normal git repo, create another commit on top of the commit we created in partial repo
  cd ../full && git pull && echo ' world' >> foo && git commit -a -m "commit 3" && git push

  # pull from remote in partial repo, and run gc again
  cd ../partial && git pull && git gc --prune=now

The last `git gc` will error out on fsck with error message like this:
  error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94
  error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94

Note that disabling commit graph on the partial repo will cause`git gc` to exit normally, but will still not solve the underlying problem. And in more complex situations, disabling commit graph will not avoid the error.

The problem is caused by the wrong result returned by setup_revision with `--exclude-promisor-objects` enabled. `git gc` will call `git repack`, which will call `git pack-objects` twice on a partially cloned repo. The first call to pack-objects combines all the promisor packfiles, and the second pack-objects command packs all reachable non-promisor objects into a normal packfile. However, a bug in setup_revision caused some non-promisor objects to be mistakenly marked as in promisor packfiles in the second call to pack-objects. These incorrectly marked objects are never repacked, and were deleted from the object store as a result. In revision.c, `process_parents()` recursively marks commit parents as UNINTERESTING if the commit itself is UNINTERESTING. `--exclude-promisor-objects` is implemented as "iterate all objects in promisor packfiles, mark them as UNINTERESTING". So when we find a commit object in a promisor packfile, we also set its ancestors as UNINTERESTING, whether the ancestor is a promisor object or not. In the example above, "commit 2" is a normal commit object, living in a normal packfile, but marked as a promisor object and gc'ed from the object store.

Han Young (1):
  revision: don't set parents as uninteresting if exclude promisor

 revision.c               |  2 +-
 t/t0410-partial-clone.sh | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

-- 
2.46.0.rc0.107.gae139121ac.dirty


^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2024-10-23 17:03 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-02  7:31 [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Han Young
2024-08-02  7:31 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-08-02 16:45   ` Junio C Hamano
2024-08-12 12:34     ` [External] " 韩仰
2024-08-12 16:09       ` Junio C Hamano
2024-08-22  8:28         ` 韩仰
2024-08-13  0:45 ` [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Jonathan Tan
2024-08-13 17:18   ` Jonathan Tan
2024-08-14  4:10     ` Junio C Hamano
2024-08-14 19:30       ` Jonathan Tan
2024-08-23 12:43 ` [WIP v2 0/4] " Han Young
2024-08-23 12:43   ` [WIP v2 1/4] packfile: split promisor objects oidset into two Han Young
2024-08-23 12:43   ` [WIP v2 2/4] revision: add exclude-promisor-pack-objects option Han Young
2024-08-23 12:43   ` [WIP v2 3/4] revision: don't mark commit as UNINTERESTING if --exclude-promisor-objects is set Han Young
2024-08-23 12:43   ` [WIP v2 4/4] repack: use new exclude promisor pack objects option Han Young
2024-09-19 23:47 ` [PATCH 0/2] revision: fix reachable commits being gc'ed in partial repo Calvin Wan
2024-09-19 23:47 ` [PATCH 1/2] packfile: split promisor objects oidset into two Calvin Wan
2024-09-22  6:37   ` Junio C Hamano
2024-09-19 23:47 ` [PATCH 2/2] fetch-pack.c: do not declare local commits as "have" in partial repos Calvin Wan
2024-09-22  6:53   ` Junio C Hamano
2024-09-22 16:41     ` Junio C Hamano
2024-09-23  3:44     ` [External] " 韩仰
2024-09-23 16:21       ` Junio C Hamano
2024-10-02 22:35       ` Calvin Wan
2024-09-25  7:20 ` [PATCH 0/2] repack: pack everything into promisor packfile " Han Young
2024-09-25  7:20   ` [PATCH 1/2] repack: pack everything into packfile Han Young
2024-09-25  7:20   ` [PATCH 2/2] t0410: adapt tests to repack changes Han Young
2024-09-25 15:20   ` [PATCH 0/2] repack: pack everything into promisor packfile in partial repos Phillip Wood
2024-09-25 16:48     ` Junio C Hamano
2024-09-25 17:03   ` Junio C Hamano
2024-10-01 19:17 ` Missing Promisor Objects in Partial Repo Design Doc Calvin Wan
2024-10-01 19:35   ` Junio C Hamano
2024-10-02  2:54   ` Junio C Hamano
2024-10-02  7:57     ` [External] " Han Young
2024-10-08 21:35     ` Calvin Wan
2024-10-09  6:46       ` [External] " Han Young
2024-10-09 18:34         ` Jonathan Tan
2024-10-12  2:05           ` Jonathan Tan
2024-10-12  3:30             ` Han Young
2024-10-14 17:52               ` Jonathan Tan
2024-10-09 18:53     ` Jonathan Tan
2024-10-08  8:13 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Han Young
2024-10-08  8:13   ` [PATCH v2 1/3] repack: pack everything into packfile Han Young
2024-10-08 21:41     ` Calvin Wan
2024-10-08  8:13   ` [PATCH v2 2/3] t0410: adapt tests to repack changes Han Young
2024-10-08  8:13   ` [PATCH v2 3/3] partial-clone: update doc Han Young
2024-10-08 21:57   ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-08 22:43     ` Junio C Hamano
2024-10-09  6:31     ` [External] " Han Young
2024-10-11  8:24 ` [PATCH v3 " Han Young
2024-10-11  8:24   ` [PATCH v3 1/3] repack: pack everything into packfile Han Young
2024-10-11  8:24   ` [PATCH v3 2/3] repack: adapt tests to repack changes Han Young
2024-10-11  8:24   ` [PATCH v3 3/3] partial-clone: update doc Han Young
2024-10-11 18:18   ` [PATCH v3 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-11 18:23     ` Junio C Hamano
2024-10-14  3:25 ` [PATCH v4 " Han Young
2024-10-14  3:25   ` [PATCH v4 1/3] repack: pack everything into packfile Han Young
2024-10-14  3:25   ` [PATCH v4 2/3] t0410: adapt tests to repack changes Han Young
2024-10-14  3:25   ` [PATCH v4 3/3] partial-clone: update doc Han Young
2024-10-21 22:29   ` [WIP 0/3] Repack on fetch Jonathan Tan
2024-10-21 22:29     ` [WIP 1/3] move variable Jonathan Tan
2024-10-21 22:29     ` [WIP 2/3] pack-objects Jonathan Tan
2024-10-21 22:29     ` [WIP 3/3] record local links and call pack-objects Jonathan Tan
2024-10-23  7:00     ` [External] [WIP 0/3] Repack on fetch Han Young
2024-10-23 17:03       ` Jonathan Tan
  -- strict thread matches above, loose matches on Subject: below --
2024-07-19  9:34 [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
2024-07-19  9:34 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-07-20  5:28 ` [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
2024-07-20  5:28   ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).