* [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set
@ 2024-07-19 9:34 Han Young
2024-07-19 9:34 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Han Young @ 2024-07-19 9:34 UTC (permalink / raw)
To: git; +Cc: Han Young
We use --filter=blob:none to clone our large monorepo. After a while we started getting reports from engineers complaining that their local repository was broken. Upon further investigation, we found that broken repositories are missing objects that created in that particular local repository. `git fsck` reports "bad object: xxxx".
Here are the minimal steps to recreate issue.
# create a normal git repo, add one file and push to remote
mkdir full && cd full && git init && touch foo && git add foo && git commit -m "commit 1" && git push
# partial clone a copy of the repo we just created
cd .. && git clone git@example.org:example/foo.git --filter=blob:none partial
# create a commit in partial cloned repo and push it to remote
cd partial && echo 'hello' > foo && git commit -a -m "commit 2" && git push
# run gc in partial repo
git gc --prune=now
# in normal git repo, create another commit on top of the commit we created in partial repo
cd ../full && git pull && echo ' world' >> foo && git commit -a -m "commit 3" && git push
# pull from remote in partial repo, and run gc again
cd ../partial && git pull && git gc --prune=now
The last `git gc` will error out on fsck with error message like this:
error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94
error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94
Note that disabling commit graph on the partial repo will cause`git gc` to exit normally, but will still not solve the underlying problem. And in more complex situations, disabling commit graph will not avoid the error.
The problem is caused by the wrong result returned by setup_revision with `--exclude-promisor-objects` enabled. `git gc` will call `git repack`, which will call `git pack-objects` twice on a partially cloned repo. The first call to pack-objects combines all the promisor packfiles, and the second pack-objects command packs all reachable non-promisor objects into a normal packfile. However, a bug in setup_revision caused some non-promisor objects to be mistakenly marked as in promisor packfiles in the second call to pack-objects. These incorrectly marked objects are never repacked, and were deleted from the object store as a result. In revision.c, `process_parents()` recursively marks commit parents as UNINTERESTING if the commit itself is UNINTERESTING. `--exclude-promisor-objects` is implemented as "iterate all objects in promisor packfiles, mark them as UNINTERESTING". So when we find a commit object in a promisor packfile, we also set its ancestors as UNINTERESTING, whether the ancestor is a promisor object or not. In the example above, "commit 2" is a normal commit object, living in a normal packfile, but marked as a promisor object and gc'ed from the object store.
Han Young (1):
revision: don't set parents as uninteresting if exclude promisor
revision.c | 2 +-
t/t0410-partial-clone.sh | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 2 deletions(-)
--
2.46.0.rc0.107.gae139121ac.dirty
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects
2024-07-19 9:34 [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
@ 2024-07-19 9:34 ` Han Young
2024-07-19 15:07 ` [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Junio C Hamano
2024-07-20 5:28 ` Han Young
2 siblings, 0 replies; 5+ messages in thread
From: Han Young @ 2024-07-19 9:34 UTC (permalink / raw)
To: git; +Cc: Han Young, C O Xing Xin
In revision.c, `process_parents()` recursively marks commit parents
as UNINTERESTING if the commit itself is UNINTERESTING.
`--exclude-promisor-objects` is implemented as
"iterate all objects in promisor packfiles, mark them as UNINTERESTING".
So when we find a commit object in a promisor packfile, we also set its ancestors
as UNINTERESTING, whether the ancestor is a promisor object or not.
This causes normal objects to be falsely marked as promisor objects
and removed by `git repack`.
Stop setting the parents of uninteresting commits' to UNINTERESTING
when we exclude promisor objects, and add a test to prevent regression.
Note that this change would cause rev-list to report incorrect results if
`--exclude-promisor-objects` is used with other revision walk filters. But
`--exclude-promisor-objects` is for internal use only, so we don't have to worry
about users using other filters with `--exclude-promisor-objects`.
Signed-off-by: Han Young <hanyang.tony@bytedance.com>
Helped-by: C O Xing Xin <xingxin.xx@bytedance.com>
---
revision.c | 2 +-
t/t0410-partial-clone.sh | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/revision.c b/revision.c
index 1c0192f522..eacb0c909d 100644
--- a/revision.c
+++ b/revision.c
@@ -1164,7 +1164,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
* wasn't uninteresting), in which case we need
* to mark its parents recursively too..
*/
- if (commit->object.flags & UNINTERESTING) {
+ if (!revs->exclude_promisor_objects && commit->object.flags & UNINTERESTING) {
while (parent) {
struct commit *p = parent->item;
parent = parent->next;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 2c30c86e7b..4ee3437685 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -22,6 +22,17 @@ pack_as_from_promisor () {
echo $HASH
}
+pack_commit() {
+ HASH=$(echo $1 | git -C repo pack-objects .git/objects/pack/pack --missing=allow-any) &&
+ delete_object repo $1 &&
+ echo $HASH
+}
+
+pack_commit_as_promisor() {
+ HASH=$(pack_commit $1) &&
+ >repo/.git/objects/pack/pack-$HASH.promisor
+}
+
promise_and_delete () {
HASH=$(git -C repo rev-parse "$1") &&
git -C repo tag -a -m message my_annotated_tag "$HASH" &&
@@ -369,7 +380,16 @@ test_expect_success 'missing tree objects with --missing=allow-promisor and --ex
git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
test_must_be_empty rev_list_err &&
# 3 commits, no blobs or trees
- test_line_count = 3 objs
+ test_line_count = 3 objs &&
+
+ # Pack all three commits into individual packs, and mark the last commit pack as promisor
+ pack_commit_as_promisor $(git -C repo rev-parse baz) &&
+ pack_commit $(git -C repo rev-parse bar) &&
+ pack_commit $(git -C repo rev-parse foo) &&
+ git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+ test_must_be_empty rev_list_err &&
+ # commits foo and bar should remain
+ test_line_count = 2 objs
'
test_expect_success 'missing non-root tree object and rev-list' '
--
2.46.0.rc0.107.gae139121ac.dirty
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set
2024-07-19 9:34 [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
2024-07-19 9:34 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
@ 2024-07-19 15:07 ` Junio C Hamano
2024-07-20 5:28 ` Han Young
2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2024-07-19 15:07 UTC (permalink / raw)
To: Han Young; +Cc: git
Han Young <hanyang.tony@bytedance.com> writes:
> We use --filter=blob:none to clone our large monorepo. After a while we started getting reports from engineers complaining that their local repository was broken. Upon further investigation, we found that broken repositories are missing objects that created in that particular local repository. `git fsck` reports "bad object: xxxx".
Please wrap overly long lines. Unless they are wrapped down to
something that can be quoted in e-mail responses a few times and
still comfortably fit on a terminal line, e.g. 68-72 columns, they
are just unreadable mess.
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set
2024-07-19 9:34 [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
2024-07-19 9:34 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-07-19 15:07 ` [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Junio C Hamano
@ 2024-07-20 5:28 ` Han Young
2024-07-20 5:28 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2 siblings, 1 reply; 5+ messages in thread
From: Han Young @ 2024-07-20 5:28 UTC (permalink / raw)
To: git; +Cc: Han Young
We use --filter=blob:none to clone our large monorepo.
After a while we started getting reports from engineers complaining
that their local repository was broken. Upon further investigation,
we found that broken repositories are missing objects that created
in that particular local repository. git fsck reports "bad object: xxx".
Here are the minimal steps to recreate issue.
# create a normal git repo, add one file and push to remote
mkdir full && cd full && git init && touch foo
git add foo && git commit -m "commit 1" && git push
# partial clone a copy of the repo we just created
cd ..
git clone git@example.org:example/foo.git --filter=blob:none partial
# create a commit in partial cloned repo and push it to remote
cd partial && echo 'hello' > foo && git commit -a -m "commit 2"
git push
# run gc in partial repo
git gc --prune=now
# in normal git repo, create another commit on top of the
# commit we created in partial repo
cd ../full && git pull && echo ' world' >> foo
git commit -a -m "commit 3" && git push
# pull from remote in partial repo, and run gc again
cd ../partial && git pull && git gc --prune=now
The last `git gc` will error out on fsck with error message like this:
error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94
error: Could not read d3fbfea9e448461c2b72a79a95a220ae10defd94
Note that disabling commit graph on the partial repo will cause
`git gc` to exit normally, but will still not solve the
underlying problem. And in more complex situations,
disabling commit graph will not avoid the error.
The problem is caused by the wrong result returned by setup_revision
with `--exclude-promisor-objects` enabled.
`git gc` will call `git repack`, which will call `git pack-objects`
twice on a partially cloned repo. The first call to pack-objects
combines all the promisor packfiles, and the second pack-objects
command packs all reachable non-promisor objects into a normal packfile.
However, a bug in setup_revision caused some non-promisor objects
to be mistakenly marked as in promisor packfiles in the second call
to pack-objects. These incorrectly marked objects are never repacked,
and were deleted from the object store as a result. In revision.c,
`process_parents()` recursively marks commit parents as UNINTERESTING
if the commit itself is UNINTERESTING. `--exclude-promisor-objects`
is implemented as "iterate all objects in promisor packfiles,
mark them as UNINTERESTING". So when we find a commit object in
a promisor packfile, we also set its ancestors as UNINTERESTING,
whether the ancestor is a promisor object or not. In the example above,
"commit 2" is a normal commit object, living in a normal packfile,
but marked as a promisor object and gc'ed from the object store.
Han Young (1):
revision: don't set parents as uninteresting if exclude promisor
revision.c | 2 +-
t/t0410-partial-clone.sh | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 2 deletions(-)
--
2.46.0.rc0.107.gae139121ac.dirty
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects
2024-07-20 5:28 ` Han Young
@ 2024-07-20 5:28 ` Han Young
0 siblings, 0 replies; 5+ messages in thread
From: Han Young @ 2024-07-20 5:28 UTC (permalink / raw)
To: git; +Cc: Han Young, C O Xing Xin
In revision.c, `process_parents()` recursively marks commit parents
as UNINTERESTING if the commit itself is UNINTERESTING.
`--exclude-promisor-objects` is implemented as
"iterate all objects in promisor packfiles, mark them as UNINTERESTING".
So when we find a commit object in a promisor packfile, we also set its ancestors
as UNINTERESTING, whether the ancestor is a promisor object or not.
This causes normal objects to be falsely marked as promisor objects
and removed by `git repack`.
Stop setting the parents of uninteresting commits' to UNINTERESTING
when we exclude promisor objects, and add a test to prevent regression.
Note that this change would cause rev-list to report incorrect results if
`--exclude-promisor-objects` is used with other revision walk filters. But
`--exclude-promisor-objects` is for internal use only, so we don't have to worry
about users using other filters with `--exclude-promisor-objects`.
Signed-off-by: Han Young <hanyang.tony@bytedance.com>
Helped-by: C O Xing Xin <xingxin.xx@bytedance.com>
---
revision.c | 2 +-
t/t0410-partial-clone.sh | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/revision.c b/revision.c
index 1c0192f522..eacb0c909d 100644
--- a/revision.c
+++ b/revision.c
@@ -1164,7 +1164,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
* wasn't uninteresting), in which case we need
* to mark its parents recursively too..
*/
- if (commit->object.flags & UNINTERESTING) {
+ if (!revs->exclude_promisor_objects && commit->object.flags & UNINTERESTING) {
while (parent) {
struct commit *p = parent->item;
parent = parent->next;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 2c30c86e7b..4ee3437685 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -22,6 +22,17 @@ pack_as_from_promisor () {
echo $HASH
}
+pack_commit() {
+ HASH=$(echo $1 | git -C repo pack-objects .git/objects/pack/pack --missing=allow-any) &&
+ delete_object repo $1 &&
+ echo $HASH
+}
+
+pack_commit_as_promisor() {
+ HASH=$(pack_commit $1) &&
+ >repo/.git/objects/pack/pack-$HASH.promisor
+}
+
promise_and_delete () {
HASH=$(git -C repo rev-parse "$1") &&
git -C repo tag -a -m message my_annotated_tag "$HASH" &&
@@ -369,7 +380,16 @@ test_expect_success 'missing tree objects with --missing=allow-promisor and --ex
git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
test_must_be_empty rev_list_err &&
# 3 commits, no blobs or trees
- test_line_count = 3 objs
+ test_line_count = 3 objs &&
+
+ # Pack all three commits into individual packs, and mark the last commit pack as promisor
+ pack_commit_as_promisor $(git -C repo rev-parse baz) &&
+ pack_commit $(git -C repo rev-parse bar) &&
+ pack_commit $(git -C repo rev-parse foo) &&
+ git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+ test_must_be_empty rev_list_err &&
+ # commits foo and bar should remain
+ test_line_count = 2 objs
'
test_expect_success 'missing non-root tree object and rev-list' '
--
2.46.0.rc0.107.gae139121ac.dirty
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-20 5:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-19 9:34 [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Han Young
2024-07-19 9:34 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-07-19 15:07 ` [PATCH 0/1] Fix bug in revision walk if --exclude-promisor-objects is set Junio C Hamano
2024-07-20 5:28 ` Han Young
2024-07-20 5:28 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).