[PATCH] fsck: do not loop infinitely when processing packs

public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] fsck: do not loop infinitely when processing packs
@ 2026-02-22 18:37 brian m. carlson
  2026-02-22 22:39 ` Junio C Hamano
  2026-02-23  8:43 ` Patrick Steinhardt
  0 siblings, 2 replies; 14+ messages in thread
From: brian m. carlson @ 2026-02-22 18:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt

When we iterate over our packfiles in the fsck code, we do so twice.
The first time, we count the number of objects in all of the packs
together and later on, we iterate a second time, processing each pack
and verifying its integrity.

This would normally work fine, but if we have two packs and we're
processing the second, the verification process will open the pack to
read from it, which will place it at the beginning of the most recently
used list.  Since this same list is used for iteration, the pack we most
recently processed before this will then be behind the current pack in
the linked list, so when we next process the list, we will go back to
the first pack again and then loop forever.  This also makes our
progress indicator loop up to many thousands of percent, which is not
only nonsensical, but a clear indication that something has gone wrong.

Solve this by skipping our MRU updates when we're iterating over
packfiles, which avoids the reordering that causes problems.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
I realize that t1050 may seem like a bizarre place to put this test.
However, I was debugging my sha256-interop branch and why the final test
calling `git fsck` was failing, so I placed a `git fsck` earlier in the
test to double-check and discovered the problem.  Since we already have
a natural testcase here, I thought I'd just place the test where we
already know it will trigger the problem.

 packfile.h       | 16 ++++++++++++++--
 t/t1050-large.sh |  4 ++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/packfile.h b/packfile.h
index acc5c55ad5..086d98c1a0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -183,6 +183,7 @@ struct packfile_list_entry *packfile_store_get_packs(struct packfile_store *stor
 struct repo_for_each_pack_data {
 	struct odb_source *source;
 	struct packfile_list_entry *entry;
+	struct repository *repo;
 };
 
 static inline struct repo_for_each_pack_data repo_for_eack_pack_data_init(struct repository *repo)
@@ -191,8 +192,13 @@ static inline struct repo_for_each_pack_data repo_for_eack_pack_data_init(struct
 
 	odb_prepare_alternates(repo->objects);
 
+	data.repo = repo;
+
 	for (struct odb_source *source = repo->objects->sources; source; source = source->next) {
-		struct packfile_list_entry *entry = packfile_store_get_packs(source->packfiles);
+		struct packfile_list_entry *entry;
+
+		source->packfiles->skip_mru_updates = true;
+		entry = packfile_store_get_packs(source->packfiles);
 		if (!entry)
 			continue;
 		data.source = source;
@@ -212,7 +218,10 @@ static inline void repo_for_each_pack_data_next(struct repo_for_each_pack_data *
 		return;
 
 	for (source = data->source->next; source; source = source->next) {
-		struct packfile_list_entry *entry = packfile_store_get_packs(source->packfiles);
+		struct packfile_list_entry *entry;
+
+		source->packfiles->skip_mru_updates = true;
+		entry = packfile_store_get_packs(source->packfiles);
 		if (!entry)
 			continue;
 		data->source = source;
@@ -220,6 +229,9 @@ static inline void repo_for_each_pack_data_next(struct repo_for_each_pack_data *
 		return;
 	}
 
+	for (struct odb_source *source = data->repo->objects->sources; source; source = source->next)
+		source->packfiles->skip_mru_updates = false;
+
 	data->source = NULL;
 	data->entry = NULL;
 }
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 5be273611a..75e75e627c 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -160,6 +160,10 @@ test_expect_success 'hash-object' '
 	git hash-object large1
 '
 
+test_expect_success 'fsck does not loop forever' '
+	git fsck
+'
+
 test_expect_success 'cat-file a large file' '
 	git cat-file blob :large1 >/dev/null
 '

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-22 18:37 [PATCH] fsck: do not loop infinitely when processing packs brian m. carlson
@ 2026-02-22 22:39 ` Junio C Hamano
  2026-02-22 23:07   ` brian m. carlson
  2026-02-23  8:43 ` Patrick Steinhardt
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2026-02-22 22:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Patrick Steinhardt

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 5be273611a..75e75e627c 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -160,6 +160,10 @@ test_expect_success 'hash-object' '
>  	git hash-object large1
>  '
>  
> +test_expect_success 'fsck does not loop forever' '
> +	git fsck
> +'
> +
>  test_expect_success 'cat-file a large file' '
>  	git cat-file blob :large1 >/dev/null
>  '

Wow, this is a fun test ;-).

Thanks.  Will queue.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-22 22:39 ` Junio C Hamano
@ 2026-02-22 23:07   ` brian m. carlson
  2026-02-23  7:12     ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2026-02-22 23:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]

On 2026-02-22 at 22:39:57, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> > index 5be273611a..75e75e627c 100755
> > --- a/t/t1050-large.sh
> > +++ b/t/t1050-large.sh
> > @@ -160,6 +160,10 @@ test_expect_success 'hash-object' '
> >  	git hash-object large1
> >  '
> >  
> > +test_expect_success 'fsck does not loop forever' '
> > +	git fsck
> > +'
> > +
> >  test_expect_success 'cat-file a large file' '
> >  	git cat-file blob :large1 >/dev/null
> >  '
> 
> Wow, this is a fun test ;-).
> 
> Thanks.  Will queue.

I noticed that the code here seems to have come in with the 2.53 cycle,
so we may want to cherry-pick it to `maint` at some point if it seems
like the problem occurs often.  From what I can tell, it only occurs
when one explicitly invokes `git fsck`[0] and not on transfer, so it
shouldn't cause a DoS against server implementations.

Of course, we should wait for Patrick, who authored this code, to chime
in and lend his expertise here.  I must admit I'm not very familiar with
this area, although I had recently seen the MRU code when working on
pack index v3 (and then I thought, "is this actually the problem?").

[0] The code I saw is the `if (check_full)` branch in `cmd_fsck`, which
is obviously only invoked by the `fsck` command itself.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-22 23:07   ` brian m. carlson
@ 2026-02-23  7:12     ` Jeff King
  2026-02-23  8:46       ` Patrick Steinhardt
  2026-02-23 15:49       ` Junio C Hamano
  0 siblings, 2 replies; 14+ messages in thread
From: Jeff King @ 2026-02-23  7:12 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git, Patrick Steinhardt

On Sun, Feb 22, 2026 at 11:07:41PM +0000, brian m. carlson wrote:

> I noticed that the code here seems to have come in with the 2.53 cycle,
> so we may want to cherry-pick it to `maint` at some point if it seems
> like the problem occurs often.  From what I can tell, it only occurs
> when one explicitly invokes `git fsck`[0] and not on transfer, so it
> shouldn't cause a DoS against server implementations.
> 
> Of course, we should wait for Patrick, who authored this code, to chime
> in and lend his expertise here.  I must admit I'm not very familiar with
> this area, although I had recently seen the MRU code when working on
> pack index v3 (and then I thought, "is this actually the problem?").

The problem seems to bisect to c31bad4f7d (packfile: track packs via the
MRU list exclusively, 2025-10-30), which is not terribly surprising, as
it was one of the known risks of collapsing the two lists into one.

Your solution is using the tool provided by that commit for its edge
case:

    Note that there is one important edge case: `for_each_packed_object()`
    uses the MRU list to iterate through packs, and then it lists each
    object in those packs. This would have the effect that we now sort the
    current pack towards the front, thus modifying the list of packfiles we
    are iterating over, with the consequence that we'll see an infinite
    loop. This edge case is worked around by introducing a new field that
    allows us to skip updating the MRU.

So in that sense it is the right thing. But it really makes me wonder if
we are going back to keeping two lists (one MRU and one in some stable
order). Or at the very least providing _some_ iteration method that is
guaranteed to be stable (whether a linked list or a function), so that
iterating code is not subject to this subtle dependency by default.

Having to identify each potential spot and set a "btw, don't switch the
pack list order!" flag seems error-prone. And also loses efficiency when
you are iterating a pack and accessing objects in it (since we can't
push that pack to the front of the MRU then, even though we'd expect
there to be high locality with our iteration).

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-22 18:37 [PATCH] fsck: do not loop infinitely when processing packs brian m. carlson
  2026-02-22 22:39 ` Junio C Hamano
@ 2026-02-23  8:43 ` Patrick Steinhardt
  2026-02-23  9:27   ` Jeff King
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2026-02-23  8:43 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano

On Sun, Feb 22, 2026 at 06:37:10PM +0000, brian m. carlson wrote:
> When we iterate over our packfiles in the fsck code, we do so twice.
> The first time, we count the number of objects in all of the packs
> together and later on, we iterate a second time, processing each pack
> and verifying its integrity.
> 
> This would normally work fine, but if we have two packs and we're
> processing the second, the verification process will open the pack to
> read from it, which will place it at the beginning of the most recently
> used list.  Since this same list is used for iteration, the pack we most
> recently processed before this will then be behind the current pack in
> the linked list, so when we next process the list, we will go back to
> the first pack again and then loop forever.  This also makes our
> progress indicator loop up to many thousands of percent, which is not
> only nonsensical, but a clear indication that something has gone wrong.
> 
> Solve this by skipping our MRU updates when we're iterating over
> packfiles, which avoids the reordering that causes problems.

Right, this makes sense. We know that we cannot modify the list of packs
in case we're iterating through them, so `repo_for_each_pack()` should
indeed skip the MRU updates.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> I realize that t1050 may seem like a bizarre place to put this test.
> However, I was debugging my sha256-interop branch and why the final test
> calling `git fsck` was failing, so I placed a `git fsck` earlier in the
> test to double-check and discovered the problem.  Since we already have
> a natural testcase here, I thought I'd just place the test where we
> already know it will trigger the problem.

Makes me wonder though why none of the tests t1450-fsck exhibit this
pattern. I cannot imagine that there is no test there that doesn't have
multiple packs. *goes checking* We actually might not, but when trying
to come up with a minimum reproducer I failed at first.

This is because ultimately the root cause seems to be a bit more
complex: we don't only care about there being multiple packfiles. We
also care about "core.bigFileThreshold".

Typically, we don't execute `find_pack_entry()` at all when verifying
packfiles as we iterate through objects in packfile order. We thus don't
have to look up objects via their object ID, but instead we do so by
using their packfile offset. And this mechanism will not end up in
`find_pack_entry()`, and thus we wouldn't update the MRU.

But there's an exception: when the size of the object that is to be
checked exceeds "core.bigFileThreshold" we won't read it directly, but
we'll instead use `stream_object_signature()`, which eventually ends up
calling `odb_read_stream_open()`. And that of course _will_ call
`find_pack_entry()`, as we're now in the mode where we search by object
ID, not by offset. And consequently, we'll update the MRU in this call
path.

With that knowledge it's kind of easy to reproduce the issue: we simply
need two packfiles, and each of them must contain at least one blob that
is larger than "core.bigFileThreshold".

Now I agree that the below proposed fix would be a good change to make
the code more solid while we still have `repo_for_each_pack()` (I plan
to eventually get rid of it). But arguably, the above logic is kind of
broken regardless of this: we are asked to verify objects in the current
pack, but we may end up verifying the object via a different pack. So if
the same object were to exist in multiple packs, we might end up only
verifying one of its instances.

I've got a couple patches in the making that'll fix this.

> diff --git a/packfile.h b/packfile.h
> index acc5c55ad5..086d98c1a0 100644
> --- a/packfile.h
> +++ b/packfile.h
> @@ -191,8 +192,13 @@ static inline struct repo_for_each_pack_data repo_for_eack_pack_data_init(struct
>  
>  	odb_prepare_alternates(repo->objects);
>  
> +	data.repo = repo;
> +
>  	for (struct odb_source *source = repo->objects->sources; source; source = source->next) {
> -		struct packfile_list_entry *entry = packfile_store_get_packs(source->packfiles);
> +		struct packfile_list_entry *entry;
> +
> +		source->packfiles->skip_mru_updates = true;
> +		entry = packfile_store_get_packs(source->packfiles);
>  		if (!entry)
>  			continue;
>  		data.source = source;
> @@ -212,7 +218,10 @@ static inline void repo_for_each_pack_data_next(struct repo_for_each_pack_data *
>  		return;
>  
>  	for (source = data->source->next; source; source = source->next) {
> -		struct packfile_list_entry *entry = packfile_store_get_packs(source->packfiles);
> +		struct packfile_list_entry *entry;
> +
> +		source->packfiles->skip_mru_updates = true;
> +		entry = packfile_store_get_packs(source->packfiles);
>  		if (!entry)
>  			continue;
>  		data->source = source;
> @@ -220,6 +229,9 @@ static inline void repo_for_each_pack_data_next(struct repo_for_each_pack_data *
>  		return;
>  	}
>  
> +	for (struct odb_source *source = data->repo->objects->sources; source; source = source->next)
> +		source->packfiles->skip_mru_updates = false;
> +
>  	data->source = NULL;
>  	data->entry = NULL;
>  }

I still think that this hardening here would be worth it, but it has the
problem that we won't reset the value in case the caller breaks out of
the loop by themselves. I don't really have a good idea for how to fix
this, except by turning it into a function with a callback.

Patrick

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  7:12     ` Jeff King
@ 2026-02-23  8:46       ` Patrick Steinhardt
  2026-02-23  9:25         ` Jeff King
  2026-02-23 15:49       ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Patrick Steinhardt @ 2026-02-23  8:46 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Junio C Hamano, git

On Mon, Feb 23, 2026 at 02:12:15AM -0500, Jeff King wrote:
> On Sun, Feb 22, 2026 at 11:07:41PM +0000, brian m. carlson wrote:
> 
> > I noticed that the code here seems to have come in with the 2.53 cycle,
> > so we may want to cherry-pick it to `maint` at some point if it seems
> > like the problem occurs often.  From what I can tell, it only occurs
> > when one explicitly invokes `git fsck`[0] and not on transfer, so it
> > shouldn't cause a DoS against server implementations.
> > 
> > Of course, we should wait for Patrick, who authored this code, to chime
> > in and lend his expertise here.  I must admit I'm not very familiar with
> > this area, although I had recently seen the MRU code when working on
> > pack index v3 (and then I thought, "is this actually the problem?").
> 
> The problem seems to bisect to c31bad4f7d (packfile: track packs via the
> MRU list exclusively, 2025-10-30), which is not terribly surprising, as
> it was one of the known risks of collapsing the two lists into one.
> 
> Your solution is using the tool provided by that commit for its edge
> case:
> 
>     Note that there is one important edge case: `for_each_packed_object()`
>     uses the MRU list to iterate through packs, and then it lists each
>     object in those packs. This would have the effect that we now sort the
>     current pack towards the front, thus modifying the list of packfiles we
>     are iterating over, with the consequence that we'll see an infinite
>     loop. This edge case is worked around by introducing a new field that
>     allows us to skip updating the MRU.
> 
> So in that sense it is the right thing. But it really makes me wonder if
> we are going back to keeping two lists (one MRU and one in some stable
> order). Or at the very least providing _some_ iteration method that is
> guaranteed to be stable (whether a linked list or a function), so that
> iterating code is not subject to this subtle dependency by default.
> 
> Having to identify each potential spot and set a "btw, don't switch the
> pack list order!" flag seems error-prone. And also loses efficiency when
> you are iterating a pack and accessing objects in it (since we can't
> push that pack to the front of the MRU then, even though we'd expect
> there to be high locality with our iteration).

As pointed out in [1] the root cause is actually something different,
and we merely expose this now with the MRU-based iteration. But I
wouldn't mind if we eventually switched back to maintaining two lists,
or finding a different way for how to maintain the iteration order.

Patrick

[1]: <aZwTPfmyrFp-QAPq@pks.im>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  8:46       ` Patrick Steinhardt
@ 2026-02-23  9:25         ` Jeff King
  2026-02-23  9:36           ` Patrick Steinhardt
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2026-02-23  9:25 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: brian m. carlson, Junio C Hamano, git

On Mon, Feb 23, 2026 at 09:46:00AM +0100, Patrick Steinhardt wrote:

> As pointed out in [1] the root cause is actually something different,
> and we merely expose this now with the MRU-based iteration. But I
> wouldn't mind if we eventually switched back to maintaining two lists,
> or finding a different way for how to maintain the iteration order.

Maybe I don't understand what you're saying, but isn't the root cause
the same?

Code is iterating the list, and then during that iteration calls
find_pack_entry(). The fact that fsck only calls find_pack_entry() in
some subset of cases is immaterial, I'd think. The risk is always there
when iterating now.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  8:43 ` Patrick Steinhardt
@ 2026-02-23  9:27   ` Jeff King
  2026-02-23  9:53   ` Patrick Steinhardt
  2026-02-24 22:23   ` brian m. carlson
  2 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2026-02-23  9:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: brian m. carlson, git, Junio C Hamano

On Mon, Feb 23, 2026 at 09:43:41AM +0100, Patrick Steinhardt wrote:

> This is because ultimately the root cause seems to be a bit more
> complex: we don't only care about there being multiple packfiles. We
> also care about "core.bigFileThreshold".
> 
> Typically, we don't execute `find_pack_entry()` at all when verifying
> packfiles as we iterate through objects in packfile order. We thus don't
> have to look up objects via their object ID, but instead we do so by
> using their packfile offset. And this mechanism will not end up in
> `find_pack_entry()`, and thus we wouldn't update the MRU.
> 
> But there's an exception: when the size of the object that is to be
> checked exceeds "core.bigFileThreshold" we won't read it directly, but
> we'll instead use `stream_object_signature()`, which eventually ends up
> calling `odb_read_stream_open()`. And that of course _will_ call
> `find_pack_entry()`, as we're now in the mode where we search by object
> ID, not by offset. And consequently, we'll update the MRU in this call
> path.

Good find.

> With that knowledge it's kind of easy to reproduce the issue: we simply
> need two packfiles, and each of them must contain at least one blob that
> is larger than "core.bigFileThreshold".
> 
> Now I agree that the below proposed fix would be a good change to make
> the code more solid while we still have `repo_for_each_pack()` (I plan
> to eventually get rid of it). But arguably, the above logic is kind of
> broken regardless of this: we are asked to verify objects in the current
> pack, but we may end up verifying the object via a different pack. So if
> the same object were to exist in multiple packs, we might end up only
> verifying one of its instances.

Yeah, that was my immediate response after reading your analysis above
(that fsck should not be doing find_pack_entry() in the first place
here).

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  9:25         ` Jeff King
@ 2026-02-23  9:36           ` Patrick Steinhardt
  2026-02-23  9:46             ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Patrick Steinhardt @ 2026-02-23  9:36 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Junio C Hamano, git

On Mon, Feb 23, 2026 at 04:25:23AM -0500, Jeff King wrote:
> On Mon, Feb 23, 2026 at 09:46:00AM +0100, Patrick Steinhardt wrote:
> 
> > As pointed out in [1] the root cause is actually something different,
> > and we merely expose this now with the MRU-based iteration. But I
> > wouldn't mind if we eventually switched back to maintaining two lists,
> > or finding a different way for how to maintain the iteration order.
> 
> Maybe I don't understand what you're saying, but isn't the root cause
> the same?
> 
> Code is iterating the list, and then during that iteration calls
> find_pack_entry(). The fact that fsck only calls find_pack_entry() in
> some subset of cases is immaterial, I'd think. The risk is always there
> when iterating now.

It is, true. All I'm saying is that the problem runs a bit deeper, and
that fixing the actual root cause would also fix the issue reported by
brian.

So we might want to have another look at hardening packfile iteration
either by reinstating the second list for iteration or by extending
`repo_for_each_pack()` to also set the `skip_updating_mru` bit. Over
time though I'd rather get rid of `repo_for_each_pack()`, and once that
is the case and packed object iteration is neatly encapsulated in the
backend the risk of only having the MRU will be significantly reduced.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  9:36           ` Patrick Steinhardt
@ 2026-02-23  9:46             ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2026-02-23  9:46 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: brian m. carlson, Junio C Hamano, git

On Mon, Feb 23, 2026 at 10:36:25AM +0100, Patrick Steinhardt wrote:

> On Mon, Feb 23, 2026 at 04:25:23AM -0500, Jeff King wrote:
> > On Mon, Feb 23, 2026 at 09:46:00AM +0100, Patrick Steinhardt wrote:
> > 
> > > As pointed out in [1] the root cause is actually something different,
> > > and we merely expose this now with the MRU-based iteration. But I
> > > wouldn't mind if we eventually switched back to maintaining two lists,
> > > or finding a different way for how to maintain the iteration order.
> > 
> > Maybe I don't understand what you're saying, but isn't the root cause
> > the same?
> > 
> > Code is iterating the list, and then during that iteration calls
> > find_pack_entry(). The fact that fsck only calls find_pack_entry() in
> > some subset of cases is immaterial, I'd think. The risk is always there
> > when iterating now.
> 
> It is, true. All I'm saying is that the problem runs a bit deeper, and
> that fixing the actual root cause would also fix the issue reported by
> brian.

Ah, OK, after reading your other email again, I see what you're saying.
The root cause (for you) is that it is unexpected for fsck to call
find_pack_entry() at all in this case. Which I agree is wrong, but I
just wouldn't haven't called it the "root". ;)

But I think we are both on the same page that there are two problems
worth looking at (fsck should not be looking up the object again, and we
should make iteration less susceptible to re-ordering bugs).

> So we might want to have another look at hardening packfile iteration
> either by reinstating the second list for iteration or by extending
> `repo_for_each_pack()` to also set the `skip_updating_mru` bit. Over
> time though I'd rather get rid of `repo_for_each_pack()`, and once that
> is the case and packed object iteration is neatly encapsulated in the
> backend the risk of only having the MRU will be significantly reduced.

OK. Of the two short term solutions, I prefer the double-list. IMHO
skip_updating_mru is a bit of a hack in the first place, because it
misses opportunities to update the MRU.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  8:43 ` Patrick Steinhardt
  2026-02-23  9:27   ` Jeff King
@ 2026-02-23  9:53   ` Patrick Steinhardt
  2026-02-24 22:23   ` brian m. carlson
  2 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2026-02-23  9:53 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano

On Mon, Feb 23, 2026 at 09:43:41AM +0100, Patrick Steinhardt wrote:
> On Sun, Feb 22, 2026 at 06:37:10PM +0000, brian m. carlson wrote:
> > When we iterate over our packfiles in the fsck code, we do so twice.
> > The first time, we count the number of objects in all of the packs
> > together and later on, we iterate a second time, processing each pack
> > and verifying its integrity.
> > 
> > This would normally work fine, but if we have two packs and we're
> > processing the second, the verification process will open the pack to
> > read from it, which will place it at the beginning of the most recently
> > used list.  Since this same list is used for iteration, the pack we most
> > recently processed before this will then be behind the current pack in
> > the linked list, so when we next process the list, we will go back to
> > the first pack again and then loop forever.  This also makes our
> > progress indicator loop up to many thousands of percent, which is not
> > only nonsensical, but a clear indication that something has gone wrong.
> > 
> > Solve this by skipping our MRU updates when we're iterating over
> > packfiles, which avoids the reordering that causes problems.
> 
> Right, this makes sense. We know that we cannot modify the list of packs
> in case we're iterating through them, so `repo_for_each_pack()` should
> indeed skip the MRU updates.
> 
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> > I realize that t1050 may seem like a bizarre place to put this test.
> > However, I was debugging my sha256-interop branch and why the final test
> > calling `git fsck` was failing, so I placed a `git fsck` earlier in the
> > test to double-check and discovered the problem.  Since we already have
> > a natural testcase here, I thought I'd just place the test where we
> > already know it will trigger the problem.
> 
> Makes me wonder though why none of the tests t1450-fsck exhibit this
> pattern. I cannot imagine that there is no test there that doesn't have
> multiple packs. *goes checking* We actually might not, but when trying
> to come up with a minimum reproducer I failed at first.
> 
> This is because ultimately the root cause seems to be a bit more
> complex: we don't only care about there being multiple packfiles. We
> also care about "core.bigFileThreshold".
> 
> Typically, we don't execute `find_pack_entry()` at all when verifying
> packfiles as we iterate through objects in packfile order. We thus don't
> have to look up objects via their object ID, but instead we do so by
> using their packfile offset. And this mechanism will not end up in
> `find_pack_entry()`, and thus we wouldn't update the MRU.
> 
> But there's an exception: when the size of the object that is to be
> checked exceeds "core.bigFileThreshold" we won't read it directly, but
> we'll instead use `stream_object_signature()`, which eventually ends up
> calling `odb_read_stream_open()`. And that of course _will_ call
> `find_pack_entry()`, as we're now in the mode where we search by object
> ID, not by offset. And consequently, we'll update the MRU in this call
> path.
> 
> With that knowledge it's kind of easy to reproduce the issue: we simply
> need two packfiles, and each of them must contain at least one blob that
> is larger than "core.bigFileThreshold".
> 
> Now I agree that the below proposed fix would be a good change to make
> the code more solid while we still have `repo_for_each_pack()` (I plan
> to eventually get rid of it). But arguably, the above logic is kind of
> broken regardless of this: we are asked to verify objects in the current
> pack, but we may end up verifying the object via a different pack. So if
> the same object were to exist in multiple packs, we might end up only
> verifying one of its instances.
> 
> I've got a couple patches in the making that'll fix this.

I've sent out the patches via [1]. Thanks!

Patrick

[1]: <20260223-pks-fsck-fix-v1-0-c29036832b6e@pks.im>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  7:12     ` Jeff King
  2026-02-23  8:46       ` Patrick Steinhardt
@ 2026-02-23 15:49       ` Junio C Hamano
  1 sibling, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-02-23 15:49 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git, Patrick Steinhardt

Jeff King <peff@peff.net> writes:

> Having to identify each potential spot and set a "btw, don't switch the
> pack list order!" flag seems error-prone. And also loses efficiency when
> you are iterating a pack and accessing objects in it (since we can't
> push that pack to the front of the MRU then, even though we'd expect
> there to be high locality with our iteration).

Ah, you said so clearly what I was feeling but I couldn't form into
words.  Using a stable second list to stably iterate over it for a
codepath like the fsck does sound a lot less error prone.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-23  8:43 ` Patrick Steinhardt
  2026-02-23  9:27   ` Jeff King
  2026-02-23  9:53   ` Patrick Steinhardt
@ 2026-02-24 22:23   ` brian m. carlson
  2026-02-24 22:32     ` Junio C Hamano
  2 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2026-02-24 22:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]

On 2026-02-23 at 08:43:41, Patrick Steinhardt wrote:
> Typically, we don't execute `find_pack_entry()` at all when verifying
> packfiles as we iterate through objects in packfile order. We thus don't
> have to look up objects via their object ID, but instead we do so by
> using their packfile offset. And this mechanism will not end up in
> `find_pack_entry()`, and thus we wouldn't update the MRU.

If you're thinking about `nth_packed_object_id`, that is index (object
ID) order, not packfile order.  I actually made this mistake when
writing the interop code and having that function operate in pack order
breaks a surprising number of things in very subtle ways, notably
generating multi-pack indexes.

I will be sending a patch in the future documenting that requirement
clearly.

> I've got a couple patches in the making that'll fix this.

I'm happy to drop this patch in favour of yours.  Thanks for a quick
response.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fsck: do not loop infinitely when processing packs
  2026-02-24 22:23   ` brian m. carlson
@ 2026-02-24 22:32     ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-02-24 22:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Patrick Steinhardt, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2026-02-23 at 08:43:41, Patrick Steinhardt wrote:
>> Typically, we don't execute `find_pack_entry()` at all when verifying
>> packfiles as we iterate through objects in packfile order. We thus don't
>> have to look up objects via their object ID, but instead we do so by
>> using their packfile offset. And this mechanism will not end up in
>> `find_pack_entry()`, and thus we wouldn't update the MRU.
>
> If you're thinking about `nth_packed_object_id`, that is index (object
> ID) order, not packfile order.  I actually made this mistake when
> writing the interop code and having that function operate in pack order
> breaks a surprising number of things in very subtle ways, notably
> generating multi-pack indexes.
>
> I will be sending a patch in the future documenting that requirement
> clearly.
>
>> I've got a couple patches in the making that'll fix this.
>
> I'm happy to drop this patch in favour of yours.  Thanks for a quick
> response.

OK, so I'll retire your fef2a726 (fsck: do not loop infinitely when
processing packs, 2026-02-22) and replace it with the four-patch
series:

26fc7b59cd t/helper: improve "genrandom" test helper
10a6762719 object-file: adapt `stream_object_signature()` to take a stream
41b42e3527 packfile: expose function to read object stream for an offset
13eb65d366 pack-check: fix verification of large objects

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-02-24 22:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-22 18:37 [PATCH] fsck: do not loop infinitely when processing packs brian m. carlson
2026-02-22 22:39 ` Junio C Hamano
2026-02-22 23:07   ` brian m. carlson
2026-02-23  7:12     ` Jeff King
2026-02-23  8:46       ` Patrick Steinhardt
2026-02-23  9:25         ` Jeff King
2026-02-23  9:36           ` Patrick Steinhardt
2026-02-23  9:46             ` Jeff King
2026-02-23 15:49       ` Junio C Hamano
2026-02-23  8:43 ` Patrick Steinhardt
2026-02-23  9:27   ` Jeff King
2026-02-23  9:53   ` Patrick Steinhardt
2026-02-24 22:23   ` brian m. carlson
2026-02-24 22:32     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox