Git development
 help / color / mirror / Atom feed
* [GSoC PATCH] backfill: auto-detect sparse-checkout from config
@ 2026-03-31 11:25 Trieu Huynh
  2026-03-31 16:59 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Trieu Huynh @ 2026-03-31 11:25 UTC (permalink / raw)
  To: git; +Cc: Trieu Huynh

git backfill currently initializes the `sparse` field in
backfill_context to 0. This causes the command to always perform a
full backfill by default, even when the repository has sparse-checkout
enabled in its configuration (core.sparseCheckout).

Because 'sparse' is explicitly set to 0 at initialization, any later
logic intended to auto-detect the setting from the repository
configuration becomes dead code, as it only triggers if the value
is negative (sentinel).

Change the initial value of .sparse to -1. This allows the command
to correctly fallback to the repository's sparse-checkout settings
when the '--sparse' or '--no-sparse' options are not provided on the
command line.

Add a test case in t5620-backfill.sh to verify that 'git backfill'
automatically respects the sparse-checkout configuration without
explicit flags.

Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
---
 builtin/backfill.c  |  2 +-
 t/t5620-backfill.sh | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/backfill.c b/builtin/backfill.c
index 4b2db94173..0f31844ce7 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
 		.repo = repo,
 		.current_batch = OID_ARRAY_INIT,
 		.min_batch_size = 50000,
-		.sparse = 0,
+		.sparse = -1,
 		.show_progress = -1,
 	};
 	struct option options[] = {
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 91b5115732..a1a8d736db 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -149,6 +149,21 @@ test_expect_success 'backfill --sparse' '
 	test_line_count = 0 missing
 '
 
+test_expect_success 'backfill auto-detects sparse-checkout from config' '
+	git clone --sparse --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" backfill-auto-sparse &&
+
+	git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing &&
+	test_line_count = 44 missing &&
+
+	GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \
+		-C backfill-auto-sparse backfill &&
+
+	test_trace2_data promisor fetch_count 4 <auto-sparse-trace &&
+	test_trace2_data path-walk paths 5 <auto-sparse-trace
+'
+
 test_expect_success 'backfill --sparse without cone mode (positive)' '
 	git clone --no-checkout --filter=blob:none		\
 		--single-branch --branch=main 		\
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config
  2026-03-31 11:25 [GSoC PATCH] backfill: auto-detect sparse-checkout from config Trieu Huynh
@ 2026-03-31 16:59 ` Junio C Hamano
  2026-04-01 19:31   ` Trieu Huynh
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-03-31 16:59 UTC (permalink / raw)
  To: Trieu Huynh; +Cc: git, Derrick Stolee

Trieu Huynh <vikingtc4@gmail.com> writes:

> git backfill currently initializes the `sparse` field in
> backfill_context to 0. This causes the command to always perform a
> full backfill by default, even when the repository has sparse-checkout
> enabled in its configuration (core.sparseCheckout).
>
> Because 'sparse' is explicitly set to 0 at initialization, any later
> logic intended to auto-detect the setting from the repository
> configuration becomes dead code, as it only triggers if the value
> is negative (sentinel).
>
> Change the initial value of .sparse to -1. This allows the command
> to correctly fallback to the repository's sparse-checkout settings
> when the '--sparse' or '--no-sparse' options are not provided on the
> command line.

The author of bff45557 (backfill: add --sparse option, 2025-02-03),
where this .sparse member originates, CC'ed for more intelligent
input than my review can offer ;-)


> Add a test case in t5620-backfill.sh to verify that 'git backfill'
> automatically respects the sparse-checkout configuration without
> explicit flags.
>
> Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
> ---
>  builtin/backfill.c  |  2 +-
>  t/t5620-backfill.sh | 15 +++++++++++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index 4b2db94173..0f31844ce7 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  		.repo = repo,
>  		.current_batch = OID_ARRAY_INIT,
>  		.min_batch_size = 50000,
> -		.sparse = 0,
> +		.sparse = -1,
>  		.show_progress = -1,
>  	};
>  	struct option options[] = {

I am a bit confused by this change.  What's the difference between
using -1 (which you picked) and 1 as the initial value for this
member?  From the proposed log message, I would have expected a new
code that says "ah, we notice, from this member being -1, that the
user did not specify --no-sparse or --sparse, so let's figure out if
our working tree is sparsely checked out ourselves and set it either
to 0 or to 1", but there is nothing like that in the code.  It seems
that the updated code relies on the fact that this part of
do_backfill() only cares if .sparse is zero or not, and ...

	if (ctx->sparse) {
		CALLOC_ARRAY(info.pl, 1);
		if (get_sparse_checkout_patterns(info.pl)) {
			path_walk_info_clear(&info);
			return error(_("problem loading sparse-checkout"));
		}
	}

... relies on get_sparse_checkout_patterns() not to do any harm when
the working tree is not sparsely checked out.

I am not sure if we want to call it "auto-detction".  It looks more
like "default to --sparse, relying that --sparse is a no-op in a
non-sparse working tree" at least to me.  Not that it is necessarily
wrong, and when people do "backfill" knowing that the working tree
is sparse, I am sympathetic if they prefer to keep the sparseness,
so such a change of default may be beneficial.

Derrick, what do you think?

> diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
> index 91b5115732..a1a8d736db 100755
> --- a/t/t5620-backfill.sh
> +++ b/t/t5620-backfill.sh
> @@ -149,6 +149,21 @@ test_expect_success 'backfill --sparse' '
>  	test_line_count = 0 missing
>  '
>  
> +test_expect_success 'backfill auto-detects sparse-checkout from config' '
> +	git clone --sparse --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" backfill-auto-sparse &&
> +
> +	git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing &&
> +	test_line_count = 44 missing &&
> +
> +	GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \
> +		-C backfill-auto-sparse backfill &&
> +
> +	test_trace2_data promisor fetch_count 4 <auto-sparse-trace &&
> +	test_trace2_data path-walk paths 5 <auto-sparse-trace
> +'
> +
>  test_expect_success 'backfill --sparse without cone mode (positive)' '
>  	git clone --no-checkout --filter=blob:none		\
>  		--single-branch --branch=main 		\

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config
  2026-03-31 16:59 ` Junio C Hamano
@ 2026-04-01 19:31   ` Trieu Huynh
  2026-04-02 14:24     ` Derrick Stolee
  0 siblings, 1 reply; 5+ messages in thread
From: Trieu Huynh @ 2026-04-01 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Derrick Stolee

On Tue, Mar 31, 2026 at 09:59:10AM -0700, Junio C Hamano wrote:
> Trieu Huynh <vikingtc4@gmail.com> writes:
> 
> > git backfill currently initializes the `sparse` field in
> > backfill_context to 0. This causes the command to always perform a
> > full backfill by default, even when the repository has sparse-checkout
> > enabled in its configuration (core.sparseCheckout).
> >
> > Because 'sparse' is explicitly set to 0 at initialization, any later
> > logic intended to auto-detect the setting from the repository
> > configuration becomes dead code, as it only triggers if the value
> > is negative (sentinel).
> >
> > Change the initial value of .sparse to -1. This allows the command
> > to correctly fallback to the repository's sparse-checkout settings
> > when the '--sparse' or '--no-sparse' options are not provided on the
> > command line.
> 
> The author of bff45557 (backfill: add --sparse option, 2025-02-03),
> where this .sparse member originates, CC'ed for more intelligent
> input than my review can offer ;-)
> 
> 
> > Add a test case in t5620-backfill.sh to verify that 'git backfill'
> > automatically respects the sparse-checkout configuration without
> > explicit flags.
> >
> > Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
> > ---
> >  builtin/backfill.c  |  2 +-
> >  t/t5620-backfill.sh | 15 +++++++++++++++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/backfill.c b/builtin/backfill.c
> > index 4b2db94173..0f31844ce7 100644
> > --- a/builtin/backfill.c
> > +++ b/builtin/backfill.c
> > @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
> >  		.repo = repo,
> >  		.current_batch = OID_ARRAY_INIT,
> >  		.min_batch_size = 50000,
> > -		.sparse = 0,
> > +		.sparse = -1,
> >  		.show_progress = -1,
> >  	};
> >  	struct option options[] = {
> 
> I am a bit confused by this change.  What's the difference between
> using -1 (which you picked) and 1 as the initial value for this
> member?  From the proposed log message, I would have expected a new
> code that says "ah, we notice, from this member being -1, that the
> user did not specify --no-sparse or --sparse, so let's figure out if
> our working tree is sparsely checked out ourselves and set it either
> to 0 or to 1", but there is nothing like that in the code.  It seems
> that the updated code relies on the fact that this part of
> do_backfill() only cares if .sparse is zero or not, and ...
> 
> 	if (ctx->sparse) {
> 		CALLOC_ARRAY(info.pl, 1);
> 		if (get_sparse_checkout_patterns(info.pl)) {
> 			path_walk_info_clear(&info);
> 			return error(_("problem loading sparse-checkout"));
> 		}
> 	}
> 
> ... relies on get_sparse_checkout_patterns() not to do any harm when
> the working tree is not sparsely checked out.
> 
> I am not sure if we want to call it "auto-detction".  It looks more
> like "default to --sparse, relying that --sparse is a no-op in a
> non-sparse working tree" at least to me.  Not that it is necessarily
> wrong, and when people do "backfill" knowing that the working tree
> is sparse, I am sympathetic if they prefer to keep the sparseness,
> so such a change of default may be beneficial.
> 
actually, the logic IIUC here is:
- first, ctx.sprase originally is set to 0.
- then, it check user's options. Assume, there is no option passed,
still 0.
- then, it check repo's config (core.sparseCheckout (default to 0 in
enviroment.c) but it doesn't since the guard:
	if (ctx.sparse < 0)
		ctx.sparse = cfg->apply_sparse_checkout;
- evenly. ctx.sparse still 0 eventhough in case the
core.sparseCheckout = 1 (git config core.sparseCheckout true)

IMHO, this change set the default value to -1, then it can fallback to
repo's config value if user has no-op passing (default to 0 (full
backfill if user doesnt intent to config previously either).
> Derrick, what do you think?
> 
> > diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
> > index 91b5115732..a1a8d736db 100755
> > --- a/t/t5620-backfill.sh
> > +++ b/t/t5620-backfill.sh
> > @@ -149,6 +149,21 @@ test_expect_success 'backfill --sparse' '
> >  	test_line_count = 0 missing
> >  '
> >  
> > +test_expect_success 'backfill auto-detects sparse-checkout from config' '
> > +	git clone --sparse --filter=blob:none \
> > +		--single-branch --branch=main \
> > +		"file://$(pwd)/srv.bare" backfill-auto-sparse &&
> > +
> > +	git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing &&
> > +	test_line_count = 44 missing &&
> > +
> > +	GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \
> > +		-C backfill-auto-sparse backfill &&
> > +
> > +	test_trace2_data promisor fetch_count 4 <auto-sparse-trace &&
> > +	test_trace2_data path-walk paths 5 <auto-sparse-trace
> > +'
> > +
> >  test_expect_success 'backfill --sparse without cone mode (positive)' '
> >  	git clone --no-checkout --filter=blob:none		\
> >  		--single-branch --branch=main 		\

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config
  2026-04-01 19:31   ` Trieu Huynh
@ 2026-04-02 14:24     ` Derrick Stolee
  2026-04-02 16:05       ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2026-04-02 14:24 UTC (permalink / raw)
  To: Trieu Huynh, Junio C Hamano; +Cc: git

On 4/1/26 3:31 PM, Trieu Huynh wrote:
> On Tue, Mar 31, 2026 at 09:59:10AM -0700, Junio C Hamano wrote:

>>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>>> index 4b2db94173..0f31844ce7 100644
>>> --- a/builtin/backfill.c
>>> +++ b/builtin/backfill.c
>>> @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>>>   		.repo = repo,
>>>   		.current_batch = OID_ARRAY_INIT,
>>>   		.min_batch_size = 50000,
>>> -		.sparse = 0,
>>> +		.sparse = -1,
>>>   		.show_progress = -1,
>>>   	};
>>>   	struct option options[] = {
>>
>> I am a bit confused by this change.  What's the difference between
>> using -1 (which you picked) and 1 as the initial value for this
>> member?  From the proposed log message, I would have expected a new
>> code that says "ah, we notice, from this member being -1, that the
>> user did not specify --no-sparse or --sparse, so let's figure out if
>> our working tree is sparsely checked out ourselves and set it either
>> to 0 or to 1", but there is nothing like that in the code.  It seems
>> that the updated code relies on the fact that this part of
>> do_backfill() only cares if .sparse is zero or not, and ...
>>
>> 	if (ctx->sparse) {
>> 		CALLOC_ARRAY(info.pl, 1);
>> 		if (get_sparse_checkout_patterns(info.pl)) {
>> 			path_walk_info_clear(&info);
>> 			return error(_("problem loading sparse-checkout"));
>> 		}
>> 	}
>>
>> ... relies on get_sparse_checkout_patterns() not to do any harm when
>> the working tree is not sparsely checked out.
>>
>> I am not sure if we want to call it "auto-detction".  It looks more
>> like "default to --sparse, relying that --sparse is a no-op in a
>> non-sparse working tree" at least to me.  Not that it is necessarily
>> wrong, and when people do "backfill" knowing that the working tree
>> is sparse, I am sympathetic if they prefer to keep the sparseness,
>> so such a change of default may be beneficial.
>>
> actually, the logic IIUC here is:
> - first, ctx.sprase originally is set to 0.
> - then, it check user's options. Assume, there is no option passed,
> still 0.
> - then, it check repo's config (core.sparseCheckout (default to 0 in
> enviroment.c) but it doesn't since the guard:
> 	if (ctx.sparse < 0)
> 		ctx.sparse = cfg->apply_sparse_checkout;
> - evenly. ctx.sparse still 0 eventhough in case the
> core.sparseCheckout = 1 (git config core.sparseCheckout true)
> 
> IMHO, this change set the default value to -1, then it can fallback to
> repo's config value if user has no-op passing (default to 0 (full
> backfill if user doesnt intent to config previously either).
>> Derrick, what do you think?

Indeed, I thought this was how it already worked, as 85127bcdea
(backfill: assume --sparse when sparse-checkout is enabled,
2025-02-03) (introduced in [1]) should have covered.

[1] 
https://lore.kernel.org/git/f22cf8b34851a3ba4cd6ab1d31f0835579143c40.1738602667.git.gitgitgadget@gmail.com/

However, that change did not include this test:

>>> +test_expect_success 'backfill auto-detects sparse-checkout from config' '
>>> +	git clone --sparse --filter=blob:none \
>>> +		--single-branch --branch=main \
>>> +		"file://$(pwd)/srv.bare" backfill-auto-sparse &&
>>> +
>>> +	git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing &&
>>> +	test_line_count = 44 missing &&
>>> +
>>> +	GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \
>>> +		-C backfill-auto-sparse backfill &&
>>> +
>>> +	test_trace2_data promisor fetch_count 4 <auto-sparse-trace &&
>>> +	test_trace2_data path-walk paths 5 <auto-sparse-trace
>>> +'

This actually demonstrates the behavior that I was intending
to introduce in that change, but made an error in (1) not testing
it like I thought I had, and (2) not doing exactly what this change
does by having the .sparse option initialized as -1 (unset).

The code and commit message do a good job of identifying this bug
and difference in behavior. The only suggestion I have is to update
the commit message to point to my original commit that failed to
implement this behavior in the expected way.

Thanks,
-Stolee




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config
  2026-04-02 14:24     ` Derrick Stolee
@ 2026-04-02 16:05       ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2026-04-02 16:05 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Trieu Huynh, git

Derrick Stolee <stolee@gmail.com> writes:

> On 4/1/26 3:31 PM, Trieu Huynh wrote:
>> On Tue, Mar 31, 2026 at 09:59:10AM -0700, Junio C Hamano wrote:
> ...
>>> I am a bit confused by this change.  What's the difference between
>>> using -1 (which you picked) and 1 as the initial value for this
>>> member?  From the proposed log message, I would have expected a new
>>> code that says "ah, we notice, from this member being -1, that the
>>> user did not specify --no-sparse or --sparse, so let's figure out if
>>> our working tree is sparsely checked out ourselves and set it either
>>> to 0 or to 1", but there is nothing like that in the code.
>> ...
>> IMHO, this change set the default value to -1, then it can fallback to
>> repo's config value if user has no-op passing (default to 0 (full
>> backfill if user doesnt intent to config previously either).
>>> Derrick, what do you think?
>
> Indeed, I thought this was how it already worked, as 85127bcdea
> (backfill: assume --sparse when sparse-checkout is enabled,
> 2025-02-03) (introduced in [1]) should have covered.

Ah, OK, in other words, the code to use how the repository is
configured when the user does not override from the command line was
already there; it was just the way the code checked if the user gave
something from the command line was wrong (i.e., initialized to 0,
pretending that '--no-foo" was given even when there isn't), and
that is why we do not see "ok, there is nothing on the command line,
so let's check the repository" _added_ by the patch---because it has
always been there.  Makes sense.

> The code and commit message do a good job of identifying this bug
> and difference in behavior. The only suggestion I have is to update
> the commit message to point to my original commit that failed to
> implement this behavior in the expected way.

Sounds sensible and makes sense.

Thanks, both.  Let's see a (hopefully small and final) reroll.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-04-02 16:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-31 11:25 [GSoC PATCH] backfill: auto-detect sparse-checkout from config Trieu Huynh
2026-03-31 16:59 ` Junio C Hamano
2026-04-01 19:31   ` Trieu Huynh
2026-04-02 14:24     ` Derrick Stolee
2026-04-02 16:05       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox