* [GSoC PATCH] backfill: auto-detect sparse-checkout from config @ 2026-03-31 11:25 Trieu Huynh 2026-03-31 16:59 ` Junio C Hamano 0 siblings, 1 reply; 5+ messages in thread From: Trieu Huynh @ 2026-03-31 11:25 UTC (permalink / raw) To: git; +Cc: Trieu Huynh git backfill currently initializes the `sparse` field in backfill_context to 0. This causes the command to always perform a full backfill by default, even when the repository has sparse-checkout enabled in its configuration (core.sparseCheckout). Because 'sparse' is explicitly set to 0 at initialization, any later logic intended to auto-detect the setting from the repository configuration becomes dead code, as it only triggers if the value is negative (sentinel). Change the initial value of .sparse to -1. This allows the command to correctly fallback to the repository's sparse-checkout settings when the '--sparse' or '--no-sparse' options are not provided on the command line. Add a test case in t5620-backfill.sh to verify that 'git backfill' automatically respects the sparse-checkout configuration without explicit flags. Signed-off-by: Trieu Huynh <vikingtc4@gmail.com> --- builtin/backfill.c | 2 +- t/t5620-backfill.sh | 15 +++++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/builtin/backfill.c b/builtin/backfill.c index 4b2db94173..0f31844ce7 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit .repo = repo, .current_batch = OID_ARRAY_INIT, .min_batch_size = 50000, - .sparse = 0, + .sparse = -1, .show_progress = -1, }; struct option options[] = { diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh index 91b5115732..a1a8d736db 100755 --- a/t/t5620-backfill.sh +++ b/t/t5620-backfill.sh @@ -149,6 +149,21 @@ test_expect_success 'backfill --sparse' ' test_line_count = 0 missing ' +test_expect_success 'backfill auto-detects sparse-checkout from config' ' + git clone --sparse --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" backfill-auto-sparse && + + git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing && + test_line_count = 44 missing && + + GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \ + -C backfill-auto-sparse backfill && + + test_trace2_data promisor fetch_count 4 <auto-sparse-trace && + test_trace2_data path-walk paths 5 <auto-sparse-trace +' + test_expect_success 'backfill --sparse without cone mode (positive)' ' git clone --no-checkout --filter=blob:none \ --single-branch --branch=main \ -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config 2026-03-31 11:25 [GSoC PATCH] backfill: auto-detect sparse-checkout from config Trieu Huynh @ 2026-03-31 16:59 ` Junio C Hamano 2026-04-01 19:31 ` Trieu Huynh 0 siblings, 1 reply; 5+ messages in thread From: Junio C Hamano @ 2026-03-31 16:59 UTC (permalink / raw) To: Trieu Huynh; +Cc: git, Derrick Stolee Trieu Huynh <vikingtc4@gmail.com> writes: > git backfill currently initializes the `sparse` field in > backfill_context to 0. This causes the command to always perform a > full backfill by default, even when the repository has sparse-checkout > enabled in its configuration (core.sparseCheckout). > > Because 'sparse' is explicitly set to 0 at initialization, any later > logic intended to auto-detect the setting from the repository > configuration becomes dead code, as it only triggers if the value > is negative (sentinel). > > Change the initial value of .sparse to -1. This allows the command > to correctly fallback to the repository's sparse-checkout settings > when the '--sparse' or '--no-sparse' options are not provided on the > command line. The author of bff45557 (backfill: add --sparse option, 2025-02-03), where this .sparse member originates, CC'ed for more intelligent input than my review can offer ;-) > Add a test case in t5620-backfill.sh to verify that 'git backfill' > automatically respects the sparse-checkout configuration without > explicit flags. > > Signed-off-by: Trieu Huynh <vikingtc4@gmail.com> > --- > builtin/backfill.c | 2 +- > t/t5620-backfill.sh | 15 +++++++++++++++ > 2 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/builtin/backfill.c b/builtin/backfill.c > index 4b2db94173..0f31844ce7 100644 > --- a/builtin/backfill.c > +++ b/builtin/backfill.c > @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit > .repo = repo, > .current_batch = OID_ARRAY_INIT, > .min_batch_size = 50000, > - .sparse = 0, > + .sparse = -1, > .show_progress = -1, > }; > struct option options[] = { I am a bit confused by this change. What's the difference between using -1 (which you picked) and 1 as the initial value for this member? From the proposed log message, I would have expected a new code that says "ah, we notice, from this member being -1, that the user did not specify --no-sparse or --sparse, so let's figure out if our working tree is sparsely checked out ourselves and set it either to 0 or to 1", but there is nothing like that in the code. It seems that the updated code relies on the fact that this part of do_backfill() only cares if .sparse is zero or not, and ... if (ctx->sparse) { CALLOC_ARRAY(info.pl, 1); if (get_sparse_checkout_patterns(info.pl)) { path_walk_info_clear(&info); return error(_("problem loading sparse-checkout")); } } ... relies on get_sparse_checkout_patterns() not to do any harm when the working tree is not sparsely checked out. I am not sure if we want to call it "auto-detction". It looks more like "default to --sparse, relying that --sparse is a no-op in a non-sparse working tree" at least to me. Not that it is necessarily wrong, and when people do "backfill" knowing that the working tree is sparse, I am sympathetic if they prefer to keep the sparseness, so such a change of default may be beneficial. Derrick, what do you think? > diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh > index 91b5115732..a1a8d736db 100755 > --- a/t/t5620-backfill.sh > +++ b/t/t5620-backfill.sh > @@ -149,6 +149,21 @@ test_expect_success 'backfill --sparse' ' > test_line_count = 0 missing > ' > > +test_expect_success 'backfill auto-detects sparse-checkout from config' ' > + git clone --sparse --filter=blob:none \ > + --single-branch --branch=main \ > + "file://$(pwd)/srv.bare" backfill-auto-sparse && > + > + git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing && > + test_line_count = 44 missing && > + > + GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \ > + -C backfill-auto-sparse backfill && > + > + test_trace2_data promisor fetch_count 4 <auto-sparse-trace && > + test_trace2_data path-walk paths 5 <auto-sparse-trace > +' > + > test_expect_success 'backfill --sparse without cone mode (positive)' ' > git clone --no-checkout --filter=blob:none \ > --single-branch --branch=main \ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config 2026-03-31 16:59 ` Junio C Hamano @ 2026-04-01 19:31 ` Trieu Huynh 2026-04-02 14:24 ` Derrick Stolee 0 siblings, 1 reply; 5+ messages in thread From: Trieu Huynh @ 2026-04-01 19:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Derrick Stolee On Tue, Mar 31, 2026 at 09:59:10AM -0700, Junio C Hamano wrote: > Trieu Huynh <vikingtc4@gmail.com> writes: > > > git backfill currently initializes the `sparse` field in > > backfill_context to 0. This causes the command to always perform a > > full backfill by default, even when the repository has sparse-checkout > > enabled in its configuration (core.sparseCheckout). > > > > Because 'sparse' is explicitly set to 0 at initialization, any later > > logic intended to auto-detect the setting from the repository > > configuration becomes dead code, as it only triggers if the value > > is negative (sentinel). > > > > Change the initial value of .sparse to -1. This allows the command > > to correctly fallback to the repository's sparse-checkout settings > > when the '--sparse' or '--no-sparse' options are not provided on the > > command line. > > The author of bff45557 (backfill: add --sparse option, 2025-02-03), > where this .sparse member originates, CC'ed for more intelligent > input than my review can offer ;-) > > > > Add a test case in t5620-backfill.sh to verify that 'git backfill' > > automatically respects the sparse-checkout configuration without > > explicit flags. > > > > Signed-off-by: Trieu Huynh <vikingtc4@gmail.com> > > --- > > builtin/backfill.c | 2 +- > > t/t5620-backfill.sh | 15 +++++++++++++++ > > 2 files changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/builtin/backfill.c b/builtin/backfill.c > > index 4b2db94173..0f31844ce7 100644 > > --- a/builtin/backfill.c > > +++ b/builtin/backfill.c > > @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit > > .repo = repo, > > .current_batch = OID_ARRAY_INIT, > > .min_batch_size = 50000, > > - .sparse = 0, > > + .sparse = -1, > > .show_progress = -1, > > }; > > struct option options[] = { > > I am a bit confused by this change. What's the difference between > using -1 (which you picked) and 1 as the initial value for this > member? From the proposed log message, I would have expected a new > code that says "ah, we notice, from this member being -1, that the > user did not specify --no-sparse or --sparse, so let's figure out if > our working tree is sparsely checked out ourselves and set it either > to 0 or to 1", but there is nothing like that in the code. It seems > that the updated code relies on the fact that this part of > do_backfill() only cares if .sparse is zero or not, and ... > > if (ctx->sparse) { > CALLOC_ARRAY(info.pl, 1); > if (get_sparse_checkout_patterns(info.pl)) { > path_walk_info_clear(&info); > return error(_("problem loading sparse-checkout")); > } > } > > ... relies on get_sparse_checkout_patterns() not to do any harm when > the working tree is not sparsely checked out. > > I am not sure if we want to call it "auto-detction". It looks more > like "default to --sparse, relying that --sparse is a no-op in a > non-sparse working tree" at least to me. Not that it is necessarily > wrong, and when people do "backfill" knowing that the working tree > is sparse, I am sympathetic if they prefer to keep the sparseness, > so such a change of default may be beneficial. > actually, the logic IIUC here is: - first, ctx.sprase originally is set to 0. - then, it check user's options. Assume, there is no option passed, still 0. - then, it check repo's config (core.sparseCheckout (default to 0 in enviroment.c) but it doesn't since the guard: if (ctx.sparse < 0) ctx.sparse = cfg->apply_sparse_checkout; - evenly. ctx.sparse still 0 eventhough in case the core.sparseCheckout = 1 (git config core.sparseCheckout true) IMHO, this change set the default value to -1, then it can fallback to repo's config value if user has no-op passing (default to 0 (full backfill if user doesnt intent to config previously either). > Derrick, what do you think? > > > diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh > > index 91b5115732..a1a8d736db 100755 > > --- a/t/t5620-backfill.sh > > +++ b/t/t5620-backfill.sh > > @@ -149,6 +149,21 @@ test_expect_success 'backfill --sparse' ' > > test_line_count = 0 missing > > ' > > > > +test_expect_success 'backfill auto-detects sparse-checkout from config' ' > > + git clone --sparse --filter=blob:none \ > > + --single-branch --branch=main \ > > + "file://$(pwd)/srv.bare" backfill-auto-sparse && > > + > > + git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing && > > + test_line_count = 44 missing && > > + > > + GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \ > > + -C backfill-auto-sparse backfill && > > + > > + test_trace2_data promisor fetch_count 4 <auto-sparse-trace && > > + test_trace2_data path-walk paths 5 <auto-sparse-trace > > +' > > + > > test_expect_success 'backfill --sparse without cone mode (positive)' ' > > git clone --no-checkout --filter=blob:none \ > > --single-branch --branch=main \ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config 2026-04-01 19:31 ` Trieu Huynh @ 2026-04-02 14:24 ` Derrick Stolee 2026-04-02 16:05 ` Junio C Hamano 0 siblings, 1 reply; 5+ messages in thread From: Derrick Stolee @ 2026-04-02 14:24 UTC (permalink / raw) To: Trieu Huynh, Junio C Hamano; +Cc: git On 4/1/26 3:31 PM, Trieu Huynh wrote: > On Tue, Mar 31, 2026 at 09:59:10AM -0700, Junio C Hamano wrote: >>> diff --git a/builtin/backfill.c b/builtin/backfill.c >>> index 4b2db94173..0f31844ce7 100644 >>> --- a/builtin/backfill.c >>> +++ b/builtin/backfill.c >>> @@ -124,7 +124,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit >>> .repo = repo, >>> .current_batch = OID_ARRAY_INIT, >>> .min_batch_size = 50000, >>> - .sparse = 0, >>> + .sparse = -1, >>> .show_progress = -1, >>> }; >>> struct option options[] = { >> >> I am a bit confused by this change. What's the difference between >> using -1 (which you picked) and 1 as the initial value for this >> member? From the proposed log message, I would have expected a new >> code that says "ah, we notice, from this member being -1, that the >> user did not specify --no-sparse or --sparse, so let's figure out if >> our working tree is sparsely checked out ourselves and set it either >> to 0 or to 1", but there is nothing like that in the code. It seems >> that the updated code relies on the fact that this part of >> do_backfill() only cares if .sparse is zero or not, and ... >> >> if (ctx->sparse) { >> CALLOC_ARRAY(info.pl, 1); >> if (get_sparse_checkout_patterns(info.pl)) { >> path_walk_info_clear(&info); >> return error(_("problem loading sparse-checkout")); >> } >> } >> >> ... relies on get_sparse_checkout_patterns() not to do any harm when >> the working tree is not sparsely checked out. >> >> I am not sure if we want to call it "auto-detction". It looks more >> like "default to --sparse, relying that --sparse is a no-op in a >> non-sparse working tree" at least to me. Not that it is necessarily >> wrong, and when people do "backfill" knowing that the working tree >> is sparse, I am sympathetic if they prefer to keep the sparseness, >> so such a change of default may be beneficial. >> > actually, the logic IIUC here is: > - first, ctx.sprase originally is set to 0. > - then, it check user's options. Assume, there is no option passed, > still 0. > - then, it check repo's config (core.sparseCheckout (default to 0 in > enviroment.c) but it doesn't since the guard: > if (ctx.sparse < 0) > ctx.sparse = cfg->apply_sparse_checkout; > - evenly. ctx.sparse still 0 eventhough in case the > core.sparseCheckout = 1 (git config core.sparseCheckout true) > > IMHO, this change set the default value to -1, then it can fallback to > repo's config value if user has no-op passing (default to 0 (full > backfill if user doesnt intent to config previously either). >> Derrick, what do you think? Indeed, I thought this was how it already worked, as 85127bcdea (backfill: assume --sparse when sparse-checkout is enabled, 2025-02-03) (introduced in [1]) should have covered. [1] https://lore.kernel.org/git/f22cf8b34851a3ba4cd6ab1d31f0835579143c40.1738602667.git.gitgitgadget@gmail.com/ However, that change did not include this test: >>> +test_expect_success 'backfill auto-detects sparse-checkout from config' ' >>> + git clone --sparse --filter=blob:none \ >>> + --single-branch --branch=main \ >>> + "file://$(pwd)/srv.bare" backfill-auto-sparse && >>> + >>> + git -C backfill-auto-sparse rev-list --quiet --objects --missing=print HEAD >missing && >>> + test_line_count = 44 missing && >>> + >>> + GIT_TRACE2_EVENT="$(pwd)/auto-sparse-trace" git \ >>> + -C backfill-auto-sparse backfill && >>> + >>> + test_trace2_data promisor fetch_count 4 <auto-sparse-trace && >>> + test_trace2_data path-walk paths 5 <auto-sparse-trace >>> +' This actually demonstrates the behavior that I was intending to introduce in that change, but made an error in (1) not testing it like I thought I had, and (2) not doing exactly what this change does by having the .sparse option initialized as -1 (unset). The code and commit message do a good job of identifying this bug and difference in behavior. The only suggestion I have is to update the commit message to point to my original commit that failed to implement this behavior in the expected way. Thanks, -Stolee ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [GSoC PATCH] backfill: auto-detect sparse-checkout from config 2026-04-02 14:24 ` Derrick Stolee @ 2026-04-02 16:05 ` Junio C Hamano 0 siblings, 0 replies; 5+ messages in thread From: Junio C Hamano @ 2026-04-02 16:05 UTC (permalink / raw) To: Derrick Stolee; +Cc: Trieu Huynh, git Derrick Stolee <stolee@gmail.com> writes: > On 4/1/26 3:31 PM, Trieu Huynh wrote: >> On Tue, Mar 31, 2026 at 09:59:10AM -0700, Junio C Hamano wrote: > ... >>> I am a bit confused by this change. What's the difference between >>> using -1 (which you picked) and 1 as the initial value for this >>> member? From the proposed log message, I would have expected a new >>> code that says "ah, we notice, from this member being -1, that the >>> user did not specify --no-sparse or --sparse, so let's figure out if >>> our working tree is sparsely checked out ourselves and set it either >>> to 0 or to 1", but there is nothing like that in the code. >> ... >> IMHO, this change set the default value to -1, then it can fallback to >> repo's config value if user has no-op passing (default to 0 (full >> backfill if user doesnt intent to config previously either). >>> Derrick, what do you think? > > Indeed, I thought this was how it already worked, as 85127bcdea > (backfill: assume --sparse when sparse-checkout is enabled, > 2025-02-03) (introduced in [1]) should have covered. Ah, OK, in other words, the code to use how the repository is configured when the user does not override from the command line was already there; it was just the way the code checked if the user gave something from the command line was wrong (i.e., initialized to 0, pretending that '--no-foo" was given even when there isn't), and that is why we do not see "ok, there is nothing on the command line, so let's check the repository" _added_ by the patch---because it has always been there. Makes sense. > The code and commit message do a good job of identifying this bug > and difference in behavior. The only suggestion I have is to update > the commit message to point to my original commit that failed to > implement this behavior in the expected way. Sounds sensible and makes sense. Thanks, both. Let's see a (hopefully small and final) reroll. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-02 16:05 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-31 11:25 [GSoC PATCH] backfill: auto-detect sparse-checkout from config Trieu Huynh 2026-03-31 16:59 ` Junio C Hamano 2026-04-01 19:31 ` Trieu Huynh 2026-04-02 14:24 ` Derrick Stolee 2026-04-02 16:05 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox