* [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
@ 2025-10-09 21:01 Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Emily Yang via GitGitGadget @ 2025-10-09 21:01 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, me, ps, newren, Emily Yang, Emily Yang
From: Emily Yang <emilyyang.git@gmail.com>
The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option.
Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths" - "true" value acts like
"--changed-paths"; "false" disables a previous "true" config value but
doesn't imply "--no-changed-paths". This config will always respect the
precedence of command line option "--changed-paths" and
"--no-changed-paths".
We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
---
commit-graph: add new config for changed-paths & recommend it in scalar
Hello,
I'm Emily and I'm interested in contributing to Git. This is my first
contribution to Git, super excited!
I'm from Microsoft and spend most of my time working in the Office
MonoRepo (OMR, one of the largest repos in the world). Recently I've
been working with Derrick Stolee on Git performance related topics. We'd
love to propose a small enhancement on the existing changed-paths Bloom
filters feature to benefit large repos like OMR. Please kindly review
the code and provide your feedback!
Thanks, Emily
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1983
Documentation/config/commitgraph.adoc | 8 +++++
builtin/commit-graph.c | 2 ++
scalar.c | 1 +
t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++
4 files changed, 55 insertions(+)
diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
index 7f8c9d6638..c540e8a43d 100644
--- a/Documentation/config/commitgraph.adoc
+++ b/Documentation/config/commitgraph.adoc
@@ -8,6 +8,14 @@ commitGraph.maxNewFilters::
Specifies the default value for the `--max-new-filters` option of `git
commit-graph write` (c.f., linkgit:git-commit-graph[1]).
+commitGraph.changedPaths::
+ If true, then `git commit-graph write` will compute and write
+ changed-path Bloom filters by default, equivalent to passing
+ `--changed-paths`. If false or unset, changed-path Bloom filters
+ will only be written when explicitly requested via `--changed-paths`.
+ Command-line options always take precedence over this configuration.
+ Defaults to unset.
+
commitGraph.readChangedPaths::
Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index fe3ebaadad..d62005edc0 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
{
if (!strcmp(var, "commitgraph.maxnewfilters"))
write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
+ else if (!strcmp(var, "commitgraph.changedpaths"))
+ opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
/*
* No need to fall-back to 'git_default_config', since this was already
* called in 'cmd_commit_graph()'.
diff --git a/scalar.c b/scalar.c
index 4a373c133d..f754311627 100644
--- a/scalar.c
+++ b/scalar.c
@@ -166,6 +166,7 @@ static int set_recommended_config(int reconfigure)
#endif
/* Optional */
{ "status.aheadBehind", "false" },
+ { "commitGraph.changedPaths", "true" },
{ "commitGraph.generationVersion", "1" },
{ "core.autoCRLF", "false" },
{ "core.safeCRLF", "false" },
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0b3404f58f..98c6910963 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
)
'
+test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' '
+ git init config-changed-paths &&
+ (
+ cd config-changed-paths &&
+
+ # commitGraph.changedPaths is not set and it should not write Bloom filters
+ test_commit first &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+
+ # Set commitGraph.changedPaths to true and it should write Bloom filters
+ test_commit second &&
+ git config commitGraph.changedPaths true &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # Add one more config commitGraph.changedPaths as false to disable the previous true config value
+ # It should still write Bloom filters due to existing filters
+ test_commit third &&
+ git config --add commitGraph.changedPaths false &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # commitGraph.changedPaths is still false and command line options should take precedence
+ test_commit fourth &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+
+ # commitGraph.changedPaths is all cleared and then set to false again, command line options should take precedence
+ test_commit fifth &&
+ git config --unset-all commitGraph.changedPaths &&
+ git config commitGraph.changedPaths false &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters
+ test_commit sixth &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error
+ )
+'
+
test_done
base-commit: 79cf913ea9321f774da29b2330b5781d5ff420ef
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget @ 2025-10-09 22:30 ` Junio C Hamano 2025-10-10 12:48 ` Derrick Stolee 2025-10-10 12:32 ` Derrick Stolee 2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget 2 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2025-10-09 22:30 UTC (permalink / raw) To: Emily Yang via GitGitGadget; +Cc: git, stolee, me, ps, newren, Emily Yang "Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Emily Yang <emilyyang.git@gmail.com> > > The changed-path Bloom filters feature has proven stable and reliable > over several years of use, delivering significant performance > improvement for file history computation in large monorepos. Currently > a user can opt-in to writing the changed-path Bloom filters using the > "--changed-paths" option to "git commit-graph write". The filters will > be persisted until the user drops the filters using the > "--no-changed-paths" option. Makes sense. > Large monorepos using Git's background maintenance to build and update > commit-graph files could use an easy switch to enable this feature > without a foreground computation. Again makes sense. > In this commit, we're proposing a new > config option "commitGraph.changedPaths" - "true" value acts like > "--changed-paths"; "false" disables a previous "true" config value but > doesn't imply "--no-changed-paths". The way the above is phrased is so unusual that I am afraid it would confuse readers. When a configuration variable gives an opportunity for the users to override the hardcoded default (in this case, --no-changed-paths has been the traditional default, and graph.changedPaths=true would make us pretend as if --changed-paths were given from the command line). So if we were to have this configuration variable, setting it false MUST make it pretend as if --no-changed-paths were given from the command line, and MUST continue to do so even in some future we changed the hardcoded default to be "true" (i.e., unless the user says graph.changedPath=false in the configuration and/or declines with "--no-changed-paths" from the command line, we will record the changed paths filter by default). Setting commitGraph.changedPaths to true should mean that the "git commit-graph write" command behaves as if --changed-paths were given immediately after that "write", so that an end-user commmand $ git commit-graph write should behave as if it was written like this $ git commit-graph write --changed-paths and $ git commit-graph write --no-changed-paths should behave as if it was written like this $ git commit-graph write --changed-paths --no-changed-paths i.e. allowing the command line --no-changed-paths to override it. Setting commitGraph.changedPaths to false should similarly mean that "--no-changed-paths" implicitly is added immediately after "write", meaning that $ git commit-graph write should behave as if it was written like this $ git commit-graph write --no-changed-paths As it is the default not to write changed-paths filter, this has no effect, but I would say it still "implies" --no-changed-paths, and I hope you'd agree once you imagine a hypothetical future in which the default for "git commit-graph write" is to write changed-paths filter by default. > This config will always respect the > precedence of command line option "--changed-paths" and > "--no-changed-paths". This is a bit unusual way to phrase this, but I think it makes sense for the configuraiton variable to be overridden by the command line option, as that is the bog-standard way configuration variables and command line options interact with each other; it is so standard that it is probably not even worth saying it. > We also set this new config as optional recommended config in scalar to > turn on this feature for large repos. Great. Yes, from the start of the description above, anybody who is aware of the "scalar" effort would be anticipating this conclusion. > Helped-by: Derrick Stolee <stolee@gmail.com> > Signed-off-by: Emily Yang <emilyyang.git@gmail.com> > --- > commit-graph: add new config for changed-paths & recommend it in scalar > > Hello, > > I'm Emily and I'm interested in contributing to Git. This is my first > contribution to Git, super excited! > > I'm from Microsoft and spend most of my time working in the Office > MonoRepo (OMR, one of the largest repos in the world). Recently I've > been working with Derrick Stolee on Git performance related topics. We'd > love to propose a small enhancement on the existing changed-paths Bloom > filters feature to benefit large repos like OMR. Please kindly review > the code and provide your feedback! > > Thanks, Emily > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1983 > > Documentation/config/commitgraph.adoc | 8 +++++ > builtin/commit-graph.c | 2 ++ > scalar.c | 1 + > t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++ > 4 files changed, 55 insertions(+) > > diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc > index 7f8c9d6638..c540e8a43d 100644 > --- a/Documentation/config/commitgraph.adoc > +++ b/Documentation/config/commitgraph.adoc > @@ -8,6 +8,14 @@ commitGraph.maxNewFilters:: > Specifies the default value for the `--max-new-filters` option of `git > commit-graph write` (c.f., linkgit:git-commit-graph[1]). > > +commitGraph.changedPaths:: > + If true, then `git commit-graph write` will compute and write > + changed-path Bloom filters by default, equivalent to passing > + `--changed-paths`. If false or unset, changed-path Bloom filters > + will only be written when explicitly requested via `--changed-paths`. > + Command-line options always take precedence over this configuration. > + Defaults to unset. > + > commitGraph.readChangedPaths:: > Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and > commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c > index fe3ebaadad..d62005edc0 100644 > --- a/builtin/commit-graph.c > +++ b/builtin/commit-graph.c > @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value, > { > if (!strcmp(var, "commitgraph.maxnewfilters")) > write_opts.max_new_filters = git_config_int(var, value, ctx->kvi); > + else if (!strcmp(var, "commitgraph.changedpaths")) > + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1; This is iffy. Unless the way existing command line parser figures out if the user wants or does not want to use the feature is so screwed up, you shouldn't have to do any such thing. Why do you need to special case 'false' this way? The usual practice is * First, you initialize the variable "enable_changed_paths" with the hardcoded default. In this case, as changed-paths is not written by default, you'd initialize it to 0 (not -1). * Then you read from the configuration variables to update it. If you see commitgraph.changedPaths configuration, you take its value (either 0 or 1 as it is a Boolean variable) and overwrite the hardcoded default in the "enable_changed_paths" variable. Otherwise you leave "enable_changed_paths" as-is. * If you also have environment variable override, then you see if there is the environment variable you care about, and if so, override "enable_changed_paths" with its value. Otherwise you leave "enable_changed_paths" as-is. * Finally you read from the command line options using parse_options(). If there are command line options given, "enable_changed_paths" would be overriden again. If the way the existing parser sets up enable_changed_paths is screwed up and does not follow the above pattern (I didn't check), perhaps you'd need a preliminary clean-up patch before adding this new feature. Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-09 22:30 ` Junio C Hamano @ 2025-10-10 12:48 ` Derrick Stolee 2025-10-10 16:32 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Derrick Stolee @ 2025-10-10 12:48 UTC (permalink / raw) To: Junio C Hamano, Emily Yang via GitGitGadget Cc: git, me, ps, newren, Emily Yang On 10/9/2025 6:30 PM, Junio C Hamano wrote: > "Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Emily Yang <emilyyang.git@gmail.com> >> In this commit, we're proposing a new >> config option "commitGraph.changedPaths" - "true" value acts like >> "--changed-paths"; "false" disables a previous "true" config value but >> doesn't imply "--no-changed-paths". > > The way the above is phrased is so unusual that I am afraid it would > confuse readers. > > When a configuration variable gives an opportunity for the users to > override the hardcoded default (in this case, --no-changed-paths has > been the traditional default, (I'm pointing out this statement and how it's not quite right. I'll explain more fully lower in this reply.) > and graph.changedPaths=true would make > us pretend as if --changed-paths were given from the command line). > So if we were to have this configuration variable, setting it false > MUST make it pretend as if --no-changed-paths were given from the > command line, and MUST continue to do so even in some future we > changed the hardcoded default to be "true" (i.e., unless the user > says graph.changedPath=false in the configuration and/or declines > with "--no-changed-paths" from the command line, we will record the > changed paths filter by default). > > Setting commitGraph.changedPaths to true should mean that the > "git commit-graph write" command behaves as if --changed-paths > were given immediately after that "write", so that an end-user > commmand > > $ git commit-graph write > > should behave as if it was written like this > > $ git commit-graph write --changed-paths > > and > > $ git commit-graph write --no-changed-paths > > should behave as if it was written like this > > $ git commit-graph write --changed-paths --no-changed-paths > > i.e. allowing the command line --no-changed-paths to override it. > > Setting commitGraph.changedPaths to false should similarly mean that > "--no-changed-paths" implicitly is added immediately after "write", > meaning that > > $ git commit-graph write > > should behave as if it was written like this > > $ git commit-graph write --no-changed-paths One thing that is tricky about --[no-]changed-paths is that it is a "tri-state" argument due to 0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01): * --changed-paths : Definitely write the data, even if it didn't exist already. * --no-changed-paths : Definitely _don't_ write the data, even if it exists already. * (not present) : Update filters that do exist, but don't write them if they don't exist. This is reflected in how opts.enable_changed_paths is initialized to -1 in the existing version. Then, the config is loaded before the arguments are parsed (this is already enforcing the precedence of '--max-new-filters=<N>' over the 'commitGraph.maxNewFilters' config). Later, opts.enable_changed_paths is converted into COMMIT_GRAPH_WRITE_BLOOM_FILTERS or COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS flags for the underlying commit-graph API, with the default of -1 passing neither flag (which will use any existing commit-graph to persist and extend filters that already exist). The big reason for this is so users can use a foreground process to initialize filters, then background maintenance will respect and persist that behavior. The big change here is that the config allows a user to enable the filters and have them be computed entirely in the background. So I think this is the root of your concerns here. >> @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value, >> { >> if (!strcmp(var, "commitgraph.maxnewfilters")) >> write_opts.max_new_filters = git_config_int(var, value, ctx->kvi); >> + else if (!strcmp(var, "commitgraph.changedpaths")) >> + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1; > > This is iffy. > > Unless the way existing command line parser figures out if the user > wants or does not want to use the feature is so screwed up, you > shouldn't have to do any such thing. > > Why do you need to special case 'false' this way? The config now has this implication: * true : turn '(not present)' into '--changed-paths'. * false/unset : Continue to assume '(not present)'. And the typical case is that we would have 'false' imply '--no-changed-paths' which _removes_ filters that may exist. I could see a case for this. The situation that I wanted to think about was this: * A user sets the config to 'true' in global config. * They then set the config to 'false' in a specific repo. In this case, the 'false' _disables the config_ but doesn't cause any existing filters to be deleted. I hope this helps. I could see a case for 'false' implying '--no-changed-filters' but as Emily was investigating this and noting this discrepancy, we leaned in the direction of being non- destructive with the config. Thanks, -Stolee ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-10 12:48 ` Derrick Stolee @ 2025-10-10 16:32 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2025-10-10 16:32 UTC (permalink / raw) To: Derrick Stolee Cc: Emily Yang via GitGitGadget, git, me, ps, newren, Emily Yang Derrick Stolee <stolee@gmail.com> writes: > One thing that is tricky about --[no-]changed-paths is that it is a > "tri-state" argument due to 0087a87ba8 (commit-graph: persist > existence of changed-paths, 2020-07-01): > > * --changed-paths : Definitely write the data, even if it didn't > exist already. > > * --no-changed-paths : Definitely _don't_ write the data, even if > it exists already. > > * (not present) : Update filters that do exist, but don't write them > if they don't exist. OK, so "--no-" is not the usual "no"; it is more like "strip existing" that implies "even existing ones are getting nuked, there is no way we write new ones". OK, that may explain the construct I found funny. Thanks for clarifying. > The situation that I wanted to think about was this: > > * A user sets the config to 'true' in global config. > * They then set the config to 'false' in a specific repo. > > In this case, the 'false' _disables the config_ but doesn't cause > any existing filters to be deleted. Ouch, that hurts, as they expected this specific one would drop existing filters but that does not happen. Perhaps we need to strengthen the description of --no-* (if not renaming it to --drop-* or something to clarify what it really does). At least the configuration needs to be explained not like: "setting it to false is different from --no-changed-paths option". The documentation should not stop at saying what it is not, but should also say what it does. Perhaps "setting it to true is like always giving --changed-paths, setting it to false stops writing new changed paths filters, but without dropping existing changed paths filters" or something along that line. Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget 2025-10-09 22:30 ` Junio C Hamano @ 2025-10-10 12:32 ` Derrick Stolee 2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget 2 siblings, 0 replies; 9+ messages in thread From: Derrick Stolee @ 2025-10-10 12:32 UTC (permalink / raw) To: Emily Yang via GitGitGadget, git; +Cc: gitster, me, ps, newren, Emily Yang On 10/9/2025 5:01 PM, Emily Yang via GitGitGadget wrote: > From: Emily Yang <emilyyang.git@gmail.com> > Hello, > > I'm Emily and I'm interested in contributing to Git. This is my first > contribution to Git, super excited! > > I'm from Microsoft and spend most of my time working in the Office > MonoRepo (OMR, one of the largest repos in the world). Recently I've > been working with Derrick Stolee on Git performance related topics. We'd > love to propose a small enhancement on the existing changed-paths Bloom > filters feature to benefit large repos like OMR. Please kindly review > the code and provide your feedback! Congratulations on your first Git submission, Emily! For the rest on the list, Emily and I work together in support of engineering systems at Microsoft, and her team is particularly interested in Git performance for the Office monorepo. This first patch is hopefully one of many to follow as we build up more people with the right expertise to make changes to Git, especially at our boundaries of scale. I've already done a "pre-review" of this patch as part of mentoring Emily in her journey to Git contribution. Thanks, -Stolee ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget 2025-10-09 22:30 ` Junio C Hamano 2025-10-10 12:32 ` Derrick Stolee @ 2025-10-17 20:58 ` Emily Yang via GitGitGadget 2025-10-22 14:53 ` Derrick Stolee 2025-10-29 21:04 ` SZEDER Gábor 2 siblings, 2 replies; 9+ messages in thread From: Emily Yang via GitGitGadget @ 2025-10-17 20:58 UTC (permalink / raw) To: git; +Cc: gitster, stolee, me, ps, newren, Emily Yang, Emily Yang From: Emily Yang <emilyyang.git@gmail.com> The changed-path Bloom filters feature has proven stable and reliable over several years of use, delivering significant performance improvement for file history computation in large monorepos. Currently a user can opt-in to writing the changed-path Bloom filters using the "--changed-paths" option to "git commit-graph write". The filters will be persisted until the user drops the filters using the "--no-changed-paths" option. For this functionality, refer to 0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01). Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation. In this commit, we're proposing a new config option "commitGraph.changedPaths": * If "true", "git commit-graph write" will write Bloom filters, equivalent to passing "--changed-paths"; * If "false" or "unset", Bloom filters will be written during "git commit-graph write" only if the filters already exist in the current commit-graph file. This matches the default behaviour of "git commit-graph write" without any "--[no-]changed-paths" option. Note "false" can disable a previous "true" config value but doesn't imply "--no-changed-paths". This config will always respect the precedence of command line option "--[no-]changed-paths". We also set this new config as optional recommended config in scalar to turn on this feature for large repos. Helped-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Emily Yang <emilyyang.git@gmail.com> --- commit-graph: add new config for changed-paths & recommend it in scalar Hello, I'm Emily and I'm interested in contributing to Git. This is my first contribution to Git, super excited! I'm from Microsoft and spend most of my time working in the Office MonoRepo (OMR, one of the largest repos in the world). Recently I've been working with Derrick Stolee on Git performance related topics. We'd love to propose a small enhancement on the existing changed-paths Bloom filters feature to benefit large repos like OMR. Please kindly review the code and provide your feedback! What's included in v2: I received feedback about the confusion around the config explanation, so in v2 I added more clarification in the doc and commit message, hopefully it helps! Thanks, Emily Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1983 Range-diff vs v1: 1: 90b271e905 ! 1: 365db79f4d commit-graph: add new config for changed-paths & recommend it in scalar @@ Commit message a user can opt-in to writing the changed-path Bloom filters using the "--changed-paths" option to "git commit-graph write". The filters will be persisted until the user drops the filters using the - "--no-changed-paths" option. + "--no-changed-paths" option. For this functionality, refer to 0087a87ba8 + (commit-graph: persist existence of changed-paths, 2020-07-01). Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation. In this commit, we're proposing a new - config option "commitGraph.changedPaths" - "true" value acts like - "--changed-paths"; "false" disables a previous "true" config value but - doesn't imply "--no-changed-paths". This config will always respect the - precedence of command line option "--changed-paths" and - "--no-changed-paths". + config option "commitGraph.changedPaths": + + * If "true", "git commit-graph write" will write Bloom filters, + equivalent to passing "--changed-paths"; + * If "false" or "unset", Bloom filters will be written during "git + commit-graph write" only if the filters already exist in the current + commit-graph file. This matches the default behaviour of "git + commit-graph write" without any "--[no-]changed-paths" option. Note + "false" can disable a previous "true" config value but doesn't imply + "--no-changed-paths". + + This config will always respect the precedence of command line option + "--[no-]changed-paths". We also set this new config as optional recommended config in scalar to turn on this feature for large repos. @@ Documentation/config/commitgraph.adoc: commitGraph.maxNewFilters:: +commitGraph.changedPaths:: + If true, then `git commit-graph write` will compute and write + changed-path Bloom filters by default, equivalent to passing -+ `--changed-paths`. If false or unset, changed-path Bloom filters -+ will only be written when explicitly requested via `--changed-paths`. -+ Command-line options always take precedence over this configuration. -+ Defaults to unset. ++ `--changed-paths`. If false or unset, changed-paths Bloom filters will ++ be written during `git commit-graph write` only if the filters already ++ exist in the current commit-graph file. This matches the default ++ behavior of `git commit-graph write` without any `--[no-]changed-paths` ++ option. To rewrite a commit-graph file without any filters, use the ++ `--no-changed-paths` option. Command-line option `--[no-]changed-paths` ++ always takes precedence over this configuration. Defaults to unset. + commitGraph.readChangedPaths:: Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion + ## Documentation/git-commit-graph.adoc ## +@@ Documentation/git-commit-graph.adoc: take a while on large repositories. It provides significant performance gains + for getting history of a directory or a file with `git log -- <path>`. If + this option is given, future commit-graph writes will automatically assume + that this option was intended. Use `--no-changed-paths` to stop storing this +-data. ++data. `--changed-paths` is implied by config `commitGraph.changedPaths=true`. + + + With the `--max-new-filters=<n>` option, generate at most `n` new Bloom + filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is + ## builtin/commit-graph.c ## @@ builtin/commit-graph.c: static int git_commit_graph_write_config(const char *var, const char *value, { Documentation/config/commitgraph.adoc | 11 +++++++ Documentation/git-commit-graph.adoc | 2 +- builtin/commit-graph.c | 2 ++ scalar.c | 1 + t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++ 5 files changed, 59 insertions(+), 1 deletion(-) diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc index 7f8c9d6638..70a56c53d2 100644 --- a/Documentation/config/commitgraph.adoc +++ b/Documentation/config/commitgraph.adoc @@ -8,6 +8,17 @@ commitGraph.maxNewFilters:: Specifies the default value for the `--max-new-filters` option of `git commit-graph write` (c.f., linkgit:git-commit-graph[1]). +commitGraph.changedPaths:: + If true, then `git commit-graph write` will compute and write + changed-path Bloom filters by default, equivalent to passing + `--changed-paths`. If false or unset, changed-paths Bloom filters will + be written during `git commit-graph write` only if the filters already + exist in the current commit-graph file. This matches the default + behavior of `git commit-graph write` without any `--[no-]changed-paths` + option. To rewrite a commit-graph file without any filters, use the + `--no-changed-paths` option. Command-line option `--[no-]changed-paths` + always takes precedence over this configuration. Defaults to unset. + commitGraph.readChangedPaths:: Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion diff --git a/Documentation/git-commit-graph.adoc b/Documentation/git-commit-graph.adoc index e9558173c0..6d19026035 100644 --- a/Documentation/git-commit-graph.adoc +++ b/Documentation/git-commit-graph.adoc @@ -71,7 +71,7 @@ take a while on large repositories. It provides significant performance gains for getting history of a directory or a file with `git log -- <path>`. If this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this -data. +data. `--changed-paths` is implied by config `commitGraph.changedPaths=true`. + With the `--max-new-filters=<n>` option, generate at most `n` new Bloom filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index fe3ebaadad..d62005edc0 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value, { if (!strcmp(var, "commitgraph.maxnewfilters")) write_opts.max_new_filters = git_config_int(var, value, ctx->kvi); + else if (!strcmp(var, "commitgraph.changedpaths")) + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1; /* * No need to fall-back to 'git_default_config', since this was already * called in 'cmd_commit_graph()'. diff --git a/scalar.c b/scalar.c index 4a373c133d..f754311627 100644 --- a/scalar.c +++ b/scalar.c @@ -166,6 +166,7 @@ static int set_recommended_config(int reconfigure) #endif /* Optional */ { "status.aheadBehind", "false" }, + { "commitGraph.changedPaths", "true" }, { "commitGraph.generationVersion", "1" }, { "core.autoCRLF", "false" }, { "core.safeCRLF", "false" }, diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 0b3404f58f..98c6910963 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' ' ) ' +test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' ' + git init config-changed-paths && + ( + cd config-changed-paths && + + # commitGraph.changedPaths is not set and it should not write Bloom filters + test_commit first && + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && + test_grep ! "Bloom filters" error && + + # Set commitGraph.changedPaths to true and it should write Bloom filters + test_commit second && + git config commitGraph.changedPaths true && + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && + test_grep "Bloom filters" error && + + # Add one more config commitGraph.changedPaths as false to disable the previous true config value + # It should still write Bloom filters due to existing filters + test_commit third && + git config --add commitGraph.changedPaths false && + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && + test_grep "Bloom filters" error && + + # commitGraph.changedPaths is still false and command line options should take precedence + test_commit fourth && + GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error && + test_grep ! "Bloom filters" error && + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && + test_grep ! "Bloom filters" error && + + # commitGraph.changedPaths is all cleared and then set to false again, command line options should take precedence + test_commit fifth && + git config --unset-all commitGraph.changedPaths && + git config commitGraph.changedPaths false && + GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error && + test_grep "Bloom filters" error && + + # commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters + test_commit sixth && + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && + test_grep "Bloom filters" error + ) +' + test_done base-commit: 79cf913ea9321f774da29b2330b5781d5ff420ef -- gitgitgadget ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget @ 2025-10-22 14:53 ` Derrick Stolee 2025-10-22 17:42 ` Junio C Hamano 2025-10-29 21:04 ` SZEDER Gábor 1 sibling, 1 reply; 9+ messages in thread From: Derrick Stolee @ 2025-10-22 14:53 UTC (permalink / raw) To: Emily Yang via GitGitGadget, git; +Cc: gitster, me, ps, newren, Emily Yang On 10/17/2025 4:58 PM, Emily Yang via GitGitGadget wrote: > From: Emily Yang <emilyyang.git@gmail.com> > What's included in v2: > > I received feedback about the confusion around the config explanation, > so in v2 I added more clarification in the doc and commit message, > hopefully it helps! > > Thanks, Emily Thanks for these updates. I'm happy with the new version. Thanks, -Stolee ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-22 14:53 ` Derrick Stolee @ 2025-10-22 17:42 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2025-10-22 17:42 UTC (permalink / raw) To: Derrick Stolee Cc: Emily Yang via GitGitGadget, git, me, ps, newren, Emily Yang Derrick Stolee <stolee@gmail.com> writes: > On 10/17/2025 4:58 PM, Emily Yang via GitGitGadget wrote: >> From: Emily Yang <emilyyang.git@gmail.com> > >> What's included in v2: >> >> I received feedback about the confusion around the config explanation, >> so in v2 I added more clarification in the doc and commit message, >> hopefully it helps! >> >> Thanks, Emily > > Thanks for these updates. I'm happy with the new version. Thanks, both. Will queue and mark it for 'next'. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget 2025-10-22 14:53 ` Derrick Stolee @ 2025-10-29 21:04 ` SZEDER Gábor 1 sibling, 0 replies; 9+ messages in thread From: SZEDER Gábor @ 2025-10-29 21:04 UTC (permalink / raw) To: Emily Yang via GitGitGadget Cc: git, gitster, stolee, me, ps, newren, Emily Yang On Fri, Oct 17, 2025 at 08:58:59PM +0000, Emily Yang via GitGitGadget wrote: > From: Emily Yang <emilyyang.git@gmail.com> > > The changed-path Bloom filters feature has proven stable and reliable > over several years of use, delivering significant performance > improvement for file history computation in large monorepos. Currently > a user can opt-in to writing the changed-path Bloom filters using the > "--changed-paths" option to "git commit-graph write". The filters will > be persisted until the user drops the filters using the > "--no-changed-paths" option. For this functionality, refer to 0087a87ba8 > (commit-graph: persist existence of changed-paths, 2020-07-01). > > Large monorepos using Git's background maintenance to build and update > commit-graph files could use an easy switch to enable this feature > without a foreground computation. In this commit, we're proposing a new > config option "commitGraph.changedPaths": > > * If "true", "git commit-graph write" will write Bloom filters, > equivalent to passing "--changed-paths"; > * If "false" or "unset", Bloom filters will be written during "git > commit-graph write" only if the filters already exist in the current > commit-graph file. This matches the default behaviour of "git > commit-graph write" without any "--[no-]changed-paths" option. Note > "false" can disable a previous "true" config value but doesn't imply > "--no-changed-paths". So if the commit-graph contains changed path Bloom filters, and the user takes the effort, and explicitly sets this config variable to false, then Git will just ignore that, and will continue to waste resources to compute the Bloom filters?! This doesn't seems like a sensible behavior to me. > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index 0b3404f58f..98c6910963 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' ' > ) > ' > > +test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' ' > + git init config-changed-paths && > + ( > + cd config-changed-paths && > + > + # commitGraph.changedPaths is not set and it should not write Bloom filters > + test_commit first && > + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && > + test_grep ! "Bloom filters" error && > + > + # Set commitGraph.changedPaths to true and it should write Bloom filters > + test_commit second && > + git config commitGraph.changedPaths true && > + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && > + test_grep "Bloom filters" error && > + > + # Add one more config commitGraph.changedPaths as false to disable the previous true config value > + # It should still write Bloom filters due to existing filters > + test_commit third && > + git config --add commitGraph.changedPaths false && > + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && > + test_grep "Bloom filters" error && > + > + # commitGraph.changedPaths is still false and command line options should take precedence > + test_commit fourth && > + GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error && > + test_grep ! "Bloom filters" error && > + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && > + test_grep ! "Bloom filters" error && > + > + # commitGraph.changedPaths is all cleared and then set to false again, command line options should take precedence > + test_commit fifth && > + git config --unset-all commitGraph.changedPaths && > + git config commitGraph.changedPaths false && > + GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error && > + test_grep "Bloom filters" error && > + > + # commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters > + test_commit sixth && > + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error && > + test_grep "Bloom filters" error > + ) > +' The interaction of split commit-graphs and changed path Bloom filters used to be buggy, even after attempts to fix it. Therefore, I think this config variable should be tested with split commit-graphs as well. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-10-29 21:04 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget 2025-10-09 22:30 ` Junio C Hamano 2025-10-10 12:48 ` Derrick Stolee 2025-10-10 16:32 ` Junio C Hamano 2025-10-10 12:32 ` Derrick Stolee 2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget 2025-10-22 14:53 ` Derrick Stolee 2025-10-22 17:42 ` Junio C Hamano 2025-10-29 21:04 ` SZEDER Gábor
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).