* [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
@ 2025-10-09 21:01 Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Emily Yang via GitGitGadget @ 2025-10-09 21:01 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, me, ps, newren, Emily Yang, Emily Yang
From: Emily Yang <emilyyang.git@gmail.com>
The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option.
Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths" - "true" value acts like
"--changed-paths"; "false" disables a previous "true" config value but
doesn't imply "--no-changed-paths". This config will always respect the
precedence of command line option "--changed-paths" and
"--no-changed-paths".
We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
---
commit-graph: add new config for changed-paths & recommend it in scalar
Hello,
I'm Emily and I'm interested in contributing to Git. This is my first
contribution to Git, super excited!
I'm from Microsoft and spend most of my time working in the Office
MonoRepo (OMR, one of the largest repos in the world). Recently I've
been working with Derrick Stolee on Git performance related topics. We'd
love to propose a small enhancement on the existing changed-paths Bloom
filters feature to benefit large repos like OMR. Please kindly review
the code and provide your feedback!
Thanks, Emily
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1983
Documentation/config/commitgraph.adoc | 8 +++++
builtin/commit-graph.c | 2 ++
scalar.c | 1 +
t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++
4 files changed, 55 insertions(+)
diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
index 7f8c9d6638..c540e8a43d 100644
--- a/Documentation/config/commitgraph.adoc
+++ b/Documentation/config/commitgraph.adoc
@@ -8,6 +8,14 @@ commitGraph.maxNewFilters::
Specifies the default value for the `--max-new-filters` option of `git
commit-graph write` (c.f., linkgit:git-commit-graph[1]).
+commitGraph.changedPaths::
+ If true, then `git commit-graph write` will compute and write
+ changed-path Bloom filters by default, equivalent to passing
+ `--changed-paths`. If false or unset, changed-path Bloom filters
+ will only be written when explicitly requested via `--changed-paths`.
+ Command-line options always take precedence over this configuration.
+ Defaults to unset.
+
commitGraph.readChangedPaths::
Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index fe3ebaadad..d62005edc0 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
{
if (!strcmp(var, "commitgraph.maxnewfilters"))
write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
+ else if (!strcmp(var, "commitgraph.changedpaths"))
+ opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
/*
* No need to fall-back to 'git_default_config', since this was already
* called in 'cmd_commit_graph()'.
diff --git a/scalar.c b/scalar.c
index 4a373c133d..f754311627 100644
--- a/scalar.c
+++ b/scalar.c
@@ -166,6 +166,7 @@ static int set_recommended_config(int reconfigure)
#endif
/* Optional */
{ "status.aheadBehind", "false" },
+ { "commitGraph.changedPaths", "true" },
{ "commitGraph.generationVersion", "1" },
{ "core.autoCRLF", "false" },
{ "core.safeCRLF", "false" },
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0b3404f58f..98c6910963 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
)
'
+test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' '
+ git init config-changed-paths &&
+ (
+ cd config-changed-paths &&
+
+ # commitGraph.changedPaths is not set and it should not write Bloom filters
+ test_commit first &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+
+ # Set commitGraph.changedPaths to true and it should write Bloom filters
+ test_commit second &&
+ git config commitGraph.changedPaths true &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # Add one more config commitGraph.changedPaths as false to disable the previous true config value
+ # It should still write Bloom filters due to existing filters
+ test_commit third &&
+ git config --add commitGraph.changedPaths false &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # commitGraph.changedPaths is still false and command line options should take precedence
+ test_commit fourth &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+
+ # commitGraph.changedPaths is all cleared and then set to false again, command line options should take precedence
+ test_commit fifth &&
+ git config --unset-all commitGraph.changedPaths &&
+ git config commitGraph.changedPaths false &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters
+ test_commit sixth &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error
+ )
+'
+
test_done
base-commit: 79cf913ea9321f774da29b2330b5781d5ff420ef
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget
@ 2025-10-09 22:30 ` Junio C Hamano
2025-10-10 12:48 ` Derrick Stolee
2025-10-10 12:32 ` Derrick Stolee
2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget
2 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2025-10-09 22:30 UTC (permalink / raw)
To: Emily Yang via GitGitGadget; +Cc: git, stolee, me, ps, newren, Emily Yang
"Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Emily Yang <emilyyang.git@gmail.com>
>
> The changed-path Bloom filters feature has proven stable and reliable
> over several years of use, delivering significant performance
> improvement for file history computation in large monorepos. Currently
> a user can opt-in to writing the changed-path Bloom filters using the
> "--changed-paths" option to "git commit-graph write". The filters will
> be persisted until the user drops the filters using the
> "--no-changed-paths" option.
Makes sense.
> Large monorepos using Git's background maintenance to build and update
> commit-graph files could use an easy switch to enable this feature
> without a foreground computation.
Again makes sense.
> In this commit, we're proposing a new
> config option "commitGraph.changedPaths" - "true" value acts like
> "--changed-paths"; "false" disables a previous "true" config value but
> doesn't imply "--no-changed-paths".
The way the above is phrased is so unusual that I am afraid it would
confuse readers.
When a configuration variable gives an opportunity for the users to
override the hardcoded default (in this case, --no-changed-paths has
been the traditional default, and graph.changedPaths=true would make
us pretend as if --changed-paths were given from the command line).
So if we were to have this configuration variable, setting it false
MUST make it pretend as if --no-changed-paths were given from the
command line, and MUST continue to do so even in some future we
changed the hardcoded default to be "true" (i.e., unless the user
says graph.changedPath=false in the configuration and/or declines
with "--no-changed-paths" from the command line, we will record the
changed paths filter by default).
Setting commitGraph.changedPaths to true should mean that the
"git commit-graph write" command behaves as if --changed-paths
were given immediately after that "write", so that an end-user
commmand
$ git commit-graph write
should behave as if it was written like this
$ git commit-graph write --changed-paths
and
$ git commit-graph write --no-changed-paths
should behave as if it was written like this
$ git commit-graph write --changed-paths --no-changed-paths
i.e. allowing the command line --no-changed-paths to override it.
Setting commitGraph.changedPaths to false should similarly mean that
"--no-changed-paths" implicitly is added immediately after "write",
meaning that
$ git commit-graph write
should behave as if it was written like this
$ git commit-graph write --no-changed-paths
As it is the default not to write changed-paths filter, this has no
effect, but I would say it still "implies" --no-changed-paths, and I
hope you'd agree once you imagine a hypothetical future in which the
default for "git commit-graph write" is to write changed-paths
filter by default.
> This config will always respect the
> precedence of command line option "--changed-paths" and
> "--no-changed-paths".
This is a bit unusual way to phrase this, but I think it makes sense
for the configuraiton variable to be overridden by the command line
option, as that is the bog-standard way configuration variables and
command line options interact with each other; it is so standard
that it is probably not even worth saying it.
> We also set this new config as optional recommended config in scalar to
> turn on this feature for large repos.
Great. Yes, from the start of the description above, anybody who is
aware of the "scalar" effort would be anticipating this conclusion.
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
> ---
> commit-graph: add new config for changed-paths & recommend it in scalar
>
> Hello,
>
> I'm Emily and I'm interested in contributing to Git. This is my first
> contribution to Git, super excited!
>
> I'm from Microsoft and spend most of my time working in the Office
> MonoRepo (OMR, one of the largest repos in the world). Recently I've
> been working with Derrick Stolee on Git performance related topics. We'd
> love to propose a small enhancement on the existing changed-paths Bloom
> filters feature to benefit large repos like OMR. Please kindly review
> the code and provide your feedback!
>
> Thanks, Emily
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1983
>
> Documentation/config/commitgraph.adoc | 8 +++++
> builtin/commit-graph.c | 2 ++
> scalar.c | 1 +
> t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++
> 4 files changed, 55 insertions(+)
>
> diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
> index 7f8c9d6638..c540e8a43d 100644
> --- a/Documentation/config/commitgraph.adoc
> +++ b/Documentation/config/commitgraph.adoc
> @@ -8,6 +8,14 @@ commitGraph.maxNewFilters::
> Specifies the default value for the `--max-new-filters` option of `git
> commit-graph write` (c.f., linkgit:git-commit-graph[1]).
>
> +commitGraph.changedPaths::
> + If true, then `git commit-graph write` will compute and write
> + changed-path Bloom filters by default, equivalent to passing
> + `--changed-paths`. If false or unset, changed-path Bloom filters
> + will only be written when explicitly requested via `--changed-paths`.
> + Command-line options always take precedence over this configuration.
> + Defaults to unset.
> +
> commitGraph.readChangedPaths::
> Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
> commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index fe3ebaadad..d62005edc0 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
> {
> if (!strcmp(var, "commitgraph.maxnewfilters"))
> write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
> + else if (!strcmp(var, "commitgraph.changedpaths"))
> + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
This is iffy.
Unless the way existing command line parser figures out if the user
wants or does not want to use the feature is so screwed up, you
shouldn't have to do any such thing.
Why do you need to special case 'false' this way? The usual
practice is
* First, you initialize the variable "enable_changed_paths" with the
hardcoded default. In this case, as changed-paths is not written
by default, you'd initialize it to 0 (not -1).
* Then you read from the configuration variables to update it. If
you see commitgraph.changedPaths configuration, you take its
value (either 0 or 1 as it is a Boolean variable) and overwrite
the hardcoded default in the "enable_changed_paths" variable. Otherwise
you leave "enable_changed_paths" as-is.
* If you also have environment variable override, then you see if
there is the environment variable you care about, and if so,
override "enable_changed_paths" with its value. Otherwise you leave
"enable_changed_paths" as-is.
* Finally you read from the command line options using
parse_options(). If there are command line options given,
"enable_changed_paths" would be overriden again.
If the way the existing parser sets up enable_changed_paths is
screwed up and does not follow the above pattern (I didn't check),
perhaps you'd need a preliminary clean-up patch before adding this
new feature.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano
@ 2025-10-10 12:32 ` Derrick Stolee
2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget
2 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2025-10-10 12:32 UTC (permalink / raw)
To: Emily Yang via GitGitGadget, git; +Cc: gitster, me, ps, newren, Emily Yang
On 10/9/2025 5:01 PM, Emily Yang via GitGitGadget wrote:
> From: Emily Yang <emilyyang.git@gmail.com>
> Hello,
>
> I'm Emily and I'm interested in contributing to Git. This is my first
> contribution to Git, super excited!
>
> I'm from Microsoft and spend most of my time working in the Office
> MonoRepo (OMR, one of the largest repos in the world). Recently I've
> been working with Derrick Stolee on Git performance related topics. We'd
> love to propose a small enhancement on the existing changed-paths Bloom
> filters feature to benefit large repos like OMR. Please kindly review
> the code and provide your feedback!
Congratulations on your first Git submission, Emily!
For the rest on the list, Emily and I work together in support of engineering
systems at Microsoft, and her team is particularly interested in Git
performance for the Office monorepo. This first patch is hopefully one of
many to follow as we build up more people with the right expertise to make
changes to Git, especially at our boundaries of scale.
I've already done a "pre-review" of this patch as part of mentoring Emily in
her journey to Git contribution.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-09 22:30 ` Junio C Hamano
@ 2025-10-10 12:48 ` Derrick Stolee
2025-10-10 16:32 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2025-10-10 12:48 UTC (permalink / raw)
To: Junio C Hamano, Emily Yang via GitGitGadget
Cc: git, me, ps, newren, Emily Yang
On 10/9/2025 6:30 PM, Junio C Hamano wrote:
> "Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Emily Yang <emilyyang.git@gmail.com>
>> In this commit, we're proposing a new
>> config option "commitGraph.changedPaths" - "true" value acts like
>> "--changed-paths"; "false" disables a previous "true" config value but
>> doesn't imply "--no-changed-paths".
>
> The way the above is phrased is so unusual that I am afraid it would
> confuse readers.
>
> When a configuration variable gives an opportunity for the users to
> override the hardcoded default (in this case, --no-changed-paths has
> been the traditional default,
(I'm pointing out this statement and how it's not quite right. I'll
explain more fully lower in this reply.)
> and graph.changedPaths=true would make
> us pretend as if --changed-paths were given from the command line).
> So if we were to have this configuration variable, setting it false
> MUST make it pretend as if --no-changed-paths were given from the
> command line, and MUST continue to do so even in some future we
> changed the hardcoded default to be "true" (i.e., unless the user
> says graph.changedPath=false in the configuration and/or declines
> with "--no-changed-paths" from the command line, we will record the
> changed paths filter by default).
>
> Setting commitGraph.changedPaths to true should mean that the
> "git commit-graph write" command behaves as if --changed-paths
> were given immediately after that "write", so that an end-user
> commmand
>
> $ git commit-graph write
>
> should behave as if it was written like this
>
> $ git commit-graph write --changed-paths
>
> and
>
> $ git commit-graph write --no-changed-paths
>
> should behave as if it was written like this
>
> $ git commit-graph write --changed-paths --no-changed-paths
>
> i.e. allowing the command line --no-changed-paths to override it.
>
> Setting commitGraph.changedPaths to false should similarly mean that
> "--no-changed-paths" implicitly is added immediately after "write",
> meaning that
>
> $ git commit-graph write
>
> should behave as if it was written like this
>
> $ git commit-graph write --no-changed-paths
One thing that is tricky about --[no-]changed-paths is that it is a
"tri-state" argument due to 0087a87ba8 (commit-graph: persist
existence of changed-paths, 2020-07-01):
* --changed-paths : Definitely write the data, even if it didn't
exist already.
* --no-changed-paths : Definitely _don't_ write the data, even if
it exists already.
* (not present) : Update filters that do exist, but don't write them
if they don't exist.
This is reflected in how opts.enable_changed_paths is initialized to
-1 in the existing version. Then, the config is loaded before the
arguments are parsed (this is already enforcing the precedence of
'--max-new-filters=<N>' over the 'commitGraph.maxNewFilters' config).
Later, opts.enable_changed_paths is converted into
COMMIT_GRAPH_WRITE_BLOOM_FILTERS or COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS
flags for the underlying commit-graph API, with the default of -1
passing neither flag (which will use any existing commit-graph to
persist and extend filters that already exist).
The big reason for this is so users can use a foreground process to
initialize filters, then background maintenance will respect and persist
that behavior. The big change here is that the config allows a user to
enable the filters and have them be computed entirely in the background.
So I think this is the root of your concerns here.
>> @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
>> {
>> if (!strcmp(var, "commitgraph.maxnewfilters"))
>> write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
>> + else if (!strcmp(var, "commitgraph.changedpaths"))
>> + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
>
> This is iffy.
>
> Unless the way existing command line parser figures out if the user
> wants or does not want to use the feature is so screwed up, you
> shouldn't have to do any such thing.
>
> Why do you need to special case 'false' this way?
The config now has this implication:
* true : turn '(not present)' into '--changed-paths'.
* false/unset : Continue to assume '(not present)'.
And the typical case is that we would have 'false' imply
'--no-changed-paths' which _removes_ filters that may exist. I
could see a case for this.
The situation that I wanted to think about was this:
* A user sets the config to 'true' in global config.
* They then set the config to 'false' in a specific repo.
In this case, the 'false' _disables the config_ but doesn't cause
any existing filters to be deleted.
I hope this helps. I could see a case for 'false' implying
'--no-changed-filters' but as Emily was investigating this and
noting this discrepancy, we leaned in the direction of being non-
destructive with the config.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-10 12:48 ` Derrick Stolee
@ 2025-10-10 16:32 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2025-10-10 16:32 UTC (permalink / raw)
To: Derrick Stolee
Cc: Emily Yang via GitGitGadget, git, me, ps, newren, Emily Yang
Derrick Stolee <stolee@gmail.com> writes:
> One thing that is tricky about --[no-]changed-paths is that it is a
> "tri-state" argument due to 0087a87ba8 (commit-graph: persist
> existence of changed-paths, 2020-07-01):
>
> * --changed-paths : Definitely write the data, even if it didn't
> exist already.
>
> * --no-changed-paths : Definitely _don't_ write the data, even if
> it exists already.
>
> * (not present) : Update filters that do exist, but don't write them
> if they don't exist.
OK, so "--no-" is not the usual "no"; it is more like "strip
existing" that implies "even existing ones are getting nuked, there
is no way we write new ones". OK, that may explain the construct I
found funny. Thanks for clarifying.
> The situation that I wanted to think about was this:
>
> * A user sets the config to 'true' in global config.
> * They then set the config to 'false' in a specific repo.
>
> In this case, the 'false' _disables the config_ but doesn't cause
> any existing filters to be deleted.
Ouch, that hurts, as they expected this specific one would drop
existing filters but that does not happen.
Perhaps we need to strengthen the description of --no-* (if not
renaming it to --drop-* or something to clarify what it really
does).
At least the configuration needs to be explained not like: "setting
it to false is different from --no-changed-paths option". The
documentation should not stop at saying what it is not, but should
also say what it does. Perhaps "setting it to true is like always
giving --changed-paths, setting it to false stops writing new
changed paths filters, but without dropping existing changed paths
filters" or something along that line.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano
2025-10-10 12:32 ` Derrick Stolee
@ 2025-10-17 20:58 ` Emily Yang via GitGitGadget
2025-10-22 14:53 ` Derrick Stolee
2025-10-29 21:04 ` SZEDER Gábor
2 siblings, 2 replies; 9+ messages in thread
From: Emily Yang via GitGitGadget @ 2025-10-17 20:58 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, me, ps, newren, Emily Yang, Emily Yang
From: Emily Yang <emilyyang.git@gmail.com>
The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option. For this functionality, refer to 0087a87ba8
(commit-graph: persist existence of changed-paths, 2020-07-01).
Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths":
* If "true", "git commit-graph write" will write Bloom filters,
equivalent to passing "--changed-paths";
* If "false" or "unset", Bloom filters will be written during "git
commit-graph write" only if the filters already exist in the current
commit-graph file. This matches the default behaviour of "git
commit-graph write" without any "--[no-]changed-paths" option. Note
"false" can disable a previous "true" config value but doesn't imply
"--no-changed-paths".
This config will always respect the precedence of command line option
"--[no-]changed-paths".
We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
---
commit-graph: add new config for changed-paths & recommend it in scalar
Hello,
I'm Emily and I'm interested in contributing to Git. This is my first
contribution to Git, super excited!
I'm from Microsoft and spend most of my time working in the Office
MonoRepo (OMR, one of the largest repos in the world). Recently I've
been working with Derrick Stolee on Git performance related topics. We'd
love to propose a small enhancement on the existing changed-paths Bloom
filters feature to benefit large repos like OMR. Please kindly review
the code and provide your feedback!
What's included in v2:
I received feedback about the confusion around the config explanation,
so in v2 I added more clarification in the doc and commit message,
hopefully it helps!
Thanks, Emily
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1983
Range-diff vs v1:
1: 90b271e905 ! 1: 365db79f4d commit-graph: add new config for changed-paths & recommend it in scalar
@@ Commit message
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
- "--no-changed-paths" option.
+ "--no-changed-paths" option. For this functionality, refer to 0087a87ba8
+ (commit-graph: persist existence of changed-paths, 2020-07-01).
Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
- config option "commitGraph.changedPaths" - "true" value acts like
- "--changed-paths"; "false" disables a previous "true" config value but
- doesn't imply "--no-changed-paths". This config will always respect the
- precedence of command line option "--changed-paths" and
- "--no-changed-paths".
+ config option "commitGraph.changedPaths":
+
+ * If "true", "git commit-graph write" will write Bloom filters,
+ equivalent to passing "--changed-paths";
+ * If "false" or "unset", Bloom filters will be written during "git
+ commit-graph write" only if the filters already exist in the current
+ commit-graph file. This matches the default behaviour of "git
+ commit-graph write" without any "--[no-]changed-paths" option. Note
+ "false" can disable a previous "true" config value but doesn't imply
+ "--no-changed-paths".
+
+ This config will always respect the precedence of command line option
+ "--[no-]changed-paths".
We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.
@@ Documentation/config/commitgraph.adoc: commitGraph.maxNewFilters::
+commitGraph.changedPaths::
+ If true, then `git commit-graph write` will compute and write
+ changed-path Bloom filters by default, equivalent to passing
-+ `--changed-paths`. If false or unset, changed-path Bloom filters
-+ will only be written when explicitly requested via `--changed-paths`.
-+ Command-line options always take precedence over this configuration.
-+ Defaults to unset.
++ `--changed-paths`. If false or unset, changed-paths Bloom filters will
++ be written during `git commit-graph write` only if the filters already
++ exist in the current commit-graph file. This matches the default
++ behavior of `git commit-graph write` without any `--[no-]changed-paths`
++ option. To rewrite a commit-graph file without any filters, use the
++ `--no-changed-paths` option. Command-line option `--[no-]changed-paths`
++ always takes precedence over this configuration. Defaults to unset.
+
commitGraph.readChangedPaths::
Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
+ ## Documentation/git-commit-graph.adoc ##
+@@ Documentation/git-commit-graph.adoc: take a while on large repositories. It provides significant performance gains
+ for getting history of a directory or a file with `git log -- <path>`. If
+ this option is given, future commit-graph writes will automatically assume
+ that this option was intended. Use `--no-changed-paths` to stop storing this
+-data.
++data. `--changed-paths` is implied by config `commitGraph.changedPaths=true`.
+ +
+ With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
+ filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
+
## builtin/commit-graph.c ##
@@ builtin/commit-graph.c: static int git_commit_graph_write_config(const char *var, const char *value,
{
Documentation/config/commitgraph.adoc | 11 +++++++
Documentation/git-commit-graph.adoc | 2 +-
builtin/commit-graph.c | 2 ++
scalar.c | 1 +
t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++
5 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
index 7f8c9d6638..70a56c53d2 100644
--- a/Documentation/config/commitgraph.adoc
+++ b/Documentation/config/commitgraph.adoc
@@ -8,6 +8,17 @@ commitGraph.maxNewFilters::
Specifies the default value for the `--max-new-filters` option of `git
commit-graph write` (c.f., linkgit:git-commit-graph[1]).
+commitGraph.changedPaths::
+ If true, then `git commit-graph write` will compute and write
+ changed-path Bloom filters by default, equivalent to passing
+ `--changed-paths`. If false or unset, changed-paths Bloom filters will
+ be written during `git commit-graph write` only if the filters already
+ exist in the current commit-graph file. This matches the default
+ behavior of `git commit-graph write` without any `--[no-]changed-paths`
+ option. To rewrite a commit-graph file without any filters, use the
+ `--no-changed-paths` option. Command-line option `--[no-]changed-paths`
+ always takes precedence over this configuration. Defaults to unset.
+
commitGraph.readChangedPaths::
Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
diff --git a/Documentation/git-commit-graph.adoc b/Documentation/git-commit-graph.adoc
index e9558173c0..6d19026035 100644
--- a/Documentation/git-commit-graph.adoc
+++ b/Documentation/git-commit-graph.adoc
@@ -71,7 +71,7 @@ take a while on large repositories. It provides significant performance gains
for getting history of a directory or a file with `git log -- <path>`. If
this option is given, future commit-graph writes will automatically assume
that this option was intended. Use `--no-changed-paths` to stop storing this
-data.
+data. `--changed-paths` is implied by config `commitGraph.changedPaths=true`.
+
With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index fe3ebaadad..d62005edc0 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
{
if (!strcmp(var, "commitgraph.maxnewfilters"))
write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
+ else if (!strcmp(var, "commitgraph.changedpaths"))
+ opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
/*
* No need to fall-back to 'git_default_config', since this was already
* called in 'cmd_commit_graph()'.
diff --git a/scalar.c b/scalar.c
index 4a373c133d..f754311627 100644
--- a/scalar.c
+++ b/scalar.c
@@ -166,6 +166,7 @@ static int set_recommended_config(int reconfigure)
#endif
/* Optional */
{ "status.aheadBehind", "false" },
+ { "commitGraph.changedPaths", "true" },
{ "commitGraph.generationVersion", "1" },
{ "core.autoCRLF", "false" },
{ "core.safeCRLF", "false" },
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0b3404f58f..98c6910963 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
)
'
+test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' '
+ git init config-changed-paths &&
+ (
+ cd config-changed-paths &&
+
+ # commitGraph.changedPaths is not set and it should not write Bloom filters
+ test_commit first &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+
+ # Set commitGraph.changedPaths to true and it should write Bloom filters
+ test_commit second &&
+ git config commitGraph.changedPaths true &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # Add one more config commitGraph.changedPaths as false to disable the previous true config value
+ # It should still write Bloom filters due to existing filters
+ test_commit third &&
+ git config --add commitGraph.changedPaths false &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # commitGraph.changedPaths is still false and command line options should take precedence
+ test_commit fourth &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep ! "Bloom filters" error &&
+
+ # commitGraph.changedPaths is all cleared and then set to false again, command line options should take precedence
+ test_commit fifth &&
+ git config --unset-all commitGraph.changedPaths &&
+ git config commitGraph.changedPaths false &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error &&
+ test_grep "Bloom filters" error &&
+
+ # commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters
+ test_commit sixth &&
+ GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+ test_grep "Bloom filters" error
+ )
+'
+
test_done
base-commit: 79cf913ea9321f774da29b2330b5781d5ff420ef
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget
@ 2025-10-22 14:53 ` Derrick Stolee
2025-10-22 17:42 ` Junio C Hamano
2025-10-29 21:04 ` SZEDER Gábor
1 sibling, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2025-10-22 14:53 UTC (permalink / raw)
To: Emily Yang via GitGitGadget, git; +Cc: gitster, me, ps, newren, Emily Yang
On 10/17/2025 4:58 PM, Emily Yang via GitGitGadget wrote:
> From: Emily Yang <emilyyang.git@gmail.com>
> What's included in v2:
>
> I received feedback about the confusion around the config explanation,
> so in v2 I added more clarification in the doc and commit message,
> hopefully it helps!
>
> Thanks, Emily
Thanks for these updates. I'm happy with the new version.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-22 14:53 ` Derrick Stolee
@ 2025-10-22 17:42 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2025-10-22 17:42 UTC (permalink / raw)
To: Derrick Stolee
Cc: Emily Yang via GitGitGadget, git, me, ps, newren, Emily Yang
Derrick Stolee <stolee@gmail.com> writes:
> On 10/17/2025 4:58 PM, Emily Yang via GitGitGadget wrote:
>> From: Emily Yang <emilyyang.git@gmail.com>
>
>> What's included in v2:
>>
>> I received feedback about the confusion around the config explanation,
>> so in v2 I added more clarification in the doc and commit message,
>> hopefully it helps!
>>
>> Thanks, Emily
>
> Thanks for these updates. I'm happy with the new version.
Thanks, both. Will queue and mark it for 'next'.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget
2025-10-22 14:53 ` Derrick Stolee
@ 2025-10-29 21:04 ` SZEDER Gábor
1 sibling, 0 replies; 9+ messages in thread
From: SZEDER Gábor @ 2025-10-29 21:04 UTC (permalink / raw)
To: Emily Yang via GitGitGadget
Cc: git, gitster, stolee, me, ps, newren, Emily Yang
On Fri, Oct 17, 2025 at 08:58:59PM +0000, Emily Yang via GitGitGadget wrote:
> From: Emily Yang <emilyyang.git@gmail.com>
>
> The changed-path Bloom filters feature has proven stable and reliable
> over several years of use, delivering significant performance
> improvement for file history computation in large monorepos. Currently
> a user can opt-in to writing the changed-path Bloom filters using the
> "--changed-paths" option to "git commit-graph write". The filters will
> be persisted until the user drops the filters using the
> "--no-changed-paths" option. For this functionality, refer to 0087a87ba8
> (commit-graph: persist existence of changed-paths, 2020-07-01).
>
> Large monorepos using Git's background maintenance to build and update
> commit-graph files could use an easy switch to enable this feature
> without a foreground computation. In this commit, we're proposing a new
> config option "commitGraph.changedPaths":
>
> * If "true", "git commit-graph write" will write Bloom filters,
> equivalent to passing "--changed-paths";
> * If "false" or "unset", Bloom filters will be written during "git
> commit-graph write" only if the filters already exist in the current
> commit-graph file. This matches the default behaviour of "git
> commit-graph write" without any "--[no-]changed-paths" option. Note
> "false" can disable a previous "true" config value but doesn't imply
> "--no-changed-paths".
So if the commit-graph contains changed path Bloom filters, and the
user takes the effort, and explicitly sets this config variable to
false, then Git will just ignore that, and will continue to waste
resources to compute the Bloom filters?! This doesn't seems like a
sensible behavior to me.
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 0b3404f58f..98c6910963 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
> )
> '
>
> +test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' '
> + git init config-changed-paths &&
> + (
> + cd config-changed-paths &&
> +
> + # commitGraph.changedPaths is not set and it should not write Bloom filters
> + test_commit first &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
> + test_grep ! "Bloom filters" error &&
> +
> + # Set commitGraph.changedPaths to true and it should write Bloom filters
> + test_commit second &&
> + git config commitGraph.changedPaths true &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
> + test_grep "Bloom filters" error &&
> +
> + # Add one more config commitGraph.changedPaths as false to disable the previous true config value
> + # It should still write Bloom filters due to existing filters
> + test_commit third &&
> + git config --add commitGraph.changedPaths false &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
> + test_grep "Bloom filters" error &&
> +
> + # commitGraph.changedPaths is still false and command line options should take precedence
> + test_commit fourth &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
> + test_grep ! "Bloom filters" error &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
> + test_grep ! "Bloom filters" error &&
> +
> + # commitGraph.changedPaths is all cleared and then set to false again, command line options should take precedence
> + test_commit fifth &&
> + git config --unset-all commitGraph.changedPaths &&
> + git config commitGraph.changedPaths false &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error &&
> + test_grep "Bloom filters" error &&
> +
> + # commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters
> + test_commit sixth &&
> + GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
> + test_grep "Bloom filters" error
> + )
> +'
The interaction of split commit-graphs and changed path Bloom filters
used to be buggy, even after attempts to fix it. Therefore, I think
this config variable should be tested with split commit-graphs as
well.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-10-29 21:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano
2025-10-10 12:48 ` Derrick Stolee
2025-10-10 16:32 ` Junio C Hamano
2025-10-10 12:32 ` Derrick Stolee
2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget
2025-10-22 14:53 ` Derrick Stolee
2025-10-22 17:42 ` Junio C Hamano
2025-10-29 21:04 ` SZEDER Gábor
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).