From: Junio C Hamano <gitster@pobox.com>
To: "Emily Yang via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, stolee@gmail.com, me@ttaylorr.com,
ps@pks.im, newren@gmail.com,
Emily Yang <emilyyang.git@gmail.com>
Subject: Re: [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar
Date: Thu, 09 Oct 2025 15:30:14 -0700 [thread overview]
Message-ID: <xmqqecrbd7yh.fsf@gitster.g> (raw)
In-Reply-To: <pull.1983.git.1760043710502.gitgitgadget@gmail.com> (Emily Yang via GitGitGadget's message of "Thu, 09 Oct 2025 21:01:50 +0000")
"Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Emily Yang <emilyyang.git@gmail.com>
>
> The changed-path Bloom filters feature has proven stable and reliable
> over several years of use, delivering significant performance
> improvement for file history computation in large monorepos. Currently
> a user can opt-in to writing the changed-path Bloom filters using the
> "--changed-paths" option to "git commit-graph write". The filters will
> be persisted until the user drops the filters using the
> "--no-changed-paths" option.
Makes sense.
> Large monorepos using Git's background maintenance to build and update
> commit-graph files could use an easy switch to enable this feature
> without a foreground computation.
Again makes sense.
> In this commit, we're proposing a new
> config option "commitGraph.changedPaths" - "true" value acts like
> "--changed-paths"; "false" disables a previous "true" config value but
> doesn't imply "--no-changed-paths".
The way the above is phrased is so unusual that I am afraid it would
confuse readers.
When a configuration variable gives an opportunity for the users to
override the hardcoded default (in this case, --no-changed-paths has
been the traditional default, and graph.changedPaths=true would make
us pretend as if --changed-paths were given from the command line).
So if we were to have this configuration variable, setting it false
MUST make it pretend as if --no-changed-paths were given from the
command line, and MUST continue to do so even in some future we
changed the hardcoded default to be "true" (i.e., unless the user
says graph.changedPath=false in the configuration and/or declines
with "--no-changed-paths" from the command line, we will record the
changed paths filter by default).
Setting commitGraph.changedPaths to true should mean that the
"git commit-graph write" command behaves as if --changed-paths
were given immediately after that "write", so that an end-user
commmand
$ git commit-graph write
should behave as if it was written like this
$ git commit-graph write --changed-paths
and
$ git commit-graph write --no-changed-paths
should behave as if it was written like this
$ git commit-graph write --changed-paths --no-changed-paths
i.e. allowing the command line --no-changed-paths to override it.
Setting commitGraph.changedPaths to false should similarly mean that
"--no-changed-paths" implicitly is added immediately after "write",
meaning that
$ git commit-graph write
should behave as if it was written like this
$ git commit-graph write --no-changed-paths
As it is the default not to write changed-paths filter, this has no
effect, but I would say it still "implies" --no-changed-paths, and I
hope you'd agree once you imagine a hypothetical future in which the
default for "git commit-graph write" is to write changed-paths
filter by default.
> This config will always respect the
> precedence of command line option "--changed-paths" and
> "--no-changed-paths".
This is a bit unusual way to phrase this, but I think it makes sense
for the configuraiton variable to be overridden by the command line
option, as that is the bog-standard way configuration variables and
command line options interact with each other; it is so standard
that it is probably not even worth saying it.
> We also set this new config as optional recommended config in scalar to
> turn on this feature for large repos.
Great. Yes, from the start of the description above, anybody who is
aware of the "scalar" effort would be anticipating this conclusion.
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
> ---
> commit-graph: add new config for changed-paths & recommend it in scalar
>
> Hello,
>
> I'm Emily and I'm interested in contributing to Git. This is my first
> contribution to Git, super excited!
>
> I'm from Microsoft and spend most of my time working in the Office
> MonoRepo (OMR, one of the largest repos in the world). Recently I've
> been working with Derrick Stolee on Git performance related topics. We'd
> love to propose a small enhancement on the existing changed-paths Bloom
> filters feature to benefit large repos like OMR. Please kindly review
> the code and provide your feedback!
>
> Thanks, Emily
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1983
>
> Documentation/config/commitgraph.adoc | 8 +++++
> builtin/commit-graph.c | 2 ++
> scalar.c | 1 +
> t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++
> 4 files changed, 55 insertions(+)
>
> diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
> index 7f8c9d6638..c540e8a43d 100644
> --- a/Documentation/config/commitgraph.adoc
> +++ b/Documentation/config/commitgraph.adoc
> @@ -8,6 +8,14 @@ commitGraph.maxNewFilters::
> Specifies the default value for the `--max-new-filters` option of `git
> commit-graph write` (c.f., linkgit:git-commit-graph[1]).
>
> +commitGraph.changedPaths::
> + If true, then `git commit-graph write` will compute and write
> + changed-path Bloom filters by default, equivalent to passing
> + `--changed-paths`. If false or unset, changed-path Bloom filters
> + will only be written when explicitly requested via `--changed-paths`.
> + Command-line options always take precedence over this configuration.
> + Defaults to unset.
> +
> commitGraph.readChangedPaths::
> Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
> commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index fe3ebaadad..d62005edc0 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
> {
> if (!strcmp(var, "commitgraph.maxnewfilters"))
> write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
> + else if (!strcmp(var, "commitgraph.changedpaths"))
> + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
This is iffy.
Unless the way existing command line parser figures out if the user
wants or does not want to use the feature is so screwed up, you
shouldn't have to do any such thing.
Why do you need to special case 'false' this way? The usual
practice is
* First, you initialize the variable "enable_changed_paths" with the
hardcoded default. In this case, as changed-paths is not written
by default, you'd initialize it to 0 (not -1).
* Then you read from the configuration variables to update it. If
you see commitgraph.changedPaths configuration, you take its
value (either 0 or 1 as it is a Boolean variable) and overwrite
the hardcoded default in the "enable_changed_paths" variable. Otherwise
you leave "enable_changed_paths" as-is.
* If you also have environment variable override, then you see if
there is the environment variable you care about, and if so,
override "enable_changed_paths" with its value. Otherwise you leave
"enable_changed_paths" as-is.
* Finally you read from the command line options using
parse_options(). If there are command line options given,
"enable_changed_paths" would be overriden again.
If the way the existing parser sets up enable_changed_paths is
screwed up and does not follow the above pattern (I didn't check),
perhaps you'd need a preliminary clean-up patch before adding this
new feature.
Thanks.
next prev parent reply other threads:[~2025-10-09 22:30 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano [this message]
2025-10-10 12:48 ` Derrick Stolee
2025-10-10 16:32 ` Junio C Hamano
2025-10-10 12:32 ` Derrick Stolee
2025-10-17 20:58 ` [PATCH v2] " Emily Yang via GitGitGadget
2025-10-22 14:53 ` Derrick Stolee
2025-10-22 17:42 ` Junio C Hamano
2025-10-29 21:04 ` SZEDER Gábor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqecrbd7yh.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=emilyyang.git@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).