git.vger.kernel.org archive mirror
From: "Emily Yang via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, stolee@gmail.com, me@ttaylorr.com, ps@pks.im,
	newren@gmail.com, Emily Yang <emilyyang.git@gmail.com>,
	Emily Yang <emilyyang.git@gmail.com>
Subject: [PATCH v2] commit-graph: add new config for changed-paths & recommend it in scalar
Date: Fri, 17 Oct 2025 20:58:59 +0000
Message-ID: <pull.1983.v2.git.1760734739642.gitgitgadget@gmail.com>
In-Reply-To: <pull.1983.git.1760043710502.gitgitgadget@gmail.com>

From: Emily Yang <emilyyang.git@gmail.com>

The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvements for file history computation in large monorepos. Currently
a user can opt in to writing changed-path Bloom filters by passing the
"--changed-paths" option to "git commit-graph write". The filters are
then persisted across subsequent writes until the user drops them with
the "--no-changed-paths" option; see 0087a87ba8 (commit-graph: persist
existence of changed-paths, 2020-07-01).

Large monorepos that use Git's background maintenance to build and
update commit-graph files could use an easy switch to enable this
feature without running a foreground computation. This commit proposes
a new config option, "commitGraph.changedPaths":

* If true, "git commit-graph write" writes Bloom filters, equivalent
  to passing "--changed-paths".
* If false or unset, "git commit-graph write" writes Bloom filters
  only if they already exist in the current commit-graph file. This
  matches the default behavior of "git commit-graph write" without
  any "--[no-]changed-paths" option. Note that "false" can override an
  earlier "true" value, but it does not imply "--no-changed-paths".

The command-line option "--[no-]changed-paths" always takes precedence
over this config.

We also add this config to scalar's optional recommended config, so
that the feature is turned on for large repos.
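
The rules above amount to a small decision procedure. The following is a minimal sketch in C under stated assumptions. It is not Git's implementation, and the names CP_UNSET, resolve_setting() and writes_bloom_filters() are hypothetical. The -1/0/1 encoding mirrors how this patch stores the setting in opts.enable_changed_paths. The existing_filters argument stands in for Git's own check of the current commit-graph file. Config values are applied in the order they are read, so the last one wins. A command-line option, when given, overrides all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: the same values the patch stores in opts.enable_changed_paths. */
enum { CP_UNSET = -1, CP_OFF = 0, CP_ON = 1 };

/*
 * Fold the commitGraph.changedPaths values (in the order config is read),
 * then apply the command-line flag, if one was given.
 */
static int resolve_setting(const bool *config_values, size_t n_config,
			   int cli_setting)
{
	int setting = CP_UNSET;

	/* Later values overwrite earlier ones; "false" resets to unset, not to CP_OFF. */
	for (size_t i = 0; i < n_config; i++)
		setting = config_values[i] ? CP_ON : CP_UNSET;

	/* --changed-paths / --no-changed-paths always win over config. */
	if (cli_setting != CP_UNSET)
		setting = cli_setting;
	return setting;
}

/* Model whether one "git commit-graph write" would write Bloom filters. */
static bool writes_bloom_filters(int setting, bool existing_filters)
{
	assert(setting == CP_UNSET || setting == CP_OFF || setting == CP_ON);
	if (setting == CP_ON)
		return true;
	if (setting == CP_OFF)
		return false;
	/* Unset: keep filters only if the current commit-graph already has them. */
	return existing_filters;
}
```

Consider config values true followed by false, where the existing commit-graph already has filters. In that case resolve_setting() returns CP_UNSET and writes_bloom_filters() returns true. This matches the third step of the new t5318 test.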

Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
---
    commit-graph: add new config for changed-paths & recommend it in scalar
    
    Hello,
    
    I'm Emily and I'm interested in contributing to Git. This is my first
    contribution to Git, and I'm super excited!
    
    I'm from Microsoft and spend most of my time working in the Office
    MonoRepo (OMR, one of the largest repos in the world). Recently I've
    been working with Derrick Stolee on Git performance-related topics. We'd
    love to propose a small enhancement to the existing changed-path Bloom
    filters feature to benefit large repos like OMR. Please kindly review
    the code and provide your feedback!
    
    What's included in v2:
    
    I received feedback that the config explanation was confusing, so in
    v2 I clarified it in both the documentation and the commit message. I
    hope it helps!
    
    Thanks, Emily

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1983

Range-diff vs v1:

 1:  90b271e905 ! 1:  365db79f4d commit-graph: add new config for changed-paths & recommend it in scalar
     @@ Commit message
          a user can opt-in to writing the changed-path Bloom filters using the
          "--changed-paths" option to "git commit-graph write". The filters will
          be persisted until the user drops the filters using the
     -    "--no-changed-paths" option.
     +    "--no-changed-paths" option. For this functionality, refer to 0087a87ba8
     +    (commit-graph: persist existence of changed-paths, 2020-07-01).
      
          Large monorepos using Git's background maintenance to build and update
          commit-graph files could use an easy switch to enable this feature
          without a foreground computation. In this commit, we're proposing a new
     -    config option "commitGraph.changedPaths" - "true" value acts like
     -    "--changed-paths"; "false" disables a previous "true" config value but
     -    doesn't imply "--no-changed-paths". This config will always respect the
     -    precedence of command line option "--changed-paths" and
     -    "--no-changed-paths".
     +    config option "commitGraph.changedPaths":
     +
     +    * If "true", "git commit-graph write" will write Bloom filters,
     +      equivalent to passing "--changed-paths";
     +    * If "false" or "unset", Bloom filters will be written during "git
     +      commit-graph write" only if the filters already exist in the current
     +      commit-graph file. This matches the default behaviour of "git
     +      commit-graph write" without any "--[no-]changed-paths" option. Note
     +      "false" can disable a previous "true" config value but doesn't imply
     +      "--no-changed-paths".
     +
     +    This config will always respect the precedence of command line option
     +    "--[no-]changed-paths".
      
          We also set this new config as optional recommended config in scalar to
          turn on this feature for large repos.
     @@ Documentation/config/commitgraph.adoc: commitGraph.maxNewFilters::
      +commitGraph.changedPaths::
      +	If true, then `git commit-graph write` will compute and write
      +	changed-path Bloom filters by default, equivalent to passing
     -+	`--changed-paths`. If false or unset, changed-path Bloom filters
     -+	will only be written when explicitly requested via `--changed-paths`.
     -+	Command-line options always take precedence over this configuration.
     -+	Defaults to unset.
     ++	`--changed-paths`. If false or unset, changed-paths Bloom filters will
     ++	be written during `git commit-graph write` only if the filters already
     ++	exist in the current commit-graph file. This matches the default
     ++	behavior of `git commit-graph write` without any `--[no-]changed-paths`
     ++	option. To rewrite a commit-graph file without any filters, use the
     ++	`--no-changed-paths` option. Command-line option `--[no-]changed-paths`
     ++	always takes precedence over this configuration. Defaults to unset.
      +
       commitGraph.readChangedPaths::
       	Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
       	commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
      
     + ## Documentation/git-commit-graph.adoc ##
     +@@ Documentation/git-commit-graph.adoc: take a while on large repositories. It provides significant performance gains
     + for getting history of a directory or a file with `git log -- <path>`. If
     + this option is given, future commit-graph writes will automatically assume
     + that this option was intended. Use `--no-changed-paths` to stop storing this
     +-data.
     ++data. `--changed-paths` is implied by config `commitGraph.changedPaths=true`.
     + +
     + With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
     + filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
     +
       ## builtin/commit-graph.c ##
      @@ builtin/commit-graph.c: static int git_commit_graph_write_config(const char *var, const char *value,
       {


 Documentation/config/commitgraph.adoc | 11 +++++++
 Documentation/git-commit-graph.adoc   |  2 +-
 builtin/commit-graph.c                |  2 ++
 scalar.c                              |  1 +
 t/t5318-commit-graph.sh               | 44 +++++++++++++++++++++++++++
 5 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
index 7f8c9d6638..70a56c53d2 100644
--- a/Documentation/config/commitgraph.adoc
+++ b/Documentation/config/commitgraph.adoc
@@ -8,6 +8,17 @@ commitGraph.maxNewFilters::
 	Specifies the default value for the `--max-new-filters` option of `git
 	commit-graph write` (c.f., linkgit:git-commit-graph[1]).
 
+commitGraph.changedPaths::
+	If true, then `git commit-graph write` will compute and write
+	changed-path Bloom filters by default, equivalent to passing
+	`--changed-paths`. If false or unset, changed-path Bloom filters will
+	be written during `git commit-graph write` only if the filters already
+	exist in the current commit-graph file. This matches the default
+	behavior of `git commit-graph write` without any `--[no-]changed-paths`
+	option. To rewrite a commit-graph file without any filters, use the
+	`--no-changed-paths` option. Command-line option `--[no-]changed-paths`
+	always takes precedence over this configuration. Defaults to unset.
+
 commitGraph.readChangedPaths::
 	Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
 	commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
diff --git a/Documentation/git-commit-graph.adoc b/Documentation/git-commit-graph.adoc
index e9558173c0..6d19026035 100644
--- a/Documentation/git-commit-graph.adoc
+++ b/Documentation/git-commit-graph.adoc
@@ -71,7 +71,7 @@ take a while on large repositories. It provides significant performance gains
 for getting history of a directory or a file with `git log -- <path>`. If
 this option is given, future commit-graph writes will automatically assume
 that this option was intended. Use `--no-changed-paths` to stop storing this
-data.
+data. Setting `commitGraph.changedPaths` to true implies `--changed-paths`.
 +
 With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
 filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index fe3ebaadad..d62005edc0 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
 {
 	if (!strcmp(var, "commitgraph.maxnewfilters"))
 		write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
+	else if (!strcmp(var, "commitgraph.changedpaths"))
+		opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
 	/*
 	 * No need to fall-back to 'git_default_config', since this was already
 	 * called in 'cmd_commit_graph()'.
diff --git a/scalar.c b/scalar.c
index 4a373c133d..f754311627 100644
--- a/scalar.c
+++ b/scalar.c
@@ -166,6 +166,7 @@ static int set_recommended_config(int reconfigure)
 #endif
 		/* Optional */
 		{ "status.aheadBehind", "false" },
+		{ "commitGraph.changedPaths", "true" },
 		{ "commitGraph.generationVersion", "1" },
 		{ "core.autoCRLF", "false" },
 		{ "core.safeCRLF", "false" },
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0b3404f58f..98c6910963 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -946,4 +946,48 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
 	)
 '
 
+test_expect_success 'config commitGraph.changedPaths acts like --changed-paths' '
+	git init config-changed-paths &&
+	(
+		cd config-changed-paths &&
+
+		# commitGraph.changedPaths is not set and it should not write Bloom filters
+		test_commit first &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+		test_grep ! "Bloom filters" error &&
+
+		# Set commitGraph.changedPaths to true and it should write Bloom filters
+		test_commit second &&
+		git config commitGraph.changedPaths true &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+		test_grep "Bloom filters" error &&
+
+		# Add a second commitGraph.changedPaths value of false, overriding the previous true value
+		# It should still write Bloom filters due to existing filters
+		test_commit third &&
+		git config --add commitGraph.changedPaths false &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+		test_grep "Bloom filters" error &&
+
+		# commitGraph.changedPaths is still false; --no-changed-paths takes precedence and drops the filters, so a later plain write does not add them back
+		test_commit fourth &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --no-changed-paths --reachable --progress 2>error &&
+		test_grep ! "Bloom filters" error &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+		test_grep ! "Bloom filters" error &&
+
+		# Clear all commitGraph.changedPaths values and set it to false again; the command-line option should take precedence
+		test_commit fifth &&
+		git config --unset-all commitGraph.changedPaths &&
+		git config commitGraph.changedPaths false &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --changed-paths --reachable --progress 2>error &&
+		test_grep "Bloom filters" error &&
+
+		# commitGraph.changedPaths is still false and it should write Bloom filters due to existing filters
+		test_commit sixth &&
+		GIT_PROGRESS_DELAY=0 git commit-graph write --reachable --progress 2>error &&
+		test_grep "Bloom filters" error
+	)
+'
+
 test_done

base-commit: 79cf913ea9321f774da29b2330b5781d5ff420ef
-- 
gitgitgadget

Thread overview: 9+ messages
2025-10-09 21:01 [PATCH] commit-graph: add new config for changed-paths & recommend it in scalar Emily Yang via GitGitGadget
2025-10-09 22:30 ` Junio C Hamano
2025-10-10 12:48   ` Derrick Stolee
2025-10-10 16:32     ` Junio C Hamano
2025-10-10 12:32 ` Derrick Stolee
2025-10-17 20:58 ` Emily Yang via GitGitGadget [this message]
2025-10-22 14:53   ` [PATCH v2] " Derrick Stolee
2025-10-22 17:42     ` Junio C Hamano
2025-10-29 21:04   ` SZEDER Gábor
