* [PATCH 0/2] describe and diff: implement --no-optional-locks
@ 2025-03-06 5:58 Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 1/2] describe: " Benjamin Woodruff via GitGitGadget
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Benjamin Woodruff via GitGitGadget @ 2025-03-06 5:58 UTC (permalink / raw)
To: git; +Cc: Benjamin Woodruff
git describe and git diff may update the index in the background for similar
performance reasons to git-status. These patches implement the
--no-optional-locks option for these commands, which allows scripts to
bypass this behavior.
I'm implementing this as a solution to a stale lockfile issue we've
sporadically encountered due to a build script that runs git describe. We're
mitigating this issue by using libgit2 in our build script, which does not
have the same background update behavior for its git_describe_workdir
implementation, but it would be nice to have this option supported in the
git CLI.
Tests and documentation changes are included!
Benjamin Woodruff (2):
describe: implement --no-optional-locks
diff: implement --no-optional-locks
Documentation/config/diff.adoc | 4 ++-
Documentation/git-describe.adoc | 12 ++++++++
Documentation/git.adoc | 4 ++-
builtin/describe.c | 12 ++++----
builtin/diff.c | 4 +++
t/meson.build | 1 +
t/t4070-diff-auto-refresh-index.sh | 46 ++++++++++++++++++++++++++++++
t/t6120-describe.sh | 8 ++++++
8 files changed, 84 insertions(+), 7 deletions(-)
create mode 100755 t/t4070-diff-auto-refresh-index.sh
base-commit: e969bc875963a10890d61ba84eab3a460bd9e535
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1872%2Fbgw%2Fbgw%2Fdiff-describe-optional-locks-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1872/bgw/bgw/diff-describe-optional-locks-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1872
--
gitgitgadget
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] describe: implement --no-optional-locks
2025-03-06 5:58 [PATCH 0/2] describe and diff: implement --no-optional-locks Benjamin Woodruff via GitGitGadget
@ 2025-03-06 5:58 ` Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 2/2] diff: " Benjamin Woodruff via GitGitGadget
2025-03-06 16:11 ` [PATCH 0/2] describe and " Junio C Hamano
2 siblings, 0 replies; 11+ messages in thread
From: Benjamin Woodruff via GitGitGadget @ 2025-03-06 5:58 UTC (permalink / raw)
To: git; +Cc: Benjamin Woodruff, Benjamin Woodruff
From: Benjamin Woodruff <benjamin.woodruff@vercel.com>
When used with `--dirty`, describe may update the index in the
background for similar performance reasons to `git-status`.
This commit implements the `--no-optional-locks` option for
`git describe`, which allows scripts to bypass this behavior.
Signed-off-by: Benjamin Woodruff <benjamin.woodruff@vercel.com>
---
Documentation/git-describe.adoc | 12 ++++++++++++
Documentation/git.adoc | 3 ++-
builtin/describe.c | 12 +++++++-----
t/t6120-describe.sh | 8 ++++++++
4 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-describe.adoc b/Documentation/git-describe.adoc
index 08ff715709c..97b4c656078 100644
--- a/Documentation/git-describe.adoc
+++ b/Documentation/git-describe.adoc
@@ -198,6 +198,18 @@ selected and output. Here fewest commits different is defined as
the number of commits which would be shown by `git log tag..input`
will be the smallest number of commits possible.
+BACKGROUND REFRESH
+------------------
+
+By default, `git describe --dirty` will automatically refresh the index,
+similar to the background refresh functionality of
+linkgit:git-status[1]. Writing out the updated index is an optimization
+that isn't strictly necessary. When `describe` is run in the background,
+the lock held during the write may conflict with other simultaneous
+processes, causing them to fail. Scripts running `describe` in the
+background should consider using `git --no-optional-locks status` (see
+linkgit:git[1] for details).
+
BUGS
----
diff --git a/Documentation/git.adoc b/Documentation/git.adoc
index 743b7b00e4d..f084b2f0f1e 100644
--- a/Documentation/git.adoc
+++ b/Documentation/git.adoc
@@ -189,7 +189,8 @@ If you just want to run git as if it was started in `<path>` then use
--no-optional-locks::
Do not perform optional operations that require locks. This is
- equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
+ equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. This
+ functionality is implemented for `git status` and `git describe`.
--no-advice::
Disable all advice hints from being printed.
diff --git a/builtin/describe.c b/builtin/describe.c
index e2e73f3d757..d4fea55352d 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -704,7 +704,6 @@ int cmd_describe(int argc,
} else if (dirty) {
struct lock_file index_lock = LOCK_INIT;
struct rev_info revs;
- int fd;
setup_work_tree();
prepare_repo_settings(the_repository);
@@ -712,10 +711,13 @@ int cmd_describe(int argc,
repo_read_index(the_repository);
refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED,
NULL, NULL, NULL);
- fd = repo_hold_locked_index(the_repository,
- &index_lock, 0);
- if (0 <= fd)
- repo_update_index_if_able(the_repository, &index_lock);
+ if (use_optional_locks()) {
+ int fd = repo_hold_locked_index(the_repository,
+ &index_lock, 0);
+ if (0 <= fd)
+ repo_update_index_if_able(the_repository,
+ &index_lock);
+ }
repo_init_revisions(the_repository, &revs, prefix);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 76843a61691..ae7b6901ed6 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -749,4 +749,12 @@ test_expect_success 'do not be fooled by invalid describe format ' '
test_must_fail git cat-file -t "refs/tags/super-invalid/./../...../ ~^:/?*[////\\\\\\&}/busted.lock-42-g"$(cat out)
'
+test_expect_success '--no-optional-locks prevents index update' '
+ test_set_magic_mtime .git/index &&
+ git --no-optional-locks describe --dirty &&
+ test_is_magic_mtime .git/index &&
+ git describe --dirty &&
+ ! test_is_magic_mtime .git/index
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] diff: implement --no-optional-locks
2025-03-06 5:58 [PATCH 0/2] describe and diff: implement --no-optional-locks Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 1/2] describe: " Benjamin Woodruff via GitGitGadget
@ 2025-03-06 5:58 ` Benjamin Woodruff via GitGitGadget
2025-03-06 16:11 ` [PATCH 0/2] describe and " Junio C Hamano
2 siblings, 0 replies; 11+ messages in thread
From: Benjamin Woodruff via GitGitGadget @ 2025-03-06 5:58 UTC (permalink / raw)
To: git; +Cc: Benjamin Woodruff, Benjamin Woodruff
From: Benjamin Woodruff <benjamin.woodruff@vercel.com>
When used with `autoRefreshIndex`, `git diff` may update the index in
the background for similar performance reasons to `git-status`.
This commit implements the `--no-optional-locks` option for `git diff`,
which allows scripts to bypass this behavior.
Signed-off-by: Benjamin Woodruff <benjamin.woodruff@vercel.com>
---
Documentation/config/diff.adoc | 4 ++-
Documentation/git.adoc | 3 +-
builtin/diff.c | 4 +++
t/meson.build | 1 +
t/t4070-diff-auto-refresh-index.sh | 46 ++++++++++++++++++++++++++++++
5 files changed, 56 insertions(+), 2 deletions(-)
create mode 100755 t/t4070-diff-auto-refresh-index.sh
diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 1135a62a0ad..2dadcf7a1da 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -6,7 +6,9 @@
contents in the work tree match the contents in the
index. This option defaults to `true`. Note that this
affects only `git diff` Porcelain, and not lower level
- `diff` commands such as `git diff-files`.
+ `diff` commands such as `git diff-files`. If
+ `--no-optional-locks` is set (see linkgit:git[1] for
+ details), the index file is not updated.
`diff.dirstat`::
ifdef::git-diff[]
diff --git a/Documentation/git.adoc b/Documentation/git.adoc
index f084b2f0f1e..754880b8672 100644
--- a/Documentation/git.adoc
+++ b/Documentation/git.adoc
@@ -190,7 +190,8 @@ If you just want to run git as if it was started in `<path>` then use
--no-optional-locks::
Do not perform optional operations that require locks. This is
equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. This
- functionality is implemented for `git status` and `git describe`.
+ functionality is implemented for `git status`, `git describe`,
+ and `git diff`.
--no-advice::
Disable all advice hints from being printed.
diff --git a/builtin/diff.c b/builtin/diff.c
index a4fffee42c6..5469e58bf5d 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -9,6 +9,7 @@
#include "builtin.h"
#include "config.h"
+#include "environment.h"
#include "ewah/ewok.h"
#include "lockfile.h"
#include "color.h"
@@ -239,6 +240,9 @@ static void refresh_index_quietly(void)
struct lock_file lock_file = LOCK_INIT;
int fd;
+ if (!use_optional_locks())
+ return;
+
fd = repo_hold_locked_index(the_repository, &lock_file, 0);
if (fd < 0)
return;
diff --git a/t/meson.build b/t/meson.build
index a59da26be3f..10d7cace3c6 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -500,6 +500,7 @@ integration_tests = [
't4067-diff-partial-clone.sh',
't4068-diff-symmetric-merge-base.sh',
't4069-remerge-diff.sh',
+ 't4070-diff-auto-refresh-index.sh',
't4100-apply-stat.sh',
't4101-apply-nonl.sh',
't4102-apply-rename.sh',
diff --git a/t/t4070-diff-auto-refresh-index.sh b/t/t4070-diff-auto-refresh-index.sh
new file mode 100755
index 00000000000..9e38f1d3206
--- /dev/null
+++ b/t/t4070-diff-auto-refresh-index.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# Copyright (c) 2025 Benjamin Woodruff
+#
+
+test_description='diff.autoRefreshIndex config option'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-diff.sh
+
+test_expect_success 'index is updated when autoRefreshIndex is true' '
+ >tracked &&
+ git add tracked &&
+
+ # stat() must change (but not file contents) to trigger an index update
+ test_set_magic_mtime tracked &&
+
+ # check the mtime of .git/index does not change without autoRefreshIndex
+ test_set_magic_mtime .git/index &&
+ git config diff.autoRefreshIndex false &&
+ git diff &&
+ test_is_magic_mtime .git/index &&
+
+ # but it does change when autoRefreshIndex is true (the default)
+ git config diff.autoRefreshIndex true &&
+ git diff &&
+ ! test_is_magic_mtime .git/index
+'
+
+test_expect_success '--no-optional-locks overrides autoRefreshIndex' '
+ >tracked &&
+ git add tracked &&
+ test_set_magic_mtime tracked &&
+
+ # `--no-optional-locks` overrides `autoRefreshIndex`
+ test_set_magic_mtime .git/index &&
+ git config diff.autoRefreshIndex true &&
+ git --no-optional-locks diff &&
+
+ # sanity check that without `--no-optional-locks` it still updates
+ test_is_magic_mtime .git/index &&
+ git diff &&
+ ! test_is_magic_mtime .git/index
+'
+
+test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-06 5:58 [PATCH 0/2] describe and diff: implement --no-optional-locks Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 1/2] describe: " Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 2/2] diff: " Benjamin Woodruff via GitGitGadget
@ 2025-03-06 16:11 ` Junio C Hamano
2025-03-09 3:39 ` Jeff King
2 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2025-03-06 16:11 UTC (permalink / raw)
To: Benjamin Woodruff via GitGitGadget; +Cc: git, Benjamin Woodruff
"Benjamin Woodruff via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> git describe and git diff may update the index in the background for similar
> performance reasons to git-status.
That is a wrong reasoning that is completely opposite, though.
The commands at the Porcelain level, like "status" and "diff",
refresh the index for the CORRECTNESS purposes.
The commands at the plumbing level, which are designed to be used in
your own scripts, like "diff-files" and "diff-index", do not refresh
the index for the performance purposes. If your own script wants to
produce correct result, your script IS responsible for refreshing
the index after its last modification to working tree files before
it starts to use the plumbing commands to inspect which ones are
modified and which ones are not. This is so that your script has
more control over when the index is refreshed. It does not have to
pay cost to run refresh for each Git command it invokes, if it knows
that it does not make any modification between the two invocations;
it can instead refresh just once and then run these two plumbing
commands.
So, it would be absolute no-no to make the Porcelain commands like
"describe" and "diff" not to refresh the index before they work by
default. As an optional behaviour, it might be acceptable if there
is no other reasonable solutions (like, using plumbing commands if
the callers are not you typing but your scripts calling them).
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-06 16:11 ` [PATCH 0/2] describe and " Junio C Hamano
@ 2025-03-09 3:39 ` Jeff King
2025-03-10 12:25 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2025-03-09 3:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Benjamin Woodruff via GitGitGadget, git, Benjamin Woodruff
On Thu, Mar 06, 2025 at 08:11:09AM -0800, Junio C Hamano wrote:
> "Benjamin Woodruff via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > git describe and git diff may update the index in the background for similar
> > performance reasons to git-status.
>
> That is a wrong reasoning that is completely opposite, though.
>
> The commands at the Porcelain level, like "status" and "diff",
> refresh the index for the CORRECTNESS purposes.
Right, but "status" supports --no-optional-locks already.
So I think these patches are wrong, but the goal is reasonable. What
git-status does with --no-optional-locks is to update the index
internally for its _own_ use (giving it the correct results), but not to
lock nor write out the resulting index (to avoid conflicting with other
running programs). So it's pessimal (losing the opportunity to share
what it learned) but prevents lock contention.
And I think other programs could do that. I even wrote back in
27344d6a6c (git: add --no-optional-locks option, 2017-09-27):
I've punted here on finding more callers to convert, since "status" is
the obvious one to call as a repeated background job. But "git diff"'s
opportunistic refresh of the index may be a good candidate.
I must admit that I didn't imagine "describe" is something that somebody
would run a lot in the background, but there's not really any harm in
having it support optional locks if somebody cares about that case.
> The commands at the plumbing level, which are designed to be used in
> your own scripts, like "diff-files" and "diff-index", do not refresh
> the index for the performance purposes. If your own script wants to
> produce correct result, your script IS responsible for refreshing
> the index after its last modification to working tree files before
> it starts to use the plumbing commands to inspect which ones are
> modified and which ones are not. This is so that your script has
> more control over when the index is refreshed. It does not have to
> pay cost to run refresh for each Git command it invokes, if it knows
> that it does not make any modification between the two invocations;
> it can instead refresh just once and then run these two plumbing
> commands.
Assuming --no-optional-locks is being used as intended, the idea is that
the script cannot touch the index at all, because it is running in the
background and does not want lock contention with things the user is
doing in the foreground. So it cannot naively do a single index refresh
followed by plumbing commands like diff-files. Either it must:
- accept that diff-files might return stat-dirty results (yuck)
- use its own index that is separate from the regular .git/index file.
But that may be overly slow, since the index "update" would rewrite
the whole thing from scratch. Of course all of our index writes are
from scratch, but you'd pay the price even when there is nothing to
update.
- use a command which operates all in a single process, with an
in-memory index that is updated but not written out (e.g., "git
--no-optional-locks status").
-Peff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-09 3:39 ` Jeff King
@ 2025-03-10 12:25 ` Junio C Hamano
2025-03-10 16:08 ` Jeff King
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2025-03-10 12:25 UTC (permalink / raw)
To: Jeff King; +Cc: Benjamin Woodruff via GitGitGadget, git, Benjamin Woodruff
Jeff King <peff@peff.net> writes:
> .... What
> git-status does with --no-optional-locks is to update the index
> internally for its _own_ use (giving it the correct results), but not to
> lock nor write out the resulting index (to avoid conflicting with other
> running programs). So it's pessimal (losing the opportunity to share
> what it learned) but prevents lock contention.
Yup, that sounds somewhat sensible. I also have to wonder, other
than commands that are clearly about changing the repository state
like "add", the inspection commands like diff and status should
always opportunistically write the index back, without even being
asked?
> ... Either it must:
>
> - accept that diff-files might return stat-dirty results (yuck)
>
> - use its own index that is separate from the regular .git/index file.
> But that may be overly slow, since the index "update" would rewrite
> the whole thing from scratch. Of course all of our index writes are
> from scratch, but you'd pay the price even when there is nothing to
> update.
>
> - use a command which operates all in a single process, with an
> in-memory index that is updated but not written out (e.g., "git
> --no-optional-locks status").
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-10 12:25 ` Junio C Hamano
@ 2025-03-10 16:08 ` Jeff King
2025-03-10 18:53 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2025-03-10 16:08 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Benjamin Woodruff via GitGitGadget, git, Benjamin Woodruff
On Mon, Mar 10, 2025 at 05:25:32AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > .... What
> > git-status does with --no-optional-locks is to update the index
> > internally for its _own_ use (giving it the correct results), but not to
> > lock nor write out the resulting index (to avoid conflicting with other
> > running programs). So it's pessimal (losing the opportunity to share
> > what it learned) but prevents lock contention.
>
> Yup, that sounds somewhat sensible. I also have to wonder, other
> than commands that are clearly about changing the repository state
> like "add", the inspection commands like diff and status should
> always opportunistically write the index back, without even being
> asked?
Yeah, certainly it is a potential source of confusion to have a
conceptually read-only operation take locks or modify the repo state.
But I'm not sure we have a sense of how valuable that optimization is in
practice. After touching some files, every git-status, git-diff, etc
would end up re-hashing those files instead of using the stat cache.
But maybe that is lost in the noise of reading the files to actually do
diffs, etc? I dunno. I expect it is more important for status, which
probably does not need to read the whole file contents in most cases
(and which may be run a lot from the user's prompt, etc).
It seems like a big and possibly risky departure from what we've done
for so many years. I'm inclined not to rock the boat too much. ;)
-Peff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-10 16:08 ` Jeff King
@ 2025-03-10 18:53 ` Junio C Hamano
2025-03-10 20:50 ` Benjamin Woodruff
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2025-03-10 18:53 UTC (permalink / raw)
To: Jeff King; +Cc: Benjamin Woodruff via GitGitGadget, git, Benjamin Woodruff
Jeff King <peff@peff.net> writes:
> But maybe that is lost in the noise of reading the files to actually do
> diffs, etc? I dunno. I expect it is more important for status, which
> probably does not need to read the whole file contents in most cases
> (and which may be run a lot from the user's prompt, etc).
Yeah, and old timers who run "diff --raw" as if it were a quick
analogue for "status" also would notice.
> It seems like a big and possibly risky departure from what we've done
> for so many years. I'm inclined not to rock the boat too much. ;)
Certainly not right now. But adding a command line option is even
worse as we would have to carry the support for it for practically
forever X-<.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-10 18:53 ` Junio C Hamano
@ 2025-03-10 20:50 ` Benjamin Woodruff
2025-03-10 23:04 ` Junio C Hamano
2025-03-11 2:10 ` Jeff King
0 siblings, 2 replies; 11+ messages in thread
From: Benjamin Woodruff @ 2025-03-10 20:50 UTC (permalink / raw)
To: Junio C Hamano, Jeff King; +Cc: Benjamin Woodruff via GitGitGadget, git
On Mon, Mar 10, 2025, at 11:53 AM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> But maybe that is lost in the noise of reading the files to actually do
>> diffs, etc? I dunno. I expect it is more important for status, which
>> probably does not need to read the whole file contents in most cases
>> (and which may be run a lot from the user's prompt, etc).
>
> Yeah, and old timers who run "diff --raw" as if it were a quick
> analogue for "status" also would notice.
Thanks for your reviews, Junio and Jeff.
>>>> I must admit that I didn't imagine "describe" is something that
>>>> somebody would run a lot in the background
It might help if I start with some more context of why I wrote this
patch. We've got a tool that uses the Rust `vergen-gitcl` crate to call
`git describe --dirty`. You can see that code here:
https://github.com/vercel/next.js/pull/76889/files
We use `vergen-gitcl` to generate version identifiers for an on-disk
cache. This cache stores results of thousands of functions and has no
backwards compatibility. We want to invalidate it when *any* of the code
changes. `git-describe` felt like a good fit for that, as it gives us a
unique identifier that's still reasonably user-friendly.
However, we discovered that we'd frequently end up with stale git
lockfiles. This appeared to be due some combination of IDE tools that
run the build in the background (i.e. the rust-analyzer LSP), behavior
that causes builds to sometimes get killed before completion, and the
fact that `git describe --dirty` takes a lock.
I've worked around this on our end, by re-implementing `describe`'s
`--dirty` flag on top of `status`:
<https://github.com/rustyhorde/vergen/pull/406>
So, from my end, there's no urgency to get this change in, or to add
support for `diff` (we only care about `describe`). Rather, I felt like
this might be a footgun for other users, and wanted to do the right
thing by submitting an upstream change.
>>> git describe and git diff may update the index in the background for
>>> similar performance reasons to git-status.
>>
>> That is a wrong reasoning that is completely opposite, though.
>>
>> The commands at the Porcelain level, like "status" and "diff",
>> refresh the index for the CORRECTNESS purposes.
>
> Right, but "status" supports --no-optional-locks already.
Does this mean the documentation in `git-status` is incorrect? It
implies that the background refresh is only for performance reasons.
That's where I got this idea from:
<https://git-scm.com/docs/git-status#_background_refresh>
It's also worth noting that libgit2 does not do this background refresh
by default (`GIT_DIFF_UPDATE_INDEX` and `GIT_STATUS_OPT_UPDATE_INDEX`).
I think that makes sense for libgit2's typical use-cases, but it is a
divergence in behavior.
>> Yeah, certainly it is a potential source of confusion to have a
>> conceptually read-only operation take locks or modify the repo state.
>>
>> [...]
>>
>> It seems like a big and possibly risky departure from what we've done
>> for so many years. I'm inclined not to rock the boat too much. ;)
>
> Certainly not right now. But adding a command line option is even
> worse as we would have to carry the support for it for practically
> forever X-<.
What about if we reduce this to documentation changes? That alone
would've saved me a lot of pain of trying to figure out what does and
doesn't take a lock:
- Explicitly document that `--no-optional-locks` only changes behavior
for `git status`.
- Document that the `--dirty` flag on `describe` will lead to
`status`-like background refresh behavior.
- Change the documentation for status's background refresh to indicate
that there is a subtle difference in behavior caused by
`--no-optional-locks`? I lack sufficient context on how best to
describe or explain this.
- Leave `git-diff` documentation as-is. I think the current
documentation of `diff.autoRefreshIndex` is sufficient?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-10 20:50 ` Benjamin Woodruff
@ 2025-03-10 23:04 ` Junio C Hamano
2025-03-11 2:10 ` Jeff King
1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2025-03-10 23:04 UTC (permalink / raw)
To: Benjamin Woodruff; +Cc: Jeff King, Benjamin Woodruff via GitGitGadget, git
"Benjamin Woodruff" <github@benjam.info> writes:
>>> The commands at the Porcelain level, like "status" and "diff",
>>> refresh the index for the CORRECTNESS purposes.
>>
>> Right, but "status" supports --no-optional-locks already.
>
> Does this mean the documentation in `git-status` is incorrect? It
> implies that the background refresh is only for performance reasons.
It has to do, and it does, refresh the in-core index for correctness
reasons. Writing the result out to the on-disk file for _other_
processes that will use the index later is a performance hack, and
may or may not happen (it is up to what repo_update_index_if_able()
does).
If we do not care about redundant refreshing these other processes
need to do before they start, we do not have to write the resulting
in-core index out to the disk.
Writing the in-core index out for other processes has one downside,
which is that for correctness it has to be done under lock. We
cannot say "we tried to be nice to others by writing the index file
out, but we just found out that there is a competing process trying
to open/write the index so let's skip writing it and let them do
their thing---they are responsible for refreshing the index for
their own use". There needs some coordination, i.e. when we start
writing the result of our refreshing the index for others, if other
processes need to pay attention to that fact and wait for a while
before reading the index, then they benefit because they do not have
to refresh the index themselves. Even though our programs, by
calling hold_lock_file_for_update(), know how to wait for a bit when
this happens, not everybody cooperates.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] describe and diff: implement --no-optional-locks
2025-03-10 20:50 ` Benjamin Woodruff
2025-03-10 23:04 ` Junio C Hamano
@ 2025-03-11 2:10 ` Jeff King
1 sibling, 0 replies; 11+ messages in thread
From: Jeff King @ 2025-03-11 2:10 UTC (permalink / raw)
To: Benjamin Woodruff; +Cc: Junio C Hamano, Benjamin Woodruff via GitGitGadget, git
On Mon, Mar 10, 2025 at 01:50:36PM -0700, Benjamin Woodruff wrote:
> It might help if I start with some more context of why I wrote this
> patch. We've got a tool that uses the Rust `vergen-gitcl` crate to call
> `git describe --dirty`. You can see that code here:
> https://github.com/vercel/next.js/pull/76889/files
>
> We use `vergen-gitcl` to generate version identifiers for an on-disk
> cache. This cache stores results of thousands of functions and has no
> backwards compatibility. We want to invalidate it when *any* of the code
> changes. `git-describe` felt like a good fit for that, as it gives us a
> unique identifier that's still reasonably user-friendly.
>
> However, we discovered that we'd frequently end up with stale git
> lockfiles. This appeared to be due some combination of IDE tools that
> run the build in the background (i.e. the rust-analyzer LSP), behavior
> that causes builds to sometimes get killed before completion, and the
> fact that `git describe --dirty` takes a lock.
Yeah, that is not quite the original use case that --no-optional-locks
was designed for (i.e., simultaneous contention), but I think it is a
reasonable application of the flag.
> >>> git describe and git diff may update the index in the background for
> >>> similar performance reasons to git-status.
> >>
> >> That is a wrong reasoning that is completely opposite, though.
> >>
> >> The commands at the Porcelain level, like "status" and "diff",
> >> refresh the index for the CORRECTNESS purposes.
> >
> > Right, but "status" supports --no-optional-locks already.
>
> Does this mean the documentation in `git-status` is incorrect? It
> implies that the background refresh is only for performance reasons.
> That's where I got this idea from:
> <https://git-scm.com/docs/git-status#_background_refresh>
I think Junio gave an explanation here, so I won't repeat that. But I
also think both of us may have been a bit confused about the changes
your patches are making, because there's some subtlety.
The important thing to keep in mind is that there are _two_ steps:
refreshing the in-core index and writing the result out to the on-disk
file. With --no-optional-locks we must continue to do the first step
(for correctness), and skip the second step.
So looking at your patch 1/2 for git-describe, it is doing the right
thing: we still call refresh_index() always, and only skip the calls to
repo_hold_locked_index() and repo_update_index_if_able().
But one thing that puzzles me is that we read and refresh the index
first and only _then_ take a lock. Which seems wrong to me, as we could
racily overwrite an intermediate write from somebody else that we never
even saw (e.g., imagine you call "git add" at just the wrong moment).
That is not a bug in your code, but an existing problem that I think
made it harder to understand your change (and probably one we should
fix regardless).
Your patch 2/2 for git-diff is what I thought was actually wrong, but
after digging further, I'm not so sure.
In your patch we return early from refresh_index_quietly(), without
actually refreshing the in-core index. So I _thought_ that meant we'd
produce a wrong answer for something like this:
$ touch git.c
$ ./git --no-optional-locks diff
where we should report "no changes", but would instead find the
stat-dirty git.c (just like a plumbing "git diff-files" would). But
that doesn't happen!
That's because refresh_index_quietly() runs after the diff has completed
anyway. The real magic is in diffcore_skip_stat_unmatch(), which
processes individual stat-dirty entries and suppresses them (when
there's no actual content change).
So the call in refresh_index_quietly() really is just about updating
what we're about to write out, and your patch is correct to bail from
the whole function (if we are not writing it out, there is no purpose in
refreshing at that point).
So as far as I can tell the patches are doing the right thing. But I
think the commit messages probably need to describe those subtleties and
argue that the change is correct. Bonus points if a preparatory patch
fixes the race in git-describe. ;)
> It's also worth noting that libgit2 does not do this background refresh
> by default (`GIT_DIFF_UPDATE_INDEX` and `GIT_STATUS_OPT_UPDATE_INDEX`).
> I think that makes sense for libgit2's typical use-cases, but it is a
> divergence in behavior.
Yes, I think that is probably a reasonable default for libgit2, where
you'd expect everything to happen in a single process (that can share
the in-core index).
-Peff
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-03-11 2:10 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-06 5:58 [PATCH 0/2] describe and diff: implement --no-optional-locks Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 1/2] describe: " Benjamin Woodruff via GitGitGadget
2025-03-06 5:58 ` [PATCH 2/2] diff: " Benjamin Woodruff via GitGitGadget
2025-03-06 16:11 ` [PATCH 0/2] describe and " Junio C Hamano
2025-03-09 3:39 ` Jeff King
2025-03-10 12:25 ` Junio C Hamano
2025-03-10 16:08 ` Jeff King
2025-03-10 18:53 ` Junio C Hamano
2025-03-10 20:50 ` Benjamin Woodruff
2025-03-10 23:04 ` Junio C Hamano
2025-03-11 2:10 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).