* [PATCH] config: retry acquiring config.lock for 100ms @ 2026-04-03 10:01 Joerg Thalheim 2026-04-03 17:53 ` Junio C Hamano 2026-04-08 10:34 ` Patrick Steinhardt 0 siblings, 2 replies; 11+ messages in thread From: Joerg Thalheim @ 2026-04-03 10:01 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Jörg Thalheim From: Jörg Thalheim <joerg@thalheim.io> When multiple processes write to a config file concurrently, they contend on its ".lock" file, which is acquired via open(O_EXCL) with no retry. The losers fail immediately with "could not lock config file". Two processes writing unrelated keys (say, "branch.a.remote" and "branch.b.remote") have no semantic conflict, yet one of them fails for a purely mechanical reason. This bites in practice when running `git worktree add -b` concurrently against the same repository. Each invocation makes several writes to ".git/config" to set up branch tracking, and tooling that creates worktrees in parallel sees intermittent failures. Worse, `git worktree add` does not propagate the failed config write to its exit code: the worktree is created and the command exits 0, but tracking configuration is silently dropped. The lock is held only for the duration of rewriting a small file, so retrying for 100 ms papers over any realistic contention while still failing fast if a stale lock has been left behind by a crashed process. This mirrors what we already do for individual reference locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, 2017-08-21)). Signed-off-by: Jörg Thalheim <joerg@thalheim.io> --- config.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/config.c b/config.c index 156f2a24fa..f7aff8725d 100644 --- a/config.c +++ b/config.c @@ -2903,6 +2903,14 @@ char *git_config_prepare_comment_string(const char *comment) return prepared; } +/* + * How long to retry acquiring config.lock when another process holds it. + * The lock is held only for the duration of rewriting a small file, so + * 100 ms covers any realistic contention while still failing fast if + * a stale lock has been left behind by a crashed process. + */ +#define CONFIG_LOCK_TIMEOUT_MS 100 + static void validate_comment_string(const char *comment) { size_t leading_blanks; @@ -2986,7 +2994,8 @@ int repo_config_set_multivar_in_file_gently(struct repository *r, * The lock serves a purpose in addition to locking: the new * contents of .git/config will be written into it. */ - fd = hold_lock_file_for_update(&lock, config_filename, 0); + fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, + CONFIG_LOCK_TIMEOUT_MS); if (fd < 0) { error_errno(_("could not lock config file %s"), config_filename); ret = CONFIG_NO_LOCK; @@ -3331,7 +3340,8 @@ static int repo_config_copy_or_rename_section_in_file( if (!config_filename) config_filename = filename_buf = repo_git_path(r, "config"); - out_fd = hold_lock_file_for_update(&lock, config_filename, 0); + out_fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, + CONFIG_LOCK_TIMEOUT_MS); if (out_fd < 0) { ret = error(_("could not lock config file %s"), config_filename); goto out; -- 2.53.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-04-03 10:01 [PATCH] config: retry acquiring config.lock for 100ms Joerg Thalheim @ 2026-04-03 17:53 ` Junio C Hamano 2026-04-08 10:34 ` Patrick Steinhardt 1 sibling, 0 replies; 11+ messages in thread From: Junio C Hamano @ 2026-04-03 17:53 UTC (permalink / raw) To: Joerg Thalheim; +Cc: git, Patrick Steinhardt Joerg Thalheim <joerg@thalheim.io> writes: > From: Jörg Thalheim <joerg@thalheim.io> > > When multiple processes write to a config file concurrently, they > contend on its ".lock" file, which is acquired via open(O_EXCL) with > no retry. The losers fail immediately with "could not lock config > file". Two processes writing unrelated keys (say, "branch.a.remote" > and "branch.b.remote") have no semantic conflict, yet one of them > fails for a purely mechanical reason. > > This bites in practice when running `git worktree add -b` concurrently > against the same repository. Each invocation makes several writes to > ".git/config" to set up branch tracking, and tooling that creates > worktrees in parallel sees intermittent failures. Worse, `git worktree > add` does not propagate the failed config write to its exit code: the > worktree is created and the command exits 0, but tracking > configuration is silently dropped. > > The lock is held only for the duration of rewriting a small file, so > retrying for 100 ms papers over any realistic contention while still > failing fast if a stale lock has been left behind by a crashed > process. This mirrors what we already do for individual reference > locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, > 2017-08-21)). > > Signed-off-by: Jörg Thalheim <joerg@thalheim.io> > --- > config.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) OK. This is consistent with the default used for a ref update with files backend. Various subsystems use their own value randomly chosen out of thin air, it seems. credential-store uses 1000ms, gc uses 150ms to interact with launchctl, refs have their own per backend, etc. > diff --git a/config.c b/config.c > index 156f2a24fa..f7aff8725d 100644 > --- a/config.c > +++ b/config.c > @@ -2903,6 +2903,14 @@ char *git_config_prepare_comment_string(const char *comment) > return prepared; > } > > +/* > + * How long to retry acquiring config.lock when another process holds it. > + * The lock is held only for the duration of rewriting a small file, so > + * 100 ms covers any realistic contention while still failing fast if > + * a stale lock has been left behind by a crashed process. > + */ > +#define CONFIG_LOCK_TIMEOUT_MS 100 > + Making this configurable would make a cute chicken-and-egg problem? ;-) No, no need for that---just kidding. Will queue. Thanks. > @@ -2986,7 +2994,8 @@ int repo_config_set_multivar_in_file_gently(struct repository *r, > * The lock serves a purpose in addition to locking: the new > * contents of .git/config will be written into it. > */ > - fd = hold_lock_file_for_update(&lock, config_filename, 0); > + fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, > + CONFIG_LOCK_TIMEOUT_MS); > if (fd < 0) { > error_errno(_("could not lock config file %s"), config_filename); > ret = CONFIG_NO_LOCK; > @@ -3331,7 +3340,8 @@ static int repo_config_copy_or_rename_section_in_file( > if (!config_filename) > config_filename = filename_buf = repo_git_path(r, "config"); > > - out_fd = hold_lock_file_for_update(&lock, config_filename, 0); > + out_fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, > + CONFIG_LOCK_TIMEOUT_MS); > if (out_fd < 0) { > ret = error(_("could not lock config file %s"), config_filename); > goto out; ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-04-03 10:01 [PATCH] config: retry acquiring config.lock for 100ms Joerg Thalheim 2026-04-03 17:53 ` Junio C Hamano @ 2026-04-08 10:34 ` Patrick Steinhardt 2026-05-11 2:32 ` Junio C Hamano 1 sibling, 1 reply; 11+ messages in thread From: Patrick Steinhardt @ 2026-04-08 10:34 UTC (permalink / raw) To: Joerg Thalheim; +Cc: git, Junio C Hamano On Fri, Apr 03, 2026 at 12:01:35PM +0200, Joerg Thalheim wrote: > From: Jörg Thalheim <joerg@thalheim.io> > > When multiple processes write to a config file concurrently, they > contend on its ".lock" file, which is acquired via open(O_EXCL) with > no retry. The losers fail immediately with "could not lock config > file". Two processes writing unrelated keys (say, "branch.a.remote" > and "branch.b.remote") have no semantic conflict, yet one of them > fails for a purely mechanical reason. > > This bites in practice when running `git worktree add -b` concurrently > against the same repository. Each invocation makes several writes to > ".git/config" to set up branch tracking, and tooling that creates > worktrees in parallel sees intermittent failures. Worse, `git worktree > add` does not propagate the failed config write to its exit code: the > worktree is created and the command exits 0, but tracking > configuration is silently dropped. This very much sounds like a bug that is worth fixing independently. > The lock is held only for the duration of rewriting a small file, so > retrying for 100 ms papers over any realistic contention while still > failing fast if a stale lock has been left behind by a crashed > process. This mirrors what we already do for individual reference > locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, > 2017-08-21)). Famous last words :) Experience tells me that any timeout value that isn't excessive will eventually be hit in some production system. Which raises the question whether we want to make the timeout configurable, similar to "core.filesRefLockTimeout" and "core.packedRefsTimeout". > diff --git a/config.c b/config.c > index 156f2a24fa..f7aff8725d 100644 > --- a/config.c > +++ b/config.c > @@ -2903,6 +2903,14 @@ char *git_config_prepare_comment_string(const char *comment) > return prepared; > } > > +/* > + * How long to retry acquiring config.lock when another process holds it. > + * The lock is held only for the duration of rewriting a small file, so > + * 100 ms covers any realistic contention while still failing fast if > + * a stale lock has been left behind by a crashed process. > + */ > +#define CONFIG_LOCK_TIMEOUT_MS 100 > + > static void validate_comment_string(const char *comment) > { > size_t leading_blanks; > @@ -2986,7 +2994,8 @@ int repo_config_set_multivar_in_file_gently(struct repository *r, > * The lock serves a purpose in addition to locking: the new > * contents of .git/config will be written into it. > */ > - fd = hold_lock_file_for_update(&lock, config_filename, 0); > + fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, > + CONFIG_LOCK_TIMEOUT_MS); > if (fd < 0) { > error_errno(_("could not lock config file %s"), config_filename); > ret = CONFIG_NO_LOCK; > @@ -3331,7 +3340,8 @@ static int repo_config_copy_or_rename_section_in_file( > if (!config_filename) > config_filename = filename_buf = repo_git_path(r, "config"); > > - out_fd = hold_lock_file_for_update(&lock, config_filename, 0); > + out_fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, > + CONFIG_LOCK_TIMEOUT_MS); > if (out_fd < 0) { > ret = error(_("could not lock config file %s"), config_filename); > goto out; Okay, so we now handle this more gracefully with concurrent writers. But even if we handle that more gracefully, we still don't really have any guarantee that the outcome will be reasonable, or even close. The main difference to the locking timeouts we have for refs is that we have an easy way to check for conflicts there: we simpliy verify that the refs we want to update haven't moved from their expected value. We have no such thing for our configuration though, so two processes that race with one another will happily overwrite each other's changes. Honestly though, I'm not really sure what to make with this. We could of course also add some validation that the configuration we want to set hasn't been modified meanwhile. But that would now lead to a situation where we have to update every single caller in our tree to make use of the new mechanism, which would be a bunch of work. And adding the timeout doesn't really change the status quo, either. We already have the case that we'll happily overwrite changes made by concurrent processes. The only thing that changes is that we make it more likely for concurrent changes to succeed. So I'm a bit cautious, but I think overall I'm fine with this? Thanks! Patrick ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-04-08 10:34 ` Patrick Steinhardt @ 2026-05-11 2:32 ` Junio C Hamano 2026-05-11 7:33 ` Patrick Steinhardt 2026-05-11 9:06 ` Jörg Thalheim 0 siblings, 2 replies; 11+ messages in thread From: Junio C Hamano @ 2026-05-11 2:32 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Joerg Thalheim, git Patrick Steinhardt <ps@pks.im> writes: >> This bites in practice when running `git worktree add -b` concurrently >> against the same repository. Each invocation makes several writes to >> ".git/config" to set up branch tracking, and tooling that creates >> worktrees in parallel sees intermittent failures. Worse, `git worktree >> add` does not propagate the failed config write to its exit code: the >> worktree is created and the command exits 0, but tracking >> configuration is silently dropped. > > This very much sounds like a bug that is worth fixing independently. > >> The lock is held only for the duration of rewriting a small file, so >> retrying for 100 ms papers over any realistic contention while still >> failing fast if a stale lock has been left behind by a crashed >> process. This mirrors what we already do for individual reference >> locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, >> 2017-08-21)). > > Famous last words :) Experience tells me that any timeout value that > isn't excessive will eventually be hit in some production system. Which > raises the question whether we want to make the timeout configurable, > similar to "core.filesRefLockTimeout" and "core.packedRefsTimeout". > ... > Honestly though, I'm not really sure what to make with this. We could > of course also add some validation that the configuration we want to set > hasn't been modified meanwhile. But that would now lead to a situation > where we have to update every single caller in our tree to make use of > the new mechanism, which would be a bunch of work. > > And adding the timeout doesn't really change the status quo, either. We > already have the case that we'll happily overwrite changes made by > concurrent processes. The only thing that changes is that we make it > more likely for concurrent changes to succeed. We haven't heard any response to these points raised in the message I am responding to. Should I still keep the patch in my tree, hoping that a responses may come some day? I am tempted to discard the topic as it has been quite a while since we last looked at it. Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-05-11 2:32 ` Junio C Hamano @ 2026-05-11 7:33 ` Patrick Steinhardt 2026-05-11 9:06 ` Jörg Thalheim 1 sibling, 0 replies; 11+ messages in thread From: Patrick Steinhardt @ 2026-05-11 7:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: Joerg Thalheim, git On Mon, May 11, 2026 at 11:32:33AM +0900, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > >> This bites in practice when running `git worktree add -b` concurrently > >> against the same repository. Each invocation makes several writes to > >> ".git/config" to set up branch tracking, and tooling that creates > >> worktrees in parallel sees intermittent failures. Worse, `git worktree > >> add` does not propagate the failed config write to its exit code: the > >> worktree is created and the command exits 0, but tracking > >> configuration is silently dropped. > > > > This very much sounds like a bug that is worth fixing independently. > > > >> The lock is held only for the duration of rewriting a small file, so > >> retrying for 100 ms papers over any realistic contention while still > >> failing fast if a stale lock has been left behind by a crashed > >> process. This mirrors what we already do for individual reference > >> locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, > >> 2017-08-21)). > > > > Famous last words :) Experience tells me that any timeout value that > > isn't excessive will eventually be hit in some production system. Which > > raises the question whether we want to make the timeout configurable, > > similar to "core.filesRefLockTimeout" and "core.packedRefsTimeout". > > ... > > Honestly though, I'm not really sure what to make with this. We could > > of course also add some validation that the configuration we want to set > > hasn't been modified meanwhile. But that would now lead to a situation > > where we have to update every single caller in our tree to make use of > > the new mechanism, which would be a bunch of work. > > > > And adding the timeout doesn't really change the status quo, either. We > > already have the case that we'll happily overwrite changes made by > > concurrent processes. The only thing that changes is that we make it > > more likely for concurrent changes to succeed. > > We haven't heard any response to these points raised in the message > I am responding to. Should I still keep the patch in my tree, > hoping that a responses may come some day? I am tempted to discard > the topic as it has been quite a while since we last looked at it. Same here, let's discard it for now as it can be easily added back in once a new version was posted or the feedback has been addressed. Patrick ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-05-11 2:32 ` Junio C Hamano 2026-05-11 7:33 ` Patrick Steinhardt @ 2026-05-11 9:06 ` Jörg Thalheim 2026-05-11 10:01 ` Patrick Steinhardt 1 sibling, 1 reply; 11+ messages in thread From: Jörg Thalheim @ 2026-05-11 9:06 UTC (permalink / raw) To: Junio C Hamano, Patrick Steinhardt; +Cc: git I am not really sure what you want me to do here. I don't see how git can have this value configurable, given it's about reading the configuration itself. Is the user supposed via command line? Meanwhile the project I was facing this issue, added the required file lock in its own code, which has since then worked perfectly to fix my use case: https://github.com/raine/workmux/issues/116 May 11, 2026 at 4:32 AM, "Junio C Hamano" <gitster@pobox.com mailto:gitster@pobox.com?to=%22Junio%20C%20Hamano%22%20%3Cgitster%40pobox.com%3E > wrote: > > Patrick Steinhardt <ps@pks.im> writes: > > > > > > > > > This bites in practice when running `git worktree add -b` concurrently > > > against the same repository. Each invocation makes several writes to > > > ".git/config" to set up branch tracking, and tooling that creates > > > worktrees in parallel sees intermittent failures. Worse, `git worktree > > > add` does not propagate the failed config write to its exit code: the > > > worktree is created and the command exits 0, but tracking > > > configuration is silently dropped. > > > > > This very much sounds like a bug that is worth fixing independently. > > > > > > > > The lock is held only for the duration of rewriting a small file, so > > > retrying for 100 ms papers over any realistic contention while still > > > failing fast if a stale lock has been left behind by a crashed > > > process. This mirrors what we already do for individual reference > > > locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, > > > 2017-08-21)). > > > > > Famous last words :) Experience tells me that any timeout value that > > isn't excessive will eventually be hit in some production system. Which > > raises the question whether we want to make the timeout configurable, > > similar to "core.filesRefLockTimeout" and "core.packedRefsTimeout". > > ... > > Honestly though, I'm not really sure what to make with this. We could > > of course also add some validation that the configuration we want to set > > hasn't been modified meanwhile. But that would now lead to a situation > > where we have to update every single caller in our tree to make use of > > the new mechanism, which would be a bunch of work. > > > > And adding the timeout doesn't really change the status quo, either. We > > already have the case that we'll happily overwrite changes made by > > concurrent processes. The only thing that changes is that we make it > > more likely for concurrent changes to succeed. > > > We haven't heard any response to these points raised in the message > I am responding to. Should I still keep the patch in my tree, > hoping that a responses may come some day? I am tempted to discard > the topic as it has been quite a while since we last looked at it. > > Thanks. > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-05-11 9:06 ` Jörg Thalheim @ 2026-05-11 10:01 ` Patrick Steinhardt 2026-05-17 11:27 ` Johannes Schindelin 0 siblings, 1 reply; 11+ messages in thread From: Patrick Steinhardt @ 2026-05-11 10:01 UTC (permalink / raw) To: Jörg Thalheim; +Cc: Junio C Hamano, git On Mon, May 11, 2026 at 09:06:00AM +0000, Jörg Thalheim wrote: > May 11, 2026 at 4:32 AM, "Junio C Hamano" <gitster@pobox.com mailto:gitster@pobox.com?to=%22Junio%20C%20Hamano%22%20%3Cgitster%40pobox.com%3E > wrote: > > Patrick Steinhardt <ps@pks.im> writes: > > > > This bites in practice when running `git worktree add -b` concurrently > > > > against the same repository. Each invocation makes several writes to > > > > ".git/config" to set up branch tracking, and tooling that creates > > > > worktrees in parallel sees intermittent failures. Worse, `git worktree > > > > add` does not propagate the failed config write to its exit code: the > > > > worktree is created and the command exits 0, but tracking > > > > configuration is silently dropped. > > > > > > > This very much sounds like a bug that is worth fixing independently. > > > > > > > > > > > The lock is held only for the duration of rewriting a small file, so > > > > retrying for 100 ms papers over any realistic contention while still > > > > failing fast if a stale lock has been left behind by a crashed > > > > process. This mirrors what we already do for individual reference > > > > locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, > > > > 2017-08-21)). > > > > > > > Famous last words :) Experience tells me that any timeout value that > > > isn't excessive will eventually be hit in some production system. Which > > > raises the question whether we want to make the timeout configurable, > > > similar to "core.filesRefLockTimeout" and "core.packedRefsTimeout". > > > ... > > > Honestly though, I'm not really sure what to make with this. We could > > > of course also add some validation that the configuration we want to set > > > hasn't been modified meanwhile. But that would now lead to a situation > > > where we have to update every single caller in our tree to make use of > > > the new mechanism, which would be a bunch of work. > > > > > > And adding the timeout doesn't really change the status quo, either. We > > > already have the case that we'll happily overwrite changes made by > > > concurrent processes. The only thing that changes is that we make it > > > more likely for concurrent changes to succeed. > > > > > We haven't heard any response to these points raised in the message > > I am responding to. Should I still keep the patch in my tree, > > hoping that a responses may come some day? I am tempted to discard > > the topic as it has been quite a while since we last looked at it. > > I am not really sure what you want me to do here. In general, the idea here is to engage in a discussion that can ultimately lead to one of two outcomes: - The discussion surfaces an area the author hasn't thought about, so the patch is adapted accordingly. - The discussion shows that the author already did think about the issue, but hasn't documented the assumptions. In this case, it should be the commit message that gets adapted. > I don't see how git can have this value configurable, given it's about > reading the configuration itself. Is the user supposed via command > line? This is a fair point indeed. But if it's not possible to change via the configuration itself, then the next-best thing might be to introduce an environment variable that allows configuring it. The other aspect that wasn't discussed in the commit message is how concurrent writes are handled, both when they are non-conflicting (updating different keys) and when they are conflicting (updating the same key). After spending some more time in the code I think it's ultimately nothing we have to worry about too much, as we only start reading the configuration after we've locked it. So in the semantically non-conflicting case there isn't really much of a race, because things already work as expected. But in the semantically conflicting case it's a bit different, as the latter writer will overwrite the result of the former one. In theory it would be possible to detect such conflicts by: - Reading the configuration file. - Taking the lock. - Rereading the configuration to check for conflicts. But even that is racy as the first writer might have succeeded before we read the configuration the first time. So I'm not sure whether we can do anything about that in the first place, as the race basically exists in the outer loop controlled by the caller. So there probably isn't much we can do about that, and unless I missed something I think your timeout is sensible. But ideally, such nuances would be discussed as part of the commit message so that reviewers and future readers are made aware of them. Thanks! Patrick ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] config: retry acquiring config.lock for 100ms 2026-05-11 10:01 ` Patrick Steinhardt @ 2026-05-17 11:27 ` Johannes Schindelin 2026-05-17 13:21 ` [PATCH v2] config: retry acquiring config.lock, configurable via core.configLockTimeout Joerg Thalheim 0 siblings, 1 reply; 11+ messages in thread From: Johannes Schindelin @ 2026-05-17 11:27 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Jörg Thalheim, Junio C Hamano, git [-- Attachment #1: Type: text/plain, Size: 6855 bytes --] Hi Patrick & Jörg, On Mon, 11 May 2026, Patrick Steinhardt wrote: > On Mon, May 11, 2026 at 09:06:00AM +0000, Jörg Thalheim wrote: > > May 11, 2026 at 4:32 AM, "Junio C Hamano" <gitster@pobox.com > > mailto:gitster@pobox.com?to=%22Junio%20C%20Hamano%22%20%3Cgitster%40pobox.com%3E > > > wrote: > > > Patrick Steinhardt <ps@pks.im> writes: > > > > > This bites in practice when running `git worktree add -b` concurrently > > > > > against the same repository. Each invocation makes several writes to > > > > > ".git/config" to set up branch tracking, and tooling that creates > > > > > worktrees in parallel sees intermittent failures. Worse, `git worktree > > > > > add` does not propagate the failed config write to its exit code: the > > > > > worktree is created and the command exits 0, but tracking > > > > > configuration is silently dropped. > > > > > > > > > This very much sounds like a bug that is worth fixing independently. > > > > > > > > > > > > > > The lock is held only for the duration of rewriting a small file, so > > > > > retrying for 100 ms papers over any realistic contention while still > > > > > failing fast if a stale lock has been left behind by a crashed > > > > > process. This mirrors what we already do for individual reference > > > > > locks (4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, > > > > > 2017-08-21)). > > > > > > > > > Famous last words :) Experience tells me that any timeout value that > > > > isn't excessive will eventually be hit in some production system. Which > > > > raises the question whether we want to make the timeout configurable, > > > > similar to "core.filesRefLockTimeout" and "core.packedRefsTimeout". > > > > ... > > > > Honestly though, I'm not really sure what to make with this. We could > > > > of course also add some validation that the configuration we want to set > > > > hasn't been modified meanwhile. But that would now lead to a situation > > > > where we have to update every single caller in our tree to make use of > > > > the new mechanism, which would be a bunch of work. > > > > > > > > And adding the timeout doesn't really change the status quo, either. We > > > > already have the case that we'll happily overwrite changes made by > > > > concurrent processes. The only thing that changes is that we make it > > > > more likely for concurrent changes to succeed. > > > > > > > We haven't heard any response to these points raised in the message > > > I am responding to. Should I still keep the patch in my tree, > > > hoping that a responses may come some day? I am tempted to discard > > > the topic as it has been quite a while since we last looked at it. > > > > I am not really sure what you want me to do here. > > In general, the idea here is to engage in a discussion that can > ultimately lead to one of two outcomes: > > - The discussion surfaces an area the author hasn't thought about, so > the patch is adapted accordingly. > > - The discussion shows that the author already did think about the > issue, but hasn't documented the assumptions. In this case, it > should be the commit message that gets adapted. For what it's worth, I meant to chime in earlier, but obligations kept preventing me from setting aside the time to do so. Well, better late than never. > > I don't see how git can have this value configurable, given it's about > > reading the configuration itself. Is the user supposed via command > > line? > > This is a fair point indeed. But if it's not possible to change via the > configuration itself, then the next-best thing might be to introduce an > environment variable that allows configuring it. Well, given that the config is read first before it's written, it is totally possible to configure a timeout via the config, and I have some real-world proof that this works as intended (see below). > The other aspect that wasn't discussed in the commit message is how > concurrent writes are handled, both when they are non-conflicting > (updating different keys) and when they are conflicting (updating the > same key). After spending some more time in the code I think it's > ultimately nothing we have to worry about too much, as we only start > reading the configuration after we've locked it. Correct. I had performed this analysis myself when writing a similar patch to fix problems in Scalar's Functional Test suite, which wants to register _many_ Scalar repositories with ~/.gitconfig concurrently. The current iteration of the patch can be found here: https://github.com/microsoft/git/commit/a1c2d97cb61bc3697086d1749de848586df2ec54 It does include the config setting, leaving the default as "off" (but I missed the separate code path to rename sections, which has _independent_ code that also wants to lock the config file, which your patch did not miss). The subsequent child commit https://github.com/microsoft/git/commit/5d365c1f332b8d2214ae9c44970d6370ed9caffc configures it to 150ms in Scalar repositories only. This is notably larger than the 100ms you suggested, and it is rooted in the fact that NTFS I/O characteristics are unfortunately in need of a wider margin. In other words, the optimal value depends on the operating system (and the CPU load, as Junio had pointed out). For the record, feel free to adopt whatever you want from my patches for your next iteration (but also feel free to ignore all of it). > So in the semantically non-conflicting case there isn't really much of a > race, because things already work as expected. But in the semantically > conflicting case it's a bit different, as the latter writer will > overwrite the result of the former one. In theory it would be possible > to detect such conflicts by: > > - Reading the configuration file. > > - Taking the lock. > > - Rereading the configuration to check for conflicts. > > But even that is racy as the first writer might have succeeded before we > read the configuration the first time. So I'm not sure whether we can do > anything about that in the first place, as the race basically exists in > the outer loop controlled by the caller. > > So there probably isn't much we can do about that, and unless I missed > something I think your timeout is sensible. But ideally, such nuances > would be discussed as part of the commit message so that reviewers and > future readers are made aware of them. I agree. Complex cases like this would require a sort of transactional support to be added to `git config`, and that would in and of itself open a can of worms I'm not sure we should open unless there is a concrete use case that bites enough real-world scenarios to require acting upon. Ciao, Johannes ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] config: retry acquiring config.lock, configurable via core.configLockTimeout 2026-05-17 11:27 ` Johannes Schindelin @ 2026-05-17 13:21 ` Joerg Thalheim 2026-05-18 0:46 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Joerg Thalheim @ 2026-05-17 13:21 UTC (permalink / raw) To: git Cc: Junio C Hamano, Patrick Steinhardt, Johannes Schindelin, Jörg Thalheim From: Jörg Thalheim <joerg@thalheim.io> Concurrent config writers race for the ".lock" file, which is taken with open(O_EXCL) and no retry, so the losers fail right away with "could not lock config file". This shows up with parallel "git worktree add -b" against the same repository: each one writes a couple of branch.* keys and the losers fail at random. Worse, "git worktree add" doesn't propagate that failure to its exit code, so the tracking config is silently dropped. (The swallowed error is a separate bug.) Retry instead of giving up on the first EEXIST. The lock is only held while rewriting a small file, so the loser only has to wait out the other writers. Same approach as 4ff0f01cb7 (refs: retry acquiring reference locks for 100ms, 2017-08-21). On the semantics: the on-disk config is read only after the lock is taken, so writers touching different keys can't lose each other's change. Writers touching the same key still get last-writer-wins, but that is already the case today and would need a compare-and-swap config API to fix. The retry only turns hard failures into successes. Default to 1000ms, like core.packedRefsTimeout: same shape of problem, one shared file everyone serializes through. A larger timeout only costs anything when a stale lock is left behind by a crash, which is rare; a smaller one fails spuriously on slow filesystems (NTFS has been seen needing more than 100ms). Make it configurable as core.configLockTimeout. There is no chicken-and-egg problem: we read the config before we lock it. microsoft/git carries a similar patch (core.configWriteLockTimeoutMS, default off) for Scalar's tests. Defaulting to non-zero here because the worktree case fails silently. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Jörg Thalheim <joerg@thalheim.io> --- Thanks for the review and for poking me, this had fallen off my radar. v1 -> v2: - added core.configLockTimeout. Johannes is right that there is no chicken-and-egg problem (config is read before the lock), so no env var needed. - default bumped to 1000ms; packed-refs is the closer precedent and it keeps NTFS out of trouble. - commit message now covers the read-after-lock / last-writer-wins semantics Patrick asked about. - added tests; existing stale-lock tests in t3200/t5505 now pass -c core.configLockTimeout=0 so they still fail fast. I matched the core.filesRefLockTimeout naming rather than reusing microsoft/git's core.configWriteLockTimeoutMS, but can switch if the downstream compat matters more. Documentation/config/core.adoc | 8 ++++++++ config.c | 24 ++++++++++++++++++++++-- t/t1300-config.sh | 17 +++++++++++++++++ t/t3200-branch.sh | 6 ++++-- t/t5505-remote.sh | 3 ++- 5 files changed, 53 insertions(+), 5 deletions(-) diff --git a/Documentation/config/core.adoc b/Documentation/config/core.adoc index a0ebf03e2e..340329edc3 100644 --- a/Documentation/config/core.adoc +++ b/Documentation/config/core.adoc @@ -589,6 +589,14 @@ core.packedRefsTimeout:: all; -1 means to try indefinitely. Default is 1000 (i.e., retry for 1 second). +core.configLockTimeout:: + The length of time, in milliseconds, to retry when trying to + lock a configuration file for writing. Value 0 means not to + retry at all; -1 means to try indefinitely. Default is 1000 + (i.e., retry for 1 second). This is read from the configuration + that is already on disk before the lock is taken, so it can be + set persistently like any other option. + core.pager:: Text viewer for use by Git commands (e.g., 'less'). The value is meant to be interpreted by the shell. The order of preference diff --git a/config.c b/config.c index 156f2a24fa..ac9563781b 100644 --- a/config.c +++ b/config.c @@ -2903,6 +2903,24 @@ char *git_config_prepare_comment_string(const char *comment) return prepared; } +/* + * How long to retry acquiring config.lock when another process holds + * it. Default matches core.packedRefsTimeout; override via + * core.configLockTimeout. + */ +static long config_lock_timeout_ms(struct repository *r) +{ + static int configured; + static int timeout_ms = 1000; + + if (!configured) { + repo_config_get_int(r, "core.configlocktimeout", &timeout_ms); + configured = 1; + } + + return timeout_ms; +} + static void validate_comment_string(const char *comment) { size_t leading_blanks; @@ -2986,7 +3004,8 @@ int repo_config_set_multivar_in_file_gently(struct repository *r, * The lock serves a purpose in addition to locking: the new * contents of .git/config will be written into it. */ - fd = hold_lock_file_for_update(&lock, config_filename, 0); + fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, + config_lock_timeout_ms(r)); if (fd < 0) { error_errno(_("could not lock config file %s"), config_filename); ret = CONFIG_NO_LOCK; @@ -3331,7 +3350,8 @@ static int repo_config_copy_or_rename_section_in_file( if (!config_filename) config_filename = filename_buf = repo_git_path(r, "config"); - out_fd = hold_lock_file_for_update(&lock, config_filename, 0); + out_fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0, + config_lock_timeout_ms(r)); if (out_fd < 0) { ret = error(_("could not lock config file %s"), config_filename); goto out; diff --git a/t/t1300-config.sh b/t/t1300-config.sh index 128971ee12..12a1cb8580 100755 --- a/t/t1300-config.sh +++ b/t/t1300-config.sh @@ -2939,4 +2939,21 @@ test_expect_success 'writing value with trailing CR not stripped on read' ' test_cmp expect actual ' +test_expect_success 'writing config fails immediately with core.configLockTimeout=0' ' + test_when_finished "rm -f .git/config.lock" && + >.git/config.lock && + test_must_fail git -c core.configLockTimeout=0 config foo.bar baz 2>err && + test_grep "could not lock config file" err +' + +test_expect_success 'writing config retries until lock is released' ' + test_when_finished "rm -f .git/config.lock" && + >.git/config.lock && + { + ( sleep 1 && rm -f .git/config.lock ) & + } && + git -c core.configLockTimeout=5000 config retried.key value && + test "$(git config retried.key)" = value +' + test_done diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh index e7829c2c4b..5f8a31c21d 100755 --- a/t/t3200-branch.sh +++ b/t/t3200-branch.sh @@ -1037,7 +1037,8 @@ test_expect_success '--set-upstream-to fails on locked config' ' test_when_finished "rm -f .git/config.lock" && >.git/config.lock && git branch locked && - test_must_fail git branch --set-upstream-to locked 2>err && + test_must_fail git -c core.configLockTimeout=0 \ + branch --set-upstream-to locked 2>err && test_grep "could not lock config file .git/config" err ' @@ -1068,7 +1069,8 @@ test_expect_success '--unset-upstream should fail if config is locked' ' test_when_finished "rm -f .git/config.lock" && git branch --set-upstream-to locked && >.git/config.lock && - test_must_fail git branch --unset-upstream 2>err && + test_must_fail git -c core.configLockTimeout=0 \ + branch --unset-upstream 2>err && test_grep "could not lock config file .git/config" err ' diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh index e592c0bcde..aea9222649 100755 --- a/t/t5505-remote.sh +++ b/t/t5505-remote.sh @@ -1327,7 +1327,8 @@ test_expect_success 'remote set-url with locked config' ' test_when_finished "rm -f .git/config.lock" && git config --get-all remote.someremote.url >expect && >.git/config.lock && - test_must_fail git remote set-url someremote baz && + test_must_fail git -c core.configLockTimeout=0 \ + remote set-url someremote baz && git config --get-all remote.someremote.url >actual && cmp expect actual ' -- 2.53.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] config: retry acquiring config.lock, configurable via core.configLockTimeout 2026-05-17 13:21 ` [PATCH v2] config: retry acquiring config.lock, configurable via core.configLockTimeout Joerg Thalheim @ 2026-05-18 0:46 ` Junio C Hamano 2026-05-18 8:07 ` Patrick Steinhardt 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2026-05-18 0:46 UTC (permalink / raw) To: Joerg Thalheim; +Cc: git, Patrick Steinhardt, Johannes Schindelin Joerg Thalheim <joerg@thalheim.io> writes: > +/* > + * How long to retry acquiring config.lock when another process holds > + * it. Default matches core.packedRefsTimeout; override via > + * core.configLockTimeout. > + */ > +static long config_lock_timeout_ms(struct repository *r) > +{ > + static int configured; > + static int timeout_ms = 1000; > + > + if (!configured) { > + repo_config_get_int(r, "core.configlocktimeout", &timeout_ms); > + configured = 1; > + } > + > + return timeout_ms; > +} The above design means whichever repository happens to be passed for the first time as "r" to this call will fix the return value from the function for the rest of the system, meaning that the lock timeout is a per-process property and the repository parameter passed to the function does not matter all that much. It may make sense to admit that this is not a per-repository property (due to the use of local caching), have the function take no parameter and use the_repository to the config_get call. That would make the intention more clear. Of course the other end of the spectrum is to get rid of the "configured" caching here, and ask the config system to make a hashtable look-up every time the function is called. That will keep the lock timeout per-repository, which is closer to what the current function signature suggests. I dunno. My gut feeling is that there aren't valid reasons why you would want to specifically set different timeout values per repository, so the simplicity of using the_repository (i.e. the primary repository instance this process deals with) sounds like a better way to go. Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] config: retry acquiring config.lock, configurable via core.configLockTimeout 2026-05-18 0:46 ` Junio C Hamano @ 2026-05-18 8:07 ` Patrick Steinhardt 0 siblings, 0 replies; 11+ messages in thread From: Patrick Steinhardt @ 2026-05-18 8:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: Joerg Thalheim, git, Johannes Schindelin On Mon, May 18, 2026 at 09:46:05AM +0900, Junio C Hamano wrote: > Joerg Thalheim <joerg@thalheim.io> writes: > > > +/* > > + * How long to retry acquiring config.lock when another process holds > > + * it. Default matches core.packedRefsTimeout; override via > > + * core.configLockTimeout. > > + */ > > +static long config_lock_timeout_ms(struct repository *r) > > +{ > > + static int configured; > > + static int timeout_ms = 1000; > > + > > + if (!configured) { > > + repo_config_get_int(r, "core.configlocktimeout", &timeout_ms); > > + configured = 1; > > + } > > + > > + return timeout_ms; > > +} > > The above design means whichever repository happens to be passed for > the first time as "r" to this call will fix the return value from > the function for the rest of the system, meaning that the lock timeout > is a per-process property and the repository parameter passed to the > function does not matter all that much. > > It may make sense to admit that this is not a per-repository > property (due to the use of local caching), have the function take > no parameter and use the_repository to the config_get call. That > would make the intention more clear. > > Of course the other end of the spectrum is to get rid of the > "configured" caching here, and ask the config system to make a > hashtable look-up every time the function is called. That will keep > the lock timeout per-repository, which is closer to what the current > function signature suggests. > > I dunno. My gut feeling is that there aren't valid reasons why you > would want to specifically set different timeout values per > repository, so the simplicity of using the_repository (i.e. the > primary repository instance this process deals with) sounds like a > better way to go. There probably is no reason to have different values per repo. But to me the question is whether there even are any use cases where we have to lock the config file so often in quick succession that the caching mechanism even matters. My gut feeling says no, also because parsing the value from the configuration is going to be drowned out by actually writing the lockfile and renaming it into place. So I'd rather lean towards dropping the cache and keeping the repository parameter. Patrick ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-05-18 8:07 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-03 10:01 [PATCH] config: retry acquiring config.lock for 100ms Joerg Thalheim 2026-04-03 17:53 ` Junio C Hamano 2026-04-08 10:34 ` Patrick Steinhardt 2026-05-11 2:32 ` Junio C Hamano 2026-05-11 7:33 ` Patrick Steinhardt 2026-05-11 9:06 ` Jörg Thalheim 2026-05-11 10:01 ` Patrick Steinhardt 2026-05-17 11:27 ` Johannes Schindelin 2026-05-17 13:21 ` [PATCH v2] config: retry acquiring config.lock, configurable via core.configLockTimeout Joerg Thalheim 2026-05-18 0:46 ` Junio C Hamano 2026-05-18 8:07 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox