* git hogs the CPU, RAM and storage despite its config
@ 2026-05-04 15:27 jean-christophe manciot
2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King
0 siblings, 1 reply; 15+ messages in thread
From: jean-christophe manciot @ 2026-05-04 15:27 UTC (permalink / raw)
To: git
What did you do before the bug happened? (Steps to reproduce your issue)
many:
git add -v --force -- "${file_path}/${file_name}"
git commit -v -m "${full_commit_message}" -o -- "${file_path}/${file_name}"
What did you expect to happen? (Expected behavior)
I expected git to respect the configuration (.git/config):
[core]
repositoryformatversion = 0
filemode = false
bare = false
logallrefupdates = true
fsmonitor = true
untrackedcache = true
[remote "origin"]
url = git@examle.com:ppa
fetch = +refs/heads/*:refs/remotes/origin/*
[branch]
autosetuprebase = always
[branch "main"]
remote = origin
merge = refs/heads/main
rebase = true
[feature]
manyFiles = true
[fetch]
writeCommitGraph = true
[gc]
auto = 0
[pack]
threads = 1
windowMemory = 1g
I expected git to use maximum one thread for packing and I'm surprised
it even tried to perform packing as gc.auto was disabled.
What happened instead? (Actual behavior)
Instead, it used all the threads it could find (28 out of 32),
depriving the whole server of CPU, RAM and storage as the tmp files
kept piling up.
What's different between what you expected and what actually happened?
Uncontrollable use of CPU threads, RAM and storage
Anything else you want to add:
All the threads were running:
git pack-objects --local --delta-base-offset --honor-pack-keep
.git/objects/pack/.tmp-<number>-pack
Please review the rest of the bug report below.
You can delete any lines you don't wish to share.
[System Info]
git version:
git version 2.51.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 8.14.1
zlib: 1.3.1
SHA-1: SHA1_DC
SHA-256: SHA256_BLK
default-ref-format: files
default-hash: sha1
uname: Linux 6.17.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr
11 23:29:57 UTC 2026 x86_64
compiler info: gnuc: 15.2
libc info: glibc: 2.42
$SHELL (typically, interactive shell): /bin/bash
[Enabled Hooks]
--
Jean-Christophe
^ permalink raw reply [flat|nested] 15+ messages in thread* unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-04 15:27 git hogs the CPU, RAM and storage despite its config jean-christophe manciot @ 2026-05-08 18:03 ` Jeff King 2026-05-09 15:13 ` Mikael Magnusson 2026-05-09 17:52 ` Jeff King 0 siblings, 2 replies; 15+ messages in thread From: Jeff King @ 2026-05-08 18:03 UTC (permalink / raw) To: jean-christophe manciot; +Cc: git, Derrick Stolee On Mon, May 04, 2026 at 05:27:21PM +0200, jean-christophe manciot wrote: > [gc] > auto = 0 This is enough to disable auto-gc. But these days we also (instead?) run git-maintenance, which is controlled by maintenance.auto. So you probably are getting a bunch of background git-maintenance runs kicked off. > [pack] > threads = 1 > windowMemory = 1g > > I expected git to use maximum one thread for packing and I'm surprised > it even tried to perform packing as gc.auto was disabled. This should work to tell pack-objects to use only one thread, but that is one thread per invocation. And we were probably kicking off a ton of processes due to the background maintenance (and worse, they were all doing the same work redundantly and maybe even stepping on each others toes). +cc Stolee for wisdom on all things git-maintenance. Should maintenance.auto fall back to gc.auto for compatibility and avoiding unwanted surprises when people upgrade? Also, should background maintenance be locking to avoid multiple runs? It does not seem to do so, and if I run: git init for i in $(seq 10000); do echo $i >>file git add file git commit -m "commit $i" done I get several concurrent pack-objects processes. After a few thousand commits I got bored and hit ^C, and the resulting repo was corrupt! Which is not too surprising, as multiple simultaneous repacks are known to be unsafe, but means we should probably avoid them. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King @ 2026-05-09 15:13 ` Mikael Magnusson 2026-05-09 17:53 ` Jeff King 2026-05-09 17:52 ` Jeff King 1 sibling, 1 reply; 15+ messages in thread From: Mikael Magnusson @ 2026-05-09 15:13 UTC (permalink / raw) To: Jeff King; +Cc: jean-christophe manciot, git, Derrick Stolee On Fri, May 8, 2026 at 8:06 PM Jeff King <peff@peff.net> wrote: > > On Mon, May 04, 2026 at 05:27:21PM +0200, jean-christophe manciot wrote: > > > [gc] > > auto = 0 > > This is enough to disable auto-gc. But these days we also (instead?) run > git-maintenance, which is controlled by maintenance.auto. So you > probably are getting a bunch of background git-maintenance runs kicked > off. > > > [pack] > > threads = 1 > > windowMemory = 1g > > > > I expected git to use maximum one thread for packing and I'm surprised > > it even tried to perform packing as gc.auto was disabled. > > This should work to tell pack-objects to use only one thread, but that > is one thread per invocation. And we were probably kicking off a ton of > processes due to the background maintenance (and worse, they were all > doing the same work redundantly and maybe even stepping on each others > toes). > > +cc Stolee for wisdom on all things git-maintenance. > > Should maintenance.auto fall back to gc.auto for compatibility and > avoiding unwanted surprises when people upgrade? > > Also, should background maintenance be locking to avoid multiple runs? > It does not seem to do so, and if I run: > > git init > for i in $(seq 10000); do > echo $i >>file > git add file > git commit -m "commit $i" > done > > I get several concurrent pack-objects processes. After a few thousand > commits I got bored and hit ^C, and the resulting repo was corrupt! > Which is not too surprising, as multiple simultaneous repacks are known > to be unsafe, but means we should probably avoid them. This sounds pretty horrific, this maintenance thing is enabled by default and there's nothing but pure chance that stops it from corrupting the repo if I happen to run git repack manually at the same time? It's hard to believe something this disruptive is even enabled by default, I don't expect jobs to kick off at random hours using up resources when I'm not working on git. (I'm assuming I'm several months late to protest this being enabled by default, but still). I would strongly suggest anything like this is *never* enabled by default, it is extremely surprising to find out about. Hopefully I misunderstood the part about this being enabled by default, and you still need to say "git maintenance register", right? (although the manpage erroneously(?) claims this will only enable tasks that are safe, which you seem to have disproven?). The documentation is extremely confusing to read, it says that registering a repo for maintenance will *disable* maintenance.auto in the current repo? It almost seems like you guys made two entirely separate things, named them both the same thing, and expected anyone to understand what's going on. git maintenance will schedule tasks to run in the background with cron, and if you don't do this, the config variable named "maintenance.auto" will control if things are repacked actively while doing things in the foreground? Holy moly. I'm leaving my confusion in place even though I figured it out because that is something else. -- Mikael Magnusson ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-09 15:13 ` Mikael Magnusson @ 2026-05-09 17:53 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2026-05-09 17:53 UTC (permalink / raw) To: Mikael Magnusson; +Cc: jean-christophe manciot, git, Derrick Stolee On Sat, May 09, 2026 at 05:13:17PM +0200, Mikael Magnusson wrote: > This sounds pretty horrific, this maintenance thing is enabled by > default and there's nothing but pure chance that stops it from > corrupting the repo if I happen to run git repack manually at the same > time? It's hard to believe something this disruptive is even enabled > by default, I don't expect jobs to kick off at random hours using up > resources when I'm not working on git. (I'm assuming I'm several > months late to protest this being enabled by default, but still). I > would strongly suggest anything like this is *never* enabled by > default, it is extremely surprising to find out about. Hopefully I > misunderstood the part about this being enabled by default, and you > still need to say "git maintenance register", right? (although the > manpage erroneously(?) claims this will only enable tasks that are > safe, which you seem to have disproven?). This is not the time-based maintenance runs, but rather an "auto" mode that is triggered by commands like commit when the repo state looks like it is in need. That mode has existed for many years, but used to be "git gc --auto". And now it is "git maintenance run --auto" with a new name and new code (and, it turns out, some new bugs). The locking issue seems to be specific to --detach mode, so I don't think affects time-based runs, only the auto mode (though if you register for time-based maintenance and then manually run git-repack outside of "git maintenance run", I suspect you are subject to an unlikely race there). I think this did get much worse in v2.54. I left more details elsewhere in the thread. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King 2026-05-09 15:13 ` Mikael Magnusson @ 2026-05-09 17:52 ` Jeff King 2026-05-09 21:52 ` Taylor Blau 1 sibling, 1 reply; 15+ messages in thread From: Jeff King @ 2026-05-09 17:52 UTC (permalink / raw) To: jean-christophe manciot; +Cc: Patrick Steinhardt, git, Derrick Stolee On Fri, May 08, 2026 at 02:03:41PM -0400, Jeff King wrote: > Also, should background maintenance be locking to avoid multiple runs? > It does not seem to do so, and if I run: > > git init > for i in $(seq 10000); do > echo $i >>file > git add file > git commit -m "commit $i" > done > > I get several concurrent pack-objects processes. After a few thousand > commits I got bored and hit ^C, and the resulting repo was corrupt! > Which is not too surprising, as multiple simultaneous repacks are known > to be unsafe, but means we should probably avoid them. There is a pretty awful bug here, and it got much worse in v2.54. First, the locking for "git maintenance run --detach" is totally broken. We take the lock as the first step of maintenance_run_tasks(), then run foreground tasks, then daemonize(), and then run background tasks. But daemonize() forks and exits the parent process, leaving the child process to do the real work. But the lock subsystem registers the lockfile as a tempfile to be cleaned up at exit. So the moment we daemonize, we release the lock, even though we're still going to run sub-commands that should happen under lock. I don't think this affected the old "git gc --detach" because it takes the lock after daemonizing[1]. We can't do the same here, though, since we need to hold the lock for the foreground tasks. So either we need to release and re-take the lock between foreground and background tasks, or we need to teach the daemonize() function to update the "owner" field on all of the tempfiles to the new child[2]. AFAICT this has been broken since a6affd3343 (builtin/maintenance: add a `--detach` flag, 2024-08-16). But it was mostly harmless because under the hood we were running the "gc" task, which ran git-gc itself in no-detach mode. So git-gc took its own lock, and we never ended up with more than one repack running at a time. But that changed in 452b12c2e0 (builtin/maintenance: use "geometric" strategy by default, 2026-02-24), which is new in v2.54. Now we are running repack directly, with no additional locking. So because of the lock bug from a6affd3343, that means lots of repacks running at the same time. Ultimately fixing the lock bug will solve that. Though if doing so is too complicated for a quick maint release, I'm tempted to say we should consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a few other regression fixes so far, I assume we'll have one soon-ish). -Peff [1] git-gc does have a mode where it will take the lock and run some foreground tasks. That mode probably has the same bug, but I'm not sure if it existed before we switched to "maintenance run --auto". At this point it's moot. [2] You can't just disable the atexit handler in the parent process here, because then the child wouldn't actually clean them up either (because its pid will not match the owner field). So we really need to transfer ownership during the daemonize() process, which probably needs a new tempfile.[ch] function to walk the list and update all of the owner fields. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-09 17:52 ` Jeff King @ 2026-05-09 21:52 ` Taylor Blau 2026-05-10 16:08 ` Derrick Stolee 2026-05-11 20:01 ` Jeff King 0 siblings, 2 replies; 15+ messages in thread From: Taylor Blau @ 2026-05-09 21:52 UTC (permalink / raw) To: Jeff King Cc: jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee On Sat, May 09, 2026 at 01:52:49PM -0400, Jeff King wrote: > I don't think this affected the old "git gc --detach" because it takes > the lock after daemonizing[1]. We can't do the same here, though, since we > need to hold the lock for the foreground tasks. So either we need to > release and re-take the lock between foreground and background tasks, or > we need to teach the daemonize() function to update the "owner" field on > all of the tempfiles to the new child[2]. I agree. We can't simply do what was done in 329e6e8794c (gc: save log from daemonized gc --auto and print it next time, 2015-09-19), for exactly the reason that you stated. Dropping and re-acquiring the lock is possible, but it's racy since there is a gap during the critical window while we fork(). I believe that the only airtight way to do this is to update the owner field of the tempfiles we want to pass down during daemonization. (As an aside, do we want to do that for all tempfiles? It might be nice to have a "->reassign_on_fork" flag or something on the tempfile struct in case there are instances where the parent wants to retain ownership of the tempfile after fork()-ing, but I can't think of any off the top of my head. If we do introduce such a field, it should probably default to "true" to avoid any foot-guns.) > Ultimately fixing the lock bug will solve that. Though if doing so is > too complicated for a quick maint release, I'm tempted to say we should > consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a > few other regression fixes so far, I assume we'll have one soon-ish). I think something like the following (untested) would do the trick: --- 8< --- diff --git a/setup.c b/setup.c index 7ec4427368a..c07aeac4f7d 100644 --- a/setup.c +++ b/setup.c @@ -22,6 +22,7 @@ #include "chdir-notify.h" #include "path.h" #include "quote.h" +#include "tempfile.h" #include "trace.h" #include "trace2.h" #include "worktree.h" @@ -2162,12 +2163,17 @@ int daemonize(void) errno = ENOSYS; return -1; #else - switch (fork()) { + pid_t ppid = getpid(); + pid_t pid; + + switch ((pid = fork())) { case 0: + reassign_tempfile_ownership(ppid, getpid()); break; case -1: die_errno(_("fork failed")); default: + reassign_tempfile_ownership(ppid, pid); exit(0); } if (setsid() == -1) diff --git a/tempfile.c b/tempfile.c index 82dfa3d82f2..f0fdf582794 100644 --- a/tempfile.c +++ b/tempfile.c @@ -373,3 +373,15 @@ int delete_tempfile(struct tempfile **tempfile_p) return err ? -1 : 0; } + +void reassign_tempfile_ownership(pid_t from, pid_t to) +{ + volatile struct volatile_list_head *pos; + + list_for_each(pos, &tempfile_list) { + struct tempfile *p = list_entry(pos, struct tempfile, list); + + if (is_tempfile_active(p) && p->owner == from) + p->owner = to; + } +} diff --git a/tempfile.h b/tempfile.h index 2d2ae5b657d..783d7470b54 100644 --- a/tempfile.h +++ b/tempfile.h @@ -282,4 +282,16 @@ int delete_tempfile(struct tempfile **tempfile_p); */ int rename_tempfile(struct tempfile **tempfile_p, const char *path); +/* + * Reassign ownership of all active tempfiles whose `owner` field + * matches `from` to `to`. + * + * This is intended for use by `daemonize()`; after `fork(2)`-ing, the + * parent transfers ownership to the daemonized child so that its + * atexit handler does not unlink tempfiles that should outlive it, + * and the child claims the inherited tempfiles so that they are + * cleaned up when the daemon exits. + */ +void reassign_tempfile_ownership(pid_t from, pid_t to); + #endif /* TEMPFILE_H */ --- >8 --- That's not too terrible to write, and I would feel OK about putting it in a 2.54.1 release soon-ish provided that others think it is reasonable. Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric" strategy by default, 2026-02-24) feels somewhat unsatisfying, since it is merely making the bug less likely rather than eliminating it entirely. So in that sense I would prefer to "fix forward" here rather than to mask over the bug. But even the relatively short diff above is not so straightforward to reason through, review, or test, so I'm open to other ideas on how to proceed here. Thanks, Taylor ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-09 21:52 ` Taylor Blau @ 2026-05-10 16:08 ` Derrick Stolee 2026-05-10 20:00 ` Taylor Blau 2026-05-11 20:10 ` Jeff King 2026-05-11 20:01 ` Jeff King 1 sibling, 2 replies; 15+ messages in thread From: Derrick Stolee @ 2026-05-10 16:08 UTC (permalink / raw) To: Taylor Blau, Jeff King; +Cc: jean-christophe manciot, Patrick Steinhardt, git On 5/9/26 5:52 PM, Taylor Blau wrote: > On Sat, May 09, 2026 at 01:52:49PM -0400, Jeff King wrote: >> I don't think this affected the old "git gc --detach" because it takes >> the lock after daemonizing[1]. We can't do the same here, though, since we >> need to hold the lock for the foreground tasks. So either we need to >> release and re-take the lock between foreground and background tasks, or >> we need to teach the daemonize() function to update the "owner" field on >> all of the tempfiles to the new child[2]. > > I agree. We can't simply do what was done in 329e6e8794c (gc: save log > from daemonized gc --auto and print it next time, 2015-09-19), for > exactly the reason that you stated. > > Dropping and re-acquiring the lock is possible, but it's racy since > there is a gap during the critical window while we fork(). I believe > that the only airtight way to do this is to update the owner field of > the tempfiles we want to pass down during daemonization. I agree that this drop/reacquire pattern would not be a safe choice. > (As an aside, do we want to do that for all tempfiles? It might be nice > to have a "->reassign_on_fork" flag or something on the tempfile struct > in case there are instances where the parent wants to retain ownership > of the tempfile after fork()-ing, but I can't think of any off the top > of my head. If we do introduce such a field, it should probably default > to "true" to avoid any foot-guns.) > >> Ultimately fixing the lock bug will solve that. Though if doing so is >> too complicated for a quick maint release, I'm tempted to say we should >> consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a >> few other regression fixes so far, I assume we'll have one soon-ish). > > I think something like the following (untested) would do the trick: > > --- 8< --- > diff --git a/setup.c b/setup.c > index 7ec4427368a..c07aeac4f7d 100644 > --- a/setup.c > +++ b/setup.c > @@ -22,6 +22,7 @@ > #include "chdir-notify.h" > #include "path.h" > #include "quote.h" > +#include "tempfile.h" > #include "trace.h" > #include "trace2.h" > #include "worktree.h" > @@ -2162,12 +2163,17 @@ int daemonize(void) > errno = ENOSYS; > return -1; > #else > - switch (fork()) { > + pid_t ppid = getpid(); > + pid_t pid; > + > + switch ((pid = fork())) { > case 0: > + reassign_tempfile_ownership(ppid, getpid()); > break; > case -1: > die_errno(_("fork failed")); > default: > + reassign_tempfile_ownership(ppid, pid); > exit(0); > } > if (setsid() == -1) > diff --git a/tempfile.c b/tempfile.c > index 82dfa3d82f2..f0fdf582794 100644 > --- a/tempfile.c > +++ b/tempfile.c > @@ -373,3 +373,15 @@ int delete_tempfile(struct tempfile **tempfile_p) > > return err ? -1 : 0; > } > + > +void reassign_tempfile_ownership(pid_t from, pid_t to) > +{ > + volatile struct volatile_list_head *pos; > + > + list_for_each(pos, &tempfile_list) { > + struct tempfile *p = list_entry(pos, struct tempfile, list); > + > + if (is_tempfile_active(p) && p->owner == from) > + p->owner = to; > + } > +} I worry that we're assigning _all_ tempfiles to the child in this case, not just specific ones like the maintenance lock. But since this is specifically called only in daemonize() then it may be fine. Is there any concern that since the child didn't create a tempfile that the atexit() may not be initialized in the child process? Or will that carry over from the fork() automatically? (This is hard to test without something causing a die() in the detached process.) > That's not too terrible to write, and I would feel OK about putting it > in a 2.54.1 release soon-ish provided that others think it is reasonable. I'd rather revert the geometric maintenance default for a point release and let something more complicated like this percolate until the next release window. > Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric" > strategy by default, 2026-02-24) feels somewhat unsatisfying, since it > is merely making the bug less likely rather than eliminating it > entirely. It does limit the behavior to those who previously chose to opt-in to this behavior, and likely that's a low number or else we'd have more bug reports. > So in that sense I would prefer to "fix forward" here rather than to > mask over the bug. But even the relatively short diff above is not so > straightforward to reason through, review, or test, so I'm open to other > ideas on how to proceed here. I initially worried about cross-platform support, thinking that we needed to pass file descriptors / handles and Windows always has issues with file handles. But we aren't actually keeping a handle open but instead a record that we created the lock and should delete it when everything resolves. For me to be convinced that this forward fix is the right direction, I'd need to see a test that proves the detached process will clean up the locks on a normal process end and an early exit. Thanks, -Stolee ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-10 16:08 ` Derrick Stolee @ 2026-05-10 20:00 ` Taylor Blau 2026-05-11 6:42 ` Patrick Steinhardt 2026-05-11 20:10 ` Jeff King 1 sibling, 1 reply; 15+ messages in thread From: Taylor Blau @ 2026-05-10 20:00 UTC (permalink / raw) To: Derrick Stolee Cc: Jeff King, jean-christophe manciot, Patrick Steinhardt, git On Sun, May 10, 2026 at 12:08:14PM -0400, Derrick Stolee wrote: > I worry that we're assigning _all_ tempfiles to the child in this > case, not just specific ones like the maintenance lock. But since > this is specifically called only in daemonize() then it may be > fine. Right, I had wondered about this above, and I'm not sure what the right behavior is here. I think in our case its fine to pass all tempfiles down to the child, since we're about to exit. The only other spot that uses `daemonize()` is 'gc --detach' and 'git daemon'. I think before considering something like this we would want to reason through what the intended behavior of each of those callers is. If we end up pursuing a fix in this direction, I think the safest thing that we could do would be to have callers opt-in to this behavior so we can do the right thing in git-maintenance, and other callers retain their existing behavior. > Is there any concern that since the child didn't create a tempfile > that the atexit() may not be initialized in the child process? Or > will that carry over from the fork() automatically? (This is hard > to test without something causing a die() in the detached process.) The child will have its own copy of the tempfile list as well as its own copy of the atexit() handlers. POSIX (via atexit(3p)) says[1] that atexit() behaves according to the ISO C standard: The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1‐2017 defers to the ISO C standard. , and atexit(3) says[2]: When a child process is created via fork(2), it inherits copies of its parent's registrations. Upon a successful call to one of the exec(3) functions, all registrations are removed. So the child has an identical copy of both the atexit() handlers and the tempfile list. > > That's not too terrible to write, and I would feel OK about putting it > > in a 2.54.1 release soon-ish provided that others think it is reasonable. > > I'd rather revert the geometric maintenance default for a point > release and let something more complicated like this percolate > until the next release window. > > > Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric" > > strategy by default, 2026-02-24) feels somewhat unsatisfying, since it > > is merely making the bug less likely rather than eliminating it > > entirely. > > It does limit the behavior to those who previously chose to opt-in to > this behavior, and likely that's a low number or else we'd have more > bug reports. I agree with what you're saying. In an ideal world we would be able to apply a fix on top that would prevent this bug from occurring, but if that's not trivial to do then we shouldn't rush it in the 2.54.1 window. I think that not having this bug whatsoever (as opposed to simply reverting 452b12c2e0) would be preferable, but as you note we haven't seen any bug reports in a pre-452b12c2e0 world, so perhaps the risk is low enough that we could merely revert 452b12c2e0. > For me to be convinced that this forward fix is the right direction, > I'd need to see a test that proves the detached process will clean up > the locks on a normal process end and an early exit. Yeah, I agree. Testing this is a little tricky, but I played around with it for a while and came up with the following: --- 8< --- diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 4700beacc18..77bfa537385 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -1460,6 +1460,56 @@ test_expect_success '--detach causes maintenance to run in background' ' ) ' +test_expect_success PIPE '--detach holds maintenance lock until daemonized child exits' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + mkfifo fifo && + + git config maintenance.auto false && + git config core.lockfilepid true && + + git remote add origin /does/not/exist && + git config set remote.origin.uploadpack \ + "echo \$PPID >child && cat \"$(pwd)/fifo\"" && + + # Start a detached prefetch maintenance task. Note that + # we are backgrounding git-maintenance here in order to + # determine its PID to validate that the lockfile was + # created by the parent. + { git maintenance run --task=prefetch --detach & } && + parent="$!" && + + # Open fifo for writing, which will block until the + # upload-pack helper opens it for reading. Once exec + # returns, we know that the daemonized child is alive + # and pinned. + exec 8>fifo && + + test_path_is_file .git/objects/maintenance.lock && + test_path_is_file .git/objects/maintenance~pid.lock && + + # Verify that the maintenance.lock still exists, and + # that it was created by the parent process, not the + # child. + echo "pid $parent" >expect && + test_cmp expect .git/objects/maintenance~pid.lock && + + # Close the write end of the FIFO, causing our upload-pack + # helper to quit. Wait until the grandparent (from the + # perspective of our upload-pack helper, the daemonized + # git-maintenance child) + exec 8>&- && + gpid="$(ps -o ppid= -p $(cat child) | tr -d " ")" && + test -n $gpid && wait $gpid && + + test_path_is_missing .git/objects/maintenance.lock && + test_path_is_missing .git/objects/maintenance~pid.lock + ) +' + test_expect_success 'repacking loose objects is quiet' ' test_when_finished "rm -rf repo" && git init repo && --- >8 --- I'm not sure how portable it is, though. I think that 'ps -o ppid=' is OK, since 'start_git_in_background()' uses it, but I'm not sure if $PPID is available on Windows. It reliably passes with the fix that I shared earlier in the thread, and reliably fails without it. Thanks, Taylor [1]: https://man7.org/linux/man-pages/man3/atexit.3p.html [2]: https://man7.org/linux/man-pages/man3/atexit.3.html ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-10 20:00 ` Taylor Blau @ 2026-05-11 6:42 ` Patrick Steinhardt 2026-05-11 20:22 ` Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Patrick Steinhardt @ 2026-05-11 6:42 UTC (permalink / raw) To: Taylor Blau; +Cc: Derrick Stolee, Jeff King, jean-christophe manciot, git On Sun, May 10, 2026 at 04:00:08PM -0400, Taylor Blau wrote: > On Sun, May 10, 2026 at 12:08:14PM -0400, Derrick Stolee wrote: > > > That's not too terrible to write, and I would feel OK about putting it > > > in a 2.54.1 release soon-ish provided that others think it is reasonable. > > > > I'd rather revert the geometric maintenance default for a point > > release and let something more complicated like this percolate > > until the next release window. > > > > > Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric" > > > strategy by default, 2026-02-24) feels somewhat unsatisfying, since it > > > is merely making the bug less likely rather than eliminating it > > > entirely. > > > > It does limit the behavior to those who previously chose to opt-in to > > this behavior, and likely that's a low number or else we'd have more > > bug reports. > > I agree with what you're saying. In an ideal world we would be able to > apply a fix on top that would prevent this bug from occurring, but if > that's not trivial to do then we shouldn't rush it in the 2.54.1 window. > > I think that not having this bug whatsoever (as opposed to simply > reverting 452b12c2e0) would be preferable, but as you note we haven't > seen any bug reports in a pre-452b12c2e0 world, so perhaps the risk is > low enough that we could merely revert 452b12c2e0. I think having an actual fix for the locking logic in git-maintenance(1) would be preferable though, as the issue isn't merely contained to Git 2.54. With that release we are of course much more likely to hit the issue as the "geometric" strategy is the default now. But even before this release it was possible to configure that strategy, and if the user had opted in to it they might've hit the same bug. > > For me to be convinced that this forward fix is the right direction, > > I'd need to see a test that proves the detached process will clean up > > the locks on a normal process end and an early exit. > > Yeah, I agree. Testing this is a little tricky, but I played around > with it for a while and came up with the following: > > --- 8< --- > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > index 4700beacc18..77bfa537385 100755 > --- a/t/t7900-maintenance.sh > +++ b/t/t7900-maintenance.sh > @@ -1460,6 +1460,56 @@ test_expect_success '--detach causes maintenance to run in background' ' > ) > ' > > +test_expect_success PIPE '--detach holds maintenance lock until daemonized child exits' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + > + mkfifo fifo && > + > + git config maintenance.auto false && > + git config core.lockfilepid true && > + > + git remote add origin /does/not/exist && > + git config set remote.origin.uploadpack \ > + "echo \$PPID >child && cat \"$(pwd)/fifo\"" && > + > + # Start a detached prefetch maintenance task. Note that > + # we are backgrounding git-maintenance here in order to > + # determine its PID to validate that the lockfile was > + # created by the parent. > + { git maintenance run --task=prefetch --detach & } && > + parent="$!" && > + > + # Open fifo for writing, which will block until the > + # upload-pack helper opens it for reading. Once exec > + # returns, we know that the daemonized child is alive > + # and pinned. > + exec 8>fifo && > + > + test_path_is_file .git/objects/maintenance.lock && > + test_path_is_file .git/objects/maintenance~pid.lock && > + > + # Verify that the maintenance.lock still exists, and > + # that it was created by the parent process, not the > + # child. > + echo "pid $parent" >expect && > + test_cmp expect .git/objects/maintenance~pid.lock && > + > + # Close the write end of the FIFO, causing our upload-pack > + # helper to quit. Wait until the grandparent (from the > + # perspective of our upload-pack helper, the daemonized > + # git-maintenance child) > + exec 8>&- && > + gpid="$(ps -o ppid= -p $(cat child) | tr -d " ")" && > + test -n $gpid && wait $gpid && > + > + test_path_is_missing .git/objects/maintenance.lock && > + test_path_is_missing .git/objects/maintenance~pid.lock > + ) > +' > + > test_expect_success 'repacking loose objects is quiet' ' > test_when_finished "rm -rf repo" && > git init repo && > --- >8 --- > > I'm not sure how portable it is, though. I think that 'ps -o ppid=' is > OK, since 'start_git_in_background()' uses it, but I'm not sure if $PPID > is available on Windows. Yeah, this test seems to work on my system, too. But it's another Windows machine indeed. I'll start a CI pipeline to check whether it works on Windows, as well. In any case, I also agree with Stolee that it may have unintended consequences to unconditionally reassign tempfiles to the child process when daemonizing. It is okay for all current callers, but I'm not sure whether future callers would expect this behaviour. An alternative would be to explicitly reassign the lockfile's ownership in git-maintenance(1). Something like the below patch. Thanks! Patrick diff --git a/builtin/gc.c b/builtin/gc.c index 3a71e314c9..09cb92ac97 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1810,10 +1810,32 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts, TASK_PHASE_FOREGROUND)) result = 1; - /* Failure to daemonize is ok, we'll continue in foreground. */ if (opts->detach > 0) { + pid_t child_pid; + trace2_region_enter("maintenance", "detach", the_repository); - daemonize(); + + child_pid = daemonize_without_exit(); + if (!child_pid) { + /* + * We're in the child process, so we take ownership of + * the lockfile. + */ + lock_file_reassign_owner(&lk, getpid()); + } else if (child_pid > 0) { + /* + * We're in the parent process, so we assign ownership + * of the lockfile to the child and then exit immediately. + */ + lock_file_reassign_owner(&lk, child_pid); + exit(0); + } else { + /* + * Failure to daemonize is ok, we'll continue in + * foreground. + */ + } + trace2_region_leave("maintenance", "detach", the_repository); } diff --git a/lockfile.c b/lockfile.c index 7add2f136a..96aab3c885 100644 --- a/lockfile.c +++ b/lockfile.c @@ -356,3 +356,12 @@ int rollback_lock_file(struct lock_file *lk) delete_tempfile(&lk->pid_tempfile); return delete_tempfile(&lk->tempfile); } + +void lock_file_reassign_owner(struct lock_file *lk, pid_t owner) +{ + if (!is_lock_file_locked(lk)) + BUG("cannot reassign ownership of unlocked lockfile"); + lk->tempfile->owner = owner; + if (lk->pid_tempfile) + lk->pid_tempfile->owner = owner; +} diff --git a/lockfile.h b/lockfile.h index e7233f28de..0b10b624fa 100644 --- a/lockfile.h +++ b/lockfile.h @@ -341,4 +341,14 @@ static inline int commit_lock_file_to(struct lock_file *lk, const char *path) */ int rollback_lock_file(struct lock_file *lk); +/* + * Reassign ownership of the lockfile to a different process. + * + * This is intended for use after `fork(2)`-ing. The parent transfers ownership + * to the daemonized child so that its atexit handler does not unlink the lock + * that should outlive it, and the child claims the inherited tempfiles so that + * they are cleaned up when the daemon exits. + */ +void lock_file_reassign_owner(struct lock_file *lk, pid_t owner); + #endif /* LOCKFILE_H */ diff --git a/setup.c b/setup.c index 7ec4427368..d651e11d4b 100644 --- a/setup.c +++ b/setup.c @@ -2156,20 +2156,18 @@ void sanitize_stdfds(void) close(fd); } -int daemonize(void) +pid_t daemonize_without_exit(void) { #ifdef NO_POSIX_GOODIES errno = ENOSYS; return -1; #else - switch (fork()) { - case 0: - break; - case -1: - die_errno(_("fork failed")); - default: - exit(0); - } + pid_t pid = fork(); + if (pid < 0) + return error_errno(_("fork failed")); + if (pid > 0) + return pid; + if (setsid() == -1) die_errno(_("setsid failed")); close(0); @@ -2180,6 +2178,21 @@ int daemonize(void) #endif } +int daemonize(void) +{ +#ifdef NO_POSIX_GOODIES + errno = ENOSYS; + return -1; +#else + pid_t pid = daemonize_without_exit(); + if (pid < 0) + die(NULL); + if (pid > 0) + exit(0); + return 0; +#endif +} + struct template_dir_cb_data { char *path; int initialized; diff --git a/setup.h b/setup.h index 80bc6e5f07..10f8b377f2 100644 --- a/setup.h +++ b/setup.h @@ -150,6 +150,7 @@ int path_inside_repo(const char *prefix, const char *path); void sanitize_stdfds(void); int daemonize(void); +int daemonize_without_exit(void); /* * GIT_REPO_VERSION is the version we write by default. The ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-11 6:42 ` Patrick Steinhardt @ 2026-05-11 20:22 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2026-05-11 20:22 UTC (permalink / raw) To: Patrick Steinhardt Cc: Taylor Blau, Derrick Stolee, jean-christophe manciot, git On Mon, May 11, 2026 at 08:42:11AM +0200, Patrick Steinhardt wrote: > In any case, I also agree with Stolee that it may have unintended > consequences to unconditionally reassign tempfiles to the child process > when daemonizing. It is okay for all current callers, but I'm not sure > whether future callers would expect this behaviour. > > An alternative would be to explicitly reassign the lockfile's ownership > in git-maintenance(1). Something like the below patch. I really think this should be an automatic part of daemonize(), for the reasons I gave earlier in the thread. It's really a lurking problem for any cleanup handler, but lockfiles are the one that is most likely to bite us. In fact, I thought "git gc --detach" without "--skip-foreground-tasks" had the same bug, but it narrowly avoids it by dropping and reacquiring the lock (!) on either side of the daemonize() call. So I think you are more likely to avoid unpleasant surprises by baking the behavior into daemonize() rather than requiring each caller to do it manually. Plus it keeps daemonize() a bit more abstract. It's a noop on Windows, but you can imagine an implementation that does weird platform-specific stuff and doesn't switch pids at all. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-10 16:08 ` Derrick Stolee 2026-05-10 20:00 ` Taylor Blau @ 2026-05-11 20:10 ` Jeff King 1 sibling, 0 replies; 15+ messages in thread From: Jeff King @ 2026-05-11 20:10 UTC (permalink / raw) To: Derrick Stolee Cc: Taylor Blau, jean-christophe manciot, Patrick Steinhardt, git On Sun, May 10, 2026 at 12:08:14PM -0400, Derrick Stolee wrote: > > So in that sense I would prefer to "fix forward" here rather than to > > mask over the bug. But even the relatively short diff above is not so > > straightforward to reason through, review, or test, so I'm open to other > > ideas on how to proceed here. > > I initially worried about cross-platform support, thinking that we > needed to pass file descriptors / handles and Windows always has > issues with file handles. But we aren't actually keeping a handle > open but instead a record that we created the lock and should delete > it when everything resolves. The daemonize() function is a noop on Windows anyway, because we don't have fork(). It's controlled by the NO_POSIX_GOODIES knob. > For me to be convinced that this forward fix is the right direction, > I'd need to see a test that proves the detached process will clean up > the locks on a normal process end and an early exit. I think it's hard to trigger an early exit, since the only thing we do while holding the lock is run the maintenance tasks, and those are almost entirely just using run-command to run subprocesses. So I don't see a single path that can lead to a die(). Which leaves signals. It is easy-ish to test manually with a long-running maintenance (say, the "gc" task on linux.git), verifying that maintenance.lock is there, killing the detached git-maintenance process with SIGTERM, and then confirming that the lock was cleaned up. It looks like Taylor found a way to make an arbitrarily slow task using the prefetch process. So I think his test could be modified to issue a kill at the right moment and verify cleanup (rather than waiting for a normal exit, which calls rollback_lock_file() and never checks the owner field at all). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-09 21:52 ` Taylor Blau 2026-05-10 16:08 ` Derrick Stolee @ 2026-05-11 20:01 ` Jeff King 2026-05-11 20:21 ` Jacob Keller 1 sibling, 1 reply; 15+ messages in thread From: Jeff King @ 2026-05-11 20:01 UTC (permalink / raw) To: Taylor Blau Cc: jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee On Sat, May 09, 2026 at 05:52:29PM -0400, Taylor Blau wrote: > Dropping and re-acquiring the lock is possible, but it's racy since > there is a gap during the critical window while we fork(). I believe > that the only airtight way to do this is to update the owner field of > the tempfiles we want to pass down during daemonization. It's racy in the sense that we might not get the lock back, but I don't think that's _wrong_ exactly. We just care that some instance runs the second half eventually. It is technically vulnerable to starvation, if processes keep taking the lock, doing the foreground jobs, and then losing the race to take it a second time. But besides the fact that you'd eventually win by chance, unless you have an infinite supply of processes coming along to run the foreground half, the final invocation will run the background tasks. I agree it feels pretty hacky. It is, however, how "git gc" seems to work (notice that it does a manual delete_tempfile() on the lock right before daemonizing). > (As an aside, do we want to do that for all tempfiles? It might be nice > to have a "->reassign_on_fork" flag or something on the tempfile struct > in case there are instances where the parent wants to retain ownership > of the tempfile after fork()-ing, but I can't think of any off the top > of my head. If we do introduce such a field, it should probably default > to "true" to avoid any foot-guns.) I think the answer is yes, we want it for all tempfiles. Because it is not a property of the individual tempfiles, but rather of the caller who is forking. The point of that owner field is so that when we spawn children, we don't end up with duplicate or unexpected cleanups. But daemonize() is not about spawning a child or otherwise splitting the process into two. There is still a single contiguous flow of execution, and the fact that we switch pids and call exit() is an implementation detail. So we would still want to remain the "owner" of any cleanup. And that is true for any other atexit() handlers besides the tempfile ones, too. If we called run_processes_parallel(), and then while that child was running we called daemonize(), we'd not want to kill the child then, but our atexit handler would do so! But we don't tend to leave processes running while doing stuff like daemonize(), which is why we never bothered to implement an owner field there in the first place. > > Ultimately fixing the lock bug will solve that. Though if doing so is > > too complicated for a quick maint release, I'm tempted to say we should > > consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a > > few other regression fixes so far, I assume we'll have one soon-ish). > > I think something like the following (untested) would do the trick: Yeah, that's what I was envisioning. Note that there's a small race here: > + switch ((pid = fork())) { > case 0: > + reassign_tempfile_ownership(ppid, getpid()); > break; > case -1: > die_errno(_("fork failed")); > default: > + reassign_tempfile_ownership(ppid, pid); > exit(0); > } The parent is giving up ownership and the child is taking it. So there may be a moment where both claim ownership, or a moment where the parent has relinquished but the child has not yet taken. We're not going to call exit() in the middle there, but a signal could kill us. And then either: - If both claim ownership, it's mostly OK, because they'll both try to unlink() which is idempotent-ish. Though there is a bad sequence where we delete somebody _else's_ lock, like: 1. Parent forks, but has not yet reassigned. 2. Child calls reassign to take ownership. Now both have ownership. 3. Signal kills both parent and child, so they enter cleanup code. 4. One of them (let's say the parent) deletes the lockfile. 5. Some other unrelated process (let's call it "git other") takes the lock. 6. The child deletes the lockfile. And at that point "git other" thinks it holds the lock, but it doesn't. It's quite an unlikely sequence in practice, though, I'd think. - If neither claims ownership and a signal kills both, then nobody cleans up the lock and it is left in place. This is annoying, but also something that can happen occasionally anyway (kill -9, etc). I don't have an easy suggestion for making it more atomic, though. You could choose one or the other direction using some synchronization between the two (e.g., child reassigns only after parent signals over a pipe that it has relinquished), but it's all kind of ugly. So it may be reasonable to just shrug and assume it's unlikely enough not to worry about. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-11 20:01 ` Jeff King @ 2026-05-11 20:21 ` Jacob Keller 2026-05-11 20:35 ` Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Jacob Keller @ 2026-05-11 20:21 UTC (permalink / raw) To: Jeff King, Taylor Blau Cc: jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee On 5/11/2026 1:01 PM, Jeff King wrote: > On Sat, May 09, 2026 at 05:52:29PM -0400, Taylor Blau wrote: > >> Dropping and re-acquiring the lock is possible, but it's racy since >> there is a gap during the critical window while we fork(). I believe >> that the only airtight way to do this is to update the owner field of >> the tempfiles we want to pass down during daemonization. > > It's racy in the sense that we might not get the lock back, but I don't > think that's _wrong_ exactly. We just care that some instance runs the > second half eventually. > > It is technically vulnerable to starvation, if processes keep taking the > lock, doing the foreground jobs, and then losing the race to take it a > second time. But besides the fact that you'd eventually win by chance, > unless you have an infinite supply of processes coming along to run the > foreground half, the final invocation will run the background tasks. > > I agree it feels pretty hacky. It is, however, how "git gc" seems to > work (notice that it does a manual delete_tempfile() on the lock right > before daemonizing). > >> (As an aside, do we want to do that for all tempfiles? It might be nice >> to have a "->reassign_on_fork" flag or something on the tempfile struct >> in case there are instances where the parent wants to retain ownership >> of the tempfile after fork()-ing, but I can't think of any off the top >> of my head. If we do introduce such a field, it should probably default >> to "true" to avoid any foot-guns.) > > I think the answer is yes, we want it for all tempfiles. Because it is > not a property of the individual tempfiles, but rather of the caller who > is forking. The point of that owner field is so that when we spawn > children, we don't end up with duplicate or unexpected cleanups. > > But daemonize() is not about spawning a child or otherwise splitting the > process into two. There is still a single contiguous flow of execution, > and the fact that we switch pids and call exit() is an implementation > detail. So we would still want to remain the "owner" of any cleanup. > > And that is true for any other atexit() handlers besides the tempfile > ones, too. If we called run_processes_parallel(), and then while that > child was running we called daemonize(), we'd not want to kill the child > then, but our atexit handler would do so! But we don't tend to leave > processes running while doing stuff like daemonize(), which is why we > never bothered to implement an owner field there in the first place. > >>> Ultimately fixing the lock bug will solve that. Though if doing so is >>> too complicated for a quick maint release, I'm tempted to say we should >>> consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a >>> few other regression fixes so far, I assume we'll have one soon-ish). >> >> I think something like the following (untested) would do the trick: > > Yeah, that's what I was envisioning. Note that there's a small race > here: > >> + switch ((pid = fork())) { >> case 0: >> + reassign_tempfile_ownership(ppid, getpid()); >> break; >> case -1: >> die_errno(_("fork failed")); >> default: >> + reassign_tempfile_ownership(ppid, pid); >> exit(0); >> } > > The parent is giving up ownership and the child is taking it. So there > may be a moment where both claim ownership, or a moment where the parent > has relinquished but the child has not yet taken. We're not going to > call exit() in the middle there, but a signal could kill us. > > And then either: > > - If both claim ownership, it's mostly OK, because they'll both try to > unlink() which is idempotent-ish. Though there is a bad sequence > where we delete somebody _else's_ lock, like: > > 1. Parent forks, but has not yet reassigned. > > 2. Child calls reassign to take ownership. Now both have > ownership. > > 3. Signal kills both parent and child, so they enter cleanup > code. > > 4. One of them (let's say the parent) deletes the lockfile. > > 5. Some other unrelated process (let's call it "git other") takes > the lock. > > 6. The child deletes the lockfile. > > And at that point "git other" thinks it holds the lock, but it > doesn't. It's quite an unlikely sequence in practice, though, I'd > think. > > - If neither claims ownership and a signal kills both, then nobody > cleans up the lock and it is left in place. This is annoying, but > also something that can happen occasionally anyway (kill -9, etc). > > I don't have an easy suggestion for making it more atomic, though. You > could choose one or the other direction using some synchronization > between the two (e.g., child reassigns only after parent signals over a > pipe that it has relinquished), but it's all kind of ugly. > Ya, that seems really ugly. My first thought was some way to disable signals temporarily, but I am guessing that either has no good way to do it or would introduce even more issues. Plus there is always kill -9... > So it may be reasonable to just shrug and assume it's unlikely enough > not to worry about. > > -Peff > Yea, I agree that this might qualify as unlikely enough to not worry. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-11 20:21 ` Jacob Keller @ 2026-05-11 20:35 ` Jeff King 2026-05-11 23:58 ` Jacob Keller 0 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2026-05-11 20:35 UTC (permalink / raw) To: Jacob Keller Cc: Taylor Blau, jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee On Mon, May 11, 2026 at 01:21:37PM -0700, Jacob Keller wrote: > > - If both claim ownership, it's mostly OK, because they'll both try to > > unlink() which is idempotent-ish. Though there is a bad sequence > > where we delete somebody _else's_ lock, like: > > > > 1. Parent forks, but has not yet reassigned. > > > > 2. Child calls reassign to take ownership. Now both have > > ownership. > > > > 3. Signal kills both parent and child, so they enter cleanup > > code. > > > > 4. One of them (let's say the parent) deletes the lockfile. > > > > 5. Some other unrelated process (let's call it "git other") takes > > the lock. > > > > 6. The child deletes the lockfile. > > > > And at that point "git other" thinks it holds the lock, but it > > doesn't. It's quite an unlikely sequence in practice, though, I'd > > think. > > > > - If neither claims ownership and a signal kills both, then nobody > > cleans up the lock and it is left in place. This is annoying, but > > also something that can happen occasionally anyway (kill -9, etc). > > > > I don't have an easy suggestion for making it more atomic, though. You > > could choose one or the other direction using some synchronization > > between the two (e.g., child reassigns only after parent signals over a > > pipe that it has relinquished), but it's all kind of ugly. > > > Ya, that seems really ugly. My first thought was some way to disable > signals temporarily, but I am guessing that either has no good way to do > it or would introduce even more issues. Plus there is always kill -9... I guess the parent could relinquish control by reassigning to "0" before even calling fork(), and then taking it back if fork() fails. And then worst case is that nobody cleans up the lock (if the child is killed before taking control), but that's the least-bad outcome from a correctness perspective (though still annoying). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config 2026-05-11 20:35 ` Jeff King @ 2026-05-11 23:58 ` Jacob Keller 0 siblings, 0 replies; 15+ messages in thread From: Jacob Keller @ 2026-05-11 23:58 UTC (permalink / raw) To: Jeff King Cc: Taylor Blau, jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee On 5/11/2026 1:35 PM, Jeff King wrote: > On Mon, May 11, 2026 at 01:21:37PM -0700, Jacob Keller wrote: > >>> - If both claim ownership, it's mostly OK, because they'll both try to >>> unlink() which is idempotent-ish. Though there is a bad sequence >>> where we delete somebody _else's_ lock, like: >>> >>> 1. Parent forks, but has not yet reassigned. >>> >>> 2. Child calls reassign to take ownership. Now both have >>> ownership. >>> >>> 3. Signal kills both parent and child, so they enter cleanup >>> code. >>> >>> 4. One of them (let's say the parent) deletes the lockfile. >>> >>> 5. Some other unrelated process (let's call it "git other") takes >>> the lock. >>> >>> 6. The child deletes the lockfile. >>> >>> And at that point "git other" thinks it holds the lock, but it >>> doesn't. It's quite an unlikely sequence in practice, though, I'd >>> think. >>> >>> - If neither claims ownership and a signal kills both, then nobody >>> cleans up the lock and it is left in place. This is annoying, but >>> also something that can happen occasionally anyway (kill -9, etc). >>> >>> I don't have an easy suggestion for making it more atomic, though. You >>> could choose one or the other direction using some synchronization >>> between the two (e.g., child reassigns only after parent signals over a >>> pipe that it has relinquished), but it's all kind of ugly. >>> >> Ya, that seems really ugly. My first thought was some way to disable >> signals temporarily, but I am guessing that either has no good way to do >> it or would introduce even more issues. Plus there is always kill -9... > > I guess the parent could relinquish control by reassigning to "0" before > even calling fork(), and then taking it back if fork() fails. And then > worst case is that nobody cleans up the lock (if the child is killed > before taking control), but that's the least-bad outcome from a > correctness perspective (though still annoying). > > -Peff This seems simpler than the suggestion of a pipe signal, and the worst result is only the version that is possible no matter what if there is a well timed kill -9. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-05-11 23:58 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-04 15:27 git hogs the CPU, RAM and storage despite its config jean-christophe manciot 2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King 2026-05-09 15:13 ` Mikael Magnusson 2026-05-09 17:53 ` Jeff King 2026-05-09 17:52 ` Jeff King 2026-05-09 21:52 ` Taylor Blau 2026-05-10 16:08 ` Derrick Stolee 2026-05-10 20:00 ` Taylor Blau 2026-05-11 6:42 ` Patrick Steinhardt 2026-05-11 20:22 ` Jeff King 2026-05-11 20:10 ` Jeff King 2026-05-11 20:01 ` Jeff King 2026-05-11 20:21 ` Jacob Keller 2026-05-11 20:35 ` Jeff King 2026-05-11 23:58 ` Jacob Keller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.