git hogs the CPU, RAM and storage despite its config

All of lore.kernel.org
 help / color / mirror / Atom feed

* git hogs the CPU, RAM and storage despite its config
@ 2026-05-04 15:27 jean-christophe manciot
  2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: jean-christophe manciot @ 2026-05-04 15:27 UTC (permalink / raw)
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

many:
git add -v --force -- "${file_path}/${file_name}"
git commit -v -m "${full_commit_message}" -o -- "${file_path}/${file_name}"

What did you expect to happen? (Expected behavior)

I expected git to respect the configuration (.git/config):
[core]
    repositoryformatversion = 0
    filemode = false
    bare = false
    logallrefupdates = true
    fsmonitor = true
    untrackedcache = true
[remote "origin"]
    url = git@examle.com:ppa
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch]
    autosetuprebase = always
[branch "main"]
    remote = origin
    merge = refs/heads/main
    rebase = true
[feature]
    manyFiles = true
[fetch]
    writeCommitGraph = true
[gc]
    auto = 0
[pack]
    threads = 1
    windowMemory = 1g

I expected git to use maximum one thread for packing and I'm surprised
it even tried to perform packing as gc.auto was disabled.

What happened instead? (Actual behavior)

Instead, it used all the threads it could find (28 out of 32),
depriving the whole server of CPU, RAM and storage as the tmp files
kept piling up.

What's different between what you expected and what actually happened?

Uncontrollable use of CPU threads, RAM and storage

Anything else you want to add:

All the threads were running:
git pack-objects --local --delta-base-offset --honor-pack-keep
.git/objects/pack/.tmp-<number>-pack

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.51.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 8.14.1
zlib: 1.3.1
SHA-1: SHA1_DC
SHA-256: SHA256_BLK
default-ref-format: files
default-hash: sha1
uname: Linux 6.17.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr
11 23:29:57 UTC 2026 x86_64
compiler info: gnuc: 15.2
libc info: glibc: 2.42
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]

-- 
Jean-Christophe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-04 15:27 git hogs the CPU, RAM and storage despite its config jean-christophe manciot
@ 2026-05-08 18:03 ` Jeff King
  2026-05-09 15:13   ` Mikael Magnusson
  2026-05-09 17:52   ` Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2026-05-08 18:03 UTC (permalink / raw)
  To: jean-christophe manciot; +Cc: git, Derrick Stolee

On Mon, May 04, 2026 at 05:27:21PM +0200, jean-christophe manciot wrote:

> [gc]
>     auto = 0

This is enough to disable auto-gc. But these days we also (instead?) run
git-maintenance, which is controlled by maintenance.auto. So you
probably are getting a bunch of background git-maintenance runs kicked
off.

> [pack]
>     threads = 1
>     windowMemory = 1g
> 
> I expected git to use maximum one thread for packing and I'm surprised
> it even tried to perform packing as gc.auto was disabled.

This should work to tell pack-objects to use only one thread, but that
is one thread per invocation. And we were probably kicking off a ton of
processes due to the background maintenance (and worse, they were all
doing the same work redundantly and maybe even stepping on each others
toes).

+cc Stolee for wisdom on all things git-maintenance.

Should maintenance.auto fall back to gc.auto for compatibility and
avoiding unwanted surprises when people upgrade?

Also, should background maintenance be locking to avoid multiple runs?
It does not seem to do so, and if I run:

  git init
  for i in $(seq 10000); do
    echo $i >>file
    git add file
    git commit -m "commit $i"
  done

I get several concurrent pack-objects processes. After a few thousand
commits I got bored and hit ^C, and the resulting repo was corrupt!
Which is not too surprising, as multiple simultaneous repacks are known
to be unsafe, but means we should probably avoid them.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King
@ 2026-05-09 15:13   ` Mikael Magnusson
  2026-05-09 17:53     ` Jeff King
  2026-05-09 17:52   ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Mikael Magnusson @ 2026-05-09 15:13 UTC (permalink / raw)
  To: Jeff King; +Cc: jean-christophe manciot, git, Derrick Stolee

On Fri, May 8, 2026 at 8:06 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, May 04, 2026 at 05:27:21PM +0200, jean-christophe manciot wrote:
>
> > [gc]
> >     auto = 0
>
> This is enough to disable auto-gc. But these days we also (instead?) run
> git-maintenance, which is controlled by maintenance.auto. So you
> probably are getting a bunch of background git-maintenance runs kicked
> off.
>
> > [pack]
> >     threads = 1
> >     windowMemory = 1g
> >
> > I expected git to use maximum one thread for packing and I'm surprised
> > it even tried to perform packing as gc.auto was disabled.
>
> This should work to tell pack-objects to use only one thread, but that
> is one thread per invocation. And we were probably kicking off a ton of
> processes due to the background maintenance (and worse, they were all
> doing the same work redundantly and maybe even stepping on each others
> toes).
>
> +cc Stolee for wisdom on all things git-maintenance.
>
> Should maintenance.auto fall back to gc.auto for compatibility and
> avoiding unwanted surprises when people upgrade?
>
> Also, should background maintenance be locking to avoid multiple runs?
> It does not seem to do so, and if I run:
>
>   git init
>   for i in $(seq 10000); do
>     echo $i >>file
>     git add file
>     git commit -m "commit $i"
>   done
>
> I get several concurrent pack-objects processes. After a few thousand
> commits I got bored and hit ^C, and the resulting repo was corrupt!
> Which is not too surprising, as multiple simultaneous repacks are known
> to be unsafe, but means we should probably avoid them.

This sounds pretty horrific, this maintenance thing is enabled by
default and there's nothing but pure chance that stops it from
corrupting the repo if I happen to run git repack manually at the same
time? It's hard to believe something this disruptive is even enabled
by default, I don't expect jobs to kick off at random hours using up
resources when I'm not working on git. (I'm assuming I'm several
months late to protest this being enabled by default, but still). I
would strongly suggest anything like this is *never* enabled by
default, it is extremely surprising to find out about. Hopefully I
misunderstood the part about this being enabled by default, and you
still need to say "git maintenance register", right? (although the
manpage erroneously(?) claims this will only enable tasks that are
safe, which you seem to have disproven?). The documentation is
extremely confusing to read, it says that registering a repo for
maintenance will *disable* maintenance.auto in the current repo? It
almost seems like you guys made two entirely separate things, named
them both the same thing, and expected anyone to understand what's
going on. git maintenance will schedule tasks to run in the background
with cron, and if you don't do this, the config variable named
"maintenance.auto" will control if things are repacked actively while
doing things in the foreground? Holy moly. I'm leaving my confusion in
place even though I figured it out because that is something else.

-- 
Mikael Magnusson

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-09 15:13   ` Mikael Magnusson
@ 2026-05-09 17:53     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2026-05-09 17:53 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: jean-christophe manciot, git, Derrick Stolee

On Sat, May 09, 2026 at 05:13:17PM +0200, Mikael Magnusson wrote:

> This sounds pretty horrific, this maintenance thing is enabled by
> default and there's nothing but pure chance that stops it from
> corrupting the repo if I happen to run git repack manually at the same
> time? It's hard to believe something this disruptive is even enabled
> by default, I don't expect jobs to kick off at random hours using up
> resources when I'm not working on git. (I'm assuming I'm several
> months late to protest this being enabled by default, but still). I
> would strongly suggest anything like this is *never* enabled by
> default, it is extremely surprising to find out about. Hopefully I
> misunderstood the part about this being enabled by default, and you
> still need to say "git maintenance register", right? (although the
> manpage erroneously(?) claims this will only enable tasks that are
> safe, which you seem to have disproven?).

This is not the time-based maintenance runs, but rather an "auto" mode
that is triggered by commands like commit when the repo state looks like
it is in need. That mode has existed for many years, but used to be "git
gc --auto". And now it is "git maintenance run --auto" with a new name
and new code (and, it turns out, some new bugs).

The locking issue seems to be specific to --detach mode, so I don't
think affects time-based runs, only the auto mode (though if you
register for time-based maintenance and then manually run git-repack
outside of "git maintenance run", I suspect you are subject to an
unlikely race there).

I think this did get much worse in v2.54. I left more details elsewhere
in the thread.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King
  2026-05-09 15:13   ` Mikael Magnusson
@ 2026-05-09 17:52   ` Jeff King
  2026-05-09 21:52     ` Taylor Blau
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2026-05-09 17:52 UTC (permalink / raw)
  To: jean-christophe manciot; +Cc: Patrick Steinhardt, git, Derrick Stolee

On Fri, May 08, 2026 at 02:03:41PM -0400, Jeff King wrote:

> Also, should background maintenance be locking to avoid multiple runs?
> It does not seem to do so, and if I run:
> 
>   git init
>   for i in $(seq 10000); do
>     echo $i >>file
>     git add file
>     git commit -m "commit $i"
>   done
> 
> I get several concurrent pack-objects processes. After a few thousand
> commits I got bored and hit ^C, and the resulting repo was corrupt!
> Which is not too surprising, as multiple simultaneous repacks are known
> to be unsafe, but means we should probably avoid them.

There is a pretty awful bug here, and it got much worse in v2.54.

First, the locking for "git maintenance run --detach" is totally broken.
We take the lock as the first step of maintenance_run_tasks(), then run
foreground tasks, then daemonize(), and then run background tasks. But
daemonize() forks and exits the parent process, leaving the child
process to do the real work. But the lock subsystem registers the
lockfile as a tempfile to be cleaned up at exit. So the moment we
daemonize, we release the lock, even though we're still going to run
sub-commands that should happen under lock.

I don't think this affected the old "git gc --detach" because it takes
the lock after daemonizing[1]. We can't do the same here, though, since we
need to hold the lock for the foreground tasks. So either we need to
release and re-take the lock between foreground and background tasks, or
we need to teach the daemonize() function to update the "owner" field on
all of the tempfiles to the new child[2].

AFAICT this has been broken since a6affd3343 (builtin/maintenance: add a
`--detach` flag, 2024-08-16). But it was mostly harmless because under
the hood we were running the "gc" task, which ran git-gc itself in
no-detach mode. So git-gc took its own lock, and we never ended up with
more than one repack running at a time.

But that changed in 452b12c2e0 (builtin/maintenance: use "geometric"
strategy by default, 2026-02-24), which is new in v2.54. Now we are
running repack directly, with no additional locking. So because of the
lock bug from a6affd3343, that means lots of repacks running at the same
time.

Ultimately fixing the lock bug will solve that. Though if doing so is
too complicated for a quick maint release, I'm tempted to say we should
consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a
few other regression fixes so far, I assume we'll have one soon-ish).

-Peff

[1] git-gc does have a mode where it will take the lock and run some
    foreground tasks. That mode probably has the same bug, but I'm not
    sure if it existed before we switched to "maintenance run --auto".
    At this point it's moot.

[2] You can't just disable the atexit handler in the parent process
    here, because then the child wouldn't actually clean them up either
    (because its pid will not match the owner field). So we really need
    to transfer ownership during the daemonize() process, which probably
    needs a new tempfile.[ch] function to walk the list and update all
    of the owner fields.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-09 17:52   ` Jeff King
@ 2026-05-09 21:52     ` Taylor Blau
  2026-05-10 16:08       ` Derrick Stolee
  2026-05-11 20:01       ` Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Taylor Blau @ 2026-05-09 21:52 UTC (permalink / raw)
  To: Jeff King
  Cc: jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee

On Sat, May 09, 2026 at 01:52:49PM -0400, Jeff King wrote:
> I don't think this affected the old "git gc --detach" because it takes
> the lock after daemonizing[1]. We can't do the same here, though, since we
> need to hold the lock for the foreground tasks. So either we need to
> release and re-take the lock between foreground and background tasks, or
> we need to teach the daemonize() function to update the "owner" field on
> all of the tempfiles to the new child[2].

I agree. We can't simply do what was done in 329e6e8794c (gc: save log
from daemonized gc --auto and print it next time, 2015-09-19), for
exactly the reason that you stated.

Dropping and re-acquiring the lock is possible, but it's racy since
there is a gap during the critical window while we fork(). I believe
that the only airtight way to do this is to update the owner field of
the tempfiles we want to pass down during daemonization.

(As an aside, do we want to do that for all tempfiles? It might be nice
to have a "->reassign_on_fork" flag or something on the tempfile struct
in case there are instances where the parent wants to retain ownership
of the tempfile after fork()-ing, but I can't think of any off the top
of my head. If we do introduce such a field, it should probably default
to "true" to avoid any foot-guns.)

> Ultimately fixing the lock bug will solve that. Though if doing so is
> too complicated for a quick maint release, I'm tempted to say we should
> consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a
> few other regression fixes so far, I assume we'll have one soon-ish).

I think something like the following (untested) would do the trick:

--- 8< ---
diff --git a/setup.c b/setup.c
index 7ec4427368a..c07aeac4f7d 100644
--- a/setup.c
+++ b/setup.c
@@ -22,6 +22,7 @@
 #include "chdir-notify.h"
 #include "path.h"
 #include "quote.h"
+#include "tempfile.h"
 #include "trace.h"
 #include "trace2.h"
 #include "worktree.h"
@@ -2162,12 +2163,17 @@ int daemonize(void)
 	errno = ENOSYS;
 	return -1;
 #else
-	switch (fork()) {
+	pid_t ppid = getpid();
+	pid_t pid;
+
+	switch ((pid = fork())) {
 		case 0:
+			reassign_tempfile_ownership(ppid, getpid());
 			break;
 		case -1:
 			die_errno(_("fork failed"));
 		default:
+			reassign_tempfile_ownership(ppid, pid);
 			exit(0);
 	}
 	if (setsid() == -1)
diff --git a/tempfile.c b/tempfile.c
index 82dfa3d82f2..f0fdf582794 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -373,3 +373,15 @@ int delete_tempfile(struct tempfile **tempfile_p)

 	return err ? -1 : 0;
 }
+
+void reassign_tempfile_ownership(pid_t from, pid_t to)
+{
+	volatile struct volatile_list_head *pos;
+
+	list_for_each(pos, &tempfile_list) {
+		struct tempfile *p = list_entry(pos, struct tempfile, list);
+
+		if (is_tempfile_active(p) && p->owner == from)
+			p->owner = to;
+	}
+}
diff --git a/tempfile.h b/tempfile.h
index 2d2ae5b657d..783d7470b54 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -282,4 +282,16 @@ int delete_tempfile(struct tempfile **tempfile_p);
  */
 int rename_tempfile(struct tempfile **tempfile_p, const char *path);

+/*
+ * Reassign ownership of all active tempfiles whose `owner` field
+ * matches `from` to `to`.
+ *
+ * This is intended for use by `daemonize()`; after `fork(2)`-ing, the
+ * parent transfers ownership to the daemonized child so that its
+ * atexit handler does not unlink tempfiles that should outlive it,
+ * and the child claims the inherited tempfiles so that they are
+ * cleaned up when the daemon exits.
+ */
+void reassign_tempfile_ownership(pid_t from, pid_t to);
+
 #endif /* TEMPFILE_H */
--- >8 ---

That's not too terrible to write, and I would feel OK about putting it
in a 2.54.1 release soon-ish provided that others think it is reasonable.

Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric"
strategy by default, 2026-02-24) feels somewhat unsatisfying, since it
is merely making the bug less likely rather than eliminating it
entirely.

So in that sense I would prefer to "fix forward" here rather than to
mask over the bug. But even the relatively short diff above is not so
straightforward to reason through, review, or test, so I'm open to other
ideas on how to proceed here.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-09 21:52     ` Taylor Blau
@ 2026-05-10 16:08       ` Derrick Stolee
  2026-05-10 20:00         ` Taylor Blau
  2026-05-11 20:10         ` Jeff King
  2026-05-11 20:01       ` Jeff King
  1 sibling, 2 replies; 15+ messages in thread
From: Derrick Stolee @ 2026-05-10 16:08 UTC (permalink / raw)
  To: Taylor Blau, Jeff King; +Cc: jean-christophe manciot, Patrick Steinhardt, git

On 5/9/26 5:52 PM, Taylor Blau wrote:
> On Sat, May 09, 2026 at 01:52:49PM -0400, Jeff King wrote:
>> I don't think this affected the old "git gc --detach" because it takes
>> the lock after daemonizing[1]. We can't do the same here, though, since we
>> need to hold the lock for the foreground tasks. So either we need to
>> release and re-take the lock between foreground and background tasks, or
>> we need to teach the daemonize() function to update the "owner" field on
>> all of the tempfiles to the new child[2].
> 
> I agree. We can't simply do what was done in 329e6e8794c (gc: save log
> from daemonized gc --auto and print it next time, 2015-09-19), for
> exactly the reason that you stated.
> 
> Dropping and re-acquiring the lock is possible, but it's racy since
> there is a gap during the critical window while we fork(). I believe
> that the only airtight way to do this is to update the owner field of
> the tempfiles we want to pass down during daemonization.

I agree that this drop/reacquire pattern would not be a safe choice.

> (As an aside, do we want to do that for all tempfiles? It might be nice
> to have a "->reassign_on_fork" flag or something on the tempfile struct
> in case there are instances where the parent wants to retain ownership
> of the tempfile after fork()-ing, but I can't think of any off the top
> of my head. If we do introduce such a field, it should probably default
> to "true" to avoid any foot-guns.)
> 
>> Ultimately fixing the lock bug will solve that. Though if doing so is
>> too complicated for a quick maint release, I'm tempted to say we should
>> consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a
>> few other regression fixes so far, I assume we'll have one soon-ish).
> 
> I think something like the following (untested) would do the trick:
> 
> --- 8< ---
> diff --git a/setup.c b/setup.c
> index 7ec4427368a..c07aeac4f7d 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -22,6 +22,7 @@
>   #include "chdir-notify.h"
>   #include "path.h"
>   #include "quote.h"
> +#include "tempfile.h"
>   #include "trace.h"
>   #include "trace2.h"
>   #include "worktree.h"
> @@ -2162,12 +2163,17 @@ int daemonize(void)
>   	errno = ENOSYS;
>   	return -1;
>   #else
> -	switch (fork()) {
> +	pid_t ppid = getpid();
> +	pid_t pid;
> +
> +	switch ((pid = fork())) {
>   		case 0:
> +			reassign_tempfile_ownership(ppid, getpid());
>   			break;
>   		case -1:
>   			die_errno(_("fork failed"));
>   		default:
> +			reassign_tempfile_ownership(ppid, pid);
>   			exit(0);
>   	}
>   	if (setsid() == -1)
> diff --git a/tempfile.c b/tempfile.c
> index 82dfa3d82f2..f0fdf582794 100644
> --- a/tempfile.c
> +++ b/tempfile.c
> @@ -373,3 +373,15 @@ int delete_tempfile(struct tempfile **tempfile_p)
> 
>   	return err ? -1 : 0;
>   }
> +
> +void reassign_tempfile_ownership(pid_t from, pid_t to)
> +{
> +	volatile struct volatile_list_head *pos;
> +
> +	list_for_each(pos, &tempfile_list) {
> +		struct tempfile *p = list_entry(pos, struct tempfile, list);
> +
> +		if (is_tempfile_active(p) && p->owner == from)
> +			p->owner = to;
> +	}
> +}

I worry that we're assigning _all_ tempfiles to the child in this
case, not just specific ones like the maintenance lock. But since
this is specifically called only in daemonize() then it may be
fine.

Is there any concern that since the child didn't create a tempfile
that the atexit() may not be initialized in the child process? Or
will that carry over from the fork() automatically? (This is hard
to test without something causing a die() in the detached process.)
> That's not too terrible to write, and I would feel OK about putting it
> in a 2.54.1 release soon-ish provided that others think it is reasonable.

I'd rather revert the geometric maintenance default for a point
release and let something more complicated like this percolate
until the next release window.

> Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric"
> strategy by default, 2026-02-24) feels somewhat unsatisfying, since it
> is merely making the bug less likely rather than eliminating it
> entirely.

It does limit the behavior to those who previously chose to opt-in to
this behavior, and likely that's a low number or else we'd have more
bug reports.

> So in that sense I would prefer to "fix forward" here rather than to
> mask over the bug. But even the relatively short diff above is not so
> straightforward to reason through, review, or test, so I'm open to other
> ideas on how to proceed here.

I initially worried about cross-platform support, thinking that we
needed to pass file descriptors / handles and Windows always has
issues with file handles. But we aren't actually keeping a handle
open but instead a record that we created the lock and should delete
it when everything resolves.

For me to be convinced that this forward fix is the right direction,
I'd need to see a test that proves the detached process will clean up
the locks on a normal process end and an early exit.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-10 16:08       ` Derrick Stolee
@ 2026-05-10 20:00         ` Taylor Blau
  2026-05-11  6:42           ` Patrick Steinhardt
  2026-05-11 20:10         ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2026-05-10 20:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff King, jean-christophe manciot, Patrick Steinhardt, git

On Sun, May 10, 2026 at 12:08:14PM -0400, Derrick Stolee wrote:
> I worry that we're assigning _all_ tempfiles to the child in this
> case, not just specific ones like the maintenance lock. But since
> this is specifically called only in daemonize() then it may be
> fine.

Right, I had wondered about this above, and I'm not sure what the right
behavior is here. I think in our case its fine to pass all tempfiles
down to the child, since we're about to exit.

The only other spot that uses `daemonize()` is 'gc --detach' and 'git
daemon'. I think before considering something like this we would want to
reason through what the intended behavior of each of those callers is.
If we end up pursuing a fix in this direction, I think the safest thing
that we could do would be to have callers opt-in to this behavior so we
can do the right thing in git-maintenance, and other callers retain
their existing behavior.

> Is there any concern that since the child didn't create a tempfile
> that the atexit() may not be initialized in the child process? Or
> will that carry over from the fork() automatically? (This is hard
> to test without something causing a die() in the detached process.)

The child will have its own copy of the tempfile list as well as its own
copy of the atexit() handlers. POSIX (via atexit(3p)) says[1] that
atexit() behaves according to the ISO C standard:

    The functionality described on this reference page is aligned with
    the ISO C standard. Any conflict between the requirements described
    here and the ISO C standard is unintentional. This volume of
    POSIX.1‐2017 defers to the ISO C standard.

, and atexit(3) says[2]:

    When a child process is created via fork(2), it inherits copies of
    its parent's registrations.  Upon a successful call to one of the
    exec(3) functions, all registrations are removed.

So the child has an identical copy of both the atexit() handlers and the
tempfile list.

> > That's not too terrible to write, and I would feel OK about putting it
> > in a 2.54.1 release soon-ish provided that others think it is reasonable.
>
> I'd rather revert the geometric maintenance default for a point
> release and let something more complicated like this percolate
> until the next release window.
>
> > Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric"
> > strategy by default, 2026-02-24) feels somewhat unsatisfying, since it
> > is merely making the bug less likely rather than eliminating it
> > entirely.
>
> It does limit the behavior to those who previously chose to opt-in to
> this behavior, and likely that's a low number or else we'd have more
> bug reports.

I agree with what you're saying. In an ideal world we would be able to
apply a fix on top that would prevent this bug from occurring, but if
that's not trivial to do then we shouldn't rush it in the 2.54.1 window.

I think that not having this bug whatsoever (as opposed to simply
reverting 452b12c2e0) would be preferable, but as you note we haven't
seen any bug reports in a pre-452b12c2e0 world, so perhaps the risk is
low enough that we could merely revert 452b12c2e0.

> For me to be convinced that this forward fix is the right direction,
> I'd need to see a test that proves the detached process will clean up
> the locks on a normal process end and an early exit.

Yeah, I agree. Testing this is a little tricky, but I played around
with it for a while and came up with the following:

--- 8< ---
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 4700beacc18..77bfa537385 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -1460,6 +1460,56 @@ test_expect_success '--detach causes maintenance to run in background' '
 	)
 '

+test_expect_success PIPE '--detach holds maintenance lock until daemonized child exits' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		mkfifo fifo &&
+
+		git config maintenance.auto false &&
+		git config core.lockfilepid true &&
+
+		git remote add origin /does/not/exist &&
+		git config set remote.origin.uploadpack \
+			"echo \$PPID >child && cat \"$(pwd)/fifo\"" &&
+
+		# Start a detached prefetch maintenance task. Note that
+		# we are backgrounding git-maintenance here in order to
+		# determine its PID to validate that the lockfile was
+		# created by the parent.
+		{ git maintenance run --task=prefetch --detach & } &&
+		parent="$!" &&
+
+		# Open fifo for writing, which will block until the
+		# upload-pack helper opens it for reading. Once exec
+		# returns, we know that the daemonized child is alive
+		# and pinned.
+		exec 8>fifo &&
+
+		test_path_is_file .git/objects/maintenance.lock &&
+		test_path_is_file .git/objects/maintenance~pid.lock &&
+
+		# Verify that the maintenance.lock still exists, and
+		# that it was created by the parent process, not the
+		# child.
+		echo "pid $parent" >expect &&
+		test_cmp expect .git/objects/maintenance~pid.lock &&
+
+		# Close the write end of the FIFO, causing our upload-pack
+		# helper to quit. Wait until the grandparent (from the
+		# perspective of our upload-pack helper, the daemonized
+		# git-maintenance child)
+		exec 8>&- &&
+		gpid="$(ps -o ppid= -p $(cat child) | tr -d " ")" &&
+		test -n $gpid && wait $gpid &&
+
+		test_path_is_missing .git/objects/maintenance.lock &&
+		test_path_is_missing .git/objects/maintenance~pid.lock
+	)
+'
+
 test_expect_success 'repacking loose objects is quiet' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
--- >8 ---

I'm not sure how portable it is, though. I think that 'ps -o ppid=' is
OK, since 'start_git_in_background()' uses it, but I'm not sure if $PPID
is available on Windows.

It reliably passes with the fix that I shared earlier in the thread, and
reliably fails without it.

Thanks,
Taylor

[1]: https://man7.org/linux/man-pages/man3/atexit.3p.html
[2]: https://man7.org/linux/man-pages/man3/atexit.3.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-10 20:00         ` Taylor Blau
@ 2026-05-11  6:42           ` Patrick Steinhardt
  2026-05-11 20:22             ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Steinhardt @ 2026-05-11  6:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, Jeff King, jean-christophe manciot, git

On Sun, May 10, 2026 at 04:00:08PM -0400, Taylor Blau wrote:
> On Sun, May 10, 2026 at 12:08:14PM -0400, Derrick Stolee wrote:
> > > That's not too terrible to write, and I would feel OK about putting it
> > > in a 2.54.1 release soon-ish provided that others think it is reasonable.
> >
> > I'd rather revert the geometric maintenance default for a point
> > release and let something more complicated like this percolate
> > until the next release window.
> >
> > > Simply reverting 452b12c2e0 (builtin/maintenance: use "geometric"
> > > strategy by default, 2026-02-24) feels somewhat unsatisfying, since it
> > > is merely making the bug less likely rather than eliminating it
> > > entirely.
> >
> > It does limit the behavior to those who previously chose to opt-in to
> > this behavior, and likely that's a low number or else we'd have more
> > bug reports.
> 
> I agree with what you're saying. In an ideal world we would be able to
> apply a fix on top that would prevent this bug from occurring, but if
> that's not trivial to do then we shouldn't rush it in the 2.54.1 window.
> 
> I think that not having this bug whatsoever (as opposed to simply
> reverting 452b12c2e0) would be preferable, but as you note we haven't
> seen any bug reports in a pre-452b12c2e0 world, so perhaps the risk is
> low enough that we could merely revert 452b12c2e0.

I think having an actual fix for the locking logic in git-maintenance(1)
would be preferable though, as the issue isn't merely contained to Git
2.54. With that release we are of course much more likely to hit the
issue as the "geometric" strategy is the default now. But even before
this release it was possible to configure that strategy, and if the user
had opted in to it they might've hit the same bug.

> > For me to be convinced that this forward fix is the right direction,
> > I'd need to see a test that proves the detached process will clean up
> > the locks on a normal process end and an early exit.
> 
> Yeah, I agree. Testing this is a little tricky, but I played around
> with it for a while and came up with the following:
> 
> --- 8< ---
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 4700beacc18..77bfa537385 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -1460,6 +1460,56 @@ test_expect_success '--detach causes maintenance to run in background' '
>  	)
>  '
> 
> +test_expect_success PIPE '--detach holds maintenance lock until daemonized child exits' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		mkfifo fifo &&
> +
> +		git config maintenance.auto false &&
> +		git config core.lockfilepid true &&
> +
> +		git remote add origin /does/not/exist &&
> +		git config set remote.origin.uploadpack \
> +			"echo \$PPID >child && cat \"$(pwd)/fifo\"" &&
> +
> +		# Start a detached prefetch maintenance task. Note that
> +		# we are backgrounding git-maintenance here in order to
> +		# determine its PID to validate that the lockfile was
> +		# created by the parent.
> +		{ git maintenance run --task=prefetch --detach & } &&
> +		parent="$!" &&
> +
> +		# Open fifo for writing, which will block until the
> +		# upload-pack helper opens it for reading. Once exec
> +		# returns, we know that the daemonized child is alive
> +		# and pinned.
> +		exec 8>fifo &&
> +
> +		test_path_is_file .git/objects/maintenance.lock &&
> +		test_path_is_file .git/objects/maintenance~pid.lock &&
> +
> +		# Verify that the maintenance.lock still exists, and
> +		# that it was created by the parent process, not the
> +		# child.
> +		echo "pid $parent" >expect &&
> +		test_cmp expect .git/objects/maintenance~pid.lock &&
> +
> +		# Close the write end of the FIFO, causing our upload-pack
> +		# helper to quit. Wait until the grandparent (from the
> +		# perspective of our upload-pack helper, the daemonized
> +		# git-maintenance child)
> +		exec 8>&- &&
> +		gpid="$(ps -o ppid= -p $(cat child) | tr -d " ")" &&
> +		test -n $gpid && wait $gpid &&
> +
> +		test_path_is_missing .git/objects/maintenance.lock &&
> +		test_path_is_missing .git/objects/maintenance~pid.lock
> +	)
> +'
> +
>  test_expect_success 'repacking loose objects is quiet' '
>  	test_when_finished "rm -rf repo" &&
>  	git init repo &&
> --- >8 ---
> 
> I'm not sure how portable it is, though. I think that 'ps -o ppid=' is
> OK, since 'start_git_in_background()' uses it, but I'm not sure if $PPID
> is available on Windows.

Yeah, this test seems to work on my system, too. But it's another
Windows machine indeed. I'll start a CI pipeline to check whether it
works on Windows, as well.

In any case, I also agree with Stolee that it may have unintended
consequences to unconditionally reassign tempfiles to the child process
when daemonizing. It is okay for all current callers, but I'm not sure
whether future callers would expect this behaviour.

An alternative would be to explicitly reassign the lockfile's ownership
in git-maintenance(1). Something like the below patch.

Thanks!

Patrick

diff --git a/builtin/gc.c b/builtin/gc.c
index 3a71e314c9..09cb92ac97 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1810,10 +1810,32 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts,
 				   TASK_PHASE_FOREGROUND))
 			result = 1;
 
-	/* Failure to daemonize is ok, we'll continue in foreground. */
 	if (opts->detach > 0) {
+		pid_t child_pid;
+
 		trace2_region_enter("maintenance", "detach", the_repository);
-		daemonize();
+
+		child_pid = daemonize_without_exit();
+		if (!child_pid) {
+			/*
+			 * We're in the child process, so we take ownership of
+			 * the lockfile.
+			 */
+			lock_file_reassign_owner(&lk, getpid());
+		} else if (child_pid > 0) {
+			/*
+			 * We're in the parent process, so we assign ownership
+			 * of the lockfile to the child and then exit immediately.
+			 */
+			lock_file_reassign_owner(&lk, child_pid);
+			exit(0);
+		} else {
+			/*
+			 * Failure to daemonize is ok, we'll continue in
+			 * foreground.
+			 */
+		}
+
 		trace2_region_leave("maintenance", "detach", the_repository);
 	}
 
diff --git a/lockfile.c b/lockfile.c
index 7add2f136a..96aab3c885 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -356,3 +356,12 @@ int rollback_lock_file(struct lock_file *lk)
 	delete_tempfile(&lk->pid_tempfile);
 	return delete_tempfile(&lk->tempfile);
 }
+
+void lock_file_reassign_owner(struct lock_file *lk, pid_t owner)
+{
+	if (!is_lock_file_locked(lk))
+		BUG("cannot reassign ownership of unlocked lockfile");
+	lk->tempfile->owner = owner;
+	if (lk->pid_tempfile)
+		lk->pid_tempfile->owner = owner;
+}
diff --git a/lockfile.h b/lockfile.h
index e7233f28de..0b10b624fa 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -341,4 +341,14 @@ static inline int commit_lock_file_to(struct lock_file *lk, const char *path)
  */
 int rollback_lock_file(struct lock_file *lk);
 
+/*
+ * Reassign ownership of the lockfile to a different process.
+ *
+ * This is intended for use after `fork(2)`-ing. The parent transfers ownership
+ * to the daemonized child so that its atexit handler does not unlink the lock
+ * that should outlive it, and the child claims the inherited tempfiles so that
+ * they are cleaned up when the daemon exits.
+ */
+void lock_file_reassign_owner(struct lock_file *lk, pid_t owner);
+
 #endif /* LOCKFILE_H */
diff --git a/setup.c b/setup.c
index 7ec4427368..d651e11d4b 100644
--- a/setup.c
+++ b/setup.c
@@ -2156,20 +2156,18 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
-int daemonize(void)
+pid_t daemonize_without_exit(void)
 {
 #ifdef NO_POSIX_GOODIES
 	errno = ENOSYS;
 	return -1;
 #else
-	switch (fork()) {
-		case 0:
-			break;
-		case -1:
-			die_errno(_("fork failed"));
-		default:
-			exit(0);
-	}
+	pid_t pid = fork();
+	if (pid < 0)
+		return error_errno(_("fork failed"));
+	if (pid > 0)
+		return pid;
+
 	if (setsid() == -1)
 		die_errno(_("setsid failed"));
 	close(0);
@@ -2180,6 +2178,21 @@ int daemonize(void)
 #endif
 }
 
+int daemonize(void)
+{
+#ifdef NO_POSIX_GOODIES
+	errno = ENOSYS;
+	return -1;
+#else
+	pid_t pid = daemonize_without_exit();
+	if (pid < 0)
+		die(NULL);
+	if (pid > 0)
+		exit(0);
+	return 0;
+#endif
+}
+
 struct template_dir_cb_data {
 	char *path;
 	int initialized;
diff --git a/setup.h b/setup.h
index 80bc6e5f07..10f8b377f2 100644
--- a/setup.h
+++ b/setup.h
@@ -150,6 +150,7 @@ int path_inside_repo(const char *prefix, const char *path);
 
 void sanitize_stdfds(void);
 int daemonize(void);
+int daemonize_without_exit(void);
 
 /*
  * GIT_REPO_VERSION is the version we write by default. The

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-11  6:42           ` Patrick Steinhardt
@ 2026-05-11 20:22             ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2026-05-11 20:22 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Taylor Blau, Derrick Stolee, jean-christophe manciot, git

On Mon, May 11, 2026 at 08:42:11AM +0200, Patrick Steinhardt wrote:

> In any case, I also agree with Stolee that it may have unintended
> consequences to unconditionally reassign tempfiles to the child process
> when daemonizing. It is okay for all current callers, but I'm not sure
> whether future callers would expect this behaviour.
> 
> An alternative would be to explicitly reassign the lockfile's ownership
> in git-maintenance(1). Something like the below patch.

I really think this should be an automatic part of daemonize(), for the
reasons I gave earlier in the thread. It's really a lurking problem for
any cleanup handler, but lockfiles are the one that is most likely to
bite us.

In fact, I thought "git gc --detach" without "--skip-foreground-tasks"
had the same bug, but it narrowly avoids it by dropping and reacquiring
the lock (!) on either side of the daemonize() call.

So I think you are more likely to avoid unpleasant surprises by baking
the behavior into daemonize() rather than requiring each caller to do it
manually.

Plus it keeps daemonize() a bit more abstract. It's a noop on Windows,
but you can imagine an implementation that does weird platform-specific
stuff and doesn't switch pids at all.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-10 16:08       ` Derrick Stolee
  2026-05-10 20:00         ` Taylor Blau
@ 2026-05-11 20:10         ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2026-05-11 20:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, jean-christophe manciot, Patrick Steinhardt, git

On Sun, May 10, 2026 at 12:08:14PM -0400, Derrick Stolee wrote:

> > So in that sense I would prefer to "fix forward" here rather than to
> > mask over the bug. But even the relatively short diff above is not so
> > straightforward to reason through, review, or test, so I'm open to other
> > ideas on how to proceed here.
> 
> I initially worried about cross-platform support, thinking that we
> needed to pass file descriptors / handles and Windows always has
> issues with file handles. But we aren't actually keeping a handle
> open but instead a record that we created the lock and should delete
> it when everything resolves.

The daemonize() function is a noop on Windows anyway, because we don't
have fork(). It's controlled by the NO_POSIX_GOODIES knob.

> For me to be convinced that this forward fix is the right direction,
> I'd need to see a test that proves the detached process will clean up
> the locks on a normal process end and an early exit.

I think it's hard to trigger an early exit, since the only thing we do
while holding the lock is run the maintenance tasks, and those are
almost entirely just using run-command to run subprocesses. So I don't
see a single path that can lead to a die().

Which leaves signals. It is easy-ish to test manually with a
long-running maintenance (say, the "gc" task on linux.git), verifying
that maintenance.lock is there, killing the detached git-maintenance
process with SIGTERM, and then confirming that the lock was cleaned up.

It looks like Taylor found a way to make an arbitrarily slow task using
the prefetch process. So I think his test could be modified to issue a
kill at the right moment and verify cleanup (rather than waiting for a
normal exit, which calls rollback_lock_file() and never checks the owner
field at all).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-09 21:52     ` Taylor Blau
  2026-05-10 16:08       ` Derrick Stolee
@ 2026-05-11 20:01       ` Jeff King
  2026-05-11 20:21         ` Jacob Keller
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2026-05-11 20:01 UTC (permalink / raw)
  To: Taylor Blau
  Cc: jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee

On Sat, May 09, 2026 at 05:52:29PM -0400, Taylor Blau wrote:

> Dropping and re-acquiring the lock is possible, but it's racy since
> there is a gap during the critical window while we fork(). I believe
> that the only airtight way to do this is to update the owner field of
> the tempfiles we want to pass down during daemonization.

It's racy in the sense that we might not get the lock back, but I don't
think that's _wrong_ exactly. We just care that some instance runs the
second half eventually.

It is technically vulnerable to starvation, if processes keep taking the
lock, doing the foreground jobs, and then losing the race to take it a
second time. But besides the fact that you'd eventually win by chance,
unless you have an infinite supply of processes coming along to run the
foreground half, the final invocation will run the background tasks.

I agree it feels pretty hacky. It is, however, how "git gc" seems to
work (notice that it does a manual delete_tempfile() on the lock right
before daemonizing).

> (As an aside, do we want to do that for all tempfiles? It might be nice
> to have a "->reassign_on_fork" flag or something on the tempfile struct
> in case there are instances where the parent wants to retain ownership
> of the tempfile after fork()-ing, but I can't think of any off the top
> of my head. If we do introduce such a field, it should probably default
> to "true" to avoid any foot-guns.)

I think the answer is yes, we want it for all tempfiles. Because it is
not a property of the individual tempfiles, but rather of the caller who
is forking. The point of that owner field is so that when we spawn
children, we don't end up with duplicate or unexpected cleanups.

But daemonize() is not about spawning a child or otherwise splitting the
process into two. There is still a single contiguous flow of execution,
and the fact that we switch pids and call exit() is an implementation
detail. So we would still want to remain the "owner" of any cleanup.

And that is true for any other atexit() handlers besides the tempfile
ones, too. If we called run_processes_parallel(), and then while that
child was running we called daemonize(), we'd not want to kill the child
then, but our atexit handler would do so! But we don't tend to leave
processes running while doing stuff like daemonize(), which is why we
never bothered to implement an owner field there in the first place.

> > Ultimately fixing the lock bug will solve that. Though if doing so is
> > too complicated for a quick maint release, I'm tempted to say we should
> > consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a
> > few other regression fixes so far, I assume we'll have one soon-ish).
> 
> I think something like the following (untested) would do the trick:

Yeah, that's what I was envisioning. Note that there's a small race
here:

> +	switch ((pid = fork())) {
>  		case 0:
> +			reassign_tempfile_ownership(ppid, getpid());
>  			break;
>  		case -1:
>  			die_errno(_("fork failed"));
>  		default:
> +			reassign_tempfile_ownership(ppid, pid);
>  			exit(0);
>  	}

The parent is giving up ownership and the child is taking it. So there
may be a moment where both claim ownership, or a moment where the parent
has relinquished but the child has not yet taken. We're not going to
call exit() in the middle there, but a signal could kill us.

And then either:

  - If both claim ownership, it's mostly OK, because they'll both try to
    unlink() which is idempotent-ish. Though there is a bad sequence
    where we delete somebody _else's_ lock, like:

       1. Parent forks, but has not yet reassigned.

       2. Child calls reassign to take ownership. Now both have
	  ownership.

       3. Signal kills both parent and child, so they enter cleanup
	  code.

       4. One of them (let's say the parent) deletes the lockfile.

       5. Some other unrelated process (let's call it "git other") takes
	  the lock.

       6. The child deletes the lockfile.

    And at that point "git other" thinks it holds the lock, but it
    doesn't. It's quite an unlikely sequence in practice, though, I'd
    think.

  - If neither claims ownership and a signal kills both, then nobody
    cleans up the lock and it is left in place. This is annoying, but
    also something that can happen occasionally anyway (kill -9, etc).

I don't have an easy suggestion for making it more atomic, though. You
could choose one or the other direction using some synchronization
between the two (e.g., child reassigns only after parent signals over a
pipe that it has relinquished), but it's all kind of ugly.

So it may be reasonable to just shrug and assume it's unlikely enough
not to worry about.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-11 20:01       ` Jeff King
@ 2026-05-11 20:21         ` Jacob Keller
  2026-05-11 20:35           ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Jacob Keller @ 2026-05-11 20:21 UTC (permalink / raw)
  To: Jeff King, Taylor Blau
  Cc: jean-christophe manciot, Patrick Steinhardt, git, Derrick Stolee

On 5/11/2026 1:01 PM, Jeff King wrote:
> On Sat, May 09, 2026 at 05:52:29PM -0400, Taylor Blau wrote:
> 
>> Dropping and re-acquiring the lock is possible, but it's racy since
>> there is a gap during the critical window while we fork(). I believe
>> that the only airtight way to do this is to update the owner field of
>> the tempfiles we want to pass down during daemonization.
> 
> It's racy in the sense that we might not get the lock back, but I don't
> think that's _wrong_ exactly. We just care that some instance runs the
> second half eventually.
> 
> It is technically vulnerable to starvation, if processes keep taking the
> lock, doing the foreground jobs, and then losing the race to take it a
> second time. But besides the fact that you'd eventually win by chance,
> unless you have an infinite supply of processes coming along to run the
> foreground half, the final invocation will run the background tasks.
> 
> I agree it feels pretty hacky. It is, however, how "git gc" seems to
> work (notice that it does a manual delete_tempfile() on the lock right
> before daemonizing).
> 
>> (As an aside, do we want to do that for all tempfiles? It might be nice
>> to have a "->reassign_on_fork" flag or something on the tempfile struct
>> in case there are instances where the parent wants to retain ownership
>> of the tempfile after fork()-ing, but I can't think of any off the top
>> of my head. If we do introduce such a field, it should probably default
>> to "true" to avoid any foot-guns.)
> 
> I think the answer is yes, we want it for all tempfiles. Because it is
> not a property of the individual tempfiles, but rather of the caller who
> is forking. The point of that owner field is so that when we spawn
> children, we don't end up with duplicate or unexpected cleanups.
> 
> But daemonize() is not about spawning a child or otherwise splitting the
> process into two. There is still a single contiguous flow of execution,
> and the fact that we switch pids and call exit() is an implementation
> detail. So we would still want to remain the "owner" of any cleanup.
> 
> And that is true for any other atexit() handlers besides the tempfile
> ones, too. If we called run_processes_parallel(), and then while that
> child was running we called daemonize(), we'd not want to kill the child
> then, but our atexit handler would do so! But we don't tend to leave
> processes running while doing stuff like daemonize(), which is why we
> never bothered to implement an owner field there in the first place.
> 
>>> Ultimately fixing the lock bug will solve that. Though if doing so is
>>> too complicated for a quick maint release, I'm tempted to say we should
>>> consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a
>>> few other regression fixes so far, I assume we'll have one soon-ish).
>>
>> I think something like the following (untested) would do the trick:
> 
> Yeah, that's what I was envisioning. Note that there's a small race
> here:
> 
>> +	switch ((pid = fork())) {
>>  		case 0:
>> +			reassign_tempfile_ownership(ppid, getpid());
>>  			break;
>>  		case -1:
>>  			die_errno(_("fork failed"));
>>  		default:
>> +			reassign_tempfile_ownership(ppid, pid);
>>  			exit(0);
>>  	}
> 
> The parent is giving up ownership and the child is taking it. So there
> may be a moment where both claim ownership, or a moment where the parent
> has relinquished but the child has not yet taken. We're not going to
> call exit() in the middle there, but a signal could kill us.
> 
> And then either:
> 
>   - If both claim ownership, it's mostly OK, because they'll both try to
>     unlink() which is idempotent-ish. Though there is a bad sequence
>     where we delete somebody _else's_ lock, like:
> 
>        1. Parent forks, but has not yet reassigned.
> 
>        2. Child calls reassign to take ownership. Now both have
> 	  ownership.
> 
>        3. Signal kills both parent and child, so they enter cleanup
> 	  code.
> 
>        4. One of them (let's say the parent) deletes the lockfile.
> 
>        5. Some other unrelated process (let's call it "git other") takes
> 	  the lock.
> 
>        6. The child deletes the lockfile.
> 
>     And at that point "git other" thinks it holds the lock, but it
>     doesn't. It's quite an unlikely sequence in practice, though, I'd
>     think.
> 
>   - If neither claims ownership and a signal kills both, then nobody
>     cleans up the lock and it is left in place. This is annoying, but
>     also something that can happen occasionally anyway (kill -9, etc).
> 
> I don't have an easy suggestion for making it more atomic, though. You
> could choose one or the other direction using some synchronization
> between the two (e.g., child reassigns only after parent signals over a
> pipe that it has relinquished), but it's all kind of ugly.
> 
Ya, that seems really ugly. My first thought was some way to disable
signals temporarily, but I am guessing that either has no good way to do
it or would introduce even more issues. Plus there is always kill -9...

> So it may be reasonable to just shrug and assume it's unlikely enough
> not to worry about.
> 
> -Peff
> 

Yea, I agree that this might qualify as unlikely enough to not worry.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-11 20:21         ` Jacob Keller
@ 2026-05-11 20:35           ` Jeff King
  2026-05-11 23:58             ` Jacob Keller
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2026-05-11 20:35 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Taylor Blau, jean-christophe manciot, Patrick Steinhardt, git,
	Derrick Stolee

On Mon, May 11, 2026 at 01:21:37PM -0700, Jacob Keller wrote:

> >   - If both claim ownership, it's mostly OK, because they'll both try to
> >     unlink() which is idempotent-ish. Though there is a bad sequence
> >     where we delete somebody _else's_ lock, like:
> > 
> >        1. Parent forks, but has not yet reassigned.
> > 
> >        2. Child calls reassign to take ownership. Now both have
> > 	  ownership.
> > 
> >        3. Signal kills both parent and child, so they enter cleanup
> > 	  code.
> > 
> >        4. One of them (let's say the parent) deletes the lockfile.
> > 
> >        5. Some other unrelated process (let's call it "git other") takes
> > 	  the lock.
> > 
> >        6. The child deletes the lockfile.
> > 
> >     And at that point "git other" thinks it holds the lock, but it
> >     doesn't. It's quite an unlikely sequence in practice, though, I'd
> >     think.
> > 
> >   - If neither claims ownership and a signal kills both, then nobody
> >     cleans up the lock and it is left in place. This is annoying, but
> >     also something that can happen occasionally anyway (kill -9, etc).
> > 
> > I don't have an easy suggestion for making it more atomic, though. You
> > could choose one or the other direction using some synchronization
> > between the two (e.g., child reassigns only after parent signals over a
> > pipe that it has relinquished), but it's all kind of ugly.
> > 
> Ya, that seems really ugly. My first thought was some way to disable
> signals temporarily, but I am guessing that either has no good way to do
> it or would introduce even more issues. Plus there is always kill -9...

I guess the parent could relinquish control by reassigning to "0" before
even calling fork(), and then taking it back if fork() fails. And then
worst case is that nobody cleans up the lock (if the child is killed
before taking control), but that's the least-bad outcome from a
correctness perspective (though still annoying).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config
  2026-05-11 20:35           ` Jeff King
@ 2026-05-11 23:58             ` Jacob Keller
  0 siblings, 0 replies; 15+ messages in thread
From: Jacob Keller @ 2026-05-11 23:58 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, jean-christophe manciot, Patrick Steinhardt, git,
	Derrick Stolee

On 5/11/2026 1:35 PM, Jeff King wrote:
> On Mon, May 11, 2026 at 01:21:37PM -0700, Jacob Keller wrote:
> 
>>>   - If both claim ownership, it's mostly OK, because they'll both try to
>>>     unlink() which is idempotent-ish. Though there is a bad sequence
>>>     where we delete somebody _else's_ lock, like:
>>>
>>>        1. Parent forks, but has not yet reassigned.
>>>
>>>        2. Child calls reassign to take ownership. Now both have
>>> 	  ownership.
>>>
>>>        3. Signal kills both parent and child, so they enter cleanup
>>> 	  code.
>>>
>>>        4. One of them (let's say the parent) deletes the lockfile.
>>>
>>>        5. Some other unrelated process (let's call it "git other") takes
>>> 	  the lock.
>>>
>>>        6. The child deletes the lockfile.
>>>
>>>     And at that point "git other" thinks it holds the lock, but it
>>>     doesn't. It's quite an unlikely sequence in practice, though, I'd
>>>     think.
>>>
>>>   - If neither claims ownership and a signal kills both, then nobody
>>>     cleans up the lock and it is left in place. This is annoying, but
>>>     also something that can happen occasionally anyway (kill -9, etc).
>>>
>>> I don't have an easy suggestion for making it more atomic, though. You
>>> could choose one or the other direction using some synchronization
>>> between the two (e.g., child reassigns only after parent signals over a
>>> pipe that it has relinquished), but it's all kind of ugly.
>>>
>> Ya, that seems really ugly. My first thought was some way to disable
>> signals temporarily, but I am guessing that either has no good way to do
>> it or would introduce even more issues. Plus there is always kill -9...
> 
> I guess the parent could relinquish control by reassigning to "0" before
> even calling fork(), and then taking it back if fork() fails. And then
> worst case is that nobody cleans up the lock (if the child is killed
> before taking control), but that's the least-bad outcome from a
> correctness perspective (though still annoying).
> 
> -Peff

This seems simpler than the suggestion of a pipe signal, and the worst
result is only the version that is possible no matter what if there is a
well timed kill -9.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-05-11 23:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-04 15:27 git hogs the CPU, RAM and storage despite its config jean-christophe manciot
2026-05-08 18:03 ` unexpected auto-maintenance, was " Jeff King
2026-05-09 15:13   ` Mikael Magnusson
2026-05-09 17:53     ` Jeff King
2026-05-09 17:52   ` Jeff King
2026-05-09 21:52     ` Taylor Blau
2026-05-10 16:08       ` Derrick Stolee
2026-05-10 20:00         ` Taylor Blau
2026-05-11  6:42           ` Patrick Steinhardt
2026-05-11 20:22             ` Jeff King
2026-05-11 20:10         ` Jeff King
2026-05-11 20:01       ` Jeff King
2026-05-11 20:21         ` Jacob Keller
2026-05-11 20:35           ` Jeff King
2026-05-11 23:58             ` Jacob Keller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.