From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from cloud.peff.net (cloud.peff.net [217.216.95.84]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 275723BB40 for ; Mon, 11 May 2026 20:01:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.216.95.84 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778529676; cv=none; b=dqKS9glstT74UE4HmggSb+XTK7L1KyfelTYuiGcQaAKOmfAcwi29C+dlXbTL/ua0biZRN6ROKUxD6PESY+F++QQDf0LDUUITrme04fdvTLATIlEHtciYjTi8ZfAxr79MJzxe9K84909LkOuXE/M9pXmBu0nseXJXdjAtgisojno= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778529676; c=relaxed/simple; bh=+TdXfFPnfRbUKq5JKI5is5w3MPZADtwuifyguSkUMLI=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=KS0RV2jX+Po7ALT1Jb3RXhwuW731DqjCBEr5t7T9JwiSlUNZLKQP5N7FkQZr64psoRKeqFJbMcXynr+Gru5vzRWw88o06HDdLxu5Y97g5xKPxcWZOskSjaNcvaq+EXQcS9BqCfPRrlTjlGlpnGtKPsV7+lHxj0Ryvi11nVRlDbI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=PaF+MlrK; arc=none smtp.client-ip=217.216.95.84 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="PaF+MlrK" Received: (qmail 19296 invoked by uid 106); 11 May 2026 20:01:12 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=+TdXfFPnfRbUKq5JKI5is5w3MPZADtwuifyguSkUMLI=; b=PaF+MlrK6F7ryTqCL42C5Q7DCHjeshW9mrwroo/x63y3DyGqbvubjuA7bJ6JDbnXTiMPG+U3Yy3icn0nF1Trr6wiTk0UAkHUDKVtUvaB5GMlDxY94e/HC+vXDt615MZ3OwNSk9R0j8IGZKI7UUC5VWVHjEwmypswBPkS7mWT3D0flg8+Nh1Jpvb3DKbMkwdWPapwg2rg2F8lCm2f7nQzhlvfW1KCujK3uS06UdzVV8pkC94zP2BuLM4qJnKowd8R2JcaQO5vNcmVexL2BvUnIGGuQSeUoA0KlffKWyVWkGdDlAs7jBtiDPnnl138LTwn86N1DAHS1xZ1QBt0lfxd0w== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Mon, 11 May 2026 20:01:12 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 34186 invoked by uid 111); 11 May 2026 20:01:12 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Mon, 11 May 2026 16:01:12 -0400 Authentication-Results: peff.net; auth=none Date: Mon, 11 May 2026 16:01:12 -0400 From: Jeff King To: Taylor Blau Cc: jean-christophe manciot , Patrick Steinhardt , git@vger.kernel.org, Derrick Stolee Subject: Re: unexpected auto-maintenance, was Re: git hogs the CPU, RAM and storage despite its config Message-ID: <20260511200112.GA22912@coredump.intra.peff.net> References: <20260508180341.GB737125@coredump.intra.peff.net> <20260509175249.GA2336928@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: On Sat, May 09, 2026 at 05:52:29PM -0400, Taylor Blau wrote: > Dropping and re-acquiring the lock is possible, but it's racy since > there is a gap during the critical window while we fork(). I believe > that the only airtight way to do this is to update the owner field of > the tempfiles we want to pass down during daemonization. It's racy in the sense that we might not get the lock back, but I don't think that's _wrong_ exactly. We just care that some instance runs the second half eventually. It is technically vulnerable to starvation, if processes keep taking the lock, doing the foreground jobs, and then losing the race to take it a second time. But besides the fact that you'd eventually win by chance, unless you have an infinite supply of processes coming along to run the foreground half, the final invocation will run the background tasks. I agree it feels pretty hacky. It is, however, how "git gc" seems to work (notice that it does a manual delete_tempfile() on the lock right before daemonizing). > (As an aside, do we want to do that for all tempfiles? It might be nice > to have a "->reassign_on_fork" flag or something on the tempfile struct > in case there are instances where the parent wants to retain ownership > of the tempfile after fork()-ing, but I can't think of any off the top > of my head. If we do introduce such a field, it should probably default > to "true" to avoid any foot-guns.) I think the answer is yes, we want it for all tempfiles. Because it is not a property of the individual tempfiles, but rather of the caller who is forking. The point of that owner field is so that when we spawn children, we don't end up with duplicate or unexpected cleanups. But daemonize() is not about spawning a child or otherwise splitting the process into two. There is still a single contiguous flow of execution, and the fact that we switch pids and call exit() is an implementation detail. So we would still want to remain the "owner" of any cleanup. And that is true for any other atexit() handlers besides the tempfile ones, too. If we called run_processes_parallel(), and then while that child was running we called daemonize(), we'd not want to kill the child then, but our atexit handler would do so! But we don't tend to leave processes running while doing stuff like daemonize(), which is why we never bothered to implement an owner field there in the first place. > > Ultimately fixing the lock bug will solve that. Though if doing so is > > too complicated for a quick maint release, I'm tempted to say we should > > consider reverting 452b12c2e0 for a potential v2.54.1 (as there were a > > few other regression fixes so far, I assume we'll have one soon-ish). > > I think something like the following (untested) would do the trick: Yeah, that's what I was envisioning. Note that there's a small race here: > + switch ((pid = fork())) { > case 0: > + reassign_tempfile_ownership(ppid, getpid()); > break; > case -1: > die_errno(_("fork failed")); > default: > + reassign_tempfile_ownership(ppid, pid); > exit(0); > } The parent is giving up ownership and the child is taking it. So there may be a moment where both claim ownership, or a moment where the parent has relinquished but the child has not yet taken. We're not going to call exit() in the middle there, but a signal could kill us. And then either: - If both claim ownership, it's mostly OK, because they'll both try to unlink() which is idempotent-ish. Though there is a bad sequence where we delete somebody _else's_ lock, like: 1. Parent forks, but has not yet reassigned. 2. Child calls reassign to take ownership. Now both have ownership. 3. Signal kills both parent and child, so they enter cleanup code. 4. One of them (let's say the parent) deletes the lockfile. 5. Some other unrelated process (let's call it "git other") takes the lock. 6. The child deletes the lockfile. And at that point "git other" thinks it holds the lock, but it doesn't. It's quite an unlikely sequence in practice, though, I'd think. - If neither claims ownership and a signal kills both, then nobody cleans up the lock and it is left in place. This is annoying, but also something that can happen occasionally anyway (kill -9, etc). I don't have an easy suggestion for making it more atomic, though. You could choose one or the other direction using some synchronization between the two (e.g., child reassigns only after parent signals over a pipe that it has relinquished), but it's all kind of ugly. So it may be reasonable to just shrug and assume it's unlikely enough not to worry about. -Peff