From: Thomas Gleixner
To: Florian Albertz, mingo@redhat.com
Cc: linux-kernel@vger.kernel.org, Sebastian Andrzej Siewior, Peter Zijlstra
Subject: Re: PROBLEM: Kernel 6.17 newly deadlocks futex
In-Reply-To: <1d9fe0eb-11a0-4f8e-a8e7-57e1756193d3@app.fastmail.com>
References: <1d9fe0eb-11a0-4f8e-a8e7-57e1756193d3@app.fastmail.com>
Date: Fri, 19 Dec 2025 21:07:13 +0100
Message-ID: <873456b5hq.ffs@tglx>

On Fri, Dec 19 2025 at 11:02, Florian Albertz wrote:
> static int child(void *arg) {
>     // It is important this call to create a thread happens between
>     // the wait and wake calls.
>     //
>     // Due to the new behavior around `need_futex_hash_allocate_defaults`,
>     // the first clone which includes CLONE_THREAD (CLONE_VM is not enough)
>     // results in a change in how futex hashes are calculated.

The problem is not this one.

>     clone(noop, malloc(STACK_SIZE) + STACK_SIZE,
>           CLONE_VM | CLONE_SIGHAND | CLONE_THREAD, NULL, NULL, NULL);
>
>     // So this now works with another hash and therefore does not wake the main
>     // process.
>     *fut = 1;
>     syscall(SYS_futex, fut, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
>
>     return 0;
> }
>
> int main(int argc, char *argv[]) {
>     fut = calloc(1, sizeof(*fut));
>
>     // Now we create a new process sharing virtual memory but crucially without
>     // specifying CLONE_THREAD.

The problem is here because the condition for hash allocation is too
tight. The private hash is bound to the MM, which is shared with CLONE_VM,
so the clone has to install a private hash despite creating a process and
not a thread.

>     clone(child, malloc(STACK_SIZE) + STACK_SIZE, CLONE_VM, NULL, NULL, NULL);
>
>     // And now this futex wait never wakes from kernel 6.17 onwards.
>     syscall(SYS_futex, fut, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
> }

The below should fix that. It's not completely correct because the
resulting hash sizing looks at current->signal->threads. As signal is
not shared, each resulting process accounts only for its own threads.
Fixing that needs some more thought.

Thanks,

        tglx
---
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1948,11 +1948,9 @@ static void rv_task_fork(struct task_str
 #define rv_task_fork(p) do {} while (0)
 #endif
 
-static bool need_futex_hash_allocate_default(u64 clone_flags)
+static inline bool need_futex_hash_allocate_default(u64 clone_flags)
 {
-	if ((clone_flags & (CLONE_THREAD | CLONE_VM)) != (CLONE_THREAD | CLONE_VM))
-		return false;
-	return true;
+	return !!(clone_flags & CLONE_VM);
 }
 
 /*
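
For illustration only, the effect of the changed condition on the two clone()
calls in the reproducer can be sketched in userspace C. This is not kernel
code: need_hash_old(), need_hash_new(), print_flag_table() and the DEMO_*
macros are hypothetical names introduced here, and the flag values are copied
from the Linux UAPI <linux/sched.h> so the sketch does not depend on glibc
feature macros.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Flag values as defined in the Linux UAPI header <linux/sched.h>. */
#define DEMO_CLONE_VM      0x00000100ULL
#define DEMO_CLONE_SIGHAND 0x00000800ULL
#define DEMO_CLONE_THREAD  0x00010000ULL

/* Condition before the patch: CLONE_THREAD and CLONE_VM must both be set. */
static bool need_hash_old(uint64_t clone_flags)
{
	const uint64_t both = DEMO_CLONE_THREAD | DEMO_CLONE_VM;

	return (clone_flags & both) == both;
}

/* Condition after the patch: CLONE_VM alone is sufficient. */
static bool need_hash_new(uint64_t clone_flags)
{
	return !!(clone_flags & DEMO_CLONE_VM);
}

/* Hypothetical helper: print both results for three flag combinations. */
void print_flag_table(void)
{
	const struct {
		const char *name;
		uint64_t flags;
	} cases[] = {
		{ "no CLONE_VM (fork-like)", 0 },
		/* The clone() in main() of the reproducer. */
		{ "CLONE_VM only", DEMO_CLONE_VM },
		/* The clone() in child() of the reproducer. */
		{ "CLONE_VM | CLONE_SIGHAND | CLONE_THREAD",
		  DEMO_CLONE_VM | DEMO_CLONE_SIGHAND | DEMO_CLONE_THREAD },
	};

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%-42s old=%d new=%d\n", cases[i].name,
		       need_hash_old(cases[i].flags),
		       need_hash_new(cases[i].flags));
}
```

Calling print_flag_table() from a main() prints one row per case. The row of
interest is "CLONE_VM only": the old condition rejects it, so under the old
code the first clone() in the reproducer installs no private hash, while the
new condition accepts it. The sketch says nothing about hash sizing, which
remains the open issue mentioned above.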