public inbox for linux-kernel@vger.kernel.org
* PROBLEM: Kernel 6.17 newly deadlocks futex
@ 2025-12-19 10:02 Florian Albertz
  2025-12-19 20:07 ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Albertz @ 2025-12-19 10:02 UTC
  To: tglx, mingo; +Cc: linux-kernel

Hi everyone,

a program of mine started deadlocking on kernel 6.17: it hangs in a
FUTEX_WAIT_PRIVATE call.

Now first off, due to factors outside of my control, I am using futexes with
the FUTEX_PRIVATE_FLAG while also working with child processes which aren't
spawned with CLONE_THREAD. They are however created with CLONE_VM.

This worked before (and still works now, except for the specific edge case demonstrated
below). I would understand if this is not fixed, since FUTEX_PRIVATE_FLAG
is documented to be specifically about threaded programs, but I would be very happy
if the previous behaviour could be restored, ideally with FUTEX_PRIVATE_FLAG
being documented to work as long as processes run in the same memory space.

Now to the actual deadlock. The following program runs to completion on
a released 6.16.10 kernel on x86_64. On kernels 6.17.9 and 6.18.1 it deadlocks.
The tested kernels are from the official Arch Linux repositories:

---
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static uint32_t *fut;

static int noop(void *arg) { return 0; }

static int child(void *arg) {
    // It is important this call to create a thread happens between
    // the wait and wake calls.
    //
    // Due to the new behavior around `need_futex_hash_allocate_defaults`,
    // the first clone which includes CLONE_THREAD (CLONE_VM is not enough)
    // results in a change in how futex hashes are calculated.
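    // Cast to char * so the stack-top arithmetic does not rely on the
    // GNU extension for arithmetic on void pointers.
    clone(noop, (char *)malloc(STACK_SIZE) + STACK_SIZE,
            CLONE_VM | CLONE_SIGHAND | CLONE_THREAD, NULL, NULL, NULL);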

    // So this now works with another hash and therefore does not wake the main
    // process.
    *fut = 1;
    syscall(SYS_futex, fut, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);

    return 0;
}

int main(int argc, char *argv[]) {
    fut = calloc(1, sizeof(*fut));

    // Now we create a new process sharing virtual memory but crucially without
    // specifying CLONE_THREAD.
    clone(child, (char *)malloc(STACK_SIZE) + STACK_SIZE, CLONE_VM, NULL, NULL, NULL);

    // And now this futex wait never wakes from kernel 6.17 onwards.
    syscall(SYS_futex, fut, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
}
---


I realise that a fully reliable reproduction would probably require more synchronization,
but I hope the above is enough to demonstrate the problem. The same goes for error handling etc.
Apologies also for anything else confusing in the above code; I think this
reproduction may be the first C code I have written in years.
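
As a rough sketch of the error handling the reproducer leaves out, a waiter
could recheck the futex word in a loop and report unexpected errors. The helper
name `wait_until_nonzero` is hypothetical and not part of the original program,
and `__atomic_load_n` assumes a GCC or Clang compiler. This loop does not work
around the deadlock itself: a wake that never reaches the waiter still leaves it
blocked.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not in the original reproducer): block until *addr
 * becomes nonzero. Returns 0 on success, -1 on an unexpected futex error. */
static int wait_until_nonzero(uint32_t *addr) {
    while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) == 0) {
        long rc = syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, 0,
                          NULL, NULL, 0);
        /* EAGAIN: the word changed before we slept. EINTR: a signal
         * interrupted the wait. Both just mean "recheck the word". */
        if (rc == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```

In the reproducer, `wait_until_nonzero(fut)` would stand in for the bare
FUTEX_WAIT_PRIVATE call in main().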

The issue does not occur if any process with CLONE_THREAD was created before the wait.
It does not occur if no process with CLONE_THREAD is created at all. And the code also
works as expected if the FUTEX_PRIVATE_FLAG is omitted.
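
To illustrate the last observation, here is a minimal sketch of wait and wake
wrappers that omit FUTEX_PRIVATE_FLAG by using the plain FUTEX_WAIT and
FUTEX_WAKE operations. The helper names `futex_wait_shared` and
`futex_wake_shared` are hypothetical and do not appear in the original
reproducer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *addr == expected, without
 * FUTEX_PRIVATE_FLAG. Returns 0 when woken, -1 with errno set otherwise
 * (EAGAIN if *addr did not match expected when the call was made). */
static long futex_wait_shared(uint32_t *addr, uint32_t expected) {
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Hypothetical helper: wake up to count waiters on addr, without
 * FUTEX_PRIVATE_FLAG. Returns the number of waiters woken, or -1. */
static long futex_wake_shared(uint32_t *addr, int count) {
    return syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

In the reproducer above, `futex_wait_shared(fut, 0)` would replace the
FUTEX_WAIT_PRIVATE call in main() and `futex_wake_shared(fut, 1)` the
FUTEX_WAKE_PRIVATE call in child().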

Thank you for your time and work on the kernel, I'll gladly provide any further info you need.
Greetings and happy holidays,

Florian A.

Thread overview: 4+ messages
2025-12-19 10:02 PROBLEM: Kernel 6.17 newly deadlocks futex Florian Albertz
2025-12-19 20:07 ` Thomas Gleixner
2026-01-09 16:56   ` Sebastian Andrzej Siewior
2026-01-19 18:24     ` Florian Albertz
