From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "André Almeida" <andrealmeid@igalia.com>,
"Carlos O'Donell" <carlos@redhat.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Peter Zijlstra" <peterz@infradead.org>,
"Florian Weimer" <fweimer@redhat.com>,
"Rich Felker" <dalias@aerifal.cx>,
"Torvald Riegel" <triegel@redhat.com>,
"Darren Hart" <dvhart@infradead.org>,
"Thomas Gleixner" <tglx@kernel.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Arnd Bergmann" <arnd@arndb.de>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Michal Hocko" <mhocko@suse.com>
Cc: kernel-dev@igalia.com, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org,
libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [RFC PATCH 0/2] futex: how to solve the robust_list race condition?
Date: Fri, 20 Feb 2026 16:42:08 -0500 [thread overview]
Message-ID: <0d334517-63ee-46c9-884d-6c2ae8388b87@efficios.com> (raw)
In-Reply-To: <20260220202620.139584-1-andrealmeid@igalia.com>
+CC libc-alpha.
On 2026-02-20 15:26, André Almeida wrote:
> During LPC 2025, I presented a session about creating a new syscall for
> robust_list[0][1]. However, most of the session discussion wasn't much related
> to the new syscall itself, but much more related to an old bug that exists in
> the current robust_list mechanism.
>
> Since at least 2012, there's an open bug reporting a race condition, as
> Carlos O'Donell pointed out:
>
> "File corruption race condition in robust mutex unlocking"
> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>
> To help understand the bug, I've created a reproducer (patch 1/2) and a
> companion kernel hack (patch 2/2) that helps to make the race condition
> more likely. When the bug happens, the reproducer shows a message
> comparing the original memory with the corrupted one:
>
> "Memory was corrupted by the kernel: 8001fe8d8001fe8d vs 8001fe8dc0000000"
>
> I'm not sure yet what would be the appropriate approach to fix it, so I
> decided to reach out to the community before moving forward in some direction.
> One suggestion from Peter[2] revolves around serializing mmap() and the
> robust list exit path, which might add overhead for the common case,
> where list_op_pending is empty.
>
> However, given that there's a new interface being prepared, this could
> also be an opportunity to rethink how list_op_pending works, and to get
> rid of the race condition by design.
>
> Feedback is very much welcome.
Looking at this bug, one thing I'm starting to consider is that it
appears to be an issue inherent to the lack of synchronization between
pthread_mutex_destroy(3) and the per-thread list_op_pending fields,
and not so much a kernel issue.
Here is why I think the issue is purely userspace:
Let's suppose we have a shared memory area mapped by Process 1 and Process 2,
with its own custom userspace memory allocator to allocate/free space
within that shared memory.
Process 1, Thread A stumbles through the scenario highlighted by this bug, and
basically gets preempted at this FIXME in libc __pthread_mutex_unlock_full():
  if (__glibc_unlikely ((atomic_exchange_release (&mutex->__data.__lock, 0)
                         & FUTEX_WAITERS) != 0))
    futex_wake ((unsigned int *) &mutex->__data.__lock, 1, private);

  /* We must clear op_pending after we release the mutex.
     FIXME However, this violates the mutex destruction requirements
     because another thread could acquire the mutex, destroy it, and
     reuse the memory for something else; then, if this thread crashes,
     and the memory happens to have a value equal to the TID, the kernel
     will believe it is still related to the mutex (which has been
     destroyed already) and will modify some other random object.  */
  __asm ("" ::: "memory");
  THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
Then Process 1, Thread B runs, grabs the lock, releases it, and based on
program state it knows it can pthread_mutex_destroy() this lock, free its
associated memory through the custom shared memory allocator, and allocate
it for other purposes. Then we get to the point where Process 1 is
killed, and where the robust futex kernel code corrupts data in shared
memory because of the dangling list_op_pending pointer.
That shared memory data is still observable by Process 2, which will see a
corrupted state.
Notice how this all happens without any munmap(2)/mmap(2) in the sequence?
This is why I think this is purely a userspace issue rather than an issue
we can solve by adding extra synchronization in the kernel.
The one point in that sequence where I think we can add synchronization
is pthread_mutex_destroy(3) in libc. One possible "big hammer" solution would
be to make pthread_mutex_destroy iterate over all other threads'
list_op_pending fields and busy-wait if it finds that the mutex address is in
use. It would of course only have to do that for robust futexes.
If that big hammer solution is not fast enough for many-threaded use-cases,
then we can think of other approaches such as adding a reference counter
in the mutex structure, or introducing hazard pointers in userspace to reduce
synchronization iteration from nr_threads to nr_cpus (or even down to max
rseq mm_cid).
Thoughts?
Thanks,
Mathieu
>
> Thanks!
> André
>
> [0] https://lore.kernel.org/lkml/20251122-tonyk-robust_futex-v6-0-05fea005a0fd@igalia.com/
> [1] https://lpc.events/event/19/contributions/2108/
> [2] https://lore.kernel.org/lkml/20241219171344.GA26279@noisy.programming.kicks-ass.net/
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Thread overview: 23+ messages
2026-02-20 20:26 [RFC PATCH 0/2] futex: how to solve the robust_list race condition? André Almeida
2026-02-20 20:26 ` [RFC PATCH 1/2] futex: Create reproducer for robust_list race condition André Almeida
2026-03-12 9:04 ` Sebastian Andrzej Siewior
2026-03-12 13:36 ` André Almeida
2026-02-20 20:26 ` [RFC PATCH 2/2] futex: hack: Add debug delays André Almeida
2026-02-20 20:51 ` [RFC PATCH 0/2] futex: how to solve the robust_list race condition? Liam R. Howlett
2026-02-27 19:15 ` André Almeida
2026-02-20 21:42 ` Mathieu Desnoyers [this message]
2026-02-20 22:41 ` Mathieu Desnoyers
2026-02-20 23:17 ` Mathieu Desnoyers
2026-02-23 11:13 ` Florian Weimer
2026-02-23 13:37 ` Mathieu Desnoyers
2026-02-23 13:47 ` Rich Felker
2026-02-27 19:16 ` André Almeida
2026-02-27 19:59 ` Mathieu Desnoyers
2026-02-27 20:41 ` Suren Baghdasaryan
2026-03-01 15:49 ` Mathieu Desnoyers
2026-03-02 7:31 ` Florian Weimer
2026-03-02 14:57 ` Mathieu Desnoyers
2026-03-02 15:32 ` Florian Weimer
2026-03-02 16:32 ` Mathieu Desnoyers
2026-03-02 16:42 ` Florian Weimer
2026-03-02 16:56 ` Mathieu Desnoyers