From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "André Almeida" <andrealmeid@igalia.com>,
"Carlos O'Donell" <carlos@redhat.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Peter Zijlstra" <peterz@infradead.org>,
"Florian Weimer" <fweimer@redhat.com>,
"Rich Felker" <dalias@aerifal.cx>,
"Torvald Riegel" <triegel@redhat.com>,
"Darren Hart" <dvhart@infradead.org>,
"Thomas Gleixner" <tglx@kernel.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Arnd Bergmann" <arnd@arndb.de>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Michal Hocko" <mhocko@suse.com>
Cc: kernel-dev@igalia.com, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org,
libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [RFC PATCH 0/2] futex: how to solve the robust_list race condition?
Date: Fri, 20 Feb 2026 17:41:17 -0500 [thread overview]
Message-ID: <67be0aa1-2241-43ef-aa01-a89ced26c8f6@efficios.com> (raw)
In-Reply-To: <0d334517-63ee-46c9-884d-6c2ae8388b87@efficios.com>
On 2026-02-20 16:42, Mathieu Desnoyers wrote:
> +CC libc-alpha.
>
> On 2026-02-20 15:26, André Almeida wrote:
>> During LPC 2025, I presented a session about creating a new syscall for
>> robust_list[0][1]. However, most of the session discussion wasn't
>> related to the new syscall itself, but rather to an old bug in the
>> current robust_list mechanism.
>>
>> Since at least 2012, there's an open bug reporting a race condition, as
>> Carlos O'Donell pointed out:
>>
>> "File corruption race condition in robust mutex unlocking"
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>
>> To help understand the bug, I've created a reproducer (patch 1/2) and a
>> companion kernel hack (patch 2/2) that helps to make the race condition
>> more likely. When the bug happens, the reproducer shows a message
>> comparing the original memory with the corrupted one:
>>
>> "Memory was corrupted by the kernel: 8001fe8d8001fe8d vs
>> 8001fe8dc0000000"
>>
>> I'm not sure yet what the appropriate approach to fix it would be, so I
>> decided to reach out to the community before moving forward in some
>> direction. One suggestion from Peter[2] revolves around serializing
>> mmap() and the robust list exit path, which might cause overhead for
>> the common case, where list_op_pending is empty.
>>
>> However, given that there's a new interface being prepared, this could
>> also be an opportunity to rethink how list_op_pending works, and get
>> rid of the race condition by design.
>>
>> Feedback is very much welcome.
>
> Looking at this bug, one thing I'm starting to consider is that it
> appears to be an issue inherent to the lack of synchronization between
> pthread_mutex_destroy(3) and the per-thread list_op_pending fields,
> and not so much a kernel issue.
>
> Here is why I think the issue is purely userspace:
>
> Let's suppose we have a shared memory area shared across Process 1 and
> Process 2, which internally has its own custom memory allocator in
> userspace to allocate/free space within that shared memory.
>
> Process 1, Thread A stumbles through the scenario highlighted by this
> bug, and basically gets preempted at this FIXME in libc's
> __pthread_mutex_unlock_full():
>
>   if (__glibc_unlikely ((atomic_exchange_release (&mutex->__data.__lock, 0)
>                          & FUTEX_WAITERS) != 0))
>     futex_wake ((unsigned int *) &mutex->__data.__lock, 1, private);
>
>   /* We must clear op_pending after we release the mutex.
>      FIXME However, this violates the mutex destruction requirements
>      because another thread could acquire the mutex, destroy it, and
>      reuse the memory for something else; then, if this thread crashes,
>      and the memory happens to have a value equal to the TID, the kernel
>      will believe it is still related to the mutex (which has been
>      destroyed already) and will modify some other random object.  */
>   __asm ("" ::: "memory");
>   THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
>
> Then Process 1, Thread B runs, grabs the lock, releases it, and based on
> program state it knows it can pthread_mutex_destroy() this lock, free its
> associated memory through the custom shared memory allocator, and allocate
> it for other purposes. Then we get to the point where Process 1 is
> killed, and where the robust futex kernel code corrupts data in shared
> memory because of the dangling list_op_pending pointer.
>
> That shared memory data is still observable by Process 2, which will
> see a corrupted state.
>
> Notice how this all happens without any munmap(2)/mmap(2) in the
> sequence? This is why I think this is purely a userspace issue, rather
> than an issue we can solve by adding extra synchronization in the
> kernel.
>
> The one point in that sequence where I think we can add synchronization
> is pthread_mutex_destroy(3) in libc. One possible "big hammer" solution
> would be to make pthread_mutex_destroy iterate over all other threads'
> list_op_pending fields and busy-wait if it finds that the mutex address
> is in use. It would of course only have to do that for robust futexes.
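A minimal sketch of that big hammer, assuming a hypothetical per-thread registry of published list_op_pending pointers (glibc would instead walk its internal thread list; none of these names are existing libc interfaces):

```c
/* Sketch of the "big hammer": the destroy path busy-waits until no
 * thread's published list_op_pending references the mutex being
 * destroyed.  op_pending_registry is an invented stand-in for
 * iterating libc's internal thread list. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 128

/* One slot per thread, visible to all threads of the process. */
static _Atomic(void *) op_pending_registry[MAX_THREADS];

static bool mutex_in_use(const void *mutex)
{
    for (size_t i = 0; i < MAX_THREADS; i++)
        if (atomic_load_explicit(&op_pending_registry[i],
                                 memory_order_acquire) == mutex)
            return true;
    return false;
}

/* Destroy path for a robust mutex: spin until no unlock is in flight. */
void robust_mutex_destroy_sync(const void *mutex)
{
    while (mutex_in_use(mutex))
        ;  /* busy-wait; a real version would yield or futex-wait */
}
```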
>
> If that big hammer solution is not fast enough for many-threaded
> use-cases, then we can think of other approaches, such as adding a
> reference counter in the mutex structure, or introducing hazard
> pointers in userspace to reduce the synchronization iteration from
> nr_threads to nr_cpus (or even down to the max rseq mm_cid).
To make matters even worse, the pthread_mutex_destroy(3) and reallocation
could happen from Process 2 rather than Process 1, so iterating over the
threads of Process 1 is not sufficient. We'd need to synchronize
pthread_mutex_destroy on something within the mutex structure which is
observable from all processes using the lock, for instance a reference
count.
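A sketch of that direction: an illustrative use count placed next to the futex word, in the shared mapping itself, so in-flight unlockers are observable from any process (struct robust_mutex and these helpers are invented names, not glibc API):

```c
/* Cross-process guard sketch: a use count embedded in the shared mutex
 * structure, so the destroy path can wait for in-flight robust unlocks
 * from *any* process sharing the mapping. */
#include <stdatomic.h>
#include <stdint.h>

struct robust_mutex {
    _Atomic uint32_t futex_word;  /* owner TID + flag bits */
    _Atomic uint32_t users;       /* threads inside lock/unlock paths */
};

/* Enter before touching the futex word in an unlock. */
static void robust_mutex_enter(struct robust_mutex *m)
{
    atomic_fetch_add_explicit(&m->users, 1, memory_order_acquire);
}

/* Exit only after list_op_pending has been cleared. */
static void robust_mutex_exit(struct robust_mutex *m)
{
    atomic_fetch_sub_explicit(&m->users, 1, memory_order_release);
}

/* Destroy: wait until no thread, in any process sharing the mapping,
 * is still between "entered unlock" and "cleared list_op_pending". */
static void robust_mutex_destroy(struct robust_mutex *m)
{
    while (atomic_load_explicit(&m->users, memory_order_acquire) != 0)
        ;  /* spin; a real version would back off or futex-wait */
}
```

Note that the count would have to be incremented before the unlock touches the futex word and decremented only after list_op_pending is cleared, which is exactly the window the FIXME quoted above describes.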
Thanks,
Mathieu
>
> Thoughts ?
>
> Thanks,
>
> Mathieu
>
>>
>> Thanks!
>> André
>>
>> [0] https://lore.kernel.org/lkml/20251122-tonyk-robust_futex-
>> v6-0-05fea005a0fd@igalia.com/
>> [1] https://lpc.events/event/19/contributions/2108/
>> [2] https://lore.kernel.org/
>> lkml/20241219171344.GA26279@noisy.programming.kicks-ass.net/
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com