public inbox for linux-api@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "André Almeida" <andrealmeid@igalia.com>,
	"Carlos O'Donell" <carlos@redhat.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Florian Weimer" <fweimer@redhat.com>,
	"Rich Felker" <dalias@aerifal.cx>,
	"Torvald Riegel" <triegel@redhat.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Thomas Gleixner" <tglx@kernel.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Michal Hocko" <mhocko@suse.com>
Cc: kernel-dev@igalia.com, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [RFC PATCH 0/2] futex: how to solve the robust_list race condition?
Date: Fri, 20 Feb 2026 17:41:17 -0500	[thread overview]
Message-ID: <67be0aa1-2241-43ef-aa01-a89ced26c8f6@efficios.com> (raw)
In-Reply-To: <0d334517-63ee-46c9-884d-6c2ae8388b87@efficios.com>

On 2026-02-20 16:42, Mathieu Desnoyers wrote:
> +CC libc-alpha.
> 
> On 2026-02-20 15:26, André Almeida wrote:
>> During LPC 2025, I presented a session about creating a new syscall for
>> robust_list[0][1]. However, most of the session discussion wasn't about
>> the new syscall itself, but about an old bug in the current robust_list
>> mechanism.
>>
>> Since at least 2012, there's an open bug reporting a race condition, as
>> Carlos O'Donell pointed out:
>>
>>    "File corruption race condition in robust mutex unlocking"
>>    https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>
>> To help understand the bug, I've created a reproducer (patch 1/2) and a
>> companion kernel hack (patch 2/2) that makes the race condition more
>> likely. When the bug happens, the reproducer prints a message comparing
>> the original memory with the corrupted one:
>>
>>    "Memory was corrupted by the kernel: 8001fe8d8001fe8d vs 8001fe8dc0000000"
>>
>> I'm not sure yet what the appropriate approach to fix it would be, so I
>> decided to reach out to the community before moving forward in some
>> direction. One suggestion from Peter[2] revolves around serializing
>> mmap() and the robust list exit path, which might add overhead to the
>> common case, where list_op_pending is empty.
>>
>> However, given that there's a new interface being prepared, this could
>> also be an opportunity to rethink how list_op_pending works, and get
>> rid of the race condition by design.
>>
>> Feedback is very much welcome.
> 
> Looking at this bug, one thing I'm starting to consider is that it
> appears to be an issue inherent to the lack of synchronization between
> pthread_mutex_destroy(3) and the per-thread list_op_pending fields,
> and not so much a kernel issue.
> 
> Here is why I think the issue is purely userspace:
> 
> Let's suppose we have a shared memory area across Process 1 and Process 2,
> which internally has its own custom userspace memory allocator to
> allocate/free space within that shared memory.
> 
> Process 1, Thread A stumbles through the scenario highlighted by this bug,
> and basically gets preempted at this FIXME in libc
> __pthread_mutex_unlock_full():
> 
>        if (__glibc_unlikely ((atomic_exchange_release (&mutex->__data.__lock, 0)
>                               & FUTEX_WAITERS) != 0))
>          futex_wake ((unsigned int *) &mutex->__data.__lock, 1, private);
> 
>        /* We must clear op_pending after we release the mutex.
>           FIXME However, this violates the mutex destruction requirements
>           because another thread could acquire the mutex, destroy it, and
>           reuse the memory for something else; then, if this thread crashes,
>           and the memory happens to have a value equal to the TID, the kernel
>           will believe it is still related to the mutex (which has been
>           destroyed already) and will modify some other random object.  */
>        __asm ("" ::: "memory");
>        THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> 
> Then Process 1, Thread B runs, grabs the lock, releases it, and based on
> program state it knows it can pthread_mutex_destroy() this lock, free its
> associated memory through the custom shared memory allocator, and allocate
> it for other purposes. Then we get to the point where Process 1 is
> killed, and where the robust futex kernel code corrupts data in shared
> memory because of the dangling list_op_pending pointer.
> 
> That shared memory data is still observable by Process 2, which will see
> the corrupted state.
> 
> Notice how this all happens without any munmap(2)/mmap(2) in the sequence?
> This is why I think this is purely a userspace issue rather than an issue
> we can solve by adding extra synchronization in the kernel.
> 
> The one point in that sequence where I think we can add synchronization
> is pthread_mutex_destroy(3) in libc. One possible "big hammer" solution
> would be to make pthread_mutex_destroy iterate over all other threads'
> list_op_pending fields and busy-wait if it finds that the mutex address
> is in use. It would of course only have to do that for robust futexes.
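
A rough userspace sketch of what that big hammer could look like. The thread registry (g_threads, g_nthreads) and the op_pending field are hypothetical stand-ins for libc's per-thread robust_head.list_op_pending, not existing API:

```c
/* Hypothetical sketch of the "big hammer": before a robust mutex's memory
   may be destroyed and reused, scan every registered thread's pending-op
   pointer and spin while any of them still references this mutex.
   g_threads, g_nthreads and op_pending are illustrative names. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 128

struct thread_ctx {
    _Atomic(void *) op_pending;  /* stands in for robust_head.list_op_pending */
};

static struct thread_ctx *g_threads[MAX_THREADS];
static int g_nthreads;

static bool mutex_in_use(void *mutex_addr)
{
    for (int i = 0; i < g_nthreads; i++)
        if (atomic_load(&g_threads[i]->op_pending) == mutex_addr)
            return true;
    return false;
}

static void robust_mutex_destroy_wait(void *mutex_addr)
{
    /* Busy-wait until no thread's pending operation references the mutex;
       only then is it safe to release/reuse the memory. */
    while (mutex_in_use(mutex_addr))
        ;
}
```

The cost is an O(nr_threads) scan on every robust mutex destroy, which motivates the cheaper alternatives below.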
> 
> If that big hammer solution is not fast enough for many-threaded use-cases,
> then we can think of other approaches, such as adding a reference counter
> in the mutex structure, or introducing hazard pointers in userspace to
> reduce the synchronization iteration from nr_threads to nr_cpus (or even
> down to max rseq mm_cid).

To make matters even worse, the pthread_mutex_destroy(3) and reallocation
could happen from Process 2 rather than Process 1. So iterating over the
threads of Process 1 is not sufficient. We'd need to synchronize
pthread_mutex_destroy on something within the mutex structure which is
observable from all processes using the lock, for instance a reference
count.
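
As a rough sketch of that direction: a reference count embedded next to the mutex lives in the same (possibly process-shared) memory, so destroy can wait until no thread in any process may still dereference it. The refc_mutex type and refc_* names are hypothetical, and the reference is dropped here where a real robust implementation would drop it only after clearing list_op_pending:

```c
/* Hypothetical sketch: a reference count embedded next to the mutex, so
   destroy can wait until no thread (in any process mapping the memory)
   may still touch it.  refc_mutex and refc_* are illustrative names,
   not an existing API. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct refc_mutex {
    pthread_mutex_t m;
    atomic_int users;  /* threads that may still dereference this memory */
};

static void refc_mutex_lock(struct refc_mutex *rm)
{
    atomic_fetch_add(&rm->users, 1);  /* take a reference before touching m */
    pthread_mutex_lock(&rm->m);
}

static void refc_mutex_unlock(struct refc_mutex *rm)
{
    pthread_mutex_unlock(&rm->m);
    /* In a real robust implementation the reference would only be dropped
       after list_op_pending has been cleared, closing the race window. */
    atomic_fetch_sub(&rm->users, 1);
}

static void refc_mutex_destroy(struct refc_mutex *rm)
{
    /* Wait until no thread in any process still holds a reference. */
    while (atomic_load(&rm->users) != 0)
        ;
    pthread_mutex_destroy(&rm->m);
}
```

Because the counter sits in the shared mapping itself, it is visible to every process using the lock, unlike a per-process scan of thread lists.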

Thanks,

Mathieu

> 
> Thoughts?
> 
> Thanks,
> 
> Mathieu
> 
>>
>> Thanks!
>>     André
>>
>> [0] https://lore.kernel.org/lkml/20251122-tonyk-robust_futex-v6-0-05fea005a0fd@igalia.com/
>> [1] https://lpc.events/event/19/contributions/2108/
>> [2] https://lore.kernel.org/lkml/20241219171344.GA26279@noisy.programming.kicks-ass.net/
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Thread overview: 23+ messages
2026-02-20 20:26 [RFC PATCH 0/2] futex: how to solve the robust_list race condition? André Almeida
2026-02-20 20:26 ` [RFC PATCH 1/2] futex: Create reproducer for robust_list race condition André Almeida
2026-03-12  9:04   ` Sebastian Andrzej Siewior
2026-03-12 13:36     ` André Almeida
2026-02-20 20:26 ` [RFC PATCH 2/2] futex: hack: Add debug delays André Almeida
2026-02-20 20:51 ` [RFC PATCH 0/2] futex: how to solve the robust_list race condition? Liam R. Howlett
2026-02-27 19:15   ` André Almeida
2026-02-20 21:42 ` Mathieu Desnoyers
2026-02-20 22:41   ` Mathieu Desnoyers [this message]
2026-02-20 23:17     ` Mathieu Desnoyers
2026-02-23 11:13       ` Florian Weimer
2026-02-23 13:37         ` Mathieu Desnoyers
2026-02-23 13:47           ` Rich Felker
2026-02-27 19:16       ` André Almeida
2026-02-27 19:59         ` Mathieu Desnoyers
2026-02-27 20:41           ` Suren Baghdasaryan
2026-03-01 15:49           ` Mathieu Desnoyers
2026-03-02  7:31             ` Florian Weimer
2026-03-02 14:57               ` Mathieu Desnoyers
2026-03-02 15:32                 ` Florian Weimer
2026-03-02 16:32                   ` Mathieu Desnoyers
2026-03-02 16:42                     ` Florian Weimer
2026-03-02 16:56                       ` Mathieu Desnoyers
