From: Thomas Gleixner <tglx@kernel.org>
To: "André Almeida" <andrealmeid@igalia.com>,
	"Rich Felker" <dalias@aerifal.cx>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	"Carlos O'Donell" <carlos@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Florian Weimer" <fweimer@redhat.com>,
	"Torvald Riegel" <triegel@redhat.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Uros Bizjak" <ubizjak@gmail.com>,
	"Thomas Weißschuh" <linux@weissschuh.net>
Subject: Re: [patch v2 00/11] futex: Address the robust futex unlock race for real
Date: Fri, 27 Mar 2026 11:08:10 +0100
Message-ID: <875x6hd1px.ffs@tglx>
In-Reply-To: <d51aec74-64ee-419f-a880-b9c41e7f1c95@igalia.com>

On Fri, Mar 27 2026 at 00:42, André Almeida wrote:
> On 26/03/2026 19:08, Rich Felker wrote:
>> On Thu, Mar 26, 2026 at 10:59:20PM +0100, Thomas Gleixner wrote:
>>> On Fri, Mar 20 2026 at 00:24, Thomas Gleixner wrote:
>>>> If the functionality itself is agreed on we only need to agree on the names
>>>> and signatures of the functions exposed through the VDSO before we set them
>>>> in stone. That will hopefully not take another 15 years :)
>>>
>>> Have the libc folks any further opinion on the syscall and the vDSO part
>>> before I prepare v3?
>> 
>> This whole conversation has been way too much for me to keep up with,
>> so I'm not sure where it's at right now.
>> 
>> From musl's perspective, the way we make robust mutex unlocking safe
>> right now is by inhibiting munmap/mremap/MAP_FIXED and
>> pthread_mutex_destroy while there are any in-flight robust unlocks. It
>> will be nice to be able to conditionally stop doing that if vdso is
>> available, but I can't see using a fallback that requires a syscall,
>> as that would just be a lot more expensive than what we're doing right
>> now and still not work on older kernels. So I think the only part
>> we're interested in is the fully-userspace approach in vdso.
>> 
>
> You just need the syscall for the contended case (where you would need a
> syscall anyway for a FUTEX_WAKE).
>
> As Thomas wrote in patch 09/11:
>
>    The resulting code sequence for user space is:
>
>    if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid)
>   	err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....);
>
>    Both the VDSO unlock and the kernel side unlock ensure that the 
> pending_op pointer is always cleared when the lock becomes unlocked.
>
>
> So you call the vDSO first. If it fails, it means that the lock is
> contended and you need to call futex(). It will wake a waiter, release
> the lock and clean list_op_pending.

See also the V1 cover letter which has a full deep dive:

     https://lore.kernel.org/20260316162316.356674433@kernel.org

TLDR:

The problem can be split into two issues:

    1) Contended unlock

    2) Uncontended unlock

#1 is solved by moving the unlock into the kernel instead of unlocking
   first and then invoking the syscall to wake waiters. The syscall
   takes the list_pending_op pointer as an argument and after unlocking,
   i.e. *lock = 0, it clears the list_pending_op pointer.

   For this to work, it needs to use try_cmpxchg() like PI unlock does.
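
   To make the ordering in #1 concrete, here is a minimal user-space
   model in C11 atomics. It is a sketch and not the code from the series:
   the function name, the return convention and MODEL_FUTEX_TID_MASK
   (which mirrors the value of FUTEX_TID_MASK from the UAPI header) are
   assumptions made for illustration, and the wakeup is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as FUTEX_TID_MASK in the futex UAPI header. */
#define MODEL_FUTEX_TID_MASK 0x3fffffffu

/*
 * Hypothetical model of the unlock step done on behalf of the owner.
 * Returns 0 on success, -1 if @tid does not own the lock.
 */
static inline int model_kernel_robust_unlock(_Atomic uint32_t *lock,
                                             uint32_t tid,
                                             void **list_pending_op)
{
    uint32_t cur;

    assert(lock && list_pending_op);
    cur = atomic_load_explicit(lock, memory_order_relaxed);

    /* Retry while other bits (e.g. the waiters bit) change under us. */
    do {
        if ((cur & MODEL_FUTEX_TID_MASK) != tid)
            return -1;
    } while (!atomic_compare_exchange_weak_explicit(lock, &cur, 0,
                                                    memory_order_release,
                                                    memory_order_relaxed));

    /* Only after *lock == 0 is the pending op pointer cleared. */
    *list_pending_op = NULL;

    /* A real implementation would wake a waiter at this point. */
    return 0;
}
```

   The point of the model is the order of the two stores: the lock word
   goes to 0 first and the pending op pointer is cleared afterwards, both
   in a context which cannot be interrupted by user space.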

#2 The race is between the successful try_cmpxchg() and the clearing of
   the list_pending_op pointer.

   That's where the VDSO comes into play. Instead of having the
   try_cmpxchg() in the library code the library invokes the VDSO
   provided variant. That allows the kernel to check in the signal
   delivery path whether a successful unlock requires a helping hand to
   clear the list_pending_op pointer. If the interrupted IP is in the
   critical section _and_ the try_cmpxchg() succeeded then the kernel
   clears the pointer.

   In x86 ASM:

   0000000000001590 <__vdso_futex_robust_list64_try_unlock@@LINUX_2.6>:
    1590:  mov    %esi,%eax
    1592:  xor    %ecx,%ecx
    1594:  lock cmpxchg %ecx,(%rdi)    // Result goes into ZF
    1598:  jne    159d               <- CS start    
    159a:  mov    %rcx,(%rdx)          // Clear list_pending_op
    159d:  ret                       <- CS end
    159e:  xchg   %ax,%ax
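
   For readers who prefer C over ASM, the listing behaves roughly like
   the C11 model below. This is only a sketch of the semantics: the
   function name is made up, and plain C cannot replace the real thing
   because the kernel has to know the exact instruction addresses of the
   critical section, which a compiler does not guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical C model of the 64-bit try_unlock variant.
 * Returns @tid if the unlock succeeded, otherwise the value
 * found in the lock word.
 */
static inline uint32_t model_try_unlock(_Atomic uint32_t *lock,
                                        uint32_t tid,
                                        uint64_t *pending_op)
{
    uint32_t expected = tid;            /* mov %esi,%eax */

    assert(lock && pending_op);

    /* lock cmpxchg: if (*lock == tid) *lock = 0; else expected = *lock */
    if (atomic_compare_exchange_strong_explicit(lock, &expected, 0,
                                                memory_order_release,
                                                memory_order_relaxed))
        *pending_op = 0;                /* mov %rcx,(%rdx) */

    return expected;                    /* value left in %eax */
}
```

   A caller compares the return value against its own TID. Anything else
   means the fast path did not unlock and the syscall has to be used.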

   So if the kernel observes

         IP >= CS start && IP < CS end

   then it checks the ZF flag in pt_regs and if set it clears the
   list_pending_op.
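
   The decision can be written down as a small predicate. The structs
   and names below are invented for illustration and are not the kernel's
   pt_regs or the data structures from the series. The only hard fact
   used is that ZF is bit 6 of x86 EFLAGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_EFLAGS_ZF 0x40u   /* bit 6 of x86 EFLAGS */

/* Hypothetical stand-ins for the interrupted register state
 * and the critical section bounds published for the vDSO. */
struct model_regs {
    uint64_t ip;
    uint64_t flags;
};

struct model_cs_range {
    uint64_t start;     /* first instruction after the cmpxchg */
    uint64_t end;       /* first instruction after the store   */
};

/* True if the kernel has to clear the pending op pointer itself. */
static inline bool model_needs_clear(const struct model_regs *regs,
                                     const struct model_cs_range *cs)
{
    assert(cs->start < cs->end);

    /* Outside [start, end): either not unlocked yet or already done. */
    if (regs->ip < cs->start || regs->ip >= cs->end)
        return false;

    /* Inside the range: ZF set means the cmpxchg succeeded. */
    return (regs->flags & MODEL_EFLAGS_ZF) != 0;
}
```

   With the addresses from the listing, start = 0x1598 and end = 0x159d,
   an interrupted IP of 0x159a with ZF set yields true, while IP 0x159d
   yields false because the store has already been executed or skipped.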

Obviously #1 depends on #2 to close all holes.

Thanks,

        tglx
