linux-rt-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	libc-alpha@sourceware.org
Cc: John Ogness <john.ogness@linutronix.de>,
	linux-rt-devel@lists.linux.dev,
	Thomas Gleixner <tglx@linutronix.de>,
	Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH] nptl: Use a PI-aware lock for internal pthread_cond-related locking
Date: Mon, 8 Sep 2025 14:46:12 -0300	[thread overview]
Message-ID: <4c9c86ae-a432-47fe-bede-7f98f4bb1f7c@linaro.org> (raw)
In-Reply-To: <20250908163329.Qq79ujq_@linutronix.de>



On 08/09/25 13:33, Sebastian Andrzej Siewior wrote:
> The pthread_cond_t structure provides an internal lock using the lower two bits
> of __g1_orig_size. This lock is always used by the signaller. The waiter uses
> it only in certain corner cases such as when canceling a wait due to a timeout.
> 
> Under contention, a thread attempting to acquire the lock invokes the futex
> operation FUTEX_WAIT while waiting for the lock to become available. This
> behavior can be problematic in real-time environments because the priority of
> the waiting task is not considered. As a result, a lower-priority task may
> become runnable while the lock owner remains preempted.
> 
> To address this, repurpose the unused member __unused_initialized_1 as a
> priority-inheritance (PI) futex named __pi_lock. This is the futex word, not a
> pthread_mutex_t.
> 
> __condvar_acquire_lock() performs an atomic transition from 0 to TID. If it
> fails, it falls back to __futex_lock_pi64(), which acquires the lock via
> FUTEX_LOCK_PI.
> __condvar_release_lock() performs an atomic transition from TID to 0. If it
> fails, it falls back to futex_unlock_pi(), which releases the lock and
> wakes a potential waiter.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> 
> This is a follow-up to
> 	https://lore.kernel.org/all/20250829145659.KhM4kIoQ@linutronix.de/
> 
> This change expects that PI-FUTEX is supported by the kernel. This is
> the case since a long time but it is possible to disable the FUTEX
> subsystem or the FUTEX_PI part of it while building the kernel.
> Is it okay to assume that PI-FUTEX is available or should there be a
> check somewhere during startup and in case of -ENOSYS a fallback to
> current implementation?

We removed the __ASSUME_FUTEX_LOCK_PI (f5c77f78ec03363d5e550c4996deb75ee3f2e32a)
in favor or always check for PI support at runtime during pthread_mutex_init
(prio_inherit_missing).

Since the kernel still might return ENOSYS for FUTEX_PI I think we should
keep probing its support as runtime.
> 
>  nptl/pthread_cond_common.c              | 58 ++++++++++---------------
>  sysdeps/nptl/bits/thread-shared-types.h |  2 +-
>  2 files changed, 24 insertions(+), 36 deletions(-)
> 
> diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
> index 2708d262958f2..74010787b18de 100644
> --- a/nptl/pthread_cond_common.c
> +++ b/nptl/pthread_cond_common.c
> @@ -106,41 +106,23 @@ __condvar_fetch_xor_wseq_release (pthread_cond_t *cond, unsigned int val)
>  #endif /* !__HAVE_64B_ATOMICS */
>  
>  /* The lock that signalers use.  See pthread_cond_wait_common for uses.
> -   The lock is our normal three-state lock: not acquired (0) / acquired (1) /
> -   acquired-with-futex_wake-request (2).  However, we need to preserve the
> -   other bits in the unsigned int used for the lock, and therefore it is a
> -   little more complex.  */
> +   The lock is the futex PI lock (PTHREAD_MUTEX_PI_NORMAL_NP) using just the
> +   futex word. */
>  static void __attribute__ ((unused))
>  __condvar_acquire_lock (pthread_cond_t *cond, int private)
>  {
> -  unsigned int s = atomic_load_relaxed (&cond->__data.__g1_orig_size);
> -  while ((s & 3) == 0)
> +  int oldval, newval;
> +
> +  newval = THREAD_GETMEM (THREAD_SELF, tid);
> +  oldval = atomic_compare_and_exchange_val_acq (&cond->__data.__pi_lock,
> +                                                newval, 0);
> +  if (oldval != 0)
>      {
> -      if (atomic_compare_exchange_weak_acquire (&cond->__data.__g1_orig_size,
> -	  &s, s | 1))
> -	return;
> -      /* TODO Spinning and back-off.  */
> -    }
> -  /* We can't change from not acquired to acquired, so try to change to
> -     acquired-with-futex-wake-request and do a futex wait if we cannot change
> -     from not acquired.  */
> -  while (1)
> -    {
> -      while ((s & 3) != 2)
> -	{
> -	  if (atomic_compare_exchange_weak_acquire
> -	      (&cond->__data.__g1_orig_size, &s, (s & ~(unsigned int) 3) | 2))
> -	    {
> -	      if ((s & 3) == 0)
> -		return;
> -	      break;
> -	    }
> -	  /* TODO Back off.  */
> -	}
> -      futex_wait_simple (&cond->__data.__g1_orig_size,
> -	  (s & ~(unsigned int) 3) | 2, private);
> -      /* Reload so we see a recent value.  */
> -      s = atomic_load_relaxed (&cond->__data.__g1_orig_size);
> +      int e;
> +
> +      e = __futex_lock_pi64 (&cond->__data.__pi_lock, 0, NULL, private);
> +      if (e != 0)
> +        futex_fatal_error ();

The __futex_lock_pi64 already issues futex_fatal_error() non-expected return code;
so I think we should only use it futex-internal.{c,h}.

And I think we will need to handle ENOSYS here, maybe to use the old locking logic
if the kernel does not support FUTEX_PI. I think it should be fair to assume that
if your running your workload on an environment without FUTEX_PI support you are
not subject to the issue you raised.

I am ccing Carlos, he has helped fix some issue on cond vars recently and he
might have some idea if using PI aware locks does make sense here.

Also, I think we really need regression testcases for this change.

>      }
>  }
>  
> @@ -148,10 +130,16 @@ __condvar_acquire_lock (pthread_cond_t *cond, int private)
>  static void __attribute__ ((unused))
>  __condvar_release_lock (pthread_cond_t *cond, int private)
>  {
> -  if ((atomic_fetch_and_release (&cond->__data.__g1_orig_size,
> -				 ~(unsigned int) 3) & 3)
> -      == 2)
> -    futex_wake (&cond->__data.__g1_orig_size, 1, private);
> +  pid_t id = THREAD_GETMEM (THREAD_SELF, tid);
> +
> +  if (!atomic_compare_exchange_weak_release (&cond->__data.__pi_lock, &id, 0))
> +    {
> +      int e;
> +
> +      e = futex_unlock_pi ((unsigned int *) &cond->__data.__pi_lock, private);
> +      if (e != 0)
> +	futex_fatal_error ();
> +    }
>  }
>  
>  /* Only use this when having acquired the lock.  */
> diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h
> index e614c7f3c900e..9b362d5282cd9 100644
> --- a/sysdeps/nptl/bits/thread-shared-types.h
> +++ b/sysdeps/nptl/bits/thread-shared-types.h
> @@ -99,7 +99,7 @@ struct __pthread_cond_s
>    unsigned int __g1_orig_size;
>    unsigned int __wrefs;
>    unsigned int __g_signals[2];
> -  unsigned int __unused_initialized_1;
> +  int __pi_lock;
>    unsigned int __unused_initialized_2;
>  };
>  


  reply	other threads:[~2025-09-08 17:46 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-08 16:33 [PATCH] nptl: Use a PI-aware lock for internal pthread_cond-related locking Sebastian Andrzej Siewior
2025-09-08 17:46 ` Adhemerval Zanella Netto [this message]
2025-09-09  7:32   ` Sebastian Andrzej Siewior
2025-09-09 12:22     ` Florian Weimer
2025-09-09 15:37       ` Sebastian Andrzej Siewior
2025-09-09 16:32         ` Florian Weimer
2025-09-09 19:14           ` Adhemerval Zanella Netto
2025-09-10  5:57             ` Florian Weimer
2025-09-09 13:09     ` Adhemerval Zanella Netto
2025-09-09 15:52       ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4c9c86ae-a432-47fe-bede-7f98f4bb1f7c@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=bigeasy@linutronix.de \
    --cc=carlos@redhat.com \
    --cc=john.ogness@linutronix.de \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).