From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	libc-alpha@sourceware.org,
	John Ogness <john.ogness@linutronix.de>,
	linux-rt-devel@lists.linux.dev,
	Thomas Gleixner <tglx@linutronix.de>,
	Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH] nptl: Use a PI-aware lock for internal pthread_cond-related locking
Date: Tue, 9 Sep 2025 17:37:58 +0200
Message-ID: <20250909153758.JD5t64rY@linutronix.de>
In-Reply-To: <lhubjnjbyt9.fsf@oldenburg.str.redhat.com>

On 2025-09-09 14:22:26 [+0200], Florian Weimer wrote:
> * Sebastian Andrzej Siewior:
> 
> > On 2025-09-08 14:46:12 [-0300], Adhemerval Zanella Netto wrote:
> >> > This change expects that PI-FUTEX is supported by the kernel. This is
> >> > the case since a long time but it is possible to disable the FUTEX
> >> > subsystem or the FUTEX_PI part of it while building the kernel.
> >> > Is it okay to assume that PI-FUTEX is available or should there be a
> >> > check somewhere during startup and in case of -ENOSYS a fallback to
> >> > current implementation?
> >> 
> >> We removed the __ASSUME_FUTEX_LOCK_PI (f5c77f78ec03363d5e550c4996deb75ee3f2e32a)
> >> in favor of always checking for PI support at runtime during pthread_mutex_init
> >> (prio_inherit_missing).
> >> 
> >> Since the kernel still might return ENOSYS for FUTEX_PI I think we should
> >> keep probing its support at runtime.
> >
> > Okay. Since that one is static, I guess I would have to make my own
> > check in pthread_cond.
> >
> > But… Now that I look at it again. The kernel has this FUTEX_PI option
> > which depends on RT_MUTEXES. But RT_MUTEXES has no off switch so it must
> > always be selected once FUTEX itself is enabled. The I2C subsystem
> > selects RT_MUTEXES and I don't think there is a config without PI-FUTEX
> > considering this.
> > We _used_ to have runtime detection for PI-FUTEX support because not all
> > architectures provided a cmpxchg function for futex. This is gone and
> > all architectures as of v5.17 provide it. That would be commit
> >    3297481d688a5 ("futex: Remove futex_cmpxchg detection")
> >    https://git.kernel.org/torvalds/c/3297481d688a5
> >
> > for reference. So I *think* this config option can be removed on kernel
> > side and it appears to me as of v5.17 there should be no need for a
> > runtime check regarding PI-futex.
> 
> Are there seccomp filters for PI futexes?  I wouldn't be surprised if
> people added them after some of the high-profile futex vulnerabilities.
> I think we should not support such seccomp filters, but we need to know
> what we are up against.

I don't know. But wouldn't you allow a syscall such as sys_futex without
filtering its additional arguments? Unless someone filters the op
argument, it shouldn't be an issue. There was the switch from sys_futex
to sys_futex_time64 on 32-bit architectures, but that is more of a
permanent switch…
And libc doesn't use sys_futex_wake or sys_futex_wait, which would make a
difference in this case. So unless op is checked, it should be fine.

> > There shouldn't be any error. There might be the case where the lock
> > owner is gone (ESRCH I believe) or the theoretical ENOMEM. ESRCH isn't
> > handled now but it can't be recognized either. It would require to kill
> > the thread owning the lock.
> > So either abort the operation if the futex-op returns an error because
> > "this shouldn't happen" or I don't know.
> 
> ENOMEM needs to be reported to the caller because the application may
> want to react to it.

Well. Right now it only checks ESRCH and EDEADLK. Everything else is
considered success.
So do we want to update this + the man-page?
But what should be done in the pthread_cond_.*() functions? I guess we
can't forward that possible -ENOMEM to the caller?

Also, if we are in a good mood: pthread_mutex_lock() has this comment
|                 /* ESRCH can happen only for non-robust PI mutexes where
|                    the owner of the lock died.  */

This is simply not true as far as the kernel goes. If the futex uaddr
contains a pid of a non-existing task then LOCK_PI will return ESRCH. A
simple testcase would be

| #include <stdio.h>
| #include <pthread.h>
|
| static pthread_mutex_t l;
|
| static void *thread_code(void *arg)
| {
|         /* Take the PI mutex and exit without unlocking it. */
|         pthread_mutex_lock(&l);
|         return NULL;
| }
|
| int main(void)
| {
|         pthread_mutexattr_t attr;
|         pthread_t thread;
|         int ret;
|
|         ret = pthread_mutexattr_init(&attr);
|         ret |= pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
|         ret |= pthread_mutex_init(&l, &attr);
|         ret |= pthread_create(&thread, NULL, thread_code, NULL);
|         if (ret) {
|                 printf("->[%d] %d\n", __LINE__, ret);
|                 return 1;
|         }
|         pthread_join(thread, NULL);
|         /* The owner TID stored in the futex word no longer exists. */
|         ret = pthread_mutex_lock(&l);
|         printf("-> %d %m\n", ret);
|
|         return 0;
| }

and strace says:
| futex(0x55afb0ceb080, FUTEX_LOCK_PI_PRIVATE, NULL) = -1 ESRCH (No such process)
| futex(0x7ffe73216f24, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY

The second futex() is the __futex_abstimed_wait64() that follows.

> Long term, we should probably have different interfaces for full locking
> (with expected failures during locking) and simple mutexes (where
> locking can only fail due to memory corruption).
> 
> Unlocking must never fail with ENOMEM or similar error codes, it must
> always succeed (except if there is memory corruption).

UNLOCK_PI. There is no ENOMEM, unlock always succeeds.
Except for pilot errors:
- EFAULT (can't read, uaddr not properly aligned)
- EPERM (we are not owner).
- EAGAIN if userland fiddled with the value while the kernel tried to
  unlock. This _shouldn't_ happen because if the 0->tid transition fails
  userland should go to the kernel. But if it fiddles, EAGAIN will be a
  possible outcome.

> Thanks,
> Florian

Sebastian
