All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alejandro Colomar <alx@kernel.org>
To: наб <nabijaczleweli@nabijaczleweli.xyz>
Cc: linux-man@vger.kernel.org
Subject: Re: [PATCH v4] futex_waitv.2: new page
Date: Wed, 11 Feb 2026 14:23:13 +0100	[thread overview]
Message-ID: <aYx8Ghyq8E76xx9c@devuan> (raw)
In-Reply-To: <mlooycjkz6pfdvket6f2gwet76aar3zsihtvpm7klt2wl2z3y4@tarta.nabijaczleweli.xyz>

[-- Attachment #1: Type: text/plain, Size: 14529 bytes --]

Hi наб,

On 2026-02-11T05:00:26+0100, наб wrote:
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
>  man/man2/futex_waitv.2 | 426 +++++++++++++++++++++++++++++++++++++++++
>  man/man7/futex.7       |   9 +-
>  2 files changed, 433 insertions(+), 2 deletions(-)
>  create mode 100644 man/man2/futex_waitv.2
> 
> diff --git u/man/man2/futex_waitv.2 p/man/man2/futex_waitv.2
> new file mode 100644
> index 000000000..b05eb08ef
> --- /dev/null
> +++ p/man/man2/futex_waitv.2
> @@ -0,0 +1,426 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: MIT
> +.\"
> +.TH futex_waitv 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +futex_waitv \- wait for FUTEX_WAKE operation on multiple futexes
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <linux/futex.h>" "  /* Definition of " "struct futex_waitv" " */"

The comment should mention the FUTEX_ constants, and not the structure
(which should be documented next to the structure).

> +.BR "#include <sys/syscall.h>" "  /* Definition of " SYS_* " constants */"
> +.B #include <unistd.h>
> +.B #include <time.h>
> +.P
> +.BR "long syscall(" "unsigned int nr_futexes;"

Let's call this just 'n'.  I have been wanting to globally trim
parameter names that are unnecessarily verbose.  Let's not add more.

> +.BI "             SYS_futex_waitv, struct futex_waitv " waiters [ nr_futexes ],
> +.BI "             unsigned int " nr_futexes ", unsigned int " flags ,
> +.BI "             const struct timespec *_Nullable " timeout ", clockid_t " clockid ");"
> +.fi
> +.P
> +.EX

Please include a header for this structure (consider as if the other
headers were not included).

> +struct futex_waitv {
> +    u64 val;        /* Expected value at \f[I]uaddr\f[] */
> +    u64 uaddr;      /* User address to wait on */
> +    u32 flags;      /* Flags for this waiter */
> +    u32 __reserved; /* Align to u64 */
> +};
> +.EE
> +.SH DESCRIPTION
> +.\" This name is used internally in the kernel
> +Implements the FUTEX_WAIT_MULTIPLE operation,
> +analogous to a synchronous atomic parallel
> +.BR FUTEX_WAIT (2const)
> +or
> +.B FUTEX_WAIT_PRIVATE
> +on up to
> +.B FUTEX_WAITV_MAX
> +futex words.
> +For an overview of futexes, see
> +.BR futex (7);
> +for a description of the general interface, see
> +.BR futex (2);
> +for general minutiae of futex waiting, see the page above.
> +.P
> +This operation tests that the values at the
> +futex words pointed to by the addresses
> +.IR waiters []. uaddr
> +still contain respective expected values
> +.IR waiters []. val ,
> +and if so, sleeps waiting for a
> +.BR FUTEX_WAKE (2const)
> +operation on any of the futex words,
> +and returns the index of
> +.I a
> +waiter whose futex was woken.
> +.P
> +If the thread starts to sleep,
> +it is considered a waiter on all given futex words.
> +If any of the futex values do not match their respective
> +.IR waiters []. val ,
> +the call fails immediately with the error
> +.BR EAGAIN .
> +.P
> +If
> +.I timeout
> +is not NULL,
> +.I *timeout
> +specifies a deadline measured against clock
> +.IR clockid .
> +This interval will be rounded up to the system clock granularity,
> +and is guaranteed not to expire early.
> +If
> +.I timeout
> +is NULL, the call blocks indefinitely.
> +.P
> +Futex words to monitor are given by
> +.IR "struct futex_waitv" ,
> +whose fields are analogous to
> +.BR FUTEX_WAIT (2const)
> +parameters, except
> +.I __reserved

Please use .__reserved syntax for struct members (also with other
members).

> +must be 0
> +and
> +.I flags
> +must contain one of
> +.BR FUTEX2_SIZE_*
> +ORed with some of the flags below.

I expect the flags should start immediately after 'below'.

> +.P
> +C programs should assign to
> +.I uaddr
> +by casting a pointer to
> +.B uintptr_t
> +to ensure the top bits are cleared on 32-bit systems.

I'm not sure the paragraph above is correct.
What code are you worried about exactly?  Could you show an example?

> +.TP
> +.BR FUTEX2_SIZE_U32
> +.I val
> +and
> +.I *uaddr
> +are 32-bit unsigned integers.
> +.TP
> +.B FUTEX2_NUMA
> +The futex word is followed by another word of the same size
> +.RI ( uaddr
> +points to
> +.IR uint N _t[2]
> +rather than
> +.IR uint N _t .
> +The word is given by
> +.IR uaddr[1] ),
> +which can be either
> +.B FUTEX_NO_NODE
> +(all bits set)
> +or a NUMA node number.
> +.IP
> +If the NUMA word is
> +.BR FUTEX_NO_NODE ,
> +the node number of the processor the syscall executes on is written to it.
> +(Except in an
> +.B EINVAL
> +or
> +.B EFAULT
> +condition, this happens to all waiters whose
> +.I flags
> +have
> +.B FUTEX2_NUMA
> +set.)
> +.IP
> +Futexes are placed on the NUMA node given by the NUMA word.
> +Futexes without this flag are placed on a random node.
> +.\" commit cec199c5e39bde7191a08087cc3d002ccfab31ff
> +.\" Author: Peter Zijlstra <peterz@infradead.org>
> +.\" Date:   Wed Apr 16 18:29:16 2025 +0200
> +.\"
> +.\"     futex: Implement FUTEX2_NUMA
> +.\"
> +.\" FUTEX2_MPOL is not documented or used anywhere;
> +.\" it's unclear to me what it does
> +.\" (defined in commit c042c505210dc3453f378df432c10fff3d471bc5
> +.\"  "futex: Implement FUTEX2_MPOL")
> +.TP
> +.B FUTEX2_PRIVATE
> +By default, the futex is shared
> +.RB "(like " FUTEX_WAIT (2const)),
> +and can be accessed by multiple processes;
> +this flag waits on a private futex word,
> +where all users must use the same virtual memory map
> +(like
> +.BR FUTEX_WAIT_PRIVATE ;
> +this most often means they are part of the same process).
> +Private futexes are faster than shared ones.
> +.\"
> +.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
> +.\"
> +.SH RETURN VALUE
> +Returns an index to an arbitrary entry in
> +.I waiters
> +corresponding to some woken-up futex.
> +This implies no information about other waiters.
> +.P
> +On error,
> +\-1 is returned,
> +and
> +.I errno
> +is set to indicate the error.
> +.\"
> +.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
> +.\"
> +.SH ERRORS
> +.TP
> +.B EFAULT
> +.I waiters
> +points outside the accessible address space.
> +.TP
> +.B EFAULT
> +.I timeout
> +was not NULL and did not point to a valid user-space address.
> +.TP
> +.B EFAULT
> +Any
> +.IR waiters []. uaddr
> +field is not a valid user-space address.

Please homogenize the wording of this and the first EFAULT.
(I don't have a preference for which language is best).

> +.TP
> +.B EINVAL
> +Any
> +.IR waiters []. uaddr
> +field does not point to a valid object\[em]that is,
> +the address is not aligned appropriately for the specified
> +.BR FUTEX2_SIZE_* .
> +.TP
> +.B EINVAL
> +.I flags
> +was not 0.
> +.TP
> +.B EINVAL
> +.I nr_futexes
> +was not in
> +[1,
> +.B FUTEX_WAITV_MAX
> +(128)].
> +.TP
> +.B EINVAL
> +.I timeout
> +was not NULL and
> +.I clockid
> +was not a valid clock
> +.RB ( CLOCK_MONOTONIC
> +or
> +.BR CLOCK_REALTIME ).
> +.TP
> +.B EINVAL
> +.I *timeout
> +is denormal (before epoch or
> +.I tv_nsec
> +more than 999'999'999).

' should actually be \[aq].

> +.TP
> +.B EINVAL
> +Any
> +.IR waiters []. flags
> +field contains an unknown flag.
> +.TP
> +.B EINVAL
> +Any
> +.IR waiters []. flags
> +field is missing a
> +.B FUTEX2_SIZE_*
> +flag or has a size flag different than
> +.BR FUTEX2_SIZE_U32
> +set.
> +.TP
> +.B EINVAL
> +Any
> +.IR waiters []. __reserved
> +field is not 0.
> +.TP
> +.B EINVAL
> +Any
> +.IR waiters []. value
> +field has more bits set than permitted than the size flags.
> +.TP
> +.B EINVAL
> +.B FUTEX2_NUMA
> +was set in
> +.IR waiters []. flags ,
> +and the NUMA word
> +(which is the same size as the futex word)
> +is too small to contain the index of the biggest NUMA domain
> +(for example,
> +.B FUTEX2_SIZE_U8
> +and there are more than 255 NUMA domains).
> +.TP
> +.B EINVAL
> +.B FUTEX2_NUMA
> +was set in
> +.IR waiters []. flags ,
> +and the NUMA word is larger than the maximum possible NUMA node and not
> +.BR FUTEX_NO_NODE .
> +.TP
> +.B ETIMEDOUT
> +.I timeout
> +was not NULL and no futex was woken before the timeout elapsed.
> +.TP
> +.BR EAGAIN " or " EWOULDBLOCK
> +The value pointed to by
> +.I uaddr
> +was not equal to the expected value
> +.I val
> +at the time of the call.
> +.TP
> +.B EINTR
> +The
> +operation was interrupted by a signal (see
> +.BR signal (7)).
> +.\"
> +.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
> +.\"
> +.SH STANDARDS
> +Linux.
> +.SH NOTES
> +.BR FUTEX2_SIZE_U8 ,
> +.BR FUTEX2_SIZE_U16 ,
> +and
> +.B FUTEX2_SIZE_U64
> +where
> +.I val
> +and
> +.I *uaddr
> +are 8, 16, or 64 bits are defined, but not implemented
> +.RB ( EINVAL ).
> +.SH HISTORY
> +.\" commit bf69bad38cf63d980e8a603f8d1bd1f85b5ed3d9
> +.\" Author: André Almeida <andrealmeid@igalia.com>
> +.\" Date:   Thu Sep 23 14:11:05 2021 -0300
> +.\"
> +.\"     futex: Implement sys_futex_waitv()
> +Linux 5.16.
> +.SH EXAMPLES
> +The program below executes a linear-time operation on 10 threads,
> +displaying the results in real time,
> +waiting at most 1 second for each new result.
> +The first 3 threads operate on the same data (complete in the same time).
> +.B !\&
> +indicates the futex that woke up each
> +.BR futex_waitv ().
> +.in +4
> +.EX
> +.RB $\~ ./futex_waitv
> +153	153	153	237	100	245	177	127	215	61
> +									122!
> +				200!
> +							254!
> +306	306!
> +		306!
> +						354!
> +								430!
> +			474!
> +					490!
> +Connection timed out
> +.EE
> +.P
> +.\" SRC BEGIN (futex_waitv.c)
> +.EX
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <linux/futex.h>
> +#include <pthread.h>
> +#include <stdatomic.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/syscall.h>
> +#include <time.h>
> +#include <unistd.h>
> +\&
> +static inline long
> +my_futex_wait_private(_Atomic uint32_t  *uaddr, uint32_t  val,

Should this be atomic?  If so, should we document futex(2) as taking
a pointer to atomic element?

> +                      const struct timespec *_Nullable  timeout)

The two spaces are for variables, but not for function parameters.

Also, my bad: this shouldn't have _Nullable.  We don't use it in
actual code (because Clang's _Nullable is really broken, and only serves
documentation purposes).  Actually, Clang is still sub-par in many ways;
it's a C++ compiler, after all, where the C part is just a side-effect.

> +{
> +	return syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE, val, timeout);
> +}
> +\&
> +static inline long
> +my_futex_waitv(struct futex_waitv  *waiters, unsigned int  nr_futexes,
> +               unsigned int  flags, const struct timespec *_Nullable  timeout,
> +               clockid_t  clockid)
> +{
> +	return syscall(SYS_futex_waitv, waiters, nr_futexes, flags, timeout, clockid);
> +}
> +\&
> +void *
> +worker(void  *arg)
> +{
> +	_Atomic uint32_t  *futex = arg;
> +\&
> +	usleep(*futex * 10000);
> +	*futex *= 2;
> +	my_futex_wait_private(futex, 1, NULL);
> +	return NULL;
> +}
> +\&
> +int
> +main(void)
> +{
> +	_Atomic uint32_t  futexes[10];
> +	uint8_t  init[countof(futexes)];
> +	struct futex_waitv waiters[countof(futexes)] = {};
> +	int  i;
> +\&
> +	getentropy(init, sizeof(init));
> +	init[0] = init[1] = init[2];
> +	for (i = 0; i < countof(futexes); ++i) {
> +		printf("%" PRIu8 "\[rs]t", init[i]);
> +		atomic_init(&futexes[i], init[i]);
> +		pthread_create(&(pthread_t){}, NULL, worker, &futexes[i]);
> +	}
> +	putchar('\[rs]n');
> +\&
> +	for (i = 0; i < countof(futexes); ++i) {
> +		waiters[i].val   = futexes[i];
> +		waiters[i].uaddr = (uintptr_t)&futexes[i];
> +		waiters[i].flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE;
> +	}
> +	for (;;) {
> +		struct timespec  timeout;
> +		int  woke;
> +\&
> +		clock_gettime(CLOCK_MONOTONIC, &timeout);
> +		timeout.tv_sec += 1;
> +\&
> +		woke = my_futex_waitv(waiters, countof(futexes), 0, &timeout, CLOCK_MONOTONIC);
> +		if (woke == -1 && (errno != EAGAIN && errno != EWOULDBLOCK))
> +			break;
> +\&
> +		for (i = 0; i < countof(futexes); ++i) {
> +			if (futexes[i] != waiters[i].val)
> +				printf("%" PRIu32 "%s", futexes[i], i == woke ? "!" : "");
> +			putchar('\[rs]t');
> +		}
> +		putchar('\[rs]n');
> +\&
> +		for (i = 0; i < countof(futexes); ++i)
> +			waiters[i].val = futexes[i];
> +	}
> +	fprintf(stderr, "%s\[rs]n", strerror(errno));
> +}
> +.EE
> +.\" SRC END
> +.SH SEE ALSO
> +.ad l

What's the effect of this ad l, and why do we want it?

> +.BR futex (2),
> +.BR FUTEX_WAIT (2const),
> +.BR FUTEX_WAKE (2const),
> +.BR futex (7)
> +.P
> +The following kernel source files:
> +.IP \[bu]
> +.I Documentation/userspace-api/futex2.rst

I'd remove the below, and only point to the rst.


Have a lovely day!
Alex

> +.IP \[bu]
> +.I kernel/futex/syscall.c
> +.IP \[bu]
> +.I kernel/futex/waitwake.c
> +.IP \[bu]
> +.I kernel/futex/futex.h
> diff --git u/man/man7/futex.7 p/man/man7/futex.7
> index 51c5d5d9b..d271144ff 100644
> --- u/man/man7/futex.7
> +++ p/man/man7/futex.7
> @@ -45,7 +45,9 @@ .SS Semantics
>  Any futex operation starts in user space,
>  but it may be necessary to communicate with the kernel using the
>  .BR futex (2)
> -system call.
> +or
> +.BR futex_waitv (2)
> +system calls.
>  .P
>  To "up" a futex, execute the proper assembler instructions that
>  will cause the host CPU to atomically increment the integer.
> @@ -72,7 +74,9 @@ .SS Semantics
>  .P
>  The
>  .BR futex (2)
> -system call can optionally be passed a timeout specifying how long
> +and
> +.BR futex_waitv (2)
> +system calls can optionally be passed a timeout specifying how long
>  the kernel should
>  wait for the futex to be upped.
>  In this case, semantics are more complex and the programmer is referred
> @@ -107,6 +111,7 @@ .SH NOTES
>  .SH SEE ALSO
>  .BR clone (2),
>  .BR futex (2),
> +.BR futex_waitv (2),
>  .BR get_robust_list (2),
>  .BR set_robust_list (2),
>  .BR set_tid_address (2),
> -- 
> 2.39.5



-- 
<https://www.alejandro-colomar.es>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2026-02-11 13:23 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-07 12:49 [PATCH] futex_waitv.2: new page наб
2026-02-07 18:57 ` Alejandro Colomar
2026-02-07 19:16   ` наб
2026-02-07 21:50     ` Alejandro Colomar
2026-02-07 22:00 ` [PATCH v2] " наб
2026-02-09 22:35   ` Alejandro Colomar
2026-02-10 14:17     ` наб
2026-02-10 14:30       ` Alejandro Colomar
2026-02-10 15:54         ` Kristoffer Haugsbakk
2026-02-10 18:39           ` Alejandro Colomar
2026-02-11  7:35           ` Jeff King
2026-02-11  8:15             ` Kristoffer Haugsbakk
2026-02-11 15:43             ` Junio C Hamano
2026-02-10 16:54         ` Junio C Hamano
2026-02-10 17:11           ` Kristoffer Haugsbakk
2026-02-10 18:44           ` Alejandro Colomar
2026-02-10 20:05   ` Alejandro Colomar
2026-02-10 20:32     ` [PATCH v3] " наб
2026-02-10 21:11       ` Alejandro Colomar
2026-02-11  4:00         ` [PATCH v4] " наб
2026-02-11 13:23           ` Alejandro Colomar [this message]
2026-02-11 13:51             ` [PATCH v5] " наб
2026-02-11 14:15               ` [PATCH v6] " наб
2026-02-11 14:31                 ` Alejandro Colomar
2026-02-11 14:44                   ` [PATCH v7] " наб
2026-02-11 14:55                     ` Alejandro Colomar
2026-02-11 14:59                       ` наб
2026-02-11 15:13                         ` Alejandro Colomar
2026-02-14 17:32                     ` Alejandro Colomar
2026-02-14 19:30                       ` [PATCH v8] " наб
2026-02-14 20:03                         ` Alejandro Colomar
2026-02-14 20:48                           ` [PATCH v9] " наб
2026-02-15 18:18                             ` Alejandro Colomar
2026-02-15 19:00                               ` [PATCH v10] " наб
2026-02-16  0:32                                 ` Alejandro Colomar
2026-02-16 14:20                                   ` [PATCH v11] " наб
2026-02-16 14:50                                     ` Alejandro Colomar
2026-02-16 20:43                                       ` [PATCH v12] " наб
2026-02-17 13:07                                         ` Alejandro Colomar
2026-02-17 14:31                                           ` [PATCH v13] " наб
2026-02-17 15:46                                             ` Alejandro Colomar
2026-02-17 16:17                                               ` наб
2026-02-18  0:31                                                 ` Alejandro Colomar
2026-02-11 14:28               ` [PATCH v5] " Alejandro Colomar
2026-02-18  0:41 ` [PATCH v1 0/1] futex_waitv.2: Move text to a new PARAMETERS section Alejandro Colomar
2026-02-18  0:41   ` [PATCH v1 1/1] man/man2/futex_waitv.2: " Alejandro Colomar
2026-02-18 20:16   ` [PATCH v1 0/1] futex_waitv.2: " наб
2026-02-18 20:26     ` Alejandro Colomar
2026-02-18 20:30       ` наб
2026-02-18 20:33         ` Alejandro Colomar
2026-02-18 20:40         ` G. Branden Robinson
2026-02-18 21:28           ` Alejandro Colomar
2026-02-18 22:04             ` Alejandro Colomar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYx8Ghyq8E76xx9c@devuan \
    --to=alx@kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=nabijaczleweli@nabijaczleweli.xyz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.