Hi наб, (I think we're close to merging.) On 2026-02-15T20:00:50+0100, наб wrote: > Signed-off-by: Ahelenia Ziemiańska > --- [...] > --- /dev/null > +++ p/man/man2/futex_waitv.2 > @@ -0,0 +1,420 @@ > +.\" Copyright, the authors of the Linux man-pages project > +.\" > +.\" SPDX-License-Identifier: MIT > +.\" > +.TH futex_waitv 2 (date) "Linux man-pages (unreleased)" > +.SH NAME > +futex_waitv \- wait for FUTEX_WAKE operation on multiple futexes > +.SH LIBRARY > +Standard C library > +.RI ( libc ,\~ \-lc ) > +.SH SYNOPSIS > +.nf > +.BR "#include " " /* Definition of " FUTEX* " constants */" > +.BR "#include " " /* Definition of " SYS_* " constants */" > +.B #include > +.B #include > +.P > +.BR "long syscall(" "unsigned int n;" > +.BI " SYS_futex_waitv, struct futex_waitv " waiters [ n ], > +.BI " unsigned int " n ", unsigned int " flags , > +.BI " const struct timespec *_Nullable " timeout ", clockid_t " clockid ");" > +.fi > +.P > +.EX > +.B "#include " > +.P > +struct futex_waitv { > + u64 val; /* Expected value at \f[I]uaddr\f[] */ Should we say at .uaddr[0] to be more precise? > + u64 uaddr; /* User address to wait on */ > + u32 flags; /* Flags for this waiter */ > + u32 __reserved; /* Align to u64 */ > +}; > +.EE > +.SH DESCRIPTION > +.\" This name is used internally in the kernel > +Implements the FUTEX_WAIT_MULTIPLE operation, > +analogous to a synchronous atomic parallel > +.BR FUTEX_WAIT (2const) > +or > +.B FUTEX_WAIT_PRIVATE > +on up to > +.B FUTEX_WAITV_MAX > +futex words. > +For an overview of futexes, see > +.BR futex (7); > +for a description of the general interface, see > +.BR futex (2); > +for general minutiae of futex waiting, see the page above. > +.P > +This operation tests that the values at the > +futex words pointed to by the addresses > +.IR waiters []. uaddr Should we maybe say?: futex words .IR waiters []. uaddr [0] I'm fine with either; as you prefer. But this is more specific, which might help readers a bit. > +still contain respective expected values > +.IR waiters []. val , > +and if so, sleeps waiting for a > +.BR FUTEX_WAKE (2const) > +operation on any of the futex words, > +and returns the index of > +.I a > +waiter whose futex was woken. > +.P > +If the thread starts to sleep, > +it is considered a waiter on all given futex words. > +If any of the futex values do not match their respective > +.IR waiters []. val , > +the call fails immediately with the error > +.BR EAGAIN . > +.P > +If > +.I timeout > +is not NULL, > +.I *timeout > +specifies a deadline measured against clock > +.IR clockid . > +This interval will be rounded up to the system clock granularity, > +and is guaranteed not to expire early. > +If > +.I timeout > +is NULL, > +the call blocks indefinitely. I would reorder the sentences in this paragraph, to simplify: If .I timeout is NULL, the call blocks indefinitely. Otherwise, .I *timeout specifies a deadline measured against clock .IR clockid . This interval ... This reduces the number of occurences of NULL, which helps find its meaning. > +.P > +Futex words to monitor are given by > +.IR "struct futex_waitv" , > +whose fields are analogous to > +.BR FUTEX_WAIT (2const) > +parameters, except > +.I .__reserved > +must be 0 > +and > +.I .flags > +must contain one of > +.BI FUTEX2_SIZE_ * > +ORed with some of the flags below. > +.TP > +.B FUTEX2_SIZE_U32 > +.I .val > +and > +.I .uaddr[] > +are 32-bit unsigned integers. > +.TP > +.B FUTEX2_NUMA > +The futex word is followed by another word of the same size > +.RI ( .uaddr > +points to > +.IR uint N _t[2] > +rather than > +.IR uint N _t . > +The word is given by > +.IR .uaddr[1] ), > +which can be either > +.B FUTEX_NO_NODE > +(all bits set) > +or a NUMA node number. > +.IP > +If the NUMA word is > +.BR FUTEX_NO_NODE , > +the node number of the processor the syscall executes on is written to it. > +(Except in an Maybe 'Except that' would be easier to read? > +.B EINVAL > +or > +.B EFAULT > +condition, this happens to all waiters whose > +.I .flags > +have > +.B FUTEX2_NUMA > +set.) > +.IP > +Futexes are placed on the NUMA node given by the NUMA word. > +Futexes without this flag are placed on a random node. > +.\" commit cec199c5e39bde7191a08087cc3d002ccfab31ff Please refer to kernel commits in a single line: .\" linux.git cec199c5e39b (2025-05-03; "futex: Implement FUTEX2_NUMA") The format is documented in and there's a git alias to produce them documented in Unless for some reason the commit message is really relevant. > +.\" Author: Peter Zijlstra > +.\" Date: Wed Apr 16 18:29:16 2025 +0200 > +.\" > +.\" futex: Implement FUTEX2_NUMA > +.\" > +.\" FUTEX2_MPOL is not documented or used anywhere; > +.\" it's unclear to me what it does > +.\" (defined in commit c042c505210dc3453f378df432c10fff3d471bc5 > +.\" "futex: Implement FUTEX2_MPOL") > +.TP > +.B FUTEX2_PRIVATE > +By default, the futex is shared > +.RB "(like " FUTEX_WAIT (2const)), '(like' is not part of an identifier, nor punctuation; thus, it's better to have it in a separate line ... > +and can be accessed by multiple processes; > +this flag waits on a private futex word, > +where all users must use the same virtual memory map > +(like ... like here. > +.BR FUTEX_WAIT_PRIVATE ; > +this most often means they are part of the same process). > +Private futexes are faster than shared ones. > +.P > +Programs should assign to > +.I .uaddr > +by casting a pointer to > +.BR uintptr_t . Types should go in italics, not bold. > +.\" > +.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" > +.\" > +.SH RETURN VALUE > +Returns an index to an arbitrary entry in > +.I waiters > +corresponding to some woken-up futex. > +This implies no information about other waiters. > +.P > +On error, > +\-1 is returned, > +and > +.I errno > +is set to indicate the error. > +.\" > +.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" > +.\" > +.SH ERRORS > +.TP > +.B EFAULT > +.I waiters > +points outside the accessible address space. > +.TP > +.B EFAULT > +.I timeout > +is not NULL and points outside the accessible address space. > +.TP > +.B EFAULT > +Any > +.IR waiters []. uaddr > +field points outside the accessible address space. > +.TP > +.B EINVAL > +Any > +.IR waiters []. uaddr > +field does not point to a valid object\[em]that is, > +the address is not aligned appropriately for the specified > +.BI FUTEX2_SIZE_ * . > +.TP > +.B EINVAL > +.I flags > +was not 0. > +.TP > +.B EINVAL > +.I n > +was not in the range > +.RB [ 1 , > +.IR FUTEX_WAITV_MAX ]. This is not an expression anymore, but just an macro, so it should be in bold. > +.TP > +.B EINVAL > +.I timeout > +was not NULL and > +.I clockid > +was not a valid clock > +.RB ( CLOCK_MONOTONIC > +or > +.BR CLOCK_REALTIME ). > +.TP > +.B EINVAL > +.I *timeout > +is denormal (before epoch or > +.I tv_nsec > +more than 999\[aq]999\[aq]999). > +.TP > +.B EINVAL > +Any > +.IR waiters []. flags > +field contains an unknown flag. > +.TP > +.B EINVAL > +Any > +.IR waiters []. flags > +field is missing a > +.BI FUTEX2_SIZE_ * > +flag or has a size flag different than > +.B FUTEX2_SIZE_U32 > +set. > +.TP > +.B EINVAL > +Any > +.IR waiters []. __reserved > +field is not 0. > +.TP > +.B EINVAL > +Any > +.IR waiters []. value > +field has more bits set than permitted than the size flags. > +.TP > +.B EINVAL > +.B FUTEX2_NUMA > +was set in > +.IR waiters []. flags , > +and the NUMA word > +(which is the same size as the futex word) > +is too small to contain the index of the biggest NUMA domain Is biggest the best word here? I don't have a better idea, but please have a look at this, just in case. > +(for example, > +.B FUTEX2_SIZE_U8 > +and there are at least 255 possible NUMA domains). > +.TP > +.B EINVAL > +.B FUTEX2_NUMA > +was set in > +.IR waiters []. flags , > +and the NUMA word is larger than the maximum possible NUMA node and not > +.BR FUTEX_NO_NODE . > +.TP > +.B ETIMEDOUT > +.I timeout > +was not NULL and no futex was woken before the timeout elapsed. I think this might be simpler and more readable as The timeout elapsed and no futex was woken. If it elapsed, it's obvious that it wasn't NULL. It could also be written as No futex was woken before the timeout elapsed. I find it slightly more readable in the first form, but if you differ feel free to pick the second form. > +.TP > +.BR EAGAIN " or " EWOULDBLOCK > +The value pointed to by > +.I .uaddr Should we say directly 'The value in .uaddr[0]'? > +was not equal to the expected value > +.I .val > +at the time of the call. > +.TP > +.B EINTR > +The > +operation was interrupted by a signal (see > +.BR signal (7)). Cheers, Alex > +.\" > +.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" > +.\" > +.SH STANDARDS > +Linux. > +.SH NOTES > +.BR FUTEX2_SIZE_U8 , > +.BR FUTEX2_SIZE_U16 , > +and > +.B FUTEX2_SIZE_U64 > +where > +.I .val > +and > +.I .uaddr[] > +are 8, 16, or 64 bits are defined, but not implemented > +.RB ( EINVAL ). > +.SH HISTORY > +.\" commit bf69bad38cf63d980e8a603f8d1bd1f85b5ed3d9 > +.\" Author: André Almeida > +.\" Date: Thu Sep 23 14:11:05 2021 -0300 > +.\" > +.\" futex: Implement sys_futex_waitv() > +Linux 5.16. > +.SH EXAMPLES > +The program below executes a linear-time operation on 10 threads, > +displaying the results in real time, > +waiting at most 1 second for each new result. > +The first 3 threads operate on the same data (complete in the same time). > +.B !\& > +indicates the futex that woke up each > +.BR futex_waitv (). > +.in +4 > +.EX > +.RB $\~ ./futex_waitv > +153 153 153 237 100 245 177 127 215 61 > + 122! > + 200! > + 254! > +306 306! > + 306! > + 354! > + 430! > + 474! > + 490! > +futex_waitv: my_futex_waitv: Connection timed out > +.EE > +.P > +.\" SRC BEGIN (futex_waitv.c) > +.EX > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +\& > +static inline long > +my_futex_wake_private(_Atomic uint32_t *uaddr, uint32_t val) > +{ > + return syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE, val); > +} > +\& > +static inline long > +my_futex_waitv(unsigned int n; > + struct futex_waitv waiters[n], unsigned int n, > + unsigned int flags, const struct timespec *timeout, > + clockid_t clockid) > +{ > + return syscall(SYS_futex_waitv, waiters, n, flags, timeout, clockid); > +} > +\& > +void * > +worker(void *arg) > +{ > + _Atomic uint32_t *futex = arg; > +\& > + usleep(*futex * 10000); > + *futex *= 2; > + my_futex_wait_private(futex, 1); > + return NULL; > +} > +\& > +int > +main(void) > +{ > + _Atomic uint32_t futexes[10]; > + uint8_t init[countof(futexes)]; > + struct futex_waitv waiters[countof(futexes)] = {}; > + int i; > +\& > + if (getentropy(init, sizeof(init))) > + err(EXIT_FAILURE, "getentropy"); > + init[0] = init[1] = init[2]; > + for (i = 0; i < countof(futexes); ++i) { > + printf("%w8u\[rs]t", init[i]); > + atomic_init(&futexes[i], init[i]); > + pthread_create(&(pthread_t){}, NULL, worker, &futexes[i]); > + } > + putchar(\[aq]\[rs]n\[aq]); > +\& > + for (i = 0; i < countof(futexes); ++i) { > + waiters[i].val = futexes[i]; > + waiters[i].uaddr = (uintptr_t) &futexes[i]; > + waiters[i].flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE; > + } > + for (;;) { > + struct timespec timeout; > + int woke; > +\& > + clock_gettime(CLOCK_MONOTONIC, &timeout); > + timeout.tv_sec += 1; > +\& > + woke = my_futex_waitv(waiters, countof(futexes), 0, &timeout, CLOCK_MONOTONIC); > + if (woke == \-1 && (errno != EAGAIN && errno != EWOULDBLOCK)) > + err(EXIT_FAILURE, "my_futex_waitv"); > +\& > + for (i = 0; i < countof(futexes); ++i) { > + if (futexes[i] != waiters[i].val) > + printf("%w32u%s", futexes[i], i == woke ? "!" : ""); > + putchar(\[aq]\[rs]t\[aq]); > + } > + putchar(\[aq]\[rs]n\[aq]); > +\& > + for (i = 0; i < countof(futexes); ++i) > + waiters[i].val = futexes[i]; > + } > +} > +.EE > +.\" SRC END > +.SH SEE ALSO > +.BR futex (2), > +.BR FUTEX_WAIT (2const), > +.BR FUTEX_WAKE (2const), > +.BR futex (7) > +.P > +Kernel source file > +.I Documentation/userspace-api/futex2.rst > diff --git u/man/man7/futex.7 p/man/man7/futex.7 > index 51c5d5d9b..d271144ff 100644 > --- u/man/man7/futex.7 > +++ p/man/man7/futex.7 > @@ -45,7 +45,9 @@ .SS Semantics > Any futex operation starts in user space, > but it may be necessary to communicate with the kernel using the > .BR futex (2) > -system call. > +or > +.BR futex_waitv (2) > +system calls. > .P > To "up" a futex, execute the proper assembler instructions that > will cause the host CPU to atomically increment the integer. > @@ -72,7 +74,9 @@ .SS Semantics > .P > The > .BR futex (2) > -system call can optionally be passed a timeout specifying how long > +and > +.BR futex_waitv (2) > +system calls can optionally be passed a timeout specifying how long > the kernel should > wait for the futex to be upped. > In this case, semantics are more complex and the programmer is referred > @@ -107,6 +111,7 @@ .SH NOTES > .SH SEE ALSO > .BR clone (2), > .BR futex (2), > +.BR futex_waitv (2), > .BR get_robust_list (2), > .BR set_robust_list (2), > .BR set_tid_address (2), > -- > 2.39.5 --