From: Andrey Semashev <andrey.semashev@gmail.com>
To: "André Almeida" <andrealmeid@igalia.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Darren Hart" <dvhart@infradead.org>,
linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org, fweimer@redhat.com,
libc-alpha@sourceware.org, Davidlohr Bueso <dave@stgolabs.net>,
Steven Rostedt <rostedt@goodmis.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC] futex2: add NUMA awareness
Date: Thu, 14 Jul 2022 14:01:04 +0300
Message-ID: <74ba5239-27b0-299e-717c-595680cd52f9@gmail.com>
In-Reply-To: <36a8f60a-69b2-4586-434e-29820a64cd88@igalia.com>
On 7/14/22 06:18, André Almeida wrote:
> Hi,
>
> futex2 is an ongoing project with the goal to create a new interface for
> futex that solves ongoing issues with the current syscall.
>
> One of these problems is the lack of NUMA awareness for futex operations.
> This RFC aims to gather feedback on a NUMA interface proposal.
>
> * The problem
>
> futex has a single, global hash table to store information about current
> waiters to be queried by wakers. On NUMA machines, this hash table is
> stored on a single node. This means that a process running on any other
> node will have some overhead using futex, given that it will need to
> access the table on a different node.
>
> * A solution
>
> For NUMA machines, a table would be allocated per node. Processes
> would then be able to use the local table to avoid sharing data with
> other nodes.
>
> * The interface
>
> Userspace needs to specify which node it would like to use to store/query
> the futex table. The common case would be to operate on the current
> node, but some cases could require operating on another one.
>
> Before getting to the NUMA part, a quick recap of the syscalls interface
> of futex2:
>
> futex_wait(void *uaddr, unsigned int val, unsigned int flags,
>            struct timespec *timo)
>
> futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
>
> struct futex_requeue {
>         void *uaddr;
>         unsigned int flags;
> };
>
> futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
>               unsigned int nr_wake, unsigned int nr_requeue,
>               u64 cmpval, unsigned int flags)
>
>
> As requeue already has 6 arguments, we can't add an argument for the
> node ID; we need to pack it in a struct. So then we have:
>
> struct futexX_numa {
>         __uX value;
>         __sX hint;
> };
>
> Where X can be 8, 16, 32 or 64 (futex2 supports variable sized futexes).
> `value` is the futex value and `hint` can be -1 for the current node, or
> [0, MAX_NUMA_NODES) to specify a node. Example:
>
> struct futex32_numa f = {.value = 0, .hint = -1};
>
> ...
>
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>
> Then &f would be used as the futex address, as expected, and the table
> of the current node would be used. If an app expects to have calls from
> different nodes, then it should do, for instance:
>
> struct futex32_numa f = {.value = 0, .hint = 2};
>
> For non-NUMA apps, a call without FUTEX_NUMA flag would just use the
> first node as default.
>
> Feedback? Who else should I CC?
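For reference, the X = 32 instance of the struct quoted above can be written
out in plain C as a minimal sketch. It assumes fixed-width userspace types in
place of __u32/__s32, and the helper futex_numa_target_node() is a
hypothetical name used only to spell out how the hint is described to be
interpreted (-1 means the caller's node, anything else names a node). It does
not call any syscall.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint32_t, int32_t */

/* X = 32 instance of the proposed struct futexX_numa. */
struct futex32_numa {
	uint32_t value;  /* the futex word itself */
	int32_t  hint;   /* -1: current node, otherwise a node ID */
};

/* The futex word sits at the address that is passed as uaddr. */
static_assert(offsetof(struct futex32_numa, value) == 0, "value comes first");
static_assert(offsetof(struct futex32_numa, hint) == 4, "hint follows value");
static_assert(sizeof(struct futex32_numa) == 8, "no padding expected");

/*
 * Hypothetical helper, not part of the proposal: the node a call would
 * operate on, given the node the caller is currently running on.
 */
static int futex_numa_target_node(const struct futex32_numa *f,
				  int current_node)
{
	return f->hint < 0 ? current_node : f->hint;
}
```

With .hint = -1 the result follows the caller, so two threads on different
nodes get different answers for the same futex; with .hint = 2 every caller
gets node 2.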
Just a few questions:
Do I understand correctly that notifiers won't be able to wake up
waiters unless they know on which node they are waiting?
Is it possible to wait on a futex on different nodes?
Is it possible to wake waiters on a futex on all nodes? When a single
(or N, where N is not "all") waiter is woken, which node is selected? Is
there a rotation of nodes, so that nodes are not skewed in terms of
notified waiters?
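To make the first question concrete, here is a toy userspace model. It
assumes that the table is selected purely from the hint and that -1 resolves
to the caller's node; whether the kernel side would actually behave this way
is exactly what I'm asking. All names (MODEL_NODES, model_tables, model_wait,
model_wake) are made up for the illustration, and the "table" is reduced to a
waiter count per node for a single futex.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NODES 4  /* hypothetical node count for the model */

struct futex32_numa {
	uint32_t value;
	int32_t  hint;   /* -1: current node, otherwise a node ID */
};

/* Toy per-node tables: one waiter count per node. */
struct model_tables {
	unsigned int waiters[MODEL_NODES];
};

/* No bounds checking: hint is assumed to be -1 or below MODEL_NODES. */
static int resolve_node(const struct futex32_numa *f, int caller_node)
{
	return f->hint < 0 ? caller_node : f->hint;
}

/* Record one waiter in the table the hint resolves to. */
static void model_wait(struct model_tables *t, const struct futex32_numa *f,
		       int caller_node)
{
	t->waiters[resolve_node(f, caller_node)]++;
}

/*
 * Wake up to nr waiters, looking only at the table the hint resolves to
 * for the waker. Returns the number of waiters woken.
 */
static unsigned int model_wake(struct model_tables *t,
			       const struct futex32_numa *f,
			       int caller_node, unsigned int nr)
{
	unsigned int *w = &t->waiters[resolve_node(f, caller_node)];
	unsigned int woken = *w < nr ? *w : nr;

	*w -= woken;
	return woken;
}
```

In this model, a thread waiting on node 0 with .hint = -1 is not found by
a waker running on node 1 (model_wake() returns 0), while a fixed hint such
as .hint = 2 makes waiter and waker agree regardless of where they run.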