From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Oskolkov <posk@posk.io>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
linux-api <linux-api@vger.kernel.org>,
Christian Brauner <christian.brauner@ubuntu.com>,
Florian Weimer <fw@deneb.enyo.de>,
David Laight <David.Laight@aculab.com>,
carlos <carlos@redhat.com>, Chris Kennelly <ckennelly@google.com>
Subject: Re: [RFC PATCH 1/3] Introduce per thread group current virtual cpu id
Date: Tue, 1 Feb 2022 16:00:52 -0500 (EST)
Message-ID: <2083444900.25808.1643749252639.JavaMail.zimbra@efficios.com>
In-Reply-To: <CAFTs51XYWqN6bPbVYh8a9ta+VxS4iBbiWWNO7n1t-4_VLpKGXQ@mail.gmail.com>
----- On Feb 1, 2022, at 2:49 PM, Peter Oskolkov posk@posk.io wrote:
> On Tue, Feb 1, 2022 at 11:26 AM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> This feature allows the scheduler to expose a current virtual cpu id
>> to user-space. This virtual cpu id is within the possible cpus range,
>> and is temporarily (and uniquely) assigned while threads are actively
>> running within a thread group. If a thread group has fewer threads than
>> cores, or is limited to run on few cores concurrently through sched
>> affinity or cgroup cpusets, the virtual cpu ids will be values close
>> to 0, thus allowing efficient use of user-space memory for per-cpu
>> data structures.
>
> Why per thread group and not per mm? The main use case is for
> per-(v)cpu memory allocation logic, so it seems having this feature
> per mm is more appropriate?
Good point, yes, per-mm would be more appropriate.
So I guess that from a userspace perspective, the rseq field could become
"__u32 vm_vcpu; /* Current vcpu within memory space. */"
[...]
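As an illustration only (not part of the patch), here is a minimal userspace sketch of how a compact per-mm vcpu id could index per-vcpu data. The names `struct rseq_sketch`, `struct vcpu_counters` and the `vcpu_counters_*()` helpers are hypothetical, the field layout is not the real rseq ABI, and the sketch assumes the id always stays below the number of slots the process sized its array for:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rseq area; only the fields discussed here. */
struct rseq_sketch {
	uint32_t cpu_id;	/* Current CPU number. */
	uint32_t vm_vcpu;	/* Current vcpu within memory space (proposed). */
};

/* Per-vcpu counters, sized by expected concurrency rather than by possible CPUs. */
struct vcpu_counters {
	uint32_t nr_slots;
	uint64_t *slot;
};

static int vcpu_counters_init(struct vcpu_counters *c, uint32_t nr_slots)
{
	c->slot = calloc(nr_slots, sizeof(*c->slot));
	if (!c->slot)
		return -1;
	c->nr_slots = nr_slots;
	return 0;
}

/*
 * Increment the slot selected by the vcpu id. A real user would do this
 * inside an rseq critical section; this sketch is not preemption-safe.
 */
static void vcpu_counters_inc(struct vcpu_counters *c,
			      const struct rseq_sketch *rs)
{
	uint32_t id = rs->vm_vcpu;

	assert(id < c->nr_slots);	/* Assumption: ids stay compact. */
	c->slot[id]++;
}

/* Sum all slots to read the logical counter value. */
static uint64_t vcpu_counters_sum(const struct vcpu_counters *c)
{
	uint64_t total = 0;

	for (uint32_t i = 0; i < c->nr_slots; i++)
		total += c->slot[i];
	return total;
}
```

With a thread running on CPU 37 but holding vcpu id 1, the increment lands in slot 1, so a four-slot array suffices here where indexing by `cpu_id` would need at least 38 slots.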
>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index b6ecb9fc4cd2..c87e7ad5a1ea 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -244,6 +244,12 @@ struct signal_struct {
>> * and may have inconsistent
>> * permissions.
>> */
>> +#ifdef CONFIG_SCHED_THREAD_GROUP_VCPU
>> + /*
>> + * Mask of allocated vcpu ids within the thread group.
>> + */
>> + cpumask_t vcpu_mask;
>
> We use a pointer for the mask (in struct mm). Adds complexity around
> alloc/free, though. Just FYI.
It does make sense if this is opt-in.
[...]
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 2e4ae00e52d1..2690e80977b1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4795,6 +4795,8 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>> sched_info_switch(rq, prev, next);
>> perf_event_task_sched_out(prev, next);
>> rseq_preempt(prev);
>> + tg_vcpu_put(prev);
>> + tg_vcpu_get(next);
>
> Doing this for all tasks on all context switches will most likely be
> too expensive. We do it only for tasks that explicitly asked for this
> feature during their rseq registration, and still the tight loop in
> our equivalent of tg_vcpu_get() is occasionally noticeable (lots of
> short wakeups can lead to the loop thrashing around).
>
> Again, our approach is more complicated as a result.
I suspect that the overhead of tg_vcpu_get() is quite small for processes
that run on only a few cores, but becomes noticeable when processes have
many threads and are massively parallel (not affined to only a few cores).
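To make the cost concrete, here is a rough userspace model of a "lowest free id" allocator of the kind discussed above. It is a sketch under assumptions, not the code from the patch: `struct vcpu_mask_sketch`, `vcpu_get_sketch()` and `vcpu_put_sketch()` are hypothetical names, the mask is limited to 64 ids, and C11 atomics stand in for the kernel's cpumask and bit operations:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NR_VCPUS 64

/* One bit per allocated vcpu id; bit N set means id N is in use. */
struct vcpu_mask_sketch {
	_Atomic uint64_t bits;
};

/*
 * Claim the lowest free id, or return -1 if all ids are taken.
 * Each failed compare-and-swap means another thread changed the mask,
 * so the search restarts; under many concurrent context switches this
 * retry loop is where the cost would show up.
 */
static int vcpu_get_sketch(struct vcpu_mask_sketch *m)
{
	uint64_t old = atomic_load_explicit(&m->bits, memory_order_relaxed);

	for (;;) {
		int id = 0;

		/* Find the lowest clear bit in the snapshot. */
		while (id < SKETCH_NR_VCPUS && ((old >> id) & 1))
			id++;
		if (id == SKETCH_NR_VCPUS)
			return -1;
		/* On failure, 'old' is refreshed with the current mask. */
		if (atomic_compare_exchange_weak_explicit(&m->bits, &old,
				old | (UINT64_C(1) << id),
				memory_order_acquire, memory_order_relaxed))
			return id;
	}
}

/* Release a previously claimed id so it can be reused. */
static void vcpu_put_sketch(struct vcpu_mask_sketch *m, int id)
{
	assert(id >= 0 && id < SKETCH_NR_VCPUS);
	atomic_fetch_and_explicit(&m->bits, ~(UINT64_C(1) << id),
				  memory_order_release);
}
```

In this model, a process running on a few cores touches only the low bits and rarely retries, whereas many threads switching in and out at once contend on the same word.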
When the feature is disabled, we can always fall back on the value returned
by raw_smp_processor_id() and use that as the "vm-vcpu-id" value.
Whether the vm-vcpu-id or the processor id is used must be agreed upon by
all threads from all processes using a mm at a given time.
There appears to be a tradeoff here, and I wonder how this should be presented
to users. A few possible options:
- vm-vcpu feature is opt-in (default off) or opt-out (default on),
- whether vm-vcpu is enabled for a process could be selected at runtime by the
process, either at process initialization (single thread, single mm user)
and/or while the process is multi-threaded (requires more synchronization),
- if we find a way to switch automatically between vm-vcpu-id and processor id as
the information source for all threads tied to a mm once the number of parallel
threads crosses a threshold, then I suspect we could have the best of both worlds.
But it's not clear to me how to achieve this.
Thoughts?
Thanks,
Mathieu
>
>> fire_sched_out_preempt_notifiers(prev, next);
>> kmap_local_sched_out();
>> prepare_task(next);
>> --
>> 2.17.1
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com