From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Chris Lameter <cl@linux.com>,
Jann Horn <jannh@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
linux-kernel <linux-kernel@vger.kernel.org>,
Joel Fernandes <joelaf@google.com>,
Ingo Molnar <mingo@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Dave Watson <davejwatson@fb.com>,
Will Deacon <will.deacon@arm.com>, shuah <shuah@kernel.org>,
Andi Kleen <andi@firstfloor.org>,
linux-kselftest <linux-kselftest@vger.kernel.org>,
Russell King <linux@arm.linux.org.uk>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Paul <paulmck@linux.vnet.ibm.com>, Paul Turner <pjt@google.com>,
Boqun Feng <boqun.feng@gmail.com>,
Josh Triplett <josh@joshtriplett.org>,
rostedt <rostedt@goodmis.org>, Ben Maurer <bmaurer@fb.com>,
linux-api <linux-api@vger.kernel.org>,
Andy Lutomirski <luto@amacapital.net>
Subject: Re: [RFC PATCH v1] pin_on_cpu: Introduce thread CPU pinning system call
Date: Fri, 14 Feb 2020 11:54:50 -0500 (EST)
Message-ID: <1713146428.2610.1581699290029.JavaMail.zimbra@efficios.com>
In-Reply-To: <87blql5hfb.fsf@oldenburg2.str.redhat.com>
----- On Jan 30, 2020, at 6:10 AM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>> It brings an interesting idea to the table though. Let's assume for now that
>> the only intended use of pin_on_cpu(2) would be to allow rseq(2) critical
>> sections to update per-cpu data on specific cpu number targets. In fact,
>> considering that userspace can be preempted at any point, we still need a
>> mechanism to guarantee atomicity with respect to other threads running on
>> the same runqueue, which rseq(2) provides. Therefore, that assumption does
>> not appear too far-fetched.
>>
>> There are 2 scenarios we need to consider here:
>>
>> A) pin_on_cpu(2) targets a CPU which is not part of the affinity mask.
>>
>> This case is easy: pin_on_cpu can return an error, and the caller needs to act
>> accordingly (e.g. figure out that this is a design error and report it, or
>> decide that it really did not want to touch that per-cpu data that badly and
>> make the entire process fall-back to a mechanism which does not use per-cpu
>> data at all from that point onwards)
>
> Affinity masks currently are not like process memory: there is an
> expectation that they can be altered from outside the process.
Yes, that's my main issue.
> Given that the caller may not have any ways to recover from the
> suggested pin_on_cpu behavior, that seems problematic.
Indeed.
>
> What I would expect is that if pin_on_cpu cannot achieve implied
> exclusion by running on the associated CPU, it acquires a lock that
> prevents others pin_on_cpu calls from entering the critical section, and
> tasks in the same task group from running on that CPU (if the CPU
> becomes available to the task group). The second part should maintain
> exclusion of rseq sequences even if their fast path is not changed.
I try to avoid mutual exclusion over shared memory as an rseq fallback whenever
I can, so we can use rseq from lock-free algorithms without losing lock-freedom.
> (On the other hand, I'm worried that per-CPU data structures are a dead
> end for user space unless we get containerized affinity masks, so that
> contains only see resources that are actually available to them.)
I'm currently implementing a prototype of the following ideas, and I'm curious to
read your thoughts on those:

I'm adding an "affinity_pinned" flag to the task struct of each thread. It can
be set and cleared only by the owner thread through pin_on_cpu syscall commands.
When the affinity is pinned by a thread, trying to change its affinity (from an
external thread, or possibly from itself) will fail.

Whenever a thread would (temporarily) pin itself on a specific CPU, it would
also pin its affinity mask as a side-effect. When a thread unpins from a CPU,
the affinity mask stays pinned. The purpose of keeping this affinity pinned
state per-thread is to ensure we don't end up with tiny race windows where
changing the thread's affinity mask "typically" works, but fails once in a
while because it's done concurrently with a 1ms long cpu pinning. This would
lead to flaky code, and I try hard to avoid that.

How changing this affinity should fail (from sched_setaffinity and cpusets) is a
big unanswered question. I see two major alternatives so far:

1) We deliver a signal to the target thread (SIGKILL ? SIGSEGV ?), considering
   that failure to be able to change its affinity mask means we need to send a
   signal. How exactly the killed application would recover (or whether it
   should) is still unclear.

2) Return an error to the sched_setaffinity or cpusets caller, and let it deal
   with the error as it sees fit: ignore it, log it, or send a signal.

I think option (2) provides the most flexibility, and moves policy outside of
the kernel, which is a good thing. However, looking at how cpusets seems to
simply ignore errors when setting a task's cpumask, I wonder if asking
cpusets to handle any kind of error is asking too much. :-/
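
To make the intended semantics concrete, here is a minimal userspace model in C.
It is only a sketch, not kernel code and not the prototype itself: struct
task_model, the model_* helpers, and the -EBUSY/-EINVAL return values are all
hypothetical choices made for illustration. Only the "affinity_pinned" flag and
its behaviour (set as a side-effect of pinning, kept across unpin, cleared only
by the owner thread) come from the description above. The model assumes option
(2), i.e. the affinity change returns an error to its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; field and function names are hypothetical. */
struct task_model {
	bool affinity_pinned;        /* flag from the proposal */
	int pinned_cpu;              /* -1 when not pinned on a specific CPU */
	unsigned long cpus_allowed;  /* affinity mask, one bit per CPU */
};

/* Owner thread pins itself on @cpu; its affinity mask gets pinned as a side-effect. */
int model_pin_on_cpu(struct task_model *t, int cpu)
{
	if (cpu < 0 || cpu >= (int)(8 * sizeof(t->cpus_allowed)))
		return -EINVAL;
	if (!(t->cpus_allowed & (1UL << cpu)))
		return -EINVAL;  /* scenario A: CPU not in the mask (error value assumed) */
	t->pinned_cpu = cpu;
	t->affinity_pinned = true;
	return 0;
}

/* Unpinning from the CPU deliberately leaves the affinity mask pinned. */
void model_unpin_cpu(struct task_model *t)
{
	t->pinned_cpu = -1;
}

/* Separate owner-only command that clears the pinned-affinity state. */
void model_clear_affinity_pin(struct task_model *t)
{
	t->affinity_pinned = false;
}

/* Stand-in for sched_setaffinity/cpusets under option (2): report an error. */
int model_set_affinity(struct task_model *t, unsigned long new_mask)
{
	if (new_mask == 0)
		return -EINVAL;
	if (t->affinity_pinned)
		return -EBUSY;   /* assumed error value; the caller decides the policy */
	t->cpus_allowed = new_mask;
	return 0;
}
```

In this model, an affinity change attempted after model_unpin_cpu() still fails
until model_clear_affinity_pin() is called, so the outcome does not depend on
whether the change happens to race with a short pinning window.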
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com