From: Florian Weimer <fweimer@redhat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-api <linux-api@vger.kernel.org>,
Jann Horn <jannh@google.com>,
libc-alpha <libc-alpha@sourceware.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
paulmck <paulmck@kernel.org>
Subject: Re: rseq + membarrier programming model
Date: Mon, 13 Dec 2021 21:12:12 +0100
Message-ID: <87bl1ktgbn.fsf@oldenburg.str.redhat.com>
In-Reply-To: <1424606270.30586.1639425414221.JavaMail.zimbra@efficios.com> (Mathieu Desnoyers's message of "Mon, 13 Dec 2021 14:56:54 -0500 (EST)")

* Mathieu Desnoyers:

> ----- On Dec 13, 2021, at 2:29 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>>
>>>> Could it fall back to
>>>> MEMBARRIER_CMD_GLOBAL instead?
>>>
>>> No. CMD_GLOBAL does not issue the required rseq fence used by the
>>> algorithm discussed. Also, CMD_GLOBAL has quite a few other shortcomings:
>>> it takes a while to execute, and is incompatible with nohz_full kernels.
>>
>> What about using sched_setcpu to move the current thread to the same CPU
>> (and move it back afterwards)? Surely that implies the required sort of
>> rseq barrier that MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ with
>> MEMBARRIER_CMD_FLAG_CPU performs?
>
> I guess you refer to using sched_setaffinity(2) there ? There are various
> reasons why this may fail. For one, the affinity mask is a shared global
> resource which can be changed by external applications.

So is process memory …

> Also, setting the affinity is really just a hint. In the presence of
> cpu hotplug and/or cgroup cpuset, it is known to lead to situations
> where the kernel just gives up and provides an affinity mask including
> all CPUs.

How does CPU hotplug impact this negatively?

The cgroup cpuset issue clearly is a bug.

> Therefore, using sched_setaffinity() and expecting to be pinned to
> a specific CPU for correctness purposes seems brittle.

I'm pretty sure it used to work reliably for some forms of concurrency
control.

> But _if_ we'd have something like a sched_setaffinity which we can
> trust, yes, temporarily migrating to the target CPU, and observing that
> we indeed run there, would AFAIU provide the same guarantee as the rseq
> fence provided by membarrier. It would have a higher overhead than
> membarrier as well.

Presumably a signal could do it as well.

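To make the temporary-migration idea concrete, here is a minimal C
sketch, assuming glibc's sched_setaffinity(2) and sched_getcpu(3)
wrappers. The helper name run_on_cpu_once is made up for illustration.
It only reports whether the calling thread was observed on the target
CPU; it inherits every caveat raised in the quoted text (external
affinity changes, cpusets, hotplug) and makes no claim about what
ordering guarantee the kernel actually provides.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper (not an existing API): pin the calling thread to
 * `cpu`, check where it runs, then restore the previous affinity mask.
 * Returns 0 if the thread was observed on `cpu`, -1 on any failure.
 */
static int run_on_cpu_once(int cpu)
{
    cpu_set_t saved, target;
    int observed;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return -1;

    CPU_ZERO(&target);
    CPU_SET(cpu, &target);
    if (sched_setaffinity(0, sizeof(target), &target) != 0)
        return -1;

    /* Observe the CPU instead of trusting the requested mask. */
    observed = sched_getcpu();

    /* Move back; the saved mask may be stale if it was changed externally. */
    if (sched_setaffinity(0, sizeof(saved), &saved) != 0)
        return -1;

    return observed == cpu ? 0 : -1;
}
```

A caller would have to treat -1 as "no guarantee obtained" and fall
back to something else, which is exactly the brittleness under
discussion.
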
>> That is possible even without membarrier, so I wonder why registration
>> of intent is needed for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
>
> I would answer that it is not possible to do this _reliably_ today
> without membarrier (see above discussion of cpu hotplug, cgroups, and
> modification of cpu affinity by external processes).
>
> AFAIR, registration of intent for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
> is mainly there to provide a programming model similar to private expedited
> plain and core-sync cmds.
>
> The registration of intent allows the kernel to further tweak what is
> done internally and make tradeoffs which only impact applications
> performing the registration.

But if there is no strong performance argument for it, this only
introduces additional complexity into userspace. Surely we could say we
just register for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ at process start
and document the failure modes (seccomp etc.), but then why require
registration at all?

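For reference, a rough C sketch of what "register at process start and
tolerate failure" could look like. The command values mirror
<linux/membarrier.h> from Linux 5.10 and later, redefined under SK_
names (my own, hypothetical) so the sketch builds against older headers.
The function names are likewise made up; this is an illustration under
those assumptions, not what glibc does.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Values as in the Linux UAPI header <linux/membarrier.h> (5.10+). */
enum {
    SK_MEMBARRIER_CMD_QUERY = 0,
    SK_MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ = 1 << 7,
    SK_MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ = 1 << 8,
    SK_MEMBARRIER_CMD_FLAG_CPU = 1 << 0,
};

/* Thin wrapper around the raw system call. */
static int membarrier_sys(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(SYS_membarrier, cmd, flags, cpu_id);
}

/*
 * Hypothetical startup hook: returns 1 if the process is registered for
 * the rseq membarrier, 0 if it is unavailable (old kernel, seccomp
 * filter, kernel built without rseq support, ...).
 */
static int register_rseq_membarrier(void)
{
    int supported = membarrier_sys(SK_MEMBARRIER_CMD_QUERY, 0, 0);

    if (supported < 0)
        return 0;
    if (!(supported & SK_MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ))
        return 0;
    if (membarrier_sys(SK_MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ,
                       0, 0) != 0)
        return 0;
    return 1;
}

/* Issue the rseq fence for one CPU; returns 0 on success, -1 on error. */
static int rseq_fence_on_cpu(int cpu)
{
    return membarrier_sys(SK_MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
                          SK_MEMBARRIER_CMD_FLAG_CPU, cpu);
}
```

Every user of rseq_fence_on_cpu still has to cope with the case where
register_rseq_membarrier returned 0.
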
>>> In order to make sure the programming model is the same for expedited
>>> private/global plain/sync-core/rseq membarrier commands, we require that
>>> each process perform a registration beforehand.
>>
>> Hmm. At least it's not possible to unregister again.
>>
>> But I think it would be really useful to have some of these barriers
>> available without registration, possibly in a more expensive form.
>
> What would be wrong with doing a membarrier private-expedited-rseq
> registration on libc startup, and exposing a glibc tunable to allow
> disabling this ?

The configurations that need to be supported go from “no rseq”/“rseq”
to “no rseq”/“rseq”/“rseq with membarrier”. Everyone now needs to
think about implementing support for all three instead of just the
obvious two.

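As a small illustration of the three cases, assuming a library keeps
two booleans around after startup (all names below are hypothetical):

```c
#include <stdbool.h>

/* The three configurations a library author would have to handle. */
enum rseq_config {
    RSEQ_CONFIG_NONE,            /* no rseq */
    RSEQ_CONFIG_RSEQ,            /* rseq, but no rseq membarrier */
    RSEQ_CONFIG_RSEQ_MEMBARRIER, /* rseq with membarrier */
};

/* Map the two startup results onto one of the three configurations. */
static enum rseq_config classify_rseq_config(bool rseq_registered,
                                             bool membarrier_registered)
{
    if (!rseq_registered)
        return RSEQ_CONFIG_NONE;
    return membarrier_registered ? RSEQ_CONFIG_RSEQ_MEMBARRIER
                                 : RSEQ_CONFIG_RSEQ;
}
```

Each algorithm built on rseq then needs a code path, or at least a
documented answer, for every enumerator.
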
Thanks,
Florian