From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
linux-api <linux-api@vger.kernel.org>,
Christian Brauner <christian.brauner@ubuntu.com>,
carlos <carlos@redhat.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>
Subject: Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
Date: Tue, 20 Oct 2020 14:47:57 -0400 (EDT) [thread overview]
Message-ID: <1247061646.32339.1603219677094.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <873631yp8t.fsf@oldenburg2.str.redhat.com>
----- On Sep 29, 2020, at 4:13 AM, Florian Weimer fweimer@redhat.com wrote:
> * Mathieu Desnoyers:
>
>>> So we have a bootstrap issue here that needs to be solved, I think.
>>
>> The one thing I'm not sure about is whether the vDSO interface is indeed
>> superior to KTLS, or if it is just the model we are used to.
>>
>> AFAIU, the current use-cases for vDSO is that an application calls into
>> glibc, which then calls the vDSO function exposed by the kernel. I wonder
>> whether the vDSO indirection is really needed if we typically have a glibc
>> function used as indirection ? For an end user, what is the benefit of vDSO
>> over accessing KTLS data directly from glibc ?
>
> I think the kernel can only reasonably maintain a single userspace data
> structure. It's not reasonable to update several versions of the data
> structure in parallel.
I disagree with your statement. Considering that the kernel needs to keep
ABI compatibility for whatever it exposes to user-space, claiming that it
should never update several versions of data structures exposed to user-space
in parallel means that once a data structure is exposed to user-space as ABI
in a certain way, it can never ever change in the future, even if we find
a better way to do things.
It makes more sense to allow multiple data structures to be updated
in parallel until older ones become deprecated/unused/irrelevant, at
which point those can be configured out at build time and eventually
phased out after years of deprecation. Having the ability to update multiple
data structures in user-space with replicated information is IMHO necessary
to allow creation of new/better accelerated ABIs.
>
> This means that glibc would have to support multiple kernel data
> structures, and users might lose userspace acceleration after a kernel
> update, until they update glibc as well. The glibc update should be
> ABI-compatible, but someone would still have to backport it, apply it to
> container images, etc.
No. If the kernel ever exposes a data structure to user-space as ABI,
then it needs to stay there, and not break userspace. Hence the need to
duplicate information provided to user-space if need be, so we can move
on to better ABIs without breaking the old ones.
>
> What's worse, the glibc code would be quite hard to test because we
> would have to keep around multiple kernel versions to exercise all the
> different data structure variants.
>
> In contrast, the vDSO code always matches the userspace data structures,
> is always updated at the same time, and tested together. That looks
> like a clear win to me.
For cases where the overhead of vDSO is not an issue, I agree that it
makes things tidier than directly accessing a data structure. The
documentation of the ABI becomes much simpler as well.
>
>> If we decide that using KTLS from a vDSO function is indeed a requirement,
>> then, as you point out, the thread_pointer is available as ABI, but we miss
>> the KTLS offset.
>>
>> Some ideas on how we could solve this: we could either make the KTLS
>> offset part of the ABI (fixed offset), or save the offset near the
>> thread pointer at a location that would become ABI. It would have to
>> be already populated with something which can help detect the case
>> where a vDSO is called from a thread which does not populate KTLS
>> though. Is that even remotely doable ?
>
> I don't know.
>
> We could decide that these accelerated system calls must only be called
> with a valid TCB. That's unavoidable if the vDSO sets errno directly,
> so it's perhaps not a big loss. It's also backwards-compatible because
> existing TCB-less code won't know about those new vDSO entrypoints.
> Calling into glibc from a TCB-less thread has always been undefined.
> TCB-less code would have to make direct, non-vDSO system calls, as today.
>
> For discovering the KTLS offset, a per-process page at a fixed offset
> from the vDSO code (i.e., what real shared objects already do for global
> data) could store this offset. This way, we could entirely avoid an ABI
> dependency.
Or as Andy mentioned, we would simply pass the ktls offset as argument to
the vDSO ? It seems simple enough. Would it fit all our use-cases including
errno ?
>
> We'll see what will break once we have the correct TID after vfork. 8->
> glibc currently supports malloc-after-vfork as an extension, and
> a lot of software depends on it (OpenJDK, for example).
I am not sure to see how that is related to ktls ?
>
>>> With the latter, we could
>>> directly expose the vDSO implementation to applications, assuming that
>>> we agree that the vDSO will not fail with ENOSYS to request fallback to
>>> the system call, but will itself perform the system call.
>>
>> We should not forget the fields needed by rseq as well: the rseq_cs
>> pointer and the cpu_id fields need to be accessed directly from the
>> rseq critical section, without function call. Those use-cases require
>> that applications and library can know the KTLS offset and size and
>> use those fields directly.
>
> Yes, but those offsets could be queried using a function from the vDSO
> (or using a glibc interface, to simplify linking).
Good point!
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2020-10-20 18:48 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-25 18:15 [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64 Mathieu Desnoyers
2020-09-25 18:15 ` [RFC PATCH 2/2] selftests/rseq: Adapt x86-64 rseq selftest to rseq KTLS prototype Mathieu Desnoyers
2020-09-28 15:13 ` [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64 Florian Weimer
2020-09-28 17:29 ` Mathieu Desnoyers
2020-09-29 8:13 ` Florian Weimer
2020-10-20 18:47 ` Mathieu Desnoyers [this message]
2020-10-29 15:35 ` Florian Weimer
2020-09-29 18:01 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1247061646.32339.1603219677094.JavaMail.zimbra@efficios.com \
--to=mathieu.desnoyers@efficios.com \
--cc=boqun.feng@gmail.com \
--cc=carlos@redhat.com \
--cc=christian.brauner@ubuntu.com \
--cc=fweimer@redhat.com \
--cc=hpa@zytor.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=tglx@linutronix.de \
--cc=vincenzo.frascino@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).