Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
	linux-api <linux-api@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	carlos <carlos@redhat.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>
Subject: Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
Date: Tue, 20 Oct 2020 14:47:57 -0400 (EDT)	[thread overview]
Message-ID: <1247061646.32339.1603219677094.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <873631yp8t.fsf@oldenburg2.str.redhat.com>

----- On Sep 29, 2020, at 4:13 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>>> So we have a bootstrap issue here that needs to be solved, I think.
>>
>> The one thing I'm not sure about is whether the vDSO interface is indeed
>> superior to KTLS, or if it is just the model we are used to.
>>
>> AFAIU, the current use-cases for vDSO is that an application calls into
>> glibc, which then calls the vDSO function exposed by the kernel. I wonder
>> whether the vDSO indirection is really needed if we typically have a glibc
>> function used as indirection ? For an end user, what is the benefit of vDSO
>> over accessing KTLS data directly from glibc ?
> 
> I think the kernel can only reasonably maintain a single userspace data
> structure.  It's not reasonable to update several versions of the data
> structure in parallel.

I disagree with your statement. Considering that the kernel needs to keep
ABI compatibility for whatever it exposes to user-space, claiming that it
should never update several versions of data structures exposed to user-space
in parallel means that once a data structure is exposed to user-space as ABI
in a certain way, it can never ever change in the future, even if we find
a better way to do things.

It makes more sense to allow multiple data structures to be updated
in parallel until older ones become deprecated/unused/irrelevant, at
which point those can be configured out at build time and eventually
phased out after years of deprecation. Having the ability to update multiple
data structures in user-space with replicated information is IMHO necessary
to allow creation of new/better accelerated ABIs.

> 
> This means that glibc would have to support multiple kernel data
> structures, and users might lose userspace acceleration after a kernel
> update, until they update glibc as well.  The glibc update should be
> ABI-compatible, but someone would still have to backport it, apply it to
> container images, etc.

No. If the kernel ever exposes a data structure to user-space as ABI,
then it needs to stay there, and not break userspace. Hence the need to
duplicate information provided to user-space if need be, so we can move
on to better ABIs without breaking the old ones.

> 
> What's worse, the glibc code would be quite hard to test because we
> would have to keep around multiple kernel versions to exercise all the
> different data structure variants.
> 
> In contrast, the vDSO code always matches the userspace data structures,
> is always updated at the same time, and tested together.  That looks
> like a clear win to me.

For cases where the overhead of vDSO is not an issue, I agree that it
makes things tidier than directly accessing a data structure. The
documentation of the ABI becomes much simpler as well.

> 
>> If we decide that using KTLS from a vDSO function is indeed a requirement,
>> then, as you point out, the thread_pointer is available as ABI, but we miss
>> the KTLS offset.
>>
>> Some ideas on how we could solve this: we could either make the KTLS
>> offset part of the ABI (fixed offset), or save the offset near the
>> thread pointer at a location that would become ABI. It would have to
>> be already populated with something which can help detect the case
>> where a vDSO is called from a thread which does not populate KTLS
>> though. Is that even remotely doable ?
> 
> I don't know.
> 
> We could decide that these accelerated system calls must only be called
> with a valid TCB.  That's unavoidable if the vDSO sets errno directly,
> so it's perhaps not a big loss.  It's also backwards-compatible because
> existing TCB-less code won't know about those new vDSO entrypoints.
> Calling into glibc from a TCB-less thread has always been undefined.
> TCB-less code would have to make direct, non-vDSO system calls, as today.
> 
> For discovering the KTLS offset, a per-process page at a fixed offset
> from the vDSO code (i.e., what real shared objects already do for global
> data) could store this offset.  This way, we could entirely avoid an ABI
> dependency.

Or as Andy mentioned, we would simply pass the ktls offset as argument to
the vDSO ? It seems simple enough. Would it fit all our use-cases including
errno ?

> 
> We'll see what will break once we have the correct TID after vfork. 8->
> glibc currently supports malloc-after-vfork as an extension, and
> a lot of software depends on it (OpenJDK, for example).

I am not sure to see how that is related to ktls ?

> 
>>> With the latter, we could
>>> directly expose the vDSO implementation to applications, assuming that
>>> we agree that the vDSO will not fail with ENOSYS to request fallback to
>>> the system call, but will itself perform the system call.
>>
>> We should not forget the fields needed by rseq as well: the rseq_cs
>> pointer and the cpu_id fields need to be accessed directly from the
>> rseq critical section, without function call. Those use-cases require
>> that applications and library can know the KTLS offset and size and
>> use those fields directly.
> 
> Yes, but those offsets could be queried using a function from the vDSO
> (or using a glibc interface, to simplify linking).

Good point!

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

next prev parent reply	other threads:[~2020-10-20 18:48 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-25 18:15 [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64 Mathieu Desnoyers
2020-09-25 18:15 ` [RFC PATCH 2/2] selftests/rseq: Adapt x86-64 rseq selftest to rseq KTLS prototype Mathieu Desnoyers
2020-09-28 15:13 ` [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64 Florian Weimer
2020-09-28 17:29   ` Mathieu Desnoyers
2020-09-29  8:13     ` Florian Weimer
2020-10-20 18:47       ` Mathieu Desnoyers [this message]
2020-10-29 15:35         ` Florian Weimer
2020-09-29 18:01   ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1247061646.32339.1603219677094.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=boqun.feng@gmail.com \
    --cc=carlos@redhat.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=fweimer@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=vincenzo.frascino@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.