From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
linux-api <linux-api@vger.kernel.org>,
Christian Brauner <christian.brauner@ubuntu.com>,
David Laight <David.Laight@ACULAB.COM>,
carlos <carlos@redhat.com>, Peter Oskolkov <posk@posk.io>
Subject: Re: [RFC PATCH 2/3] rseq: extend struct rseq with per thread group vcpu id
Date: Tue, 1 Feb 2022 20:32:37 -0500 (EST)
Message-ID: <1285409089.26848.1643765557716.JavaMail.zimbra@efficios.com>
In-Reply-To: <87o83qxok9.fsf@mid.deneb.enyo.de>
----- On Feb 1, 2022, at 4:30 PM, Florian Weimer fw@deneb.enyo.de wrote:
> * Mathieu Desnoyers:
>
>> ----- On Feb 1, 2022, at 3:32 PM, Florian Weimer fw@deneb.enyo.de wrote:
>> [...]
>>>
>>>>> Is the switch really useful? I suspect it's faster to just write as
>>>>> much as possible all the time. The switch should be well-predictable
>>>>> if running uniform userspace, but still …
>>>>
>>>> The switch ensures the kernel don't try to write to a memory area beyond
>>>> the rseq size which has been registered by user-space. So it seems to be
>>>> useful to ensure we don't corrupt user-space memory. Or am I missing your
>>>> point ?
>>>
>>> Due to the alignment, I think you'd only ever see 32 and 64 bytes for
>>> now?
>>
>> Yes, but I would expect the rseq registration arguments to have a rseq_len
>> of offsetofend(struct rseq, tg_vcpu_id) when userspace wants the tg_vcpu_id
>> feature to be supported (but not the following features).
>
> But if rseq is managed by libc, it really has to use the full size
> unconditionally. I would even expect that eventually, the kernel only
> supports the initial 32, maybe 64 for a few early extension, and the
> size indicated by the auxiliary vector.
>
> Not all of that area would be ABI, some of it would be used by the
> vDSO only and opaque to userspace application (with applications/libcs
> passing __rseq_offset as an argument to these functions).
>

I think one aspect leading to our misunderstanding here is the distinction
between the size of the rseq area _allocation_, and the offset after the last
field supported by the given kernel.

With this in mind, let's state our expected aux. vector extensibility scheme
a bit more clearly.

With CONFIG_RSEQ=y, the kernel would pass the following information through
the ELF auxv:

- rseq allocation size (auxv_rseq_alloc_size),
- rseq allocation alignment (auxv_rseq_alloc_align),
- offset after the end of the last rseq field supported by this kernel
  (auxv_rseq_offset_end).

We always have auxv_rseq_alloc_size >= auxv_rseq_offset_end.

I would expect libc to use this information to allocate a memory area
at least auxv_rseq_alloc_size in size, with an alignment respecting
auxv_rseq_alloc_align. It would use a value >= auxv_rseq_alloc_size
as rseq_len argument for the rseq registration.
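
To make the allocation side concrete, here is a minimal sketch of what a libc
could do with those three values. The struct names and the helper
rseq_area_alloc() are hypothetical and only mirror the names used in this
email; no auxv entries exist under these names, so the values are passed in
as plain arguments rather than read with getauxval().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three proposed auxv values. */
struct rseq_auxv {
	size_t alloc_size;	/* auxv_rseq_alloc_size */
	size_t alloc_align;	/* auxv_rseq_alloc_align */
	size_t offset_end;	/* auxv_rseq_offset_end */
};

/* Hypothetical result of the libc-side setup. */
struct rseq_area {
	void *addr;		/* aligned, zeroed rseq area */
	uint32_t rseq_len;	/* length to pass to the rseq registration */
	uint32_t rseq_size;	/* value to publish as __rseq_size */
};

/* Returns 0 on success, -1 on invalid input or allocation failure. */
static int rseq_area_alloc(const struct rseq_auxv *auxv, struct rseq_area *area)
{
	size_t align = auxv->alloc_align;
	size_t len;

	/* Assumption: the alignment is a non-zero power of two. */
	if (align == 0 || (align & (align - 1)) != 0)
		return -1;
	/* Invariant: auxv_rseq_alloc_size >= auxv_rseq_offset_end. */
	if (auxv->alloc_size == 0 || auxv->alloc_size < auxv->offset_end)
		return -1;
	/* aligned_alloc() wants a size that is a multiple of the alignment. */
	len = (auxv->alloc_size + align - 1) & ~(align - 1);
	area->addr = aligned_alloc(align, len);
	if (!area->addr)
		return -1;
	memset(area->addr, 0, len);
	area->rseq_len = (uint32_t)len;			/* >= auxv_rseq_alloc_size */
	area->rseq_size = (uint32_t)auxv->offset_end;	/* populated fields only */
	return 0;
}
```

The point of the sketch is that rseq_len and __rseq_size are derived from
different auxv values: the former from the allocation size, the latter from
the end of the last supported field.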

But I would expect libc to use the auxv_rseq_offset_end value to populate
__rseq_size, so rseq users can rely on this to check whether the fields they
are trying to access are indeed populated by the kernel.
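
As an illustration of that check, a sketch under assumptions: struct
rseq_sketch below is not the real uapi header. It keeps the original fields
and places tg_vcpu_id right after flags, which is only a guess at the layout
for this example. rseq_field_available() is a hypothetical helper, and the
rseq_size argument stands for the __rseq_size value published by libc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only; the position of tg_vcpu_id is an assumption. */
struct rseq_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t tg_vcpu_id;	/* extension discussed in this thread */
} __attribute__((aligned(32)));

/* Offset of the first byte after a member, as used earlier in the thread. */
#define offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/*
 * A field is populated by the kernel only if it ends at or before
 * __rseq_size, which libc would set from auxv_rseq_offset_end.
 */
static bool rseq_field_available(uint32_t rseq_size, size_t field_end)
{
	return rseq_size >= field_end;
}
```

A user would then guard each access with something like
rseq_field_available(__rseq_size, offsetofend(struct rseq_sketch, tg_vcpu_id))
before reading the field.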

Of course, the kernel would still allow the original 32-byte rseq_len argument
for the rseq registration, so the original ABI still works. It would however
reject any rseq registration with a size smaller than auxv_rseq_alloc_size
(other than the 32-byte special case).
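
The registration-side rule could be sketched as follows. This is not code
from the patch: rseq_check_len() and ORIG_RSEQ_SIZE are hypothetical names,
and returning -EINVAL on rejection is an assumption.

```c
#include <errno.h>
#include <stdint.h>

#define ORIG_RSEQ_SIZE 32u	/* original ABI registration length */

/*
 * Accept the original 32-byte length, or any length covering the
 * allocation size advertised through the auxv; reject everything else.
 */
static int rseq_check_len(uint32_t rseq_len, uint32_t auxv_rseq_alloc_size)
{
	if (rseq_len == ORIG_RSEQ_SIZE)
		return 0;
	if (rseq_len >= auxv_rseq_alloc_size)
		return 0;
	return -EINVAL;		/* assumed error code */
}
```

With an advertised allocation size of 64, this accepts 32, 64 and 128, and
rejects 16 and 48.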

Is that in line with what you have in mind? Do we really need to expose those 3
auxv variables independently, or can we somehow remove auxv_rseq_alloc_size and
use auxv_rseq_offset_end as a minimum value for allocation instead?

Thanks,

Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com