From: David Woodhouse <dwmw2@infradead.org>
To: "Thomas Weißschuh" <thomas.weissschuh@linutronix.de>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
Helge Deller <deller@gmx.de>, Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Frederic Weisbecker <frederic@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Theodore Ts'o <tytso@mit.edu>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Huacai Chen <chenhuacai@kernel.org>,
WANG Xuerui <kernel@xen0n.name>,
Russell King <linux@armlinux.org.uk>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Naveen N Rao <naveen@kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Arnd Bergmann <arnd@arndb.de>, Guo Ren <guoren@kernel.org>,
linux-parisc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-riscv@lists.infradead.org, loongarch@lists.linux.dev,
linux-s390@vger.kernel.org, linux-mips@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-arch@vger.kernel.org,
Nam Cao <namcao@linutronix.de>,
linux-csky@vger.kernel.org, "Ridoux, Julien" <ridouxj@amazon.com>,
"Luu, Ryan" <rluu@amazon.com>, kvm <kvm@vger.kernel.org>
Subject: Re: [PATCH v3 00/18] vDSO: Introduce generic data storage
Date: Fri, 07 Feb 2025 10:15:49 +0000
Message-ID: <0a6b88c0edd85a2ae0886e5454afea09cfcd3a24.camel@infradead.org>
In-Reply-To: <20250206110648-ec4cf3d0-0aef-4feb-a859-c69e53ab110c@linutronix.de>
On Thu, 2025-02-06 at 11:59 +0100, Thomas Weißschuh wrote:
> On Thu, Feb 06, 2025 at 09:31:42AM +0000, David Woodhouse wrote:
> > On Tue, 2025-02-04 at 13:05 +0100, Thomas Weißschuh wrote:
> > > Currently each architecture defines the setup of the vDSO data page on
> > > its own, mostly through copy-and-paste from some other architecture.
> > > Extend the existing generic vDSO implementation to also provide generic
> > > data storage.
> > > This removes duplicated code and paves the way for further changes to
> > > the generic vDSO implementation without having to go through a lot of
> > > per-architecture changes.
> > >
> > > Based on v6.14-rc1 and intended to be merged through the tip tree.
>
> Note: The real answer will need to come from the timekeeping
> maintainers, my personal two cents below.
>
> > Thanks for working on this. Is there a plan to expose the time data
> > directly to userspace in a form which is usable *other* than by
> > function calls which get the value of the clock at a given moment?
>
> There are no current plans that I am aware of.
>
> > For populating the vmclock device¹ we need to know the actual
> > relationship between the hardware counter (TSC, arch timer, etc.) and
> > real time in order to propagate that to the guest.
> >
> > I see two options for doing this:
> >
> > 1. Via userspace, exposing the vdso time data (and a notification when
> > it changes?) and letting the userspace VMM populate the vmclock.
> > This is complex for x86 because of TSC scaling; in fact userspace
> > doesn't currently know the precise scaling from host to guest TSC
> > so we'd have to be able to extract that from KVM.
>
> Exposing the raw vdso time data is problematic as it precludes any
> evolution to its datastructures, like the one we are currently doing.
>
> An additional, trimmed down and stable data structure could be used.
> But I don't think it makes sense. The vDSO is all about a stable
> highlevel function interface on top of an unstable data interface.
> However the vmclock needs the lowlevel data to populate its own
> datastructure, wrapping raw data access in function calls is unnecessary.
> If no functions are involved then the vDSO is not needed. The data can
> be maintained separately in any other place in the kernel and accessed
> or mapped by userspace from there.
> Also the vDSO does not have an active notification mechanism, this would
> probably be implemented through a filedescriptor, but then the data
> can also be mapped through exactly that fd.
>
> > 2. In kernel, asking KVM to populate the vmclock structure much like
> > it does other pvclocks shared with the guest. KVM/x86 already uses
> > pvclock_gtod_register_notifier() to hook changes; should we expand
> > on that? The problem with that notifier is that it seems to be
> > called far more frequently than I'd expect.
>
> This sounds better, especially as any custom ABI from the host kernel to
> the VMM would look a lot like the vmclock structure anyways.
>
> Timekeeper updates are indeed very frequent, but what are the concrete
> issues? That frequency is fine for regular vDSO data page updates,
> updating the vmclock data page should be very similar.
> The timekeeper core can pass context to the notifier callbacks, maybe
> this can be used to skip some expensive steps where possible.

In the context of a hypervisor with lots of guests running, that's a
lot of pointless steal time. But it isn't just that; ISTR the result
was also *inaccurate*.

I need to go back and reproduce the testing, but I think it was
constantly adjusting the apparent rate even with no changed inputs from
NTP. Where the number of clock counts per jiffy wasn't an integer, the
notification would be constantly changing, for example to report 333333
counts per jiffy for most of the time, and occasionally 333334 counts
for a single jiffy before flipping back again. Or something like that.
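
A minimal sketch of the arithmetic behind that kind of flip-flop, in
Python. This is not the kernel's timekeeping code. It only shows how an
integer count per tick must alternate when the exact ratio is fractional.
The 100 MHz counter and the 300 Hz tick rate are assumptions chosen to
reproduce the 333333/333334 figures above, and counts_per_tick() is a
hypothetical helper.

```python
def counts_per_tick(counter_hz, tick_hz, ticks):
    """Return the whole counter increments attributed to each tick.

    Illustration only: the fractional remainder is carried forward, so
    a tick occasionally gets one extra count to keep the total exact.
    """
    reported = []
    elapsed = 0
    for n in range(1, ticks + 1):
        # Whole counts elapsed by the end of tick n (integer division).
        total = counter_hz * n // tick_hz
        reported.append(total - elapsed)
        elapsed = total
    return reported


if __name__ == "__main__":
    # Assumed values: 100 MHz counter, 300 ticks per second,
    # i.e. 333333.33... counts per tick.
    for tick, counts in enumerate(counts_per_tick(100_000_000, 300, 6), 1):
        print(tick, counts)
```

With these assumed values the per-tick figure is 333333 for two ticks
and 333334 for the third, although the underlying rate never changes. A
consumer that treats each per-tick value as a new rate would see a
change on nearly every update.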