From: David Woodhouse <dwmw2@infradead.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>,
	Peter Hilber <peter.hilber@opensynergy.com>,
	linux-kernel@vger.kernel.org,  virtualization@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,  linux-rtc@vger.kernel.org,
	"Ridoux, Julien" <ridouxj@amazon.com>,
	 virtio-dev@lists.linux.dev, "Luu, Ryan" <rluu@amazon.com>,
	"Chashper, David" <chashper@amazon.com>,
	"Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>,
	 "Christopher S . Hall" <christopher.s.hall@intel.com>,
	Jason Wang <jasowang@redhat.com>,
	John Stultz <jstultz@google.com>,
	 netdev@vger.kernel.org, Stephen Boyd <sboyd@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Marc Zyngier <maz@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Alessandro Zummo <a.zummo@towertech.it>,
	 Alexandre Belloni <alexandre.belloni@bootlin.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Simon Horman <horms@kernel.org>
Subject: Re: [PATCH] ptp: Add vDSO-style vmclock support
Date: Thu, 25 Jul 2024 22:29:18 +0100
Message-ID: <c5a48c032a2788ecd98bbcec71f6f3fb0fb65e8c.camel@infradead.org>
In-Reply-To: <20240725170328-mutt-send-email-mst@kernel.org>


On Thu, 2024-07-25 at 17:04 -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2024 at 10:00:24PM +0100, David Woodhouse wrote:
> > On Thu, 2024-07-25 at 16:50 -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 25, 2024 at 08:35:40PM +0100, David Woodhouse wrote:
> > > > On Thu, 2024-07-25 at 12:38 -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
> > > > > > The use case isn't necessarily for all users of gettimeofday(), of
> > > > > > course; this is for those applications which *need* precision time.
> > > > > > Like distributed databases which rely on timestamps for coherency, and
> > > > > > users who get fined millions of dollars when LM messes up their clocks
> > > > > > and they put wrong timestamps on financial transactions.
> > > > > 
> > > > > I would however worry that with all this pass through,
> > > > > applications have to be coded to each hypervisor or even
> > > > > version of the hypervisor.
> > > > 
> > > > Yes, that would be a problem. Which is why I feel it's so important to
> > > > harmonise the contents of the shared memory, and I'm implementing it
> > > > in both QEMU and $DAYJOB, as well as aligning with virtio-rtc.
> > > 
> > > 
> > > Writing an actual spec for this would be another thing that might help.

Potentially, although working over it with our internal clock team and
with Peter on virtio-rtc has put us in good shape. I'm confident now
that we have something that's viable and extensible enough.

> > > 
> > > > > virtio has been developed with the painful experience that we keep
> > > > > making mistakes, or coming up with new needed features,
> > > > > and that maintaining forward and backward compatibility
> > > > > becomes a whole lot harder than it seems in the beginning.
> > > > 
> > > > Yes. But as you note, this shared memory structure is a userspace ABI
> > > > all of its own, so we get to make a completely *different* kind of
> > > > mistake :)
> > > > 
> > > 
> > > 
> > > So, something I still don't completely understand.
> > > Can't the VDSO thing be written to by kernel?
> > > Let's say on LM, an interrupt triggers and kernel copies
> > > data from a specific device to the VDSO.
> > > 
> > > Is that problematic somehow? I imagine there is a race where
> > > userspace reads vdso after lm but before kernel updated
> > > vdso - is that the concern?

Yes.

> > > Then can't we fix it by interrupting all CPUs right after LM?
> > > 
> > > To me that seems like a cleaner approach - we then compartmentalize
> > > the ABI issue - kernel has its own ABI against userspace,
> > > devices have their own ABI against kernel.
> > > It'd mean we need a way to detect that interrupt was sent,
> > > maybe yet another counter inside that structure.
> > > 
> > > WDYT?
> > > 
> > > By the way the same idea would work for snapshots -
> > > some people wanted to expose that info to userspace, too.

Those people included me. I wanted to interrupt all the vCPUs, even the
ones which were in userspace at the moment of migration, and have the
kernel deal with passing it on to userspace via a different ABI.

It ends up being complex and intricate, and requiring a lot of new
kernel and userspace support. I gave up on it in the end for snapshots,
and didn't go there again for this.

By contrast, a driver which merely exposes a page of MMIO space
identified by an ACPI device (without even the in-kernel PTP support)
could probably be fewer than a hundred lines of code. It could live in
an externally-buildable module that works as far back as RHEL8 or even
further, allowing users to just build it and use it from their
application.
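
As a rough illustration of why the application side can stay simple
once such a page is mapped, here is a minimal sketch of a sequence-
counted read with a migration counter. The struct layout and every
demo_* name are made up for this example and are not the vmclock
ABI; demo_host_update() merely stands in for the hypervisor writing
the page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; this is NOT the vmclock ABI. */
struct demo_clock_page {
	_Atomic uint32_t seq;                /* odd while the host is mid-update */
	volatile uint64_t disruption_marker; /* bumped on live migration */
	volatile uint64_t counter_value;     /* reference counter reading */
	volatile uint64_t time_ns;           /* time matching counter_value */
};

struct demo_snapshot {
	uint64_t disruption_marker;
	uint64_t counter_value;
	uint64_t time_ns;
};

/* Stand-in for the hypervisor side: publish a new reading. */
static void demo_host_update(struct demo_clock_page *p, uint64_t counter,
			     uint64_t ns, int migrated)
{
	uint32_t s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* single writer, so no update is in flight */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	if (migrated)
		p->disruption_marker++;
	p->counter_value = counter;
	p->time_ns = ns;
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Guest side: retry until a consistent copy of the page is obtained. */
static void demo_read(struct demo_clock_page *p, struct demo_snapshot *out)
{
	uint32_t start, end;

	do {
		/* Wait for any in-progress update (odd seq) to finish. */
		do {
			start = atomic_load_explicit(&p->seq,
						     memory_order_acquire);
		} while (start & 1);

		out->disruption_marker = p->disruption_marker;
		out->counter_value = p->counter_value;
		out->time_ns = p->time_ns;

		/* Order the data reads before re-checking the sequence. */
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while (start != end);
}

/* True if a migration happened between the two snapshots. */
static int demo_disrupted(const struct demo_snapshot *before,
			  const struct demo_snapshot *after)
{
	return before->disruption_marker != after->disruption_marker;
}
```

An application would take a snapshot with demo_read() each time it
needs a timestamp and compare it against the previous one with
demo_disrupted(). A changed marker tells it that whatever clock
calibration it was relying on is no longer trustworthy, with no
window in which it can miss the event.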

> was there supposed to be text here, or did you just like this
> so much you decided to repost my mail ;) 

Hm, weirdness. I've known Evolution get into a state where it sends
completely *empty* messages, but I've never seen it eat only my own
part before. I had definitely typed responses (along the lines of the
above) last time.

