linux-arm-kernel.lists.infradead.org archive mirror
From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Ricardo Koller <ricarkol@google.com>,
	Simon Veith <sveith@amazon.de>,
	dwmw2@infradead.org
Subject: Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
Date: Wed, 22 Feb 2023 11:56:53 +0000
Message-ID: <86bkllyku2.wl-maz@kernel.org>
In-Reply-To: <Y+/7mO1sxH4jThmu@linux.dev>

On Fri, 17 Feb 2023 22:11:36 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Fri, Feb 17, 2023 at 10:17:27AM +0000, Marc Zyngier wrote:
> > Hi Oliver,
> > 
> > On Thu, 16 Feb 2023 22:09:47 +0000,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On Thu, Feb 16, 2023 at 02:21:15PM +0000, Marc Zyngier wrote:
> > > > And this is the moment you have all been waiting for: setting the
> > > > counter offsets from userspace.
> > > > 
> > > > We expose a brand new capability that reports the ability to set
> > > > the offsets for both the virtual and physical sides, independently.
> > > > 
> > > > In keeping with the architecture, the offsets are expressed as
> > > > > a delta that is subtracted from the physical counter value.
> > > > 
> > > > Once this new API is used, there is no going back, and the counters
> > > > cannot be written to to set the offsets implicitly (the writes
> > > > are instead ignored).
> > > 
> > > Is there any particular reason to use an explicit ioctl as opposed to
> > > the KVM_{GET,SET}_DEVICE_ATTR ioctls? Dunno where you stand on it, but I
> > > quite like that interface for simple state management. We also avoid
> > > eating up more UAPI bits in the global namespace.
> > 
> > The problem with that is that it requires yet another KVM device for
> > this, and I'm lazy. It also makes it a bit harder for the VMM to buy
> > into this (need to track another FD, for example).
> 
> You can also accept the device ioctls on the actual VM FD, quite like
> we do for the vCPU right now. And hey, I've got a patch that gets you
> most of the way there!
> 
> https://lore.kernel.org/kvmarm/20230211013759.3556016-3-oliver.upton@linux.dev/

Huh... I don't know yet if I love it or hate it. At the end of the day,
this is just another ioctl, so I don't care either way.

> > > Is there any reason why we can't just order this ioctl before vCPU
> > > creation altogether, or is there a need to do this at runtime? We're
> > > about to tolerate multiple writers to the offset value, and I think the
> > > only thing we need to guarantee is that the below flag is set before
> > > vCPU ioctls have a chance to run.
> > 
> > Again, we don't know for sure whether the final offset is available
> > before vcpu creation time. My idea for QEMU would be to perform the
> > offset adjustment as late as possible, right before executing the VM,
> > after having restored the vcpus with whatever value they had.
> 
> So how does userspace work out an offset based on available information?
> The part that hasn't clicked for me yet is where userspace gets the
> current value of the true physical counter to calculate an offset.

What's wrong with CNTVCT_EL0?

> We could make it ABI that the guest's physical counter matches that of
> the host by default. Of course, that has been the case since the
> beginning of time but it is now directly user-visible.
> 
> The only part I don't like about that is that we aren't fully creating
> an abstraction around host and guest system time. So here's my current
> mental model of how we represent the generic timer to userspace:
> 
>                               +-----------------------+
>                               |                       |
>                               | Host System Counter   |
>                               |          (1)          |
>                               +-----------------------+
>                                           |
>                               +-----------+-----------+
>                               |                       |
>       +-----------------+  +-----+                 +-----+  +--------------------+
>       | (2) CNTPOFF_EL2 |--| sub |                 | sub |--| (3) CNTVOFF_EL2    |
>       +-----------------+  +-----+                 +-----+  +--------------------+
>                               |                       |
>                               |                       |
>                      +-----------------+         +----------------+
>                      | (5) CNTPCT_EL0  |         | (4) CNTVCT_EL0 |
>                      +-----------------+         +----------------+
> 
> AFAICT, this UAPI exposes abstractions for (2) and (3) to userspace, but
> userspace cannot directly get at (1).

Of course it can! CNTVCT_EL0 is accessible from userspace, and is
guaranteed to have an offset of 0 on a host.
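
As a minimal sketch of the arithmetic (not code from this series): since
the offset is a delta subtracted from the counter, a VMM that samples the
host counter can derive the offset for any count it wants the guest to
observe. The helper names are hypothetical, and host_now stands in for a
CNTVCT_EL0 read taken on the host.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the offset that makes the guest observe
 * guest_target at the instant the host counter reads host_now.
 * Unsigned 64-bit arithmetic wraps modulo 2^64, so this also works
 * when the guest count is ahead of the host count.
 */
static inline uint64_t counter_offset_for(uint64_t host_now,
					  uint64_t guest_target)
{
	return host_now - guest_target;
}

/* What the guest reads: the host counter minus the offset. */
static inline uint64_t guest_counter(uint64_t host_now, uint64_t offset)
{
	return host_now - offset;
}
```

Once the offset is fixed, the guest count advances in lockstep with the
host count, which is all that is needed to keep it continuous.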

> 
> Chewing on this a bit more, I don't think userspace has any business
> messing with virtual and physical time independently, especially when
> nested virtualization comes into play.

Well, NV already ignores the virtual offset completely (see how the
virtual timer gets its offset reassigned at reset time).

> 
> I think the illusion to userspace needs to be built around the notion of
> a system counter:
> 
>                               +-----------------------+
>                               |                       |
>                               | Host System Counter   |
>                               |          (1)          |
>                               +-----------------------+
>                                           |
>                                           |
>                                        +-----+   +-------------------+
>                                        | sub |---| (6) system_offset |
>                                        +-----+   +-------------------+
>                                           |
>                                           |
>                               +-----------------------+
>                               |                       |
>                               | Guest System Counter  |
>                               |          (7)          |
>                               +-----------------------+
>                                           |
>                               +-----------+-----------+
>                               |                       |
>       +-----------------+  +-----+                 +-----+  +--------------------+
>       | (2) CNTPOFF_EL2 |--| sub |                 | sub |--| (3) CNTVOFF_EL2    |
>       +-----------------+  +-----+                 +-----+  +--------------------+
>                               |                       |
>                               |                       |
>                      +-----------------+         +----------------+
>                      | (5) CNTPCT_EL0  |         | (4) CNTVCT_EL0 |
>                      +-----------------+         +----------------+
> 
> And from a UAPI perspective, we would either expose (1) and (6) to let
> userspace calculate an offset or simply allow (7) to be directly
> read/written.

I previously toyed with this idea, and I really like it. However, the
problem with this is that it breaks the current behaviour of having
two different values for CNTVCT and CNTPCT in the guest, and CNTPCT
representing the counter value on the host.

Such a VM cannot be migrated *today*, but not everybody cares about
migration. My "dual offset" approach allows the current behaviour to
persist, and such a VM to be migrated. The luser even gets the choice
of preserving counter continuity in the guest or to stay without a
physical offset and reflect the host's counter.
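
To make the choice concrete, here is a rough sketch of how a VMM might
pick the two offsets when restoring a VM on a new host. It is an
illustration only: the struct and function names are hypothetical and are
not the UAPI proposed in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of offsets, one per counter. */
struct counter_offsets {
	uint64_t voff;	/* subtracted to produce CNTVCT */
	uint64_t poff;	/* subtracted to produce CNTPCT */
};

/* Guest counter values captured when the VM was saved. */
struct saved_counters {
	uint64_t cntvct;
	uint64_t cntpct;
};

/*
 * Pick offsets on the destination host. The virtual counter always
 * resumes from its saved value. The physical counter either resumes
 * too (preserve_physical), or keeps a zero offset and so reflects
 * the destination host's counter.
 */
static struct counter_offsets
offsets_for_restore(uint64_t host_now, struct saved_counters saved,
		    bool preserve_physical)
{
	struct counter_offsets o;

	o.voff = host_now - saved.cntvct;
	o.poff = preserve_physical ? host_now - saved.cntpct : 0;
	return o;
}
```

With preserve_physical set to false, the guest keeps seeing the host
counter through CNTPCT, which is the existing behaviour described above.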

Is it a good behaviour? Of course not. Does anyone depend on it? I
have no idea, but odds are that someone does. Can we break their toys?
The jury is still out.

> 
> That frees up the meaning of the counter offsets as being purely a
> virtual EL2 thing. These registers would reset to 0, and non-NV guests
> could never change their value.
> 
> Under the hood KVM would program the true offset registers as:
> 
> 	CNT{P,V}OFF_EL2 = 'virtual CNT{P,V}OFF_EL2' + system_offset
> 
> With this we would effectively configure CNTPCT = CNTVCT = 0 at the
> point of VM creation. Only crappy thing is it requires full physical
> counter/timer emulation for non-ECV systems, but the guest shouldn't be
> using the physical counter in the first place.

And I think that's the point where we differ. I can completely imagine
some in-VM code using the physical counter to export some timestamping
to the host (for tracing purposes, amongst other things).
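
For reference, the composition quoted above can be modelled as plain
arithmetic. This is only a sketch of Oliver's proposal with hypothetical
names, not how KVM programs the registers; the numbers in the comments
refer to the labels in his second diagram.

```c
#include <stdint.h>

/* Hypothetical model of the proposed layering. */
struct guest_timer_model {
	uint64_t system_offset;	/* (6) host-to-guest system counter delta */
	uint64_t virt_cntpoff;	/* guest-visible CNTPOFF_EL2, resets to 0 */
	uint64_t virt_cntvoff;	/* guest-visible CNTVOFF_EL2, resets to 0 */
};

/* Value the host would program: virtual offset plus system_offset. */
static inline uint64_t hw_cntpoff(const struct guest_timer_model *m)
{
	return m->virt_cntpoff + m->system_offset;
}

static inline uint64_t hw_cntvoff(const struct guest_timer_model *m)
{
	return m->virt_cntvoff + m->system_offset;
}

/* (7) guest system counter, derived from (1) the host system counter. */
static inline uint64_t guest_system_counter(const struct guest_timer_model *m,
					    uint64_t host_now)
{
	return host_now - m->system_offset;
}

/* (5) and (4): what the guest reads from CNTPCT_EL0 and CNTVCT_EL0. */
static inline uint64_t guest_cntpct(const struct guest_timer_model *m,
				    uint64_t host_now)
{
	return host_now - hw_cntpoff(m);
}

static inline uint64_t guest_cntvct(const struct guest_timer_model *m,
				    uint64_t host_now)
{
	return host_now - hw_cntvoff(m);
}
```

Setting system_offset to the host count at VM creation, with both virtual
offsets at 0, gives CNTPCT = CNTVCT = 0 at that instant, as described in
the quoted text.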

> Yes, this sucks for guests running on hosts w/ NV but not ECV. If anyone
> can tell me how an L0 hypervisor is supposed to do NV without ECV, I'm
> all ears.

You absolutely can run with NV2 without ECV. You just get a bad
quality of emulation for the EL0 timers. But that's about it.

> Does any of what I've written make remote sense or have I gone entirely
> off the rails with my ASCII art? :)

Your ASCII art is beautiful, only a tad too wide! ;-) What you suggest
makes a lot of sense, but it leaves existing behaviours in the lurch.
Can we pretend they don't exist? You tell me!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


