From: Marc Zyngier <maz@kernel.org>
To: Simon Veith <sveith@amazon.de>
Cc: <dwmw2@infradead.org>, Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, James Morse <james.morse@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Oliver Upton <oliver.upton@linux.dev>,
Zenghui Yu <yuzenghui@huawei.com>,
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] arm64: kvm: Expose timer offset directly via KVM_{GET,SET}_ONE_REG
Date: Thu, 02 Feb 2023 13:50:58 +0000
Message-ID: <86a61w18hp.wl-maz@kernel.org>
In-Reply-To: <20230202121314.206195-1-sveith@amazon.de>
Hi Simon,
On Thu, 02 Feb 2023 12:13:14 +0000,
Simon Veith <sveith@amazon.de> wrote:
>
> The virtual timer count register (CNTVCT_EL0) is virtualized by
> configuring offset register CNTVOFF_EL2 to subtract from the underlying
> raw hardware timer count when the guest reads the current count.
>
> Currently, we offer userspace the ability to serialize and deserialize
> only the absolute count register value, using KVM_{GET,SET}_ONE_REG with
> KVM_REG_ARM_TIMER_CNT. Internally, we then compute and set the offset
> register accordingly to obtain the requested count value.
>
> Allowing to set this timer count register only by absolute value poses
> some problems to virtual machine monitors that try to maintain the
> illusion of continuously ticking clocks to the guest: In workflows like
> live migration or liveupdate, the timers must be increased artificially
> to account for pause time.
"must" is a pretty strong word. Given that this isn't advertised as
stolen time to the guest, any sort of time-sensitive process (such as
an in-guest watchdog) is likely to be ticked the wrong way if you
start adding that time to the counter.

For example, QEMU doesn't do that and wants time continuity, hence
the current behaviour.
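
To make the two behaviours concrete, here is a minimal Python model of the
arithmetic. It is not kernel code. It assumes only what the quoted
description states, namely that the guest-visible count is the physical
count minus the offset, modulo 2^64. The class and method names are
hypothetical.

```python
MASK64 = (1 << 64) - 1  # counters are 64-bit and wrap


class VirtualTimer:
    """Toy model: guest count = physical count - offset (mod 2^64)."""

    def __init__(self):
        self.offset = 0

    def read_count(self, phys):
        return (phys - self.offset) & MASK64

    def set_count(self, phys, count):
        # Model of setting the absolute count: derive the offset from it.
        self.offset = (phys - count) & MASK64


# Save at physical count 1000, pause, restore at physical count 5000.
timer = VirtualTimer()
saved = timer.read_count(1000)

# Time continuity: restore the saved value, so the pause is invisible.
timer.set_count(5000, saved)
continuous = timer.read_count(5000)

# Pause accounting: add the 4000 paused ticks, so the guest jumps forward.
timer.set_count(5000, saved + 4000)
jumped = timer.read_count(5000)
```

In the first case the guest resumes exactly where it stopped; in the
second it observes 4000 ticks it never ran for, which is the situation a
guest watchdog would react to.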
>
> Any delays between userspace computing the correct timer count value and
> actually setting it in kernel space by KVM_SET_ONE_REG (such as can be
> incurred by scheduling) become visible as under-accounted pause time in
> the guest, meaning the guest observes that its system clock seems to
> have fallen behind its NTP time reference.
>
> The issue is further complicated when vCPU setup is performed by
> independent threads which may experience different delays, leading to
> jitter between the clocks of different vCPUs.
How? I really hope that you will have restored *all* the vcpus before
restarting any. If you don't, your userspace is buggy.
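
For reference, the delay effect described in the quoted paragraphs can be
modelled with the same arithmetic. This is a sketch under the assumption
that setting the absolute count derives the offset from the physical count
at the moment the write lands; the function names and tick values are
hypothetical.

```python
MASK64 = (1 << 64) - 1  # counters are 64-bit and wrap


def offset_for_count(phys_now, count):
    """Offset that makes the guest read `count` when the host reads `phys_now`."""
    return (phys_now - count) & MASK64


def guest_count(phys_now, offset):
    return (phys_now - offset) & MASK64


# Userspace computes the target count when the physical counter reads 5000...
target = 5000

# ...but the write only lands at physical count 5300 (e.g. scheduling delay).
off_late = offset_for_count(5300, target)
behind = 5300 - guest_count(5300, off_late)

# An offset computed up front does not depend on when it is applied.
off_direct = offset_for_count(5000, target)
behind_direct = 5300 - guest_count(5300, off_direct)
```

Here `behind` is the 300 ticks of delay that go unaccounted, while
`behind_direct` is zero. The model says nothing about whether different
vcpus can actually observe different values, which is the question raised
above.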
>
> We could deliver a more stable timer in such scenarios if we allowed
> userspace to set the offset with regards to the physical counter
> directly.
>
> Expose the KVM_REG_ARM_TIMER_OFF register directly to userspace, as an
> alternative view of the timer counts. By default, userspace still sees
> only the existing KVM_REG_ARM_TIMER_CNT register when querying the list
> with KVM_GET_REG_LIST, as that register value is portable across
> different VM hosts and thus safe to persist.
I can see a few things are not quite right with this approach:

- You hijack a register that isn't an EL1 register. This should never
  be exposed to userspace as such, as it would otherwise change
  behaviour with NV, which is definitely in control of it.

- What is the ordering between restoring the timer value and restoring
  the timer offset? Both do the same thing, and impact all vcpus. How
  does it make anything better if your userspace (such as QEMU) saves
  *all* the available registers and restores them all, on all vcpus?

- You make this a per-vcpu value. But what this does is to provide an
  offset for the *whole VM*. Why not take the bullet and simply make
  this a per-VM adjustment?

- What about the physical timer? Doesn't it need some similar
  treatment as well, irrespective of the presence of ECV?
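
The ordering question in the second point can be illustrated with a small
Python model. It assumes that both register views simply update one shared,
VM-wide offset, so the last write wins. That is an assumption made for
illustration, not a statement of what the patch does. "CNT" and "OFF" stand
in for KVM_REG_ARM_TIMER_CNT and the proposed KVM_REG_ARM_TIMER_OFF; all
other names and values are hypothetical.

```python
MASK64 = (1 << 64) - 1  # counters are 64-bit and wrap


class VmTimerState:
    """Toy model: a single VM-wide offset with two register views."""

    def __init__(self):
        self.offset = 0

    def set_reg(self, name, value, phys_now):
        if name == "CNT":    # absolute count view: derive the offset
            self.offset = (phys_now - value) & MASK64
        elif name == "OFF":  # direct offset view
            self.offset = value & MASK64
        else:
            raise ValueError(name)


def restore(order, regs, phys_now):
    """Replay the saved registers in the given order, return the final offset."""
    vm = VmTimerState()
    for name in order:
        vm.set_reg(name, regs[name], phys_now)
    return vm.offset


# Saved on the source host: count 1000, offset 200.
regs = {"CNT": 1000, "OFF": 200}

# Restored on a destination whose physical counter reads 9000.
cnt_last = restore(["OFF", "CNT"], regs, 9000)
off_last = restore(["CNT", "OFF"], regs, 9000)
```

Under that assumption, a VMM that blindly saves and restores every
register ends up with whichever view it happened to write last, and the
two results differ.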
We have been around that particular block a few times in the past, and
I may have changed my mind more than once. But as the NV code has
finally reached a point where these things matter, we really shouldn't
go in a direction where we'd end up with varying semantics depending
on whether CNT{P,V}OFF_EL2 is under control of the host or the guest.

It should also be a feature that is advertised, and bought into by
the VMM. It cannot be an implicit behaviour.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.