From: jan.kiszka@web.de (Jan Kiszka)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
Date: Thu, 09 Jul 2015 12:40:39 +0200 [thread overview]
Message-ID: <559E4FA7.2060307@web.de> (raw)
In-Reply-To: <20150709102201.GH13530@cbox>
On 2015-07-09 12:22, Christoffer Dall wrote:
> Hi Peter and Marc,
>
> [cc'ing Paolo for his input on x86 timekeeping]
>
> On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
>> On 8 July 2015 at 17:37, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 08/07/15 17:06, Peter Maydell wrote:
>>>> I'd prefer it if somebody could investigate to see why QEMU
>>>> is actually doing this -- so far we just have speculation.
>>>
>>> I'd prefer that too, but so far people seem to be more comfortable
>>> waiting for the issue to fix itself. In the meantime, VMs are broken in
>>> weird and wonderful ways, and I don't think the current status-quo helps
>>> anyone.
>>
>> Putting in a patch which might not be the right fix isn't
>> necessarily a good plan either...
>>
>> Does has_run_once get cleared if we do a re-VCPU_INIT
>> of a CPU that's run before? (We need to allow rewriting
>> of guest state at that point so that "reset VM and
>> load migration state" behaves correctly.)
>
> no, it does not, has_run_once is set the first time a VCPU is run and is
> currently *never* cleared.
>
>>
>> I suspect Jan is right and we really need to distinguish
>> the KVM_PUT_*_STATE levels in ARM QEMU. This probably
>> implies some kind of whitelist/override mechanism, since
>> by and large we neither know nor want to know the
>> semantics for system registers, we leave that up to the
>> kernel.
>>
>> Q: if you have a running VM, and you pause it for
>> an hour, what should the CNTVCT register do? Presumably
>> it should not advance, but how do we arrange for that
>> to happen?
>>
>
> I think the CNTVCT should not advance when the VM is not scheduled, so
> if we pause the VM or starve all the VCPUs for enough time, the guest
> should not see time progressing, since otherwise the guest scheduler
> cannot maintain fairness and you're bound to see spurious RCU stalls
> etc.
>
> That's exactly why a guest can read both a virtual and physical counter
> and it is an area where you simply want some level of
> paravirtualization. I haven't studied how/if Linux deals with this at
> all.
>
> So I think adjusting CNTVOFF should be managed by the kernel for the
> pause/starvation scenario (which I think Avi once told me x86 does too -
> does anyone know the current state of the art?).
>
> So the only situation where I think userspace should adjust the CNTVOFF
> value is for migration where we are talking about a brand new VM with
> has_run_once clear.
>
> Thus, if we were designing this from scratch now, the API should
> be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
> VM has run once, but it's too late for that as we would break userspace.
> The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
> in the kernel side as well, and finally also fix QEMU so that it doesn't
> try to do the thing that future kernels will ignore.
Fixing QEMU to only write on KVM_PUT_FULL_STATE - yes, that should be
done, but I don't think the approach for the kernel is generally right.
The kernel should not do any policing on user space requests to change
the VCPU or VM state unless
- security is affected
- userspace lacks information, thus cannot decide correctly
- legacy userspace has a bug, we can detect it and want to fix that up
without affecting future userspace that has a reason to do it
differently
Regarding CNTVOFF, the first two criteria do not apply for sure. Maybe
the last one, don't know. Just think of the hypothetical scenario that a
userspace VM debugger wants to inject certain register manipulations. If
you block this by some hidden VM state like proposed, that feature would
no longer be implementable easily.
Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150709/4a8a2d51/attachment.sig>
next prev parent reply other threads:[~2015-07-09 10:40 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-24 14:54 [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running Marc Zyngier
2015-06-25 8:04 ` Christoffer Dall
2015-06-25 8:48 ` Marc Zyngier
2015-06-25 8:59 ` Claudio Fontana
2015-06-25 9:10 ` Peter Maydell
2015-06-25 9:25 ` Claudio Fontana
2015-06-26 4:49 ` Jan Kiszka
2015-06-29 17:20 ` Claudio Fontana
2015-06-29 17:37 ` Peter Maydell
2015-07-08 15:56 ` Marc Zyngier
2015-07-08 16:06 ` Peter Maydell
2015-07-08 16:37 ` Marc Zyngier
2015-07-08 19:13 ` Peter Maydell
2015-07-09 10:22 ` Christoffer Dall
2015-07-09 10:38 ` Peter Maydell
2015-07-09 12:05 ` Christoffer Dall
2015-07-09 12:07 ` Peter Maydell
2015-07-09 12:24 ` Christoffer Dall
2015-07-09 14:17 ` Christoffer Dall
2015-07-09 14:26 ` Peter Maydell
2015-07-09 16:06 ` Christoffer Dall
2015-07-09 10:40 ` Jan Kiszka [this message]
2015-07-09 12:08 ` Christoffer Dall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=559E4FA7.2060307@web.de \
--to=jan.kiszka@web.de \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).