[RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: jan.kiszka@web.de (Jan Kiszka)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running
Date: Thu, 09 Jul 2015 12:40:39 +0200	[thread overview]
Message-ID: <559E4FA7.2060307@web.de> (raw)
In-Reply-To: <20150709102201.GH13530@cbox>

On 2015-07-09 12:22, Christoffer Dall wrote:
> Hi Peter and Marc,
> 
> [cc'ing Paolo for his input on x86 timekeeping]
> 
> On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
>> On 8 July 2015 at 17:37, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 08/07/15 17:06, Peter Maydell wrote:
>>>> I'd prefer it if somebody could investigate to see why QEMU
>>>> is actually doing this -- so far we just have speculation.
>>>
>>> I'd prefer that too, but so far people seem to be more comfortable
>>> waiting for the issue to fix itself. In the meantime, VMs are broken in
>>> weird and wonderful ways, and I don't think the current status-quo helps
>>> anyone.
>>
>> Putting in a patch which might not be the right fix isn't
>> necessarily a good plan either...
>>
>> Does has_run_once get cleared if we do a re-VCPU_INIT
>> of a CPU that's run before? (We need to allow rewriting
>> of guest state at that point so that "reset VM and
>> load migration state" behaves correctly.)
> 
> no, it does not, has_run_once is set the first time a VCPU is run and is
> currently *never* cleared.
> 
>>
>> I suspect Jan is right and we really need to distinguish
>> the KVM_PUT_*_STATE levels in ARM QEMU. This probably
>> implies some kind of whitelist/override mechanism, since
>> by and large we neither know nor want to know the
>> semantics for system registers, we leave that up to the
>> kernel.
>>
>> Q: if you have a running VM, and you pause it for
>> an hour, what should the CNTVCT register do? Presumably
>> it should not advance, but how do we arrange for that
>> to happen?
>>
> 
> I think the CNTVCT should not advance when the VM is not scheduled, so
> if we pause the VM or starve all the VCPUs for enough time, the guest
> should not see time progressing, since otherwise the guest scheduler
> cannot maintain fairness and you're bound to see spurious RCU stalls
> etc.
> 
> That's exactly why a guest can read both a virtual and physical counter
> and it is an area where you simply want some level of
> paravirtualization.  I haven't studied how/if Linux deals with this at
> all.
> 
> So I think adjusting CNTVOFF should be managed by the kernel for the
> pause/starvation scenario (which I think Avi once told me x86 does too -
> does anyone know the current state of the art?).
> 
> So the only situation where I think userspace should adjust the CNTVOFF
> value is for migration where we are talking about a brand new VM with
> has_run_once clear.
> 
> Thus, if we were designing this from scratch now, the API should
> be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
> VM has run once, but it's too late for that as we would break userspace.
> The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
> in the kernel side as well, and finally also fix QEMU so that it doesn't
> try to do the thing that future kernels will ignore.

Fixing QEMU to only write on KVM_PUT_FULL_STATE - yes, that should be
done, but I don't think the approach for the kernel is generally right.
The kernel should not do any policing on user space requests to change
the VCPU or VM state unless

 - security is affected
 - userspace lacks information, thus cannot decide correctly
 - legacy userspace has a bug, we can detect it and want to fix that up
   without affecting future userspace that has a reason to do it
   differently

Regarding CNTVOFF, the first two criteria do not apply for sure. Maybe
the last one, don't know. Just think of the hypothetical scenario that a
userspace VM debugger wants to inject certain register manipulations. If
you block this by some hidden VM state like proposed, that feature would
no longer be implementable easily.

Jan


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150709/4a8a2d51/attachment.sig>

next prev parent reply	other threads:[~2015-07-09 10:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-24 14:54 [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running Marc Zyngier
2015-06-25  8:04 ` Christoffer Dall
2015-06-25  8:48   ` Marc Zyngier
2015-06-25  8:59   ` Claudio Fontana
2015-06-25  9:10     ` Peter Maydell
2015-06-25  9:25       ` Claudio Fontana
2015-06-26  4:49         ` Jan Kiszka
2015-06-29 17:20           ` Claudio Fontana
2015-06-29 17:37             ` Peter Maydell
2015-07-08 15:56               ` Marc Zyngier
2015-07-08 16:06                 ` Peter Maydell
2015-07-08 16:37                   ` Marc Zyngier
2015-07-08 19:13                     ` Peter Maydell
2015-07-09 10:22                       ` Christoffer Dall
2015-07-09 10:38                         ` Peter Maydell
2015-07-09 12:05                           ` Christoffer Dall
2015-07-09 12:07                             ` Peter Maydell
2015-07-09 12:24                               ` Christoffer Dall
2015-07-09 14:17                                 ` Christoffer Dall
2015-07-09 14:26                                   ` Peter Maydell
2015-07-09 16:06                                     ` Christoffer Dall
2015-07-09 10:40                         ` Jan Kiszka [this message]
2015-07-09 12:08                           ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=559E4FA7.2060307@web.de \
    --to=jan.kiszka@web.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).