From: Marc Zyngier <maz@kernel.org>
To: Andrew Jones <drjones@redhat.com>
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	pbonzini@redhat.com, steven.price@arm.com
Subject: Re: [PATCH 2/4] arm64/x86: KVM: Introduce steal time cap
Date: Mon, 22 Jun 2020 10:51:47 +0100
Message-ID: <5a52210e5f123d52459f15c594e77bad@kernel.org>
In-Reply-To: <20200622084110.uosiqx3oy22lremu@kamzik.brq.redhat.com>

On 2020-06-22 09:41, Andrew Jones wrote:
> On Mon, Jun 22, 2020 at 09:20:02AM +0100, Marc Zyngier wrote:
>> Hi Andrew,
>>
>> On 2020-06-19 19:46, Andrew Jones wrote:
>> > arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
>> > support for steal time. However this is unnecessary and complicates
>> > userspace (userspace may prefer delaying vcpu creation until after
>> > feature probing). Since probing steal time only requires a KVM fd,
>> > we introduce a cap that can be checked.
>>
>> So this is purely an API convenience, right? You want a way to
>> identify the presence of steal time accounting without having to
>> create a vcpu? It would have been nice to have this requirement
>> before we merged this code :-(.
>
> Yes. I wish I had considered it more closely when I was reviewing the
> patches. And, I believe we have yet another user interface issue that
> I'm looking at now. Without the VCPU feature bit I'm not sure how easy
> it will be for a migration to fail when attempting to migrate from a
> host with steal-time enabled to one that does not support steal-time.
> So it's starting to look like steal-time should have followed the pmu
> pattern completely, not just the vcpu device ioctl part.

Should we consider disabling steal time altogether until this is worked
out?

>>
>> > Additionally, when probing
>> > steal time we should check delayacct_on, because even though
>> > CONFIG_KVM selects TASK_DELAY_ACCT, it's possible for the host
>> > kernel to have delay accounting disabled with the 'nodelayacct'
>> > command line option. x86 already determines support for steal time
>> > by checking delayacct_on and can already probe steal time support
>> > with a kvm fd (KVM_GET_SUPPORTED_CPUID), but we add the cap there
>> > too for consistency.
>> >
>> > Signed-off-by: Andrew Jones <drjones@redhat.com>
>> > ---
>> > Documentation/virt/kvm/api.rst | 11 +++++++++++
>> > arch/arm64/kvm/arm.c | 3 +++
>> > arch/x86/kvm/x86.c | 3 +++
>> > include/uapi/linux/kvm.h | 1 +
>> > 4 files changed, 18 insertions(+)
>> >
>> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> > index 9a12ea498dbb..05b1fdb88383 100644
>> > --- a/Documentation/virt/kvm/api.rst
>> > +++ b/Documentation/virt/kvm/api.rst
>> > @@ -6151,3 +6151,14 @@ KVM can therefore start protected VMs.
>> > This capability governs the KVM_S390_PV_COMMAND ioctl and the
>> > KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
>> > guests when the state change is invalid.
>> > +
>> > +8.24 KVM_CAP_STEAL_TIME
>> > +-----------------------
>> > +
>> > +:Architectures: arm64, x86
>> > +
>> > +This capability indicates that KVM supports steal time accounting.
>> > +When steal time accounting is supported it may be enabled with
>> > +architecture-specific interfaces. For x86 see
>> > +Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME". For arm64 see
>> > +Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL"
>> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> > index 90cb90561446..f6dca6d09952 100644
>> > --- a/arch/arm64/kvm/arm.c
>> > +++ b/arch/arm64/kvm/arm.c
>> > @@ -222,6 +222,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> > */
>> > r = 1;
>> > break;
>> > + case KVM_CAP_STEAL_TIME:
>> > + r = sched_info_on();
>> > + break;
>> > default:
>> > r = kvm_arch_vm_ioctl_check_extension(kvm, ext);
>> > break;
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index 00c88c2f34e4..ced6335e403e 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -3533,6 +3533,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> > case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
>> > r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
>> > break;
>> > + case KVM_CAP_STEAL_TIME:
>> > + r = sched_info_on();
>> > + break;
>> > default:
>> > break;
>> > }
>> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> > index 4fdf30316582..121fb29ac004 100644
>> > --- a/include/uapi/linux/kvm.h
>> > +++ b/include/uapi/linux/kvm.h
>> > @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
>> > #define KVM_CAP_PPC_SECURE_GUEST 181
>> > #define KVM_CAP_HALT_POLL 182
>> > #define KVM_CAP_ASYNC_PF_INT 183
>> > +#define KVM_CAP_STEAL_TIME 184
>> >
>> > #ifdef KVM_CAP_IRQ_ROUTING
>>
>> Shouldn't you also add the same check of sched_info_on() to
>> the various pvtime attribute handling functions? It feels odd
>> that the capability can say "no", and yet we'd accept userspace
>> messing with the steal time parameters...
>
> I considered that, but the 'has attr' interface is really only asking
> if the interface exists, not if it should be used. I'm not sure what
> we should do about it other than document that the cap needs to say
> it's usable, rather than just the attr presence. But, since we've had
> the attr merged quite a while without the cap, then maybe we can't
> rely on a doc change alone?

Accepting the pvtime attributes (setting up the per-vcpu area) has two
effects: we promise both the guest and userspace that we will provide
the guest with steal time. By not checking sched_info_on(), we lie to
both, with potential consequences. It really feels like a bug.
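
The consistency being asked for can be sketched as a small standalone C
model. This is not kernel code and not a proposed patch: every model_*
name is hypothetical, model_sched_info_on() merely stands in for the
kernel's sched_info_on(), and returning -ENXIO for an unusable attribute
is an assumption based on the usual device-attribute convention. The
point is only that the capability and the attribute handlers consult the
same predicate, so they cannot disagree.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's sched_info_on(). The caller
 * flips this flag to simulate a host with or without the scheduler
 * accounting that steal time depends on.
 */
static bool model_sched_info_enabled = true;

static bool model_sched_info_on(void)
{
	return model_sched_info_enabled;
}

/* Hypothetical helper: the single predicate shared by cap and attrs. */
static bool model_pvtime_supported(void)
{
	return model_sched_info_on();
}

/* Models the KVM_CAP_STEAL_TIME check: 1 if usable, 0 otherwise. */
static int model_check_steal_time_cap(void)
{
	return model_pvtime_supported() ? 1 : 0;
}

/* Models the 'has attr' probe: 0 if usable, -ENXIO otherwise (assumed). */
static int model_pvtime_has_attr(void)
{
	return model_pvtime_supported() ? 0 : -ENXIO;
}

/*
 * Models the 'set attr' path: refuse to record the per-vcpu steal time
 * area when steal time cannot actually be provided, instead of
 * accepting it and never updating it.
 */
static int model_pvtime_set_attr(uint64_t *base, uint64_t ipa)
{
	if (!model_pvtime_supported())
		return -ENXIO;
	*base = ipa;
	return 0;
}
```

With model_sched_info_enabled set to false, the cap reports 0 and both
attribute paths fail, so neither the guest nor userspace is promised
steal time that will never be delivered.
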
Thanks,
M.
--
Jazz is not dead. It just smells funny...