From: Marc Zyngier <maz@kernel.org>
To: Andrew Jones <drjones@redhat.com>
Cc: pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, steven.price@arm.com
Subject: Re: [PATCH 2/4] arm64/x86: KVM: Introduce steal time cap
Date: Mon, 22 Jun 2020 10:51:47 +0100
Message-ID: <5a52210e5f123d52459f15c594e77bad@kernel.org>
In-Reply-To: <20200622084110.uosiqx3oy22lremu@kamzik.brq.redhat.com>

On 2020-06-22 09:41, Andrew Jones wrote:
> On Mon, Jun 22, 2020 at 09:20:02AM +0100, Marc Zyngier wrote:
>> Hi Andrew,
>> 
>> On 2020-06-19 19:46, Andrew Jones wrote:
>> > arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
>> > support for steal time. However this is unnecessary and complicates
>> > userspace (userspace may prefer delaying vcpu creation until after
>> > feature probing). Since probing steal time only requires a KVM fd,
>> > we introduce a cap that can be checked.
>> 
>> So this is purely an API convenience, right? You want a way to
>> identify the presence of steal time accounting without having to
>> create a vcpu? It would have been nice to have this requirement
>> before we merged this code :-(.
> 
> Yes. I wish I had considered it more closely when I was reviewing the
> patches. And, I believe we have yet another user interface issue that
> I'm looking at now. Without the VCPU feature bit I'm not sure how easy
> it will be for a migration to fail when attempting to migrate from a host
> with steal-time enabled to one that does not support steal-time. So it's
> starting to look like steal-time should have followed the pmu pattern
> completely, not just the vcpu device ioctl part.

Should we consider disabling steal time altogether until this is
worked out?
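
For reference, a minimal sketch of the vcpu-less probe being discussed,
as userspace might write it. It assumes the uapi headers already carry
KVM_CAP_STEAL_TIME (this patch is what would add it) and reports
"cannot probe" otherwise. probe_steal_time_cap() and its path argument
are made-up names for illustration, not an existing interface:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns 1 if the cap is advertised, 0 if it is
 * not, and -1 if it cannot be probed (no device node, ioctl error, or
 * headers that predate the cap).
 */
int probe_steal_time_cap(const char *kvm_path)
{
	int ret = -1;

	assert(kvm_path != NULL);
#ifdef KVM_CAP_STEAL_TIME
	int fd = open(kvm_path, O_RDWR);

	if (fd < 0)
		return -1;
	/* Only the KVM fd is needed here: no VM or vcpu has been created. */
	ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_STEAL_TIME);
	close(fd);
	if (ret > 0)
		ret = 1;
	else if (ret < 0)
		ret = -1;
#endif
	return ret;
}
```

A VMM would presumably call probe_steal_time_cap("/dev/kvm") during
feature probing, before it creates any vcpu.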

>> 
>> > Additionally, when probing
>> > steal time we should check delayacct_on, because even though
>> > CONFIG_KVM selects TASK_DELAY_ACCT, it's possible for the host
>> > kernel to have delay accounting disabled with the 'nodelayacct'
>> > command line option. x86 already determines support for steal time
>> > by checking delayacct_on and can already probe steal time support
>> > with a kvm fd (KVM_GET_SUPPORTED_CPUID), but we add the cap there
>> > too for consistency.
>> >
>> > Signed-off-by: Andrew Jones <drjones@redhat.com>
>> > ---
>> >  Documentation/virt/kvm/api.rst | 11 +++++++++++
>> >  arch/arm64/kvm/arm.c           |  3 +++
>> >  arch/x86/kvm/x86.c             |  3 +++
>> >  include/uapi/linux/kvm.h       |  1 +
>> >  4 files changed, 18 insertions(+)
>> >
>> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> > index 9a12ea498dbb..05b1fdb88383 100644
>> > --- a/Documentation/virt/kvm/api.rst
>> > +++ b/Documentation/virt/kvm/api.rst
>> > @@ -6151,3 +6151,14 @@ KVM can therefore start protected VMs.
>> >  This capability governs the KVM_S390_PV_COMMAND ioctl and the
>> >  KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
>> >  guests when the state change is invalid.
>> > +
>> > +8.24 KVM_CAP_STEAL_TIME
>> > +-----------------------
>> > +
>> > +:Architectures: arm64, x86
>> > +
>> > +This capability indicates that KVM supports steal time accounting.
>> > +When steal time accounting is supported it may be enabled with
>> > +architecture-specific interfaces.  For x86 see
>> > +Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME".  For arm64 see
>> > +Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL"
>> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> > index 90cb90561446..f6dca6d09952 100644
>> > --- a/arch/arm64/kvm/arm.c
>> > +++ b/arch/arm64/kvm/arm.c
>> > @@ -222,6 +222,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >  		 */
>> >  		r = 1;
>> >  		break;
>> > +	case KVM_CAP_STEAL_TIME:
>> > +		r = sched_info_on();
>> > +		break;
>> >  	default:
>> >  		r = kvm_arch_vm_ioctl_check_extension(kvm, ext);
>> >  		break;
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index 00c88c2f34e4..ced6335e403e 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -3533,6 +3533,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >  	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
>> >  		r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
>> >  		break;
>> > +	case KVM_CAP_STEAL_TIME:
>> > +		r = sched_info_on();
>> > +		break;
>> >  	default:
>> >  		break;
>> >  	}
>> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> > index 4fdf30316582..121fb29ac004 100644
>> > --- a/include/uapi/linux/kvm.h
>> > +++ b/include/uapi/linux/kvm.h
>> > @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
>> >  #define KVM_CAP_PPC_SECURE_GUEST 181
>> >  #define KVM_CAP_HALT_POLL 182
>> >  #define KVM_CAP_ASYNC_PF_INT 183
>> > +#define KVM_CAP_STEAL_TIME 184
>> >
>> >  #ifdef KVM_CAP_IRQ_ROUTING
>> 
>> Shouldn't you also add the same check of sched_info_on() to
>> the various pvtime attribute handling functions? It feels odd
>> that the capability can say "no", and yet we'd accept userspace
>> messing with the steal time parameters...
> 
> I considered that, but the 'has attr' interface is really only asking
> if the interface exists, not if it should be used. I'm not sure what
> we should do about it other than document that the cap needs to say
> it's usable, rather than just the attr presence. But, since we've had
> the attr merged quite a while without the cap, then maybe we can't
> rely on a doc change alone?

Accepting the pvtime attributes (setting up the per-vcpu area) is a
promise to two parties: we tell both the guest and userspace that we
will provide the guest with steal time. By not checking sched_info_on(),
we lie to both, with potential consequences. It really feels like a bug.
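
To make the concern concrete, here is a small userspace model (not
kernel code) of attribute handlers that agree with the cap.
model_sched_info_on stands in for sched_info_on(), and MODEL_PVTIME_IPA
and the model_* functions are invented names. Returning -ENXIO when
accounting is off is only my assumption of what a fix could look like:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's sched_info_on(). */
static bool model_sched_info_on;

/* Hypothetical attribute id; not the real UAPI value. */
enum { MODEL_PVTIME_IPA = 0 };

/* What the capability would report for steal time. */
int model_check_cap(void)
{
	return model_sched_info_on ? 1 : 0;
}

/*
 * "has attr" handler gated on the same condition as the cap, so the
 * two interfaces can never disagree about steal time being usable.
 */
int model_pvtime_has_attr(int attr)
{
	if (!model_sched_info_on)
		return -ENXIO;	/* no accounting: do not promise steal time */
	if (attr != MODEL_PVTIME_IPA)
		return -ENXIO;	/* unknown attribute */
	return 0;
}

/* Walk through the cases discussed in this thread. */
void model_selfcheck(void)
{
	model_sched_info_on = false;
	assert(model_check_cap() == 0);
	assert(model_pvtime_has_attr(MODEL_PVTIME_IPA) == -ENXIO);

	model_sched_info_on = true;
	assert(model_check_cap() == 1);
	assert(model_pvtime_has_attr(MODEL_PVTIME_IPA) == 0);
	assert(model_pvtime_has_attr(MODEL_PVTIME_IPA + 1) == -ENXIO);
}
```

The point of the model is only that the cap and the attribute handlers
share one condition, so userspace never sees "no" from one and "yes"
from the other.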

Thanks,

          M.
-- 
Jazz is not dead. It just smells funny...