From: Marc Zyngier <maz@kernel.org>
To: Andrew Jones <drjones@redhat.com>
Cc: pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, steven.price@arm.com
Subject: Re: [PATCH 2/4] arm64/x86: KVM: Introduce steal time cap
Date: Mon, 22 Jun 2020 10:51:47 +0100
Message-ID: <5a52210e5f123d52459f15c594e77bad@kernel.org>
In-Reply-To: <20200622084110.uosiqx3oy22lremu@kamzik.brq.redhat.com>

On 2020-06-22 09:41, Andrew Jones wrote:
> On Mon, Jun 22, 2020 at 09:20:02AM +0100, Marc Zyngier wrote:
>> Hi Andrew,
>> 
>> On 2020-06-19 19:46, Andrew Jones wrote:
>> > arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
>> > support for steal time. However this is unnecessary and complicates
>> > userspace (userspace may prefer delaying vcpu creation until after
>> > feature probing). Since probing steal time only requires a KVM fd,
>> > we introduce a cap that can be checked.
>> 
>> So this is purely an API convenience, right? You want a way to
>> identify the presence of steal time accounting without having to
>> create a vcpu? It would have been nice to have this requirement
>> before we merged this code :-(.
> 
> Yes. I wish I had considered it more closely when I was reviewing the
> patches. And, I believe we have yet another user interface issue that
> I'm looking at now. Without the VCPU feature bit I'm not sure how easy
> it will be for a migration to fail when attempting to migrate from a host
> with steal-time enabled to one that does not support steal-time. So it's
> starting to look like steal-time should have followed the pmu pattern
> completely, not just the vcpu device ioctl part.

Should we consider disabling steal time altogether until this is worked out?

>> 
>> > Additionally, when probing
>> > steal time we should check delayacct_on, because even though
>> > CONFIG_KVM selects TASK_DELAY_ACCT, it's possible for the host
>> > kernel to have delay accounting disabled with the 'nodelayacct'
>> > command line option. x86 already determines support for steal time
>> > by checking delayacct_on and can already probe steal time support
>> > with a kvm fd (KVM_GET_SUPPORTED_CPUID), but we add the cap there
>> > too for consistency.
>> >
>> > Signed-off-by: Andrew Jones <drjones@redhat.com>
>> > ---
>> >  Documentation/virt/kvm/api.rst | 11 +++++++++++
>> >  arch/arm64/kvm/arm.c           |  3 +++
>> >  arch/x86/kvm/x86.c             |  3 +++
>> >  include/uapi/linux/kvm.h       |  1 +
>> >  4 files changed, 18 insertions(+)
>> >
>> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> > index 9a12ea498dbb..05b1fdb88383 100644
>> > --- a/Documentation/virt/kvm/api.rst
>> > +++ b/Documentation/virt/kvm/api.rst
>> > @@ -6151,3 +6151,14 @@ KVM can therefore start protected VMs.
>> >  This capability governs the KVM_S390_PV_COMMAND ioctl and the
>> >  KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
>> >  guests when the state change is invalid.
>> > +
>> > +8.24 KVM_CAP_STEAL_TIME
>> > +-----------------------
>> > +
>> > +:Architectures: arm64, x86
>> > +
>> > +This capability indicates that KVM supports steal time accounting.
>> > +When steal time accounting is supported it may be enabled with
>> > +architecture-specific interfaces.  For x86 see
>> > +Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME".  For arm64 see
>> > +Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL"
>> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> > index 90cb90561446..f6dca6d09952 100644
>> > --- a/arch/arm64/kvm/arm.c
>> > +++ b/arch/arm64/kvm/arm.c
>> > @@ -222,6 +222,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >  		 */
>> >  		r = 1;
>> >  		break;
>> > +	case KVM_CAP_STEAL_TIME:
>> > +		r = sched_info_on();
>> > +		break;
>> >  	default:
>> >  		r = kvm_arch_vm_ioctl_check_extension(kvm, ext);
>> >  		break;
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index 00c88c2f34e4..ced6335e403e 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -3533,6 +3533,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >  	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
>> >  		r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
>> >  		break;
>> > +	case KVM_CAP_STEAL_TIME:
>> > +		r = sched_info_on();
>> > +		break;
>> >  	default:
>> >  		break;
>> >  	}
>> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> > index 4fdf30316582..121fb29ac004 100644
>> > --- a/include/uapi/linux/kvm.h
>> > +++ b/include/uapi/linux/kvm.h
>> > @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
>> >  #define KVM_CAP_PPC_SECURE_GUEST 181
>> >  #define KVM_CAP_HALT_POLL 182
>> >  #define KVM_CAP_ASYNC_PF_INT 183
>> > +#define KVM_CAP_STEAL_TIME 184
>> >
>> >  #ifdef KVM_CAP_IRQ_ROUTING
>> 
>> Shouldn't you also add the same check of sched_info_on() to
>> the various pvtime attribute handling functions? It feels odd
>> that the capability can say "no", and yet we'd accept userspace
>> messing with the steal time parameters...
> 
> I considered that, but the 'has attr' interface is really only asking
> if the interface exists, not if it should be used. I'm not sure what
> we should do about it other than document that the cap needs to say
> it's usable, rather than just the attr presence. But, since we've had
> the attr merged quite a while without the cap, then maybe we can't
> rely on a doc change alone?

Accepting the pvtime attributes (setting up the per-vcpu area) has two
effects: we promise both the guest and userspace that we will provide
the guest with steal time. By not checking sched_info_on(), we lie to
both, with potential consequences. It really feels like a bug.
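
To make the inconsistency concrete, here is a minimal userspace model of
what gating the pvtime attribute handlers on the same condition as the
capability might look like. It is only a sketch, not kernel code: every
model_* name is hypothetical, model_sched_info_enabled stands in for
whatever sched_info_on() would report on the host, and the choice of
-ENXIO as the error is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host state that sched_info_on() reports. */
bool model_sched_info_enabled = true;

/* Hypothetical stand-in for the kernel's sched_info_on(). */
bool model_sched_info_on(void)
{
	return model_sched_info_enabled;
}

/* Mirrors the KVM_CAP_STEAL_TIME case in the patch: r = sched_info_on(). */
int model_check_steal_time_cap(void)
{
	return model_sched_info_on() ? 1 : 0;
}

/*
 * Hypothetical "has attr" handler for the pvtime group. It refuses the
 * attribute when steal time cannot be provided, so the capability and
 * the attribute interface always give the same answer.
 * Returning -ENXIO here is an assumption.
 */
int model_pvtime_has_attr(void)
{
	if (!model_sched_info_on())
		return -ENXIO;
	return 0;
}

/*
 * Hypothetical "set attr" handler: records the per-vcpu steal time base
 * address only when steal time can actually be accounted.
 */
int model_pvtime_set_attr(unsigned long long ipa, unsigned long long *base)
{
	if (!model_sched_info_on())
		return -ENXIO;
	*base = ipa;
	return 0;
}
```

With model_sched_info_enabled set to false, the capability check returns
0 and both handlers fail, so neither userspace nor the guest is promised
steal time that the host cannot deliver.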

Thanks,

          M.
-- 
Jazz is not dead. It just smells funny...
