All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
	shan.gavin@gmail.com, will@kernel.org,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH RFCv2 7/9] kvm/arm64: Support async page fault
Date: Wed, 27 May 2020 08:37:06 +0100	[thread overview]
Message-ID: <28c74819f42306e66370ddaf88f16918@kernel.org> (raw)
In-Reply-To: <e1230110-b51f-b8b8-60d9-372660c5c387@redhat.com>

On 2020-05-27 05:05, Gavin Shan wrote:
> Hi Mark,
> 

[...]

>>> +struct kvm_vcpu_pv_apf_data {
>>> +	__u32	reason;
>>> +	__u8	pad[60];
>>> +	__u32	enabled;
>>> +};
>> 
>> What's all the padding for?
>> 
> 
> The padding is ensure the @reason and @enabled in different cache
> line. @reason is shared by host/guest, while @enabled is almostly
> owned by guest.

So you are assuming that a cache line is at most 64 bytes.
It is actualy implementation defined, and you can probe for it
by looking at the CTR_EL0 register. There are implementations
ranging from 32 to 256 bytes in the wild, and let's not mention
broken big-little implementations here.

[...]

>>> +bool kvm_arch_can_inject_async_page_not_present(struct kvm_vcpu 
>>> *vcpu)
>>> +{
>>> +	u64 vbar, pc;
>>> +	u32 val;
>>> +	int ret;
>>> +
>>> +	if (!(vcpu->arch.apf.control_block & KVM_ASYNC_PF_ENABLED))
>>> +		return false;
>>> +
>>> +	if (vcpu->arch.apf.send_user_only && vcpu_mode_priv(vcpu))
>>> +		return false;
>>> +
>>> +	/* Pending page fault, which ins't acknowledged by guest */
>>> +	ret = kvm_async_pf_read_cache(vcpu, &val);
>>> +	if (ret || val)
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * Events can't be injected through data abort because it's
>>> +	 * going to clobber ELR_EL1, which might not consued (or saved)
>>> +	 * by guest yet.
>>> +	 */
>>> +	vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
>>> +	pc = *vcpu_pc(vcpu);
>>> +	if (pc >= vbar && pc < (vbar + vcpu->arch.apf.no_fault_inst_range))
>>> +		return false;
>> 
>> Ah, so that's when this `no_fault_inst_range` is for.
>> 
>> As-is this is not sufficient, and we'll need t be extremely careful
>> here.
>> 
>> The vectors themselves typically only have a small amount of stub 
>> code,
>> and the bulk of the non-reentrant exception entry work happens
>> elsewhere, in a mixture of assembly and C code that isn't even 
>> virtually
>> contiguous with either the vectors or itself.
>> 
>> It's possible in theory that code in modules (or perhaps in eBPF JIT'd
>> code) that isn't safe to take a fault from, so even having a 
>> contiguous
>> range controlled by the kernel isn't ideal.
>> 
>> How does this work on x86?
>> 
> 
> Yeah, here we just provide a mechanism to forbid injecting data abort. 
> The
> range is fed by guest through HVC call. So I think it's guest related 
> issue.
> You had more comments about this in PATCH[9]. I will explain a bit more 
> there.
> 
> x86 basically relies on EFLAGS[IF] flag. The async page fault can be 
> injected
> if it's on. Otherwise, it's forbidden. It's workable because exception 
> is
> special interrupt to x86 if I'm correct.
> 
>            return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>                   !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>                         (GUEST_INTR_STATE_STI | 
> GUEST_INTR_STATE_MOV_SS));

I really wish this was relying on an architected exception delivery
mechanism that can be blocked by the guest itself (PSTATE.{I,F,A}).
Trying to guess based on the PC won't fly. But these signals are
pretty hard to multiplex with anything else. Like any form of
non-architected exception injection, I don't see a good path forward
unless we start considering something like SDEI.

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	aarcange@redhat.com, drjones@redhat.com, suzuki.poulose@arm.com,
	catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
	eric.auger@redhat.com, james.morse@arm.com, shan.gavin@gmail.com,
	will@kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH RFCv2 7/9] kvm/arm64: Support async page fault
Date: Wed, 27 May 2020 08:37:06 +0100	[thread overview]
Message-ID: <28c74819f42306e66370ddaf88f16918@kernel.org> (raw)
In-Reply-To: <e1230110-b51f-b8b8-60d9-372660c5c387@redhat.com>

On 2020-05-27 05:05, Gavin Shan wrote:
> Hi Mark,
> 

[...]

>>> +struct kvm_vcpu_pv_apf_data {
>>> +	__u32	reason;
>>> +	__u8	pad[60];
>>> +	__u32	enabled;
>>> +};
>> 
>> What's all the padding for?
>> 
> 
> The padding is ensure the @reason and @enabled in different cache
> line. @reason is shared by host/guest, while @enabled is almostly
> owned by guest.

So you are assuming that a cache line is at most 64 bytes.
It is actualy implementation defined, and you can probe for it
by looking at the CTR_EL0 register. There are implementations
ranging from 32 to 256 bytes in the wild, and let's not mention
broken big-little implementations here.

[...]

>>> +bool kvm_arch_can_inject_async_page_not_present(struct kvm_vcpu 
>>> *vcpu)
>>> +{
>>> +	u64 vbar, pc;
>>> +	u32 val;
>>> +	int ret;
>>> +
>>> +	if (!(vcpu->arch.apf.control_block & KVM_ASYNC_PF_ENABLED))
>>> +		return false;
>>> +
>>> +	if (vcpu->arch.apf.send_user_only && vcpu_mode_priv(vcpu))
>>> +		return false;
>>> +
>>> +	/* Pending page fault, which ins't acknowledged by guest */
>>> +	ret = kvm_async_pf_read_cache(vcpu, &val);
>>> +	if (ret || val)
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * Events can't be injected through data abort because it's
>>> +	 * going to clobber ELR_EL1, which might not consued (or saved)
>>> +	 * by guest yet.
>>> +	 */
>>> +	vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
>>> +	pc = *vcpu_pc(vcpu);
>>> +	if (pc >= vbar && pc < (vbar + vcpu->arch.apf.no_fault_inst_range))
>>> +		return false;
>> 
>> Ah, so that's when this `no_fault_inst_range` is for.
>> 
>> As-is this is not sufficient, and we'll need t be extremely careful
>> here.
>> 
>> The vectors themselves typically only have a small amount of stub 
>> code,
>> and the bulk of the non-reentrant exception entry work happens
>> elsewhere, in a mixture of assembly and C code that isn't even 
>> virtually
>> contiguous with either the vectors or itself.
>> 
>> It's possible in theory that code in modules (or perhaps in eBPF JIT'd
>> code) that isn't safe to take a fault from, so even having a 
>> contiguous
>> range controlled by the kernel isn't ideal.
>> 
>> How does this work on x86?
>> 
> 
> Yeah, here we just provide a mechanism to forbid injecting data abort. 
> The
> range is fed by guest through HVC call. So I think it's guest related 
> issue.
> You had more comments about this in PATCH[9]. I will explain a bit more 
> there.
> 
> x86 basically relies on EFLAGS[IF] flag. The async page fault can be 
> injected
> if it's on. Otherwise, it's forbidden. It's workable because exception 
> is
> special interrupt to x86 if I'm correct.
> 
>            return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>                   !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>                         (GUEST_INTR_STATE_STI | 
> GUEST_INTR_STATE_MOV_SS));

I really wish this was relying on an architected exception delivery
mechanism that can be blocked by the guest itself (PSTATE.{I,F,A}).
Trying to guess based on the PC won't fly. But these signals are
pretty hard to multiplex with anything else. Like any form of
non-architected exception injection, I don't see a good path forward
unless we start considering something like SDEI.

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, will@kernel.org,
	catalin.marinas@arm.com, james.morse@arm.com,
	suzuki.poulose@arm.com, drjones@redhat.com,
	eric.auger@redhat.com, aarcange@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH RFCv2 7/9] kvm/arm64: Support async page fault
Date: Wed, 27 May 2020 08:37:06 +0100	[thread overview]
Message-ID: <28c74819f42306e66370ddaf88f16918@kernel.org> (raw)
In-Reply-To: <e1230110-b51f-b8b8-60d9-372660c5c387@redhat.com>

On 2020-05-27 05:05, Gavin Shan wrote:
> Hi Mark,
> 

[...]

>>> +struct kvm_vcpu_pv_apf_data {
>>> +	__u32	reason;
>>> +	__u8	pad[60];
>>> +	__u32	enabled;
>>> +};
>> 
>> What's all the padding for?
>> 
> 
> The padding is ensure the @reason and @enabled in different cache
> line. @reason is shared by host/guest, while @enabled is almostly
> owned by guest.

So you are assuming that a cache line is at most 64 bytes.
It is actualy implementation defined, and you can probe for it
by looking at the CTR_EL0 register. There are implementations
ranging from 32 to 256 bytes in the wild, and let's not mention
broken big-little implementations here.

[...]

>>> +bool kvm_arch_can_inject_async_page_not_present(struct kvm_vcpu 
>>> *vcpu)
>>> +{
>>> +	u64 vbar, pc;
>>> +	u32 val;
>>> +	int ret;
>>> +
>>> +	if (!(vcpu->arch.apf.control_block & KVM_ASYNC_PF_ENABLED))
>>> +		return false;
>>> +
>>> +	if (vcpu->arch.apf.send_user_only && vcpu_mode_priv(vcpu))
>>> +		return false;
>>> +
>>> +	/* Pending page fault, which ins't acknowledged by guest */
>>> +	ret = kvm_async_pf_read_cache(vcpu, &val);
>>> +	if (ret || val)
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * Events can't be injected through data abort because it's
>>> +	 * going to clobber ELR_EL1, which might not consued (or saved)
>>> +	 * by guest yet.
>>> +	 */
>>> +	vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
>>> +	pc = *vcpu_pc(vcpu);
>>> +	if (pc >= vbar && pc < (vbar + vcpu->arch.apf.no_fault_inst_range))
>>> +		return false;
>> 
>> Ah, so that's when this `no_fault_inst_range` is for.
>> 
>> As-is this is not sufficient, and we'll need t be extremely careful
>> here.
>> 
>> The vectors themselves typically only have a small amount of stub 
>> code,
>> and the bulk of the non-reentrant exception entry work happens
>> elsewhere, in a mixture of assembly and C code that isn't even 
>> virtually
>> contiguous with either the vectors or itself.
>> 
>> It's possible in theory that code in modules (or perhaps in eBPF JIT'd
>> code) that isn't safe to take a fault from, so even having a 
>> contiguous
>> range controlled by the kernel isn't ideal.
>> 
>> How does this work on x86?
>> 
> 
> Yeah, here we just provide a mechanism to forbid injecting data abort. 
> The
> range is fed by guest through HVC call. So I think it's guest related 
> issue.
> You had more comments about this in PATCH[9]. I will explain a bit more 
> there.
> 
> x86 basically relies on EFLAGS[IF] flag. The async page fault can be 
> injected
> if it's on. Otherwise, it's forbidden. It's workable because exception 
> is
> special interrupt to x86 if I'm correct.
> 
>            return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>                   !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>                         (GUEST_INTR_STATE_STI | 
> GUEST_INTR_STATE_MOV_SS));

I really wish this was relying on an architected exception delivery
mechanism that can be blocked by the guest itself (PSTATE.{I,F,A}).
Trying to guess based on the PC won't fly. But these signals are
pretty hard to multiplex with anything else. Like any form of
non-architected exception injection, I don't see a good path forward
unless we start considering something like SDEI.

         M.
-- 
Jazz is not dead. It just smells funny...

  reply	other threads:[~2020-05-27  7:37 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-08  3:29 [PATCH RFCv2 0/9] kvm/arm64: Support Async Page Fault Gavin Shan
2020-05-08  3:29 ` Gavin Shan
2020-05-08  3:29 ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 1/9] arm64: Probe for the presence of KVM hypervisor services during boot Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 3/9] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-26 10:42   ` Mark Rutland
2020-05-26 10:42     ` Mark Rutland
2020-05-26 10:42     ` Mark Rutland
2020-05-27  2:43     ` Gavin Shan
2020-05-27  2:43       ` Gavin Shan
2020-05-27  2:43       ` Gavin Shan
2020-05-27  7:20       ` Marc Zyngier
2020-05-27  7:20         ` Marc Zyngier
2020-05-27  7:20         ` Marc Zyngier
2020-05-28  6:34         ` Gavin Shan
2020-05-28  6:34           ` Gavin Shan
2020-05-28  6:34           ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 4/9] kvm/arm64: Detach ESR operator from vCPU struct Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-26 10:51   ` Mark Rutland
2020-05-26 10:51     ` Mark Rutland
2020-05-26 10:51     ` Mark Rutland
2020-05-27  2:55     ` Gavin Shan
2020-05-27  2:55       ` Gavin Shan
2020-05-27  2:55       ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 5/9] kvm/arm64: Replace hsr with esr Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-26 10:45   ` Mark Rutland
2020-05-26 10:45     ` Mark Rutland
2020-05-26 10:45     ` Mark Rutland
2020-05-27  2:56     ` Gavin Shan
2020-05-27  2:56       ` Gavin Shan
2020-05-27  2:56       ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 6/9] kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-26 10:58   ` Mark Rutland
2020-05-26 10:58     ` Mark Rutland
2020-05-26 10:58     ` Mark Rutland
2020-05-27  3:01     ` Gavin Shan
2020-05-27  3:01       ` Gavin Shan
2020-05-27  3:01       ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 7/9] kvm/arm64: Support async page fault Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-26 12:34   ` Mark Rutland
2020-05-26 12:34     ` Mark Rutland
2020-05-26 12:34     ` Mark Rutland
2020-05-27  4:05     ` Gavin Shan
2020-05-27  4:05       ` Gavin Shan
2020-05-27  4:05       ` Gavin Shan
2020-05-27  7:37       ` Marc Zyngier [this message]
2020-05-27  7:37         ` Marc Zyngier
2020-05-27  7:37         ` Marc Zyngier
2020-05-28  6:32         ` Gavin Shan
2020-05-28  6:32           ` Gavin Shan
2020-05-28  6:32           ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 8/9] kernel/sched: Add cpu_rq_is_locked() Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29 ` [PATCH RFCv2 9/9] arm64: Support async page fault Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-08  3:29   ` Gavin Shan
2020-05-26 12:56   ` Mark Rutland
2020-05-26 12:56     ` Mark Rutland
2020-05-26 12:56     ` Mark Rutland
2020-05-27  6:48   ` Paolo Bonzini
2020-05-27  6:48     ` Paolo Bonzini
2020-05-27  6:48     ` Paolo Bonzini
2020-05-28  6:14     ` Gavin Shan
2020-05-28  6:14       ` Gavin Shan
2020-05-28  6:14       ` Gavin Shan
2020-05-28  7:03       ` Marc Zyngier
2020-05-28  7:03         ` Marc Zyngier
2020-05-28  7:03         ` Marc Zyngier
2020-05-28 10:53         ` Paolo Bonzini
2020-05-28 10:53           ` Paolo Bonzini
2020-05-28 10:53           ` Paolo Bonzini
2020-05-28 10:48       ` Paolo Bonzini
2020-05-28 10:48         ` Paolo Bonzini
2020-05-28 10:48         ` Paolo Bonzini
2020-05-28 23:02         ` Gavin Shan
2020-05-28 23:02           ` Gavin Shan
2020-05-28 23:02           ` Gavin Shan
2020-05-29  9:41           ` Marc Zyngier
2020-05-29  9:41             ` Marc Zyngier
2020-05-29  9:41             ` Marc Zyngier
2020-05-29 11:11             ` Paolo Bonzini
2020-05-29 11:11               ` Paolo Bonzini
2020-05-29 11:11               ` Paolo Bonzini
2020-05-31 12:44               ` Marc Zyngier
2020-05-31 12:44                 ` Marc Zyngier
2020-05-31 12:44                 ` Marc Zyngier
2020-06-01  9:21                 ` Paolo Bonzini
2020-06-01  9:21                   ` Paolo Bonzini
2020-06-01  9:21                   ` Paolo Bonzini
2020-06-02  5:44                   ` Gavin Shan
2020-06-02  5:44                     ` Gavin Shan
2020-06-02  5:44                     ` Gavin Shan
2020-05-25 23:39 ` [PATCH RFCv2 0/9] kvm/arm64: Support Async Page Fault Gavin Shan
2020-05-25 23:39   ` Gavin Shan
2020-05-25 23:39   ` Gavin Shan
2020-05-26 13:09 ` Mark Rutland
2020-05-26 13:09   ` Mark Rutland
2020-05-26 13:09   ` Mark Rutland
2020-05-27  2:39   ` Gavin Shan
2020-05-27  2:39     ` Gavin Shan
2020-05-27  2:39     ` Gavin Shan
2020-05-27  7:48     ` Marc Zyngier
2020-05-27  7:48       ` Marc Zyngier
2020-05-27  7:48       ` Marc Zyngier
2020-05-27 16:10       ` Paolo Bonzini
2020-05-27 16:10         ` Paolo Bonzini
2020-05-27 16:10         ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28c74819f42306e66370ddaf88f16918@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=gshan@redhat.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shan.gavin@gmail.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.