linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: James Morse <james.morse@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	Suzuki K Poulose <suzuki.poulose@arm.com>
Subject: Re: [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded
Date: Thu, 9 Apr 2020 09:27:06 +0100	[thread overview]
Message-ID: <20200409092706.74e6bd1d@why> (raw)
In-Reply-To: <cc6bed09-33dd-027a-126f-ed22389c1624@arm.com>

Hi James,

On Wed, 8 Apr 2020 12:16:01 +0100
James Morse <james.morse@arm.com> wrote:

> Hi Marc,
> 
> On 08/04/2020 11:07, Marc Zyngier wrote:
> > On Mon,  6 Apr 2020 16:03:55 +0100
> > James Morse <james.morse@arm.com> wrote:
> >   
> >> kvm_arch_timer_get_input_level() needs to get the arch_timer_context for
> >> a particular vcpu, and uses kvm_get_running_vcpu() to find it.
> >>
> >> kvm_arch_timer_get_input_level() may be called to handle a user-space
> >> write to the redistributor, where the vcpu is not loaded. This causes
> >> kvm_get_running_vcpu() to return NULL:
> >> | Unable to handle kernel paging request at virtual address 0000000000001ec0
> >> | Mem abort info:
> >> |   ESR = 0x96000004
> >> |   EC = 0x25: DABT (current EL), IL = 32 bits
> >> |   SET = 0, FnV = 0
> >> |   EA = 0, S1PTW = 0
> >> | Data abort info:
> >> |   ISV = 0, ISS = 0x00000004
> >> |   CM = 0, WnR = 0
> >> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000
> >> | [0000000000001ec0] pgd=0000000000000000
> >> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
> >> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables
> >> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30
> >> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019
> >> | pstate: 00000085 (nzcv daIf -PAN -UAO)
> >> | pc : kvm_arch_timer_get_input_level+0x1c/0x68
> >> | lr : kvm_arch_timer_get_input_level+0x1c/0x68
> >>
> >> | Call trace:
> >> |  kvm_arch_timer_get_input_level+0x1c/0x68
> >> |  vgic_get_phys_line_level+0x3c/0x90
> >> |  vgic_mmio_write_senable+0xe4/0x130
> >> |  vgic_uaccess+0xe0/0x100
> >> |  vgic_v3_redist_uaccess+0x5c/0x80
> >> |  vgic_v3_attr_regs_access+0xf0/0x200
> >> |  nvgic_v3_set_attr+0x234/0x250
> >> |  kvm_device_ioctl_attr+0xa4/0xf8
> >> |  kvm_device_ioctl+0x7c/0xc0
> >> |  ksys_ioctl+0x1fc/0xc18
> >> |  __arm64_sys_ioctl+0x24/0x30
> >> |  do_el0_svc+0x7c/0x148
> >> |  el0_sync_handler+0x138/0x258
> >> |  el0_sync+0x140/0x180
> >> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001)
> >> | ---[ end trace 81287612d93f1e70 ]---
> >> | note: qemu-system-aar[2615] exited with preempt_count 1
> >>
> >> Loading the vcpu doesn't make a lot of sense for handling a device ioctl(),
> >> so instead pass the vcpu through to kvm_arch_timer_get_input_level(). Its
> >> not clear that an intid makes much sense without the paired vcpu.  
> > 
> > I don't fully agree with the analysis, Remember we are looking at the
> > state of the physical interrupt associated with a virtual interrupt, so
> > the vcpu doesn't quite make sense here if it isn't loaded.
> > 
> > What does it mean to look at the HW timer when we are not in the right
> > context? For all we know, it is completely random (the only guarantee
> > we have is that it is disabled, actually).  
> 
> > My gut feeling is that this is another instance where we should provide
> > specific userspace accessors that would only deal with the virtual
> > state, and leave anything that deals with the physical state of the
> > interrupt to be exercised only by the guest.  
> 
> > Does it make sense?  
> 
> Broadly, yes. Specifically ... I'm not familiar enough with this code to work out where
> such a change should go!
> 
> ~20 mins of grepping later~
> 
> Remove REGISTER_DESC_WITH_LENGTH() so that uaccess helpers have to be provided, and forbid
> NULL for the ur/uw values in REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED()...?

I'd suggest something like this (untested, of course):

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index d63881f60e1a..f51c6e939c76 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -409,10 +409,12 @@ static const struct vgic_register_region vgic_v2_dist_registers[] = {
 		NULL, vgic_mmio_uaccess_write_v2_group, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
-		vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_senable,
+		NULL, vgic_uaccess_write_senable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR,
-		vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_cenable,
+		NULL, vgic_uaccess_write_cenable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET,
 		vgic_mmio_read_pending, vgic_mmio_write_spending, NULL, NULL, 1,
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 77c8ba1a2535..a9c45048fadb 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -538,10 +538,12 @@ static const struct vgic_register_region vgic_v3_dist_registers[] = {
 		vgic_mmio_read_group, vgic_mmio_write_group, NULL, NULL, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER,
-		vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_senable,
+		NULL, vgic_uaccess_write_senable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICENABLER,
-		vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_cenable,
+	       NULL, vgic_uaccess_write_cenable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISPENDR,
 		vgic_mmio_read_pending, vgic_mmio_write_spending,
@@ -609,11 +611,13 @@ static const struct vgic_register_region vgic_v3_rd_registers[] = {
 	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IGROUPR0,
 		vgic_mmio_read_group, vgic_mmio_write_group, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ISENABLER0,
-		vgic_mmio_read_enable, vgic_mmio_write_senable, 4,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISENABLER0,
+		vgic_mmio_read_enable, vgic_mmio_write_senable,
+		NULL, vgic_uaccess_write_senable, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ICENABLER0,
-		vgic_mmio_read_enable, vgic_mmio_write_cenable, 4,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICENABLER0,
+		vgic_mmio_read_enable, vgic_mmio_write_cenable,
+		NULL, vgic_uaccess_write_cenable, 4,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISPENDR0,
 		vgic_mmio_read_pending, vgic_mmio_write_spending,
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 4012cd68ac93..2ca11b05b17b 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -184,6 +184,48 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
 	}
 }
 
+int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val)
+{
+	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+	int i;
+	unsigned long flags;
+
+	for_each_set_bit(i, &val, len * 8) {
+		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+		raw_spin_lock_irqsave(&irq->irq_lock, flags);
+		irq->enabled = true;
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+
+	return 0;
+}
+
+int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val)
+{
+	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+	int i;
+	unsigned long flags;
+
+	for_each_set_bit(i, &val, len * 8) {
+		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+		raw_spin_lock_irqsave(&irq->irq_lock, flags);
+		irq->enabled = false;
+		raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+
+	return 0;
+}
+
 unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
 				     gpa_t addr, unsigned int len)
 {
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 30713a44e3fa..327d0a6938e4 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -138,6 +138,14 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
 			     gpa_t addr, unsigned int len,
 			     unsigned long val);
 
+int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val);
+
+int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val);
+
 unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
 				     gpa_t addr, unsigned int len);
 


> 
> Or if that is too invasive, something like, (totally, untested):
> ----------------%<----------------
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 97fb2a40e6ba..30ae5f29e429 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -113,10 +113,11 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>                 struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> 
>                 raw_spin_lock_irqsave(&irq->irq_lock, flags);
> -               if (vgic_irq_is_mapped_level(irq)) {
> +               if (kvm_running_vcpu() && vgic_irq_is_mapped_level(irq)) {
>                         bool was_high = irq->line_level;
> 
>                         /*
> +                        * Unless we are running due to a user-space access,
>                          * We need to update the state of the interrupt because
>                          * the guest might have changed the state of the device
>                          * while the interrupt was disabled at the VGIC level.
> ----------------%<----------------

Huh, nice try! ;-) Unfortunately, I think there is more than the enable
register that suffers from a similar problem (see how the pending
register write is also accessing the HW state, even if accessed from
userspace).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-04-09  8:27 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-06 15:03 [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded James Morse
2020-04-08 10:07 ` Marc Zyngier
2020-04-08 11:16   ` James Morse
2020-04-09  8:27     ` Marc Zyngier [this message]
2020-04-09 16:53       ` James Morse
2020-04-08 12:13   ` André Przywara
2020-04-08 14:19     ` Marc Zyngier
2020-04-08 16:50       ` André Przywara
2020-04-09  8:08         ` Marc Zyngier
2020-04-09 16:04           ` André Przywara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200409092706.74e6bd1d@why \
    --to=maz@kernel.org \
    --cc=andre.przywara@arm.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=suzuki.poulose@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).