From: Marc Zyngier <maz@kernel.org>
To: James Morse <james.morse@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>,
Julien Thierry <julien.thierry.kdev@gmail.com>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org,
Suzuki K Poulose <suzuki.poulose@arm.com>
Subject: Re: [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded
Date: Thu, 9 Apr 2020 09:27:06 +0100 [thread overview]
Message-ID: <20200409092706.74e6bd1d@why> (raw)
In-Reply-To: <cc6bed09-33dd-027a-126f-ed22389c1624@arm.com>
Hi James,
On Wed, 8 Apr 2020 12:16:01 +0100
James Morse <james.morse@arm.com> wrote:
> Hi Marc,
>
> On 08/04/2020 11:07, Marc Zyngier wrote:
> > On Mon, 6 Apr 2020 16:03:55 +0100
> > James Morse <james.morse@arm.com> wrote:
> >
> >> kvm_arch_timer_get_input_level() needs to get the arch_timer_context for
> >> a particular vcpu, and uses kvm_get_running_vcpu() to find it.
> >>
> >> kvm_arch_timer_get_input_level() may be called to handle a user-space
> >> write to the redistributor, where the vcpu is not loaded. This causes
> >> kvm_get_running_vcpu() to return NULL:
> >> | Unable to handle kernel paging request at virtual address 0000000000001ec0
> >> | Mem abort info:
> >> | ESR = 0x96000004
> >> | EC = 0x25: DABT (current EL), IL = 32 bits
> >> | SET = 0, FnV = 0
> >> | EA = 0, S1PTW = 0
> >> | Data abort info:
> >> | ISV = 0, ISS = 0x00000004
> >> | CM = 0, WnR = 0
> >> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000
> >> | [0000000000001ec0] pgd=0000000000000000
> >> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
> >> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables
> >> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30
> >> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019
> >> | pstate: 00000085 (nzcv daIf -PAN -UAO)
> >> | pc : kvm_arch_timer_get_input_level+0x1c/0x68
> >> | lr : kvm_arch_timer_get_input_level+0x1c/0x68
> >>
> >> | Call trace:
> >> | kvm_arch_timer_get_input_level+0x1c/0x68
> >> | vgic_get_phys_line_level+0x3c/0x90
> >> | vgic_mmio_write_senable+0xe4/0x130
> >> | vgic_uaccess+0xe0/0x100
> >> | vgic_v3_redist_uaccess+0x5c/0x80
> >> | vgic_v3_attr_regs_access+0xf0/0x200
> >> | nvgic_v3_set_attr+0x234/0x250
> >> | kvm_device_ioctl_attr+0xa4/0xf8
> >> | kvm_device_ioctl+0x7c/0xc0
> >> | ksys_ioctl+0x1fc/0xc18
> >> | __arm64_sys_ioctl+0x24/0x30
> >> | do_el0_svc+0x7c/0x148
> >> | el0_sync_handler+0x138/0x258
> >> | el0_sync+0x140/0x180
> >> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001)
> >> | ---[ end trace 81287612d93f1e70 ]---
> >> | note: qemu-system-aar[2615] exited with preempt_count 1
> >>
> >> Loading the vcpu doesn't make a lot of sense for handling a device ioctl(),
> >> so instead pass the vcpu through to kvm_arch_timer_get_input_level(). Its
> >> not clear that an intid makes much sense without the paired vcpu.
> >
> > I don't fully agree with the analysis, Remember we are looking at the
> > state of the physical interrupt associated with a virtual interrupt, so
> > the vcpu doesn't quite make sense here if it isn't loaded.
> >
> > What does it mean to look at the HW timer when we are not in the right
> > context? For all we know, it is completely random (the only guarantee
> > we have is that it is disabled, actually).
>
> > My gut feeling is that this is another instance where we should provide
> > specific userspace accessors that would only deal with the virtual
> > state, and leave anything that deals with the physical state of the
> > interrupt to be exercised only by the guest.
>
> > Does it make sense?
>
> Broadly, yes. Specifically ... I'm not familiar enough with this code to work out where
> such a change should go!
>
> ~20 mins of grepping later~
>
> Remove REGISTER_DESC_WITH_LENGTH() so that uaccess helpers have to be provided, and forbid
> NULL for the ur/uw values in REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED()...?
I'd suggest something like this (untested, of course):
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index d63881f60e1a..f51c6e939c76 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -409,10 +409,12 @@ static const struct vgic_register_region vgic_v2_dist_registers[] = {
NULL, vgic_mmio_uaccess_write_v2_group, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
- vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
+ vgic_mmio_read_enable, vgic_mmio_write_senable,
+ NULL, vgic_uaccess_write_senable, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR,
- vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
+ vgic_mmio_read_enable, vgic_mmio_write_cenable,
+ NULL, vgic_uaccess_write_cenable, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET,
vgic_mmio_read_pending, vgic_mmio_write_spending, NULL, NULL, 1,
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 77c8ba1a2535..a9c45048fadb 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -538,10 +538,12 @@ static const struct vgic_register_region vgic_v3_dist_registers[] = {
vgic_mmio_read_group, vgic_mmio_write_group, NULL, NULL, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER,
- vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
+ vgic_mmio_read_enable, vgic_mmio_write_senable,
+ NULL, vgic_uaccess_write_senable, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICENABLER,
- vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
+ vgic_mmio_read_enable, vgic_mmio_write_cenable,
+ NULL, vgic_uaccess_write_cenable, 1,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISPENDR,
vgic_mmio_read_pending, vgic_mmio_write_spending,
@@ -609,11 +611,13 @@ static const struct vgic_register_region vgic_v3_rd_registers[] = {
REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IGROUPR0,
vgic_mmio_read_group, vgic_mmio_write_group, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ISENABLER0,
- vgic_mmio_read_enable, vgic_mmio_write_senable, 4,
+ REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISENABLER0,
+ vgic_mmio_read_enable, vgic_mmio_write_senable,
+ NULL, vgic_uaccess_write_senable, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ICENABLER0,
- vgic_mmio_read_enable, vgic_mmio_write_cenable, 4,
+ REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICENABLER0,
+ vgic_mmio_read_enable, vgic_mmio_write_cenable,
+ NULL, vgic_uaccess_write_cenable, 4,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISPENDR0,
vgic_mmio_read_pending, vgic_mmio_write_spending,
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 4012cd68ac93..2ca11b05b17b 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -184,6 +184,48 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
}
}
+int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len,
+ unsigned long val)
+{
+ u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+ int i;
+ unsigned long flags;
+
+ for_each_set_bit(i, &val, len * 8) {
+ struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+ raw_spin_lock_irqsave(&irq->irq_lock, flags);
+ irq->enabled = true;
+ vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
+
+ vgic_put_irq(vcpu->kvm, irq);
+ }
+
+ return 0;
+}
+
+int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len,
+ unsigned long val)
+{
+ u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+ int i;
+ unsigned long flags;
+
+ for_each_set_bit(i, &val, len * 8) {
+ struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+ raw_spin_lock_irqsave(&irq->irq_lock, flags);
+ irq->enabled = false;
+ raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+ vgic_put_irq(vcpu->kvm, irq);
+ }
+
+ return 0;
+}
+
unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len)
{
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 30713a44e3fa..327d0a6938e4 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -138,6 +138,14 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len,
unsigned long val);
+int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len,
+ unsigned long val);
+
+int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len,
+ unsigned long val);
+
unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len);
>
> Or if that is too invasive, something like, (totally, untested):
> ----------------%<----------------
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 97fb2a40e6ba..30ae5f29e429 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -113,10 +113,11 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
> struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>
> raw_spin_lock_irqsave(&irq->irq_lock, flags);
> - if (vgic_irq_is_mapped_level(irq)) {
> + if (kvm_running_vcpu() && vgic_irq_is_mapped_level(irq)) {
> bool was_high = irq->line_level;
>
> /*
> + * Unless we are running due to a user-space access,
> * We need to update the state of the interrupt because
> * the guest might have changed the state of the device
> * while the interrupt was disabled at the VGIC level.
> ----------------%<----------------
Huh, nice try! ;-) Unfortunately, I think there is more than the enable
register that suffers from a similar problem (see how the pending
register write is also accessing the HW state, even if accessed from
userspace).
Thanks,
M.
--
Jazz is not dead. It just smells funny...
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-04-09 8:27 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-06 15:03 [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded James Morse
2020-04-08 10:07 ` Marc Zyngier
2020-04-08 11:16 ` James Morse
2020-04-09 8:27 ` Marc Zyngier [this message]
2020-04-09 16:53 ` James Morse
2020-04-08 12:13 ` André Przywara
2020-04-08 14:19 ` Marc Zyngier
2020-04-08 16:50 ` André Przywara
2020-04-09 8:08 ` Marc Zyngier
2020-04-09 16:04 ` André Przywara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200409092706.74e6bd1d@why \
--to=maz@kernel.org \
--cc=andre.przywara@arm.com \
--cc=james.morse@arm.com \
--cc=julien.thierry.kdev@gmail.com \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=suzuki.poulose@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).