Date: Wed, 25 Aug 2021 18:44:52 +0100
Message-ID: <878s0ppgff.wl-maz@kernel.org>
From: Marc Zyngier
To: Andre Przywara
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, kernel-team@android.com, Alexandru Elisei, Thomas Gleixner, Will Deacon
Subject: Re: [PATCH][kvmtool] virtio/pci: Correctly handle MSI-X masking while MSI-X is disabled
In-Reply-To:
 <87a6l5pmim.wl-maz@kernel.org>

On Wed, 25 Aug 2021 16:33:21 +0100,
Marc Zyngier wrote:
> 
> On Tue, 24 Aug 2021 15:32:53 +0100,
> Marc Zyngier wrote:
> > 
> > Hi Andre,
> > 
> > On Mon, 23 Aug 2021 17:48:33 +0100,
> > Andre Przywara wrote:
> > > 
> > > On Sat, 21 Aug 2021 13:07:42 +0100
> > > Marc Zyngier wrote:
> > > 
> > > Hi Marc,
> > > 
> > > > Since Linux commit 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X
> > > > entries"), kvmtool segfaults when the guest boots and tries to
> > > > disable all the MSI-X entries of a virtio device while MSI-X itself
> > > > is disabled.
> > > > 
> > > > What Linux does seems perfectly correct. However, kvmtool uses
> > > > a different decoding depending on whether MSI-X is enabled for
> > > > this device or not, which seems pretty wrong.
> > > 
> > > While I really wish this were wrong, I think this is
> > > indeed how this is supposed to work: The Virtio legacy spec makes the
> > > existence of those two virtio config fields dependent on the
> > > (dynamic!) enablement status of MSI-X. This is reflected in:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/virtio_pci.h#n72
> > > and explicitly mentioned as a footnote in the virtio 0.9.5 spec[1]:
> > > "3) ie. once you enable MSI-X on the device, the other fields move. If
> > > you turn it off again, they move back!"
> > 
> > Madness! What was Rusty on at the time? I really hope the bitcoin
> > thing is buying him better stuff...
> > 
> > > I agree that this looks like a bad idea, but I am afraid we are stuck
> > > with this. It looks like the Linux driver is at fault here: it should
> > > not issue the config access when MSIs are disabled. Something like this
> > > (untested):
> > > 
> > > --- a/drivers/virtio/virtio_pci_legacy.c
> > > +++ b/drivers/virtio/virtio_pci_legacy.c
> > > @@ -103,6 +103,9 @@ static void vp_reset(struct virtio_device *vdev)
> > >  
> > >  static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
> > >  {
> > > +	if (!vp_dev->msix_enabled)
> > > +		return VIRTIO_MSI_NO_VECTOR;
> > > +
> > >  	/* Setup the vector used for configuration events */
> > >  	iowrite16(vector, vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> > >  	/* Verify we had enough resources to assign the vector */
> > > 
> > > This is just my first idea after looking at this, happy to stand
> > > corrected or hear about a better solution.
> > 
> > I don't think this works. It instead completely disables MSI-X, which
> > is a total bore. I think the only way to deal with it is to quirk it
> > to prevent the bulk masking from taking effect before MSI-X is enabled.
> 
> Actually, let me correct myself. I tested the wrong configuration (why
> isn't --force-pci the bloody default in kvmtool?). This patch doesn't
> fix anything at all, and kvmtool just explodes.
> 
> Having dug further, it isn't the config space that causes problems,
> but the programming of the MSI-X vectors. I'm starting to suspect the
> layout of the MSI-X bar in kvmtool.

OK, this is hilarious. Sort of.
The MSI-X bar sizing is bonkers: you
can't fit 33 MSI-X table entries there (33 being the number of MSI-X
vectors that kvmtool advertises), and you will have notionally
overwritten the PBA as well. Amusingly, the last write ends up being
misdecoded as a config space access... "works for me".

	M.

From a2b3a338aab535a1683cc5b424455ed7fd3a500a Mon Sep 17 00:00:00 2001
From: Marc Zyngier
Date: Wed, 25 Aug 2021 18:19:27 +0100
Subject: [PATCH] virtio/pci: Size the MSI-X bar according to the number of MSI-X

Since 45d3b59e8c45 ("kvm tools: Increase amount of possible interrupts
per PCI device"), the number of MSI-X vectors has gone from 4 to 33.

However, the corresponding storage hasn't been upgraded, and writing
to the MSI-X table is a pretty risky business. Now that the Linux
kernel writes to *all* MSI-X entries before doing anything else
with the device, kvmtool dies a horrible death.

Fix it by properly defining the size of the MSI-X bar, and make
Linux great again.

Signed-off-by: Marc Zyngier
---
 virtio/pci.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/virtio/pci.c b/virtio/pci.c
index eb91f512..726146fc 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -7,6 +7,7 @@
 #include "kvm/irq.h"
 #include "kvm/virtio.h"
 #include "kvm/ioeventfd.h"
+#include "kvm/util.h"
 
 #include
 #include
@@ -14,6 +15,13 @@
 #include
 #include
 
+#define ALIGN_UP(x, s)		ALIGN((x) + (s) - 1, (s))
+#define VIRTIO_NR_MSIX		(VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG)
+#define VIRTIO_MSIX_TABLE_SIZE	(VIRTIO_NR_MSIX * 16)
+#define VIRTIO_MSIX_PBA_SIZE	(ALIGN_UP(VIRTIO_MSIX_TABLE_SIZE, 64) / 8)
+#define VIRTIO_MSIX_BAR_SIZE	(1UL << fls_long(VIRTIO_MSIX_TABLE_SIZE + \
+						 VIRTIO_MSIX_PBA_SIZE))
+
 static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
 {
 	return pci__bar_address(&vpci->pci_hdr, 0);
@@ -336,15 +344,20 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 	int vecnum;
 	size_t offset;
 
-	if (addr > msix_io_addr + PCI_IO_SIZE) {
+	if (addr >= msix_io_addr + VIRTIO_MSIX_TABLE_SIZE) {
+		/* Read access to PBA */
 		if (is_write)
 			return;
-		table = (struct msix_table *)&vpci->msix_pba;
-		offset = addr - (msix_io_addr + PCI_IO_SIZE);
-	} else {
-		table = vpci->msix_table;
-		offset = addr - msix_io_addr;
+		offset = addr - (msix_io_addr + VIRTIO_MSIX_TABLE_SIZE);
+		if ((offset + len) > sizeof (vpci->msix_pba))
+			return;
+		memcpy(data, (void *)&vpci->msix_pba + offset, len);
+		return;
 	}
+
+	table = vpci->msix_table;
+	offset = addr - msix_io_addr;
+
 	vecnum = offset / sizeof(struct msix_table);
 	offset = offset % sizeof(struct msix_table);
 
@@ -520,7 +533,7 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
 	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
+	msix_io_block = pci_get_mmio_block(VIRTIO_MSIX_BAR_SIZE);
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -543,7 +556,7 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.capabilities = (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
 		.bar_size[0] = cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[1] = cpu_to_le32(PCI_IO_SIZE),
-		.bar_size[2] = cpu_to_le32(PCI_IO_SIZE*2),
+		.bar_size[2] = cpu_to_le32(VIRTIO_MSIX_BAR_SIZE),
 	};
 
 	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
-- 
2.30.2

-- 
Without deviation from the norm, progress is not possible.
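Although it turned out not to be the culprit, the legacy register layout from the "fields move" footnote quoted in the thread is easy to get wrong in an emulator. The following is a minimal C sketch of how a legacy virtio-pci device might classify an access to its I/O BAR. The offsets (20 for the config vector, 22 for the queue vector, device-specific config at 24 with MSI-X enabled or 20 without) follow include/uapi/linux/virtio_pci.h; `enum legacy_reg` and `legacy_decode()` are hypothetical names for illustration, not kvmtool or Linux code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Legacy register offsets, as defined in include/uapi/linux/virtio_pci.h. */
#define LEGACY_MSI_CONFIG_VECTOR	20
#define LEGACY_MSI_QUEUE_VECTOR		22
#define LEGACY_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)

/* Hypothetical classification of an access to the legacy I/O BAR. */
enum legacy_reg {
	REG_COMMON,		/* 0..19: features, queue, status, ISR */
	REG_MSI_CONFIG_VECTOR,	/* 20..21, only while MSI-X is enabled */
	REG_MSI_QUEUE_VECTOR,	/* 22..23, only while MSI-X is enabled */
	REG_DEVICE_CONFIG,	/* device-specific config space */
};

/*
 * Offsets 20..23 change meaning with the MSI-X Enable bit: they hold the
 * two vector registers while MSI-X is on, and the start of the device
 * config space while it is off.
 */
static enum legacy_reg legacy_decode(size_t offset, bool msix_enabled)
{
	if (offset < LEGACY_MSI_CONFIG_VECTOR)
		return REG_COMMON;
	if (offset >= LEGACY_CONFIG_OFF(msix_enabled))
		return REG_DEVICE_CONFIG;
	/* Only reachable with MSI-X enabled, i.e. offsets 20..23. */
	if (offset < LEGACY_MSI_QUEUE_VECTOR)
		return REG_MSI_CONFIG_VECTOR;
	return REG_MSI_QUEUE_VECTOR;
}
```

For example, a 16-bit write at offset 20 programs the config vector with MSI-X enabled, but lands in the device's own config space with MSI-X disabled, which is why the emulation has to track the MSI-X Enable bit to decode accesses at all.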
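The sizing arithmetic behind the MSI-X bar fix can be checked in isolation. This is a minimal sketch, not the patch's macros: it assumes a 16-byte MSI-X table entry and a PBA of one pending bit per vector rounded up to whole QWORDs (both per the PCI MSI-X capability definition), places the PBA directly after the table, and rounds the BAR up to a power of two. `ASSUMED_PCI_IO_SIZE` (0x100) stands in for kvmtool's `PCI_IO_SIZE` and is an assumption; `msix_layout()`, `old_layout_fits()` and `msix_table_index()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSIX_ENTRY_SIZE		16	/* addr lo, addr hi, data, vector control */
#define ASSUMED_PCI_IO_SIZE	0x100	/* assumption: kvmtool's PCI_IO_SIZE */

struct msix_bar_layout {
	size_t table_size;	/* bytes */
	size_t pba_offset;	/* bytes from the start of the BAR */
	size_t pba_size;	/* bytes */
	size_t bar_size;	/* power of two, as BAR sizes must be */
};

/* Smallest power of two >= x. */
static size_t pow2_roundup(size_t x)
{
	size_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Hypothetical helper: table first, then the PBA, in a single BAR. */
static struct msix_bar_layout msix_layout(unsigned int nr_vectors)
{
	struct msix_bar_layout l;

	/* MSI-X Table Size is an 11-bit N-1 field: 1..2048 vectors. */
	assert(nr_vectors >= 1 && nr_vectors <= 2048);

	l.table_size = (size_t)nr_vectors * MSIX_ENTRY_SIZE;
	/* One pending bit per vector, rounded up to whole QWORDs. */
	l.pba_size = ((nr_vectors + 63) / 64) * 8;
	/* A multiple of 16 bytes, so the PBA is already QWORD aligned. */
	l.pba_offset = l.table_size;
	l.bar_size = pow2_roundup(l.pba_offset + l.pba_size);
	return l;
}

/* Did the old layout (table in the first PCI_IO_SIZE bytes) have room? */
static bool old_layout_fits(unsigned int nr_vectors)
{
	return (size_t)nr_vectors * MSIX_ENTRY_SIZE <= ASSUMED_PCI_IO_SIZE;
}

/* Table entry index for a BAR offset, or -1 if it falls in the PBA. */
static long msix_table_index(size_t offset, size_t pba_offset)
{
	if (offset >= pba_offset)	/* the PBA starts exactly here */
		return -1;
	return (long)(offset / MSIX_ENTRY_SIZE);
}
```

For 33 vectors this gives a 528-byte table, an 8-byte PBA and a 1 KiB BAR. With the assumed 0x100-byte `PCI_IO_SIZE`, the old layout only had room for 16 table entries before the PBA window, so entries 16 to 32 spilled into the PBA and, for the last one, past the end of the 0x200-byte BAR. `msix_table_index()` uses `>=` because the first byte of the PBA sits exactly at the end of the table.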