From: Marc Zyngier <maz@kernel.org>
To: Andre Przywara <andre.przywara@arm.com>
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH][kvmtool] virtio/pci: Size the MSI-X bar according to the number of MSI-X
Date: Tue, 31 Aug 2021 12:28:28 +0100
Message-ID: <87wno1ontv.wl-maz@kernel.org>
In-Reply-To: <20210831121035.6b5c993b@slackpad.fritz.box>

Hi Andre,

On Tue, 31 Aug 2021 12:10:35 +0100,
Andre Przywara <andre.przywara@arm.com> wrote:
>
> On Fri, 27 Aug 2021 12:54:05 +0100
> Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Marc,
>
> > Since 45d3b59e8c45 ("kvm tools: Increase amount of possible interrupts
> > per PCI device"), the number of MSI-X vectors has gone from 4 to 33.
> >
> > However, the corresponding storage hasn't been upgraded, and writing
> > to the MSI-X table is a pretty risky business. Now that the Linux
> > kernel writes to *all* MSI-X entries before doing anything else
> > with the device, kvmtool dies a horrible death.
> >
> > Fix it by properly defining the size of the MSI-X bar, and make
> > Linux great again.
> >
> > This includes some fixes to the PBA region decoding, as well as minor
> > cleanups to make this code a bit more maintainable.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
>
> Many thanks for fixing this, it looks good to me now. Just some
> questions below:
>
> > ---
> > virtio/pci.c | 42 ++++++++++++++++++++++++++++++------------
> > 1 file changed, 30 insertions(+), 12 deletions(-)
> >
> > diff --git a/virtio/pci.c b/virtio/pci.c
> > index eb91f512..41085291 100644
> > --- a/virtio/pci.c
> > +++ b/virtio/pci.c
> > @@ -7,6 +7,7 @@
> > #include "kvm/irq.h"
> > #include "kvm/virtio.h"
> > #include "kvm/ioeventfd.h"
> > +#include "kvm/util.h"
> >
> > #include <sys/ioctl.h>
> > #include <linux/virtio_pci.h>
> > @@ -14,6 +15,13 @@
> > #include <assert.h>
> > #include <string.h>
> >
> > +#define ALIGN_UP(x, s) ALIGN((x) + (s) - 1, (s))
> > +#define VIRTIO_NR_MSIX (VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG)
> > +#define VIRTIO_MSIX_TABLE_SIZE (VIRTIO_NR_MSIX * 16)
> > +#define VIRTIO_MSIX_PBA_SIZE (ALIGN_UP(VIRTIO_MSIX_TABLE_SIZE, 64) / 8)
> > +#define VIRTIO_MSIX_BAR_SIZE (1UL << fls_long(VIRTIO_MSIX_TABLE_SIZE + \
> > + VIRTIO_MSIX_PBA_SIZE))
> > +
> > static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
> > {
> > return pci__bar_address(&vpci->pci_hdr, 0);
> > @@ -333,18 +341,27 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
> > struct virtio_pci *vpci = vdev->virtio;
> > struct msix_table *table;
> > u32 msix_io_addr = virtio_pci__msix_io_addr(vpci);
> > + u32 pba_offset;
> > int vecnum;
> > size_t offset;
> >
> > - if (addr > msix_io_addr + PCI_IO_SIZE) {
>
> Ouch, the missing "=" looks like another long standing bug you fixed, I
> wonder how this ever worked before? Looking deeper it looks like the
> whole PBA code was quite broken (allowing writes, for instance, and
> mixing with the code for the MSIX table)?

I don't think it ever worked. And to be fair, no known guest ever
reads from it either. It's just that, as I was reworking it, some of
the pitfalls became obvious.
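
To make the off-by-one concrete, here is a minimal sketch. It is not
the kvmtool code: the enum, the function names and the parameters are
all made up for illustration, and it only models which region an
access lands in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of an access to the MSI-X BAR. */
enum msix_region { MSIX_REGION_TABLE, MSIX_REGION_PBA };

/* Old-style check: '>' sends the first byte of the PBA to the table path. */
static enum msix_region classify_old(uint32_t addr, uint32_t bar_base,
				     uint32_t pba_offset)
{
	assert(pba_offset <= UINT32_MAX - bar_base);
	return addr > bar_base + pba_offset ? MSIX_REGION_PBA
					    : MSIX_REGION_TABLE;
}

/* Fixed check: '>=' makes the PBA start exactly at bar_base + pba_offset. */
static enum msix_region classify_fixed(uint32_t addr, uint32_t bar_base,
				       uint32_t pba_offset)
{
	assert(pba_offset <= UINT32_MAX - bar_base);
	return addr >= bar_base + pba_offset ? MSIX_REGION_PBA
					     : MSIX_REGION_TABLE;
}

/* As in the patch, writes that land in the PBA are simply ignored. */
static bool access_allowed(enum msix_region region, bool is_write)
{
	return !(region == MSIX_REGION_PBA && is_write);
}
```

With a BAR at 0x1000 and a PBA offset of 0x210 (33 entries of 16
bytes), an access at 0x1210 is treated as a table access by
classify_old() and as a PBA access by classify_fixed().
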
>
> > + BUILD_BUG_ON(VIRTIO_NR_MSIX > (sizeof(vpci->msix_pba) * 8));
> > +
> > + pba_offset = vpci->pci_hdr.msix.pba_offset & ~PCI_MSIX_TABLE_BIR;
>
> Any particular reason you read back the offset from the MSIX capability
> instead of just using VIRTIO_MSIX_TABLE_SIZE here? Is that to avoid
> accidentally diverging in the future, by having just one place of
> definition?

Exactly. My first version of this patch actually failed to update the
offset advertised to the guest, so I decided to just have a single
location for this. At least we won't have to touch this code again if
we change the number of MSI-X vectors.
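
The "single location" idea can be sketched as follows. This is an
illustration under assumptions, not the kvmtool implementation: the
struct and helper names are hypothetical, and the only facts taken
from the patch are the 16-byte entry size, the 33 vectors and the
masking of the BIR bits (the low three bits of the offset register).

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_BIR_MASK	0x7u	/* low 3 bits select the BAR */
#define MSIX_ENTRY_SIZE	16u	/* bytes per MSI-X table entry */
#define NR_MSIX		33u	/* vector count from the commit message */
#define MSIX_TABLE_SIZE	(NR_MSIX * MSIX_ENTRY_SIZE)

/* Hypothetical stand-in for the MSI-X capability fields. */
struct msix_cap_sketch {
	uint32_t table_offset;	/* offset | BIR */
	uint32_t pba_offset;	/* offset | BIR */
};

/* The offset is written once, when the capability is set up. */
static void msix_cap_init(struct msix_cap_sketch *cap, uint32_t bir)
{
	assert(bir <= MSIX_BIR_MASK);
	/* The offset must leave the BIR bits free. */
	assert((MSIX_TABLE_SIZE & MSIX_BIR_MASK) == 0);
	cap->table_offset = bir;		/* table at offset 0 */
	cap->pba_offset = MSIX_TABLE_SIZE | bir;	/* PBA follows the table */
}

/* The MMIO handler reads it back instead of recomputing it. */
static uint32_t msix_pba_offset(const struct msix_cap_sketch *cap)
{
	return cap->pba_offset & ~MSIX_BIR_MASK;
}
```

Whatever value the guest sees in the capability is then, by
construction, the value the handler decodes against.
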
>
> > + if (addr >= msix_io_addr + pba_offset) {
> > + /* Read access to PBA */
> > if (is_write)
> > return;
> > - table = (struct msix_table *)&vpci->msix_pba;
> > - offset = addr - (msix_io_addr + PCI_IO_SIZE);
> > - } else {
> > - table = vpci->msix_table;
> > - offset = addr - msix_io_addr;
> > + offset = addr - (msix_io_addr + pba_offset);
> > + if ((offset + len) > sizeof (vpci->msix_pba))
> > + return;
> > + memcpy(data, (void *)&vpci->msix_pba + offset, len);
>
> Should this be a char* cast, since pointer arithmetic on void* is
> somewhat frowned upon (aka "forbidden in the C standard, but allowed as
> a GCC extension")?

I am trying to be consistent. A quick grep shows at least 19
occurrences of pointer arithmetic with '(void *)', and none with
'(char *)'. Happy for someone to go and repaint this, but I don't
think this should be the purpose of this patch.
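
For reference, the standard-conforming spelling would look roughly
like the sketch below. The function name and signature are
hypothetical and this is not proposed for the patch; it only shows the
same bounds-checked copy done through an 'unsigned char *' instead of
relying on the GCC extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'len' bytes starting at 'offset' out of a PBA-like buffer.
 * Returns false, leaving 'data' untouched, if the access would run
 * past the end of the buffer.
 */
static bool pba_read(const void *pba, size_t pba_size, size_t offset,
		     void *data, size_t len)
{
	assert(pba && data);

	/* Written this way so that offset + len cannot wrap around. */
	if (offset > pba_size || len > pba_size - offset)
		return false;

	/* Byte-wise arithmetic on unsigned char * is plain ISO C. */
	memcpy(data, (const unsigned char *)pba + offset, len);
	return true;
}
```
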
Thanks,
M.
--
Without deviation from the norm, progress is not possible.