[PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes
@ 2023-04-06  3:57 Marek Marczykowski-Górecki
  2023-04-06  3:57 ` [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-06  3:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Marek Marczykowski-Górecki

This series includes changes to make MSI-X working with Linux stubdomain and
especially Intel Wifi 6 AX210 card. This takes care of remaining reasons for
QEMU to access /dev/mem, but also the Intel Wifi card violating spec by putting
some registers on the same page as the MSI-X table.

See individual patches for details.

Marek Marczykowski-Górecki (4):
  x86/msi: passthrough all MSI-X vector ctrl writes to device model
  tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext
  x86/hvm: Allow writes to registers on the same page as MSI-X table
  x86/msi: clear initial MSI-X state on boot

 tools/include/xendevicemodel.h |  23 ++++-
 tools/libs/devicemodel/core.c  |  16 ++-
 xen/arch/x86/hvm/vmsi.c        | 209 ++++++++++++++++++++++++++++++++--
 xen/arch/x86/include/asm/msi.h |   5 +-
 xen/arch/x86/msi.c             |  38 ++++++-
 xen/common/ioreq.c             |   9 +-
 xen/drivers/passthrough/msi.c  |  17 +++-
 xen/include/public/hvm/dm_op.h |  12 +-
 8 files changed, 310 insertions(+), 19 deletions(-)

base-commit: 881ba20eb0222305a9d2cd090c9345992794f4f5
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-04-06  3:57 [PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
@ 2023-04-06  3:57 ` Marek Marczykowski-Górecki
  2023-04-24 13:06   ` Jan Beulich
  2023-05-03  9:01   ` Roger Pau Monné
  2023-04-06  3:57 ` [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext Marek Marczykowski-Górecki
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-06  3:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Paul Durrant

QEMU needs to know whether clearing maskbit of a vector is really
clearing, or was already cleared before. Currently Xen sends only
clearing that bit to the device model, but not setting it, so QEMU
cannot detect it. Because of that, QEMU is working this around by
checking via /dev/mem, but that isn't the proper approach. It's just a
workaround which in fact is racy.

Give all necessary information to QEMU by passing all ctrl writes,
including masking a vector.

While this commit doesn't move the whole maskbit handling to QEMU (as
discussed on xen-devel as one of the possibilities), it is a necessary
first step anyway. Including telling QEMU it will get all the required
information to do so. The actual implementation would need to include:
 - a hypercall for QEMU to control just maskbit (without (re)binding the
   interrupt again
 - a methor for QEMU to tell Xen it will actually do the work
Those are not part of this series.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v3:
 - advertise changed behavior in XEN_DMOP_get_ioreq_server_info - make
   "flags" parameter IN/OUT
 - move len check back to msixtbl_write() - will be needed there anyway
   in a later patch
v2:
 - passthrough quad writes to emulator too (Jan)
 - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
   #define for this magic value

Should flags on output include only "out" values (current version), or
also include those passed in by the caller unchanged?
---
 xen/arch/x86/hvm/vmsi.c        | 18 ++++++++++++++----
 xen/common/ioreq.c             |  9 +++++++--
 xen/include/public/hvm/dm_op.h | 12 ++++++++----
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 3cd4923060c8..231253a2cbd4 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -272,6 +272,15 @@ out:
     return r;
 }
 
+/*
+ * This function returns X86EMUL_UNHANDLEABLE even if write is properly
+ * handled, to propagate it to the device model (so it can keep its internal
+ * state in sync).
+ * len==0 means really len==4, but as a write completion that will return
+ * X86EMUL_OKAY on successful processing. Use WRITE_LEN4_COMPLETION to make it
+ * less confusing.
+ */
+#define WRITE_LEN4_COMPLETION 0
 static int msixtbl_write(struct vcpu *v, unsigned long address,
                          unsigned int len, unsigned long val)
 {
@@ -283,8 +292,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     unsigned long flags;
     struct irq_desc *desc;
 
-    if ( (len != 4 && len != 8) || (address & (len - 1)) )
-        return r;
+    if ( (len != 4 && len != 8 && len != WRITE_LEN4_COMPLETION) ||
+         (len && (address & (len - 1))) )
+        return X86EMUL_UNHANDLEABLE;
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
@@ -345,7 +355,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
 
 unlock:
     spin_unlock_irqrestore(&desc->lock, flags);
-    if ( len == 4 )
+    if ( len == WRITE_LEN4_COMPLETION )
         r = X86EMUL_OKAY;
 
 out:
@@ -635,7 +645,7 @@ void msix_write_completion(struct vcpu *v)
         return;
 
     v->arch.hvm.hvm_io.msix_unmask_address = 0;
-    if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
+    if ( msixtbl_write(v, ctrl_address, WRITE_LEN4_COMPLETION, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index ecb8f545e1c4..bd6f074c1e85 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -743,7 +743,8 @@ static int ioreq_server_destroy(struct domain *d, ioservid_t id)
 static int ioreq_server_get_info(struct domain *d, ioservid_t id,
                                  unsigned long *ioreq_gfn,
                                  unsigned long *bufioreq_gfn,
-                                 evtchn_port_t *bufioreq_port)
+                                 evtchn_port_t *bufioreq_port,
+                                 uint16_t *flags)
 {
     struct ioreq_server *s;
     int rc;
@@ -779,6 +780,9 @@ static int ioreq_server_get_info(struct domain *d, ioservid_t id,
             *bufioreq_port = s->bufioreq_evtchn;
     }
 
+    /* Advertise supported features/behaviors. */
+    *flags = XEN_DMOP_all_msix_writes;
+
     rc = 0;
 
  out:
@@ -1374,7 +1378,8 @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
                                    NULL : (unsigned long *)&data->ioreq_gfn,
                                    (data->flags & XEN_DMOP_no_gfns) ?
                                    NULL : (unsigned long *)&data->bufioreq_gfn,
-                                   &data->bufioreq_port);
+                                   &data->bufioreq_port, &data->flags);
+
         break;
     }
 
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index acdf91693d0b..490b151c5dd7 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -70,7 +70,9 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
  * not contain XEN_DMOP_no_gfns then these pages will be made available and
  * the frame numbers passed back in gfns <ioreq_gfn> and <bufioreq_gfn>
  * respectively. (If the IOREQ Server is not handling buffered emulation
- * only <ioreq_gfn> will be valid).
+ * only <ioreq_gfn> will be valid). When Xen returns XEN_DMOP_all_msix_writes
+ * flag set, it will notify the IOREQ server about all writes to MSI-X table
+ * (if it's handled by this IOREQ server), not only those clearing a mask bit.
  *
  * NOTE: To access the synchronous ioreq structures and buffered ioreq
  *       ring, it is preferable to use the XENMEM_acquire_resource memory
@@ -81,11 +83,13 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
 struct xen_dm_op_get_ioreq_server_info {
     /* IN - server id */
     ioservid_t id;
-    /* IN - flags */
+    /* IN/OUT - flags */
     uint16_t flags;
 
-#define _XEN_DMOP_no_gfns 0
-#define XEN_DMOP_no_gfns (1u << _XEN_DMOP_no_gfns)
+#define _XEN_DMOP_no_gfns         0  /* IN */
+#define _XEN_DMOP_all_msix_writes 1  /* OUT */
+#define XEN_DMOP_no_gfns         (1u << _XEN_DMOP_no_gfns)
+#define XEN_DMOP_all_msix_writes (1u << _XEN_DMOP_all_msix_writes)
 
     /* OUT - buffered ioreq port */
     evtchn_port_t bufioreq_port;
-- 
git-series 0.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext
  2023-04-06  3:57 [PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
  2023-04-06  3:57 ` [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
@ 2023-04-06  3:57 ` Marek Marczykowski-Górecki
  2023-04-06  6:05   ` Juergen Gross
  2023-04-06  3:57 ` [PATCH v3 3/4] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
  2023-04-06  3:57 ` [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
  3 siblings, 1 reply; 15+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-06  3:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Wei Liu, Anthony PERARD,
	Juergen Gross

Add xendevicemodel_get_ioreq_server_info_ext() which additionally
returns output flags that XEN_DMOP_get_ioreq_server_info can now return.
Do not change signature of existing
xendevicemodel_get_ioreq_server_info() so existing users will not need
to be changed.

This advertises behavior change of "x86/msi: passthrough all MSI-X
vector ctrl writes to device model" patch.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v3:
 - new patch

Should there be some HAVE_* #define in the header? Does this change
require soname bump (I hope it doesn't...).
---
 tools/include/xendevicemodel.h | 23 +++++++++++++++++++++++
 tools/libs/devicemodel/core.c  | 16 ++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
index 797e0c6b2961..77a99e670551 100644
--- a/tools/include/xendevicemodel.h
+++ b/tools/include/xendevicemodel.h
@@ -72,6 +72,29 @@ int xendevicemodel_get_ioreq_server_info(
     evtchn_port_t *bufioreq_port);
 
 /**
+ * This function retrieves the necessary information to allow an
+ * emulator to use an IOREQ Server, including feature flags.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm ioreq_gfn pointer to a xen_pfn_t to receive the synchronous ioreq
+ *                  gfn. (May be NULL if not required)
+ * @parm bufioreq_gfn pointer to a xen_pfn_t to receive the buffered ioreq
+ *                    gfn. (May be NULL if not required)
+ * @parm bufioreq_port pointer to a evtchn_port_t to receive the buffered
+ *                     ioreq event channel. (May be NULL if not required)
+ * @parm flags pointer to receive flags bitmask, see hvm/dm_op.h for details.
+ *             (May be NULL if not required)
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_get_ioreq_server_info_ext(
+    xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
+    xen_pfn_t *ioreq_gfn, xen_pfn_t *bufioreq_gfn,
+    evtchn_port_t *bufioreq_port,
+    unsigned int *flags);
+
+/**
  * This function registers a range of memory or I/O ports for emulation.
  *
  * @parm dmod a handle to an open devicemodel interface.
diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 8e619eeb0a1f..337622e608c2 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -189,10 +189,10 @@ int xendevicemodel_create_ioreq_server(
     return 0;
 }
 
-int xendevicemodel_get_ioreq_server_info(
+int xendevicemodel_get_ioreq_server_info_ext(
     xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
     xen_pfn_t *ioreq_gfn, xen_pfn_t *bufioreq_gfn,
-    evtchn_port_t *bufioreq_port)
+    evtchn_port_t *bufioreq_port, unsigned int *flags)
 {
     struct xen_dm_op op;
     struct xen_dm_op_get_ioreq_server_info *data;
@@ -226,9 +226,21 @@ int xendevicemodel_get_ioreq_server_info(
     if (bufioreq_port)
         *bufioreq_port = data->bufioreq_port;
 
+    if (flags)
+        *flags = data->flags;
+
     return 0;
 }
 
+int xendevicemodel_get_ioreq_server_info(
+    xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
+    xen_pfn_t *ioreq_gfn, xen_pfn_t *bufioreq_gfn,
+    evtchn_port_t *bufioreq_port)
+{
+    return xendevicemodel_get_ioreq_server_info_ext(
+        dmod, domid, id, ioreq_gfn, bufioreq_gfn, bufioreq_port, NULL);
+}
+
 int xendevicemodel_map_io_range_to_ioreq_server(
     xendevicemodel_handle *dmod, domid_t domid, ioservid_t id, int is_mmio,
     uint64_t start, uint64_t end)
-- 
git-series 0.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/4] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-04-06  3:57 [PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
  2023-04-06  3:57 ` [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
  2023-04-06  3:57 ` [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext Marek Marczykowski-Górecki
@ 2023-04-06  3:57 ` Marek Marczykowski-Górecki
  2023-04-24 13:59   ` Jan Beulich
  2023-04-06  3:57 ` [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
  3 siblings, 1 reply; 15+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-06  3:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu

Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
on the same page as MSI-X table. Device model (especially one in
stubdomain) cannot really handle those, as direct writes to that page is
refused (page is on the mmio_ro_ranges list). Instead, extend
msixtbl_mmio_ops to handle such accesses too.

Doing this, requires correlating write location with guest view
of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
for PV would need to be done separately.

This will be also used to read Pending Bit Array, if it lives on the same
page, making QEMU not needing /dev/mem access at all (especially helpful
with lockdown enabled in dom0). If PBA lives on another page, QEMU will
map it to the guest directly.
If PBA lives on the same page, discard writes and log a message.
Technically, writes outside of PBA could be allowed, but at this moment
the precise location of PBA isn't saved, and also no known device abuses
the spec in this way (at least yet).

To access those registers, msixtbl_mmio_ops need the relevant page
mapped. MSI handling already has infrastructure for that, using fixmap,
so try to map first/last page of the MSI-X table (if necessary) and save
their fixmap indexes. Note that msix_get_fixmap() does reference
counting and reuses existing mapping, so just call it directly, even if
the page was mapped before. Also, it uses a specific range of fixmap
indexes which doesn't include 0, so use 0 as default ("not mapped")
value - which simplifies code a bit.

GCC gets confused about 'desc' variable:

    arch/x86/hvm/vmsi.c: In function ‘msixtbl_range’:
    arch/x86/hvm/vmsi.c:553:8: error: ‘desc’ may be used uninitialized [-Werror=maybe-uninitialized]
      553 |     if ( desc )
          |        ^
    arch/x86/hvm/vmsi.c:537:28: note: ‘desc’ was declared here
      537 |     const struct msi_desc *desc;
          |                            ^~~~

It's conditional initialization is actually correct (in the case where
it isn't initialized, function returns early), but to avoid
build failure initialize it explicitly to NULL anyway.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v3:
 - merge handling into msixtbl_mmio_ops
 - extend commit message
v2:
 - adjust commit message
 - pass struct domain to msixtbl_page_handler_get_hwaddr()
 - reduce local variables used only once
 - log a warning if write is forbidden if MSI-X and PBA lives on the same
   page
 - do not passthrough unaligned accesses
 - handle accesses both before and after MSI-X table
---
 xen/arch/x86/hvm/vmsi.c        | 199 ++++++++++++++++++++++++++++++++--
 xen/arch/x86/include/asm/msi.h |   5 +-
 xen/arch/x86/msi.c             |  38 ++++++-
 3 files changed, 231 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 231253a2cbd4..6f49493d3f58 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -181,15 +181,21 @@ static bool msixtbl_initialised(const struct domain *d)
 }
 
 static struct msixtbl_entry *msixtbl_find_entry(
-    struct vcpu *v, unsigned long addr)
+    struct vcpu *v, unsigned long addr, bool same_page)
 {
     struct msixtbl_entry *entry;
     struct domain *d = v->domain;
 
     list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
+    {
+        if ( same_page &&
+             PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
+             PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len) )
+            return entry;
         if ( addr >= entry->gtable &&
              addr < entry->gtable + entry->table_len )
             return entry;
+    }
 
     return NULL;
 }
@@ -213,6 +219,144 @@ static struct msi_desc *msixtbl_addr_to_desc(
     return NULL;
 }
 
+/*
+ * Returns:
+ *  - UINT_MAX if no handling should be done
+ *  - UINT_MAX-1 if write should be discarded
+ *  - a fixmap idx to use for handling
+ */
+#define ADJACENT_DONT_HANDLE UINT_MAX
+#define ADJACENT_DISCARD_WRITE (UINT_MAX - 1)
+static unsigned int adjacent_handle(
+    const struct msixtbl_entry *entry, unsigned long addr, bool write)
+{
+    unsigned int adj_type;
+
+    if ( !entry || !entry->pdev )
+        return ADJACENT_DONT_HANDLE;
+
+    if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable )
+        adj_type = ADJ_IDX_FIRST;
+    else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len - 1) &&
+              addr >= entry->gtable + entry->table_len )
+        adj_type = ADJ_IDX_LAST;
+    else
+        return ADJACENT_DONT_HANDLE;
+
+    ASSERT(entry->pdev->msix);
+
+    if ( !entry->pdev->msix->adj_access_table_idx[adj_type] )
+    {
+        gprintk(XENLOG_WARNING,
+                "Page for adjacent(%d) MSI-X table access not initialized for %pp (addr %#lx, gtable %#lx\n",
+                adj_type, &entry->pdev->sbdf, addr, entry->gtable);
+
+        return ADJACENT_DONT_HANDLE;
+    }
+
+    /* If PBA lives on the same page too, discard writes. */
+    if ( write && (
+        (adj_type == ADJ_IDX_LAST &&
+         entry->pdev->msix->table.last == entry->pdev->msix->pba.first) ||
+        (adj_type == ADJ_IDX_FIRST &&
+         entry->pdev->msix->table.first == entry->pdev->msix->pba.last)) )
+    {
+        gprintk(XENLOG_WARNING,
+                 "MSI-X table and PBA of %pp live on the same page, "
+                 "writing to other registers there is not implemented\n",
+                 &entry->pdev->sbdf);
+        return ADJACENT_DISCARD_WRITE;
+    }
+
+    return entry->pdev->msix->adj_access_table_idx[adj_type];
+}
+
+static int adjacent_read(
+        unsigned int fixmap_idx,
+        uint64_t address, uint32_t len, uint64_t *pval)
+{
+    const void __iomem *hwaddr;
+
+    *pval = ~0UL;
+
+    if ( !IS_ALIGNED(address, len) )
+    {
+        gdprintk(XENLOG_WARNING,
+                "Dropping unaligned read from MSI-X table page at %" PRIx64 "\n",
+                address);
+        return X86EMUL_OKAY;
+    }
+
+    ASSERT(fixmap_idx != ADJACENT_DISCARD_WRITE);
+
+    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
+
+    switch ( len )
+    {
+    case 1:
+        *pval = readb(hwaddr);
+        break;
+
+    case 2:
+        *pval = readw(hwaddr);
+        break;
+
+    case 4:
+        *pval = readl(hwaddr);
+        break;
+
+    case 8:
+        *pval = readq(hwaddr);
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
+    return X86EMUL_OKAY;
+}
+
+static int adjacent_write(
+        unsigned int fixmap_idx,
+        uint64_t address, uint32_t len, uint64_t val)
+{
+    void __iomem *hwaddr;
+
+    if ( !IS_ALIGNED(address, len) )
+    {
+        gdprintk(XENLOG_WARNING,
+                "Dropping unaligned write to MSI-X table page at %" PRIx64 "\n",
+                address);
+        return X86EMUL_OKAY;
+    }
+
+    if ( fixmap_idx == ADJACENT_DISCARD_WRITE )
+        return X86EMUL_OKAY;
+
+    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
+
+    switch ( len ) {
+    case 1:
+        writeb(val, hwaddr);
+        break;
+
+    case 2:
+        writew(val, hwaddr);
+        break;
+
+    case 4:
+        writel(val, hwaddr);
+        break;
+
+    case 8:
+        writeq(val, hwaddr);
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
+    return X86EMUL_OKAY;
+}
+
 static int cf_check msixtbl_read(
     const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
     uint64_t *pval)
@@ -220,16 +364,27 @@ static int cf_check msixtbl_read(
     unsigned long offset;
     struct msixtbl_entry *entry;
     unsigned int nr_entry, index;
+    unsigned int adjacent_fixmap;
     int r = X86EMUL_UNHANDLEABLE;
 
-    if ( (len != 4 && len != 8) || (address & (len - 1)) )
+    if ( !IS_ALIGNED(address, len) )
         return r;
 
     rcu_read_lock(&msixtbl_rcu_lock);
-
-    entry = msixtbl_find_entry(current, address);
+    entry = msixtbl_find_entry(current, address, true);
     if ( !entry )
         goto out;
+
+    adjacent_fixmap = adjacent_handle(entry, address, false);
+    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
+    {
+        r = adjacent_read(adjacent_fixmap, address, len, pval);
+        goto out;
+    }
+
+    if ( len != 4 && len != 8 )
+        goto out;
+
     offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
 
     if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET )
@@ -291,16 +446,29 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     int r = X86EMUL_UNHANDLEABLE;
     unsigned long flags;
     struct irq_desc *desc;
+    unsigned int adjacent_fixmap;
 
-    if ( (len != 4 && len != 8 && len != WRITE_LEN4_COMPLETION) ||
-         (len && (address & (len - 1))) )
-        return X86EMUL_UNHANDLEABLE;
+    if ( len && !IS_ALIGNED(address, len) )
+        return r;
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
-    entry = msixtbl_find_entry(v, address);
+    entry = msixtbl_find_entry(v, address, true);
     if ( !entry )
         goto out;
+
+    if ( len != WRITE_LEN4_COMPLETION )
+    {
+        adjacent_fixmap = adjacent_handle(entry, address, true);
+        if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
+        {
+            r = adjacent_write(adjacent_fixmap, address, len, val);
+            goto out;
+        }
+        if ( len != 4 && len != 8 )
+            goto out;
+    }
+
     nr_entry = array_index_nospec(((address - entry->gtable) /
                                    PCI_MSIX_ENTRY_SIZE),
                                   MAX_MSIX_TABLE_ENTRIES);
@@ -375,14 +543,23 @@ static bool cf_check msixtbl_range(
 {
     struct vcpu *curr = current;
     unsigned long addr = r->addr;
-    const struct msi_desc *desc;
+    struct msixtbl_entry *entry;
+    const struct msi_desc *desc = NULL;
+    unsigned int adjacent_fixmap;
+
 
     ASSERT(r->type == IOREQ_TYPE_COPY);
 
     rcu_read_lock(&msixtbl_rcu_lock);
-    desc = msixtbl_addr_to_desc(msixtbl_find_entry(curr, addr), addr);
+    entry = msixtbl_find_entry(curr, addr, true);
+    adjacent_fixmap = adjacent_handle(entry, addr, false);
+    if ( adjacent_fixmap == ADJACENT_DONT_HANDLE )
+        desc = msixtbl_addr_to_desc(entry, addr);
     rcu_read_unlock(&msixtbl_rcu_lock);
 
+    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
+        return 1;
+
     if ( desc )
         return 1;
 
@@ -627,7 +804,7 @@ void msix_write_completion(struct vcpu *v)
         uint32_t data;
 
         rcu_read_lock(&msixtbl_rcu_lock);
-        desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, snoop_addr),
+        desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, snoop_addr, false),
                                     snoop_addr);
         rcu_read_unlock(&msixtbl_rcu_lock);
 
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index a53ade95c9ad..d13cf1c1f873 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -207,6 +207,10 @@ struct msg_address {
                                        PCI_MSIX_ENTRY_SIZE + \
                                        (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
 
+/* indexes in adj_access_table_idx[] below */
+#define ADJ_IDX_FIRST 0
+#define ADJ_IDX_LAST  1
+
 struct arch_msix {
     unsigned int nr_entries, used_entries;
     struct {
@@ -214,6 +218,7 @@ struct arch_msix {
     } table, pba;
     int table_refcnt[MAX_MSIX_TABLE_PAGES];
     int table_idx[MAX_MSIX_TABLE_PAGES];
+    int adj_access_table_idx[2];
     spinlock_t table_lock;
     bool host_maskall, guest_maskall;
     domid_t warned;
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index d0bf63df1def..c216acbf0e5d 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
                 domain_crash(d);
             /* XXX How to deal with existing mappings? */
         }
+
+        /*
+         * If the MSI-X table doesn't start at the page boundary, map the first page for
+         * passthrough accesses.
+         */
+        if ( PAGE_OFFSET(table_paddr) )
+        {
+            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
+
+            if ( idx > 0 )
+                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
+            else
+                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
+        }
+        /*
+         * If the MSI-X table doesn't span full page(s), map the last page for
+         * passthrough accesses.
+         */
+        if ( PAGE_OFFSET(table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) )
+        {
+            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
+            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
+
+            if ( idx > 0 )
+                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
+            else
+                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
+        }
     }
     WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
     ++msix->used_entries;
@@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
             WARN();
         msix->table.first = 0;
         msix->table.last = 0;
+        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
+        {
+            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
+            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
+        }
+        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
+        {
+            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
+            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;
+        }
 
         if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
                                    msix->pba.last) )
-- 
git-series 0.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot
  2023-04-06  3:57 [PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
                   ` (2 preceding siblings ...)
  2023-04-06  3:57 ` [PATCH v3 3/4] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
@ 2023-04-06  3:57 ` Marek Marczykowski-Górecki
  2023-04-24 14:19   ` Jan Beulich
  3 siblings, 1 reply; 15+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-06  3:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Jason Andryuk, Jan Beulich,
	Paul Durrant, Roger Pau Monné

Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Jason reports on his machine MASKALL persists through a
warm reboot, but is cleared on cold boot. Xen relies on initial state
being MASKALL clear. Especially, pci_reset_msix_state() assumes if
MASKALL is set, it was Xen setting it due to msix->host_maskall or
msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
set, so clear them both.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v3:
 - update comment
 - clear bits only when they were set
---
 xen/drivers/passthrough/msi.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
index ce1a450f6f4a..c9f7eac29ebf 100644
--- a/xen/drivers/passthrough/msi.c
+++ b/xen/drivers/passthrough/msi.c
@@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
         spin_lock_init(&msix->table_lock);
 
         ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
+
+        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
+        {
+            /*
+             * pci_reset_msix_state() relies on MASKALL not being set
+             * initially, clear it (and ENABLE too - for safety), to meet that
+             * expectation.
+             */
+            printk(XENLOG_WARNING
+                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
+                   &pdev->sbdf,
+                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
+                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
+            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
+            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
+        }
+
         msix->nr_entries = msix_table_size(ctrl);
 
         pdev->msix = msix;
-- 
git-series 0.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext
  2023-04-06  3:57 ` [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext Marek Marczykowski-Górecki
@ 2023-04-06  6:05   ` Juergen Gross
  2023-05-02 15:13     ` Anthony PERARD
  0 siblings, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2023-04-06  6:05 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 796 bytes --]

On 06.04.23 05:57, Marek Marczykowski-Górecki wrote:
> Add xendevicemodel_get_ioreq_server_info_ext() which additionally
> returns output flags that XEN_DMOP_get_ioreq_server_info can now return.
> Do not change signature of existing
> xendevicemodel_get_ioreq_server_info() so existing users will not need
> to be changed.
> 
> This advertises behavior change of "x86/msi: passthrough all MSI-X
> vector ctrl writes to device model" patch.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v3:
>   - new patch
> 
> Should there be some HAVE_* #define in the header? Does this change
> require soname bump (I hope it doesn't...).

You need to add version 1.5 to libxendevicemodel.map which should define
the new function.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-04-06  3:57 ` [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
@ 2023-04-24 13:06   ` Jan Beulich
  2023-05-03  9:01   ` Roger Pau Monné
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2023-04-24 13:06 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Paul Durrant, xen-devel

On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -272,6 +272,15 @@ out:
>      return r;
>  }
>  
> +/*
> + * This function returns X86EMUL_UNHANDLEABLE even if write is properly
> + * handled, to propagate it to the device model (so it can keep its internal
> + * state in sync).
> + * len==0 means really len==4, but as a write completion that will return
> + * X86EMUL_OKAY on successful processing. Use WRITE_LEN4_COMPLETION to make it
> + * less confusing.
> + */
> +#define WRITE_LEN4_COMPLETION 0
>  static int msixtbl_write(struct vcpu *v, unsigned long address,
>                           unsigned int len, unsigned long val)
>  {
> @@ -283,8 +292,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>      unsigned long flags;
>      struct irq_desc *desc;
>  
> -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> -        return r;
> +    if ( (len != 4 && len != 8 && len != WRITE_LEN4_COMPLETION) ||
> +         (len && (address & (len - 1))) )
> +        return X86EMUL_UNHANDLEABLE;
>  
>      rcu_read_lock(&msixtbl_rcu_lock);
>  
> @@ -345,7 +355,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>  
>  unlock:
>      spin_unlock_irqrestore(&desc->lock, flags);
> -    if ( len == 4 )
> +    if ( len == WRITE_LEN4_COMPLETION )
>          r = X86EMUL_OKAY;
>  
>  out:
> @@ -635,7 +645,7 @@ void msix_write_completion(struct vcpu *v)
>          return;
>  
>      v->arch.hvm.hvm_io.msix_unmask_address = 0;
> -    if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
> +    if ( msixtbl_write(v, ctrl_address, WRITE_LEN4_COMPLETION, 0) != X86EMUL_OKAY )
>          gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
>  }
>  

This part is okay with me, but ...

> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -743,7 +743,8 @@ static int ioreq_server_destroy(struct domain *d, ioservid_t id)
>  static int ioreq_server_get_info(struct domain *d, ioservid_t id,
>                                   unsigned long *ioreq_gfn,
>                                   unsigned long *bufioreq_gfn,
> -                                 evtchn_port_t *bufioreq_port)
> +                                 evtchn_port_t *bufioreq_port,
> +                                 uint16_t *flags)
>  {
>      struct ioreq_server *s;
>      int rc;
> @@ -779,6 +780,9 @@ static int ioreq_server_get_info(struct domain *d, ioservid_t id,
>              *bufioreq_port = s->bufioreq_evtchn;
>      }
>  
> +    /* Advertise supported features/behaviors. */
> +    *flags = XEN_DMOP_all_msix_writes;
> +
>      rc = 0;
>  
>   out:
> @@ -1374,7 +1378,8 @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
>                                     NULL : (unsigned long *)&data->ioreq_gfn,
>                                     (data->flags & XEN_DMOP_no_gfns) ?
>                                     NULL : (unsigned long *)&data->bufioreq_gfn,
> -                                   &data->bufioreq_port);
> +                                   &data->bufioreq_port, &data->flags);
> +
>          break;
>      }
>  
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index acdf91693d0b..490b151c5dd7 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -70,7 +70,9 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
>   * not contain XEN_DMOP_no_gfns then these pages will be made available and
>   * the frame numbers passed back in gfns <ioreq_gfn> and <bufioreq_gfn>
>   * respectively. (If the IOREQ Server is not handling buffered emulation
> - * only <ioreq_gfn> will be valid).
> + * only <ioreq_gfn> will be valid). When Xen returns XEN_DMOP_all_msix_writes
> + * flag set, it will notify the IOREQ server about all writes to MSI-X table
> + * (if it's handled by this IOREQ server), not only those clearing a mask bit.
>   *
>   * NOTE: To access the synchronous ioreq structures and buffered ioreq
>   *       ring, it is preferable to use the XENMEM_acquire_resource memory
> @@ -81,11 +83,13 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
>  struct xen_dm_op_get_ioreq_server_info {
>      /* IN - server id */
>      ioservid_t id;
> -    /* IN - flags */
> +    /* IN/OUT - flags */
>      uint16_t flags;
>  
> -#define _XEN_DMOP_no_gfns 0
> -#define XEN_DMOP_no_gfns (1u << _XEN_DMOP_no_gfns)
> +#define _XEN_DMOP_no_gfns         0  /* IN */
> +#define _XEN_DMOP_all_msix_writes 1  /* OUT */
> +#define XEN_DMOP_no_gfns         (1u << _XEN_DMOP_no_gfns)
> +#define XEN_DMOP_all_msix_writes (1u << _XEN_DMOP_all_msix_writes)
>  
>      /* OUT - buffered ioreq port */
>      evtchn_port_t bufioreq_port;

... this isn't: What is obtained through this hypercall is information
pertaining to a particular ioreq server. I don't think Xen behavior
should be reported there. To me this information much rather fits in
what public/features.h has / offers. And XENVER_get_features isn't
constrained in any way, so any dm ought to be able to retrieve the
information that way.

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/4] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-04-06  3:57 ` [PATCH v3 3/4] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
@ 2023-04-24 13:59   ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2023-04-24 13:59 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -181,15 +181,21 @@ static bool msixtbl_initialised(const struct domain *d)
>  }
>  
>  static struct msixtbl_entry *msixtbl_find_entry(
> -    struct vcpu *v, unsigned long addr)
> +    struct vcpu *v, unsigned long addr, bool same_page)
>  {
>      struct msixtbl_entry *entry;
>      struct domain *d = v->domain;
>  
>      list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> +    {
> +        if ( same_page &&
> +             PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
> +             PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len) )

This will allow access to the subsequent page in case entry->gtable +
entry->table_len falls on a page boundary. I think you mean "< PFN_UP()"
or "<= PFN_DOWN(... - 1)".

But I anyway don't understand why you need both this and ...

> +            return entry;
>          if ( addr >= entry->gtable &&
>               addr < entry->gtable + entry->table_len )
>              return entry;

... this (and the new function parameter). Again like in the recently
added vPCI code, all you should need ought to be the new code you add
(see msix.c:msix_find()), with the caller then separating "in table" vs
"other space". Only one of the four callers really cares about _just_
the table range. In no event do you need to do both pairs of checks:

        if ( same_page
             ? PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
               ...
             : ...

(i.e. in case you disagree with moving the original check to the sole
place where it's needed).

> @@ -213,6 +219,144 @@ static struct msi_desc *msixtbl_addr_to_desc(
>      return NULL;
>  }
>  
> +/*
> + * Returns:
> + *  - UINT_MAX if no handling should be done
> + *  - UINT_MAX-1 if write should be discarded
> + *  - a fixmap idx to use for handling
> + */
> +#define ADJACENT_DONT_HANDLE UINT_MAX
> +#define ADJACENT_DISCARD_WRITE (UINT_MAX - 1)
> +static unsigned int adjacent_handle(
> +    const struct msixtbl_entry *entry, unsigned long addr, bool write)
> +{
> +    unsigned int adj_type;
> +
> +    if ( !entry || !entry->pdev )
> +        return ADJACENT_DONT_HANDLE;
> +
> +    if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable )
> +        adj_type = ADJ_IDX_FIRST;
> +    else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len - 1) &&
> +              addr >= entry->gtable + entry->table_len )
> +        adj_type = ADJ_IDX_LAST;
> +    else
> +        return ADJACENT_DONT_HANDLE;
> +
> +    ASSERT(entry->pdev->msix);

Considering the code below, you probably want to latch this pointer into
a local variable.

> +    if ( !entry->pdev->msix->adj_access_table_idx[adj_type] )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "Page for adjacent(%d) MSI-X table access not initialized for %pp (addr %#lx, gtable %#lx\n",
> +                adj_type, &entry->pdev->sbdf, addr, entry->gtable);
> +
> +        return ADJACENT_DONT_HANDLE;
> +    }
> +
> +    /* If PBA lives on the same page too, discard writes. */
> +    if ( write && (

Nit: Style (no standalone opening parenthesis ought to be last on a line).

> +        (adj_type == ADJ_IDX_LAST &&
> +         entry->pdev->msix->table.last == entry->pdev->msix->pba.first) ||
> +        (adj_type == ADJ_IDX_FIRST &&
> +         entry->pdev->msix->table.first == entry->pdev->msix->pba.last)) )

And these all need indenting by one more blank.

> +    {
> +        gprintk(XENLOG_WARNING,
> +                 "MSI-X table and PBA of %pp live on the same page, "
> +                 "writing to other registers there is not implemented\n",
> +                 &entry->pdev->sbdf);

Whereas here a blank needs dropping. But - don't you discard writes to
more than just the PBA here?

> +        return ADJACENT_DISCARD_WRITE;
> +    }
> +
> +    return entry->pdev->msix->adj_access_table_idx[adj_type];
> +}
> +
> +static int adjacent_read(
> +        unsigned int fixmap_idx,
> +        uint64_t address, uint32_t len, uint64_t *pval)

Nit: Style (too deep indentation). Also while I'm fine with the two
uint64_t (arguably the former may want to be paddr_t), the uint32_t
clearly isn't needed here, and wants to be unsigned int.

> +{
> +    const void __iomem *hwaddr;
> +
> +    *pval = ~0UL;
> +
> +    if ( !IS_ALIGNED(address, len) )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                "Dropping unaligned read from MSI-X table page at %" PRIx64 "\n",
> +                address);

Here indentation is one too little again.

> +        return X86EMUL_OKAY;
> +    }
> +
> +    ASSERT(fixmap_idx != ADJACENT_DISCARD_WRITE);
> +
> +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
> +    return X86EMUL_OKAY;
> +}
> +
> +static int adjacent_write(
> +        unsigned int fixmap_idx,
> +        uint64_t address, uint32_t len, uint64_t val)
> +{
> +    void __iomem *hwaddr;
> +
> +    if ( !IS_ALIGNED(address, len) )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                "Dropping unaligned write to MSI-X table page at %" PRIx64 "\n",
> +                address);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    if ( fixmap_idx == ADJACENT_DISCARD_WRITE )
> +        return X86EMUL_OKAY;
> +
> +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> +    switch ( len ) {

Nit: Style (brace placement).

> @@ -220,16 +364,27 @@ static int cf_check msixtbl_read(
>      unsigned long offset;
>      struct msixtbl_entry *entry;
>      unsigned int nr_entry, index;
> +    unsigned int adjacent_fixmap;
>      int r = X86EMUL_UNHANDLEABLE;
>  
> -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> +    if ( !IS_ALIGNED(address, len) )
>          return r;
>  
>      rcu_read_lock(&msixtbl_rcu_lock);
> -
> -    entry = msixtbl_find_entry(current, address);
> +    entry = msixtbl_find_entry(current, address, true);
>      if ( !entry )
>          goto out;
> +
> +    adjacent_fixmap = adjacent_handle(entry, address, false);
> +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> +    {
> +        r = adjacent_read(adjacent_fixmap, address, len, pval);
> +        goto out;
> +    }

adjacent_handle() may return ADJACENT_DONT_HANDLE for reasons other than
the address not being in the expected ranges, so you can't blindly fall
through to the original code just yet (same in functions further down).

> @@ -375,14 +543,23 @@ static bool cf_check msixtbl_range(
>  {
>      struct vcpu *curr = current;
>      unsigned long addr = r->addr;
> -    const struct msi_desc *desc;
> +    struct msixtbl_entry *entry;

const?

> +    const struct msi_desc *desc = NULL;
> +    unsigned int adjacent_fixmap;
> +
>  

Nit: Please don't introduce double blank lines.

> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
>                                         PCI_MSIX_ENTRY_SIZE + \
>                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>  
> +/* indexes in adj_access_table_idx[] below */
> +#define ADJ_IDX_FIRST 0
> +#define ADJ_IDX_LAST  1
> +
>  struct arch_msix {
>      unsigned int nr_entries, used_entries;
>      struct {
> @@ -214,6 +218,7 @@ struct arch_msix {
>      } table, pba;
>      int table_refcnt[MAX_MSIX_TABLE_PAGES];
>      int table_idx[MAX_MSIX_TABLE_PAGES];
> +    int adj_access_table_idx[2];

adjacent_handle() returns unsigned int - why is it plain int here?

Also, to keep variable name length somewhat limited, I don't think "table"
adds much value to the name here.

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
>                  domain_crash(d);
>              /* XXX How to deal with existing mappings? */
>          }
> +
> +        /*
> +         * If the MSI-X table doesn't start at the page boundary, map the first page for
> +         * passthrough accesses.
> +         */
> +        if ( PAGE_OFFSET(table_paddr) )
> +        {
> +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> +
> +            if ( idx > 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
> +        }
> +        /*
> +         * If the MSI-X table doesn't span full page(s), map the last page for
> +         * passthrough accesses.
> +         */
> +        if ( PAGE_OFFSET(table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) )
> +        {
> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> +
> +            if ( idx > 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
> +        }

Wouldn't this better be delayed until / restricted to the device actually
being prepared for guest use? Which may be as simple as making all of this
an "else" to the earlier if()?

Also I think the 2nd comment is somewhat misleading - you don't really mean
"spans full pages" but, along the lines of the first one, "ends on a page
boundary".

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot
  2023-04-06  3:57 ` [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
@ 2023-04-24 14:19   ` Jan Beulich
  2023-04-24 15:25     ` Jason Andryuk
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-04-24 14:19 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Jason Andryuk
  Cc: Paul Durrant, Roger Pau Monné, xen-devel

On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Jason reports on his machine MASKALL persists through a
> warm reboot, but is cleared on cold boot. Xen relies on initial state
> being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> MASKALL is set, it was Xen setting it due to msix->host_maskall or
> msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> set, so clear them both.
> 
> Reported-by: Jason Andryuk <jandryuk@gmail.com>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit with a couple of nits (which I'd be happy to address while
committing, so long as you agree). First one being on the last
sentence above: It's surely not just "might"; if resetting already
doesn't work right, nothing says that the individual mask bit all
end up set. Clearing ENABLE as well is only natural imo, if we
already need to fix up after firmware. So maybe "Even if so far not
observed to be left set, clear ENABLE as well"?

> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
>          spin_lock_init(&msix->table_lock);
>  
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> +
> +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )

Style (missing blanks around |; once more below).

> +        {
> +            /*
> +             * pci_reset_msix_state() relies on MASKALL not being set
> +             * initially, clear it (and ENABLE too - for safety), to meet that
> +             * expectation.
> +             */
> +            printk(XENLOG_WARNING
> +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> +                   &pdev->sbdf,
> +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);

Our "canonical" way of dealing with this is !!(x & y).

> +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> +        }
> +
>          msix->nr_entries = msix_table_size(ctrl);
>  
>          pdev->msix = msix;


Aiui there's no dependency here on the earlier patches in the series;
please confirm (or otherwise).

Jason - any chance of getting a Tested-by: from you?

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot
  2023-04-24 14:19   ` Jan Beulich
@ 2023-04-24 15:25     ` Jason Andryuk
  2023-04-24 15:30       ` Jan Beulich
  2023-04-24 15:34       ` Marek Marczykowski-Górecki
  0 siblings, 2 replies; 15+ messages in thread
From: Jason Andryuk @ 2023-04-24 15:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, Paul Durrant,
	Roger Pau Monné, xen-devel

On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Jason reports on his machine MASKALL persists through a
> > warm reboot, but is cleared on cold boot. Xen relies on initial state
> > being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> > MASKALL is set, it was Xen setting it due to msix->host_maskall or
> > msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> > set, so clear them both.
> >
> > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit with a couple of nits (which I'd be happy to address while
> committing, so long as you agree). First one being on the last
> sentence above: It's surely not just "might"; if resetting already
> doesn't work right, nothing says that the individual mask bit all
> end up set. Clearing ENABLE as well is only natural imo, if we
> already need to fix up after firmware. So maybe "Even if so far not
> observed to be left set, clear ENABLE as well"?
>
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          spin_lock_init(&msix->table_lock);
> >
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> > +
> > +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
>
> Style (missing blanks around |; once more below).
>
> > +        {
> > +            /*
> > +             * pci_reset_msix_state() relies on MASKALL not being set
> > +             * initially, clear it (and ENABLE too - for safety), to meet that
> > +             * expectation.
> > +             */
> > +            printk(XENLOG_WARNING
> > +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> > +                   &pdev->sbdf,
> > +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> > +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
>
> Our "canonical" way of dealing with this is !!(x & y).
>
> > +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> > +        }
> > +
> >          msix->nr_entries = msix_table_size(ctrl);
> >
> >          pdev->msix = msix;
>
>
> Aiui there's no dependency here on the earlier patches in the series;
> please confirm (or otherwise).
>
> Jason - any chance of getting a Tested-by: from you?

I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.

I posted in these two messages - a summary is below.
https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/

OpenXT has a patch that performs an extra reset after domain shutdown,
and that causes Xen to set MASKALL.  I confirmed by removing it.  So
this patch helps with clearing MASKALL on host boot, but with the
OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
shutdown and then the subsequent boot can't assign the device.

So this patch is helpful in some scenarios, but it was also an issue
caused by the OpenXT patch.  Does that make it unsuitable for
inclusion?  I assume the OpenXT patch wasn't an issue previously since
MSI-X was never enabled.

Regards,
Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot
  2023-04-24 15:25     ` Jason Andryuk
@ 2023-04-24 15:30       ` Jan Beulich
  2023-04-24 16:42         ` Jason Andryuk
  2023-04-24 15:34       ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-04-24 15:30 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Marek Marczykowski-Górecki, Paul Durrant,
	Roger Pau Monné, xen-devel

On 24.04.2023 17:25, Jason Andryuk wrote:
> On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>> Jason - any chance of getting a Tested-by: from you?
> 
> I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.
> 
> I posted in these two messages - a summary is below.
> https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> 
> OpenXT has a patch that performs an extra reset after domain shutdown,
> and that causes Xen to set MASKALL.  I confirmed by removing it.  So
> this patch helps with clearing MASKALL on host boot, but with the
> OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
> shutdown and then the subsequent boot can't assign the device.
> 
> So this patch is helpful in some scenarios, but it was also an issue
> caused by the OpenXT patch.  Does that make it unsuitable for
> inclusion?

What is "it" here? If I get your reply right, there is a similar issue
left unaddressed by this version of the change (and as was said before,
a device reset changing state that Xen tracks or otherwise cares about
needs to be reported to Xen). Yet that doesn't really fit with the
question, at least the way I read it ...

Jan

>  I assume the OpenXT patch wasn't an issue previously since
> MSI-X was never enabled.
> 
> Regards,
> Jason



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot
  2023-04-24 15:25     ` Jason Andryuk
  2023-04-24 15:30       ` Jan Beulich
@ 2023-04-24 15:34       ` Marek Marczykowski-Górecki
  1 sibling, 0 replies; 15+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-24 15:34 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Jan Beulich, Paul Durrant, Roger Pau Monné, xen-devel

[-- Attachment #1: Type: text/plain, Size: 4605 bytes --]

On Mon, Apr 24, 2023 at 11:25:01AM -0400, Jason Andryuk wrote:
> On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > MASKALL set. Jason reports on his machine MASKALL persists through a
> > > warm reboot, but is cleared on cold boot. Xen relies on initial state
> > > being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> > > MASKALL is set, it was Xen setting it due to msix->host_maskall or
> > > msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> > > set, so clear them both.
> > >
> > > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > albeit with a couple of nits (which I'd be happy to address while
> > committing, so long as you agree). 

Yes, thanks!

> > First one being on the last
> > sentence above: It's surely not just "might"; if resetting already
> > doesn't work right, nothing says that the individual mask bit all
> > end up set. Clearing ENABLE as well is only natural imo, if we
> > already need to fix up after firmware. So maybe "Even if so far not
> > observed to be left set, clear ENABLE as well"?
> >
> > > --- a/xen/drivers/passthrough/msi.c
> > > +++ b/xen/drivers/passthrough/msi.c
> > > @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
> > >          spin_lock_init(&msix->table_lock);
> > >
> > >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> > > +
> > > +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
> >
> > Style (missing blanks around |; once more below).
> >
> > > +        {
> > > +            /*
> > > +             * pci_reset_msix_state() relies on MASKALL not being set
> > > +             * initially, clear it (and ENABLE too - for safety), to meet that
> > > +             * expectation.
> > > +             */
> > > +            printk(XENLOG_WARNING
> > > +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> > > +                   &pdev->sbdf,
> > > +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> > > +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
> >
> > Our "canonical" way of dealing with this is !!(x & y).
> >
> > > +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > > +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> > > +        }
> > > +
> > >          msix->nr_entries = msix_table_size(ctrl);
> > >
> > >          pdev->msix = msix;
> >
> >
> > Aiui there's no dependency here on the earlier patches in the series;
> > please confirm (or otherwise).

Indeed. An earlier patch uncovered a firmware (or such) issue on some
systems and this patch deals with it, but it doesn't depend on earlier
patches.

> > Jason - any chance of getting a Tested-by: from you?
> 
> I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.
> 
> I posted in these two messages - a summary is below.
> https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> 
> OpenXT has a patch that performs an extra reset after domain shutdown,
> and that causes Xen to set MASKALL.  I confirmed by removing it.  So
> this patch helps with clearing MASKALL on host boot, but with the
> OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
> shutdown and then the subsequent boot can't assign the device.
> 
> So this patch is helpful in some scenarios, but it was also an issue
> caused by the OpenXT patch.  Does that make it unsuitable for
> inclusion?  I assume the OpenXT patch wasn't an issue previously since
> MSI-X was never enabled.

Upstream Xen IMO should deal with the state it gets on boot, regardless
of what was running previously (the actual issue is likely in firmware
or device itself, that it doesn't clear that bit, but well...). So,
rebooting from OpenXT, into vanilla upstream Xen should result in fully
functional system. That's why I included this patch, but haven't dealt
with an issue caused by OpenXT patch on subsequent domain startups (as
it doesn't apply to the upstream code base).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot
  2023-04-24 15:30       ` Jan Beulich
@ 2023-04-24 16:42         ` Jason Andryuk
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Andryuk @ 2023-04-24 16:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, Paul Durrant,
	Roger Pau Monné, xen-devel

On Mon, Apr 24, 2023 at 11:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.04.2023 17:25, Jason Andryuk wrote:
> > On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> Jason - any chance of getting a Tested-by: from you?
> >
> > I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.
> >
> > I posted in these two messages - a summary is below.
> > https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> > https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> >
> > OpenXT has a patch that performs an extra reset after domain shutdown,
> > and that causes Xen to set MASKALL.  I confirmed by removing it.  So
> > this patch helps with clearing MASKALL on host boot, but with the
> > OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
> > shutdown and then the subsequent boot can't assign the device.
> >
> > So this patch is helpful in some scenarios, but it was also an issue
> > caused by the OpenXT patch.  Does that make it unsuitable for
> > inclusion?
>
> What is "it" here? If I get your reply right, there is a similar issue
> left unaddressed by this version of the change (and as was said before,
> a device reset changing state that Xen tracks or otherwise cares about
> needs to be reported to Xen). Yet that doesn't really fit with the
> question, at least the way I read it ...

"So this patch is helpful in some scenarios, but setting MASKALL in
the first place is an issue caused by the OpenXT patch.  Does that
make this patch unsuitable for inclusion?"

I think Marek's response that "Xen IMO should deal with the state it
gets on boot, regardless of what was running previously" makes sense
and means this is worthy of inclusion.

And I tested it.  Without the OpenXT libxl-fix-flr.patch:
(XEN) 0000:00:14.3: unexpected initial MSI-X state (MASKALL=0, ENABLE=1), fixing
With the OpenXT patch:
(XEN) 0000:00:14.3: unexpected initial MSI-X state (MASKALL=1, ENABLE=1), fixing

Tested-by: Jason Andryuk <jandryuk@gmail.com>

The patch is here if anyone want to look:
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-extended/xen/files/libxl-fix-flr.patch

It's calling libxl__device_pci_reset() from destroy_finish_check(), so
it's not trying to do anything behind Xen's back.  It's just that Xen
sees memory decoding disabled, and then sets MASKALL.

Regards,
Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext
  2023-04-06  6:05   ` Juergen Gross
@ 2023-05-02 15:13     ` Anthony PERARD
  0 siblings, 0 replies; 15+ messages in thread
From: Anthony PERARD @ 2023-05-02 15:13 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel, Wei Liu, Juergen Gross

On Thu, Apr 06, 2023 at 08:05:04AM +0200, Juergen Gross wrote:
> On 06.04.23 05:57, Marek Marczykowski-Górecki wrote:
> > Add xendevicemodel_get_ioreq_server_info_ext() which additionally
> > returns output flags that XEN_DMOP_get_ioreq_server_info can now return.
> > Do not change signature of existing
> > xendevicemodel_get_ioreq_server_info() so existing users will not need
> > to be changed.
> > 
> > This advertises behavior change of "x86/msi: passthrough all MSI-X
> > vector ctrl writes to device model" patch.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > v3:
> >   - new patch
> > 
> > Should there be some HAVE_* #define in the header? Does this change
> > require soname bump (I hope it doesn't...).
> 
> You need to add version 1.5 to libxendevicemodel.map which should define
> the new function.

And update MINOR in the Makefile.

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-04-06  3:57 ` [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
  2023-04-24 13:06   ` Jan Beulich
@ 2023-05-03  9:01   ` Roger Pau Monné
  1 sibling, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2023-05-03  9:01 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Paul Durrant

On Thu, Apr 06, 2023 at 05:57:23AM +0200, Marek Marczykowski-Górecki wrote:
> QEMU needs to know whether clearing maskbit of a vector is really
> clearing, or was already cleared before. Currently Xen sends only
> clearing that bit to the device model, but not setting it, so QEMU
> cannot detect it. Because of that, QEMU is working this around by
> checking via /dev/mem, but that isn't the proper approach. It's just a
> workaround which in fact is racy.
> 
> Give all necessary information to QEMU by passing all ctrl writes,
> including masking a vector.
> 
> While this commit doesn't move the whole maskbit handling to QEMU (as
> discussed on xen-devel as one of the possibilities), it is a necessary
> first step anyway. Including telling QEMU it will get all the required
> information to do so. The actual implementation would need to include:
>  - a hypercall for QEMU to control just maskbit (without (re)binding the
>    interrupt again
>  - a methor for QEMU to tell Xen it will actually do the work
> Those are not part of this series.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v3:
>  - advertise changed behavior in XEN_DMOP_get_ioreq_server_info - make
>    "flags" parameter IN/OUT
>  - move len check back to msixtbl_write() - will be needed there anyway
>    in a later patch
> v2:
>  - passthrough quad writes to emulator too (Jan)
>  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
>    #define for this magic value
> 
> Should flags on output include only "out" values (current version), or
> also include those passed in by the caller unchanged?
> ---
>  xen/arch/x86/hvm/vmsi.c        | 18 ++++++++++++++----
>  xen/common/ioreq.c             |  9 +++++++--
>  xen/include/public/hvm/dm_op.h | 12 ++++++++----
>  3 files changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 3cd4923060c8..231253a2cbd4 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -272,6 +272,15 @@ out:
>      return r;
>  }
>  
> +/*
> + * This function returns X86EMUL_UNHANDLEABLE even if write is properly
> + * handled, to propagate it to the device model (so it can keep its internal
> + * state in sync).
> + * len==0 means really len==4, but as a write completion that will return
> + * X86EMUL_OKAY on successful processing. Use WRITE_LEN4_COMPLETION to make it
> + * less confusing.

Isn't it fine to just forward every (valid) write to the dm, and so
not introduce WRITE_LEN4_COMPLETION? (see my comment about
_msixtbl_write()).

> + */
> +#define WRITE_LEN4_COMPLETION 0
>  static int msixtbl_write(struct vcpu *v, unsigned long address,
>                           unsigned int len, unsigned long val)
>  {
> @@ -283,8 +292,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>      unsigned long flags;
>      struct irq_desc *desc;
>  
> -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> -        return r;
> +    if ( (len != 4 && len != 8 && len != WRITE_LEN4_COMPLETION) ||
> +         (len && (address & (len - 1))) )
> +        return X86EMUL_UNHANDLEABLE;

I think you want to just return X86EMUL_OKAY here, and ignore the
access since it's not properly sized or aligned?

>  
>      rcu_read_lock(&msixtbl_rcu_lock);
>  
> @@ -345,7 +355,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>  
>  unlock:
>      spin_unlock_irqrestore(&desc->lock, flags);
> -    if ( len == 4 )
> +    if ( len == WRITE_LEN4_COMPLETION )
>          r = X86EMUL_OKAY;
>  
>  out:
> @@ -635,7 +645,7 @@ void msix_write_completion(struct vcpu *v)
>          return;
>  
>      v->arch.hvm.hvm_io.msix_unmask_address = 0;
> -    if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
> +    if ( msixtbl_write(v, ctrl_address, WRITE_LEN4_COMPLETION, 0) != X86EMUL_OKAY )
>          gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");

Would it be possible to always return X86EMUL_UNHANDLEABLE from
_msixtbl_write() and keep the return values of msixtbl_write()
as-is?

>  }
>  
> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
> index ecb8f545e1c4..bd6f074c1e85 100644
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -743,7 +743,8 @@ static int ioreq_server_destroy(struct domain *d, ioservid_t id)
>  static int ioreq_server_get_info(struct domain *d, ioservid_t id,
>                                   unsigned long *ioreq_gfn,
>                                   unsigned long *bufioreq_gfn,
> -                                 evtchn_port_t *bufioreq_port)
> +                                 evtchn_port_t *bufioreq_port,
> +                                 uint16_t *flags)
>  {
>      struct ioreq_server *s;
>      int rc;
> @@ -779,6 +780,9 @@ static int ioreq_server_get_info(struct domain *d, ioservid_t id,
>              *bufioreq_port = s->bufioreq_evtchn;
>      }
>  
> +    /* Advertise supported features/behaviors. */
> +    *flags = XEN_DMOP_all_msix_writes;
> +
>      rc = 0;
>  
>   out:
> @@ -1374,7 +1378,8 @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
>                                     NULL : (unsigned long *)&data->ioreq_gfn,
>                                     (data->flags & XEN_DMOP_no_gfns) ?
>                                     NULL : (unsigned long *)&data->bufioreq_gfn,
> -                                   &data->bufioreq_port);
> +                                   &data->bufioreq_port, &data->flags);
> +
>          break;
>      }
>  
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index acdf91693d0b..490b151c5dd7 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -70,7 +70,9 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
>   * not contain XEN_DMOP_no_gfns then these pages will be made available and
>   * the frame numbers passed back in gfns <ioreq_gfn> and <bufioreq_gfn>
>   * respectively. (If the IOREQ Server is not handling buffered emulation
> - * only <ioreq_gfn> will be valid).
> + * only <ioreq_gfn> will be valid). When Xen returns XEN_DMOP_all_msix_writes
> + * flag set, it will notify the IOREQ server about all writes to MSI-X table
> + * (if it's handled by this IOREQ server), not only those clearing a mask bit.
>   *
>   * NOTE: To access the synchronous ioreq structures and buffered ioreq
>   *       ring, it is preferable to use the XENMEM_acquire_resource memory
> @@ -81,11 +83,13 @@ typedef struct xen_dm_op_create_ioreq_server xen_dm_op_create_ioreq_server_t;
>  struct xen_dm_op_get_ioreq_server_info {
>      /* IN - server id */
>      ioservid_t id;
> -    /* IN - flags */
> +    /* IN/OUT - flags */
>      uint16_t flags;
>  
> -#define _XEN_DMOP_no_gfns 0
> -#define XEN_DMOP_no_gfns (1u << _XEN_DMOP_no_gfns)
> +#define _XEN_DMOP_no_gfns         0  /* IN */
> +#define _XEN_DMOP_all_msix_writes 1  /* OUT */
> +#define XEN_DMOP_no_gfns         (1u << _XEN_DMOP_no_gfns)
> +#define XEN_DMOP_all_msix_writes (1u << _XEN_DMOP_all_msix_writes)

FWIW, we usually interleave _XEN_DMOP_no_gfns and XEN_DMOP_no_gfns,
ie:

#define _XEN_DMOP_no_gfns         0  /* IN */
#define XEN_DMOP_no_gfns          (1u << _XEN_DMOP_no_gfns)
#define _XEN_DMOP_all_msix_writes 1  /* OUT */
#define XEN_DMOP_all_msix_writes  (1u << _XEN_DMOP_all_msix_writes)

I wonder whether XEN_DMOP_all_msix_writes should be a feature
requested by the dm, as to not change the existing behaviour of how
MSIX writes are handled (which might work for QEMU, but could cause
issues with other out of tree users of ioreqs)?

That would turn XEN_DMOP_all_msix_writes into an IN flag also.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-05-03  9:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-06  3:57 [PATCH v3 0/4] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
2023-04-06  3:57 ` [PATCH v3 1/4] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
2023-04-24 13:06   ` Jan Beulich
2023-05-03  9:01   ` Roger Pau Monné
2023-04-06  3:57 ` [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext Marek Marczykowski-Górecki
2023-04-06  6:05   ` Juergen Gross
2023-05-02 15:13     ` Anthony PERARD
2023-04-06  3:57 ` [PATCH v3 3/4] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
2023-04-24 13:59   ` Jan Beulich
2023-04-06  3:57 ` [PATCH v3 4/4] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
2023-04-24 14:19   ` Jan Beulich
2023-04-24 15:25     ` Jason Andryuk
2023-04-24 15:30       ` Jan Beulich
2023-04-24 16:42         ` Jason Andryuk
2023-04-24 15:34       ` Marek Marczykowski-Górecki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.