[XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen


* [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen
@ 2024-07-08 11:41 Jiqian Chen
  2024-07-08 11:41 ` [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
                   ` (6 more replies)
  0 siblings, 7 replies; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

Hi All,
This is the v12 series to support passthrough when dom0 is PVH.
The expected merge order is: the first three patches of this series, then the patches on
the kernel side, then the last four patches of this series.
v11->v12 changes:
* patch#1: Change the title of this patch.
           Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not allowed.
           Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to
           remove "__init" of highest_gsi function.
           Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
           Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is affected.

Best regards,
Jiqian Chen

v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line.
           Delete unnecessary local variables "struct physdev_pci_device *dev".
           Downgrade printk to dprintk.
           Moved struct pci_device_state_reset to the public header file.
           Delete enum pci_device_state_reset_type, and use macro definitions to represent different
           reset types.
           Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset
           to handle different reset functions.
           Add reset type as a function parameter for vpci_reset_device_state for possible future use
* patch#2: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being
           executed when domU has no pirq, instead of just preventing self-mapping; and modify the
           description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gsi of normal devices can work in PVH dom0 and why
           the passthrough device does not work in PVH dom0.
* patch#4: New patch, modification of allocate_pirq function, return the allocated pirq when there is
           already an allocated pirq and the caller has no specific requirements for pirq, and make it
           successful.
* patch#5: Modification on the hypervisor side proposed from patch#5 of v10.
           Add non-zero judgment for other bits of allow_access.
           Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
           Change the error exit path identifier "out" to "gsi_permission_out".
           Use ARRAY_SIZE() instead of open coding.
* patch#6: New patch, modification of xc_physdev_map_pirq to support mapping gsi to an idle pirq.
* patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev
           instead of adding unnecessary functions to libxencall.
           Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get
           gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
           Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
           Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
           Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be
           used to obtain the corresponding pirq when unmap PIRQ.

v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style.
* patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call",
           indicating that there will be new codes on the kernel side that will call PHYSDEVOP_setup_gsi.
           Also added an explanation of why the interrupt of passthrough device does not work if gsi is not
           registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
           Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done.
           Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd
           instead of current->domain.
           In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and
           error handling for irq0 was added.
           Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission
           definition.
All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> means I am the author.
Signed-off-by: Huang Rui <ray.huang@amd.com> means Rui first sent them upstream.
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> means I am continuing the upstreaming work.

v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());"
           from vpci_reset_device_state;
           Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed.
           Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to
           "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check exists below. Although their
           return values are different, this difference is acceptable for the sake of code consistency:
           if ( !is_hardware_domain(currd) )
               return -ENOSYS;
           break;
* patch#5: Change the commit message to describe more why we need this new hypercall.
           Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain
           why we need this check.
           Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
           Add explicit padding to struct xen_domctl_gsi_permission.

v7->v8 changes:
* patch#2: Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't use pirq.
           That check was missed in the previous version.
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function
           to get the gsi by passing in the sbdf of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists, pci_add_dm_done uses a new function
           pci_device_set_gsi to do map_pirq and grant permission, which makes the code logic more intuitive.

v6->v7 changes:
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function
           to get the gsi from the irq, instead of from the gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old function vpci_remove_device,
           vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make code easier to be understood, which to use
           gsi by default and be compatible with older kernel versions to continue to use irq

v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and
           just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on it. And add the handling of errno
           and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to grant irq permission in
           XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-Jiqian.Chen@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete pci_reset_device_state; add
           xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add description for
           PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on kernel side (it now does setup_gsi and
           map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need
           to support self mapping.
* patch#3: due to changes in the implementation of the second patch on kernel side (it adds a new sysfs for gsi
           instead of a new syscall), read the gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-Jiqian.Chen@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-Jiqian.Chen@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when dom0 is PVH on Xen.
We sent v1 upstream before, but it had many problems and we received lots of suggestions.
Below I introduce all the issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add <sbdf>" to assign a device, pci_stub will call
"pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()". The pci config
write will trigger an io interrupt to bar_write() in Xen, but because bar->enabled was set before, the
write is not allowed. Then, when Qemu configures the passthrough device in xen_pt_realize(), it gets
invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, so the current cached state in
pdev->vpci is all out of date and is different from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add xen_reset_device_state
function) and the first patch of xen(xen/vpci: Clear all vpci status of device) add a new hypercall to reset the
state stored in vPCI when the state of the real device has changed.
Thanks to Roger for suggesting this v2 approach. It differs from
v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@amd.com/), which simply allowed domU to write
the pci bar and so did not comply with the design principles of vPCI.
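
As a rough illustration of the argument this hypercall takes, the sketch below fills in
a request for one device. The two structures and the reset-type constants mirror the
definitions added to xen/include/public/physdev.h by patch 1 of this series. pack_sbdf()
and fill_reset_request() are hypothetical helpers written for this example only, and the
actual hypercall invocation from dom0 is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct physdev_pci_device from xen/include/public/physdev.h. */
struct physdev_pci_device {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};

/* Reset types as defined by patch 1 of this series. */
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3

struct pci_device_state_reset {
    struct physdev_pci_device dev;
    uint32_t reset_type;
};

/*
 * Hypothetical helper: pack segment/bus/devfn into a 32-bit SBDF value
 * (segment in bits 31-16, bus in bits 15-8, devfn in bits 7-0).
 */
uint32_t pack_sbdf(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Hypothetical helper: build the hypercall argument for one device.
 * Returns 0 on success, -1 if the device/function numbers or the reset
 * type are out of range.
 */
int fill_reset_request(struct pci_device_state_reset *req,
                       uint16_t seg, uint8_t bus,
                       uint8_t dev, uint8_t fn, uint32_t reset_type)
{
    assert(req != NULL);

    if (dev > 0x1f || fn > 7 || reset_type > PCI_DEVICE_STATE_RESET_FLR)
        return -1;

    req->dev.seg = seg;
    req->dev.bus = bus;
    req->dev.devfn = (uint8_t)((dev << 3) | fn); /* 5-bit device, 3-bit function */
    req->reset_type = reset_type;
    return 0;
}
```

On the hypervisor side, the handler in patch 1 rejects any reset type outside the four
defined values with -EOPNOTSUPP; the range check above plays the same role for the caller.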

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call
into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it will fail
at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The
second patch of xen(x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq. This v2
patch is better than v1, which simply removed the has_pirq check
(xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu configures a passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to check if the gsi has a corresponding
mapping in dom0. But it doesn't, so the check fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current"
is PVH dom0 and the returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH
are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" are done in function
vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka ioapic's pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device not being unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask irq for passthrough device in PVH dom0)
calls unmask_irq() when we assign a device for passthrough, so that passthrough devices have a gsi mapping
on PVH dom0 and the gsi can be registered. This v2 patch is different from the
v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/,
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, which is unnecessary and may cause
multiple registration.

4. failed to map pirq for gsi
Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device's gsi to pirq in function
xen_pt_realize(), but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi instead of an irq, but qemu passes an
irq to it and treats the irq as a gsi; the value is read from the file /sys/bus/pci/devices/xxxx:xx:xx.x/irq in
function xen_host_pci_device_get(). But the gsi number is not actually equal to the irq. On PVH dom0, when it
allocates an irq for a gsi in function acpi_register_gsi_ioapic(), the allocation is dynamic and follows a
first-come, first-served principle. If you debug the kernel code (see function __irq_alloc_descs), you will find
that irq numbers are allocated in increasing order, but gsi numbers are not requested in order: gsi 38 may come
before gsi 28, which causes gsi 38 to get a smaller irq number than gsi 28, and so gsi != irq.
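
The ordering effect described above can be reproduced with a toy allocator. This is a
simplified model and not the kernel's code: struct irq_allocator, register_gsi(), and the
starting value FIRST_DYNAMIC_IRQ are assumptions made for illustration. It hands out irq
numbers in increasing order, in whatever order the gsis happen to be registered.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI 64
#define FIRST_DYNAMIC_IRQ 24 /* assumed start of the dynamic range, illustration only */

struct irq_allocator {
    int next_irq;            /* next irq number to hand out */
    int gsi_to_irq[MAX_GSI]; /* 0 means "gsi not registered yet" */
};

void irq_allocator_init(struct irq_allocator *a)
{
    assert(a != NULL);

    a->next_irq = FIRST_DYNAMIC_IRQ;
    for (size_t i = 0; i < MAX_GSI; i++)
        a->gsi_to_irq[i] = 0;
}

/*
 * Register a gsi and return its irq. The irq is chosen by arrival order,
 * not by the gsi value. Registering the same gsi twice returns the same irq.
 */
int register_gsi(struct irq_allocator *a, int gsi)
{
    assert(a != NULL);
    assert(gsi >= 0 && gsi < MAX_GSI);

    if (a->gsi_to_irq[gsi] == 0)
        a->gsi_to_irq[gsi] = a->next_irq++;

    return a->gsi_to_irq[gsi];
}
```

With a freshly initialized allocator, registering gsi 38 first and gsi 28 second yields
irq 24 for gsi 38 and irq 25 for gsi 28, so neither irq equals its gsi.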

Solution: we can record the relation between gsi and irq, so that when userspace(qemu) wants to use the gsi, we can
do a translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the relations
in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a syscall for userspace to get the gsi
from the irq. The third patch of xen(tools: Add new function to get gsi from irq) adds a new function
xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed. This v2 patch is the
same as v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@amd.com/)
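
The record-and-translate idea can be sketched as a small lookup table. This is only an
illustration under assumed names (struct gsi_irq_table, record_relation(),
lookup_gsi_from_irq()); it does not show the real kernel data structure or the privcmd
interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELATIONS 32

struct gsi_irq_relation {
    int irq;
    int gsi;
};

struct gsi_irq_table {
    struct gsi_irq_relation entries[MAX_RELATIONS];
    size_t count;
};

/* Record one irq <-> gsi pair. Returns 0 on success, -1 if the table is full. */
int record_relation(struct gsi_irq_table *t, int irq, int gsi)
{
    assert(t != NULL);

    if (t->count >= MAX_RELATIONS)
        return -1;

    t->entries[t->count].irq = irq;
    t->entries[t->count].gsi = gsi;
    t->count++;
    return 0;
}

/* Translate an irq back to its gsi. Returns -1 if the irq was never recorded. */
int lookup_gsi_from_irq(const struct gsi_irq_table *t, int irq)
{
    assert(t != NULL);

    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].irq == irq)
            return t->entries[i].gsi;

    return -1;
}
```

A linear scan is used here only to keep the sketch short; the point is that the
translation is a lookup of a recorded pair, never an assumption that gsi == irq.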

About the v2 patch of qemu: it only changes an included header file; the rest is similar to
v1 ( qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@amd.com/), and just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (7):
  xen/pci: Add hypercall to support reset of pcidev
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  x86/domctl: Add hypercall to set the access of x86 gsi
  tools/libxc: Allow gsi be mapped into a free pirq
  tools: Add new function to get gsi from dev
  tools: Add new function to do PIRQ (un)map on PVH dom0

 tools/include/xen-sys/Linux/privcmd.h |   7 ++
 tools/include/xenctrl.h               |   7 ++
 tools/libs/ctrl/xc_domain.c           |  15 ++++
 tools/libs/ctrl/xc_physdev.c          |  37 ++++++++-
 tools/libs/light/libxl_arch.h         |   4 +
 tools/libs/light/libxl_arm.c          |  10 +++
 tools/libs/light/libxl_pci.c          |  17 ++++
 tools/libs/light/libxl_x86.c          | 111 ++++++++++++++++++++++++++
 tools/python/xen/lowlevel/xc/xc.c     |   2 +
 xen/arch/x86/domctl.c                 |  32 ++++++++
 xen/arch/x86/hvm/hypercall.c          |   8 ++
 xen/arch/x86/include/asm/io_apic.h    |   2 +
 xen/arch/x86/io_apic.c                |  17 ++++
 xen/arch/x86/mpparse.c                |   5 +-
 xen/arch/x86/physdev.c                |  12 ++-
 xen/drivers/pci/physdev.c             |  52 ++++++++++++
 xen/drivers/vpci/vpci.c               |  10 +++
 xen/include/public/domctl.h           |   9 +++
 xen/include/public/physdev.h          |  16 ++++
 xen/include/xen/vpci.h                |   8 ++
 xen/xsm/flask/hooks.c                 |   1 +
 21 files changed, 376 insertions(+), 6 deletions(-)

-- 
2.34.1


* [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-08 14:56   ` Jan Beulich
  2024-07-31 15:55   ` Roger Pau Monné
  2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui, Stewart Hildebrand

When a device has been reset on dom0 side, the Xen hypervisor
doesn't get notification, so the cached state in vpci is all
out of date compare with the real device state.

To solve that problem, add a new hypercall to support the reset
of pcidev and clear the vpci state of device. So that once the
state of device is reset on dom0 side, dom0 can call this
hypercall to notify hypervisor.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c      | 10 +++++++
 xen/include/public/physdev.h | 16 +++++++++++
 xen/include/xen/vpci.h       |  8 ++++++
 5 files changed, 87 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..c0f47945d955 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/vpci.h>
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset:
+    {
+        struct pci_device_state_reset dev_reset;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        ret = -EOPNOTSUPP;
+        if ( !is_pci_passthrough_enabled() )
+            break;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+
+        sbdf = PCI_SBDF(dev_reset.dev.seg,
+                        dev_reset.dev.bus,
+                        dev_reset.dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        switch ( dev_reset.reset_type )
+        {
+        case PCI_DEVICE_STATE_RESET_COLD:
+        case PCI_DEVICE_STATE_RESET_WARM:
+        case PCI_DEVICE_STATE_RESET_HOT:
+        case PCI_DEVICE_STATE_RESET_FLR:
+            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+            break;
+
+        default:
+            ret = -EOPNOTSUPP;
+            break;
+        }
+        write_unlock(&pdev->domain->pci_lock);
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..7e914d1eff9f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev,
+                            uint32_t reset_type)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..3cfde3fd2389 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix          30
 #define PHYSDEVOP_release_msix          31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
@@ -305,6 +312,15 @@ struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
 
+struct pci_device_state_reset {
+    physdev_pci_device_t dev;
+#define PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_HOT  2
+#define PCI_DEVICE_STATE_RESET_FLR  3
+    uint32_t reset_type;
+};
+
 #define PHYSDEVOP_DBGP_RESET_PREPARE    1
 #define PHYSDEVOP_DBGP_RESET_DONE       2
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index da8d0f41e6f4..6be812dbc04a 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -38,6 +38,8 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev,
+                                         uint32_t reset_type);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -282,6 +284,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev,
+                                                       uint32_t reset_type)
+{
+    return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-08 11:41 ` [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
@ 2024-07-08 14:56   ` Jan Beulich
  2024-07-09  2:47     ` Chen, Jiqian
  2024-07-31 15:55   ` Roger Pau Monné
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-08 14:56 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 08.07.2024 13:41, Jiqian Chen wrote:
> When a device has been reset on dom0 side, the Xen hypervisor
> doesn't get notification, so the cached state in vpci is all
> out of date compare with the real device state.
> 
> To solve that problem, add a new hypercall to support the reset
> of pcidev and clear the vpci state of device. So that once the
> state of device is reset on dom0 side, dom0 can call this
> hypercall to notify hypervisor.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Just to double check: You're sure the other two R-b are still applicable,
despite the various changes that have been made?

As a purely cosmetic remark: I think I would have preferred if the new
identifiers didn't have "state" as a part; I simply don't think this adds
much value, while at the same time making these pretty long.

Jan



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-08 14:56   ` Jan Beulich
@ 2024-07-09  2:47     ` Chen, Jiqian
  2024-07-09  6:01       ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-09  2:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/7/8 22:56, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> When a device has been reset on dom0 side, the Xen hypervisor
>> doesn't get notification, so the cached state in vpci is all
>> out of date compare with the real device state.
>>
>> To solve that problem, add a new hypercall to support the reset
>> of pcidev and clear the vpci state of device. So that once the
>> state of device is reset on dom0 side, dom0 can call this
>> hypercall to notify hypervisor.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thank you very much!

> 
> Just to double check: You're sure the other two R-b are still applicable,
> despite the various changes that have been made?
Will remove in next version.

> 
> As a purely cosmetic remark: I think I would have preferred if the new
> identifiers didn't have "state" as a part; I simply don't think this adds
> much value, while at the same time making these pretty long.
Do you mean: remove "state" identifier on all the new codes?

> 
> Jan

-- 
Best regards,
Jiqian Chen.


* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-09  2:47     ` Chen, Jiqian
@ 2024-07-09  6:01       ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-07-09  6:01 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 09.07.2024 04:47, Chen, Jiqian wrote:
> On 2024/7/8 22:56, Jan Beulich wrote:
>> On 08.07.2024 13:41, Jiqian Chen wrote:
>>> When a device has been reset on dom0 side, the Xen hypervisor
>>> doesn't get notification, so the cached state in vpci is all
>>> out of date compare with the real device state.
>>>
>>> To solve that problem, add a new hypercall to support the reset
>>> of pcidev and clear the vpci state of device. So that once the
>>> state of device is reset on dom0 side, dom0 can call this
>>> hypercall to notify hypervisor.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Thank you very much!
> 
>>
>> Just to double check: You're sure the other two R-b are still applicable,
>> despite the various changes that have been made?
> Will remove in next version.
> 
>>
>> As a purely cosmetic remark: I think I would have preferred if the new
>> identifiers didn't have "state" as a part; I simply don't think this adds
>> much value, while at the same time making these pretty long.
> Do you mean: remove "state" identifier on all the new codes?

"part of identifiers", yes. As that's a personal view, I wouldn't insist
though, unless others shared my perspective.

Jan



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-08 11:41 ` [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
  2024-07-08 14:56   ` Jan Beulich
@ 2024-07-31 15:55   ` Roger Pau Monné
  2024-07-31 15:58     ` Jan Beulich
  2024-08-02  2:55     ` Chen, Jiqian
  1 sibling, 2 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31 15:55 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> When a device has been reset on dom0 side, the Xen hypervisor
> doesn't get notification, so the cached state in vpci is all
> out of date compare with the real device state.
> 
> To solve that problem, add a new hypercall to support the reset
> of pcidev and clear the vpci state of device. So that once the
> state of device is reset on dom0 side, dom0 can call this
> hypercall to notify hypervisor.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Thanks, just a couple of nits.

This is missing a changelog between versions, and I haven't been
following all the versions, so some of my questions might have been
answered in previous revisions.

> ---
>  xen/arch/x86/hvm/hypercall.c |  1 +
>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>  xen/drivers/vpci/vpci.c      | 10 +++++++
>  xen/include/public/physdev.h | 16 +++++++++++
>  xen/include/xen/vpci.h       |  8 ++++++
>  5 files changed, 87 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 7fb3136f0c7c..0fab670a4871 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case PHYSDEVOP_pci_mmcfg_reserved:
>      case PHYSDEVOP_pci_device_add:
>      case PHYSDEVOP_pci_device_remove:
> +    case PHYSDEVOP_pci_device_state_reset:
>      case PHYSDEVOP_dbgp_op:
>          if ( !is_hardware_domain(currd) )
>              return -ENOSYS;
> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
> index 42db3e6d133c..c0f47945d955 100644
> --- a/xen/drivers/pci/physdev.c
> +++ b/xen/drivers/pci/physdev.c
> @@ -2,6 +2,7 @@
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
>  #include <xen/init.h>
> +#include <xen/vpci.h>
>  
>  #ifndef COMPAT
>  typedef long ret_t;
> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case PHYSDEVOP_pci_device_state_reset:
> +    {
> +        struct pci_device_state_reset dev_reset;
> +        struct pci_dev *pdev;
> +        pci_sbdf_t sbdf;
> +
> +        ret = -EOPNOTSUPP;
> +        if ( !is_pci_passthrough_enabled() )
> +            break;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> +            break;
> +
> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
> +                        dev_reset.dev.bus,
> +                        dev_reset.dev.devfn);
> +
> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> +        if ( ret )
> +            break;
> +
> +        pcidevs_lock();
> +        pdev = pci_get_pdev(NULL, sbdf);
> +        if ( !pdev )
> +        {
> +            pcidevs_unlock();
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        write_lock(&pdev->domain->pci_lock);
> +        pcidevs_unlock();
> +        switch ( dev_reset.reset_type )
> +        {
> +        case PCI_DEVICE_STATE_RESET_COLD:
> +        case PCI_DEVICE_STATE_RESET_WARM:
> +        case PCI_DEVICE_STATE_RESET_HOT:
> +        case PCI_DEVICE_STATE_RESET_FLR:
> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
> +            break;
> +
> +        default:
> +            ret = -EOPNOTSUPP;
> +            break;
> +        }
> +        write_unlock(&pdev->domain->pci_lock);
> +
> +        break;
> +    }
> +
>      default:
>          ret = -ENOSYS;
>          break;
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 1e6aa5d799b9..7e914d1eff9f 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +int vpci_reset_device_state(struct pci_dev *pdev,
> +                            uint32_t reset_type)

There's probably no use in passing reset_type to
vpci_reset_device_state() if it's ignored?

> +{
> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> +
> +    vpci_deassign_device(pdev);
> +    return vpci_assign_device(pdev);
> +}
> +
>  #endif /* __XEN__ */
>  
>  static int vpci_register_cmp(const struct vpci_register *r1,
> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
> index f0c0d4727c0b..3cfde3fd2389 100644
> --- a/xen/include/public/physdev.h
> +++ b/xen/include/public/physdev.h
> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>   */
>  #define PHYSDEVOP_prepare_msix          30
>  #define PHYSDEVOP_release_msix          31
> +/*
> + * Notify the hypervisor that a PCI device has been reset, so that any
> + * internally cached state is regenerated.  Should be called after any
> + * device reset performed by the hardware domain.
> + */
> +#define PHYSDEVOP_pci_device_state_reset 32
> +
>  struct physdev_pci_device {
>      /* IN */
>      uint16_t seg;
> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>  typedef struct physdev_pci_device physdev_pci_device_t;
>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>  
> +struct pci_device_state_reset {
> +    physdev_pci_device_t dev;
> +#define PCI_DEVICE_STATE_RESET_COLD 0
> +#define PCI_DEVICE_STATE_RESET_WARM 1
> +#define PCI_DEVICE_STATE_RESET_HOT  2
> +#define PCI_DEVICE_STATE_RESET_FLR  3
> +    uint32_t reset_type;

This might want to be a flags field, with the low 2 bits (or maybe 3
bits to cope if more reset modes are added in the future) being used to
signal the reset type.  We can always do that later if flags need to
be added.

Seeing as reset_type has no impact on the hypercall, I would like to
ask for some reasoning for its presence to be added to the commit
message, otherwise it feels like pointless code churn.

> +};
> +
>  #define PHYSDEVOP_DBGP_RESET_PREPARE    1
>  #define PHYSDEVOP_DBGP_RESET_DONE       2
>  
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index da8d0f41e6f4..6be812dbc04a 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -38,6 +38,8 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>  
>  /* Remove all handlers and free vpci related structures. */
>  void vpci_deassign_device(struct pci_dev *pdev);
> +int __must_check vpci_reset_device_state(struct pci_dev *pdev,
> +                                         uint32_t reset_type);
>  
>  /* Add/remove a register handler. */
>  int __must_check vpci_add_register_mask(struct vpci *vpci,
> @@ -282,6 +284,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
>  
>  static inline void vpci_deassign_device(struct pci_dev *pdev) { }
>  
> +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev,
> +                                                       uint32_t reset_type)
> +{
> +    return 0;
> +}
> +

Maybe it turns out to be more complicated than the current approach,
but vpci_reset_device_state() could be a static inline function in
vpci.h defined regardless of whether CONFIG_HAS_VPCI is selected or
not, as the underlying functions vpci_{de}assign_device() are always
defined.

Thanks, Roger.



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-31 15:55   ` Roger Pau Monné
@ 2024-07-31 15:58     ` Jan Beulich
  2024-07-31 16:13       ` Roger Pau Monné
  2024-08-02  2:55     ` Chen, Jiqian
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-31 15:58 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 31.07.2024 17:55, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev,
>> +                            uint32_t reset_type)
> 
> There's probably no use in passing reset_type to
> vpci_reset_device_state() if it's ignored?

I consider this forward-looking. It seems rather unlikely that in the
longer run the reset type doesn't matter.

Jan



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-31 15:58     ` Jan Beulich
@ 2024-07-31 16:13       ` Roger Pau Monné
  2024-08-01  6:49         ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31 16:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
> On 31.07.2024 17:55, Roger Pau Monné wrote:
> > On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
> >>  
> >>      return rc;
> >>  }
> >> +
> >> +int vpci_reset_device_state(struct pci_dev *pdev,
> >> +                            uint32_t reset_type)
> > 
> > There's probably no use in passing reset_type to
> > vpci_reset_device_state() if it's ignored?
> 
> I consider this forward-looking. It seems rather unlikely that in the
> longer run the reset type doesn't matter.

I'm fine with having it in the hypercall interface, but passing it to
vpci_reset_device_state() can be done once there's a purpose for it,
and it won't change any public facing interface.
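
To illustrate the split described above, here is a minimal standalone
sketch, not the patch itself: the reset type is still validated at the
hypercall boundary, while the internal helper takes no reset_type
argument.  The mock_* names and struct mock_pdev are hypothetical
stand-ins for the Xen internals; only the PCI_DEVICE_STATE_RESET_*
values are taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Reset types, values as defined in the v12 patch. */
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3

/* Hypothetical stand-in for struct pci_dev: counts state resets. */
struct mock_pdev {
    unsigned int resets;
};

/*
 * Hypothetical stand-in for vpci_reset_device_state().  It takes no
 * reset_type, since the type currently has no effect on what is done.
 */
static int mock_vpci_reset_device_state(struct mock_pdev *pdev)
{
    pdev->resets++;
    return 0;
}

/*
 * Hypothetical stand-in for the hypercall handler: the reset type stays
 * in the public interface and unknown values are still rejected here.
 */
static int mock_handle_reset(struct mock_pdev *pdev, uint32_t reset_type)
{
    switch ( reset_type )
    {
    case PCI_DEVICE_STATE_RESET_COLD:
    case PCI_DEVICE_STATE_RESET_WARM:
    case PCI_DEVICE_STATE_RESET_HOT:
    case PCI_DEVICE_STATE_RESET_FLR:
        return mock_vpci_reset_device_state(pdev);

    default:
        return -EOPNOTSUPP;
    }
}
```

If a reset type later needs distinct handling, a parameter can be added
to the internal helper without touching the public structure.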

Thanks, Roger.



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-31 16:13       ` Roger Pau Monné
@ 2024-08-01  6:49         ` Jan Beulich
  2024-08-02  2:56           ` Chen, Jiqian
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-08-01  6:49 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui,
	Roger Pau Monné

On 31.07.2024 18:13, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
>> On 31.07.2024 17:55, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>  
>>>>      return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>>> +                            uint32_t reset_type)
>>>
>>> There's probably no use in passing reset_type to
>>> vpci_reset_device_state() if it's ignored?
>>
>> I consider this forward-looking. It seems rather unlikely that in the
>> longer run the reset type doesn't matter.
> 
> I'm fine with having it in the hypercall interface, but passing it to
> vpci_reset_device_state() can be done once there's a purpose for it,
> and it won't change any public facing interface.

Jiqian, just to clarify: I'm okay either way.

Jan



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-08-01  6:49         ` Jan Beulich
@ 2024-08-02  2:56           ` Chen, Jiqian
  0 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  2:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Roger Pau Monné, Chen, Jiqian

On 2024/8/1 14:49, Jan Beulich wrote:
> On 31.07.2024 18:13, Roger Pau Monné wrote:
>> On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
>>> On 31.07.2024 17:55, Roger Pau Monné wrote:
>>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>>> --- a/xen/drivers/vpci/vpci.c
>>>>> +++ b/xen/drivers/vpci/vpci.c
>>>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>>  
>>>>>      return rc;
>>>>>  }
>>>>> +
>>>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>>>> +                            uint32_t reset_type)
>>>>
>>>> There's probably no use in passing reset_type to
>>>> vpci_reset_device_state() if it's ignored?
>>>
>>> I consider this forward-looking. It seems rather unlikely that in the
>>> longer run the reset type doesn't matter.
>>
>> I'm fine with having it in the hypercall interface, but passing it to
>> vpci_reset_device_state() can be done once there's a purpose for it,
>> and it won't change any public facing interface.
> 
> Jiqian, just to clarify: I'm okay either way.
Thank you very much! That dispels my concerns.
I will remove the reset_type parameter from vpci_reset_device_state() in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.


* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-07-31 15:55   ` Roger Pau Monné
  2024-07-31 15:58     ` Jan Beulich
@ 2024-08-02  2:55     ` Chen, Jiqian
  2024-08-02  6:25       ` Jan Beulich
  1 sibling, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  2:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/7/31 23:55, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>> When a device has been reset on dom0 side, the Xen hypervisor
>> doesn't get notification, so the cached state in vpci is all
>> out of date compare with the real device state.
>>
>> To solve that problem, add a new hypercall to support the reset
>> of pcidev and clear the vpci state of device. So that once the
>> state of device is reset on dom0 side, dom0 can call this
>> hypercall to notify hypervisor.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> Thanks, just a couple of nits.
> 
> This is missing a changelog between versions, and I haven't been
> following all the versions, so some of my questions might have been
> answered in previous revisions.
Sorry, I will add changelogs here in the next version.

> 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>>  xen/drivers/vpci/vpci.c      | 10 +++++++
>>  xen/include/public/physdev.h | 16 +++++++++++
>>  xen/include/xen/vpci.h       |  8 ++++++
>>  5 files changed, 87 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 7fb3136f0c7c..0fab670a4871 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      case PHYSDEVOP_pci_mmcfg_reserved:
>>      case PHYSDEVOP_pci_device_add:
>>      case PHYSDEVOP_pci_device_remove:
>> +    case PHYSDEVOP_pci_device_state_reset:
>>      case PHYSDEVOP_dbgp_op:
>>          if ( !is_hardware_domain(currd) )
>>              return -ENOSYS;
>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>> index 42db3e6d133c..c0f47945d955 100644
>> --- a/xen/drivers/pci/physdev.c
>> +++ b/xen/drivers/pci/physdev.c
>> @@ -2,6 +2,7 @@
>>  #include <xen/guest_access.h>
>>  #include <xen/hypercall.h>
>>  #include <xen/init.h>
>> +#include <xen/vpci.h>
>>  
>>  #ifndef COMPAT
>>  typedef long ret_t;
>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case PHYSDEVOP_pci_device_state_reset:
>> +    {
>> +        struct pci_device_state_reset dev_reset;
>> +        struct pci_dev *pdev;
>> +        pci_sbdf_t sbdf;
>> +
>> +        ret = -EOPNOTSUPP;
>> +        if ( !is_pci_passthrough_enabled() )
>> +            break;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>> +            break;
>> +
>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
>> +                        dev_reset.dev.bus,
>> +                        dev_reset.dev.devfn);
>> +
>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +        if ( ret )
>> +            break;
>> +
>> +        pcidevs_lock();
>> +        pdev = pci_get_pdev(NULL, sbdf);
>> +        if ( !pdev )
>> +        {
>> +            pcidevs_unlock();
>> +            ret = -ENODEV;
>> +            break;
>> +        }
>> +
>> +        write_lock(&pdev->domain->pci_lock);
>> +        pcidevs_unlock();
>> +        switch ( dev_reset.reset_type )
>> +        {
>> +        case PCI_DEVICE_STATE_RESET_COLD:
>> +        case PCI_DEVICE_STATE_RESET_WARM:
>> +        case PCI_DEVICE_STATE_RESET_HOT:
>> +        case PCI_DEVICE_STATE_RESET_FLR:
>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>> +            break;
>> +
>> +        default:
>> +            ret = -EOPNOTSUPP;
>> +            break;
>> +        }
>> +        write_unlock(&pdev->domain->pci_lock);
>> +
>> +        break;
>> +    }
>> +
>>      default:
>>          ret = -ENOSYS;
>>          break;
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 1e6aa5d799b9..7e914d1eff9f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev,
>> +                            uint32_t reset_type)
> 
> There's probably no use in passing reset_type to
> vpci_reset_device_state() if it's ignored?
> 
>> +{
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> +    vpci_deassign_device(pdev);
>> +    return vpci_assign_device(pdev);
>> +}
>> +
>>  #endif /* __XEN__ */
>>  
>>  static int vpci_register_cmp(const struct vpci_register *r1,
>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>> index f0c0d4727c0b..3cfde3fd2389 100644
>> --- a/xen/include/public/physdev.h
>> +++ b/xen/include/public/physdev.h
>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>   */
>>  #define PHYSDEVOP_prepare_msix          30
>>  #define PHYSDEVOP_release_msix          31
>> +/*
>> + * Notify the hypervisor that a PCI device has been reset, so that any
>> + * internally cached state is regenerated.  Should be called after any
>> + * device reset performed by the hardware domain.
>> + */
>> +#define PHYSDEVOP_pci_device_state_reset 32
>> +
>>  struct physdev_pci_device {
>>      /* IN */
>>      uint16_t seg;
>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>  
>> +struct pci_device_state_reset {
>> +    physdev_pci_device_t dev;
>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>> +    uint32_t reset_type;
> 
> This might want to be a flags field, with the low 2 bits (or maybe 3
> bits to cope if more reset modes are added in the future) being used to
> signal the reset type.  We can always do that later if flags need to
> be added.
Do you mean this?
+struct pci_device_state_reset {
+    physdev_pci_device_t dev;
+#define _PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
+#define _PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
+#define _PCI_DEVICE_STATE_RESET_HOT  2
+#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
+#define _PCI_DEVICE_STATE_RESET_FLR  3
+#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
+    uint32_t reset_type;
+};

> 
> Seeing as reset_type has no impact on the hypercall, I would like to
> ask for some reasoning for its presence to be added to the commit
> message, otherwise it feels like pointless code churn.
OK, I will add text to the commit message explaining that reset_type is forward-looking: it allows different reset types to be handled differently in the future.

> 
>> +};
>> +
>>  #define PHYSDEVOP_DBGP_RESET_PREPARE    1
>>  #define PHYSDEVOP_DBGP_RESET_DONE       2
>>  
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index da8d0f41e6f4..6be812dbc04a 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -38,6 +38,8 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>>  
>>  /* Remove all handlers and free vpci related structures. */
>>  void vpci_deassign_device(struct pci_dev *pdev);
>> +int __must_check vpci_reset_device_state(struct pci_dev *pdev,
>> +                                         uint32_t reset_type);
>>  
>>  /* Add/remove a register handler. */
>>  int __must_check vpci_add_register_mask(struct vpci *vpci,
>> @@ -282,6 +284,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
>>  
>>  static inline void vpci_deassign_device(struct pci_dev *pdev) { }
>>  
>> +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev,
>> +                                                       uint32_t reset_type)
>> +{
>> +    return 0;
>> +}
>> +
> 
> Maybe it turns out to be more complicated than the current approach,
> but vpci_reset_device_state() could be a static inline function in
> vpci.h defined regardless of whether CONFIG_HAS_VPCI is selected or
> not, as the underlying functions vpci_{de}assign_device() are always
> defined.
OK, I will change to this in the next version.
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.


* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-08-02  2:55     ` Chen, Jiqian
@ 2024-08-02  6:25       ` Jan Beulich
  2024-08-02  7:41         ` Chen, Jiqian
  2024-08-02  7:44         ` Roger Pau Monné
  0 siblings, 2 replies; 76+ messages in thread
From: Jan Beulich @ 2024-08-02  6:25 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Roger Pau Monné

On 02.08.2024 04:55, Chen, Jiqian wrote:
> On 2024/7/31 23:55, Roger Pau Monné wrote:
>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>> When a device has been reset on dom0 side, the Xen hypervisor
>>> doesn't get notification, so the cached state in vpci is all
>>> out of date compare with the real device state.
>>>
>>> To solve that problem, add a new hypercall to support the reset
>>> of pcidev and clear the vpci state of device. So that once the
>>> state of device is reset on dom0 side, dom0 can call this
>>> hypercall to notify hypervisor.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>
>> Thanks, just a couple of nits.
>>
>> This is missing a changelog between versions, and I haven't been
>> following all the versions, so some of my questions might have been
>> answered in previous revisions.
> Sorry, I will add changelogs here in next version.
> 
>>
>>> ---
>>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>>>  xen/drivers/vpci/vpci.c      | 10 +++++++
>>>  xen/include/public/physdev.h | 16 +++++++++++
>>>  xen/include/xen/vpci.h       |  8 ++++++
>>>  5 files changed, 87 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>>> index 7fb3136f0c7c..0fab670a4871 100644
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>      case PHYSDEVOP_pci_mmcfg_reserved:
>>>      case PHYSDEVOP_pci_device_add:
>>>      case PHYSDEVOP_pci_device_remove:
>>> +    case PHYSDEVOP_pci_device_state_reset:
>>>      case PHYSDEVOP_dbgp_op:
>>>          if ( !is_hardware_domain(currd) )
>>>              return -ENOSYS;
>>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>>> index 42db3e6d133c..c0f47945d955 100644
>>> --- a/xen/drivers/pci/physdev.c
>>> +++ b/xen/drivers/pci/physdev.c
>>> @@ -2,6 +2,7 @@
>>>  #include <xen/guest_access.h>
>>>  #include <xen/hypercall.h>
>>>  #include <xen/init.h>
>>> +#include <xen/vpci.h>
>>>  
>>>  #ifndef COMPAT
>>>  typedef long ret_t;
>>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          break;
>>>      }
>>>  
>>> +    case PHYSDEVOP_pci_device_state_reset:
>>> +    {
>>> +        struct pci_device_state_reset dev_reset;
>>> +        struct pci_dev *pdev;
>>> +        pci_sbdf_t sbdf;
>>> +
>>> +        ret = -EOPNOTSUPP;
>>> +        if ( !is_pci_passthrough_enabled() )
>>> +            break;
>>> +
>>> +        ret = -EFAULT;
>>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>>> +            break;
>>> +
>>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
>>> +                        dev_reset.dev.bus,
>>> +                        dev_reset.dev.devfn);
>>> +
>>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>>> +        if ( ret )
>>> +            break;
>>> +
>>> +        pcidevs_lock();
>>> +        pdev = pci_get_pdev(NULL, sbdf);
>>> +        if ( !pdev )
>>> +        {
>>> +            pcidevs_unlock();
>>> +            ret = -ENODEV;
>>> +            break;
>>> +        }
>>> +
>>> +        write_lock(&pdev->domain->pci_lock);
>>> +        pcidevs_unlock();
>>> +        switch ( dev_reset.reset_type )
>>> +        {
>>> +        case PCI_DEVICE_STATE_RESET_COLD:
>>> +        case PCI_DEVICE_STATE_RESET_WARM:
>>> +        case PCI_DEVICE_STATE_RESET_HOT:
>>> +        case PCI_DEVICE_STATE_RESET_FLR:
>>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>>> +            break;
>>> +
>>> +        default:
>>> +            ret = -EOPNOTSUPP;
>>> +            break;
>>> +        }
>>> +        write_unlock(&pdev->domain->pci_lock);
>>> +
>>> +        break;
>>> +    }
>>> +
>>>      default:
>>>          ret = -ENOSYS;
>>>          break;
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 1e6aa5d799b9..7e914d1eff9f 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>  
>>>      return rc;
>>>  }
>>> +
>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>> +                            uint32_t reset_type)
>>
>> There's probably no use in passing reset_type to
>> vpci_reset_device_state() if it's ignored?
>>
>>> +{
>>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>> +
>>> +    vpci_deassign_device(pdev);
>>> +    return vpci_assign_device(pdev);
>>> +}
>>> +
>>>  #endif /* __XEN__ */
>>>  
>>>  static int vpci_register_cmp(const struct vpci_register *r1,
>>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>>> index f0c0d4727c0b..3cfde3fd2389 100644
>>> --- a/xen/include/public/physdev.h
>>> +++ b/xen/include/public/physdev.h
>>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>>   */
>>>  #define PHYSDEVOP_prepare_msix          30
>>>  #define PHYSDEVOP_release_msix          31
>>> +/*
>>> + * Notify the hypervisor that a PCI device has been reset, so that any
>>> + * internally cached state is regenerated.  Should be called after any
>>> + * device reset performed by the hardware domain.
>>> + */
>>> +#define PHYSDEVOP_pci_device_state_reset 32
>>> +
>>>  struct physdev_pci_device {
>>>      /* IN */
>>>      uint16_t seg;
>>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>>  
>>> +struct pci_device_state_reset {
>>> +    physdev_pci_device_t dev;
>>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>>> +    uint32_t reset_type;
>>
>> This might want to be a flags field, with the low 2 bits (or maybe 3
>> bits to cope if more reset modes are added in the future) being used to
>> signal the reset type.  We can always do that later if flags need to
>> be added.
> Do you mean this?
> +struct pci_device_state_reset {
> +    physdev_pci_device_t dev;
> +#define _PCI_DEVICE_STATE_RESET_COLD 0
> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
> +#define _PCI_DEVICE_STATE_RESET_WARM 1
> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
> +#define _PCI_DEVICE_STATE_RESET_HOT  2
> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
> +#define _PCI_DEVICE_STATE_RESET_FLR  3
> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
> +    uint32_t reset_type;
> +};

That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
like #define-s, but additionally define a 2-bit mask constant (0x3). I
don't think it needs to be three bits right away - we can decide what to
do there when any of the higher bits are to be assigned a meaning.
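
For illustration only, a minimal standalone sketch of that layout:
the enum-like values stay as in the patch and a 2-bit mask selects
them out of the field.  The mask name PCI_DEVICE_STATE_RESET_MASK and
the two helper functions are assumptions made for this sketch; they
are not part of the posted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Enum-like reset types, values as in the v12 patch. */
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3
/* Hypothetical name: the low two bits of the field carry the reset type. */
#define PCI_DEVICE_STATE_RESET_MASK 0x3U

/* Every defined reset type has to fit inside the mask. */
static_assert(PCI_DEVICE_STATE_RESET_FLR <= PCI_DEVICE_STATE_RESET_MASK,
              "all reset types must fit in the mask");

/* Hypothetical helper: extract the reset type from a flags word. */
static inline uint32_t reset_type_of(uint32_t flags)
{
    return flags & PCI_DEVICE_STATE_RESET_MASK;
}

/* Hypothetical helper: non-zero if a bit with no assigned meaning is set. */
static inline int has_unknown_flags(uint32_t flags)
{
    return (flags & ~PCI_DEVICE_STATE_RESET_MASK) != 0;
}
```

With this encoding the four types occupy values 0..3 of a single 2-bit
field, instead of one bit each as in the proposal quoted above, which
leaves all higher bits free for future flags.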

Jan



* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-08-02  6:25       ` Jan Beulich
@ 2024-08-02  7:41         ` Chen, Jiqian
  2024-08-02  7:43           ` Jan Beulich
  2024-08-02  7:44         ` Roger Pau Monné
  1 sibling, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  7:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Roger Pau Monné, Chen, Jiqian

On 2024/8/2 14:25, Jan Beulich wrote:
> On 02.08.2024 04:55, Chen, Jiqian wrote:
>> On 2024/7/31 23:55, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>> When a device has been reset on dom0 side, the Xen hypervisor
>>>> doesn't get notification, so the cached state in vpci is all
>>>> out of date compare with the real device state.
>>>>
>>>> To solve that problem, add a new hypercall to support the reset
>>>> of pcidev and clear the vpci state of device. So that once the
>>>> state of device is reset on dom0 side, dom0 can call this
>>>> hypercall to notify hypervisor.
>>>>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>
>>> Thanks, just a couple of nits.
>>>
>>> This is missing a changelog between versions, and I haven't been
>>> following all the versions, so some of my questions might have been
>>> answered in previous revisions.
>> Sorry, I will add changelogs here in next version.
>>
>>>
>>>> ---
>>>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
>>>>  xen/drivers/vpci/vpci.c      | 10 +++++++
>>>>  xen/include/public/physdev.h | 16 +++++++++++
>>>>  xen/include/xen/vpci.h       |  8 ++++++
>>>>  5 files changed, 87 insertions(+)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>>>> index 7fb3136f0c7c..0fab670a4871 100644
>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>      case PHYSDEVOP_pci_mmcfg_reserved:
>>>>      case PHYSDEVOP_pci_device_add:
>>>>      case PHYSDEVOP_pci_device_remove:
>>>> +    case PHYSDEVOP_pci_device_state_reset:
>>>>      case PHYSDEVOP_dbgp_op:
>>>>          if ( !is_hardware_domain(currd) )
>>>>              return -ENOSYS;
>>>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>>>> index 42db3e6d133c..c0f47945d955 100644
>>>> --- a/xen/drivers/pci/physdev.c
>>>> +++ b/xen/drivers/pci/physdev.c
>>>> @@ -2,6 +2,7 @@
>>>>  #include <xen/guest_access.h>
>>>>  #include <xen/hypercall.h>
>>>>  #include <xen/init.h>
>>>> +#include <xen/vpci.h>
>>>>  
>>>>  #ifndef COMPAT
>>>>  typedef long ret_t;
>>>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>          break;
>>>>      }
>>>>  
>>>> +    case PHYSDEVOP_pci_device_state_reset:
>>>> +    {
>>>> +        struct pci_device_state_reset dev_reset;
>>>> +        struct pci_dev *pdev;
>>>> +        pci_sbdf_t sbdf;
>>>> +
>>>> +        ret = -EOPNOTSUPP;
>>>> +        if ( !is_pci_passthrough_enabled() )
>>>> +            break;
>>>> +
>>>> +        ret = -EFAULT;
>>>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>>>> +            break;
>>>> +
>>>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
>>>> +                        dev_reset.dev.bus,
>>>> +                        dev_reset.dev.devfn);
>>>> +
>>>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>>>> +        if ( ret )
>>>> +            break;
>>>> +
>>>> +        pcidevs_lock();
>>>> +        pdev = pci_get_pdev(NULL, sbdf);
>>>> +        if ( !pdev )
>>>> +        {
>>>> +            pcidevs_unlock();
>>>> +            ret = -ENODEV;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        write_lock(&pdev->domain->pci_lock);
>>>> +        pcidevs_unlock();
>>>> +        switch ( dev_reset.reset_type )
>>>> +        {
>>>> +        case PCI_DEVICE_STATE_RESET_COLD:
>>>> +        case PCI_DEVICE_STATE_RESET_WARM:
>>>> +        case PCI_DEVICE_STATE_RESET_HOT:
>>>> +        case PCI_DEVICE_STATE_RESET_FLR:
>>>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>>>> +            break;
>>>> +
>>>> +        default:
>>>> +            ret = -EOPNOTSUPP;
>>>> +            break;
>>>> +        }
>>>> +        write_unlock(&pdev->domain->pci_lock);
>>>> +
>>>> +        break;
>>>> +    }
>>>> +
>>>>      default:
>>>>          ret = -ENOSYS;
>>>>          break;
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 1e6aa5d799b9..7e914d1eff9f 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>  
>>>>      return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev,
>>>> +                            uint32_t reset_type)
>>>
>>> There's probably no use in passing reset_type to
>>> vpci_reset_device_state() if it's ignored?
>>>
>>>> +{
>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>>> +
>>>> +    vpci_deassign_device(pdev);
>>>> +    return vpci_assign_device(pdev);
>>>> +}
>>>> +
>>>>  #endif /* __XEN__ */
>>>>  
>>>>  static int vpci_register_cmp(const struct vpci_register *r1,
>>>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>>>> index f0c0d4727c0b..3cfde3fd2389 100644
>>>> --- a/xen/include/public/physdev.h
>>>> +++ b/xen/include/public/physdev.h
>>>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>>>   */
>>>>  #define PHYSDEVOP_prepare_msix          30
>>>>  #define PHYSDEVOP_release_msix          31
>>>> +/*
>>>> + * Notify the hypervisor that a PCI device has been reset, so that any
>>>> + * internally cached state is regenerated.  Should be called after any
>>>> + * device reset performed by the hardware domain.
>>>> + */
>>>> +#define PHYSDEVOP_pci_device_state_reset 32
>>>> +
>>>>  struct physdev_pci_device {
>>>>      /* IN */
>>>>      uint16_t seg;
>>>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>>>  
>>>> +struct pci_device_state_reset {
>>>> +    physdev_pci_device_t dev;
>>>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>>>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>>>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>>>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>>>> +    uint32_t reset_type;
>>>
>>> This might want to be a flags field, with the low 2 bits (or maybe 3
>>> bits to cope if more rest modes are added in the future) being used to
>>> signal the reset type.  We can always do that later if flags need to
>>> be added.
>> Do you mean this?
>> +struct pci_device_state_reset {
>> +    physdev_pci_device_t dev;
>> +#define _PCI_DEVICE_STATE_RESET_COLD 0
>> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
>> +#define _PCI_DEVICE_STATE_RESET_WARM 1
>> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
>> +#define _PCI_DEVICE_STATE_RESET_HOT  2
>> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
>> +#define _PCI_DEVICE_STATE_RESET_FLR  3
>> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
>> +    uint32_t reset_type;
>> +};
> 
> That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
> like #define-s, but additionally define a 2-bit mask constant (0x3). I
> don't think it needs to be three bits right away - we can decide what to
> do there when any of the higher bits are to be assigned a meaning.
Like this?
struct pci_device_state_reset {
    physdev_pci_device_t dev;
#define PCI_DEVICE_STATE_RESET_COLD 0x0
#define PCI_DEVICE_STATE_RESET_WARM 0x1
#define PCI_DEVICE_STATE_RESET_HOT  0x2
#define PCI_DEVICE_STATE_RESET_FLR  0x3
#define PCI_DEVICE_STATE_RESET_MASK  0x3
    uint32_t flags;
};

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-08-02  7:41         ` Chen, Jiqian
@ 2024-08-02  7:43           ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-08-02  7:43 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Roger Pau Monné

On 02.08.2024 09:41, Chen, Jiqian wrote:
> On 2024/8/2 14:25, Jan Beulich wrote:
>> On 02.08.2024 04:55, Chen, Jiqian wrote:
>>> On 2024/7/31 23:55, Roger Pau Monné wrote:
>>>> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>>>>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
>>>>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>>>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>>>>  
>>>>> +struct pci_device_state_reset {
>>>>> +    physdev_pci_device_t dev;
>>>>> +#define PCI_DEVICE_STATE_RESET_COLD 0
>>>>> +#define PCI_DEVICE_STATE_RESET_WARM 1
>>>>> +#define PCI_DEVICE_STATE_RESET_HOT  2
>>>>> +#define PCI_DEVICE_STATE_RESET_FLR  3
>>>>> +    uint32_t reset_type;
>>>>
>>>> This might want to be a flags field, with the low 2 bits (or maybe 3
>>>> bits to cope if more rest modes are added in the future) being used to
>>>> signal the reset type.  We can always do that later if flags need to
>>>> be added.
>>> Do you mean this?
>>> +struct pci_device_state_reset {
>>> +    physdev_pci_device_t dev;
>>> +#define _PCI_DEVICE_STATE_RESET_COLD 0
>>> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
>>> +#define _PCI_DEVICE_STATE_RESET_WARM 1
>>> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
>>> +#define _PCI_DEVICE_STATE_RESET_HOT  2
>>> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
>>> +#define _PCI_DEVICE_STATE_RESET_FLR  3
>>> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
>>> +    uint32_t reset_type;
>>> +};
>>
>> That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
>> like #define-s, but additionally define a 2-bit mask constant (0x3). I
>> don't think it needs to be three bits right away - we can decide what to
>> do there when any of the higher bits are to be assigned a meaning.
> Like this?
> struct pci_device_state_reset {
>     physdev_pci_device_t dev;
> #define PCI_DEVICE_STATE_RESET_COLD 0x0
> #define PCI_DEVICE_STATE_RESET_WARM 0x1
> #define PCI_DEVICE_STATE_RESET_HOT  0x2
> #define PCI_DEVICE_STATE_RESET_FLR  0x3
> #define PCI_DEVICE_STATE_RESET_MASK  0x3
>     uint32_t flags;
> };

Yes, with the last #define adjusted such that columns align.

Jan
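
[Editorial sketch, not part of the original message.] The layout agreed
above keeps the enum-like reset values and adds a 2-bit mask, so the reset
type is read from the low bits of "flags" and the higher bits stay free
for later use. The helper names below are hypothetical, and rejecting
unknown high bits is an assumption made for illustration; it is not
something the thread settles.

```c
#include <assert.h>
#include <stdint.h>

/* Values as proposed in the thread; the mask covers the low two bits. */
#define PCI_DEVICE_STATE_RESET_COLD 0x0
#define PCI_DEVICE_STATE_RESET_WARM 0x1
#define PCI_DEVICE_STATE_RESET_HOT  0x2
#define PCI_DEVICE_STATE_RESET_FLR  0x3
#define PCI_DEVICE_STATE_RESET_MASK 0x3

/* Every defined reset type must fit inside the mask. */
static_assert((PCI_DEVICE_STATE_RESET_FLR & ~PCI_DEVICE_STATE_RESET_MASK) == 0,
              "reset types must fit in the mask");

/* Hypothetical helper: extract the reset type from the flags field. */
static inline uint32_t reset_type_from_flags(uint32_t flags)
{
    return flags & PCI_DEVICE_STATE_RESET_MASK;
}

/*
 * Hypothetical helper: report whether any bit outside the mask is set.
 * A caller could use this to refuse flags that have no meaning yet.
 */
static inline int reset_flags_have_unknown_bits(uint32_t flags)
{
    return (flags & ~(uint32_t)PCI_DEVICE_STATE_RESET_MASK) != 0;
}
```

With the one-hot encoding from the earlier proposal (1U << n), the four
reset types would occupy four bits; the enum-like values above need only
the two bits covered by the mask.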


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
  2024-08-02  6:25       ` Jan Beulich
  2024-08-02  7:41         ` Chen, Jiqian
@ 2024-08-02  7:44         ` Roger Pau Monné
  1 sibling, 0 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-02  7:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Chen, Jiqian, xen-devel@lists.xenproject.org, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray

On Fri, Aug 02, 2024 at 08:25:58AM +0200, Jan Beulich wrote:
> On 02.08.2024 04:55, Chen, Jiqian wrote:
> > On 2024/7/31 23:55, Roger Pau Monné wrote:
> >> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> >>> When a device has been reset on dom0 side, the Xen hypervisor
> >>> doesn't get notification, so the cached state in vpci is all
> >>> out of date compare with the real device state.
> >>>
> >>> To solve that problem, add a new hypercall to support the reset
> >>> of pcidev and clear the vpci state of device. So that once the
> >>> state of device is reset on dom0 side, dom0 can call this
> >>> hypercall to notify hypervisor.
> >>>
> >>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >>> Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> >>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> >>
> >> Thanks, just a couple of nits.
> >>
> >> This is missing a changelog between versions, and I haven't been
> >> following all the versions, so some of my questions might have been
> >> answered in previous revisions.
> > Sorry, I will add changelogs here in next version.
> > 
> >>
> >>> ---
> >>>  xen/arch/x86/hvm/hypercall.c |  1 +
> >>>  xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
> >>>  xen/drivers/vpci/vpci.c      | 10 +++++++
> >>>  xen/include/public/physdev.h | 16 +++++++++++
> >>>  xen/include/xen/vpci.h       |  8 ++++++
> >>>  5 files changed, 87 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> >>> index 7fb3136f0c7c..0fab670a4871 100644
> >>> --- a/xen/arch/x86/hvm/hypercall.c
> >>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>      case PHYSDEVOP_pci_mmcfg_reserved:
> >>>      case PHYSDEVOP_pci_device_add:
> >>>      case PHYSDEVOP_pci_device_remove:
> >>> +    case PHYSDEVOP_pci_device_state_reset:
> >>>      case PHYSDEVOP_dbgp_op:
> >>>          if ( !is_hardware_domain(currd) )
> >>>              return -ENOSYS;
> >>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
> >>> index 42db3e6d133c..c0f47945d955 100644
> >>> --- a/xen/drivers/pci/physdev.c
> >>> +++ b/xen/drivers/pci/physdev.c
> >>> @@ -2,6 +2,7 @@
> >>>  #include <xen/guest_access.h>
> >>>  #include <xen/hypercall.h>
> >>>  #include <xen/init.h>
> >>> +#include <xen/vpci.h>
> >>>  
> >>>  #ifndef COMPAT
> >>>  typedef long ret_t;
> >>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>          break;
> >>>      }
> >>>  
> >>> +    case PHYSDEVOP_pci_device_state_reset:
> >>> +    {
> >>> +        struct pci_device_state_reset dev_reset;
> >>> +        struct pci_dev *pdev;
> >>> +        pci_sbdf_t sbdf;
> >>> +
> >>> +        ret = -EOPNOTSUPP;
> >>> +        if ( !is_pci_passthrough_enabled() )
> >>> +            break;
> >>> +
> >>> +        ret = -EFAULT;
> >>> +        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> >>> +            break;
> >>> +
> >>> +        sbdf = PCI_SBDF(dev_reset.dev.seg,
> >>> +                        dev_reset.dev.bus,
> >>> +                        dev_reset.dev.devfn);
> >>> +
> >>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> >>> +        if ( ret )
> >>> +            break;
> >>> +
> >>> +        pcidevs_lock();
> >>> +        pdev = pci_get_pdev(NULL, sbdf);
> >>> +        if ( !pdev )
> >>> +        {
> >>> +            pcidevs_unlock();
> >>> +            ret = -ENODEV;
> >>> +            break;
> >>> +        }
> >>> +
> >>> +        write_lock(&pdev->domain->pci_lock);
> >>> +        pcidevs_unlock();
> >>> +        switch ( dev_reset.reset_type )
> >>> +        {
> >>> +        case PCI_DEVICE_STATE_RESET_COLD:
> >>> +        case PCI_DEVICE_STATE_RESET_WARM:
> >>> +        case PCI_DEVICE_STATE_RESET_HOT:
> >>> +        case PCI_DEVICE_STATE_RESET_FLR:
> >>> +            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
> >>> +            break;
> >>> +
> >>> +        default:
> >>> +            ret = -EOPNOTSUPP;
> >>> +            break;
> >>> +        }
> >>> +        write_unlock(&pdev->domain->pci_lock);
> >>> +
> >>> +        break;
> >>> +    }
> >>> +
> >>>      default:
> >>>          ret = -ENOSYS;
> >>>          break;
> >>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> >>> index 1e6aa5d799b9..7e914d1eff9f 100644
> >>> --- a/xen/drivers/vpci/vpci.c
> >>> +++ b/xen/drivers/vpci/vpci.c
> >>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
> >>>  
> >>>      return rc;
> >>>  }
> >>> +
> >>> +int vpci_reset_device_state(struct pci_dev *pdev,
> >>> +                            uint32_t reset_type)
> >>
> >> There's probably no use in passing reset_type to
> >> vpci_reset_device_state() if it's ignored?
> >>
> >>> +{
> >>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> >>> +
> >>> +    vpci_deassign_device(pdev);
> >>> +    return vpci_assign_device(pdev);
> >>> +}
> >>> +
> >>>  #endif /* __XEN__ */
> >>>  
> >>>  static int vpci_register_cmp(const struct vpci_register *r1,
> >>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
> >>> index f0c0d4727c0b..3cfde3fd2389 100644
> >>> --- a/xen/include/public/physdev.h
> >>> +++ b/xen/include/public/physdev.h
> >>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
> >>>   */
> >>>  #define PHYSDEVOP_prepare_msix          30
> >>>  #define PHYSDEVOP_release_msix          31
> >>> +/*
> >>> + * Notify the hypervisor that a PCI device has been reset, so that any
> >>> + * internally cached state is regenerated.  Should be called after any
> >>> + * device reset performed by the hardware domain.
> >>> + */
> >>> +#define PHYSDEVOP_pci_device_state_reset 32
> >>> +
> >>>  struct physdev_pci_device {
> >>>      /* IN */
> >>>      uint16_t seg;
> >>> @@ -305,6 +312,15 @@ struct physdev_pci_device {
> >>>  typedef struct physdev_pci_device physdev_pci_device_t;
> >>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
> >>>  
> >>> +struct pci_device_state_reset {
> >>> +    physdev_pci_device_t dev;
> >>> +#define PCI_DEVICE_STATE_RESET_COLD 0
> >>> +#define PCI_DEVICE_STATE_RESET_WARM 1
> >>> +#define PCI_DEVICE_STATE_RESET_HOT  2
> >>> +#define PCI_DEVICE_STATE_RESET_FLR  3
> >>> +    uint32_t reset_type;
> >>
> >> This might want to be a flags field, with the low 2 bits (or maybe 3
> >> bits to cope if more rest modes are added in the future) being used to
> >> signal the reset type.  We can always do that later if flags need to
> >> be added.
> > Do you mean this?
> > +struct pci_device_state_reset {
> > +    physdev_pci_device_t dev;
> > +#define _PCI_DEVICE_STATE_RESET_COLD 0
> > +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
> > +#define _PCI_DEVICE_STATE_RESET_WARM 1
> > +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
> > +#define _PCI_DEVICE_STATE_RESET_HOT  2
> > +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
> > +#define _PCI_DEVICE_STATE_RESET_FLR  3
> > +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
> > +    uint32_t reset_type;
> > +};
> 
> That's four bits, not two. I'm pretty sure Roger meant to keep the enum-
> like #define-s, but additionally define a 2-bit mask constant (0x3). I
> don't think it needs to be three bits right away - we can decide what to
> do there when any of the higher bits are to be assigned a meaning.

Indeed, what I was requesting is just a cosmetic change; it doesn't
change the values of the enum at all.

However, the field would be better named "flags" or something more
generic, so that in the future it can accommodate other flags not
related to the reset type.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-07-08 11:41 ` [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-08 14:58   ` Jan Beulich
                     ` (3 more replies)
  2024-07-08 11:41 ` [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
                   ` (4 subsequent siblings)
  6 siblings, 4 replies; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When running Xen with a PVH dom0 and an HVM domU, a pirq is mapped
for a passthrough device by using its gsi; see the qemu code
xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the device removal path can unmap the
pirq. Also add a new check to prevent (un)map when the subject
domain doesn't have a notion of PIRQ.

This way, the interrupt of a passthrough device can be
successfully mapped to a pirq for a domU with a notion of PIRQ
when dom0 is PVH.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 xen/arch/x86/hvm/hypercall.c |  6 ++++++
 xen/arch/x86/physdev.c       | 12 ++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd )
     {
+        /*
+        * Only being permitted for management of other domains.
+        * Further restrictions are enforced in do_physdev_op.
+        */
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..9f30a8c63a06 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
-        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
+        /* Only mapping when the subject domain has a notion of PIRQ */
+        if ( !is_hvm_domain(d) || has_pirq(d) )
+            ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
+        else
+            ret = -EOPNOTSUPP;
 
         rcu_unlock_domain(d);
 
@@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
-        ret = physdev_unmap_pirq(d, unmap.pirq);
+        /* Only unmapping when the subject domain has a notion of PIRQ */
+        if ( !is_hvm_domain(d) || has_pirq(d) )
+            ret = physdev_unmap_pirq(d, unmap.pirq);
+        else
+            ret = -EOPNOTSUPP;
 
         rcu_unlock_domain(d);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-07-08 14:58   ` Jan Beulich
  2024-07-22 21:37   ` Stefano Stabellini
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-07-08 14:58 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 08.07.2024 13:41, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see qemu code
> xen_pt_realize->xc_physdev_map_pirq and libxl code
> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
> is not allowed because currd is PVH dom0 and PVH has no
> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
> 
> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
> And add a new check to prevent (un)map when the subject domain
> doesn't have a notion of PIRQ.
> 
> So that the interrupt of a passthrough device can be
> successfully mapped to pirq for domU with a notion of PIRQ
> when dom0 is PVH
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
  2024-07-08 14:58   ` Jan Beulich
@ 2024-07-22 21:37   ` Stefano Stabellini
  2024-07-30 13:09   ` Andrew Cooper
  2024-07-31  7:50   ` Roger Pau Monné
  3 siblings, 0 replies; 76+ messages in thread
From: Stefano Stabellini @ 2024-07-22 21:37 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui

On Mon, 8 Jul 2024, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see qemu code
> xen_pt_realize->xc_physdev_map_pirq and libxl code
> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
> is not allowed because currd is PVH dom0 and PVH has no
> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
> 
> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
> And add a new check to prevent (un)map when the subject domain
> doesn't have a notion of PIRQ.
> 
> So that the interrupt of a passthrough device can be
> successfully mapped to pirq for domU with a notion of PIRQ
> when dom0 is PVH
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
  2024-07-08 14:58   ` Jan Beulich
  2024-07-22 21:37   ` Stefano Stabellini
@ 2024-07-30 13:09   ` Andrew Cooper
  2024-07-31  1:47     ` Chen, Jiqian
  2024-07-31  8:31     ` Chen, Jiqian
  2024-07-31  7:50   ` Roger Pau Monné
  3 siblings, 2 replies; 76+ messages in thread
From: Andrew Cooper @ 2024-07-30 13:09 UTC (permalink / raw)
  To: Jiqian Chen, xen-devel
  Cc: Jan Beulich, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On 08/07/2024 12:41 pm, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see qemu code
> xen_pt_realize->xc_physdev_map_pirq and libxl code
> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
> is not allowed because currd is PVH dom0 and PVH has no
> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>
> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
> And add a new check to prevent (un)map when the subject domain
> doesn't have a notion of PIRQ.
>
> So that the interrupt of a passthrough device can be
> successfully mapped to pirq for domU with a notion of PIRQ
> when dom0 is PVH
>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  xen/arch/x86/hvm/hypercall.c |  6 ++++++
>  xen/arch/x86/physdev.c       | 12 ++++++++++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 0fab670a4871..03ada3c880bd 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      switch ( cmd )
>      {
> +        /*
> +        * Only being permitted for management of other domains.
> +        * Further restrictions are enforced in do_physdev_op.
> +        */
>      case PHYSDEVOP_map_pirq:
>      case PHYSDEVOP_unmap_pirq:
> +        break;
> +
>      case PHYSDEVOP_eoi:
>      case PHYSDEVOP_irq_status_query:
>      case PHYSDEVOP_get_free_pirq:
> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> index d6dd622952a9..9f30a8c63a06 100644
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( !d )
>              break;
>  
> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
> +        /* Only mapping when the subject domain has a notion of PIRQ */
> +        if ( !is_hvm_domain(d) || has_pirq(d) )
> +            ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
> +        else
> +            ret = -EOPNOTSUPP;
>  
>          rcu_unlock_domain(d);
>  
> @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( !d )
>              break;
>  
> -        ret = physdev_unmap_pirq(d, unmap.pirq);
> +        /* Only unmapping when the subject domain has a notion of PIRQ */
> +        if ( !is_hvm_domain(d) || has_pirq(d) )
> +            ret = physdev_unmap_pirq(d, unmap.pirq);
> +        else
> +            ret = -EOPNOTSUPP;
>  
>          rcu_unlock_domain(d);
>  

Gitlab is displeased with your offering.

https://gitlab.com/xen-project/xen/-/pipelines/1393459622

This breaks both {adl,zen3p}-pci-hvm-x86-64-gcc-debug, and given the:

(XEN) [    8.150305] HVM restore d1: CPU 0
libxl: error: libxl_pci.c:1491:pci_add_dm_done: Domain
1:xc_physdev_map_pirq irq=18 (error=-1): Not supported
libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain
1:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
libxl: error: libxl_create.c:1962:domcreate_attach_devices: Domain
1:unable to add pci devices
libxl: error: libxl_xshelp.c:206:libxl__xs_read_mandatory: xenstore read
failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain
type for domid=1, assuming HVM
libxl: error: libxl_domain.c:1616:domain_destroy_domid_cb: Domain
1:xc_domain_destroy failed: No such process

I'd say that we're hitting the newly introduced -EOPNOTSUPP path.

In the test scenario, dom0 is PV, and it's an HVM domU which is breaking.

The sibling *-pci-pv-* tests (a PV domU) are working fine.

Either way, I'm going to revert this for now because clearly "the
subject domain has a notion of PIRQ" hasn't been reasoned about
correctly, and it's important to keep Gitlab CI green across the board.

~Andrew
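
[Editorial sketch, not part of the original message.] The condition added
by the patch, "!is_hvm_domain(d) || has_pirq(d)", can be modelled in
isolation to see which subject domains reach the new -EOPNOTSUPP path.
The struct and function below are hypothetical stand-ins, not Xen code;
the two booleans stand for the results of is_hvm_domain() and has_pirq()
on the subject domain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the subject domain, reduced to the two tests used. */
struct model_domain {
    bool is_hvm;   /* stands in for is_hvm_domain(d) */
    bool has_pirq; /* stands in for has_pirq(d) */
};

/*
 * Mirrors the check from the patch: non-HVM domains always pass,
 * HVM domains pass only if they have a notion of PIRQ.
 * Returns 0 when the (un)map would proceed, -EOPNOTSUPP otherwise.
 */
static int model_pirq_op_allowed(const struct model_domain *d)
{
    assert(d != NULL);

    if ( !d->is_hvm || d->has_pirq )
        return 0;

    return -EOPNOTSUPP;
}
```

Under this model a PV domU always passes, which matches the working
*-pci-pv-* tests, while an HVM domU for which has_pirq() is false is
refused, which is consistent with the "Not supported" error in the log
above.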


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-30 13:09   ` Andrew Cooper
@ 2024-07-31  1:47     ` Chen, Jiqian
  2024-07-31  8:31     ` Chen, Jiqian
  1 sibling, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-31  1:47 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel@lists.xenproject.org
  Cc: Jan Beulich, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

Hi Andrew,

On 2024/7/30 21:09, Andrew Cooper wrote:
> On 08/07/2024 12:41 pm, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>> And add a new check to prevent (un)map when the subject domain
>> doesn't have a notion of PIRQ.
>>
>> So that the interrupt of a passthrough device can be
>> successfully mapped to pirq for domU with a notion of PIRQ
>> when dom0 is PVH
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  6 ++++++
>>  xen/arch/x86/physdev.c       | 12 ++++++++++--
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 0fab670a4871..03ada3c880bd 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>      switch ( cmd )
>>      {
>> +        /*
>> +        * Only being permitted for management of other domains.
>> +        * Further restrictions are enforced in do_physdev_op.
>> +        */
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        break;
>> +
>>      case PHYSDEVOP_eoi:
>>      case PHYSDEVOP_irq_status_query:
>>      case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index d6dd622952a9..9f30a8c63a06 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( !d )
>>              break;
>>  
>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>> +            ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>> +        else
>> +            ret = -EOPNOTSUPP;
>>  
>>          rcu_unlock_domain(d);
>>  
>> @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( !d )
>>              break;
>>  
>> -        ret = physdev_unmap_pirq(d, unmap.pirq);
>> +        /* Only unmapping when the subject domain has a notion of PIRQ */
>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>> +            ret = physdev_unmap_pirq(d, unmap.pirq);
>> +        else
>> +            ret = -EOPNOTSUPP;
>>  
>>          rcu_unlock_domain(d);
>>  
> 
> Gitlab is displeased with your offering.
> 
> https://gitlab.com/xen-project/xen/-/pipelines/1393459622
> 
> This breaks both {adl,zen3p}-pci-hvm-x86-64-gcc-debug, and given the:
> 
> (XEN) [    8.150305] HVM restore d1: CPU 0
> libxl: error: libxl_pci.c:1491:pci_add_dm_done: Domain
> 1:xc_physdev_map_pirq irq=18 (error=-1): Not supported
> libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain
> 1:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
> libxl: error: libxl_create.c:1962:domcreate_attach_devices: Domain
> 1:unable to add pci devices
> libxl: error: libxl_xshelp.c:206:libxl__xs_read_mandatory: xenstore read
> failed: `/libxl/1/type': No such file or directory
> libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain
> type for domid=1, assuming HVM
> libxl: error: libxl_domain.c:1616:domain_destroy_domid_cb: Domain
> 1:xc_domain_destroy failed: No such process
> 
> I'd say that we're hitting the newly introduced -EOPNOTSUPP path.
> 
> In the test scenario, dom0 is PV, and it's an HVM domU which is breaking.
> 
> The sibling *-pci-pv-* tests (a PV domU) are working fine.
> 
> Either way, I'm going to revert this for now because clearly the "the
> subject domain has a notion of PIRQ" hasn't been reasoned about
> correctly, and it's important to keep Gitlab CI green across the board.

OK, I will try to reproduce and investigate this issue, thanks.

> 
> ~Andrew

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-30 13:09   ` Andrew Cooper
  2024-07-31  1:47     ` Chen, Jiqian
@ 2024-07-31  8:31     ` Chen, Jiqian
  2024-07-31  8:42       ` Jan Beulich
  1 sibling, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-31  8:31 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/7/30 21:09, Andrew Cooper wrote:
> On 08/07/2024 12:41 pm, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>> And add a new check to prevent (un)map when the subject domain
>> doesn't have a notion of PIRQ.
>>
>> So that the interrupt of a passthrough device can be
>> successfully mapped to pirq for domU with a notion of PIRQ
>> when dom0 is PVH
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  6 ++++++
>>  xen/arch/x86/physdev.c       | 12 ++++++++++--
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 0fab670a4871..03ada3c880bd 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>      switch ( cmd )
>>      {
>> +        /*
>> +        * Only being permitted for management of other domains.
>> +        * Further restrictions are enforced in do_physdev_op.
>> +        */
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        break;
>> +
>>      case PHYSDEVOP_eoi:
>>      case PHYSDEVOP_irq_status_query:
>>      case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index d6dd622952a9..9f30a8c63a06 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( !d )
>>              break;
>>  
>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>> +            ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>> +        else
>> +            ret = -EOPNOTSUPP;
>>  
>>          rcu_unlock_domain(d);
>>  
>> @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( !d )
>>              break;
>>  
>> -        ret = physdev_unmap_pirq(d, unmap.pirq);
>> +        /* Only unmapping when the subject domain has a notion of PIRQ */
>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>> +            ret = physdev_unmap_pirq(d, unmap.pirq);
>> +        else
>> +            ret = -EOPNOTSUPP;
>>  
>>          rcu_unlock_domain(d);
>>  
> 
> Gitlab is displeased with your offering.
> 
> https://gitlab.com/xen-project/xen/-/pipelines/1393459622
> 
> This breaks both {adl,zen3p}-pci-hvm-x86-64-gcc-debug, and given the:
> 
> (XEN) [    8.150305] HVM restore d1: CPU 0
> libxl: error: libxl_pci.c:1491:pci_add_dm_done: Domain
> 1:xc_physdev_map_pirq irq=18 (error=-1): Not supported
> libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain
> 1:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
> libxl: error: libxl_create.c:1962:domcreate_attach_devices: Domain
> 1:unable to add pci devices
> libxl: error: libxl_xshelp.c:206:libxl__xs_read_mandatory: xenstore read
> failed: `/libxl/1/type': No such file or directory
> libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain
> type for domid=1, assuming HVM
> libxl: error: libxl_domain.c:1616:domain_destroy_domid_cb: Domain
> 1:xc_domain_destroy failed: No such process

Sorry, I forgot to validate the "hvm_pirq=0" scenario for HVM guests after the V10->V11 change (which removed the self-check "d == currd").

V10 version:
+        /* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }

V11 version:
+        /* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
+        if ( is_hvm_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }

V10 works whether hvm_pirq is enabled or disabled.
The issue was introduced in V11: when "hvm_pirq=0" is passed to an HVM guest, has_pirq() is false for that guest, yet a pirq is still used to route the interrupts of passthrough devices.
So xc_physdev_(un)map_pirq is still called, and it then fails at the has_pirq() check.
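
To make the difference between the two checks concrete, here is a minimal
stand-alone sketch. It is not Xen code: struct toy_domain, its fields and the
v10_denies()/v11_denies() helpers are hypothetical names that only mirror
is_hvm_domain(), has_pirq() and the "d == currd" comparison quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct domain; only the bits the checks read. */
struct toy_domain {
    bool is_hvm;    /* models is_hvm_domain(d); assumed true for HVM and PVH */
    bool has_pirq;  /* models has_pirq(d), i.e. the X86_EMU_USE_PIRQ flag */
};

/* V10 check: deny only a self-map by an HVM-type domain without PIRQ. */
static bool v10_denies(const struct toy_domain *d,
                       const struct toy_domain *currd)
{
    return d->is_hvm && !d->has_pirq && d == currd;
}

/* V11 check: deny whenever the subject domain is HVM-type without PIRQ. */
static bool v11_denies(const struct toy_domain *d,
                       const struct toy_domain *currd)
{
    (void)currd; /* the caller no longer matters in V11 */
    return d->is_hvm && !d->has_pirq;
}

/* The failing CI scenario: PV dom0 mapping for an HVM domU with hvm_pirq=0. */
static void check_hvm_pirq_0_case(void)
{
    struct toy_domain dom0 = { .is_hvm = false, .has_pirq = false };
    struct toy_domain domu = { .is_hvm = true,  .has_pirq = false };

    assert(!v10_denies(&domu, &dom0)); /* V10 lets the foreign map through */
    assert(v11_denies(&domu, &dom0));  /* V11 rejects it with -EOPNOTSUPP */
}
```

In this model the only input that changes between V10 and V11 is whether the
caller is the subject domain, which is why the foreign map for an hvm_pirq=0
guest regressed.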

Hi Jan,
Should I go back to the V10 approach and only prevent self-mapping when the subject domain has no PIRQ?
That would allow PHYSDEVOP_map_pirq for foreign mappings, regardless of whether dom0 or the domU has PIRQ.

> 
> I'd say that we're hitting the newly introduced -EOPNOTSUPP path.
> 
> In the test scenario, dom0 is PV, and it's an HVM domU which is breaking.
> 
> The sibling *-pci-pv-* tests (a PV domU) are working fine.
> 
> Either way, I'm going to revert this for now because clearly the "the
> subject domain has a notion of PIRQ" hasn't been reasoned about
> correctly, and it's important to keep Gitlab CI green across the board.
> 
> ~Andrew

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  8:31     ` Chen, Jiqian
@ 2024-07-31  8:42       ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-07-31  8:42 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray, Andrew Cooper,
	Roger Pau Monné

On 31.07.2024 10:31, Chen, Jiqian wrote:
> On 2024/7/30 21:09, Andrew Cooper wrote:
>> On 08/07/2024 12:41 pm, Jiqian Chen wrote:
>>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>>> a passthrough device by using gsi, see qemu code
>>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>>> is not allowed because currd is PVH dom0 and PVH has no
>>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>>
>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>>> And add a new check to prevent (un)map when the subject domain
>>> doesn't have a notion of PIRQ.
>>>
>>> So that the interrupt of a passthrough device can be
>>> successfully mapped to pirq for domU with a notion of PIRQ
>>> when dom0 is PVH
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> ---
>>>  xen/arch/x86/hvm/hypercall.c |  6 ++++++
>>>  xen/arch/x86/physdev.c       | 12 ++++++++++--
>>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>>> index 0fab670a4871..03ada3c880bd 100644
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>  
>>>      switch ( cmd )
>>>      {
>>> +        /*
>>> +        * Only being permitted for management of other domains.
>>> +        * Further restrictions are enforced in do_physdev_op.
>>> +        */
>>>      case PHYSDEVOP_map_pirq:
>>>      case PHYSDEVOP_unmap_pirq:
>>> +        break;
>>> +
>>>      case PHYSDEVOP_eoi:
>>>      case PHYSDEVOP_irq_status_query:
>>>      case PHYSDEVOP_get_free_pirq:
>>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>>> index d6dd622952a9..9f30a8c63a06 100644
>>> --- a/xen/arch/x86/physdev.c
>>> +++ b/xen/arch/x86/physdev.c
>>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          if ( !d )
>>>              break;
>>>  
>>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>>> +            ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>>> +        else
>>> +            ret = -EOPNOTSUPP;
>>>  
>>>          rcu_unlock_domain(d);
>>>  
>>> @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          if ( !d )
>>>              break;
>>>  
>>> -        ret = physdev_unmap_pirq(d, unmap.pirq);
>>> +        /* Only unmapping when the subject domain has a notion of PIRQ */
>>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>>> +            ret = physdev_unmap_pirq(d, unmap.pirq);
>>> +        else
>>> +            ret = -EOPNOTSUPP;
>>>  
>>>          rcu_unlock_domain(d);
>>>  
>>
>> Gitlab is displeased with your offering.
>>
>> https://gitlab.com/xen-project/xen/-/pipelines/1393459622
>>
>> This breaks both {adl,zen3p}-pci-hvm-x86-64-gcc-debug, and given the:
>>
>> (XEN) [    8.150305] HVM restore d1: CPU 0
>> libxl: error: libxl_pci.c:1491:pci_add_dm_done: Domain
>> 1:xc_physdev_map_pirq irq=18 (error=-1): Not supported
>> libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain
>> 1:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
>> libxl: error: libxl_create.c:1962:domcreate_attach_devices: Domain
>> 1:unable to add pci devices
>> libxl: error: libxl_xshelp.c:206:libxl__xs_read_mandatory: xenstore read
>> failed: `/libxl/1/type': No such file or directory
>> libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain
>> type for domid=1, assuming HVM
>> libxl: error: libxl_domain.c:1616:domain_destroy_domid_cb: Domain
>> 1:xc_domain_destroy failed: No such process
> 
> Sorry, I forgot to validate the "hvm_pirq=0" scenario for HVM guests after the V10->V11 change (which removed the self-check "d == currd").
> 
> V10 version:
> +        /* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
> +        if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
> +        {
> +            rcu_unlock_domain(d);
> +            return -EOPNOTSUPP;
> +        }
> 
> V11 version:
> +        /* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
> +        if ( is_hvm_domain(d) && !has_pirq(d) )
> +        {
> +            rcu_unlock_domain(d);
> +            return -EOPNOTSUPP;
> +        }
> 
> V10 works whether hvm_pirq is enabled or disabled.
> The issue was introduced in V11: when "hvm_pirq=0" is passed to an HVM guest, has_pirq() is false for that guest, yet a pirq is still used to route the interrupts of passthrough devices.
> So xc_physdev_(un)map_pirq is still called, and it then fails at the has_pirq() check.
> 
> Hi Jan,
> Should I go back to the V10 approach and only prevent self-mapping when the subject domain has no PIRQ?
> That would allow PHYSDEVOP_map_pirq for foreign mappings, regardless of whether dom0 or the domU has PIRQ.

No, my position there hasn't changed. I continue to view it as wrong to
have any d == currd checks here.

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
                     ` (2 preceding siblings ...)
  2024-07-30 13:09   ` Andrew Cooper
@ 2024-07-31  7:50   ` Roger Pau Monné
  2024-07-31  7:58     ` Jan Beulich
  2024-07-31  8:39     ` Chen, Jiqian
  3 siblings, 2 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31  7:50 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see qemu code
> xen_pt_realize->xc_physdev_map_pirq and libxl code
> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
> is not allowed because currd is PVH dom0 and PVH has no
> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
> 
> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
> And add a new check to prevent (un)map when the subject domain
> doesn't have a notion of PIRQ.
> 
> So that the interrupt of a passthrough device can be
> successfully mapped to pirq for domU with a notion of PIRQ
> when dom0 is PVH
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  xen/arch/x86/hvm/hypercall.c |  6 ++++++
>  xen/arch/x86/physdev.c       | 12 ++++++++++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 0fab670a4871..03ada3c880bd 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      switch ( cmd )
>      {
> +        /*
> +        * Only being permitted for management of other domains.
> +        * Further restrictions are enforced in do_physdev_op.
> +        */
>      case PHYSDEVOP_map_pirq:
>      case PHYSDEVOP_unmap_pirq:
> +        break;
> +
>      case PHYSDEVOP_eoi:
>      case PHYSDEVOP_irq_status_query:
>      case PHYSDEVOP_get_free_pirq:
> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> index d6dd622952a9..9f30a8c63a06 100644
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( !d )
>              break;
>  
> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
> +        /* Only mapping when the subject domain has a notion of PIRQ */
> +        if ( !is_hvm_domain(d) || has_pirq(d) )

I'm afraid this is not true.  It's fine to map interrupts to HVM
domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
allows HVM domains to route interrupts from devices (either emulated or
passed through) over event channels.

It might have worked in the past (when using a version of Xen < 4.19)
because XENFEAT_hvm_pirqs was enabled by default for HVM guests.

physdev_map_pirq() will work fine when used against domains that don't
have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.

I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
keep the code in do_physdev_op() as-is.  You will have to check
whether the current paths in do_physdev_op() are not making
assumptions about XENFEAT_hvm_pirqs being enabled when the calling
domain is of HVM type.  I don't think that's the case, but better
check.
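
The arrangement suggested above can be pictured with a small stand-alone
model of the hvm_physdev_op() gate. This is only a sketch under assumptions:
the toy_* names are hypothetical, the list of gated commands is taken from the
diff context quoted above, and -ENOSYS as the rejection value is an assumption
of the sketch rather than something this thread states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command set, named after the cases visible in the diff. */
enum toy_cmd {
    TOY_MAP_PIRQ,
    TOY_UNMAP_PIRQ,
    TOY_EOI,
    TOY_IRQ_STATUS_QUERY,
    TOY_GET_FREE_PIRQ,
};

/*
 * Returns 0 when the request would be forwarded to do_physdev_op(),
 * or a negative errno when the gate rejects it.
 */
static int toy_hvm_physdev_gate(enum toy_cmd cmd, bool caller_has_pirq)
{
    switch ( cmd )
    {
    case TOY_MAP_PIRQ:
    case TOY_UNMAP_PIRQ:
        /* Forwarded even if the calling domain has no PIRQ support. */
        break;

    case TOY_EOI:
    case TOY_IRQ_STATUS_QUERY:
    case TOY_GET_FREE_PIRQ:
        if ( !caller_has_pirq )
            return -ENOSYS; /* assumed error value */
        break;
    }

    return 0;
}

/* A PVH dom0 (no PIRQ flag) may map and unmap, but not use the other ops. */
static void toy_gate_self_check(void)
{
    assert(toy_hvm_physdev_gate(TOY_MAP_PIRQ, false) == 0);
    assert(toy_hvm_physdev_gate(TOY_UNMAP_PIRQ, false) == 0);
    assert(toy_hvm_physdev_gate(TOY_GET_FREE_PIRQ, false) == -ENOSYS);
}
```

The point of the model is that the gate only looks at the calling domain; with
do_physdev_op() left as-is, no additional check on the subject domain's
has_pirq() is applied afterwards.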

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  7:50   ` Roger Pau Monné
@ 2024-07-31  7:58     ` Jan Beulich
  2024-07-31  8:24       ` Roger Pau Monné
  2024-07-31  8:39     ` Chen, Jiqian
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-31  7:58 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 31.07.2024 09:50, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( !d )
>>              break;
>>  
>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
> 
> I'm afraid this is not true.  It's fine to map interrupts to HVM
> domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
> allows HVM domains to route interrupts from devices (either emulated or
> passed through) over event channels.
> 
> It might have worked in the past (when using a version of Xen < 4.19)
> because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
> 
> physdev_map_pirq() will work fine when used against domains that don't
> have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
> 
> I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
> keep the code in do_physdev_op() as-is.  You will have to check
> whether the current paths in do_physdev_op() are not making
> assumptions about XENFEAT_hvm_pirqs being enabled when the calling
> domain is of HVM type.  I don't think that's the case, but better
> check.

Yet the goal is to disallow mapping into PVH domains. The use of
has_pirq() was aiming at that. If that predicate can't be used (anymore)
for this purpose, which one is appropriate now?

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  7:58     ` Jan Beulich
@ 2024-07-31  8:24       ` Roger Pau Monné
  2024-07-31  8:40         ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31  8:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Wed, Jul 31, 2024 at 09:58:28AM +0200, Jan Beulich wrote:
> On 31.07.2024 09:50, Roger Pau Monné wrote:
> > On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
> >> --- a/xen/arch/x86/physdev.c
> >> +++ b/xen/arch/x86/physdev.c
> >> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>          if ( !d )
> >>              break;
> >>  
> >> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
> >> +        /* Only mapping when the subject domain has a notion of PIRQ */
> >> +        if ( !is_hvm_domain(d) || has_pirq(d) )
> > 
> > I'm afraid this is not true.  It's fine to map interrupts to HVM
> > domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
> > allows HVM domains to route interrupts from devices (either emulated or
> > passed through) over event channels.
> > 
> > It might have worked in the past (when using a version of Xen < 4.19)
> > because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
> > 
> > physdev_map_pirq() will work fine when used against domains that don't
> > have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
> > 
> > I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
> > keep the code in do_physdev_op() as-is.  You will have to check
> > whether the current paths in do_physdev_op() are not making
> > assumptions about XENFEAT_hvm_pirqs being enabled when the calling
> > domain is of HVM type.  I don't think that's the case, but better
> > check.
> 
> Yet the goal is to disallow mapping into PVH domains. The use of
> has_pirq() was aiming at that. If that predicate can't be used (anymore)
> for this purpose, which one is appropriate now?

Why do you want to add such a restriction now, when it's not currently
present?

It was already the case that a PV dom0 could issue
PHYSDEVOP_{,un}map_pirq operations against a PVH domU, whatever the
result of such an operation may be.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  8:24       ` Roger Pau Monné
@ 2024-07-31  8:40         ` Jan Beulich
  2024-07-31  8:51           ` Roger Pau Monné
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-31  8:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 31.07.2024 10:24, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 09:58:28AM +0200, Jan Beulich wrote:
>> On 31.07.2024 09:50, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
>>>> --- a/xen/arch/x86/physdev.c
>>>> +++ b/xen/arch/x86/physdev.c
>>>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>          if ( !d )
>>>>              break;
>>>>  
>>>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>>>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>>>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>>>
>>> I'm afraid this is not true.  It's fine to map interrupts to HVM
>>> domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
>>> allows HVM domains to route interrupts from devices (either emulated or
>>> passed through) over event channels.
>>>
>>> It might have worked in the past (when using a version of Xen < 4.19)
>>> because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
>>>
>>> physdev_map_pirq() will work fine when used against domains that don't
>>> have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
>>>
>>> I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
>>> keep the code in do_physdev_op() as-is.  You will have to check
>>> whether the current paths in do_physdev_op() are not making
>>> assumptions about XENFEAT_hvm_pirqs being enabled when the calling
>>> domain is of HVM type.  I don't think that's the case, but better
>>> check.
>>
>> Yet the goal is to disallow mapping into PVH domains. The use of
>> has_pirq() was aiming at that. If that predicate can't be used (anymore)
>> for this purpose, which one is appropriate now?
> 
> Why do you want to add such a restriction now, when it's not currently
> present?
> 
> It was already the case that a PV dom0 could issue
> PHYSDEVOP_{,un}map_pirq operations against a PVH domU, whatever the
> result of such an operation may be.

Because (a) that was wrong and (b) we'd suddenly permit a PVH DomU to
issue such for itself.

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  8:40         ` Jan Beulich
@ 2024-07-31  8:51           ` Roger Pau Monné
  2024-07-31  9:02             ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31  8:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Wed, Jul 31, 2024 at 10:40:46AM +0200, Jan Beulich wrote:
> On 31.07.2024 10:24, Roger Pau Monné wrote:
> > On Wed, Jul 31, 2024 at 09:58:28AM +0200, Jan Beulich wrote:
> >> On 31.07.2024 09:50, Roger Pau Monné wrote:
> >>> On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
> >>>> --- a/xen/arch/x86/physdev.c
> >>>> +++ b/xen/arch/x86/physdev.c
> >>>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>          if ( !d )
> >>>>              break;
> >>>>  
> >>>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
> >>>> +        /* Only mapping when the subject domain has a notion of PIRQ */
> >>>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
> >>>
> >>> I'm afraid this is not true.  It's fine to map interrupts to HVM
> >>> domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
> >>> allows HVM domains to route interrupts from devices (either emulated or
> >>> passed through) over event channels.
> >>>
> >>> It might have worked in the past (when using a version of Xen < 4.19)
> >>> because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
> >>>
> >>> physdev_map_pirq() will work fine when used against domains that don't
> >>> have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
> >>>
> >>> I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
> >>> keep the code in do_physdev_op() as-is.  You will have to check
> >>> whether the current paths in do_physdev_op() are not making
> >>> assumptions about XENFEAT_hvm_pirqs being enabled when the calling
> >>> domain is of HVM type.  I don't think that's the case, but better
> >>> check.
> >>
> >> Yet the goal is to disallow mapping into PVH domains. The use of
> >> has_pirq() was aiming at that. If that predicate can't be used (anymore)
> >> for this purpose, which one is appropriate now?
> > 
> > Why do you want to add such a restriction now, when it's not currently
> > present?
> > 
> > It was already the case that a PV dom0 could issue
> > PHYSDEVOP_{,un}map_pirq operations against a PVH domU, whatever the
> > result of such an operation may be.
> 
> Because (a) that was wrong and (b) we'd suddenly permit a PVH DomU to
> issue such for itself.

Regarding (b) a PVH domU issuing such operations would fail at the
xsm_map_domain_pirq() check in physdev_map_pirq().

I agree with (a), but I don't think enabling PVH dom0 usage of the
hypercalls should be gated on this.  As said a PV dom0 is already
capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
domU.
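
Point (b) can be illustrated with a toy model of where the permission check
sits. This is a sketch only: struct toy_dom, toy_xsm_map_domain_pirq() and
toy_physdev_map_pirq() are hypothetical stand-ins, and the policy shown (only
a privileged caller passes, everyone else gets -EPERM) is an assumption of the
sketch, not a description of the actual XSM policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a domain; only what the toy policy needs. */
struct toy_dom {
    int  domid;
    bool is_privileged; /* assumption: what the XSM policy keys on */
};

/* Toy stand-in for the xsm_map_domain_pirq() check (assumed policy). */
static int toy_xsm_map_domain_pirq(const struct toy_dom *caller,
                                   const struct toy_dom *subject)
{
    (void)subject; /* the toy policy only looks at the caller */
    return caller->is_privileged ? 0 : -EPERM;
}

/* Toy stand-in for physdev_map_pirq(): the permission check comes first. */
static int toy_physdev_map_pirq(const struct toy_dom *caller,
                                const struct toy_dom *subject)
{
    int ret = toy_xsm_map_domain_pirq(caller, subject);

    if ( ret )
        return ret;

    /* The actual mapping work would happen here. */
    return 0;
}

/* A PVH domU mapping for itself is refused; dom0 mapping for it is not. */
static void toy_xsm_self_check(void)
{
    struct toy_dom dom0 = { .domid = 0, .is_privileged = true };
    struct toy_dom domu = { .domid = 1, .is_privileged = false };

    assert(toy_physdev_map_pirq(&domu, &domu) == -EPERM);
    assert(toy_physdev_map_pirq(&dom0, &domu) == 0);
}
```

Under that assumption, letting the hypercall through the hvm_physdev_op()
gate does not by itself let an unprivileged PVH domU map a pirq for itself.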

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  8:51           ` Roger Pau Monné
@ 2024-07-31  9:02             ` Jan Beulich
  2024-07-31  9:37               ` Roger Pau Monné
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-31  9:02 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 31.07.2024 10:51, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 10:40:46AM +0200, Jan Beulich wrote:
>> On 31.07.2024 10:24, Roger Pau Monné wrote:
>>> On Wed, Jul 31, 2024 at 09:58:28AM +0200, Jan Beulich wrote:
>>>> On 31.07.2024 09:50, Roger Pau Monné wrote:
>>>>> On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
>>>>>> --- a/xen/arch/x86/physdev.c
>>>>>> +++ b/xen/arch/x86/physdev.c
>>>>>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>          if ( !d )
>>>>>>              break;
>>>>>>  
>>>>>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>>>>>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>>>>>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
>>>>>
>>>>> I'm afraid this is not true.  It's fine to map interrupts to HVM
>>>>> domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
>>>>> allows HVM domains to route interrupts from devices (either emulated or
>>>>> passed through) over event channels.
>>>>>
>>>>> It might have worked in the past (when using a version of Xen < 4.19)
>>>>> because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
>>>>>
>>>>> physdev_map_pirq() will work fine when used against domains that don't
>>>>> have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
>>>>>
>>>>> I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
>>>>> keep the code in do_physdev_op() as-is.  You will have to check
>>>>> whether the current paths in do_physdev_op() are not making
>>>>> assumptions about XENFEAT_hvm_pirqs being enabled when the calling
>>>>> domain is of HVM type.  I don't think that's the case, but better
>>>>> check.
>>>>
>>>> Yet the goal is to disallow mapping into PVH domains. The use of
>>>> has_pirq() was aiming at that. If that predicate can't be used (anymore)
>>>> for this purpose, which one is appropriate now?
>>>
>>> Why do you want to add such restriction now, when it's not currently
>>> present?
>>>
>>> It was already the case that a PV dom0 could issue
>>> PHYSDEVOP_{,un}map_pirq operations against a PVH domU, whatever the
>>> result of such an operation might be.
>>
>> Because (a) that was wrong and (b) we'd suddenly permit a PVH DomU to
>> issue such for itself.
> 
> Regarding (b) a PVH domU issuing such operations would fail at the
> xsm_map_domain_pirq() check in physdev_map_pirq().

Hmm, yes, fair point.

> I agree with (a), but I don't think enabling PVH dom0 usage of the
> hypercalls should be gated on this.  As said a PV dom0 is already
> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
> domU.

Okay, I can accept that as an intermediate position. We ought to deny
such requests for PVH domains at some point though, at the latest in the
course of making vPCI work there.

Jan
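
For readers following the exchange above, here is a minimal stand-alone
sketch of how the quoted `!is_hvm_domain(d) || has_pirq(d)` condition
classifies target domains. It is only a model: `struct model_domain`,
`enum guest_kind` and both function names are hypothetical and do not
exist in Xen, and it assumes (as the discussion implies) that PVH domains
count as HVM for is_hvm_domain() and never have has_pirq() set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guest categories, used only for this illustration. */
enum guest_kind { GUEST_PV, GUEST_HVM_PIRQ, GUEST_HVM_NO_PIRQ, GUEST_PVH };

/* Hypothetical, heavily simplified stand-in for Xen's struct domain. */
struct model_domain {
    bool hvm_container;     /* models is_hvm_domain(d): HVM and PVH */
    bool pirq_over_evtchn;  /* models has_pirq(d): XENFEAT_hvm_pirqs */
};

struct model_domain make_domain(enum guest_kind kind)
{
    struct model_domain d = { false, false };

    switch ( kind )
    {
    case GUEST_PV:
        break;

    case GUEST_HVM_PIRQ:
        d.hvm_container = true;
        d.pirq_over_evtchn = true;
        break;

    case GUEST_HVM_NO_PIRQ:
    case GUEST_PVH:
        /* Assumption: both look identical to the two predicates. */
        d.hvm_container = true;
        break;

    default:
        assert(0 && "unknown guest kind");
    }

    return d;
}

/* The condition the v12 hunk places in front of physdev_map_pirq(). */
bool v12_check_allows_map(const struct model_domain *d)
{
    return !d->hvm_container || d->pirq_over_evtchn;
}
```

Under these assumptions the condition lets PV and HVM-with-pIRQ targets
through and rejects both PVH and HVM-without-pIRQ targets, i.e. it cannot
tell the last two apart, which is the objection raised in the review.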



* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  9:02             ` Jan Beulich
@ 2024-07-31  9:37               ` Roger Pau Monné
  2024-07-31  9:55                 ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31  9:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
> On 31.07.2024 10:51, Roger Pau Monné wrote:
> > I agree with (a), but I don't think enabling PVH dom0 usage of the
> > hypercalls should be gated on this.  As said a PV dom0 is already
> > capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
> > domU.
> 
> Okay, I can accept that as an intermediate position. We ought to deny
> such requests at some point though for PVH domains, the latest in the
> course of making vPCI work there.

Hm, once physdev_map_pirq() works as intended against PVH domains, I
don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
against such domains.

Granted using vPCI for plain PCI passthrough is the best option, but I
also don't think we should limit it in the hypervisor.  Some kind of
passthrough (like when using vfio/mdev) will still need something akin
to a device model I would expect.

Thanks, Roger.



* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  9:37               ` Roger Pau Monné
@ 2024-07-31  9:55                 ` Jan Beulich
  2024-07-31 11:29                   ` Roger Pau Monné
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-31  9:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 31.07.2024 11:37, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
>> On 31.07.2024 10:51, Roger Pau Monné wrote:
>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
>>> hypercalls should be gated on this.  As said a PV dom0 is already
>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
>>> domU.
>>
>> Okay, I can accept that as an intermediate position. We ought to deny
>> such requests at some point though for PVH domains, the latest in the
>> course of making vPCI work there.
> 
> Hm, once physdev_map_pirq() works as intended against PVH domains, I
> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
> against such domains.

Well. If it can be made to work as intended, then I certainly agree. However,
without even the concept of pIRQ in PVH I'm having a hard time seeing how
it can be made to work. Iirc you were advocating for us to not introduce pIRQ
into PVH.

Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
pass in GSIs? I think I also suggested something along these lines to
Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
not sure this could be made to work reliably.

Which reminds me of another question I had: What meaning does the pirq
field have right now, if Dom0 would issue the request against a PVH DomU?
What meaning will it have for a !has_pirq() HVM domain?

Jan


* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  9:55                 ` Jan Beulich
@ 2024-07-31 11:29                   ` Roger Pau Monné
  2024-07-31 11:39                     ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31 11:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
> On 31.07.2024 11:37, Roger Pau Monné wrote:
> > On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
> >> On 31.07.2024 10:51, Roger Pau Monné wrote:
> >>> I agree with (a), but I don't think enabling PVH dom0 usage of the
> >>> hypercalls should be gated on this.  As said a PV dom0 is already
> >>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
> >>> domU.
> >>
> >> Okay, I can accept that as an intermediate position. We ought to deny
> >> such requests at some point though for PVH domains, the latest in the
> >> course of making vPCI work there.
> > 
> > Hm, once physdev_map_pirq() works as intended against PVH domains, I
> > don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
> > against such domains.
> 
> Well. If it can be made work as intended, then I certainly agree. However,
> without even the concept of pIRQ in PVH I'm having a hard time seeing how
> it can be made work. Iirc you were advocating for us to not introduce pIRQ
> into PVH.

From what I'm seeing here, the intention is to expose
PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
pIRQs, or something akin, in a PVH dom0?  Even if only for passthrough
needs.

> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
> pass in GSIs?

I think that was one of my proposals, to either introduce a new
hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
in an ABI compatible way so that semantically the field could be a GSI
rather than a pIRQ.  We however would also need a way to reference an
MSI entry.

My main concern is not with pIRQs by themselves; pIRQs are just an
abstract way to reference interrupts.  My concern, and what I wanted
to avoid on PVH, is being able to route pIRQs over event channels.
IOW: having interrupts from physical devices delivered over event
channels.

> I think I suggested something along these lines also to
> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
> not sure this could be made work reliably.

I'm afraid I've been lagging behind on reviewing those series.

> Which reminds me of another question I had: What meaning does the pirq
> field have right now, if Dom0 would issue the request against a PVH DomU?
> What meaning will it have for a !has_pirq() HVM domain?

The pirq field could be a way to reference an interrupt.  It doesn't
need to be exposed to the PVH domU at all, but it's a way for the
device model to identify which interrupt should be mapped to which
domain.

Thanks, Roger.


* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31 11:29                   ` Roger Pau Monné
@ 2024-07-31 11:39                     ` Jan Beulich
  2024-07-31 13:03                       ` Roger Pau Monné
  0 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-31 11:39 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 31.07.2024 13:29, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
>> On 31.07.2024 11:37, Roger Pau Monné wrote:
>>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
>>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
>>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
>>>>> hypercalls should be gated on this.  As said a PV dom0 is already
>>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
>>>>> domU.
>>>>
>>>> Okay, I can accept that as an intermediate position. We ought to deny
>>>> such requests at some point though for PVH domains, the latest in the
>>>> course of making vPCI work there.
>>>
>>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
>>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
>>> against such domains.
>>
>> Well. If it can be made work as intended, then I certainly agree. However,
>> without even the concept of pIRQ in PVH I'm having a hard time seeing how
>> it can be made work. Iirc you were advocating for us to not introduce pIRQ
>> into PVH.
> 
> From what I'm seeing here the intention is to expose
> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.

Only in so far as it is an abstract, handle-like value pertaining solely
to the target domain.

>> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
>> pass in GSIs?
> 
> I think that was one of my proposals, to either introduce a new
> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
> in an ABI compatible way so that semantically the field could be a GSI
> rather than a pIRQ.  We however would also need a way to reference an
> MSI entry.

Of course.

> My main concern is not with pIRQs by itself, pIRQs are just an
> abstract way to reference interrupts, my concern and what I wanted to
> avoid on PVH is being able to route pIRQs over event channels.  IOW:
> have interrupts from physical devices delivered over event channels.

Oh, I might have slightly misunderstood your intentions then.

>> I think I suggested something along these lines also to
>> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
>> not sure this could be made work reliably.
> 
> I'm afraid I've been lagging behind on reviewing those series.
> 
>> Which reminds me of another question I had: What meaning does the pirq
>> field have right now, if Dom0 would issue the request against a PVH DomU?
>> What meaning will it have for a !has_pirq() HVM domain?
> 
> The pirq field could be a way to reference an interrupt.  It doesn't
> need to be exposed to the PVH domU at all, but it's a way for the
> device model to identify which interrupt should be mapped to which
> domain.

Since pIRQ-s are per-domain, _that_ kind of association won't be
helped. But yes, as per above it could serve as an abstract handle-
like value.

Jan



* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31 11:39                     ` Jan Beulich
@ 2024-07-31 13:03                       ` Roger Pau Monné
  2024-08-02  2:37                         ` Chen, Jiqian
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-07-31 13:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
> On 31.07.2024 13:29, Roger Pau Monné wrote:
> > On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
> >> On 31.07.2024 11:37, Roger Pau Monné wrote:
> >>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
> >>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
> >>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
> >>>>> hypercalls should be gated on this.  As said a PV dom0 is already
> >>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
> >>>>> domU.
> >>>>
> >>>> Okay, I can accept that as an intermediate position. We ought to deny
> >>>> such requests at some point though for PVH domains, the latest in the
> >>>> course of making vPCI work there.
> >>>
> >>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
> >>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
> >>> against such domains.
> >>
> >> Well. If it can be made work as intended, then I certainly agree. However,
> >> without even the concept of pIRQ in PVH I'm having a hard time seeing how
> >> it can be made work. Iirc you were advocating for us to not introduce pIRQ
> >> into PVH.
> > 
> > From what I'm seeing here the intention is to expose
> > PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
> > pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
> 
> Only in so far as it is an abstract, handle-like value pertaining solely
> to the target domain.
> 
> >> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
> >> pass in GSIs?
> > 
> > I think that was one of my proposals, to either introduce a new
> > hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
> > in an ABI compatible way so that semantically the field could be a GSI
> > rather than a pIRQ.  We however would also need a way to reference an
> > MSI entry.
> 
> Of course.
> 
> > My main concern is not with pIRQs by itself, pIRQs are just an
> > abstract way to reference interrupts, my concern and what I wanted to
> > avoid on PVH is being able to route pIRQs over event channels.  IOW:
> > have interrupts from physical devices delivered over event channels.
> 
> Oh, I might have slightly misunderstood your intentions then.

My intention would be to not even use pIRQs at all, in order to avoid
the temptation of the guest itself managing interrupts using
hypercalls, hence I would have preferred that abstract interface to be
something else.

Maybe we could even expose the Xen IRQ space directly, and just use
that as interrupt handles, but since I'm not the one doing the work
I'm not sure it's fair to ask for something that would require more
changes internally to Xen.

> >> I think I suggested something along these lines also to
> >> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
> >> not sure this could be made work reliably.
> > 
> > I'm afraid I've been lagging behind on reviewing those series.
> > 
> >> Which reminds me of another question I had: What meaning does the pirq
> >> field have right now, if Dom0 would issue the request against a PVH DomU?
> >> What meaning will it have for a !has_pirq() HVM domain?
> > 
> > The pirq field could be a way to reference an interrupt.  It doesn't
> > need to be exposed to the PVH domU at all, but it's a way for the
> > device model to identify which interrupt should be mapped to which
> > domain.
> 
> Since pIRQ-s are per-domain, _that_ kind of association won't be
> helped. But yes, as per above it could serve as an abstract handle-
> like value.

I would be fine with doing the interrupt bindings based on IRQs
instead of pIRQs, but I'm afraid that would require more changes to
hypercalls and Xen internals.

At some point I need to work on a new interface to do passthrough, so
that we can remove the usage of domctls from QEMU.  That might be a
good opportunity to switch from using pIRQs.

Thanks, Roger.



* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31 13:03                       ` Roger Pau Monné
@ 2024-08-02  2:37                         ` Chen, Jiqian
  2024-08-02  8:11                           ` Roger Pau Monné
  0 siblings, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  2:37 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Chen, Jiqian

On 2024/7/31 21:03, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
>> On 31.07.2024 13:29, Roger Pau Monné wrote:
>>> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
>>>> On 31.07.2024 11:37, Roger Pau Monné wrote:
>>>>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
>>>>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
>>>>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
>>>>>>> hypercalls should be gated on this.  As said a PV dom0 is already
>>>>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
>>>>>>> domU.
>>>>>>
>>>>>> Okay, I can accept that as an intermediate position. We ought to deny
>>>>>> such requests at some point though for PVH domains, the latest in the
>>>>>> course of making vPCI work there.
>>>>>
>>>>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
>>>>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
>>>>> against such domains.
>>>>
>>>> Well. If it can be made work as intended, then I certainly agree. However,
>>>> without even the concept of pIRQ in PVH I'm having a hard time seeing how
>>>> it can be made work. Iirc you were advocating for us to not introduce pIRQ
>>>> into PVH.
>>>
>>> From what I'm seeing here the intention is to expose
>>> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
>>> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
>>
>> Only in so far as it is an abstract, handle-like value pertaining solely
>> to the target domain.
>>
>>>> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
>>>> pass in GSIs?
>>>
>>> I think that was one of my proposals, to either introduce a new
>>> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
>>> in an ABI compatible way so that semantically the field could be a GSI
>>> rather than a pIRQ.  We however would also need a way to reference an
>>> MSI entry.
>>
>> Of course.
>>
>>> My main concern is not with pIRQs by itself, pIRQs are just an
>>> abstract way to reference interrupts, my concern and what I wanted to
>>> avoid on PVH is being able to route pIRQs over event channels.  IOW:
>>> have interrupts from physical devices delivered over event channels.
>>
>> Oh, I might have slightly misunderstood your intentions then.
> 
> My intention would be to not even use pIRQs at all, in order to avoid
> the temptation of the guest itself managing interrupts using
> hypercalls, hence I would have preferred that abstract interface to be
> something else.
> 
> Maybe we could even expose the Xen IRQ space directly, and just use
> that as interrupt handles, but since I'm not the one doing the work
> I'm not sure it's fair to ask for something that would require more
> changes internally to Xen.
> 
>>>> I think I suggested something along these lines also to
>>>> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
>>>> not sure this could be made work reliably.
>>>
>>> I'm afraid I've been lagging behind on reviewing those series.
>>>
>>>> Which reminds me of another question I had: What meaning does the pirq
>>>> field have right now, if Dom0 would issue the request against a PVH DomU?
>>>> What meaning will it have for a !has_pirq() HVM domain?
>>>
>>> The pirq field could be a way to reference an interrupt.  It doesn't
>>> need to be exposed to the PVH domU at all, but it's a way for the
>>> device model to identify which interrupt should be mapped to which
>>> domain.
>>
>> Since pIRQ-s are per-domain, _that_ kind of association won't be
>> helped. But yes, as per above it could serve as an abstract handle-
>> like value.
> 
> I would be fine with doing the interrupt bindings based on IRQs
> instead of pIRQs, but I'm afraid that would require more changes to
> hypercalls and Xen internals.
> 
> At some point I need to work on a new interface to do passthrough, so
> that we can remove the usage of domctls from QEMU.  That might be a
> good opportunity to switch from using pIRQs.

Thanks for your input, but my knowledge may be a bit behind yours and I can't fully follow the discussion.
How should I change this patch going forward?
Should I add a new hypercall specifically for passthrough?
Or, if the goal is to prevent (un)map from being used against PVH guests, can I just add a new function that checks whether the subject domain is of PVH type, like is_pvh_domain()?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.


* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-08-02  2:37                         ` Chen, Jiqian
@ 2024-08-02  8:11                           ` Roger Pau Monné
  2024-08-02  8:17                             ` Chen, Jiqian
  2024-08-02  9:37                             ` Jan Beulich
  0 siblings, 2 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-02  8:11 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Jan Beulich, xen-devel@lists.xenproject.org, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray

On Fri, Aug 02, 2024 at 02:37:24AM +0000, Chen, Jiqian wrote:
> On 2024/7/31 21:03, Roger Pau Monné wrote:
> > On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
> >> On 31.07.2024 13:29, Roger Pau Monné wrote:
> >>> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
> >>>> On 31.07.2024 11:37, Roger Pau Monné wrote:
> >>>>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
> >>>>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
> >>>>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
> >>>>>>> hypercalls should be gated on this.  As said a PV dom0 is already
> >>>>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
> >>>>>>> domU.
> >>>>>>
> >>>>>> Okay, I can accept that as an intermediate position. We ought to deny
> >>>>>> such requests at some point though for PVH domains, the latest in the
> >>>>>> course of making vPCI work there.
> >>>>>
> >>>>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
> >>>>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
> >>>>> against such domains.
> >>>>
> >>>> Well. If it can be made work as intended, then I certainly agree. However,
> >>>> without even the concept of pIRQ in PVH I'm having a hard time seeing how
> >>>> it can be made work. Iirc you were advocating for us to not introduce pIRQ
> >>>> into PVH.
> >>>
> >>> From what I'm seeing here the intention is to expose
> >>> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
> >>> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
> >>
> >> Only in so far as it is an abstract, handle-like value pertaining solely
> >> to the target domain.
> >>
> >>>> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
> >>>> pass in GSIs?
> >>>
> >>> I think that was one of my proposals, to either introduce a new
> >>> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
> >>> in an ABI compatible way so that semantically the field could be a GSI
> >>> rather than a pIRQ.  We however would also need a way to reference an
> >>> MSI entry.
> >>
> >> Of course.
> >>
> >>> My main concern is not with pIRQs by itself, pIRQs are just an
> >>> abstract way to reference interrupts, my concern and what I wanted to
> >>> avoid on PVH is being able to route pIRQs over event channels.  IOW:
> >>> have interrupts from physical devices delivered over event channels.
> >>
> >> Oh, I might have slightly misunderstood your intentions then.
> > 
> > My intention would be to not even use pIRQs at all, in order to avoid
> > the temptation of the guest itself managing interrupts using
> > hypercalls, hence I would have preferred that abstract interface to be
> > something else.
> > 
> > Maybe we could even expose the Xen IRQ space directly, and just use
> > that as interrupt handles, but since I'm not the one doing the work
> > I'm not sure it's fair to ask for something that would require more
> > changes internally to Xen.
> > 
> >>>> I think I suggested something along these lines also to
> >>>> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
> >>>> not sure this could be made work reliably.
> >>>
> >>> I'm afraid I've been lagging behind on reviewing those series.
> >>>
> >>>> Which reminds me of another question I had: What meaning does the pirq
> >>>> field have right now, if Dom0 would issue the request against a PVH DomU?
> >>>> What meaning will it have for a !has_pirq() HVM domain?
> >>>
> >>> The pirq field could be a way to reference an interrupt.  It doesn't
> >>> need to be exposed to the PVH domU at all, but it's a way for the
> >>> device model to identify which interrupt should be mapped to which
> >>> domain.
> >>
> >> Since pIRQ-s are per-domain, _that_ kind of association won't be
> >> helped. But yes, as per above it could serve as an abstract handle-
> >> like value.
> > 
> > I would be fine with doing the interrupt bindings based on IRQs
> > instead of pIRQs, but I'm afraid that would require more changes to
> > hypercalls and Xen internals.
> > 
> > At some point I need to work on a new interface to do passthrough, so
> > that we can remove the usage of domctls from QEMU.  That might be a
> > good opportunity to switch from using pIRQs.
> 
> Thanks for your input, but I may be a bit behind you with my knowledge and can't fully understand the discussion.
> How should I modify this question later?
> Should I add a new hypercall specifically for passthrough?
> Or if it is to prevent the (un)map from being used for PVH guests, can I just add a new function to check if the subject domain is a PVH type? Like is_pvh_domain().

I think that would be part of a new interface; as said before, I don't
think it would be fair to force you to do all this work.  I won't
oppose the approach of attempting to re-use the existing interfaces
as much as possible.

I think this patch needs to be adjusted to drop the change to
xen/arch/x86/physdev.c, as just allowing PHYSDEVOP_{,un}map_pirq
without any change to do_physdev_op() should result in the correct
behavior.

Thanks, Roger.
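
As a rough illustration of the suggestion above (leave do_physdev_op()
alone and only let PHYSDEVOP_{,un}map_pirq through for HVM/PVH callers),
here is a small stand-alone model. Everything in it is hypothetical:
`enum model_op`, `struct model_caller` and the three functions are not
Xen names, the real filter covers more sub-ops, and the `privileged`
flag merely stands in for the xsm_map_domain_pirq() check mentioned
earlier in the thread. The error values are chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in Xen. */
enum model_op { MODEL_OP_MAP_PIRQ, MODEL_OP_UNMAP_PIRQ, MODEL_OP_EOI };

struct model_caller {
    bool has_pirq;    /* caller routes pIRQs over event channels */
    bool privileged;  /* caller may manage the target's interrupts */
};

/* Filter with (un)map_pirq still tied to the caller's has_pirq(). */
static int filter_gated(const struct model_caller *c, enum model_op op)
{
    switch ( op )
    {
    case MODEL_OP_MAP_PIRQ:
    case MODEL_OP_UNMAP_PIRQ:
    case MODEL_OP_EOI:
        if ( !c->has_pirq )
            return -ENOSYS;
        break;
    }

    return 0;
}

/* Filter with (un)map_pirq let through regardless of has_pirq(). */
static int filter_relaxed(const struct model_caller *c, enum model_op op)
{
    switch ( op )
    {
    case MODEL_OP_MAP_PIRQ:
    case MODEL_OP_UNMAP_PIRQ:
        break;

    case MODEL_OP_EOI:
        if ( !c->has_pirq )
            return -ENOSYS;
        break;
    }

    return 0;
}

/* Filter first, then a permission check standing in for the XSM hook. */
int model_map_pirq(const struct model_caller *c, bool relaxed)
{
    int rc = relaxed ? filter_relaxed(c, MODEL_OP_MAP_PIRQ)
                     : filter_gated(c, MODEL_OP_MAP_PIRQ);

    if ( rc )
        return rc;

    return c->privileged ? 0 : -EPERM;
}
```

In this model a privileged caller without has_pirq (a PVH dom0) gets
through only with the relaxed filter, while an unprivileged caller
(a PVH domU) is still turned away by the later permission check.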



* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-08-02  8:11                           ` Roger Pau Monné
@ 2024-08-02  8:17                             ` Chen, Jiqian
  2024-08-02  8:35                               ` Roger Pau Monné
  2024-08-02  9:37                             ` Jan Beulich
  1 sibling, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  8:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel@lists.xenproject.org, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/8/2 16:11, Roger Pau Monné wrote:
> On Fri, Aug 02, 2024 at 02:37:24AM +0000, Chen, Jiqian wrote:
>> On 2024/7/31 21:03, Roger Pau Monné wrote:
>>> On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
>>>> On 31.07.2024 13:29, Roger Pau Monné wrote:
>>>>> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
>>>>>> On 31.07.2024 11:37, Roger Pau Monné wrote:
>>>>>>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
>>>>>>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
>>>>>>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
>>>>>>>>> hypercalls should be gated on this.  As said a PV dom0 is already
>>>>>>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
>>>>>>>>> domU.
>>>>>>>>
>>>>>>>> Okay, I can accept that as an intermediate position. We ought to deny
>>>>>>>> such requests at some point though for PVH domains, the latest in the
>>>>>>>> course of making vPCI work there.
>>>>>>>
>>>>>>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
>>>>>>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
>>>>>>> against such domains.
>>>>>>
>>>>>> Well. If it can be made work as intended, then I certainly agree. However,
>>>>>> without even the concept of pIRQ in PVH I'm having a hard time seeing how
>>>>>> it can be made work. Iirc you were advocating for us to not introduce pIRQ
>>>>>> into PVH.
>>>>>
>>>>> From what I'm seeing here the intention is to expose
>>>>> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
>>>>> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
>>>>
>>>> Only in so far as it is an abstract, handle-like value pertaining solely
>>>> to the target domain.
>>>>
>>>>>> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
>>>>>> pass in GSIs?
>>>>>
>>>>> I think that was one of my proposals, to either introduce a new
>>>>> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
>>>>> in an ABI compatible way so that semantically the field could be a GSI
>>>>> rather than a pIRQ.  We however would also need a way to reference an
>>>>> MSI entry.
>>>>
>>>> Of course.
>>>>
>>>>> My main concern is not with pIRQs by itself, pIRQs are just an
>>>>> abstract way to reference interrupts, my concern and what I wanted to
>>>>> avoid on PVH is being able to route pIRQs over event channels.  IOW:
>>>>> have interrupts from physical devices delivered over event channels.
>>>>
>>>> Oh, I might have slightly misunderstood your intentions then.
>>>
>>> My intention would be to not even use pIRQs at all, in order to avoid
>>> the temptation of the guest itself managing interrupts using
>>> hypercalls, hence I would have preferred that abstract interface to be
>>> something else.
>>>
>>> Maybe we could even expose the Xen IRQ space directly, and just use
>>> that as interrupt handles, but since I'm not the one doing the work
>>> I'm not sure it's fair to ask for something that would require more
>>> changes internally to Xen.
>>>
>>>>>> I think I suggested something along these lines also to
>>>>>> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
>>>>>> not sure this could be made work reliably.
>>>>>
>>>>> I'm afraid I've been lagging behind on reviewing those series.
>>>>>
>>>>>> Which reminds me of another question I had: What meaning does the pirq
>>>>>> field have right now, if Dom0 would issue the request against a PVH DomU?
>>>>>> What meaning will it have for a !has_pirq() HVM domain?
>>>>>
>>>>> The pirq field could be a way to reference an interrupt.  It doesn't
>>>>> need to be exposed to the PVH domU at all, but it's a way for the
>>>>> device model to identify which interrupt should be mapped to which
>>>>> domain.
>>>>
>>>> Since pIRQ-s are per-domain, _that_ kind of association won't be
>>>> helped. But yes, as per above it could serve as an abstract handle-
>>>> like value.
>>>
>>> I would be fine with doing the interrupt bindings based on IRQs
>>> instead of pIRQs, but I'm afraid that would require more changes to
>>> hypercalls and Xen internals.
>>>
>>> At some point I need to work on a new interface to do passthrough, so
>>> that we can remove the usage of domctls from QEMU.  That might be a
>>> good opportunity to switch from using pIRQs.
>>
>> Thanks for your input, but I may be a bit behind you with my knowledge and can't fully understand the discussion.
>> How should I modify this question later?
>> Should I add a new hypercall specifically for passthrough?
>> Or if it is to prevent the (un)map from being used for PVH guests, can I just add a new function to check if the subject domain is a PVH type? Like is_pvh_domain().
> 
> I think that would be part of a new interface, as said before I don't
> think it would be fair to force you to do all this work.  I won't
> oppose with the approach to attempt to re-use the existing interfaces
> as much as possible.
Thanks.

> 
> I think this patch needs to be adjusted to drop the change to
> xen/arch/x86/physdev.c, as just allowing PHYSDEVOP_{,un}map_pirq
> without any change to do_physdev_op() should result in the correct
> behavior.
Do you mean that I don't need to add any further restrictions in do_physdev_op(), and can simply allow PHYSDEVOP_{,un}map_pirq in hvm_physdev_op()?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-08-02  8:17                             ` Chen, Jiqian
@ 2024-08-02  8:35                               ` Roger Pau Monné
  2024-08-02  8:40                                 ` Chen, Jiqian
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-02  8:35 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Jan Beulich, xen-devel@lists.xenproject.org, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray

On Fri, Aug 02, 2024 at 08:17:15AM +0000, Chen, Jiqian wrote:
> On 2024/8/2 16:11, Roger Pau Monné wrote:
> > On Fri, Aug 02, 2024 at 02:37:24AM +0000, Chen, Jiqian wrote:
> >> On 2024/7/31 21:03, Roger Pau Monné wrote:
> >>> On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
> >>>> On 31.07.2024 13:29, Roger Pau Monné wrote:
> >>>>> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
> >>>>>> On 31.07.2024 11:37, Roger Pau Monné wrote:
> >>>>>>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
> >>>>>>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
> >>>>>>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
> >>>>>>>>> hypercalls should be gated on this.  As said a PV dom0 is already
> >>>>>>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
> >>>>>>>>> domU.
> >>>>>>>>
> >>>>>>>> Okay, I can accept that as an intermediate position. We ought to deny
> >>>>>>>> such requests at some point though for PVH domains, the latest in the
> >>>>>>>> course of making vPCI work there.
> >>>>>>>
> >>>>>>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
> >>>>>>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
> >>>>>>> against such domains.
> >>>>>>
> >>>>>> Well. If it can be made work as intended, then I certainly agree. However,
> >>>>>> without even the concept of pIRQ in PVH I'm having a hard time seeing how
> >>>>>> it can be made work. Iirc you were advocating for us to not introduce pIRQ
> >>>>>> into PVH.
> >>>>>
> >>>>> From what I'm seeing here the intention is to expose
> >>>>> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
> >>>>> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
> >>>>
> >>>> Only in so far as it is an abstract, handle-like value pertaining solely
> >>>> to the target domain.
> >>>>
> >>>>>> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
> >>>>>> pass in GSIs?
> >>>>>
> >>>>> I think that was one my proposals, to either introduce a new
> >>>>> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
> >>>>> in an ABI compatible way so that semantically the field could be a GSI
> >>>>> rather than a pIRQ.  We however would also need a way to reference an
> >>>>> MSI entry.
> >>>>
> >>>> Of course.
> >>>>
> >>>>> My main concern is not with pIRQs by itself, pIRQs are just an
> >>>>> abstract way to reference interrupts, my concern and what I wanted to
> >>>>> avoid on PVH is being able to route pIRQs over event channels.  IOW:
> >>>>> have interrupts from physical devices delivered over event channels.
> >>>>
> >>>> Oh, I might have slightly misunderstood your intentions then.
> >>>
> >>> My intention would be to not even use pIRQs at all, in order to avoid
> >>> the temptation of the guest itself managing interrupts using
> >>> hypercalls, hence I would have preferred that abstract interface to be
> >>> something else.
> >>>
> >>> Maybe we could even expose the Xen IRQ space directly, and just use
> >>> that as interrupt handles, but since I'm not the one doing the work
> >>> I'm not sure it's fair to ask for something that would require more
> >>> changes internally to Xen.
> >>>
> >>>>>> I think I suggested something along these lines also to
> >>>>>> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
> >>>>>> not sure this could be made work reliably.
> >>>>>
> >>>>> I'm afraid I've been lacking behind on reviewing those series.
> >>>>>
> >>>>>> Which reminds me of another question I had: What meaning does the pirq
> >>>>>> field have right now, if Dom0 would issue the request against a PVH DomU?
> >>>>>> What meaning will it have for a !has_pirq() HVM domain?
> >>>>>
> >>>>> The pirq field could be a way to reference an interrupt.  It doesn't
> >>>>> need to be exposed to the PVH domU at all, but it's a way for the
> >>>>> device model to identify which interrupt should be mapped to which
> >>>>> domain.
> >>>>
> >>>> Since pIRQ-s are per-domain, _that_ kind of association won't be
> >>>> helped. But yes, as per above it could serve as an abstract handle-
> >>>> like value.
> >>>
> >>> I would be fine with doing the interrupt bindings based on IRQs
> >>> instead of pIRQs, but I'm afraid that would require more changes to
> >>> hypercalls and Xen internals.
> >>>
> >>> At some point I need to work on a new interface to do passthrough, so
> >>> that we can remove the usage of domctls from QEMU.  That might be a
> >>> good opportunity to switch from using pIRQs.
> >>
> >> Thanks for your input, but I may be a bit behind you with my knowledge and can't fully understand the discussion.
> >> How should I modify this question later?
> >> Should I add a new hypercall specifically for passthrough?
> >> Or if it is to prevent the (un)map from being used for PVH guests, can I just add a new function to check if the subject domain is a PVH type? Like is_pvh_domain().
> > 
> > I think that would be part of a new interface, as said before I don't
> > think it would be fair to force you to do all this work.  I won't
> > oppose with the approach to attempt to re-use the existing interfaces
> > as much as possible.
> Thanks.
> 
> > 
> > I think this patch needs to be adjusted to drop the change to
> > xen/arch/x86/physdev.c, as just allowing PHYSDEVOP_{,un}map_pirq
> > without any change to do_physdev_op() should result in the correct
> > behavior.
> Do you mean that I don't need to add any further restrictions in do_physdev_op(), just simply allow PHYSDEVOP_{,un}map_pirq in hvm_physdev_op() ?

That's my understanding, yes, no further restrictions should be added.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-08-02  8:35                               ` Roger Pau Monné
@ 2024-08-02  8:40                                 ` Chen, Jiqian
  2024-08-02  9:17                                   ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  8:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org,
	Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

Hi Jan,

On 2024/8/2 16:35, Roger Pau Monné wrote:
> On Fri, Aug 02, 2024 at 08:17:15AM +0000, Chen, Jiqian wrote:
>> On 2024/8/2 16:11, Roger Pau Monné wrote:
>>> I think this patch needs to be adjusted to drop the change to
>>> xen/arch/x86/physdev.c, as just allowing PHYSDEVOP_{,un}map_pirq
>>> without any change to do_physdev_op() should result in the correct
>>> behavior.
>> Do you mean that I don't need to add any further restrictions in do_physdev_op(), just simply allow PHYSDEVOP_{,un}map_pirq in hvm_physdev_op() ?
> 
> That's my understanding, yes, no further restrictions should be added.

Are you okay with this?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-08-02  8:40                                 ` Chen, Jiqian
@ 2024-08-02  9:17                                   ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-08-02  9:17 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Roger Pau Monné, xen-devel@lists.xenproject.org,
	Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray

On 02.08.2024 10:40, Chen, Jiqian wrote:
> On 2024/8/2 16:35, Roger Pau Monné wrote:
>> On Fri, Aug 02, 2024 at 08:17:15AM +0000, Chen, Jiqian wrote:
>>> On 2024/8/2 16:11, Roger Pau Monné wrote:
>>>> I think this patch needs to be adjusted to drop the change to
>>>> xen/arch/x86/physdev.c, as just allowing PHYSDEVOP_{,un}map_pirq
>>>> without any change to do_physdev_op() should result in the correct
>>>> behavior.
>>> Do you mean that I don't need to add any further restrictions in do_physdev_op(), just simply allow PHYSDEVOP_{,un}map_pirq in hvm_physdev_op() ?
>>
>> That's my understanding, yes, no further restrictions should be added.
> 
> Are you okey with this?

I think I already indicated so - yes, for the time being.

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-08-02  8:11                           ` Roger Pau Monné
  2024-08-02  8:17                             ` Chen, Jiqian
@ 2024-08-02  9:37                             ` Jan Beulich
  1 sibling, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-08-02  9:37 UTC (permalink / raw)
  To: Roger Pau Monné, Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray

On 02.08.2024 10:11, Roger Pau Monné wrote:
> On Fri, Aug 02, 2024 at 02:37:24AM +0000, Chen, Jiqian wrote:
>> On 2024/7/31 21:03, Roger Pau Monné wrote:
>>> On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
>>>> On 31.07.2024 13:29, Roger Pau Monné wrote:
>>>>> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
>>>>>> On 31.07.2024 11:37, Roger Pau Monné wrote:
>>>>>>> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
>>>>>>>> On 31.07.2024 10:51, Roger Pau Monné wrote:
>>>>>>>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
>>>>>>>>> hypercalls should be gated on this.  As said a PV dom0 is already
>>>>>>>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
>>>>>>>>> domU.
>>>>>>>>
>>>>>>>> Okay, I can accept that as an intermediate position. We ought to deny
>>>>>>>> such requests at some point though for PVH domains, the latest in the
>>>>>>>> course of making vPCI work there.
>>>>>>>
>>>>>>> Hm, once physdev_map_pirq() works as intended against PVH domains, I
>>>>>>> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
>>>>>>> against such domains.
>>>>>>
>>>>>> Well. If it can be made work as intended, then I certainly agree. However,
>>>>>> without even the concept of pIRQ in PVH I'm having a hard time seeing how
>>>>>> it can be made work. Iirc you were advocating for us to not introduce pIRQ
>>>>>> into PVH.
>>>>>
>>>>> From what I'm seeing here the intention is to expose
>>>>> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
>>>>> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
>>>>
>>>> Only in so far as it is an abstract, handle-like value pertaining solely
>>>> to the target domain.
>>>>
>>>>>> Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
>>>>>> pass in GSIs?
>>>>>
>>>>> I think that was one my proposals, to either introduce a new
>>>>> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
>>>>> in an ABI compatible way so that semantically the field could be a GSI
>>>>> rather than a pIRQ.  We however would also need a way to reference an
>>>>> MSI entry.
>>>>
>>>> Of course.
>>>>
>>>>> My main concern is not with pIRQs by itself, pIRQs are just an
>>>>> abstract way to reference interrupts, my concern and what I wanted to
>>>>> avoid on PVH is being able to route pIRQs over event channels.  IOW:
>>>>> have interrupts from physical devices delivered over event channels.
>>>>
>>>> Oh, I might have slightly misunderstood your intentions then.
>>>
>>> My intention would be to not even use pIRQs at all, in order to avoid
>>> the temptation of the guest itself managing interrupts using
>>> hypercalls, hence I would have preferred that abstract interface to be
>>> something else.
>>>
>>> Maybe we could even expose the Xen IRQ space directly, and just use
>>> that as interrupt handles, but since I'm not the one doing the work
>>> I'm not sure it's fair to ask for something that would require more
>>> changes internally to Xen.
>>>
>>>>>> I think I suggested something along these lines also to
>>>>>> Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
>>>>>> not sure this could be made work reliably.
>>>>>
>>>>> I'm afraid I've been lacking behind on reviewing those series.
>>>>>
>>>>>> Which reminds me of another question I had: What meaning does the pirq
>>>>>> field have right now, if Dom0 would issue the request against a PVH DomU?
>>>>>> What meaning will it have for a !has_pirq() HVM domain?
>>>>>
>>>>> The pirq field could be a way to reference an interrupt.  It doesn't
>>>>> need to be exposed to the PVH domU at all, but it's a way for the
>>>>> device model to identify which interrupt should be mapped to which
>>>>> domain.
>>>>
>>>> Since pIRQ-s are per-domain, _that_ kind of association won't be
>>>> helped. But yes, as per above it could serve as an abstract handle-
>>>> like value.
>>>
>>> I would be fine with doing the interrupt bindings based on IRQs
>>> instead of pIRQs, but I'm afraid that would require more changes to
>>> hypercalls and Xen internals.
>>>
>>> At some point I need to work on a new interface to do passthrough, so
>>> that we can remove the usage of domctls from QEMU.  That might be a
>>> good opportunity to switch from using pIRQs.
>>
>> Thanks for your input, but I may be a bit behind you with my knowledge and can't fully understand the discussion.
>> How should I modify this question later?
>> Should I add a new hypercall specifically for passthrough?
>> Or if it is to prevent the (un)map from being used for PVH guests, can I just add a new function to check if the subject domain is a PVH type? Like is_pvh_domain().
> 
> I think that would be part of a new interface, as said before I don't
> think it would be fair to force you to do all this work.  I won't
> oppose with the approach to attempt to re-use the existing interfaces
> as much as possible.
> 
> I think this patch needs to be adjusted to drop the change to
> xen/arch/x86/physdev.c, as just allowing PHYSDEVOP_{,un}map_pirq
> without any change to do_physdev_op() should result in the correct
> behavior.

Plus perhaps adding a respective clarification to the description,
noting that the functionality is being exposed to a wider than
(presently) necessary "audience".

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-07-31  7:50   ` Roger Pau Monné
  2024-07-31  7:58     ` Jan Beulich
@ 2024-07-31  8:39     ` Chen, Jiqian
  1 sibling, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-31  8:39 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/7/31 15:50, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>> And add a new check to prevent (un)map when the subject domain
>> doesn't have a notion of PIRQ.
>>
>> So that the interrupt of a passthrough device can be
>> successfully mapped to pirq for domU with a notion of PIRQ
>> when dom0 is PVH
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  6 ++++++
>>  xen/arch/x86/physdev.c       | 12 ++++++++++--
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 0fab670a4871..03ada3c880bd 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>      switch ( cmd )
>>      {
>> +        /*
>> +        * Only being permitted for management of other domains.
>> +        * Further restrictions are enforced in do_physdev_op.
>> +        */
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        break;
>> +
>>      case PHYSDEVOP_eoi:
>>      case PHYSDEVOP_irq_status_query:
>>      case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index d6dd622952a9..9f30a8c63a06 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( !d )
>>              break;
>>  
>> -        ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>> +        /* Only mapping when the subject domain has a notion of PIRQ */
>> +        if ( !is_hvm_domain(d) || has_pirq(d) )
> 
> I'm afraid this is not true.  It's fine to map interrupts to HVM
> domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
> allow HVM domains to route interrupts from devices (either emulated or
> passed through) over event channels.
> 
> It might have worked in the past (when using a version of Xen < 4.19)
> because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
> 
> physdev_map_pirq() will work fine when used against domains that don't
> have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
> 
> I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
> keep the code in do_physdev_op() as-is.  You will have to check
> whether the current paths in do_physdev_op() are not making
> assumptions about XENFEAT_hvm_pirqs being enabled when the calling
> domain is of HVM type.  I don't think that's the case, but better
> check.
If I understand correctly, you also talked about preventing self-mapping when the domain is of HVM type and doesn't have XENFEAT_hvm_pirqs enabled.
Should I change it to this?
        if ( !is_hvm_domain(d) || has_pirq(d) || d != currd )
            ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
        else
            ret = -EOPNOTSUPP;
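
To see which callers the condition proposed above would admit, here is a minimal standalone sketch. It is not Xen code: struct toy_domain, toy_map_allowed() and toy_map_result() are hypothetical stand-ins for struct domain, is_hvm_domain()/has_pirq() and the surrounding do_physdev_op() branch. It assumes that a PVH domain counts as HVM-type without PIRQ support, which matches the has_pirq failure described in the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two domain properties the check reads. */
struct toy_domain {
    bool is_hvm;    /* models is_hvm_domain(d); assumed true for PVH too */
    bool has_pirq;  /* models has_pirq(d) */
};

/* Models: !is_hvm_domain(d) || has_pirq(d) || d != currd */
static bool toy_map_allowed(const struct toy_domain *d,
                            const struct toy_domain *currd)
{
    assert(d != NULL && currd != NULL);
    return !d->is_hvm || d->has_pirq || d != currd;
}

/* Models the proposed branch: 0 stands for "call physdev_map_pirq()". */
static int toy_map_result(const struct toy_domain *d,
                          const struct toy_domain *currd)
{
    return toy_map_allowed(d, currd) ? 0 : -EOPNOTSUPP;
}
```

Under these assumptions the only rejected case is an HVM-type domain without PIRQ support mapping for itself (d == currd); a PVH dom0 mapping on behalf of an HVM domU passes. Later messages in this thread settle on adding no restriction to do_physdev_op() at all, so this models only the proposal as written here.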

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-07-08 11:41 ` [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
  2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-10  8:01   ` Chen, Jiqian
                     ` (2 more replies)
  2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

The gsi of a passthrough device must be configured before it can be
mapped into a hvm domU.
But when dom0 is PVH, the gsis may not get registered (see the
clarification below). In that case the apic, pin and irq info is not
added to the irq_2_pin list and the handler of irq_desc is not set,
so setting the ioapic affinity and vector fails when the device is
passed through.

To fix this, new code on the Linux kernel side will need to call
PHYSDEVOP_setup_gsi for passthrough devices to register the gsi
when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that purpose.

Two questions need clarifying:
First, why do the gsis of devices that belong to PVH dom0 work?
Because when a driver is probed for a normal device, the normal pci
device probe function is used. In its callstack it requests the irq
and unmasks the corresponding ioapic pin of the gsi, which traps
into Xen and finally registers the gsi.
Callstack is (on the Linux kernel side) pci_device_probe->
request_threaded_irq-> irq_startup-> __unmask_ioapic->
io_apic_write, then trap into xen hvmemul_do_io->
hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
So the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0
is PVH?
Because when a device is assigned for passthrough, the
pciback-specific probe function is used. In its callstack it doesn't
install a fake irq handler because the ISR is not running. So
mp_register_gsi on the Xen side is never called and the gsi is not
registered.
Callstack is (on the Linux kernel side) pcistub_probe->pcistub_seize->
pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on==0.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
-- 
2.34.1
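
The combined effect of the hvm_physdev_op() hunks in patches 2 and 3 can be sketched as a standalone gate function. This is a toy model, not the Xen source: the enum values, struct toy_caller and toy_hvm_physdev_gate() are hypothetical names, and the hardware-domain check on the second group is an assumption, since the quoted hunks do not show the condition that follows those case labels.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHYSDEVOP_* sub-ops named in the hunks. */
enum toy_physdev_op {
    TOY_MAP_PIRQ,
    TOY_UNMAP_PIRQ,
    TOY_EOI,
    TOY_IRQ_STATUS_QUERY,
    TOY_GET_FREE_PIRQ,
    TOY_SETUP_GSI,
    TOY_PCI_MMCFG_RESERVED,
    TOY_PCI_DEVICE_ADD,
    TOY_PCI_DEVICE_REMOVE,
    TOY_UNLISTED
};

/* Hypothetical view of the calling domain (currd). */
struct toy_caller {
    bool has_pirq;
    bool is_hardware_domain;
};

/* Returns 0 if the sub-op may proceed to the common handler, else -ENOSYS. */
static int toy_hvm_physdev_gate(const struct toy_caller *currd,
                                enum toy_physdev_op cmd)
{
    assert(currd != NULL);

    switch (cmd) {
    case TOY_MAP_PIRQ:
    case TOY_UNMAP_PIRQ:
        /* Patch 2: no longer gated on the caller having PIRQ support. */
        break;

    case TOY_EOI:
    case TOY_IRQ_STATUS_QUERY:
    case TOY_GET_FREE_PIRQ:
        if (!currd->has_pirq)
            return -ENOSYS;
        break;

    case TOY_SETUP_GSI: /* Patch 3: joins the existing PCI group. */
    case TOY_PCI_MMCFG_RESERVED:
    case TOY_PCI_DEVICE_ADD:
    case TOY_PCI_DEVICE_REMOVE:
        /* Assumed check; not visible in the quoted hunks. */
        if (!currd->is_hardware_domain)
            return -ENOSYS;
        break;

    default:
        return -ENOSYS;
    }

    return 0;
}
```

With a caller modelled as a PVH dom0 (no PIRQ support, hardware domain), the map, unmap and setup_gsi sub-ops pass the gate, while the remaining PIRQ-only sub-ops are still refused.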



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-07-08 11:41 ` [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
@ 2024-07-10  8:01   ` Chen, Jiqian
  2024-07-11  7:58   ` Chen, Jiqian
  2024-07-22 21:38   ` Stefano Stabellini
  2 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-10  8:01 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

Hi,

On 2024/7/8 19:41, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis may not get registered(see below
> clarification), it causes the info of apic, pin and irq not be
> added into irq_2_pin list, and the handler of irq_desc is not set,
> then when passthrough a device, setting ioapic affinity and vector
> will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Clarify two questions:
> First, why the gsi of devices belong to PVH dom0 can work?
> Because when probe a driver to a normal device, it uses the normal
> probe function of pci device, in its callstack, it requests irq
> and unmask corresponding ioapic of gsi, then trap into xen and
> register gsi finally.
> Callstack is(on linux kernel side) pci_device_probe->
> request_threaded_irq-> irq_startup-> __unmask_ioapic->
> io_apic_write, then trap into xen hvmemul_do_io->
> hvm_io_intercept-> hvm_process_io_intercept->
> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
> So that the gsi can be registered.
> 
> Second, why the gsi of passthrough device can't work when dom0
> is PVH?
> Because when assign a device to passthrough, it uses the specific
> probe function of pciback, in its callstack, it doesn't install a
> fake irq handler due to the ISR is not running. So that
> mp_register_gsi on Xen side is never called, then the gsi is not
> registered.
> Callstack is(on linux kernel side) pcistub_probe->pcistub_seize->
> pcistub_init_device-> xen_pcibk_reset_device->
> xen_pcibk_control_isr->isr_on==0.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  xen/arch/x86/hvm/hypercall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 03ada3c880bd..cfe82d0f96ed 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -ENOSYS;
>          break;
>  
> +    case PHYSDEVOP_setup_gsi:
>      case PHYSDEVOP_pci_mmcfg_reserved:
>      case PHYSDEVOP_pci_device_add:
>      case PHYSDEVOP_pci_device_remove:

Do you have any other concerns about this patch?
If not, may I get your Reviewed-by?
Then the first three patches of this series can be considered for merging once I send the next version, so that I can continue to upstream the kernel patches that depend on them.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-07-08 11:41 ` [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
  2024-07-10  8:01   ` Chen, Jiqian
@ 2024-07-11  7:58   ` Chen, Jiqian
  2024-07-22 21:38   ` Stefano Stabellini
  2 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-11  7:58 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Jan Beulich, Andrew Cooper,
	Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Wei Liu, George Dunlap,
	Julien Grall, Anthony PERARD, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

Hi all,

On 2024/7/8 19:41, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis may not get registered(see below
> clarification), it causes the info of apic, pin and irq not be
> added into irq_2_pin list, and the handler of irq_desc is not set,
> then when passthrough a device, setting ioapic affinity and vector
> will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Clarify two questions:
> First, why the gsi of devices belong to PVH dom0 can work?
> Because when probe a driver to a normal device, it uses the normal
> probe function of pci device, in its callstack, it requests irq
> and unmask corresponding ioapic of gsi, then trap into xen and
> register gsi finally.
> Callstack is(on linux kernel side) pci_device_probe->
> request_threaded_irq-> irq_startup-> __unmask_ioapic->
> io_apic_write, then trap into xen hvmemul_do_io->
> hvm_io_intercept-> hvm_process_io_intercept->
> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
> So that the gsi can be registered.
> 
> Second, why the gsi of passthrough device can't work when dom0
> is PVH?
> Because when assign a device to passthrough, it uses the specific
> probe function of pciback, in its callstack, it doesn't install a
> fake irq handler due to the ISR is not running. So that
> mp_register_gsi on Xen side is never called, then the gsi is not
> registered.
> Callstack is(on linux kernel side) pcistub_probe->pcistub_seize->
> pcistub_init_device-> xen_pcibk_reset_device->
> xen_pcibk_control_isr->isr_on==0.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  xen/arch/x86/hvm/hypercall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 03ada3c880bd..cfe82d0f96ed 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -ENOSYS;
>          break;
>  
> +    case PHYSDEVOP_setup_gsi:
>      case PHYSDEVOP_pci_mmcfg_reserved:
>      case PHYSDEVOP_pci_device_add:
>      case PHYSDEVOP_pci_device_remove:

If you still have concerns about this implementation, which allows PHYSDEVOP_setup_gsi for PVH on the Xen side
and calls PHYSDEVOP_setup_gsi when pciback probes the passthrough device,
I have another method to solve the problem of the gsi not being registered.
It is to adjust the pciback code on the Linux kernel side.
See:
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 51b3002b085b..db94529e65f9 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -445,6 +445,10 @@ static int pcistub_init_device(struct pcistub_device *psdev)
        err = pci_enable_device(dev);
        if (err)
                goto config_release;
+       else {
+               dev_data->enable_intx = 1;
+               xen_pcibk_control_isr(dev, 0);
+       }

During pcistub_init_device, once the pcidev is enabled (through pci_enable_device), I enable the isr for pciback, so that the fake irq handler is installed and the gsi gets registered.
At the end of pcistub_init_device, the original code calls xen_pcibk_reset_device to disable the isr and the pcidev, so the fake irq handler is freed again, as if nothing had happened.
Do you think this method is feasible?
If so, we don't need this patch anymore.
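
The idea behind this alternative can be illustrated with a small standalone model. It is only a sketch: struct toy_gsi and the toy_* functions are hypothetical and do not exist in the kernel or in Xen. It assumes, as the description above implies, that installing an irq handler registers the gsi as a side effect and that the registration stays in place after the handler is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state for one GSI as seen by the hypervisor and by dom0. */
struct toy_gsi {
    bool registered;  /* models "mp_register_gsi() has run for this gsi" */
    int handlers;     /* number of irq handlers dom0 currently has installed */
};

/* Installing a handler unmasks the pin, which registers the gsi (assumed). */
static void toy_request_irq(struct toy_gsi *g)
{
    assert(g != NULL);
    g->handlers++;
    g->registered = true;
}

/* Freeing the handler leaves the registration in place (assumed). */
static void toy_free_irq(struct toy_gsi *g)
{
    assert(g != NULL && g->handlers > 0);
    g->handlers--;
}

/* Current pciback init as described: no handler is ever installed. */
static void toy_pciback_init_current(struct toy_gsi *g)
{
    assert(g != NULL);
}

/* Proposed init: install the fake handler, then let the reset free it. */
static void toy_pciback_init_proposed(struct toy_gsi *g)
{
    toy_request_irq(g);
    toy_free_irq(g);
}
```

In this model the proposed path ends with no handler installed, exactly like the current path, but with the gsi registered.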

Looking forward to getting your input.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-07-08 11:41 ` [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
  2024-07-10  8:01   ` Chen, Jiqian
  2024-07-11  7:58   ` Chen, Jiqian
@ 2024-07-22 21:38   ` Stefano Stabellini
  2 siblings, 0 replies; 76+ messages in thread
From: Stefano Stabellini @ 2024-07-22 21:38 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui

On Mon, 8 Jul 2024, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis may not get registered(see below
> clarification), it causes the info of apic, pin and irq not be
> added into irq_2_pin list, and the handler of irq_desc is not set,
> then when passthrough a device, setting ioapic affinity and vector
> will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Clarify two questions:
> First, why the gsi of devices belong to PVH dom0 can work?
> Because when probe a driver to a normal device, it uses the normal
> probe function of pci device, in its callstack, it requests irq
> and unmask corresponding ioapic of gsi, then trap into xen and
> register gsi finally.
> Callstack is(on linux kernel side) pci_device_probe->
> request_threaded_irq-> irq_startup-> __unmask_ioapic->
> io_apic_write, then trap into xen hvmemul_do_io->
> hvm_io_intercept-> hvm_process_io_intercept->
> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
> So that the gsi can be registered.
> 
> Second, why the gsi of passthrough device can't work when dom0
> is PVH?
> Because when assign a device to passthrough, it uses the specific
> probe function of pciback, in its callstack, it doesn't install a
> fake irq handler due to the ISR is not running. So that
> mp_register_gsi on Xen side is never called, then the gsi is not
> registered.
> Callstack is(on linux kernel side) pcistub_probe->pcistub_seize->
> pcistub_init_device-> xen_pcibk_reset_device->
> xen_pcibk_control_isr->isr_on==0.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/arch/x86/hvm/hypercall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 03ada3c880bd..cfe82d0f96ed 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return -ENOSYS;
>          break;
>  
> +    case PHYSDEVOP_setup_gsi:
>      case PHYSDEVOP_pci_mmcfg_reserved:
>      case PHYSDEVOP_pci_device_add:
>      case PHYSDEVOP_pci_device_remove:
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (2 preceding siblings ...)
  2024-07-08 11:41 ` [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-09 13:08   ` Jan Beulich
                     ` (3 more replies)
  2024-07-08 11:41 ` [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq Jiqian Chen
                   ` (2 subsequent siblings)
  6 siblings, 4 replies; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

Some types of domains, such as PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When passing a device through
to a guest on a PVH dom0, the callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in to set the access of an irq, which is not
suitable for a dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
permission for an irq (translated from an x86 gsi) to a domU when
dom0 has no PIRQs.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
CC: Daniel P . Smith <dpsmith@apertussolutions.com>
Remaining comment @Daniel P . Smith:
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
+            goto gsi_permission_out;
Is it okay to issue the XSM check using the translated value, 
not the one that was originally passed into the hypercall?
---
 xen/arch/x86/domctl.c              | 32 ++++++++++++++++++++++++++++++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c             | 17 ++++++++++++++++
 xen/arch/x86/mpparse.c             |  5 ++---
 xen/include/public/domctl.h        |  9 +++++++++
 xen/xsm/flask/hooks.c              |  1 +
 6 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9190e11faaa3..4e9e4c4cfed3 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include <asm/xstate.h>
 #include <asm/psr.h>
 #include <asm/cpu-policy.h>
+#include <asm/io_apic.h>
 
 static int update_domain_cpu_policy(struct domain *d,
                                     xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,37 @@ long arch_do_domctl(
         break;
     }
 
+    case XEN_DOMCTL_gsi_permission:
+    {
+        int irq;
+        unsigned int gsi = domctl->u.gsi_permission.gsi;
+        uint8_t access_flag = domctl->u.gsi_permission.access_flag;
+
+        /* Check all bits and pads are zero except lowest bit */
+        ret = -EINVAL;
+        if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
+            goto gsi_permission_out;
+        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
+            if ( domctl->u.gsi_permission.pad[i] )
+                goto gsi_permission_out;
+
+        if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
+            goto gsi_permission_out;
+
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
+            goto gsi_permission_out;
+
+        if ( access_flag )
+            ret = irq_permit_access(d, irq);
+        else
+            ret = irq_deny_access(d, irq);
+
+    gsi_permission_out:
+        break;
+    }
+
     case XEN_DOMCTL_getpageframeinfo3:
     {
         unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d2a313c4ac72..5968c8055671 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
     return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+    int ioapic, pin, irq;
+
+    ioapic = mp_find_ioapic(gsi);
+    if ( ioapic < 0 )
+        return -EINVAL;
+
+    pin = gsi - io_apic_gsi_base(ioapic);
+
+    irq = apic_pin_2_gsi_irq(ioapic, pin);
+    if ( irq <= 0 )
+        return -EINVAL;
+
+    return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
     int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..7786a3337760 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-	int			gsi)
+int mp_find_ioapic(int gsi)
 {
 	unsigned int		i;
 
@@ -914,7 +913,7 @@ void __init mp_register_ioapic (
 	return;
 }
 
-unsigned __init highest_gsi(void)
+unsigned highest_gsi(void)
 {
 	unsigned x, res = 0;
 	for (x = 0; x < nr_ioapics; x++)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..877e35ab1376 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
     uint8_t pad[3];
 };
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+    uint32_t gsi;
+#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
+    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
+    uint8_t pad[3];
+};
 
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
@@ -1306,6 +1313,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_paging_mempool_size       85
 #define XEN_DOMCTL_set_paging_mempool_size       86
 #define XEN_DOMCTL_dt_overlay                    87
+#define XEN_DOMCTL_gsi_permission                88
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1328,6 +1336,7 @@ struct xen_domctl {
         struct xen_domctl_setdomainhandle   setdomainhandle;
         struct xen_domctl_setdebugging      setdebugging;
         struct xen_domctl_irq_permission    irq_permission;
+        struct xen_domctl_gsi_permission    gsi_permission;
         struct xen_domctl_iomem_permission  iomem_permission;
         struct xen_domctl_ioport_permission ioport_permission;
         struct xen_domctl_hypercall_init    hypercall_init;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 5e88c71b8e22..a5b134c91101 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -685,6 +685,7 @@ static int cf_check flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_shadow_op:
     case XEN_DOMCTL_ioport_permission:
     case XEN_DOMCTL_ioport_mapping:
+    case XEN_DOMCTL_gsi_permission:
 #endif
 #ifdef CONFIG_HAS_PASSTHROUGH
     /*
-- 
2.34.1
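
For readers following the gsi_2_irq() hunk, the first two steps (find
the IO-APIC that owns a gsi, then compute the pin as an offset from that
IO-APIC's gsi base) can be sketched as below. This is a self-contained
toy, not Xen code: the routing table contents and the names toy_routing,
toy_find_ioapic and toy_gsi_to_pin are assumptions made for illustration,
and the final pin-to-irq step done by apic_pin_2_gsi_irq() is left out.

```c
#include <stddef.h>

/* Hypothetical routing table: each IO-APIC owns a contiguous gsi range. */
struct toy_ioapic_routing {
    unsigned int gsi_base;
    unsigned int gsi_end;   /* inclusive */
};

static const struct toy_ioapic_routing toy_routing[] = {
    {  0, 23 },   /* first IO-APIC, 24 pins (made-up values) */
    { 24, 55 },   /* second IO-APIC, 32 pins (made-up values) */
};

/* Return the index of the IO-APIC whose range covers gsi, or -1. */
static int toy_find_ioapic(unsigned int gsi)
{
    size_t i;

    for ( i = 0; i < sizeof(toy_routing) / sizeof(toy_routing[0]); i++ )
        if ( gsi >= toy_routing[i].gsi_base && gsi <= toy_routing[i].gsi_end )
            return (int)i;

    return -1;
}

/* Return the pin of gsi within its owning IO-APIC, or -1 if unowned. */
static int toy_gsi_to_pin(unsigned int gsi)
{
    int ioapic = toy_find_ioapic(gsi);

    if ( ioapic < 0 )
        return -1;

    return (int)(gsi - toy_routing[ioapic].gsi_base);
}
```

With the made-up table above, gsi 30 belongs to the second IO-APIC and
maps to pin 6, while gsi 56 is outside every range and is rejected.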



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
@ 2024-07-09 13:08   ` Jan Beulich
  2024-07-26  6:55     ` Chen, Jiqian
  2024-07-22 22:10   ` Stefano Stabellini
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-07-09 13:08 UTC (permalink / raw)
  To: Jiqian Chen, Daniel P . Smith
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Stewart Hildebrand, Huang Rui, xen-devel

On 08.07.2024 13:41, Jiqian Chen wrote:
> Some type of domains don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq to set the access of irq, it is not suitable for
> dom0 that doesn't have PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
> the permission of irq(translate from x86 gsi) to domU when dom0
> has no PIRQs.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
> Remaining comment @Daniel P . Smith:
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> +            goto gsi_permission_out;
> Is it okay to issue the XSM check using the translated value, 
> not the one that was originally passed into the hypercall?

As long as the answer to this is going to be "Yes":
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Daniel, awaiting your input.

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-09 13:08   ` Jan Beulich
@ 2024-07-26  6:55     ` Chen, Jiqian
  0 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-26  6:55 UTC (permalink / raw)
  To: Daniel P . Smith
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Chen, Jiqian

Hi Daniel,

On 2024/7/9 21:08, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to domU when dom0
>> has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
>> Remaining comment @Daniel P . Smith:
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +            goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?

Need your input.

> 
> As long as the answer to this is going to be "Yes":
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Daniel, awaiting your input.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
  2024-07-09 13:08   ` Jan Beulich
@ 2024-07-22 22:10   ` Stefano Stabellini
  2024-07-26  6:53     ` Chen, Jiqian
  2024-08-01 11:06   ` Roger Pau Monné
  2024-08-02  8:08   ` Roger Pau Monné
  3 siblings, 1 reply; 76+ messages in thread
From: Stefano Stabellini @ 2024-07-22 22:10 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui

On Mon, 8 Jul 2024, Jiqian Chen wrote:
> Some type of domains don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq to set the access of irq, it is not suitable for
> dom0 that doesn't have PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
> the permission of irq(translate from x86 gsi) to domU when dom0
> has no PIRQs.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
> Remaining comment @Daniel P . Smith:
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> +            goto gsi_permission_out;
> Is it okay to issue the XSM check using the translated value, 
> not the one that was originally passed into the hypercall?
> ---
>  xen/arch/x86/domctl.c              | 32 ++++++++++++++++++++++++++++++
>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>  xen/arch/x86/io_apic.c             | 17 ++++++++++++++++
>  xen/arch/x86/mpparse.c             |  5 ++---
>  xen/include/public/domctl.h        |  9 +++++++++
>  xen/xsm/flask/hooks.c              |  1 +
>  6 files changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 9190e11faaa3..4e9e4c4cfed3 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -36,6 +36,7 @@
>  #include <asm/xstate.h>
>  #include <asm/psr.h>
>  #include <asm/cpu-policy.h>
> +#include <asm/io_apic.h>
>  
>  static int update_domain_cpu_policy(struct domain *d,
>                                      xen_domctl_cpu_policy_t *xdpc)
> @@ -237,6 +238,37 @@ long arch_do_domctl(
>          break;
>      }
>  
> +    case XEN_DOMCTL_gsi_permission:
> +    {
> +        int irq;
> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> +        uint8_t access_flag = domctl->u.gsi_permission.access_flag;
> +
> +        /* Check all bits and pads are zero except lowest bit */
> +        ret = -EINVAL;
> +        if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
> +            goto gsi_permission_out;
> +        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
> +            if ( domctl->u.gsi_permission.pad[i] )
> +                goto gsi_permission_out;
> +
> +        if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )

gsi is unsigned int but it is passed to gsi_2_irq which takes an int as
parameter. If gsi >= INT32_MAX we have a problem. I think we should
explicitly check for the possible overflow and return error in that
case.
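
A minimal sketch of such a check, outside of Xen and under stated
assumptions: toy_gsi_2_irq() is a hypothetical stand-in for the patch's
gsi_2_irq(int gsi) that simply returns its argument, and
checked_gsi_to_irq() is an illustrative wrapper name, not something the
patch defines.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical stand-in for gsi_2_irq(int gsi). The identity mapping is
 * for illustration only; the real function looks up the IO-APIC and pin.
 */
static int toy_gsi_2_irq(int gsi)
{
    return gsi > 0 ? gsi : -EINVAL;
}

/*
 * Reject any value that does not fit in an int before the implicit
 * unsigned-to-signed conversion happens, then apply the range check.
 */
static int checked_gsi_to_irq(unsigned int gsi, unsigned int highest)
{
    if ( gsi > (unsigned int)INT_MAX || gsi > highest )
        return -EINVAL;

    return toy_gsi_2_irq((int)gsi);
}
```

The explicit INT_MAX comparison makes the error path independent of
whatever upper bound the second argument happens to carry.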


> +            goto gsi_permission_out;
> +
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> +            goto gsi_permission_out;
> +
> +        if ( access_flag )
> +            ret = irq_permit_access(d, irq);
> +        else
> +            ret = irq_deny_access(d, irq);
> +
> +    gsi_permission_out:
> +        break;
> +    }
> +
>      case XEN_DOMCTL_getpageframeinfo3:
>      {
>          unsigned int num = domctl->u.getpageframeinfo3.num;
> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
> index 78268ea8f666..7e86d8337758 100644
> --- a/xen/arch/x86/include/asm/io_apic.h
> +++ b/xen/arch/x86/include/asm/io_apic.h
> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>  
>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
> +int mp_find_ioapic(int gsi);
> +int gsi_2_irq(int gsi);
>  
>  #endif
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index d2a313c4ac72..5968c8055671 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>      return irq;
>  }
>  
> +int gsi_2_irq(int gsi)
> +{
> +    int ioapic, pin, irq;
> +
> +    ioapic = mp_find_ioapic(gsi);
> +    if ( ioapic < 0 )
> +        return -EINVAL;
> +
> +    pin = gsi - io_apic_gsi_base(ioapic);
> +
> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
> +    if ( irq <= 0 )
> +        return -EINVAL;
> +
> +    return irq;
> +}
> +
>  static inline int IO_APIC_irq_trigger(int irq)
>  {
>      int apic, idx, pin;
> diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
> index d8ccab2449c6..7786a3337760 100644
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
>  } mp_ioapic_routing[MAX_IO_APICS];
>  
>  
> -static int mp_find_ioapic (
> -	int			gsi)
> +int mp_find_ioapic(int gsi)
>  {
>  	unsigned int		i;
>  
> @@ -914,7 +913,7 @@ void __init mp_register_ioapic (
>  	return;
>  }
>  
> -unsigned __init highest_gsi(void)
> +unsigned highest_gsi(void)
>  {
>  	unsigned x, res = 0;
>  	for (x = 0; x < nr_ioapics; x++)
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 2a49fe46ce25..877e35ab1376 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
>      uint8_t pad[3];
>  };
>  
> +/* XEN_DOMCTL_gsi_permission */
> +struct xen_domctl_gsi_permission {
> +    uint32_t gsi;
> +#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
> +    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
> +    uint8_t pad[3];
> +};
>  
>  /* XEN_DOMCTL_iomem_permission */
>  struct xen_domctl_iomem_permission {
> @@ -1306,6 +1313,7 @@ struct xen_domctl {
>  #define XEN_DOMCTL_get_paging_mempool_size       85
>  #define XEN_DOMCTL_set_paging_mempool_size       86
>  #define XEN_DOMCTL_dt_overlay                    87
> +#define XEN_DOMCTL_gsi_permission                88
>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> @@ -1328,6 +1336,7 @@ struct xen_domctl {
>          struct xen_domctl_setdomainhandle   setdomainhandle;
>          struct xen_domctl_setdebugging      setdebugging;
>          struct xen_domctl_irq_permission    irq_permission;
> +        struct xen_domctl_gsi_permission    gsi_permission;
>          struct xen_domctl_iomem_permission  iomem_permission;
>          struct xen_domctl_ioport_permission ioport_permission;
>          struct xen_domctl_hypercall_init    hypercall_init;
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 5e88c71b8e22..a5b134c91101 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -685,6 +685,7 @@ static int cf_check flask_domctl(struct domain *d, int cmd)
>      case XEN_DOMCTL_shadow_op:
>      case XEN_DOMCTL_ioport_permission:
>      case XEN_DOMCTL_ioport_mapping:
> +    case XEN_DOMCTL_gsi_permission:
>  #endif
>  #ifdef CONFIG_HAS_PASSTHROUGH
>      /*
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-22 22:10   ` Stefano Stabellini
@ 2024-07-26  6:53     ` Chen, Jiqian
  2024-07-26 20:16       ` Stefano Stabellini
  0 siblings, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-26  6:53 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/7/23 06:10, Stefano Stabellini wrote:
> On Mon, 8 Jul 2024, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to domU when dom0
>> has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
>> Remaining comment @Daniel P . Smith:
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +            goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?
>> ---
>>  xen/arch/x86/domctl.c              | 32 ++++++++++++++++++++++++++++++
>>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>>  xen/arch/x86/io_apic.c             | 17 ++++++++++++++++
>>  xen/arch/x86/mpparse.c             |  5 ++---
>>  xen/include/public/domctl.h        |  9 +++++++++
>>  xen/xsm/flask/hooks.c              |  1 +
>>  6 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 9190e11faaa3..4e9e4c4cfed3 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -36,6 +36,7 @@
>>  #include <asm/xstate.h>
>>  #include <asm/psr.h>
>>  #include <asm/cpu-policy.h>
>> +#include <asm/io_apic.h>
>>  
>>  static int update_domain_cpu_policy(struct domain *d,
>>                                      xen_domctl_cpu_policy_t *xdpc)
>> @@ -237,6 +238,37 @@ long arch_do_domctl(
>>          break;
>>      }
>>  
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        int irq;
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        uint8_t access_flag = domctl->u.gsi_permission.access_flag;
>> +
>> +        /* Check all bits and pads are zero except lowest bit */
>> +        ret = -EINVAL;
>> +        if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
>> +            goto gsi_permission_out;
>> +        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
>> +            if ( domctl->u.gsi_permission.pad[i] )
>> +                goto gsi_permission_out;
>> +
>> +        if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
> 
> gsi is unsigned int but it is passed to gsi_2_irq which takes an int as
> parameter. If gsi >= INT32_MAX we have a problem. I think we should
> explicitly check for the possible overflow and return error in that
> case.
But the code here has already checked "gsi > highest_gsi()"; can highest_gsi() return a gsi >= INT32_MAX?

> 
> 
>> +            goto gsi_permission_out;
>> +
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +            goto gsi_permission_out;
>> +
>> +        if ( access_flag )
>> +            ret = irq_permit_access(d, irq);
>> +        else
>> +            ret = irq_deny_access(d, irq);
>> +
>> +    gsi_permission_out:
>> +        break;
>> +    }
>> +
>>      case XEN_DOMCTL_getpageframeinfo3:
>>      {
>>          unsigned int num = domctl->u.getpageframeinfo3.num;
>> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
>> index 78268ea8f666..7e86d8337758 100644
>> --- a/xen/arch/x86/include/asm/io_apic.h
>> +++ b/xen/arch/x86/include/asm/io_apic.h
>> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>>  
>>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
>> +int mp_find_ioapic(int gsi);
>> +int gsi_2_irq(int gsi);
>>  
>>  #endif
>> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
>> index d2a313c4ac72..5968c8055671 100644
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>>      return irq;
>>  }
>>  
>> +int gsi_2_irq(int gsi)
>> +{
>> +    int ioapic, pin, irq;
>> +
>> +    ioapic = mp_find_ioapic(gsi);
>> +    if ( ioapic < 0 )
>> +        return -EINVAL;
>> +
>> +    pin = gsi - io_apic_gsi_base(ioapic);
>> +
>> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
>> +    if ( irq <= 0 )
>> +        return -EINVAL;
>> +
>> +    return irq;
>> +}
>> +
>>  static inline int IO_APIC_irq_trigger(int irq)
>>  {
>>      int apic, idx, pin;
>> diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
>> index d8ccab2449c6..7786a3337760 100644
>> --- a/xen/arch/x86/mpparse.c
>> +++ b/xen/arch/x86/mpparse.c
>> @@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
>>  } mp_ioapic_routing[MAX_IO_APICS];
>>  
>>  
>> -static int mp_find_ioapic (
>> -	int			gsi)
>> +int mp_find_ioapic(int gsi)
>>  {
>>  	unsigned int		i;
>>  
>> @@ -914,7 +913,7 @@ void __init mp_register_ioapic (
>>  	return;
>>  }
>>  
>> -unsigned __init highest_gsi(void)
>> +unsigned highest_gsi(void)
>>  {
>>  	unsigned x, res = 0;
>>  	for (x = 0; x < nr_ioapics; x++)
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 2a49fe46ce25..877e35ab1376 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
>>      uint8_t pad[3];
>>  };
>>  
>> +/* XEN_DOMCTL_gsi_permission */
>> +struct xen_domctl_gsi_permission {
>> +    uint32_t gsi;
>> +#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
>> +    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
>> +    uint8_t pad[3];
>> +};
>>  
>>  /* XEN_DOMCTL_iomem_permission */
>>  struct xen_domctl_iomem_permission {
>> @@ -1306,6 +1313,7 @@ struct xen_domctl {
>>  #define XEN_DOMCTL_get_paging_mempool_size       85
>>  #define XEN_DOMCTL_set_paging_mempool_size       86
>>  #define XEN_DOMCTL_dt_overlay                    87
>> +#define XEN_DOMCTL_gsi_permission                88
>>  #define XEN_DOMCTL_gdbsx_guestmemio            1000
>>  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>>  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
>> @@ -1328,6 +1336,7 @@ struct xen_domctl {
>>          struct xen_domctl_setdomainhandle   setdomainhandle;
>>          struct xen_domctl_setdebugging      setdebugging;
>>          struct xen_domctl_irq_permission    irq_permission;
>> +        struct xen_domctl_gsi_permission    gsi_permission;
>>          struct xen_domctl_iomem_permission  iomem_permission;
>>          struct xen_domctl_ioport_permission ioport_permission;
>>          struct xen_domctl_hypercall_init    hypercall_init;
>> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
>> index 5e88c71b8e22..a5b134c91101 100644
>> --- a/xen/xsm/flask/hooks.c
>> +++ b/xen/xsm/flask/hooks.c
>> @@ -685,6 +685,7 @@ static int cf_check flask_domctl(struct domain *d, int cmd)
>>      case XEN_DOMCTL_shadow_op:
>>      case XEN_DOMCTL_ioport_permission:
>>      case XEN_DOMCTL_ioport_mapping:
>> +    case XEN_DOMCTL_gsi_permission:
>>  #endif
>>  #ifdef CONFIG_HAS_PASSTHROUGH
>>      /*
>> -- 
>> 2.34.1
>>

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-26  6:53     ` Chen, Jiqian
@ 2024-07-26 20:16       ` Stefano Stabellini
  0 siblings, 0 replies; 76+ messages in thread
From: Stefano Stabellini @ 2024-07-26 20:16 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Stefano Stabellini, xen-devel@lists.xenproject.org, Jan Beulich,
	Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray

On Fri, 26 Jul 2024, Chen, Jiqian wrote:
> On 2024/7/23 06:10, Stefano Stabellini wrote:
> > On Mon, 8 Jul 2024, Jiqian Chen wrote:
> >> Some type of domains don't have PIRQs, like PVH, it doesn't do
> >> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> >> to guest base on PVH dom0, callstack
> >> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> >> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> >> irq on Xen side.
> >> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> >> passing in pirq to set the access of irq, it is not suitable for
> >> dom0 that doesn't have PIRQs.
> >>
> >> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
> >> the permission of irq(translate from x86 gsi) to dumU when dom0
> >> has no PIRQs.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
> >> Remaining comment @Daniel P . Smith:
> >> +        ret = -EPERM;
> >> +        if ( !irq_access_permitted(currd, irq) ||
> >> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> >> +            goto gsi_permission_out;
> >> Is it okay to issue the XSM check using the translated value, 
> >> not the one that was originally passed into the hypercall?
> >> ---
> >>  xen/arch/x86/domctl.c              | 32 ++++++++++++++++++++++++++++++
> >>  xen/arch/x86/include/asm/io_apic.h |  2 ++
> >>  xen/arch/x86/io_apic.c             | 17 ++++++++++++++++
> >>  xen/arch/x86/mpparse.c             |  5 ++---
> >>  xen/include/public/domctl.h        |  9 +++++++++
> >>  xen/xsm/flask/hooks.c              |  1 +
> >>  6 files changed, 63 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> >> index 9190e11faaa3..4e9e4c4cfed3 100644
> >> --- a/xen/arch/x86/domctl.c
> >> +++ b/xen/arch/x86/domctl.c
> >> @@ -36,6 +36,7 @@
> >>  #include <asm/xstate.h>
> >>  #include <asm/psr.h>
> >>  #include <asm/cpu-policy.h>
> >> +#include <asm/io_apic.h>
> >>  
> >>  static int update_domain_cpu_policy(struct domain *d,
> >>                                      xen_domctl_cpu_policy_t *xdpc)
> >> @@ -237,6 +238,37 @@ long arch_do_domctl(
> >>          break;
> >>      }
> >>  
> >> +    case XEN_DOMCTL_gsi_permission:
> >> +    {
> >> +        int irq;
> >> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> >> +        uint8_t access_flag = domctl->u.gsi_permission.access_flag;
> >> +
> >> +        /* Check all bits and pads are zero except lowest bit */
> >> +        ret = -EINVAL;
> >> +        if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
> >> +            goto gsi_permission_out;
> >> +        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
> >> +            if ( domctl->u.gsi_permission.pad[i] )
> >> +                goto gsi_permission_out;
> >> +
> >> +        if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
> > 
> > gsi is unsigned int but it is passed to gsi_2_irq which takes an int as
> > parameter. If gsi >= INT32_MAX we have a problem. I think we should
> > explicitly check for the possible overflow and return error in that
> > case.
> But "gsi > highest_gsi()" has already been checked here; can highest_gsi() return a gsi >= INT32_MAX?

In practice it is impossible, but in theory it could. But then I looked
at the implementation of highest_gsi(), and gsi_end is actually a signed
int. So I think this is OK:

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
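
For reference, a minimal sketch of the explicit overflow guard discussed
above, assuming a made-up upper bound. model_highest_gsi() and
checked_gsi_to_int() are hypothetical names for illustration, not Xen
functions. As long as the highest GSI fits in a signed int, the second
check alone already rejects every value the first one would catch.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for Xen's highest_gsi(); the value is made up. */
static unsigned int model_highest_gsi(void)
{
    return 55;
}

/*
 * Validate an unsigned GSI before passing it to a helper that takes an int.
 * Returns the GSI as a non-negative int, or -ERANGE if it is out of bounds.
 */
static int checked_gsi_to_int(unsigned int gsi)
{
    /* Explicit guard: anything above INT_MAX would not fit in an int. */
    if ( gsi > (unsigned int)INT_MAX )
        return -ERANGE;

    /* Upper bound; sufficient on its own while the bound is <= INT_MAX. */
    if ( gsi > model_highest_gsi() )
        return -ERANGE;

    return (int)gsi;
}
```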



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
  2024-07-09 13:08   ` Jan Beulich
  2024-07-22 22:10   ` Stefano Stabellini
@ 2024-08-01 11:06   ` Roger Pau Monné
  2024-08-01 11:36     ` Jan Beulich
  2024-08-02  3:10     ` Chen, Jiqian
  2024-08-02  8:08   ` Roger Pau Monné
  3 siblings, 2 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-01 11:06 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
> Some type of domains don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq to set the access of irq, it is not suitable for
> dom0 that doesn't have PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
> the permission of irq(translate from x86 gsi) to dumU when dom0
                       ^ missing space, and s/translate/translated/

> has no PIRQs.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
> Remaining comment @Daniel P . Smith:
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> +            goto gsi_permission_out;
> Is it okay to issue the XSM check using the translated value, 
> not the one that was originally passed into the hypercall?

FWIW, I don't see the GSI -> IRQ translation much different from the
pIRQ -> IRQ translation done by pirq_access_permitted(), which is also
ahead of the xsm check.

> ---
>  xen/arch/x86/domctl.c              | 32 ++++++++++++++++++++++++++++++
>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>  xen/arch/x86/io_apic.c             | 17 ++++++++++++++++
>  xen/arch/x86/mpparse.c             |  5 ++---
>  xen/include/public/domctl.h        |  9 +++++++++
>  xen/xsm/flask/hooks.c              |  1 +
>  6 files changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 9190e11faaa3..4e9e4c4cfed3 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -36,6 +36,7 @@
>  #include <asm/xstate.h>
>  #include <asm/psr.h>
>  #include <asm/cpu-policy.h>
> +#include <asm/io_apic.h>
>  
>  static int update_domain_cpu_policy(struct domain *d,
>                                      xen_domctl_cpu_policy_t *xdpc)
> @@ -237,6 +238,37 @@ long arch_do_domctl(
>          break;
>      }
>  
> +    case XEN_DOMCTL_gsi_permission:
> +    {
> +        int irq;
> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> +        uint8_t access_flag = domctl->u.gsi_permission.access_flag;
> +
> +        /* Check all bits and pads are zero except lowest bit */
> +        ret = -EINVAL;
> +        if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
                              ^ unneeded parentheses and spaces.
> +            goto gsi_permission_out;
> +        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
> +            if ( domctl->u.gsi_permission.pad[i] )
> +                goto gsi_permission_out;
> +
> +        if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )

FWIW, I would place the gsi > highest_gsi() check inside gsi_2_irq().
There's no reason to open-code it here, and it could help other
users of gsi_2_irq().  The error code could also be ERANGE here
instead of EINVAL IMO.
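
A minimal sketch of what that could look like, assuming a simplified
model rather than the real IO-APIC code. MODEL_HIGHEST_GSI,
model_pin_to_irq() and model_gsi_2_irq() are hypothetical stand-ins;
the point is only that the range check lives in the helper and reports
-ERANGE, so callers need not open-code it.

```c
#include <errno.h>

/* Hypothetical bound: a single 24-pin IO-APIC, so GSIs 0..23. */
#define MODEL_HIGHEST_GSI 23u

/* Hypothetical stand-in for the IO-APIC pin lookup: plain identity. */
static int model_pin_to_irq(unsigned int gsi)
{
    return (int)gsi;
}

/* Range check folded into the helper; gsi is unsigned as suggested. */
static int model_gsi_2_irq(unsigned int gsi)
{
    int irq;

    if ( gsi > MODEL_HIGHEST_GSI )
        return -ERANGE;

    irq = model_pin_to_irq(gsi);
    if ( irq <= 0 )
        return -EINVAL;

    return irq;
}
```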

> +            goto gsi_permission_out;
> +
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> +            goto gsi_permission_out;
> +
> +        if ( access_flag )
> +            ret = irq_permit_access(d, irq);
> +        else
> +            ret = irq_deny_access(d, irq);
> +
> +    gsi_permission_out:
> +        break;

Why do you need a label when it just contains a break?  Instead of the
goto gsi_permission_out just use break directly.
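
As a rough sketch of the shape this gives, assuming a toy handler
(model_do_domctl() and MODEL_CMD_GSI_PERMISSION are hypothetical, and
the permission logic is reduced to two checks):

```c
#include <errno.h>

enum model_cmd { MODEL_CMD_GSI_PERMISSION = 1 };

/* Toy handler: each failed check leaves the case with a plain break. */
static int model_do_domctl(enum model_cmd cmd, unsigned int flags, int irq)
{
    int ret = -ENOSYS;

    switch ( cmd )
    {
    case MODEL_CMD_GSI_PERMISSION:
        ret = -EINVAL;
        if ( flags & ~1u )      /* only the lowest bit may be set */
            break;
        if ( irq <= 0 )         /* no usable IRQ for this GSI */
            break;

        ret = 0;
        break;
    }

    return ret;
}
```

One caveat, as a general C fact rather than a comment on the patch: a
break placed inside the for loop over pad[] would only leave that loop,
not the switch, so this form fits best once the padding loop is gone.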

> +    }
> +
>      case XEN_DOMCTL_getpageframeinfo3:
>      {
>          unsigned int num = domctl->u.getpageframeinfo3.num;
> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
> index 78268ea8f666..7e86d8337758 100644
> --- a/xen/arch/x86/include/asm/io_apic.h
> +++ b/xen/arch/x86/include/asm/io_apic.h
> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>  
>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
> +int mp_find_ioapic(int gsi);
> +int gsi_2_irq(int gsi);
>  
>  #endif
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index d2a313c4ac72..5968c8055671 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>      return irq;
>  }
>  
> +int gsi_2_irq(int gsi)

unsigned int for gsi.

> +{
> +    int ioapic, pin, irq;

pin would better be unsigned int also.

> +
> +    ioapic = mp_find_ioapic(gsi);
> +    if ( ioapic < 0 )
> +        return -EINVAL;
> +
> +    pin = gsi - io_apic_gsi_base(ioapic);
> +
> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
> +    if ( irq <= 0 )
> +        return -EINVAL;
> +
> +    return irq;
> +}
> +
>  static inline int IO_APIC_irq_trigger(int irq)
>  {
>      int apic, idx, pin;
> diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
> index d8ccab2449c6..7786a3337760 100644
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
>  } mp_ioapic_routing[MAX_IO_APICS];
>  
>  
> -static int mp_find_ioapic (
> -	int			gsi)
> +int mp_find_ioapic(int gsi)

If you are changing this, you might as well make the gsi parameter
unsigned int.

>  {
>  	unsigned int		i;
>  
> @@ -914,7 +913,7 @@ void __init mp_register_ioapic (
>  	return;
>  }
>  
> -unsigned __init highest_gsi(void)
> +unsigned highest_gsi(void)
>  {
>  	unsigned x, res = 0;
>  	for (x = 0; x < nr_ioapics; x++)
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 2a49fe46ce25..877e35ab1376 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
>      uint8_t pad[3];
>  };
>  
> +/* XEN_DOMCTL_gsi_permission */
> +struct xen_domctl_gsi_permission {
> +    uint32_t gsi;
> +#define XEN_DOMCTL_GSI_PERMISSION_MASK 1

IMO this would be better named GRANT or similar, maybe something like:

/* Low bit used to signal grant/revoke action. */
#define XEN_DOMCTL_GSI_REVOKE 0
#define XEN_DOMCTL_GSI_GRANT  1

> +    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
> +    uint8_t pad[3];

We might as well declare the flags field as uint32_t and avoid the
padding field.
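
To illustrate the layout point, a small sketch comparing the two
shapes. The struct names are hypothetical; only the field types follow
the patch and the suggestion. With common ABIs both variants occupy
8 bytes, so widening the field costs nothing and removes the need to
check pad[] separately.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in v12: one flag byte plus three bytes of padding. */
struct model_gsi_perm_v12 {
    uint32_t gsi;
    uint8_t  access_flag;
    uint8_t  pad[3];
};

/* Suggested alternative: a single 32-bit flags field, no padding. */
struct model_gsi_perm_alt {
    uint32_t gsi;
    uint32_t flags;
};

/* Compile-time checks (C11): same size, flags right after gsi. */
_Static_assert(sizeof(struct model_gsi_perm_v12) == 8, "v12 layout is 8 bytes");
_Static_assert(sizeof(struct model_gsi_perm_alt) == 8, "alt layout is 8 bytes");
_Static_assert(offsetof(struct model_gsi_perm_alt, flags) == 4, "flags at offset 4");
```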

Thanks, Roger.



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-01 11:06   ` Roger Pau Monné
@ 2024-08-01 11:36     ` Jan Beulich
  2024-08-01 12:41       ` Roger Pau Monné
  2024-08-02  3:10     ` Chen, Jiqian
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-08-01 11:36 UTC (permalink / raw)
  To: Roger Pau Monné, Daniel P . Smith
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Stewart Hildebrand, Huang Rui, Jiqian Chen

On 01.08.2024 13:06, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>> Remaining comment @Daniel P . Smith:
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +            goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?
> 
> FWIW, I don't see the GSI -> IRQ translation much different from the
> pIRQ -> IRQ translation done by pirq_access_permitted(), which is also
> ahead of the xsm check.

The question (which I raised originally) isn't an ordering one, but an
auditing one: Is it okay to pass the XSM hook a value that isn't what
was passed into the hypercall?

And Daniel, please, can you finally take a moment to help here, in your
role as XSM maintainer? Elsewhere you complained you weren't Cc-ed or
asked; now that you were asked, you haven't responded for weeks if not
months.

Jan



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-01 11:36     ` Jan Beulich
@ 2024-08-01 12:41       ` Roger Pau Monné
  2024-08-01 13:11         ` Jan Beulich
  0 siblings, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-01 12:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Daniel P . Smith, xen-devel, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Thu, Aug 01, 2024 at 01:36:16PM +0200, Jan Beulich wrote:
> On 01.08.2024 13:06, Roger Pau Monné wrote:
> > On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
> >> Remaining comment @Daniel P . Smith:
> >> +        ret = -EPERM;
> >> +        if ( !irq_access_permitted(currd, irq) ||
> >> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
> >> +            goto gsi_permission_out;
> >> Is it okay to issue the XSM check using the translated value, 
> >> not the one that was originally passed into the hypercall?
> > 
> > FWIW, I don't see the GSI -> IRQ translation much different from the
> > pIRQ -> IRQ translation done by pirq_access_permitted(), which is also
> > ahead of the xsm check.
> 
> The question (which I raised originally) isn't an ordering one, but an
> auditing one: Is it okay to pass the XSM hook a value that isn't what
> was passed into the hypercall?

But that's also the case with the current XEN_DOMCTL_irq_permission
implementation?  As the hypercall parameter is a pIRQ, and the XSM
check is done against the translated IRQ obtained from the pIRQ
parameter.

Not saying your question is not relevant, but we already have at least
one very similar instance of doing the XSM check against a value
derived from a hypercall parameter.

Thanks, Roger.



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-01 12:41       ` Roger Pau Monné
@ 2024-08-01 13:11         ` Jan Beulich
  0 siblings, 0 replies; 76+ messages in thread
From: Jan Beulich @ 2024-08-01 13:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Daniel P . Smith, xen-devel, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 01.08.2024 14:41, Roger Pau Monné wrote:
> On Thu, Aug 01, 2024 at 01:36:16PM +0200, Jan Beulich wrote:
>> On 01.08.2024 13:06, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>>>> Remaining comment @Daniel P . Smith:
>>>> +        ret = -EPERM;
>>>> +        if ( !irq_access_permitted(currd, irq) ||
>>>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>>>> +            goto gsi_permission_out;
>>>> Is it okay to issue the XSM check using the translated value, 
>>>> not the one that was originally passed into the hypercall?
>>>
>>> FWIW, I don't see the GSI -> IRQ translation much different from the
>>> pIRQ -> IRQ translation done by pirq_access_permitted(), which is also
>>> ahead of the xsm check.
>>
>> The question (which I raised originally) isn't an ordering one, but an
>> auditing one: Is it okay to pass the XSM hook a value that isn't what
>> was passed into the hypercall?
> 
> But that's also the case with the current XEN_DOMCTL_irq_permission
> implementation?  As the hypercall parameter is a pIRQ, and the XSM
> check is done against the translated IRQ obtained from the pIRQ
> parameter.

In a way you're right, but in a way there's also a meaningful difference:
There we translate between internal numbering spaces. Here we first
translate a quantity in a numbering space superimposed onto us to an
internal representation. Flask, otoh, in such a situation may prefer to
see the external representation of the resource.

Jan



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-01 11:06   ` Roger Pau Monné
  2024-08-01 11:36     ` Jan Beulich
@ 2024-08-02  3:10     ` Chen, Jiqian
  2024-08-02  6:27       ` Jan Beulich
  2024-08-02  7:59       ` Roger Pau Monné
  1 sibling, 2 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  3:10 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/8/1 19:06, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to dumU when dom0
>                        ^ missing space, and s/translate/translated/
> 
>> has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
>> Remaining comment @Daniel P . Smith:
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +            goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?
> 
> FWIW, I don't see the GSI -> IRQ translation much different from the
> pIRQ -> IRQ translation done by pirq_access_permitted(), which is also
> ahead of the xsm check.
> 
>> ---
>>  xen/arch/x86/domctl.c              | 32 ++++++++++++++++++++++++++++++
>>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>>  xen/arch/x86/io_apic.c             | 17 ++++++++++++++++
>>  xen/arch/x86/mpparse.c             |  5 ++---
>>  xen/include/public/domctl.h        |  9 +++++++++
>>  xen/xsm/flask/hooks.c              |  1 +
>>  6 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 9190e11faaa3..4e9e4c4cfed3 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -36,6 +36,7 @@
>>  #include <asm/xstate.h>
>>  #include <asm/psr.h>
>>  #include <asm/cpu-policy.h>
>> +#include <asm/io_apic.h>
>>  
>>  static int update_domain_cpu_policy(struct domain *d,
>>                                      xen_domctl_cpu_policy_t *xdpc)
>> @@ -237,6 +238,37 @@ long arch_do_domctl(
>>          break;
>>      }
>>  
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        int irq;
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        uint8_t access_flag = domctl->u.gsi_permission.access_flag;
>> +
>> +        /* Check all bits and pads are zero except lowest bit */
>> +        ret = -EINVAL;
>> +        if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
>                               ^ unneeded parentheses and spaces.
>> +            goto gsi_permission_out;
>> +        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
>> +            if ( domctl->u.gsi_permission.pad[i] )
>> +                goto gsi_permission_out;
>> +
>> +        if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
> 
> FWIW, I would place the gsi > highest_gsi() check inside gsi_2_irq().
> There's no reason to open-code it here, and it could help other
> users of gsi_2_irq().  The error code could also be ERANGE here
> instead of EINVAL IMO.
> 
>> +            goto gsi_permission_out;
>> +
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +            goto gsi_permission_out;
>> +
>> +        if ( access_flag )
>> +            ret = irq_permit_access(d, irq);
>> +        else
>> +            ret = irq_deny_access(d, irq);
>> +
>> +    gsi_permission_out:
>> +        break;
> 
> Why do you need a label when it just contains a break?  Instead of the
> goto gsi_permission_out just use break directly.
> 
>> +    }
>> +
>>      case XEN_DOMCTL_getpageframeinfo3:
>>      {
>>          unsigned int num = domctl->u.getpageframeinfo3.num;
>> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
>> index 78268ea8f666..7e86d8337758 100644
>> --- a/xen/arch/x86/include/asm/io_apic.h
>> +++ b/xen/arch/x86/include/asm/io_apic.h
>> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>>  
>>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
>> +int mp_find_ioapic(int gsi);
>> +int gsi_2_irq(int gsi);
>>  
>>  #endif
>> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
>> index d2a313c4ac72..5968c8055671 100644
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>>      return irq;
>>  }
>>  
>> +int gsi_2_irq(int gsi)
> 
> unsigned int for gsi.
> 
>> +{
>> +    int ioapic, pin, irq;
> 
> pin would better be unsigned int also.
> 
>> +
>> +    ioapic = mp_find_ioapic(gsi);
>> +    if ( ioapic < 0 )
>> +        return -EINVAL;
>> +
>> +    pin = gsi - io_apic_gsi_base(ioapic);
>> +
>> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
>> +    if ( irq <= 0 )
>> +        return -EINVAL;
>> +
>> +    return irq;
>> +}
>> +
>>  static inline int IO_APIC_irq_trigger(int irq)
>>  {
>>      int apic, idx, pin;
>> diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
>> index d8ccab2449c6..7786a3337760 100644
>> --- a/xen/arch/x86/mpparse.c
>> +++ b/xen/arch/x86/mpparse.c
>> @@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
>>  } mp_ioapic_routing[MAX_IO_APICS];
>>  
>>  
>> -static int mp_find_ioapic (
>> -	int			gsi)
>> +int mp_find_ioapic(int gsi)
> 
> If you are changing this, you might as well make the gsi parameter
> unsigned int.

Thanks, I will change the code according to the above comments in the next version.

> 
>>  {
>>  	unsigned int		i;
>>  
>> @@ -914,7 +913,7 @@ void __init mp_register_ioapic (
>>  	return;
>>  }
>>  
>> -unsigned __init highest_gsi(void)
>> +unsigned highest_gsi(void)
>>  {
>>  	unsigned x, res = 0;
>>  	for (x = 0; x < nr_ioapics; x++)
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 2a49fe46ce25..877e35ab1376 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
>>      uint8_t pad[3];
>>  };
>>  
>> +/* XEN_DOMCTL_gsi_permission */
>> +struct xen_domctl_gsi_permission {
>> +    uint32_t gsi;
>> +#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
> 
> IMO this would be better named GRANT or similar, maybe something like:
> 
> /* Low bit used to signal grant/revoke action. */
> #define XEN_DOMCTL_GSI_REVOKE 0
> #define XEN_DOMCTL_GSI_GRANT  1
> 
>> +    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
>> +    uint8_t pad[3];
> 
> We might as well declare the flags field as uint32_t and avoid the
> padding field.
So, should this struct be like below? Then I just need to check whether everything except the lowest bit is 0.
struct xen_domctl_gsi_permission {
    uint32_t gsi;
/* Lowest bit used to signal grant/revoke action. */
#define XEN_DOMCTL_GSI_REVOKE 0
#define XEN_DOMCTL_GSI_GRANT  1
#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
    uint32_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
};

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.


* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-02  3:10     ` Chen, Jiqian
@ 2024-08-02  6:27       ` Jan Beulich
  2024-08-02  7:44         ` Chen, Jiqian
  2024-08-02  7:59       ` Roger Pau Monné
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-08-02  6:27 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Roger Pau Monné

On 02.08.2024 05:10, Chen, Jiqian wrote:
> On 2024/8/1 19:06, Roger Pau Monné wrote:
>> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
>>>      uint8_t pad[3];
>>>  };
>>>  
>>> +/* XEN_DOMCTL_gsi_permission */
>>> +struct xen_domctl_gsi_permission {
>>> +    uint32_t gsi;
>>> +#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
>>
>> IMO this would be better named GRANT or similar, maybe something like:
>>
>> /* Low bit used to signal grant/revoke action. */
>> #define XEN_DOMCTL_GSI_REVOKE 0
>> #define XEN_DOMCTL_GSI_GRANT  1
>>
>>> +    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
>>> +    uint8_t pad[3];
>>
>> We might as well declare the flags field as uint32_t and avoid the
>> padding field.
> So, should this struct be like below? Then I just need to check whether everything except the lowest bit is 0.
> struct xen_domctl_gsi_permission {
>     uint32_t gsi;
> /* Lowest bit used to signal grant/revoke action. */
> #define XEN_DOMCTL_GSI_REVOKE 0
> #define XEN_DOMCTL_GSI_GRANT  1
> #define XEN_DOMCTL_GSI_PERMISSION_MASK 1
>     uint32_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
> };

Yet then why "access_flags"? You can't foresee what meaning the other bits may
gain. That meaning may (and likely will) not be access related at all.

Jan



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-02  6:27       ` Jan Beulich
@ 2024-08-02  7:44         ` Chen, Jiqian
  0 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  7:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	Roger Pau Monné, Chen, Jiqian

On 2024/8/2 14:27, Jan Beulich wrote:
> On 02.08.2024 05:10, Chen, Jiqian wrote:
>> On 2024/8/1 19:06, Roger Pau Monné wrote:
>>> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>>>> --- a/xen/include/public/domctl.h
>>>> +++ b/xen/include/public/domctl.h
>>>> @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
>>>>      uint8_t pad[3];
>>>>  };
>>>>  
>>>> +/* XEN_DOMCTL_gsi_permission */
>>>> +struct xen_domctl_gsi_permission {
>>>> +    uint32_t gsi;
>>>> +#define XEN_DOMCTL_GSI_PERMISSION_MASK 1
>>>
>>> IMO this would be better named GRANT or similar, maybe something like:
>>>
>>> /* Low bit used to signal grant/revoke action. */
>>> #define XEN_DOMCTL_GSI_REVOKE 0
>>> #define XEN_DOMCTL_GSI_GRANT  1
>>>
>>>> +    uint8_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
>>>> +    uint8_t pad[3];
>>>
>>> We might as well declare the flags field as uint32_t and avoid the
>>> padding field.
>> So, should this struct be like below? Then I just need to check whether everything except the lowest bit is 0.
>> struct xen_domctl_gsi_permission {
>>     uint32_t gsi;
>> /* Lowest bit used to signal grant/revoke action. */
>> #define XEN_DOMCTL_GSI_REVOKE 0
>> #define XEN_DOMCTL_GSI_GRANT  1
>> #define XEN_DOMCTL_GSI_PERMISSION_MASK 1
>>     uint32_t access_flag;    /* flag to specify enable/disable of x86 gsi access */
>> };
> 
> Yet then why "access_flags"? You can't foresee what meaning the other bits may
> gain. That meaning may (and likely will) not be access related at all.

OK, just "uint32_t flags".

> 
> Jan

-- 
Best regards,
Jiqian Chen.


* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-02  3:10     ` Chen, Jiqian
  2024-08-02  6:27       ` Jan Beulich
@ 2024-08-02  7:59       ` Roger Pau Monné
  1 sibling, 0 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-02  7:59 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray

On Fri, Aug 02, 2024 at 03:10:27AM +0000, Chen, Jiqian wrote:
> On 2024/8/1 19:06, Roger Pau Monné wrote:
> > We might as well declare the flags field as uint32_t and avoid the
> > padding field.
> So, should this struct be like below? Then I just need to check whether everything except the lowest bit is 0.
> struct xen_domctl_gsi_permission {
>     uint32_t gsi;
> /* Lowest bit used to signal grant/revoke action. */
> #define XEN_DOMCTL_GSI_REVOKE 0
> #define XEN_DOMCTL_GSI_GRANT  1
> #define XEN_DOMCTL_GSI_PERMISSION_MASK 1

Maybe ACTION_MASK rather than PERMISSION_MASK?

>     uint32_t access_flag;    /* flag to specify enable/disable of x86 gsi access */

I would again be fine naming this just flags and the comment is likely
to go stale quite soon if we add more flags.  However given the
simplicity of this hypercall I'm unsure whether any new flags could
appear.

Thanks, Roger.



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
                     ` (2 preceding siblings ...)
  2024-08-01 11:06   ` Roger Pau Monné
@ 2024-08-02  8:08   ` Roger Pau Monné
  2024-08-02  8:23     ` Chen, Jiqian
  2024-08-02  9:40     ` Jan Beulich
  3 siblings, 2 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-02  8:08 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
> Some type of domains don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq to set the access of irq, it is not suitable for
> dom0 that doesn't have PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
> the permission of irq(translate from x86 gsi) to dumU when dom0
> has no PIRQs.

I've been wondering about this, and if the hypercall is strictly to
resolve GSIs into IRQs, isn't it the case that Xen identity maps GSIs
into the IRQ space, and hence no translation is required?

Thanks, Roger.



* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-02  8:08   ` Roger Pau Monné
@ 2024-08-02  8:23     ` Chen, Jiqian
  2024-08-02  9:40     ` Jan Beulich
  1 sibling, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  8:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/8/2 16:08, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to domU when dom0
>> has no PIRQs.
> 
> I've been wondering about this, and if the hypercall is strictly to
> resolve GSIs into IRQs, isn't that the case that Xen identity maps GSI
> into the IRQ space, and hence no translation is required?
Yes, for GSIs that have no entries in mp_irqs, Xen does the identity mapping.
I will delete the words "translate .."

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-02  8:08   ` Roger Pau Monné
  2024-08-02  8:23     ` Chen, Jiqian
@ 2024-08-02  9:40     ` Jan Beulich
  2024-08-02 12:05       ` Roger Pau Monné
  1 sibling, 1 reply; 76+ messages in thread
From: Jan Beulich @ 2024-08-02  9:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 02.08.2024 10:08, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to domU when dom0
>> has no PIRQs.
> 
> I've been wondering about this, and if the hypercall is strictly to
> resolve GSIs into IRQs, isn't that the case that Xen identity maps GSI
> into the IRQ space, and hence no translation is required?

It was a long-winded discussion to clarify that in obscure cases
translation is required: Whenever there's a source override in ACPI.

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-08-02  9:40     ` Jan Beulich
@ 2024-08-02 12:05       ` Roger Pau Monné
  0 siblings, 0 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-02 12:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On Fri, Aug 02, 2024 at 11:40:53AM +0200, Jan Beulich wrote:
> On 02.08.2024 10:08, Roger Pau Monné wrote:
> > On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
> >> Some type of domains don't have PIRQs, like PVH, it doesn't do
> >> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> >> to guest base on PVH dom0, callstack
> >> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> >> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> >> irq on Xen side.
> >> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> >> passing in pirq to set the access of irq, it is not suitable for
> >> dom0 that doesn't have PIRQs.
> >>
> >> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
> >> the permission of irq(translate from x86 gsi) to domU when dom0
> >> has no PIRQs.
> > 
> > I've been wondering about this, and if the hypercall is strictly to
> > resolve GSIs into IRQs, isn't that the case that Xen identity maps GSI
> > into the IRQ space, and hence no translation is required?
> 
> It was a long-winded discussion to clarify that in obscure cases
> translation is required: Whenever there's a source override in ACPI.

Right, I see it's a bit convoluted to get the overrides, as those are
indexed by IO-APIC pin, so we need to resolve the GSI -> (ioapic, pin)
first and then check for any possible overrides.

Might be helpful to mention in the commit description that the GSI to
IRQ translation is done to account for ACPI overrides, as otherwise
GSIs are identity mapped into IRQs.
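
A minimal sketch of the lookup order described above, assuming a
simplified model: resolve the GSI to an (ioapic, pin) pair, check for an
override on that pin, and otherwise fall back to the identity mapping.
All structure and function names here (ioapic_desc, irq_override,
sketch_gsi_to_irq) and the table contents are hypothetical illustrations,
not Xen's actual data structures or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not Xen's real data structures. */
struct ioapic_desc {
    unsigned int gsi_base;  /* first GSI routed to this IO-APIC */
    unsigned int nr_pins;   /* number of input pins on this IO-APIC */
};

struct irq_override {
    unsigned int ioapic;    /* index into the IO-APIC table */
    unsigned int pin;       /* pin on that IO-APIC */
    int source_irq;         /* IRQ named by the ACPI source override */
};

/* Assumed example layout: two IO-APICs covering GSIs 0-23 and 24-55. */
static const struct ioapic_desc ioapics[] = {
    { .gsi_base = 0,  .nr_pins = 24 },
    { .gsi_base = 24, .nr_pins = 32 },
};

/* Assumed example override: IRQ 0 is wired to GSI 2. */
static const struct irq_override overrides[] = {
    { .ioapic = 0, .pin = 2, .source_irq = 0 },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the IRQ for a GSI, or -1 if no IO-APIC handles that GSI. */
static int sketch_gsi_to_irq(unsigned int gsi)
{
    for (size_t i = 0; i < ARRAY_LEN(ioapics); i++) {
        /* Step 1: find the IO-APIC whose GSI range contains this GSI. */
        if (gsi < ioapics[i].gsi_base ||
            gsi >= ioapics[i].gsi_base + ioapics[i].nr_pins)
            continue;

        unsigned int pin = gsi - ioapics[i].gsi_base;

        /* Step 2: overrides are indexed by (ioapic, pin), not by GSI. */
        for (size_t j = 0; j < ARRAY_LEN(overrides); j++)
            if (overrides[j].ioapic == i && overrides[j].pin == pin)
                return overrides[j].source_irq;

        /* Step 3: no override, so the GSI is identity mapped. */
        return (int)gsi;
    }
    return -1;
}

/* Usage example for the assumed tables above. */
void sketch_gsi_to_irq_examples(void)
{
    assert(sketch_gsi_to_irq(2) == 0);    /* overridden pin */
    assert(sketch_gsi_to_irq(9) == 9);    /* identity mapped */
    assert(sketch_gsi_to_irq(30) == 30);  /* second IO-APIC, identity */
    assert(sketch_gsi_to_irq(56) == -1);  /* outside every IO-APIC */
}
```

In this model only GSI 2 needs translation; every other valid GSI maps
to the IRQ of the same number.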

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (3 preceding siblings ...)
  2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-09 13:26   ` Jan Beulich
  2024-07-08 11:41 ` [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev Jiqian Chen
  2024-07-08 11:41 ` [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
  6 siblings, 1 reply; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

Hypercall PHYSDEVOP_map_pirq supports mapping a gsi into either a
specific pirq or a free pirq, depending on the pirq parameter (>0 or
<0). But the current xc_physdev_map_pirq sets *pirq=index when the
pirq parameter is <0, which forces every case to be mapped to a
specific pirq. That has two problems: the caller can't get a free
pirq value, and the mapping fails if the specific pirq is already
mapped to another gsi.

So, change xc_physdev_map_pirq to allow a negative parameter to be
passed in, so that the caller then gets a free pirq.

There are four callers of xc_physdev_map_pirq in the original code;
the effect on each is clarified below (only the pirq<0 case needs
clarifying):

First, pci_add_dm_done->xc_physdev_map_pirq passes irq as the pirq
parameter; pirq<0 means irq<0, so it fails the "index < 0" check in
allocate_and_map_gsi_pirq and gets EINVAL. The logic is the same as
in the original code.

Second, domcreate_launch_dm->libxl__arch_domain_map_irq->
xc_physdev_map_pirq: the passed pirq is always >=0, so there is no
effect.

Third, pyxc_physdev_map_pirq->xc_physdev_map_pirq: not sure, so add
the check logic to pyxc_physdev_map_pirq to keep the same behavior.

Fourth, xen_pt_realize->xc_physdev_map_pirq wants to allocate a
pirq for the gsi, but the pirq doesn't need to have the same value
as the gsi. After this patch it will get a free pirq, which also
works.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 tools/libs/ctrl/xc_physdev.c      | 2 +-
 tools/python/xen/lowlevel/xc/xc.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..e9fcd755fa62 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
     map.domid = domid;
     map.type = MAP_PIRQ_TYPE_GSI;
     map.index = index;
-    map.pirq = *pirq < 0 ? index : *pirq;
+    map.pirq = *pirq;
 
     rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 9feb12ae2b16..f8c9db7115ee 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self,
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
                                       &dom, &index, &pirq) )
         return NULL;
+    if ( pirq < 0 )
+        pirq = index;
     ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq);
     if ( ret != 0 )
           return pyxc_error_to_exception(xc->xc_handle);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq
  2024-07-08 11:41 ` [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq Jiqian Chen
@ 2024-07-09 13:26   ` Jan Beulich
  2024-07-10  7:55     ` Chen, Jiqian
  2024-08-01 12:55     ` Roger Pau Monné
  0 siblings, 2 replies; 76+ messages in thread
From: Jan Beulich @ 2024-07-09 13:26 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel,
	Marek Marczykowski

On 08.07.2024 13:41, Jiqian Chen wrote:
> Hypercall PHYSDEVOP_map_pirq support to map a gsi into a specific
> pirq or a free pirq, it depends on the parameter pirq(>0 or <0).
> But in current xc_physdev_map_pirq, it set *pirq=index when
> parameter pirq is <0, it causes to force all cases to be mapped
> to a specific pirq. That has some problems, one is caller can't
> get a free pirq value, another is that once the specific pirq was
> already mapped to other gsi, then it will fail.
> 
> So, change xc_physdev_map_pirq to allow to pass negative parameter
> in and then get a free pirq.
> 
> There are four caller of xc_physdev_map_pirq in original codes, so
> clarify the affect below(just need to clarify the pirq<0 case):
> 
> First, pci_add_dm_done->xc_physdev_map_pirq, it pass irq to pirq
> parameter, if pirq<0 means irq<0, then it will fail at check
> "index < 0" in allocate_and_map_gsi_pirq and get EINVAL, logic is
> the same as original code.

There we have

    int pirq = XEN_PT_UNASSIGNED_PIRQ;

(with XEN_PT_UNASSIGNED_PIRQ being -1) and then

    rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);

Therefore ...

> --- a/tools/libs/ctrl/xc_physdev.c
> +++ b/tools/libs/ctrl/xc_physdev.c
> @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
>      map.domid = domid;
>      map.type = MAP_PIRQ_TYPE_GSI;
>      map.index = index;
> -    map.pirq = *pirq < 0 ? index : *pirq;
> +    map.pirq = *pirq;
>  
>      rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));

... this very much looks like a change in behavior to me: *pirq is
negative, and hence index would have been put in map.pirq instead. While
with your change we'd then pass -1, i.e. requesting to obtain a new
pIRQ.

I also consider it questionable to go by in-tree users. I think proof of
no functional change needs to also consider possible out-of-tree users,
not the least seeing the Python binding below (even if right there you
indeed attempt to retain prior behavior). The one aspect in your favor
is that libxc isn't considered to have a stable ABI.

Overall I see little room to avoid introducing a new function with this
improved behavior (maybe xc_physdev_map_pirq_gsi()). Ideally existing
callers would then be switched, to eventually allow removing the old
function (thus cleanly and noticeably breaking any out-of-tree users
that there may be, indicating to their developers that they need to
adjust their code).
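
To make the behavioral difference concrete, here is a minimal sketch
that models only the choice of the value placed in map.pirq, taken from
the one-line change in the quoted diff. The helper names
(legacy_requested_pirq, free_capable_requested_pirq) are hypothetical;
the signature of a future xc_physdev_map_pirq_gsi() is not defined in
this thread, so no hypercall is issued here.

```c
#include <assert.h>

/*
 * Existing behavior: a negative *pirq is silently replaced by index,
 * so the hypercall is always asked for one specific pirq.
 */
static int legacy_requested_pirq(int index, int pirq)
{
    return pirq < 0 ? index : pirq;
}

/*
 * Proposed behavior: the caller's value is passed through unchanged,
 * so a negative value reaches the hypercall and requests a free pirq.
 */
static int free_capable_requested_pirq(int index, int pirq)
{
    (void)index;  /* index no longer influences the requested pirq */
    return pirq;
}

/* Usage example: gsi/index 5, with and without an explicit pirq. */
void sketch_requested_pirq_examples(void)
{
    /* pirq == -1: the two variants diverge. */
    assert(legacy_requested_pirq(5, -1) == 5);
    assert(free_capable_requested_pirq(5, -1) == -1);

    /* pirq >= 0: both variants request the same specific pirq. */
    assert(legacy_requested_pirq(5, 7) == 7);
    assert(free_capable_requested_pirq(5, 7) == 7);
}
```

Only callers that pass a negative pirq see different behavior, which is
why keeping the old function alongside a new one leaves existing
callers untouched.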

> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self,
>      if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
>                                        &dom, &index, &pirq) )
>          return NULL;
> +    if ( pirq < 0 )
> +        pirq = index;
>      ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq);
>      if ( ret != 0 )
>            return pyxc_error_to_exception(xc->xc_handle);

I question this change, yet without Cc-ing the maintainer (now added)
you're not very likely to get a comment (let alone an ack) on this.

Jan


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq
  2024-07-09 13:26   ` Jan Beulich
@ 2024-07-10  7:55     ` Chen, Jiqian
  2024-08-01 12:55     ` Roger Pau Monné
  1 sibling, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-10  7:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Marek Marczykowski, Chen, Jiqian

On 2024/7/9 21:26, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> Hypercall PHYSDEVOP_map_pirq support to map a gsi into a specific
>> pirq or a free pirq, it depends on the parameter pirq(>0 or <0).
>> But in current xc_physdev_map_pirq, it set *pirq=index when
>> parameter pirq is <0, it causes to force all cases to be mapped
>> to a specific pirq. That has some problems, one is caller can't
>> get a free pirq value, another is that once the specific pirq was
>> already mapped to other gsi, then it will fail.
>>
>> So, change xc_physdev_map_pirq to allow to pass negative parameter
>> in and then get a free pirq.
>>
>> There are four caller of xc_physdev_map_pirq in original codes, so
>> clarify the affect below(just need to clarify the pirq<0 case):
>>
>> First, pci_add_dm_done->xc_physdev_map_pirq, it pass irq to pirq
>> parameter, if pirq<0 means irq<0, then it will fail at check
>> "index < 0" in allocate_and_map_gsi_pirq and get EINVAL, logic is
>> the same as original code.
> 
> There we have
> 
>     int pirq = XEN_PT_UNASSIGNED_PIRQ;
> 
> (with XEN_PT_UNASSIGNED_PIRQ being -1) and then
> 
>     rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
> 
> Therefore ...
> 
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
>>      map.domid = domid;
>>      map.type = MAP_PIRQ_TYPE_GSI;
>>      map.index = index;
>> -    map.pirq = *pirq < 0 ? index : *pirq;
>> +    map.pirq = *pirq;
>>  
>>      rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
> 
> ... this very much looks like a change in behavior to me: *pirq is
> negative, and hence index would have been put in map.pirq instead. While
> with your change we'd then pass -1, i.e. requesting to obtain a new
> pIRQ.
> 
> I also consider it questionable to go by in-tree users. I think proof of
> no functional change needs to also consider possible out-of-tree users,
> not the least seeing the Python binding below (even if right there you
> indeed attempt to retain prior behavior). The one aspect in your favor
> is that libxc isn't considered to have a stable ABI.
> 
> Overall I see little room to avoid introducing a new function with this
> improved behavior (maybe xc_physdev_map_pirq_gsi()). Ideally existing
> callers would then be switched, to eventually allow removing the old
> function (thus cleanly and noticeably breaking any out-of-tree users
> that there may be, indicating to their developers that they need to
> adjust their code).
Makes sense; adding a new function xc_physdev_map_pirq_gsi is much better, and it has the least impact.
Thank you very much!
I will add xc_physdev_map_pirq_gsi in the next version.

> 
>> --- a/tools/python/xen/lowlevel/xc/xc.c
>> +++ b/tools/python/xen/lowlevel/xc/xc.c
>> @@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self,
>>      if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
>>                                        &dom, &index, &pirq) )
>>          return NULL;
>> +    if ( pirq < 0 )
>> +        pirq = index;
>>      ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq);
>>      if ( ret != 0 )
>>            return pyxc_error_to_exception(xc->xc_handle);
> 
> I question this change, yet without Cc-ing the maintainer (now added)
> you're not very likely to get a comment (let alone an ack) on this.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq
  2024-07-09 13:26   ` Jan Beulich
  2024-07-10  7:55     ` Chen, Jiqian
@ 2024-08-01 12:55     ` Roger Pau Monné
  1 sibling, 0 replies; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-01 12:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jiqian Chen, Andrew Cooper, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel,
	Marek Marczykowski

On Tue, Jul 09, 2024 at 03:26:31PM +0200, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
> > Hypercall PHYSDEVOP_map_pirq support to map a gsi into a specific
> > pirq or a free pirq, it depends on the parameter pirq(>0 or <0).
> > But in current xc_physdev_map_pirq, it set *pirq=index when
> > parameter pirq is <0, it causes to force all cases to be mapped
> > to a specific pirq. That has some problems, one is caller can't
> > get a free pirq value, another is that once the specific pirq was
> > already mapped to other gsi, then it will fail.
> > 
> > So, change xc_physdev_map_pirq to allow to pass negative parameter
> > in and then get a free pirq.
> > 
> > There are four caller of xc_physdev_map_pirq in original codes, so
> > clarify the affect below(just need to clarify the pirq<0 case):
> > 
> > First, pci_add_dm_done->xc_physdev_map_pirq, it pass irq to pirq
> > parameter, if pirq<0 means irq<0, then it will fail at check
> > "index < 0" in allocate_and_map_gsi_pirq and get EINVAL, logic is
> > the same as original code.
> 
> There we have
> 
>     int pirq = XEN_PT_UNASSIGNED_PIRQ;
> 
> (with XEN_PT_UNASSIGNED_PIRQ being -1) and then
> 
>     rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
> 
> Therefore ...
> 
> > --- a/tools/libs/ctrl/xc_physdev.c
> > +++ b/tools/libs/ctrl/xc_physdev.c
> > @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
> >      map.domid = domid;
> >      map.type = MAP_PIRQ_TYPE_GSI;
> >      map.index = index;
> > -    map.pirq = *pirq < 0 ? index : *pirq;
> > +    map.pirq = *pirq;
> >  
> >      rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
> 
> ... this very much looks like a change in behavior to me: *pirq is
> negative, and hence index would have been put in map.pirq instead. While
> with your change we'd then pass -1, i.e. requesting to obtain a new
> pIRQ.
> 
> I also consider it questionable to go by in-tree users. I think proof of
> no functional change needs to also consider possible out-of-tree users,
> not the least seeing the Python binding below (even if right there you
> indeed attempt to retain prior behavior). The one aspect in your favor
> is that libxc isn't considered to have a stable ABI.

FWIW, it seems this forced identity mapping was introduced to overcome
a regression in xend as a result of an XSA:

934a5253d932 fix XSA-46 regression with xend/xm

Not sure however if other tools have since then come to rely on this
behavior.

> Overall I see little room to avoid introducing a new function with this
> improved behavior (maybe xc_physdev_map_pirq_gsi()). Ideally existing
> callers would then be switched, to eventually allow removing the old
> function (thus cleanly and noticeably breaking any out-of-tree users
> that there may be, indicating to their developers that they need to
> adjust their code).

I'm fine with the naming.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (4 preceding siblings ...)
  2024-07-08 11:41 ` [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-08 13:27   ` Anthony PERARD
  2024-08-01 13:01   ` Roger Pau Monné
  2024-07-08 11:41 ` [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
  6 siblings, 2 replies; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When passing through a device to a domU, QEMU and the xl tools use its
gsi number to do pirq mapping (see QEMU code
xen_pt_realize->xc_physdev_map_pirq and xl code
pci_add_dm_done->xc_physdev_map_pirq), but the gsi number is obtained
from the file /sys/bus/pci/devices/<sbdf>/irq. That is wrong: irq is
not equal to gsi, as they are in different spaces, so pirq mapping
fails.

Also, the current code gives userspace no method to get the gsi.
For the above purpose, add a new function to get the gsi; the
corresponding ioctl is implemented on the Linux kernel side.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
This patch must be merged after the patch on linux kernel side

CC: Anthony PERARD <anthony@xenproject.org>
Remaining comment @Anthony PERARD:
Do I need to make " opening of /dev/xen/privcmd " as a single function, then use it in this
patch and other libraries?
---
 tools/include/xen-sys/Linux/privcmd.h |  7 ++++++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/ctrl/xc_physdev.c          | 35 +++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..4cf719102116 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
 	__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_pcidev {
+	__u32 sbdf;
+	__u32 gsi;
+} privcmd_gsi_from_pcidev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
 	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
 	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV				\
+	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
 	_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..3720e22b399a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index e9fcd755fa62..54edb0f3c0dc 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
+{
+    int rc = -1;
+
+#if defined(__linux__)
+    int fd;
+    privcmd_gsi_from_pcidev_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = 0,
+    };
+
+    fd = open("/dev/xen/privcmd", O_RDWR);
+
+    if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
+        /* Fallback to /proc/xen/privcmd */
+        fd = open("/proc/xen/privcmd", O_RDWR);
+    }
+
+    if (fd < 0) {
+        PERROR("Could not obtain handle on privileged command interface");
+        return rc;
+    }
+
+    rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
+    close(fd);
+
+    if (rc) {
+        PERROR("Failed to get gsi from dev");
+    } else {
+        rc = dev_gsi.gsi;
+    }
+#endif
+
+    return rc;
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
  2024-07-08 11:41 ` [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-07-08 13:27   ` Anthony PERARD
  2024-07-09  3:35     ` Chen, Jiqian
  2024-08-01 13:01   ` Roger Pau Monné
  1 sibling, 1 reply; 76+ messages in thread
From: Anthony PERARD @ 2024-07-08 13:27 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:23PM +0800, Jiqian Chen wrote:
> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
> index e9fcd755fa62..54edb0f3c0dc 100644
> --- a/tools/libs/ctrl/xc_physdev.c
> +++ b/tools/libs/ctrl/xc_physdev.c
> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
> +{
> +    int rc = -1;
> +
> +#if defined(__linux__)
> +    int fd;
> +    privcmd_gsi_from_pcidev_t dev_gsi = {
> +        .sbdf = sbdf,
> +        .gsi = 0,
> +    };
> +
> +    fd = open("/dev/xen/privcmd", O_RDWR);


You could reuse the already opened fd from libxencall:
    xencall_fd(xch->xcall)

> +
> +    if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
> +        /* Fallback to /proc/xen/privcmd */
> +        fd = open("/proc/xen/privcmd", O_RDWR);
> +    }
> +
> +    if (fd < 0) {
> +        PERROR("Could not obtain handle on privileged command interface");
> +        return rc;
> +    }
> +
> +    rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);

I think this would be better implemented in Linux only C file instead of
using #define. There's already "xc_linux.c" which is probably good
enough to be used here.

Implementation for other OS would just set errno to ENOSYS and
return -1.
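
A minimal sketch of the non-Linux fallback described above, assuming it
lives in an OS-specific source file rather than behind an #if in common
code. The function name sketch_gsi_from_pcidev_unsupported is
hypothetical and the xc_interface argument is left out so the example
stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stub for an OS without the privcmd gsi ioctl: report
 * "not implemented" through errno and fail with -1.
 */
int sketch_gsi_from_pcidev_unsupported(uint32_t sbdf)
{
    (void)sbdf;      /* the device is irrelevant when unsupported */
    errno = ENOSYS;
    return -1;
}

/* Usage example: a caller can tell "unsupported" from other failures. */
void sketch_unsupported_example(void)
{
    errno = 0;
    assert(sketch_gsi_from_pcidev_unsupported(0) == -1);
    assert(errno == ENOSYS);
}
```

Since a valid gsi is non-negative, returning -1 with errno set keeps the
error path distinct from any real result.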


-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
  2024-07-08 13:27   ` Anthony PERARD
@ 2024-07-09  3:35     ` Chen, Jiqian
  2024-07-29 16:30       ` Anthony PERARD
  0 siblings, 1 reply; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-09  3:35 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/7/8 21:27, Anthony PERARD wrote:
> On Mon, Jul 08, 2024 at 07:41:23PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index e9fcd755fa62..54edb0f3c0dc 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>      return rc;
>>  }
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
>> +{
>> +    int rc = -1;
>> +
>> +#if defined(__linux__)
>> +    int fd;
>> +    privcmd_gsi_from_pcidev_t dev_gsi = {
>> +        .sbdf = sbdf,
>> +        .gsi = 0,
>> +    };
>> +
>> +    fd = open("/dev/xen/privcmd", O_RDWR);
> 
> 
> You could reuse the already opened fd from libxencall:
>     xencall_fd(xch->xcall)
Do I need to check whether this fd is < 0?

> 
>> +
>> +    if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
>> +        /* Fallback to /proc/xen/privcmd */
>> +        fd = open("/proc/xen/privcmd", O_RDWR);
>> +    }
>> +
>> +    if (fd < 0) {
>> +        PERROR("Could not obtain handle on privileged command interface");
>> +        return rc;
>> +    }
>> +
>> +    rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
> 
> I think this would be better implemented in Linux only C file instead of
> using #define. There's already "xc_linux.c" which is probably good
> enough to be used here.
> 
> Implementation for other OS would just set errno to ENOSYS and
> return -1.
Thanks, will change in next version.

> 
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
  2024-07-09  3:35     ` Chen, Jiqian
@ 2024-07-29 16:30       ` Anthony PERARD
  0 siblings, 0 replies; 76+ messages in thread
From: Anthony PERARD @ 2024-07-29 16:30 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray

On Tue, Jul 09, 2024 at 03:35:57AM +0000, Chen, Jiqian wrote:
> On 2024/7/8 21:27, Anthony PERARD wrote:
> > You could reuse the already opened fd from libxencall:
> >     xencall_fd(xch->xcall)
> > Do I need to check whether this fd is < 0?

No, it should be good to use.

Cheers,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
  2024-07-08 11:41 ` [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev Jiqian Chen
  2024-07-08 13:27   ` Anthony PERARD
@ 2024-08-01 13:01   ` Roger Pau Monné
  2024-08-02  3:13     ` Chen, Jiqian
  1 sibling, 1 reply; 76+ messages in thread
From: Roger Pau Monné @ 2024-08-01 13:01 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:23PM +0800, Jiqian Chen wrote:
> When passthrough a device to domU, QEMU and xl tools use its gsi
> number to do pirq mapping, see QEMU code
> xen_pt_realize->xc_physdev_map_pirq, and xl code
> pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is got
> from file /sys/bus/pci/devices/<sbdf>/irq, that is wrong, because
> irq is not equal with gsi, they are in different spaces, so pirq
> mapping fails.
> 
> And in current codes, there is no method to get gsi for userspace.
> For above purpose, add new function to get gsi, and the
> corresponding ioctl is implemented on linux kernel side.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> ---
> RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
> https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
> This patch must be merged after the patch on linux kernel side
> 
> CC: Anthony PERARD <anthony@xenproject.org>
> Remaining comment @Anthony PERARD:
> Should I factor the opening of /dev/xen/privcmd out into a single function, then use it in this
> patch and other libraries?
> ---
>  tools/include/xen-sys/Linux/privcmd.h |  7 ++++++
>  tools/include/xenctrl.h               |  2 ++
>  tools/libs/ctrl/xc_physdev.c          | 35 +++++++++++++++++++++++++++
>  3 files changed, 44 insertions(+)
> 
> diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
> index bc60e8fd55eb..4cf719102116 100644
> --- a/tools/include/xen-sys/Linux/privcmd.h
> +++ b/tools/include/xen-sys/Linux/privcmd.h
> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>  	__u64 addr;
>  } privcmd_mmap_resource_t;
>  
> +typedef struct privcmd_gsi_from_pcidev {
> +	__u32 sbdf;
> +	__u32 gsi;
> +} privcmd_gsi_from_pcidev_t;
> +
>  /*
>   * @cmd: IOCTL_PRIVCMD_HYPERCALL
>   * @arg: &privcmd_hypercall_t
> @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
>  	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
>  #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
>  	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
> +#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV				\
> +	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
>  #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
>  	_IOC(_IOC_NONE, 'P', 0xFF, 0)
>  
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index 9ceca0cffc2f..3720e22b399a 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>                            uint32_t domid,
>                            int pirq);
>  
> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
> +
>  /*
>   *  LOGGING AND ERROR REPORTING
>   */
> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
> index e9fcd755fa62..54edb0f3c0dc 100644
> --- a/tools/libs/ctrl/xc_physdev.c
> +++ b/tools/libs/ctrl/xc_physdev.c
> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)

FWIW, I'm not sure it's fine to use the xc_physdev prefix here, as
this is not a PHYSDEVOP hypercall.

As Anthony suggested, it would be better placed in xc_linux.c, and
possibly named xc_pcidev_get_gsi() or similar, to avoid polluting the
xc_physdev namespace.
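
A minimal sketch of what such a helper might look like, assuming the
struct layout and ioctl number from the privcmd.h hunk quoted above.
The name sketch_pcidev_get_gsi() is hypothetical, and the file
descriptor is taken as a parameter here; inside libxc it would
presumably come from xencall_fd(xch->xcall), as Anthony suggested,
rather than from a fresh open of /dev/xen/privcmd.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>   /* _IOC, _IOC_NONE */

/* Copied from the privcmd.h hunk in this patch. */
typedef struct privcmd_gsi_from_pcidev {
    uint32_t sbdf;   /* in: segment/bus/device/function of the PCI device */
    uint32_t gsi;    /* out: filled in by the kernel */
} privcmd_gsi_from_pcidev_t;

#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV \
    _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))

/*
 * Hypothetical helper (not an existing libxc function): returns the
 * gsi (>= 0) on success, or -1 with errno set by ioctl() on failure.
 */
int sketch_pcidev_get_gsi(int privcmd_fd, uint32_t sbdf)
{
    privcmd_gsi_from_pcidev_t dev_gsi = { .sbdf = sbdf, .gsi = 0 };

    if (ioctl(privcmd_fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi) < 0)
        return -1;

    return (int)dev_gsi.gsi;
}
```

With an invalid descriptor (for example -1) the call fails and errno
is EBADF, so the caller can tell a bad fd from other failures.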

Thanks, Roger.



* Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
  2024-08-01 13:01   ` Roger Pau Monné
@ 2024-08-02  3:13     ` Chen, Jiqian
  0 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-08-02  3:13 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Chen, Jiqian, Huang, Ray

On 2024/8/1 21:01, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:23PM +0800, Jiqian Chen wrote:
>> When passing a device through to a domU, QEMU and the xl tools use
>> its gsi number to do the pirq mapping (see QEMU code
>> xen_pt_realize->xc_physdev_map_pirq and xl code
>> pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is
>> read from the file /sys/bus/pci/devices/<sbdf>/irq, which is wrong:
>> irq and gsi are not equal and live in different number spaces, so
>> the pirq mapping fails.
>>
>> The current code gives userspace no way to get the gsi. To fix
>> that, add a new function to get the gsi; the corresponding ioctl
>> is implemented on the Linux kernel side.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
>> ---
>> RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
>> https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
>> This patch must be merged after the patch on linux kernel side
>>
>> CC: Anthony PERARD <anthony@xenproject.org>
>> Remaining comment @Anthony PERARD:
>> Should I factor the opening of /dev/xen/privcmd out into a single function, then use it in this
>> patch and other libraries?
>> ---
>>  tools/include/xen-sys/Linux/privcmd.h |  7 ++++++
>>  tools/include/xenctrl.h               |  2 ++
>>  tools/libs/ctrl/xc_physdev.c          | 35 +++++++++++++++++++++++++++
>>  3 files changed, 44 insertions(+)
>>
>> diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
>> index bc60e8fd55eb..4cf719102116 100644
>> --- a/tools/include/xen-sys/Linux/privcmd.h
>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>  	__u64 addr;
>>  } privcmd_mmap_resource_t;
>>  
>> +typedef struct privcmd_gsi_from_pcidev {
>> +	__u32 sbdf;
>> +	__u32 gsi;
>> +} privcmd_gsi_from_pcidev_t;
>> +
>>  /*
>>   * @cmd: IOCTL_PRIVCMD_HYPERCALL
>>   * @arg: &privcmd_hypercall_t
>> @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
>>  	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
>>  #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
>>  	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
>> +#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV				\
>> +	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
>>  #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
>>  	_IOC(_IOC_NONE, 'P', 0xFF, 0)
>>  
>> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
>> index 9ceca0cffc2f..3720e22b399a 100644
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>                            uint32_t domid,
>>                            int pirq);
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
>> +
>>  /*
>>   *  LOGGING AND ERROR REPORTING
>>   */
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index e9fcd755fa62..54edb0f3c0dc 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>      return rc;
>>  }
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
> 
> FWIW, I'm not sure it's fine to use the xc_physdev prefix here, as
> this is not a PHYSDEVOP hypercall.
> 
> As Anthony suggested, it would be better placed in xc_linux.c, and
> possibly named xc_pcidev_get_gsi() or similar, to avoid polluting the
> xc_physdev namespace.
Thanks, will change in next version.

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.


* [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0
  2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (5 preceding siblings ...)
  2024-07-08 11:41 ` [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-07-08 11:41 ` Jiqian Chen
  2024-07-08 14:57   ` Anthony PERARD
  6 siblings, 1 reply; 76+ messages in thread
From: Jiqian Chen @ 2024-07-08 11:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When dom0 is PVH and a device is passed through to a domU, xl uses
the device's gsi number to do a pirq mapping (see
pci_add_dm_done->xc_physdev_map_pirq). However, the number is read
from the file /sys/bus/pci/devices/<sbdf>/irq, which conflates irq
and gsi: the two live in different number spaces and are not equal,
so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_pcidev to get the real
gsi and then map the pirq.

Besides, a PVH dom0 doesn't have the PIRQ flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. As a result, the grant call chain
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission
also requires a pirq to be passed in, so it is not suitable for a
dom0 that has no PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant permission for the irq
(translated from the gsi) to the domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xenctrl.h       |   5 ++
 tools/libs/ctrl/xc_domain.c   |  15 +++++
 tools/libs/light/libxl_arch.h |   4 ++
 tools/libs/light/libxl_arm.c  |  10 +++
 tools/libs/light/libxl_pci.c  |  17 ++++++
 tools/libs/light/libxl_x86.c  | 111 ++++++++++++++++++++++++++++++++++
 6 files changed, 162 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3720e22b399a..9ff5f1810cf8 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint8_t access_flag);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..4c89f07e4d6e 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint8_t access_flag)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.access_flag = access_flag,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..11b736067951 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
                                       libxl_domain_config *dst,
                                       const libxl_domain_config *src);
 
+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee00000
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..d869bbec769e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }
 
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    return -1;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..3d25997921cc 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 #define PCI_BDF                "%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT          "%02x:%02x.%01x"
@@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
     fclose(f);
     if (!pci_supp_legacy_irq())
         goto out_no_irq;
+
+    /*
+     * When dom0 is PVH and mapping a x86 gsi to pirq for domU,
+     * should use gsi to grant irq permission.
+     */
+    if (!libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid))
+        goto pci_permissive;
+    else
+        LOGED(WARN, domid, "libxl__arch_hvm_map_gsi failed (err=%d)", errno);
+
     sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
                                 pci->bus, pci->dev, pci->func);
     f = fopen(sysfs_path, "r");
@@ -1505,6 +1516,7 @@ static void pci_add_dm_done(libxl__egc *egc,
     }
     fclose(f);
 
+pci_permissive:
     /* Don't restrict writes to the PCI config space from this VM */
     if (pci->permissive) {
         if ( sysfs_write_bdf(gc, SYSFS_PCIBACK_DRIVER"/permissive",
@@ -2229,6 +2241,11 @@ skip_bar:
     if (!pci_supp_legacy_irq())
         goto skip_legacy_irq;
 
+    if (!libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid))
+        goto skip_legacy_irq;
+    else
+        LOGED(WARN, domid, "libxl__arch_hvm_unmap_gsi failed (err=%d)", errno);
+
     sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
                            pci->bus, pci->dev, pci->func);
 
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 60643d6f5376..e7756d323cb6 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -879,6 +879,117 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
                                  libxl_defbool_val(src->b_info.u.hvm.pirq));
 }
 
+struct pcidev_map_pirq {
+    uint32_t sbdf;
+    uint32_t pirq;
+    XEN_LIST_ENTRY(struct pcidev_map_pirq) entry;
+};
+
+static pthread_mutex_t pcidev_pirq_mutex = PTHREAD_MUTEX_INITIALIZER;
+static XEN_LIST_HEAD(, struct pcidev_map_pirq) pcidev_pirq_list =
+    XEN_LIST_HEAD_INITIALIZER(pcidev_pirq_list);
+
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    int pirq = -1, gsi, r;
+    xc_domaininfo_t info;
+    struct pcidev_map_pirq *pcidev_pirq;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+
+    r = xc_domain_getinfo_single(ctx->xch, LIBXL_TOOLSTACK_DOMID, &info);
+    if (r < 0) {
+        LOGED(ERROR, domid, "getdomaininfo failed (error=%d)", errno);
+        return r;
+    }
+    if ((info.flags & XEN_DOMINF_hvm_guest) &&
+        !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ)) {
+        gsi = xc_physdev_gsi_from_pcidev(ctx->xch, sbdf);
+        if (gsi < 0) {
+            return ERROR_FAIL;
+        }
+        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+        if (r < 0) {
+            LOGED(ERROR, domid, "xc_physdev_map_pirq gsi=%d (error=%d)",
+                  gsi, errno);
+            return r;
+        }
+        r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+        if (r < 0) {
+            LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d (error=%d)",
+                  gsi, errno);
+            return r;
+        }
+    } else {
+        return ERROR_FAIL;
+    }
+
+    /* Save the pirq for the usage of unmapping */
+    pcidev_pirq = malloc(sizeof(struct pcidev_map_pirq));
+    if (!pcidev_pirq) {
+        LOGED(ERROR, domid, "no memory for saving pirq of pcidev info");
+        return ERROR_NOMEM;
+    }
+    pcidev_pirq->sbdf = sbdf;
+    pcidev_pirq->pirq = pirq;
+
+    assert(!pthread_mutex_lock(&pcidev_pirq_mutex));
+    XEN_LIST_INSERT_HEAD(&pcidev_pirq_list, pcidev_pirq, entry);
+    assert(!pthread_mutex_unlock(&pcidev_pirq_mutex));
+
+    return 0;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    int pirq = -1, gsi, r;
+    xc_domaininfo_t info;
+    struct pcidev_map_pirq *pcidev_pirq;
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+
+    r = xc_domain_getinfo_single(ctx->xch, LIBXL_TOOLSTACK_DOMID, &info);
+    if (r < 0) {
+        LOGED(ERROR, domid, "getdomaininfo failed (error=%d)", errno);
+        return r;
+    }
+    if ((info.flags & XEN_DOMINF_hvm_guest) &&
+        !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ)) {
+        gsi = xc_physdev_gsi_from_pcidev(ctx->xch, sbdf);
+        if (gsi < 0) {
+            return ERROR_FAIL;
+        }
+        assert(!pthread_mutex_lock(&pcidev_pirq_mutex));
+        XEN_LIST_FOREACH(pcidev_pirq, &pcidev_pirq_list, entry) {
+            if (pcidev_pirq->sbdf == sbdf) {
+                pirq = pcidev_pirq->pirq;
+                XEN_LIST_REMOVE(pcidev_pirq, entry);
+                free(pcidev_pirq);
+                break;
+            }
+        }
+        assert(!pthread_mutex_unlock(&pcidev_pirq_mutex));
+        if (pirq < 0) {
+            /* pirq has been unmapped, so return directly */
+            return 0;
+        }
+        r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+        if (r < 0) {
+            LOGED(ERROR, domid, "xc_physdev_unmap_pirq pirq=%d (error=%d)",
+                  pirq, errno);
+            return r;
+        }
+        r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 0);
+        if (r < 0) {
+            LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d (error=%d)",
+                  gsi, errno);
+            return r;
+        }
+    } else {
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.34.1




* Re: [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0
  2024-07-08 11:41 ` [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
@ 2024-07-08 14:57   ` Anthony PERARD
  2024-07-09  6:18     ` Chen, Jiqian
  0 siblings, 1 reply; 76+ messages in thread
From: Anthony PERARD @ 2024-07-08 14:57 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Mon, Jul 08, 2024 at 07:41:24PM +0800, Jiqian Chen wrote:
> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> index a4029e3ac810..d869bbec769e 100644
> --- a/tools/libs/light/libxl_arm.c
> +++ b/tools/libs/light/libxl_arm.c
> @@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
>  {
>  }
>  
> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
> +{
> +    return -1;

It's best to return an ERROR_* libxl error code instead of -1.
ERROR_NI seems to be the one; it probably means not-implemented. Or
maybe ERROR_INVAL would do too.

> +}
> +
> +int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
> +{
> +    return -1;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 96cb4da0794e..3d25997921cc 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -17,6 +17,7 @@
>  #include "libxl_osdeps.h" /* must come before any other headers */
>  
>  #include "libxl_internal.h"
> +#include "libxl_arch.h"
>  
>  #define PCI_BDF                "%04x:%02x:%02x.%01x"
>  #define PCI_BDF_SHORT          "%02x:%02x.%01x"
> @@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
>      fclose(f);
>      if (!pci_supp_legacy_irq())
>          goto out_no_irq;
> +
> +    /*
> +     * When dom0 is PVH and mapping a x86 gsi to pirq for domU,
> +     * should use gsi to grant irq permission.
> +     */
> +    if (!libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid))

Could you store the result of libxl__arch_hvm_map_gsi() in `rc', then
test that in the condition?

> +        goto pci_permissive;

Why do you skip part of the function on success?
Also, please avoid the "goto" coding style. In libxl, it's tolerated
for error handling, when used to skip to the end of a function so that
there is a single path (or error path) out of the function.

> +    else
> +        LOGED(WARN, domid, "libxl__arch_hvm_map_gsi failed (err=%d)", errno);

No one reads logs unless there's a failure or something doesn't work. So
here we just ignore the failure returned by libxl__arch_hvm_map_gsi(); is
that the right thing to do? Usually, just ignoring an error is wrong.

FYI: LOGE* already logs errno.

> +
>      sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>                                  pci->bus, pci->dev, pci->func);
>      f = fopen(sysfs_path, "r");
> @@ -1505,6 +1516,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>      }
>      fclose(f);
>  
> +pci_permissive:
>      /* Don't restrict writes to the PCI config space from this VM */
>      if (pci->permissive) {
>          if ( sysfs_write_bdf(gc, SYSFS_PCIBACK_DRIVER"/permissive",
> @@ -2229,6 +2241,11 @@ skip_bar:
>      if (!pci_supp_legacy_irq())
>          goto skip_legacy_irq;
>  
> +    if (!libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid))
> +        goto skip_legacy_irq;
> +    else
> +        LOGED(WARN, domid, "libxl__arch_hvm_unmap_gsi failed (err=%d)", errno);
> +
>      sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>                             pci->bus, pci->dev, pci->func);
>  
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 60643d6f5376..e7756d323cb6 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -879,6 +879,117 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
>                                   libxl_defbool_val(src->b_info.u.hvm.pirq));
>  }
>  
> +struct pcidev_map_pirq {
> +    uint32_t sbdf;
> +    uint32_t pirq;
> +    XEN_LIST_ENTRY(struct pcidev_map_pirq) entry;
> +};
> +
> +static pthread_mutex_t pcidev_pirq_mutex = PTHREAD_MUTEX_INITIALIZER;
> +static XEN_LIST_HEAD(, struct pcidev_map_pirq) pcidev_pirq_list =
> +    XEN_LIST_HEAD_INITIALIZER(pcidev_pirq_list);
> +
> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
> +{
> +    int pirq = -1, gsi, r;
> +    xc_domaininfo_t info;
> +    struct pcidev_map_pirq *pcidev_pirq;
> +    libxl_ctx *ctx = libxl__gc_owner(gc);

Instead of declaring "ctx", you can use the macro "CTX" when you need
"ctx".

> +
> +    r = xc_domain_getinfo_single(ctx->xch, LIBXL_TOOLSTACK_DOMID, &info);
> +    if (r < 0) {
> +        LOGED(ERROR, domid, "getdomaininfo failed (error=%d)", errno);
> +        return r;

libxl_*() functions should return only libxl error codes, that is, a
return code from another libxl_* function (usually stored in 'rc') or
one of ERROR_*.

> +    }
> +    if ((info.flags & XEN_DOMINF_hvm_guest) &&
> +        !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ)) {
> +        gsi = xc_physdev_gsi_from_pcidev(ctx->xch, sbdf);
> +        if (gsi < 0) {
> +            return ERROR_FAIL;
> +        }
> +        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
> +        if (r < 0) {
> +            LOGED(ERROR, domid, "xc_physdev_map_pirq gsi=%d (error=%d)",
> +                  gsi, errno);
> +            return r;
> +        }
> +        r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
> +        if (r < 0) {
> +            LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d (error=%d)",
> +                  gsi, errno);
> +            return r;
> +        }
> +    } else {
> +        return ERROR_FAIL;

Is it really an error?

A few values could be returned here:
  * ERROR_INVAL, meaning that the function was called on a dom0 that
    doesn't do "GSI",
  * 0, that is success: the function checks whether it needs to do
    anything, and since there's nothing to do, we can return success.
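
Putting the comments so far together, the return-code handling could
look roughly like the sketch below. It is self-contained and uses
stand-ins throughout: the SK_* codes are placeholders for libxl's
ERROR_* values, and struct sketch_env replaces the real libxc calls,
so none of these names exist in libxl.

```c
#include <stdbool.h>

/* Placeholder codes; real code uses libxl's ERROR_FAIL / ERROR_INVAL. */
enum { SK_ERROR_FAIL = -3, SK_ERROR_INVAL = -6 };

/* Hypothetical inputs standing in for the toolstack environment. */
struct sketch_env {
    bool dom0_uses_gsi;   /* e.g. PVH dom0 without PIRQ emulation */
    int map_result;       /* what a libxc-style call would return: 0 or -1 */
};

/* Returns only 0 or a libxl-style error code, never a raw libxc result. */
static int sketch_map_gsi(const struct sketch_env *env)
{
    int r;

    if (!env->dom0_uses_gsi)
        return SK_ERROR_INVAL;   /* not applicable on this dom0 */

    r = env->map_result;         /* stands in for xc_physdev_map_pirq() etc. */
    if (r < 0)
        return SK_ERROR_FAIL;    /* translate instead of "return r" */

    return 0;
}

/* Caller: keep the result in rc and act on it rather than ignoring it. */
static int sketch_add_device(const struct sketch_env *env,
                             bool *used_legacy_path)
{
    int rc = sketch_map_gsi(env);

    *used_legacy_path = false;
    if (rc == SK_ERROR_INVAL) {
        /* Nothing GSI-specific to do: fall back to the existing irq path. */
        *used_legacy_path = true;
        rc = 0;
    }

    return rc;   /* any other failure propagates to the caller */
}
```

The point of the split is that "not applicable" and "failed" become
distinguishable, so the caller no longer needs to log a warning and
carry on regardless.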

> +    }
> +
> +    /* Save the pirq for the usage of unmapping */
> +    pcidev_pirq = malloc(sizeof(struct pcidev_map_pirq));
> +    if (!pcidev_pirq) {
> +        LOGED(ERROR, domid, "no memory for saving pirq of pcidev info");
> +        return ERROR_NOMEM;
> +    }
> +    pcidev_pirq->sbdf = sbdf;
> +    pcidev_pirq->pirq = pirq;
> +
> +    assert(!pthread_mutex_lock(&pcidev_pirq_mutex));
> +    XEN_LIST_INSERT_HEAD(&pcidev_pirq_list, pcidev_pirq, entry);

I don't think that's going to work as you expect. libxl isn't a daemon
(or sometimes it is, but then it is used for several domains), so anything
stored in memory will be lost, or would be shared with other guests.

Do you need this sbdf <-> pirq mapping? Is there a way to query this
information later from the environment? If not, you will need to store
the data somewhere else, probably in "libxl_domain_config *d_config", as
libxl can retrieve the data with libxl__get_domain_configuration().
There's also the possibility of storing that info in xenstore, but we
should probably avoid that.

Thanks,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



* Re: [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0
  2024-07-08 14:57   ` Anthony PERARD
@ 2024-07-09  6:18     ` Chen, Jiqian
  0 siblings, 0 replies; 76+ messages in thread
From: Chen, Jiqian @ 2024-07-09  6:18 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/7/8 22:57, Anthony PERARD wrote:
> On Mon, Jul 08, 2024 at 07:41:24PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>> index a4029e3ac810..d869bbec769e 100644
>> --- a/tools/libs/light/libxl_arm.c
>> +++ b/tools/libs/light/libxl_arm.c
>> @@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
>>  {
>>  }
>>  
>> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +    return -1;
> 
> It's best to return an ERROR_* libxl error code instead of -1.
> ERROR_NI seems to be the one; it probably means not-implemented. Or
> maybe ERROR_INVAL would do too.
Seems ERROR_INVAL is more suitable. Will change in next version.

> 
>> +}
>> +
>> +int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +    return -1;
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 96cb4da0794e..3d25997921cc 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -17,6 +17,7 @@
>>  #include "libxl_osdeps.h" /* must come before any other headers */
>>  
>>  #include "libxl_internal.h"
>> +#include "libxl_arch.h"
>>  
>>  #define PCI_BDF                "%04x:%02x:%02x.%01x"
>>  #define PCI_BDF_SHORT          "%02x:%02x.%01x"
>> @@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      fclose(f);
>>      if (!pci_supp_legacy_irq())
>>          goto out_no_irq;
>> +
>> +    /*
>> +     * When dom0 is PVH and mapping a x86 gsi to pirq for domU,
>> +     * should use gsi to grant irq permission.
>> +     */
>> +    if (!libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid))
> 
> Could you store the result of libxl__arch_hvm_map_gsi() in `rc', then
> test that in the condition?
Will change in next version.
> 
>> +        goto pci_permissive;
> 
> Why do you skip part of the function on success?
Because libxl__arch_hvm_map_gsi does the same thing for a PVH dom0, and the following part is for a PV dom0.
If libxl__arch_hvm_map_gsi succeeds, the following part should be skipped.

> Also, please avoid the "goto" coding style. In libxl, it's tolerated
> for error handling, when used to skip to the end of a function so that
> there is a single path (or error path) out of the function.
Maybe I should split the " xc_domain_getinfo_single(ctx->xch, LIBXL_TOOLSTACK_DOMID, &info); " part of libxl__arch_hvm_map_gsi out into a separate function.
Then I can distinguish PVH from PV and do different things for each.
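
Such a helper could be as small as the predicate below, which reuses
the condition already present in libxl__arch_hvm_map_gsi. This is only
a sketch: the function name and the SK_* bit values are illustrative
assumptions, and real code would take the flags from
xc_domain_getinfo_single() and the bit definitions from Xen's public
headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for XEN_DOMINF_hvm_guest / XEN_X86_EMU_USE_PIRQ. */
#define SK_DOMINF_HVM_GUEST  (1u << 1)
#define SK_X86_EMU_USE_PIRQ  (1u << 9)

/*
 * True when the toolstack domain is HVM-like (PVH) and has no PIRQ
 * emulation, i.e. the case where the GSI-based path applies.
 */
static bool sketch_dom0_lacks_pirq(uint32_t dominfo_flags,
                                   uint32_t emulation_flags)
{
    return (dominfo_flags & SK_DOMINF_HVM_GUEST) &&
           !(emulation_flags & SK_X86_EMU_USE_PIRQ);
}
```

pci_add_dm_done could then branch on the result once, instead of
calling the GSI path and jumping over the PV code on success.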

> 
>> +    else
>> +        LOGED(WARN, domid, "libxl__arch_hvm_map_gsi failed (err=%d)", errno);
> 
> No one reads logs unless there's a failure or something doesn't work. So
> here we just ignore the failure returned by libxl__arch_hvm_map_gsi(); is
> that the right thing to do? Usually, just ignoring an error is wrong.
Will change in next version.
> 
> FYI: LOGE* already logs errno.
> 
>> +
>>      sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>>                                  pci->bus, pci->dev, pci->func);
>>      f = fopen(sysfs_path, "r");
>> @@ -1505,6 +1516,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      }
>>      fclose(f);
>>  
>> +pci_permissive:
>>      /* Don't restrict writes to the PCI config space from this VM */
>>      if (pci->permissive) {
>>          if ( sysfs_write_bdf(gc, SYSFS_PCIBACK_DRIVER"/permissive",
>> @@ -2229,6 +2241,11 @@ skip_bar:
>>      if (!pci_supp_legacy_irq())
>>          goto skip_legacy_irq;
>>  
>> +    if (!libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid))
>> +        goto skip_legacy_irq;
>> +    else
>> +        LOGED(WARN, domid, "libxl__arch_hvm_unmap_gsi failed (err=%d)", errno);
>> +
>>      sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>>                             pci->bus, pci->dev, pci->func);
>>  
>> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
>> index 60643d6f5376..e7756d323cb6 100644
>> --- a/tools/libs/light/libxl_x86.c
>> +++ b/tools/libs/light/libxl_x86.c
>> @@ -879,6 +879,117 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
>>                                   libxl_defbool_val(src->b_info.u.hvm.pirq));
>>  }
>>  
>> +struct pcidev_map_pirq {
>> +    uint32_t sbdf;
>> +    uint32_t pirq;
>> +    XEN_LIST_ENTRY(struct pcidev_map_pirq) entry;
>> +};
>> +
>> +static pthread_mutex_t pcidev_pirq_mutex = PTHREAD_MUTEX_INITIALIZER;
>> +static XEN_LIST_HEAD(, struct pcidev_map_pirq) pcidev_pirq_list =
>> +    XEN_LIST_HEAD_INITIALIZER(pcidev_pirq_list);
>> +
>> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +    int pirq = -1, gsi, r;
>> +    xc_domaininfo_t info;
>> +    struct pcidev_map_pirq *pcidev_pirq;
>> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> 
> Instead of declaring "ctx", you can use the macro "CTX" when you need
> "ctx".
Will change in next version.

> 
>> +
>> +    r = xc_domain_getinfo_single(ctx->xch, LIBXL_TOOLSTACK_DOMID, &info);
>> +    if (r < 0) {
>> +        LOGED(ERROR, domid, "getdomaininfo failed (error=%d)", errno);
>> +        return r;
> 
> libxl_*() functions should return only libxl error codes, that is, a
> return code from another libxl_* function (usually stored in 'rc') or
> one of ERROR_*.
OK, will change in next version.

> 
>> +    }
>> +    if ((info.flags & XEN_DOMINF_hvm_guest) &&
>> +        !(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ)) {
>> +        gsi = xc_physdev_gsi_from_pcidev(ctx->xch, sbdf);
>> +        if (gsi < 0) {
>> +            return ERROR_FAIL;
>> +        }
>> +        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
>> +        if (r < 0) {
>> +            LOGED(ERROR, domid, "xc_physdev_map_pirq gsi=%d (error=%d)",
>> +                  gsi, errno);
>> +            return r;
>> +        }
>> +        r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
>> +        if (r < 0) {
>> +            LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d (error=%d)",
>> +                  gsi, errno);
>> +            return r;
>> +        }
>> +    } else {
>> +        return ERROR_FAIL;
> 
> Is it really an error?
> 
> I few values can be returned here,
>   * ERROR_INVAL meaing that the function was called on a dom0 that don't
>     do "GSI",
I think this is more suitable. And then the following code of PV can be done in pci_add_dm_done.

>   * 0, that is success, because the function checks whether it needs to do
>     anything, and since there's nothing to do, we can return success.
> 
>> +    }
>> +
>> +    /* Save the pirq for the usage of unmapping */
>> +    pcidev_pirq = malloc(sizeof(struct pcidev_map_pirq));
>> +    if (!pcidev_pirq) {
>> +        LOGED(ERROR, domid, "no memory for saving pirq of pcidev info");
>> +        return ERROR_NOMEM;
>> +    }
>> +    pcidev_pirq->sbdf = sbdf;
>> +    pcidev_pirq->pirq = pirq;
>> +
>> +    assert(!pthread_mutex_lock(&pcidev_pirq_mutex));
>> +    XEN_LIST_INSERT_HEAD(&pcidev_pirq_list, pcidev_pirq, entry);
> 
> I don't think that's going to work as you expect. libxl isn't a daemon
> (or sometimes it is, but used for several domains), so anything stored in
> memory will be lost, or would be shared with other guests.
> 
> Do you need this sbdf <-> pirq mapping?
I need to store the pirq that is assigned to the gsi, because libxl__arch_hvm_unmap_gsi needs the pirq to call xc_physdev_unmap_pirq.

> Is there a way to query this information later from the environment?
What I can think of is to call xc_physdev_map_pirq before xc_physdev_unmap_pirq to get the already mapped pirq, but I am not sure whether that is suitable.
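
A rough sketch of that re-query idea, under the assumption (not confirmed in this thread) that mapping an already mapped gsi hands back the existing pirq. fake_map_pirq, fake_unmap_pirq, unmap_gsi_sketch and the gsi_to_pirq table are hypothetical stand-ins and not the libxc API.

```c
#include <stddef.h>

#define NR_GSI 16

/* Hypothetical per-GSI table standing in for hypervisor state; 0 = unmapped. */
static int gsi_to_pirq[NR_GSI];
static int next_free_pirq = 1;

/*
 * Hypothetical stub. ASSUMPTION: mapping a GSI that is already mapped
 * returns the existing pirq instead of allocating a new one.
 */
static int fake_map_pirq(int gsi, int *pirq)
{
    if (gsi < 0 || gsi >= NR_GSI || pirq == NULL)
        return -1;
    if (gsi_to_pirq[gsi] == 0)
        gsi_to_pirq[gsi] = next_free_pirq++;
    *pirq = gsi_to_pirq[gsi];
    return 0;
}

/* Hypothetical stub: drop the mapping that owns this pirq. */
static int fake_unmap_pirq(int pirq)
{
    if (pirq <= 0)
        return -1;
    for (int gsi = 0; gsi < NR_GSI; gsi++) {
        if (gsi_to_pirq[gsi] == pirq) {
            gsi_to_pirq[gsi] = 0;
            return 0;
        }
    }
    return -1;
}

/* Unmap without a cached sbdf <-> pirq list: re-query the pirq, then unmap. */
static int unmap_gsi_sketch(int gsi)
{
    int pirq = -1;

    if (fake_map_pirq(gsi, &pirq) < 0)
        return -1;
    return fake_unmap_pirq(pirq);
}
```

If the assumption holds, nothing has to be kept in process memory between the map and the unmap, which sidesteps the lifetime concern raised about the in-memory list.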

> If not, you will need to store the data somewhere else, probably in "libxl_domain_config *d_config", as
> libxl can retrieve the data with libxl__get_domain_configuration().
However, the pirq is mapped dynamically while the domU is starting, so it may not be suitable for saving in d_config.

> There's also the possibility to store that info in xenstore, but we
> should probably avoid that.
> 
> Thanks,
> 

-- 
Best regards,
Jiqian Chen.


end of thread, other threads:[~2024-08-02 12:05 UTC | newest]

Thread overview: 76+ messages
2024-07-08 11:41 [XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
2024-07-08 11:41 ` [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
2024-07-08 14:56   ` Jan Beulich
2024-07-09  2:47     ` Chen, Jiqian
2024-07-09  6:01       ` Jan Beulich
2024-07-31 15:55   ` Roger Pau Monné
2024-07-31 15:58     ` Jan Beulich
2024-07-31 16:13       ` Roger Pau Monné
2024-08-01  6:49         ` Jan Beulich
2024-08-02  2:56           ` Chen, Jiqian
2024-08-02  2:55     ` Chen, Jiqian
2024-08-02  6:25       ` Jan Beulich
2024-08-02  7:41         ` Chen, Jiqian
2024-08-02  7:43           ` Jan Beulich
2024-08-02  7:44         ` Roger Pau Monné
2024-07-08 11:41 ` [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
2024-07-08 14:58   ` Jan Beulich
2024-07-22 21:37   ` Stefano Stabellini
2024-07-30 13:09   ` Andrew Cooper
2024-07-31  1:47     ` Chen, Jiqian
2024-07-31  8:31     ` Chen, Jiqian
2024-07-31  8:42       ` Jan Beulich
2024-07-31  7:50   ` Roger Pau Monné
2024-07-31  7:58     ` Jan Beulich
2024-07-31  8:24       ` Roger Pau Monné
2024-07-31  8:40         ` Jan Beulich
2024-07-31  8:51           ` Roger Pau Monné
2024-07-31  9:02             ` Jan Beulich
2024-07-31  9:37               ` Roger Pau Monné
2024-07-31  9:55                 ` Jan Beulich
2024-07-31 11:29                   ` Roger Pau Monné
2024-07-31 11:39                     ` Jan Beulich
2024-07-31 13:03                       ` Roger Pau Monné
2024-08-02  2:37                         ` Chen, Jiqian
2024-08-02  8:11                           ` Roger Pau Monné
2024-08-02  8:17                             ` Chen, Jiqian
2024-08-02  8:35                               ` Roger Pau Monné
2024-08-02  8:40                                 ` Chen, Jiqian
2024-08-02  9:17                                   ` Jan Beulich
2024-08-02  9:37                             ` Jan Beulich
2024-07-31  8:39     ` Chen, Jiqian
2024-07-08 11:41 ` [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
2024-07-10  8:01   ` Chen, Jiqian
2024-07-11  7:58   ` Chen, Jiqian
2024-07-22 21:38   ` Stefano Stabellini
2024-07-08 11:41 ` [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
2024-07-09 13:08   ` Jan Beulich
2024-07-26  6:55     ` Chen, Jiqian
2024-07-22 22:10   ` Stefano Stabellini
2024-07-26  6:53     ` Chen, Jiqian
2024-07-26 20:16       ` Stefano Stabellini
2024-08-01 11:06   ` Roger Pau Monné
2024-08-01 11:36     ` Jan Beulich
2024-08-01 12:41       ` Roger Pau Monné
2024-08-01 13:11         ` Jan Beulich
2024-08-02  3:10     ` Chen, Jiqian
2024-08-02  6:27       ` Jan Beulich
2024-08-02  7:44         ` Chen, Jiqian
2024-08-02  7:59       ` Roger Pau Monné
2024-08-02  8:08   ` Roger Pau Monné
2024-08-02  8:23     ` Chen, Jiqian
2024-08-02  9:40     ` Jan Beulich
2024-08-02 12:05       ` Roger Pau Monné
2024-07-08 11:41 ` [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq Jiqian Chen
2024-07-09 13:26   ` Jan Beulich
2024-07-10  7:55     ` Chen, Jiqian
2024-08-01 12:55     ` Roger Pau Monné
2024-07-08 11:41 ` [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev Jiqian Chen
2024-07-08 13:27   ` Anthony PERARD
2024-07-09  3:35     ` Chen, Jiqian
2024-07-29 16:30       ` Anthony PERARD
2024-08-01 13:01   ` Roger Pau Monné
2024-08-02  3:13     ` Chen, Jiqian
2024-07-08 11:41 ` [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
2024-07-08 14:57   ` Anthony PERARD
2024-07-09  6:18     ` Chen, Jiqian
