[XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen

All of lore.kernel.org
 help / color / mirror / Atom feed

* [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen
@ 2024-09-03  7:04 Jiqian Chen
  2024-09-03  7:04 ` [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Jiqian Chen @ 2024-09-03  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

Hi All,
This is v14 series to support passthrough when dom0 is PVH
The expected merge order of this series is the first two patches in this series, then patches on
kernel side, then the last three patches in this series.

v13->v14 changes:
* patch#1: Removed the check ( !is_pci_passthrough_enabled() ).
           Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if the other bits are zero.

* patch#2: Modified the commit message.

Due to the patch#3 of v13 had been merged, so the sequence number of following patches are v13 decrese one.

* patch#3~5: No changes.

Best regards,
Jiqian Chen

v12->v13 changes:
Due to major changes in the codes, all the Reviewed-by received before have been removed.
Please review them again.
* patch#1: Delete all "state" words in new code, because it is not necessary.
           Delete unnecessary parameter reset_type of function vpci_reset_device, and changed this
           function to inline function.
           Add description to commit message to indicate that the classification of reset types is
           for possible different behaviors in the future.
           Rename reset_type of struct pci_device_reset to flags, and modified the value of macro
           definition of reset, let them occupy two lowest bits.
           Change the function vpci_reset_device to an inline function and delete the
           "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this exists in subsequent
           functions and it accesses domain and pci_lock, which will affect the compilation process.
* patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding
           description in the commit message.
* patch#3: Add more detailed descriptions into commit message not just callstack.
* patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from
           uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
           Move "gsi > highest_gsi()" into function gsi_2_irq.
           Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type.
           Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
           Delete unnecessary goto statements and change to direct break.
           Add description in commit message to explain how gsi to irq is converted.
* patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with
           physdev namesapce.
           Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
           Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening
           "privcmd".
* patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi.
           For functions that generate libxl error, changed the return value from -1 to ERROR_*.
           Instead of declaring "ctx", use the macro "CTX".
           Add the function libxl__arch_local_romain_ has_pirq_notion to determine if there is a concept
           of pirq in the domain where xl is located.
           In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq
           corresponding to gsi.

v11->v12 changes:
* patch#1: Change the title of this patch.
           Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not allowed.
           Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to
           remove "__init" of highest_gsi function.
           Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
           Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is affected.

v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line.
           Delete unnecessary local variables "struct physdev_pci_device *dev".
           Downgrade printk to dprintk.
           Moved struct pci_device_state_reset to the public header file.
           Delete enum pci_device_state_reset_type, and use macro definitions to represent different
           reset types.
           Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset
           to handle different reset functions.
           Add reset type as a function parameter for vpci_reset_device_state for possible future use
* patch#2: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being
           executed when domU has no pirq, instead of just preventing self-mapping; and modify the
           description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gsi of normal devices can work in PVH dom0 and why
           the passthrough device does not work in PVH dom0.
* patch#4: New patch, modification of allocate_pirq function, return the allocated pirq when there is
           already an allocated pirq and the caller has no specific requirements for pirq, and make it
           successful.
* patch#5: Modification on the hypervisor side proposed from patch#5 of v10.
           Add non-zero judgment for other bits of allow_access.
           Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
           Change the error exit path identifier "out" to "gsi_permission_out".
           Use ARRAY_SIZE() instead of open coed.
* patch#6: New patch, modification of xc_physdev_map_pirq to support mapping gsi to an idle pirq.
* patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev
           instead of adding unnecessary functions to libxencall.
           Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get
           gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
           Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
           Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
           Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be
           used to obtain the corresponding pirq when unmap PIRQ.

v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style.
* patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call",
           indicating that there will be new codes on the kernel side that will call PHYSDEVOP_setup_gsi.
           Also added an explanation of why the interrupt of passthrough device does not work if gsi is not
           registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
           Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done.
           Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd
           instead of current->domain.
           In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and
           error handling for irq0 was added.
           Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission
           definition.
All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> means I am the author.
Signed-off-by: Huang Rui <ray.huang@amd.com> means Rui sent them to upstream firstly.
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> means I take continue to upstream.

v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());"
           from vpci_reset_device_state;
           Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall.
           Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to
           "d == current->domian".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since there is same checke in below.Although their return
           values are different, this difference is acceptable for the sake of code consistency
           if ( !is_hardware_domain(currd) )
		       return -ENOSYS;
           break;
* patch#5: Change the commit message to describe more why we need this new hypercall.
           Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain
           why we need this check.
           Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
           Add explicit padding to struct xen_domctl_gsi_permission.

v7->v8 changes:
* patch#2: Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't use pirq.
           That check was missed in the previous version.
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. Change to add a new function
           to get gsi by passing in the sbdf of pci device.
* patch#5: Remove the parameter "is_gsi", when there exist gsi, in pci_add_dm_done use a new function
           pci_device_set_gsi to do map_pirq and grant permission. That gets more intuitive code logic.

v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. Change to add a new function
           to get gsi from irq, instead of gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old function vpci_remove_device,
           vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make code easier to be understood, which to use
           gsi by default and be compatible with older kernel versions to continue to use irq

v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and
           just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on it. And add the handling of errno
           and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to grant irq permission in
           XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-Jiqian.Chen@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete pci_reset_device_state; add
           xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add description for
		   PHYSDEVOP_pci_device_state_reset;
* patch#2: du to changes in the implementation of the second patch on kernel side(that it will do setup_gsi and
           map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need
		   to support self mapping.
* patch#3: du to changes in the implementation of the second patch on kernel side(that adds a new sysfs for gsi
           instead of a new syscall), so read gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-Jiqian.Chen@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-Jiqian.Chen@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got lots of suggestions.
I will introduce all issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run \u201csudo xl pci-assignable-add <sbdf>\u201d to assign a device, pci_stub will call
pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()\u201d, the pci config
write will trigger an io interrupt to bar_write() in the xen, but the
bar->enabled was set before, the write is not allowed now, and then when 
bar->Qemu config the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, so the current cached state in
pdev->vpci is all out of date and is different from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add xen_reset_device_state
function) and the fist patch of xen(xen/vpci: Clear all vpci status of device) add a new hypercall to reset the
state stored in vPCI when the state of real device has changed.
Thank Roger for the suggestion of this v2, and it is different from
v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@amd.com/), v1 simply allow domU to write
pci bar, it does not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call
into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it will fail
at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The
second patch of xen(x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allow PVH dom0 do PHYSDEVOP_map_pirq. This v2
patch is better than v1, v1 simply remove the has_pirq check
(xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@amd.com/).

3. the gsi of a passthrough device doesn't be unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu config a passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to check if the gsi has corresponding
mappings in dom0. But it didn\u2019t, so failed. See XEN_DOMCTL_irq_permission->pirq_access_permitted, "current"
is PVH dom0 and it return irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH
are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching codes, I find "map_pirq" and "register_gsi" will be done in function
vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka ioapic's pin) is unmasked in PVH dom0.
So the two problems can be concluded to that the gsi of a passthrough device doesn't be unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask irq for passthrough device in PVH dom0)
call the unmask_irq() when we assign a device to be passthrough. So that passthrough devices can have the mapping of
gsi on PVH dom0 and gsi can be registered. This v2 patch is different from the
v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/,
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, which is unnecessary and may cause
multiple registration.

4. failed to map pirq for gsi
Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device\u2019s gsi to pirq in function
xen_pt_realize(). But failed.

Reason: According to the implement of xc_physdev_map_pirq(), it needs gsi instead of irq, but qemu pass irq to it and
treat irq as gsi, it is got from file /sys/bus/pci/devices/xxxx:xx:xx.x/irq in function xen_host_pci_device_get().
But actually the gsi number is not equal with irq. On PVH dom0, when it allocates irq for a gsi in
function acpi_register_gsi_ioapic(), allocation is dynamic, and follow the principle of applying first, distributing
first. And if you debug the kernel codes(see function __irq_alloc_descs), you will find the irq number is allocated
from small to large by order, but the applying gsi number is not, gsi 38 may come before gsi 28, that causes gsi 38
get a smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, then when userspace(qemu) want to use gsi, we can do a
translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the relations
in acpi_register_gsi_xen_pvh() when dom0 initialize pci devices, and provide a syscall for userspace to get the gsi
from irq. The third patch of xen(tools: Add new function to get gsi from irq) add a new function
xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
And then userspace can use that function to get gsi. Then xc_physdev_map_pirq() will success. This v2 patch is the
same as v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@amd.com/)

About the v2 patch of qemu, just change an included head file, other are similar to the
v1 ( qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@amd.com/), just call
xc_physdev_gsi_from_irq() to get gsi from irq.

Jiqian Chen (5):
  xen/pci: Add hypercall to support reset of pcidev
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/domctl: Add hypercall to set the access of x86 gsi
  tools: Add new function to get gsi from dev
  tools: Add new function to do PIRQ (un)map on PVH dom0

 tools/include/xen-sys/Linux/privcmd.h |   7 ++
 tools/include/xenctrl.h               |  12 +++
 tools/libs/ctrl/xc_domain.c           |  15 ++++
 tools/libs/ctrl/xc_freebsd.c          |   6 ++
 tools/libs/ctrl/xc_linux.c            |  20 +++++
 tools/libs/ctrl/xc_minios.c           |   6 ++
 tools/libs/ctrl/xc_netbsd.c           |   6 ++
 tools/libs/ctrl/xc_physdev.c          |  27 +++++++
 tools/libs/ctrl/xc_solaris.c          |   6 ++
 tools/libs/light/libxl_arch.h         |   6 ++
 tools/libs/light/libxl_arm.c          |  15 ++++
 tools/libs/light/libxl_pci.c          | 112 +++++++++++++++-----------
 tools/libs/light/libxl_x86.c          |  72 +++++++++++++++++
 xen/arch/x86/domctl.c                 |  29 +++++++
 xen/arch/x86/hvm/hypercall.c          |   3 +
 xen/arch/x86/include/asm/io_apic.h    |   2 +
 xen/arch/x86/io_apic.c                |  21 +++++
 xen/arch/x86/mpparse.c                |   7 +-
 xen/drivers/pci/physdev.c             |  52 ++++++++++++
 xen/include/public/domctl.h           |  10 +++
 xen/include/public/physdev.h          |  17 ++++
 xen/include/xen/vpci.h                |   6 ++
 xen/xsm/flask/hooks.c                 |   1 +
 23 files changed, 409 insertions(+), 49 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev
  2024-09-03  7:04 [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
@ 2024-09-03  7:04 ` Jiqian Chen
  2024-09-09  9:34   ` Roger Pau Monné
  2024-09-03  7:04 ` [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Jiqian Chen @ 2024-09-03  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When a device has been reset on dom0 side, the Xen hypervisor
doesn't get notification, so the cached state in vpci is all
out of date compare with the real device state.

To solve that problem, add a new hypercall to support the reset
of pcidev and clear the vpci state of device. So that once the
state of device is reset on dom0 side, dom0 can call this
hypercall to notify hypervisor.

The behavior of different reset types may be different in the
future, so divide them now so that they can be easily modified
in the future without affecting the hypercall interface.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
v13->v14 changes:
Removed the check ( !is_pci_passthrough_enabled() ).
Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if the other bits are zero.

v12->v13 changes:
Deleted all "state" words in new code, because it is not necessary.
Deleted unnecessary parameter reset_type of function vpci_reset_device, and changed this function to inline function
Added description to commit message to indicate that the classification of reset types is for possible different behaviors in the future
Renamed reset_type of struct pci_device_reset to flags, and modified the value of macro definition of reset, let them occupy two lowest bits.
Change the function vpci_reset_device to an inline function and delete the ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); because this call exists in subsequent functions and it accesses domain and pci_lock, which will affect the compilation process.

v11->v12 changes:
Change the title of this patch(Add hypercall to support reset of pcidev).
Remove unnecessary notes, erroneous stamps, and #define.

v10->v11 changes:
Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line.
Delete unnecessary local variables "struct physdev_pci_device *dev".
Downgrade printk to dprintk.
Moved struct pci_device_state_reset to the public header file.
Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types.
Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions.
Add reset type as a function parameter for vpci_reset_device_state for possible future use.

v9->v10 changes:
Nothing.

v8->v9 changes:
Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state;
Add pci_device_state_reset_type to distinguish the reset types.

v7->v8 changes:
Nothing.

v6->v7 changes:
Nothing.

v5->v6 changes:
Rebase code and change old function vpci_remove_device, vpci_add_handlers to vpci_deassign_device, vpci_assign_device.

v4->v5 changes:
Add pci_lock wrap function vpci_reset_device_state.

v3->v4 changes:
Change the comment of PHYSDEVOP_pci_device_state_reset;
Move printings behind pcidevs_unlock.

v2->v3 changes:
Move the content out of pci_reset_device_state and delete pci_reset_device_state;
Add xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset;
Add description for PHYSDEVOP_pci_device_state_reset;

for patch 1
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 52 ++++++++++++++++++++++++++++++++++++
 xen/include/public/physdev.h | 17 ++++++++++++
 xen/include/xen/vpci.h       |  6 +++++
 4 files changed, 76 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 44342e7e7fc3..f023f7879e24 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..0161a85e1e9c 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/vpci.h>

 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }

+    case PHYSDEVOP_pci_device_reset:
+    {
+        struct pci_device_reset dev_reset;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+
+        ret = -EINVAL;
+        if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK )
+            break;
+
+        sbdf = PCI_SBDF(dev_reset.dev.seg,
+                        dev_reset.dev.bus,
+                        dev_reset.dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        switch ( dev_reset.flags & PCI_DEVICE_RESET_MASK )
+        {
+        case PCI_DEVICE_RESET_COLD:
+        case PCI_DEVICE_RESET_WARM:
+        case PCI_DEVICE_RESET_HOT:
+        case PCI_DEVICE_RESET_FLR:
+            ret = vpci_reset_device(pdev);
+            break;
+
+        default:
+            ret = -EINVAL;
+            break;
+        }
+        write_unlock(&pdev->domain->pci_lock);
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..45e1c18541c8 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix          30
 #define PHYSDEVOP_release_msix          31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_reset      32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
@@ -305,6 +312,16 @@ struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);

+struct pci_device_reset {
+    physdev_pci_device_t dev;
+#define PCI_DEVICE_RESET_COLD 0x0
+#define PCI_DEVICE_RESET_WARM 0x1
+#define PCI_DEVICE_RESET_HOT  0x2
+#define PCI_DEVICE_RESET_FLR  0x3
+#define PCI_DEVICE_RESET_MASK 0x3
+    uint32_t flags;
+};
+
 #define PHYSDEVOP_DBGP_RESET_PREPARE    1
 #define PHYSDEVOP_DBGP_RESET_DONE       2

diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index da8d0f41e6f4..41e7c3bc2791 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -304,6 +304,12 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
 }
 #endif

+static inline int __must_check vpci_reset_device(struct pci_dev *pdev)
+{
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif

 /*
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03  7:04 [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-09-03  7:04 ` [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
@ 2024-09-03  7:04 ` Jiqian Chen
  2024-09-03  7:42   ` Jan Beulich
  2024-09-03  7:04 ` [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Jiqian Chen @ 2024-09-03  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
xc_physdev_map_pirq map a pirq for passthrough devices.
In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
codes.

But it is fine to map interrupts through pirq to a HVM domain whose
XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
reference interrupts and it is just the way for the device model to
identify which interrupt should be mapped to which domain, however
has_pirq() is just to check if HVM domains route interrupts from
devices(emulated or passthrough) through event channel, so, the has_pirq()
check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
interrupt of a passthrough device can be successfully mapped to pirq for domU.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
v13->v14 changes:
Modified the commit message.

v12->v13 changes:
Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message.

v11->v12 changes:
Avoid using return, set error code instead when (un)map is not allowed.

v10->v11 changes:
Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping.
And modify the description of the commit message accordingly.

v9->v10 changes:
Indent the comments above PHYSDEVOP_map_pirq according to the code style.

v8->v9 changes:
Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall.
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domian".

v7->v8 changes:
Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't use pirq.
That check was missed in the previous version.

v6->v7 changes:
Nothing.

v5->v6 changes:
Nothing.

v4->v5 changes:
Move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.

v3->v4 changes:
add check to prevent PVH self map.

v2->v3 changes:
Du to changes in the implementation of the second patch on kernel side(that it will do setup_gsi and map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping.
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index f023f7879e24..81883c8d4f60 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-09-03  7:04 [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-09-03  7:04 ` [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
  2024-09-03  7:04 ` [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-09-03  7:04 ` Jiqian Chen
  2024-09-09  9:15   ` Roger Pau Monné
  2024-09-03  7:04 ` [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev Jiqian Chen
  2024-09-03  7:04 ` [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
  4 siblings, 1 reply; 30+ messages in thread
From: Jiqian Chen @ 2024-09-03  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

Some type of domains don't have PIRQs, like PVH, it doesn't do
PHYSDEVOP_map_pirq for each gsi. When passthrough a device
to guest base on PVH dom0, callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on Xen side.
What's more, current hypercall XEN_DOMCTL_irq_permission requires
passing in pirq to set the access of irq, it is not suitable for
dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke
the permission of irq (translated from x86 gsi) to dumU when dom0
has no PIRQs.

Regarding the translation from gsi to irq, it is that if there are
ACPI overrides entries then get translation from them, if not gsi
are identity mapped into irq.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
CC: Daniel P . Smith <dpsmith@apertussolutions.com>
Remaining unsolved comment @Daniel P . Smith:
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+            break;
Is it okay to issue the XSM check using the translated value(irq),
not the one(gsi) that was originally passed into the hypercall?
---
v13->v14 changes:
No.

v12->v13 changes:
For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
Move "gsi > highest_gsi()" into function gsi_2_irq.
Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type.
Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
Delete unnecessary goto statements and change to direct break.
Add description in commit message to explain how gsi to irq isconverted.

v11->v12 changes:
Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function.
Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.

v10->v11 changes:
Extracted from patch#5 of v10 into a separate patch.
Add non-zero judgment for other bits of allow_access.
Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
Change the error exit path identifier "out" to "gsi_permission_out".
Use ARRAY_SIZE() instead of open coed.

v9->v10 changes:
Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain.
In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added.
Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition.

v8->v9 changes:
Change the commit message to describe more why we need this new hypercall.
Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check.
Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
Add explicit padding to struct xen_domctl_gsi_permission.

v5->v8 changes:
Nothing.

v4->v5 changes:
New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
---
 xen/arch/x86/domctl.c              | 29 +++++++++++++++++++++++++++++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c             | 21 +++++++++++++++++++++
 xen/arch/x86/mpparse.c             |  7 +++----
 xen/include/public/domctl.h        | 10 ++++++++++
 xen/xsm/flask/hooks.c              |  1 +
 6 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 68b5b46d1a83..60b5578c47f8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include <asm/xstate.h>
 #include <asm/psr.h>
 #include <asm/cpu-policy.h>
+#include <asm/io_apic.h>
 
 static int update_domain_cpu_policy(struct domain *d,
                                     xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,34 @@ long arch_do_domctl(
         break;
     }
 
+    case XEN_DOMCTL_gsi_permission:
+    {
+        int irq;
+        unsigned int gsi = domctl->u.gsi_permission.gsi;
+        uint32_t flags = domctl->u.gsi_permission.flags;
+
+        /* Check all bits are zero except lowest bit */
+        ret = -EINVAL;
+        if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
+            break;
+
+        ret = irq = gsi_2_irq(gsi);
+        if ( ret <= 0 )
+            break;
+
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+            break;
+
+        if ( flags )
+            ret = irq_permit_access(d, irq);
+        else
+            ret = irq_deny_access(d, irq);
+
+        break;
+    }
+
     case XEN_DOMCTL_getpageframeinfo3:
     {
         unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..62456806c7af 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(unsigned int gsi);
+int gsi_2_irq(unsigned int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 772700584639..5859484875cc 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,27 @@ static int pin_2_irq(int idx, int apic, int pin)
     return irq;
 }
 
+int gsi_2_irq(unsigned int gsi)
+{
+    int ioapic, irq;
+    unsigned int pin;
+
+    if ( gsi > highest_gsi() )
+        return -ERANGE;
+
+    ioapic = mp_find_ioapic(gsi);
+    if ( ioapic < 0 )
+        return -EINVAL;
+
+    pin = gsi - io_apic_gsi_base(ioapic);
+
+    irq = apic_pin_2_gsi_irq(ioapic, pin);
+    if ( irq <= 0 )
+        return -EINVAL;
+
+    return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
     int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index 306d8ed97a83..e13b83bbe9dd 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -842,8 +842,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-	int			gsi)
+int mp_find_ioapic(unsigned int gsi)
 {
 	unsigned int		i;
 
@@ -854,7 +853,7 @@ static int mp_find_ioapic (
 			return i;
 	}
 
-	printk(KERN_ERR "ERROR: Unable to locate IOAPIC for GSI %d\n", gsi);
+	printk(KERN_ERR "ERROR: Unable to locate IOAPIC for GSI %u\n", gsi);
 
 	return -1;
 }
@@ -915,7 +914,7 @@ void __init mp_register_ioapic (
 	return;
 }
 
-unsigned __init highest_gsi(void)
+unsigned highest_gsi(void)
 {
 	unsigned x, res = 0;
 	for (x = 0; x < nr_ioapics; x++)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..e1028fc524cf 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,14 @@ struct xen_domctl_irq_permission {
     uint8_t pad[3];
 };
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+    uint32_t gsi;
+#define XEN_DOMCTL_GSI_REVOKE      0
+#define XEN_DOMCTL_GSI_GRANT       1
+#define XEN_DOMCTL_GSI_ACTION_MASK 1
+    uint32_t flags;
+};
 
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
@@ -1306,6 +1314,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_paging_mempool_size       85
 #define XEN_DOMCTL_set_paging_mempool_size       86
 #define XEN_DOMCTL_dt_overlay                    87
+#define XEN_DOMCTL_gsi_permission                88
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1328,6 +1337,7 @@ struct xen_domctl {
         struct xen_domctl_setdomainhandle   setdomainhandle;
         struct xen_domctl_setdebugging      setdebugging;
         struct xen_domctl_irq_permission    irq_permission;
+        struct xen_domctl_gsi_permission    gsi_permission;
         struct xen_domctl_iomem_permission  iomem_permission;
         struct xen_domctl_ioport_permission ioport_permission;
         struct xen_domctl_hypercall_init    hypercall_init;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 278ad38c2af3..dfa23738cd8a 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -695,6 +695,7 @@ static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
     case XEN_DOMCTL_shadow_op:
     case XEN_DOMCTL_ioport_permission:
     case XEN_DOMCTL_ioport_mapping:
+    case XEN_DOMCTL_gsi_permission:
 #endif
 #ifdef CONFIG_HAS_PASSTHROUGH
     /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev
  2024-09-03  7:04 [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (2 preceding siblings ...)
  2024-09-03  7:04 ` [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
@ 2024-09-03  7:04 ` Jiqian Chen
  2024-09-03 14:12   ` Anthony PERARD
  2024-09-03 15:51   ` Anthony PERARD
  2024-09-03  7:04 ` [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
  4 siblings, 2 replies; 30+ messages in thread
From: Jiqian Chen @ 2024-09-03  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When passthrough a device to domU, QEMU and xl tools use its gsi
number to do pirq mapping, see QEMU code
xen_pt_realize->xc_physdev_map_pirq, and xl code
pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is got
from file /sys/bus/pci/devices/<sbdf>/irq, that is wrong, because
irq is not equal with gsi, they are in different spaces, so pirq
mapping fails.

And in current codes, there is no method to get gsi for userspace.
For above purpose, add new function to get gsi, and the
corresponding ioctl is implemented on linux kernel side.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
---
v13->v14 changes:
No.

v12->v13 changes:
Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namesapce.
Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd".

v11->v12 changes:
Nothing.

v10->v11 changes:
Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall.
Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.

v9->v10 changes:
Extract the implementation of xc_physdev_gsi_from_dev to be a new patch.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++++++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/ctrl/xc_freebsd.c          |  6 ++++++
 tools/libs/ctrl/xc_linux.c            | 20 ++++++++++++++++++++
 tools/libs/ctrl/xc_minios.c           |  6 ++++++
 tools/libs/ctrl/xc_netbsd.c           |  6 ++++++
 tools/libs/ctrl/xc_solaris.c          |  6 ++++++
 7 files changed, 53 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..607dfa2287bc 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
 	__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_pcidev_get_gsi {
+	__u32 sbdf;
+	__u32 gsi;
+} privcmd_pcidev_get_gsi_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
 	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
 	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_PCIDEV_GET_GSI			\
+	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
 	_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2c4608c09ab0..924f9a35f790 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1642,6 +1642,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c
index 9dd48a3a08bb..9019fc663361 100644
--- a/tools/libs/ctrl/xc_freebsd.c
+++ b/tools/libs/ctrl/xc_freebsd.c
@@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    errno = ENOSYS;
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
index c67c71c08be3..92591e49a1c8 100644
--- a/tools/libs/ctrl/xc_linux.c
+++ b/tools/libs/ctrl/xc_linux.c
@@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    int ret;
+    privcmd_pcidev_get_gsi_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = 0,
+    };
+
+    ret = ioctl(xencall_fd(xch->xcall),
+                IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
+
+    if (ret < 0) {
+        PERROR("Failed to get gsi from dev");
+    } else {
+        ret = dev_gsi.gsi;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c
index 3dea7a78a576..462af827b33c 100644
--- a/tools/libs/ctrl/xc_minios.c
+++ b/tools/libs/ctrl/xc_minios.c
@@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return memalign(alignment, size);
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    errno = ENOSYS;
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_netbsd.c b/tools/libs/ctrl/xc_netbsd.c
index 31979937621e..1318d4d90608 100644
--- a/tools/libs/ctrl/xc_netbsd.c
+++ b/tools/libs/ctrl/xc_netbsd.c
@@ -63,6 +63,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return valloc(size);
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    errno = ENOSYS;
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_solaris.c b/tools/libs/ctrl/xc_solaris.c
index 5128f3f0f490..049e28d55ccd 100644
--- a/tools/libs/ctrl/xc_solaris.c
+++ b/tools/libs/ctrl/xc_solaris.c
@@ -32,6 +32,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return memalign(alignment, size);
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    errno = ENOSYS;
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0
  2024-09-03  7:04 [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (3 preceding siblings ...)
  2024-09-03  7:04 ` [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-09-03  7:04 ` Jiqian Chen
  2024-09-03 17:16   ` Anthony PERARD
  4 siblings, 1 reply; 30+ messages in thread
From: Jiqian Chen @ 2024-09-03  7:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Jiqian Chen,
	Huang Rui

When dom0 is PVH, and passthrough a device to dumU, xl will
use the gsi number of device to do a pirq mapping, see
pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is
got from file /sys/bus/pci/devices/<sbdf>/irq, that confuses
irq and gsi, they are in different space and are not equal,
so it will fail when mapping.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and add a new function xc_physdev_map_pirq_gsi to get
a free pirq for gsi(why not use current function
xc_physdev_map_pirq, because it doesn't support to allocate a
free pirq, what's more, to prevent changing it and affecting
its callers, so add xc_physdev_map_pirq_gsi).

Besides, PVH dom0 doesn't have PIRQ flag, it doesn't do
PHYSDEVOP_map_pirq for each gsi. So grant function callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
domain_pirq_to_irq. And old hypercall XEN_DOMCTL_irq_permission
requires passing in pirq, it is not suitable for dom0 that
doesn't have PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant the permission of irq(
translate from gsi) to dumU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
---
v13->v14 changes:
No.

v12->v13 changes:
Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi.
For functions that generate libxl error, changed the return value from -1 to ERROR_*.
Instead of declaring "ctx", use the macro "CTX".
Add the function libxl__arch_local_romain_ has_pirq_notion to determine if there is a concept of pirq in the domain where xl is located.
In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to gsi.

v11->v12 changes:
Nothing.

v10->v11 changes:
New patch
Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be used to obtain the corresponding pirq when unmap PIRQ.
---
 tools/include/xenctrl.h       |  10 +++
 tools/libs/ctrl/xc_domain.c   |  15 +++++
 tools/libs/ctrl/xc_physdev.c  |  27 ++++++++
 tools/libs/light/libxl_arch.h |   6 ++
 tools/libs/light/libxl_arm.c  |  15 +++++
 tools/libs/light/libxl_pci.c  | 112 ++++++++++++++++++++--------------
 tools/libs/light/libxl_x86.c  |  72 ++++++++++++++++++++++
 7 files changed, 212 insertions(+), 45 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 924f9a35f790..29617585c535 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1383,6 +1383,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint32_t flags);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
@@ -1638,6 +1643,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
                             int entry_nr,
                             uint64_t table_base);
 
+int xc_physdev_map_pirq_gsi(xc_interface *xch,
+                            uint32_t domid,
+                            int gsi,
+                            int *pirq);
+
 int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..e3538ec0ba80 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint32_t flags)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.flags = flags,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..c752cd1f4410 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -95,6 +95,33 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_map_pirq_gsi(xc_interface *xch,
+                            uint32_t domid,
+                            int gsi,
+                            int *pirq)
+{
+    int rc;
+    struct physdev_map_pirq map;
+
+    if ( !pirq )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+    memset(&map, 0, sizeof(struct physdev_map_pirq));
+    map.domid = domid;
+    map.type = MAP_PIRQ_TYPE_GSI;
+    map.index = gsi;
+    map.pirq = *pirq;
+
+    rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
+
+    if ( !rc )
+        *pirq = map.pirq;
+
+    return rc;
+}
+
 int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq)
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..c8ef52ddbe7f 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,12 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
                                       libxl_domain_config *dst,
                                       const libxl_domain_config *src);
 
+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+bool libxl__arch_local_domain_has_pirq_notion(libxl__gc *gc);
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee00000
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..5a9db5e85f6f 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,21 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }
 
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    return ERROR_INVAL;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    return ERROR_INVAL;
+}
+
+bool libxl__arch_local_domain_has_pirq_notion(libxl__gc *gc)
+{
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..2014a67e6e56 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 #define PCI_BDF                "%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT          "%02x:%02x.%01x"
@@ -1478,32 +1479,43 @@ static void pci_add_dm_done(libxl__egc *egc,
     fclose(f);
     if (!pci_supp_legacy_irq())
         goto out_no_irq;
-    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
-                                pci->bus, pci->dev, pci->func);
-    f = fopen(sysfs_path, "r");
-    if (f == NULL) {
-        LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
-        goto out_no_irq;
-    }
-    if ((fscanf(f, "%u", &irq) == 1) && irq) {
-        r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
-        if (r < 0) {
-            LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
-                  irq, r);
-            fclose(f);
-            rc = ERROR_FAIL;
+
+    /* When dom0 is PVH, should use gsi to map pirq and grant permission */
+    rc = libxl__arch_local_domain_has_pirq_notion(gc);
+    if (!rc) {
+        rc = libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid);
+        if (rc) {
+            LOGED(ERROR, domainid, "libxl__arch_hvm_map_gsi failed");
             goto out;
         }
-        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
-        if (r < 0) {
-            LOGED(ERROR, domainid,
-                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
-            fclose(f);
-            rc = ERROR_FAIL;
-            goto out;
+    } else {
+        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+                                    pci->bus, pci->dev, pci->func);
+        f = fopen(sysfs_path, "r");
+        if (f == NULL) {
+            LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
+            goto out_no_irq;
+        }
+        if ((fscanf(f, "%u", &irq) == 1) && irq) {
+            r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
+            if (r < 0) {
+                LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
+                    irq, r);
+                fclose(f);
+                rc = ERROR_FAIL;
+                goto out;
+            }
+            r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+            if (r < 0) {
+                LOGED(ERROR, domainid,
+                    "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+                fclose(f);
+                rc = ERROR_FAIL;
+                goto out;
+            }
         }
+        fclose(f);
     }
-    fclose(f);
 
     /* Don't restrict writes to the PCI config space from this VM */
     if (pci->permissive) {
@@ -2229,33 +2241,43 @@ skip_bar:
     if (!pci_supp_legacy_irq())
         goto skip_legacy_irq;
 
-    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
-                           pci->bus, pci->dev, pci->func);
-
-    f = fopen(sysfs_path, "r");
-    if (f == NULL) {
-        LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-        goto skip_legacy_irq;
-    }
+    /* When dom0 is PVH, should use gsi to unmap pirq and deny permission */
+    rc = libxl__arch_local_domain_has_pirq_notion(gc);
+    if (!rc) {
+        rc = libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid);
+        if (rc) {
+            LOGED(ERROR, domid, "libxl__arch_hvm_unmap_gsi failed");
+            goto out;
+        }
+    } else {
+        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+                            pci->bus, pci->dev, pci->func);
 
-    if ((fscanf(f, "%u", &irq) == 1) && irq) {
-        rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
-        if (rc < 0) {
-            /*
-             * QEMU may have already unmapped the IRQ. So the error
-             * may be spurious. For now, still print an error message as
-             * it is not easy to distinguished between valid and
-             * spurious error.
-             */
-            LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
+        f = fopen(sysfs_path, "r");
+        if (f == NULL) {
+            LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
+            goto skip_legacy_irq;
         }
-        rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
-        if (rc < 0) {
-            LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
+
+        if ((fscanf(f, "%u", &irq) == 1) && irq) {
+            rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
+            if (rc < 0) {
+                /*
+                * QEMU may have already unmapped the IRQ. So the error
+                * may be spurious. For now, still print an error message as
+                * it is not easy to distinguished between valid and
+                * spurious error.
+                */
+                LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
+            }
+            rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
+            if (rc < 0) {
+                LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
+            }
         }
-    }
 
-    fclose(f);
+        fclose(f);
+    }
 
 skip_legacy_irq:
 
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 60643d6f5376..20e3956f09b8 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -879,6 +879,78 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
                                  libxl_defbool_val(src->b_info.u.hvm.pirq));
 }
 
+bool libxl__arch_local_domain_has_pirq_notion(libxl__gc *gc)
+{
+    int r;
+    xc_domaininfo_t info;
+
+    r = xc_domain_getinfo_single(CTX->xch, LIBXL_TOOLSTACK_DOMID, &info);
+    if (r == 0) {
+        if (!(info.flags & XEN_DOMINF_hvm_guest) ||
+            (info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ))
+            return true;
+    } else {
+        LOGE(ERROR, "getdomaininfo failed ret=%d", r);
+    }
+
+    return false;
+}
+
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    int pirq = -1, gsi, r;
+
+    gsi = xc_pcidev_get_gsi(CTX->xch, sbdf);
+    if (gsi < 0) {
+        return ERROR_FAIL;
+    }
+
+    r = xc_physdev_map_pirq_gsi(CTX->xch, domid, gsi, &pirq);
+    if (r < 0) {
+        LOGED(ERROR, domid, "xc_physdev_map_pirq_gsi gsi=%d ret=%d", gsi, r);
+        return ERROR_FAIL;
+    }
+
+    r = xc_domain_gsi_permission(CTX->xch, domid, gsi, XEN_DOMCTL_GSI_GRANT);
+    if (r < 0) {
+        LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d ret=%d", gsi, r);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    int pirq = -1, gsi, r;
+
+    gsi = xc_pcidev_get_gsi(CTX->xch, sbdf);
+    if (gsi < 0) {
+        return ERROR_FAIL;
+    }
+
+    /* Before unmapping, use mapping to get the already mapped pirq first */
+    r = xc_physdev_map_pirq_gsi(CTX->xch, domid, gsi, &pirq);
+    if (r < 0) {
+        LOGED(ERROR, domid, "xc_physdev_map_pirq_gsi gsi=%d ret=%d", gsi, r);
+        return ERROR_FAIL;
+    }
+
+    r = xc_physdev_unmap_pirq(CTX->xch, domid, pirq);
+    if (r < 0) {
+        LOGED(ERROR, domid, "xc_physdev_unmap_pirq gsi=%d ret=%d", gsi, r);
+        return ERROR_FAIL;
+    }
+
+    r = xc_domain_gsi_permission(CTX->xch, domid, gsi, XEN_DOMCTL_GSI_REVOKE);
+    if (r < 0) {
+        LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d ret=%d", gsi, r);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03  7:04 ` [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-09-03  7:42   ` Jan Beulich
  2024-09-03  7:58     ` Chen, Jiqian
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-03  7:42 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 03.09.2024 09:04, Jiqian Chen wrote:
> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> xc_physdev_map_pirq map a pirq for passthrough devices.
> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> codes.
> 
> But it is fine to map interrupts through pirq to a HVM domain whose
> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
> reference interrupts and it is just the way for the device model to
> identify which interrupt should be mapped to which domain, however
> has_pirq() is just to check if HVM domains route interrupts from
> devices(emulated or passthrough) through event channel, so, the has_pirq()
> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
> 
> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
> interrupt of a passthrough device can be successfully mapped to pirq for domU.

As before: When you talk about just Dom0, ...

> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      {
>      case PHYSDEVOP_map_pirq:
>      case PHYSDEVOP_unmap_pirq:
> +        break;
> +
>      case PHYSDEVOP_eoi:
>      case PHYSDEVOP_irq_status_query:
>      case PHYSDEVOP_get_free_pirq:

... that ought to match the code. IOW you've again lost why it is okay(ish)
(or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
Similarly imo Dom0 using this on itself wants discussing.

As to my earlier comments regarding your commit message adjustments: I
forgot that the change had to be reverted. I'm sorry for that.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03  7:42   ` Jan Beulich
@ 2024-09-03  7:58     ` Chen, Jiqian
  2024-09-03  9:25       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chen, Jiqian @ 2024-09-03  7:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/9/3 15:42, Jan Beulich wrote:
> On 03.09.2024 09:04, Jiqian Chen wrote:
>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>> xc_physdev_map_pirq map a pirq for passthrough devices.
>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>> codes.
>>
>> But it is fine to map interrupts through pirq to a HVM domain whose
>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>> reference interrupts and it is just the way for the device model to
>> identify which interrupt should be mapped to which domain, however
>> has_pirq() is just to check if HVM domains route interrupts from
>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
> 
> As before: When you talk about just Dom0, ...
> 
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      {
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        break;
>> +
>>      case PHYSDEVOP_eoi:
>>      case PHYSDEVOP_irq_status_query:
>>      case PHYSDEVOP_get_free_pirq:
> 
> ... that ought to match the code. IOW you've again lost why it is okay(ish)
> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
> Similarly imo Dom0 using this on itself wants discussing.
Do you mean I need to talk about why permit this op for all HVM and  where can prevent PVH domain calling this for self-mapping, not only dom0?

> 
> As to my earlier comments regarding your commit message adjustments: I
> forgot that the change had to be reverted. I'm sorry for that.
Which change? Do I need to change the codes?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03  7:58     ` Chen, Jiqian
@ 2024-09-03  9:25       ` Jan Beulich
  2024-09-03 10:53         ` Chen, Jiqian
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-03  9:25 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 03.09.2024 09:58, Chen, Jiqian wrote:
> On 2024/9/3 15:42, Jan Beulich wrote:
>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>> codes.
>>>
>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>> reference interrupts and it is just the way for the device model to
>>> identify which interrupt should be mapped to which domain, however
>>> has_pirq() is just to check if HVM domains route interrupts from
>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>
>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>
>> As before: When you talk about just Dom0, ...
>>
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>      {
>>>      case PHYSDEVOP_map_pirq:
>>>      case PHYSDEVOP_unmap_pirq:
>>> +        break;
>>> +
>>>      case PHYSDEVOP_eoi:
>>>      case PHYSDEVOP_irq_status_query:
>>>      case PHYSDEVOP_get_free_pirq:
>>
>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>> Similarly imo Dom0 using this on itself wants discussing.
> Do you mean I need to talk about why permit this op for all HVM

You don't need to invent reasons, but it needs making clear that wider than
necessary (for your purpose) exposure is at least not going to be a problem.

> and  where can prevent PVH domain calling this for self-mapping, not only dom0?

Aiui use on itself is limited to Dom0, so only that would need clarifying
(along the lines of the above, i.e. that/why it is not a problem). For
has_pirq() domains use on oneself was already permitted before.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03  9:25       ` Jan Beulich
@ 2024-09-03 10:53         ` Chen, Jiqian
  2024-09-03 14:14           ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chen, Jiqian @ 2024-09-03 10:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/9/3 17:25, Jan Beulich wrote:
> On 03.09.2024 09:58, Chen, Jiqian wrote:
>> On 2024/9/3 15:42, Jan Beulich wrote:
>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>> codes.
>>>>
>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>> reference interrupts and it is just the way for the device model to
>>>> identify which interrupt should be mapped to which domain, however
>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>
>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>
>>> As before: When you talk about just Dom0, ...
>>>
>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>      {
>>>>      case PHYSDEVOP_map_pirq:
>>>>      case PHYSDEVOP_unmap_pirq:
>>>> +        break;
>>>> +
>>>>      case PHYSDEVOP_eoi:
>>>>      case PHYSDEVOP_irq_status_query:
>>>>      case PHYSDEVOP_get_free_pirq:
>>>
>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>> Similarly imo Dom0 using this on itself wants discussing.
>> Do you mean I need to talk about why permit this op for all HVM
> 
> You don't need to invent reasons, but it needs making clear that wider than
> necessary (for your purpose) exposure is at least not going to be a problem.
> 
>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
> 
> Aiui use on itself is limited to Dom0, so only that would need clarifying
> (along the lines of the above, i.e. that/why it is not a problem). For
> has_pirq() domains use on oneself was already permitted before.

Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
{
x86/pvh: Allow (un)map_pirq when dom0 is PVH

Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
xc_physdev_map_pirq map a pirq for passthrough devices.
In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
codes.

To solve above problem, need to remove the chack has_pirq() for that
situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
what the problem need.
So, clarify below:

For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
interrupts through pirq for them. Because pirq field is used as a way to
reference interrupts and it is just the way for the device model to
identify which interrupt should be mapped to which domain, however
has_pirq() is just to check if HVM domains route interrupts from
devices(emulated or passthrough) through event channel, so, remove
has_pirq() check has no impact on HVM domUs.

For PVH domUs that performs such an operation will fail at the check
xsm_map_dedomain_pirq() in physdev_map-nirq().

For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
it also has no impact.
}

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev
  2024-09-03  7:04 ` [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-09-03 14:12   ` Anthony PERARD
  2024-09-03 15:51   ` Anthony PERARD
  1 sibling, 0 replies; 30+ messages in thread
From: Anthony PERARD @ 2024-09-03 14:12 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Tue, Sep 03, 2024 at 03:04:23PM +0800, Jiqian Chen wrote:
> When passthrough a device to domU, QEMU and xl tools use its gsi
> number to do pirq mapping, see QEMU code
> xen_pt_realize->xc_physdev_map_pirq, and xl code
> pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is got
> from file /sys/bus/pci/devices/<sbdf>/irq, that is wrong, because
> irq is not equal with gsi, they are in different spaces, so pirq
> mapping fails.
> 
> And in current codes, there is no method to get gsi for userspace.
> For above purpose, add new function to get gsi, and the
> corresponding ioctl is implemented on linux kernel side.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> ---
> RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
> https://lore.kernel.org/xen-devel/20240607075109.126277-4-Jiqian.Chen@amd.com/
> ---
> v13->v14 changes:
> No.
> 
> v12->v13 changes:
> Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namesapce.
> Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
> Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd".

Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

Thanks,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03 10:53         ` Chen, Jiqian
@ 2024-09-03 14:14           ` Jan Beulich
  2024-09-04  1:43             ` Stefano Stabellini
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-03 14:14 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 03.09.2024 12:53, Chen, Jiqian wrote:
> On 2024/9/3 17:25, Jan Beulich wrote:
>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>> codes.
>>>>>
>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>> reference interrupts and it is just the way for the device model to
>>>>> identify which interrupt should be mapped to which domain, however
>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>
>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>
>>>> As before: When you talk about just Dom0, ...
>>>>
>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>      {
>>>>>      case PHYSDEVOP_map_pirq:
>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>> +        break;
>>>>> +
>>>>>      case PHYSDEVOP_eoi:
>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>
>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>> Similarly imo Dom0 using this on itself wants discussing.
>>> Do you mean I need to talk about why permit this op for all HVM
>>
>> You don't need to invent reasons, but it needs making clear that wider than
>> necessary (for your purpose) exposure is at least not going to be a problem.
>>
>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>
>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>> (along the lines of the above, i.e. that/why it is not a problem). For
>> has_pirq() domains use on oneself was already permitted before.
> 
> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
> {
> x86/pvh: Allow (un)map_pirq when dom0 is PVH
> 
> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> xc_physdev_map_pirq map a pirq for passthrough devices.
> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> codes.
> 
> To solve above problem, need to remove the chack has_pirq() for that
> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
> what the problem need.
> So, clarify below:
> 
> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
> interrupts through pirq for them. Because pirq field is used as a way to
> reference interrupts and it is just the way for the device model to
> identify which interrupt should be mapped to which domain, however
> has_pirq() is just to check if HVM domains route interrupts from
> devices(emulated or passthrough) through event channel, so, remove
> has_pirq() check has no impact on HVM domUs.
> 
> For PVH domUs that performs such an operation will fail at the check
> xsm_map_dedomain_pirq() in physdev_map-nirq().
> 
> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
> it also has no impact.
> }

This is better than what you had before, and I don't really fancy re-
writing the description effectively from scratch. So let's just go from
there. As to the "excess" permission for the sub-ops: The main thing I'm
after is that it be clarified that we're not going to introduce any
security issues here. That requires auditing the code, and merely saying
"also has no impact" is a little too little for my taste. For Dom0 an
argument may be that it is overly powerful already anyway, even if for
PVH we're a little more strict than for PV (I think).

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev
  2024-09-03  7:04 ` [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev Jiqian Chen
  2024-09-03 14:12   ` Anthony PERARD
@ 2024-09-03 15:51   ` Anthony PERARD
  2024-09-03 15:57     ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Anthony PERARD @ 2024-09-03 15:51 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Tue, Sep 03, 2024 at 03:04:23PM +0800, Jiqian Chen wrote:
> diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
> index bc60e8fd55eb..607dfa2287bc 100644
> --- a/tools/include/xen-sys/Linux/privcmd.h
> +++ b/tools/include/xen-sys/Linux/privcmd.h
> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>  	__u64 addr;
>  } privcmd_mmap_resource_t;
>  
> +typedef struct privcmd_pcidev_get_gsi {
> +	__u32 sbdf;
> +	__u32 gsi;
> +} privcmd_pcidev_get_gsi_t;

> diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
> index c67c71c08be3..92591e49a1c8 100644
> --- a/tools/libs/ctrl/xc_linux.c
> +++ b/tools/libs/ctrl/xc_linux.c
> @@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
>      return ptr;
>  }
>  
> +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
> +{
> +    int ret;
> +    privcmd_pcidev_get_gsi_t dev_gsi = {
> +        .sbdf = sbdf,
> +        .gsi = 0,
> +    };
> +
> +    ret = ioctl(xencall_fd(xch->xcall),
> +                IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
> +
> +    if (ret < 0) {
> +        PERROR("Failed to get gsi from dev");
> +    } else {
> +        ret = dev_gsi.gsi;

I've just notice that this is mixing signed and unsigned int.
We are storing a "__u32" into an "int" here. This isn't great as we are
throwing way lots of potentially good value. (Even if I have no idea if
they are possible.)

Cheers,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev
  2024-09-03 15:51   ` Anthony PERARD
@ 2024-09-03 15:57     ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2024-09-03 15:57 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, Jiqian Chen

On 03.09.2024 17:51, Anthony PERARD wrote:
> On Tue, Sep 03, 2024 at 03:04:23PM +0800, Jiqian Chen wrote:
>> --- a/tools/libs/ctrl/xc_linux.c
>> +++ b/tools/libs/ctrl/xc_linux.c
>> @@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
>>      return ptr;
>>  }
>>  
>> +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
>> +{
>> +    int ret;
>> +    privcmd_pcidev_get_gsi_t dev_gsi = {
>> +        .sbdf = sbdf,
>> +        .gsi = 0,
>> +    };
>> +
>> +    ret = ioctl(xencall_fd(xch->xcall),
>> +                IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
>> +
>> +    if (ret < 0) {
>> +        PERROR("Failed to get gsi from dev");
>> +    } else {
>> +        ret = dev_gsi.gsi;
> 
> I've just notice that this is mixing signed and unsigned int.
> We are storing a "__u32" into an "int" here. This isn't great as we are
> throwing way lots of potentially good value. (Even if I have no idea if
> they are possible.)

GSIs are numbered 0...N-1, with N being the number of lines all IO-APICs
in the system together have. Surely that's not going to go into the
millions, let alone billions. At least not any time soon, and there's
move away from using line interrupts anyway.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0
  2024-09-03  7:04 ` [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
@ 2024-09-03 17:16   ` Anthony PERARD
  2024-09-04  9:31     ` Chen, Jiqian
  0 siblings, 1 reply; 30+ messages in thread
From: Anthony PERARD @ 2024-09-03 17:16 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Juergen Gross, Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Tue, Sep 03, 2024 at 03:04:24PM +0800, Jiqian Chen wrote:
> When dom0 is PVH, and passthrough a device to dumU, xl will
> use the gsi number of device to do a pirq mapping, see
> pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is
> got from file /sys/bus/pci/devices/<sbdf>/irq, that confuses
> irq and gsi, they are in different space and are not equal,
> so it will fail when mapping.
> To solve this issue, use xc_physdev_gsi_from_dev to get the

The function name has changed, it's not xc_physdev_gsi_from_dev anymore
;-). It is always possible to write commit description without naming
the functions that are going to be use, they are in the patch anyway.
But, the description should be updated to reflect changes in previous
patches.

> real gsi and add a new function xc_physdev_map_pirq_gsi to get
> a free pirq for gsi(why not use current function
> xc_physdev_map_pirq, because it doesn't support to allocate a
> free pirq, what's more, to prevent changing it and affecting
> its callers, so add xc_physdev_map_pirq_gsi).
> 
> Besides, PVH dom0 doesn't have PIRQ flag, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. So grant function callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq. And old hypercall XEN_DOMCTL_irq_permission
> requires passing in pirq, it is not suitable for dom0 that
> doesn't have PIRQs to grant irq permission.
> To solve this issue, use the new hypercall
> XEN_DOMCTL_gsi_permission to grant the permission of irq(
> translate from gsi) to dumU when dom0 has no PIRQs.
>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> ---
> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
> index 460a8e779ce8..c752cd1f4410 100644
> --- a/tools/libs/ctrl/xc_physdev.c
> +++ b/tools/libs/ctrl/xc_physdev.c
> @@ -95,6 +95,33 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_physdev_map_pirq_gsi(xc_interface *xch,
> +                            uint32_t domid,
> +                            int gsi,
> +                            int *pirq)
> +{
> +    int rc;
> +    struct physdev_map_pirq map;
> +
> +    if ( !pirq )
> +    {
> +        errno = EINVAL;
> +        return -1;
> +    }
> +    memset(&map, 0, sizeof(struct physdev_map_pirq));
> +    map.domid = domid;
> +    map.type = MAP_PIRQ_TYPE_GSI;
> +    map.index = gsi;

You can write instead, when declaring `map`:

    struct physdev_map_pirq map = {
        .domid = domid,
        .type = MAP_PIRQ_TYPE_GSI,
        .index = gsi,
    };

Then there's no need to call memset(), the compiler will know what to
do.

> +    map.pirq = *pirq;
> +
> +    rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
> +
> +    if ( !rc )
> +        *pirq = map.pirq;
> +
> +    return rc;
> +}
> +
>  int xc_physdev_unmap_pirq(xc_interface *xch,
>                            uint32_t domid,
>                            int pirq)
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 96cb4da0794e..2014a67e6e56 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -17,6 +17,7 @@
>  #include "libxl_osdeps.h" /* must come before any other headers */
>  
>  #include "libxl_internal.h"
> +#include "libxl_arch.h"
>  
>  #define PCI_BDF                "%04x:%02x:%02x.%01x"
>  #define PCI_BDF_SHORT          "%02x:%02x.%01x"
> @@ -1478,32 +1479,43 @@ static void pci_add_dm_done(libxl__egc *egc,
>      fclose(f);
>      if (!pci_supp_legacy_irq())
>          goto out_no_irq;
> -    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
> -                                pci->bus, pci->dev, pci->func);
> -    f = fopen(sysfs_path, "r");
> -    if (f == NULL) {
> -        LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
> -        goto out_no_irq;
> -    }
> -    if ((fscanf(f, "%u", &irq) == 1) && irq) {
> -        r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
> -        if (r < 0) {
> -            LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
> -                  irq, r);
> -            fclose(f);
> -            rc = ERROR_FAIL;
> +
> +    /* When dom0 is PVH, should use gsi to map pirq and grant permission */
> +    rc = libxl__arch_local_domain_has_pirq_notion(gc);
> +    if (!rc) {

Here, you can libxl__arch_local_domain_has_pirq_notion() within the if()
condition because the function returns a bool. (Also, `rc` is for libxl
error code, so we can make the mistake here in thinking that there an
error code been been store.) Alternatively, you could declare "bool ok"
and use that.

After that, it will be easier to read the condition has "if has pirq" or
"if ok" instead of "if no error".



> +        rc = libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid);
> +        if (rc) {
> +            LOGED(ERROR, domainid, "libxl__arch_hvm_map_gsi failed");

I think LOGD() instead of LOGED() would be enough here.
libxl__arch_hvm_map_gsi() already logs `strerror(errno)` so there's no
need to print it again.

>              goto out;
>          }
> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> -        if (r < 0) {
> -            LOGED(ERROR, domainid,
> -                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> -            fclose(f);
> -            rc = ERROR_FAIL;
> -            goto out;
> +    } else {
> +        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
> +                                    pci->bus, pci->dev, pci->func);
> +        f = fopen(sysfs_path, "r");
> +        if (f == NULL) {
> +            LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
> +            goto out_no_irq;
> +        }
> +        if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +            r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
> +            if (r < 0) {
> +                LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
> +                    irq, r);
> +                fclose(f);
> +                rc = ERROR_FAIL;
> +                goto out;
> +            }
> +            r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +            if (r < 0) {
> +                LOGED(ERROR, domainid,
> +                    "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> +                fclose(f);
> +                rc = ERROR_FAIL;
> +                goto out;
> +            }
>          }
> +        fclose(f);
>      }
> -    fclose(f);
>  
>      /* Don't restrict writes to the PCI config space from this VM */
>      if (pci->permissive) {
> @@ -2229,33 +2241,43 @@ skip_bar:
>      if (!pci_supp_legacy_irq())
>          goto skip_legacy_irq;
>  
> -    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
> -                           pci->bus, pci->dev, pci->func);
> -
> -    f = fopen(sysfs_path, "r");
> -    if (f == NULL) {
> -        LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
> -        goto skip_legacy_irq;
> -    }
> +    /* When dom0 is PVH, should use gsi to unmap pirq and deny permission */
> +    rc = libxl__arch_local_domain_has_pirq_notion(gc);
> +    if (!rc) {
> +        rc = libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid);
> +        if (rc) {
> +            LOGED(ERROR, domid, "libxl__arch_hvm_unmap_gsi failed");
> +            goto out;
> +        }
> +    } else {
> +        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
> +                            pci->bus, pci->dev, pci->func);
>  
> -    if ((fscanf(f, "%u", &irq) == 1) && irq) {
> -        rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
> -        if (rc < 0) {
> -            /*
> -             * QEMU may have already unmapped the IRQ. So the error
> -             * may be spurious. For now, still print an error message as
> -             * it is not easy to distinguished between valid and
> -             * spurious error.
> -             */
> -            LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
> +        f = fopen(sysfs_path, "r");
> +        if (f == NULL) {
> +            LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
> +            goto skip_legacy_irq;
>          }
> -        rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
> -        if (rc < 0) {
> -            LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
> +
> +        if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +            rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
> +            if (rc < 0) {
> +                /*
> +                * QEMU may have already unmapped the IRQ. So the error

The * ...     here ^ should be aligned with the * on the previous line.

> +                * may be spurious. For now, still print an error message as
> +                * it is not easy to distinguished between valid and
> +                * spurious error.
> +                */
> +                LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
> +            }
> +            rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
> +            if (rc < 0) {
> +                LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
> +            }
>          }
> -    }
>  
> -    fclose(f);
> +        fclose(f);
> +    }
>  
>  skip_legacy_irq:
>  
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 60643d6f5376..20e3956f09b8 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
> +{
> +    int pirq = -1, gsi, r;
> +
> +    gsi = xc_pcidev_get_gsi(CTX->xch, sbdf);
> +    if (gsi < 0) {
> +        return ERROR_FAIL;
> +    }
> +
> +    r = xc_physdev_map_pirq_gsi(CTX->xch, domid, gsi, &pirq);
> +    if (r < 0) {
> +        LOGED(ERROR, domid, "xc_physdev_map_pirq_gsi gsi=%d ret=%d", gsi, r);

`r` should be -1, I don't think loggin it is useful..

> +        return ERROR_FAIL;
> +    }
> +
> +    r = xc_domain_gsi_permission(CTX->xch, domid, gsi, XEN_DOMCTL_GSI_GRANT);
> +    if (r < 0) {
> +        LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d ret=%d", gsi, r);

Same here. And in the next function.

> +        return ERROR_FAIL;
> +    }
> +
> +    return 0;
> +}
> +
> +int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
> +{
> +    int pirq = -1, gsi, r;
> +
> +    gsi = xc_pcidev_get_gsi(CTX->xch, sbdf);
> +    if (gsi < 0) {
> +        return ERROR_FAIL;
> +    }
> +
> +    /* Before unmapping, use mapping to get the already mapped pirq first */
> +    r = xc_physdev_map_pirq_gsi(CTX->xch, domid, gsi, &pirq);
> +    if (r < 0) {
> +        LOGED(ERROR, domid, "xc_physdev_map_pirq_gsi gsi=%d ret=%d", gsi, r);
> +        return ERROR_FAIL;
> +    }
> +
> +    r = xc_physdev_unmap_pirq(CTX->xch, domid, pirq);
> +    if (r < 0) {
> +        LOGED(ERROR, domid, "xc_physdev_unmap_pirq gsi=%d ret=%d", gsi, r);
> +        return ERROR_FAIL;
> +    }
> +
> +    r = xc_domain_gsi_permission(CTX->xch, domid, gsi, XEN_DOMCTL_GSI_REVOKE);
> +    if (r < 0) {
> +        LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d ret=%d", gsi, r);
> +        return ERROR_FAIL;
> +    }
> +
> +    return 0;
> +}


Thanks,

-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-03 14:14           ` Jan Beulich
@ 2024-09-04  1:43             ` Stefano Stabellini
  2024-09-04  6:04               ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2024-09-04  1:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Chen, Jiqian, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Juergen Gross, Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On Tue, 3 Sep 2024, Jan Beulich wrote:
> On 03.09.2024 12:53, Chen, Jiqian wrote:
> > On 2024/9/3 17:25, Jan Beulich wrote:
> >> On 03.09.2024 09:58, Chen, Jiqian wrote:
> >>> On 2024/9/3 15:42, Jan Beulich wrote:
> >>>> On 03.09.2024 09:04, Jiqian Chen wrote:
> >>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
> >>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>> codes.
> >>>>>
> >>>>> But it is fine to map interrupts through pirq to a HVM domain whose
> >>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
> >>>>> reference interrupts and it is just the way for the device model to
> >>>>> identify which interrupt should be mapped to which domain, however
> >>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
> >>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
> >>>>>
> >>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> >>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
> >>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
> >>>>
> >>>> As before: When you talk about just Dom0, ...
> >>>>
> >>>>> --- a/xen/arch/x86/hvm/hypercall.c
> >>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>>      {
> >>>>>      case PHYSDEVOP_map_pirq:
> >>>>>      case PHYSDEVOP_unmap_pirq:
> >>>>> +        break;
> >>>>> +
> >>>>>      case PHYSDEVOP_eoi:
> >>>>>      case PHYSDEVOP_irq_status_query:
> >>>>>      case PHYSDEVOP_get_free_pirq:
> >>>>
> >>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
> >>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
> >>>> Similarly imo Dom0 using this on itself wants discussing.
> >>> Do you mean I need to talk about why permit this op for all HVM
> >>
> >> You don't need to invent reasons, but it needs making clear that wider than
> >> necessary (for your purpose) exposure is at least not going to be a problem.
> >>
> >>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
> >>
> >> Aiui use on itself is limited to Dom0, so only that would need clarifying
> >> (along the lines of the above, i.e. that/why it is not a problem). For
> >> has_pirq() domains use on oneself was already permitted before.
> > 
> > Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
> > {
> > x86/pvh: Allow (un)map_pirq when dom0 is PVH
> > 
> > Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
> > code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> > xc_physdev_map_pirq map a pirq for passthrough devices.
> > In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> > has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> > so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> > codes.
> > 
> > To solve above problem, need to remove the chack has_pirq() for that
> > situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
> > adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
> > what the problem need.
> > So, clarify below:
> > 
> > For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
> > interrupts through pirq for them. Because pirq field is used as a way to
> > reference interrupts and it is just the way for the device model to
> > identify which interrupt should be mapped to which domain, however
> > has_pirq() is just to check if HVM domains route interrupts from
> > devices(emulated or passthrough) through event channel, so, remove
> > has_pirq() check has no impact on HVM domUs.
> > 
> > For PVH domUs that performs such an operation will fail at the check
> > xsm_map_dedomain_pirq() in physdev_map-nirq().
> > 
> > For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
> > it also has no impact.
> > }
> 
> This is better than what you had before, and I don't really fancy re-
> writing the description effectively from scratch. So let's just go from
> there. As to the "excess" permission for the sub-ops: The main thing I'm
> after is that it be clarified that we're not going to introduce any
> security issues here. That requires auditing the code, and merely saying
> "also has no impact" is a little too little for my taste. For Dom0 an
> argument may be that it is overly powerful already anyway, even if for
> PVH we're a little more strict than for PV (I think).

Hi Jan, for clarity and to make this actionable, are you suggesting to
clarify the commit message by adding wording around "Dom0 is overly
powerful already anyway so it is OK so this is OK" ?

There are typically no restrictions in terms of functionalities offered
to Dom0 as Dom0 can destroy other VMs and reboot the platform.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-04  1:43             ` Stefano Stabellini
@ 2024-09-04  6:04               ` Jan Beulich
  2024-09-05  6:45                 ` Chen, Jiqian
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-04  6:04 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Chen, Jiqian, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 04.09.2024 03:43, Stefano Stabellini wrote:
> On Tue, 3 Sep 2024, Jan Beulich wrote:
>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>> codes.
>>>>>>>
>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>
>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>
>>>>>> As before: When you talk about just Dom0, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>      {
>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>> +        break;
>>>>>>> +
>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>
>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>
>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>
>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>
>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>> has_pirq() domains use on oneself was already permitted before.
>>>
>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>> {
>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>
>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>> codes.
>>>
>>> To solve above problem, need to remove the chack has_pirq() for that
>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>> what the problem need.
>>> So, clarify below:
>>>
>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>> interrupts through pirq for them. Because pirq field is used as a way to
>>> reference interrupts and it is just the way for the device model to
>>> identify which interrupt should be mapped to which domain, however
>>> has_pirq() is just to check if HVM domains route interrupts from
>>> devices(emulated or passthrough) through event channel, so, remove
>>> has_pirq() check has no impact on HVM domUs.
>>>
>>> For PVH domUs that performs such an operation will fail at the check
>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>
>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>> it also has no impact.
>>> }
>>
>> This is better than what you had before, and I don't really fancy re-
>> writing the description effectively from scratch. So let's just go from
>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>> after is that it be clarified that we're not going to introduce any
>> security issues here. That requires auditing the code, and merely saying
>> "also has no impact" is a little too little for my taste. For Dom0 an
>> argument may be that it is overly powerful already anyway, even if for
>> PVH we're a little more strict than for PV (I think).
> 
> Hi Jan, for clarity and to make this actionable, are you suggesting to
> clarify the commit message by adding wording around "Dom0 is overly
> powerful already anyway so it is OK so this is OK" ?

Yes, perhaps with "deemed" added. And text for DomU-s similarly extended,
as the pointing at the XSM check is presently incomplete (stubdom-s can
pass that check, after all, as can de-priv qemu running in Dom0).

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0
  2024-09-03 17:16   ` Anthony PERARD
@ 2024-09-04  9:31     ` Chen, Jiqian
  0 siblings, 0 replies; 30+ messages in thread
From: Chen, Jiqian @ 2024-09-04  9:31 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Chen, Jiqian, Huang, Ray

On 2024/9/4 01:16, Anthony PERARD wrote:
> On Tue, Sep 03, 2024 at 03:04:24PM +0800, Jiqian Chen wrote:
>> When dom0 is PVH, and passthrough a device to dumU, xl will
>> use the gsi number of device to do a pirq mapping, see
>> pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is
>> got from file /sys/bus/pci/devices/<sbdf>/irq, that confuses
>> irq and gsi, they are in different space and are not equal,
>> so it will fail when mapping.
>> To solve this issue, use xc_physdev_gsi_from_dev to get the
> 
> The function name has changed, it's not xc_physdev_gsi_from_dev anymore
> ;-). It is always possible to write commit description without naming
> the functions that are going to be use, they are in the patch anyway.
> But, the description should be updated to reflect changes in previous
> patches.
Will change in next version, and include all your below comments, thank you very much!

> 
>> real gsi and add a new function xc_physdev_map_pirq_gsi to get
>> a free pirq for gsi(why not use current function
>> xc_physdev_map_pirq, because it doesn't support to allocate a
>> free pirq, what's more, to prevent changing it and affecting
>> its callers, so add xc_physdev_map_pirq_gsi).
>>
>> Besides, PVH dom0 doesn't have PIRQ flag, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. So grant function callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq. And old hypercall XEN_DOMCTL_irq_permission
>> requires passing in pirq, it is not suitable for dom0 that
>> doesn't have PIRQs to grant irq permission.
>> To solve this issue, use the new hypercall
>> XEN_DOMCTL_gsi_permission to grant the permission of irq(
>> translate from gsi) to dumU when dom0 has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
>> ---
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index 460a8e779ce8..c752cd1f4410 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -95,6 +95,33 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
>>      return rc;
>>  }
>>  
>> +int xc_physdev_map_pirq_gsi(xc_interface *xch,
>> +                            uint32_t domid,
>> +                            int gsi,
>> +                            int *pirq)
>> +{
>> +    int rc;
>> +    struct physdev_map_pirq map;
>> +
>> +    if ( !pirq )
>> +    {
>> +        errno = EINVAL;
>> +        return -1;
>> +    }
>> +    memset(&map, 0, sizeof(struct physdev_map_pirq));
>> +    map.domid = domid;
>> +    map.type = MAP_PIRQ_TYPE_GSI;
>> +    map.index = gsi;
> 
> You can write instead, when declaring `map`:
> 
>     struct physdev_map_pirq map = {
>         .domid = domid,
>         .type = MAP_PIRQ_TYPE_GSI,
>         .index = gsi,
>     };
> 
> Then there's no need to call memset(), the compiler will know what to
> do.
> 
>> +    map.pirq = *pirq;
>> +
>> +    rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
>> +
>> +    if ( !rc )
>> +        *pirq = map.pirq;
>> +
>> +    return rc;
>> +}
>> +
>>  int xc_physdev_unmap_pirq(xc_interface *xch,
>>                            uint32_t domid,
>>                            int pirq)
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 96cb4da0794e..2014a67e6e56 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -17,6 +17,7 @@
>>  #include "libxl_osdeps.h" /* must come before any other headers */
>>  
>>  #include "libxl_internal.h"
>> +#include "libxl_arch.h"
>>  
>>  #define PCI_BDF                "%04x:%02x:%02x.%01x"
>>  #define PCI_BDF_SHORT          "%02x:%02x.%01x"
>> @@ -1478,32 +1479,43 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      fclose(f);
>>      if (!pci_supp_legacy_irq())
>>          goto out_no_irq;
>> -    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> -                                pci->bus, pci->dev, pci->func);
>> -    f = fopen(sysfs_path, "r");
>> -    if (f == NULL) {
>> -        LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
>> -        goto out_no_irq;
>> -    }
>> -    if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> -        r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>> -        if (r < 0) {
>> -            LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
>> -                  irq, r);
>> -            fclose(f);
>> -            rc = ERROR_FAIL;
>> +
>> +    /* When dom0 is PVH, should use gsi to map pirq and grant permission */
>> +    rc = libxl__arch_local_domain_has_pirq_notion(gc);
>> +    if (!rc) {
> 
> Here, you can libxl__arch_local_domain_has_pirq_notion() within the if()
> condition because the function returns a bool. (Also, `rc` is for libxl
> error code, so we can make the mistake here in thinking that there an
> error code been been store.) Alternatively, you could declare "bool ok"
> and use that.
> 
> After that, it will be easier to read the condition has "if has pirq" or
> "if ok" instead of "if no error".
> 
> 
> 
>> +        rc = libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid);
>> +        if (rc) {
>> +            LOGED(ERROR, domainid, "libxl__arch_hvm_map_gsi failed");
> 
> I think LOGD() instead of LOGED() would be enough here.
> libxl__arch_hvm_map_gsi() already logs `strerror(errno)` so there's no
> need to print it again.
> 
>>              goto out;
>>          }
>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>> -        if (r < 0) {
>> -            LOGED(ERROR, domainid,
>> -                  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
>> -            fclose(f);
>> -            rc = ERROR_FAIL;
>> -            goto out;
>> +    } else {
>> +        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +                                    pci->bus, pci->dev, pci->func);
>> +        f = fopen(sysfs_path, "r");
>> +        if (f == NULL) {
>> +            LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
>> +            goto out_no_irq;
>> +        }
>> +        if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +            r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>> +            if (r < 0) {
>> +                LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
>> +                    irq, r);
>> +                fclose(f);
>> +                rc = ERROR_FAIL;
>> +                goto out;
>> +            }
>> +            r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>> +            if (r < 0) {
>> +                LOGED(ERROR, domainid,
>> +                    "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
>> +                fclose(f);
>> +                rc = ERROR_FAIL;
>> +                goto out;
>> +            }
>>          }
>> +        fclose(f);
>>      }
>> -    fclose(f);
>>  
>>      /* Don't restrict writes to the PCI config space from this VM */
>>      if (pci->permissive) {
>> @@ -2229,33 +2241,43 @@ skip_bar:
>>      if (!pci_supp_legacy_irq())
>>          goto skip_legacy_irq;
>>  
>> -    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> -                           pci->bus, pci->dev, pci->func);
>> -
>> -    f = fopen(sysfs_path, "r");
>> -    if (f == NULL) {
>> -        LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
>> -        goto skip_legacy_irq;
>> -    }
>> +    /* When dom0 is PVH, should use gsi to unmap pirq and deny permission */
>> +    rc = libxl__arch_local_domain_has_pirq_notion(gc);
>> +    if (!rc) {
>> +        rc = libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid);
>> +        if (rc) {
>> +            LOGED(ERROR, domid, "libxl__arch_hvm_unmap_gsi failed");
>> +            goto out;
>> +        }
>> +    } else {
>> +        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +                            pci->bus, pci->dev, pci->func);
>>  
>> -    if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> -        rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
>> -        if (rc < 0) {
>> -            /*
>> -             * QEMU may have already unmapped the IRQ. So the error
>> -             * may be spurious. For now, still print an error message as
>> -             * it is not easy to distinguished between valid and
>> -             * spurious error.
>> -             */
>> -            LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
>> +        f = fopen(sysfs_path, "r");
>> +        if (f == NULL) {
>> +            LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
>> +            goto skip_legacy_irq;
>>          }
>> -        rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
>> -        if (rc < 0) {
>> -            LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
>> +
>> +        if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +            rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
>> +            if (rc < 0) {
>> +                /*
>> +                * QEMU may have already unmapped the IRQ. So the error
> 
> The * ...     here ^ should be aligned with the * on the previous line.
> 
>> +                * may be spurious. For now, still print an error message as
>> +                * it is not easy to distinguished between valid and
>> +                * spurious error.
>> +                */
>> +                LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
>> +            }
>> +            rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
>> +            if (rc < 0) {
>> +                LOGED(ERROR, domid, "xc_domain_irq_permission irq=%d", irq);
>> +            }
>>          }
>> -    }
>>  
>> -    fclose(f);
>> +        fclose(f);
>> +    }
>>  
>>  skip_legacy_irq:
>>  
>> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
>> index 60643d6f5376..20e3956f09b8 100644
>> --- a/tools/libs/light/libxl_x86.c
>> +++ b/tools/libs/light/libxl_x86.c
>> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +    int pirq = -1, gsi, r;
>> +
>> +    gsi = xc_pcidev_get_gsi(CTX->xch, sbdf);
>> +    if (gsi < 0) {
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    r = xc_physdev_map_pirq_gsi(CTX->xch, domid, gsi, &pirq);
>> +    if (r < 0) {
>> +        LOGED(ERROR, domid, "xc_physdev_map_pirq_gsi gsi=%d ret=%d", gsi, r);
> 
> `r` should be -1, I don't think loggin it is useful..
> 
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    r = xc_domain_gsi_permission(CTX->xch, domid, gsi, XEN_DOMCTL_GSI_GRANT);
>> +    if (r < 0) {
>> +        LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d ret=%d", gsi, r);
> 
> Same here. And in the next function.
> 
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +    int pirq = -1, gsi, r;
>> +
>> +    gsi = xc_pcidev_get_gsi(CTX->xch, sbdf);
>> +    if (gsi < 0) {
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    /* Before unmapping, use mapping to get the already mapped pirq first */
>> +    r = xc_physdev_map_pirq_gsi(CTX->xch, domid, gsi, &pirq);
>> +    if (r < 0) {
>> +        LOGED(ERROR, domid, "xc_physdev_map_pirq_gsi gsi=%d ret=%d", gsi, r);
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    r = xc_physdev_unmap_pirq(CTX->xch, domid, pirq);
>> +    if (r < 0) {
>> +        LOGED(ERROR, domid, "xc_physdev_unmap_pirq gsi=%d ret=%d", gsi, r);
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    r = xc_domain_gsi_permission(CTX->xch, domid, gsi, XEN_DOMCTL_GSI_REVOKE);
>> +    if (r < 0) {
>> +        LOGED(ERROR, domid, "xc_domain_gsi_permission gsi=%d ret=%d", gsi, r);
>> +        return ERROR_FAIL;
>> +    }
>> +
>> +    return 0;
>> +}
> 
> 
> Thanks,
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-04  6:04               ` Jan Beulich
@ 2024-09-05  6:45                 ` Chen, Jiqian
  2024-09-05  9:44                   ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chen, Jiqian @ 2024-09-05  6:45 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini, Roger Pau Monné
  Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Chen, Jiqian

HI,

On 2024/9/4 14:04, Jan Beulich wrote:
> On 04.09.2024 03:43, Stefano Stabellini wrote:
>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>> codes.
>>>>>>>>
>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>>
>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>>
>>>>>>> As before: When you talk about just Dom0, ...
>>>>>>>
>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>      {
>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>> +        break;
>>>>>>>> +
>>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>>
>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>>
>>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>>
>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>>
>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>>> has_pirq() domains use on oneself was already permitted before.
>>>>
>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>>> {
>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>>
>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>> codes.
>>>>
>>>> To solve above problem, need to remove the chack has_pirq() for that
>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>>> what the problem need.
>>>> So, clarify below:
>>>>
>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>>> interrupts through pirq for them. Because pirq field is used as a way to
>>>> reference interrupts and it is just the way for the device model to
>>>> identify which interrupt should be mapped to which domain, however
>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>> devices(emulated or passthrough) through event channel, so, remove
>>>> has_pirq() check has no impact on HVM domUs.
>>>>
>>>> For PVH domUs that performs such an operation will fail at the check
>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>>
>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>>> it also has no impact.
>>>> }
>>>
>>> This is better than what you had before, and I don't really fancy re-
>>> writing the description effectively from scratch. So let's just go from
>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>>> after is that it be clarified that we're not going to introduce any
>>> security issues here. That requires auditing the code, and merely saying
>>> "also has no impact" is a little too little for my taste. For Dom0 an
>>> argument may be that it is overly powerful already anyway, even if for
>>> PVH we're a little more strict than for PV (I think).
>>
>> Hi Jan, for clarity and to make this actionable, are you suggesting to
>> clarify the commit message by adding wording around "Dom0 is overly
>> powerful already anyway so it is OK so this is OK" ?
> 
> Yes, perhaps with "deemed" added. 
OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "

> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
> pass that check, after all, as can de-priv qemu running in Dom0).
So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-05  6:45                 ` Chen, Jiqian
@ 2024-09-05  9:44                   ` Jan Beulich
  2024-09-05 22:51                     ` Stefano Stabellini
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-05  9:44 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Stefano Stabellini, Roger Pau Monné

On 05.09.2024 08:45, Chen, Jiqian wrote:
> HI,
> 
> On 2024/9/4 14:04, Jan Beulich wrote:
>> On 04.09.2024 03:43, Stefano Stabellini wrote:
>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>> codes.
>>>>>>>>>
>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>>>
>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>>>
>>>>>>>> As before: When you talk about just Dom0, ...
>>>>>>>>
>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>>      {
>>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>>> +        break;
>>>>>>>>> +
>>>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>>>
>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>>>
>>>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>>>
>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>>>
>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>>>> has_pirq() domains use on oneself was already permitted before.
>>>>>
>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>>>> {
>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>>>
>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>> codes.
>>>>>
>>>>> To solve above problem, need to remove the chack has_pirq() for that
>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>>>> what the problem need.
>>>>> So, clarify below:
>>>>>
>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>>>> interrupts through pirq for them. Because pirq field is used as a way to
>>>>> reference interrupts and it is just the way for the device model to
>>>>> identify which interrupt should be mapped to which domain, however
>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>> devices(emulated or passthrough) through event channel, so, remove
>>>>> has_pirq() check has no impact on HVM domUs.
>>>>>
>>>>> For PVH domUs that performs such an operation will fail at the check
>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>>>
>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>>>> it also has no impact.
>>>>> }
>>>>
>>>> This is better than what you had before, and I don't really fancy re-
>>>> writing the description effectively from scratch. So let's just go from
>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>>>> after is that it be clarified that we're not going to introduce any
>>>> security issues here. That requires auditing the code, and merely saying
>>>> "also has no impact" is a little too little for my taste. For Dom0 an
>>>> argument may be that it is overly powerful already anyway, even if for
>>>> PVH we're a little more strict than for PV (I think).
>>>
>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
>>> clarify the commit message by adding wording around "Dom0 is overly
>>> powerful already anyway so it is OK so this is OK" ?
>>
>> Yes, perhaps with "deemed" added. 
> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "

I don't mind the deemed as you add it, but the important place to add it
here is before "OK". I'm sorry, it didn't occur to me that after all the
prior discussion this earlier reply of mine could still be mis-interpreted.

>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
>> pass that check, after all, as can de-priv qemu running in Dom0).
> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?

I'm afraid that in order to make (propose) such a change you need to be
able to explain why it is okay to expose functionality beyond where it's
presently exposed. It's not just writing a new paragraph that's needed
here. You first need to _check_ that what you do is okay. And once you've
done that checking, you then summarize that in writing.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-05  9:44                   ` Jan Beulich
@ 2024-09-05 22:51                     ` Stefano Stabellini
  2024-09-06  6:26                       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2024-09-05 22:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Chen, Jiqian, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Stefano Stabellini, Roger Pau Monné

On Thu, 5 Sep 2024, Jan Beulich wrote:
> On 05.09.2024 08:45, Chen, Jiqian wrote:
> > HI,
> > 
> > On 2024/9/4 14:04, Jan Beulich wrote:
> >> On 04.09.2024 03:43, Stefano Stabellini wrote:
> >>> On Tue, 3 Sep 2024, Jan Beulich wrote:
> >>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
> >>>>> On 2024/9/3 17:25, Jan Beulich wrote:
> >>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
> >>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
> >>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
> >>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
> >>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>>>>>> codes.
> >>>>>>>>>
> >>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
> >>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
> >>>>>>>>> reference interrupts and it is just the way for the device model to
> >>>>>>>>> identify which interrupt should be mapped to which domain, however
> >>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
> >>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
> >>>>>>>>>
> >>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> >>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
> >>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
> >>>>>>>>
> >>>>>>>> As before: When you talk about just Dom0, ...
> >>>>>>>>
> >>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> >>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>>>>>>      {
> >>>>>>>>>      case PHYSDEVOP_map_pirq:
> >>>>>>>>>      case PHYSDEVOP_unmap_pirq:
> >>>>>>>>> +        break;
> >>>>>>>>> +
> >>>>>>>>>      case PHYSDEVOP_eoi:
> >>>>>>>>>      case PHYSDEVOP_irq_status_query:
> >>>>>>>>>      case PHYSDEVOP_get_free_pirq:
> >>>>>>>>
> >>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
> >>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
> >>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
> >>>>>>> Do you mean I need to talk about why permit this op for all HVM
> >>>>>>
> >>>>>> You don't need to invent reasons, but it needs making clear that wider than
> >>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
> >>>>>>
> >>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
> >>>>>>
> >>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
> >>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
> >>>>>> has_pirq() domains use on oneself was already permitted before.
> >>>>>
> >>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
> >>>>> {
> >>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
> >>>>>
> >>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
> >>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>> codes.
> >>>>>
> >>>>> To solve above problem, need to remove the chack has_pirq() for that
> >>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
> >>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
> >>>>> what the problem need.
> >>>>> So, clarify below:
> >>>>>
> >>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
> >>>>> interrupts through pirq for them. Because pirq field is used as a way to
> >>>>> reference interrupts and it is just the way for the device model to
> >>>>> identify which interrupt should be mapped to which domain, however
> >>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>> devices(emulated or passthrough) through event channel, so, remove
> >>>>> has_pirq() check has no impact on HVM domUs.
> >>>>>
> >>>>> For PVH domUs that performs such an operation will fail at the check
> >>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
> >>>>>
> >>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
> >>>>> it also has no impact.
> >>>>> }
> >>>>
> >>>> This is better than what you had before, and I don't really fancy re-
> >>>> writing the description effectively from scratch. So let's just go from
> >>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
> >>>> after is that it be clarified that we're not going to introduce any
> >>>> security issues here. That requires auditing the code, and merely saying
> >>>> "also has no impact" is a little too little for my taste. For Dom0 an
> >>>> argument may be that it is overly powerful already anyway, even if for
> >>>> PVH we're a little more strict than for PV (I think).
> >>>
> >>> Hi Jan, for clarity and to make this actionable, are you suggesting to
> >>> clarify the commit message by adding wording around "Dom0 is overly
> >>> powerful already anyway so it is OK so this is OK" ?
> >>
> >> Yes, perhaps with "deemed" added. 
> > OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
> 
> I don't mind the deemed as you add it, but the important place to add it
> here is before "OK". I'm sorry, it didn't occur to me that after all the
> prior discussion this earlier reply of mine could still be mis-interpreted.
> 
> >> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
> >> pass that check, after all, as can de-priv qemu running in Dom0).
> > So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
> 
> I'm afraid that in order to make (propose) such a change you need to be
> able to explain why it is okay to expose functionality beyond where it's
> presently exposed. It's not just writing a new paragraph that's needed
> here. You first need to _check_ that what you do is okay. And once you've
> done that checking, you then summarize that in writing.
 

PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
by:

    ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
    if ( ret )
        return ret;

Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
fine. Device models are also OK because the code we are trying to enable
is in fact part of the device model. If someone were to run an HVM
stubdom they might need this patch as well.

If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
allowed.

Is this explanation OK?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-05 22:51                     ` Stefano Stabellini
@ 2024-09-06  6:26                       ` Jan Beulich
  2024-09-06 23:34                         ` Stefano Stabellini
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-06  6:26 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Chen, Jiqian, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Roger Pau Monné

On 06.09.2024 00:51, Stefano Stabellini wrote:
> On Thu, 5 Sep 2024, Jan Beulich wrote:
>> On 05.09.2024 08:45, Chen, Jiqian wrote:
>>> HI,
>>>
>>> On 2024/9/4 14:04, Jan Beulich wrote:
>>>> On 04.09.2024 03:43, Stefano Stabellini wrote:
>>>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>>>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>>>> codes.
>>>>>>>>>>>
>>>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>>>>>
>>>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>>>>>
>>>>>>>>>> As before: When you talk about just Dom0, ...
>>>>>>>>>>
>>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>>>>      {
>>>>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>>>>> +        break;
>>>>>>>>>>> +
>>>>>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>>>>>
>>>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>>>>>
>>>>>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>>>>>
>>>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>>>>>
>>>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>>>>>> has_pirq() domains use on oneself was already permitted before.
>>>>>>>
>>>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>>>>>> {
>>>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>>>>>
>>>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>> codes.
>>>>>>>
>>>>>>> To solve above problem, need to remove the chack has_pirq() for that
>>>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>>>>>> what the problem need.
>>>>>>> So, clarify below:
>>>>>>>
>>>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>>>>>> interrupts through pirq for them. Because pirq field is used as a way to
>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>> devices(emulated or passthrough) through event channel, so, remove
>>>>>>> has_pirq() check has no impact on HVM domUs.
>>>>>>>
>>>>>>> For PVH domUs that performs such an operation will fail at the check
>>>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>>>>>
>>>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>>>>>> it also has no impact.
>>>>>>> }
>>>>>>
>>>>>> This is better than what you had before, and I don't really fancy re-
>>>>>> writing the description effectively from scratch. So let's just go from
>>>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>>>>>> after is that it be clarified that we're not going to introduce any
>>>>>> security issues here. That requires auditing the code, and merely saying
>>>>>> "also has no impact" is a little too little for my taste. For Dom0 an
>>>>>> argument may be that it is overly powerful already anyway, even if for
>>>>>> PVH we're a little more strict than for PV (I think).
>>>>>
>>>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
>>>>> clarify the commit message by adding wording around "Dom0 is overly
>>>>> powerful already anyway so it is OK so this is OK" ?
>>>>
>>>> Yes, perhaps with "deemed" added. 
>>> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
>>
>> I don't mind the deemed as you add it, but the important place to add it
>> here is before "OK". I'm sorry, it didn't occur to me that after all the
>> prior discussion this earlier reply of mine could still be mis-interpreted.
>>
>>>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
>>>> pass that check, after all, as can de-priv qemu running in Dom0).
>>> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
>>
>> I'm afraid that in order to make (propose) such a change you need to be
>> able to explain why it is okay to expose functionality beyond where it's
>> presently exposed. It's not just writing a new paragraph that's needed
>> here. You first need to _check_ that what you do is okay. And once you've
>> done that checking, you then summarize that in writing.
>  
> 
> PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
> by:
> 
>     ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
>     if ( ret )
>         return ret;
> 
> Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
> fine. Device models are also OK because the code we are trying to enable
> is in fact part of the device model. If someone were to run an HVM
> stubdom they might need this patch as well.
> 
> If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
> allowed.
> 
> Is this explanation OK?

This still solely focuses on why the functionality is wanted. There
continues to be nothing about the wider exposure actually being safe.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-06  6:26                       ` Jan Beulich
@ 2024-09-06 23:34                         ` Stefano Stabellini
  2024-09-09  8:56                           ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2024-09-06 23:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Chen, Jiqian, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Roger Pau Monné

On Fri, 6 Sep 2024, Jan Beulich wrote:
> On 06.09.2024 00:51, Stefano Stabellini wrote:
> > On Thu, 5 Sep 2024, Jan Beulich wrote:
> >> On 05.09.2024 08:45, Chen, Jiqian wrote:
> >>> HI,
> >>>
> >>> On 2024/9/4 14:04, Jan Beulich wrote:
> >>>> On 04.09.2024 03:43, Stefano Stabellini wrote:
> >>>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
> >>>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
> >>>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
> >>>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
> >>>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
> >>>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
> >>>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
> >>>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>>>>>>>> codes.
> >>>>>>>>>>>
> >>>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
> >>>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
> >>>>>>>>>>> reference interrupts and it is just the way for the device model to
> >>>>>>>>>>> identify which interrupt should be mapped to which domain, however
> >>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
> >>>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
> >>>>>>>>>>>
> >>>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> >>>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
> >>>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
> >>>>>>>>>>
> >>>>>>>>>> As before: When you talk about just Dom0, ...
> >>>>>>>>>>
> >>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> >>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>>>>>>>>      {
> >>>>>>>>>>>      case PHYSDEVOP_map_pirq:
> >>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
> >>>>>>>>>>> +        break;
> >>>>>>>>>>> +
> >>>>>>>>>>>      case PHYSDEVOP_eoi:
> >>>>>>>>>>>      case PHYSDEVOP_irq_status_query:
> >>>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
> >>>>>>>>>>
> >>>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
> >>>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
> >>>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
> >>>>>>>>> Do you mean I need to talk about why permit this op for all HVM
> >>>>>>>>
> >>>>>>>> You don't need to invent reasons, but it needs making clear that wider than
> >>>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
> >>>>>>>>
> >>>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
> >>>>>>>>
> >>>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
> >>>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
> >>>>>>>> has_pirq() domains use on oneself was already permitted before.
> >>>>>>>
> >>>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
> >>>>>>> {
> >>>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
> >>>>>>>
> >>>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
> >>>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>>>> codes.
> >>>>>>>
> >>>>>>> To solve above problem, need to remove the chack has_pirq() for that
> >>>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
> >>>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
> >>>>>>> what the problem need.
> >>>>>>> So, clarify below:
> >>>>>>>
> >>>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
> >>>>>>> interrupts through pirq for them. Because pirq field is used as a way to
> >>>>>>> reference interrupts and it is just the way for the device model to
> >>>>>>> identify which interrupt should be mapped to which domain, however
> >>>>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>>>> devices(emulated or passthrough) through event channel, so, remove
> >>>>>>> has_pirq() check has no impact on HVM domUs.
> >>>>>>>
> >>>>>>> For PVH domUs that performs such an operation will fail at the check
> >>>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
> >>>>>>>
> >>>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
> >>>>>>> it also has no impact.
> >>>>>>> }
> >>>>>>
> >>>>>> This is better than what you had before, and I don't really fancy re-
> >>>>>> writing the description effectively from scratch. So let's just go from
> >>>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
> >>>>>> after is that it be clarified that we're not going to introduce any
> >>>>>> security issues here. That requires auditing the code, and merely saying
> >>>>>> "also has no impact" is a little too little for my taste. For Dom0 an
> >>>>>> argument may be that it is overly powerful already anyway, even if for
> >>>>>> PVH we're a little more strict than for PV (I think).
> >>>>>
> >>>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
> >>>>> clarify the commit message by adding wording around "Dom0 is overly
> >>>>> powerful already anyway so it is OK so this is OK" ?
> >>>>
> >>>> Yes, perhaps with "deemed" added. 
> >>> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
> >>
> >> I don't mind the deemed as you add it, but the important place to add it
> >> here is before "OK". I'm sorry, it didn't occur to me that after all the
> >> prior discussion this earlier reply of mine could still be mis-interpreted.
> >>
> >>>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
> >>>> pass that check, after all, as can de-priv qemu running in Dom0).
> >>> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
> >>
> >> I'm afraid that in order to make (propose) such a change you need to be
> >> able to explain why it is okay to expose functionality beyond where it's
> >> presently exposed. It's not just writing a new paragraph that's needed
> >> here. You first need to _check_ that what you do is okay. And once you've
> >> done that checking, you then summarize that in writing.
> >  
> > 
> > PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
> > by:
> > 
> >     ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
> >     if ( ret )
> >         return ret;
> > 
> > Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
> > fine. Device models are also OK because the code we are trying to enable
> > is in fact part of the device model. If someone were to run an HVM
> > stubdom they might need this patch as well.
> > 
> > If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
> > allowed.
> > 
> > Is this explanation OK?
> 
> This still solely focuses on why the functionality is wanted. There
> continues to be nothing about the wider exposure actually being safe.

I don't think I understand what you would like to be checked or
clarified...

The only wider exposure is to device models, and device models can do a
lot worse than mapping pirqs already. There is no wider exposure to
DomUs. Also PV device models can already do this.

But the above must be clear to you as well, so I am not sure what you
are looking for.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-06 23:34                         ` Stefano Stabellini
@ 2024-09-09  8:56                           ` Jan Beulich
  2024-09-09 10:04                             ` Roger Pau Monné
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2024-09-09  8:56 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Chen, Jiqian, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Roger Pau Monné

On 07.09.2024 01:34, Stefano Stabellini wrote:
> On Fri, 6 Sep 2024, Jan Beulich wrote:
>> On 06.09.2024 00:51, Stefano Stabellini wrote:
>>> On Thu, 5 Sep 2024, Jan Beulich wrote:
>>>> On 05.09.2024 08:45, Chen, Jiqian wrote:
>>>>> HI,
>>>>>
>>>>> On 2024/9/4 14:04, Jan Beulich wrote:
>>>>>> On 04.09.2024 03:43, Stefano Stabellini wrote:
>>>>>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>>>>>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>>>>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>>>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>>>>>> codes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>>>>>>>
>>>>>>>>>>>> As before: When you talk about just Dom0, ...
>>>>>>>>>>>>
>>>>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>>>>>>      {
>>>>>>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>>>>>>> +        break;
>>>>>>>>>>>>> +
>>>>>>>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>>>>>>>
>>>>>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>>>>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>>>>>>>
>>>>>>>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>>>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>>>>>>>
>>>>>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>>>>>>>
>>>>>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>>>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>>>>>>>> has_pirq() domains use on oneself was already permitted before.
>>>>>>>>>
>>>>>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>>>>>>>> {
>>>>>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>>>>>>>
>>>>>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>>>>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>> codes.
>>>>>>>>>
>>>>>>>>> To solve above problem, need to remove the chack has_pirq() for that
>>>>>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>>>>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>>>>>>>> what the problem need.
>>>>>>>>> So, clarify below:
>>>>>>>>>
>>>>>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>>>>>>>> interrupts through pirq for them. Because pirq field is used as a way to
>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>> devices(emulated or passthrough) through event channel, so, remove
>>>>>>>>> has_pirq() check has no impact on HVM domUs.
>>>>>>>>>
>>>>>>>>> For PVH domUs that performs such an operation will fail at the check
>>>>>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>>>>>>>
>>>>>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>>>>>>>> it also has no impact.
>>>>>>>>> }
>>>>>>>>
>>>>>>>> This is better than what you had before, and I don't really fancy re-
>>>>>>>> writing the description effectively from scratch. So let's just go from
>>>>>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>>>>>>>> after is that it be clarified that we're not going to introduce any
>>>>>>>> security issues here. That requires auditing the code, and merely saying
>>>>>>>> "also has no impact" is a little too little for my taste. For Dom0 an
>>>>>>>> argument may be that it is overly powerful already anyway, even if for
>>>>>>>> PVH we're a little more strict than for PV (I think).
>>>>>>>
>>>>>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
>>>>>>> clarify the commit message by adding wording around "Dom0 is overly
>>>>>>> powerful already anyway so it is OK so this is OK" ?
>>>>>>
>>>>>> Yes, perhaps with "deemed" added. 
>>>>> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
>>>>
>>>> I don't mind the deemed as you add it, but the important place to add it
>>>> here is before "OK". I'm sorry, it didn't occur to me that after all the
>>>> prior discussion this earlier reply of mine could still be mis-interpreted.
>>>>
>>>>>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
>>>>>> pass that check, after all, as can de-priv qemu running in Dom0).
>>>>> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
>>>>
>>>> I'm afraid that in order to make (propose) such a change you need to be
>>>> able to explain why it is okay to expose functionality beyond where it's
>>>> presently exposed. It's not just writing a new paragraph that's needed
>>>> here. You first need to _check_ that what you do is okay. And once you've
>>>> done that checking, you then summarize that in writing.
>>>  
>>>
>>> PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
>>> by:
>>>
>>>     ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
>>>     if ( ret )
>>>         return ret;
>>>
>>> Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
>>> fine. Device models are also OK because the code we are trying to enable
>>> is in fact part of the device model. If someone were to run an HVM
>>> stubdom they might need this patch as well.
>>>
>>> If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
>>> allowed.
>>>
>>> Is this explanation OK?
>>
>> This still solely focuses on why the functionality is wanted. There
>> continues to be nothing about the wider exposure actually being safe.
> 
> I don't think I understand what you would like to be checked or
> clarified...
> 
> The only wider exposure is to device models, and device models can do a
> lot worse than mapping pirqs already. There is no wider exposure to
> DomUs. Also PV device models can already do this.

What do you mean by "worse"? I hope not "crash Xen"? And _that's_ what I
want to have assurance of, e.g. a PVH/HVM DM not suddenly bringing Xen
down, because these paths previously weren't accessible to them.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-09-03  7:04 ` [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
@ 2024-09-09  9:15   ` Roger Pau Monné
  2024-09-09 10:30     ` Chen, Jiqian
  0 siblings, 1 reply; 30+ messages in thread
From: Roger Pau Monné @ 2024-09-09  9:15 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Tue, Sep 03, 2024 at 03:04:22PM +0800, Jiqian Chen wrote:
> Some type of domains don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq to set the access of irq, it is not suitable for
> dom0 that doesn't have PIRQs.

I think the above commit message is a bit confusing, I would rather
write it as:

x86/irq: allow setting IRQ permissions from GSI instead of pIRQ

Some domains are not aware of the pIRQ abstraction layer that maps
interrupt sources into Xen space interrupt numbers.  pIRQs values are
only exposed to domains that have the option to route physical
interrupts over event channels.

This creates issues for PCI-passthrough from a PVH domain, as some of
the passthrough related hypercalls use pIRQ as references to physical
interrupts on the system.  One of such interfaces is
XEN_DOMCTL_irq_permission, used to grant or revoke access to
interrupts, takes a pIRQ as the reference to the interrupt to be
adjusted.

Since PVH doesn't manage interrupts in terms of pIRQs, introduce a new
hypercall that allows setting interrupt permissions based on GSI value
rather than pIRQ.

Note the GSI hypercall parameters is translated to an IRQ value (in
case there are ACPI overrides) before doing the checks.

> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke
> the permission of irq (translated from x86 gsi) to dumU when dom0
> has no PIRQs.
> 
> Regarding the translation from gsi to irq, it is that if there are
> ACPI overrides entries then get translation from them, if not gsi
> are identity mapped into irq.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
> Remaining unsolved comment @Daniel P . Smith:
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
> +            break;
> Is it okay to issue the XSM check using the translated value(irq),
> not the one(gsi) that was originally passed into the hypercall?
> ---
> v13->v14 changes:
> No.
> 
> v12->v13 changes:
> For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
> Move "gsi > highest_gsi()" into function gsi_2_irq.
> Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type.
> Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
> Delete unnecessary goto statements and change to direct break.
> Add description in commit message to explain how gsi to irq isconverted.
> 
> v11->v12 changes:
> Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function.
> Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
> Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
> 
> v10->v11 changes:
> Extracted from patch#5 of v10 into a separate patch.
> Add non-zero judgment for other bits of allow_access.
> Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
> Change the error exit path identifier "out" to "gsi_permission_out".
> Use ARRAY_SIZE() instead of open coed.
> 
> v9->v10 changes:
> Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
> Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain.
> In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added.
> Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition.
> 
> v8->v9 changes:
> Change the commit message to describe more why we need this new hypercall.
> Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check.
> Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
> Add explicit padding to struct xen_domctl_gsi_permission.
> 
> v5->v8 changes:
> Nothing.
> 
> v4->v5 changes:
> New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
> ---
>  xen/arch/x86/domctl.c              | 29 +++++++++++++++++++++++++++++
>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>  xen/arch/x86/io_apic.c             | 21 +++++++++++++++++++++
>  xen/arch/x86/mpparse.c             |  7 +++----
>  xen/include/public/domctl.h        | 10 ++++++++++
>  xen/xsm/flask/hooks.c              |  1 +
>  6 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 68b5b46d1a83..60b5578c47f8 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -36,6 +36,7 @@
>  #include <asm/xstate.h>
>  #include <asm/psr.h>
>  #include <asm/cpu-policy.h>
> +#include <asm/io_apic.h>
>  
>  static int update_domain_cpu_policy(struct domain *d,
>                                      xen_domctl_cpu_policy_t *xdpc)
> @@ -237,6 +238,34 @@ long arch_do_domctl(
>          break;
>      }
>  
> +    case XEN_DOMCTL_gsi_permission:
> +    {
> +        int irq;
> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> +        uint32_t flags = domctl->u.gsi_permission.flags;
> +
> +        /* Check all bits are zero except lowest bit */

Nit: I would instead use: "Check only valid bits are set" in order to
avoid the comment going stale if more bits are used in the flags
field.

> +        ret = -EINVAL;
> +        if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
> +            break;
> +
> +        ret = irq = gsi_2_irq(gsi);
> +        if ( ret <= 0 )
> +            break;
> +
> +        ret = -EPERM;
> +        if ( !irq_access_permitted(currd, irq) ||
> +             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
> +            break;
> +
> +        if ( flags )
> +            ret = irq_permit_access(d, irq);
> +        else
> +            ret = irq_deny_access(d, irq);
> +
> +        break;
> +    }
> +
>      case XEN_DOMCTL_getpageframeinfo3:
>      {
>          unsigned int num = domctl->u.getpageframeinfo3.num;
> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
> index 78268ea8f666..62456806c7af 100644
> --- a/xen/arch/x86/include/asm/io_apic.h
> +++ b/xen/arch/x86/include/asm/io_apic.h
> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>  
>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
> +int mp_find_ioapic(unsigned int gsi);
> +int gsi_2_irq(unsigned int gsi);
>  
>  #endif
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index 772700584639..5859484875cc 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -955,6 +955,27 @@ static int pin_2_irq(int idx, int apic, int pin)
>      return irq;
>  }
>  
> +int gsi_2_irq(unsigned int gsi)
> +{
> +    int ioapic, irq;
> +    unsigned int pin;
> +
> +    if ( gsi > highest_gsi() )
> +        return -ERANGE;
> +
> +    ioapic = mp_find_ioapic(gsi);
> +    if ( ioapic < 0 )
> +        return -EINVAL;
> +
> +    pin = gsi - io_apic_gsi_base(ioapic);
> +
> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
> +    if ( irq <= 0 )
> +        return -EINVAL;
> +
> +    return irq;

I think you could simplify this as:

return irq ?: -EINVAL;

So that the error code is possibly preserved from
apic_pin_2_gsi_irq(), or otherwise -EINVAL is returned if irq == 0.

pin_2_irq() is IMO broken in returning irq == 0 when the bus is
unknown, as irq == 0 is a valid irq, but let's not get into that here.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev
  2024-09-03  7:04 ` [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
@ 2024-09-09  9:34   ` Roger Pau Monné
  0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2024-09-09  9:34 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui

On Tue, Sep 03, 2024 at 03:04:20PM +0800, Jiqian Chen wrote:
> When a device has been reset on dom0 side, the Xen hypervisor
> doesn't get notification, so the cached state in vpci is all
> out of date compare with the real device state.
> 
> To solve that problem, add a new hypercall to support the reset
> of pcidev and clear the vpci state of device. So that once the
> state of device is reset on dom0 side, dom0 can call this
> hypercall to notify hypervisor.
> 
> The behavior of different reset types may be different in the
> future, so divide them now so that they can be easily modified
> in the future without affecting the hypercall interface.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-09  8:56                           ` Jan Beulich
@ 2024-09-09 10:04                             ` Roger Pau Monné
  2024-09-09 10:22                               ` Jan Beulich
  2024-09-09 10:35                               ` Chen, Jiqian
  0 siblings, 2 replies; 30+ messages in thread
From: Roger Pau Monné @ 2024-09-09 10:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Chen, Jiqian, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On Mon, Sep 09, 2024 at 10:56:07AM +0200, Jan Beulich wrote:
> On 07.09.2024 01:34, Stefano Stabellini wrote:
> > On Fri, 6 Sep 2024, Jan Beulich wrote:
> >> On 06.09.2024 00:51, Stefano Stabellini wrote:
> >>> On Thu, 5 Sep 2024, Jan Beulich wrote:
> >>>> On 05.09.2024 08:45, Chen, Jiqian wrote:
> >>>>> HI,
> >>>>>
> >>>>> On 2024/9/4 14:04, Jan Beulich wrote:
> >>>>>> On 04.09.2024 03:43, Stefano Stabellini wrote:
> >>>>>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
> >>>>>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
> >>>>>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
> >>>>>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
> >>>>>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
> >>>>>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
> >>>>>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
> >>>>>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>>>>>>>>>> codes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
> >>>>>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
> >>>>>>>>>>>>> reference interrupts and it is just the way for the device model to
> >>>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
> >>>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
> >>>>>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> >>>>>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
> >>>>>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
> >>>>>>>>>>>>
> >>>>>>>>>>>> As before: When you talk about just Dom0, ...
> >>>>>>>>>>>>
> >>>>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> >>>>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>>>>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>>>>>>>>>>      {
> >>>>>>>>>>>>>      case PHYSDEVOP_map_pirq:
> >>>>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
> >>>>>>>>>>>>> +        break;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>>      case PHYSDEVOP_eoi:
> >>>>>>>>>>>>>      case PHYSDEVOP_irq_status_query:
> >>>>>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
> >>>>>>>>>>>>
> >>>>>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
> >>>>>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
> >>>>>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
> >>>>>>>>>>> Do you mean I need to talk about why permit this op for all HVM
> >>>>>>>>>>
> >>>>>>>>>> You don't need to invent reasons, but it needs making clear that wider than
> >>>>>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
> >>>>>>>>>>
> >>>>>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
> >>>>>>>>>>
> >>>>>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
> >>>>>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
> >>>>>>>>>> has_pirq() domains use on oneself was already permitted before.
> >>>>>>>>>
> >>>>>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
> >>>>>>>>> {
> >>>>>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
> >>>>>>>>>
> >>>>>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
> >>>>>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
> >>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
> >>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
> >>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
> >>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
> >>>>>>>>> codes.
> >>>>>>>>>
> >>>>>>>>> To solve above problem, need to remove the chack has_pirq() for that
> >>>>>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
> >>>>>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
> >>>>>>>>> what the problem need.
> >>>>>>>>> So, clarify below:
> >>>>>>>>>
> >>>>>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
> >>>>>>>>> interrupts through pirq for them. Because pirq field is used as a way to
> >>>>>>>>> reference interrupts and it is just the way for the device model to
> >>>>>>>>> identify which interrupt should be mapped to which domain, however
> >>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
> >>>>>>>>> devices(emulated or passthrough) through event channel, so, remove
> >>>>>>>>> has_pirq() check has no impact on HVM domUs.
> >>>>>>>>>
> >>>>>>>>> For PVH domUs that performs such an operation will fail at the check
> >>>>>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
> >>>>>>>>>
> >>>>>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
> >>>>>>>>> it also has no impact.
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> This is better than what you had before, and I don't really fancy re-
> >>>>>>>> writing the description effectively from scratch. So let's just go from
> >>>>>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
> >>>>>>>> after is that it be clarified that we're not going to introduce any
> >>>>>>>> security issues here. That requires auditing the code, and merely saying
> >>>>>>>> "also has no impact" is a little too little for my taste. For Dom0 an
> >>>>>>>> argument may be that it is overly powerful already anyway, even if for
> >>>>>>>> PVH we're a little more strict than for PV (I think).
> >>>>>>>
> >>>>>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
> >>>>>>> clarify the commit message by adding wording around "Dom0 is overly
> >>>>>>> powerful already anyway so it is OK so this is OK" ?
> >>>>>>
> >>>>>> Yes, perhaps with "deemed" added. 
> >>>>> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
> >>>>
> >>>> I don't mind the deemed as you add it, but the important place to add it
> >>>> here is before "OK". I'm sorry, it didn't occur to me that after all the
> >>>> prior discussion this earlier reply of mine could still be mis-interpreted.
> >>>>
> >>>>>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
> >>>>>> pass that check, after all, as can de-priv qemu running in Dom0).
> >>>>> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
> >>>>
> >>>> I'm afraid that in order to make (propose) such a change you need to be
> >>>> able to explain why it is okay to expose functionality beyond where it's
> >>>> presently exposed. It's not just writing a new paragraph that's needed
> >>>> here. You first need to _check_ that what you do is okay. And once you've
> >>>> done that checking, you then summarize that in writing.
> >>>  
> >>>
> >>> PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
> >>> by:
> >>>
> >>>     ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
> >>>     if ( ret )
> >>>         return ret;
> >>>
> >>> Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
> >>> fine. Device models are also OK because the code we are trying to enable
> >>> is in fact part of the device model. If someone were to run an HVM
> >>> stubdom they might need this patch as well.
> >>>
> >>> If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
> >>> allowed.
> >>>
> >>> Is this explanation OK?
> >>
> >> This still solely focuses on why the functionality is wanted. There
> >> continues to be nothing about the wider exposure actually being safe.
> > 
> > I don't think I understand what you would like to be checked or
> > clarified...
> > 
> > The only wider exposure is to device models, and device models can do a
> > lot worse than mapping pirqs already. There is no wider exposure to
> > DomUs. Also PV device models can already do this.
> 
> What do you mean by "worse"? I hope not "crash Xen"? And _that's_ what I
> want to have assurance of, e.g. a PVH/HVM DM not suddenly bringing Xen
> down, because these paths previously weren't accessible to them.

What about a commit message along the lines of:

x86/hvm: allow {,un}map_pirq hypercalls unconditionally

The current hypercall interfaces to manage and assign interrupts to
domains is mostly based in using pIRQs as handlers.  Such pIRQ values
are abstract domain-specific references to interrupts.

Classic HVM domains can have access to {,un}map_pirq hypercalls if the
domain is allowed to route physical interrupts over event channels.
That's however a different interface, limited to only mapping
interrupts to itself. PVH domains on the other hand never had access
to the interface, as PVH domains are not allowed to route interrupts
over event channels.

In order to allow setting up PCI passthrough from a PVH domain it
needs access to the {,un}map_pirq hypercalls so interrupts can be
assigned a pIRQ handler that can then be used by further hypercalls to
bind the interrupt to a domain.

Note that the {,un}map_pirq hypercalls end up calling helpers that are
already used against a PVH domain in order to setup interrupts for the
hardware domain when running in PVH mode.  physdev_map_pirq() will
call allocate_and_map_{gsi,msi}_pirq() which is already used by the
vIO-APIC or the vPCI code respectively.  So the exposed code paths are
not new when targeting a PVH domain, but rather previous callers are
not hypercall but emulation based.

Regards, Roger.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-09 10:04                             ` Roger Pau Monné
@ 2024-09-09 10:22                               ` Jan Beulich
  2024-09-09 10:35                               ` Chen, Jiqian
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2024-09-09 10:22 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Chen, Jiqian, Andrew Cooper, Wei Liu,
	George Dunlap, Julien Grall, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 09.09.2024 12:04, Roger Pau Monné wrote:
> On Mon, Sep 09, 2024 at 10:56:07AM +0200, Jan Beulich wrote:
>> On 07.09.2024 01:34, Stefano Stabellini wrote:
>>> On Fri, 6 Sep 2024, Jan Beulich wrote:
>>>> On 06.09.2024 00:51, Stefano Stabellini wrote:
>>>>> On Thu, 5 Sep 2024, Jan Beulich wrote:
>>>>>> On 05.09.2024 08:45, Chen, Jiqian wrote:
>>>>>>> HI,
>>>>>>>
>>>>>>> On 2024/9/4 14:04, Jan Beulich wrote:
>>>>>>>> On 04.09.2024 03:43, Stefano Stabellini wrote:
>>>>>>>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>>>>>>>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>>>>>>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>>>>>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>>>>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>>>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>>>>>>>> codes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As before: When you talk about just Dom0, ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>>>>>>>>> +        break;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>>>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>>>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>>>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>>>>>>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>>>>>>>>>
>>>>>>>>>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>>>>>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>>>>>>>>>
>>>>>>>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>>>>>>>>>
>>>>>>>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>>>>>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>>>>>>>>>> has_pirq() domains use on oneself was already permitted before.
>>>>>>>>>>>
>>>>>>>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>>>>>>>>>> {
>>>>>>>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>>>>>>>>>
>>>>>>>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>>>>>>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>>>> codes.
>>>>>>>>>>>
>>>>>>>>>>> To solve above problem, need to remove the chack has_pirq() for that
>>>>>>>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>>>>>>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>>>>>>>>>> what the problem need.
>>>>>>>>>>> So, clarify below:
>>>>>>>>>>>
>>>>>>>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>>>>>>>>>> interrupts through pirq for them. Because pirq field is used as a way to
>>>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, remove
>>>>>>>>>>> has_pirq() check has no impact on HVM domUs.
>>>>>>>>>>>
>>>>>>>>>>> For PVH domUs that performs such an operation will fail at the check
>>>>>>>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>>>>>>>>>
>>>>>>>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>>>>>>>>>> it also has no impact.
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> This is better than what you had before, and I don't really fancy re-
>>>>>>>>>> writing the description effectively from scratch. So let's just go from
>>>>>>>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>>>>>>>>>> after is that it be clarified that we're not going to introduce any
>>>>>>>>>> security issues here. That requires auditing the code, and merely saying
>>>>>>>>>> "also has no impact" is a little too little for my taste. For Dom0 an
>>>>>>>>>> argument may be that it is overly powerful already anyway, even if for
>>>>>>>>>> PVH we're a little more strict than for PV (I think).
>>>>>>>>>
>>>>>>>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
>>>>>>>>> clarify the commit message by adding wording around "Dom0 is overly
>>>>>>>>> powerful already anyway so it is OK so this is OK" ?
>>>>>>>>
>>>>>>>> Yes, perhaps with "deemed" added. 
>>>>>>> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
>>>>>>
>>>>>> I don't mind the deemed as you add it, but the important place to add it
>>>>>> here is before "OK". I'm sorry, it didn't occur to me that after all the
>>>>>> prior discussion this earlier reply of mine could still be mis-interpreted.
>>>>>>
>>>>>>>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
>>>>>>>> pass that check, after all, as can de-priv qemu running in Dom0).
>>>>>>> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
>>>>>>
>>>>>> I'm afraid that in order to make (propose) such a change you need to be
>>>>>> able to explain why it is okay to expose functionality beyond where it's
>>>>>> presently exposed. It's not just writing a new paragraph that's needed
>>>>>> here. You first need to _check_ that what you do is okay. And once you've
>>>>>> done that checking, you then summarize that in writing.
>>>>>  
>>>>>
>>>>> PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
>>>>> by:
>>>>>
>>>>>     ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
>>>>>     if ( ret )
>>>>>         return ret;
>>>>>
>>>>> Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
>>>>> fine. Device models are also OK because the code we are trying to enable
>>>>> is in fact part of the device model. If someone were to run an HVM
>>>>> stubdom they might need this patch as well.
>>>>>
>>>>> If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
>>>>> allowed.
>>>>>
>>>>> Is this explanation OK?
>>>>
>>>> This still solely focuses on why the functionality is wanted. There
>>>> continues to be nothing about the wider exposure actually being safe.
>>>
>>> I don't think I understand what you would like to be checked or
>>> clarified...
>>>
>>> The only wider exposure is to device models, and device models can do a
>>> lot worse than mapping pirqs already. There is no wider exposure to
>>> DomUs. Also PV device models can already do this.
>>
>> What do you mean by "worse"? I hope not "crash Xen"? And _that's_ what I
>> want to have assurance of, e.g. a PVH/HVM DM not suddenly bringing Xen
>> down, because these paths previously weren't accessible to them.
> 
> What about a commit message along the lines of:
> 
> x86/hvm: allow {,un}map_pirq hypercalls unconditionally
> 
> The current hypercall interfaces to manage and assign interrupts to
> domains is mostly based in using pIRQs as handlers.  Such pIRQ values
> are abstract domain-specific references to interrupts.
> 
> Classic HVM domains can have access to {,un}map_pirq hypercalls if the
> domain is allowed to route physical interrupts over event channels.
> That's however a different interface, limited to only mapping
> interrupts to itself. PVH domains on the other hand never had access
> to the interface, as PVH domains are not allowed to route interrupts
> over event channels.
> 
> In order to allow setting up PCI passthrough from a PVH domain it
> needs access to the {,un}map_pirq hypercalls so interrupts can be
> assigned a pIRQ handler that can then be used by further hypercalls to
> bind the interrupt to a domain.
> 
> Note that the {,un}map_pirq hypercalls end up calling helpers that are
> already used against a PVH domain in order to setup interrupts for the
> hardware domain when running in PVH mode.  physdev_map_pirq() will
> call allocate_and_map_{gsi,msi}_pirq() which is already used by the
> vIO-APIC or the vPCI code respectively.  So the exposed code paths are
> not new when targeting a PVH domain, but rather previous callers are
> not hypercall but emulation based.

I think I'd be fine with that, thanks.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi
  2024-09-09  9:15   ` Roger Pau Monné
@ 2024-09-09 10:30     ` Chen, Jiqian
  0 siblings, 0 replies; 30+ messages in thread
From: Chen, Jiqian @ 2024-09-09 10:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Jan Beulich, Andrew Cooper,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, Chen, Jiqian

On 2024/9/9 17:15, Roger Pau Monné wrote:
> On Tue, Sep 03, 2024 at 03:04:22PM +0800, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
> 
> I think the above commit message is a bit confusing, I would rather
> write it as:

Thank you very much! I will change in next version according to your all comments.

> 
> x86/irq: allow setting IRQ permissions from GSI instead of pIRQ
> 
> Some domains are not aware of the pIRQ abstraction layer that maps
> interrupt sources into Xen space interrupt numbers.  pIRQs values are
> only exposed to domains that have the option to route physical
> interrupts over event channels.
> 
> This creates issues for PCI-passthrough from a PVH domain, as some of
> the passthrough related hypercalls use pIRQ as references to physical
> interrupts on the system.  One of such interfaces is
> XEN_DOMCTL_irq_permission, used to grant or revoke access to
> interrupts, takes a pIRQ as the reference to the interrupt to be
> adjusted.
> 
> Since PVH doesn't manage interrupts in terms of pIRQs, introduce a new
> hypercall that allows setting interrupt permissions based on GSI value
> rather than pIRQ.
> 
> Note the GSI hypercall parameters is translated to an IRQ value (in
> case there are ACPI overrides) before doing the checks.
> 
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke
>> the permission of irq (translated from x86 gsi) to dumU when dom0
>> has no PIRQs.
>>
>> Regarding the translation from gsi to irq, it is that if there are
>> ACPI overrides entries then get translation from them, if not gsi
>> are identity mapped into irq.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> CC: Daniel P . Smith <dpsmith@apertussolutions.com>
>> Remaining unsolved comment @Daniel P . Smith:
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
>> +            break;
>> Is it okay to issue the XSM check using the translated value(irq),
>> not the one(gsi) that was originally passed into the hypercall?
>> ---
>> v13->v14 changes:
>> No.
>>
>> v12->v13 changes:
>> For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
>> Move "gsi > highest_gsi()" into function gsi_2_irq.
>> Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type.
>> Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
>> Delete unnecessary goto statements and change to direct break.
>> Add description in commit message to explain how gsi to irq isconverted.
>>
>> v11->v12 changes:
>> Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function.
>> Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
>> Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
>>
>> v10->v11 changes:
>> Extracted from patch#5 of v10 into a separate patch.
>> Add non-zero judgment for other bits of allow_access.
>> Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
>> Change the error exit path identifier "out" to "gsi_permission_out".
>> Use ARRAY_SIZE() instead of open coed.
>>
>> v9->v10 changes:
>> Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
>> Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain.
>> In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added.
>> Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition.
>>
>> v8->v9 changes:
>> Change the commit message to describe more why we need this new hypercall.
>> Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check.
>> Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
>> Add explicit padding to struct xen_domctl_gsi_permission.
>>
>> v5->v8 changes:
>> Nothing.
>>
>> v4->v5 changes:
>> New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
>> ---
>>  xen/arch/x86/domctl.c              | 29 +++++++++++++++++++++++++++++
>>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>>  xen/arch/x86/io_apic.c             | 21 +++++++++++++++++++++
>>  xen/arch/x86/mpparse.c             |  7 +++----
>>  xen/include/public/domctl.h        | 10 ++++++++++
>>  xen/xsm/flask/hooks.c              |  1 +
>>  6 files changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 68b5b46d1a83..60b5578c47f8 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -36,6 +36,7 @@
>>  #include <asm/xstate.h>
>>  #include <asm/psr.h>
>>  #include <asm/cpu-policy.h>
>> +#include <asm/io_apic.h>
>>  
>>  static int update_domain_cpu_policy(struct domain *d,
>>                                      xen_domctl_cpu_policy_t *xdpc)
>> @@ -237,6 +238,34 @@ long arch_do_domctl(
>>          break;
>>      }
>>  
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        int irq;
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        uint32_t flags = domctl->u.gsi_permission.flags;
>> +
>> +        /* Check all bits are zero except lowest bit */
> 
> Nit: I would instead use: "Check only valid bits are set" in order to
> avoid the comment going stale if more bits are used in the flags
> field.
> 
>> +        ret = -EINVAL;
>> +        if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
>> +            break;
>> +
>> +        ret = irq = gsi_2_irq(gsi);
>> +        if ( ret <= 0 )
>> +            break;
>> +
>> +        ret = -EPERM;
>> +        if ( !irq_access_permitted(currd, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
>> +            break;
>> +
>> +        if ( flags )
>> +            ret = irq_permit_access(d, irq);
>> +        else
>> +            ret = irq_deny_access(d, irq);
>> +
>> +        break;
>> +    }
>> +
>>      case XEN_DOMCTL_getpageframeinfo3:
>>      {
>>          unsigned int num = domctl->u.getpageframeinfo3.num;
>> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
>> index 78268ea8f666..62456806c7af 100644
>> --- a/xen/arch/x86/include/asm/io_apic.h
>> +++ b/xen/arch/x86/include/asm/io_apic.h
>> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>>  
>>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
>> +int mp_find_ioapic(unsigned int gsi);
>> +int gsi_2_irq(unsigned int gsi);
>>  
>>  #endif
>> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
>> index 772700584639..5859484875cc 100644
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -955,6 +955,27 @@ static int pin_2_irq(int idx, int apic, int pin)
>>      return irq;
>>  }
>>  
>> +int gsi_2_irq(unsigned int gsi)
>> +{
>> +    int ioapic, irq;
>> +    unsigned int pin;
>> +
>> +    if ( gsi > highest_gsi() )
>> +        return -ERANGE;
>> +
>> +    ioapic = mp_find_ioapic(gsi);
>> +    if ( ioapic < 0 )
>> +        return -EINVAL;
>> +
>> +    pin = gsi - io_apic_gsi_base(ioapic);
>> +
>> +    irq = apic_pin_2_gsi_irq(ioapic, pin);
>> +    if ( irq <= 0 )
>> +        return -EINVAL;
>> +
>> +    return irq;
> 
> I think you could simplify this as:
> 
> return irq ?: -EINVAL;
> 
> So that the error code is possibly preserved from
> apic_pin_2_gsi_irq(), or otherwise -EINVAL is returned if irq == 0.
> 
> pin_2_irq() is IMO broken in returning irq == 0 when the bus is
> unknown, as irq == 0 is a valid irq, but let's not get into that here.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-09-09 10:04                             ` Roger Pau Monné
  2024-09-09 10:22                               ` Jan Beulich
@ 2024-09-09 10:35                               ` Chen, Jiqian
  1 sibling, 0 replies; 30+ messages in thread
From: Chen, Jiqian @ 2024-09-09 10:35 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich, Stefano Stabellini
  Cc: Chen, Jiqian, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org

On 2024/9/9 18:04, Roger Pau Monné wrote:
> On Mon, Sep 09, 2024 at 10:56:07AM +0200, Jan Beulich wrote:
>> On 07.09.2024 01:34, Stefano Stabellini wrote:
>>> On Fri, 6 Sep 2024, Jan Beulich wrote:
>>>> On 06.09.2024 00:51, Stefano Stabellini wrote:
>>>>> On Thu, 5 Sep 2024, Jan Beulich wrote:
>>>>>> On 05.09.2024 08:45, Chen, Jiqian wrote:
>>>>>>> HI,
>>>>>>>
>>>>>>> On 2024/9/4 14:04, Jan Beulich wrote:
>>>>>>>> On 04.09.2024 03:43, Stefano Stabellini wrote:
>>>>>>>>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>>>>>>>>>> On 03.09.2024 12:53, Chen, Jiqian wrote:
>>>>>>>>>>> On 2024/9/3 17:25, Jan Beulich wrote:
>>>>>>>>>>>> On 03.09.2024 09:58, Chen, Jiqian wrote:
>>>>>>>>>>>>> On 2024/9/3 15:42, Jan Beulich wrote:
>>>>>>>>>>>>>> On 03.09.2024 09:04, Jiqian Chen wrote:
>>>>>>>>>>>>>>> When dom0 is PVH type and passthrough a device to HVM domU, Qemu code
>>>>>>>>>>>>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>>>>>>>> codes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But it is fine to map interrupts through pirq to a HVM domain whose
>>>>>>>>>>>>>>> XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to
>>>>>>>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, the has_pirq()
>>>>>>>>>>>>>>> check should not be applied to the PHYSDEVOP_map_pirq issued by dom0.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>>>>>>>>>>>>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the
>>>>>>>>>>>>>>> interrupt of a passthrough device can be successfully mapped to pirq for domU.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As before: When you talk about just Dom0, ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>>>>>> @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>>>>>>>>> +        break;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      case PHYSDEVOP_eoi:
>>>>>>>>>>>>>>>      case PHYSDEVOP_irq_status_query:
>>>>>>>>>>>>>>>      case PHYSDEVOP_get_free_pirq:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ... that ought to match the code. IOW you've again lost why it is okay(ish)
>>>>>>>>>>>>>> (or even necessary) to also permit the op for non-Dom0 (e.g. a PVH stubdom).
>>>>>>>>>>>>>> Similarly imo Dom0 using this on itself wants discussing.
>>>>>>>>>>>>> Do you mean I need to talk about why permit this op for all HVM
>>>>>>>>>>>>
>>>>>>>>>>>> You don't need to invent reasons, but it needs making clear that wider than
>>>>>>>>>>>> necessary (for your purpose) exposure is at least not going to be a problem.
>>>>>>>>>>>>
>>>>>>>>>>>>> and  where can prevent PVH domain calling this for self-mapping, not only dom0?
>>>>>>>>>>>>
>>>>>>>>>>>> Aiui use on itself is limited to Dom0, so only that would need clarifying
>>>>>>>>>>>> (along the lines of the above, i.e. that/why it is not a problem). For
>>>>>>>>>>>> has_pirq() domains use on oneself was already permitted before.
>>>>>>>>>>>
>>>>>>>>>>> Changed commit message to below. Please check and that will be great helpful if you would show me how to modify it.
>>>>>>>>>>> {
>>>>>>>>>>> x86/pvh: Allow (un)map_pirq when dom0 is PVH
>>>>>>>>>>>
>>>>>>>>>>> Problem: when dom0 is PVH type and passthrough a device to HVM domU, Qemu
>>>>>>>>>>> code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->
>>>>>>>>>>> xc_physdev_map_pirq map a pirq for passthrough devices.
>>>>>>>>>>> In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check
>>>>>>>>>>> has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag,
>>>>>>>>>>> so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current
>>>>>>>>>>> codes.
>>>>>>>>>>>
>>>>>>>>>>> To solve above problem, need to remove the chack has_pirq() for that
>>>>>>>>>>> situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But without
>>>>>>>>>>> adding other restrictions, PHYSDEVOP_map_pirq will be allowed wider than
>>>>>>>>>>> what the problem need.
>>>>>>>>>>> So, clarify below:
>>>>>>>>>>>
>>>>>>>>>>> For HVM domUs whose XENFEAT_hvm_pirqs is not enabled,it is fine to map
>>>>>>>>>>> interrupts through pirq for them. Because pirq field is used as a way to
>>>>>>>>>>> reference interrupts and it is just the way for the device model to
>>>>>>>>>>> identify which interrupt should be mapped to which domain, however
>>>>>>>>>>> has_pirq() is just to check if HVM domains route interrupts from
>>>>>>>>>>> devices(emulated or passthrough) through event channel, so, remove
>>>>>>>>>>> has_pirq() check has no impact on HVM domUs.
>>>>>>>>>>>
>>>>>>>>>>> For PVH domUs that performs such an operation will fail at the check
>>>>>>>>>>> xsm_map_dedomain_pirq() in physdev_map-nirq().
>>>>>>>>>>>
>>>>>>>>>>> For PVH dom0, it uses vpci and doesn't use event channel, as above talks,
>>>>>>>>>>> it also has no impact.
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> This is better than what you had before, and I don't really fancy re-
>>>>>>>>>> writing the description effectively from scratch. So let's just go from
>>>>>>>>>> there. As to the "excess" permission for the sub-ops: The main thing I'm
>>>>>>>>>> after is that it be clarified that we're not going to introduce any
>>>>>>>>>> security issues here. That requires auditing the code, and merely saying
>>>>>>>>>> "also has no impact" is a little too little for my taste. For Dom0 an
>>>>>>>>>> argument may be that it is overly powerful already anyway, even if for
>>>>>>>>>> PVH we're a little more strict than for PV (I think).
>>>>>>>>>
>>>>>>>>> Hi Jan, for clarity and to make this actionable, are you suggesting to
>>>>>>>>> clarify the commit message by adding wording around "Dom0 is overly
>>>>>>>>> powerful already anyway so it is OK so this is OK" ?
>>>>>>>>
>>>>>>>> Yes, perhaps with "deemed" added. 
>>>>>>> OK, for PVH dom0, I will change to " Dom0 is deemed overly powerful already anyway, so it is OK "
>>>>>>
>>>>>> I don't mind the deemed as you add it, but the important place to add it
>>>>>> here is before "OK". I'm sorry, it didn't occur to me that after all the
>>>>>> prior discussion this earlier reply of mine could still be mis-interpreted.
>>>>>>
>>>>>>>> And text for DomU-s similarly extended, as the pointing at the XSM check is presently incomplete (stubdom-s can
>>>>>>>> pass that check, after all, as can de-priv qemu running in Dom0).
>>>>>>> So sorry, I know so little about this, I can't explain these situations, could you tell me how to describe or help me write a paragraph?
>>>>>>
>>>>>> I'm afraid that in order to make (propose) such a change you need to be
>>>>>> able to explain why it is okay to expose functionality beyond where it's
>>>>>> presently exposed. It's not just writing a new paragraph that's needed
>>>>>> here. You first need to _check_ that what you do is okay. And once you've
>>>>>> done that checking, you then summarize that in writing.
>>>>>  
>>>>>
>>>>> PHYSDEVOP_map_pirq ends up calling physdev_map_pirq which is protected
>>>>> by:
>>>>>
>>>>>     ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
>>>>>     if ( ret )
>>>>>         return ret;
>>>>>
>>>>> Dom0 having permission to do PHYSDEVOP_map_pirq even without has_pirq is
>>>>> fine. Device models are also OK because the code we are trying to enable
>>>>> is in fact part of the device model. If someone were to run an HVM
>>>>> stubdom they might need this patch as well.
>>>>>
>>>>> If PHYSDEVOP_map_pirq is allowed, also PHYSDEVOP_unmap_pirq should be
>>>>> allowed.
>>>>>
>>>>> Is this explanation OK?
>>>>
>>>> This still solely focuses on why the functionality is wanted. There
>>>> continues to be nothing about the wider exposure actually being safe.
>>>
>>> I don't think I understand what you would like to be checked or
>>> clarified...
>>>
>>> The only wider exposure is to device models, and device models can do a
>>> lot worse than mapping pirqs already. There is no wider exposure to
>>> DomUs. Also PV device models can already do this.
>>
>> What do you mean by "worse"? I hope not "crash Xen"? And _that's_ what I
>> want to have assurance of, e.g. a PVH/HVM DM not suddenly bringing Xen
>> down, because these paths previously weren't accessible to them.
> 
> What about a commit message along the lines of:
> 
> x86/hvm: allow {,un}map_pirq hypercalls unconditionally
> 
> The current hypercall interfaces to manage and assign interrupts to
> domains is mostly based in using pIRQs as handlers.  Such pIRQ values
> are abstract domain-specific references to interrupts.
> 
> Classic HVM domains can have access to {,un}map_pirq hypercalls if the
> domain is allowed to route physical interrupts over event channels.
> That's however a different interface, limited to only mapping
> interrupts to itself. PVH domains on the other hand never had access
> to the interface, as PVH domains are not allowed to route interrupts
> over event channels.
> 
> In order to allow setting up PCI passthrough from a PVH domain it
> needs access to the {,un}map_pirq hypercalls so interrupts can be
> assigned a pIRQ handler that can then be used by further hypercalls to
> bind the interrupt to a domain.
> 
> Note that the {,un}map_pirq hypercalls end up calling helpers that are
> already used against a PVH domain in order to setup interrupts for the
> hardware domain when running in PVH mode.  physdev_map_pirq() will
> call allocate_and_map_{gsi,msi}_pirq() which is already used by the
> vIO-APIC or the vPCI code respectively.  So the exposed code paths are
> not new when targeting a PVH domain, but rather previous callers are
> not hypercall but emulation based.

Thank you three very very much for your help.
I will change to this in next version.

> 
> Regards, Roger.

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-09-09 10:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-03  7:04 [XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
2024-09-03  7:04 ` [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev Jiqian Chen
2024-09-09  9:34   ` Roger Pau Monné
2024-09-03  7:04 ` [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
2024-09-03  7:42   ` Jan Beulich
2024-09-03  7:58     ` Chen, Jiqian
2024-09-03  9:25       ` Jan Beulich
2024-09-03 10:53         ` Chen, Jiqian
2024-09-03 14:14           ` Jan Beulich
2024-09-04  1:43             ` Stefano Stabellini
2024-09-04  6:04               ` Jan Beulich
2024-09-05  6:45                 ` Chen, Jiqian
2024-09-05  9:44                   ` Jan Beulich
2024-09-05 22:51                     ` Stefano Stabellini
2024-09-06  6:26                       ` Jan Beulich
2024-09-06 23:34                         ` Stefano Stabellini
2024-09-09  8:56                           ` Jan Beulich
2024-09-09 10:04                             ` Roger Pau Monné
2024-09-09 10:22                               ` Jan Beulich
2024-09-09 10:35                               ` Chen, Jiqian
2024-09-03  7:04 ` [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi Jiqian Chen
2024-09-09  9:15   ` Roger Pau Monné
2024-09-09 10:30     ` Chen, Jiqian
2024-09-03  7:04 ` [RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev Jiqian Chen
2024-09-03 14:12   ` Anthony PERARD
2024-09-03 15:51   ` Anthony PERARD
2024-09-03 15:57     ` Jan Beulich
2024-09-03  7:04 ` [RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0 Jiqian Chen
2024-09-03 17:16   ` Anthony PERARD
2024-09-04  9:31     ` Chen, Jiqian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.