* [PATCH v6 0/2] vfio/pci: add interrupt affinity support
@ 2024-06-11 17:44 Fred Griffoul
2024-06-11 17:44 ` [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
2024-06-11 17:44 ` [PATCH v6 2/2] vfio/pci: add interrupt affinity support Fred Griffoul
0 siblings, 2 replies; 6+ messages in thread
From: Fred Griffoul @ 2024-06-11 17:44 UTC (permalink / raw)
To: griffoul
Cc: Fred Griffoul, Catalin Marinas, Will Deacon, Alex Williamson,
Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, Mark Rutland,
Marc Zyngier, Oliver Upton, Mark Brown, Ard Biesheuvel,
Joey Gouly, Ryan Roberts, Jeremy Linton, Jason Gunthorpe, Yi Liu,
Kevin Tian, Eric Auger, Stefan Hajnoczi, Christian Brauner,
Ankit Agrawal, Reinette Chatre, Ye Bin, linux-arm-kernel,
linux-kernel, kvm, cgroups
v6:
- corrections following review from Alex Williamson
<alex.williamson@redhat.com>:
- rename the new flag to VFIO_IRQ_SET_DATA_CPUSET.
- add a new VFIO_IRQ_INFO_CPUSET flag for IRQ_INFO.
- rename the new vfio pci function to vfio_pci_set_affinity()
and call it for both MSI and INTx interrupts.
- use size_mul() for the VFIO_IRQ_SET_DATA_BOOL data
size computation.
- remove the dedicated cpumask_var_t allocation/release and
handle the DATA_CPUSET data copy like all the other flags, with
the generic memdup_user(). The minor drawback is that we then
have to reject cpu_set_t data smaller than the actual
kernel cpumask structure.
- in vfio_pci_set_affinity() use pci_irq_vector() to retrieve
the irq number of each vector.
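As a rough illustration of the DATA_CPUSET size rule from the memdup_user() item above, the check can be modelled in plain userspace C. This is only a sketch: cpuset_payload_size() and its 'kmask_size' parameter are hypothetical names standing in for the DATA_CPUSET case of vfio_set_irqs_validate_and_prepare() and for the kernel's cpumask_size().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the DATA_CPUSET size check.
 * 'payload' is argsz minus the fixed vfio_irq_set header and
 * 'kmask_size' stands in for the kernel's cpumask_size().
 * Returns the number of bytes that would be copied, or -EINVAL.
 */
static long cpuset_payload_size(size_t payload, size_t kmask_size)
{
	assert(kmask_size > 0);

	/* A user mask smaller than the kernel cpumask is rejected. */
	if (payload < kmask_size)
		return -EINVAL;

	/* A larger user mask is truncated to the kernel cpumask size. */
	return (long)kmask_size;
}
```

With an 8-byte kernel mask, a 128-byte payload would be cut down to 8 bytes and a 4-byte payload would be refused.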
v5:
- vfio_pci_ioctl_set_irqs(): fix copy_from_user() check when copying
the cpumask argument. Reported by Dan Carpenter
<dan.carpenter@linaro.org>
- vfio_set_irqs_validate_and_prepare(): use size_mul() to compute the
data size of a VFIO_IRQ_SET_DATA_EVENTFD ioctl() to avoid a possible
overflow on 32-bit systems. Reported by Dan Carpenter
<dan.carpenter@linaro.org>
- export system_32bit_el0_cpumask() to fix yet another missing symbol
for arm64 architecture.
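For readers unfamiliar with size_mul(), here is a minimal userspace sketch of the saturating multiply it performs. sat_size_mul() is a hypothetical stand-in written with the GCC/Clang __builtin_mul_overflow() builtin; it is not the kernel helper itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's size_mul():
 * return a * b, or SIZE_MAX when the product does not fit in size_t,
 * so that a later "argsz - minsz < size" comparison fails instead of
 * being fooled by a wrapped-around value.
 */
static size_t sat_size_mul(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

A plain 'count * size' expression would wrap silently when size_t is 32 bits wide; the saturated result makes the bounds check reject such a request.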
v4:
- export arm64_mismatched_32bit_el0 to compile the vfio driver as
a kernel module on arm64 if CONFIG_CPUSETS is not defined.
- vfio_pci_ioctl_set_irqs(): free the cpumask_var_t only if data_size
is not zero, otherwise it was not allocated.
- vfio_pci_set_msi_trigger(): call the new function
vfio_pci_set_msi_affinity() later, after the DATA_EVENTFD
processing and the vdev index check.
v3:
- add a first patch to export cpuset_cpus_allowed() to be able to
compile the vfio driver as a kernel module.
v2:
- change the ioctl() interface to use a cpu_set_t in vfio_irq_set
'data' to keep the 'start' and 'count' semantics, as suggested by
David Woodhouse <dwmw2@infradead.org>
v1:
The usual way to configure the affinity of a device interrupt from
userland is to write to the /proc/irq/<irq>/smp_affinity or
smp_affinity_list files. When using
vfio to implement a device driver or a virtual machine monitor, this may
not be ideal: the process managing the vfio device interrupts may not be
granted root privilege, for security reasons. Thus it cannot directly
control the interrupt affinity and has to rely on an external command.
This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data flag
to specify the affinity of a vfio pci device interrupt.
The affinity argument must be a subset of the process cpuset, otherwise
an error -EPERM is returned.
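The subset rule can be pictured with a small userspace model. This is a sketch under assumptions: the masks are plain arrays of unsigned long, and mask_subset() and check_affinity_request() are hypothetical helpers that mimic what the patch does in the kernel with cpumask_subset() against the result of cpuset_cpus_allowed().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* True when every bit set in 'req' is also set in 'allowed'. */
static bool mask_subset(const unsigned long *req,
			const unsigned long *allowed, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		if (req[i] & ~allowed[i])
			return false;
	return true;
}

/*
 * Hypothetical model of the permission check: 0 when the requested
 * affinity stays within the caller's cpuset, -EPERM otherwise.
 */
static int check_affinity_request(const unsigned long *req,
				  const unsigned long *allowed,
				  size_t nwords)
{
	assert(req && allowed);
	return mask_subset(req, allowed, nwords) ? 0 : -EPERM;
}
```

For a process confined to CPUs 0-3, a request for CPUs 0 and 2 passes, while a request that includes CPU 4 is refused with -EPERM.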
The vfio_irq_set argument shall be set up in the following way:
- the 'flags' field has the new flag VFIO_IRQ_SET_DATA_AFFINITY set
as well as VFIO_IRQ_SET_ACTION_TRIGGER.
- the 'start' field is the device interrupt index. Only one interrupt
can be configured per ioctl().
- the variable-length array consists of one or more CPU indexes
encoded as __u32; the number of entries in the array is specified in the
'count' field.
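For the later form of the interface (a cpu_set_t payload flagged with VFIO_IRQ_SET_DATA_CPUSET, per the v2 and v6 notes), a userspace caller might build the argument roughly as follows. This is a sketch under assumptions: build_affinity_request() is a hypothetical helper, and VFIO_IRQ_SET_DATA_CPUSET is defined locally with the value proposed by this series because installed kernel headers are not expected to provide it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for the CPU_* macros in <sched.h> */
#endif
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <linux/vfio.h>

/* Value proposed by this series; assumed absent from installed headers. */
#ifndef VFIO_IRQ_SET_DATA_CPUSET
#define VFIO_IRQ_SET_DATA_CPUSET	(1 << 6)
#endif

/*
 * Hypothetical helper: allocate a vfio_irq_set whose variable-length
 * 'data' carries a cpu_set_t for vectors [start, start + count) of the
 * given IRQ index. The caller frees the result with free().
 */
static struct vfio_irq_set *build_affinity_request(unsigned int index,
						   unsigned int start,
						   unsigned int count,
						   const cpu_set_t *cpus)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(*cpus);
	struct vfio_irq_set *set;

	assert(cpus != NULL);

	set = calloc(1, argsz);
	if (!set)
		return NULL;

	set->argsz = (__u32)argsz;	/* tells the kernel the mask size */
	set->flags = VFIO_IRQ_SET_DATA_CPUSET | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = index;
	set->start = start;
	set->count = count;
	memcpy(set->data, cpus, sizeof(*cpus));
	return set;
}
```

The buffer would then be passed to ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set), where device_fd is the vfio device file descriptor. Going by the description above, the call is expected to fail with EPERM when the mask is not a subset of the caller's cpuset. Since v6 rejects a payload smaller than the kernel cpumask, a fixed-size cpu_set_t may be too small on kernels built for very large CPU counts.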
Fred Griffoul (2):
cgroup/cpuset: export cpuset_cpus_allowed()
vfio/pci: add interrupt affinity support
arch/arm64/kernel/cpufeature.c | 2 ++
drivers/vfio/pci/vfio_pci_core.c | 2 +-
drivers/vfio/pci/vfio_pci_intrs.c | 41 +++++++++++++++++++++++++++++++
drivers/vfio/vfio_main.c | 20 ++++++++++++---
include/uapi/linux/vfio.h | 11 ++++++++-
kernel/cgroup/cpuset.c | 1 +
6 files changed, 71 insertions(+), 6 deletions(-)
base-commit: cbb325e77fbe62a06184175aa98c9eb98736c3e8
--
2.40.1
* [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-11 17:44 [PATCH v6 0/2] vfio/pci: add interrupt affinity support Fred Griffoul
@ 2024-06-11 17:44 ` Fred Griffoul
2024-06-11 18:32 ` Waiman Long
2024-06-12 9:21 ` Catalin Marinas
2024-06-11 17:44 ` [PATCH v6 2/2] vfio/pci: add interrupt affinity support Fred Griffoul
1 sibling, 2 replies; 6+ messages in thread
From: Fred Griffoul @ 2024-06-11 17:44 UTC (permalink / raw)
To: griffoul
Cc: Fred Griffoul, kernel test robot, Catalin Marinas, Will Deacon,
Alex Williamson, Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner,
Mark Rutland, Marc Zyngier, Oliver Upton, Mark Brown, Ard Biesheuvel,
Joey Gouly, Ryan Roberts, Jeremy Linton, Jason Gunthorpe, Yi Liu,
Kevin Tian, Eric Auger, Stefan Hajnoczi, Christian Brauner,
Ankit Agrawal, Reinette Chatre, Ye Bin, linux-arm-kernel,
linux-kernel, kvm, cgroups

A subsequent patch calls cpuset_cpus_allowed() in the vfio driver pci
code. Export the symbol to be able to build the vfio driver as a kernel
module.

This is not enough, however: when CONFIG_CPUSETS is _not_ defined
cpuset_cpus_allowed() is an inline function returning
task_cpu_possible_mask(). For the arm64 architecture this function is
also inline: it checks the arm64_mismatched_32bit_el0 static key and
calls system_32bit_el0_cpumask(). We need to export those symbols as
well.

Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202406060731.L3NSR1Hy-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202406070659.pYu6zNrx-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202406101154.iaDyTRwZ-lkp@intel.com/
---
 arch/arm64/kernel/cpufeature.c | 2 ++
 kernel/cgroup/cpuset.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 56583677c1f2..2f1de6343bee 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -127,6 +127,7 @@ static bool __read_mostly allow_mismatched_32bit_el0;
  * seen at least one CPU capable of 32-bit EL0.
  */
 DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
+EXPORT_SYMBOL_GPL(arm64_mismatched_32bit_el0);

 /*
  * Mask of CPUs supporting 32-bit EL0.
@@ -1614,6 +1615,7 @@ const struct cpumask *system_32bit_el0_cpumask(void)

 	return cpu_possible_mask;
 }
+EXPORT_SYMBOL_GPL(system_32bit_el0_cpumask);

 static int __init parse_32bit_el0_param(char *str)
 {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4237c8748715..9fd56222aa4b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4764,6 +4764,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 	rcu_read_unlock();
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
+EXPORT_SYMBOL_GPL(cpuset_cpus_allowed);

 /**
  * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
--
2.40.1
* Re: [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-11 17:44 ` [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
@ 2024-06-11 18:32 ` Waiman Long
2024-06-12 9:21 ` Catalin Marinas
1 sibling, 0 replies; 6+ messages in thread
From: Waiman Long @ 2024-06-11 18:32 UTC (permalink / raw)
To: Fred Griffoul, griffoul
Cc: kernel test robot, Catalin Marinas, Will Deacon, Alex Williamson,
Zefan Li, Tejun Heo, Johannes Weiner, Mark Rutland, Marc Zyngier,
Oliver Upton, Mark Brown, Ard Biesheuvel, Joey Gouly, Ryan Roberts,
Jeremy Linton, Jason Gunthorpe, Yi Liu, Kevin Tian, Eric Auger,
Stefan Hajnoczi, Christian Brauner, Ankit Agrawal, Reinette Chatre,
Ye Bin, linux-arm-kernel, linux-kernel, kvm, cgroups

On 6/11/24 13:44, Fred Griffoul wrote:
> A subsequent patch calls cpuset_cpus_allowed() in the vfio driver pci
> code. Export the symbol to be able to build the vfio driver as a kernel
> module.
>
> This is not enough, however: when CONFIG_CPUSETS is _not_ defined
> cpuset_cpus_allowed() is an inline function returning
> task_cpu_possible_mask(). For the arm64 architecture this function is
> also inline: it checks the arm64_mismatched_32bit_el0 static key and
> calls system_32bit_el0_cpumask(). We need to export those symbols as
> well.
>
> Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202406060731.L3NSR1Hy-lkp@intel.com/
> Closes: https://lore.kernel.org/oe-kbuild-all/202406070659.pYu6zNrx-lkp@intel.com/
> Closes: https://lore.kernel.org/oe-kbuild-all/202406101154.iaDyTRwZ-lkp@intel.com/
> ---
>  arch/arm64/kernel/cpufeature.c | 2 ++
>  kernel/cgroup/cpuset.c | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 56583677c1f2..2f1de6343bee 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -127,6 +127,7 @@ static bool __read_mostly allow_mismatched_32bit_el0;
>   * seen at least one CPU capable of 32-bit EL0.
>   */
>  DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
> +EXPORT_SYMBOL_GPL(arm64_mismatched_32bit_el0);
>
>  /*
>   * Mask of CPUs supporting 32-bit EL0.
> @@ -1614,6 +1615,7 @@ const struct cpumask *system_32bit_el0_cpumask(void)
>
>  	return cpu_possible_mask;
>  }
> +EXPORT_SYMBOL_GPL(system_32bit_el0_cpumask);
>
>  static int __init parse_32bit_el0_param(char *str)
>  {
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 4237c8748715..9fd56222aa4b 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4764,6 +4764,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>  	rcu_read_unlock();
>  	spin_unlock_irqrestore(&callback_lock, flags);
>  }
> +EXPORT_SYMBOL_GPL(cpuset_cpus_allowed);
>
>  /**
>   * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.

Acked-by: Waiman Long <longman@redhat.com>
* Re: [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-11 17:44 ` [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
2024-06-11 18:32 ` Waiman Long
@ 2024-06-12 9:21 ` Catalin Marinas
1 sibling, 0 replies; 6+ messages in thread
From: Catalin Marinas @ 2024-06-12 9:21 UTC (permalink / raw)
To: Fred Griffoul
Cc: griffoul, kernel test robot, Will Deacon, Alex Williamson,
Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, Mark Rutland,
Marc Zyngier, Oliver Upton, Mark Brown, Ard Biesheuvel, Joey Gouly,
Ryan Roberts, Jeremy Linton, Jason Gunthorpe, Yi Liu, Kevin Tian,
Eric Auger, Stefan Hajnoczi, Christian Brauner, Ankit Agrawal,
Reinette Chatre, Ye Bin, linux-arm-kernel, linux-kernel, kvm, cgroups

On Tue, Jun 11, 2024 at 05:44:24PM +0000, Fred Griffoul wrote:
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 56583677c1f2..2f1de6343bee 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -127,6 +127,7 @@ static bool __read_mostly allow_mismatched_32bit_el0;
>   * seen at least one CPU capable of 32-bit EL0.
>   */
>  DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
> +EXPORT_SYMBOL_GPL(arm64_mismatched_32bit_el0);
>
>  /*
>   * Mask of CPUs supporting 32-bit EL0.
> @@ -1614,6 +1615,7 @@ const struct cpumask *system_32bit_el0_cpumask(void)
>
>  	return cpu_possible_mask;
>  }
> +EXPORT_SYMBOL_GPL(system_32bit_el0_cpumask);

For the arm64 bits:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
* [PATCH v6 2/2] vfio/pci: add interrupt affinity support
2024-06-11 17:44 [PATCH v6 0/2] vfio/pci: add interrupt affinity support Fred Griffoul
2024-06-11 17:44 ` [PATCH v6 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
@ 2024-06-11 17:44 ` Fred Griffoul
2024-06-18 20:29 ` Alex Williamson
1 sibling, 1 reply; 6+ messages in thread
From: Fred Griffoul @ 2024-06-11 17:44 UTC (permalink / raw)
To: griffoul
Cc: Fred Griffoul, Catalin Marinas, Will Deacon, Alex Williamson,
Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, Mark Rutland,
Marc Zyngier, Oliver Upton, Mark Brown, Ard Biesheuvel,
Joey Gouly, Ryan Roberts, Jeremy Linton, Jason Gunthorpe, Yi Liu,
Kevin Tian, Eric Auger, Stefan Hajnoczi, Christian Brauner,
Ankit Agrawal, Reinette Chatre, Ye Bin, linux-arm-kernel,
linux-kernel, kvm, cgroups

The usual way to configure a device interrupt from userland is to write
the /proc/irq/<irq>/smp_affinity or smp_affinity_list files. When using
vfio to implement a device driver or a virtual machine monitor, this may
not be ideal: the process managing the vfio device interrupts may not be
granted root privilege, for security reasons. Thus it cannot directly
control the interrupt affinity and has to rely on an external command.

This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data flag
to specify the affinity of interrupts of a vfio pci device.

The CPU affinity mask argument must be a subset of the process cpuset,
otherwise an error -EPERM is returned.

The vfio_irq_set argument shall be set up in the following way:

- the 'flags' field has the new flag VFIO_IRQ_SET_DATA_CPUSET set
as well as VFIO_IRQ_SET_ACTION_TRIGGER.

- the variable-length 'data' field is a cpu_set_t structure, as
for the sched_setaffinity() syscall, the size of which is derived
from 'argsz'.

Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
---
 drivers/vfio/pci/vfio_pci_core.c | 2 +-
 drivers/vfio/pci/vfio_pci_intrs.c | 41 +++++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c | 15 ++++++++---
 include/uapi/linux/vfio.h | 15 ++++++++++-
 4 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 80cae87fff36..fbc490703031 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1174,7 +1174,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
 		return -EINVAL;
 	}

-	info.flags = VFIO_IRQ_INFO_EVENTFD;
+	info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_CPUSET;

 	info.count = vfio_pci_get_irq_count(vdev, info.index);

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8382c5834335..b339c42cb1c0 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -19,6 +19,7 @@
 #include <linux/vfio.h>
 #include <linux/wait.h>
 #include <linux/slab.h>
+#include <linux/cpuset.h>

 #include "vfio_pci_priv.h"

@@ -82,6 +83,40 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
 	return ctx;
 }

+static int vfio_pci_set_affinity(struct vfio_pci_core_device *vdev,
+				 unsigned int start, unsigned int count,
+				 struct cpumask *irq_mask)
+{
+	cpumask_var_t allowed_mask;
+	int irq, err = 0;
+	unsigned int i;
+
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpuset_cpus_allowed(current, allowed_mask);
+	if (!cpumask_subset(irq_mask, allowed_mask)) {
+		err = -EPERM;
+		goto finish;
+	}
+
+	for (i = start; i < start + count; i++) {
+		irq = pci_irq_vector(vdev->pdev, i);
+		if (irq < 0) {
+			err = -EINVAL;
+			break;
+		}
+
+		err = irq_set_affinity(irq, irq_mask);
+		if (err)
+			break;
+	}
+
+finish:
+	free_cpumask_var(allowed_mask);
+	return err;
+}
+
 /*
  * INTx
  */
@@ -665,6 +700,9 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
 	if (!is_intx(vdev))
 		return -EINVAL;

+	if (flags & VFIO_IRQ_SET_DATA_CPUSET)
+		return vfio_pci_set_affinity(vdev, start, count, data);
+
 	if (flags & VFIO_IRQ_SET_DATA_NONE) {
 		vfio_send_intx_eventfd(vdev, vfio_irq_ctx_get(vdev, 0));
 	} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
@@ -713,6 +751,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 	if (!irq_is(vdev, index))
 		return -EINVAL;

+	if (flags & VFIO_IRQ_SET_DATA_CPUSET)
+		return vfio_pci_set_affinity(vdev, start, count, data);
+
 	for (i = start; i < start + count; i++) {
 		ctx = vfio_irq_ctx_get(vdev, i);
 		if (!ctx)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e97d796a54fb..2e4f4e37cf89 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1505,23 +1505,30 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 		size = 0;
 		break;
 	case VFIO_IRQ_SET_DATA_BOOL:
-		size = sizeof(uint8_t);
+		size = size_mul(hdr->count, sizeof(uint8_t));
 		break;
 	case VFIO_IRQ_SET_DATA_EVENTFD:
-		size = sizeof(int32_t);
+		size = size_mul(hdr->count, sizeof(int32_t));
+		break;
+	case VFIO_IRQ_SET_DATA_CPUSET:
+		size = hdr->argsz - minsz;
+		if (size < cpumask_size())
+			return -EINVAL;
+		if (size > cpumask_size())
+			size = cpumask_size();
 		break;
 	default:
 		return -EINVAL;
 	}

 	if (size) {
-		if (hdr->argsz - minsz < hdr->count * size)
+		if (hdr->argsz - minsz < size)
 			return -EINVAL;

 		if (!data_size)
 			return -EINVAL;

-		*data_size = hdr->count * size;
+		*data_size = size;
 	}

 	return 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..d2edf6b725f8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -530,6 +530,10 @@ struct vfio_region_info_cap_nvlink2_lnkspd {
  * Absence of the NORESIZE flag indicates that vectors can be enabled
  * and disabled dynamically without impacting other vectors within the
  * index.
+ *
+ * The CPUSET flag indicates the interrupt index supports setting
+ * its affinity with a cpu_set_t configured with the SET_IRQ
+ * ioctl().
  */
 struct vfio_irq_info {
 	__u32 argsz;
@@ -538,6 +542,7 @@ struct vfio_irq_info {
 #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
 #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
 #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
+#define VFIO_IRQ_INFO_CPUSET (1 << 4)
 	__u32 index; /* IRQ index */
 	__u32 count; /* Number of IRQs within this index */
 };
@@ -580,6 +585,12 @@ struct vfio_irq_info {
  *
  * Note that ACTION_[UN]MASK specify user->kernel signaling (irqfds) while
  * ACTION_TRIGGER specifies kernel->user signaling.
+ *
+ * DATA_CPUSET specifies the affinity for the range of interrupt vectors.
+ * It must be set with ACTION_TRIGGER in 'flags'. The variable-length 'data'
+ * array is the CPU affinity mask represented as a 'cpu_set_t' structure, as
+ * for the sched_setaffinity() syscall argument: the 'argsz' field is used
+ * to check the actual cpu_set_t size.
  */
 struct vfio_irq_set {
 	__u32 argsz;
@@ -587,6 +598,7 @@ struct vfio_irq_set {
 #define VFIO_IRQ_SET_DATA_NONE (1 << 0) /* Data not present */
 #define VFIO_IRQ_SET_DATA_BOOL (1 << 1) /* Data is bool (u8) */
 #define VFIO_IRQ_SET_DATA_EVENTFD (1 << 2) /* Data is eventfd (s32) */
+#define VFIO_IRQ_SET_DATA_CPUSET (1 << 6) /* Data is cpu_set_t */
 #define VFIO_IRQ_SET_ACTION_MASK (1 << 3) /* Mask interrupt */
 #define VFIO_IRQ_SET_ACTION_UNMASK (1 << 4) /* Unmask interrupt */
 #define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5) /* Trigger interrupt */
@@ -599,7 +611,8 @@ struct vfio_irq_set {

 #define VFIO_IRQ_SET_DATA_TYPE_MASK (VFIO_IRQ_SET_DATA_NONE | \
 					 VFIO_IRQ_SET_DATA_BOOL | \
-					 VFIO_IRQ_SET_DATA_EVENTFD)
+					 VFIO_IRQ_SET_DATA_EVENTFD | \
+					 VFIO_IRQ_SET_DATA_CPUSET)
 #define VFIO_IRQ_SET_ACTION_TYPE_MASK (VFIO_IRQ_SET_ACTION_MASK | \
 					 VFIO_IRQ_SET_ACTION_UNMASK | \
 					 VFIO_IRQ_SET_ACTION_TRIGGER)
--
2.40.1
* Re: [PATCH v6 2/2] vfio/pci: add interrupt affinity support
2024-06-11 17:44 ` [PATCH v6 2/2] vfio/pci: add interrupt affinity support Fred Griffoul
@ 2024-06-18 20:29 ` Alex Williamson
0 siblings, 0 replies; 6+ messages in thread
From: Alex Williamson @ 2024-06-18 20:29 UTC (permalink / raw)
To: Fred Griffoul
Cc: griffoul, Catalin Marinas, Will Deacon, Waiman Long, Zefan Li,
Tejun Heo, Johannes Weiner, Mark Rutland, Marc Zyngier, Oliver Upton,
Mark Brown, Ard Biesheuvel, Joey Gouly, Ryan Roberts, Jeremy Linton,
Jason Gunthorpe, Yi Liu, Kevin Tian, Eric Auger, Stefan Hajnoczi,
Christian Brauner, Ankit Agrawal, Reinette Chatre, Ye Bin,
linux-arm-kernel, linux-kernel, kvm, cgroups

On Tue, 11 Jun 2024 17:44:25 +0000
Fred Griffoul <fgriffo@amazon.co.uk> wrote:

> The usual way to configure a device interrupt from userland is to write
> the /proc/irq/<irq>/smp_affinity or smp_affinity_list files. When using
> vfio to implement a device driver or a virtual machine monitor, this may
> not be ideal: the process managing the vfio device interrupts may not be
> granted root privilege, for security reasons. Thus it cannot directly
> control the interrupt affinity and has to rely on an external command.
>
> This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data flag
> to specify the affinity of interrupts of a vfio pci device.
>
> The CPU affinity mask argument must be a subset of the process cpuset,
> otherwise an error -EPERM is returned.
>
> The vfio_irq_set argument shall be set-up in the following way:
>
> - the 'flags' field have the new flag VFIO_IRQ_SET_DATA_CPUSET set
> as well as VFIO_IRQ_SET_ACTION_TRIGGER.
>
> - the variable-length 'data' field is a cpu_set_t structure, as
> for the sched_setaffinity() syscall, the size of which is derived
> from 'argsz'.
>
> Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 2 +-
>  drivers/vfio/pci/vfio_pci_intrs.c | 41 +++++++++++++++++++++++++++++++
>  drivers/vfio/vfio_main.c | 15 ++++++++---
>  include/uapi/linux/vfio.h | 15 ++++++++++-
>  4 files changed, 67 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 80cae87fff36..fbc490703031 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1174,7 +1174,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
>  		return -EINVAL;
>  	}
>
> -	info.flags = VFIO_IRQ_INFO_EVENTFD;
> +	info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_CPUSET;
>
>  	info.count = vfio_pci_get_irq_count(vdev, info.index);
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 8382c5834335..b339c42cb1c0 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -19,6 +19,7 @@
>  #include <linux/vfio.h>
>  #include <linux/wait.h>
>  #include <linux/slab.h>
> +#include <linux/cpuset.h>
>
>  #include "vfio_pci_priv.h"
>
> @@ -82,6 +83,40 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
>  	return ctx;
>  }
>
> +static int vfio_pci_set_affinity(struct vfio_pci_core_device *vdev,
> +				 unsigned int start, unsigned int count,
> +				 struct cpumask *irq_mask)
> +{
> +	cpumask_var_t allowed_mask;
> +	int irq, err = 0;
> +	unsigned int i;
> +
> +	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	cpuset_cpus_allowed(current, allowed_mask);
> +	if (!cpumask_subset(irq_mask, allowed_mask)) {
> +		err = -EPERM;
> +		goto finish;
> +	}
> +
> +	for (i = start; i < start + count; i++) {
> +		irq = pci_irq_vector(vdev->pdev, i);
> +		if (irq < 0) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		err = irq_set_affinity(irq, irq_mask);
> +		if (err)
> +			break;
> +	}

Sorry I didn't have an opportunity to reply to your previous comments,
but you stated:

On Tue, 11 Jun 2024 09:58:48 +0100
Frederic Griffoul <griffoul@gmail.com> wrote:
> My main use case is to configure NVMe queues in a virtual machine monitor
> to interrupt only the physical CPUs assigned to that vmm. Then we can
> set the same cpu_set_t to all the admin and I/O queues with a single ioctl().

So if I interpolate a little, the vmm's cpuset is likely set elsewhere
by some management tool, but that management tool isn't monitoring
registration of interrupts so you want the vmm to make some default
choice about interrupt affinity as they're enabled.

If that's all we want, couldn't we just add a flag that directs the
existing SET_IRQS ioctl to call irq_set_affinity() based on the
cpuset_cpus_allowed() when called with DATA_EVENTFD|ACTION_TRIGGER?

What you're proposing here has a lot more versatility, but it's also
not clear how the vmm would really make an optimal choice at this
granularity. Whether it's better to target an interrupt to the pCPU
running the vCPU where the guest has configured affinity isn't even
necessarily the right choice. It could be for posted interrupts, but
could also induce a vmexit otherwise. Is the vCPU necessarily even
within the allowed cpuset of the vmm itself when this ioctl is called?

I also wonder if there might be something through the irqbypass
framework where the interrupt consumer could direct the affinity of the
interrupt producer.

It'd really be preferable to see a viable userspace application of this
to prove it's worthwhile.

> +
> +finish:
> +	free_cpumask_var(allowed_mask);
> +	return err;
> +}
> +
>  /*
>   * INTx
>   */
> @@ -665,6 +700,9 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
>  	if (!is_intx(vdev))
>  		return -EINVAL;
>
> +	if (flags & VFIO_IRQ_SET_DATA_CPUSET)
> +		return vfio_pci_set_affinity(vdev, start, count, data);
> +
>  	if (flags & VFIO_IRQ_SET_DATA_NONE) {
>  		vfio_send_intx_eventfd(vdev, vfio_irq_ctx_get(vdev, 0));
>  	} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
> @@ -713,6 +751,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
>  	if (!irq_is(vdev, index))
>  		return -EINVAL;
>
> +	if (flags & VFIO_IRQ_SET_DATA_CPUSET)
> +		return vfio_pci_set_affinity(vdev, start, count, data);
> +
>  	for (i = start; i < start + count; i++) {
>  		ctx = vfio_irq_ctx_get(vdev, i);
>  		if (!ctx)
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index e97d796a54fb..2e4f4e37cf89 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1505,23 +1505,30 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
>  		size = 0;
>  		break;
>  	case VFIO_IRQ_SET_DATA_BOOL:
> -		size = sizeof(uint8_t);
> +		size = size_mul(hdr->count, sizeof(uint8_t));
>  		break;
>  	case VFIO_IRQ_SET_DATA_EVENTFD:
> -		size = sizeof(int32_t);
> +		size = size_mul(hdr->count, sizeof(int32_t));
> +		break;
> +	case VFIO_IRQ_SET_DATA_CPUSET:
> +		size = hdr->argsz - minsz;
> +		if (size < cpumask_size())
> +			return -EINVAL;
> +		if (size > cpumask_size())
> +			size = cpumask_size();

You previously stated that a valid cpu_set_t could be smaller than a
cpumask_var_t, but it looks like we're handling that as an error here?
Truncating user data that's too large seems no more correct than
masking in user data that's too small.

Thanks,
Alex

>  		break;
>  	default:
>  		return -EINVAL;
>  	}
>
>  	if (size) {
> -		if (hdr->argsz - minsz < hdr->count * size)
> +		if (hdr->argsz - minsz < size)
>  			return -EINVAL;
>
>  		if (!data_size)
>  			return -EINVAL;
>
> -		*data_size = hdr->count * size;
> +		*data_size = size;
>  	}
>
>  	return 0;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 2b68e6cdf190..d2edf6b725f8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -530,6 +530,10 @@ struct vfio_region_info_cap_nvlink2_lnkspd {
>   * Absence of the NORESIZE flag indicates that vectors can be enabled
>   * and disabled dynamically without impacting other vectors within the
>   * index.
> + *
> + * The CPUSET flag indicates the interrupt index supports setting
> + * its affinity with a cpu_set_t configured with the SET_IRQ
> + * ioctl().
>   */
>  struct vfio_irq_info {
>  	__u32 argsz;
> @@ -538,6 +542,7 @@ struct vfio_irq_info {
>  #define VFIO_IRQ_INFO_MASKABLE (1 << 1)
>  #define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
>  #define VFIO_IRQ_INFO_NORESIZE (1 << 3)
> +#define VFIO_IRQ_INFO_CPUSET (1 << 4)
>  	__u32 index; /* IRQ index */
>  	__u32 count; /* Number of IRQs within this index */
>  };
> @@ -580,6 +585,12 @@ struct vfio_irq_info {
>   *
>   * Note that ACTION_[UN]MASK specify user->kernel signaling (irqfds) while
>   * ACTION_TRIGGER specifies kernel->user signaling.
> + *
> + * DATA_CPUSET specifies the affinity for the range of interrupt vectors.
> + * It must be set with ACTION_TRIGGER in 'flags'. The variable-length 'data'
> + * array is the CPU affinity mask represented as a 'cpu_set_t' structure, as
> + * for the sched_setaffinity() syscall argument: the 'argsz' field is used
> + * to check the actual cpu_set_t size.
>   */
>  struct vfio_irq_set {
>  	__u32 argsz;
> @@ -587,6 +598,7 @@ struct vfio_irq_set {
>  #define VFIO_IRQ_SET_DATA_NONE (1 << 0) /* Data not present */
>  #define VFIO_IRQ_SET_DATA_BOOL (1 << 1) /* Data is bool (u8) */
>  #define VFIO_IRQ_SET_DATA_EVENTFD (1 << 2) /* Data is eventfd (s32) */
> +#define VFIO_IRQ_SET_DATA_CPUSET (1 << 6) /* Data is cpu_set_t */
>  #define VFIO_IRQ_SET_ACTION_MASK (1 << 3) /* Mask interrupt */
>  #define VFIO_IRQ_SET_ACTION_UNMASK (1 << 4) /* Unmask interrupt */
>  #define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5) /* Trigger interrupt */
> @@ -599,7 +611,8 @@ struct vfio_irq_set {
>
>  #define VFIO_IRQ_SET_DATA_TYPE_MASK (VFIO_IRQ_SET_DATA_NONE | \
>  					 VFIO_IRQ_SET_DATA_BOOL | \
> -					 VFIO_IRQ_SET_DATA_EVENTFD)
> +					 VFIO_IRQ_SET_DATA_EVENTFD | \
> +					 VFIO_IRQ_SET_DATA_CPUSET)
>  #define VFIO_IRQ_SET_ACTION_TYPE_MASK (VFIO_IRQ_SET_ACTION_MASK | \
>  					 VFIO_IRQ_SET_ACTION_UNMASK | \
>  					 VFIO_IRQ_SET_ACTION_TRIGGER)