* [PATCH v3 0/2] vfio/pci: add msi interrupt affinity support
@ 2024-06-06 15:10 Fred Griffoul
2024-06-06 15:10 ` [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
2024-06-06 15:10 ` [PATCH v3 2/2] vfio/pci: add msi interrupt affinity support Fred Griffoul
0 siblings, 2 replies; 6+ messages in thread
From: Fred Griffoul @ 2024-06-06 15:10 UTC (permalink / raw)
To: griffoul
Cc: Fred Griffoul, Alex Williamson, Jason Gunthorpe, Yi Liu,
Kevin Tian, Eric Auger, Stefan Hajnoczi, Christian Brauner,
Ankit Agrawal, Reinette Chatre, Ye Bin, kvm, linux-kernel
v3:
- add a first patch to export cpuset_cpus_allowed() to be able to
compile the vfio driver as a kernel module.
v2:
- change the ioctl() interface to use a cpu_set_t in vfio_irq_set
'data' to keep the 'start' and 'count' semantics, as suggested by
David Woodhouse <dwmw2@infradead.org>
v1:
The usual way to configure a device interrupt from userland is to write
the /proc/irq/<irq>/smp_affinity or smp_affinity_list files. When using
vfio to implement a device driver or a virtual machine monitor, this may
not be ideal: the process managing the vfio device interrupts may not be
granted root privilege, for security reasons. Thus it cannot directly
control the interrupt affinity and has to rely on an external command.
This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data flag
to specify the affinity of a vfio pci device interrupt.
The affinity argument must be a subset of the process cpuset, otherwise
an error -EPERM is returned.
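To illustrate the subset rule, here is a minimal userspace sketch of the check, assuming CPU masks are plain arrays of unsigned long. The names mask_is_subset() and check_affinity_request() are hypothetical and exist only for this illustration; the patch itself performs the test in the kernel with cpuset_cpus_allowed() and cpumask_subset().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a cpumask subset test: every bit set in
 * 'requested' must also be set in 'allowed'.
 */
static bool mask_is_subset(const unsigned long *requested,
			   const unsigned long *allowed, size_t nwords)
{
	assert(requested != NULL && allowed != NULL);

	for (size_t i = 0; i < nwords; i++) {
		/* A bit present in 'requested' but not in 'allowed' fails. */
		if (requested[i] & ~allowed[i])
			return false;
	}
	return true;
}

/*
 * Models the permission rule described above: 0 when the requested affinity
 * stays within the allowed CPUs, -EPERM otherwise.
 */
static int check_affinity_request(const unsigned long *requested,
				  const unsigned long *allowed, size_t nwords)
{
	return mask_is_subset(requested, allowed, nwords) ? 0 : -EPERM;
}
```

With an allowed mask of CPUs 0-3, a request for CPUs 0 and 2 passes, while a request that includes CPU 4 is rejected with -EPERM.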
The vfio_irq_set argument shall be set up in the following way:
- the 'flags' field must have the new flag VFIO_IRQ_SET_DATA_AFFINITY set
as well as VFIO_IRQ_SET_ACTION_TRIGGER.
- the 'start' field is the device interrupt index. Only one interrupt
can be configured per ioctl().
- the variable-length array consists of one or more CPU indexes
encoded as __u32; the number of entries in the array is specified in the
'count' field.
Fred Griffoul (2):
cgroup/cpuset: export cpuset_cpus_allowed()
vfio/pci: add msi interrupt affinity support
drivers/vfio/pci/vfio_pci_core.c | 26 +++++++++++++++++----
drivers/vfio/pci/vfio_pci_intrs.c | 39 +++++++++++++++++++++++++++++++
drivers/vfio/vfio_main.c | 13 +++++++----
include/uapi/linux/vfio.h | 10 +++++++-
kernel/cgroup/cpuset.c | 1 +
5 files changed, 80 insertions(+), 9 deletions(-)
--
2.40.1
* [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-06 15:10 [PATCH v3 0/2] vfio/pci: add msi interrupt affinity support Fred Griffoul
@ 2024-06-06 15:10 ` Fred Griffoul
2024-06-06 15:45 ` Waiman Long
2024-06-06 15:10 ` [PATCH v3 2/2] vfio/pci: add msi interrupt affinity support Fred Griffoul
1 sibling, 1 reply; 6+ messages in thread
From: Fred Griffoul @ 2024-06-06 15:10 UTC (permalink / raw)
To: griffoul
Cc: Fred Griffoul, kernel test robot, Alex Williamson, Waiman Long,
Zefan Li, Tejun Heo, Johannes Weiner, Jason Gunthorpe, Yi Liu,
Kevin Tian, Eric Auger, Stefan Hajnoczi, Christian Brauner,
Ankit Agrawal, Reinette Chatre, Ye Bin, kvm, linux-kernel,
cgroups
A subsequent patch calls cpuset_cpus_allowed() in the vfio driver pci
code. Export the symbol to be able to build the vfio driver as a kernel
module.
Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202406060731.L3NSR1Hy-lkp@intel.com/
---
kernel/cgroup/cpuset.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4237c8748715..9fd56222aa4b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4764,6 +4764,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
rcu_read_unlock();
spin_unlock_irqrestore(&callback_lock, flags);
}
+EXPORT_SYMBOL_GPL(cpuset_cpus_allowed);
/**
* cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
--
2.40.1
* [PATCH v3 2/2] vfio/pci: add msi interrupt affinity support
2024-06-06 15:10 [PATCH v3 0/2] vfio/pci: add msi interrupt affinity support Fred Griffoul
2024-06-06 15:10 ` [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
@ 2024-06-06 15:10 ` Fred Griffoul
1 sibling, 0 replies; 6+ messages in thread
From: Fred Griffoul @ 2024-06-06 15:10 UTC (permalink / raw)
To: griffoul
Cc: Fred Griffoul, Alex Williamson, Jason Gunthorpe, Yi Liu,
Kevin Tian, Eric Auger, Stefan Hajnoczi, Christian Brauner,
Ankit Agrawal, Reinette Chatre, Ye Bin, kvm, linux-kernel
The usual way to configure a device interrupt from userland is to write
the /proc/irq/<irq>/smp_affinity or smp_affinity_list files. When using
vfio to implement a device driver or a virtual machine monitor, this may
not be ideal: the process managing the vfio device interrupts may not be
granted root privilege, for security reasons. Thus it cannot directly
control the interrupt affinity and has to rely on an external command.
This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data flag
to specify the affinity of interrupts of a vfio pci device.
The CPU affinity mask argument must be a subset of the process cpuset,
otherwise an error -EPERM is returned.
The vfio_irq_set argument shall be set up in the following way:
- the 'flags' field must have the new flag VFIO_IRQ_SET_DATA_AFFINITY set
as well as VFIO_IRQ_SET_ACTION_TRIGGER.
- the variable-length 'data' field is a cpu_set_t structure, as
for the sched_setaffinity() syscall, the size of which is derived
from 'argsz'.
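As an illustration of the setup described above, a userspace caller might build the request roughly as follows. This is a sketch under assumptions, not part of the patch: build_affinity_request() and set_irq_affinity() are hypothetical helpers, and VFIO_IRQ_SET_DATA_AFFINITY is defined locally with the value proposed here because mainline <linux/vfio.h> does not provide it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Value proposed by this patch; assumed absent from the installed header. */
#ifndef VFIO_IRQ_SET_DATA_AFFINITY
#define VFIO_IRQ_SET_DATA_AFFINITY (1 << 6)
#endif

/*
 * Hypothetical helper: allocate a vfio_irq_set request that carries a
 * cpu_set_t in its variable-length 'data' field. The caller frees it.
 */
static struct vfio_irq_set *build_affinity_request(unsigned int index,
						   unsigned int start,
						   unsigned int count,
						   const cpu_set_t *cpus)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(*cpus);
	struct vfio_irq_set *set;

	assert(cpus != NULL);

	set = calloc(1, argsz);
	if (!set)
		return NULL;

	/* Per the description above, the mask size is derived from argsz. */
	set->argsz = (__u32)argsz;
	set->flags = VFIO_IRQ_SET_DATA_AFFINITY | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = index;
	set->start = start;
	set->count = count;
	memcpy(set->data, cpus, sizeof(*cpus));
	return set;
}

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on failure. */
static int set_irq_affinity(int device_fd, unsigned int index,
			    unsigned int start, unsigned int count,
			    const cpu_set_t *cpus)
{
	struct vfio_irq_set *set;
	int ret, saved_errno;

	set = build_affinity_request(index, start, count, cpus);
	if (!set)
		return -1;

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	saved_errno = errno;	/* keep the ioctl() error across free() */
	free(set);
	errno = saved_errno;
	return ret;
}
```

A caller would fill a cpu_set_t with CPU_ZERO() and CPU_SET(), then call set_irq_affinity(device_fd, VFIO_PCI_MSIX_IRQ_INDEX, 0, 1, &cpus). Per the description above, a mask that is not a subset of the process cpuset is expected to fail with EPERM.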
Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
---
drivers/vfio/pci/vfio_pci_core.c | 26 +++++++++++++++++----
drivers/vfio/pci/vfio_pci_intrs.c | 39 +++++++++++++++++++++++++++++++
drivers/vfio/vfio_main.c | 13 +++++++----
include/uapi/linux/vfio.h | 10 +++++++-
4 files changed, 79 insertions(+), 9 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 80cae87fff36..b89df562fb5c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1192,6 +1192,7 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
{
unsigned long minsz = offsetofend(struct vfio_irq_set, count);
struct vfio_irq_set hdr;
+ cpumask_var_t mask;
u8 *data = NULL;
int max, ret = 0;
size_t data_size = 0;
@@ -1207,9 +1208,21 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
return ret;
if (data_size) {
- data = memdup_user(&arg->data, data_size);
- if (IS_ERR(data))
- return PTR_ERR(data);
+ if (hdr.flags & VFIO_IRQ_SET_DATA_AFFINITY) {
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ ret = copy_from_user(mask, &arg->data, data_size) ? -EFAULT : 0;
+ if (ret)
+ goto out;
+
+ data = (u8 *)mask;
+
+ } else {
+ data = memdup_user(&arg->data, data_size);
+ if (IS_ERR(data))
+ return PTR_ERR(data);
+ }
}
mutex_lock(&vdev->igate);
@@ -1218,7 +1231,12 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
hdr.count, data);
mutex_unlock(&vdev->igate);
- kfree(data);
+
+out:
+ if (hdr.flags & VFIO_IRQ_SET_DATA_AFFINITY)
+ free_cpumask_var(mask);
+ else
+ kfree(data);
return ret;
}
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8382c5834335..58fc751e75f1 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -19,6 +19,7 @@
#include <linux/vfio.h>
#include <linux/wait.h>
#include <linux/slab.h>
+#include <linux/cpuset.h>
#include "vfio_pci_priv.h"
@@ -675,6 +676,41 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_set_msi_affinity(struct vfio_pci_core_device *vdev,
+ unsigned int start, unsigned int count,
+ struct cpumask *irq_mask)
+{
+ struct vfio_pci_irq_ctx *ctx;
+ cpumask_var_t allowed_mask;
+ unsigned int i;
+ int err = 0;
+
+ if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpuset_cpus_allowed(current, allowed_mask);
+ if (!cpumask_subset(irq_mask, allowed_mask)) {
+ err = -EPERM;
+ goto finish;
+ }
+
+ for (i = start; i < start + count; i++) {
+ ctx = vfio_irq_ctx_get(vdev, i);
+ if (!ctx) {
+ err = -EINVAL;
+ break;
+ }
+
+ err = irq_set_affinity(ctx->producer.irq, irq_mask);
+ if (err)
+ break;
+ }
+
+finish:
+ free_cpumask_var(allowed_mask);
+ return err;
+}
+
static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
@@ -691,6 +727,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
if (!(irq_is(vdev, index) || is_irq_none(vdev)))
return -EINVAL;
+ if (flags & VFIO_IRQ_SET_DATA_AFFINITY)
+ return vfio_pci_set_msi_affinity(vdev, start, count, data);
+
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
int32_t *fds = data;
int ret;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e97d796a54fb..e87131d45059 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1505,23 +1505,28 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
size = 0;
break;
case VFIO_IRQ_SET_DATA_BOOL:
- size = sizeof(uint8_t);
+ size = hdr->count * sizeof(uint8_t);
break;
case VFIO_IRQ_SET_DATA_EVENTFD:
- size = sizeof(int32_t);
+ size = hdr->count * sizeof(int32_t);
+ break;
+ case VFIO_IRQ_SET_DATA_AFFINITY:
+ size = hdr->argsz - minsz;
+ if (size > cpumask_size())
+ size = cpumask_size();
break;
default:
return -EINVAL;
}
if (size) {
- if (hdr->argsz - minsz < hdr->count * size)
+ if (hdr->argsz - minsz < size)
return -EINVAL;
if (!data_size)
return -EINVAL;
- *data_size = hdr->count * size;
+ *data_size = size;
}
return 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..5ba2ca223550 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -580,6 +580,12 @@ struct vfio_irq_info {
*
* Note that ACTION_[UN]MASK specify user->kernel signaling (irqfds) while
* ACTION_TRIGGER specifies kernel->user signaling.
+ *
+ * DATA_AFFINITY specifies the affinity for the range of interrupt vectors.
+ * It must be set with ACTION_TRIGGER in 'flags'. The variable-length 'data'
+ * array is a CPU affinity mask 'cpu_set_t' structure, as for the
+ * sched_setaffinity() syscall argument: the 'argsz' field is used to check
+ * the actual cpu_set_t size.
*/
struct vfio_irq_set {
__u32 argsz;
@@ -587,6 +593,7 @@ struct vfio_irq_set {
#define VFIO_IRQ_SET_DATA_NONE (1 << 0) /* Data not present */
#define VFIO_IRQ_SET_DATA_BOOL (1 << 1) /* Data is bool (u8) */
#define VFIO_IRQ_SET_DATA_EVENTFD (1 << 2) /* Data is eventfd (s32) */
+#define VFIO_IRQ_SET_DATA_AFFINITY (1 << 6) /* Data is cpu_set_t */
#define VFIO_IRQ_SET_ACTION_MASK (1 << 3) /* Mask interrupt */
#define VFIO_IRQ_SET_ACTION_UNMASK (1 << 4) /* Unmask interrupt */
#define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5) /* Trigger interrupt */
@@ -599,7 +606,8 @@ struct vfio_irq_set {
#define VFIO_IRQ_SET_DATA_TYPE_MASK (VFIO_IRQ_SET_DATA_NONE | \
VFIO_IRQ_SET_DATA_BOOL | \
- VFIO_IRQ_SET_DATA_EVENTFD)
+ VFIO_IRQ_SET_DATA_EVENTFD | \
+ VFIO_IRQ_SET_DATA_AFFINITY)
#define VFIO_IRQ_SET_ACTION_TYPE_MASK (VFIO_IRQ_SET_ACTION_MASK | \
VFIO_IRQ_SET_ACTION_UNMASK | \
VFIO_IRQ_SET_ACTION_TRIGGER)
--
2.40.1
* Re: [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-06 15:10 ` [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed() Fred Griffoul
@ 2024-06-06 15:45 ` Waiman Long
2024-06-07 16:28 ` Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2024-06-06 15:45 UTC (permalink / raw)
To: Fred Griffoul, griffoul
Cc: kernel test robot, Alex Williamson, Zefan Li, Tejun Heo,
Johannes Weiner, Jason Gunthorpe, Yi Liu, Kevin Tian, Eric Auger,
Stefan Hajnoczi, Christian Brauner, Ankit Agrawal,
Reinette Chatre, Ye Bin, kvm, linux-kernel, cgroups
On 6/6/24 11:10, Fred Griffoul wrote:
> A subsequent patch calls cpuset_cpus_allowed() in the vfio driver pci
> code. Export the symbol to be able to build the vfio driver as a kernel
> module.
>
> Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202406060731.L3NSR1Hy-lkp@intel.com/
> ---
> kernel/cgroup/cpuset.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 4237c8748715..9fd56222aa4b 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4764,6 +4764,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> rcu_read_unlock();
> spin_unlock_irqrestore(&callback_lock, flags);
> }
> +EXPORT_SYMBOL_GPL(cpuset_cpus_allowed);
>
> /**
> * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
LGTM
Acked-by: Waiman Long <longman@redhat.com>
* Re: [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-06 15:45 ` Waiman Long
@ 2024-06-07 16:28 ` Tejun Heo
2024-06-07 18:18 ` Frederic Griffoul
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2024-06-07 16:28 UTC (permalink / raw)
To: Waiman Long
Cc: Fred Griffoul, griffoul, kernel test robot, Alex Williamson,
Zefan Li, Johannes Weiner, Jason Gunthorpe, Yi Liu, Kevin Tian,
Eric Auger, Stefan Hajnoczi, Christian Brauner, Ankit Agrawal,
Reinette Chatre, Ye Bin, kvm, linux-kernel, cgroups
On Thu, Jun 06, 2024 at 11:45:37AM -0400, Waiman Long wrote:
>
> On 6/6/24 11:10, Fred Griffoul wrote:
> > A subsequent patch calls cpuset_cpus_allowed() in the vfio driver pci
> > code. Export the symbol to be able to build the vfio driver as a kernel
> > module.
> >
> > Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes: https://lore.kernel.org/oe-kbuild-all/202406060731.L3NSR1Hy-lkp@intel.com/
> > ---
> > kernel/cgroup/cpuset.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > index 4237c8748715..9fd56222aa4b 100644
> > --- a/kernel/cgroup/cpuset.c
> > +++ b/kernel/cgroup/cpuset.c
> > @@ -4764,6 +4764,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> > rcu_read_unlock();
> > spin_unlock_irqrestore(&callback_lock, flags);
> > }
> > +EXPORT_SYMBOL_GPL(cpuset_cpus_allowed);
> > /**
> > * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
>
> LGTM
>
> Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
If more convenient, please feel free to route the patch with the rest of the
series. If you want it applied to the cgroup tree, please let me know.
Thanks.
--
tejun
* Re: [PATCH v3 1/2] cgroup/cpuset: export cpuset_cpus_allowed()
2024-06-07 16:28 ` Tejun Heo
@ 2024-06-07 18:18 ` Frederic Griffoul
0 siblings, 0 replies; 6+ messages in thread
From: Frederic Griffoul @ 2024-06-07 18:18 UTC (permalink / raw)
To: Tejun Heo
Cc: Waiman Long, Fred Griffoul, kernel test robot, Alex Williamson,
Zefan Li, Johannes Weiner, Jason Gunthorpe, Yi Liu, Kevin Tian,
Eric Auger, Stefan Hajnoczi, Christian Brauner, Ankit Agrawal,
Reinette Chatre, Ye Bin, kvm, linux-kernel, cgroups
Thanks. Unfortunately exporting cpuset_cpus_allowed() is not enough.
When CONFIG_CPUSETS is _not_ defined, the function is an inline that
returns task_cpu_possible_mask(). On arm64 the latter checks the static key
arm64_mismatched_32bit_el0, and thus this symbol must be exported too.
I wonder whether it would be better to avoid inlining cpuset_cpus_allowed()
in this case.
Br,
Fred
On Fri, Jun 7, 2024 at 5:29 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Thu, Jun 06, 2024 at 11:45:37AM -0400, Waiman Long wrote:
> >
> > On 6/6/24 11:10, Fred Griffoul wrote:
> > > A subsequent patch calls cpuset_cpus_allowed() in the vfio driver pci
> > > code. Export the symbol to be able to build the vfio driver as a kernel
> > > module.
> > >
> > > Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk>
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Closes: https://lore.kernel.org/oe-kbuild-all/202406060731.L3NSR1Hy-lkp@intel.com/
> > > ---
> > > kernel/cgroup/cpuset.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > > index 4237c8748715..9fd56222aa4b 100644
> > > --- a/kernel/cgroup/cpuset.c
> > > +++ b/kernel/cgroup/cpuset.c
> > > @@ -4764,6 +4764,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> > > rcu_read_unlock();
> > > spin_unlock_irqrestore(&callback_lock, flags);
> > > }
> > > +EXPORT_SYMBOL_GPL(cpuset_cpus_allowed);
> > > /**
> > > * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
> >
> > LGTM
> >
> > Acked-by: Waiman Long <longman@redhat.com>
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> If more convenient, please feel free to route the patch with the rest of the
> series. If you want it applied to the cgroup tree, please let me know.
>
> Thanks.
>
> --
> tejun