* [RFC PATCH 0/7] vfio: Add alias region uapi for device feature
@ 2025-09-24 14:09 Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 1/7] vfio/pci: refactor region dereferences for RCU Mahmoud Adam
` (9 more replies)
0 siblings, 10 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
This RFC proposes a new uapi VFIO DEVICE_FEATURE to create per-region
aliases with selectable attributes, initially enabling write-combine
(WC) where supported by the underlying region. The goal is to expose a
UAPI for userspace to request an alias of an existing VFIO region with
extra flags, then interact with it via a stable alias index through
existing ioctls and mmap where applicable.
This proposal follows Alex's suggestion [1]. The uapi lets the user
create a region alias with certain attributes enabled, then use the
alias index to query the region info and obtain the offset to operate
on.
For example, the user can create an alias of BAR 0 (or another BAR)
with WC enabled, then mmap the region through the alias offset to get
a WC mapping.
The uapi takes the index of the region to alias and the extra flags to
set. Users can PROBE to find out which flags a given region supports.
The flags are the same as the region flags in the region_info uapi.
This adds two new region flags:
- VFIO_REGION_INFO_FLAG_ALIAS: set on alias regions.
- VFIO_REGION_INFO_FLAG_WC: indicates WC is in effect for that region.
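As a rough illustration of the request described above, the sketch
below fills in a VFIO_DEVICE_FEATURE buffer that asks for a WC alias of
a region. It is a userspace-only sketch under stated assumptions: the
sketch_* names are hypothetical local mirrors so that the block
compiles without the patched header; the alias struct layout, feature
number 11 and the WC flag bit are taken from patch 4 of this series;
the SET bit is assumed to match the existing VFIO_DEVICE_FEATURE_SET
definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the existing struct vfio_device_feature header. */
struct sketch_device_feature {
	uint32_t argsz;
	uint32_t flags;
};

/* Hypothetical mirror of struct vfio_device_feature_alias_region (patch 4). */
struct sketch_alias_region {
	uint32_t flags;        /* region flags to be used */
	uint32_t index;        /* region index to alias */
	uint32_t alias_index;  /* filled in by the kernel */
	uint32_t _resv1;
	uint64_t _resv2;
};

#define SKETCH_FEATURE_SET          (1u << 17) /* assumed VFIO_DEVICE_FEATURE_SET */
#define SKETCH_FEATURE_ALIAS_REGION 11u        /* value proposed in patch 4 */
#define SKETCH_REGION_FLAG_WC       (1u << 5)  /* value proposed in patch 4 */

/* Header immediately followed by the feature payload. */
struct sketch_alias_request {
	struct sketch_device_feature hdr;
	struct sketch_alias_region alias;
};

/* Fill a SET request asking for an alias of @index with @region_flags. */
static void sketch_build_alias_request(struct sketch_alias_request *req,
				       uint32_t index, uint32_t region_flags)
{
	assert(req != NULL);

	/* Zero everything so the reserved fields and alias_index start at 0. */
	memset(req, 0, sizeof(*req));
	req->hdr.argsz = (uint32_t)sizeof(*req);
	req->hdr.flags = SKETCH_FEATURE_ALIAS_REGION | SKETCH_FEATURE_SET;
	req->alias.index = index;
	req->alias.flags = region_flags;
}
```

The filled buffer would then be passed to the existing
VFIO_DEVICE_FEATURE ioctl. On success, alias_index would presumably
hold the index to use with VFIO_DEVICE_GET_REGION_INFO, whose offset is
then handed to mmap.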
This series then implements the uapi for vfio-pci. For vfio-pci, alias
regions are (for now) only possible for regions that support mmap.
There could be future uses for alias regions beyond mmap (for example,
we could perhaps also allow read & write through a pci_iomap_wc
mapping of the region). If a matching alias region already exists, its
index is returned to the user instead of creating a new one.
To mmap the region alias, we use the mmap region ops. There we
translate vm_pgoff to the aliased region and call the vfio_device mmap
with the translated pgoff. This lets us mmap the original region and
then update the pgprot for WC afterwards.
The call path would be:
  vfio_pci_core_mmap (index >= VFIO_PCI_NUM_REGIONS)
    vfio_pci_alias_region_mmap (update vm_pgoff)
      vfio_pci_core_mmap
This series also adds the locking required for region array accesses,
since regions can now be added after initial setup.
[1]: https://lore.kernel.org/kvm/20250811160710.174ca708.alex.williamson@redhat.com/
references:
https://lore.kernel.org/kvm/20250804104012.87915-1-mngyadam@amazon.de/
https://lore.kernel.org/kvm/20240731155352.3973857-1-kbusch@meta.com/
https://lore.kernel.org/kvm/lrkyq4ivccb6x.fsf@dev-dsk-mngyadam-1c-cb3f7548.eu-west-1.amazon.com/
Mahmoud Adam (7):
vfio/pci: refactor region dereferences for RCU.
vfio_pci_core: split krealloc to allow use RCU & return index
vfio/pci: add RCU locking for regions access
vfio: add FEATURE_ALIAS_REGION uapi
vfio_pci_core: allow regions with no release op
vfio-pci: add alias_region mmap ops
vfio-pci-core: implement FEATURE_ALIAS_REGION uapi
drivers/vfio/pci/vfio_pci_core.c | 289 +++++++++++++++++++++++++++----
drivers/vfio/pci/vfio_pci_igd.c | 34 +++-
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 24 +++
4 files changed, 301 insertions(+), 47 deletions(-)
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
* [RFC PATCH 1/7] vfio/pci: refactor region dereferences for RCU.
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 2/7] vfio_pci_core: split krealloc to allow use RCU & return index Mahmoud Adam
` (8 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
No functional changes. This consolidates the repeated region array
accesses in each function into a single dereference, in preparation
for the RCU locking added in the following patches.
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
drivers/vfio/pci/vfio_pci_core.c | 21 ++++++++++++---------
drivers/vfio/pci/vfio_pci_igd.c | 20 ++++++++++++++------
2 files changed, 26 insertions(+), 15 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7dcf5439dedc9..ea04c1291af68 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1003,6 +1003,7 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
struct pci_dev *pdev = vdev->pdev;
struct vfio_region_info info;
struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+ struct vfio_pci_region *region;
int i, ret;
if (copy_from_user(&info, arg, minsz))
@@ -1091,22 +1092,23 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
info.index, VFIO_PCI_NUM_REGIONS + vdev->num_regions);
i = info.index - VFIO_PCI_NUM_REGIONS;
+ region = &vdev->region[i];
info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
- info.size = vdev->region[i].size;
- info.flags = vdev->region[i].flags;
+ info.size = region->size;
+ info.flags = region->flags;
- cap_type.type = vdev->region[i].type;
- cap_type.subtype = vdev->region[i].subtype;
+ cap_type.type = region->type;
+ cap_type.subtype = region->subtype;
ret = vfio_info_add_capability(&caps, &cap_type.header,
sizeof(cap_type));
if (ret)
return ret;
- if (vdev->region[i].ops->add_capability) {
- ret = vdev->region[i].ops->add_capability(
- vdev, &vdev->region[i], &caps);
+ if (region->ops->add_capability) {
+ ret = region->ops->add_capability(
+ vdev, region, &caps);
if (ret)
return ret;
}
@@ -1726,10 +1728,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
int regnum = index - VFIO_PCI_NUM_REGIONS;
struct vfio_pci_region *region = vdev->region + regnum;
+ ret = -EINVAL;
if (region->ops && region->ops->mmap &&
(region->flags & VFIO_REGION_INFO_FLAG_MMAP))
- return region->ops->mmap(vdev, region, vma);
- return -EINVAL;
+ ret = region->ops->mmap(vdev, region, vma);
+ return ret;
}
if (index >= VFIO_PCI_ROM_REGION_INDEX)
return -EINVAL;
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 988b6919c2c31..ac0921fdc62da 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -66,14 +66,18 @@ static ssize_t vfio_pci_igd_rw(struct vfio_pci_core_device *vdev,
bool iswrite)
{
unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
- struct igd_opregion_vbt *opregionvbt = vdev->region[i].data;
loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK, off = 0;
size_t remaining;
+ struct vfio_pci_region *region;
+ struct igd_opregion_vbt *opregionvbt;
+
+ region = &vdev->region[i];
+ opregionvbt = region->data;
- if (pos >= vdev->region[i].size || iswrite)
+ if (pos >= region->size || iswrite)
return -EINVAL;
- count = min_t(size_t, count, vdev->region[i].size - pos);
+ count = min_t(size_t, count, region->size - pos);
remaining = count;
/* Copy until OpRegion version */
@@ -283,15 +287,19 @@ static ssize_t vfio_pci_igd_cfg_rw(struct vfio_pci_core_device *vdev,
bool iswrite)
{
unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
- struct pci_dev *pdev = vdev->region[i].data;
loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
size_t size;
int ret;
+ struct vfio_pci_region *region;
+ struct pci_dev *pdev;
+
+ region = &vdev->region[i];
+ pdev = region->data;
- if (pos >= vdev->region[i].size || iswrite)
+ if (pos >= region->size || iswrite)
return -EINVAL;
- size = count = min(count, (size_t)(vdev->region[i].size - pos));
+ size = count = min(count, (size_t)(region->size - pos));
if ((pos & 1) && size) {
u8 val;
--
2.47.3
* [RFC PATCH 2/7] vfio_pci_core: split krealloc to allow use RCU & return index
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 1/7] vfio/pci: refactor region dereferences for RCU Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 3/7] vfio/pci: add RCU locking for regions access Mahmoud Adam
` (7 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
Split the krealloc into its explicit allocation, copy, assignment and
free steps. This allows the following patches to pick up the reference
under RCU and to synchronize with readers when the region array is
replaced.
Return the index of the newly created region instead of 0, so that the
caller knows which index was assigned.
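The split can be pictured with the userspace sketch below, which
mirrors the allocate, copy, assign, free sequence and returns the new
index. The sketch_* names and the reduced struct are hypothetical, and
malloc/free stand in for kmalloc/kfree.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Reduced, hypothetical stand-in for struct vfio_pci_region. */
struct sketch_region {
	unsigned int type;
	size_t size;
};

struct sketch_dev {
	struct sketch_region *region;
	int num_regions;
};

/* Appends a region; returns its index or a negative errno. */
static int sketch_register_region(struct sketch_dev *dev,
				  unsigned int type, size_t size)
{
	struct sketch_region *new_arr, *old;
	int n;

	assert(dev != NULL && dev->num_regions >= 0);
	n = dev->num_regions;

	/* 1. Allocate a fresh array with room for one more entry. */
	new_arr = malloc((size_t)(n + 1) * sizeof(*new_arr));
	if (!new_arr)
		return -ENOMEM;

	/* 2. Copy the existing entries, if any. */
	old = dev->region;
	if (old)
		memcpy(new_arr, old, (size_t)n * sizeof(*new_arr));

	/* 3. Fill in the new entry before the array becomes visible. */
	new_arr[n].type = type;
	new_arr[n].size = size;

	/* 4. Assign the new array, then free the old one. */
	dev->region = new_arr;
	dev->num_regions = n + 1;
	free(old);

	return n;
}
```

Because a successful call now returns a non-negative index, callers
have to test for ret < 0 rather than ret != 0, which is why the igd
callers are adjusted in this patch.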
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
drivers/vfio/pci/vfio_pci_core.c | 34 ++++++++++++++++++++------------
drivers/vfio/pci/vfio_pci_igd.c | 6 +++---
2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ea04c1291af68..6629490c0e46f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -881,30 +881,38 @@ static int msix_mmappable_cap(struct vfio_pci_core_device *vdev,
return vfio_info_add_capability(caps, &header, sizeof(header));
}
+/*
+ * Registers a new region to vfio_pci_core_device.
+ * Returns region index on success or a negative errno.
+ */
int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
unsigned int type, unsigned int subtype,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data)
{
- struct vfio_pci_region *region;
+ int num_regions = vdev->num_regions;
+ struct vfio_pci_region *region, *old_region;
- region = krealloc(vdev->region,
- (vdev->num_regions + 1) * sizeof(*region),
- GFP_KERNEL_ACCOUNT);
+ region = kmalloc((num_regions + 1) * sizeof(*region),
+ GFP_KERNEL_ACCOUNT);
if (!region)
return -ENOMEM;
- vdev->region = region;
- vdev->region[vdev->num_regions].type = type;
- vdev->region[vdev->num_regions].subtype = subtype;
- vdev->region[vdev->num_regions].ops = ops;
- vdev->region[vdev->num_regions].size = size;
- vdev->region[vdev->num_regions].flags = flags;
- vdev->region[vdev->num_regions].data = data;
+ old_region = vdev->region;
+ if (old_region)
+ memcpy(region, old_region, num_regions * sizeof(*region));
- vdev->num_regions++;
+ region[num_regions].type = type;
+ region[num_regions].subtype = subtype;
+ region[num_regions].ops = ops;
+ region[num_regions].size = size;
+ region[num_regions].flags = flags;
+ region[num_regions].data = data;
- return 0;
+ vdev->region = region;
+ vdev->num_regions++;
+ kfree(old_region);
+ return num_regions;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_register_dev_region);
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index ac0921fdc62da..93ddef48e4e4c 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -265,7 +265,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_core_device *vdev)
PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, &vfio_pci_igd_regops,
size, VFIO_REGION_INFO_FLAG_READ, opregionvbt);
- if (ret) {
+ if (ret < 0) {
if (opregionvbt->vbt_ex)
memunmap(opregionvbt->vbt_ex);
@@ -415,7 +415,7 @@ static int vfio_pci_igd_cfg_init(struct vfio_pci_core_device *vdev)
VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG,
&vfio_pci_igd_cfg_regops, host_bridge->cfg_size,
VFIO_REGION_INFO_FLAG_READ, host_bridge);
- if (ret) {
+ if (ret < 0) {
pci_dev_put(host_bridge);
return ret;
}
@@ -435,7 +435,7 @@ static int vfio_pci_igd_cfg_init(struct vfio_pci_core_device *vdev)
VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG,
&vfio_pci_igd_cfg_regops, lpc_bridge->cfg_size,
VFIO_REGION_INFO_FLAG_READ, lpc_bridge);
- if (ret) {
+ if (ret < 0) {
pci_dev_put(lpc_bridge);
return ret;
}
--
2.47.3
* [RFC PATCH 3/7] vfio/pci: add RCU locking for regions access
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 1/7] vfio/pci: refactor region dereferences for RCU Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 2/7] vfio_pci_core: split krealloc to allow use RCU & return index Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-09-24 16:15 ` Mahmoud Nagy Adam
2025-09-24 14:09 ` [RFC PATCH 4/7] vfio: add FEATURE_ALIAS_REGION uapi Mahmoud Adam
` (6 subsequent siblings)
9 siblings, 1 reply; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
Regions can now be added after initialization, so locking is needed to
avoid racing with readers and causing a use-after-free. Use RCU for
read-write synchronization, and a region_lock mutex to serialize
writers.
num_regions is only modified under the mutex. Since num_regions can
only increase, READ_ONCE and WRITE_ONCE should be enough to guarantee
a valid value. On the write side, synchronize_rcu() runs before
num_regions is incremented. This makes sure existing read-side
sections have completed before the count grows, which avoids
out-of-bounds accesses.
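The ordering argument above (publish the larger array first, expose
the larger count second) can be illustrated in userspace with C11
atomics. This is only a sketch of that ordering and not RCU: there is
no grace period here, so old arrays are retired rather than freed, and
writers are assumed to be serialized by the caller, as region_lock
does in the patch. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_RETIRED 8

struct sketch_region {
	size_t size;
};

struct sketch_dev {
	_Atomic(struct sketch_region *) region;
	atomic_int num_regions;
	/* Old arrays are parked here because nothing tracks readers. */
	struct sketch_region *retired[SKETCH_MAX_RETIRED];
	int num_retired;
};

/* Writer side; the caller must serialize writers. Returns index or -1. */
static int sketch_add_region(struct sketch_dev *dev, size_t size)
{
	struct sketch_region *old, *new_arr;
	int n;

	assert(dev != NULL);
	if (dev->num_retired >= SKETCH_MAX_RETIRED)
		return -1;

	n = atomic_load_explicit(&dev->num_regions, memory_order_relaxed);
	old = atomic_load_explicit(&dev->region, memory_order_relaxed);

	new_arr = malloc((size_t)(n + 1) * sizeof(*new_arr));
	if (!new_arr)
		return -1;
	if (old)
		memcpy(new_arr, old, (size_t)n * sizeof(*new_arr));
	new_arr[n].size = size;

	/* Step 1: publish the larger array. */
	atomic_store_explicit(&dev->region, new_arr, memory_order_release);
	/* Step 2: only afterwards make index n valid for readers. */
	atomic_store_explicit(&dev->num_regions, n + 1, memory_order_release);

	/* A reader may still hold the old array, so keep it around. */
	if (old)
		dev->retired[dev->num_retired++] = old;
	return n;
}

/* Reader side: bounds-check against the count, then load the array. */
static int sketch_region_size(struct sketch_dev *dev, int index, size_t *size)
{
	struct sketch_region *arr;
	int n;

	n = atomic_load_explicit(&dev->num_regions, memory_order_acquire);
	if (index < 0 || index >= n)
		return -1;

	/* A count of n implies an array with at least n entries is visible. */
	arr = atomic_load_explicit(&dev->region, memory_order_acquire);
	*size = arr[index].size;
	return 0;
}
```

In the kernel version the RCU grace period is what makes it safe to
kfree() the old array; the sketch sidesteps that by never freeing an
array a reader might still be using.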
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
drivers/vfio/pci/vfio_pci_core.c | 59 +++++++++++++++++++++++---------
drivers/vfio/pci/vfio_pci_igd.c | 16 ++++++---
include/linux/vfio_pci_core.h | 1 +
3 files changed, 55 insertions(+), 21 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6629490c0e46f..78e18bfd973e5 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -882,7 +882,8 @@ static int msix_mmappable_cap(struct vfio_pci_core_device *vdev,
}
/*
- * Registers a new region to vfio_pci_core_device.
+ * Registers a new region to vfio_pci_core_device. region_lock should
+ * be held when multiple registers could happen.
* Returns region index on success or a negative errno.
*/
int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
@@ -890,15 +891,20 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data)
{
- int num_regions = vdev->num_regions;
struct vfio_pci_region *region, *old_region;
+ int num_regions;
+
+ mutex_lock(&vdev->region_lock);
+ num_regions = READ_ONCE(vdev->num_regions);
region = kmalloc((num_regions + 1) * sizeof(*region),
GFP_KERNEL_ACCOUNT);
if (!region)
return -ENOMEM;
- old_region = vdev->region;
+ old_region =
+ rcu_dereference_protected(vdev->region,
+ lockdep_is_held(&vdev->region_lock));
if (old_region)
memcpy(region, old_region, num_regions * sizeof(*region));
@@ -909,8 +915,10 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
region[num_regions].flags = flags;
region[num_regions].data = data;
- vdev->region = region;
- vdev->num_regions++;
+ rcu_assign_pointer(vdev->region, region);
+ synchronize_rcu();
+ WRITE_ONCE(vdev->num_regions, READ_ONCE(vdev->num_regions) + 1);
+ mutex_unlock(&vdev->region_lock);
kfree(old_region);
return num_regions;
}
@@ -968,7 +976,7 @@ static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev,
if (vdev->reset_works)
info.flags |= VFIO_DEVICE_FLAGS_RESET;
- info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
+ info.num_regions = VFIO_PCI_NUM_REGIONS + READ_ONCE(vdev->num_regions);
info.num_irqs = VFIO_PCI_NUM_IRQS;
ret = vfio_pci_info_zdev_add_caps(vdev, &caps);
@@ -1094,13 +1102,16 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
.header.version = 1
};
- if (info.index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+ if (info.index >= VFIO_PCI_NUM_REGIONS +
+ READ_ONCE(vdev->num_regions))
return -EINVAL;
- info.index = array_index_nospec(
- info.index, VFIO_PCI_NUM_REGIONS + vdev->num_regions);
+ info.index = array_index_nospec(info.index,
+ VFIO_PCI_NUM_REGIONS +
+ READ_ONCE(vdev->num_regions));
i = info.index - VFIO_PCI_NUM_REGIONS;
- region = &vdev->region[i];
+ rcu_read_lock();
+ region = &rcu_dereference(vdev->region)[i];
info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
info.size = region->size;
@@ -1111,15 +1122,20 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
ret = vfio_info_add_capability(&caps, &cap_type.header,
sizeof(cap_type));
- if (ret)
+ if (ret) {
+ rcu_read_unlock();
return ret;
+ }
if (region->ops->add_capability) {
ret = region->ops->add_capability(
vdev, region, &caps);
- if (ret)
+ if (ret) {
+ rcu_read_unlock();
return ret;
+ }
}
+ rcu_read_unlock();
}
}
@@ -1536,7 +1552,7 @@ static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
int ret;
- if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+ if (index >= VFIO_PCI_NUM_REGIONS + READ_ONCE(vdev->num_regions))
return -EINVAL;
ret = pm_runtime_resume_and_get(&vdev->pdev->dev);
@@ -1568,8 +1584,11 @@ static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
default:
index -= VFIO_PCI_NUM_REGIONS;
- ret = vdev->region[index].ops->rw(vdev, buf,
- count, ppos, iswrite);
+ rcu_read_lock();
+ ret = rcu_dereference(vdev->region)[index].ops->rw(vdev, buf,
+ count, ppos,
+ iswrite);
+ rcu_read_unlock();
break;
}
@@ -1726,7 +1745,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
- if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+ if (index >= VFIO_PCI_NUM_REGIONS + READ_ONCE(vdev->num_regions))
return -EINVAL;
if (vma->vm_end < vma->vm_start)
return -EINVAL;
@@ -1734,12 +1753,16 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
return -EINVAL;
if (index >= VFIO_PCI_NUM_REGIONS) {
int regnum = index - VFIO_PCI_NUM_REGIONS;
- struct vfio_pci_region *region = vdev->region + regnum;
+ struct vfio_pci_region *region;
+
+ rcu_read_lock();
+ region = rcu_dereference(vdev->region) + regnum;
ret = -EINVAL;
if (region->ops && region->ops->mmap &&
(region->flags & VFIO_REGION_INFO_FLAG_MMAP))
ret = region->ops->mmap(vdev, region, vma);
+ rcu_read_unlock();
return ret;
}
if (index >= VFIO_PCI_ROM_REGION_INDEX)
@@ -2107,6 +2130,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
init_rwsem(&vdev->memory_lock);
xa_init(&vdev->ctx);
+ mutex_init(&vdev->region_lock);
return 0;
}
@@ -2119,6 +2143,7 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
mutex_destroy(&vdev->igate);
mutex_destroy(&vdev->ioeventfds_lock);
+ mutex_destroy(&vdev->region_lock);
kfree(vdev->region);
kfree(vdev->pm_save);
}
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 93ddef48e4e4c..1f7e9e82ac08c 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -71,13 +71,17 @@ static ssize_t vfio_pci_igd_rw(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region;
struct igd_opregion_vbt *opregionvbt;
- region = &vdev->region[i];
+ rcu_read_lock();
+ region = &rcu_dereference(vdev->region)[i];
opregionvbt = region->data;
- if (pos >= region->size || iswrite)
+ if (pos >= region->size || iswrite) {
+ rcu_read_unlock();
return -EINVAL;
+ }
count = min_t(size_t, count, region->size - pos);
+ rcu_read_unlock();
remaining = count;
/* Copy until OpRegion version */
@@ -293,13 +297,17 @@ static ssize_t vfio_pci_igd_cfg_rw(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region;
struct pci_dev *pdev;
- region = &vdev->region[i];
+ rcu_read_lock();
+ region = &rcu_dereference(vdev->region)[i];
pdev = region->data;
- if (pos >= region->size || iswrite)
+ if (pos >= region->size || iswrite) {
+ rcu_read_unlock();
return -EINVAL;
+ }
size = count = min(count, (size_t)(region->size - pos));
+ rcu_read_unlock();
if ((pos & 1) && size) {
u8 val;
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f541044e42a2a..e106e58f297e9 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -63,6 +63,7 @@ struct vfio_pci_core_device {
int irq_type;
int num_regions;
struct vfio_pci_region *region;
+ struct mutex region_lock;
u8 msi_qmax;
u8 msix_bar;
u16 msix_size;
--
2.47.3
* [RFC PATCH 4/7] vfio: add FEATURE_ALIAS_REGION uapi
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (2 preceding siblings ...)
2025-09-24 14:09 ` [RFC PATCH 3/7] vfio/pci: add RCU locking for regions access Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 5/7] vfio_pci_core: allow regions with no release op Mahmoud Adam
` (5 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
Add a new DEVICE_FEATURE uapi which allows users to create region
aliases. The main use is to let the user request an alias of a region
with different attributes set, such as WC.
This can be used to create an alias of an existing region with WC or
similar attributes set, which is helpful for mmap-ing a region with
WC. The user can use PROBE to get the flags supported by the specified
region index.
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
include/uapi/linux/vfio.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 75100bf009baf..1584409ba2fb9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -275,6 +275,8 @@ struct vfio_region_info {
#define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
#define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */
#define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */
+#define VFIO_REGION_INFO_FLAG_ALIAS (1 << 4) /* This is an Alias Region */
+#define VFIO_REGION_INFO_FLAG_WC (1 << 5) /* Region supports write combine */
__u32 index; /* Region index */
__u32 cap_offset; /* Offset within info struct of first cap */
__aligned_u64 size; /* Region size (bytes) */
@@ -1478,6 +1480,28 @@ struct vfio_device_feature_bus_master {
};
#define VFIO_DEVICE_FEATURE_BUS_MASTER 10
+
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET, creates a new region with the specified flags set.
+ * VFIO_DEVICE_FEATURE_PROBE can be used to return the supported flags for this region.
+ *
+ * Alias a region with certain region flags set. For example this
+ * could be used to alias a region with Write Combine or similar
+ * attributes set for mmap. The new region index is returned on
+ * alias_index with the flags specified set. GET_REGION_INFO could then
+ * be used with the new index. By probing a region index the supported
+ * region flags are returned.
+ * Region flags follows the same flags from REGION_GET_REGION_INFO.
+ */
+struct vfio_device_feature_alias_region {
+ __u32 flags; /* Region flags to be used */
+ __u32 index; /* Region index */
+ __u32 alias_index; /* New region index */
+ __u32 _resv1;
+ __u64 _resv2;
+};
+
+#define VFIO_DEVICE_FEATURE_ALIAS_REGION 11
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.47.3
* [RFC PATCH 5/7] vfio_pci_core: allow regions with no release op
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (3 preceding siblings ...)
2025-09-24 14:09 ` [RFC PATCH 4/7] vfio: add FEATURE_ALIAS_REGION uapi Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 6/7] vfio-pci: add alias_region mmap ops Mahmoud Adam
` (4 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
Allow regions without a release op. This is useful for alias regions,
which do not need to implement release.
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
With the initial implementation proposed in this RFC, there is nothing
to release for the alias regions. I wasn't sure whether we should
force regions to implement a release op. If so, an empty function
might be the better solution here.
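For comparison, the guarded call can be sketched in plain C as below:
the teardown loop skips regions whose ops table, or whose release
callback, is absent. The sketch_* types are hypothetical reductions of
the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_region;

/* Ops table in which every callback is optional. */
struct sketch_regops {
	void (*release)(struct sketch_region *region);
};

struct sketch_region {
	const struct sketch_regops *ops;
	int released;
};

/* Example release callback: just records that it ran. */
static void sketch_mark_released(struct sketch_region *region)
{
	region->released = 1;
}

/* Calls release where present; returns how many callbacks ran. */
static int sketch_release_all(struct sketch_region *regions, int n)
{
	int i, called = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (regions[i].ops && regions[i].ops->release) {
			regions[i].ops->release(&regions[i]);
			called++;
		}
	}
	return called;
}
```

The alternative mentioned above, an empty release function, would keep
the loop unconditional at the cost of one stub per ops table.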
drivers/vfio/pci/vfio_pci_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 78e18bfd973e5..04b93bd55a5c2 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -605,7 +605,8 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
vdev->virq_disabled = false;
for (i = 0; i < vdev->num_regions; i++)
- vdev->region[i].ops->release(vdev, &vdev->region[i]);
+ if (vdev->region[i].ops && vdev->region[i].ops->release)
+ vdev->region[i].ops->release(vdev, &vdev->region[i]);
vdev->num_regions = 0;
kfree(vdev->region);
--
2.47.3
* [RFC PATCH 6/7] vfio-pci: add alias_region mmap ops
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (4 preceding siblings ...)
2025-09-24 14:09 ` [RFC PATCH 5/7] vfio_pci_core: allow regions with no release op Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-09-24 14:09 ` [RFC PATCH 7/7] vfio-pci-core: implement FEATURE_ALIAS_REGION uapi Mahmoud Adam
` (3 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
Implement struct vfio_pci_regops for alias regions, providing the mmap
op. When mmap is called on an alias region, it translates vm_pgoff to
point at the aliased region, calls the mmap handler for that target
region, and afterwards updates vm_page_prot according to the requested
flags.
The call path would be:
  vfio_pci_core_mmap (index >= VFIO_PCI_NUM_REGIONS)
    vfio_pci_alias_region_mmap (update vm_pgoff)
      vfio_pci_core_mmap
For now nothing beyond the aliased index is needed, so region->data is
used to store the aliased index number.
Note: Alias regions can't alias another alias.
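The vm_pgoff translation can be checked in isolation with the sketch
below. It assumes 4 KiB pages and a region offset shift of 40 (the
value VFIO_PCI_OFFSET_SHIFT is believed to have); the sketch_* names
are hypothetical, and 64-bit arithmetic is used throughout to keep the
shift well defined for any index.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12 /* assumption: 4 KiB pages */
#define SKETCH_OFFSET_SHIFT 40 /* assumption: VFIO_PCI_OFFSET_SHIFT */
#define SKETCH_INDEX_SHIFT  (SKETCH_OFFSET_SHIFT - SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK    ((UINT64_C(1) << SKETCH_INDEX_SHIFT) - 1)

/* Region index encoded in the upper bits of a page offset. */
static unsigned int sketch_pgoff_to_index(uint64_t pgoff)
{
	return (unsigned int)(pgoff >> SKETCH_INDEX_SHIFT);
}

/*
 * Rewrites the region index encoded in @pgoff to @target_index while
 * keeping the page offset within the region unchanged.
 */
static uint64_t sketch_retarget_pgoff(uint64_t pgoff, unsigned int target_index)
{
	uint64_t within = pgoff & SKETCH_PAGE_MASK;

	/* The index must fit in the bits left above the page offset. */
	assert(((uint64_t)target_index >> (64 - SKETCH_INDEX_SHIFT)) == 0);
	return ((uint64_t)target_index << SKETCH_INDEX_SHIFT) | within;
}
```

For instance, page 3 of an alias at index 9 that targets BAR 0 becomes
page 3 of index 0, which the regular BAR mmap path then handles.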
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
drivers/vfio/pci/vfio_pci_core.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 04b93bd55a5c2..962d3eda1ea9f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1528,6 +1528,33 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
return 0;
}
+static int vfio_pci_alias_region_mmap(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_region *region,
+ struct vm_area_struct *vma)
+{
+ unsigned int alias_index = (uintptr_t) region->data;
+ unsigned long vm_pgoff;
+ int ret;
+
+ /* change the pgoff to the corresponding alias */
+ vm_pgoff = alias_index << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+ vm_pgoff |= vma->vm_pgoff &
+ ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+ vma->vm_pgoff = vm_pgoff;
+
+ ret = vdev->vdev.ops->mmap(&vdev->vdev, vma);
+
+ /* overwrite prot with the alias flags */
+ if (region->flags & VFIO_REGION_INFO_FLAG_WC)
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+ return ret;
+}
+
+struct vfio_pci_regops vfio_pci_alias_region_ops = {
+ .mmap = vfio_pci_alias_region_mmap,
+};
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
--
2.47.3
* [RFC PATCH 7/7] vfio-pci-core: implement FEATURE_ALIAS_REGION uapi
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (5 preceding siblings ...)
2025-09-24 14:09 ` [RFC PATCH 6/7] vfio-pci: add alias_region mmap ops Mahmoud Adam
@ 2025-09-24 14:09 ` Mahmoud Adam
2025-10-03 21:58 ` [RFC PATCH 0/7] vfio: Add alias region uapi for device feature David Matlack
` (2 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Adam @ 2025-09-24 14:09 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
This implements the new DEVICE_FEATURE_ALIAS_REGION. For now aliases
are only needed for mmap, so only regions that support mmap can be
aliased.
If the user requests an alias that already exists (same flags and same
aliased index), return the existing alias index instead, since
creating another alias would give the user nothing extra. The region
with the new flag (WC) allows the user to mmap the aliased region with
WC enabled.
Probing is also supported. When the user probes a region index, we
return the region flags that can be enabled for that region. Initially
only WC is supported, and only when the region is mmap-able.
Add vfio_pci_core_register_dev_region_locked so that callers can take
the mutex themselves. This lets us check whether a matching alias
exists and add the new region while holding the same mutex, avoiding a
race.
Signed-off-by: Mahmoud Adam <mngyadam@amazon.de>
---
drivers/vfio/pci/vfio_pci_core.c | 173 ++++++++++++++++++++++++++++---
1 file changed, 161 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 962d3eda1ea9f..3c162cf47a1eb 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -882,20 +882,14 @@ static int msix_mmappable_cap(struct vfio_pci_core_device *vdev,
return vfio_info_add_capability(caps, &header, sizeof(header));
}
-/*
- * Registers a new region to vfio_pci_core_device. region_lock should
- * be held when multiple registers could happen.
- * Returns region index on success or a negative errno.
- */
-int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
- unsigned int type, unsigned int subtype,
- const struct vfio_pci_regops *ops,
- size_t size, u32 flags, void *data)
+static int vfio_pci_core_register_dev_region_locked(
+ struct vfio_pci_core_device *vdev,
+ unsigned int type, unsigned int subtype,
+ const struct vfio_pci_regops *ops,
+ size_t size, u32 flags, void *data)
{
struct vfio_pci_region *region, *old_region;
int num_regions;
-
- mutex_lock(&vdev->region_lock);
num_regions = READ_ONCE(vdev->num_regions);
region = kmalloc((num_regions + 1) * sizeof(*region),
@@ -919,10 +913,29 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
rcu_assign_pointer(vdev->region, region);
synchronize_rcu();
WRITE_ONCE(vdev->num_regions, READ_ONCE(vdev->num_regions) + 1);
- mutex_unlock(&vdev->region_lock);
kfree(old_region);
return num_regions;
}
+
+/*
+ * Registers a new region to vfio_pci_core_device. Takes region_lock
+ * internally, so the caller must not hold it.
+ * Returns region index on success or a negative errno.
+ */
+int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
+ unsigned int type, unsigned int subtype,
+ const struct vfio_pci_regops *ops,
+ size_t size, u32 flags, void *data)
+{
+ int index;
+
+ mutex_lock(&vdev->region_lock);
+ index = vfio_pci_core_register_dev_region_locked(vdev, type, subtype,
+ ops,
+ size, flags, data);
+ mutex_unlock(&vdev->region_lock);
+ return index;
+}
EXPORT_SYMBOL_GPL(vfio_pci_core_register_dev_region);
static int vfio_pci_info_atomic_cap(struct vfio_pci_core_device *vdev,
@@ -1528,6 +1541,48 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
return 0;
}
+static bool vfio_pci_region_is_mmap_supported(struct vfio_pci_core_device *vdev,
+ int index)
+{
+ if (index <= VFIO_PCI_BAR5_REGION_INDEX)
+ return vdev->bar_mmap_supported[index];
+
+ if (index >= VFIO_PCI_NUM_REGIONS) {
+ int i = index - VFIO_PCI_NUM_REGIONS;
+ bool is_mmap;
+ struct vfio_pci_region *region;
+
+ rcu_read_lock();
+ region = &rcu_dereference(vdev->region)[i];
+ is_mmap = (region->flags & VFIO_REGION_INFO_FLAG_MMAP) &&
+ region->ops && region->ops->mmap;
+ rcu_read_unlock();
+ return is_mmap;
+ }
+ return false;
+}
+
+static bool vfio_pci_region_alias_exists(struct vfio_pci_core_device *vdev,
+ u32 flags, int index, int *alias_index)
+{
+ int i;
+
+ for (i = 0; i < READ_ONCE(vdev->num_regions); i++) {
+ struct vfio_pci_region *region;
+
+ region = &rcu_dereference_protected(
+ vdev->region, lockdep_is_held(&vdev->region_lock))[i];
+ if (!(region->flags & VFIO_REGION_INFO_FLAG_ALIAS))
+ continue;
+ if ((int)(uintptr_t) region->data == index &&
+ region->flags == flags) {
+ *alias_index = i + VFIO_PCI_NUM_REGIONS;
+ return true;
+ }
+ }
+ return false;
+}
+
static int vfio_pci_alias_region_mmap(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region,
struct vm_area_struct *vma)
@@ -1555,6 +1610,97 @@ struct vfio_pci_regops vfio_pci_alias_region_ops = {
.mmap = vfio_pci_alias_region_mmap,
};
+static int vfio_pci_core_feature_alias_region(
+ struct vfio_device *device, u32 flags,
+ struct vfio_device_feature_alias_region __user *arg,
+ size_t argsz)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(device, struct vfio_pci_core_device, vdev);
+ struct pci_dev *pdev = vdev->pdev;
+ bool is_probe = false;
+ u32 region_flags;
+ struct vfio_device_feature_alias_region request_region;
+ int ret, index, new_index;
+ size_t size;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(request_region));
+ if (ret < 0)
+ return ret;
+
+ if (ret == 0) /* probing only */
+ is_probe = true;
+
+ if (copy_from_user(&request_region, arg, sizeof(request_region)))
+ return -EFAULT;
+
+ if (request_region.index >= VFIO_PCI_NUM_REGIONS +
+ READ_ONCE(vdev->num_regions))
+ return -EINVAL;
+
+ index = array_index_nospec(request_region.index,
+ VFIO_PCI_NUM_REGIONS +
+ READ_ONCE(vdev->num_regions));
+
+ /* make sure we are not aliasing an alias region */
+ if (index >= VFIO_PCI_NUM_REGIONS) {
+ int i;
+
+ rcu_read_lock();
+ i = index - VFIO_PCI_NUM_REGIONS;
+ if (rcu_dereference(vdev->region)[i].flags &
+ VFIO_REGION_INFO_FLAG_ALIAS) {
+ rcu_read_unlock();
+ return -EINVAL;
+ }
+ rcu_read_unlock();
+ }
+
+ /* For now we only allow aliasing mmap supported regions. */
+ if (!vfio_pci_region_is_mmap_supported(vdev, index))
+ return -EINVAL;
+
+ if (is_probe) {
+ request_region.flags = VFIO_REGION_INFO_FLAG_WC;
+ goto out_copy;
+ }
+
+ if (request_region.flags & ~VFIO_REGION_INFO_FLAG_WC)
+ return -EINVAL;
+
+ region_flags = VFIO_REGION_INFO_FLAG_ALIAS |
+ VFIO_REGION_INFO_FLAG_MMAP | VFIO_REGION_INFO_FLAG_WC;
+
+ mutex_lock(&vdev->region_lock);
+ if (vfio_pci_region_alias_exists(vdev, region_flags,
+ index, &new_index)) {
+ request_region.alias_index = new_index;
+ goto out_copy_unlock;
+ }
+
+ if (index <= VFIO_PCI_BAR5_REGION_INDEX)
+ size = pci_resource_len(pdev, index);
+ else
+ size = vdev->region[index - VFIO_PCI_NUM_REGIONS].size;
+
+ new_index = vfio_pci_core_register_dev_region_locked(
+ vdev, 0, 0, &vfio_pci_alias_region_ops, size, region_flags,
+ (void *)(uintptr_t)index);
+
+ if (new_index < 0) {
+ mutex_unlock(&vdev->region_lock);
+ return new_index;
+ }
+ request_region.alias_index = new_index + VFIO_PCI_NUM_REGIONS;
+
+out_copy_unlock:
+ mutex_unlock(&vdev->region_lock);
+out_copy:
+ ret = copy_to_user(arg, &request_region, sizeof(request_region));
+ return ret ? -EFAULT : 0;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1568,6 +1714,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_pm_exit(device, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
return vfio_pci_core_feature_token(device, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_ALIAS_REGION:
+ return vfio_pci_core_feature_alias_region(device, flags,
+ arg, argsz);
default:
return -ENOTTY;
}
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
* Re: [RFC PATCH 3/7] vfio/pci: add RCU locking for regions access
2025-09-24 14:09 ` [RFC PATCH 3/7] vfio/pci: add RCU locking for regions access Mahmoud Adam
@ 2025-09-24 16:15 ` Mahmoud Nagy Adam
0 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Nagy Adam @ 2025-09-24 16:15 UTC (permalink / raw)
To: kvm
Cc: alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
Having a second look at the RCU read sections: some of them could
sleep or block, so plain RCU will not work for these sections. I need
to fix this in the next version.
-MNAdam
* Re: [RFC PATCH 0/7] vfio: Add alias region uapi for device feature
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (6 preceding siblings ...)
2025-09-24 14:09 ` [RFC PATCH 7/7] vfio-pci-core: implement FEATURE_ALIAS_REGION uapi Mahmoud Adam
@ 2025-10-03 21:58 ` David Matlack
2025-10-05 10:16 ` Mahmoud Nagy Adam
2025-10-15 19:36 ` Alex Williamson
2025-10-27 16:32 ` David Matlack
9 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2025-10-03 21:58 UTC (permalink / raw)
To: Mahmoud Adam
Cc: kvm, alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
On Wed, Sep 24, 2025 at 7:11 AM Mahmoud Adam <mngyadam@amazon.de> wrote:
>
> This adds two new region flags:
> - VFIO_REGION_INFO_FLAG_ALIAS: set on alias regions.
> - VFIO_REGION_INFO_FLAG_WC: indicates WC is in effect for that region.
Once you settle on a uAPI, this would be a good candidate for some
VFIO selftests coverage [1][2].
Assuming the uAPI supports setting up equivalent aliases of BARs, you
can use the VFIO selftests driver framework to drive some meaningful
usage of the aliases. For example, you could write a test that
overrides bars[n].vaddr in struct vfio_pci_device to force the drivers
to use aliased BARs that you set up and mmapped, and make sure the
device can still perform memcpys as it should afterward.
But regardless, there is still plenty of opportunity for sanity tests
that verify the new ioctls fail and succeed when they should, and that
mmap works as expected.
Meaningfully verifying that VFIO_REGION_INFO_FLAG_WC is being applied
correctly might be a challenge.
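One way to enumerate the pass and fail cases such sanity tests might cover is sketched below as a plain userspace model, not a real selftest. The `FLAG_*` bit values, `struct model_region` and `model_alias_request` are hypothetical names introduced for illustration; only the order of the checks is taken from vfio_pci_core_feature_alias_region in patch 7/7.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL bit values, chosen only for this model. */
#define FLAG_MMAP  (1u << 2)
#define FLAG_WC    (1u << 30)
#define FLAG_ALIAS (1u << 31)

struct model_region { /* hypothetical stand-in for a region entry */
	uint32_t flags;
};

/*
 * Returns 0 on success or a negative errno, following the check order
 * used by the patch: range, alias-of-alias, mmap support, then flags.
 */
static int model_alias_request(const struct model_region *r, unsigned int nr,
			       unsigned int index, uint32_t req_flags,
			       bool probe, uint32_t *out_flags)
{
	if (index >= nr)
		return -EINVAL;		/* index out of range */
	if (r[index].flags & FLAG_ALIAS)
		return -EINVAL;		/* no alias of an alias */
	if (!(r[index].flags & FLAG_MMAP))
		return -EINVAL;		/* only mmap-able regions */
	if (probe) {
		*out_flags = FLAG_WC;	/* report what can be enabled */
		return 0;
	}
	if (req_flags & ~FLAG_WC)
		return -EINVAL;		/* unknown flag requested */
	*out_flags = FLAG_ALIAS | FLAG_MMAP | FLAG_WC;
	return 0;
}
```

A real selftest would issue the ioctl against a device instead of calling a model, but the table of expected outcomes (out-of-range index, alias of an alias, non-mmap-able region, unknown flag, probe, valid request) would likely be the same.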
[1] https://lore.kernel.org/kvm/20250822212518.4156428-1-dmatlack@google.com/
[2] https://lore.kernel.org/kvm/20250930124221.39523455.alex.williamson@redhat.com/
* Re: [RFC PATCH 0/7] vfio: Add alias region uapi for device feature
2025-10-03 21:58 ` [RFC PATCH 0/7] vfio: Add alias region uapi for device feature David Matlack
@ 2025-10-05 10:16 ` Mahmoud Nagy Adam
0 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Nagy Adam @ 2025-10-05 10:16 UTC (permalink / raw)
To: David Matlack
Cc: kvm, alex.williamson, jgg, kbusch, benh, David Woodhouse, pravkmr,
nagy, linux-kernel
David Matlack <dmatlack@google.com> writes:
> On Wed, Sep 24, 2025 at 7:11 AM Mahmoud Adam <mngyadam@amazon.de> wrote:
>>
>> This adds two new region flags:
>> - VFIO_REGION_INFO_FLAG_ALIAS: set on alias regions.
>> - VFIO_REGION_INFO_FLAG_WC: indicates WC is in effect for that region.
>
> Once you settle on a uAPI, this would be a good candidate for some
> VFIO selftests coverage [1][2].
Yup, I was planning to do that after seeing the new VFIO selftests
being added. I already have a small test that I use for testing this. I
can clean it up and port it to fit into the VFIO selftests once there is
some momentum on this series. Thanks, David.
-MNAdam
* Re: [RFC PATCH 0/7] vfio: Add alias region uapi for device feature
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (7 preceding siblings ...)
2025-10-03 21:58 ` [RFC PATCH 0/7] vfio: Add alias region uapi for device feature David Matlack
@ 2025-10-15 19:36 ` Alex Williamson
2025-10-27 16:32 ` David Matlack
9 siblings, 0 replies; 14+ messages in thread
From: Alex Williamson @ 2025-10-15 19:36 UTC (permalink / raw)
To: Mahmoud Adam
Cc: kvm, jgg, kbusch, benh, David Woodhouse, pravkmr, nagy,
linux-kernel
On Wed, 24 Sep 2025 16:09:51 +0200
Mahmoud Adam <mngyadam@amazon.de> wrote:
> This RFC proposes a new uapi VFIO DEVICE_FEATURE to create per-region
> aliases with selectable attributes, initially enabling write-combine
> (WC) where supported by the underlying region. The goal is to expose a
> UAPI for userspace to request an alias of an existing VFIO region with
> extra flags, then interact with it via a stable alias index through
> existing ioctls and mmap where applicable.
>
> This proposal is following Alex's suggestion [1]. This uapi allows
> creating a region alias where the user could specify to enable certain
> attributes through the alias. And then could use the alias index to
> get the region info and grab the offset to operate on.
>
> One example is to create a new Alias for bar 0 or similar BAR with WC
> enabled. Then you can use the alias offset to mmap to the region with
> WC enabled.
>
> The uapi allows the user to request a region index to alias and the
> extra flags to be set. Users can PROBE to get which flags are
> supported by this region. The flags are the same to the region flags
> in the region_info uapi.
>
> This adds two new region flags:
> - VFIO_REGION_INFO_FLAG_ALIAS: set on alias regions.
> - VFIO_REGION_INFO_FLAG_WC: indicates WC is in effect for that region.
Sorry for the delayed feedback...
I think these should be described via capabilities returned with the
vfio_region_info rather than flags. A flag that indicates the region
is an alias is really only useful for the restriction that we don't
want to allow aliases of aliases, but it doesn't provide full
introspection of what region this is actually an alias of.
The WC flag also doesn't allow much extension. I think we want this to
have natural room to implement further mapping flags, so likely the
same capability that describes the region as being an alias should also
report back the mapping flags for the alias. Thanks,
Alex
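To make the capability idea concrete, here is a rough sketch of how userspace might look up such a capability in the chain returned with vfio_region_info. Everything alias-specific is an assumption: `CAP_ID_REGION_ALIAS`, `struct cap_region_alias`, its fields and `ALIAS_MAP_FLAG_WC` are hypothetical and not part of any kernel uapi. `struct cap_header` is a local mirror of the existing `struct vfio_info_cap_header` layout, in which `next` is a byte offset from the start of the info buffer and 0 ends the chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_info_cap_header from <linux/vfio.h>. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next; /* offset of next capability from buffer start, 0 = end */
};

/* HYPOTHETICAL capability: id, layout and flag are illustration only. */
#define CAP_ID_REGION_ALIAS 0x7f00
#define ALIAS_MAP_FLAG_WC   (1u << 0)

struct cap_region_alias {
	struct cap_header header;
	uint32_t aliased_index; /* region this alias refers to */
	uint32_t map_flags;     /* mapping attributes in effect, e.g. WC */
};

/*
 * Walk the capability chain in buf, starting at first_off.
 * Returns the byte offset of the capability with the given id, or 0
 * if it is absent or the chain runs past the end of the buffer.
 */
static uint32_t find_cap(const uint8_t *buf, size_t len,
			 uint32_t first_off, uint16_t id)
{
	uint32_t off = first_off;
	/* Bound the walk so a malformed, cyclic chain cannot loop forever. */
	size_t budget = len / sizeof(struct cap_header) + 1;

	while (off != 0 && budget-- > 0) {
		struct cap_header hdr;

		if ((size_t)off + sizeof(hdr) > len)
			return 0;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return off;
		off = hdr.next;
	}
	return 0;
}
```

With a layout along these lines, the capability would report which region is aliased, and further mapping attributes could be added as new bits in `map_flags` without consuming more region flag bits.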
>
> Then this series implement this uapi on vfio-pci. For vfio-pci, Alias
> regions are only (for now) possible for mmap supported regions. There
> could be future usages for these alias regions other than mmaps (like
> I think we could use it to also allow to use read & write on
> pci_iomap_wc version of the region?). In case if similar alias region
> already exist return the current alias index to the user.
>
> To mmap the region alias, we use the mmap region ops. Through that we
> translate the vm_pgoff to its aliased region and call vfio_device mmap
> with the alias pgoff. This enables us to mmap the original region then
> update the pgrot for WC afterwards.
>
> The call path would be:
> vfio_pci_core_mmap (index >= VFIO_PCI_NUM_REGIONS)
> vfio_pci_alias_region_mmap (update vm_pgoff)
> vfio_pci_core_mmap
>
> This series also adds required locking for region array
> accessing. Since now regions are added after initial setup.
>
> [1]: https://lore.kernel.org/kvm/20250811160710.174ca708.alex.williamson@redhat.com/
>
> references:
> https://lore.kernel.org/kvm/20250804104012.87915-1-mngyadam@amazon.de/
> https://lore.kernel.org/kvm/20240731155352.3973857-1-kbusch@meta.com/
> https://lore.kernel.org/kvm/lrkyq4ivccb6x.fsf@dev-dsk-mngyadam-1c-cb3f7548.eu-west-1.amazon.com/
>
> Mahmoud Adam (7):
> vfio/pci: refactor region dereferences for RCU.
> vfio_pci_core: split krealloc to allow use RCU & return index
> vfio/pci: add RCU locking for regions access
> vfio: add FEATURE_ALIAS_REGION uapi
> vfio_pci_core: allow regions with no release op
> vfio-pci: add alias_region mmap ops
> vfio-pci-core: implement FEATURE_ALIAS_REGION uapi
>
> drivers/vfio/pci/vfio_pci_core.c | 289 +++++++++++++++++++++++++++----
> drivers/vfio/pci/vfio_pci_igd.c | 34 +++-
> include/linux/vfio_pci_core.h | 1 +
> include/uapi/linux/vfio.h | 24 +++
> 4 files changed, 301 insertions(+), 47 deletions(-)
>
* Re: [RFC PATCH 0/7] vfio: Add alias region uapi for device feature
2025-09-24 14:09 [RFC PATCH 0/7] vfio: Add alias region uapi for device feature Mahmoud Adam
` (8 preceding siblings ...)
2025-10-15 19:36 ` Alex Williamson
@ 2025-10-27 16:32 ` David Matlack
2025-10-27 18:19 ` Mahmoud Nagy Adam
9 siblings, 1 reply; 14+ messages in thread
From: David Matlack @ 2025-10-27 16:32 UTC (permalink / raw)
To: Mahmoud Adam
Cc: kvm, jgg, kbusch, benh, David Woodhouse, pravkmr, nagy,
linux-kernel, Alex Williamson
On Wed, Sep 24, 2025 at 7:11 AM Mahmoud Adam <mngyadam@amazon.de> wrote:
>
> This RFC proposes a new uapi VFIO DEVICE_FEATURE to create per-region
> aliases with selectable attributes, initially enabling write-combine
> (WC) where supported by the underlying region. The goal is to expose a
> UAPI for userspace to request an alias of an existing VFIO region with
> extra flags, then interact with it via a stable alias index through
> existing ioctls and mmap where applicable.
Would it make sense to build this on top of Leon's dma-buf series [1]?
My understanding is that dma-buf can support mmap, so WC could just be
a property attached to a dma-buf fd and passed by userspace via
VFIO_DEVICE_FEATURE_DMA_BUF. Then VFIO wouldn't have to create or
manage region aliases.
Apologies if this has already been discussed, I did not go through all
the past discussion.
[1] https://lore.kernel.org/kvm/72ecaa13864ca346797e342d23a7929562788148.1760368250.git.leon@kernel.org/
* Re: [RFC PATCH 0/7] vfio: Add alias region uapi for device feature
2025-10-27 16:32 ` David Matlack
@ 2025-10-27 18:19 ` Mahmoud Nagy Adam
0 siblings, 0 replies; 14+ messages in thread
From: Mahmoud Nagy Adam @ 2025-10-27 18:19 UTC (permalink / raw)
To: David Matlack
Cc: kvm, jgg, kbusch, benh, David Woodhouse, pravkmr, nagy,
linux-kernel, Alex Williamson
David Matlack <dmatlack@google.com> writes:
> On Wed, Sep 24, 2025 at 7:11 AM Mahmoud Adam <mngyadam@amazon.de> wrote:
>>
>> This RFC proposes a new uapi VFIO DEVICE_FEATURE to create per-region
>> aliases with selectable attributes, initially enabling write-combine
>> (WC) where supported by the underlying region. The goal is to expose a
>> UAPI for userspace to request an alias of an existing VFIO region with
>> extra flags, then interact with it via a stable alias index through
>> existing ioctls and mmap where applicable.
>
> Would it make sense to build this on top of Leon's dma-buf series [1]?
> My understanding is that dma-buf can support mmap, so WC could just be
> a property attached to a dma-buf fd and passed by userspace via
> VFIO_DEVICE_FEATURE_DMA_BUF. Then VFIO wouldn't have to create or
> manage region aliases.
>
The motivation for this proposal is that it would integrate seamlessly
with DPDK. I haven’t yet investigated the new dma-buf series as a
solution, but my initial impression is that it doesn’t fit well with
DPDK’s existing model.
Enabling write-combine with dma-buf was also proposed here [1], and I
agree that’s generally a good idea that fits naturally with dma-buf. I
think it would be valuable to support WC with both region aliasing and
dma-buf, but maybe Alex has a different opinion on that.
[1]: https://lore.kernel.org/kvm/20250918214425.2677057-1-amastro@fb.com/
> Apologies if this has already been discussed, I did not go through all
> the past discussion.
>
> [1] https://lore.kernel.org/kvm/72ecaa13864ca346797e342d23a7929562788148.1760368250.git.leon@kernel.org/
Thanks,
MNAdam