* [PATCH rdma-next 0/6] Add support for TLP emulation
@ 2026-02-25 14:19 Leon Romanovsky
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
This series adds support for Transaction Layer Packet (TLP) emulation
response gateway regions, enabling userspace device emulation software
to write TLP responses directly to lower layers without kernel driver
involvement.
Currently, the mlx5 driver exposes VirtIO emulation access regions via
the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
ioctl to also support allocating TLP response gateway channels for
PCI device emulation use cases.
Thanks
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Maher Sanalla (6):
net/mlx5: Add TLP emulation device capabilities
net/mlx5: Expose TLP emulation capabilities
RDMA/mlx5: Refactor VAR table to use region abstraction
RDMA/mlx5: Add TLP VAR region support and infrastructure
RDMA/mlx5: Add support for TLP VAR allocation
RDMA/mlx5: Add VAR object query method for cross-process sharing
drivers/infiniband/hw/mlx5/main.c | 196 ++++++++++++++++++++-----
drivers/infiniband/hw/mlx5/mlx5_ib.h | 8 +-
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 6 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
include/linux/mlx5/device.h | 9 ++
include/linux/mlx5/mlx5_ifc.h | 23 ++-
include/uapi/rdma/mlx5_user_ioctl_cmds.h | 9 ++
include/uapi/rdma/mlx5_user_ioctl_verbs.h | 4 +
8 files changed, 218 insertions(+), 38 deletions(-)
---
base-commit: 58409f0d4dd3f9e987214064e49b088823934304
change-id: 20260225-var-tlp-93de10adedb8
Best regards,
--
Leon Romanovsky <leonro@nvidia.com>
* [PATCH mlx5-next 1/6] net/mlx5: Add TLP emulation device capabilities
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
Introduce the mlx5_ifc hardware structures and definitions needed to
support TLP emulation in the driver.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 775cb0c56865..a3948b36820d 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1389,6 +1389,26 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
u8 reserved_at_1c0[0x640];
};
+struct mlx5_ifc_tlp_dev_emu_capabilities_bits {
+ u8 reserved_at_0[0x20];
+
+ u8 reserved_at_20[0x13];
+ u8 log_tlp_rsp_gw_page_stride[0x5];
+ u8 reserved_at_38[0x8];
+
+ u8 reserved_at_40[0xc0];
+
+ u8 reserved_at_100[0xc];
+ u8 tlp_rsp_gw_num_pages[0x4];
+ u8 reserved_at_110[0x10];
+
+ u8 reserved_at_120[0xa0];
+
+ u8 tlp_rsp_gw_pages_bar_offset[0x40];
+
+ u8 reserved_at_200[0x600];
+};
+
enum {
MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_1_BYTE = 0x0,
MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_2_BYTES = 0x2,
@@ -1961,7 +1981,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 log_max_rqt[0x5];
u8 reserved_at_390[0x3];
u8 log_max_rqt_size[0x5];
- u8 reserved_at_398[0x1];
+ u8 tlp_device_emulation_manager[0x1];
u8 vnic_env_cnt_bar_uar_access[0x1];
u8 vnic_env_cnt_odp_page_fault[0x1];
u8 log_max_tis_per_sq[0x5];
@@ -3830,6 +3850,7 @@ union mlx5_ifc_hca_cap_union_bits {
struct mlx5_ifc_tls_cap_bits tls_cap;
struct mlx5_ifc_device_mem_cap_bits device_mem_cap;
struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap;
+ struct mlx5_ifc_tlp_dev_emu_capabilities_bits tlp_dev_emu_capabilities;
struct mlx5_ifc_macsec_cap_bits macsec_cap;
struct mlx5_ifc_crypto_cap_bits crypto_cap;
struct mlx5_ifc_ipsec_cap_bits ipsec_cap;
--
2.53.0
* [PATCH mlx5-next 2/6] net/mlx5: Expose TLP emulation capabilities
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
Expose and query TLP device emulation caps on driver load.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 6 ++++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
include/linux/mlx5/device.h | 9 +++++++++
3 files changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index eeb4437975f2..55249f405841 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -255,6 +255,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
+ if (MLX5_CAP_GEN(dev, tlp_device_emulation_manager)) {
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_TLP_EMULATION, HCA_CAP_OPMOD_GET_CUR);
+ if (err)
+ return err;
+ }
+
if (MLX5_CAP_GEN(dev, ipsec_offload)) {
err = mlx5_core_get_caps_mode(dev, MLX5_CAP_IPSEC, HCA_CAP_OPMOD_GET_CUR);
if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index fdc3ba20912e..b0bc4a7d4a93 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1772,6 +1772,7 @@ static const int types[] = {
MLX5_CAP_CRYPTO,
MLX5_CAP_SHAMPO,
MLX5_CAP_ADV_RDMA,
+ MLX5_CAP_TLP_EMULATION,
};
static void mlx5_hca_caps_free(struct mlx5_core_dev *dev)
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index b37fe39cef27..25c6b42140b2 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1259,6 +1259,7 @@ enum mlx5_cap_type {
MLX5_CAP_PORT_SELECTION = 0x25,
MLX5_CAP_ADV_VIRTUALIZATION = 0x26,
MLX5_CAP_ADV_RDMA = 0x28,
+ MLX5_CAP_TLP_EMULATION = 0x2a,
/* NUM OF CAP Types */
MLX5_CAP_NUM
};
@@ -1481,6 +1482,14 @@ enum mlx5_qcam_feature_groups {
MLX5_GET64(virtio_emulation_cap, \
(mdev)->caps.hca[MLX5_CAP_VDPA_EMULATION]->cur, cap)
+#define MLX5_CAP_DEV_TLP_EMULATION(mdev, cap)\
+ MLX5_GET(tlp_dev_emu_capabilities, \
+ (mdev)->caps.hca[MLX5_CAP_TLP_EMULATION]->cur, cap)
+
+#define MLX5_CAP64_DEV_TLP_EMULATION(mdev, cap)\
+ MLX5_GET64(tlp_dev_emu_capabilities, \
+ (mdev)->caps.hca[MLX5_CAP_TLP_EMULATION]->cur, cap)
+
#define MLX5_CAP_IPSEC(mdev, cap)\
MLX5_GET(ipsec_cap, (mdev)->caps.hca[MLX5_CAP_IPSEC]->cur, cap)
--
2.53.0
* [PATCH rdma-next 3/6] RDMA/mlx5: Refactor VAR table to use region abstraction
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
Extract a mlx5_var_region struct out of mlx5_var_table so that the VAR
table can hold multiple VAR regions. Upcoming patches will use this for
the VirtIO emulation VAR and the TLP emulation VAR.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 62 +++++++++++++++++++-----------------
drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 +++-
2 files changed, 38 insertions(+), 30 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 26ee8e763d5e..835fe2a95ad6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2524,6 +2524,7 @@ static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
struct mlx5_ib_dev *dev = to_mdev(entry->ucontext->device);
struct mlx5_var_table *var_table = &dev->var_table;
struct mlx5_ib_ucontext *context = to_mucontext(entry->ucontext);
+ struct mlx5_var_region *var_region;
switch (mentry->mmap_flag) {
case MLX5_IB_MMAP_TYPE_MEMIC:
@@ -2531,9 +2532,10 @@ static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
mlx5_ib_dm_mmap_free(dev, mentry);
break;
case MLX5_IB_MMAP_TYPE_VAR:
- mutex_lock(&var_table->bitmap_lock);
- clear_bit(mentry->page_idx, var_table->bitmap);
- mutex_unlock(&var_table->bitmap_lock);
+ var_region = &var_table->var_region;
+ mutex_lock(&var_region->bitmap_lock);
+ clear_bit(mentry->page_idx, var_region->bitmap);
+ mutex_unlock(&var_region->bitmap_lock);
kfree(mentry);
break;
case MLX5_IB_MMAP_TYPE_UAR_WC:
@@ -4143,43 +4145,45 @@ static struct mlx5_user_mmap_entry *
alloc_var_entry(struct mlx5_ib_ucontext *c)
{
struct mlx5_user_mmap_entry *entry;
+ struct mlx5_var_region *var_region;
struct mlx5_var_table *var_table;
u32 page_idx;
int err;
var_table = &to_mdev(c->ibucontext.device)->var_table;
+ var_region = &var_table->var_region;
entry = kzalloc_obj(*entry);
if (!entry)
return ERR_PTR(-ENOMEM);
- mutex_lock(&var_table->bitmap_lock);
- page_idx = find_first_zero_bit(var_table->bitmap,
- var_table->num_var_hw_entries);
- if (page_idx >= var_table->num_var_hw_entries) {
+ mutex_lock(&var_region->bitmap_lock);
+ page_idx = find_first_zero_bit(var_region->bitmap,
+ var_region->num_var_hw_entries);
+ if (page_idx >= var_region->num_var_hw_entries) {
err = -ENOSPC;
- mutex_unlock(&var_table->bitmap_lock);
+ mutex_unlock(&var_region->bitmap_lock);
goto end;
}
- set_bit(page_idx, var_table->bitmap);
- mutex_unlock(&var_table->bitmap_lock);
+ set_bit(page_idx, var_region->bitmap);
+ mutex_unlock(&var_region->bitmap_lock);
- entry->address = var_table->hw_start_addr +
- (page_idx * var_table->stride_size);
+ entry->address = var_region->hw_start_addr +
+ (page_idx * var_region->stride_size);
entry->page_idx = page_idx;
entry->mmap_flag = MLX5_IB_MMAP_TYPE_VAR;
err = mlx5_rdma_user_mmap_entry_insert(c, entry,
- var_table->stride_size);
+ var_region->stride_size);
if (err)
goto err_insert;
return entry;
err_insert:
- mutex_lock(&var_table->bitmap_lock);
- clear_bit(page_idx, var_table->bitmap);
- mutex_unlock(&var_table->bitmap_lock);
+ mutex_lock(&var_region->bitmap_lock);
+ clear_bit(page_idx, var_region->bitmap);
+ mutex_unlock(&var_region->bitmap_lock);
end:
kfree(entry);
return ERR_PTR(err);
@@ -4607,10 +4611,10 @@ static const struct ib_device_ops mlx5_ib_dev_xrc_ops = {
INIT_RDMA_OBJ_SIZE(ib_xrcd, mlx5_ib_xrcd, ibxrcd),
};
-static int mlx5_ib_init_var_table(struct mlx5_ib_dev *dev)
+static int mlx5_ib_init_var_region(struct mlx5_ib_dev *dev)
{
+ struct mlx5_var_region *var_region = &dev->var_table.var_region;
struct mlx5_core_dev *mdev = dev->mdev;
- struct mlx5_var_table *var_table = &dev->var_table;
u8 log_doorbell_bar_size;
u8 log_doorbell_stride;
u64 bar_size;
@@ -4619,17 +4623,17 @@ static int mlx5_ib_init_var_table(struct mlx5_ib_dev *dev)
log_doorbell_bar_size);
log_doorbell_stride = MLX5_CAP_DEV_VDPA_EMULATION(mdev,
log_doorbell_stride);
- var_table->hw_start_addr = dev->mdev->bar_addr +
+ var_region->hw_start_addr = dev->mdev->bar_addr +
MLX5_CAP64_DEV_VDPA_EMULATION(mdev,
doorbell_bar_offset);
bar_size = (1ULL << log_doorbell_bar_size) * 4096;
- var_table->stride_size = 1ULL << log_doorbell_stride;
- var_table->num_var_hw_entries = div_u64(bar_size,
- var_table->stride_size);
- mutex_init(&var_table->bitmap_lock);
- var_table->bitmap = bitmap_zalloc(var_table->num_var_hw_entries,
- GFP_KERNEL);
- return (var_table->bitmap) ? 0 : -ENOMEM;
+ var_region->stride_size = 1ULL << log_doorbell_stride;
+ var_region->num_var_hw_entries = div_u64(bar_size,
+ var_region->stride_size);
+ mutex_init(&var_region->bitmap_lock);
+ var_region->bitmap = bitmap_zalloc(var_region->num_var_hw_entries,
+ GFP_KERNEL);
+ return (var_region->bitmap) ? 0 : -ENOMEM;
}
static void mlx5_ib_cleanup_ucaps(struct mlx5_ib_dev *dev)
@@ -4673,7 +4677,7 @@ static void mlx5_ib_stage_caps_cleanup(struct mlx5_ib_dev *dev)
MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL)
mlx5_ib_cleanup_ucaps(dev);
- bitmap_free(dev->var_table.bitmap);
+ bitmap_free(dev->var_table.var_region.bitmap);
}
static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
@@ -4721,7 +4725,7 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
if (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) &
MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
- err = mlx5_ib_init_var_table(dev);
+ err = mlx5_ib_init_var_region(dev);
if (err)
return err;
}
@@ -4738,7 +4742,7 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
return 0;
err_ucaps:
- bitmap_free(dev->var_table.bitmap);
+ bitmap_free(dev->var_table.var_region.bitmap);
return err;
}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2556e326afde..3d0ae52c68a7 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1132,7 +1132,7 @@ struct mlx5_devx_event_table {
struct xarray event_xa;
};
-struct mlx5_var_table {
+struct mlx5_var_region {
/* serialize updating the bitmap */
struct mutex bitmap_lock;
unsigned long *bitmap;
@@ -1141,6 +1141,10 @@ struct mlx5_var_table {
u64 num_var_hw_entries;
};
+struct mlx5_var_table {
+ struct mlx5_var_region var_region;
+};
+
struct mlx5_port_caps {
bool has_smi;
u8 ext_port_cap;
--
2.53.0
* [PATCH rdma-next 4/6] RDMA/mlx5: Add TLP VAR region support and infrastructure
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
Add support for TLP (Transaction Layer Packet) VAR regions used by
software-defined device emulation. TLP VAR provides dedicated response
gateways for sending TLP responses back to the host in TLP emulation
scenarios.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 57 ++++++++++++++++++++++++++++++++----
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 ++
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 835fe2a95ad6..424426a2cd76 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2518,6 +2518,15 @@ mlx5_ib_pgoff_to_mmap_entry(struct ib_ucontext *ucontext, off_t pg_off)
return rdma_user_mmap_entry_get_pgoff(ucontext, entry_pgoff);
}
+static void mlx5_ib_free_var_mmap_entry(struct mlx5_user_mmap_entry *mentry,
+ struct mlx5_var_region *var_region)
+{
+ mutex_lock(&var_region->bitmap_lock);
+ clear_bit(mentry->page_idx, var_region->bitmap);
+ mutex_unlock(&var_region->bitmap_lock);
+ kfree(mentry);
+}
+
static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
{
struct mlx5_user_mmap_entry *mentry = to_mmmap(entry);
@@ -2533,10 +2542,11 @@ static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
break;
case MLX5_IB_MMAP_TYPE_VAR:
var_region = &var_table->var_region;
- mutex_lock(&var_region->bitmap_lock);
- clear_bit(mentry->page_idx, var_region->bitmap);
- mutex_unlock(&var_region->bitmap_lock);
- kfree(mentry);
+ mlx5_ib_free_var_mmap_entry(mentry, var_region);
+ break;
+ case MLX5_IB_MMAP_TYPE_TLP_VAR:
+ var_region = &var_table->tlp_var_region;
+ mlx5_ib_free_var_mmap_entry(mentry, var_region);
break;
case MLX5_IB_MMAP_TYPE_UAR_WC:
case MLX5_IB_MMAP_TYPE_UAR_NC:
@@ -2687,6 +2697,7 @@ static int mlx5_ib_mmap_offset(struct mlx5_ib_dev *dev,
mentry = to_mmmap(entry);
pfn = (mentry->address >> PAGE_SHIFT);
if (mentry->mmap_flag == MLX5_IB_MMAP_TYPE_VAR ||
+ mentry->mmap_flag == MLX5_IB_MMAP_TYPE_TLP_VAR ||
mentry->mmap_flag == MLX5_IB_MMAP_TYPE_UAR_NC)
prot = pgprot_noncached(vma->vm_page_prot);
else
@@ -4636,6 +4647,28 @@ static int mlx5_ib_init_var_region(struct mlx5_ib_dev *dev)
return (var_region->bitmap) ? 0 : -ENOMEM;
}
+static int mlx5_ib_init_tlp_var_region(struct mlx5_ib_dev *dev)
+{
+ struct mlx5_var_region *var_region = &dev->var_table.tlp_var_region;
+ struct mlx5_core_dev *mdev = dev->mdev;
+ u8 log_tlp_var_stride;
+
+ log_tlp_var_stride =
+ MLX5_CAP_DEV_TLP_EMULATION(mdev, log_tlp_rsp_gw_page_stride);
+ var_region->hw_start_addr =
+ dev->mdev->bar_addr +
+ MLX5_CAP64_DEV_TLP_EMULATION(mdev, tlp_rsp_gw_pages_bar_offset);
+
+ var_region->stride_size = (1ULL << log_tlp_var_stride) * 4096;
+ var_region->num_var_hw_entries =
+ MLX5_CAP_DEV_TLP_EMULATION(mdev, tlp_rsp_gw_num_pages);
+
+ mutex_init(&var_region->bitmap_lock);
+ var_region->bitmap = bitmap_zalloc(var_region->num_var_hw_entries,
+ GFP_KERNEL);
+ return (var_region->bitmap) ? 0 : -ENOMEM;
+}
+
static void mlx5_ib_cleanup_ucaps(struct mlx5_ib_dev *dev)
{
if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RDMA_CTRL)
@@ -4671,13 +4704,19 @@ static int mlx5_ib_init_ucaps(struct mlx5_ib_dev *dev)
return ret;
}
+static void mlx5_ib_cleanup_var_table(struct mlx5_ib_dev *dev)
+{
+ bitmap_free(dev->var_table.var_region.bitmap);
+ bitmap_free(dev->var_table.tlp_var_region.bitmap);
+}
+
static void mlx5_ib_stage_caps_cleanup(struct mlx5_ib_dev *dev)
{
if (MLX5_CAP_GEN_2_64(dev->mdev, general_obj_types_127_64) &
MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL)
mlx5_ib_cleanup_ucaps(dev);
- bitmap_free(dev->var_table.var_region.bitmap);
+ mlx5_ib_cleanup_var_table(dev);
}
static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
@@ -4737,10 +4776,18 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
goto err_ucaps;
}
+ if (MLX5_CAP_GEN(dev->mdev, tlp_device_emulation_manager)) {
+ err = mlx5_ib_init_tlp_var_region(dev);
+ if (err)
+ goto err_tlp_var;
+ }
+
dev->ib_dev.use_cq_dim = true;
return 0;
+err_tlp_var:
+ mlx5_ib_cleanup_ucaps(dev);
err_ucaps:
bitmap_free(dev->var_table.var_region.bitmap);
return err;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 3d0ae52c68a7..5f789291be93 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -162,6 +162,7 @@ enum mlx5_ib_mmap_type {
MLX5_IB_MMAP_TYPE_UAR_WC = 3,
MLX5_IB_MMAP_TYPE_UAR_NC = 4,
MLX5_IB_MMAP_TYPE_MEMIC_OP = 5,
+ MLX5_IB_MMAP_TYPE_TLP_VAR = 6,
};
struct mlx5_bfreg_info {
@@ -1143,6 +1144,7 @@ struct mlx5_var_region {
struct mlx5_var_table {
struct mlx5_var_region var_region;
+ struct mlx5_var_region tlp_var_region;
};
struct mlx5_port_caps {
--
2.53.0
* [PATCH rdma-next 5/6] RDMA/mlx5: Add support for TLP VAR allocation
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
Extend the VAR allocation UAPI to accept an optional flags attribute,
allowing userspace to request TLP VAR allocation via the new
MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP flag.
When the flag is set, the driver allocates from the TLP VAR region
instead of the regular VirtIO VAR region.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 40 ++++++++++++++++++++++++++-----
include/uapi/rdma/mlx5_user_ioctl_cmds.h | 1 +
include/uapi/rdma/mlx5_user_ioctl_verbs.h | 4 ++++
3 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 424426a2cd76..77cd11c6cca9 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4153,7 +4153,7 @@ static int mlx5_rdma_user_mmap_entry_insert(struct mlx5_ib_ucontext *c,
}
static struct mlx5_user_mmap_entry *
-alloc_var_entry(struct mlx5_ib_ucontext *c)
+alloc_var_entry(struct mlx5_ib_ucontext *c, u32 flags)
{
struct mlx5_user_mmap_entry *entry;
struct mlx5_var_region *var_region;
@@ -4162,7 +4162,11 @@ alloc_var_entry(struct mlx5_ib_ucontext *c)
int err;
var_table = &to_mdev(c->ibucontext.device)->var_table;
- var_region = &var_table->var_region;
+ if (flags & MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP)
+ var_region = &var_table->tlp_var_region;
+ else
+ var_region = &var_table->var_region;
+
entry = kzalloc_obj(*entry);
if (!entry)
return ERR_PTR(-ENOMEM);
@@ -4182,7 +4186,9 @@ alloc_var_entry(struct mlx5_ib_ucontext *c)
entry->address = var_region->hw_start_addr +
(page_idx * var_region->stride_size);
entry->page_idx = page_idx;
- entry->mmap_flag = MLX5_IB_MMAP_TYPE_VAR;
+ entry->mmap_flag = flags & MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP ?
+ MLX5_IB_MMAP_TYPE_TLP_VAR :
+ MLX5_IB_MMAP_TYPE_VAR;
err = mlx5_rdma_user_mmap_entry_insert(c, entry,
var_region->stride_size);
@@ -4205,9 +4211,10 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_VAR_OBJ_ALLOC)(
{
struct ib_uobject *uobj = uverbs_attr_get_uobject(
attrs, MLX5_IB_ATTR_VAR_OBJ_ALLOC_HANDLE);
- struct mlx5_ib_ucontext *c;
struct mlx5_user_mmap_entry *entry;
+ struct mlx5_ib_ucontext *c;
u64 mmap_offset;
+ u32 flags = 0;
u32 length;
int err;
@@ -4215,7 +4222,24 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_VAR_OBJ_ALLOC)(
if (IS_ERR(c))
return PTR_ERR(c);
- entry = alloc_var_entry(c);
+ err = uverbs_get_flags32(&flags, attrs,
+ MLX5_IB_ATTR_VAR_OBJ_ALLOC_FLAGS,
+ MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP);
+ if (err)
+ return err;
+
+ if (flags & MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP) {
+ if (!MLX5_CAP_GEN(to_mdev(c->ibucontext.device)->mdev,
+ tlp_device_emulation_manager))
+ return -EOPNOTSUPP;
+ } else {
+ if (!(MLX5_CAP_GEN_64(to_mdev(c->ibucontext.device)->mdev,
+ general_obj_types) &
+ MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q))
+ return -EOPNOTSUPP;
+ }
+
+ entry = alloc_var_entry(c, flags);
if (IS_ERR(entry))
return PTR_ERR(entry);
@@ -4245,6 +4269,9 @@ DECLARE_UVERBS_NAMED_METHOD(
MLX5_IB_OBJECT_VAR,
UVERBS_ACCESS_NEW,
UA_MANDATORY),
+ UVERBS_ATTR_FLAGS_IN(MLX5_IB_ATTR_VAR_OBJ_ALLOC_FLAGS,
+ enum mlx5_ib_uapi_var_alloc_flags,
+ UA_OPTIONAL),
UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_VAR_OBJ_ALLOC_PAGE_ID,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
@@ -4272,7 +4299,8 @@ static bool var_is_supported(struct ib_device *device)
struct mlx5_ib_dev *dev = to_mdev(device);
return (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) &
- MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q);
+ MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) ||
+ MLX5_CAP_GEN(dev->mdev, tlp_device_emulation_manager);
}
static struct mlx5_user_mmap_entry *
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index 18f9fe070213..01a2a050e468 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -139,6 +139,7 @@ enum mlx5_ib_var_alloc_attrs {
MLX5_IB_ATTR_VAR_OBJ_ALLOC_MMAP_OFFSET,
MLX5_IB_ATTR_VAR_OBJ_ALLOC_MMAP_LENGTH,
MLX5_IB_ATTR_VAR_OBJ_ALLOC_PAGE_ID,
+ MLX5_IB_ATTR_VAR_OBJ_ALLOC_FLAGS,
};
enum mlx5_ib_var_obj_destroy_attrs {
diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
index 8f86e79d78a5..ef295b38a1cf 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_verbs.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
@@ -100,6 +100,10 @@ enum mlx5_ib_uapi_query_port_flags {
MLX5_IB_UAPI_QUERY_PORT_ESW_OWNER_VHCA_ID = 1 << 5,
};
+enum mlx5_ib_uapi_var_alloc_flags {
+ MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP = 1 << 0,
+};
+
struct mlx5_ib_uapi_reg {
__u32 value;
__u32 mask;
--
2.53.0
* [PATCH rdma-next 6/6] RDMA/mlx5: Add VAR object query method for cross-process sharing
From: Leon Romanovsky @ 2026-02-25 14:19 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
Introduce MLX5_IB_METHOD_VAR_OBJ_QUERY to enable cross-process sharing
of VAR objects. This method allows a process that has imported the uverbs
fd via ibv_import_device() to query an existing VAR handle and obtain
the mmap parameters needed to map the VAR region into its address space.
This follows the same pattern as UVERBS_METHOD_QUERY_MR, allowing
userspace to implement mlx5dv_import_var() analogous to ibv_import_mr().
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 47 +++++++++++++++++++++++++++++++-
include/uapi/rdma/mlx5_user_ioctl_cmds.h | 8 ++++++
2 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 77cd11c6cca9..a75769f55d31 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4263,6 +4263,34 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_VAR_OBJ_ALLOC)(
return err;
}
+static int UVERBS_HANDLER(MLX5_IB_METHOD_VAR_OBJ_QUERY)(
+ struct uverbs_attr_bundle *attrs)
+{
+ struct mlx5_user_mmap_entry *entry =
+ uverbs_attr_get_obj(attrs, MLX5_IB_ATTR_VAR_OBJ_QUERY_HANDLE);
+ u64 mmap_offset;
+ u32 length;
+ int err;
+
+ mmap_offset = mlx5_entry_to_mmap_offset(entry);
+ if (check_mul_overflow(entry->rdma_entry.npages, (u32)PAGE_SIZE, &length))
+ return -EOVERFLOW;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_VAR_OBJ_QUERY_MMAP_OFFSET,
+ &mmap_offset, sizeof(mmap_offset));
+ if (err)
+ return err;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_VAR_OBJ_QUERY_PAGE_ID,
+ &entry->page_idx, sizeof(entry->page_idx));
+ if (err)
+ return err;
+
+ err = uverbs_copy_to(attrs, MLX5_IB_ATTR_VAR_OBJ_QUERY_MMAP_LENGTH,
+ &length, sizeof(length));
+ return err;
+}
+
DECLARE_UVERBS_NAMED_METHOD(
MLX5_IB_METHOD_VAR_OBJ_ALLOC,
UVERBS_ATTR_IDR(MLX5_IB_ATTR_VAR_OBJ_ALLOC_HANDLE,
@@ -4289,10 +4317,27 @@ DECLARE_UVERBS_NAMED_METHOD_DESTROY(
UVERBS_ACCESS_DESTROY,
UA_MANDATORY));
+DECLARE_UVERBS_NAMED_METHOD(
+ MLX5_IB_METHOD_VAR_OBJ_QUERY,
+ UVERBS_ATTR_IDR(MLX5_IB_ATTR_VAR_OBJ_QUERY_HANDLE,
+ MLX5_IB_OBJECT_VAR,
+ UVERBS_ACCESS_READ,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_VAR_OBJ_QUERY_PAGE_ID,
+ UVERBS_ATTR_TYPE(u32),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_VAR_OBJ_QUERY_MMAP_LENGTH,
+ UVERBS_ATTR_TYPE(u32),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_VAR_OBJ_QUERY_MMAP_OFFSET,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY));
+
DECLARE_UVERBS_NAMED_OBJECT(MLX5_IB_OBJECT_VAR,
UVERBS_TYPE_ALLOC_IDR(mmap_obj_cleanup),
&UVERBS_METHOD(MLX5_IB_METHOD_VAR_OBJ_ALLOC),
- &UVERBS_METHOD(MLX5_IB_METHOD_VAR_OBJ_DESTROY));
+ &UVERBS_METHOD(MLX5_IB_METHOD_VAR_OBJ_DESTROY),
+ &UVERBS_METHOD(MLX5_IB_METHOD_VAR_OBJ_QUERY));
static bool var_is_supported(struct ib_device *device)
{
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index 01a2a050e468..7db03b3fdae6 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -146,9 +146,17 @@ enum mlx5_ib_var_obj_destroy_attrs {
MLX5_IB_ATTR_VAR_OBJ_DESTROY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
};
+enum mlx5_ib_var_obj_query_attrs {
+ MLX5_IB_ATTR_VAR_OBJ_QUERY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+ MLX5_IB_ATTR_VAR_OBJ_QUERY_PAGE_ID,
+ MLX5_IB_ATTR_VAR_OBJ_QUERY_MMAP_OFFSET,
+ MLX5_IB_ATTR_VAR_OBJ_QUERY_MMAP_LENGTH,
+};
+
enum mlx5_ib_var_obj_methods {
MLX5_IB_METHOD_VAR_OBJ_ALLOC = (1U << UVERBS_ID_NS_SHIFT),
MLX5_IB_METHOD_VAR_OBJ_DESTROY,
+ MLX5_IB_METHOD_VAR_OBJ_QUERY,
};
enum mlx5_ib_uar_alloc_attrs {
--
2.53.0
* Re: [PATCH rdma-next 0/6] Add support for TLP emulation
From: Leon Romanovsky @ 2026-02-25 14:48 UTC (permalink / raw)
To: Saeed Mahameed, Tariq Toukan, Mark Bloch, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jason Gunthorpe
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
>
> Thanks
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Maher Sanalla (6):
> net/mlx5: Add TLP emulation device capabilities
> net/mlx5: Expose TLP emulation capabilities
> RDMA/mlx5: Refactor VAR table to use region abstraction
> RDMA/mlx5: Add TLP VAR region support and infrastructure
> RDMA/mlx5: Add support for TLP VAR allocation
> RDMA/mlx5: Add VAR object query method for cross-process sharing
There is no need for this last patch; the same functionality can be
implemented purely in userspace.
Thanks
* Re: [PATCH rdma-next 0/6] Add support for TLP emulation
2026-02-25 14:19 [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
` (6 preceding siblings ...)
2026-02-25 14:48 ` [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
@ 2026-02-27 1:34 ` Jakub Kicinski
2026-03-02 14:06 ` Leon Romanovsky
2026-02-27 21:37 ` Keith Busch
` (2 subsequent siblings)
10 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2026-02-27 1:34 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Jason Gunthorpe,
linux-rdma, netdev, linux-kernel, Maher Sanalla
On Wed, 25 Feb 2026 16:19:30 +0200 Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
Why is this an RDMA thing if it's a PCIe feature intended for VirtIO?
* Re: [PATCH rdma-next 0/6] Add support for TLP emulation
2026-02-25 14:19 [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
` (7 preceding siblings ...)
2026-02-27 1:34 ` Jakub Kicinski
@ 2026-02-27 21:37 ` Keith Busch
2026-03-02 14:04 ` Jason Gunthorpe
2026-03-05 10:34 ` (subset) " Leon Romanovsky
2026-03-05 10:44 ` Leon Romanovsky
10 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2026-02-27 21:37 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jason Gunthorpe, linux-rdma, netdev, linux-kernel, Maher Sanalla
On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
Sorry if this is obvious to people in the know, but could you possibly
give a quick high level description of the use case behind this feature?
I'm just curious what emulation needs are enabled by having access to
this packet level. Thanks!
* Re: [PATCH rdma-next 0/6] Add support for TLP emulation
2026-02-27 21:37 ` Keith Busch
@ 2026-03-02 14:04 ` Jason Gunthorpe
0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2026-03-02 14:04 UTC (permalink / raw)
To: Keith Busch
Cc: Leon Romanovsky, Saeed Mahameed, Tariq Toukan, Mark Bloch,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, linux-rdma, netdev, linux-kernel, Maher Sanalla,
Max Gurtovoy
On Fri, Feb 27, 2026 at 02:37:05PM -0700, Keith Busch wrote:
> On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> > This series adds support for Transaction Layer Packet (TLP) emulation
> > response gateway regions, enabling userspace device emulation software
> > to write TLP responses directly to lower layers without kernel driver
> > involvement.
> >
> > Currently, the mlx5 driver exposes VirtIO emulation access regions via
> > the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> > ioctl to also support allocating TLP response gateway channels for
> > PCI device emulation use cases.
>
> Sorry if this is obvious to people in the know, but could you possibly
> give a quick high level description of the use case behind this feature?
> I'm just curious what emulation needs are enabled by having access to
> this packet level. Thanks!
These days the DPU world supports what I think of as "software defined
PCI functions". Meaning when the DPU receives a PCIe TLP on its PCI
interface it may invoke software to generate a response packet for that
TLP.
At least the Mellanox DPU can route the TLPs to software in many
different places: various on-device processors, or the ARM cores
running Linux.
So, for example, using this basic capability you can write some
software to have the DPU create a PCI function that conforms to the
virtio-net specification. Or NVMe. Or whatever else you dream up.
The peculiar thing is that this is all tightly coupled to RDMA. E.g. if
you want your TLP to trigger a DMA from the PCI function then RDMA QPs
and MRs have to be used to execute the DMA.
Jason
* Re: [PATCH rdma-next 0/6] Add support for TLP emulation
2026-02-27 1:34 ` Jakub Kicinski
@ 2026-03-02 14:06 ` Leon Romanovsky
0 siblings, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-02 14:06 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Jason Gunthorpe,
linux-rdma, netdev, linux-kernel, Maher Sanalla
On Thu, Feb 26, 2026 at 05:34:34PM -0800, Jakub Kicinski wrote:
> On Wed, 25 Feb 2026 16:19:30 +0200 Leon Romanovsky wrote:
> > This series adds support for Transaction Layer Packet (TLP) emulation
> > response gateway regions, enabling userspace device emulation software
> > to write TLP responses directly to lower layers without kernel driver
> > involvement.
> >
> > Currently, the mlx5 driver exposes VirtIO emulation access regions via
> > the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> > ioctl to also support allocating TLP response gateway channels for
> > PCI device emulation use cases.
>
> Why is this an RDMA thing if it's a PCIe feature indented for VirtIO?
This is the result of a long path of evolution.
Early on, we had VDPA emulation implemented entirely within the RDMA
stack. The idea was to build something similar to a tun/tap pair, where
a native RDMA QP could be connected to RDMA QPs carrying WQEs formatted
in the VirtIO layout. With some QEMU-side handling, this produced a
virtio-net device.
Later, this model was adapted for a DPU configuration. In that setup,
the DPU's RDMA block held the native QPs, while the x86 host exposed the
VirtIO-formatted QPs, still with QEMU involved. The DPU controlled the
x86-side "tun/tap" through RDMA-linked operations on the associated
objects.
Next, the DPU evolved to instantiate a full VirtIO PCI function on its
own, removing the need for x86 to run QEMU. The DPU continued to manage
the tun/tap via RDMA operations, with some extensions to cover PCI-
related details.
Eventually, the DPU gained general-purpose programmable co-processors
capable of executing various RDMA and non-RDMA operations. As a result,
the RDMA subsystem also became responsible for loading programs onto
these co-processors and managing them within RDMA context and PD
security constraints.
Now we have reached a stage where these co-processors can manage a much
larger portion of the PCI-side behavior, including delegating some
responsibilities back to the host CPU. This produces an odd situation
where a privileged RDMA user can:
- Claim an "emulation" PCI function
- Load a co-processor program associated with that PCI function
- Use RDMA-mediated queues and security controls to interact with the
co-processor program
- Use the co-processor and related mechanisms to capture and respond to
TLPs directed to that PCI function
There are many tightly coupled components in this design, but the TLP
handling cannot be separated from the RDMA-related logic that enables
it.
Thanks
* Re: (subset) [PATCH rdma-next 0/6] Add support for TLP emulation
2026-02-25 14:19 [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
` (8 preceding siblings ...)
2026-02-27 21:37 ` Keith Busch
@ 2026-03-05 10:34 ` Leon Romanovsky
2026-03-05 10:44 ` Leon Romanovsky
10 siblings, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-05 10:34 UTC (permalink / raw)
To: Saeed Mahameed, Tariq Toukan, Mark Bloch, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jason Gunthorpe, Leon Romanovsky
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
On Wed, 25 Feb 2026 16:19:30 +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
>
> [...]
Applied, thanks!
[1/6] net/mlx5: Add TLP emulation device capabilities
(no commit info)
[2/6] net/mlx5: Expose TLP emulation capabilities
(no commit info)
Best regards,
--
Leon Romanovsky <leon@kernel.org>
* Re: (subset) [PATCH rdma-next 0/6] Add support for TLP emulation
2026-02-25 14:19 [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
` (9 preceding siblings ...)
2026-03-05 10:34 ` (subset) " Leon Romanovsky
@ 2026-03-05 10:44 ` Leon Romanovsky
10 siblings, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2026-03-05 10:44 UTC (permalink / raw)
To: Saeed Mahameed, Tariq Toukan, Mark Bloch, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jason Gunthorpe, Leon Romanovsky
Cc: linux-rdma, netdev, linux-kernel, Maher Sanalla
On Wed, 25 Feb 2026 16:19:30 +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
>
> [...]
Applied, thanks!
[3/6] RDMA/mlx5: Refactor VAR table to use region abstraction
(no commit info)
[4/6] RDMA/mlx5: Add TLP VAR region support and infrastructure
(no commit info)
[5/6] RDMA/mlx5: Add support for TLP VAR allocation
(no commit info)
[6/6] RDMA/mlx5: Add VAR object query method for cross-process sharing
(no commit info)
Best regards,
--
Leon Romanovsky <leon@kernel.org>
end of thread
Thread overview: 14+ messages
2026-02-25 14:19 [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
2026-02-25 14:19 ` [PATCH mlx5-next 1/6] net/mlx5: Add TLP emulation device capabilities Leon Romanovsky
2026-02-25 14:19 ` [PATCH mlx5-next 2/6] net/mlx5: Expose TLP emulation capabilities Leon Romanovsky
2026-02-25 14:19 ` [PATCH rdma-next 3/6] RDMA/mlx5: Refactor VAR table to use region abstraction Leon Romanovsky
2026-02-25 14:19 ` [PATCH rdma-next 4/6] RDMA/mlx5: Add TLP VAR region support and infrastructure Leon Romanovsky
2026-02-25 14:19 ` [PATCH rdma-next 5/6] RDMA/mlx5: Add support for TLP VAR allocation Leon Romanovsky
2026-02-25 14:19 ` [PATCH rdma-next 6/6] RDMA/mlx5: Add VAR object query method for cross-process sharing Leon Romanovsky
2026-02-25 14:48 ` [PATCH rdma-next 0/6] Add support for TLP emulation Leon Romanovsky
2026-02-27 1:34 ` Jakub Kicinski
2026-03-02 14:06 ` Leon Romanovsky
2026-02-27 21:37 ` Keith Busch
2026-03-02 14:04 ` Jason Gunthorpe
2026-03-05 10:34 ` (subset) " Leon Romanovsky
2026-03-05 10:44 ` Leon Romanovsky