* [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
@ 2024-09-03 11:37 Leon Romanovsky
2024-09-03 11:37 ` [PATCH mlx5-next 1/2] net/mlx5: Introduce data placement ordering bits Leon Romanovsky
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-09-03 11:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Edward Srouji, linux-kernel, linux-rdma, netdev,
Saeed Mahameed, Tariq Toukan, Yishai Hadas
From: Leon Romanovsky <leonro@nvidia.com>
Hi,
This series from Edward introduces mlx5 data direct placement (DDP)
feature.
This feature allows WRs on the receiver side of the QP to be consumed
out of order, permitting the sender side to transmit messages without
guaranteeing arrival order on the receiver side.
When enabled, the completion ordering of WRs remains in-order,
regardless of the Receive WRs consumption order.
RDMA Read and RDMA Atomic operations on the responder side continue to
be executed in-order, while the ordering of data placement for RDMA
Write and Send operations is not guaranteed.
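For the user-facing side, a rough sketch in C of how an application could
opt in through the bits this series adds to include/uapi/rdma/mlx5-abi.h.
The helper below is purely illustrative (its name and the surrounding
plumbing are assumptions, not part of this series); real applications
would normally go through rdma-core's mlx5 provider rather than filling
the driver-private command by hand:

#include <rdma/mlx5-abi.h>

/*
 * Illustrative sketch only: request OOO RX WQE consumption on a QP via
 * the driver-private part of modify_qp.  The kernel records the request
 * and programs dp_ordering_1/dp_ordering_force on the INIT->RTR
 * transition; it returns -EOPNOTSUPP if the device does not expose the
 * dp_ordering_* capabilities for this QP type.
 */
static void request_ooo_dp(struct mlx5_ib_modify_qp *ucmd)
{
        ucmd->comp_mask |= MLX5_IB_MODIFY_QP_OOO_DP;
}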
Thanks
Edward Srouji (2):
net/mlx5: Introduce data placement ordering bits
RDMA/mlx5: Support OOO RX WQE consumption
drivers/infiniband/hw/mlx5/main.c | 8 +++++
drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
drivers/infiniband/hw/mlx5/qp.c | 51 +++++++++++++++++++++++++---
include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
include/uapi/rdma/mlx5-abi.h | 5 +++
5 files changed, 78 insertions(+), 11 deletions(-)
--
2.46.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH mlx5-next 1/2] net/mlx5: Introduce data placement ordering bits
2024-09-03 11:37 [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Leon Romanovsky
@ 2024-09-03 11:37 ` Leon Romanovsky
2024-09-03 11:37 ` [PATCH rdma-next 2/2] RDMA/mlx5: Support OOO RX WQE consumption Leon Romanovsky
` (3 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-09-03 11:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Edward Srouji, linux-rdma, netdev, Saeed Mahameed, Tariq Toukan,
Yishai Hadas
From: Edward Srouji <edwards@nvidia.com>
Introduce out-of-order (OOO) data placement (DP) IFC related bits to
support OOO DP QP.
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 970c9d8473ef..691a285f9c1e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1765,7 +1765,12 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_at_328[0x2];
u8 relaxed_ordering_read[0x1];
u8 log_max_pd[0x5];
- u8 reserved_at_330[0x6];
+ u8 dp_ordering_ooo_all_ud[0x1];
+ u8 dp_ordering_ooo_all_uc[0x1];
+ u8 dp_ordering_ooo_all_xrc[0x1];
+ u8 dp_ordering_ooo_all_dc[0x1];
+ u8 dp_ordering_ooo_all_rc[0x1];
+ u8 reserved_at_335[0x1];
u8 pci_sync_for_fw_update_with_driver_unload[0x1];
u8 vnic_env_cnt_steering_fail[0x1];
u8 vport_counter_local_loopback[0x1];
@@ -1986,7 +1991,9 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
u8 reserved_at_0[0x80];
u8 migratable[0x1];
- u8 reserved_at_81[0x11];
+ u8 reserved_at_81[0x7];
+ u8 dp_ordering_force[0x1];
+ u8 reserved_at_89[0x9];
u8 query_vuid[0x1];
u8 reserved_at_93[0xd];
@@ -3397,7 +3404,8 @@ struct mlx5_ifc_qpc_bits {
u8 latency_sensitive[0x1];
u8 reserved_at_24[0x1];
u8 drain_sigerr[0x1];
- u8 reserved_at_26[0x2];
+ u8 reserved_at_26[0x1];
+ u8 dp_ordering_force[0x1];
u8 pd[0x18];
u8 mtu[0x3];
@@ -3470,7 +3478,8 @@ struct mlx5_ifc_qpc_bits {
u8 rae[0x1];
u8 reserved_at_493[0x1];
u8 page_offset[0x6];
- u8 reserved_at_49a[0x3];
+ u8 reserved_at_49a[0x2];
+ u8 dp_ordering_1[0x1];
u8 cd_slave_receive[0x1];
u8 cd_slave_send[0x1];
u8 cd_master[0x1];
@@ -4377,7 +4386,8 @@ struct mlx5_ifc_dctc_bits {
u8 state[0x4];
u8 reserved_at_8[0x18];
- u8 reserved_at_20[0x8];
+ u8 reserved_at_20[0x7];
+ u8 dp_ordering_force[0x1];
u8 user_index[0x18];
u8 reserved_at_40[0x8];
@@ -4392,7 +4402,9 @@ struct mlx5_ifc_dctc_bits {
u8 latency_sensitive[0x1];
u8 rlky[0x1];
u8 free_ar[0x1];
- u8 reserved_at_73[0xd];
+ u8 reserved_at_73[0x1];
+ u8 dp_ordering_1[0x1];
+ u8 reserved_at_75[0xb];
u8 reserved_at_80[0x8];
u8 cs_res[0x8];
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH rdma-next 2/2] RDMA/mlx5: Support OOO RX WQE consumption
2024-09-03 11:37 [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Leon Romanovsky
2024-09-03 11:37 ` [PATCH mlx5-next 1/2] net/mlx5: Introduce data placement ordering bits Leon Romanovsky
@ 2024-09-03 11:37 ` Leon Romanovsky
2024-09-04 6:02 ` [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Zhu Yanjun
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-09-03 11:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Edward Srouji, linux-rdma, netdev, Saeed Mahameed, Tariq Toukan,
Yishai Hadas
From: Edward Srouji <edwards@nvidia.com>
Support QP with out-of-order (OOO) capabilities enabled.
This allows WRs on the receiver side of the QP to be consumed OOO,
permitting the sender side to transmit messages without guaranteeing
arrival order on the receiver side.
When enabled, the completion ordering of WRs remains in-order,
regardless of the Receive WRs consumption order.
RDMA Read and RDMA Atomic operations on the responder side continue to
be executed in-order, while the ordering of data placement for RDMA
Write and Send operations is not guaranteed.
Atomic operations larger than 8 bytes are currently not supported.
Therefore, when this feature is enabled, the created QP restricts its
atomic support to 8 bytes at most.
In addition, when querying the device, a new flag is returned in the
response to indicate that the kernel supports OOO QPs.
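For illustration only, a hedged sketch of how userspace could test this
flag; the helper is an assumption for the example (not an existing
rdma-core API), while the structure and flag come from
include/uapi/rdma/mlx5-abi.h:

static bool kernel_supports_ooo_dp(const struct mlx5_ib_query_device_resp *resp)
{
        /* Set by mlx5_ib_query_device() only when dp_ordering_force and
         * at least one dp_ordering_ooo_all_* capability are present.
         */
        return resp->flags & MLX5_IB_QUERY_DEV_RESP_FLAGS_OOO_DP;
}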
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 8 +++++
drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
drivers/infiniband/hw/mlx5/qp.c | 51 +++++++++++++++++++++++++---
include/uapi/rdma/mlx5-abi.h | 5 +++
4 files changed, 60 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b85ad3c0bfa1..6cefefd2b578 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1154,6 +1154,14 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
MLX5_IB_QUERY_DEV_RESP_PACKET_BASED_CREDIT_MODE;
resp.flags |= MLX5_IB_QUERY_DEV_RESP_FLAGS_SCAT2CQE_DCT;
+
+ if (MLX5_CAP_GEN_2(mdev, dp_ordering_force) &&
+ (MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_xrc) ||
+ MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_dc) ||
+ MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_rc) ||
+ MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_ud) ||
+ MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_uc)))
+ resp.flags |= MLX5_IB_QUERY_DEV_RESP_FLAGS_OOO_DP;
}
if (offsetofend(typeof(resp), sw_parsing_caps) <= uhw_outlen) {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 5505eb70939b..926a965e4570 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -532,6 +532,7 @@ struct mlx5_ib_qp {
struct mlx5_bf bf;
u8 has_rq:1;
u8 is_rss:1;
+ u8 is_ooo_rq:1;
/* only for user space QPs. For kernel
* we have it from the bf object
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e39b1a101e97..837b662b41de 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1960,7 +1960,7 @@ static int atomic_size_to_mode(int size_mask)
}
static int get_atomic_mode(struct mlx5_ib_dev *dev,
- enum ib_qp_type qp_type)
+ struct mlx5_ib_qp *qp)
{
u8 atomic_operations = MLX5_CAP_ATOMIC(dev->mdev, atomic_operations);
u8 atomic = MLX5_CAP_GEN(dev->mdev, atomic);
@@ -1970,7 +1970,7 @@ static int get_atomic_mode(struct mlx5_ib_dev *dev,
if (!atomic)
return -EOPNOTSUPP;
- if (qp_type == MLX5_IB_QPT_DCT)
+ if (qp->type == MLX5_IB_QPT_DCT)
atomic_size_mask = MLX5_CAP_ATOMIC(dev->mdev, atomic_size_dc);
else
atomic_size_mask = MLX5_CAP_ATOMIC(dev->mdev, atomic_size_qp);
@@ -1984,6 +1984,10 @@ static int get_atomic_mode(struct mlx5_ib_dev *dev,
atomic_operations & MLX5_ATOMIC_OPS_FETCH_ADD))
atomic_mode = MLX5_ATOMIC_MODE_IB_COMP;
+ /* OOO DP QPs do not support larger than 8-Bytes atomic operations */
+ if (atomic_mode > MLX5_ATOMIC_MODE_8B && qp->is_ooo_rq)
+ atomic_mode = MLX5_ATOMIC_MODE_8B;
+
return atomic_mode;
}
@@ -2839,6 +2843,29 @@ static int check_valid_flow(struct mlx5_ib_dev *dev, struct ib_pd *pd,
return 0;
}
+static bool get_dp_ooo_cap(struct mlx5_core_dev *mdev, enum ib_qp_type qp_type)
+{
+ if (!MLX5_CAP_GEN_2(mdev, dp_ordering_force))
+ return false;
+
+ switch (qp_type) {
+ case IB_QPT_RC:
+ return MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_rc);
+ case IB_QPT_XRC_INI:
+ case IB_QPT_XRC_TGT:
+ return MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_xrc);
+ case IB_QPT_UC:
+ return MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_uc);
+ case IB_QPT_UD:
+ return MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_ud);
+ case MLX5_IB_QPT_DCI:
+ case MLX5_IB_QPT_DCT:
+ return MLX5_CAP_GEN(mdev, dp_ordering_ooo_all_dc);
+ default:
+ return false;
+ }
+}
+
static void process_vendor_flag(struct mlx5_ib_dev *dev, int *flags, int flag,
bool cond, struct mlx5_ib_qp *qp)
{
@@ -3365,7 +3392,7 @@ static int set_qpc_atomic_flags(struct mlx5_ib_qp *qp,
if (access_flags & IB_ACCESS_REMOTE_ATOMIC) {
int atomic_mode;
- atomic_mode = get_atomic_mode(dev, qp->type);
+ atomic_mode = get_atomic_mode(dev, qp);
if (atomic_mode < 0)
return -EOPNOTSUPP;
@@ -4316,6 +4343,11 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
if (qp->flags & MLX5_IB_QP_CREATE_SQPN_QP1)
MLX5_SET(qpc, qpc, deth_sqpn, 1);
+ if (qp->is_ooo_rq && cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR) {
+ MLX5_SET(qpc, qpc, dp_ordering_1, 1);
+ MLX5_SET(qpc, qpc, dp_ordering_force, 1);
+ }
+
mlx5_cur = to_mlx5_state(cur_state);
mlx5_new = to_mlx5_state(new_state);
@@ -4531,7 +4563,7 @@ static int mlx5_ib_modify_dct(struct ib_qp *ibqp, struct ib_qp_attr *attr,
if (attr->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC) {
int atomic_mode;
- atomic_mode = get_atomic_mode(dev, MLX5_IB_QPT_DCT);
+ atomic_mode = get_atomic_mode(dev, qp);
if (atomic_mode < 0)
return -EOPNOTSUPP;
@@ -4573,6 +4605,10 @@ static int mlx5_ib_modify_dct(struct ib_qp *ibqp, struct ib_qp_attr *attr,
MLX5_SET(dctc, dctc, hop_limit, attr->ah_attr.grh.hop_limit);
if (attr->ah_attr.type == RDMA_AH_ATTR_TYPE_ROCE)
MLX5_SET(dctc, dctc, eth_prio, attr->ah_attr.sl & 0x7);
+ if (qp->is_ooo_rq) {
+ MLX5_SET(dctc, dctc, dp_ordering_1, 1);
+ MLX5_SET(dctc, dctc, dp_ordering_force, 1);
+ }
err = mlx5_core_create_dct(dev, &qp->dct.mdct, qp->dct.in,
MLX5_ST_SZ_BYTES(create_dct_in), out,
@@ -4676,11 +4712,16 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
min(udata->inlen, sizeof(ucmd))))
return -EFAULT;
- if (ucmd.comp_mask ||
+ if (ucmd.comp_mask & ~MLX5_IB_MODIFY_QP_OOO_DP ||
memchr_inv(&ucmd.burst_info.reserved, 0,
sizeof(ucmd.burst_info.reserved)))
return -EOPNOTSUPP;
+ if (ucmd.comp_mask & MLX5_IB_MODIFY_QP_OOO_DP) {
+ if (!get_dp_ooo_cap(dev->mdev, qp->type))
+ return -EOPNOTSUPP;
+ qp->is_ooo_rq = 1;
+ }
}
if (qp->type == IB_QPT_GSI)
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index d4f6a36dffb0..8a6ad6c6841c 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -252,6 +252,7 @@ enum mlx5_ib_query_dev_resp_flags {
MLX5_IB_QUERY_DEV_RESP_FLAGS_CQE_128B_PAD = 1 << 1,
MLX5_IB_QUERY_DEV_RESP_PACKET_BASED_CREDIT_MODE = 1 << 2,
MLX5_IB_QUERY_DEV_RESP_FLAGS_SCAT2CQE_DCT = 1 << 3,
+ MLX5_IB_QUERY_DEV_RESP_FLAGS_OOO_DP = 1 << 4,
};
enum mlx5_ib_tunnel_offloads {
@@ -439,6 +440,10 @@ struct mlx5_ib_burst_info {
__u16 reserved;
};
+enum mlx5_ib_modify_qp_mask {
+ MLX5_IB_MODIFY_QP_OOO_DP = 1 << 0,
+};
+
struct mlx5_ib_modify_qp {
__u32 comp_mask;
struct mlx5_ib_burst_info burst_info;
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-03 11:37 [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Leon Romanovsky
2024-09-03 11:37 ` [PATCH mlx5-next 1/2] net/mlx5: Introduce data placement ordering bits Leon Romanovsky
2024-09-03 11:37 ` [PATCH rdma-next 2/2] RDMA/mlx5: Support OOO RX WQE consumption Leon Romanovsky
@ 2024-09-04 6:02 ` Zhu Yanjun
2024-09-04 8:27 ` Edward Srouji
2024-11-04 8:20 ` (subset) " Leon Romanovsky
2024-11-04 8:27 ` Leon Romanovsky
4 siblings, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2024-09-04 6:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, Edward Srouji, linux-kernel, linux-rdma, netdev,
Saeed Mahameed, Tariq Toukan, Yishai Hadas
On 2024/9/3 19:37, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Hi,
>
> This series from Edward introduces mlx5 data direct placement (DDP)
> feature.
>
> This feature allows WRs on the receiver side of the QP to be consumed
> out of order, permitting the sender side to transmit messages without
> guaranteeing arrival order on the receiver side.
>
> When enabled, the completion ordering of WRs remains in-order,
> regardless of the Receive WRs consumption order.
>
> RDMA Read and RDMA Atomic operations on the responder side continue to
> be executed in-order, while the ordering of data placement for RDMA
> Write and Send operations is not guaranteed.
It is an interesting feature. If I understand it correctly, this
feature permits the user to consume the data out of order for RDMA Write
and Send operations, while the completion ordering is still in order.
In what scenario can this feature be applied, and what benefits can be
gained from it?
I am just curious about this. Normally users consume the data in order.
In what scenario would the user consume the data out of order?
Thanks,
Zhu Yanjun
>
> Thanks
>
> Edward Srouji (2):
> net/mlx5: Introduce data placement ordering bits
> RDMA/mlx5: Support OOO RX WQE consumption
>
> drivers/infiniband/hw/mlx5/main.c | 8 +++++
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
> drivers/infiniband/hw/mlx5/qp.c | 51 +++++++++++++++++++++++++---
> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
> include/uapi/rdma/mlx5-abi.h | 5 +++
> 5 files changed, 78 insertions(+), 11 deletions(-)
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-04 6:02 ` [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Zhu Yanjun
@ 2024-09-04 8:27 ` Edward Srouji
2024-09-04 11:53 ` Zhu Yanjun
0 siblings, 1 reply; 16+ messages in thread
From: Edward Srouji @ 2024-09-04 8:27 UTC (permalink / raw)
To: Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
> External email: Use caution opening links or attachments
>
>
> On 2024/9/3 19:37, Leon Romanovsky wrote:
>> From: Leon Romanovsky <leonro@nvidia.com>
>>
>> Hi,
>>
>> This series from Edward introduces mlx5 data direct placement (DDP)
>> feature.
>>
>> This feature allows WRs on the receiver side of the QP to be consumed
>> out of order, permitting the sender side to transmit messages without
>> guaranteeing arrival order on the receiver side.
>>
>> When enabled, the completion ordering of WRs remains in-order,
>> regardless of the Receive WRs consumption order.
>>
>> RDMA Read and RDMA Atomic operations on the responder side continue to
>> be executed in-order, while the ordering of data placement for RDMA
>> Write and Send operations is not guaranteed.
>
> It is an interesting feature. If I got this feature correctly, this
> feature permits the user consumes the data out of order when RDMA Write
> and Send operations. But its completiong ordering is still in order.
>
Correct.
> Any scenario that this feature can be applied and what benefits will be
> got from this feature?
>
> I am just curious about this. Normally the users will consume the data
> in order. In what scenario, the user will consume the data out of order?
>
One of the main benefits of this feature is achieving higher bandwidth
(BW) by allowing responders to receive packets out of order (OOO).
For example, this can be utilized in devices that support multi-plane
functionality, as introduced in the "Multi-plane support for mlx5"
series [1]. When mlx5 multi-plane is supported, a single logical mlx5
port aggregates multiple physical plane ports. In this scenario, the
requester can "spray" packets across the multiple physical plane ports
without guaranteeing packet order, either on the wire or on the
receiver (responder) side.
With this approach, no barriers or fences are required to ensure
in-order packet reception, which optimizes the data path for
performance. This can result in better BW, theoretically achieving
line-rate performance equivalent to the sum of the maximum BW of all
physical plane ports, with only one QP.
[1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
> Thanks,
> Zhu Yanjun
>
>>
>> Thanks
>>
>> Edward Srouji (2):
>> net/mlx5: Introduce data placement ordering bits
>> RDMA/mlx5: Support OOO RX WQE consumption
>>
>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>> drivers/infiniband/hw/mlx5/qp.c | 51 +++++++++++++++++++++++++---
>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>> include/uapi/rdma/mlx5-abi.h | 5 +++
>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-04 8:27 ` Edward Srouji
@ 2024-09-04 11:53 ` Zhu Yanjun
2024-09-05 12:23 ` Edward Srouji
0 siblings, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2024-09-04 11:53 UTC (permalink / raw)
To: Edward Srouji, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 2024/9/4 16:27, Edward Srouji wrote:
>
> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>
>>> Hi,
>>>
>>> This series from Edward introduces mlx5 data direct placement (DDP)
>>> feature.
>>>
>>> This feature allows WRs on the receiver side of the QP to be consumed
>>> out of order, permitting the sender side to transmit messages without
>>> guaranteeing arrival order on the receiver side.
>>>
>>> When enabled, the completion ordering of WRs remains in-order,
>>> regardless of the Receive WRs consumption order.
>>>
>>> RDMA Read and RDMA Atomic operations on the responder side continue to
>>> be executed in-order, while the ordering of data placement for RDMA
>>> Write and Send operations is not guaranteed.
>>
>> It is an interesting feature. If I got this feature correctly, this
>> feature permits the user consumes the data out of order when RDMA Write
>> and Send operations. But its completiong ordering is still in order.
>>
> Correct.
>> Any scenario that this feature can be applied and what benefits will be
>> got from this feature?
>>
>> I am just curious about this. Normally the users will consume the data
>> in order. In what scenario, the user will consume the data out of order?
>>
> One of the main benefits of this feature is achieving higher bandwidth
> (BW) by allowing
> responders to receive packets out of order (OOO).
>
> For example, this can be utilized in devices that support multi-plane
> functionality,
> as introduced in the "Multi-plane support for mlx5" series [1]. When
> mlx5 multi-plane
> is supported, a single logical mlx5 port aggregates multiple physical
> plane ports.
> In this scenario, the requester can "spray" packets across the
> multiple physical
> plane ports without guaranteeing packet order, either on the wire or
> on the receiver
> (responder) side.
>
> With this approach, no barriers or fences are required to ensure
> in-order packet
> reception, which optimizes the data path for performance. This can
> result in better
> BW, theoretically achieving line-rate performance equivalent to the
> sum of
> the maximum BW of all physical plane ports, with only one QP.
Thanks a lot for your quick reply. Without ensuring in-order packet
reception, this does optimize the data path for performance.
I agree with you.
But how does the receiver recover the correct message order from the
out-of-order packets efficiently?
Is this method implemented in software or in hardware?
I am just interested in this feature and want to know more about it.
Thanks,
Zhu Yanjun
>
> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>> Thanks,
>> Zhu Yanjun
>>
>>>
>>> Thanks
>>>
>>> Edward Srouji (2):
>>> net/mlx5: Introduce data placement ordering bits
>>> RDMA/mlx5: Support OOO RX WQE consumption
>>>
>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>>> drivers/infiniband/hw/mlx5/qp.c | 51
>>> +++++++++++++++++++++++++---
>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>>> include/uapi/rdma/mlx5-abi.h | 5 +++
>>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>>
>>
--
Best Regards,
Yanjun.Zhu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-04 11:53 ` Zhu Yanjun
@ 2024-09-05 12:23 ` Edward Srouji
2024-09-06 5:02 ` Zhu Yanjun
2024-09-06 13:02 ` Bernard Metzler
0 siblings, 2 replies; 16+ messages in thread
From: Edward Srouji @ 2024-09-05 12:23 UTC (permalink / raw)
To: Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
> External email: Use caution opening links or attachments
>
>
> On 2024/9/4 16:27, Edward Srouji wrote:
>>
>> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>>
>>>> Hi,
>>>>
>>>> This series from Edward introduces mlx5 data direct placement (DDP)
>>>> feature.
>>>>
>>>> This feature allows WRs on the receiver side of the QP to be consumed
>>>> out of order, permitting the sender side to transmit messages without
>>>> guaranteeing arrival order on the receiver side.
>>>>
>>>> When enabled, the completion ordering of WRs remains in-order,
>>>> regardless of the Receive WRs consumption order.
>>>>
>>>> RDMA Read and RDMA Atomic operations on the responder side continue to
>>>> be executed in-order, while the ordering of data placement for RDMA
>>>> Write and Send operations is not guaranteed.
>>>
>>> It is an interesting feature. If I got this feature correctly, this
>>> feature permits the user consumes the data out of order when RDMA Write
>>> and Send operations. But its completiong ordering is still in order.
>>>
>> Correct.
>>> Any scenario that this feature can be applied and what benefits will be
>>> got from this feature?
>>>
>>> I am just curious about this. Normally the users will consume the data
>>> in order. In what scenario, the user will consume the data out of
>>> order?
>>>
>> One of the main benefits of this feature is achieving higher bandwidth
>> (BW) by allowing
>> responders to receive packets out of order (OOO).
>>
>> For example, this can be utilized in devices that support multi-plane
>> functionality,
>> as introduced in the "Multi-plane support for mlx5" series [1]. When
>> mlx5 multi-plane
>> is supported, a single logical mlx5 port aggregates multiple physical
>> plane ports.
>> In this scenario, the requester can "spray" packets across the
>> multiple physical
>> plane ports without guaranteeing packet order, either on the wire or
>> on the receiver
>> (responder) side.
>>
>> With this approach, no barriers or fences are required to ensure
>> in-order packet
>> reception, which optimizes the data path for performance. This can
>> result in better
>> BW, theoretically achieving line-rate performance equivalent to the
>> sum of
>> the maximum BW of all physical plane ports, with only one QP.
>
> Thanks a lot for your quick reply. Without ensuring in-order packet
> reception, this does optimize the data path for performance.
>
> I agree with you.
>
> But how does the receiver get the correct packets from the out-of-order
> packets efficiently?
>
> The method is implemented in Software or Hardware?
The packets carry a new field that the HW uses to determine the
correct message order (similar to a PSN).
Once the packets arrive OOO at the receiver side, the data is scattered
directly by the HW (hence the name DDP - "Direct Data Placement").
So the efficiency is achieved by the HW, which also keeps the required
context and metadata so it can deliver completions to the user in order,
once a set of WQEs forms an "in-order window" that can be reported.
The SW/applications may still observe OOO WR_IDs (because the first CQE
may have consumed a Recv WQE of any index on the receiver side), and it
is their responsibility to handle that from this point, if required.
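To make the "in-order window" idea concrete, here is a toy model in C.
It is only an illustrative sketch under simplifying assumptions (a fixed
window, a printf standing in for a CQE); it is not the actual HW
algorithm or its data structures:

#include <stdbool.h>
#include <stdio.h>

#define WINDOW 16 /* assumed bound on outstanding messages in this sketch */

struct rx_state {
        unsigned int next_to_complete;  /* first MSN not yet completed   */
        bool arrived[WINDOW];           /* data already placed for MSN?  */
        int wr_id[WINDOW];              /* Recv WQE consumed by that MSN */
};

/* A message with sequence number msn lands (possibly out of order): its
 * payload is scattered immediately and the Recv WQE it consumed is
 * remembered. */
static void on_data_placed(struct rx_state *s, unsigned int msn, int wr_id)
{
        s->arrived[msn % WINDOW] = true;
        s->wr_id[msn % WINDOW] = wr_id;
}

/* Report completions only for the contiguous "in-order window". */
static void flush_completions(struct rx_state *s)
{
        while (s->arrived[s->next_to_complete % WINDOW]) {
                unsigned int i = s->next_to_complete % WINDOW;

                printf("CQE for MSN %u, wr_id %d\n",
                       s->next_to_complete, s->wr_id[i]);
                s->arrived[i] = false;
                s->next_to_complete++;
        }
}

int main(void)
{
        struct rx_state s = { 0 };

        /* Messages 0..2 arrive out of order; Recv WQEs are consumed on
         * arrival, so wr_id no longer matches the message sequence. */
        on_data_placed(&s, 1, 100); flush_completions(&s); /* nothing yet */
        on_data_placed(&s, 2, 101); flush_completions(&s); /* still waits */
        on_data_placed(&s, 0, 102); flush_completions(&s); /* MSN 0,1,2   */
        return 0;
}

So completions come back in message order, while the wr_id values the
application sees (102, 100, 101 here) reflect whichever Recv WQEs the
out-of-order messages happened to consume.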
>
> I am just interested in this feature and want to know more about this.
>
> Thanks,
>
> Zhu Yanjun
>
>>
>> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>>> Thanks,
>>> Zhu Yanjun
>>>
>>>>
>>>> Thanks
>>>>
>>>> Edward Srouji (2):
>>>> net/mlx5: Introduce data placement ordering bits
>>>> RDMA/mlx5: Support OOO RX WQE consumption
>>>>
>>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>>>> drivers/infiniband/hw/mlx5/qp.c | 51
>>>> +++++++++++++++++++++++++---
>>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>>>> include/uapi/rdma/mlx5-abi.h | 5 +++
>>>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>>>
>>>
> --
> Best Regards,
> Yanjun.Zhu
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-05 12:23 ` Edward Srouji
@ 2024-09-06 5:02 ` Zhu Yanjun
2024-09-06 12:17 ` Edward Srouji
2024-09-06 13:02 ` Bernard Metzler
1 sibling, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2024-09-06 5:02 UTC (permalink / raw)
To: Edward Srouji, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 2024/9/5 20:23, Edward Srouji wrote:
>
> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2024/9/4 16:27, Edward Srouji wrote:
>>>
>>> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>>>
>>>>> Hi,
>>>>>
>>>>> This series from Edward introduces mlx5 data direct placement (DDP)
>>>>> feature.
>>>>>
>>>>> This feature allows WRs on the receiver side of the QP to be consumed
>>>>> out of order, permitting the sender side to transmit messages without
>>>>> guaranteeing arrival order on the receiver side.
>>>>>
>>>>> When enabled, the completion ordering of WRs remains in-order,
>>>>> regardless of the Receive WRs consumption order.
>>>>>
>>>>> RDMA Read and RDMA Atomic operations on the responder side continue to
>>>>> be executed in-order, while the ordering of data placement for RDMA
>>>>> Write and Send operations is not guaranteed.
>>>>
>>>> It is an interesting feature. If I got this feature correctly, this
>>>> feature permits the user consumes the data out of order when RDMA Write
>>>> and Send operations. But its completiong ordering is still in order.
>>>>
>>> Correct.
>>>> Any scenario that this feature can be applied and what benefits will be
>>>> got from this feature?
>>>>
>>>> I am just curious about this. Normally the users will consume the data
>>>> in order. In what scenario, the user will consume the data out of
>>>> order?
>>>>
>>> One of the main benefits of this feature is achieving higher bandwidth
>>> (BW) by allowing
>>> responders to receive packets out of order (OOO).
>>>
>>> For example, this can be utilized in devices that support multi-plane
>>> functionality,
>>> as introduced in the "Multi-plane support for mlx5" series [1]. When
>>> mlx5 multi-plane
>>> is supported, a single logical mlx5 port aggregates multiple physical
>>> plane ports.
>>> In this scenario, the requester can "spray" packets across the
>>> multiple physical
>>> plane ports without guaranteeing packet order, either on the wire or
>>> on the receiver
>>> (responder) side.
>>>
>>> With this approach, no barriers or fences are required to ensure
>>> in-order packet
>>> reception, which optimizes the data path for performance. This can
>>> result in better
>>> BW, theoretically achieving line-rate performance equivalent to the
>>> sum of
>>> the maximum BW of all physical plane ports, with only one QP.
>>
>> Thanks a lot for your quick reply. Without ensuring in-order packet
>> reception, this does optimize the data path for performance.
>>
>> I agree with you.
>>
>> But how does the receiver get the correct packets from the out-of-order
>> packets efficiently?
>>
>> The method is implemented in Software or Hardware?
>
>
> The packets have new field that is used by the HW to understand the
> correct message order (similar to PSN).
>
> Once the packets arrive OOO to the receiver side, the data is scattered
> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>
> So the efficiency is achieved by the HW, as it also saves the required
> context and metadata so it can deliver the correct completion to the
> user (in-order) once we have some WQEs that can be considered an
> "in-order window" and be delivered to the user.
>
> The SW/Applications may receive OOO WR_IDs though (because the first CQE
> may have consumed Recv WQE of any index on the receiver side), and it's
> their responsibility to handle it from this point, if it's required.
Got it. It seems that all the functionality is implemented in HW and the
SW only receives OOO WR_IDs. Thanks a lot. Perhaps it will be helpful for
RDMA LAG devices; it should enhance the performance^_^
BTW, do you have any performance data for this feature?
Best Regards,
Zhu Yanjun
>
>>
>> I am just interested in this feature and want to know more about this.
>>
>> Thanks,
>>
>> Zhu Yanjun
>>
>>>
>>> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>>>> Thanks,
>>>> Zhu Yanjun
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Edward Srouji (2):
>>>>> net/mlx5: Introduce data placement ordering bits
>>>>> RDMA/mlx5: Support OOO RX WQE consumption
>>>>>
>>>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>>>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>>>>> drivers/infiniband/hw/mlx5/qp.c | 51
>>>>> +++++++++++++++++++++++++---
>>>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>>>>> include/uapi/rdma/mlx5-abi.h | 5 +++
>>>>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>>>>
>>>>
>> --
>> Best Regards,
>> Yanjun.Zhu
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-06 5:02 ` Zhu Yanjun
@ 2024-09-06 12:17 ` Edward Srouji
2024-09-06 15:17 ` Zhu Yanjun
0 siblings, 1 reply; 16+ messages in thread
From: Edward Srouji @ 2024-09-06 12:17 UTC (permalink / raw)
To: Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 9/6/2024 8:02 AM, Zhu Yanjun wrote:
> External email: Use caution opening links or attachments
>
>
> On 2024/9/5 20:23, Edward Srouji wrote:
>>
>> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 2024/9/4 16:27, Edward Srouji wrote:
>>>>
>>>> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>>>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This series from Edward introduces mlx5 data direct placement (DDP)
>>>>>> feature.
>>>>>>
>>>>>> This feature allows WRs on the receiver side of the QP to be
>>>>>> consumed
>>>>>> out of order, permitting the sender side to transmit messages
>>>>>> without
>>>>>> guaranteeing arrival order on the receiver side.
>>>>>>
>>>>>> When enabled, the completion ordering of WRs remains in-order,
>>>>>> regardless of the Receive WRs consumption order.
>>>>>>
>>>>>> RDMA Read and RDMA Atomic operations on the responder side
>>>>>> continue to
>>>>>> be executed in-order, while the ordering of data placement for RDMA
>>>>>> Write and Send operations is not guaranteed.
>>>>>
>>>>> It is an interesting feature. If I got this feature correctly, this
>>>>> feature permits the user consumes the data out of order when RDMA
>>>>> Write
>>>>> and Send operations. But its completiong ordering is still in order.
>>>>>
>>>> Correct.
>>>>> Any scenario that this feature can be applied and what benefits
>>>>> will be
>>>>> got from this feature?
>>>>>
>>>>> I am just curious about this. Normally the users will consume the
>>>>> data
>>>>> in order. In what scenario, the user will consume the data out of
>>>>> order?
>>>>>
>>>> One of the main benefits of this feature is achieving higher bandwidth
>>>> (BW) by allowing
>>>> responders to receive packets out of order (OOO).
>>>>
>>>> For example, this can be utilized in devices that support multi-plane
>>>> functionality,
>>>> as introduced in the "Multi-plane support for mlx5" series [1]. When
>>>> mlx5 multi-plane
>>>> is supported, a single logical mlx5 port aggregates multiple physical
>>>> plane ports.
>>>> In this scenario, the requester can "spray" packets across the
>>>> multiple physical
>>>> plane ports without guaranteeing packet order, either on the wire or
>>>> on the receiver
>>>> (responder) side.
>>>>
>>>> With this approach, no barriers or fences are required to ensure
>>>> in-order packet
>>>> reception, which optimizes the data path for performance. This can
>>>> result in better
>>>> BW, theoretically achieving line-rate performance equivalent to the
>>>> sum of
>>>> the maximum BW of all physical plane ports, with only one QP.
>>>
>>> Thanks a lot for your quick reply. Without ensuring in-order packet
>>> reception, this does optimize the data path for performance.
>>>
>>> I agree with you.
>>>
>>> But how does the receiver get the correct packets from the out-of-order
>>> packets efficiently?
>>>
>>> The method is implemented in Software or Hardware?
>>
>>
>> The packets have new field that is used by the HW to understand the
>> correct message order (similar to PSN).
>>
>> Once the packets arrive OOO to the receiver side, the data is scattered
>> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>>
>> So the efficiency is achieved by the HW, as it also saves the required
>> context and metadata so it can deliver the correct completion to the
>> user (in-order) once we have some WQEs that can be considered an
>> "in-order window" and be delivered to the user.
>>
>> The SW/Applications may receive OOO WR_IDs though (because the first CQE
>> may have consumed Recv WQE of any index on the receiver side), and it's
>> their responsibility to handle it from this point, if it's required.
>
> Got it. It seems that all the functionalities are implemented in HW. The
> SW only receives OOO WR_IDs. Thanks a lot. Perhaps it is helpful to RDMA
> LAG devices. It should enhance the performance^_^
>
> BTW, do you have any performance data with this feature?
Not yet. We tested it functionality-wise for now.
But we should be able to measure its performance soon :).
>
> Best Regards,
> Zhu Yanjun
>
>>
>>>
>>> I am just interested in this feature and want to know more about this.
>>>
>>> Thanks,
>>>
>>> Zhu Yanjun
>>>
>>>>
>>>> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>>>>> Thanks,
>>>>> Zhu Yanjun
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Edward Srouji (2):
>>>>>> net/mlx5: Introduce data placement ordering bits
>>>>>> RDMA/mlx5: Support OOO RX WQE consumption
>>>>>>
>>>>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>>>>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>>>>>> drivers/infiniband/hw/mlx5/qp.c | 51
>>>>>> +++++++++++++++++++++++++---
>>>>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>>>>>> include/uapi/rdma/mlx5-abi.h | 5 +++
>>>>>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>>>>>
>>>>>
>>> --
>>> Best Regards,
>>> Yanjun.Zhu
>>>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-05 12:23 ` Edward Srouji
2024-09-06 5:02 ` Zhu Yanjun
@ 2024-09-06 13:02 ` Bernard Metzler
1 sibling, 0 replies; 16+ messages in thread
From: Bernard Metzler @ 2024-09-06 13:02 UTC (permalink / raw)
To: Edward Srouji, Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
Saeed Mahameed, Tariq Toukan, Yishai Hadas
> -----Original Message-----
> From: Edward Srouji <edwards@nvidia.com>
> Sent: Thursday, September 5, 2024 2:23 PM
> To: Zhu Yanjun <yanjun.zhu@linux.dev>; Leon Romanovsky <leon@kernel.org>;
> Jason Gunthorpe <jgg@nvidia.com>
> Cc: Leon Romanovsky <leonro@nvidia.com>; linux-kernel@vger.kernel.org;
> linux-rdma@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed
> <saeedm@nvidia.com>; Tariq Toukan <tariqt@nvidia.com>; Yishai Hadas
> <yishaih@nvidia.com>
> Subject: [EXTERNAL] Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct
> placement (DDP)
>
>
> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 2024/9/4 16:27, Edward Srouji wrote:
> >>
> >> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On 2024/9/3 19:37, Leon Romanovsky wrote:
> >>>> From: Leon Romanovsky <leonro@nvidia.com>
> >>>>
> >>>> Hi,
> >>>>
> >>>> This series from Edward introduces mlx5 data direct placement (DDP)
> >>>> feature.
> >>>>
> >>>> This feature allows WRs on the receiver side of the QP to be consumed
> >>>> out of order, permitting the sender side to transmit messages without
> >>>> guaranteeing arrival order on the receiver side.
> >>>>
> >>>> When enabled, the completion ordering of WRs remains in-order,
> >>>> regardless of the Receive WRs consumption order.
> >>>>
> >>>> RDMA Read and RDMA Atomic operations on the responder side continue to
> >>>> be executed in-order, while the ordering of data placement for RDMA
> >>>> Write and Send operations is not guaranteed.
> >>>
> >>> It is an interesting feature. If I got this feature correctly, this
> >>> feature permits the user consumes the data out of order when RDMA Write
> >>> and Send operations. But its completiong ordering is still in order.
> >>>
> >> Correct.
> >>> Any scenario that this feature can be applied and what benefits will be
> >>> got from this feature?
> >>>
> >>> I am just curious about this. Normally the users will consume the data
> >>> in order. In what scenario, the user will consume the data out of
> >>> order?
> >>>
> >> One of the main benefits of this feature is achieving higher bandwidth
> >> (BW) by allowing
> >> responders to receive packets out of order (OOO).
> >>
> >> For example, this can be utilized in devices that support multi-plane
> >> functionality,
> >> as introduced in the "Multi-plane support for mlx5" series [1]. When
> >> mlx5 multi-plane
> >> is supported, a single logical mlx5 port aggregates multiple physical
> >> plane ports.
> >> In this scenario, the requester can "spray" packets across the
> >> multiple physical
> >> plane ports without guaranteeing packet order, either on the wire or
> >> on the receiver
> >> (responder) side.
> >>
> >> With this approach, no barriers or fences are required to ensure
> >> in-order packet
> >> reception, which optimizes the data path for performance. This can
> >> result in better
> >> BW, theoretically achieving line-rate performance equivalent to the
> >> sum of
> >> the maximum BW of all physical plane ports, with only one QP.
> >
> > Thanks a lot for your quick reply. Without ensuring in-order packet
> > reception, this does optimize the data path for performance.
> >
> > I agree with you.
> >
> > But how does the receiver get the correct packets from the out-of-order
> > packets efficiently?
> >
> > The method is implemented in Software or Hardware?
>
>
> The packets have new field that is used by the HW to understand the
> correct message order (similar to PSN).
>
Interesting feature! It somehow reminds me of iWARP RDMA with its
DDP sub-layer 😉
But can that extra field remain compliant with the standardized wire
protocol?
Thanks,
Bernard.
> Once the packets arrive OOO to the receiver side, the data is scattered
> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>
> So the efficiency is achieved by the HW, as it also saves the required
> context and metadata so it can deliver the correct completion to the
> user (in-order) once we have some WQEs that can be considered an
> "in-order window" and be delivered to the user.
>
> The SW/Applications may receive OOO WR_IDs though (because the first CQE
> may have consumed Recv WQE of any index on the receiver side), and it's
> their responsibility to handle it from this point, if it's required.
>
> >
> > I am just interested in this feature and want to know more about this.
> >
> > Thanks,
> >
> > Zhu Yanjun
> >
> >>
> >> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
> >>> Thanks,
> >>> Zhu Yanjun
> >>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Edward Srouji (2):
> >>>> net/mlx5: Introduce data placement ordering bits
> >>>> RDMA/mlx5: Support OOO RX WQE consumption
> >>>>
> >>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
> >>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
> >>>> drivers/infiniband/hw/mlx5/qp.c | 51
> >>>> +++++++++++++++++++++++++---
> >>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
> >>>> include/uapi/rdma/mlx5-abi.h | 5 +++
> >>>> 5 files changed, 78 insertions(+), 11 deletions(-)
> >>>>
> >>>
> > --
> > Best Regards,
> > Yanjun.Zhu
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-06 12:17 ` Edward Srouji
@ 2024-09-06 15:17 ` Zhu Yanjun
2024-09-08 8:47 ` Edward Srouji
0 siblings, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2024-09-06 15:17 UTC (permalink / raw)
To: Edward Srouji, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 2024/9/6 20:17, Edward Srouji wrote:
>
> On 9/6/2024 8:02 AM, Zhu Yanjun wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2024/9/5 20:23, Edward Srouji wrote:
>>>
>>> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 2024/9/4 16:27, Edward Srouji wrote:
>>>>>
>>>>> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>>>>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This series from Edward introduces mlx5 data direct placement (DDP)
>>>>>>> feature.
>>>>>>>
>>>>>>> This feature allows WRs on the receiver side of the QP to be
>>>>>>> consumed
>>>>>>> out of order, permitting the sender side to transmit messages
>>>>>>> without
>>>>>>> guaranteeing arrival order on the receiver side.
>>>>>>>
>>>>>>> When enabled, the completion ordering of WRs remains in-order,
>>>>>>> regardless of the Receive WRs consumption order.
>>>>>>>
>>>>>>> RDMA Read and RDMA Atomic operations on the responder side
>>>>>>> continue to
>>>>>>> be executed in-order, while the ordering of data placement for RDMA
>>>>>>> Write and Send operations is not guaranteed.
>>>>>>
>>>>>> It is an interesting feature. If I got this feature correctly, this
>>>>>> feature permits the user consumes the data out of order when RDMA
>>>>>> Write
>>>>>> and Send operations. But its completiong ordering is still in order.
>>>>>>
>>>>> Correct.
>>>>>> Any scenario that this feature can be applied and what benefits
>>>>>> will be
>>>>>> got from this feature?
>>>>>>
>>>>>> I am just curious about this. Normally the users will consume the
>>>>>> data
>>>>>> in order. In what scenario, the user will consume the data out of
>>>>>> order?
>>>>>>
>>>>> One of the main benefits of this feature is achieving higher
>>>>> bandwidth
>>>>> (BW) by allowing
>>>>> responders to receive packets out of order (OOO).
>>>>>
>>>>> For example, this can be utilized in devices that support multi-plane
>>>>> functionality,
>>>>> as introduced in the "Multi-plane support for mlx5" series [1]. When
>>>>> mlx5 multi-plane
>>>>> is supported, a single logical mlx5 port aggregates multiple physical
>>>>> plane ports.
>>>>> In this scenario, the requester can "spray" packets across the
>>>>> multiple physical
>>>>> plane ports without guaranteeing packet order, either on the wire or
>>>>> on the receiver
>>>>> (responder) side.
>>>>>
>>>>> With this approach, no barriers or fences are required to ensure
>>>>> in-order packet
>>>>> reception, which optimizes the data path for performance. This can
>>>>> result in better
>>>>> BW, theoretically achieving line-rate performance equivalent to the
>>>>> sum of
>>>>> the maximum BW of all physical plane ports, with only one QP.
>>>>
>>>> Thanks a lot for your quick reply. Without ensuring in-order packet
>>>> reception, this does optimize the data path for performance.
>>>>
>>>> I agree with you.
>>>>
>>>> But how does the receiver get the correct packets from the
>>>> out-of-order
>>>> packets efficiently?
>>>>
>>>> The method is implemented in Software or Hardware?
>>>
>>>
>>> The packets have new field that is used by the HW to understand the
>>> correct message order (similar to PSN).
>>>
>>> Once the packets arrive OOO to the receiver side, the data is scattered
>>> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>>>
>>> So the efficiency is achieved by the HW, as it also saves the required
>>> context and metadata so it can deliver the correct completion to the
>>> user (in-order) once we have some WQEs that can be considered an
>>> "in-order window" and be delivered to the user.
>>>
>>> The SW/Applications may receive OOO WR_IDs though (because the first
>>> CQE
>>> may have consumed Recv WQE of any index on the receiver side), and it's
>>> their responsibility to handle it from this point, if it's required.
>>
>> Got it. It seems that all the functionalities are implemented in HW. The
>> SW only receives OOO WR_IDs. Thanks a lot. Perhaps it is helpful to RDMA
>> LAG devices. It should enhance the performance^_^
>>
>> BTW, do you have any performance data with this feature?
>
> Not yet. We tested it functionality wise for now.
>
> But we should be able to measure its performance soon :).
Thanks a lot. It is an interesting feature. If there are performance
reports, please share them with us.
IMO, perhaps this feature can be used with random read/write devices,
for example hard disks?
Just my idea. Not sure whether you have applied this feature to hard
disks or not.
Best Regards,
Zhu Yanjun
>
>
>>
>> Best Regards,
>> Zhu Yanjun
>>
>>>
>>>>
>>>> I am just interested in this feature and want to know more about this.
>>>>
>>>> Thanks,
>>>>
>>>> Zhu Yanjun
>>>>
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>>>>>> Thanks,
>>>>>> Zhu Yanjun
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Edward Srouji (2):
>>>>>>> net/mlx5: Introduce data placement ordering bits
>>>>>>> RDMA/mlx5: Support OOO RX WQE consumption
>>>>>>>
>>>>>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>>>>>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>>>>>>> drivers/infiniband/hw/mlx5/qp.c | 51
>>>>>>> +++++++++++++++++++++++++---
>>>>>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>>>>>>> include/uapi/rdma/mlx5-abi.h | 5 +++
>>>>>>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>
>>>> --
>>>> Best Regards,
>>>> Yanjun.Zhu
>>>>
>>
--
Best Regards,
Yanjun.Zhu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-06 15:17 ` Zhu Yanjun
@ 2024-09-08 8:47 ` Edward Srouji
0 siblings, 0 replies; 16+ messages in thread
From: Edward Srouji @ 2024-09-08 8:47 UTC (permalink / raw)
To: Zhu Yanjun, Leon Romanovsky, Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On 9/6/2024 6:17 PM, Zhu Yanjun wrote:
> External email: Use caution opening links or attachments
>
>
> On 2024/9/6 20:17, Edward Srouji wrote:
>>
>> On 9/6/2024 8:02 AM, Zhu Yanjun wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 2024/9/5 20:23, Edward Srouji wrote:
>>>>
>>>> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On 2024/9/4 16:27, Edward Srouji wrote:
>>>>>>
>>>>>> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>>>>>>> External email: Use caution opening links or attachments
>>>>>>>
>>>>>>>
>>>>>>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>>>>>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This series from Edward introduces mlx5 data direct placement
>>>>>>>> (DDP)
>>>>>>>> feature.
>>>>>>>>
>>>>>>>> This feature allows WRs on the receiver side of the QP to be
>>>>>>>> consumed
>>>>>>>> out of order, permitting the sender side to transmit messages
>>>>>>>> without
>>>>>>>> guaranteeing arrival order on the receiver side.
>>>>>>>>
>>>>>>>> When enabled, the completion ordering of WRs remains in-order,
>>>>>>>> regardless of the Receive WRs consumption order.
>>>>>>>>
>>>>>>>> RDMA Read and RDMA Atomic operations on the responder side
>>>>>>>> continue to
>>>>>>>> be executed in-order, while the ordering of data placement for
>>>>>>>> RDMA
>>>>>>>> Write and Send operations is not guaranteed.
>>>>>>>
>>>>>>> It is an interesting feature. If I got this feature correctly, this
>>>>>>> feature permits the user consumes the data out of order when RDMA
>>>>>>> Write
>>>>>>> and Send operations. But its completiong ordering is still in
>>>>>>> order.
>>>>>>>
>>>>>> Correct.
>>>>>>> Any scenario that this feature can be applied and what benefits
>>>>>>> will be
>>>>>>> got from this feature?
>>>>>>>
>>>>>>> I am just curious about this. Normally the users will consume the
>>>>>>> data
>>>>>>> in order. In what scenario, the user will consume the data out of
>>>>>>> order?
>>>>>>>
>>>>>> One of the main benefits of this feature is achieving higher
>>>>>> bandwidth
>>>>>> (BW) by allowing
>>>>>> responders to receive packets out of order (OOO).
>>>>>>
>>>>>> For example, this can be utilized in devices that support
>>>>>> multi-plane
>>>>>> functionality,
>>>>>> as introduced in the "Multi-plane support for mlx5" series [1]. When
>>>>>> mlx5 multi-plane
>>>>>> is supported, a single logical mlx5 port aggregates multiple
>>>>>> physical
>>>>>> plane ports.
>>>>>> In this scenario, the requester can "spray" packets across the
>>>>>> multiple physical
>>>>>> plane ports without guaranteeing packet order, either on the wire or
>>>>>> on the receiver
>>>>>> (responder) side.
>>>>>>
>>>>>> With this approach, no barriers or fences are required to ensure
>>>>>> in-order packet
>>>>>> reception, which optimizes the data path for performance. This can
>>>>>> result in better
>>>>>> BW, theoretically achieving line-rate performance equivalent to the
>>>>>> sum of
>>>>>> the maximum BW of all physical plane ports, with only one QP.
>>>>>
>>>>> Thanks a lot for your quick reply. Without ensuring in-order packet
>>>>> reception, this does optimize the data path for performance.
>>>>>
>>>>> I agree with you.
>>>>>
>>>>> But how does the receiver get the correct packets from the
>>>>> out-of-order
>>>>> packets efficiently?
>>>>>
>>>>> The method is implemented in Software or Hardware?
>>>>
>>>>
>>>> The packets have new field that is used by the HW to understand the
>>>> correct message order (similar to PSN).
>>>>
>>>> Once the packets arrive OOO to the receiver side, the data is
>>>> scattered
>>>> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>>>>
>>>> So the efficiency is achieved by the HW, as it also saves the required
>>>> context and metadata so it can deliver the correct completion to the
>>>> user (in-order) once we have some WQEs that can be considered an
>>>> "in-order window" and be delivered to the user.
>>>>
>>>> The SW/Applications may receive OOO WR_IDs though (because the first
>>>> CQE
>>>> may have consumed Recv WQE of any index on the receiver side), and
>>>> it's
>>>> their responsibility to handle it from this point, if it's required.
>>>
>>> Got it. It seems that all the functionalities are implemented in HW.
>>> The
>>> SW only receives OOO WR_IDs. Thanks a lot. Perhaps it is helpful to
>>> RDMA
>>> LAG devices. It should enhance the performance^_^
>>>
>>> BTW, do you have any performance data with this feature?
>>
>> Not yet. We tested it functionality wise for now.
>>
>> But we should be able to measure its performance soon :).
>
> Thanks a lot. It is an interesting feature. If performance reports,
> please share them with us.
Sure, will do.
>
>
> IMO, perhaps this feature can be used in random read/write devices, for
> example, hard disk?
>
> Just my idea. Not sure if you have applied this feature with hard disk
> or not.
You're right, it can be used with storage and we're planning to do this
integration and usage in the near future.
>
> Best Regards,
>
> Zhu Yanjun
>
>>
>>
>>>
>>> Best Regards,
>>> Zhu Yanjun
>>>
>>>>
>>>>>
>>>>> I am just interested in this feature and want to know more about
>>>>> this.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Zhu Yanjun
>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>>>>>>> Thanks,
>>>>>>> Zhu Yanjun
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Edward Srouji (2):
>>>>>>>> net/mlx5: Introduce data placement ordering bits
>>>>>>>> RDMA/mlx5: Support OOO RX WQE consumption
>>>>>>>>
>>>>>>>> drivers/infiniband/hw/mlx5/main.c | 8 +++++
>>>>>>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
>>>>>>>> drivers/infiniband/hw/mlx5/qp.c | 51
>>>>>>>> +++++++++++++++++++++++++---
>>>>>>>> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
>>>>>>>> include/uapi/rdma/mlx5-abi.h | 5 +++
>>>>>>>> 5 files changed, 78 insertions(+), 11 deletions(-)
>>>>>>>>
>>>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Yanjun.Zhu
>>>>>
>>>
> --
> Best Regards,
> Yanjun.Zhu
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: (subset) [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-03 11:37 [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Leon Romanovsky
` (2 preceding siblings ...)
2024-09-04 6:02 ` [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Zhu Yanjun
@ 2024-11-04 8:20 ` Leon Romanovsky
2024-11-04 8:27 ` Leon Romanovsky
4 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-11-04 8:20 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Edward Srouji, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas, Leon Romanovsky
On Tue, 03 Sep 2024 14:37:50 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Hi,
>
> This series from Edward introduces mlx5 data direct placement (DDP)
> feature.
>
> [...]
Applied, thanks!
[2/2] RDMA/mlx5: Support OOO RX WQE consumption
https://git.kernel.org/rdma/rdma/c/ded397366b5540
Best regards,
--
Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-09-03 11:37 [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Leon Romanovsky
` (3 preceding siblings ...)
2024-11-04 8:20 ` (subset) " Leon Romanovsky
@ 2024-11-04 8:27 ` Leon Romanovsky
2024-11-05 2:53 ` Jakub Kicinski
4 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2024-11-04 8:27 UTC (permalink / raw)
To: Jakub Kicinski, Jason Gunthorpe
Cc: Edward Srouji, linux-kernel, linux-rdma, netdev, Saeed Mahameed,
Tariq Toukan, Yishai Hadas
On Tue, Sep 03, 2024 at 02:37:50PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Hi,
>
> This series from Edward introduces mlx5 data direct placement (DDP)
> feature.
>
> This feature allows WRs on the receiver side of the QP to be consumed
> out of order, permitting the sender side to transmit messages without
> guaranteeing arrival order on the receiver side.
>
> When enabled, the completion ordering of WRs remains in-order,
> regardless of the Receive WRs consumption order.
>
> RDMA Read and RDMA Atomic operations on the responder side continue to
> be executed in-order, while the ordering of data placement for RDMA
> Write and Send operations is not guaranteed.
>
> Thanks
>
> Edward Srouji (2):
> net/mlx5: Introduce data placement ordering bits
Jakub,
We applied this series to RDMA, and the first patch generates merge
conflicts in include/linux/mlx5/mlx5_ifc.h between the netdev and RDMA
trees.
Can you please pull shared mlx5-next branch to avoid it?
Thanks
> RDMA/mlx5: Support OOO RX WQE consumption
>
> drivers/infiniband/hw/mlx5/main.c | 8 +++++
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
> drivers/infiniband/hw/mlx5/qp.c | 51 +++++++++++++++++++++++++---
> include/linux/mlx5/mlx5_ifc.h | 24 +++++++++----
> include/uapi/rdma/mlx5-abi.h | 5 +++
> 5 files changed, 78 insertions(+), 11 deletions(-)
>
> --
> 2.46.0
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-11-04 8:27 ` Leon Romanovsky
@ 2024-11-05 2:53 ` Jakub Kicinski
2024-11-05 6:26 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2024-11-05 2:53 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jason Gunthorpe, Edward Srouji, linux-kernel, linux-rdma, netdev,
Saeed Mahameed, Tariq Toukan, Yishai Hadas
On Mon, 4 Nov 2024 10:27:10 +0200 Leon Romanovsky wrote:
> Jakub,
>
> We applied this series to RDMA and first patch generates merge conflicts
> in include/linux/mlx5/mlx5_ifc.h between netdev and RDMA trees.
>
> Can you please pull shared mlx5-next branch to avoid it?
Sorry, I don't have the context; the thread looks 2 months old.
If you'd like us to pull something, please send a pull request
targeting net-next...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)
2024-11-05 2:53 ` Jakub Kicinski
@ 2024-11-05 6:26 ` Leon Romanovsky
0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2024-11-05 6:26 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jason Gunthorpe, Edward Srouji, linux-kernel, linux-rdma, netdev,
Saeed Mahameed, Tariq Toukan, Yishai Hadas
On Mon, Nov 04, 2024 at 06:53:03PM -0800, Jakub Kicinski wrote:
> On Mon, 4 Nov 2024 10:27:10 +0200 Leon Romanovsky wrote:
> > Jakub,
> >
> > We applied this series to RDMA and first patch generates merge conflicts
> > in include/linux/mlx5/mlx5_ifc.h between netdev and RDMA trees.
> >
> > Can you please pull shared mlx5-next branch to avoid it?
>
> Sorry I don't have the context, the thread looks 2 months old.
> If you'd like us to pull something please sense a pull request
> targeting net-next...
Sure, will do.
Thanks
^ permalink raw reply [flat|nested] 16+ messages in thread
Thread overview: 16+ messages
2024-09-03 11:37 [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Leon Romanovsky
2024-09-03 11:37 ` [PATCH mlx5-next 1/2] net/mlx5: Introduce data placement ordering bits Leon Romanovsky
2024-09-03 11:37 ` [PATCH rdma-next 2/2] RDMA/mlx5: Support OOO RX WQE consumption Leon Romanovsky
2024-09-04 6:02 ` [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP) Zhu Yanjun
2024-09-04 8:27 ` Edward Srouji
2024-09-04 11:53 ` Zhu Yanjun
2024-09-05 12:23 ` Edward Srouji
2024-09-06 5:02 ` Zhu Yanjun
2024-09-06 12:17 ` Edward Srouji
2024-09-06 15:17 ` Zhu Yanjun
2024-09-08 8:47 ` Edward Srouji
2024-09-06 13:02 ` Bernard Metzler
2024-11-04 8:20 ` (subset) " Leon Romanovsky
2024-11-04 8:27 ` Leon Romanovsky
2024-11-05 2:53 ` Jakub Kicinski
2024-11-05 6:26 ` Leon Romanovsky