* [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
@ 2022-08-19 7:58 Chao Leng
2022-08-21 6:20 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Chao Leng @ 2022-08-19 7:58 UTC (permalink / raw)
To: linux-nvme; +Cc: hch, sagi, kbusch, lengchao
Currently the RoCE ack timeout is about 2 seconds (2^(18+1) * 4us ≈ 2s).
Under low concurrency, if packets are lost due to a network fault such
as rerouting or optical-fiber signal interference, the sender waits 2
seconds before retransmitting the lost packets. As a result, the I/O
latency exceeds 2 seconds, which is too long for real-time transaction
services. We do not need to wait that long to conclude that packets are
lost; setting the ack timeout to 262ms (2^(15+1) * 4us = 262ms) is
sufficient.
Signed-off-by: Chao Leng <lengchao@huawei.com>
---
drivers/nvme/host/rdma.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 7d01fb770284..2dbb1b21acc8 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -602,6 +602,8 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
ret = PTR_ERR(queue->cm_id);
goto out_destroy_mutex;
}
+ /* set ack timeout to 262ms (2^(15+1) * 4us = 262ms) */
+ rdma_set_ack_timeout(queue->cm_id, 15);
if (ctrl->ctrl.opts->mask & NVMF_OPT_HOST_TRADDR)
src_addr = (struct sockaddr *)&ctrl->src_addr;
--
2.16.4
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-19 7:58 [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms Chao Leng
@ 2022-08-21 6:20 ` Christoph Hellwig
2022-08-22 9:50 ` Chao Leng
From: Christoph Hellwig @ 2022-08-21 6:20 UTC (permalink / raw)
To: Chao Leng; +Cc: linux-nvme, hch, sagi, kbusch
On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
> Now the ack timeout of RoCE is 2 second(2^(18+1)*4us=2 second). In the
> case of low concurrency, if some packets lost due to network abnormal
> such as network rerouting, Optical fiber signal interference, etc,
> it will wait 2 second to try retransmitting the lost packets.
> As a result, the I/O latency is greater than 2 seconds.
> The I/O latency is so long for real-time transaction service. Indeed we
> do not have to wait so long time to make sure that packets are lost.
> Setting the ack timeout to 262ms(2^(15+1)*4us=262ms) is sufficient.
I'll leave people more familiar with RoCE to judge the merits of this
change, but I really want a comment explaining the choice in the
source code.
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-21 6:20 ` Christoph Hellwig
@ 2022-08-22 9:50 ` Chao Leng
2022-08-22 15:30 ` Max Gurtovoy
From: Chao Leng @ 2022-08-22 9:50 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, sagi, kbusch
On 2022/8/21 14:20, Christoph Hellwig wrote:
> On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
>> Now the ack timeout of RoCE is 2 second(2^(18+1)*4us=2 second). In the
>> case of low concurrency, if some packets lost due to network abnormal
>> such as network rerouting, Optical fiber signal interference, etc,
>> it will wait 2 second to try retransmitting the lost packets.
>> As a result, the I/O latency is greater than 2 seconds.
>> The I/O latency is so long for real-time transaction service. Indeed we
>> do not have to wait so long time to make sure that packets are lost.
>> Setting the ack timeout to 262ms(2^(15+1)*4us=262ms) is sufficient.
>
> I'll leave people more familar with RoCE to judge the merits of this
> change, but I really want a comment explaining the choice in the
> source code.
The TCP retransmission timeout interval is currently 250ms, and that
setting has been in place for many years. The network quality of RDMA
fabrics is generally better than that of common Ethernet, which is the
rationale for choosing 262ms as the default ack timeout.
Adding a module parameter may be a better option.
>
> .
>
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-22 9:50 ` Chao Leng
@ 2022-08-22 15:30 ` Max Gurtovoy
2022-08-25 9:58 ` Chao Leng
From: Max Gurtovoy @ 2022-08-22 15:30 UTC (permalink / raw)
To: Chao Leng, Christoph Hellwig; +Cc: linux-nvme, sagi, kbusch
On 8/22/2022 12:50 PM, Chao Leng wrote:
>
>
> On 2022/8/21 14:20, Christoph Hellwig wrote:
>> On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
>>> Now the ack timeout of RoCE is 2 second(2^(18+1)*4us=2 second). In the
>>> case of low concurrency, if some packets lost due to network abnormal
>>> such as network rerouting, Optical fiber signal interference, etc,
>>> it will wait 2 second to try retransmitting the lost packets.
>>> As a result, the I/O latency is greater than 2 seconds.
>>> The I/O latency is so long for real-time transaction service. Indeed we
>>> do not have to wait so long time to make sure that packets are lost.
>>> Setting the ack timeout to 262ms(2^(15+1)*4us=262ms) is sufficient.
>>
>> I'll leave people more familar with RoCE to judge the merits of this
>> change, but I really want a comment explaining the choice in the
>> source code.
> Now the TCP retransmission timeout interval is 250ms, and this setting
> has been maintained for many years.
> The network quality of rdma is better than that of common Ethernet.
> That is the reason to set 262ms as the default ack timeout.
> Adding a module parameter may be a better option.
Are you solving a real issue you encountered?
If so, which devices did you use?
>>
>> .
>>
>
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-22 15:30 ` Max Gurtovoy
@ 2022-08-25 9:58 ` Chao Leng
2022-08-28 14:57 ` Sagi Grimberg
From: Chao Leng @ 2022-08-25 9:58 UTC (permalink / raw)
To: Max Gurtovoy, Christoph Hellwig; +Cc: linux-nvme, sagi, kbusch
On 2022/8/22 23:30, Max Gurtovoy wrote:
>
> On 8/22/2022 12:50 PM, Chao Leng wrote:
>>
>>
>> On 2022/8/21 14:20, Christoph Hellwig wrote:
>>> On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
>>>> Now the ack timeout of RoCE is 2 second(2^(18+1)*4us=2 second). In the
>>>> case of low concurrency, if some packets lost due to network abnormal
>>>> such as network rerouting, Optical fiber signal interference, etc,
>>>> it will wait 2 second to try retransmitting the lost packets.
>>>> As a result, the I/O latency is greater than 2 seconds.
>>>> The I/O latency is so long for real-time transaction service. Indeed we
>>>> do not have to wait so long time to make sure that packets are lost.
>>>> Setting the ack timeout to 262ms(2^(15+1)*4us=262ms) is sufficient.
>>>
>>> I'll leave people more familar with RoCE to judge the merits of this
>>> change, but I really want a comment explaining the choice in the
>>> source code.
>> Now the TCP retransmission timeout interval is 250ms, and this setting
>> has been maintained for many years.
>> The network quality of rdma is better than that of common Ethernet.
>> That is the reason to set 262ms as the default ack timeout.
>> Adding a module parameter may be a better option.
>
> Are you solving a real issue you encountered ?
There is a low probability that this occurs in real scenarios; the
issue occurred in a fault simulation test. In a core-leaf fabric, we
simulated a fiber fault between the core switch and a leaf switch.
Under low concurrency, there is a high probability that the I/O
latency exceeds 2 seconds. This patch reduces the I/O latency to less
than 1 second.
>
> If so, which devices did you use ?
The host HBA is a Mellanox Technologies MT27800 Family [ConnectX-5];
the switch and storage are Huawei equipment.
In principle, switches and storage devices from other vendors have the
same problem. If you think it is necessary, we can test other vendors'
switches and a Linux target.
>
>>>
>>> .
>>>
>>
> .
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-25 9:58 ` Chao Leng
@ 2022-08-28 14:57 ` Sagi Grimberg
2022-08-29 8:05 ` Chao Leng
From: Sagi Grimberg @ 2022-08-28 14:57 UTC (permalink / raw)
To: Chao Leng, Max Gurtovoy, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
>>> On 2022/8/21 14:20, Christoph Hellwig wrote:
>>>> On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
>>>>> Now the ack timeout of RoCE is 2 second(2^(18+1)*4us=2 second). In the
>>>>> case of low concurrency, if some packets lost due to network abnormal
>>>>> such as network rerouting, Optical fiber signal interference, etc,
>>>>> it will wait 2 second to try retransmitting the lost packets.
>>>>> As a result, the I/O latency is greater than 2 seconds.
>>>>> The I/O latency is so long for real-time transaction service.
>>>>> Indeed we
>>>>> do not have to wait so long time to make sure that packets are lost.
>>>>> Setting the ack timeout to 262ms(2^(15+1)*4us=262ms) is sufficient.
>>>>
>>>> I'll leave people more familar with RoCE to judge the merits of this
>>>> change, but I really want a comment explaining the choice in the
>>>> source code.
>>> Now the TCP retransmission timeout interval is 250ms, and this setting
>>> has been maintained for many years.
>>> The network quality of rdma is better than that of common Ethernet.
>>> That is the reason to set 262ms as the default ack timeout.
>>> Adding a module parameter may be a better option.
>>
>> Are you solving a real issue you encountered ?
> There is a low probability that this occurs in real scenarios.
> The issue occurs in fault simulation test.
> In the core-leaf fabrics,simulate a fiber fault between the core switch
> and the leaf switch.
> In the case of low concurrency, There is a high probability that the
> I/O latency is greater than 2 seconds.
> This patch can reduce the I/O latency to less than 1 second.
>>
>> If so, which devices did you use ?
> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
> The switch and storage are huawei equipments.
> In principle, switches and storage devices from other vendors
> have the same problem.
> If you think it is necessary, we can test the other vendor switchs
> and linux target.
Why was the 2s default chosen, and what is the downside of a 250ms ack
timeout? And why is nvme-rdma different from all other kernel rdma
consumers, such that it needs to set this explicitly?
Adding linux-rdma folks.
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-28 14:57 ` Sagi Grimberg
@ 2022-08-29 8:05 ` Chao Leng
2022-08-29 9:06 ` Sagi Grimberg
From: Chao Leng @ 2022-08-29 8:05 UTC (permalink / raw)
To: Sagi Grimberg, Max Gurtovoy, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
On 2022/8/28 22:57, Sagi Grimberg wrote:
>
>>>> On 2022/8/21 14:20, Christoph Hellwig wrote:
>>>>> On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
>>>>>> Now the ack timeout of RoCE is 2 second(2^(18+1)*4us=2 second). In the
>>>>>> case of low concurrency, if some packets lost due to network abnormal
>>>>>> such as network rerouting, Optical fiber signal interference, etc,
>>>>>> it will wait 2 second to try retransmitting the lost packets.
>>>>>> As a result, the I/O latency is greater than 2 seconds.
>>>>>> The I/O latency is so long for real-time transaction service. Indeed we
>>>>>> do not have to wait so long time to make sure that packets are lost.
>>>>>> Setting the ack timeout to 262ms(2^(15+1)*4us=262ms) is sufficient.
>>>>>
>>>>> I'll leave people more familar with RoCE to judge the merits of this
>>>>> change, but I really want a comment explaining the choice in the
>>>>> source code.
>>>> Now the TCP retransmission timeout interval is 250ms, and this setting
>>>> has been maintained for many years.
>>>> The network quality of rdma is better than that of common Ethernet.
>>>> That is the reason to set 262ms as the default ack timeout.
>>>> Adding a module parameter may be a better option.
>>>
>>> Are you solving a real issue you encountered ?
>> There is a low probability that this occurs in real scenarios.
>> The issue occurs in fault simulation test.
>> In the core-leaf fabrics,simulate a fiber fault between the core switch
>> and the leaf switch.
>> In the case of low concurrency, There is a high probability that the
>> I/O latency is greater than 2 seconds.
>> This patch can reduce the I/O latency to less than 1 second.
>>>
>>> If so, which devices did you use ?
>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>> The switch and storage are huawei equipments.
>> In principle, switches and storage devices from other vendors
>> have the same problem.
>> If you think it is necessary, we can test the other vendor switchs
>> and linux target.
>
> Why is the 2s default chosen, what is the downside for a 250ms seconds ack timeout? and why is nvme-rdma different than all other kernel rdma
The downside is a redundant retransmit if a packet is delayed more than
250ms in the network but eventually reaches the receiver. Only in
extreme scenarios may the packet delay exceed 250ms.
> consumers that it needs to set this explicitly?
Real-time transaction services are sensitive to delay, and nvme-rdma
will be used for real-time transactions. These services do not allow
packets to be delayed more than 250ms in the network, so we need to
set the ack timeout to 262ms.
>
> Adding linux-rdma folks.
> .
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-29 8:05 ` Chao Leng
@ 2022-08-29 9:06 ` Sagi Grimberg
2022-08-29 13:15 ` Chao Leng
From: Sagi Grimberg @ 2022-08-29 9:06 UTC (permalink / raw)
To: Chao Leng, Max Gurtovoy, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
>>>> If so, which devices did you use ?
>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>> The switch and storage are huawei equipments.
>>> In principle, switches and storage devices from other vendors
>>> have the same problem.
>>> If you think it is necessary, we can test the other vendor switchs
>>> and linux target.
>>
>> Why is the 2s default chosen, what is the downside for a 250ms seconds
>> ack timeout? and why is nvme-rdma different than all other kernel rdma
> The downside is redundant retransmit if the packets delay more than
> 250ms in the networks and finally reaches the receiver.
> Only in extreme scenarios, the packet delay may exceed 250 ms.
Sounds like the default needs to be changed if it only addresses the
extreme scenarios...
>> consumers that it needs to set this explicitly?
> The real-time transaction services are sensitive to the delay.
> nvme-rdma will be used in real-time transactions.
> The real-time transaction services do not allow that the packets
> delay more than 250ms in the networks.
> So we need to set the ack timeout to 262ms.
While I don't disagree with the change itself, I do question why this
needs to be driven by nvme-rdma locally. If all kernel rdma consumers
need this (and if not, I'd like to understand why), this needs to be set
in the rdma core.
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-29 9:06 ` Sagi Grimberg
@ 2022-08-29 13:15 ` Chao Leng
2022-10-10 9:12 ` Chao Leng
From: Chao Leng @ 2022-08-29 13:15 UTC (permalink / raw)
To: Sagi Grimberg, Max Gurtovoy, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
On 2022/8/29 17:06, Sagi Grimberg wrote:
>
>>>>> If so, which devices did you use ?
>>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>>> The switch and storage are huawei equipments.
>>>> In principle, switches and storage devices from other vendors
>>>> have the same problem.
>>>> If you think it is necessary, we can test the other vendor switchs
>>>> and linux target.
>>>
>>> Why is the 2s default chosen, what is the downside for a 250ms seconds ack timeout? and why is nvme-rdma different than all other kernel rdma
>> The downside is redundant retransmit if the packets delay more than
>> 250ms in the networks and finally reaches the receiver.
>> Only in extreme scenarios, the packet delay may exceed 250 ms.
>
> Sounds like the default needs to be changed if it only addresses the
> extreme scenarios...
>
>>> consumers that it needs to set this explicitly?
>> The real-time transaction services are sensitive to the delay.
>> nvme-rdma will be used in real-time transactions.
>> The real-time transaction services do not allow that the packets
>> delay more than 250ms in the networks.
>> So we need to set the ack timeout to 262ms.
>
> While I don't disagree with the change itself, I do disagree why this
> needs to be driven by nvme-rdma locally. If all kernel rdma consumers
> need this (and if not, I'd like to understand why), this needs to be
> set in the rdma core.
Changing the default in the rdma core is another option.
But it would affect all applications based on RDMA.
Max, what do you think? Thank you.
> .
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-08-29 13:15 ` Chao Leng
@ 2022-10-10 9:12 ` Chao Leng
2022-10-14 0:05 ` Max Gurtovoy
From: Chao Leng @ 2022-10-10 9:12 UTC (permalink / raw)
To: Sagi Grimberg, Max Gurtovoy, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
Hi, Max
Can you give some comment? Thank you.
On 2022/8/29 21:15, Chao Leng wrote:
>
>
> On 2022/8/29 17:06, Sagi Grimberg wrote:
>>
>>>>>> If so, which devices did you use ?
>>>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>>>> The switch and storage are huawei equipments.
>>>>> In principle, switches and storage devices from other vendors
>>>>> have the same problem.
>>>>> If you think it is necessary, we can test the other vendor switchs
>>>>> and linux target.
>>>>
>>>> Why is the 2s default chosen, what is the downside for a 250ms seconds ack timeout? and why is nvme-rdma different than all other kernel rdma
>>> The downside is redundant retransmit if the packets delay more than
>>> 250ms in the networks and finally reaches the receiver.
>>> Only in extreme scenarios, the packet delay may exceed 250 ms.
>>
>> Sounds like the default needs to be changed if it only addresses the
>> extreme scenarios...
>>
>>>> consumers that it needs to set this explicitly?
>>> The real-time transaction services are sensitive to the delay.
>>> nvme-rdma will be used in real-time transactions.
>>> The real-time transaction services do not allow that the packets
>>> delay more than 250ms in the networks.
>>> So we need to set the ack timeout to 262ms.
>>
>> While I don't disagree with the change itself, I do disagree why this
>> needs to be driven by nvme-rdma locally. If all kernel rdma consumers
>> need this (and if not, I'd like to understand why), this needs to be set in the rdma core.
> Changing the default set in the rdma core is another option.
> But it will affect all application based on RDMA.
> Max, what do you think? Thank you.
>> .
>
> .
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-10-10 9:12 ` Chao Leng
@ 2022-10-14 0:05 ` Max Gurtovoy
2022-10-14 2:15 ` Chao Leng
From: Max Gurtovoy @ 2022-10-14 0:05 UTC (permalink / raw)
To: Chao Leng, Sagi Grimberg, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
Sorry for the late response, we had holidays in my country.
I still can't understand how this patch fixes your problem if you use
ConnectX-5, since we use adaptive re-transmission by default and it
re-transmits faster than 256msec.
Did you disable it?
I'll try to re-spin it internally again.
On 10/10/2022 12:12 PM, Chao Leng wrote:
> Hi, Max
> Can you give some comment? Thank you.
>
> On 2022/8/29 21:15, Chao Leng wrote:
>>
>>
>> On 2022/8/29 17:06, Sagi Grimberg wrote:
>>>
>>>>>>> If so, which devices did you use ?
>>>>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>>>>> The switch and storage are huawei equipments.
>>>>>> In principle, switches and storage devices from other vendors
>>>>>> have the same problem.
>>>>>> If you think it is necessary, we can test the other vendor switchs
>>>>>> and linux target.
>>>>>
>>>>> Why is the 2s default chosen, what is the downside for a 250ms
>>>>> seconds ack timeout? and why is nvme-rdma different than all other
>>>>> kernel rdma
>>>> The downside is redundant retransmit if the packets delay more than
>>>> 250ms in the networks and finally reaches the receiver.
>>>> Only in extreme scenarios, the packet delay may exceed 250 ms.
>>>
>>> Sounds like the default needs to be changed if it only addresses the
>>> extreme scenarios...
>>>
>>>>> consumers that it needs to set this explicitly?
>>>> The real-time transaction services are sensitive to the delay.
>>>> nvme-rdma will be used in real-time transactions.
>>>> The real-time transaction services do not allow that the packets
>>>> delay more than 250ms in the networks.
>>>> So we need to set the ack timeout to 262ms.
>>>
>>> While I don't disagree with the change itself, I do disagree why this
>>> needs to be driven by nvme-rdma locally. If all kernel rdma consumers
>>> need this (and if not, I'd like to understand why), this needs to be
>>> set in the rdma core.
>> Changing the default set in the rdma core is another option.
>> But it will affect all application based on RDMA.
>> Max, what do you think? Thank you.
>>> .
>>
>> .
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-10-14 0:05 ` Max Gurtovoy
@ 2022-10-14 2:15 ` Chao Leng
2022-11-16 2:24 ` Chao Leng
From: Chao Leng @ 2022-10-14 2:15 UTC (permalink / raw)
To: Max Gurtovoy, Sagi Grimberg, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
On 2022/10/14 8:05, Max Gurtovoy wrote:
> Sorry for late response, we have holiday's in my country.
>
> I still can't understand how this patch fixes your problem if you use ConnectX-5 since we use adaptive re-transmission by default and it's faster than 256msec to re-transmit.
Adaptive re-transmission? Do you mean NAK-triggered retransmission?
NAK-triggered retransmission is very fast, but timeout-triggered
retransmission is very slow, because if all packets of a QP are lost
the receiver HBA cannot send a NAK. From our analysis, we did not see
any other adaptive re-transmission; if there is one, can you explain it?
This patch modifies the waiting time for timeout re-transmission, so if
all packets of a QP are lost, the re-transmission waiting time becomes
short.
>
> Did you disable it ?
We do not disable anything.
>
> I'll try to re-spin it internally again.
If you need more information, please feel free to contact me.
Thank you.
>
> On 10/10/2022 12:12 PM, Chao Leng wrote:
>> Hi, Max
>> Can you give some comment? Thank you.
>>
>> On 2022/8/29 21:15, Chao Leng wrote:
>>>
>>>
>>> On 2022/8/29 17:06, Sagi Grimberg wrote:
>>>>
>>>>>>>> If so, which devices did you use ?
>>>>>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>>>>>> The switch and storage are huawei equipments.
>>>>>>> In principle, switches and storage devices from other vendors
>>>>>>> have the same problem.
>>>>>>> If you think it is necessary, we can test the other vendor switchs
>>>>>>> and linux target.
>>>>>>
>>>>>> Why is the 2s default chosen, what is the downside for a 250ms seconds ack timeout? and why is nvme-rdma different than all other kernel rdma
>>>>> The downside is redundant retransmit if the packets delay more than
>>>>> 250ms in the networks and finally reaches the receiver.
>>>>> Only in extreme scenarios, the packet delay may exceed 250 ms.
>>>>
>>>> Sounds like the default needs to be changed if it only addresses the
>>>> extreme scenarios...
>>>>
>>>>>> consumers that it needs to set this explicitly?
>>>>> The real-time transaction services are sensitive to the delay.
>>>>> nvme-rdma will be used in real-time transactions.
>>>>> The real-time transaction services do not allow that the packets
>>>>> delay more than 250ms in the networks.
>>>>> So we need to set the ack timeout to 262ms.
>>>>
>>>> While I don't disagree with the change itself, I do disagree why this
>>>> needs to be driven by nvme-rdma locally. If all kernel rdma consumers
>>>> need this (and if not, I'd like to understand why), this needs to be set in the rdma core.
>>> Changing the default set in the rdma core is another option.
>>> But it will affect all application based on RDMA.
>>> Max, what do you think? Thank you.
>>>> .
>>>
>>> .
> .
* Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
2022-10-14 2:15 ` Chao Leng
@ 2022-11-16 2:24 ` Chao Leng
From: Chao Leng @ 2022-11-16 2:24 UTC (permalink / raw)
To: Max Gurtovoy, Sagi Grimberg, Christoph Hellwig
Cc: linux-nvme, kbusch, linux-rdma@vger.kernel.org
Hi, Max
How's it going now?
Thank you.
On 2022/10/14 10:15, Chao Leng wrote:
>
>
> On 2022/10/14 8:05, Max Gurtovoy wrote:
>> Sorry for late response, we have holiday's in my country.
>>
>> I still can't understand how this patch fixes your problem if you use ConnectX-5 since we use adaptive re-transmission by default and it's faster than 256msec to re-transmit.
> adaptive re-transmission? Do you mean NAK-triggered retransmission?
> NAK-triggered retransmission is very fast, but timeout-triggered retransmission
> is very slow. Because There is a possibility that all packets of a QP are lost,
> receiver HBA can not send NAK.
> From our analysis, we didn't see any other adaptive re-transmission.
> If there is any other adaptive re-transmission, can you explain it?
>
> This patch modify the waiting time for timeout re-transmission, Thus if all packets
> of a QP are lost, the re-transmission waiting time will become short.
>>
>> Did you disable it ?
> We do not disable anything.
>>
>> I'll try to re-spin it internally again.
> If you need more information, please feel free to contact me.
> Thank you.
>>
>> On 10/10/2022 12:12 PM, Chao Leng wrote:
>>> Hi, Max
>>> Can you give some comment? Thank you.
>>>
>>> On 2022/8/29 21:15, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/8/29 17:06, Sagi Grimberg wrote:
>>>>>
>>>>>>>>> If so, which devices did you use ?
>>>>>>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>>>>>>> The switch and storage are huawei equipments.
>>>>>>>> In principle, switches and storage devices from other vendors
>>>>>>>> have the same problem.
>>>>>>>> If you think it is necessary, we can test the other vendor switchs
>>>>>>>> and linux target.
>>>>>>>
>>>>>>> Why is the 2s default chosen, what is the downside for a 250ms seconds ack timeout? and why is nvme-rdma different than all other kernel rdma
>>>>>> The downside is redundant retransmit if the packets delay more than
>>>>>> 250ms in the networks and finally reaches the receiver.
>>>>>> Only in extreme scenarios, the packet delay may exceed 250 ms.
>>>>>
>>>>> Sounds like the default needs to be changed if it only addresses the
>>>>> extreme scenarios...
>>>>>
>>>>>>> consumers that it needs to set this explicitly?
>>>>>> The real-time transaction services are sensitive to the delay.
>>>>>> nvme-rdma will be used in real-time transactions.
>>>>>> The real-time transaction services do not allow that the packets
>>>>>> delay more than 250ms in the networks.
>>>>>> So we need to set the ack timeout to 262ms.
>>>>>
>>>>> While I don't disagree with the change itself, I do disagree why this
>>>>> needs to be driven by nvme-rdma locally. If all kernel rdma consumers
>>>>> need this (and if not, I'd like to understand why), this needs to be set in the rdma core.
>>>> Changing the default set in the rdma core is another option.
>>>> But it will affect all application based on RDMA.
>>>> Max, what do you think? Thank you.
>>>>> .
>>>>
>>>> .
>> .