public inbox for linux-nvme@lists.infradead.org
From: Chao Leng <lengchao@huawei.com>
To: Sagi Grimberg <sagi@grimberg.me>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Christoph Hellwig <hch@lst.de>
Cc: <linux-nvme@lists.infradead.org>, <kbusch@kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
Date: Mon, 29 Aug 2022 16:05:54 +0800
Message-ID: <550d4612-0041-3d84-b1cb-786d0c8e0d11@huawei.com>
In-Reply-To: <fbee7c67-fd7b-12c8-5685-066b1974aadb@grimberg.me>



On 2022/8/28 22:57, Sagi Grimberg wrote:
> 
>>>> On 2022/8/21 14:20, Christoph Hellwig wrote:
>>>>> On Fri, Aug 19, 2022 at 03:58:25PM +0800, Chao Leng wrote:
>>>>>> Currently the ack timeout of RoCE is 2 seconds (2^(18+1)*4us = 2 seconds).
>>>>>> With low concurrency, if some packets are lost due to a network
>>>>>> anomaly such as rerouting or optical fiber signal interference,
>>>>>> the sender waits 2 seconds before retransmitting the lost packets.
>>>>>> As a result, the I/O latency exceeds 2 seconds.
>>>>>> That latency is too long for real-time transaction services. We
>>>>>> do not need to wait that long to conclude that packets are lost.
>>>>>> Setting the ack timeout to 262ms (2^(15+1)*4us = 262ms) is sufficient.
>>>>>
>>>>> I'll leave people more familiar with RoCE to judge the merits of this
>>>>> change, but I really want a comment explaining the choice in the
>>>>> source code.
>>>> The TCP retransmission timeout interval is currently 250ms, and this
>>>> setting has been maintained for many years.
>>>> The network quality of rdma is better than that of common Ethernet.
>>>> That is why 262ms was chosen as the default ack timeout.
>>>> Adding a module parameter may be a better option.
>>>
>>> Are you solving a real issue you encountered ?
>> It occurs with low probability in real scenarios.
>> The issue shows up in fault simulation tests.
>> In a core-leaf fabric, we simulate a fiber fault between the core
>> switch and the leaf switch.
>> With low concurrency, there is a high probability that the
>> I/O latency exceeds 2 seconds.
>> This patch reduces the I/O latency to less than 1 second.
>>>
>>> If so, which devices did you use ?
>> The host HBA is a Mellanox Technologies MT27800 Family [ConnectX-5].
>> The switch and storage are Huawei equipment.
>> In principle, switches and storage devices from other vendors
>> have the same problem.
>> If you think it is necessary, we can test other vendors' switches
>> and the Linux target.
> 
> Why is the 2s default chosen, and what is the downside of a 250ms
> ack timeout?
The downside is redundant retransmission if packets are delayed by more
than 250ms in the network but still eventually reach the receiver.
Packet delay should exceed 250ms only in extreme scenarios.
> And why is nvme-rdma different from all other kernel rdma
> consumers that it needs to set this explicitly?
Real-time transaction services are sensitive to latency, and
nvme-rdma will be used for real-time transactions.
Such services cannot tolerate packets being delayed by more than
250ms in the network.
So we need to set the ack timeout to 262ms.
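
As a quick check of the arithmetic behind these numbers, here is a small
Python sketch (illustrative only, not kernel code). It evaluates
timeout = base * 2^exponent for the two exponents discussed in this
thread, 18+1 and 15+1. The function name and constants are made up for
this example. It uses the rounded 4us base from the patch description
and, for comparison, 4.096us, which I believe is the base unit the
InfiniBand specification defines for the Local ACK Timeout field.

```python
# Illustrative arithmetic only; names here are hypothetical, not kernel symbols.
BASE_US_ROUNDED = 4.0   # the rounded "4us" figure used in this thread
BASE_US_SPEC = 4.096    # assumed base unit of the IB Local ACK Timeout field


def ack_timeout_ms(exponent: int, base_us: float = BASE_US_ROUNDED) -> float:
    """Return base_us * 2**exponent, converted from microseconds to milliseconds."""
    return base_us * (2 ** exponent) / 1000.0


# Exponents as written in the patch description: 18+1 (current), 15+1 (proposed).
for label, exponent in (("current", 18 + 1), ("proposed", 15 + 1)):
    rounded = ack_timeout_ms(exponent)
    spec = ack_timeout_ms(exponent, BASE_US_SPEC)
    print(f"{label}: exponent={exponent} "
          f"with 4us={rounded:.3f} ms, with 4.096us={spec:.3f} ms")
```

With the 4us base the two exponents give about 2097ms and 262ms; with
4.096us they give about 2147ms and 268ms, so the comparison between the
current and proposed values does not change.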
> 
> Adding linux-rdma folks.

