Subject: Re: [PATCH] nvme-rdma: set ack timeout of RoCE to 262ms
From: Chao Leng
To: Max Gurtovoy, Sagi Grimberg, Christoph Hellwig
CC: linux-rdma@vger.kernel.org
References: <20220819075825.21231-1-lengchao@huawei.com>
 <20220821062016.GA26553@lst.de>
 <83992e8f-b18a-ccd3-e0ee-a5802043f161@huawei.com>
 <86e9fc3b-aded-220d-1ee0-4d5928097104@nvidia.com>
 <550d4612-0041-3d84-b1cb-786d0c8e0d11@huawei.com>
 <3030fbb2-5c63-54ea-5be3-b88cf63c6b75@grimberg.me>
 <328a807f-bfaf-b279-69c5-09be179891ac@huawei.com>
 <1bd4d4f6-fe33-7fe5-f662-cdef61acf800@nvidia.com>
 <405b78aa-51b8-30b4-ff86-c46d1bc84cda@huawei.com>
Message-ID: <47eb747d-0b48-acc5-a833-02457817e71b@huawei.com>
Date: Wed, 16 Nov 2022 10:24:32 +0800
In-Reply-To: <405b78aa-51b8-30b4-ff86-c46d1bc84cda@huawei.com>

Hi, Max

How's it going now? Thank you.

On 2022/10/14 10:15, Chao Leng wrote:
>
>
> On 2022/10/14 8:05, Max Gurtovoy wrote:
>> Sorry for the late response, we have holidays in my country.
>>
>> I still can't understand how this patch fixes your problem if you use ConnectX-5, since we use adaptive re-transmission by default and it's faster than 256msec to re-transmit.
> Adaptive re-transmission? Do you mean NAK-triggered retransmission?
> NAK-triggered retransmission is very fast, but timeout-triggered retransmission
> is very slow. It is possible that all packets of a QP are lost, in which case
> the receiver HBA cannot send a NAK.
> From our analysis, we didn't see any other adaptive re-transmission.
> If there is any other adaptive re-transmission, can you explain it?
>
> This patch modifies the waiting time for timeout re-transmission, so if all
> packets of a QP are lost, the re-transmission waiting time becomes shorter.
>>
>> Did you disable it?
> We did not disable anything.
>>
>> I'll try to re-spin it internally again.
> If you need more information, please feel free to contact me.
> Thank you.
>>
>> On 10/10/2022 12:12 PM, Chao Leng wrote:
>>> Hi, Max
>>>     Can you give some comments? Thank you.
>>>
>>> On 2022/8/29 21:15, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/8/29 17:06, Sagi Grimberg wrote:
>>>>>
>>>>>>>>> If so, which devices did you use?
>>>>>>>> The host HBA is Mellanox Technologies MT27800 Family [ConnectX-5];
>>>>>>>> The switch and storage are Huawei equipment.
>>>>>>>> In principle, switches and storage devices from other vendors
>>>>>>>> have the same problem.
>>>>>>>> If you think it is necessary, we can test other vendors' switches
>>>>>>>> and the Linux target.
>>>>>>>
>>>>>>> Why was the 2s default chosen, and what is the downside of a 250ms ack timeout? And why is nvme-rdma different from all other kernel rdma
>>>>>> The downside is a redundant retransmit if the packets are delayed more
>>>>>> than 250ms in the network and finally reach the receiver.
>>>>>> Only in extreme scenarios may the packet delay exceed 250 ms.
>>>>>
>>>>> Sounds like the default needs to be changed if it only addresses the
>>>>> extreme scenarios...
>>>>>
>>>>>>> consumers that it needs to set this explicitly?
>>>>>> Real-time transaction services are sensitive to delay.
>>>>>> nvme-rdma will be used in real-time transactions.
>>>>>> Real-time transaction services do not allow packets to be
>>>>>> delayed more than 250ms in the network.
>>>>>> So we need to set the ack timeout to 262ms.
>>>>>
>>>>> While I don't disagree with the change itself, I do disagree why this
>>>>> needs to be driven by nvme-rdma locally. If all kernel rdma consumers
>>>>> need this (and if not, I'd like to understand why), this needs to be set in the rdma core.
>>>> Changing the default set in the rdma core is another option.
>>>> But it will affect all applications based on RDMA.
>>>> Max, what do you think? Thank you.