From: Mingbao Sun
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org
Cc: sunmingbao@tom.com, tyler.sun@dell.com, ping.gan@dell.com,
	yanxiu.cai@dell.com, libin.zhang@dell.com, ao.sun@dell.com
Subject: [PATCH 2/2] nvme-tcp: support specifying the congestion-control
Date: Fri, 4 Mar 2022 17:27:54 +0800
Message-Id: <20220304092754.2721-3-sunmingbao@tom.com>
In-Reply-To: <20220304092754.2721-1-sunmingbao@tom.com>
References: <20220304092754.2721-1-sunmingbao@tom.com>

From: Mingbao Sun

Congestion control can have a noticeable impact on the performance of
TCP-based communication, and NVMe over TCP is no exception.

Different congestion-control algorithms (e.g., cubic, dctcp) suit
different scenarios. Choosing an appropriate algorithm benefits
performance; choosing an inappropriate one can severely degrade it.

The congestion control used by NVMe over TCP can already be changed by
writing to '/proc/sys/net/ipv4/tcp_congestion_control', but doing so
also changes the congestion control of every future TCP socket that has
not been explicitly assigned one, and may therefore affect the
performance of unrelated connections.

So it makes sense for NVMe over TCP to support specifying the
congestion control itself. This commit addresses the host side.

Implementation approach:
A new option called 'tcp_congestion' is added to the fabrics opt_tokens
so that the 'nvme connect' command can pass in the congestion control
specified by the user. Later, in nvme_tcp_alloc_queue, the specified
congestion control is applied to the relevant host-side sockets.

Signed-off-by: Mingbao Sun
---
 drivers/nvme/host/fabrics.c | 24 ++++++++++++++++++++++++
 drivers/nvme/host/fabrics.h |  2 ++
 drivers/nvme/host/tcp.c     | 20 +++++++++++++++++++-
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index ee79a6d639b4..6d946f758372 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include "nvme.h"
 #include "fabrics.h"
 
@@ -548,6 +549,7 @@ static const match_table_t opt_tokens = {
 	{ NVMF_OPT_TOS, "tos=%d" },
 	{ NVMF_OPT_FAIL_FAST_TMO, "fast_io_fail_tmo=%d" },
 	{ NVMF_OPT_DISCOVERY, "discovery" },
+	{ NVMF_OPT_TCP_CONGESTION, "tcp_congestion=%s" },
 	{ NVMF_OPT_ERR, NULL }
 };
 
@@ -560,6 +562,8 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	size_t nqnlen = 0;
 	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
 	uuid_t hostid;
+	bool ecn_ca;
+	u32 key;
 
 	/* Set defaults */
 	opts->queue_size = NVMF_DEF_QUEUE_SIZE;
@@ -829,6 +833,25 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 		case NVMF_OPT_DISCOVERY:
 			opts->discovery_nqn = true;
 			break;
+		case NVMF_OPT_TCP_CONGESTION:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			key = tcp_ca_get_key_by_name(NULL, p, &ecn_ca);
+			if (key == TCP_CA_UNSPEC) {
+				pr_err("congestion control %s not found.\n",
+				       p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+
+			kfree(opts->tcp_congestion);
+			opts->tcp_congestion = p;
+			break;
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
@@ -947,6 +970,7 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
 	kfree(opts->subsysnqn);
 	kfree(opts->host_traddr);
 	kfree(opts->host_iface);
+	kfree(opts->tcp_congestion);
 	kfree(opts);
 }
 EXPORT_SYMBOL_GPL(nvmf_free_options);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index c3203ff1c654..25fdc169949d 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -68,6 +68,7 @@ enum {
 	NVMF_OPT_FAIL_FAST_TMO = 1 << 20,
 	NVMF_OPT_HOST_IFACE = 1 << 21,
 	NVMF_OPT_DISCOVERY = 1 << 22,
+	NVMF_OPT_TCP_CONGESTION = 1 << 23,
 };
 
 /**
@@ -117,6 +118,7 @@ struct nvmf_ctrl_options {
 	unsigned int nr_io_queues;
 	unsigned int reconnect_delay;
 	bool discovery_nqn;
+	const char *tcp_congestion;
 	bool duplicate_connect;
 	unsigned int kato;
 	struct nvmf_host *host;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6cbcc8b4daaf..cb2c7d7371d4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1403,6 +1403,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
+	char ca_name[TCP_CA_NAME_MAX];
+	sockptr_t optval;
 	int ret, rcv_pdu_size;
 
 	mutex_init(&queue->queue_lock);
@@ -1447,6 +1449,21 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 	if (nctrl->opts->tos >= 0)
 		ip_sock_set_tos(queue->sock->sk, nctrl->opts->tos);
 
+	if (nctrl->opts->mask & NVMF_OPT_TCP_CONGESTION) {
+		strncpy(ca_name, nctrl->opts->tcp_congestion,
+			TCP_CA_NAME_MAX-1);
+		optval = KERNEL_SOCKPTR(ca_name);
+		ret = sock_common_setsockopt(queue->sock, IPPROTO_TCP,
+					     TCP_CONGESTION, optval,
+					     strlen(ca_name));
+		if (ret) {
+			dev_err(nctrl->device,
+				"failed to set TCP congestion to %s: %d\n",
+				ca_name, ret);
+			goto err_sock;
+		}
+	}
+
 	/* Set 10 seconds timeout for icresp recvmsg */
 	queue->sock->sk->sk_rcvtimeo = 10 * HZ;
 
@@ -2611,7 +2628,8 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
 			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
 			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
 			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
-			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
+			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE |
+			  NVMF_OPT_TCP_CONGESTION,
 	.create_ctrl = nvme_tcp_create_ctrl,
 };
 
-- 
2.26.2