From: Mingbao Sun <sunmingbao@tom.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
Christoph Hellwig <hch@lst.de>,
Chaitanya Kulkarni <kch@nvidia.com>,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
Eric Dumazet <edumazet@google.com>,
"David S . Miller" <davem@davemloft.net>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
David Ahern <dsahern@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
netdev@vger.kernel.org, tyler.sun@dell.com, ping.gan@dell.com,
yanxiu.cai@dell.com, libin.zhang@dell.com, ao.sun@dell.com
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
Date: Thu, 31 Mar 2022 11:26:13 +0800
Message-ID: <20220331112613.0000063e@tom.com>
In-Reply-To: <15f24dcd-9a62-8bab-271c-baa9cc693d8d@grimberg.me>
On Tue, 29 Mar 2022 10:46:08 +0300
Sagi Grimberg <sagi@grimberg.me> wrote:
> >> As I said, TCP can be tuned in various ways, congestion being just one
> >> of them. I'm sure you can find a workload where rmem/wmem will make
> >> a difference.
> >
> > agree.
> > but the difference for the knob of rmem/wmem is:
> > we could enlarge rmem/wmem for NVMe/TCP via sysctl,
> > and it would not bring downside to any other sockets whose
> > rmem/wmem are not explicitly specified.
>
> It can most certainly affect them, positively or negatively, depends
> on the use-case.
Agreed.
Your statement is the more rigorous one.
> >> In addition, based on my knowledge, application specific TCP level
> >> tuning (like congestion) is not really a common thing to do. So why in
> >> nvme-tcp?
> >>
> >> So to me at least, it is not clear why we should add it to the driver.
> >
> > As mentioned in the commit message, we can specify the
> > congestion-control of NVMe_over_TCP via sysctl or by writing
> > '/proc/sys/net/ipv4/tcp_congestion_control', but this also
> > changes the congestion-control of all future TCP sockets on
> > the same host that have not been explicitly assigned a
> > congestion-control, thus potentially impacting their
> > performance.
> >
> > For example:
> >
> > A server in a data-center with the following 2 NICs:
> >
> > - NIC_front-end, for interacting with clients through WAN
> > (high latency, ms-level)
> >
> > - NIC_back-end, for interacting with NVMe/TCP target through LAN
> > (low latency, ECN-enabled, ideal for dctcp)
> >
> > This server interacts with clients (handling requests) via the front-end
> > network and accesses the NVMe/TCP storage via the back-end network.
> > This is a normal use case, right?
> >
> > For the client devices, we can’t determine their congestion-control.
> > But normally it’s cubic by default (per the CONFIG_DEFAULT_TCP_CONG).
> > So if we change the default congestion control on the server to dctcp
> > on behalf of the NVMe/TCP traffic of the LAN side, it could at the
> > same time change the congestion-control of the front-end sockets
> > to dctcp while the congestion-control of the client-side is cubic.
> > So this is an unexpected scenario.
> >
> > In addition, distributed storage products like the following also have
> > the above problem:
> >
> > - The product consists of a cluster of servers.
> >
> > - Each server serves clients via its front-end NIC
> > (WAN, high latency).
> >
> > - All servers interact with each other via NVMe/TCP via back-end NIC
> > (LAN, low latency, ECN-enabled, ideal for dctcp).
>
> Separate networks are still not application (nvme-tcp) specific and as
> mentioned, we have a way to control that. IMO, this still does not
> qualify as solid justification to add this to nvme-tcp.
>
> What do others think?
Well, given that the approach (‘ip route …’) proposed by Jakub
largely covers the per-link requirement on congestion-control,
this patchset is really not that useful.
So I am withdrawing this patchset and closing all of its threads.
Finally, many thanks to all of you for reviewing it.
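
For readers following the thread, the difference between the host-wide
default and a per-socket setting can be sketched from userspace. This is
only an illustration, assuming Linux and Python 3.6 or later; the helper
names (system_default_cc, get_cc, set_cc) are hypothetical and are not part
of the patchset, which worked inside the kernel rather than through
setsockopt(). Which algorithms the kernel accepts depends on the loaded
modules and on the caller's privileges, so the sketch reports a rejected
name instead of assuming success.

```python
import socket

# Host-wide default, the knob discussed above. Writing to it affects every
# future TCP socket that does not pick an algorithm explicitly.
PROC_DEFAULT = "/proc/sys/net/ipv4/tcp_congestion_control"


def system_default_cc(path=PROC_DEFAULT):
    """Return the host-wide default congestion-control algorithm name."""
    with open(path) as f:
        return f.read().strip()


def get_cc(sock):
    """Return the congestion-control algorithm used by this socket."""
    # The kernel returns a NUL-padded name of at most 16 bytes.
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\0", 1)[0].decode()


def set_cc(sock, name):
    """Try to set the algorithm on this socket only.

    Returns True on success and False if the kernel rejects the name
    (for example, module not available or not permitted for this user).
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
                        name.encode())
        return True
    except OSError:
        return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("host default:", system_default_cc())
    print("socket before:", get_cc(s))
    # "dctcp" may be rejected on hosts where it is not loaded or allowed.
    print("dctcp accepted:", set_cc(s, "dctcp"))
    print("socket after:", get_cc(s))
    s.close()
```

Only the one socket is changed here; the host-wide default and all other
sockets keep whatever they had.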
Thread overview: 20+ messages
2022-03-11 10:34 [PATCH v2 1/3] tcp: export symbol tcp_set_congestion_control Mingbao Sun
2022-03-11 10:34 ` [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control Mingbao Sun
2022-03-13 11:40 ` Sagi Grimberg
2022-03-14 1:34 ` Mingbao Sun
2022-03-25 12:11 ` Mingbao Sun
2022-03-25 13:44 ` Sagi Grimberg
2022-03-29 2:48 ` Mingbao Sun
2022-03-29 4:33 ` Jakub Kicinski
2022-03-30 7:31 ` Mingbao Sun
2022-03-29 7:46 ` Sagi Grimberg
2022-03-30 7:57 ` Mingbao Sun
2022-03-30 10:27 ` Mingbao Sun
2022-03-31 3:26 ` Mingbao Sun [this message]
2022-03-31 5:33 ` Mingbao Sun
2022-03-25 12:44 ` Mingbao Sun
2022-03-25 14:11 ` Mingbao Sun
2022-03-25 14:46 ` Mingbao Sun
2022-03-14 7:19 ` Christoph Hellwig
2022-03-11 10:34 ` [PATCH v2 3/3] nvmet-tcp: " Mingbao Sun
2022-03-13 11:44 ` Sagi Grimberg