netdev.vger.kernel.org archive mirror
From: Mingbao Sun <sunmingbao@tom.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, tyler.sun@dell.com, ping.gan@dell.com,
	yanxiu.cai@dell.com, libin.zhang@dell.com, ao.sun@dell.com
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
Date: Tue, 29 Mar 2022 10:48:06 +0800
Message-ID: <20220329104806.00000126@tom.com>
In-Reply-To: <b7b5106a-9c0d-db49-00ab-234756955de8@grimberg.me>

> As I said, TCP can be tuned in various ways, congestion being just one
> of them. I'm sure you can find a workload where rmem/wmem will make
> a difference.

Agreed.
But the rmem/wmem knobs differ in one respect:
we can enlarge rmem/wmem for NVMe/TCP via sysctl,
and doing so brings no downside to other sockets whose
rmem/wmem are not explicitly specified.

> In addition, based on my knowledge, application specific TCP level
> tuning (like congestion) is not really a common thing to do. So why in
> nvme-tcp?
> 
> So to me at least, it is not clear why we should add it to the driver.

As mentioned in the commit message, we can specify the
congestion-control of NVMe/TCP via sysctl or by writing
'/proc/sys/net/ipv4/tcp_congestion_control', but this also
changes the congestion-control of all future TCP sockets on
the same host that have not been explicitly assigned a
congestion-control, and thus may impact their performance.
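
To illustrate the difference between the host-wide default and a
per-socket choice, here is a minimal userspace sketch in Python. It is
only an analogy: the patch does this inside the kernel for the
nvme-tcp socket, while the sketch uses the TCP_CONGESTION socket
option from userspace. The helper names are made up for illustration,
and "reno" is used only because it is normally available without
loading extra modules (Linux only).

```python
import socket

# TCP_CA_NAME_MAX in the kernel is 16 bytes, so 16 is enough for any name.
CA_NAME_MAX = 16


def get_congestion_control(sock):
    """Return the congestion-control name currently used by one socket."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, CA_NAME_MAX)
    return raw.split(b"\x00", 1)[0].decode()


def set_congestion_control(sock, name):
    """Select the algorithm for this socket only (hypothetical helper).

    The host-wide default in /proc/sys/net/ipv4/tcp_congestion_control
    is not modified, so other sockets keep whatever they had.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())


if __name__ == "__main__":
    back_end = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    front_end = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Only the "back-end" socket is changed; the other keeps the default.
    set_congestion_control(back_end, "reno")

    print("back-end :", get_congestion_control(back_end))
    print("front-end:", get_congestion_control(front_end))

    back_end.close()
    front_end.close()
```

With a per-socket setting like this, the front-end socket keeps the
host default, which is what a per-controller option in nvme-tcp is
meant to achieve for the back-end traffic.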

For example:

A server in a data-center with the following 2 NICs:

    - NIC_front-end, for interacting with clients through WAN
      (high latency, ms-level)

    - NIC_back-end, for interacting with NVMe/TCP target through LAN
      (low latency, ECN-enabled, ideal for dctcp)

This server interacts with clients (handling requests) via the front-end
network and accesses the NVMe/TCP storage via the back-end network.
This is a normal use case, right?

For the client devices, we can’t determine their congestion-control.
But normally it’s cubic by default (per CONFIG_DEFAULT_TCP_CONG).
So if we change the default congestion-control on the server to dctcp
for the sake of the NVMe/TCP traffic on the LAN side, it could at the
same time change the congestion-control of the front-end sockets
to dctcp, while the congestion-control on the client side is cubic.
This is an unintended scenario.

In addition, distributed storage products like the following also have
the above problem:

    - The product consists of a cluster of servers.

    - Each server serves clients via its front-end NIC
      (WAN, high latency).

    - All servers interact with each other via NVMe/TCP over the back-end NIC
      (LAN, low latency, ECN-enabled, ideal for dctcp).
