* [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
@ 2022-03-09 5:37 Mingbao Sun
2022-03-09 6:02 ` Mingbao Sun
2022-03-09 6:15 ` Christoph Hellwig
0 siblings, 2 replies; 9+ messages in thread
From: Mingbao Sun @ 2022-03-09 5:37 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, linux-nvme, linux-kernel
Cc: sunmingbao, tyler.sun, ping.gan, yanxiu.cai, libin.zhang, ao.sun
From: Mingbao Sun <tyler.sun@dell.com>
Congestion control can have a noticeable impact on the
performance of TCP-based communication. This is of course also
true of NVMe over TCP.
Different congestion-control algorithms (e.g., cubic, dctcp) suit
different scenarios. Choosing a suitable algorithm benefits
performance; choosing an unsuitable one can degrade it badly.
The congestion control used by NVMe over TCP can be changed by
writing '/proc/sys/net/ipv4/tcp_congestion_control', but this also
changes the congestion control of all future TCP sockets that
have not been explicitly assigned one, and so may affect
their performance.
So it makes sense for NVMe over TCP to support specifying the
congestion control itself. This commit addresses the target side.
Implementation approach:
The following new configfs attribute lets the user specify the
congestion control for each nvmet port:
'/sys/kernel/config/nvmet/ports/X/tcp_congestion'
Later, in nvmet_tcp_add_port, the specified congestion control
is applied to the listening socket of the nvmet port.
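For illustration only, the parsing done by the new store handler can be
modeled in user space. This is a minimal sketch, not kernel code:
parse_ca_name is a hypothetical name, and malloc stands in for
kmemdup_nul. It takes the text up to the first newline, rejects an empty
name, and does not check whether the name is a registered algorithm.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * User-space model (hypothetical helper) of the parsing step in
 * nvmet_tcp_congestion_store(): keep everything up to the first
 * newline, reject an empty name, and return a NUL-terminated copy.
 *
 * Returns 0 and stores the copy in *out (the caller frees it), or a
 * negative errno value, in which case *out is left untouched.
 */
int parse_ca_name(const char *page, char **out)
{
	size_t len = strcspn(page, "\n");
	char *buf;

	if (len == 0)
		return -EINVAL;

	buf = malloc(len + 1);		/* stands in for kmemdup_nul() */
	if (!buf)
		return -ENOMEM;

	memcpy(buf, page, len);
	buf[len] = '\0';
	*out = buf;
	return 0;
}
```

In this model "dctcp\n" yields "dctcp", a lone newline yields -EINVAL,
and an unknown name is accepted unchanged, since validation is left to
the point where the socket option is set.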
Signed-off-by: Mingbao Sun <tyler.sun@dell.com>
---
drivers/nvme/target/configfs.c | 37 ++++++++++++++++++++++++++++++++++
drivers/nvme/target/nvmet.h | 1 +
drivers/nvme/target/tcp.c | 27 +++++++++++++++++++++++++
3 files changed, 65 insertions(+)
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 091a0ca16361..644e89bb0ee9 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -222,6 +222,41 @@ static ssize_t nvmet_addr_trsvcid_store(struct config_item *item,
CONFIGFS_ATTR(nvmet_, addr_trsvcid);
+static ssize_t nvmet_tcp_congestion_show(struct config_item *item,
+ char *page)
+{
+ struct nvmet_port *port = to_nvmet_port(item);
+
+ return snprintf(page, PAGE_SIZE, "%s\n",
+ port->tcp_congestion ? port->tcp_congestion : "");
+}
+
+static ssize_t nvmet_tcp_congestion_store(struct config_item *item,
+ const char *page, size_t count)
+{
+ struct nvmet_port *port = to_nvmet_port(item);
+ int len;
+ char *buf;
+
+ len = strcspn(page, "\n");
+ if (!len)
+ return -EINVAL;
+
+ if (nvmet_is_port_enabled(port, __func__))
+ return -EACCES;
+
+ buf = kmemdup_nul(page, len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ kfree(port->tcp_congestion);
+ port->tcp_congestion = buf;
+
+ return count;
+}
+
+CONFIGFS_ATTR(nvmet_, tcp_congestion);
+
static ssize_t nvmet_param_inline_data_size_show(struct config_item *item,
char *page)
{
@@ -1597,6 +1632,7 @@ static void nvmet_port_release(struct config_item *item)
list_del(&port->global_entry);
kfree(port->ana_state);
+ kfree(port->tcp_congestion);
kfree(port);
}
@@ -1605,6 +1641,7 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
&nvmet_attr_addr_treq,
&nvmet_attr_addr_traddr,
&nvmet_attr_addr_trsvcid,
+ &nvmet_attr_tcp_congestion,
&nvmet_attr_addr_trtype,
&nvmet_attr_param_inline_data_size,
#ifdef CONFIG_BLK_DEV_INTEGRITY
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 69637bf8f8e1..76a57c4c3456 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -145,6 +145,7 @@ struct nvmet_port {
struct config_group ana_groups_group;
struct nvmet_ana_group ana_default_group;
enum nvme_ana_state *ana_state;
+ const char *tcp_congestion;
void *priv;
bool enabled;
int inline_data_size;
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 83ca577f72be..3b72e782c901 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1657,8 +1657,10 @@ static void nvmet_tcp_accept_work(struct work_struct *w)
struct nvmet_tcp_port *port =
container_of(w, struct nvmet_tcp_port, accept_work);
struct socket *newsock;
+ struct inet_connection_sock *icsk, *icsk_new;
int ret;
+ icsk = inet_csk(port->sock->sk);
while (true) {
ret = kernel_accept(port->sock, &newsock, O_NONBLOCK);
if (ret < 0) {
@@ -1666,6 +1668,16 @@ static void nvmet_tcp_accept_work(struct work_struct *w)
pr_warn("failed to accept err=%d\n", ret);
return;
}
+
+ if (port->nport->tcp_congestion) {
+ icsk_new = inet_csk(newsock->sk);
+ if (icsk_new->icsk_ca_ops != icsk->icsk_ca_ops) {
+ pr_warn("congestion abnormal: expected %s, actual %s.\n",
+ icsk->icsk_ca_ops->name,
+ icsk_new->icsk_ca_ops->name);
+ }
+ }
+
ret = nvmet_tcp_alloc_queue(port, newsock);
if (ret) {
pr_err("failed to allocate queue\n");
@@ -1693,6 +1705,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
{
struct nvmet_tcp_port *port;
__kernel_sa_family_t af;
+ char ca_name[TCP_CA_NAME_MAX];
+ sockptr_t optval;
int ret;
port = kzalloc(sizeof(*port), GFP_KERNEL);
@@ -1741,6 +1755,19 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
if (so_priority > 0)
sock_set_priority(port->sock->sk, so_priority);
+ if (nport->tcp_congestion) {
+ strncpy(ca_name, nport->tcp_congestion, TCP_CA_NAME_MAX-1);
+ optval = KERNEL_SOCKPTR(ca_name);
+ ret = sock_common_setsockopt(port->sock, IPPROTO_TCP,
+ TCP_CONGESTION, optval,
+ strlen(ca_name));
+ if (ret) {
+ pr_err("failed to set port socket's congestion to %s: %d\n",
+ ca_name, ret);
+ goto err_sock;
+ }
+ }
+
ret = kernel_bind(port->sock, (struct sockaddr *)&port->addr,
sizeof(port->addr));
if (ret) {
--
2.26.2
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-09 5:37 [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control Mingbao Sun
@ 2022-03-09 6:02 ` Mingbao Sun
2022-03-09 6:15 ` Christoph Hellwig
1 sibling, 0 replies; 9+ messages in thread
From: Mingbao Sun @ 2022-03-09 6:02 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, linux-nvme, linux-kernel
Cc: tyler.sun, ping.gan, yanxiu.cai, libin.zhang, ao.sun
Patch v1 for the target side also added calls to networking APIs
in the generic part (configfs.c) to validate the congestion control
specified by the user.
Following Christoph Hellwig's comments on
‘[PATCH 2/2] nvme-tcp: support specifying the congestion-control’
(the patch for the host side), these calls have been removed here.
The tcp_congestion value specified by the user is still checked
later by sock_common_setsockopt in nvmet_tcp_add_port, so removing
them has little downside.
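The deferred check can be observed from user space, where the same
TCP_CONGESTION socket option is available through setsockopt(). The
sketch below is an illustration under assumptions, not the patch's code:
set_ca_name is a hypothetical helper, and it relies on Linux's
<netinet/tcp.h> defining TCP_CONGESTION.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: try to select a congestion-control algorithm on
 * one TCP socket. Returns 0 on success or a negative errno value. The
 * kernel, not this helper, decides whether the name is acceptable.
 */
int set_ca_name(int fd, const char *name)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
		       (socklen_t)strlen(name)) < 0)
		return -errno;
	return 0;
}
```

Selecting "reno" is expected to succeed on a Linux TCP socket, while an
unregistered name is expected to fail, typically with ENOENT. Only the
socket passed in is affected; the system-wide default is unchanged.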
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-09 5:37 [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control Mingbao Sun
2022-03-09 6:02 ` Mingbao Sun
@ 2022-03-09 6:15 ` Christoph Hellwig
2022-03-09 9:52 ` Mingbao Sun
1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2022-03-09 6:15 UTC (permalink / raw)
To: Mingbao Sun
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, linux-nvme, linux-kernel, tyler.sun, ping.gan,
yanxiu.cai, libin.zhang, ao.sun
On Wed, Mar 09, 2022 at 01:37:11PM +0800, Mingbao Sun wrote:
> + if (port->nport->tcp_congestion) {
> + icsk_new = inet_csk(newsock->sk);
> + if (icsk_new->icsk_ca_ops != icsk->icsk_ca_ops) {
> + pr_warn("congestion abnormal: expected %s, actual %s.\n",
> + icsk->icsk_ca_ops->name,
> + icsk_new->icsk_ca_ops->name);
> + }
> + }
What is the point of having this code?
> + if (nport->tcp_congestion) {
> + strncpy(ca_name, nport->tcp_congestion, TCP_CA_NAME_MAX-1);
> + optval = KERNEL_SOCKPTR(ca_name);
> + ret = sock_common_setsockopt(port->sock, IPPROTO_TCP,
> + TCP_CONGESTION, optval,
> + strlen(ca_name));
> + if (ret) {
> + pr_err("failed to set port socket's congestion to %s: %d\n",
> + ca_name, ret);
> + goto err_sock;
> + }
> + }
Same comment as for the host side.
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-09 6:15 ` Christoph Hellwig
@ 2022-03-09 9:52 ` Mingbao Sun
2022-03-10 8:38 ` Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: Mingbao Sun @ 2022-03-09 9:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Jens Axboe, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel, tyler.sun, ping.gan, yanxiu.cai,
libin.zhang, ao.sun
On Wed, 9 Mar 2022 07:15:41 +0100
Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Mar 09, 2022 at 01:37:11PM +0800, Mingbao Sun wrote:
> > + if (port->nport->tcp_congestion) {
> > + icsk_new = inet_csk(newsock->sk);
> > + if (icsk_new->icsk_ca_ops != icsk->icsk_ca_ops) {
> > + pr_warn("congestion abnormal: expected %s, actual %s.\n",
> > + icsk->icsk_ca_ops->name,
> > + icsk_new->icsk_ca_ops->name);
> > + }
> > + }
>
> What is the point of having this code?
Well, a mismatch can happen in certain circumstances.
Take the result from my test as an example:
- The congestion control of the target's listening socket was set to
‘dctcp’.
- But the congestion control of the host-side socket was set to
‘cubic’.
- Then the congestion control of the new connection's socket on the
target side was automatically changed to ‘dctcp-reno’.
If tcp_congestion was explicitly set for the target, we can assume
that the user cares a great deal about performance.
So we had better make users aware that the system is not working
the way they expect.
That is why the check and warning were added here.
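The comparison described above can be approximated from user space with
getsockopt(TCP_CONGESTION). This is a sketch under assumptions, not the
patch's code: the kernel hunk compares icsk_ca_ops pointers, while this
version compares algorithm names. get_ca_name, same_ca and CA_NAME_MAX
are hypothetical names, and Linux's <netinet/tcp.h> is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#define CA_NAME_MAX 16	/* assumed to mirror the kernel's TCP_CA_NAME_MAX */

/* Read a TCP socket's congestion-control name. Returns 0 or -errno. */
int get_ca_name(int fd, char name[CA_NAME_MAX])
{
	socklen_t len = CA_NAME_MAX;

	memset(name, 0, CA_NAME_MAX);
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &len) < 0)
		return -errno;
	name[CA_NAME_MAX - 1] = '\0';	/* defensive termination */
	return 0;
}

/*
 * Returns 1 if both sockets report the same algorithm name, 0 if the
 * names differ, or a negative errno value if either lookup fails.
 */
int same_ca(int listen_fd, int conn_fd)
{
	char a[CA_NAME_MAX], b[CA_NAME_MAX];
	int ret;

	ret = get_ca_name(listen_fd, a);
	if (ret)
		return ret;
	ret = get_ca_name(conn_fd, b);
	if (ret)
		return ret;
	return strcmp(a, b) == 0;
}
```

A caller would pass the listening descriptor and the descriptor returned
by accept(), and treat a result of 0 as the mismatch discussed here.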
>
> > + if (nport->tcp_congestion) {
> > + strncpy(ca_name, nport->tcp_congestion, TCP_CA_NAME_MAX-1);
> > + optval = KERNEL_SOCKPTR(ca_name);
> > + ret = sock_common_setsockopt(port->sock, IPPROTO_TCP,
> > + TCP_CONGESTION, optval,
> > + strlen(ca_name));
> > + if (ret) {
> > + pr_err("failed to set port socket's congestion to %s: %d\n",
> > + ca_name, ret);
> > + goto err_sock;
> > + }
> > + }
>
> Same comment as for the host side.
This will be handled the same way as on the host side in the next version.
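Separately from the review comment (which refers to the host-side thread
and is not reproduced here), note that strncpy() writes no terminating
NUL when the source is at least as long as the count, and ca_name in the
quoted hunk is an uninitialized stack buffer. Below is a minimal
user-space sketch of a copy that refuses over-long names instead.
copy_ca_name and CA_NAME_MAX are hypothetical, with CA_NAME_MAX assumed
to mirror TCP_CA_NAME_MAX (16).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define CA_NAME_MAX 16	/* assumed to mirror the kernel's TCP_CA_NAME_MAX */

/*
 * Hypothetical helper: copy a congestion-control name into a
 * fixed-size buffer. Returns 0 on success, or -E2BIG (leaving dst
 * untouched) when the name plus its terminating NUL does not fit.
 */
int copy_ca_name(char dst[CA_NAME_MAX], const char *src)
{
	size_t len = strlen(src);

	if (len >= CA_NAME_MAX)
		return -E2BIG;

	memcpy(dst, src, len + 1);	/* len + 1 copies the NUL as well */
	return 0;
}
```

In kernel code, strscpy() provides a similar always-terminated copy and
reports truncation with -E2BIG.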
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-09 9:52 ` Mingbao Sun
@ 2022-03-10 8:38 ` Christoph Hellwig
2022-03-10 11:06 ` Mingbao Sun
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2022-03-10 8:38 UTC (permalink / raw)
To: Mingbao Sun
Cc: Christoph Hellwig, Keith Busch, Jens Axboe, Sagi Grimberg,
Chaitanya Kulkarni, linux-nvme, linux-kernel, tyler.sun, ping.gan,
yanxiu.cai, libin.zhang, ao.sun
On Wed, Mar 09, 2022 at 05:52:03PM +0800, Mingbao Sun wrote:
> On Wed, 9 Mar 2022 07:15:41 +0100
> Christoph Hellwig <hch@lst.de> wrote:
>
> > On Wed, Mar 09, 2022 at 01:37:11PM +0800, Mingbao Sun wrote:
> > > + if (port->nport->tcp_congestion) {
> > > + icsk_new = inet_csk(newsock->sk);
> > > + if (icsk_new->icsk_ca_ops != icsk->icsk_ca_ops) {
> > > + pr_warn("congestion abnormal: expected %s, actual %s.\n",
> > > + icsk->icsk_ca_ops->name,
> > > + icsk_new->icsk_ca_ops->name);
> > > + }
> > > + }
> >
> > What is the point of having this code?
>
> Well, a mismatch can happen in certain circumstances.
> Take the result from my test as an example:
>
> - The congestion control of the target's listening socket was set to
> ‘dctcp’.
>
> - But the congestion control of the host-side socket was set to
> ‘cubic’.
>
> - Then the congestion control of the new connection's socket on the
> target side was automatically changed to ‘dctcp-reno’.
>
> If tcp_congestion was explicitly set for the target, we can assume
> that the user cares a great deal about performance.
> So we had better make users aware that the system is not working
> the way they expect.
A warning message really seems very severe for a condition like this.
Maybe the better interface is a way to figure out which congestion
control algorithm is in use by reading a sysfs file.
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-10 8:38 ` Christoph Hellwig
@ 2022-03-10 11:06 ` Mingbao Sun
2022-03-10 11:35 ` Mingbao Sun
2022-03-10 14:20 ` Christoph Hellwig
0 siblings, 2 replies; 9+ messages in thread
From: Mingbao Sun @ 2022-03-10 11:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Jens Axboe, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel, tyler.sun, ping.gan, yanxiu.cai,
libin.zhang, ao.sun
On Thu, 10 Mar 2022 09:38:11 +0100
Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Mar 09, 2022 at 05:52:03PM +0800, Mingbao Sun wrote:
> > On Wed, 9 Mar 2022 07:15:41 +0100
> > Christoph Hellwig <hch@lst.de> wrote:
> >
> > > On Wed, Mar 09, 2022 at 01:37:11PM +0800, Mingbao Sun wrote:
> > > > + if (port->nport->tcp_congestion) {
> > > > + icsk_new = inet_csk(newsock->sk);
> > > > + if (icsk_new->icsk_ca_ops != icsk->icsk_ca_ops) {
> > > > + pr_warn("congestion abnormal: expected %s, actual %s.\n",
> > > > + icsk->icsk_ca_ops->name,
> > > > + icsk_new->icsk_ca_ops->name);
> > > > + }
> > > > + }
> > >
> > > What is the point of having this code?
> >
> > Well, a mismatch can happen in certain circumstances.
> > Take the result from my test as an example:
> >
> > - The congestion control of the target's listening socket was set to
> > ‘dctcp’.
> >
> > - But the congestion control of the host-side socket was set to
> > ‘cubic’.
> >
> > - Then the congestion control of the new connection's socket on the
> > target side was automatically changed to ‘dctcp-reno’.
> >
> > If tcp_congestion was explicitly set for the target, we can assume
> > that the user cares a great deal about performance.
> > So we had better make users aware that the system is not working
> > the way they expect.
>
> A warning message really seems very severe for a condition like this.
> Maybe the better interface is a way to figure out which congestion
> control algorithm is in use by reading a sysfs file.
Well, a target could have a great number of TCP sockets.
I feel it’s not appropriate to create a sysfs entry for each socket.
And for those sockets that show no congestion-control mismatch,
it’s merely a waste of resources.
Also, since these sockets are created and destroyed dynamically, the
info exported via sysfs may not even get a chance to be seen by
the user.
Anyway, if you insist that the check and warning here are not
appropriate, I can remove them.
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-10 11:06 ` Mingbao Sun
@ 2022-03-10 11:35 ` Mingbao Sun
2022-03-10 14:20 ` Christoph Hellwig
1 sibling, 0 replies; 9+ messages in thread
From: Mingbao Sun @ 2022-03-10 11:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Jens Axboe, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel, tyler.sun, ping.gan, yanxiu.cai,
libin.zhang, ao.sun
On Thu, 10 Mar 2022 19:06:36 +0800
Mingbao Sun <sunmingbao@tom.com> wrote:
> On Thu, 10 Mar 2022 09:38:11 +0100
> Christoph Hellwig <hch@lst.de> wrote:
>
> > On Wed, Mar 09, 2022 at 05:52:03PM +0800, Mingbao Sun wrote:
> > > On Wed, 9 Mar 2022 07:15:41 +0100
> > > Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > > On Wed, Mar 09, 2022 at 01:37:11PM +0800, Mingbao Sun wrote:
> > > > > + if (port->nport->tcp_congestion) {
> > > > > + icsk_new = inet_csk(newsock->sk);
> > > > > + if (icsk_new->icsk_ca_ops != icsk->icsk_ca_ops) {
> > > > > + pr_warn("congestion abnormal: expected %s, actual %s.\n",
> > > > > + icsk->icsk_ca_ops->name,
> > > > > + icsk_new->icsk_ca_ops->name);
> > > > > + }
> > > > > + }
> > > >
> > > > What is the point of having this code?
> > >
> > > Well, a mismatch can happen in certain circumstances.
> > > Take the result from my test as an example:
> > >
> > > - The congestion control of the target's listening socket was set to
> > > ‘dctcp’.
> > >
> > > - But the congestion control of the host-side socket was set to
> > > ‘cubic’.
> > >
> > > - Then the congestion control of the new connection's socket on the
> > > target side was automatically changed to ‘dctcp-reno’.
> > >
> > > If tcp_congestion was explicitly set for the target, we can assume
> > > that the user cares a great deal about performance.
> > > So we had better make users aware that the system is not working
> > > the way they expect.
> >
> > A warning message really seems very severe for a condition like this.
> > Maybe the better interface is a way to figure out which congestion
> > control algorithm is in use by reading a sysfs file.
>
> Well, a target could have a great number of TCP sockets.
>
> I feel it’s not appropriate to create a sysfs entry for each socket.
> And for those sockets that show no congestion-control mismatch,
> it’s merely a waste of resources.
>
> Also, since these sockets are created and destroyed dynamically, the
> info exported via sysfs may not even get a chance to be seen by
> the user.
>
> Anyway, if you insist that the check and warning here are not
> appropriate, I can remove them.
How about replacing pr_warn with pr_warn_once?
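For reference, the once-only behaviour asked about can be modeled in
user space. This is a minimal sketch, assuming a per-call-site static
flag in the spirit of pr_warn_once. warn_once_sketch,
report_ca_mismatch and warnings_emitted are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

int warnings_emitted;	/* counts messages actually printed */

/* Print the message only the first time this call site is reached. */
#define warn_once_sketch(...)					\
	do {							\
		static bool warned_;				\
		if (!warned_) {					\
			warned_ = true;				\
			warnings_emitted++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* Report a congestion-control mismatch, at most once per process. */
void report_ca_mismatch(const char *expected, const char *actual)
{
	warn_once_sketch("congestion mismatch: expected %s, actual %s\n",
			 expected, actual);
}
```

One consequence of this design: after the first mismatch is reported,
later mismatches on other connections or ports are silent as well.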
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-10 11:06 ` Mingbao Sun
2022-03-10 11:35 ` Mingbao Sun
@ 2022-03-10 14:20 ` Christoph Hellwig
2022-03-10 15:13 ` Mingbao Sun
1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2022-03-10 14:20 UTC (permalink / raw)
To: Mingbao Sun
Cc: Christoph Hellwig, Keith Busch, Jens Axboe, Sagi Grimberg,
Chaitanya Kulkarni, linux-nvme, linux-kernel, tyler.sun, ping.gan,
yanxiu.cai, libin.zhang, ao.sun
On Thu, Mar 10, 2022 at 07:06:36PM +0800, Mingbao Sun wrote:
> I feel it’s not appropriate to create a sysfs entry for each socket.
> And for those sockets that show no congestion-control mismatch,
> it’s merely a waste of resources.
>
> Also, since these sockets are created and destroyed dynamically, the
> info exported via sysfs may not even get a chance to be seen by
> the user.
>
> Anyway, if you insist that the check and warning here are not
> appropriate, I can remove them.
Something that can happen during normal operation is by definition not
something that should be warned about.
* Re: [PATCH v2 1/2] nvmet-tcp: support specifying the congestion-control
2022-03-10 14:20 ` Christoph Hellwig
@ 2022-03-10 15:13 ` Mingbao Sun
0 siblings, 0 replies; 9+ messages in thread
From: Mingbao Sun @ 2022-03-10 15:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Jens Axboe, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel, tyler.sun, ping.gan, yanxiu.cai,
libin.zhang, ao.sun
On Thu, 10 Mar 2022 15:20:34 +0100
Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Mar 10, 2022 at 07:06:36PM +0800, Mingbao Sun wrote:
> > I feel it’s not appropriate to create a sysfs entry for each socket.
> > And for those sockets that show no congestion-control mismatch,
> > it’s merely a waste of resources.
> >
> > Also, since these sockets are created and destroyed dynamically, the
> > info exported via sysfs may not even get a chance to be seen by
> > the user.
> >
> > Anyway, if you insist that the check and warning here are not
> > appropriate, I can remove them.
>
> Something that can happen during normal operation is by definition not
> something that should be warned about.
Got it.
Will remove this check and warning in the next version.