From: Peilin Ye <yepeilin.cs@gmail.com>
To: Dave Taht <dave.taht@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Peilin Ye <peilin.ye@bytedance.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Cong Wang <cong.wang@bytedance.com>
Subject: Re: [PATCH RFC v1 net-next 1/4] net: Introduce Qdisc backpressure infrastructure
Date: Mon, 9 May 2022 19:23:50 -0700
Message-ID: <20220510022350.GA4619@bytedance>
In-Reply-To: <CAA93jw4dFxwWCrhv98wwbPvM+UrAQKYRNbbSVp3UCp1zOnsD5w@mail.gmail.com>

Hi Dave,

On Mon, May 09, 2022 at 12:53:28AM -0700, Dave Taht wrote:
> I am very pleased to see this work.

Thanks!

> However,  my "vision" such as it was, and as misguided as it might be,
> was to implement a facility similar to tcp_notsent_lowat for udp
> packets, tracking the progress of the udp packet through the kernel,
> and supplying backpressure and providing better information about
> where when and why the packet was dropped in the stack back to the
> application.

By "a facility similar to tcp_notsent_lowat", do you mean a smaller
sk_sndbuf, or "UDP Small Queues"?

I don't fully understand the implications of using a smaller sk_sndbuf
yet, but I think it can work together with this RFC.

sk_sndbuf is a per-socket attribute, while this RFC approaches the
problem from the Qdisc's perspective.  Shrinking sk_sndbuf alone does
not prevent the "when UDP sends faster, TBF simply drops faster" issue
(described in [I] of the cover letter) from happening.  There is always
a point where there are too many sockets, and TBF's queue can no longer
hold roughly "sk_sndbuf times the number of sockets" bytes of skbs.
Beyond that point, TBF suddenly starts dropping heavily.

For example, the test setup ([V] in the cover letter) used the default
sk_sndbuf of 212992 bytes (/proc/sys/net/core/wmem_default).  Let's
shrink it to one tenth of that, 21299 bytes.  This works well for the
2-client setup: zero packets dropped.  However, testing it with 15
iperf2 clients:

[  3]  0.0-30.0 sec  46.4 MBytes  13.0 Mbits/sec   1.251 ms 89991/123091 (73%)
[  3]  0.0-30.0 sec  46.6 MBytes  13.0 Mbits/sec   2.033 ms 91204/124464 (73%)
[  3]  0.0-30.0 sec  46.5 MBytes  13.0 Mbits/sec   0.504 ms 89879/123054 (73%)
<...>                                                       ^^^^^^^^^^^^ ^^^^^

73% drop rate again.  Now apply this RFC:

[  3]  0.0-30.0 sec  46.3 MBytes  12.9 Mbits/sec   1.206 ms  807/33839 (2.4%)
[  3]  0.0-30.0 sec  45.5 MBytes  12.7 Mbits/sec   1.919 ms  839/33283 (2.5%)
[  3]  0.0-30.0 sec  45.8 MBytes  12.8 Mbits/sec   2.521 ms  837/33508 (2.5%)
<...>                                                        ^^^^^^^^^ ^^^^^^

Down to about 2.5% again.

Next, same 21299 sk_sndbuf, 20 iperf2 clients, without RFC:

[  3]  0.0-30.0 sec  34.5 MBytes  9.66 Mbits/sec   1.054 ms 258703/283342 (91%)
[  3]  0.0-30.0 sec  34.5 MBytes  9.66 Mbits/sec   1.033 ms 257324/281964 (91%)
[  3]  0.0-30.0 sec  34.5 MBytes  9.66 Mbits/sec   1.116 ms 257858/282500 (91%)
<...>                                                       ^^^^^^^^^^^^^ ^^^^^

91% drop rate.  Finally, apply RFC:

[  3]  0.0-30.0 sec  34.4 MBytes  9.61 Mbits/sec   0.974 ms 7982/32503 (25%)
[  3]  0.0-30.0 sec  34.1 MBytes  9.54 Mbits/sec   1.381 ms 7394/31732 (23%)
[  3]  0.0-30.0 sec  34.3 MBytes  9.58 Mbits/sec   2.431 ms 8149/32583 (25%)
<...>                                                       ^^^^^^^^^^ ^^^^^

The thundering herd problem ([III] in the cover letter) surfaces here,
but it is still an improvement.

In conclusion, assuming we are going to use a smaller sk_sndbuf or "UDP
Small Queues", I think it does not replace this RFC, and vice versa.

> I've been really impressed by the DROP_REASON work and had had no clue
> prior to seeing all that instrumentation, where else packets might be
> dropped in the kernel.
> 
> I'd be interested to see what happens with sch_cake.

Sure, I will cover sch_cake in v2.

Thanks,
Peilin Ye

