From: Peilin Ye <yepeilin.cs@gmail.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>
Cc: Peilin Ye <peilin.ye@bytedance.com>,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Cong Wang <cong.wang@bytedance.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Dave Taht <dave.taht@gmail.com>,
	Peilin Ye <yepeilin.cs@gmail.com>
Subject: [PATCH RFC v2 net-next 0/5] net: Qdisc backpressure infrastructure
Date: Mon, 22 Aug 2022 02:10:17 -0700
Message-ID: <cover.1661158173.git.peilin.ye@bytedance.com>
In-Reply-To: <cover.1651800598.git.peilin.ye@bytedance.com>

From: Peilin Ye <peilin.ye@bytedance.com>

Hi all,

Currently, sockets (especially UDP ones) can drop a lot of packets at TC
egress when rate limited by shaper Qdiscs like HTB.  This patch series
tries to solve this by introducing a Qdisc backpressure mechanism.

RFC v1 [1] used a throttle & unthrottle approach, which introduced several
issues, including a thundering herd problem and a socket reference count
issue [2].  This RFC v2 uses a different approach to avoid those issues:

  1. When a shaper Qdisc drops a packet that belongs to a local socket due
     to TC egress congestion, we make part of the socket's sndbuf
     temporarily unavailable, so it sends slower.
  
  2. Later, when TC egress becomes idle again, we gradually recover the
     socket's sndbuf back to normal.  Patch 2 implements this step using a
     timer for UDP sockets.
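
As a rough illustration of the two steps above, here is a userspace sketch.
It is not the code from patches 1-2.  The struct, the function names, the
"halve what is left" policy and the fixed recovery step are all assumptions
made up for this example; see the individual patches for the real algorithm.

```c
#include <assert.h>

/* Hypothetical userspace model of a socket; not a kernel structure. */
struct model_sock {
	int sndbuf;	/* configured send buffer size, in bytes */
	int unavail;	/* bytes temporarily made unavailable by backpressure */
};

/* Send buffer space the socket is currently allowed to use. */
static int model_sndbuf_avail(const struct model_sock *sk)
{
	return sk->sndbuf - sk->unavail;
}

/*
 * Step 1: a shaper Qdisc dropped one of this socket's packets.
 * Assumed policy: take away half of whatever is still available.
 */
static void model_backpressure(struct model_sock *sk)
{
	sk->unavail += model_sndbuf_avail(sk) / 2;
}

/*
 * Step 2: one tick of the recovery timer.
 * Assumed policy: give back a fixed number of bytes per tick,
 * never more than was taken away.
 */
static void model_recover(struct model_sock *sk, int step)
{
	assert(step > 0);
	sk->unavail = sk->unavail > step ? sk->unavail - step : 0;
}
```

In this model, each drop shrinks the usable sndbuf so the sender blocks or
fails earlier, and repeated timer ticks without further drops bring it back
to the configured value.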

The thundering herd problem is avoided, since we no longer wake up all
throttled sockets at the same time in qdisc_watchdog().  The socket
reference count issue is also avoided, since we no longer maintain a socket
list on the Qdisc.

Performance is better than RFC v1.  There is one concern about fairness
between flows for the TBF Qdisc, which could be solved by using an SFQ
inner Qdisc.

Please see the individual patches for details and numbers.  Any comments
or suggestions would be much appreciated.  Thanks!

[1] https://lore.kernel.org/netdev/cover.1651800598.git.peilin.ye@bytedance.com/
[2] https://lore.kernel.org/netdev/20220506133111.1d4bebf3@hermes.local/

Peilin Ye (5):
  net: Introduce Qdisc backpressure infrastructure
  net/udp: Implement Qdisc backpressure algorithm
  net/sched: sch_tbf: Use Qdisc backpressure infrastructure
  net/sched: sch_htb: Use Qdisc backpressure infrastructure
  net/sched: sch_cbq: Use Qdisc backpressure infrastructure

 Documentation/networking/ip-sysctl.rst | 11 ++++
 include/linux/udp.h                    |  3 ++
 include/net/netns/ipv4.h               |  1 +
 include/net/sch_generic.h              | 11 ++++
 include/net/sock.h                     | 21 ++++++++
 include/net/udp.h                      |  1 +
 net/core/sock.c                        |  5 +-
 net/ipv4/sysctl_net_ipv4.c             |  7 +++
 net/ipv4/udp.c                         | 69 +++++++++++++++++++++++++-
 net/ipv6/udp.c                         |  2 +-
 net/sched/sch_cbq.c                    |  1 +
 net/sched/sch_htb.c                    |  2 +
 net/sched/sch_tbf.c                    |  2 +
 13 files changed, 132 insertions(+), 4 deletions(-)

-- 
2.20.1


Thread overview: 16+ messages
     [not found] <cover.1651800598.git.peilin.ye@bytedance.com>
2022-08-22  9:10 ` Peilin Ye [this message]
2022-08-22  9:11   ` [PATCH RFC v2 net-next 1/5] net: Introduce Qdisc backpressure infrastructure Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 2/5] net/udp: Implement Qdisc backpressure algorithm Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 3/5] net/sched: sch_tbf: Use Qdisc backpressure infrastructure Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 4/5] net/sched: sch_htb: " Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 5/5] net/sched: sch_cbq: " Peilin Ye
2022-08-22 16:17   ` [PATCH RFC v2 net-next 0/5] net: " Jakub Kicinski
2022-08-29 16:53     ` Cong Wang
2022-08-30  0:21       ` Jakub Kicinski
2022-09-19 17:00         ` Cong Wang
2022-08-22 16:22   ` Eric Dumazet
2022-08-29 16:47     ` Cong Wang
2022-08-29 16:53       ` Eric Dumazet
2022-09-19 17:06         ` Cong Wang
2022-08-30  2:28     ` Yafang Shao
2022-09-19 17:04       ` Cong Wang
