From: brakmo <brakmo@fb.com>
To: netdev <netdev@vger.kernel.org>
Cc: Martin Lau <kafai@fb.com>, Alexei Starovoitov <ast@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
Kernel Team <Kernel-team@fb.com>
Subject: [PATCH v3 bpf-next 0/6] bpf: Propagate cn to TCP
Date: Mon, 27 May 2019 20:49:01 -0700
Message-ID: <20190528034907.1957536-1-brakmo@fb.com>
This patchset adds support for propagating congestion notifications (cn)
to TCP from cgroup inet skb egress BPF programs.
Currently, cgroup skb BPF programs cannot trigger TCP congestion window
reductions, even when they drop a packet. This patchset adds support
for cgroup skb BPF programs to send congestion notifications in the
return value when the packets are TCP packets. Rather than the
current 1 for keeping the packet and 0 for dropping it, they can
now return:
    NET_XMIT_SUCCESS    (0)    - continue with packet output
    NET_XMIT_DROP       (1)    - drop packet and do cn
    NET_XMIT_CN         (2)    - continue with packet output and do cn
    -EPERM                     - drop packet
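
As an illustration of how a program's raw return value could map onto the
four outcomes listed above, here is a minimal sketch. It assumes a two-bit
encoding (low bit = keep the packet, next bit = congestion notification),
which this cover letter does not spell out; the patches themselves are
authoritative. The names PROG_KEEP_PKT, PROG_CN and egress_ret_to_xmit()
are hypothetical and exist only for this example.

```c
#include <errno.h>

/* Same numeric values as listed in the cover letter. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Hypothetical names for the two assumed bits of a program's return value. */
#define PROG_KEEP_PKT 0x1	/* assumed: set = keep packet, clear = drop */
#define PROG_CN       0x2	/* assumed: set = also signal congestion */

/*
 * Convert an egress program return value (0 to 3) into the code seen
 * by the caller. This is a sketch of the assumed mapping, not the
 * kernel's implementation.
 */
static int egress_ret_to_xmit(unsigned int prog_ret)
{
	int keep = prog_ret & PROG_KEEP_PKT;
	int cn = prog_ret & PROG_CN;

	if (keep)
		return cn ? NET_XMIT_CN : NET_XMIT_SUCCESS;
	/* Dropped packets: with cn the drop is reported, without it -EPERM. */
	return cn ? NET_XMIT_DROP : -EPERM;
}
```

Under this assumed encoding, existing programs that only return 0 or 1
keep their current behavior (drop without cn, or transmit without cn).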
Finally, HBM programs are modified to collect and return more
statistics.
There has been some discussion regarding the best place to manage
bandwidths. Some believe this should be done in the qdisc, where it can
also be managed with a BPF program. We believe there are advantages
to doing it with a BPF program in the cgroup/skb callback. For example,
it reduces overhead when there is one primary workload and
one or more secondary workloads, each running in its
own cgroup v2. In this scenario, we only need to throttle the secondary
workloads and there is no overhead for the primary workload since there
will be no BPF program attached to its cgroup.
Regardless, we agree that this mechanism should not penalize those that
are not using it. We tested this by doing 1-byte req/reply RPCs over
loopback. Each test consisted of 30 sec of back-to-back 1-byte RPCs.
Each test was repeated 50 times, with a 1 minute delay between each set
of 10. We then calculated the average RPCs/sec over the 50 tests. We
compared upstream against upstream + patchset with no BPF program, and
against upstream + patchset with a BPF program that just returns ALLOW_PKT.
Here are the results:
    upstream                              80937 RPCs/sec
    upstream + patches, no BPF program    80894 RPCs/sec
    upstream + patches, BPF program       80634 RPCs/sec
These numbers indicate that the patches add no significant penalty (both
results are within 0.5% of upstream).
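
To make the size of the differences concrete, the relative slowdown against
upstream can be computed directly from the three averages above. The helper
name pct_below_baseline() is hypothetical and only serves this example.

```c
/*
 * Percentage by which a measured rate falls below the baseline rate.
 * A positive result means "measured" is slower than "baseline".
 */
static double pct_below_baseline(double baseline, double measured)
{
	return (baseline - measured) / baseline * 100.0;
}
```

With the reported averages, pct_below_baseline(80937, 80894) is about 0.05
(patches, no BPF program) and pct_below_baseline(80937, 80634) is about 0.37
(patches, BPF program attached).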
The use of congestion notifications improves the performance of HBM when
using Cubic. Without congestion notifications, Cubic will not decrease its
cwnd and HBM will need to drop a large percentage of the packets.
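
The idea can be sketched as a credit-based limiter that asks for a
congestion notification before it has to drop. This is an illustrative
sketch only, not the code in samples/bpf/hbm_out_kern.c: the struct, the
thresholds, the verdict names and the 0 to 3 verdict encoding are all
assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical verdicts; numeric values follow an assumed 0..3 encoding. */
enum verdict { V_DROP = 0, V_PASS = 1, V_DROP_CN = 2, V_PASS_CN = 3 };

/* Hypothetical limiter state; credit is in bytes and may go negative. */
struct limiter {
	int64_t  credit;
	int64_t  max_credit;	/* upper bound on accumulated credit */
	uint64_t bytes_per_sec;	/* configured rate limit */
	int64_t  mark_thresh;	/* request cn once credit < -mark_thresh */
	int64_t  drop_thresh;	/* drop once credit < -drop_thresh */
	uint64_t last_ns;	/* time of the previous update */
};

static enum verdict limiter_verdict(struct limiter *l, uint64_t now_ns,
				    uint32_t pkt_len)
{
	uint64_t elapsed = now_ns - l->last_ns;

	/* Cap the interval so the multiplication below cannot overflow. */
	if (elapsed > 1000000000ULL)
		elapsed = 1000000000ULL;

	/* Earn credit for the elapsed time, up to max_credit. */
	l->credit += (int64_t)(elapsed * l->bytes_per_sec / 1000000000ULL);
	if (l->credit > l->max_credit)
		l->credit = l->max_credit;
	l->last_ns = now_ns;

	/* Charge the packet, then decide based on how far in debt we are. */
	l->credit -= pkt_len;
	if (l->credit < -l->drop_thresh) {
		l->credit += pkt_len;	/* packet is not sent: refund it */
		return V_DROP_CN;
	}
	if (l->credit < -l->mark_thresh)
		return V_PASS_CN;	/* send, but ask TCP to slow down */
	return V_PASS;
}
```

Because the cn verdict is reached before the drop threshold, a sender that
reacts to it can reduce its cwnd while packets are still being forwarded,
which is consistent with the lower drop rates reported for the cn rows
below.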
The following results were obtained with a rate limit of 1Gbps,
between two servers using netperf with a single flow. We also show how
reducing the max delayed ACK timer can improve the performance when
using Cubic.
Command used was:
  ./do_hbm_test.sh -l -D --stats -N -r=<rate> [--no_cn] [dctcp] \
                   -s=<server running netserver>
  where:
     <rate>   is 1000
     --no_cn  specifies no cwr notifications
     dctcp    uses dctcp
                      Cubic                   DCTCP
Lim, DA        Mbps cwnd cred drops    Mbps cwnd cred drops
------------   ---- ---- ---- -----    ---- ---- ---- -----
1G, 40           35  462 -320   67%     995    1 -212 0.05%
1G, 40,cn       736    9  -78 0.07%     995    1 -212 0.05%
1G,  5,cn       941    2 -189 0.13%     995    1 -212 0.05%
Notes:
  --no_cn has no effect with DCTCP
  Lim = rate limit
  DA = maximum delayed ACK timer
  cred = credit in packets
  drops = % packets dropped
v1->v2: Ensures that only BPF_CGROUP_INET_EGRESS can return values 2 and 3
        New egress values apply to all protocols, not just TCP
        Cleaned up patch 4, Update BPF_CGROUP_RUN_PROG_INET_EGRESS callers
        Removed changes to __tcp_transmit_skb (patch 5), no longer needed
        Removed sample use of EDT
v2->v3: Removed the probe timer related changes
brakmo (6):
  bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY
  bpf: cgroup inet skb programs can return 0 to 3
  bpf: Update __cgroup_bpf_run_filter_skb with cn
  bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls
  bpf: Add cn support to hbm_out_kern.c
  bpf: Add more stats to HBM
include/linux/bpf.h | 50 +++++++++++++++++++++++++++++
include/linux/filter.h | 3 +-
kernel/bpf/cgroup.c | 25 ++++++++++++---
kernel/bpf/syscall.c | 12 +++++++
kernel/bpf/verifier.c | 16 +++++++--
net/ipv4/ip_output.c | 34 +++++++++++++-------
net/ipv6/ip6_output.c | 26 +++++++++------
samples/bpf/do_hbm_test.sh | 10 ++++--
samples/bpf/hbm.c | 51 +++++++++++++++++++++++++++--
samples/bpf/hbm.h | 9 +++++-
samples/bpf/hbm_kern.h | 66 ++++++++++++++++++++++++++++++++++++--
samples/bpf/hbm_out_kern.c | 48 +++++++++++++++++++--------
12 files changed, 299 insertions(+), 51 deletions(-)
--
2.17.1