From: brakmo
To: netdev
CC: Martin Lau, Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kernel Team
Subject: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP
Date: Sat, 23 Mar 2019 01:05:35 -0700
Message-ID: <20190323080542.173569-1-brakmo@fb.com>

This patchset adds support for propagating congestion notifications (cn)
to TCP from cgroup inet skb egress BPF programs.

Current cgroup skb BPF programs cannot trigger TCP congestion window
reductions, even when they drop a packet. This patchset adds support for
cgroup skb BPF programs to send congestion notifications in the return
value when the packets are TCP packets. Rather than the current 1 for
keeping the packet and 0 for dropping it, they can now return:
    NET_XMIT_SUCCESS (0) - continue with packet output
    NET_XMIT_DROP    (1) - drop packet and do cn
    NET_XMIT_CN      (2) - continue with packet output and do cn
    -EPERM               - drop packet

There is also support for setting the probe timer to a small value,
specified by a sysctl, when a packet is dropped when calling queue_xmit
in __tcp_transmit_skb and there are no other packets in transit.

In addition, HBM programs are modified to collect and return more
statistics.

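As an illustration only, the four outcomes listed earlier could be derived from a raw program return value roughly as follows. The encoding used here (bit 0 set means keep the packet, bit 1 set means request cn) is an assumption suggested by the patch title "cgroup inet skb programs can return 0 to 3"; this cover letter does not spell it out. The helper names are hypothetical, and the NET_XMIT_* values are defined locally so the sketch builds in user space.

```c
#include <assert.h>
#include <errno.h>

/* Values as listed in the cover letter; defined here for a user-space build. */
#define NET_XMIT_SUCCESS 0	/* continue with packet output */
#define NET_XMIT_DROP    1	/* drop packet and do cn */
#define NET_XMIT_CN      2	/* continue with packet output and do cn */

/*
 * Hypothetical helper, not taken from the patches.
 * ASSUMPTION: prog_ret is in the range 0..3, where
 *   bit 0 = keep the packet (0 means drop it)
 *   bit 1 = request a congestion notification
 */
static int egress_verdict(unsigned int prog_ret)
{
	int keep = prog_ret & 1;
	int cn = (prog_ret >> 1) & 1;

	if (keep)
		return cn ? NET_XMIT_CN : NET_XMIT_SUCCESS;

	/* Dropped: with cn the sender is told to back off, without it
	 * the drop is reported as a plain error.
	 */
	return cn ? NET_XMIT_DROP : -EPERM;
}

/* Small self-check covering all four assumed program return values. */
static void egress_verdict_selftest(void)
{
	assert(egress_verdict(0) == -EPERM);
	assert(egress_verdict(1) == NET_XMIT_SUCCESS);
	assert(egress_verdict(2) == NET_XMIT_DROP);
	assert(egress_verdict(3) == NET_XMIT_CN);
}
```

Under this assumed encoding, existing programs that return only 0 or 1 keep their current meaning (drop or keep, without cn), which is why the two new values are the ones that carry the cn bit.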
The use of congestion notifications improves the performance of HBM when
using Cubic. Without congestion notifications, Cubic will not decrease its
cwnd and HBM will need to drop a large percentage of the packets. Smaller
probe timers improve the performance of Cubic and DCTCP when the rates are
small enough that there are times when HBM cannot send a packet per RTT in
order to maintain the bandwidth limit.

The following results were obtained with netperf between two servers,
using a single flow and rate limits of 1Gbps and 200Mbps. We also show how
reducing the max delayed ACK timer can improve the performance when using
Cubic.

A following patch will add support for fq's Earliest Departure Time (EDT).

The command used was:
  ./do_hbm_test.sh -l -D --stats -N -r= [--no_cn] [dctcp] \
                   -s=
  where:
     the value given to -r= is 1000 or 200
     --no_cn specifies no cwr notifications
     dctcp   use of dctcp

                       Cubic                    DCTCP
  Lim,Prob,DA    Mbps cwnd cred drops   Mbps cwnd cred drops
  ------------   ---- ---- ---- -----   ---- ---- ---- -----
    1G, 0,40       35  462 -320 67%      995    1 -212  0.05%
    1G, 0,40,cn   349    3 -229  0.15    995    1 -212  0.05
    1G, 0, 5,cn   941    2 -189  0.13    995    1 -212  0.05
  200M, 0,40,cn    50    3 -152  0.34     31    3 -203  0.50
  200M, 0, 5,cn    43    2 -202  0.48     33    3 -199  0.50
  200M,20, 5,cn   199    2 -209  0.38    199    1 -214  0.30

  Notes:
    --no_cn has no effect with DCTCP
    Lim = rate limit
    Prob = probe timer
    DA = maximum delayed ACK timer
    cred = credit in packets
    drops = % packets dropped

brakmo (7):
  bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY
  bpf: cgroup inet skb programs can return 0 to 3
  bpf: Update __cgroup_bpf_run_filter_skb with cn
  bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls
  bpf: sysctl for probe_on_drop
  bpf: Add cn support to hbm_out_kern.c
  bpf: Add more stats to HBM

 include/linux/bpf.h        | 50 +++++++++++++++++++++++++++++
 include/linux/filter.h     |  3 +-
 include/net/netns/ipv4.h   |  1 +
 kernel/bpf/cgroup.c        | 25 ++++++++++++---
 kernel/bpf/syscall.c       | 12 +++++++
 kernel/bpf/verifier.c      | 16 +++++++--
 net/ipv4/ip_output.c       | 39 ++++++++++++----------
 net/ipv4/sysctl_net_ipv4.c | 10 ++++++
 net/ipv4/tcp_ipv4.c        |  1 +
 net/ipv4/tcp_output.c      | 18 +++++++++--
 net/ipv6/ip6_output.c      | 22 +++++++------
 samples/bpf/do_hbm_test.sh | 10 ++++--
 samples/bpf/hbm.c          | 51 +++++++++++++++++++++++++++--
 samples/bpf/hbm.h          |  9 +++++-
 samples/bpf/hbm_kern.h     | 66 ++++++++++++++++++++++++++++++++++++--
 samples/bpf/hbm_out_kern.c | 48 +++++++++++++++++++--------
 16 files changed, 321 insertions(+), 60 deletions(-)

--
2.17.1