Subject: Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP
To: Alexei Starovoitov, Eric Dumazet
Cc: brakmo, netdev, Martin Lau, Alexei Starovoitov, Daniel Borkmann, Kernel Team
References: <20190323080542.173569-1-brakmo@fb.com> <704cb63c-13cd-f0ed-d546-18e3596bb63d@gmail.com> <20190323154124.gorqpaqex7ihfs6d@ast-mbp>
From: Eric Dumazet
Message-ID: <0841fe0d-7fcd-bb59-3694-af9969cec5af@gmail.com>
Date: Sat, 23 Mar 2019 22:36:24 -0700
In-Reply-To: <20190323154124.gorqpaqex7ihfs6d@ast-mbp>
X-Mailing-List: netdev@vger.kernel.org

On 03/23/2019 08:41 AM, Alexei Starovoitov wrote:
> On Sat, Mar 23, 2019 at 02:12:39AM -0700, Eric Dumazet wrote:
>>
>>
>> On 03/23/2019 01:05 AM, brakmo wrote:
>>> This patchset adds support for propagating congestion notifications (cn)
>>> to TCP from cgroup inet skb egress BPF programs.
>>>
>>> Current cgroup skb BPF programs cannot trigger TCP congestion window
>>> reductions, even when they drop a packet. This patch-set adds support
>>> for cgroup skb BPF programs to send congestion notifications in the
>>> return value when the packets are TCP packets. Rather than the
>>> current 1 for keeping the packet and 0 for dropping it, they can
>>> now return:
>>>     NET_XMIT_SUCCESS    (0)    - continue with packet output
>>>     NET_XMIT_DROP       (1)    - drop packet and do cn
>>>     NET_XMIT_CN         (2)    - continue with packet output and do cn
>>>     -EPERM                     - drop packet
>>>
>>
>> I believe I already mentioned this model is broken, if you have any virtual
>> device before the cgroup BPF program.
>>
>> Please think about offloading the pacing/throttling in the NIC,
>> there is no way we will report back to tcp stack instant notifications.
>
> I don't think 'offload to google proprietary nic' is a suggestion
> that folks can practically follow.
> Very few NICs can offload pacing to hw and there are plenty of limitations.
> This patch set represents a pure sw solution that works and scales to millions of flows.
>
>> This patch series is going way too far for my taste.
>
> I would really appreciate if you can do a technical review of the patches.
> Our previous approach didn't quite work due to complexity around locked/non-locked socket.
> This is a cleaner approach.
> Either we go with this one or will add a bpf hook into __tcp_transmit_skb.
> This approach is better since it works for other protocols and can be
> used by qdiscs w/o any bpf.
>
>> This idea is not new, you were at Google when it was experimented by Nandita and
>> others, and we know it is not worth the pain.
>
> google networking needs are different from the rest of the world.
>

This has nothing to do with Google against Facebook, really. It is a bit sad you react like this, Alexei.

We just upstreamed bpf_skb_ecn_set_ce(), so I doubt you already have numbers to show that this strategy is not enough.

All recent discussions about ECN (TCP Prague and SCE) do not _require_ instant feedback to the sender.

Please show us experimental results before we have to carry these huge hacks.

Thank you.
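The return-value scheme in the quoted cover letter can be modelled in plain C to make its semantics concrete. This is a user-space sketch under stated assumptions, not kernel code and not necessarily what was eventually merged. Only the NET_XMIT_* values, -EPERM and their meanings come from the cover letter. `struct cn_verdict`, `decode_proposed()` and `decode_legacy()` are hypothetical names. Treating unknown return values like -EPERM is also an assumption. The model shows that return value 1 flips from "keep" under the current convention to "drop and do cn" under the proposal.

```c
#include <errno.h>
#include <stdbool.h>

/* Values as listed in the cover letter; they match the NET_XMIT_*
 * definitions in include/linux/netdevice.h. Redefined here so the
 * sketch builds in user space. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Hypothetical decoded verdict; not a kernel type. */
struct cn_verdict {
	bool keep;      /* continue with packet output */
	bool notify_cn; /* signal congestion to TCP */
};

/* Hypothetical helper: interpret a cgroup skb egress program's return
 * value under the proposed scheme quoted above. */
static struct cn_verdict decode_proposed(int ret)
{
	struct cn_verdict v = { .keep = false, .notify_cn = false };

	switch (ret) {
	case NET_XMIT_SUCCESS:  /* continue with packet output */
		v.keep = true;
		break;
	case NET_XMIT_DROP:     /* drop packet and do cn */
		v.notify_cn = true;
		break;
	case NET_XMIT_CN:       /* continue with packet output and do cn */
		v.keep = true;
		v.notify_cn = true;
		break;
	case -EPERM:            /* drop packet, no cn */
	default:                /* assumption: unknown values also drop, no cn */
		break;
	}
	return v;
}

/* Hypothetical helper for the current convention described in the
 * cover letter: 1 keeps the packet, 0 drops it, and there is no way
 * to express a congestion notification. */
static struct cn_verdict decode_legacy(int ret)
{
	struct cn_verdict v = { .keep = (ret == 1), .notify_cn = false };
	return v;
}
```

The switch keeps a one-to-one mapping with the four cases in the cover letter. This makes it easy to check that only NET_XMIT_DROP and NET_XMIT_CN carry the congestion signal.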