From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Cc: brakmo <brakmo@fb.com>, netdev <netdev@vger.kernel.org>,
Martin Lau <kafai@fb.com>, Alexei Starovoitov <ast@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP
Date: Sat, 23 Mar 2019 22:36:24 -0700 [thread overview]
Message-ID: <0841fe0d-7fcd-bb59-3694-af9969cec5af@gmail.com> (raw)
In-Reply-To: <20190323154124.gorqpaqex7ihfs6d@ast-mbp>
On 03/23/2019 08:41 AM, Alexei Starovoitov wrote:
> On Sat, Mar 23, 2019 at 02:12:39AM -0700, Eric Dumazet wrote:
>>
>>
>> On 03/23/2019 01:05 AM, brakmo wrote:
>>> This patchset adds support for propagating congestion notifications (cn)
>>> to TCP from cgroup inet skb egress BPF programs.
>>>
>>> Current cgroup skb BPF programs cannot trigger TCP congestion window
>>> reductions, even when they drop a packet. This patch-set adds support
>>> for cgroup skb BPF programs to send congestion notifications in the
>>> return value when the packets are TCP packets. Rather than the
>>> current 1 for keeping the packet and 0 for dropping it, they can
>>> now return:
>>> NET_XMIT_SUCCESS (0) - continue with packet output
>>> NET_XMIT_DROP (1) - drop packet and do cn
>>> NET_XMIT_CN (2) - continue with packet output and do cn
>>> -EPERM - drop packet
>>>
>>
>> I believe I already mentioned this model is broken, if you have any virtual
>> device before the cgroup BPF program.
>>
>> Please think about offloading the pacing/throttling in the NIC,
>> there is no way we will report back to tcp stack instant notifications.
>
> I don't think 'offload to google proprietary nic' is a suggestion
> that folks can practically follow.
> Very few NICs can offload pacing to hw and there are plenty of limitations.
> This patch set represents a pure sw solution that works and scales to millions of flows.
>
>> This patch series is going way too far for my taste.
>
> I would really appreciate if you can do a technical review of the patches.
> Our previous approach didn't quite work due to complexity around locked/non-locked socket.
> This is a cleaner approach.
> Either we go with this one or will add a bpf hook into __tcp_transmit_skb.
> This approach is better since it works for other protocols and can be
> used by qdiscs w/o any bpf.
>
>> This idea is not new, you were at Google when it was experimented by Nandita and
>> others, and we know it is not worth the pain.
>
> google networking needs are different from the rest of the world.
>
This has nothing to do with Google against Facebook really, it is a bit sad you react like this Alexei.
We just upstreamed bpf_skb_ecn_set_ce(), so I doubt you already have numbers to show that this strategy
is not enough.
All recent discussions about ECN (TCP Prague and SCE) do not _require_ instant feedback to the sender.
Please show us experimental results before we have to carry these huge hacks.
Thank you.
next prev parent reply other threads:[~2019-03-24 5:36 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-23 8:05 [PATCH bpf-next 0/7] bpf: Propagate cn to TCP brakmo
2019-03-23 8:05 ` [PATCH bpf-next 1/7] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY brakmo
2019-03-23 8:05 ` [PATCH bpf-next 2/7] bpf: cgroup inet skb programs can return 0 to 3 brakmo
2019-03-23 8:05 ` [PATCH bpf-next 3/7] bpf: Update __cgroup_bpf_run_filter_skb with cn brakmo
2019-03-23 8:05 ` [PATCH bpf-next 4/7] bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls brakmo
2019-03-23 8:05 ` [PATCH bpf-next 5/7] bpf: sysctl for probe_on_drop brakmo
2019-03-23 8:05 ` [PATCH bpf-next 6/7] bpf: Add cn support to hbm_out_kern.c brakmo
2019-03-23 8:05 ` [PATCH bpf-next 7/7] bpf: Add more stats to HBM brakmo
2019-03-23 9:12 ` [PATCH bpf-next 0/7] bpf: Propagate cn to TCP Eric Dumazet
2019-03-23 15:41 ` Alexei Starovoitov
2019-03-24 5:36 ` Eric Dumazet [this message]
2019-03-24 16:19 ` Alexei Starovoitov
2019-03-25 8:33 ` Eric Dumazet
2019-03-25 8:48 ` Eric Dumazet
2019-03-26 4:27 ` Alexei Starovoitov
2019-03-26 8:06 ` Eric Dumazet
2019-03-26 15:07 ` Alexei Starovoitov
2019-03-26 15:43 ` Eric Dumazet
2019-03-26 17:01 ` Alexei Starovoitov
2019-03-26 18:07 ` Eric Dumazet
2019-03-26 8:13 ` Eric Dumazet
2019-03-24 5:48 ` Eric Dumazet
2019-03-24 1:14 ` Lawrence Brakmo
2019-03-24 5:58 ` Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0841fe0d-7fcd-bb59-3694-af9969cec5af@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=Kernel-team@fb.com \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@fb.com \
--cc=brakmo@fb.com \
--cc=daniel@iogearbox.net \
--cc=kafai@fb.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).