From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Cc: brakmo <brakmo@fb.com>, netdev <netdev@vger.kernel.org>,
Martin Lau <kafai@fb.com>, Alexei Starovoitov <ast@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP
Date: Sat, 23 Mar 2019 22:36:24 -0700 [thread overview]
Message-ID: <0841fe0d-7fcd-bb59-3694-af9969cec5af@gmail.com> (raw)
In-Reply-To: <20190323154124.gorqpaqex7ihfs6d@ast-mbp>
On 03/23/2019 08:41 AM, Alexei Starovoitov wrote:
> On Sat, Mar 23, 2019 at 02:12:39AM -0700, Eric Dumazet wrote:
>>
>>
>> On 03/23/2019 01:05 AM, brakmo wrote:
>>> This patchset adds support for propagating congestion notifications (cn)
>>> to TCP from cgroup inet skb egress BPF programs.
>>>
>>> Current cgroup skb BPF programs cannot trigger TCP congestion window
>>> reductions, even when they drop a packet. This patch-set adds support
>>> for cgroup skb BPF programs to send congestion notifications in the
>>> return value when the packets are TCP packets. Rather than the
>>> current 1 for keeping the packet and 0 for dropping it, they can
>>> now return:
>>> NET_XMIT_SUCCESS (0) - continue with packet output
>>> NET_XMIT_DROP (1) - drop packet and do cn
>>> NET_XMIT_CN (2) - continue with packet output and do cn
>>> -EPERM - drop packet
>>>
>>
>> I believe I already mentioned this model is broken, if you have any virtual
>> device before the cgroup BPF program.
>>
>> Please think about offloading the pacing/throttling in the NIC,
>> there is no way we will report back to tcp stack instant notifications.
>
> I don't think 'offload to google proprietary nic' is a suggestion
> that folks can practically follow.
> Very few NICs can offload pacing to hw and there are plenty of limitations.
> This patch set represents a pure sw solution that works and scales to millions of flows.
>
>> This patch series is going way too far for my taste.
>
> I would really appreciate if you can do a technical review of the patches.
> Our previous approach didn't quite work due to complexity around locked/non-locked socket.
> This is a cleaner approach.
> Either we go with this one or will add a bpf hook into __tcp_transmit_skb.
> This approach is better since it works for other protocols and can be
> used by qdiscs w/o any bpf.
>
>> This idea is not new, you were at Google when it was experimented by Nandita and
>> others, and we know it is not worth the pain.
>
> google networking needs are different from the rest of the world.
>
This has nothing to do with Google against Facebook really, it is a bit sad you react like this Alexei.
We just upstreamed bpf_skb_ecn_set_ce(), so I doubt you already have numbers to show that this strategy
is not enough.
All recent discussions about ECN (TCP Prague and SCE) do not _require_ instant feedback to the sender.
Please show us experimental results before we have to carry these huge hacks.
Thank you.
next prev parent reply other threads:[~2019-03-24 5:36 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-23 8:05 [PATCH bpf-next 0/7] bpf: Propagate cn to TCP brakmo
2019-03-23 8:05 ` [PATCH bpf-next 1/7] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY brakmo
2019-03-23 8:05 ` [PATCH bpf-next 2/7] bpf: cgroup inet skb programs can return 0 to 3 brakmo
2019-03-23 8:05 ` [PATCH bpf-next 3/7] bpf: Update __cgroup_bpf_run_filter_skb with cn brakmo
2019-03-23 8:05 ` [PATCH bpf-next 4/7] bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls brakmo
2019-03-23 8:05 ` [PATCH bpf-next 5/7] bpf: sysctl for probe_on_drop brakmo
2019-03-23 8:05 ` [PATCH bpf-next 6/7] bpf: Add cn support to hbm_out_kern.c brakmo
2019-03-23 8:05 ` [PATCH bpf-next 7/7] bpf: Add more stats to HBM brakmo
2019-03-23 9:12 ` [PATCH bpf-next 0/7] bpf: Propagate cn to TCP Eric Dumazet
2019-03-23 15:41 ` Alexei Starovoitov
2019-03-24 5:36 ` Eric Dumazet [this message]
2019-03-24 16:19 ` Alexei Starovoitov
2019-03-25 8:33 ` Eric Dumazet
2019-03-25 8:48 ` Eric Dumazet
2019-03-26 4:27 ` Alexei Starovoitov
2019-03-26 8:06 ` Eric Dumazet
2019-03-26 15:07 ` Alexei Starovoitov
2019-03-26 15:43 ` Eric Dumazet
2019-03-26 17:01 ` Alexei Starovoitov
2019-03-26 18:07 ` Eric Dumazet
2019-03-26 8:13 ` Eric Dumazet
2019-03-24 5:48 ` Eric Dumazet
2019-03-24 1:14 ` Lawrence Brakmo
2019-03-24 5:58 ` Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0841fe0d-7fcd-bb59-3694-af9969cec5af@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=Kernel-team@fb.com \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@fb.com \
--cc=brakmo@fb.com \
--cc=daniel@iogearbox.net \
--cc=kafai@fb.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.