From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
Eric Dumazet <edumazet@google.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Yonghong Song <yonghong.song@linux.dev>
Subject: Re: [PATCH v7 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC.
Date: Thu, 11 Jan 2024 22:20:06 -0800
Message-ID: <a413b206-df50-4445-a4de-494339ea1ce6@linux.dev>
In-Reply-To: <20231221012806.37137-1-kuniyu@amazon.com>
On 12/20/23 5:28 PM, Kuniyuki Iwashima wrote:
> Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless
> for the connection request until a valid ACK arrives in response to the
> SYN+ACK.
>
> The cookie contains two kinds of host-specific bits, a timestamp and
> secrets, so it can be validated only by its generator. This means SYN
> Cookie consumes network resources between the client and the server;
> intermediate nodes must remember which node to route the ACK for the
> cookie to.
>
> SYN Proxy reduces such unwanted resource allocation by handling 3WHS at
> the edge network. After SYN Proxy completes 3WHS, it forwards SYN to the
> backend server and completes another 3WHS. However, since the server's
> ISN differs from the cookie, the proxy must manage the ISN mappings and
> fix up SEQ/ACK numbers in every packet for each connection. If a proxy
> node goes down, all the connections through it are terminated. Keeping
> a state at proxy is painful from that perspective.
>
> At AWS, we use a dirty hack to build truly stateless SYN Proxy at scale.
> Our SYN Proxy consists of the front proxy layer and the backend kernel
> module. (See slides of LPC2023 [0], p37 - p48)
>
> The cookie that SYN Proxy generates differs from the kernel's cookie in
> that it contains a secret (called rolling salt) (i) shared by all the proxy
> nodes so that any node can validate ACK and (ii) updated periodically so
> that old cookies cannot be validated and we need not encode a timestamp for
> the cookie. Also, the ISN, not TS val, carries WScale, SACK, and ECN.
> This avoids sacrificing connection quality when customers turn off the
> TCP timestamps option because of an old CVE.
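
To make the cookie layout concrete, here is a minimal sketch of packing the negotiated options into the ISN next to a hash keyed by the shared rolling salt. The 8-bit/24-bit split, the bit positions, the names cookie_opts, cookie_make and cookie_check, and the FNV-1a mixer are all assumptions for illustration. This is not the AWS format, MSS encoding is left out, and a real deployment would want a proper keyed hash such as SipHash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option set carried in the low 8 bits of the ISN. */
struct cookie_opts {
	uint8_t snd_wscale;	/* client's window scale, 0..14 */
	bool wscale_ok;
	bool sack_ok;
	bool ecn_ok;
};

/* Stand-in mixer (FNV-1a over the 4 bytes of v); NOT cryptographically safe. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Hash the rolling salt, the 4-tuple, and the option byte together. */
static uint32_t flow_hash(uint32_t salt, uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport, uint8_t opts)
{
	uint32_t h = 2166136261u;

	h = mix(h, salt);
	h = mix(h, saddr);
	h = mix(h, daddr);
	h = mix(h, ((uint32_t)sport << 16) | dport);
	h = mix(h, opts);
	return h;
}

static uint8_t pack_opts(const struct cookie_opts *o)
{
	uint8_t b = 0;

	if (o->wscale_ok)
		b |= 0x10 | (o->snd_wscale & 0x0f);
	if (o->sack_ok)
		b |= 0x20;
	if (o->ecn_ok)
		b |= 0x40;
	return b;
}

/* ISN = low 24 bits of the hash in the upper bits, options in the low byte. */
static uint32_t cookie_make(uint32_t salt, uint32_t saddr, uint32_t daddr,
			    uint16_t sport, uint16_t dport,
			    const struct cookie_opts *o)
{
	uint8_t b = pack_opts(o);

	return (flow_hash(salt, saddr, daddr, sport, dport, b) << 8) | b;
}

/* Recompute the hash from the salt; on a match, decode the options. */
static bool cookie_check(uint32_t cookie, uint32_t salt, uint32_t saddr,
			 uint32_t daddr, uint16_t sport, uint16_t dport,
			 struct cookie_opts *out)
{
	uint8_t b = cookie & 0xff;

	if (((flow_hash(salt, saddr, daddr, sport, dport, b) << 8) | b) != cookie)
		return false;

	out->wscale_ok = b & 0x10;
	out->snd_wscale = out->wscale_ok ? (b & 0x0f) : 0;
	out->sack_ok = b & 0x20;
	out->ecn_ok = b & 0x40;
	return true;
}
```

In this sketch any node holding the same salt can validate the cookie without per-connection state, and a cookie made under a rotated-out salt is expected to fail the check.
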
>
> After 3WHS, the proxy restores SYN, encapsulates ACK into SYN, and forwards
> the TCP-in-TCP packet to the backend server. Our kernel module works at
> Netfilter input/output hooks and first feeds SYN to the TCP stack to
> initiate 3WHS. When the module is triggered for SYN+ACK, it looks up the
> corresponding request socket and overwrites tcp_rsk(req)->snt_isn with the
> proxy's cookie. Then, the module can complete 3WHS with the original ACK
> as is.
>
> This way, our SYN Proxy neither manages the ISN mappings nor waits for
> SYN+ACK from the backend, and thus can remain stateless. It's working
> very well for high-bandwidth services, on the order of multiple Tbps,
> but we are looking for a way to drop the dirty hack and further optimise
> the sequences.
>
> If we could validate an arbitrary SYN Cookie on the backend server with
> BPF, the proxy would not need to restore SYN or pass it. After validating
> ACK, the proxy node just needs to forward it, and then the server can do
> the lightweight validation (e.g. check if ACK came from proxy nodes, etc)
> and create a connection from the ACK.
>
> This series allows us to create a full sk from an arbitrary SYN Cookie,
> which is done in 3 steps.
>
>   1) At tc, BPF prog calls a new kfunc to create a reqsk and configure
>      it based on the argument populated from SYN Cookie. The reqsk has
>      its listener as req->rsk_listener and is passed to the TCP stack as
>      skb->sk.
>
>   2) During TCP socket lookup for the skb, skb_steal_sock() returns a
>      listener in the reuseport group that inet_reqsk(skb->sk)->rsk_listener
>      belongs to.
>
>   3) In cookie_v[46]_check(), the reqsk (skb->sk) is fully initialised and
>      a full sk is created.
>
> The kfunc usage is as follows:
>
>     struct bpf_tcp_req_attrs attrs = {
>         .mss = mss,
>         .wscale_ok = wscale_ok,
>         .rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
>         .snd_wscale = snd_wscale, /* Client's WScale < 15 */
>         .tstamp_ok = tstamp_ok,
>         .rcv_tsval = tsval,
>         .rcv_tsecr = tsecr, /* Server's Initial TSval */
>         .usec_ts_ok = usec_ts_ok,
>         .sack_ok = sack_ok,
>         .ecn_ok = ecn_ok,
>     };
>
>     skc = bpf_skc_lookup_tcp(...);
>     sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
>     /* The kfunc takes a pointer to the attrs and their size. */
>     bpf_sk_assign_tcp_reqsk(skb, sk, &attrs, sizeof(attrs));
>     bpf_sk_release(skc);
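
As a rough illustration of the "lightweight validation" a program might apply to values decoded from a cookie before handing them to the kfunc, the sketch below checks them for internal consistency. Only the WScale < 15 bound comes from the comments in the snippet above. The struct req_attrs_sketch mirror, the attrs_plausible() helper, and the other checks are assumptions, not the kernel's definition or its actual validation rules.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_WSCALE 14	/* the snippet's comments require WScale < 15 */

/* Local mirror of the fields used in the snippet; not the kernel's layout. */
struct req_attrs_sketch {
	uint32_t rcv_tsval;
	uint32_t rcv_tsecr;
	uint16_t mss;
	uint8_t rcv_wscale;
	uint8_t snd_wscale;
	bool wscale_ok;
	bool tstamp_ok;
	bool usec_ts_ok;
	bool sack_ok;
	bool ecn_ok;
};

/* Return true if the decoded attributes are internally consistent. */
static bool attrs_plausible(const struct req_attrs_sketch *a)
{
	/* Assumption: a zero MSS can never describe a usable connection. */
	if (a->mss == 0)
		return false;

	if (a->wscale_ok) {
		if (a->rcv_wscale > SKETCH_MAX_WSCALE ||
		    a->snd_wscale > SKETCH_MAX_WSCALE)
			return false;
	} else if (a->rcv_wscale || a->snd_wscale) {
		/* Assumption: scale factors are meaningless without WScale. */
		return false;
	}

	/* Assumption: timestamp fields must be unset when TS is not in use. */
	if (!a->tstamp_ok && (a->rcv_tsval || a->rcv_tsecr || a->usec_ts_ok))
		return false;

	return true;
}
```

A tc program following this pattern would drop or pass the packet untouched when the check fails, and only otherwise call bpf_sk_assign_tcp_reqsk().
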
>
> [0]: https://lpc.events/event/17/contributions/1645/attachments/1350/2701/SYN_Proxy_at_Scale_with_BPF.pdf
>
>
> Changes:
>   v7:
>     * Patch 5 & 6
>       * Drop MPTCP support

I think Yonghong's (thanks!) cpuv4 patch
(https://lore.kernel.org/bpf/20240110051348.2737007-1-yonghong.song@linux.dev/)
has addressed the issue that the selftest in patch 6 encountered.

There are some minor comments on v7. Please respin v8 once the cpuv4 patch has
concluded so that it can also kick off the CI.
Thread overview: 15+ messages
2023-12-21 1:28 [PATCH v7 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
2023-12-21 1:28 ` [PATCH v7 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
2023-12-21 1:28 ` [PATCH v7 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h Kuniyuki Iwashima
2023-12-21 1:28 ` [PATCH v7 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock() Kuniyuki Iwashima
2023-12-21 1:28 ` [PATCH v7 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
2023-12-21 1:28 ` [PATCH v7 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
2024-01-12 1:44 ` Martin KaFai Lau
2024-01-15 20:13 ` Kuniyuki Iwashima
2023-12-21 1:28 ` [PATCH v7 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk() Kuniyuki Iwashima
2023-12-21 6:35 ` Martin KaFai Lau
2023-12-21 7:04 ` Kuniyuki Iwashima
2023-12-21 8:43 ` Kuniyuki Iwashima
2024-01-02 19:17 ` Yonghong Song
2023-12-21 16:44 ` Matthieu Baerts
2024-01-12 6:20 ` Martin KaFai Lau [this message]