* [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC.
@ 2024-01-15 20:55 Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless
for the connection request until a valid ACK is responded to the SYN+ACK.
The cookie contains two kinds of host-specific bits, a timestamp and
secrets, so only can it be validated by the generator. It means SYN
Cookie consumes network resources between the client and the server;
intermediate nodes must remember which nodes to route ACK for the cookie.
SYN Proxy reduces such unwanted resource allocation by handling 3WHS at
the edge network. After SYN Proxy completes 3WHS, it forwards SYN to the
backend server and completes another 3WHS. However, since the server's
ISN differs from the cookie, the proxy must manage the ISN mappings and
fix up SEQ/ACK numbers in every packet for each connection. If a proxy
node goes down, all the connections through it are terminated. Keeping
a state at proxy is painful from that perspective.
At AWS, we use a dirty hack to build truly stateless SYN Proxy at scale.
Our SYN Proxy consists of the front proxy layer and the backend kernel
module. (See slides of LPC2023 [0], p37 - p48)
The cookie that SYN Proxy generates differs from the kernel's cookie in
that it contains a secret (called rolling salt) (i) shared by all the proxy
nodes so that any node can validate ACK and (ii) updated periodically so
that old cookies cannot be validated and we need not encode a timestamp for
the cookie. Also, ISN contains WScale, SACK, and ECN, not in TS val. This
is not to sacrifice any connection quality, where some customers turn off
TCP timestamps option due to retro CVE.
After 3WHS, the proxy restores SYN, encapsulates ACK into SYN, and forward
the TCP-in-TCP packet to the backend server. Our kernel module works at
Netfilter input/output hooks and first feeds SYN to the TCP stack to
initiate 3WHS. When the module is triggered for SYN+ACK, it looks up the
corresponding request socket and overwrites tcp_rsk(req)->snt_isn with the
proxy's cookie. Then, the module can complete 3WHS with the original ACK
as is.
This way, our SYN Proxy does not manage the ISN mappings nor wait for
SYN+ACK from the backend thus can remain stateless. It's working very
well for high-bandwidth services like multiple Tbps, but we are looking
for a way to drop the dirty hack and further optimise the sequences.
If we could validate an arbitrary SYN Cookie on the backend server with
BPF, the proxy would need not restore SYN nor pass it. After validating
ACK, the proxy node just needs to forward it, and then the server can do
the lightweight validation (e.g. check if ACK came from proxy nodes, etc)
and create a connection from the ACK.
This series allows us to create a full sk from an arbitrary SYN Cookie,
which is done in 3 steps.
1) At tc, BPF prog calls a new kfunc to create a reqsk and configure
it based on the argument populated from SYN Cookie. The reqsk has
its listener as req->rsk_listener and is passed to the TCP stack as
skb->sk.
2) During TCP socket lookup for the skb, skb_steal_sock() returns a
listener in the reuseport group that inet_reqsk(skb->sk)->rsk_listener
belongs to.
3) In cookie_v[46]_check(), the reqsk (skb->sk) is fully initialised and
a full sk is created.
The kfunc usage is as follows:
struct bpf_tcp_req_attrs attrs = {
.mss = mss,
.wscale_ok = wscale_ok,
.rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
.snd_wscale = snd_wscale, /* Client's WScale < 15 */
.tstamp_ok = tstamp_ok,
.rcv_tsval = tsval,
.rcv_tsecr = tsecr, /* Server's Initial TSval */
.usec_ts_ok = usec_ts_ok,
.sack_ok = sack_ok,
.ecn_ok = ecn_ok,
}
skc = bpf_skc_lookup_tcp(...);
sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
bpf_sk_assign_tcp_reqsk(skb, sk, attrs, sizeof(attrs));
bpf_sk_release(skc);
[0]: https://lpc.events/event/17/contributions/1645/attachments/1350/2701/SYN_Proxy_at_Scale_with_BPF.pdf
Changes:
v8
* Rebase on Yonghong's cpuv4 fix
* Patch 5
* Fill the trailing 3-bytes padding in struct bpf_tcp_req_attrs
and test it as null
* Patch 6
* Remove unused IPPROTP_MPTCP definition
v7: https://lore.kernel.org/bpf/20231221012806.37137-1-kuniyu@amazon.com/
* Patch 5 & 6
* Drop MPTCP support
v6: https://lore.kernel.org/bpf/20231214155424.67136-1-kuniyu@amazon.com/
* Patch 5 & 6
* /struct /s/tcp_cookie_attributes/bpf_tcp_req_attrs/
* Don't reuse struct tcp_options_received and use u8 for each attrs
* Patch 6
* Check retval of test__start_subtest()
v5: https://lore.kernel.org/netdev/20231211073650.90819-1-kuniyu@amazon.com/
* Split patch 1-3
* Patch 3
* Clear req->rsk_listener in skb_steal_sock()
* Patch 4 & 5
* Move sysctl validation and tsoff init from cookie_bpf_check() to kfunc
* Patch 5
* Do not increment LINUX_MIB_SYNCOOKIES(RECV|FAILED)
* Patch 6
* Remove __always_inline
* Test if tcp_handle_{syn,ack}() is executed
* Move some definition to bpf_tracing_net.h
* s/BPF_F_CURRENT_NETNS/-1/
v4: https://lore.kernel.org/bpf/20231205013420.88067-1-kuniyu@amazon.com/
* Patch 1 & 2
* s/CONFIG_SYN_COOKIE/CONFIG_SYN_COOKIES/
* Patch 1
* Don't set rcv_wscale for BPF SYN Cookie case.
* Patch 2
* Add test for tcp_opt.{unused,rcv_wscale} in kfunc
* Modify skb_steal_sock() to avoid resetting skb-sk
* Support SO_REUSEPORT lookup
* Patch 3
* Add CONFIG_SYN_COOKIES to Kconfig for CI
* Define BPF_F_CURRENT_NETNS
v3: https://lore.kernel.org/netdev/20231121184245.69569-1-kuniyu@amazon.com/
* Guard kfunc and req->syncookie part in inet6?_steal_sock() with
CONFIG_SYN_COOKIE
v2: https://lore.kernel.org/netdev/20231120222341.54776-1-kuniyu@amazon.com/
* Drop SOCK_OPS and move SYN Cookie validation logic to TC with kfunc.
* Add cleanup patches to reduce discrepancy between cookie_v[46]_check()
v1: https://lore.kernel.org/bpf/20231013220433.70792-1-kuniyu@amazon.com/
Kuniyuki Iwashima (6):
tcp: Move tcp_ns_to_ts() to tcp.h
tcp: Move skb_steal_sock() to request_sock.h
bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().
bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
bpf: tcp: Support arbitrary SYN Cookie.
selftest: bpf: Test bpf_sk_assign_tcp_reqsk().
include/net/request_sock.h | 39 ++
include/net/sock.h | 25 -
include/net/tcp.h | 43 ++
net/core/filter.c | 114 +++-
net/core/sock.c | 14 +-
net/ipv4/syncookies.c | 40 +-
net/ipv6/syncookies.c | 13 +-
tools/testing/selftests/bpf/bpf_kfuncs.h | 10 +
tools/testing/selftests/bpf/config | 1 +
.../bpf/prog_tests/tcp_custom_syncookie.c | 150 +++++
.../selftests/bpf/progs/bpf_tracing_net.h | 16 +
.../selftests/bpf/progs/test_siphash.h | 64 ++
.../bpf/progs/test_tcp_custom_syncookie.c | 572 ++++++++++++++++++
.../bpf/progs/test_tcp_custom_syncookie.h | 140 +++++
14 files changed, 1195 insertions(+), 46 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
create mode 100644 tools/testing/selftests/bpf/progs/test_siphash.h
create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h Kuniyuki Iwashima
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF.
When BPF prog validates ACK and kfunc allocates a reqsk, we need
to call tcp_ns_to_ts() to calculate an offset of TSval for later
use:
time
t0 : Send SYN+ACK
-> tsval = Initial TSval (Random Number)
t1 : Recv ACK of 3WHS
-> tsoff = TSecr - tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns())
= Initial TSval - t1
t2 : Send ACK
-> tsval = t2 + tsoff
= Initial TSval + (t2 - t1)
= Initial TSval + Time Delta (x)
(x) Note that the time delta does not include the initial RTT
from t0 to t1.
Let's move tcp_ns_to_ts() to tcp.h.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
include/net/tcp.h | 9 +++++++++
net/ipv4/syncookies.c | 9 ---------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dd78a1181031..114000e71a46 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -577,6 +577,15 @@ static inline u32 tcp_cookie_time(void)
return val;
}
+/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
+static inline u64 tcp_ns_to_ts(bool usec_ts, u64 val)
+{
+ if (usec_ts)
+ return div_u64(val, NSEC_PER_USEC);
+
+ return div_u64(val, NSEC_PER_MSEC);
+}
+
u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
u16 *mssp);
__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 61f1c96cfe63..981944c22820 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -51,15 +51,6 @@ static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
count, &syncookie_secret[c]);
}
-/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
-static u64 tcp_ns_to_ts(bool usec_ts, u64 val)
-{
- if (usec_ts)
- return div_u64(val, NSEC_PER_USEC);
-
- return div_u64(val, NSEC_PER_MSEC);
-}
-
/*
* when syncookies are in effect and tcp timestamps are enabled we encode
* tcp options in the lower bits of the timestamp value that will be
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock() Kuniyuki Iwashima
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF.
If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to TCP stack as skb->sk with req->syncookie 1.
In skb_steal_sock(), we need to check inet_reqsk(sk)->syncookie
to see if the reqsk is created by kfunc. However, inet_reqsk()
is not available in sock.h.
Let's move skb_steal_sock() to request_sock.h.
While at it, we refactor skb_steal_sock() so it returns early if
skb->sk is NULL to minimise the following patch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
include/net/request_sock.h | 28 ++++++++++++++++++++++++++++
include/net/sock.h | 25 -------------------------
2 files changed, 28 insertions(+), 25 deletions(-)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 144c39db9898..26c630c40abb 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -83,6 +83,34 @@ static inline struct sock *req_to_sk(struct request_sock *req)
return (struct sock *)req;
}
+/**
+ * skb_steal_sock - steal a socket from an sk_buff
+ * @skb: sk_buff to steal the socket from
+ * @refcounted: is set to true if the socket is reference-counted
+ * @prefetched: is set to true if the socket was assigned from bpf
+ */
+static inline struct sock *skb_steal_sock(struct sk_buff *skb,
+ bool *refcounted, bool *prefetched)
+{
+ struct sock *sk = skb->sk;
+
+ if (!sk) {
+ *prefetched = false;
+ *refcounted = false;
+ return NULL;
+ }
+
+ *prefetched = skb_sk_is_prefetched(skb);
+ if (*prefetched)
+ *refcounted = sk_is_refcounted(sk);
+ else
+ *refcounted = true;
+
+ skb->destructor = NULL;
+ skb->sk = NULL;
+ return sk;
+}
+
static inline struct request_sock *
reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
bool attach_listener)
diff --git a/include/net/sock.h b/include/net/sock.h
index a7f815c7cfdf..32a399fdcbb5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2814,31 +2814,6 @@ sk_is_refcounted(struct sock *sk)
return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
}
-/**
- * skb_steal_sock - steal a socket from an sk_buff
- * @skb: sk_buff to steal the socket from
- * @refcounted: is set to true if the socket is reference-counted
- * @prefetched: is set to true if the socket was assigned from bpf
- */
-static inline struct sock *
-skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
-{
- if (skb->sk) {
- struct sock *sk = skb->sk;
-
- *refcounted = true;
- *prefetched = skb_sk_is_prefetched(skb);
- if (*prefetched)
- *refcounted = sk_is_refcounted(sk);
- skb->destructor = NULL;
- skb->sk = NULL;
- return sk;
- }
- *prefetched = false;
- *refcounted = false;
- return NULL;
-}
-
/* Checks if this SKB belongs to an HW offloaded socket
* and whether any SW fallbacks are required based on dev.
* Check decrypted mark in case skb_orphan() cleared socket.
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF.
If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to TCP stack as skb->sk with req->syncookie 1. Also,
the reqsk has its listener as req->rsk_listener with no refcnt
taken.
When the TCP stack looks up a socket from the skb, we steal
inet_reqsk(skb->sk)->rsk_listener in skb_steal_sock() so that
the skb will be processed in cookie_v[46]_check() with the
listener.
Note that we do not clear skb->sk and skb->destructor so that we
can carry the reqsk to cookie_v[46]_check().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/request_sock.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 26c630c40abb..8839133d6f6b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -101,10 +101,21 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb,
}
*prefetched = skb_sk_is_prefetched(skb);
- if (*prefetched)
+ if (*prefetched) {
+#if IS_ENABLED(CONFIG_SYN_COOKIES)
+ if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+ struct request_sock *req = inet_reqsk(sk);
+
+ *refcounted = false;
+ sk = req->rsk_listener;
+ req->rsk_listener = NULL;
+ return sk;
+ }
+#endif
*refcounted = sk_is_refcounted(sk);
- else
+ } else {
*refcounted = true;
+ }
skb->destructor = NULL;
skb->sk = NULL;
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (2 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock() Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-03-15 13:37 ` Eric Dumazet
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
` (2 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF in the following
patch.
If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to cookie_[46]_check() as skb->sk. If skb->sk is not
NULL, we call cookie_bpf_check().
Then, we clear skb->sk and skb->destructor, which are needed not
to hold refcnt for reqsk and the listener. See the following patch
for details.
After that, we finish initialisation for the remaining fields with
cookie_tcp_reqsk_init().
Note that the server side WScale is set only for non-BPF SYN Cookie.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/tcp.h | 20 ++++++++++++++++++++
net/ipv4/syncookies.c | 31 +++++++++++++++++++++++++++----
net/ipv6/syncookies.c | 13 +++++++++----
3 files changed, 56 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 114000e71a46..dfe99a084a71 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -599,6 +599,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
dst_feature(dst, RTAX_FEATURE_ECN);
}
+#if IS_ENABLED(CONFIG_BPF)
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+ return skb->sk;
+}
+
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
+#else
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+ return false;
+}
+
+static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ return NULL;
+}
+#endif
+
/* From net/ipv6/syncookies.c */
int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 981944c22820..be88bf586ff9 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -295,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
return 0;
}
+#if IS_ENABLED(CONFIG_BPF)
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
+{
+ struct request_sock *req = inet_reqsk(skb->sk);
+
+ skb->sk = NULL;
+ skb->destructor = NULL;
+
+ if (cookie_tcp_reqsk_init(sk, skb, req)) {
+ reqsk_free(req);
+ req = NULL;
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_GPL(cookie_bpf_check);
+#endif
+
struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
struct sock *sk, struct sk_buff *skb,
struct tcp_options_received *tcp_opt,
@@ -395,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
!th->ack || th->rst)
goto out;
- req = cookie_tcp_check(net, sk, skb);
- if (IS_ERR(req))
- goto out;
+ if (cookie_bpf_ok(skb)) {
+ req = cookie_bpf_check(sk, skb);
+ } else {
+ req = cookie_tcp_check(net, sk, skb);
+ if (IS_ERR(req))
+ goto out;
+ }
if (!req)
goto out_drop;
@@ -445,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
ireq->wscale_ok, &rcv_wscale,
dst_metric(&rt->dst, RTAX_INITRWND));
- ireq->rcv_wscale = rcv_wscale;
+ if (!req->syncookie)
+ ireq->rcv_wscale = rcv_wscale;
ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index c8d2ca27220c..6b9c69278819 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
!th->ack || th->rst)
goto out;
- req = cookie_tcp_check(net, sk, skb);
- if (IS_ERR(req))
- goto out;
+ if (cookie_bpf_ok(skb)) {
+ req = cookie_bpf_check(sk, skb);
+ } else {
+ req = cookie_tcp_check(net, sk, skb);
+ if (IS_ERR(req))
+ goto out;
+ }
if (!req)
goto out_drop;
@@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
ireq->wscale_ok, &rcv_wscale,
dst_metric(dst, RTAX_INITRWND));
- ireq->rcv_wscale = rcv_wscale;
+ if (!req->syncookie)
+ ireq->rcv_wscale = rcv_wscale;
ireq->ecn_ok &= cookie_ecn_ok(net, dst);
ret = tcp_get_cookie_sock(sk, skb, req, dst);
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie.
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (3 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-17 1:55 ` Martin KaFai Lau
2024-01-15 20:55 ` [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk() Kuniyuki Iwashima
2024-01-17 1:50 ` [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC patchwork-bot+netdevbpf
6 siblings, 1 reply; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
This patch adds a new kfunc available at TC hook to support arbitrary
SYN Cookie.
The basic usage is as follows:
struct bpf_tcp_req_attrs attrs = {
.mss = mss,
.wscale_ok = wscale_ok,
.rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
.snd_wscale = snd_wscale, /* Client's WScale < 15 */
.tstamp_ok = tstamp_ok,
.rcv_tsval = tsval,
.rcv_tsecr = tsecr, /* Server's Initial TSval */
.usec_ts_ok = usec_ts_ok,
.sack_ok = sack_ok,
.ecn_ok = ecn_ok,
}
skc = bpf_skc_lookup_tcp(...);
sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
bpf_sk_assign_tcp_reqsk(skb, sk, attrs, sizeof(attrs));
bpf_sk_release(skc);
bpf_sk_assign_tcp_reqsk() takes skb, a listener sk, and struct
bpf_tcp_req_attrs and allocates reqsk and configures it. Then,
bpf_sk_assign_tcp_reqsk() links reqsk with skb and the listener.
The notable thing here is that we do not hold refcnt for both reqsk
and listener. To differentiate that, we mark reqsk->syncookie, which
is only used in TX for now. So, if reqsk->syncookie is 1 in RX, it
means that the reqsk is allocated by kfunc.
When skb is freed, sock_pfree() checks if reqsk->syncookie is 1,
and in that case, we set NULL to reqsk->rsk_listener before calling
reqsk_free() as reqsk does not hold a refcnt of the listener.
When the TCP stack looks up a socket from the skb, we steal the
listener from the reqsk in skb_steal_sock() and create a full sk
in cookie_v[46]_check().
The refcnt of reqsk will finally be set to 1 in tcp_get_cookie_sock()
after creating a full sk.
Note that we can extend struct bpf_tcp_req_attrs in the future when
we add a new attribute that is determined in 3WHS.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/tcp.h | 14 ++++++
net/core/filter.c | 114 +++++++++++++++++++++++++++++++++++++++++++++-
net/core/sock.c | 14 +++++-
3 files changed, 138 insertions(+), 4 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dfe99a084a71..451dc1373970 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -600,6 +600,20 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
}
#if IS_ENABLED(CONFIG_BPF)
+struct bpf_tcp_req_attrs {
+ u32 rcv_tsval;
+ u32 rcv_tsecr;
+ u16 mss;
+ u8 rcv_wscale;
+ u8 snd_wscale;
+ u8 ecn_ok;
+ u8 wscale_ok;
+ u8 sack_ok;
+ u8 tstamp_ok;
+ u8 usec_ts_ok;
+ u8 reserved[3];
+};
+
static inline bool cookie_bpf_ok(struct sk_buff *skb)
{
return skb->sk;
diff --git a/net/core/filter.c b/net/core/filter.c
index 8c9f67c81e22..647d04171b7e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11837,6 +11837,106 @@ __bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
return 0;
}
+
+__bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct sk_buff *skb, struct sock *sk,
+ struct bpf_tcp_req_attrs *attrs, int attrs__sz)
+{
+#if IS_ENABLED(CONFIG_SYN_COOKIES)
+ const struct request_sock_ops *ops;
+ struct inet_request_sock *ireq;
+ struct tcp_request_sock *treq;
+ struct request_sock *req;
+ struct net *net;
+ __u16 min_mss;
+ u32 tsoff = 0;
+
+ if (attrs__sz != sizeof(*attrs) ||
+ attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2])
+ return -EINVAL;
+
+ if (!sk)
+ return -EINVAL;
+
+ if (!skb_at_tc_ingress(skb))
+ return -EINVAL;
+
+ net = dev_net(skb->dev);
+ if (net != sock_net(sk))
+ return -ENETUNREACH;
+
+ switch (skb->protocol) {
+ case htons(ETH_P_IP):
+ ops = &tcp_request_sock_ops;
+ min_mss = 536;
+ break;
+#if IS_BUILTIN(CONFIG_IPV6)
+ case htons(ETH_P_IPV6):
+ ops = &tcp6_request_sock_ops;
+ min_mss = IPV6_MIN_MTU - 60;
+ break;
+#endif
+ default:
+ return -EINVAL;
+ }
+
+ if (sk->sk_type != SOCK_STREAM || sk->sk_state != TCP_LISTEN ||
+ sk_is_mptcp(sk))
+ return -EINVAL;
+
+ if (attrs->mss < min_mss)
+ return -EINVAL;
+
+ if (attrs->wscale_ok) {
+ if (!READ_ONCE(net->ipv4.sysctl_tcp_window_scaling))
+ return -EINVAL;
+
+ if (attrs->snd_wscale > TCP_MAX_WSCALE ||
+ attrs->rcv_wscale > TCP_MAX_WSCALE)
+ return -EINVAL;
+ }
+
+ if (attrs->sack_ok && !READ_ONCE(net->ipv4.sysctl_tcp_sack))
+ return -EINVAL;
+
+ if (attrs->tstamp_ok) {
+ if (!READ_ONCE(net->ipv4.sysctl_tcp_timestamps))
+ return -EINVAL;
+
+ tsoff = attrs->rcv_tsecr - tcp_ns_to_ts(attrs->usec_ts_ok, tcp_clock_ns());
+ }
+
+ req = inet_reqsk_alloc(ops, sk, false);
+ if (!req)
+ return -ENOMEM;
+
+ ireq = inet_rsk(req);
+ treq = tcp_rsk(req);
+
+ req->rsk_listener = sk;
+ req->syncookie = 1;
+ req->mss = attrs->mss;
+ req->ts_recent = attrs->rcv_tsval;
+
+ ireq->snd_wscale = attrs->snd_wscale;
+ ireq->rcv_wscale = attrs->rcv_wscale;
+ ireq->tstamp_ok = !!attrs->tstamp_ok;
+ ireq->sack_ok = !!attrs->sack_ok;
+ ireq->wscale_ok = !!attrs->wscale_ok;
+ ireq->ecn_ok = !!attrs->ecn_ok;
+
+ treq->req_usec_ts = !!attrs->usec_ts_ok;
+ treq->ts_off = tsoff;
+
+ skb_orphan(skb);
+ skb->sk = req_to_sk(req);
+ skb->destructor = sock_pfree;
+
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
__bpf_kfunc_end_defs();
int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags,
@@ -11865,6 +11965,10 @@ BTF_SET8_START(bpf_kfunc_check_set_sock_addr)
BTF_ID_FLAGS(func, bpf_sock_addr_set_sun_path)
BTF_SET8_END(bpf_kfunc_check_set_sock_addr)
+BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk)
+BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
+BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk)
+
static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
.owner = THIS_MODULE,
.set = &bpf_kfunc_check_set_skb,
@@ -11880,6 +11984,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_addr = {
.set = &bpf_kfunc_check_set_sock_addr,
};
+static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
+ .owner = THIS_MODULE,
+ .set = &bpf_kfunc_check_set_tcp_reqsk,
+};
+
static int __init bpf_kfunc_init(void)
{
int ret;
@@ -11895,8 +12004,9 @@ static int __init bpf_kfunc_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
- return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
- &bpf_kfunc_set_sock_addr);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+ &bpf_kfunc_set_sock_addr);
+ return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
}
late_initcall(bpf_kfunc_init);
diff --git a/net/core/sock.c b/net/core/sock.c
index 158dbdebce6a..147fb2656e6b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2582,8 +2582,18 @@ EXPORT_SYMBOL(sock_efree);
#ifdef CONFIG_INET
void sock_pfree(struct sk_buff *skb)
{
- if (sk_is_refcounted(skb->sk))
- sock_gen_put(skb->sk);
+ struct sock *sk = skb->sk;
+
+ if (!sk_is_refcounted(sk))
+ return;
+
+ if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+ inet_reqsk(sk)->rsk_listener = NULL;
+ reqsk_free(inet_reqsk(sk));
+ return;
+ }
+
+ sock_gen_put(sk);
}
EXPORT_SYMBOL(sock_pfree);
#endif /* CONFIG_INET */
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk().
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (4 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-17 1:50 ` [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC patchwork-bot+netdevbpf
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
This commit adds a sample selftest to demonstrate how we can use
bpf_sk_assign_tcp_reqsk() as the backend of SYN Proxy.
The test creates IPv4/IPv6 x TCP connections and transfer messages
over them on lo with BPF tc prog attached.
The tc prog will process SYN and returns SYN+ACK with the following
ISN and TS. In a real use case, this part will be done by other
hosts.
MSB LSB
ISN: | 31 ... 8 | 7 6 | 5 | 4 | 3 2 1 0 |
| Hash_1 | MSS | ECN | SACK | WScale |
TS: | 31 ... 8 | 7 ... 0 |
| Random | Hash_2 |
WScale in SYN is reused in SYN+ACK.
The client returns ACK, and tc prog will recalculate ISN and TS
from ACK and validate SYN Cookie.
If it's valid, the prog calls kfunc to allocate a reqsk for skb and
configure the reqsk based on the argument created from SYN Cookie.
Later, the reqsk will be processed in cookie_v[46]_check() to create
a connection.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
tools/testing/selftests/bpf/bpf_kfuncs.h | 10 +
tools/testing/selftests/bpf/config | 1 +
.../bpf/prog_tests/tcp_custom_syncookie.c | 150 +++++
.../selftests/bpf/progs/bpf_tracing_net.h | 16 +
.../selftests/bpf/progs/test_siphash.h | 64 ++
.../bpf/progs/test_tcp_custom_syncookie.c | 572 ++++++++++++++++++
.../bpf/progs/test_tcp_custom_syncookie.h | 140 +++++
7 files changed, 953 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
create mode 100644 tools/testing/selftests/bpf/progs/test_siphash.h
create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
index b4e78c1eb37b..23c24f852f4f 100644
--- a/tools/testing/selftests/bpf/bpf_kfuncs.h
+++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
@@ -51,6 +51,16 @@ extern int bpf_dynptr_clone(const struct bpf_dynptr *ptr, struct bpf_dynptr *clo
extern int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
const __u8 *sun_path, __u32 sun_path__sz) __ksym;
+/* Description
+ * Allocate and configure a reqsk and link it with a listener and skb.
+ * Returns
+ * Error code
+ */
+struct sock;
+struct bpf_tcp_req_attrs;
+extern int bpf_sk_assign_tcp_reqsk(struct __sk_buff *skb, struct sock *sk,
+ struct bpf_tcp_req_attrs *attrs, int attrs__sz) __ksym;
+
void *bpf_cast_to_kern_ctx(void *) __ksym;
void *bpf_rdonly_cast(void *obj, __u32 btf_id) __ksym;
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index c125c441abc7..01f241ea2c67 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -81,6 +81,7 @@ CONFIG_NF_NAT=y
CONFIG_RC_CORE=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
+CONFIG_SYN_COOKIES=y
CONFIG_TEST_BPF=m
CONFIG_USERFAULTFD=y
CONFIG_VSOCKETS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c b/tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
new file mode 100644
index 000000000000..eaf441dc7e79
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdlib.h>
+#include <net/if.h>
+
+#include "test_progs.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+#include "test_tcp_custom_syncookie.skel.h"
+
+static struct test_tcp_custom_syncookie_case {
+ int family, type;
+ char addr[16];
+ char name[10];
+} test_cases[] = {
+ {
+ .name = "IPv4 TCP",
+ .family = AF_INET,
+ .type = SOCK_STREAM,
+ .addr = "127.0.0.1",
+ },
+ {
+ .name = "IPv6 TCP",
+ .family = AF_INET6,
+ .type = SOCK_STREAM,
+ .addr = "::1",
+ },
+};
+
+static int setup_netns(void)
+{
+ if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns"))
+ return -1;
+
+ if (!ASSERT_OK(system("ip link set dev lo up"), "ip"))
+ goto err;
+
+ if (!ASSERT_OK(write_sysctl("/proc/sys/net/ipv4/tcp_ecn", "1"),
+ "write_sysctl"))
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int setup_tc(struct test_tcp_custom_syncookie *skel)
+{
+ LIBBPF_OPTS(bpf_tc_hook, qdisc_lo, .attach_point = BPF_TC_INGRESS);
+ LIBBPF_OPTS(bpf_tc_opts, tc_attach,
+ .prog_fd = bpf_program__fd(skel->progs.tcp_custom_syncookie));
+
+ qdisc_lo.ifindex = if_nametoindex("lo");
+ if (!ASSERT_OK(bpf_tc_hook_create(&qdisc_lo), "qdisc add dev lo clsact"))
+ goto err;
+
+ if (!ASSERT_OK(bpf_tc_attach(&qdisc_lo, &tc_attach),
+ "filter add dev lo ingress"))
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+#define msg "Hello World"
+#define msglen 11
+
+static void transfer_message(int sender, int receiver)
+{
+ char buf[msglen];
+ int ret;
+
+ ret = send(sender, msg, msglen, 0);
+ if (!ASSERT_EQ(ret, msglen, "send"))
+ return;
+
+ memset(buf, 0, sizeof(buf));
+
+ ret = recv(receiver, buf, msglen, 0);
+ if (!ASSERT_EQ(ret, msglen, "recv"))
+ return;
+
+ ret = strncmp(buf, msg, msglen);
+ if (!ASSERT_EQ(ret, 0, "strncmp"))
+ return;
+}
+
+static void create_connection(struct test_tcp_custom_syncookie_case *test_case)
+{
+ int server, client, child;
+
+ server = start_server(test_case->family, test_case->type, test_case->addr, 0, 0);
+ if (!ASSERT_NEQ(server, -1, "start_server"))
+ return;
+
+ client = connect_to_fd(server, 0);
+ if (!ASSERT_NEQ(client, -1, "connect_to_fd"))
+ goto close_server;
+
+ child = accept(server, NULL, 0);
+ if (!ASSERT_NEQ(child, -1, "accept"))
+ goto close_client;
+
+ transfer_message(client, child);
+ transfer_message(child, client);
+
+ close(child);
+close_client:
+ close(client);
+close_server:
+ close(server);
+}
+
+void test_tcp_custom_syncookie(void)
+{
+ struct test_tcp_custom_syncookie *skel;
+ int i;
+
+ if (setup_netns())
+ return;
+
+ skel = test_tcp_custom_syncookie__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ if (setup_tc(skel))
+ goto destroy_skel;
+
+ for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+ if (!test__start_subtest(test_cases[i].name))
+ continue;
+
+ skel->bss->handled_syn = false;
+ skel->bss->handled_ack = false;
+
+ create_connection(&test_cases[i]);
+
+ ASSERT_EQ(skel->bss->handled_syn, true, "SYN is not handled at tc.");
+ ASSERT_EQ(skel->bss->handled_ack, true, "ACK is not handled at tc");
+ }
+
+destroy_skel:
+ system("tc qdisc del dev lo clsact");
+
+ test_tcp_custom_syncookie__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1bdc680b0e0e..49e525ad9856 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -51,9 +51,25 @@
#define ICSK_TIME_LOSS_PROBE 5
#define ICSK_TIME_REO_TIMEOUT 6
+#define ETH_ALEN 6
#define ETH_HLEN 14
+#define ETH_P_IP 0x0800
#define ETH_P_IPV6 0x86DD
+#define NEXTHDR_TCP 6
+
+#define TCPOPT_NOP 1
+#define TCPOPT_EOL 0
+#define TCPOPT_MSS 2
+#define TCPOPT_WINDOW 3
+#define TCPOPT_TIMESTAMP 8
+#define TCPOPT_SACK_PERM 4
+
+#define TCPOLEN_MSS 4
+#define TCPOLEN_WINDOW 3
+#define TCPOLEN_TIMESTAMP 10
+#define TCPOLEN_SACK_PERM 2
+
#define CHECKSUM_NONE 0
#define CHECKSUM_PARTIAL 3
diff --git a/tools/testing/selftests/bpf/progs/test_siphash.h b/tools/testing/selftests/bpf/progs/test_siphash.h
new file mode 100644
index 000000000000..5d3a7ec36780
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_siphash.h
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#ifndef _TEST_SIPHASH_H
+#define _TEST_SIPHASH_H
+
+/* include/linux/bitops.h */
+static inline u64 rol64(u64 word, unsigned int shift)
+{
+ return (word << (shift & 63)) | (word >> ((-shift) & 63));
+}
+
+/* include/linux/siphash.h */
+#define SIPHASH_PERMUTATION(a, b, c, d) ( \
+ (a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \
+ (c) += (d), (d) = rol64((d), 16), (d) ^= (c), \
+ (a) += (d), (d) = rol64((d), 21), (d) ^= (a), \
+ (c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32))
+
+#define SIPHASH_CONST_0 0x736f6d6570736575ULL
+#define SIPHASH_CONST_1 0x646f72616e646f6dULL
+#define SIPHASH_CONST_2 0x6c7967656e657261ULL
+#define SIPHASH_CONST_3 0x7465646279746573ULL
+
+/* lib/siphash.c */
+#define SIPROUND SIPHASH_PERMUTATION(v0, v1, v2, v3)
+
+#define PREAMBLE(len) \
+ u64 v0 = SIPHASH_CONST_0; \
+ u64 v1 = SIPHASH_CONST_1; \
+ u64 v2 = SIPHASH_CONST_2; \
+ u64 v3 = SIPHASH_CONST_3; \
+ u64 b = ((u64)(len)) << 56; \
+ v3 ^= key->key[1]; \
+ v2 ^= key->key[0]; \
+ v1 ^= key->key[1]; \
+ v0 ^= key->key[0];
+
+#define POSTAMBLE \
+ v3 ^= b; \
+ SIPROUND; \
+ SIPROUND; \
+ v0 ^= b; \
+ v2 ^= 0xff; \
+ SIPROUND; \
+ SIPROUND; \
+ SIPROUND; \
+ SIPROUND; \
+ return (v0 ^ v1) ^ (v2 ^ v3);
+
+static inline u64 siphash_2u64(const u64 first, const u64 second, const siphash_key_t *key)
+{
+ PREAMBLE(16)
+ v3 ^= first;
+ SIPROUND;
+ SIPROUND;
+ v0 ^= first;
+ v3 ^= second;
+ SIPROUND;
+ SIPROUND;
+ v0 ^= second;
+ POSTAMBLE
+}
+#endif
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
new file mode 100644
index 000000000000..a5501b29979a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
@@ -0,0 +1,572 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_tracing_net.h"
+#include "bpf_kfuncs.h"
+#include "test_siphash.h"
+#include "test_tcp_custom_syncookie.h"
+
+/* Hash is calculated for each client and split into ISN and TS.
+ *
+ * MSB LSB
+ * ISN: | 31 ... 8 | 7 6 | 5 | 4 | 3 2 1 0 |
+ * | Hash_1 | MSS | ECN | SACK | WScale |
+ *
+ * TS: | 31 ... 8 | 7 ... 0 |
+ * | Random | Hash_2 |
+ */
+#define COOKIE_BITS 8
+#define COOKIE_MASK (((__u32)1 << COOKIE_BITS) - 1)
+
+enum {
+ /* 0xf is invalid thus means that SYN did not have WScale. */
+ BPF_SYNCOOKIE_WSCALE_MASK = (1 << 4) - 1,
+ BPF_SYNCOOKIE_SACK = (1 << 4),
+ BPF_SYNCOOKIE_ECN = (1 << 5),
+};
+
+#define MSS_LOCAL_IPV4 65495
+#define MSS_LOCAL_IPV6 65476
+
+const __u16 msstab4[] = {
+ 536,
+ 1300,
+ 1460,
+ MSS_LOCAL_IPV4,
+};
+
+const __u16 msstab6[] = {
+ 1280 - 60, /* IPV6_MIN_MTU - 60 */
+ 1480 - 60,
+ 9000 - 60,
+ MSS_LOCAL_IPV6,
+};
+
+static siphash_key_t test_key_siphash = {
+ { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL }
+};
+
+struct tcp_syncookie {
+ struct __sk_buff *skb;
+ void *data_end;
+ struct ethhdr *eth;
+ struct iphdr *ipv4;
+ struct ipv6hdr *ipv6;
+ struct tcphdr *tcp;
+ union {
+ char *ptr;
+ __be32 *ptr32;
+ };
+ struct bpf_tcp_req_attrs attrs;
+ u32 cookie;
+ u64 first;
+};
+
+bool handled_syn, handled_ack;
+
+static int tcp_load_headers(struct tcp_syncookie *ctx)
+{
+ ctx->data_end = (void *)(long)ctx->skb->data_end;
+ ctx->eth = (struct ethhdr *)(long)ctx->skb->data;
+
+ if (ctx->eth + 1 > ctx->data_end)
+ goto err;
+
+ switch (bpf_ntohs(ctx->eth->h_proto)) {
+ case ETH_P_IP:
+ ctx->ipv4 = (struct iphdr *)(ctx->eth + 1);
+
+ if (ctx->ipv4 + 1 > ctx->data_end)
+ goto err;
+
+ if (ctx->ipv4->ihl != sizeof(*ctx->ipv4) / 4)
+ goto err;
+
+ if (ctx->ipv4->version != 4)
+ goto err;
+
+ if (ctx->ipv4->protocol != IPPROTO_TCP)
+ goto err;
+
+ ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1);
+ break;
+ case ETH_P_IPV6:
+ ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1);
+
+ if (ctx->ipv6 + 1 > ctx->data_end)
+ goto err;
+
+ if (ctx->ipv6->version != 6)
+ goto err;
+
+ if (ctx->ipv6->nexthdr != NEXTHDR_TCP)
+ goto err;
+
+ ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1);
+ break;
+ default:
+ goto err;
+ }
+
+ if (ctx->tcp + 1 > ctx->data_end)
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int tcp_reload_headers(struct tcp_syncookie *ctx)
+{
+ /* Without volatile,
+ * R3 32-bit pointer arithmetic prohibited
+ */
+ volatile u64 data_len = ctx->skb->data_end - ctx->skb->data;
+
+ if (ctx->tcp->doff < sizeof(*ctx->tcp) / 4)
+ goto err;
+
+ /* Needed to calculate csum and parse TCP options. */
+ if (bpf_skb_change_tail(ctx->skb, data_len + 60 - ctx->tcp->doff * 4, 0))
+ goto err;
+
+ ctx->data_end = (void *)(long)ctx->skb->data_end;
+ ctx->eth = (struct ethhdr *)(long)ctx->skb->data;
+ if (ctx->ipv4) {
+ ctx->ipv4 = (struct iphdr *)(ctx->eth + 1);
+ ctx->ipv6 = NULL;
+ ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1);
+ } else {
+ ctx->ipv4 = NULL;
+ ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1);
+ ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1);
+ }
+
+ if ((void *)ctx->tcp + 60 > ctx->data_end)
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static __sum16 tcp_v4_csum(struct tcp_syncookie *ctx, __wsum csum)
+{
+ return csum_tcpudp_magic(ctx->ipv4->saddr, ctx->ipv4->daddr,
+ ctx->tcp->doff * 4, IPPROTO_TCP, csum);
+}
+
+static __sum16 tcp_v6_csum(struct tcp_syncookie *ctx, __wsum csum)
+{
+ return csum_ipv6_magic(&ctx->ipv6->saddr, &ctx->ipv6->daddr,
+ ctx->tcp->doff * 4, IPPROTO_TCP, csum);
+}
+
+static int tcp_validate_header(struct tcp_syncookie *ctx)
+{
+ s64 csum;
+
+ if (tcp_reload_headers(ctx))
+ goto err;
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0);
+ if (csum < 0)
+ goto err;
+
+ if (ctx->ipv4) {
+ /* check tcp_v4_csum(csum) is 0 if not on lo. */
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, ctx->ipv4->ihl * 4, 0);
+ if (csum < 0)
+ goto err;
+
+ if (csum_fold(csum) != 0)
+ goto err;
+ } else if (ctx->ipv6) {
+ /* check tcp_v6_csum(csum) is 0 if not on lo. */
+ }
+
+ return 0;
+err:
+ return -1;
+}
+
+static int tcp_parse_option(__u32 index, struct tcp_syncookie *ctx)
+{
+ char opcode, opsize;
+
+ if (ctx->ptr + 1 > ctx->data_end)
+ goto stop;
+
+ opcode = *ctx->ptr++;
+
+ if (opcode == TCPOPT_EOL)
+ goto stop;
+
+ if (opcode == TCPOPT_NOP)
+ goto next;
+
+ if (ctx->ptr + 1 > ctx->data_end)
+ goto stop;
+
+ opsize = *ctx->ptr++;
+
+ if (opsize < 2)
+ goto stop;
+
+ switch (opcode) {
+ case TCPOPT_MSS:
+ if (opsize == TCPOLEN_MSS && ctx->tcp->syn &&
+ ctx->ptr + (TCPOLEN_MSS - 2) < ctx->data_end)
+ ctx->attrs.mss = get_unaligned_be16(ctx->ptr);
+ break;
+ case TCPOPT_WINDOW:
+ if (opsize == TCPOLEN_WINDOW && ctx->tcp->syn &&
+ ctx->ptr + (TCPOLEN_WINDOW - 2) < ctx->data_end) {
+ ctx->attrs.wscale_ok = 1;
+ ctx->attrs.snd_wscale = *ctx->ptr;
+ }
+ break;
+ case TCPOPT_TIMESTAMP:
+ if (opsize == TCPOLEN_TIMESTAMP &&
+ ctx->ptr + (TCPOLEN_TIMESTAMP - 2) < ctx->data_end) {
+ ctx->attrs.rcv_tsval = get_unaligned_be32(ctx->ptr);
+ ctx->attrs.rcv_tsecr = get_unaligned_be32(ctx->ptr + 4);
+
+ if (ctx->tcp->syn && ctx->attrs.rcv_tsecr)
+ ctx->attrs.tstamp_ok = 0;
+ else
+ ctx->attrs.tstamp_ok = 1;
+ }
+ break;
+ case TCPOPT_SACK_PERM:
+ if (opsize == TCPOLEN_SACK_PERM && ctx->tcp->syn &&
+ ctx->ptr + (TCPOLEN_SACK_PERM - 2) < ctx->data_end)
+ ctx->attrs.sack_ok = 1;
+ break;
+ }
+
+ ctx->ptr += opsize - 2;
+next:
+ return 0;
+stop:
+ return 1;
+}
+
+static void tcp_parse_options(struct tcp_syncookie *ctx)
+{
+ ctx->ptr = (char *)(ctx->tcp + 1);
+
+ bpf_loop(40, tcp_parse_option, ctx, 0);
+}
+
+static int tcp_validate_sysctl(struct tcp_syncookie *ctx)
+{
+ if ((ctx->ipv4 && ctx->attrs.mss != MSS_LOCAL_IPV4) ||
+ (ctx->ipv6 && ctx->attrs.mss != MSS_LOCAL_IPV6))
+ goto err;
+
+ if (!ctx->attrs.wscale_ok || ctx->attrs.snd_wscale != 7)
+ goto err;
+
+ if (!ctx->attrs.tstamp_ok)
+ goto err;
+
+ if (!ctx->attrs.sack_ok)
+ goto err;
+
+ if (!ctx->tcp->ece || !ctx->tcp->cwr)
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static void tcp_prepare_cookie(struct tcp_syncookie *ctx)
+{
+ u32 seq = bpf_ntohl(ctx->tcp->seq);
+ u64 first = 0, second;
+ int mssind = 0;
+ u32 hash;
+
+ if (ctx->ipv4) {
+ for (mssind = ARRAY_SIZE(msstab4) - 1; mssind; mssind--)
+ if (ctx->attrs.mss >= msstab4[mssind])
+ break;
+
+ ctx->attrs.mss = msstab4[mssind];
+
+ first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr;
+ } else if (ctx->ipv6) {
+ for (mssind = ARRAY_SIZE(msstab6) - 1; mssind; mssind--)
+ if (ctx->attrs.mss >= msstab6[mssind])
+ break;
+
+ ctx->attrs.mss = msstab6[mssind];
+
+ first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 |
+ ctx->ipv6->daddr.in6_u.u6_addr32[0];
+ }
+
+ second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest;
+ hash = siphash_2u64(first, second, &test_key_siphash);
+
+ if (ctx->attrs.tstamp_ok) {
+ ctx->attrs.rcv_tsecr = bpf_get_prandom_u32();
+ ctx->attrs.rcv_tsecr &= ~COOKIE_MASK;
+ ctx->attrs.rcv_tsecr |= hash & COOKIE_MASK;
+ }
+
+ hash &= ~COOKIE_MASK;
+ hash |= mssind << 6;
+
+ if (ctx->attrs.wscale_ok)
+ hash |= ctx->attrs.snd_wscale & BPF_SYNCOOKIE_WSCALE_MASK;
+
+ if (ctx->attrs.sack_ok)
+ hash |= BPF_SYNCOOKIE_SACK;
+
+ if (ctx->attrs.tstamp_ok && ctx->tcp->ece && ctx->tcp->cwr)
+ hash |= BPF_SYNCOOKIE_ECN;
+
+ ctx->cookie = hash;
+}
+
+static void tcp_write_options(struct tcp_syncookie *ctx)
+{
+ ctx->ptr32 = (__be32 *)(ctx->tcp + 1);
+
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_MSS << 24 | TCPOLEN_MSS << 16 |
+ ctx->attrs.mss);
+
+ if (ctx->attrs.wscale_ok)
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
+ TCPOPT_WINDOW << 16 |
+ TCPOLEN_WINDOW << 8 |
+ ctx->attrs.snd_wscale);
+
+ if (ctx->attrs.tstamp_ok) {
+ if (ctx->attrs.sack_ok)
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_SACK_PERM << 24 |
+ TCPOLEN_SACK_PERM << 16 |
+ TCPOPT_TIMESTAMP << 8 |
+ TCPOLEN_TIMESTAMP);
+ else
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
+ TCPOPT_NOP << 16 |
+ TCPOPT_TIMESTAMP << 8 |
+ TCPOLEN_TIMESTAMP);
+
+ *ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsecr);
+ *ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsval);
+ } else if (ctx->attrs.sack_ok) {
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
+ TCPOPT_NOP << 16 |
+ TCPOPT_SACK_PERM << 8 |
+ TCPOLEN_SACK_PERM);
+ }
+}
+
+static int tcp_handle_syn(struct tcp_syncookie *ctx)
+{
+ s64 csum;
+
+ if (tcp_validate_header(ctx))
+ goto err;
+
+ tcp_parse_options(ctx);
+
+ if (tcp_validate_sysctl(ctx))
+ goto err;
+
+ tcp_prepare_cookie(ctx);
+ tcp_write_options(ctx);
+
+ swap(ctx->tcp->source, ctx->tcp->dest);
+ ctx->tcp->check = 0;
+ ctx->tcp->ack_seq = bpf_htonl(bpf_ntohl(ctx->tcp->seq) + 1);
+ ctx->tcp->seq = bpf_htonl(ctx->cookie);
+ ctx->tcp->doff = ((long)ctx->ptr32 - (long)ctx->tcp) >> 2;
+ ctx->tcp->ack = 1;
+ if (!ctx->attrs.tstamp_ok || !ctx->tcp->ece || !ctx->tcp->cwr)
+ ctx->tcp->ece = 0;
+ ctx->tcp->cwr = 0;
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0);
+ if (csum < 0)
+ goto err;
+
+ if (ctx->ipv4) {
+ swap(ctx->ipv4->saddr, ctx->ipv4->daddr);
+ ctx->tcp->check = tcp_v4_csum(ctx, csum);
+
+ ctx->ipv4->check = 0;
+ ctx->ipv4->tos = 0;
+ ctx->ipv4->tot_len = bpf_htons((long)ctx->ptr32 - (long)ctx->ipv4);
+ ctx->ipv4->id = 0;
+ ctx->ipv4->ttl = 64;
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, sizeof(*ctx->ipv4), 0);
+ if (csum < 0)
+ goto err;
+
+ ctx->ipv4->check = csum_fold(csum);
+ } else if (ctx->ipv6) {
+ swap(ctx->ipv6->saddr, ctx->ipv6->daddr);
+ ctx->tcp->check = tcp_v6_csum(ctx, csum);
+
+ *(__be32 *)ctx->ipv6 = bpf_htonl(0x60000000);
+ ctx->ipv6->payload_len = bpf_htons((long)ctx->ptr32 - (long)ctx->tcp);
+ ctx->ipv6->hop_limit = 64;
+ }
+
+ swap_array(ctx->eth->h_source, ctx->eth->h_dest);
+
+ if (bpf_skb_change_tail(ctx->skb, (long)ctx->ptr32 - (long)ctx->eth, 0))
+ goto err;
+
+ return bpf_redirect(ctx->skb->ifindex, 0);
+err:
+ return TC_ACT_SHOT;
+}
+
+static int tcp_validate_cookie(struct tcp_syncookie *ctx)
+{
+ u32 cookie = bpf_ntohl(ctx->tcp->ack_seq) - 1;
+ u32 seq = bpf_ntohl(ctx->tcp->seq) - 1;
+ u64 first = 0, second;
+ int mssind;
+ u32 hash;
+
+ if (ctx->ipv4)
+ first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr;
+ else if (ctx->ipv6)
+ first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 |
+ ctx->ipv6->daddr.in6_u.u6_addr32[0];
+
+ second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest;
+ hash = siphash_2u64(first, second, &test_key_siphash);
+
+ if (ctx->attrs.tstamp_ok)
+ hash -= ctx->attrs.rcv_tsecr & COOKIE_MASK;
+ else
+ hash &= ~COOKIE_MASK;
+
+ hash -= cookie & ~COOKIE_MASK;
+ if (hash)
+ goto err;
+
+ mssind = (cookie & (3 << 6)) >> 6;
+ if (ctx->ipv4) {
+ if (mssind > ARRAY_SIZE(msstab4))
+ goto err;
+
+ ctx->attrs.mss = msstab4[mssind];
+ } else {
+ if (mssind > ARRAY_SIZE(msstab6))
+ goto err;
+
+ ctx->attrs.mss = msstab6[mssind];
+ }
+
+ ctx->attrs.snd_wscale = cookie & BPF_SYNCOOKIE_WSCALE_MASK;
+ ctx->attrs.rcv_wscale = ctx->attrs.snd_wscale;
+ ctx->attrs.wscale_ok = ctx->attrs.snd_wscale == BPF_SYNCOOKIE_WSCALE_MASK;
+ ctx->attrs.sack_ok = cookie & BPF_SYNCOOKIE_SACK;
+ ctx->attrs.ecn_ok = cookie & BPF_SYNCOOKIE_ECN;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int tcp_handle_ack(struct tcp_syncookie *ctx)
+{
+ struct bpf_sock_tuple tuple;
+ struct bpf_sock *skc;
+ int ret = TC_ACT_OK;
+ struct sock *sk;
+ u32 tuple_size;
+
+ if (ctx->ipv4) {
+ tuple.ipv4.saddr = ctx->ipv4->saddr;
+ tuple.ipv4.daddr = ctx->ipv4->daddr;
+ tuple.ipv4.sport = ctx->tcp->source;
+ tuple.ipv4.dport = ctx->tcp->dest;
+ tuple_size = sizeof(tuple.ipv4);
+ } else if (ctx->ipv6) {
+ __builtin_memcpy(tuple.ipv6.saddr, &ctx->ipv6->saddr, sizeof(tuple.ipv6.saddr));
+ __builtin_memcpy(tuple.ipv6.daddr, &ctx->ipv6->daddr, sizeof(tuple.ipv6.daddr));
+ tuple.ipv6.sport = ctx->tcp->source;
+ tuple.ipv6.dport = ctx->tcp->dest;
+ tuple_size = sizeof(tuple.ipv6);
+ } else {
+ goto out;
+ }
+
+ skc = bpf_skc_lookup_tcp(ctx->skb, &tuple, tuple_size, -1, 0);
+ if (!skc)
+ goto out;
+
+ if (skc->state != TCP_LISTEN)
+ goto release;
+
+ sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
+ if (!sk)
+ goto err;
+
+ if (tcp_validate_header(ctx))
+ goto err;
+
+ tcp_parse_options(ctx);
+
+ if (tcp_validate_cookie(ctx))
+ goto err;
+
+ ret = bpf_sk_assign_tcp_reqsk(ctx->skb, sk, &ctx->attrs, sizeof(ctx->attrs));
+ if (ret < 0)
+ goto err;
+
+release:
+ bpf_sk_release(skc);
+out:
+ return ret;
+
+err:
+ ret = TC_ACT_SHOT;
+ goto release;
+}
+
+SEC("tc")
+int tcp_custom_syncookie(struct __sk_buff *skb)
+{
+ struct tcp_syncookie ctx = {
+ .skb = skb,
+ };
+
+ if (tcp_load_headers(&ctx))
+ return TC_ACT_OK;
+
+ if (ctx.tcp->rst)
+ return TC_ACT_OK;
+
+ if (ctx.tcp->syn) {
+ if (ctx.tcp->ack)
+ return TC_ACT_OK;
+
+ handled_syn = true;
+
+ return tcp_handle_syn(&ctx);
+ }
+
+ handled_ack = true;
+
+ return tcp_handle_ack(&ctx);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
new file mode 100644
index 000000000000..29a6a53cf229
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#ifndef _TEST_TCP_SYNCOOKIE_H
+#define _TEST_TCP_SYNCOOKIE_H
+
+#define __packed __attribute__((__packed__))
+#define __force
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define swap(a, b) \
+ do { \
+ typeof(a) __tmp = (a); \
+ (a) = (b); \
+ (b) = __tmp; \
+ } while (0)
+
+#define swap_array(a, b) \
+ do { \
+ typeof(a) __tmp[sizeof(a)]; \
+ __builtin_memcpy(__tmp, a, sizeof(a)); \
+ __builtin_memcpy(a, b, sizeof(a)); \
+ __builtin_memcpy(b, __tmp, sizeof(a)); \
+ } while (0)
+
+/* asm-generic/unaligned.h */
+#define __get_unaligned_t(type, ptr) ({ \
+ const struct { type x; } __packed * __pptr = (typeof(__pptr))(ptr); \
+ __pptr->x; \
+})
+
+#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr))
+
+static inline u16 get_unaligned_be16(const void *p)
+{
+ return bpf_ntohs(__get_unaligned_t(__be16, p));
+}
+
+static inline u32 get_unaligned_be32(const void *p)
+{
+ return bpf_ntohl(__get_unaligned_t(__be32, p));
+}
+
+/* lib/checksum.c */
+static inline u32 from64to32(u64 x)
+{
+ /* add up 32-bit and 32-bit for 32+c bit */
+ x = (x & 0xffffffff) + (x >> 32);
+ /* add up carry.. */
+ x = (x & 0xffffffff) + (x >> 32);
+ return (u32)x;
+}
+
+static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
+ __u32 len, __u8 proto, __wsum sum)
+{
+ unsigned long long s = (__force u32)sum;
+
+ s += (__force u32)saddr;
+ s += (__force u32)daddr;
+#ifdef __BIG_ENDIAN
+ s += proto + len;
+#else
+ s += (proto + len) << 8;
+#endif
+ return (__force __wsum)from64to32(s);
+}
+
+/* asm-generic/checksum.h */
+static inline __sum16 csum_fold(__wsum csum)
+{
+ u32 sum = (__force u32)csum;
+
+ sum = (sum & 0xffff) + (sum >> 16);
+ sum = (sum & 0xffff) + (sum >> 16);
+ return (__force __sum16)~sum;
+}
+
+static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
+ __u8 proto, __wsum sum)
+{
+ return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
+}
+
+/* net/ipv6/ip6_checksum.c */
+static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u32 len, __u8 proto, __wsum csum)
+{
+ int carry;
+ __u32 ulen;
+ __u32 uproto;
+ __u32 sum = (__force u32)csum;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[0];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[0]);
+ sum += carry;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[1];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[1]);
+ sum += carry;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[2];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[2]);
+ sum += carry;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[3];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[3]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[0];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[0]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[1];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[1]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[2];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[2]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[3];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[3]);
+ sum += carry;
+
+ ulen = (__force u32)bpf_htonl((__u32)len);
+ sum += ulen;
+ carry = (sum < ulen);
+ sum += carry;
+
+ uproto = (__force u32)bpf_htonl(proto);
+ sum += uproto;
+ carry = (sum < uproto);
+ sum += carry;
+
+ return csum_fold((__force __wsum)sum);
+}
+#endif
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC.
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (5 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk() Kuniyuki Iwashima
@ 2024-01-17 1:50 ` patchwork-bot+netdevbpf
6 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-17 1:50 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: edumazet, ast, daniel, andrii, martin.lau, pabeni, kuni1840, bpf,
netdev
Hello:
This series was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:
On Mon, 15 Jan 2024 12:55:08 -0800 you wrote:
> Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless
> for the connection request until a valid ACK is responded to the SYN+ACK.
>
> The cookie contains two kinds of host-specific bits, a timestamp and
> secrets, so only can it be validated by the generator. It means SYN
> Cookie consumes network resources between the client and the server;
> intermediate nodes must remember which nodes to route ACK for the cookie.
>
> [...]
Here is the summary with links:
- [v8,bpf-next,1/6] tcp: Move tcp_ns_to_ts() to tcp.h
https://git.kernel.org/bpf/bpf-next/c/e8a7ea899527
- [v8,bpf-next,2/6] tcp: Move skb_steal_sock() to request_sock.h
https://git.kernel.org/bpf/bpf-next/c/2d1ee30a3b07
- [v8,bpf-next,3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().
https://git.kernel.org/bpf/bpf-next/c/5f8b96b9b391
- [v8,bpf-next,4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
https://git.kernel.org/bpf/bpf-next/c/311ef79955d3
- [v8,bpf-next,5/6] bpf: tcp: Support arbitrary SYN Cookie.
https://git.kernel.org/bpf/bpf-next/c/b9c3eca5c086
- [v8,bpf-next,6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk().
https://git.kernel.org/bpf/bpf-next/c/98af7dca1e0d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie.
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
@ 2024-01-17 1:55 ` Martin KaFai Lau
0 siblings, 0 replies; 11+ messages in thread
From: Martin KaFai Lau @ 2024-01-17 1:55 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: Kuniyuki Iwashima, bpf, netdev, Eric Dumazet, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Paolo Abeni
On 1/15/24 12:55 PM, Kuniyuki Iwashima wrote:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8c9f67c81e22..647d04171b7e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11837,6 +11837,106 @@ __bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
>
> return 0;
> }
> +
> +__bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct sk_buff *skb, struct sock *sk,
> + struct bpf_tcp_req_attrs *attrs, int attrs__sz)
> +{
> +#if IS_ENABLED(CONFIG_SYN_COOKIES)
> + const struct request_sock_ops *ops;
> + struct inet_request_sock *ireq;
> + struct tcp_request_sock *treq;
> + struct request_sock *req;
> + struct net *net;
> + __u16 min_mss;
> + u32 tsoff = 0;
> +
> + if (attrs__sz != sizeof(*attrs) ||
> + attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2])
> + return -EINVAL;
> +
> + if (!sk)
I removed this "!sk" check, the verifier will check for it,
and ...
> +BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk)
> +BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
... limited it to KF_TRUSTED_ARGS. The arg "sk" must be from "bpf_sk*_lookup_*"
or from "bpf_map_lookup_elem(&sock_map,...)". Both of them have
"reg->ref_obj_id" (i.e. the verifier tracks the refcnt acquire/release) and it
is as good as trusted ptr.
The above is some final details I noticed. Applied. Thanks.
> +BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
@ 2024-03-15 13:37 ` Eric Dumazet
2024-03-15 19:02 ` Kuniyuki Iwashima
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2024-03-15 13:37 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Paolo Abeni, Kuniyuki Iwashima, bpf, netdev
On Mon, Jan 15, 2024 at 9:57 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will support arbitrary SYN Cookie with BPF in the following
> patch.
>
> If BPF prog validates ACK and kfunc allocates a reqsk, it will
> be carried to cookie_[46]_check() as skb->sk. If skb->sk is not
> NULL, we call cookie_bpf_check().
>
> Then, we clear skb->sk and skb->destructor, which are needed not
> to hold refcnt for reqsk and the listener. See the following patch
> for details.
>
> After that, we finish initialisation for the remaining fields with
> cookie_tcp_reqsk_init().
>
> Note that the server side WScale is set only for non-BPF SYN Cookie.
So the difference between BPF and non-BPF is using a req->syncookie
which had a prior meaning ?
This is very confusing, and needs documentation/comments.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
> include/net/tcp.h | 20 ++++++++++++++++++++
> net/ipv4/syncookies.c | 31 +++++++++++++++++++++++++++----
> net/ipv6/syncookies.c | 13 +++++++++----
> 3 files changed, 56 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 114000e71a46..dfe99a084a71 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -599,6 +599,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
> dst_feature(dst, RTAX_FEATURE_ECN);
> }
>
> +#if IS_ENABLED(CONFIG_BPF)
> +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> +{
> + return skb->sk;
> +}
> +
> +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
> +#else
> +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> +{
> + return false;
> +}
> +
> +static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
> + struct sk_buff *skb)
> +{
> + return NULL;
> +}
> +#endif
> +
> /* From net/ipv6/syncookies.c */
> int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
> struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 981944c22820..be88bf586ff9 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -295,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
> return 0;
> }
>
> +#if IS_ENABLED(CONFIG_BPF)
> +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
> +{
> + struct request_sock *req = inet_reqsk(skb->sk);
> +
> + skb->sk = NULL;
> + skb->destructor = NULL;
> +
> + if (cookie_tcp_reqsk_init(sk, skb, req)) {
> + reqsk_free(req);
> + req = NULL;
> + }
> +
> + return req;
> +}
> +EXPORT_SYMBOL_GPL(cookie_bpf_check);
> +#endif
> +
> struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
> struct sock *sk, struct sk_buff *skb,
> struct tcp_options_received *tcp_opt,
> @@ -395,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> !th->ack || th->rst)
> goto out;
>
> - req = cookie_tcp_check(net, sk, skb);
> - if (IS_ERR(req))
> - goto out;
> + if (cookie_bpf_ok(skb)) {
> + req = cookie_bpf_check(sk, skb);
> + } else {
> + req = cookie_tcp_check(net, sk, skb);
> + if (IS_ERR(req))
> + goto out;
> + }
> if (!req)
> goto out_drop;
>
> @@ -445,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> ireq->wscale_ok, &rcv_wscale,
> dst_metric(&rt->dst, RTAX_INITRWND));
>
> - ireq->rcv_wscale = rcv_wscale;
> + if (!req->syncookie)
> + ireq->rcv_wscale = rcv_wscale;
> ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
>
> ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
> diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
> index c8d2ca27220c..6b9c69278819 100644
> --- a/net/ipv6/syncookies.c
> +++ b/net/ipv6/syncookies.c
> @@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> !th->ack || th->rst)
> goto out;
>
> - req = cookie_tcp_check(net, sk, skb);
> - if (IS_ERR(req))
> - goto out;
> + if (cookie_bpf_ok(skb)) {
> + req = cookie_bpf_check(sk, skb);
> + } else {
> + req = cookie_tcp_check(net, sk, skb);
> + if (IS_ERR(req))
> + goto out;
> + }
> if (!req)
> goto out_drop;
>
> @@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> ireq->wscale_ok, &rcv_wscale,
> dst_metric(dst, RTAX_INITRWND));
>
> - ireq->rcv_wscale = rcv_wscale;
> + if (!req->syncookie)
> + ireq->rcv_wscale = rcv_wscale;
I think a comment is deserved. I do not understand this.
cookie_v6_check() is dealing with syncookie, unless I am mistaken.
Also syzbot is not happy, req->syncookie might be uninitialized here.
BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0
net/ipv4/syncookies.c:477
cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
process_backlog+0x480/0x8b0 net/core/dev.c:5981
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
do_softirq+0x9a/0x100 kernel/softirq.c:455
__local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
__dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235
__ip_finish_output+0x287/0x810
ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433
dst_output include/net/dst.h:450 [inline]
ip_local_out net/ipv4/ip_output.c:129 [inline]
__ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535
ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549
__tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462
tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline]
tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792
__tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline]
tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578
tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716
inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923
__sys_shutdown_sock net/socket.c:2425 [inline]
__sys_shutdown net/socket.c:2437 [inline]
__do_sys_shutdown net/socket.c:2445 [inline]
__se_sys_shutdown+0x2a4/0x440 net/socket.c:2443
__x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at:
reqsk_alloc include/net/request_sock.h:148 [inline]
inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978
cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328
cookie_tcp_check net/ipv4/syncookies.c:388 [inline]
cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
process_backlog+0x480/0x8b0 net/core/dev.c:5981
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
Uninit was created at:
__alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page mm/slub.c:2175 [inline]
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x2de/0x1400 mm/slub.c:2391
___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
reqsk_alloc include/net/request_sock.h:131 [inline]
inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978
tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135
tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716
tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655
tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674
__netif_receive_skb_list_ptype net/core/dev.c:5581 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629
__netif_receive_skb_list net/core/dev.c:5681 [inline]
netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773
gro_normal_list include/net/gro.h:438 [inline]
napi_complete_done+0x425/0x880 net/core/dev.c:6113
virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline]
virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted
6.8.0-syzkaller-05562-g61387b8dcf1d #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 02/29/2024
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
2024-03-15 13:37 ` Eric Dumazet
@ 2024-03-15 19:02 ` Kuniyuki Iwashima
0 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-03-15 19:02 UTC (permalink / raw)
To: edumazet
Cc: andrii, ast, bpf, daniel, kuni1840, kuniyu, martin.lau, netdev,
pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 15 Mar 2024 14:37:57 +0100
> On Mon, Jan 15, 2024 at 9:57 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > We will support arbitrary SYN Cookie with BPF in the following
> > patch.
> >
> > If BPF prog validates ACK and kfunc allocates a reqsk, it will
> > be carried to cookie_[46]_check() as skb->sk. If skb->sk is not
> > NULL, we call cookie_bpf_check().
> >
> > Then, we clear skb->sk and skb->destructor, which are needed not
> > to hold refcnt for reqsk and the listener. See the following patch
> > for details.
> >
> > After that, we finish initialisation for the remaining fields with
> > cookie_tcp_reqsk_init().
> >
> > Note that the server side WScale is set only for non-BPF SYN Cookie.
>
> So the difference between BPF and non-BPF is using a req->syncookie
> which had a prior meaning ?
Yes, it was used only in tcp_conn_request(), so I reused the field
and added another meaning for syncookie ACK path.
>
> This is very confusing, and needs documentation/comments.
Will add comment in request_sock.h and syncookies.c
>
>
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> > include/net/tcp.h | 20 ++++++++++++++++++++
> > net/ipv4/syncookies.c | 31 +++++++++++++++++++++++++++----
> > net/ipv6/syncookies.c | 13 +++++++++----
> > 3 files changed, 56 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 114000e71a46..dfe99a084a71 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -599,6 +599,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
> > dst_feature(dst, RTAX_FEATURE_ECN);
> > }
> >
> > +#if IS_ENABLED(CONFIG_BPF)
> > +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> > +{
> > + return skb->sk;
> > +}
> > +
> > +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
> > +#else
> > +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> > +{
> > + return false;
> > +}
> > +
> > +static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
> > + struct sk_buff *skb)
> > +{
> > + return NULL;
> > +}
> > +#endif
> > +
> > /* From net/ipv6/syncookies.c */
> > int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
> > struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
> > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> > index 981944c22820..be88bf586ff9 100644
> > --- a/net/ipv4/syncookies.c
> > +++ b/net/ipv4/syncookies.c
> > @@ -295,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
> > return 0;
> > }
> >
> > +#if IS_ENABLED(CONFIG_BPF)
> > +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
> > +{
> > + struct request_sock *req = inet_reqsk(skb->sk);
> > +
> > + skb->sk = NULL;
> > + skb->destructor = NULL;
> > +
> > + if (cookie_tcp_reqsk_init(sk, skb, req)) {
> > + reqsk_free(req);
> > + req = NULL;
> > + }
> > +
> > + return req;
> > +}
> > +EXPORT_SYMBOL_GPL(cookie_bpf_check);
> > +#endif
> > +
> > struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
> > struct sock *sk, struct sk_buff *skb,
> > struct tcp_options_received *tcp_opt,
> > @@ -395,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> > !th->ack || th->rst)
> > goto out;
> >
> > - req = cookie_tcp_check(net, sk, skb);
> > - if (IS_ERR(req))
> > - goto out;
> > + if (cookie_bpf_ok(skb)) {
> > + req = cookie_bpf_check(sk, skb);
> > + } else {
> > + req = cookie_tcp_check(net, sk, skb);
> > + if (IS_ERR(req))
> > + goto out;
> > + }
> > if (!req)
> > goto out_drop;
> >
> > @@ -445,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> > ireq->wscale_ok, &rcv_wscale,
> > dst_metric(&rt->dst, RTAX_INITRWND));
> >
> > - ireq->rcv_wscale = rcv_wscale;
> > + if (!req->syncookie)
> > + ireq->rcv_wscale = rcv_wscale;
> > ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
> >
> > ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
> > diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
> > index c8d2ca27220c..6b9c69278819 100644
> > --- a/net/ipv6/syncookies.c
> > +++ b/net/ipv6/syncookies.c
> > @@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> > !th->ack || th->rst)
> > goto out;
> >
> > - req = cookie_tcp_check(net, sk, skb);
> > - if (IS_ERR(req))
> > - goto out;
> > + if (cookie_bpf_ok(skb)) {
> > + req = cookie_bpf_check(sk, skb);
> > + } else {
> > + req = cookie_tcp_check(net, sk, skb);
> > + if (IS_ERR(req))
> > + goto out;
> > + }
> > if (!req)
> > goto out_drop;
> >
> > @@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> > ireq->wscale_ok, &rcv_wscale,
> > dst_metric(dst, RTAX_INITRWND));
> >
> > - ireq->rcv_wscale = rcv_wscale;
> > + if (!req->syncookie)
> > + ireq->rcv_wscale = rcv_wscale;
>
> I think a comment is deserved. I do not understand this.
>
> cookie_v6_check() is dealing with syncookie, unless I am mistaken.
Exactly, in both cases, here we handle syncookie, but req->syncookie
can be true only for the BPF case.
> Also syzbot is not happy, req->syncookie might be uninitialized here.
I'll make sure we init the field during allocation.
Thank you!
>
> BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0
> net/ipv4/syncookies.c:477
> cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477
> tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
> tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
> tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
> ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
> ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
> dst_input include/net/dst.h:460 [inline]
> ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
> __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
> __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
> process_backlog+0x480/0x8b0 net/core/dev.c:5981
> __napi_poll+0xe7/0x980 net/core/dev.c:6632
> napi_poll net/core/dev.c:6701 [inline]
> net_rx_action+0x89d/0x1820 net/core/dev.c:6813
> __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
> do_softirq+0x9a/0x100 kernel/softirq.c:455
> __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
> local_bh_enable include/linux/bottom_half.h:33 [inline]
> rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
> __dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362
> dev_queue_xmit include/linux/netdevice.h:3091 [inline]
> neigh_hh_output include/net/neighbour.h:526 [inline]
> neigh_output include/net/neighbour.h:540 [inline]
> ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235
> __ip_finish_output+0x287/0x810
> ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323
> NF_HOOK_COND include/linux/netfilter.h:303 [inline]
> ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433
> dst_output include/net/dst.h:450 [inline]
> ip_local_out net/ipv4/ip_output.c:129 [inline]
> __ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535
> ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549
> __tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462
> tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline]
> tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792
> __tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline]
> tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578
> tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716
> inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923
> __sys_shutdown_sock net/socket.c:2425 [inline]
> __sys_shutdown net/socket.c:2437 [inline]
> __do_sys_shutdown net/socket.c:2445 [inline]
> __se_sys_shutdown+0x2a4/0x440 net/socket.c:2443
> __x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443
> do_syscall_64+0xd5/0x1f0
> entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> Uninit was stored to memory at:
> reqsk_alloc include/net/request_sock.h:148 [inline]
> inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978
> cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328
> cookie_tcp_check net/ipv4/syncookies.c:388 [inline]
> cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420
> tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
> tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
> tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
> ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
> ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
> dst_input include/net/dst.h:460 [inline]
> ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
> __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
> __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
> process_backlog+0x480/0x8b0 net/core/dev.c:5981
> __napi_poll+0xe7/0x980 net/core/dev.c:6632
> napi_poll net/core/dev.c:6701 [inline]
> net_rx_action+0x89d/0x1820 net/core/dev.c:6813
> __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
>
> Uninit was created at:
> __alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592
> __alloc_pages_node include/linux/gfp.h:238 [inline]
> alloc_pages_node include/linux/gfp.h:261 [inline]
> alloc_slab_page mm/slub.c:2175 [inline]
> allocate_slab mm/slub.c:2338 [inline]
> new_slab+0x2de/0x1400 mm/slub.c:2391
> ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
> __slab_alloc mm/slub.c:3610 [inline]
> __slab_alloc_node mm/slub.c:3663 [inline]
> slab_alloc_node mm/slub.c:3835 [inline]
> kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
> reqsk_alloc include/net/request_sock.h:131 [inline]
> inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978
> tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135
> tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716
> tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655
> tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929
> tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
> ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
> ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
> dst_input include/net/dst.h:460 [inline]
> ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
> ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
> ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639
> ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674
> __netif_receive_skb_list_ptype net/core/dev.c:5581 [inline]
> __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629
> __netif_receive_skb_list net/core/dev.c:5681 [inline]
> netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773
> gro_normal_list include/net/gro.h:438 [inline]
> napi_complete_done+0x425/0x880 net/core/dev.c:6113
> virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline]
> virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211
> __napi_poll+0xe7/0x980 net/core/dev.c:6632
> napi_poll net/core/dev.c:6701 [inline]
> net_rx_action+0x89d/0x1820 net/core/dev.c:6813
> __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
>
> CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted
> 6.8.0-syzkaller-05562-g61387b8dcf1d #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 02/29/2024
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-03-15 19:02 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock() Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
2024-03-15 13:37 ` Eric Dumazet
2024-03-15 19:02 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
2024-01-17 1:55 ` Martin KaFai Lau
2024-01-15 20:55 ` [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk() Kuniyuki Iwashima
2024-01-17 1:50 ` [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).