* [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h Kuniyuki Iwashima
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF.
When BPF prog validates ACK and kfunc allocates a reqsk, we need
to call tcp_ns_to_ts() to calculate an offset of TSval for later
use:
  time
  t0 : Send SYN+ACK
       -> tsval = Initial TSval (Random Number)

  t1 : Recv ACK of 3WHS
       -> tsoff = TSecr - tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns())
                = Initial TSval - t1

  t2 : Send ACK
       -> tsval = t2 + tsoff
                = Initial TSval + (t2 - t1)
                = Initial TSval + Time Delta (x)

  (x) Note that the time delta does not include the initial RTT
      from t0 to t1.
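For illustration only (not part of this patch), the t1/t2 steps above can be
sketched as below, using nothing beyond tcp_ns_to_ts() and tcp_clock_ns();
the helper names are made up for this example:

  /* t1: derive the TSval offset from the client's echoed TSecr. */
  static u32 example_cookie_tsoff(bool usec_ts_ok, u32 tsecr)
  {
      /* tsoff = TSecr - t1 = Initial TSval - t1 */
      return tsecr - (u32)tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns());
  }

  /* t2: TSval to send = t2 + tsoff = Initial TSval + (t2 - t1). */
  static u32 example_cookie_tsval(bool usec_ts_ok, u32 tsoff)
  {
      return (u32)tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns()) + tsoff;
  }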
Let's move tcp_ns_to_ts() to tcp.h.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
include/net/tcp.h | 9 +++++++++
net/ipv4/syncookies.c | 9 ---------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dd78a1181031..114000e71a46 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -577,6 +577,15 @@ static inline u32 tcp_cookie_time(void)
return val;
}
+/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
+static inline u64 tcp_ns_to_ts(bool usec_ts, u64 val)
+{
+ if (usec_ts)
+ return div_u64(val, NSEC_PER_USEC);
+
+ return div_u64(val, NSEC_PER_MSEC);
+}
+
u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
u16 *mssp);
__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 61f1c96cfe63..981944c22820 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -51,15 +51,6 @@ static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
count, &syncookie_secret[c]);
}
-/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
-static u64 tcp_ns_to_ts(bool usec_ts, u64 val)
-{
- if (usec_ts)
- return div_u64(val, NSEC_PER_USEC);
-
- return div_u64(val, NSEC_PER_MSEC);
-}
-
/*
* when syncookies are in effect and tcp timestamps are enabled we encode
* tcp options in the lower bits of the timestamp value that will be
--
2.30.2
* [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock() Kuniyuki Iwashima
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF.
If the BPF prog validates the ACK and the kfunc allocates a reqsk, it will
be carried to the TCP stack as skb->sk with req->syncookie set to 1.
In skb_steal_sock(), we need to check inet_reqsk(sk)->syncookie
to see if the reqsk was created by the kfunc. However, inet_reqsk()
is not available in sock.h.
Let's move skb_steal_sock() to request_sock.h.
While at it, we refactor skb_steal_sock() so that it returns early when
skb->sk is NULL, which minimises the diff in the following patch.
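For reference, a minimal consumer-side sketch (a hypothetical helper, not the
in-tree lookup code) of what the early return guarantees to callers:

  static inline struct sock *example_steal(struct sk_buff *skb, bool *refcounted)
  {
      bool prefetched;
      struct sock *sk = skb_steal_sock(skb, refcounted, &prefetched);

      if (!sk)
          return NULL;    /* *refcounted == false, *prefetched == false */

      /* prefetched: sk was assigned from BPF; refcounted: the caller
       * must drop a reference once it is done with sk.
       */
      return sk;
  }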
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
include/net/request_sock.h | 28 ++++++++++++++++++++++++++++
include/net/sock.h | 25 -------------------------
2 files changed, 28 insertions(+), 25 deletions(-)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 144c39db9898..26c630c40abb 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -83,6 +83,34 @@ static inline struct sock *req_to_sk(struct request_sock *req)
return (struct sock *)req;
}
+/**
+ * skb_steal_sock - steal a socket from an sk_buff
+ * @skb: sk_buff to steal the socket from
+ * @refcounted: is set to true if the socket is reference-counted
+ * @prefetched: is set to true if the socket was assigned from bpf
+ */
+static inline struct sock *skb_steal_sock(struct sk_buff *skb,
+ bool *refcounted, bool *prefetched)
+{
+ struct sock *sk = skb->sk;
+
+ if (!sk) {
+ *prefetched = false;
+ *refcounted = false;
+ return NULL;
+ }
+
+ *prefetched = skb_sk_is_prefetched(skb);
+ if (*prefetched)
+ *refcounted = sk_is_refcounted(sk);
+ else
+ *refcounted = true;
+
+ skb->destructor = NULL;
+ skb->sk = NULL;
+ return sk;
+}
+
static inline struct request_sock *
reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
bool attach_listener)
diff --git a/include/net/sock.h b/include/net/sock.h
index a7f815c7cfdf..32a399fdcbb5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2814,31 +2814,6 @@ sk_is_refcounted(struct sock *sk)
return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
}
-/**
- * skb_steal_sock - steal a socket from an sk_buff
- * @skb: sk_buff to steal the socket from
- * @refcounted: is set to true if the socket is reference-counted
- * @prefetched: is set to true if the socket was assigned from bpf
- */
-static inline struct sock *
-skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
-{
- if (skb->sk) {
- struct sock *sk = skb->sk;
-
- *refcounted = true;
- *prefetched = skb_sk_is_prefetched(skb);
- if (*prefetched)
- *refcounted = sk_is_refcounted(sk);
- skb->destructor = NULL;
- skb->sk = NULL;
- return sk;
- }
- *prefetched = false;
- *refcounted = false;
- return NULL;
-}
-
/* Checks if this SKB belongs to an HW offloaded socket
* and whether any SW fallbacks are required based on dev.
* Check decrypted mark in case skb_orphan() cleared socket.
--
2.30.2
* [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 1/6] tcp: Move tcp_ns_to_ts() to tcp.h Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 2/6] tcp: Move skb_steal_sock() to request_sock.h Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF.
If the BPF prog validates the ACK and the kfunc allocates a reqsk, it will
be carried to the TCP stack as skb->sk with req->syncookie set to 1. Also,
the reqsk has its listener assigned to req->rsk_listener with no refcnt
taken.
When the TCP stack looks up a socket from the skb, we steal
inet_reqsk(skb->sk)->rsk_listener in skb_steal_sock() so that
the skb will be processed in cookie_v[46]_check() with the
listener.
Note that we do not clear skb->sk and skb->destructor so that we
can carry the reqsk to cookie_v[46]_check().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/request_sock.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 26c630c40abb..8839133d6f6b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -101,10 +101,21 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb,
}
*prefetched = skb_sk_is_prefetched(skb);
- if (*prefetched)
+ if (*prefetched) {
+#if IS_ENABLED(CONFIG_SYN_COOKIES)
+ if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+ struct request_sock *req = inet_reqsk(sk);
+
+ *refcounted = false;
+ sk = req->rsk_listener;
+ req->rsk_listener = NULL;
+ return sk;
+ }
+#endif
*refcounted = sk_is_refcounted(sk);
- else
+ } else {
*refcounted = true;
+ }
skb->destructor = NULL;
skb->sk = NULL;
--
2.30.2
* [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (2 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock() Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-03-15 13:37 ` Eric Dumazet
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
` (2 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
We will support arbitrary SYN Cookie with BPF in the following
patch.
If the BPF prog validates the ACK and the kfunc allocates a reqsk, it will
be carried to cookie_v[46]_check() as skb->sk. If skb->sk is not
NULL, we call cookie_bpf_check().
Then, we clear skb->sk and skb->destructor, which is needed so that we do
not hold a refcnt on the reqsk or the listener. See the following patch
for details.
After that, we finish initialising the remaining fields with
cookie_tcp_reqsk_init().
Note that the server side WScale is set only for non-BPF SYN Cookie.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/tcp.h | 20 ++++++++++++++++++++
net/ipv4/syncookies.c | 31 +++++++++++++++++++++++++++----
net/ipv6/syncookies.c | 13 +++++++++----
3 files changed, 56 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 114000e71a46..dfe99a084a71 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -599,6 +599,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
dst_feature(dst, RTAX_FEATURE_ECN);
}
+#if IS_ENABLED(CONFIG_BPF)
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+ return skb->sk;
+}
+
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
+#else
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+ return false;
+}
+
+static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ return NULL;
+}
+#endif
+
/* From net/ipv6/syncookies.c */
int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 981944c22820..be88bf586ff9 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -295,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
return 0;
}
+#if IS_ENABLED(CONFIG_BPF)
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
+{
+ struct request_sock *req = inet_reqsk(skb->sk);
+
+ skb->sk = NULL;
+ skb->destructor = NULL;
+
+ if (cookie_tcp_reqsk_init(sk, skb, req)) {
+ reqsk_free(req);
+ req = NULL;
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_GPL(cookie_bpf_check);
+#endif
+
struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
struct sock *sk, struct sk_buff *skb,
struct tcp_options_received *tcp_opt,
@@ -395,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
!th->ack || th->rst)
goto out;
- req = cookie_tcp_check(net, sk, skb);
- if (IS_ERR(req))
- goto out;
+ if (cookie_bpf_ok(skb)) {
+ req = cookie_bpf_check(sk, skb);
+ } else {
+ req = cookie_tcp_check(net, sk, skb);
+ if (IS_ERR(req))
+ goto out;
+ }
if (!req)
goto out_drop;
@@ -445,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
ireq->wscale_ok, &rcv_wscale,
dst_metric(&rt->dst, RTAX_INITRWND));
- ireq->rcv_wscale = rcv_wscale;
+ if (!req->syncookie)
+ ireq->rcv_wscale = rcv_wscale;
ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index c8d2ca27220c..6b9c69278819 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
!th->ack || th->rst)
goto out;
- req = cookie_tcp_check(net, sk, skb);
- if (IS_ERR(req))
- goto out;
+ if (cookie_bpf_ok(skb)) {
+ req = cookie_bpf_check(sk, skb);
+ } else {
+ req = cookie_tcp_check(net, sk, skb);
+ if (IS_ERR(req))
+ goto out;
+ }
if (!req)
goto out_drop;
@@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
ireq->wscale_ok, &rcv_wscale,
dst_metric(dst, RTAX_INITRWND));
- ireq->rcv_wscale = rcv_wscale;
+ if (!req->syncookie)
+ ireq->rcv_wscale = rcv_wscale;
ireq->ecn_ok &= cookie_ecn_ok(net, dst);
ret = tcp_get_cookie_sock(sk, skb, req, dst);
--
2.30.2
* Re: [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
@ 2024-03-15 13:37 ` Eric Dumazet
2024-03-15 19:02 ` Kuniyuki Iwashima
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2024-03-15 13:37 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Paolo Abeni, Kuniyuki Iwashima, bpf, netdev
On Mon, Jan 15, 2024 at 9:57 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will support arbitrary SYN Cookie with BPF in the following
> patch.
>
> If BPF prog validates ACK and kfunc allocates a reqsk, it will
> be carried to cookie_[46]_check() as skb->sk. If skb->sk is not
> NULL, we call cookie_bpf_check().
>
> Then, we clear skb->sk and skb->destructor, which are needed not
> to hold refcnt for reqsk and the listener. See the following patch
> for details.
>
> After that, we finish initialisation for the remaining fields with
> cookie_tcp_reqsk_init().
>
> Note that the server side WScale is set only for non-BPF SYN Cookie.
So the difference between BPF and non-BPF is using a req->syncookie
which had a prior meaning ?
This is very confusing, and needs documentation/comments.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
> include/net/tcp.h | 20 ++++++++++++++++++++
> net/ipv4/syncookies.c | 31 +++++++++++++++++++++++++++----
> net/ipv6/syncookies.c | 13 +++++++++----
> 3 files changed, 56 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 114000e71a46..dfe99a084a71 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -599,6 +599,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
> dst_feature(dst, RTAX_FEATURE_ECN);
> }
>
> +#if IS_ENABLED(CONFIG_BPF)
> +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> +{
> + return skb->sk;
> +}
> +
> +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
> +#else
> +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> +{
> + return false;
> +}
> +
> +static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
> + struct sk_buff *skb)
> +{
> + return NULL;
> +}
> +#endif
> +
> /* From net/ipv6/syncookies.c */
> int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
> struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 981944c22820..be88bf586ff9 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -295,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
> return 0;
> }
>
> +#if IS_ENABLED(CONFIG_BPF)
> +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
> +{
> + struct request_sock *req = inet_reqsk(skb->sk);
> +
> + skb->sk = NULL;
> + skb->destructor = NULL;
> +
> + if (cookie_tcp_reqsk_init(sk, skb, req)) {
> + reqsk_free(req);
> + req = NULL;
> + }
> +
> + return req;
> +}
> +EXPORT_SYMBOL_GPL(cookie_bpf_check);
> +#endif
> +
> struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
> struct sock *sk, struct sk_buff *skb,
> struct tcp_options_received *tcp_opt,
> @@ -395,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> !th->ack || th->rst)
> goto out;
>
> - req = cookie_tcp_check(net, sk, skb);
> - if (IS_ERR(req))
> - goto out;
> + if (cookie_bpf_ok(skb)) {
> + req = cookie_bpf_check(sk, skb);
> + } else {
> + req = cookie_tcp_check(net, sk, skb);
> + if (IS_ERR(req))
> + goto out;
> + }
> if (!req)
> goto out_drop;
>
> @@ -445,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> ireq->wscale_ok, &rcv_wscale,
> dst_metric(&rt->dst, RTAX_INITRWND));
>
> - ireq->rcv_wscale = rcv_wscale;
> + if (!req->syncookie)
> + ireq->rcv_wscale = rcv_wscale;
> ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
>
> ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
> diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
> index c8d2ca27220c..6b9c69278819 100644
> --- a/net/ipv6/syncookies.c
> +++ b/net/ipv6/syncookies.c
> @@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> !th->ack || th->rst)
> goto out;
>
> - req = cookie_tcp_check(net, sk, skb);
> - if (IS_ERR(req))
> - goto out;
> + if (cookie_bpf_ok(skb)) {
> + req = cookie_bpf_check(sk, skb);
> + } else {
> + req = cookie_tcp_check(net, sk, skb);
> + if (IS_ERR(req))
> + goto out;
> + }
> if (!req)
> goto out_drop;
>
> @@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> ireq->wscale_ok, &rcv_wscale,
> dst_metric(dst, RTAX_INITRWND));
>
> - ireq->rcv_wscale = rcv_wscale;
> + if (!req->syncookie)
> + ireq->rcv_wscale = rcv_wscale;
I think a comment is deserved. I do not understand this.
cookie_v6_check() is dealing with syncookie, unless I am mistaken.
Also syzbot is not happy, req->syncookie might be uninitialized here.
BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0
net/ipv4/syncookies.c:477
cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
process_backlog+0x480/0x8b0 net/core/dev.c:5981
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
do_softirq+0x9a/0x100 kernel/softirq.c:455
__local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
__dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235
__ip_finish_output+0x287/0x810
ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433
dst_output include/net/dst.h:450 [inline]
ip_local_out net/ipv4/ip_output.c:129 [inline]
__ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535
ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549
__tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462
tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline]
tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792
__tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline]
tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578
tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716
inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923
__sys_shutdown_sock net/socket.c:2425 [inline]
__sys_shutdown net/socket.c:2437 [inline]
__do_sys_shutdown net/socket.c:2445 [inline]
__se_sys_shutdown+0x2a4/0x440 net/socket.c:2443
__x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at:
reqsk_alloc include/net/request_sock.h:148 [inline]
inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978
cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328
cookie_tcp_check net/ipv4/syncookies.c:388 [inline]
cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
process_backlog+0x480/0x8b0 net/core/dev.c:5981
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
Uninit was created at:
__alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page mm/slub.c:2175 [inline]
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x2de/0x1400 mm/slub.c:2391
___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
reqsk_alloc include/net/request_sock.h:131 [inline]
inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978
tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135
tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716
tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655
tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674
__netif_receive_skb_list_ptype net/core/dev.c:5581 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629
__netif_receive_skb_list net/core/dev.c:5681 [inline]
netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773
gro_normal_list include/net/gro.h:438 [inline]
napi_complete_done+0x425/0x880 net/core/dev.c:6113
virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline]
virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted
6.8.0-syzkaller-05562-g61387b8dcf1d #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 02/29/2024
* Re: [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
2024-03-15 13:37 ` Eric Dumazet
@ 2024-03-15 19:02 ` Kuniyuki Iwashima
0 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-03-15 19:02 UTC (permalink / raw)
To: edumazet
Cc: andrii, ast, bpf, daniel, kuni1840, kuniyu, martin.lau, netdev,
pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 15 Mar 2024 14:37:57 +0100
> On Mon, Jan 15, 2024 at 9:57 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > We will support arbitrary SYN Cookie with BPF in the following
> > patch.
> >
> > If BPF prog validates ACK and kfunc allocates a reqsk, it will
> > be carried to cookie_[46]_check() as skb->sk. If skb->sk is not
> > NULL, we call cookie_bpf_check().
> >
> > Then, we clear skb->sk and skb->destructor, which are needed not
> > to hold refcnt for reqsk and the listener. See the following patch
> > for details.
> >
> > After that, we finish initialisation for the remaining fields with
> > cookie_tcp_reqsk_init().
> >
> > Note that the server side WScale is set only for non-BPF SYN Cookie.
>
> So the difference between BPF and non-BPF is using a req->syncookie
> which had a prior meaning ?
Yes, it was used only in tcp_conn_request(), so I reused the field
and added another meaning for the syncookie ACK path.
>
> This is very confusing, and needs documentation/comments.
Will add comment in request_sock.h and syncookies.c
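For reference, one possible shape of such a comment (only a sketch; the
exact wording that gets posted may differ):

  /* req->syncookie is overloaded:
   * - TX (tcp_conn_request()): 1 means "answer with a SYN cookie instead
   *   of queueing this reqsk".
   * - RX (skb->sk in TCP_NEW_SYN_RECV state): 1 means the reqsk was
   *   allocated by bpf_sk_assign_tcp_reqsk() and holds no refcount on
   *   itself nor on req->rsk_listener.
   */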
>
>
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> > include/net/tcp.h | 20 ++++++++++++++++++++
> > net/ipv4/syncookies.c | 31 +++++++++++++++++++++++++++----
> > net/ipv6/syncookies.c | 13 +++++++++----
> > 3 files changed, 56 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 114000e71a46..dfe99a084a71 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -599,6 +599,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
> > dst_feature(dst, RTAX_FEATURE_ECN);
> > }
> >
> > +#if IS_ENABLED(CONFIG_BPF)
> > +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> > +{
> > + return skb->sk;
> > +}
> > +
> > +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
> > +#else
> > +static inline bool cookie_bpf_ok(struct sk_buff *skb)
> > +{
> > + return false;
> > +}
> > +
> > +static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
> > + struct sk_buff *skb)
> > +{
> > + return NULL;
> > +}
> > +#endif
> > +
> > /* From net/ipv6/syncookies.c */
> > int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
> > struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
> > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> > index 981944c22820..be88bf586ff9 100644
> > --- a/net/ipv4/syncookies.c
> > +++ b/net/ipv4/syncookies.c
> > @@ -295,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
> > return 0;
> > }
> >
> > +#if IS_ENABLED(CONFIG_BPF)
> > +struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
> > +{
> > + struct request_sock *req = inet_reqsk(skb->sk);
> > +
> > + skb->sk = NULL;
> > + skb->destructor = NULL;
> > +
> > + if (cookie_tcp_reqsk_init(sk, skb, req)) {
> > + reqsk_free(req);
> > + req = NULL;
> > + }
> > +
> > + return req;
> > +}
> > +EXPORT_SYMBOL_GPL(cookie_bpf_check);
> > +#endif
> > +
> > struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
> > struct sock *sk, struct sk_buff *skb,
> > struct tcp_options_received *tcp_opt,
> > @@ -395,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> > !th->ack || th->rst)
> > goto out;
> >
> > - req = cookie_tcp_check(net, sk, skb);
> > - if (IS_ERR(req))
> > - goto out;
> > + if (cookie_bpf_ok(skb)) {
> > + req = cookie_bpf_check(sk, skb);
> > + } else {
> > + req = cookie_tcp_check(net, sk, skb);
> > + if (IS_ERR(req))
> > + goto out;
> > + }
> > if (!req)
> > goto out_drop;
> >
> > @@ -445,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
> > ireq->wscale_ok, &rcv_wscale,
> > dst_metric(&rt->dst, RTAX_INITRWND));
> >
> > - ireq->rcv_wscale = rcv_wscale;
> > + if (!req->syncookie)
> > + ireq->rcv_wscale = rcv_wscale;
> > ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
> >
> > ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
> > diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
> > index c8d2ca27220c..6b9c69278819 100644
> > --- a/net/ipv6/syncookies.c
> > +++ b/net/ipv6/syncookies.c
> > @@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> > !th->ack || th->rst)
> > goto out;
> >
> > - req = cookie_tcp_check(net, sk, skb);
> > - if (IS_ERR(req))
> > - goto out;
> > + if (cookie_bpf_ok(skb)) {
> > + req = cookie_bpf_check(sk, skb);
> > + } else {
> > + req = cookie_tcp_check(net, sk, skb);
> > + if (IS_ERR(req))
> > + goto out;
> > + }
> > if (!req)
> > goto out_drop;
> >
> > @@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> > ireq->wscale_ok, &rcv_wscale,
> > dst_metric(dst, RTAX_INITRWND));
> >
> > - ireq->rcv_wscale = rcv_wscale;
> > + if (!req->syncookie)
> > + ireq->rcv_wscale = rcv_wscale;
>
> I think a comment is deserved. I do not understand this.
>
> cookie_v6_check() is dealing with syncookie, unless I am mistaken.
Exactly, in both cases we are handling syncookies here, but req->syncookie
can be true only in the BPF case.
> Also syzbot is not happy, req->syncookie might be uninitialized here.
I'll make sure we init the field during allocation.
Thank you!
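(A sketch of the kind of fix implied here, with the exact placement possibly
differing in the final patch: zero the bit at allocation time so the RX-side
check never reads an uninitialised value.)

  /* e.g. in reqsk_alloc(), once the request_sock has been allocated: */
  req->syncookie = 0;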
>
> BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0
> net/ipv4/syncookies.c:477
> cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477
> tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
> tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
> tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
> ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
> ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
> dst_input include/net/dst.h:460 [inline]
> ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
> __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
> __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
> process_backlog+0x480/0x8b0 net/core/dev.c:5981
> __napi_poll+0xe7/0x980 net/core/dev.c:6632
> napi_poll net/core/dev.c:6701 [inline]
> net_rx_action+0x89d/0x1820 net/core/dev.c:6813
> __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
> do_softirq+0x9a/0x100 kernel/softirq.c:455
> __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
> local_bh_enable include/linux/bottom_half.h:33 [inline]
> rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
> __dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362
> dev_queue_xmit include/linux/netdevice.h:3091 [inline]
> neigh_hh_output include/net/neighbour.h:526 [inline]
> neigh_output include/net/neighbour.h:540 [inline]
> ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235
> __ip_finish_output+0x287/0x810
> ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323
> NF_HOOK_COND include/linux/netfilter.h:303 [inline]
> ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433
> dst_output include/net/dst.h:450 [inline]
> ip_local_out net/ipv4/ip_output.c:129 [inline]
> __ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535
> ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549
> __tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462
> tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline]
> tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792
> __tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline]
> tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578
> tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716
> inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923
> __sys_shutdown_sock net/socket.c:2425 [inline]
> __sys_shutdown net/socket.c:2437 [inline]
> __do_sys_shutdown net/socket.c:2445 [inline]
> __se_sys_shutdown+0x2a4/0x440 net/socket.c:2443
> __x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443
> do_syscall_64+0xd5/0x1f0
> entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> Uninit was stored to memory at:
> reqsk_alloc include/net/request_sock.h:148 [inline]
> inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978
> cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328
> cookie_tcp_check net/ipv4/syncookies.c:388 [inline]
> cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420
> tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
> tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
> tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
> ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
> ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
> dst_input include/net/dst.h:460 [inline]
> ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
> __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
> __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
> process_backlog+0x480/0x8b0 net/core/dev.c:5981
> __napi_poll+0xe7/0x980 net/core/dev.c:6632
> napi_poll net/core/dev.c:6701 [inline]
> net_rx_action+0x89d/0x1820 net/core/dev.c:6813
> __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
>
> Uninit was created at:
> __alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592
> __alloc_pages_node include/linux/gfp.h:238 [inline]
> alloc_pages_node include/linux/gfp.h:261 [inline]
> alloc_slab_page mm/slub.c:2175 [inline]
> allocate_slab mm/slub.c:2338 [inline]
> new_slab+0x2de/0x1400 mm/slub.c:2391
> ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
> __slab_alloc mm/slub.c:3610 [inline]
> __slab_alloc_node mm/slub.c:3663 [inline]
> slab_alloc_node mm/slub.c:3835 [inline]
> kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
> reqsk_alloc include/net/request_sock.h:131 [inline]
> inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978
> tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135
> tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716
> tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655
> tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929
> tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
> ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
> ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
> NF_HOOK include/linux/netfilter.h:314 [inline]
> ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
> dst_input include/net/dst.h:460 [inline]
> ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
> ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
> ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639
> ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674
> __netif_receive_skb_list_ptype net/core/dev.c:5581 [inline]
> __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629
> __netif_receive_skb_list net/core/dev.c:5681 [inline]
> netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773
> gro_normal_list include/net/gro.h:438 [inline]
> napi_complete_done+0x425/0x880 net/core/dev.c:6113
> virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline]
> virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211
> __napi_poll+0xe7/0x980 net/core/dev.c:6632
> napi_poll net/core/dev.c:6701 [inline]
> net_rx_action+0x89d/0x1820 net/core/dev.c:6813
> __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
>
> CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted
> 6.8.0-syzkaller-05562-g61387b8dcf1d #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 02/29/2024
* [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie.
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (3 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check() Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-17 1:55 ` Martin KaFai Lau
2024-01-15 20:55 ` [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk() Kuniyuki Iwashima
2024-01-17 1:50 ` [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC patchwork-bot+netdevbpf
6 siblings, 1 reply; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
This patch adds a new kfunc available at the TC hook to support arbitrary
SYN Cookie.
The basic usage is as follows:
    struct bpf_tcp_req_attrs attrs = {
        .mss = mss,
        .wscale_ok = wscale_ok,
        .rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
        .snd_wscale = snd_wscale, /* Client's WScale < 15 */
        .tstamp_ok = tstamp_ok,
        .rcv_tsval = tsval,
        .rcv_tsecr = tsecr, /* Server's Initial TSval */
        .usec_ts_ok = usec_ts_ok,
        .sack_ok = sack_ok,
        .ecn_ok = ecn_ok,
    };

    skc = bpf_skc_lookup_tcp(...);
    sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
    bpf_sk_assign_tcp_reqsk(skb, sk, &attrs, sizeof(attrs));
    bpf_sk_release(skc);
bpf_sk_assign_tcp_reqsk() takes an skb, a listener sk, and struct
bpf_tcp_req_attrs, then allocates a reqsk and configures it. After that,
bpf_sk_assign_tcp_reqsk() links the reqsk with the skb and the listener.
The notable thing here is that we do not hold a refcnt on either the reqsk
or the listener. To differentiate that, we mark reqsk->syncookie, which so
far is only used on the TX side. So, if reqsk->syncookie is 1 in RX, it
means that the reqsk was allocated by the kfunc.
When the skb is freed, sock_pfree() checks if reqsk->syncookie is 1,
and in that case, we set reqsk->rsk_listener to NULL before calling
reqsk_free(), as the reqsk does not hold a refcnt on the listener.
When the TCP stack looks up a socket from the skb, we steal the
listener from the reqsk in skb_steal_sock() and create a full sk
in cookie_v[46]_check().
The refcnt of reqsk will finally be set to 1 in tcp_get_cookie_sock()
after creating a full sk.
Note that we can extend struct bpf_tcp_req_attrs in the future when
we add a new attribute that is determined during the 3WHS.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
include/net/tcp.h | 14 ++++++
net/core/filter.c | 114 +++++++++++++++++++++++++++++++++++++++++++++-
net/core/sock.c | 14 +++++-
3 files changed, 138 insertions(+), 4 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dfe99a084a71..451dc1373970 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -600,6 +600,20 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
}
#if IS_ENABLED(CONFIG_BPF)
+struct bpf_tcp_req_attrs {
+ u32 rcv_tsval;
+ u32 rcv_tsecr;
+ u16 mss;
+ u8 rcv_wscale;
+ u8 snd_wscale;
+ u8 ecn_ok;
+ u8 wscale_ok;
+ u8 sack_ok;
+ u8 tstamp_ok;
+ u8 usec_ts_ok;
+ u8 reserved[3];
+};
+
static inline bool cookie_bpf_ok(struct sk_buff *skb)
{
return skb->sk;
diff --git a/net/core/filter.c b/net/core/filter.c
index 8c9f67c81e22..647d04171b7e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11837,6 +11837,106 @@ __bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
return 0;
}
+
+__bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct sk_buff *skb, struct sock *sk,
+ struct bpf_tcp_req_attrs *attrs, int attrs__sz)
+{
+#if IS_ENABLED(CONFIG_SYN_COOKIES)
+ const struct request_sock_ops *ops;
+ struct inet_request_sock *ireq;
+ struct tcp_request_sock *treq;
+ struct request_sock *req;
+ struct net *net;
+ __u16 min_mss;
+ u32 tsoff = 0;
+
+ if (attrs__sz != sizeof(*attrs) ||
+ attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2])
+ return -EINVAL;
+
+ if (!sk)
+ return -EINVAL;
+
+ if (!skb_at_tc_ingress(skb))
+ return -EINVAL;
+
+ net = dev_net(skb->dev);
+ if (net != sock_net(sk))
+ return -ENETUNREACH;
+
+ switch (skb->protocol) {
+ case htons(ETH_P_IP):
+ ops = &tcp_request_sock_ops;
+ min_mss = 536;
+ break;
+#if IS_BUILTIN(CONFIG_IPV6)
+ case htons(ETH_P_IPV6):
+ ops = &tcp6_request_sock_ops;
+ min_mss = IPV6_MIN_MTU - 60;
+ break;
+#endif
+ default:
+ return -EINVAL;
+ }
+
+ if (sk->sk_type != SOCK_STREAM || sk->sk_state != TCP_LISTEN ||
+ sk_is_mptcp(sk))
+ return -EINVAL;
+
+ if (attrs->mss < min_mss)
+ return -EINVAL;
+
+ if (attrs->wscale_ok) {
+ if (!READ_ONCE(net->ipv4.sysctl_tcp_window_scaling))
+ return -EINVAL;
+
+ if (attrs->snd_wscale > TCP_MAX_WSCALE ||
+ attrs->rcv_wscale > TCP_MAX_WSCALE)
+ return -EINVAL;
+ }
+
+ if (attrs->sack_ok && !READ_ONCE(net->ipv4.sysctl_tcp_sack))
+ return -EINVAL;
+
+ if (attrs->tstamp_ok) {
+ if (!READ_ONCE(net->ipv4.sysctl_tcp_timestamps))
+ return -EINVAL;
+
+ tsoff = attrs->rcv_tsecr - tcp_ns_to_ts(attrs->usec_ts_ok, tcp_clock_ns());
+ }
+
+ req = inet_reqsk_alloc(ops, sk, false);
+ if (!req)
+ return -ENOMEM;
+
+ ireq = inet_rsk(req);
+ treq = tcp_rsk(req);
+
+ req->rsk_listener = sk;
+ req->syncookie = 1;
+ req->mss = attrs->mss;
+ req->ts_recent = attrs->rcv_tsval;
+
+ ireq->snd_wscale = attrs->snd_wscale;
+ ireq->rcv_wscale = attrs->rcv_wscale;
+ ireq->tstamp_ok = !!attrs->tstamp_ok;
+ ireq->sack_ok = !!attrs->sack_ok;
+ ireq->wscale_ok = !!attrs->wscale_ok;
+ ireq->ecn_ok = !!attrs->ecn_ok;
+
+ treq->req_usec_ts = !!attrs->usec_ts_ok;
+ treq->ts_off = tsoff;
+
+ skb_orphan(skb);
+ skb->sk = req_to_sk(req);
+ skb->destructor = sock_pfree;
+
+ return 0;
+#else
+ return -EOPNOTSUPP;
+#endif
+}
+
__bpf_kfunc_end_defs();
int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags,
@@ -11865,6 +11965,10 @@ BTF_SET8_START(bpf_kfunc_check_set_sock_addr)
BTF_ID_FLAGS(func, bpf_sock_addr_set_sun_path)
BTF_SET8_END(bpf_kfunc_check_set_sock_addr)
+BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk)
+BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
+BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk)
+
static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
.owner = THIS_MODULE,
.set = &bpf_kfunc_check_set_skb,
@@ -11880,6 +11984,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_addr = {
.set = &bpf_kfunc_check_set_sock_addr,
};
+static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
+ .owner = THIS_MODULE,
+ .set = &bpf_kfunc_check_set_tcp_reqsk,
+};
+
static int __init bpf_kfunc_init(void)
{
int ret;
@@ -11895,8 +12004,9 @@ static int __init bpf_kfunc_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
- return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
- &bpf_kfunc_set_sock_addr);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+ &bpf_kfunc_set_sock_addr);
+ return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
}
late_initcall(bpf_kfunc_init);
diff --git a/net/core/sock.c b/net/core/sock.c
index 158dbdebce6a..147fb2656e6b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2582,8 +2582,18 @@ EXPORT_SYMBOL(sock_efree);
#ifdef CONFIG_INET
void sock_pfree(struct sk_buff *skb)
{
- if (sk_is_refcounted(skb->sk))
- sock_gen_put(skb->sk);
+ struct sock *sk = skb->sk;
+
+ if (!sk_is_refcounted(sk))
+ return;
+
+ if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+ inet_reqsk(sk)->rsk_listener = NULL;
+ reqsk_free(inet_reqsk(sk));
+ return;
+ }
+
+ sock_gen_put(sk);
}
EXPORT_SYMBOL(sock_pfree);
#endif /* CONFIG_INET */
--
2.30.2
* Re: [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie.
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
@ 2024-01-17 1:55 ` Martin KaFai Lau
0 siblings, 0 replies; 11+ messages in thread
From: Martin KaFai Lau @ 2024-01-17 1:55 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: Kuniyuki Iwashima, bpf, netdev, Eric Dumazet, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Paolo Abeni
On 1/15/24 12:55 PM, Kuniyuki Iwashima wrote:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8c9f67c81e22..647d04171b7e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11837,6 +11837,106 @@ __bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
>
> return 0;
> }
> +
> +__bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct sk_buff *skb, struct sock *sk,
> + struct bpf_tcp_req_attrs *attrs, int attrs__sz)
> +{
> +#if IS_ENABLED(CONFIG_SYN_COOKIES)
> + const struct request_sock_ops *ops;
> + struct inet_request_sock *ireq;
> + struct tcp_request_sock *treq;
> + struct request_sock *req;
> + struct net *net;
> + __u16 min_mss;
> + u32 tsoff = 0;
> +
> + if (attrs__sz != sizeof(*attrs) ||
> + attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2])
> + return -EINVAL;
> +
> + if (!sk)
I removed this "!sk" check, the verifier will check for it,
and ...
> +BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk)
> +BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
... limited it to KF_TRUSTED_ARGS. The arg "sk" must be from "bpf_sk*_lookup_*"
or from "bpf_map_lookup_elem(&sock_map,...)". Both of them have
"reg->ref_obj_id" (i.e. the verifier tracks the refcnt acquire/release) and it
is as good as trusted ptr.
The above is some final details I noticed. Applied. Thanks.
> +BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk)
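So the registration presumably ends up along these lines (a sketch of the
adjustment described above, not a quote of the applied patch):

  BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk)
  BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk, KF_TRUSTED_ARGS)
  BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk)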
* [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk().
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (4 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 5/6] bpf: tcp: Support arbitrary SYN Cookie Kuniyuki Iwashima
@ 2024-01-15 20:55 ` Kuniyuki Iwashima
2024-01-17 1:50 ` [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC patchwork-bot+netdevbpf
6 siblings, 0 replies; 11+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-15 20:55 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
This commit adds a sample selftest to demonstrate how we can use
bpf_sk_assign_tcp_reqsk() as the backend of SYN Proxy.
The test creates IPv4 and IPv6 TCP connections and transfers messages
over them on lo with the BPF tc prog attached.
The tc prog processes the SYN and returns a SYN+ACK with the following
ISN and TS. In a real use case, this part would be done by other
hosts.
        MSB                                    LSB
  ISN:  | 31 ... 8 | 7 6 |  5  |  4   | 3 2 1 0 |
        |  Hash_1  | MSS | ECN | SACK |  WScale |

  TS:   | 31 ... 8 |  7 ... 0  |
        |  Random  |   Hash_2  |
The WScale in the SYN is reused in the SYN+ACK.
The client returns the ACK, and the tc prog recalculates the ISN and TS
from the ACK and validates the SYN Cookie.
If it's valid, the prog calls the kfunc to allocate a reqsk for the skb and
configures the reqsk based on the attributes recovered from the SYN Cookie.
Later, the reqsk will be processed in cookie_v[46]_check() to create
a connection.
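A compact sketch of the encoding described above (simplified from the
selftest below; COOKIE_BITS, COOKIE_MASK and the BPF_SYNCOOKIE_* flags are
the ones defined in test_tcp_custom_syncookie.c):

  static __u32 example_cookie_isn(__u32 hash1, __u32 mss_idx, bool ecn,
                                  bool sack, __u8 wscale)
  {
      /* bits 7-6: MSS index, bit 5: ECN, bit 4: SACK, bits 3-0: WScale */
      __u32 opts = (mss_idx << 6) |
                   (ecn ? BPF_SYNCOOKIE_ECN : 0) |
                   (sack ? BPF_SYNCOOKIE_SACK : 0) |
                   (wscale & BPF_SYNCOOKIE_WSCALE_MASK);

      /* bits 31-8: 24-bit slice of the hash */
      return (hash1 << COOKIE_BITS) | (opts & COOKIE_MASK);
  }

  static __u32 example_cookie_ts(__u32 hash2, __u32 random)
  {
      /* bits 31-8: random, bits 7-0: 8-bit slice of the hash */
      return (random << COOKIE_BITS) | (hash2 & COOKIE_MASK);
  }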
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
tools/testing/selftests/bpf/bpf_kfuncs.h | 10 +
tools/testing/selftests/bpf/config | 1 +
.../bpf/prog_tests/tcp_custom_syncookie.c | 150 +++++
.../selftests/bpf/progs/bpf_tracing_net.h | 16 +
.../selftests/bpf/progs/test_siphash.h | 64 ++
.../bpf/progs/test_tcp_custom_syncookie.c | 572 ++++++++++++++++++
.../bpf/progs/test_tcp_custom_syncookie.h | 140 +++++
7 files changed, 953 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
create mode 100644 tools/testing/selftests/bpf/progs/test_siphash.h
create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
index b4e78c1eb37b..23c24f852f4f 100644
--- a/tools/testing/selftests/bpf/bpf_kfuncs.h
+++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
@@ -51,6 +51,16 @@ extern int bpf_dynptr_clone(const struct bpf_dynptr *ptr, struct bpf_dynptr *clo
extern int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
const __u8 *sun_path, __u32 sun_path__sz) __ksym;
+/* Description
+ * Allocate and configure a reqsk and link it with a listener and skb.
+ * Returns
+ * Error code
+ */
+struct sock;
+struct bpf_tcp_req_attrs;
+extern int bpf_sk_assign_tcp_reqsk(struct __sk_buff *skb, struct sock *sk,
+ struct bpf_tcp_req_attrs *attrs, int attrs__sz) __ksym;
+
void *bpf_cast_to_kern_ctx(void *) __ksym;
void *bpf_rdonly_cast(void *obj, __u32 btf_id) __ksym;
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index c125c441abc7..01f241ea2c67 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -81,6 +81,7 @@ CONFIG_NF_NAT=y
CONFIG_RC_CORE=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
+CONFIG_SYN_COOKIES=y
CONFIG_TEST_BPF=m
CONFIG_USERFAULTFD=y
CONFIG_VSOCKETS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c b/tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
new file mode 100644
index 000000000000..eaf441dc7e79
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdlib.h>
+#include <net/if.h>
+
+#include "test_progs.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+#include "test_tcp_custom_syncookie.skel.h"
+
+static struct test_tcp_custom_syncookie_case {
+ int family, type;
+ char addr[16];
+ char name[10];
+} test_cases[] = {
+ {
+ .name = "IPv4 TCP",
+ .family = AF_INET,
+ .type = SOCK_STREAM,
+ .addr = "127.0.0.1",
+ },
+ {
+ .name = "IPv6 TCP",
+ .family = AF_INET6,
+ .type = SOCK_STREAM,
+ .addr = "::1",
+ },
+};
+
+static int setup_netns(void)
+{
+ if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns"))
+ return -1;
+
+ if (!ASSERT_OK(system("ip link set dev lo up"), "ip"))
+ goto err;
+
+ if (!ASSERT_OK(write_sysctl("/proc/sys/net/ipv4/tcp_ecn", "1"),
+ "write_sysctl"))
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int setup_tc(struct test_tcp_custom_syncookie *skel)
+{
+ LIBBPF_OPTS(bpf_tc_hook, qdisc_lo, .attach_point = BPF_TC_INGRESS);
+ LIBBPF_OPTS(bpf_tc_opts, tc_attach,
+ .prog_fd = bpf_program__fd(skel->progs.tcp_custom_syncookie));
+
+ qdisc_lo.ifindex = if_nametoindex("lo");
+ if (!ASSERT_OK(bpf_tc_hook_create(&qdisc_lo), "qdisc add dev lo clsact"))
+ goto err;
+
+ if (!ASSERT_OK(bpf_tc_attach(&qdisc_lo, &tc_attach),
+ "filter add dev lo ingress"))
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+#define msg "Hello World"
+#define msglen 11
+
+static void transfer_message(int sender, int receiver)
+{
+ char buf[msglen];
+ int ret;
+
+ ret = send(sender, msg, msglen, 0);
+ if (!ASSERT_EQ(ret, msglen, "send"))
+ return;
+
+ memset(buf, 0, sizeof(buf));
+
+ ret = recv(receiver, buf, msglen, 0);
+ if (!ASSERT_EQ(ret, msglen, "recv"))
+ return;
+
+ ret = strncmp(buf, msg, msglen);
+ if (!ASSERT_EQ(ret, 0, "strncmp"))
+ return;
+}
+
+static void create_connection(struct test_tcp_custom_syncookie_case *test_case)
+{
+ int server, client, child;
+
+ server = start_server(test_case->family, test_case->type, test_case->addr, 0, 0);
+ if (!ASSERT_NEQ(server, -1, "start_server"))
+ return;
+
+ client = connect_to_fd(server, 0);
+ if (!ASSERT_NEQ(client, -1, "connect_to_fd"))
+ goto close_server;
+
+ child = accept(server, NULL, 0);
+ if (!ASSERT_NEQ(child, -1, "accept"))
+ goto close_client;
+
+ transfer_message(client, child);
+ transfer_message(child, client);
+
+ close(child);
+close_client:
+ close(client);
+close_server:
+ close(server);
+}
+
+void test_tcp_custom_syncookie(void)
+{
+ struct test_tcp_custom_syncookie *skel;
+ int i;
+
+ if (setup_netns())
+ return;
+
+ skel = test_tcp_custom_syncookie__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "open_and_load"))
+ return;
+
+ if (setup_tc(skel))
+ goto destroy_skel;
+
+ for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+ if (!test__start_subtest(test_cases[i].name))
+ continue;
+
+ skel->bss->handled_syn = false;
+ skel->bss->handled_ack = false;
+
+ create_connection(&test_cases[i]);
+
+ ASSERT_EQ(skel->bss->handled_syn, true, "SYN is not handled at tc.");
+ ASSERT_EQ(skel->bss->handled_ack, true, "ACK is not handled at tc");
+ }
+
+destroy_skel:
+ system("tc qdisc del dev lo clsact");
+
+ test_tcp_custom_syncookie__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1bdc680b0e0e..49e525ad9856 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -51,9 +51,25 @@
#define ICSK_TIME_LOSS_PROBE 5
#define ICSK_TIME_REO_TIMEOUT 6
+#define ETH_ALEN 6
#define ETH_HLEN 14
+#define ETH_P_IP 0x0800
#define ETH_P_IPV6 0x86DD
+#define NEXTHDR_TCP 6
+
+#define TCPOPT_NOP 1
+#define TCPOPT_EOL 0
+#define TCPOPT_MSS 2
+#define TCPOPT_WINDOW 3
+#define TCPOPT_TIMESTAMP 8
+#define TCPOPT_SACK_PERM 4
+
+#define TCPOLEN_MSS 4
+#define TCPOLEN_WINDOW 3
+#define TCPOLEN_TIMESTAMP 10
+#define TCPOLEN_SACK_PERM 2
+
#define CHECKSUM_NONE 0
#define CHECKSUM_PARTIAL 3
diff --git a/tools/testing/selftests/bpf/progs/test_siphash.h b/tools/testing/selftests/bpf/progs/test_siphash.h
new file mode 100644
index 000000000000..5d3a7ec36780
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_siphash.h
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#ifndef _TEST_SIPHASH_H
+#define _TEST_SIPHASH_H
+
+/* include/linux/bitops.h */
+static inline u64 rol64(u64 word, unsigned int shift)
+{
+ return (word << (shift & 63)) | (word >> ((-shift) & 63));
+}
+
+/* include/linux/siphash.h */
+#define SIPHASH_PERMUTATION(a, b, c, d) ( \
+ (a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \
+ (c) += (d), (d) = rol64((d), 16), (d) ^= (c), \
+ (a) += (d), (d) = rol64((d), 21), (d) ^= (a), \
+ (c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32))
+
+#define SIPHASH_CONST_0 0x736f6d6570736575ULL
+#define SIPHASH_CONST_1 0x646f72616e646f6dULL
+#define SIPHASH_CONST_2 0x6c7967656e657261ULL
+#define SIPHASH_CONST_3 0x7465646279746573ULL
+
+/* lib/siphash.c */
+#define SIPROUND SIPHASH_PERMUTATION(v0, v1, v2, v3)
+
+#define PREAMBLE(len) \
+ u64 v0 = SIPHASH_CONST_0; \
+ u64 v1 = SIPHASH_CONST_1; \
+ u64 v2 = SIPHASH_CONST_2; \
+ u64 v3 = SIPHASH_CONST_3; \
+ u64 b = ((u64)(len)) << 56; \
+ v3 ^= key->key[1]; \
+ v2 ^= key->key[0]; \
+ v1 ^= key->key[1]; \
+ v0 ^= key->key[0];
+
+#define POSTAMBLE \
+ v3 ^= b; \
+ SIPROUND; \
+ SIPROUND; \
+ v0 ^= b; \
+ v2 ^= 0xff; \
+ SIPROUND; \
+ SIPROUND; \
+ SIPROUND; \
+ SIPROUND; \
+ return (v0 ^ v1) ^ (v2 ^ v3);
+
+static inline u64 siphash_2u64(const u64 first, const u64 second, const siphash_key_t *key)
+{
+ PREAMBLE(16)
+ v3 ^= first;
+ SIPROUND;
+ SIPROUND;
+ v0 ^= first;
+ v3 ^= second;
+ SIPROUND;
+ SIPROUND;
+ v0 ^= second;
+ POSTAMBLE
+}
+#endif
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
new file mode 100644
index 000000000000..a5501b29979a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
@@ -0,0 +1,572 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_tracing_net.h"
+#include "bpf_kfuncs.h"
+#include "test_siphash.h"
+#include "test_tcp_custom_syncookie.h"
+
+/* Hash is calculated for each client and split into ISN and TS.
+ *
+ *        MSB                                    LSB
+ *  ISN:  | 31 ... 8 | 7 6 |  5  |  4   | 3 2 1 0 |
+ *        |  Hash_1  | MSS | ECN | SACK |  WScale |
+ *
+ *  TS:   | 31 ... 8 |  7 ... 0  |
+ *        |  Random  |   Hash_2  |
+ */
+#define COOKIE_BITS 8
+#define COOKIE_MASK (((__u32)1 << COOKIE_BITS) - 1)
+
+enum {
+ /* 0xf is invalid thus means that SYN did not have WScale. */
+ BPF_SYNCOOKIE_WSCALE_MASK = (1 << 4) - 1,
+ BPF_SYNCOOKIE_SACK = (1 << 4),
+ BPF_SYNCOOKIE_ECN = (1 << 5),
+};
+
+#define MSS_LOCAL_IPV4 65495
+#define MSS_LOCAL_IPV6 65476
+
+const __u16 msstab4[] = {
+ 536,
+ 1300,
+ 1460,
+ MSS_LOCAL_IPV4,
+};
+
+const __u16 msstab6[] = {
+ 1280 - 60, /* IPV6_MIN_MTU - 60 */
+ 1480 - 60,
+ 9000 - 60,
+ MSS_LOCAL_IPV6,
+};
+
+static siphash_key_t test_key_siphash = {
+ { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL }
+};
+
+struct tcp_syncookie {
+ struct __sk_buff *skb;
+ void *data_end;
+ struct ethhdr *eth;
+ struct iphdr *ipv4;
+ struct ipv6hdr *ipv6;
+ struct tcphdr *tcp;
+ union {
+ char *ptr;
+ __be32 *ptr32;
+ };
+ struct bpf_tcp_req_attrs attrs;
+ u32 cookie;
+ u64 first;
+};
+
+bool handled_syn, handled_ack;
+
+static int tcp_load_headers(struct tcp_syncookie *ctx)
+{
+ ctx->data_end = (void *)(long)ctx->skb->data_end;
+ ctx->eth = (struct ethhdr *)(long)ctx->skb->data;
+
+ if (ctx->eth + 1 > ctx->data_end)
+ goto err;
+
+ switch (bpf_ntohs(ctx->eth->h_proto)) {
+ case ETH_P_IP:
+ ctx->ipv4 = (struct iphdr *)(ctx->eth + 1);
+
+ if (ctx->ipv4 + 1 > ctx->data_end)
+ goto err;
+
+ if (ctx->ipv4->ihl != sizeof(*ctx->ipv4) / 4)
+ goto err;
+
+ if (ctx->ipv4->version != 4)
+ goto err;
+
+ if (ctx->ipv4->protocol != IPPROTO_TCP)
+ goto err;
+
+ ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1);
+ break;
+ case ETH_P_IPV6:
+ ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1);
+
+ if (ctx->ipv6 + 1 > ctx->data_end)
+ goto err;
+
+ if (ctx->ipv6->version != 6)
+ goto err;
+
+ if (ctx->ipv6->nexthdr != NEXTHDR_TCP)
+ goto err;
+
+ ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1);
+ break;
+ default:
+ goto err;
+ }
+
+ if (ctx->tcp + 1 > ctx->data_end)
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int tcp_reload_headers(struct tcp_syncookie *ctx)
+{
+ /* Without volatile,
+ * R3 32-bit pointer arithmetic prohibited
+ */
+ volatile u64 data_len = ctx->skb->data_end - ctx->skb->data;
+
+ if (ctx->tcp->doff < sizeof(*ctx->tcp) / 4)
+ goto err;
+
+ /* Needed to calculate csum and parse TCP options. */
+ if (bpf_skb_change_tail(ctx->skb, data_len + 60 - ctx->tcp->doff * 4, 0))
+ goto err;
+
+ ctx->data_end = (void *)(long)ctx->skb->data_end;
+ ctx->eth = (struct ethhdr *)(long)ctx->skb->data;
+ if (ctx->ipv4) {
+ ctx->ipv4 = (struct iphdr *)(ctx->eth + 1);
+ ctx->ipv6 = NULL;
+ ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1);
+ } else {
+ ctx->ipv4 = NULL;
+ ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1);
+ ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1);
+ }
+
+ if ((void *)ctx->tcp + 60 > ctx->data_end)
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static __sum16 tcp_v4_csum(struct tcp_syncookie *ctx, __wsum csum)
+{
+ return csum_tcpudp_magic(ctx->ipv4->saddr, ctx->ipv4->daddr,
+ ctx->tcp->doff * 4, IPPROTO_TCP, csum);
+}
+
+static __sum16 tcp_v6_csum(struct tcp_syncookie *ctx, __wsum csum)
+{
+ return csum_ipv6_magic(&ctx->ipv6->saddr, &ctx->ipv6->daddr,
+ ctx->tcp->doff * 4, IPPROTO_TCP, csum);
+}
+
+static int tcp_validate_header(struct tcp_syncookie *ctx)
+{
+ s64 csum;
+
+ if (tcp_reload_headers(ctx))
+ goto err;
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0);
+ if (csum < 0)
+ goto err;
+
+ if (ctx->ipv4) {
+ /* check tcp_v4_csum(csum) is 0 if not on lo. */
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, ctx->ipv4->ihl * 4, 0);
+ if (csum < 0)
+ goto err;
+
+ if (csum_fold(csum) != 0)
+ goto err;
+ } else if (ctx->ipv6) {
+ /* check tcp_v6_csum(csum) is 0 if not on lo. */
+ }
+
+ return 0;
+err:
+ return -1;
+}
+
+static int tcp_parse_option(__u32 index, struct tcp_syncookie *ctx)
+{
+ char opcode, opsize;
+
+ if (ctx->ptr + 1 > ctx->data_end)
+ goto stop;
+
+ opcode = *ctx->ptr++;
+
+ if (opcode == TCPOPT_EOL)
+ goto stop;
+
+ if (opcode == TCPOPT_NOP)
+ goto next;
+
+ if (ctx->ptr + 1 > ctx->data_end)
+ goto stop;
+
+ opsize = *ctx->ptr++;
+
+ if (opsize < 2)
+ goto stop;
+
+ switch (opcode) {
+ case TCPOPT_MSS:
+ if (opsize == TCPOLEN_MSS && ctx->tcp->syn &&
+ ctx->ptr + (TCPOLEN_MSS - 2) < ctx->data_end)
+ ctx->attrs.mss = get_unaligned_be16(ctx->ptr);
+ break;
+ case TCPOPT_WINDOW:
+ if (opsize == TCPOLEN_WINDOW && ctx->tcp->syn &&
+ ctx->ptr + (TCPOLEN_WINDOW - 2) < ctx->data_end) {
+ ctx->attrs.wscale_ok = 1;
+ ctx->attrs.snd_wscale = *ctx->ptr;
+ }
+ break;
+ case TCPOPT_TIMESTAMP:
+ if (opsize == TCPOLEN_TIMESTAMP &&
+ ctx->ptr + (TCPOLEN_TIMESTAMP - 2) < ctx->data_end) {
+ ctx->attrs.rcv_tsval = get_unaligned_be32(ctx->ptr);
+ ctx->attrs.rcv_tsecr = get_unaligned_be32(ctx->ptr + 4);
+
+ if (ctx->tcp->syn && ctx->attrs.rcv_tsecr)
+ ctx->attrs.tstamp_ok = 0;
+ else
+ ctx->attrs.tstamp_ok = 1;
+ }
+ break;
+ case TCPOPT_SACK_PERM:
+ if (opsize == TCPOLEN_SACK_PERM && ctx->tcp->syn &&
+ ctx->ptr + (TCPOLEN_SACK_PERM - 2) < ctx->data_end)
+ ctx->attrs.sack_ok = 1;
+ break;
+ }
+
+ ctx->ptr += opsize - 2;
+next:
+ return 0;
+stop:
+ return 1;
+}
+
+static void tcp_parse_options(struct tcp_syncookie *ctx)
+{
+ ctx->ptr = (char *)(ctx->tcp + 1);
+
+ bpf_loop(40, tcp_parse_option, ctx, 0);
+}
+
+static int tcp_validate_sysctl(struct tcp_syncookie *ctx)
+{
+ if ((ctx->ipv4 && ctx->attrs.mss != MSS_LOCAL_IPV4) ||
+ (ctx->ipv6 && ctx->attrs.mss != MSS_LOCAL_IPV6))
+ goto err;
+
+ if (!ctx->attrs.wscale_ok || ctx->attrs.snd_wscale != 7)
+ goto err;
+
+ if (!ctx->attrs.tstamp_ok)
+ goto err;
+
+ if (!ctx->attrs.sack_ok)
+ goto err;
+
+ if (!ctx->tcp->ece || !ctx->tcp->cwr)
+ goto err;
+
+ return 0;
+err:
+ return -1;
+}
+
+static void tcp_prepare_cookie(struct tcp_syncookie *ctx)
+{
+ u32 seq = bpf_ntohl(ctx->tcp->seq);
+ u64 first = 0, second;
+ int mssind = 0;
+ u32 hash;
+
+ if (ctx->ipv4) {
+ for (mssind = ARRAY_SIZE(msstab4) - 1; mssind; mssind--)
+ if (ctx->attrs.mss >= msstab4[mssind])
+ break;
+
+ ctx->attrs.mss = msstab4[mssind];
+
+ first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr;
+ } else if (ctx->ipv6) {
+ for (mssind = ARRAY_SIZE(msstab6) - 1; mssind; mssind--)
+ if (ctx->attrs.mss >= msstab6[mssind])
+ break;
+
+ ctx->attrs.mss = msstab6[mssind];
+
+ first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 |
+ ctx->ipv6->daddr.in6_u.u6_addr32[0];
+ }
+
+ second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest;
+ hash = siphash_2u64(first, second, &test_key_siphash);
+
+ if (ctx->attrs.tstamp_ok) {
+ ctx->attrs.rcv_tsecr = bpf_get_prandom_u32();
+ ctx->attrs.rcv_tsecr &= ~COOKIE_MASK;
+ ctx->attrs.rcv_tsecr |= hash & COOKIE_MASK;
+ }
+
+ hash &= ~COOKIE_MASK;
+ hash |= mssind << 6;
+
+ if (ctx->attrs.wscale_ok)
+ hash |= ctx->attrs.snd_wscale & BPF_SYNCOOKIE_WSCALE_MASK;
+
+ if (ctx->attrs.sack_ok)
+ hash |= BPF_SYNCOOKIE_SACK;
+
+ if (ctx->attrs.tstamp_ok && ctx->tcp->ece && ctx->tcp->cwr)
+ hash |= BPF_SYNCOOKIE_ECN;
+
+ ctx->cookie = hash;
+}
+
+static void tcp_write_options(struct tcp_syncookie *ctx)
+{
+ ctx->ptr32 = (__be32 *)(ctx->tcp + 1);
+
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_MSS << 24 | TCPOLEN_MSS << 16 |
+ ctx->attrs.mss);
+
+ if (ctx->attrs.wscale_ok)
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
+ TCPOPT_WINDOW << 16 |
+ TCPOLEN_WINDOW << 8 |
+ ctx->attrs.snd_wscale);
+
+ if (ctx->attrs.tstamp_ok) {
+ if (ctx->attrs.sack_ok)
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_SACK_PERM << 24 |
+ TCPOLEN_SACK_PERM << 16 |
+ TCPOPT_TIMESTAMP << 8 |
+ TCPOLEN_TIMESTAMP);
+ else
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
+ TCPOPT_NOP << 16 |
+ TCPOPT_TIMESTAMP << 8 |
+ TCPOLEN_TIMESTAMP);
+
+ *ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsecr);
+ *ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsval);
+ } else if (ctx->attrs.sack_ok) {
+ *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
+ TCPOPT_NOP << 16 |
+ TCPOPT_SACK_PERM << 8 |
+ TCPOLEN_SACK_PERM);
+ }
+}
+
+static int tcp_handle_syn(struct tcp_syncookie *ctx)
+{
+ s64 csum;
+
+ if (tcp_validate_header(ctx))
+ goto err;
+
+ tcp_parse_options(ctx);
+
+ if (tcp_validate_sysctl(ctx))
+ goto err;
+
+ tcp_prepare_cookie(ctx);
+ tcp_write_options(ctx);
+
+ swap(ctx->tcp->source, ctx->tcp->dest);
+ ctx->tcp->check = 0;
+ ctx->tcp->ack_seq = bpf_htonl(bpf_ntohl(ctx->tcp->seq) + 1);
+ ctx->tcp->seq = bpf_htonl(ctx->cookie);
+ ctx->tcp->doff = ((long)ctx->ptr32 - (long)ctx->tcp) >> 2;
+ ctx->tcp->ack = 1;
+ if (!ctx->attrs.tstamp_ok || !ctx->tcp->ece || !ctx->tcp->cwr)
+ ctx->tcp->ece = 0;
+ ctx->tcp->cwr = 0;
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0);
+ if (csum < 0)
+ goto err;
+
+ if (ctx->ipv4) {
+ swap(ctx->ipv4->saddr, ctx->ipv4->daddr);
+ ctx->tcp->check = tcp_v4_csum(ctx, csum);
+
+ ctx->ipv4->check = 0;
+ ctx->ipv4->tos = 0;
+ ctx->ipv4->tot_len = bpf_htons((long)ctx->ptr32 - (long)ctx->ipv4);
+ ctx->ipv4->id = 0;
+ ctx->ipv4->ttl = 64;
+
+ csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, sizeof(*ctx->ipv4), 0);
+ if (csum < 0)
+ goto err;
+
+ ctx->ipv4->check = csum_fold(csum);
+ } else if (ctx->ipv6) {
+ swap(ctx->ipv6->saddr, ctx->ipv6->daddr);
+ ctx->tcp->check = tcp_v6_csum(ctx, csum);
+
+ *(__be32 *)ctx->ipv6 = bpf_htonl(0x60000000);
+ ctx->ipv6->payload_len = bpf_htons((long)ctx->ptr32 - (long)ctx->tcp);
+ ctx->ipv6->hop_limit = 64;
+ }
+
+ swap_array(ctx->eth->h_source, ctx->eth->h_dest);
+
+ if (bpf_skb_change_tail(ctx->skb, (long)ctx->ptr32 - (long)ctx->eth, 0))
+ goto err;
+
+ return bpf_redirect(ctx->skb->ifindex, 0);
+err:
+ return TC_ACT_SHOT;
+}
+
+static int tcp_validate_cookie(struct tcp_syncookie *ctx)
+{
+ u32 cookie = bpf_ntohl(ctx->tcp->ack_seq) - 1;
+ u32 seq = bpf_ntohl(ctx->tcp->seq) - 1;
+ u64 first = 0, second;
+ int mssind;
+ u32 hash;
+
+ if (ctx->ipv4)
+ first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr;
+ else if (ctx->ipv6)
+ first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 |
+ ctx->ipv6->daddr.in6_u.u6_addr32[0];
+
+ second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest;
+ hash = siphash_2u64(first, second, &test_key_siphash);
+
+ if (ctx->attrs.tstamp_ok)
+ hash -= ctx->attrs.rcv_tsecr & COOKIE_MASK;
+ else
+ hash &= ~COOKIE_MASK;
+
+ hash -= cookie & ~COOKIE_MASK;
+ if (hash)
+ goto err;
+
+ mssind = (cookie & (3 << 6)) >> 6;
+ if (ctx->ipv4) {
+ if (mssind > ARRAY_SIZE(msstab4))
+ goto err;
+
+ ctx->attrs.mss = msstab4[mssind];
+ } else {
+ if (mssind > ARRAY_SIZE(msstab6))
+ goto err;
+
+ ctx->attrs.mss = msstab6[mssind];
+ }
+
+ ctx->attrs.snd_wscale = cookie & BPF_SYNCOOKIE_WSCALE_MASK;
+ ctx->attrs.rcv_wscale = ctx->attrs.snd_wscale;
+ ctx->attrs.wscale_ok = ctx->attrs.snd_wscale == BPF_SYNCOOKIE_WSCALE_MASK;
+ ctx->attrs.sack_ok = cookie & BPF_SYNCOOKIE_SACK;
+ ctx->attrs.ecn_ok = cookie & BPF_SYNCOOKIE_ECN;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int tcp_handle_ack(struct tcp_syncookie *ctx)
+{
+ struct bpf_sock_tuple tuple;
+ struct bpf_sock *skc;
+ int ret = TC_ACT_OK;
+ struct sock *sk;
+ u32 tuple_size;
+
+ if (ctx->ipv4) {
+ tuple.ipv4.saddr = ctx->ipv4->saddr;
+ tuple.ipv4.daddr = ctx->ipv4->daddr;
+ tuple.ipv4.sport = ctx->tcp->source;
+ tuple.ipv4.dport = ctx->tcp->dest;
+ tuple_size = sizeof(tuple.ipv4);
+ } else if (ctx->ipv6) {
+ __builtin_memcpy(tuple.ipv6.saddr, &ctx->ipv6->saddr, sizeof(tuple.ipv6.saddr));
+ __builtin_memcpy(tuple.ipv6.daddr, &ctx->ipv6->daddr, sizeof(tuple.ipv6.daddr));
+ tuple.ipv6.sport = ctx->tcp->source;
+ tuple.ipv6.dport = ctx->tcp->dest;
+ tuple_size = sizeof(tuple.ipv6);
+ } else {
+ goto out;
+ }
+
+ skc = bpf_skc_lookup_tcp(ctx->skb, &tuple, tuple_size, -1, 0);
+ if (!skc)
+ goto out;
+
+ if (skc->state != TCP_LISTEN)
+ goto release;
+
+ sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
+ if (!sk)
+ goto err;
+
+ if (tcp_validate_header(ctx))
+ goto err;
+
+ tcp_parse_options(ctx);
+
+ if (tcp_validate_cookie(ctx))
+ goto err;
+
+ ret = bpf_sk_assign_tcp_reqsk(ctx->skb, sk, &ctx->attrs, sizeof(ctx->attrs));
+ if (ret < 0)
+ goto err;
+
+release:
+ bpf_sk_release(skc);
+out:
+ return ret;
+
+err:
+ ret = TC_ACT_SHOT;
+ goto release;
+}
+
+SEC("tc")
+int tcp_custom_syncookie(struct __sk_buff *skb)
+{
+ struct tcp_syncookie ctx = {
+ .skb = skb,
+ };
+
+ if (tcp_load_headers(&ctx))
+ return TC_ACT_OK;
+
+ if (ctx.tcp->rst)
+ return TC_ACT_OK;
+
+ if (ctx.tcp->syn) {
+ if (ctx.tcp->ack)
+ return TC_ACT_OK;
+
+ handled_syn = true;
+
+ return tcp_handle_syn(&ctx);
+ }
+
+ handled_ack = true;
+
+ return tcp_handle_ack(&ctx);
+}
+
+char _license[] SEC("license") = "GPL";
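
For reference, a stand-alone user-space sketch, not part of the patch, of the
cookie bit layout documented at the top of the program above. It packs and
unpacks the ISN the same way tcp_prepare_cookie() and tcp_validate_cookie()
do; the hash and field values are placeholders:

#include <assert.h>
#include <stdint.h>

#define COOKIE_BITS	8
#define COOKIE_MASK	(((uint32_t)1 << COOKIE_BITS) - 1)

#define WSCALE_MASK	((1 << 4) - 1)	/* BPF_SYNCOOKIE_WSCALE_MASK */
#define SACK_BIT	(1 << 4)	/* BPF_SYNCOOKIE_SACK */
#define ECN_BIT		(1 << 5)	/* BPF_SYNCOOKIE_ECN */

static uint32_t pack_isn(uint32_t hash, int mssind, int wscale, int sack, int ecn)
{
	uint32_t isn = hash & ~COOKIE_MASK;	/* Hash_1 lives in bits 31..8 */

	isn |= (uint32_t)mssind << 6;		/* MSS index in bits 7..6 */
	isn |= (uint32_t)wscale & WSCALE_MASK;	/* WScale in bits 3..0 */
	if (sack)
		isn |= SACK_BIT;
	if (ecn)
		isn |= ECN_BIT;
	return isn;
}

int main(void)
{
	uint32_t isn = pack_isn(0xdeadbe00, 2, 7, 1, 1);

	/* tcp_validate_cookie() recovers the fields the same way */
	assert(((isn & (3 << 6)) >> 6) == 2);		/* mssind */
	assert((isn & WSCALE_MASK) == 7);		/* snd_wscale */
	assert(isn & SACK_BIT);
	assert(isn & ECN_BIT);
	assert((isn & ~COOKIE_MASK) == 0xdeadbe00);	/* Hash_1 is intact */
	return 0;
}

The TS half of the cookie is handled separately in the program: the low
COOKIE_BITS of TSecr carry the low byte of the hash (Hash_2 in the layout
above) and the upper bits are random.
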
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
new file mode 100644
index 000000000000..29a6a53cf229
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Amazon.com Inc. or its affiliates. */
+
+#ifndef _TEST_TCP_SYNCOOKIE_H
+#define _TEST_TCP_SYNCOOKIE_H
+
+#define __packed __attribute__((__packed__))
+#define __force
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define swap(a, b) \
+ do { \
+ typeof(a) __tmp = (a); \
+ (a) = (b); \
+ (b) = __tmp; \
+ } while (0)
+
+#define swap_array(a, b) \
+ do { \
+ typeof(a) __tmp[sizeof(a)]; \
+ __builtin_memcpy(__tmp, a, sizeof(a)); \
+ __builtin_memcpy(a, b, sizeof(a)); \
+ __builtin_memcpy(b, __tmp, sizeof(a)); \
+ } while (0)
+
+/* asm-generic/unaligned.h */
+#define __get_unaligned_t(type, ptr) ({ \
+ const struct { type x; } __packed * __pptr = (typeof(__pptr))(ptr); \
+ __pptr->x; \
+})
+
+#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr))
+
+static inline u16 get_unaligned_be16(const void *p)
+{
+ return bpf_ntohs(__get_unaligned_t(__be16, p));
+}
+
+static inline u32 get_unaligned_be32(const void *p)
+{
+ return bpf_ntohl(__get_unaligned_t(__be32, p));
+}
+
+/* lib/checksum.c */
+static inline u32 from64to32(u64 x)
+{
+ /* add up 32-bit and 32-bit for 32+c bit */
+ x = (x & 0xffffffff) + (x >> 32);
+ /* add up carry.. */
+ x = (x & 0xffffffff) + (x >> 32);
+ return (u32)x;
+}
+
+static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
+ __u32 len, __u8 proto, __wsum sum)
+{
+ unsigned long long s = (__force u32)sum;
+
+ s += (__force u32)saddr;
+ s += (__force u32)daddr;
+#ifdef __BIG_ENDIAN
+ s += proto + len;
+#else
+ s += (proto + len) << 8;
+#endif
+ return (__force __wsum)from64to32(s);
+}
+
+/* asm-generic/checksum.h */
+static inline __sum16 csum_fold(__wsum csum)
+{
+ u32 sum = (__force u32)csum;
+
+ sum = (sum & 0xffff) + (sum >> 16);
+ sum = (sum & 0xffff) + (sum >> 16);
+ return (__force __sum16)~sum;
+}
+
+static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
+ __u8 proto, __wsum sum)
+{
+ return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
+}
+
+/* net/ipv6/ip6_checksum.c */
+static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u32 len, __u8 proto, __wsum csum)
+{
+ int carry;
+ __u32 ulen;
+ __u32 uproto;
+ __u32 sum = (__force u32)csum;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[0];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[0]);
+ sum += carry;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[1];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[1]);
+ sum += carry;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[2];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[2]);
+ sum += carry;
+
+ sum += (__force u32)saddr->in6_u.u6_addr32[3];
+ carry = (sum < (__force u32)saddr->in6_u.u6_addr32[3]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[0];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[0]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[1];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[1]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[2];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[2]);
+ sum += carry;
+
+ sum += (__force u32)daddr->in6_u.u6_addr32[3];
+ carry = (sum < (__force u32)daddr->in6_u.u6_addr32[3]);
+ sum += carry;
+
+ ulen = (__force u32)bpf_htonl((__u32)len);
+ sum += ulen;
+ carry = (sum < ulen);
+ sum += carry;
+
+ uproto = (__force u32)bpf_htonl(proto);
+ sum += uproto;
+ carry = (sum < uproto);
+ sum += carry;
+
+ return csum_fold((__force __wsum)sum);
+}
+#endif
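
For reference, a stand-alone user-space sketch, not part of the patch,
illustrating that the fold done by from64to32() and csum_fold() above reduces
a wide accumulator to the usual RFC 1071 ones'-complement checksum. The two
helpers are restated locally so the example builds without vmlinux.h types,
and the sample bytes are arbitrary:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t from64to32(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}

static uint16_t csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* reference: RFC 1071 style ones'-complement sum of 16-bit words */
static uint16_t ref_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

int main(void)
{
	static const uint8_t buf[] = {
		0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00,
		0x40, 0x06, 0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01,
		0xc0, 0xa8, 0x00, 0xc7,
	};
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < sizeof(buf); i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];

	assert(csum_fold(from64to32(sum)) == ref_csum(buf, sizeof(buf)));
	return 0;
}
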
--
2.30.2
^ permalink raw reply related [flat|nested] 11+ messages in thread

* Re: [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC.
2024-01-15 20:55 [PATCH v8 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at TC Kuniyuki Iwashima
` (5 preceding siblings ...)
2024-01-15 20:55 ` [PATCH v8 bpf-next 6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk() Kuniyuki Iwashima
@ 2024-01-17 1:50 ` patchwork-bot+netdevbpf
6 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-17 1:50 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: edumazet, ast, daniel, andrii, martin.lau, pabeni, kuni1840, bpf,
netdev
Hello:
This series was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:
On Mon, 15 Jan 2024 12:55:08 -0800 you wrote:
> Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless
> for the connection request until a valid ACK is responded to the SYN+ACK.
>
> The cookie contains two kinds of host-specific bits, a timestamp and
> secrets, so it can only be validated by the generator. This means the
> SYN Cookie consumes network resources between the client and the
> server; intermediate nodes must remember which node to route the ACK
> for the cookie to.
>
> [...]
Here is the summary with links:
- [v8,bpf-next,1/6] tcp: Move tcp_ns_to_ts() to tcp.h
https://git.kernel.org/bpf/bpf-next/c/e8a7ea899527
- [v8,bpf-next,2/6] tcp: Move skb_steal_sock() to request_sock.h
https://git.kernel.org/bpf/bpf-next/c/2d1ee30a3b07
- [v8,bpf-next,3/6] bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().
https://git.kernel.org/bpf/bpf-next/c/5f8b96b9b391
- [v8,bpf-next,4/6] bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().
https://git.kernel.org/bpf/bpf-next/c/311ef79955d3
- [v8,bpf-next,5/6] bpf: tcp: Support arbitrary SYN Cookie.
https://git.kernel.org/bpf/bpf-next/c/b9c3eca5c086
- [v8,bpf-next,6/6] selftest: bpf: Test bpf_sk_assign_tcp_reqsk().
https://git.kernel.org/bpf/bpf-next/c/98af7dca1e0d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 11+ messages in thread