BPF List
* [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for
@ 2024-12-10  1:20 zijianzhang
  2024-12-10  1:20 ` [PATCH v2 bpf 1/2] tcp_bpf: charge receive socket buffer in bpf_tcp_ingress() zijianzhang
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: zijianzhang @ 2024-12-10  1:20 UTC (permalink / raw)
  To: bpf
  Cc: john.fastabend, jakub, davem, edumazet, kuba, pabeni, dsahern,
	horms, ast, daniel, netdev, Zijian Zhang

From: Zijian Zhang <zijianzhang@bytedance.com>

We should do sk_rmem_schedule instead of sk_wmem_schedule in function
bpf_tcp_ingress. We also need to update sk_rmem_alloc in bpf_tcp_ingress
accordingly to account for the rmem.

v2:
  - Update the commit message to indicate the reason for msg->skb check

Cong Wang (1):
  tcp_bpf: charge receive socket buffer in bpf_tcp_ingress()

Zijian Zhang (1):
  tcp_bpf: add sk_rmem_alloc related logic for tcp_bpf ingress
    redirection

 include/linux/skmsg.h | 11 ++++++++---
 include/net/sock.h    | 10 ++++++++--
 net/core/skmsg.c      |  6 +++++-
 net/ipv4/tcp_bpf.c    |  6 ++++--
 4 files changed, 25 insertions(+), 8 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 bpf 1/2] tcp_bpf: charge receive socket buffer in bpf_tcp_ingress()
  2024-12-10  1:20 [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for zijianzhang
@ 2024-12-10  1:20 ` zijianzhang
  2024-12-10  1:20 ` [PATCH v2 bpf 2/2] tcp_bpf: add sk_rmem_alloc related logic for tcp_bpf ingress redirection zijianzhang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: zijianzhang @ 2024-12-10  1:20 UTC (permalink / raw)
  To: bpf
  Cc: john.fastabend, jakub, davem, edumazet, kuba, pabeni, dsahern,
	horms, ast, daniel, netdev, Cong Wang

From: Cong Wang <cong.wang@bytedance.com>

When bpf_tcp_ingress() is called, the skmsg is being redirected to the
ingress of the destination socket. Therefore, we should charge its
receive socket buffer instead of its send socket buffer.

Because sk_rmem_schedule() tests pfmemalloc of skb, we need to
introduce a wrapper and call it for skmsg.
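The reservation logic that the new helper factors out can be modeled in
plain userspace C. This is only an illustrative sketch: the `mock_*`
names and the fixed quota are hypothetical stand-ins, and the kernel's
__sk_mem_schedule() consults global protocol memory limits rather than
a per-socket quota.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_sock {
	int sk_forward_alloc;	/* bytes already reserved for this socket */
	int quota;		/* stand-in for the global memory limit */
};

/* Simplified stand-in for __sk_mem_schedule(): reserve @delta more
 * bytes if the (mock) global quota allows it. */
static bool mock_mem_schedule(struct mock_sock *sk, int delta)
{
	if (delta > sk->quota)
		return false;
	sk->quota -= delta;
	sk->sk_forward_alloc += delta;
	return true;
}

/* Mirrors the shape of __sk_rmem_schedule(): succeed if the existing
 * reservation already covers @size, if more memory can be scheduled,
 * or if the caller is a pfmemalloc allocation that must not be
 * refused. */
static bool mock_rmem_schedule(struct mock_sock *sk, int size, bool pfmemalloc)
{
	int delta;

	if (sk->sk_forward_alloc >= size)
		return true;
	delta = size - sk->sk_forward_alloc;
	return delta <= 0 || mock_mem_schedule(sk, delta) || pfmemalloc;
}
```

Passing pfmemalloc explicitly is what lets bpf_tcp_ingress() reuse the
receive-side helper for skmsgs, which have no skb to query.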

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/sock.h | 10 ++++++++--
 net/ipv4/tcp_bpf.c |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7464e9f9f47c..c383126f691d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1527,7 +1527,7 @@ static inline bool sk_wmem_schedule(struct sock *sk, int size)
 }
 
 static inline bool
-sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
+__sk_rmem_schedule(struct sock *sk, int size, bool pfmemalloc)
 {
 	int delta;
 
@@ -1535,7 +1535,13 @@ sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
 		return true;
 	delta = size - sk->sk_forward_alloc;
 	return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_RECV) ||
-		skb_pfmemalloc(skb);
+	       pfmemalloc;
+}
+
+static inline bool
+sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
+{
+	return __sk_rmem_schedule(sk, size, skb_pfmemalloc(skb));
 }
 
 static inline int sk_unused_reserved_mem(const struct sock *sk)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 99cef92e6290..b21ea634909c 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -49,7 +49,7 @@ static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 		sge = sk_msg_elem(msg, i);
 		size = (apply && apply_bytes < sge->length) ?
 			apply_bytes : sge->length;
-		if (!sk_wmem_schedule(sk, size)) {
+		if (!__sk_rmem_schedule(sk, size, false)) {
 			if (!copied)
 				ret = -ENOMEM;
 			break;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 bpf 2/2] tcp_bpf: add sk_rmem_alloc related logic for tcp_bpf ingress redirection
  2024-12-10  1:20 [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for zijianzhang
  2024-12-10  1:20 ` [PATCH v2 bpf 1/2] tcp_bpf: charge receive socket buffer in bpf_tcp_ingress() zijianzhang
@ 2024-12-10  1:20 ` zijianzhang
  2024-12-20 16:51 ` [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for John Fastabend
  2024-12-20 17:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: zijianzhang @ 2024-12-10  1:20 UTC (permalink / raw)
  To: bpf
  Cc: john.fastabend, jakub, davem, edumazet, kuba, pabeni, dsahern,
	horms, ast, daniel, netdev, Zijian Zhang

From: Zijian Zhang <zijianzhang@bytedance.com>

When we do sk_psock_verdict_apply->sk_psock_skb_ingress, an sk_msg will
be created out of the skb, and the rmem accounting of the sk_msg will be
handled by the skb.

For skmsgs in __SK_REDIRECT case of tcp_bpf_send_verdict, when redirecting
to the ingress of a socket, although we sk_rmem_schedule and add sk_msg to
the ingress_msg of sk_redir, we do not update sk_rmem_alloc. As a result,
except for the global memory limit, the rmem of sk_redir is nearly
unlimited. Thus, add sk_rmem_alloc related logic to limit the recv buffer.

Since the functions sk_msg_recvmsg() and __sk_psock_purge_ingress_msg()
are used in both of these paths, we use "msg->skb" to test whether the
sk_msg is skb-backed. If it is not, we do the memory accounting
explicitly.
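The resulting accounting rule can be sketched as a userspace model
(illustrative only; the `mock_*` names are hypothetical stand-ins for
the kernel structures, and `skb_backed` stands in for the msg->skb
check):

```c
#include <assert.h>
#include <stdbool.h>

struct mock_msg {
	bool skb_backed;	/* stands in for msg->skb != NULL */
	int size;		/* stands in for msg->sg.size */
};

struct mock_sock {
	int sk_rmem_alloc;	/* receive-buffer bytes in flight */
};

/* bpf_tcp_ingress() path: only skmsgs NOT backed by an skb charge
 * sk_rmem_alloc directly; skb-backed msgs are already accounted
 * through the skb itself. */
static void mock_ingress_charge(struct mock_sock *sk, struct mock_msg *msg)
{
	if (!msg->skb_backed)
		sk->sk_rmem_alloc += msg->size;
}

/* sk_msg_recvmsg() / __sk_psock_purge_ingress_msg() path: uncharge
 * under the same condition, so every charge is balanced. */
static void mock_uncharge(struct mock_sock *sk, struct mock_msg *msg)
{
	if (!msg->skb_backed)
		sk->sk_rmem_alloc -= msg->size;
}
```

Keeping the charge and uncharge behind the same msg->skb condition is
what prevents double-accounting for skb-backed msgs while still
bounding the redirected ones.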

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 include/linux/skmsg.h | 11 ++++++++---
 net/core/skmsg.c      |  6 +++++-
 net/ipv4/tcp_bpf.c    |  4 +++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index d9b03e0746e7..2cbe0c22a32f 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -317,17 +317,22 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
 	kfree_skb(skb);
 }
 
-static inline void sk_psock_queue_msg(struct sk_psock *psock,
+static inline bool sk_psock_queue_msg(struct sk_psock *psock,
 				      struct sk_msg *msg)
 {
+	bool ret;
+
 	spin_lock_bh(&psock->ingress_lock);
-	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
+	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
 		list_add_tail(&msg->list, &psock->ingress_msg);
-	else {
+		ret = true;
+	} else {
 		sk_msg_free(psock->sk, msg);
 		kfree(msg);
+		ret = false;
 	}
 	spin_unlock_bh(&psock->ingress_lock);
+	return ret;
 }
 
 static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e90fbab703b2..8ad7e6755fd6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -445,8 +445,10 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 			if (likely(!peek)) {
 				sge->offset += copy;
 				sge->length -= copy;
-				if (!msg_rx->skb)
+				if (!msg_rx->skb) {
 					sk_mem_uncharge(sk, copy);
+					atomic_sub(copy, &sk->sk_rmem_alloc);
+				}
 				msg_rx->sg.size -= copy;
 
 				if (!sge->length) {
@@ -772,6 +774,8 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock)
 
 	list_for_each_entry_safe(msg, tmp, &psock->ingress_msg, list) {
 		list_del(&msg->list);
+		if (!msg->skb)
+			atomic_sub(msg->sg.size, &psock->sk->sk_rmem_alloc);
 		sk_msg_free(psock->sk, msg);
 		kfree(msg);
 	}
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index b21ea634909c..392678ae80f4 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -56,6 +56,7 @@ static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 		}
 
 		sk_mem_charge(sk, size);
+		atomic_add(size, &sk->sk_rmem_alloc);
 		sk_msg_xfer(tmp, msg, i, size);
 		copied += size;
 		if (sge->length)
@@ -74,7 +75,8 @@ static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 
 	if (!ret) {
 		msg->sg.start = i;
-		sk_psock_queue_msg(psock, tmp);
+		if (!sk_psock_queue_msg(psock, tmp))
+			atomic_sub(copied, &sk->sk_rmem_alloc);
 		sk_psock_data_ready(sk, psock);
 	} else {
 		sk_msg_free(sk, tmp);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for
  2024-12-10  1:20 [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for zijianzhang
  2024-12-10  1:20 ` [PATCH v2 bpf 1/2] tcp_bpf: charge receive socket buffer in bpf_tcp_ingress() zijianzhang
  2024-12-10  1:20 ` [PATCH v2 bpf 2/2] tcp_bpf: add sk_rmem_alloc related logic for tcp_bpf ingress redirection zijianzhang
@ 2024-12-20 16:51 ` John Fastabend
  2024-12-20 17:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: John Fastabend @ 2024-12-20 16:51 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: john.fastabend, jakub, davem, edumazet, kuba, pabeni, dsahern,
	horms, ast, daniel, netdev, Zijian Zhang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> We should do sk_rmem_schedule instead of sk_wmem_schedule in function
> bpf_tcp_ingress. We also need to update sk_rmem_alloc in bpf_tcp_ingress
> accordingly to account for the rmem.
> 
> v2:
>   - Update the commit message to indicate the reason for msg->skb check
> 
> Cong Wang (1):
>   tcp_bpf: charge receive socket buffer in bpf_tcp_ingress()
> 
> Zijian Zhang (1):
>   tcp_bpf: add sk_rmem_alloc related logic for tcp_bpf ingress
>     redirection
> 
>  include/linux/skmsg.h | 11 ++++++++---
>  include/net/sock.h    | 10 ++++++++--
>  net/core/skmsg.c      |  6 +++++-
>  net/ipv4/tcp_bpf.c    |  6 ++++--
>  4 files changed, 25 insertions(+), 8 deletions(-)
> 
> -- 
> 2.20.1
> 

Thanks. Sorry for the delay, I thought this had an ACK already. My fault.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for
  2024-12-10  1:20 [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for zijianzhang
                   ` (2 preceding siblings ...)
  2024-12-20 16:51 ` [PATCH v2 bpf 0/2] tcp_bpf: update the rmem scheduling for John Fastabend
@ 2024-12-20 17:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-12-20 17:10 UTC (permalink / raw)
  To: Zijian Zhang
  Cc: bpf, john.fastabend, jakub, davem, edumazet, kuba, pabeni,
	dsahern, horms, ast, daniel, netdev

Hello:

This series was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Tue, 10 Dec 2024 01:20:37 +0000 you wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> We should do sk_rmem_schedule instead of sk_wmem_schedule in function
> bpf_tcp_ingress. We also need to update sk_rmem_alloc in bpf_tcp_ingress
> accordingly to account for the rmem.
> 
> v2:
>   - Update the commit message to indicate the reason for msg->skb check
> 
> [...]

Here is the summary with links:
  - [v2,bpf,1/2] tcp_bpf: charge receive socket buffer in bpf_tcp_ingress()
    https://git.kernel.org/bpf/bpf/c/54f89b3178d5
  - [v2,bpf,2/2] tcp_bpf: add sk_rmem_alloc related logic for tcp_bpf ingress redirection
    https://git.kernel.org/bpf/bpf/c/d888b7af7c14

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread
