netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Zijian Zhang <zijianzhang@bytedance.com>,
	 Cong Wang <xiyou.wangcong@gmail.com>
Cc: bpf@vger.kernel.org,  john.fastabend@gmail.com,
	 jakub@cloudflare.com,  davem@davemloft.net,
	 edumazet@google.com,  kuba@kernel.org,  pabeni@redhat.com,
	 dsahern@kernel.org,  netdev@vger.kernel.org,
	 cong.wang@bytedance.com
Subject: Re: [External] Re: [PATCH bpf 2/2] tcp_bpf: add sk_rmem_alloc related logic for ingress redirection
Date: Wed, 20 Nov 2024 22:07:19 -0800	[thread overview]
Message-ID: <673ece17df5d6_157a20815@john.notmuch> (raw)
In-Reply-To: <1016b317-d521-4787-80dc-3b92320f2d19@bytedance.com>

Zijian Zhang wrote:
> 
> On 11/7/24 8:04 PM, Cong Wang wrote:
> > On Thu, Oct 17, 2024 at 12:57:42AM +0000, zijianzhang@bytedance.com wrote:
> >> From: Zijian Zhang <zijianzhang@bytedance.com>
> >>
> >> Although we sk_rmem_schedule and add sk_msg to the ingress_msg of sk_redir
> >> in bpf_tcp_ingress, we do not update sk_rmem_alloc. As a result, except
> >> for the global memory limit, the rmem of sk_redir is nearly unlimited.
> >>
> >> Thus, add sk_rmem_alloc related logic to limit the recv buffer.
> >>
> >> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> >> ---
> >>   include/linux/skmsg.h | 11 ++++++++---
> >>   net/core/skmsg.c      |  6 +++++-
> >>   net/ipv4/tcp_bpf.c    |  4 +++-
> >>   3 files changed, 16 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> >> index d9b03e0746e7..2cbe0c22a32f 100644
> >> --- a/include/linux/skmsg.h
> >> +++ b/include/linux/skmsg.h
> >> @@ -317,17 +317,22 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
> >>   	kfree_skb(skb);
> >>   }
> >>   
> >> -static inline void sk_psock_queue_msg(struct sk_psock *psock,
> >> +static inline bool sk_psock_queue_msg(struct sk_psock *psock,
> >>   				      struct sk_msg *msg)
> >>   {
> >> +	bool ret;
> >> +
> >>   	spin_lock_bh(&psock->ingress_lock);
> >> -	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
> >> +	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
> >>   		list_add_tail(&msg->list, &psock->ingress_msg);
> >> -	else {
> >> +		ret = true;
> >> +	} else {
> >>   		sk_msg_free(psock->sk, msg);
> >>   		kfree(msg);
> >> +		ret = false;
> >>   	}
> >>   	spin_unlock_bh(&psock->ingress_lock);
> >> +	return ret;
> >>   }
> >>   
> >>   static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock)
> >> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> >> index b1dcbd3be89e..110ee0abcfe0 100644
> >> --- a/net/core/skmsg.c
> >> +++ b/net/core/skmsg.c
> >> @@ -445,8 +445,10 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
> >>   			if (likely(!peek)) {
> >>   				sge->offset += copy;
> >>   				sge->length -= copy;
> >> -				if (!msg_rx->skb)
> >> +				if (!msg_rx->skb) {
> >>   					sk_mem_uncharge(sk, copy);
> >> +					atomic_sub(copy, &sk->sk_rmem_alloc);
> >> +				}
> >>   				msg_rx->sg.size -= copy;
> >>   
> >>   				if (!sge->length) {
> >> @@ -772,6 +774,8 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock)
> >>   
> >>   	list_for_each_entry_safe(msg, tmp, &psock->ingress_msg, list) {
> >>   		list_del(&msg->list);
> >> +		if (!msg->skb)
> >> +			atomic_sub(msg->sg.size, &psock->sk->sk_rmem_alloc);
> >>   		sk_msg_free(psock->sk, msg);
> > 
> > Why not calling this atomic_sub() in sk_msg_free_elem()?
> > 
> > Thanks.
> 
> sk_msg_free_elem called by sk_msg_free or sk_msg_free_no_charge will
> be invoked in multiple locations including TX/RX/Error and etc.
> 
> We should call atomic_sub(&sk->sk_rmem_alloc) for sk_msgs that have
> been atomic_add before. In other words, we need to call atomic_sub
> only for sk_msgs in ingress_msg.
> 
> As for "!msg->skb" check here, I want to make sure the sk_msg is not
> from function sk_psock_skb_ingress_enqueue, because these sk_msgs'
> rmem accounting has already handled by skb_set_owner_r in function
> sk_psock_skb_ingress.
> 

Assuming I read above correct this is only an issue when doing a
redirect to ingress of a socket? The other path where we do a
SK_PASS directly from the verdict to hit the ingress of the
current socket is OK because all account is done already through
skb. Basically that is what the above explanation for !msg->skb
is describing?

Can we add this to the description so we don't forget it and/or
have to look up the mailing list in the future.

Thanks,
John

  reply	other threads:[~2024-11-21  6:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-17  0:57 [PATCH bpf 0/2] tcp_bpf: update the rmem scheduling for ingress redirection zijianzhang
2024-10-17  0:57 ` [PATCH bpf 1/2] tcp_bpf: charge receive socket buffer in bpf_tcp_ingress() zijianzhang
2024-10-17  0:57 ` [PATCH bpf 2/2] tcp_bpf: add sk_rmem_alloc related logic for ingress redirection zijianzhang
2024-11-08  4:04   ` Cong Wang
2024-11-08 18:28     ` [External] " Zijian Zhang
2024-11-21  6:07       ` John Fastabend [this message]
2024-11-08  3:58 ` [PATCH bpf 0/2] tcp_bpf: update the rmem scheduling " Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=673ece17df5d6_157a20815@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=jakub@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=zijianzhang@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).