All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>,
	 Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>,
	 John Fastabend <john.fastabend@gmail.com>,
	 Jakub Sitnicki <jakub@cloudflare.com>,
	 Jason Xing <kernelxing@tencent.com>,
	 netdev@vger.kernel.org,  bpf@vger.kernel.org
Subject: Re: Recursive locking in sockmap
Date: Thu, 13 Jun 2024 12:08:25 -0700	[thread overview]
Message-ID: <666b43a9b2b4a_1995c208f@john.notmuch> (raw)
In-Reply-To: <ZmMuh5mkK7w7s/3L@pop-os.localdomain>

Cong Wang wrote:
> On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > <vincent.whitchurch@datadoghq.com> wrote:
> > > > With a socket in the sockmap, if there's a parser callback installed
> > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > immediately after the verdict callback is run. This started at commit
> > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > >
> > > > It can be reproduced by running ./test_sockmap -t ping
> > > > --txmsg_pass_skb.  The --txmsg_pass_skb command to test_sockmap is
> > > > available in this series:
> > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> > >
> > > I don't have time right now to look into this issue carefully until
> > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > solve the problem?
> > 
> > No.  That patch set addresses a different problem which occurs even if
> > only a verdict callback is used. But patch 4/5 in that patch set adds
> > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > option can be used to reproduce this deadlock too.
> 
> I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> test the following patch?
> 
> ------------>
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..da64ded97f3a 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
>  		if (tls_sw_has_ctx_rx(sk)) {
>  			psock->saved_data_ready(sk);
>  		} else {
> -			write_lock_bh(&sk->sk_callback_lock);
>  			strp_data_ready(&psock->strp);
> -			write_unlock_bh(&sk->sk_callback_lock);
>  		}
>  	}
>  	rcu_read_unlock();

Its not obvious to me that we can run the strp parser without the
sk_callback lock here. I believe below is the correct fix. It
fixes the splat above with test.

bpf: sockmap, fix introduced strparser recursive lock

Originally there was a race where removing a psock from the sock map while
it was also receiving an skb and calling sk_psock_data_ready(). It was
possible the removal code would NULL/set the data_ready callback while
concurrently calling the hook from receive path. The fix was to wrap the
access in sk_callback_lock to ensure the saved_data_ready pointer didn't
change under us. There was some discussion around doing a larger change
to ensure we could use READ_ONCE/WRITE_ONCE over the callback, but that
was for *next kernels not stable fixes.

But, we unfortunately introduced a regression with the fix because there
is another path into this code (that didn't have a test case) through
the stream parser. The stream parser runs with the lower lock which means
we get the following splat and lock up.


 ============================================
 WARNING: possible recursive locking detected
 6.10.0-rc2 #59 Not tainted
 --------------------------------------------
 test_sockmap/342 is trying to acquire lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
 sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
 net/core/skmsg.c:555)

 but task is already holding lock:
 ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
 sk_psock_strp_data_ready (net/core/skmsg.c:1120)

To fix ensure we do not grap lock when we reach this code through the
strparser.

Fixes: 6648e613226e1 ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/skmsg.h |    9 +++++++--
 net/core/skmsg.c      |    5 ++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c9efda9df285..3659e9b514d0 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -461,13 +461,18 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
 		sk_psock_drop(sk, psock);
 }
 
-static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+static inline void __sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
 {
-	read_lock_bh(&sk->sk_callback_lock);
 	if (psock->saved_data_ready)
 		psock->saved_data_ready(sk);
 	else
 		sk->sk_data_ready(sk);
+}
+
+static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+{
+	read_lock_bh(&sk->sk_callback_lock);
+	__sk_psock_data_ready(sk, psock);
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..8429daecbbb6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -552,7 +552,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	msg->skb = skb;
 
 	sk_psock_queue_msg(psock, msg);
-	sk_psock_data_ready(sk, psock);
+	if (skb_bpf_strparser(skb))
+		__sk_psock_data_ready(sk, psock);
+	else
+		sk_psock_data_ready(sk, psock);
 	return copied;
 }

  reply	other threads:[~2024-06-13 19:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-06 10:00 Recursive locking in sockmap Vincent Whitchurch
2024-06-06 12:46 ` Jason Xing
2024-06-07 12:09   ` Vincent Whitchurch
2024-06-07 16:00     ` Cong Wang
2024-06-13 19:08       ` John Fastabend [this message]
2024-07-20 18:20         ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=666b43a9b2b4a_1995c208f@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=kerneljasonxing@gmail.com \
    --cc=kernelxing@tencent.com \
    --cc=netdev@vger.kernel.org \
    --cc=vincent.whitchurch@datadoghq.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.