netdev.vger.kernel.org archive mirror
* [PATCH] fixed rtnl deadlock from gtp
@ 2024-10-01  1:55 Daniel Yang
  2024-10-01  3:03 ` Kuniyuki Iwashima
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel Yang @ 2024-10-01  1:55 UTC
  To: Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-s390, netdev, linux-kernel
  Cc: danielyangkang, syzbot+e953a8f3071f5c0a28fd

Fixes deadlock described in this bug:
https://syzkaller.appspot.com/bug?extid=e953a8f3071f5c0a28fd.
Specific crash report here:
https://syzkaller.appspot.com/text?tag=CrashReport&x=14670e07980000.

DESCRIPTION OF ISSUE
Deadlock: sk_lock-AF_INET --> &smc->clcsock_release_lock --> rtnl_mutex

rtnl_mutex->sk_lock-AF_INET
rtnetlink_rcv_msg() acquires rtnl_lock() and calls rtnl_newlink(), which
eventually calls gtp_newlink() which calls lock_sock() to attempt to
acquire sk_lock.

sk_lock-AF_INET->&smc->clcsock_release_lock
smc_sendmsg() calls lock_sock() to acquire sk_lock, then calls
smc_switch_to_fallback() which attempts to acquire mutex_lock(&smc->...).

&smc->clcsock_release_lock->rtnl_mutex
smc_setsockopt() calls mutex_lock(&smc->...). smc->...->setsockopt() is
called, which calls nf_setsockopt() which attempts to acquire
rtnl_lock() in some nested call in start_sync_thread() in ip_vs_sync.c.

FIX:
Split the body of smc_switch_to_fallback() into an inline helper,
__smc_switch_to_fallback(), which expects the caller to hold
smc->clcsock_release_lock. smc_sendmsg() now takes that mutex before
lock_sock() and calls the helper directly, so the lock order in
smc_sendmsg() becomes clcsock_release_lock --> sk_lock.

Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Tested-by: Daniel Yang <danielyangkang@gmail.com>
Reported-by: syzbot+e953a8f3071f5c0a28fd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e953a8f3071f5c0a28fd
---
 net/smc/af_smc.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0316217b7..e04f132be 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -895,11 +895,15 @@ static void smc_fback_replace_callbacks(struct smc_sock *smc)
 	write_unlock_bh(&clcsk->sk_callback_lock);
 }
 
-static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
+/* Caller must hold smc->clcsock_release_lock.
+ * The locking is kept out of this helper so that callers of
+ * smc_switch_to_fallback() can choose their own lock ordering
+ * and avoid deadlocks.
+ */
+static inline int __smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 {
 	int rc = 0;
 
-	mutex_lock(&smc->clcsock_release_lock);
 	if (!smc->clcsock) {
 		rc = -EBADF;
 		goto out;
@@ -923,6 +927,13 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 		smc_fback_replace_callbacks(smc);
 	}
 out:
+	return rc;
+}
+
+static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
+{
+	mutex_lock(&smc->clcsock_release_lock);
+	int rc = __smc_switch_to_fallback(smc, reason_code);
 	mutex_unlock(&smc->clcsock_release_lock);
 	return rc;
 }
@@ -2762,13 +2773,15 @@ int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 	int rc;
 
 	smc = smc_sk(sk);
+	/* acquire smc lock before sk to avoid deadlock with rtnl */
+	mutex_lock(&smc->clcsock_release_lock);
 	lock_sock(sk);
 
 	/* SMC does not support connect with fastopen */
 	if (msg->msg_flags & MSG_FASTOPEN) {
 		/* not connected yet, fallback */
 		if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) {
-			rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP);
+			rc = __smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP);
 			if (rc)
 				goto out;
 		} else {
@@ -2790,6 +2803,7 @@ int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 	}
 out:
 	release_sock(sk);
+	mutex_unlock(&smc->clcsock_release_lock);
 	return rc;
 }
 
-- 
2.39.2



* Re: [PATCH] fixed rtnl deadlock from gtp
  2024-10-01  1:55 [PATCH] fixed rtnl deadlock from gtp Daniel Yang
@ 2024-10-01  3:03 ` Kuniyuki Iwashima
       [not found]   ` <CAGiJo8Rmr2JJ0cCuGDGUeM-fNXdF1L1==bBqJdcCxBkJUTHzuw@mail.gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-01  3:03 UTC
  To: danielyangkang
  Cc: alibuda, davem, edumazet, guwen, jaka, kuba, linux-kernel,
	linux-s390, netdev, pabeni, syzbot+e953a8f3071f5c0a28fd, tonylu,
	wenjia, kuniyu

From: Daniel Yang <danielyangkang@gmail.com>
Date: Mon, 30 Sep 2024 18:55:54 -0700
> Fixes deadlock described in this bug:
> https://syzkaller.appspot.com/bug?extid=e953a8f3071f5c0a28fd.
> Specific crash report here:
> https://syzkaller.appspot.com/text?tag=CrashReport&x=14670e07980000.
> 
> DESCRIPTION OF ISSUE
> Deadlock: sk_lock-AF_INET --> &smc->clcsock_release_lock --> rtnl_mutex
> 
> rtnl_mutex->sk_lock-AF_INET
> rtnetlink_rcv_msg() acquires rtnl_lock() and calls rtnl_newlink(), which
> eventually calls gtp_newlink() which calls lock_sock() to attempt to
> acquire sk_lock.

Is the deadlock real?

Judging from the lockdep splat, gtp verifies that the socket's
sk_protocol is IPPROTO_UDP before calling lock_sock(), so this looks
like just a labeling issue.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/gtp.c?id=9410645520e9b820069761f3450ef6661418e279#n1674


> 
> sk_lock-AF_INET->&smc->clcsock_release_lock
> smc_sendmsg() calls lock_sock() to acquire sk_lock, then calls
> smc_switch_to_fallback() which attempts to acquire mutex_lock(&smc->...).
> 
> &smc->clcsock_release_lock->rtnl_mutex
> smc_setsockopt() calls mutex_lock(&smc->...). smc->...->setsockopt() is
> called, which calls nf_setsockopt() which attempts to acquire
> rtnl_lock() in some nested call in start_sync_thread() in ip_vs_sync.c.
> 
> FIX:
> In smc_switch_to_fallback(), separate the logic into inline function
> __smc_switch_to_fallback(). In smc_sendmsg(), lock ordering can be
> modified and the functionality of smc_switch_to_fallback() is
> encapsulated in the __smc_switch_to_fallback() function.


* Re: [PATCH] fixed rtnl deadlock from gtp
       [not found]   ` <CAGiJo8Rmr2JJ0cCuGDGUeM-fNXdF1L1==bBqJdcCxBkJUTHzuw@mail.gmail.com>
@ 2024-10-01  7:46     ` Eric Dumazet
  0 siblings, 0 replies; 3+ messages in thread
From: Eric Dumazet @ 2024-10-01  7:46 UTC
  To: Daniel Yang
  Cc: Kuniyuki Iwashima, alibuda, davem, guwen, jaka, kuba,
	linux-kernel, linux-s390, netdev, pabeni,
	syzbot+e953a8f3071f5c0a28fd, tonylu, wenjia

On Tue, Oct 1, 2024 at 6:54 AM Daniel Yang <danielyangkang@gmail.com> wrote:
>
> Ok I see the issue. Yes it does seem to be a false positive. Then do we already have lockdep classes and subclasses set up for lock_sock() to prevent other false positives like this one? If not, should I add one then to resolve this?
>

Please do not top-post on Linux mailing lists.

About your question:
https://lore.kernel.org/netdev/CANn89iKcWmufo83xy-SwSrXYt6UpL2Pb+5pWuzyYjMva5F8bBQ@mail.gmail.com/


> On Mon, Sep 30, 2024 at 8:04 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>
>> From: Daniel Yang <danielyangkang@gmail.com>
>> Date: Mon, 30 Sep 2024 18:55:54 -0700
>> > Fixes deadlock described in this bug:
>> > https://syzkaller.appspot.com/bug?extid=e953a8f3071f5c0a28fd.
>> > Specific crash report here:
>> > https://syzkaller.appspot.com/text?tag=CrashReport&x=14670e07980000.
>> >
>> > DESCRIPTION OF ISSUE
>> > Deadlock: sk_lock-AF_INET --> &smc->clcsock_release_lock --> rtnl_mutex
>> >
>> > rtnl_mutex->sk_lock-AF_INET
>> > rtnetlink_rcv_msg() acquires rtnl_lock() and calls rtnl_newlink(), which
>> > eventually calls gtp_newlink() which calls lock_sock() to attempt to
>> > acquire sk_lock.
>>
>> Is the deadlock real ?
>>
>> From the lockdep splat, the gtp's sk_protocol is verified to be
>> IPPROTO_UDP before holding lock_sock(), so it seems just a labeling
>> issue.
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/gtp.c?id=9410645520e9b820069761f3450ef6661418e279#n1674
>>
>>
>> >
>> > sk_lock-AF_INET->&smc->clcsock_release_lock
>> > smc_sendmsg() calls lock_sock() to acquire sk_lock, then calls
>> > smc_switch_to_fallback() which attempts to acquire mutex_lock(&smc->...).
>> >
>> > &smc->clcsock_release_lock->rtnl_mutex
>> > smc_setsockopt() calls mutex_lock(&smc->...). smc->...->setsockopt() is
>> > called, which calls nf_setsockopt() which attempts to acquire
>> > rtnl_lock() in some nested call in start_sync_thread() in ip_vs_sync.c.
>> >
>> > FIX:
>> > In smc_switch_to_fallback(), separate the logic into inline function
>> > __smc_switch_to_fallback(). In smc_sendmsg(), lock ordering can be
>> > modified and the functionality of smc_switch_to_fallback() is
>> > encapsulated in the __smc_switch_to_fallback() function.

