From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>,
"yanjun.zhu@linux.dev" <yanjun.zhu@linux.dev>
Cc: Edward Adam Davis <eadavis@qq.com>,
akpm@linux-foundation.org, arjan@linux.intel.com,
davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
hdanton@sina.com, horms@kernel.org, jgg@ziepe.ca,
kuba@kernel.org, leon@kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
pabeni@redhat.com,
syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com,
syzkaller-bugs@googlegroups.com, zyjzyj2000@gmail.com
Subject: Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
Date: Sat, 16 May 2026 21:31:59 -0700
Message-ID: <0ed07dc7-b303-4577-8c07-06fc536ab1ca@linux.dev>
In-Reply-To: <1dbc6a9d-4933-4123-90d2-a2735f9d8f58@linux.dev>
On 2026/5/16 20:27, Zhu Yanjun wrote:
>
> On 2026/5/16 19:15, Kuniyuki Iwashima wrote:
>> On Sat, May 16, 2026 at 4:40 PM Yanjun.Zhu <yanjun.zhu@linux.dev> wrote:
>>>
>>> On 5/16/26 7:00 AM, Edward Adam Davis wrote:
>>>> We must serialize calls to rxe_net_del() or risk a crash, as syzbot
>>>> reported:
>>>>
>>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>>> Call Trace:
>>>> udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>> rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>> rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>> rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>> rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>>
>>>> Jason Gunthorpe suggested placing the lock within rxe to protect its racy
>>>> implementation of rxe_net_del(), which looks like it can also be
>>>> triggered by NETDEV_UNREGISTER.
>>>>
>>>> The patch addressing this issue in nldev_dellink() has already been
>>>> applied (0b28000b64f4); since the fix now lives in rxe, this patch
>>>> removes the corresponding code from nldev.
>>>>
>>>> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
>>>> Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
>>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>>>> ---
>>>> v1 -> v2: serialize calls to rxe net del
>>>>
>>>> drivers/infiniband/core/nldev.c | 4 ----
>>>> drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
>>>> 2 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
>>>> index 3cb3cb7629fe..96c745d5bac4 100644
>>>> --- a/drivers/infiniband/core/nldev.c
>>>> +++ b/drivers/infiniband/core/nldev.c
>>>> @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>  	return err;
>>>>  }
>>>>
>>>> -static DEFINE_MUTEX(nldev_dellink_mutex);
>>>> -
>>>>  static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>  			 struct netlink_ext_ack *extack)
>>>>  {
>>>> @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>  	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>>>>  	 */
>>>>  	if (device->link_ops && device->link_ops->dellink) {
>>>> -		mutex_lock(&nldev_dellink_mutex);
>>>>  		err = device->link_ops->dellink(device);
>>>> -		mutex_unlock(&nldev_dellink_mutex);
>>>>  		if (err)
>>>>  			return err;
>>>>  	}
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> index 50a2cb5405e2..92847e955ca2 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
>>>>  	}
>>>>  }
>>>>
>>> I read this commit carefully. There are two paths that can invoke
>>> rxe_net_del().
>>>
>>> One is the rdma link del xxx command; the other is the netdevice
>>> notifier chain.
>>>
>>> In the netdevice notifier path, rtnl_lock is already held, and
>>> rxe_net_del() is called under that lock.
>>>
>>> However, the rdma link del xxx path does not take rtnl_lock.
>>>
>>> Because of this, I would like to use the existing rtnl_lock to serialize
>>> calls to rxe_net_del().
>> -1 for this.
>>
>> RTNL is a global mutex and is heavily contended because many
>> components use it without much care. We have been working for
>> years to reduce RTNL pressure by converting such users to a
>> dedicated lock or the per-netns RTNL mutex.
>>
>> RTNL is not needed here at all, so please use a dedicated lock.
>
> Thanks a lot for your review. I think the following patch should fix
> this problem.
>
> Please review.
The root cause is clear. If there are no objections to this patch, I will
send it out formally.
In the next revision, I will make the mutex per network namespace.
I think we have discussed this problem thoroughly, and we all understand
the root cause now.
Zhu Yanjun
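One possible reading of making the mutex per network namespace, sketched as a userspace model rather than kernel code: the tunnel sockets are looked up by net (rxe_ns_pernet_sk4(net)), so they appear to be shared by every rxe device in that namespace, while a per-device release_lock only serializes deletes of the same device. All names below (struct model_net, struct model_dev, del_lock_for(), model_net_del(), delete_two_devices_same_net()) are hypothetical, and pthread mutexes stand in for kernel mutexes; this illustrates the lock placement only and is not the planned patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of per-netns state: the shared tunnel socket
 * slot plus a lock that lives next to it. */
struct model_net {
	pthread_mutex_t del_lock; /* per-netns placement */
	int sk4_refs;             /* devices still using the shared socket */
	int sk4_present;          /* 1 while the shared socket exists */
	int releases;             /* times the socket was torn down */
};

/* Hypothetical model of an rxe device living in @net. */
struct model_dev {
	struct model_net *net;
	pthread_mutex_t release_lock; /* per-device placement */
};

/* Returns the mutex a delete on @dev would take under each placement. */
static pthread_mutex_t *del_lock_for(struct model_dev *dev, int per_netns)
{
	return per_netns ? &dev->net->del_lock : &dev->release_lock;
}

/* Delete path using the per-netns lock: every device in the same
 * namespace contends on one mutex, so check-and-put cannot interleave. */
static void model_net_del(struct model_dev *dev)
{
	pthread_mutex_t *lock = del_lock_for(dev, 1);
	struct model_net *net = dev->net;

	pthread_mutex_lock(lock);
	if (net->sk4_present && --net->sk4_refs == 0) {
		net->sk4_present = 0;
		net->releases++;
	}
	assert(net->sk4_refs >= 0); /* never dropped below zero */
	pthread_mutex_unlock(lock);
}

static void *model_del_worker(void *arg)
{
	model_net_del(arg);
	return NULL;
}

/* Concurrently deletes two devices sharing one namespace; returns
 * how many times the shared socket was torn down. */
static int delete_two_devices_same_net(void)
{
	struct model_net net = { .sk4_refs = 2, .sk4_present = 1 };
	struct model_dev a = { .net = &net }, b = { .net = &net };
	pthread_t ta, tb;

	pthread_mutex_init(&net.del_lock, NULL);
	pthread_mutex_init(&a.release_lock, NULL);
	pthread_mutex_init(&b.release_lock, NULL);

	pthread_create(&ta, NULL, model_del_worker, &a);
	pthread_create(&tb, NULL, model_del_worker, &b);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	pthread_mutex_destroy(&b.release_lock);
	pthread_mutex_destroy(&a.release_lock);
	pthread_mutex_destroy(&net.del_lock);
	return net.releases;
}
```

For two devices a and b in the same namespace, del_lock_for(&a, 0) and del_lock_for(&b, 0) are different mutexes, so the per-device placement gives no mutual exclusion on the shared slot; del_lock_for(&a, 1) and del_lock_for(&b, 1) are the same mutex, and delete_two_devices_same_net() tears the socket down exactly once.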
>
> From 80525f5b7fb0af18b9759cbde0237aabb76158cc Mon Sep 17 00:00:00 2001
>
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> Date: Sat, 16 May 2026 22:27:35 +0200
> Subject: [PATCH 1/1] RDMA/rxe: Fix Use-After-Free problem in rxe_net_del
>
> syzbot reported a general protection fault (KASAN: null-ptr-deref) in
> kernel_sock_shutdown() called during the software RoCE (rxe) link
> deletion path (rxe_dellink -> rxe_net_del).
>
> The root cause is a TOCTOU (Time-of-Check to Time-of-Use) race condition
> in rxe_net_del(). Previously, the function fetched the socket pointer
> via rxe_ns_pernet_sk4/6() outside the critical section, and then
> acquired the lock to release it via rxe_sock_put().
>
> During concurrent teardown, another thread can close and clear the
> pernet socket after it has been fetched but before the lock is
> acquired. rxe_sock_put() then operates on a dangling or already
> cleared socket pointer, leading to a NULL pointer dereference when
> kernel_sock_shutdown() accesses sock->sk.
>
> Fix this by introducing a dedicated per-device mutex, 'release_lock',
> and holding it across the whole sequence: the socket pointers are now
> fetched, checked, and released within a single critical section. This
> makes the socket lookup and teardown sequence atomic under the lock.
>
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
> drivers/infiniband/sw/rxe/rxe.c | 2 ++
> drivers/infiniband/sw/rxe/rxe_net.c | 4 ++++
> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
> 3 files changed, 7 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index b0714f9abe3d..46967ecdaf7d 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -34,6 +34,7 @@ void rxe_dealloc(struct ib_device *ib_dev)
>  	WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));
>
>  	mutex_destroy(&rxe->usdev_lock);
> +	mutex_destroy(&rxe->release_lock);
>  }
>
>  static const struct ib_device_ops rxe_ib_dev_odp_ops = {
> @@ -186,6 +187,7 @@ static void rxe_init(struct rxe_dev *rxe, struct net_device *ndev)
>  	rxe->mcg_tree = RB_ROOT;
>
>  	mutex_init(&rxe->usdev_lock);
> +	mutex_init(&rxe->release_lock);
>  }
>
>  void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..c3b188538540 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -655,6 +655,8 @@ void rxe_net_del(struct ib_device *dev)
>
>  	net = dev_net(ndev);
>
> +	mutex_lock(&rxe->release_lock);
> +
>  	sk = rxe_ns_pernet_sk4(net);
>  	if (sk)
>  		rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
> @@ -663,6 +665,8 @@ void rxe_net_del(struct ib_device *dev)
>  	if (sk)
>  		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>
> +	mutex_unlock(&rxe->release_lock);
> +
>  	dev_put(ndev);
>  }
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index d92f80d16f78..3f54aa0a4356 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -422,6 +422,7 @@ struct rxe_dev {
>  	int max_ucontext;
>  	int max_inline_data;
>  	struct mutex usdev_lock;
> +	struct mutex release_lock;
>
>  	char raw_gid[ETH_ALEN];
>
> --
> 2.43.0
>
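A minimal userspace model of the lookup-and-put ordering described in the commit message, as a sketch under stated assumptions rather than rxe code: struct fake_sock, struct fake_pernet, fake_net_del(), run_concurrent_deletes() and the other names are hypothetical, a pthread mutex stands in for the kernel mutex, and the refcount only loosely mirrors what rxe_sock_put() is described as doing. What it shows is that the slot is read, checked, and cleared inside one critical section, so a concurrent caller sees either the live socket or NULL, never a pointer that was torn down in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared per-netns UDP tunnel socket. */
struct fake_sock {
	int refcnt;   /* one reference per user of the socket */
	int released; /* set once the tunnel has been torn down */
};

/* Hypothetical stand-in for the per-netns slot and its lock. */
struct fake_pernet {
	pthread_mutex_t lock;
	struct fake_sock *sk4;
	int releases; /* how many times a tunnel teardown ran */
};

/* Drop one reference; the last one tears the tunnel down and clears
 * the slot. Must be called with pn->lock held. */
static void fake_sock_put_locked(struct fake_pernet *pn, struct fake_sock *sk)
{
	assert(sk != NULL);    /* a stale NULL here models the reported crash */
	assert(!sk->released); /* a second teardown would be a use-after-free */
	if (--sk->refcnt == 0) {
		sk->released = 1;
		pn->releases++;
		pn->sk4 = NULL;
	}
}

/* Fixed ordering: fetch, check and put in one critical section.
 * A racy variant that reads pn->sk4 before taking the lock lets a
 * second caller act on a socket the first has already torn down. */
static void fake_net_del(struct fake_pernet *pn)
{
	pthread_mutex_lock(&pn->lock);
	struct fake_sock *sk = pn->sk4;
	if (sk)
		fake_sock_put_locked(pn, sk);
	pthread_mutex_unlock(&pn->lock);
}

static void *fake_del_worker(void *arg)
{
	fake_net_del(arg);
	return NULL;
}

/* Runs @ncallers concurrent deletes against a socket holding @nrefs
 * references; returns how many teardowns happened. */
static int run_concurrent_deletes(int nrefs, int ncallers)
{
	struct fake_sock sk = { .refcnt = nrefs, .released = 0 };
	struct fake_pernet pn = { .sk4 = &sk, .releases = 0 };
	pthread_t tids[16];

	assert(ncallers > 0 && ncallers <= 16);
	pthread_mutex_init(&pn.lock, NULL);
	for (int i = 0; i < ncallers; i++)
		pthread_create(&tids[i], NULL, fake_del_worker, &pn);
	for (int i = 0; i < ncallers; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&pn.lock);
	return pn.releases;
}
```

With one reference and several concurrent callers, exactly one teardown runs and the remaining callers find an empty slot. If the read of pn->sk4 is moved above pthread_mutex_lock(), the assertions in fake_sock_put_locked() can fire under contention.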
>>
>>> My proposed commit is shown below. I am not sure whether it fully
>>> resolves the problem.
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>>> index b0714f9abe3d..84266dc416c4 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>>> @@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>>>
>>>  static int rxe_dellink(struct ib_device *dev)
>>>  {
>>> +	rtnl_lock();
>>>  	rxe_net_del(dev);
>>> +	rtnl_unlock();
>>>
>>>  	return 0;
>>>  }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>> index 50a2cb5405e2..ac53ea73996d 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>> @@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
>>>  	struct sock *sk;
>>>  	struct net *net;
>>>
>>> +	ASSERT_RTNL();
>>> +
>>>  	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>  	if (!ndev)
>>>  		return;
>>>
>>> Zhu Yanjun
>>>
>>>> +static DEFINE_MUTEX(rxe_net_del_mutex);
>>>> +
>>>>  void rxe_net_del(struct ib_device *dev)
>>>>  {
>>>>  	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
>>>> @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
>>>>  	struct sock *sk;
>>>>  	struct net *net;
>>>>
>>>> +	mutex_lock(&rxe_net_del_mutex);
>>>>  	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>>  	if (!ndev)
>>>> -		return;
>>>> +		goto out;
>>>>
>>>>  	net = dev_net(ndev);
>>>>
>>>> @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
>>>>  		rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>>>>
>>>>  	dev_put(ndev);
>>>> +out:
>>>> +	mutex_unlock(&rxe_net_del_mutex);
>>>>  }
>>>>
>>>>  static void rxe_port_event(struct rxe_dev *rxe,
>
Thread overview: 32+ messages
2026-04-23 15:01 [syzbot] [net?] general protection fault in kernel_sock_shutdown (4) syzbot
2026-04-23 17:41 ` Jakub Kicinski
2026-04-24 16:47 ` Arjan van de Ven
2026-04-24 18:08 ` Arjan van de Ven
2026-05-06 13:48 ` [syzbot] [rdma] " syzbot
2026-05-06 14:28 ` Zhu Yanjun
2026-05-06 15:19 ` Kuniyuki Iwashima
2026-05-07 3:52 ` syzbot
2026-05-07 12:50 ` [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink() Edward Adam Davis
2026-05-07 13:25 ` Zhu Yanjun
2026-05-07 13:40 ` Edward Adam Davis
2026-05-07 14:11 ` Zhu Yanjun
2026-05-13 18:17 ` Leon Romanovsky
2026-05-13 23:46 ` Jason Gunthorpe
2026-05-14 7:31 ` Edward Adam Davis
2026-05-14 11:50 ` Jason Gunthorpe
2026-05-14 13:58 ` David Ahern
2026-05-14 14:14 ` Jason Gunthorpe
2026-05-14 14:26 ` David Ahern
2026-05-14 15:46 ` Zhu Yanjun
2026-05-16 12:40 ` Edward Adam Davis
2026-05-16 14:00 ` [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del() Edward Adam Davis
2026-05-16 14:31 ` Zhu Yanjun
2026-05-16 23:40 ` Yanjun.Zhu
2026-05-17 1:56 ` Edward Adam Davis
2026-05-17 2:15 ` Kuniyuki Iwashima
2026-05-17 3:27 ` Zhu Yanjun
2026-05-17 4:31 ` Zhu Yanjun [this message]
2026-05-14 5:15 ` [syzbot] [rdma] general protection fault in kernel_sock_shutdown (4) Zhu Yanjun
2026-05-16 5:44 ` Zhu Yanjun
2026-05-16 7:02 ` syzbot
2026-05-16 18:40 ` Zhu Yanjun