Message-ID: <0ed07dc7-b303-4577-8c07-06fc536ab1ca@linux.dev>
Date: Sat, 16 May 2026 21:31:59 -0700
X-Mailing-List: netdev@vger.kernel.org
Subject: Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
To: Kuniyuki Iwashima, "yanjun.zhu@linux.dev"
Cc: Edward Adam Davis, akpm@linux-foundation.org, arjan@linux.intel.com,
 davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
 hdanton@sina.com, horms@kernel.org, jgg@ziepe.ca, kuba@kernel.org,
 leon@kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
 netdev@vger.kernel.org, pabeni@redhat.com,
 syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com,
 syzkaller-bugs@googlegroups.com, zyjzyj2000@gmail.com
References: <13bc2e56-ea61-4cde-896f-c10636b29e9d@linux.dev>
 <1dbc6a9d-4933-4123-90d2-a2735f9d8f58@linux.dev>
From: Zhu Yanjun
In-Reply-To: <1dbc6a9d-4933-4123-90d2-a2735f9d8f58@linux.dev>

On 2026/5/16 20:27, Zhu Yanjun wrote:
>
> On 2026/5/16 19:15, Kuniyuki Iwashima wrote:
>> On Sat, May 16, 2026 at 4:40 PM Yanjun.Zhu wrote:
>>>
>>> On 5/16/26 7:00 AM, Edward Adam Davis wrote:
>>>> We must serialize calls to rxe_net_del() or risk a crash as syzbot
>>>> reported:
>>>>
>>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>>> Call Trace:
>>>>    udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>>    rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>>    rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>>    rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>>    rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>>
>>>> Jason Gunthorpe suggested placing the lock within rxe to protect its racy
>>>> implementation of rxe_net_del(), which looks like it is possibly also
>>>> triggered by NETDEV_UNREGISTER.
>>>>
>>>> The patch addressing this issue in nldev_dellink() has already been
>>>> applied (0b28000b64f4); however, since the fix has now been relocated
>>>> to rxe, the corresponding remedial code in nldev has been removed.
>>>>
>>>> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
>>>> Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
>>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>>> Signed-off-by: Edward Adam Davis
>>>> ---
>>>> v1 -> v2: serialize calls to rxe net del
>>>>
>>>>    drivers/infiniband/core/nldev.c     | 4 ----
>>>>    drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
>>>>    2 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
>>>> index 3cb3cb7629fe..96c745d5bac4 100644
>>>> --- a/drivers/infiniband/core/nldev.c
>>>> +++ b/drivers/infiniband/core/nldev.c
>>>> @@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>        return err;
>>>>    }
>>>>
>>>> -static DEFINE_MUTEX(nldev_dellink_mutex);
>>>> -
>>>>    static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>                          struct netlink_ext_ack *extack)
>>>>    {
>>>> @@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>>>>         * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>>>>         */
>>>>        if (device->link_ops && device->link_ops->dellink) {
>>>> -             mutex_lock(&nldev_dellink_mutex);
>>>>                err = device->link_ops->dellink(device);
>>>> -             mutex_unlock(&nldev_dellink_mutex);
>>>>                if (err)
>>>>                        return err;
>>>>        }
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> index 50a2cb5405e2..92847e955ca2 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> @@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
>>>>        }
>>>>    }
>>>>
>>> I read this commit carefully. There are two paths that can invoke
>>> rxe_net_del().
>>>
>>> One is through the rdma link del xxx command, while the other is through
>>> the netdevice notification chain.
>>>
>>> In the netdevice notification chain path, rtnl_lock is already held, and
>>> rxe_net_del() is called under that lock.
>>>
>>> However, in the rdma link del xxx path, no rtnl_lock is taken.
>>>
>>> Because of this, I would like to use the existing rtnl_lock to serialize
>>> calls to rxe_net_del().
>> -1 for this.
>>
>> It's a global mutex and heavily contended because many
>> components use it without much care.  We have been working
>> for years to reduce the RTNL pressure by converting such
>> users to a dedicated lock or per-netns RTNL mutex.
>>
>> RTNL is not needed here at all, so please use a dedicated lock.
>
> Thanks a lot for your review. I think the following commit can fix this
> problem.
>
> Please review.

The root cause is clear. If no one disagrees with this commit, I will
send out the official patch.

In the latest revision, I will move the mutex lock into the network
namespace. I think we have discussed this problem thoroughly, and we all
understand the root cause now.
Zhu Yanjun

>
> From 80525f5b7fb0af18b9759cbde0237aabb76158cc Mon Sep 17 00:00:00 2001
>
> From: Zhu Yanjun
> Date: Sat, 16 May 2026 22:27:35 +0200
> Subject: [PATCH 1/1] RDMA/rxe: Fix Use-After-Free problem in rxe_net_del
>
> syzbot reported a general protection fault (KASAN: null-ptr-deref) in
> kernel_sock_shutdown() called during the software RoCE (rxe) link
> deletion path (rxe_dellink -> rxe_net_del).
>
> The root cause is a TOCTOU (Time-of-Check to Time-of-Use) race condition
> in rxe_net_del(). Previously, the function fetched the socket pointer
> via rxe_ns_pernet_sk4/6() outside the critical section, and then
> acquired the lock to release it via rxe_sock_put().
>
> In a highly concurrent teardown environment, another thread could close
> and clear the pernet socket after it was fetched but before the lock
> was acquired. This causes rxe_sock_put() to operate on a dangling or
> already cleared socket pointer, leading to a NULL pointer dereference
> when kernel_sock_shutdown() attempts to access sock->sk.
>
> Fix this by introducing a dedicated, per-device mutex 'release_lock'
> and extending its scope. The socket pointers are now fetched, checked,
> and released entirely within the same locked critical section. This
> ensures the atomicity of the socket lookup and teardown sequence.
>
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
> Signed-off-by: Zhu Yanjun
> ---
>  drivers/infiniband/sw/rxe/rxe.c       | 2 ++
>  drivers/infiniband/sw/rxe/rxe_net.c   | 4 ++++
>  drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>  3 files changed, 7 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index b0714f9abe3d..46967ecdaf7d 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -34,6 +34,7 @@ void rxe_dealloc(struct ib_device *ib_dev)
>         WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));
>
>         mutex_destroy(&rxe->usdev_lock);
> +       mutex_destroy(&rxe->release_lock);
>  }
>
>  static const struct ib_device_ops rxe_ib_dev_odp_ops = {
> @@ -186,6 +187,7 @@ static void rxe_init(struct rxe_dev *rxe, struct net_device *ndev)
>         rxe->mcg_tree = RB_ROOT;
>
>         mutex_init(&rxe->usdev_lock);
> +       mutex_init(&rxe->release_lock);
>  }
>
>  void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 50a2cb5405e2..c3b188538540 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -655,6 +655,8 @@ void rxe_net_del(struct ib_device *dev)
>
>         net = dev_net(ndev);
>
> +       mutex_lock(&rxe->release_lock);
> +
>         sk = rxe_ns_pernet_sk4(net);
>         if (sk)
>                 rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
> @@ -663,6 +665,8 @@ void rxe_net_del(struct ib_device *dev)
>         if (sk)
>                 rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>
> +       mutex_unlock(&rxe->release_lock);
> +
>         dev_put(ndev);
>  }
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index d92f80d16f78..3f54aa0a4356 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -422,6 +422,7 @@ struct rxe_dev {
>         int                     max_ucontext;
>         int                     max_inline_data;
>         struct mutex            usdev_lock;
> +       struct mutex            release_lock;
>
>         char                    raw_gid[ETH_ALEN];
>
> --
> 2.43.0
>
>>
>>> My proposed commit is shown below. I am not sure whether it fully
>>> resolves the problem.
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>>> index b0714f9abe3d..84266dc416c4 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>>> @@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>>>
>>>    static int rxe_dellink(struct ib_device *dev)
>>>    {
>>> +       rtnl_lock();
>>>           rxe_net_del(dev);
>>> +       rtnl_unlock();
>>>
>>>           return 0;
>>>    }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>> index 50a2cb5405e2..ac53ea73996d 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>> @@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
>>>           struct sock *sk;
>>>           struct net *net;
>>>
>>> +       ASSERT_RTNL();
>>> +
>>>           ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>           if (!ndev)
>>>                   return;
>>>
>>> Zhu Yanjun
>>>
>>>> +static DEFINE_MUTEX(rxe_net_del_mutex);
>>>> +
>>>>    void rxe_net_del(struct ib_device *dev)
>>>>    {
>>>>        struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
>>>> @@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
>>>>        struct sock *sk;
>>>>        struct net *net;
>>>>
>>>> +     mutex_lock(&rxe_net_del_mutex);
>>>>        ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
>>>>        if (!ndev)
>>>> -             return;
>>>> +             goto out;
>>>>
>>>>        net = dev_net(ndev);
>>>>
>>>> @@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
>>>>                rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
>>>>
>>>>        dev_put(ndev);
>>>> +out:
>>>> +     mutex_unlock(&rxe_net_del_mutex);
>>>>    }
>>>>
>>>>    static void rxe_port_event(struct rxe_dev *rxe,
>
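
For illustration only, the locking pattern that the commit message above describes (fetch, check and release the socket inside one critical section) can be sketched in user space. This is a rough sketch, not rxe code: it assumes pthreads in place of the kernel mutex, and struct fake_sock, pernet_sk, release_count, teardown() and run_teardown_race() are hypothetical names invented for the example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-namespace socket; not a kernel type. */
struct fake_sock {
	int id;
};

enum { MAX_THREADS = 64 };

/* Shared slot, playing the role of the pernet sk4/sk6 pointer. */
static struct fake_sock *pernet_sk;
static pthread_mutex_t release_lock = PTHREAD_MUTEX_INITIALIZER;
/* Number of releases actually performed; protected by release_lock. */
static int release_count;

/* One teardown caller: fetch, check and release in one critical section. */
static void *teardown(void *arg)
{
	struct fake_sock *sk;

	(void)arg;
	pthread_mutex_lock(&release_lock);
	sk = pernet_sk;			/* the lookup happens under the lock */
	if (sk) {
		pernet_sk = NULL;	/* clear the slot before releasing */
		free(sk);		/* only one caller can get here */
		release_count++;
	}
	pthread_mutex_unlock(&release_lock);
	return NULL;
}

/*
 * Run nthreads concurrent teardown callers against a single socket.
 * Returns how many callers actually released it, or -1 on setup failure.
 */
int run_teardown_race(int nthreads)
{
	pthread_t tids[MAX_THREADS];
	int started = 0;
	int i;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;

	release_count = 0;
	pernet_sk = malloc(sizeof(*pernet_sk));
	if (!pernet_sk)
		return -1;
	pernet_sk->id = 1;

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, teardown, NULL) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	/* Covers the case where no thread started; free(NULL) is a no-op. */
	free(pernet_sk);
	pernet_sk = NULL;

	return started == nthreads ? release_count : -1;
}
```

Calling run_teardown_race(8) from a small main() (built with -pthread) should report a single release however the threads interleave. If the lookup were moved above pthread_mutex_lock(), two callers could both see a non-NULL pointer, which is the window described in the commit message.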