From: Ido Schimmel <idosch@idosch.org>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Roopa Prabhu <roopa@nvidia.com>,
Nikolay Aleksandrov <razor@blackwall.org>,
Willem de Bruijn <willemb@google.com>,
Simon Horman <horms@kernel.org>,
Kuniyuki Iwashima <kuni1840@gmail.com>,
netdev@vger.kernel.org, bridge@lists.linux.dev,
syzkaller <syzkaller@googlegroups.com>,
yan kang <kangyan91@outlook.com>,
yue sun <samsun1006219@gmail.com>
Subject: Re: [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
Date: Sun, 16 Mar 2025 18:16:00 +0200 [thread overview]
Message-ID: <Z9b5QAKtTSCv3DVZ@shredder> (raw)
In-Reply-To: <20250314010501.75798-1-kuniyu@amazon.com>
On Thu, Mar 13, 2025 at 05:59:55PM -0700, Kuniyuki Iwashima wrote:
> SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
> br_ioctl_call(), which causes unnecessary RTNL dance and the splat
> below [0] under RTNL pressure.
>
> Let's say Thread A is trying to detach a device from a bridge and
> Thread B is trying to remove the bridge.
>
> In dev_ioctl(), Thread A bumps the bridge device's refcnt by
> netdev_hold() and releases RTNL because the following br_ioctl_call()
> also re-acquires RTNL.
>
> In the race window, Thread B could acquire RTNL and try to remove
> the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
> and wait for netdev_put() by Thread A.
>
> Thread A, however, must hold RTNL twice after the unlock in dev_ifsioc(),
> which may take long under RTNL pressure, resulting in the splat by
> Thread B.
>
> Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR)
> ---------------------- ----------------------
> sock_ioctl sock_ioctl
> `- sock_do_ioctl `- br_ioctl_call
> `- dev_ioctl `- br_ioctl_stub
> |- rtnl_lock |
> |- dev_ifsioc '
> ' |- dev = __dev_get_by_name(...)
> |- netdev_hold(dev, ...) .
> / |- rtnl_unlock ------. |
> | |- br_ioctl_call `---> |- rtnl_lock
> Race | | `- br_ioctl_stub |- br_del_bridge
> Window | | | |- dev = __dev_get_by_name(...)
> | | | May take long | `- br_dev_delete(dev, ...)
> | | | under RTNL pressure | `- unregister_netdevice_queue(dev, ...)
> | | | | `- rtnl_unlock
> | | |- rtnl_lock <--| `- netdev_run_todo
> | | |- ... | `- netdev_run_todo
> | | `- rtnl_unlock | |- __rtnl_unlock
> | | | |- netdev_wait_allrefs_any
> \ |- rtnl_lock <--------' |
> |- netdev_put(dev, ...) <----------------' Wait refcnt decrement
> and log splat below
Isn't the race window a bit smaller? dev_ifsioc() does netdev_put()
before rtnl_lock().
>
> To avoid blocking SIOCBRDELBR unnecessarily, let's not call
> dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
>
> In the dev_ioctl() path, we do the following:
>
> 1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
> 2. Check CAP_NET_ADMIN in dev_ioctl()
> 3. Call dev_load() in dev_ioctl()
> 4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
>
> 3. can be done by request_module() in br_ioctl_call(), so we move
> 1., 2., and 4. to br_ioctl_stub().
>
> Note that 2. is also checked later in add_del_if(), but it's better
> performed before RTNL.
>
> SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
> the pre-git era, and there seems to be no specific reason to process
> them there.
I couldn't find an explanation as well.
Doesn't seem like we have any tests for the IOCTL path, but FWIW I
verified that basic operations using brctl still work after this patch.
>
> [0]:
> unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
> ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at
> __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
> netdev_hold include/linux/netdevice.h:4311 [inline]
> dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
> dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
> sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
> sock_ioctl+0x23a/0x6c0 net/socket.c:1318
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:906 [inline]
> __se_sys_ioctl fs/ioctl.c:892 [inline]
> __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Reported-by: yan kang <kangyan91@outlook.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Thanks for the fix and the detailed commit message. One nit below.
> ---
> include/linux/if_bridge.h | 6 ++----
> net/bridge/br_ioctl.c | 39 ++++++++++++++++++++++++++++++++++++---
> net/bridge/br_private.h | 3 +--
> net/core/dev_ioctl.c | 19 -------------------
> net/socket.c | 19 +++++++++----------
> 5 files changed, 48 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 3ff96ae31bf6..c5fe3b2a53e8 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -65,11 +65,9 @@ struct br_ip_list {
> #define BR_DEFAULT_AGEING_TIME (300 * HZ)
>
> struct net_bridge;
> -void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
> - unsigned int cmd, struct ifreq *ifr,
> +void brioctl_set(int (*hook)(struct net *net, unsigned int cmd,
> void __user *uarg));
> -int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
> - struct ifreq *ifr, void __user *uarg);
> +int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg);
>
> #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
> int br_multicast_list_adjacent(struct net_device *dev,
> diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
> index f213ed108361..b5a607f6da4e 100644
> --- a/net/bridge/br_ioctl.c
> +++ b/net/bridge/br_ioctl.c
> @@ -394,10 +394,29 @@ static int old_deviceless(struct net *net, void __user *data)
> return -EOPNOTSUPP;
> }
>
> -int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
> - struct ifreq *ifr, void __user *uarg)
> +int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
> {
> int ret = -EOPNOTSUPP;
> + struct ifreq ifr;
> +
> + switch (cmd) {
> + case SIOCBRADDIF:
> + case SIOCBRDELIF: {
Why not a simple if statement? Unlikely that we will add more commands
to this switch statement.
> + void __user *data;
> + char *colon;
> +
> + if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> + return -EPERM;
> +
> + if (get_user_ifreq(&ifr, &data, uarg))
> + return -EFAULT;
> +
> + ifr.ifr_name[IFNAMSIZ - 1] = 0;
> + colon = strchr(ifr.ifr_name, ':');
> + if (colon)
> + *colon = 0;
> + }
> + }
>
> rtnl_lock();
next prev parent reply other threads:[~2025-03-16 16:16 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-14 0:59 [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF Kuniyuki Iwashima
2025-03-16 16:16 ` Ido Schimmel [this message]
2025-03-16 19:01 ` Kuniyuki Iwashima
2025-03-16 16:41 ` Stanislav Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z9b5QAKtTSCv3DVZ@shredder \
--to=idosch@idosch.org \
--cc=andrew+netdev@lunn.ch \
--cc=bridge@lists.linux.dev \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kangyan91@outlook.com \
--cc=kuba@kernel.org \
--cc=kuni1840@gmail.com \
--cc=kuniyu@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=razor@blackwall.org \
--cc=roopa@nvidia.com \
--cc=samsun1006219@gmail.com \
--cc=syzkaller@googlegroups.com \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).