* [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
@ 2025-03-14 0:59 Kuniyuki Iwashima
From: Kuniyuki Iwashima @ 2025-03-14 0:59 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Willem de Bruijn,
Simon Horman
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, bridge, syzkaller,
yan kang, yue sun

SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
br_ioctl_call(), which causes an unnecessary RTNL dance and the splat
below [0] under RTNL pressure.

Let's say Thread A is trying to detach a device from a bridge and
Thread B is trying to remove the bridge.

In dev_ioctl(), Thread A bumps the bridge device's refcnt by
netdev_hold() and releases RTNL because the following br_ioctl_call()
also re-acquires RTNL.

In the race window, Thread B could acquire RTNL and try to remove
the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
and wait for netdev_put() by Thread A.

Thread A, however, must hold RTNL twice after the unlock in dev_ifsioc(),
which may take a long time under RTNL pressure, resulting in the splat
by Thread B.

  Thread A (SIOCBRDELIF)             Thread B (SIOCBRDELBR)
  ----------------------             ----------------------
  sock_ioctl                         sock_ioctl
  `- sock_do_ioctl                   `- br_ioctl_call
     `- dev_ioctl                       `- br_ioctl_stub
        |- rtnl_lock                       |
        |- dev_ifsioc                      '
        '  |- dev = __dev_get_by_name(...)
           |- netdev_hold(dev, ...)        .
       /   |- rtnl_unlock ------.          |
       |   |- br_ioctl_call     `--->      |- rtnl_lock
  Race |   |  `- br_ioctl_stub             |- br_del_bridge
  Window   |     |                         |  |- dev = __dev_get_by_name(...)
       |   |     |  May take long          |  `- br_dev_delete(dev, ...)
       |   |     |  under RTNL pressure    |     `- unregister_netdevice_queue(dev, ...)
       |   |     |               |         `- rtnl_unlock
       |   |     |- rtnl_lock <--|            `- netdev_run_todo
       |   |     |- ...          |               `- netdev_run_todo
       |   |     `- rtnl_unlock  |                  |- __rtnl_unlock
       |   |                     |                  |- netdev_wait_allrefs_any
       \   |- rtnl_lock <--------'                  |
           |- netdev_put(dev, ...) <----------------'  Wait refcnt decrement
                                                       and log splat below

To avoid blocking SIOCBRDELBR unnecessarily, let's not call
dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.

In the dev_ioctl() path, we do the following:

1. Copy struct ifreq by get_user_ifreq() in sock_do_ioctl()
2. Check CAP_NET_ADMIN in dev_ioctl()
3. Call dev_load() in dev_ioctl()
4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()

Step 3 can be done by request_module() in br_ioctl_call(), so we move
steps 1, 2, and 4 to br_ioctl_stub().

Note that step 2 is also checked later in add_del_if(), but it is
better performed before taking RTNL.

SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
the pre-git era, and there seems to be no specific reason to process
them there.

[0]:
unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at
__netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
netdev_hold include/linux/netdevice.h:4311 [inline]
dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
sock_ioctl+0x23a/0x6c0 net/socket.c:1318
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: yan kang <kangyan91@outlook.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/if_bridge.h |  6 ++----
 net/bridge/br_ioctl.c     | 39 ++++++++++++++++++++++++++++++++++++---
 net/bridge/br_private.h   |  3 +--
 net/core/dev_ioctl.c      | 19 -------------------
 net/socket.c              | 19 +++++++++----------
 5 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 3ff96ae31bf6..c5fe3b2a53e8 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -65,11 +65,9 @@ struct br_ip_list {
 #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
 
 struct net_bridge;
-void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
-			     unsigned int cmd, struct ifreq *ifr,
+void brioctl_set(int (*hook)(struct net *net, unsigned int cmd,
 			     void __user *uarg));
-int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
-		  struct ifreq *ifr, void __user *uarg);
+int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg);
 
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
 int br_multicast_list_adjacent(struct net_device *dev,
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index f213ed108361..b5a607f6da4e 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -394,10 +394,29 @@ static int old_deviceless(struct net *net, void __user *data)
 	return -EOPNOTSUPP;
 }
 
-int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
-		  struct ifreq *ifr, void __user *uarg)
+int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
 {
 	int ret = -EOPNOTSUPP;
+	struct ifreq ifr;
+
+	switch (cmd) {
+	case SIOCBRADDIF:
+	case SIOCBRDELIF: {
+		void __user *data;
+		char *colon;
+
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+			return -EPERM;
+
+		if (get_user_ifreq(&ifr, &data, uarg))
+			return -EFAULT;
+
+		ifr.ifr_name[IFNAMSIZ - 1] = 0;
+		colon = strchr(ifr.ifr_name, ':');
+		if (colon)
+			*colon = 0;
+	}
+	}
 
 	rtnl_lock();
 
@@ -430,9 +449,23 @@ int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
 		break;
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
-		ret = add_del_if(br, ifr->ifr_ifindex, cmd == SIOCBRADDIF);
+	{
+		struct net_device *dev;
+
+		dev = __dev_get_by_name(net, ifr.ifr_name);
+		if (!dev || !netif_device_present(dev)) {
+			ret = -ENODEV;
+			break;
+		}
+		if (!netif_is_bridge_master(dev)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		ret = add_del_if(netdev_priv(dev), ifr.ifr_ifindex, cmd == SIOCBRADDIF);
 		break;
 	}
+	}
 
 	rtnl_unlock();
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1054b8a88edc..d5b3c5936a79 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -949,8 +949,7 @@ br_port_get_check_rtnl(const struct net_device *dev)
 /* br_ioctl.c */
 int br_dev_siocdevprivate(struct net_device *dev, struct ifreq *rq,
 			  void __user *data, int cmd);
-int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
-		  struct ifreq *ifr, void __user *uarg);
+int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg);
 
 /* br_multicast.c */
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 4c2098ac9d72..57f79f8e8466 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -551,7 +551,6 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, void __user *data,
 	int err;
 	struct net_device *dev = __dev_get_by_name(net, ifr->ifr_name);
 	const struct net_device_ops *ops;
-	netdevice_tracker dev_tracker;
 
 	if (!dev)
 		return -ENODEV;
@@ -614,22 +613,6 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, void __user *data,
 	case SIOCWANDEV:
 		return dev_siocwandev(dev, &ifr->ifr_settings);
 
-	case SIOCBRADDIF:
-	case SIOCBRDELIF:
-		if (!netif_device_present(dev))
-			return -ENODEV;
-		if (!netif_is_bridge_master(dev))
-			return -EOPNOTSUPP;
-
-		netdev_hold(dev, &dev_tracker, GFP_KERNEL);
-		rtnl_net_unlock(net);
-
-		err = br_ioctl_call(net, netdev_priv(dev), cmd, ifr, NULL);
-
-		netdev_put(dev, &dev_tracker);
-		rtnl_net_lock(net);
-		return err;
-
 	case SIOCDEVPRIVATE ... SIOCDEVPRIVATE + 15:
 		return dev_siocdevprivate(dev, ifr, data, cmd);
 
@@ -812,8 +795,6 @@ int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr,
 	case SIOCBONDRELEASE:
 	case SIOCBONDSETHWADDR:
 	case SIOCBONDCHANGEACTIVE:
-	case SIOCBRADDIF:
-	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
diff --git a/net/socket.c b/net/socket.c
index 28bae5a94234..38227d00d198 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1145,12 +1145,10 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
  */
 
 static DEFINE_MUTEX(br_ioctl_mutex);
-static int (*br_ioctl_hook)(struct net *net, struct net_bridge *br,
-			    unsigned int cmd, struct ifreq *ifr,
+static int (*br_ioctl_hook)(struct net *net, unsigned int cmd,
 			    void __user *uarg);
 
-void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
-			     unsigned int cmd, struct ifreq *ifr,
+void brioctl_set(int (*hook)(struct net *net, unsigned int cmd,
 			     void __user *uarg))
 {
 	mutex_lock(&br_ioctl_mutex);
@@ -1159,8 +1157,7 @@ void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br,
 }
 EXPORT_SYMBOL(brioctl_set);
 
-int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
-		  struct ifreq *ifr, void __user *uarg)
+int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg)
 {
 	int err = -ENOPKG;
 
@@ -1169,7 +1166,7 @@ int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd,
 
 	mutex_lock(&br_ioctl_mutex);
 	if (br_ioctl_hook)
-		err = br_ioctl_hook(net, br, cmd, ifr, uarg);
+		err = br_ioctl_hook(net, cmd, uarg);
 	mutex_unlock(&br_ioctl_mutex);
 
 	return err;
@@ -1269,7 +1266,9 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 		case SIOCSIFBR:
 		case SIOCBRADDBR:
 		case SIOCBRDELBR:
-			err = br_ioctl_call(net, NULL, cmd, NULL, argp);
+		case SIOCBRADDIF:
+		case SIOCBRDELIF:
+			err = br_ioctl_call(net, cmd, argp);
 			break;
 		case SIOCGIFVLAN:
 		case SIOCSIFVLAN:
@@ -3429,6 +3428,8 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 	case SIOCGPGRP:
 	case SIOCBRADDBR:
 	case SIOCBRDELBR:
+	case SIOCBRADDIF:
+	case SIOCBRDELIF:
 	case SIOCGIFVLAN:
 	case SIOCSIFVLAN:
 	case SIOCGSKNS:
@@ -3468,8 +3469,6 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 	case SIOCGIFPFLAGS:
 	case SIOCGIFTXQLEN:
 	case SIOCSIFTXQLEN:
-	case SIOCBRADDIF:
-	case SIOCBRDELIF:
 	case SIOCGIFNAME:
 	case SIOCSIFNAME:
 	case SIOCGMIIPHY:
--
2.48.1
* Re: [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
From: Ido Schimmel @ 2025-03-16 16:16 UTC
To: Kuniyuki Iwashima
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Willem de Bruijn,
Simon Horman, Kuniyuki Iwashima, netdev, bridge, syzkaller,
yan kang, yue sun
On Thu, Mar 13, 2025 at 05:59:55PM -0700, Kuniyuki Iwashima wrote:
> SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
> br_ioctl_call(), which causes unnecessary RTNL dance and the splat
> below [0] under RTNL pressure.
>
> Let's say Thread A is trying to detach a device from a bridge and
> Thread B is trying to remove the bridge.
>
> In dev_ioctl(), Thread A bumps the bridge device's refcnt by
> netdev_hold() and releases RTNL because the following br_ioctl_call()
> also re-acquires RTNL.
>
> In the race window, Thread B could acquire RTNL and try to remove
> the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
> and wait for netdev_put() by Thread A.
>
> Thread A, however, must hold RTNL twice after the unlock in dev_ifsioc(),
> which may take long under RTNL pressure, resulting in the splat by
> Thread B.
>
>   Thread A (SIOCBRDELIF)             Thread B (SIOCBRDELBR)
>   ----------------------             ----------------------
>   sock_ioctl                         sock_ioctl
>   `- sock_do_ioctl                   `- br_ioctl_call
>      `- dev_ioctl                       `- br_ioctl_stub
>         |- rtnl_lock                       |
>         |- dev_ifsioc                      '
>         '  |- dev = __dev_get_by_name(...)
>            |- netdev_hold(dev, ...)        .
>        /   |- rtnl_unlock ------.          |
>        |   |- br_ioctl_call     `--->      |- rtnl_lock
>   Race |   |  `- br_ioctl_stub             |- br_del_bridge
>   Window   |     |                         |  |- dev = __dev_get_by_name(...)
>        |   |     |  May take long          |  `- br_dev_delete(dev, ...)
>        |   |     |  under RTNL pressure    |     `- unregister_netdevice_queue(dev, ...)
>        |   |     |               |         `- rtnl_unlock
>        |   |     |- rtnl_lock <--|            `- netdev_run_todo
>        |   |     |- ...          |               `- netdev_run_todo
>        |   |     `- rtnl_unlock  |                  |- __rtnl_unlock
>        |   |                     |                  |- netdev_wait_allrefs_any
>        \   |- rtnl_lock <--------'                  |
>            |- netdev_put(dev, ...) <----------------'  Wait refcnt decrement
>                                                        and log splat below
Isn't the race window a bit smaller? dev_ifsioc() does netdev_put()
before rtnl_lock().
>
> To avoid blocking SIOCBRDELBR unnecessarily, let's not call
> dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
>
> In the dev_ioctl() path, we do the following:
>
> 1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
> 2. Check CAP_NET_ADMIN in dev_ioctl()
> 3. Call dev_load() in dev_ioctl()
> 4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
>
> 3. can be done by request_module() in br_ioctl_call(), so we move
> 1., 2., and 4. to br_ioctl_stub().
>
> Note that 2. is also checked later in add_del_if(), but it's better
> performed before RTNL.
>
> SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
> the pre-git era, and there seems to be no specific reason to process
> them there.
I couldn't find an explanation as well.
Doesn't seem like we have any tests for the IOCTL path, but FWIW I
verified that basic operations using brctl still work after this patch.
>
> [0]:
> unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
> ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at
> __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
> netdev_hold include/linux/netdevice.h:4311 [inline]
> dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
> dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
> sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
> sock_ioctl+0x23a/0x6c0 net/socket.c:1318
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:906 [inline]
> __se_sys_ioctl fs/ioctl.c:892 [inline]
> __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Reported-by: yan kang <kangyan91@outlook.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Thanks for the fix and the detailed commit message. One nit below.
> diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
> index f213ed108361..b5a607f6da4e 100644
> --- a/net/bridge/br_ioctl.c
> +++ b/net/bridge/br_ioctl.c
> @@ -394,10 +394,29 @@ static int old_deviceless(struct net *net, void __user *data)
>  	return -EOPNOTSUPP;
>  }
>
> -int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
> -		  struct ifreq *ifr, void __user *uarg)
> +int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
>  {
>  	int ret = -EOPNOTSUPP;
> +	struct ifreq ifr;
> +
> +	switch (cmd) {
> +	case SIOCBRADDIF:
> +	case SIOCBRDELIF: {
Why not a simple if statement? Unlikely that we will add more commands
to this switch statement.
> +		void __user *data;
> +		char *colon;
> +
> +		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> +			return -EPERM;
> +
> +		if (get_user_ifreq(&ifr, &data, uarg))
> +			return -EFAULT;
> +
> +		ifr.ifr_name[IFNAMSIZ - 1] = 0;
> +		colon = strchr(ifr.ifr_name, ':');
> +		if (colon)
> +			*colon = 0;
> +	}
> +	}
>
>  	rtnl_lock();
* Re: [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
From: Kuniyuki Iwashima @ 2025-03-16 19:01 UTC
To: idosch, stfomichev
Cc: andrew+netdev, bridge, davem, edumazet, horms, kangyan91, kuba,
kuni1840, kuniyu, netdev, pabeni, razor, roopa, samsun1006219,
syzkaller, willemb
From: Ido Schimmel <idosch@idosch.org>
Date: Sun, 16 Mar 2025 18:16:00 +0200
> On Thu, Mar 13, 2025 at 05:59:55PM -0700, Kuniyuki Iwashima wrote:
> > SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
> > br_ioctl_call(), which causes unnecessary RTNL dance and the splat
> > below [0] under RTNL pressure.
> >
> > Let's say Thread A is trying to detach a device from a bridge and
> > Thread B is trying to remove the bridge.
> >
> > In dev_ioctl(), Thread A bumps the bridge device's refcnt by
> > netdev_hold() and releases RTNL because the following br_ioctl_call()
> > also re-acquires RTNL.
> >
> > In the race window, Thread B could acquire RTNL and try to remove
> > the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
> > and wait for netdev_put() by Thread A.
> >
> > Thread A, however, must hold RTNL twice after the unlock in dev_ifsioc(),
> > which may take long under RTNL pressure, resulting in the splat by
> > Thread B.
> >
> >   Thread A (SIOCBRDELIF)             Thread B (SIOCBRDELBR)
> >   ----------------------             ----------------------
> >   sock_ioctl                         sock_ioctl
> >   `- sock_do_ioctl                   `- br_ioctl_call
> >      `- dev_ioctl                       `- br_ioctl_stub
> >         |- rtnl_lock                       |
> >         |- dev_ifsioc                      '
> >         '  |- dev = __dev_get_by_name(...)
> >            |- netdev_hold(dev, ...)        .
> >        /   |- rtnl_unlock ------.          |
> >        |   |- br_ioctl_call     `--->      |- rtnl_lock
> >   Race |   |  `- br_ioctl_stub             |- br_del_bridge
> >   Window   |     |                         |  |- dev = __dev_get_by_name(...)
> >        |   |     |  May take long          |  `- br_dev_delete(dev, ...)
> >        |   |     |  under RTNL pressure    |     `- unregister_netdevice_queue(dev, ...)
> >        |   |     |               |         `- rtnl_unlock
> >        |   |     |- rtnl_lock <--|            `- netdev_run_todo
> >        |   |     |- ...          |               `- netdev_run_todo
> >        |   |     `- rtnl_unlock  |                  |- __rtnl_unlock
> >        |   |                     |                  |- netdev_wait_allrefs_any
> >        \   |- rtnl_lock <--------'                  |
> >            |- netdev_put(dev, ...) <----------------'  Wait refcnt decrement
> >                                                        and log splat below
>
> Isn't the race window a bit smaller? dev_ifsioc() does netdev_put()
> before rtnl_lock().
Ah right, looks like I got lost while writing.
>
> I couldn't find an explanation as well.
>
> Doesn't seem like we have any tests for the IOCTL path, but FWIW I
> verified that basic operations using brctl still work after this patch.
Thanks :)
>
> > -int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
> > -		  struct ifreq *ifr, void __user *uarg)
> > +int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
> >  {
> >  	int ret = -EOPNOTSUPP;
> > +	struct ifreq ifr;
> > +
> > +	switch (cmd) {
> > +	case SIOCBRADDIF:
> > +	case SIOCBRDELIF: {
>
> Why not a simple if statement? Unlikely that we will add more commands
> to this switch statement.
Exactly, will use if in v2.
Then the funky }} will look cleaner too.
Thank you both!
* Re: [PATCH v1 net] net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
From: Stanislav Fomichev @ 2025-03-16 16:41 UTC
To: Kuniyuki Iwashima
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Roopa Prabhu, Nikolay Aleksandrov, Willem de Bruijn,
Simon Horman, Kuniyuki Iwashima, netdev, bridge, syzkaller,
yan kang, yue sun
On 03/13, Kuniyuki Iwashima wrote:
> diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
> index f213ed108361..b5a607f6da4e 100644
> --- a/net/bridge/br_ioctl.c
> +++ b/net/bridge/br_ioctl.c
> @@ -394,10 +394,29 @@ static int old_deviceless(struct net *net, void __user *data)
>  	return -EOPNOTSUPP;
>  }
>
> -int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd,
> -		  struct ifreq *ifr, void __user *uarg)
> +int br_ioctl_stub(struct net *net, unsigned int cmd, void __user *uarg)
>  {
>  	int ret = -EOPNOTSUPP;
> +	struct ifreq ifr;
> +
> +	switch (cmd) {
> +	case SIOCBRADDIF:
> +	case SIOCBRDELIF: {
> +		void __user *data;
> +		char *colon;
> +
> +		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> +			return -EPERM;
> +
> +		if (get_user_ifreq(&ifr, &data, uarg))
> +			return -EFAULT;
> +
> +		ifr.ifr_name[IFNAMSIZ - 1] = 0;
> +		colon = strchr(ifr.ifr_name, ':');
> +		if (colon)
> +			*colon = 0;
> +	}
> +	}
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Although these double } } look funky. Maybe properly declare variables
at the top instead?