* [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
@ 2026-02-14 3:38 Jiayuan Chen
2026-02-18 1:10 ` Stanislav Fomichev
0 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2026-02-14 3:38 UTC (permalink / raw)
To: netdev
Cc: jiayuan.chen, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
From: Jiayuan Chen <jiayuan.chen@shopee.com>
__dev_set_rx_mode() is called with addr_list_lock (spinlock) held from
many places in dev_addr_lists.c. When a device lacks IFF_UNICAST_FLT,
__dev_set_rx_mode() calls __dev_set_promiscuity() which propagates
through dev_change_rx_flags -> ndo_change_rx_flags -> dev_set_promiscuity
on lower devices. Since commit 78cd408356fe ("net: add missing instance
lock to dev_set_promiscuity"), dev_set_promiscuity() acquires the netdev
instance lock (mutex) via netdev_lock_ops(). This leads to a
"sleeping function called from invalid context" / "Invalid wait context"
bug when the lower device has request_ops_lock or queue_mgmt_ops set.
The call chain is:
dev_uc_add(bridge0) # e.g. from macsec_dev_open
netif_addr_lock_bh(bridge0) # <- spinlock, BH disabled
__dev_set_rx_mode(bridge0) # bridge has no IFF_UNICAST_FLT
__dev_set_promiscuity(bridge0)
ndo_change_rx_flags(bridge0)
br_manage_promisc -> dev_set_promiscuity(team0)
ndo_change_rx_flags(team0)
team_change_rx_flags -> dev_set_promiscuity(dummy0)
netdev_lock_ops(dummy0) # <- mutex! dummy has
# request_ops_lock=true
This is not limited to bridge/team/dummy. Any combination of stacking
devices (bridge, bond, macvlan, vlan, macsec, team, dsa, netvsc) over
devices with instance lock (dummy, mlx5, bnxt, gve) can trigger this.
Fix this by deferring __dev_set_promiscuity() to after the spinlock is
released:
1. Change __dev_set_rx_mode() to return a promiscuity increment value
(+1, 0, -1) instead of calling __dev_set_promiscuity() directly.
The uc_promisc flag is still updated under the lock for correctness.
2. Change dev_set_rx_mode() to call __dev_set_promiscuity() after
releasing addr_list_lock, based on the returned increment.
3. Change all callers in dev_addr_lists.c to release their spinlock
first, then call dev_set_rx_mode() which handles both the rx mode
update and the deferred promiscuity change safely.
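
The deferral in steps 1-3 can be sketched as a small single-threaded
userspace model. This is only an illustration: struct model_dev and all
model_* functions are hypothetical names, not kernel APIs. The two locks
are modelled as plain flags, and the unlock/relock that the real callers
do before dev_set_rx_mode() is folded into one critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	bool addr_lock_held;	/* stands in for addr_list_lock (spinlock) */
	bool inst_lock_held;	/* stands in for the instance lock (mutex) */
	int uc_count;		/* number of unicast addresses */
	bool uc_promisc;	/* mirrors dev->uc_promisc */
	int promiscuity;	/* mirrors the promiscuity refcount */
};

/* May "sleep": must never run while the address lock is held. */
static void model_set_promiscuity(struct model_dev *dev, int inc)
{
	assert(!dev->addr_lock_held);
	dev->inst_lock_held = true;
	dev->promiscuity += inc;
	dev->inst_lock_held = false;
}

/* Runs under the address lock: records the decision, returns the delta. */
static int model_rx_mode_locked(struct model_dev *dev)
{
	assert(dev->addr_lock_held);
	if (dev->uc_count > 0 && !dev->uc_promisc) {
		dev->uc_promisc = true;
		return 1;
	}
	if (dev->uc_count == 0 && dev->uc_promisc) {
		dev->uc_promisc = false;
		return -1;
	}
	return 0;
}

/* Add (delta > 0) or remove (delta < 0) unicast addresses. */
static void model_uc_change(struct model_dev *dev, int delta)
{
	int inc;

	dev->addr_lock_held = true;
	dev->uc_count += delta;
	inc = model_rx_mode_locked(dev);
	dev->addr_lock_held = false;

	/* Deferred: applied only after the address lock is dropped. */
	if (inc)
		model_set_promiscuity(dev, inc);
}
```

In this model the flag flips under the lock while the refcount changes
after it, so the refcount moves only on the first add and the last
delete.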
Reproducer:
ip link add dummy0 type dummy
ip link add team0 type team
ip link set dummy0 master team0
ip link set team0 up
ip link add bridge0 type bridge vlan_filtering 1
ip link set bridge0 up
ip link set team0 master bridge0
ip link add macsec0 link bridge0 type macsec
ip link set macsec0 up # triggers the bug
Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
net/core/dev.c | 26 ++++++++++++++++-------
net/core/dev.h | 2 +-
net/core/dev_addr_lists.c | 44 +++++++++++++++++++--------------------
3 files changed, 42 insertions(+), 30 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index ac6bcb2a0784..7cd25e850972 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9639,39 +9639,51 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify)
* filtering it is put in promiscuous mode while unicast addresses
* are present.
*/
-void __dev_set_rx_mode(struct net_device *dev)
+int __dev_set_rx_mode(struct net_device *dev)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ int promisc_inc = 0;
/* dev_open will call this function so the list will stay sane. */
if (!(dev->flags&IFF_UP))
- return;
+ return 0;
if (!netif_device_present(dev))
- return;
+ return 0;
if (!(dev->priv_flags & IFF_UNICAST_FLT)) {
/* Unicast addresses changes may only happen under the rtnl,
- * therefore calling __dev_set_promiscuity here is safe.
+ * therefore changing uc_promisc here is safe. The actual
+ * __dev_set_promiscuity() call is deferred to the caller
+ * (after releasing addr_list_lock) because it may trigger
+ * ndo_change_rx_flags -> dev_set_promiscuity on lower
+ * devices which can acquire netdev instance lock (mutex).
*/
if (!netdev_uc_empty(dev) && !dev->uc_promisc) {
- __dev_set_promiscuity(dev, 1, false);
dev->uc_promisc = true;
+ promisc_inc = 1;
} else if (netdev_uc_empty(dev) && dev->uc_promisc) {
- __dev_set_promiscuity(dev, -1, false);
dev->uc_promisc = false;
+ promisc_inc = -1;
}
}
if (ops->ndo_set_rx_mode)
ops->ndo_set_rx_mode(dev);
+
+ return promisc_inc;
}
void dev_set_rx_mode(struct net_device *dev)
{
+ int promisc_inc;
+
netif_addr_lock_bh(dev);
- __dev_set_rx_mode(dev);
+ promisc_inc = __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+
+ if (promisc_inc)
+ __dev_set_promiscuity(dev, promisc_inc, false);
}
/**
diff --git a/net/core/dev.h b/net/core/dev.h
index 98793a738f43..1759384dea8d 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -144,7 +144,7 @@ void dev_set_group(struct net_device *dev, int new_group);
int netif_change_carrier(struct net_device *dev, bool new_carrier);
int dev_change_carrier(struct net_device *dev, bool new_carrier);
-void __dev_set_rx_mode(struct net_device *dev);
+int __dev_set_rx_mode(struct net_device *dev);
void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
unsigned int gchanges, u32 portid,
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 76c91f224886..04222e7544c1 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -667,9 +667,9 @@ int dev_uc_add_excl(struct net_device *dev, const unsigned char *addr)
err = __hw_addr_add_ex(&dev->uc, addr, dev->addr_len,
NETDEV_HW_ADDR_T_UNICAST, true, false,
0, true);
- if (!err)
- __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+ if (!err)
+ dev_set_rx_mode(dev);
return err;
}
EXPORT_SYMBOL(dev_uc_add_excl);
@@ -689,9 +689,9 @@ int dev_uc_add(struct net_device *dev, const unsigned char *addr)
netif_addr_lock_bh(dev);
err = __hw_addr_add(&dev->uc, addr, dev->addr_len,
NETDEV_HW_ADDR_T_UNICAST);
- if (!err)
- __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+ if (!err)
+ dev_set_rx_mode(dev);
return err;
}
EXPORT_SYMBOL(dev_uc_add);
@@ -711,9 +711,9 @@ int dev_uc_del(struct net_device *dev, const unsigned char *addr)
netif_addr_lock_bh(dev);
err = __hw_addr_del(&dev->uc, addr, dev->addr_len,
NETDEV_HW_ADDR_T_UNICAST);
- if (!err)
- __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+ if (!err)
+ dev_set_rx_mode(dev);
return err;
}
EXPORT_SYMBOL(dev_uc_del);
@@ -740,9 +740,9 @@ int dev_uc_sync(struct net_device *to, struct net_device *from)
netif_addr_lock(to);
err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len);
- if (!err)
- __dev_set_rx_mode(to);
netif_addr_unlock(to);
+ if (!err)
+ dev_set_rx_mode(to);
return err;
}
EXPORT_SYMBOL(dev_uc_sync);
@@ -770,9 +770,9 @@ int dev_uc_sync_multiple(struct net_device *to, struct net_device *from)
netif_addr_lock(to);
err = __hw_addr_sync_multiple(&to->uc, &from->uc, to->addr_len);
- if (!err)
- __dev_set_rx_mode(to);
netif_addr_unlock(to);
+ if (!err)
+ dev_set_rx_mode(to);
return err;
}
EXPORT_SYMBOL(dev_uc_sync_multiple);
@@ -803,9 +803,9 @@ void dev_uc_unsync(struct net_device *to, struct net_device *from)
netif_addr_lock_bh(from);
netif_addr_lock(to);
__hw_addr_unsync(&to->uc, &from->uc, to->addr_len);
- __dev_set_rx_mode(to);
netif_addr_unlock(to);
netif_addr_unlock_bh(from);
+ dev_set_rx_mode(to);
}
EXPORT_SYMBOL(dev_uc_unsync);
@@ -852,9 +852,9 @@ int dev_mc_add_excl(struct net_device *dev, const unsigned char *addr)
err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
NETDEV_HW_ADDR_T_MULTICAST, true, false,
0, true);
- if (!err)
- __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+ if (!err)
+ dev_set_rx_mode(dev);
return err;
}
EXPORT_SYMBOL(dev_mc_add_excl);
@@ -868,9 +868,9 @@ static int __dev_mc_add(struct net_device *dev, const unsigned char *addr,
err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
NETDEV_HW_ADDR_T_MULTICAST, global, false,
0, false);
- if (!err)
- __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+ if (!err)
+ dev_set_rx_mode(dev);
return err;
}
/**
@@ -908,9 +908,9 @@ static int __dev_mc_del(struct net_device *dev, const unsigned char *addr,
netif_addr_lock_bh(dev);
err = __hw_addr_del_ex(&dev->mc, addr, dev->addr_len,
NETDEV_HW_ADDR_T_MULTICAST, global, false);
- if (!err)
- __dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
+ if (!err)
+ dev_set_rx_mode(dev);
return err;
}
@@ -963,9 +963,9 @@ int dev_mc_sync(struct net_device *to, struct net_device *from)
netif_addr_lock(to);
err = __hw_addr_sync(&to->mc, &from->mc, to->addr_len);
- if (!err)
- __dev_set_rx_mode(to);
netif_addr_unlock(to);
+ if (!err)
+ dev_set_rx_mode(to);
return err;
}
EXPORT_SYMBOL(dev_mc_sync);
@@ -993,9 +993,9 @@ int dev_mc_sync_multiple(struct net_device *to, struct net_device *from)
netif_addr_lock(to);
err = __hw_addr_sync_multiple(&to->mc, &from->mc, to->addr_len);
- if (!err)
- __dev_set_rx_mode(to);
netif_addr_unlock(to);
+ if (!err)
+ dev_set_rx_mode(to);
return err;
}
EXPORT_SYMBOL(dev_mc_sync_multiple);
@@ -1018,9 +1018,9 @@ void dev_mc_unsync(struct net_device *to, struct net_device *from)
netif_addr_lock_bh(from);
netif_addr_lock(to);
__hw_addr_unsync(&to->mc, &from->mc, to->addr_len);
- __dev_set_rx_mode(to);
netif_addr_unlock(to);
netif_addr_unlock_bh(from);
+ dev_set_rx_mode(to);
}
EXPORT_SYMBOL(dev_mc_unsync);
--
2.43.0
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-14 3:38 [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context Jiayuan Chen
@ 2026-02-18 1:10 ` Stanislav Fomichev
2026-02-19 1:40 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2026-02-18 1:10 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On 02/14, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> __dev_set_rx_mode() is called with addr_list_lock (spinlock) held from
> many places in dev_addr_lists.c. When a device lacks IFF_UNICAST_FLT,
> __dev_set_rx_mode() calls __dev_set_promiscuity() which propagates
> through dev_change_rx_flags -> ndo_change_rx_flags -> dev_set_promiscuity
> on lower devices. Since commit 78cd408356fe ("net: add missing instance
> lock to dev_set_promiscuity"), dev_set_promiscuity() acquires the netdev
> instance lock (mutex) via netdev_lock_ops(). This leads to a
> "sleeping function called from invalid context" / "Invalid wait context"
> bug when the lower device has request_ops_lock or queue_mgmt_ops set.
>
> The call chain is:
>
> dev_uc_add(bridge0) # e.g. from macsec_dev_open
> netif_addr_lock_bh(bridge0) # <- spinlock, BH disabled
> __dev_set_rx_mode(bridge0) # bridge has no IFF_UNICAST_FLT
> __dev_set_promiscuity(bridge0)
> ndo_change_rx_flags(bridge0)
> br_manage_promisc -> dev_set_promiscuity(team0)
> ndo_change_rx_flags(team0)
> team_change_rx_flags -> dev_set_promiscuity(dummy0)
> netdev_lock_ops(dummy0) # <- mutex! dummy has
> # request_ops_lock=true
>
> This is not limited to bridge/team/dummy. Any combination of stacking
> devices (bridge, bond, macvlan, vlan, macsec, team, dsa, netvsc) over
> devices with instance lock (dummy, mlx5, bnxt, gve) can trigger this.
>
> Fix this by deferring __dev_set_promiscuity() to after the spinlock is
> released:
>
> 1. Change __dev_set_rx_mode() to return a promiscuity increment value
> (+1, 0, -1) instead of calling __dev_set_promiscuity() directly.
> The uc_promisc flag is still updated under the lock for correctness.
>
> 2. Change dev_set_rx_mode() to call __dev_set_promiscuity() after
> releasing addr_list_lock, based on the returned increment.
>
> 3. Change all callers in dev_addr_lists.c to release their spinlock
> first, then call dev_set_rx_mode() which handles both the rx mode
> update and the deferred promiscuity change safely.
[..]
> Reproducer:
>
> ip link add dummy0 type dummy
> ip link add team0 type team
> ip link set dummy0 master team0
> ip link set team0 up
> ip link add bridge0 type bridge vlan_filtering 1
> ip link set bridge0 up
> ip link set team0 master bridge0
> ip link add macsec0 link bridge0 type macsec
> ip link set macsec0 up # triggers the bug
Can you add it as a selftest under selftests/drivers/net/team/?
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-18 1:10 ` Stanislav Fomichev
@ 2026-02-19 1:40 ` Jakub Kicinski
2026-02-19 18:59 ` Stanislav Fomichev
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-02-19 1:40 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jiayuan Chen, netdev, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Sabrina Dubroca, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Ahmed Zaki, Alexander Lobakin,
Willem de Bruijn, linux-kernel
On Tue, 17 Feb 2026 17:10:36 -0800 Stanislav Fomichev wrote:
> > Reproducer:
> >
> > ip link add dummy0 type dummy
> > ip link add team0 type team
> > ip link set dummy0 master team0
> > ip link set team0 up
> > ip link add bridge0 type bridge vlan_filtering 1
> > ip link set bridge0 up
> > ip link set team0 master bridge0
> > ip link add macsec0 link bridge0 type macsec
> > ip link set macsec0 up # triggers the bug
>
> Can you add it as a selftest under selftests/drivers/net/team/?
Stan, this "fix" may work for the promisc flag but won't we have
the same problem with sync'ing the address list? Looks like team
will do:
- team_set_rx_mode()
- dev_uc_sync_multiple()
- __dev_set_rx_mode(port->dev)
so AFAICT we're calling ndo_set_rx_mode without holding the instance
lock?
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-19 1:40 ` Jakub Kicinski
@ 2026-02-19 18:59 ` Stanislav Fomichev
2026-02-19 20:12 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2026-02-19 18:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jiayuan Chen, netdev, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Sabrina Dubroca, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Ahmed Zaki, Alexander Lobakin,
Willem de Bruijn, linux-kernel
On 02/18, Jakub Kicinski wrote:
> On Tue, 17 Feb 2026 17:10:36 -0800 Stanislav Fomichev wrote:
> > > Reproducer:
> > >
> > > ip link add dummy0 type dummy
> > > ip link add team0 type team
> > > ip link set dummy0 master team0
> > > ip link set team0 up
> > > ip link add bridge0 type bridge vlan_filtering 1
> > > ip link set bridge0 up
> > > ip link set team0 master bridge0
> > > ip link add macsec0 link bridge0 type macsec
> > > ip link set macsec0 up # triggers the bug
> >
> > Can you add it as a selftest under selftests/drivers/net/team/?
>
> Stan, this "fix" may work for the promisc flag but won't we have
> the same problem with sync'ing the address list? Looks like team
> will do:
> - team_set_rx_mode()
> - dev_uc_sync_multiple()
> - __dev_set_rx_mode(port->dev)
> so AFAICT we're calling ndo_set_rx_mode without holding the instance
> lock?
Not sure I understand your trace without more details about the hierarchy.
But you have a point, per netdevices.rst ndo_set_rx_mode is synchronized via
netif_addr_lock and we are breaking that with this patch.. :-(
(so I don't think we need an instance lock if we keep netif_addr_lock?)
For this particular issue, maybe we can do something similar to net_todo_list?
Instead of changing the promisc for !FLT under right here right now, move it
to the rtnl_unlock? Not sure how important the ordering is..
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-19 18:59 ` Stanislav Fomichev
@ 2026-02-19 20:12 ` Jakub Kicinski
2026-02-20 0:30 ` Stanislav Fomichev
2026-02-20 5:21 ` I Viswanath
0 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2026-02-19 20:12 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jiayuan Chen, netdev, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Sabrina Dubroca, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Ahmed Zaki, Alexander Lobakin,
Willem de Bruijn, linux-kernel
On Thu, 19 Feb 2026 10:59:01 -0800 Stanislav Fomichev wrote:
> On 02/18, Jakub Kicinski wrote:
> > On Tue, 17 Feb 2026 17:10:36 -0800 Stanislav Fomichev wrote:
> > > > Reproducer:
> > > >
> > > > ip link add dummy0 type dummy
> > > > ip link add team0 type team
> > > > ip link set dummy0 master team0
> > > > ip link set team0 up
> > > > ip link add bridge0 type bridge vlan_filtering 1
> > > > ip link set bridge0 up
> > > > ip link set team0 master bridge0
> > > > ip link add macsec0 link bridge0 type macsec
> > > > ip link set macsec0 up # triggers the bug
> > >
> > > Can you add it as a selftest under selftests/drivers/net/team/?
> >
> > Stan, this "fix" may work for the promisc flag but won't we have
> > the same problem with sync'ing the address list? Looks like team
> > will do:
> > - team_set_rx_mode()
> > - dev_uc_sync_multiple()
> > - __dev_set_rx_mode(port->dev)
> > so AFAICT we're calling ndo_set_rx_mode without holding the instance
> > lock?
>
> Not sure I understand your trace without more details about the hierarchy.
Team on top of a ops-locked netdev
- team_set_rx_mode() # set_rx_mode on team
- dev_uc_sync_multiple()
- __dev_set_rx_mode(port->dev) # calls ndo_set_rx_mode on ops-locked
# netdev without holding the inst. lock
IOW this will fire:
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 6285fbefe38a..77991f62bffc 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -184,6 +184,7 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
static void nsim_set_rx_mode(struct net_device *dev)
{
+ netdev_assert_locked(dev);
}
static int nsim_change_mtu(struct net_device *dev, int new_mtu)
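
The same situation can be sketched in a userspace model, assuming a
two-level stack where only the lower device is ops-locked. struct
stack_dev and model_set_rx_mode() are hypothetical names, and the
assertion from the diff above is replaced by a counter so that the
model can report the violation instead of aborting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct stack_dev {
	bool ops_locked;	/* device expects its instance lock held */
	bool inst_lock_held;	/* modelled instance lock state */
	struct stack_dev *lower;/* port below a stacking device, or NULL */
	int unlocked_calls;	/* rx-mode callbacks run without the lock */
};

/*
 * Models the rx-mode callback. A stacking device (team in the trace)
 * syncs its addresses down and re-runs rx mode on its port, and nothing
 * on that path takes the port's instance lock.
 */
static void model_set_rx_mode(struct stack_dev *dev)
{
	/* This is where netdev_assert_locked() would fire. */
	if (dev->ops_locked && !dev->inst_lock_held)
		dev->unlocked_calls++;

	if (dev->lower)
		model_set_rx_mode(dev->lower);
}
```

Calling model_set_rx_mode() on the upper device increments
unlocked_calls on the ops-locked port, which corresponds to the
assertion firing.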
> But you have a point, per netdevices.rst ndo_set_rx_mode is synchronized via
> netif_addr_lock and we are breaking that with this patch.. :-(
> (so I don't think we need an instance lock if we keep netif_addr_lock?)
>
> For this particular issue, maybe we can do something similar to net_todo_list?
> Instead of changing the promisc for !FLT under right here right now, move it
> to the rtnl_unlock? Not sure how important the ordering is..
Not sure. Another alternative is to implement the long standing idea of
having an async / sleeping version of ndo_set_rx_mode() orchestrated
by the core. Because a lot of drivers need to sleep, anyway, so they
just schedule a work from that callback.
Then we can say old ndo_set_rx_mode is under netif_addr_lock.
ndo_set_rx_mode_async is under instance lock.
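
A minimal userspace sketch of that split, assuming the core only marks
the request under the address lock and a worker later runs the sleeping
callback under the instance lock. struct async_dev and the model_*
functions are hypothetical; only the name ndo_set_rx_mode_async comes
from the proposal above, and no such callback is claimed to exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct async_dev {
	bool addr_lock_held;	/* stands in for netif_addr_lock */
	bool inst_lock_held;	/* stands in for the instance lock */
	bool rx_mode_pending;	/* request recorded in atomic context */
	int hw_programmed;	/* runs of the sleeping callback */
};

/* Stand-in for the proposed sleeping ndo_set_rx_mode_async. */
static void model_set_rx_mode_async(struct async_dev *dev)
{
	assert(dev->inst_lock_held);
	assert(!dev->addr_lock_held);
	dev->hw_programmed++;
}

/* Atomic side: called under the address lock, only records the request. */
static void model_request_rx_mode(struct async_dev *dev)
{
	assert(dev->addr_lock_held);
	dev->rx_mode_pending = true;
}

/* Process-context side: what a core-owned work item would do. */
static void model_rx_mode_work(struct async_dev *dev)
{
	if (!dev->rx_mode_pending)
		return;
	dev->rx_mode_pending = false;

	dev->inst_lock_held = true;
	model_set_rx_mode_async(dev);
	dev->inst_lock_held = false;
}
```

In this model several requests made before the worker runs coalesce
into a single call of the sleeping callback.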
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-19 20:12 ` Jakub Kicinski
@ 2026-02-20 0:30 ` Stanislav Fomichev
2026-02-20 1:10 ` Jakub Kicinski
2026-02-20 5:21 ` I Viswanath
1 sibling, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2026-02-20 0:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jiayuan Chen, netdev, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Sabrina Dubroca, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Ahmed Zaki, Alexander Lobakin,
Willem de Bruijn, linux-kernel
On 02/19, Jakub Kicinski wrote:
> On Thu, 19 Feb 2026 10:59:01 -0800 Stanislav Fomichev wrote:
> > On 02/18, Jakub Kicinski wrote:
> > > On Tue, 17 Feb 2026 17:10:36 -0800 Stanislav Fomichev wrote:
> > > > > Reproducer:
> > > > >
> > > > > ip link add dummy0 type dummy
> > > > > ip link add team0 type team
> > > > > ip link set dummy0 master team0
> > > > > ip link set team0 up
> > > > > ip link add bridge0 type bridge vlan_filtering 1
> > > > > ip link set bridge0 up
> > > > > ip link set team0 master bridge0
> > > > > ip link add macsec0 link bridge0 type macsec
> > > > > ip link set macsec0 up # triggers the bug
> > > >
> > > > Can you add it as a selftest under selftests/drivers/net/team/?
> > >
> > > Stan, this "fix" may work for the promisc flag but won't we have
> > > the same problem with sync'ing the address list? Looks like team
> > > will do:
> > > - team_set_rx_mode()
> > > - dev_uc_sync_multiple()
> > > - __dev_set_rx_mode(port->dev)
> > > so AFAICT we're calling ndo_set_rx_mode without holding the instance
> > > lock?
> >
> > Not sure I understand your trace without more details about the hierarchy.
>
> Team on top of a ops-locked netdev
>
> - team_set_rx_mode() # set_rx_mode on team
> - dev_uc_sync_multiple()
> - __dev_set_rx_mode(port->dev) # calls ndo_set_rx_mode on ops-locked
> # netdev without holding the inst. lock
>
> IOW this will fire:
>
> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> index 6285fbefe38a..77991f62bffc 100644
> --- a/drivers/net/netdevsim/netdev.c
> +++ b/drivers/net/netdevsim/netdev.c
> @@ -184,6 +184,7 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
>
> static void nsim_set_rx_mode(struct net_device *dev)
> {
> + netdev_assert_locked(dev);
> }
>
> static int nsim_change_mtu(struct net_device *dev, int new_mtu)
Ah, you're saying in general.. Yeah, agreed, for the instance locked
devices not grabbing an instance lock for these paths looks problematic.
> > But you have a point, per netdevices.rst ndo_set_rx_mode is synchronized via
> > netif_addr_lock and we are breaking that with this patch.. :-(
> > (so I don't think we need an instance lock if we keep netif_addr_lock?)
> >
> > For this particular issue, maybe we can do something similar to net_todo_list?
> > Instead of changing the promisc for !FLT under right here right now, move it
> > to the rtnl_unlock? Not sure how important the ordering is..
>
> Not sure. Another alternative is to implement the long standing idea of
> having an async / sleeping version of ndo_set_rx_mode() orchestrated
> by the core. Because a lot of drivers need to sleep, anyway, so they
> just schedule a work from that callback.
>
> Then we can say old ndo_set_rx_mode is under netif_addr_lock.
> ndo_set_rx_mode_async is under instance lock.
That sounds like a better plan going forward, but gonna need a bunch of
work to redo the addr lock it seems? We can start with moving promisc into
rtnl_unlock to unblock that "bridge vlan_filtering 1" and I
can look into adding an instance lock for set_rx_mode.. LMK if you prefer
me to focus on the latter and don't waste time on the former.
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-20 0:30 ` Stanislav Fomichev
@ 2026-02-20 1:10 ` Jakub Kicinski
0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2026-02-20 1:10 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jiayuan Chen, netdev, Jiayuan Chen, syzbot+2b3391f44313b3983e91,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Sabrina Dubroca, Stanislav Fomichev, Kuniyuki Iwashima,
Samiullah Khawaja, Ahmed Zaki, Alexander Lobakin,
Willem de Bruijn, linux-kernel
On Thu, 19 Feb 2026 16:30:42 -0800 Stanislav Fomichev wrote:
> > > But you have a point, per netdevices.rst ndo_set_rx_mode is synchronized via
> > > netif_addr_lock and we are breaking that with this patch.. :-(
> > > (so I don't think we need an instance lock if we keep netif_addr_lock?)
> > >
> > > For this particular issue, maybe we can do something similar to net_todo_list?
> > > Instead of changing the promisc for !FLT under right here right now, move it
> > > to the rtnl_unlock? Not sure how important the ordering is..
> >
> > Not sure. Another alternative is to implement the long standing idea of
> > having an async / sleeping version of ndo_set_rx_mode() orchestrated
> > by the core. Because a lot of drivers need to sleep, anyway, so they
> > just schedule a work from that callback.
> >
> > Then we can say old ndo_set_rx_mode is under netif_addr_lock.
> > ndo_set_rx_mode_async is under instance lock.
>
> That sounds like a better plan going forward, but gonna need a bunch of
> work to redo the addr lock it seems? We can start with moving promisc into
> rtnl_unlock to unblock that "bridge vlan_filtering 1" and I
> can look into adding an instance lock for set_rx_mode.. LMK if you prefer
> me to focus on the latter and don't waste time on the former.
I'd jump directly to the latter.
Former adds its own trickiness and IIUC the latter will deprecate it.
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-19 20:12 ` Jakub Kicinski
2026-02-20 0:30 ` Stanislav Fomichev
@ 2026-02-20 5:21 ` I Viswanath
2026-02-21 1:15 ` Stanislav Fomichev
1 sibling, 1 reply; 13+ messages in thread
From: I Viswanath @ 2026-02-20 5:21 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Stanislav Fomichev, Jiayuan Chen, netdev, Jiayuan Chen,
syzbot+2b3391f44313b3983e91, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On Fri, 20 Feb 2026 at 01:42, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Not sure. Another alternative is to implement the long standing idea of
> having an async / sleeping version of ndo_set_rx_mode() orchestrated
> by the core. Because a lot of drivers need to sleep, anyway, so they
> just schedule a work from that callback.
>
> Then we can say old ndo_set_rx_mode is under netif_addr_lock.
> ndo_set_rx_mode_async is under instance lock.
Hello, I have been working on this idea here :/
https://lore.kernel.org/netdev/20260112181626.20117-1-viswanathiyyappan@gmail.com/
I am calling it ndo_write_rx_mode but if ndo_set_rx_mode_async sounds
better, I will go with that. As the constructs I would
be introducing in v9 (enable/disable deferred ctx) should be useful
for other NDOs that would want to do their work async, I am looking
for concrete use cases to justify this. Right now, I have
- ndo_set_rx_mode
- ndo_change_rx_flags
- ndo_tx_timeout (Few drivers seem to schedule this and it's under
tx_global_lock spinlock)
Do you mean rtnl_lock when you say instance lock?
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-21 1:15 ` Stanislav Fomichev
@ 2026-02-20 20:45 ` I Viswanath
2026-02-21 6:23 ` Stanislav Fomichev
2026-02-21 1:22 ` Jakub Kicinski
1 sibling, 1 reply; 13+ messages in thread
From: I Viswanath @ 2026-02-20 20:45 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jakub Kicinski, Jiayuan Chen, netdev, Jiayuan Chen,
syzbot+2b3391f44313b3983e91, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On Sat, 21 Feb 2026 at 06:45, Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > Do you mean rtnl_lock when you say instance lock?
>
> Search for the instance locks in Documentation/networking/netdevices.rst. We
> essentially need to add netdev_ops_assert_locked to __dev_set_rx_mode
> and make it work.
So, the handler should look something like this?
static void netif_write_rx_mode(struct work_struct *param) {
rtnl_lock()
netdev_lock()
/* Processing */
netdev_unlock();
rtnl_unlock();
}
v9 should be out by next week and fixes all the issues I could think
of so please wait for that
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-20 5:21 ` I Viswanath
@ 2026-02-21 1:15 ` Stanislav Fomichev
2026-02-20 20:45 ` I Viswanath
2026-02-21 1:22 ` Jakub Kicinski
0 siblings, 2 replies; 13+ messages in thread
From: Stanislav Fomichev @ 2026-02-21 1:15 UTC (permalink / raw)
To: I Viswanath
Cc: Jakub Kicinski, Jiayuan Chen, netdev, Jiayuan Chen,
syzbot+2b3391f44313b3983e91, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On 02/20, I Viswanath wrote:
> On Fri, 20 Feb 2026 at 01:42, Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Not sure. Another alternative is to implement the long standing idea of
> > having an async / sleeping version of ndo_set_rx_mode() orchestrated
> > by the core. Because a lot of drivers need to sleep, anyway, so they
> > just schedule a work from that callback.
> >
> > Then we can say old ndo_set_rx_mode is under netif_addr_lock.
> > ndo_set_rx_mode_async is under instance lock.
>
> Hello, I have been working on this idea here :/
>
> https://lore.kernel.org/netdev/20260112181626.20117-1-viswanathiyyappan@gmail.com/
>
> I am calling it ndo_write_rx_mode but if ndo_set_rx_mode_async sounds
> better, I will go with that. As the constructs I would
> be introducing in v9 (enable/disable deferred ctx) should be useful
> for other NDOs that would want to do their work async, I am looking
> for concrete use cases to justify this. Right now, I have
> - ndo_set_rx_mode
> - ndo_change_rx_flags
> - ndo_tx_timeout (Few drivers seem to schedule this and it's under
> tx_global_lock spinlock)
Hmm, interesting, I haven't seen that. So this takes care of
non-sleeping ndo_set_rx_mode. We'd have to move ndo_change_rx_flags to
that new sleeping ndo as well (to resolve the issue that this patch tries
to address). And then we can try to add instance locks on top of that.
> Do you mean rtnl_lock when you say instance lock?
Search for the instance locks in Documentation/networking/netdevices.rst. We
essentially need to add netdev_ops_assert_locked to __dev_set_rx_mode
and make it work.
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-21 1:15 ` Stanislav Fomichev
2026-02-20 20:45 ` I Viswanath
@ 2026-02-21 1:22 ` Jakub Kicinski
2026-02-21 6:22 ` Stanislav Fomichev
1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-02-21 1:22 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: I Viswanath, Jiayuan Chen, netdev, Jiayuan Chen,
syzbot+2b3391f44313b3983e91, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On Fri, 20 Feb 2026 17:15:00 -0800 Stanislav Fomichev wrote:
> Hmm, interesting, I haven't seen that.
FWIW it was one of the "public TODOs" I put together a while back.
Not sure if the version posted by I Viswanath was converging fast
enough to be worth reusing. Up to you.
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-21 1:22 ` Jakub Kicinski
@ 2026-02-21 6:22 ` Stanislav Fomichev
0 siblings, 0 replies; 13+ messages in thread
From: Stanislav Fomichev @ 2026-02-21 6:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: I Viswanath, Jiayuan Chen, netdev, Jiayuan Chen,
syzbot+2b3391f44313b3983e91, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On 02/20, Jakub Kicinski wrote:
> On Fri, 20 Feb 2026 17:15:00 -0800 Stanislav Fomichev wrote:
> > Hmm, interesting, I haven't seen that.
>
> FWIW it was one of the "public TODOs" I put together a while back.
> Not sure if the version posted by I Viswanath was converging fast
> enough to be worth reusing. Up to you.
I feel like it will take me a while to figure out the instance locking
(looking back at the lockdep issues we had), but happy to help with
the other parts if I get there first..
* Re: [PATCH net v1] net: defer __dev_set_promiscuity() to avoid sleeping in atomic context
2026-02-20 20:45 ` I Viswanath
@ 2026-02-21 6:23 ` Stanislav Fomichev
0 siblings, 0 replies; 13+ messages in thread
From: Stanislav Fomichev @ 2026-02-21 6:23 UTC (permalink / raw)
To: I Viswanath
Cc: Jakub Kicinski, Jiayuan Chen, netdev, Jiayuan Chen,
syzbot+2b3391f44313b3983e91, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Sabrina Dubroca, Stanislav Fomichev,
Kuniyuki Iwashima, Samiullah Khawaja, Ahmed Zaki,
Alexander Lobakin, Willem de Bruijn, linux-kernel
On 02/21, I Viswanath wrote:
> On Sat, 21 Feb 2026 at 06:45, Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > Do you mean rtnl_lock when you say instance lock?
> >
> > Search for the instance locks in Documentation/networking/netdevices.rst. We
> > essentially need to add netdev_ops_assert_locked to __dev_set_rx_mode
> > and make it work.
>
> So, the handler should look something like this?
>
> static void netif_write_rx_mode(struct work_struct *param) {
> rtnl_lock()
> netdev_lock()
> /* Processing */
> netdev_unlock();
> rtnl_unlock();
> }
>
> v9 should be out by next week and fixes all the issues I could think
> of so please wait for that
Let's handle instance locking separately, I don't think it's as simple
as adding netdev_ops_lock/unlock.