* [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock
From: Kirill Tkhai @ 2025-03-22 14:37 UTC
To: netdev, linux-kernel; +Cc: tkhai
Hi,

this patchset shows a way to remove rtnl_lock completely, and shows that
the removal can be done iteratively, without disruptive changes. It
implements a new fine-grained locking architecture to use instead of rtnl,
and converts many drivers to it step by step.

I wrote most of this a few years ago. More recently I rebased the patches
onto a kernel around 6.11 (there should not be many conflicts on that
version). Currently I have no plans to complete this work.

If anyone wants to continue, feel free to take this patchset
and finish the job.
Kirill Tkhai
-----------------------------------------------------------------------
The stages, and what is done in each.

0) Introduce nd_lock and its primitives: lock_netdev(), double_lock_netdev(),
the lock ordering rules, and nd_lock_transfer_devices().
1) The target of this stage is to attach an nd_lock to every device being
registered and to keep it locked during netdevice registration.
The significant thing is that we want two or more netdevices to share
the same nd_lock if they are configured or modified together during
their lifetime. E.g., a bridge and all its ports must share the same
nd_lock. A bond and its bonded (slave) devices must share the same
nd_lock. So must both peers of a veth pair, and upper and lower, master
and slave devices.
[This rule is recursive. E.g., if a veth is a bridge port, then
both veth peers, all bridge ports and the bridge itself must relate
to the same nd_lock.]
Net devices in the kernel are registered via the register_netdev() and
register_netdevice() functions.
a) register_netdev() can't be called nested inside a configuration action
(since it's called with rtnl_mutex unlocked), and it is usually used
to register a standalone device (say, by a driver for a physical device).
There are exceptions, and we'll talk about them below. So this primitive
is not very interesting for us.
b) register_netdevice() may be called nested, e.g. from a netdevice notifier,
or two devices may be registered at once (e.g., from .newlink or
.changelink). Later, these two devices are usually modified together.
Users of this primitive need to be modified so that devices which must
share an nd_lock really relate to the same nd_lock.
To do this, we introduce a new variant, __register_netdevice().
The difference between __register_netdevice() and register_netdevice()
is that register_netdevice() allocates and attaches a new nd_lock
to the device, while __register_netdevice() should be used in cases
where the nd_lock is inherited from a bound device (say, the second veth
peer inherits its nd_lock from the first peer).
It is important to say that although register_netdev() is generally
used to register a standalone device, there are the exceptions mentioned
above, where several devices may be used together in netdevice
notifiers (one example is the mlx5 driver family). To handle such
cases without having to modify those drivers, we connect such
devices to a special fallback_nd_lock. Connected to this fallback_nd_lock,
such devices share the same nd_lock, as we want.
Note that register_netdev() is changed to use fallback_nd_lock
for every device being registered, to minimize the number of patches for now.
For most drivers, register_netdev() should be replaced with
register_netdevice() called with a new nd_lock held, as is done
in the patch for loopback. See that patch for an example.
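To illustrate the difference between the two registration primitives, here
is a small userspace model. It is only a sketch under stated assumptions:
struct nd_lock_model, struct netdev_model and both model_*() helpers are
made-up names, a flag stands in for the real mutex, passing the lock as an
argument is a choice of the model, and none of this is code from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for nd_lock; a flag replaces the real mutex. */
struct nd_lock_model {
	bool locked;
	int nr_devices;		/* devices currently attached to this lock */
};

/* Userspace stand-in for struct net_device. */
struct netdev_model {
	struct nd_lock_model *nd_lock;
};

/* Models register_netdevice(): allocate a new nd_lock and attach it. */
static int model_register_netdevice(struct netdev_model *dev)
{
	struct nd_lock_model *lock = calloc(1, sizeof(*lock));

	if (!lock)
		return -1;
	lock->nr_devices = 1;
	dev->nd_lock = lock;
	return 0;
}

/*
 * Models __register_netdevice(): the caller holds @lock, which already
 * belongs to a related device, and the new device inherits it.
 */
static void model_register_netdevice_inherit(struct netdev_model *dev,
					     struct nd_lock_model *lock)
{
	assert(lock->locked);
	lock->nr_devices++;
	dev->nd_lock = lock;
}
```

In this model a veth pair would register the first peer with
model_register_netdevice() and the second with
model_register_netdevice_inherit(), so both end up behind one lock, while an
unrelated device such as loopback gets a lock of its own.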
2) The target of this stage is to modify .newlink and .changelink
so that the devices being registered and attached share the same nd_lock.
Currently, the rtnl_newlink() and rtnl_setlink() stacks look like:

  rtnl_setlink()
    dev1 = __dev_get_by_index(index1)
    dev2 = rtnl_create_link()
    ops->newlink() or ops->changelink()
      dev3 = __dev_get_by_index(index3)
      netdev_upper_dev_link(dev2, dev3)
    do_set_master()
      dev4 = __dev_get_by_index(index4)

These dev1, dev2, dev3 and dev4 have to relate to the same nd_lock.
Transferring devices between two nd_locks requires owning both of them
(see nd_lock_transfer_devices()). But that is impossible in the above
stack, since nested taking of nd_locks has to follow the ordering
rules.
To make nested locking possible, we transform the stack into the following:

  rtnl_setlink()
    dev1 = __dev_get_by_index(index1)
    dev4 = __dev_get_by_index(index4)
    dev3 = __dev_get_by_index(index3)

    double_lock_netdev(dev1, &nd_lockA, dev4, &nd_lockB)
    nd_lock_transfer_devices(&nd_lockA, &nd_lockB)   // Now dev1 and dev4 relate
    double_unlock_netdev(nd_lockA, nd_lockB)         // to same nd_lock

    double_lock_netdev(dev1, &nd_lockA, dev3, &nd_lockB)
    nd_lock_transfer_devices(&nd_lockA, &nd_lockB)   // Now dev1 and dev3 relate
    double_unlock_netdev(nd_lockA, nd_lockB)         // to same nd_lock (and dev4)

    lock_netdev(dev1, &nd_lockA)
    dev2 = rtnl_create_link()
    attach_nd_lock(dev2, nd_lockA)   // Now all four devices share the same nd lock
    ops->newlink() or ops->changelink()
      dev3 = __dev_get_by_index(index3)
      netdev_upper_dev_link(dev2, dev3)
    do_set_master(dev4)
    unlock_netdev(nd_lockA)

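The double-lock and transfer steps of the transformed stack can be modelled
in userspace roughly as follows. This is a sketch, not the patchset's
implementation: all model_*() names and both structs are hypothetical, a flag
replaces the mutex, and ordering the two locks by address is an assumption
made here (the ordering rule itself is defined by the nd_lock patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct nd_lock_model {
	bool locked;
	int nr_devices;
};

struct netdev_model {
	struct nd_lock_model *nd_lock;
};

static void model_lock(struct nd_lock_model *l)
{
	assert(!l->locked);	/* taking a lock we already own is a bug */
	l->locked = true;
}

static void model_unlock(struct nd_lock_model *l)
{
	assert(l->locked);
	l->locked = false;
}

/*
 * Lock the nd_locks of two devices in a fixed order (by address, an
 * assumption of this model). A shared nd_lock is taken only once.
 */
static void model_double_lock(struct netdev_model *a, struct netdev_model *b,
			      struct nd_lock_model **la,
			      struct nd_lock_model **lb)
{
	*la = a->nd_lock;
	*lb = b->nd_lock;
	if (*la == *lb) {
		model_lock(*la);
	} else if ((uintptr_t)*la < (uintptr_t)*lb) {
		model_lock(*la);
		model_lock(*lb);
	} else {
		model_lock(*lb);
		model_lock(*la);
	}
}

static void model_double_unlock(struct nd_lock_model *la,
				struct nd_lock_model *lb)
{
	model_unlock(la);
	if (lb != la)
		model_unlock(lb);
}

/* Re-point every device attached to @from at @to; both must be held. */
static void model_transfer(struct netdev_model **devs, size_t n,
			   struct nd_lock_model *from,
			   struct nd_lock_model *to)
{
	assert(from->locked && to->locked);
	if (from == to)
		return;
	for (size_t i = 0; i < n; i++) {
		if (devs[i]->nd_lock == from) {
			devs[i]->nd_lock = to;
			from->nr_devices--;
			to->nr_devices++;
		}
	}
}
```

The point of the fixed order is that two tasks locking the same pair of
devices in opposite argument order still take the two locks in the same
sequence, so they cannot deadlock against each other.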
It's important to see that in the transformed stack above rtnl_setlink()
knows that ops->newlink will dereference the device with index3. To make this
possible, struct rtnl_link_ops was extended with two deps members:
.newlink_deps and .changelink_deps. Instead of describing them formally,
I'll show them with a few examples:

  struct link_deps bond_changelink_deps = {
	.optional.data = { IFLA_BOND_ACTIVE_SLAVE, IFLA_BOND_PRIMARY, },
  };

  static struct link_deps hsr_newlink_deps = {
	.mandatory.data = { IFLA_HSR_SLAVE1, IFLA_HSR_SLAVE2, },
	.optional.data = { IFLA_HSR_INTERLINK, },
  };

  struct link_deps generic_newlink_deps = {
	.mandatory.tb = { IFLA_LINK, }
  };

These are the attribute ids that bond_changelink(), hsr_newlink() and most
other drivers dereference to find a device. A dependency is mandatory or
optional depending on how .newlink/.changelink react to whether a device
with that index exists. The .data and .tb suffixes name the array
from which .newlink and .changelink read the device id:

  int (*newlink)(struct net *src_net,
		 struct net_device *dev,
		 struct nlattr *tb[],     <---
		 struct nlattr *data[],   <---
		 struct netlink_ext_ack *extack);

Once .newlink_deps and .changelink_deps are introduced, we simply
use the correct nd_lock to attach the nested device in .newlink and .changelink.
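As a rough illustration of how such deps tables could be consumed before the
callback runs, here is a userspace sketch. Everything in it is an assumption:
the struct layout is inferred only from the initializers shown above, the
attribute arrays are reduced to plain ifindex values (0 meaning "absent"),
"mandatory" is taken to mean "fail when missing", and all model_* names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEPS 4
#define MODEL_MAX_ATTR 8

/* Attribute id 0 terminates each list, so unused slots may stay zero. */
struct model_dep_ids {
	int tb[MODEL_MAX_DEPS];
	int data[MODEL_MAX_DEPS];
};

struct model_link_deps {
	struct model_dep_ids mandatory;
	struct model_dep_ids optional;
};

/* Append the ifindex found under each listed attribute id to @out. */
static int model_collect(const int *ids, const int *attrs, int mandatory,
			 int *out, size_t *n)
{
	for (size_t i = 0; i < MODEL_MAX_DEPS && ids[i]; i++) {
		int ifindex;

		if (ids[i] >= MODEL_MAX_ATTR)
			return -1;
		ifindex = attrs[ids[i]];
		if (ifindex)
			out[(*n)++] = ifindex;
		else if (mandatory)
			return -1;	/* a required device is missing */
	}
	return 0;
}

/*
 * Gather every ifindex that .newlink/.changelink would dereference, so
 * the caller can lock those devices before invoking the callback.
 * @out must have room for 4 * MODEL_MAX_DEPS entries.
 */
static int model_resolve_deps(const struct model_link_deps *deps,
			      const int *tb, const int *data,
			      int *out, size_t *n)
{
	*n = 0;
	if (model_collect(deps->mandatory.tb, tb, 1, out, n) ||
	    model_collect(deps->mandatory.data, data, 1, out, n) ||
	    model_collect(deps->optional.tb, tb, 0, out, n) ||
	    model_collect(deps->optional.data, data, 0, out, n))
		return -1;
	return 0;
}
```

With a table shaped like hsr_newlink_deps, the two slave ids must resolve
for the call to succeed, and the interlink id is added only when present.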
3) The target of this stage is to make the rest of the drivers that use
register_netdevice() share the same nd_lock between bound devices.
Examples: dsa, cfg80211, etc. These are just non-invasive refactorings.
Now all drivers are changed to share nd_lock in the correct way.
4) At this stage we make all netdev_master_upper_dev_link() calls happen
under nd_lock. Now every connection to a master is protected by nd_lock
(unlink is not).
5) At this stage we make all dev_change_net_namespace() calls happen under nd_lock.
6) Next is to make the NETDEV_REGISTER event notifier be called under nd_lock.
After the previous stage this is not difficult.
See the comments about netdevice events in the section on them below.
The patchset ends at this stage.
What is next?
1) More and more netdev parameters should be placed under a locked
nd_lock, as is done in this patchset: netdevice events, rtnl
callbacks, ioctls, etc.
The target is to take the device's nd_lock higher up in the stack
and to bring everything
from:

  ioctl()
    rtnl_lock()
    dev = __dev_get_by_index(index)
    func1()
      change dev parameter1
    func2()
      lock_netdev(dev)                <-- attention here
      netdev_master_upper_dev_link()
      change dev parameter2

to:

  ioctl()
    rtnl_lock()
    dev = __dev_get_by_index(index)
    lock_netdev(dev)                  <-- attention here
    func1()
      change dev parameter1
    func2()
      netdev_master_upper_dev_link()
      change dev parameter2

After we complete this, all device parameters will be protected
by nd_lock.
Keep in mind that although we have introduced nd_lock and begun
to use it, there is no concurrent access to device parameters
yet, as long as rtnl_mutex is still taken first. Everything
is still protected by rtnl_mutex. So our responsibility at this
point is not to prevent races connected with the introduction of nd_lock.
Our responsibility is to trace nested function calls and
to prevent taking an nd_lock that we already own.
2) The next stage is to swap the order of rtnl_mutex and nd_lock.
We dereference devices from RCU lists.

  ioctl(index)
    dev = netdev_get_by_index_locked(index, &nd_lock)  <-- attention to _locked suffix
    rtnl_lock()                                        <-- and here
    func1()
      change dev parameter1
    func2()
      netdev_master_upper_dev_link()
      change dev parameter2
    ...
    rtnl_unlock()
    unlock_netdev(nd_lock)

The new function netdev_get_by_index_locked() used above could look like this:

  struct net_device *netdev_get_by_index_locked(index, nd_lock_ptr)
  {
  again:
	dev = netdev_get_by_index(index, tracker);
	if (!dev)
		return NULL;
	if (!lock_netdev(dev, nd_lock_ptr)) { /* device was unregistered in parallel */
		netdev_put(dev, tracker);
		goto again;
	}
	netdev_put(dev, tracker);
	if (dev->ifindex != index) {
		unlock_netdev(*nd_lock_ptr);
		goto again;
	}
	return dev;
  }

A new function taking the locks of two devices will look the same way.
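The lookup, lock and recheck loop of netdev_get_by_index_locked() can be
exercised in userspace with a model like the one below. It is a sketch under
assumptions: the table stands in for the ifindex lookup, reference counting
(netdev_get/netdev_put) is left out, a device without an nd_lock is treated as
unregistered, and model_racing_unregister is a made-up knob that imitates an
unregister running in parallel between the lookup and the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_IFINDEX 4

struct nd_lock_model {
	bool locked;
};

struct netdev_model {
	int ifindex;
	struct nd_lock_model *nd_lock;	/* NULL once the device is unregistered */
};

/* Table standing in for the lookup of a device by ifindex. */
static struct netdev_model *model_table[MODEL_MAX_IFINDEX];

/* If >= 0, that ifindex is unregistered once, between lookup and lock. */
static int model_racing_unregister = -1;

static void model_unregister(int index)
{
	struct netdev_model *dev = model_table[index];

	if (dev) {
		dev->nd_lock = NULL;
		model_table[index] = NULL;
	}
}

static bool model_lock_netdev(struct netdev_model *dev,
			      struct nd_lock_model **lockp)
{
	struct nd_lock_model *lock = dev->nd_lock;

	if (!lock)
		return false;	/* device was unregistered in parallel */
	assert(!lock->locked);
	lock->locked = true;
	*lockp = lock;
	return true;
}

static struct netdev_model *model_get_by_index_locked(int index,
						      struct nd_lock_model **lockp)
{
	for (;;) {
		struct netdev_model *dev;

		if (index < 0 || index >= MODEL_MAX_IFINDEX)
			return NULL;
		dev = model_table[index];
		if (!dev)
			return NULL;

		if (model_racing_unregister >= 0) {
			model_unregister(model_racing_unregister);
			model_racing_unregister = -1;
		}

		if (!model_lock_netdev(dev, lockp))
			continue;	/* lost the race: look the index up again */

		if (dev->ifindex != index) {
			(*lockp)->locked = false;	/* index changed: unlock, retry */
			continue;
		}
		return dev;
	}
}
```

When the simulated unregister wins the race, the lock attempt fails, the
second lookup finds nothing, and the caller gets NULL without holding any
lock, which is the behaviour the retry loop above is meant to guarantee.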
3) Now rtnl_lock() moves down the stack, and this path becomes fast:

  ioctl(index)
    dev = netdev_get_by_index_locked(index, &nd_lock)
    func1()
      change dev parameter1
    func2()
      netdev_master_upper_dev_link()
      rtnl_lock();                    <--- now it is here
      change dev parameter2
      rtnl_unlock()
    ...
    unlock_netdev(nd_lock)

We will move more parameters out from under rtnl_lock, and one day nothing
will be protected by it. But we may leave it to protect something
unrelated to net devices, for historical reasons.
***)Netdevice events
int register_netdevice_notifier(struct notifier_block *nb)
int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb)
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
This may be converted into register_nd_lock_group_notifier()
---
Kirill Tkhai (51):
net: Move some checks from __rtnl_newlink() to caller
net: Add nlaattr check to rtnl_link_get_net_capable()
net: do_setlink() refactoring: move target_net acquiring to callers
net: Extract some code from __rtnl_newlink() to separate func
net: Move dereference of tb[IFLA_MASTER] up
net: Use unregister_netdevice_many() for both error cases in rtnl_newlink_create()
net: Introduce nd_lock and primitives to work with it
net: Initially attaching and detaching nd_lock
net: Use register_netdevice() in loopback()
net: Underline newlink and changelink dependencies
net: Make master and slaves (any dependent devices) share the same nd_lock in .setlink etc
net: Use __register_netdevice in trivial .newlink cases
infiniband_ipoib: Use __register_netdevice in .newlink
vxcan: Use __register_netdevice in .newlink
iavf: Use __register_netdevice()
geneve: Use __register_netdevice in .newlink
netkit: Use __register_netdevice in .newlink
qmi_wwan: Use __register_netdevice in .newlink
bpqether: Provide determined context in __register_netdevice()
ppp: Use __register_netdevice in .newlink
veth: Use __register_netdevice in .newlink
vxlan: Use __register_netdevice in .newlink
hdlc_fr: Use __register_netdevice
lapbeth: Provide determined context in __register_netdevice()
wwan: Use __register_netdevice in .newlink
6lowpan: Use __register_netdevice in .newlink
vlan: Use __register_netdevice in .newlink
dsa: Use __register_netdevice()
ip6gre: Use __register_netdevice() in .changelink
ip6_tunnel: Use __register_netdevice() in .newlink and .changelink
ip6_vti: Use __register_netdevice() in .newlink and .changelink
ip6_sit: Use __register_netdevice() in .newlink and .changelink
net: Now check nobody calls register_netdevice() with nd_lock attached
dsa: Make all switch tree ports relate to same nd_lock
cfg80211: Use fallback_nd_lock for registered devices
ieee802154: Use fallback_nd_lock for registered devices
net: Introduce delayed event work
failover: Link master and slave under nd_lock
netvsc: Make joined device to share master's nd_lock
openvswitch: Make ports share nd_lock of master device
bridge: Make port to have the same nd_lock as bridge
bond: Make master and slave relate to the same nd_lock
net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached
net: Call dellink with nd_lock is held
t7xx: Use __unregister_netdevice()
6lowpan: Use __unregister_netdevice()
netvsc: Call dev_change_net_namespace() under nd_lock
default_device: Call dev_change_net_namespace() under nd_lock
ieee802154: Call dev_change_net_namespace() under nd_lock
cfg80211: Call dev_change_net_namespace() under nd_lock
net: Make all NETDEV_REGISTER events to be called under nd_lock
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 3
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 12
drivers/net/amt.c | 9
drivers/net/bareudp.c | 2
drivers/net/bonding/bond_main.c | 4
drivers/net/bonding/bond_netlink.c | 7
drivers/net/bonding/bond_options.c | 4
drivers/net/can/vxcan.c | 8
drivers/net/ethernet/intel/iavf/iavf_main.c | 59 +-
drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 3
drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 2
drivers/net/geneve.c | 12
drivers/net/gtp.c | 2
drivers/net/hamradio/bpqether.c | 33 +
drivers/net/hyperv/netvsc_drv.c | 28 +
drivers/net/ipvlan/ipvlan_main.c | 5
drivers/net/ipvlan/ipvtap.c | 1
drivers/net/loopback.c | 6
drivers/net/macsec.c | 5
drivers/net/macvlan.c | 5
drivers/net/macvtap.c | 1
drivers/net/netkit.c | 8
drivers/net/pfcp.c | 2
drivers/net/ppp/ppp_generic.c | 13
drivers/net/team/team_core.c | 2
drivers/net/usb/qmi_wwan.c | 14
drivers/net/veth.c | 11
drivers/net/vrf.c | 6
drivers/net/vxlan/vxlan_core.c | 42 +
drivers/net/wan/hdlc_fr.c | 18 -
drivers/net/wan/lapbether.c | 28 +
drivers/net/wireguard/device.c | 2
drivers/net/wireless/ath/ath6kl/core.c | 2
drivers/net/wireless/ath/wil6210/netdev.c | 2
drivers/net/wireless/marvell/mwifiex/main.c | 5
drivers/net/wireless/quantenna/qtnfmac/core.c | 2
drivers/net/wireless/virtual/virt_wifi.c | 5
drivers/net/wwan/iosm/iosm_ipc_wwan.c | 2
drivers/net/wwan/mhi_wwan_mbim.c | 2
drivers/net/wwan/t7xx/t7xx_netdev.c | 4
drivers/net/wwan/wwan_core.c | 13
include/linux/netdevice.h | 38 +
include/net/rtnetlink.h | 16 +
net/6lowpan/core.c | 16 -
net/8021q/vlan.c | 11
net/8021q/vlan_netlink.c | 1
net/batman-adv/soft-interface.c | 2
net/bridge/br_ioctl.c | 8
net/bridge/br_netlink.c | 2
net/caif/chnl_net.c | 2
net/core/dev.c | 576 +++++++++++++++++++-
net/core/dev_ioctl.c | 1
net/core/failover.c | 24 +
net/core/rtnetlink.c | 478 ++++++++++++-----
net/dsa/dsa.c | 14
net/dsa/netlink.c | 5
net/dsa/user.c | 25 +
net/hsr/hsr_device.c | 4
net/hsr/hsr_netlink.c | 6
net/ieee802154/6lowpan/core.c | 1
net/ieee802154/core.c | 2
net/ieee802154/nl802154.c | 8
net/ipv4/ip_tunnel.c | 4
net/ipv6/ip6_gre.c | 28 +
net/ipv6/ip6_tunnel.c | 37 +
net/ipv6/ip6_vti.c | 36 +
net/ipv6/sit.c | 45 +-
net/mac80211/main.c | 2
net/mac802154/cfg.c | 2
net/mac802154/iface.c | 10
net/mac802154/main.c | 2
net/openvswitch/vport-netdev.c | 6
net/wireless/core.c | 12
net/wireless/nl80211.c | 15 +
net/xfrm/xfrm_interface_core.c | 2
75 files changed, 1532 insertions(+), 303 deletions(-)
--
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
* [PATCH NET-PREV 01/51] net: Move some checks from __rtnl_newlink() to caller
From: Kirill Tkhai @ 2025-03-22 14:37 UTC
To: netdev, linux-kernel; +Cc: tkhai
This patch prepares the rtnetlink code for using nd_lock.
It is a step towards moving the dereference of tb[IFLA_MASTER] up
to where the main dev is dereferenced by ifi_index.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 73fd7f543fd0..b33a7e86c534 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3572,15 +3572,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
#ifdef CONFIG_MODULES
replay:
#endif
- err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
- ifla_policy, extack);
- if (err < 0)
- return err;
-
- err = rtnl_ensure_unique_netns(tb, extack, false);
- if (err < 0)
- return err;
-
ifm = nlmsg_data(nlh);
if (ifm->ifi_index > 0) {
link_specified = true;
@@ -3734,13 +3725,25 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct rtnl_newlink_tbs *tbs;
+ struct nlattr **tb;
int ret;
tbs = kmalloc(sizeof(*tbs), GFP_KERNEL);
if (!tbs)
return -ENOMEM;
+ tb = tbs->tb;
+
+ ret = nlmsg_parse_deprecated(nlh, sizeof(struct ifinfomsg), tb,
+ IFLA_MAX, ifla_policy, extack);
+ if (ret < 0)
+ goto out;
+
+ ret = rtnl_ensure_unique_netns(tb, extack, false);
+ if (ret < 0)
+ goto out;
ret = __rtnl_newlink(skb, nlh, tbs, extack);
+out:
kfree(tbs);
return ret;
}
* [PATCH NET-PREV 02/51] net: Add nlaattr check to rtnl_link_get_net_capable()
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
This patch prepares the rtnetlink code for using nd_lock.
It is a step towards moving the dereference of tb[IFLA_MASTER] up
to where the main dev is dereferenced by ifi_index.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b33a7e86c534..34e35b81cfa6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2363,6 +2363,9 @@ static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
{
struct net *net;
+ if (!tb[IFLA_NET_NS_PID] && !tb[IFLA_NET_NS_FD] && !tb[IFLA_TARGET_NETNSID])
+ return NULL;
+
net = rtnl_link_get_net_by_nlattr(src_net, tb);
if (IS_ERR(net))
return net;
@@ -3480,6 +3483,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
if (IS_ERR(dest_net))
return PTR_ERR(dest_net);
+ dest_net = dest_net ? : get_net(net);
if (tb[IFLA_LINK_NETNSID]) {
int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
* [PATCH NET-PREV 03/51] net: do_setlink() refactoring: move target_net acquiring to callers
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
This patch prepares the rtnetlink code for using nd_lock.
It is a step towards moving the dereference of tb[IFLA_MASTER] up
to where the main dev is dereferenced by ifi_index.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 78 +++++++++++++++++++++++++++++++-------------------
1 file changed, 49 insertions(+), 29 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 34e35b81cfa6..a5af69af235f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2774,7 +2774,7 @@ static int do_set_proto_down(struct net_device *dev,
#define DO_SETLINK_MODIFIED 0x01
/* notify flag means notify + modified. */
#define DO_SETLINK_NOTIFY 0x03
-static int do_setlink(const struct sk_buff *skb,
+static int do_setlink(struct net *net, const struct sk_buff *skb,
struct net_device *dev, struct ifinfomsg *ifm,
struct netlink_ext_ack *extack,
struct nlattr **tb, int status)
@@ -2788,25 +2788,16 @@ static int do_setlink(const struct sk_buff *skb,
else
ifname[0] = '\0';
- if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD] || tb[IFLA_TARGET_NETNSID]) {
+ if (net) { /* target net */
const char *pat = ifname[0] ? ifname : NULL;
- struct net *net;
int new_ifindex;
- net = rtnl_link_get_net_capable(skb, dev_net(dev),
- tb, CAP_NET_ADMIN);
- if (IS_ERR(net)) {
- err = PTR_ERR(net);
- goto errout;
- }
-
if (tb[IFLA_NEW_IFINDEX])
new_ifindex = nla_get_s32(tb[IFLA_NEW_IFINDEX]);
else
new_ifindex = 0;
err = __dev_change_net_namespace(dev, net, pat, new_ifindex);
- put_net(net);
if (err)
goto errout;
status |= DO_SETLINK_MODIFIED;
@@ -3171,6 +3162,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
struct net_device *dev;
+ struct net *target_net = NULL;
int err;
struct nlattr *tb[IFLA_MAX+1];
@@ -3183,6 +3175,13 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
goto errout;
+ target_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+ if (IS_ERR(target_net)) {
+ err = PTR_ERR(target_net);
+ target_net = NULL;
+ goto errout;
+ }
+
err = -EINVAL;
ifm = nlmsg_data(nlh);
if (ifm->ifi_index > 0)
@@ -3201,8 +3200,10 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
goto errout;
- err = do_setlink(skb, dev, ifm, extack, tb, 0);
+ err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
errout:
+ if (target_net)
+ put_net(target_net);
return err;
}
@@ -3440,38 +3441,51 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
struct nlattr **tb)
{
struct net_device *dev, *aux;
+ struct net *target_net;
int err;
+ target_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+ if (IS_ERR(target_net)) {
+ err = PTR_ERR(target_net);
+ target_net = NULL;
+ goto out;
+ }
+
for_each_netdev_safe(net, dev, aux) {
if (dev->group == group) {
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
- return err;
- err = do_setlink(skb, dev, ifm, extack, tb, 0);
+ break;
+ err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
if (err < 0)
- return err;
+ break;
}
}
-
- return 0;
+out:
+ if (target_net)
+ put_net(target_net);
+ return err;
}
static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
const struct rtnl_link_ops *ops,
const struct nlmsghdr *nlh,
struct nlattr **tb, struct nlattr **data,
- struct netlink_ext_ack *extack)
+ struct netlink_ext_ack *extack,
+ struct net *dest_net)
{
unsigned char name_assign_type = NET_NAME_USER;
struct net *net = sock_net(skb->sk);
u32 portid = NETLINK_CB(skb).portid;
- struct net *dest_net, *link_net;
+ struct net *link_net;
struct net_device *dev;
char ifname[IFNAMSIZ];
int err;
if (!ops->alloc && !ops->setup)
return -EOPNOTSUPP;
+ if (!dest_net)
+ dest_net = net;
if (tb[IFLA_IFNAME]) {
nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
@@ -3480,11 +3494,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
name_assign_type = NET_NAME_ENUM;
}
- dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
- if (IS_ERR(dest_net))
- return PTR_ERR(dest_net);
- dest_net = dest_net ? : get_net(net);
-
if (tb[IFLA_LINK_NETNSID]) {
int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
@@ -3535,7 +3544,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
out:
if (link_net)
put_net(link_net);
- put_net(dest_net);
return err;
out_unregister:
if (ops->newlink) {
@@ -3557,7 +3565,8 @@ struct rtnl_newlink_tbs {
static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtnl_newlink_tbs *tbs,
- struct netlink_ext_ack *extack)
+ struct netlink_ext_ack *extack,
+ struct net *target_net)
{
struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
struct nlattr ** const tb = tbs->tb;
@@ -3688,7 +3697,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
status |= DO_SETLINK_NOTIFY;
}
- return do_setlink(skb, dev, ifm, extack, tb, status);
+ return do_setlink(target_net, skb, dev, ifm, extack, tb, status);
}
if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
@@ -3722,12 +3731,14 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
return -EOPNOTSUPP;
}
- return rtnl_newlink_create(skb, ifm, ops, nlh, tb, data, extack);
+ return rtnl_newlink_create(skb, ifm, ops, nlh, tb, data, extack, target_net);
}
static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
+ struct net *net = sock_net(skb->sk);
+ struct net *target_net = NULL;
struct rtnl_newlink_tbs *tbs;
struct nlattr **tb;
int ret;
@@ -3746,8 +3757,17 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (ret < 0)
goto out;
- ret = __rtnl_newlink(skb, nlh, tbs, extack);
+ target_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+ if (IS_ERR(target_net)) {
+ ret = PTR_ERR(target_net);
+ target_net = NULL;
+ goto out;
+ }
+
+ ret = __rtnl_newlink(skb, nlh, tbs, extack, target_net);
out:
+ if (target_net)
+ put_net(target_net);
kfree(tbs);
return ret;
}
* [PATCH NET-PREV 04/51] net: Extract some code from __rtnl_newlink() to separate func
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
This patch prepares the rtnetlink code for using nd_lock.
It is a step towards moving the dereference of tb[IFLA_MASTER] up
to where the main dev is dereferenced by ifi_index.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 167 +++++++++++++++++++++++++++-----------------------
1 file changed, 91 insertions(+), 76 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a5af69af235f..6da137f1a764 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3563,6 +3563,80 @@ struct rtnl_newlink_tbs {
struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
};
+static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct rtnl_newlink_tbs *tbs,
+ struct netlink_ext_ack *extack,
+ struct net *target_net, struct net_device *dev,
+ const struct rtnl_link_ops *ops,
+ struct nlattr **linkinfo, struct nlattr **data)
+{
+ const struct rtnl_link_ops *m_ops = NULL;
+ struct ifinfomsg *ifm = nlmsg_data(nlh);
+ struct nlattr ** const tb = tbs->tb;
+ struct nlattr **slave_data = NULL;
+ struct net_device *master_dev;
+ int err, status = 0;
+
+ if (nlh->nlmsg_flags & NLM_F_EXCL)
+ return -EEXIST;
+ if (nlh->nlmsg_flags & NLM_F_REPLACE)
+ return -EOPNOTSUPP;
+
+ err = validate_linkmsg(dev, tb, extack);
+ if (err < 0)
+ return err;
+
+ master_dev = netdev_master_upper_dev_get(dev);
+ if (master_dev)
+ m_ops = master_dev->rtnl_link_ops;
+
+ if (m_ops) {
+ err = -EINVAL;
+ if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
+ goto out;
+
+ if (m_ops->slave_maxtype &&
+ linkinfo[IFLA_INFO_SLAVE_DATA]) {
+ err = nla_parse_nested_deprecated(tbs->slave_attr,
+ m_ops->slave_maxtype,
+ linkinfo[IFLA_INFO_SLAVE_DATA],
+ m_ops->slave_policy,
+ extack);
+ if (err < 0)
+ goto out;
+ slave_data = tbs->slave_attr;
+ }
+ }
+
+ if (linkinfo[IFLA_INFO_DATA]) {
+ err = -EOPNOTSUPP;
+ if (!ops || ops != dev->rtnl_link_ops ||
+ !ops->changelink)
+ goto out;
+
+ err = ops->changelink(dev, tb, data, extack);
+ if (err < 0)
+ goto out;
+ status |= DO_SETLINK_NOTIFY;
+ }
+
+ if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
+ err = -EOPNOTSUPP;
+ if (!m_ops || !m_ops->slave_changelink)
+ goto out;
+
+ err = m_ops->slave_changelink(master_dev, dev, tb,
+ slave_data, extack);
+ if (err < 0)
+ goto out;
+ status |= DO_SETLINK_NOTIFY;
+ }
+
+ err = do_setlink(target_net, skb, dev, ifm, extack, tb, status);
+out:
+ return err;
+}
+
static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtnl_newlink_tbs *tbs,
struct netlink_ext_ack *extack,
@@ -3570,11 +3644,8 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
{
struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
struct nlattr ** const tb = tbs->tb;
- const struct rtnl_link_ops *m_ops;
- struct net_device *master_dev;
struct net *net = sock_net(skb->sk);
const struct rtnl_link_ops *ops;
- struct nlattr **slave_data;
char kind[MODULE_NAME_LEN];
struct net_device *dev;
struct ifinfomsg *ifm;
@@ -3585,29 +3656,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
#ifdef CONFIG_MODULES
replay:
#endif
- ifm = nlmsg_data(nlh);
- if (ifm->ifi_index > 0) {
- link_specified = true;
- dev = __dev_get_by_index(net, ifm->ifi_index);
- } else if (ifm->ifi_index < 0) {
- NL_SET_ERR_MSG(extack, "ifindex can't be negative");
- return -EINVAL;
- } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
- link_specified = true;
- dev = rtnl_dev_get(net, tb);
- } else {
- link_specified = false;
- dev = NULL;
- }
-
- master_dev = NULL;
- m_ops = NULL;
- if (dev) {
- master_dev = netdev_master_upper_dev_get(dev);
- if (master_dev)
- m_ops = master_dev->rtnl_link_ops;
- }
-
if (tb[IFLA_LINKINFO]) {
err = nla_parse_nested_deprecated(linkinfo, IFLA_INFO_MAX,
tb[IFLA_LINKINFO],
@@ -3645,59 +3693,26 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}
- slave_data = NULL;
- if (m_ops) {
- if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
- return -EINVAL;
-
- if (m_ops->slave_maxtype &&
- linkinfo[IFLA_INFO_SLAVE_DATA]) {
- err = nla_parse_nested_deprecated(tbs->slave_attr,
- m_ops->slave_maxtype,
- linkinfo[IFLA_INFO_SLAVE_DATA],
- m_ops->slave_policy,
- extack);
- if (err < 0)
- return err;
- slave_data = tbs->slave_attr;
- }
+ ifm = nlmsg_data(nlh);
+ if (ifm->ifi_index > 0) {
+ link_specified = true;
+ dev = __dev_get_by_index(net, ifm->ifi_index);
+ } else if (ifm->ifi_index < 0) {
+ NL_SET_ERR_MSG(extack, "ifindex can't be negative");
+ return -EINVAL;
+ } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
+ link_specified = true;
+ dev = rtnl_dev_get(net, tb);
+ } else {
+ link_specified = false;
+ dev = NULL;
}
if (dev) {
- int status = 0;
-
- if (nlh->nlmsg_flags & NLM_F_EXCL)
- return -EEXIST;
- if (nlh->nlmsg_flags & NLM_F_REPLACE)
- return -EOPNOTSUPP;
-
- err = validate_linkmsg(dev, tb, extack);
- if (err < 0)
- return err;
-
- if (linkinfo[IFLA_INFO_DATA]) {
- if (!ops || ops != dev->rtnl_link_ops ||
- !ops->changelink)
- return -EOPNOTSUPP;
-
- err = ops->changelink(dev, tb, data, extack);
- if (err < 0)
- return err;
- status |= DO_SETLINK_NOTIFY;
- }
-
- if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
- if (!m_ops || !m_ops->slave_changelink)
- return -EOPNOTSUPP;
-
- err = m_ops->slave_changelink(master_dev, dev, tb,
- slave_data, extack);
- if (err < 0)
- return err;
- status |= DO_SETLINK_NOTIFY;
- }
-
- return do_setlink(target_net, skb, dev, ifm, extack, tb, status);
+ err = __rtnl_newlink_setlink(skb, nlh, tbs, extack,
+ target_net, dev,
+ ops, linkinfo, data);
+ return err;
}
if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
* [PATCH NET-PREV 05/51] net: Move dereference of tb[IFLA_MASTER] up
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
...to where the main dev is looked up by ifi_index.
This patch prepares the rtnetlink code for using nd_lock.
Looking up dev and master in the same place allows
both of them to be double-locked at the same time.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
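For readers unfamiliar with the return convention of the rtnl_master_get()
helper added below, here is a minimal user-space sketch of its three
outcomes: NULL when no master was requested, an error pointer when the
requested ifindex does not resolve, and the device otherwise. The
ERR_PTR()/IS_ERR()/PTR_ERR() helpers are simplified stand-ins for the kernel
ones, and toy_dev, toy_devs, toy_dev_get_by_index() and toy_master_get() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)error;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical device table standing in for a network namespace. */
struct toy_dev {
	int ifindex;
};

static struct toy_dev toy_devs[] = { { 1 }, { 2 }, { 7 } };

static inline struct toy_dev *toy_dev_get_by_index(int ifindex)
{
	size_t i;

	for (i = 0; i < sizeof(toy_devs) / sizeof(toy_devs[0]); i++)
		if (toy_devs[i].ifindex == ifindex)
			return &toy_devs[i];
	return NULL;
}

/* has_master models "tb[IFLA_MASTER] is present" in the real helper. */
static inline struct toy_dev *toy_master_get(int has_master, int ifindex)
{
	struct toy_dev *master;

	if (!has_master)
		return NULL;			/* nothing requested */
	master = toy_dev_get_by_index(ifindex);
	if (!master)
		return ERR_PTR(-EINVAL);	/* requested, but not found */
	return master;				/* requested and found */
}
```

A caller therefore has to test IS_ERR() before testing for NULL, which is
what the call sites in this patch do.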
net/core/rtnetlink.c | 72 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 53 insertions(+), 19 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6da137f1a764..a33b60d1de2d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2675,7 +2675,7 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
return err;
}
-static int do_set_master(struct net_device *dev, int ifindex,
+static int do_set_master(struct net_device *dev, struct net_device *master,
struct netlink_ext_ack *extack)
{
struct net_device *upper_dev = netdev_master_upper_dev_get(dev);
@@ -2683,7 +2683,7 @@ static int do_set_master(struct net_device *dev, int ifindex,
int err;
if (upper_dev) {
- if (upper_dev->ifindex == ifindex)
+ if (upper_dev == master)
return 0;
ops = upper_dev->netdev_ops;
if (ops->ndo_del_slave) {
@@ -2695,10 +2695,8 @@ static int do_set_master(struct net_device *dev, int ifindex,
}
}
- if (ifindex) {
- upper_dev = __dev_get_by_index(dev_net(dev), ifindex);
- if (!upper_dev)
- return -EINVAL;
+ if (master) {
+ upper_dev = master;
ops = upper_dev->netdev_ops;
if (ops->ndo_add_slave) {
err = ops->ndo_add_slave(upper_dev, dev, extack);
@@ -2775,7 +2773,8 @@ static int do_set_proto_down(struct net_device *dev,
/* notify flag means notify + modified. */
#define DO_SETLINK_NOTIFY 0x03
static int do_setlink(struct net *net, const struct sk_buff *skb,
- struct net_device *dev, struct ifinfomsg *ifm,
+ struct net_device *dev, struct net_device *master,
+ struct ifinfomsg *ifm,
struct netlink_ext_ack *extack,
struct nlattr **tb, int status)
{
@@ -2897,8 +2896,8 @@ static int do_setlink(struct net *net, const struct sk_buff *skb,
goto errout;
}
- if (tb[IFLA_MASTER]) {
- err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack);
+ if (master) {
+ err = do_set_master(dev, master, extack);
if (err)
goto errout;
status |= DO_SETLINK_MODIFIED;
@@ -3156,12 +3155,24 @@ static struct net_device *rtnl_dev_get(struct net *net,
return __dev_get_by_name(net, ifname);
}
+static struct net_device *rtnl_master_get(struct net *net, struct nlattr *tb[])
+{
+ struct net_device *master;
+
+ if (!tb[IFLA_MASTER])
+ return NULL;
+ master = __dev_get_by_index(net, nla_get_u32(tb[IFLA_MASTER]));
+ if (!master)
+ return ERR_PTR(-EINVAL);
+ return master;
+}
+
static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
- struct net_device *dev;
+ struct net_device *dev, *master = NULL;
struct net *target_net = NULL;
int err;
struct nlattr *tb[IFLA_MAX+1];
@@ -3196,11 +3207,17 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
goto errout;
}
+ master = rtnl_master_get(target_net ? : net, tb);
+ if (IS_ERR(master)) {
+ err = -EINVAL;
+ goto errout;
+ }
+
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
goto errout;
- err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
+ err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
errout:
if (target_net)
put_net(target_net);
@@ -3440,7 +3457,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
struct netlink_ext_ack *extack,
struct nlattr **tb)
{
- struct net_device *dev, *aux;
+ struct net_device *dev, *aux, *master = NULL;
struct net *target_net;
int err;
@@ -3451,12 +3468,18 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
goto out;
}
+ master = rtnl_master_get(target_net ? : net, tb);
+ if (IS_ERR(master)) {
+ err = -EINVAL;
+ goto out;
+ }
+
for_each_netdev_safe(net, dev, aux) {
if (dev->group == group) {
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
break;
- err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
+ err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
if (err < 0)
break;
}
@@ -3478,7 +3501,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
struct net *net = sock_net(skb->sk);
u32 portid = NETLINK_CB(skb).portid;
struct net *link_net;
- struct net_device *dev;
+ struct net_device *dev, *master = NULL;
char ifname[IFNAMSIZ];
int err;
@@ -3519,6 +3542,12 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
dev->ifindex = ifm->ifi_index;
+ master = rtnl_master_get(link_net ? : dest_net, tb);
+ if (IS_ERR(master)) {
+ err = -EINVAL;
+ goto out;
+ }
+
if (ops->newlink)
err = ops->newlink(link_net ? : net, dev, tb, data, extack);
else
@@ -3536,8 +3565,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
if (err < 0)
goto out_unregister;
}
- if (tb[IFLA_MASTER]) {
- err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack);
+ if (master) {
+ err = do_set_master(dev, master, extack);
if (err)
goto out_unregister;
}
@@ -3567,6 +3596,7 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtnl_newlink_tbs *tbs,
struct netlink_ext_ack *extack,
struct net *target_net, struct net_device *dev,
+ struct net_device *new_master,
const struct rtnl_link_ops *ops,
struct nlattr **linkinfo, struct nlattr **data)
{
@@ -3632,7 +3662,7 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
status |= DO_SETLINK_NOTIFY;
}
- err = do_setlink(target_net, skb, dev, ifm, extack, tb, status);
+ err = do_setlink(target_net, skb, dev, new_master, ifm, extack, tb, status);
out:
return err;
}
@@ -3647,7 +3677,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
const struct rtnl_link_ops *ops;
char kind[MODULE_NAME_LEN];
- struct net_device *dev;
+ struct net_device *dev, *new_master = NULL;
struct ifinfomsg *ifm;
struct nlattr **data;
bool link_specified;
@@ -3709,8 +3739,12 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
if (dev) {
+ new_master = rtnl_master_get(target_net ? : net, tb);
+ if (IS_ERR(new_master))
+ return -EINVAL;
+
err = __rtnl_newlink_setlink(skb, nlh, tbs, extack,
- target_net, dev,
+ target_net, dev, new_master,
ops, linkinfo, data);
return err;
}
* [PATCH NET-PREV 06/51] net: Use unregister_netdevice_many() for both error cases in rtnl_newlink_create()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (4 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 05/51] net: Move dereference of tb[IFLA_MASTER] up Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 07/51] net: Introduce nd_lock and primitives to work with it Kirill Tkhai
` (46 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a33b60d1de2d..046736091b4f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3503,6 +3503,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
struct net *link_net;
struct net_device *dev, *master = NULL;
char ifname[IFNAMSIZ];
+ LIST_HEAD(list_kill);
int err;
if (!ops->alloc && !ops->setup)
@@ -3576,13 +3577,11 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
return err;
out_unregister:
if (ops->newlink) {
- LIST_HEAD(list_kill);
-
ops->dellink(dev, &list_kill);
- unregister_netdevice_many(&list_kill);
} else {
- unregister_netdevice(dev);
+ unregister_netdevice_queue(dev, &list_kill);
}
+ unregister_netdevice_many(&list_kill);
goto out;
}
* [PATCH NET-PREV 07/51] net: Introduce nd_lock and primitives to work with it
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (5 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 06/51] net: Use unregister_netdevice_many() for both error cases in rtnl_newlink_create() Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 08/51] net: Initially attaching and detaching nd_lock Kirill Tkhai
` (45 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
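The locking order used by locks_ordered() and double_lock_netdev() below
(fallback lock first, then the lock with the smaller address) can be sketched
in user space with pthread mutexes. This is only an illustration under
simplifying assumptions: toy_lock, toy_fallback, toy_locks_ordered(),
toy_double_lock() and toy_double_unlock() are hypothetical names, and the
sketch leaves out the reference counting and the "recheck dev->nd_lock after
locking" loop that the real code needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_lock {
	pthread_mutex_t mutex;
};

/* Stand-in for fallback_nd_lock: always first in the locking order. */
static struct toy_lock toy_fallback = { PTHREAD_MUTEX_INITIALIZER };

/* True if @a may be taken before @b. */
static inline bool toy_locks_ordered(struct toy_lock *a, struct toy_lock *b)
{
	if (a == &toy_fallback)
		return true;
	return (uintptr_t)a <= (uintptr_t)b && b != &toy_fallback;
}

/* Take both locks in the global order, whatever order the caller passes. */
static inline void toy_double_lock(struct toy_lock *a, struct toy_lock *b)
{
	assert(a && b);
	if (!toy_locks_ordered(a, b)) {
		struct toy_lock *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->mutex);
	if (a != b)	/* two devices may share one lock */
		pthread_mutex_lock(&b->mutex);
}

static inline void toy_double_unlock(struct toy_lock *a, struct toy_lock *b)
{
	pthread_mutex_unlock(&a->mutex);
	if (a != b)
		pthread_mutex_unlock(&b->mutex);
}
```

Because every path agrees on one total order, two tasks locking the same
pair of devices in opposite argument order cannot deadlock on each other.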
include/linux/netdevice.h | 28 ++++
net/core/dev.c | 329 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 357 insertions(+)
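The rule by which nd_lock_transfer_devices() picks the surviving lock can be
modelled with plain counters. The sketch below is a user-space toy, not the
kernel code: toy_group, toy_fallback_group and toy_transfer() are
hypothetical names, and it assumes (as attach_nd_lock() suggests) that each
attached device accounts for one reference on its lock, so the references
move together with the devices.

```c
#include <assert.h>
#include <stddef.h>

struct toy_group {
	int nr;		/* devices attached to this lock */
	int usage;	/* one reference per device, plus current holders */
};

/* Stand-in for fallback_nd_lock: no devices yet, one static reference. */
static struct toy_group toy_fallback_group = { 0, 1 };

/* Merge the two groups and return the one that now owns all devices. */
static inline struct toy_group *toy_transfer(struct toy_group *a,
					     struct toy_group *b)
{
	struct toy_group *src = a, *dst = b, *tmp;

	assert(a && b);
	if (src == dst)
		return dst;

	/* Rule 1: the fallback group always survives.
	 * Rule 2: otherwise migrate from the smaller group.
	 */
	if (src == &toy_fallback_group ||
	    (dst != &toy_fallback_group && src->nr > dst->nr)) {
		tmp = src;
		src = dst;
		dst = tmp;
	}

	/* Move the per-device references together with the devices. */
	dst->usage += src->nr;
	dst->nr += src->nr;
	src->usage -= src->nr;
	src->nr = 0;
	return dst;
}
```

After the merge the emptied group keeps only the references of its current
holders, so it can be freed once they drop it.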
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 614ec5d3d75b..e36e64310bd4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1716,6 +1716,14 @@ enum netdev_reg_state {
NETREG_DUMMY, /* dummy device for NAPI poll */
};
+struct nd_lock {
+ struct mutex mutex;
+ struct list_head list;
+ int nr; /* number of entries in list */
+ refcount_t usage;
+ struct rcu_head rcu;
+};
+
/**
* struct net_device - The DEVICE structure.
*
@@ -2081,6 +2089,8 @@ struct net_device {
char name[IFNAMSIZ];
struct netdev_name_node *name_node;
struct dev_ifalias __rcu *ifalias;
+ struct nd_lock __rcu *nd_lock; /* lock protecting this dev */
+ struct list_head nd_lock_entry; /* entry in nd_lock::list */
/*
* I/O specific fields
* FIXME: Merge these and struct ifmap into one
@@ -3094,6 +3104,24 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
return ret;
}
+void unlock_netdev(struct nd_lock *nd_lock);
+bool lock_netdev(struct net_device *dev, struct nd_lock **nd_lock);
+bool lock_netdev_nested(struct net_device *dev, struct nd_lock **nd_lock,
+ struct nd_lock *held_lock);
+bool double_lock_netdev(struct net_device *dev, struct nd_lock **nd_lock,
+ struct net_device *dev2, struct nd_lock **nd_lock2);
+void double_unlock_netdev(struct nd_lock *nd_lock, struct nd_lock *nd_lock2);
+
+struct nd_lock *alloc_nd_lock(void);
+void put_nd_lock(struct nd_lock *nd_lock);
+void attach_nd_lock(struct net_device *dev, struct nd_lock *nd_lock);
+void detach_nd_lock(struct net_device *dev);
+struct nd_lock *attach_new_nd_lock(struct net_device *dev);
+
+extern struct nd_lock fallback_nd_lock;
+
+void nd_lock_transfer_devices(struct nd_lock **p_lock, struct nd_lock **p_lock2);
+
int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
void unregister_netdevice_many(struct list_head *head);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0d0b983a6c21..9d98ab1e76bd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -170,6 +170,25 @@ static int call_netdevice_notifiers_extack(unsigned long val,
struct net_device *dev,
struct netlink_ext_ack *extack);
+/* While unregistering many devices at once, e.g., in ->exit_batch_rtnl
+ * methods, every netdev must be locked.
+ * Instead of taking all the original nd_locks of the devices at once, we
+ * transfer the devices to this @fallback_nd_lock. This allows holding a
+ * single lock during unregistration. See locks_ordered() for locking order
+ * details.
+ *
+ * A low-priority TODO is to change this algorithm to transfer every device
+ * to one of the original locks of these devices instead.
+ *
+ * XXX: see the comment on nd_lock_transfer_devices().
+ */
+struct nd_lock fallback_nd_lock = {
+ .mutex = __MUTEX_INITIALIZER(fallback_nd_lock.mutex),
+ .list = LIST_HEAD_INIT(fallback_nd_lock.list),
+ .usage = REFCOUNT_INIT(1),
+};
+EXPORT_SYMBOL(fallback_nd_lock);
+
static DEFINE_MUTEX(ifalias_mutex);
/* protects napi_hash addition/deletion and napi_gen_id */
@@ -10322,6 +10341,315 @@ static void netdev_do_free_pcpu_stats(struct net_device *dev)
}
}
+struct nd_lock *alloc_nd_lock(void)
+{
+ struct nd_lock *nd_lock = kmalloc(sizeof(*nd_lock), GFP_KERNEL);
+
+ if (!nd_lock)
+ return NULL;
+
+ mutex_init(&nd_lock->mutex);
+ INIT_LIST_HEAD(&nd_lock->list);
+ nd_lock->nr = 0;
+ refcount_set(&nd_lock->usage, 1);
+ return nd_lock;
+}
+EXPORT_SYMBOL(alloc_nd_lock);
+
+void put_nd_lock(struct nd_lock *nd_lock)
+{
+ if (!refcount_dec_and_test(&nd_lock->usage))
+ return;
+ BUG_ON(!list_empty(&nd_lock->list));
+ kfree_rcu(nd_lock, rcu);
+}
+EXPORT_SYMBOL(put_nd_lock);
+
+/* Locking order: fallback_nd_lock is first,
+ * then prefer lock with smaller address.
+ */
+static bool locks_ordered(struct nd_lock *nd_lock, struct nd_lock *nd_lock2)
+{
+ if ((nd_lock <= nd_lock2 && nd_lock2 != &fallback_nd_lock) ||
+ nd_lock == &fallback_nd_lock)
+ return true;
+ return false;
+}
+
+/* Lock a live @dev or return false. @held_lock is an optional argument.
+ * If @held_lock is passed, the caller must guarantee that
+ * dev->nd_lock is after @held_lock in the locking order (for details
+ * see locks_ordered()).
+ * Usually, held_lock is fallback_nd_lock.
+ */
+static bool __lock_netdev(struct net_device *dev, struct nd_lock **ret_nd_lock,
+ struct nd_lock *held_lock)
+{
+ struct nd_lock *nd_lock;
+ bool got;
+
+ if (held_lock)
+ lockdep_assert_held(&held_lock->mutex);
+
+ while (1) {
+ rcu_read_lock();
+ nd_lock = rcu_dereference(dev->nd_lock);
+ if (nd_lock && nd_lock != held_lock)
+ got = refcount_inc_not_zero(&nd_lock->usage);
+ rcu_read_unlock();
+
+ if (unlikely(!nd_lock)) {
+ /* Someone is unregistering @dev in parallel */
+ *ret_nd_lock = NULL;
+ return false;
+ }
+
+ /* The same lock as we own. Nothing to do. */
+ if (nd_lock == held_lock)
+ break;
+
+ if (unlikely(!got)) {
+ /* @dev->nd_lock changed or @dev is unregistering */
+ cond_resched();
+ continue;
+ }
+
+ WARN_ON(held_lock && !locks_ordered(held_lock, nd_lock));
+
+ if (!held_lock)
+ mutex_lock(&nd_lock->mutex);
+ else
+ mutex_lock_nested(&nd_lock->mutex, SINGLE_DEPTH_NESTING);
+ /* Check after mutex is locked it has not changed */
+ if (likely(nd_lock == rcu_access_pointer(dev->nd_lock)))
+ break;
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+ cond_resched();
+ }
+
+ *ret_nd_lock = nd_lock;
+ return true;
+}
+
+bool lock_netdev(struct net_device *dev, struct nd_lock **nd_lock)
+{
+ return __lock_netdev(dev, nd_lock, NULL);
+}
+EXPORT_SYMBOL(lock_netdev);
+
+bool lock_netdev_nested(struct net_device *dev, struct nd_lock **nd_lock,
+ struct nd_lock *held_lock)
+{
+ return __lock_netdev(dev, nd_lock, held_lock);
+}
+EXPORT_SYMBOL(lock_netdev_nested);
+
+void unlock_netdev(struct nd_lock *nd_lock)
+{
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+}
+EXPORT_SYMBOL(unlock_netdev);
+
+/* Double lock two devices and return the locks they are currently attached
+ * to. It's acceptable for one of the devices to be NULL, or for @dev and
+ * @dev2 to point to the same device. The paired double_unlock_netdev() is
+ * able to handle such cases.
+ */
+bool double_lock_netdev(struct net_device *dev, struct nd_lock **ret_nd_lock,
+ struct net_device *dev2, struct nd_lock **ret_nd_lock2)
+{
+ struct nd_lock *nd_lock, *nd_lock2;
+ bool got, got2, ret;
+
+ if (WARN_ON_ONCE(!dev && !dev2))
+ return false;
+
+ if (dev == dev2 || !dev != !dev2) {
+ ret = lock_netdev(dev ? : dev2, ret_nd_lock);
+ *ret_nd_lock2 = *ret_nd_lock;
+ return ret;
+ }
+
+ while (1) {
+ got = got2 = false;
+ rcu_read_lock();
+ nd_lock = rcu_dereference(dev->nd_lock);
+ if (nd_lock)
+ got = refcount_inc_not_zero(&nd_lock->usage);
+ nd_lock2 = rcu_dereference(dev2->nd_lock);
+ if (nd_lock2) {
+ if (nd_lock2 != nd_lock)
+ got2 = refcount_inc_not_zero(&nd_lock2->usage);
+ }
+ rcu_read_unlock();
+ if (!got || (!got2 && nd_lock2 != nd_lock))
+ goto restart;
+
+ if (locks_ordered(nd_lock, nd_lock2)){
+ mutex_lock(&nd_lock->mutex);
+ if (nd_lock != nd_lock2)
+ mutex_lock_nested(&nd_lock2->mutex, SINGLE_DEPTH_NESTING);
+ } else {
+ mutex_lock(&nd_lock2->mutex);
+ if (nd_lock != nd_lock2)
+ mutex_lock_nested(&nd_lock->mutex, SINGLE_DEPTH_NESTING);
+ }
+
+ if (likely(nd_lock == rcu_access_pointer(dev->nd_lock) &&
+ nd_lock2 == rcu_access_pointer(dev2->nd_lock))) {
+ /* Both locks are acquired and correct */
+ break;
+ }
+ if (nd_lock != nd_lock2)
+ mutex_unlock(&nd_lock2->mutex);
+ mutex_unlock(&nd_lock->mutex);
+restart:
+ if (got)
+ put_nd_lock(nd_lock);
+ if (got2)
+ put_nd_lock(nd_lock2);
+ if (!nd_lock || !nd_lock2) {
+ *ret_nd_lock = *ret_nd_lock2 = NULL;
+ return false;
+ }
+ }
+
+ *ret_nd_lock = nd_lock;
+ *ret_nd_lock2 = nd_lock2;
+ return true;
+}
+EXPORT_SYMBOL(double_lock_netdev);
+
+void double_unlock_netdev(struct nd_lock *nd_lock, struct nd_lock *nd_lock2)
+{
+ if (nd_lock != nd_lock2)
+ unlock_netdev(nd_lock);
+ unlock_netdev(nd_lock2);
+}
+EXPORT_SYMBOL(double_unlock_netdev);
+
+/* Make the set of devices protected by @p_lock and the set of devices
+ * protected by @p_lock2 share the same lock (this function chooses one
+ * of @p_lock and @p_lock2 as that common lock).
+ *
+ * 1)We call this in drivers which bind two or more devices to each other.
+ * E.g., drivers using newlink (like bonding, bridge and veth), or connecting
+ * several devices in a switch (like dsa). Nested configurations are also
+ * handled to relate to the same nd_lock (e.g., if veth is attached to bridge,
+ * the same lock will be shared by both veth peers, all bridge ports
+ * and the bridge itself).
+ *
+ * This allows introducing sane locking like:
+ *
+ * lock_netdev(bridge, &nd_lock)
+ * ioctl(change bridge)
+ * netdevice notifier for bridge // protected by nd_lock
+ * netdevice notifier for veth // protected by nd_lock
+ * change veth parameter // protected by nd_lock
+ * netdevice notifier for other port // protected by nd_lock
+ * change port device parameter // protected by nd_lock
+ * unlock_netdev(nd_lock)
+ *
+ * So, each lock protects some group of devices in the system, and all
+ * of the devices in the group are connected in some logical way.
+ *
+ * 2)The main rule for choosing the common lock is simple: we prefer
+ * fallback_nd_lock. Why? Along with commonly used virtual devices, there
+ * are several hardware drivers which connect devices in groups and
+ * touch or modify several devices together in one ioctl
+ * or netdevice event (e.g., mlx5). Without physical access to this whole
+ * zoo of devices, it's impossible to organize them in small exact groups
+ * and test that. So, we attach them to the bigger fallback group.
+ *
+ * Say we have a converted bridge driver and a not converted my_driver. If
+ * we attach my_driver dev1 to the bridge, the bridge and my_driver dev1
+ * must relate to the same nd_lock. But the only nd_lock we can attach is
+ * fallback_nd_lock, otherwise my_driver dev1 may end up in a different lock
+ * group than some my_driver dev2 after my_driver dev2 is loaded. This
+ * would be wrong, since dev1 and dev2 may be used in the same ioctl or
+ * netdevice event. So, fallback_nd_lock will be used as the resulting lock.
+ *
+ * Note that after all hardware drivers organize their logically connected
+ * devices in correct nd_lock groups, we remove this rule.
+ *
+ * The second rule is that we prefer to migrate from the smaller list, since
+ * that takes fewer iterations.
+ *
+ * 3)Note that the reverse operation (splitting a lock into two locks) is
+ * not implemented at the moment (and it may be useless).
+ *
+ * 4)The newly used lock for both sets is returned in the @p_lock2 argument.
+ */
+void nd_lock_transfer_devices(struct nd_lock **p_lock, struct nd_lock **p_lock2)
+{
+ struct nd_lock *lock = *p_lock, *lock2 = *p_lock2;
+ struct net_device *dev;
+
+ lockdep_assert_held(&lock->mutex);
+ lockdep_assert_held(&lock2->mutex);
+
+ if (lock == lock2)
+ return;
+
+ if (lock == &fallback_nd_lock ||
+ (lock2 != &fallback_nd_lock && lock->nr > lock2->nr))
+ swap(lock, lock2);
+
+ list_for_each_entry(dev, &lock->list, nd_lock_entry)
+ rcu_assign_pointer(dev->nd_lock, lock2);
+
+ list_splice_init(&lock->list, &lock2->list);
+ refcount_add(lock->nr, &lock2->usage);
+ lock2->nr += lock->nr;
+ /* Our caller must own @lock since it's locked */
+ WARN_ON(refcount_sub_and_test(lock->nr, &lock->usage));
+ lock->nr = 0;
+
+ *p_lock = lock;
+ *p_lock2 = lock2;
+}
+EXPORT_SYMBOL(nd_lock_transfer_devices);
+
+void attach_nd_lock(struct net_device *dev, struct nd_lock *nd_lock)
+{
+ lockdep_assert_held(&nd_lock->mutex);
+ rcu_assign_pointer(dev->nd_lock, nd_lock);
+ list_add(&dev->nd_lock_entry, &nd_lock->list);
+ refcount_inc(&nd_lock->usage);
+ nd_lock->nr++;
+}
+EXPORT_SYMBOL(attach_nd_lock);
+
+void detach_nd_lock(struct net_device *dev)
+{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
+
+ lockdep_assert_held(&nd_lock->mutex);
+ rcu_assign_pointer(dev->nd_lock, NULL);
+ list_del_init(&dev->nd_lock_entry);
+ nd_lock->nr--;
+ /* Our caller must own @lock since it's locked */
+ WARN_ON(refcount_dec_and_test(&nd_lock->usage));
+}
+EXPORT_SYMBOL(detach_nd_lock);
+
+struct nd_lock *attach_new_nd_lock(struct net_device *dev)
+{
+ struct nd_lock *nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ return NULL;
+
+ mutex_lock(&nd_lock->mutex);
+ attach_nd_lock(dev, nd_lock);
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+
+ return nd_lock;
+}
+EXPORT_SYMBOL(attach_new_nd_lock);
+
/**
* register_netdevice() - register a network device
* @dev: device to register
@@ -11094,6 +11422,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->link_watch_list);
INIT_LIST_HEAD(&dev->adj_list.upper);
INIT_LIST_HEAD(&dev->adj_list.lower);
+ INIT_LIST_HEAD(&dev->nd_lock_entry);
INIT_LIST_HEAD(&dev->ptype_all);
INIT_LIST_HEAD(&dev->ptype_specific);
INIT_LIST_HEAD(&dev->net_notifier_list);
* [PATCH NET-PREV 08/51] net: Initially attaching and detaching nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (6 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 07/51] net: Introduce nd_lock and primitives to work with it Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 09/51] net: Use register_netdevice() in loopback() Kirill Tkhai
` (44 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
To start converting devices one by one, we need
a default assigned to the rest of the devices.
Here we add the default lock assignment and a branch
for already converted drivers in register_netdevice().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
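The deferred release that free_netdev() switches to in this patch follows a
common pattern: producers append to a list and kick the worker only when the
list goes from empty to non-empty, and the worker takes the whole list in one
step before processing it. A minimal single-threaded sketch of that pattern
follows. The names toy_item, toy_defer(), toy_drain() and toy_kicks are
hypothetical, the counter stands in for schedule_work(), and the spinlock
that protects the list in the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_item {
	struct toy_item *next;
};

static struct toy_item *toy_head;
static struct toy_item **toy_tail = &toy_head;
static int toy_kicks;	/* counts how often the worker would be scheduled */

/* Producer side: queue @item, kick the worker on empty -> non-empty. */
static inline void toy_defer(struct toy_item *item)
{
	bool was_empty = (toy_head == NULL);

	assert(item);
	item->next = NULL;
	*toy_tail = item;
	toy_tail = &item->next;
	if (was_empty)
		toy_kicks++;
}

/* Worker side: detach the whole list first, then walk the private copy.
 * Returns the number of items processed.
 */
static inline int toy_drain(void)
{
	struct toy_item *list = toy_head;
	int n = 0;

	toy_head = NULL;
	toy_tail = &toy_head;
	while (list) {
		list = list->next;
		n++;
	}
	return n;
}
```

Detaching the list before walking it keeps the time spent under the list
lock short, which matters here because the per-device work may sleep.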
include/linux/netdevice.h | 8 +++
net/core/dev.c | 127 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 123 insertions(+), 12 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e36e64310bd4..2e9052e808a4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3122,14 +3122,22 @@ extern struct nd_lock fallback_nd_lock;
void nd_lock_transfer_devices(struct nd_lock **p_lock, struct nd_lock **p_lock2);
+int __register_netdevice(struct net_device *dev);
int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
void unregister_netdevice_many(struct list_head *head);
+/* XXX: This will be converted to take nd_lock after drivers are ready */
static inline void unregister_netdevice(struct net_device *dev)
{
unregister_netdevice_queue(dev, NULL);
}
+/* XXX: This will be used in places, where nd_lock is already taken */
+static inline void __unregister_netdevice(struct net_device *dev)
+{
+ unregister_netdevice_queue(dev, NULL);
+}
+
int netdev_refcnt_read(const struct net_device *dev);
void free_netdev(struct net_device *dev);
void init_dummy_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 9d98ab1e76bd..63ece39c9286 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10651,7 +10651,7 @@ struct nd_lock *attach_new_nd_lock(struct net_device *dev)
EXPORT_SYMBOL(attach_new_nd_lock);
/**
- * register_netdevice() - register a network device
+ * __register_netdevice() - register a network device
* @dev: device to register
*
* Take a prepared network device structure and make it externally accessible.
@@ -10659,7 +10659,7 @@ EXPORT_SYMBOL(attach_new_nd_lock);
* Callers must hold the rtnl lock - you may want register_netdev()
* instead of this.
*/
-int register_netdevice(struct net_device *dev)
+int __register_netdevice(struct net_device *dev)
{
int ret;
struct net *net = dev_net(dev);
@@ -10675,6 +10675,9 @@ int register_netdevice(struct net_device *dev)
BUG_ON(dev->reg_state != NETREG_UNINITIALIZED);
BUG_ON(!net);
+ if (WARN_ON(!rcu_access_pointer(dev->nd_lock)))
+ return -ENOLCK;
+
ret = ethtool_check_ops(dev->ethtool_ops);
if (ret)
return ret;
@@ -10837,6 +10840,40 @@ int register_netdevice(struct net_device *dev)
netdev_name_node_free(dev->name_node);
goto out;
}
+EXPORT_SYMBOL(__register_netdevice);
+
+int register_netdevice(struct net_device *dev)
+{
+ struct nd_lock *nd_lock;
+ int err;
+
+ /* XXX: This "if" allows one-by-one conversion
+ * to __register_netdevice() of devices that
+ * want to attach nd_lock themselves (e.g., having newlink).
+ * After all of them are converted, we remove this.
+ */
+ if (rcu_access_pointer(dev->nd_lock))
+ return __register_netdevice(dev);
+
+ nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ return -ENOMEM;
+
+ /* This may be called from a netdevice notifier that is not converted
+ * yet. The context is unknown: some nd_lock may or may not be locked,
+ * so this mutex is sometimes nested and sometimes not. We use trylock
+ * to silence the lockdep assert about that.
+ * It will be replaced by mutex_lock(), see next patches.
+ */
+ BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ attach_nd_lock(dev, nd_lock);
+ err = __register_netdevice(dev);
+ if (err)
+ detach_nd_lock(dev);
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+ return err;
+}
EXPORT_SYMBOL(register_netdevice);
/* Initialize the core of a dummy net device.
@@ -10907,7 +10944,23 @@ int register_netdev(struct net_device *dev)
if (rtnl_lock_killable())
return -EINTR;
- err = register_netdevice(dev);
+
+ /* Since this function is called without rtnl_lock(),
+ * nested registration is not possible here (compare
+ * to .newlink). So it's not as interesting for us as
+ * register_netdevice(). Some real cross-device links
+ * between devices of a specific driver family are
+ * possible here, and they are handled by
+ * using fallback_nd_lock for all devices.
+ * Also, see comment in nd_lock_transfer_devices().
+ */
+ mutex_lock(&fallback_nd_lock.mutex);
+ attach_nd_lock(dev, &fallback_nd_lock);
+ err = __register_netdevice(dev);
+ if (err)
+ detach_nd_lock(dev);
+ mutex_unlock(&fallback_nd_lock.mutex);
+
rtnl_unlock();
return err;
}
@@ -11474,6 +11527,54 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
}
EXPORT_SYMBOL(alloc_netdev_mqs);
+static DEFINE_SPINLOCK(put_lock);
+static LIST_HEAD(put_list);
+
+static void put_work_func(struct work_struct *unused)
+{
+ struct nd_lock *nd_lock;
+ struct net_device *dev;
+ LIST_HEAD(list);
+
+ spin_lock(&put_lock);
+ list_replace_init(&put_list, &list);
+ spin_unlock(&put_lock);
+
+ while (!list_empty(&list)) {
+ dev = list_first_entry(&list,
+ struct net_device,
+ todo_list);
+ list_del_init(&dev->todo_list);
+
+ /* XXX: this nd_lock should finally be held during
+ * the whole unregistration. Since not all devices
+ * are converted yet, we place detach_nd_lock() here
+ * to be able to start attaching nd_lock to every device
+ * one by one in separate patches of this series.
+ * Then, it will be moved to callers (unregister_netdevice()
+ * and others).
+ *
+ * Note, we can't place the below in free_netdev(), because
+ * free_netdev() currently may be called both locked and
+ * unlocked from different callers.
+ *
+ * Also note that the lock may already be detached here if
+ * this is cleanup after failed __register_netdevice().
+ */
+ if (lock_netdev(dev, &nd_lock)) {
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
+ }
+
+ if (dev->reg_state == NETREG_RELEASED)
+ put_device(&dev->dev); /* free via device release */
+ else /* Compatibility with error handling in drivers */
+ kvfree(dev);
+ }
+}
+
+static DECLARE_WORK(put_work, put_work_func);
+
/**
* free_netdev - free network device
* @dev: device
@@ -11486,6 +11587,7 @@ EXPORT_SYMBOL(alloc_netdev_mqs);
void free_netdev(struct net_device *dev)
{
struct napi_struct *p, *n;
+ bool work;
might_sleep();
@@ -11521,18 +11623,19 @@ void free_netdev(struct net_device *dev)
free_percpu(dev->xdp_bulkq);
dev->xdp_bulkq = NULL;
- /* Compatibility with error handling in drivers */
- if (dev->reg_state == NETREG_UNINITIALIZED ||
- dev->reg_state == NETREG_DUMMY) {
- kvfree(dev);
- return;
+ if (dev->reg_state != NETREG_UNINITIALIZED &&
+ dev->reg_state != NETREG_DUMMY) {
+ BUG_ON(dev->reg_state != NETREG_UNREGISTERED);
+ WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
}
- BUG_ON(dev->reg_state != NETREG_UNREGISTERED);
- WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
+ spin_lock(&put_lock);
+ list_add_tail(&dev->todo_list, &put_list);
+ work = list_is_singular(&put_list);
+ spin_unlock(&put_lock);
- /* will free via device release */
- put_device(&dev->dev);
+ if (work)
+ schedule_work(&put_work);
}
EXPORT_SYMBOL(free_netdev);
* [PATCH NET-PREV 09/51] net: Use register_netdevice() in loopback()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (7 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 08/51] net: Initially attaching and detaching nd_lock Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 10/51] net: Underline newlink and changelink dependencies Kirill Tkhai
` (43 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
loopback is a simple interface without logical links to other devices.
Make it use register_netdevice() so that a unique nd_lock is allocated
for it.
loopback is converted, so 50% of the work of removing rtnl_lock from the
kernel is done.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/loopback.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2b486e7c749c..c911ee7e6c68 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -214,7 +214,11 @@ static __net_init int loopback_net_init(struct net *net)
goto out;
dev_net_set(dev, net);
- err = register_netdev(dev);
+ err = -EINTR;
+ if (rtnl_lock_killable())
+ goto out_free_netdev;
+ err = register_netdevice(dev);
+ rtnl_unlock();
if (err)
goto out_free_netdev;
* [PATCH NET-PREV 10/51] net: Underline newlink and changelink dependencies
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (8 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 09/51] net: Use register_netdevice() in loopback() Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 11/51] net: Make master and slaves (any dependent devices) share the same nd_lock in .setlink etc Kirill Tkhai
` (42 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
This tells the rtnetlink code which devices are touched by
newlink and changelink, so that they can be brought into the same
lock group.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
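A userspace sketch of how the zero-terminated tables in struct link_deps
are meant to be walked, modelled on the loop that the next patch adds in
__resolve_deps_locks(). The struct layout is copied from this patch. The
DEMO_* attribute numbers, demo_deps and collect_deps() are hypothetical
names used only for illustration, and plain int arrays stand in for the
struct nlattr *tb[] / data[] arrays. Index 0 doubles as the list
terminator, which works because attribute type 0 is the *_UNSPEC value in
the attribute sets used here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINK_DEPS 5

/* Same layout as in the patch: zero-terminated lists of attribute indexes. */
struct link_deps_table {
	int tb[MAX_LINK_DEPS + 1];
	int data[MAX_LINK_DEPS + 1];
};

struct link_deps {
	struct link_deps_table mandatory;
	struct link_deps_table optional;
};

/* Hypothetical attribute numbers; 0 is reserved, as with *_UNSPEC. */
enum {
	DEMO_ATTR_SLAVE1 = 1,
	DEMO_ATTR_SLAVE2,
	DEMO_ATTR_INTERLINK,
	DEMO_ATTR_MAX = DEMO_ATTR_INTERLINK,
};

/* Hypothetical table, shaped like hsr_newlink_deps in this patch. */
static const struct link_deps demo_deps = {
	.mandatory.data = { DEMO_ATTR_SLAVE1, DEMO_ATTR_SLAVE2, },
	.optional.data = { DEMO_ATTR_INTERLINK, },
};

/*
 * Append to out[] the ifindex stored in attr[] for every key in deps[].
 * attr[key] == 0 models "attribute not present in the request".
 * out[] must have room for n + MAX_LINK_DEPS + 1 entries.
 * Returns the new number of entries, or -1 if a mandatory key is absent.
 */
static int collect_deps(const int deps[], const int attr[], int mandatory,
			int out[], int n)
{
	int i;

	assert(deps && attr && out);

	for (i = 0; i <= MAX_LINK_DEPS; i++) {
		int key = deps[i];

		if (!key)	/* end of the list */
			break;
		if (!attr[key]) {
			if (mandatory)
				return -1;
			continue;
		}
		out[n++] = attr[key];
	}
	return n;
}
```

A driver only has to describe which attributes carry ifindexes; the core
can then find the devices without knowing anything about the driver.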
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 1 +
drivers/net/amt.c | 5 +++++
drivers/net/bonding/bond_netlink.c | 5 +++++
drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 1 +
drivers/net/ipvlan/ipvlan_main.c | 1 +
drivers/net/ipvlan/ipvtap.c | 1 +
drivers/net/macsec.c | 1 +
drivers/net/macvlan.c | 1 +
drivers/net/macvtap.c | 1 +
drivers/net/vxlan/vxlan_core.c | 6 ++++++
drivers/net/wireless/virtual/virt_wifi.c | 1 +
include/net/rtnetlink.h | 16 ++++++++++++++++
net/8021q/vlan_netlink.c | 1 +
net/core/rtnetlink.c | 5 +++++
net/dsa/netlink.c | 5 +++++
net/hsr/hsr_netlink.c | 6 ++++++
net/ieee802154/6lowpan/core.c | 1 +
17 files changed, 58 insertions(+)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 9ad8d9856275..2dd3231df36c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -172,6 +172,7 @@ static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
.policy = ipoib_policy,
.priv_size = sizeof(struct ipoib_dev_priv),
.setup = ipoib_setup_common,
+ .newlink_deps = &generic_newlink_deps,
.newlink = ipoib_new_child_link,
.dellink = ipoib_del_child_link,
.changelink = ipoib_changelink,
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 6d15ab3bfbbc..2288f4bf649c 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3330,6 +3330,10 @@ static int amt_fill_info(struct sk_buff *skb, const struct net_device *dev)
return -EMSGSIZE;
}
+struct link_deps amt_newlink_deps = {
+ .mandatory.data = { IFLA_AMT_LINK, },
+};
+
static struct rtnl_link_ops amt_link_ops __read_mostly = {
.kind = "amt",
.maxtype = IFLA_AMT_MAX,
@@ -3337,6 +3341,7 @@ static struct rtnl_link_ops amt_link_ops __read_mostly = {
.priv_size = sizeof(struct amt_dev),
.setup = amt_link_setup,
.validate = amt_validate,
+ .newlink_deps = &amt_newlink_deps,
.newlink = amt_newlink,
.dellink = amt_dellink,
.get_size = amt_get_size,
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 2a6a424806aa..5fcab77d616f 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -906,6 +906,10 @@ static int bond_fill_linkxstats(struct sk_buff *skb,
return 0;
}
+struct link_deps bond_changelink_deps = {
+ .optional.data = { IFLA_BOND_ACTIVE_SLAVE, IFLA_BOND_PRIMARY, },
+};
+
struct rtnl_link_ops bond_link_ops __read_mostly = {
.kind = "bond",
.priv_size = sizeof(struct bonding),
@@ -914,6 +918,7 @@ struct rtnl_link_ops bond_link_ops __read_mostly = {
.policy = bond_policy,
.validate = bond_validate,
.newlink = bond_newlink,
+ .changelink_deps = &bond_changelink_deps,
.changelink = bond_changelink,
.get_size = bond_get_size,
.fill_info = bond_fill_info,
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index f3bea196a8f9..495368cbef34 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -400,6 +400,7 @@ struct rtnl_link_ops rmnet_link_ops __read_mostly = {
.priv_size = sizeof(struct rmnet_priv),
.setup = rmnet_vnd_setup,
.validate = rmnet_rtnl_validate,
+ .newlink_deps = &generic_newlink_deps,
.newlink = rmnet_newlink,
.dellink = rmnet_dellink,
.get_size = rmnet_get_size,
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 094f44dac5c8..aafaf9d1d822 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -700,6 +700,7 @@ static struct rtnl_link_ops ipvlan_link_ops = {
.priv_size = sizeof(struct ipvl_dev),
.setup = ipvlan_link_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = ipvlan_link_new,
.dellink = ipvlan_link_delete,
.get_link_net = ipvlan_get_link_net,
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c
index 1afc4c47be73..df1d22092b21 100644
--- a/drivers/net/ipvlan/ipvtap.c
+++ b/drivers/net/ipvlan/ipvtap.c
@@ -128,6 +128,7 @@ static void ipvtap_setup(struct net_device *dev)
static struct rtnl_link_ops ipvtap_link_ops __read_mostly = {
.kind = "ipvtap",
.setup = ipvtap_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = ipvtap_newlink,
.dellink = ipvtap_dellink,
.priv_size = sizeof(struct ipvtap_dev),
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 2da70bc3dd86..246cf09a0ebc 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4430,6 +4430,7 @@ static struct rtnl_link_ops macsec_link_ops __read_mostly = {
.policy = macsec_rtnl_policy,
.setup = macsec_setup,
.validate = macsec_validate_attr,
+ .newlink_deps = &generic_newlink_deps,
.newlink = macsec_newlink,
.changelink = macsec_changelink,
.dellink = macsec_dellink,
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 24298a33e0e9..b51e2e21dead 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1754,6 +1754,7 @@ static struct net *macvlan_get_link_net(const struct net_device *dev)
static struct rtnl_link_ops macvlan_link_ops = {
.kind = "macvlan",
.setup = macvlan_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = macvlan_newlink,
.dellink = macvlan_dellink,
.get_link_net = macvlan_get_link_net,
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 29a5929d48e5..f24168080e04 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -140,6 +140,7 @@ static struct net *macvtap_link_net(const struct net_device *dev)
static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
.kind = "macvtap",
.setup = macvtap_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = macvtap_newlink,
.dellink = macvtap_dellink,
.get_link_net = macvtap_link_net,
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 8983e75e9881..b041ddc2ab34 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4579,6 +4579,10 @@ static struct net *vxlan_get_link_net(const struct net_device *dev)
return READ_ONCE(vxlan->net);
}
+struct link_deps vxlan_newlink_deps = {
+ .mandatory.data = { IFLA_VXLAN_LINK, },
+};
+
static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
.kind = "vxlan",
.maxtype = IFLA_VXLAN_MAX,
@@ -4586,7 +4590,9 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
.priv_size = sizeof(struct vxlan_dev),
.setup = vxlan_setup,
.validate = vxlan_validate,
+ .newlink_deps = &vxlan_newlink_deps,
.newlink = vxlan_newlink,
+ .changelink_deps= &vxlan_newlink_deps,
.changelink = vxlan_changelink,
.dellink = vxlan_dellink,
.get_size = vxlan_get_size,
diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c
index 4ee374080466..c80ae0e0df53 100644
--- a/drivers/net/wireless/virtual/virt_wifi.c
+++ b/drivers/net/wireless/virtual/virt_wifi.c
@@ -622,6 +622,7 @@ static void virt_wifi_dellink(struct net_device *dev,
static struct rtnl_link_ops virt_wifi_link_ops = {
.kind = "virt_wifi",
.setup = virt_wifi_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = virt_wifi_newlink,
.dellink = virt_wifi_dellink,
.priv_size = sizeof(struct virt_wifi_netdev_priv),
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index b45d57b5968a..f1702e8872cf 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -29,6 +29,18 @@ static inline enum rtnl_kinds rtnl_msgtype_kind(int msgtype)
return msgtype & RTNL_KIND_MASK;
}
+#define MAX_LINK_DEPS 5
+struct link_deps_table {
+ int tb[MAX_LINK_DEPS + 1];
+ int data[MAX_LINK_DEPS + 1];
+};
+
+struct link_deps {
+ struct link_deps_table mandatory;
+ struct link_deps_table optional;
+};
+extern struct link_deps generic_newlink_deps;
+
void rtnl_register(int protocol, int msgtype,
rtnl_doit_func, rtnl_dumpit_func, unsigned int flags);
int rtnl_register_module(struct module *owner, int protocol, int msgtype,
@@ -58,7 +70,9 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
* and @setup are unused. Returns a netdev or ERR_PTR().
* @priv_size: sizeof net_device private space
* @setup: net_device setup function
+ * @newlink_deps: Indexes of real devices that newlink depends on.
* @newlink: Function for configuring and registering a new device
+ * @changelink_deps: Indexes of real devices that changelink depends on.
* @changelink: Function for changing parameters of an existing device
* @dellink: Function to remove a device
* @get_size: Function to calculate required room for dumping device
@@ -96,11 +110,13 @@ struct rtnl_link_ops {
struct nlattr *data[],
struct netlink_ext_ack *extack);
+ struct link_deps *newlink_deps;
int (*newlink)(struct net *src_net,
struct net_device *dev,
struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack);
+ struct link_deps *changelink_deps;
int (*changelink)(struct net_device *dev,
struct nlattr *tb[],
struct nlattr *data[],
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index cf5219df7903..c71180ba0746 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -293,6 +293,7 @@ struct rtnl_link_ops vlan_link_ops __read_mostly = {
.priv_size = sizeof(struct vlan_dev_priv),
.setup = vlan_setup,
.validate = vlan_validate,
+ .newlink_deps = &generic_newlink_deps,
.newlink = vlan_newlink,
.changelink = vlan_changelink,
.dellink = unregister_vlan_dev,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 046736091b4f..cf060ba4cd1d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3490,6 +3490,11 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
return err;
}
+struct link_deps generic_newlink_deps = {
+ .mandatory.tb = { IFLA_LINK, }
+};
+EXPORT_SYMBOL_GPL(generic_newlink_deps);
+
static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
const struct rtnl_link_ops *ops,
const struct nlmsghdr *nlh,
diff --git a/net/dsa/netlink.c b/net/dsa/netlink.c
index 1332e56349e5..835d935814fb 100644
--- a/net/dsa/netlink.c
+++ b/net/dsa/netlink.c
@@ -11,6 +11,10 @@ static const struct nla_policy dsa_policy[IFLA_DSA_MAX + 1] = {
[IFLA_DSA_CONDUIT] = { .type = NLA_U32 },
};
+struct link_deps dsa_changelink_deps = {
+ .optional.data = { IFLA_DSA_CONDUIT, },
+};
+
static int dsa_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
@@ -57,6 +61,7 @@ struct rtnl_link_ops dsa_link_ops __read_mostly = {
.priv_size = sizeof(struct dsa_port),
.maxtype = IFLA_DSA_MAX,
.policy = dsa_policy,
+ .changelink_deps = &dsa_changelink_deps,
.changelink = dsa_changelink,
.get_size = dsa_get_size,
.fill_info = dsa_fill_info,
diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
index f6ff0b61e08a..6ec883739415 100644
--- a/net/hsr/hsr_netlink.c
+++ b/net/hsr/hsr_netlink.c
@@ -176,12 +176,18 @@ static int hsr_fill_info(struct sk_buff *skb, const struct net_device *dev)
return -EMSGSIZE;
}
+static struct link_deps hsr_newlink_deps = {
+ .mandatory.data = { IFLA_HSR_SLAVE1, IFLA_HSR_SLAVE2, },
+ .optional.data = { IFLA_HSR_INTERLINK, },
+};
+
static struct rtnl_link_ops hsr_link_ops __read_mostly = {
.kind = "hsr",
.maxtype = IFLA_HSR_MAX,
.policy = hsr_policy,
.priv_size = sizeof(struct hsr_priv),
.setup = hsr_dev_setup,
+ .newlink_deps = &hsr_newlink_deps,
.newlink = hsr_newlink,
.dellink = hsr_dellink,
.fill_info = hsr_fill_info,
diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
index 77b4e92027c5..4236aafd448f 100644
--- a/net/ieee802154/6lowpan/core.c
+++ b/net/ieee802154/6lowpan/core.c
@@ -196,6 +196,7 @@ static struct rtnl_link_ops lowpan_link_ops __read_mostly = {
.kind = "lowpan",
.priv_size = LOWPAN_PRIV_SIZE(sizeof(struct lowpan_802154_dev)),
.setup = lowpan_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = lowpan_newlink,
.dellink = lowpan_dellink,
.validate = lowpan_validate,
* [PATCH NET-PREV 11/51] net: Make master and slaves (any dependent devices) share the same nd_lock in .setlink etc
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (9 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 10/51] net: Underline newlink and changelink dependencies Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 12/51] net: Use __register_netdevice in trivial .newlink cases Kirill Tkhai
` (41 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
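A userspace model of the idea behind this patch: take both locks in a
fixed order, then move every device of one lock group into the other so
that dependent devices end up sharing a single lock. The real
double_lock_netdev() and nd_lock_transfer_devices() come from earlier
patches in the series and are not reproduced here. Everything named
demo_* is hypothetical, a flag stands in for the mutex, and ordering by
address is an assumption rather than the series' actual rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DEVS 8

struct demo_dev;

/* Stands in for struct nd_lock: a lock plus the devices it protects. */
struct demo_lock {
	int held;				/* models the mutex state */
	size_t nr;				/* number of attached devices */
	struct demo_dev *devs[DEMO_MAX_DEVS];
};

/* Stands in for struct net_device. */
struct demo_dev {
	struct demo_lock *lock;
};

/* Records the acquisition order of the last demo_double_lock() call. */
static struct demo_lock *demo_trace[2];

static void demo_attach(struct demo_dev *dev, struct demo_lock *lock)
{
	assert(lock->nr < DEMO_MAX_DEVS);
	dev->lock = lock;
	lock->devs[lock->nr++] = dev;
}

/* Take both locks, lower address first, so two callers can't deadlock. */
static void demo_double_lock(struct demo_lock *a, struct demo_lock *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct demo_lock *tmp = a;

		a = b;
		b = tmp;
	}
	a->held = 1;
	demo_trace[0] = a;
	if (b != a)
		b->held = 1;
	demo_trace[1] = b;
}

/*
 * Move all devices of @src into @dst. Both locks must be held.
 * Returns @dst on success, NULL if a precondition is not met.
 */
static struct demo_lock *demo_transfer(struct demo_lock *dst,
				       struct demo_lock *src)
{
	size_t i;

	if (dst == src)
		return dst;
	if (!dst->held || !src->held)
		return NULL;
	if (dst->nr + src->nr > DEMO_MAX_DEVS)
		return NULL;

	for (i = 0; i < src->nr; i++)
		demo_attach(src->devs[i], dst);
	src->nr = 0;
	return dst;
}
```

In this model groups only ever merge. That is the property the comment
above resolve_deps_locks() relies on: once two devices share a lock, a
later lookup can't find them in different groups.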
net/core/rtnetlink.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 129 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index cf060ba4cd1d..67b4b0610d14 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2696,9 +2696,16 @@ static int do_set_master(struct net_device *dev, struct net_device *master,
}
if (master) {
+ struct nd_lock *nd_lock = rcu_access_pointer(dev->nd_lock);
+ struct nd_lock *nd_lock2 = rcu_access_pointer(master->nd_lock);
+
upper_dev = master;
ops = upper_dev->netdev_ops;
if (ops->ndo_add_slave) {
+ /* Devices linked as upper<->lower must relate
+ * to the same nd_lock.
+ */
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
err = ops->ndo_add_slave(upper_dev, dev, extack);
if (err)
return err;
@@ -3173,6 +3180,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
struct net_device *dev, *master = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
struct net *target_net = NULL;
int err;
struct nlattr *tb[IFLA_MAX+1];
@@ -3217,7 +3225,9 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
goto errout;
+ double_lock_netdev(dev, &nd_lock, master, &nd_lock2);
err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
+ double_unlock_netdev(nd_lock, nd_lock2);
errout:
if (target_net)
put_net(target_net);
@@ -3458,6 +3468,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
struct nlattr **tb)
{
struct net_device *dev, *aux, *master = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
struct net *target_net;
int err;
@@ -3479,7 +3490,9 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
break;
+ double_lock_netdev(dev, &nd_lock, master, &nd_lock2);
err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
+ double_unlock_netdev(nd_lock, nd_lock2);
if (err < 0)
break;
}
@@ -3495,6 +3508,74 @@ struct link_deps generic_newlink_deps = {
};
EXPORT_SYMBOL_GPL(generic_newlink_deps);
+static struct net_device *__resolve_deps_locks(struct net *net,
+ struct net_device *dev,
+ struct nlattr **attr,
+ const int deps[],
+ bool mandatory)
+{
+ struct nd_lock *nd_lock, *nd_lock2;
+ struct net_device *dev2;
+ int i, key, ifindex;
+
+ for (i = 0; i <= MAX_LINK_DEPS; i++) {
+ key = deps[i];
+ if (!key)
+ break;
+ if (!attr[key]) {
+ if (mandatory)
+ return ERR_PTR(-ENODEV);
+ continue;
+ }
+ ifindex = nla_get_u32(attr[key]);
+
+ if (!dev) {
+ dev = __dev_get_by_index(net, ifindex);
+ if (!dev && mandatory)
+ return ERR_PTR(-ENODEV);
+ continue;
+ }
+
+ dev2 = __dev_get_by_index(net, ifindex);
+ if (!dev2) {
+ if (mandatory)
+ return ERR_PTR(-ENODEV);
+ continue;
+ }
+ double_lock_netdev(dev, &nd_lock, dev2, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ double_unlock_netdev(nd_lock, nd_lock2);
+ }
+
+ return dev;
+}
+
+/* Transfer all dependencies to the same nd_lock.
+ * Note: this relies on the fact that the list of devices
+ * of an nd_lock can't be split into pieces.
+ */
+static struct net_device *resolve_deps_locks(struct net *net,
+ const struct link_deps *deps,
+ struct nlattr **tb,
+ struct nlattr **data)
+{
+ struct net_device *dev = NULL;
+
+ if (!deps)
+ return NULL;
+
+ dev = __resolve_deps_locks(net, dev, tb, deps->mandatory.tb, true);
+ if (IS_ERR(dev))
+ return dev;
+ dev = __resolve_deps_locks(net, dev, data, deps->mandatory.data, true);
+ if (IS_ERR(dev))
+ return dev;
+ dev = __resolve_deps_locks(net, dev, tb, deps->optional.tb, false);
+ dev = __resolve_deps_locks(net, dev, data, deps->optional.data, false);
+
+ return dev;
+}
+
static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
const struct rtnl_link_ops *ops,
const struct nlmsghdr *nlh,
@@ -3506,7 +3587,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
struct net *net = sock_net(skb->sk);
u32 portid = NETLINK_CB(skb).portid;
struct net *link_net;
- struct net_device *dev, *master = NULL;
+ struct net_device *dev, *master = NULL, *link_dev = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
char ifname[IFNAMSIZ];
LIST_HEAD(list_kill);
int err;
@@ -3554,13 +3636,36 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
goto out;
}
+ link_dev = resolve_deps_locks(link_net ? : net, ops->newlink_deps, tb, data);
+ if (IS_ERR(link_dev)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (master && link_dev) {
+ double_lock_netdev(master, &nd_lock, link_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ if (nd_lock != nd_lock2)
+ unlock_netdev(nd_lock);
+ } else if (master || link_dev) {
+ lock_netdev(master ? : link_dev, &nd_lock);
+ } else {
+ nd_lock = alloc_nd_lock();
+ err = -ENOMEM;
+ if (!nd_lock)
+ goto out;
+ mutex_lock(&nd_lock->mutex);
+ }
+ attach_nd_lock(dev, nd_lock);
+
if (ops->newlink)
err = ops->newlink(link_net ? : net, dev, tb, data, extack);
else
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0) {
+ detach_nd_lock(dev);
free_netdev(dev);
- goto out;
+ goto unlock;
}
err = rtnl_configure_link(dev, ifm, portid, nlh);
@@ -3576,6 +3681,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
if (err)
goto out_unregister;
}
+unlock:
+ unlock_netdev(nd_lock);
out:
if (link_net)
put_net(link_net);
@@ -3587,7 +3694,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
unregister_netdevice_queue(dev, &list_kill);
}
unregister_netdevice_many(&list_kill);
- goto out;
+ goto unlock;
}
struct rtnl_newlink_tbs {
@@ -3608,7 +3715,8 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct ifinfomsg *ifm = nlmsg_data(nlh);
struct nlattr ** const tb = tbs->tb;
struct nlattr **slave_data = NULL;
- struct net_device *master_dev;
+ struct net_device *master_dev, *link_dev;
+ struct nd_lock *nd_lock, *nd_lock2;
int err, status = 0;
if (nlh->nlmsg_flags & NLM_F_EXCL)
@@ -3620,6 +3728,21 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
return err;
+ if (ops && ops == dev->rtnl_link_ops && linkinfo[IFLA_INFO_DATA]) {
+ link_dev = resolve_deps_locks(dev_net(dev),
+ ops->changelink_deps,
+ tb, data);
+ if (IS_ERR(link_dev))
+ return PTR_ERR(link_dev);
+
+ if (link_dev) {
+ double_lock_netdev(dev, &nd_lock, link_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ double_unlock_netdev(nd_lock, nd_lock2);
+ }
+ }
+
+ double_lock_netdev(dev, &nd_lock, new_master, &nd_lock2);
master_dev = netdev_master_upper_dev_get(dev);
if (master_dev)
m_ops = master_dev->rtnl_link_ops;
@@ -3668,6 +3791,7 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
err = do_setlink(target_net, skb, dev, new_master, ifm, extack, tb, status);
out:
+ double_unlock_netdev(nd_lock, nd_lock2);
return err;
}
* [PATCH NET-PREV 12/51] net: Use __register_netdevice in trivial .newlink cases
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (10 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 11/51] net: Make master and slaves (any dependent devices) share the same nd_lock in .setlink etc Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 13/51] infiniband_ipoib: Use __register_netdevice in .newlink Kirill Tkhai
` (40 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Replace register_netdevice() with __register_netdevice() in drivers
that call it only from .newlink methods.
The objective is to make .newlink conform to its callers, which
already assign nd_lock (matching the master's nd_lock if there is
one).
Also, use __unregister_netdevice(), since we know the lock is held
in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
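A userspace model of the calling convention this patch relies on, as far
as it can be inferred from the commit messages in this series: the plain
variant gives the device a lock of its own, while the double-underscore
variant expects the caller to have attached and taken the lock already.
All demo_* names are hypothetical, a flag stands in for the mutex, and
the real __register_netdevice() / __unregister_netdevice() do much more
than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_lock {
	int held;		/* models the mutex state */
};

struct demo_dev {
	struct demo_lock *lock;	/* models dev->nd_lock */
	int registered;
};

/* Models __register_netdevice(): the lock is attached and held by the caller. */
static int demo_register_locked(struct demo_dev *dev)
{
	if (!dev->lock || !dev->lock->held)
		return -EINVAL;
	dev->registered = 1;
	return 0;
}

/* Models __unregister_netdevice(): same precondition as above. */
static void demo_unregister_locked(struct demo_dev *dev)
{
	assert(dev->lock && dev->lock->held);
	dev->registered = 0;
}

/* Models register_netdevice(): allocate a lock used by this device only. */
static int demo_register(struct demo_dev *dev)
{
	struct demo_lock *lock;
	int err;

	assert(!dev->lock);
	lock = calloc(1, sizeof(*lock));
	if (!lock)
		return -ENOMEM;

	dev->lock = lock;
	lock->held = 1;
	err = demo_register_locked(dev);
	lock->held = 0;
	if (err) {
		dev->lock = NULL;
		free(lock);
	}
	return err;
}

/*
 * Models a .newlink method: the caller has already attached and locked
 * dev->lock, so both the registration and the unwind path use the
 * "locked" variants. @fail_late simulates an error after registration.
 */
static int demo_newlink(struct demo_dev *dev, int fail_late)
{
	int err = demo_register_locked(dev);

	if (err)
		return err;
	if (fail_late) {
		demo_unregister_locked(dev);
		return -EBUSY;
	}
	return 0;
}
```

In this model, a .newlink method that called demo_register() instead
would trip the assert, because the device already carries the lock
chosen by its caller.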
drivers/net/amt.c | 4 ++--
drivers/net/bareudp.c | 2 +-
drivers/net/bonding/bond_netlink.c | 2 +-
drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 2 +-
drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 2 +-
drivers/net/gtp.c | 2 +-
drivers/net/ipvlan/ipvlan_main.c | 4 ++--
drivers/net/macsec.c | 4 ++--
drivers/net/macvlan.c | 4 ++--
drivers/net/pfcp.c | 2 +-
drivers/net/team/team_core.c | 2 +-
drivers/net/vrf.c | 6 +++---
drivers/net/wireguard/device.c | 2 +-
drivers/net/wireless/virtual/virt_wifi.c | 4 ++--
net/batman-adv/soft-interface.c | 2 +-
net/bridge/br_netlink.c | 2 +-
net/caif/chnl_net.c | 2 +-
net/hsr/hsr_device.c | 4 ++--
net/ipv4/ip_tunnel.c | 4 ++--
net/ipv6/ip6_gre.c | 2 +-
net/xfrm/xfrm_interface_core.c | 2 +-
21 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 2288f4bf649c..d39cde2be85e 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3258,7 +3258,7 @@ static int amt_newlink(struct net *net, struct net_device *dev,
}
amt->qi = AMT_INIT_QUERY_INTERVAL;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0) {
netdev_dbg(dev, "failed to register new netdev %d\n", err);
goto err;
@@ -3266,7 +3266,7 @@ static int amt_newlink(struct net *net, struct net_device *dev,
err = netdev_upper_dev_link(amt->stream_dev, dev, extack);
if (err < 0) {
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
goto err;
}
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index d5c56ca91b77..ee54fec65e2e 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -647,7 +647,7 @@ static int bareudp_configure(struct net *net, struct net_device *dev,
bareudp->sport_min = conf->sport_min;
bareudp->multi_proto_mode = conf->multi_proto_mode;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 5fcab77d616f..70e3c93df0ba 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -574,7 +574,7 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
if (err < 0)
return err;
- err = register_netdevice(bond_dev);
+ err = __register_netdevice(bond_dev);
if (!err) {
struct bonding *bond = netdev_priv(bond_dev);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 495368cbef34..526e4b7dd27b 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -178,7 +178,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
return 0;
err2:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
rmnet_vnd_dellink(mux_id, port, ep);
err1:
rmnet_unregister_real_device(real_dev);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index f1e40aade127..1c36ef3c1c7c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -324,7 +324,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
return -EINVAL;
}
- rc = register_netdevice(rmnet_dev);
+ rc = __register_netdevice(rmnet_dev);
if (!rc) {
ep->egress_dev = rmnet_dev;
ep->mux_id = id;
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 0696faf60013..eef7c7a6edb2 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1520,7 +1520,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
dev->needed_headroom = LL_MAX_HEADER + GTP_IPV6_MAXLEN;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0) {
netdev_dbg(dev, "failed to register new netdev %d\n", err);
goto out_encap;
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index aafaf9d1d822..0887a7640cc0 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -585,7 +585,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
dev->priv_flags |= IFF_NO_RX_HANDLER;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
return err;
@@ -643,7 +643,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
remove_ida:
ida_free(&port->ida, dev->dev_id);
unregister_netdev:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
EXPORT_SYMBOL_GPL(ipvlan_link_new);
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 246cf09a0ebc..43ccba5a787d 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4187,7 +4187,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
if (rx_handler && rx_handler != macsec_handle_frame)
return -EBUSY;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
return err;
@@ -4257,7 +4257,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
unlink:
netdev_upper_dev_unlink(real_dev, dev);
unregister:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b51e2e21dead..1bf9bb435ef4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1531,7 +1531,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
update_port_bc_cutoff(
vlan, nla_get_s32(data[IFLA_MACVLAN_BC_CUTOFF]));
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto destroy_macvlan_port;
@@ -1549,7 +1549,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
unregister_netdev:
/* macvlan_uninit would free the macvlan port */
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
destroy_macvlan_port:
/* the macvlan port may be freed by macvlan_uninit when fail to register.
diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c
index 69434fd13f96..a28a9aed14eb 100644
--- a/drivers/net/pfcp.c
+++ b/drivers/net/pfcp.c
@@ -200,7 +200,7 @@ static int pfcp_newlink(struct net *net, struct net_device *dev,
goto exit_err;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err) {
netdev_dbg(dev, "failed to register pfcp netdev %d\n", err);
goto exit_del_pfcp_sock;
diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..3e98771bcced 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -2214,7 +2214,7 @@ static int team_newlink(struct net *src_net, struct net_device *dev,
if (tb[IFLA_ADDRESS] == NULL)
eth_hw_addr_random(dev);
- return register_netdevice(dev);
+ return __register_netdevice(dev);
}
static int team_validate(struct nlattr *tb[], struct nlattr *data[],
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 040f0bb36c0e..85c0903d1ef0 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1719,7 +1719,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
dev->priv_flags |= IFF_L3MDEV_MASTER;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto out;
@@ -1731,7 +1731,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
err = vrf_map_register_dev(dev, extack);
if (err) {
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
goto out;
}
@@ -1743,7 +1743,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
err = vrf_add_fib_rules(dev);
if (err) {
vrf_map_unregister_dev(dev);
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
goto out;
}
*add_fib_rules = false;
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 3feb36ee5bfb..b2a3d5260d42 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -364,7 +364,7 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
if (ret < 0)
goto err_free_handshake_queue;
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret < 0)
goto err_uninit_ratelimiter;
diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c
index c80ae0e0df53..877c3deeef5b 100644
--- a/drivers/net/wireless/virtual/virt_wifi.c
+++ b/drivers/net/wireless/virtual/virt_wifi.c
@@ -564,7 +564,7 @@ static int virt_wifi_newlink(struct net *src_net, struct net_device *dev,
dev->ieee80211_ptr->iftype = NL80211_IFTYPE_STATION;
dev->ieee80211_ptr->wiphy = common_wiphy;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err) {
dev_err(&priv->lowerdev->dev, "can't register_netdevice: %d\n",
err);
@@ -587,7 +587,7 @@ static int virt_wifi_newlink(struct net *src_net, struct net_device *dev,
return 0;
unregister_netdev:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
free_wireless_dev:
kfree(dev->ieee80211_ptr);
dev->ieee80211_ptr = NULL;
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 30ecbc2ef1fd..c1a9ae252a1c 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -1085,7 +1085,7 @@ static int batadv_softif_newlink(struct net *src_net, struct net_device *dev,
return -EINVAL;
}
- return register_netdevice(dev);
+ return __register_netdevice(dev);
}
/**
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index f17dbac7d828..4298c14d4295 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1560,7 +1560,7 @@ static int br_dev_newlink(struct net *src_net, struct net_device *dev,
struct net_bridge *br = netdev_priv(dev);
int err;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 47901bd4def1..69dc15baaab6 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -450,7 +450,7 @@ static int ipcaif_newlink(struct net *src_net, struct net_device *dev,
caifdev = netdev_priv(dev);
caif_netlink_parms(data, &caifdev->conn_req);
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret)
pr_warn("device rtml registration failed\n");
else
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index e4cc6b78dcfc..e2fa0130a66c 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -649,7 +649,7 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2],
(slave[1]->features & NETIF_F_HW_HSR_FWD))
hsr->fwd_offloaded = true;
- res = register_netdevice(hsr_dev);
+ res = __register_netdevice(hsr_dev);
if (res)
goto err_unregister;
@@ -685,6 +685,6 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2],
hsr_del_self_node(hsr);
if (unregister)
- unregister_netdevice(hsr_dev);
+ __unregister_netdevice(hsr_dev);
return res;
}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 5cffad42fe8c..065b51dde995 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -1235,7 +1235,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
nt->net = net;
nt->parms = *p;
nt->fwmark = fwmark;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto err_register_netdevice;
@@ -1260,7 +1260,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
return 0;
err_dev_set_mtu:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
err_register_netdevice:
return err;
}
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 3942bd2ade78..57cbf7942dc8 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1993,7 +1993,7 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev,
nt->dev = dev;
nt->net = dev_net(dev);
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto out;
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index e50e4bf993fa..18bd60efd2cc 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -250,7 +250,7 @@ static int xfrmi_create(struct net_device *dev)
int err;
dev->rtnl_link_ops = &xfrmi_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
* [PATCH NET-PREV 13/51] infiniband_ipoib: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (11 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 12/51] net: Use __register_netdevice in trivial .newlink cases Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 14/51] vxcan: " Kirill Tkhai
` (39 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
There are two paths to __register_netdevice() here:
one from .newlink, the other from the store method.
Also, use __unregister_netdevice(), since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 2dd3231df36c..b8add59c6c69 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -140,7 +140,7 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
if (data) {
err = ipoib_changelink(dev, tb, data, extack);
if (err) {
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 562df2b3ef18..970f344260df 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -128,7 +128,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
goto out_early;
}
- result = register_netdevice(ndev);
+ result = __register_netdevice(ndev);
if (result) {
ipoib_warn(priv, "failed to initialize; error %i", result);
@@ -155,7 +155,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
return 0;
sysfs_failed:
- unregister_netdevice(priv->dev);
+ __unregister_netdevice(priv->dev);
return -ENOMEM;
out_early:
@@ -169,6 +169,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
struct ipoib_dev_priv *ppriv, *priv;
char intf_name[IFNAMSIZ];
struct net_device *ndev;
+ struct nd_lock *nd_lock;
int result;
if (!capable(CAP_NET_ADMIN))
@@ -200,8 +201,15 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
ndev->rtnl_link_ops = ipoib_get_link_ops();
+ lock_netdev(pdev, &nd_lock);
+ attach_nd_lock(ndev, nd_lock);
+
result = __ipoib_vlan_add(ppriv, priv, pkey, IPOIB_LEGACY_CHILD);
+ if (result)
+ detach_nd_lock(ndev);
+ unlock_netdev(nd_lock);
+
if (result && ndev->reg_state == NETREG_UNINITIALIZED)
free_netdev(ndev);
* [PATCH NET-PREV 14/51] vxcan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (12 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 13/51] infiniband_ipoib: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 15/51] iavf: Use __register_netdevice() Kirill Tkhai
` (38 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/can/vxcan.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 9e1b7d41005f..6c44472af609 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -221,10 +221,12 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (ifmp && dev->ifindex)
peer->ifindex = ifmp->ifi_index;
- err = register_netdevice(peer);
+ attach_nd_lock(peer, rcu_dereference_protected(dev->nd_lock, true));
+ err = __register_netdevice(peer);
put_net(peer_net);
peer_net = NULL;
if (err < 0) {
+ detach_nd_lock(peer);
free_netdev(peer);
return err;
}
@@ -241,7 +243,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
else
snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto unregister_network_device;
@@ -257,7 +259,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
return 0;
unregister_network_device:
- unregister_netdevice(peer);
+ __unregister_netdevice(peer);
return err;
}
* [PATCH NET-PREV 15/51] iavf: Use __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (13 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 14/51] vxcan: " Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 16/51] geneve: Use __register_netdevice in .newlink Kirill Tkhai
` (37 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Attach, detach and take nd_lock in the appropriate way:
nd_lock should be taken outside the driver's own locks.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 59 +++++++++++++++++++--------
1 file changed, 41 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f782402cd789..77fbe80c04a4 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1968,14 +1968,36 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni
static void iavf_finish_config(struct work_struct *work)
{
struct iavf_adapter *adapter;
- int pairs, err;
+ struct nd_lock *nd_lock;
+ int pairs, err = 0;
adapter = container_of(work, struct iavf_adapter, finish_config);
/* Always take RTNL first to prevent circular lock dependency */
rtnl_lock();
+ lock_netdev(adapter->netdev, &nd_lock);
mutex_lock(&adapter->crit_lock);
+ if (adapter->netdev->reg_state != NETREG_REGISTERED &&
+ adapter->state == __IAVF_DOWN) {
+ err = __register_netdevice(adapter->netdev);
+ }
+
+ unlock_netdev(nd_lock);
+
+ if (err) {
+ dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n",
+ err);
+
+ /* go back and try again.*/
+ iavf_free_rss(adapter);
+ iavf_free_misc_irq(adapter);
+ iavf_reset_interrupt_capability(adapter);
+ iavf_change_state(adapter,
+ __IAVF_INIT_CONFIG_ADAPTER);
+ goto out;
+ }
+
if ((adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES) &&
adapter->netdev->reg_state == NETREG_REGISTERED &&
!test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) {
@@ -1985,22 +2007,6 @@ static void iavf_finish_config(struct work_struct *work)
switch (adapter->state) {
case __IAVF_DOWN:
- if (adapter->netdev->reg_state != NETREG_REGISTERED) {
- err = register_netdevice(adapter->netdev);
- if (err) {
- dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n",
- err);
-
- /* go back and try again.*/
- iavf_free_rss(adapter);
- iavf_free_misc_irq(adapter);
- iavf_reset_interrupt_capability(adapter);
- iavf_change_state(adapter,
- __IAVF_INIT_CONFIG_ADAPTER);
- goto out;
- }
- }
-
/* Set the real number of queues when reset occurs while
* state == __IAVF_DOWN
*/
@@ -5054,6 +5060,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
struct net_device *netdev;
struct iavf_adapter *adapter = NULL;
struct iavf_hw *hw = NULL;
+ struct nd_lock *nd_lock;
int err;
err = pci_enable_device(pdev);
@@ -5085,6 +5092,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
SET_NETDEV_DEV(netdev, &pdev->dev);
+ nd_lock = attach_new_nd_lock(netdev);
+ if (!nd_lock) {
+ err = -ENOMEM;
+ goto err_alloc_lock;
+ }
+
pci_set_drvdata(pdev, netdev);
adapter = netdev_priv(netdev);
@@ -5163,6 +5176,10 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
err_ioremap:
destroy_workqueue(adapter->wq);
err_alloc_wq:
+ mutex_lock(&nd_lock->mutex);
+ detach_nd_lock(netdev);
+ mutex_unlock(&nd_lock->mutex);
+err_alloc_lock:
free_netdev(netdev);
err_alloc_etherdev:
pci_release_regions(pdev);
@@ -5255,6 +5272,7 @@ static void iavf_remove(struct pci_dev *pdev)
struct iavf_mac_filter *f, *ftmp;
struct iavf_adapter *adapter;
struct net_device *netdev;
+ struct nd_lock *nd_lock;
struct iavf_hw *hw;
/* Don't proceed with remove if netdev is already freed */
@@ -5291,8 +5309,13 @@ static void iavf_remove(struct pci_dev *pdev)
cancel_delayed_work_sync(&adapter->watchdog_task);
cancel_work_sync(&adapter->finish_config);
- if (netdev->reg_state == NETREG_REGISTERED)
+ if (netdev->reg_state == NETREG_REGISTERED) {
unregister_netdev(netdev);
+ } else {
+ lock_netdev(netdev, &nd_lock);
+ detach_nd_lock(netdev);
+ unlock_netdev(nd_lock);
+ }
mutex_lock(&adapter->crit_lock);
dev_info(&adapter->pdev->dev, "Removing device\n");
* [PATCH NET-PREV 16/51] geneve: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (14 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 15/51] iavf: Use __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 17/51] netkit: " Kirill Tkhai
` (36 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/geneve.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 838e85ddec67..f74f92753063 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1380,7 +1380,7 @@ static int geneve_configure(struct net *net, struct net_device *dev,
dev->flags = IFF_POINTOPOINT | IFF_NOARP;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
@@ -1830,6 +1830,7 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
u8 name_assign_type, u16 dst_port)
{
struct nlattr *tb[IFLA_MAX + 1];
+ struct nd_lock *nd_lock;
struct net_device *dev;
LIST_HEAD(list_kill);
int err;
@@ -1846,12 +1847,21 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
if (IS_ERR(dev))
return dev;
+ if (!attach_new_nd_lock(dev)) {
+ free_netdev(dev);
+ return ERR_PTR(-ENOMEM);
+ }
+
init_tnl_info(&cfg.info, dst_port);
+ lock_netdev(dev, &nd_lock);
err = geneve_configure(net, dev, NULL, &cfg);
if (err) {
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
free_netdev(dev);
return ERR_PTR(err);
}
+ unlock_netdev(nd_lock);
/* openvswitch users expect packet sizes to be unrestricted,
* so set the largest MTU we can.
* [PATCH NET-PREV 17/51] netkit: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (15 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 16/51] geneve: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 18/51] qmi_wwan: " Kirill Tkhai
` (35 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/netkit.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 16789cd446e9..da8d806b8249 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -408,7 +408,8 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
nk->mode = mode;
bpf_mprog_bundle_init(&nk->bundle);
- err = register_netdevice(peer);
+ attach_nd_lock(peer, rcu_dereference_protected(dev->nd_lock, true));
+ err = __register_netdevice(peer);
put_net(net);
if (err < 0)
goto err_register_peer;
@@ -433,7 +434,7 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
nk->mode = mode;
bpf_mprog_bundle_init(&nk->bundle);
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto err_configure_peer;
netif_carrier_off(dev);
@@ -444,9 +445,10 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
rcu_assign_pointer(netkit_priv(peer)->peer, dev);
return 0;
err_configure_peer:
- unregister_netdevice(peer);
+ __unregister_netdevice(peer);
return err;
err_register_peer:
+ detach_nd_lock(peer);
free_netdev(peer);
return err;
}
* [PATCH NET-PREV 18/51] qmi_wwan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (16 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 17/51] netkit: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 19/51] bpqether: Provide determined context in __register_netdevice() Kirill Tkhai
` (34 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/usb/qmi_wwan.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 4823dbdf5465..beec69580978 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -246,6 +246,7 @@ static int qmimux_register_device(struct net_device *real_dev, u8 mux_id)
{
struct net_device *new_dev;
struct qmimux_priv *priv;
+ struct nd_lock *nd_lock;
int err;
new_dev = alloc_netdev(sizeof(struct qmimux_priv),
@@ -260,14 +261,23 @@ static int qmimux_register_device(struct net_device *real_dev, u8 mux_id)
new_dev->sysfs_groups[0] = &qmi_wwan_sysfs_qmimux_attr_group;
- err = register_netdevice(new_dev);
- if (err < 0)
+ err = -ENOMEM;
+
+ lock_netdev(real_dev, &nd_lock);
+ attach_nd_lock(new_dev, nd_lock);
+ err = __register_netdevice(new_dev);
+ if (err < 0) {
+ detach_nd_lock(new_dev);
+ unlock_netdev(nd_lock);
goto out_free_newdev;
+ }
/* Account for reference in struct qmimux_priv_priv */
dev_hold(real_dev);
err = netdev_upper_dev_link(real_dev, new_dev, NULL);
+ unlock_netdev(nd_lock);
+
if (err)
goto out_unregister_netdev;
* [PATCH NET-PREV 19/51] bpqether: Provide determined context in __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (17 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 18/51] qmi_wwan: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 20/51] ppp: Use __register_netdevice in .newlink Kirill Tkhai
` (33 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
If the caller already owns an nd_lock, taking another one here
is nesting that is not annotated for lockdep.
So we use trylock and __register_netdevice() here.
XXX: after the callers of netdevice notifiers are converted,
we will inherit @edev's nd_lock instead.
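For illustration only (not part of the patch): a minimal userspace sketch of why trylock on a freshly allocated lock cannot fail. It uses pthreads rather than kernel mutexes, and every name in it (nd_lock_model, model_alloc_nd_lock(), model_lock_fresh(), model_free_nd_lock()) is a hypothetical stand-in, not the patchset's API. lockdep has no counterpart here; the sketch only shows the "unpublished lock, so trylock must succeed" reasoning.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the patchset's struct nd_lock. */
struct nd_lock_model {
	pthread_mutex_t mutex;
};

/* Allocate a lock that no other thread can see yet; NULL on failure. */
static struct nd_lock_model *model_alloc_nd_lock(void)
{
	struct nd_lock_model *l = malloc(sizeof(*l));

	if (!l)
		return NULL;
	if (pthread_mutex_init(&l->mutex, NULL)) {
		free(l);
		return NULL;
	}
	return l;
}

/*
 * Take a freshly allocated lock with trylock. The lock has not been
 * published, so nobody else can hold it and trylock must succeed;
 * a failure would be a logic error, hence the assertion.
 */
static void model_lock_fresh(struct nd_lock_model *l)
{
	int rc = pthread_mutex_trylock(&l->mutex);

	assert(rc == 0);
	(void)rc;	/* keeps the build quiet when NDEBUG disables assert() */
}

/* Destroy and free an unlocked lock. */
static void model_free_nd_lock(struct nd_lock_model *l)
{
	pthread_mutex_destroy(&l->mutex);
	free(l);
}
```

The trylock call stays outside assert() so that it is still executed when assertions are compiled out.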
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/hamradio/bpqether.c | 33 +++++++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 83a16d10eedb..bf2792f98afe 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -480,6 +480,7 @@ static int bpq_new_device(struct net_device *edev)
{
int err;
struct net_device *ndev;
+ struct nd_lock *nd_lock;
struct bpqdev *bpq;
ndev = alloc_netdev(sizeof(struct bpqdev), "bpq%d", NET_NAME_UNKNOWN,
@@ -487,7 +488,23 @@ static int bpq_new_device(struct net_device *edev)
if (!ndev)
return -ENOMEM;
-
+ err = -ENOMEM;
+ nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ goto err_free;
+
+ /* This is called from a netdevice notifier, which is not converted yet.
+ * The context is unknown: some nd_lock may or may not be held. At this
+ * stage of the conversion @ndev is independent of @edev (sharing a lock
+ * will be required once unregister_netdev() is converted).
+ * So, a new nd_lock is used for @ndev for now.
+ * Q: Why trylock, even though it can't fail?
+ * A: The caller may own some other nd_lock, so lockdep would be unhappy
+ * to see a nested lock taken without mutex_lock_nested().
+ */
+ BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ attach_nd_lock(ndev, nd_lock);
+
bpq = netdev_priv(ndev);
dev_hold(edev);
bpq->ethdev = edev;
@@ -496,19 +513,23 @@ static int bpq_new_device(struct net_device *edev)
eth_broadcast_addr(bpq->dest_addr);
eth_broadcast_addr(bpq->acpt_addr);
- err = register_netdevice(ndev);
+ err = __register_netdevice(ndev);
if (err)
- goto error;
+ goto err_detach;
bpq_set_lockdep_class(ndev);
/* List protected by RTNL */
list_add_rcu(&bpq->bpq_list, &bpq_devices);
- return 0;
+unlock:
+ unlock_netdev(nd_lock);
+ return err;
- error:
+err_detach:
+ detach_nd_lock(ndev);
dev_put(edev);
+err_free:
free_netdev(ndev);
- return err;
+ goto unlock;
}
* [PATCH NET-PREV 20/51] ppp: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (18 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 19/51] bpqether: Provide determined context in __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 21/51] veth: " Kirill Tkhai
` (32 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/ppp/ppp_generic.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index eb9acfcaeb09..c094bc5e6d8f 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1216,7 +1216,7 @@ static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
mutex_unlock(&pn->all_ppp_mutex);
- ret = register_netdevice(ppp->dev);
+ ret = __register_netdevice(ppp->dev);
if (ret < 0)
goto err_unit;
@@ -3331,6 +3331,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
.unit = *unit,
.ifname_is_set = false,
};
+ struct nd_lock *nd_lock;
struct net_device *dev;
struct ppp *ppp;
int err;
@@ -3343,7 +3344,13 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
dev_net_set(dev, net);
dev->rtnl_link_ops = &ppp_link_ops;
+ if (!attach_new_nd_lock(dev)) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+
rtnl_lock();
+ lock_netdev(dev, &nd_lock);
err = ppp_dev_configure(net, dev, &conf);
if (err < 0)
@@ -3351,12 +3358,16 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
ppp = netdev_priv(dev);
*unit = ppp->file.index;
+ unlock_netdev(nd_lock);
rtnl_unlock();
return 0;
err_dev:
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
rtnl_unlock();
+err_free:
free_netdev(dev);
err:
return err;
* [PATCH NET-PREV 21/51] veth: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (19 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 20/51] ppp: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 22/51] vxlan: " Kirill Tkhai
` (31 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since we know
the lock is held in that path.
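For illustration only (not part of the patch): a small userspace model of the peer half of the .newlink flow, in which the peer inherits the lock already attached to the primary device and gives it back if registration fails. All names (lock_model, dev_model, model_attach(), model_detach(), model_register(), model_newlink_peer()) are hypothetical and only mirror the order of calls in the diff below; no real locking is done.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct lock_model {
	int nr_attached;		/* devices currently sharing this lock */
};

struct dev_model {
	struct lock_model *lock;	/* NULL while detached */
	int registered;
};

/* Make @d share lock @l. */
static void model_attach(struct dev_model *d, struct lock_model *l)
{
	d->lock = l;
	l->nr_attached++;
}

/* Undo model_attach(). */
static void model_detach(struct dev_model *d)
{
	d->lock->nr_attached--;
	d->lock = NULL;
}

/* Pretend registration; returns -1 when @fail is set. */
static int model_register(struct dev_model *d, int fail)
{
	if (fail)
		return -1;
	d->registered = 1;
	return 0;
}

/*
 * Peer half of the flow: attach the peer to the lock @dev already has,
 * then register it; on failure the attach is undone before returning.
 */
static int model_newlink_peer(struct dev_model *dev, struct dev_model *peer,
			      int fail)
{
	int err;

	model_attach(peer, dev->lock);
	err = model_register(peer, fail);
	if (err < 0) {
		model_detach(peer);
		return err;
	}
	return 0;
}
```

After a successful call both devices point at the same lock object, which is the invariant the series wants for paired devices.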
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/veth.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 34499b91a8bd..7a502dbed5b9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1827,7 +1827,9 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
netif_inherit_tso_max(peer, dev);
- err = register_netdevice(peer);
+ attach_nd_lock(peer, rcu_dereference_protected(dev->nd_lock, true));
+
+ err = __register_netdevice(peer);
put_net(net);
net = NULL;
if (err < 0)
@@ -1858,7 +1860,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
else
snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto err_register_dev;
@@ -1888,14 +1890,15 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
return 0;
err_queues:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
err_register_dev:
/* nothing to do */
err_configure_peer:
- unregister_netdevice(peer);
+ __unregister_netdevice(peer);
return err;
err_register_peer:
+ detach_nd_lock(peer);
free_netdev(peer);
return err;
}
* [PATCH NET-PREV 22/51] vxlan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (20 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 21/51] veth: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 23/51] hdlc_fr: Use __register_netdevice Kirill Tkhai
` (30 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink conform to its callers,
which already assign nd_lock (and match the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/vxlan/vxlan_core.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index b041ddc2ab34..369f7b667424 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -3950,7 +3950,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
return err;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto errout;
unregister = true;
@@ -4001,7 +4001,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
__vxlan_fdb_free(f);
unregister:
if (unregister)
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
@@ -4604,22 +4604,37 @@ struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type,
struct vxlan_config *conf)
{
+ struct net_device *dev, *lowerdev = NULL;
struct nlattr *tb[IFLA_MAX + 1];
- struct net_device *dev;
+ struct nd_lock *nd_lock;
int err;
memset(&tb, 0, sizeof(tb));
+ if (conf->remote_ifindex) {
+ lowerdev = __dev_get_by_index(net, conf->remote_ifindex);
+ if (!lowerdev)
+ return ERR_PTR(-ENODEV);
+ }
+
dev = rtnl_create_link(net, name, name_assign_type,
&vxlan_link_ops, tb, NULL);
if (IS_ERR(dev))
return dev;
- err = __vxlan_dev_create(net, dev, conf, NULL);
- if (err < 0) {
- free_netdev(dev);
- return ERR_PTR(err);
+ if (lowerdev) {
+ lock_netdev(lowerdev, &nd_lock);
+ attach_nd_lock(dev, nd_lock);
+ } else {
+ err = -ENOMEM;
+ if (!attach_new_nd_lock(dev))
+ goto err_free;
+ lock_netdev(dev, &nd_lock);
}
+ err = __vxlan_dev_create(net, dev, conf, NULL);
+ if (err < 0)
+ goto err_detach;
+ unlock_netdev(nd_lock);
err = rtnl_configure_link(dev, NULL, 0, NULL);
if (err < 0) {
@@ -4631,6 +4646,13 @@ struct net_device *vxlan_dev_create(struct net *net, const char *name,
}
return dev;
+
+err_detach:
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
+err_free:
+ free_netdev(dev);
+ return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(vxlan_dev_create);
* [PATCH NET-PREV 23/51] hdlc_fr: Use __register_netdevice
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (21 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 22/51] vxlan: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 24/51] lapbeth: Provide determined context in __register_netdevice() Kirill Tkhai
` (29 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make dependent devices share
the same nd_lock.
Eventually, taking nd_lock should be moved to the ioctl
caller, but we can't do this yet, if only because
netdevice notifiers are not converted.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wan/hdlc_fr.c | 18 ++++++++++++------
net/core/dev_ioctl.c | 1 +
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index 81e72bc1891f..93c61083de76 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1106,7 +1106,9 @@ static int fr_add_pvc(struct net_device *frad, unsigned int dlci, int type)
dev->priv_flags |= IFF_NO_QUEUE;
dev->ml_priv = pvc;
- if (register_netdevice(dev) != 0) {
+ attach_nd_lock(dev, rcu_dereference_protected(frad->nd_lock, true));
+ if (__register_netdevice(dev) != 0) {
+ detach_nd_lock(dev);
free_netdev(dev);
delete_unused_pvcs(hdlc);
return -EIO;
@@ -1187,8 +1189,9 @@ static int fr_ioctl(struct net_device *dev, struct if_settings *ifs)
const size_t size = sizeof(fr_proto);
fr_proto new_settings;
hdlc_device *hdlc = dev_to_hdlc(dev);
+ struct nd_lock *nd_lock;
fr_proto_pvc pvc;
- int result;
+ int result, err;
switch (ifs->type) {
case IF_GET_PROTO:
@@ -1272,10 +1275,13 @@ static int fr_ioctl(struct net_device *dev, struct if_settings *ifs)
result = ARPHRD_DLCI;
if (ifs->type == IF_PROTO_FR_ADD_PVC ||
- ifs->type == IF_PROTO_FR_ADD_ETH_PVC)
- return fr_add_pvc(dev, pvc.dlci, result);
- else
- return fr_del_pvc(hdlc, pvc.dlci, result);
+ ifs->type == IF_PROTO_FR_ADD_ETH_PVC) {
+ lock_netdev(dev, &nd_lock);
+ err = fr_add_pvc(dev, pvc.dlci, result);
+ unlock_netdev(nd_lock);
+ return err;
+ }
+ return fr_del_pvc(hdlc, pvc.dlci, result);
}
return -EINVAL;
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 8592c052c0f4..dc2a0f513bac 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -496,6 +496,7 @@ static int dev_siocwandev(struct net_device *dev, struct if_settings *ifs)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ /* This may take nd_lock. See fr_add_pvc() */
if (ops->ndo_siocwandev) {
if (netif_device_present(dev))
return ops->ndo_siocwandev(dev, ifs);
* [PATCH NET-PREV 24/51] lapbeth: Provide determined context in __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (22 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 23/51] hdlc_fr: Use __register_netdevice Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 25/51] wwan: Use __register_netdevice in .newlink Kirill Tkhai
` (28 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
If the caller already owns an nd_lock, taking another one here
is nesting that is not annotated for lockdep.
So we use trylock and __register_netdevice() here.
XXX: after the callers of netdevice notifiers are converted,
we will inherit the nd_lock of the underlying Ethernet device instead.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wan/lapbether.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index 56326f38fe8a..793a2ed424c0 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -380,6 +380,7 @@ static int lapbeth_new_device(struct net_device *dev)
{
struct net_device *ndev;
struct lapbethdev *lapbeth;
+ struct nd_lock *nd_lock;
int rc = -ENOMEM;
ASSERT_RTNL();
@@ -392,6 +393,23 @@ static int lapbeth_new_device(struct net_device *dev)
if (!ndev)
goto out;
+ rc = -ENOMEM;
+ nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ goto err_free;
+
+ /* This is called from a netdevice notifier, which is not converted yet.
+ * The context is unknown: some nd_lock may or may not be held. At this
+ * stage of the conversion @ndev is independent of @dev (sharing a lock
+ * will be required once unregister_netdev() is converted).
+ * So, a new nd_lock is used for @ndev for now.
+ * Q: Why trylock, even though it can't fail?
+ * A: The caller may own some other nd_lock, so lockdep would be unhappy
+ * to see a nested lock taken without mutex_lock_nested().
+ */
+ BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ attach_nd_lock(ndev, nd_lock);
+
/* When transmitting data:
* first this driver removes a pseudo header of 1 byte,
* then the lapb module prepends an LAPB header of at most 3 bytes,
@@ -415,15 +433,19 @@ static int lapbeth_new_device(struct net_device *dev)
netif_napi_add_weight(ndev, &lapbeth->napi, lapbeth_napi_poll, 16);
rc = -EIO;
- if (register_netdevice(ndev))
- goto fail;
+ if (__register_netdevice(ndev))
+ goto err_put;
+ unlock_netdev(nd_lock);
list_add_rcu(&lapbeth->node, &lapbeth_devices);
rc = 0;
out:
return rc;
-fail:
+err_put:
dev_put(dev);
+ detach_nd_lock(ndev);
+ unlock_netdev(nd_lock);
+err_free:
free_netdev(ndev);
goto out;
}
* [PATCH NET-PREV 25/51] wwan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (23 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 24/51] lapbeth: Provide determined context in __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 26/51] 6lowpan: " Kirill Tkhai
` (27 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign an nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wwan/iosm/iosm_ipc_wwan.c | 2 +-
drivers/net/wwan/mhi_wwan_mbim.c | 2 +-
drivers/net/wwan/t7xx/t7xx_netdev.c | 2 +-
drivers/net/wwan/wwan_core.c | 13 +++++++++++--
4 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wwan/iosm/iosm_ipc_wwan.c b/drivers/net/wwan/iosm/iosm_ipc_wwan.c
index ff747fc79aaf..f84f59df0747 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_wwan.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_wwan.c
@@ -180,7 +180,7 @@ static int ipc_wwan_newlink(void *ctxt, struct net_device *dev,
if (rcu_access_pointer(ipc_wwan->sub_netlist[if_id]))
return -EBUSY;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
diff --git a/drivers/net/wwan/mhi_wwan_mbim.c b/drivers/net/wwan/mhi_wwan_mbim.c
index d5a9360323d2..369ed68211dd 100644
--- a/drivers/net/wwan/mhi_wwan_mbim.c
+++ b/drivers/net/wwan/mhi_wwan_mbim.c
@@ -566,7 +566,7 @@ static int mhi_mbim_newlink(void *ctxt, struct net_device *ndev, u32 if_id,
/* Already protected by RTNL lock */
hlist_add_head_rcu(&link->hlnode, &mbim->link_list[LINK_HASH(if_id)]);
- return register_netdevice(ndev);
+ return __register_netdevice(ndev);
}
static void mhi_mbim_dellink(void *ctxt, struct net_device *ndev,
diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
index 91fa082e9cab..3bde38147930 100644
--- a/drivers/net/wwan/t7xx/t7xx_netdev.c
+++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
@@ -304,7 +304,7 @@ static int t7xx_ccmni_wwan_newlink(void *ctxt, struct net_device *dev, u32 if_id
atomic_set(&ccmni->usage, 0);
ctlb->ccmni_inst[if_id] = ccmni;
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret)
return ret;
diff --git a/drivers/net/wwan/wwan_core.c b/drivers/net/wwan/wwan_core.c
index 17431f1b1a0c..c2878efcde59 100644
--- a/drivers/net/wwan/wwan_core.c
+++ b/drivers/net/wwan/wwan_core.c
@@ -982,7 +982,7 @@ static int wwan_rtnl_newlink(struct net *src_net, struct net_device *dev,
ret = wwandev->ops->newlink(wwandev->ops_ctxt, dev,
link_id, extack);
else
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
out:
/* release the reference */
@@ -1053,9 +1053,11 @@ static void wwan_create_default_link(struct wwan_device *wwandev,
{
struct nlattr *tb[IFLA_MAX + 1], *linkinfo[IFLA_INFO_MAX + 1];
struct nlattr *data[IFLA_WWAN_MAX + 1];
+ struct nd_lock *nd_lock;
struct net_device *dev;
struct nlmsghdr *nlh;
struct sk_buff *msg;
+ int ret;
/* Forge attributes required to create a WWAN netdev. We first
* build a netlink message and then parse it. This looks
@@ -1097,7 +1099,14 @@ static void wwan_create_default_link(struct wwan_device *wwandev,
if (WARN_ON(IS_ERR(dev)))
goto unlock;
- if (WARN_ON(wwan_rtnl_newlink(&init_net, dev, tb, data, NULL))) {
+ if (!attach_new_nd_lock(dev))
+ goto unlock;
+
+ lock_netdev(dev, &nd_lock);
+ ret = wwan_rtnl_newlink(&init_net, dev, tb, data, NULL);
+ unlock_netdev(nd_lock);
+
+ if (WARN_ON(ret)) {
free_netdev(dev);
goto unlock;
}
* [PATCH NET-PREV 26/51] 6lowpan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (24 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 25/51] wwan: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 27/51] vlan: " Kirill Tkhai
` (26 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign an nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/6lowpan/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index 850d4a185f55..b5cbf85b291c 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -39,7 +39,7 @@ int lowpan_register_netdevice(struct net_device *dev,
dev->ndisc_ops = &lowpan_ndisc_ops;
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret < 0)
return ret;
@@ -52,10 +52,18 @@ EXPORT_SYMBOL(lowpan_register_netdevice);
int lowpan_register_netdev(struct net_device *dev,
enum lowpan_lltypes lltype)
{
+ struct nd_lock *nd_lock;
int ret;
rtnl_lock();
+ if (!attach_new_nd_lock(dev))
+ goto out;
+ lock_netdev(dev, &nd_lock);
ret = lowpan_register_netdevice(dev, lltype);
+ if (ret)
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
+out:
rtnl_unlock();
return ret;
}
* [PATCH NET-PREV 27/51] vlan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (25 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 26/51] 6lowpan: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 28/51] dsa: Use __register_netdevice() Kirill Tkhai
` (25 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign an nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since we know
the lock is held on that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/8021q/vlan.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index e45187b88220..ca3ba251a145 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -176,7 +176,7 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
if (err < 0)
goto out_uninit_mvrp;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out_uninit_mvrp;
@@ -196,7 +196,7 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
return 0;
out_unregister_netdev:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
out_uninit_mvrp:
if (grp->nr_vlan_devs == 0)
vlan_mvrp_uninit_applicant(real_dev);
@@ -217,6 +217,7 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
struct vlan_dev_priv *vlan;
struct net *net = dev_net(real_dev);
struct vlan_net *vn = net_generic(net, vlan_net_id);
+ struct nd_lock *nd_lock;
char name[IFNAMSIZ];
int err;
@@ -274,7 +275,13 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
vlan->flags = VLAN_FLAG_REORDER_HDR;
new_dev->rtnl_link_ops = &vlan_link_ops;
+
+ lock_netdev(real_dev, &nd_lock);
+ attach_nd_lock(new_dev, nd_lock);
err = register_vlan_dev(new_dev, NULL);
+ if (err)
+ detach_nd_lock(new_dev);
+ unlock_netdev(nd_lock);
if (err < 0)
goto out_free_newdev;
* [PATCH NET-PREV 28/51] dsa: Use __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (26 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 27/51] vlan: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 29/51] ip6gre: Use __register_netdevice() in .changelink Kirill Tkhai
` (24 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Inherit nd_lock from conduit during registration
of a new device.
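The inheritance pattern (lock the parent, attach its lock to the new
device, register, detach again on failure, unlock) can be modelled in
userspace as a rough sketch. struct sketch_lock, struct sketch_dev and the
sketch_*() helpers are hypothetical names for this illustration only; the
real lock_netdev() presumably also has to cope with a device's lock being
replaced concurrently, which this model ignores:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct sketch_lock {
	pthread_mutex_t mutex;
};

struct sketch_dev {
	struct sketch_lock *nd_lock; /* lock shared by related devices */
	int registered;
};

/* Lock the device's nd_lock and hand the locked lock back to the caller. */
void sketch_lock_netdev(struct sketch_dev *dev, struct sketch_lock **lockp)
{
	assert(dev->nd_lock != NULL);
	*lockp = dev->nd_lock;
	pthread_mutex_lock(&(*lockp)->mutex);
}

void sketch_unlock_netdev(struct sketch_lock *lock)
{
	pthread_mutex_unlock(&lock->mutex);
}

/*
 * Register @child under @parent's lock. @fail simulates a registration
 * error so that the rollback path can be exercised.
 */
int sketch_register_child(struct sketch_dev *parent, struct sketch_dev *child,
			  int fail)
{
	struct sketch_lock *lock;
	int ret;

	sketch_lock_netdev(parent, &lock);
	child->nd_lock = lock;		/* attach: inherit the parent's lock */
	ret = fail ? -EIO : 0;		/* stand-in for registration */
	if (ret)
		child->nd_lock = NULL;	/* detach on failure */
	else
		child->registered = 1;
	sketch_unlock_netdev(lock);
	return ret;
}
```

After a successful call both devices point at the same lock, so any later
operation that locks one of them also serializes against the other.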
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/dsa/user.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/net/dsa/user.c b/net/dsa/user.c
index f5adfa1d978a..cc3e0006f953 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -2686,6 +2686,7 @@ int dsa_user_create(struct dsa_port *port)
struct net_device *conduit = dsa_port_to_conduit(port);
struct dsa_switch *ds = port->ds;
struct net_device *user_dev;
+ struct nd_lock *nd_lock;
struct dsa_user_priv *p;
const char *name;
int assign_type;
@@ -2759,38 +2760,42 @@ int dsa_user_create(struct dsa_port *port)
dev_warn(ds->dev, "nonfatal error %d setting MTU to %d on port %d\n",
ret, ETH_DATA_LEN, port->index);
- ret = register_netdevice(user_dev);
+ lock_netdev(conduit, &nd_lock);
+ attach_nd_lock(user_dev, nd_lock);
+ ret = __register_netdevice(user_dev);
if (ret) {
netdev_err(conduit, "error %d registering interface %s\n",
ret, user_dev->name);
- rtnl_unlock();
+ detach_nd_lock(user_dev);
+ unlock_netdev(nd_lock);
goto out_phy;
}
+ ret = netdev_upper_dev_link(conduit, user_dev, NULL);
+ unlock_netdev(nd_lock);
+
+ if (ret)
+ goto out_unregister;
+
if (IS_ENABLED(CONFIG_DCB)) {
ret = dsa_user_dcbnl_init(user_dev);
if (ret) {
netdev_err(user_dev,
"failed to initialize DCB: %pe\n",
ERR_PTR(ret));
- rtnl_unlock();
goto out_unregister;
}
}
- ret = netdev_upper_dev_link(conduit, user_dev, NULL);
-
rtnl_unlock();
- if (ret)
- goto out_unregister;
-
return 0;
out_unregister:
- unregister_netdev(user_dev);
+ lock_netdev(user_dev, &nd_lock);
+ __unregister_netdevice(user_dev);
+ unlock_netdev(nd_lock);
out_phy:
- rtnl_lock();
phylink_disconnect_phy(p->dp->pl);
rtnl_unlock();
dsa_port_phylink_destroy(p->dp);
* [PATCH NET-PREV 29/51] ip6gre: Use __register_netdevice() in .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (27 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 28/51] dsa: Use __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 30/51] ip6_tunnel: Use __register_netdevice() in .newlink and .changelink Kirill Tkhai
` (23 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .changelink consistent with its callers,
which already assign an nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/ip6_gre.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 57cbf7942dc8..e40780da15a0 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -344,6 +344,7 @@ static struct ip6_tnl *ip6gre_tunnel_find(struct net *net,
}
static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
+ struct nd_lock *nd_lock,
const struct __ip6_tnl_parm *parms, int create)
{
struct ip6_tnl *t, *nt;
@@ -378,8 +379,11 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
nt->dev = dev;
nt->net = dev_net(dev);
- if (register_netdevice(dev) < 0)
+ attach_nd_lock(dev, nd_lock);
+ if (__register_netdevice(dev) < 0) {
+ detach_nd_lock(dev);
goto failed_free;
+ }
ip6gre_tnl_link_config(nt, 1);
ip6gre_tunnel_link(ign, nt);
@@ -1277,6 +1281,10 @@ static void ip6gre_tnl_parm_to_user(struct ip6_tnl_parm2 *u,
memcpy(u->name, p->name, sizeof(u->name));
}
+/* XXX: Currently ->ndo_siocdevprivate is called with @dev unlocked
+ * (the only place where @dev may be locked is phonet_device_autoconf(),
+ * but it can't be a caller of this).
+ */
static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
struct ifreq *ifr, void __user *data,
int cmd)
@@ -1287,6 +1295,7 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
struct ip6_tnl *t = netdev_priv(dev);
struct net *net = t->net;
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
+ struct nd_lock *nd_lock;
memset(&p1, 0, sizeof(p1));
@@ -1298,7 +1307,9 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
break;
}
ip6gre_tnl_parm_from_user(&p1, &p);
- t = ip6gre_tunnel_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6gre_tunnel_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (!t)
t = netdev_priv(dev);
}
@@ -1328,7 +1339,9 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
p.o_key = 0;
ip6gre_tnl_parm_from_user(&p1, &p);
- t = ip6gre_tunnel_locate(net, &p1, cmd == SIOCADDTUNNEL);
+ lock_netdev(dev, &nd_lock);
+ t = ip6gre_tunnel_locate(net, nd_lock, &p1, cmd == SIOCADDTUNNEL);
+ unlock_netdev(nd_lock);
if (dev != ign->fb_tunnel_dev && cmd == SIOCCHGTUNNEL) {
if (t) {
@@ -1369,7 +1382,9 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
goto done;
err = -ENOENT;
ip6gre_tnl_parm_from_user(&p1, &p);
- t = ip6gre_tunnel_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6gre_tunnel_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (!t)
goto done;
err = -EPERM;
@@ -2038,6 +2053,7 @@ ip6gre_changelink_common(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[], struct __ip6_tnl_parm *p_p,
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip6_tnl *t, *nt = netdev_priv(dev);
struct net *net = nt->net;
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
@@ -2055,7 +2071,7 @@ ip6gre_changelink_common(struct net_device *dev, struct nlattr *tb[],
ip6gre_netlink_parms(data, p_p);
- t = ip6gre_tunnel_locate(net, p_p, 0);
+ t = ip6gre_tunnel_locate(net, nd_lock, p_p, 0);
if (t) {
if (t->dev != dev)
* [PATCH NET-PREV 30/51] ip6_tunnel: Use __register_netdevice() in .newlink and .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (28 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 29/51] ip6gre: Use __register_netdevice() in .changelink Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 31/51] ip6_vti: " Kirill Tkhai
` (22 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink and .changelink consistent with
their callers, which already assign an nd_lock (matching the master's
nd_lock if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/ip6_tunnel.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 87dfb565a9f8..d6435cb1e4fc 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -257,7 +257,7 @@ static int ip6_tnl_create2(struct net_device *dev)
int err;
dev->rtnl_link_ops = &ip6_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
@@ -282,7 +282,8 @@ static int ip6_tnl_create2(struct net_device *dev)
* created tunnel or error pointer
**/
-static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
+static struct ip6_tnl *ip6_tnl_create(struct net *net, struct nd_lock *nd_lock,
+ struct __ip6_tnl_parm *p)
{
struct net_device *dev;
struct ip6_tnl *t;
@@ -307,6 +308,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
t = netdev_priv(dev);
t->parms = *p;
t->net = dev_net(dev);
+ attach_nd_lock(dev, nd_lock);
err = ip6_tnl_create2(dev);
if (err < 0)
goto failed_free;
@@ -314,6 +316,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
return t;
failed_free:
+ detach_nd_lock(dev);
free_netdev(dev);
failed:
return ERR_PTR(err);
@@ -322,6 +325,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
/**
* ip6_tnl_locate - find or create tunnel matching given parameters
* @net: network namespace
+ * @nd_lock: created device lock
* @p: tunnel parameters
* @create: != 0 if allowed to create new tunnel if no match found
*
@@ -335,6 +339,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
**/
static struct ip6_tnl *ip6_tnl_locate(struct net *net,
+ struct nd_lock *nd_lock,
struct __ip6_tnl_parm *p, int create)
{
const struct in6_addr *remote = &p->raddr;
@@ -357,7 +362,7 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
}
if (!create)
return ERR_PTR(-ENODEV);
- return ip6_tnl_create(net, p);
+ return ip6_tnl_create(net, nd_lock, p);
}
/**
@@ -1621,8 +1626,11 @@ ip6_tnl_parm_to_user(struct ip6_tnl_parm *u, const struct __ip6_tnl_parm *p)
* %-EINVAL if passed tunnel parameters are invalid,
* %-EEXIST if changing a tunnel's parameters would cause a conflict
* %-ENODEV if attempting to change or delete a nonexisting device
- **/
-
+ *
+ * XXX: Currently ->ndo_siocdevprivate is called with @dev unlocked
+ * (the only place where @dev may be locked is phonet_device_autoconf(),
+ * but it can't be a caller of this).
+ */
static int
ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
void __user *data, int cmd)
@@ -1633,6 +1641,7 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
struct ip6_tnl *t = netdev_priv(dev);
struct net *net = t->net;
struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
+ struct nd_lock *nd_lock;
memset(&p1, 0, sizeof(p1));
@@ -1644,7 +1653,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
break;
}
ip6_tnl_parm_from_user(&p1, &p);
- t = ip6_tnl_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6_tnl_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (IS_ERR(t))
t = netdev_priv(dev);
} else {
@@ -1667,7 +1678,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
p.proto != 0)
break;
ip6_tnl_parm_from_user(&p1, &p);
- t = ip6_tnl_locate(net, &p1, cmd == SIOCADDTUNNEL);
+ lock_netdev(dev, &nd_lock);
+ t = ip6_tnl_locate(net, nd_lock, &p1, cmd == SIOCADDTUNNEL);
+ unlock_netdev(nd_lock);
if (cmd == SIOCCHGTUNNEL) {
if (!IS_ERR(t)) {
if (t->dev != dev) {
@@ -1702,7 +1715,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
break;
err = -ENOENT;
ip6_tnl_parm_from_user(&p1, &p);
- t = ip6_tnl_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6_tnl_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (IS_ERR(t))
break;
err = -EPERM;
@@ -2003,6 +2018,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct net *net = dev_net(dev);
struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
struct ip_tunnel_encap ipencap;
@@ -2023,7 +2039,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
if (rtnl_dereference(ip6n->collect_md_tun))
return -EEXIST;
} else {
- t = ip6_tnl_locate(net, &nt->parms, 0);
+ t = ip6_tnl_locate(net, nd_lock, &nt->parms, 0);
if (!IS_ERR(t))
return -EEXIST;
}
@@ -2039,6 +2055,7 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip6_tnl *t = netdev_priv(dev);
struct __ip6_tnl_parm p;
struct net *net = t->net;
@@ -2058,7 +2075,7 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
if (p.collect_md)
return -EINVAL;
- t = ip6_tnl_locate(net, &p, 0);
+ t = ip6_tnl_locate(net, nd_lock, &p, 0);
if (!IS_ERR(t)) {
if (t->dev != dev)
return -EEXIST;
* [PATCH NET-PREV 31/51] ip6_vti: Use __register_netdevice() in .newlink and .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (29 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 30/51] ip6_tunnel: Use __register_netdevice() in .newlink and .changelink Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 32/51] ip6_sit: " Kirill Tkhai
` (21 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink and .changelink consistent with
their callers, which already assign an nd_lock (matching the master's
nd_lock if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/ip6_vti.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 590737c27537..b20a18c403e9 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -182,7 +182,7 @@ static int vti6_tnl_create2(struct net_device *dev)
int err;
dev->rtnl_link_ops = &vti6_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
@@ -196,7 +196,8 @@ static int vti6_tnl_create2(struct net_device *dev)
return err;
}
-static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
+static struct ip6_tnl *vti6_tnl_create(struct net *net, struct nd_lock *nd_lock,
+ struct __ip6_tnl_parm *p)
{
struct net_device *dev;
struct ip6_tnl *t;
@@ -221,6 +222,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
t->parms = *p;
t->net = dev_net(dev);
+ attach_nd_lock(dev, nd_lock);
err = vti6_tnl_create2(dev);
if (err < 0)
goto failed_free;
@@ -228,6 +230,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
return t;
failed_free:
+ detach_nd_lock(dev);
free_netdev(dev);
failed:
return NULL;
@@ -247,8 +250,8 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
* Return:
* matching tunnel or NULL
**/
-static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
- int create)
+static struct ip6_tnl *vti6_locate(struct net *net, struct nd_lock *nd_lock,
+ struct __ip6_tnl_parm *p, int create)
{
const struct in6_addr *remote = &p->raddr;
const struct in6_addr *local = &p->laddr;
@@ -269,7 +272,7 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
}
if (!create)
return NULL;
- return vti6_tnl_create(net, p);
+ return vti6_tnl_create(net, nd_lock, p);
}
/**
@@ -791,6 +794,10 @@ vti6_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p)
* %-EINVAL if passed tunnel parameters are invalid,
* %-EEXIST if changing a tunnel's parameters would cause a conflict
* %-ENODEV if attempting to change or delete a nonexisting device
+ *
+ * XXX: Currently ->ndo_siocdevprivate is called with @dev unlocked
+ * (the only place where @dev may be locked is phonet_device_autoconf(),
+ * but it can't be a caller of this).
**/
static int
vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data, int cmd)
@@ -801,6 +808,7 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
struct ip6_tnl *t = NULL;
struct net *net = dev_net(dev);
struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+ struct nd_lock *nd_lock;
memset(&p1, 0, sizeof(p1));
@@ -812,7 +820,9 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
break;
}
vti6_parm_from_user(&p1, &p);
- t = vti6_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = vti6_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
} else {
memset(&p, 0, sizeof(p));
}
@@ -834,7 +844,9 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
if (p.proto != IPPROTO_IPV6 && p.proto != 0)
break;
vti6_parm_from_user(&p1, &p);
- t = vti6_locate(net, &p1, cmd == SIOCADDTUNNEL);
+ lock_netdev(dev, &nd_lock);
+ t = vti6_locate(net, nd_lock, &p1, cmd == SIOCADDTUNNEL);
+ unlock_netdev(nd_lock);
if (dev != ip6n->fb_tnl_dev && cmd == SIOCCHGTUNNEL) {
if (t) {
if (t->dev != dev) {
@@ -866,7 +878,9 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
break;
err = -ENOENT;
vti6_parm_from_user(&p1, &p);
- t = vti6_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = vti6_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (!t)
break;
err = -EPERM;
@@ -1001,6 +1015,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct net *net = dev_net(dev);
struct ip6_tnl *nt;
@@ -1009,7 +1024,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev,
nt->parms.proto = IPPROTO_IPV6;
- if (vti6_locate(net, &nt->parms, 0))
+ if (vti6_locate(net, nd_lock, &nt->parms, 0))
return -EEXIST;
return vti6_tnl_create2(dev);
@@ -1028,6 +1043,7 @@ static int vti6_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip6_tnl *t;
struct __ip6_tnl_parm p;
struct net *net = dev_net(dev);
@@ -1038,7 +1054,7 @@ static int vti6_changelink(struct net_device *dev, struct nlattr *tb[],
vti6_netlink_parms(data, &p);
- t = vti6_locate(net, &p, 0);
+ t = vti6_locate(net, nd_lock, &p, 0);
if (t) {
if (t->dev != dev)
* [PATCH NET-PREV 32/51] ip6_sit: Use __register_netdevice() in .newlink and .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (30 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 31/51] ip6_vti: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 33/51] net: Now check nobody calls register_netdevice() with nd_lock attached Kirill Tkhai
` (20 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink and .changelink consistent with
their callers, which already assign an nd_lock (matching the master's
nd_lock if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/sit.c | 45 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 83b195f09561..1749defa4b70 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -212,7 +212,7 @@ static int ipip6_tunnel_create(struct net_device *dev)
dev->rtnl_link_ops = &sit_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
@@ -226,6 +226,7 @@ static int ipip6_tunnel_create(struct net_device *dev)
}
static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
+ struct nd_lock *nd_lock,
struct ip_tunnel_parm_kern *parms,
int create)
{
@@ -269,6 +270,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
nt = netdev_priv(dev);
nt->parms = *parms;
+ attach_nd_lock(dev, nd_lock);
if (ipip6_tunnel_create(dev) < 0)
goto failed_free;
@@ -278,6 +280,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
return nt;
failed_free:
+ detach_nd_lock(dev);
free_netdev(dev);
failed:
return NULL;
@@ -1200,11 +1203,14 @@ ipip6_tunnel_get6rd(struct net_device *dev, struct ip_tunnel_parm __user *data)
struct ip_tunnel *t = netdev_priv(dev);
struct ip_tunnel_parm_kern p;
struct ip_tunnel_6rd ip6rd;
+ struct nd_lock *nd_lock;
if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
if (!ip_tunnel_parm_from_user(&p, data))
return -EFAULT;
- t = ipip6_tunnel_locate(t->net, &p, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, &p, 0);
+ unlock_netdev(nd_lock);
}
if (!t)
t = netdev_priv(dev);
@@ -1273,9 +1279,13 @@ static int
ipip6_tunnel_get(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
- if (dev == dev_to_sit_net(dev)->fb_tunnel_dev)
- t = ipip6_tunnel_locate(t->net, p, 0);
+ if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 0);
+ unlock_netdev(nd_lock);
+ }
if (!t)
t = netdev_priv(dev);
memcpy(p, &t->parms, sizeof(*p));
@@ -1286,13 +1296,16 @@ static int
ipip6_tunnel_add(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
int err;
err = __ipip6_tunnel_ioctl_validate(t->net, p);
if (err)
return err;
- t = ipip6_tunnel_locate(t->net, p, 1);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 1);
+ unlock_netdev(nd_lock);
if (!t)
return -ENOBUFS;
return 0;
@@ -1302,13 +1315,16 @@ static int
ipip6_tunnel_change(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
int err;
err = __ipip6_tunnel_ioctl_validate(t->net, p);
if (err)
return err;
- t = ipip6_tunnel_locate(t->net, p, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 0);
+ unlock_netdev(nd_lock);
if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
if (!t)
return -ENOENT;
@@ -1333,12 +1349,15 @@ static int
ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
if (!ns_capable(t->net->user_ns, CAP_NET_ADMIN))
return -EPERM;
if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
- t = ipip6_tunnel_locate(t->net, p, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 0);
+ unlock_netdev(nd_lock);
if (!t)
return -ENOENT;
if (t == netdev_priv(dev_to_sit_net(dev)->fb_tunnel_dev))
@@ -1349,6 +1368,12 @@ ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm_kern *p)
return 0;
}
+/* This is called with rtnl locked and dev nd_lock unlocked.
+ * Note that we currently take nd_lock in each of the functions
+ * below (ipip6_tunnel_get, ipip6_tunnel_add, etc.) instead of
+ * taking it once here, since one of them calls
+ * call_netdevice_notifiers(), which is not prepared for nd_lock yet.
+ */
static int
ipip6_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p,
int cmd)
@@ -1553,6 +1578,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct net *net = dev_net(dev);
struct ip_tunnel *nt;
struct ip_tunnel_encap ipencap;
@@ -1571,7 +1597,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
ipip6_netlink_parms(data, &nt->parms, &nt->fwmark);
- if (ipip6_tunnel_locate(net, &nt->parms, 0))
+ if (ipip6_tunnel_locate(net, nd_lock, &nt->parms, 0))
return -EEXIST;
err = ipip6_tunnel_create(dev);
@@ -1601,6 +1627,7 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip_tunnel *t = netdev_priv(dev);
struct ip_tunnel_encap ipencap;
struct ip_tunnel_parm_kern p;
@@ -1627,7 +1654,7 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
(!(dev->flags & IFF_POINTOPOINT) && p.iph.daddr))
return -EINVAL;
- t = ipip6_tunnel_locate(net, &p, 0);
+ t = ipip6_tunnel_locate(net, nd_lock, &p, 0);
if (t) {
if (t->dev != dev)
* [PATCH NET-PREV 33/51] net: Now check nobody calls register_netdevice() with nd_lock attached
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (31 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 32/51] ip6_sit: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 34/51] dsa: Make all switch tree ports relate to same nd_lock Kirill Tkhai
` (19 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Now that .newlink and .changelink have been switched
to __register_netdevice(), there must be no remaining calls of
register_netdevice() with an nd_lock already attached.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 63ece39c9286..e6809a80644e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10847,25 +10847,14 @@ int register_netdevice(struct net_device *dev)
struct nd_lock *nd_lock;
int err;
- /* XXX: This "if" is to start one by one convertation
- * to use __register_netdevice() in devices, that
- * want to attach nd_lock themself (e.g., having newlink).
- * After all of them are converted, we remove this.
- */
- if (rcu_access_pointer(dev->nd_lock))
- return __register_netdevice(dev);
+ if (WARN_ON(rcu_access_pointer(dev->nd_lock)))
+ return -EINVAL;
nd_lock = alloc_nd_lock();
if (!nd_lock)
return -ENOMEM;
- /* This may be called from netdevice notifier, which is not converted
- * yet. The context is unknown: either some nd_lock is locked or not.
- * Sometimes here is nested mutex and sometimes is not. We use trylock
- * to silence lockdep assert about that.
- * It will be replaced by mutex_lock(), see next patches.
- */
- BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ mutex_lock(&nd_lock->mutex);
attach_nd_lock(dev, nd_lock);
err = __register_netdevice(dev);
if (err)
* [PATCH NET-PREV 34/51] dsa: Make all switch tree ports relate to same nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (32 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 33/51] net: Now check nobody calls register_netdevice() with nd_lock attached Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 35/51] cfg80211: Use fallback_nd_lock for registered devices Kirill Tkhai
` (18 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
dsa_tree_migrate_ports_from_lag_conduit() may pick any
of the ports as the new conduit, and it will then be linked to
the rest of the ports (using netdev_upper_dev_link()),
so all of them must share the same nd_lock.
XXX: Keep in mind that NETDEV_CHANGEUPPER is called
by netdev_upper_dev_unlink().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/dsa/dsa.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 668c729946ea..6468b03d3d46 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -1156,6 +1156,8 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
struct dsa_switch *ds = dp->ds;
struct dsa_switch_tree *dst = ds->dst;
enum dsa_tag_protocol default_proto;
+ struct nd_lock *nd_lock, *nd_lock2;
+ struct dsa_port *first_dp;
/* Find out which protocol the switch would prefer. */
default_proto = dsa_get_tag_protocol(dp, conduit);
@@ -1213,6 +1215,18 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
dst->tag_ops = tag_ops;
}
+ first_dp = dsa_tree_find_first_cpu(dst);
+ if (first_dp && first_dp->conduit) {
+ /* All conduits must relate to the same nd_lock
+ * since dsa_tree_migrate_ports_from_lag_conduit()
+ * may take any of them from list.
+ */
+ double_lock_netdev(first_dp->conduit, &nd_lock,
+ conduit, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ double_unlock_netdev(nd_lock, nd_lock2);
+ }
+
dp->conduit = conduit;
dp->type = DSA_PORT_TYPE_CPU;
dsa_port_set_tag_protocol(dp, dst->tag_ops);
* [PATCH NET-PREV 35/51] cfg80211: Use fallback_nd_lock for registered devices
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (33 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 34/51] dsa: Make all switch tree ports relate to same nd_lock Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 36/51] ieee802154: " Kirill Tkhai
` (17 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
For now we use fallback_nd_lock for all drivers that register
devices via cfg80211_register_netdevice().
One reason is that these devices are handled as a group
in cfg80211_switch_netns(), and we want to call
dev_change_net_namespace() under nd_lock in one of
the next patches.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wireless/ath/ath6kl/core.c | 2 ++
drivers/net/wireless/ath/wil6210/netdev.c | 2 ++
drivers/net/wireless/marvell/mwifiex/main.c | 5 +++++
drivers/net/wireless/quantenna/qtnfmac/core.c | 2 ++
net/mac80211/main.c | 2 ++
net/wireless/core.c | 10 ++++++++--
net/wireless/nl80211.c | 14 ++++++++++++++
7 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath6kl/core.c b/drivers/net/wireless/ath/ath6kl/core.c
index 4f0a7a185fc9..8c28f5a476ef 100644
--- a/drivers/net/wireless/ath/ath6kl/core.c
+++ b/drivers/net/wireless/ath/ath6kl/core.c
@@ -212,6 +212,7 @@ int ath6kl_core_init(struct ath6kl *ar, enum ath6kl_htc_type htc_type)
ar->avail_idx_map |= BIT(i);
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(ar->wiphy);
/* Add an initial station interface */
@@ -219,6 +220,7 @@ int ath6kl_core_init(struct ath6kl *ar, enum ath6kl_htc_type htc_type)
NL80211_IFTYPE_STATION, 0, INFRA_NETWORK);
wiphy_unlock(ar->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
if (!wdev) {
diff --git a/drivers/net/wireless/ath/wil6210/netdev.c b/drivers/net/wireless/ath/wil6210/netdev.c
index d5d364683c0e..57958b44717d 100644
--- a/drivers/net/wireless/ath/wil6210/netdev.c
+++ b/drivers/net/wireless/ath/wil6210/netdev.c
@@ -474,9 +474,11 @@ int wil_if_add(struct wil6210_priv *wil)
wil_update_net_queues_bh(wil, vif, NULL, true);
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(wiphy);
rc = wil_vif_add(wil, vif);
wiphy_unlock(wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
if (rc < 0)
goto free_dummy;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 96d1f6039fbc..c4b112d2f0b2 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -623,6 +623,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
}
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(adapter->wiphy);
/* Create station interface by default */
wdev = mwifiex_add_virtual_intf(adapter->wiphy, "mlan%d", NET_NAME_ENUM,
@@ -631,6 +632,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
mwifiex_dbg(adapter, ERROR,
"cannot create default STA interface\n");
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
goto err_add_intf;
}
@@ -642,6 +644,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
mwifiex_dbg(adapter, ERROR,
"cannot create AP interface\n");
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
goto err_add_intf;
}
@@ -654,11 +657,13 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
mwifiex_dbg(adapter, ERROR,
"cannot create p2p client interface\n");
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
goto err_add_intf;
}
}
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
mwifiex_drv_get_driver_version(adapter, fmt, sizeof(fmt) - 1);
diff --git a/drivers/net/wireless/quantenna/qtnfmac/core.c b/drivers/net/wireless/quantenna/qtnfmac/core.c
index 825b05dd3271..7952e3314aca 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/core.c
+++ b/drivers/net/wireless/quantenna/qtnfmac/core.c
@@ -597,9 +597,11 @@ static int qtnf_core_mac_attach(struct qtnf_bus *bus, unsigned int macid)
mac->wiphy_registered = 1;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(priv_to_wiphy(mac));
ret = qtnf_core_net_attach(mac, vif, "wlan%d", NET_NAME_ENUM);
wiphy_unlock(priv_to_wiphy(mac));
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
if (ret) {
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index a3104b6ea6f0..bacea2473a21 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1582,6 +1582,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
ieee80211_check_wbrf_support(local);
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(hw->wiphy);
/* add one default STA interface if supported */
@@ -1597,6 +1598,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
}
wiphy_unlock(hw->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
#ifdef CONFIG_INET
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 4d5d351bd0b5..8ba0ada86678 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1439,7 +1439,11 @@ int cfg80211_register_netdevice(struct net_device *dev)
/* we'll take care of this */
wdev->registered = true;
wdev->registering = true;
- ret = register_netdevice(dev);
+
+ if (!mutex_is_locked(&fallback_nd_lock.mutex))
+ return -EXDEV;
+ attach_nd_lock(dev, &fallback_nd_lock);
+ ret = __register_netdevice(dev);
if (ret)
goto out;
@@ -1447,8 +1451,10 @@ int cfg80211_register_netdevice(struct net_device *dev)
ret = 0;
out:
wdev->registering = false;
- if (ret)
+ if (ret) {
+ detach_nd_lock(dev);
wdev->registered = false;
+ }
return ret;
}
EXPORT_SYMBOL(cfg80211_register_netdevice);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 7397a372c78e..0fd66f75eace 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16455,6 +16455,7 @@ nl80211_set_ttlm(struct sk_buff *skb, struct genl_info *info)
#define NL80211_FLAG_NO_WIPHY_MTX 0x40
#define NL80211_FLAG_MLO_VALID_LINK_ID 0x80
#define NL80211_FLAG_MLO_UNSUPPORTED 0x100
+#define NL80211_FLAG_NEED_FALLBACK_ND_LOCK 0x200
#define INTERNAL_FLAG_SELECTORS(__sel) \
SELECTOR(__sel, NONE, 0) /* must be first */ \
@@ -16477,6 +16478,11 @@ nl80211_set_ttlm(struct sk_buff *skb, struct genl_info *info)
NL80211_FLAG_NEED_WIPHY | \
NL80211_FLAG_NEED_RTNL | \
NL80211_FLAG_NO_WIPHY_MTX) \
+ SELECTOR(__sel, WIPHY_RTNL_ND_LOCK, \
+ NL80211_FLAG_NEED_WIPHY | \
+ NL80211_FLAG_NEED_FALLBACK_ND_LOCK | \
+ NL80211_FLAG_NEED_RTNL | \
+ NL80211_FLAG_NO_WIPHY_MTX) \
SELECTOR(__sel, WDEV_RTNL, \
NL80211_FLAG_NEED_WDEV | \
NL80211_FLAG_NEED_RTNL) \
@@ -16545,6 +16551,7 @@ static int nl80211_pre_doit(const struct genl_split_ops *ops,
internal_flags = nl80211_internal_flags[ops->internal_flags];
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
if (internal_flags & NL80211_FLAG_NEED_WIPHY) {
rdev = cfg80211_get_dev_from_info(genl_info_net(info), info);
if (IS_ERR(rdev)) {
@@ -16621,11 +16628,15 @@ static int nl80211_pre_doit(const struct genl_split_ops *ops,
/* we keep the mutex locked until post_doit */
__release(&rdev->wiphy.mtx);
}
+
+ if (!(internal_flags & NL80211_FLAG_NEED_FALLBACK_ND_LOCK))
+ mutex_unlock(&fallback_nd_lock.mutex);
if (!(internal_flags & NL80211_FLAG_NEED_RTNL))
rtnl_unlock();
return 0;
out_unlock:
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
dev_put(dev);
return err;
@@ -16656,6 +16667,8 @@ static void nl80211_post_doit(const struct genl_split_ops *ops,
wiphy_unlock(&rdev->wiphy);
}
+ if (internal_flags & NL80211_FLAG_NEED_FALLBACK_ND_LOCK)
+ mutex_unlock(&fallback_nd_lock.mutex);
if (internal_flags & NL80211_FLAG_NEED_RTNL)
rtnl_unlock();
@@ -16821,6 +16834,7 @@ static const struct genl_small_ops nl80211_small_ops[] = {
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags =
IFLAGS(NL80211_FLAG_NEED_WIPHY |
+ NL80211_FLAG_NEED_FALLBACK_ND_LOCK |
NL80211_FLAG_NEED_RTNL |
/* we take the wiphy mutex later ourselves */
NL80211_FLAG_NO_WIPHY_MTX),
* [PATCH NET-PREV 36/51] ieee802154: Use fallback_nd_lock for registered devices
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (34 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 35/51] cfg80211: Use fallback_nd_lock for registered devices Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 37/51] net: Introduce delayed event work Kirill Tkhai
` (16 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
For now we use fallback_nd_lock for all drivers that register
devices via ieee802154_if_add().
One reason is that these devices are handled as a group
in cfg802154_switch_netns(), and we want to call
dev_change_net_namespace() under nd_lock in one of
the next patches.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ieee802154/nl802154.c | 7 +++++++
net/mac802154/cfg.c | 2 ++
net/mac802154/iface.c | 10 ++++++++--
net/mac802154/main.c | 2 ++
4 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index 7eb37de3add2..a512f2a647e8 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -2691,6 +2691,7 @@ static int nl802154_del_llsec_seclevel(struct sk_buff *skb,
#define NL802154_FLAG_NEED_RTNL 0x04
#define NL802154_FLAG_CHECK_NETDEV_UP 0x08
#define NL802154_FLAG_NEED_WPAN_DEV 0x10
+#define NL802154_FLAG_NEED_FALLBACK_ND_LOCK 0x20
static int nl802154_pre_doit(const struct genl_split_ops *ops,
struct sk_buff *skb,
@@ -2700,9 +2701,12 @@ static int nl802154_pre_doit(const struct genl_split_ops *ops,
struct wpan_dev *wpan_dev;
struct net_device *dev;
bool rtnl = ops->internal_flags & NL802154_FLAG_NEED_RTNL;
+ bool nd = ops->internal_flags & NL802154_FLAG_NEED_FALLBACK_ND_LOCK;
if (rtnl)
rtnl_lock();
+ if (nd)
+ mutex_lock(&fallback_nd_lock.mutex);
if (ops->internal_flags & NL802154_FLAG_NEED_WPAN_PHY) {
rdev = cfg802154_get_dev_from_info(genl_info_net(info), info);
@@ -2769,6 +2773,8 @@ static void nl802154_post_doit(const struct genl_split_ops *ops,
}
}
+ if (ops->internal_flags & NL802154_FLAG_NEED_FALLBACK_ND_LOCK)
+ mutex_unlock(&fallback_nd_lock.mutex);
if (ops->internal_flags & NL802154_FLAG_NEED_RTNL)
rtnl_unlock();
}
@@ -2800,6 +2806,7 @@ static const struct genl_ops nl802154_ops[] = {
.doit = nl802154_new_interface,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
+ NL802154_FLAG_NEED_FALLBACK_ND_LOCK |
NL802154_FLAG_NEED_RTNL,
},
{
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index ef7f23af043f..405183d258b6 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -23,8 +23,10 @@ ieee802154_add_iface_deprecated(struct wpan_phy *wpan_phy,
struct net_device *dev;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
dev = ieee802154_if_add(local, name, name_assign_type, type,
cpu_to_le64(0x0000000000000000ULL));
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
return dev;
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c
index c0e2da5072be..7ec23e8268de 100644
--- a/net/mac802154/iface.c
+++ b/net/mac802154/iface.c
@@ -664,9 +664,15 @@ ieee802154_if_add(struct ieee802154_local *local, const char *name,
if (ret)
goto err;
- ret = register_netdevice(ndev);
- if (ret < 0)
+ ret = -EXDEV;
+ if (!mutex_is_locked(&fallback_nd_lock.mutex))
+ goto err;
+ attach_nd_lock(ndev, &fallback_nd_lock);
+ ret = __register_netdevice(ndev);
+ if (ret < 0) {
+ detach_nd_lock(ndev);
goto err;
+ }
mutex_lock(&local->iflist_mtx);
list_add_tail_rcu(&sdata->list, &local->interfaces);
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index 21b7c3b280b4..14bcad399dae 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -246,9 +246,11 @@ int ieee802154_register_hw(struct ieee802154_hw *hw)
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
dev = ieee802154_if_add(local, "wpan%d", NET_NAME_ENUM,
NL802154_IFTYPE_NODE,
cpu_to_le64(0x0000000000000000ULL));
+ mutex_unlock(&fallback_nd_lock.mutex);
if (IS_ERR(dev)) {
rtnl_unlock();
rc = PTR_ERR(dev);
* [PATCH NET-PREV 37/51] net: Introduce delayed event work
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (35 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 36/51] ieee802154: " Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 38/51] failover: Link master and slave under nd_lock Kirill Tkhai
` (15 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Some drivers (e.g., failover and netvsc) use netdevice notifiers
to link devices to each other by calling netdev_master_upper_dev_link().
We want 1) both devices to use the same lock after linking, and
2) netdevice notifiers to be called with nd_lock held. We can't do
both at the same time, because that leads to a lock ordering
problem:
  lock_netdev(dev1, &nd_lock1);
  call_netdevice_notifiers()
    lock_netdev(dev2, &nd_lock2); <--- problem here if !locks_ordered()
    nd_lock_transfer_devices(&nd_lock1, &nd_lock2);
    netdev_master_upper_dev_link(dev1, dev2);
We can't use double_lock_netdev() instead of lock_netdev() here,
since dev2 is unknown at that moment.
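The ordering problem can be modelled in plain userspace C. This is only a sketch, not the patchset's implementation: struct model_lock, model_locks_ordered(), model_double_lock() and model_nested_lock_is_safe() are hypothetical names, and ordering locks by address is an assumption made for illustration (the real rule is whatever double_lock_netdev() implements). It shows that a caller who knows both locks up front can always take them in one global order, while a caller that already holds one lock may be forced into the wrong order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nd_lock; only tracks whether it is held. */
struct model_lock {
	bool held;
};

/* Assumed ordering rule for this sketch: the lower address is taken first. */
static bool model_locks_ordered(const struct model_lock *first,
				const struct model_lock *second)
{
	return (uintptr_t)first < (uintptr_t)second;
}

/*
 * Take both locks in the assumed global order, whatever order the caller
 * passed them in. Returns the lock that was taken first. If both arguments
 * are the same lock, it is taken once.
 */
static struct model_lock *model_double_lock(struct model_lock *a,
					    struct model_lock *b)
{
	struct model_lock *first = a, *second = b;

	assert(a && b);
	if (a == b) {
		a->held = true;
		return a;
	}
	if (!model_locks_ordered(a, b)) {
		first = b;
		second = a;
	}
	first->held = true;
	second->held = true;
	return first;
}

/*
 * A context that already holds @held can take @wanted without risking an
 * ABBA deadlock only if @held precedes @wanted in the global order.
 */
static bool model_nested_lock_is_safe(const struct model_lock *held,
				      const struct model_lock *wanted)
{
	return held == wanted || model_locks_ordered(held, wanted);
}
```

In this model, a notifier running under dev1's lock corresponds to model_nested_lock_is_safe(lock1, lock2), which may return false, whereas a context starting with no locks held can always use model_double_lock().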
This patch introduces an interface that allows events to be handled in delayed work.
It consists of three parts:
1) Delayed work that calls the event callback. The work starts with
no locks held, so it can take the locks of both devices in the correct
order;
2) A completion to notify the task that the delayed work is done;
3) A task_work that lets the task wait for the completion at
a point where the task no longer holds nd_lock.
Here is an example of what happens on module loading:
[Task]                              [Work]
insmod slave_netdev_drv.ko
enter to kernel
init_module()
...
...
lock_netdev()
call_netdevice_notifiers()
  schedule_delayed_event()
unlock_netdev()
                                    delayed_event_work()
                                      double_lock_netdev(dev1, &nd_lock1, dev2, &nd_lock2)
                                      nd_lock_transfer_devices(&nd_lock1, &nd_lock2)
                                      netdev_master_upper_dev_link(dev1, dev2)
                                      double_unlock_netdev(nd_lock1, nd_lock2)
                                      complete()
wait_for_delayed_event_work()
  wait_for_completion()
exit to userspace
As can be seen, using task work preserves the user-visible behavior here.
We return from the syscall to userspace only after the delayed work has
completed and all events have been handled. This is why the task work is needed.
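The lifetime of the shared event structure can be followed with a small userspace model. This is a sketch under assumptions: struct model_event_info and the model_*() helpers are hypothetical names that mirror the reference counting visible in the diff below (one reference for the work, one more for the task work when the caller is a regular task), with the completion reduced to a flag and kfree() reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the lifetime of struct event_info. */
struct model_event_info {
	int usage;	/* models refcount_t usage */
	bool completed;	/* models the completion having been signalled */
	bool freed;	/* set where the kernel code would call kfree() */
};

/* Drop one reference; the last one "frees" the structure. */
static void model_put(struct model_event_info *info)
{
	assert(info->usage > 0);
	if (--info->usage == 0)
		info->freed = true;
}

/*
 * Models schedule_delayed_event(): one reference for the work, plus one
 * for the task work when the caller is a regular (non-kthread) task.
 */
static void model_schedule(struct model_event_info *info, bool user_task)
{
	info->usage = 1;
	info->completed = false;
	info->freed = false;
	if (user_task)
		info->usage++;
}

/* Models delayed_event_work(): run callback, signal completion, drop ref. */
static void model_work(struct model_event_info *info)
{
	info->completed = true;
	model_put(info);
}

/*
 * Models wait_for_delayed_event_work(): it can only proceed once the work
 * has completed. Returns false where the real code would block.
 */
static bool model_task_work(struct model_event_info *info)
{
	if (!info->completed)
		return false;
	model_put(info);
	return true;
}
```

With a user task, the structure survives until both the work and the task work have dropped their references; from a kthread, the work's reference is the only one.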
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
include/linux/netdevice.h | 2 +
net/core/dev.c | 95 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 97 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e9052e808a4..83b675ec2b0a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2991,6 +2991,8 @@ netdev_notifier_info_to_extack(const struct netdev_notifier_info *info)
int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
int call_netdevice_notifiers_info(unsigned long val,
struct netdev_notifier_info *info);
+int schedule_delayed_event(struct net_device *dev,
+ void (*func)(struct net_device *dev));
#define for_each_netdev(net, d) \
list_for_each_entry(d, &(net)->dev_base_head, dev_list)
diff --git a/net/core/dev.c b/net/core/dev.c
index e6809a80644e..1c447446215d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -154,6 +154,7 @@
#include <linux/pm_runtime.h>
#include <linux/prandom.h>
#include <linux/once_lite.h>
+#include <linux/task_work.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>
@@ -2088,6 +2089,100 @@ static int call_netdevice_notifiers_mtu(unsigned long val,
return call_netdevice_notifiers_info(val, &info.info);
}
+struct event_info {
+ struct work_struct work;
+ struct net_device *dev;
+ netdevice_tracker dev_tracker;
+ void (*func)(struct net_device *slave_dev);
+
+ struct callback_head task_work;
+ struct completion comp;
+ refcount_t usage;
+};
+
+static void put_delayed_reg_info(struct event_info *info)
+{
+ if (refcount_dec_and_test(&info->usage))
+ kfree(info);
+}
+
+static void delayed_event_work(struct work_struct *work)
+{
+ struct event_info *info;
+ struct net_device *dev;
+
+ info = container_of(work, struct event_info, work);
+ dev = info->dev;
+
+ info->func(dev);
+
+ /* There is no need to hold the device for the whole
+ * lifetime of @info. Put the device right after the
+ * callback has run, since the task that submitted
+ * this work may be waiting for the @dev refcount.
+ */
+ netdev_put(dev, &info->dev_tracker);
+ info->dev = NULL;
+
+ complete(&info->comp);
+ put_delayed_reg_info(info);
+}
+
+static void wait_for_delayed_event_work(struct callback_head *task_work)
+{
+ struct event_info *info;
+
+ info = container_of(task_work, struct event_info, task_work);
+ wait_for_completion(&info->comp);
+
+ put_delayed_reg_info(info);
+}
+
+static struct event_info *alloc_delayed_event_info(struct net_device *dev,
+ void (*func)(struct net_device *dev))
+{
+ struct event_info *info;
+
+ info = kmalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return NULL;
+
+ INIT_WORK(&info->work, delayed_event_work);
+ init_task_work(&info->task_work, wait_for_delayed_event_work);
+ init_completion(&info->comp);
+ refcount_set(&info->usage, 1);
+ info->func = func;
+ info->dev = dev;
+ netdev_hold(dev, &info->dev_tracker, GFP_KERNEL);
+
+ return info;
+}
+
+int schedule_delayed_event(struct net_device *dev,
+ void (*func)(struct net_device *dev))
+{
+ struct event_info *info;
+
+ info = alloc_delayed_event_info(dev, func);
+ if (!info)
+ return NOTIFY_DONE;
+
+ /* If the notifier is called from a regular task,
+ * make the task wait until registration is completed
+ * before it returns to userspace. E.g., a syscall
+ * caller will see failover already connected after
+ * loading the slave device driver.
+ */
+ if (!(current->flags & PF_KTHREAD)) {
+ if (!task_work_add(current, &info->task_work, TWA_RESUME))
+ refcount_inc(&info->usage);
+ }
+
+ schedule_work(&info->work);
+ return NOTIFY_OK;
+}
+EXPORT_SYMBOL_GPL(schedule_delayed_event);
+
#ifdef CONFIG_NET_INGRESS
static DEFINE_STATIC_KEY_FALSE(ingress_needed_key);
* [PATCH NET-PREV 38/51] failover: Link master and slave under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (36 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 37/51] net: Introduce delayed event work Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 39/51] netvsc: Make joined device to share master's nd_lock Kirill Tkhai
` (14 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We don't want to link the devices directly in failover_event(),
since in the future netdevice notifiers will be called with
nd_lock already held.
See also the comments in the patch introducing schedule_delayed_event().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/failover.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/net/core/failover.c b/net/core/failover.c
index 2a140b3ea669..83be0d0ab99a 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -46,6 +46,7 @@ static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
static int failover_slave_register(struct net_device *slave_dev)
{
struct netdev_lag_upper_info lag_upper_info;
+ struct nd_lock *nd_lock, *nd_lock2;
struct net_device *failover_dev;
struct failover_ops *fops;
int err;
@@ -72,8 +73,14 @@ static int failover_slave_register(struct net_device *slave_dev)
}
lag_upper_info.tx_type = NETDEV_LAG_TX_TYPE_ACTIVEBACKUP;
+
+ double_lock_netdev(slave_dev, &nd_lock, failover_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+
err = netdev_master_upper_dev_link(slave_dev, failover_dev, NULL,
&lag_upper_info, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
+
if (err) {
netdev_err(slave_dev, "can not set failover device %s (err = %d)\n",
failover_dev->name, err);
@@ -182,6 +189,18 @@ static int failover_slave_name_change(struct net_device *slave_dev)
return NOTIFY_DONE;
}
+static void call_failover_slave_register(struct net_device *dev)
+{
+ rtnl_lock();
+ if (dev->reg_state == NETREG_REGISTERED) {
+ failover_slave_register(dev);
+ failover_slave_link_change(dev);
+ failover_slave_name_change(dev);
+
+ }
+ rtnl_unlock();
+}
+
static int
failover_event(struct notifier_block *this, unsigned long event, void *ptr)
{
@@ -193,7 +212,10 @@ failover_event(struct notifier_block *this, unsigned long event, void *ptr)
switch (event) {
case NETDEV_REGISTER:
- return failover_slave_register(event_dev);
+ if (netdev_is_rx_handler_busy(event_dev))
+ return NOTIFY_DONE;
+ return schedule_delayed_event(event_dev,
+ call_failover_slave_register);
case NETDEV_UNREGISTER:
return failover_slave_unregister(event_dev);
case NETDEV_UP:
* [PATCH NET-PREV 39/51] netvsc: Make joined device to share master's nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (37 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 38/51] failover: Link master and slave under nd_lock Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 40/51] openvswitch: Make ports share nd_lock of master device Kirill Tkhai
` (13 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We don't want to do the join directly from netvsc_netdev_event(),
since in the future netdevice notifiers will be called with
nd_lock already held.
See also the comments in the patch introducing schedule_delayed_event().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/hyperv/netvsc_drv.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 44142245343d..be8038e6393f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2192,6 +2192,7 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
struct net_device *ndev, int context)
{
struct net_device_context *ndev_ctx = netdev_priv(ndev);
+ struct nd_lock *nd_lock, *nd_lock2;
int ret;
ret = netdev_rx_handler_register(vf_netdev,
@@ -2203,8 +2204,12 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
goto rx_handler_failed;
}
+ double_lock_netdev(ndev, &nd_lock, vf_netdev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+
ret = netdev_master_upper_dev_link(vf_netdev, ndev,
NULL, NULL, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
if (ret != 0) {
netdev_err(vf_netdev,
"can not set master device %s (err = %d)\n",
@@ -2797,6 +2802,20 @@ static struct hv_driver netvsc_drv = {
},
};
+static void call_netvsc_register(struct net_device *dev)
+{
+ unsigned long event;
+
+ rtnl_lock();
+ netvsc_prepare_bonding(dev);
+ netvsc_register_vf(dev, VF_REG_IN_NOTIFIER);
+ event = NETDEV_GOING_DOWN;
+ if (netif_running(dev))
+ event = NETDEV_CHANGE;
+ netvsc_vf_changed(dev, event);
+ rtnl_unlock();
+}
+
/*
* On Hyper-V, every VF interface is matched with a corresponding
* synthetic interface. The synthetic interface is presented first
@@ -2814,10 +2833,10 @@ static int netvsc_netdev_event(struct notifier_block *this,
return NOTIFY_DONE;
switch (event) {
- case NETDEV_POST_INIT:
- return netvsc_prepare_bonding(event_dev);
case NETDEV_REGISTER:
- return netvsc_register_vf(event_dev, VF_REG_IN_NOTIFIER);
+ return schedule_delayed_event(event_dev,
+ call_netvsc_register);
+ return NOTIFY_DONE;
case NETDEV_UNREGISTER:
return netvsc_unregister_vf(event_dev);
case NETDEV_UP:
* [PATCH NET-PREV 40/51] openvswitch: Make ports share nd_lock of master device
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (38 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 39/51] netvsc: Make joined device to share master's nd_lock Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 41/51] bridge: Make port to have the same nd_lock as bridge Kirill Tkhai
` (12 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/openvswitch/vport-netdev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 91a11067e458..e629fc3c1442 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -75,6 +75,7 @@ static struct net_device *get_dpdev(const struct datapath *dp)
struct vport *ovs_netdev_link(struct vport *vport, const char *name)
{
+ struct nd_lock *nd_lock, *nd_lock2;
int err;
vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), name);
@@ -99,9 +100,14 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name)
}
rtnl_lock();
+ double_lock_netdev(vport->dev, &nd_lock, get_dpdev(vport->dp), &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+
err = netdev_master_upper_dev_link(vport->dev,
get_dpdev(vport->dp),
NULL, NULL, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
+
if (err)
goto error_unlock;
* [PATCH NET-PREV 41/51] bridge: Make port to have the same nd_lock as bridge
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (39 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 40/51] openvswitch: Make ports share nd_lock of master device Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 42/51] bond: Make master and slave relate to the same nd_lock Kirill Tkhai
` (11 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/bridge/br_ioctl.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index f213ed108361..b4b0cc6ac08b 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -85,6 +85,7 @@ static int get_fdb_entries(struct net_bridge *br, void __user *userbuf,
static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
{
struct net *net = dev_net(br->dev);
+ struct nd_lock *nd_lock, *nd_lock2;
struct net_device *dev;
int ret;
@@ -95,9 +96,12 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
if (dev == NULL)
return -EINVAL;
- if (isadd)
+ if (isadd) {
+ double_lock_netdev(br->dev, &nd_lock, dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
ret = br_add_if(br, dev, NULL);
- else
+ double_unlock_netdev(nd_lock, nd_lock2);
+ } else
ret = br_del_if(br, dev);
return ret;
* [PATCH NET-PREV 42/51] bond: Make master and slave relate to the same nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (40 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 41/51] bridge: Make port to have the same nd_lock as bridge Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 43/51] net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached Kirill Tkhai
` (10 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/bonding/bond_main.c | 4 ++++
drivers/net/bonding/bond_options.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 96f5470a5f55..1140e01f72b8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4490,6 +4490,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
struct ifbond __user *u_binfo = NULL;
struct ifslave k_sinfo;
struct ifslave __user *u_sinfo = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
struct bond_opt_value newval;
struct net *net;
int res = 0;
@@ -4538,7 +4539,10 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
switch (cmd) {
case SIOCBONDENSLAVE:
+ double_lock_netdev(bond_dev, &nd_lock, slave_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
res = bond_enslave(bond_dev, slave_dev, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
break;
case SIOCBONDRELEASE:
res = bond_release(bond_dev, slave_dev);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 95d59a18c022..a3ebb8d6c529 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1605,6 +1605,7 @@ static int bond_option_slaves_set(struct bonding *bond,
const struct bond_opt_value *newval)
{
char command[IFNAMSIZ + 1] = { 0, };
+ struct nd_lock *nd_lock, *nd_lock2;
struct net_device *dev;
char *ifname;
int ret;
@@ -1627,7 +1628,10 @@ static int bond_option_slaves_set(struct bonding *bond,
switch (command[0]) {
case '+':
slave_dbg(bond->dev, dev, "Enslaving interface\n");
+ double_lock_netdev(bond->dev, &nd_lock, dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
ret = bond_enslave(bond->dev, dev, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
break;
case '-':
* [PATCH NET-PREV 43/51] net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (41 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 42/51] bond: Make master and slave relate to the same nd_lock Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 44/51] net: Call dellink with nd_lock is held Kirill Tkhai
` (9 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
... or with devices that do not share the same nd_lock.
At this point all callers have been converted to follow this rule.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1c447446215d..55df8157bca9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8126,6 +8126,12 @@ int netdev_master_upper_dev_link(struct net_device *dev,
.flags = NESTED_SYNC_IMM | NESTED_SYNC_TODO,
.data = NULL,
};
+ struct nd_lock *nd_lock;
+
+ nd_lock = rcu_dereference_protected(upper_dev->nd_lock, true);
+ if (WARN_ON(!mutex_is_locked(&nd_lock->mutex) ||
+ nd_lock != rcu_dereference_protected(dev->nd_lock, true)))
+ return -EXDEV;
return __netdev_upper_dev_link(dev, upper_dev, true,
upper_priv, upper_info, &priv, extack);
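The invariant enforced by the hunk above can be modelled in userspace as follows. This is a hedged sketch rather than the kernel code: model_check_link() and both struct names are hypothetical, a flag replaces mutex_is_locked(), and the NULL test is an addition of the model (the hunk itself dereferences the pointer directly).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types; 'locked' stands in for mutex_is_locked(). */
struct nd_lock_model {
	int locked;
};

struct netdev_model {
	struct nd_lock_model *nd_lock;
};

/*
 * Return 0 when linking may proceed: the upper device's lock is held
 * and the lower device is attached to that very same lock.
 * Return -EXDEV otherwise, mirroring the error code used in the hunk.
 */
static int model_check_link(const struct netdev_model *dev,
			    const struct netdev_model *upper_dev)
{
	const struct nd_lock_model *l = upper_dev->nd_lock;

	if (!l || !l->locked || l != dev->nd_lock)
		return -EXDEV;
	return 0;
}
```

The point of the check is that a caller which forgot to merge the two devices onto one lock, or which did not take that lock, fails early instead of linking devices that are serialized by different locks.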
* [PATCH NET-PREV 44/51] net: Call dellink with nd_lock is held
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (42 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 43/51] net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 45/51] t7xx: Use __unregister_netdevice() Kirill Tkhai
` (8 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
After the previous patches, all linked devices share the same lock.
Here we take nd_lock around dellink, so that calls to
unregister_netdevice() start to happen with nd_lock held.
Another benefit is that many netdev_upper_dev_unlink() calls now
happen with nd_lock held, though not all of them yet.
Note that ->dellink calls made from netdevice notifiers are not
covered yet.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 3 +++
net/core/rtnetlink.c | 10 ++++++++++
2 files changed, 13 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 55df8157bca9..f0f93b5a2819 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12399,6 +12399,7 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
* Do this across as many network namespaces as possible to
* improve batching efficiency.
*/
+ struct nd_lock *nd_lock;
struct net_device *dev;
struct net *net;
LIST_HEAD(dev_kill_list);
@@ -12411,10 +12412,12 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
list_for_each_entry(net, net_list, exit_list) {
for_each_netdev_reverse(net, dev) {
+ lock_netdev(dev, &nd_lock);
if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
else
unregister_netdevice_queue(dev, &dev_kill_list);
+ unlock_netdev(nd_lock);
}
}
unregister_netdevice_many(&dev_kill_list);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 67b4b0610d14..fdc06f0ecf31 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -449,12 +449,15 @@ EXPORT_SYMBOL_GPL(rtnl_link_register);
static void __rtnl_kill_links(struct net *net, struct rtnl_link_ops *ops)
{
+ struct nd_lock *nd_lock;
struct net_device *dev;
LIST_HEAD(list_kill);
for_each_netdev(net, dev) {
+ lock_netdev(dev, &nd_lock);
if (dev->rtnl_link_ops == ops)
ops->dellink(dev, &list_kill);
+ unlock_netdev(nd_lock);
}
unregister_netdevice_many(&list_kill);
}
@@ -3260,9 +3263,12 @@ static int rtnl_group_dellink(const struct net *net, int group)
for_each_netdev_safe(net, dev, aux) {
if (dev->group == group) {
const struct rtnl_link_ops *ops;
+ struct nd_lock *nd_lock;
ops = dev->rtnl_link_ops;
+ lock_netdev(dev, &nd_lock);
ops->dellink(dev, &list_kill);
+ unlock_netdev(nd_lock);
}
}
unregister_netdevice_many(&list_kill);
@@ -3273,13 +3279,17 @@ static int rtnl_group_dellink(const struct net *net, int group)
int rtnl_delete_link(struct net_device *dev, u32 portid, const struct nlmsghdr *nlh)
{
const struct rtnl_link_ops *ops;
+ struct nd_lock *nd_lock;
LIST_HEAD(list_kill);
ops = dev->rtnl_link_ops;
if (!ops || !ops->dellink)
return -EOPNOTSUPP;
+ lock_netdev(dev, &nd_lock);
ops->dellink(dev, &list_kill);
+ unlock_netdev(nd_lock);
+
unregister_netdevice_many_notify(&list_kill, portid, nlh);
return 0;
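The shape shared by the hunks above (hold the device's lock only while ->dellink queues it, then run the batch unregister with no nd_lock held) can be sketched in userspace like this. All model_* names are hypothetical, a flag stands in for the mutex, and the batch loop is only a stand-in for unregister_netdevice_many().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; a flag stands in for the kernel mutex. */
struct nd_lock_model {
	int locked;
	int acquisitions;	/* how many times the lock was taken */
};

struct netdev_model {
	struct nd_lock_model *nd_lock;
	int queued;		/* set by the dellink step */
	int unregistered;	/* set by the batch step */
};

static void model_lock(struct nd_lock_model *l)
{
	assert(!l->locked);
	l->locked = 1;
	l->acquisitions++;
}

static void model_unlock(struct nd_lock_model *l)
{
	assert(l->locked);
	l->locked = 0;
}

/* Stand-in for ->dellink(): expects the device's lock to be held. */
static void model_dellink(struct netdev_model *dev)
{
	assert(dev->nd_lock->locked);
	dev->queued = 1;
}

/* Queue every device under its own lock, then process the batch unlocked. */
static int model_delete_all(struct netdev_model *devs, size_t n)
{
	int done = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct nd_lock_model *l = devs[i].nd_lock;

		model_lock(l);
		model_dellink(&devs[i]);
		model_unlock(l);
	}
	for (i = 0; i < n; i++) {
		if (devs[i].queued) {
			devs[i].unregistered = 1;
			done++;
		}
	}
	return done;
}
```

Devices that share a lock simply take it one after another; the lock is never held across two loop iterations, so the model needs no ordering rule here.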
* [PATCH NET-PREV 45/51] t7xx: Use __unregister_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (43 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 44/51] net: Call dellink with nd_lock is held Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 46/51] 6lowpan: " Kirill Tkhai
` (7 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
->dellink is going to be called with nd_lock held.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wwan/t7xx/t7xx_netdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
index 3bde38147930..d3da299a59ff 100644
--- a/drivers/net/wwan/t7xx/t7xx_netdev.c
+++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
@@ -324,7 +324,7 @@ static void t7xx_ccmni_wwan_dellink(void *ctxt, struct net_device *dev, struct l
if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
return;
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
}
static const struct wwan_ops ccmni_wwan_ops = {
* [PATCH NET-PREV 46/51] 6lowpan: Use __unregister_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (44 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 45/51] t7xx: Use __unregister_netdevice() Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 47/51] netvsc: Call dev_change_net_namespace() under nd_lock Kirill Tkhai
` (6 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
->dellink is going to be called with nd_lock held.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/6lowpan/core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index b5cbf85b291c..bd77076b125e 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -71,15 +71,19 @@ EXPORT_SYMBOL(lowpan_register_netdev);
void lowpan_unregister_netdevice(struct net_device *dev)
{
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
lowpan_dev_debugfs_exit(dev);
}
EXPORT_SYMBOL(lowpan_unregister_netdevice);
void lowpan_unregister_netdev(struct net_device *dev)
{
+ struct nd_lock *nd_lock;
+
rtnl_lock();
+ lock_netdev(dev, &nd_lock);
lowpan_unregister_netdevice(dev);
+ unlock_netdev(nd_lock);
rtnl_unlock();
}
EXPORT_SYMBOL(lowpan_unregister_netdev);
* [PATCH NET-PREV 47/51] netvsc: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (45 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 46/51] 6lowpan: " Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 48/51] default_device: " Kirill Tkhai
` (5 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide "nd_lock is locked" context during
NETDEV_REGISTER (and later for NETDEV_UNREGISTER)
events. When calling from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/hyperv/netvsc_drv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index be8038e6393f..cc9f07f8d499 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2365,6 +2365,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev, int context)
struct netvsc_device *netvsc_dev;
struct bpf_prog *prog;
struct net_device *ndev;
+ struct nd_lock *nd_lock;
int ret;
if (vf_netdev->addr_len != ETH_ALEN)
@@ -2384,8 +2385,10 @@ static int netvsc_register_vf(struct net_device *vf_netdev, int context)
* done again in that context.
*/
if (!net_eq(dev_net(ndev), dev_net(vf_netdev))) {
+ lock_netdev(vf_netdev, &nd_lock);
ret = dev_change_net_namespace(vf_netdev,
dev_net(ndev), "eth%d");
+ unlock_netdev(nd_lock);
if (ret)
netdev_err(vf_netdev,
"could not move to same namespace as %s: %d\n",
* [PATCH NET-PREV 48/51] default_device: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (46 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 47/51] netvsc: Call dev_change_net_namespace() under nd_lock Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 49/51] ieee802154: " Kirill Tkhai
` (4 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide "nd_lock is locked" context during
NETDEV_REGISTER (and later for NETDEV_UNREGISTER)
events. When calling from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index f0f93b5a2819..c477b39d08b9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12357,6 +12357,7 @@ static void __net_exit default_device_exit_net(struct net *net)
{
struct netdev_name_node *name_node, *tmp;
struct net_device *dev, *aux;
+ struct nd_lock *nd_lock;
/*
* Push all migratable network devices back to the
* initial network namespace
@@ -12383,7 +12384,9 @@ static void __net_exit default_device_exit_net(struct net *net)
if (netdev_name_in_use(&init_net, name_node->name))
__netdev_name_node_alt_destroy(name_node);
+ lock_netdev(dev, &nd_lock);
err = dev_change_net_namespace(dev, &init_net, fb_name);
+ unlock_netdev(nd_lock);
if (err) {
pr_emerg("%s: failed to move %s to init_net: %d\n",
__func__, dev->name, err);
* [PATCH NET-PREV 49/51] ieee802154: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (47 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 48/51] default_device: " Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:44 ` [PATCH NET-PREV 50/51] cfg80211: " Kirill Tkhai
` (3 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide "nd_lock is locked" context during
NETDEV_REGISTER (and later for NETDEV_UNREGISTER)
events. When calling from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ieee802154/core.c | 2 ++
net/ieee802154/nl802154.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
index 60e8fff1347e..8a85a57bf042 100644
--- a/net/ieee802154/core.c
+++ b/net/ieee802154/core.c
@@ -349,10 +349,12 @@ static void __net_exit cfg802154_pernet_exit(struct net *net)
struct cfg802154_registered_device *rdev;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
list_for_each_entry(rdev, &cfg802154_rdev_list, list) {
if (net_eq(wpan_phy_net(&rdev->wpan_phy), net))
WARN_ON(cfg802154_switch_netns(rdev, &init_net));
}
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
}
diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index a512f2a647e8..e8f21de679b7 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -2855,6 +2855,7 @@ static const struct genl_ops nl802154_ops[] = {
.doit = nl802154_wpan_phy_netns,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
+ NL802154_FLAG_NEED_FALLBACK_ND_LOCK |
NL802154_FLAG_NEED_RTNL,
},
{
* [PATCH NET-PREV 50/51] cfg80211: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (48 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 49/51] ieee802154: " Kirill Tkhai
@ 2025-03-22 14:44 ` Kirill Tkhai
2025-03-22 14:44 ` [PATCH NET-PREV 51/51] net: Make all NETDEV_REGISTER events to be called " Kirill Tkhai
` (2 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:44 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide "nd_lock is locked" context during
NETDEV_REGISTER (and later for NETDEV_UNREGISTER)
events. When calling from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/wireless/core.c | 2 ++
net/wireless/nl80211.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 8ba0ada86678..c661bba9fc7b 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1605,10 +1605,12 @@ static void __net_exit cfg80211_pernet_exit(struct net *net)
struct cfg80211_registered_device *rdev;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
for_each_rdev(rdev) {
if (net_eq(wiphy_net(&rdev->wiphy), net))
WARN_ON(cfg80211_switch_netns(rdev, &init_net));
}
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
}
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 0fd66f75eace..f8bd7c72bd3e 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -17136,6 +17136,7 @@ static const struct genl_small_ops nl80211_small_ops[] = {
.doit = nl80211_wiphy_netns,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = IFLAGS(NL80211_FLAG_NEED_WIPHY |
+ NL80211_FLAG_NEED_FALLBACK_ND_LOCK |
NL80211_FLAG_NEED_RTNL |
NL80211_FLAG_NO_WIPHY_MTX),
},
* [PATCH NET-PREV 51/51] net: Make all NETDEV_REGISTER events to be called under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (49 preceding siblings ...)
2025-03-22 14:44 ` [PATCH NET-PREV 50/51] cfg80211: " Kirill Tkhai
@ 2025-03-22 14:44 ` Kirill Tkhai
2025-03-24 2:51 ` [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Stanislav Fomichev
2025-03-25 11:15 ` Jakub Kicinski
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:44 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c477b39d08b9..03c1bfa35309 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1737,13 +1737,19 @@ static void call_netdevice_unregister_notifiers(struct notifier_block *nb,
}
static int call_netdevice_register_net_notifiers(struct notifier_block *nb,
- struct net *net)
+ struct net *net,
+ bool locked)
{
+ struct nd_lock *nd_lock;
struct net_device *dev;
int err;
for_each_netdev(net, dev) {
+ if (!locked)
+ lock_netdev(dev, &nd_lock);
err = call_netdevice_register_notifiers(nb, dev);
+ if (!locked)
+ unlock_netdev(nd_lock);
if (err)
goto rollback;
}
@@ -1794,7 +1800,7 @@ int register_netdevice_notifier(struct notifier_block *nb)
if (dev_boot_phase)
goto unlock;
for_each_net(net) {
- err = call_netdevice_register_net_notifiers(nb, net);
+ err = call_netdevice_register_net_notifiers(nb, net, false);
if (err)
goto rollback;
}
@@ -1851,7 +1857,8 @@ EXPORT_SYMBOL(unregister_netdevice_notifier);
static int __register_netdevice_notifier_net(struct net *net,
struct notifier_block *nb,
- bool ignore_call_fail)
+ bool ignore_call_fail,
+ bool locked)
{
int err;
@@ -1861,7 +1868,7 @@ static int __register_netdevice_notifier_net(struct net *net,
if (dev_boot_phase)
return 0;
- err = call_netdevice_register_net_notifiers(nb, net);
+ err = call_netdevice_register_net_notifiers(nb, net, locked);
if (err && !ignore_call_fail)
goto chain_unregister;
@@ -1905,7 +1912,7 @@ int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb)
int err;
rtnl_lock();
- err = __register_netdevice_notifier_net(net, nb, false);
+ err = __register_netdevice_notifier_net(net, nb, false, false);
rtnl_unlock();
return err;
}
@@ -1944,17 +1951,20 @@ static void __move_netdevice_notifier_net(struct net *src_net,
struct notifier_block *nb)
{
__unregister_netdevice_notifier_net(src_net, nb);
- __register_netdevice_notifier_net(dst_net, nb, true);
+ __register_netdevice_notifier_net(dst_net, nb, true, true);
}
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
+ struct nd_lock *nd_lock;
int err;
rtnl_lock();
- err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
+ lock_netdev(dev, &nd_lock);
+ err = __register_netdevice_notifier_net(dev_net(dev), nb, false, true);
+ unlock_netdev(nd_lock);
if (!err) {
nn->nb = nb;
list_add(&nn->list, &dev->net_notifier_list);
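The new 'locked' argument follows a common "caller already holds the lock" convention. A minimal userspace sketch of that convention is below; the model_* names are hypothetical, a flag stands in for the mutex, and the model assumes that when locked is true the caller's lock covers every device being iterated, which is an assumption of the sketch rather than something the diff states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; a flag stands in for the kernel mutex. */
struct nd_lock_model {
	int locked;
};

struct netdev_model {
	struct nd_lock_model *nd_lock;
	int notified;	/* number of register events delivered */
};

static void model_lock(struct nd_lock_model *l)
{
	assert(!l->locked);
	l->locked = 1;
}

static void model_unlock(struct nd_lock_model *l)
{
	assert(l->locked);
	l->locked = 0;
}

/*
 * Deliver a register event for each device. When 'locked' is false the
 * function takes and drops each device's lock itself; when true it
 * relies on the caller. Either way the event runs with the lock held.
 */
static int model_notify_all(struct netdev_model *devs, size_t n, bool locked)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct nd_lock_model *l = devs[i].nd_lock;

		if (!locked)
			model_lock(l);
		assert(l->locked);	/* the notifier sees a held lock */
		devs[i].notified++;
		if (!locked)
			model_unlock(l);
	}
	return 0;
}
```

A flag like this avoids recursive locking in paths that are entered both from unlocked callers and from callers already inside the locked region.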
* Re: [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (50 preceding siblings ...)
2025-03-22 14:44 ` [PATCH NET-PREV 51/51] net: Make all NETDEV_REGISTER events to be called " Kirill Tkhai
@ 2025-03-24 2:51 ` Stanislav Fomichev
2025-03-25 11:15 ` Jakub Kicinski
52 siblings, 0 replies; 54+ messages in thread
From: Stanislav Fomichev @ 2025-03-24 2:51 UTC (permalink / raw)
To: Kirill Tkhai; +Cc: netdev, linux-kernel
On 03/22, Kirill Tkhai wrote:
> Hi,
>
> this patchset shows the way to completely remove rtnl lock and that
> this process can be done iteratively without any shocks. It implements
> the architecture of new fine-grained locking to use instead of rtnl,
> and iteratively converts many drivers to use it.
>
> I mostly wrote this a few years ago; more or less recently
> I rebased the patches on a kernel around 6.11 (there should not
> be many conflicts on that version). Currently I have no plans
> to complete this.
>
> If anyone wants to continue, this person can take this patchset
> and do the work.
Skimmed through, but high level comment: we are slowly migrating to netdev
instance/ops lock:
https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=cc34acd577f1a6ed805106bfcc9a262837dbd0da
Instead of introducing another nd_lock, it should be possible (in
theory) to convert existing upper/lower devices to maintain locking
hierarchy and grab upper->lower during netdev_lock_ops().
There are a few nasty places where we lock lower->upper->lower, like
this, that need careful consideration:
https://lore.kernel.org/netdev/20250313100657.2287455-1-sdf@fomichev.me/
* Re: [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (51 preceding siblings ...)
2025-03-24 2:51 ` [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Stanislav Fomichev
@ 2025-03-25 11:15 ` Jakub Kicinski
52 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2025-03-25 11:15 UTC (permalink / raw)
To: Kirill Tkhai; +Cc: netdev, linux-kernel
On Sat, 22 Mar 2025 17:37:41 +0300 Kirill Tkhai wrote:
> I mostly wrote this a few years ago; more or less recently
> I rebased the patches on a kernel around 6.11 (there should not
> be many conflicts on that version). Currently I have no plans
> to complete this.
>
> If anyone wants to continue, this person can take this patchset
> and do the work.
Was there a pain point you were trying to address, or was rtnl_lock
just an interesting challenge? Paolo mentioned trying to convert veth
to instance locking, I guess he may need to reach for similar solutions.