* [PATCH NET-PREV 01/51] net: Move some checks from __rtnl_newlink() to caller
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
@ 2025-03-22 14:37 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 02/51] net: Add nlaattr check to rtnl_link_get_net_capable() Kirill Tkhai
` (51 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:37 UTC
To: netdev, linux-kernel; +Cc: tkhai
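This patch prepares the rtnetlink code for using nd_lock.
It moves the nlmsg_parse_deprecated() and rtnl_ensure_unique_netns() calls
from __rtnl_newlink() into its caller, rtnl_newlink().
This is a step toward moving the dereference of tb[IFLA_MASTER] up
to where the main dev is looked up by ifi_index.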
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
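
A minimal userspace sketch of the control flow this patch gives rtnl_newlink(): the caller allocates the table, validates the request once, and routes every failure through a single out: label so the buffer is always freed. The names parse_attrs(), handle_request(), struct tbs and live_tables are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rtnl_newlink_tbs. */
struct tbs {
    int tb[4];
};

/* Counts tables that are allocated but not yet freed. */
static int live_tables;

/* Hypothetical parser: rejects negative input, like a failed attribute parse. */
static int parse_attrs(int msg, int *tb)
{
    if (msg < 0)
        return -EINVAL;
    tb[0] = msg;
    return 0;
}

/* Hypothetical worker standing in for __rtnl_newlink(): it may assume
 * that the table has already been parsed and validated by the caller.
 */
static int handle_request(const struct tbs *tbs)
{
    assert(tbs->tb[0] >= 0); /* guaranteed by parse_attrs() in the caller */
    return tbs->tb[0];
}

/* Same shape as rtnl_newlink() after the patch: parse in the caller,
 * one cleanup label for every exit path.
 */
static int newlink(int msg)
{
    struct tbs *tbs;
    int ret;

    tbs = malloc(sizeof(*tbs));
    if (!tbs)
        return -ENOMEM;
    live_tables++;

    ret = parse_attrs(msg, tbs->tb);
    if (ret < 0)
        goto out;

    ret = handle_request(tbs);
out:
    free(tbs);
    live_tables--;
    return ret;
}
```

In this sketch both the success path and the parse failure leave live_tables at zero, which is the property the out: label is there to preserve.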
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 73fd7f543fd0..b33a7e86c534 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3572,15 +3572,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
#ifdef CONFIG_MODULES
replay:
#endif
- err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
- ifla_policy, extack);
- if (err < 0)
- return err;
-
- err = rtnl_ensure_unique_netns(tb, extack, false);
- if (err < 0)
- return err;
-
ifm = nlmsg_data(nlh);
if (ifm->ifi_index > 0) {
link_specified = true;
@@ -3734,13 +3725,25 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct rtnl_newlink_tbs *tbs;
+ struct nlattr **tb;
int ret;
tbs = kmalloc(sizeof(*tbs), GFP_KERNEL);
if (!tbs)
return -ENOMEM;
+ tb = tbs->tb;
+
+ ret = nlmsg_parse_deprecated(nlh, sizeof(struct ifinfomsg), tb,
+ IFLA_MAX, ifla_policy, extack);
+ if (ret < 0)
+ goto out;
+
+ ret = rtnl_ensure_unique_netns(tb, extack, false);
+ if (ret < 0)
+ goto out;
ret = __rtnl_newlink(skb, nlh, tbs, extack);
+out:
kfree(tbs);
return ret;
}
* [PATCH NET-PREV 02/51] net: Add nlaattr check to rtnl_link_get_net_capable()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
2025-03-22 14:37 ` [PATCH NET-PREV 01/51] net: Move some checks from __rtnl_newlink() to caller Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 03/51] net: do_setlink() refactoring: move target_net acquiring to callers Kirill Tkhai
` (50 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
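This patch prepares the rtnetlink code for using nd_lock.
rtnl_link_get_net_capable() now returns NULL when none of IFLA_NET_NS_PID,
IFLA_NET_NS_FD and IFLA_TARGET_NETNSID is present; rtnl_newlink_create()
falls back to get_net(net) in that case.
This is a step toward moving the dereference of tb[IFLA_MASTER] up
to where the main dev is looked up by ifi_index.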
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 4 ++++
1 file changed, 4 insertions(+)
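
A userspace sketch of the three-state return this patch relies on: NULL for "no namespace attribute given", an encoded error for a bad request, and a referenced pointer otherwise. The helpers err_ptr(), is_err() and ptr_err() only imitate the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(); struct ns, get_ns_capable() and pick_dest() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct ns {
    int refs;
};

static struct ns current_ns = { .refs = 1 };
static struct ns other_ns = { .refs = 1 };

/* Hypothetical lookup: id 0 means "no namespace attribute present",
 * id 1 names other_ns, anything else is an invalid request.
 */
static struct ns *get_ns_capable(int id)
{
    if (id == 0)
        return NULL;
    if (id != 1)
        return err_ptr(-EINVAL);
    other_ns.refs++;
    return &other_ns;
}

/* Caller pattern as in rtnl_newlink_create() after this patch:
 * check for an error first, then substitute the default namespace.
 */
static struct ns *pick_dest(int id)
{
    struct ns *dest = get_ns_capable(id);

    if (is_err(dest))
        return dest;
    if (!dest) {
        current_ns.refs++; /* plays the role of get_net(net) */
        dest = &current_ns;
    }
    assert(dest->refs >= 2); /* the caller now owns one reference */
    return dest;
}
```

The order of the checks matters in this sketch: the error test has to come before the NULL test, because only a NULL result may be replaced by the default.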
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b33a7e86c534..34e35b81cfa6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2363,6 +2363,9 @@ static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
{
struct net *net;
+ if (!tb[IFLA_NET_NS_PID] && !tb[IFLA_NET_NS_FD] && !tb[IFLA_TARGET_NETNSID])
+ return NULL;
+
net = rtnl_link_get_net_by_nlattr(src_net, tb);
if (IS_ERR(net))
return net;
@@ -3480,6 +3483,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
if (IS_ERR(dest_net))
return PTR_ERR(dest_net);
+ dest_net = dest_net ? : get_net(net);
if (tb[IFLA_LINK_NETNSID]) {
int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
* [PATCH NET-PREV 03/51] net: do_setlink() refactoring: move target_net acquiring to callers
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
2025-03-22 14:37 ` [PATCH NET-PREV 01/51] net: Move some checks from __rtnl_newlink() to caller Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 02/51] net: Add nlaattr check to rtnl_link_get_net_capable() Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 04/51] net: Extract some code from __rtnl_newlink() to separate func Kirill Tkhai
` (49 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
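This patch prepares the rtnetlink code for using nd_lock.
do_setlink() now receives the target net from its callers, which acquire
it with rtnl_link_get_net_capable() and release it with put_net().
This is a step toward moving the dereference of tb[IFLA_MASTER] up
to where the main dev is looked up by ifi_index.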
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 78 +++++++++++++++++++++++++++++++-------------------
1 file changed, 49 insertions(+), 29 deletions(-)
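
A userspace sketch of the ownership change: the caller takes one reference on the target namespace, lends it to the per-device helper, and drops it on the way out. With a group of devices this means one get/put pair per request instead of one per device. struct ns, struct dev, ns_get(), ns_put(), setlink() and group_changelink() are hypothetical names, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

struct ns {
    int refs; /* current reference count */
    int gets; /* how many times a reference was taken */
};

static struct ns *ns_get(struct ns *ns)
{
    ns->refs++;
    ns->gets++;
    return ns;
}

static void ns_put(struct ns *ns)
{
    assert(ns->refs > 0);
    ns->refs--;
}

struct dev {
    int group;
    struct ns *ns;
};

/* The callee only borrows the caller's reference.
 * A NULL target means "do not move the device".
 */
static void setlink(struct ns *target, struct dev *dev)
{
    if (target)
        dev->ns = target;
}

/* The caller owns the reference for the whole loop, in the spirit of
 * rtnl_group_changelink() after this patch.
 */
static int group_changelink(struct ns *target, struct dev *devs,
                            size_t n, int group)
{
    int changed = 0;
    size_t i;

    if (target)
        ns_get(target);
    for (i = 0; i < n; i++) {
        if (devs[i].group != group)
            continue;
        setlink(target, &devs[i]);
        changed++;
    }
    if (target)
        ns_put(target);
    return changed;
}
```

In this sketch the reference count of the target returns to its starting value no matter how many devices matched the group.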
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 34e35b81cfa6..a5af69af235f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2774,7 +2774,7 @@ static int do_set_proto_down(struct net_device *dev,
#define DO_SETLINK_MODIFIED 0x01
/* notify flag means notify + modified. */
#define DO_SETLINK_NOTIFY 0x03
-static int do_setlink(const struct sk_buff *skb,
+static int do_setlink(struct net *net, const struct sk_buff *skb,
struct net_device *dev, struct ifinfomsg *ifm,
struct netlink_ext_ack *extack,
struct nlattr **tb, int status)
@@ -2788,25 +2788,16 @@ static int do_setlink(const struct sk_buff *skb,
else
ifname[0] = '\0';
- if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD] || tb[IFLA_TARGET_NETNSID]) {
+ if (net) { /* target net */
const char *pat = ifname[0] ? ifname : NULL;
- struct net *net;
int new_ifindex;
- net = rtnl_link_get_net_capable(skb, dev_net(dev),
- tb, CAP_NET_ADMIN);
- if (IS_ERR(net)) {
- err = PTR_ERR(net);
- goto errout;
- }
-
if (tb[IFLA_NEW_IFINDEX])
new_ifindex = nla_get_s32(tb[IFLA_NEW_IFINDEX]);
else
new_ifindex = 0;
err = __dev_change_net_namespace(dev, net, pat, new_ifindex);
- put_net(net);
if (err)
goto errout;
status |= DO_SETLINK_MODIFIED;
@@ -3171,6 +3162,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
struct net_device *dev;
+ struct net *target_net = NULL;
int err;
struct nlattr *tb[IFLA_MAX+1];
@@ -3183,6 +3175,13 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
goto errout;
+ target_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+ if (IS_ERR(target_net)) {
+ err = PTR_ERR(target_net);
+ target_net = NULL;
+ goto errout;
+ }
+
err = -EINVAL;
ifm = nlmsg_data(nlh);
if (ifm->ifi_index > 0)
@@ -3201,8 +3200,10 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
goto errout;
- err = do_setlink(skb, dev, ifm, extack, tb, 0);
+ err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
errout:
+ if (target_net)
+ put_net(target_net);
return err;
}
@@ -3440,38 +3441,51 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
struct nlattr **tb)
{
struct net_device *dev, *aux;
+ struct net *target_net;
int err;
+ target_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+ if (IS_ERR(target_net)) {
+ err = PTR_ERR(target_net);
+ target_net = NULL;
+ goto out;
+ }
+
for_each_netdev_safe(net, dev, aux) {
if (dev->group == group) {
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
- return err;
- err = do_setlink(skb, dev, ifm, extack, tb, 0);
+ break;
+ err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
if (err < 0)
- return err;
+ break;
}
}
-
- return 0;
+out:
+ if (target_net)
+ put_net(target_net);
+ return err;
}
static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
const struct rtnl_link_ops *ops,
const struct nlmsghdr *nlh,
struct nlattr **tb, struct nlattr **data,
- struct netlink_ext_ack *extack)
+ struct netlink_ext_ack *extack,
+ struct net *dest_net)
{
unsigned char name_assign_type = NET_NAME_USER;
struct net *net = sock_net(skb->sk);
u32 portid = NETLINK_CB(skb).portid;
- struct net *dest_net, *link_net;
+ struct net *link_net;
struct net_device *dev;
char ifname[IFNAMSIZ];
int err;
if (!ops->alloc && !ops->setup)
return -EOPNOTSUPP;
+ if (!dest_net)
+ dest_net = net;
if (tb[IFLA_IFNAME]) {
nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
@@ -3480,11 +3494,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
name_assign_type = NET_NAME_ENUM;
}
- dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
- if (IS_ERR(dest_net))
- return PTR_ERR(dest_net);
- dest_net = dest_net ? : get_net(net);
-
if (tb[IFLA_LINK_NETNSID]) {
int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
@@ -3535,7 +3544,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
out:
if (link_net)
put_net(link_net);
- put_net(dest_net);
return err;
out_unregister:
if (ops->newlink) {
@@ -3557,7 +3565,8 @@ struct rtnl_newlink_tbs {
static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtnl_newlink_tbs *tbs,
- struct netlink_ext_ack *extack)
+ struct netlink_ext_ack *extack,
+ struct net *target_net)
{
struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
struct nlattr ** const tb = tbs->tb;
@@ -3688,7 +3697,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
status |= DO_SETLINK_NOTIFY;
}
- return do_setlink(skb, dev, ifm, extack, tb, status);
+ return do_setlink(target_net, skb, dev, ifm, extack, tb, status);
}
if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
@@ -3722,12 +3731,14 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
return -EOPNOTSUPP;
}
- return rtnl_newlink_create(skb, ifm, ops, nlh, tb, data, extack);
+ return rtnl_newlink_create(skb, ifm, ops, nlh, tb, data, extack, target_net);
}
static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
+ struct net *net = sock_net(skb->sk);
+ struct net *target_net = NULL;
struct rtnl_newlink_tbs *tbs;
struct nlattr **tb;
int ret;
@@ -3746,8 +3757,17 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (ret < 0)
goto out;
- ret = __rtnl_newlink(skb, nlh, tbs, extack);
+ target_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+ if (IS_ERR(target_net)) {
+ ret = PTR_ERR(target_net);
+ target_net = NULL;
+ goto out;
+ }
+
+ ret = __rtnl_newlink(skb, nlh, tbs, extack, target_net);
out:
+ if (target_net)
+ put_net(target_net);
kfree(tbs);
return ret;
}
* [PATCH NET-PREV 04/51] net: Extract some code from __rtnl_newlink() to separate func
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (2 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 03/51] net: do_setlink() refactoring: move target_net acquiring to callers Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 05/51] net: Move dereference of tb[IFLA_MASTER] up Kirill Tkhai
` (48 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
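This patch prepares the rtnetlink code for using nd_lock.
The handling of an already existing device is extracted from
__rtnl_newlink() into the new helper __rtnl_newlink_setlink().
This is a step toward moving the dereference of tb[IFLA_MASTER] up
to where the main dev is looked up by ifi_index.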
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 167 +++++++++++++++++++++++++++-----------------------
1 file changed, 91 insertions(+), 76 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a5af69af235f..6da137f1a764 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3563,6 +3563,80 @@ struct rtnl_newlink_tbs {
struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
};
+static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct rtnl_newlink_tbs *tbs,
+ struct netlink_ext_ack *extack,
+ struct net *target_net, struct net_device *dev,
+ const struct rtnl_link_ops *ops,
+ struct nlattr **linkinfo, struct nlattr **data)
+{
+ const struct rtnl_link_ops *m_ops = NULL;
+ struct ifinfomsg *ifm = nlmsg_data(nlh);
+ struct nlattr ** const tb = tbs->tb;
+ struct nlattr **slave_data = NULL;
+ struct net_device *master_dev;
+ int err, status = 0;
+
+ if (nlh->nlmsg_flags & NLM_F_EXCL)
+ return -EEXIST;
+ if (nlh->nlmsg_flags & NLM_F_REPLACE)
+ return -EOPNOTSUPP;
+
+ err = validate_linkmsg(dev, tb, extack);
+ if (err < 0)
+ return err;
+
+ master_dev = netdev_master_upper_dev_get(dev);
+ if (master_dev)
+ m_ops = master_dev->rtnl_link_ops;
+
+ if (m_ops) {
+ err = -EINVAL;
+ if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
+ goto out;
+
+ if (m_ops->slave_maxtype &&
+ linkinfo[IFLA_INFO_SLAVE_DATA]) {
+ err = nla_parse_nested_deprecated(tbs->slave_attr,
+ m_ops->slave_maxtype,
+ linkinfo[IFLA_INFO_SLAVE_DATA],
+ m_ops->slave_policy,
+ extack);
+ if (err < 0)
+ goto out;
+ slave_data = tbs->slave_attr;
+ }
+ }
+
+ if (linkinfo[IFLA_INFO_DATA]) {
+ err = -EOPNOTSUPP;
+ if (!ops || ops != dev->rtnl_link_ops ||
+ !ops->changelink)
+ goto out;
+
+ err = ops->changelink(dev, tb, data, extack);
+ if (err < 0)
+ goto out;
+ status |= DO_SETLINK_NOTIFY;
+ }
+
+ if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
+ err = -EOPNOTSUPP;
+ if (!m_ops || !m_ops->slave_changelink)
+ goto out;
+
+ err = m_ops->slave_changelink(master_dev, dev, tb,
+ slave_data, extack);
+ if (err < 0)
+ goto out;
+ status |= DO_SETLINK_NOTIFY;
+ }
+
+ err = do_setlink(target_net, skb, dev, ifm, extack, tb, status);
+out:
+ return err;
+}
+
static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtnl_newlink_tbs *tbs,
struct netlink_ext_ack *extack,
@@ -3570,11 +3644,8 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
{
struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
struct nlattr ** const tb = tbs->tb;
- const struct rtnl_link_ops *m_ops;
- struct net_device *master_dev;
struct net *net = sock_net(skb->sk);
const struct rtnl_link_ops *ops;
- struct nlattr **slave_data;
char kind[MODULE_NAME_LEN];
struct net_device *dev;
struct ifinfomsg *ifm;
@@ -3585,29 +3656,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
#ifdef CONFIG_MODULES
replay:
#endif
- ifm = nlmsg_data(nlh);
- if (ifm->ifi_index > 0) {
- link_specified = true;
- dev = __dev_get_by_index(net, ifm->ifi_index);
- } else if (ifm->ifi_index < 0) {
- NL_SET_ERR_MSG(extack, "ifindex can't be negative");
- return -EINVAL;
- } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
- link_specified = true;
- dev = rtnl_dev_get(net, tb);
- } else {
- link_specified = false;
- dev = NULL;
- }
-
- master_dev = NULL;
- m_ops = NULL;
- if (dev) {
- master_dev = netdev_master_upper_dev_get(dev);
- if (master_dev)
- m_ops = master_dev->rtnl_link_ops;
- }
-
if (tb[IFLA_LINKINFO]) {
err = nla_parse_nested_deprecated(linkinfo, IFLA_INFO_MAX,
tb[IFLA_LINKINFO],
@@ -3645,59 +3693,26 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}
- slave_data = NULL;
- if (m_ops) {
- if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
- return -EINVAL;
-
- if (m_ops->slave_maxtype &&
- linkinfo[IFLA_INFO_SLAVE_DATA]) {
- err = nla_parse_nested_deprecated(tbs->slave_attr,
- m_ops->slave_maxtype,
- linkinfo[IFLA_INFO_SLAVE_DATA],
- m_ops->slave_policy,
- extack);
- if (err < 0)
- return err;
- slave_data = tbs->slave_attr;
- }
+ ifm = nlmsg_data(nlh);
+ if (ifm->ifi_index > 0) {
+ link_specified = true;
+ dev = __dev_get_by_index(net, ifm->ifi_index);
+ } else if (ifm->ifi_index < 0) {
+ NL_SET_ERR_MSG(extack, "ifindex can't be negative");
+ return -EINVAL;
+ } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
+ link_specified = true;
+ dev = rtnl_dev_get(net, tb);
+ } else {
+ link_specified = false;
+ dev = NULL;
}
if (dev) {
- int status = 0;
-
- if (nlh->nlmsg_flags & NLM_F_EXCL)
- return -EEXIST;
- if (nlh->nlmsg_flags & NLM_F_REPLACE)
- return -EOPNOTSUPP;
-
- err = validate_linkmsg(dev, tb, extack);
- if (err < 0)
- return err;
-
- if (linkinfo[IFLA_INFO_DATA]) {
- if (!ops || ops != dev->rtnl_link_ops ||
- !ops->changelink)
- return -EOPNOTSUPP;
-
- err = ops->changelink(dev, tb, data, extack);
- if (err < 0)
- return err;
- status |= DO_SETLINK_NOTIFY;
- }
-
- if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
- if (!m_ops || !m_ops->slave_changelink)
- return -EOPNOTSUPP;
-
- err = m_ops->slave_changelink(master_dev, dev, tb,
- slave_data, extack);
- if (err < 0)
- return err;
- status |= DO_SETLINK_NOTIFY;
- }
-
- return do_setlink(target_net, skb, dev, ifm, extack, tb, status);
+ err = __rtnl_newlink_setlink(skb, nlh, tbs, extack,
+ target_net, dev,
+ ops, linkinfo, data);
+ return err;
}
if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
* [PATCH NET-PREV 05/51] net: Move dereference of tb[IFLA_MASTER] up
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (3 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 04/51] net: Extract some code from __rtnl_newlink() to separate func Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 06/51] net: Use unregister_netdevice_many() for both error cases in rtnl_newlink_create() Kirill Tkhai
` (47 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
...to where the main dev is looked up by ifi_index.
This patch prepares the rtnetlink code for using nd_lock.
Looking up dev and master in the same place allows
double-locking them at the same time.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 72 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 53 insertions(+), 19 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6da137f1a764..a33b60d1de2d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2675,7 +2675,7 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
return err;
}
-static int do_set_master(struct net_device *dev, int ifindex,
+static int do_set_master(struct net_device *dev, struct net_device *master,
struct netlink_ext_ack *extack)
{
struct net_device *upper_dev = netdev_master_upper_dev_get(dev);
@@ -2683,7 +2683,7 @@ static int do_set_master(struct net_device *dev, int ifindex,
int err;
if (upper_dev) {
- if (upper_dev->ifindex == ifindex)
+ if (upper_dev == master)
return 0;
ops = upper_dev->netdev_ops;
if (ops->ndo_del_slave) {
@@ -2695,10 +2695,8 @@ static int do_set_master(struct net_device *dev, int ifindex,
}
}
- if (ifindex) {
- upper_dev = __dev_get_by_index(dev_net(dev), ifindex);
- if (!upper_dev)
- return -EINVAL;
+ if (master) {
+ upper_dev = master;
ops = upper_dev->netdev_ops;
if (ops->ndo_add_slave) {
err = ops->ndo_add_slave(upper_dev, dev, extack);
@@ -2775,7 +2773,8 @@ static int do_set_proto_down(struct net_device *dev,
/* notify flag means notify + modified. */
#define DO_SETLINK_NOTIFY 0x03
static int do_setlink(struct net *net, const struct sk_buff *skb,
- struct net_device *dev, struct ifinfomsg *ifm,
+ struct net_device *dev, struct net_device *master,
+ struct ifinfomsg *ifm,
struct netlink_ext_ack *extack,
struct nlattr **tb, int status)
{
@@ -2897,8 +2896,8 @@ static int do_setlink(struct net *net, const struct sk_buff *skb,
goto errout;
}
- if (tb[IFLA_MASTER]) {
- err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack);
+ if (master) {
+ err = do_set_master(dev, master, extack);
if (err)
goto errout;
status |= DO_SETLINK_MODIFIED;
@@ -3156,12 +3155,24 @@ static struct net_device *rtnl_dev_get(struct net *net,
return __dev_get_by_name(net, ifname);
}
+static struct net_device *rtnl_master_get(struct net *net, struct nlattr *tb[])
+{
+ struct net_device *master;
+
+ if (!tb[IFLA_MASTER])
+ return NULL;
+ master = __dev_get_by_index(net, nla_get_u32(tb[IFLA_MASTER]));
+ if (!master)
+ return ERR_PTR(-EINVAL);
+ return master;
+}
+
static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
- struct net_device *dev;
+ struct net_device *dev, *master = NULL;
struct net *target_net = NULL;
int err;
struct nlattr *tb[IFLA_MAX+1];
@@ -3196,11 +3207,17 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
goto errout;
}
+ master = rtnl_master_get(target_net ? : net, tb);
+ if (IS_ERR(master)) {
+ err = -EINVAL;
+ goto errout;
+ }
+
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
goto errout;
- err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
+ err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
errout:
if (target_net)
put_net(target_net);
@@ -3440,7 +3457,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
struct netlink_ext_ack *extack,
struct nlattr **tb)
{
- struct net_device *dev, *aux;
+ struct net_device *dev, *aux, *master = NULL;
struct net *target_net;
int err;
@@ -3451,12 +3468,18 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
goto out;
}
+ master = rtnl_master_get(target_net ? : net, tb);
+ if (IS_ERR(master)) {
+ err = -EINVAL;
+ goto out;
+ }
+
for_each_netdev_safe(net, dev, aux) {
if (dev->group == group) {
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
break;
- err = do_setlink(target_net, skb, dev, ifm, extack, tb, 0);
+ err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
if (err < 0)
break;
}
@@ -3478,7 +3501,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
struct net *net = sock_net(skb->sk);
u32 portid = NETLINK_CB(skb).portid;
struct net *link_net;
- struct net_device *dev;
+ struct net_device *dev, *master = NULL;
char ifname[IFNAMSIZ];
int err;
@@ -3519,6 +3542,12 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
dev->ifindex = ifm->ifi_index;
+ master = rtnl_master_get(link_net ? : dest_net, tb);
+ if (IS_ERR(master)) {
+ err = -EINVAL;
+ goto out;
+ }
+
if (ops->newlink)
err = ops->newlink(link_net ? : net, dev, tb, data, extack);
else
@@ -3536,8 +3565,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
if (err < 0)
goto out_unregister;
}
- if (tb[IFLA_MASTER]) {
- err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack);
+ if (master) {
+ err = do_set_master(dev, master, extack);
if (err)
goto out_unregister;
}
@@ -3567,6 +3596,7 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtnl_newlink_tbs *tbs,
struct netlink_ext_ack *extack,
struct net *target_net, struct net_device *dev,
+ struct net_device *new_master,
const struct rtnl_link_ops *ops,
struct nlattr **linkinfo, struct nlattr **data)
{
@@ -3632,7 +3662,7 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
status |= DO_SETLINK_NOTIFY;
}
- err = do_setlink(target_net, skb, dev, ifm, extack, tb, status);
+ err = do_setlink(target_net, skb, dev, new_master, ifm, extack, tb, status);
out:
return err;
}
@@ -3647,7 +3677,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
const struct rtnl_link_ops *ops;
char kind[MODULE_NAME_LEN];
- struct net_device *dev;
+ struct net_device *dev, *new_master = NULL;
struct ifinfomsg *ifm;
struct nlattr **data;
bool link_specified;
@@ -3709,8 +3739,12 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
if (dev) {
+ new_master = rtnl_master_get(target_net ? : net, tb);
+ if (IS_ERR(new_master))
+ return -EINVAL;
+
err = __rtnl_newlink_setlink(skb, nlh, tbs, extack,
- target_net, dev,
+ target_net, dev, new_master,
ops, linkinfo, data);
return err;
}
* [PATCH NET-PREV 06/51] net: Use unregister_netdevice_many() for both error cases in rtnl_newlink_create()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (4 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 05/51] net: Move dereference of tb[IFLA_MASTER] up Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 07/51] net: Introduce nd_lock and primitives to work with it Kirill Tkhai
` (46 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
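
A userspace sketch of the "queue, then flush once" shape that the error path takes after this patch: both branches only add devices to a local kill list, and a single call tears down whatever was queued. struct dev, kill_queue(), kill_many(), dellink_pair() and teardown() are hypothetical names that loosely mirror unregister_netdevice_queue(), unregister_netdevice_many() and a type-specific ->dellink() hook.

```c
#include <assert.h>
#include <stddef.h>

struct dev {
    int registered;
    struct dev *next_kill; /* link in a pending-kill list */
};

/* Queue one device for later teardown. */
static void kill_queue(struct dev *d, struct dev **head)
{
    d->next_kill = *head;
    *head = d;
}

/* Tear down everything queued and return how many devices were handled. */
static int kill_many(struct dev **head)
{
    int n = 0;

    while (*head) {
        struct dev *d = *head;

        *head = d->next_kill;
        d->next_kill = NULL;
        assert(d->registered);
        d->registered = 0;
        n++;
    }
    return n;
}

/* Hypothetical type-specific hook: queues the device together with a
 * peer, as a paired device type might do.
 */
static void dellink_pair(struct dev *d, struct dev *peer, struct dev **head)
{
    kill_queue(d, head);
    kill_queue(peer, head);
}

/* Both branches only queue; the flush happens once, after the branch. */
static int teardown(struct dev *d, struct dev *peer)
{
    struct dev *kill_list = NULL;

    if (peer)
        dellink_pair(d, peer, &kill_list);
    else
        kill_queue(d, &kill_list);
    return kill_many(&kill_list);
}
```

Keeping the flush outside the branch means there is exactly one place where devices are actually torn down, whichever branch queued them.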
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a33b60d1de2d..046736091b4f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3503,6 +3503,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
struct net *link_net;
struct net_device *dev, *master = NULL;
char ifname[IFNAMSIZ];
+ LIST_HEAD(list_kill);
int err;
if (!ops->alloc && !ops->setup)
@@ -3576,13 +3577,11 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
return err;
out_unregister:
if (ops->newlink) {
- LIST_HEAD(list_kill);
-
ops->dellink(dev, &list_kill);
- unregister_netdevice_many(&list_kill);
} else {
- unregister_netdevice(dev);
+ unregister_netdevice_queue(dev, &list_kill);
}
+ unregister_netdevice_many(&list_kill);
goto out;
}
* [PATCH NET-PREV 07/51] net: Introduce nd_lock and primitives to work with it
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (5 preceding siblings ...)
2025-03-22 14:38 ` [PATCH NET-PREV 06/51] net: Use unregister_netdevice_many() for both error cases in rtnl_newlink_create() Kirill Tkhai
@ 2025-03-22 14:38 ` Kirill Tkhai
2025-03-22 14:38 ` [PATCH NET-PREV 08/51] net: Initially attaching and detaching nd_lock Kirill Tkhai
` (45 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:38 UTC
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
include/linux/netdevice.h | 28 ++++
net/core/dev.c | 329 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 357 insertions(+)
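
A userspace sketch of the locking order used by locks_ordered() and double_lock_netdev() in this patch: the fallback lock always comes first, otherwise the lock with the smaller address is taken first, and a pair that resolves to the same lock is taken only once. struct toy_lock, toy_acquire() and toy_release() are hypothetical, non-blocking placeholders that merely record the acquisition order; they are not mutexes, and the sketch leaves out the reference counting and the recheck of dev->nd_lock that the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_lock {
    bool held;
    int order; /* position in the global acquisition sequence */
};

static struct toy_lock fallback_lock;
static int acquire_seq;

/* Same rule as locks_ordered() in the patch:
 * the fallback lock is first, then the lock with the smaller address.
 */
static bool locks_ordered(struct toy_lock *a, struct toy_lock *b)
{
    if (a == &fallback_lock)
        return true;
    return (uintptr_t)a <= (uintptr_t)b && b != &fallback_lock;
}

static void toy_acquire(struct toy_lock *l)
{
    assert(!l->held); /* a real mutex would block here instead */
    l->held = true;
    l->order = ++acquire_seq;
}

static void toy_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Take both locks in the canonical order, whatever order they are passed in. */
static void double_lock(struct toy_lock *a, struct toy_lock *b)
{
    if (!locks_ordered(a, b)) {
        struct toy_lock *tmp = a;

        a = b;
        b = tmp;
    }
    toy_acquire(a);
    if (a != b)
        toy_acquire(b);
}

static void double_unlock(struct toy_lock *a, struct toy_lock *b)
{
    toy_release(a);
    if (a != b)
        toy_release(b);
}
```

Because every path sorts the pair the same way, two tasks locking the same two devices in opposite argument order would still acquire the underlying locks in the same sequence, which is what rules out an ABBA deadlock.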
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 614ec5d3d75b..e36e64310bd4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1716,6 +1716,14 @@ enum netdev_reg_state {
NETREG_DUMMY, /* dummy device for NAPI poll */
};
+struct nd_lock {
+ struct mutex mutex;
+ struct list_head list;
+ int nr; /* number of entries in list */
+ refcount_t usage;
+ struct rcu_head rcu;
+};
+
/**
* struct net_device - The DEVICE structure.
*
@@ -2081,6 +2089,8 @@ struct net_device {
char name[IFNAMSIZ];
struct netdev_name_node *name_node;
struct dev_ifalias __rcu *ifalias;
+ struct nd_lock __rcu *nd_lock; /* lock protecting this dev */
+ struct list_head nd_lock_entry; /* entry in nd_lock::list */
/*
* I/O specific fields
* FIXME: Merge these and struct ifmap into one
@@ -3094,6 +3104,24 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
return ret;
}
+void unlock_netdev(struct nd_lock *nd_lock);
+bool lock_netdev(struct net_device *dev, struct nd_lock **nd_lock);
+bool lock_netdev_nested(struct net_device *dev, struct nd_lock **nd_lock,
+ struct nd_lock *held_lock);
+bool double_lock_netdev(struct net_device *dev, struct nd_lock **nd_lock,
+ struct net_device *dev2, struct nd_lock **nd_lock2);
+void double_unlock_netdev(struct nd_lock *nd_lock, struct nd_lock *nd_lock2);
+
+struct nd_lock *alloc_nd_lock(void);
+void put_nd_lock(struct nd_lock *nd_lock);
+void attach_nd_lock(struct net_device *dev, struct nd_lock *nd_lock);
+void detach_nd_lock(struct net_device *dev);
+struct nd_lock *attach_new_nd_lock(struct net_device *dev);
+
+extern struct nd_lock fallback_nd_lock;
+
+void nd_lock_transfer_devices(struct nd_lock **p_lock, struct nd_lock **p_lock2);
+
int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
void unregister_netdevice_many(struct list_head *head);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0d0b983a6c21..9d98ab1e76bd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -170,6 +170,25 @@ static int call_netdevice_notifiers_extack(unsigned long val,
struct net_device *dev,
struct netlink_ext_ack *extack);
+/* While unregistering many devices at once, e.g., in ->exit_batch_rtnl
+ * methods, every netdev must be locked.
+ * Instead of taking all the original nd_locks of the devices at once, we
+ * transfer the devices to @fallback_nd_lock. This allows holding a single
+ * lock during unregistration. See locks_ordered() for locking order
+ * details.
+ *
+ * A low-priority TODO is to change this algorithm to transfer every
+ * device to one of the original locks of these devices instead.
+ *
+ * XXX: see the comment on nd_lock_transfer_devices().
+ */
+struct nd_lock fallback_nd_lock = {
+ .mutex = __MUTEX_INITIALIZER(fallback_nd_lock.mutex),
+ .list = LIST_HEAD_INIT(fallback_nd_lock.list),
+ .usage = REFCOUNT_INIT(1),
+};
+EXPORT_SYMBOL(fallback_nd_lock);
+
static DEFINE_MUTEX(ifalias_mutex);
/* protects napi_hash addition/deletion and napi_gen_id */
@@ -10322,6 +10341,315 @@ static void netdev_do_free_pcpu_stats(struct net_device *dev)
}
}
+struct nd_lock *alloc_nd_lock(void)
+{
+ struct nd_lock *nd_lock = kmalloc(sizeof(*nd_lock), GFP_KERNEL);
+
+ if (!nd_lock)
+ return NULL;
+
+ mutex_init(&nd_lock->mutex);
+ INIT_LIST_HEAD(&nd_lock->list);
+ nd_lock->nr = 0;
+ refcount_set(&nd_lock->usage, 1);
+ return nd_lock;
+}
+EXPORT_SYMBOL(alloc_nd_lock);
+
+void put_nd_lock(struct nd_lock *nd_lock)
+{
+ if (!refcount_dec_and_test(&nd_lock->usage))
+ return;
+ BUG_ON(!list_empty(&nd_lock->list));
+ kfree_rcu(nd_lock, rcu);
+}
+EXPORT_SYMBOL(put_nd_lock);
+
+/* Locking order: fallback_nd_lock is first,
+ * then prefer lock with smaller address.
+ */
+static bool locks_ordered(struct nd_lock *nd_lock, struct nd_lock *nd_lock2)
+{
+ if ((nd_lock <= nd_lock2 && nd_lock2 != &fallback_nd_lock) ||
+ nd_lock == &fallback_nd_lock)
+ return true;
+ return false;
+}
+
+/* Lock a live @dev or return false. @held_lock is an optional argument.
+ * If @held_lock is passed, the caller must guarantee that
+ * dev->nd_lock comes after @held_lock in the locking order (for details
+ * see locks_ordered()).
+ * Usually, @held_lock is fallback_nd_lock.
+ */
+static bool __lock_netdev(struct net_device *dev, struct nd_lock **ret_nd_lock,
+ struct nd_lock *held_lock)
+{
+ struct nd_lock *nd_lock;
+ bool got;
+
+ if (held_lock)
+ lockdep_assert_held(&held_lock->mutex);
+
+ while (1) {
+ rcu_read_lock();
+ nd_lock = rcu_dereference(dev->nd_lock);
+ if (nd_lock && nd_lock != held_lock)
+ got = refcount_inc_not_zero(&nd_lock->usage);
+ rcu_read_unlock();
+
+ if (unlikely(!nd_lock)) {
+ /* Someone is unregistering @dev in parallel */
+ *ret_nd_lock = NULL;
+ return false;
+ }
+
+ /* The same lock as we own. Nothing to do. */
+ if (nd_lock == held_lock)
+ break;
+
+ if (unlikely(!got)) {
+ /* @dev->nd_lock changed or @dev is unregistering */
+ cond_resched();
+ continue;
+ }
+
+ WARN_ON(held_lock && !locks_ordered(held_lock, nd_lock));
+
+ if (!held_lock)
+ mutex_lock(&nd_lock->mutex);
+ else
+ mutex_lock_nested(&nd_lock->mutex, SINGLE_DEPTH_NESTING);
+ /* Now that the mutex is held, check that dev->nd_lock has not changed */
+ if (likely(nd_lock == rcu_access_pointer(dev->nd_lock)))
+ break;
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+ cond_resched();
+ }
+
+ *ret_nd_lock = nd_lock;
+ return true;
+}
+
+bool lock_netdev(struct net_device *dev, struct nd_lock **nd_lock)
+{
+ return __lock_netdev(dev, nd_lock, NULL);
+}
+EXPORT_SYMBOL(lock_netdev);
+
+bool lock_netdev_nested(struct net_device *dev, struct nd_lock **nd_lock,
+ struct nd_lock *held_lock)
+{
+ return __lock_netdev(dev, nd_lock, held_lock);
+}
+EXPORT_SYMBOL(lock_netdev_nested);
+
+void unlock_netdev(struct nd_lock *nd_lock)
+{
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+}
+EXPORT_SYMBOL(unlock_netdev);
+
+/* Lock two devices and return the locks they are currently attached to.
+ * One of the devices may be NULL, and @dev and @dev2 may
+ * point to the same device. The matching double_unlock_netdev() is able
+ * to handle such cases.
+ */
+bool double_lock_netdev(struct net_device *dev, struct nd_lock **ret_nd_lock,
+ struct net_device *dev2, struct nd_lock **ret_nd_lock2)
+{
+ struct nd_lock *nd_lock, *nd_lock2;
+ bool got, got2, ret;
+
+ if (WARN_ON_ONCE(!dev && !dev2))
+ return false;
+
+ if (dev == dev2 || !dev != !dev2) {
+ ret = lock_netdev(dev ? : dev2, ret_nd_lock);
+ *ret_nd_lock2 = *ret_nd_lock;
+ return ret;
+ }
+
+ while (1) {
+ got = got2 = false;
+ rcu_read_lock();
+ nd_lock = rcu_dereference(dev->nd_lock);
+ if (nd_lock)
+ got = refcount_inc_not_zero(&nd_lock->usage);
+ nd_lock2 = rcu_dereference(dev2->nd_lock);
+ if (nd_lock2) {
+ if (nd_lock2 != nd_lock)
+ got2 = refcount_inc_not_zero(&nd_lock2->usage);
+ }
+ rcu_read_unlock();
+ if (!got || (!got2 && nd_lock2 != nd_lock))
+ goto restart;
+
+ if (locks_ordered(nd_lock, nd_lock2)) {
+ mutex_lock(&nd_lock->mutex);
+ if (nd_lock != nd_lock2)
+ mutex_lock_nested(&nd_lock2->mutex, SINGLE_DEPTH_NESTING);
+ } else {
+ mutex_lock(&nd_lock2->mutex);
+ if (nd_lock != nd_lock2)
+ mutex_lock_nested(&nd_lock->mutex, SINGLE_DEPTH_NESTING);
+ }
+
+ if (likely(nd_lock == rcu_access_pointer(dev->nd_lock) &&
+ nd_lock2 == rcu_access_pointer(dev2->nd_lock))) {
+ /* Both locks are acquired and correct */
+ break;
+ }
+ if (nd_lock != nd_lock2)
+ mutex_unlock(&nd_lock2->mutex);
+ mutex_unlock(&nd_lock->mutex);
+restart:
+ if (got)
+ put_nd_lock(nd_lock);
+ if (got2)
+ put_nd_lock(nd_lock2);
+ if (!nd_lock || !nd_lock2) {
+ *ret_nd_lock = *ret_nd_lock2 = NULL;
+ return false;
+ }
+ }
+
+ *ret_nd_lock = nd_lock;
+ *ret_nd_lock2 = nd_lock2;
+ return true;
+}
+EXPORT_SYMBOL(double_lock_netdev);
+
+void double_unlock_netdev(struct nd_lock *nd_lock, struct nd_lock *nd_lock2)
+{
+ if (nd_lock != nd_lock2)
+ unlock_netdev(nd_lock);
+ unlock_netdev(nd_lock2);
+}
+EXPORT_SYMBOL(double_unlock_netdev);
+
+/* Make the set of devices protected by @p_lock and the set of devices
+ * protected by @p_lock2 share the same lock (this function chooses one
+ * of @p_lock and @p_lock2 as that common lock).
+ *
+ * 1) We call this in drivers which bind two or more devices to each other.
+ * E.g., drivers using newlink (like bonding, bridge and veth), or connecting
+ * several devices in a switch (like dsa). Nested configurations are also
+ * brought under the same nd_lock (e.g., if a veth is attached to a bridge,
+ * the same lock will be shared by both veth peers, all bridge ports
+ * and the bridge itself).
+ *
+ * This allows sane locking like:
+ *
+ * lock_netdev(bridge, &nd_lock)
+ * ioctl(change bridge)
+ * netdevice notifier for bridge // protected by nd_lock
+ * netdevice notifier for veth // protected by nd_lock
+ * change veth parameter // protected by nd_lock
+ * netdevice notifier for other port // protected by nd_lock
+ * change port device parameter // protected by nd_lock
+ * unlock_netdev(nd_lock)
+ *
+ * So, each lock protects some group of devices in the system, and all
+ * of the devices in the group are connected in some logical way.
+ *
+ * 2) The main rule for choosing the common lock: prefer fallback_nd_lock.
+ * Why? Along with commonly used virtual devices, there are
+ * several hardware devices which connect devices in groups and
+ * touch or modify several devices together in one ioctl
+ * or netdevice event (e.g., mlx5). Without having the whole zoo of devices
+ * physically, it's impossible to organize them in small exact groups
+ * and test them. So, we attach them to the bigger fallback group.
+ *
+ * Suppose the bridge driver is converted and my_driver is not. If
+ * we attach my_driver dev1 to the bridge, the bridge and my_driver dev1
+ * must share the same nd_lock. But the only nd_lock we can attach is
+ * fallback_nd_lock, otherwise my_driver dev1 may end up in a different lock
+ * group than some my_driver dev2 after my_driver dev2 is loaded. This
+ * would be wrong, since dev1 and dev2 may be used in the same ioctl or
+ * netdevice event. So, fallback_nd_lock will be used as the resulting lock.
+ *
+ * Note that after all hardware drivers organize their logically connected
+ * devices in correct nd_lock groups, we will remove this rule.
+ *
+ * The second rule is that we prefer to migrate from the smaller list, since
+ * that takes fewer iterations.
+ *
+ * 3) Note that the reverse operation (splitting a lock into two locks) is
+ * not implemented at the moment (and it may be useless).
+ *
+ * 4) The lock now used for both sets is returned in the @p_lock2 argument.
+ */
+void nd_lock_transfer_devices(struct nd_lock **p_lock, struct nd_lock **p_lock2)
+{
+ struct nd_lock *lock = *p_lock, *lock2 = *p_lock2;
+ struct net_device *dev;
+
+ lockdep_assert_held(&lock->mutex);
+ lockdep_assert_held(&lock2->mutex);
+
+ if (lock == lock2)
+ return;
+
+ if (lock == &fallback_nd_lock ||
+ (lock2 != &fallback_nd_lock && lock->nr > lock2->nr))
+ swap(lock, lock2);
+
+ list_for_each_entry(dev, &lock->list, nd_lock_entry)
+ rcu_assign_pointer(dev->nd_lock, lock2);
+
+ list_splice(&lock->list, &lock2->list);
+ refcount_add(lock->nr, &lock2->usage);
+ lock2->nr += lock->nr;
+ /* Our caller must own @lock since it's locked */
+ WARN_ON(refcount_sub_and_test(lock->nr, &lock->usage));
+ lock->nr = 0;
+
+ *p_lock = lock;
+ *p_lock2 = lock2;
+}
+EXPORT_SYMBOL(nd_lock_transfer_devices);
+
+void attach_nd_lock(struct net_device *dev, struct nd_lock *nd_lock)
+{
+ lockdep_assert_held(&nd_lock->mutex);
+ rcu_assign_pointer(dev->nd_lock, nd_lock);
+ list_add(&dev->nd_lock_entry, &nd_lock->list);
+ refcount_inc(&nd_lock->usage);
+ nd_lock->nr++;
+}
+EXPORT_SYMBOL(attach_nd_lock);
+
+void detach_nd_lock(struct net_device *dev)
+{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
+
+ lockdep_assert_held(&nd_lock->mutex);
+ rcu_assign_pointer(dev->nd_lock, NULL);
+ list_del_init(&dev->nd_lock_entry);
+ nd_lock->nr--;
+ /* Our caller must own @lock since it's locked */
+ WARN_ON(refcount_dec_and_test(&nd_lock->usage));
+}
+EXPORT_SYMBOL(detach_nd_lock);
+
+struct nd_lock *attach_new_nd_lock(struct net_device *dev)
+{
+ struct nd_lock *nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ return NULL;
+
+ mutex_lock(&nd_lock->mutex);
+ attach_nd_lock(dev, nd_lock);
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+
+ return nd_lock;
+}
+EXPORT_SYMBOL(attach_new_nd_lock);
+
/**
* register_netdevice() - register a network device
* @dev: device to register
@@ -11094,6 +11422,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->link_watch_list);
INIT_LIST_HEAD(&dev->adj_list.upper);
INIT_LIST_HEAD(&dev->adj_list.lower);
+ INIT_LIST_HEAD(&dev->nd_lock_entry);
INIT_LIST_HEAD(&dev->ptype_all);
INIT_LIST_HEAD(&dev->ptype_specific);
INIT_LIST_HEAD(&dev->net_notifier_list);
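The merge rules described in the nd_lock_transfer_devices() comment can be tried out in a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct group, its fields and group_merge() are hypothetical names, plain integers stand in for refcount_t, and no mutexes or device lists are modelled. It assumes each attached device holds exactly one reference on its lock, as attach_nd_lock() and detach_nd_lock() suggest.

```c
/* Userspace model of merging two lock groups; not kernel code. */
#include <stdbool.h>
#include <stddef.h>

struct group {
	int nr;        /* devices attached to this lock (models nd_lock->nr) */
	int usage;     /* references: one per device plus one per holder */
	bool fallback; /* true for the group modelling fallback_nd_lock */
};

/* Merge the groups behind *a and *b. On return *b is the surviving
 * group and *a is the emptied one, mirroring the @p_lock2 convention.
 */
static void group_merge(struct group **a, struct group **b)
{
	struct group *src = *a, *dst = *b, *tmp;

	if (src == dst)
		return;

	/* Rule 1: the fallback group always survives.
	 * Rule 2: otherwise migrate the smaller group into the larger one.
	 */
	if (src->fallback || (!dst->fallback && src->nr > dst->nr)) {
		tmp = src;
		src = dst;
		dst = tmp;
	}

	/* Every migrated device takes its reference along. */
	dst->usage += src->nr;
	dst->nr += src->nr;
	src->usage -= src->nr;
	src->nr = 0;

	*a = src;
	*b = dst;
}
```

In this model the emptied group keeps only the references of its current holders, so it goes away once they drop them.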
* [PATCH NET-PREV 08/51] net: Initially attaching and detaching nd_lock
From: Kirill Tkhai @ 2025-03-22 14:38 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
To start converting devices one by one, we need
defaults assigned to the rest of the devices.
Here we add the default lock assignment and a branch
in register_netdevice() for already converted drivers.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
include/linux/netdevice.h | 8 +++
net/core/dev.c | 127 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 123 insertions(+), 12 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e36e64310bd4..2e9052e808a4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3122,14 +3122,22 @@ extern struct nd_lock fallback_nd_lock;
void nd_lock_transfer_devices(struct nd_lock **p_lock, struct nd_lock **p_lock2);
+int __register_netdevice(struct net_device *dev);
int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
void unregister_netdevice_many(struct list_head *head);
+/* XXX: This will be converted to take nd_lock after drivers are ready */
static inline void unregister_netdevice(struct net_device *dev)
{
unregister_netdevice_queue(dev, NULL);
}
+/* XXX: This will be used in places, where nd_lock is already taken */
+static inline void __unregister_netdevice(struct net_device *dev)
+{
+ unregister_netdevice_queue(dev, NULL);
+}
+
int netdev_refcnt_read(const struct net_device *dev);
void free_netdev(struct net_device *dev);
void init_dummy_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 9d98ab1e76bd..63ece39c9286 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10651,7 +10651,7 @@ struct nd_lock *attach_new_nd_lock(struct net_device *dev)
EXPORT_SYMBOL(attach_new_nd_lock);
/**
- * register_netdevice() - register a network device
+ * __register_netdevice() - register a network device
* @dev: device to register
*
* Take a prepared network device structure and make it externally accessible.
@@ -10659,7 +10659,7 @@ EXPORT_SYMBOL(attach_new_nd_lock);
* Callers must hold the rtnl lock - you may want register_netdev()
* instead of this.
*/
-int register_netdevice(struct net_device *dev)
+int __register_netdevice(struct net_device *dev)
{
int ret;
struct net *net = dev_net(dev);
@@ -10675,6 +10675,9 @@ int register_netdevice(struct net_device *dev)
BUG_ON(dev->reg_state != NETREG_UNINITIALIZED);
BUG_ON(!net);
+ if (WARN_ON(!rcu_access_pointer(dev->nd_lock)))
+ return -ENOLCK;
+
ret = ethtool_check_ops(dev->ethtool_ops);
if (ret)
return ret;
@@ -10837,6 +10840,40 @@ int register_netdevice(struct net_device *dev)
netdev_name_node_free(dev->name_node);
goto out;
}
+EXPORT_SYMBOL(__register_netdevice);
+
+int register_netdevice(struct net_device *dev)
+{
+ struct nd_lock *nd_lock;
+ int err;
+
+ /* XXX: This "if" allows a one-by-one conversion
+ * to __register_netdevice() for devices that
+ * want to attach nd_lock themselves (e.g., those having newlink).
+ * After all of them are converted, we will remove this.
+ */
+ if (rcu_access_pointer(dev->nd_lock))
+ return __register_netdevice(dev);
+
+ nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ return -ENOMEM;
+
+ /* This may be called from a netdevice notifier that is not converted
+ * yet. The context is unknown: some nd_lock may or may not be held,
+ * so this mutex is sometimes nested and sometimes not. We use trylock
+ * to silence the lockdep assert about that.
+ * It will be replaced by mutex_lock(), see next patches.
+ */
+ BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ attach_nd_lock(dev, nd_lock);
+ err = __register_netdevice(dev);
+ if (err)
+ detach_nd_lock(dev);
+ mutex_unlock(&nd_lock->mutex);
+ put_nd_lock(nd_lock);
+ return err;
+}
EXPORT_SYMBOL(register_netdevice);
/* Initialize the core of a dummy net device.
@@ -10907,7 +10944,23 @@ int register_netdev(struct net_device *dev)
if (rtnl_lock_killable())
return -EINTR;
- err = register_netdevice(dev);
+
+ /* Since this function is called without rtnl_lock() held,
+ * nested registration is not possible here (compare
+ * to .newlink). So it is less interesting for us
+ * than register_netdevice(). There may be some
+ * real cross-device links between devices belonging
+ * to a specific driver family, and they are handled by
+ * using fallback_nd_lock for all such devices.
+ * Also, see the comment in nd_lock_transfer_devices().
+ */
+ mutex_lock(&fallback_nd_lock.mutex);
+ attach_nd_lock(dev, &fallback_nd_lock);
+ err = __register_netdevice(dev);
+ if (err)
+ detach_nd_lock(dev);
+ mutex_unlock(&fallback_nd_lock.mutex);
+
rtnl_unlock();
return err;
}
@@ -11474,6 +11527,54 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
}
EXPORT_SYMBOL(alloc_netdev_mqs);
+static DEFINE_SPINLOCK(put_lock);
+static LIST_HEAD(put_list);
+
+static void put_work_func(struct work_struct *unused)
+{
+ struct nd_lock *nd_lock;
+ struct net_device *dev;
+ LIST_HEAD(list);
+
+ spin_lock(&put_lock);
+ list_replace_init(&put_list, &list);
+ spin_unlock(&put_lock);
+
+ while (!list_empty(&list)) {
+ dev = list_first_entry(&list,
+ struct net_device,
+ todo_list);
+ list_del_init(&dev->todo_list);
+
+ /* XXX: this nd_lock should finally be held during
+ * the whole unregistration. Since not all devices
+ * are converted yet, we place detach_nd_lock() here
+ * to be able to start attaching nd_lock to every device
+ * one by one in separate patches of this series.
+ * Then, it will be moved to callers (unregister_netdevice()
+ * and others).
+ *
+ * Note, we can't place the below in free_netdev(), because
+ * free_netdev() currently may be called both locked and unlocked
+ * from different callers.
+ *
+ * Also note that the lock may already be detached here if
+ * this is cleanup after a failed __register_netdevice().
+ */
+ if (lock_netdev(dev, &nd_lock)) {
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
+ }
+
+ if (dev->reg_state == NETREG_RELEASED)
+ put_device(&dev->dev); /* free via device release */
+ else /* Compatibility with error handling in drivers */
+ kvfree(dev);
+ }
+}
+
+static DECLARE_WORK(put_work, put_work_func);
+
/**
* free_netdev - free network device
* @dev: device
@@ -11486,6 +11587,7 @@ EXPORT_SYMBOL(alloc_netdev_mqs);
void free_netdev(struct net_device *dev)
{
struct napi_struct *p, *n;
+ bool work;
might_sleep();
@@ -11521,18 +11623,19 @@ void free_netdev(struct net_device *dev)
free_percpu(dev->xdp_bulkq);
dev->xdp_bulkq = NULL;
- /* Compatibility with error handling in drivers */
- if (dev->reg_state == NETREG_UNINITIALIZED ||
- dev->reg_state == NETREG_DUMMY) {
- kvfree(dev);
- return;
+ if (dev->reg_state != NETREG_UNINITIALIZED &&
+ dev->reg_state != NETREG_DUMMY) {
+ BUG_ON(dev->reg_state != NETREG_UNREGISTERED);
+ WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
}
- BUG_ON(dev->reg_state != NETREG_UNREGISTERED);
- WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
+ spin_lock(&put_lock);
+ list_add_tail(&dev->todo_list, &put_list);
+ work = list_is_singular(&put_list);
+ spin_unlock(&put_lock);
- /* will free via device release */
- put_device(&dev->dev);
+ if (work)
+ schedule_work(&put_work);
}
EXPORT_SYMBOL(free_netdev);
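The default assignment added to register_netdevice() in this patch can be followed in a userspace model. This is a rough sketch and not the kernel implementation: struct lock_group, struct device, the fail_register knob and all function names are hypothetical, integers replace refcount_t, and the mutex is left out. It assumes a freshly allocated lock starts with one reference that the wrapper drops before returning, which is how attach_new_nd_lock() appears to use alloc_nd_lock().

```c
/* Userspace model of the register_netdevice() wrapper; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct lock_group {
	int nr;    /* attached devices */
	int usage; /* references */
};

struct device {
	struct lock_group *group; /* models dev->nd_lock */
	int fail_register;        /* test knob: make core_register() fail */
};

static struct lock_group *group_alloc(void)
{
	struct lock_group *g = calloc(1, sizeof(*g));

	if (g)
		g->usage = 1; /* assumed: allocation returns one reference */
	return g;
}

static void group_put(struct lock_group *g)
{
	if (--g->usage == 0)
		free(g);
}

static void group_attach(struct device *d, struct lock_group *g)
{
	d->group = g;
	g->usage++;
	g->nr++;
}

static void group_detach(struct device *d)
{
	struct lock_group *g = d->group;

	assert(g->usage > 1); /* the caller still holds its own reference */
	d->group = NULL;
	g->nr--;
	g->usage--;
}

/* Stands in for __register_netdevice(): refuses a device with no lock. */
static int core_register(struct device *d)
{
	if (!d->group)
		return -ENOLCK;
	return d->fail_register ? -EINVAL : 0;
}

/* Stands in for register_netdevice(). */
static int model_register(struct device *d)
{
	struct lock_group *g;
	int err;

	if (d->group) /* a converted driver attached a lock itself */
		return core_register(d);

	g = group_alloc();
	if (!g)
		return -ENOMEM;

	group_attach(d, g);
	err = core_register(d);
	if (err)
		group_detach(d); /* roll back the default attachment */
	group_put(g);            /* drop the allocation reference */
	return err;
}
```

On success the device is left holding the only reference to its private group. On failure the group is detached and freed, so the device ends up with no lock attached.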
* [PATCH NET-PREV 09/51] net: Use register_netdevice() in loopback()
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
loopback is a simple interface without logical links to other devices.
Make it use register_netdevice() so that a unique nd_lock is allocated
for it.
loopback is converted, so 50% of the work of removing rtnl_lock in the
kernel is done.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/loopback.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2b486e7c749c..c911ee7e6c68 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -214,7 +214,11 @@ static __net_init int loopback_net_init(struct net *net)
goto out;
dev_net_set(dev, net);
- err = register_netdev(dev);
+ err = -EINTR;
+ if (rtnl_lock_killable())
+ goto out_free_netdev;
+ err = register_netdevice(dev);
+ rtnl_unlock();
if (err)
goto out_free_netdev;
* [PATCH NET-PREV 10/51] net: Underline newlink and changelink dependencies
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
This gives the rtnetlink code knowledge about the devices
touched by newlink and changelink, so that it can bring them into
the same lock group.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 1 +
drivers/net/amt.c | 5 +++++
drivers/net/bonding/bond_netlink.c | 5 +++++
drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 1 +
drivers/net/ipvlan/ipvlan_main.c | 1 +
drivers/net/ipvlan/ipvtap.c | 1 +
drivers/net/macsec.c | 1 +
drivers/net/macvlan.c | 1 +
drivers/net/macvtap.c | 1 +
drivers/net/vxlan/vxlan_core.c | 6 ++++++
drivers/net/wireless/virtual/virt_wifi.c | 1 +
include/net/rtnetlink.h | 16 ++++++++++++++++
net/8021q/vlan_netlink.c | 1 +
net/core/rtnetlink.c | 5 +++++
net/dsa/netlink.c | 5 +++++
net/hsr/hsr_netlink.c | 6 ++++++
net/ieee802154/6lowpan/core.c | 1 +
17 files changed, 58 insertions(+)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 9ad8d9856275..2dd3231df36c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -172,6 +172,7 @@ static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
.policy = ipoib_policy,
.priv_size = sizeof(struct ipoib_dev_priv),
.setup = ipoib_setup_common,
+ .newlink_deps = &generic_newlink_deps,
.newlink = ipoib_new_child_link,
.dellink = ipoib_del_child_link,
.changelink = ipoib_changelink,
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 6d15ab3bfbbc..2288f4bf649c 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3330,6 +3330,10 @@ static int amt_fill_info(struct sk_buff *skb, const struct net_device *dev)
return -EMSGSIZE;
}
+struct link_deps amt_newlink_deps = {
+ .mandatory.data = { IFLA_AMT_LINK, },
+};
+
static struct rtnl_link_ops amt_link_ops __read_mostly = {
.kind = "amt",
.maxtype = IFLA_AMT_MAX,
@@ -3337,6 +3341,7 @@ static struct rtnl_link_ops amt_link_ops __read_mostly = {
.priv_size = sizeof(struct amt_dev),
.setup = amt_link_setup,
.validate = amt_validate,
+ .newlink_deps = &amt_newlink_deps,
.newlink = amt_newlink,
.dellink = amt_dellink,
.get_size = amt_get_size,
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 2a6a424806aa..5fcab77d616f 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -906,6 +906,10 @@ static int bond_fill_linkxstats(struct sk_buff *skb,
return 0;
}
+struct link_deps bond_changelink_deps = {
+ .optional.data = { IFLA_BOND_ACTIVE_SLAVE, IFLA_BOND_PRIMARY, },
+};
+
struct rtnl_link_ops bond_link_ops __read_mostly = {
.kind = "bond",
.priv_size = sizeof(struct bonding),
@@ -914,6 +918,7 @@ struct rtnl_link_ops bond_link_ops __read_mostly = {
.policy = bond_policy,
.validate = bond_validate,
.newlink = bond_newlink,
+ .changelink_deps = &bond_changelink_deps,
.changelink = bond_changelink,
.get_size = bond_get_size,
.fill_info = bond_fill_info,
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index f3bea196a8f9..495368cbef34 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -400,6 +400,7 @@ struct rtnl_link_ops rmnet_link_ops __read_mostly = {
.priv_size = sizeof(struct rmnet_priv),
.setup = rmnet_vnd_setup,
.validate = rmnet_rtnl_validate,
+ .newlink_deps = &generic_newlink_deps,
.newlink = rmnet_newlink,
.dellink = rmnet_dellink,
.get_size = rmnet_get_size,
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 094f44dac5c8..aafaf9d1d822 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -700,6 +700,7 @@ static struct rtnl_link_ops ipvlan_link_ops = {
.priv_size = sizeof(struct ipvl_dev),
.setup = ipvlan_link_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = ipvlan_link_new,
.dellink = ipvlan_link_delete,
.get_link_net = ipvlan_get_link_net,
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c
index 1afc4c47be73..df1d22092b21 100644
--- a/drivers/net/ipvlan/ipvtap.c
+++ b/drivers/net/ipvlan/ipvtap.c
@@ -128,6 +128,7 @@ static void ipvtap_setup(struct net_device *dev)
static struct rtnl_link_ops ipvtap_link_ops __read_mostly = {
.kind = "ipvtap",
.setup = ipvtap_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = ipvtap_newlink,
.dellink = ipvtap_dellink,
.priv_size = sizeof(struct ipvtap_dev),
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 2da70bc3dd86..246cf09a0ebc 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4430,6 +4430,7 @@ static struct rtnl_link_ops macsec_link_ops __read_mostly = {
.policy = macsec_rtnl_policy,
.setup = macsec_setup,
.validate = macsec_validate_attr,
+ .newlink_deps = &generic_newlink_deps,
.newlink = macsec_newlink,
.changelink = macsec_changelink,
.dellink = macsec_dellink,
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 24298a33e0e9..b51e2e21dead 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1754,6 +1754,7 @@ static struct net *macvlan_get_link_net(const struct net_device *dev)
static struct rtnl_link_ops macvlan_link_ops = {
.kind = "macvlan",
.setup = macvlan_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = macvlan_newlink,
.dellink = macvlan_dellink,
.get_link_net = macvlan_get_link_net,
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 29a5929d48e5..f24168080e04 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -140,6 +140,7 @@ static struct net *macvtap_link_net(const struct net_device *dev)
static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
.kind = "macvtap",
.setup = macvtap_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = macvtap_newlink,
.dellink = macvtap_dellink,
.get_link_net = macvtap_link_net,
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 8983e75e9881..b041ddc2ab34 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4579,6 +4579,10 @@ static struct net *vxlan_get_link_net(const struct net_device *dev)
return READ_ONCE(vxlan->net);
}
+struct link_deps vxlan_newlink_deps = {
+ .mandatory.data = { IFLA_VXLAN_LINK, },
+};
+
static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
.kind = "vxlan",
.maxtype = IFLA_VXLAN_MAX,
@@ -4586,7 +4590,9 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
.priv_size = sizeof(struct vxlan_dev),
.setup = vxlan_setup,
.validate = vxlan_validate,
+ .newlink_deps = &vxlan_newlink_deps,
.newlink = vxlan_newlink,
+ .changelink_deps= &vxlan_newlink_deps,
.changelink = vxlan_changelink,
.dellink = vxlan_dellink,
.get_size = vxlan_get_size,
diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c
index 4ee374080466..c80ae0e0df53 100644
--- a/drivers/net/wireless/virtual/virt_wifi.c
+++ b/drivers/net/wireless/virtual/virt_wifi.c
@@ -622,6 +622,7 @@ static void virt_wifi_dellink(struct net_device *dev,
static struct rtnl_link_ops virt_wifi_link_ops = {
.kind = "virt_wifi",
.setup = virt_wifi_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = virt_wifi_newlink,
.dellink = virt_wifi_dellink,
.priv_size = sizeof(struct virt_wifi_netdev_priv),
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index b45d57b5968a..f1702e8872cf 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -29,6 +29,18 @@ static inline enum rtnl_kinds rtnl_msgtype_kind(int msgtype)
return msgtype & RTNL_KIND_MASK;
}
+#define MAX_LINK_DEPS 5
+struct link_deps_table {
+ int tb[MAX_LINK_DEPS + 1];
+ int data[MAX_LINK_DEPS + 1];
+};
+
+struct link_deps {
+ struct link_deps_table mandatory;
+ struct link_deps_table optional;
+};
+extern struct link_deps generic_newlink_deps;
+
void rtnl_register(int protocol, int msgtype,
rtnl_doit_func, rtnl_dumpit_func, unsigned int flags);
int rtnl_register_module(struct module *owner, int protocol, int msgtype,
@@ -58,7 +70,9 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
* and @setup are unused. Returns a netdev or ERR_PTR().
* @priv_size: sizeof net_device private space
* @setup: net_device setup function
+ * @newlink_deps: Indexes of real devices that newlink depends on.
* @newlink: Function for configuring and registering a new device
+ * @changelink_deps: Indexes of real devices that changelink depends on.
* @changelink: Function for changing parameters of an existing device
* @dellink: Function to remove a device
* @get_size: Function to calculate required room for dumping device
@@ -96,11 +110,13 @@ struct rtnl_link_ops {
struct nlattr *data[],
struct netlink_ext_ack *extack);
+ struct link_deps *newlink_deps;
int (*newlink)(struct net *src_net,
struct net_device *dev,
struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack);
+ struct link_deps *changelink_deps;
int (*changelink)(struct net_device *dev,
struct nlattr *tb[],
struct nlattr *data[],
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index cf5219df7903..c71180ba0746 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -293,6 +293,7 @@ struct rtnl_link_ops vlan_link_ops __read_mostly = {
.priv_size = sizeof(struct vlan_dev_priv),
.setup = vlan_setup,
.validate = vlan_validate,
+ .newlink_deps = &generic_newlink_deps,
.newlink = vlan_newlink,
.changelink = vlan_changelink,
.dellink = unregister_vlan_dev,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 046736091b4f..cf060ba4cd1d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3490,6 +3490,11 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
return err;
}
+struct link_deps generic_newlink_deps = {
+ .mandatory.tb = { IFLA_LINK, }
+};
+EXPORT_SYMBOL_GPL(generic_newlink_deps);
+
static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
const struct rtnl_link_ops *ops,
const struct nlmsghdr *nlh,
diff --git a/net/dsa/netlink.c b/net/dsa/netlink.c
index 1332e56349e5..835d935814fb 100644
--- a/net/dsa/netlink.c
+++ b/net/dsa/netlink.c
@@ -11,6 +11,10 @@ static const struct nla_policy dsa_policy[IFLA_DSA_MAX + 1] = {
[IFLA_DSA_CONDUIT] = { .type = NLA_U32 },
};
+struct link_deps dsa_changelink_deps = {
+ .optional.data = { IFLA_DSA_CONDUIT, },
+};
+
static int dsa_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
@@ -57,6 +61,7 @@ struct rtnl_link_ops dsa_link_ops __read_mostly = {
.priv_size = sizeof(struct dsa_port),
.maxtype = IFLA_DSA_MAX,
.policy = dsa_policy,
+ .changelink_deps = &dsa_changelink_deps,
.changelink = dsa_changelink,
.get_size = dsa_get_size,
.fill_info = dsa_fill_info,
diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
index f6ff0b61e08a..6ec883739415 100644
--- a/net/hsr/hsr_netlink.c
+++ b/net/hsr/hsr_netlink.c
@@ -176,12 +176,18 @@ static int hsr_fill_info(struct sk_buff *skb, const struct net_device *dev)
return -EMSGSIZE;
}
+static struct link_deps hsr_newlink_deps = {
+ .mandatory.data = { IFLA_HSR_SLAVE1, IFLA_HSR_SLAVE2, },
+ .optional.data = { IFLA_HSR_INTERLINK, },
+};
+
static struct rtnl_link_ops hsr_link_ops __read_mostly = {
.kind = "hsr",
.maxtype = IFLA_HSR_MAX,
.policy = hsr_policy,
.priv_size = sizeof(struct hsr_priv),
.setup = hsr_dev_setup,
+ .newlink_deps = &hsr_newlink_deps,
.newlink = hsr_newlink,
.dellink = hsr_dellink,
.fill_info = hsr_fill_info,
diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
index 77b4e92027c5..4236aafd448f 100644
--- a/net/ieee802154/6lowpan/core.c
+++ b/net/ieee802154/6lowpan/core.c
@@ -196,6 +196,7 @@ static struct rtnl_link_ops lowpan_link_ops __read_mostly = {
.kind = "lowpan",
.priv_size = LOWPAN_PRIV_SIZE(sizeof(struct lowpan_802154_dev)),
.setup = lowpan_setup,
+ .newlink_deps = &generic_newlink_deps,
.newlink = lowpan_newlink,
.dellink = lowpan_dellink,
.validate = lowpan_validate,
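The dependency tables added in this patch are zero-terminated arrays of attribute types, split into mandatory and optional sets. How such a table might be walked can be sketched in userspace as follows. The names collect_deps(), MAX_DEPS and the flat attrs[] array are hypothetical and used only for illustration. The real tables index struct nlattr pointers and the interface index is then looked up as a device, which is not modelled here. The sketch also assumes that attribute type 0 is never a valid dependency, since 0 ends the table.

```c
/* Userspace model of walking a link_deps-style table; not kernel code. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 5 /* mirrors MAX_LINK_DEPS */

/* attrs[key] holds an interface index, or 0 when the attribute is absent.
 * Found indexes are appended to out[] starting at position n.
 * Returns the new number of entries in out[], or -ENODEV when a
 * mandatory attribute is missing.
 */
static int collect_deps(const int attrs[], const int deps[MAX_DEPS + 1],
			bool mandatory, int out[], int n)
{
	for (int i = 0; i <= MAX_DEPS; i++) {
		int key = deps[i];

		if (!key) /* zero terminates the table */
			break;
		if (!attrs[key]) {
			if (mandatory)
				return -ENODEV;
			continue; /* optional dependency not given */
		}
		out[n++] = attrs[key];
	}
	return n;
}
```

A caller would run the mandatory table first and stop on error, then run the optional table, which never fails in this model.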
* [PATCH NET-PREV 11/51] net: Make master and slaves (any dependent devices) share the same nd_lock in .setlink etc
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/rtnetlink.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 129 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index cf060ba4cd1d..67b4b0610d14 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2696,9 +2696,16 @@ static int do_set_master(struct net_device *dev, struct net_device *master,
}
if (master) {
+ struct nd_lock *nd_lock = rcu_access_pointer(dev->nd_lock);
+ struct nd_lock *nd_lock2 = rcu_access_pointer(master->nd_lock);
+
upper_dev = master;
ops = upper_dev->netdev_ops;
if (ops->ndo_add_slave) {
+ /* Devices linked as upper<->lower must relate
+ * to the same nd_lock.
+ */
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
err = ops->ndo_add_slave(upper_dev, dev, extack);
if (err)
return err;
@@ -3173,6 +3180,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
struct net_device *dev, *master = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
struct net *target_net = NULL;
int err;
struct nlattr *tb[IFLA_MAX+1];
@@ -3217,7 +3225,9 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
goto errout;
+ double_lock_netdev(dev, &nd_lock, master, &nd_lock2);
err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
+ double_unlock_netdev(nd_lock, nd_lock2);
errout:
if (target_net)
put_net(target_net);
@@ -3458,6 +3468,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
struct nlattr **tb)
{
struct net_device *dev, *aux, *master = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
struct net *target_net;
int err;
@@ -3479,7 +3490,9 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
break;
+ double_lock_netdev(dev, &nd_lock, master, &nd_lock2);
err = do_setlink(target_net, skb, dev, master, ifm, extack, tb, 0);
+ double_unlock_netdev(nd_lock, nd_lock2);
if (err < 0)
break;
}
@@ -3495,6 +3508,74 @@ struct link_deps generic_newlink_deps = {
};
EXPORT_SYMBOL_GPL(generic_newlink_deps);
+static struct net_device *__resolve_deps_locks(struct net *net,
+ struct net_device *dev,
+ struct nlattr **attr,
+ const int deps[],
+ bool mandatory)
+{
+ struct nd_lock *nd_lock, *nd_lock2;
+ struct net_device *dev2;
+ int i, key, ifindex;
+
+ for (i = 0; i <= MAX_LINK_DEPS; i++) {
+ key = deps[i];
+ if (!key)
+ break;
+ if (!attr[key]) {
+ if (mandatory)
+ return ERR_PTR(-ENODEV);
+ continue;
+ }
+ ifindex = nla_get_u32(attr[key]);
+
+ if (!dev) {
+ dev = __dev_get_by_index(net, ifindex);
+ if (!dev && mandatory)
+ return ERR_PTR(-ENODEV);
+ continue;
+ }
+
+ dev2 = __dev_get_by_index(net, ifindex);
+ if (!dev2) {
+ if (mandatory)
+ return ERR_PTR(-ENODEV);
+ continue;
+ }
+ double_lock_netdev(dev, &nd_lock, dev2, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ double_unlock_netdev(nd_lock, nd_lock2);
+ }
+
+ return dev;
+}
+
+/* Transfer all dependencies to the same nd_lock.
+ * Note, here we use that list of nd_lock devices
+ * can't be split in pieces.
+ */
+static struct net_device *resolve_deps_locks(struct net *net,
+ const struct link_deps *deps,
+ struct nlattr **tb,
+ struct nlattr **data)
+{
+ struct net_device *dev = NULL;
+
+ if (!deps)
+ return NULL;
+
+ dev = __resolve_deps_locks(net, dev, tb, deps->mandatory.tb, true);
+ if (IS_ERR(dev))
+ return dev;
+ dev = __resolve_deps_locks(net, dev, data, deps->mandatory.data, true);
+ if (IS_ERR(dev))
+ return dev;
+ dev = __resolve_deps_locks(net, dev, tb, deps->optional.tb, false);
+ dev = __resolve_deps_locks(net, dev, data, deps->optional.data, false);
+
+ return dev;
+}
+
static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
const struct rtnl_link_ops *ops,
const struct nlmsghdr *nlh,
@@ -3506,7 +3587,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
struct net *net = sock_net(skb->sk);
u32 portid = NETLINK_CB(skb).portid;
struct net *link_net;
- struct net_device *dev, *master = NULL;
+ struct net_device *dev, *master = NULL, *link_dev = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
char ifname[IFNAMSIZ];
LIST_HEAD(list_kill);
int err;
@@ -3554,13 +3636,36 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
goto out;
}
+ link_dev = resolve_deps_locks(link_net ? : net, ops->newlink_deps, tb, data);
+ if (IS_ERR(link_dev)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (master && link_dev) {
+ double_lock_netdev(master, &nd_lock, link_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ if (nd_lock != nd_lock2)
+ unlock_netdev(nd_lock);
+ } else if (master || link_dev) {
+ lock_netdev(master ? : link_dev, &nd_lock);
+ } else {
+ nd_lock = alloc_nd_lock();
+ err = -ENOMEM;
+ if (!nd_lock)
+ goto out;
+ mutex_lock(&nd_lock->mutex);
+ }
+ attach_nd_lock(dev, nd_lock);
+
if (ops->newlink)
err = ops->newlink(link_net ? : net, dev, tb, data, extack);
else
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0) {
+ detach_nd_lock(dev);
free_netdev(dev);
- goto out;
+ goto unlock;
}
err = rtnl_configure_link(dev, ifm, portid, nlh);
@@ -3576,6 +3681,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
if (err)
goto out_unregister;
}
+unlock:
+ unlock_netdev(nd_lock);
out:
if (link_net)
put_net(link_net);
@@ -3587,7 +3694,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
unregister_netdevice_queue(dev, &list_kill);
}
unregister_netdevice_many(&list_kill);
- goto out;
+ goto unlock;
}
struct rtnl_newlink_tbs {
@@ -3608,7 +3715,8 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct ifinfomsg *ifm = nlmsg_data(nlh);
struct nlattr ** const tb = tbs->tb;
struct nlattr **slave_data = NULL;
- struct net_device *master_dev;
+ struct net_device *master_dev, *link_dev;
+ struct nd_lock *nd_lock, *nd_lock2;
int err, status = 0;
if (nlh->nlmsg_flags & NLM_F_EXCL)
@@ -3620,6 +3728,21 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
return err;
+ if (ops && ops == dev->rtnl_link_ops && linkinfo[IFLA_INFO_DATA]) {
+ link_dev = resolve_deps_locks(dev_net(dev),
+ ops->changelink_deps,
+ tb, data);
+ if (IS_ERR(link_dev))
+ return PTR_ERR(link_dev);
+
+ if (link_dev) {
+ double_lock_netdev(dev, &nd_lock, link_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ double_unlock_netdev(nd_lock, nd_lock2);
+ }
+ }
+
+ double_lock_netdev(dev, &nd_lock, new_master, &nd_lock2);
master_dev = netdev_master_upper_dev_get(dev);
if (master_dev)
m_ops = master_dev->rtnl_link_ops;
@@ -3668,6 +3791,7 @@ static int __rtnl_newlink_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
err = do_setlink(target_net, skb, dev, new_master, ifm, extack, tb, status);
out:
+ double_unlock_netdev(nd_lock, nd_lock2);
return err;
}
* [PATCH NET-PREV 12/51] net: Use __register_netdevice in trivial .newlink cases
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (10 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 11/51] net: Make master and slaves (any dependent devices) share the same nd_lock in .setlink etc Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 13/51] infiniband_ipoib: Use __register_netdevice in .newlink Kirill Tkhai
` (40 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Replace register_netdevice() with __register_netdevice() in drivers
that call it only from their .newlink methods.
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since the lock is known
to be held in that path.
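
For illustration only, here is a minimal single-threaded model of the
locked/unlocked variant split this patch relies on. It is a sketch, not
the kernel implementation: every model_* name is hypothetical, and it
assumes the plain variant takes the lock itself while the "__" variant
expects the caller to hold it.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded stand-in for a non-recursive lock (hypothetical). */
struct model_lock {
	bool held;
};

/* Returns false instead of deadlocking when the lock is already held. */
static bool model_trylock(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_unlock(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct model_dev {
	struct model_lock *lock;
	bool registered;
};

/* "__" variant: the caller must already hold dev->lock. */
static int model_register_locked(struct model_dev *dev)
{
	if (!dev->lock || !dev->lock->held)
		return -1;	/* contract violated */
	dev->registered = true;
	return 0;
}

/* Plain variant: takes the lock itself, so it must not be called with it held. */
static int model_register(struct model_dev *dev)
{
	int err;

	if (!model_trylock(dev->lock))
		return -2;	/* a real mutex would self-deadlock here */
	err = model_register_locked(dev);
	model_unlock(dev->lock);
	return err;
}

/* Models a .newlink callback that is invoked with the lock already held. */
static int model_newlink(struct model_dev *dev)
{
	return model_register_locked(dev);
}
```

In this model, calling model_register() from model_newlink() would fail
because the lock is already held by the caller, which is why only the
"__" variant is suitable on that path.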
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/amt.c | 4 ++--
drivers/net/bareudp.c | 2 +-
drivers/net/bonding/bond_netlink.c | 2 +-
drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 2 +-
drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 2 +-
drivers/net/gtp.c | 2 +-
drivers/net/ipvlan/ipvlan_main.c | 4 ++--
drivers/net/macsec.c | 4 ++--
drivers/net/macvlan.c | 4 ++--
drivers/net/pfcp.c | 2 +-
drivers/net/team/team_core.c | 2 +-
drivers/net/vrf.c | 6 +++---
drivers/net/wireguard/device.c | 2 +-
drivers/net/wireless/virtual/virt_wifi.c | 4 ++--
net/batman-adv/soft-interface.c | 2 +-
net/bridge/br_netlink.c | 2 +-
net/caif/chnl_net.c | 2 +-
net/hsr/hsr_device.c | 4 ++--
net/ipv4/ip_tunnel.c | 4 ++--
net/ipv6/ip6_gre.c | 2 +-
net/xfrm/xfrm_interface_core.c | 2 +-
21 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 2288f4bf649c..d39cde2be85e 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3258,7 +3258,7 @@ static int amt_newlink(struct net *net, struct net_device *dev,
}
amt->qi = AMT_INIT_QUERY_INTERVAL;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0) {
netdev_dbg(dev, "failed to register new netdev %d\n", err);
goto err;
@@ -3266,7 +3266,7 @@ static int amt_newlink(struct net *net, struct net_device *dev,
err = netdev_upper_dev_link(amt->stream_dev, dev, extack);
if (err < 0) {
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
goto err;
}
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index d5c56ca91b77..ee54fec65e2e 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -647,7 +647,7 @@ static int bareudp_configure(struct net *net, struct net_device *dev,
bareudp->sport_min = conf->sport_min;
bareudp->multi_proto_mode = conf->multi_proto_mode;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 5fcab77d616f..70e3c93df0ba 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -574,7 +574,7 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
if (err < 0)
return err;
- err = register_netdevice(bond_dev);
+ err = __register_netdevice(bond_dev);
if (!err) {
struct bonding *bond = netdev_priv(bond_dev);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 495368cbef34..526e4b7dd27b 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -178,7 +178,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
return 0;
err2:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
rmnet_vnd_dellink(mux_id, port, ep);
err1:
rmnet_unregister_real_device(real_dev);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index f1e40aade127..1c36ef3c1c7c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -324,7 +324,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
return -EINVAL;
}
- rc = register_netdevice(rmnet_dev);
+ rc = __register_netdevice(rmnet_dev);
if (!rc) {
ep->egress_dev = rmnet_dev;
ep->mux_id = id;
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 0696faf60013..eef7c7a6edb2 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1520,7 +1520,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
dev->needed_headroom = LL_MAX_HEADER + GTP_IPV6_MAXLEN;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0) {
netdev_dbg(dev, "failed to register new netdev %d\n", err);
goto out_encap;
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index aafaf9d1d822..0887a7640cc0 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -585,7 +585,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
dev->priv_flags |= IFF_NO_RX_HANDLER;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
return err;
@@ -643,7 +643,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
remove_ida:
ida_free(&port->ida, dev->dev_id);
unregister_netdev:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
EXPORT_SYMBOL_GPL(ipvlan_link_new);
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 246cf09a0ebc..43ccba5a787d 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4187,7 +4187,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
if (rx_handler && rx_handler != macsec_handle_frame)
return -EBUSY;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
return err;
@@ -4257,7 +4257,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
unlink:
netdev_upper_dev_unlink(real_dev, dev);
unregister:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b51e2e21dead..1bf9bb435ef4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1531,7 +1531,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
update_port_bc_cutoff(
vlan, nla_get_s32(data[IFLA_MACVLAN_BC_CUTOFF]));
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto destroy_macvlan_port;
@@ -1549,7 +1549,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
unregister_netdev:
/* macvlan_uninit would free the macvlan port */
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
destroy_macvlan_port:
/* the macvlan port may be freed by macvlan_uninit when fail to register.
diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c
index 69434fd13f96..a28a9aed14eb 100644
--- a/drivers/net/pfcp.c
+++ b/drivers/net/pfcp.c
@@ -200,7 +200,7 @@ static int pfcp_newlink(struct net *net, struct net_device *dev,
goto exit_err;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err) {
netdev_dbg(dev, "failed to register pfcp netdev %d\n", err);
goto exit_del_pfcp_sock;
diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..3e98771bcced 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -2214,7 +2214,7 @@ static int team_newlink(struct net *src_net, struct net_device *dev,
if (tb[IFLA_ADDRESS] == NULL)
eth_hw_addr_random(dev);
- return register_netdevice(dev);
+ return __register_netdevice(dev);
}
static int team_validate(struct nlattr *tb[], struct nlattr *data[],
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 040f0bb36c0e..85c0903d1ef0 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1719,7 +1719,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
dev->priv_flags |= IFF_L3MDEV_MASTER;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto out;
@@ -1731,7 +1731,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
err = vrf_map_register_dev(dev, extack);
if (err) {
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
goto out;
}
@@ -1743,7 +1743,7 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev,
err = vrf_add_fib_rules(dev);
if (err) {
vrf_map_unregister_dev(dev);
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
goto out;
}
*add_fib_rules = false;
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 3feb36ee5bfb..b2a3d5260d42 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -364,7 +364,7 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
if (ret < 0)
goto err_free_handshake_queue;
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret < 0)
goto err_uninit_ratelimiter;
diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c
index c80ae0e0df53..877c3deeef5b 100644
--- a/drivers/net/wireless/virtual/virt_wifi.c
+++ b/drivers/net/wireless/virtual/virt_wifi.c
@@ -564,7 +564,7 @@ static int virt_wifi_newlink(struct net *src_net, struct net_device *dev,
dev->ieee80211_ptr->iftype = NL80211_IFTYPE_STATION;
dev->ieee80211_ptr->wiphy = common_wiphy;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err) {
dev_err(&priv->lowerdev->dev, "can't register_netdevice: %d\n",
err);
@@ -587,7 +587,7 @@ static int virt_wifi_newlink(struct net *src_net, struct net_device *dev,
return 0;
unregister_netdev:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
free_wireless_dev:
kfree(dev->ieee80211_ptr);
dev->ieee80211_ptr = NULL;
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 30ecbc2ef1fd..c1a9ae252a1c 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -1085,7 +1085,7 @@ static int batadv_softif_newlink(struct net *src_net, struct net_device *dev,
return -EINVAL;
}
- return register_netdevice(dev);
+ return __register_netdevice(dev);
}
/**
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index f17dbac7d828..4298c14d4295 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1560,7 +1560,7 @@ static int br_dev_newlink(struct net *src_net, struct net_device *dev,
struct net_bridge *br = netdev_priv(dev);
int err;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 47901bd4def1..69dc15baaab6 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -450,7 +450,7 @@ static int ipcaif_newlink(struct net *src_net, struct net_device *dev,
caifdev = netdev_priv(dev);
caif_netlink_parms(data, &caifdev->conn_req);
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret)
pr_warn("device rtml registration failed\n");
else
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index e4cc6b78dcfc..e2fa0130a66c 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -649,7 +649,7 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2],
(slave[1]->features & NETIF_F_HW_HSR_FWD))
hsr->fwd_offloaded = true;
- res = register_netdevice(hsr_dev);
+ res = __register_netdevice(hsr_dev);
if (res)
goto err_unregister;
@@ -685,6 +685,6 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2],
hsr_del_self_node(hsr);
if (unregister)
- unregister_netdevice(hsr_dev);
+ __unregister_netdevice(hsr_dev);
return res;
}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 5cffad42fe8c..065b51dde995 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -1235,7 +1235,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
nt->net = net;
nt->parms = *p;
nt->fwmark = fwmark;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto err_register_netdevice;
@@ -1260,7 +1260,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
return 0;
err_dev_set_mtu:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
err_register_netdevice:
return err;
}
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 3942bd2ade78..57cbf7942dc8 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1993,7 +1993,7 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev,
nt->dev = dev;
nt->net = dev_net(dev);
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto out;
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index e50e4bf993fa..18bd60efd2cc 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -250,7 +250,7 @@ static int xfrmi_create(struct net_device *dev)
int err;
dev->rtnl_link_ops = &xfrmi_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
* [PATCH NET-PREV 13/51] infiniband_ipoib: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (11 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 12/51] net: Use __register_netdevice in trivial .newlink cases Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 14/51] vxcan: " Kirill Tkhai
` (39 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
There are two paths to __register_netdevice() here:
one from .newlink, the other from the store method.
Also, use __unregister_netdevice(), since the lock is known
to be held in that path.
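
As a rough sketch of the second (non-.newlink) path, the model below
takes the parent's lock, lets the child share it, and rolls the attach
back if registration fails. All model_* names and the nr_devs counter
are hypothetical; the real helpers are lock_netdev(), attach_nd_lock(),
detach_nd_lock() and unlock_netdev() from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for nd_lock: a held flag plus a device count. */
struct model_lock {
	bool held;
	int nr_devs;
};

struct model_dev {
	struct model_lock *lock;
	bool registered;
};

static void model_attach(struct model_dev *dev, struct model_lock *l)
{
	dev->lock = l;
	l->nr_devs++;
}

static void model_detach(struct model_dev *dev)
{
	dev->lock->nr_devs--;
	dev->lock = NULL;
}

/*
 * Non-netlink creation path: take the parent's lock, let the child share
 * it, register, and undo the attach if registration fails.
 * 'reg_err' is the simulated result of the registration step.
 */
static int model_add_child(struct model_dev *parent, struct model_dev *child,
			   int reg_err)
{
	struct model_lock *l = parent->lock;

	assert(!l->held);
	l->held = true;			/* models lock_netdev(parent) */
	model_attach(child, l);

	if (reg_err)
		model_detach(child);
	else
		child->registered = true;

	l->held = false;		/* models unlock_netdev() */
	return reg_err;
}
```

The point of the model is that the child never exists in a state where
it is registered without sharing its parent's lock.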
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 2dd3231df36c..b8add59c6c69 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -140,7 +140,7 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
if (data) {
err = ipoib_changelink(dev, tb, data, extack);
if (err) {
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 562df2b3ef18..970f344260df 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -128,7 +128,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
goto out_early;
}
- result = register_netdevice(ndev);
+ result = __register_netdevice(ndev);
if (result) {
ipoib_warn(priv, "failed to initialize; error %i", result);
@@ -155,7 +155,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
return 0;
sysfs_failed:
- unregister_netdevice(priv->dev);
+ __unregister_netdevice(priv->dev);
return -ENOMEM;
out_early:
@@ -169,6 +169,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
struct ipoib_dev_priv *ppriv, *priv;
char intf_name[IFNAMSIZ];
struct net_device *ndev;
+ struct nd_lock *nd_lock;
int result;
if (!capable(CAP_NET_ADMIN))
@@ -200,8 +201,15 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
ndev->rtnl_link_ops = ipoib_get_link_ops();
+ lock_netdev(pdev, &nd_lock);
+ attach_nd_lock(ndev, nd_lock);
+
result = __ipoib_vlan_add(ppriv, priv, pkey, IPOIB_LEGACY_CHILD);
+ if (result)
+ detach_nd_lock(ndev);
+ unlock_netdev(nd_lock);
+
if (result && ndev->reg_state == NETREG_UNINITIALIZED)
free_netdev(ndev);
* [PATCH NET-PREV 14/51] vxcan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (12 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 13/51] infiniband_ipoib: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 15/51] iavf: Use __register_netdevice() Kirill Tkhai
` (38 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since the lock is known
to be held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/can/vxcan.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 9e1b7d41005f..6c44472af609 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -221,10 +221,12 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (ifmp && dev->ifindex)
peer->ifindex = ifmp->ifi_index;
- err = register_netdevice(peer);
+ attach_nd_lock(peer, rcu_dereference_protected(dev->nd_lock, true));
+ err = __register_netdevice(peer);
put_net(peer_net);
peer_net = NULL;
if (err < 0) {
+ detach_nd_lock(peer);
free_netdev(peer);
return err;
}
@@ -241,7 +243,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
else
snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto unregister_network_device;
@@ -257,7 +259,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
return 0;
unregister_network_device:
- unregister_netdevice(peer);
+ __unregister_netdevice(peer);
return err;
}
* [PATCH NET-PREV 15/51] iavf: Use __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (13 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 14/51] vxcan: " Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:39 ` [PATCH NET-PREV 16/51] geneve: Use __register_netdevice in .newlink Kirill Tkhai
` (37 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Attach, detach, and take nd_lock in the appropriate way:
nd_lock must be taken outside the driver's own locks.
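
The ordering rule can be illustrated with a tiny rank checker, loosely
in the spirit of what lockdep enforces. This is a hypothetical
single-threaded model, not kernel code: the rank values and the order_*
helpers are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum model_rank {
	RANK_ND_LOCK = 1,	/* outer lock */
	RANK_DRIVER_LOCK = 2,	/* driver-private lock, e.g. a crit_lock */
};

#define MODEL_MAX_DEPTH 4

/* Stack of ranks currently held by the (single) modelled context. */
struct order_ctx {
	int depth;
	enum model_rank held[MODEL_MAX_DEPTH];
};

/* Refuses any acquisition that does not strictly increase the rank. */
static bool order_acquire(struct order_ctx *c, enum model_rank rank)
{
	if (c->depth == MODEL_MAX_DEPTH)
		return false;
	if (c->depth > 0 && c->held[c->depth - 1] >= rank)
		return false;
	c->held[c->depth++] = rank;
	return true;
}

/* Drops the most recently acquired rank. */
static void order_release(struct order_ctx *c)
{
	assert(c->depth > 0);
	c->depth--;
}
```

Taking RANK_ND_LOCK and then RANK_DRIVER_LOCK is accepted, while the
reverse order is rejected, which mirrors the requirement stated above.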
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 59 +++++++++++++++++++--------
1 file changed, 41 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f782402cd789..77fbe80c04a4 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1968,14 +1968,36 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni
static void iavf_finish_config(struct work_struct *work)
{
struct iavf_adapter *adapter;
- int pairs, err;
+ struct nd_lock *nd_lock;
+ int pairs, err = 0;
adapter = container_of(work, struct iavf_adapter, finish_config);
/* Always take RTNL first to prevent circular lock dependency */
rtnl_lock();
+ lock_netdev(adapter->netdev, &nd_lock);
mutex_lock(&adapter->crit_lock);
+ if (adapter->netdev->reg_state != NETREG_REGISTERED &&
+ adapter->state == __IAVF_DOWN) {
+ err = __register_netdevice(adapter->netdev);
+ }
+
+ unlock_netdev(nd_lock);
+
+ if (err) {
+ dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n",
+ err);
+
+ /* go back and try again.*/
+ iavf_free_rss(adapter);
+ iavf_free_misc_irq(adapter);
+ iavf_reset_interrupt_capability(adapter);
+ iavf_change_state(adapter,
+ __IAVF_INIT_CONFIG_ADAPTER);
+ goto out;
+ }
+
if ((adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES) &&
adapter->netdev->reg_state == NETREG_REGISTERED &&
!test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) {
@@ -1985,22 +2007,6 @@ static void iavf_finish_config(struct work_struct *work)
switch (adapter->state) {
case __IAVF_DOWN:
- if (adapter->netdev->reg_state != NETREG_REGISTERED) {
- err = register_netdevice(adapter->netdev);
- if (err) {
- dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n",
- err);
-
- /* go back and try again.*/
- iavf_free_rss(adapter);
- iavf_free_misc_irq(adapter);
- iavf_reset_interrupt_capability(adapter);
- iavf_change_state(adapter,
- __IAVF_INIT_CONFIG_ADAPTER);
- goto out;
- }
- }
-
/* Set the real number of queues when reset occurs while
* state == __IAVF_DOWN
*/
@@ -5054,6 +5060,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
struct net_device *netdev;
struct iavf_adapter *adapter = NULL;
struct iavf_hw *hw = NULL;
+ struct nd_lock *nd_lock;
int err;
err = pci_enable_device(pdev);
@@ -5085,6 +5092,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
SET_NETDEV_DEV(netdev, &pdev->dev);
+ nd_lock = attach_new_nd_lock(netdev);
+ if (!nd_lock) {
+ err = -ENOMEM;
+ goto err_alloc_lock;
+ }
+
pci_set_drvdata(pdev, netdev);
adapter = netdev_priv(netdev);
@@ -5163,6 +5176,10 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
err_ioremap:
destroy_workqueue(adapter->wq);
err_alloc_wq:
+ mutex_lock(&nd_lock->mutex);
+ detach_nd_lock(netdev);
+ mutex_unlock(&nd_lock->mutex);
+err_alloc_lock:
free_netdev(netdev);
err_alloc_etherdev:
pci_release_regions(pdev);
@@ -5255,6 +5272,7 @@ static void iavf_remove(struct pci_dev *pdev)
struct iavf_mac_filter *f, *ftmp;
struct iavf_adapter *adapter;
struct net_device *netdev;
+ struct nd_lock *nd_lock;
struct iavf_hw *hw;
/* Don't proceed with remove if netdev is already freed */
@@ -5291,8 +5309,13 @@ static void iavf_remove(struct pci_dev *pdev)
cancel_delayed_work_sync(&adapter->watchdog_task);
cancel_work_sync(&adapter->finish_config);
- if (netdev->reg_state == NETREG_REGISTERED)
+ if (netdev->reg_state == NETREG_REGISTERED) {
unregister_netdev(netdev);
+ } else {
+ lock_netdev(netdev, &nd_lock);
+ detach_nd_lock(netdev);
+ unlock_netdev(nd_lock);
+ }
mutex_lock(&adapter->crit_lock);
dev_info(&adapter->pdev->dev, "Removing device\n");
* [PATCH NET-PREV 16/51] geneve: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (14 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 15/51] iavf: Use __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:39 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 17/51] netkit: " Kirill Tkhai
` (36 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:39 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/geneve.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 838e85ddec67..f74f92753063 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1380,7 +1380,7 @@ static int geneve_configure(struct net *net, struct net_device *dev,
dev->flags = IFF_POINTOPOINT | IFF_NOARP;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
@@ -1830,6 +1830,7 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
u8 name_assign_type, u16 dst_port)
{
struct nlattr *tb[IFLA_MAX + 1];
+ struct nd_lock *nd_lock;
struct net_device *dev;
LIST_HEAD(list_kill);
int err;
@@ -1846,12 +1847,21 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
if (IS_ERR(dev))
return dev;
+ if (!attach_new_nd_lock(dev)) {
+ free_netdev(dev);
+ return ERR_PTR(-ENOMEM);
+ }
+
init_tnl_info(&cfg.info, dst_port);
+ lock_netdev(dev, &nd_lock);
err = geneve_configure(net, dev, NULL, &cfg);
if (err) {
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
free_netdev(dev);
return ERR_PTR(err);
}
+ unlock_netdev(nd_lock);
/* openvswitch users expect packet sizes to be unrestricted,
* so set the largest MTU we can.
* [PATCH NET-PREV 17/51] netkit: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (15 preceding siblings ...)
2025-03-22 14:39 ` [PATCH NET-PREV 16/51] geneve: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 18/51] qmi_wwan: " Kirill Tkhai
` (35 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice(), since the lock is known
to be held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/netkit.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 16789cd446e9..da8d806b8249 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -408,7 +408,8 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
nk->mode = mode;
bpf_mprog_bundle_init(&nk->bundle);
- err = register_netdevice(peer);
+ attach_nd_lock(peer, rcu_dereference_protected(dev->nd_lock, true));
+ err = __register_netdevice(peer);
put_net(net);
if (err < 0)
goto err_register_peer;
@@ -433,7 +434,7 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
nk->mode = mode;
bpf_mprog_bundle_init(&nk->bundle);
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto err_configure_peer;
netif_carrier_off(dev);
@@ -444,9 +445,10 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
rcu_assign_pointer(netkit_priv(peer)->peer, dev);
return 0;
err_configure_peer:
- unregister_netdevice(peer);
+ __unregister_netdevice(peer);
return err;
err_register_peer:
+ detach_nd_lock(peer);
free_netdev(peer);
return err;
}
* [PATCH NET-PREV 18/51] qmi_wwan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (16 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 17/51] netkit: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 19/51] bpqether: Provide determined context in __register_netdevice() Kirill Tkhai
` (34 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/usb/qmi_wwan.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 4823dbdf5465..beec69580978 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -246,6 +246,7 @@ static int qmimux_register_device(struct net_device *real_dev, u8 mux_id)
{
struct net_device *new_dev;
struct qmimux_priv *priv;
+ struct nd_lock *nd_lock;
int err;
new_dev = alloc_netdev(sizeof(struct qmimux_priv),
@@ -260,14 +261,23 @@ static int qmimux_register_device(struct net_device *real_dev, u8 mux_id)
new_dev->sysfs_groups[0] = &qmi_wwan_sysfs_qmimux_attr_group;
- err = register_netdevice(new_dev);
- if (err < 0)
+ err = -ENOMEM;
+
+ lock_netdev(real_dev, &nd_lock);
+ attach_nd_lock(new_dev, nd_lock);
+ err = __register_netdevice(new_dev);
+ if (err < 0) {
+ detach_nd_lock(new_dev);
+ unlock_netdev(nd_lock);
goto out_free_newdev;
+ }
/* Account for reference in struct qmimux_priv_priv */
dev_hold(real_dev);
err = netdev_upper_dev_link(real_dev, new_dev, NULL);
+ unlock_netdev(nd_lock);
+
if (err)
goto out_unregister_netdev;
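
The sequence used in this hunk (take the parent's lock, attach it to the new device, register, and detach again on failure) can be sketched in userspace. This is only an illustration under assumptions: `struct toy_lock`, `struct toy_dev`, `toy_attach()`, `toy_detach()`, `toy_register_locked()` and `toy_register_child()` are hypothetical stand-ins built on pthreads, not the nd_lock API from this series.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nd_lock: a mutex plus a user count. */
struct toy_lock {
	pthread_mutex_t mutex;
	int users;
};

/* Hypothetical stand-in for struct net_device. */
struct toy_dev {
	struct toy_lock *lock;
	int registered;
};

static void toy_attach(struct toy_dev *d, struct toy_lock *l)
{
	d->lock = l;
	l->users++;
}

static void toy_detach(struct toy_dev *d)
{
	d->lock->users--;
	d->lock = NULL;
}

/* Registration step that expects a lock to be attached (and held). */
static int toy_register_locked(struct toy_dev *d, int fail)
{
	assert(d->lock != NULL);
	if (fail)
		return -1;
	d->registered = 1;
	return 0;
}

/* Register @child under @parent's lock; undo the attach on failure. */
static int toy_register_child(struct toy_dev *parent, struct toy_dev *child,
			      int fail)
{
	struct toy_lock *l = parent->lock;
	int err;

	pthread_mutex_lock(&l->mutex);
	toy_attach(child, l);
	err = toy_register_locked(child, fail);
	if (err < 0)
		toy_detach(child);
	pthread_mutex_unlock(&l->mutex);
	return err;
}
```

The point of the sketch is the ordering: the attach happens before registration so that the registration step can rely on the lock, and the failure path drops the attachment before the lock is released.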
* [PATCH NET-PREV 19/51] bpqether: Provide determined context in __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (17 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 18/51] qmi_wwan: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 20/51] ppp: Use __register_netdevice in .newlink Kirill Tkhai
` (33 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
If the caller already owns an nd_lock, taking another one is
nesting that is not annotated for lockdep.
So we use trylock and __register_netdevice() here.
XXX: after callers of netdevice notifiers are converted,
we will inherit @edev's nd_lock instead.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/hamradio/bpqether.c | 33 +++++++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 83a16d10eedb..bf2792f98afe 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -480,6 +480,7 @@ static int bpq_new_device(struct net_device *edev)
{
int err;
struct net_device *ndev;
+ struct nd_lock *nd_lock;
struct bpqdev *bpq;
ndev = alloc_netdev(sizeof(struct bpqdev), "bpq%d", NET_NAME_UNKNOWN,
@@ -487,7 +488,23 @@ static int bpq_new_device(struct net_device *edev)
if (!ndev)
return -ENOMEM;
-
+ err = -ENOMEM;
+ nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ goto err_free;
+
+ /* This is called from netdevice notifier, which is not converted yet.
+ * The context is unknown: some nd_lock may or may not be held. For now
+ * @ndev is independent of @edev (at this stage of the conversion we don't
+ * require the dependency; we will when we convert unregister_netdev()).
+ * So, a new nd_lock is used for @ndev for now.
+ * Q: Why trylock, even though it can't fail?
+ * A: The caller may own some other nd_lock, so lockdep would be unhappy
+ * seeing a nested lock taken without mutex_lock_nested().
+ */
+ BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ attach_nd_lock(ndev, nd_lock);
+
bpq = netdev_priv(ndev);
dev_hold(edev);
bpq->ethdev = edev;
@@ -496,19 +513,23 @@ static int bpq_new_device(struct net_device *edev)
eth_broadcast_addr(bpq->dest_addr);
eth_broadcast_addr(bpq->acpt_addr);
- err = register_netdevice(ndev);
+ err = __register_netdevice(ndev);
if (err)
- goto error;
+ goto err_detach;
bpq_set_lockdep_class(ndev);
/* List protected by RTNL */
list_add_rcu(&bpq->bpq_list, &bpq_devices);
- return 0;
+unlock:
+ unlock_netdev(nd_lock);
+ return err;
- error:
+err_detach:
+ detach_nd_lock(ndev);
dev_put(edev);
+err_free:
free_netdev(ndev);
- return err;
+ goto unlock;
}
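
The reasoning in the comment above (a trylock on a lock that was just allocated and is not yet visible to anyone else cannot fail) can be sketched in userspace. This is a hedged illustration only: `struct toy_lock` and the `toy_*()` helpers are hypothetical, use a C11 `atomic_flag` instead of a kernel mutex, and say nothing about how lockdep itself tracks nesting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy lock; a real nd_lock wraps a kernel mutex instead. */
struct toy_lock {
	atomic_flag held;
};

/* Allocate a lock in the unlocked state; NULL on allocation failure. */
static struct toy_lock *toy_lock_alloc(void)
{
	struct toy_lock *l = malloc(sizeof(*l));

	if (l)
		atomic_flag_clear(&l->held);
	return l;
}

/* Return true if the lock was free and is now held by the caller. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Allocate a private lock and take it; nobody else can contend yet. */
static struct toy_lock *toy_lock_alloc_locked(void)
{
	struct toy_lock *l = toy_lock_alloc();

	if (l) {
		bool ok = toy_trylock(l);

		assert(ok);	/* plays the role of BUG_ON(!mutex_trylock()) */
	}
	return l;
}
```

A second trylock on the same object does fail, which is why the argument only holds while the lock is still private to its creator.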
* [PATCH NET-PREV 20/51] ppp: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (18 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 19/51] bpqether: Provide determined context in __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 21/51] veth: " Kirill Tkhai
` (32 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/ppp/ppp_generic.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index eb9acfcaeb09..c094bc5e6d8f 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1216,7 +1216,7 @@ static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
mutex_unlock(&pn->all_ppp_mutex);
- ret = register_netdevice(ppp->dev);
+ ret = __register_netdevice(ppp->dev);
if (ret < 0)
goto err_unit;
@@ -3331,6 +3331,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
.unit = *unit,
.ifname_is_set = false,
};
+ struct nd_lock *nd_lock;
struct net_device *dev;
struct ppp *ppp;
int err;
@@ -3343,7 +3344,13 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
dev_net_set(dev, net);
dev->rtnl_link_ops = &ppp_link_ops;
+ if (!attach_new_nd_lock(dev)) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+
rtnl_lock();
+ lock_netdev(dev, &nd_lock);
err = ppp_dev_configure(net, dev, &conf);
if (err < 0)
@@ -3351,12 +3358,16 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
ppp = netdev_priv(dev);
*unit = ppp->file.index;
+ unlock_netdev(nd_lock);
rtnl_unlock();
return 0;
err_dev:
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
rtnl_unlock();
+err_free:
free_netdev(dev);
err:
return err;
* [PATCH NET-PREV 21/51] veth: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (19 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 20/51] ppp: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 22/51] vxlan: " Kirill Tkhai
` (31 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice() since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/veth.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 34499b91a8bd..7a502dbed5b9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1827,7 +1827,9 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
netif_inherit_tso_max(peer, dev);
- err = register_netdevice(peer);
+ attach_nd_lock(peer, rcu_dereference_protected(dev->nd_lock, true));
+
+ err = __register_netdevice(peer);
put_net(net);
net = NULL;
if (err < 0)
@@ -1858,7 +1860,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
else
snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto err_register_dev;
@@ -1888,14 +1890,15 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
return 0;
err_queues:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
err_register_dev:
/* nothing to do */
err_configure_peer:
- unregister_netdevice(peer);
+ __unregister_netdevice(peer);
return err;
err_register_peer:
+ detach_nd_lock(peer);
free_netdev(peer);
return err;
}
* [PATCH NET-PREV 22/51] vxlan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (20 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 21/51] veth: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 23/51] hdlc_fr: Use __register_netdevice Kirill Tkhai
` (30 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice() since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/vxlan/vxlan_core.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index b041ddc2ab34..369f7b667424 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -3950,7 +3950,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
return err;
}
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
goto errout;
unregister = true;
@@ -4001,7 +4001,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
__vxlan_fdb_free(f);
unregister:
if (unregister)
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
return err;
}
@@ -4604,22 +4604,37 @@ struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type,
struct vxlan_config *conf)
{
+ struct net_device *dev, *lowerdev = NULL;
struct nlattr *tb[IFLA_MAX + 1];
- struct net_device *dev;
+ struct nd_lock *nd_lock;
int err;
memset(&tb, 0, sizeof(tb));
+ if (conf->remote_ifindex) {
+ lowerdev = __dev_get_by_index(net, conf->remote_ifindex);
+ if (!lowerdev)
+ return ERR_PTR(-ENODEV);
+ }
+
dev = rtnl_create_link(net, name, name_assign_type,
&vxlan_link_ops, tb, NULL);
if (IS_ERR(dev))
return dev;
- err = __vxlan_dev_create(net, dev, conf, NULL);
- if (err < 0) {
- free_netdev(dev);
- return ERR_PTR(err);
+ if (lowerdev) {
+ lock_netdev(lowerdev, &nd_lock);
+ attach_nd_lock(dev, nd_lock);
+ } else {
+ err = -ENOMEM;
+ if (!attach_new_nd_lock(dev))
+ goto err_free;
+ lock_netdev(dev, &nd_lock);
}
+ err = __vxlan_dev_create(net, dev, conf, NULL);
+ if (err < 0)
+ goto err_detach;
+ unlock_netdev(nd_lock);
err = rtnl_configure_link(dev, NULL, 0, NULL);
if (err < 0) {
@@ -4631,6 +4646,13 @@ struct net_device *vxlan_dev_create(struct net *net, const char *name,
}
return dev;
+
+err_detach:
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
+err_free:
+ free_netdev(dev);
+ return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(vxlan_dev_create);
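
The choice made in vxlan_dev_create() above (inherit the lower device's lock when there is one, otherwise allocate a fresh lock for the new device) can be sketched as follows. This is a minimal userspace illustration under assumptions: `struct toy_lock`, `struct toy_dev` and `toy_pick_lock()` are hypothetical names, and the reference counting is a guess at the general idea rather than the series' actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lock object shared by a group of dependent devices. */
struct toy_lock {
	int users;
};

/* Hypothetical device that points at the lock of its group. */
struct toy_dev {
	struct toy_lock *lock;
};

/*
 * Attach a lock to @dev: reuse @lower's lock if a lower device is given,
 * otherwise allocate a new one. Returns the attached lock, or NULL if
 * the allocation failed (in which case @dev is left untouched).
 */
static struct toy_lock *toy_pick_lock(struct toy_dev *dev,
				      struct toy_dev *lower)
{
	struct toy_lock *l;

	assert(dev->lock == NULL);
	if (lower) {
		l = lower->lock;
	} else {
		l = calloc(1, sizeof(*l));
		if (!l)
			return NULL;
	}
	l->users++;
	dev->lock = l;
	return l;
}
```

Either way the caller ends up with exactly one lock to take before creating the device, which keeps the rest of the creation path identical for both cases.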
* [PATCH NET-PREV 23/51] hdlc_fr: Use __register_netdevice
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (21 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 22/51] vxlan: " Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:40 ` [PATCH NET-PREV 24/51] lapbeth: Provide determined context in __register_netdevice() Kirill Tkhai
` (29 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make dependent devices share
the same nd_lock.
Eventually, taking nd_lock should be moved to the ioctl
caller, but we can't do this yet, at least because
netdevice notifiers are not converted.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wan/hdlc_fr.c | 18 ++++++++++++------
net/core/dev_ioctl.c | 1 +
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index 81e72bc1891f..93c61083de76 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1106,7 +1106,9 @@ static int fr_add_pvc(struct net_device *frad, unsigned int dlci, int type)
dev->priv_flags |= IFF_NO_QUEUE;
dev->ml_priv = pvc;
- if (register_netdevice(dev) != 0) {
+ attach_nd_lock(dev, rcu_dereference_protected(frad->nd_lock, true));
+ if (__register_netdevice(dev) != 0) {
+ detach_nd_lock(dev);
free_netdev(dev);
delete_unused_pvcs(hdlc);
return -EIO;
@@ -1187,8 +1189,9 @@ static int fr_ioctl(struct net_device *dev, struct if_settings *ifs)
const size_t size = sizeof(fr_proto);
fr_proto new_settings;
hdlc_device *hdlc = dev_to_hdlc(dev);
+ struct nd_lock *nd_lock;
fr_proto_pvc pvc;
- int result;
+ int result, err;
switch (ifs->type) {
case IF_GET_PROTO:
@@ -1272,10 +1275,13 @@ static int fr_ioctl(struct net_device *dev, struct if_settings *ifs)
result = ARPHRD_DLCI;
if (ifs->type == IF_PROTO_FR_ADD_PVC ||
- ifs->type == IF_PROTO_FR_ADD_ETH_PVC)
- return fr_add_pvc(dev, pvc.dlci, result);
- else
- return fr_del_pvc(hdlc, pvc.dlci, result);
+ ifs->type == IF_PROTO_FR_ADD_ETH_PVC) {
+ lock_netdev(dev, &nd_lock);
+ err = fr_add_pvc(dev, pvc.dlci, result);
+ unlock_netdev(nd_lock);
+ return err;
+ }
+ return fr_del_pvc(hdlc, pvc.dlci, result);
}
return -EINVAL;
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 8592c052c0f4..dc2a0f513bac 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -496,6 +496,7 @@ static int dev_siocwandev(struct net_device *dev, struct if_settings *ifs)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ /* This may take nd_lock. See fr_add_pvc() */
if (ops->ndo_siocwandev) {
if (netif_device_present(dev))
return ops->ndo_siocwandev(dev, ifs);
* [PATCH NET-PREV 24/51] lapbeth: Provide determined context in __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (22 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 23/51] hdlc_fr: Use __register_netdevice Kirill Tkhai
@ 2025-03-22 14:40 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 25/51] wwan: Use __register_netdevice in .newlink Kirill Tkhai
` (28 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:40 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
If the caller already owns an nd_lock, taking another one is
nesting that is not annotated for lockdep.
So we use trylock and __register_netdevice() here.
XXX: after callers of netdevice notifiers are converted,
we will inherit @dev's nd_lock instead.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wan/lapbether.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index 56326f38fe8a..793a2ed424c0 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -380,6 +380,7 @@ static int lapbeth_new_device(struct net_device *dev)
{
struct net_device *ndev;
struct lapbethdev *lapbeth;
+ struct nd_lock *nd_lock;
int rc = -ENOMEM;
ASSERT_RTNL();
@@ -392,6 +393,23 @@ static int lapbeth_new_device(struct net_device *dev)
if (!ndev)
goto out;
+ rc = -ENOMEM;
+ nd_lock = alloc_nd_lock();
+ if (!nd_lock)
+ goto err_free;
+
+ /* This is called from netdevice notifier, which is not converted yet.
+ * The context is unknown: some nd_lock may or may not be held. For now
+ * @ndev is independent of @dev (at this stage of the conversion we don't
+ * require the dependency; we will when we convert unregister_netdev()).
+ * So, a new nd_lock is used for @ndev for now.
+ * Q: Why trylock, even though it can't fail?
+ * A: The caller may own some other nd_lock, so lockdep would be unhappy
+ * seeing a nested lock taken without mutex_lock_nested().
+ */
+ BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ attach_nd_lock(ndev, nd_lock);
+
/* When transmitting data:
* first this driver removes a pseudo header of 1 byte,
* then the lapb module prepends an LAPB header of at most 3 bytes,
@@ -415,15 +433,19 @@ static int lapbeth_new_device(struct net_device *dev)
netif_napi_add_weight(ndev, &lapbeth->napi, lapbeth_napi_poll, 16);
rc = -EIO;
- if (register_netdevice(ndev))
- goto fail;
+ if (__register_netdevice(ndev))
+ goto err_put;
+ unlock_netdev(nd_lock);
list_add_rcu(&lapbeth->node, &lapbeth_devices);
rc = 0;
out:
return rc;
-fail:
+err_put:
dev_put(dev);
+ detach_nd_lock(ndev);
+ unlock_netdev(nd_lock);
+err_free:
free_netdev(ndev);
goto out;
}
* [PATCH NET-PREV 25/51] wwan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (23 preceding siblings ...)
2025-03-22 14:40 ` [PATCH NET-PREV 24/51] lapbeth: Provide determined context in __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 26/51] 6lowpan: " Kirill Tkhai
` (27 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wwan/iosm/iosm_ipc_wwan.c | 2 +-
drivers/net/wwan/mhi_wwan_mbim.c | 2 +-
drivers/net/wwan/t7xx/t7xx_netdev.c | 2 +-
drivers/net/wwan/wwan_core.c | 13 +++++++++++--
4 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wwan/iosm/iosm_ipc_wwan.c b/drivers/net/wwan/iosm/iosm_ipc_wwan.c
index ff747fc79aaf..f84f59df0747 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_wwan.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_wwan.c
@@ -180,7 +180,7 @@ static int ipc_wwan_newlink(void *ctxt, struct net_device *dev,
if (rcu_access_pointer(ipc_wwan->sub_netlist[if_id]))
return -EBUSY;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err)
return err;
diff --git a/drivers/net/wwan/mhi_wwan_mbim.c b/drivers/net/wwan/mhi_wwan_mbim.c
index d5a9360323d2..369ed68211dd 100644
--- a/drivers/net/wwan/mhi_wwan_mbim.c
+++ b/drivers/net/wwan/mhi_wwan_mbim.c
@@ -566,7 +566,7 @@ static int mhi_mbim_newlink(void *ctxt, struct net_device *ndev, u32 if_id,
/* Already protected by RTNL lock */
hlist_add_head_rcu(&link->hlnode, &mbim->link_list[LINK_HASH(if_id)]);
- return register_netdevice(ndev);
+ return __register_netdevice(ndev);
}
static void mhi_mbim_dellink(void *ctxt, struct net_device *ndev,
diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
index 91fa082e9cab..3bde38147930 100644
--- a/drivers/net/wwan/t7xx/t7xx_netdev.c
+++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
@@ -304,7 +304,7 @@ static int t7xx_ccmni_wwan_newlink(void *ctxt, struct net_device *dev, u32 if_id
atomic_set(&ccmni->usage, 0);
ctlb->ccmni_inst[if_id] = ccmni;
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret)
return ret;
diff --git a/drivers/net/wwan/wwan_core.c b/drivers/net/wwan/wwan_core.c
index 17431f1b1a0c..c2878efcde59 100644
--- a/drivers/net/wwan/wwan_core.c
+++ b/drivers/net/wwan/wwan_core.c
@@ -982,7 +982,7 @@ static int wwan_rtnl_newlink(struct net *src_net, struct net_device *dev,
ret = wwandev->ops->newlink(wwandev->ops_ctxt, dev,
link_id, extack);
else
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
out:
/* release the reference */
@@ -1053,9 +1053,11 @@ static void wwan_create_default_link(struct wwan_device *wwandev,
{
struct nlattr *tb[IFLA_MAX + 1], *linkinfo[IFLA_INFO_MAX + 1];
struct nlattr *data[IFLA_WWAN_MAX + 1];
+ struct nd_lock *nd_lock;
struct net_device *dev;
struct nlmsghdr *nlh;
struct sk_buff *msg;
+ int ret;
/* Forge attributes required to create a WWAN netdev. We first
* build a netlink message and then parse it. This looks
@@ -1097,7 +1099,14 @@ static void wwan_create_default_link(struct wwan_device *wwandev,
if (WARN_ON(IS_ERR(dev)))
goto unlock;
- if (WARN_ON(wwan_rtnl_newlink(&init_net, dev, tb, data, NULL))) {
+ if (!attach_new_nd_lock(dev))
+ goto unlock;
+
+ lock_netdev(dev, &nd_lock);
+ ret = wwan_rtnl_newlink(&init_net, dev, tb, data, NULL);
+ unlock_netdev(nd_lock);
+
+ if (WARN_ON(ret)) {
free_netdev(dev);
goto unlock;
}
* [PATCH NET-PREV 26/51] 6lowpan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (24 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 25/51] wwan: Use __register_netdevice in .newlink Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 27/51] vlan: " Kirill Tkhai
` (26 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/6lowpan/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index 850d4a185f55..b5cbf85b291c 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -39,7 +39,7 @@ int lowpan_register_netdevice(struct net_device *dev,
dev->ndisc_ops = &lowpan_ndisc_ops;
- ret = register_netdevice(dev);
+ ret = __register_netdevice(dev);
if (ret < 0)
return ret;
@@ -52,10 +52,18 @@ EXPORT_SYMBOL(lowpan_register_netdevice);
int lowpan_register_netdev(struct net_device *dev,
enum lowpan_lltypes lltype)
{
+ struct nd_lock *nd_lock;
int ret;
rtnl_lock();
+ if (!attach_new_nd_lock(dev))
+ goto out;
+ lock_netdev(dev, &nd_lock);
ret = lowpan_register_netdevice(dev, lltype);
+ if (ret)
+ detach_nd_lock(dev);
+ unlock_netdev(nd_lock);
+out:
rtnl_unlock();
return ret;
}
* [PATCH NET-PREV 27/51] vlan: Use __register_netdevice in .newlink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (25 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 26/51] 6lowpan: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 28/51] dsa: Use __register_netdevice() Kirill Tkhai
` (25 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Also, use __unregister_netdevice() since we know
the lock is held in that path.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/8021q/vlan.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index e45187b88220..ca3ba251a145 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -176,7 +176,7 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
if (err < 0)
goto out_uninit_mvrp;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out_uninit_mvrp;
@@ -196,7 +196,7 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
return 0;
out_unregister_netdev:
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
out_uninit_mvrp:
if (grp->nr_vlan_devs == 0)
vlan_mvrp_uninit_applicant(real_dev);
@@ -217,6 +217,7 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
struct vlan_dev_priv *vlan;
struct net *net = dev_net(real_dev);
struct vlan_net *vn = net_generic(net, vlan_net_id);
+ struct nd_lock *nd_lock;
char name[IFNAMSIZ];
int err;
@@ -274,7 +275,13 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
vlan->flags = VLAN_FLAG_REORDER_HDR;
new_dev->rtnl_link_ops = &vlan_link_ops;
+
+ lock_netdev(real_dev, &nd_lock);
+ attach_nd_lock(new_dev, nd_lock);
err = register_vlan_dev(new_dev, NULL);
+ if (err)
+ detach_nd_lock(new_dev);
+ unlock_netdev(nd_lock);
if (err < 0)
goto out_free_newdev;
* [PATCH NET-PREV 28/51] dsa: Use __register_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (26 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 27/51] vlan: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 29/51] ip6gre: Use __register_netdevice() in .changelink Kirill Tkhai
` (24 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Inherit nd_lock from conduit during registration
of a new device.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/dsa/user.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/net/dsa/user.c b/net/dsa/user.c
index f5adfa1d978a..cc3e0006f953 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -2686,6 +2686,7 @@ int dsa_user_create(struct dsa_port *port)
struct net_device *conduit = dsa_port_to_conduit(port);
struct dsa_switch *ds = port->ds;
struct net_device *user_dev;
+ struct nd_lock *nd_lock;
struct dsa_user_priv *p;
const char *name;
int assign_type;
@@ -2759,38 +2760,42 @@ int dsa_user_create(struct dsa_port *port)
dev_warn(ds->dev, "nonfatal error %d setting MTU to %d on port %d\n",
ret, ETH_DATA_LEN, port->index);
- ret = register_netdevice(user_dev);
+ lock_netdev(conduit, &nd_lock);
+ attach_nd_lock(user_dev, nd_lock);
+ ret = __register_netdevice(user_dev);
if (ret) {
netdev_err(conduit, "error %d registering interface %s\n",
ret, user_dev->name);
- rtnl_unlock();
+ detach_nd_lock(user_dev);
+ unlock_netdev(nd_lock);
goto out_phy;
}
+ ret = netdev_upper_dev_link(conduit, user_dev, NULL);
+ unlock_netdev(nd_lock);
+
+ if (ret)
+ goto out_unregister;
+
if (IS_ENABLED(CONFIG_DCB)) {
ret = dsa_user_dcbnl_init(user_dev);
if (ret) {
netdev_err(user_dev,
"failed to initialize DCB: %pe\n",
ERR_PTR(ret));
- rtnl_unlock();
goto out_unregister;
}
}
- ret = netdev_upper_dev_link(conduit, user_dev, NULL);
-
rtnl_unlock();
- if (ret)
- goto out_unregister;
-
return 0;
out_unregister:
- unregister_netdev(user_dev);
+ lock_netdev(user_dev, &nd_lock);
+ __unregister_netdevice(user_dev);
+ unlock_netdev(nd_lock);
out_phy:
- rtnl_lock();
phylink_disconnect_phy(p->dp->pl);
rtnl_unlock();
dsa_port_phylink_destroy(p->dp);
* [PATCH NET-PREV 29/51] ip6gre: Use __register_netdevice() in .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (27 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 28/51] dsa: Use __register_netdevice() Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 30/51] ip6_tunnel: Use __register_netdevice() in .newlink and .changelink Kirill Tkhai
` (23 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .changelink consistent with its callers,
which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/ip6_gre.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 57cbf7942dc8..e40780da15a0 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -344,6 +344,7 @@ static struct ip6_tnl *ip6gre_tunnel_find(struct net *net,
}
static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
+ struct nd_lock *nd_lock,
const struct __ip6_tnl_parm *parms, int create)
{
struct ip6_tnl *t, *nt;
@@ -378,8 +379,11 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
nt->dev = dev;
nt->net = dev_net(dev);
- if (register_netdevice(dev) < 0)
+ attach_nd_lock(dev, nd_lock);
+ if (__register_netdevice(dev) < 0) {
+ detach_nd_lock(dev);
goto failed_free;
+ }
ip6gre_tnl_link_config(nt, 1);
ip6gre_tunnel_link(ign, nt);
@@ -1277,6 +1281,10 @@ static void ip6gre_tnl_parm_to_user(struct ip6_tnl_parm2 *u,
memcpy(u->name, p->name, sizeof(u->name));
}
+/* XXX: Currently ->ndo_siocdevprivate is called with @dev unlocked
+ * (the only place where @dev may be locked is phonet_device_autoconf(),
+ * but it can't be a caller of this).
+ */
static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
struct ifreq *ifr, void __user *data,
int cmd)
@@ -1287,6 +1295,7 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
struct ip6_tnl *t = netdev_priv(dev);
struct net *net = t->net;
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
+ struct nd_lock *nd_lock;
memset(&p1, 0, sizeof(p1));
@@ -1298,7 +1307,9 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
break;
}
ip6gre_tnl_parm_from_user(&p1, &p);
- t = ip6gre_tunnel_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6gre_tunnel_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (!t)
t = netdev_priv(dev);
}
@@ -1328,7 +1339,9 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
p.o_key = 0;
ip6gre_tnl_parm_from_user(&p1, &p);
- t = ip6gre_tunnel_locate(net, &p1, cmd == SIOCADDTUNNEL);
+ lock_netdev(dev, &nd_lock);
+ t = ip6gre_tunnel_locate(net, nd_lock, &p1, cmd == SIOCADDTUNNEL);
+ unlock_netdev(nd_lock);
if (dev != ign->fb_tunnel_dev && cmd == SIOCCHGTUNNEL) {
if (t) {
@@ -1369,7 +1382,9 @@ static int ip6gre_tunnel_siocdevprivate(struct net_device *dev,
goto done;
err = -ENOENT;
ip6gre_tnl_parm_from_user(&p1, &p);
- t = ip6gre_tunnel_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6gre_tunnel_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (!t)
goto done;
err = -EPERM;
@@ -2038,6 +2053,7 @@ ip6gre_changelink_common(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[], struct __ip6_tnl_parm *p_p,
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip6_tnl *t, *nt = netdev_priv(dev);
struct net *net = nt->net;
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
@@ -2055,7 +2071,7 @@ ip6gre_changelink_common(struct net_device *dev, struct nlattr *tb[],
ip6gre_netlink_parms(data, p_p);
- t = ip6gre_tunnel_locate(net, p_p, 0);
+ t = ip6gre_tunnel_locate(net, nd_lock, p_p, 0);
if (t) {
if (t->dev != dev)
* [PATCH NET-PREV 30/51] ip6_tunnel: Use __register_netdevice() in .newlink and .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (28 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 29/51] ip6gre: Use __register_netdevice() in .changelink Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 31/51] ip6_vti: " Kirill Tkhai
` (22 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink and .changelink consistent with
their callers, which already assign nd_lock (matching the master's
nd_lock if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/ip6_tunnel.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 87dfb565a9f8..d6435cb1e4fc 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -257,7 +257,7 @@ static int ip6_tnl_create2(struct net_device *dev)
int err;
dev->rtnl_link_ops = &ip6_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
@@ -282,7 +282,8 @@ static int ip6_tnl_create2(struct net_device *dev)
* created tunnel or error pointer
**/
-static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
+static struct ip6_tnl *ip6_tnl_create(struct net *net, struct nd_lock *nd_lock,
+ struct __ip6_tnl_parm *p)
{
struct net_device *dev;
struct ip6_tnl *t;
@@ -307,6 +308,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
t = netdev_priv(dev);
t->parms = *p;
t->net = dev_net(dev);
+ attach_nd_lock(dev, nd_lock);
err = ip6_tnl_create2(dev);
if (err < 0)
goto failed_free;
@@ -314,6 +316,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
return t;
failed_free:
+ detach_nd_lock(dev);
free_netdev(dev);
failed:
return ERR_PTR(err);
@@ -322,6 +325,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
/**
* ip6_tnl_locate - find or create tunnel matching given parameters
* @net: network namespace
+ * @nd_lock: created device lock
* @p: tunnel parameters
* @create: != 0 if allowed to create new tunnel if no match found
*
@@ -335,6 +339,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
**/
static struct ip6_tnl *ip6_tnl_locate(struct net *net,
+ struct nd_lock *nd_lock,
struct __ip6_tnl_parm *p, int create)
{
const struct in6_addr *remote = &p->raddr;
@@ -357,7 +362,7 @@ static struct ip6_tnl *ip6_tnl_locate(struct net *net,
}
if (!create)
return ERR_PTR(-ENODEV);
- return ip6_tnl_create(net, p);
+ return ip6_tnl_create(net, nd_lock, p);
}
/**
@@ -1621,8 +1626,11 @@ ip6_tnl_parm_to_user(struct ip6_tnl_parm *u, const struct __ip6_tnl_parm *p)
* %-EINVAL if passed tunnel parameters are invalid,
* %-EEXIST if changing a tunnel's parameters would cause a conflict
* %-ENODEV if attempting to change or delete a nonexisting device
- **/
-
+ *
+ * XXX: Currently ->ndo_siocdevprivate is called with @dev unlocked
+ * (the only place where @dev may be locked is phonet_device_autoconf(),
+ * but it cannot be a caller of this function).
+ */
static int
ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
void __user *data, int cmd)
@@ -1633,6 +1641,7 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
struct ip6_tnl *t = netdev_priv(dev);
struct net *net = t->net;
struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
+ struct nd_lock *nd_lock;
memset(&p1, 0, sizeof(p1));
@@ -1644,7 +1653,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
break;
}
ip6_tnl_parm_from_user(&p1, &p);
- t = ip6_tnl_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6_tnl_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (IS_ERR(t))
t = netdev_priv(dev);
} else {
@@ -1667,7 +1678,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
p.proto != 0)
break;
ip6_tnl_parm_from_user(&p1, &p);
- t = ip6_tnl_locate(net, &p1, cmd == SIOCADDTUNNEL);
+ lock_netdev(dev, &nd_lock);
+ t = ip6_tnl_locate(net, nd_lock, &p1, cmd == SIOCADDTUNNEL);
+ unlock_netdev(nd_lock);
if (cmd == SIOCCHGTUNNEL) {
if (!IS_ERR(t)) {
if (t->dev != dev) {
@@ -1702,7 +1715,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
break;
err = -ENOENT;
ip6_tnl_parm_from_user(&p1, &p);
- t = ip6_tnl_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ip6_tnl_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (IS_ERR(t))
break;
err = -EPERM;
@@ -2003,6 +2018,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct net *net = dev_net(dev);
struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
struct ip_tunnel_encap ipencap;
@@ -2023,7 +2039,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
if (rtnl_dereference(ip6n->collect_md_tun))
return -EEXIST;
} else {
- t = ip6_tnl_locate(net, &nt->parms, 0);
+ t = ip6_tnl_locate(net, nd_lock, &nt->parms, 0);
if (!IS_ERR(t))
return -EEXIST;
}
@@ -2039,6 +2055,7 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip6_tnl *t = netdev_priv(dev);
struct __ip6_tnl_parm p;
struct net *net = t->net;
@@ -2058,7 +2075,7 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
if (p.collect_md)
return -EINVAL;
- t = ip6_tnl_locate(net, &p, 0);
+ t = ip6_tnl_locate(net, nd_lock, &p, 0);
if (!IS_ERR(t)) {
if (t->dev != dev)
return -EEXIST;
* [PATCH NET-PREV 31/51] ip6_vti: Use __register_netdevice() in .newlink and .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (29 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 30/51] ip6_tunnel: Use __register_netdevice() in .newlink and .changelink Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 32/51] ip6_sit: " Kirill Tkhai
` (21 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink and .changelink conform to their
callers, which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/ip6_vti.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 590737c27537..b20a18c403e9 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -182,7 +182,7 @@ static int vti6_tnl_create2(struct net_device *dev)
int err;
dev->rtnl_link_ops = &vti6_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
@@ -196,7 +196,8 @@ static int vti6_tnl_create2(struct net_device *dev)
return err;
}
-static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p)
+static struct ip6_tnl *vti6_tnl_create(struct net *net, struct nd_lock *nd_lock,
+ struct __ip6_tnl_parm *p)
{
struct net_device *dev;
struct ip6_tnl *t;
@@ -221,6 +222,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
t->parms = *p;
t->net = dev_net(dev);
+ attach_nd_lock(dev, nd_lock);
err = vti6_tnl_create2(dev);
if (err < 0)
goto failed_free;
@@ -228,6 +230,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
return t;
failed_free:
+ detach_nd_lock(dev);
free_netdev(dev);
failed:
return NULL;
@@ -247,8 +250,8 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p
* Return:
* matching tunnel or NULL
**/
-static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
- int create)
+static struct ip6_tnl *vti6_locate(struct net *net, struct nd_lock *nd_lock,
+ struct __ip6_tnl_parm *p, int create)
{
const struct in6_addr *remote = &p->raddr;
const struct in6_addr *local = &p->laddr;
@@ -269,7 +272,7 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
}
if (!create)
return NULL;
- return vti6_tnl_create(net, p);
+ return vti6_tnl_create(net, nd_lock, p);
}
/**
@@ -791,6 +794,10 @@ vti6_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p)
* %-EINVAL if passed tunnel parameters are invalid,
* %-EEXIST if changing a tunnel's parameters would cause a conflict
* %-ENODEV if attempting to change or delete a nonexisting device
+ *
+ * XXX: Currently ->ndo_siocdevprivate is called with @dev unlocked
+ * (the only place where @dev may be locked is phonet_device_autoconf(),
+ * but it cannot be a caller of this function).
**/
static int
vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data, int cmd)
@@ -801,6 +808,7 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
struct ip6_tnl *t = NULL;
struct net *net = dev_net(dev);
struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+ struct nd_lock *nd_lock;
memset(&p1, 0, sizeof(p1));
@@ -812,7 +820,9 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
break;
}
vti6_parm_from_user(&p1, &p);
- t = vti6_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = vti6_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
} else {
memset(&p, 0, sizeof(p));
}
@@ -834,7 +844,9 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
if (p.proto != IPPROTO_IPV6 && p.proto != 0)
break;
vti6_parm_from_user(&p1, &p);
- t = vti6_locate(net, &p1, cmd == SIOCADDTUNNEL);
+ lock_netdev(dev, &nd_lock);
+ t = vti6_locate(net, nd_lock, &p1, cmd == SIOCADDTUNNEL);
+ unlock_netdev(nd_lock);
if (dev != ip6n->fb_tnl_dev && cmd == SIOCCHGTUNNEL) {
if (t) {
if (t->dev != dev) {
@@ -866,7 +878,9 @@ vti6_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data
break;
err = -ENOENT;
vti6_parm_from_user(&p1, &p);
- t = vti6_locate(net, &p1, 0);
+ lock_netdev(dev, &nd_lock);
+ t = vti6_locate(net, nd_lock, &p1, 0);
+ unlock_netdev(nd_lock);
if (!t)
break;
err = -EPERM;
@@ -1001,6 +1015,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct net *net = dev_net(dev);
struct ip6_tnl *nt;
@@ -1009,7 +1024,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev,
nt->parms.proto = IPPROTO_IPV6;
- if (vti6_locate(net, &nt->parms, 0))
+ if (vti6_locate(net, nd_lock, &nt->parms, 0))
return -EEXIST;
return vti6_tnl_create2(dev);
@@ -1028,6 +1043,7 @@ static int vti6_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip6_tnl *t;
struct __ip6_tnl_parm p;
struct net *net = dev_net(dev);
@@ -1038,7 +1054,7 @@ static int vti6_changelink(struct net_device *dev, struct nlattr *tb[],
vti6_netlink_parms(data, &p);
- t = vti6_locate(net, &p, 0);
+ t = vti6_locate(net, nd_lock, &p, 0);
if (t) {
if (t->dev != dev)
* [PATCH NET-PREV 32/51] ip6_sit: Use __register_netdevice() in .newlink and .changelink
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (30 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 31/51] ip6_vti: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:41 ` [PATCH NET-PREV 33/51] net: Now check nobody calls register_netdevice() with nd_lock attached Kirill Tkhai
` (20 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
The objective is to make .newlink and .changelink conform to their
callers, which already assign nd_lock (matching the master's nd_lock
if there is one).
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ipv6/sit.c | 45 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 83b195f09561..1749defa4b70 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -212,7 +212,7 @@ static int ipip6_tunnel_create(struct net_device *dev)
dev->rtnl_link_ops = &sit_link_ops;
- err = register_netdevice(dev);
+ err = __register_netdevice(dev);
if (err < 0)
goto out;
@@ -226,6 +226,7 @@ static int ipip6_tunnel_create(struct net_device *dev)
}
static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
+ struct nd_lock *nd_lock,
struct ip_tunnel_parm_kern *parms,
int create)
{
@@ -269,6 +270,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
nt = netdev_priv(dev);
nt->parms = *parms;
+ attach_nd_lock(dev, nd_lock);
if (ipip6_tunnel_create(dev) < 0)
goto failed_free;
@@ -278,6 +280,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
return nt;
failed_free:
+ detach_nd_lock(dev);
free_netdev(dev);
failed:
return NULL;
@@ -1200,11 +1203,14 @@ ipip6_tunnel_get6rd(struct net_device *dev, struct ip_tunnel_parm __user *data)
struct ip_tunnel *t = netdev_priv(dev);
struct ip_tunnel_parm_kern p;
struct ip_tunnel_6rd ip6rd;
+ struct nd_lock *nd_lock;
if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
if (!ip_tunnel_parm_from_user(&p, data))
return -EFAULT;
- t = ipip6_tunnel_locate(t->net, &p, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, &p, 0);
+ unlock_netdev(nd_lock);
}
if (!t)
t = netdev_priv(dev);
@@ -1273,9 +1279,13 @@ static int
ipip6_tunnel_get(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
- if (dev == dev_to_sit_net(dev)->fb_tunnel_dev)
- t = ipip6_tunnel_locate(t->net, p, 0);
+ if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 0);
+ unlock_netdev(nd_lock);
+ }
if (!t)
t = netdev_priv(dev);
memcpy(p, &t->parms, sizeof(*p));
@@ -1286,13 +1296,16 @@ static int
ipip6_tunnel_add(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
int err;
err = __ipip6_tunnel_ioctl_validate(t->net, p);
if (err)
return err;
- t = ipip6_tunnel_locate(t->net, p, 1);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 1);
+ unlock_netdev(nd_lock);
if (!t)
return -ENOBUFS;
return 0;
@@ -1302,13 +1315,16 @@ static int
ipip6_tunnel_change(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
int err;
err = __ipip6_tunnel_ioctl_validate(t->net, p);
if (err)
return err;
- t = ipip6_tunnel_locate(t->net, p, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 0);
+ unlock_netdev(nd_lock);
if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
if (!t)
return -ENOENT;
@@ -1333,12 +1349,15 @@ static int
ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm_kern *p)
{
struct ip_tunnel *t = netdev_priv(dev);
+ struct nd_lock *nd_lock;
if (!ns_capable(t->net->user_ns, CAP_NET_ADMIN))
return -EPERM;
if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
- t = ipip6_tunnel_locate(t->net, p, 0);
+ lock_netdev(dev, &nd_lock);
+ t = ipip6_tunnel_locate(t->net, nd_lock, p, 0);
+ unlock_netdev(nd_lock);
if (!t)
return -ENOENT;
if (t == netdev_priv(dev_to_sit_net(dev)->fb_tunnel_dev))
@@ -1349,6 +1368,12 @@ ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm_kern *p)
return 0;
}
+/* This is called with rtnl locked and dev's nd_lock unlocked.
+ * Note that currently we take nd_lock in each of the functions
+ * below (ipip6_tunnel_get, ipip6_tunnel_add, etc.) instead
+ * of taking it once here, since one of them calls
+ * call_netdevice_notifiers(), which is not prepared to use nd_lock yet.
+ */
static int
ipip6_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p,
int cmd)
@@ -1553,6 +1578,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct net *net = dev_net(dev);
struct ip_tunnel *nt;
struct ip_tunnel_encap ipencap;
@@ -1571,7 +1597,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
ipip6_netlink_parms(data, &nt->parms, &nt->fwmark);
- if (ipip6_tunnel_locate(net, &nt->parms, 0))
+ if (ipip6_tunnel_locate(net, nd_lock, &nt->parms, 0))
return -EEXIST;
err = ipip6_tunnel_create(dev);
@@ -1601,6 +1627,7 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct nd_lock *nd_lock = rcu_dereference_protected(dev->nd_lock, true);
struct ip_tunnel *t = netdev_priv(dev);
struct ip_tunnel_encap ipencap;
struct ip_tunnel_parm_kern p;
@@ -1627,7 +1654,7 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
(!(dev->flags & IFF_POINTOPOINT) && p.iph.daddr))
return -EINVAL;
- t = ipip6_tunnel_locate(net, &p, 0);
+ t = ipip6_tunnel_locate(net, nd_lock, &p, 0);
if (t) {
if (t->dev != dev)
* [PATCH NET-PREV 33/51] net: Now check nobody calls register_netdevice() with nd_lock attached
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (31 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 32/51] ip6_sit: " Kirill Tkhai
@ 2025-03-22 14:41 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 34/51] dsa: Make all switch tree ports relate to same nd_lock Kirill Tkhai
` (19 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:41 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Now that .newlink and .changelink have been switched
to __register_netdevice(), nothing should call
register_netdevice() with an nd_lock already attached.
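The resulting contract can be modelled in user space. The sketch below is
only an illustration under stated assumptions: struct model_lock,
struct model_dev, model_register() and model_register_locked() are
hypothetical stand-ins for nd_lock, net_device, register_netdevice() and
__register_netdevice(); a plain flag stands in for the mutex, and the
unlock and error paths are guesses, since the hunk below does not show them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_lock {
	int held;			/* models nd_lock->mutex being locked */
};

struct model_dev {
	struct model_lock *lock;	/* models dev->nd_lock */
	int registered;
};

/* Models __register_netdevice(): the caller has attached a lock and holds it. */
int model_register_locked(struct model_dev *dev)
{
	assert(dev);
	if (!dev->lock || !dev->lock->held)
		return -EINVAL;
	dev->registered = 1;
	return 0;
}

/* Models register_netdevice() after this patch: it creates the lock itself,
 * so a device that already has a lock attached is rejected.
 */
int model_register(struct model_dev *dev)
{
	struct model_lock *lock;
	int err;

	assert(dev);
	if (dev->lock)
		return -EINVAL;

	lock = calloc(1, sizeof(*lock));
	if (!lock)
		return -ENOMEM;

	lock->held = 1;			/* models mutex_lock() */
	dev->lock = lock;		/* models attach_nd_lock() */
	err = model_register_locked(dev);
	if (err)
		dev->lock = NULL;	/* assumed cleanup, not shown in the hunk */
	lock->held = 0;			/* assumed unlock, not shown in the hunk */

	if (err)
		free(lock);
	return err;
}
```

In this model a second model_register() call on the same device fails with
-EINVAL, which mirrors the WARN_ON() branch added below.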
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 63ece39c9286..e6809a80644e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10847,25 +10847,14 @@ int register_netdevice(struct net_device *dev)
struct nd_lock *nd_lock;
int err;
- /* XXX: This "if" is to start one by one convertation
- * to use __register_netdevice() in devices, that
- * want to attach nd_lock themself (e.g., having newlink).
- * After all of them are converted, we remove this.
- */
- if (rcu_access_pointer(dev->nd_lock))
- return __register_netdevice(dev);
+ if (WARN_ON(rcu_access_pointer(dev->nd_lock)))
+ return -EINVAL;
nd_lock = alloc_nd_lock();
if (!nd_lock)
return -ENOMEM;
- /* This may be called from netdevice notifier, which is not converted
- * yet. The context is unknown: either some nd_lock is locked or not.
- * Sometimes here is nested mutex and sometimes is not. We use trylock
- * to silence lockdep assert about that.
- * It will be replaced by mutex_lock(), see next patches.
- */
- BUG_ON(!mutex_trylock(&nd_lock->mutex));
+ mutex_lock(&nd_lock->mutex);
attach_nd_lock(dev, nd_lock);
err = __register_netdevice(dev);
if (err)
* [PATCH NET-PREV 34/51] dsa: Make all switch tree ports relate to same nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (32 preceding siblings ...)
2025-03-22 14:41 ` [PATCH NET-PREV 33/51] net: Now check nobody calls register_netdevice() with nd_lock attached Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 35/51] cfg80211: Use fallback_nd_lock for registered devices Kirill Tkhai
` (18 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
dsa_tree_migrate_ports_from_lag_conduit() may pick any
of the ports as the new conduit, and that conduit is then
linked to the rest of the ports via netdev_upper_dev_link(),
so all of them must share the same nd_lock.
XXX: Keep in mind that NETDEV_CHANGEUPPER is sent
by netdev_upper_dev_unlink().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/dsa/dsa.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 668c729946ea..6468b03d3d46 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -1156,6 +1156,8 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
struct dsa_switch *ds = dp->ds;
struct dsa_switch_tree *dst = ds->dst;
enum dsa_tag_protocol default_proto;
+ struct nd_lock *nd_lock, *nd_lock2;
+ struct dsa_port *first_dp;
/* Find out which protocol the switch would prefer. */
default_proto = dsa_get_tag_protocol(dp, conduit);
@@ -1213,6 +1215,18 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *conduit,
dst->tag_ops = tag_ops;
}
+ first_dp = dsa_tree_find_first_cpu(dst);
+ if (first_dp && first_dp->conduit) {
+ /* All conduits must relate to the same nd_lock
+ * since dsa_tree_migrate_ports_from_lag_conduit()
+ * may take any of them from the list.
+ */
+ double_lock_netdev(first_dp->conduit, &nd_lock,
+ conduit, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+ double_unlock_netdev(nd_lock, nd_lock2);
+ }
+
dp->conduit = conduit;
dp->type = DSA_PORT_TYPE_CPU;
dsa_port_set_tag_protocol(dp, dst->tag_ops);
* [PATCH NET-PREV 35/51] cfg80211: Use fallback_nd_lock for registered devices
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (33 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 34/51] dsa: Make all switch tree ports relate to same nd_lock Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 36/51] ieee802154: " Kirill Tkhai
` (17 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
For now we use fallback_nd_lock for all drivers that register
netdevices via cfg80211_register_netdevice().
One of the reasons is that these devices are handled as a group
in cfg80211_switch_netns(), and we want to call
dev_change_net_namespace() under nd_lock in one of
the next patches.
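The way nl80211_pre_doit() and nl80211_post_doit() hand the lock over can be
sketched as a small user-space model. This is an illustration only: the
MODEL_FLAG_* values, struct model_locks, model_pre_doit() and
model_post_doit() are hypothetical names, and booleans stand in for
rtnl_lock and fallback_nd_lock.mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags modelling NL80211_FLAG_NEED_RTNL and
 * NL80211_FLAG_NEED_FALLBACK_ND_LOCK.
 */
#define MODEL_FLAG_NEED_RTNL		0x1u
#define MODEL_FLAG_NEED_FALLBACK_LOCK	0x2u

struct model_locks {
	bool rtnl_held;		/* models rtnl_lock */
	bool fallback_held;	/* models fallback_nd_lock.mutex */
};

/* Both locks are taken unconditionally, rtnl first; each one is dropped
 * again before returning unless the operation asked to keep it.
 */
void model_pre_doit(struct model_locks *l, unsigned int flags)
{
	assert(!l->rtnl_held && !l->fallback_held);
	l->rtnl_held = true;
	l->fallback_held = true;

	/* ... wiphy/wdev lookup would happen here under both locks ... */

	if (!(flags & MODEL_FLAG_NEED_FALLBACK_LOCK))
		l->fallback_held = false;
	if (!(flags & MODEL_FLAG_NEED_RTNL))
		l->rtnl_held = false;
}

/* Releases whatever pre_doit kept, in reverse order of acquisition. */
void model_post_doit(struct model_locks *l, unsigned int flags)
{
	if (flags & MODEL_FLAG_NEED_FALLBACK_LOCK) {
		assert(l->fallback_held);
		l->fallback_held = false;
	}
	if (flags & MODEL_FLAG_NEED_RTNL) {
		assert(l->rtnl_held);
		l->rtnl_held = false;
	}
}
```

An operation that creates an interface would set both flags, so the handler
runs with the fallback lock held and cfg80211_register_netdevice() passes its
mutex_is_locked() check.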
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wireless/ath/ath6kl/core.c | 2 ++
drivers/net/wireless/ath/wil6210/netdev.c | 2 ++
drivers/net/wireless/marvell/mwifiex/main.c | 5 +++++
drivers/net/wireless/quantenna/qtnfmac/core.c | 2 ++
net/mac80211/main.c | 2 ++
net/wireless/core.c | 10 ++++++++--
net/wireless/nl80211.c | 14 ++++++++++++++
7 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath6kl/core.c b/drivers/net/wireless/ath/ath6kl/core.c
index 4f0a7a185fc9..8c28f5a476ef 100644
--- a/drivers/net/wireless/ath/ath6kl/core.c
+++ b/drivers/net/wireless/ath/ath6kl/core.c
@@ -212,6 +212,7 @@ int ath6kl_core_init(struct ath6kl *ar, enum ath6kl_htc_type htc_type)
ar->avail_idx_map |= BIT(i);
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(ar->wiphy);
/* Add an initial station interface */
@@ -219,6 +220,7 @@ int ath6kl_core_init(struct ath6kl *ar, enum ath6kl_htc_type htc_type)
NL80211_IFTYPE_STATION, 0, INFRA_NETWORK);
wiphy_unlock(ar->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
if (!wdev) {
diff --git a/drivers/net/wireless/ath/wil6210/netdev.c b/drivers/net/wireless/ath/wil6210/netdev.c
index d5d364683c0e..57958b44717d 100644
--- a/drivers/net/wireless/ath/wil6210/netdev.c
+++ b/drivers/net/wireless/ath/wil6210/netdev.c
@@ -474,9 +474,11 @@ int wil_if_add(struct wil6210_priv *wil)
wil_update_net_queues_bh(wil, vif, NULL, true);
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(wiphy);
rc = wil_vif_add(wil, vif);
wiphy_unlock(wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
if (rc < 0)
goto free_dummy;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 96d1f6039fbc..c4b112d2f0b2 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -623,6 +623,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
}
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(adapter->wiphy);
/* Create station interface by default */
wdev = mwifiex_add_virtual_intf(adapter->wiphy, "mlan%d", NET_NAME_ENUM,
@@ -631,6 +632,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
mwifiex_dbg(adapter, ERROR,
"cannot create default STA interface\n");
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
goto err_add_intf;
}
@@ -642,6 +644,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
mwifiex_dbg(adapter, ERROR,
"cannot create AP interface\n");
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
goto err_add_intf;
}
@@ -654,11 +657,13 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
mwifiex_dbg(adapter, ERROR,
"cannot create p2p client interface\n");
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
goto err_add_intf;
}
}
wiphy_unlock(adapter->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
mwifiex_drv_get_driver_version(adapter, fmt, sizeof(fmt) - 1);
diff --git a/drivers/net/wireless/quantenna/qtnfmac/core.c b/drivers/net/wireless/quantenna/qtnfmac/core.c
index 825b05dd3271..7952e3314aca 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/core.c
+++ b/drivers/net/wireless/quantenna/qtnfmac/core.c
@@ -597,9 +597,11 @@ static int qtnf_core_mac_attach(struct qtnf_bus *bus, unsigned int macid)
mac->wiphy_registered = 1;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(priv_to_wiphy(mac));
ret = qtnf_core_net_attach(mac, vif, "wlan%d", NET_NAME_ENUM);
wiphy_unlock(priv_to_wiphy(mac));
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
if (ret) {
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index a3104b6ea6f0..bacea2473a21 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1582,6 +1582,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
ieee80211_check_wbrf_support(local);
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
wiphy_lock(hw->wiphy);
/* add one default STA interface if supported */
@@ -1597,6 +1598,7 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
}
wiphy_unlock(hw->wiphy);
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
#ifdef CONFIG_INET
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 4d5d351bd0b5..8ba0ada86678 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1439,7 +1439,11 @@ int cfg80211_register_netdevice(struct net_device *dev)
/* we'll take care of this */
wdev->registered = true;
wdev->registering = true;
- ret = register_netdevice(dev);
+
+ if (!mutex_is_locked(&fallback_nd_lock.mutex))
+ return -EXDEV;
+ attach_nd_lock(dev, &fallback_nd_lock);
+ ret = __register_netdevice(dev);
if (ret)
goto out;
@@ -1447,8 +1451,10 @@ int cfg80211_register_netdevice(struct net_device *dev)
ret = 0;
out:
wdev->registering = false;
- if (ret)
+ if (ret) {
+ detach_nd_lock(dev);
wdev->registered = false;
+ }
return ret;
}
EXPORT_SYMBOL(cfg80211_register_netdevice);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 7397a372c78e..0fd66f75eace 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16455,6 +16455,7 @@ nl80211_set_ttlm(struct sk_buff *skb, struct genl_info *info)
#define NL80211_FLAG_NO_WIPHY_MTX 0x40
#define NL80211_FLAG_MLO_VALID_LINK_ID 0x80
#define NL80211_FLAG_MLO_UNSUPPORTED 0x100
+#define NL80211_FLAG_NEED_FALLBACK_ND_LOCK 0x200
#define INTERNAL_FLAG_SELECTORS(__sel) \
SELECTOR(__sel, NONE, 0) /* must be first */ \
@@ -16477,6 +16478,11 @@ nl80211_set_ttlm(struct sk_buff *skb, struct genl_info *info)
NL80211_FLAG_NEED_WIPHY | \
NL80211_FLAG_NEED_RTNL | \
NL80211_FLAG_NO_WIPHY_MTX) \
+ SELECTOR(__sel, WIPHY_RTNL_ND_LOCK, \
+ NL80211_FLAG_NEED_WIPHY | \
+ NL80211_FLAG_NEED_FALLBACK_ND_LOCK | \
+ NL80211_FLAG_NEED_RTNL | \
+ NL80211_FLAG_NO_WIPHY_MTX) \
SELECTOR(__sel, WDEV_RTNL, \
NL80211_FLAG_NEED_WDEV | \
NL80211_FLAG_NEED_RTNL) \
@@ -16545,6 +16551,7 @@ static int nl80211_pre_doit(const struct genl_split_ops *ops,
internal_flags = nl80211_internal_flags[ops->internal_flags];
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
if (internal_flags & NL80211_FLAG_NEED_WIPHY) {
rdev = cfg80211_get_dev_from_info(genl_info_net(info), info);
if (IS_ERR(rdev)) {
@@ -16621,11 +16628,15 @@ static int nl80211_pre_doit(const struct genl_split_ops *ops,
/* we keep the mutex locked until post_doit */
__release(&rdev->wiphy.mtx);
}
+
+ if (!(internal_flags & NL80211_FLAG_NEED_FALLBACK_ND_LOCK))
+ mutex_unlock(&fallback_nd_lock.mutex);
if (!(internal_flags & NL80211_FLAG_NEED_RTNL))
rtnl_unlock();
return 0;
out_unlock:
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
dev_put(dev);
return err;
@@ -16656,6 +16667,8 @@ static void nl80211_post_doit(const struct genl_split_ops *ops,
wiphy_unlock(&rdev->wiphy);
}
+ if (internal_flags & NL80211_FLAG_NEED_FALLBACK_ND_LOCK)
+ mutex_unlock(&fallback_nd_lock.mutex);
if (internal_flags & NL80211_FLAG_NEED_RTNL)
rtnl_unlock();
@@ -16821,6 +16834,7 @@ static const struct genl_small_ops nl80211_small_ops[] = {
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags =
IFLAGS(NL80211_FLAG_NEED_WIPHY |
+ NL80211_FLAG_NEED_FALLBACK_ND_LOCK |
NL80211_FLAG_NEED_RTNL |
/* we take the wiphy mutex later ourselves */
NL80211_FLAG_NO_WIPHY_MTX),
* [PATCH NET-PREV 36/51] ieee802154: Use fallback_nd_lock for registered devices
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (34 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 35/51] cfg80211: Use fallback_nd_lock for registered devices Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 37/51] net: Introduce delayed event work Kirill Tkhai
` (16 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
For now we use fallback_nd_lock for all drivers that register
netdevices via ieee802154_if_add().
One of the reasons is that these devices are handled as a group
in cfg802154_switch_netns(), and we want to call
dev_change_net_namespace() under nd_lock in one of
the next patches.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ieee802154/nl802154.c | 7 +++++++
net/mac802154/cfg.c | 2 ++
net/mac802154/iface.c | 10 ++++++++--
net/mac802154/main.c | 2 ++
4 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index 7eb37de3add2..a512f2a647e8 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -2691,6 +2691,7 @@ static int nl802154_del_llsec_seclevel(struct sk_buff *skb,
#define NL802154_FLAG_NEED_RTNL 0x04
#define NL802154_FLAG_CHECK_NETDEV_UP 0x08
#define NL802154_FLAG_NEED_WPAN_DEV 0x10
+#define NL802154_FLAG_NEED_FALLBACK_ND_LOCK 0x20
static int nl802154_pre_doit(const struct genl_split_ops *ops,
struct sk_buff *skb,
@@ -2700,9 +2701,12 @@ static int nl802154_pre_doit(const struct genl_split_ops *ops,
struct wpan_dev *wpan_dev;
struct net_device *dev;
bool rtnl = ops->internal_flags & NL802154_FLAG_NEED_RTNL;
+ bool nd = ops->internal_flags & NL802154_FLAG_NEED_FALLBACK_ND_LOCK;
if (rtnl)
rtnl_lock();
+ if (nd)
+ mutex_lock(&fallback_nd_lock.mutex);
if (ops->internal_flags & NL802154_FLAG_NEED_WPAN_PHY) {
rdev = cfg802154_get_dev_from_info(genl_info_net(info), info);
@@ -2769,6 +2773,8 @@ static void nl802154_post_doit(const struct genl_split_ops *ops,
}
}
+ if (ops->internal_flags & NL802154_FLAG_NEED_FALLBACK_ND_LOCK)
+ mutex_unlock(&fallback_nd_lock.mutex);
if (ops->internal_flags & NL802154_FLAG_NEED_RTNL)
rtnl_unlock();
}
@@ -2800,6 +2806,7 @@ static const struct genl_ops nl802154_ops[] = {
.doit = nl802154_new_interface,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
+ NL802154_FLAG_NEED_FALLBACK_ND_LOCK |
NL802154_FLAG_NEED_RTNL,
},
{
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index ef7f23af043f..405183d258b6 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -23,8 +23,10 @@ ieee802154_add_iface_deprecated(struct wpan_phy *wpan_phy,
struct net_device *dev;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
dev = ieee802154_if_add(local, name, name_assign_type, type,
cpu_to_le64(0x0000000000000000ULL));
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
return dev;
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c
index c0e2da5072be..7ec23e8268de 100644
--- a/net/mac802154/iface.c
+++ b/net/mac802154/iface.c
@@ -664,9 +664,15 @@ ieee802154_if_add(struct ieee802154_local *local, const char *name,
if (ret)
goto err;
- ret = register_netdevice(ndev);
- if (ret < 0)
+ ret = -EXDEV;
+ if (!mutex_is_locked(&fallback_nd_lock.mutex))
+ goto err;
+ attach_nd_lock(ndev, &fallback_nd_lock);
+ ret = __register_netdevice(ndev);
+ if (ret < 0) {
+ detach_nd_lock(ndev);
goto err;
+ }
mutex_lock(&local->iflist_mtx);
list_add_tail_rcu(&sdata->list, &local->interfaces);
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index 21b7c3b280b4..14bcad399dae 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -246,9 +246,11 @@ int ieee802154_register_hw(struct ieee802154_hw *hw)
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
dev = ieee802154_if_add(local, "wpan%d", NET_NAME_ENUM,
NL802154_IFTYPE_NODE,
cpu_to_le64(0x0000000000000000ULL));
+ mutex_unlock(&fallback_nd_lock.mutex);
if (IS_ERR(dev)) {
rtnl_unlock();
rc = PTR_ERR(dev);
* [PATCH NET-PREV 37/51] net: Introduce delayed event work
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (35 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 36/51] ieee802154: " Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 38/51] failover: Link master and slave under nd_lock Kirill Tkhai
` (15 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Some drivers (e.g., failover and netvsc) use netdevice notifiers
to link devices to each other by calling netdev_master_upper_dev_link().
We want 1) both devices to use the same lock after linking, and
2) netdevice notifiers to be called with nd_lock held.
We can't do both at the same time, because that creates
a lock ordering problem:
  lock_netdev(dev1, &nd_lock1);
  call_netdevice_notifier()
    lock_netdev(dev2, &nd_lock2); <--- problem here if !locks_ordered()
    nd_lock_transfer_devices(&nd_lock1, &nd_lock2);
    netdev_master_upper_dev_link(dev1, dev2);
We can't use double_lock_netdev() instead of lock_netdev() here,
since dev2 is unknown at that moment.
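The ordering constraint can be illustrated outside the kernel. What follows is a minimal user-space sketch, not code from this series: struct lock_model, first_in_order(), double_lock_ordered() and double_unlock_ordered() are hypothetical names, and ordering by address is only an assumption about how two locks might be ordered (the series' own locks_ordered() is not shown here). The point is that a consistent order needs both locks to be known before the first one is taken, which a notifier running under dev1's lock cannot guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nd_lock; only the mutex matters here. */
struct lock_model {
	pthread_mutex_t mutex;
};

/* Pick the lock that must be taken first. The order (by address) is an
 * assumption of this sketch; any fixed total order would do.
 */
static struct lock_model *first_in_order(struct lock_model *a,
					 struct lock_model *b)
{
	assert(a && b);
	return (uintptr_t)a <= (uintptr_t)b ? a : b;
}

/* Take both locks in the fixed order, so two callers passing the same
 * pair in opposite argument order cannot deadlock against each other.
 * If both arguments are the same lock, it is taken once.
 */
static void double_lock_ordered(struct lock_model *a, struct lock_model *b)
{
	struct lock_model *first = first_in_order(a, b);
	struct lock_model *second = (first == a) ? b : a;

	pthread_mutex_lock(&first->mutex);
	if (second != first)
		pthread_mutex_lock(&second->mutex);
}

/* Release both locks; unlock order does not affect correctness. */
static void double_unlock_ordered(struct lock_model *a, struct lock_model *b)
{
	pthread_mutex_unlock(&a->mutex);
	if (b != a)
		pthread_mutex_unlock(&b->mutex);
}
```

A caller that already holds one lock and only then learns about the second cannot use double_lock_ordered(): if the second lock sorts before the first, taking it would violate the order.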
This patch introduces an interface for handling events in delayed work.
It consists of three parts:
1) Delayed work that calls the event callback. The work starts with
no locks held, so it can take the locks of both devices in the correct
order;
2) A completion to notify the task that the delayed work is done;
3) task_work to let the task wait for the completion at
a point where the task no longer holds nd_lock.
Here is an example of what happens on module loading:
  [Task]                          [Work]
  insmod slave_netdev_drv.ko
  enter the kernel
  init_module()
  ...
  ...
  lock_netdev()
  call_netdevice_notifier()
    schedule_delayed_event()
  unlock_netdev()
                                  delayed_event_work()
                                    double_lock_netdev(dev1, &nd_lock1, dev2, &nd_lock2)
                                    nd_lock_transfer_devices(&nd_lock1, &nd_lock2)
                                    netdev_master_upper_dev_link(dev1, dev2)
                                    double_unlock_netdev(nd_lock1, nd_lock2)
                                    complete()
  wait_for_delayed_event_work()
    wait_for_completion()
  exit to userspace
As can be seen, using task work preserves the user-visible behavior here:
we return from the syscall to userspace only after the delayed work has
completed and all events are handled. This is why the task work is needed.
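The flow above can be modelled in user space. This is only a sketch under stated assumptions: a pthread stands in for the workqueue, a mutex plus condition variable stands in for the kernel's struct completion, and an explicit event_wait() call stands in for the task_work callback that runs before returning to userspace. All names (completion_model, event_model, event_schedule(), event_wait(), mark_linked()) are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical counterpart of struct event_info from the patch. */
struct event_model {
	pthread_t worker;
	struct completion_model comp;
	void (*func)(void *arg);
	void *arg;
};

static void completion_signal(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Runs in the worker thread, where none of the submitter's locks are
 * held, so the callback is free to take locks in any order it needs.
 */
static void *event_worker(void *data)
{
	struct event_model *ev = data;

	ev->func(ev->arg);
	completion_signal(&ev->comp);
	return NULL;
}

/* Models schedule_delayed_event(): hand the callback to another context.
 * Returns 0 on success, or the pthread_create() error number.
 */
static int event_schedule(struct event_model *ev, void (*func)(void *),
			  void *arg)
{
	assert(ev && func);
	ev->func = func;
	ev->arg = arg;
	ev->comp.done = false;
	pthread_mutex_init(&ev->comp.lock, NULL);
	pthread_cond_init(&ev->comp.cond, NULL);
	return pthread_create(&ev->worker, NULL, event_worker, ev);
}

/* Models the task_work step: called by the submitter after it has dropped
 * its own locks, and before it "returns to userspace".
 */
static void event_wait(struct event_model *ev)
{
	completion_wait(&ev->comp);
	pthread_join(ev->worker, NULL);
}

/* Example callback: records that the "link" step has run. */
static void mark_linked(void *arg)
{
	*(int *)arg = 1;
}
```

Calling event_schedule(&ev, mark_linked, &linked) and then event_wait(&ev) guarantees the callback's effect is visible once event_wait() returns, which mirrors the guarantee the commit message describes for the syscall caller.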
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
include/linux/netdevice.h | 2 +
net/core/dev.c | 95 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 97 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e9052e808a4..83b675ec2b0a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2991,6 +2991,8 @@ netdev_notifier_info_to_extack(const struct netdev_notifier_info *info)
int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
int call_netdevice_notifiers_info(unsigned long val,
struct netdev_notifier_info *info);
+int schedule_delayed_event(struct net_device *dev,
+ void (*func)(struct net_device *dev));
#define for_each_netdev(net, d) \
list_for_each_entry(d, &(net)->dev_base_head, dev_list)
diff --git a/net/core/dev.c b/net/core/dev.c
index e6809a80644e..1c447446215d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -154,6 +154,7 @@
#include <linux/pm_runtime.h>
#include <linux/prandom.h>
#include <linux/once_lite.h>
+#include <linux/task_work.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>
@@ -2088,6 +2089,100 @@ static int call_netdevice_notifiers_mtu(unsigned long val,
return call_netdevice_notifiers_info(val, &info.info);
}
+struct event_info {
+ struct work_struct work;
+ struct net_device *dev;
+ netdevice_tracker dev_tracker;
+ void (*func)(struct net_device *slave_dev);
+
+ struct callback_head task_work;
+ struct completion comp;
+ refcount_t usage;
+};
+
+static void put_delayed_reg_info(struct event_info *info)
+{
+ if (refcount_dec_and_test(&info->usage))
+ kfree(info);
+}
+
+static void delayed_event_work(struct work_struct *work)
+{
+ struct event_info *info;
+ struct net_device *dev;
+
+ info = container_of(work, struct event_info, work);
+ dev = info->dev;
+
+ info->func(dev);
+
+ /* Not needed to own device during all @info life.
+ * Put device right after callback is handled,
+ * since a task submitted this work may wait for
+ * @dev counter.
+ */
+ netdev_put(dev, &info->dev_tracker);
+ info->dev = NULL;
+
+ complete(&info->comp);
+ put_delayed_reg_info(info);
+}
+
+static void wait_for_delayed_event_work(struct callback_head *task_work)
+{
+ struct event_info *info;
+
+ info = container_of(task_work, struct event_info, task_work);
+ wait_for_completion(&info->comp);
+
+ put_delayed_reg_info(info);
+}
+
+static struct event_info *alloc_delayed_event_info(struct net_device *dev,
+ void (*func)(struct net_device *dev))
+{
+ struct event_info *info;
+
+ info = kmalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return NULL;
+
+ INIT_WORK(&info->work, delayed_event_work);
+ init_task_work(&info->task_work, wait_for_delayed_event_work);
+ init_completion(&info->comp);
+ refcount_set(&info->usage, 1);
+ info->func = func;
+ info->dev = dev;
+ netdev_hold(dev, &info->dev_tracker, GFP_KERNEL);
+
+ return info;
+}
+
+int schedule_delayed_event(struct net_device *dev,
+ void (*func)(struct net_device *dev))
+{
+ struct event_info *info;
+
+ info = alloc_delayed_event_info(dev, func);
+ if (!info)
+ return NOTIFY_DONE;
+
+ /* In case of the notifier is called from regular task,
+ * make the task to wait for registration is completed
+ * before task is returned to userspace. E.g., a syscall
+ * caller will have failover already connected after
+ * he loaded slave device driver.
+ */
+ if (!(current->flags & PF_KTHREAD)) {
+ if (!task_work_add(current, &info->task_work, TWA_RESUME))
+ refcount_inc(&info->usage);
+ }
+
+ schedule_work(&info->work);
+ return NOTIFY_OK;
+}
+EXPORT_SYMBOL_GPL(schedule_delayed_event);
+
#ifdef CONFIG_NET_INGRESS
static DEFINE_STATIC_KEY_FALSE(ingress_needed_key);
* [PATCH NET-PREV 38/51] failover: Link master and slave under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (36 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 37/51] net: Introduce delayed event work Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 39/51] netvsc: Make joined device to share master's nd_lock Kirill Tkhai
` (14 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We don't want to do this in failover_event(), since in the
future we want to call netdevice notifiers with nd_lock
already held.
Also see the comments in the patch introducing schedule_delayed_event().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/failover.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/net/core/failover.c b/net/core/failover.c
index 2a140b3ea669..83be0d0ab99a 100644
--- a/net/core/failover.c
+++ b/net/core/failover.c
@@ -46,6 +46,7 @@ static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
static int failover_slave_register(struct net_device *slave_dev)
{
struct netdev_lag_upper_info lag_upper_info;
+ struct nd_lock *nd_lock, *nd_lock2;
struct net_device *failover_dev;
struct failover_ops *fops;
int err;
@@ -72,8 +73,14 @@ static int failover_slave_register(struct net_device *slave_dev)
}
lag_upper_info.tx_type = NETDEV_LAG_TX_TYPE_ACTIVEBACKUP;
+
+ double_lock_netdev(slave_dev, &nd_lock, failover_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+
err = netdev_master_upper_dev_link(slave_dev, failover_dev, NULL,
&lag_upper_info, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
+
if (err) {
netdev_err(slave_dev, "can not set failover device %s (err = %d)\n",
failover_dev->name, err);
@@ -182,6 +189,18 @@ static int failover_slave_name_change(struct net_device *slave_dev)
return NOTIFY_DONE;
}
+static void call_failover_slave_register(struct net_device *dev)
+{
+ rtnl_lock();
+ if (dev->reg_state == NETREG_REGISTERED) {
+ failover_slave_register(dev);
+ failover_slave_link_change(dev);
+ failover_slave_name_change(dev);
+
+ }
+ rtnl_unlock();
+}
+
static int
failover_event(struct notifier_block *this, unsigned long event, void *ptr)
{
@@ -193,7 +212,10 @@ failover_event(struct notifier_block *this, unsigned long event, void *ptr)
switch (event) {
case NETDEV_REGISTER:
- return failover_slave_register(event_dev);
+ if (netdev_is_rx_handler_busy(event_dev))
+ return NOTIFY_DONE;
+ return schedule_delayed_event(event_dev,
+ call_failover_slave_register);
case NETDEV_UNREGISTER:
return failover_slave_unregister(event_dev);
case NETDEV_UP:
* [PATCH NET-PREV 39/51] netvsc: Make joined device to share master's nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (37 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 38/51] failover: Link master and slave under nd_lock Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 40/51] openvswitch: Make ports share nd_lock of master device Kirill Tkhai
` (13 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We don't want to do that from netvsc_netdev_event(), since
we want netdevice notifiers to be called under nd_lock
in the future.
Also see the comments in the patch introducing schedule_delayed_event().
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/hyperv/netvsc_drv.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 44142245343d..be8038e6393f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2192,6 +2192,7 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
struct net_device *ndev, int context)
{
struct net_device_context *ndev_ctx = netdev_priv(ndev);
+ struct nd_lock *nd_lock, *nd_lock2;
int ret;
ret = netdev_rx_handler_register(vf_netdev,
@@ -2203,8 +2204,12 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
goto rx_handler_failed;
}
+ double_lock_netdev(ndev, &nd_lock, vf_netdev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+
ret = netdev_master_upper_dev_link(vf_netdev, ndev,
NULL, NULL, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
if (ret != 0) {
netdev_err(vf_netdev,
"can not set master device %s (err = %d)\n",
@@ -2797,6 +2802,20 @@ static struct hv_driver netvsc_drv = {
},
};
+static void call_netvsc_register(struct net_device *dev)
+{
+ unsigned long event;
+
+ rtnl_lock();
+ netvsc_prepare_bonding(dev);
+ netvsc_register_vf(dev, VF_REG_IN_NOTIFIER);
+ event = NETDEV_GOING_DOWN;
+ if (netif_running(dev))
+ event = NETDEV_CHANGE;
+ netvsc_vf_changed(dev, event);
+ rtnl_unlock();
+}
+
/*
* On Hyper-V, every VF interface is matched with a corresponding
* synthetic interface. The synthetic interface is presented first
@@ -2814,10 +2833,10 @@ static int netvsc_netdev_event(struct notifier_block *this,
return NOTIFY_DONE;
switch (event) {
- case NETDEV_POST_INIT:
- return netvsc_prepare_bonding(event_dev);
case NETDEV_REGISTER:
- return netvsc_register_vf(event_dev, VF_REG_IN_NOTIFIER);
+ return schedule_delayed_event(event_dev,
+ call_netvsc_register);
+ return NOTIFY_DONE;
case NETDEV_UNREGISTER:
return netvsc_unregister_vf(event_dev);
case NETDEV_UP:
* [PATCH NET-PREV 40/51] openvswitch: Make ports share nd_lock of master device
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (38 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 39/51] netvsc: Make joined device to share master's nd_lock Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:42 ` [PATCH NET-PREV 41/51] bridge: Make port to have the same nd_lock as bridge Kirill Tkhai
` (12 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/openvswitch/vport-netdev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 91a11067e458..e629fc3c1442 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -75,6 +75,7 @@ static struct net_device *get_dpdev(const struct datapath *dp)
struct vport *ovs_netdev_link(struct vport *vport, const char *name)
{
+ struct nd_lock *nd_lock, *nd_lock2;
int err;
vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), name);
@@ -99,9 +100,14 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name)
}
rtnl_lock();
+ double_lock_netdev(vport->dev, &nd_lock, get_dpdev(vport->dp), &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
+
err = netdev_master_upper_dev_link(vport->dev,
get_dpdev(vport->dp),
NULL, NULL, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
+
if (err)
goto error_unlock;
* [PATCH NET-PREV 41/51] bridge: Make port to have the same nd_lock as bridge
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (39 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 40/51] openvswitch: Make ports share nd_lock of master device Kirill Tkhai
@ 2025-03-22 14:42 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 42/51] bond: Make master and slave relate to the same nd_lock Kirill Tkhai
` (11 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:42 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/bridge/br_ioctl.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index f213ed108361..b4b0cc6ac08b 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -85,6 +85,7 @@ static int get_fdb_entries(struct net_bridge *br, void __user *userbuf,
static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
{
struct net *net = dev_net(br->dev);
+ struct nd_lock *nd_lock, *nd_lock2;
struct net_device *dev;
int ret;
@@ -95,9 +96,12 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
if (dev == NULL)
return -EINVAL;
- if (isadd)
+ if (isadd) {
+ double_lock_netdev(br->dev, &nd_lock, dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
ret = br_add_if(br, dev, NULL);
- else
+ double_unlock_netdev(nd_lock, nd_lock2);
+ } else
ret = br_del_if(br, dev);
return ret;
* [PATCH NET-PREV 42/51] bond: Make master and slave relate to the same nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (40 preceding siblings ...)
2025-03-22 14:42 ` [PATCH NET-PREV 41/51] bridge: Make port to have the same nd_lock as bridge Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 43/51] net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached Kirill Tkhai
` (10 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/bonding/bond_main.c | 4 ++++
drivers/net/bonding/bond_options.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 96f5470a5f55..1140e01f72b8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4490,6 +4490,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
struct ifbond __user *u_binfo = NULL;
struct ifslave k_sinfo;
struct ifslave __user *u_sinfo = NULL;
+ struct nd_lock *nd_lock, *nd_lock2;
struct bond_opt_value newval;
struct net *net;
int res = 0;
@@ -4538,7 +4539,10 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
switch (cmd) {
case SIOCBONDENSLAVE:
+ double_lock_netdev(bond_dev, &nd_lock, slave_dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
res = bond_enslave(bond_dev, slave_dev, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
break;
case SIOCBONDRELEASE:
res = bond_release(bond_dev, slave_dev);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 95d59a18c022..a3ebb8d6c529 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1605,6 +1605,7 @@ static int bond_option_slaves_set(struct bonding *bond,
const struct bond_opt_value *newval)
{
char command[IFNAMSIZ + 1] = { 0, };
+ struct nd_lock *nd_lock, *nd_lock2;
struct net_device *dev;
char *ifname;
int ret;
@@ -1627,7 +1628,10 @@ static int bond_option_slaves_set(struct bonding *bond,
switch (command[0]) {
case '+':
slave_dbg(bond->dev, dev, "Enslaving interface\n");
+ double_lock_netdev(bond->dev, &nd_lock, dev, &nd_lock2);
+ nd_lock_transfer_devices(&nd_lock, &nd_lock2);
ret = bond_enslave(bond->dev, dev, NULL);
+ double_unlock_netdev(nd_lock, nd_lock2);
break;
case '-':
* [PATCH NET-PREV 43/51] net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (41 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 42/51] bond: Make master and slave relate to the same nd_lock Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 44/51] net: Call dellink with nd_lock is held Kirill Tkhai
` (9 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
... or with devices that do not share the same nd_lock,
since by this point all callers have been converted to follow this rule.
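The rule being enforced can be stated as a small predicate. The sketch below is a user-space model, not the kernel code: held_lock_model, dev_model and link_precheck() are hypothetical names, a plain boolean replaces mutex_is_locked(), and the NULL check is an addition of this sketch that the patch itself does not perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nd_lock: tracks only "is it held". */
struct held_lock_model {
	bool held;
};

/* Hypothetical stand-in for struct net_device: tracks only its lock. */
struct dev_model {
	struct held_lock_model *lock;
};

/* Linking is allowed only when both devices point at the same lock and
 * that lock is currently held. Otherwise fail with -EXDEV, the error
 * code the patch uses for this case.
 */
static int link_precheck(const struct dev_model *dev,
			 const struct dev_model *upper)
{
	const struct held_lock_model *lock;

	assert(dev && upper);
	lock = upper->lock;
	if (!lock || !lock->held || lock != dev->lock)
		return -EXDEV;
	return 0;
}
```

Two devices attached to different locks fail the check even when both locks are held, which is why callers transfer the devices onto one lock before linking.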
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1c447446215d..55df8157bca9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8126,6 +8126,12 @@ int netdev_master_upper_dev_link(struct net_device *dev,
.flags = NESTED_SYNC_IMM | NESTED_SYNC_TODO,
.data = NULL,
};
+ struct nd_lock *nd_lock;
+
+ nd_lock = rcu_dereference_protected(upper_dev->nd_lock, true);
+ if (WARN_ON(!mutex_is_locked(&nd_lock->mutex) ||
+ nd_lock != rcu_dereference_protected(dev->nd_lock, true)))
+ return -EXDEV;
return __netdev_upper_dev_link(dev, upper_dev, true,
upper_priv, upper_info, &priv, extack);
* [PATCH NET-PREV 44/51] net: Call dellink with nd_lock is held
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (42 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 43/51] net: Now check nobody calls netdev_master_upper_dev_link() without nd_lock attached Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 45/51] t7xx: Use __unregister_netdevice() Kirill Tkhai
` (8 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
After the previous patches, all linked devices share the same lock.
Here we take nd_lock around dellink, to start making calls to
unregister_netdevice() with nd_lock held.
Another benefit is that many netdev_upper_dev_unlink() calls are now
made with nd_lock held, though not all of them yet.
Note that ->dellink calls made from netdevice notifiers are not
covered yet.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 3 +++
net/core/rtnetlink.c | 10 ++++++++++
2 files changed, 13 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 55df8157bca9..f0f93b5a2819 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12399,6 +12399,7 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
* Do this across as many network namespaces as possible to
* improve batching efficiency.
*/
+ struct nd_lock *nd_lock;
struct net_device *dev;
struct net *net;
LIST_HEAD(dev_kill_list);
@@ -12411,10 +12412,12 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
list_for_each_entry(net, net_list, exit_list) {
for_each_netdev_reverse(net, dev) {
+ lock_netdev(dev, &nd_lock);
if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
else
unregister_netdevice_queue(dev, &dev_kill_list);
+ unlock_netdev(nd_lock);
}
}
unregister_netdevice_many(&dev_kill_list);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 67b4b0610d14..fdc06f0ecf31 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -449,12 +449,15 @@ EXPORT_SYMBOL_GPL(rtnl_link_register);
static void __rtnl_kill_links(struct net *net, struct rtnl_link_ops *ops)
{
+ struct nd_lock *nd_lock;
struct net_device *dev;
LIST_HEAD(list_kill);
for_each_netdev(net, dev) {
+ lock_netdev(dev, &nd_lock);
if (dev->rtnl_link_ops == ops)
ops->dellink(dev, &list_kill);
+ unlock_netdev(nd_lock);
}
unregister_netdevice_many(&list_kill);
}
@@ -3260,9 +3263,12 @@ static int rtnl_group_dellink(const struct net *net, int group)
for_each_netdev_safe(net, dev, aux) {
if (dev->group == group) {
const struct rtnl_link_ops *ops;
+ struct nd_lock *nd_lock;
ops = dev->rtnl_link_ops;
+ lock_netdev(dev, &nd_lock);
ops->dellink(dev, &list_kill);
+ unlock_netdev(nd_lock);
}
}
unregister_netdevice_many(&list_kill);
@@ -3273,13 +3279,17 @@ static int rtnl_group_dellink(const struct net *net, int group)
int rtnl_delete_link(struct net_device *dev, u32 portid, const struct nlmsghdr *nlh)
{
const struct rtnl_link_ops *ops;
+ struct nd_lock *nd_lock;
LIST_HEAD(list_kill);
ops = dev->rtnl_link_ops;
if (!ops || !ops->dellink)
return -EOPNOTSUPP;
+ lock_netdev(dev, &nd_lock);
ops->dellink(dev, &list_kill);
+ unlock_netdev(nd_lock);
+
unregister_netdevice_many_notify(&list_kill, portid, nlh);
return 0;
* [PATCH NET-PREV 45/51] t7xx: Use __unregister_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (43 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 44/51] net: Call dellink with nd_lock is held Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 46/51] 6lowpan: " Kirill Tkhai
` (7 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
->dellink is going to be called with nd_lock held.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/wwan/t7xx/t7xx_netdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wwan/t7xx/t7xx_netdev.c b/drivers/net/wwan/t7xx/t7xx_netdev.c
index 3bde38147930..d3da299a59ff 100644
--- a/drivers/net/wwan/t7xx/t7xx_netdev.c
+++ b/drivers/net/wwan/t7xx/t7xx_netdev.c
@@ -324,7 +324,7 @@ static void t7xx_ccmni_wwan_dellink(void *ctxt, struct net_device *dev, struct l
if (WARN_ON(ctlb->ccmni_inst[if_id] != ccmni))
return;
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
}
static const struct wwan_ops ccmni_wwan_ops = {
* [PATCH NET-PREV 46/51] 6lowpan: Use __unregister_netdevice()
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (44 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 45/51] t7xx: Use __unregister_netdevice() Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 47/51] netvsc: Call dev_change_net_namespace() under nd_lock Kirill Tkhai
` (6 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
->dellink is going to be called with nd_lock held.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/6lowpan/core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index b5cbf85b291c..bd77076b125e 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -71,15 +71,19 @@ EXPORT_SYMBOL(lowpan_register_netdev);
void lowpan_unregister_netdevice(struct net_device *dev)
{
- unregister_netdevice(dev);
+ __unregister_netdevice(dev);
lowpan_dev_debugfs_exit(dev);
}
EXPORT_SYMBOL(lowpan_unregister_netdevice);
void lowpan_unregister_netdev(struct net_device *dev)
{
+ struct nd_lock *nd_lock;
+
rtnl_lock();
+ lock_netdev(dev, &nd_lock);
lowpan_unregister_netdevice(dev);
+ unlock_netdev(nd_lock);
rtnl_unlock();
}
EXPORT_SYMBOL(lowpan_unregister_netdev);
* [PATCH NET-PREV 47/51] netvsc: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (45 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 46/51] 6lowpan: " Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 48/51] default_device: " Kirill Tkhai
` (5 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide an "nd_lock is locked" context during
NETDEV_REGISTER (and later NETDEV_UNREGISTER)
events. When called from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
drivers/net/hyperv/netvsc_drv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index be8038e6393f..cc9f07f8d499 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2365,6 +2365,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev, int context)
struct netvsc_device *netvsc_dev;
struct bpf_prog *prog;
struct net_device *ndev;
+ struct nd_lock *nd_lock;
int ret;
if (vf_netdev->addr_len != ETH_ALEN)
@@ -2384,8 +2385,10 @@ static int netvsc_register_vf(struct net_device *vf_netdev, int context)
* done again in that context.
*/
if (!net_eq(dev_net(ndev), dev_net(vf_netdev))) {
+ lock_netdev(vf_netdev, &nd_lock);
ret = dev_change_net_namespace(vf_netdev,
dev_net(ndev), "eth%d");
+ unlock_netdev(nd_lock);
if (ret)
netdev_err(vf_netdev,
"could not move to same namespace as %s: %d\n",
* [PATCH NET-PREV 48/51] default_device: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (46 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 47/51] netvsc: Call dev_change_net_namespace() under nd_lock Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:43 ` [PATCH NET-PREV 49/51] ieee802154: " Kirill Tkhai
` (4 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide an "nd_lock is locked" context during
NETDEV_REGISTER (and later NETDEV_UNREGISTER)
events. When called from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index f0f93b5a2819..c477b39d08b9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -12357,6 +12357,7 @@ static void __net_exit default_device_exit_net(struct net *net)
{
struct netdev_name_node *name_node, *tmp;
struct net_device *dev, *aux;
+ struct nd_lock *nd_lock;
/*
* Push all migratable network devices back to the
* initial network namespace
@@ -12383,7 +12384,9 @@ static void __net_exit default_device_exit_net(struct net *net)
if (netdev_name_in_use(&init_net, name_node->name))
__netdev_name_node_alt_destroy(name_node);
+ lock_netdev(dev, &nd_lock);
err = dev_change_net_namespace(dev, &init_net, fb_name);
+ unlock_netdev(nd_lock);
if (err) {
pr_emerg("%s: failed to move %s to init_net: %d\n",
__func__, dev->name, err);
* [PATCH NET-PREV 49/51] ieee802154: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (47 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 48/51] default_device: " Kirill Tkhai
@ 2025-03-22 14:43 ` Kirill Tkhai
2025-03-22 14:44 ` [PATCH NET-PREV 50/51] cfg80211: " Kirill Tkhai
` (3 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:43 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want to provide an "nd_lock is locked" context during
NETDEV_REGISTER (and later NETDEV_UNREGISTER)
events. When called from __register_netdevice(),
notifiers are already in that context, and we do the
same for dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/ieee802154/core.c | 2 ++
net/ieee802154/nl802154.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
index 60e8fff1347e..8a85a57bf042 100644
--- a/net/ieee802154/core.c
+++ b/net/ieee802154/core.c
@@ -349,10 +349,12 @@ static void __net_exit cfg802154_pernet_exit(struct net *net)
struct cfg802154_registered_device *rdev;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
list_for_each_entry(rdev, &cfg802154_rdev_list, list) {
if (net_eq(wpan_phy_net(&rdev->wpan_phy), net))
WARN_ON(cfg802154_switch_netns(rdev, &init_net));
}
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
}
diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index a512f2a647e8..e8f21de679b7 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -2855,6 +2855,7 @@ static const struct genl_ops nl802154_ops[] = {
.doit = nl802154_wpan_phy_netns,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
+ NL802154_FLAG_NEED_FALLBACK_ND_LOCK |
NL802154_FLAG_NEED_RTNL,
},
{
* [PATCH NET-PREV 50/51] cfg80211: Call dev_change_net_namespace() under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (48 preceding siblings ...)
2025-03-22 14:43 ` [PATCH NET-PREV 49/51] ieee802154: " Kirill Tkhai
@ 2025-03-22 14:44 ` Kirill Tkhai
2025-03-22 14:44 ` [PATCH NET-PREV 51/51] net: Make all NETDEV_REGISTER events to be called " Kirill Tkhai
` (2 subsequent siblings)
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:44 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
We want NETDEV_REGISTER (and, later, NETDEV_UNREGISTER)
notifiers to always run with nd_lock held. Notifiers
called from __register_netdevice() already run in that
context, so do the same for the callers of
dev_change_net_namespace() here.
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/wireless/core.c | 2 ++
net/wireless/nl80211.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 8ba0ada86678..c661bba9fc7b 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1605,10 +1605,12 @@ static void __net_exit cfg80211_pernet_exit(struct net *net)
struct cfg80211_registered_device *rdev;
rtnl_lock();
+ mutex_lock(&fallback_nd_lock.mutex);
for_each_rdev(rdev) {
if (net_eq(wiphy_net(&rdev->wiphy), net))
WARN_ON(cfg80211_switch_netns(rdev, &init_net));
}
+ mutex_unlock(&fallback_nd_lock.mutex);
rtnl_unlock();
}
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 0fd66f75eace..f8bd7c72bd3e 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -17136,6 +17136,7 @@ static const struct genl_small_ops nl80211_small_ops[] = {
.doit = nl80211_wiphy_netns,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = IFLAGS(NL80211_FLAG_NEED_WIPHY |
+ NL80211_FLAG_NEED_FALLBACK_ND_LOCK |
NL80211_FLAG_NEED_RTNL |
NL80211_FLAG_NO_WIPHY_MTX),
},
* [PATCH NET-PREV 51/51] net: Make all NETDEV_REGISTER events to be called under nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (49 preceding siblings ...)
2025-03-22 14:44 ` [PATCH NET-PREV 50/51] cfg80211: " Kirill Tkhai
@ 2025-03-22 14:44 ` Kirill Tkhai
2025-03-24 2:51 ` [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Stanislav Fomichev
2025-03-25 11:15 ` Jakub Kicinski
52 siblings, 0 replies; 54+ messages in thread
From: Kirill Tkhai @ 2025-03-22 14:44 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tkhai
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
net/core/dev.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c477b39d08b9..03c1bfa35309 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1737,13 +1737,19 @@ static void call_netdevice_unregister_notifiers(struct notifier_block *nb,
}
static int call_netdevice_register_net_notifiers(struct notifier_block *nb,
- struct net *net)
+ struct net *net,
+ bool locked)
{
+ struct nd_lock *nd_lock;
struct net_device *dev;
int err;
for_each_netdev(net, dev) {
+ if (!locked)
+ lock_netdev(dev, &nd_lock);
err = call_netdevice_register_notifiers(nb, dev);
+ if (!locked)
+ unlock_netdev(nd_lock);
if (err)
goto rollback;
}
@@ -1794,7 +1800,7 @@ int register_netdevice_notifier(struct notifier_block *nb)
if (dev_boot_phase)
goto unlock;
for_each_net(net) {
- err = call_netdevice_register_net_notifiers(nb, net);
+ err = call_netdevice_register_net_notifiers(nb, net, false);
if (err)
goto rollback;
}
@@ -1851,7 +1857,8 @@ EXPORT_SYMBOL(unregister_netdevice_notifier);
static int __register_netdevice_notifier_net(struct net *net,
struct notifier_block *nb,
- bool ignore_call_fail)
+ bool ignore_call_fail,
+ bool locked)
{
int err;
@@ -1861,7 +1868,7 @@ static int __register_netdevice_notifier_net(struct net *net,
if (dev_boot_phase)
return 0;
- err = call_netdevice_register_net_notifiers(nb, net);
+ err = call_netdevice_register_net_notifiers(nb, net, locked);
if (err && !ignore_call_fail)
goto chain_unregister;
@@ -1905,7 +1912,7 @@ int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb)
int err;
rtnl_lock();
- err = __register_netdevice_notifier_net(net, nb, false);
+ err = __register_netdevice_notifier_net(net, nb, false, false);
rtnl_unlock();
return err;
}
@@ -1944,17 +1951,20 @@ static void __move_netdevice_notifier_net(struct net *src_net,
struct notifier_block *nb)
{
__unregister_netdevice_notifier_net(src_net, nb);
- __register_netdevice_notifier_net(dst_net, nb, true);
+ __register_netdevice_notifier_net(dst_net, nb, true, true);
}
int register_netdevice_notifier_dev_net(struct net_device *dev,
struct notifier_block *nb,
struct netdev_net_notifier *nn)
{
+ struct nd_lock *nd_lock;
int err;
rtnl_lock();
- err = __register_netdevice_notifier_net(dev_net(dev), nb, false);
+ lock_netdev(dev, &nd_lock);
+ err = __register_netdevice_notifier_net(dev_net(dev), nb, false, true);
+ unlock_netdev(nd_lock);
if (!err) {
nn->nb = nb;
list_add(&nn->list, &dev->net_notifier_list);
* Re: [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (50 preceding siblings ...)
2025-03-22 14:44 ` [PATCH NET-PREV 51/51] net: Make all NETDEV_REGISTER events to be called " Kirill Tkhai
@ 2025-03-24 2:51 ` Stanislav Fomichev
2025-03-25 11:15 ` Jakub Kicinski
52 siblings, 0 replies; 54+ messages in thread
From: Stanislav Fomichev @ 2025-03-24 2:51 UTC (permalink / raw)
To: Kirill Tkhai; +Cc: netdev, linux-kernel
On 03/22, Kirill Tkhai wrote:
> Hi,
>
> this patchset shows a way to completely remove the rtnl lock, and that
> this can be done iteratively without any shocks. It implements
> the architecture of a new fine-grained locking scheme to use instead
> of rtnl, and iteratively converts many drivers to use it.
>
> I mostly wrote this a few years ago; more recently
> I rebased the patches on a kernel around 6.11 (there should not
> be many conflicts on that version). Currently I have no plans
> to complete this.
>
> If anyone wants to continue, they can take this patchset
> and do the work.
I only skimmed through, but a high-level comment: we are slowly migrating
to the netdev instance/ops lock:
https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=cc34acd577f1a6ed805106bfcc9a262837dbd0da
Instead of introducing another nd_lock, it should be possible (in
theory) to convert existing upper/lower devices to maintain locking
hierarchy and grab upper->lower during netdev_lock_ops().
There are a few nasty places where we lock lower->upper->lower, like
this, that need careful consideration:
https://lore.kernel.org/netdev/20250313100657.2287455-1-sdf@fomichev.me/
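
To make the upper->lower ordering idea concrete, here is a minimal user-space
sketch. It is an assumption-laden model, not how netdev_lock_ops() is
implemented: struct dev, lock_stack() and unlock_stack() are hypothetical
names, each device has at most one lower device, and lock_seq exists only so
the acquisition order can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stacked device: e.g. a vlan (upper) on top of an ethernet port (lower). */
struct dev {
	pthread_mutex_t lock;
	struct dev *lower;	/* next device down the stack, or NULL */
	int lock_seq;		/* position in the acquisition order, 0 = unlocked */
};

/*
 * Lock the whole stack, always walking upper -> lower so that every path
 * takes the locks in the same order. Returns the number of locks taken.
 */
static int lock_stack(struct dev *upper)
{
	int seq = 0;

	for (struct dev *d = upper; d; d = d->lower) {
		pthread_mutex_lock(&d->lock);
		d->lock_seq = ++seq;
	}
	return seq;
}

/* Release in reverse order: the lowest device first, the upper one last. */
static void unlock_stack(struct dev *upper)
{
	if (!upper)
		return;
	unlock_stack(upper->lower);
	upper->lock_seq = 0;
	pthread_mutex_unlock(&upper->lock);
}
```

A path that already holds a lower device's lock and then calls lock_stack()
on its upper device would deadlock in this model when the walk reaches the
lower device again. That is the lower->upper->lower pattern mentioned above.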
* Re: [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock
2025-03-22 14:37 [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Kirill Tkhai
` (51 preceding siblings ...)
2025-03-24 2:51 ` [PATCH NET-PREV 00/51] Kill rtnl_lock using fine-grained nd_lock Stanislav Fomichev
@ 2025-03-25 11:15 ` Jakub Kicinski
52 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2025-03-25 11:15 UTC (permalink / raw)
To: Kirill Tkhai; +Cc: netdev, linux-kernel
On Sat, 22 Mar 2025 17:37:41 +0300 Kirill Tkhai wrote:
> I mostly wrote this a few years ago; more recently
> I rebased the patches on a kernel around 6.11 (there should not
> be many conflicts on that version). Currently I have no plans
> to complete this.
>
> If anyone wants to continue, they can take this patchset
> and do the work.
Was there a pain point you were trying to address, or was rtnl_lock
just an interesting challenge? Paolo mentioned trying to convert veth
to instance locking, I guess he may need to reach for similar solutions.