netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL.
@ 2024-10-09 23:16 Kuniyuki Iwashima
  2024-10-09 23:16 ` [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs Kuniyuki Iwashima
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

This is a prep for the next series where we will push RTNL down to
rtnl_{new,del,set}link().

That means, for example, __rtnl_newlink() is always under RTNL, but
rtnl_newlink() has a non-RTNL section.

As a prerequisite for per-netns RTNL, we will move netns validation
(and RTNL-independent validations if possible) to that section.

rtnl_link_ops and rtnl_af_ops will be protected with SRCU not to
depend on RTNL.


Kuniyuki Iwashima (13):
  rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs.
  rtnetlink: Call validate_linkmsg() in do_setlink().
  rtnetlink: Factorise do_setlink() path from __rtnl_newlink().
  rtnetlink: Move simple validation from __rtnl_newlink() to
    rtnl_newlink().
  rtnetlink: Move rtnl_link_ops_get() and retry to rtnl_newlink().
  rtnetlink: Move ops->validate to rtnl_newlink().
  rtnetlink: Protect struct rtnl_link_ops with SRCU.
  rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink().
  rtnetlink: Fetch IFLA_LINK_NETNSID in rtnl_newlink().
  rtnetlink: Clean up rtnl_dellink().
  rtnetlink: Clean up rtnl_setlink().
  rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
  rtnetlink: Protect struct rtnl_af_ops with SRCU.

 include/net/rtnetlink.h |  10 +-
 net/core/rtnetlink.c    | 552 ++++++++++++++++++++++------------------
 2 files changed, 310 insertions(+), 252 deletions(-)

-- 
2.39.5 (Apple Git-154)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs.
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 12:36   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 02/13] rtnetlink: Call validate_linkmsg() in do_setlink() Kuniyuki Iwashima
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will move linkinfo to rtnl_newlink() and pass it down to other
functions.

Let's pack it into rtnl_newlink_tbs.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6d68247aea70..abc44ee018a0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3621,6 +3621,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 
 struct rtnl_newlink_tbs {
 	struct nlattr *tb[IFLA_MAX + 1];
+	struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
 	struct nlattr *attr[RTNL_MAX_TYPE + 1];
 	struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
 };
@@ -3629,7 +3630,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct rtnl_newlink_tbs *tbs,
 			  struct netlink_ext_ack *extack)
 {
-	struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
+	struct nlattr ** const linkinfo = tbs->linkinfo;
 	struct nlattr ** const tb = tbs->tb;
 	const struct rtnl_link_ops *m_ops;
 	struct net_device *master_dev;
@@ -3684,8 +3685,9 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 						  ifla_info_policy, NULL);
 		if (err < 0)
 			return err;
-	} else
-		memset(linkinfo, 0, sizeof(linkinfo));
+	} else {
+		memset(linkinfo, 0, sizeof(tbs->linkinfo));
+	}
 
 	if (linkinfo[IFLA_INFO_KIND]) {
 		nla_strscpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 02/13] rtnetlink: Call validate_linkmsg() in do_setlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
  2024-10-09 23:16 ` [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 12:44   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 03/13] rtnetlink: Factorise do_setlink() path from __rtnl_newlink() Kuniyuki Iwashima
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

There are 3 paths that finally call do_setlink(), and validate_linkmsg()
is called in each path.

  1. RTM_NEWLINK
    1-1. dev is found in __rtnl_newlink()
    1-2. dev isn't found, but IFLA_GROUP is specified in
          rtnl_group_changelink()
  2. RTM_SETLINK

The next patch factorises 1-1 to a separate function.

As a preparation, let's move validate_linkmsg() calls to do_setlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index abc44ee018a0..bb14ddf2901e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2854,6 +2854,10 @@ static int do_setlink(const struct sk_buff *skb,
 	char ifname[IFNAMSIZ];
 	int err;
 
+	err = validate_linkmsg(dev, tb, extack);
+	if (err < 0)
+		goto errout;
+
 	if (tb[IFLA_IFNAME])
 		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
 	else
@@ -3268,10 +3272,6 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto errout;
 	}
 
-	err = validate_linkmsg(dev, tb, extack);
-	if (err < 0)
-		goto errout;
-
 	err = do_setlink(skb, dev, ifm, extack, tb, 0);
 errout:
 	return err;
@@ -3515,9 +3515,6 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
 
 	for_each_netdev_safe(net, dev, aux) {
 		if (dev->group == group) {
-			err = validate_linkmsg(dev, tb, extack);
-			if (err < 0)
-				return err;
 			err = do_setlink(skb, dev, ifm, extack, tb, 0);
 			if (err < 0)
 				return err;
@@ -3743,10 +3740,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		if (nlh->nlmsg_flags & NLM_F_REPLACE)
 			return -EOPNOTSUPP;
 
-		err = validate_linkmsg(dev, tb, extack);
-		if (err < 0)
-			return err;
-
 		if (linkinfo[IFLA_INFO_DATA]) {
 			if (!ops || ops != dev->rtnl_link_ops ||
 			    !ops->changelink)
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 03/13] rtnetlink: Factorise do_setlink() path from __rtnl_newlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
  2024-10-09 23:16 ` [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs Kuniyuki Iwashima
  2024-10-09 23:16 ` [PATCH v1 net-next 02/13] rtnetlink: Call validate_linkmsg() in do_setlink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 12:39   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 04/13] rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink() Kuniyuki Iwashima
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

__rtnl_newlink() got too long to maintain.

For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even
when IFLA_INFO_SLAVE_DATA is not specified.

Let's factorise the single dev do_setlink() path to a separate function.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 142 ++++++++++++++++++++++---------------------
 1 file changed, 74 insertions(+), 68 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bb14ddf2901e..1d214c76011d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3504,6 +3504,78 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 }
 EXPORT_SYMBOL(rtnl_create_link);
 
+struct rtnl_newlink_tbs {
+	struct nlattr *tb[IFLA_MAX + 1];
+	struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
+	struct nlattr *attr[RTNL_MAX_TYPE + 1];
+	struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
+};
+
+static int rtnl_changelink(const struct sk_buff *skb, struct nlmsghdr *nlh,
+			   const struct rtnl_link_ops *ops,
+			   struct net_device *dev,
+			   struct rtnl_newlink_tbs *tbs,
+			   struct nlattr **data,
+			   struct netlink_ext_ack *extack)
+{
+	struct nlattr ** const linkinfo = tbs->linkinfo;
+	struct nlattr ** const tb = tbs->tb;
+	int status = 0;
+	int err;
+
+	if (nlh->nlmsg_flags & NLM_F_EXCL)
+		return -EEXIST;
+
+	if (nlh->nlmsg_flags & NLM_F_REPLACE)
+		return -EOPNOTSUPP;
+
+	if (linkinfo[IFLA_INFO_DATA]) {
+		if (!ops || ops != dev->rtnl_link_ops || !ops->changelink)
+			return -EOPNOTSUPP;
+
+		err = ops->changelink(dev, tb, data, extack);
+		if (err < 0)
+			return err;
+
+		status |= DO_SETLINK_NOTIFY;
+	}
+
+	if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
+		const struct rtnl_link_ops *m_ops = NULL;
+		struct nlattr **slave_data = NULL;
+		struct net_device *master_dev;
+
+		master_dev = netdev_master_upper_dev_get(dev);
+		if (master_dev)
+			m_ops = master_dev->rtnl_link_ops;
+
+		if (!m_ops || !m_ops->slave_changelink)
+			return -EOPNOTSUPP;
+
+		if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
+			return -EINVAL;
+
+		if (m_ops->slave_maxtype) {
+			err = nla_parse_nested_deprecated(tbs->slave_attr,
+							  m_ops->slave_maxtype,
+							  linkinfo[IFLA_INFO_SLAVE_DATA],
+							  m_ops->slave_policy, extack);
+			if (err < 0)
+				return err;
+
+			slave_data = tbs->slave_attr;
+		}
+
+		err = m_ops->slave_changelink(master_dev, dev, tb, slave_data, extack);
+		if (err < 0)
+			return err;
+
+		status |= DO_SETLINK_NOTIFY;
+	}
+
+	return do_setlink(skb, dev, nlmsg_data(nlh), extack, tb, status);
+}
+
 static int rtnl_group_changelink(const struct sk_buff *skb,
 		struct net *net, int group,
 		struct ifinfomsg *ifm,
@@ -3616,24 +3688,14 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 	goto out;
 }
 
-struct rtnl_newlink_tbs {
-	struct nlattr *tb[IFLA_MAX + 1];
-	struct nlattr *linkinfo[IFLA_INFO_MAX + 1];
-	struct nlattr *attr[RTNL_MAX_TYPE + 1];
-	struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
-};
-
 static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  struct rtnl_newlink_tbs *tbs,
 			  struct netlink_ext_ack *extack)
 {
 	struct nlattr ** const linkinfo = tbs->linkinfo;
 	struct nlattr ** const tb = tbs->tb;
-	const struct rtnl_link_ops *m_ops;
-	struct net_device *master_dev;
 	struct net *net = sock_net(skb->sk);
 	const struct rtnl_link_ops *ops;
-	struct nlattr **slave_data;
 	char kind[MODULE_NAME_LEN];
 	struct net_device *dev;
 	struct ifinfomsg *ifm;
@@ -3668,14 +3730,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		dev = NULL;
 	}
 
-	master_dev = NULL;
-	m_ops = NULL;
-	if (dev) {
-		master_dev = netdev_master_upper_dev_get(dev);
-		if (master_dev)
-			m_ops = master_dev->rtnl_link_ops;
-	}
-
 	if (tb[IFLA_LINKINFO]) {
 		err = nla_parse_nested_deprecated(linkinfo, IFLA_INFO_MAX,
 						  tb[IFLA_LINKINFO],
@@ -3714,56 +3768,8 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 	}
 
-	slave_data = NULL;
-	if (m_ops) {
-		if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
-			return -EINVAL;
-
-		if (m_ops->slave_maxtype &&
-		    linkinfo[IFLA_INFO_SLAVE_DATA]) {
-			err = nla_parse_nested_deprecated(tbs->slave_attr,
-							  m_ops->slave_maxtype,
-							  linkinfo[IFLA_INFO_SLAVE_DATA],
-							  m_ops->slave_policy,
-							  extack);
-			if (err < 0)
-				return err;
-			slave_data = tbs->slave_attr;
-		}
-	}
-
-	if (dev) {
-		int status = 0;
-
-		if (nlh->nlmsg_flags & NLM_F_EXCL)
-			return -EEXIST;
-		if (nlh->nlmsg_flags & NLM_F_REPLACE)
-			return -EOPNOTSUPP;
-
-		if (linkinfo[IFLA_INFO_DATA]) {
-			if (!ops || ops != dev->rtnl_link_ops ||
-			    !ops->changelink)
-				return -EOPNOTSUPP;
-
-			err = ops->changelink(dev, tb, data, extack);
-			if (err < 0)
-				return err;
-			status |= DO_SETLINK_NOTIFY;
-		}
-
-		if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
-			if (!m_ops || !m_ops->slave_changelink)
-				return -EOPNOTSUPP;
-
-			err = m_ops->slave_changelink(master_dev, dev, tb,
-						      slave_data, extack);
-			if (err < 0)
-				return err;
-			status |= DO_SETLINK_NOTIFY;
-		}
-
-		return do_setlink(skb, dev, ifm, extack, tb, status);
-	}
+	if (dev)
+		return rtnl_changelink(skb, nlh, ops, dev, tbs, data, extack);
 
 	if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
 		/* No dev found and NLM_F_CREATE not set. Requested dev does not exist,
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 04/13] rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 03/13] rtnetlink: Factorise do_setlink() path from __rtnl_newlink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 12:40   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 05/13] rtnetlink: Move rtnl_link_ops_get() and retry " Kuniyuki Iwashima
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will push RTNL down to rtnl_newlink().

Let's move RTNL-independent validation to rtnl_newlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1d214c76011d..3416f364db83 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3706,15 +3706,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 #ifdef CONFIG_MODULES
 replay:
 #endif
-	err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
-				     ifla_policy, extack);
-	if (err < 0)
-		return err;
-
-	err = rtnl_ensure_unique_netns(tb, extack, false);
-	if (err < 0)
-		return err;
-
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0) {
 		link_specified = true;
@@ -3730,16 +3721,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		dev = NULL;
 	}
 
-	if (tb[IFLA_LINKINFO]) {
-		err = nla_parse_nested_deprecated(linkinfo, IFLA_INFO_MAX,
-						  tb[IFLA_LINKINFO],
-						  ifla_info_policy, NULL);
-		if (err < 0)
-			return err;
-	} else {
-		memset(linkinfo, 0, sizeof(tbs->linkinfo));
-	}
-
 	if (linkinfo[IFLA_INFO_KIND]) {
 		nla_strscpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
 		ops = rtnl_link_ops_get(kind);
@@ -3808,6 +3789,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
+	struct nlattr **tb, **linkinfo;
 	struct rtnl_newlink_tbs *tbs;
 	int ret;
 
@@ -3815,7 +3797,30 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (!tbs)
 		return -ENOMEM;
 
+	tb = tbs->tb;
+	ret = nlmsg_parse_deprecated(nlh, sizeof(struct ifinfomsg), tb,
+				     IFLA_MAX, ifla_policy, extack);
+	if (ret < 0)
+		goto free;
+
+	ret = rtnl_ensure_unique_netns(tb, extack, false);
+	if (ret < 0)
+		goto free;
+
+	linkinfo = tbs->linkinfo;
+	if (tb[IFLA_LINKINFO]) {
+		ret = nla_parse_nested_deprecated(linkinfo, IFLA_INFO_MAX,
+						  tb[IFLA_LINKINFO],
+						  ifla_info_policy, NULL);
+		if (ret < 0)
+			goto free;
+	} else {
+		memset(linkinfo, 0, sizeof(tbs->linkinfo));
+	}
+
 	ret = __rtnl_newlink(skb, nlh, tbs, extack);
+
+free:
 	kfree(tbs);
 	return ret;
 }
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 05/13] rtnetlink: Move rtnl_link_ops_get() and retry to rtnl_newlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 04/13] rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-09 23:16 ` [PATCH v1 net-next 06/13] rtnetlink: Move ops->validate " Kuniyuki Iwashima
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Currently, if neither dev nor rtnl_link_ops is found in __rtnl_newlink(),
we release RTNL and redo the whole process after request_module(), which
complicates the logic.

The ops will be RTNL-independent later.

Let's move the ops lookup to rtnl_newlink() and do the retry earlier.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3416f364db83..fe36d584136f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3689,23 +3689,19 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 }
 
 static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
+			  const struct rtnl_link_ops *ops,
 			  struct rtnl_newlink_tbs *tbs,
 			  struct netlink_ext_ack *extack)
 {
 	struct nlattr ** const linkinfo = tbs->linkinfo;
 	struct nlattr ** const tb = tbs->tb;
 	struct net *net = sock_net(skb->sk);
-	const struct rtnl_link_ops *ops;
-	char kind[MODULE_NAME_LEN];
 	struct net_device *dev;
 	struct ifinfomsg *ifm;
 	struct nlattr **data;
 	bool link_specified;
 	int err;
 
-#ifdef CONFIG_MODULES
-replay:
-#endif
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0) {
 		link_specified = true;
@@ -3721,14 +3717,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		dev = NULL;
 	}
 
-	if (linkinfo[IFLA_INFO_KIND]) {
-		nla_strscpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
-		ops = rtnl_link_ops_get(kind);
-	} else {
-		kind[0] = '\0';
-		ops = NULL;
-	}
-
 	data = NULL;
 	if (ops) {
 		if (ops->maxtype > RTNL_MAX_TYPE)
@@ -3769,16 +3757,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EOPNOTSUPP;
 
 	if (!ops) {
-#ifdef CONFIG_MODULES
-		if (kind[0]) {
-			__rtnl_unlock();
-			request_module("rtnl-link-%s", kind);
-			rtnl_lock();
-			ops = rtnl_link_ops_get(kind);
-			if (ops)
-				goto replay;
-		}
-#endif
 		NL_SET_ERR_MSG(extack, "Unknown device type");
 		return -EOPNOTSUPP;
 	}
@@ -3789,6 +3767,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
+	const struct rtnl_link_ops *ops = NULL;
 	struct nlattr **tb, **linkinfo;
 	struct rtnl_newlink_tbs *tbs;
 	int ret;
@@ -3818,7 +3797,22 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		memset(linkinfo, 0, sizeof(tbs->linkinfo));
 	}
 
-	ret = __rtnl_newlink(skb, nlh, tbs, extack);
+	if (linkinfo[IFLA_INFO_KIND]) {
+		char kind[MODULE_NAME_LEN];
+
+		nla_strscpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
+		ops = rtnl_link_ops_get(kind);
+#ifdef CONFIG_MODULES
+		if (!ops) {
+			__rtnl_unlock();
+			request_module("rtnl-link-%s", kind);
+			rtnl_lock();
+			ops = rtnl_link_ops_get(kind);
+		}
+#endif
+	}
+
+	ret = __rtnl_newlink(skb, nlh, ops, tbs, extack);
 
 free:
 	kfree(tbs);
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 06/13] rtnetlink: Move ops->validate to rtnl_newlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 05/13] rtnetlink: Move rtnl_link_ops_get() and retry " Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 12:48   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU Kuniyuki Iwashima
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

ops->validate() does not require RTNL.

Let's move it to rtnl_newlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 49 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index fe36d584136f..24545c5b7e48 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3691,16 +3691,14 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  const struct rtnl_link_ops *ops,
 			  struct rtnl_newlink_tbs *tbs,
+			  struct nlattr **data,
 			  struct netlink_ext_ack *extack)
 {
-	struct nlattr ** const linkinfo = tbs->linkinfo;
 	struct nlattr ** const tb = tbs->tb;
 	struct net *net = sock_net(skb->sk);
 	struct net_device *dev;
 	struct ifinfomsg *ifm;
-	struct nlattr **data;
 	bool link_specified;
-	int err;
 
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0) {
@@ -3717,26 +3715,6 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		dev = NULL;
 	}
 
-	data = NULL;
-	if (ops) {
-		if (ops->maxtype > RTNL_MAX_TYPE)
-			return -EINVAL;
-
-		if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
-			err = nla_parse_nested_deprecated(tbs->attr, ops->maxtype,
-							  linkinfo[IFLA_INFO_DATA],
-							  ops->policy, extack);
-			if (err < 0)
-				return err;
-			data = tbs->attr;
-		}
-		if (ops->validate) {
-			err = ops->validate(tb, data, extack);
-			if (err < 0)
-				return err;
-		}
-	}
-
 	if (dev)
 		return rtnl_changelink(skb, nlh, ops, dev, tbs, data, extack);
 
@@ -3767,8 +3745,8 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
+	struct nlattr **tb, **linkinfo, **data = NULL;
 	const struct rtnl_link_ops *ops = NULL;
-	struct nlattr **tb, **linkinfo;
 	struct rtnl_newlink_tbs *tbs;
 	int ret;
 
@@ -3812,7 +3790,28 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 #endif
 	}
 
-	ret = __rtnl_newlink(skb, nlh, ops, tbs, extack);
+	if (ops) {
+		if (ops->maxtype > RTNL_MAX_TYPE)
+			return -EINVAL;
+
+		if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
+			ret = nla_parse_nested_deprecated(tbs->attr, ops->maxtype,
+							  linkinfo[IFLA_INFO_DATA],
+							  ops->policy, extack);
+			if (ret < 0)
+				goto free;
+
+			data = tbs->attr;
+		}
+
+		if (ops->validate) {
+			ret = ops->validate(tb, data, extack);
+			if (ret < 0)
+				goto free;
+		}
+	}
+
+	ret = __rtnl_newlink(skb, nlh, ops, tbs, data, extack);
 
 free:
 	kfree(tbs);
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU.
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (5 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 06/13] rtnetlink: Move ops->validate " Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:02   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 08/13] rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink() Kuniyuki Iwashima
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK
even when its module is being unloaded.

Let's use SRCU to protect rtnl_link_ops.

rtnl_link_ops_get() now iterates link_ops under RCU and returns
SRCU-protected ops pointer.  The caller must call rtnl_link_ops_put()
to release the pointer after the use.

Also, __rtnl_link_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_NEWLINK requests to
complete.

Note that link_ops needs to be protected by its dedicated lock
when RTNL is removed.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/rtnetlink.h |  5 ++-
 net/core/rtnetlink.c    | 78 +++++++++++++++++++++++++++++------------
 2 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index b45d57b5968a..c873fd6193ed 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -3,6 +3,7 @@
 #define __NET_RTNETLINK_H
 
 #include <linux/rtnetlink.h>
+#include <linux/srcu.h>
 #include <net/netlink.h>
 
 typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *,
@@ -47,7 +48,8 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
 /**
  *	struct rtnl_link_ops - rtnetlink link operations
  *
- *	@list: Used internally
+ *	@list: Used internally, protected by RTNL and SRCU
+ *	@srcu: Used internally
  *	@kind: Identifier
  *	@netns_refund: Physical device, move to init_net on netns exit
  *	@maxtype: Highest device specific netlink attribute number
@@ -78,6 +80,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  */
 struct rtnl_link_ops {
 	struct list_head	list;
+	struct srcu_struct	srcu;
 
 	const char		*kind;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 24545c5b7e48..7f464554d881 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -456,15 +456,29 @@ EXPORT_SYMBOL_GPL(rtnl_unregister_all);
 
 static LIST_HEAD(link_ops);
 
-static const struct rtnl_link_ops *rtnl_link_ops_get(const char *kind)
+static struct rtnl_link_ops *rtnl_link_ops_get(const char *kind, int *srcu_index)
 {
-	const struct rtnl_link_ops *ops;
+	struct rtnl_link_ops *ops;
 
-	list_for_each_entry(ops, &link_ops, list) {
-		if (!strcmp(ops->kind, kind))
-			return ops;
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(ops, &link_ops, list) {
+		if (!strcmp(ops->kind, kind)) {
+			*srcu_index = srcu_read_lock(&ops->srcu);
+			goto unlock;
+		}
 	}
-	return NULL;
+
+	ops = NULL;
+unlock:
+	rcu_read_unlock();
+
+	return ops;
+}
+
+static void rtnl_link_ops_put(struct rtnl_link_ops *ops, int srcu_index)
+{
+	srcu_read_unlock(&ops->srcu, srcu_index);
 }
 
 /**
@@ -479,8 +493,15 @@ static const struct rtnl_link_ops *rtnl_link_ops_get(const char *kind)
  */
 int __rtnl_link_register(struct rtnl_link_ops *ops)
 {
-	if (rtnl_link_ops_get(ops->kind))
-		return -EEXIST;
+	struct rtnl_link_ops *tmp;
+
+	/* When RTNL is removed, add lock for link_ops. */
+	ASSERT_RTNL();
+
+	list_for_each_entry(tmp, &link_ops, list) {
+		if (!strcmp(ops->kind, tmp->kind))
+			return -EEXIST;
+	}
 
 	/* The check for alloc/setup is here because if ops
 	 * does not have that filled up, it is not possible
@@ -490,7 +511,9 @@ int __rtnl_link_register(struct rtnl_link_ops *ops)
 	if ((ops->alloc || ops->setup) && !ops->dellink)
 		ops->dellink = unregister_netdevice_queue;
 
-	list_add_tail(&ops->list, &link_ops);
+	init_srcu_struct(&ops->srcu);
+	list_add_tail_rcu(&ops->list, &link_ops);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__rtnl_link_register);
@@ -541,10 +564,11 @@ void __rtnl_link_unregister(struct rtnl_link_ops *ops)
 {
 	struct net *net;
 
-	for_each_net(net) {
+	list_del_rcu(&ops->list);
+	synchronize_srcu(&ops->srcu);
+
+	for_each_net(net)
 		__rtnl_kill_links(net, ops);
-	}
-	list_del(&ops->list);
 }
 EXPORT_SYMBOL_GPL(__rtnl_link_unregister);
 
@@ -2157,10 +2181,11 @@ static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = {
 	[IFLA_XDP_PROG_ID]	= { .type = NLA_U32 },
 };
 
-static const struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla)
+static struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla,
+						  int *ops_srcu_index)
 {
-	const struct rtnl_link_ops *ops = NULL;
 	struct nlattr *linfo[IFLA_INFO_MAX + 1];
+	struct rtnl_link_ops *ops = NULL;
 
 	if (nla_parse_nested_deprecated(linfo, IFLA_INFO_MAX, nla, ifla_info_policy, NULL) < 0)
 		return NULL;
@@ -2169,7 +2194,7 @@ static const struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla
 		char kind[MODULE_NAME_LEN];
 
 		nla_strscpy(kind, linfo[IFLA_INFO_KIND], sizeof(kind));
-		ops = rtnl_link_ops_get(kind);
+		ops = rtnl_link_ops_get(kind, ops_srcu_index);
 	}
 
 	return ops;
@@ -2289,8 +2314,8 @@ static int rtnl_valid_dump_ifinfo_req(const struct nlmsghdr *nlh,
 
 static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	const struct rtnl_link_ops *kind_ops = NULL;
 	struct netlink_ext_ack *extack = cb->extack;
+	struct rtnl_link_ops *kind_ops = NULL;
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
 	unsigned int flags = NLM_F_MULTI;
@@ -2301,6 +2326,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	struct net *tgt_net = net;
 	u32 ext_filter_mask = 0;
 	struct net_device *dev;
+	int ops_srcu_index;
 	int master_idx = 0;
 	int netnsid = -1;
 	int err, i;
@@ -2334,7 +2360,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 			master_idx = nla_get_u32(tb[i]);
 			break;
 		case IFLA_LINKINFO:
-			kind_ops = linkinfo_to_kind_ops(tb[i]);
+			kind_ops = linkinfo_to_kind_ops(tb[i], &ops_srcu_index);
 			break;
 		default:
 			if (cb->strict_check) {
@@ -2360,6 +2386,10 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 		if (err < 0)
 			break;
 	}
+
+	if (kind_ops)
+		rtnl_link_ops_put(kind_ops, ops_srcu_index);
+
 	cb->seq = tgt_net->dev_base_seq;
 	nl_dump_check_consistent(cb, nlmsg_hdr(skb));
 	if (netnsid >= 0)
@@ -3746,8 +3776,9 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
 	struct nlattr **tb, **linkinfo, **data = NULL;
-	const struct rtnl_link_ops *ops = NULL;
+	struct rtnl_link_ops *ops = NULL;
 	struct rtnl_newlink_tbs *tbs;
+	int ops_srcu_index;
 	int ret;
 
 	tbs = kmalloc(sizeof(*tbs), GFP_KERNEL);
@@ -3779,13 +3810,13 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		char kind[MODULE_NAME_LEN];
 
 		nla_strscpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
-		ops = rtnl_link_ops_get(kind);
+		ops = rtnl_link_ops_get(kind, &ops_srcu_index);
 #ifdef CONFIG_MODULES
 		if (!ops) {
 			__rtnl_unlock();
 			request_module("rtnl-link-%s", kind);
 			rtnl_lock();
-			ops = rtnl_link_ops_get(kind);
+			ops = rtnl_link_ops_get(kind, &ops_srcu_index);
 		}
 #endif
 	}
@@ -3799,7 +3830,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 							  linkinfo[IFLA_INFO_DATA],
 							  ops->policy, extack);
 			if (ret < 0)
-				goto free;
+				goto put_ops;
 
 			data = tbs->attr;
 		}
@@ -3807,12 +3838,15 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		if (ops->validate) {
 			ret = ops->validate(tb, data, extack);
 			if (ret < 0)
-				goto free;
+				goto put_ops;
 		}
 	}
 
 	ret = __rtnl_newlink(skb, nlh, ops, tbs, data, extack);
 
+put_ops:
+	if (ops)
+		rtnl_link_ops_put(ops, ops_srcu_index);
 free:
 	kfree(tbs);
 	return ret;
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 08/13] rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (6 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:08   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 09/13] rtnetlink: Fetch IFLA_LINK_NETNSID " Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

As a prerequisite of per-netns RTNL, we must fetch netns before
looking up dev or moving it to another netns.

rtnl_link_get_net_capable() is called in rtnl_newlink_create() and
do_setlink(), but both of them need to be moved to the RTNL-independent
region, which will be rtnl_newlink().

Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the
netns down to where needed.

Note that the latter two have not passed the nets to do_setlink() yet
but will do so after the remaining rtnl_link_get_net_capable() is moved
to rtnl_setlink() later.

While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to
align with rtnl_{del,set}link().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 51 ++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7f464554d881..fc62f23d2647 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3543,7 +3543,7 @@ struct rtnl_newlink_tbs {
 
 static int rtnl_changelink(const struct sk_buff *skb, struct nlmsghdr *nlh,
 			   const struct rtnl_link_ops *ops,
-			   struct net_device *dev,
+			   struct net_device *dev, struct net *tgt_net,
 			   struct rtnl_newlink_tbs *tbs,
 			   struct nlattr **data,
 			   struct netlink_ext_ack *extack)
@@ -3607,10 +3607,10 @@ static int rtnl_changelink(const struct sk_buff *skb, struct nlmsghdr *nlh,
 }
 
 static int rtnl_group_changelink(const struct sk_buff *skb,
-		struct net *net, int group,
-		struct ifinfomsg *ifm,
-		struct netlink_ext_ack *extack,
-		struct nlattr **tb)
+				 struct net *net, struct net *tgt_net,
+				 int group, struct ifinfomsg *ifm,
+				 struct netlink_ext_ack *extack,
+				 struct nlattr **tb)
 {
 	struct net_device *dev, *aux;
 	int err;
@@ -3628,6 +3628,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
 
 static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 			       const struct rtnl_link_ops *ops,
+			       struct net *tgt_net,
 			       const struct nlmsghdr *nlh,
 			       struct nlattr **tb, struct nlattr **data,
 			       struct netlink_ext_ack *extack)
@@ -3635,9 +3636,9 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 	unsigned char name_assign_type = NET_NAME_USER;
 	struct net *net = sock_net(skb->sk);
 	u32 portid = NETLINK_CB(skb).portid;
-	struct net *dest_net, *link_net;
 	struct net_device *dev;
 	char ifname[IFNAMSIZ];
+	struct net *link_net;
 	int err;
 
 	if (!ops->alloc && !ops->setup)
@@ -3650,14 +3651,10 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 		name_assign_type = NET_NAME_ENUM;
 	}
 
-	dest_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
-	if (IS_ERR(dest_net))
-		return PTR_ERR(dest_net);
-
 	if (tb[IFLA_LINK_NETNSID]) {
 		int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
 
-		link_net = get_net_ns_by_id(dest_net, id);
+		link_net = get_net_ns_by_id(tgt_net, id);
 		if (!link_net) {
 			NL_SET_ERR_MSG(extack, "Unknown network namespace id");
 			err =  -EINVAL;
@@ -3670,7 +3667,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 		link_net = NULL;
 	}
 
-	dev = rtnl_create_link(link_net ? : dest_net, ifname,
+	dev = rtnl_create_link(link_net ? : tgt_net, ifname,
 			       name_assign_type, ops, tb, extack);
 	if (IS_ERR(dev)) {
 		err = PTR_ERR(dev);
@@ -3692,7 +3689,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 	if (err < 0)
 		goto out_unregister;
 	if (link_net) {
-		err = dev_change_net_namespace(dev, dest_net, ifname);
+		err = dev_change_net_namespace(dev, tgt_net, ifname);
 		if (err < 0)
 			goto out_unregister;
 	}
@@ -3704,7 +3701,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 out:
 	if (link_net)
 		put_net(link_net);
-	put_net(dest_net);
+
 	return err;
 out_unregister:
 	if (ops->newlink) {
@@ -3720,6 +3717,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 
 static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  const struct rtnl_link_ops *ops,
+			  struct net *tgt_net,
 			  struct rtnl_newlink_tbs *tbs,
 			  struct nlattr **data,
 			  struct netlink_ext_ack *extack)
@@ -3746,19 +3744,18 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	if (dev)
-		return rtnl_changelink(skb, nlh, ops, dev, tbs, data, extack);
+		return rtnl_changelink(skb, nlh, ops, dev, tgt_net, tbs, data, extack);
 
 	if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
 		/* No dev found and NLM_F_CREATE not set. Requested dev does not exist,
 		 * or it's for a group
 		*/
-		if (link_specified)
+		if (link_specified || !tb[IFLA_GROUP])
 			return -ENODEV;
-		if (tb[IFLA_GROUP])
-			return rtnl_group_changelink(skb, net,
-						nla_get_u32(tb[IFLA_GROUP]),
-						ifm, extack, tb);
-		return -ENODEV;
+
+		return rtnl_group_changelink(skb, net, tgt_net,
+					     nla_get_u32(tb[IFLA_GROUP]),
+					     ifm, extack, tb);
 	}
 
 	if (tb[IFLA_MAP] || tb[IFLA_PROTINFO])
@@ -3769,7 +3766,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EOPNOTSUPP;
 	}
 
-	return rtnl_newlink_create(skb, ifm, ops, nlh, tb, data, extack);
+	return rtnl_newlink_create(skb, ifm, ops, tgt_net, nlh, tb, data, extack);
 }
 
 static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -3778,6 +3775,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct nlattr **tb, **linkinfo, **data = NULL;
 	struct rtnl_link_ops *ops = NULL;
 	struct rtnl_newlink_tbs *tbs;
+	struct net *tgt_net;
 	int ops_srcu_index;
 	int ret;
 
@@ -3842,8 +3840,15 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 	}
 
-	ret = __rtnl_newlink(skb, nlh, ops, tbs, data, extack);
+	tgt_net = rtnl_link_get_net_capable(skb, sock_net(skb->sk), tb, CAP_NET_ADMIN);
+	if (IS_ERR(tgt_net)) {
+		ret = PTR_ERR(tgt_net);
+		goto put_ops;
+	}
+
+	ret = __rtnl_newlink(skb, nlh, ops, tgt_net, tbs, data, extack);
 
+	put_net(tgt_net);
 put_ops:
 	if (ops)
 		rtnl_link_ops_put(ops, ops_srcu_index);
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 09/13] rtnetlink: Fetch IFLA_LINK_NETNSID in rtnl_newlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (7 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 08/13] rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:09   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 10/13] rtnetlink: Clean up rtnl_dellink() Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and
is fetched in rtnl_newlink_create().

This must be done before holding rtnl_net_lock().

Let's move IFLA_LINK_NETNSID processing to rtnl_newlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 49 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index fc62f23d2647..70a3a9f411d8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3628,7 +3628,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
 
 static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 			       const struct rtnl_link_ops *ops,
-			       struct net *tgt_net,
+			       struct net *tgt_net, struct net *link_net,
 			       const struct nlmsghdr *nlh,
 			       struct nlattr **tb, struct nlattr **data,
 			       struct netlink_ext_ack *extack)
@@ -3638,7 +3638,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 	u32 portid = NETLINK_CB(skb).portid;
 	struct net_device *dev;
 	char ifname[IFNAMSIZ];
-	struct net *link_net;
 	int err;
 
 	if (!ops->alloc && !ops->setup)
@@ -3651,22 +3650,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 		name_assign_type = NET_NAME_ENUM;
 	}
 
-	if (tb[IFLA_LINK_NETNSID]) {
-		int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
-
-		link_net = get_net_ns_by_id(tgt_net, id);
-		if (!link_net) {
-			NL_SET_ERR_MSG(extack, "Unknown network namespace id");
-			err =  -EINVAL;
-			goto out;
-		}
-		err = -EPERM;
-		if (!netlink_ns_capable(skb, link_net->user_ns, CAP_NET_ADMIN))
-			goto out;
-	} else {
-		link_net = NULL;
-	}
-
 	dev = rtnl_create_link(link_net ? : tgt_net, ifname,
 			       name_assign_type, ops, tb, extack);
 	if (IS_ERR(dev)) {
@@ -3699,9 +3682,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 			goto out_unregister;
 	}
 out:
-	if (link_net)
-		put_net(link_net);
-
 	return err;
 out_unregister:
 	if (ops->newlink) {
@@ -3717,7 +3697,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
 
 static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			  const struct rtnl_link_ops *ops,
-			  struct net *tgt_net,
+			  struct net *tgt_net, struct net *link_net,
 			  struct rtnl_newlink_tbs *tbs,
 			  struct nlattr **data,
 			  struct netlink_ext_ack *extack)
@@ -3766,16 +3746,16 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EOPNOTSUPP;
 	}
 
-	return rtnl_newlink_create(skb, ifm, ops, tgt_net, nlh, tb, data, extack);
+	return rtnl_newlink_create(skb, ifm, ops, tgt_net, link_net, nlh, tb, data, extack);
 }
 
 static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
 	struct nlattr **tb, **linkinfo, **data = NULL;
+	struct net *tgt_net, *link_net = NULL;
 	struct rtnl_link_ops *ops = NULL;
 	struct rtnl_newlink_tbs *tbs;
-	struct net *tgt_net;
 	int ops_srcu_index;
 	int ret;
 
@@ -3846,8 +3826,27 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto put_ops;
 	}
 
-	ret = __rtnl_newlink(skb, nlh, ops, tgt_net, tbs, data, extack);
+	if (tb[IFLA_LINK_NETNSID]) {
+		int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
+
+		link_net = get_net_ns_by_id(tgt_net, id);
+		if (!link_net) {
+			NL_SET_ERR_MSG(extack, "Unknown network namespace id");
+			ret =  -EINVAL;
+			goto put_net;
+		}
+
+		if (!netlink_ns_capable(skb, link_net->user_ns, CAP_NET_ADMIN)) {
+			ret = -EPERM;
+			goto put_net;
+		}
+	}
+
+	ret = __rtnl_newlink(skb, nlh, ops, tgt_net, link_net, tbs, data, extack);
 
+put_net:
+	if (link_net)
+		put_net(link_net);
 	put_net(tgt_net);
 put_ops:
 	if (ops)
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 10/13] rtnetlink: Clean up rtnl_dellink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (8 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 09/13] rtnetlink: Fetch IFLA_LINK_NETNSID " Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:10   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 11/13] rtnetlink: Clean up rtnl_setlink() Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will push RTNL down to rtnl_delink().

Let's unify the error path to make it easy to place rtnl_net_lock().

While at it, keep the variables in reverse xmas order.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 70a3a9f411d8..59a83fd52a92 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3362,14 +3362,14 @@ EXPORT_SYMBOL_GPL(rtnl_delete_link);
 static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
+	struct ifinfomsg *ifm = nlmsg_data(nlh);
 	struct net *net = sock_net(skb->sk);
 	u32 portid = NETLINK_CB(skb).portid;
-	struct net *tgt_net = net;
-	struct net_device *dev = NULL;
-	struct ifinfomsg *ifm;
 	struct nlattr *tb[IFLA_MAX+1];
-	int err;
+	struct net_device *dev = NULL;
+	struct net *tgt_net = net;
 	int netnsid = -1;
+	int err;
 
 	err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
 				     ifla_policy, extack);
@@ -3387,27 +3387,20 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			return PTR_ERR(tgt_net);
 	}
 
-	err = -EINVAL;
-	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(tgt_net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
 		dev = rtnl_dev_get(tgt_net, tb);
+
+	if (dev)
+		err = rtnl_delete_link(dev, portid, nlh);
+	else if (ifm->ifi_index > 0 || tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
+		err = -ENODEV;
 	else if (tb[IFLA_GROUP])
 		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
 	else
-		goto out;
-
-	if (!dev) {
-		if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME] || ifm->ifi_index > 0)
-			err = -ENODEV;
-
-		goto out;
-	}
-
-	err = rtnl_delete_link(dev, portid, nlh);
+		err = -EINVAL;
 
-out:
 	if (netnsid >= 0)
 		put_net(tgt_net);
 
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 11/13] rtnetlink: Clean up rtnl_setlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (9 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 10/13] rtnetlink: Clean up rtnl_dellink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:11   ` Eric Dumazet
  2024-10-09 23:16 ` [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink() Kuniyuki Iwashima
  2024-10-09 23:16 ` [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU Kuniyuki Iwashima
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will push RTNL down to rtnl_setlink().

Let's unify the error path to make it easy to place rtnl_net_lock().

While at it, keep the variables in reverse xmas order.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 59a83fd52a92..de693a88986e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3273,11 +3273,11 @@ static struct net_device *rtnl_dev_get(struct net *net,
 static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			struct netlink_ext_ack *extack)
 {
+	struct ifinfomsg *ifm = nlmsg_data(nlh);
 	struct net *net = sock_net(skb->sk);
-	struct ifinfomsg *ifm;
-	struct net_device *dev;
-	int err;
 	struct nlattr *tb[IFLA_MAX+1];
+	struct net_device *dev = NULL;
+	int err;
 
 	err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
 				     ifla_policy, extack);
@@ -3288,21 +3288,18 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		goto errout;
 
-	err = -EINVAL;
-	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
 		dev = rtnl_dev_get(net, tb);
 	else
-		goto errout;
+		err = -EINVAL;
 
-	if (dev == NULL) {
+	if (dev)
+		err = do_setlink(skb, dev, ifm, extack, tb, 0);
+	else if (!err)
 		err = -ENODEV;
-		goto errout;
-	}
 
-	err = do_setlink(skb, dev, ifm, extack, tb, 0);
 errout:
 	return err;
 }
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (10 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 11/13] rtnetlink: Clean up rtnl_setlink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:15   ` Eric Dumazet
  2024-10-11  7:36   ` kernel test robot
  2024-10-09 23:16 ` [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU Kuniyuki Iwashima
  12 siblings, 2 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will push RTNL down to rtnl_setlink().

RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink()
to move a dev to a new netns, but the netns needs to be fetched before
holding rtnl_net_lock().

Let's move it to rtnl_setlink() and pass the netns to do_setlink().

Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink())
can pass the prefetched netns to do_setlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/rtnetlink.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index de693a88986e..a0702e531331 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2875,8 +2875,8 @@ static int do_set_proto_down(struct net_device *dev,
 #define DO_SETLINK_MODIFIED	0x01
 /* notify flag means notify + modified. */
 #define DO_SETLINK_NOTIFY	0x03
-static int do_setlink(const struct sk_buff *skb,
-		      struct net_device *dev, struct ifinfomsg *ifm,
+static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
+		      struct net *tgt_net, struct ifinfomsg *ifm,
 		      struct netlink_ext_ack *extack,
 		      struct nlattr **tb, int status)
 {
@@ -2893,27 +2893,19 @@ static int do_setlink(const struct sk_buff *skb,
 	else
 		ifname[0] = '\0';
 
-	if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD] || tb[IFLA_TARGET_NETNSID]) {
+	if (!net_eq(tgt_net, dev_net(dev))) {
 		const char *pat = ifname[0] ? ifname : NULL;
-		struct net *net;
 		int new_ifindex;
 
-		net = rtnl_link_get_net_capable(skb, dev_net(dev),
-						tb, CAP_NET_ADMIN);
-		if (IS_ERR(net)) {
-			err = PTR_ERR(net);
-			goto errout;
-		}
-
 		if (tb[IFLA_NEW_IFINDEX])
 			new_ifindex = nla_get_s32(tb[IFLA_NEW_IFINDEX]);
 		else
 			new_ifindex = 0;
 
-		err = __dev_change_net_namespace(dev, net, pat, new_ifindex);
-		put_net(net);
+		err = __dev_change_net_namespace(dev, tgt_net, pat, new_ifindex);
 		if (err)
 			goto errout;
+
 		status |= DO_SETLINK_MODIFIED;
 	}
 
@@ -3277,6 +3269,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[IFLA_MAX+1];
 	struct net_device *dev = NULL;
+	struct net *tgt_net;
 	int err;
 
 	err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
@@ -3288,6 +3281,10 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		goto errout;
 
+	tgt_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
+	if (IS_ERR(tgt_net))
+		return PTR_ERR(tgt_net);
+
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
 	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
@@ -3296,11 +3293,13 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		err = -EINVAL;
 
 	if (dev)
-		err = do_setlink(skb, dev, ifm, extack, tb, 0);
+		err = do_setlink(skb, dev, tgt_net, ifm, extack, tb, 0);
 	else if (!err)
 		err = -ENODEV;
 
 errout:
+	put_net(tgt_net);
+
 	return err;
 }
 
@@ -3593,7 +3592,7 @@ static int rtnl_changelink(const struct sk_buff *skb, struct nlmsghdr *nlh,
 		status |= DO_SETLINK_NOTIFY;
 	}
 
-	return do_setlink(skb, dev, nlmsg_data(nlh), extack, tb, status);
+	return do_setlink(skb, dev, tgt_net, nlmsg_data(nlh), extack, tb, status);
 }
 
 static int rtnl_group_changelink(const struct sk_buff *skb,
@@ -3607,7 +3606,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
 
 	for_each_netdev_safe(net, dev, aux) {
 		if (dev->group == group) {
-			err = do_setlink(skb, dev, ifm, extack, tb, 0);
+			err = do_setlink(skb, dev, tgt_net, ifm, extack, tb, 0);
 			if (err < 0)
 				return err;
 		}
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU.
  2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
                   ` (11 preceding siblings ...)
  2024-10-09 23:16 ` [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink() Kuniyuki Iwashima
@ 2024-10-09 23:16 ` Kuniyuki Iwashima
  2024-10-10 13:16   ` Eric Dumazet
  12 siblings, 1 reply; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-09 23:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK
even when its module is being unloaded.

Let's use SRCU to protect rtnl_af_ops.

rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns
SRCU-protected ops pointer.  The caller must call rtnl_af_put()
to release the pointer after the use.

Also, rtnl_af_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_SETLINK requests to
complete.

Note that rtnl_af_ops needs to be protected by its dedicated lock
when RTNL is removed.

Note also that BUG_ON() in do_setlink() is changed to the normal
error handling as a different af_ops might be found after
validate_linkmsg().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/rtnetlink.h |  5 +++-
 net/core/rtnetlink.c    | 58 +++++++++++++++++++++++++++++------------
 2 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index c873fd6193ed..407a2f56f00a 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -150,7 +150,8 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops);
 /**
  * 	struct rtnl_af_ops - rtnetlink address family operations
  *
- *	@list: Used internally
+ *	@list: Used internally, protected by RTNL and SRCU
+ *	@srcu: Used internally
  * 	@family: Address family
  * 	@fill_link_af: Function to fill IFLA_AF_SPEC with address family
  * 		       specific netlink attributes.
@@ -163,6 +164,8 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops);
  */
 struct rtnl_af_ops {
 	struct list_head	list;
+	struct srcu_struct	srcu;
+
 	int			family;
 
 	int			(*fill_link_af)(struct sk_buff *skb,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a0702e531331..817165f6d5ef 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -660,18 +660,31 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
 
 static LIST_HEAD(rtnl_af_ops);
 
-static const struct rtnl_af_ops *rtnl_af_lookup(const int family)
+static struct rtnl_af_ops *rtnl_af_lookup(const int family, int *srcu_index)
 {
-	const struct rtnl_af_ops *ops;
+	struct rtnl_af_ops *ops;
 
 	ASSERT_RTNL();
 
-	list_for_each_entry(ops, &rtnl_af_ops, list) {
-		if (ops->family == family)
-			return ops;
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(ops, &rtnl_af_ops, list) {
+		if (ops->family == family) {
+			*srcu_index = srcu_read_lock(&ops->srcu);
+			goto unlock;
+		}
 	}
 
-	return NULL;
+	ops = NULL;
+unlock:
+	rcu_read_unlock();
+
+	return ops;
+}
+
+static void rtnl_af_put(struct rtnl_af_ops *ops, int srcu_index)
+{
+	srcu_read_unlock(&ops->srcu, srcu_index);
 }
 
 /**
@@ -683,6 +696,7 @@ static const struct rtnl_af_ops *rtnl_af_lookup(const int family)
 void rtnl_af_register(struct rtnl_af_ops *ops)
 {
 	rtnl_lock();
+	init_srcu_struct(&ops->srcu);
 	list_add_tail_rcu(&ops->list, &rtnl_af_ops);
 	rtnl_unlock();
 }
@@ -699,6 +713,7 @@ void rtnl_af_unregister(struct rtnl_af_ops *ops)
 	rtnl_unlock();
 
 	synchronize_rcu();
+	synchronize_srcu(&ops->srcu);
 }
 EXPORT_SYMBOL_GPL(rtnl_af_unregister);
 
@@ -2571,20 +2586,24 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[],
 		int rem, err;
 
 		nla_for_each_nested(af, tb[IFLA_AF_SPEC], rem) {
-			const struct rtnl_af_ops *af_ops;
+			struct rtnl_af_ops *af_ops;
+			int srcu_ops_index;
 
-			af_ops = rtnl_af_lookup(nla_type(af));
+			af_ops = rtnl_af_lookup(nla_type(af), &srcu_ops_index);
 			if (!af_ops)
 				return -EAFNOSUPPORT;
 
 			if (!af_ops->set_link_af)
-				return -EOPNOTSUPP;
-
-			if (af_ops->validate_link_af) {
+				err = -EOPNOTSUPP;
+			else if (af_ops->validate_link_af)
 				err = af_ops->validate_link_af(dev, af, extack);
-				if (err < 0)
-					return err;
-			}
+			else
+				err = 0;
+
+			rtnl_af_put(af_ops, srcu_ops_index);
+
+			if (err < 0)
+				return err;
 		}
 	}
 
@@ -3164,11 +3183,18 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
 		int rem;
 
 		nla_for_each_nested(af, tb[IFLA_AF_SPEC], rem) {
-			const struct rtnl_af_ops *af_ops;
+			struct rtnl_af_ops *af_ops;
+			int srcu_ops_index;
 
-			BUG_ON(!(af_ops = rtnl_af_lookup(nla_type(af))));
+			af_ops = rtnl_af_lookup(nla_type(af), &srcu_ops_index);
+			if (!af_ops) {
+				err = -EAFNOSUPPORT;
+				goto errout;
+			}
 
 			err = af_ops->set_link_af(dev, af, extack);
+			rtnl_af_put(af_ops, srcu_ops_index);
+
 			if (err < 0)
 				goto errout;
 
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs.
  2024-10-09 23:16 ` [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs Kuniyuki Iwashima
@ 2024-10-10 12:36   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 12:36 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:17 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will move linkinfo to rtnl_newlink() and pass it down to other
> functions.
>
> Let's pack it into rtnl_newlink_tbs.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 03/13] rtnetlink: Factorise do_setlink() path from __rtnl_newlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 03/13] rtnetlink: Factorise do_setlink() path from __rtnl_newlink() Kuniyuki Iwashima
@ 2024-10-10 12:39   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 12:39 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:18 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> __rtnl_newlink() got too long to maintain.
>
> For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even
> when IFLA_INFO_SLAVE_DATA is not specified.
>
> Let's factorise the single dev do_setlink() path to a separate function.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 04/13] rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 04/13] rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink() Kuniyuki Iwashima
@ 2024-10-10 12:40   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 12:40 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:18 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will push RTNL down to rtnl_newlink().
>
> Let's move RTNL-independent validation to rtnl_newlink().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 02/13] rtnetlink: Call validate_linkmsg() in do_setlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 02/13] rtnetlink: Call validate_linkmsg() in do_setlink() Kuniyuki Iwashima
@ 2024-10-10 12:44   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 12:44 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:17 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> There are 3 paths that finally call do_setlink(), and validate_linkmsg()
> is called in each path.
>
>   1. RTM_NEWLINK
>     1-1. dev is found in __rtnl_newlink()
>     1-2. dev isn't found, but IFLA_GROUP is specified in
>           rtnl_group_changelink()
>   2. RTM_SETLINK
>
> The next patch factorises 1-1 to a separate function.
>
> As a preparation, let's move validate_linkmsg() calls to do_setlink().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 06/13] rtnetlink: Move ops->validate to rtnl_newlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 06/13] rtnetlink: Move ops->validate " Kuniyuki Iwashima
@ 2024-10-10 12:48   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 12:48 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:19 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> ops->validate() does not require RTNL.
>
> Let's move it to rtnl_newlink().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU.
  2024-10-09 23:16 ` [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU Kuniyuki Iwashima
@ 2024-10-10 13:02   ` Eric Dumazet
  2024-10-10 16:20     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:02 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:19 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
> guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK
> even when its module is being unloaded.
>
> Let's use SRCU to protect rtnl_link_ops.
>
> rtnl_link_ops_get() now iterates link_ops under RCU and returns
> SRCU-protected ops pointer.  The caller must call rtnl_link_ops_put()
> to release the pointer after the use.
>
> Also, __rtnl_link_unregister() unlinks the ops first and calls
> synchronize_srcu() to wait for inflight RTM_NEWLINK requests to
> complete.
>
> Note that link_ops needs to be protected by its dedicated lock
> when RTNL is removed.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  include/net/rtnetlink.h |  5 ++-
>  net/core/rtnetlink.c    | 78 +++++++++++++++++++++++++++++------------
>  2 files changed, 60 insertions(+), 23 deletions(-)
>
> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
> index b45d57b5968a..c873fd6193ed 100644
> --- a/include/net/rtnetlink.h
> +++ b/include/net/rtnetlink.h
> @@ -3,6 +3,7 @@
>  #define __NET_RTNETLINK_H
>
>  #include <linux/rtnetlink.h>
> +#include <linux/srcu.h>
>  #include <net/netlink.h>
>
>  typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *,
> @@ -47,7 +48,8 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
>  /**
>   *     struct rtnl_link_ops - rtnetlink link operations
>   *
> - *     @list: Used internally
> + *     @list: Used internally, protected by RTNL and SRCU
> + *     @srcu: Used internally
>   *     @kind: Identifier
>   *     @netns_refund: Physical device, move to init_net on netns exit
>   *     @maxtype: Highest device specific netlink attribute number
> @@ -78,6 +80,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
>   */
>  struct rtnl_link_ops {
>         struct list_head        list;
> +       struct srcu_struct      srcu;
>
>         const char              *kind;
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 24545c5b7e48..7f464554d881 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -456,15 +456,29 @@ EXPORT_SYMBOL_GPL(rtnl_unregister_all);
>
>  static LIST_HEAD(link_ops);
>
> -static const struct rtnl_link_ops *rtnl_link_ops_get(const char *kind)
> +static struct rtnl_link_ops *rtnl_link_ops_get(const char *kind, int *srcu_index)
>  {
> -       const struct rtnl_link_ops *ops;
> +       struct rtnl_link_ops *ops;
>
> -       list_for_each_entry(ops, &link_ops, list) {
> -               if (!strcmp(ops->kind, kind))
> -                       return ops;
> +       rcu_read_lock();
> +
> +       list_for_each_entry_rcu(ops, &link_ops, list) {
> +               if (!strcmp(ops->kind, kind)) {
> +                       *srcu_index = srcu_read_lock(&ops->srcu);
> +                       goto unlock;
> +               }
>         }
> -       return NULL;
> +
> +       ops = NULL;
> +unlock:
> +       rcu_read_unlock();
> +
> +       return ops;
> +}
> +
> +static void rtnl_link_ops_put(struct rtnl_link_ops *ops, int srcu_index)
> +{
> +       srcu_read_unlock(&ops->srcu, srcu_index);
>  }
>
>  /**
> @@ -479,8 +493,15 @@ static const struct rtnl_link_ops *rtnl_link_ops_get(const char *kind)
>   */
>  int __rtnl_link_register(struct rtnl_link_ops *ops)
>  {
> -       if (rtnl_link_ops_get(ops->kind))
> -               return -EEXIST;
> +       struct rtnl_link_ops *tmp;
> +
> +       /* When RTNL is removed, add lock for link_ops. */
> +       ASSERT_RTNL();
> +
> +       list_for_each_entry(tmp, &link_ops, list) {
> +               if (!strcmp(ops->kind, tmp->kind))
> +                       return -EEXIST;
> +       }
>
>         /* The check for alloc/setup is here because if ops
>          * does not have that filled up, it is not possible
> @@ -490,7 +511,9 @@ int __rtnl_link_register(struct rtnl_link_ops *ops)
>         if ((ops->alloc || ops->setup) && !ops->dellink)
>                 ops->dellink = unregister_netdevice_queue;
>
> -       list_add_tail(&ops->list, &link_ops);
> +       init_srcu_struct(&ops->srcu);

init_srcu_struct() could fail.

> +       list_add_tail_rcu(&ops->list, &link_ops);
> +
>         return 0;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 08/13] rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 08/13] rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink() Kuniyuki Iwashima
@ 2024-10-10 13:08   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:08 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:19 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> As a prerequisite of per-netns RTNL, we must fetch netns before
> looking up dev or moving it to another netns.
>
> rtnl_link_get_net_capable() is called in rtnl_newlink_create() and
> do_setlink(), but both of them need to be moved to the RTNL-independent
> region, which will be rtnl_newlink().
>
> Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the
> netns down to where needed.
>
> Note that the latter two have not passed the nets to do_setlink() yet
> but will do so after the remaining rtnl_link_get_net_capable() is moved
> to rtnl_setlink() later.
>
> While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to
> align with rtnl_{del,set}link().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 09/13] rtnetlink: Fetch IFLA_LINK_NETNSID in rtnl_newlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 09/13] rtnetlink: Fetch IFLA_LINK_NETNSID " Kuniyuki Iwashima
@ 2024-10-10 13:09   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:09 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and
> is fetched in rtnl_newlink_create().
>
> This must be done before holding rtnl_net_lock().
>
> Let's move IFLA_LINK_NETNSID processing to rtnl_newlink().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 10/13] rtnetlink: Clean up rtnl_dellink().
  2024-10-09 23:16 ` [PATCH v1 net-next 10/13] rtnetlink: Clean up rtnl_dellink() Kuniyuki Iwashima
@ 2024-10-10 13:10   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:10 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will push RTNL down to rtnl_delink().
>
> Let's unify the error path to make it easy to place rtnl_net_lock().
>
> While at it, keep the variables in reverse xmas order.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 11/13] rtnetlink: Clean up rtnl_setlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 11/13] rtnetlink: Clean up rtnl_setlink() Kuniyuki Iwashima
@ 2024-10-10 13:11   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:11 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will push RTNL down to rtnl_setlink().
>
> Let's unify the error path to make it easy to place rtnl_net_lock().
>
> While at it, keep the variables in reverse xmas order.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink() Kuniyuki Iwashima
@ 2024-10-10 13:15   ` Eric Dumazet
  2024-10-11  7:36   ` kernel test robot
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:15 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:21 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> We will push RTNL down to rtnl_setlink().
>
> RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink()
> to move a dev to a new netns, but the netns needs to be fetched before
> holding rtnl_net_lock().
>
> Let's move it to rtnl_setlink() and pass the netns to do_setlink().
>
> Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink())
> can pass the prefetched netns to do_setlink().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU.
  2024-10-09 23:16 ` [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU Kuniyuki Iwashima
@ 2024-10-10 13:16   ` Eric Dumazet
  2024-10-10 16:27     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2024-10-10 13:16 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	netdev

On Thu, Oct 10, 2024 at 1:21 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
> guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK
> even when its module is being unloaded.
>
> Let's use SRCU to protect rtnl_af_ops.
>
> rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns
> SRCU-protected ops pointer.  The caller must call rtnl_af_put()
> to release the pointer after the use.
>
> Also, rtnl_af_unregister() unlinks the ops first and calls
> synchronize_srcu() to wait for inflight RTM_SETLINK requests to
> complete.
>
> Note that rtnl_af_ops needs to be protected by its dedicated lock
> when RTNL is removed.
>
> Note also that BUG_ON() in do_setlink() is changed to the normal
> error handling as a different af_ops might be found after
> validate_linkmsg().
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  include/net/rtnetlink.h |  5 +++-
>  net/core/rtnetlink.c    | 58 +++++++++++++++++++++++++++++------------
>  2 files changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
> index c873fd6193ed..407a2f56f00a 100644
> --- a/include/net/rtnetlink.h
> +++ b/include/net/rtnetlink.h
> @@ -150,7 +150,8 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops);
>  /**
>   *     struct rtnl_af_ops - rtnetlink address family operations
>   *
> - *     @list: Used internally
> + *     @list: Used internally, protected by RTNL and SRCU
> + *     @srcu: Used internally
>   *     @family: Address family
>   *     @fill_link_af: Function to fill IFLA_AF_SPEC with address family
>   *                    specific netlink attributes.
> @@ -163,6 +164,8 @@ void rtnl_link_unregister(struct rtnl_link_ops *ops);
>   */
>  struct rtnl_af_ops {
>         struct list_head        list;
> +       struct srcu_struct      srcu;
> +
>         int                     family;
>
>         int                     (*fill_link_af)(struct sk_buff *skb,
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a0702e531331..817165f6d5ef 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -660,18 +660,31 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
>
>  static LIST_HEAD(rtnl_af_ops);
>
> -static const struct rtnl_af_ops *rtnl_af_lookup(const int family)
> +static struct rtnl_af_ops *rtnl_af_lookup(const int family, int *srcu_index)
>  {
> -       const struct rtnl_af_ops *ops;
> +       struct rtnl_af_ops *ops;
>
>         ASSERT_RTNL();
>
> -       list_for_each_entry(ops, &rtnl_af_ops, list) {
> -               if (ops->family == family)
> -                       return ops;
> +       rcu_read_lock();
> +
> +       list_for_each_entry_rcu(ops, &rtnl_af_ops, list) {
> +               if (ops->family == family) {
> +                       *srcu_index = srcu_read_lock(&ops->srcu);
> +                       goto unlock;
> +               }
>         }
>
> -       return NULL;
> +       ops = NULL;
> +unlock:
> +       rcu_read_unlock();
> +
> +       return ops;
> +}
> +
> +static void rtnl_af_put(struct rtnl_af_ops *ops, int srcu_index)
> +{
> +       srcu_read_unlock(&ops->srcu, srcu_index);
>  }
>
>  /**
> @@ -683,6 +696,7 @@ static const struct rtnl_af_ops *rtnl_af_lookup(const int family)
>  void rtnl_af_register(struct rtnl_af_ops *ops)
>  {
>         rtnl_lock();
> +       init_srcu_struct(&ops->srcu);

Same remark, this could fail.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU.
  2024-10-10 13:02   ` Eric Dumazet
@ 2024-10-10 16:20     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-10 16:20 UTC (permalink / raw)
  To: edumazet; +Cc: davem, kuba, kuni1840, kuniyu, netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 10 Oct 2024 15:02:39 +0200
> > @@ -490,7 +511,9 @@ int __rtnl_link_register(struct rtnl_link_ops *ops)
> >         if ((ops->alloc || ops->setup) && !ops->dellink)
> >                 ops->dellink = unregister_netdevice_queue;
> >
> > -       list_add_tail(&ops->list, &link_ops);
> > +       init_srcu_struct(&ops->srcu);
> 
> init_srcu_struct() could fail.

Oh, I somehow assumed init wouldn't fail.
Will fix in v2.

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU.
  2024-10-10 13:16   ` Eric Dumazet
@ 2024-10-10 16:27     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-10 16:27 UTC (permalink / raw)
  To: edumazet; +Cc: davem, kuba, kuni1840, kuniyu, netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 10 Oct 2024 15:16:20 +0200
> > @@ -683,6 +696,7 @@ static const struct rtnl_af_ops *rtnl_af_lookup(const int family)
> >  void rtnl_af_register(struct rtnl_af_ops *ops)
> >  {
> >         rtnl_lock();
> > +       init_srcu_struct(&ops->srcu);
> 
> Same remark, this could fail.

This requries rtnl_af_register() to return int, and that needs
few more changes conflicting with rtnl_register_many() changes.

I'll see if it'd be a bit noisy, and if so, I'll split this patch
to another series.

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
  2024-10-09 23:16 ` [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink() Kuniyuki Iwashima
  2024-10-10 13:15   ` Eric Dumazet
@ 2024-10-11  7:36   ` kernel test robot
  2024-10-11 16:17     ` Kuniyuki Iwashima
  1 sibling, 1 reply; 30+ messages in thread
From: kernel test robot @ 2024-10-11  7:36 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: llvm, oe-kbuild-all, netdev, Kuniyuki Iwashima

Hi Kuniyuki,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Kuniyuki-Iwashima/rtnetlink-Allocate-linkinfo-as-struct-rtnl_newlink_tbs/20241010-072158
base:   net-next/main
patch link:    https://lore.kernel.org/r/20241009231656.57830-13-kuniyu%40amazon.com
patch subject: [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
config: x86_64-buildonly-randconfig-003-20241011 (https://download.01.org/0day-ci/archive/20241011/202410111515.TbOH4hSS-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241011/202410111515.TbOH4hSS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410111515.TbOH4hSS-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/core/rtnetlink.c:3281:6: warning: variable 'tgt_net' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    3281 |         if (err < 0)
         |             ^~~~~~~
   net/core/rtnetlink.c:3301:10: note: uninitialized use occurs here
    3301 |         put_net(tgt_net);
         |                 ^~~~~~~
   net/core/rtnetlink.c:3281:2: note: remove the 'if' if its condition is always false
    3281 |         if (err < 0)
         |         ^~~~~~~~~~~~
    3282 |                 goto errout;
         |                 ~~~~~~~~~~~
   net/core/rtnetlink.c:3277:6: warning: variable 'tgt_net' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    3277 |         if (err < 0)
         |             ^~~~~~~
   net/core/rtnetlink.c:3301:10: note: uninitialized use occurs here
    3301 |         put_net(tgt_net);
         |                 ^~~~~~~
   net/core/rtnetlink.c:3277:2: note: remove the 'if' if its condition is always false
    3277 |         if (err < 0)
         |         ^~~~~~~~~~~~
    3278 |                 goto errout;
         |                 ~~~~~~~~~~~
   net/core/rtnetlink.c:3272:21: note: initialize the variable 'tgt_net' to silence this warning
    3272 |         struct net *tgt_net;
         |                            ^
         |                             = NULL
   2 warnings generated.


vim +3281 net/core/rtnetlink.c

cc6090e985d7d6 Jiri Pirko        2019-09-30  3264  
c21ef3e343ae91 David Ahern       2017-04-16  3265  static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
c21ef3e343ae91 David Ahern       2017-04-16  3266  			struct netlink_ext_ack *extack)
0157f60c0caea2 Patrick McHardy   2007-06-13  3267  {
3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3268  	struct ifinfomsg *ifm = nlmsg_data(nlh);
3b1e0a655f8eba YOSHIFUJI Hideaki 2008-03-26  3269  	struct net *net = sock_net(skb->sk);
0157f60c0caea2 Patrick McHardy   2007-06-13  3270  	struct nlattr *tb[IFLA_MAX+1];
3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3271  	struct net_device *dev = NULL;
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3272  	struct net *tgt_net;
3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3273  	int err;
0157f60c0caea2 Patrick McHardy   2007-06-13  3274  
8cb081746c031f Johannes Berg     2019-04-26  3275  	err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
8cb081746c031f Johannes Berg     2019-04-26  3276  				     ifla_policy, extack);
0157f60c0caea2 Patrick McHardy   2007-06-13  3277  	if (err < 0)
0157f60c0caea2 Patrick McHardy   2007-06-13  3278  		goto errout;
0157f60c0caea2 Patrick McHardy   2007-06-13  3279  
4ff66cae7f10b6 Christian Brauner 2018-02-07  3280  	err = rtnl_ensure_unique_netns(tb, extack, false);
4ff66cae7f10b6 Christian Brauner 2018-02-07 @3281  	if (err < 0)
4ff66cae7f10b6 Christian Brauner 2018-02-07  3282  		goto errout;
4ff66cae7f10b6 Christian Brauner 2018-02-07  3283  
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3284  	tgt_net = rtnl_link_get_net_capable(skb, net, tb, CAP_NET_ADMIN);
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3285  	if (IS_ERR(tgt_net))
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3286  		return PTR_ERR(tgt_net);
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3287  
0157f60c0caea2 Patrick McHardy   2007-06-13  3288  	if (ifm->ifi_index > 0)
a3d1289126e7b1 Eric Dumazet      2009-10-21  3289  		dev = __dev_get_by_index(net, ifm->ifi_index);
76c9ac0ee878f6 Jiri Pirko        2019-09-30  3290  	else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME])
5ea08b5286f66e Florent Fourcot   2022-04-15  3291  		dev = rtnl_dev_get(net, tb);
0157f60c0caea2 Patrick McHardy   2007-06-13  3292  	else
3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3293  		err = -EINVAL;
0157f60c0caea2 Patrick McHardy   2007-06-13  3294  
3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3295  	if (dev)
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3296  		err = do_setlink(skb, dev, tgt_net, ifm, extack, tb, 0);
3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3297  	else if (!err)
0157f60c0caea2 Patrick McHardy   2007-06-13  3298  		err = -ENODEV;
0157f60c0caea2 Patrick McHardy   2007-06-13  3299  
da5e0494c542dd Thomas Graf       2006-08-10  3300  errout:
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3301  	put_net(tgt_net);
b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3302  
^1da177e4c3f41 Linus Torvalds    2005-04-16  3303  	return err;
^1da177e4c3f41 Linus Torvalds    2005-04-16  3304  }
^1da177e4c3f41 Linus Torvalds    2005-04-16  3305  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
  2024-10-11  7:36   ` kernel test robot
@ 2024-10-11 16:17     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-11 16:17 UTC (permalink / raw)
  To: lkp; +Cc: davem, edumazet, kuba, kuniyu, llvm, netdev, oe-kbuild-all,
	pabeni

From: kernel test robot <lkp@intel.com>
Date: Fri, 11 Oct 2024 15:36:55 +0800
> cc6090e985d7d6 Jiri Pirko        2019-09-30  3264  
> c21ef3e343ae91 David Ahern       2017-04-16  3265  static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> c21ef3e343ae91 David Ahern       2017-04-16  3266  			struct netlink_ext_ack *extack)
> 0157f60c0caea2 Patrick McHardy   2007-06-13  3267  {
> 3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3268  	struct ifinfomsg *ifm = nlmsg_data(nlh);
> 3b1e0a655f8eba YOSHIFUJI Hideaki 2008-03-26  3269  	struct net *net = sock_net(skb->sk);
> 0157f60c0caea2 Patrick McHardy   2007-06-13  3270  	struct nlattr *tb[IFLA_MAX+1];
> 3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3271  	struct net_device *dev = NULL;
> b27f78e2575aa2 Kuniyuki Iwashima 2024-10-09  3272  	struct net *tgt_net;
> 3a6cb17da69fbf Kuniyuki Iwashima 2024-10-09  3273  	int err;
> 0157f60c0caea2 Patrick McHardy   2007-06-13  3274  
> 8cb081746c031f Johannes Berg     2019-04-26  3275  	err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFLA_MAX,
> 8cb081746c031f Johannes Berg     2019-04-26  3276  				     ifla_policy, extack);
> 0157f60c0caea2 Patrick McHardy   2007-06-13  3277  	if (err < 0)
> 0157f60c0caea2 Patrick McHardy   2007-06-13  3278  		goto errout;
> 0157f60c0caea2 Patrick McHardy   2007-06-13  3279  
> 4ff66cae7f10b6 Christian Brauner 2018-02-07  3280  	err = rtnl_ensure_unique_netns(tb, extack, false);
> 4ff66cae7f10b6 Christian Brauner 2018-02-07 @3281  	if (err < 0)
> 4ff66cae7f10b6 Christian Brauner 2018-02-07  3282  		goto errout;

Oops, I'll simply remove the errout label in v2.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-10-11 16:18 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-09 23:16 [PATCH v1 net-next 00/13] rtnetlink: Refactor rtnl_{new,del,set}link() for per-netns RTNL Kuniyuki Iwashima
2024-10-09 23:16 ` [PATCH v1 net-next 01/13] rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs Kuniyuki Iwashima
2024-10-10 12:36   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 02/13] rtnetlink: Call validate_linkmsg() in do_setlink() Kuniyuki Iwashima
2024-10-10 12:44   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 03/13] rtnetlink: Factorise do_setlink() path from __rtnl_newlink() Kuniyuki Iwashima
2024-10-10 12:39   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 04/13] rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink() Kuniyuki Iwashima
2024-10-10 12:40   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 05/13] rtnetlink: Move rtnl_link_ops_get() and retry " Kuniyuki Iwashima
2024-10-09 23:16 ` [PATCH v1 net-next 06/13] rtnetlink: Move ops->validate " Kuniyuki Iwashima
2024-10-10 12:48   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 07/13] rtnetlink: Protect struct rtnl_link_ops with SRCU Kuniyuki Iwashima
2024-10-10 13:02   ` Eric Dumazet
2024-10-10 16:20     ` Kuniyuki Iwashima
2024-10-09 23:16 ` [PATCH v1 net-next 08/13] rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink() Kuniyuki Iwashima
2024-10-10 13:08   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 09/13] rtnetlink: Fetch IFLA_LINK_NETNSID " Kuniyuki Iwashima
2024-10-10 13:09   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 10/13] rtnetlink: Clean up rtnl_dellink() Kuniyuki Iwashima
2024-10-10 13:10   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 11/13] rtnetlink: Clean up rtnl_setlink() Kuniyuki Iwashima
2024-10-10 13:11   ` Eric Dumazet
2024-10-09 23:16 ` [PATCH v1 net-next 12/13] rtnetlink: Call rtnl_link_get_net_capable() in do_setlink() Kuniyuki Iwashima
2024-10-10 13:15   ` Eric Dumazet
2024-10-11  7:36   ` kernel test robot
2024-10-11 16:17     ` Kuniyuki Iwashima
2024-10-09 23:16 ` [PATCH v1 net-next 13/13] rtnetlink: Protect struct rtnl_af_ops with SRCU Kuniyuki Iwashima
2024-10-10 13:16   ` Eric Dumazet
2024-10-10 16:27     ` Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).