* [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel
@ 2024-10-23 2:31 Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link Xiao Liang
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 2:31 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Ido Schimmel
This patch series includes some netns-related improvements and fixes for
RTNL and ip_tunnel, to make link creation more intuitive:
- Creating link in another net namespace doesn't conflict with link names
in current one.
- Add a flag in rtnl_ops, to avoid netns change when link-netns is present
if possible.
- When creating ip tunnel (e.g. GRE) in another netns, use current as
link-netns if not specified explicitly.
So that
# modprobe ip_gre netns_atomic=1
# ip link add netns ns1 link-netns ns2 tun0 type gre ...
will create tun0 in ns1, rather than create it in ns2 and move to ns1.
And don't conflict with another interface named "tun0" in current netns.
---
Xiao Liang (5):
rtnetlink: Lookup device in target netns when creating link
rtnetlink: Add netns_atomic flag in rtnl_link_ops
net: ip_tunnel: Build flow in underlay net namespace
net: ip_tunnel: Add source netns support for newlink
net: ip_gre: Add netns_atomic module parameter
include/net/ip_tunnels.h | 3 +++
include/net/rtnetlink.h | 3 +++
net/core/rtnetlink.c | 15 ++++++++++-----
net/ipv4/ip_gre.c | 15 +++++++++++++--
net/ipv4/ip_tunnel.c | 27 ++++++++++++++++++---------
5 files changed, 47 insertions(+), 16 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
@ 2024-10-23 2:31 ` Xiao Liang
2024-10-23 3:49 ` Kuniyuki Iwashima
2024-10-23 2:31 ` [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops Xiao Liang
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 2:31 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Ido Schimmel
When creating link, lookup for existing device in target net namespace
instead of current one.
For example, two links created by:
# ip link add dummy1 type dummy
# ip link add netns ns1 dummy1 type dummy
should have no conflict since they are in different namespaces.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
net/core/rtnetlink.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 194a81e5f608..ff8d25acfc00 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3733,20 +3733,24 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
{
struct nlattr ** const tb = tbs->tb;
struct net *net = sock_net(skb->sk);
+ struct net *device_net;
struct net_device *dev;
struct ifinfomsg *ifm;
bool link_specified;
+ /* When creating, lookup for existing device in target net namespace */
+ device_net = nlh->nlmsg_flags & NLM_F_CREATE ? tgt_net : net;
+
ifm = nlmsg_data(nlh);
if (ifm->ifi_index > 0) {
link_specified = true;
- dev = __dev_get_by_index(net, ifm->ifi_index);
+ dev = __dev_get_by_index(device_net, ifm->ifi_index);
} else if (ifm->ifi_index < 0) {
NL_SET_ERR_MSG(extack, "ifindex can't be negative");
return -EINVAL;
} else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
link_specified = true;
- dev = rtnl_dev_get(net, tb);
+ dev = rtnl_dev_get(device_net, tb);
} else {
link_specified = false;
dev = NULL;
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link Xiao Liang
@ 2024-10-23 2:31 ` Xiao Liang
2024-10-23 4:03 ` Kuniyuki Iwashima
2024-10-23 2:31 ` [PATCH net-next 3/5] net: ip_tunnel: Build flow in underlay net namespace Xiao Liang
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 2:31 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Ido Schimmel
Currently these two steps are needed to create a net device with
IFLA_LINK_NETNSID attr:
1. create and setup the netdev in the link netns with
rtnl_create_link()
2. move it to the target netns with dev_change_net_namespace()
This has some side effects, including extra ifindex allocation, ifname
validation and link notifications in link netns.
Add a netns_atomic flag, that if set to true, devices will be created in
the target netns directly.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
include/net/rtnetlink.h | 3 +++
net/core/rtnetlink.c | 7 ++++---
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e0d9a8eae6b6..59594cef2272 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -74,6 +74,8 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
* @srcu: Used internally
* @kind: Identifier
* @netns_refund: Physical device, move to init_net on netns exit
+ * @netns_atomic: Device can be created in target netns even when
+ * link-netns is different, avoiding netns change.
* @maxtype: Highest device specific netlink attribute number
* @policy: Netlink policy for device specific attribute validation
* @validate: Optional validation function for netlink/changelink parameters
@@ -115,6 +117,7 @@ struct rtnl_link_ops {
void (*setup)(struct net_device *dev);
bool netns_refund;
+ bool netns_atomic;
unsigned int maxtype;
const struct nla_policy *policy;
int (*validate)(struct nlattr *tb[],
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ff8d25acfc00..99250779d8ba 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3679,8 +3679,9 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
name_assign_type = NET_NAME_ENUM;
}
- dev = rtnl_create_link(link_net ? : tgt_net, ifname,
- name_assign_type, ops, tb, extack);
+ dev = rtnl_create_link(!link_net || ops->netns_atomic ?
+ tgt_net : link_net, ifname, name_assign_type,
+ ops, tb, extack);
if (IS_ERR(dev)) {
err = PTR_ERR(dev);
goto out;
@@ -3700,7 +3701,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
err = rtnl_configure_link(dev, ifm, portid, nlh);
if (err < 0)
goto out_unregister;
- if (link_net) {
+ if (link_net && !ops->netns_atomic) {
err = dev_change_net_namespace(dev, tgt_net, ifname);
if (err < 0)
goto out_unregister;
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 3/5] net: ip_tunnel: Build flow in underlay net namespace
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops Xiao Liang
@ 2024-10-23 2:31 ` Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 4/5] net: ip_tunnel: Add source netns support for newlink Xiao Liang
` (2 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 2:31 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Ido Schimmel
Build IPv4 flow in underlay net namespace, where encapsulated packets
are routed.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
net/ipv4/ip_tunnel.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d591c73e2c0e..09ee39e7b617 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -294,7 +294,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
ip_tunnel_init_flow(&fl4, iph->protocol, iph->daddr,
iph->saddr, tunnel->parms.o_key,
- iph->tos & INET_DSCP_MASK, dev_net(dev),
+ iph->tos & INET_DSCP_MASK, tunnel->net,
tunnel->parms.link, tunnel->fwmark, 0, 0);
rt = ip_route_output_key(tunnel->net, &fl4);
@@ -611,7 +611,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
}
ip_tunnel_init_flow(&fl4, proto, key->u.ipv4.dst, key->u.ipv4.src,
tunnel_id_to_key32(key->tun_id),
- tos & INET_DSCP_MASK, dev_net(dev), 0, skb->mark,
+ tos & INET_DSCP_MASK, tunnel->net, 0, skb->mark,
skb_get_hash(skb), key->flow_flags);
if (!tunnel_hlen)
@@ -774,7 +774,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
ip_tunnel_init_flow(&fl4, protocol, dst, tnl_params->saddr,
tunnel->parms.o_key, tos & INET_DSCP_MASK,
- dev_net(dev), READ_ONCE(tunnel->parms.link),
+ tunnel->net, READ_ONCE(tunnel->parms.link),
tunnel->fwmark, skb_get_hash(skb), 0);
if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0)
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 4/5] net: ip_tunnel: Add source netns support for newlink
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
` (2 preceding siblings ...)
2024-10-23 2:31 ` [PATCH net-next 3/5] net: ip_tunnel: Build flow in underlay net namespace Xiao Liang
@ 2024-10-23 2:31 ` Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 5/5] net: ip_gre: Add netns_atomic module parameter Xiao Liang
2024-10-29 23:17 ` [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Jakub Kicinski
5 siblings, 0 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 2:31 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Ido Schimmel
Add ip_tunnel_newlink_net() that accepts src_net parameter, which is
passed from newlink() of RTNL ops, and use it as tunnel source netns.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
include/net/ip_tunnels.h | 3 +++
net/ipv4/ip_tunnel.c | 21 +++++++++++++++------
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 4e4f9e24c9c1..67f56440e72a 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -408,6 +408,9 @@ int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[],
struct ip_tunnel_parm_kern *p, __u32 fwmark);
int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
struct ip_tunnel_parm_kern *p, __u32 fwmark);
+int ip_tunnel_newlink_net(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct ip_tunnel_parm_kern *p,
+ __u32 fwmark);
void ip_tunnel_setup(struct net_device *dev, unsigned int net_id);
bool ip_tunnel_netlink_encap_parms(struct nlattr *data[],
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 09ee39e7b617..91d54ebe09c7 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -1213,17 +1213,17 @@ void ip_tunnel_delete_nets(struct list_head *net_list, unsigned int id,
}
EXPORT_SYMBOL_GPL(ip_tunnel_delete_nets);
-int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
- struct ip_tunnel_parm_kern *p, __u32 fwmark)
+int ip_tunnel_newlink_net(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct ip_tunnel_parm_kern *p,
+ __u32 fwmark)
{
struct ip_tunnel *nt;
- struct net *net = dev_net(dev);
struct ip_tunnel_net *itn;
int mtu;
int err;
nt = netdev_priv(dev);
- itn = net_generic(net, nt->ip_tnl_net_id);
+ itn = net_generic(src_net, nt->ip_tnl_net_id);
if (nt->collect_md) {
if (rtnl_dereference(itn->collect_md_tun))
@@ -1233,7 +1233,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
return -EEXIST;
}
- nt->net = net;
+ nt->net = src_net;
nt->parms = *p;
nt->fwmark = fwmark;
err = register_netdevice(dev);
@@ -1265,6 +1265,13 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
err_register_netdevice:
return err;
}
+EXPORT_SYMBOL_GPL(ip_tunnel_newlink_net);
+
+int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
+ struct ip_tunnel_parm_kern *p, __u32 fwmark)
+{
+ return ip_tunnel_newlink_net(dev_net(dev), dev, tb, p, fwmark);
+}
EXPORT_SYMBOL_GPL(ip_tunnel_newlink);
int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[],
@@ -1326,7 +1333,9 @@ int ip_tunnel_init(struct net_device *dev)
}
tunnel->dev = dev;
- tunnel->net = dev_net(dev);
+ if (!tunnel->net)
+ tunnel->net = dev_net(dev);
+
strscpy(tunnel->parms.name, dev->name);
iph->version = 4;
iph->ihl = 5;
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 5/5] net: ip_gre: Add netns_atomic module parameter
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
` (3 preceding siblings ...)
2024-10-23 2:31 ` [PATCH net-next 4/5] net: ip_tunnel: Add source netns support for newlink Xiao Liang
@ 2024-10-23 2:31 ` Xiao Liang
2024-10-29 23:17 ` [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Jakub Kicinski
5 siblings, 0 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 2:31 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Ido Schimmel
If set to true, create device in target netns (when different from
link-netns) without netns change, and use current netns as link-netns
if not specified explicitly.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
net/ipv4/ip_gre.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f1f31ebfc793..6ef7e98e4620 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -107,6 +107,11 @@ static bool log_ecn_error = true;
module_param(log_ecn_error, bool, 0644);
MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
+static bool netns_atomic;
+module_param(netns_atomic, bool, 0444);
+MODULE_PARM_DESC(netns_atomic,
+ "Create tunnel in target net namespace directly and use current net namespace as link-netns by default");
+
static struct rtnl_link_ops ipgre_link_ops __read_mostly;
static const struct header_ops ipgre_header_ops;
@@ -1393,6 +1398,7 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct net *link_net = netns_atomic ? src_net : dev_net(dev);
struct ip_tunnel_parm_kern p;
__u32 fwmark = 0;
int err;
@@ -1404,13 +1410,14 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev,
err = ipgre_netlink_parms(dev, data, tb, &p, &fwmark);
if (err < 0)
return err;
- return ip_tunnel_newlink(dev, tb, &p, fwmark);
+ return ip_tunnel_newlink_net(link_net, dev, tb, &p, fwmark);
}
static int erspan_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct net *link_net = netns_atomic ? src_net : dev_net(dev);
struct ip_tunnel_parm_kern p;
__u32 fwmark = 0;
int err;
@@ -1422,7 +1429,7 @@ static int erspan_newlink(struct net *src_net, struct net_device *dev,
err = erspan_netlink_parms(dev, data, tb, &p, &fwmark);
if (err)
return err;
- return ip_tunnel_newlink(dev, tb, &p, fwmark);
+ return ip_tunnel_newlink_net(link_net, dev, tb, &p, fwmark);
}
static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[],
@@ -1777,6 +1784,10 @@ static int __init ipgre_init(void)
pr_info("GRE over IPv4 tunneling driver\n");
+ ipgre_link_ops.netns_atomic = netns_atomic;
+ ipgre_tap_ops.netns_atomic = netns_atomic;
+ erspan_link_ops.netns_atomic = netns_atomic;
+
err = register_pernet_device(&ipgre_net_ops);
if (err < 0)
return err;
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link
2024-10-23 2:31 ` [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link Xiao Liang
@ 2024-10-23 3:49 ` Kuniyuki Iwashima
2024-10-23 4:19 ` Xiao Liang
0 siblings, 1 reply; 14+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-23 3:49 UTC (permalink / raw)
To: shaw.leon; +Cc: davem, dsahern, edumazet, idosch, kuba, kuniyu, netdev, pabeni
From: Xiao Liang <shaw.leon@gmail.com>
Date: Wed, 23 Oct 2024 10:31:42 +0800
> When creating link, lookup for existing device in target net namespace
> instead of current one.
> For example, two links created by:
>
> # ip link add dummy1 type dummy
> # ip link add netns ns1 dummy1 type dummy
>
> should have no conflict since they are in different namespaces.
>
> Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
> ---
> net/core/rtnetlink.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 194a81e5f608..ff8d25acfc00 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3733,20 +3733,24 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> {
> struct nlattr ** const tb = tbs->tb;
> struct net *net = sock_net(skb->sk);
> + struct net *device_net;
> struct net_device *dev;
> struct ifinfomsg *ifm;
> bool link_specified;
>
> + /* When creating, lookup for existing device in target net namespace */
> + device_net = nlh->nlmsg_flags & NLM_F_CREATE ? tgt_net : net;
Technically, this changes uAPI behaviour.
Let's say a user wants to
1) move the device X in the current netns to another if exists, otherwise
2) create a new device X in the target netns
This can be achieved by setting NLM_F_CREATE and IFLA_NET_NS_PID,
IFLA_NET_NS_FD, or IFLA_TARGET_NETNSID.
But with this change, the device X in the current netns will not be moved,
and a new device X is created in the target netns.
> +
> ifm = nlmsg_data(nlh);
> if (ifm->ifi_index > 0) {
> link_specified = true;
> - dev = __dev_get_by_index(net, ifm->ifi_index);
> + dev = __dev_get_by_index(device_net, ifm->ifi_index);
> } else if (ifm->ifi_index < 0) {
> NL_SET_ERR_MSG(extack, "ifindex can't be negative");
> return -EINVAL;
> } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
> link_specified = true;
> - dev = rtnl_dev_get(net, tb);
> + dev = rtnl_dev_get(device_net, tb);
> } else {
> link_specified = false;
> dev = NULL;
> --
> 2.47.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops
2024-10-23 2:31 ` [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops Xiao Liang
@ 2024-10-23 4:03 ` Kuniyuki Iwashima
2024-10-23 4:36 ` Xiao Liang
0 siblings, 1 reply; 14+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-23 4:03 UTC (permalink / raw)
To: shaw.leon; +Cc: davem, dsahern, edumazet, idosch, kuba, kuniyu, netdev, pabeni
From: Xiao Liang <shaw.leon@gmail.com>
Date: Wed, 23 Oct 2024 10:31:43 +0800
> Currently these two steps are needed to create a net device with
> IFLA_LINK_NETNSID attr:
>
> 1. create and setup the netdev in the link netns with
> rtnl_create_link()
> 2. move it to the target netns with dev_change_net_namespace()
IIRC, this is to send the notification in the link netns.
>
> This has some side effects, including extra ifindex allocation, ifname
> validation and link notifications in link netns.
>
> Add a netns_atomic flag, that if set to true, devices will be created in
> the target netns directly.
>
> Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
> ---
> include/net/rtnetlink.h | 3 +++
> net/core/rtnetlink.c | 7 ++++---
> 2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
> index e0d9a8eae6b6..59594cef2272 100644
> --- a/include/net/rtnetlink.h
> +++ b/include/net/rtnetlink.h
> @@ -74,6 +74,8 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
> * @srcu: Used internally
> * @kind: Identifier
> * @netns_refund: Physical device, move to init_net on netns exit
> + * @netns_atomic: Device can be created in target netns even when
> + * link-netns is different, avoiding netns change.
> * @maxtype: Highest device specific netlink attribute number
> * @policy: Netlink policy for device specific attribute validation
> * @validate: Optional validation function for netlink/changelink parameters
> @@ -115,6 +117,7 @@ struct rtnl_link_ops {
> void (*setup)(struct net_device *dev);
>
> bool netns_refund;
> + bool netns_atomic;
> unsigned int maxtype;
> const struct nla_policy *policy;
> int (*validate)(struct nlattr *tb[],
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index ff8d25acfc00..99250779d8ba 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3679,8 +3679,9 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
> name_assign_type = NET_NAME_ENUM;
> }
>
> - dev = rtnl_create_link(link_net ? : tgt_net, ifname,
> - name_assign_type, ops, tb, extack);
> + dev = rtnl_create_link(!link_net || ops->netns_atomic ?
> + tgt_net : link_net, ifname, name_assign_type,
> + ops, tb, extack);
> if (IS_ERR(dev)) {
> err = PTR_ERR(dev);
> goto out;
> @@ -3700,7 +3701,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm,
> err = rtnl_configure_link(dev, ifm, portid, nlh);
> if (err < 0)
> goto out_unregister;
> - if (link_net) {
> + if (link_net && !ops->netns_atomic) {
> err = dev_change_net_namespace(dev, tgt_net, ifname);
> if (err < 0)
> goto out_unregister;
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link
2024-10-23 3:49 ` Kuniyuki Iwashima
@ 2024-10-23 4:19 ` Xiao Liang
0 siblings, 0 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 4:19 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, dsahern, edumazet, idosch, kuba, netdev, pabeni
On Wed, Oct 23, 2024 at 11:49 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -3733,20 +3733,24 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> > {
> > struct nlattr ** const tb = tbs->tb;
> > struct net *net = sock_net(skb->sk);
> > + struct net *device_net;
> > struct net_device *dev;
> > struct ifinfomsg *ifm;
> > bool link_specified;
> >
> > + /* When creating, lookup for existing device in target net namespace */
> > + device_net = nlh->nlmsg_flags & NLM_F_CREATE ? tgt_net : net;
>
> Technically, this changes uAPI behaviour.
>
> Let's say a user wants to
>
> 1) move the device X in the current netns to another if exists, otherwise
> 2) create a new device X in the target netns
>
> This can be achieved by setting NLM_F_CREATE and IFLA_NET_NS_PID,
> IFLA_NET_NS_FD, or IFLA_TARGET_NETNSID.
>
> But with this change, the device X in the current netns will not be moved,
> and a new device X is created in the target netns.
You're right, what about testing for NLM_F_EXCL aslo?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops
2024-10-23 4:03 ` Kuniyuki Iwashima
@ 2024-10-23 4:36 ` Xiao Liang
0 siblings, 0 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-23 4:36 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, dsahern, edumazet, idosch, kuba, netdev, pabeni
On Wed, Oct 23, 2024 at 12:03 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Xiao Liang <shaw.leon@gmail.com>
> Date: Wed, 23 Oct 2024 10:31:43 +0800
> > Currently these two steps are needed to create a net device with
> > IFLA_LINK_NETNSID attr:
> >
> > 1. create and setup the netdev in the link netns with
> > rtnl_create_link()
> > 2. move it to the target netns with dev_change_net_namespace()
>
> IIRC, this is to send the notification in the link netns.
>
Yes. This patch changes this behavior only when the new flag is set.
I doubt if it's really necessary to send link create/delete notifications
to link-netns. Also the current behavior is somewhat inconsistent, say
1) ip link add netns n1 link-netns n2 link eth0 mac0 type macvlan
2) ip -n n2 link add netns n1 link eth0 mac0 type macvlan
Intuitively, the two commands are equivalent. But notification is sent
only for 1.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
` (4 preceding siblings ...)
2024-10-23 2:31 ` [PATCH net-next 5/5] net: ip_gre: Add netns_atomic module parameter Xiao Liang
@ 2024-10-29 23:17 ` Jakub Kicinski
2024-10-30 2:10 ` Xiao Liang
5 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-10-29 23:17 UTC (permalink / raw)
To: Xiao Liang
Cc: netdev, David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Kuniyuki Iwashima, Ido Schimmel
On Wed, 23 Oct 2024 10:31:41 +0800 Xiao Liang wrote:
> This patch series includes some netns-related improvements and fixes for
> RTNL and ip_tunnel, to make link creation more intuitive:
>
> - Creating link in another net namespace doesn't conflict with link names
> in current one.
> - Add a flag in rtnl_ops, to avoid netns change when link-netns is present
> if possible.
> - When creating ip tunnel (e.g. GRE) in another netns, use current as
> link-netns if not specified explicitly.
>
> So that
>
> # modprobe ip_gre netns_atomic=1
> # ip link add netns ns1 link-netns ns2 tun0 type gre ...
Do you think the netns_atomic module param is really necessary?
I doubt anyone cares about the event popping up in the wrong
name space first.
BTW would be good to have tests for this. At least the behavior
around name / ifindex collisions in different namespaces.
You can possibly extend/re-purpose netns-name.sh for this.
For notifications you could use python and subscribe to the events
using a YNL socket. May be easier than dealing with ip monitor
as a background process. But either way is fine.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel
2024-10-29 23:17 ` [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Jakub Kicinski
@ 2024-10-30 2:10 ` Xiao Liang
2024-10-30 23:35 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Xiao Liang @ 2024-10-30 2:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Kuniyuki Iwashima, Ido Schimmel
On Wed, Oct 30, 2024 at 7:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 23 Oct 2024 10:31:41 +0800 Xiao Liang wrote:
> > This patch series includes some netns-related improvements and fixes for
> > RTNL and ip_tunnel, to make link creation more intuitive:
> >
> > - Creating link in another net namespace doesn't conflict with link names
> > in current one.
> > - Add a flag in rtnl_ops, to avoid netns change when link-netns is present
> > if possible.
> > - When creating ip tunnel (e.g. GRE) in another netns, use current as
> > link-netns if not specified explicitly.
> >
> > So that
> >
> > # modprobe ip_gre netns_atomic=1
> > # ip link add netns ns1 link-netns ns2 tun0 type gre ...
>
> Do you think the netns_atomic module param is really necessary?
> I doubt anyone cares about the event popping up in the wrong
> name space first.
We used FRRouting in our solution which listens to link notifications to
set up corresponding objects in userspace. Since the events are sent
in different namespaces (thus via different RTNL sockets), we can't
guarantee that the events are received in the correct order, and have
trouble processing them. The way to solve this problem I can think of is
to have a multi-netns RTNL socket where all events are synchronized,
or to eliminate the redundant events in the first place. The latter seems
easier to implement.
>
> BTW would be good to have tests for this. At least the behavior
> around name / ifindex collisions in different namespaces.
> You can possibly extend/re-purpose netns-name.sh for this.
>
> For notifications you could use python and subscribe to the events
> using a YNL socket. May be easier than dealing with ip monitor
> as a background process. But either way is fine.
I will try to add some tests. Thanks!
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel
2024-10-30 2:10 ` Xiao Liang
@ 2024-10-30 23:35 ` Jakub Kicinski
2024-10-31 3:08 ` Xiao Liang
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-10-30 23:35 UTC (permalink / raw)
To: Xiao Liang
Cc: netdev, David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Kuniyuki Iwashima, Ido Schimmel
On Wed, 30 Oct 2024 10:10:32 +0800 Xiao Liang wrote:
> > Do you think the netns_atomic module param is really necessary?
> > I doubt anyone cares about the event popping up in the wrong
> > name space first.
>
> We used FRRouting in our solution which listens to link notifications to
> set up corresponding objects in userspace. Since the events are sent
> in different namespaces (thus via different RTNL sockets), we can't
> guarantee that the events are received in the correct order, and have
> trouble processing them. The way to solve this problem I can think of is
> to have a multi-netns RTNL socket where all events are synchronized,
> or to eliminate the redundant events in the first place. The latter seems
> easier to implement.
I think we're on the same page. I'm saying that any potential user
will run into a similar problem, and I don't see a clear use case
for notifications in both namespaces. So we can try to make the
behavior of netns_atomic=1 the default one and get rid of the module
param.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel
2024-10-30 23:35 ` Jakub Kicinski
@ 2024-10-31 3:08 ` Xiao Liang
0 siblings, 0 replies; 14+ messages in thread
From: Xiao Liang @ 2024-10-31 3:08 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Kuniyuki Iwashima, Ido Schimmel
On Thu, Oct 31, 2024 at 7:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 30 Oct 2024 10:10:32 +0800 Xiao Liang wrote:
> > > Do you think the netns_atomic module param is really necessary?
> > > I doubt anyone cares about the event popping up in the wrong
> > > name space first.
> >
> > We used FRRouting in our solution which listens to link notifications to
> > set up corresponding objects in userspace. Since the events are sent
> > in different namespaces (thus via different RTNL sockets), we can't
> > guarantee that the events are received in the correct order, and have
> > trouble processing them. The way to solve this problem I can think of is
> > to have a multi-netns RTNL socket where all events are synchronized,
> > or to eliminate the redundant events in the first place. The latter seems
> > easier to implement.
>
> I think we're on the same page. I'm saying that any potential user
> will run into a similar problem, and I don't see a clear use case
> for notifications in both namespaces. So we can try to make the
> behavior of netns_atomic=1 the default one and get rid of the module
> param.
Ah.. I misunderstood your question... Besides notifications there's another
behavior change.
In this patchset, RTNL core passes link-netns as src_net to drivers. But
ip tunnel driver doesn't support source netns currently, and that's why
we have Patch 4. As a side effect, source netns will be used if link-netns
is not specified.For example:
ip -n ns1 link add netns ns2 gre1 type gre ...
In current implementation, the link-netns is ns2. While with netns_atomic=1,
link-netns will be ns1 (source netns). The module param serves as a way to
keep the current behavior.
I think we can make it default for drivers that already have source
netns support.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-10-31 3:09 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 2:31 [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 1/5] rtnetlink: Lookup device in target netns when creating link Xiao Liang
2024-10-23 3:49 ` Kuniyuki Iwashima
2024-10-23 4:19 ` Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 2/5] rtnetlink: Add netns_atomic flag in rtnl_link_ops Xiao Liang
2024-10-23 4:03 ` Kuniyuki Iwashima
2024-10-23 4:36 ` Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 3/5] net: ip_tunnel: Build flow in underlay net namespace Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 4/5] net: ip_tunnel: Add source netns support for newlink Xiao Liang
2024-10-23 2:31 ` [PATCH net-next 5/5] net: ip_gre: Add netns_atomic module parameter Xiao Liang
2024-10-29 23:17 ` [PATCH net-next 0/5] net: Improve netns handling in RTNL and ip_tunnel Jakub Kicinski
2024-10-30 2:10 ` Xiao Liang
2024-10-30 23:35 ` Jakub Kicinski
2024-10-31 3:08 ` Xiao Liang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).