* [RFC PATCH net-next 3/5] dev/netns: allow to get netns from nsindex in rtnl msg
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel
In-Reply-To: <1355332630-4256-1-git-send-email-nicolas.dichtel@6wind.com>

This patch allows a netdevice to be moved to another netns by specifying its nsindex.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/net_namespace.h  |  1 +
 include/uapi/linux/if_link.h |  1 +
 net/core/net_namespace.c     | 14 ++++++++++++++
 net/core/rtnetlink.c         |  7 ++++++-
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c373f2e..68e7a36 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -151,6 +151,7 @@ extern struct list_head net_namespace_list;
 
 extern struct net *get_net_ns_by_pid(pid_t pid);
 extern struct net *get_net_ns_by_fd(int pid);
+extern struct net *get_net_ns_by_nsindex(int nsindex);
 
 #ifdef CONFIG_NET_NS
 extern void __put_net(struct net *net);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 60f3b6b..6720a47 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -142,6 +142,7 @@ enum {
 #define IFLA_PROMISCUITY IFLA_PROMISCUITY
 	IFLA_NUM_TX_QUEUES,
 	IFLA_NUM_RX_QUEUES,
+	IFLA_NET_NS_INDEX,
 	__IFLA_MAX
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2ae22b0..18fc62f 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -399,6 +399,20 @@ struct net *get_net_ns_by_pid(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_pid);
 
+struct net *get_net_ns_by_nsindex(int nsindex)
+{
+	struct net *net;
+
+	ASSERT_RTNL();
+	for_each_net(net)
+		if (net->nsindex == nsindex) {
+			get_net(net);
+			break;
+		}
+	return net;
+}
+EXPORT_SYMBOL_GPL(get_net_ns_by_nsindex);
+
 static struct genl_family netns_nl_family = {
 	.id		= GENL_ID_GENERATE,
 	.name		= NETNS_GENL_NAME,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1868625..e22954a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1115,6 +1115,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_LINKINFO]		= { .type = NLA_NESTED },
 	[IFLA_NET_NS_PID]	= { .type = NLA_U32 },
 	[IFLA_NET_NS_FD]	= { .type = NLA_U32 },
+	[IFLA_NET_NS_INDEX]	= { .type = NLA_U32 },
 	[IFLA_IFALIAS]	        = { .type = NLA_STRING, .len = IFALIASZ-1 },
 	[IFLA_VFINFO_LIST]	= {. type = NLA_NESTED },
 	[IFLA_VF_PORTS]		= { .type = NLA_NESTED },
@@ -1171,6 +1172,8 @@ struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 		net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID]));
 	else if (tb[IFLA_NET_NS_FD])
 		net = get_net_ns_by_fd(nla_get_u32(tb[IFLA_NET_NS_FD]));
+	else if (tb[IFLA_NET_NS_INDEX])
+		net = get_net_ns_by_nsindex(nla_get_u32(tb[IFLA_NET_NS_INDEX]));
 	else
 		net = get_net(src_net);
 	return net;
@@ -1310,7 +1313,9 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	int send_addr_notify = 0;
 	int err;
 
-	if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]) {
+	if (tb[IFLA_NET_NS_PID] ||
+	    tb[IFLA_NET_NS_FD] ||
+	    tb[IFLA_NET_NS_INDEX]) {
 		struct net *net = rtnl_link_get_net(dev_net(dev), tb);
 		if (IS_ERR(net)) {
 			err = PTR_ERR(net);
-- 
1.8.0.1
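
One detail worth checking in review: for_each_net() expands to list_for_each_entry(), and when that loop runs to completion the cursor is not NULL; it points at a container computed from the list head. As far as I can tell, get_net_ns_by_nsindex() would therefore return a bogus non-NULL pointer when no namespace matches. What follows is a userspace sketch only, with a hypothetical `struct ns` and a hand-rolled circular list standing in for net_namespace_list; it keeps the match in a separate variable so that a miss yields NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical stand-in for struct net: only the fields the lookup needs. */
struct ns {
	int nsindex;
	int refcount;
	struct list_node list;
};

#define ns_from_node(ptr) \
	((struct ns *)((char *)(ptr) - offsetof(struct ns, list)))

void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Return the namespace with the given index and take a reference on it,
 * or NULL when no entry matches. The result lives in its own variable so
 * that an exhausted loop cannot leak the cursor to the caller.
 */
struct ns *ns_lookup(struct list_node *head, int nsindex)
{
	struct list_node *pos;
	struct ns *found = NULL;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		struct ns *cur = ns_from_node(pos);

		if (cur->nsindex == nsindex) {
			cur->refcount++;	/* stands in for get_net() */
			found = cur;
			break;
		}
	}
	return found;
}
```

The callers in this series test the result in two different ways (IS_ERR() in do_setlink(), a NULL check in sock_setsockopt()), so whichever convention is chosen would need to be applied consistently.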

* [RFC PATCH net-next 2/5] netns: allow to dump netns with netlink
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel
In-Reply-To: <1355332630-4256-1-git-send-email-nicolas.dichtel@6wind.com>

This patch adds basic netlink support for netns. The user can dump all
existing netns and get the associated nsindex.
The user can also get the nsindex associated with a pid or an fd.

Initializing the genetlink family for netns is a chicken-and-egg problem:
genetlink init is done after init_net is created, so when init_net is
created we cannot yet call genl_register_family_with_ops(). That is why the
init part is called from the genetlink module.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/net_namespace.h |   1 +
 include/uapi/linux/netns.h  |  27 ++++++++
 net/core/net_namespace.c    | 157 ++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |   4 ++
 4 files changed, 189 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5db7a1b..c373f2e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -306,6 +306,7 @@ extern int register_pernet_subsys(struct pernet_operations *);
 extern void unregister_pernet_subsys(struct pernet_operations *);
 extern int register_pernet_device(struct pernet_operations *);
 extern void unregister_pernet_device(struct pernet_operations *);
+extern int netns_genl_register(void);
 
 struct ctl_table;
 struct ctl_table_header;
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 0000000..e1c1da3
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,27 @@
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_NOOP,
+	NETNS_CMD_GET,
+	__NETNS_CMD_MAX,
+};
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+	NETNSA_NSINDEX,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f5267e4..2ae22b0 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -14,6 +14,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -397,6 +399,161 @@ struct net *get_net_ns_by_pid(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_pid);
 
+static struct genl_family netns_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC, },
+	[NETNSA_NSINDEX]	= { .type = NLA_U32, },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(u32)) /* NETNSA_NSINDEX */
+	       ;
+}
+
+static int netns_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg;
+	void *hdr;
+	int ret = -ENOBUFS;
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
+			  &netns_nl_family, 0, NETNS_CMD_NOOP);
+	if (!hdr) {
+		ret = -EMSGSIZE;
+		goto err_out;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	return genlmsg_unicast(genl_info_net(info), msg, info->snd_portid);
+
+err_out:
+	nlmsg_free(msg);
+
+out:
+	return ret;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, NETNSA_NSINDEX, net->nsindex))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+
+	if (info->attrs[NETNSA_PID])
+		net = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		net = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		get_net(net);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GET, net);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(genl_info_net(info), msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+
+out:
+	put_net(net);
+	return err;
+}
+
+static int netns_nl_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	int i = 0, s_i = cb->args[0];
+	struct net *net;
+
+	rtnl_lock();
+	for_each_net(net) {
+		if (i < s_i) {
+			i++;
+			continue;
+		}
+
+		if (netns_nl_fill(skb, NETLINK_CB(cb->skb).portid,
+				  cb->nlh->nlmsg_seq, NLM_F_MULTI,
+				  NETNS_CMD_GET, net) <= 0)
+			goto out;
+
+		i++;
+	}
+
+out:
+	cb->args[0] = i;
+	rtnl_unlock();
+
+	return skb->len;
+}
+
+static struct genl_ops netns_nl_ops[] = {
+	{
+		.cmd = NETNS_CMD_NOOP,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_noop,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NETNS_CMD_GET,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_get,
+		.dumpit = netns_nl_cmd_dump,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_nl_family, netns_nl_ops,
+					     ARRAY_SIZE(netns_nl_ops));
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f2aabb6..6d25ddb 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -963,6 +963,10 @@ static int __init genl_init(void)
 	if (err < 0)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
1.8.0.1
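
The dump handler in this patch follows the usual netlink resume pattern: cb->args[0] remembers how many entries earlier passes already emitted, and each pass skips that many before filling the buffer again. Below is a small userspace model of that pattern, not kernel code; `struct dump_state`, `dump_pass` and the one-slot-per-entry cost are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cb->args[0] resume counter. */
struct dump_state {
	size_t next;	/* index of the first entry not yet emitted */
};

/*
 * Emit as many entries as fit in one buffer, starting where the previous
 * pass stopped. Each entry is assumed to cost one slot in `out`.
 * Returns the number of entries written; 0 means the dump is complete.
 */
size_t dump_pass(const int *nsindex, size_t count,
		 int *out, size_t out_slots, struct dump_state *st)
{
	size_t i, written = 0;

	assert(st != NULL);
	for (i = 0; i < count; i++) {
		if (i < st->next)
			continue;	/* already sent in an earlier pass */
		if (written == out_slots)
			break;		/* buffer full, resume here next time */
		out[written++] = nsindex[i];
	}
	st->next = i;
	return written;
}
```

In the kernel handler the same roles are played by skb->len as the return value and by netns_nl_fill() failing with -EMSGSIZE once the skb is full.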

* [RFC PATCH net-next 5/5] net/sock: add support of SO_NETNS
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel
In-Reply-To: <1355332630-4256-1-git-send-email-nicolas.dichtel@6wind.com>

This new setsockopt() option allows the user to change the netns of a socket.
It must be done early enough, before any bind(), etc.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 arch/alpha/include/asm/socket.h        |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  2 ++
 arch/h8300/include/asm/socket.h        |  2 ++
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/asm/socket.h         |  2 ++
 arch/m68k/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h    |  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h    |  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 28 ++++++++++++++++++++++++++++
 16 files changed, 58 insertions(+)

diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 0087d05..13aa509 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -77,6 +77,8 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #ifdef __KERNEL__
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 486df68..39cc927 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index 871f89b..ac7eef6 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -70,5 +70,7 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index 90a2e57..4d2a4e8 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 23d6759..ed4534b 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -79,4 +79,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/asm/socket.h b/arch/m32r/include/asm/socket.h
index 5e7088a..37d0eb0 100644
--- a/arch/m32r/include/asm/socket.h
+++ b/arch/m32r/include/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/m68k/include/uapi/asm/socket.h b/arch/m68k/include/uapi/asm/socket.h
index 285da3b..e79aad8 100644
--- a/arch/m68k/include/uapi/asm/socket.h
+++ b/arch/m68k/include/uapi/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 17307ab..356f943 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index af5366b..b899cf8 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -70,4 +70,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index d9ff473..8503329 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -69,6 +69,8 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		0x4024
 
+#define SO_NETNS		0x4025
+
 
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index eb0b186..1a520ff 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -77,4 +77,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 436d07c..cbdda59 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -76,4 +76,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index c83a937..c1c2853 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -66,6 +66,8 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		0x0027
 
+#define SO_NETNS		0x0028
+
 
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 38079be..a8f956d 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -81,4 +81,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 2d32d07..08c108c 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -73,4 +73,6 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define SO_NOFCS		43
 
+#define SO_NETNS		44
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index a692ef4..7ec288f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -895,6 +895,30 @@ set_rcvbuf:
 		sock_valbool_flag(sk, SOCK_NOFCS, valbool);
 		break;
 
+	case SO_NETNS:
+#ifdef CONFIG_NET_NS
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+			ret = -EPERM;
+		else if (sk->sk_state != TCP_CLOSE)
+			ret = -EBUSY;	/* Too late to change netns */
+		else {
+			struct net *net = get_net_ns_by_nsindex(val);
+
+			if (net) {
+				/* We cannot use sk_change_net() because sk
+				 * will not be released with
+				 * sk_release_kernel(). Let's do it manually.
+				 */
+				put_net(sock_net(sk));
+				sock_net_set(sk, net);
+			} else
+				ret = -EINVAL;
+		}
+#else
+		ret = -EOPNOTSUPP;
+#endif
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1140,6 +1164,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 
 		goto lenout;
 
+	case SO_NETNS:
+		v.val = sock_net(sk)->nsindex;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.8.0.1
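
The SO_NETNS case in sock_setsockopt() checks three things in a fixed order, and each failure maps to its own errno. A compact userspace model of that ordering follows; `so_netns_check` and its boolean inputs are hypothetical names that replace ns_capable(), the sk_state test and the nsindex lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the checks made before a socket is moved to another netns.
 *   capable:    caller has CAP_NET_ADMIN in the socket's current netns
 *   sock_idle:  socket is still in its initial (closed) state
 *   ns_found:   the requested nsindex names an existing netns
 * Returns 0 when the move would be allowed, or a negative errno.
 */
int so_netns_check(bool capable, bool sock_idle, bool ns_found)
{
	if (!capable)
		return -EPERM;
	if (!sock_idle)
		return -EBUSY;	/* too late to change netns */
	if (!ns_found)
		return -EINVAL;
	return 0;
}
```

From userspace the intended call order would be socket(), then setsockopt() with SO_NETNS, then bind() or connect(). The option is only proposed in this RFC, so no real call is shown here.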

* [RFC PATCH net-next 4/5] netns: advertise netns activity with netlink
From: Nicolas Dichtel @ 2012-12-12 17:17 UTC
  To: netdev; +Cc: davem, ebiederm, aatteka, Nicolas Dichtel
In-Reply-To: <1355332630-4256-1-git-send-email-nicolas.dichtel@6wind.com>

The goal of this patch is to send netlink messages when netns are created/deleted.
This is useful for a daemon that wants to manage all netns with only one running
instance.
Note that until the netns_nl_event_mcgrp group is registered, we cannot
send events.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/uapi/linux/netns.h |  4 ++++
 net/core/net_namespace.c   | 38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
index e1c1da3..e14d90b 100644
--- a/include/uapi/linux/netns.h
+++ b/include/uapi/linux/netns.h
@@ -6,10 +6,14 @@
 #define NETNS_GENL_NAME			"netns"
 #define NETNS_GENL_VERSION		0x1
 
+#define NETNS_GENL_MCAST_EVENT_NAME	"events"
+
 /* Commands */
 enum {
 	NETNS_CMD_NOOP,
 	NETNS_CMD_GET,
+	NETNS_CMD_NEW,
+	NETNS_CMD_DEL,
 	__NETNS_CMD_MAX,
 };
 #define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 18fc62f..da92ecb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -40,6 +40,8 @@ EXPORT_SYMBOL(init_net);
 
 static unsigned int max_gen_ptrs = INITIAL_NET_GEN_PTRS;
 
+static int netns_nl_event(struct net *net, int cmd);
+
 static struct net_generic *net_alloc_generic(void)
 {
 	struct net_generic *ng;
@@ -179,6 +181,7 @@ again:
 		if (error < 0)
 			goto out_undo;
 	}
+	netns_nl_event(net, NETNS_CMD_NEW);
 out:
 	return error;
 
@@ -311,6 +314,7 @@ static void cleanup_net(struct work_struct *work)
 	synchronize_rcu();
 
 	list_for_each_entry(net, &net_exit_list, exit_list) {
+		netns_nl_event(net, NETNS_CMD_DEL);
 		/* Free the index */
 		ida_remove(&net_namespace_ids, net->nsindex);
 	}
@@ -413,6 +417,10 @@ struct net *get_net_ns_by_nsindex(int nsindex)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_nsindex);
 
+static struct genl_multicast_group netns_nl_event_mcgrp = {
+	.name = NETNS_GENL_MCAST_EVENT_NAME,
+};
+
 static struct genl_family netns_nl_family = {
 	.id		= GENL_ID_GENERATE,
 	.name		= NETNS_GENL_NAME,
@@ -562,10 +570,38 @@ static struct genl_ops netns_nl_ops[] = {
 	},
 };
 
+static int netns_nl_event(struct net *net, int cmd)
+{
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+
+	/* Check that gennl infra is ready */
+	if (!netns_nl_event_mcgrp.id)
+		return -ENOENT;
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_ATOMIC);
+	if (!msg)
+		return -ENOMEM;
+
+	err = netns_nl_fill(msg, 0, 0, 0, cmd, net);
+	if (err < 0) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_multicast(msg, 0, netns_nl_event_mcgrp.id, GFP_ATOMIC);
+}
+
 int netns_genl_register(void)
 {
-	return genl_register_family_with_ops(&netns_nl_family, netns_nl_ops,
+	int err;
+
+	err =  genl_register_family_with_ops(&netns_nl_family, netns_nl_ops,
 					     ARRAY_SIZE(netns_nl_ops));
+	if (err < 0)
+		return err;
+
+	return genl_register_mc_group(&netns_nl_family, &netns_nl_event_mcgrp);
 }
 
 static int __init net_ns_init(void)
-- 
1.8.0.1
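
A daemon that follows these notifications would probably combine an initial NETNS_CMD_GET dump with the NETNS_CMD_NEW and NETNS_CMD_DEL events, since events sent before the multicast group is registered (or before the daemon subscribes) are never seen. The sketch below models only the bookkeeping; `struct ns_set`, `ns_set_apply`, the `EV_*` constants and the fixed capacity are hypothetical, and no netlink I/O is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NS_SET_CAP 64		/* assumed upper bound for the sketch */

enum ns_event { EV_NEW, EV_DEL };	/* stand-ins for NETNS_CMD_NEW/DEL */

struct ns_set {
	int idx[NS_SET_CAP];
	size_t len;
};

bool ns_set_contains(const struct ns_set *s, int nsindex)
{
	for (size_t i = 0; i < s->len; i++)
		if (s->idx[i] == nsindex)
			return true;
	return false;
}

/*
 * Apply one event. Duplicate NEW events and DEL events for unknown
 * indexes are ignored, so replaying a dump on top of events is harmless.
 * Returns false only when the set is full.
 */
bool ns_set_apply(struct ns_set *s, enum ns_event ev, int nsindex)
{
	assert(s != NULL);
	if (ev == EV_NEW) {
		if (ns_set_contains(s, nsindex))
			return true;
		if (s->len == NS_SET_CAP)
			return false;
		s->idx[s->len++] = nsindex;
		return true;
	}
	for (size_t i = 0; i < s->len; i++) {
		if (s->idx[i] == nsindex) {
			s->idx[i] = s->idx[--s->len];	/* swap-remove */
			break;
		}
	}
	return true;
}
```

Making both operations idempotent means the daemon can subscribe first and dump afterwards without caring whether an event and a dump entry describe the same netns.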

* [PATCHv3 iproute2] add DOVE extensions for iproute2
From: David L Stevens @ 2012-12-12 17:55 UTC
  To: David Miller, Stephen Hemminger; +Cc: netdev


	This patch adds a new flag to iproute2 for vxlan devices to enable
DOVE features. It also adds support for L2 and L3 switch lookup miss
netlink messages to "ip monitor".

Changes since v2: fix merge conflict
Changes since v1:
	- split "dove" flag into separate feature flags:
		- "proxy" for ARP reduction
		- "rsc" for route short circuiting
		- "l2miss" for L2 switch miss notifications
		- "l3miss" for L3 switch miss notifications

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index ba5c4ab..f2e6bef 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -26,6 +26,8 @@ static void explain(void)
 	fprintf(stderr, "Usage: ... vxlan id VNI [ group ADDR ] [ local ADDR ]\n");
 	fprintf(stderr, "                 [ ttl TTL ] [ tos TOS ] [ dev PHYS_DEV ]\n");
 	fprintf(stderr, "                 [ port MIN MAX ] [ [no]learning ]\n");
+	fprintf(stderr, "                 [ [no]proxy ] [ [no]rsc ]\n");
+	fprintf(stderr, "                 [ [no]l2miss ] [ [no]l3miss ]\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where: VNI := 0-16777215\n");
 	fprintf(stderr, "       ADDR := { IP_ADDRESS | any }\n");
@@ -44,6 +46,10 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 tos = 0;
 	__u8 ttl = 0;
 	__u8 learning = 1;
+	__u8 proxy = 0;
+	__u8 rsc = 0;
+	__u8 l2miss = 0;
+	__u8 l3miss = 0;
 	__u8 noage = 0;
 	__u32 age = 0;
 	__u32 maxaddr = 0;
@@ -123,6 +129,22 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			learning = 0;
 		} else if (!matches(*argv, "learning")) {
 			learning = 1;
+		} else if (!matches(*argv, "noproxy")) {
+			proxy = 0;
+		} else if (!matches(*argv, "proxy")) {
+			proxy = 1;
+		} else if (!matches(*argv, "norsc")) {
+			rsc = 0;
+		} else if (!matches(*argv, "rsc")) {
+			rsc = 1;
+		} else if (!matches(*argv, "nol2miss")) {
+			l2miss = 0;
+		} else if (!matches(*argv, "l2miss")) {
+			l2miss = 1;
+		} else if (!matches(*argv, "nol3miss")) {
+			l3miss = 0;
+		} else if (!matches(*argv, "l3miss")) {
+			l3miss = 1;
 		} else if (matches(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -148,6 +170,10 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
 	addattr8(n, 1024, IFLA_VXLAN_TOS, tos);
 	addattr8(n, 1024, IFLA_VXLAN_LEARNING, learning);
+	addattr8(n, 1024, IFLA_VXLAN_PROXY, proxy);
+	addattr8(n, 1024, IFLA_VXLAN_RSC, rsc);
+	addattr8(n, 1024, IFLA_VXLAN_L2MISS, l2miss);
+	addattr8(n, 1024, IFLA_VXLAN_L3MISS, l3miss);
 	if (noage)
 		addattr32(n, 1024, IFLA_VXLAN_AGEING, 0);
 	else if (age)
@@ -213,6 +239,18 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_VXLAN_LEARNING] &&
 	    !rta_getattr_u8(tb[IFLA_VXLAN_LEARNING]))
 		fputs("nolearning ", f);
+ 
+	if (tb[IFLA_VXLAN_PROXY] && rta_getattr_u8(tb[IFLA_VXLAN_PROXY]))
+		fputs("proxy ", f);
+ 
+	if (tb[IFLA_VXLAN_RSC] && rta_getattr_u8(tb[IFLA_VXLAN_RSC]))
+		fputs("rsc ", f);
+
+	if (tb[IFLA_VXLAN_L2MISS] && rta_getattr_u8(tb[IFLA_VXLAN_L2MISS]))
+		fputs("l2miss ", f);
+
+	if (tb[IFLA_VXLAN_L3MISS] && rta_getattr_u8(tb[IFLA_VXLAN_L3MISS]))
+		fputs("l3miss ", f);
 	
 	if (tb[IFLA_VXLAN_TOS] &&
 	    (tos = rta_getattr_u8(tb[IFLA_VXLAN_TOS]))) {
diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index d87e58f..d971623 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -67,7 +67,8 @@ int accept_msg(const struct sockaddr_nl *who,
 		print_addrlabel(who, n, arg);
 		return 0;
 	}
-	if (n->nlmsg_type == RTM_NEWNEIGH || n->nlmsg_type == RTM_DELNEIGH) {
+	if (n->nlmsg_type == RTM_NEWNEIGH || n->nlmsg_type == RTM_DELNEIGH ||
+	    n->nlmsg_type == RTM_GETNEIGH) {
 		if (prefix_banner)
 			fprintf(fp, "[NEIGH]");
 		print_neigh(who, n, arg);
diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 56e56b2..1b7600b 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -189,7 +189,8 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	struct rtattr * tb[NDA_MAX+1];
 	char abuf[256];
 
-	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
+	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH &&
+	    n->nlmsg_type != RTM_GETNEIGH) {
 		fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
 			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
 
@@ -251,6 +252,8 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 
 	if (n->nlmsg_type == RTM_DELNEIGH)
 		fprintf(fp, "delete ");
+	else if (n->nlmsg_type == RTM_GETNEIGH)
+		fprintf(fp, "miss ");
 	if (tb[NDA_DST]) {
 		fprintf(fp, "%s ",
 			format_host(r->ndm_family,
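
The new vxlan options follow the existing [no]learning convention: each feature has an enabling keyword and a "no"-prefixed disabling keyword, and keywords may be abbreviated. A self-contained sketch of that parsing is shown here; `prefix_match` is a hypothetical helper written to behave like iproute2's matches() (0 when the argument is a prefix of the keyword), and `struct vxlan_flags` and `parse_flag` are illustrative names, not code from the patch.

```c
#include <assert.h>
#include <string.h>

struct vxlan_flags {
	unsigned char proxy, rsc, l2miss, l3miss;
};

/* Return 0 when arg is a (possibly complete) prefix of keyword. */
int prefix_match(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	if (len > strlen(keyword))
		return -1;
	return memcmp(keyword, arg, len);
}

/*
 * Apply one command-line word to the flags.
 * Returns 0 when the word was recognised, -1 otherwise.
 */
int parse_flag(struct vxlan_flags *f, const char *arg)
{
	assert(f != NULL && arg != NULL);
	if (!prefix_match(arg, "noproxy"))
		f->proxy = 0;
	else if (!prefix_match(arg, "proxy"))
		f->proxy = 1;
	else if (!prefix_match(arg, "norsc"))
		f->rsc = 0;
	else if (!prefix_match(arg, "rsc"))
		f->rsc = 1;
	else if (!prefix_match(arg, "nol2miss"))
		f->l2miss = 0;
	else if (!prefix_match(arg, "l2miss"))
		f->l2miss = 1;
	else if (!prefix_match(arg, "nol3miss"))
		f->l3miss = 0;
	else if (!prefix_match(arg, "l3miss"))
		f->l3miss = 1;
	else
		return -1;
	return 0;
}
```

Because abbreviations are accepted, a word such as "no" matches whichever "no"-prefixed keyword is tested first, so the order of the comparisons matters.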

* Re: [PATCH V1 net-next 0/4] Add destination MAC address to ethtool flow steering
From: David Miller @ 2012-12-12 18:03 UTC
  To: amirv; +Cc: netdev, ogerlitz, hadarh, yanb
In-Reply-To: <1355314400-14909-1-git-send-email-amirv@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>
Date: Wed, 12 Dec 2012 14:13:16 +0200

> From: Yan Burman <yanb@mellanox.com>
> 
> In vSwitch configuration it is often beneficial to create flow steering
> rules for L3/L4 traffic based on VM port. This requires destination MAC
> address of that port to be present. Note that today the mlx4_en driver 
> adds the mac address of itself to the flow spec, where under the new
> ethtool flag suggested here it doesn't.
> 
> It may also be useful in macvlan devices.
> 
> These patches add kernel support for the new field (does not break old
> userspace compatibility, so new ethtool will work on old kernels and
> old ethtool will work with new kernels).
> 
> Also present here is the ethtool userspace patch.
> 
> See more details here http://marc.info/?t=134977576500003

Kernel side applied to net-next, thanks.

* Re: [PATCH net-next 1/2] bridge: notify mdb changes via netlink
From: David Miller @ 2012-12-12 18:03 UTC
  To: amwang; +Cc: tgraf, netdev, shemminger, bridge, herbert
In-Reply-To: <1355300590-2390-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Wed, 12 Dec 2012 16:23:07 +0800

> From: Cong Wang <amwang@redhat.com>
> 
> As Stephen mentioned, we need to monitor the mdb
> changes in user-space, so add notifications via netlink too.
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Cong Wang <amwang@redhat.com>

Applied.

* Re: [PATCH net-next 2/2] bridge: add support of adding and deleting mdb entries
From: David Miller @ 2012-12-12 18:03 UTC
  To: amwang; +Cc: netdev, bridge, herbert, shemminger, tgraf
In-Reply-To: <1355300590-2390-2-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Wed, 12 Dec 2012 16:23:08 +0800

> From: Cong Wang <amwang@redhat.com>
> 
> This patch implements adding/deleting mdb entries via netlink.
> Currently all entries are temp, we probably need a flag to distinguish
> permanent entries too.
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Cong Wang <amwang@redhat.com>

Applied.

* Re: [PATCHv3 iproute2] add DOVE extensions for iproute2
From: Stephen Hemminger @ 2012-12-12 18:03 UTC
  To: David L Stevens; +Cc: David Miller, netdev
In-Reply-To: <201212121756.qBCHtOfn021538@lab1.dls>

On Wed, 12 Dec 2012 12:55:24 -0500
David L Stevens <dlstevens@us.ibm.com> wrote:

> 	This patch adds a new flag to iproute2 for vxlan devices to enable
> DOVE features. It also adds support for L2 and L3 switch lookup miss
> netlink messages to "ip monitor".
> 
> Changes since v2: fix merge conflict
> Changes since v1:
> 	- split "dove" flag into separate feature flags:
> 		- "proxy" for ARP reduction
> 		- "rsc" for route short circuiting
> 		- "l2miss" for L2 switch miss notifications
> 		- "l3miss" for L3 switch miss notifications
> 
> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

Applied, after mollifying the git whitespace complaints.

* NET development closed...
From: David Miller @ 2012-12-12 18:07 UTC
  To: netdev; +Cc: linux-wireless, netfilter-devel


We're in the merge window, that means only bug fixes from now until
the merge window closes and I make a posting notifying everyone
that net-next is open again.

I should be sending a pull request to Linus later today.

Thanks.

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-12 18:10 UTC
  To: Stephen Hemminger
  Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212092700.7ef2607a@nehalam.linuxnetplumber.net>

Wed, Dec 12, 2012 at 06:27:00PM CET, shemminger@vyatta.com wrote:
>On Wed, 12 Dec 2012 18:05:20 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Wed, Dec 12, 2012 at 05:15:00PM CET, shemminger@vyatta.com wrote:
>> >On Wed, 12 Dec 2012 11:58:03 +0100
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >
>> >> This is basically a repost of my previous patchset:
>> >> "[patch net-next-2.6 0/2] net: allow to change carrier via sysfs" from Aug 30
>> >> 
>> >> The way net-sysfs stores values changed and this patchset reflects it.
>> >> Also, I exposed carrier via rtnetlink iface.
>> >> 
>> >> So far, only dummy driver uses carrier change ndo. In very near future
>> >> team driver will use that as well.
>> >> 
>> >> Jiri Pirko (4):
>> >>   net: add change_carrier netdev op
>> >>   net: allow to change carrier via sysfs
>> >>   rtnl: expose carrier value with possibility to set it
>> >>   dummy: implement carrier change
>> >> 
>> >>  drivers/net/dummy.c          | 10 ++++++++++
>> >>  include/linux/netdevice.h    |  7 +++++++
>> >>  include/uapi/linux/if_link.h |  1 +
>> >>  net/core/dev.c               | 19 +++++++++++++++++++
>> >>  net/core/net-sysfs.c         | 15 ++++++++++++++-
>> >>  net/core/rtnetlink.c         | 10 ++++++++++
>> >>  6 files changed, 61 insertions(+), 1 deletion(-)
>> >> 
>> >
>> >I needed to do the same thing for a project we are working on and discovered
>> >that there is already a working, documented interface for doing that via
>> >operstate mode. Therefore I don't think the additional complexity
>> >of a new API for this is warranted.
>> 
>> I might be missing something, but I can't see how setting operstate
>> can affect the value returned by netif_carrier_ok().
>
>Here is an example with a dummy device, using libmnl. The same is also
>possible with ip commands.
>
># modprobe dummy
># ip li show dev dummy0
>12: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
>    link/ether ce:90:46:83:6e:f8 brd ff:ff:ff:ff:ff:ff
># ./dummy dummy0 init
># ip li show dev dummy0
>12: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DORMANT 
>    link/ether ce:90:46:83:6e:f8 brd ff:ff:ff:ff:ff:ff
># ip li set dummy0 up
># ip li show dev dummy0
>12: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DORMANT 
>    link/ether ce:90:46:83:6e:f8 brd ff:ff:ff:ff:ff:ff
># ./dummy dummy0 down
># ip li show dev dummy0
>12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT 

If you mean this "NO-CARRIER", it has no direct relation
to netif_carrier_ok().


>    link/ether ce:90:46:83:6e:f8 brd ff:ff:ff:ff:ff:ff
># ./dummy dummy0 up
># ip li show dev dummy0
>12: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT 
>    link/ether ce:90:46:83:6e:f8 brd ff:ff:ff:ff:ff:ff
>
>
>/* Sample program to control link mode and link state */
>#include <stdio.h>
>#include <stdlib.h>
>#include <unistd.h>
>#include <string.h>
>#include <time.h>
>#include <errno.h>
>#include <sys/types.h>
>#include <sys/fcntl.h>
>#include <sys/ioctl.h>
>#include <libmnl/libmnl.h>
>
>#include <linux/if.h>
>#include <linux/if_tun.h>
>#include <linux/rtnetlink.h>
>
>static void panic(const char *str)
>{
>	perror(str);
>	exit(1);
>}
>
>static void usage(const char *cmd)
>{
>	fprintf(stderr, "Usage: %s dummyX [up|down|init]\n", cmd);
>	exit(1);
>}
>
>/* Send request and parse response */
>static void mnl_talk(struct mnl_socket *nl, struct nlmsghdr *nlh)
>{
>	unsigned portid = mnl_socket_get_portid(nl);
>	uint32_t seq = time(NULL);
>	char buf[MNL_SOCKET_BUFFER_SIZE];
>
>	nlh->nlmsg_flags |= NLM_F_ACK;
>	nlh->nlmsg_seq = seq;
>
>	if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0)
>		panic("mnl_socket_sendto failed");
>
>	int ret = mnl_socket_recvfrom(nl, buf, sizeof(buf));
>	if (ret < 0)
>		panic("mnl_socket_recvfrom");
>
>	if ( mnl_cb_run(buf, ret, seq, portid, NULL, NULL) < 0)
>		panic("mnl_cb_run");
>}
>
>static void linkstate(struct mnl_socket *nl,
>		      const char *ifname, unsigned int state)
>{
>	char buf[MNL_SOCKET_BUFFER_SIZE];
>	struct nlmsghdr *nlh = mnl_nlmsg_put_header(buf);
>	nlh->nlmsg_type = RTM_NEWLINK;
>	nlh->nlmsg_flags = NLM_F_REQUEST;
>
>	struct ifinfomsg *ifi;
>	ifi = mnl_nlmsg_put_extra_header(nlh, sizeof(struct ifinfomsg));
>	ifi->ifi_family = AF_UNSPEC;
>
>	mnl_attr_put_strz(nlh, IFLA_IFNAME, ifname);
>	mnl_attr_put_u8(nlh, IFLA_OPERSTATE, state);
>
>	mnl_talk(nl, nlh);
>}
>
>/* Set device link mode */
>static void init(struct mnl_socket *nl, const char *ifname)
>{
>	char buf[MNL_SOCKET_BUFFER_SIZE];
>	struct nlmsghdr *nlh = mnl_nlmsg_put_header(buf);
>	nlh->nlmsg_type = RTM_NEWLINK;
>	nlh->nlmsg_flags = NLM_F_REQUEST;
>
>	struct ifinfomsg *ifi;
>	ifi = mnl_nlmsg_put_extra_header(nlh, sizeof(struct ifinfomsg));
>	ifi->ifi_family = AF_UNSPEC;
>	
>	mnl_attr_put_strz(nlh, IFLA_IFNAME, ifname);
>	mnl_attr_put_u8(nlh, IFLA_LINKMODE, IF_LINK_MODE_DORMANT);
>	mnl_talk(nl, nlh);
>}
>
>int main(int argc, char **argv)
>{
>	if (argc != 3)
>		usage(argv[0]);
>
>	struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE);
>	if (!nl)
>		panic("mnl_socket_open");
>
>	if (mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID) < 0)
>		panic("mnl_socket_bind");
>	
>
>	if (strcmp(argv[2], "init") == 0)
>		init(nl, argv[1]);
>	else if (strcmp(argv[2], "up") == 0)
>		linkstate(nl, argv[1], IF_OPER_UP);
>	else if (strcmp(argv[2], "down") == 0)
>		linkstate(nl, argv[1], IF_OPER_DORMANT);
>	else
>		usage(argv[0]);
>
>	mnl_socket_close(nl);
>	return 0;
>}
>
>

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-12 18:12 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212181017.GB3060@minipsycho.orion>

On Wed, 12 Dec 2012 19:10:17 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> ># ip li show dev dummy0
> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
> 
> if you mean this "NO-CARRIER"
> it has no direct relation with netif_carrier_ok().

It is the same value (IFF_RUNNING) that is visible from user space.

^ permalink raw reply

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
From: Jesse Gross @ 2012-12-12 18:17 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller
In-Reply-To: <CA+mtBx-Zf9FNf11H9RM12etHnJ1bPpM_Eyc4mR7E6xsb7sUP2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, Dec 11, 2012 at 7:14 PM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> This patch adds ipv6 set action functionality. It allows to change
>> traffic class, flow label, hop-limit, ipv6 source and destination
>> address fields.
>>
> I have to wonder about these patches and the underlying design
> direction.  Aren't these sort of things and more already implemented
> by IPtables but in a modular and extensible fashion?  Has there been
> any thought into hooking OVS to IP tables to leverage all the existing
> functionality?

At an implementation level, the goal is definitely to share as much
code as possible.  Some of that was obviously done to support this
patch and I'm sure there are more areas where it could be taken
further.

At a more conceptual level we've explored this path a number of times
and it's never been attractive since it has a tendency to drag more
OVS code into other parts of the kernel and generally make things
worse for everybody.  Of course, it's hard to say without knowing what
you're thinking.  Do you have a specific proposal?

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-12 18:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212101208.361ccda0@nehalam.linuxnetplumber.net>

Wed, Dec 12, 2012 at 07:12:08PM CET, shemminger@vyatta.com wrote:
>On Wed, 12 Dec 2012 19:10:17 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> ># ip li show dev dummy0
>> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
>> 
>> if you mean this "NO-CARRIER"
>> it has no direct relation with netif_carrier_ok().
>
>It is the same value (IFF_RUNNING) that is visible from user space.

static inline bool netif_carrier_ok(const struct net_device *dev)
{
	        return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
}

So netif_carrier_{ok,on,off}() operate on the __LINK_STATE_NOCARRIER
bit, not on the IFF_RUNNING flag.
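
The distinction can be modeled from the userspace side. As far as I can
tell, the kernel reports netif_carrier_ok() to userspace as IFF_LOWER_UP,
reports an operstate of UP (or UNKNOWN) as IFF_RUNNING, and ip(8) prints
"NO-CARRIER" when IFF_UP is set while IFF_RUNNING is clear. The sketch
below is only an illustration under those assumptions: the DEMO_ names and
both helper functions are made up, and only the flag values are meant to
mirror <linux/if.h>.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values mirror IFF_* in <linux/if.h>; the DEMO_ prefix is ours. */
#define DEMO_IFF_UP        0x1      /* interface is administratively up */
#define DEMO_IFF_RUNNING   0x40     /* operstate is UP or UNKNOWN */
#define DEMO_IFF_LOWER_UP  0x10000  /* driver signals carrier */

static_assert(DEMO_IFF_RUNNING == (1 << 6), "IFF_RUNNING is bit 6");
static_assert(DEMO_IFF_LOWER_UP == (1 << 16), "IFF_LOWER_UP is bit 16");

/* Assumed rule behind the "NO-CARRIER" label in "ip link show". */
bool shows_no_carrier(unsigned int flags)
{
	return (flags & DEMO_IFF_UP) && !(flags & DEMO_IFF_RUNNING);
}

/* Userspace view of netif_carrier_ok(): the IFF_LOWER_UP bit. */
bool carrier_ok(unsigned int flags)
{
	return (flags & DEMO_IFF_LOWER_UP) != 0;
}
```

With flags of UP|LOWER_UP, as in the dummy0 output after "./dummy dummy0
down", shows_no_carrier() and carrier_ok() are both true: the label says
NO-CARRIER although the carrier bit itself was never touched.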

^ permalink raw reply

* Re: [PATCH iproute2] ip: use rtnelink to manage mroute
From: Stephen Hemminger @ 2012-12-12 18:26 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev
In-Reply-To: <1355304728-4944-1-git-send-email-nicolas.dichtel@6wind.com>

On Wed, 12 Dec 2012 10:32:08 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> mroute was using /proc/net/ip_mr_[vif|cache] to display mroute entries. Hence,
> only RT_TABLE_DEFAULT was displayed and only IPv4.
> With rtnetlink, it is possible to display all tables for IPv4 and IPv6. The output
> format is kept. Also, like before the patch, statistics are displayed when the user
> specifies the '-s' argument.
> 
> The patch also adds the support of 'ip monitor mroute', which is now possible.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---

The functionality is fine, and if you clean it up I will accept it.

Patch does not apply cleanly against current iproute2 git.

Also, it causes several compiler warnings.
ipmonitor.c: In function ‘accept_msg’:
ipmonitor.c:50:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long unsigned int’ [-Wformat]

ipmroute.c: In function ‘print_mroute’:
ipmroute.c:165:4: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘__u64’ [-Wformat]
ipmroute.c:165:4: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘__u64’ [-Wformat]
ipmroute.c:168:5: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘__u64’ [-Wformat]
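
For reference, warnings of this kind are usually silenced by casting the
64-bit value to unsigned long long and printing it with %llu, which is
correct on both 32-bit and 64-bit ABIs. The sketch below is not taken from
the patch: format_counters() and format_size() are hypothetical helpers,
uint64_t stands in for __u64, and the %zu case assumes the ipmonitor.c
argument is a size_t such as a sizeof result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * uint64_t stands in for the kernel's __u64 so that the sketch does not
 * depend on <linux/types.h>.
 */
int format_counters(char *buf, size_t len, uint64_t packets, uint64_t bytes)
{
	assert(buf != NULL);
	/* Cast explicitly: %llu always expects unsigned long long. */
	return snprintf(buf, len, "%llu packets, %llu bytes",
			(unsigned long long)packets,
			(unsigned long long)bytes);
}

/* A size_t (for example a sizeof result) takes %zu rather than %d. */
int format_size(char *buf, size_t len, size_t n)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%zu", n);
}
```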

^ permalink raw reply

* Re: [PATCH V1 net-next 1/3] net: ethtool: Add destination MAC address to flow steering API
From: Ben Hutchings @ 2012-12-12 18:28 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Or Gerlitz, Hadar Har-Zion, Yan Burman
In-Reply-To: <1355314400-14909-2-git-send-email-amirv@mellanox.com>

On Wed, 2012-12-12 at 14:13 +0200, Amir Vadai wrote:
> From: Yan Burman <yanb@mellanox.com>
> 
> Add ability to specify destination MAC address for L3/L4 flow spec
> in order to be able to specify action for different VM's under vSwitch
> configuration. This change is transparent to older userspace.
> 
> Signed-off-by: Yan Burman <yanb@mellanox.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
>  include/uapi/linux/ethtool.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index d3eaaaf..be8c41e 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -500,13 +500,15 @@ union ethtool_flow_union {
>  	struct ethtool_ah_espip4_spec		esp_ip4_spec;
>  	struct ethtool_usrip4_spec		usr_ip4_spec;
>  	struct ethhdr				ether_spec;
> -	__u8					hdata[60];
> +	__u8					hdata[52];
>  };
>  
>  struct ethtool_flow_ext {
> -	__be16	vlan_etype;
> -	__be16	vlan_tci;
> -	__be32	data[2];
> +	__u8		padding[2];
> +	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
> +	__be16		vlan_etype;
> +	__be16		vlan_tci;
> +	__be32		data[2];
>  };
>  
>  /**
> @@ -1027,6 +1029,7 @@ enum ethtool_sfeatures_retval_bits {
>  #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
>  /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
>  #define	FLOW_EXT	0x80000000
> +#define	FLOW_MAC_EXT	0x40000000

Please can you send another patch that adds kernel-doc to struct
ethtool_flow_ext explaining which fields are dependent on which flags.

Ben.

>  /* L3-L4 network traffic flow hash options */
>  #define	RXH_L2DA	(1 << 1)

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 6/6] netfilter: nf_nat: Handle routing changes in MASQUERADE target
From: Jozsef Kadlecsik @ 2012-12-12 18:37 UTC (permalink / raw)
  To: Andrew Collins; +Cc: netfilter-devel, netdev
In-Reply-To: <CAKTPYJQn_vVg+f1Nvbe=hU7Xzw7mX6Xw7ZR4Tz2Bpd49792-rg@mail.gmail.com>

On Tue, 11 Dec 2012, Andrew Collins wrote:

> On Tue, Dec 4, 2012 at 10:31 AM, <pablo@netfilter.org> wrote:
> >
> > From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> >
> > When the route changes (backup default route, VPNs) which affect a
> > masqueraded target, the packets were sent out with the outdated source
> > address. The patch addresses the issue by comparing the outgoing interface
> > directly with the masqueraded interface in the nat table.
> >
> > Events are inefficient in this case, because it'd require adding route
> > events to the network core and then scanning the whole conntrack table
> > and re-checking the route for every entry.
> >
> > Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> 
> Jozsef, a small question about this change.  Should this same check
> not exist here:
> 
>         case IP_CT_NEW:
>                 /* Seen it before?  This can happen for loopback, retrans,
>                  * or local packets.
>                  */
>                 if (!nf_nat_initialized(ct, maniptype)) {
>                         unsigned int ret;
> 
>                         ret = nf_nat_rule_find(skb, hooknum, in, out, ct);
>                         if (ret != NF_ACCEPT)
>                                 return ret;
> -               } else
> +               } else {
>                         pr_debug("Already setup manip %s for ct %p\n",
>                                  maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
>                                  ct);
> +                       if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) {
> +                               nf_ct_kill_acct(ct, ctinfo, skb);
> +                               return NF_DROP;
> +                       }
> +               }
>                 break;
> 
> as well?  It's *significantly* less common than the case you fixed,
> and perhaps just letting the state time out is acceptable, but I've
> seen TCP connections get stuck with the wrong source address if we
> haven't hit ESTABLISHED at the point when the routing change occurs
> (most reproducible on high latency links).

It is a less common case, but I think you are right: the timeout can take 
several minutes. But instead of repeating the code segment, handling it 
with a "goto" would be better. Are you going to submit a patch?

Best regards,
Jozsef

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-12 18:36 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212182556.GC3060@minipsycho.orion>

On Wed, 12 Dec 2012 19:25:56 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> Wed, Dec 12, 2012 at 07:12:08PM CET, shemminger@vyatta.com wrote:
> >On Wed, 12 Dec 2012 19:10:17 +0100
> >Jiri Pirko <jiri@resnulli.us> wrote:
> >
> >> ># ip li show dev dummy0
> >> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
> >> 
> >> if you mean this "NO-CARRIER"
> >> it has no direct relation with netif_carrier_ok().
> >
> >It is the same value (IFF_RUNNING) that is visible from user space.
> 
> static inline bool netif_carrier_ok(const struct net_device *dev)
> {
> 	        return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
> }
> 
> So netif_carrier_{ok,on,off}() operate on the __LINK_STATE_NOCARRIER
> bit, not on the IFF_RUNNING flag.

Which code path are you worried about, where netif_carrier_ok needs to be set
or cleared? The interaction here is complex. Right now LINK_STATE_NOCARRIER is
controlled purely by the driver; your patch changes that, so before acking I
want to understand why it is required.

^ permalink raw reply

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
From: Tom Herbert @ 2012-12-12 18:38 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller, Mike Waychison
In-Reply-To: <CAEP_g=-1aWGsjR55AaD6sLLt4QzbYgUs-3hfNNONrrf8MDwSyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

> At an implementation level, the goal is definitely to share as much
> code as possible.  Some of that was obviously done to support this
> patch and I'm sure there are more areas where it could be taken
> further.
>
> At a more conceptual level we've explored this path a number of times
> and it's never been attractive since it has a tendency to drag more
> OVS code into other parts of the kernel and generally make things
> worse for everybody.  Of course, it's hard to say without knowing what
> you're thinking.  Do you have a specific proposal?

Where is the line drawn?  Is the intent that, over the next five years,
functionality will be added in ad hoc increments until OVS has the same
functionality as iptables, tc, and routing?  Are we going to have things
like NAT, stateful firewalls, and DDoS mechanisms implemented in OVS?
(We already have people proposing such things!)

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Nicolas Dichtel @ 2012-12-12 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm, aatteka
In-Reply-To: <1355332630-4256-1-git-send-email-nicolas.dichtel@6wind.com>

2012/12/12 Nicolas Dichtel <nicolas.dichtel@6wind.com>:
> The goal of this series is to ease netns management by daemons. Some systems use
> netns only to virtualize the network stack and don't want to multiply userland
> daemons.  These systems may have a lot of netns, up to 2000. We don't want to
> launch an instance of each daemon (quagga, strongswan, conntrackd, ...) for
> each netns because that would consume a lot of resources. Having one daemon that
> manages all netns is more efficient (especially if there are few objects to
> manage: one or two routes per netns, for example).
> Hence, one goal of this series is to allow a daemon to monitor netns
> activity, so that it can open or close netlink sockets and allocate the
> structures needed to manage these netns when they are created or deleted.
> To help identify a netns, an index has been added to each netns.
>
> A new setsockopt() option is also added to help daemons open a socket in the
> right netns. For now, a daemon that wants to open a socket in a specific netns
> needs to call setns() with CLONE_NEWNET and an fd (not so easy to find), open
> the socket, and then call setns() again to go back to the initial netns. Having
> this kind of setsockopt() will simplify operations. Obviously, this setsockopt()
> should be done early enough (is a test on sk_state enough?). The first target is
> netlink sockets, but it can be useful for other kinds of sockets, which is why
> I add a generic socket option.
>
> As usual, the patch against iproute2 will be sent once the patches are included
> and net-next merged. I can send it on demand.
>
>  arch/alpha/include/asm/socket.h        |   2 +
>  arch/avr32/include/uapi/asm/socket.h   |   2 +
>  arch/frv/include/uapi/asm/socket.h     |   2 +
>  arch/h8300/include/asm/socket.h        |   2 +
>  arch/ia64/include/uapi/asm/socket.h    |   2 +
>  arch/m32r/include/asm/socket.h         |   2 +
>  arch/m68k/include/uapi/asm/socket.h    |   2 +
>  arch/mips/include/uapi/asm/socket.h    |   2 +
>  arch/mn10300/include/uapi/asm/socket.h |   2 +
>  arch/parisc/include/uapi/asm/socket.h  |   2 +
>  arch/powerpc/include/uapi/asm/socket.h |   2 +
>  arch/s390/include/uapi/asm/socket.h    |   2 +
>  arch/sparc/include/uapi/asm/socket.h   |   2 +
>  arch/xtensa/include/uapi/asm/socket.h  |   2 +
>  include/net/net_namespace.h            |   3 +
>  include/uapi/asm-generic/socket.h      |   2 +
>  include/uapi/linux/if_link.h           |   1 +
>  include/uapi/linux/netns.h             |  31 +++++
>  net/core/net_namespace.c               | 223 +++++++++++++++++++++++++++++++++
>  net/core/rtnetlink.c                   |   7 +-
>  net/core/sock.c                        |  28 +++++
>  net/netlink/genetlink.c                |   4 +
>  22 files changed, 326 insertions(+), 1 deletion(-)
>
> I do not claim to be a netns expert, which is why I put RFC in the title ;-)
>
> Comments are welcome.

Sorry for the double send, that was a mistake on my end!
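
For readers unfamiliar with the setns() sequence that the cover letter
wants to avoid, here is a minimal sketch of it. It is not part of the
series: socket_in_netns() is a hypothetical helper, and ns_path is assumed
to name a netns file such as /proc/<pid>/ns/net. Note that setns() moves
only the calling thread and normally requires CAP_SYS_ADMIN.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for setns() and CLONE_NEWNET */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a socket inside the network namespace named by ns_path, then switch
 * the calling thread back to its original namespace.
 * Returns the socket fd, or -1 with errno set.
 */
int socket_in_netns(const char *ns_path, int domain, int type, int protocol)
{
	int self_fd, target_fd, sock = -1, err = 0;

	assert(ns_path != NULL);

	/* Keep a handle on the current netns so we can return to it. */
	self_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (self_fd < 0)
		return -1;

	target_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (target_fd < 0) {
		err = errno;
		goto out_self;
	}

	if (setns(target_fd, CLONE_NEWNET) < 0) {
		err = errno;
		goto out_target;
	}

	/* The socket stays bound to the netns it was created in. */
	sock = socket(domain, type, protocol);
	if (sock < 0)
		err = errno;

	/* Switch back; on failure the thread is left in the target netns. */
	if (setns(self_fd, CLONE_NEWNET) < 0) {
		err = errno;
		if (sock >= 0)
			close(sock);
		sock = -1;
	}

out_target:
	close(target_fd);
out_self:
	close(self_fd);
	if (sock < 0)
		errno = err;
	return sock;
}
```

A caller would use it as, for example, socket_in_netns(path, AF_NETLINK,
SOCK_RAW, NETLINK_ROUTE). The three system calls per socket, plus the need
to find a usable fd for the target netns, are the overhead the proposed
setsockopt() is meant to remove.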

^ permalink raw reply

* Re: [PATCH 2/2] iproute2: add support to monitor mdb entries too
From: Stephen Hemminger @ 2012-12-12 18:41 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, bridge, Thomas Graf
In-Reply-To: <1355300590-2390-4-git-send-email-amwang@redhat.com>

On Wed, 12 Dec 2012 16:23:10 +0800
Cong Wang <amwang@redhat.com> wrote:

> From: Cong Wang <amwang@redhat.com>
> 
> This patch implements `bridge monitor mdb`.
> 
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Cong Wang <amwang@redhat.com>
> 

Accepted for 3.8 since Dave accepted the kernel parts. Thanks

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-12 18:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212103632.2020efce@nehalam.linuxnetplumber.net>

Wed, Dec 12, 2012 at 07:36:32PM CET, shemminger@vyatta.com wrote:
>On Wed, 12 Dec 2012 19:25:56 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Wed, Dec 12, 2012 at 07:12:08PM CET, shemminger@vyatta.com wrote:
>> >On Wed, 12 Dec 2012 19:10:17 +0100
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >
>> >> ># ip li show dev dummy0
>> >> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
>> >> 
>> >> if you mean this "NO-CARRIER"
>> >> it has no direct relation with netif_carrier_ok().
>> >
>> >It is the same value (IFF_RUNNING) that is visible from user space.
>> 
>> static inline bool netif_carrier_ok(const struct net_device *dev)
>> {
>> 	        return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
>> }
>> 
>> So netif_carrier_{ok,on,off}() operate on the __LINK_STATE_NOCARRIER
>> bit, not on the IFF_RUNNING flag.
>
>What is the code path that you are worried about netif_carrier_ok being set or clear?
>The interaction here is complex, and right now LINK_STATE_NOCARRIER is purely
>controlled by the driver, your patch changes that, but before acking I want
>to make sure why it is required.

This patchset makes it possible to set or clear the carrier from
userspace. For the dummy device, it allows direct emulation of a
link failure.

For the team driver, it lets teamd (the userspace part) actually set
the carrier on or off (this is required by the LACP runner, for
example).
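
A rough sketch of how a userspace tool could drive the sysfs side of this
patchset. Everything here is an assumption based on the patch titles: the
attribute is taken to be /sys/class/net/<ifname>/carrier and to accept "0"
or "1", and carrier_path() and set_carrier() are hypothetical helpers, not
functions from the series.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumed sysfs location of the writable carrier attribute. */
#define CARRIER_PATH_FMT "/sys/class/net/%s/carrier"

/* Build the attribute path; returns 0 on success, -1 if ifname is unusable. */
int carrier_path(char *buf, size_t len, const char *ifname)
{
	int n;

	assert(buf != NULL && ifname != NULL);
	/* Reject empty names and anything that could escape the directory. */
	if (ifname[0] == '\0' || strchr(ifname, '/') != NULL ||
	    strcmp(ifname, ".") == 0 || strcmp(ifname, "..") == 0)
		return -1;

	n = snprintf(buf, len, CARRIER_PATH_FMT, ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the attribute; returns 0 on success, -1 on error. */
int set_carrier(const char *ifname, int on)
{
	char path[128];
	ssize_t written;
	int fd;

	if (carrier_path(path, sizeof(path), ifname) < 0)
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, on ? "1" : "0", 1);
	close(fd);
	return written == 1 ? 0 : -1;
}
```

With the dummy driver implementing the new ndo, set_carrier("dummy0", 0)
would then emulate a link failure, provided the caller has permission to
write the attribute.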

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-12 18:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121212092236.GB4354@redhat.com>

On Wednesday, December 12, 2012 11:22:36 AM Michael S. Tsirkin wrote:
> On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> > This patch corrects some problems with LSM/SELinux that were introduced
> > with the multiqueue patchset.  The problem stems from the fact that the
> > multiqueue work changed the relationship between the tun device and its
> > associated socket; before the socket persisted for the life of the
> > device, however after the multiqueue changes the socket only persisted
> > for the life of the userspace connection (fd open).  For non-persistent
> > devices this is not an issue, but for persistent devices this can cause
> > the tun device to lose its SELinux label.
> > 
> > We correct this problem by adding an opaque LSM security blob to the
> > tun device struct which allows us to have the LSM security state, e.g.
> > SELinux labeling information, persist for the lifetime of the tun
> > device.  In the process we tweak the LSM hooks to work with this new
> > approach to TUN device/socket labeling and introduce a new LSM hook,
> > security_tun_dev_create_queue(), to approve requests to create a new
> > TUN queue via TUNSETQUEUE.
> > 
> > The SELinux code has been adjusted to match the new LSM hooks, the
> > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > use of the recently added "tun_socket:create_queue" permission to
> > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > policies which do not define the "tun_socket:create_queue" permission
> > the access control decision for TUNSETQUEUE will be handled according
> > to the SELinux policy's unknown permission setting.

...

> > @@ -465,6 +466,10 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> >  	struct tun_file *tfile = file->private_data;
> >  	int err;
> >  
> > +	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
> > +	if (err < 0)
> > +		goto out;
> > +
> >  	err = -EINVAL;
> >  	if (rcu_dereference_protected(tfile->tun, lockdep_rtnl_is_held()))
> >  		goto out;
> 
> This hook triggers with both set_queue and set_iff,
> and it also seems to trigger when attaching to a
> persistent device and when creating a new one. But I
> believe we might want to be able to allow one but not the other.
> 
> For example:
> 	- we might want to allow qemu to do set_queue but not set_iff
> 	- we might want to configure persistent devices and
> 	  prevent a user from adding new ones

Please look at the rest of the patch and see what the hook actually does.  It 
does not perform any access control under SELinux, all it does is ensure that 
the socket is labeled based on the associated TUN device.

> > - * @tun_dev_post_create:
> > - *	This hook allows a module to update or allocate a per-socket security
> > - *	structure.
> > - *	@sk contains the newly created sock structure.
> 
> I worry that removing a hook would hurt users that rely on it in their
> security policy.

We need to change the hooks because there was a significant change to the 
implementation of a TUN device.

However, even when changing the LSM hooks, we have preserved the SELinux 
access controls for standard, e.g. single queue, TUN devices such that 
existing SELinux policies will work for existing TUN users.  The new SELinux 
access control we added only comes into play when TUN users want to enable 
multiple queues.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-12 18:54 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212184925.GD3060@minipsycho.orion>

On Wed, 12 Dec 2012 19:49:26 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> Wed, Dec 12, 2012 at 07:36:32PM CET, shemminger@vyatta.com wrote:
> >On Wed, 12 Dec 2012 19:25:56 +0100
> >Jiri Pirko <jiri@resnulli.us> wrote:
> >
> >> Wed, Dec 12, 2012 at 07:12:08PM CET, shemminger@vyatta.com wrote:
> >> >On Wed, 12 Dec 2012 19:10:17 +0100
> >> >Jiri Pirko <jiri@resnulli.us> wrote:
> >> >
> >> >> ># ip li show dev dummy0
> >> >> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
> >> >> 
> >> >> if you mean this "NO-CARRIER"
> >> >> it has no direct relation with netif_carrier_ok().
> >> >
> >> >It is the same value (IFF_RUNNING) that is visible from user space.
> >> 
> >> static inline bool netif_carrier_ok(const struct net_device *dev)
> >> {
> >> 	        return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
> >> }
> >> 
> >> So netif_carrier_{ok,on,off}() operate on the __LINK_STATE_NOCARRIER
> >> bit, not on the IFF_RUNNING flag.
> >
> >What is the code path that you are worried about netif_carrier_ok being set or clear?
> >The interaction here is complex, and right now LINK_STATE_NOCARRIER is purely
> >controlled by the driver, your patch changes that, but before acking I want
> >to make sure why it is required.
> 
> This patchset would provide a possibility to set or clear the carrier
> from userspace. For dummy device it would serve for direct emulation
> of link fail.
> 
> Also for team driver, that would serve for teamd (userspace part) to
> set the carrier actually on or off (in case of LACP runner for example
> this is required).
> 

You want to be able to control the dummy device so that you can test carrier
management in the team device. An alternative is to use carrier control
on a virtual device. VMware can do it, and there were patches to do this with
KVM/QEMU, though I am not sure whether they were ever merged.

Since this is a specific feature of the dummy device, which is specialized for
testing, maybe it should just be done by adding a device-specific ioctl rather
than letting it creep in as a general facility.

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-12 19:06 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212105448.490aca5c@nehalam.linuxnetplumber.net>

Wed, Dec 12, 2012 at 07:54:48PM CET, shemminger@vyatta.com wrote:
>On Wed, 12 Dec 2012 19:49:26 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Wed, Dec 12, 2012 at 07:36:32PM CET, shemminger@vyatta.com wrote:
>> >On Wed, 12 Dec 2012 19:25:56 +0100
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >
>> >> Wed, Dec 12, 2012 at 07:12:08PM CET, shemminger@vyatta.com wrote:
>> >> >On Wed, 12 Dec 2012 19:10:17 +0100
>> >> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >
>> >> >> ># ip li show dev dummy0
>> >> >> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
>> >> >> 
>> >> >> if you mean this "NO-CARRIER"
>> >> >> it has no direct relation with netif_carrier_ok().
>> >> >
>> >> >It is the same value (IFF_RUNNING) that is visible from user space.
>> >> 
>> >> static inline bool netif_carrier_ok(const struct net_device *dev)
>> >> {
>> >> 	        return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
>> >> }
>> >> 
>> >> So netif_carrier_{ok,on,off}() operate on the __LINK_STATE_NOCARRIER
>> >> bit, not on the IFF_RUNNING flag.
>> >
>> >What is the code path that you are worried about netif_carrier_ok being set or clear?
>> >The interaction here is complex, and right now LINK_STATE_NOCARRIER is purely
>> >controlled by the driver, your patch changes that, but before acking I want
>> >to make sure why it is required.
>> 
>> This patchset would provide a possibility to set or clear the carrier
>> from userspace. For dummy device it would serve for direct emulation
>> of link fail.
>> 
>> Also for team driver, that would serve for teamd (userspace part) to
>> set the carrier actually on or off (in case of LACP runner for example
>> this is required).
>> 
>
>You want to be able to control the dummy device, so that you can test carrier
>management in the team device. Another alternative is to use carrier control
>on a virtual device. Vmware can do it, there were patches to do this with KVM/QEMU
>not sure if they ever got incorporated.
>
>Since this is a specific feature of the dummy device which is specialized for
>testing, maybe it should just be done by adding device specific ioctl rather
>than letting it creep in as a general facility.

Ugh, a specific ioctl stinks...
But this is not only for dummy. As I said, we need this for the team driver.
Maybe I did not explain that correctly. Given that the whole Team logic
is in userspace, teamd (the userspace daemon) needs to set the carrier
state as if it were done in the kernel. Yes, we could do this with a
specific option in the team driver, but I thought it would be nicer to
do it more generally.

Also, in previous discussion Michał Mirosław wrote he would like this
feature also for GRE tunnel devices.

^ permalink raw reply

