netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] macvtap: add namespace support to the sysfs device class
@ 2016-04-20 14:11 Marc Angel
  2016-04-24 18:14 ` David Miller
  2016-04-25 19:12 ` Eric W. Biederman
  0 siblings, 2 replies; 12+ messages in thread
From: Marc Angel @ 2016-04-20 14:11 UTC (permalink / raw)
  To: netdev; +Cc: ebiederm

When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support the the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, doing this has the side effect of changing
/sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces.")

Signed-off-by: Marc Angel <marc@arista.com>
---
I'm not sure that the problems described in that commit message
apply to macvtaps so maybe it is possible to keep the 'tapNN'
device directly under 'macvtapX' and not disrupt userland.

Should it even be possible to add a device of a class that doesn't
support namespaces under one that does?
This could lead to dead symlinks in the new device class directory or
duplicate warnings because a device of the same name already exists in
another namespace.
---
 drivers/net/macvtap.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 95394ed..a76d72d 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -129,7 +129,18 @@ static DEFINE_MUTEX(minor_lock);
 static DEFINE_IDR(minor_idr);
 
 #define GOODCOPY_LEN 128
-static struct class *macvtap_class;
+static const void *macvtap_net_namespace(struct device *d)
+{
+	struct net_device *dev = to_net_dev(d->parent);
+	return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+	.name = "macvtap",
+	.owner = THIS_MODULE,
+	.ns_type = &net_ns_type_operations,
+	.namespace = macvtap_net_namespace,
+};
 static struct cdev macvtap_cdev;
 
 static const struct proto_ops macvtap_socket_ops;
@@ -1295,7 +1306,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 			return notifier_from_errno(err);
 
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		classdev = device_create(macvtap_class, &dev->dev, devt,
+		classdev = device_create(&macvtap_class, &dev->dev, devt,
 					 dev, "tap%d", dev->ifindex);
 		if (IS_ERR(classdev)) {
 			macvtap_free_minor(vlan);
@@ -1304,7 +1315,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 		break;
 	case NETDEV_UNREGISTER:
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		device_destroy(macvtap_class, devt);
+		device_destroy(&macvtap_class, devt);
 		macvtap_free_minor(vlan);
 		break;
 	}
@@ -1330,11 +1341,9 @@ static int macvtap_init(void)
 	if (err)
 		goto out2;
 
-	macvtap_class = class_create(THIS_MODULE, "macvtap");
-	if (IS_ERR(macvtap_class)) {
-		err = PTR_ERR(macvtap_class);
+	err = class_register(&macvtap_class);
+	if (err)
 		goto out3;
-	}
 
 	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
@@ -1349,7 +1358,7 @@ static int macvtap_init(void)
 out5:
 	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 out3:
 	cdev_del(&macvtap_cdev);
 out2:
@@ -1363,7 +1372,7 @@ static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
 	unregister_netdevice_notifier(&macvtap_notifier_block);
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 	cdev_del(&macvtap_cdev);
 	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
 	idr_destroy(&minor_idr);
-- 
2.8.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next] macvtap: add namespace support to the sysfs device class
  2016-04-20 14:11 [PATCH net-next] macvtap: add namespace support to the sysfs device class Marc Angel
@ 2016-04-24 18:14 ` David Miller
  2016-04-25 19:12 ` Eric W. Biederman
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2016-04-24 18:14 UTC (permalink / raw)
  To: marc; +Cc: netdev, ebiederm

From: Marc Angel <marc@arista.com>
Date: Wed, 20 Apr 2016 16:11:31 +0200

> However, doing this has the side effect of changing
> /sys/devices/virtual/net/macvtapX/tapNN  into
> /sys/devices/virtual/net/macvtapX/macvtap/tapNN

I'm really not comfortable at all with having this sysfs
device name change.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next] macvtap: add namespace support to the sysfs device class
  2016-04-20 14:11 [PATCH net-next] macvtap: add namespace support to the sysfs device class Marc Angel
  2016-04-24 18:14 ` David Miller
@ 2016-04-25 19:12 ` Eric W. Biederman
  2016-04-28 16:35   ` Marc Angel
  1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2016-04-25 19:12 UTC (permalink / raw)
  To: Marc Angel; +Cc: netdev

Marc Angel <marc@arista.com> writes:

> When creating macvtaps that are expected to have the same ifindex
> in different network namespaces, only the first one will succeed.
> The others will fail with a sysfs_warn_dup warning due to them trying
> to create the following sysfs link (with 'NN' the ifindex of macvtapX):
>
> /sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN
>
> This is reproducible by running the following commands:
>
> ip netns add ns1
> ip netns add ns2
> ip link add veth0 type veth peer name veth1
> ip link set veth0 netns ns1
> ip link set veth1 netns ns2
> ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
> ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap
>
> The last command will fail with "RTNETLINK answers: File exists" (along
> with the kernel warning) but retrying it will work because the ifindex
> was incremented.

Yes.  That is totally broken, and you would not even care excpect that
the ifindex maintained separately per network namespace.  Useful for
migration but it totally breaks things in this case.

> The 'net' device class is isolated between network namespaces so each
> one has its own hierarchy of net devices.
> This isn't the case for the 'macvtap' device class.
> The problem occurs half-way through the netdev registration, when
> `macvtap_device_event` is called-back to create the 'tapNN' macvtap
> class device under the 'macvtapX' net class device.
>
> This patch adds namespace support the the 'macvtap' device class so
> that /sys/class/macvtap is no longer shared between net namespaces.
>
> However, doing this has the side effect of changing
> /sys/devices/virtual/net/macvtapX/tapNN  into
> /sys/devices/virtual/net/macvtapX/macvtap/tapNN

I forget the details of how this interface works, but
/sys/devices/virtual/net is definitely allows different overlapping
content per network namespace, so we should not need to add an extra
directory to make this work.

> This is due to Commit 24b1442 ("Driver-core: Always create class
> directories for classses that support namespaces.")
>
> Signed-off-by: Marc Angel <marc@arista.com>
> ---
> I'm not sure that the problems described in that commit message
> apply to macvtaps so maybe it is possible to keep the 'tapNN'
> device directly under 'macvtapX' and not disrupt userland.
>
> Should it even be possible to add a device of a class that doesn't
> support namespaces under one that does?
> This could lead to dead symlinks in the new device class directory or
> duplicate warnings because a device of the same name already exists in
> another namespace.

This definitely looks like something that bears digging into, and fixing
properly.

Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next] macvtap: add namespace support to the sysfs device class
  2016-04-25 19:12 ` Eric W. Biederman
@ 2016-04-28 16:35   ` Marc Angel
  2016-05-03 18:30     ` [PATCH net-next v2] " Marc Angel
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Angel @ 2016-04-28 16:35 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev

On Mon, Apr 25, 2016 at 8:12 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>> The 'net' device class is isolated between network namespaces so each
>> one has its own hierarchy of net devices.
>> This isn't the case for the 'macvtap' device class.
>> The problem occurs half-way through the netdev registration, when
>> `macvtap_device_event` is called-back to create the 'tapNN' macvtap
>> class device under the 'macvtapX' net class device.
>>
>> This patch adds namespace support the the 'macvtap' device class so
>> that /sys/class/macvtap is no longer shared between net namespaces.
>>
>> However, doing this has the side effect of changing
>> /sys/devices/virtual/net/macvtapX/tapNN  into
>> /sys/devices/virtual/net/macvtapX/macvtap/tapNN
>
> I forget the details of how this interface works, but
> /sys/devices/virtual/net is definitely allows different overlapping
> content per network namespace, so we should not need to add an extra
> directory to make this work.

It really seems like we do, unfortunately.

For a kernfs_node to have the KERNFS_NS flag enabled, sysfs_enable_ns
has to be called on it. This is only done in the create_dir function of
lib/kobject.c, and only when the parent of that kobject has a ktype with
the child_ns_type field set to something.

This is the case for class_dir_ktype which is the type used for the
"glue" dirs (the extra macvtap/ that is created under macvtapX).
This, however, is not the case for device_ktype, which is the type
used for every device directory.
When we create tapN directly under macvtapX, tapN doesn't get the
KERNFS_NS flag enabled -- unlike when created under the "glue" dir.

This is problematic when creating the following symlink:
/sys/class/macvtap/tapN -> /sys/devices/virtual/net/macvtapX/tapN.
The tapN in /sys/class/macvtap inherits the namespace tag from
/sys/devices/virtual/net/macvtapX/tapN, which doesn't have one anymore
and kernfs_add_one fails because it expects it to.

Adding a child_ns_type field to device_ktype is probably not a good idea
and seems to cause other problems.

The best workaround is probably to just create a symlink inside the
macvtapX device directory (tapN -> macvtap/tapN).

I'll update my patch accordingly if you don't have a better idea.

>> Should it even be possible to add a device of a class that doesn't
>> support namespaces under one that does?
>> This could lead to dead symlinks in the new device class directory or
>> duplicate warnings because a device of the same name already exists in
>> another namespace.
>
> This definitely looks like something that bears digging into, and fixing
> properly.
>
> Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next v2] macvtap: add namespace support to the sysfs device class
  2016-04-28 16:35   ` Marc Angel
@ 2016-05-03 18:30     ` Marc Angel
  2016-05-04 20:04       ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Angel @ 2016-05-03 18:30 UTC (permalink / raw)
  To: netdev; +Cc: ebiederm

When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, making the macvtap sysfs class namespace-aware has the side
effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN.

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces") and the fact that
class devices supporting namespaces are really not supposed to be placed
directly under other class devices.

To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
created inside the macvtapX directory.

Signed-off-by: Marc Angel <marc@arista.com>
---
 drivers/net/macvtap.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 95394ed..b7ebfcd 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -129,7 +129,18 @@ static DEFINE_MUTEX(minor_lock);
 static DEFINE_IDR(minor_idr);
 
 #define GOODCOPY_LEN 128
-static struct class *macvtap_class;
+static const void *macvtap_net_namespace(struct device *d)
+{
+	struct net_device *dev = to_net_dev(d->parent);
+	return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+	.name = "macvtap",
+	.owner = THIS_MODULE,
+	.ns_type = &net_ns_type_operations,
+	.namespace = macvtap_net_namespace,
+};
 static struct cdev macvtap_cdev;
 
 static const struct proto_ops macvtap_socket_ops;
@@ -1274,6 +1285,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 				unsigned long event, void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	const char *tap_name = kasprintf(GFP_KERNEL, "tap%d", dev->ifindex);
 	struct macvlan_dev *vlan;
 	struct device *classdev;
 	dev_t devt;
@@ -1295,16 +1307,21 @@ static int macvtap_device_event(struct notifier_block *unused,
 			return notifier_from_errno(err);
 
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		classdev = device_create(macvtap_class, &dev->dev, devt,
-					 dev, "tap%d", dev->ifindex);
+		classdev = device_create(&macvtap_class, &dev->dev, devt,
+					 dev, tap_name);
 		if (IS_ERR(classdev)) {
 			macvtap_free_minor(vlan);
 			return notifier_from_errno(PTR_ERR(classdev));
 		}
+		err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
+					dev_name(classdev));
+		if (err)
+			return notifier_from_errno(err);
 		break;
 	case NETDEV_UNREGISTER:
+		sysfs_remove_link(&dev->dev.kobj, tap_name);
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		device_destroy(macvtap_class, devt);
+		device_destroy(&macvtap_class, devt);
 		macvtap_free_minor(vlan);
 		break;
 	}
@@ -1330,11 +1347,9 @@ static int macvtap_init(void)
 	if (err)
 		goto out2;
 
-	macvtap_class = class_create(THIS_MODULE, "macvtap");
-	if (IS_ERR(macvtap_class)) {
-		err = PTR_ERR(macvtap_class);
+	err = class_register(&macvtap_class);
+	if (err)
 		goto out3;
-	}
 
 	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
@@ -1349,7 +1364,7 @@ static int macvtap_init(void)
 out5:
 	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 out3:
 	cdev_del(&macvtap_cdev);
 out2:
@@ -1363,7 +1378,7 @@ static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
 	unregister_netdevice_notifier(&macvtap_notifier_block);
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 	cdev_del(&macvtap_cdev);
 	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
 	idr_destroy(&minor_idr);
-- 
2.8.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v2] macvtap: add namespace support to the sysfs device class
  2016-05-03 18:30     ` [PATCH net-next v2] " Marc Angel
@ 2016-05-04 20:04       ` David Miller
  2016-05-05  0:33         ` Marc Angel
  2016-05-05  0:34         ` [PATCH net-next v3] " Marc Angel
  0 siblings, 2 replies; 12+ messages in thread
From: David Miller @ 2016-05-04 20:04 UTC (permalink / raw)
  To: marc; +Cc: netdev, ebiederm

From: Marc Angel <marc@arista.com>
Date: Tue,  3 May 2016 20:30:54 +0200

> @@ -1274,6 +1285,7 @@ static int macvtap_device_event(struct notifier_block *unused,
>  				unsigned long event, void *ptr)
>  {
>  	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	const char *tap_name = kasprintf(GFP_KERNEL, "tap%d", dev->ifindex);
>  	struct macvlan_dev *vlan;
>  	struct device *classdev;
>  	dev_t devt;

This 'tap_name' buffer seems to be leaked in several code paths.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v2] macvtap: add namespace support to the sysfs device class
  2016-05-04 20:04       ` David Miller
@ 2016-05-05  0:33         ` Marc Angel
  2016-05-05  3:53           ` David Miller
  2016-05-05  0:34         ` [PATCH net-next v3] " Marc Angel
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Angel @ 2016-05-05  0:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric W. Biederman

On Wed, May 4, 2016 at 9:04 PM, David Miller <davem@davemloft.net> wrote:
> From: Marc Angel <marc@arista.com>
> Date: Tue,  3 May 2016 20:30:54 +0200
>
>> @@ -1274,6 +1285,7 @@ static int macvtap_device_event(struct notifier_block *unused,
>>                               unsigned long event, void *ptr)
>>  {
>>       struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +     const char *tap_name = kasprintf(GFP_KERNEL, "tap%d", dev->ifindex);
>>       struct macvlan_dev *vlan;
>>       struct device *classdev;
>>       dev_t devt;
>
> This 'tap_name' buffer seems to be leaked in several code paths.

Sorry about that... V3 should be better.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next v3] macvtap: add namespace support to the sysfs device class
  2016-05-04 20:04       ` David Miller
  2016-05-05  0:33         ` Marc Angel
@ 2016-05-05  0:34         ` Marc Angel
  1 sibling, 0 replies; 12+ messages in thread
From: Marc Angel @ 2016-05-05  0:34 UTC (permalink / raw)
  To: netdev; +Cc: ebiederm

When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, making the macvtap sysfs class namespace-aware has the side
effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN.

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces") and the fact that
class devices supporting namespaces are really not supposed to be placed
directly under other class devices.

To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
created inside the macvtapX directory.

Signed-off-by: Marc Angel <marc@arista.com>
---
 drivers/net/macvtap.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 95394ed..d89820f 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -129,7 +129,18 @@ static DEFINE_MUTEX(minor_lock);
 static DEFINE_IDR(minor_idr);
 
 #define GOODCOPY_LEN 128
-static struct class *macvtap_class;
+static const void *macvtap_net_namespace(struct device *d)
+{
+	struct net_device *dev = to_net_dev(d->parent);
+	return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+	.name = "macvtap",
+	.owner = THIS_MODULE,
+	.ns_type = &net_ns_type_operations,
+	.namespace = macvtap_net_namespace,
+};
 static struct cdev macvtap_cdev;
 
 static const struct proto_ops macvtap_socket_ops;
@@ -1278,10 +1289,12 @@ static int macvtap_device_event(struct notifier_block *unused,
 	struct device *classdev;
 	dev_t devt;
 	int err;
+	char tap_name[IFNAMSIZ];
 
 	if (dev->rtnl_link_ops != &macvtap_link_ops)
 		return NOTIFY_DONE;
 
+	snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex);
 	vlan = netdev_priv(dev);
 
 	switch (event) {
@@ -1295,16 +1308,21 @@ static int macvtap_device_event(struct notifier_block *unused,
 			return notifier_from_errno(err);
 
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		classdev = device_create(macvtap_class, &dev->dev, devt,
-					 dev, "tap%d", dev->ifindex);
+		classdev = device_create(&macvtap_class, &dev->dev, devt,
+					 dev, tap_name);
 		if (IS_ERR(classdev)) {
 			macvtap_free_minor(vlan);
 			return notifier_from_errno(PTR_ERR(classdev));
 		}
+		err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
+					tap_name);
+		if (err)
+			return notifier_from_errno(err);
 		break;
 	case NETDEV_UNREGISTER:
+		sysfs_remove_link(&dev->dev.kobj, tap_name);
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		device_destroy(macvtap_class, devt);
+		device_destroy(&macvtap_class, devt);
 		macvtap_free_minor(vlan);
 		break;
 	}
@@ -1330,11 +1348,9 @@ static int macvtap_init(void)
 	if (err)
 		goto out2;
 
-	macvtap_class = class_create(THIS_MODULE, "macvtap");
-	if (IS_ERR(macvtap_class)) {
-		err = PTR_ERR(macvtap_class);
+	err = class_register(&macvtap_class);
+	if (err)
 		goto out3;
-	}
 
 	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
@@ -1349,7 +1365,7 @@ static int macvtap_init(void)
 out5:
 	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 out3:
 	cdev_del(&macvtap_cdev);
 out2:
@@ -1363,7 +1379,7 @@ static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
 	unregister_netdevice_notifier(&macvtap_notifier_block);
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 	cdev_del(&macvtap_cdev);
 	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
 	idr_destroy(&minor_idr);
-- 
2.8.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v2] macvtap: add namespace support to the sysfs device class
  2016-05-05  0:33         ` Marc Angel
@ 2016-05-05  3:53           ` David Miller
  2016-05-05 10:12             ` Marc Angel
  2016-05-05 10:14             ` [PATCH net-next v4] " Marc Angel
  0 siblings, 2 replies; 12+ messages in thread
From: David Miller @ 2016-05-05  3:53 UTC (permalink / raw)
  To: marc; +Cc: netdev, ebiederm

From: Marc Angel <marc@arista.com>
Date: Thu, 5 May 2016 01:33:09 +0100

> On Wed, May 4, 2016 at 9:04 PM, David Miller <davem@davemloft.net> wrote:
>> From: Marc Angel <marc@arista.com>
>> Date: Tue,  3 May 2016 20:30:54 +0200
>>
>>> @@ -1274,6 +1285,7 @@ static int macvtap_device_event(struct notifier_block *unused,
>>>                               unsigned long event, void *ptr)
>>>  {
>>>       struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>> +     const char *tap_name = kasprintf(GFP_KERNEL, "tap%d", dev->ifindex);
>>>       struct macvlan_dev *vlan;
>>>       struct device *classdev;
>>>       dev_t devt;
>>
>> This 'tap_name' buffer seems to be leaked in several code paths.
> 
> Sorry about that... V3 should be better.

V3 doesn't apply cleanly to net-next, please respin your patch against
the current tree.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v2] macvtap: add namespace support to the sysfs device class
  2016-05-05  3:53           ` David Miller
@ 2016-05-05 10:12             ` Marc Angel
  2016-05-05 10:14             ` [PATCH net-next v4] " Marc Angel
  1 sibling, 0 replies; 12+ messages in thread
From: Marc Angel @ 2016-05-05 10:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric W. Biederman

On Thu, May 5, 2016 at 4:53 AM, David Miller <davem@davemloft.net> wrote:
> From: Marc Angel <marc@arista.com>
> Date: Thu, 5 May 2016 01:33:09 +0100
>
>> On Wed, May 4, 2016 at 9:04 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Marc Angel <marc@arista.com>
>>> Date: Tue,  3 May 2016 20:30:54 +0200
>>>
>>>> @@ -1274,6 +1285,7 @@ static int macvtap_device_event(struct notifier_block *unused,
>>>>                               unsigned long event, void *ptr)
>>>>  {
>>>>       struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>>> +     const char *tap_name = kasprintf(GFP_KERNEL, "tap%d", dev->ifindex);
>>>>       struct macvlan_dev *vlan;
>>>>       struct device *classdev;
>>>>       dev_t devt;
>>>
>>> This 'tap_name' buffer seems to be leaked in several code paths.
>>
>> Sorry about that... V3 should be better.
>
> V3 doesn't apply cleanly to net-next, please respin your patch against
> the current tree.
>
> Thanks.

I wasn't up-to-date, sorry again about that...

Thanks for being so patient.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next v4] macvtap: add namespace support to the sysfs device class
  2016-05-05  3:53           ` David Miller
  2016-05-05 10:12             ` Marc Angel
@ 2016-05-05 10:14             ` Marc Angel
  2016-05-05 21:23               ` David Miller
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Angel @ 2016-05-05 10:14 UTC (permalink / raw)
  To: netdev; +Cc: ebiederm

When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, making the macvtap sysfs class namespace-aware has the side
effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN.

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces") and the fact that
class devices supporting namespaces are really not supposed to be placed
directly under other class devices.

To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
created inside the macvtapX directory.

Signed-off-by: Marc Angel <marc@arista.com>
---
 drivers/net/macvtap.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 74cb15a..22b85b0 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -129,7 +129,18 @@ static DEFINE_MUTEX(minor_lock);
 static DEFINE_IDR(minor_idr);
 
 #define GOODCOPY_LEN 128
-static struct class *macvtap_class;
+static const void *macvtap_net_namespace(struct device *d)
+{
+	struct net_device *dev = to_net_dev(d->parent);
+	return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+	.name = "macvtap",
+	.owner = THIS_MODULE,
+	.ns_type = &net_ns_type_operations,
+	.namespace = macvtap_net_namespace,
+};
 static struct cdev macvtap_cdev;
 
 static const struct proto_ops macvtap_socket_ops;
@@ -1278,10 +1289,12 @@ static int macvtap_device_event(struct notifier_block *unused,
 	struct device *classdev;
 	dev_t devt;
 	int err;
+	char tap_name[IFNAMSIZ];
 
 	if (dev->rtnl_link_ops != &macvtap_link_ops)
 		return NOTIFY_DONE;
 
+	snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex);
 	vlan = netdev_priv(dev);
 
 	switch (event) {
@@ -1295,19 +1308,24 @@ static int macvtap_device_event(struct notifier_block *unused,
 			return notifier_from_errno(err);
 
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		classdev = device_create(macvtap_class, &dev->dev, devt,
-					 dev, "tap%d", dev->ifindex);
+		classdev = device_create(&macvtap_class, &dev->dev, devt,
+					 dev, tap_name);
 		if (IS_ERR(classdev)) {
 			macvtap_free_minor(vlan);
 			return notifier_from_errno(PTR_ERR(classdev));
 		}
+		err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
+					tap_name);
+		if (err)
+			return notifier_from_errno(err);
 		break;
 	case NETDEV_UNREGISTER:
 		/* vlan->minor == 0 if NETDEV_REGISTER above failed */
 		if (vlan->minor == 0)
 			break;
+		sysfs_remove_link(&dev->dev.kobj, tap_name);
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		device_destroy(macvtap_class, devt);
+		device_destroy(&macvtap_class, devt);
 		macvtap_free_minor(vlan);
 		break;
 	}
@@ -1333,11 +1351,9 @@ static int macvtap_init(void)
 	if (err)
 		goto out2;
 
-	macvtap_class = class_create(THIS_MODULE, "macvtap");
-	if (IS_ERR(macvtap_class)) {
-		err = PTR_ERR(macvtap_class);
+	err = class_register(&macvtap_class);
+	if (err)
 		goto out3;
-	}
 
 	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
@@ -1352,7 +1368,7 @@ static int macvtap_init(void)
 out5:
 	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 out3:
 	cdev_del(&macvtap_cdev);
 out2:
@@ -1366,7 +1382,7 @@ static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
 	unregister_netdevice_notifier(&macvtap_notifier_block);
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 	cdev_del(&macvtap_cdev);
 	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
 	idr_destroy(&minor_idr);
-- 
2.8.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v4] macvtap: add namespace support to the sysfs device class
  2016-05-05 10:14             ` [PATCH net-next v4] " Marc Angel
@ 2016-05-05 21:23               ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2016-05-05 21:23 UTC (permalink / raw)
  To: marc; +Cc: netdev, ebiederm

From: Marc Angel <marc@arista.com>
Date: Thu,  5 May 2016 12:14:26 +0200

> When creating macvtaps that are expected to have the same ifindex
> in different network namespaces, only the first one will succeed.
> The others will fail with a sysfs_warn_dup warning due to them trying
> to create the following sysfs link (with 'NN' the ifindex of macvtapX):
> 
> /sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN
> 
> This is reproducible by running the following commands:
> 
> ip netns add ns1
> ip netns add ns2
> ip link add veth0 type veth peer name veth1
> ip link set veth0 netns ns1
> ip link set veth1 netns ns2
> ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
> ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap
> 
> The last command will fail with "RTNETLINK answers: File exists" (along
> with the kernel warning) but retrying it will work because the ifindex
> was incremented.
> 
> The 'net' device class is isolated between network namespaces so each
> one has its own hierarchy of net devices.
> This isn't the case for the 'macvtap' device class.
> The problem occurs half-way through the netdev registration, when
> `macvtap_device_event` is called-back to create the 'tapNN' macvtap
> class device under the 'macvtapX' net class device.
> 
> This patch adds namespace support to the 'macvtap' device class so
> that /sys/class/macvtap is no longer shared between net namespaces.
> 
> However, making the macvtap sysfs class namespace-aware has the side
> effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
> /sys/devices/virtual/net/macvtapX/macvtap/tapNN.
> 
> This is due to Commit 24b1442 ("Driver-core: Always create class
> directories for classses that support namespaces") and the fact that
> class devices supporting namespaces are really not supposed to be placed
> directly under other class devices.
> 
> To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
> created inside the macvtapX directory.
> 
> Signed-off-by: Marc Angel <marc@arista.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-05-05 21:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-20 14:11 [PATCH net-next] macvtap: add namespace support to the sysfs device class Marc Angel
2016-04-24 18:14 ` David Miller
2016-04-25 19:12 ` Eric W. Biederman
2016-04-28 16:35   ` Marc Angel
2016-05-03 18:30     ` [PATCH net-next v2] " Marc Angel
2016-05-04 20:04       ` David Miller
2016-05-05  0:33         ` Marc Angel
2016-05-05  3:53           ` David Miller
2016-05-05 10:12             ` Marc Angel
2016-05-05 10:14             ` [PATCH net-next v4] " Marc Angel
2016-05-05 21:23               ` David Miller
2016-05-05  0:34         ` [PATCH net-next v3] " Marc Angel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).