* Re: PATCH: Network Device Naming mechanism and policy
       [not found] <EDA0A4495861324DA2618B4C45DCB3EE58964E@blrx3m08.blr.amer.dell.com>
@ 2009-10-28 13:06 ` Narendra K
  0 siblings, 0 replies; 86+ messages in thread
From: Narendra K @ 2009-10-28 13:06 UTC (permalink / raw)
  To: notting, scott
  Cc: netdev, linux-hotplug, matt_domsch, jordan_hargrave, rose_charles

On Wed, Oct 28, 2009 at 06:21:49PM +0530, K, Narendra wrote:
> > At the moment, we do not appear to get the proper change uevents from
> > things like 'ip link set dev <foo> address <bar>', so we can't
> > currently maintain these symlinks.
> >
>
> I have observed that the kernel does generate a "move" event when
> interfaces are renamed. Looks like udev at present doesn't handle this
> event, but I suppose it could be extended to handle this event.
>

With the patch "[PATCH]udev:Extend udev to support move events"
(http://marc.info/?l=linux-hotplug&m=125673399217656&w=2) udev would be
able to handle "move" events that are generated when interfaces are
renamed by commands like nameif. And we can maintain the symlinks by
having rules to handle this move event.

With regards,
Narendra K

^ permalink raw reply [flat|nested] 86+ messages in thread
[parent not found: <EDA0A4495861324DA2618B4C45DCB3EE589541@blrx3m08.blr.amer.dell.com>]
* Re: PATCH: Network Device Naming mechanism and policy
       [not found] <EDA0A4495861324DA2618B4C45DCB3EE589541@blrx3m08.blr.amer.dell.com>
@ 2009-10-12 18:47 ` Narendra K
  2009-10-12 19:09   ` Greg KH
  0 siblings, 1 reply; 86+ messages in thread
From: Narendra K @ 2009-10-12 18:47 UTC (permalink / raw)
  To: greg, notting
  Cc: matt_domsch, netdev, shemminger, linux-hotplug, jordan_hargrave,
	charles_rose

> On Mon, Oct 12, 2009 at 01:45:28PM -0400, Bill Nottingham wrote:
> > Greg KH (greg@kroah.com) said:
> > > > Today, port naming is completely nondeterministic. If you have
> > > > but one NIC, there are few chances to get the name wrong (it'll
> > > > be eth0).
> > > > If you have >1 NIC, chances increase to get it wrong.
> > >
> > > That is why all distros name network devices based on the only
> > > deterministic thing they have today, the MAC address. I still fail
> > > to see why you do not like this solution, it is honestly the only
> > > way to properly name network devices in a sane manner.
> > >
> > > All distros also provide a way to easily rename the network devices,
> > > to place a specific name on a specific MAC address, so again, this
> > > should all be solved already.
> >
> > No, it's not solved. Even if you have persistent names once you
> > install, if you ever re-image, you're likely to get *different*
> > persistent names; the first load will always be non-deterministic.
> >
> > The only way around this would be to have some sort of screen like:
> >
> > Would you like your network devices to be enumerated by
> >
> > [ ] MAC address
> > [ ] PCI device order
> > [ ] Driver name
> > [ ] Other
>
> [ ] PCI slot name
>
> That's one that modern systems are now reporting, and should solve
> Matt's problem as well, right?

MAC address and PCI slots might ensure that device names are persistent
across system reboots. They do not assure that the LOM 1 is named as
"eth0" which is the expectation.

In case of unattended installs, installers abort installation if the
port which gets the name "eth0" does not have the link up and doesn't
have the IP.

This is often the case because the LOMs have the boot capability. We
can achieve persistent naming using MAC addresses. But it doesn't
address the expectation that LOM-1 becomes "eth0" on every reboot,
which is mostly used for unattended installs. (Installers can be told
to use options like IPAPPEND 2, but this solution would make it of
no use).

> > which is just all sorts of fail in and of itself. Especially since
> > once you get to the point where you can coherently ask this in a
> > native installer, the drivers have already loaded.
>
> No, the driver load order doesn't determine this, you need the drivers
> loaded first before you can rename anything :)
>

Renaming an interface in the kernel namespace itself might lead to
problems like duplicate names. But having names in an alternate
namespace, not in the kernel namespace, might be more useful.

> And I don't see how Matt's proposed patch helps resolve this type of
> issue any better than what we currently have today, do you?
>

I have a system which has 4 LOMs and 1 add-in NIC and the add-in NIC
always gets the name "eth0" even though I PXE booted from LOM-1. Since
"eth0" doesn't have link up, the installer stops and asks which
interface should get IP. This would not suit an unattended install
scenario.

If the installer can use a pathname like
/dev/net/by-chassis-label/Embedded_NIC_1 (->eth1 which is my LOM-1), it
would always point to the correct interface irrespective of whether it
is "eth0" or not.

With regards,
Narendra K

^ permalink raw reply [flat|nested] 86+ messages in thread
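The installer-side lookup described in the message above can be sketched in userspace C. This is only a minimal sketch under assumptions: the /dev/net/by-chassis-label/ path is the naming proposed in this thread (a stock system does not provide it), and ifname_from_symlink() is a hypothetical helper name, not part of any existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>   /* PATH_MAX */
#include <string.h>
#include <unistd.h>   /* readlink() */

/*
 * Resolve a udev-style symlink (for example the proposed
 * /dev/net/by-chassis-label/Embedded_NIC_1) to the kernel interface
 * name it points at, by taking the last path component of the link
 * target. Returns 0 on success, -1 on failure.
 */
int ifname_from_symlink(const char *link_path, char *ifname, size_t ifname_len)
{
	char target[PATH_MAX];
	const char *base;
	ssize_t n;

	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink() does not NUL-terminate */

	base = strrchr(target, '/');
	base = base ? base + 1 : target;

	if (*base == '\0' || strlen(base) >= ifname_len)
		return -1;
	strcpy(ifname, base);
	return 0;
}
```

An installer could pass the resulting name to if_nametoindex() or to its own network setup code, so that its configuration refers to the chassis label rather than to "eth0".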
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-12 18:47 ` Narendra K
@ 2009-10-12 19:09   ` Greg KH
  2009-10-12 19:41     ` Karl O. Pinc
  2009-10-12 19:48     ` Matt Domsch
  0 siblings, 2 replies; 86+ messages in thread
From: Greg KH @ 2009-10-12 19:09 UTC (permalink / raw)
  To: Narendra K
  Cc: notting, matt_domsch, netdev, shemminger, linux-hotplug,
	jordan_hargrave, charles_rose

On Mon, Oct 12, 2009 at 01:47:12PM -0500, Narendra K wrote:
> > On Mon, Oct 12, 2009 at 01:45:28PM -0400, Bill Nottingham wrote:
> > > Greg KH (greg@kroah.com) said:
> > > > > Today, port naming is completely nondeterministic. If you have
> > > > > but one NIC, there are few chances to get the name wrong (it'll
> > > > > be eth0).
> > > > > If you have >1 NIC, chances increase to get it wrong.
> > > >
> > > > That is why all distros name network devices based on the only
> > > > deterministic thing they have today, the MAC address. I still fail
> > > > to see why you do not like this solution, it is honestly the only
> > > > way to properly name network devices in a sane manner.
> > > >
> > > > All distros also provide a way to easily rename the network
> > > > devices, to place a specific name on a specific MAC address, so
> > > > again, this should all be solved already.
> > >
> > > No, it's not solved. Even if you have persistent names once you
> > > install, if you ever re-image, you're likely to get *different*
> > > persistent names; the first load will always be non-deterministic.
> > >
> > > The only way around this would be to have some sort of screen like:
> > >
> > > Would you like your network devices to be enumerated by
> > >
> > > [ ] MAC address
> > > [ ] PCI device order
> > > [ ] Driver name
> > > [ ] Other
> >
> > [ ] PCI slot name
> >
> > That's one that modern systems are now reporting, and should solve
> > Matt's problem as well, right?
>
> MAC address and PCI slots might ensure that device names are persistent
> across system reboots. They do not assure that the LOM 1 is named as
> "eth0" which is the expectation.

"LOM"?

Isn't what you want is a PCI slot detection, combined with the order on
board in which the port is enumerated?

> In case of unattended installs, installers abort installation if the
> port which gets the name "eth0" does not have the link up and doesn't
> have the IP.

Sounds like a broken installer :)

> This is often the case because the LOMs have the boot capability. We
> can achieve persistent naming using MAC addresses. But it doesn't
> address the expectation that LOM-1 becomes "eth0" on every reboot,
> which is mostly used for unattended installs. (Installers can be told
> to use options like IPAPPEND 2, but this solution would make it of
> no use).

I still fail to see how this dummy char device would solve this problem,
as everything you can do today in userspace would be the same with this
device node as you can't do anything with the symlink name on its own,
right?

> > > which is just all sorts of fail in and of itself. Especially since
> > > once you get to the point where you can coherently ask this in a
> > > native installer, the drivers have already loaded.
> >
> > No, the driver load order doesn't determine this, you need the drivers
> > loaded first before you can rename anything :)
> >
>
> Renaming an interface in the kernel namespace itself might lead to
> problems like duplicate names. But having names in an alternate
> namespace, not in the kernel namespace, might be more useful.

Not if you can't do anything useful with those names :)

> > And I don't see how Matt's proposed patch helps resolve this type of
> > issue any better than what we currently have today, do you?
> >
>
> I have a system which has 4 LOMs and 1 add-in NIC and the add-in NIC
> always gets the name "eth0" even though I PXE booted from LOM-1. Since
> "eth0" doesn't have link up, the installer stops and asks which
> interface should get IP. This would not suit an unattended install
> scenario.
>
> If the installer can use a pathname like
> /dev/net/by-chassis-label/Embedded_NIC_1 (->eth1 which is my LOM-1), it
> would always point to the correct interface irrespective of whether it
> is "eth0" or not.

Um, again, you can name your network devices like this today, without
these symlinks...

thanks,

greg k-h

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-12 19:09 ` Greg KH
@ 2009-10-12 19:41   ` Karl O. Pinc
  2009-10-13 18:17     ` Dan Williams
  2009-10-12 19:48   ` Matt Domsch
  1 sibling, 1 reply; 86+ messages in thread
From: Karl O. Pinc @ 2009-10-12 19:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Narendra K, notting, matt_domsch, netdev, shemminger,
	linux-hotplug, jordan_hargrave, charles_rose

On 10/12/2009 02:09:00 PM, Greg KH wrote:

> "LOM"?

"LAN On Motherboard" of all things.

(I had to look this
one up. The expansion better suited
to today's economy is "Low On Manna".)

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
               -- Robert A. Heinlein

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-12 19:41 ` Karl O. Pinc
@ 2009-10-13 18:17   ` Dan Williams
  2009-10-13 18:56     ` Ben Hutchings
  0 siblings, 1 reply; 86+ messages in thread
From: Dan Williams @ 2009-10-13 18:17 UTC (permalink / raw)
  To: Karl O. Pinc
  Cc: Greg KH, Narendra K, notting, matt_domsch, netdev, shemminger,
	linux-hotplug, jordan_hargrave, charles_rose

On Mon, 2009-10-12 at 14:41 -0500, Karl O. Pinc wrote:
> On 10/12/2009 02:09:00 PM, Greg KH wrote:
>
> > "LOM"?
>
> "LAN On Motherboard" of all things.
>
> (I had to look this
> one up. The expansion better suited
> to today's economy is "Low On Manna".)

Or the previous usage from Sun, Apple, and others: "Lights Out
Management". Not sure why they needed to clash with a name already used
in server-space, but perhaps I'm just out-of-date and there's a fancier
name for LOM these days, and LOM got repurposed.

Dan

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-13 18:17 ` Dan Williams
@ 2009-10-13 18:56   ` Ben Hutchings
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Hutchings @ 2009-10-13 18:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Karl O. Pinc, Greg KH, Narendra K, notting, matt_domsch, netdev,
	shemminger, linux-hotplug, jordan_hargrave, charles_rose

On Tue, 2009-10-13 at 11:17 -0700, Dan Williams wrote:
> On Mon, 2009-10-12 at 14:41 -0500, Karl O. Pinc wrote:
> > On 10/12/2009 02:09:00 PM, Greg KH wrote:
> >
> > > "LOM"?
> >
> > "LAN On Motherboard" of all things.
> >
> > (I had to look this
> > one up. The expansion better suited
> > to today's economy is "Low On Manna".)
>
> Or the previous usage from Sun, Apple, and others: "Lights Out
> Management". Not sure why they needed to clash with a name already used
> in server-space, but perhaps I'm just out-of-date and there's a fancier
> name for LOM these days, and LOM got repurposed.

It's widely used with both meanings - what's more, LOM is an expected
feature of LOMs. :-)

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-12 19:09 ` Greg KH
  2009-10-12 19:41   ` Karl O. Pinc
@ 2009-10-12 19:48   ` Matt Domsch
  1 sibling, 0 replies; 86+ messages in thread
From: Matt Domsch @ 2009-10-12 19:48 UTC (permalink / raw)
  To: Greg KH
  Cc: Narendra K, notting, netdev, shemminger, linux-hotplug,
	jordan_hargrave, charles_rose

On Mon, Oct 12, 2009 at 12:09:00PM -0700, Greg KH wrote:
> "LOM"?

LAN on Motherboard (e.g. an embedded NIC, as opposed to being in some
slot).

> Isn't what you want is a PCI slot detection, combined with the order on
> board in which the port is enumerated?

Most folks do, yes.

> I still fail to see how this dummy char device would solve this problem,
> as everything you can do today in userspace would be the same with this
> device node as you can't do anything with the symlink name on its own,
> right?

You are correct, the char device by itself doesn't help with this.

As you noted earlier, the char device is really only needed if we want
to be able to have multiple names for the same device, only exposed in
userspace. If all we want to do is change the namespace for devices the
kernel uses, from "ethN" to something else, we can do that with a single
simple rename. And biosdevname has several --policy=[] options to
provide that.

--policy=smbios_names => "Embedded NIC 1", "PCI2"
--policy=kernelnames => "eth0" (kind of pointless, but included for
  completeness)
--policy=all_ethN => "eth0..ethN" in ascending slot order, embedded
  before slots, within a single slot in PCI breadth-first order, and
  thereafter in MAC address order if really needed.
--policy=all_names => "eth_s0_0" for the first embedded NIC in PCI
  breadth-first order, "eth_s1_1" for the second NIC port in PCI slot 1,
  again in breadth-first order.
--policy=embedded_ethN_slots_names
  a combo of the above, but making the embeddeds still retain the "eth0"
  format and the slots get "eth_s1_1" format.

We could add a dozen more.

all_ethN, and to a lesser extent, embedded_ethN, are bad choices if
biosdevname is invoked by udev on every run (e.g. not using persistent
rules), because when it's run, userspace doesn't know if there are more
drivers to be loaded yet, and so biosdevname can't know if there are
more NICs to include in the enumeration, to get the naming right. (Yet
another example of enumeration != naming.)

Now, for --policy=smbios_names, we get lucky in that the string length
returned from SMBIOS is 14 characters, so it fits in IFNAMSIZ. We may
not always get so lucky; SMBIOS strings are arbitrary lengths. This
works somewhat better as a symlink source, as that can be longer than
IFNAMSIZ.

--policy=all_names is pretty good. It fits, it lines up to a fairly
obvious hardware mapping. It breaks any code that assumes a regular
expression eth[[:digit:]]+ for the name.

By having a single name in the kernel for a particular device, it
forces a sysadmin to choose one naming policy. We can't have multiple
names for the same device (like we do for disks). And conceptually, I'd
like to be able to have a physical-based naming scheme (all_names) for
use at install time and mechanical configuration, and a logical-based
naming scheme for firewall rules and other policy-based configuration.
I can't do that with a single name.

I can't please everyone. I can't keep the kernel's eth* namespace
intact, as it is meaningless and non-deterministic. I can switch names
to another namespace, at the risk of breaking all the applications that
have bad assumptions. And I can't have multiple names for the same
device.

But if I have multiple names for the same device, then I can keep the
eth* namespace intact (meaningless as it is), and provide more
meaningful names that work too.

I'm not hung up on the char device. If I could have multiple names for
the same device, done entirely inside the kernel, I'd go for that too.
That suggestion has met similar resistance. I'm open to any other
mechanism too.

But _not_ solving it is no longer an option for me and my customers.

-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply [flat|nested] 86+ messages in thread
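The two constraints raised in the message above (a name must fit in IFNAMSIZ, and some applications assume names match eth[[:digit:]]+) can be checked with a few lines of userspace C. This is a hedged sketch, not code from biosdevname: the helper names fits_ifnamsiz() and looks_like_ethN() are hypothetical, and the POSIX constant IF_NAMESIZE (16 on Linux, including the terminating NUL) stands in for the kernel's IFNAMSIZ.

```c
#include <net/if.h>    /* IF_NAMESIZE */
#include <regex.h>     /* regcomp(), regexec(), regfree() */
#include <stdbool.h>
#include <string.h>

/* True if the name is non-empty and fits in an interface name buffer,
 * leaving room for the terminating NUL. Only the length is checked. */
bool fits_ifnamsiz(const char *name)
{
	size_t len = strlen(name);

	return len > 0 && len < IF_NAMESIZE;
}

/* True if the name matches the assumption some applications make,
 * namely that every interface is called "eth" followed by digits. */
bool looks_like_ethN(const char *name)
{
	regex_t re;
	bool match;

	if (regcomp(&re, "^eth[[:digit:]]+$", REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	match = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

With these helpers, "eth_s1_1" passes the length check but fails the ethN pattern, which is the application breakage the message warns about for --policy=all_names.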
[parent not found: <EDA0A4495861324DA2618B4C45DCB3EE58953F@blrx3m08.blr.amer.dell.com>]
* Re: PATCH: Network Device Naming mechanism and policy
       [not found] <EDA0A4495861324DA2618B4C45DCB3EE58953F@blrx3m08.blr.amer.dell.com>
@ 2009-10-12 18:07 ` Narendra K
  0 siblings, 0 replies; 86+ messages in thread
From: Narendra K @ 2009-10-12 18:07 UTC (permalink / raw)
  To: notting, scott
  Cc: matt_domsch, netdev, linux-hotplug, jordan_hargrave, charles_rose

> > This makes them pretty comparable to LABELs on disks, and we have a
> > /dev/disk/by-label
> >
> > Remember that udev already supports symlink stacking, and priorities
> > and such.
> >
> > I don't think there's any danger of supporting a /dev/netdev/by-mac by
> > default, it'll be a benefit to most and those who don't have unique
> > MACs will just ignore it.
>
> At the moment, we do not appear to get the proper change uevents from
> things like 'ip link set dev <foo> address <bar>', so we can't currently
> maintain these symlinks.
>

I have observed that the kernel does generate a "move" event when
interfaces are renamed. Looks like udev at present doesn't handle this
event, but I suppose it could be extended to handle this event.

With regards,
Narendra K

^ permalink raw reply [flat|nested] 86+ messages in thread
[parent not found: <EDA0A4495861324DA2618B4C45DCB3EE5894F6@blrx3m08.blr.amer.dell.com>]
* Re: PATCH: Network Device Naming mechanism and policy
       [not found] <EDA0A4495861324DA2618B4C45DCB3EE5894F6@blrx3m08.blr.amer.dell.com>
@ 2009-10-09 16:04 ` Narendra K
  2009-10-09 16:12   ` Stephen Hemminger
  0 siblings, 1 reply; 86+ messages in thread
From: Narendra K @ 2009-10-09 16:04 UTC (permalink / raw)
  To: netdev, linux-hotplug; +Cc: matt_domsch, jordan_hargrave

On Fri, Oct 09, 2009 at 09:22:19PM +0530, K, Narendra wrote:
> Jordan_Hargrave@Dell.com wrote:
> > example udev config:
> > SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> work as well. But coupling the ifindex to the MAC address like this
> doesn't work. (In general, coupling any two unrelated attributes when
> trying to do persistent names doesn't work.)
>

Attaching the latest patch incorporating review comments.

By creating character devices for every network device, we can use
udev to maintain alternate naming policies for devices, including
additional names for the same device, without interfering with the
name that the kernel assigns a device.

This is conditionalized on CONFIG_NET_CDEV. If enabled (the default),
device nodes will automatically be created in /dev/netdev/ for each
network device. (/dev/net/ is already populated by the tun device.)

These device nodes are not functional at the moment - open() returns
-ENOSYS. Their only purpose is to provide userspace with a kernel
name to ifindex mapping, in a form that udev can easily manage.

Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
Signed-off-by: Narendra K <Narendra_K@dell.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
---
 include/linux/netdevice.h |    4 ++++
 net/Kconfig               |   10 ++++++++++
 net/core/Makefile         |    1 +
 net/core/cdev.c           |   42 ++++++++++++++++++++++++++++++++++++++++++
 net/core/cdev.h           |   13 +++++++++++++
 net/core/dev.c            |   10 ++++++++++
 net/core/net-sysfs.c      |   13 +++++++++++++
 7 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 net/core/cdev.c
 create mode 100644 net/core/cdev.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 94958c1..7c0fc81 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -44,6 +44,7 @@
 #include <linux/workqueue.h>
 
 #include <linux/ethtool.h>
+#include <linux/cdev.h>
 #include <net/net_namespace.h>
 #include <net/dsa.h>
 #ifdef CONFIG_DCB
@@ -916,6 +917,9 @@ struct net_device
 	/* max exchange id for FCoE LRO by ddp */
 	unsigned int		fcoe_ddp_xid;
 #endif
+#ifdef CONFIG_NET_CDEV
+	struct cdev cdev;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..bdc5bd7 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -43,6 +43,16 @@ config COMPAT_NETLINK_MESSAGES
 	  Newly written code should NEVER need this option but do
 	  compat-independent messages instead!
 
+config NET_CDEV
+	bool "/dev files for network devices"
+	default y
+	help
+	  This option causes /dev entries to be created for each
+	  network device. This allows the use of udev to create
+	  alternate device naming policies.
+
+	  If unsure, say Y.
+
 menu "Networking options"
 
 source "net/packet/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index 796f46e..0b40d2c 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_NET_DMA) += user_dma.o
 obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
+obj-$(CONFIG_NET_CDEV) += cdev.o
diff --git a/net/core/cdev.c b/net/core/cdev.c
new file mode 100644
index 0000000..1f36076
--- /dev/null
+++ b/net/core/cdev.c
@@ -0,0 +1,42 @@
+#include <linux/fs.h>
+#include <linux/cdev.h>
+#include <linux/netdevice.h>
+#include <linux/device.h>
+
+/* Used for network dynamic major number */
+static dev_t netdev_devt;
+
+static int netdev_cdev_open(struct inode *inode, struct file *filep)
+{
+	/* no operations on this device are implemented */
+	return -ENOSYS;
+}
+
+static const struct file_operations netdev_cdev_fops = {
+	.owner = THIS_MODULE,
+	.open = netdev_cdev_open,
+};
+
+void netdev_cdev_alloc(void)
+{
+	alloc_chrdev_region(&netdev_devt, 0, 1<<20, "net");
+}
+
+void netdev_cdev_init(struct net_device *dev)
+{
+	cdev_init(&dev->cdev, &netdev_cdev_fops);
+	cdev_add(&dev->cdev, MKDEV(MAJOR(netdev_devt), dev->ifindex), 1);
+
+}
+
+void netdev_cdev_del(struct net_device *dev)
+{
+	if (dev->cdev.dev)
+		cdev_del(&dev->cdev);
+}
+
+void netdev_cdev_kobj_init(struct device *dev, struct net_device *net)
+{
+	if (net->cdev.dev)
+		dev->devt = net->cdev.dev;
+}
diff --git a/net/core/cdev.h b/net/core/cdev.h
new file mode 100644
index 0000000..9cf5a90
--- /dev/null
+++ b/net/core/cdev.h
@@ -0,0 +1,13 @@
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_CDEV
+void netdev_cdev_alloc(void);
+void netdev_cdev_init(struct net_device *dev);
+void netdev_cdev_del(struct net_device *dev);
+void netdev_cdev_kobj_init(struct device *dev, struct net_device *net);
+#else
+static inline void netdev_cdev_alloc(void) {}
+static inline void netdev_cdev_init(struct net_device *dev) {}
+static inline void netdev_cdev_del(struct net_device *dev) {}
+static inline void netdev_cdev_kobj_init(struct device *dev, struct net_device *net) {}
+#endif
diff --git a/net/core/dev.c b/net/core/dev.c
index b8f74cf..c4ebfcd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <trace/events/napi.h>
 
 #include "net-sysfs.h"
+#include "cdev.h"
 
 /* Instead of increasing this, you should create a hash table. */
 #define MAX_GRO_SKBS 8
@@ -4684,6 +4685,7 @@ static void rollback_registered(struct net_device *dev)
 
 	/* Remove entries from kobject tree */
 	netdev_unregister_kobject(dev);
+	netdev_cdev_del(dev);
 
 	synchronize_net();
 
@@ -4835,6 +4837,8 @@ int register_netdevice(struct net_device *dev)
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
+	netdev_cdev_init(dev);
+
 	netdev_initialize_kobject(dev);
 	ret = netdev_register_kobject(dev);
 	if (ret)
@@ -4864,6 +4868,7 @@ out:
 	return ret;
 
 err_uninit:
+	netdev_cdev_del(dev);
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
 	goto out;
@@ -5371,6 +5376,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_addr_discard(dev);
 
 	netdev_unregister_kobject(dev);
+	netdev_cdev_del(dev);
 
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
@@ -5387,6 +5393,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 			dev->iflink = dev->ifindex;
 	}
 
+	netdev_cdev_init(dev);
+
 	/* Fixup kobjects */
 	err = netdev_register_kobject(dev);
 	WARN_ON(err);
@@ -5620,6 +5628,8 @@ static int __init net_dev_init(void)
 
 	BUG_ON(!dev_boot_phase);
 
+	netdev_cdev_alloc();
+
 	if (dev_proc_init())
 		goto out;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 821d309..ba0af79 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -19,6 +19,7 @@
 #include <net/wext.h>
 
 #include "net-sysfs.h"
+#include "cdev.h"
 
 #ifdef CONFIG_SYSFS
 static const char fmt_hex[] = "%#x\n";
@@ -461,6 +462,14 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }
 
+#ifdef CONFIG_NET_CDEV
+static char *netdev_devnode(struct device *d, mode_t *mode)
+{
+	struct net_device *dev = to_net_dev(d);
+	return kasprintf(GFP_KERNEL, "netdev/%s", dev->name);
+}
+#endif
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -470,6 +479,9 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+#ifdef CONFIG_NET_CDEV
+	.devnode = netdev_devnode,
+#endif
 };
 
 /* Delete sysfs entries but hold kobject reference until after all
@@ -496,6 +508,7 @@ int netdev_register_kobject(struct net_device *net)
 	dev->class = &net_class;
 	dev->platform_data = net;
 	dev->groups = groups;
+	netdev_cdev_kobj_init(dev, net);
 
 	dev_set_name(dev, "%s", net->name);
 
-- 
With regards,
Narendra K

^ permalink raw reply related [flat|nested] 86+ messages in thread
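Because the patch above registers each node with MKDEV(MAJOR(netdev_devt), dev->ifindex), the minor number of a /dev/netdev/ node would be the interface's ifindex. A userspace consumer could recover it roughly as follows. This is a sketch under that assumption only: the nodes exist solely with the proposed patch applied, and ifindex_from_devt() and ifindex_from_node() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() on glibc */

/* With the proposed patch, the minor number of the node is the ifindex. */
unsigned int ifindex_from_devt(dev_t devt)
{
	return minor(devt);
}

/*
 * stat() follows symlinks, so this also works through a udev-managed
 * alias that points at the node. Returns the ifindex, or -1 if the path
 * cannot be examined or is not a character device.
 */
long ifindex_from_node(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
		return -1;
	return (long)ifindex_from_devt(st.st_rdev);
}
```

The returned index could then be handed to if_indextoname() to obtain the current kernel name, which is the name-to-ifindex mapping the patch description refers to.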
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-09 16:04 ` Narendra K
@ 2009-10-09 16:12   ` Stephen Hemminger
  2009-10-09 16:25     ` Matt Domsch
  0 siblings, 1 reply; 86+ messages in thread
From: Stephen Hemminger @ 2009-10-09 16:12 UTC (permalink / raw)
  To: Narendra K; +Cc: netdev, linux-hotplug, matt_domsch, jordan_hargrave

On Fri, 9 Oct 2009 11:04:43 -0500
Narendra K <Narendra_K@dell.com> wrote:

> On Fri, Oct 09, 2009 at 09:22:19PM +0530, K, Narendra wrote:
> > Jordan_Hargrave@Dell.com wrote:
> > > example udev config:
> > > SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> > work as well. But coupling the ifindex to the MAC address like this
> > doesn't work. (In general, coupling any two unrelated attributes when
> > trying to do persistent names doesn't work.)
> >
> Attaching the latest patch incorporating review comments.
>
> By creating character devices for every network device, we can use
> udev to maintain alternate naming policies for devices, including
> additional names for the same device, without interfering with the
> name that the kernel assigns a device.
>
> This is conditionalized on CONFIG_NET_CDEV. If enabled (the default),
> device nodes will automatically be created in /dev/netdev/ for each
> network device. (/dev/net/ is already populated by the tun device.)
>
> These device nodes are not functional at the moment - open() returns
> -ENOSYS. Their only purpose is to provide userspace with a kernel
> name to ifindex mapping, in a form that udev can easily manage.
>
> Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
> Signed-off-by: Narendra K <Narendra_K@dell.com>
> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>

What happens if interface is renamed by either networking API:
  ip li set dev eth0 name eth-renamed-by-me
or via
  mv /dev/net/eth0 /dev/net/eth-renamed-by-user
or if both are done at same time (what is locking model?)

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy
  2009-10-09 16:12 ` Stephen Hemminger
@ 2009-10-09 16:25   ` Matt Domsch
  0 siblings, 0 replies; 86+ messages in thread
From: Matt Domsch @ 2009-10-09 16:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave

On Fri, Oct 09, 2009 at 09:12:47AM -0700, Stephen Hemminger wrote:
> On Fri, 9 Oct 2009 11:04:43 -0500
> Narendra K <Narendra_K@dell.com> wrote:
>
> > By creating character devices for every network device, we can use
> > udev to maintain alternate naming policies for devices, including
> > additional names for the same device, without interfering with the
> > name that the kernel assigns a device.
> >
> What happens if interface is renamed by either networking API:
>   ip li set dev eth0 name eth-renamed-by-me

udev sees a KOBJ_MOVE uevent. Today it does not handle these events at
all, but talking with Kay, he believes udev can be extended to handle
that pretty easily.

> or via
>   mv /dev/net/eth0 /dev/net/eth-renamed-by-user

There is no VFS magic today such that this 'mv' will translate into a
device_rename() function inside the kernel. udev "owns" the
/dev/netdev/eth0 device node name. If a user (root) does a 'mv', the
symlink referents will be broken. This is no different than doing so
for a disk device or any other udev-managed device node. If someone
does a

  mv /dev/sda /dev/sda-mybootdisk

and is relying on the /dev/disk/by-label/mybootdisk -> /dev/sda symlink
in some way, the application will fail.

> or if both are done at same time (what is locking model?)

There is no locking model. udev will serialize the rename events
though, as seen in userspace.

Thanks,
Matt

-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply [flat|nested] 86+ messages in thread
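To make the KOBJ_MOVE handling discussed above more concrete, here is a small sketch of how a userspace listener might recognise a network rename in a raw kernel uevent payload (a sequence of NUL-terminated "KEY=value" strings preceded by an "action@devpath" header). It is not udev's implementation; uevent_get() and is_net_rename() are hypothetical helper names, and the sketch assumes the payload has already been read from a NETLINK_KOBJECT_UEVENT socket.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up KEY in a uevent payload of NUL-terminated "KEY=value" strings.
 * Returns a pointer to the (NUL-terminated) value inside buf, or NULL if
 * the key is absent. An unterminated trailing entry is ignored.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = 0;

	while (off < len) {
		const char *entry = buf + off;
		const char *end = memchr(entry, '\0', len - off);
		size_t elen;

		if (end == NULL)
			break;
		elen = (size_t)(end - entry);
		if (elen > klen && entry[klen] == '=' &&
		    strncmp(entry, key, klen) == 0)
			return entry + klen + 1;
		off += elen + 1;
	}
	return NULL;
}

/* True if the payload is a "move" event for a device in the net subsystem. */
int is_net_rename(const char *buf, size_t len)
{
	const char *action = uevent_get(buf, len, "ACTION");
	const char *subsys = uevent_get(buf, len, "SUBSYSTEM");

	return action != NULL && subsys != NULL &&
	       strcmp(action, "move") == 0 && strcmp(subsys, "net") == 0;
}
```

For a rename, a handler would compare DEVPATH_OLD with DEVPATH to learn the previous and current names and re-point any symlinks it maintains.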
[parent not found: <EDA0A4495861324DA2618B4C45DCB3EE5894ED@blrx3m08.blr.amer.dell.com>]
* Re: PATCH: Network Device Naming mechanism and policy
       [not found] <EDA0A4495861324DA2618B4C45DCB3EE5894ED@blrx3m08.blr.amer.dell.com>
@ 2009-10-09 14:00 ` Narendra K
  2009-10-09 14:51   ` Matt Domsch
  ` (3 more replies)
  0 siblings, 4 replies; 86+ messages in thread
From: Narendra K @ 2009-10-09 14:00 UTC (permalink / raw)
  To: netdev, linux-hotplug; +Cc: matt_domsch, jordan_hargrave, narendra_k

On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote:
> > example udev config:
> > SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
>
> work as well. But coupling the ifindex to the MAC address like this
> doesn't work. (In general, coupling any two unrelated attributes when
> trying to do persistent names doesn't work.)
>

Attaching the latest patch incorporating review comments.

By creating character devices for every network device, we can use
udev to maintain alternate naming policies for devices, including
additional names for the same device, without interfering with the
name that the kernel assigns a device.

This is conditionalized on CONFIG_NET_CDEV. If enabled (the default),
device nodes will automatically be created in /dev/netdev/ for each
network device. (/dev/net/ is already populated by the tun device.)

These device nodes are not functional at the moment - open() returns
-ENOSYS. Their only purpose is to provide userspace with a kernel
name to ifindex mapping, in a form that udev can easily manage.

Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
---
 include/linux/netdevice.h |    4 ++++
 net/Kconfig               |   10 ++++++++++
 net/core/Makefile         |    1 +
 net/core/cdev.c           |   42 ++++++++++++++++++++++++++++++++++++++++++
 net/core/cdev.h           |   13 +++++++++++++
 net/core/dev.c            |   10 ++++++++++
 net/core/net-sysfs.c      |   13 +++++++++++++
 7 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 net/core/cdev.c
 create mode 100644 net/core/cdev.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 94958c1..7c0fc81 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -44,6 +44,7 @@
 #include <linux/workqueue.h>
 
 #include <linux/ethtool.h>
+#include <linux/cdev.h>
 #include <net/net_namespace.h>
 #include <net/dsa.h>
 #ifdef CONFIG_DCB
@@ -916,6 +917,9 @@ struct net_device
 	/* max exchange id for FCoE LRO by ddp */
 	unsigned int		fcoe_ddp_xid;
 #endif
+#ifdef CONFIG_NET_CDEV
+	struct cdev cdev;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..bdc5bd7 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -43,6 +43,16 @@ config COMPAT_NETLINK_MESSAGES
 	  Newly written code should NEVER need this option but do
 	  compat-independent messages instead!
 
+config NET_CDEV
+	bool "/dev files for network devices"
+	default y
+	help
+	  This option causes /dev entries to be created for each
+	  network device. This allows the use of udev to create
+	  alternate device naming policies.
+
+	  If unsure, say Y.
+
 menu "Networking options"
 
 source "net/packet/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index 796f46e..0b40d2c 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_NET_DMA) += user_dma.o
 obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
+obj-$(CONFIG_NET_CDEV) += cdev.o
diff --git a/net/core/cdev.c b/net/core/cdev.c
new file mode 100644
index 0000000..1f36076
--- /dev/null
+++ b/net/core/cdev.c
@@ -0,0 +1,42 @@
+#include <linux/fs.h>
+#include <linux/cdev.h>
+#include <linux/netdevice.h>
+#include <linux/device.h>
+
+/* Used for network dynamic major number */
+static dev_t netdev_devt;
+
+static int netdev_cdev_open(struct inode *inode, struct file *filep)
+{
+	/* no operations on this device are implemented */
+	return -ENOSYS;
+}
+
+static const struct file_operations netdev_cdev_fops = {
+	.owner = THIS_MODULE,
+	.open = netdev_cdev_open,
+};
+
+void netdev_cdev_alloc(void)
+{
+	alloc_chrdev_region(&netdev_devt, 0, 1<<20, "net");
+}
+
+void netdev_cdev_init(struct net_device *dev)
+{
+	cdev_init(&dev->cdev, &netdev_cdev_fops);
+	cdev_add(&dev->cdev, MKDEV(MAJOR(netdev_devt), dev->ifindex), 1);
+
+}
+
+void netdev_cdev_del(struct net_device *dev)
+{
+	if (dev->cdev.dev)
+		cdev_del(&dev->cdev);
+}
+
+void netdev_cdev_kobj_init(struct device *dev, struct net_device *net)
+{
+	if (net->cdev.dev)
+		dev->devt = net->cdev.dev;
+}
diff --git a/net/core/cdev.h b/net/core/cdev.h
new file mode 100644
index 0000000..9cf5a90
--- /dev/null
+++ b/net/core/cdev.h
@@ -0,0 +1,13 @@
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_CDEV
+void netdev_cdev_alloc(void);
+void netdev_cdev_init(struct net_device *dev);
+void netdev_cdev_del(struct net_device *dev);
+void netdev_cdev_kobj_init(struct device *dev, struct net_device *net);
+#else
+static inline void netdev_cdev_alloc(void) {}
+static inline void netdev_cdev_init(struct net_device *dev) {}
+static inline void netdev_cdev_del(struct net_device *dev) {}
+static inline void netdev_cdev_kobj_init(struct device *dev, struct net_device *net) {}
+#endif
diff --git a/net/core/dev.c b/net/core/dev.c
index b8f74cf..c4ebfcd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <trace/events/napi.h>
 
 #include "net-sysfs.h"
+#include "cdev.h"
 
 /* Instead of increasing this, you should create a hash table. */
 #define MAX_GRO_SKBS 8
@@ -4684,6 +4685,7 @@ static void rollback_registered(struct net_device *dev)
 
 	/* Remove entries from kobject tree */
 	netdev_unregister_kobject(dev);
+	netdev_cdev_del(dev);
 
 	synchronize_net();
 
@@ -4835,6 +4837,8 @@ int register_netdevice(struct net_device *dev)
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
+	netdev_cdev_init(dev);
+
 	netdev_initialize_kobject(dev);
 	ret = netdev_register_kobject(dev);
 	if (ret)
@@ -4864,6 +4868,7 @@ out:
 	return ret;
 
 err_uninit:
+	netdev_cdev_del(dev);
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
 	goto out;
@@ -5371,6 +5376,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_addr_discard(dev);
 
 	netdev_unregister_kobject(dev);
+	netdev_cdev_del(dev);
 
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
@@ -5387,6 +5393,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 			dev->iflink = dev->ifindex;
 	}
 
+	netdev_cdev_init(dev);
+
 	/* Fixup kobjects */
 	err = netdev_register_kobject(dev);
 	WARN_ON(err);
@@ -5620,6 +5628,8 @@ static int __init net_dev_init(void)
 
 	BUG_ON(!dev_boot_phase);
 
+	netdev_cdev_alloc();
+
 	if (dev_proc_init())
 		goto out;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 821d309..ba0af79 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -19,6 +19,7 @@
 #include <net/wext.h>
 
 #include "net-sysfs.h"
+#include "cdev.h"
 
 #ifdef CONFIG_SYSFS
 static const char fmt_hex[] = "%#x\n";
@@ -461,6 +462,14 @@ static void netdev_release(struct
device *d) kfree((char *)dev - dev->padded); } +#ifdef CONFIG_NET_CDEV +static char *netdev_devnode(struct device *d, mode_t *mode) +{ + struct net_device *dev = to_net_dev(d); + return kasprintf(GFP_KERNEL, "netdev/%s", dev->name); +} +#endif + static struct class net_class = { .name = "net", .dev_release = netdev_release, @@ -470,6 +479,9 @@ static struct class net_class = { #ifdef CONFIG_HOTPLUG .dev_uevent = netdev_uevent, #endif +#ifdef CONFIG_NET_CDEV + .devnode = netdev_devnode, +#endif }; /* Delete sysfs entries but hold kobject reference until after all @@ -496,6 +508,7 @@ int netdev_register_kobject(struct net_device *net) dev->class = &net_class; dev->platform_data = net; dev->groups = groups; + netdev_cdev_kobj_init(dev, net); dev_set_name(dev, "%s", net->name); -- With regards, Narendra K ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 14:00 ` Narendra K @ 2009-10-09 14:51 ` Matt Domsch 2009-10-09 16:23 ` Bryan Kadzban 2009-10-12 10:41 ` Scott James Remnant 2009-10-09 16:36 ` Greg KH ` (2 subsequent siblings) 3 siblings, 2 replies; 86+ messages in thread From: Matt Domsch @ 2009-10-09 14:51 UTC (permalink / raw) To: Narendra K; +Cc: netdev, linux-hotplug, jordan_hargrave On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote: > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote: > > > example udev config: > > > SUBSYSTEM=="net", > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}" > > > > work as well. But coupling the ifindex to the MAC address like this > > doesn't work. (In general, coupling any two unrelated attributes when > > trying to do persistent names doesn't work.) > > > Attaching the latest patch incorporating review comments. Thank you Narendra. Let me also note that we are prepared to have userspace consumers of this new character device node. http://linux.dell.com/wiki/index.php/Oss/libnetdevname notes how the kernel patch will interact with udev, describes the new library helper function in libnetdevname, and has patches for net-tools, iproute2, and ethtool to make use of the helper function. As has been noted here, MAC addresses are not necessarily unique to an interface. As such, we are not proposing a net/by-mac/* symlink to /dev/netdev/*. Thanks, Matt -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 14:51 ` Matt Domsch @ 2009-10-09 16:23 ` Bryan Kadzban 2009-10-09 16:56 ` Marco d'Itri 2009-10-12 10:41 ` Scott James Remnant 1 sibling, 1 reply; 86+ messages in thread From: Bryan Kadzban @ 2009-10-09 16:23 UTC (permalink / raw) To: Matt Domsch; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave Matt Domsch wrote: > Let me also note that we are prepared to have userspace consumers of > this new character device node. > > http://linux.dell.com/wiki/index.php/Oss/libnetdevname > > notes how the kernel patch will interact with udev, describes the new > library helper function in libnetdevname, and has patches for > net-tools, iproute2, and ethtool to make use of the helper function. > > As has been noted here, MAC addresses are not necessarily unique to > an interface. Only in the case of e.g. qemu (virtual hardware), I think. (Or some kinds of broken hardware. Anything not on the udev whitelist from 75-persistent-net-generator.rules.) The combination of (MAC, ifindex) is not unique, which is what I meant earlier -- but the setup on the wiki seems to handle this properly. Assuming there was a /dev/net/by-mac/00:01:02:03:04:05 link, it should work fine... ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 16:23 ` Bryan Kadzban @ 2009-10-09 16:56 ` Marco d'Itri 0 siblings, 0 replies; 86+ messages in thread From: Marco d'Itri @ 2009-10-09 16:56 UTC (permalink / raw) To: Bryan Kadzban Cc: Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave On Oct 09, Bryan Kadzban <bryan@kadzban.is-a-geek.net> wrote: > > As has been noted here, MAC addresses are not necessarily unique to > > an interface. > Only in the case of e.g. qemu (virtual hardware), I think. (Or some > kinds of broken hardware. Some Sun products have multiple interfaces sharing the same MAC address. -- ciao, Marco ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 14:51 ` Matt Domsch 2009-10-09 16:23 ` Bryan Kadzban @ 2009-10-12 10:41 ` Scott James Remnant 2009-10-12 11:31 ` Ben Hutchings 2009-10-12 17:37 ` Bill Nottingham 1 sibling, 2 replies; 86+ messages in thread From: Scott James Remnant @ 2009-10-12 10:41 UTC (permalink / raw) To: Matt Domsch; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave On Fri, 2009-10-09 at 09:51 -0500, Matt Domsch wrote: > As has been noted here, MAC addresses are not necessarily unique to an > interface. As such, we are not proposing a net/by-mac/* symlink to > /dev/netdev/*. > On the other hand, they *tend* to be unique for a wide range of systems. This makes them pretty comparable to LABELs on disks, and we have a /dev/disk/by-label Remember that udev already supports symlink stacking, and priorities and such. I don't think there's any danger of supporting a /dev/netdev/by-mac by default, it'll be a benefit to most and those who don't have unique MACs will just ignore it. Scott -- Scott James Remnant scott@ubuntu.com ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 10:41 ` Scott James Remnant @ 2009-10-12 11:31 ` Ben Hutchings 2009-10-12 17:37 ` Bill Nottingham 1 sibling, 0 replies; 86+ messages in thread From: Ben Hutchings @ 2009-10-12 11:31 UTC (permalink / raw) To: Scott James Remnant Cc: Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave On Mon, 2009-10-12 at 11:41 +0100, Scott James Remnant wrote: > On Fri, 2009-10-09 at 09:51 -0500, Matt Domsch wrote: > > > As has been noted here, MAC addresses are not necessarily unique to an > > interface. As such, we are not proposing a net/by-mac/* symlink to > > /dev/netdev/*. > > > On the other hand, they *tend* to be unique for a wide range of systems. > This makes them pretty comparable to LABELs on disks, and we have > a /dev/disk/by-label [...] MAC addresses are normally assigned automatically but can be overridden if necessary. In that respect they are more like UUIDs for disks. I don't see any analogue of disk labels, though labels could conceivably be added to some NICs using VPD. Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 10:41 ` Scott James Remnant 2009-10-12 11:31 ` Ben Hutchings @ 2009-10-12 17:37 ` Bill Nottingham 2009-10-13 18:06 ` Dan Williams 1 sibling, 1 reply; 86+ messages in thread From: Bill Nottingham @ 2009-10-12 17:37 UTC (permalink / raw) To: Scott James Remnant Cc: Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave Scott James Remnant (scott@ubuntu.com) said: > On the other hand, they *tend* to be unique for a wide range of systems. > This makes them pretty comparable to LABELs on disks, and we have > a /dev/disk/by-label > > Remember that udev already supports symlink stacking, and priorities and > such. > > I don't think there's any danger of supporting a /dev/netdev/by-mac by > default, it'll be a benefit to most and those who don't have unique MACs > will just ignore it. At the moment, we do not appear to get the proper change uevents from things like 'ip link set dev <foo> address <bar>', so we can't currently maintain these symlinks. Bill ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 17:37 ` Bill Nottingham @ 2009-10-13 18:06 ` Dan Williams 2009-10-13 18:53 ` Ben Hutchings 0 siblings, 1 reply; 86+ messages in thread From: Dan Williams @ 2009-10-13 18:06 UTC (permalink / raw) To: Bill Nottingham Cc: Scott James Remnant, Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave On Mon, 2009-10-12 at 13:37 -0400, Bill Nottingham wrote: > Scott James Remnant (scott@ubuntu.com) said: > > On the other hand, they *tend* to be unique for a wide range of systems. > > This makes them pretty comparable to LABELs on disks, and we have > > a /dev/disk/by-label > > > > Remember that udev already supports symlink stacking, and priorities and > > such. > > > > I don't think there's any danger of supporting a /dev/netdev/by-mac by > > default, it'll be a benefit to most and those who don't have unique MACs > > will just ignore it. > > At the moment, we do not appear to get the proper change uevents from things > like 'ip link set dev <foo> address <bar>', so we can't currently maintain > these symlinks. And if we really want seamless support for MAC spoofing, we want ETHTOOL_GPERMADDR for all drivers too, so that if your configuration says "rename device XX:XX:XX:XX:XX:XX to YY:YY:YY:YY:YY:YY" we can actually figure stuff out after the spoof. Dan ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 18:06 ` Dan Williams @ 2009-10-13 18:53 ` Ben Hutchings 2009-10-13 19:53 ` John W. Linville 0 siblings, 1 reply; 86+ messages in thread From: Ben Hutchings @ 2009-10-13 18:53 UTC (permalink / raw) To: Dan Williams Cc: Bill Nottingham, Scott James Remnant, Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave On Tue, 2009-10-13 at 11:06 -0700, Dan Williams wrote: > On Mon, 2009-10-12 at 13:37 -0400, Bill Nottingham wrote: > > Scott James Remnant (scott@ubuntu.com) said: > > > On the other hand, they *tend* to be unique for a wide range of systems. > > > This makes them pretty comparable to LABELs on disks, and we have > > > a /dev/disk/by-label > > > > > > Remember that udev already supports symlink stacking, and priorities and > > > such. > > > > > > I don't think there's any danger of supporting a /dev/netdev/by-mac by > > > default, it'll be a benefit to most and those who don't have unique MACs > > > will just ignore it. > > > > At the moment, we do not appear to get the proper change uevents from things > > like 'ip link set dev <foo> address <bar>', so we can't currently maintain > > these symlinks. > > And if we really want seamless support for MAC spoofing, we want > ETHTOOL_GPERMADDR for all drivers too, so that if your configuration > says "rename device XX:XX:XX:XX:XX:XX to YY:YY:YY:YY:YY:YY" we can > actually figure stuff out after the spoof. ETHTOOL_GPERMADDR is handled in the ethtool core now. Are you thinking of drivers that don't have ethtool ops? Maybe it's time to add default operations. Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 18:53 ` Ben Hutchings @ 2009-10-13 19:53 ` John W. Linville 0 siblings, 0 replies; 86+ messages in thread From: John W. Linville @ 2009-10-13 19:53 UTC (permalink / raw) To: Ben Hutchings Cc: Dan Williams, Bill Nottingham, Scott James Remnant, Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave On Tue, Oct 13, 2009 at 07:53:04PM +0100, Ben Hutchings wrote: > On Tue, 2009-10-13 at 11:06 -0700, Dan Williams wrote: > > And if we really want seamless support for MAC spoofing, we want > > ETHTOOL_GPERMADDR for all drivers too, so that if your configuration > > says "rename device XX:XX:XX:XX:XX:XX to YY:YY:YY:YY:YY:YY" we can > > actually figure stuff out after the spoof. > > ETHTOOL_GPERMADDR is handled in the ethtool core now. Are you thinking > of drivers that don't have ethtool ops? Maybe it's time to add default > operations. Not quite true -- dev->perm_addr still has to be set by the driver. John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 14:00 ` Narendra K 2009-10-09 14:51 ` Matt Domsch @ 2009-10-09 16:36 ` Greg KH 2009-10-09 17:17 ` Matt Domsch 2009-10-09 21:09 ` Matt Domsch 2009-10-13 15:08 ` dann frazier 3 siblings, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-09 16:36 UTC (permalink / raw) To: Narendra K; +Cc: netdev, linux-hotplug, matt_domsch, jordan_hargrave On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote: > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote: > > > example udev config: > > > SUBSYSTEM=="net", > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}" > > > > work as well. But coupling the ifindex to the MAC address like this > > doesn't work. (In general, coupling any two unrelated attributes when > > trying to do persistent names doesn't work.) > > > Attaching the latest patch incorporating review comments. > > By creating character devices for every network device, we can use > udev to maintain alternate naming policies for devices, including > additional names for the same device, without interfering with the > name that the kernel assigns a device. > > This is conditionalized on CONFIG_NET_CDEV. If enabled (the default), > device nodes will automatically be created in /dev/netdev/ for each > network device. (/dev/net/ is already populated by the tun device.) > > These device nodes are not functional at the moment - open() returns > -ENOSYS. Their only purpose is to provide userspace with a kernel > name to ifindex mapping, in a form that udev can easily manage. How does this patch work with the network namespace functionality? thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 16:36 ` Greg KH @ 2009-10-09 17:17 ` Matt Domsch 2009-10-09 17:22 ` Greg KH 0 siblings, 1 reply; 86+ messages in thread From: Matt Domsch @ 2009-10-09 17:17 UTC (permalink / raw) To: Greg KH; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave On Fri, Oct 09, 2009 at 09:36:13AM -0700, Greg KH wrote: > On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote: > > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote: > > > > example udev config: > > > > SUBSYSTEM=="net", > > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}" > > > > > > work as well. But coupling the ifindex to the MAC address like this > > > doesn't work. (In general, coupling any two unrelated attributes when > > > trying to do persistent names doesn't work.) > > > > > Attaching the latest patch incorporating review comments. > > > > By creating character devices for every network device, we can use > > udev to maintain alternate naming policies for devices, including > > additional names for the same device, without interfering with the > > name that the kernel assigns a device. > > > > This is conditionalized on CONFIG_NET_CDEV. If enabled (the default), > > device nodes will automatically be created in /dev/netdev/ for each > > network device. (/dev/net/ is already populated by the tun device.) > > > > These device nodes are not functional at the moment - open() returns > > -ENOSYS. Their only purpose is to provide userspace with a kernel > > name to ifindex mapping, in a form that udev can easily manage. > > How does this patch work with the network namespace functionality? There is a monotonically increasing static ifindex kept in net/core/dev.c:dev_new_index(), which is shared across all namespaces. The ifindex field of struct net_device is assigned from this, so two devices in two different namespaces can't share an ifindex value. 
However, the device can be present (or not) in the per-namespace dev_name_hash and dev_index_hashes. This patch doesn't change this at all. uevents aren't namespaced. Presumably that means /dev can't be polyinstantiated. Therefore, all devnodes in /dev/netdev/* will be visible to all processes, whereas 'ifconfig' and friends would only show device names in the process's namespace. This doesn't mean the app can _do_ anything (it's the same as if it tried to act on a device using an ifindex for a device not in its namespace), but yes, the fact that such a device exists will be exposed. -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 17:17 ` Matt Domsch @ 2009-10-09 17:22 ` Greg KH 0 siblings, 0 replies; 86+ messages in thread From: Greg KH @ 2009-10-09 17:22 UTC (permalink / raw) To: Matt Domsch; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave On Fri, Oct 09, 2009 at 12:17:24PM -0500, Matt Domsch wrote: > > uevents aren't namespaced. Presumably that means /dev can't be > polyinstantiated. Therefore, all devnodes in /dev/netdev/* will be > visible to all processes, where 'ifconfig' and friends would only show > device names in the processes namespace. This doesn't mean the app > can _do_ anything (it's the same as if it tried to act on a device > using an ifindex for a device not in its namespace), but yes, the fact > that such a device exists will be exposed. That's the problem that the sysfs namespace patches were trying to address. Now I'm not saying it is a valid thing to try to work with this kind of crazy, I was just wondering how it would work out. Looks like it doesn't :) thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 14:00 ` Narendra K 2009-10-09 14:51 ` Matt Domsch 2009-10-09 16:36 ` Greg KH @ 2009-10-09 21:09 ` Matt Domsch 2009-10-10 2:44 ` Stephen Hemminger 2009-10-13 15:08 ` dann frazier 3 siblings, 1 reply; 86+ messages in thread From: Matt Domsch @ 2009-10-09 21:09 UTC (permalink / raw) To: netdev, linux-hotplug; +Cc: Narendra_K, jordan_hargrave On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote: > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote: > > > example udev config: > > > SUBSYSTEM=="net", > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}" > > > > work as well. But coupling the ifindex to the MAC address like this > > doesn't work. (In general, coupling any two unrelated attributes when > > trying to do persistent names doesn't work.) > > > Attaching the latest patch incorporating review comments. Same patch, rebased to linux-next. By creating character devices for every network device, we can use udev to maintain alternate naming policies for devices, including additional names for the same device, without interfering with the name that the kernel assigns a device. This is conditionalized on CONFIG_NET_CDEV. If enabled (the default), device nodes will automatically be created in /dev/netdev/ for each network device. (/dev/net/ is already populated by the tun device.) These device nodes are not functional at the moment - open() returns -ENOSYS. Their only purpose is to provide userspace with a kernel name to ifindex mapping, in a form that udev can easily manage. 
Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com> Signed-off-by: Narendra K <Narendra_K@dell.com> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> --- include/linux/netdevice.h | 4 ++++ net/Kconfig | 10 ++++++++++ net/core/Makefile | 1 + net/core/cdev.c | 42 ++++++++++++++++++++++++++++++++++++++++++ net/core/cdev.h | 13 +++++++++++++ net/core/dev.c | 10 ++++++++++ net/core/net-sysfs.c | 13 +++++++++++++ 7 files changed, 93 insertions(+), 0 deletions(-) create mode 100644 net/core/cdev.c create mode 100644 net/core/cdev.h diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b332eef..a2f23b4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -44,6 +44,7 @@ #include <linux/workqueue.h> #include <linux/ethtool.h> +#include <linux/cdev.h> #include <net/net_namespace.h> #include <net/dsa.h> #ifdef CONFIG_DCB @@ -916,6 +917,9 @@ struct net_device /* max exchange id for FCoE LRO by ddp */ unsigned int fcoe_ddp_xid; #endif +#ifdef CONFIG_NET_CDEV + struct cdev cdev; +#endif }; #define to_net_dev(d) container_of(d, struct net_device, dev) diff --git a/net/Kconfig b/net/Kconfig index 041c35e..bdc5bd7 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -43,6 +43,16 @@ config COMPAT_NETLINK_MESSAGES Newly written code should NEVER need this option but do compat-independent messages instead! +config NET_CDEV + bool "/dev files for network devices" + default y + help + This option causes /dev entries to be created for each + network device. This allows the use of udev to create + alternate device naming policies. + + If unsure, say Y. 
+ menu "Networking options" source "net/packet/Kconfig" diff --git a/net/core/Makefile b/net/core/Makefile index 796f46e..0b40d2c 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -19,4 +19,5 @@ obj-$(CONFIG_NET_DMA) += user_dma.o obj-$(CONFIG_FIB_RULES) += fib_rules.o obj-$(CONFIG_TRACEPOINTS) += net-traces.o obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o +obj-$(CONFIG_NET_CDEV) += cdev.o diff --git a/net/core/cdev.c b/net/core/cdev.c new file mode 100644 index 0000000..1f36076 --- /dev/null +++ b/net/core/cdev.c @@ -0,0 +1,42 @@ +#include <linux/fs.h> +#include <linux/cdev.h> +#include <linux/netdevice.h> +#include <linux/device.h> + +/* Used for network dynamic major number */ +static dev_t netdev_devt; + +static int netdev_cdev_open(struct inode *inode, struct file *filep) +{ + /* no operations on this device are implemented */ + return -ENOSYS; +} + +static const struct file_operations netdev_cdev_fops = { + .owner = THIS_MODULE, + .open = netdev_cdev_open, +}; + +void netdev_cdev_alloc(void) +{ + alloc_chrdev_region(&netdev_devt, 0, 1<<20, "net"); +} + +void netdev_cdev_init(struct net_device *dev) +{ + cdev_init(&dev->cdev, &netdev_cdev_fops); + cdev_add(&dev->cdev, MKDEV(MAJOR(netdev_devt), dev->ifindex), 1); + +} + +void netdev_cdev_del(struct net_device *dev) +{ + if (dev->cdev.dev) + cdev_del(&dev->cdev); +} + +void netdev_cdev_kobj_init(struct device *dev, struct net_device *net) +{ + if (net->cdev.dev) + dev->devt = net->cdev.dev; +} diff --git a/net/core/cdev.h b/net/core/cdev.h new file mode 100644 index 0000000..9cf5a90 --- /dev/null +++ b/net/core/cdev.h @@ -0,0 +1,13 @@ +#include <linux/netdevice.h> + +#ifdef CONFIG_NET_CDEV +void netdev_cdev_alloc(void); +void netdev_cdev_init(struct net_device *dev); +void netdev_cdev_del(struct net_device *dev); +void netdev_cdev_kobj_init(struct device *dev, struct net_device *net); +#else +static inline void netdev_cdev_alloc(void) {} +static inline void netdev_cdev_init(struct net_device *dev) {} 
+static inline void netdev_cdev_del(struct net_device *dev) {} +static inline void netdev_cdev_kobj_init(struct device *dev, struct net_device *net) {} +#endif diff --git a/net/core/dev.c b/net/core/dev.c index a74c8fd..d771438 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -129,6 +129,7 @@ #include <trace/events/napi.h> #include "net-sysfs.h" +#include "cdev.h" /* Instead of increasing this, you should create a hash table. */ #define MAX_GRO_SKBS 8 @@ -4684,6 +4685,7 @@ static void rollback_registered(struct net_device *dev) /* Remove entries from kobject tree */ netdev_unregister_kobject(dev); + netdev_cdev_del(dev); synchronize_net(); @@ -4835,6 +4837,8 @@ int register_netdevice(struct net_device *dev) if (dev->features & NETIF_F_SG) dev->features |= NETIF_F_GSO; + netdev_cdev_init(dev); + netdev_initialize_kobject(dev); ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev); @@ -4870,6 +4874,7 @@ out: return ret; err_uninit: + netdev_cdev_del(dev); if (dev->netdev_ops->ndo_uninit) dev->netdev_ops->ndo_uninit(dev); goto out; @@ -5377,6 +5382,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char dev_addr_discard(dev); netdev_unregister_kobject(dev); + netdev_cdev_del(dev); /* Actually switch the network namespace */ dev_net_set(dev, net); @@ -5393,6 +5399,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char dev->iflink = dev->ifindex; } + netdev_cdev_init(dev); + /* Fixup kobjects */ err = netdev_register_kobject(dev); WARN_ON(err); @@ -5626,6 +5634,8 @@ static int __init net_dev_init(void) BUG_ON(!dev_boot_phase); + netdev_cdev_alloc(); + if (dev_proc_init()) goto out; diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 753c420..f4ee557 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -19,6 +19,7 @@ #include <net/wext.h> #include "net-sysfs.h" +#include "cdev.h" #ifdef CONFIG_SYSFS static const char fmt_hex[] = "%#x\n"; @@ -501,6 +502,14 @@ static void 
netdev_release(struct device *d) kfree((char *)dev - dev->padded); } +#ifdef CONFIG_NET_CDEV +static char *netdev_devnode(struct device *d, mode_t *mode) +{ + struct net_device *dev = to_net_dev(d); + return kasprintf(GFP_KERNEL, "netdev/%s", dev->name); +} +#endif + static struct class net_class = { .name = "net", .dev_release = netdev_release, @@ -510,6 +519,9 @@ static struct class net_class = { #ifdef CONFIG_HOTPLUG .dev_uevent = netdev_uevent, #endif +#ifdef CONFIG_NET_CDEV + .devnode = netdev_devnode, +#endif }; /* Delete sysfs entries but hold kobject reference until after all @@ -536,6 +548,7 @@ int netdev_register_kobject(struct net_device *net) dev->class = &net_class; dev->platform_data = net; dev->groups = groups; + netdev_cdev_kobj_init(dev, net); dev_set_name(dev, "%s", net->name); -- 1.6.0.6 ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 21:09 ` Matt Domsch @ 2009-10-10 2:44 ` Stephen Hemminger 2009-10-10 4:40 ` Matt Domsch 0 siblings, 1 reply; 86+ messages in thread From: Stephen Hemminger @ 2009-10-10 2:44 UTC (permalink / raw) To: Matt Domsch; +Cc: netdev, linux-hotplug, Narendra_K, jordan_hargrave On Fri, 9 Oct 2009 16:09:09 -0500 Matt Domsch <Matt_Domsch@dell.com> wrote: > On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote: > > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote: > > > > example udev config: > > > > SUBSYSTEM=="net", > > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}" > > > > > > work as well. But coupling the ifindex to the MAC address like this > > > doesn't work. (In general, coupling any two unrelated attributes when > > > trying to do persistent names doesn't work.) > > > > > Attaching the latest patch incorporating review comments. > > Same patch, rebased to linux-next. > > By creating character devices for every network device, we can use > udev to maintain alternate naming policies for devices, including > additional names for the same device, without interfering with the > name that the kernel assigns a device. > > This is conditionalized on CONFIG_NET_CDEV. If enabled (the default), > device nodes will automatically be created in /dev/netdev/ for each > network device. (/dev/net/ is already populated by the tun device.) > > These device nodes are not functional at the moment - open() returns > -ENOSYS. Their only purpose is to provide userspace with a kernel > name to ifindex mapping, in a form that udev can easily manage. > > Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com> > Signed-off-by: Narendra K <Narendra_K@dell.com> > Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Maybe I'm dense, but I can't see why having useless /dev/net/ symlinks is a good interface choice. 
Perhaps you should explain the race between PCI scan and udev in more detail, and why solving it in either of those places won't work. As it stands you are proposing yet another wart to the already complex set of network interface APIs, which has implications for security as well as increasing the number of possible bugs. -- ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 2:44 ` Stephen Hemminger @ 2009-10-10 4:40 ` Matt Domsch 2009-10-10 5:23 ` Greg KH ` (2 more replies) 0 siblings, 3 replies; 86+ messages in thread From: Matt Domsch @ 2009-10-10 4:40 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev, linux-hotplug, Narendra_K, jordan_hargrave On Fri, Oct 09, 2009 at 07:44:01PM -0700, Stephen Hemminger wrote: > Maybe I'm dense but can't see why having a useless /dev/net/ symlinks > is a good interface choice. Perhaps you should explain the race between > PCI scan and udev in more detail, and why solving it in either of those > places won't work. As it stands you are proposing yet another wart to > the already complex set of network interface API's which has implications > for security as well as increasing the number of possible bugs. The fundamental challenge is that system administrators, particularly those of server-class hardware with multiple network ports present (some on the motherboard, some on add-in cards), have the not-so-unreasonable expectation that there is a deterministic mapping between those ports and the name one uses to address those ports. The fundamental roadblock to this is that enumeration != naming, except that it is for network devices, and we keep changing the enumeration order. Today, port naming is completely nondeterministic. If you have but one NIC, there are few chances to get the name wrong (it'll be eth0). If you have >1 NIC, chances increase to get it wrong. The complexity arises at multiple levels. First, device driver load order. In the 2.4 kernel days, and even mostly early 2.6 kernel days, the order in which network drivers loaded played a role in determining the name of the device. Drivers loaded first would get their devices named first. 
If I have two types of devices, say an e100-driven NIC and a tg3-driven NIC, I could figure out that the names would be eth0=e100 and eth1=tg3 by setting the load order in /etc/modules.conf (now modprobe.conf). If I wanted the other order, fine, just switch it around in modules.conf and reboot. OS installers, being the first running instance of Linux, before modprobe.conf existed to set that ordering, had to have other mechanisms to load drivers (often manually, or if programmatically such as in a kickstart or autoyast file, was still somewhat fixed). With the advent of modaliases + udev, now modprobe.conf doesn't contain this ordering anymore, and udev loads the drivers. So while it wasn't perfect, it was better than nothing, and that's gone now. It gets even worse as, to speed up boot time, modprobes can be run in parallel, and even within individual drivers, the NICs get initialized (and named) in parallel. Further confusing things, some devices need firmware loaded into them before getting names assigned, which is done from userspace, and they race. Second, PCI device list order. In the 2.4 kernel days, the PCI device list was scanned "breadth-first" (for each bus; for each device; for each function; do load...). FWIW, Windows still does this. It gives BIOS, which assigns PCI bus numbers, a chance to put LOMs at a lower bus number than add-in cards. Module load order still mattered, but at least if you had say 2 e1000 ports as LOMs, and 2 e1000 ports on add-in cards, you pretty much knew the ordering would be eth0 as lowest bdf on the motherboard, eth1 as next bdf on the motherboard, and eth2 and 3 as the add-in cards in ascending slot order. With the advent of PCI hot plug in the 2.5 kernel series, the breadth-first ordering became depth-first. (for each bus; for each device; if the device is a bridge, scan the busses behind it.). 
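As a rough illustration of the two scan orders, the toy model below uses the topology from the example that follows (a bridge at 0:4 leading to bus 1, a NIC at 0:5 and a NIC at 1:3). It is only a sketch: the `TOPOLOGY` table and the function names are made up for this illustration and do not mirror the kernel's PCI code.

```python
from collections import deque

# Hypothetical toy topology: bus number -> list of (device, kind, child bus).
# Bus 0 has a bridge at device 4 (leading to bus 1) and a NIC at device 5;
# bus 1 has a NIC at device 3.
TOPOLOGY = {
    0: [(4, "bridge", 1), (5, "nic", None)],
    1: [(3, "nic", None)],
}


def breadth_first(topology, root=0):
    """Visit every device on a bus before looking behind any bridge."""
    found, pending = [], deque([root])
    while pending:
        bus = pending.popleft()
        for dev, kind, child in sorted(topology.get(bus, []), key=lambda d: d[0]):
            if kind == "nic":
                found.append(f"{bus}:{dev}")
            elif kind == "bridge":
                pending.append(child)
    return found


def depth_first(topology, bus=0):
    """Descend behind each bridge as soon as it is encountered."""
    found = []
    for dev, kind, child in sorted(topology.get(bus, []), key=lambda d: d[0]):
        if kind == "nic":
            found.append(f"{bus}:{dev}")
        elif kind == "bridge":
            found.extend(depth_first(topology, child))
    return found


def name_ports(order):
    """Hand out ethN names purely in discovery order."""
    return {f"eth{i}": addr for i, addr in enumerate(order)}
```

With this table, `name_ports(breadth_first(TOPOLOGY))` gives eth0 to the NIC at 0:5, while `name_ports(depth_first(TOPOLOGY))` gives eth0 to the NIC at 1:3, because the bridge at 0:4 is followed before device 5 is reached.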
This caused NICs on bus 0 device 5, and bus 1 device 3, (eth0 and 1 respectively) to be enumerated differently due to a bridge from bus 0 to bus 1 at 0:4. My crude hack of pci=bfsort, with some dmi strings to match and auto-enable, at least reverted this back to the ordering the 2.4 kernel and Windows used. Now we have to keep adding systems to this DMI list (Dell has a number of systems on this list today; HP has even more). And it doesn't completely solve the problem, just masks it. So, to address the ordering problem, I placed a constraint on our server hardware teams, forcing them to lay out their boards and assign PCIe lanes and bus numbers, such that at least the designed "first" LOM would get found first in either depth-first or breadth-first order. Our 10G and 11G servers have this restriction in place, though it wasn't easy. And it's gotten even harder, as the PCIe switches expand the number of lanes available. We no longer have the traditional tiered buses architecture, but the PCI layer for this purpose thinks we do. I need to remove this constraint on the hardware teams - it's gotten to be impossible for the chipset lanes to be laid out efficiently with this constraint. All of the above just papered over the enumeration != naming problem. Third, stateless computing is becoming more and more commonplace. The Field Replaceable Unit is the server itself. Got a bad server? Pull it out, move the disks to an identical unit, insert the new server, and go. Fix the bad server offline and bring it back. In this model, having MAC addresses as the mechanism that is providing the determinism (/etc/mactab or udev persistent naming rules) breaks, because the MAC addresses of the ports on the new server won't be the same as on the old server. HP even has a technology to solve _this_ problem (in their blade chassis) - Virtual Connect. The MACs get assigned by the chassis to the blades at POST, and are fixed to the slot. 
Slick, and Dell has an even more flexible similar feature FlexAddress. This doesn't solve the OS installer problem of "which of these NICs should I use to do an install?" but it does recognize the problem space and tries to overcome it. Fourth, for OS installers, choosing which NIC to use at installtime, when all the NICs are plugged in, can be difficult. PXE environments, using pxelinux and its IPAPPEND 2 option, will append "BOOTIF=xx:xx:xx:xx:xx:xx" to the kernel command line, containing the MAC address of the NIC used for PXE. Neat trick. Yes, we then had to teach the OS installers to recognize and use this. But it only works if you PXE boot, and only for that one NIC. Fifth, network devices can have only a single name. eth0. If we look at disks, we see udev manages a tree of symlinks for /dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid. And as a system admin, if I wanted to also create a udev rule for /dev/disk/by-function (boot, swap, mattsstorage), it's trivial to do so. Why can't we have this flexibility for network devices too? So, how do we get deterministic naming for all the NICs in a system? That's what I'm going for. Picture a network switch, with several blades, and several ports on each blade. The network admin addresses each port as say 1/16 (the 16th port on blade 1, clearly labeled). The parallel on servers is the chassis label printed on the outside (say, "Gb1"). But due to the above, there is no guarantee, and in fact little chance, that Gb1 will be consistently named eth0 - it may vary from boot to boot. That's full of fail. For a concrete example, the 4 bnx2 chips in my PowerEdge R610 with a current 2.6 kernel, loading only one driver, the ports get assigned names in nondeterministic order on each boot. Given that the ifcfg-eth* rules, netfilter rules, and the rest all expect deterministic naming, massive failure ensues unless some form of determinism is brought back in. 
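The IPAPPEND 2 trick mentioned under the fourth point could be consumed by an installer roughly as sketched below. The function names are hypothetical. One assumption to flag: pxelinux documents the value as the ARP hardware type followed by the dash-separated MAC (for Ethernet, `BOOTIF=01-xx-xx-xx-xx-xx-xx`), so the sketch accepts that form as well as the plain colon form written above.

```python
import re

# Six hex octets separated by ':' or '-'.
_MAC = r"[0-9a-fA-F]{2}(?:[:-][0-9a-fA-F]{2}){5}"
# An optional leading "01-" is the hardware-type prefix for Ethernet.
_BOOTIF = re.compile(r"(?:^|\s)BOOTIF=(?:01-)?(" + _MAC + r")(?=\s|$)")


def bootif_mac(cmdline):
    """Return the PXE boot NIC's MAC in lower-case colon form, or None."""
    match = _BOOTIF.search(cmdline)
    if match is None:
        return None
    return match.group(1).replace("-", ":").lower()


def pick_install_nic(cmdline, macs_by_name):
    """Map the BOOTIF MAC to a kernel interface name, if any NIC matches.

    macs_by_name is a caller-supplied dict such as {"eth0": "00:11:..."}.
    """
    mac = bootif_mac(cmdline)
    if mac is None:
        return None
    for name, addr in macs_by_name.items():
        if addr.lower() == mac:
            return name
    return None
```

As the message notes, this identifies only the one NIC that was used for PXE; it says nothing about how the remaining ports should be named.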
The idea to use a character device node to expose the ifindex value, and udev to manage a tree of symlinks to it, really follows the model used today for disks. It allows us to get deterministic names for devices (albeit, the names are symlinks), and multiple names for devices (through multiple symlink rules). That some people want to use the char device to call ioctl() and read/write, as is possible on the BSDs, would just be gravy IMHO. It does require a change in behavior for a system administrator. Instead of hard-coding 'eth0' into her scripts, she uses '/dev/net/by-function/boot' or somesuch. But then that name is guaranteed to always refer to the "right" NIC. Every admin I've spoken to is willing to make this kind of change, as long as they get the consistent, deterministic naming they expect but don't have today. And it does require patching userspace apps to take both a kernel device name, or a path, and to resolve the path to device name or ifindex. We wrote libnetdevname (really, one function), and have patches for several userspace apps to use it, to prove it can be done. One alternative would be to do something using the sysfs ifindex value already exported. e.g. /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/0000:06:07.0/net/eth0/ifindex but we have never had symlinks from /dev into /sys before (doesn't mean we couldn't though). In that case, udev would grow to manage /dev/net/by-chassis-label/Embedded_NIC_1 -> /sys/devices/.../net/eth0, and libnetdevname would be used to follow the symlink in applications. This approach could solve my problem without (many or any?) kernel changes needed, but wouldn't help those who want to do ioctl/read/write to a devnode. Given the problem, I really do need a solution. I've proposed one method, and an alternative, but I can't afford to let the problem stay unaddressed any longer, and need a clear direction to be chosen. The char device gives me what I need, and others what they want also. 
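For the sysfs alternative described above, the userspace side might look something like the following sketch. It is not libnetdevname's actual interface; `resolve_netdev` and its `sys_class_net` parameter are hypothetical names, and the only thing assumed about the kernel is that each net device directory in sysfs contains an `ifindex` file, as in the path quoted above.

```python
import os


def resolve_netdev(path_or_name, sys_class_net="/sys/class/net"):
    """Return (kernel_name, ifindex) for a kernel name or a symlink path.

    A bare name such as "eth0" is looked up under sys_class_net; anything
    containing a path separator is treated as a symlink (for example a
    udev-managed /dev/net/by-chassis-label/... link) that ultimately
    points at the device's sysfs directory.
    """
    if os.sep in path_or_name:
        target = os.path.realpath(path_or_name)
    else:
        target = os.path.realpath(os.path.join(sys_class_net, path_or_name))
    # The last path component of the sysfs directory is the kernel name.
    with open(os.path.join(target, "ifindex")) as f:
        ifindex = int(f.read().strip())
    return os.path.basename(target), ifindex
```

An application patched along these lines would accept either `eth0` or `/dev/net/by-chassis-label/Embedded_NIC_1` on its command line and pass the resolved kernel name or ifindex on to the existing interfaces.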
Thanks for listening to the diatribe. For more examples and workarounds that we've been telling our customers for several years, check out http://linux.dell.com/papers.shtml for the Network Interface Card Naming whitepaper. -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 4:40 ` Matt Domsch @ 2009-10-10 5:23 ` Greg KH 2009-10-10 8:17 ` Sujit K M ` (4 more replies) 2009-10-10 18:32 ` Stephen Hemminger 2009-10-11 0:37 ` Marco d'Itri 2 siblings, 5 replies; 86+ messages in thread From: Greg KH @ 2009-10-10 5:23 UTC (permalink / raw) To: Matt Domsch Cc: Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: > The fundamental roadblock to this is that enumeration != naming, > except that it is for network devices, and we keep changing the > enumeration order. No, the hardware changes the enumeration order, it places _no_ guarantees on what order stuff will be found in. So this is not the kernel changing, just to be clear. Again, I have a machine here that likes to reorder PCI devices every 4th or so boot times, and that's fine according to the PCI spec. Yeah, it's a crappy BIOS, but the manufacturer rightly pointed out that it is not in violation of anything. > Today, port naming is completely nondeterministic. If you have but > one NIC, there are few chances to get the name wrong (it'll be eth0). > If you have >1 NIC, chances increase to get it wrong. That is why all distros name network devices based on the only deterministic thing they have today, the MAC address. I still fail to see why you do not like this solution, it is honestly the only way to properly name network devices in a sane manner. All distros also provide a way to easily rename the network devices, to place a specific name on a specific MAC address, so again, this should all be solved already. No matter how badly your BIOS teams mess up the PCI enumeration order :) thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 5:23 ` Greg KH @ 2009-10-10 8:17 ` Sujit K M 2009-10-10 16:27 ` Greg KH 2009-10-10 12:47 ` Matt Domsch ` (3 subsequent siblings) 4 siblings, 1 reply; 86+ messages in thread From: Sujit K M @ 2009-10-10 8:17 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave Greg, > No, the hardware changes the enumeration order, it places _no_ > guarantees on what order stuff will be found in. So this is not the > kernel changing, just to be clear. > Again, I have a machine here that likes to reorder PCI devices every 4th > or so boot times, and that's fine according to the PCI spec. Yeah, it's > a crappy BIOS, but the manufacturer rightly pointed out that it is not > in violation of anything. > I think the open call should be implemented then. By the patch very little knowledge is being shared on type of network implementation it is trying to do. Also it is messing with core data structures and procedures. This seems to be simplified by implementing the other operations like poll(). > That is why all distros name network devices based on the only > deterministic thing they have today, the MAC address. I still fail to > see why you do not like this solution, it is honestly the only way to > properly name network devices in a sane manner. This is a feature that needs to be implemented. As per the rules followed. > > All distros also provide a way to easily rename the network devices, to > place a specific name on a specific MAC address, so again, this should > all be solved already. > > No matter how badly your BIOS teams mess up the PCI enumeration order :) This is a problem, but I think this can be solved by implementing some of the routines in the network device. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 8:17 ` Sujit K M @ 2009-10-10 16:27 ` Greg KH 2009-10-10 19:00 ` Ben Hutchings 0 siblings, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-10 16:27 UTC (permalink / raw) To: Sujit K M Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 01:47:39PM +0530, Sujit K M wrote: > Greg, > > > > No, the hardware changes the enumeration order, it places _no_ > > guarantees on what order stuff will be found in. So this is not the > > kernel changing, just to be clear. > > Again, I have a machine here that likes to reorder PCI devices every 4th > > or so boot times, and that's fine according to the PCI spec. Yeah, it's > > a crappy BIOS, but the manufacturer rightly pointed out that it is not > > in violation of anything. > > > > I think the open call should be implemented then. By the patch very little > knowledge is being shared on type of network implementation it is trying to > do. What would open() accomplish? What good would the file descriptor be? What could you use it for? > Also it is messing with core datastructure and procedures. This seems > to be simplified by changing implementing the other operations like poll(). I don't understand. > > That is why all distros name network devices based on the only > > deterministic thing they have today, the MAC address. I still fail to > > see why you do not like this solution, it is honestly the only way to > > properly name network devices in a sane manner. > > This is feature that needs to be implemented. As per the rules followed. This feature is already implemented today, all distros have it. > > All distros also provide a way to easily rename the network devices, to > > place a specific name on a specific MAC address, so again, this should > > all be solved already. 
> > > > No matter how badly your BIOS teams mess up the PCI enumeration order :) > > This is a problem, but I think this can be solved by implementing some of the > routines in the network device. I don't; see the rules that your distro ships today for persistent network devices. It's already there, no need to change the kernel at all. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 16:27 ` Greg KH @ 2009-10-10 19:00 ` Ben Hutchings 2009-10-10 21:10 ` Greg KH 0 siblings, 1 reply; 86+ messages in thread From: Ben Hutchings @ 2009-10-10 19:00 UTC (permalink / raw) To: Greg KH Cc: Sujit K M, Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, 2009-10-10 at 09:27 -0700, Greg KH wrote: > On Sat, Oct 10, 2009 at 01:47:39PM +0530, Sujit K M wrote: > > Greg, > > > > > > > No, the hardware changes the enumeration order, it places _no_ > > > guarantees on what order stuff will be found in. So this is not the > > > kernel changing, just to be clear. > > > Again, I have a machine here that likes to reorder PCI devices every 4th > > > or so boot times, and that's fine according to the PCI spec. Yeah, it's > > > a crappy BIOS, but the manufacturer rightly pointed out that it is not > > > in violation of anything. > > > > > > > I think the open call should be implemented then. By the patch very little > > knowledge is being shared on type of network implementation it is trying to > > do. > > What would open() accomplish? What good would the file descriptor be? > What could you use it for? Currently all net device ioctls are carried out through arbitrary sockets and identify the device by name (aside from one to look up the name by ifindex). Ever since it became possible to rename net devices, it has been possible for a sequence of ioctls intended for one device to race with renaming of that device. Adding open() and ioctl() to the character device (which seems reasonably easy) would provide a way to avoid this. On the other hand, the netlink configuration APIs already use ifindex so it may be better just to say that the device ioctls are deprecated and applications should use netlink. > > Also it is messing with core datastructure and procedures. This seems > > to be simplified by changing implementing the other operations like poll(). 
> > I don't understand. > > > > That is why all distros name network devices based on the only > > > deterministic thing they have today, the MAC address. I still fail to > > > see why you do not like this solution, it is honestly the only way to > > > properly name network devices in a sane manner. > > > > This is feature that needs to be implemented. As per the rules followed. > > This feature is already implemented today, all distros have it. No, see below. > > > All distros also provide a way to easily rename the network devices, to > > > place a specific name on a specific MAC address, so again, this should > > > all be solved already. > > > > > > No matter how badly your BIOS teams mess up the PCI enumeration order :) > > > > This is a problem, but I think this can be solved by implementing some of the > > routines in the network device. > > I don't, see the rules that your distro ships today for persistent > network devices, it's already there, no need to change the kernel at > all. The udev persistent net rules work tolerably well for a single system with a stable set of net devices. They do not solve the problem Matt's talking about, which is lack of consistency between multiple systems, because the initial enumeration order is not predictable. They also result in name changes when a NIC (or motherboard) is swapped. For some users, that's fine; for others, it's not. The ability to specify NICs by port name or PCI address should solve these problems. Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 86+ messages in thread
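The rename race Ben describes in the message above can be shown with a small simulation. This is a toy model only: `ToyNetNamespace` and `swap_names` are invented for the illustration and stand in for the kernel's interface table; no real ioctl or netlink call is made. It assumes just what the message states, namely that name-based operations look the device up by its current name while netlink-style operations use the ifindex.

```python
class ToyNetNamespace:
    """Minimal stand-in for the kernel's interface table (not a real API)."""

    def __init__(self):
        self._names = {}   # ifindex -> current name
        self._mtu = {}     # ifindex -> MTU
        self._next = 1

    def register(self, name):
        """Add a device and return its ifindex, which never changes."""
        ifindex = self._next
        self._next += 1
        self._names[ifindex] = name
        self._mtu[ifindex] = 1500
        return ifindex

    def index_of(self, name):
        for ifindex, current in self._names.items():
            if current == name:
                return ifindex
        raise LookupError(name)

    def rename(self, old, new):
        self._names[self.index_of(old)] = new

    def set_mtu_by_name(self, name, mtu):
        """Name-based operation: resolves the name at call time."""
        self._mtu[self.index_of(name)] = mtu

    def set_mtu_by_index(self, ifindex, mtu):
        """Index-based operation: immune to renames."""
        self._mtu[ifindex] = mtu

    def mtu(self, ifindex):
        return self._mtu[ifindex]


def swap_names(ns, first, second):
    """Swap two interface names, as a renaming tool might do."""
    ns.rename(first, "tmp0")
    ns.rename(second, first)
    ns.rename("tmp0", second)
```

If an application decides to configure the port it knows as `eth0`, and `swap_names(ns, "eth0", "eth1")` runs before its `set_mtu_by_name("eth0", ...)` call, the change lands on the other port. The same request made with the ifindex obtained earlier still reaches the intended device.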
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 19:00 ` Ben Hutchings @ 2009-10-10 21:10 ` Greg KH 0 siblings, 0 replies; 86+ messages in thread From: Greg KH @ 2009-10-10 21:10 UTC (permalink / raw) To: Ben Hutchings Cc: Sujit K M, Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 08:00:30PM +0100, Ben Hutchings wrote: > On the other hand, the netlink configuration APIs already use ifindex so > it may be better just to say that the device ioctls are deprecated and > applications should use netlink. I thought that is what was already encouraged to happen. > > > > That is why all distros name network devices based on the only > > > > deterministic thing they have today, the MAC address. I still fail to > > > > see why you do not like this solution, it is honestly the only way to > > > > properly name network devices in a sane manner. > > > > > > This is feature that needs to be implemented. As per the rules followed. > > > > This feature is already implemented today, all distros have it. > > No, see below. Yes, if not, file a bug in your distro, all of the infrastructure is already in place, and the udev rules and scripts are already written. > > > > All distros also provide a way to easily rename the network devices, to > > > > place a specific name on a specific MAC address, so again, this should > > > > all be solved already. > > > > > > > > No matter how badly your BIOS teams mess up the PCI enumeration order :) > > > > > > This is a problem, but I think this can be solved by implementing some of the > > > routines in the network device. > > > > I don't, see the rules that your distro ships today for persistent > > network devices, it's already there, no need to change the kernel at > > all. > > The udev persistent net rules work tolerably well for a single system > with a stable set of net devices. 
> > They do not solve the problem Matt's talking about, which is lack of > consistency between multiple systems, because the initial enumeration > order is not predictable. Again, you name the device as a MAC address. Or something else that the BIOS exports in a unique manner (PCI slot name, etc.). That is consistent. If not, then fix the BIOS. > They also result in name changes when a NIC (or motherboard) is swapped. > For some users, that's fine; for others, it's not. > > The ability to specify NICs by port name or PCI address should solve > these problems. That can be done today quite easily. But note that PCI addresses are not guaranteed to be stable, as has been known to happen on lots of machines. Again, none of this requires any kernel changes today at all, let alone adding dummy char devices for network devices. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 5:23 ` Greg KH 2009-10-10 8:17 ` Sujit K M @ 2009-10-10 12:47 ` Matt Domsch 2009-10-10 16:25 ` Greg KH 2009-10-10 18:11 ` Bill Fink ` (2 subsequent siblings) 4 siblings, 1 reply; 86+ messages in thread From: Matt Domsch @ 2009-10-10 12:47 UTC (permalink / raw) To: Greg KH Cc: Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Fri, Oct 09, 2009 at 10:23:08PM -0700, Greg KH wrote: > On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: > > The fundamental roadblock to this is that enumeration != naming, > > except that it is for network devices, and we keep changing the > > enumeration order. > > No, the hardware changes the enumeration order, it places _no_ > guarantees on what order stuff will be found in. So this is not the > kernel changing, just to be clear. Over time the kernel has changed its enumeration mechanisms, and introduced parallelism into the process (which is a good thing), which, from a user perspective, makes names nondeterministic. Yes, fixing this up by hard-coding MAC addresses after install has been the traditional mechanism to address this. I think there's a better way. > Again, I have a machine here that likes to reorder PCI devices every 4th > or so boot times, and that's fine according to the PCI spec. Yeah, it's > a crappy BIOS, but the manufacturer rightly pointed out that it is not > in violation of anything. I haven't encountered this myself, but yes, it's valid but annoying. > > Today, port naming is completely nondeterministic. If you have but > > one NIC, there are few chances to get the name wrong (it'll be eth0). > > If you have >1 NIC, chances increase to get it wrong. > > That is why all distros name network devices based on the only > deterministic thing they have today, the MAC address. I still fail to > see why you do not like this solution, it is honestly the only way to > properly name network devices in a sane manner. 
> > All distros also provide a way to easily rename the network devices, to > place a specific name on a specific MAC address, so again, this should > all be solved already. It's not the only way, it introduces state where there's a desire for a stateless solution, it's useless for getting all the names right at initial OS install time, and it restricts us to a single "name" for a given device. We can get additional information from BIOS. SMBIOS 2.6 (types 9 and 41) has the fields to let us get a "label" for a device at a given b/d/f. On my PowerEdge R610, I see "Embedded NIC 1" .. "Embedded NIC 4" for the 4 LOMs. These labels have a clear correlation to the labels on the back of the chassis at these ports. biosdevname can parse and report this. HP made a similar vendor-specific extension to SMBIOS for their platforms, which biosdevname also parses. Even if BIOS decides they need to renumber the busses on every boot, it can keep this table correct. (insert general mistrust of BIOS authors rant; that's not the point here.) biosdevname can be used in udev rules to create multiple names for a given device. Rules such as:
PROGRAM="/sbin/biosdevname --policy=all_names -i %k", SYMLINK+="net/by-slot-name/%c", OPTIONS+="string_escape=replace"
PROGRAM="/sbin/biosdevname --policy=smbios_names -i %k", SYMLINK+="net/by-chassis-label/%c", OPTIONS+="string_escape=replace"
SMBIOS has its own problems, specifically that it's not hot-plug aware (it's a static table created during POST). And if a better way is found (perhaps through the PCI SIG or ACPI), great, biosdevname can be extended to use it. But, without at least a change in udev or the kernel, it doesn't do any good. > No matter how badly your BIOS teams mess up the PCI enumeration > order :) In my case, the BIOS for a given system always configures the ports the same way, and assigns b/d/f the same way. With no change in the BIOS or hardware, I still see the ports enumerated differently on each boot. 
:-( -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
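To make the by-chassis-label idea in the message above concrete, the sketch below shows how a BIOS label such as "Embedded NIC 1" could end up as the link name `Embedded_NIC_1`. It is a simplified approximation of what `string_escape=replace` is assumed to do (keep a conservative set of ASCII characters and turn everything else into `_`); real udev's whitelist may be broader. Both function names are hypothetical.

```python
import string

# Conservative whitelist, assumed for illustration; real udev may allow more.
_SAFE = set(string.ascii_letters + string.digits + "#+-.:=@_")


def label_to_link_name(label):
    """Replace characters unsuitable for a device-node name with '_'."""
    return "".join(c if c in _SAFE else "_" for c in label.strip())


def symlinks_for(kernel_name, slot_name, chassis_label):
    """Build {symlink path relative to /dev: kernel name} for one NIC.

    slot_name and chassis_label are whatever a helper such as biosdevname
    reported for this device; either may be None when the BIOS provides
    no such information.
    """
    links = {}
    if slot_name:
        links["net/by-slot-name/" + label_to_link_name(slot_name)] = kernel_name
    if chassis_label:
        links["net/by-chassis-label/" + label_to_link_name(chassis_label)] = kernel_name
    return links
```

A device may get one link, both, or none, which is the "multiple names for the same device" property being argued for; the kernel name itself is left untouched.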
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 12:47 ` Matt Domsch @ 2009-10-10 16:25 ` Greg KH 2009-10-10 17:34 ` Bryan Kadzban 2009-10-11 16:40 ` David Zeuthen 0 siblings, 2 replies; 86+ messages in thread From: Greg KH @ 2009-10-10 16:25 UTC (permalink / raw) To: Matt Domsch Cc: Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 07:47:32AM -0500, Matt Domsch wrote: > On Fri, Oct 09, 2009 at 10:23:08PM -0700, Greg KH wrote: > > On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: > > > The fundamental roadblock to this is that enumeration != naming, > > > except that it is for network devices, and we keep changing the > > > enumeration order. > > > > No, the hardware changes the enumeration order, it places _no_ > > guarantees on what order stuff will be found in. So this is not the > > kernel changing, just to be clear. > > Over time the kernel has changed its enumeration mechanisms, and > introduced parallelism into the process (which is a good thing), > which, from a user perspective, makes names nondeterministic. Yes, > fixing this up by hard-coding MAC addresses after install has been > the traditional mechanism to address this. I think there's a better > way. Ok, but that way can be done in userspace, without the need for this char device, right? > > > Today, port naming is completely nondeterministic. If you have but > > > one NIC, there are few chances to get the name wrong (it'll be eth0). > > > If you have >1 NIC, chances increase to get it wrong. > > > > That is why all distros name network devices based on the only > > deterministic thing they have today, the MAC address. I still fail to > > see why you do not like this solution, it is honestly the only way to > > properly name network devices in a sane manner. 
> > > > All distros also provide a way to easily rename the network devices, to > > place a specific name on a specific MAC address, so again, this should > > all be solved already. > > It's not the only way, it introduces state where there's a desire for > a stateless solution, it's useless for getting all the names right at > initial OS install time, and it restricts us to a single "name" for a > given device. > > We can get additional information from BIOS. SMBIOS 2.6 (types 9 and > 41) has the fields to let us get a "label" for an device at a given > b/d/f. On my PowerEdge R610, I see "Embedded NIC 1" .. "Embedded NIC > 4" for the 4 LOMs. These labels have a clear correlation to the > labels on the back of the chassis at these ports. biosdevname can > parse and report this. HP made a similar vendor-specific extension to > SMBIOS for their platforms, which biosdevname also parses. Even if > BIOS decides they need to renumber the busses on every boot, it can > keep this table correct. (insert general mistrust of BIOS authors > rant; that's not the point here.) > > biosdevname can be used in udev rules to create multiple names for a > given device. Rules such as: Yes, if you want multiple ways to name a network device, then you need the char nodes. But without that, you can just pick "always use the biosdevname" type option from your distro setup screen and go with that. Then you have everything always working properly from the very beginning. > > No matter how badly your BIOS teams mess up the PCI enumeration > > order :) > > In my case, the BIOS for a given system always configures the ports > the same way, and assigns b/d/f the same way. With no change in the > BIOS or hardware, I still see the ports enumerated differently on each > boot. :-( Again, that's legal from a PCI standpoint :) So you really want this for multiple ways to name the same network device. 
That's a choice the network developers are going to have to make, as to if that is going to be a legal thing to have happen or not. But this code is not a requirement to "solve" the fact that network devices can show up in different order, that problem can be solved as long as the user picks a single way to name the devices, using tools that are already present today in distros. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 16:25 ` Greg KH @ 2009-10-10 17:34 ` Bryan Kadzban 2009-10-10 21:13 ` Greg KH 2009-10-11 16:40 ` David Zeuthen 1 sibling, 1 reply; 86+ messages in thread From: Bryan Kadzban @ 2009-10-10 17:34 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave [-- Attachment #1: Type: text/plain, Size: 3218 bytes --] Greg KH wrote: > On Sat, Oct 10, 2009 at 07:47:32AM -0500, Matt Domsch wrote: >> On Fri, Oct 09, 2009 at 10:23:08PM -0700, Greg KH wrote: >>> On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: >>>> The fundamental roadblock to this is that enumeration != >>>> naming, except that it is for network devices, and we keep >>>> changing the enumeration order. >>> No, the hardware changes the enumeration order, it places _no_ >>> guarantees on what order stuff will be found in. So this is not >>> the kernel changing, just to be clear. >> Over time the kernel has changed its enumeration mechanisms, and >> introduced parallelism into the process (which is a good thing), >> which, from a user perspective, makes names nondeterministic. Yes, >> fixing this up by hard-coding MAC addresses after install has been >> the traditional mechanism to address this. I think there's a >> better way. > > Ok, but that way can be done in userspace, without the need for this > char device, right? For the record -- when I tried to send a patch that did exactly this (provided an option to use by-path persistence for network drivers), it was rejected because "that doesn't work for USB". True, it doesn't. But by-mac (what we have today) doesn't work for replacing motherboards in a random home system (that can't override the MAC address in the BIOS), either. So why not provide both alternatives? As you say below, it's up to the network devs whether this should be allowed... 
>> biosdevname can be used in udev rules to create multiple names for >> a given device. Rules such as: > > Yes, if you want multiple ways to name a network device, then you > need the char nodes. But without that, you can just pick "always use > the biosdevname" type option from your distro setup screen and go > with that. Then you have everything always working properly from the > very beginning. *If* biosdevname works on your system. It doesn't on mine: this SMBIOS extension doesn't exist. :-) > So you really want this for multiple ways to name the same network > device. That's a choice the network developers are going to have to > make, as to if that is going to be a legal thing to have happen or > not. Yes. So do I, actually (for what little that's worth)... > But this code is not a requirement to "solve" the fact that network > devices can show up in different order, that problem can be solved as > long as the user picks a single way to name the devices, using tools > that are already present today in distros. This code is not a requirement, no. But -- as you say -- it does provide a halfway-decent way to assign multiple names to a NIC. And that provides admins the choice to use a couple different persistence schemes, depending on how they expect their hardware to work. (It *may* even be possible to use some kind of layer-2 traffic to see what else is on the connected network and provide symlinks based on that. IPv6 autoconfig type of thing, maybe. That's probably a *lot* more complicated, and may be impossible, but would be even closer to what I think Dell customers are asking for based on Matt's posts.) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 260 bytes --] ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 17:34 ` Bryan Kadzban @ 2009-10-10 21:13 ` Greg KH 2009-10-12 6:21 ` Bryan Kadzban 0 siblings, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-10 21:13 UTC (permalink / raw) To: Bryan Kadzban Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 10:34:16AM -0700, Bryan Kadzban wrote: > Greg KH wrote: > > On Sat, Oct 10, 2009 at 07:47:32AM -0500, Matt Domsch wrote: > >> On Fri, Oct 09, 2009 at 10:23:08PM -0700, Greg KH wrote: > >>> On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: > >>>> The fundamental roadblock to this is that enumeration != > >>>> naming, except that it is for network devices, and we keep > >>>> changing the enumeration order. > >>> No, the hardware changes the enumeration order, it places _no_ > >>> guarantees on what order stuff will be found in. So this is not > >>> the kernel changing, just to be clear. > >> Over time the kernel has changed its enumeration mechanisms, and > >> introduced parallelism into the process (which is a good thing), > >> which, from a user perspective, makes names nondeterministic. Yes, > >> fixing this up by hard-coding MAC addresses after install has been > >> the traditional mechanism to address this. I think there's a > >> better way. > > > > Ok, but that way can be done in userspace, without the need for this > > char device, right? > > For the record -- when I tried to send a patch that did exactly this > (provided an option to use by-path persistence for network drivers), it > was rejected because "that doesn't work for USB". > > True, it doesn't. But by-mac (what we have today) doesn't work for > replacing motherboards in a random home system (that can't override the > MAC address in the BIOS), either. If you replace a motherboard, you honestly expect no configuration to be needed to be changed? If so, then don't use the MAC naming scheme for your systems. 
> > But this code is not a requirement to "solve" the fact that network > > devices can show up in different order, that problem can be solved as > > long as the user picks a single way to name the devices, using tools > > that are already present today in distros. > > This code is not a requirement, no. But -- as you say -- it does > provide a halfway-decent way to assign multiple names to a NIC. And > that provides admins the choice to use a couple different persistence > schemes, depending on how they expect their hardware to work. But the names need to then be resolved back to a "real" kernel name in order to do anything with that network connection, as the char devices are not real ones. So that adds an additional layer of complexity on all of the system configuration tools. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
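For readers following the point about resolving an alternate name back to the "real" kernel name: here is a minimal sketch of what that extra step might look like for a configuration tool, assuming the alternate names are plain symlinks whose target's basename is the kernel interface name. The directory layout, the `by-slot` directory, and the alias name are hypothetical and only built in a scratch directory for the demo; nothing here is taken from the proposed patch.

```python
import os
import tempfile


def kernel_name_for_alias(alias_path):
    """Follow an alias symlink and return the basename of its final target.

    Assumption: the target's basename is the kernel interface name.
    """
    return os.path.basename(os.path.realpath(alias_path))


# Demo in a scratch directory so no real device nodes are touched.
with tempfile.TemporaryDirectory() as root:
    # Stand-in for the per-interface node (hypothetical layout).
    open(os.path.join(root, "eth0"), "w").close()

    # Hypothetical alias directory with one alternate name.
    alias_dir = os.path.join(root, "by-slot")
    os.mkdir(alias_dir)
    alias = os.path.join(alias_dir, "hypothetical-slot-1")
    os.symlink(os.path.join("..", "eth0"), alias)

    resolved = kernel_name_for_alias(alias)
    print(resolved)
```

Every tool that accepts an alternate name would need a step like this before it could hand the name to the networking stack, which is the added layer being objected to.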
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 21:13 ` Greg KH @ 2009-10-12 6:21 ` Bryan Kadzban 2009-10-12 16:19 ` Bryan Kadzban 0 siblings, 1 reply; 86+ messages in thread From: Bryan Kadzban @ 2009-10-12 6:21 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave [-- Attachment #1: Type: text/plain, Size: 3285 bytes --] Greg KH wrote: > On Sat, Oct 10, 2009 at 10:34:16AM -0700, Bryan Kadzban wrote: >> Greg KH wrote: >>> On Sat, Oct 10, 2009 at 07:47:32AM -0500, Matt Domsch wrote: >>>> On Fri, Oct 09, 2009 at 10:23:08PM -0700, Greg KH wrote: >>>>> On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: >>>>>> The fundamental roadblock to this is that enumeration != >>>>>> naming, except that it is for network devices, and we keep >>>>>> changing the enumeration order. >>>>> No, the hardware changes the enumeration order, it places >>>>> _no_ guarantees on what order stuff will be found in. So >>>>> this is not the kernel changing, just to be clear. >>>> Over time the kernel has changed its enumeration mechanisms, >>>> and introduced parallelism into the process (which is a good >>>> thing), which, from a user perspective, makes names >>>> nondeterministic. Yes, fixing this up by hard-coding MAC >>>> addresses after install has been the traditional mechanism to >>>> address this. I think there's a better way. >>> Ok, but that way can be done in userspace, without the need for >>> this char device, right? >> For the record -- when I tried to send a patch that did exactly >> this (provided an option to use by-path persistence for network >> drivers), it was rejected because "that doesn't work for USB". >> >> True, it doesn't. But by-mac (what we have today) doesn't work for >> replacing motherboards in a random home system (that can't override >> the MAC address in the BIOS), either. 
> > If you replace a motherboard, you honestly expect no configuration to > be needed to be changed? If so, then don't use the MAC naming scheme > for your systems. What else is there? biosdevname doesn't work with this BIOS. It looks like at least path_id has been updated to work with NICs now, so that might work, with a bit of custom rule hacking. Or at least, it won't work any more poorly than for disks, which seem to work pretty well... :-) >>> But this code is not a requirement to "solve" the fact that >>> network devices can show up in different order, that problem can >>> be solved as long as the user picks a single way to name the >>> devices, using tools that are already present today in distros. >> This code is not a requirement, no. But -- as you say -- it does >> provide a halfway-decent way to assign multiple names to a NIC. >> And that provides admins the choice to use a couple different >> persistence schemes, depending on how they expect their hardware to >> work. > > But the names need to then be resolved back to a "real" kernel name > in order to do anything with that network connection, as the char > devices are not real ones. So that adds an additional layer of > complexity on all of the system configuration tools. Yes, that is true -- and no, this change isn't perfect. But it lets me have multiple "names" per interface, and have "names" that are longer than IFNAMSIZ, though, which is why I like it. (Now, if open() would return effectively a netlink socket bound to that ifindex already, such that the program didn't need to fill in the various ifindex fields for e.g. rtnetlink... but it's probably really hard to do that, so this isn't a serious suggestion.) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 260 bytes --] ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 6:21 ` Bryan Kadzban @ 2009-10-12 16:19 ` Bryan Kadzban 0 siblings, 0 replies; 86+ messages in thread From: Bryan Kadzban @ 2009-10-12 16:19 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave [-- Attachment #1: Type: text/plain, Size: 788 bytes --] Bryan Kadzban wrote: > (Now, if open() would return effectively a netlink socket bound to > that ifindex already, such that the program didn't need to fill in > the various ifindex fields for e.g. rtnetlink... but it's probably > really hard to do that, so this isn't a serious suggestion.) Wait, scratch that. It's not "really hard", it's "almost impossible". At open() time, you have no idea which netlink family the program wants to communicate with. bind() is also hard. (In theory, you could support bind() on this new FD -- but then why is userspace using a file in the first place, and not a socket?) So this is even less of a serious suggestion now. I'd still like to be able to refer to NICs by multiple names though, if we can find a way that works... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 260 bytes --] ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 16:25 ` Greg KH 2009-10-10 17:34 ` Bryan Kadzban @ 2009-10-11 16:40 ` David Zeuthen 2009-10-11 18:47 ` Greg KH 1 sibling, 1 reply; 86+ messages in thread From: David Zeuthen @ 2009-10-11 16:40 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, 2009-10-10 at 09:25 -0700, Greg KH wrote: > Ok, but that way can be done in userspace, without the need for this > char device, right? It might actually be nice to have a device file anyway since you can use existing udev infrastructure to adjust permissions (e.g. chown it to the netdev group) and add ACLs. This would allow running some software as an unprivileged user instead of uid 0. David ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-11 16:40 ` David Zeuthen @ 2009-10-11 18:47 ` Greg KH 0 siblings, 0 replies; 86+ messages in thread From: Greg KH @ 2009-10-11 18:47 UTC (permalink / raw) To: David Zeuthen Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sun, Oct 11, 2009 at 12:40:18PM -0400, David Zeuthen wrote: > On Sat, 2009-10-10 at 09:25 -0700, Greg KH wrote: > > Ok, but that way can be done in userspace, without the need for this > > char device, right? > > It might actually be nice to have a device file anyway since you can use > existing udev infrastructure to adjust permissions (e.g. chown it to the > netdev group) and add ACLs. This would allow running some software as an > unprivileged user instead of uid 0. But as the char nodes would not actually control access to anything, how would this help? Remember, these device nodes are "dummies" with nothing behind them (open() fails). thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 5:23 ` Greg KH 2009-10-10 8:17 ` Sujit K M 2009-10-10 12:47 ` Matt Domsch @ 2009-10-10 18:11 ` Bill Fink 2009-10-10 18:35 ` Kay Sievers 2009-10-11 21:10 ` Rob Townley 2009-10-12 17:45 ` Bill Nottingham 4 siblings, 1 reply; 86+ messages in thread From: Bill Fink @ 2009-10-10 18:11 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Fri, 9 Oct 2009, Greg KH wrote: > On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: > > The fundamental roadblock to this is that enumeration != naming, > > except that it is for network devices, and we keep changing the > > enumeration order. > > No, the hardware changes the enumeration order, it places _no_ > guarantees on what order stuff will be found in. So this is not the > kernel changing, just to be clear. > > Again, I have a machine here that likes to reorder PCI devices every 4th > or so boot times, and that's fine according to the PCI spec. Yeah, it's > a crappy BIOS, but the manufacturer rightly pointed out that it is not > in violation of anything. > > > Today, port naming is completely nondeterministic. If you have but > > one NIC, there are few chances to get the name wrong (it'll be eth0). > > If you have >1 NIC, chances increase to get it wrong. > > That is why all distros name network devices based on the only > deterministic thing they have today, the MAC address. I still fail to > see why you do not like this solution, it is honestly the only way to > properly name network devices in a sane manner. > > All distros also provide a way to easily rename the network devices, to > place a specific name on a specific MAC address, so again, this should > all be solved already. 
> > No matter how badly your BIOS teams mess up the PCI enumeration order :) No comment on the specific implementation decision, but I am in the process of setting up a large number of test systems with identical hardware configurations, and using a master disk image to clone all the test systems. The biggest pain in this process is identifying the MAC addresses for each of the six or more network interfaces in each test system (we want eth0...ethN to always reference the same physical port on the test systems), and then having to modify the 70-persistent-net.rules udev file and the HWADDR entry for all the ifcfg-ethX files to reflect the correct MAC addresses. It would be fantastic if there were some mechanism for making this part of the process unnecessary. -Bill ^ permalink raw reply [flat|nested] 86+ messages in thread
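As an illustration of the per-machine chore described above, here is a rough sketch that turns a list of MAC addresses, given in the desired physical port order, into rule lines of the kind found in 70-persistent-net.rules. The function names and the MAC addresses are made up for the example, and the rule uses only a small subset of the match keys a distro-generated file typically carries, so treat it as a starting point and not as the format any particular distro emits.

```python
def persistent_net_rule(mac, name):
    """Format one udev rule that names the NIC with this MAC address.

    The MAC is lowercased because sysfs reports addresses in lowercase
    and udev string matches are case-sensitive.
    """
    return ('SUBSYSTEM=="net", ACTION=="add", '
            'ATTR{address}=="%s", NAME="%s"' % (mac.lower(), name))


def rules_for(macs_in_port_order):
    """Assign eth0, eth1, ... to MACs listed in physical port order."""
    return [persistent_net_rule(mac, "eth%d" % index)
            for index, mac in enumerate(macs_in_port_order)]


# Made-up addresses standing in for one cloned test system.
example_macs = ["00:00:5E:00:53:01", "00:00:5E:00:53:02"]
rules = rules_for(example_macs)
for rule in rules:
    print(rule)
```

The hard part remains what the message says it is: someone still has to find out which MAC belongs to which physical port on every machine before a script like this has anything to work with.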
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 18:11 ` Bill Fink @ 2009-10-10 18:35 ` Kay Sievers 0 siblings, 0 replies; 86+ messages in thread From: Kay Sievers @ 2009-10-10 18:35 UTC (permalink / raw) To: Bill Fink Cc: Greg KH, Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 20:11, Bill Fink <billfink@mindspring.com> wrote: > No comment on the specific implementation decision, but I am in the > process of setting up a large number of test systems with identical > hardware configurations, and using a master disk image to clone all the > test systems. The biggest pain in this process is identifying the MAC > addresses for each of the six or more network interfaces in each test > system (we want eth0...ethN to always reference the same physical port > on the test systems), and then having to modify the 70-persistent-net.rules > udev file and the HWADDR entry for all the ifcfg-ethX files to reflect > the correct MAC addresses. It would be fantastic if there were some > mechanism for making this part of the process unnecessary. Udev creates the persistent rules only if no other rule has set a name. Adding something like: SUBSYSTEM=="net", KERNEL=="eth*", NAME="eth%n" in any earlier rules file before the udev generated one will skip all of the automatic udev rule creation. Kay ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 5:23 ` Greg KH ` (2 preceding siblings ...) 2009-10-10 18:11 ` Bill Fink @ 2009-10-11 21:10 ` Rob Townley 2009-10-11 23:04 ` Matt Domsch 2009-10-12 3:00 ` Greg KH 2009-10-12 17:45 ` Bill Nottingham 4 siblings, 2 replies; 86+ messages in thread From: Rob Townley @ 2009-10-11 21:10 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 12:23 AM, Greg KH <greg@kroah.com> wrote: > On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote: >> The fundamental roadblock to this is that enumeration != naming, >> except that it is for network devices, and we keep changing the >> enumeration order. > > No, the hardware changes the enumeration order, it places _no_ > guarantees on what order stuff will be found in. So this is not the > kernel changing, just to be clear. > > Again, I have a machine here that likes to reorder PCI devices every 4th > or so boot times, and that's fine according to the PCI spec. Yeah, it's > a crappy BIOS, but the manufacturer rightly pointed out that it is not > in violation of anything. > >> Today, port naming is completely nondeterministic. If you have but >> one NIC, there are few chances to get the name wrong (it'll be eth0). >> If you have >1 NIC, chances increase to get it wrong. > > That is why all distros name network devices based on the only > deterministic thing they have today, the MAC address. I still fail to > see why you do not like this solution, it is honestly the only way to > properly name network devices in a sane manner. > > All distros also provide a way to easily rename the network devices, to > place a specific name on a specific MAC address, so again, this should > all be solved already. 
> > No matter how badly your BIOS teams mess up the PCI enumeration order :) > > thanks, > > greg k-h > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > So when an add-in PCI NIC has a lower MAC than the motherboard NICs, the add-in cards will come before the motherboard NICs. i don't like it. But please whatever is done, make sure ping and tracert still works when telling it to use a ethX source interface: eth0 = 4.3.2.8, the default gateway is thru eth1. ping -I eth0 208.67.222.222 FAILS ping -I 4.3.2.8 208.67.222.222 WORKS tracert -i eth0 -I 208.67.222.222 FAILS tracert -s 4.3.2.8 -I 208.67.222.222 WORKS tracert -i eth0 208.67.222.222 FAILS tracert -s 4.3.2.8 208.67.222.222 WORKS ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-11 21:10 ` Rob Townley @ 2009-10-11 23:04 ` Matt Domsch 2009-10-12 3:00 ` Greg KH 1 sibling, 0 replies; 86+ messages in thread From: Matt Domsch @ 2009-10-11 23:04 UTC (permalink / raw) To: Rob Townley Cc: Greg KH, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sun, Oct 11, 2009 at 04:10:03PM -0500, Rob Townley wrote: > So when an add-in PCI NIC has a lower MAC than the motherboard NICs, > the add-in cards will come before the motherboard NICs. i don't like it. Actually, MAC address has nothing to do with device naming/ordering at all. Often systems will have onboard NICs in ascending MAC address order, but that's not a requirement, and I've seen systems not do that. And once you get to add-in vs onboard, BIOS wouldn't be able to enforce such an ordering anyhow (in general). But yes, you raise the point that, without using MAC-assigned names or another naming mechanism designed to cope with this, adding or removing a card can cause a difference in device enumeration, and thus name. -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-11 21:10 ` Rob Townley 2009-10-11 23:04 ` Matt Domsch @ 2009-10-12 3:00 ` Greg KH 2009-10-12 18:35 ` Rob Townley 1 sibling, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-12 3:00 UTC (permalink / raw) To: Rob Townley Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sun, Oct 11, 2009 at 04:10:03PM -0500, Rob Townley wrote: > So when an add-in PCI NIC has a lower MAC than the motherboard NICs, > the add-in cards will come before the motherboard NICs. i don't like it. Huh? Have you used the MAC persistent rules? If you add a new card, what does it pick for it? > But please whatever is done, make sure ping and tracert still works when > telling it to use a ethX source interface: > > eth0 = 4.3.2.8, the default gateway is thru eth1. > ping -I eth0 208.67.222.222 FAILS > ping -I 4.3.2.8 208.67.222.222 WORKS > tracert -i eth0 -I 208.67.222.222 FAILS > tracert -s 4.3.2.8 -I 208.67.222.222 WORKS > tracert -i eth0 208.67.222.222 FAILS > tracert -s 4.3.2.8 208.67.222.222 WORKS Again, is what we currently have broken? I am confused as to what this is referring to. greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 3:00 ` Greg KH @ 2009-10-12 18:35 ` Rob Townley 2009-10-12 18:44 ` Matt Domsch 0 siblings, 1 reply; 86+ messages in thread From: Rob Townley @ 2009-10-12 18:35 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sun, Oct 11, 2009 at 10:00 PM, Greg KH <greg@kroah.com> wrote: > On Sun, Oct 11, 2009 at 04:10:03PM -0500, Rob Townley wrote: >> So when an add-in PCI NIC has a lower MAC than the motherboard NICs, >> the add-in cards will come before the motherboard NICs. i don't like it. > > Huh? Have you used the MAC persistent rules? If you add a new card, > what does it pick for it? i have a hp-dl360 (two nics) with a fibre optic add in nic. On a fresh install, the add-in is eth0. i didn't like it, but ran it for years. > >> But please whatever is done, make sure ping and tracert still works when >> telling it to use a ethX source interface: >> >> eth0 = 4.3.2.8, the default gateway is thru eth1. >> ping -I eth0 208.67.222.222 FAILS >> ping -I 4.3.2.8 208.67.222.222 WORKS >> tracert -i eth0 -I 208.67.222.222 FAILS >> tracert -s 4.3.2.8 -I 208.67.222.222 WORKS >> tracert -i eth0 208.67.222.222 FAILS >> tracert -s 4.3.2.8 208.67.222.222 WORKS > > Again, is what we currently have broken? I am confused as to what this > is referring to. Yes, ping and traceroute are broken at least on Fedora, CentOS, and busybox. On a multinic, multigatewayed machine, passing ethX instead of the IP address will give the false result: "Destination Host Unreachable" when the machine's default gateway is reached thru the other nic. In the following example, the default gateway is thru eth1, not eth0. Pay attention to the text between the '*****'. ping -c 1 -B -I eth0 208.67.222.222 PING 208.67.222.222 (208.67.222.222) from ***** 4.3.2.8 eth0*****: 56(84) bytes of data. 
From 4.3.2.8 icmp_seq=1 Destination Host Unreachable #ping -c 1 -B -I 4.3.2.8 208.67.222.222 PING 208.67.222.222 (208.67.222.222) from ***** 4.3.2.8 *****: 56(84) bytes of data. 64 bytes from 208.67.222.222: icmp_seq=1 ttl=55 time=562 ms > > greg k-h > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 18:35 ` Rob Townley @ 2009-10-12 18:44 ` Matt Domsch 0 siblings, 0 replies; 86+ messages in thread From: Matt Domsch @ 2009-10-12 18:44 UTC (permalink / raw) To: Rob Townley Cc: Greg KH, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Mon, Oct 12, 2009 at 01:35:25PM -0500, Rob Townley wrote: > > Again, is what we currently have broken? I am confused as to what this > > is referring to. > > Yes, ping and traceroute are broken at least on Fedora, CentOS, and busybox. > On a multinic, multigatewayed machine, passing ethX instead of the IP > address will give the false result: "Destination Host Unreachable" > when the machine's default gateway is reached thru the other nic. In > the following example, the default gateway is thru eth1, not eth0. Unrelated to this thread. We're having a hard enough time making sure this conversation accurately reflects the views and needs of everyone involved. Please let's not throw in another tangent. Thanks, Matt -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 5:23 ` Greg KH ` (3 preceding siblings ...) 2009-10-11 21:10 ` Rob Townley @ 2009-10-12 17:45 ` Bill Nottingham 2009-10-12 17:55 ` Greg KH 4 siblings, 1 reply; 86+ messages in thread From: Bill Nottingham @ 2009-10-12 17:45 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave Greg KH (greg@kroah.com) said: > > Today, port naming is completely nondeterministic. If you have but > > one NIC, there are few chances to get the name wrong (it'll be eth0). > > If you have >1 NIC, chances increase to get it wrong. > > That is why all distros name network devices based on the only > deterministic thing they have today, the MAC address. I still fail to > see why you do not like this solution, it is honestly the only way to > properly name network devices in a sane manner. > > All distros also provide a way to easily rename the network devices, to > place a specific name on a specific MAC address, so again, this should > all be solved already. No, it's not solved. Even if you have persistent names once you install, if you ever re-image, you're likely to get *different* persistent names; the first load will always be non-deterministic. The only way around this would be to have some sort of screen like: Would you like your network devices to be enumerated by [ ] MAC address [ ] PCI device order [ ] Driver name [ ] Other which is just all sorts of fail in and of itself. Especially since once you get to the point where you can coherently ask this in a native installer, the drivers have already loaded. Bill ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 17:45 ` Bill Nottingham @ 2009-10-12 17:55 ` Greg KH 2009-10-12 18:07 ` Bill Nottingham 0 siblings, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-12 17:55 UTC (permalink / raw) To: Bill Nottingham Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Mon, Oct 12, 2009 at 01:45:28PM -0400, Bill Nottingham wrote: > Greg KH (greg@kroah.com) said: > > > Today, port naming is completely nondeterministic. If you have but > > > one NIC, there are few chances to get the name wrong (it'll be eth0). > > > If you have >1 NIC, chances increase to get it wrong. > > > > That is why all distros name network devices based on the only > > deterministic thing they have today, the MAC address. I still fail to > > see why you do not like this solution, it is honestly the only way to > > properly name network devices in a sane manner. > > > > All distros also provide a way to easily rename the network devices, to > > place a specific name on a specific MAC address, so again, this should > > all be solved already. > > No, it's not solved. Even if you have persistent names once you install, > if you ever re-image, you're likely to get *different* persistent names; > the first load will always be non-deterministic. > > The only way around this would be to have some sort of screen like: > > Would you like your network devices to be enumerated by > > [ ] MAC address > [ ] PCI device order > [ ] Driver name > [ ] Other [ ] PCI slot name That's one that modern systems are now reporting, and should solve Matt's problem as well, right? > which is just all sorts of fail in and of itself. Especially since > once you get to the point where you can coherently ask this in a > native installer, the drivers have already loaded. 
No, the driver load order doesn't determine this, you need the drivers loaded first before you can rename anything :) And I don't see how Matt's proposed patch helps resolve this type of issue any better than what we currently have today, do you? thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 17:55 ` Greg KH @ 2009-10-12 18:07 ` Bill Nottingham 2009-10-12 18:15 ` Greg KH 0 siblings, 1 reply; 86+ messages in thread From: Bill Nottingham @ 2009-10-12 18:07 UTC (permalink / raw) To: Greg KH Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave Greg KH (greg@kroah.com) said: > > No, it's not solved. Even if you have persistent names once you install, > > if you ever re-image, you're likely to get *different* persistent names; > > the first load will always be non-deterministic. > > > > The only way around this would be to have some sort of screen like: > > > > Would you like your network devices to be enumerated by > > > > [ ] MAC address > > [ ] PCI device order > > [ ] Driver name > > [ ] Other > > [ ] PCI slot name > > That's one that modern systems are now reporting, and should solve > Matt's problem as well, right? ... maybe. On my laptop, the first 'slot' enumerated appears to be the cardbus bridge, before the on-board ethernet. And on the desktop next to me, the slot driver shows nothing. > And I don't see how Matt's proposed patch helps resolve this type of > issue any better than what we currently have today, do you? It allows multiple addressing schemes to be active at once, which can allow the admin to choose post-install without making an active choice at installation. This is an improvement, even if it doesn't solve the world. Bill ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-12 18:07 ` Bill Nottingham @ 2009-10-12 18:15 ` Greg KH 0 siblings, 0 replies; 86+ messages in thread From: Greg KH @ 2009-10-12 18:15 UTC (permalink / raw) To: Bill Nottingham Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Mon, Oct 12, 2009 at 02:07:42PM -0400, Bill Nottingham wrote: > Greg KH (greg@kroah.com) said: > > > No, it's not solved. Even if you have persistent names once you install, > > > if you ever re-image, you're likely to get *different* persistent names; > > > the first load will always be non-deterministic. > > > > > > The only way around this would be to have some sort of screen like: > > > > > > Would you like your network devices to be enumerated by > > > > > > [ ] MAC address > > > [ ] PCI device order > > > [ ] Driver name > > > [ ] Other > > > > [ ] PCI slot name > > > > That's one that modern systems are now reporting, and should solve > > Matt's problem as well, right? > > ... maybe. On my laptop, the first 'slot' enumerated appears to be > the cardbus bridge, before the on-board ethernet. And on the desktop > next to me, the slot driver shows nothing. On servers, where this matters (multiple ethernet pci devices), this should all be present if the manufacturer wants it to be, as it is just an ACPI table entry. > > And I don't see how Matt's proposed patch helps resolve this type of > > issue any better than what we currently have today, do you? > > It allows multiple addressing schemes to be active at once, which > can allow the admin to choose post-install without making an > active choice at installation. This is an improvement, even if > it doesn't solve the world. But these different names can not be used by the networking stack, or in scripts, as others have pointed out. Which seems to be the big problem here. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 4:40 ` Matt Domsch 2009-10-10 5:23 ` Greg KH @ 2009-10-10 18:32 ` Stephen Hemminger 2009-10-10 21:06 ` Greg KH 2009-10-12 7:30 ` Kurt Van Dijck 2009-10-11 0:37 ` Marco d'Itri 2 siblings, 2 replies; 86+ messages in thread From: Stephen Hemminger @ 2009-10-10 18:32 UTC (permalink / raw) To: Matt Domsch; +Cc: netdev, linux-hotplug, Narendra_K, jordan_hargrave On Fri, 9 Oct 2009 23:40:57 -0500 Matt Domsch <Matt_Domsch@dell.com> wrote: > On Fri, Oct 09, 2009 at 07:44:01PM -0700, Stephen Hemminger wrote: > > Maybe I'm dense but can't see why having a useless /dev/net/ symlinks > > is a good interface choice. Perhaps you should explain the race between > > PCI scan and udev in more detail, and why solving it in either of those > > places won't work. As it stands you are proposing yet another wart to > > the already complex set of network interface API's which has implications > > for security as well as increasing the number of possible bugs. > > The fundamental challenge is that system administrators, particularly > those of server-class hardware with multiple network ports present > (some on the motherboard, some on add-in cards), have the > not-so-unreasonable expectation that there is a deterministic mapping > between those ports and the name one uses to address those ports. > > The fundamental roadblock to this is that enumeration != naming, > except that it is for network devices, and we keep changing the > enumeration order. > > Today, port naming is completely nondeterministic. If you have but > one NIC, there are few chances to get the name wrong (it'll be eth0). > If you have >1 NIC, chances increase to get it wrong. > > The complexity arises at multiple levels. > > First, device driver load order. In the 2.4 kernel days, and even > mostly early 2.6 kernel days, the order in which network drivers > loaded played a role in determining the name of the device. 
Drivers > loaded first would get their devices named first. If I have two types > of devices, say an e100-driven NIC and a tg3-driven NIC, I could > figure out that the names would be eth0=e100 and eth1=tg3 by setting > the load order in /etc/modules.conf (now modprobe.conf). If I wanted > the other order, fine, just switch it around in modules.conf and > reboot. OS installers, being the first running instance of Linux, > before modprobe.conf existed to set that ordering, had to have other > mechanisms to load drivers (often manually, or if programmatically > such as in a kickstart or autoyast file, was still somewhat fixed). > > With the advent of modaliases + udev, now modprobe.conf doesn't > contain this ordering anymore, and udev loads the drivers. So while > it wasn't perfect, it was better than nothing, and that's gone now. > > It gets even worse as, to speed up boot time, modprobes can be run in > parallel, and even within individual drivers, the NICs get initialized > (and named) in parallel. Further confusing things, some devices need > firmware loaded into them before getting names assigned, which is done > from userspace, and they race. > > Second, PCI device list order. In the 2.4 kernel days, the PCI device > list was scanned "breadth-first" (for each bus; for each device; for > each function; do load...). FWIW, Windows still does this. It gives > BIOS, which assigns PCI bus numbers, a chance to put LOMs at a lower > bus number than add-in cards. Module load order still mattered, but > at least if you had say 2 e1000 ports as LOMs, and 2 e1000 ports on > add-in cards, you pretty much knew the ordering would be eth0 as > lowest bdf on the motherboard, eth1 as next bdf on the motherboard, > and eth2 and 3 as the add-in cards in ascending slot order. > > With the advent of PCI hot plug in the 2.5 kernel series, the > breadth-first ordering became depth-first. (for each bus; for each > device; if the device is a bridge, scan the busses behind it.). 
This > caused NICs on bus 0 device 5, and bus 1 device 3, (eth0 and 1 > respectively) to be enumerated differently due to a bridge from > bus 0 to bus 1 at 0:4. My crude hack of pci=bfsort, with some dmi > strings to match and auto-enable, at least reverted this back to the > ordering the 2.4 kernel and Windows used. Now we have to keep adding > systems to this DMI list (Dell has a number of systems on this list > today; HP has even more). And it doesn't completely solve the > problem, just masks it. > > So, to address the ordering problem, I placed a constraint on our > server hardware teams, forcing them to lay out their boards and assign > PCIe lanes and bus numbers, such that at least the designed "first" > LOM would get found first in either depth-first or breadth-first > order. Our 10G and 11G servers have this restriction in place, though > it wasn't easy. And it's gotten even harder, as the PCIe switches > expand the number of lanes available. We no longer have the > traditional tiered buses architecture, but the PCI layer for this > purpose thinks we do. I need to remove this constraint on the > hardware teams - it's gotten to be impossible for the chipset lanes to > be laid out efficiently with this constraint. > > All of the above just papered over the enumeration != naming problem. > > Third, stateless computing is becoming more and more commonplace. The > Field Replaceable Unit is the server itself. Got a bad server? Pull > it out, move the disks to an identical unit, insert the new server, > and go. Fix the bad server offline and bring it back. In this model, > having MAC addresses as the mechanism that is providing the > determinism (/etc/mactab or udev persistent naming rules) breaks, > because the MAC addresses of the ports on the new server won't be the > same as on the old server. HP even has a technology to solve _this_ > problem (in their blade chassis) - Virtual Connect. 
The MACs get > assigned by the chassis to the blades at POST, and are fixed to the > slot. Slick, and Dell has an even more flexible similar feature > FlexAddress. This doesn't solve the OS installer problem of "which of > these NICs should I use to do an install?" but it does recognize the > problem space and tries to overcome it. > > Fourth, for OS installers, choosing which NIC to use at install time, > when all the NICs are plugged in, can be difficult. PXE environments, > using pxelinux and its IPAPPEND 2 option, will append > "BOOTIF=xx:xx:xx:xx:xx:xx" to the kernel command line, that > containing the MAC address of the NIC used for PXE. Neat trick. Yes, > we then had to teach the OS installers to recognize and use this. But > it only works if you PXE boot, and only for that one NIC. > > Fifth, network devices can have only a single name. eth0. If we look > at disks, we see udev manages a tree of symlinks for > /dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid. And as a > system admin, if I wanted to also create a udev rule for > /dev/disk/by-function (boot, swap, mattsstorage), it's trivial to do > so. Why can't we have this flexibility for network devices too? > > So, how do we get deterministic naming for all the NICs in a system? > That's what I'm going for. Picture a network switch, with several > blades, and several ports on each blade. The network admin addresses > each port as say 1/16 (the 16th port on blade 1, clearly labeled). > The parallel on servers is the chassis label printed on the outside > (say, "Gb1"). But due to the above, there is no guarantee, and in fact > little chance, that Gb1 will be consistently named eth0 - it may vary > from boot to boot. That's full of fail. > > For a concrete example, the 4 bnx2 chips in my PowerEdge R610 with a > current 2.6 kernel, loading only one driver, the ports get assigned > names in nondeterministic order on each boot. 
Given that the > ifcfg-eth* rules, netfilter rules, and the rest all expect > deterministic naming, massive failure ensues unless some form of > determinism is brought back in. > > The idea to use a character device node to expose the ifindex value, > and udev to manage a tree of symlinks to it, really follows the model > used today for disks. It allows us to get deterministic names for > devices (albeit, the names are symlinks), and multiple names for > devices (through multiple symlink rules). That some people want to > use the char device to call ioctl() and read/write, as is possible on > the BSDs, would just be gravy IMHO. > > It does require a change in behavior for a system administrator. > Instead of hard-coding 'eth0' into her scripts, she uses > '/dev/net/by-function/boot' or somesuch. But then that name is > guaranteed to always refer to the "right" NIC. Every admin I've > spoken to is willing to make this kind of change, as long as they get > the consistent, deterministic naming they expect but don't have > today. And it does require patching userspace apps to take both a > kernel device name, or a path, and to resolve the path to device name > or ifindex. We wrote libnetdevname (really, one function), and have > patches for several userspace apps to use it, to prove it can be done. > > One alternative would be to do something using the sysfs ifindex value > already exported. e.g. > /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/0000:06:07.0/net/eth0/ifindex > > but we have never had symlinks from /dev into /sys before (doesn't > mean we couldn't though). In that case, udev would grow to manage > /dev/net/by-chassis-label/Embedded_NIC_1 -> /sys/devices/.../net/eth0, > and libnetdevname would be used to follow the symlink in applications. > This approach could solve my problem without (many or any?) kernel > changes needed, but wouldn't help those who want to do > ioctl/read/write to a devnode. > > Given the problem, I really do need a solution. 
I've proposed one > method, and an alternative, but I can't afford to let the problem stay > unaddressed any longer, and need a clear direction to be chosen. The > char device gives me what I need, and others what they want also. > > Thanks for listening to the diatribe. For more examples and > workarounds that we've been telling our customers for several years, > check out http://linux.dell.com/papers.shtml for the Network Interface > Card Naming whitepaper. > > Why isn't what is available through sysfs enough? If not, why not add the necessary attributes there? BTW, for our distro, we are looking into device renaming based on PCI slot because that is what router OSes do. Customers expect if they replace the card in slot 0, it will come back with the same name. This is not what server customers expect. ^ permalink raw reply [flat|nested] 86+ messages in thread
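The breadth-first versus depth-first scan orders described in the quoted message can be illustrated with a small model. This is only a sketch, not kernel code: the `TOPOLOGY` table and the helper names are hypothetical, it models just the example given (a bridge at 0:4 leading to bus 1, NICs at 0:5 and 1:3), and it ignores driver load order and parallel probing, which the message says also affect naming.

```python
from collections import deque

# Hypothetical topology taken from the example in the message:
# bus 0 has a bridge at device 4 (leading to bus 1) and a NIC at device 5;
# bus 1 has a NIC at device 3. Each entry is (device, kind, child_bus).
TOPOLOGY = {
    0: [(4, "bridge", 1), (5, "nic", None)],
    1: [(3, "nic", None)],
}


def breadth_first(topology, root=0):
    """Scan every device on a bus before visiting buses behind bridges."""
    order = []
    queue = deque([root])
    while queue:
        bus = queue.popleft()
        for dev, kind, child in sorted(topology[bus], key=lambda d: d[0]):
            if kind == "nic":
                order.append(f"{bus}:{dev}")
            elif kind == "bridge":
                queue.append(child)
    return order


def depth_first(topology, bus=0):
    """Descend behind each bridge as soon as it is found."""
    order = []
    for dev, kind, child in sorted(topology[bus], key=lambda d: d[0]):
        if kind == "nic":
            order.append(f"{bus}:{dev}")
        elif kind == "bridge":
            order.extend(depth_first(topology, child))
    return order


def name_ports(order):
    """Assign ethN names in discovery order, as enumeration-based naming does."""
    return {f"eth{i}": addr for i, addr in enumerate(order)}


if __name__ == "__main__":
    print("breadth-first:", name_ports(breadth_first(TOPOLOGY)))
    print("depth-first:  ", name_ports(depth_first(TOPOLOGY)))
```

In this model the NIC at 0:5 is eth0 under the breadth-first scan and eth1 under the depth-first scan, which is the swap the message attributes to the bridge at 0:4.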
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 18:32 ` Stephen Hemminger @ 2009-10-10 21:06 ` Greg KH 2009-10-13 18:02 ` Dan Williams 2009-10-12 7:30 ` Kurt Van Dijck 1 sibling, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-10 21:06 UTC (permalink / raw) To: Stephen Hemminger Cc: Matt Domsch, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 11:32:19AM -0700, Stephen Hemminger wrote: > > BTW, for our distro, we are looking into device renaming based on PCI slot > because that is what router OS's do. Customers expect if they replace the card > in slot 0, it will come back with the same name. This is not what server > customers expect. If your bios exposes the PCI slots to userspace (through the proper ACPI namespace), doing this type of naming should be trivial with some simple udev rules, no additional kernel infrastructure is needed. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 21:06 ` Greg KH @ 2009-10-13 18:02 ` Dan Williams 2009-10-13 18:53 ` Narendra_K 0 siblings, 1 reply; 86+ messages in thread From: Dan Williams @ 2009-10-13 18:02 UTC (permalink / raw) To: Greg KH Cc: Stephen Hemminger, Matt Domsch, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, 2009-10-10 at 14:06 -0700, Greg KH wrote: > On Sat, Oct 10, 2009 at 11:32:19AM -0700, Stephen Hemminger wrote: > > > > BTW, for our distro, we are looking into device renaming based on PCI slot > > because that is what router OS's do. Customers expect if they replace the card > > in slot 0, it will come back with the same name. This is not what server > > customers expect. > > If your bios exposes the PCI slots to userspace (through the proper ACPI > namespace), doing this type of naming should be trivial with some simple > udev rules, no additional kernel infrastructure is needed. By and large, the people that care most about persistent network device names based on *location in the machine* are server users. This allows hotswap of cards or single-image-multiple-machine without needing configuration changes, which is nice. Those users can reasonably be expected to choose hardware whose BIOS supports the ACPI tables that (mostly) guarantee to provide actual, stable names for their hardware. If there's even a 10% chance that on consumer-level systems the names won't be stable on a given boot (and I can't see how, without BIOS support, we can guarantee 100% stability) then it's a worthless guarantee. If the BIOS support exists, it is trivial to use udev to create the correct naming mechanism for your machine, either using MAC address or BIOS-provided slot naming. No kernel patch is required. If the BIOS support does not exist, you are not guaranteed a stable naming mechanism except by MAC address, because the BIOS may randomly change enumeration based on the time of day, or it may not. 
A 90 or 95% stability guarantee is not a guarantee at all. Third, USB enumeration will always be unstable. Thus we have an unsolvable discrepancy in behavior between PCI and USB. Is this correct? Dan ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: PATCH: Network Device Naming mechanism and policy 2009-10-13 18:02 ` Dan Williams @ 2009-10-13 18:53 ` Narendra_K 0 siblings, 0 replies; 86+ messages in thread From: Narendra_K @ 2009-10-13 18:53 UTC (permalink / raw) To: dcbw, greg Cc: shemminger, Matt_Domsch, netdev, linux-hotplug, Jordan_Hargrave >If the BIOS support exists, it is trivial to use udev to >create the correct naming mechanism for your machine, either >using MAC address or BIOS-provided slot naming. No kernel >patch is required. > Yes, in case we want to rename only once. MAC addresses or slot names do provide persistent naming. They help in retaining whatever names are assigned at install time, which is the first instantiation of the OS. But these names may not be as expected (for example, the first on-board network interface is expected to be named "eth0", which is not always the case, and the name might not reflect what is written on the chassis label, such as "Gb1" and "Gb2"), which would cause unattended installs to break. Also, image-based deployments will face problems because this introduces state such as the MAC address. With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 18:32 ` Stephen Hemminger 2009-10-10 21:06 ` Greg KH @ 2009-10-12 7:30 ` Kurt Van Dijck 1 sibling, 0 replies; 86+ messages in thread From: Kurt Van Dijck @ 2009-10-12 7:30 UTC (permalink / raw) To: Stephen Hemminger Cc: Matt Domsch, netdev, linux-hotplug, Narendra_K, jordan_hargrave On Sat, Oct 10, 2009 at 11:32:19AM -0700, Stephen Hemminger wrote: > > On Fri, Oct 09, 2009 at 07:44:01PM -0700, Stephen Hemminger wrote: [...] > > Why isn't what is available through sysfs enough? If not, why not > add the necessary attributes there? True. If sysfs is not sufficient, what exact naming scheme could be applied that the chardev based naming could use? > [...] Kurt ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-10 4:40 ` Matt Domsch 2009-10-10 5:23 ` Greg KH 2009-10-10 18:32 ` Stephen Hemminger @ 2009-10-11 0:37 ` Marco d'Itri 2 siblings, 0 replies; 86+ messages in thread From: Marco d'Itri @ 2009-10-11 0:37 UTC (permalink / raw) To: Matt Domsch Cc: Stephen Hemminger, netdev, linux-hotplug, Narendra_K, jordan_hargrave [-- Attachment #1: Type: text/plain, Size: 1879 bytes --] On Oct 10, Matt Domsch <Matt_Domsch@dell.com> wrote: > It does require a change in behavior for a system administrator. > Instead of hard-coding 'eth0' into her scripts, she uses > '/dev/net/by-function/boot' or somesuch. But then that name is > guaranteed to always refer to the "right" NIC. Every admin I've > spoken to is willing to make this kind of change, as long as they get > the consistent, deterministic naming they expect but don't have > today. And it does require patching userspace apps to take both a > kernel device name, or a path, and to resolve the path to device name > or ifindex. We wrote libnetdevname (really, one function), and have > patches for several userspace apps to use it, to prove it can be done. For the record, before being a distribution developer I am a system administrator (who designed and manages many firewalls with multiple network interfaces) and I am still unconvinced that what you are proposing is a practical solution and that its downsides justify the significant changes both in software and in system administration practices that it requires. The first issue which greatly concerns me is the need to modify *every* userspace application and kernel tool (what about iptables? What about the kernel logs?): from a user-experience point of view it would be very annoying if different applications used different names to refer to the same network device. 
I am also concerned with the practical implications of trying to use such long (and unusual) names: IFNAMSIZ is 16, so user interfaces tend to assume both short names and that they match something like /^[a-z0-9]+$/. What about e.g. distribution scripts which use the interface name as a file system path component? Do you already have a (standard) scheme to losslessly convert the names to a form without slashes? -- ciao, Marco [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 86+ messages in thread
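The concern about slashes and IFNAMSIZ can be made concrete with a short sketch. Nothing like this is proposed in the thread; percent-encoding is used here purely as one example of a reversible escape, and the function names are hypothetical. IFNAMSIZ is 16 including the terminating NUL, so a kernel interface name holds at most 15 characters.

```python
from urllib.parse import quote, unquote

IFNAMSIZ = 16  # kernel limit, including the terminating NUL byte


def escape_name(name):
    """Percent-encode every reserved character, including '/', so the result
    can be used as a single file system path component."""
    return quote(name, safe="")


def unescape_name(escaped):
    """Reverse escape_name(); the round trip is lossless."""
    return unquote(escaped)


def fits_ifnamsiz(name):
    """True if the name would fit in a kernel interface name buffer."""
    return len(name.encode()) < IFNAMSIZ


if __name__ == "__main__":
    alias = "net/by-function/boot"
    escaped = escape_name(alias)
    print(escaped, fits_ifnamsiz(escaped))
    print(unescape_name(escaped) == alias)
```

The escaped form is reversible but longer than the original, so it does nothing for tools that assume names fit in IFNAMSIZ.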
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-09 14:00 ` Narendra K ` (2 preceding siblings ...) 2009-10-09 21:09 ` Matt Domsch @ 2009-10-13 15:08 ` dann frazier 2009-10-13 17:13 ` Narendra_K 3 siblings, 1 reply; 86+ messages in thread From: dann frazier @ 2009-10-13 15:08 UTC (permalink / raw) To: Narendra K; +Cc: netdev, linux-hotplug, matt_domsch, jordan_hargrave On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote: > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote: > > > example udev config: > > > SUBSYSTEM=="net", > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}" > > > > work as well. But coupling the ifindex to the MAC address like this > > doesn't work. (In general, coupling any two unrelated attributes when > > trying to do persistent names doesn't work.) > > > Attaching the latest patch incorporating review comments. > > By creating character devices for every network device, we can use > udev to maintain alternate naming policies for devices, including > additional names for the same device, without interfering with the > name that the kernel assigns a device. > > This is conditionalized on CONFIG_NET_CDEV. If enabled (the default), > device nodes will automatically be created in /dev/netdev/ for each > network device. (/dev/net/ is already populated by the tun device.) > > These device nodes are not functional at the moment - open() returns > -ENOSYS. Their only purpose is to provide userspace with a kernel > name to ifindex mapping, in a form that udev can easily manage. If the idea is just to provide a userspace-visible mapping (and presumably take advantage of udev's infrastructure for naming) does this need kernel changes? Could this be a hierarchy under e.g. /etc/udev instead, using plain text files? It still means we need something like libnetdevname for apps to do the translation, but I'm not seeing why it matters how this map is stored. Is there some special property of the character devices (e.g. 
uevents) that we're not already getting with the existing interfaces? -- dann frazier ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: PATCH: Network Device Naming mechanism and policy 2009-10-13 15:08 ` dann frazier @ 2009-10-13 17:13 ` Narendra_K 2009-10-13 17:36 ` dann frazier 2009-10-13 19:51 ` Greg KH 0 siblings, 2 replies; 86+ messages in thread From: Narendra_K @ 2009-10-13 17:13 UTC (permalink / raw) To: dannf; +Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose >> These device nodes are not functional at the moment - open() returns >> -ENOSYS. Their only purpose is to provide userspace with a kernel >> name to ifindex mapping, in a form that udev can easily manage. > >If the idea is just to provide a userspace-visible mapping >(and presumably take advantage of udev's infrastructure for >naming) does this need kernel changes? Could this be a >hierarchy under e.g. /etc/udev instead, using plain text >files? It still means we need something like libnetdevname for >apps to do the translation, but I'm not seeing why it matters >how this map is stored. Is there some special property of the >character devices (e.g. uevents) that we're not already >getting with the existing interfaces? Yes. The char device by itself doesn't help in any way. But it provides a flexible mechanism to provide multiple names for the same device, just the way it is for disks. With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 17:13 ` Narendra_K @ 2009-10-13 17:36 ` dann frazier 2009-10-16 0:32 ` dann frazier 2009-10-13 19:51 ` Greg KH 1 sibling, 1 reply; 86+ messages in thread From: dann frazier @ 2009-10-13 17:36 UTC (permalink / raw) To: Narendra_K Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Tue, Oct 13, 2009 at 10:43:49PM +0530, Narendra_K@Dell.com wrote: > > >> These device nodes are not functional at the moment - open() returns > >> -ENOSYS. Their only purpose is to provide userspace with a kernel > >> name to ifindex mapping, in a form that udev can easily manage. > > > >If the idea is just to provide a userspace-visible mapping > >(and presumably take advantage of udev's infrastructure for > >naming) does this need kernel changes? Could this be a > >hierarchy under e.g. /etc/udev instead, using plain text > >files? It still means we need something like libnetdevname for > >apps to do the translation, but I'm not seeing why it matters > >how this map is stored. Is there some special property of the > >character devices (e.g. uevents) that we're not already > >getting with the existing interfaces? > > Yes. The char device by itself doesn't help in any way. But it provides > a flexible mechanism to provide multiple names for the same device, just > the way it is for disks. Right - so any reason this couldn't be implemented completely in userspace by having udev manipulate plain text files under say /etc/udev/net/? I do agree that it would be nice for admins/installers to tweak/use nic names in a similar way to storage names (udev rules), and it might let us take advantage of a lot of the existing udev code. -- dann frazier ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 17:36 ` dann frazier @ 2009-10-16 0:32 ` dann frazier 2009-10-16 14:02 ` Narendra_K 0 siblings, 1 reply; 86+ messages in thread From: dann frazier @ 2009-10-16 0:32 UTC (permalink / raw) To: Narendra_K Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Tue, Oct 13, 2009 at 11:36:38AM -0600, dann frazier wrote: > On Tue, Oct 13, 2009 at 10:43:49PM +0530, Narendra_K@Dell.com wrote: > > > > >> These device nodes are not functional at the moment - open() returns > > >> -ENOSYS. Their only purpose is to provide userspace with a kernel > > >> name to ifindex mapping, in a form that udev can easily manage. > > > > > >If the idea is just to provide a userspace-visible mapping > > >(and presumably take advantage of udev's infrastructure for > > >naming) does this need kernel changes? Could this be a > > >hierarchy under e.g. /etc/udev instead, using plain text > > >files? It still means we need something like libnetdevname for > > >apps to do the translation, but I'm not seeing why it matters > > >how this map is stored. Is there some special property of the > > >character devices (e.g. uevents) that we're not already > > >getting with the existing interfaces? > > > > Yes. The char device by itself doesn't help in any way. But it provides > > a flexible mechanism to provide multiple names for the same device, just > > the way it is for disks. > > Right - so any reason this couldn't be implemented completely in > userspace by having udev manipulate plain text files under say > /etc/udev/net/? > > I do agree that it would be nice for admins/installers to tweak/use > nic names in a similar way to storage names (udev rules), and it might > let us take advantage of a lot of the existing udev code. Is there interest in this approach? 
- modify udev to manage network device names as regular (non-device) files (stored in /etc/udev, /dev/netdev, or wherever) - use the existing udev rules to manage symlinks to these files - point libnetdevname at these text files for its name resolution I've started prototyping this, and it certainly looks possible w/o any kernel changes. However, I could probably use some advice from a udev person to do a proper implementation. -- dann frazier ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: PATCH: Network Device Naming mechanism and policy 2009-10-16 0:32 ` dann frazier @ 2009-10-16 14:02 ` Narendra_K 2009-10-16 15:20 ` dann frazier 0 siblings, 1 reply; 86+ messages in thread From: Narendra_K @ 2009-10-16 14:02 UTC (permalink / raw) To: dannf; +Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose >On Tue, Oct 13, 2009 at 11:36:38AM -0600, dann frazier wrote: >> Right - so any reason this couldn't be implemented completely in >> userspace by having udev manipulate plain text files under say >> /etc/udev/net/? >> >> I do agree that it would be nice for admins/installers to tweak/use >> nic names in a similar way to storage names (udev rules), >and it might >> let us take advantage of a lot of the existing udev code. > >Is there interest in this approach? > - modify udev to manage network device names as regular (non-device) > files (stored in /etc/udev, /dev/netdev, or wherever) Yes. Would you elaborate a little more on "modify udev to manage network devices as regular files"? Does it mean some custom rules which will generate a regular file under, say, /dev/netdev/, or extending udev itself? And what would the regular file look like in terms of holding the ifindex of the interface, which can be passed to libnetdevname? > - use the existing udev rules to manage symlinks to these files > - point libnetdevname at these text files for its name resolution > >I've started prototyping this, and it certainly looks possible >w/o any kernel changes. However, I could probably use some >advice from a udev person to do a proper implementation. With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-16 14:02 ` Narendra_K @ 2009-10-16 15:20 ` dann frazier 2009-10-16 15:33 ` Ben Hutchings 0 siblings, 1 reply; 86+ messages in thread From: dann frazier @ 2009-10-16 15:20 UTC (permalink / raw) To: Narendra_K Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote: > > >On Tue, Oct 13, 2009 at 11:36:38AM -0600, dann frazier wrote: > >> Right - so any reason this couldn't be implemented completely in > >> userspace by having udev manipulate plain text files under say > >> /etc/udev/net/? > >> > >> I do agree that it would be nice for admins/installers to tweak/use > >> nic names in a similar way to storage names (udev rules), > >and it might > >> let us take advantage of a lot of the existing udev code. > > > >Is there interest in this approach? > > - modify udev to manage network devices names as regular (non-device) > > files (stored in /etc/udev, /dev/netdev, or wherever) > > Yes. Would you elaborate little more on "modify udev to manage network > devices as regular files". Sure. We already get an event when netifs get added/removed - udev just doesn't create a device file for it. And since all we care about is the file's name (and the symlinks to it), there's really no point in creating a real device file anyway. So, instead of 'mknod /dev/netdev/eth0', why not just 'touch /dev/netdev/eth0'? A file exists, so we can still maintain aliases as symlinks, we just don't need to modify the kernel. > Does it mean some custom rules which will > generate a regular file under, say, /dev/netdev/ or extend udev > itself ? I believe we have to extend udev itself. We could probably do this completely within udev rules by running programs that do the touching and symlinking, but it would be nicer and more consistent/familiar to take advantage of the udev syntax (SYMLINK) to do this natively. 
Besides, udev already has the logic to know when/how to instantiate and unlink symlinks, it would suck to duplicate that. So, udev would need to be modified to know how to go through the normal "node" creation for net devices, and to call creat() instead of mknod(). > And how would the regular file look like in terms of holding ifindex of > the interface, which can be passed to libnetdevname. I can't think of anything we need to store in the regular file. If we have the kernel name for the device, we can look up the ifindex in /sys. Correct me if I'm wrong, but storing it ourselves seems redundant. > > > > - use the existing udev rules to manage symlinks to these files > > - point libnetdevname at these text files for its name resolution > > > >I've started prototyping this, and it certainly looks possible > >w/o any kernel changes. However, I could probably use some > >advice from a udev person to do a proper implementation. > > With regards, > Narendra K -- dann frazier ^ permalink raw reply [flat|nested] 86+ messages in thread
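The sysfs lookup mentioned in this message (kernel name to ifindex) can be sketched as follows. It assumes the usual `/sys/class/net/<name>/ifindex` attribute; the function name and the `sysfs_root` parameter are hypothetical and exist only so the helper can be exercised against a scratch directory.

```python
import os


def ifindex_from_sysfs(kernel_name, sysfs_root="/sys/class/net"):
    """Read the ifindex attribute for a kernel interface name.

    The lookup is keyed on the current kernel name, so it only gives the
    right answer while the interface keeps that name.
    """
    path = os.path.join(sysfs_root, kernel_name, "ifindex")
    with open(path) as attr:
        return int(attr.read().strip())


if __name__ == "__main__":
    # List whatever interfaces are present on this machine.
    root = "/sys/class/net"
    if os.path.isdir(root):
        for name in sorted(os.listdir(root)):
            print(name, ifindex_from_sysfs(name))
```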
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-16 15:20 ` dann frazier @ 2009-10-16 15:33 ` Ben Hutchings 2009-10-16 15:41 ` dann frazier 2009-10-16 21:40 ` dann frazier 0 siblings, 2 replies; 86+ messages in thread From: Ben Hutchings @ 2009-10-16 15:33 UTC (permalink / raw) To: dann frazier Cc: Narendra_K, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Fri, 2009-10-16 at 09:20 -0600, dann frazier wrote: > On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote: [...] > > And how would the regular file look like in terms of holding ifindex of > > the interface, which can be passed to libnetdevname. > > I can't think of anything we need to store in the regular file. If we > have the kernel name for the device, we can look up the ifindex in > /sys. Correct me if I'm wrong, but storing it ourselves seems > redundant. But the name of a netdev can change whereas its ifindex never does. Identifying netdevs by name would require additional work to update the links when a netdev is renamed and would still be prone to race conditions. This is why Narendra and Matt were proposing to store the ifindex in the node all along... Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-16 15:33 ` Ben Hutchings @ 2009-10-16 15:41 ` dann frazier 2009-10-16 21:40 ` dann frazier 1 sibling, 0 replies; 86+ messages in thread From: dann frazier @ 2009-10-16 15:41 UTC (permalink / raw) To: Ben Hutchings Cc: Narendra_K, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Fri, Oct 16, 2009 at 04:33:13PM +0100, Ben Hutchings wrote: > On Fri, 2009-10-16 at 09:20 -0600, dann frazier wrote: > > On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote: > [...] > > > And how would the regular file look like in terms of holding ifindex of > > > the interface, which can be passed to libnetdevname. > > > > I can't think of anything we need to store in the regular file. If we > > have the kernel name for the device, we can look up the ifindex in > > /sys. Correct me if I'm wrong, but storing it ourselves seems > > redundant. > > But the name of a netdev can change whereas its ifindex never does. > Identifying netdevs by name would require additional work to update the > links when a netdev is renamed and would still be prone to race > conditions. This is why Narendra and Matt were proposing to store the > ifindex in the node all along... ah, yes - I see that now - the ability to rename an interface is what prevents this from working. Thanks for the explanation. -- dann frazier ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-16 15:33 ` Ben Hutchings 2009-10-16 15:41 ` dann frazier @ 2009-10-16 21:40 ` dann frazier 2009-10-19 11:30 ` Narendra_K 1 sibling, 1 reply; 86+ messages in thread From: dann frazier @ 2009-10-16 21:40 UTC (permalink / raw) To: Ben Hutchings Cc: Narendra_K, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Fri, Oct 16, 2009 at 04:33:13PM +0100, Ben Hutchings wrote: > On Fri, 2009-10-16 at 09:20 -0600, dann frazier wrote: > > On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote: > [...] > > > And how would the regular file look like in terms of holding ifindex of > > > the interface, which can be passed to libnetdevname. > > > > I can't think of anything we need to store in the regular file. If we > > have the kernel name for the device, we can look up the ifindex in > > /sys. Correct me if I'm wrong, but storing it ourselves seems > > redundant. > > But the name of a netdev can change whereas its ifindex never does. > Identifying netdevs by name would require additional work to update the > links when a netdev is renamed and would still be prone to race > conditions. This is why Narendra and Matt were proposing to store the > ifindex in the node all along... Matt, Ben and I talked about a few other possibilities on IRC. The one I like the most at the moment is an idea Ben had to create dummy files named after the ifindex. Then, use symlinks for the kernel name and the various by-$property subdirectories. This means the KOBJ events will need to expose the ifindex. I'm a novice at net programming, but I'm told that ifindex is the information apps ultimately require here. -- dann frazier ^ permalink raw reply [flat|nested] 86+ messages in thread
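The layout described in this message (empty files named after the ifindex, with symlinks for the kernel name and the by-$property aliases) can be mocked up in a scratch directory. This is a sketch under assumptions: the directory layout, the `by-function/boot` alias, and all function names are hypothetical, and in the proposal udev, not a script, would maintain the links.

```python
import os
import tempfile


def add_netdev(root, ifindex, kernel_name):
    """Create an empty file named after the ifindex and a kernel-name symlink."""
    os.makedirs(root, exist_ok=True)
    open(os.path.join(root, str(ifindex)), "w").close()
    os.symlink(str(ifindex), os.path.join(root, kernel_name))


def add_alias(root, ifindex, subdir, alias):
    """Add an alias such as by-function/boot that points at the ifindex file."""
    directory = os.path.join(root, subdir)
    os.makedirs(directory, exist_ok=True)
    os.symlink(os.path.join("..", str(ifindex)), os.path.join(directory, alias))


def rename_netdev(root, old_name, new_name):
    """Model an interface rename: only the kernel-name symlink changes."""
    os.rename(os.path.join(root, old_name), os.path.join(root, new_name))


def resolve_ifindex(path):
    """Follow symlinks to the ifindex-named file and return the ifindex."""
    return int(os.path.basename(os.path.realpath(path)))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    add_netdev(root, 3, "eth0")
    add_alias(root, 3, "by-function", "boot")
    rename_netdev(root, "eth0", "lan0")
    # The alias still resolves, because it never referred to the old name.
    print(resolve_ifindex(os.path.join(root, "by-function", "boot")))
```

Because the aliases point at the ifindex file rather than at the kernel name, a rename touches a single symlink and leaves every alias valid.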
* RE: PATCH: Network Device Naming mechanism and policy 2009-10-16 21:40 ` dann frazier @ 2009-10-19 11:30 ` Narendra_K 2009-10-19 16:14 ` Bryan Kadzban 0 siblings, 1 reply; 86+ messages in thread From: Narendra_K @ 2009-10-19 11:30 UTC (permalink / raw) To: dannf, bhutchings Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose >> > > And how would the regular file look like in terms of holding >> > > ifindex of the interface, which can be passed to libnetdevname. >> > >> > I can't think of anything we need to store in the regular file. If >> > we have the kernel name for the device, we can look up the ifindex >> > in /sys. Correct me if I'm wrong, but storing it ourselves seems >> > redundant. >> >> But the name of a netdev can change whereas its ifindex never does. >> Identifying netdevs by name would require additional work to update >> the links when a netdev is renamed and would still be prone to race >> conditions. This is why Narendra and Matt were proposing to >store the >> ifindex in the node all along... > >Matt, Ben and I talked about a few other possibilities on IRC. >The one I like the most at the moment is an idea Ben had to >creat dummy files named after the ifindex. Then, use symlinks >for the kernel name and the various by-$property >subdirectories. This means the KOBJ events will need to expose >the ifindex. > I suppose the KOBJ events already expose the ifindex of a network interface. The file "/sys/class/net/ethN/uevent" contains INTERFACE=ethN and IFINDEX=n already. But it looks like udev doesn't use it in any way. For example, with the kernel patch the "/sys/class/net/ethN/uevent" contains in addition to the above details, MAJOR=M and MINOR=m which the udev knows how to make use of with a rule like SUBSYSTEM=="net", KERNEL!="tun", NAME="netdev/%k", MODE="0600". >I'm a novice at net programming, but I'm told that ifindex is >the information apps ultimately require here. Yes. 
libnetdevname retrieves the minor number of the device node by calling stat(2) on the pathname. The minor number happens to be the ifindex of the device, and it is mapped to the corresponding kernel name by an if_indextoname(3) call. With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
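The lookup Narendra describes (stat the pathname, take the minor number as the ifindex, map it to the kernel name) can be sketched roughly as follows. This is only an illustration in Python, not the libnetdevname code itself; the function names `ifindex_from_rdev` and `kernel_name_from_node` are made up for this sketch, and it assumes the proposed kernel patch is applied so that the character device nodes exist with minor == ifindex.

```python
import os
import socket
import stat


def ifindex_from_rdev(st_rdev: int) -> int:
    """Return the minor number of a device ID; under the proposed
    scheme this minor number is assumed to equal the ifindex."""
    return os.minor(st_rdev)


def kernel_name_from_node(path: str) -> str:
    """Map a device-node pathname to the current kernel interface name.

    os.stat() follows symlinks, so a by-* symlink that points at the
    node resolves the same way as the node itself.
    """
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device node")
    # if_indextoname() raises OSError if no interface has this index.
    return socket.if_indextoname(ifindex_from_rdev(st.st_rdev))
```

Because the lookup goes through the ifindex rather than a stored name, it keeps working after the interface has been renamed.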
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-19 11:30 ` Narendra_K @ 2009-10-19 16:14 ` Bryan Kadzban 2009-11-04 14:23 ` Narendra_K 0 siblings, 1 reply; 86+ messages in thread From: Bryan Kadzban @ 2009-10-19 16:14 UTC (permalink / raw) To: Narendra_K Cc: dannf, bhutchings, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose [-- Attachment #1: Type: text/plain, Size: 2371 bytes --] Narendra_K@Dell.com wrote: >>>>> And how would the regular file look like in terms of holding >>>>> ifindex of the interface, which can be passed to >>>>> libnetdevname. >>>> I can't think of anything we need to store in the regular file. >>>> If we have the kernel name for the device, we can look up the >>>> ifindex in /sys. Correct me if I'm wrong, but storing it >>>> ourselves seems redundant. >>> But the name of a netdev can change whereas its ifindex never >>> does. Identifying netdevs by name would require additional work >>> to update the links when a netdev is renamed and would still be >>> prone to race conditions. This is why Narendra and Matt were >>> proposing to >> store the >>> ifindex in the node all along... >> Matt, Ben and I talked about a few other possibilities on IRC. The >> one I like the most at the moment is an idea Ben had to creat dummy >> files named after the ifindex. Then, use symlinks for the kernel >> name and the various by-$property subdirectories. This means the >> KOBJ events will need to expose the ifindex. >> > > I suppose the KOBJ events already expose the ifindex of a network > interface. The file "/sys/class/net/ethN/uevent" contains > INTERFACE=ethN and IFINDEX=n already. But it looks like udev doesn't > use it in any way. Right; it could simply do the equivalent of: touch /dev/netdev/$env{IFINDEX} instead of its normal mknod(2), and then do normal SYMLINK processing. 
That last part is what would link /dev/netdev/by-name/$env{INTERFACE} to that device, along with /dev/netdev/by-mac/*, /dev/netdev/by-path/*, etc., etc., in as many different ways as people want to add rules. (Or /dev/net/by-* instead of netdev; I'm mostly ambivalent about the first-level directory under /dev. Looks like libnetdevname requires /dev/netdev though.) > For example, with the kernel patch the "/sys/class/net/ethN/uevent" > contains in addition to the above details, MAJOR=M and MINOR=m which > the udev knows how to make use of with a rule like > > SUBSYSTEM=="net", KERNEL!="tun", NAME="netdev/%k", MODE="0600". And if the only point is to get the ifindex via stat(2) on the resulting symlinks, but people don't like device files, then why not get the ifindex via readlink(2) (and a bit of string parsing, and a strtol(3) or strtoul(3) call) instead? :-) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 260 bytes --] ^ permalink raw reply [flat|nested] 86+ messages in thread
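Bryan's readlink(2) alternative can be tried out without any kernel changes. The sketch below is an assumption-laden illustration, not udev code: it uses a temporary directory as a stand-in for /dev/netdev, and the helper names `publish` and `ifindex_from_link` are invented here. It creates an empty file named after the ifindex, adds a by-name symlink to it, and recovers the ifindex by parsing the link target.

```python
import os
import tempfile


def publish(root: str, ifindex: int, kernel_name: str) -> None:
    """Create root/<ifindex> and a by-name/<kernel_name> symlink to it."""
    os.makedirs(os.path.join(root, "by-name"), exist_ok=True)
    # Equivalent of: touch root/<ifindex>
    open(os.path.join(root, str(ifindex)), "a").close()
    os.symlink(os.path.join("..", str(ifindex)),
               os.path.join(root, "by-name", kernel_name))


def ifindex_from_link(link: str) -> int:
    """Read the symlink target and parse its last component as the ifindex."""
    target = os.readlink(link)
    return int(os.path.basename(target), 10)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:  # stand-in for /dev/netdev
        publish(root, 2, "eth0")
        print(ifindex_from_link(os.path.join(root, "by-name", "eth0")))  # prints 2
```

The same parsing would apply to by-mac or by-path style links, since they would all point at the file named after the ifindex.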
* RE: PATCH: Network Device Naming mechanism and policy 2009-10-19 16:14 ` Bryan Kadzban @ 2009-11-04 14:23 ` Narendra_K 2009-11-06 8:49 ` Marco d'Itri 2009-11-06 22:05 ` Domsch, Matt 0 siblings, 2 replies; 86+ messages in thread From: Narendra_K @ 2009-11-04 14:23 UTC (permalink / raw) To: bryan Cc: dannf, bhutchings, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose, Sandeep_K_Shandilya >>> Matt, Ben and I talked about a few other possibilities on IRC. The >>> one I like the most at the moment is an idea Ben had to creat dummy >>> files named after the ifindex. Then, use symlinks for the >kernel name >>> and the various by-$property subdirectories. This means the KOBJ >>> events will need to expose the ifindex. >>> >> >> I suppose the KOBJ events already expose the ifindex of a network >> interface. The file "/sys/class/net/ethN/uevent" contains >> INTERFACE=ethN and IFINDEX=n already. But it looks like udev doesn't >> use it in any way. > >Right; it could simply do the equivalent of: > >touch /dev/netdev/$env{IFINDEX} > >instead of its normal mknod(2), and then do normal SYMLINK processing. >That last part is what would link >/dev/netdev/by-name/$env{INTERFACE} to that device, along with >/dev/netdev/by-mac/*, /dev/netdev/by-path/*, etc., etc., in as >many different ways as people want to add rules. > >(Or /dev/net/by-* instead of netdev; I'm mostly ambivalent >about the first-level directory under /dev. Looks like >libnetdevname requires /dev/netdev though.) > >> For example, with the kernel patch the "/sys/class/net/ethN/uevent" >> contains in addition to the above details, MAJOR=M and MINOR=m which >> the udev knows how to make use of with a rule like >> >> SUBSYSTEM=="net", KERNEL!="tun", NAME="netdev/%k", MODE="0600". 
> >And if the only point is to get the ifindex via stat(2) on the >resulting symlinks, but people don't like device files, then >why not get the ifindex via readlink(2) (and a bit of string >parsing, and a strtol(3) or >strtoul(3) call) instead? :-) I suppose this issue can also be addressed in another way. Currently, sysfs contains various attributes of a network interface under the directory "/sys/class/net/ethN", for example "/sys/class/net/ethN/address". udev uses this as below - SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1d:09:6a:78:ec", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1". Similarly, export an attribute named "smbios_name" to sysfs, i.e. "/sys/class/net/eth0/smbios_name". "cat /sys/class/net/eth0/smbios_name" would show "Embedded_NIC_1[23..]" and this can be used by udev in 70-persistent-net.rules as SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{smbios_name}=="Embedded_NIC_1", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0". I suppose this would not need any changes to the udev code, and the existing udev infrastructure can be used, as udev is capable of handling ATTR{something}. This would also ensure that whichever device is "Embedded_NIC_1" as per the BIOS will also be "eth0" in the OS. Netdev, what are your views on this idea? With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-11-04 14:23 ` Narendra_K @ 2009-11-06 8:49 ` Marco d'Itri 2009-11-06 22:06 ` Matt Domsch 2009-11-06 22:05 ` Domsch, Matt 1 sibling, 1 reply; 86+ messages in thread From: Marco d'Itri @ 2009-11-06 8:49 UTC (permalink / raw) To: Narendra_K Cc: bryan, dannf, bhutchings, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose, Sandeep_K_Shandilya On Nov 04, Narendra_K@Dell.com wrote: > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", > ATTR{smbios_name}=="Embedded_NIC_1", ATTR{type}=="1", KERNEL=="eth*", > NAME="eth0". As a distribution developer I highly value solutions like this which do not require patching every application which deals with interface names and then teaching users about aliases which only work in some places and are unknown to the kernel. -- ciao, Marco ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-11-06 8:49 ` Marco d'Itri @ 2009-11-06 22:06 ` Matt Domsch 2009-11-06 22:35 ` Marco d'Itri 0 siblings, 1 reply; 86+ messages in thread From: Matt Domsch @ 2009-11-06 22:06 UTC (permalink / raw) To: Narendra_K, bryan, dannf, bhutchings, netdev, linux-hotplug, Jordan_Hargrave, Charles_Ros On Fri, Nov 06, 2009 at 09:49:21AM +0100, Marco d'Itri wrote: > On Nov 04, Narendra_K@Dell.com wrote: > > > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", > > ATTR{smbios_name}=="Embedded_NIC_1", ATTR{type}=="1", KERNEL=="eth*", > > NAME="eth0". > As a distribution developer I highly value solutions like this which do > not require patching every application which deals with interface names > and then teaching users about aliases which only work in some places and > are unknown to the kernel. Fair enough - but would you object if we changed the naming scheme from eth%d to something else? -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-11-06 22:06 ` Matt Domsch @ 2009-11-06 22:35 ` Marco d'Itri 2009-11-06 23:17 ` dann frazier 2009-11-09 14:41 ` Narendra_K 0 siblings, 2 replies; 86+ messages in thread From: Marco d'Itri @ 2009-11-06 22:35 UTC (permalink / raw) To: Matt Domsch Cc: Narendra_K, bryan, dannf, bhutchings, netdev, linux-hotplug, Jordan_Hargrave, Charles_Rose, Sandeep_K_Shandilya [-- Attachment #1: Type: text/plain, Size: 1284 bytes --] On Nov 06, Matt Domsch <Matt_Domsch@Dell.com> wrote: > > As a distribution developer I highly value solutions like this which do > > not require patching every application which deals with interface names > > and then teaching users about aliases which only work in some places and > > are unknown to the kernel. > Fair enough - but would you object if we changed the naming scheme > from eth%d to something else? I suppose that this would depend on what else. :-) Since you want radical changes I recommend that you design the new persistent naming infrastructure in a way that will allow root to choose to use the classic naming scheme, or many users will scream a lot and at least some distributions will do it anyway. I also expect that providing choice at the beginning of development may lead to more acceptance later if and when the new scheme has proved itself to be superior (at least in some situations). You have thought about this for a long time and if so far you have not found a solution which is widely considered superior then I doubt that one will appear soon. Providing your favourite naming scheme as an optional add-on will immediately benefit those who like it and greatly reduce opposition from those who do not. -- ciao, Marco [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-11-06 22:35 ` Marco d'Itri @ 2009-11-06 23:17 ` dann frazier 2009-11-09 14:41 ` Narendra_K 1 sibling, 0 replies; 86+ messages in thread From: dann frazier @ 2009-11-06 23:17 UTC (permalink / raw) To: Matt Domsch, Narendra_K, bryan, bhutchings, netdev, linux-hotplug, Jordan_Hargrave On Fri, Nov 06, 2009 at 11:35:24PM +0100, Marco d'Itri wrote: > On Nov 06, Matt Domsch <Matt_Domsch@Dell.com> wrote: > > > > As a distribution developer I highly value solutions like this which do > > > not require patching every application which deals with interface names > > > and then teaching users about aliases which only work in some places and > > > are unknown to the kernel. > > Fair enough - but would you object if we changed the naming scheme > > from eth%d to something else? > I suppose that this would depend on what else. :-) > Since you want radical changes I recommend that you design the new > persistent naming infrastructure in a way that will allow root to choose > to use the classic naming scheme, or many users will scream a lot and at > least some distributions will do it anyway. > I also expect that providing choice at the beginning of development may > lead to more acceptance later if and when the new scheme will have > proved itself to be superior (at least in some situations). > You have tought about this for a long time and if so far you have not > found a solution which is widely considered superior then I doubt that > one will appear soon. Providing your favourite naming scheme as an > optional add on will immediately benefit those who like it and greatly > reduce opposition from those who do not. This seems to me like a good installer feature - give the user an option to enter a name for an interface, with the default option to use the eth* names. 
To illustrate by example, I imagine an installer flow that looks like this:

[Do Hardware Discovery]
[Automatically reorder kernel names for reasonable defaults; eth0-eth{n-1} map to n onboard nics]

Sample user interface for network configuration:

------------Choose an interface to configure --------------
| Multiple unconfigured interfaces detected.              |
| Select an interface to configure by:                    |
|   1. Kernel name (eth0, eth1, etc)                      |
|   2. Mac Address                                        |
|   3. Chassis name                                       |
|   4. PCI Slot                                           |
-----------------------------------------------------------

----Choose an interface to configure (by chassis name)-----
|   1. LOM0                                               |
|   2. LOM1                                               |
|   3. Undefined                                          |
|   4. Undefined                                          |
-----------------------------------------------------------

----------------Name interface - (chassis name LOM0)-------
| Name to use for this interface [eth0]: __mynet0_        |
-----------------------------------------------------------

-----------------------------------------------------------
| Configure interface - mynet0                            |
|   1. DHCP                                               |
|   2. Static                                             |
|   ...                                                   |
-----------------------------------------------------------

[Generate udev rules that bind the user-selected name to the user-selected attribute]

^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: PATCH: Network Device Naming mechanism and policy 2009-11-06 22:35 ` Marco d'Itri 2009-11-06 23:17 ` dann frazier @ 2009-11-09 14:41 ` Narendra_K 2009-11-10 17:23 ` Stephen Hemminger 1 sibling, 1 reply; 86+ messages in thread From: Narendra_K @ 2009-11-09 14:41 UTC (permalink / raw) To: md, Matt_Domsch Cc: bryan, dannf, bhutchings, netdev, linux-hotplug, Jordan_Hargrave, Charles_Rose, Sandeep_K_Shandilya >> > As a distribution developer I highly value solutions like >this which >> > do not require patching every application which deals with >interface >> > names and then teaching users about aliases which only >work in some >> > places and are unknown to the kernel. >> Fair enough - but would you object if we changed the naming scheme >> from eth%d to something else? >I suppose that this would depend on what else. :-) Since you >want radical changes I recommend that you design the new >persistent naming infrastructure in a way that will allow root >to choose to use the classic naming scheme, or many users will >scream a lot and at least some distributions will do it anyway. >I also expect that providing choice at the beginning of >development may lead to more acceptance later if and when the >new scheme will have proved itself to be superior (at least in >some situations). >You have tought about this for a long time and if so far you >have not found a solution which is widely considered superior >then I doubt that one will appear soon. Providing your >favourite naming scheme as an optional add on will immediately >benefit those who like it and greatly reduce opposition from >those who do not. In that way, I suppose char device node solution fits the scheme perfectly. It doesn't change or interfere with the kernel's default naming scheme (ethN) in any way. Also, the applications continue to work the way they did and in addition to supporting traditional names, they would also support pathnames. 
Whether all user space applications need to be patched can be discussed and debated. But applications such as installers and firewall code, which fail with very high impact when they don't see determinism ("eth0 mapping to integrated port 1"), could be patched. Since users are already familiar with pathnames like /dev/disk/by-{id,label,uuid}, I suppose it might not be very difficult to get used to pathnames like /dev/netdev/by-chassis-label/Embedded_NIC_1. Would that be acceptable? With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-11-09 14:41 ` Narendra_K @ 2009-11-10 17:23 ` Stephen Hemminger 2009-11-11 6:31 ` Narendra_K 0 siblings, 1 reply; 86+ messages in thread From: Stephen Hemminger @ 2009-11-10 17:23 UTC (permalink / raw) To: Narendra_K Cc: md, Matt_Domsch, bryan, dannf, bhutchings, netdev, linux-hotplug, Jordan_Hargrave, Charles_Rose, Sandeep_K_Shandilya On Mon, 9 Nov 2009 20:11:47 +0530 <Narendra_K@Dell.com> wrote: > > >> > As a distribution developer I highly value solutions like > >this which > >> > do not require patching every application which deals with > >interface > >> > names and then teaching users about aliases which only > >work in some > >> > places and are unknown to the kernel. > >> Fair enough - but would you object if we changed the naming scheme > >> from eth%d to something else? > >I suppose that this would depend on what else. :-) Since you > >want radical changes I recommend that you design the new > >persistent naming infrastructure in a way that will allow root > >to choose to use the classic naming scheme, or many users will > >scream a lot and at least some distributions will do it anyway. > >I also expect that providing choice at the beginning of > >development may lead to more acceptance later if and when the > >new scheme will have proved itself to be superior (at least in > >some situations). > >You have tought about this for a long time and if so far you > >have not found a solution which is widely considered superior > >then I doubt that one will appear soon. Providing your > >favourite naming scheme as an optional add on will immediately > >benefit those who like it and greatly reduce opposition from > >those who do not. > > In that way, I suppose char device node solution fits the scheme > perfectly. It doesn't change or interfere with the kernel's default > naming scheme (ethN) in any way. 
Also, the applications continue to work > the way they did and in addition to supporting traditional names, they > would also support pathnames. Whether all the user space applications > need to be patched can be discussed and debated. But, we can patch > applications like, installers and firewall code, which when don't see > determinism ("eth0 mapping to integrated port 1"), fail and cause very > high impact could be patched. Since users are already familiar with > pathnames like /dev/disk/by-id{label, uuid}, I suppose it might not be > very difficult to get used to pathnames like > /dev/netdev/by-chassis-label/Embedded_NIC_1. Would that be acceptable ? > IFNAMSIZ = 16 is hardwired as part of the kernel binary user space API. Have you observed that the only developers arguing for this come from outside the normal circle of networking? It seems to be favored only by those who come to networking from a system or disk point of view. -- ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: PATCH: Network Device Naming mechanism and policy 2009-11-10 17:23 ` Stephen Hemminger @ 2009-11-11 6:31 ` Narendra_K 0 siblings, 0 replies; 86+ messages in thread From: Narendra_K @ 2009-11-11 6:31 UTC (permalink / raw) To: shemminger Cc: md, Matt_Domsch, bryan, dannf, bhutchings, netdev, linux-hotplug, Jordan_Hargrave, Charles_Rose, Sandeep_K_Shandilya >> >> Fair enough - but would you object if we changed the >naming scheme >> >> from eth%d to something else? >> >I suppose that this would depend on what else. :-) Since you want >> >radical changes I recommend that you design the new >persistent naming >> >infrastructure in a way that will allow root to choose to use the >> >classic naming scheme, or many users will scream a lot and at least >> >some distributions will do it anyway. >> >I also expect that providing choice at the beginning of development >> >may lead to more acceptance later if and when the new scheme will >> >have proved itself to be superior (at least in some situations). >> >You have tought about this for a long time and if so far >you have not >> >found a solution which is widely considered superior then I doubt >> >that one will appear soon. Providing your favourite naming >scheme as >> >an optional add on will immediately benefit those who like it and >> >greatly reduce opposition from those who do not. >> >> In that way, I suppose char device node solution fits the scheme >> perfectly. It doesn't change or interfere with the kernel's default >> naming scheme (ethN) in any way. Also, the applications continue to >> work the way they did and in addition to supporting >traditional names, >> they would also support pathnames. Whether all the user space >> applications need to be patched can be discussed and >debated. But, we >> can patch applications like, installers and firewall code, >which when >> don't see determinism ("eth0 mapping to integrated port 1"), >fail and >> cause very high impact could be patched. 
Since users are already >> familiar with pathnames like /dev/disk/by-id{label, uuid}, I suppose >> it might not be very difficult to get used to pathnames like >> /dev/netdev/by-chassis-label/Embedded_NIC_1. Would that be >acceptable ? >> > >IFNAMSIZ = 16 is hardwired as part of the kernel binary user space API. This factor is taken into consideration. The user space applications take this pathname, map it to the kernel name and use the kernel name to issue ioctls (http://linux.dell.com/wiki/index.php/Oss/libnetdevname). The pathname was suggested because it provides a way to get to the right interface when "integrated port 1" doesn't get the expected name "eth0". With regards, Narendra K ^ permalink raw reply [flat|nested] 86+ messages in thread
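The point about IFNAMSIZ can be made concrete with a small sketch of the argument handling a patched tool might do: a pathname is never handed to the kernel, it is first translated to a kernel name, and only plain names are checked against the 16-byte limit. This is a hedged illustration, not the libnetdevname API: `resolve_interface_argument` and the `path_to_name` callback are hypothetical names standing in for whatever helper the library provides, and the name check is deliberately simplified.

```python
from typing import Callable

IFNAMSIZ = 16  # kernel ABI limit: interface name plus terminating NUL


def is_valid_kernel_name(name: str) -> bool:
    """Simplified check: non-empty, fits in IFNAMSIZ with the NUL, no '/'."""
    encoded = name.encode()
    return 0 < len(encoded) < IFNAMSIZ and b"/" not in encoded


def resolve_interface_argument(arg: str,
                               path_to_name: Callable[[str], str]) -> str:
    """Return the kernel name to use in ioctls for a command-line argument.

    path_to_name is a hypothetical callback that maps a pathname such as
    /dev/netdev/by-chassis-label/Embedded_NIC_1 to a kernel name.
    """
    name = path_to_name(arg) if "/" in arg else arg
    if not is_valid_kernel_name(name):
        raise ValueError(f"not a usable interface name: {name!r}")
    return name
```

With this shape, a tool that previously passed its argument straight into the ioctl only needs the one extra call where it parses the command line, and traditional names such as eth0 pass through unchanged.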
* Re: PATCH: Network Device Naming mechanism and policy 2009-11-04 14:23 ` Narendra_K 2009-11-06 8:49 ` Marco d'Itri @ 2009-11-06 22:05 ` Domsch, Matt 1 sibling, 0 replies; 86+ messages in thread From: Domsch, Matt @ 2009-11-06 22:05 UTC (permalink / raw) To: K, Narendra Cc: Bryan Kadzban, dannf, bhutchings, netdev, linux-hotplug, Hargrave, Jordan, Rose, Charles, Shandilya, Sandeep K On Wed, Nov 04, 2009 at 08:23:38AM -0600, K, Narendra wrote: > Similarly, export an attribute named "smbios_name" to sysfs, i.e > "/sys/class/net/eth0/smbios_name". "Cat /sys/class/net/eth0/smbios_name" > would show "Embedded_NIC_1[23..]" and this can be used by udev in > 70-persistent-net.rules as > > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", > ATTR{smbios_name}=="Embedded_NIC_1", ATTR{type}=="1", KERNEL=="eth*", > NAME="eth0". > > I suppose this would not need any changes to the udev code and existing > udev infrastructure can be used as udev is capable handling > ATTR{something}. > > This would also ensure that whichever device is "Embedded_NIC_1" as per > the BIOS, will also be "eth0" in the os. We can grab the smbios_name value using biosdevname in a PROGRAM= part of the udev rule. But it doesn't actually solve the problem. We haven't changed the network device naming scheme from "eth%d" to something else. Therefore, by having rules which simply try to re-order names within that scheme, when they're being enumerated in parallel and racing, we get collisions. 
Take, for example, these rules, which try to rename the 4 onboard NICs in a particular order, in the absence of any other rules:

PROGRAM="/sbin/biosdevname --policy=smbios_names -i %k", RESULT=="Embedded NIC 1", NAME="eth0"
PROGRAM="/sbin/biosdevname --policy=smbios_names -i %k", RESULT=="Embedded NIC 2", NAME="eth1"
PROGRAM="/sbin/biosdevname --policy=smbios_names -i %k", RESULT=="Embedded NIC 3", NAME="eth2"
PROGRAM="/sbin/biosdevname --policy=smbios_names -i %k", RESULT=="Embedded NIC 4", NAME="eth3"

Instead, I wind up with this in ifconfig -a:

eth0 00:1B:21:42:66:30
eth1 00:1B:21:42:66:31
eth2 00:22:19:59:8E:5A
eth2_rename 00:22:19:59:8E:56
eth3 00:22:19:59:8E:5C
eth3_rename 00:22:19:59:8E:58

when what I expected was:

eth0 00:22:19:59:8E:56
eth1 00:22:19:59:8E:58
eth2 00:22:19:59:8E:5A
eth3 00:22:19:59:8E:5C
eth4 00:1B:21:42:66:30
eth5 00:1B:21:42:66:31

I can't use eth%d as the scheme - that's the kernel's scheme. I have to switch the scheme to something else. -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
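The collision Matt reports can be reproduced with a toy model. Everything here is an assumption made for illustration: the initial kernel assignment (add-in card first, then the onboard NICs) is inferred from his ifconfig output, the "<old name>_rename" parking behaviour is guessed from the names in that output rather than taken from the udev source, and the "lomN" names are a hypothetical alternative scheme, not one proposed in the thread.

```python
def apply_renames(kernel_names, rules):
    """Toy model of renaming inside a shared namespace.

    kernel_names: dict of kernel name -> MAC, in enumeration order.
    rules: dict of MAC -> wanted name.
    A device whose wanted name is taken is parked as "<old>_rename"
    and retried once after all other devices have been processed.
    """
    names = dict(kernel_names)
    pending = []
    for old, mac in kernel_names.items():
        want = rules.get(mac)
        if want is None or want == old:
            continue
        if want in names:
            temp = old + "_rename"
            names[temp] = names.pop(old)
            pending.append((temp, want))
        else:
            names[want] = names.pop(old)
    for temp, want in pending:
        if want not in names:
            names[want] = names.pop(temp)
    return names


# Assumed initial state: the add-in card was enumerated first.
kernel = {
    "eth0": "00:1B:21:42:66:30", "eth1": "00:1B:21:42:66:31",
    "eth2": "00:22:19:59:8E:56", "eth3": "00:22:19:59:8E:58",
    "eth4": "00:22:19:59:8E:5A", "eth5": "00:22:19:59:8E:5C",
}
onboard = ["00:22:19:59:8E:56", "00:22:19:59:8E:58",
           "00:22:19:59:8E:5A", "00:22:19:59:8E:5C"]

within_eth = apply_renames(kernel, {m: f"eth{i}" for i, m in enumerate(onboard)})
separate = apply_renames(kernel, {m: f"lom{i + 1}" for i, m in enumerate(onboard)})

if __name__ == "__main__":
    for label, result in (("eth%d targets", within_eth), ("lomN targets", separate)):
        print(label)
        for name in sorted(result):
            print("  ", name, result[name])
```

In this model the eth%d targets leave two onboard NICs parked under eth2_rename and eth3_rename, because eth0 and eth1 are held by devices that no rule moves, while targets outside the kernel's own scheme never collide with kernel-assigned names.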
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 17:13 ` Narendra_K 2009-10-13 17:36 ` dann frazier @ 2009-10-13 19:51 ` Greg KH 2009-10-13 20:00 ` Jordan_Hargrave 1 sibling, 1 reply; 86+ messages in thread From: Greg KH @ 2009-10-13 19:51 UTC (permalink / raw) To: Narendra_K Cc: dannf, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose On Tue, Oct 13, 2009 at 10:43:49PM +0530, Narendra_K@Dell.com wrote: > > >> These device nodes are not functional at the moment - open() returns > >> -ENOSYS. Their only purpose is to provide userspace with a kernel > >> name to ifindex mapping, in a form that udev can easily manage. > > > >If the idea is just to provide a userspace-visible mapping > >(and presumably take advantage of udev's infrastructure for > >naming) does this need kernel changes? Could this be a > >hierarchy under e.g. /etc/udev instead, using plain text > >files? It still means we need something like libnetdevname for > >apps to do the translation, but I'm not seeing why it matters > >how this map is stored. Is there some special property of the > >character devices (e.g. uevents) that we're not already > >getting with the existing interfaces? > > Yes. The char device by itself doesn't help in any way. But it provides > a flexible mechanism to provide multiple names for the same device, just > the way it is for disks. No, it's quite different than disks in that the symlinks, _and_ the device nodes do absolutely nothing. And any reference to a name that is a symlink will not work with any existing network tool, you will have to do some kind of lookup to determine which network device you really were referring to. These links end up being useless, and confusing, I still don't see how you can use them for anything. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: PATCH: Network Device Naming mechanism and policy 2009-10-13 19:51 ` Greg KH @ 2009-10-13 20:00 ` Jordan_Hargrave 2009-10-13 20:19 ` Greg KH 0 siblings, 1 reply; 86+ messages in thread From: Jordan_Hargrave @ 2009-10-13 20:00 UTC (permalink / raw) To: greg, Narendra_K; +Cc: dannf, netdev, linux-hotplug, Matt_Domsch, Charles_Rose We have developed a mapping library that will convert the user-friendly symlink names to the kernel names necessary for socket ioctls. All network tools that normally take ethX as argument have been modified to use this mapping library. Usually it's just a one-line addition when parsing the command line arguments. --jordan hargrave Dell Enterprise Linux Engineering -----Original Message----- From: Greg KH [mailto:greg@kroah.com] Sent: Tue 10/13/2009 14:51 To: K, Narendra Cc: dannf@hp.com; netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; Domsch, Matt; Hargrave, Jordan; Rose, Charles Subject: Re: PATCH: Network Device Naming mechanism and policy On Tue, Oct 13, 2009 at 10:43:49PM +0530, Narendra_K@Dell.com wrote: > > >> These device nodes are not functional at the moment - open() returns > >> -ENOSYS. Their only purpose is to provide userspace with a kernel > >> name to ifindex mapping, in a form that udev can easily manage. > > > >If the idea is just to provide a userspace-visible mapping > >(and presumably take advantage of udev's infrastructure for > >naming) does this need kernel changes? Could this be a > >hierarchy under e.g. /etc/udev instead, using plain text > >files? It still means we need something like libnetdevname for > >apps to do the translation, but I'm not seeing why it matters > >how this map is stored. Is there some special property of the > >character devices (e.g. uevents) that we're not already > >getting with the existing interfaces? > > Yes. The char device by itself doesn't help in any way. 
But it provides > a flexible mechanism to provide multiple names for the same device, just > the way it is for disks. No, it's quite different than disks in that the symlinks, _and_ the device nodes do absolutly nothing. And any reference to a name that is a symlink will not work with any existing network tool, you will have to do some kind of lookup to determine which network device you really were referring to. These links end up being useless, and confusing, I still don't see how you can use them for anything. thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 20:00 ` Jordan_Hargrave @ 2009-10-13 20:19 ` Greg KH 2009-10-13 22:05 ` Matt Domsch 2009-10-13 22:08 ` dann frazier 0 siblings, 2 replies; 86+ messages in thread From: Greg KH @ 2009-10-13 20:19 UTC (permalink / raw) To: Jordan_Hargrave Cc: Narendra_K, dannf, netdev, linux-hotplug, Matt_Domsch, Charles_Rose A: No. Q: Should I include quotations after my reply? http://daringfireball.net/2007/07/on_top On Tue, Oct 13, 2009 at 03:00:59PM -0500, Jordan_Hargrave@Dell.com wrote: > We have developed a mapping library that will convert the > user-friendly symlink names to the kernel names necessary for socket > ioctls. All network tools that normally take ethX as argument have > been modified to use this mapping library. Usually it's just a > one-line addition when parsing the command line arguments. Either I missed this in the first message in this thread, or this was never stated before, but that is nice. Where is this library, and will it be accepted by the upstream tool maintainers? thanks, greg k-h ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy 2009-10-13 20:19 ` Greg KH @ 2009-10-13 22:05 ` Matt Domsch 2009-10-13 22:08 ` dann frazier 1 sibling, 0 replies; 86+ messages in thread From: Matt Domsch @ 2009-10-13 22:05 UTC (permalink / raw) To: Greg KH Cc: Jordan_Hargrave, Narendra_K, dannf, netdev, linux-hotplug, Charles_Rose On Tue, Oct 13, 2009 at 01:19:31PM -0700, Greg KH wrote: > On Tue, Oct 13, 2009 at 03:00:59PM -0500, Jordan_Hargrave@Dell.com wrote: > > We have developed a mapping library that will convert the > > user-friendly symlink names to the kernel names necessary for socket > > ioctls. All network tools that normally take ethX as argument have > > been modified to use this mapping library. Usually it's just a > > one-line addition when parsing the command line arguments. > > Either I missed this in the first message in this thread, or this was > never stated before, but that is nice. Where is this library, It was not noted in the initial patch post, but I did note it immediately thereafter. Let me also note that we are prepared to have userspace consumers of this new character device node. http://linux.dell.com/wiki/index.php/Oss/libnetdevname notes how the kernel patch will interact with udev, describes the new library helper function in libnetdevname, and has patches for net-tools, iproute2, and ethtool to make use of the helper function. As has been noted here, MAC addresses are not necessarily unique to an interface. As such, we are not proposing a net/by-mac/* symlink to /dev/netdev/*. > and will it be accepted by the upstream tool maintainers? Unknown, we haven't proposed it to any yet as it's irrelevant until there is general acceptance of the approach (kernel or otherwise). I figured we'd start with the kernel discussion, and show how it could be used. -- Matt Domsch Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: PATCH: Network Device Naming mechanism and policy
From: dann frazier @ 2009-10-13 22:08 UTC
To: Greg KH
Cc: Jordan_Hargrave, Narendra_K, netdev, linux-hotplug, Matt_Domsch, Charles_Rose

On Tue, Oct 13, 2009 at 01:19:31PM -0700, Greg KH wrote:
>
> A: No.
> Q: Should I include quotations after my reply?
>
> http://daringfireball.net/2007/07/on_top
>
> On Tue, Oct 13, 2009 at 03:00:59PM -0500, Jordan_Hargrave@Dell.com wrote:
> > We have developed a mapping library that will convert the
> > user-friendly symlink names to the kernel names necessary for socket
> > ioctls. All network tools that normally take ethX as argument have
> > been modified to use this mapping library. Usually it's just a
> > one-line addition when parsing the command line arguments.
>
> Either I missed this in the first message in this thread, or this was
> never stated before, but that is nice. Where is this library,

I read about it here:
http://linux.dell.com/wiki/index.php/Oss/libnetdevname#libnetdevname

Source appears to be here:
http://linux.dell.com/git/?p=libnetdevname.git;a=summary

> and will
> it be accepted by the upstream tool maintainers?

--
dann frazier

[parent not found: <5DDAB7BA7BDB58439DD0EED0B8E9A3AE011CD92D@ausx3mpc102.aus.amer.dell.com>]
* PATCH: Network Device Naming mechanism and policy
From: Jordan_Hargrave @ 2009-08-19 18:56 UTC
To: netdev, linux-hotplug

This is from an old discussion several months ago:
http://lkml.org/lkml/2009/3/24/357
http://lkml.org/lkml/2009/3/24/380

Basically the issue is that between a race in udev and PCI scan order the ethX IDs may not
be consistent between reboots. The idea is to use a mechanism similar to how disks now can
be accessed by their LABEL/PATH/UUID instead of raw /dev/sdX ids.

example udev config:
SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
SUBSYSTEM=="net", PROGRAM="/sbin/biosdevname -i %k --policy=all_names", SYMLINK+="net/by-chassis-id/%c"

The following patch will create a device node for network devices based off their ifindex;
udev can then use this device node for creating symlinks in /dev/net/xxxx similar to the
way that disks now use by-label and by-path symlinks.

Combining this with the biosdevname utility and patches to common network utilities,
it could be possible to access ethernet devices by their PCI path or BIOS Label.

eg. ifconfig Embedded_NIC_1

--- include/linux/major.h~ 2009-07-30 18:34:47.000000000 -0400
+++ include/linux/major.h 2009-08-05 14:52:10.000000000 -0400
@@ -169,6 +169,7 @@
 #define IBM_FS3270_MAJOR 228
 
 #define VIOTAPE_MAJOR 230
+#define NETDEV_MAJOR 234
 
 #define BLOCK_EXT_MAJOR 259
 #define SCSI_OSD_MAJOR 260 /* open-osd's OSD scsi device */
--- net/core/net-sysfs.cx 2009-08-05 15:00:13.000000000 -0400
+++ net/core/net-sysfs.c 2009-08-05 15:01:20.000000000 -0400
@@ -11,6 +11,7 @@
 
 #include <linux/capability.h>
 #include <linux/kernel.h>
+#include <linux/major.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
 #include <net/sock.h>
@@ -496,6 +497,7 @@ int netdev_register_kobject(struct net_d
     dev->class = &net_class;
     dev->platform_data = net;
     dev->groups = groups;
+    dev->devt = MKDEV(NETDEV_MAJOR, net->ifindex);
 
     BUILD_BUG_ON(BUS_ID_SIZE < IFNAMSIZ);
     dev_set_name(dev, "%s", net->name);

--jordan hargrave
Dell Enterprise Linux Engineering

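For readers unfamiliar with how `MKDEV(NETDEV_MAJOR, net->ifindex)` packs its arguments, the following is a small Python model of the kernel-internal `dev_t` layout (20 minor bits, as in the kernel's `include/linux/kdev_t.h`). It is only a sketch for reasoning about the patch; the lowercase helper names are illustrative, and `NETDEV_MAJOR = 234` is simply the value proposed in the patch above.

```python
# Model of the kernel-internal dev_t packing from include/linux/kdev_t.h.
MINORBITS = 20
MINORMASK = (1 << MINORBITS) - 1

NETDEV_MAJOR = 234  # value proposed in the patch, not a registered major


def mkdev(major, minor):
    """Pack major and minor the way the kernel's MKDEV() macro does."""
    return (major << MINORBITS) | minor


def major_of(dev):
    """Equivalent of the kernel's MAJOR() macro."""
    return dev >> MINORBITS


def minor_of(dev):
    """Equivalent of the kernel's MINOR() macro."""
    return dev & MINORMASK


# The ifindex round-trips through the minor number.
devt = mkdev(NETDEV_MAJOR, 7)
print(major_of(devt), minor_of(devt))
```

One consequence of this layout is that the macro does not mask the minor argument, so an ifindex of 2^20 or larger would spill into the major bits; the patch as posted does not guard against that.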
* Re: PATCH: Network Device Naming mechanism and policy
From: Ben Hutchings @ 2009-08-19 19:26 UTC
To: Jordan_Hargrave
Cc: netdev, linux-hotplug

On Wed, 2009-08-19 at 13:56 -0500, Jordan_Hargrave@Dell.com wrote:
> This is from an old discussion several months ago:
> http://lkml.org/lkml/2009/3/24/357
> http://lkml.org/lkml/2009/3/24/380
>
> Basically the issue is that between a race in udev and PCI scan order the ethX IDs may not
> be consistent between reboots. The idea is to use a mechanism similar to how disks now can
> be accessed by their LABEL/PATH/UUID instead of raw /dev/sdX ids.
>
> example udev config:
> SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> SUBSYSTEM=="net", PROGRAM="/sbin/biosdevname -i %k --policy=all_names", SYMLINK+="net/by-chassis-id/%c"
>
> The following patch will create a device node for network devices based off their ifindex;
> udev can then use this device node for creating symlinks in /dev/net/xxxx similar to the
> way that disks now use by-label and by-path symlinks.
>
> Combining this with the biosdevname utility and patches to common network utilities,
> it could be possible to access ethernet devices by their PCI path or BIOS Label.
>
> eg. ifconfig Embedded_NIC_1

Nice idea, but wouldn't it be "ifconfig LABEL=Embedded_NIC_1"?

> --- include/linux/major.h~ 2009-07-30 18:34:47.000000000 -0400
> +++ include/linux/major.h 2009-08-05 14:52:10.000000000 -0400
> @@ -169,6 +169,7 @@
>  #define IBM_FS3270_MAJOR 228
>
>  #define VIOTAPE_MAJOR 230
> +#define NETDEV_MAJOR 234
>
>  #define BLOCK_EXT_MAJOR 259
>  #define SCSI_OSD_MAJOR 260 /* open-osd's OSD scsi device */
> --- net/core/net-sysfs.cx 2009-08-05 15:00:13.000000000 -0400
> +++ net/core/net-sysfs.c 2009-08-05 15:01:20.000000000 -0400
> @@ -11,6 +11,7 @@
>
>  #include <linux/capability.h>
>  #include <linux/kernel.h>
> +#include <linux/major.h>
>  #include <linux/netdevice.h>
>  #include <linux/if_arp.h>
>  #include <net/sock.h>
> @@ -496,6 +497,7 @@ int netdev_register_kobject(struct net_d
>      dev->class = &net_class;
>      dev->platform_data = net;
>      dev->groups = groups;
> +    dev->devt = MKDEV(NETDEV_MAJOR, net->ifindex);
[...]

Since this major number is unregistered, the device inode can only be
stat'd and not open'd, which seems like a bit of a hack. Is there
anything that would stop register_chrdev(0, ...) from allocating this
major number, causing network devices to be confused with some other
device type?

Maybe there *should* be character devices for network device
manipulation. It seems like that would avoid the race conditions that
device renaming and removal causes for name-based socket ioctls. But
maybe everyone should be using netlink for that instead.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* RE: PATCH: Network Device Naming mechanism and policy
From: Jordan_Hargrave @ 2009-08-19 19:40 UTC
To: bhutchings
Cc: netdev, linux-hotplug

-----Original Message-----
From: Ben Hutchings [mailto:bhutchings@solarflare.com]
Sent: Wed 8/19/2009 14:26
To: Hargrave, Jordan
Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org
Subject: Re: PATCH: Network Device Naming mechanism and policy

On Wed, 2009-08-19 at 13:56 -0500, Jordan_Hargrave@Dell.com wrote:
> This is from an old discussion several months ago:
> http://lkml.org/lkml/2009/3/24/357
> http://lkml.org/lkml/2009/3/24/380
>
> Basically the issue is that between a race in udev and PCI scan order the ethX IDs may not
> be consistent between reboots. The idea is to use a mechanism similar to how disks now can
> be accessed by their LABEL/PATH/UUID instead of raw /dev/sdX ids.
>
> example udev config:
> SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> SUBSYSTEM=="net", PROGRAM="/sbin/biosdevname -i %k --policy=all_names", SYMLINK+="net/by-chassis-id/%c"
>
> The following patch will create a device node for network devices based off their ifindex;
> udev can then use this device node for creating symlinks in /dev/net/xxxx similar to the
> way that disks now use by-label and by-path symlinks.
>
> Combining this with the biosdevname utility and patches to common network utilities,
> it could be possible to access ethernet devices by their PCI path or BIOS Label.
>
> eg. ifconfig Embedded_NIC_1

Nice idea, but wouldn't it be "ifconfig LABEL=Embedded_NIC_1"?

** Still debating that.. if ifconfig and other tools were smart enough to figure it out
** Right now we have a library that lets you specify it by device node:
** ifconfig /dev/net/by-chassis-id/eth0_s0 for example

> --- include/linux/major.h~ 2009-07-30 18:34:47.000000000 -0400
> +++ include/linux/major.h 2009-08-05 14:52:10.000000000 -0400
> @@ -169,6 +169,7 @@
>  #define IBM_FS3270_MAJOR 228
>
>  #define VIOTAPE_MAJOR 230
> +#define NETDEV_MAJOR 234
>
>  #define BLOCK_EXT_MAJOR 259
>  #define SCSI_OSD_MAJOR 260 /* open-osd's OSD scsi device */
> --- net/core/net-sysfs.cx 2009-08-05 15:00:13.000000000 -0400
> +++ net/core/net-sysfs.c 2009-08-05 15:01:20.000000000 -0400
> @@ -11,6 +11,7 @@
>
>  #include <linux/capability.h>
>  #include <linux/kernel.h>
> +#include <linux/major.h>
>  #include <linux/netdevice.h>
>  #include <linux/if_arp.h>
>  #include <net/sock.h>
> @@ -496,6 +497,7 @@ int netdev_register_kobject(struct net_d
>      dev->class = &net_class;
>      dev->platform_data = net;
>      dev->groups = groups;
> +    dev->devt = MKDEV(NETDEV_MAJOR, net->ifindex);
[...]

Since this major number is unregistered, the device inode can only be
stat'd and not open'd, which seems like a bit of a hack. Is there
anything that would stop register_chrdev(0, ...) from allocating this
major number, causing network devices to be confused with some other
device type?

** yeah a bit of a hack.. I suppose the netdev code could allocate a
** dynamic major node at startup with register_chrdev.. since nothing
** would use the hardcoded major:minor #s anyway (yet).

Maybe there *should* be character devices for network device
manipulation. It seems like that would avoid the race conditions that
device renaming and removal causes for name-based socket ioctls. But
maybe everyone should be using netlink for that instead.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: PATCH: Network Device Naming mechanism and policy
From: Bryan Kadzban @ 2009-08-20 4:41 UTC
To: Jordan_Hargrave
Cc: netdev, linux-hotplug

Jordan_Hargrave@Dell.com wrote:
> example udev config:
> SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"

So say I have two NICs, currently named eth0/eth1, and I keep them using
those names via the current udev MAC address rules. Furthermore, say
that this patch is applied, and so I start using 0.<addr0> and 1.<addr1>
in my network configuration, instead of eth0/eth1.

Now, say that on some given boot, these two NICs show up to the kernel
in a different order. I might move them around in the machine (this is,
after all, the point behind using the MAC as the identifier ;-) ), or
one of them might be USB, or something else random changes the order.

With this rule, they're now at 0.<addr1> and 1.<addr0>. In other words,
these names are not actually persistent.

If you get rid of the $sysfs{ifindex}, then this should work. If that
doesn't work for what you're trying to do for some reason, then you can
make another directory of symlinks by-ifindex, and use that; that should
work as well. But coupling the ifindex to the MAC address like this
doesn't work.

(In general, coupling any two unrelated attributes when trying to do
persistent names doesn't work.)

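The argument above can be checked with a small simulation. The sketch below is purely illustrative: `symlink_names` is a hypothetical function that mimics what the `net/by-mac/` rule would produce, the MAC addresses are made-up documentation values, and ifindex is numbered from 0 only to match the `0.<addr0>` / `1.<addr1>` notation used in the message (real ifindex values start higher).

```python
def symlink_names(probe_order, include_ifindex):
    """Return the set of by-mac symlink names for one boot.

    probe_order: MAC addresses in the order the kernel saw the NICs.
    include_ifindex: True models "$sysfs{ifindex}.$sysfs{address}",
                     False models "$sysfs{address}" alone.
    """
    names = set()
    for ifindex, mac in enumerate(probe_order):
        names.add(f"{ifindex}.{mac}" if include_ifindex else mac)
    return names


# Same two NICs, probed in opposite order on two boots.
boot_a = ["00:00:5e:00:53:01", "00:00:5e:00:53:02"]
boot_b = list(reversed(boot_a))

# With the ifindex prefix the names differ between boots.
print(symlink_names(boot_a, True) == symlink_names(boot_b, True))
# With the MAC alone the names are identical.
print(symlink_names(boot_a, False) == symlink_names(boot_b, False))
```

Dropping the ifindex from the name is what makes the set of symlinks independent of probe order, which is the fix suggested in the message.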