* [RFD] Device Renaming Mechanism
@ 2010-10-08 5:23 Nao Nishijima
2010-10-08 20:48 ` Greg KH
2010-11-02 7:00 ` Jon Masters
0 siblings, 2 replies; 7+ messages in thread
From: Nao Nishijima @ 2010-10-08 5:23 UTC
To: gregkh, James.Bottomley, rwheeler
Cc: linux-kernel, linux-hotplug-devel, linux-hotplug,
masami.hiramatsu.pt
Hi,
I'm trying to solve a device name (or device node) mismatch problem caused by
device configuration changes. I have an idea of device renaming to solve it,
and would like to request comments from kernel developers.
Device Name Mismatch
==========
Device names (e.g. sda) are assigned in the order of driver loading and device
recognition (usually starting from the lowest bus number). This may cause a
device name mismatch between the previous and current boot whenever the device
configuration is changed. Suppose an application opens a disk via /dev/sdb.
When a device configuration change (hot-plug, device breakdown) or a system
configuration change (driver loading order, modprobe.conf edits) changes the
order of device names, /dev/sdb no longer always points to the same disk.
This mismatch causes unexpected disk access and misconfigured redundancy (e.g.
multipath, software RAID) if device file names are used in a configuration file.
Udev Solution
======
To avoid this problem, we typically use persistent device names provided by
udev.
Udev creates persistent symbolic links (by-{id, uuid, path, label}) pointing to
each device based on device information. Applications access the device via
these symbolic links, so udev solves the mismatch between device name and
physical disk. However, the persistent name does not match the kernel's device
name. This mismatch causes the following 4 issues.
Issue 1: /proc/partitions and /proc/diskstats show kernel device names
We have to run "ls -l /dev/disk/by-*" or "udevadm" to find the corresponding
persistent symbolic links.
Issue 2: dmesg outputs device names instead of persistent symbolic links
Users might not know which disk is sdX, because they identify the disk by a
persistent symbolic link.
Issue 3: Some system commands don't accept symbolic links (e.g. df, iostat, ...)
These commands just expect an sdX device name or check input against /proc
information. This also occurs in several GNOME/KDE/etc. GUI sysadmin tools. :(
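A minimal sketch of the kind of change that lets such a tool accept a persistent link: canonicalize the argument with POSIX realpath() and keep the last path component, which for a node under /dev is normally the kernel name that appears in /proc/partitions. The helper name kernel_name_from_path is hypothetical; this is not code taken from df, iostat, or any other tool named here.

```c
#define _XOPEN_SOURCE 700  /* expose realpath() and PATH_MAX */
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: resolve a path that may be a symlink (for example
 * an entry under /dev/disk/by-id/) and copy the last component of its
 * canonical path into out. For a device node under /dev this is normally
 * the kernel name, e.g. "sdb1".
 * Returns 0 on success, -1 if the path cannot be resolved or out is too small.
 */
int kernel_name_from_path(const char *path, char *out, size_t outlen)
{
    char resolved[PATH_MAX];
    const char *base;

    if (realpath(path, resolved) == NULL)
        return -1;                     /* e.g. dangling link or missing file */

    base = strrchr(resolved, '/');
    base = base ? base + 1 : resolved; /* realpath() output is absolute */

    if (strlen(base) + 1 > outlen)
        return -1;                     /* caller's buffer too small */
    strcpy(out, base);
    return 0;
}
```

A tool would call this once on its argument and then compare the result with the names it reads from /proc, so a /dev/disk/by-id/<link> argument (placeholder name) and the matching /dev/sdX argument end up looking the same.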
Issue 4: Ambiguous symbolic link
Even if we introduce a tool that maps device names to persistent symbolic links,
we cannot determine a single symbolic link from a device, because several
symbolic links point to the same device file.
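The reverse lookup that Issue 1 performs by hand with "ls -l /dev/disk/by-*" can be sketched in C, assuming only POSIX opendir()/readdir()/realpath(). The function name links_to_device is hypothetical. It prints every entry whose canonical path equals the device's, so a return value greater than 1 is the situation Issue 4 describes: the mapping from device to link is one-to-many, and the code has no basis for choosing a single "right" link.

```c
#define _XOPEN_SOURCE 700  /* expose realpath() and PATH_MAX */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: print every entry of dir (e.g. "/dev/disk/by-id")
 * whose canonical path equals the canonical path of device (e.g. "/dev/sda").
 * Returns the number of matching entries, or -1 if device cannot be
 * resolved or dir cannot be opened.
 */
int links_to_device(const char *dir, const char *device)
{
    char want[PATH_MAX], path[PATH_MAX], got[PATH_MAX];
    struct dirent *e;
    DIR *d;
    int matches = 0;

    if (realpath(device, want) == NULL)
        return -1;
    if ((d = opendir(dir)) == NULL)
        return -1;

    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                  /* skip ".", ".." and hidden entries */
        if (snprintf(path, sizeof path, "%s/%s", dir, e->d_name) >= (int)sizeof path)
            continue;                  /* name too long to build a path */
        if (realpath(path, got) != NULL && strcmp(got, want) == 0) {
            printf("%s -> %s\n", path, want);
            matches++;
        }
    }
    closedir(d);
    return matches;
}
```

For example, links_to_device("/dev/disk/by-id", "/dev/sda") would list each by-id link for that disk; repeating the call for by-uuid, by-path and by-label covers the other directories.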
Therefore, I think symbolic links alone are not enough to solve these issues.
We need a better solution.
Proposal
====
I'd like to propose introducing a device renaming interface to solve these
issues. I think renaming devices in the kernel is the simplest way to solve the
mismatch in dmesg and /proc information. This can be done while the kernel
boots up (like ifcfg). Of course, udev still needs to assign a new name to each
device via that interface.
This proposal only requests adding a simple interface to the kernel, as below,
and we can continue to use user programs without any modification.
int rename_device(const char *newname, const char *oldname)
Any comments or suggestions are very welcome!
Best Regards,
--
Nao NISHIJIMA
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
Email: nao.nishijima.xt@hitachi.com
* Re: [RFD] Device Renaming Mechanism
From: Greg KH @ 2010-10-08 20:48 UTC
To: Nao Nishijima
Cc: gregkh, James.Bottomley, rwheeler, linux-kernel, linux-hotplug,
masami.hiramatsu.pt
On Fri, Oct 08, 2010 at 02:23:32PM +0900, Nao Nishijima wrote:
> Hi,
>
> I'm trying to solve a device name (or device node) mismatch problem caused by
> device configuration changes. I have an idea of device renaming to solve it,
> and would like to request comments from kernel developers.

Hi,

We spoke at LinuxCon Tokyo about this, so it's good to see you continue
here on the mailing lists.

> Device Name Mismatch
> ==========
>
> Device names (e.g. sda) are assigned in the order of driver loading and device
> recognition (usually starting from the lowest bus number). This may cause a
> device name mismatch between the previous and current boot whenever the device
> configuration is changed.

This is as it always has been.

> This mismatch causes unexpected disk access and misconfigured redundancy (e.g.
> multipath, software RAID) if device file names are used in a configuration file.

That's why most people know not to use the kernel name in their fstab,
and why distros haven't used them when installing for a very long time.

> Udev Solution
> ======
>
> To avoid this problem, we typically use persistent device names provided by
> udev.

Yes.

> Udev creates persistent symbolic links (by-{id, uuid, path, label}) pointing to
> each device based on device information. Applications access the device via
> these symbolic links, so udev solves the mismatch between device name and
> physical disk. However, the persistent name does not match the kernel's device
> name. This mismatch causes the following 4 issues.
>
> Issue 1: /proc/partitions and /proc/diskstats show kernel device names
> We have to run "ls -l /dev/disk/by-*" or "udevadm" to find the corresponding
> persistent symbolic links.

That's fine, don't use the /proc files :)

> Issue 2: dmesg outputs device names instead of persistent symbolic links
> Users might not know which disk is sdX, because they identify the disk by a
> persistent symbolic link.

Again, that's fine as well, don't use "raw" dmesg output and expect it
to map to what the user has mounted. There are a number of tools out
there from companies (IBM, CA, etc.) that handle this mapping.

> Issue 3: Some system commands don't accept symbolic links (e.g. df, iostat, ...)
> These commands just expect an sdX device name or check input against /proc
> information. This also occurs in several GNOME/KDE/etc. GUI sysadmin tools. :(

Then we should fix those programs, it is a simple one line change. Can
you please tell us which programs you have tested and found problems
with?

> Issue 4: Ambiguous symbolic link
> Even if we introduce a tool that maps device names to persistent symbolic links,
> we cannot determine a single symbolic link from a device, because several
> symbolic links point to the same device file.

Huh? You want to move from a kernel name to the one that the user used?
As you are the user, you should know which one you used :)

> Therefore, I think symbolic links alone are not enough to solve these issues.
> We need a better solution.

I strongly disagree.

> Proposal
> ====
> I'd like to propose introducing a device renaming interface to solve these
> issues.

Ick, no, please, no.

> I think renaming devices in the kernel is the simplest way to solve the
> mismatch in dmesg and /proc information. This can be done while the kernel
> boots up (like ifcfg). Of course, udev still needs to assign a new name to each
> device via that interface.
>
> This proposal only requests adding a simple interface to the kernel, as below,
> and we can continue to use user programs without any modification.
>
> int rename_device(const char *newname, const char *oldname)

I'll quote a message that Kay wrote to me last week when I told him
about this talk. It's why we don't want to rename kernel devices, and
why using symlinks is the way to go:

  - All links or nodes can be stat()'d and then /sys/dev/block/M:m
    points to the kernel. So easy!

  - Libudev provides all device meta information, list of links,
    events. No app/management tool can ever work properly in 2010
    that does not react to hotplug or device update events. 1980
    is over, we are all 100% hotplug aware, or we die!

  - Kernel device renaming is very fragile and only done for
    netdevs because they can't have symlinks. There are many
    cross-refs for blockdevs like holders/ slaves/ sysfs dirs,
    they all need to be renamed atomically and race-free, which is
    almost impossible I would say.

  - Biggest problem with renaming is that the device gets
    advertised and is accessed immediately by userspace. Renaming
    after advertising (sysfs, devtmpfs, uevent) is very difficult,
    racy, almost impossible.

  - The only option to have named block devs is to change the
    block layer to create intermediate devices in sysfs (which are
    advertised but not accessible as blockdevs) and then let
    userspace hook into it and request a real blockdev with a
    specified name, and only _after_ this create the real
    blockdev. This is, and must be, not a renaming, but a naming.

Kay said he had lots more reasons why this shouldn't be done, if you
want them as well :)

Also, note that the network people are really wanting symlinks these
days, as renaming the device can cause problems, so please don't
look to them as the "correct" solution at all.

Hope this helps,

greg k-h
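Kay's first point can be made concrete in a few lines of C. stat() follows symlinks, so a /dev/disk/by-* link and the /dev node behind it report the same st_rdev, and that major:minor pair leads straight to /sys/dev/block/M:m with no renaming involved. This is a minimal sketch assuming glibc's <sys/sysmacros.h> for major()/minor(); the function names are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() on glibc */
#include <sys/types.h>

/* Build the sysfs directory for a block device number, e.g. "/sys/dev/block/8:0". */
int sysfs_block_path(dev_t dev, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "/sys/dev/block/%u:%u",
                     (unsigned)major(dev), (unsigned)minor(dev));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * stat() follows symlinks, so a /dev/disk/by-* link and the node it points
 * to yield the same st_rdev. Returns -1 if node is missing or not a block device.
 */
int sysfs_path_for_node(const char *node, char *out, size_t outlen)
{
    struct stat st;

    if (stat(node, &st) != 0 || !S_ISBLK(st.st_mode))
        return -1;
    return sysfs_block_path(st.st_rdev, out, outlen);
}
```

Calling sysfs_path_for_node() on a by-id link and on the /dev/sdX node it points to should produce the same /sys/dev/block path; the uevent file in that directory (DEVNAME=...) then gives the kernel name.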
* Re: [RFD] Device Renaming Mechanism
From: Nao Nishijima @ 2010-10-18 11:43 UTC
To: Greg KH, kay.sievers
Cc: James.Bottomley, rwheeler, linux-kernel, linux-hotplug,
masami.hiramatsu.pt, Matt_Domsch
Hello,

(2010/10/09 5:48), Greg KH wrote:
>
>> Udev creates persistent symbolic links (by-{id, uuid, path, label}) pointing to
>> each device based on device information. Applications access the device via
>> these symbolic links, so udev solves the mismatch between device name and
>> physical disk. However, the persistent name does not match the kernel's device
>> name. This mismatch causes the following 4 issues.
>>
>> Issue 1: /proc/partitions and /proc/diskstats show kernel device names
>> We have to run "ls -l /dev/disk/by-*" or "udevadm" to find the corresponding
>> persistent symbolic links.
>
> That's fine, don't use the /proc files :)

I think a tool that displays persistent names instead of device names is
necessary for users who rely on persistent symbolic links.

>
>> Issue 2: dmesg outputs device names instead of persistent symbolic links
>> Users might not know which disk is sdX, because they identify the disk by a
>> persistent symbolic link.
>
> Again, that's fine as well, don't use "raw" dmesg output and expect it
> to map to what the user has mounted. There are a number of tools out
> there from companies (IBM, CA, etc.) that handle this mapping.
>

Are these tools released under an open source license? (IOW, can anyone
who uses Linux use them?) If not, we should provide a tool under an open
source license, because I think an issue in the OSS kernel should be
solved with an OSS tool.

>> Issue 3: Some system commands don't accept symbolic links (e.g. df, iostat, ...)
>> These commands just expect an sdX device name or check input against /proc
>> information. This also occurs in several GNOME/KDE/etc. GUI sysadmin tools. :(
>
> Then we should fix those programs, it is a simple one line change. Can
> you please tell us which programs you have tested and found problems
> with?
>

Persistent symbolic links cannot be used as arguments to the following
commands: "smartctl", "sgpio", "grub-install", "iostat".
We would also have to fix a number of commands that use device names in
their output, because those commands do not show persistent device names.

>> Issue 4: Ambiguous symbolic link
>> Even if we introduce a tool that maps device names to persistent symbolic links,
>> we cannot determine a single symbolic link from a device, because several
>> symbolic links point to the same device file.
>
> Huh? You want to move from a kernel name to the one that the user used?
> As you are the user, you should know which one you used :)
>

On a multi-admin system, an admin may not know which name has been used
by the others. Of course, the admins might ultimately sort that out
themselves. But from a fool-proofing viewpoint, the current solution
easily leads to that situation.

>> Therefore, I think symbolic links alone are not enough to solve these issues.
>> We need a better solution.
>
> I strongly disagree.
>
>> Proposal
>> ====
>> I'd like to propose introducing a device renaming interface to solve these
>> issues.
>
> Ick, no, please, no.
>
>> I think renaming devices in the kernel is the simplest way to solve the
>> mismatch in dmesg and /proc information. This can be done while the kernel
>> boots up (like ifcfg). Of course, udev still needs to assign a new name to each
>> device via that interface.
>>
>> This proposal only requests adding a simple interface to the kernel, as below,
>> and we can continue to use user programs without any modification.
>>
>> int rename_device(const char *newname, const char *oldname)
>
> I'll quote a message that Kay wrote to me last week when I told him
> about this talk. It's why we don't want to rename kernel devices, and
> why using symlinks is the way to go:
>
>   - All links or nodes can be stat()'d and then /sys/dev/block/M:m
>     points to the kernel. So easy!
>

Hmm, agreed.

>   - Libudev provides all device meta information, list of links,
>     events. No app/management tool can ever work properly in 2010
>     that does not react to hotplug or device update events. 1980
>     is over, we are all 100% hotplug aware, or we die!
>

Yes.

>   - Kernel device renaming is very fragile and only done for
>     netdevs because they can't have symlinks. There are many
>     cross-refs for blockdevs like holders/ slaves/ sysfs dirs,
>     they all need to be renamed atomically and race-free, which is
>     almost impossible I would say.
>
>   - Biggest problem with renaming is that the device gets
>     advertised and is accessed immediately by userspace. Renaming
>     after advertising (sysfs, devtmpfs, uevent) is very difficult,
>     racy, almost impossible.
>

I agree that renaming after advertising is difficult, but doesn't it
work well for network devices?

>   - The only option to have named block devs is to change the
>     block layer to create intermediate devices in sysfs (which are
>     advertised but not accessible as blockdevs) and then let
>     userspace hook into it and request a real blockdev with a
>     specified name, and only _after_ this create the real
>     blockdev. This is, and must be, not a renaming, but a naming.
>

It sounds good to me, but I don't understand it clearly.
Does "not accessible as blockdevs" mean that the device is not registered
in the bdi (backing_dev_info) list, or that no major/minor number is given
to the device? Could you explain in detail?

> Kay said he had lots more reasons why this shouldn't be done, if you
> want them as well :)
>
> Also, note that the network people are really wanting symlinks these
> days, as renaming the device can cause problems, so please don't
> look to them as the "correct" solution at all.
>

I think the network people will face the same mismatch problems once
they use symlinks.

I understand that renaming is a problem. I'd like to try Kay's idea.

> Hope this helps,
>
> greg k-h
>

Thank you for your advice.

Best Regards,
--
Nao NISHIJIMA
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
Email: nao.nishijima.xt@hitachi.com
* Re: [RFD] Device Renaming Mechanism
From: Kay Sievers @ 2010-10-18 12:33 UTC
To: Nao Nishijima
Cc: Greg KH, James.Bottomley, rwheeler, linux-kernel, linux-hotplug,
masami.hiramatsu.pt, Matt_Domsch
On Mon, Oct 18, 2010 at 13:43, Nao Nishijima <nao.nishijima.xt@hitachi.com> wrote:
>>   - Kernel device renaming is very fragile and only done for
>>     netdevs because they can't have symlinks. There are many
>>     cross-refs for blockdevs like holders/ slaves/ sysfs dirs,
>>     they all need to be renamed atomically and race-free, which is
>>     almost impossible I would say.
>>
>>   - Biggest problem with renaming is that the device gets
>>     advertised and is accessed immediately by userspace. Renaming
>>     after advertising (sysfs, devtmpfs, uevent) is very difficult,
>>     racy, almost impossible.
>>
>
> I agree that renaming after advertising is difficult, but doesn't it
> work well for network devices?

Not at all. It's a complete mess I wouldn't recommend. Udev's default
does this for the common case, but it has many cases where stuff just
goes wrong and can never be solved properly.

Netif names need to be swapped sometimes, and then you need temporary
renaming to a non-clashing name, and sleep() until the desired name
becomes available. During all that, the netlink messages announce all
these changes/new names to possible applications. Uevents get out of
sync, the devpaths of devices swap around.

In short: it's a complete nightmare from the view of reliability, and
I strongly suggest not even thinking about trying that model on any
other subsystem.
>>   - The only option to have named block devs is to change the
>>     block layer to create intermediate devices in sysfs (which are
>>     advertised but not accessible as blockdevs) and then let
>>     userspace hook into it and request a real blockdev with a
>>     specified name, and only _after_ this create the real
>>     blockdev. This is, and must be, not a renaming, but a naming.
>>
>
> It sounds good to me, but I don't understand it clearly.
> Does "not accessible as blockdevs" mean that the device is not registered
> in the bdi (backing_dev_info) list, or that no major/minor number is given
> to the device? Could you explain in detail?

It's all about the userspace-visible device state. You can't export a
blockdev which you are going to rename shortly after this; it confuses
userspace, and cannot be made to work reliably.

How it's implemented inside the kernel does not really matter for the
outside, as long as it has a step for userspace to provide the name,
which is used to create (not rename) and announce the device.

It would need to be some intermediate device, which is not a blockdev,
has no dev_t assigned, and exports the metadata needed to compose a
device name from it. This name is then used to create the real blockdev.

Note that I'm not suggesting to do anything like that. It would just be
the only model that *could* be made to work. The way network interfaces
are handled must not be applied to other subsystems.

Things like device-mapper could probably get a reasonable way to
provide a fixed name for the dm device to create. They are created by
userspace request only, they know the metadata before the request --
so this sounds feasible in some way. As long as they cannot be renamed
afterwards, which can't work for many other reasons.

Device-mapper could maybe make the dm UUID mandatory, and use it as
the device name. It will break a bunch of tools that match on device
names, but I guess it *could* be made to work (if it does not involve
any later renaming).
> I think the network people will face the same mismatch problems once
> they use symlinks.

The thing is that unlike blockdevs, netif names may have meaning inside
the kernel, like matching wildcards in iptables and such. That makes any
out-of-kernel "alias" model for netifs much more complicated than it is
for blockdevs, where "aliases" only need to exist in userspace.

> I understand that renaming is a problem. I'd like to try Kay's idea.

I wouldn't even try. Besides the mentioned device-mapper
mandatory/non-changeable UUID=device-name approach, which could work,
I don't think that renaming of sd devices can be made to work without
rewriting half of all existing userspace. :)

Kay
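The "naming, not renaming" model Kay describes can be illustrated with a small userspace toy model. This is not kernel code and corresponds to no kernel API; the struct and function names are hypothetical. A pending device carries metadata but no device number; userspace supplies a name exactly once, and only then is the device "created". There is deliberately no rename operation.

```c
#include <string.h>

/* Toy userspace model only: these types are not kernel structures. */
enum disk_state { DISK_PENDING, DISK_CREATED };

struct pending_disk {
    char serial[32];        /* metadata exported while no name exists yet */
    char name[32];          /* empty until userspace supplies a name */
    enum disk_state state;
    unsigned devnum;        /* 0 = no device number assigned yet */
};

static unsigned next_devnum = 1;

/*
 * Userspace names a pending disk exactly once; only then is the "real"
 * device created and given a number. There is no rename operation: a
 * second call on a created disk fails and leaves the name unchanged.
 */
int name_and_create(struct pending_disk *d, const char *name)
{
    if (d->state != DISK_PENDING || name[0] == '\0' ||
        strlen(name) >= sizeof d->name)
        return -1;
    strcpy(d->name, name);
    d->devnum = next_devnum++;
    d->state = DISK_CREATED;
    return 0;
}
```

The point of the design is ordering: nothing with a device number is visible before the name is fixed, so there is never an advertised name that later has to change.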
* Re: [RFD] Device Renaming Mechanism
From: Nao Nishijima @ 2010-10-25 10:55 UTC
To: Kay Sievers, greg
Cc: James.Bottomley, rwheeler, linux-kernel, linux-hotplug,
masami.hiramatsu.pt, Matt_Domsch
Hello,

(2010/10/18 21:33), Kay Sievers wrote:
>
> In short: it's a complete nightmare from the view of reliability, and
> I strongly suggest not even thinking about trying that model on any
> other subsystem.

I see. I understand that network interfaces and block devices are
different.

> Device-mapper could maybe make the dm UUID mandatory, and use it as
> the device name. It will break a bunch of tools that match on device
> names, but I guess it *could* be made to work (if it does not involve
> any later renaming).

Indeed, device-mapper can provide a fixed name. However, there is still
a mismatch between the dm device name and the name of the troubled
device in the kernel log. That is why I'm still sticking with the device
renaming approach. In addition, I'm worried about the performance and
management cost of using device-mapper.

I think the method of using the intermediate device is the simplest
solution. Furthermore, that method is compatible with old applications
that are hard to modify.

> I wouldn't even try. Besides the mentioned device-mapper
> mandatory/non-changeable UUID=device-name approach, which could work,
> I don't think that renaming of sd devices can be made to work without
> rewriting half of all existing userspace. :)

Even so, if it is the only way to solve the log-mismatch problem, I'd
rather try to rewrite half of those tools. :-)

>
> Kay
>
>

Thank you for your advice.

Best Regards,
--
Nao NISHIJIMA
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
Email: nao.nishijima.xt@hitachi.com
* Re: [RFD] Device Renaming Mechanism
From: Jon Masters @ 2010-11-02 7:00 UTC
To: Nao Nishijima
Cc: gregkh, James.Bottomley, rwheeler, linux-kernel, linux-hotplug-devel,
linux-hotplug, masami.hiramatsu.pt, mdomsch
On Fri, 2010-10-08 at 14:23 +0900, Nao Nishijima wrote:
> I'm trying to solve a device name (or device node) mismatch problem caused by
> device configuration changes. I have an idea of device renaming to solve it,
> and would like to request comments from kernel developers.

I apologize that I missed this mail until I was doing some podcasts ;)

Although not necessarily useful for within-device volumes, I would
nonetheless like to call attention to the DMTF SMBIOS specification,
and in particular structure Type 9, which allows system vendors to
provide various information about the correct mappings of physical
system slots to devices. Matt Domsch produced a utility a while back
called biosdevname that can be used to implement one possible device
renaming mechanism for network interfaces, based on the system slot
identifier, but of course there is no reason not to support the other
devices.

I have read the other mails, but I would love to know what Greg and Kay
think about supporting automatic renaming in the case where the system
*actually told you the preferred name* in such a table. Perhaps we can
discuss this in Cambridge this week.

Jon.
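As an illustration of the kind of table Jon mentions, here is a sketch that decodes one SMBIOS Type 9 (System Slots) structure from a raw buffer. It assumes the SMBIOS 2.6+ layout as we read the DMTF specification (slot designation string number at offset 04h, segment group at 0Dh, bus at 0Fh, device/function at 10h) and, on Linux, raw bytes exposed by dmi-sysfs under /sys/firmware/dmi/entries/9-*/raw; verify both against the spec and your kernel. It does not reproduce how biosdevname works, and turning the result into a device name is left to userspace, in line with Greg's reply.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded subset of an SMBIOS Type 9 (System Slots) structure. */
struct slot_info {
    const char *designation;          /* e.g. "PCIe Slot 1"; NULL if absent */
    unsigned segment, bus, dev, fn;   /* PCI address of the slot */
};

/* Return the n-th (1-based) string from the string set after the formatted area. */
static const char *smbios_string(const uint8_t *buf, size_t len, uint8_t n)
{
    size_t i = buf[1];                 /* formatted-area length = offset of strings */

    if (n == 0)
        return NULL;                   /* string number 0 means "none" */
    while (i < len && buf[i] != 0) {   /* a NUL where a string would start ends the set */
        size_t start = i;
        while (i < len && buf[i] != 0)
            i++;
        if (i == len)
            return NULL;               /* unterminated string: malformed input */
        if (--n == 0)
            return (const char *)&buf[start];
        i++;                           /* step past the terminator */
    }
    return NULL;
}

/* Returns 0 on success, -1 if buf is not a Type 9 structure with the 2.6+ fields. */
int parse_smbios_slot(const uint8_t *buf, size_t len, struct slot_info *out)
{
    if (len < 0x11 || buf[0] != 9 || buf[1] < 0x11 || buf[1] > len)
        return -1;
    out->designation = smbios_string(buf, len, buf[0x04]);
    out->segment = buf[0x0D] | (unsigned)buf[0x0E] << 8;  /* little-endian WORD */
    out->bus = buf[0x0F];
    out->dev = buf[0x10] >> 3;         /* bits 7:3 */
    out->fn  = buf[0x10] & 0x07;       /* bits 2:0 */
    return 0;
}
```

Matching the decoded PCI address to a particular network or block device is a separate step that this sketch does not cover.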
* Re: [RFD] Device Renaming Mechanism
From: Greg KH @ 2010-11-02 11:14 UTC
To: Jon Masters
Cc: Nao Nishijima, James.Bottomley, rwheeler, linux-kernel,
linux-hotplug-devel, linux-hotplug, masami.hiramatsu.pt, mdomsch
On Tue, Nov 02, 2010 at 03:00:08AM -0400, Jon Masters wrote:
> I have read the other mails, but I would love to know what Greg and Kay
> think about supporting automatic renaming in the case where the system
> *actually told you the preferred name* in such a table. Perhaps we can
> discuss this in Cambridge this week.

Perhaps you should look in the archives where I have stated my position
about this a number of times[1] :)

thanks,

greg k-h

[1] Do it in userspace