* [PATCH] blkid: optimize dm_device_is_leaf() usage @ 2008-08-25 20:48 Karel Zak 2008-08-26 12:24 ` Theodore Tso 0 siblings, 1 reply; 15+ messages in thread From: Karel Zak @ 2008-08-25 20:48 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-ext4, Eric Sandeen, Karel Zak * devname.c: probe_one(): call dm_device_is_leaf() with *constant* devno argument in the list_for_each_safe loop does not make sense at all. * devname.c: probe_one(): call dm_device_is_leaf() for already checked DM devices is unnecessary. DM devices are already checked in dm_probe_all(). The dm_probe_all() function never returns slave devices. * devname.c: dm_device_is_leaf(): stop checking for dependences when a first dependence is detected. Performance test: # for i in $(seq 1 100); do dmsetup create $i --table "0 1 zero"; done new version: # blkid &> /dev/null real 0m0.267s user 0m0.047s sys 0m0.212s old version: # blkid &> /dev/null real 0m2.149s user 0m0.291s sys 0m1.820s Signed-off-by: Karel Zak <kzak@redhat.com> --- lib/blkid/devname.c | 37 ++++++++++++++++++++----------------- 1 files changed, 20 insertions(+), 17 deletions(-) diff --git a/lib/blkid/devname.c b/lib/blkid/devname.c index 86fd44c..4967b96 100644 --- a/lib/blkid/devname.c +++ b/lib/blkid/devname.c @@ -133,25 +133,27 @@ static void probe_one(blkid_cache cache, const char *ptname, const char **dir; char *devname = NULL; - /* See if we already have this device number in the cache. */ - list_for_each_safe(p, pnext, &cache->bic_devs) { - blkid_dev tmp = list_entry(p, struct blkid_struct_dev, - bid_devs); #ifdef HAVE_DEVMAPPER - if (!dm_device_is_leaf(devno)) - continue; + /* call dm_device_is_leaf() for non-DM device only */ + if (pri == BLKID_PRI_DM || dm_device_is_leaf(devno)) #endif - if (tmp->bid_devno == devno) { - if (only_if_new && !access(tmp->bid_name, F_OK)) - return; - dev = blkid_verify(cache, tmp); - if (dev && (dev->bid_flags & BLKID_BID_FL_VERIFIED)) - break; - dev = 0; + { + /* See if we already have this device number in the cache. */ + list_for_each_safe(p, pnext, &cache->bic_devs) { + blkid_dev tmp = list_entry(p, struct blkid_struct_dev, + bid_devs); + if (tmp->bid_devno == devno) { + if (only_if_new && !access(tmp->bid_name, F_OK)) + return; + dev = blkid_verify(cache, tmp); + if (dev && (dev->bid_flags & BLKID_BID_FL_VERIFIED)) + break; + dev = 0; + } } + if (dev && dev->bid_devno == devno) + goto set_pri; } - if (dev && dev->bid_devno == devno) - goto set_pri; /* * Take a quick look at /dev/ptname for the device number. We check @@ -271,9 +273,10 @@ static int dm_device_is_leaf(const dev_t dev) do { names = (struct dm_names *) ((char *)names + next); - if (dm_device_has_dep(dev, names->name)) + if (dm_device_has_dep(dev, names->name)) { ret = 0; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-25 20:48 [PATCH] blkid: optimize dm_device_is_leaf() usage Karel Zak @ 2008-08-26 12:24 ` Theodore Tso 2008-08-26 13:51 ` Karel Zak 0 siblings, 1 reply; 15+ messages in thread From: Theodore Tso @ 2008-08-26 12:24 UTC (permalink / raw) To: Karel Zak; +Cc: linux-ext4, Eric Sandeen Hi Karel, Thanks! Your patch forced me to look much more closely at the device mapper code, and after looking at it, I've been led to the inescapable conclusion that the whole thing is total cr*p. :-) My only excuse was that at the time when I accepted the patch, I wasn't using a device-mapper based system, and couldn't do live testing of the code, so I didn't realize how much of what it was doing was totally unnecessary. What I've commited into the git tree completely rips out the need to use the device mapper library, and instead relies on the already existing and well-tested code paths that supports non-dm devices. This has the nice side benefit that initrd's that want to use blkid will no longer need to pull in libdevmapper, libselinux, and libsepol --- or at least, if initrd's need them, it will no longer be on /sbin/blkid's account. And, we can yank all of the configure machinery for including libdevmapper, et. al., from e2fsprogs. It has an even more salutory performance benefit (about 3-6 times faster than yours, I estimate, since we don't end up calling into libdevmapper *at* *all*). On an X61s laptop, with: # for i in $(seq 1 100); do dmsetup create $i --table "0 1 zero"; done old version: # time /sbin/blkid >& /dev/null real 0m4.227s user 0m0.623s sys 0m3.406s new version: # time ./misc/blkid >& /dev/null real 0m0.080s user 0m0.000s sys 0m0.010s - Ted commit 3f66ecf24e896377997b909edef040be98ac76b3 Author: Theodore Ts'o <tytso@mit.edu> Date: Tue Aug 26 08:13:56 2008 -0400 libblkid: Optimize devicemapper support This commit works by removing all calls from libdevicemapper altogether, and using the standard support for "normal" non-dm devices. It depends on dm devices being placed in /dev/mapper (but the previous code had this dependency anyway), and /proc/partitions containing dm devices. We don't actually rip out the libdevicemapper code in this commit, but just disable it via #undef HAVE_DEVMAPPER, just so it's easier to review and understand the changes that were made. A subsequent commit will remove the libdevicemapper code, as well as unexport blkid_devdirs. Thanks to Karel Zak for inspiring me to look at the dm code in blkid, so I could realize how much it deserved to ripped out by its roots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/lib/blkid/devname.c b/lib/blkid/devname.c index 86fd44c..5f9c228 100644 --- a/lib/blkid/devname.c +++ b/lib/blkid/devname.c @@ -38,6 +38,8 @@ #include "blkidP.h" +#undef HAVE_DEVMAPPER + #ifdef HAVE_DEVMAPPER #include <libdevmapper.h> #endif @@ -122,6 +124,9 @@ blkid_dev blkid_get_dev(blkid_cache cache, const char *devname, int flags) static int dm_device_is_leaf(const dev_t dev); #endif +/* Directories where we will try to search for device names */ +static const char *dirlist[] = { "/dev", "/dev/mapper", "/devfs", "/devices", NULL }; + /* * Probe a single block device to add to the device cache. */ @@ -159,17 +164,17 @@ static void probe_one(blkid_cache cache, const char *ptname, * the stat information doesn't check out, use blkid_devno_to_devname() * to find it via an exhaustive search for the device major/minor. */ - for (dir = blkid_devdirs; *dir; dir++) { + for (dir = dirlist; *dir; dir++) { struct stat st; char device[256]; - sprintf(device, "%s/%s", *dir, ptname); if ((dev = blkid_get_dev(cache, device, BLKID_DEV_FIND)) && dev->bid_devno == devno) goto set_pri; if (stat(device, &st) == 0 && S_ISBLK(st.st_mode) && st.st_rdev == devno) { + sprintf(device, "%s/%s", *dir, ptname); devname = blkid_strdup(device); break; } @@ -183,10 +188,14 @@ static void probe_one(blkid_cache cache, const char *ptname, free(devname); set_pri: - if (!pri && !strncmp(ptname, "md", 2)) - pri = BLKID_PRI_MD; - if (dev) - dev->bid_pri = pri; + if (dev) { + if (pri) + dev->bid_pri = pri; + else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) + dev->bid_pri = BLKID_PRI_DM; + else if (!strncmp(ptname, "md", 2)) + dev->bid_pri = BLKID_PRI_MD; + } return; } ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 12:24 ` Theodore Tso @ 2008-08-26 13:51 ` Karel Zak 2008-08-26 14:47 ` Theodore Tso 0 siblings, 1 reply; 15+ messages in thread From: Karel Zak @ 2008-08-26 13:51 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Tue, Aug 26, 2008 at 08:24:05AM -0400, Theodore Tso wrote: > commit 3f66ecf24e896377997b909edef040be98ac76b3 > Author: Theodore Ts'o <tytso@mit.edu> > Date: Tue Aug 26 08:13:56 2008 -0400 > > libblkid: Optimize devicemapper support > > This commit works by removing all calls from libdevicemapper > altogether, and using the standard support for "normal" non-dm > devices. > > It depends on dm devices being placed in /dev/mapper (but the previous > code had this dependency anyway), and /proc/partitions containing dm > devices. Well, I see few problems: * /proc/partitions containing internal dm device names (e.g. dm-0). The libdevmapper provides translation from internal to the "real" names (e.g /dev/mapper/foo). I guess (hope:-) /sys provides the real names too. * we probably need to resolve dependencies for multi-path devices where the same FS is accessable by more than one physical device. If I good remember it was the original purpose for DM support in libblkid. # mount LABEL=blabla /mnt has to mount the "right" device. I guess that only DM is able to answer which device is the "right" one ;-) The /sys/block/<devname>/slaves/ provides information about dependencies. I see these two things as critical for fsck, mount, ... > > +/* Directories where we will try to search for device names */ > +static const char *dirlist[] = { "/dev", "/dev/mapper", "/devfs", "/devices", NULL }; I think "/dev/mapper" does not make any sense here, ...because the names from /proc/partitions are in dm-<N> format, but the names in /dev/mapper uses different format. > - if (!pri && !strncmp(ptname, "md", 2)) > - pri = BLKID_PRI_MD; > - if (dev) > - dev->bid_pri = pri; > + if (dev) { > + if (pri) > + dev->bid_pri = pri; > + else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) > + dev->bid_pri = BLKID_PRI_DM; the same problem > + else if (!strncmp(ptname, "md", 2)) > + dev->bid_pri = BLKID_PRI_MD; > + } > return; > } Karel -- Karel Zak <kzak@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 13:51 ` Karel Zak @ 2008-08-26 14:47 ` Theodore Tso 2008-08-26 18:04 ` Theodore Tso ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Theodore Tso @ 2008-08-26 14:47 UTC (permalink / raw) To: Karel Zak; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Tue, Aug 26, 2008 at 03:51:02PM +0200, Karel Zak wrote: > Well, I see few problems: > > * /proc/partitions containing internal dm device names (e.g. dm-0). > The libdevmapper provides translation from internal to the "real" > names (e.g /dev/mapper/foo). I guess (hope:-) /sys provides the > real names too. You're right. So seaching for /dev/mapper/dm-n doesn't make any sense; adding /dev/mapper to the dirlist doesn't help, and in fact is a waste of time. However, the patch actually *did* work, and the reason why it does is because we are also are searching /dev/mapper by device number, and so we are finding the device name that way. > * we probably need to resolve dependencies for multi-path devices > where the same FS is accessable by more than one physical device. > If I good remember it was the original purpose for DM support > in libblkid. > > # mount LABEL=blabla /mnt > > has to mount the "right" device. I guess that only DM is able to > answer which device is the "right" one ;-) I don't think you mean multipath support in terms of where there are multiple paths to the same physical device ala fiber channel, but rather where are multiple devices which are built on each other, right? So where /dev/sda is used to create the LVM PV's, and so on, right? > The /sys/block/<devname>/slaves/ provides information about > dependencies. > > I see these two things as critical for fsck, mount, ... Right. And here, we are solving the problem already in that we prefer the /dev/mapper/* names over names like /dev/sda and /dev/sdb. That's what the priority field is all about. What we don't solve is the problem where one devicemapper device is used to build another device mapper device. This could happen in a number of circumstances. You might have some wierd circumstance where /dev/mapper/part1 and /dev/mapper/part2 are glued together to make /dev/mapper/whole-filesystem. Why you might do this instead of simply using something like lvextend is beyond me, but that is something legitimate can be done with the low-level device mapper primitives. But, #1, there are times when picking a leaf dm device over a non-leaf dm device is not the right thing to do (which would be the case when you make a live snapshot of a filesystem), and #2, your patch only checks non-leaf dm devices for non-dm devices, probably because of #1. So with both of our patches, we have the problem where we could pick the wrong dm device if the user builds one dm device on top of another dm device. (Although for transient devices, such as temporary snapshots, if the permanent devices is already in /etc/blkid.tab, the cache makes us win since we'll prefer the device already in the cache file to the other.) But in terms of making sure we pick the device mapper file over the raw file, which is the 99.99% common case, we do the right thing, and we can do the right thing without calling the devmapper library. > > +/* Directories where we will try to search for device names */ > > +static const char *dirlist[] = { "/dev", "/dev/mapper", "/devfs", "/devices", NULL }; > > I think "/dev/mapper" does not make any sense here, ...because the > names from /proc/partitions are in dm-<N> format, but the names in > /dev/mapper uses different format. Yep, I should remove that. > > + if (dev) { > > + if (pri) > > + dev->bid_pri = pri; > > + else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) > > + dev->bid_pri = BLKID_PRI_DM; > > the same problem This does work, because we do find the /dev/mapper name via a brute-force search of /dev looking for a matching devno when we call blkid_devno_to_devname(). What I *can* do is do a special search of /dev/mapper first, but instead of looking for /dev/mapper/<ptname>, to do a readdir search of /dev/mapper looking for the matching devno. That's merely an optimization, which doesn't really matter for Linux since stat's are so fast. But it's probably worth doing. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 14:47 ` Theodore Tso @ 2008-08-26 18:04 ` Theodore Tso 2008-08-26 19:44 ` Andreas Dilger 2008-08-26 20:47 ` Karel Zak 2 siblings, 0 replies; 15+ messages in thread From: Theodore Tso @ 2008-08-26 18:04 UTC (permalink / raw) To: Karel Zak; +Cc: linux-ext4, Eric Sandeen, mbroz, agk Here's the new patch. - Ted commit 2e6c2735e19ec19821eeff1cbdf11c09c25540f0 Author: Theodore Ts'o <tytso@mit.edu> Date: Tue Aug 26 08:13:56 2008 -0400 libblkid: Optimize devicemapper support This commit works by removing all calls from libdevmapper altogether, and using the standard support for "normal" non-dm devices. It depends on dm devices being placed in /dev/mapper (but the previous code had this dependency anyway), and /proc/partitions containing dm devices. We don't actually rip out the libdevmapper code in this commit, but just disable it via #undef HAVE_DEVMAPPER, just so it's easier to review and understand the fundamental code changes. A subsequent commit will remove the libdevmapper code, as well as unexport the blkid_devdirs string array. Thanks to Karel Zak for inspiring me to look at the dm code in blkid, so I could realize how much it deserved to ripped out by its roots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/lib/blkid/blkidP.h b/lib/blkid/blkidP.h index dfe0eca..9d1e459 100644 --- a/lib/blkid/blkidP.h +++ b/lib/blkid/blkidP.h @@ -153,6 +153,13 @@ extern void blkid_debug_dump_dev(blkid_dev dev); extern void blkid_debug_dump_tag(blkid_tag tag); #endif +/* devno.c */ +struct dir_list { + char *name; + struct dir_list *next; +}; +extern void blkid__scan_dir(char *, dev_t, struct dir_list **, char **); + /* lseek.c */ extern blkid_loff_t blkid_llseek(int fd, blkid_loff_t offset, int whence); diff --git a/lib/blkid/devname.c b/lib/blkid/devname.c index 86fd44c..ec3cff3 100644 --- a/lib/blkid/devname.c +++ b/lib/blkid/devname.c @@ -38,6 +38,8 @@ #include "blkidP.h" +#undef HAVE_DEVMAPPER + #ifdef HAVE_DEVMAPPER #include <libdevmapper.h> #endif @@ -122,6 +124,9 @@ blkid_dev blkid_get_dev(blkid_cache cache, const char *devname, int flags) static int dm_device_is_leaf(const dev_t dev); #endif +/* Directories where we will try to search for device names */ +static const char *dirlist[] = { "/dev", "/devfs", "/devices", NULL }; + /* * Probe a single block device to add to the device cache. */ @@ -159,7 +164,7 @@ static void probe_one(blkid_cache cache, const char *ptname, * the stat information doesn't check out, use blkid_devno_to_devname() * to find it via an exhaustive search for the device major/minor. */ - for (dir = blkid_devdirs; *dir; dir++) { + for (dir = dirlist; *dir; dir++) { struct stat st; char device[256]; @@ -174,6 +179,9 @@ static void probe_one(blkid_cache cache, const char *ptname, break; } } + /* Do a short-cut scan of /dev/mapper first */ + if (!devname) + blkid__scan_dir("/dev/mapper", devno, 0, &devname); if (!devname) { devname = blkid_devno_to_devname(devno); if (!devname) @@ -183,10 +191,14 @@ static void probe_one(blkid_cache cache, const char *ptname, free(devname); set_pri: - if (!pri && !strncmp(ptname, "md", 2)) - pri = BLKID_PRI_MD; - if (dev) - dev->bid_pri = pri; + if (dev) { + if (pri) + dev->bid_pri = pri; + else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) + dev->bid_pri = BLKID_PRI_DM; + else if (!strncmp(ptname, "md", 2)) + dev->bid_pri = BLKID_PRI_MD; + } return; } diff --git a/lib/blkid/devno.c b/lib/blkid/devno.c index 61b34bf..1962c8d 100644 --- a/lib/blkid/devno.c +++ b/lib/blkid/devno.c @@ -33,11 +33,6 @@ #include "blkidP.h" -struct dir_list { - char *name; - struct dir_list *next; -}; - char *blkid_strndup(const char *s, int length) { char *ret; @@ -95,8 +90,8 @@ static void free_dirlist(struct dir_list **list) *list = NULL; } -static void scan_dir(char *dirname, dev_t devno, struct dir_list **list, - char **devname) +void blkid__scan_dir(char *dirname, dev_t devno, struct dir_list **list, + char **devname) { DIR *dir; struct dirent *dp; @@ -127,7 +122,7 @@ static void scan_dir(char *dirname, dev_t devno, struct dir_list **list, path, *devname)); break; } - if (S_ISDIR(st.st_mode) && !lstat(path, &st) && + if (list && S_ISDIR(st.st_mode) && !lstat(path, &st) && S_ISDIR(st.st_mode)) add_to_dirlist(path, list); } @@ -161,7 +156,7 @@ char *blkid_devno_to_devname(dev_t devno) list = list->next; DBG(DEBUG_DEVNO, printf("directory %s\n", current->name)); - scan_dir(current->name, devno, &new_list, &devname); + blkid__scan_dir(current->name, devno, &new_list, &devname); free(current->name); free(current); if (devname) ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 14:47 ` Theodore Tso 2008-08-26 18:04 ` Theodore Tso @ 2008-08-26 19:44 ` Andreas Dilger 2008-08-26 20:00 ` Theodore Tso 2008-08-26 20:47 ` Karel Zak 2 siblings, 1 reply; 15+ messages in thread From: Andreas Dilger @ 2008-08-26 19:44 UTC (permalink / raw) To: Theodore Tso; +Cc: Karel Zak, linux-ext4, Eric Sandeen, mbroz, agk On Aug 26, 2008 10:47 -0400, Theodore Ts'o wrote: > I don't think you mean multipath support in terms of where there are > multiple paths to the same physical device ala fiber channel, but > rather where are multiple devices which are built on each other, > right? So where /dev/sda is used to create the LVM PV's, and so on, > right? No, in fact DM has actual mutliple-paths-to-the-same-device support, via "dm-multipath". Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 19:44 ` Andreas Dilger @ 2008-08-26 20:00 ` Theodore Tso 0 siblings, 0 replies; 15+ messages in thread From: Theodore Tso @ 2008-08-26 20:00 UTC (permalink / raw) To: Andreas Dilger; +Cc: Karel Zak, linux-ext4, Eric Sandeen, mbroz, agk On Tue, Aug 26, 2008 at 01:44:45PM -0600, Andreas Dilger wrote: > On Aug 26, 2008 10:47 -0400, Theodore Ts'o wrote: > > I don't think you mean multipath support in terms of where there are > > multiple paths to the same physical device ala fiber channel, but > > rather where are multiple devices which are built on each other, > > right? So where /dev/sda is used to create the LVM PV's, and so on, > > right? > > No, in fact DM has actual mutliple-paths-to-the-same-device support, > via "dm-multipath". Well, *that* we have, as long as the parent devices are real (non-devicemapper) devices. So if /dev/sdc and /dev/sdg are both paths to the same filesystem, and dm-multipath has created /dev/mapper/filesystem as the multipath device to that filesystem, any devices with /dev/mapper have priority over non-dm devices, so /dev/mapper/filesystem will get returned first. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 14:47 ` Theodore Tso 2008-08-26 18:04 ` Theodore Tso 2008-08-26 19:44 ` Andreas Dilger @ 2008-08-26 20:47 ` Karel Zak 2008-08-26 23:32 ` Theodore Tso 2 siblings, 1 reply; 15+ messages in thread From: Karel Zak @ 2008-08-26 20:47 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Tue, Aug 26, 2008 at 10:47:21AM -0400, Theodore Tso wrote: > On Tue, Aug 26, 2008 at 03:51:02PM +0200, Karel Zak wrote: > > Well, I see few problems: > > > > * /proc/partitions containing internal dm device names (e.g. dm-0). > > The libdevmapper provides translation from internal to the "real" > > names (e.g /dev/mapper/foo). I guess (hope:-) /sys provides the > > real names too. > > You're right. So seaching for /dev/mapper/dm-n doesn't make any > sense; adding /dev/mapper to the dirlist doesn't help, and in fact is > a waste of time. However, the patch actually *did* work, and the > reason why it does is because we are also are searching /dev/mapper by > device number, and so we are finding the device name that way. OK. > I don't think you mean multipath support in terms of where there are > multiple paths to the same physical device ala fiber channel, but Ignore this point. You are right. The physical devices are slaves to the final DM device (when dm-multipath is on). BTW, I look forward to see multiple paths vs. udev (e.g. /dev/disk/by-* ) :-) > What we don't solve is the problem where one devicemapper device is > used to build another device mapper device. This could happen in a > number of circumstances. You might have some wierd circumstance where > /dev/mapper/part1 and /dev/mapper/part2 are glued together to make > /dev/mapper/whole-filesystem. Why you might do this instead of simply > using something like lvextend is beyond me, but that is something > legitimate can be done with the low-level device mapper primitives. There is worse scenario (thanks to Milan Broz from DM camp): dmsetup create x --table "0 100 linear /dev/sdb 0" dmsetup create y --table "0 100 linear /dev/mapper/x 0" dmsetup create z --table "0 100 linear /dev/mapper/y 0" # dmsetup ls --tree z (254:3) `-y (254:2) `-x (254:1) `- (8:16) it means all these devices are exactly same, but mount LABEL=foo has to mount /dev/mapper/z (from top of the tree). The sdb, x and y should be invisible for the mount(8). > But, #1, there are times when picking a leaf dm device over a non-leaf > dm device is not the right thing to do (which would be the case when > you make a live snapshot of a filesystem), and #2, your patch only > checks non-leaf dm devices for non-dm devices, probably because of #1. > > So with both of our patches, we have the problem where we could pick > the wrong dm device if the user builds one dm device on top of another I don't think so. The dm_probe_all() function never returns any DM device which is slave to any other device. It means it always returns the device from top of the hierarchy. All devices from dm_probe_all() have greater priority than other stuff from /proc/partitions (for example dm-N devs). So back to your example... /dev/mapper/part1 + /dev/mapper/part2 = /dev/mapper/whole-filesystem the /dev/mapper/part1 and /dev/mapper/part2 will be visible for the library (e.g. blkid.tab), but with *smaller priority* than /dev/mapper/whole-filesystem. In your non-libdevmapper implementation you need to check /sys/block/dm-N/holders/ and prefer devices without holders. I think we can ignore this minor problem for now. I'll try to found a better solution for dependencies resolution without libdevmapper. My wish is to avoid libdevmapper in libfsprobe. > > > + if (dev) { > > > + if (pri) > > > + dev->bid_pri = pri; > > > + else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) what about "if (major(devno) == DMMAJOR)" rather than strcmp()? > > > + dev->bid_pri = BLKID_PRI_DM; > > > > the same problem > > This does work, because we do find the /dev/mapper name via a > brute-force search of /dev looking for a matching devno when we call > blkid_devno_to_devname(). What I *can* do is do a special search of > /dev/mapper first, but instead of looking for /dev/mapper/<ptname>, to > do a readdir search of /dev/mapper looking for the matching devno. Not elegant, but... good enough :-) It would be nice to have /sys/block/dm-N/name where you can translate the internal dm-N name to the real device name. Alasdair? Milan? :-) Karel -- Karel Zak <kzak@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 20:47 ` Karel Zak @ 2008-08-26 23:32 ` Theodore Tso 2008-08-27 0:19 ` Karel Zak 0 siblings, 1 reply; 15+ messages in thread From: Theodore Tso @ 2008-08-26 23:32 UTC (permalink / raw) To: Karel Zak; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Tue, Aug 26, 2008 at 10:47:37PM +0200, Karel Zak wrote: > There is worse scenario (thanks to Milan Broz from DM camp): > > dmsetup create x --table "0 100 linear /dev/sdb 0" > dmsetup create y --table "0 100 linear /dev/mapper/x 0" > dmsetup create z --table "0 100 linear /dev/mapper/y 0" > > # dmsetup ls --tree > z (254:3) > `-y (254:2) > `-x (254:1) > `- (8:16) > > it means all these devices are exactly same, but > > mount LABEL=foo > > has to mount /dev/mapper/z (from top of the tree). The sdb, x and > y should be invisible for the mount(8). Sure, but consider what happens when you create a snapshot (either read-only or read-write) of an existing filesystem? In that case, both the parent and the child filesystem is mountable, and if the child filesystem is transient, the praent one may not want to be transient. In fact, suppose the scenario is a virtualization scenario, where you create the parent, create child snapshots, then use "tune2fs -U random -L virt1 /dev/mapper/snap1" ; "tune2fs -U random -L virt2 /dev/mapper/snap2" and so on, so each of the child snapshots have their own independent identity separate from the parent, it may very *well* be the case that the parent device should be visible to mount. I don't think we can make the general argument that the leaf device is always mountable, and anything above it is *never* mountable. Maybe that's the default, but it's certainly not always true. I'm beginning the right answer is we need some assist from the device mapper infrastructure, where when we create the device mapper device, we specify whether or not it is mountable, and this information is made available somehow, either by trying to sneaking it into /proc/partitions (which will be tricky without breaking legacy programs), or by making it visible in /sys. > I think we can ignore this minor problem for now. I'll try to found a > better solution for dependencies resolution without libdevmapper. My > wish is to avoid libdevmapper in libfsprobe. Speaking of which, what is your plan for caching versus non-caching in libfsprobe? It seems to me that if you are going to be caching, you'll just be re-inventing blkid. If you don't cache, you'll either (a) have to iterate over all possible devices, which is what we did before blkid (it was Ric Wheeler pointed out to me this problem and I wrote blkid in response to his request, because it becomes a problem if you have hundred of LUN's getting exported by a large EMC storage array :-), or (b) do what vol_id does, which is depend on /dev/disk/by-label and /dev/disk/by-uuid, which has the charming Windows-like attribute of not getting updated until the next reboot --- which means after you create a new filesystem or swap device on an existing device, or change a label or UUID using tune2fs, vol_id never notices until the next reboot or until you physically unplug and reinsert the device. Or is the answer that you expect libfsprobe to only do filesystem type, uuid, and label detection, and not solve the "find a device given a uuid/label" problem? > > This does work, because we do find the /dev/mapper name via a > > brute-force search of /dev looking for a matching devno when we call > > blkid_devno_to_devname(). What I *can* do is do a special search of > > /dev/mapper first, but instead of looking for /dev/mapper/<ptname>, to > > do a readdir search of /dev/mapper looking for the matching devno. > > Not elegant, but... good enough :-) > > It would be nice to have /sys/block/dm-N/name where you can translate > the internal dm-N name to the real device name. Alasdair? Milan? :-) Or maybe the right answer is /proc/partitions should only export devicemapper devices that are "supposed" to be visible to mount, and instead of exporting dm-0, dm-1...., we export the real name via /proc/partitions? Or do you not want to have the user-visible name get pushed into the kernel? - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-26 23:32 ` Theodore Tso @ 2008-08-27 0:19 ` Karel Zak 2008-08-27 1:21 ` Theodore Tso 2008-08-27 7:26 ` Andreas Dilger 0 siblings, 2 replies; 15+ messages in thread From: Karel Zak @ 2008-08-27 0:19 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Tue, Aug 26, 2008 at 07:32:25PM -0400, Theodore Tso wrote: > On Tue, Aug 26, 2008 at 10:47:37PM +0200, Karel Zak wrote: > > There is worse scenario (thanks to Milan Broz from DM camp): > > > > dmsetup create x --table "0 100 linear /dev/sdb 0" > > dmsetup create y --table "0 100 linear /dev/mapper/x 0" > > dmsetup create z --table "0 100 linear /dev/mapper/y 0" > > > > # dmsetup ls --tree > > z (254:3) > > `-y (254:2) > > `-x (254:1) > > `- (8:16) > > > > it means all these devices are exactly same, but > > > > mount LABEL=foo > > > > has to mount /dev/mapper/z (from top of the tree). The sdb, x and > > y should be invisible for the mount(8). Well, s/invisible/less important/. Sorry. > Sure, but consider what happens when you create a snapshot (either > read-only or read-write) of an existing filesystem? In that case, That's misunderstanding. I'm talking about LABEL/UUId resolution where we need *priorities* for duplicate tags. I think dep-tree is good enough for this purpose. The snapshot (or arbitrary other device) is mountable when you explicitly use device "mount /dev/mapper/the_snapshot". > both the parent and the child filesystem is mountable, and if the > child filesystem is transient, the praent one may not want to be > transient. [...] > Speaking of which, what is your plan for caching versus non-caching in > libfsprobe? It seems to me that if you are going to be caching, Both. I think you remember our (+ Kay Sievers) discussion about it. We need a library which provides both ways. The smart way (cache, dependencies, ...) for mount(8) and others standard utils, and the low-level way for udev (no cache, direct FS probing, ...). > you'll just be re-inventing blkid. If you don't cache, you'll either Hehe.. I will directly copy code from blkid and vol_id. It's open source. I needn't re-inventing ;-) > (a) have to iterate over all possible devices, which is what we did > before blkid (it was Ric Wheeler pointed out to me this problem and I > wrote blkid in response to his request, because it becomes a problem > if you have hundred of LUN's getting exported by a large EMC storage > array :-), or (b) do what vol_id does, which is depend on > /dev/disk/by-label and /dev/disk/by-uuid, which has the charming > Windows-like attribute of not getting updated until the next reboot What about fix mkfs tools and send relevant events to udev? > > It would be nice to have /sys/block/dm-N/name where you can translate > > the internal dm-N name to the real device name. Alasdair? Milan? :-) > > Or maybe the right answer is /proc/partitions should only export > devicemapper devices that are "supposed" to be visible to mount, and Yes, I have no clue why the dm-N crap is in /proc/partitions. Probably any legacy... > instead of exporting dm-0, dm-1...., we export the real name via > /proc/partitions? Or do you not want to have the user-visible name > get pushed into the kernel? I'm wrong person for these questions. DM guys are in CC: :-) Karel -- Karel Zak <kzak@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-27 0:19 ` Karel Zak @ 2008-08-27 1:21 ` Theodore Tso 2008-08-27 4:40 ` Theodore Tso 2008-08-27 7:26 ` Andreas Dilger 1 sibling, 1 reply; 15+ messages in thread From: Theodore Tso @ 2008-08-27 1:21 UTC (permalink / raw) To: Karel Zak; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Wed, Aug 27, 2008 at 02:19:42AM +0200, Karel Zak wrote: > > That's misunderstanding. I'm talking about LABEL/UUId resolution > where we need *priorities* for duplicate tags. I think dep-tree is > good enough for this purpose. OK, so what you're saying is that that a leaf dm-device is always more important (and should therefore have a higher priority) than a non-leaf device. But I'm not sure that is *still* not always the right thing to do. Suppose someone creates a snapshot of a device in order to run e2fsck, or to do create a coherent snapshot. Now suppose the machine crashes while the snapshot still exists; even though the read-only snapshot is the "leaf" device, you don't want to try use that snapshot to be mounted as the root filesystem. There are a number of solutions of course --- most of which do not require adding more smarts into blkid (or some other probing library). We could make the scripts that create the snapshots update the UUID and LABEL of the snapshot, although that means adding some kind filesystem-specific hook to lvcreate. Or we could create the concept of "ephemeral snapshots" that don't survive a reboot. Or we could try to mark certain LVM volumes with explicit priorities that would be pulled into blkid (possibly via the dm interfaces). Or we could try to put a lot of that smarts into the blkid library. Personally, I like the idea of emphermal snapshots as the best system-wide solution, but the point is we need to think about this not from a single component's point of view, but what is the best solution from a systems perspective. > Both. I think you remember our (+ Kay Sievers) discussion about it. > We need a library which provides both ways. The smart way (cache, > dependencies, ...) for mount(8) and others standard utils, and the > low-level way for udev (no cache, direct FS probing, ...). Sure, but if that's the case, we already have most of the "smart way" from blkid. What's the point of making fsprobe re-invent the caching solution? I could just point blkid at fsprobe. > > What about fix mkfs tools and send relevant events to udev? > Well, for one, it means changing every single mkfs/mkswap/tune*fs program on the system; so it seems more than a bit of kludge. Sometimes these tools are run by an unprivileged user, so there are some security problems have to be carefully thought out. Obviously you can't just tell udev the new label and uuid, since the source might be untrusted. The userspace program send an event to udevd that a device has changed, but that means that you have to allow an unprivileged process to kick udevd and then reprobe a device, which means there is a possibility of a denial of service attack. It also makes it easier to exploit any potential buffer overrun, since the attacker can now set up the buggered block device image and then politely ask udev to call fsprobe (possibly running with root prives) to access the attack image. Of course, if there is a security bug in a filesystem probing code, we're in deep trouble if there are user-writable devices. A push model just means that the fs probing code can get triggered at a time of the attacker's choosing, as opposed at some point in the future. I also don't think it's realistic to assume that we can sweep through all of the possibile tool that creates or modifies filesystem/swap superblocks in order for the tool to work correctly. If a tool sends a hint that to udev so that the cache and /dev/disk/* can get updated, that may be fine; but we shouldn't be dependent on that hint. (And maybe system administrators or security officers will have policies not allowing non-privileged users running the tool to send that hint.) BTW, I think /dev/disk/by-label is a really nasty idea. Suppose you have hundreds of LUN's and/or device mapper devices. Now suppose we change the label, or become aware that the label of a device has changed. Right now, if you don't know the previous label (which will often be the case), the only way to install the new label and remove the old label is to iterate over all of the symlinks in /dev/disk/by-label and calling readlink() on each one. It's convenient and maybe it's a nice to have on disktop systems, but if you have a huge number of devices, it's not very scalable at all. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-27 1:21 ` Theodore Tso @ 2008-08-27 4:40 ` Theodore Tso 2008-08-27 8:32 ` Karel Zak 0 siblings, 1 reply; 15+ messages in thread From: Theodore Tso @ 2008-08-27 4:40 UTC (permalink / raw) To: Karel Zak; +Cc: linux-ext4, Eric Sandeen, mbroz, agk > I think we can ignore this minor problem for now. I'll try to found a > better solution for dependencies resolution without libdevmapper. My > wish is to avoid libdevmapper in libfsprobe. Here's a patch that I've been working on which gives a priority bonus to dm leaf devices, without needing libdevmapper. As I've mentioned before, I'm not 100% convinced this is always the right thing. But it's probably a not-half-bad hueristic... - Ted diff --git a/lib/blkid/devname.c b/lib/blkid/devname.c index 48a1dcc..2b3855b 100644 --- a/lib/blkid/devname.c +++ b/lib/blkid/devname.c @@ -25,6 +25,7 @@ #if HAVE_SYS_TYPES_H #include <sys/types.h> #endif +#include <dirent.h> #if HAVE_SYS_STAT_H #include <sys/stat.h> #endif @@ -117,6 +118,38 @@ blkid_dev blkid_get_dev(blkid_cache cache, const char *devname, int flags) /* Directories where we will try to search for device names */ static const char *dirlist[] = { "/dev", "/devfs", "/devices", NULL }; +static int is_dm_leaf(const char *devname) +{ + struct dirent *de, *d_de; + DIR *dir, *d_dir; + char path[256]; + int ret = 1; + + if ((dir = opendir("/sys/block")) == NULL) + return 0; + while ((de = readdir(dir)) != NULL) { + if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..") || + !strcmp(de->d_name, devname) || + strncmp(de->d_name, "dm-", 3) || + strlen(de->d_name) > sizeof(path)-32) + continue; + sprintf(path, "/sys/block/%s/slaves", de->d_name); + if ((d_dir = opendir(path)) == NULL) + continue; + while ((d_de = readdir(d_dir)) != NULL) { + if (!strcmp(d_de->d_name, devname)) { + ret = 0; + break; + } + } + closedir(d_dir); + if (!ret) + break; + } + closedir(dir); + return ret; +} + /* * Probe a single block device to add to the device cache. */ @@ -180,9 +213,11 @@ set_pri: if (dev) { if (pri) dev->bid_pri = pri; - else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) + else if (!strncmp(dev->bid_name, "/dev/mapper/", 11)) { dev->bid_pri = BLKID_PRI_DM; - else if (!strncmp(ptname, "md", 2)) + if (is_dm_leaf(ptname)) + dev->bid_pri += 5; + } else if (!strncmp(ptname, "md", 2)) dev->bid_pri = BLKID_PRI_MD; } return; @@ -198,7 +233,6 @@ set_pri: * safe thing to do?) */ #ifdef VG_DIR -#include <dirent.h> static dev_t lvm_get_devno(const char *lvm_device) { FILE *lvf; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-27 4:40 ` Theodore Tso @ 2008-08-27 8:32 ` Karel Zak 0 siblings, 0 replies; 15+ messages in thread From: Karel Zak @ 2008-08-27 8:32 UTC (permalink / raw) To: Theodore Tso; +Cc: linux-ext4, Eric Sandeen, mbroz, agk On Wed, Aug 27, 2008 at 12:40:54AM -0400, Theodore Tso wrote: > > I think we can ignore this minor problem for now. I'll try to found a > > better solution for dependencies resolution without libdevmapper. My > > wish is to avoid libdevmapper in libfsprobe. > > Here's a patch that I've been working on which gives a priority bonus > to dm leaf devices, without needing libdevmapper. As I've mentioned > before, I'm not 100% convinced this is always the right thing. But > it's probably a not-half-bad hueristic... [...] > > +static int is_dm_leaf(const char *devname) > +{ > + struct dirent *de, *d_de; > + DIR *dir, *d_dir; > + char path[256]; > + int ret = 1; > + > + if ((dir = opendir("/sys/block")) == NULL) > + return 0; > + while ((de = readdir(dir)) != NULL) { > + if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..") || > + !strcmp(de->d_name, devname) || > + strncmp(de->d_name, "dm-", 3) || > + strlen(de->d_name) > sizeof(path)-32) > + continue; > + sprintf(path, "/sys/block/%s/slaves", de->d_name); > + if ((d_dir = opendir(path)) == NULL) > + continue; > + while ((d_de = readdir(d_dir)) != NULL) { > + if (!strcmp(d_de->d_name, devname)) { > + ret = 0; > + break; > + } > + } > + closedir(d_dir); > + if (!ret) > + break; > + } > + closedir(dir); > + return ret; > +} Seems good. This patch and the brute-force for conversion from dm-N to real names is the way how replace the HAVE_DEVMAPPER crap in libblkid. Thanks! Karel -- Karel Zak <kzak@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-27 0:19 ` Karel Zak 2008-08-27 1:21 ` Theodore Tso @ 2008-08-27 7:26 ` Andreas Dilger 2008-08-27 8:10 ` Karel Zak 1 sibling, 1 reply; 15+ messages in thread From: Andreas Dilger @ 2008-08-27 7:26 UTC (permalink / raw) To: Karel Zak; +Cc: Theodore Tso, linux-ext4, Eric Sandeen, mbroz, agk On Aug 27, 2008 02:19 +0200, Karel Zak wrote: > On Tue, Aug 26, 2008 at 07:32:25PM -0400, Theodore Tso wrote: > > you'll just be re-inventing blkid. If you don't cache, you'll either > > Hehe.. I will directly copy code from blkid and vol_id. It's open > source. I needn't re-inventing ;-) Couldn't you just change libblkid to export the probe functions? It always makes me cringe when code like this is copied, because I just _know_ one or the other will become out of date, and it will take twice as much effort to keep them in sync. I'd rather see people doing "high value" work instead of watching for and copying patches around. > > Or maybe the right answer is /proc/partitions should only export > > devicemapper devices that are "supposed" to be visible to mount, and > > Yes, I have no clue why the dm-N crap is in /proc/partitions. > Probably any legacy... I've wanted that for so long... It's always a pain that LVM tools work by using /dev/vgfoo/lvfoo (which is fine) but /proc/partitions doesn't give any clue to what each dm-X device is. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] blkid: optimize dm_device_is_leaf() usage 2008-08-27 7:26 ` Andreas Dilger @ 2008-08-27 8:10 ` Karel Zak 0 siblings, 0 replies; 15+ messages in thread From: Karel Zak @ 2008-08-27 8:10 UTC (permalink / raw) To: Andreas Dilger; +Cc: Theodore Tso, linux-ext4, Eric Sandeen, mbroz, agk On Wed, Aug 27, 2008 at 01:26:43AM -0600, Andreas Dilger wrote: > On Aug 27, 2008 02:19 +0200, Karel Zak wrote: > > On Tue, Aug 26, 2008 at 07:32:25PM -0400, Theodore Tso wrote: > > > you'll just be re-inventing blkid. If you don't cache, you'll either > > > > Hehe.. I will directly copy code from blkid and vol_id. It's open > > source. I needn't re-inventing ;-) > > Couldn't you just change libblkid to export the probe functions? It > always makes me cringe when code like this is copied, because I just > _know_ one or the other will become out of date, and it will take > twice as much effort to keep them in sync. I'd rather see people > doing "high value" work instead of watching for and copying patches > around. Yes, good point. Frankly, I think about it in last weeks. I have done work on the low-lever part of libfsprobe -- it shouldn't be a problem port this code to libblkid. The advantage will be a library that is usable for udev and backwardly compatible for the current blkid applications. I will try it... Karel -- Karel Zak <kzak@redhat.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2008-08-27 8:32 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-08-25 20:48 [PATCH] blkid: optimize dm_device_is_leaf() usage Karel Zak 2008-08-26 12:24 ` Theodore Tso 2008-08-26 13:51 ` Karel Zak 2008-08-26 14:47 ` Theodore Tso 2008-08-26 18:04 ` Theodore Tso 2008-08-26 19:44 ` Andreas Dilger 2008-08-26 20:00 ` Theodore Tso 2008-08-26 20:47 ` Karel Zak 2008-08-26 23:32 ` Theodore Tso 2008-08-27 0:19 ` Karel Zak 2008-08-27 1:21 ` Theodore Tso 2008-08-27 4:40 ` Theodore Tso 2008-08-27 8:32 ` Karel Zak 2008-08-27 7:26 ` Andreas Dilger 2008-08-27 8:10 ` Karel Zak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox