* linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() @ 2011-01-12 17:34 Valdis.Kletnieks 2011-01-13 0:23 ` Milan Broz 0 siblings, 1 reply; 35+ messages in thread From: Valdis.Kletnieks @ 2011-01-12 17:34 UTC (permalink / raw) To: Alexander Viro, Neil Brown; +Cc: linux-kernel, linux-fsdevel, linux-raid [-- Attachment #1: Type: text/plain, Size: 1998 bytes --] Seen in a boot of yesterday's linux-next. The 'W' flag was due to the already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'. Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups. I wonder if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off dracut before and after the warning). [ 16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2 [ 16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found. Falling back to direct link creation. [ 16.912627] ------------[ cut here ]------------ [ 16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() [ 16.912638] Hardware name: Latitude E6500 [ 16.912640] Modules linked in: [ 16.912644] Pid: 1920, comm: lvm Tainted: G W 2.6.37-next-20110111 #1 [ 16.912647] Call Trace: [ 16.912653] [<ffffffff8103a989>] ? warn_slowpath_common+0x80/0x98 [ 16.912657] [<ffffffff8103a9b6>] ? warn_slowpath_null+0x15/0x17 [ 16.912660] [<ffffffff8111fb03>] ? bd_link_disk_holder+0x92/0x1ac [ 16.912665] [<ffffffff8139b875>] ? open_dev+0x76/0x9e [ 16.912668] [<ffffffff8139bc9d>] ? dm_get_device+0x147/0x1dc [ 16.912672] [<ffffffff8139d088>] ? linear_ctr+0x98/0xd4 [ 16.912676] [<ffffffff8139c289>] ? dm_table_add_target+0x149/0x1d4 [ 16.912679] [<ffffffff8139e57c>] ? table_load+0xf0/0x27b [ 16.912683] [<ffffffff8139e48c>] ? table_load+0x0/0x27b [ 16.912686] [<ffffffff8139f326>] ? ctl_ioctl+0x1c6/0x21e [ 16.912690] [<ffffffff8139f38c>] ? dm_ctl_ioctl+0xe/0x12 [ 16.912695] [<ffffffff8110258e>] ? do_vfs_ioctl+0x4c4/0x505 [ 16.912699] [<ffffffff81102626>] ? sys_ioctl+0x57/0x95 [ 16.912703] [<ffffffff8100253b>] ? system_call_fastpath+0x16/0x1b [ 16.912706] ---[ end trace 4eaa2a86a8e2da24 ]--- [ 16.925289] dracut: The link /dev/vg_blackice/src should had been created by udev but it was not found. Falling back to direct link creation. [-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-12 17:34 linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() Valdis.Kletnieks @ 2011-01-13 0:23 ` Milan Broz 2011-01-13 2:19 ` Jun'ichi Nomura 0 siblings, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 0:23 UTC (permalink / raw) To: Valdis.Kletnieks Cc: Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development, Tejun Heo On 01/12/2011 06:34 PM, Valdis.Kletnieks@vt.edu wrote: > Seen in a boot of yesterday's linux-next. The 'W' flag was due to the > already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'. > Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups. I wonder > if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off > dracut before and after the warning). > > [ 16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2 > [ 16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found. Falling back to direct link creation. > [ 16.912627] ------------[ cut here ]------------ > [ 16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() That seems to be WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk); added in patch http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=e09b457bdb7e8d23fc54dcef0930ac697d8de895 "block: simplify holder symlink handling" dm linear just claims device in table constructor, I don't think it is bug in DM code. > [ 16.912638] Hardware name: Latitude E6500 > [ 16.912640] Modules linked in: > [ 16.912644] Pid: 1920, comm: lvm Tainted: G W 2.6.37-next-20110111 #1 > [ 16.912647] Call Trace: > [ 16.912653] [<ffffffff8103a989>] ? warn_slowpath_common+0x80/0x98 > [ 16.912657] [<ffffffff8103a9b6>] ? warn_slowpath_null+0x15/0x17 > [ 16.912660] [<ffffffff8111fb03>] ? bd_link_disk_holder+0x92/0x1ac > [ 16.912665] [<ffffffff8139b875>] ? open_dev+0x76/0x9e > [ 16.912668] [<ffffffff8139bc9d>] ? dm_get_device+0x147/0x1dc > [ 16.912672] [<ffffffff8139d088>] ? linear_ctr+0x98/0xd4 > [ 16.912676] [<ffffffff8139c289>] ? dm_table_add_target+0x149/0x1d4 > [ 16.912679] [<ffffffff8139e57c>] ? table_load+0xf0/0x27b > [ 16.912683] [<ffffffff8139e48c>] ? table_load+0x0/0x27b > [ 16.912686] [<ffffffff8139f326>] ? ctl_ioctl+0x1c6/0x21e > [ 16.912690] [<ffffffff8139f38c>] ? dm_ctl_ioctl+0xe/0x12 > [ 16.912695] [<ffffffff8110258e>] ? do_vfs_ioctl+0x4c4/0x505 > [ 16.912699] [<ffffffff81102626>] ? sys_ioctl+0x57/0x95 > [ 16.912703] [<ffffffff8100253b>] ? system_call_fastpath+0x16/0x1b > [ 16.912706] ---[ end trace 4eaa2a86a8e2da24 ]--- > [ 16.925289] dracut: The link /dev/vg_blackice/src should had been created by udev but it was not found. Falling back to direct link creation. > Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 0:23 ` Milan Broz @ 2011-01-13 2:19 ` Jun'ichi Nomura 2011-01-13 11:06 ` Tejun Heo 2011-01-13 17:21 ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo 0 siblings, 2 replies; 35+ messages in thread From: Jun'ichi Nomura @ 2011-01-13 2:19 UTC (permalink / raw) To: Milan Broz, Tejun Heo Cc: Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development On 01/13/11 09:23, Milan Broz wrote: > On 01/12/2011 06:34 PM, Valdis.Kletnieks@vt.edu wrote: >> Seen in a boot of yesterday's linux-next. The 'W' flag was due to the >> already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'. >> Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups. I wonder >> if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off >> dracut before and after the warning). >> >> [ 16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2 >> [ 16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found. Falling back to direct link creation. >> [ 16.912627] ------------[ cut here ]------------ >> [ 16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() > > That seems to be > WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk); > added in patch > http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=e09b457bdb7e8d23fc54dcef0930ac697d8de895 > "block: simplify holder symlink handling" > > dm linear just claims device in table constructor, I don't think it is bug in DM code. The patch assumes only one holder disk for a claimed dev, which is not true. E.g. if there are multiple LVs on a PV. In addition to that, since claiming is done in table constructor, there can be 2 claim instances for a slave/holder pair at a time when you load a table while there's already an active one. E.g. if you do lvresize. We need consideration for this, too. Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 2:19 ` Jun'ichi Nomura @ 2011-01-13 11:06 ` Tejun Heo 2011-01-13 11:26 ` [dm-devel] " Milan Broz 2011-01-13 17:21 ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo 1 sibling, 1 reply; 35+ messages in thread From: Tejun Heo @ 2011-01-13 11:06 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Milan Broz, Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development Hello, On Thu, Jan 13, 2011 at 11:19:21AM +0900, Jun'ichi Nomura wrote: > > "block: simplify holder symlink handling" > > > > dm linear just claims device in table constructor, I don't think it is bug in DM code. > > The patch assumes only one holder disk for a claimed dev, which is not true. > E.g. if there are multiple LVs on a PV. > > In addition to that, since claiming is done in table constructor, > there can be 2 claim instances for a slave/holder pair at a time > when you load a table while there's already an active one. > E.g. if you do lvresize. > We need consideration for this, too. Urgh... gees, so there actually was a user using all that cruft. Sorry about the breakage, I'll see how multiple symlinks can be restored. I'm curious why this was added at all tho. What was the rationalization? It's not like two subsystems can share the same block device so marking the currently owning subsystem should have been enough at the block layer. There is no reason for block devices to present information which is of no use to itself. All that's necessary is "this is taken by dm or md for more information, query those". dm and md need their own conf/representation layer anyway. Thanks. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 11:06 ` Tejun Heo @ 2011-01-13 11:26 ` Milan Broz 2011-01-13 12:27 ` Karel Zak 0 siblings, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 11:26 UTC (permalink / raw) To: device-mapper development Cc: Tejun Heo, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Karel Zak On 01/13/2011 12:06 PM, Tejun Heo wrote: > Urgh... gees, so there actually was a user using all that cruft. > Sorry about the breakage, I'll see how multiple symlinks can be > restored. I'm curious why this was added at all tho. What was the > rationalization? It's not like two subsystems can share the same > block device so marking the currently owning subsystem should have > been enough at the block layer. There is no reason for block devices > to present information which is of no use to itself. All that's > necessary is "this is taken by dm or md for more information, query > those". dm and md need their own conf/representation layer anyway. I am not sure if I understand it correctly, but multiple holders/slaves links are very useful in userspace to present device dependences. We just implemented lsblk command in util-linux which simple prints tree according to these links, so dependendes of DM/MD/whatever devices can be printed without using any system specific callbacks, just using sysfs. See http://karelzak.blogspot.com/2010/12/lsblk8.html Whatever changes are needed, please keep this functionality, it can be useful. Thanks, Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 11:26 ` [dm-devel] " Milan Broz @ 2011-01-13 12:27 ` Karel Zak 2011-01-13 13:12 ` Tejun Heo 0 siblings, 1 reply; 35+ messages in thread From: Karel Zak @ 2011-01-13 12:27 UTC (permalink / raw) To: Milan Broz Cc: device-mapper development, Tejun Heo, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On Thu, Jan 13, 2011 at 12:26:14PM +0100, Milan Broz wrote: > On 01/13/2011 12:06 PM, Tejun Heo wrote: > > Urgh... gees, so there actually was a user using all that cruft. > > Sorry about the breakage, I'll see how multiple symlinks can be > > restored. I'm curious why this was added at all tho. What was the > > rationalization? It's not like two subsystems can share the same > > block device so marking the currently owning subsystem should have > > been enough at the block layer. There is no reason for block devices > > to present information which is of no use to itself. All that's > > necessary is "this is taken by dm or md for more information, query > > those". dm and md need their own conf/representation layer anyway. > > I am not sure if I understand it correctly, but multiple holders/slaves > links are very useful in userspace to present device dependences. > > We just implemented lsblk command in util-linux which simple prints > tree according to these links, so dependendes of DM/MD/whatever devices > can be printed without using any system specific callbacks, just using > sysfs. See http://karelzak.blogspot.com/2010/12/lsblk8.html We also use holders/slaves links in libblkid to evaluate dependencies between devices (since 2008). > Whatever changes are needed, please keep this functionality, it can be useful. Definitely. Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 12:27 ` Karel Zak @ 2011-01-13 13:12 ` Tejun Heo 2011-01-13 13:26 ` Karel Zak 0 siblings, 1 reply; 35+ messages in thread From: Tejun Heo @ 2011-01-13 13:12 UTC (permalink / raw) To: Karel Zak Cc: Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel Hello, On Thu, Jan 13, 2011 at 01:27:01PM +0100, Karel Zak wrote: > We also use holders/slaves links in libblkid to evaluate dependencies > between devices (since 2008). > > > Whatever changes are needed, please keep this functionality, it > > can be useful. > > Definitely. Yeah, sure, it's not like I can afford to avoid fixing it at this point anyway, but I still want to point out it's at the wrong layer and shouldn't have been added like this, really. If you want blkid to identify it, the proper thing to do would be querying blk device for the claimer and then use claimer-specific method to query them. It's not like the current method would make sense for btrfs or whatnot. So, yeap, I definitely am fixing it but let's not do things like this in the future. Thanks. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 13:12 ` Tejun Heo @ 2011-01-13 13:26 ` Karel Zak 2011-01-13 13:37 ` Tejun Heo 0 siblings, 1 reply; 35+ messages in thread From: Karel Zak @ 2011-01-13 13:26 UTC (permalink / raw) To: Tejun Heo Cc: Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On Thu, Jan 13, 2011 at 02:12:16PM +0100, Tejun Heo wrote: > Hello, > > On Thu, Jan 13, 2011 at 01:27:01PM +0100, Karel Zak wrote: > > We also use holders/slaves links in libblkid to evaluate dependencies > > between devices (since 2008). > > > > > Whatever changes are needed, please keep this functionality, it > > > can be useful. > > > > Definitely. > > Yeah, sure, it's not like I can afford to avoid fixing it at this > point anyway, but I still want to point out it's at the wrong layer > and shouldn't have been added like this, really. If you want blkid to > identify it, the proper thing to do would be querying blk device for > the claimer and then use claimer-specific method to query them. It's It seems that dependencies (holders/slaves) between devices is pretty generic thing. Why do you think that we need claimer-specific method? The /sys filesystem is better that ictls in many ways. > not like the current method would make sense for btrfs or whatnot. Could you be more verbose, please? Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 13:26 ` Karel Zak @ 2011-01-13 13:37 ` Tejun Heo 2011-01-13 13:52 ` Tejun Heo 2011-01-13 13:58 ` Milan Broz 0 siblings, 2 replies; 35+ messages in thread From: Tejun Heo @ 2011-01-13 13:37 UTC (permalink / raw) To: Karel Zak Cc: Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers Hello, Karel. On Thu, Jan 13, 2011 at 02:26:37PM +0100, Karel Zak wrote: > > Yeah, sure, it's not like I can afford to avoid fixing it at this > > point anyway, but I still want to point out it's at the wrong layer > > and shouldn't have been added like this, really. If you want blkid to > > identify it, the proper thing to do would be querying blk device for > > the claimer and then use claimer-specific method to query them. It's > > It seems that dependencies (holders/slaves) between devices is pretty > generic thing. Why do you think that we need claimer-specific method? > The /sys filesystem is better that ictls in many ways. Because sysfs is already complex enough without representing dependency information without strictly defined strcture in it. It's for exporting the device tree to users. We have developed interesting ways to exploit it but it generally has proven to be a bad idea to add symlinks without clearly defined structure beneath it. People end up using them differently and often they don't understand how the kobject sysfs things are wired and it ends up with a lot of cruft exporting something which is designed by small number of people without really considering what's going on in the rest of the hierarchy. > > not like the current method would make sense for btrfs or whatnot. > > Could you be more verbose, please? If the symlinkery was something which could properly replace configuration and query, sure, let's standardize it and share it among all possible claimers of block devices, but it's not. md, dm, btrfs need to export and process way more information than can be represnted in these symlinks and it's there more as a pretty thing than anything which is actually necessary and useful. And the original implementation directly played with kobjects (in an unnecessarily complicated way too) and allowed any kobject to be linked. Things like that just never end up pretty. So, just don't do it. Sysfs is for device hierarchy. Don't try to shove pretty looking things there (unless it's something widely agreed on and necessary, of course). Thanks. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 13:37 ` Tejun Heo @ 2011-01-13 13:52 ` Tejun Heo 2011-01-13 13:58 ` Milan Broz 1 sibling, 0 replies; 35+ messages in thread From: Tejun Heo @ 2011-01-13 13:52 UTC (permalink / raw) To: Karel Zak Cc: Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On Thu, Jan 13, 2011 at 02:37:22PM +0100, Tejun Heo wrote: > > It seems that dependencies (holders/slaves) between devices is pretty > > generic thing. Why do you think that we need claimer-specific method? > > The /sys filesystem is better that ictls in many ways. Also, one more thing. In btrfs, there's no block device for the master device. You can add the slaves directory somewhere and your choice would be as good as any other's but it just will not be generic while being superflous exactly the same way it's superflous for md and dm devices. Thanks. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 13:37 ` Tejun Heo 2011-01-13 13:52 ` Tejun Heo @ 2011-01-13 13:58 ` Milan Broz 2011-01-13 14:11 ` Tejun Heo 1 sibling, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 13:58 UTC (permalink / raw) To: Tejun Heo Cc: Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On 01/13/2011 02:37 PM, Tejun Heo wrote: > So, just don't do it. Sysfs is for device hierarchy. Don't try to > shove pretty looking things there (unless it's something widely agreed > on and necessary, of course). Hi, I think that it is exactly what holders/slaves do - displaying device hierarchy. So application can check which underlying device are related and ask them for more info if needed (=> with system specific call, it can be simple sysfs attribute, ioctl, whatever). So the only request here is to keep these symlinks correct, nothing more. Or am I missing anything? Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 13:58 ` Milan Broz @ 2011-01-13 14:11 ` Tejun Heo 2011-01-13 14:25 ` Milan Broz 0 siblings, 1 reply; 35+ messages in thread From: Tejun Heo @ 2011-01-13 14:11 UTC (permalink / raw) To: Milan Broz Cc: Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers Hello, On Thu, Jan 13, 2011 at 02:58:23PM +0100, Milan Broz wrote: > On 01/13/2011 02:37 PM, Tejun Heo wrote: > > > So, just don't do it. Sysfs is for device hierarchy. Don't try to > > shove pretty looking things there (unless it's something widely agreed > > on and necessary, of course). > > I think that it is exactly what holders/slaves do - displaying device > hierarchy. So application can check which underlying device are related > and ask them for more info if needed (=> with system specific call, > it can be simple sysfs attribute, ioctl, whatever). Yeah, sure but in a completely unrestrained and non-standard way. First of all, it wasn't even necessary to begin with and I don't really see anyone else other than md/dm using it. I mean, where are you gonna you put that slaves directory? Sure you can put it somewhere but really it would be just that - somewhere. All this doesn't even matter. It wasn't even necessary to begin with. > So the only request here is to keep these symlinks correct, nothing more. > Or am I missing anything? Yeah, I'm fixing that. Don't worry. I just wanna say it wasn't such a brilliant idea to add it in the first place and hope that people would restrain from doing similar things in the future. So, as a general rule, when in doubt, just create an attribute. Let's refrain from custom symlinkery in sysfs, please. In this case too, a holder attribute containing strings like ext[3|4], md, dm or whatnot would have been _much_ simpler and actually more useful. Thank you. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:11 ` Tejun Heo @ 2011-01-13 14:25 ` Milan Broz 2011-01-13 14:30 ` Tejun Heo 0 siblings, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 14:25 UTC (permalink / raw) To: Tejun Heo Cc: Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On 01/13/2011 03:11 PM, Tejun Heo wrote: > So, as a general rule, when in doubt, just create an attribute. Let's > refrain from custom symlinkery in sysfs, please. In this case too, a > holder attribute containing strings like ext[3|4], md, dm or whatnot > would have been _much_ simpler and actually more useful. Maybe, but this was not invented in DM/MD camp:-) Probably Kay or Greg can answer why it was done this way? For DM it just added links to be proper user of it, see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f165921df46a977e3561f1bd9f13a348441486d1 Anyway, it is /sys/block - so it represents block devices. If btrfs internally creates some virtual _block_ device for its pool, it should present it here too with slaves/holders. If not, why it should create any links there? Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:25 ` Milan Broz @ 2011-01-13 14:30 ` Tejun Heo 2011-01-13 14:43 ` Kay Sievers ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Tejun Heo @ 2011-01-13 14:30 UTC (permalink / raw) To: Milan Broz Cc: Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers Hello, On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: > Maybe, but this was not invented in DM/MD camp:-) > Probably Kay or Greg can answer why it was done this way? Let's not play the dig the history and blame game if possible. We (including me, of course) all did a lot of horrible things in the past. :-) > For DM it just added links to be proper user of it, see > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f165921df46a977e3561f1bd9f13a348441486d1 > > Anyway, it is /sys/block - so it represents block devices. > > If btrfs internally creates some virtual _block_ device for its pool, it should > present it here too with slaves/holders. If not, why it should create any links there? Yeah, that's the most bothering part for me. The biggest customers of bd_claim are filesystems and all these custom symlinkeries don't do nothing for them. It just doesn't make a whole lot of sense to me. Thanks. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:30 ` Tejun Heo @ 2011-01-13 14:43 ` Kay Sievers 2011-01-13 15:03 ` Milan Broz 2011-01-13 15:59 ` Karel Zak 2011-01-13 14:45 ` Theodore Tso 2011-01-13 14:49 ` Milan Broz 2 siblings, 2 replies; 35+ messages in thread From: Kay Sievers @ 2011-01-13 14:43 UTC (permalink / raw) To: Tejun Heo Cc: Milan Broz, Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote: > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: >> Maybe, but this was not invented in DM/MD camp:-) >> Probably Kay or Greg can answer why it was done this way? It's not from Greg or Kay. It just appeared some day in the context of dm. :) And yes, symlinks *look* nice and simple for the outside, but they are not, and have all sorts of problems like non-atomic updates, make it impossible to ever rename a device (as long as they copy the device name), and and and .... we should not add more of this. >> If btrfs internally creates some virtual _block_ device for its pool, it should >> present it here too with slaves/holders. If not, why it should create any links there? > > Yeah, that's the most bothering part for me. The biggest customers of > bd_claim are filesystems and all these custom symlinkeries don't do > nothing for them. It just doesn't make a whole lot of sense to me. Btrfs does not use any blockdev as the master for good reason, and it can never map its slaves inside of /sys/block. Simple meta-blockdevs like md/dm just don't fit into modern requirements of a filesystem (directory snapshots, directory subvolumes, complex raids, hassle-free resizing, ...) -- hence btrfs is much more like a network-filesystem mount than a stream of blocks like a disk, and does not fit at all into this model. Kay ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:43 ` Kay Sievers @ 2011-01-13 15:03 ` Milan Broz 2011-01-14 7:38 ` Jun'ichi Nomura 2011-01-13 15:59 ` Karel Zak 1 sibling, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 15:03 UTC (permalink / raw) To: Kay Sievers Cc: Tejun Heo, Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On 01/13/2011 03:43 PM, Kay Sievers wrote: >> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: >>> Maybe, but this was not invented in DM/MD camp:-) >>> Probably Kay or Greg can answer why it was done this way? > > It's not from Greg or Kay. It just appeared some day in the context of dm. :) ah, then sorry, I am just confused:-) But DM does not need it for operation at all so it must had some other reason. (We have dmsetup ls --tree using dm-ioctls for years.) > And yes, symlinks *look* nice and simple for the outside, but they are > not, and have all sorts of problems like non-atomic updates, make it > impossible to ever rename a device (as long as they copy the device > name), and and and .... we should not add more of this. Yes, I agree, if there is better way, let's switch to that. Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 15:03 ` Milan Broz @ 2011-01-14 7:38 ` Jun'ichi Nomura 0 siblings, 0 replies; 35+ messages in thread From: Jun'ichi Nomura @ 2011-01-14 7:38 UTC (permalink / raw) To: Milan Broz, Kay Sievers, Tejun Heo Cc: Karel Zak, device-mapper development, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On 01/14/11 00:03, Milan Broz wrote: > On 01/13/2011 03:43 PM, Kay Sievers wrote: > >>> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: >>>> Maybe, but this was not invented in DM/MD camp:-) >>>> Probably Kay or Greg can answer why it was done this way? >> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :) > > ah, then sorry, I am just confused:-) > But DM does not need it for operation at all so it must had some other reason. > (We have dmsetup ls --tree using dm-ioctls for years.) Sorry. It's from me 5 years ago. :) See this for backgrounds: [PATCH 0/3] sysfs representation of stacked devices (dm/md) http://lwn.net/Articles/172689/ And I don't adhere to my implementation if there's better one. Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:43 ` Kay Sievers 2011-01-13 15:03 ` Milan Broz @ 2011-01-13 15:59 ` Karel Zak 2011-01-13 16:10 ` [dm-devel] " Kay Sievers 1 sibling, 1 reply; 35+ messages in thread From: Karel Zak @ 2011-01-13 15:59 UTC (permalink / raw) To: Kay Sievers Cc: Valdis.Kletnieks, linux-kernel, linux-raid, Jun'ichi Nomura, device-mapper development, Alexander Viro, Tejun Heo, linux-fsdevel, Milan Broz On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote: > On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote: > > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: > >> Maybe, but this was not invented in DM/MD camp:-) > >> Probably Kay or Greg can answer why it was done this way? > > It's not from Greg or Kay. It just appeared some day in the context of dm. :) > > And yes, symlinks *look* nice and simple for the outside, but they are > not, and have all sorts of problems like non-atomic updates, make it Sounds like sysfs implementation problem, right? If there is noway to fix sysfs then we can add a generic ioctl or /sys/block/<device>/{slave,holder}_list files with list of holders/slaves. But please, don't force userspace to use *claimer-specific* methods to answer *generic questions* like slave/holder dependencies between devices. > impossible to ever rename a device (as long as they copy the device > name), and and and .... we should not add more of this. > > >> If btrfs internally creates some virtual _block_ device for its pool, it should > >> present it here too with slaves/holders. If not, why it should create any links there? > > > > Yeah, that's the most bothering part for me. The biggest customers of > > bd_claim are filesystems and all these custom symlinkeries don't do > > nothing for them. It just doesn't make a whole lot of sense to me. > > Btrfs does not use any blockdev as the master for good reason, and it > can never map its slaves inside of /sys/block. Yep, expected and correct response :-) Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 15:59 ` Karel Zak @ 2011-01-13 16:10 ` Kay Sievers 2011-01-14 15:07 ` Karel Zak 0 siblings, 1 reply; 35+ messages in thread From: Kay Sievers @ 2011-01-13 16:10 UTC (permalink / raw) To: Karel Zak Cc: Tejun Heo, Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote: > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote: >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote: >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: >> >> Maybe, but this was not invented in DM/MD camp:-) >> >> Probably Kay or Greg can answer why it was done this way? >> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :) >> >> And yes, symlinks *look* nice and simple for the outside, but they are >> not, and have all sorts of problems like non-atomic updates, make it > > Sounds like sysfs implementation problem, right? It's a normal multi-file problem. It can by-definition not be atomic without doing really weird locking things. > If there is noway to fix sysfs then we can add a generic ioctl or > /sys/block/<device>/{slave,holder}_list files with list of > holders/slaves. Yeah, we've been there with the btrfs problem. For btrfs it woud probably need to be something statfs()-like. > But please, don't force userspace to use *claimer-specific* > methods to answer *generic questions* like slave/holder dependencies > between devices. The links exist only for dm and md so far, I think. It's the classical multiple-parents-in-a-tree problem. We have that for bonded network devices and some IO buses too. There is no nice representation for these reversed-trees-in-the-tree so far. Kay ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 16:10 ` [dm-devel] " Kay Sievers @ 2011-01-14 15:07 ` Karel Zak 2011-01-14 15:23 ` Kay Sievers 0 siblings, 1 reply; 35+ messages in thread From: Karel Zak @ 2011-01-14 15:07 UTC (permalink / raw) To: Kay Sievers Cc: Tejun Heo, Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote: > On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote: > > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote: > >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote: > >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: > >> >> Maybe, but this was not invented in DM/MD camp:-) > >> >> Probably Kay or Greg can answer why it was done this way? > >> > >> It's not from Greg or Kay. It just appeared some day in the context of dm. :) > >> > >> And yes, symlinks *look* nice and simple for the outside, but they are > >> not, and have all sorts of problems like non-atomic updates, make it > > > > Sounds like sysfs implementation problem, right? > > It's a normal multi-file problem. It can by-definition not be atomic > without doing really weird locking things. BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holders files are symlinks. The important thing is the filename (/sys/block/.../slaves/<name>) only. We don't follow the symlinks and we don't use readlink() there. It means that you can replace the symlinks with regular files where in the file contents is for example maj:min, etc. Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-14 15:07 ` Karel Zak @ 2011-01-14 15:23 ` Kay Sievers 0 siblings, 0 replies; 35+ messages in thread From: Kay Sievers @ 2011-01-14 15:23 UTC (permalink / raw) To: Karel Zak Cc: Tejun Heo, Milan Broz, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel On Fri, Jan 14, 2011 at 16:07, Karel Zak <kzak@redhat.com> wrote: > On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote: >> On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote: >> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote: >> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote: >> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote: >> >> >> Maybe, but this was not invented in DM/MD camp:-) >> >> >> Probably Kay or Greg can answer why it was done this way? >> >> >> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :) >> >> >> >> And yes, symlinks *look* nice and simple for the outside, but they are >> >> not, and have all sorts of problems like non-atomic updates, make it >> > >> > Sounds like sysfs implementation problem, right? >> >> It's a normal multi-file problem. It can by-definition not be atomic >> without doing really weird locking things. > > BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holders > files are symlinks. > > The important thing is the filename (/sys/block/.../slaves/<name>) > only. We don't follow the symlinks and we don't use readlink() there. > > It means that you can replace the symlinks with regular files where > in the file contents is for example maj:min, etc. I don't think we really can change anything here, there are more users of it. If we would go changing things here, the file names should never be the device name but the format "b8:32" for blockdevs, "n22" for the network ifindex, ... which is the unchangeable unique kernel id -- not depending on any possible device rename, or fast destroy/re-create or anything like that. Kay -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:30 ` Tejun Heo 2011-01-13 14:43 ` Kay Sievers @ 2011-01-13 14:45 ` Theodore Tso 2011-01-13 20:18 ` NeilBrown 2011-01-13 14:49 ` Milan Broz 2 siblings, 1 reply; 35+ messages in thread From: Theodore Tso @ 2011-01-13 14:45 UTC (permalink / raw) To: Tejun Heo Cc: Milan Broz, Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On Jan 13, 2011, at 9:30 AM, Tejun Heo wrote: >> > > Yeah, that's the most bothering part for me. The biggest customers of > bd_claim are filesystems and all these custom symlinkeries don't do > nothing for them. It just doesn't make a whole lot of sense to me. > Well, if there's better way to do things, we can send patches to libblkid to switch away from using sysfs, assuming it's using a new enough kernel. The primary problem that we're trying to solve is to know whether a particular device contains a file system that should potentially mounted (or fsck'ed, or used as a external journal device, etc.) If the file system is located on a raid 0 device created using devicemapper, the first physical block device could look like a file system. So what we want is a very easy way of determining, "is this device being used by the device mapper layer"? If it is, then it's probably not the droids we are looking for. We'll keep looking at the rest of the block devices in the system, and when we find the dm-concatenated devices, we can properly identify it. Can you suggest a better way doing what it is we need to do? -- Ted ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:45 ` Theodore Tso @ 2011-01-13 20:18 ` NeilBrown 2011-01-13 20:41 ` Ted Ts'o 0 siblings, 1 reply; 35+ messages in thread From: NeilBrown @ 2011-01-13 20:18 UTC (permalink / raw) To: Theodore Tso Cc: Tejun Heo, Milan Broz, Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On Thu, 13 Jan 2011 09:45:54 -0500 Theodore Tso <tytso@MIT.EDU> wrote: > > On Jan 13, 2011, at 9:30 AM, Tejun Heo wrote: > >> > > > > Yeah, that's the most bothering part for me. The biggest customers of > > bd_claim are filesystems and all these custom symlinkeries don't do > > nothing for them. It just doesn't make a whole lot of sense to me. > > > > Well, if there's better way to do things, we can send patches to libblkid to switch away from using sysfs, assuming it's using a new enough kernel. > > The primary problem that we're trying to solve is to know whether a particular device contains a file system that should potentially mounted (or fsck'ed, or used as a external journal device, etc.) > > If the file system is located on a raid 0 device created using devicemapper, the first physical block device could look like a file system. So what we want is a very easy way of determining, "is this device being used by the device mapper layer"? If it is, then it's probably not the droids we are looking for. We'll keep looking at the rest of the block devices in the system, and when we find the dm-concatenated devices, we can properly identify it. > > Can you suggest a better way doing what it is we need to do? open(O_EXCL) will fail on a block device if it is being used by anything else - a filesystem or a dm target or an md array or .... So if the *only* thing you want is "is this currently an active part of something else", then O_EXCL works since 2.6.0 (I think). NeilBrown ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 20:18 ` NeilBrown @ 2011-01-13 20:41 ` Ted Ts'o 2011-01-14 16:20 ` Tejun Heo [not found] ` <20110114162022.GC978@htj.dyndns.org> 0 siblings, 2 replies; 35+ messages in thread From: Ted Ts'o @ 2011-01-13 20:41 UTC (permalink / raw) To: NeilBrown Cc: Tejun Heo, Milan Broz, Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote: > > open(O_EXCL) will fail on a block device if it is being used by anything else > - a filesystem or a dm target or an md array or .... > > So if the *only* thing you want is "is this currently an active part of > something else", then O_EXCL works since 2.6.0 (I think). Unfortunately, that won't distinguish between a currently active file system, and a device which is being used by a dm target, which is what we want to do. - Ted ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 20:41 ` Ted Ts'o @ 2011-01-14 16:20 ` Tejun Heo [not found] ` <20110114162022.GC978@htj.dyndns.org> 1 sibling, 0 replies; 35+ messages in thread From: Tejun Heo @ 2011-01-14 16:20 UTC (permalink / raw) To: Ted Ts'o, NeilBrown, Milan Broz, Karel Zak, device-mapper development, Jun'ichi Nomura <j-no Hello, Ted, Neil. On Thu, Jan 13, 2011 at 03:41:45PM -0500, Ted Ts'o wrote: > On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote: > > > > open(O_EXCL) will fail on a block device if it is being used by anything else > > - a filesystem or a dm target or an md array or .... > > > > So if the *only* thing you want is "is this currently an active part of > > something else", then O_EXCL works since 2.6.0 (I think). > > Unfortunately, that won't distinguish between a currently active file > system, and a device which is being used by a dm target, which is what > we want to do. Hmmm... that's already possible with the existing holders symlinks, right? As Kay said in another message, I don't think we can do anything about the symlinks at this point. It already has userland users, so we'll have to maintain them and there's no reason to create something else for the same functionality. It's silly that there's no way to tell whether the device is mounted from a given block device but then again we've been working around it by reverse mapping it till now so unless there's a new requirement maybe it's okay as it is now. Ideally, it would have been nice and more fitting with the whole bd claim API if we just exported single attribute which identifies the current holder supplied at the time of claiming (just the kernel identifier in string), so that we can forward-map the exclusive opener in general instead of having to reverse-map it for everything other than md/dm. Thank you. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <20110114162022.GC978@htj.dyndns.org>]
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() [not found] ` <20110114162022.GC978@htj.dyndns.org> @ 2011-01-14 17:59 ` Ted Ts'o 2011-01-14 18:23 ` Tejun Heo 0 siblings, 1 reply; 35+ messages in thread From: Ted Ts'o @ 2011-01-14 17:59 UTC (permalink / raw) To: Tejun Heo Cc: NeilBrown, Milan Broz, Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On Fri, Jan 14, 2011 at 05:20:22PM +0100, Tejun Heo wrote: > > Hmmm... that's already possible with the existing holders symlinks, > right? As Kay said in another message, I don't think we can do > anything about the symlinks at this point. It already has userland > users, so we'll have to maintain them and there's no reason to create > something else for the same functionality. Yes, agreed. My point was that if we have a better system, we could migrate blkid and other users to new interface, and maybe in a few years, we can deprecate that interface. So we do have to maintain them for at least the medium term, but that shouldn't stop us from divising a better interface, and then gradually migrating systems to use that newer and better interface. - Ted ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-14 17:59 ` Ted Ts'o @ 2011-01-14 18:23 ` Tejun Heo 0 siblings, 0 replies; 35+ messages in thread From: Tejun Heo @ 2011-01-14 18:23 UTC (permalink / raw) To: Ted Ts'o, Tejun Heo, NeilBrown, Milan Broz, Karel Zak, device-mapper development Hello, Ted. On Fri, Jan 14, 2011 at 6:59 PM, Ted Ts'o <tytso@mit.edu> wrote: > Yes, agreed. My point was that if we have a better system, we could > migrate blkid and other users to new interface, and maybe in a few > years, we can deprecate that interface. So we do have to maintain > them for at least the medium term, but that shouldn't stop us from > divising a better interface, and then gradually migrating systems to > use that newer and better interface. Oh yeah, fully agreed there and the implementation at the block layer isn't even gonna be difficult. Just a single attribute which contains a string which can be specified at the time of claiming. We probably can introduce a new interface where the @holder is always a string the content of which should uniquely identify the holder in human/machine readable form and add a few helpers to format those strings so that the same type of usages use consistently formatted strings. For nested devices, kdev should do. For filesystems, the mount point maybe (it should be something which identifies it absolutely)? For specific programs (cd burner, whatnot), pid and so on. The only question is, would that be actually necessary? By now, we already have mostly working reverse mapping for most things which matter through blkid, fs information, fd listing under proc and so on. They sure are ugly and probably unreliable at times but is it pressing enough to introduce new interface and migrate things over or would we be just creating churns? I don't really know how these things look from userland so am genuinely unsure what the answer is. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:30 ` Tejun Heo 2011-01-13 14:43 ` Kay Sievers 2011-01-13 14:45 ` Theodore Tso @ 2011-01-13 14:49 ` Milan Broz 2011-01-14 16:35 ` Tejun Heo 2 siblings, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 14:49 UTC (permalink / raw) To: Tejun Heo Cc: Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers On 01/13/2011 03:30 PM, Tejun Heo wrote: Hi, >> Anyway, it is /sys/block - so it represents block devices. >> >> If btrfs internally creates some virtual _block_ device for its pool, it should >> present it here too with slaves/holders. If not, why it should create any links there? > > Yeah, that's the most bothering part for me. The biggest customers of > bd_claim are filesystems and all these custom symlinkeries don't do > nothing for them. It just doesn't make a whole lot of sense to me. Well, then it is not optimal for fs and should it be separated? There is also /sys/fs now... I think represent dependences of block devices in some generic (and exactly defined) way is useful. So if there is something missing, lets define it properly. The whole idea behind lsblk was practical: we responded many questions why "device is busy" and what keeps it open in stacked subsystem. Just try fdisk, mdadm, dmsetup table, losetup -a, kpartx, cryptsetup, etc ... So listing block device tree using ONE interface (sysfs) to check and display device tree for ALL subsystems seems to be good idea for me. In this exactly defined case. I am not saying that symlinks are perfect, just that generic interface here is not superfluous. Thanks, Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() 2011-01-13 14:49 ` Milan Broz @ 2011-01-14 16:35 ` Tejun Heo 0 siblings, 0 replies; 35+ messages in thread From: Tejun Heo @ 2011-01-14 16:35 UTC (permalink / raw) To: Milan Broz Cc: Karel Zak, device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro, linux-fsdevel, Kay Sievers Hello, On Thu, Jan 13, 2011 at 03:49:50PM +0100, Milan Broz wrote: > So listing block device tree using ONE interface (sysfs) to check > and display device tree for ALL subsystems seems to be good idea for > me. In this exactly defined case. > > I am not saying that symlinks are perfect, just that generic > interface here is not superfluous. I'm not saying finding out the current owner is superflous. That actually is useful but what's implemented now doesn't satisfy that in any generic manner. It only works for md/dm and is designed in such way that it can't be used out of that scope. So, it's only useful for md/dm and for those two cases it tries to show information which doesn't really belong there using an overly complex mechanism. Block device open allows _nesting_ by the claiming device. It is never meant to track different nested claimers and it shows in the implementation and interface. The holder tracking doesn't fit anywhere. It's a completely separate piece of logic which doesn't mix with anything else in the block layer. If you consider its only user - dm-table, the whole picture is somewhat uglier. dm-table tracks how it uses block devices but only within a single target, so there are two separate layers of reference counting going on. This again is caused by the above stated design inconsistency. As block device only considers single claimer which doesn't have to be another device, the owner naturally should have been something which only points to single in-kernel entity in a way which doesn't require link to another sysfs entry. Karel pointed out that having different methods between md and dm would be silly; however, that is a problem which should be solved between dm and md and actually is a quite artificial problem (or solution). It only exists because dm and md do very similar things in many cases. A block device can be used by many more things which are way out of the scope which can be represented in the sysfs hierarchy. It simply is impossible to represent that relationship with symlinks under block device sysfs hierarchy. Thank you. -- tejun ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] block: restore multiple bd_link_disk_holder() support 2011-01-13 2:19 ` Jun'ichi Nomura 2011-01-13 11:06 ` Tejun Heo @ 2011-01-13 17:21 ` Tejun Heo 2011-01-13 18:42 ` Milan Broz 1 sibling, 1 reply; 35+ messages in thread From: Tejun Heo @ 2011-01-13 17:21 UTC (permalink / raw) To: Jens Axboe, Jun'ichi Nomura Cc: Milan Broz, Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development, Kay Sievers Commit e09b457b (block: simplify holder symlink handling) incorrectly assumed that there is only one holder at maximum. dm may use multiple holders. Remove the single holder assumption and automatic removal of the link. Let the callers explicitly remove them. This change makes it even more alien from the rest of the block layer. While at it, note that this facility should not be used by anyone else than the current ones. Sysfs symlinks shouldn't be abused like this and the whole thing doesn't belong in the block layer at all. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Milan Broz <mbroz@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Kay Sievers <kay.sievers@vrfy.org> --- Milan, Jun, can you guys please verify this works correctly for the multi holder dm case? Thank you. drivers/md/dm-table.c | 1 + drivers/md/md.c | 1 + fs/block_dev.c | 28 +++++++++++++++------------- include/linux/fs.h | 9 ++++++--- 4 files changed, 23 insertions(+), 16 deletions(-) Index: work/include/linux/fs.h =================================================================== --- work.orig/include/linux/fs.h +++ work/include/linux/fs.h @@ -663,9 +663,6 @@ struct block_device { void * bd_holder; int bd_holders; bool bd_write_holder; -#ifdef CONFIG_SYSFS - struct gendisk * bd_holder_disk; /* for sysfs slave linkng */ -#endif struct block_device * bd_contains; unsigned bd_block_size; struct hd_struct * bd_part; @@ -2045,12 +2042,18 @@ extern struct block_device *blkdev_get_b extern int blkdev_put(struct block_device *bdev, fmode_t mode); #ifdef CONFIG_SYSFS extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); +extern void bd_unlink_disk_holder(struct block_device *bdev, + struct gendisk *disk); #else static inline int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk) { return 0; } +static inline void bd_unlink_disk_holder(struct block_device *bdev, + struct gendisk *disk) +{ +} #endif #endif Index: work/drivers/md/dm-table.c =================================================================== --- work.orig/drivers/md/dm-table.c +++ work/drivers/md/dm-table.c @@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte if (!d->dm_dev.bdev) return; + bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md)); blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL); d->dm_dev.bdev = NULL; } Index: work/drivers/md/md.c =================================================================== --- work.orig/drivers/md/md.c +++ work/drivers/md/md.c @@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r MD_BUG(); return; } + bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk); list_del_rcu(&rdev->same_set); printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b)); rdev->mddev = NULL; Index: work/fs/block_dev.c =================================================================== --- work.orig/fs/block_dev.c +++ work/fs/block_dev.c @@ -788,6 +788,8 @@ static void del_symlink(struct kobject * * @bdev: the claimed slave bdev * @disk: the holding disk * + * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT. + * * This functions creates the following sysfs symlinks. * * - from "slaves" directory of the holder @disk to the claimed @bdev @@ -815,7 +817,7 @@ int bd_link_disk_holder(struct block_dev mutex_lock(&bdev->bd_mutex); - WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk); + WARN_ON_ONCE(!bdev->bd_holder); /* FIXME: remove the following once add_disk() handles errors */ if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir)) @@ -831,27 +833,28 @@ int bd_link_disk_holder(struct block_dev goto out_unlock; } - bdev->bd_holder_disk = disk; out_unlock: mutex_unlock(&bdev->bd_mutex); return ret; } EXPORT_SYMBOL_GPL(bd_link_disk_holder); -static void bd_unlink_disk_holder(struct block_device *bdev) +/** + * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder() + * @bdev: the calimed slave bdev + * @disk: the holding disk + * + * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT. + * + * CONTEXT: + * Might sleep. + */ +void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk) { - struct gendisk *disk = bdev->bd_holder_disk; - - bdev->bd_holder_disk = NULL; - if (!disk) - return; - del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj); } -#else -static inline void bd_unlink_disk_holder(struct block_device *bdev) -{ } +EXPORT_SYMBOL_GPL(bd_unlink_disk_holder); #endif /** @@ -1374,7 +1377,6 @@ int blkdev_put(struct block_device *bdev * unblock evpoll if it was a write holder. */ if (bdev_free) { - bd_unlink_disk_holder(bdev); if (bdev->bd_write_holder) { disk_unblock_events(bdev->bd_disk); bdev->bd_write_holder = false; ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] block: restore multiple bd_link_disk_holder() support 2011-01-13 17:21 ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo @ 2011-01-13 18:42 ` Milan Broz 2011-01-14 7:31 ` Jun'ichi Nomura 0 siblings, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-13 18:42 UTC (permalink / raw) To: Tejun Heo Cc: Jens Axboe, Jun'ichi Nomura, Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development, Kay Sievers, Alasdair G Kergon On 01/13/2011 06:21 PM, Tejun Heo wrote: > Commit e09b457b (block: simplify holder symlink handling) incorrectly > assumed that there is only one holder at maximum. dm may use multiple > holders. Remove the single holder assumption and automatic removal of > the link. Let the callers explicitly remove them. This change makes > it even more alien from the rest of the block layer. > > While at it, note that this facility should not be used by anyone else > than the current ones. Sysfs symlinks shouldn't be abused like this > and the whole thing doesn't belong in the block layer at all. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Reported-by: Milan Broz <mbroz@redhat.com> > Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> > Cc: Neil Brown <neilb@suse.de> > Cc: linux-raid@vger.kernel.org > Cc: Kay Sievers <kay.sievers@vrfy.org> > --- > Milan, Jun, can you guys please verify this works correctly for the > multi holder dm case? Thank you. Hi, unfortunately not. And the problem is much worse, it breaks lvm resize operation completely. I cherry-picked you patch to my tree, it fails, reverting helps. Following test is over linux-next with your last patch applied: Test case (lvresize of volume over 3 disks sdb,c,d): # pvcreate /dev/sd[bcd] Physical volume "/dev/sdb" successfully created Physical volume "/dev/sdc" successfully created Physical volume "/dev/sdd" successfully created # vgcreate vg_test /dev/sd[bcd] Volume group "vg_test" successfully created # lvcreate -l100%FREE -n lv vg_test Logical volume "lv" created # dmsetup table vg_test-lv: 0 409600 linear 8:16 2048 vg_test-lv: 409600 409600 linear 8:32 2048 vg_test-lv: 819200 409600 linear 8:48 2048 so we have active LV mapped to 3 devices now. Now online resize it. I means that it will create inactive table, then switch active (so the magic with claiming disks applies): # lvresize -L-10M vg_test/lv Rounding up size to full physical extent 8.00 MiB WARNING: Reducing active logical volume to 592.00 MiB THIS MAY DESTROY YOUR DATA (filesystem etc.) Do you really want to reduce lv? [y/n]: y Reducing logical volume lv to 592.00 MiB device-mapper: reload ioctl failed: Invalid argument Failed to suspend lv and in syslog: [ 291.221081] ------------[ cut here ]------------ [ 291.221111] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0x6b/0x80() [ 291.221130] Hardware name: VMware Virtual Platform [ 291.221140] sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0/slaves/sdb' [ 291.221162] Modules linked in: usbcore dm_mod [ 291.221265] Pid: 4074, comm: lvresize Tainted: G W 2.6.37-next-20110113+ #2 [ 291.221269] Call Trace: [ 291.221286] [<c102e58f>] ? warn_slowpath_common+0x65/0x7a [ 291.221290] [<c10f8911>] ? sysfs_add_one+0x6b/0x80 [ 291.221295] [<c102e608>] ? warn_slowpath_fmt+0x26/0x2a [ 291.221299] [<c10f8911>] ? sysfs_add_one+0x6b/0x80 [ 291.221304] [<c10f9482>] ? sysfs_do_create_link+0xda/0x164 [ 291.221308] [<c10f9522>] ? sysfs_create_link+0xa/0xc [ 291.221314] [<c10db85c>] ? bd_link_disk_holder+0x66/0xad [ 291.221484] [<d08289ea>] ? open_dev+0x47/0x67 [dm_mod] [ 291.221492] [<d0828aff>] ? dm_get_device+0xf5/0x1d5 [dm_mod] [ 291.221502] [<c11bf168>] ? vsscanf+0x364/0x3ee [ 291.221510] [<c10b250a>] ? cache_alloc_debugcheck_after+0xf/0x180 [ 291.221523] [<d0829812>] ? linear_ctr+0x86/0xc0 [dm_mod] [ 291.221531] [<d0829287>] ? dm_table_add_target+0x153/0x1d3 [dm_mod] [ 291.221538] [<d082adea>] ? table_load+0x1fd/0x21e [dm_mod] [ 291.221545] [<d082b9f8>] ? dm_ctl_ioctl+0x188/0x1c8 [dm_mod] [ 291.221551] [<d082abed>] ? table_load+0x0/0x21e [dm_mod] [ 291.221558] [<d082b870>] ? dm_ctl_ioctl+0x0/0x1c8 [dm_mod] [ 291.221565] [<c10c38ab>] ? do_vfs_ioctl+0x493/0x4d8 [ 291.221573] [<c1313511>] ? do_page_fault+0x3ee/0x418 [ 291.221578] [<c10c391e>] ? sys_ioctl+0x2e/0x48 [ 291.221583] [<c1002853>] ? sysenter_do_call+0x12/0x32 [ 291.221609] ---[ end trace c21a5bad605a4a87 ]--- [ 291.221760] device-mapper: table: 254:0: linear: dm-linear: Device lookup failed [ 291.221782] device-mapper: ioctl: error adding target to table Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] block: restore multiple bd_link_disk_holder() support 2011-01-13 18:42 ` Milan Broz @ 2011-01-14 7:31 ` Jun'ichi Nomura 2011-01-14 16:10 ` [PATCH UPDATED] " Tejun Heo 0 siblings, 1 reply; 35+ messages in thread From: Jun'ichi Nomura @ 2011-01-14 7:31 UTC (permalink / raw) To: Milan Broz, Tejun Heo Cc: Jens Axboe, Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development, Kay Sievers, Alasdair G Kergon Hi, On 01/14/11 03:42, Milan Broz wrote: >> Milan, Jun, can you guys please verify this works correctly for the >> multi holder dm case? Thank you. > > Hi, > > unfortunately not. And the problem is much worse, > it breaks lvm resize operation completely. For quick testing, try this: ("/dev/sdf" can be other block device. "testA" and "testB" can be other dm names which aren't used in your machine.) # echo "0 10 linear /dev/sdf 0" | dmsetup create testA # echo "0 10 linear /dev/sdf 10" | dmsetup create testB (at this point, /dev/sdf has multiple holders) # echo "0 20 linear /dev/sdf 10" | dmsetup load testB (at this point, /dev/mapper/testB has 2 claims on /dev/sdf) # dmsetup resume testB (/dev/mapper/testB has 1 claim on /dev/sdf) # dmsetup remove testA # dmsetup remove testB Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support 2011-01-14 7:31 ` Jun'ichi Nomura @ 2011-01-14 16:10 ` Tejun Heo 2011-01-14 21:09 ` Milan Broz 0 siblings, 1 reply; 35+ messages in thread From: Tejun Heo @ 2011-01-14 16:10 UTC (permalink / raw) To: Jun'ichi Nomura Cc: Jens Axboe, Valdis.Kletnieks, linux-kernel, linux-raid, device-mapper development, Alexander Viro, linux-fsdevel, Alasdair G Kergon, Milan Broz Commit e09b457b (block: simplify holder symlink handling) incorrectly assumed that there is only one link at maximum. dm may use multiple links and expects block layer to track reference count for each link, which is different from and unrelated to the exclusive device holder identified by @holder when the device is opened. Remove the single holder assumption and automatic removal of the link and revive the per-link reference count tracking. The code essentially behaves the same as before commit e09b457b sans the unnecessary kobject reference count dancing. While at it, note that this facility should not be used by anyone else than the current ones. Sysfs symlinks shouldn't be abused like this and the whole thing doesn't belong in the block layer at all. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Milan Broz <mbroz@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Kay Sievers <kay.sievers@vrfy.org> --- Thanks for the test commands. They were very helpful. Can you please test this one? Thanks. drivers/md/dm-table.c | 1 drivers/md/md.c | 1 fs/block_dev.c | 93 ++++++++++++++++++++++++++++++++++++++++---------- include/linux/fs.h | 8 +++- 4 files changed, 84 insertions(+), 19 deletions(-) Index: work/include/linux/fs.h =================================================================== --- work.orig/include/linux/fs.h +++ work/include/linux/fs.h @@ -664,7 +664,7 @@ struct block_device { int bd_holders; bool bd_write_holder; #ifdef CONFIG_SYSFS - struct gendisk * bd_holder_disk; /* for sysfs slave linkng */ + struct list_head bd_holder_disks; #endif struct block_device * bd_contains; unsigned bd_block_size; @@ -2045,12 +2045,18 @@ extern struct block_device *blkdev_get_b extern int blkdev_put(struct block_device *bdev, fmode_t mode); #ifdef CONFIG_SYSFS extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); +extern void bd_unlink_disk_holder(struct block_device *bdev, + struct gendisk *disk); #else static inline int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk) { return 0; } +static inline void bd_unlink_disk_holder(struct block_device *bdev, + struct gendisk *disk) +{ +} #endif #endif Index: work/drivers/md/dm-table.c =================================================================== --- work.orig/drivers/md/dm-table.c +++ work/drivers/md/dm-table.c @@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte if (!d->dm_dev.bdev) return; + bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md)); blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL); d->dm_dev.bdev = NULL; } Index: work/drivers/md/md.c =================================================================== --- work.orig/drivers/md/md.c +++ work/drivers/md/md.c @@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r MD_BUG(); return; } + bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk); list_del_rcu(&rdev->same_set); printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b)); rdev->mddev = NULL; Index: work/fs/block_dev.c =================================================================== --- work.orig/fs/block_dev.c +++ work/fs/block_dev.c @@ -426,6 +426,9 @@ static void init_once(void *foo) mutex_init(&bdev->bd_mutex); INIT_LIST_HEAD(&bdev->bd_inodes); INIT_LIST_HEAD(&bdev->bd_list); +#ifdef CONFIG_SYSFS + INIT_LIST_HEAD(&bdev->bd_holder_disks); +#endif inode_init_once(&ei->vfs_inode); /* Initialize mutex for freeze. */ mutex_init(&bdev->bd_fsfreeze_mutex); @@ -773,6 +776,23 @@ static struct block_device *bd_start_cla } #ifdef CONFIG_SYSFS +struct bd_holder_disk { + struct list_head list; + struct gendisk *disk; + int refcnt; +}; + +static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev, + struct gendisk *disk) +{ + struct bd_holder_disk *holder; + + list_for_each_entry(holder, &bdev->bd_holder_disks, list) + if (holder->disk == disk) + return holder; + return NULL; +} + static int add_symlink(struct kobject *from, struct kobject *to) { return sysfs_create_link(from, to, kobject_name(to)); @@ -788,6 +808,8 @@ static void del_symlink(struct kobject * * @bdev: the claimed slave bdev * @disk: the holding disk * + * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT. + * * This functions creates the following sysfs symlinks. * * - from "slaves" directory of the holder @disk to the claimed @bdev @@ -811,47 +833,83 @@ static void del_symlink(struct kobject * */ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk) { + struct bd_holder_disk *holder; int ret = 0; mutex_lock(&bdev->bd_mutex); - WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk); + WARN_ON_ONCE(!bdev->bd_holder); /* FIXME: remove the following once add_disk() handles errors */ if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir)) goto out_unlock; - ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); - if (ret) + holder = bd_find_holder_disk(bdev, disk); + if (holder) { + holder->refcnt++; goto out_unlock; + } - ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj); - if (ret) { - del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); + holder = kzalloc(sizeof(*holder), GFP_KERNEL); + if (!holder) { + ret = -ENOMEM; goto out_unlock; } - bdev->bd_holder_disk = disk; + INIT_LIST_HEAD(&holder->list); + holder->disk = disk; + holder->refcnt = 1; + + ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); + if (ret) + goto out_free; + + ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj); + if (ret) + goto out_del; + + list_add(&holder->list, &bdev->bd_holder_disks); + goto out_unlock; + +out_del: + del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); +out_free: + kfree(holder); out_unlock: mutex_unlock(&bdev->bd_mutex); return ret; } EXPORT_SYMBOL_GPL(bd_link_disk_holder); -static void bd_unlink_disk_holder(struct block_device *bdev) +/** + * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder() + * @bdev: the calimed slave bdev + * @disk: the holding disk + * + * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT. + * + * CONTEXT: + * Might sleep. + */ +void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk) { - struct gendisk *disk = bdev->bd_holder_disk; + struct bd_holder_disk *holder; - bdev->bd_holder_disk = NULL; - if (!disk) - return; + mutex_lock(&bdev->bd_mutex); - del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); - del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj); + holder = bd_find_holder_disk(bdev, disk); + + if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) { + del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj); + del_symlink(bdev->bd_part->holder_dir, + &disk_to_dev(disk)->kobj); + list_del_init(&holder->list); + kfree(holder); + } + + mutex_unlock(&bdev->bd_mutex); } -#else -static inline void bd_unlink_disk_holder(struct block_device *bdev) -{ } +EXPORT_SYMBOL_GPL(bd_unlink_disk_holder); #endif /** @@ -1374,7 +1432,6 @@ int blkdev_put(struct block_device *bdev * unblock evpoll if it was a write holder. */ if (bdev_free) { - bd_unlink_disk_holder(bdev); if (bdev->bd_write_holder) { disk_unblock_events(bdev->bd_disk); bdev->bd_write_holder = false; ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support 2011-01-14 16:10 ` [PATCH UPDATED] " Tejun Heo @ 2011-01-14 21:09 ` Milan Broz 2011-01-17 0:18 ` Jun'ichi Nomura 0 siblings, 1 reply; 35+ messages in thread From: Milan Broz @ 2011-01-14 21:09 UTC (permalink / raw) To: Tejun Heo Cc: Jun'ichi Nomura, Jens Axboe, Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development, Kay Sievers, Alasdair G Kergon On 01/14/2011 05:10 PM, Tejun Heo wrote: > Commit e09b457b (block: simplify holder symlink handling) incorrectly > assumed that there is only one link at maximum. dm may use multiple > links and expects block layer to track reference count for each link, > which is different from and unrelated to the exclusive device holder > identified by @holder when the device is opened. > > Remove the single holder assumption and automatic removal of the link > and revive the per-link reference count tracking. The code > essentially behaves the same as before commit e09b457b sans the > unnecessary kobject reference count dancing. > > While at it, note that this facility should not be used by anyone else > than the current ones. Sysfs symlinks shouldn't be abused like this > and the whole thing doesn't belong in the block layer at all. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Reported-by: Milan Broz <mbroz@redhat.com> > Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> > Cc: Neil Brown <neilb@suse.de> > Cc: linux-raid@vger.kernel.org > Cc: Kay Sievers <kay.sievers@vrfy.org> > --- > Thanks for the test commands. They were very helpful. Can you please > test this one? Hi, yes, this one works for me. I run full lvm2 testsuite and no warnings. Thanks! Tested-by: Milan Broz <mbroz@redhat.com> Milan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support 2011-01-14 21:09 ` Milan Broz @ 2011-01-17 0:18 ` Jun'ichi Nomura 0 siblings, 0 replies; 35+ messages in thread From: Jun'ichi Nomura @ 2011-01-17 0:18 UTC (permalink / raw) To: Milan Broz, Tejun Heo Cc: Jens Axboe, Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel, linux-raid, device-mapper development, Kay Sievers, Alasdair G Kergon On 01/15/11 06:09, Milan Broz wrote: > On 01/14/2011 05:10 PM, Tejun Heo wrote: >> Commit e09b457b (block: simplify holder symlink handling) incorrectly >> assumed that there is only one link at maximum. dm may use multiple >> links and expects block layer to track reference count for each link, >> which is different from and unrelated to the exclusive device holder >> identified by @holder when the device is opened. >> >> Remove the single holder assumption and automatic removal of the link >> and revive the per-link reference count tracking. The code >> essentially behaves the same as before commit e09b457b sans the >> unnecessary kobject reference count dancing. >> >> While at it, note that this facility should not be used by anyone else >> than the current ones. Sysfs symlinks shouldn't be abused like this >> and the whole thing doesn't belong in the block layer at all. >> >> Signed-off-by: Tejun Heo <tj@kernel.org> >> Reported-by: Milan Broz <mbroz@redhat.com> >> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> >> Cc: Neil Brown <neilb@suse.de> >> Cc: linux-raid@vger.kernel.org >> Cc: Kay Sievers <kay.sievers@vrfy.org> >> --- >> Thanks for the test commands. They were very helpful. Can you please >> test this one? > > Hi, > > yes, this one works for me. I run full lvm2 testsuite and no warnings. > Thanks! > > Tested-by: Milan Broz <mbroz@redhat.com> Thanks Tejun, Milan! If it passed my quick test and the lvm2 testsuite, I have nothing to add here. And the code looks ok, too. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2011-01-17 0:18 UTC | newest] Thread overview: 35+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-01-12 17:34 linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() Valdis.Kletnieks 2011-01-13 0:23 ` Milan Broz 2011-01-13 2:19 ` Jun'ichi Nomura 2011-01-13 11:06 ` Tejun Heo 2011-01-13 11:26 ` [dm-devel] " Milan Broz 2011-01-13 12:27 ` Karel Zak 2011-01-13 13:12 ` Tejun Heo 2011-01-13 13:26 ` Karel Zak 2011-01-13 13:37 ` Tejun Heo 2011-01-13 13:52 ` Tejun Heo 2011-01-13 13:58 ` Milan Broz 2011-01-13 14:11 ` Tejun Heo 2011-01-13 14:25 ` Milan Broz 2011-01-13 14:30 ` Tejun Heo 2011-01-13 14:43 ` Kay Sievers 2011-01-13 15:03 ` Milan Broz 2011-01-14 7:38 ` Jun'ichi Nomura 2011-01-13 15:59 ` Karel Zak 2011-01-13 16:10 ` [dm-devel] " Kay Sievers 2011-01-14 15:07 ` Karel Zak 2011-01-14 15:23 ` Kay Sievers 2011-01-13 14:45 ` Theodore Tso 2011-01-13 20:18 ` NeilBrown 2011-01-13 20:41 ` Ted Ts'o 2011-01-14 16:20 ` Tejun Heo [not found] ` <20110114162022.GC978@htj.dyndns.org> 2011-01-14 17:59 ` Ted Ts'o 2011-01-14 18:23 ` Tejun Heo 2011-01-13 14:49 ` Milan Broz 2011-01-14 16:35 ` Tejun Heo 2011-01-13 17:21 ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo 2011-01-13 18:42 ` Milan Broz 2011-01-14 7:31 ` Jun'ichi Nomura 2011-01-14 16:10 ` [PATCH UPDATED] " Tejun Heo 2011-01-14 21:09 ` Milan Broz 2011-01-17 0:18 ` Jun'ichi Nomura
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).