* [RFD] brd.ko Never supported partitions should we remove the dead code ? @ 2014-07-29 16:37 Boaz Harrosh 2014-07-29 16:56 ` Matthew Wilcox 2014-07-29 17:19 ` Ross Zwisler 0 siblings, 2 replies; 11+ messages in thread From: Boaz Harrosh @ 2014-07-29 16:37 UTC (permalink / raw) To: Nick Piggin, linux-scsi, Jens Axboe; +Cc: Ross Zwisler, Matthew Wilcox Hi folks I've been playing with yet unseen prd block device, and hit an issue with partitioning but since prd.c is a copy/paste of brd.c the same exact issue exists with brd. It does not support partitions! An attempt to fdisk say a couple of partitions, then mkfs && mount an individual partition will result in the primary device being accessed and a corrupted disk data. If I enable max_part(=2) properly on modprobe, then fdisk a couple of partitions all goes well But then if I pass the device to Kernel (Say via mount), after a call to bdev = blkdev_get_by_path(dev_name, mode, fs_type); I get the following results: (size is extracted using: i_size_read(bdev->bd_inode); part_size is extracted using: bdev->bd_part->nr_sects << 9; ) dev_name == /dev/ram0 [Jul29 17:40] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24340 bd_inode=ffff88003dc24430 bd_part=ffff88003ca22848 part_size=0x40000000 dev_name == /dev/ram0p1 [ +16.816065] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003d2f6d80 bd_inode=ffff88003d2f6e70 bd_part=ffff88003ca22848 part_size=0x40000000 dev_name == /dev/ram0p2 [Jul29 17:41] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24680 bd_inode=ffff88003dc24770 bd_part=ffff88003ca22848 part_size=0x40000000 We can see that all 3 different bdev's point to the same bd_part, which is the BUG. Funny is that bd_inode is different but contains same wrong size, for exactly the same reason. The fix for this is easy: ---- diff --git drivers/block/brd.c drivers/block/brd.c index c7d138e..0d982d7 100644 --- drivers/block/brd.c +++ drivers/block/brd.c @@ -378,10 +378,12 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, if (!brd) return -ENODEV; + sector += get_start_sect(bdev); if (sector & (PAGE_SECTORS-1)) return -EINVAL; +/* Check is wrong here we need to check against bdev->bd_part->nr_sects */ if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk)) return -ERANGE; page = brd_insert_page(brd, sector); if (!page) return -ENOSPC; @@ -532,11 +532,17 @@ static struct brd_device *brd_init_one(int i) goto out; } +/* In the structure of this driver this will never happen and is wrong + * to do. We should just return NULL if not found. + * This is not the bug it is just DEAD CODE + * brd = brd_alloc(i); if (brd) { add_disk(brd->brd_disk); list_add_tail(&brd->brd_list, &brd_devices); } +*/ + prd = NULL; out: return brd; } @@ -558,7 +564,8 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) kobj = brd ? get_disk(brd->brd_disk) : NULL; mutex_unlock(&brd_devices_mutex); - *part = 0; +// Fix the partition BUG *part comes in correctly must not need to touch it +// *part = 0; return kobj; } --- After this simple fix of commenting out *part = 0, running again I get dev_name == /dev/ram0 [ +0.000004] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003d2f7740 inode=ffff88003d2f7830 part=ffff88001da8c048 part_size=0x40000000 dev_name == /dev/ram0p1 [ +0.000002] foo: [foo_mount:880] size=0x1e748200 bdev=ffff88003dc25040 inode=ffff88003dc25130 part=ffff880036f6a000 part_size=0x1e748200 dev_name == /dev/ram0p2 [ +0.000002] foo: [foo_mount:880] size=0x217b7e00 bdev=ffff88003d2f7a80 inode=ffff88003d2f7b70 part=ffff880036f69000 part_size=0x217b7e00 But before we are running to fix this bug. Could we please do better and just remove the support for partitions all together. Since it *never* worked anyway, so probably no one needs it! Surly no one used it Thanks Boaz ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFD] brd.ko Never supported partitions should we remove the dead code ? 2014-07-29 16:37 [RFD] brd.ko Never supported partitions should we remove the dead code ? Boaz Harrosh @ 2014-07-29 16:56 ` Matthew Wilcox 2014-07-29 17:13 ` Boaz Harrosh 2014-07-29 17:19 ` Ross Zwisler 1 sibling, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2014-07-29 16:56 UTC (permalink / raw) To: Boaz Harrosh; +Cc: Nick Piggin, linux-scsi, Jens Axboe, Ross Zwisler On Tue, Jul 29, 2014 at 07:37:49PM +0300, Boaz Harrosh wrote: > But before we are running to fix this bug. Could we please do better and just remove the support for partitions > all together. > Since it *never* worked anyway, so probably no one needs it! Surly no one used it I fixed this in patch 4/22 of the v8 series. The correct place to handle partitioning is in the block layer, not the individual drivers (nor the filesystems). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] brd.ko Never supported partitions should we remove the dead code ? 2014-07-29 16:56 ` Matthew Wilcox @ 2014-07-29 17:13 ` Boaz Harrosh 0 siblings, 0 replies; 11+ messages in thread From: Boaz Harrosh @ 2014-07-29 17:13 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-scsi, Jens Axboe, Ross Zwisler On 07/29/2014 07:56 PM, Matthew Wilcox wrote: > On Tue, Jul 29, 2014 at 07:37:49PM +0300, Boaz Harrosh wrote: >> But before we are running to fix this bug. Could we please do better and just remove the support for partitions >> all together. >> Since it *never* worked anyway, so probably no one needs it! Surly no one used it > > I fixed this in patch 4/22 of the v8 series. The correct place to > handle partitioning is in the block layer, not the individual drivers > (nor the filesystems). > Sir Mathew hi > @@ -378,10 +378,12 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, > > if (!brd) > return -ENODEV; > + sector += get_start_sect(bdev); > if (sector & (PAGE_SECTORS-1)) > return -EINVAL; > +/* Check is wrong here we need to check against bdev->bd_part->nr_sects */ > if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk)) > return -ERANGE; > page = brd_insert_page(brd, sector); > if (!page) > return -ENOSPC; you mean you fixed the brd_direct_access() hunk by pointing all users to a wrapper that does the proper offsetting, sure. But that is not the main bug I was talking about, the main BUG is that partitions are not supported at all because of the clubber of *part in brd_probe() > @@ -558,7 +564,8 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) > kobj = brd ? get_disk(brd->brd_disk) : NULL; > mutex_unlock(&brd_devices_mutex); > > - *part = 0; > +// Fix the partition BUG *part comes in correctly must not need to touch it > +// *part = 0; > return kobj; > } And my point is that: No one uses partitions with brd, why should we not remove it all together, why fix it and keep it? Cheers Boaz ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] brd.ko Never supported partitions should we remove the dead code ? 2014-07-29 16:37 [RFD] brd.ko Never supported partitions should we remove the dead code ? Boaz Harrosh 2014-07-29 16:56 ` Matthew Wilcox @ 2014-07-29 17:19 ` Ross Zwisler 2014-07-30 11:11 ` Boaz Harrosh 1 sibling, 1 reply; 11+ messages in thread From: Ross Zwisler @ 2014-07-29 17:19 UTC (permalink / raw) To: Boaz Harrosh; +Cc: Nick Piggin, linux-scsi, Jens Axboe, Matthew Wilcox On Tue, 2014-07-29 at 19:37 +0300, Boaz Harrosh wrote: > Hi folks > > I've been playing with yet unseen prd block device, and hit an issue with partitioning > but since prd.c is a copy/paste of brd.c the same exact issue exists with brd. > > It does not support partitions! > > An attempt to fdisk say a couple of partitions, then mkfs && mount an individual > partition will result in the primary device being accessed and a corrupted disk data. > > If I enable max_part(=2) properly on modprobe, then fdisk a couple of partitions all > goes well > > But then if I pass the device to Kernel (Say via mount), after a call to > bdev = blkdev_get_by_path(dev_name, mode, fs_type); > > I get the following results: > (size is extracted using: i_size_read(bdev->bd_inode); > part_size is extracted using: bdev->bd_part->nr_sects << 9; > ) > > dev_name == /dev/ram0 > [Jul29 17:40] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24340 bd_inode=ffff88003dc24430 bd_part=ffff88003ca22848 part_size=0x40000000 > > dev_name == /dev/ram0p1 > [ +16.816065] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003d2f6d80 bd_inode=ffff88003d2f6e70 bd_part=ffff88003ca22848 part_size=0x40000000 > > dev_name == /dev/ram0p2 > [Jul29 17:41] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24680 bd_inode=ffff88003dc24770 bd_part=ffff88003ca22848 part_size=0x40000000 > > We can see that all 3 different bdev's point to the same bd_part, which is the BUG. Funny is that bd_inode is different but contains > same wrong size, for exactly the same reason. Yep, I've seen this as well. It turns out that partitions *do* work in brd if you don't specify the 'rd_nr' module parameter, but if you do specify the module parameter you get the partition overlapping behavior that you observed. AFAICT the difference between these two scenarios is that when you set rd_nr, the 'range' variable is set so something small, but when you don't set it 'range' is 1,048,576. That's done by this bit of code in brd_init: if (rd_nr) { nr = rd_nr; range = rd_nr << part_shift; } else { nr = CONFIG_BLK_DEV_RAM_COUNT; range = 1UL << MINORBITS; } The difference in behavior happens up the stack somewhere as a result of you passing 'range' into blk_register_region(). Anyway, the quick and dirty workaround is to always have 'range' be something large. This is what I've done in the current master branch of the upstream PRD tree - see commit 0256739ccab03d1662221f595b03797370c2ac12. For what it's worth, I'm planning on updating prd (and probably brd) so that it dynamically allocates partition numbers. See commit 469071a37afc8a627b6b2ddf29db0a097d864845 for how this was done in the nvme driver. - Ross ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] brd.ko Never supported partitions should we remove the dead code ? 2014-07-29 17:19 ` Ross Zwisler @ 2014-07-30 11:11 ` Boaz Harrosh 2014-07-30 14:15 ` [PATCH 1/2] brd: Fix the partitions BUG Boaz Harrosh 0 siblings, 1 reply; 11+ messages in thread From: Boaz Harrosh @ 2014-07-30 11:11 UTC (permalink / raw) To: Ross Zwisler; +Cc: Nick Piggin, linux-scsi, Jens Axboe, Matthew Wilcox On 07/29/2014 08:19 PM, Ross Zwisler wrote: > On Tue, 2014-07-29 at 19:37 +0300, Boaz Harrosh wrote: >> Hi folks >> >> I've been playing with yet unseen prd block device, and hit an issue with partitioning >> but since prd.c is a copy/paste of brd.c the same exact issue exists with brd. >> >> It does not support partitions! >> >> An attempt to fdisk say a couple of partitions, then mkfs && mount an individual >> partition will result in the primary device being accessed and a corrupted disk data. >> >> If I enable max_part(=2) properly on modprobe, then fdisk a couple of partitions all >> goes well >> >> But then if I pass the device to Kernel (Say via mount), after a call to >> bdev = blkdev_get_by_path(dev_name, mode, fs_type); >> >> I get the following results: >> (size is extracted using: i_size_read(bdev->bd_inode); >> part_size is extracted using: bdev->bd_part->nr_sects << 9; >> ) >> >> dev_name == /dev/ram0 >> [Jul29 17:40] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24340 bd_inode=ffff88003dc24430 bd_part=ffff88003ca22848 part_size=0x40000000 >> >> dev_name == /dev/ram0p1 >> [ +16.816065] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003d2f6d80 bd_inode=ffff88003d2f6e70 bd_part=ffff88003ca22848 part_size=0x40000000 >> >> dev_name == /dev/ram0p2 >> [Jul29 17:41] foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24680 bd_inode=ffff88003dc24770 bd_part=ffff88003ca22848 part_size=0x40000000 >> >> We can see that all 3 different bdev's point to the same bd_part, which is the BUG. Funny is that bd_inode is different but contains >> same wrong size, for exactly the same reason. > > Yep, I've seen this as well. It turns out that partitions *do* work in brd if > you don't specify the 'rd_nr' module parameter, but if you do specify the > module parameter you get the partition overlapping behavior that you observed. > > AFAICT the difference between these two scenarios is that when you set rd_nr, > the 'range' variable is set so something small, but when you don't set it > 'range' is 1,048,576. That's done by this bit of code in brd_init: > > if (rd_nr) { > nr = rd_nr; > range = rd_nr << part_shift; > } else { > nr = CONFIG_BLK_DEV_RAM_COUNT; > range = 1UL << MINORBITS; > } > > The difference in behavior happens up the stack somewhere as a result of you > passing 'range' into blk_register_region(). > > Anyway, the quick and dirty workaround is to always have 'range' be something > large. This is what I've done in the current master branch of the upstream > PRD tree - see commit 0256739ccab03d1662221f595b03797370c2ac12. > > For what it's worth, I'm planning on updating prd (and probably brd) so that > it dynamically allocates partition numbers. See commit > 469071a37afc8a627b6b2ddf29db0a097d864845 for how this was done in the nvme > driver. > Hi Ross I've looked at 469071a37afc8a627b6b2ddf29db0a097d864845, but surly as you said it is not the fix. I have posted the fix, in the code you snipped out. So I get it that you guys are not convinced that we need to remove support for Partitions for brd? For me a partition for /dev/ramX is totally bogus. What's the point at all? Since we have the same effect if we just do rd_nr and partition memory this way, and save 1M of memory. And it is not as if you can loose power and want the partitioning persistent for the next boot. But sigh I will not fight you about it, I made my point of this being pointless and that is that. I will post a proper patch for brd on top of Jens's tree unless you want it rebased over some other tree, I guess it can just be queued for 3.17. > - Ross Thanks Boaz ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] brd: Fix the partitions BUG 2014-07-30 11:11 ` Boaz Harrosh @ 2014-07-30 14:15 ` Boaz Harrosh 2014-07-30 14:18 ` [PATCH 2/2] brd: Fix brd_direct_access with partitions Boaz Harrosh 2014-07-30 16:50 ` [PATCH 1/2] brd: Fix the partitions BUG Ross Zwisler 0 siblings, 2 replies; 11+ messages in thread From: Boaz Harrosh @ 2014-07-30 14:15 UTC (permalink / raw) To: Jens Axboe, Ross Zwisler, Matthew Wilcox Cc: linux-scsi, Nick Piggin, linux-kernel With current code after a call to: bdev = blkdev_get_by_path(dev_name, mode, fs_type); size = i_size_read(bdev->bd_inode); part_size = bdev->bd_part->nr_sects << 9; I get the following bad results: dev_name == /dev/ram0 foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24340 \ bd_inode=ffff88003dc24430 bd_part=ffff88003ca22848 part_size=0x40000000 dev_name == /dev/ram0p1 foo: [foo_mount:880] size=0x40000000 bdev=ffff88003d2f6d80 \ bd_inode=ffff88003d2f6e70 bd_part=ffff88003ca22848 part_size=0x40000000 dev_name == /dev/ram0p2 foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24680 \ bd_inode=ffff88003dc24770 bd_part=ffff88003ca22848 part_size=0x40000000 Note how all three bdev(s) point to the same bd_part. This is do to a single bad clubber in brd_probe() which is removed in this patch: - *part = 0; because of this all 3 bdev(s) above get to point to the same bd_part[0] While at it fix/rename brd_init_one() since all devices are created on load of driver, brd_probe() will never be called with a new un-created device. brd_init_one() is now renamed to brd_find() which is what it does. TODO: There is one more partitions BUG regarding brd_direct_access() which is fixed on the next patch. Signed-off-by: Boaz Harrosh <boaz@plexistor.com> --- drivers/block/brd.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/block/brd.c b/drivers/block/brd.c index c7d138e..92334f6 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -523,22 +523,20 @@ static void brd_free(struct brd_device *brd) kfree(brd); } -static struct brd_device *brd_init_one(int i) +static struct brd_device *brd_find(int i) { struct brd_device *brd; list_for_each_entry(brd, &brd_devices, brd_list) { if (brd->brd_number == i) - goto out; + return brd; } - brd = brd_alloc(i); - if (brd) { - add_disk(brd->brd_disk); - list_add_tail(&brd->brd_list, &brd_devices); - } -out: - return brd; + /* brd always allocates all its devices at load time, therefor + * brd_probe will never be called with a new brd_number + */ + printk(KERN_EROR "brd: brd_find unexpected device %d\n", i); + return NULL; } static void brd_del_one(struct brd_device *brd) @@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) struct kobject *kobj; mutex_lock(&brd_devices_mutex); - brd = brd_init_one(MINOR(dev) >> part_shift); + brd = brd_find(MINOR(dev) >> part_shift); kobj = brd ? get_disk(brd->brd_disk) : NULL; mutex_unlock(&brd_devices_mutex); - *part = 0; return kobj; } -- 1.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] brd: Fix brd_direct_access with partitions 2014-07-30 14:15 ` [PATCH 1/2] brd: Fix the partitions BUG Boaz Harrosh @ 2014-07-30 14:18 ` Boaz Harrosh 2014-07-30 15:34 ` Matthew Wilcox 2014-07-30 16:50 ` [PATCH 1/2] brd: Fix the partitions BUG Ross Zwisler 1 sibling, 1 reply; 11+ messages in thread From: Boaz Harrosh @ 2014-07-30 14:18 UTC (permalink / raw) To: Jens Axboe, Ross Zwisler, Matthew Wilcox Cc: linux-scsi, Nick Piggin, linux-kernel When brd_direct_access() is called on a partition-bdev it would access the wrong sector. And caller would then corrupt the device's data. This is a preliminary fix, Matthew Wilcox has a patch in his DAX patchset which will define a global wrapper to bdev->bd_disk->fops->direct_access that will do the proper checks and translations before calling a driver global member. (The way it is done at the rest of the block stack) CC: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> --- drivers/block/brd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/brd.c b/drivers/block/brd.c index 92334f6..7506864 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -378,9 +378,10 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, if (!brd) return -ENODEV; + sector += get_start_sect(bdev); if (sector & (PAGE_SECTORS-1)) return -EINVAL; - if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk)) + if (unlikely(sector + PAGE_SECTORS > part_nr_sects_read(bdev->bd_part))) return -ERANGE; page = brd_insert_page(brd, sector); if (!page) -- 1.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] brd: Fix brd_direct_access with partitions 2014-07-30 14:18 ` [PATCH 2/2] brd: Fix brd_direct_access with partitions Boaz Harrosh @ 2014-07-30 15:34 ` Matthew Wilcox 2014-07-30 15:37 ` Boaz Harrosh 0 siblings, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2014-07-30 15:34 UTC (permalink / raw) To: Boaz Harrosh Cc: Jens Axboe, Ross Zwisler, linux-scsi, Nick Piggin, linux-kernel On Wed, Jul 30, 2014 at 05:18:47PM +0300, Boaz Harrosh wrote: > When brd_direct_access() is called on a partition-bdev > it would access the wrong sector. And caller would then > corrupt the device's data. > > This is a preliminary fix, Matthew Wilcox has a patch > in his DAX patchset which will define a global wrapper > to bdev->bd_disk->fops->direct_access that will do the > proper checks and translations before calling a driver > global member. (The way it is done at the rest of the > block stack) Uh, no, let's just focus on getting the DAX code in instead of putting in interim patches that will conflict. Patch 4/22 is uncontroversial, fixes this problem, has no dependencies, and is key to the rest of the DAX patchset. If this problem wants to be fixed, then put 4/22 in instead. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] brd: Fix brd_direct_access with partitions 2014-07-30 15:34 ` Matthew Wilcox @ 2014-07-30 15:37 ` Boaz Harrosh 0 siblings, 0 replies; 11+ messages in thread From: Boaz Harrosh @ 2014-07-30 15:37 UTC (permalink / raw) To: Matthew Wilcox Cc: Jens Axboe, Ross Zwisler, linux-scsi, Nick Piggin, linux-kernel On 07/30/2014 06:34 PM, Matthew Wilcox wrote: > On Wed, Jul 30, 2014 at 05:18:47PM +0300, Boaz Harrosh wrote: >> When brd_direct_access() is called on a partition-bdev >> it would access the wrong sector. And caller would then >> corrupt the device's data. >> >> This is a preliminary fix, Matthew Wilcox has a patch >> in his DAX patchset which will define a global wrapper >> to bdev->bd_disk->fops->direct_access that will do the >> proper checks and translations before calling a driver >> global member. (The way it is done at the rest of the >> block stack) > > Uh, no, let's just focus on getting the DAX code in instead of putting > in interim patches that will conflict. Patch 4/22 is uncontroversial, > fixes this problem, has no dependencies, and is key to the rest of the DAX > patchset. If this problem wants to be fixed, then put 4/22 in instead. > OK, I agree, could you please push 4/22 as reply to here for Jens to take for 3.17 ? (together with my 1/2) Thanks Boaz ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] brd: Fix the partitions BUG 2014-07-30 14:15 ` [PATCH 1/2] brd: Fix the partitions BUG Boaz Harrosh 2014-07-30 14:18 ` [PATCH 2/2] brd: Fix brd_direct_access with partitions Boaz Harrosh @ 2014-07-30 16:50 ` Ross Zwisler 2014-07-30 18:37 ` Boaz Harrosh 1 sibling, 1 reply; 11+ messages in thread From: Ross Zwisler @ 2014-07-30 16:50 UTC (permalink / raw) To: Boaz Harrosh Cc: Jens Axboe, Matthew Wilcox, linux-scsi, Nick Piggin, linux-kernel On Wed, 2014-07-30 at 17:15 +0300, Boaz Harrosh wrote: > With current code after a call to: > bdev = blkdev_get_by_path(dev_name, mode, fs_type); > size = i_size_read(bdev->bd_inode); > part_size = bdev->bd_part->nr_sects << 9; > > I get the following bad results: > dev_name == /dev/ram0 > foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24340 \ > bd_inode=ffff88003dc24430 bd_part=ffff88003ca22848 part_size=0x40000000 > dev_name == /dev/ram0p1 > foo: [foo_mount:880] size=0x40000000 bdev=ffff88003d2f6d80 \ > bd_inode=ffff88003d2f6e70 bd_part=ffff88003ca22848 part_size=0x40000000 > dev_name == /dev/ram0p2 > foo: [foo_mount:880] size=0x40000000 bdev=ffff88003dc24680 \ > bd_inode=ffff88003dc24770 bd_part=ffff88003ca22848 part_size=0x40000000 > Note how all three bdev(s) point to the same bd_part. > > This is do to a single bad clubber in brd_probe() which is > removed in this patch: > - *part = 0; > > because of this all 3 bdev(s) above get to point to the same bd_part[0] > > While at it fix/rename brd_init_one() since all devices are created on > load of driver, brd_probe() will never be called with a new un-created > device. > brd_init_one() is now renamed to brd_find() which is what it does. > > TODO: There is one more partitions BUG regarding > brd_direct_access() which is fixed on the next patch. > > Signed-off-by: Boaz Harrosh <boaz@plexistor.com> > --- > drivers/block/brd.c | 19 ++++++++----------- > 1 file changed, 8 insertions(+), 11 deletions(-) > > diff --git a/drivers/block/brd.c b/drivers/block/brd.c > index c7d138e..92334f6 100644 > --- a/drivers/block/brd.c > +++ b/drivers/block/brd.c > @@ -523,22 +523,20 @@ static void brd_free(struct brd_device *brd) > kfree(brd); > } > > -static struct brd_device *brd_init_one(int i) > +static struct brd_device *brd_find(int i) > { > struct brd_device *brd; > > list_for_each_entry(brd, &brd_devices, brd_list) { > if (brd->brd_number == i) > - goto out; > + return brd; > } > > - brd = brd_alloc(i); > - if (brd) { > - add_disk(brd->brd_disk); > - list_add_tail(&brd->brd_list, &brd_devices); > - } > -out: > - return brd; > + /* brd always allocates all its devices at load time, therefor > + * brd_probe will never be called with a new brd_number > + */ > + printk(KERN_EROR "brd: brd_find unexpected device %d\n", i); s/KERN_EROR/KERN_ERR/ > + return NULL; > } > > static void brd_del_one(struct brd_device *brd) > @@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) > struct kobject *kobj; > > mutex_lock(&brd_devices_mutex); > - brd = brd_init_one(MINOR(dev) >> part_shift); > + brd = brd_find(MINOR(dev) >> part_shift); > kobj = brd ? get_disk(brd->brd_disk) : NULL; > mutex_unlock(&brd_devices_mutex); > > - *part = 0; > return kobj; > } It is possible to create new block devices with BRD at runtime: # mknod /dev/new_brd b 1 4 # fdisk -l /dev/new_brd This causes a new BRD disk to be created, and hits your error case: Jul 30 10:40:57 alara kernel: brd: brd_find unexpected device 4 I guess in general I'm not saying that BRD needs to have partitions - indeed it may not give you much in the way of added functionality. As the code currently stands partitions aren't surfaced anyway (GENHD_FL_SUPPRESS_PARTITION_INFO is set). For PRD, however, I *do* want to enable partitions correctly because eventually I'd like to enhance PRD so that it *does* actually handle NVDIMMs correctly, and for that partitions do make sense. And if I have to implement and debug partitions for PRD, it's easy to stick them in BRD in case anyone wants to use them. - Ross ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] brd: Fix the partitions BUG 2014-07-30 16:50 ` [PATCH 1/2] brd: Fix the partitions BUG Ross Zwisler @ 2014-07-30 18:37 ` Boaz Harrosh 0 siblings, 0 replies; 11+ messages in thread From: Boaz Harrosh @ 2014-07-30 18:37 UTC (permalink / raw) To: Ross Zwisler, Boaz Harrosh Cc: Jens Axboe, Matthew Wilcox, linux-scsi, Nick Piggin, linux-kernel On 07/30/2014 07:50 PM, Ross Zwisler wrote: <> >> + */ >> + printk(KERN_EROR "brd: brd_find unexpected device %d\n", i); > > s/KERN_EROR/KERN_ERR/ > Yes thanks, sigh, code should compile driver error. I used pr_err but last inspection I saw that printk is used everywhere and, crapped ... >> + return NULL; >> } >> >> static void brd_del_one(struct brd_device *brd) >> @@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) >> struct kobject *kobj; >> >> mutex_lock(&brd_devices_mutex); >> - brd = brd_init_one(MINOR(dev) >> part_shift); >> + brd = brd_find(MINOR(dev) >> part_shift); >> kobj = brd ? get_disk(brd->brd_disk) : NULL; >> mutex_unlock(&brd_devices_mutex); >> >> - *part = 0; >> return kobj; >> } > > It is possible to create new block devices with BRD at runtime: > > # mknod /dev/new_brd b 1 4 > # fdisk -l /dev/new_brd > > This causes a new BRD disk to be created, and hits your error case: > > Jul 30 10:40:57 alara kernel: brd: brd_find unexpected device 4 > Ha OK, So this is the mystery key. I never knew that trick, OK I guess I need to leave this in > I guess in general I'm not saying that BRD needs to have partitions - indeed > it may not give you much in the way of added functionality. As the code > currently stands partitions aren't surfaced anyway > (GENHD_FL_SUPPRESS_PARTITION_INFO is set). For PRD, however, I *do* want to > enable partitions correctly because eventually I'd like to enhance PRD so that > it *does* actually handle NVDIMMs correctly, and for that partitions do make > sense. So lets talk about that for a bit. Why would you want legacy partitions for NVDIMMs. For one fdisk will waste 1M of memory on this. And with NVDIMMs you actually need a different manager all together, more like lvm stuff. I have patches here that changes prd a lot. - Global memory parameters are gone each device remaps and operates on it's own memory range. - the prd_params are gone instead you have one memmap= parameter of the form memmap=nnn$ooo,[nnn$ooo]... where: nnn - is size in bytes of the form number[K/M/G] ooo - is offset in bytes of the form number[K/M/G] Very much the same as the memmap= Kernel parameters. So just copy/paste what you did there and it will work. Now each memory range has one prd_device created for it. Note that if specified ranges are just contiguous then it is like your prd_count thing where you took a contiguous range and sliced it up, only here they can be of different size. Off course it has an added fixture of dis-contiguous ranges like we have in our multy-nodes NUMA system in the lab that host DDR3 NvDIMMs in two banks. - A sysfs interface is added to add/remove memory range on the fly like osdblk - If no parameters are specified at all, the Kernel command-line is parsed and all memmap= sections are attempted and used if are not already claimed. An interface like mknod above is not supported since it is actually pointless. My current code still supports partitions but I still think it is silly. [ This is all speculative for DDR3-NVDIMMs, we have seen that the memory controller actually tags these DIMMs with type 12 but it looks like this is all vendor specific right now and I understand that DDR4 standardize all that. So I was hoping you guys are working on all that with the NvDIMM stuff. Now lets say that I have established auto-detection for each DIMM and have extracted its SN (Our DDR3 DIMMs each have an SN that can be displayed by the vendor supplied tool) An NvDIMM manager will need to establish an NvDIMM table and set order. The motivation of a partition table is that a disk moved to another machine and/or booted into another OS will have persistent and STD way to not clobber over established DATA. But here we have like a disk cluster/raid, an ordered set of drives. Unique to NvDIMMs is the interleaving. If pairs are then inserted in wrong order or are re-mixed. This should be detected and refused to be mounted (unless a re-initialize is forced) So data is not silently lost on operator errors. All this is then persisted on some DIMM, first or last. If code is able to re-arrange order, like when physical address have changed it will, but in the wrong interleaving case it will refuse to mount. ] > And if I have to implement and debug partitions for PRD, it's easy to > stick them in BRD in case anyone wants to use them. > I was finally playing with BRD, and it is missing your getgeo patch, so it is currently completely alien to fdisk and partitions. So why, why keep a fixture that never existed, I still hope to convince you that this is crap. Specially with brd that can have devices added on the fly like you showed above. Who needs them at all? Why waist 1M of memory on each device for no extra gain? Specially in light of my new prd that does away of any needs of a partitioning since it supports arbitrary slicing in another way. > - Ross I will send a new set of patches for brd tomorrow. If we want, really want, NEW SUPPORT for partitions, we need 2 more patch. But PLEASE consider just removing this dead never used crap. If you agree I will send a cleaning patch ASAP. Also I will send as RFC my prd patches if you want to see, it has some nice surprises in them that I hope you will like ;-) Cheers Boaz ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-07-30 18:37 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-29 16:37 [RFD] brd.ko Never supported partitions should we remove the dead code ? Boaz Harrosh 2014-07-29 16:56 ` Matthew Wilcox 2014-07-29 17:13 ` Boaz Harrosh 2014-07-29 17:19 ` Ross Zwisler 2014-07-30 11:11 ` Boaz Harrosh 2014-07-30 14:15 ` [PATCH 1/2] brd: Fix the partitions BUG Boaz Harrosh 2014-07-30 14:18 ` [PATCH 2/2] brd: Fix brd_direct_access with partitions Boaz Harrosh 2014-07-30 15:34 ` Matthew Wilcox 2014-07-30 15:37 ` Boaz Harrosh 2014-07-30 16:50 ` [PATCH 1/2] brd: Fix the partitions BUG Ross Zwisler 2014-07-30 18:37 ` Boaz Harrosh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).