* [for-4.16 PATCH v2 0/3] block: some genhd changes
@ 2018-01-10 2:41 Mike Snitzer
2018-01-10 2:41 ` [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Mike Snitzer @ 2018-01-10 2:41 UTC (permalink / raw)
To: axboe; +Cc: hch, Bart.VanAssche, dm-devel, linux-block
Hi Jens,
Please consider PATCH 1 and 2 for 4.16 inclusion. I've included PATCH
3 to show the DM changes needed in order to make use of these block
changes.
v2: added some del_gendisk() code movement to be symmetric with the
device_add_disk() related code movement (suggested by Bart). Also
fixed a missing !GENHD_FL_HIDDEN check before the "bdi" symlink
creation.
Thanks!
Mike
Mike Snitzer (3):
block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN
block: cope with gendisk's 'queue' being added later
dm: fix awkward request_queue initialization
block/blk-sysfs.c | 23 +++++++++++++++++-
block/genhd.c | 38 +++++++----------------------
drivers/md/dm-core.h | 2 --
drivers/md/dm-ioctl.c | 7 ++----
drivers/md/dm-rq.c | 13 +---------
drivers/md/dm-table.c | 2 +-
drivers/md/dm.c | 66 +++++++++++++++++++++++++++++++--------------------
7 files changed, 75 insertions(+), 76 deletions(-)
--
2.15.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN 2018-01-10 2:41 [for-4.16 PATCH v2 0/3] block: some genhd changes Mike Snitzer @ 2018-01-10 2:41 ` Mike Snitzer 2018-01-10 4:35 ` Mike Snitzer 2018-01-10 8:31 ` Christoph Hellwig 2018-01-10 2:41 ` [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later Mike Snitzer 2018-01-10 2:41 ` [for-4.16 PATCH v2 3/3] dm: fix awkward request_queue initialization Mike Snitzer 2 siblings, 2 replies; 12+ messages in thread From: Mike Snitzer @ 2018-01-10 2:41 UTC (permalink / raw) To: axboe; +Cc: hch, Bart.VanAssche, dm-devel, linux-block device_add_disk() will only call bdi_register_owner() if !GENHD_FL_HIDDEN so it follows that bdi_unregister() should be avoided for !GENHD_FL_HIDDEN in del_gendisk(). Found with code inspection. bdi_unregister() won't do much harm if bdi_register_owner() wasn't used but best to avoid it. Fixes: 8ddcd65325 ("block: introduce GENHD_FL_HIDDEN") Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/genhd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/genhd.c b/block/genhd.c index 96a66f671720..00620e01e043 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -725,7 +725,8 @@ void del_gendisk(struct gendisk *disk) * Unregister bdi before releasing device numbers (as they can * get reused and we'd get clashes in sysfs). */ - bdi_unregister(disk->queue->backing_dev_info); + if (!(disk->flags & GENHD_FL_HIDDEN)) + bdi_unregister(disk->queue->backing_dev_info); blk_unregister_queue(disk); } else { WARN_ON(1); -- 2.15.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN 2018-01-10 2:41 ` [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer @ 2018-01-10 4:35 ` Mike Snitzer 2018-01-10 8:31 ` Christoph Hellwig 1 sibling, 0 replies; 12+ messages in thread From: Mike Snitzer @ 2018-01-10 4:35 UTC (permalink / raw) To: axboe; +Cc: Bart.VanAssche, linux-block, dm-devel, hch On Tue, Jan 09 2018 at 9:41pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > device_add_disk() will only call bdi_register_owner() if > !GENHD_FL_HIDDEN so it follows that bdi_unregister() should be avoided > for !GENHD_FL_HIDDEN in del_gendisk(). This ^ should be "GENHD_FL_HIDDEN". Sorry for the cut-n-paste typo. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN 2018-01-10 2:41 ` [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer 2018-01-10 4:35 ` Mike Snitzer @ 2018-01-10 8:31 ` Christoph Hellwig 1 sibling, 0 replies; 12+ messages in thread From: Christoph Hellwig @ 2018-01-10 8:31 UTC (permalink / raw) To: Mike Snitzer; +Cc: axboe, hch, Bart.VanAssche, dm-devel, linux-block It's in fact completely harmless :) But not even calling it obviously is just as fine. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 2:41 [for-4.16 PATCH v2 0/3] block: some genhd changes Mike Snitzer 2018-01-10 2:41 ` [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer @ 2018-01-10 2:41 ` Mike Snitzer 2018-01-10 3:46 ` [dm-devel] " Ming Lei 2018-01-10 8:32 ` Christoph Hellwig 2018-01-10 2:41 ` [for-4.16 PATCH v2 3/3] dm: fix awkward request_queue initialization Mike Snitzer 2 siblings, 2 replies; 12+ messages in thread From: Mike Snitzer @ 2018-01-10 2:41 UTC (permalink / raw) To: axboe; +Cc: hch, Bart.VanAssche, dm-devel, linux-block Since I can remember DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. Reason for this was block/genhd.c:add_disk() has required that the request_queue (and associated bdi) be tied to the gendisk before add_disk() is called -- because add_disk() also deals with exposing the request_queue via blk_register_queue(). DM's dynamic creation of arbitrary device types (and associated request_queue types) requires the DM device's gendisk be available so that DM table loads can establish a master/slave relationship with subordinate devices that are referenced by loaded DM tables -- using bd_link_disk_holder(). But until these DM tables, and their associated subordinate devices, are known DM cannot know what type of request_queue it needs -- nor what its queue_limits should be. This chicken and egg scenario has created all manner of problems for DM and, at times, the block layer. Summary of changes: - Adjust device_add_disk() so that that it can cope with the gendisk _not_ having its 'queue' established yet. - Move "bdi" symlink creation from register_disk() to the end of blk_register_queue() -- it is more logical in that the bdi is part of the request_queue. - Move extra request_queue reference count (on behalf of gendisk) from device_add_disk() to end of blk_register_queue(). - Make device_add_disk()'s calls to bdi_register_owner() and blk_register_queue() conditional on disk->queue not being NULL. - Export blk_register_queue() - Move "bdi" symlink removal and bdi_unregister() from del_gendisk() to blk_unregister_queue(). Suggested by Bart. - Remove del_gendisk()'s WARN_ON() if disk->queue is NULL These changes allow DM to use device_add_disk() to anchor its gendisk as the "master" for master/slave relationships DM must establish with subordinate devices referenced in DM tables that get loaded. Once all "slave" devices for a DM device are known a request_queue can be properly initialized and then advertised via sysfs -- important improvement being that no request_queue resource initialization is missed. These changes have been tested to work without any IO races because the request_queue and associated bdi don't even exist at the time that the gendisk's "struct device"s are established by device_add_disk(). I've been mindful of historic bugs, and haven't experienced them with DM, e.g.: https://bugzilla.kernel.org/show_bug.cgi?id=16312 (fixed by commit 01ea5063 "block: Fix race during disk initialization") Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-sysfs.c | 23 ++++++++++++++++++++++- block/genhd.c | 39 +++++++++------------------------------ 2 files changed, 31 insertions(+), 31 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 870484eaed1f..d888ecb95a2a 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -919,8 +919,21 @@ int blk_register_queue(struct gendisk *disk) ret = 0; unlock: mutex_unlock(&q->sysfs_lock); + + /* + * Take an extra ref on queue which will be put on disk_release() + * so that it sticks around as long as @disk is there. + */ + WARN_ON_ONCE(!blk_get_queue(q)); + + if (!(disk->flags & GENHD_FL_HIDDEN)) + WARN_ON(sysfs_create_link(&dev->kobj, + &q->backing_dev_info->dev->kobj, + "bdi")); + return ret; } +EXPORT_SYMBOL_GPL(blk_register_queue); void blk_unregister_queue(struct gendisk *disk) { @@ -929,13 +942,21 @@ void blk_unregister_queue(struct gendisk *disk) if (WARN_ON(!q)) return; + if (!(disk->flags & GENHD_FL_HIDDEN)) { + sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi"); + /* + * Unregister bdi before releasing device numbers (as they can + * get reused and we'd get clashes in sysfs). + */ + bdi_unregister(q->backing_dev_info); + } + mutex_lock(&q->sysfs_lock); queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); mutex_unlock(&q->sysfs_lock); wbt_exit(q); - if (q->mq_ops) blk_mq_unregister_dev(disk_to_dev(disk), q); diff --git a/block/genhd.c b/block/genhd.c index 00620e01e043..4a71aea1a1ef 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -621,11 +621,6 @@ static void register_disk(struct device *parent, struct gendisk *disk) while ((part = disk_part_iter_next(&piter))) kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD); disk_part_iter_exit(&piter); - - err = sysfs_create_link(&ddev->kobj, - &disk->queue->backing_dev_info->dev->kobj, - "bdi"); - WARN_ON(err); } /** @@ -671,24 +666,19 @@ void device_add_disk(struct device *parent, struct gendisk *disk) disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO; disk->flags |= GENHD_FL_NO_PART_SCAN; } else { - int ret; - - /* Register BDI before referencing it from bdev */ disk_to_dev(disk)->devt = devt; - ret = bdi_register_owner(disk->queue->backing_dev_info, - disk_to_dev(disk)); - WARN_ON(ret); + /* Register BDI before referencing it from bdev */ + if (disk->queue) { + retval = bdi_register_owner(disk->queue->backing_dev_info, + disk_to_dev(disk)); + WARN_ON(retval); + } blk_register_region(disk_devt(disk), disk->minors, NULL, exact_match, exact_lock, disk); } register_disk(parent, disk); - blk_register_queue(disk); - - /* - * Take an extra ref on queue which will be put on disk_release() - * so that it sticks around as long as @disk is there. - */ - WARN_ON_ONCE(!blk_get_queue(disk->queue)); + if (disk->queue) + blk_register_queue(disk); disk_add_events(disk); blk_integrity_add(disk); @@ -718,19 +708,8 @@ void del_gendisk(struct gendisk *disk) set_capacity(disk, 0); disk->flags &= ~GENHD_FL_UP; - if (!(disk->flags & GENHD_FL_HIDDEN)) - sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi"); - if (disk->queue) { - /* - * Unregister bdi before releasing device numbers (as they can - * get reused and we'd get clashes in sysfs). - */ - if (!(disk->flags & GENHD_FL_HIDDEN)) - bdi_unregister(disk->queue->backing_dev_info); + if (disk->queue) blk_unregister_queue(disk); - } else { - WARN_ON(1); - } if (!(disk->flags & GENHD_FL_HIDDEN)) blk_unregister_region(disk_devt(disk), disk->minors); -- 2.15.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [dm-devel] [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 2:41 ` [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later Mike Snitzer @ 2018-01-10 3:46 ` Ming Lei 2018-01-10 4:21 ` Mike Snitzer 2018-01-10 8:32 ` Christoph Hellwig 1 sibling, 1 reply; 12+ messages in thread From: Ming Lei @ 2018-01-10 3:46 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, Bart Van Assche, linux-block, open list:DEVICE-MAPPER (LVM), Christoph Hellwig Hi Mike, On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > Since I can remember DM has forced the block layer to allow the > allocation and initialization of the request_queue to be distinct > operations. Reason for this was block/genhd.c:add_disk() has required > that the request_queue (and associated bdi) be tied to the gendisk > before add_disk() is called -- because add_disk() also deals with > exposing the request_queue via blk_register_queue(). > > DM's dynamic creation of arbitrary device types (and associated > request_queue types) requires the DM device's gendisk be available so > that DM table loads can establish a master/slave relationship with > subordinate devices that are referenced by loaded DM tables -- using > bd_link_disk_holder(). But until these DM tables, and their associated > subordinate devices, are known DM cannot know what type of request_queue > it needs -- nor what its queue_limits should be. Maybe DM is the only kind of this case, but it is easy to cause trouble by adding disk before setting up queue. In theory, once disk is added, it becomes visible for external world, and IO starts to come and sysfs operations come too, then block has to deal with these cases. Another related issue is that blk-mq debugfs can't work yet for dm-mpath. So I am just wondering if it is possible to follow the usual way to add disk after setting up queue for DM? If it is possible, it may avoid lots of trouble for us. Thanks, Ming > > This chicken and egg scenario has created all manner of problems for DM > and, at times, the block layer. > > Summary of changes: > > - Adjust device_add_disk() so that that it can cope with the gendisk _not_ > having its 'queue' established yet. > > - Move "bdi" symlink creation from register_disk() to the end of > blk_register_queue() -- it is more logical in that the bdi is part of > the request_queue. > > - Move extra request_queue reference count (on behalf of gendisk) from > device_add_disk() to end of blk_register_queue(). > > - Make device_add_disk()'s calls to bdi_register_owner() and > blk_register_queue() conditional on disk->queue not being NULL. > > - Export blk_register_queue() > > - Move "bdi" symlink removal and bdi_unregister() from del_gendisk() to > blk_unregister_queue(). Suggested by Bart. > > - Remove del_gendisk()'s WARN_ON() if disk->queue is NULL > > These changes allow DM to use device_add_disk() to anchor its gendisk as > the "master" for master/slave relationships DM must establish with > subordinate devices referenced in DM tables that get loaded. Once all > "slave" devices for a DM device are known a request_queue can be > properly initialized and then advertised via sysfs -- important > improvement being that no request_queue resource initialization is > missed. > > These changes have been tested to work without any IO races because the > request_queue and associated bdi don't even exist at the time that the > gendisk's "struct device"s are established by device_add_disk(). I've > been mindful of historic bugs, and haven't experienced them with DM, > e.g.: https://bugzilla.kernel.org/show_bug.cgi?id=16312 (fixed by commit > 01ea5063 "block: Fix race during disk initialization") > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > --- > block/blk-sysfs.c | 23 ++++++++++++++++++++++- > block/genhd.c | 39 +++++++++------------------------------ > 2 files changed, 31 insertions(+), 31 deletions(-) > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index 870484eaed1f..d888ecb95a2a 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -919,8 +919,21 @@ int blk_register_queue(struct gendisk *disk) > ret = 0; > unlock: > mutex_unlock(&q->sysfs_lock); > + > + /* > + * Take an extra ref on queue which will be put on disk_release() > + * so that it sticks around as long as @disk is there. > + */ > + WARN_ON_ONCE(!blk_get_queue(q)); > + > + if (!(disk->flags & GENHD_FL_HIDDEN)) > + WARN_ON(sysfs_create_link(&dev->kobj, > + &q->backing_dev_info->dev->kobj, > + "bdi")); > + > return ret; > } > +EXPORT_SYMBOL_GPL(blk_register_queue); > > void blk_unregister_queue(struct gendisk *disk) > { > @@ -929,13 +942,21 @@ void blk_unregister_queue(struct gendisk *disk) > if (WARN_ON(!q)) > return; > > + if (!(disk->flags & GENHD_FL_HIDDEN)) { > + sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi"); > + /* > + * Unregister bdi before releasing device numbers (as they can > + * get reused and we'd get clashes in sysfs). > + */ > + bdi_unregister(q->backing_dev_info); > + } > + > mutex_lock(&q->sysfs_lock); > queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); > mutex_unlock(&q->sysfs_lock); > > wbt_exit(q); > > - > if (q->mq_ops) > blk_mq_unregister_dev(disk_to_dev(disk), q); > > diff --git a/block/genhd.c b/block/genhd.c > index 00620e01e043..4a71aea1a1ef 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -621,11 +621,6 @@ static void register_disk(struct device *parent, struct gendisk *disk) > while ((part = disk_part_iter_next(&piter))) > kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD); > disk_part_iter_exit(&piter); > - > - err = sysfs_create_link(&ddev->kobj, > - &disk->queue->backing_dev_info->dev->kobj, > - "bdi"); > - WARN_ON(err); > } > > /** > @@ -671,24 +666,19 @@ void device_add_disk(struct device *parent, struct gendisk *disk) > disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO; > disk->flags |= GENHD_FL_NO_PART_SCAN; > } else { > - int ret; > - > - /* Register BDI before referencing it from bdev */ > disk_to_dev(disk)->devt = devt; > - ret = bdi_register_owner(disk->queue->backing_dev_info, > - disk_to_dev(disk)); > - WARN_ON(ret); > + /* Register BDI before referencing it from bdev */ > + if (disk->queue) { > + retval = bdi_register_owner(disk->queue->backing_dev_info, > + disk_to_dev(disk)); > + WARN_ON(retval); > + } > blk_register_region(disk_devt(disk), disk->minors, NULL, > exact_match, exact_lock, disk); > } > register_disk(parent, disk); > - blk_register_queue(disk); > - > - /* > - * Take an extra ref on queue which will be put on disk_release() > - * so that it sticks around as long as @disk is there. > - */ > - WARN_ON_ONCE(!blk_get_queue(disk->queue)); > + if (disk->queue) > + blk_register_queue(disk); > > disk_add_events(disk); > blk_integrity_add(disk); > @@ -718,19 +708,8 @@ void del_gendisk(struct gendisk *disk) > set_capacity(disk, 0); > disk->flags &= ~GENHD_FL_UP; > > - if (!(disk->flags & GENHD_FL_HIDDEN)) > - sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi"); > - if (disk->queue) { > - /* > - * Unregister bdi before releasing device numbers (as they can > - * get reused and we'd get clashes in sysfs). > - */ > - if (!(disk->flags & GENHD_FL_HIDDEN)) > - bdi_unregister(disk->queue->backing_dev_info); > + if (disk->queue) > blk_unregister_queue(disk); > - } else { > - WARN_ON(1); > - } > > if (!(disk->flags & GENHD_FL_HIDDEN)) > blk_unregister_region(disk_devt(disk), disk->minors); > -- > 2.15.0 > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- Ming Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 3:46 ` [dm-devel] " Ming Lei @ 2018-01-10 4:21 ` Mike Snitzer 2018-01-10 7:55 ` Ming Lei 0 siblings, 1 reply; 12+ messages in thread From: Mike Snitzer @ 2018-01-10 4:21 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, Bart Van Assche, linux-block, dm-devel, Christoph Hellwig On Tue, Jan 09 2018 at 10:46pm -0500, Ming Lei <tom.leiming@gmail.com> wrote: > Hi Mike, > > On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > Since I can remember DM has forced the block layer to allow the > > allocation and initialization of the request_queue to be distinct > > operations. Reason for this was block/genhd.c:add_disk() has required > > that the request_queue (and associated bdi) be tied to the gendisk > > before add_disk() is called -- because add_disk() also deals with > > exposing the request_queue via blk_register_queue(). > > > > DM's dynamic creation of arbitrary device types (and associated > > request_queue types) requires the DM device's gendisk be available so > > that DM table loads can establish a master/slave relationship with > > subordinate devices that are referenced by loaded DM tables -- using > > bd_link_disk_holder(). But until these DM tables, and their associated > > subordinate devices, are known DM cannot know what type of request_queue > > it needs -- nor what its queue_limits should be. > > Maybe DM is the only kind of this case, but it is easy to cause > trouble by adding disk before setting up queue. > > In theory, once disk is added, it becomes visible for external world, and > IO starts to come and sysfs operations come too, then block has to deal > with these cases. Hi, Yes, I had concerns about this. But that theory is for a fully formed disk (one that has a request_queue attached to disk->queue). So far my testing of these changes with DM (using various testsuites) hasn't encountered _any_ problems. But please try to break it... ;) I have the changes in a git branch here: https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=block-4.16 > Another related issue is that blk-mq debugfs can't work yet for dm-mpath. Not sure how that is a related issue to this thread. But can you expand on that? I'm not familiar with "blk-mq debugfs". Any pointer to code that activates it? Could be that my changes make it work now... > So I am just wondering if it is possible to follow the usual way to add disk > after setting up queue for DM? If it is possible, it may avoid lots of trouble > for us. As I explained in the header (quoted above actually) it is _not_ possible. It is the gendisk that enables the device stack to be constructed (at least how DM's stacking is currently implemented with bd_link_disk_holder() and taking references on devices found in a DM device's table). And from that gendisk I can then walk the devices to establish the request_queue as needed, its stacked limits, etc. The approach I've taken in these patches is the closest I've gotten to make DM _much_ more sane about how its request_queue is established. But its still off the beaten path due to needing "block: cope with gendisk's 'queue' being added later". Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 4:21 ` Mike Snitzer @ 2018-01-10 7:55 ` Ming Lei 2018-01-10 14:20 ` Mike Snitzer 0 siblings, 1 reply; 12+ messages in thread From: Ming Lei @ 2018-01-10 7:55 UTC (permalink / raw) To: Mike Snitzer Cc: Jens Axboe, Bart Van Assche, linux-block, open list:DEVICE-MAPPER (LVM), Christoph Hellwig On Wed, Jan 10, 2018 at 12:21 PM, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, Jan 09 2018 at 10:46pm -0500, > Ming Lei <tom.leiming@gmail.com> wrote: > >> Hi Mike, >> >> On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> > Since I can remember DM has forced the block layer to allow the >> > allocation and initialization of the request_queue to be distinct >> > operations. Reason for this was block/genhd.c:add_disk() has required >> > that the request_queue (and associated bdi) be tied to the gendisk >> > before add_disk() is called -- because add_disk() also deals with >> > exposing the request_queue via blk_register_queue(). >> > >> > DM's dynamic creation of arbitrary device types (and associated >> > request_queue types) requires the DM device's gendisk be available so >> > that DM table loads can establish a master/slave relationship with >> > subordinate devices that are referenced by loaded DM tables -- using >> > bd_link_disk_holder(). But until these DM tables, and their associated >> > subordinate devices, are known DM cannot know what type of request_queue >> > it needs -- nor what its queue_limits should be. >> >> Maybe DM is the only kind of this case, but it is easy to cause >> trouble by adding disk before setting up queue. >> >> In theory, once disk is added, it becomes visible for external world, and >> IO starts to come and sysfs operations come too, then block has to deal >> with these cases. > > Hi, > > Yes, I had concerns about this. But that theory is for a fully formed > disk (one that has a request_queue attached to disk->queue). So far my > testing of these changes with DM (using various testsuites) hasn't > encountered _any_ problems. > > But please try to break it... ;) > I have the changes in a git branch here: > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=block-4.16 > >> Another related issue is that blk-mq debugfs can't work yet for dm-mpath. > > Not sure how that is a related issue to this thread. > But can you expand on that? I'm not familiar with "blk-mq debugfs". > Any pointer to code that activates it? Could be that my changes make it > work now... Hi, blk_mq_debugfs_register() is called in blk_register_queue() from add_disk(), and blk_mq_debugfs_register() need queue's information to setup mq debugfs stuff. But your patch does fix this issue, cool! > >> So I am just wondering if it is possible to follow the usual way to add disk >> after setting up queue for DM? If it is possible, it may avoid lots of trouble >> for us. > > As I explained in the header (quoted above actually) it is _not_ > possible. It is the gendisk that enables the device stack to be > constructed (at least how DM's stacking is currently implemented with > bd_link_disk_holder() and taking references on devices found in a DM > device's table). And from that gendisk I can then walk the devices to > establish the request_queue as needed, its stacked limits, etc. > > The approach I've taken in these patches is the closest I've gotten to > make DM _much_ more sane about how its request_queue is established. > But its still off the beaten path due to needing "block: cope with > gendisk's 'queue' being added later". OK, thanks for your explanation. After taking close look at your patches, I think this approach is clean. But there are still corner cases which need to be addressed: 1) in the following disk attributes, queue is referenced, so we have to add check in their show/store operations since the attributes are shown up just after disk is added. disk_alignment_offset_show disk_discard_alignment_show part_timeout_show/part_timeout_store 2) same issue on /proc/diskstats: see diskstats_show() 3) in IO path, looks we already checked disk->queue in generic_make_request_checks(), so it should be fine, and you set disk->queue after the queue is fully initialized in the 3rd patch. Thanks, Ming Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 7:55 ` Ming Lei @ 2018-01-10 14:20 ` Mike Snitzer 0 siblings, 0 replies; 12+ messages in thread From: Mike Snitzer @ 2018-01-10 14:20 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, Bart Van Assche, linux-block, open list:DEVICE-MAPPER (LVM), Christoph Hellwig On Wed, Jan 10 2018 at 2:55am -0500, Ming Lei <tom.leiming@gmail.com> wrote: > On Wed, Jan 10, 2018 at 12:21 PM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, Jan 09 2018 at 10:46pm -0500, > > Ming Lei <tom.leiming@gmail.com> wrote: > > > >> Another related issue is that blk-mq debugfs can't work yet for dm-mpath. > > > > Not sure how that is a related issue to this thread. > > But can you expand on that? I'm not familiar with "blk-mq debugfs". > > Any pointer to code that activates it? Could be that my changes make it > > work now... > > Hi, > > blk_mq_debugfs_register() is called in blk_register_queue() from add_disk(), > and blk_mq_debugfs_register() need queue's information to setup mq debugfs > stuff. > > But your patch does fix this issue, cool! Great, thought it might. > >> So I am just wondering if it is possible to follow the usual way to add disk > >> after setting up queue for DM? If it is possible, it may avoid lots of trouble > >> for us. > > > > As I explained in the header (quoted above actually) it is _not_ > > possible. It is the gendisk that enables the device stack to be > > constructed (at least how DM's stacking is currently implemented with > > bd_link_disk_holder() and taking references on devices found in a DM > > device's table). And from that gendisk I can then walk the devices to > > establish the request_queue as needed, its stacked limits, etc. > > > > The approach I've taken in these patches is the closest I've gotten to > > make DM _much_ more sane about how its request_queue is established. > > But its still off the beaten path due to needing "block: cope with > > gendisk's 'queue' being added later". > > OK, thanks for your explanation. > > After taking close look at your patches, I think this approach is clean. > But there are still corner cases which need to be addressed: > > 1) in the following disk attributes, queue is referenced, so we have > to add check in their show/store operations since the attributes > are shown up just after disk is added. > > disk_alignment_offset_show > disk_discard_alignment_show > part_timeout_show/part_timeout_store > > 2) same issue on /proc/diskstats: see diskstats_show() > > 3) in IO path, looks we already checked disk->queue in > generic_make_request_checks(), so it should be fine, and you set > disk->queue after the queue is fully initialized in the 3rd patch. I'll work through these and see what I can do. Will likely deal with each independent of the 2/3 patch (otherwise if I just keep folding changes in to the original patch review will get too hard). So hopefully I'll have a v3 to share later today. Thanks, Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 2:41 ` [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later Mike Snitzer 2018-01-10 3:46 ` [dm-devel] " Ming Lei @ 2018-01-10 8:32 ` Christoph Hellwig 2018-01-10 12:01 ` Hannes Reinecke 1 sibling, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2018-01-10 8:32 UTC (permalink / raw) To: Mike Snitzer; +Cc: axboe, hch, Bart.VanAssche, dm-devel, linux-block On Tue, Jan 09, 2018 at 09:41:03PM -0500, Mike Snitzer wrote: > Since I can remember DM has forced the block layer to allow the > allocation and initialization of the request_queue to be distinct > operations. Reason for this was block/genhd.c:add_disk() has required > that the request_queue (and associated bdi) be tied to the gendisk > before add_disk() is called -- because add_disk() also deals with > exposing the request_queue via blk_register_queue(). Hmm. I don't even know how that could be safe given that the disk is live and visible to userspace once added.. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later 2018-01-10 8:32 ` Christoph Hellwig @ 2018-01-10 12:01 ` Hannes Reinecke 0 siblings, 0 replies; 12+ messages in thread From: Hannes Reinecke @ 2018-01-10 12:01 UTC (permalink / raw) To: Christoph Hellwig, Mike Snitzer Cc: axboe, Bart.VanAssche, dm-devel, linux-block On 01/10/2018 09:32 AM, Christoph Hellwig wrote: > On Tue, Jan 09, 2018 at 09:41:03PM -0500, Mike Snitzer wrote: >> Since I can remember DM has forced the block layer to allow the >> allocation and initialization of the request_queue to be distinct >> operations. Reason for this was block/genhd.c:add_disk() has required >> that the request_queue (and associated bdi) be tied to the gendisk >> before add_disk() is called -- because add_disk() also deals with >> exposing the request_queue via blk_register_queue(). > > Hmm. I don't even know how that could be safe given that the disk > is live and visible to userspace once added.. > We're getting an additional 'change' event once the tables are loaded and setup properly. That's not a problem, and in fact the DASD driver has the very same issue/feature/design. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 12+ messages in thread
* [for-4.16 PATCH v2 3/3] dm: fix awkward request_queue initialization 2018-01-10 2:41 [for-4.16 PATCH v2 0/3] block: some genhd changes Mike Snitzer 2018-01-10 2:41 ` [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer 2018-01-10 2:41 ` [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later Mike Snitzer @ 2018-01-10 2:41 ` Mike Snitzer 2 siblings, 0 replies; 12+ messages in thread From: Mike Snitzer @ 2018-01-10 2:41 UTC (permalink / raw) To: axboe; +Cc: hch, Bart.VanAssche, dm-devel, linux-block Fix DM so that it no longer creates a place-holder request_queue that doesn't reflect the actual way the request_queue will ulimately be used, only to have to backfill proper queue and queue_limits initialization. Instead, DM creates a gendisk that initially doesn't have a request_queue at all. This serves to allow DM table loads to own subordinate devices without first needing to know the aggregated capabilities that will be needed of the DM device's request_queue. Once all DM tables are loaded the request_queue is allocated and DM then backfills its gendisk's ->queue member and associated sysfs and bdi initialization via blk_register_queue(). DM is now no longer prone to having its request_queue be improperly initialized. Summary of changes: - alloc_dev() doesn't pre-allocate a place-holder request_queue, and now sets md->disk->queue to NULL before calling add_disk(). - dm_setup_md_queue() is updated to allocate and initialize the DM's request_queue (_after_ all table loads have occurred and request_queue type and features and limits are known). - dm_setup_md_queue() must call bdi_register_owner() and blk_register_queue() because they were not possible at add_disk()-time (due to disk->queue being NULL). Any future "why is disk->queue NULL!?" awkwardness is DM's to own and shouldn't put more burden on block core (famous last words). So potential issues related to disk->queue NULL pointer dereferences should be for DM maintainer(s) to triage and address. A very welcome side-effect of these changes is DM no longer needs to: 1) backfill the "mq" sysfs entry (because historically DM didn't initialize the request_queue to use blk-mq until _after_ register_queue() was called via add_disk()). 2) call elv_register_queue() to get .request_fn request-based DM device's "queue" exposed in syfs. These changes also stave off the need to introduce new DM-specific workarounds in block core, e.g. this proposal: https://patchwork.kernel.org/patch/10067961/ In the end DM devices should be less unicorn in nature (relative to initialization and availability of block core infrastructure). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/md/dm-core.h | 2 -- drivers/md/dm-ioctl.c | 7 ++---- drivers/md/dm-rq.c | 13 +--------- drivers/md/dm-table.c | 2 +- drivers/md/dm.c | 66 +++++++++++++++++++++++++++++++-------------------- 5 files changed, 44 insertions(+), 46 deletions(-) diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 6a14f945783c..f955123b4765 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -130,8 +130,6 @@ struct mapped_device { struct srcu_struct io_barrier; }; -void dm_init_md_queue(struct mapped_device *md); -void dm_init_normal_md_queue(struct mapped_device *md); int md_in_flight(struct mapped_device *md); void disable_write_same(struct mapped_device *md); void disable_write_zeroes(struct mapped_device *md); diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index e52676fa9832..cf8908a6fcc9 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1334,13 +1334,10 @@ static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_si } if (dm_get_md_type(md) == DM_TYPE_NONE) { - /* Initial table load: acquire type of table. */ - dm_set_md_type(md, dm_table_get_type(t)); - - /* setup md->queue to reflect md's type (may block) */ + /* setup md->type and md->queue to reflect table's type (may block) */ r = dm_setup_md_queue(md, t); if (r) { - DMWARN("unable to set up device queue for new table."); + DMWARN("unable to setup queue for device."); goto err_unlock_md_type; } } else if (!is_valid_type(dm_get_md_type(md), dm_table_get_type(t))) { diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 9d32f25489c2..ab95cc3f2f29 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -56,7 +56,7 @@ static unsigned dm_get_blk_mq_queue_depth(void) int dm_request_based(struct mapped_device *md) { - return queue_is_rq_based(md->queue); + return md->queue && queue_is_rq_based(md->queue); } static void dm_old_start_queue(struct request_queue *q) @@ -700,7 +700,6 @@ int dm_old_init_request_queue(struct mapped_device *md, struct dm_table *t) /* disable dm_old_request_fn's merge heuristic by default */ md->seq_rq_merge_deadline_usecs = 0; - dm_init_normal_md_queue(md); blk_queue_softirq_done(md->queue, dm_softirq_done); /* Initialize the request-based DM worker thread */ @@ -713,8 +712,6 @@ int dm_old_init_request_queue(struct mapped_device *md, struct dm_table *t) return error; } - elv_register_queue(md->queue); - return 0; } @@ -810,17 +807,9 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t) err = PTR_ERR(q); goto out_tag_set; } - dm_init_md_queue(md); - - /* backfill 'mq' sysfs registration normally done in blk_register_queue */ - err = blk_mq_register_dev(disk_to_dev(md->disk), q); - if (err) - goto out_cleanup_queue; return 0; -out_cleanup_queue: - blk_cleanup_queue(q); out_tag_set: blk_mq_free_tag_set(md->tag_set); out_kfree_tag_set: diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index aaffd0c0ee9a..4cd9a8f69b55 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1161,7 +1161,7 @@ static int dm_table_build_index(struct dm_table *t) static bool integrity_profile_exists(struct gendisk *disk) { - return !!blk_get_integrity(disk); + return disk->queue && !!blk_get_integrity(disk); } /* diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 7475739fee49..0dff9b131884 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1626,20 +1626,9 @@ static const struct dax_operations dm_dax_ops; static void dm_wq_work(struct work_struct *work); -void dm_init_md_queue(struct mapped_device *md) -{ - /* - * Initialize data that will only be used by a non-blk-mq DM queue - * - must do so here (in alloc_dev callchain) before queue is used - */ - md->queue->queuedata = md; - md->queue->backing_dev_info->congested_data = md; -} - -void dm_init_normal_md_queue(struct mapped_device *md) +static void dm_init_normal_md_queue(struct mapped_device *md) { md->use_blk_mq = false; - dm_init_md_queue(md); /* * Initialize aspects of queue that aren't relevant for blk-mq @@ -1731,13 +1720,7 @@ static struct mapped_device *alloc_dev(int minor) INIT_LIST_HEAD(&md->table_devices); spin_lock_init(&md->uevent_lock); - md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id); - if (!md->queue) - goto bad; - - dm_init_md_queue(md); - - md->disk = alloc_disk_node(1, numa_node_id); + md->disk = alloc_disk_node(1, md->numa_node_id); if (!md->disk) goto bad; @@ -1752,7 +1735,7 @@ static struct mapped_device *alloc_dev(int minor) md->disk->major = _major; md->disk->first_minor = minor; md->disk->fops = &dm_blk_dops; - md->disk->queue = md->queue; + md->disk->queue = NULL; md->disk->private_data = md; sprintf(md->disk->disk_name, "dm-%d", minor); @@ -1962,13 +1945,18 @@ static struct dm_table *__unbind(struct mapped_device *md) */ int dm_create(int minor, struct mapped_device **result) { + int r; struct mapped_device *md; md = alloc_dev(minor); if (!md) return -ENXIO; - dm_sysfs_init(md); + r = dm_sysfs_init(md); + if (r) { + free_dev(md); + return r; + } *result = md; return 0; @@ -2021,21 +2009,31 @@ EXPORT_SYMBOL_GPL(dm_get_queue_limits); int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t) { int r; - enum dm_queue_mode type = dm_get_md_type(md); + struct queue_limits limits; + enum dm_queue_mode type = dm_table_get_type(t); + + md->queue = blk_alloc_queue_node(GFP_KERNEL, md->numa_node_id); + if (!md->queue) { + DMERR("Cannot allocate queue for mapped device"); + return -ENOMEM; + } + md->queue->queuedata = md; + md->queue->backing_dev_info->congested_data = md; switch (type) { case DM_TYPE_REQUEST_BASED: + dm_init_normal_md_queue(md); r = dm_old_init_request_queue(md, t); if (r) { DMERR("Cannot initialize queue for request-based mapped device"); - return r; + goto bad; } break; case DM_TYPE_MQ_REQUEST_BASED: r = dm_mq_init_request_queue(md, t); if (r) { DMERR("Cannot initialize queue for request-based dm-mq mapped device"); - return r; + goto bad; } break; case DM_TYPE_BIO_BASED: @@ -2057,7 +2055,23 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t) break; } + r = dm_calculate_queue_limits(t, &limits); + if (r) { + DMERR("Cannot calculate initial queue limits"); + goto bad; + } + dm_table_set_restrictions(t, md->queue, &limits); + + md->disk->queue = md->queue; + WARN_ON(bdi_register_owner(md->queue->backing_dev_info, + disk_to_dev(md->disk))); + blk_register_queue(md->disk); + + dm_set_md_type(md, type); return 0; +bad: + blk_cleanup_queue(md->queue); + return r; } struct mapped_device *dm_get_md(dev_t dev) @@ -2121,7 +2135,6 @@ EXPORT_SYMBOL_GPL(dm_device_name); static void __dm_destroy(struct mapped_device *md, bool wait) { - struct request_queue *q = dm_get_md_queue(md); struct dm_table *map; int srcu_idx; @@ -2132,7 +2145,8 @@ static void __dm_destroy(struct mapped_device *md, bool wait) set_bit(DMF_FREEING, &md->flags); spin_unlock(&_minor_lock); - blk_set_queue_dying(q); + if (md->queue) + blk_set_queue_dying(md->queue); if (dm_request_based(md) && md->kworker_task) kthread_flush_worker(&md->kworker); -- 2.15.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2018-01-10 14:20 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-01-10 2:41 [for-4.16 PATCH v2 0/3] block: some genhd changes Mike Snitzer 2018-01-10 2:41 ` [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer 2018-01-10 4:35 ` Mike Snitzer 2018-01-10 8:31 ` Christoph Hellwig 2018-01-10 2:41 ` [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later Mike Snitzer 2018-01-10 3:46 ` [dm-devel] " Ming Lei 2018-01-10 4:21 ` Mike Snitzer 2018-01-10 7:55 ` Ming Lei 2018-01-10 14:20 ` Mike Snitzer 2018-01-10 8:32 ` Christoph Hellwig 2018-01-10 12:01 ` Hannes Reinecke 2018-01-10 2:41 ` [for-4.16 PATCH v2 3/3] dm: fix awkward request_queue initialization Mike Snitzer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).