* [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization @ 2010-05-24 23:46 Mike Snitzer 2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer 2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer 0 siblings, 2 replies; 16+ messages in thread From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw) To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon A mapped_device now has a specific immutable type (md->type) that is set during the initial table_load (see: patch 1/2). The DM device's request_queue will only have an elevator allocated and exposed via sysfs if it is request-based (see: patch 2/2). Please see the individual patch headers for more details. (Kiyoshi, if you approve please reply with your "Acked-by" :) This patchset is based on agk's DM "editing" tree: http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/ (apply all patches listed in the series file up to NEXT_PATCHES_END) This also patchset depends on the following linux-2.6-block.git 'for-linus' branch commits: 01effb0 block: allow initialization of previously allocated request_queue e36f724 block: Adjust elv_iosched_show to return "none" for bio-based DM I've made a full quilt series available here (builds on 2.6.34): http://people.redhat.com/msnitzer/patches/dm-queue-init/latest/ A monolithic patch is also available here: http://people.redhat.com/msnitzer/patches/dm-queue-init/latest/monolithic.patch I welcome all comments/review. v2 changes: - removed do_resume and table_clear interlock - rebased to agk's latest "editing" tree - dropped the dm_get_verified_mdptr and find_device_noinit patches that were at the end of v1; can revisit that code duplication cleanup later Mike Snitzer (2): dm: prevent table type changes after initial table load dm: only initialize full request_queue for request-based device drivers/md/dm-ioctl.c | 40 ++++++++++ drivers/md/dm.c | 189 ++++++++++++++++++++++++++++++++++++++++-------- drivers/md/dm.h | 8 ++ 3 files changed, 205 insertions(+), 32 deletions(-) ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/2] dm: prevent table type changes after initial table load 2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer @ 2010-05-24 23:46 ` Mike Snitzer 2010-05-25 11:16 ` Kiyoshi Ueda 2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer 1 sibling, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw) To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon Before table_load() stages an inactive table atomically check if its type conflicts with a previously established device type (md->type). Introduce 'type_lock' in mapped_device structure and use it to protect md->type access. table_load() sets md->type without concern for: - another table_load() racing to set conflicting md->type. - do_resume() making a conflicting table live. Allowed transitions: UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE Once a table load occurs the DM device's type is completely immutable. This prevents a table reload from switching the inactive table directly to a conflicting type (even if the table is explicitly cleared before load). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/md/dm-ioctl.c | 29 ++++++++++++++++++++ drivers/md/dm.c | 71 +++++++++++++++++++++++++++++++++++++++++++++----- drivers/md/dm.h | 6 ++++ 3 files changed, 99 insertions(+), 7 deletions(-) Index: linux-2.6/drivers/md/dm-ioctl.c =================================================================== --- linux-2.6.orig/drivers/md/dm-ioctl.c +++ linux-2.6/drivers/md/dm-ioctl.c @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p goto out; } + /* + * Protect md->type against concurrent table loads. + * Locking strategy: + * + Leverage fact that md's type cannot change after initial table load. + * - Only protect type in table_load() -- not in do_resume(). + * + * + Protect type while working to stage an inactive table: + * - check if table's type conflicts with md->type + * (holding: md->type_lock) + * - stage inactive table (hc->new_map) + * (holding: md->type_lock + _hash_lock) + */ + dm_lock_md_type(md); + + if (dm_unknown_md_type(md)) { + /* initial table load, set md's type based on table's type */ + dm_set_md_type(md, t); + } else if (!dm_md_type_matches_table(md, t)) { + DMWARN("can't change device type after initial table load."); + dm_table_destroy(t); + dm_unlock_md_type(md); + r = -EINVAL; + goto out; + } + + /* stage inactive table */ down_write(&_hash_lock); hc = dm_get_mdptr(md); if (!hc || hc->md != md) { DMWARN("device has been removed from the dev hash table."); dm_table_destroy(t); up_write(&_hash_lock); + dm_unlock_md_type(md); r = -ENXIO; goto out; } @@ -1191,6 +1218,8 @@ static int table_load(struct dm_ioctl *p hc->new_map = t; up_write(&_hash_lock); + dm_unlock_md_type(md); + param->flags |= DM_INACTIVE_PRESENT_FLAG; __dev_status(md, param); Index: linux-2.6/drivers/md/dm.c =================================================================== --- linux-2.6.orig/drivers/md/dm.c +++ linux-2.6/drivers/md/dm.c @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo); #define DMF_QUEUE_IO_TO_THREAD 6 /* + * Type for md->type field. + */ +enum mapped_device_type { + UNKNOWN_MD_TYPE, + BIO_BASED_MD_TYPE, + REQUEST_BASED_MD_TYPE, +}; + +/* * Work processed by per-device workqueue. */ struct mapped_device { @@ -123,6 +132,12 @@ struct mapped_device { unsigned long flags; struct request_queue *queue; + enum mapped_device_type type; + /* + * Protect type from concurrent access. + */ + struct mutex type_lock; + struct gendisk *disk; char name[16]; @@ -1874,8 +1889,10 @@ static struct mapped_device *alloc_dev(i if (r < 0) goto bad_minor; + md->type = UNKNOWN_MD_TYPE; init_rwsem(&md->io_lock); mutex_init(&md->suspend_lock); + mutex_init(&md->type_lock); spin_lock_init(&md->deferred_lock); spin_lock_init(&md->barrier_error_lock); rwlock_init(&md->map_lock); @@ -2128,6 +2145,53 @@ int dm_create(int minor, struct mapped_d return 0; } +/* + * Functions to manage md->type. + * All are required to hold md->type_lock. + */ +void dm_lock_md_type(struct mapped_device *md) +{ + mutex_lock(&md->type_lock); +} + +void dm_unlock_md_type(struct mapped_device *md) +{ + mutex_unlock(&md->type_lock); +} + +void dm_set_md_type(struct mapped_device *md, struct dm_table* t) +{ + if (dm_table_request_based(t)) + md->type = REQUEST_BASED_MD_TYPE; + else + md->type = BIO_BASED_MD_TYPE; +} + +bool dm_unknown_md_type(struct mapped_device *md) +{ + return md->type == UNKNOWN_MD_TYPE; +} + +static bool dm_bio_based_md_type(struct mapped_device *md) +{ + return md->type == BIO_BASED_MD_TYPE; +} + +static bool dm_request_based_md_type(struct mapped_device *md) +{ + return md->type == REQUEST_BASED_MD_TYPE; +} + +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t) +{ + if (dm_request_based_md_type(md)) + return dm_table_request_based(t); + else if (dm_bio_based_md_type(md)) + return !dm_table_request_based(t); + + return 0; +} + static struct mapped_device *dm_find_md(dev_t dev) { struct mapped_device *md; @@ -2403,13 +2467,6 @@ struct dm_table *dm_swap_table(struct ma goto out; } - /* cannot change the device type, once a table is bound */ - if (md->map && - (dm_table_get_type(md->map) != dm_table_get_type(table))) { - DMWARN("can't change the device type after a table is bound"); - goto out; - } - map = __bind(md, table, &limits); out: Index: linux-2.6/drivers/md/dm.h =================================================================== --- linux-2.6.orig/drivers/md/dm.h +++ linux-2.6/drivers/md/dm.h @@ -66,6 +66,12 @@ int dm_table_alloc_md_mempools(struct dm void dm_table_free_md_mempools(struct dm_table *t); struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t); +void dm_lock_md_type(struct mapped_device *md); +void dm_unlock_md_type(struct mapped_device *md); +void dm_set_md_type(struct mapped_device *md, struct dm_table* t); +bool dm_unknown_md_type(struct mapped_device *md); +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t); + /* * To check the return value from dm_table_find_target(). */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] dm: prevent table type changes after initial table load 2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer @ 2010-05-25 11:16 ` Kiyoshi Ueda 2010-05-25 12:44 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Kiyoshi Ueda @ 2010-05-25 11:16 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon Hi Mike, On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote: > Before table_load() stages an inactive table atomically check if its > type conflicts with a previously established device type (md->type). > > Introduce 'type_lock' in mapped_device structure and use it to protect > md->type access. table_load() sets md->type without concern for: > - another table_load() racing to set conflicting md->type. > - do_resume() making a conflicting table live. > > Allowed transitions: > UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE > UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE > > Once a table load occurs the DM device's type is completely immutable. > > This prevents a table reload from switching the inactive table directly > to a conflicting type (even if the table is explicitly cleared before > load). > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> I basically agree with this design. But I have some comments. Please see below. By the way, I can't make enough time to review until tomorrow. So the following comments may not be all. Sorry for the inconvenience. > Index: linux-2.6/drivers/md/dm-ioctl.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm-ioctl.c > +++ linux-2.6/drivers/md/dm-ioctl.c > @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p > goto out; > } > > + /* > + * Protect md->type against concurrent table loads. > + * Locking strategy: > + * + Leverage fact that md's type cannot change after initial table load. > + * - Only protect type in table_load() -- not in do_resume(). > + * > + * + Protect type while working to stage an inactive table: > + * - check if table's type conflicts with md->type > + * (holding: md->type_lock) > + * - stage inactive table (hc->new_map) > + * (holding: md->type_lock + _hash_lock) > + */ I think you don't need to hold md->type_lock while you set the table to hc->new_map; If 2 tables of the same type are racing here, we can have either table. If they are different type, the former one will win and the latter one will be rejected. > + dm_lock_md_type(md); > + > + if (dm_unknown_md_type(md)) { > + /* initial table load, set md's type based on table's type */ > + dm_set_md_type(md, t); > + } else if (!dm_md_type_matches_table(md, t)) { > + DMWARN("can't change device type after initial table load."); > + dm_table_destroy(t); > + dm_unlock_md_type(md); > + r = -EINVAL; > + goto out; > + } So you should be able to unlock md->type_lock here. > Index: linux-2.6/drivers/md/dm.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm.c > +++ linux-2.6/drivers/md/dm.c > @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo); > #define DMF_QUEUE_IO_TO_THREAD 6 > > /* > + * Type for md->type field. > + */ > +enum mapped_device_type { > + UNKNOWN_MD_TYPE, > + BIO_BASED_MD_TYPE, > + REQUEST_BASED_MD_TYPE, > +}; You can use the defined types in drivers/md/dm.h below instead; /* * Type of table and mapped_device's mempool */ #define DM_TYPE_NONE 0 #define DM_TYPE_BIO_BASED 1 #define DM_TYPE_REQUEST_BASED 2 Using them should make type matching easier than making another definition. E.g. see below; > +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t) > +{ > + if (dm_request_based_md_type(md)) > + return dm_table_request_based(t); > + else if (dm_bio_based_md_type(md)) > + return !dm_table_request_based(t); > + > + return 0; > +} You can match type using "md->type == dm_table_get_type(t)" if you use the already defined types above. Thanks, Kiyoshi Ueda ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] dm: prevent table type changes after initial table load 2010-05-25 11:16 ` Kiyoshi Ueda @ 2010-05-25 12:44 ` Mike Snitzer 0 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2010-05-25 12:44 UTC (permalink / raw) To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon On Tue, May 25 2010 at 7:16am -0400, Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > Hi Mike, > > On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote: > > Before table_load() stages an inactive table atomically check if its > > type conflicts with a previously established device type (md->type). > > > > Introduce 'type_lock' in mapped_device structure and use it to protect > > md->type access. table_load() sets md->type without concern for: > > - another table_load() racing to set conflicting md->type. > > - do_resume() making a conflicting table live. > > > > Allowed transitions: > > UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE > > UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE > > > > Once a table load occurs the DM device's type is completely immutable. > > > > This prevents a table reload from switching the inactive table directly > > to a conflicting type (even if the table is explicitly cleared before > > load). > > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > > I basically agree with this design. > But I have some comments. Please see below. > > By the way, I can't make enough time to review until tomorrow. > So the following comments may not be all. > Sorry for the inconvenience. No problem, thanks again for all your review. > > Index: linux-2.6/drivers/md/dm-ioctl.c > > =================================================================== > > --- linux-2.6.orig/drivers/md/dm-ioctl.c > > +++ linux-2.6/drivers/md/dm-ioctl.c > > @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p > > goto out; > > } > > > > + /* > > + * Protect md->type against concurrent table loads. > > + * Locking strategy: > > + * + Leverage fact that md's type cannot change after initial table load. > > + * - Only protect type in table_load() -- not in do_resume(). > > + * > > + * + Protect type while working to stage an inactive table: > > + * - check if table's type conflicts with md->type > > + * (holding: md->type_lock) > > + * - stage inactive table (hc->new_map) > > + * (holding: md->type_lock + _hash_lock) > > + */ > > I think you don't need to hold md->type_lock while you set the table > to hc->new_map; If 2 tables of the same type are racing here, we can > have either table. If they are different type, the former one will win > and the latter one will be rejected. Correct. This v2 patchset has significantly reduced the complexity (no longer need to be concerned with md->type transitioning back to UNKNOWN_MD_TYPE). I just missed the fact that the locking could be simplified a bit too. > > + dm_lock_md_type(md); > > + > > + if (dm_unknown_md_type(md)) { > > + /* initial table load, set md's type based on table's type */ > > + dm_set_md_type(md, t); > > + } else if (!dm_md_type_matches_table(md, t)) { > > + DMWARN("can't change device type after initial table load."); > > + dm_table_destroy(t); > > + dm_unlock_md_type(md); > > + r = -EINVAL; > > + goto out; > > + } > > So you should be able to unlock md->type_lock here. Yes. > > Index: linux-2.6/drivers/md/dm.c > > =================================================================== > > --- linux-2.6.orig/drivers/md/dm.c > > +++ linux-2.6/drivers/md/dm.c > > @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo); > > #define DMF_QUEUE_IO_TO_THREAD 6 > > > > /* > > + * Type for md->type field. > > + */ > > +enum mapped_device_type { > > + UNKNOWN_MD_TYPE, > > + BIO_BASED_MD_TYPE, > > + REQUEST_BASED_MD_TYPE, > > +}; > > You can use the defined types in drivers/md/dm.h below instead; > /* > * Type of table and mapped_device's mempool > */ > #define DM_TYPE_NONE 0 > #define DM_TYPE_BIO_BASED 1 > #define DM_TYPE_REQUEST_BASED 2 > > Using them should make type matching easier than making another definition. > E.g. see below; > > > +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t) > > +{ > > + if (dm_request_based_md_type(md)) > > + return dm_table_request_based(t); > > + else if (dm_bio_based_md_type(md)) > > + return !dm_table_request_based(t); > > + > > + return 0; > > +} > > You can match type using "md->type == dm_table_get_type(t)" if you use > the already defined types above. Ah, yes that'll clean things up nicely. No idea how I missed the existing definitions. Thanks, Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device 2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer 2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer @ 2010-05-24 23:46 ` Mike Snitzer 2010-05-25 11:18 ` Kiyoshi Ueda 1 sibling, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw) To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon Allocate a minimalist request_queue structure initially (needed for both bio and request-based DM). A bio-based DM device no longer defaults to having a fully initialized request_queue (request_fn, elevator, etc). So bio-based DM devices no longer register elevator sysfs attributes ('iosched/' tree or 'scheduler' other than "none"). Initialization of a full request_queue (request_fn, elevator, etc) is deferred until it is known that the DM device is request-based -- at the end of the table load sequence. Factor DM device's request_queue initialization: - common to both request-based and bio-based into dm_init_md_queue(). - specific to request-based into dm_init_request_based_queue(). md->type_lock is also used to protect md->queue during table_load(). md->queue is setup without concern for: - another table_load() racing to setup conflicting queue state. - do_resume() making a conflicting table live. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/md/dm-ioctl.c | 15 +++++- drivers/md/dm.c | 120 +++++++++++++++++++++++++++++++++++++++----------- drivers/md/dm.h | 2 3 files changed, 109 insertions(+), 28 deletions(-) Index: linux-2.6/drivers/md/dm-ioctl.c =================================================================== --- linux-2.6.orig/drivers/md/dm-ioctl.c +++ linux-2.6/drivers/md/dm-ioctl.c @@ -1177,14 +1177,16 @@ static int table_load(struct dm_ioctl *p } /* - * Protect md->type against concurrent table loads. + * Protect md->type and md->queue against concurrent table loads. * Locking strategy: * + Leverage fact that md's type cannot change after initial table load. * - Only protect type in table_load() -- not in do_resume(). * - * + Protect type while working to stage an inactive table: + * + Protect type and queue while working to stage an inactive table: * - check if table's type conflicts with md->type * (holding: md->type_lock) + * - setup md->queue based on md->type + * (holding: md->type_lock) * - stage inactive table (hc->new_map) * (holding: md->type_lock + _hash_lock) */ @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p goto out; } + /* setup md->queue to reflect md's and table's type (may block) */ + r = dm_setup_md_queue(md); + if (r) { + DMWARN("unable to setup device queue for this table."); + dm_table_destroy(t); + dm_unlock_md_type(md); + goto out; + } + /* stage inactive table */ down_write(&_hash_lock); hc = dm_get_mdptr(md); Index: linux-2.6/drivers/md/dm.c =================================================================== --- linux-2.6.orig/drivers/md/dm.c +++ linux-2.6/drivers/md/dm.c @@ -134,7 +134,7 @@ struct mapped_device { struct request_queue *queue; enum mapped_device_type type; /* - * Protect type from concurrent access. + * Protect queue and type from concurrent access. */ struct mutex type_lock; @@ -1864,6 +1864,28 @@ static const struct block_device_operati static void dm_wq_work(struct work_struct *work); static void dm_rq_barrier_work(struct work_struct *work); +static void dm_init_md_queue(struct mapped_device *md) +{ + /* + * Request-based dm devices cannot be stacked on top of bio-based dm + * devices. The type of this dm device has not been decided yet. + * The type is decided at the first table loading time. + * To prevent problematic device stacking, clear the queue flag + * for request stacking support until then. + * + * This queue is new, so no concurrency on the queue_flags. + */ + queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue); + + md->queue->queuedata = md; + md->queue->backing_dev_info.congested_fn = dm_any_congested; + md->queue->backing_dev_info.congested_data = md; + blk_queue_make_request(md->queue, dm_request); + blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY); + md->queue->unplug_fn = dm_unplug_all; + blk_queue_merge_bvec(md->queue, dm_merge_bvec); +} + /* * Allocate and initialise a blank device with a given minor. */ @@ -1903,34 +1925,11 @@ static struct mapped_device *alloc_dev(i INIT_LIST_HEAD(&md->uevent_list); spin_lock_init(&md->uevent_lock); - md->queue = blk_init_queue(dm_request_fn, NULL); + md->queue = blk_alloc_queue(GFP_KERNEL); if (!md->queue) goto bad_queue; - /* - * Request-based dm devices cannot be stacked on top of bio-based dm - * devices. The type of this dm device has not been decided yet, - * although we initialized the queue using blk_init_queue(). - * The type is decided at the first table loading time. - * To prevent problematic device stacking, clear the queue flag - * for request stacking support until then. - * - * This queue is new, so no concurrency on the queue_flags. - */ - queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue); - md->saved_make_request_fn = md->queue->make_request_fn; - md->queue->queuedata = md; - md->queue->backing_dev_info.congested_fn = dm_any_congested; - md->queue->backing_dev_info.congested_data = md; - blk_queue_make_request(md->queue, dm_request); - blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY); - md->queue->unplug_fn = dm_unplug_all; - blk_queue_merge_bvec(md->queue, dm_merge_bvec); - blk_queue_softirq_done(md->queue, dm_softirq_done); - blk_queue_prep_rq(md->queue, dm_prep_fn); - blk_queue_lld_busy(md->queue, dm_lld_busy); - blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH, - dm_rq_prepare_flush); + dm_init_md_queue(md); md->disk = alloc_disk(1); if (!md->disk) @@ -2192,6 +2191,75 @@ bool dm_md_type_matches_table(struct map return 0; } +/* + * Functions to manage md->queue. + * All are required to hold md->type_lock. + */ +static bool dm_bio_based_md_queue(struct mapped_device *md) +{ + return (md->queue->request_fn) ? 0 : 1; +} + +/* + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). + */ +static int dm_init_request_based_queue(struct mapped_device *md) +{ + struct request_queue *q = NULL; + + /* Avoid re-initializing the queue if already fully initialized */ + if (!md->queue->elevator) { + /* Fully initialize the queue */ + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); + if (!q) + return 0; + md->queue = q; + md->saved_make_request_fn = md->queue->make_request_fn; + dm_init_md_queue(md); + blk_queue_softirq_done(md->queue, dm_softirq_done); + blk_queue_prep_rq(md->queue, dm_prep_fn); + blk_queue_lld_busy(md->queue, dm_lld_busy); + blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH, + dm_rq_prepare_flush); + } else { + BUG_ON(dm_bio_based_md_queue(md)); + return 1; /* queue already request-based */ + } + + elv_register_queue(md->queue); + + return 1; +} + +static void dm_clear_request_based_queue(struct mapped_device *md) +{ + if (dm_bio_based_md_queue(md)) + return; /* queue already bio-based */ + + /* Unregister elevator from sysfs and clear ->request_fn */ + elv_unregister_queue(md->queue); + md->queue->request_fn = NULL; +} + +/* + * Setup the DM device's queue based on md's type + */ +int dm_setup_md_queue(struct mapped_device *md) +{ + BUG_ON(!mutex_is_locked(&md->type_lock)); + BUG_ON(dm_unknown_md_type(md)); + + if (dm_request_based_md_type(md)) { + if (!dm_init_request_based_queue(md)) { + DMWARN("Cannot initialize queue for Request-based dm"); + return -EINVAL; + } + } else if (dm_bio_based_md_type(md)) + dm_clear_request_based_queue(md); + + return 0; +} + static struct mapped_device *dm_find_md(dev_t dev) { struct mapped_device *md; Index: linux-2.6/drivers/md/dm.h =================================================================== --- linux-2.6.orig/drivers/md/dm.h +++ linux-2.6/drivers/md/dm.h @@ -72,6 +72,8 @@ void dm_set_md_type(struct mapped_device bool dm_unknown_md_type(struct mapped_device *md); bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t); +int dm_setup_md_queue(struct mapped_device *md); + /* * To check the return value from dm_table_find_target(). */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device 2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer @ 2010-05-25 11:18 ` Kiyoshi Ueda 2010-05-25 12:49 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Kiyoshi Ueda @ 2010-05-25 11:18 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon Hi Mike, On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote: > Index: linux-2.6/drivers/md/dm-ioctl.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm-ioctl.c > +++ linux-2.6/drivers/md/dm-ioctl.c > @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p > goto out; > } > > + /* setup md->queue to reflect md's and table's type (may block) */ > + r = dm_setup_md_queue(md); > + if (r) { > + DMWARN("unable to setup device queue for this table."); > + dm_table_destroy(t); > + dm_unlock_md_type(md); > + goto out; > + } > + dm_setup_md_queue() should be put just after dm_set_md_type() of patch#1. Then, you can make sure that dm_setup_md_queue() is called only once and dm_setup_md_queue() should be much simpler. > +/* > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). > + */ > +static int dm_init_request_based_queue(struct mapped_device *md) > +{ > + struct request_queue *q = NULL; > + > + /* Avoid re-initializing the queue if already fully initialized */ > + if (!md->queue->elevator) { > + /* Fully initialize the queue */ > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); > + if (!q) > + return 0; When blk_init_allocated_queue() fails, the block-layer seems not to guarantee that the queue is still available. You should create appropriate block-layer interfaces and/or take proper recovery action here. However, if the failure is not realistic, rather than complicating this patch, you can just put a big comment and WARN_ON() for now, and fix it in a separate patch-set. > +static void dm_clear_request_based_queue(struct mapped_device *md) > +{ > + if (dm_bio_based_md_queue(md)) > + return; /* queue already bio-based */ > + > + /* Unregister elevator from sysfs and clear ->request_fn */ > + elv_unregister_queue(md->queue); > + md->queue->request_fn = NULL; > +} > + > +/* > + * Setup the DM device's queue based on md's type > + */ > +int dm_setup_md_queue(struct mapped_device *md) > +{ > + BUG_ON(!mutex_is_locked(&md->type_lock)); > + BUG_ON(dm_unknown_md_type(md)); > + > + if (dm_request_based_md_type(md)) { > + if (!dm_init_request_based_queue(md)) { > + DMWARN("Cannot initialize queue for Request-based dm"); > + return -EINVAL; > + } > + } else if (dm_bio_based_md_type(md)) > + dm_clear_request_based_queue(md); > + > + return 0; > +} These unregister/clear works shouldn't be needed if dm_setup_md_queue() is called only once as I mentioned above. Thanks, Kiyoshi Ueda ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device 2010-05-25 11:18 ` Kiyoshi Ueda @ 2010-05-25 12:49 ` Mike Snitzer 2010-05-25 16:34 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2010-05-25 12:49 UTC (permalink / raw) To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon, jens.axboe On Tue, May 25 2010 at 7:18am -0400, Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > Hi Mike, > > On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote: > > Index: linux-2.6/drivers/md/dm-ioctl.c > > =================================================================== > > --- linux-2.6.orig/drivers/md/dm-ioctl.c > > +++ linux-2.6/drivers/md/dm-ioctl.c > > @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p > > goto out; > > } > > > > + /* setup md->queue to reflect md's and table's type (may block) */ > > + r = dm_setup_md_queue(md); > > + if (r) { > > + DMWARN("unable to setup device queue for this table."); > > + dm_table_destroy(t); > > + dm_unlock_md_type(md); > > + goto out; > > + } > > + > > dm_setup_md_queue() should be put just after dm_set_md_type() of patch#1. > Then, you can make sure that dm_setup_md_queue() is called only once > and dm_setup_md_queue() should be much simpler. Good idea. > > +/* > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). > > + */ > > +static int dm_init_request_based_queue(struct mapped_device *md) > > +{ > > + struct request_queue *q = NULL; > > + > > + /* Avoid re-initializing the queue if already fully initialized */ > > + if (!md->queue->elevator) { > > + /* Fully initialize the queue */ > > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); > > + if (!q) > > + return 0; > > When blk_init_allocated_queue() fails, the block-layer seems not to > guarantee that the queue is still available. Ouch, yes this portion of blk_init_allocated_queue_node() is certainly problematic: if (blk_init_free_list(q)) { kmem_cache_free(blk_requestq_cachep, q); return NULL; } Cc'ing Jens as I think it would be advantageous for us to push the above kmem_cache_free() into the callers where it really makes sense, e.g.: blk_init_queue_node(). So on blk_init_allocated_queue_node() failure blk_init_queue_node() will take care to cleanup the queue that it assumes it is managing completely. My patch (linux-2.6-block.git's commit: 01effb0) that split out blk_init_allocated_queue_node() from blk_init_queue_node() opened up this issue. I'm fairly confident we'll get it fixed by the time 2.6.35 ships. > You should create appropriate block-layer interfaces and/or take proper > recovery action here. > However, if the failure is not realistic, rather than complicating this > patch, you can just put a big comment and WARN_ON() for now, and fix it > in a separate patch-set. I'll hold off on the WARN_ON(). I think we need the request_queue to remain allocated even after blk_init_allocated_queue_node() failure. > > +static void dm_clear_request_based_queue(struct mapped_device *md) > > +{ > > + if (dm_bio_based_md_queue(md)) > > + return; /* queue already bio-based */ > > + > > + /* Unregister elevator from sysfs and clear ->request_fn */ > > + elv_unregister_queue(md->queue); > > + md->queue->request_fn = NULL; > > +} > > + > > +/* > > + * Setup the DM device's queue based on md's type > > + */ > > +int dm_setup_md_queue(struct mapped_device *md) > > +{ > > + BUG_ON(!mutex_is_locked(&md->type_lock)); > > + BUG_ON(dm_unknown_md_type(md)); > > + > > + if (dm_request_based_md_type(md)) { > > + if (!dm_init_request_based_queue(md)) { > > + DMWARN("Cannot initialize queue for Request-based dm"); > > + return -EINVAL; > > + } > > + } else if (dm_bio_based_md_type(md)) > > + dm_clear_request_based_queue(md); > > + > > + return 0; > > +} > > These unregister/clear works shouldn't be needed if dm_setup_md_queue() > is called only once as I mentioned above. Indeed. I'll get v3 of the patchset out with appropriate edits. Thanks, Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] block: avoid unconditionally freeing previously allocated request_queue 2010-05-25 12:49 ` Mike Snitzer @ 2010-05-25 16:34 ` Mike Snitzer 0 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2010-05-25 16:34 UTC (permalink / raw) To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon On Tue, May 25 2010 at 8:49am -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, May 25 2010 at 7:18am -0400, > Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > > > > +/* > > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). > > > + */ > > > +static int dm_init_request_based_queue(struct mapped_device *md) > > > +{ > > > + struct request_queue *q = NULL; > > > + > > > + /* Avoid re-initializing the queue if already fully initialized */ > > > + if (!md->queue->elevator) { > > > + /* Fully initialize the queue */ > > > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); > > > + if (!q) > > > + return 0; > > > > When blk_init_allocated_queue() fails, the block-layer seems not to > > guarantee that the queue is still available. > > Ouch, yes this portion of blk_init_allocated_queue_node() is certainly > problematic: > > if (blk_init_free_list(q)) { > kmem_cache_free(blk_requestq_cachep, q); > return NULL; > } > > Cc'ing Jens as I think it would be advantageous for us to push the above > kmem_cache_free() into the callers where it really makes sense, e.g.: > blk_init_queue_node(). > > So on blk_init_allocated_queue_node() failure blk_init_queue_node() will > take care to cleanup the queue that it assumes it is managing > completely. > > My patch (linux-2.6-block.git's commit: 01effb0) that split out > blk_init_allocated_queue_node() from blk_init_queue_node() opened up > this issue. I'm fairly confident we'll get it fixed by the time 2.6.35 > ships. Jens, How about something like the following? block: avoid unconditionally freeing previously allocated request_queue On blk_init_allocated_queue_node failure, only free request_queue if it is wasn't previously allocated outside the block layer (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller). This addresses a regression introduced by the following commit: 01effb0 block: allow initialization of previously allocated request_queue Otherwise the request_queue may be free'd out from underneath a caller that is managing the request_queue directly (e.g. caller uses blk_alloc_queue + blk_init_allocated_queue_node). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 31 ++++++++++++++++++++++++++----- 1 files changed, 26 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 3bc5579..c0179b7 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -528,6 +528,24 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) } EXPORT_SYMBOL(blk_alloc_queue_node); +static void blk_free_partial_queue(struct request_queue *q) +{ + /* Free q if blk_init_queue failed early enough. */ + int free_request_queue = 0; + struct request_list *rl; + + if (!q) + return; + + /* Was blk_init_free_list the cause for failure? */ + rl = &q->rq; + if (!rl->rq_pool) + free_request_queue = 1; + + if (free_request_queue) + kmem_cache_free(blk_requestq_cachep, q); +} + /** * blk_init_queue - prepare a request queue for use with a block device * @rfn: The function to be called to process requests that have been @@ -570,9 +588,14 @@ EXPORT_SYMBOL(blk_init_queue); struct request_queue * blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { - struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id); + struct request_queue *uninit_q, *q; + + uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id); + q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id); + if (!q) + blk_free_partial_queue(uninit_q); - return blk_init_allocated_queue_node(q, rfn, lock, node_id); + return q; } EXPORT_SYMBOL(blk_init_queue_node); @@ -592,10 +615,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return NULL; q->node = node_id; - if (blk_init_free_list(q)) { - kmem_cache_free(blk_requestq_cachep, q); + if (blk_init_free_list(q)) return NULL; - } q->request_fn = rfn; q->prep_rq_fn = NULL; ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH] block: avoid unconditionally freeing previously allocated request_queue @ 2010-05-25 16:34 ` Mike Snitzer 0 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2010-05-25 16:34 UTC (permalink / raw) To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel On Tue, May 25 2010 at 8:49am -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, May 25 2010 at 7:18am -0400, > Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > > > > +/* > > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). > > > + */ > > > +static int dm_init_request_based_queue(struct mapped_device *md) > > > +{ > > > + struct request_queue *q = NULL; > > > + > > > + /* Avoid re-initializing the queue if already fully initialized */ > > > + if (!md->queue->elevator) { > > > + /* Fully initialize the queue */ > > > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); > > > + if (!q) > > > + return 0; > > > > When blk_init_allocated_queue() fails, the block-layer seems not to > > guarantee that the queue is still available. > > Ouch, yes this portion of blk_init_allocated_queue_node() is certainly > problematic: > > if (blk_init_free_list(q)) { > kmem_cache_free(blk_requestq_cachep, q); > return NULL; > } > > Cc'ing Jens as I think it would be advantageous for us to push the above > kmem_cache_free() into the callers where it really makes sense, e.g.: > blk_init_queue_node(). > > So on blk_init_allocated_queue_node() failure blk_init_queue_node() will > take care to cleanup the queue that it assumes it is managing > completely. > > My patch (linux-2.6-block.git's commit: 01effb0) that split out > blk_init_allocated_queue_node() from blk_init_queue_node() opened up > this issue. I'm fairly confident we'll get it fixed by the time 2.6.35 > ships. Jens, How about something like the following? block: avoid unconditionally freeing previously allocated request_queue On blk_init_allocated_queue_node failure, only free request_queue if it is wasn't previously allocated outside the block layer (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller). This addresses a regression introduced by the following commit: 01effb0 block: allow initialization of previously allocated request_queue Otherwise the request_queue may be free'd out from underneath a caller that is managing the request_queue directly (e.g. caller uses blk_alloc_queue + blk_init_allocated_queue_node). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 31 ++++++++++++++++++++++++++----- 1 files changed, 26 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 3bc5579..c0179b7 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -528,6 +528,24 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) } EXPORT_SYMBOL(blk_alloc_queue_node); +static void blk_free_partial_queue(struct request_queue *q) +{ + /* Free q if blk_init_queue failed early enough. */ + int free_request_queue = 0; + struct request_list *rl; + + if (!q) + return; + + /* Was blk_init_free_list the cause for failure? */ + rl = &q->rq; + if (!rl->rq_pool) + free_request_queue = 1; + + if (free_request_queue) + kmem_cache_free(blk_requestq_cachep, q); +} + /** * blk_init_queue - prepare a request queue for use with a block device * @rfn: The function to be called to process requests that have been @@ -570,9 +588,14 @@ EXPORT_SYMBOL(blk_init_queue); struct request_queue * blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { - struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id); + struct request_queue *uninit_q, *q; + + uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id); + q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id); + if (!q) + blk_free_partial_queue(uninit_q); - return blk_init_allocated_queue_node(q, rfn, lock, node_id); + return q; } EXPORT_SYMBOL(blk_init_queue_node); @@ -592,10 +615,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return NULL; q->node = node_id; - if (blk_init_free_list(q)) { - kmem_cache_free(blk_requestq_cachep, q); + if (blk_init_free_list(q)) return NULL; - } q->request_fn = rfn; q->prep_rq_fn = NULL; ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/1] block: make blk_init_free_list and elevator_init idempotent 2010-05-25 16:34 ` Mike Snitzer (?) @ 2010-05-25 17:15 ` Mike Snitzer -1 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2010-05-25 17:15 UTC (permalink / raw) To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel blk_init_allocated_queue_node may fail and the caller _could_ retry. Accommodate the unlikely event that blk_init_allocated_queue_node is called on an already initialized (possibly partially) request_queue. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 3 +++ block/elevator.c | 6 ++++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index c0179b7..2c208c3 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -467,6 +467,9 @@ static int blk_init_free_list(struct request_queue *q) { struct request_list *rl = &q->rq; + if (unlikely(rl->rq_pool)) + return 0; + rl->count[BLK_RW_SYNC] = rl->count[BLK_RW_ASYNC] = 0; rl->starved[BLK_RW_SYNC] = rl->starved[BLK_RW_ASYNC] = 0; rl->elvpriv = 0; diff --git a/block/elevator.c b/block/elevator.c index 0abce47..923a913 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -242,9 +242,11 @@ int elevator_init(struct request_queue *q, char *name) { struct elevator_type *e = NULL; struct elevator_queue *eq; - int ret = 0; void *data; + if (unlikely(q->elevator)) + return 0; + INIT_LIST_HEAD(&q->queue_head); q->last_merge = NULL; q->end_sector = 0; @@ -284,7 +286,7 @@ int elevator_init(struct request_queue *q, char *name) } elevator_attach(q, eq, data); - return ret; + return 0; } EXPORT_SYMBOL(elevator_init); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] block: avoid unconditionally freeing previously allocated request_queue 2010-05-25 16:34 ` Mike Snitzer (?) (?) @ 2010-05-26 2:37 ` Kiyoshi Ueda 2010-05-26 4:47 ` Mike Snitzer -1 siblings, 1 reply; 16+ messages in thread From: Kiyoshi Ueda @ 2010-05-26 2:37 UTC (permalink / raw) To: Mike Snitzer; +Cc: Jens Axboe, dm-devel, Alasdair Kergon, linux-kernel Hi Mike, On 05/26/2010 01:34 AM +0900, Mike Snitzer wrote: > Mike Snitzer <snitzer@redhat.com> wrote: >> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: >>>> +/* >>>> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). >>>> + */ >>>> +static int dm_init_request_based_queue(struct mapped_device *md) >>>> +{ >>>> + struct request_queue *q = NULL; >>>> + >>>> + /* Avoid re-initializing the queue if already fully initialized */ >>>> + if (!md->queue->elevator) { >>>> + /* Fully initialize the queue */ >>>> + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); >>>> + if (!q) >>>> + return 0; >>> >>> When blk_init_allocated_queue() fails, the block-layer seems not to >>> guarantee that the queue is still available. >> >> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly >> problematic: >> >> if (blk_init_free_list(q)) { >> kmem_cache_free(blk_requestq_cachep, q); >> return NULL; >> } Not only that. The blk_put_queue() in blk_init_allocated_queue_node() will also free the queue: if (!elevator_init(q, NULL)) { blk_queue_congestion_threshold(q); return q; } blk_put_queue(q); return NULL; Thanks, Kiyoshi Ueda ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: block: avoid unconditionally freeing previously allocated request_queue 2010-05-26 2:37 ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Kiyoshi Ueda @ 2010-05-26 4:47 ` Mike Snitzer 0 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2010-05-26 4:47 UTC (permalink / raw) To: Kiyoshi Ueda; +Cc: Jens Axboe, dm-devel, Alasdair Kergon, linux-kernel On Tue, May 25 2010 at 10:37pm -0400, Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > Hi Mike, > > On 05/26/2010 01:34 AM +0900, Mike Snitzer wrote: > > Mike Snitzer <snitzer@redhat.com> wrote: > >> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > >>>> +/* > >>>> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc). > >>>> + */ > >>>> +static int dm_init_request_based_queue(struct mapped_device *md) > >>>> +{ > >>>> + struct request_queue *q = NULL; > >>>> + > >>>> + /* Avoid re-initializing the queue if already fully initialized */ > >>>> + if (!md->queue->elevator) { > >>>> + /* Fully initialize the queue */ > >>>> + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL); > >>>> + if (!q) > >>>> + return 0; > >>> > >>> When blk_init_allocated_queue() fails, the block-layer seems not to > >>> guarantee that the queue is still available. > >> > >> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly > >> problematic: > >> > >> if (blk_init_free_list(q)) { > >> kmem_cache_free(blk_requestq_cachep, q); > >> return NULL; > >> } > > Not only that. The blk_put_queue() in blk_init_allocated_queue_node() > will also free the queue: > > if (!elevator_init(q, NULL)) { > blk_queue_congestion_threshold(q); > return q; > } > > blk_put_queue(q); > return NULL; OK, I'll post v2 that addresses this and we'll see what Jens says. Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2] block: avoid unconditionally freeing previously allocated request_queue 2010-05-25 16:34 ` Mike Snitzer ` (2 preceding siblings ...) (?) @ 2010-05-26 4:52 ` Mike Snitzer 2010-06-03 16:58 ` [PATCH v3] " Mike Snitzer -1 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2010-05-26 4:52 UTC (permalink / raw) To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel On blk_init_allocated_queue_node failure, only free request_queue if it is wasn't previously allocated outside the block layer (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller). This addresses an interface bug introduced by the following commit: 01effb0 block: allow initialization of previously allocated request_queue Otherwise the request_queue may be free'd out from underneath a caller that is managing the request_queue directly (e.g. caller uses blk_alloc_queue + blk_init_allocated_queue_node). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 33 +++++++++++++++++++++++++++------ 1 files changed, 27 insertions(+), 6 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 3bc5579..c0cdafd 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -528,6 +528,25 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) } EXPORT_SYMBOL(blk_alloc_queue_node); +static void blk_free_partial_queue(struct request_queue *q) +{ + struct request_list *rl; + + if (!q) + return; + + /* Was blk_init_free_list the cause for failure? */ + rl = &q->rq; + if (!rl->rq_pool) { + kmem_cache_free(blk_requestq_cachep, q); + return; + } + + /* Or was elevator_init? */ + if (!q->elevator) + blk_put_queue(q); +} + /** * blk_init_queue - prepare a request queue for use with a block device * @rfn: The function to be called to process requests that have been @@ -570,9 +589,14 @@ EXPORT_SYMBOL(blk_init_queue); struct request_queue * blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { - struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id); + struct request_queue *uninit_q, *q; + + uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id); + q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id); + if (!q) + blk_free_partial_queue(uninit_q); - return blk_init_allocated_queue_node(q, rfn, lock, node_id); + return q; } EXPORT_SYMBOL(blk_init_queue_node); @@ -592,10 +616,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return NULL; q->node = node_id; - if (blk_init_free_list(q)) { - kmem_cache_free(blk_requestq_cachep, q); + if (blk_init_free_list(q)) return NULL; - } q->request_fn = rfn; q->prep_rq_fn = NULL; @@ -618,7 +640,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return q; } - blk_put_queue(q); return NULL; } EXPORT_SYMBOL(blk_init_allocated_queue_node); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3] block: avoid unconditionally freeing previously allocated request_queue 2010-05-26 4:52 ` [PATCH v2] " Mike Snitzer @ 2010-06-03 16:58 ` Mike Snitzer 2010-06-03 17:34 ` [PATCH v4] " Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2010-06-03 16:58 UTC (permalink / raw) To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon On blk_init_allocated_queue_node failure, only free the request_queue if it is wasn't previously allocated outside the block layer (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller). This addresses an interface bug introduced by the following commit: 01effb0 block: allow initialization of previously allocated request_queue Otherwise the request_queue may be free'd out from underneath a caller that is managing the request_queue directly (e.g. caller uses blk_alloc_queue + blk_init_allocated_queue_node). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 14 ++++++++------ 1 files changed, 8 insertions(+), 6 deletions(-) v3: leverage fact that blk_cleanup_queue will properly free all memory associated with a request_queue (e.g.: q->rq_pool and q->elevator) diff --git a/block/blk-core.c b/block/blk-core.c index 3bc5579..24683a4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -570,9 +570,14 @@ EXPORT_SYMBOL(blk_init_queue); struct request_queue * blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { - struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id); + struct request_queue *uninit_q, *q; - return blk_init_allocated_queue_node(q, rfn, lock, node_id); + uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id); + q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id); + if (!q) + blk_cleanup_queue(uninit_q); + + return q; } EXPORT_SYMBOL(blk_init_queue_node); @@ -592,10 +597,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return NULL; q->node = node_id; - if (blk_init_free_list(q)) { - kmem_cache_free(blk_requestq_cachep, q); + if (blk_init_free_list(q)) return NULL; - } q->request_fn = rfn; q->prep_rq_fn = NULL; @@ -618,7 +621,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return q; } - blk_put_queue(q); return NULL; } EXPORT_SYMBOL(blk_init_allocated_queue_node); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4] block: avoid unconditionally freeing previously allocated request_queue 2010-06-03 16:58 ` [PATCH v3] " Mike Snitzer @ 2010-06-03 17:34 ` Mike Snitzer 2010-06-04 11:44 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2010-06-03 17:34 UTC (permalink / raw) To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon block: avoid unconditionally freeing previously allocated request_queue On blk_init_allocated_queue_node failure, only free the request_queue if it is wasn't previously allocated outside the block layer (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller). This addresses an interface bug introduced by the following commit: 01effb0 block: allow initialization of previously allocated request_queue Otherwise the request_queue may be free'd out from underneath a caller that is managing the request_queue directly (e.g. caller uses blk_alloc_queue + blk_init_allocated_queue_node). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 17 +++++++++++------ 1 files changed, 11 insertions(+), 6 deletions(-) v4: eliminate potential for NULL pointer in call to blk_cleanup_queue v3: leverage fact that blk_cleanup_queue will properly free all memory associated with a request_queue (e.g.: q->rq_pool and q->elevator) diff --git a/block/blk-core.c b/block/blk-core.c index 3bc5579..826d070 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -570,9 +570,17 @@ EXPORT_SYMBOL(blk_init_queue); struct request_queue * blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { - struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id); + struct request_queue *uninit_q, *q; - return blk_init_allocated_queue_node(q, rfn, lock, node_id); + uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id); + if (!uninit_q) + return NULL; + + q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id); + if (!q) + blk_cleanup_queue(uninit_q); + + return q; } EXPORT_SYMBOL(blk_init_queue_node); @@ -592,10 +600,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return NULL; q->node = node_id; - if (blk_init_free_list(q)) { - kmem_cache_free(blk_requestq_cachep, q); + if (blk_init_free_list(q)) return NULL; - } q->request_fn = rfn; q->prep_rq_fn = NULL; @@ -618,7 +624,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn, return q; } - blk_put_queue(q); return NULL; } EXPORT_SYMBOL(blk_init_allocated_queue_node); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v4] block: avoid unconditionally freeing previously allocated request_queue 2010-06-03 17:34 ` [PATCH v4] " Mike Snitzer @ 2010-06-04 11:44 ` Jens Axboe 0 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2010-06-04 11:44 UTC (permalink / raw) To: Mike Snitzer Cc: Kiyoshi Ueda, dm-devel@redhat.com, linux-kernel@vger.kernel.org, Alasdair Kergon On 2010-06-03 19:34, Mike Snitzer wrote: > block: avoid unconditionally freeing previously allocated request_queue > > On blk_init_allocated_queue_node failure, only free the request_queue if > it is wasn't previously allocated outside the block layer > (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller). > > This addresses an interface bug introduced by the following commit: > 01effb0 block: allow initialization of previously allocated > request_queue > > Otherwise the request_queue may be free'd out from underneath a caller > that is managing the request_queue directly (e.g. caller uses > blk_alloc_queue + blk_init_allocated_queue_node). Thanks Mike, this looks a lot better. I have applied this one and 2/2 of the original posting. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2010-06-04 11:44 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer 2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer 2010-05-25 11:16 ` Kiyoshi Ueda 2010-05-25 12:44 ` Mike Snitzer 2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer 2010-05-25 11:18 ` Kiyoshi Ueda 2010-05-25 12:49 ` Mike Snitzer 2010-05-25 16:34 ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Mike Snitzer 2010-05-25 16:34 ` Mike Snitzer 2010-05-25 17:15 ` [PATCH 2/1] block: make blk_init_free_list and elevator_init idempotent Mike Snitzer 2010-05-26 2:37 ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Kiyoshi Ueda 2010-05-26 4:47 ` Mike Snitzer 2010-05-26 4:52 ` [PATCH v2] " Mike Snitzer 2010-06-03 16:58 ` [PATCH v3] " Mike Snitzer 2010-06-03 17:34 ` [PATCH v4] " Mike Snitzer 2010-06-04 11:44 ` Jens Axboe
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.