* [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization
@ 2010-05-24 23:46 Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
0 siblings, 2 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
A mapped_device now has a specific immutable type (md->type) that is
set during the initial table_load (see: patch 1/2).
The DM device's request_queue will only have an elevator allocated and
exposed via sysfs if it is request-based (see: patch 2/2).
Please see the individual patch headers for more details.
(Kiyoshi, if you approve please reply with your "Acked-by" :)
This patchset is based on agk's DM "editing" tree:
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/
(apply all patches listed in the series file up to NEXT_PATCHES_END)
This also patchset depends on the following linux-2.6-block.git
'for-linus' branch commits:
01effb0 block: allow initialization of previously allocated request_queue
e36f724 block: Adjust elv_iosched_show to return "none" for bio-based DM
I've made a full quilt series available here (builds on 2.6.34):
http://people.redhat.com/msnitzer/patches/dm-queue-init/latest/
A monolithic patch is also available here:
http://people.redhat.com/msnitzer/patches/dm-queue-init/latest/monolithic.patch
I welcome all comments/review.
v2 changes:
- removed do_resume and table_clear interlock
- rebased to agk's latest "editing" tree
- dropped the dm_get_verified_mdptr and find_device_noinit patches that
were at the end of v1; can revisit that code duplication cleanup later
Mike Snitzer (2):
dm: prevent table type changes after initial table load
dm: only initialize full request_queue for request-based device
drivers/md/dm-ioctl.c | 40 ++++++++++
drivers/md/dm.c | 189 ++++++++++++++++++++++++++++++++++++++++--------
drivers/md/dm.h | 8 ++
3 files changed, 205 insertions(+), 32 deletions(-)
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/2] dm: prevent table type changes after initial table load
2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
@ 2010-05-24 23:46 ` Mike Snitzer
2010-05-25 11:16 ` Kiyoshi Ueda
2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
1 sibling, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
Before table_load() stages an inactive table atomically check if its
type conflicts with a previously established device type (md->type).
Introduce 'type_lock' in mapped_device structure and use it to protect
md->type access. table_load() sets md->type without concern for:
- another table_load() racing to set conflicting md->type.
- do_resume() making a conflicting table live.
Allowed transitions:
UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE
Once a table load occurs the DM device's type is completely immutable.
This prevents a table reload from switching the inactive table directly
to a conflicting type (even if the table is explicitly cleared before
load).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 29 ++++++++++++++++++++
drivers/md/dm.c | 71 +++++++++++++++++++++++++++++++++++++++++++++-----
drivers/md/dm.h | 6 ++++
3 files changed, 99 insertions(+), 7 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p
goto out;
}
+ /*
+ * Protect md->type against concurrent table loads.
+ * Locking strategy:
+ * + Leverage fact that md's type cannot change after initial table load.
+ * - Only protect type in table_load() -- not in do_resume().
+ *
+ * + Protect type while working to stage an inactive table:
+ * - check if table's type conflicts with md->type
+ * (holding: md->type_lock)
+ * - stage inactive table (hc->new_map)
+ * (holding: md->type_lock + _hash_lock)
+ */
+ dm_lock_md_type(md);
+
+ if (dm_unknown_md_type(md)) {
+ /* initial table load, set md's type based on table's type */
+ dm_set_md_type(md, t);
+ } else if (!dm_md_type_matches_table(md, t)) {
+ DMWARN("can't change device type after initial table load.");
+ dm_table_destroy(t);
+ dm_unlock_md_type(md);
+ r = -EINVAL;
+ goto out;
+ }
+
+ /* stage inactive table */
down_write(&_hash_lock);
hc = dm_get_mdptr(md);
if (!hc || hc->md != md) {
DMWARN("device has been removed from the dev hash table.");
dm_table_destroy(t);
up_write(&_hash_lock);
+ dm_unlock_md_type(md);
r = -ENXIO;
goto out;
}
@@ -1191,6 +1218,8 @@ static int table_load(struct dm_ioctl *p
hc->new_map = t;
up_write(&_hash_lock);
+ dm_unlock_md_type(md);
+
param->flags |= DM_INACTIVE_PRESENT_FLAG;
__dev_status(md, param);
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
#define DMF_QUEUE_IO_TO_THREAD 6
/*
+ * Type for md->type field.
+ */
+enum mapped_device_type {
+ UNKNOWN_MD_TYPE,
+ BIO_BASED_MD_TYPE,
+ REQUEST_BASED_MD_TYPE,
+};
+
+/*
* Work processed by per-device workqueue.
*/
struct mapped_device {
@@ -123,6 +132,12 @@ struct mapped_device {
unsigned long flags;
struct request_queue *queue;
+ enum mapped_device_type type;
+ /*
+ * Protect type from concurrent access.
+ */
+ struct mutex type_lock;
+
struct gendisk *disk;
char name[16];
@@ -1874,8 +1889,10 @@ static struct mapped_device *alloc_dev(i
if (r < 0)
goto bad_minor;
+ md->type = UNKNOWN_MD_TYPE;
init_rwsem(&md->io_lock);
mutex_init(&md->suspend_lock);
+ mutex_init(&md->type_lock);
spin_lock_init(&md->deferred_lock);
spin_lock_init(&md->barrier_error_lock);
rwlock_init(&md->map_lock);
@@ -2128,6 +2145,53 @@ int dm_create(int minor, struct mapped_d
return 0;
}
+/*
+ * Functions to manage md->type.
+ * All are required to hold md->type_lock.
+ */
+void dm_lock_md_type(struct mapped_device *md)
+{
+ mutex_lock(&md->type_lock);
+}
+
+void dm_unlock_md_type(struct mapped_device *md)
+{
+ mutex_unlock(&md->type_lock);
+}
+
+void dm_set_md_type(struct mapped_device *md, struct dm_table* t)
+{
+ if (dm_table_request_based(t))
+ md->type = REQUEST_BASED_MD_TYPE;
+ else
+ md->type = BIO_BASED_MD_TYPE;
+}
+
+bool dm_unknown_md_type(struct mapped_device *md)
+{
+ return md->type == UNKNOWN_MD_TYPE;
+}
+
+static bool dm_bio_based_md_type(struct mapped_device *md)
+{
+ return md->type == BIO_BASED_MD_TYPE;
+}
+
+static bool dm_request_based_md_type(struct mapped_device *md)
+{
+ return md->type == REQUEST_BASED_MD_TYPE;
+}
+
+bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
+{
+ if (dm_request_based_md_type(md))
+ return dm_table_request_based(t);
+ else if (dm_bio_based_md_type(md))
+ return !dm_table_request_based(t);
+
+ return 0;
+}
+
static struct mapped_device *dm_find_md(dev_t dev)
{
struct mapped_device *md;
@@ -2403,13 +2467,6 @@ struct dm_table *dm_swap_table(struct ma
goto out;
}
- /* cannot change the device type, once a table is bound */
- if (md->map &&
- (dm_table_get_type(md->map) != dm_table_get_type(table))) {
- DMWARN("can't change the device type after a table is bound");
- goto out;
- }
-
map = __bind(md, table, &limits);
out:
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -66,6 +66,12 @@ int dm_table_alloc_md_mempools(struct dm
void dm_table_free_md_mempools(struct dm_table *t);
struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
+void dm_lock_md_type(struct mapped_device *md);
+void dm_unlock_md_type(struct mapped_device *md);
+void dm_set_md_type(struct mapped_device *md, struct dm_table* t);
+bool dm_unknown_md_type(struct mapped_device *md);
+bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t);
+
/*
* To check the return value from dm_table_find_target().
*/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device
2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
@ 2010-05-24 23:46 ` Mike Snitzer
2010-05-25 11:18 ` Kiyoshi Ueda
1 sibling, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
Allocate a minimalist request_queue structure initially (needed for both
bio and request-based DM). A bio-based DM device no longer defaults to
having a fully initialized request_queue (request_fn, elevator, etc).
So bio-based DM devices no longer register elevator sysfs attributes
('iosched/' tree or 'scheduler' other than "none").
Initialization of a full request_queue (request_fn, elevator, etc) is
deferred until it is known that the DM device is request-based -- at the
end of the table load sequence.
Factor DM device's request_queue initialization:
- common to both request-based and bio-based into dm_init_md_queue().
- specific to request-based into dm_init_request_based_queue().
md->type_lock is also used to protect md->queue during table_load().
md->queue is setup without concern for:
- another table_load() racing to setup conflicting queue state.
- do_resume() making a conflicting table live.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 15 +++++-
drivers/md/dm.c | 120 +++++++++++++++++++++++++++++++++++++++-----------
drivers/md/dm.h | 2
3 files changed, 109 insertions(+), 28 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1177,14 +1177,16 @@ static int table_load(struct dm_ioctl *p
}
/*
- * Protect md->type against concurrent table loads.
+ * Protect md->type and md->queue against concurrent table loads.
* Locking strategy:
* + Leverage fact that md's type cannot change after initial table load.
* - Only protect type in table_load() -- not in do_resume().
*
- * + Protect type while working to stage an inactive table:
+ * + Protect type and queue while working to stage an inactive table:
* - check if table's type conflicts with md->type
* (holding: md->type_lock)
+ * - setup md->queue based on md->type
+ * (holding: md->type_lock)
* - stage inactive table (hc->new_map)
* (holding: md->type_lock + _hash_lock)
*/
@@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p
goto out;
}
+ /* setup md->queue to reflect md's and table's type (may block) */
+ r = dm_setup_md_queue(md);
+ if (r) {
+ DMWARN("unable to setup device queue for this table.");
+ dm_table_destroy(t);
+ dm_unlock_md_type(md);
+ goto out;
+ }
+
/* stage inactive table */
down_write(&_hash_lock);
hc = dm_get_mdptr(md);
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -134,7 +134,7 @@ struct mapped_device {
struct request_queue *queue;
enum mapped_device_type type;
/*
- * Protect type from concurrent access.
+ * Protect queue and type from concurrent access.
*/
struct mutex type_lock;
@@ -1864,6 +1864,28 @@ static const struct block_device_operati
static void dm_wq_work(struct work_struct *work);
static void dm_rq_barrier_work(struct work_struct *work);
+static void dm_init_md_queue(struct mapped_device *md)
+{
+ /*
+ * Request-based dm devices cannot be stacked on top of bio-based dm
+ * devices. The type of this dm device has not been decided yet.
+ * The type is decided at the first table loading time.
+ * To prevent problematic device stacking, clear the queue flag
+ * for request stacking support until then.
+ *
+ * This queue is new, so no concurrency on the queue_flags.
+ */
+ queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
+
+ md->queue->queuedata = md;
+ md->queue->backing_dev_info.congested_fn = dm_any_congested;
+ md->queue->backing_dev_info.congested_data = md;
+ blk_queue_make_request(md->queue, dm_request);
+ blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
+ md->queue->unplug_fn = dm_unplug_all;
+ blk_queue_merge_bvec(md->queue, dm_merge_bvec);
+}
+
/*
* Allocate and initialise a blank device with a given minor.
*/
@@ -1903,34 +1925,11 @@ static struct mapped_device *alloc_dev(i
INIT_LIST_HEAD(&md->uevent_list);
spin_lock_init(&md->uevent_lock);
- md->queue = blk_init_queue(dm_request_fn, NULL);
+ md->queue = blk_alloc_queue(GFP_KERNEL);
if (!md->queue)
goto bad_queue;
- /*
- * Request-based dm devices cannot be stacked on top of bio-based dm
- * devices. The type of this dm device has not been decided yet,
- * although we initialized the queue using blk_init_queue().
- * The type is decided at the first table loading time.
- * To prevent problematic device stacking, clear the queue flag
- * for request stacking support until then.
- *
- * This queue is new, so no concurrency on the queue_flags.
- */
- queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
- md->saved_make_request_fn = md->queue->make_request_fn;
- md->queue->queuedata = md;
- md->queue->backing_dev_info.congested_fn = dm_any_congested;
- md->queue->backing_dev_info.congested_data = md;
- blk_queue_make_request(md->queue, dm_request);
- blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
- md->queue->unplug_fn = dm_unplug_all;
- blk_queue_merge_bvec(md->queue, dm_merge_bvec);
- blk_queue_softirq_done(md->queue, dm_softirq_done);
- blk_queue_prep_rq(md->queue, dm_prep_fn);
- blk_queue_lld_busy(md->queue, dm_lld_busy);
- blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
- dm_rq_prepare_flush);
+ dm_init_md_queue(md);
md->disk = alloc_disk(1);
if (!md->disk)
@@ -2192,6 +2191,75 @@ bool dm_md_type_matches_table(struct map
return 0;
}
+/*
+ * Functions to manage md->queue.
+ * All are required to hold md->type_lock.
+ */
+static bool dm_bio_based_md_queue(struct mapped_device *md)
+{
+ return (md->queue->request_fn) ? 0 : 1;
+}
+
+/*
+ * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
+ */
+static int dm_init_request_based_queue(struct mapped_device *md)
+{
+ struct request_queue *q = NULL;
+
+ /* Avoid re-initializing the queue if already fully initialized */
+ if (!md->queue->elevator) {
+ /* Fully initialize the queue */
+ q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
+ if (!q)
+ return 0;
+ md->queue = q;
+ md->saved_make_request_fn = md->queue->make_request_fn;
+ dm_init_md_queue(md);
+ blk_queue_softirq_done(md->queue, dm_softirq_done);
+ blk_queue_prep_rq(md->queue, dm_prep_fn);
+ blk_queue_lld_busy(md->queue, dm_lld_busy);
+ blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+ dm_rq_prepare_flush);
+ } else {
+ BUG_ON(dm_bio_based_md_queue(md));
+ return 1; /* queue already request-based */
+ }
+
+ elv_register_queue(md->queue);
+
+ return 1;
+}
+
+static void dm_clear_request_based_queue(struct mapped_device *md)
+{
+ if (dm_bio_based_md_queue(md))
+ return; /* queue already bio-based */
+
+ /* Unregister elevator from sysfs and clear ->request_fn */
+ elv_unregister_queue(md->queue);
+ md->queue->request_fn = NULL;
+}
+
+/*
+ * Setup the DM device's queue based on md's type
+ */
+int dm_setup_md_queue(struct mapped_device *md)
+{
+ BUG_ON(!mutex_is_locked(&md->type_lock));
+ BUG_ON(dm_unknown_md_type(md));
+
+ if (dm_request_based_md_type(md)) {
+ if (!dm_init_request_based_queue(md)) {
+ DMWARN("Cannot initialize queue for Request-based dm");
+ return -EINVAL;
+ }
+ } else if (dm_bio_based_md_type(md))
+ dm_clear_request_based_queue(md);
+
+ return 0;
+}
+
static struct mapped_device *dm_find_md(dev_t dev)
{
struct mapped_device *md;
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -72,6 +72,8 @@ void dm_set_md_type(struct mapped_device
bool dm_unknown_md_type(struct mapped_device *md);
bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t);
+int dm_setup_md_queue(struct mapped_device *md);
+
/*
* To check the return value from dm_table_find_target().
*/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] dm: prevent table type changes after initial table load
2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
@ 2010-05-25 11:16 ` Kiyoshi Ueda
2010-05-25 12:44 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Kiyoshi Ueda @ 2010-05-25 11:16 UTC (permalink / raw)
To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon
Hi Mike,
On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> Before table_load() stages an inactive table atomically check if its
> type conflicts with a previously established device type (md->type).
>
> Introduce 'type_lock' in mapped_device structure and use it to protect
> md->type access. table_load() sets md->type without concern for:
> - another table_load() racing to set conflicting md->type.
> - do_resume() making a conflicting table live.
>
> Allowed transitions:
> UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
> UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE
>
> Once a table load occurs the DM device's type is completely immutable.
>
> This prevents a table reload from switching the inactive table directly
> to a conflicting type (even if the table is explicitly cleared before
> load).
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
I basically agree with this design.
But I have some comments. Please see below.
By the way, I can't make enough time to review until tomorrow.
So the following comments may not be all.
Sorry for the inconvenience.
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p
> goto out;
> }
>
> + /*
> + * Protect md->type against concurrent table loads.
> + * Locking strategy:
> + * + Leverage fact that md's type cannot change after initial table load.
> + * - Only protect type in table_load() -- not in do_resume().
> + *
> + * + Protect type while working to stage an inactive table:
> + * - check if table's type conflicts with md->type
> + * (holding: md->type_lock)
> + * - stage inactive table (hc->new_map)
> + * (holding: md->type_lock + _hash_lock)
> + */
I think you don't need to hold md->type_lock while you set the table
to hc->new_map; If 2 tables of the same type are racing here, we can
have either table. If they are different type, the former one will win
and the latter one will be rejected.
> + dm_lock_md_type(md);
> +
> + if (dm_unknown_md_type(md)) {
> + /* initial table load, set md's type based on table's type */
> + dm_set_md_type(md, t);
> + } else if (!dm_md_type_matches_table(md, t)) {
> + DMWARN("can't change device type after initial table load.");
> + dm_table_destroy(t);
> + dm_unlock_md_type(md);
> + r = -EINVAL;
> + goto out;
> + }
So you should be able to unlock md->type_lock here.
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c
> +++ linux-2.6/drivers/md/dm.c
> @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
> #define DMF_QUEUE_IO_TO_THREAD 6
>
> /*
> + * Type for md->type field.
> + */
> +enum mapped_device_type {
> + UNKNOWN_MD_TYPE,
> + BIO_BASED_MD_TYPE,
> + REQUEST_BASED_MD_TYPE,
> +};
You can use the defined types in drivers/md/dm.h below instead;
/*
* Type of table and mapped_device's mempool
*/
#define DM_TYPE_NONE 0
#define DM_TYPE_BIO_BASED 1
#define DM_TYPE_REQUEST_BASED 2
Using them should make type matching easier than making another definition.
E.g. see below;
> +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
> +{
> + if (dm_request_based_md_type(md))
> + return dm_table_request_based(t);
> + else if (dm_bio_based_md_type(md))
> + return !dm_table_request_based(t);
> +
> + return 0;
> +}
You can match type using "md->type == dm_table_get_type(t)" if you use
the already defined types above.
Thanks,
Kiyoshi Ueda
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device
2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
@ 2010-05-25 11:18 ` Kiyoshi Ueda
2010-05-25 12:49 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Kiyoshi Ueda @ 2010-05-25 11:18 UTC (permalink / raw)
To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon
Hi Mike,
On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p
> goto out;
> }
>
> + /* setup md->queue to reflect md's and table's type (may block) */
> + r = dm_setup_md_queue(md);
> + if (r) {
> + DMWARN("unable to setup device queue for this table.");
> + dm_table_destroy(t);
> + dm_unlock_md_type(md);
> + goto out;
> + }
> +
dm_setup_md_queue() should be put just after dm_set_md_type() of patch#1.
Then, you can make sure that dm_setup_md_queue() is called only once
and dm_setup_md_queue() should be much simpler.
> +/*
> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> + */
> +static int dm_init_request_based_queue(struct mapped_device *md)
> +{
> + struct request_queue *q = NULL;
> +
> + /* Avoid re-initializing the queue if already fully initialized */
> + if (!md->queue->elevator) {
> + /* Fully initialize the queue */
> + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> + if (!q)
> + return 0;
When blk_init_allocated_queue() fails, the block-layer seems not to
guarantee that the queue is still available.
You should create appropriate block-layer interfaces and/or take proper
recovery action here.
However, if the failure is not realistic, rather than complicating this
patch, you can just put a big comment and WARN_ON() for now, and fix it
in a separate patch-set.
> +static void dm_clear_request_based_queue(struct mapped_device *md)
> +{
> + if (dm_bio_based_md_queue(md))
> + return; /* queue already bio-based */
> +
> + /* Unregister elevator from sysfs and clear ->request_fn */
> + elv_unregister_queue(md->queue);
> + md->queue->request_fn = NULL;
> +}
> +
> +/*
> + * Setup the DM device's queue based on md's type
> + */
> +int dm_setup_md_queue(struct mapped_device *md)
> +{
> + BUG_ON(!mutex_is_locked(&md->type_lock));
> + BUG_ON(dm_unknown_md_type(md));
> +
> + if (dm_request_based_md_type(md)) {
> + if (!dm_init_request_based_queue(md)) {
> + DMWARN("Cannot initialize queue for Request-based dm");
> + return -EINVAL;
> + }
> + } else if (dm_bio_based_md_type(md))
> + dm_clear_request_based_queue(md);
> +
> + return 0;
> +}
These unregister/clear works shouldn't be needed if dm_setup_md_queue()
is called only once as I mentioned above.
Thanks,
Kiyoshi Ueda
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] dm: prevent table type changes after initial table load
2010-05-25 11:16 ` Kiyoshi Ueda
@ 2010-05-25 12:44 ` Mike Snitzer
0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 12:44 UTC (permalink / raw)
To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon
On Tue, May 25 2010 at 7:16am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> Hi Mike,
>
> On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> > Before table_load() stages an inactive table atomically check if its
> > type conflicts with a previously established device type (md->type).
> >
> > Introduce 'type_lock' in mapped_device structure and use it to protect
> > md->type access. table_load() sets md->type without concern for:
> > - another table_load() racing to set conflicting md->type.
> > - do_resume() making a conflicting table live.
> >
> > Allowed transitions:
> > UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
> > UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE
> >
> > Once a table load occurs the DM device's type is completely immutable.
> >
> > This prevents a table reload from switching the inactive table directly
> > to a conflicting type (even if the table is explicitly cleared before
> > load).
> >
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>
> I basically agree with this design.
> But I have some comments. Please see below.
>
> By the way, I can't make enough time to review until tomorrow.
> So the following comments may not be all.
> Sorry for the inconvenience.
No problem, thanks again for all your review.
> > Index: linux-2.6/drivers/md/dm-ioctl.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-ioctl.c
> > +++ linux-2.6/drivers/md/dm-ioctl.c
> > @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p
> > goto out;
> > }
> >
> > + /*
> > + * Protect md->type against concurrent table loads.
> > + * Locking strategy:
> > + * + Leverage fact that md's type cannot change after initial table load.
> > + * - Only protect type in table_load() -- not in do_resume().
> > + *
> > + * + Protect type while working to stage an inactive table:
> > + * - check if table's type conflicts with md->type
> > + * (holding: md->type_lock)
> > + * - stage inactive table (hc->new_map)
> > + * (holding: md->type_lock + _hash_lock)
> > + */
>
> I think you don't need to hold md->type_lock while you set the table
> to hc->new_map; If 2 tables of the same type are racing here, we can
> have either table. If they are different type, the former one will win
> and the latter one will be rejected.
Correct. This v2 patchset has significantly reduced the complexity (no
longer need to be concerned with md->type transitioning back to
UNKNOWN_MD_TYPE). I just missed the fact that the locking could be
simplified a bit too.
> > + dm_lock_md_type(md);
> > +
> > + if (dm_unknown_md_type(md)) {
> > + /* initial table load, set md's type based on table's type */
> > + dm_set_md_type(md, t);
> > + } else if (!dm_md_type_matches_table(md, t)) {
> > + DMWARN("can't change device type after initial table load.");
> > + dm_table_destroy(t);
> > + dm_unlock_md_type(md);
> > + r = -EINVAL;
> > + goto out;
> > + }
>
> So you should be able to unlock md->type_lock here.
Yes.
> > Index: linux-2.6/drivers/md/dm.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm.c
> > +++ linux-2.6/drivers/md/dm.c
> > @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
> > #define DMF_QUEUE_IO_TO_THREAD 6
> >
> > /*
> > + * Type for md->type field.
> > + */
> > +enum mapped_device_type {
> > + UNKNOWN_MD_TYPE,
> > + BIO_BASED_MD_TYPE,
> > + REQUEST_BASED_MD_TYPE,
> > +};
>
> You can use the defined types in drivers/md/dm.h below instead;
> /*
> * Type of table and mapped_device's mempool
> */
> #define DM_TYPE_NONE 0
> #define DM_TYPE_BIO_BASED 1
> #define DM_TYPE_REQUEST_BASED 2
>
> Using them should make type matching easier than making another definition.
> E.g. see below;
>
> > +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
> > +{
> > + if (dm_request_based_md_type(md))
> > + return dm_table_request_based(t);
> > + else if (dm_bio_based_md_type(md))
> > + return !dm_table_request_based(t);
> > +
> > + return 0;
> > +}
>
> You can match type using "md->type == dm_table_get_type(t)" if you use
> the already defined types above.
Ah, yes that'll clean things up nicely. No idea how I missed the
existing definitions.
Thanks,
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device
2010-05-25 11:18 ` Kiyoshi Ueda
@ 2010-05-25 12:49 ` Mike Snitzer
2010-05-25 16:34 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 12:49 UTC (permalink / raw)
To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon, jens.axboe
On Tue, May 25 2010 at 7:18am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> Hi Mike,
>
> On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> > Index: linux-2.6/drivers/md/dm-ioctl.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-ioctl.c
> > +++ linux-2.6/drivers/md/dm-ioctl.c
> > @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p
> > goto out;
> > }
> >
> > + /* setup md->queue to reflect md's and table's type (may block) */
> > + r = dm_setup_md_queue(md);
> > + if (r) {
> > + DMWARN("unable to setup device queue for this table.");
> > + dm_table_destroy(t);
> > + dm_unlock_md_type(md);
> > + goto out;
> > + }
> > +
>
> dm_setup_md_queue() should be put just after dm_set_md_type() of patch#1.
> Then, you can make sure that dm_setup_md_queue() is called only once
> and dm_setup_md_queue() should be much simpler.
Good idea.
> > +/*
> > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> > + */
> > +static int dm_init_request_based_queue(struct mapped_device *md)
> > +{
> > + struct request_queue *q = NULL;
> > +
> > + /* Avoid re-initializing the queue if already fully initialized */
> > + if (!md->queue->elevator) {
> > + /* Fully initialize the queue */
> > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > + if (!q)
> > + return 0;
>
> When blk_init_allocated_queue() fails, the block-layer seems not to
> guarantee that the queue is still available.
Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
problematic:
if (blk_init_free_list(q)) {
kmem_cache_free(blk_requestq_cachep, q);
return NULL;
}
Cc'ing Jens as I think it would be advantageous for us to push the above
kmem_cache_free() into the callers where it really makes sense, e.g.:
blk_init_queue_node().
So on blk_init_allocated_queue_node() failure blk_init_queue_node() will
take care to cleanup the queue that it assumes it is managing
completely.
My patch (linux-2.6-block.git's commit: 01effb0) that split out
blk_init_allocated_queue_node() from blk_init_queue_node() opened up
this issue. I'm fairly confident we'll get it fixed by the time 2.6.35
ships.
> You should create appropriate block-layer interfaces and/or take proper
> recovery action here.
> However, if the failure is not realistic, rather than complicating this
> patch, you can just put a big comment and WARN_ON() for now, and fix it
> in a separate patch-set.
I'll hold off on the WARN_ON(). I think we need the request_queue to
remain allocated even after blk_init_allocated_queue_node() failure.
> > +static void dm_clear_request_based_queue(struct mapped_device *md)
> > +{
> > + if (dm_bio_based_md_queue(md))
> > + return; /* queue already bio-based */
> > +
> > + /* Unregister elevator from sysfs and clear ->request_fn */
> > + elv_unregister_queue(md->queue);
> > + md->queue->request_fn = NULL;
> > +}
> > +
> > +/*
> > + * Setup the DM device's queue based on md's type
> > + */
> > +int dm_setup_md_queue(struct mapped_device *md)
> > +{
> > + BUG_ON(!mutex_is_locked(&md->type_lock));
> > + BUG_ON(dm_unknown_md_type(md));
> > +
> > + if (dm_request_based_md_type(md)) {
> > + if (!dm_init_request_based_queue(md)) {
> > + DMWARN("Cannot initialize queue for Request-based dm");
> > + return -EINVAL;
> > + }
> > + } else if (dm_bio_based_md_type(md))
> > + dm_clear_request_based_queue(md);
> > +
> > + return 0;
> > +}
>
> These unregister/clear works shouldn't be needed if dm_setup_md_queue()
> is called only once as I mentioned above.
Indeed. I'll get v3 of the patchset out with appropriate edits.
Thanks,
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] block: avoid unconditionally freeing previously allocated request_queue
2010-05-25 12:49 ` Mike Snitzer
@ 2010-05-25 16:34 ` Mike Snitzer
0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 16:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon
On Tue, May 25 2010 at 8:49am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, May 25 2010 at 7:18am -0400,
> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
>
> > > +/*
> > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> > > + */
> > > +static int dm_init_request_based_queue(struct mapped_device *md)
> > > +{
> > > + struct request_queue *q = NULL;
> > > +
> > > + /* Avoid re-initializing the queue if already fully initialized */
> > > + if (!md->queue->elevator) {
> > > + /* Fully initialize the queue */
> > > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > > + if (!q)
> > > + return 0;
> >
> > When blk_init_allocated_queue() fails, the block-layer seems not to
> > guarantee that the queue is still available.
>
> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
> problematic:
>
> if (blk_init_free_list(q)) {
> kmem_cache_free(blk_requestq_cachep, q);
> return NULL;
> }
>
> Cc'ing Jens as I think it would be advantageous for us to push the above
> kmem_cache_free() into the callers where it really makes sense, e.g.:
> blk_init_queue_node().
>
> So on blk_init_allocated_queue_node() failure blk_init_queue_node() will
> take care to cleanup the queue that it assumes it is managing
> completely.
>
> My patch (linux-2.6-block.git's commit: 01effb0) that split out
> blk_init_allocated_queue_node() from blk_init_queue_node() opened up
> this issue. I'm fairly confident we'll get it fixed by the time 2.6.35
> ships.
Jens,
How about something like the following?
block: avoid unconditionally freeing previously allocated request_queue
On blk_init_allocated_queue_node failure, only free request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
This addresses a regression introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue
Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 31 ++++++++++++++++++++++++++-----
1 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..c0179b7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -528,6 +528,24 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
}
EXPORT_SYMBOL(blk_alloc_queue_node);
+static void blk_free_partial_queue(struct request_queue *q)
+{
+ /* Free q if blk_init_queue failed early enough. */
+ int free_request_queue = 0;
+ struct request_list *rl;
+
+ if (!q)
+ return;
+
+ /* Was blk_init_free_list the cause for failure? */
+ rl = &q->rq;
+ if (!rl->rq_pool)
+ free_request_queue = 1;
+
+ if (free_request_queue)
+ kmem_cache_free(blk_requestq_cachep, q);
+}
+
/**
* blk_init_queue - prepare a request queue for use with a block device
* @rfn: The function to be called to process requests that have been
@@ -570,9 +588,14 @@ EXPORT_SYMBOL(blk_init_queue);
struct request_queue *
blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
- struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ struct request_queue *uninit_q, *q;
+
+ uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+ if (!q)
+ blk_free_partial_queue(uninit_q);
- return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+ return q;
}
EXPORT_SYMBOL(blk_init_queue_node);
@@ -592,10 +615,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return NULL;
q->node = node_id;
- if (blk_init_free_list(q)) {
- kmem_cache_free(blk_requestq_cachep, q);
+ if (blk_init_free_list(q))
return NULL;
- }
q->request_fn = rfn;
q->prep_rq_fn = NULL;
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH] block: avoid unconditionally freeing previously allocated request_queue
@ 2010-05-25 16:34 ` Mike Snitzer
0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 16:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel
On Tue, May 25 2010 at 8:49am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, May 25 2010 at 7:18am -0400,
> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
>
> > > +/*
> > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> > > + */
> > > +static int dm_init_request_based_queue(struct mapped_device *md)
> > > +{
> > > + struct request_queue *q = NULL;
> > > +
> > > + /* Avoid re-initializing the queue if already fully initialized */
> > > + if (!md->queue->elevator) {
> > > + /* Fully initialize the queue */
> > > + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > > + if (!q)
> > > + return 0;
> >
> > When blk_init_allocated_queue() fails, the block-layer seems not to
> > guarantee that the queue is still available.
>
> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
> problematic:
>
> if (blk_init_free_list(q)) {
> kmem_cache_free(blk_requestq_cachep, q);
> return NULL;
> }
>
> Cc'ing Jens as I think it would be advantageous for us to push the above
> kmem_cache_free() into the callers where it really makes sense, e.g.:
> blk_init_queue_node().
>
> So on blk_init_allocated_queue_node() failure blk_init_queue_node() will
> take care to cleanup the queue that it assumes it is managing
> completely.
>
> My patch (linux-2.6-block.git's commit: 01effb0) that split out
> blk_init_allocated_queue_node() from blk_init_queue_node() opened up
> this issue. I'm fairly confident we'll get it fixed by the time 2.6.35
> ships.
Jens,
How about something like the following?
block: avoid unconditionally freeing previously allocated request_queue
On blk_init_allocated_queue_node failure, only free request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
This addresses a regression introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue
Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 31 ++++++++++++++++++++++++++-----
1 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..c0179b7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -528,6 +528,24 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
}
EXPORT_SYMBOL(blk_alloc_queue_node);
+static void blk_free_partial_queue(struct request_queue *q)
+{
+ /* Free q if blk_init_queue failed early enough. */
+ int free_request_queue = 0;
+ struct request_list *rl;
+
+ if (!q)
+ return;
+
+ /* Was blk_init_free_list the cause for failure? */
+ rl = &q->rq;
+ if (!rl->rq_pool)
+ free_request_queue = 1;
+
+ if (free_request_queue)
+ kmem_cache_free(blk_requestq_cachep, q);
+}
+
/**
* blk_init_queue - prepare a request queue for use with a block device
* @rfn: The function to be called to process requests that have been
@@ -570,9 +588,14 @@ EXPORT_SYMBOL(blk_init_queue);
struct request_queue *
blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
- struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ struct request_queue *uninit_q, *q;
+
+ uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+ if (!q)
+ blk_free_partial_queue(uninit_q);
- return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+ return q;
}
EXPORT_SYMBOL(blk_init_queue_node);
@@ -592,10 +615,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return NULL;
q->node = node_id;
- if (blk_init_free_list(q)) {
- kmem_cache_free(blk_requestq_cachep, q);
+ if (blk_init_free_list(q))
return NULL;
- }
q->request_fn = rfn;
q->prep_rq_fn = NULL;
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/1] block: make blk_init_free_list and elevator_init idempotent
2010-05-25 16:34 ` Mike Snitzer
(?)
@ 2010-05-25 17:15 ` Mike Snitzer
-1 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 17:15 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel
blk_init_allocated_queue_node may fail and the caller _could_ retry.
Accommodate the unlikely event that blk_init_allocated_queue_node is
called on an already initialized (possibly partially) request_queue.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 3 +++
block/elevator.c | 6 ++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index c0179b7..2c208c3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -467,6 +467,9 @@ static int blk_init_free_list(struct request_queue *q)
{
struct request_list *rl = &q->rq;
+ if (unlikely(rl->rq_pool))
+ return 0;
+
rl->count[BLK_RW_SYNC] = rl->count[BLK_RW_ASYNC] = 0;
rl->starved[BLK_RW_SYNC] = rl->starved[BLK_RW_ASYNC] = 0;
rl->elvpriv = 0;
diff --git a/block/elevator.c b/block/elevator.c
index 0abce47..923a913 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -242,9 +242,11 @@ int elevator_init(struct request_queue *q, char *name)
{
struct elevator_type *e = NULL;
struct elevator_queue *eq;
- int ret = 0;
void *data;
+ if (unlikely(q->elevator))
+ return 0;
+
INIT_LIST_HEAD(&q->queue_head);
q->last_merge = NULL;
q->end_sector = 0;
@@ -284,7 +286,7 @@ int elevator_init(struct request_queue *q, char *name)
}
elevator_attach(q, eq, data);
- return ret;
+ return 0;
}
EXPORT_SYMBOL(elevator_init);
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] block: avoid unconditionally freeing previously allocated request_queue
2010-05-25 16:34 ` Mike Snitzer
(?)
(?)
@ 2010-05-26 2:37 ` Kiyoshi Ueda
2010-05-26 4:47 ` Mike Snitzer
-1 siblings, 1 reply; 16+ messages in thread
From: Kiyoshi Ueda @ 2010-05-26 2:37 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jens Axboe, dm-devel, Alasdair Kergon, linux-kernel
Hi Mike,
On 05/26/2010 01:34 AM +0900, Mike Snitzer wrote:
> Mike Snitzer <snitzer@redhat.com> wrote:
>> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
>>>> +/*
>>>> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
>>>> + */
>>>> +static int dm_init_request_based_queue(struct mapped_device *md)
>>>> +{
>>>> + struct request_queue *q = NULL;
>>>> +
>>>> + /* Avoid re-initializing the queue if already fully initialized */
>>>> + if (!md->queue->elevator) {
>>>> + /* Fully initialize the queue */
>>>> + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
>>>> + if (!q)
>>>> + return 0;
>>>
>>> When blk_init_allocated_queue() fails, the block-layer seems not to
>>> guarantee that the queue is still available.
>>
>> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
>> problematic:
>>
>> if (blk_init_free_list(q)) {
>> kmem_cache_free(blk_requestq_cachep, q);
>> return NULL;
>> }
Not only that. The blk_put_queue() in blk_init_allocated_queue_node()
will also free the queue:
if (!elevator_init(q, NULL)) {
blk_queue_congestion_threshold(q);
return q;
}
blk_put_queue(q);
return NULL;
Thanks,
Kiyoshi Ueda
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: block: avoid unconditionally freeing previously allocated request_queue
2010-05-26 2:37 ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Kiyoshi Ueda
@ 2010-05-26 4:47 ` Mike Snitzer
0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-26 4:47 UTC (permalink / raw)
To: Kiyoshi Ueda; +Cc: Jens Axboe, dm-devel, Alasdair Kergon, linux-kernel
On Tue, May 25 2010 at 10:37pm -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> Hi Mike,
>
> On 05/26/2010 01:34 AM +0900, Mike Snitzer wrote:
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> >>>> +/*
> >>>> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> >>>> + */
> >>>> +static int dm_init_request_based_queue(struct mapped_device *md)
> >>>> +{
> >>>> + struct request_queue *q = NULL;
> >>>> +
> >>>> + /* Avoid re-initializing the queue if already fully initialized */
> >>>> + if (!md->queue->elevator) {
> >>>> + /* Fully initialize the queue */
> >>>> + q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> >>>> + if (!q)
> >>>> + return 0;
> >>>
> >>> When blk_init_allocated_queue() fails, the block-layer seems not to
> >>> guarantee that the queue is still available.
> >>
> >> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
> >> problematic:
> >>
> >> if (blk_init_free_list(q)) {
> >> kmem_cache_free(blk_requestq_cachep, q);
> >> return NULL;
> >> }
>
> Not only that. The blk_put_queue() in blk_init_allocated_queue_node()
> will also free the queue:
>
> if (!elevator_init(q, NULL)) {
> blk_queue_congestion_threshold(q);
> return q;
> }
>
> blk_put_queue(q);
> return NULL;
OK, I'll post v2 that addresses this and we'll see what Jens says.
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2] block: avoid unconditionally freeing previously allocated request_queue
2010-05-25 16:34 ` Mike Snitzer
` (2 preceding siblings ...)
(?)
@ 2010-05-26 4:52 ` Mike Snitzer
2010-06-03 16:58 ` [PATCH v3] " Mike Snitzer
-1 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-26 4:52 UTC (permalink / raw)
To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel
On blk_init_allocated_queue_node failure, only free request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue
Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 33 +++++++++++++++++++++++++++------
1 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..c0cdafd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -528,6 +528,25 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
}
EXPORT_SYMBOL(blk_alloc_queue_node);
+static void blk_free_partial_queue(struct request_queue *q)
+{
+ struct request_list *rl;
+
+ if (!q)
+ return;
+
+ /* Was blk_init_free_list the cause for failure? */
+ rl = &q->rq;
+ if (!rl->rq_pool) {
+ kmem_cache_free(blk_requestq_cachep, q);
+ return;
+ }
+
+ /* Or was elevator_init? */
+ if (!q->elevator)
+ blk_put_queue(q);
+}
+
/**
* blk_init_queue - prepare a request queue for use with a block device
* @rfn: The function to be called to process requests that have been
@@ -570,9 +589,14 @@ EXPORT_SYMBOL(blk_init_queue);
struct request_queue *
blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
- struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ struct request_queue *uninit_q, *q;
+
+ uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+ if (!q)
+ blk_free_partial_queue(uninit_q);
- return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+ return q;
}
EXPORT_SYMBOL(blk_init_queue_node);
@@ -592,10 +616,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return NULL;
q->node = node_id;
- if (blk_init_free_list(q)) {
- kmem_cache_free(blk_requestq_cachep, q);
+ if (blk_init_free_list(q))
return NULL;
- }
q->request_fn = rfn;
q->prep_rq_fn = NULL;
@@ -618,7 +640,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return q;
}
- blk_put_queue(q);
return NULL;
}
EXPORT_SYMBOL(blk_init_allocated_queue_node);
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3] block: avoid unconditionally freeing previously allocated request_queue
2010-05-26 4:52 ` [PATCH v2] " Mike Snitzer
@ 2010-06-03 16:58 ` Mike Snitzer
2010-06-03 17:34 ` [PATCH v4] " Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-06-03 16:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon
On blk_init_allocated_queue_node failure, only free the request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue
Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
v3: leverage fact that blk_cleanup_queue will properly free all memory
associated with a request_queue (e.g.: q->rq_pool and q->elevator)
diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..24683a4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -570,9 +570,14 @@ EXPORT_SYMBOL(blk_init_queue);
struct request_queue *
blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
- struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ struct request_queue *uninit_q, *q;
- return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+ uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+ if (!q)
+ blk_cleanup_queue(uninit_q);
+
+ return q;
}
EXPORT_SYMBOL(blk_init_queue_node);
@@ -592,10 +597,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return NULL;
q->node = node_id;
- if (blk_init_free_list(q)) {
- kmem_cache_free(blk_requestq_cachep, q);
+ if (blk_init_free_list(q))
return NULL;
- }
q->request_fn = rfn;
q->prep_rq_fn = NULL;
@@ -618,7 +621,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return q;
}
- blk_put_queue(q);
return NULL;
}
EXPORT_SYMBOL(blk_init_allocated_queue_node);
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4] block: avoid unconditionally freeing previously allocated request_queue
2010-06-03 16:58 ` [PATCH v3] " Mike Snitzer
@ 2010-06-03 17:34 ` Mike Snitzer
2010-06-04 11:44 ` Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-06-03 17:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon
block: avoid unconditionally freeing previously allocated request_queue
On blk_init_allocated_queue_node failure, only free the request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated
request_queue
Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
block/blk-core.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)
v4: eliminate potential for NULL pointer in call to blk_cleanup_queue
v3: leverage fact that blk_cleanup_queue will properly free all memory
associated with a request_queue (e.g.: q->rq_pool and q->elevator)
diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..826d070 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -570,9 +570,17 @@ EXPORT_SYMBOL(blk_init_queue);
struct request_queue *
blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
- struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ struct request_queue *uninit_q, *q;
- return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+ uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+ if (!uninit_q)
+ return NULL;
+
+ q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+ if (!q)
+ blk_cleanup_queue(uninit_q);
+
+ return q;
}
EXPORT_SYMBOL(blk_init_queue_node);
@@ -592,10 +600,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return NULL;
q->node = node_id;
- if (blk_init_free_list(q)) {
- kmem_cache_free(blk_requestq_cachep, q);
+ if (blk_init_free_list(q))
return NULL;
- }
q->request_fn = rfn;
q->prep_rq_fn = NULL;
@@ -618,7 +624,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
return q;
}
- blk_put_queue(q);
return NULL;
}
EXPORT_SYMBOL(blk_init_allocated_queue_node);
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v4] block: avoid unconditionally freeing previously allocated request_queue
2010-06-03 17:34 ` [PATCH v4] " Mike Snitzer
@ 2010-06-04 11:44 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2010-06-04 11:44 UTC (permalink / raw)
To: Mike Snitzer
Cc: Kiyoshi Ueda, dm-devel@redhat.com, linux-kernel@vger.kernel.org,
Alasdair Kergon
On 2010-06-03 19:34, Mike Snitzer wrote:
> block: avoid unconditionally freeing previously allocated request_queue
>
> On blk_init_allocated_queue_node failure, only free the request_queue if
> it is wasn't previously allocated outside the block layer
> (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
>
> This addresses an interface bug introduced by the following commit:
> 01effb0 block: allow initialization of previously allocated
> request_queue
>
> Otherwise the request_queue may be free'd out from underneath a caller
> that is managing the request_queue directly (e.g. caller uses
> blk_alloc_queue + blk_init_allocated_queue_node).
Thanks Mike, this looks a lot better. I have applied this one and
2/2 of the original posting.
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2010-06-04 11:44 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
2010-05-25 11:16 ` Kiyoshi Ueda
2010-05-25 12:44 ` Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
2010-05-25 11:18 ` Kiyoshi Ueda
2010-05-25 12:49 ` Mike Snitzer
2010-05-25 16:34 ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Mike Snitzer
2010-05-25 16:34 ` Mike Snitzer
2010-05-25 17:15 ` [PATCH 2/1] block: make blk_init_free_list and elevator_init idempotent Mike Snitzer
2010-05-26 2:37 ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Kiyoshi Ueda
2010-05-26 4:47 ` Mike Snitzer
2010-05-26 4:52 ` [PATCH v2] " Mike Snitzer
2010-06-03 16:58 ` [PATCH v3] " Mike Snitzer
2010-06-03 17:34 ` [PATCH v4] " Mike Snitzer
2010-06-04 11:44 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.