[PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization
@ 2010-05-24 23:46 Mike Snitzer
  2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
  2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw)
  To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon

A mapped_device now has a specific immutable type (md->type) that is
set during the initial table_load (see: patch 1/2).

The DM device's request_queue will only have an elevator allocated and
exposed via sysfs if it is request-based (see: patch 2/2).

Please see the individual patch headers for more details.
(Kiyoshi, if you approve please reply with your "Acked-by" :)

This patchset is based on agk's DM "editing" tree:
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/
(apply all patches listed in the series file up to NEXT_PATCHES_END)

This also patchset depends on the following linux-2.6-block.git
'for-linus' branch commits:
01effb0 block: allow initialization of previously allocated request_queue
e36f724 block: Adjust elv_iosched_show to return "none" for bio-based DM

I've made a full quilt series available here (builds on 2.6.34):
http://people.redhat.com/msnitzer/patches/dm-queue-init/latest/

A monolithic patch is also available here:
http://people.redhat.com/msnitzer/patches/dm-queue-init/latest/monolithic.patch

I welcome all comments/review.

v2 changes:
- removed do_resume and table_clear interlock
- rebased to agk's latest "editing" tree
- dropped the dm_get_verified_mdptr and find_device_noinit patches that
  were at the end of v1; can revisit that code duplication cleanup later

Mike Snitzer (2):
  dm: prevent table type changes after initial table load
  dm: only initialize full request_queue for request-based device

 drivers/md/dm-ioctl.c |   40 ++++++++++
 drivers/md/dm.c       |  189 ++++++++++++++++++++++++++++++++++++++++--------
 drivers/md/dm.h       |    8 ++
 3 files changed, 205 insertions(+), 32 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/2] dm: prevent table type changes after initial table load
  2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
@ 2010-05-24 23:46 ` Mike Snitzer
  2010-05-25 11:16   ` Kiyoshi Ueda
  2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw)
  To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon

Before table_load() stages an inactive table atomically check if its
type conflicts with a previously established device type (md->type).

Introduce 'type_lock' in mapped_device structure and use it to protect
md->type access.  table_load() sets md->type without concern for:
- another table_load() racing to set conflicting md->type.
- do_resume() making a conflicting table live.

Allowed transitions:
UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE

Once a table load occurs the DM device's type is completely immutable.

This prevents a table reload from switching the inactive table directly
to a conflicting type (even if the table is explicitly cleared before
load).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-ioctl.c |   29 ++++++++++++++++++++
 drivers/md/dm.c       |   71 +++++++++++++++++++++++++++++++++++++++++++++-----
 drivers/md/dm.h       |    6 ++++
 3 files changed, 99 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p
 		goto out;
 	}
 
+	/*
+	 * Protect md->type against concurrent table loads.
+	 * Locking strategy:
+	 * + Leverage fact that md's type cannot change after initial table load.
+	 *   - Only protect type in table_load() -- not in do_resume().
+	 *
+	 * + Protect type while working to stage an inactive table:
+	 *   - check if table's type conflicts with md->type
+	 *     (holding: md->type_lock)
+	 *   - stage inactive table (hc->new_map)
+	 *     (holding: md->type_lock + _hash_lock)
+	 */
+	dm_lock_md_type(md);
+
+	if (dm_unknown_md_type(md)) {
+		/* initial table load, set md's type based on table's type */
+		dm_set_md_type(md, t);
+	} else if (!dm_md_type_matches_table(md, t)) {
+		DMWARN("can't change device type after initial table load.");
+		dm_table_destroy(t);
+		dm_unlock_md_type(md);
+		r = -EINVAL;
+		goto out;
+	}
+
+	/* stage inactive table */
 	down_write(&_hash_lock);
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
 		DMWARN("device has been removed from the dev hash table.");
 		dm_table_destroy(t);
 		up_write(&_hash_lock);
+		dm_unlock_md_type(md);
 		r = -ENXIO;
 		goto out;
 	}
@@ -1191,6 +1218,8 @@ static int table_load(struct dm_ioctl *p
 	hc->new_map = t;
 	up_write(&_hash_lock);
 
+	dm_unlock_md_type(md);
+
 	param->flags |= DM_INACTIVE_PRESENT_FLAG;
 	__dev_status(md, param);
 
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
 #define DMF_QUEUE_IO_TO_THREAD 6
 
 /*
+ * Type for md->type field.
+ */
+enum mapped_device_type {
+	UNKNOWN_MD_TYPE,
+	BIO_BASED_MD_TYPE,
+	REQUEST_BASED_MD_TYPE,
+};
+
+/*
  * Work processed by per-device workqueue.
  */
 struct mapped_device {
@@ -123,6 +132,12 @@ struct mapped_device {
 	unsigned long flags;
 
 	struct request_queue *queue;
+	enum mapped_device_type type;
+	/*
+	 * Protect type from concurrent access.
+	 */
+	struct mutex type_lock;
+
 	struct gendisk *disk;
 	char name[16];
 
@@ -1874,8 +1889,10 @@ static struct mapped_device *alloc_dev(i
 	if (r < 0)
 		goto bad_minor;
 
+	md->type = UNKNOWN_MD_TYPE;
 	init_rwsem(&md->io_lock);
 	mutex_init(&md->suspend_lock);
+	mutex_init(&md->type_lock);
 	spin_lock_init(&md->deferred_lock);
 	spin_lock_init(&md->barrier_error_lock);
 	rwlock_init(&md->map_lock);
@@ -2128,6 +2145,53 @@ int dm_create(int minor, struct mapped_d
 	return 0;
 }
 
+/*
+ * Functions to manage md->type.
+ * All are required to hold md->type_lock.
+ */
+void dm_lock_md_type(struct mapped_device *md)
+{
+	mutex_lock(&md->type_lock);
+}
+
+void dm_unlock_md_type(struct mapped_device *md)
+{
+	mutex_unlock(&md->type_lock);
+}
+
+void dm_set_md_type(struct mapped_device *md, struct dm_table* t)
+{
+	if (dm_table_request_based(t))
+		md->type = REQUEST_BASED_MD_TYPE;
+	else
+		md->type = BIO_BASED_MD_TYPE;
+}
+
+bool dm_unknown_md_type(struct mapped_device *md)
+{
+	return md->type == UNKNOWN_MD_TYPE;
+}
+
+static bool dm_bio_based_md_type(struct mapped_device *md)
+{
+	return md->type == BIO_BASED_MD_TYPE;
+}
+
+static bool dm_request_based_md_type(struct mapped_device *md)
+{
+	return md->type == REQUEST_BASED_MD_TYPE;
+}
+
+bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
+{
+	if (dm_request_based_md_type(md))
+		return dm_table_request_based(t);
+	else if (dm_bio_based_md_type(md))
+		return !dm_table_request_based(t);
+
+	return 0;
+}
+
 static struct mapped_device *dm_find_md(dev_t dev)
 {
 	struct mapped_device *md;
@@ -2403,13 +2467,6 @@ struct dm_table *dm_swap_table(struct ma
 		goto out;
 	}
 
-	/* cannot change the device type, once a table is bound */
-	if (md->map &&
-	    (dm_table_get_type(md->map) != dm_table_get_type(table))) {
-		DMWARN("can't change the device type after a table is bound");
-		goto out;
-	}
-
 	map = __bind(md, table, &limits);
 
 out:
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -66,6 +66,12 @@ int dm_table_alloc_md_mempools(struct dm
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
 
+void dm_lock_md_type(struct mapped_device *md);
+void dm_unlock_md_type(struct mapped_device *md);
+void dm_set_md_type(struct mapped_device *md, struct dm_table* t);
+bool dm_unknown_md_type(struct mapped_device *md);
+bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t);
+
 /*
  * To check the return value from dm_table_find_target().
  */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/2] dm: prevent table type changes after initial table load
  2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
@ 2010-05-25 11:16   ` Kiyoshi Ueda
  2010-05-25 12:44     ` Mike Snitzer
  0 siblings, 1 reply; 16+ messages in thread
From: Kiyoshi Ueda @ 2010-05-25 11:16 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon

Hi Mike,

On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> Before table_load() stages an inactive table atomically check if its
> type conflicts with a previously established device type (md->type).
> 
> Introduce 'type_lock' in mapped_device structure and use it to protect
> md->type access.  table_load() sets md->type without concern for:
> - another table_load() racing to set conflicting md->type.
> - do_resume() making a conflicting table live.
> 
> Allowed transitions:
> UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
> UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE
> 
> Once a table load occurs the DM device's type is completely immutable.
> 
> This prevents a table reload from switching the inactive table directly
> to a conflicting type (even if the table is explicitly cleared before
> load).
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>

I basically agree with this design.
But I have some comments.  Please see below.

By the way, I can't make enough time to review until tomorrow.
So the following comments may not be all.
Sorry for the inconvenience.


> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p
>  		goto out;
>  	}
>  
> +	/*
> +	 * Protect md->type against concurrent table loads.
> +	 * Locking strategy:
> +	 * + Leverage fact that md's type cannot change after initial table load.
> +	 *   - Only protect type in table_load() -- not in do_resume().
> +	 *
> +	 * + Protect type while working to stage an inactive table:
> +	 *   - check if table's type conflicts with md->type
> +	 *     (holding: md->type_lock)
> +	 *   - stage inactive table (hc->new_map)
> +	 *     (holding: md->type_lock + _hash_lock)
> +	 */

I think you don't need to hold md->type_lock while you set the table
to hc->new_map; If 2 tables of the same type are racing here, we can
have either table.  If they are different type, the former one will win
and the latter one will be rejected.

> +	dm_lock_md_type(md);
> +
> +	if (dm_unknown_md_type(md)) {
> +		/* initial table load, set md's type based on table's type */
> +		dm_set_md_type(md, t);
> +	} else if (!dm_md_type_matches_table(md, t)) {
> +		DMWARN("can't change device type after initial table load.");
> +		dm_table_destroy(t);
> +		dm_unlock_md_type(md);
> +		r = -EINVAL;
> +		goto out;
> +	}

So you should be able to unlock md->type_lock here.


> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c
> +++ linux-2.6/drivers/md/dm.c
> @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
>  #define DMF_QUEUE_IO_TO_THREAD 6
>  
>  /*
> + * Type for md->type field.
> + */
> +enum mapped_device_type {
> +	UNKNOWN_MD_TYPE,
> +	BIO_BASED_MD_TYPE,
> +	REQUEST_BASED_MD_TYPE,
> +};

You can use the defined types in drivers/md/dm.h below instead;
/*
 * Type of table and mapped_device's mempool
 */
#define DM_TYPE_NONE            0
#define DM_TYPE_BIO_BASED       1
#define DM_TYPE_REQUEST_BASED   2

Using them should make type matching easier than making another definition.
E.g. see below;

> +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
> +{
> +	if (dm_request_based_md_type(md))
> +		return dm_table_request_based(t);
> +	else if (dm_bio_based_md_type(md))
> +		return !dm_table_request_based(t);
> +
> +	return 0;
> +}

You can match type using "md->type == dm_table_get_type(t)" if you use
the already defined types above.

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/2] dm: prevent table type changes after initial table load
  2010-05-25 11:16   ` Kiyoshi Ueda
@ 2010-05-25 12:44     ` Mike Snitzer
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 12:44 UTC (permalink / raw)
  To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon

On Tue, May 25 2010 at  7:16am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

> Hi Mike,
> 
> On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> > Before table_load() stages an inactive table atomically check if its
> > type conflicts with a previously established device type (md->type).
> > 
> > Introduce 'type_lock' in mapped_device structure and use it to protect
> > md->type access.  table_load() sets md->type without concern for:
> > - another table_load() racing to set conflicting md->type.
> > - do_resume() making a conflicting table live.
> > 
> > Allowed transitions:
> > UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
> > UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE
> > 
> > Once a table load occurs the DM device's type is completely immutable.
> > 
> > This prevents a table reload from switching the inactive table directly
> > to a conflicting type (even if the table is explicitly cleared before
> > load).
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> 
> I basically agree with this design.
> But I have some comments.  Please see below.
> 
> By the way, I can't make enough time to review until tomorrow.
> So the following comments may not be all.
> Sorry for the inconvenience.

No problem, thanks again for all your review.

> > Index: linux-2.6/drivers/md/dm-ioctl.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-ioctl.c
> > +++ linux-2.6/drivers/md/dm-ioctl.c
> > @@ -1176,12 +1176,39 @@ static int table_load(struct dm_ioctl *p
> >  		goto out;
> >  	}
> >  
> > +	/*
> > +	 * Protect md->type against concurrent table loads.
> > +	 * Locking strategy:
> > +	 * + Leverage fact that md's type cannot change after initial table load.
> > +	 *   - Only protect type in table_load() -- not in do_resume().
> > +	 *
> > +	 * + Protect type while working to stage an inactive table:
> > +	 *   - check if table's type conflicts with md->type
> > +	 *     (holding: md->type_lock)
> > +	 *   - stage inactive table (hc->new_map)
> > +	 *     (holding: md->type_lock + _hash_lock)
> > +	 */
> 
> I think you don't need to hold md->type_lock while you set the table
> to hc->new_map; If 2 tables of the same type are racing here, we can
> have either table.  If they are different type, the former one will win
> and the latter one will be rejected.

Correct.  This v2 patchset has significantly reduced the complexity (no
longer need to be concerned with md->type transitioning back to
UNKNOWN_MD_TYPE).  I just missed the fact that the locking could be
simplified a bit too.

> > +	dm_lock_md_type(md);
> > +
> > +	if (dm_unknown_md_type(md)) {
> > +		/* initial table load, set md's type based on table's type */
> > +		dm_set_md_type(md, t);
> > +	} else if (!dm_md_type_matches_table(md, t)) {
> > +		DMWARN("can't change device type after initial table load.");
> > +		dm_table_destroy(t);
> > +		dm_unlock_md_type(md);
> > +		r = -EINVAL;
> > +		goto out;
> > +	}
> 
> So you should be able to unlock md->type_lock here.

Yes.

> > Index: linux-2.6/drivers/md/dm.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm.c
> > +++ linux-2.6/drivers/md/dm.c
> > @@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
> >  #define DMF_QUEUE_IO_TO_THREAD 6
> >  
> >  /*
> > + * Type for md->type field.
> > + */
> > +enum mapped_device_type {
> > +	UNKNOWN_MD_TYPE,
> > +	BIO_BASED_MD_TYPE,
> > +	REQUEST_BASED_MD_TYPE,
> > +};
> 
> You can use the defined types in drivers/md/dm.h below instead;
> /*
>  * Type of table and mapped_device's mempool
>  */
> #define DM_TYPE_NONE            0
> #define DM_TYPE_BIO_BASED       1
> #define DM_TYPE_REQUEST_BASED   2
> 
> Using them should make type matching easier than making another definition.
> E.g. see below;
> 
> > +bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
> > +{
> > +	if (dm_request_based_md_type(md))
> > +		return dm_table_request_based(t);
> > +	else if (dm_bio_based_md_type(md))
> > +		return !dm_table_request_based(t);
> > +
> > +	return 0;
> > +}
> 
> You can match type using "md->type == dm_table_get_type(t)" if you use
> the already defined types above.

Ah, yes that'll clean things up nicely.  No idea how I missed the
existing definitions.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device
  2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
  2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
@ 2010-05-24 23:46 ` Mike Snitzer
  2010-05-25 11:18   ` Kiyoshi Ueda
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-24 23:46 UTC (permalink / raw)
  To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon

Allocate a minimalist request_queue structure initially (needed for both
bio and request-based DM).  A bio-based DM device no longer defaults to
having a fully initialized request_queue (request_fn, elevator, etc).
So bio-based DM devices no longer register elevator sysfs attributes
('iosched/' tree or 'scheduler' other than "none").

Initialization of a full request_queue (request_fn, elevator, etc) is
deferred until it is known that the DM device is request-based -- at the
end of the table load sequence.

Factor DM device's request_queue initialization:
- common to both request-based and bio-based into dm_init_md_queue().
- specific to request-based into dm_init_request_based_queue().

md->type_lock is also used to protect md->queue during table_load().
md->queue is setup without concern for:
- another table_load() racing to setup conflicting queue state.
- do_resume() making a conflicting table live.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-ioctl.c |   15 +++++-
 drivers/md/dm.c       |  120 +++++++++++++++++++++++++++++++++++++++-----------
 drivers/md/dm.h       |    2 
 3 files changed, 109 insertions(+), 28 deletions(-)

Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1177,14 +1177,16 @@ static int table_load(struct dm_ioctl *p
 	}
 
 	/*
-	 * Protect md->type against concurrent table loads.
+	 * Protect md->type and md->queue against concurrent table loads.
 	 * Locking strategy:
 	 * + Leverage fact that md's type cannot change after initial table load.
 	 *   - Only protect type in table_load() -- not in do_resume().
 	 *
-	 * + Protect type while working to stage an inactive table:
+	 * + Protect type and queue while working to stage an inactive table:
 	 *   - check if table's type conflicts with md->type
 	 *     (holding: md->type_lock)
+	 *   - setup md->queue based on md->type
+	 *     (holding: md->type_lock)
 	 *   - stage inactive table (hc->new_map)
 	 *     (holding: md->type_lock + _hash_lock)
 	 */
@@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p
 		goto out;
 	}
 
+	/* setup md->queue to reflect md's and table's type (may block) */
+	r = dm_setup_md_queue(md);
+	if (r) {
+		DMWARN("unable to setup device queue for this table.");
+		dm_table_destroy(t);
+		dm_unlock_md_type(md);
+		goto out;
+	}
+
 	/* stage inactive table */
 	down_write(&_hash_lock);
 	hc = dm_get_mdptr(md);
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -134,7 +134,7 @@ struct mapped_device {
 	struct request_queue *queue;
 	enum mapped_device_type type;
 	/*
-	 * Protect type from concurrent access.
+	 * Protect queue and type from concurrent access.
 	 */
 	struct mutex type_lock;
 
@@ -1864,6 +1864,28 @@ static const struct block_device_operati
 static void dm_wq_work(struct work_struct *work);
 static void dm_rq_barrier_work(struct work_struct *work);
 
+static void dm_init_md_queue(struct mapped_device *md)
+{
+	/*
+	 * Request-based dm devices cannot be stacked on top of bio-based dm
+	 * devices.  The type of this dm device has not been decided yet.
+	 * The type is decided at the first table loading time.
+	 * To prevent problematic device stacking, clear the queue flag
+	 * for request stacking support until then.
+	 *
+	 * This queue is new, so no concurrency on the queue_flags.
+	 */
+	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
+
+	md->queue->queuedata = md;
+	md->queue->backing_dev_info.congested_fn = dm_any_congested;
+	md->queue->backing_dev_info.congested_data = md;
+	blk_queue_make_request(md->queue, dm_request);
+	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
+	md->queue->unplug_fn = dm_unplug_all;
+	blk_queue_merge_bvec(md->queue, dm_merge_bvec);
+}
+
 /*
  * Allocate and initialise a blank device with a given minor.
  */
@@ -1903,34 +1925,11 @@ static struct mapped_device *alloc_dev(i
 	INIT_LIST_HEAD(&md->uevent_list);
 	spin_lock_init(&md->uevent_lock);
 
-	md->queue = blk_init_queue(dm_request_fn, NULL);
+	md->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!md->queue)
 		goto bad_queue;
 
-	/*
-	 * Request-based dm devices cannot be stacked on top of bio-based dm
-	 * devices.  The type of this dm device has not been decided yet,
-	 * although we initialized the queue using blk_init_queue().
-	 * The type is decided at the first table loading time.
-	 * To prevent problematic device stacking, clear the queue flag
-	 * for request stacking support until then.
-	 *
-	 * This queue is new, so no concurrency on the queue_flags.
-	 */
-	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
-	md->saved_make_request_fn = md->queue->make_request_fn;
-	md->queue->queuedata = md;
-	md->queue->backing_dev_info.congested_fn = dm_any_congested;
-	md->queue->backing_dev_info.congested_data = md;
-	blk_queue_make_request(md->queue, dm_request);
-	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
-	md->queue->unplug_fn = dm_unplug_all;
-	blk_queue_merge_bvec(md->queue, dm_merge_bvec);
-	blk_queue_softirq_done(md->queue, dm_softirq_done);
-	blk_queue_prep_rq(md->queue, dm_prep_fn);
-	blk_queue_lld_busy(md->queue, dm_lld_busy);
-	blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
-			  dm_rq_prepare_flush);
+	dm_init_md_queue(md);
 
 	md->disk = alloc_disk(1);
 	if (!md->disk)
@@ -2192,6 +2191,75 @@ bool dm_md_type_matches_table(struct map
 	return 0;
 }
 
+/*
+ * Functions to manage md->queue.
+ * All are required to hold md->type_lock.
+ */
+static bool dm_bio_based_md_queue(struct mapped_device *md)
+{
+	return (md->queue->request_fn) ? 0 : 1;
+}
+
+/*
+ * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
+ */
+static int dm_init_request_based_queue(struct mapped_device *md)
+{
+	struct request_queue *q = NULL;
+
+	/* Avoid re-initializing the queue if already fully initialized */
+	if (!md->queue->elevator) {
+		/* Fully initialize the queue */
+		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
+		if (!q)
+			return 0;
+		md->queue = q;
+		md->saved_make_request_fn = md->queue->make_request_fn;
+		dm_init_md_queue(md);
+		blk_queue_softirq_done(md->queue, dm_softirq_done);
+		blk_queue_prep_rq(md->queue, dm_prep_fn);
+		blk_queue_lld_busy(md->queue, dm_lld_busy);
+		blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+				  dm_rq_prepare_flush);
+	} else {
+		BUG_ON(dm_bio_based_md_queue(md));
+		return 1; /* queue already request-based */
+	}
+
+	elv_register_queue(md->queue);
+
+	return 1;
+}
+
+static void dm_clear_request_based_queue(struct mapped_device *md)
+{
+	if (dm_bio_based_md_queue(md))
+		return; /* queue already bio-based */
+
+	/* Unregister elevator from sysfs and clear ->request_fn */
+	elv_unregister_queue(md->queue);
+	md->queue->request_fn = NULL;
+}
+
+/*
+ * Setup the DM device's queue based on md's type
+ */
+int dm_setup_md_queue(struct mapped_device *md)
+{
+	BUG_ON(!mutex_is_locked(&md->type_lock));
+	BUG_ON(dm_unknown_md_type(md));
+
+	if (dm_request_based_md_type(md)) {
+		if (!dm_init_request_based_queue(md)) {
+			DMWARN("Cannot initialize queue for Request-based dm");
+			return -EINVAL;
+		}
+	} else if (dm_bio_based_md_type(md))
+		dm_clear_request_based_queue(md);
+
+	return 0;
+}
+
 static struct mapped_device *dm_find_md(dev_t dev)
 {
 	struct mapped_device *md;
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -72,6 +72,8 @@ void dm_set_md_type(struct mapped_device
 bool dm_unknown_md_type(struct mapped_device *md);
 bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t);
 
+int dm_setup_md_queue(struct mapped_device *md);
+
 /*
  * To check the return value from dm_table_find_target().
  */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device
  2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
@ 2010-05-25 11:18   ` Kiyoshi Ueda
  2010-05-25 12:49     ` Mike Snitzer
  0 siblings, 1 reply; 16+ messages in thread
From: Kiyoshi Ueda @ 2010-05-25 11:18 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon

Hi Mike,

On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p
>  		goto out;
>  	}
>  
> +	/* setup md->queue to reflect md's and table's type (may block) */
> +	r = dm_setup_md_queue(md);
> +	if (r) {
> +		DMWARN("unable to setup device queue for this table.");
> +		dm_table_destroy(t);
> +		dm_unlock_md_type(md);
> +		goto out;
> +	}
> +

dm_setup_md_queue() should be put just after dm_set_md_type() of patch#1.
Then, you can make sure that dm_setup_md_queue() is called only once
and dm_setup_md_queue() should be much simpler.


> +/*
> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> + */
> +static int dm_init_request_based_queue(struct mapped_device *md)
> +{
> +	struct request_queue *q = NULL;
> +
> +	/* Avoid re-initializing the queue if already fully initialized */
> +	if (!md->queue->elevator) {
> +		/* Fully initialize the queue */
> +		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> +		if (!q)
> +			return 0;

When blk_init_allocated_queue() fails, the block-layer seems not to
guarantee that the queue is still available.
You should create appropriate block-layer interfaces and/or take proper
recovery action here.
However, if the failure is not realistic, rather than complicating this
patch, you can just put a big comment and WARN_ON() for now, and fix it
in a separate patch-set.


> +static void dm_clear_request_based_queue(struct mapped_device *md)
> +{
> +	if (dm_bio_based_md_queue(md))
> +		return; /* queue already bio-based */
> +
> +	/* Unregister elevator from sysfs and clear ->request_fn */
> +	elv_unregister_queue(md->queue);
> +	md->queue->request_fn = NULL;
> +}
> +
> +/*
> + * Setup the DM device's queue based on md's type
> + */
> +int dm_setup_md_queue(struct mapped_device *md)
> +{
> +	BUG_ON(!mutex_is_locked(&md->type_lock));
> +	BUG_ON(dm_unknown_md_type(md));
> +
> +	if (dm_request_based_md_type(md)) {
> +		if (!dm_init_request_based_queue(md)) {
> +			DMWARN("Cannot initialize queue for Request-based dm");
> +			return -EINVAL;
> +		}
> +	} else if (dm_bio_based_md_type(md))
> +		dm_clear_request_based_queue(md);
> +
> +	return 0;
> +}

These unregister/clear works shouldn't be needed if dm_setup_md_queue()
is called only once as I mentioned above.

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device
  2010-05-25 11:18   ` Kiyoshi Ueda
@ 2010-05-25 12:49     ` Mike Snitzer
  2010-05-25 16:34         ` Mike Snitzer
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 12:49 UTC (permalink / raw)
  To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon, jens.axboe

On Tue, May 25 2010 at  7:18am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

> Hi Mike,
> 
> On 05/25/2010 08:46 AM +0900, Mike Snitzer wrote:
> > Index: linux-2.6/drivers/md/dm-ioctl.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-ioctl.c
> > +++ linux-2.6/drivers/md/dm-ioctl.c
> > @@ -1201,6 +1203,15 @@ static int table_load(struct dm_ioctl *p
> >  		goto out;
> >  	}
> >  
> > +	/* setup md->queue to reflect md's and table's type (may block) */
> > +	r = dm_setup_md_queue(md);
> > +	if (r) {
> > +		DMWARN("unable to setup device queue for this table.");
> > +		dm_table_destroy(t);
> > +		dm_unlock_md_type(md);
> > +		goto out;
> > +	}
> > +
> 
> dm_setup_md_queue() should be put just after dm_set_md_type() of patch#1.
> Then, you can make sure that dm_setup_md_queue() is called only once
> and dm_setup_md_queue() should be much simpler.

Good idea.

> > +/*
> > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> > + */
> > +static int dm_init_request_based_queue(struct mapped_device *md)
> > +{
> > +	struct request_queue *q = NULL;
> > +
> > +	/* Avoid re-initializing the queue if already fully initialized */
> > +	if (!md->queue->elevator) {
> > +		/* Fully initialize the queue */
> > +		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > +		if (!q)
> > +			return 0;
> 
> When blk_init_allocated_queue() fails, the block-layer seems not to
> guarantee that the queue is still available.

Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
problematic:

        if (blk_init_free_list(q)) {
                kmem_cache_free(blk_requestq_cachep, q);
                return NULL;
        }

Cc'ing Jens as I think it would be advantageous for us to push the above
kmem_cache_free() into the callers where it really makes sense, e.g.:
blk_init_queue_node().

So on blk_init_allocated_queue_node() failure blk_init_queue_node() will
take care to cleanup the queue that it assumes it is managing
completely.

My patch (linux-2.6-block.git's commit: 01effb0) that split out
blk_init_allocated_queue_node() from blk_init_queue_node() opened up
this issue.  I'm fairly confident we'll get it fixed by the time 2.6.35
ships.

> You should create appropriate block-layer interfaces and/or take proper
> recovery action here.
> However, if the failure is not realistic, rather than complicating this
> patch, you can just put a big comment and WARN_ON() for now, and fix it
> in a separate patch-set.

I'll hold off on the WARN_ON().  I think we need the request_queue to
remain allocated even after blk_init_allocated_queue_node() failure.

> > +static void dm_clear_request_based_queue(struct mapped_device *md)
> > +{
> > +	if (dm_bio_based_md_queue(md))
> > +		return; /* queue already bio-based */
> > +
> > +	/* Unregister elevator from sysfs and clear ->request_fn */
> > +	elv_unregister_queue(md->queue);
> > +	md->queue->request_fn = NULL;
> > +}
> > +
> > +/*
> > + * Setup the DM device's queue based on md's type
> > + */
> > +int dm_setup_md_queue(struct mapped_device *md)
> > +{
> > +	BUG_ON(!mutex_is_locked(&md->type_lock));
> > +	BUG_ON(dm_unknown_md_type(md));
> > +
> > +	if (dm_request_based_md_type(md)) {
> > +		if (!dm_init_request_based_queue(md)) {
> > +			DMWARN("Cannot initialize queue for Request-based dm");
> > +			return -EINVAL;
> > +		}
> > +	} else if (dm_bio_based_md_type(md))
> > +		dm_clear_request_based_queue(md);
> > +
> > +	return 0;
> > +}
> 
> These unregister/clear works shouldn't be needed if dm_setup_md_queue()
> is called only once as I mentioned above.

Indeed.  I'll get v3 of the patchset out with appropriate edits.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] block: avoid unconditionally freeing previously allocated request_queue
  2010-05-25 12:49     ` Mike Snitzer
@ 2010-05-25 16:34         ` Mike Snitzer
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 16:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon

On Tue, May 25 2010 at  8:49am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, May 25 2010 at  7:18am -0400,
> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> 
> > > +/*
> > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> > > + */
> > > +static int dm_init_request_based_queue(struct mapped_device *md)
> > > +{
> > > +	struct request_queue *q = NULL;
> > > +
> > > +	/* Avoid re-initializing the queue if already fully initialized */
> > > +	if (!md->queue->elevator) {
> > > +		/* Fully initialize the queue */
> > > +		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > > +		if (!q)
> > > +			return 0;
> > 
> > When blk_init_allocated_queue() fails, the block-layer seems not to
> > guarantee that the queue is still available.
> 
> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
> problematic:
> 
>         if (blk_init_free_list(q)) {
>                 kmem_cache_free(blk_requestq_cachep, q);
>                 return NULL;
>         }
> 
> Cc'ing Jens as I think it would be advantageous for us to push the above
> kmem_cache_free() into the callers where it really makes sense, e.g.:
> blk_init_queue_node().
> 
> So on blk_init_allocated_queue_node() failure blk_init_queue_node() will
> take care to cleanup the queue that it assumes it is managing
> completely.
> 
> My patch (linux-2.6-block.git's commit: 01effb0) that split out
> blk_init_allocated_queue_node() from blk_init_queue_node() opened up
> this issue.  I'm fairly confident we'll get it fixed by the time 2.6.35
> ships.

Jens,

How about something like the following?


block: avoid unconditionally freeing previously allocated request_queue

On blk_init_allocated_queue_node failure, only free request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).

This addresses a regression introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue

Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |   31 ++++++++++++++++++++++++++-----
 1 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..c0179b7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -528,6 +528,24 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 }
 EXPORT_SYMBOL(blk_alloc_queue_node);
 
+static void blk_free_partial_queue(struct request_queue *q)
+{
+	/* Free q if blk_init_queue failed early enough. */
+	int free_request_queue = 0;
+	struct request_list *rl;
+
+	if (!q)
+		return;
+
+	/* Was blk_init_free_list the cause for failure? */
+	rl = &q->rq;
+	if (!rl->rq_pool)
+		free_request_queue = 1;
+
+	if (free_request_queue)
+		kmem_cache_free(blk_requestq_cachep, q);
+}
+
 /**
  * blk_init_queue  - prepare a request queue for use with a block device
  * @rfn:  The function to be called to process requests that have been
@@ -570,9 +588,14 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-	struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	struct request_queue *uninit_q, *q;
+
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+	if (!q)
+		blk_free_partial_queue(uninit_q);
 
-	return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+	return q;
 }
 EXPORT_SYMBOL(blk_init_queue_node);
 
@@ -592,10 +615,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return NULL;
 
 	q->node = node_id;
-	if (blk_init_free_list(q)) {
-		kmem_cache_free(blk_requestq_cachep, q);
+	if (blk_init_free_list(q))
 		return NULL;
-	}
 
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH] block: avoid unconditionally freeing previously allocated request_queue
@ 2010-05-25 16:34         ` Mike Snitzer
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 16:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel

On Tue, May 25 2010 at  8:49am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, May 25 2010 at  7:18am -0400,
> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> 
> > > +/*
> > > + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> > > + */
> > > +static int dm_init_request_based_queue(struct mapped_device *md)
> > > +{
> > > +	struct request_queue *q = NULL;
> > > +
> > > +	/* Avoid re-initializing the queue if already fully initialized */
> > > +	if (!md->queue->elevator) {
> > > +		/* Fully initialize the queue */
> > > +		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > > +		if (!q)
> > > +			return 0;
> > 
> > When blk_init_allocated_queue() fails, the block-layer seems not to
> > guarantee that the queue is still available.
> 
> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
> problematic:
> 
>         if (blk_init_free_list(q)) {
>                 kmem_cache_free(blk_requestq_cachep, q);
>                 return NULL;
>         }
> 
> Cc'ing Jens as I think it would be advantageous for us to push the above
> kmem_cache_free() into the callers where it really makes sense, e.g.:
> blk_init_queue_node().
> 
> So on blk_init_allocated_queue_node() failure blk_init_queue_node() will
> take care to cleanup the queue that it assumes it is managing
> completely.
> 
> My patch (linux-2.6-block.git's commit: 01effb0) that split out
> blk_init_allocated_queue_node() from blk_init_queue_node() opened up
> this issue.  I'm fairly confident we'll get it fixed by the time 2.6.35
> ships.

Jens,

How about something like the following?


block: avoid unconditionally freeing previously allocated request_queue

On blk_init_allocated_queue_node failure, only free request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).

This addresses a regression introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue

Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |   31 ++++++++++++++++++++++++++-----
 1 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..c0179b7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -528,6 +528,24 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 }
 EXPORT_SYMBOL(blk_alloc_queue_node);
 
+static void blk_free_partial_queue(struct request_queue *q)
+{
+	/* Free q if blk_init_queue failed early enough. */
+	int free_request_queue = 0;
+	struct request_list *rl;
+
+	if (!q)
+		return;
+
+	/* Was blk_init_free_list the cause for failure? */
+	rl = &q->rq;
+	if (!rl->rq_pool)
+		free_request_queue = 1;
+
+	if (free_request_queue)
+		kmem_cache_free(blk_requestq_cachep, q);
+}
+
 /**
  * blk_init_queue  - prepare a request queue for use with a block device
  * @rfn:  The function to be called to process requests that have been
@@ -570,9 +588,14 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-	struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	struct request_queue *uninit_q, *q;
+
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+	if (!q)
+		blk_free_partial_queue(uninit_q);
 
-	return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+	return q;
 }
 EXPORT_SYMBOL(blk_init_queue_node);
 
@@ -592,10 +615,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return NULL;
 
 	q->node = node_id;
-	if (blk_init_free_list(q)) {
-		kmem_cache_free(blk_requestq_cachep, q);
+	if (blk_init_free_list(q))
 		return NULL;
-	}
 
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/1] block: make blk_init_free_list and elevator_init idempotent
  2010-05-25 16:34         ` Mike Snitzer
  (?)
@ 2010-05-25 17:15         ` Mike Snitzer
  -1 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-25 17:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel

blk_init_allocated_queue_node may fail and the caller _could_ retry.
Accommodate the unlikely event that blk_init_allocated_queue_node is
called on an already initialized (possibly partially) request_queue.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |    3 +++
 block/elevator.c |    6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c0179b7..2c208c3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -467,6 +467,9 @@ static int blk_init_free_list(struct request_queue *q)
 {
 	struct request_list *rl = &q->rq;
 
+	if (unlikely(rl->rq_pool))
+		return 0;
+
 	rl->count[BLK_RW_SYNC] = rl->count[BLK_RW_ASYNC] = 0;
 	rl->starved[BLK_RW_SYNC] = rl->starved[BLK_RW_ASYNC] = 0;
 	rl->elvpriv = 0;
diff --git a/block/elevator.c b/block/elevator.c
index 0abce47..923a913 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -242,9 +242,11 @@ int elevator_init(struct request_queue *q, char *name)
 {
 	struct elevator_type *e = NULL;
 	struct elevator_queue *eq;
-	int ret = 0;
 	void *data;
 
+	if (unlikely(q->elevator))
+		return 0;
+
 	INIT_LIST_HEAD(&q->queue_head);
 	q->last_merge = NULL;
 	q->end_sector = 0;
@@ -284,7 +286,7 @@ int elevator_init(struct request_queue *q, char *name)
 	}
 
 	elevator_attach(q, eq, data);
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(elevator_init);
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: avoid unconditionally freeing previously allocated request_queue
  2010-05-25 16:34         ` Mike Snitzer
  (?)
  (?)
@ 2010-05-26  2:37         ` Kiyoshi Ueda
  2010-05-26  4:47           ` Mike Snitzer
  -1 siblings, 1 reply; 16+ messages in thread
From: Kiyoshi Ueda @ 2010-05-26  2:37 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Jens Axboe, dm-devel, Alasdair Kergon, linux-kernel

Hi Mike,

On 05/26/2010 01:34 AM +0900, Mike Snitzer wrote:
> Mike Snitzer <snitzer@redhat.com> wrote:
>> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
>>>> +/*
>>>> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
>>>> + */
>>>> +static int dm_init_request_based_queue(struct mapped_device *md)
>>>> +{
>>>> +	struct request_queue *q = NULL;
>>>> +
>>>> +	/* Avoid re-initializing the queue if already fully initialized */
>>>> +	if (!md->queue->elevator) {
>>>> +		/* Fully initialize the queue */
>>>> +		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
>>>> +		if (!q)
>>>> +			return 0;
>>>
>>> When blk_init_allocated_queue() fails, the block-layer seems not to
>>> guarantee that the queue is still available.
>>
>> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
>> problematic:
>>
>>         if (blk_init_free_list(q)) {
>>                 kmem_cache_free(blk_requestq_cachep, q);
>>                 return NULL;
>>         }

Not only that.  The blk_put_queue() in blk_init_allocated_queue_node()
will also free the queue:

	if (!elevator_init(q, NULL)) {
		blk_queue_congestion_threshold(q);
		return q;
	}

	blk_put_queue(q);
	return NULL;

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: block: avoid unconditionally freeing previously allocated request_queue
  2010-05-26  2:37         ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Kiyoshi Ueda
@ 2010-05-26  4:47           ` Mike Snitzer
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-05-26  4:47 UTC (permalink / raw)
  To: Kiyoshi Ueda; +Cc: Jens Axboe, dm-devel, Alasdair Kergon, linux-kernel

On Tue, May 25 2010 at 10:37pm -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

> Hi Mike,
> 
> On 05/26/2010 01:34 AM +0900, Mike Snitzer wrote:
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> >>>> +/*
> >>>> + * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
> >>>> + */
> >>>> +static int dm_init_request_based_queue(struct mapped_device *md)
> >>>> +{
> >>>> +	struct request_queue *q = NULL;
> >>>> +
> >>>> +	/* Avoid re-initializing the queue if already fully initialized */
> >>>> +	if (!md->queue->elevator) {
> >>>> +		/* Fully initialize the queue */
> >>>> +		q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> >>>> +		if (!q)
> >>>> +			return 0;
> >>>
> >>> When blk_init_allocated_queue() fails, the block-layer seems not to
> >>> guarantee that the queue is still available.
> >>
> >> Ouch, yes this portion of blk_init_allocated_queue_node() is certainly
> >> problematic:
> >>
> >>         if (blk_init_free_list(q)) {
> >>                 kmem_cache_free(blk_requestq_cachep, q);
> >>                 return NULL;
> >>         }
> 
> Not only that.  The blk_put_queue() in blk_init_allocated_queue_node()
> will also free the queue:
> 
> 	if (!elevator_init(q, NULL)) {
> 		blk_queue_congestion_threshold(q);
> 		return q;
> 	}
> 
> 	blk_put_queue(q);
> 	return NULL;

OK, I'll post v2 that addresses this and we'll see what Jens says.

Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2] block: avoid unconditionally freeing previously allocated request_queue
  2010-05-25 16:34         ` Mike Snitzer
                           ` (2 preceding siblings ...)
  (?)
@ 2010-05-26  4:52         ` Mike Snitzer
  2010-06-03 16:58           ` [PATCH v3] " Mike Snitzer
  -1 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-05-26  4:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: dm-devel, Alasdair Kergon, Kiyoshi Ueda, linux-kernel

On blk_init_allocated_queue_node failure, only free request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).

This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue

Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |   33 +++++++++++++++++++++++++++------
 1 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..c0cdafd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -528,6 +528,25 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 }
 EXPORT_SYMBOL(blk_alloc_queue_node);
 
+static void blk_free_partial_queue(struct request_queue *q)
+{
+	struct request_list *rl;
+
+	if (!q)
+		return;
+
+	/* Was blk_init_free_list the cause for failure? */
+	rl = &q->rq;
+	if (!rl->rq_pool) {
+		kmem_cache_free(blk_requestq_cachep, q);
+		return;
+	}
+
+	/* Or was elevator_init? */
+	if (!q->elevator)
+		blk_put_queue(q);
+}
+
 /**
  * blk_init_queue  - prepare a request queue for use with a block device
  * @rfn:  The function to be called to process requests that have been
@@ -570,9 +589,14 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-	struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	struct request_queue *uninit_q, *q;
+
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+	if (!q)
+		blk_free_partial_queue(uninit_q);
 
-	return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+	return q;
 }
 EXPORT_SYMBOL(blk_init_queue_node);
 
@@ -592,10 +616,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return NULL;
 
 	q->node = node_id;
-	if (blk_init_free_list(q)) {
-		kmem_cache_free(blk_requestq_cachep, q);
+	if (blk_init_free_list(q))
 		return NULL;
-	}
 
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;
@@ -618,7 +640,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return q;
 	}
 
-	blk_put_queue(q);
 	return NULL;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue_node);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3] block: avoid unconditionally freeing previously allocated request_queue
  2010-05-26  4:52         ` [PATCH v2] " Mike Snitzer
@ 2010-06-03 16:58           ` Mike Snitzer
  2010-06-03 17:34             ` [PATCH v4] " Mike Snitzer
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-06-03 16:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon

On blk_init_allocated_queue_node failure, only free the request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).

This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated request_queue

Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

v3: leverage fact that blk_cleanup_queue will properly free all memory
    associated with a request_queue (e.g.: q->rq_pool and q->elevator)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..24683a4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -570,9 +570,14 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-	struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	struct request_queue *uninit_q, *q;
 
-	return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+	if (!q)
+		blk_cleanup_queue(uninit_q);
+
+	return q;
 }
 EXPORT_SYMBOL(blk_init_queue_node);
 
@@ -592,10 +597,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return NULL;
 
 	q->node = node_id;
-	if (blk_init_free_list(q)) {
-		kmem_cache_free(blk_requestq_cachep, q);
+	if (blk_init_free_list(q))
 		return NULL;
-	}
 
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;
@@ -618,7 +621,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return q;
 	}
 
-	blk_put_queue(q);
 	return NULL;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue_node);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4] block: avoid unconditionally freeing previously allocated request_queue
  2010-06-03 16:58           ` [PATCH v3] " Mike Snitzer
@ 2010-06-03 17:34             ` Mike Snitzer
  2010-06-04 11:44               ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-06-03 17:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Kiyoshi Ueda, dm-devel, linux-kernel, Alasdair Kergon

block: avoid unconditionally freeing previously allocated request_queue

On blk_init_allocated_queue_node failure, only free the request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).

This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated
request_queue

Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

v4: eliminate potential for NULL pointer in call to blk_cleanup_queue
v3: leverage fact that blk_cleanup_queue will properly free all memory
   associated with a request_queue (e.g.: q->rq_pool and q->elevator)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3bc5579..826d070 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -570,9 +570,17 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-	struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	struct request_queue *uninit_q, *q;
 
-	return blk_init_allocated_queue_node(q, rfn, lock, node_id);
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+	if (!uninit_q)
+		return NULL;
+
+	q = blk_init_allocated_queue_node(uninit_q, rfn, lock, node_id);
+	if (!q)
+		blk_cleanup_queue(uninit_q);
+
+	return q;
 }
 EXPORT_SYMBOL(blk_init_queue_node);
 
@@ -592,10 +600,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return NULL;
 
 	q->node = node_id;
-	if (blk_init_free_list(q)) {
-		kmem_cache_free(blk_requestq_cachep, q);
+	if (blk_init_free_list(q))
 		return NULL;
-	}
 
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;
@@ -618,7 +624,6 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
 		return q;
 	}
 
-	blk_put_queue(q);
 	return NULL;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue_node);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4] block: avoid unconditionally freeing previously  allocated request_queue
  2010-06-03 17:34             ` [PATCH v4] " Mike Snitzer
@ 2010-06-04 11:44               ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2010-06-04 11:44 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Kiyoshi Ueda, dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	Alasdair Kergon

On 2010-06-03 19:34, Mike Snitzer wrote:
> block: avoid unconditionally freeing previously allocated request_queue
> 
> On blk_init_allocated_queue_node failure, only free the request_queue if
> it is wasn't previously allocated outside the block layer
> (e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
> 
> This addresses an interface bug introduced by the following commit:
> 01effb0 block: allow initialization of previously allocated
> request_queue
> 
> Otherwise the request_queue may be free'd out from underneath a caller
> that is managing the request_queue directly (e.g. caller uses
> blk_alloc_queue + blk_init_allocated_queue_node).

Thanks Mike, this looks a lot better. I have applied this one and
2/2 of the original posting.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-06-04 11:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-24 23:46 [PATCH v2 0/2] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 1/2] dm: prevent table type changes after initial table load Mike Snitzer
2010-05-25 11:16   ` Kiyoshi Ueda
2010-05-25 12:44     ` Mike Snitzer
2010-05-24 23:46 ` [PATCH v2 2/2 "v9"] dm: only initialize full request_queue for request-based device Mike Snitzer
2010-05-25 11:18   ` Kiyoshi Ueda
2010-05-25 12:49     ` Mike Snitzer
2010-05-25 16:34       ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Mike Snitzer
2010-05-25 16:34         ` Mike Snitzer
2010-05-25 17:15         ` [PATCH 2/1] block: make blk_init_free_list and elevator_init idempotent Mike Snitzer
2010-05-26  2:37         ` [PATCH] block: avoid unconditionally freeing previously allocated request_queue Kiyoshi Ueda
2010-05-26  4:47           ` Mike Snitzer
2010-05-26  4:52         ` [PATCH v2] " Mike Snitzer
2010-06-03 16:58           ` [PATCH v3] " Mike Snitzer
2010-06-03 17:34             ` [PATCH v4] " Mike Snitzer
2010-06-04 11:44               ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.