All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>,
	linux-kernel@vger.kernel.org, dm-devel@redhat.com,
	Alasdair Kergon <agk@redhat.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [RFC PATCH 2/2] dm: only initialize full request_queue for request-based device
Date: Tue, 18 May 2010 09:46:41 -0400	[thread overview]
Message-ID: <20100518134639.GA27582@redhat.com> (raw)
In-Reply-To: <4BF25091.3000507@ct.jp.nec.com>

On Tue, May 18 2010 at  4:32am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

> Hi Mike,
> 
> On 05/18/2010 02:27 AM +0900, Mike Snitzer wrote:
> > Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> >> As far as I understand, the current model of device-mapper is:
> >>   - a table (precisely, a target) has various attributes,
> >>     bio-based/request-based is one of such attributes
> >>   - a table and its attributes are bound to the block device on resume
> >> If we want to fix a problem, I think we should either work based on
> >> this model or change the model.
> >>
> >> Your patch makes that loading table affects the block device, so you
> >> are changing the model.
> >>
> >> If you change the model, it should be done carefully.
> >> For example, the current model allows most of the table loading code
> >> to run without exclusive lock on the device because it doesn't affect
> >> the device itself.  If you change this model, table loading needs to
> >> be serialized with appropriate locking.
> > 
> > Nice catch, yes md->queue needs protection (see patch below).
> 
> Not enough. (See drivers/md/dm-ioctl.c:table_load().)
> Table load sequence is:
>   1. populate table
>   2. set the table to ->new_map of the hash_cell for the mapped_device
>      in protection by _hash_lock.
> 
> Since your fix only serializes the step 1, concurrent table loading
> could end up with inconsistent status; e.g. request-based table is
> bound to the mapped_device while the queue is initialized as bio-based.
> With your new model, those 2 steps above must be atomic.

Ah, yes.. I looked at the possibility of serializing the entirety of
table_load but determined that would be too excessive (would reduce
parallelism of table_load).  But I clearly missed the fact that there
could be a race to the _hash_lock protected critical section in
table_load() -- leading to queue inconsistency.

I'll post v5 of the overall patch which will revert the mapped_device
'queue_lock' serialization that I proposed in v4.  v5 will contain
the following patch to localize all table load related queue
manipulation to the _hash_lock protected critical section in
table_load().  So it sets the queue up _after_ the table's type is
established with dm_table_set_type().

Thanks again for your review; I'll work to be more meticulous in the
future!

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index d7500e1..bfb283c 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1180,6 +1180,14 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	r = dm_table_setup_md_queue(t);
+	if (r) {
+		DMWARN("unable to setup device queue for this table");
+		dm_table_destroy(t);
+		up_write(&_hash_lock);
+		goto out;
+	}
+
 	if (hc->new_map)
 		dm_table_destroy(hc->new_map);
 	hc->new_map = t;
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index dec8295..990cf17 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -803,7 +803,6 @@ int dm_table_set_type(struct dm_table *t)
 	if (bio_based) {
 		/* We must use this table as bio-based */
 		t->type = DM_TYPE_BIO_BASED;
-		dm_clear_request_based_queue(t->md);
 		return 0;
 	}
 
@@ -830,11 +829,6 @@ int dm_table_set_type(struct dm_table *t)
 		return -EINVAL;
 	}
 
-	if (!dm_init_request_based_queue(t->md)) {
-		DMWARN("Cannot initialize queue for Request-based dm");
-		return -EINVAL;
-	}
-
 	t->type = DM_TYPE_REQUEST_BASED;
 
 	return 0;
@@ -850,6 +844,23 @@ bool dm_table_request_based(struct dm_table *t)
 	return dm_table_get_type(t) == DM_TYPE_REQUEST_BASED;
 }
 
+/*
+ * Setup the DM device's queue based on table's type.
+ * Caller must serialize access to table's md.
+ */
+int dm_table_setup_md_queue(struct dm_table *t)
+{
+	if (dm_table_request_based(t)) {
+		if (!dm_init_request_based_queue(t->md)) {
+			DMWARN("Cannot initialize queue for Request-based dm");
+			return -EINVAL;
+		}
+	} else
+		dm_clear_request_based_queue(t->md);
+
+	return 0;
+}
+
 int dm_table_alloc_md_mempools(struct dm_table *t)
 {
 	unsigned type = dm_table_get_type(t);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 143082c..76b61ca 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1856,6 +1856,17 @@ static void dm_rq_barrier_work(struct work_struct *work);
 
 static void dm_init_md_queue(struct mapped_device *md)
 {
+	/*
+	 * Request-based dm devices cannot be stacked on top of bio-based dm
+	 * devices.  The type of this dm device has not been decided yet.
+	 * The type is decided at the first table loading time.
+	 * To prevent problematic device stacking, clear the queue flag
+	 * for request stacking support until then.
+	 *
+	 * This queue is new, so no concurrency on the queue_flags.
+	 */
+	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
+
 	md->queue->queuedata = md;
 	md->queue->backing_dev_info.congested_fn = dm_any_congested;
 	md->queue->backing_dev_info.congested_data = md;
@@ -1989,17 +2000,6 @@ int dm_init_request_based_queue(struct mapped_device *md)
 
 	elv_register_queue(md->queue);
 
-	/*
-	 * Request-based dm devices cannot be stacked on top of bio-based dm
-	 * devices.  The type of this dm device has not been decided yet.
-	 * The type is decided at the first table loading time.
-	 * To prevent problematic device stacking, clear the queue flag
-	 * for request stacking support until then.
-	 *
-	 * This queue is new, so no concurrency on the queue_flags.
-	 */
-	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
-
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 	blk_queue_lld_busy(md->queue, dm_lld_busy);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 95aec64..b4ebb11 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -65,6 +65,7 @@ bool dm_table_request_based(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
+int dm_table_setup_md_queue(struct dm_table *t);
 
 /*
  * To check the return value from dm_table_find_target().

WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com>
To: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	Jens Axboe <jens.axboe@oracle.com>,
	"Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Nikanth Karthikesan <knikanth@suse.de>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: [RFC PATCH 2/2] dm: only initialize full request_queue for request-based device
Date: Tue, 18 May 2010 09:46:41 -0400	[thread overview]
Message-ID: <20100518134639.GA27582@redhat.com> (raw)
In-Reply-To: <4BF25091.3000507@ct.jp.nec.com>

On Tue, May 18 2010 at  4:32am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

> Hi Mike,
> 
> On 05/18/2010 02:27 AM +0900, Mike Snitzer wrote:
> > Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> >> As far as I understand, the current model of device-mapper is:
> >>   - a table (precisely, a target) has various attributes,
> >>     bio-based/request-based is one of such attributes
> >>   - a table and its attributes are bound to the block device on resume
> >> If we want to fix a problem, I think we should either work based on
> >> this model or change the model.
> >>
> >> Your patch makes that loading table affects the block device, so you
> >> are changing the model.
> >>
> >> If you change the model, it should be done carefully.
> >> For example, the current model allows most of the table loading code
> >> to run without exclusive lock on the device because it doesn't affect
> >> the device itself.  If you change this model, table loading needs to
> >> be serialized with appropriate locking.
> > 
> > Nice catch, yes md->queue needs protection (see patch below).
> 
> Not enough. (See drivers/md/dm-ioctl.c:table_load().)
> Table load sequence is:
>   1. populate table
>   2. set the table to ->new_map of the hash_cell for the mapped_device
>      in protection by _hash_lock.
> 
> Since your fix only serializes the step 1, concurrent table loading
> could end up with inconsistent status; e.g. request-based table is
> bound to the mapped_device while the queue is initialized as bio-based.
> With your new model, those 2 steps above must be atomic.

Ah, yes.. I looked at the possibility of serializing the entirety of
table_load but determined that would be too excessive (would reduce
parallelism of table_load).  But I clearly missed the fact that there
could be a race to the _hash_lock protected critical section in
table_load() -- leading to queue inconsistency.

I'll post v5 of the overall patch which will revert the mapped_device
'queue_lock' serialization that I proposed in v4.  v5 will contain
the following patch to localize all table load related queue
manipulation to the _hash_lock protected critical section in
table_load().  So it sets the queue up _after_ the table's type is
established with dm_table_set_type().

Thanks again for your review; I'll work to be more meticulous in the
future!

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index d7500e1..bfb283c 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1180,6 +1180,14 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	r = dm_table_setup_md_queue(t);
+	if (r) {
+		DMWARN("unable to setup device queue for this table");
+		dm_table_destroy(t);
+		up_write(&_hash_lock);
+		goto out;
+	}
+
 	if (hc->new_map)
 		dm_table_destroy(hc->new_map);
 	hc->new_map = t;
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index dec8295..990cf17 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -803,7 +803,6 @@ int dm_table_set_type(struct dm_table *t)
 	if (bio_based) {
 		/* We must use this table as bio-based */
 		t->type = DM_TYPE_BIO_BASED;
-		dm_clear_request_based_queue(t->md);
 		return 0;
 	}
 
@@ -830,11 +829,6 @@ int dm_table_set_type(struct dm_table *t)
 		return -EINVAL;
 	}
 
-	if (!dm_init_request_based_queue(t->md)) {
-		DMWARN("Cannot initialize queue for Request-based dm");
-		return -EINVAL;
-	}
-
 	t->type = DM_TYPE_REQUEST_BASED;
 
 	return 0;
@@ -850,6 +844,23 @@ bool dm_table_request_based(struct dm_table *t)
 	return dm_table_get_type(t) == DM_TYPE_REQUEST_BASED;
 }
 
+/*
+ * Setup the DM device's queue based on table's type.
+ * Caller must serialize access to table's md.
+ */
+int dm_table_setup_md_queue(struct dm_table *t)
+{
+	if (dm_table_request_based(t)) {
+		if (!dm_init_request_based_queue(t->md)) {
+			DMWARN("Cannot initialize queue for Request-based dm");
+			return -EINVAL;
+		}
+	} else
+		dm_clear_request_based_queue(t->md);
+
+	return 0;
+}
+
 int dm_table_alloc_md_mempools(struct dm_table *t)
 {
 	unsigned type = dm_table_get_type(t);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 143082c..76b61ca 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1856,6 +1856,17 @@ static void dm_rq_barrier_work(struct work_struct *work);
 
 static void dm_init_md_queue(struct mapped_device *md)
 {
+	/*
+	 * Request-based dm devices cannot be stacked on top of bio-based dm
+	 * devices.  The type of this dm device has not been decided yet.
+	 * The type is decided at the first table loading time.
+	 * To prevent problematic device stacking, clear the queue flag
+	 * for request stacking support until then.
+	 *
+	 * This queue is new, so no concurrency on the queue_flags.
+	 */
+	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
+
 	md->queue->queuedata = md;
 	md->queue->backing_dev_info.congested_fn = dm_any_congested;
 	md->queue->backing_dev_info.congested_data = md;
@@ -1989,17 +2000,6 @@ int dm_init_request_based_queue(struct mapped_device *md)
 
 	elv_register_queue(md->queue);
 
-	/*
-	 * Request-based dm devices cannot be stacked on top of bio-based dm
-	 * devices.  The type of this dm device has not been decided yet.
-	 * The type is decided at the first table loading time.
-	 * To prevent problematic device stacking, clear the queue flag
-	 * for request stacking support until then.
-	 *
-	 * This queue is new, so no concurrency on the queue_flags.
-	 */
-	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
-
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 	blk_queue_lld_busy(md->queue, dm_lld_busy);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 95aec64..b4ebb11 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -65,6 +65,7 @@ bool dm_table_request_based(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
+int dm_table_setup_md_queue(struct dm_table *t);
 
 /*
  * To check the return value from dm_table_find_target().

  reply	other threads:[~2010-05-18 13:46 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-10 22:55 [RFC PATCH 1/2] block: allow initialization of previously allocated request_queue Mike Snitzer
2010-05-10 22:55 ` Mike Snitzer
2010-05-10 22:55 ` [RFC PATCH 2/2] dm: only initialize full request_queue for request-based device Mike Snitzer
2010-05-10 22:55   ` Mike Snitzer
2010-05-11  4:23   ` Kiyoshi Ueda
2010-05-11 13:15     ` Mike Snitzer
2010-05-12  8:23       ` Kiyoshi Ueda
2010-05-13  3:57         ` Mike Snitzer
2010-05-14  8:06           ` Kiyoshi Ueda
2010-05-14 14:08             ` Mike Snitzer
2010-05-17  9:27               ` Kiyoshi Ueda
2010-05-17 17:27                 ` Mike Snitzer
2010-05-18  8:32                   ` Kiyoshi Ueda
2010-05-18 13:46                     ` Mike Snitzer [this message]
2010-05-18 13:46                       ` Mike Snitzer
2010-05-19  5:57                       ` Kiyoshi Ueda
2010-05-19 12:01                         ` Mike Snitzer
2010-05-19 12:01                           ` Mike Snitzer
2010-05-19 14:39                           ` Mike Snitzer
2010-05-19 14:45                             ` Mike Snitzer
2010-05-20 11:21                             ` Kiyoshi Ueda
2010-05-20 17:07                               ` Mike Snitzer
2010-05-21  8:32                                 ` Kiyoshi Ueda
2010-05-21 13:34                                   ` Mike Snitzer
2010-05-24  9:58                                     ` Kiyoshi Ueda
2010-05-19 21:51                           ` Mike Snitzer
2010-05-13  4:31   ` [PATCH 2/2 v2] " Mike Snitzer
2010-05-13  5:02     ` [RFC PATCH 3/2] dm: bio-based device must not register elevator in sysfs Mike Snitzer
2010-05-13 22:14       ` [PATCH 3/2 v2] " Mike Snitzer
2010-05-11  6:55 ` [RFC PATCH 1/2] block: allow initialization of previously allocated request_queue Jens Axboe
2010-05-11 13:18   ` Mike Snitzer
2010-05-11 13:21     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100518134639.GA27582@redhat.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=jens.axboe@oracle.com \
    --cc=k-ueda@ct.jp.nec.com \
    --cc=knikanth@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.