* [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization
@ 2010-05-23 21:44 Mike Snitzer
2010-05-23 21:44 ` [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM Mike Snitzer
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:44 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
This work has expanded in scope somewhat (based on Mikulas' suggestion
that I pursue more constrained table/device type switching to avoid
Kiyoshi's locking concerns). A mapped_device now has a specific type
(md->type) that is managed in table_{load,clear} (see patch 3/6).
It should be noted that patch 4/6 is labeled "v8". I still believe
v7's locking strategy is _not_ prone to problematic deadlock, as I
detailed/questioned here: http://lkml.org/lkml/2010/5/21/175
v7 is still available for viewing here:
https://patchwork.kernel.org/patch/101270/
But this new series eliminates v7's locking between table_load() and
do_resume() -- fixed md->type made this possible. So these changes
may be more desirable overall (adds some clearer exclusion and state
transitions that I feel help DM without being too restrictive).
Please see the individual patch headers for more details.
A monolithic patch is also available here:
http://people.redhat.com/msnitzer/patches/dm-queue-init/monolithic.patch
I welcome all comments/review. Hopefully others will find that what
I've done is worthwhile.
Mike Snitzer (6):
block: Adjust elv_iosched_show to return "none" for bio-based DM
dm ioctl: interlock resume and table clear
dm: prevent table type changes after initial table load
dm: only initialize full request_queue for request-based device
dm ioctl: introduce dm_get_verified_mdptr
dm ioctl: introduce find_device_noinit
block/elevator.c | 2 +-
drivers/md/dm-ioctl.c | 148 +++++++++++++++++++++++++++------
drivers/md/dm.c | 216 +++++++++++++++++++++++++++++++++++++++++-------
drivers/md/dm.h | 12 +++
4 files changed, 318 insertions(+), 60 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
@ 2010-05-23 21:44 ` Mike Snitzer
2010-05-24 7:06 ` Jens Axboe
2010-05-23 21:44 ` [PATCH 2/6] dm ioctl: interlock resume and table clear Mike Snitzer
` (5 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:44 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon, Jens Axboe
Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).
Longer-term DM will not allocate an elevator for bio-based DM. But even
then there will be small potential for an elevator to be allocated for
a request-based DM table only to have a bio-based table be loaded in the
end.
Displaying "none" for bio-based DM will help avoid user confusion.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
block/elevator.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6-block/block/elevator.c
===================================================================
--- linux-2.6-block.orig/block/elevator.c
+++ linux-2.6-block/block/elevator.c
@@ -1097,7 +1097,7 @@ ssize_t elv_iosched_show(struct request_
struct elevator_type *__e;
int len = 0;
- if (!q->elevator)
+ if (!q->elevator || !blk_queue_stackable(q))
return sprintf(name, "none\n");
elv = e->elevator_type;
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/6] dm ioctl: interlock resume and table clear
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
2010-05-23 21:44 ` [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM Mike Snitzer
@ 2010-05-23 21:44 ` Mike Snitzer
2010-05-24 17:08 ` Mike Snitzer
2010-05-23 21:44 ` [PATCH 3/6] dm: prevent table type changes after initial table load Mike Snitzer
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:44 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
When resuming an inactive table there is a lack of testable state during
its transition to being live (hc->new_map is NULL and md->map isn't
set). Interlock allows table_clear() to _know_ there isn't a live table
yet.
This interlock also has the side-effect of serializing N resumes to the
_same_ mapped_device (more coarsely than the suspend_lock provides).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 34 +++++++++++++++++++++++++++++++---
drivers/md/dm.c | 17 +++++++++++++++++
drivers/md/dm.h | 3 +++
3 files changed, 51 insertions(+), 3 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -871,6 +871,17 @@ static int do_resume(struct dm_ioctl *pa
}
md = hc->md;
+ up_write(&_hash_lock);
+
+ dm_lock_resume(md);
+
+ down_write(&_hash_lock);
+ if (!hc || hc->md != md) {
+ DMWARN("device has been removed from the dev hash table.");
+ up_write(&_hash_lock);
+ r = -ENXIO;
+ goto out;
+ }
new_map = hc->new_map;
hc->new_map = NULL;
@@ -891,8 +902,8 @@ static int do_resume(struct dm_ioctl *pa
old_map = dm_swap_table(md, new_map);
if (IS_ERR(old_map)) {
dm_table_destroy(new_map);
- dm_put(md);
- return PTR_ERR(old_map);
+ r = PTR_ERR(old_map);
+ goto out;
}
if (dm_table_get_mode(new_map) & FMODE_WRITE)
@@ -913,6 +924,8 @@ static int do_resume(struct dm_ioctl *pa
if (!r)
r = __dev_status(md, param);
+out:
+ dm_unlock_resume(md);
dm_put(md);
return r;
}
@@ -1209,6 +1222,20 @@ static int table_clear(struct dm_ioctl *
return -ENXIO;
}
+ md = hc->md;
+ up_write(&_hash_lock);
+
+ dm_lock_resume(md);
+
+ down_write(&_hash_lock);
+ hc = dm_get_mdptr(md);
+ if (!hc || hc->md != md) {
+ DMWARN("device has been removed from the dev hash table.");
+ up_write(&_hash_lock);
+ r = -ENXIO;
+ goto out;
+ }
+
if (hc->new_map) {
dm_table_destroy(hc->new_map);
hc->new_map = NULL;
@@ -1217,8 +1244,9 @@ static int table_clear(struct dm_ioctl *
param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
r = __dev_status(hc->md, param);
- md = hc->md;
up_write(&_hash_lock);
+out:
+ dm_unlock_resume(md);
dm_put(md);
return r;
}
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -116,6 +116,12 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
struct mapped_device {
struct rw_semaphore io_lock;
struct mutex suspend_lock;
+ /*
+ * Resuming inactive table lacks testable state during its
+ * transition to being live. Interlock allows other operations
+ * (e.g. table_clear) to _know_ there isn't a live table yet.
+ */
+ struct mutex resume_lock;
rwlock_t map_lock;
atomic_t holders;
atomic_t open_count;
@@ -1876,6 +1882,7 @@ static struct mapped_device *alloc_dev(i
init_rwsem(&md->io_lock);
mutex_init(&md->suspend_lock);
+ mutex_init(&md->resume_lock);
spin_lock_init(&md->deferred_lock);
spin_lock_init(&md->barrier_error_lock);
rwlock_init(&md->map_lock);
@@ -1997,6 +2004,16 @@ static void free_dev(struct mapped_devic
kfree(md);
}
+void dm_lock_resume(struct mapped_device *md)
+{
+ mutex_lock(&md->resume_lock);
+}
+
+void dm_unlock_resume(struct mapped_device *md)
+{
+ mutex_unlock(&md->resume_lock);
+}
+
static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
{
struct dm_md_mempools *p;
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -66,6 +66,9 @@ int dm_table_alloc_md_mempools(struct dm
void dm_table_free_md_mempools(struct dm_table *t);
struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
+void dm_lock_resume(struct mapped_device *md);
+void dm_unlock_resume(struct mapped_device *md);
+
/*
* To check the return value from dm_table_find_target().
*/
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/6] dm: prevent table type changes after initial table load
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
2010-05-23 21:44 ` [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM Mike Snitzer
2010-05-23 21:44 ` [PATCH 2/6] dm ioctl: interlock resume and table clear Mike Snitzer
@ 2010-05-23 21:44 ` Mike Snitzer
2010-05-23 21:44 ` [PATCH 4/6 "v8"] dm: only initialize full request_queue for request-based device Mike Snitzer
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:44 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
Before table_load() stages an inactive table atomically check if its
type conflicts with a previously established device type (md->type).
Introduce 'type_lock' in mapped_device structure and use it to protect
md->type access. table_load() sets md->type without concern for:
- another table_load() racing to set conflicting md->type.
- do_resume() making a conflicting table live.
Allowed transitions:
UNKNOWN_MD_TYPE => BIO_BASED_MD_TYPE
UNKNOWN_MD_TYPE => REQUEST_BASED_MD_TYPE
This prevents a table reload from switching the inactive table directly
to a conflicting type (even if a table hasn't yet been made live via
resume).
Switching table type is still possible, before resume, if an explicit
table clear is performed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 43 ++++++++++++++++++++++++++++
drivers/md/dm.c | 76 +++++++++++++++++++++++++++++++++++++++++++++-----
drivers/md/dm.h | 7 ++++
3 files changed, 119 insertions(+), 7 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1153,6 +1153,7 @@ static int table_load(struct dm_ioctl *p
struct hash_cell *hc;
struct dm_table *t;
struct mapped_device *md;
+ int initial_table_load = 0;
md = find_device(param);
if (!md)
@@ -1183,12 +1184,43 @@ static int table_load(struct dm_ioctl *p
goto out;
}
+ /*
+ * Protect md->type against concurrent table loads.
+ * Locking strategy:
+ * + Leverage fact that md's type cannot change after initial table load.
+ * - Only protect type in table_load() -- not in do_resume().
+ *
+ * + Protect type while working to stage an inactive table:
+ * - check if table's type conflicts with md->type
+ * (holding: md->type_lock)
+ * - stage inactive table (hc->new_map)
+ * (holding: md->type_lock + _hash_lock)
+ */
+ dm_lock_md_type(md);
+
+ if (dm_unknown_md_type(md)) {
+ /* set md's type based on table's type */
+ dm_set_md_type(md, t);
+ /* note initial_table_load to clear md's type on error */
+ initial_table_load = 1;
+ } else if (!dm_md_type_matches_table(md, t)) {
+ DMWARN("can't change device type after initial table load.");
+ dm_table_destroy(t);
+ dm_unlock_md_type(md);
+ r = -EINVAL;
+ goto out;
+ }
+
+ /* stage inactive table */
down_write(&_hash_lock);
hc = dm_get_mdptr(md);
if (!hc || hc->md != md) {
DMWARN("device has been removed from the dev hash table.");
dm_table_destroy(t);
up_write(&_hash_lock);
+ if (initial_table_load)
+ dm_clear_md_type(md);
+ dm_unlock_md_type(md);
r = -ENXIO;
goto out;
}
@@ -1198,6 +1230,8 @@ static int table_load(struct dm_ioctl *p
hc->new_map = t;
up_write(&_hash_lock);
+ dm_unlock_md_type(md);
+
param->flags |= DM_INACTIVE_PRESENT_FLAG;
r = __dev_status(md, param);
@@ -1212,6 +1246,7 @@ static int table_clear(struct dm_ioctl *
int r;
struct hash_cell *hc;
struct mapped_device *md;
+ struct dm_table *live_table;
down_write(&_hash_lock);
@@ -1226,6 +1261,7 @@ static int table_clear(struct dm_ioctl *
up_write(&_hash_lock);
dm_lock_resume(md);
+ dm_lock_md_type(md); /* May need to clear md's type */
down_write(&_hash_lock);
hc = dm_get_mdptr(md);
@@ -1236,6 +1272,12 @@ static int table_clear(struct dm_ioctl *
goto out;
}
+ /* Clear md's type if there is no live table */
+ live_table = dm_get_live_table(md);
+ if (!live_table)
+ dm_clear_md_type(md);
+ dm_table_put(live_table);
+
if (hc->new_map) {
dm_table_destroy(hc->new_map);
hc->new_map = NULL;
@@ -1246,6 +1288,7 @@ static int table_clear(struct dm_ioctl *
r = __dev_status(hc->md, param);
up_write(&_hash_lock);
out:
+ dm_unlock_md_type(md);
dm_unlock_resume(md);
dm_put(md);
return r;
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -111,6 +111,15 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
#define DMF_QUEUE_IO_TO_THREAD 6
/*
+ * Type for md->type field.
+ */
+enum mapped_device_type {
+ UNKNOWN_MD_TYPE,
+ BIO_BASED_MD_TYPE,
+ REQUEST_BASED_MD_TYPE,
+};
+
+/*
* Work processed by per-device workqueue.
*/
struct mapped_device {
@@ -129,6 +138,12 @@ struct mapped_device {
unsigned long flags;
struct request_queue *queue;
+ enum mapped_device_type type;
+ /*
+ * Protect type from concurrent access.
+ */
+ struct mutex type_lock;
+
struct gendisk *disk;
char name[16];
@@ -1880,9 +1895,11 @@ static struct mapped_device *alloc_dev(i
if (r < 0)
goto bad_minor;
+ md->type = UNKNOWN_MD_TYPE;
init_rwsem(&md->io_lock);
mutex_init(&md->suspend_lock);
mutex_init(&md->resume_lock);
+ mutex_init(&md->type_lock);
spin_lock_init(&md->deferred_lock);
spin_lock_init(&md->barrier_error_lock);
rwlock_init(&md->map_lock);
@@ -2145,6 +2162,58 @@ int dm_create(int minor, struct mapped_d
return 0;
}
+/*
+ * Functions to manage md->type.
+ * All are required to hold md->type_lock.
+ */
+void dm_lock_md_type(struct mapped_device *md)
+{
+ mutex_lock(&md->type_lock);
+}
+
+void dm_unlock_md_type(struct mapped_device *md)
+{
+ mutex_unlock(&md->type_lock);
+}
+
+void dm_set_md_type(struct mapped_device *md, struct dm_table* t)
+{
+ if (dm_table_request_based(t))
+ md->type = REQUEST_BASED_MD_TYPE;
+ else
+ md->type = BIO_BASED_MD_TYPE;
+}
+
+void dm_clear_md_type(struct mapped_device *md)
+{
+ md->type = UNKNOWN_MD_TYPE;
+}
+
+bool dm_unknown_md_type(struct mapped_device *md)
+{
+ return md->type == UNKNOWN_MD_TYPE;
+}
+
+static bool dm_bio_based_md_type(struct mapped_device *md)
+{
+ return md->type == BIO_BASED_MD_TYPE;
+}
+
+static bool dm_request_based_md_type(struct mapped_device *md)
+{
+ return md->type == REQUEST_BASED_MD_TYPE;
+}
+
+bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t)
+{
+ if (dm_request_based_md_type(md))
+ return dm_table_request_based(t);
+ else if (dm_bio_based_md_type(md))
+ return !dm_table_request_based(t);
+
+ return 0;
+}
+
static struct mapped_device *dm_find_md(dev_t dev)
{
struct mapped_device *md;
@@ -2420,13 +2489,6 @@ struct dm_table *dm_swap_table(struct ma
goto out;
}
- /* cannot change the device type, once a table is bound */
- if (md->map &&
- (dm_table_get_type(md->map) != dm_table_get_type(table))) {
- DMWARN("can't change the device type after a table is bound");
- goto out;
- }
-
map = __bind(md, table, &limits);
out:
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -66,6 +66,13 @@ int dm_table_alloc_md_mempools(struct dm
void dm_table_free_md_mempools(struct dm_table *t);
struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
+void dm_lock_md_type(struct mapped_device *md);
+void dm_unlock_md_type(struct mapped_device *md);
+void dm_set_md_type(struct mapped_device *md, struct dm_table* t);
+void dm_clear_md_type(struct mapped_device *md);
+bool dm_unknown_md_type(struct mapped_device *md);
+bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t);
+
void dm_lock_resume(struct mapped_device *md);
void dm_unlock_resume(struct mapped_device *md);
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 4/6 "v8"] dm: only initialize full request_queue for request-based device
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
` (2 preceding siblings ...)
2010-05-23 21:44 ` [PATCH 3/6] dm: prevent table type changes after initial table load Mike Snitzer
@ 2010-05-23 21:44 ` Mike Snitzer
2010-05-23 21:45 ` [PATCH 5/6] dm ioctl: introduce dm_get_verified_mdptr Mike Snitzer
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:44 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
Allocate a minimalist request_queue structure initially (needed for both
bio and request-based DM). A bio-based DM device no longer defaults to
having a fully initialized request_queue (request_fn, elevator, etc).
So bio-based DM devices no longer register elevator sysfs attributes
('iosched/' tree or 'scheduler' other than "none").
Initialization of a full request_queue (request_fn, elevator, etc) is
deferred until it is known that the DM device is request-based -- at the
end of the table load sequence.
Factor DM device's request_queue initialization:
- common to both request-based and bio-based into dm_init_md_queue().
- specific to request-based into dm_init_request_based_queue().
md->type_lock is also used to protect md->queue during table_load().
md->queue is setup without concern for:
- another table_load() racing to setup conflicting queue state.
- do_resume() making a conflicting table live.
NOTE: It is still possible, albeit unlikely, for a bio-based device to
have a full request_queue. But in this case the unused elevator will
not be registered with sysfs. A table switch from request-based to
bio-based would be required, e.g.:
# dmsetup create --notable bio-based
# echo "0 100 multipath ..." | dmsetup load bio-based
# dmsetup clear bio-based
# echo "0 100 linear ..." | dmsetup load bio-based
# dmsetup resume bio-based
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 17 ++++++
drivers/md/dm.c | 125 +++++++++++++++++++++++++++++++++++++++-----------
drivers/md/dm.h | 2
3 files changed, 116 insertions(+), 28 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1185,14 +1185,16 @@ static int table_load(struct dm_ioctl *p
}
/*
- * Protect md->type against concurrent table loads.
+ * Protect md->type and md->queue against concurrent table loads.
* Locking strategy:
* + Leverage fact that md's type cannot change after initial table load.
* - Only protect type in table_load() -- not in do_resume().
*
- * + Protect type while working to stage an inactive table:
+ * + Protect type and queue while working to stage an inactive table:
* - check if table's type conflicts with md->type
* (holding: md->type_lock)
+ * - setup md->queue based on md->type
+ * (holding: md->type_lock)
* - stage inactive table (hc->new_map)
* (holding: md->type_lock + _hash_lock)
*/
@@ -1211,6 +1213,17 @@ static int table_load(struct dm_ioctl *p
goto out;
}
+ /* setup md->queue to reflect md's and table's type (may block) */
+ r = dm_setup_md_queue(md);
+ if (r) {
+ DMWARN("unable to setup device queue for this table.");
+ dm_table_destroy(t);
+ if (initial_table_load)
+ dm_clear_md_type(md);
+ dm_unlock_md_type(md);
+ goto out;
+ }
+
/* stage inactive table */
down_write(&_hash_lock);
hc = dm_get_mdptr(md);
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c
+++ linux-2.6/drivers/md/dm.c
@@ -140,7 +140,7 @@ struct mapped_device {
struct request_queue *queue;
enum mapped_device_type type;
/*
- * Protect type from concurrent access.
+ * Protect queue and type from concurrent access.
*/
struct mutex type_lock;
@@ -1870,6 +1870,28 @@ static const struct block_device_operati
static void dm_wq_work(struct work_struct *work);
static void dm_rq_barrier_work(struct work_struct *work);
+static void dm_init_md_queue(struct mapped_device *md)
+{
+ /*
+ * Request-based dm devices cannot be stacked on top of bio-based dm
+ * devices. The type of this dm device has not been decided yet.
+ * The type is decided at the first table loading time.
+ * To prevent problematic device stacking, clear the queue flag
+ * for request stacking support until then.
+ *
+ * This queue is new, so no concurrency on the queue_flags.
+ */
+ queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
+
+ md->queue->queuedata = md;
+ md->queue->backing_dev_info.congested_fn = dm_any_congested;
+ md->queue->backing_dev_info.congested_data = md;
+ blk_queue_make_request(md->queue, dm_request);
+ blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
+ md->queue->unplug_fn = dm_unplug_all;
+ blk_queue_merge_bvec(md->queue, dm_merge_bvec);
+}
+
/*
* Allocate and initialise a blank device with a given minor.
*/
@@ -1910,34 +1932,11 @@ static struct mapped_device *alloc_dev(i
INIT_LIST_HEAD(&md->uevent_list);
spin_lock_init(&md->uevent_lock);
- md->queue = blk_init_queue(dm_request_fn, NULL);
+ md->queue = blk_alloc_queue(GFP_KERNEL);
if (!md->queue)
goto bad_queue;
- /*
- * Request-based dm devices cannot be stacked on top of bio-based dm
- * devices. The type of this dm device has not been decided yet,
- * although we initialized the queue using blk_init_queue().
- * The type is decided at the first table loading time.
- * To prevent problematic device stacking, clear the queue flag
- * for request stacking support until then.
- *
- * This queue is new, so no concurrency on the queue_flags.
- */
- queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
- md->saved_make_request_fn = md->queue->make_request_fn;
- md->queue->queuedata = md;
- md->queue->backing_dev_info.congested_fn = dm_any_congested;
- md->queue->backing_dev_info.congested_data = md;
- blk_queue_make_request(md->queue, dm_request);
- blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
- md->queue->unplug_fn = dm_unplug_all;
- blk_queue_merge_bvec(md->queue, dm_merge_bvec);
- blk_queue_softirq_done(md->queue, dm_softirq_done);
- blk_queue_prep_rq(md->queue, dm_prep_fn);
- blk_queue_lld_busy(md->queue, dm_lld_busy);
- blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
- dm_rq_prepare_flush);
+ dm_init_md_queue(md);
md->disk = alloc_disk(1);
if (!md->disk)
@@ -2214,6 +2213,80 @@ bool dm_md_type_matches_table(struct map
return 0;
}
+/*
+ * Functions to manage md->queue.
+ * All are required to hold md->type_lock.
+ */
+static bool dm_bio_based_md_queue(struct mapped_device *md)
+{
+ return (md->queue->request_fn) ? 0 : 1;
+}
+
+/*
+ * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
+ */
+static int dm_init_request_based_queue(struct mapped_device *md)
+{
+ struct request_queue *q = NULL;
+
+ /* Avoid re-initializing the queue if already fully initialized */
+ if (!md->queue->elevator) {
+ /* Fully initialize the queue */
+ q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
+ if (!q)
+ return 0;
+ md->queue = q;
+ md->saved_make_request_fn = md->queue->make_request_fn;
+ dm_init_md_queue(md);
+ blk_queue_softirq_done(md->queue, dm_softirq_done);
+ blk_queue_prep_rq(md->queue, dm_prep_fn);
+ blk_queue_lld_busy(md->queue, dm_lld_busy);
+ blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+ dm_rq_prepare_flush);
+ } else if (dm_bio_based_md_queue(md)) {
+ /*
+ * Queue was fully initialized on behalf of a previous
+ * request-based table load. Table is now switching from
+ * bio-based back to request-based, e.g.: rq -> bio -> rq
+ */
+ md->queue->request_fn = dm_request_fn;
+ } else
+ return 1; /* queue already request-based */
+
+ elv_register_queue(md->queue);
+
+ return 1;
+}
+
+static void dm_clear_request_based_queue(struct mapped_device *md)
+{
+ if (dm_bio_based_md_queue(md))
+ return; /* queue already bio-based */
+
+ /* Unregister elevator from sysfs and clear ->request_fn */
+ elv_unregister_queue(md->queue);
+ md->queue->request_fn = NULL;
+}
+
+/*
+ * Setup the DM device's queue based on md's type
+ */
+int dm_setup_md_queue(struct mapped_device *md)
+{
+ BUG_ON(!mutex_is_locked(&md->type_lock));
+ BUG_ON(dm_unknown_md_type(md));
+
+ if (dm_request_based_md_type(md)) {
+ if (!dm_init_request_based_queue(md)) {
+ DMWARN("Cannot initialize queue for Request-based dm");
+ return -EINVAL;
+ }
+ } else if (dm_bio_based_md_type(md))
+ dm_clear_request_based_queue(md);
+
+ return 0;
+}
+
static struct mapped_device *dm_find_md(dev_t dev)
{
struct mapped_device *md;
Index: linux-2.6/drivers/md/dm.h
===================================================================
--- linux-2.6.orig/drivers/md/dm.h
+++ linux-2.6/drivers/md/dm.h
@@ -73,6 +73,8 @@ void dm_clear_md_type(struct mapped_devi
bool dm_unknown_md_type(struct mapped_device *md);
bool dm_md_type_matches_table(struct mapped_device *md, struct dm_table* t);
+int dm_setup_md_queue(struct mapped_device *md);
+
void dm_lock_resume(struct mapped_device *md);
void dm_unlock_resume(struct mapped_device *md);
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 5/6] dm ioctl: introduce dm_get_verified_mdptr
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
` (3 preceding siblings ...)
2010-05-23 21:44 ` [PATCH 4/6 "v8"] dm: only initialize full request_queue for request-based device Mike Snitzer
@ 2010-05-23 21:45 ` Mike Snitzer
2010-05-23 21:45 ` [PATCH 6/6] dm ioctl: introduce find_device_noinit Mike Snitzer
2010-05-24 10:00 ` [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Kiyoshi Ueda
6 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:45 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
Factor all appropriate consumers of dm_get_mdptr() to use the
dm_get_verified_mdptr() rather than duplicate code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 37 +++++++++++++++++++++++--------------
1 file changed, 23 insertions(+), 14 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -174,6 +174,19 @@ static void free_cell(struct hash_cell *
}
}
+static struct hash_cell *dm_get_verified_mdptr(struct mapped_device *md)
+{
+ struct hash_cell *hc;
+
+ hc = dm_get_mdptr(md);
+ if (!hc || hc->md != md) {
+ DMWARN("device has been removed from the dev hash table.");
+ return NULL;
+ }
+
+ return hc;
+}
+
/*
* The kdev_t and uuid of a device can never change once it is
* initially inserted.
@@ -546,11 +559,9 @@ static struct dm_table *dm_get_inactive_
struct dm_table *table = NULL;
down_read(&_hash_lock);
- hc = dm_get_mdptr(md);
- if (!hc || hc->md != md) {
- DMWARN("device has been removed from the dev hash table.");
+ hc = dm_get_verified_mdptr(md);
+ if (!hc)
goto out;
- }
table = hc->new_map;
if (table)
@@ -876,8 +887,8 @@ static int do_resume(struct dm_ioctl *pa
dm_lock_resume(md);
down_write(&_hash_lock);
- if (!hc || hc->md != md) {
- DMWARN("device has been removed from the dev hash table.");
+ hc = dm_get_verified_mdptr(md);
+ if (!hc) {
up_write(&_hash_lock);
r = -ENXIO;
goto out;
@@ -1226,9 +1237,8 @@ static int table_load(struct dm_ioctl *p
/* stage inactive table */
down_write(&_hash_lock);
- hc = dm_get_mdptr(md);
- if (!hc || hc->md != md) {
- DMWARN("device has been removed from the dev hash table.");
+ hc = dm_get_verified_mdptr(md);
+ if (!hc) {
dm_table_destroy(t);
up_write(&_hash_lock);
if (initial_table_load)
@@ -1277,9 +1287,8 @@ static int table_clear(struct dm_ioctl *
dm_lock_md_type(md); /* May need to clear md's type */
down_write(&_hash_lock);
- hc = dm_get_mdptr(md);
- if (!hc || hc->md != md) {
- DMWARN("device has been removed from the dev hash table.");
+ hc = dm_get_verified_mdptr(md);
+ if (!hc) {
up_write(&_hash_lock);
r = -ENXIO;
goto out;
@@ -1736,8 +1745,8 @@ int dm_copy_name_and_uuid(struct mapped_
return -ENXIO;
mutex_lock(&dm_hash_cells_mutex);
- hc = dm_get_mdptr(md);
- if (!hc || hc->md != md) {
+ hc = dm_get_verified_mdptr(md);
+ if (!hc) {
r = -ENXIO;
goto out;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 6/6] dm ioctl: introduce find_device_noinit
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
` (4 preceding siblings ...)
2010-05-23 21:45 ` [PATCH 5/6] dm ioctl: introduce dm_get_verified_mdptr Mike Snitzer
@ 2010-05-23 21:45 ` Mike Snitzer
2010-05-24 10:00 ` [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Kiyoshi Ueda
6 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-23 21:45 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
Factor all appropriate consumers of __find_device_hash_cell() to use
find_device_noinit().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-ioctl.c | 51 +++++++++++++++++++++++++-------------------------
1 file changed, 26 insertions(+), 25 deletions(-)
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -718,23 +718,40 @@ static struct mapped_device *find_device
return md;
}
+static struct mapped_device *find_device_noinit(struct dm_ioctl *param)
+{
+ struct hash_cell *hc;
+ struct mapped_device *md = NULL;
+
+ down_write(&_hash_lock);
+ hc = __find_device_hash_cell(param);
+ if (hc)
+ md = hc->md;
+ else
+ DMWARN("device doesn't appear to be in the dev hash table.");
+ up_write(&_hash_lock);
+
+ return md;
+}
+
static int dev_remove(struct dm_ioctl *param, size_t param_size)
{
struct hash_cell *hc;
struct mapped_device *md;
int r;
- down_write(&_hash_lock);
- hc = __find_device_hash_cell(param);
+ md = find_device_noinit(param);
+ if (!md)
+ return -ENXIO;
+ down_write(&_hash_lock);
+ hc = dm_get_verified_mdptr(md);
if (!hc) {
- DMWARN("device doesn't appear to be in the dev hash table.");
up_write(&_hash_lock);
+ dm_put(md);
return -ENXIO;
}
- md = hc->md;
-
/*
* Ensure the device is not open and nothing further can open it.
*/
@@ -872,17 +889,9 @@ static int do_resume(struct dm_ioctl *pa
struct mapped_device *md;
struct dm_table *new_map, *old_map = NULL;
- down_write(&_hash_lock);
-
- hc = __find_device_hash_cell(param);
- if (!hc) {
- DMWARN("device doesn't appear to be in the dev hash table.");
- up_write(&_hash_lock);
+ md = find_device_noinit(param);
+ if (!md)
return -ENXIO;
- }
-
- md = hc->md;
- up_write(&_hash_lock);
dm_lock_resume(md);
@@ -1271,17 +1280,9 @@ static int table_clear(struct dm_ioctl *
struct mapped_device *md;
struct dm_table *live_table;
- down_write(&_hash_lock);
-
- hc = __find_device_hash_cell(param);
- if (!hc) {
- DMWARN("device doesn't appear to be in the dev hash table.");
- up_write(&_hash_lock);
+ md = find_device_noinit(param);
+ if (!md)
return -ENXIO;
- }
-
- md = hc->md;
- up_write(&_hash_lock);
dm_lock_resume(md);
dm_lock_md_type(md); /* May need to clear md's type */
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM
2010-05-23 21:44 ` [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM Mike Snitzer
@ 2010-05-24 7:06 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2010-05-24 7:06 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Kiyoshi Ueda, dm-devel, Alasdair Kergon
On Sun, May 23 2010, Mike Snitzer wrote:
> Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).
>
> Longer-term DM will not allocate an elevator for bio-based DM. But even
> then there will be small potential for an elevator to be allocated for
> a request-based DM table only to have a bio-based table be loaded in the
> end.
>
> Displaying "none" for bio-based DM will help avoid user confusion.
Thanks, applied for 2.6.35.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
` (5 preceding siblings ...)
2010-05-23 21:45 ` [PATCH 6/6] dm ioctl: introduce find_device_noinit Mike Snitzer
@ 2010-05-24 10:00 ` Kiyoshi Ueda
2010-05-24 17:24 ` Mike Snitzer
6 siblings, 1 reply; 11+ messages in thread
From: Kiyoshi Ueda @ 2010-05-24 10:00 UTC (permalink / raw)
To: Mike Snitzer; +Cc: dm-devel, Alasdair Kergon
Hi Mike,
On 05/24/2010 06:44 AM +0900, Mike Snitzer wrote:
> It should be noted that patch 4/6 is labeled "v8". I still believe
> v7's locking strategy is _not_ prone to problematic deadlock, as I
> detailed/questioned here: http://lkml.org/lkml/2010/5/21/175
> v7 is still available for viewing here:
> https://patchwork.kernel.org/patch/101270/
As I replied to another thread, the problematic deadlock is possible.
> But this new series eliminates v7's locking between table_load() and
> do_resume() -- fixed md->type made this possible. So these changes
> may be more desirable overall (adds some clearer exclusion and state
> transitions that I feel help DM without being too restrictive).
Yes, I think it's reasonable.
> This work has expanded in scope somewhat (based on Mikulas' suggestion
> that I pursue more constrained table/device type switching to avoid
> Kiyoshi's locking concerns). A mapped_device now has a specific type
> (md->type) that is managed in table_{load,clear} (see patch 3/6).
Cancelling the type by table_clear() keeps the code/model complex
even after changing model.
I think the feature to cancel the type is not required any userspace
tools nor admins at least for now. So dropping the feature and
completely fixing the type at the first table loading time may be
a good meeting point to make the kernel code simple.
I had only a quick look, so I may find some more comments. But I'd
like to have more review after we reach an agreement about the basic
implementation design above.
Thanks,
Kiyoshi Ueda
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/6] dm ioctl: interlock resume and table clear
2010-05-23 21:44 ` [PATCH 2/6] dm ioctl: interlock resume and table clear Mike Snitzer
@ 2010-05-24 17:08 ` Mike Snitzer
0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-24 17:08 UTC (permalink / raw)
To: dm-devel; +Cc: Kiyoshi Ueda, Alasdair Kergon
On Sun, May 23 2010 at 5:44pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> When resuming an inactive table there is a lack of testable state during
> its transition to being live (hc->new_map is NULL and md->map isn't
> set). Interlock allows table_clear() to _know_ there isn't a live table
> yet.
>
> This interlock also has the side-effect of serializing N resumes to the
> _same_ mapped_device (more coarsely than the suspend_lock provides).
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/md/dm-ioctl.c | 34 +++++++++++++++++++++++++++++++---
> drivers/md/dm.c | 17 +++++++++++++++++
> drivers/md/dm.h | 3 +++
> 3 files changed, 51 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -871,6 +871,17 @@ static int do_resume(struct dm_ioctl *pa
> }
>
> md = hc->md;
> + up_write(&_hash_lock);
> +
> + dm_lock_resume(md);
> +
> + down_write(&_hash_lock);
I'm missing the 'hc = dm_get_mdptr(md);' here. Patch 5/6 fixed this up
so that is why I didn't notice this in testing.
> + if (!hc || hc->md != md) {
> + DMWARN("device has been removed from the dev hash table.");
> + up_write(&_hash_lock);
> + r = -ENXIO;
> + goto out;
> + }
>
> new_map = hc->new_map;
> hc->new_map = NULL;
I'll hold off on reposting the patches just for the above fix though.
Especially given that we still need to answer the following 2 questions
(we also need to answer the more basic question that Kiyoshi asked in
response to patch 0/6; the answer to that may trump the following):
1)
What benefit is there with having do_resume() be reentrant for the
_same_ mapped_device?
I'm asserting that there is no benefit. We likely just got that
reentrancy "for free" by needing do_resume() to be reentrant for
_different_ mapped_device(s).
2)
The more important, but related, question:
How can we reliably _know_ that a DM device has a live table?
**see my p.s. footnote below**
If we can't reliably test for whether a DM device has a live table then
we can't impose the new table_load() constraint (as implemented in patch
3/6): A DM device must only ever have a single immutable type once an
"inactive" table is staged to become "live".
I'm looking to solve this #2 with patch 2/6 (but it fixes #1 in the
process).
I'm open to other suggestions for how to implement a fix for #2 (I
believe any solution for #2 will "fix" #1 as a side-effect).
Mike
p.s.
We currently have a state diagram gap where we have no way to know that
an "inactive" table (hc->new_map) is transitioning to be "live"
(md->map).
We haven't had a need for precise understanding of the state transitions
a DM device has while making an "inactive" table "live":
* do_resume() destages hc->new_map (and sets it to NULL) under _hash_lock
* "inactive" table is becoming "live", but AFAIK we have nothing to test
that will tell us this.
* do_resume()'s dm_swap_table() will later set md->map under _hash_lock
I explored an alternative implementation for solving #2 but it was
significantly more complex:
1) still have do_resume() not be reentrant for the _same_ mapped_device
- AFAIK, still requires a 'md->resume_lock'
2) introduce DMF_RESUMING flag (for md->flags) and set/clear/test it
while holding 'md->resume_lock'
3) do_resume() sets and then clears DMF_RESUMING while holding
'md->resume_lock'
4) table_clear() tests for DMF_RESUMING (if !hc->new_map && !md->map)
while holding 'md->resume_lock'.
So in the end I dispensed with all the DMF_RESUMING business and just
kept the do_resume() and table_clear() interlock as it fixed #2 (and
#1).
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization
2010-05-24 10:00 ` [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Kiyoshi Ueda
@ 2010-05-24 17:24 ` Mike Snitzer
0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-24 17:24 UTC (permalink / raw)
To: Kiyoshi Ueda; +Cc: dm-devel, Alasdair Kergon
On Mon, May 24 2010 at 6:00am -0400,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> Hi Mike,
>
> On 05/24/2010 06:44 AM +0900, Mike Snitzer wrote:
> > But this new series eliminates v7's locking between table_load() and
> > do_resume() -- fixed md->type made this possible. So these changes
> > may be more desirable overall (adds some clearer exclusion and state
> > transitions that I feel help DM without being too restrictive).
>
> Yes, I think it's reasonable.
OK.
> > This work has expanded in scope somewhat (based on Mikulas' suggestion
> > that I pursue more constrained table/device type switching to avoid
> > Kiyoshi's locking concerns). A mapped_device now has a specific type
> > (md->type) that is managed in table_{load,clear} (see patch 3/6).
>
> Cancelling the type by table_clear() keeps the code/model complex
> even after changing model.
> I think the feature to cancel the type is not required any userspace
> tools nor admins at least for now. So dropping the feature and
> completely fixing the type at the first table loading time may be
> a good meeting point to make the kernel code simple.
I initially thought it best to keep table_clear() more "friendly".. so
as not to impose a user explicitly delete a DM device just to switch
from bio-based to request-based tables (or vice-versa).
But I think I've come full-circle:
- I'm not against imposing your suggestion (never reset md->type).
Certainly keeps the DM code simpler! :)
- We don't know all existing userspace code -- people could be using
DM's clear ioctl and we'd break their assumptions; but do we _really_
care? They'll adapt (they'd have to do a device delete, device
create, table load).
> I had only a quick look, so I may find some more comments. But I'd
> like to have more review after we reach an agreement about the basic
> implementation design above.
Yes, wish I had the restraint to avoid making dm_clear_md_type() work ;)
But pulling out that support will be easy and will also simplify other
code:
- will eliminate concern for "rq -> bio -> rq" in
dm_init_request_based_queue()
- can remove dm_clear_request_based_queue()
- maybe more...
Thanks,
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-05-24 17:24 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-23 21:44 [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Mike Snitzer
2010-05-23 21:44 ` [PATCH 1/6] block: Adjust elv_iosched_show to return "none" for bio-based DM Mike Snitzer
2010-05-24 7:06 ` Jens Axboe
2010-05-23 21:44 ` [PATCH 2/6] dm ioctl: interlock resume and table clear Mike Snitzer
2010-05-24 17:08 ` Mike Snitzer
2010-05-23 21:44 ` [PATCH 3/6] dm: prevent table type changes after initial table load Mike Snitzer
2010-05-23 21:44 ` [PATCH 4/6 "v8"] dm: only initialize full request_queue for request-based device Mike Snitzer
2010-05-23 21:45 ` [PATCH 5/6] dm ioctl: introduce dm_get_verified_mdptr Mike Snitzer
2010-05-23 21:45 ` [PATCH 6/6] dm ioctl: introduce find_device_noinit Mike Snitzer
2010-05-24 10:00 ` [PATCH 0/6] dm: restrict conflicting table loads and improve queue initialization Kiyoshi Ueda
2010-05-24 17:24 ` Mike Snitzer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.