[PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next
@ 2013-12-20 23:37 Mike Snitzer
  2013-12-20 23:37 ` [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping Mike Snitzer
                   ` (19 more replies)
  0 siblings, 20 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

I've pushed these changes to linux-dm.git's for-next branch.  The
first 2 will likely still go to Linus for 3.13.  But the others are
all for 3.14.  Any of these patches can be modified/rebased/etc so
please shout if you see something you don't like.  Also please shout
if there is a patch you don't see queued but feel it should be (Hannes
I haven't gotten to your dm-mpath invalid paths patch yet, sorry).

I put a lot of work into fixing the thinp target's handling of
-ENOSPC.  I'd appreciate it if reviewers interested in this area would
put particular focus on patch 18: "dm thin: fix pool_preresume resize
with heavy IO races".

Happy Holidays,
Mike

Joe Thornber (5):
  dm thin: fix discard support to a previously shared block
  dm thin: return error from alloc_data_block if pool is not in write mode
  dm thin: factor out check_low_water_mark and use bools
  dm thin: handle metadata failures more consistently
  dm cache policy mq: introduce three promotion threshold tunables

Mike Snitzer (12):
  dm thin: initialize dm_thin_new_mapping returned by get_next_mapping
  dm space map metadata: limit errors in sm_metadata_new_block
  dm persistent data: cleanup dm-thin specific references in text
  dm thin: use bool rather than unsigned for flags in structures
  dm thin: add mappings to end of prepared_* lists
  dm thin: log info when growing the data or metadata device
  dm thin: cleanup and improve no space handling
  dm thin: requeue bios to DM core if no_free_space and in read-only mode
  dm thin: add error_if_no_space feature
  dm thin: eliminate the no_free_space flag
  dm thin: fix set_pool_mode exposed pool operation races
  dm thin: fix pool_preresume resize with heavy IO races

Mikulas Patocka (2):
  dm table: remove unused buggy code that extends the targets array
  dm delay: use per-bio data instead of a mempool and slab cache

Wei Yongjun (1):
  dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD

 Documentation/device-mapper/cache-policies.txt     |  16 +-
 Documentation/device-mapper/thin-provisioning.txt  |   7 +
 drivers/md/Kconfig                                 |   6 +-
 drivers/md/dm-cache-policy-mq.c                    |  67 ++--
 drivers/md/dm-delay.c                              |  35 +-
 drivers/md/dm-table.c                              |  22 +-
 drivers/md/dm-thin-metadata.c                      |  55 ++++
 drivers/md/dm-thin-metadata.h                      |   7 +-
 drivers/md/dm-thin.c                               | 358 ++++++++++++++-------
 drivers/md/persistent-data/dm-block-manager.c      |   2 +-
 drivers/md/persistent-data/dm-space-map-metadata.c |   4 +-
 11 files changed, 378 insertions(+), 201 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2014-01-06 14:21   ` Joe Thornber
  2013-12-20 23:37 ` [PATCH maybe-for-3.13 02/20] dm thin: fix discard support to a previously shared block Mike Snitzer
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: stable

As additional members are added to the dm_thin_new_mapping structure
care should be taken to make sure they get initialized before use.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/md/dm-thin.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index ee29037..da65feb 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -751,13 +751,17 @@ static int ensure_next_mapping(struct pool *pool)
 
 static struct dm_thin_new_mapping *get_next_mapping(struct pool *pool)
 {
-	struct dm_thin_new_mapping *r = pool->next_mapping;
+	struct dm_thin_new_mapping *m = pool->next_mapping;
 
 	BUG_ON(!pool->next_mapping);
 
+	memset(m, 0, sizeof(struct dm_thin_new_mapping));
+	INIT_LIST_HEAD(&m->list);
+	m->bio = NULL;
+
 	pool->next_mapping = NULL;
 
-	return r;
+	return m;
 }
 
 static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
@@ -769,15 +773,10 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	struct pool *pool = tc->pool;
 	struct dm_thin_new_mapping *m = get_next_mapping(pool);
 
-	INIT_LIST_HEAD(&m->list);
-	m->quiesced = 0;
-	m->prepared = 0;
 	m->tc = tc;
 	m->virt_block = virt_block;
 	m->data_block = data_dest;
 	m->cell = cell;
-	m->err = 0;
-	m->bio = NULL;
 
 	if (!dm_deferred_set_add_work(pool->shared_read_ds, &m->list))
 		m->quiesced = 1;
@@ -840,15 +839,12 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	struct pool *pool = tc->pool;
 	struct dm_thin_new_mapping *m = get_next_mapping(pool);
 
-	INIT_LIST_HEAD(&m->list);
 	m->quiesced = 1;
 	m->prepared = 0;
 	m->tc = tc;
 	m->virt_block = virt_block;
 	m->data_block = data_block;
 	m->cell = cell;
-	m->err = 0;
-	m->bio = NULL;
 
 	/*
 	 * If the whole block of data is being overwritten or we are not
@@ -1045,7 +1041,6 @@ static void process_discard(struct thin_c *tc, struct bio *bio)
 			m->data_block = lookup_result.block;
 			m->cell = cell;
 			m->cell2 = cell2;
-			m->err = 0;
 			m->bio = bio;
 
 			if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list)) {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH maybe-for-3.13 02/20] dm thin: fix discard support to a previously shared block
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
  2013-12-20 23:37 ` [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 03/20] dm table: remove unused buggy code that extends the targets array Mike Snitzer
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Joe Thornber, stable

From: Joe Thornber <ejt@redhat.com>

If a snapshot is created and later deleted the origin dm_thin_device's
snapshotted_time will have been updated to reflect the snapshot's
creation time.  The 'shared' flag in the dm_thin_lookup_result struct
returned from dm_thin_find_block() is an approximation based on
snapshotted_time -- this is done to avoid 0(n), or worse, time
complexity.  In this case, the shared flag would be true.

But because the 'shared' flag reflects an approximation a block can be
incorrectly assumed to be shared (e.g. false positive for 'shared'
because the snapshot no longer exists).  This could result in discards
issued to a thin device not being passed down to the pool's underlying
data device.

To fix this we double check that a thin block is really still in-use
after a mapping is removed using dm_pool_block_is_used().  If the
reference count for a block is now zero the discard is allowed to be
passed down.

Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
structure -- reflects that the 'shared' flag in the response from
dm_thin_find_block() can only be held as definitive if false is
returned.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/md/dm-thin-metadata.c | 20 ++++++++++++++++++++
 drivers/md/dm-thin-metadata.h |  2 ++
 drivers/md/dm-thin.c          | 14 ++++++++++++--
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 8a30ad5..7da3476 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -1349,6 +1349,12 @@ dm_thin_id dm_thin_dev_id(struct dm_thin_device *td)
 	return td->id;
 }
 
+/*
+ * Check whether @time (of block creation) is older than @td's last snapshot.
+ * If so then the associated block is shared with the last snapshot device.
+ * Any block on a device created *after* the device last got snapshotted is
+ * necessarily not shared.
+ */
 static bool __snapshotted_since(struct dm_thin_device *td, uint32_t time)
 {
 	return td->snapshotted_time > time;
@@ -1458,6 +1464,20 @@ int dm_thin_remove_block(struct dm_thin_device *td, dm_block_t block)
 	return r;
 }
 
+int dm_pool_block_is_used(struct dm_pool_metadata *pmd, dm_block_t b, bool *result)
+{
+	int r;
+	uint32_t ref_count;
+
+	down_read(&pmd->root_lock);
+	r = dm_sm_get_count(pmd->data_sm, b, &ref_count);
+	if (!r)
+		*result = (ref_count != 0);
+	up_read(&pmd->root_lock);
+
+	return r;
+}
+
 bool dm_thin_changed_this_transaction(struct dm_thin_device *td)
 {
 	int r;
diff --git a/drivers/md/dm-thin-metadata.h b/drivers/md/dm-thin-metadata.h
index 7bcc0e1..2edf5db 100644
--- a/drivers/md/dm-thin-metadata.h
+++ b/drivers/md/dm-thin-metadata.h
@@ -181,6 +181,8 @@ int dm_pool_get_data_block_size(struct dm_pool_metadata *pmd, sector_t *result);
 
 int dm_pool_get_data_dev_size(struct dm_pool_metadata *pmd, dm_block_t *result);
 
+int dm_pool_block_is_used(struct dm_pool_metadata *pmd, dm_block_t b, bool *result);
+
 /*
  * Returns -ENOSPC if the new size is too small and already allocated
  * blocks would be lost.
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index da65feb..51e656a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -512,6 +512,7 @@ struct dm_thin_new_mapping {
 	unsigned quiesced:1;
 	unsigned prepared:1;
 	unsigned pass_discard:1;
+	unsigned definitely_not_shared:1;
 
 	struct thin_c *tc;
 	dm_block_t virt_block;
@@ -683,7 +684,15 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
 	cell_defer_no_holder(tc, m->cell2);
 
 	if (m->pass_discard)
-		remap_and_issue(tc, m->bio, m->data_block);
+		if (m->definitely_not_shared)
+			remap_and_issue(tc, m->bio, m->data_block);
+		else {
+			bool used = false;
+			if (dm_pool_block_is_used(tc->pool->pmd, m->data_block, &used) || used)
+				bio_endio(m->bio, 0);
+			else
+				remap_and_issue(tc, m->bio, m->data_block);
+		}
 	else
 		bio_endio(m->bio, 0);
 
@@ -1036,7 +1045,8 @@ static void process_discard(struct thin_c *tc, struct bio *bio)
 			 */
 			m = get_next_mapping(pool);
 			m->tc = tc;
-			m->pass_discard = (!lookup_result.shared) && pool->pf.discard_passdown;
+			m->pass_discard = pool->pf.discard_passdown;
+			m->definitely_not_shared = !lookup_result.shared;
 			m->virt_block = block;
 			m->data_block = lookup_result.block;
 			m->cell = cell;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 03/20] dm table: remove unused buggy code that extends the targets array
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
  2013-12-20 23:37 ` [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping Mike Snitzer
  2013-12-20 23:37 ` [PATCH maybe-for-3.13 02/20] dm thin: fix discard support to a previously shared block Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 04/20] dm delay: use per-bio data instead of a mempool and slab cache Mike Snitzer
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

From: Mikulas Patocka <mpatocka@redhat.com>

A device mapper table is allocated in the following way:
* The function dm_table_create is called, it gets the number of targets
  as an argument -- it allocates a targets array accordingly.
* For each target, we call dm_table_add_target.

If we add more targets than were specified in dm_table_create, the
function dm_table_add_target reallocates the targets array.  However,
this reallocation code is wrong - it moves the targets array to a new
location, while some target constructors hold pointers to the array in
the old location.

The following DM target drivers save the pointer to the target
structure, so they corrupt memory if the target array is moved:
multipath, raid, mirror, snapshot, stripe, switch, thin, verity.

Under normal circumstances, the reallocation function is not called
(because dm_table_create is called with the correct number of targets),
so the buggy reallocation code is not used.

Prior to the fix "dm table: fail dm_table_create on dm_round_up
overflow", the reallocation code could only be used in case the user
specifies too large a value in param->target_count, such as 0xffffffff.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c | 22 ++--------------------
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ba6a38..6a7f2b8 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -155,7 +155,6 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 {
 	sector_t *n_highs;
 	struct dm_target *n_targets;
-	int n = t->num_targets;
 
 	/*
 	 * Allocate both the target array and offset array at once.
@@ -169,12 +168,7 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 
 	n_targets = (struct dm_target *) (n_highs + num);
 
-	if (n) {
-		memcpy(n_highs, t->highs, sizeof(*n_highs) * n);
-		memcpy(n_targets, t->targets, sizeof(*n_targets) * n);
-	}
-
-	memset(n_highs + n, -1, sizeof(*n_highs) * (num - n));
+	memset(n_highs, -1, sizeof(*n_highs) * num);
 	vfree(t->highs);
 
 	t->num_allocated = num;
@@ -261,17 +255,6 @@ void dm_table_destroy(struct dm_table *t)
 }
 
 /*
- * Checks to see if we need to extend highs or targets.
- */
-static inline int check_space(struct dm_table *t)
-{
-	if (t->num_targets >= t->num_allocated)
-		return alloc_targets(t, t->num_allocated * 2);
-
-	return 0;
-}
-
-/*
  * See if we've already got a device in the list.
  */
 static struct dm_dev_internal *find_device(struct list_head *l, dev_t dev)
@@ -731,8 +714,7 @@ int dm_table_add_target(struct dm_table *t, const char *type,
 		return -EINVAL;
 	}
 
-	if ((r = check_space(t)))
-		return r;
+	BUG_ON(t->num_targets >= t->num_allocated);
 
 	tgt = t->targets + t->num_targets;
 	memset(tgt, 0, sizeof(*tgt));
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 04/20] dm delay: use per-bio data instead of a mempool and slab cache
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (2 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 03/20] dm table: remove unused buggy code that extends the targets array Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 05/20] dm space map metadata: limit errors in sm_metadata_new_block Mike Snitzer
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka

From: Mikulas Patocka <mpatocka@redhat.com>

Starting with commit c0820cf5ad095 ("dm: introduce per_bio_data"),
device mapper has the capability to pre-allocate a target-specific
structure with the bio.

This patch changes dm-delay to use this facility instead of a slab cache
and mempool.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-delay.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 2f91d6d..a8a511c 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -24,7 +24,6 @@ struct delay_c {
 	struct work_struct flush_expired_bios;
 	struct list_head delayed_bios;
 	atomic_t may_delay;
-	mempool_t *delayed_pool;
 
 	struct dm_dev *dev_read;
 	sector_t start_read;
@@ -40,14 +39,11 @@ struct delay_c {
 struct dm_delay_info {
 	struct delay_c *context;
 	struct list_head list;
-	struct bio *bio;
 	unsigned long expires;
 };
 
 static DEFINE_MUTEX(delayed_bios_lock);
 
-static struct kmem_cache *delayed_cache;
-
 static void handle_delayed_timer(unsigned long data)
 {
 	struct delay_c *dc = (struct delay_c *)data;
@@ -87,13 +83,14 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 	mutex_lock(&delayed_bios_lock);
 	list_for_each_entry_safe(delayed, next, &dc->delayed_bios, list) {
 		if (flush_all || time_after_eq(jiffies, delayed->expires)) {
+			struct bio *bio = dm_bio_from_per_bio_data(delayed,
+						sizeof(struct dm_delay_info));
 			list_del(&delayed->list);
-			bio_list_add(&flush_bios, delayed->bio);
-			if ((bio_data_dir(delayed->bio) == WRITE))
+			bio_list_add(&flush_bios, bio);
+			if ((bio_data_dir(bio) == WRITE))
 				delayed->context->writes--;
 			else
 				delayed->context->reads--;
-			mempool_free(delayed, dc->delayed_pool);
 			continue;
 		}
 
@@ -185,12 +182,6 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 
 out:
-	dc->delayed_pool = mempool_create_slab_pool(128, delayed_cache);
-	if (!dc->delayed_pool) {
-		DMERR("Couldn't create delayed bio pool.");
-		goto bad_dev_write;
-	}
-
 	dc->kdelayd_wq = alloc_workqueue("kdelayd", WQ_MEM_RECLAIM, 0);
 	if (!dc->kdelayd_wq) {
 		DMERR("Couldn't start kdelayd");
@@ -206,12 +197,11 @@ out:
 
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
+	ti->per_bio_data_size = sizeof(struct dm_delay_info);
 	ti->private = dc;
 	return 0;
 
 bad_queue:
-	mempool_destroy(dc->delayed_pool);
-bad_dev_write:
 	if (dc->dev_write)
 		dm_put_device(ti, dc->dev_write);
 bad_dev_read:
@@ -232,7 +222,6 @@ static void delay_dtr(struct dm_target *ti)
 	if (dc->dev_write)
 		dm_put_device(ti, dc->dev_write);
 
-	mempool_destroy(dc->delayed_pool);
 	kfree(dc);
 }
 
@@ -244,10 +233,9 @@ static int delay_bio(struct delay_c *dc, int delay, struct bio *bio)
 	if (!delay || !atomic_read(&dc->may_delay))
 		return 1;
 
-	delayed = mempool_alloc(dc->delayed_pool, GFP_NOIO);
+	delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
 
 	delayed->context = dc;
-	delayed->bio = bio;
 	delayed->expires = expires = jiffies + (delay * HZ / 1000);
 
 	mutex_lock(&delayed_bios_lock);
@@ -356,13 +344,7 @@ static struct target_type delay_target = {
 
 static int __init dm_delay_init(void)
 {
-	int r = -ENOMEM;
-
-	delayed_cache = KMEM_CACHE(dm_delay_info, 0);
-	if (!delayed_cache) {
-		DMERR("Couldn't create delayed bio cache.");
-		goto bad_memcache;
-	}
+	int r;
 
 	r = dm_register_target(&delay_target);
 	if (r < 0) {
@@ -373,15 +355,12 @@ static int __init dm_delay_init(void)
 	return 0;
 
 bad_register:
-	kmem_cache_destroy(delayed_cache);
-bad_memcache:
 	return r;
 }
 
 static void __exit dm_delay_exit(void)
 {
 	dm_unregister_target(&delay_target);
-	kmem_cache_destroy(delayed_cache);
 }
 
 /* Module hooks */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 05/20] dm space map metadata: limit errors in sm_metadata_new_block
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (3 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 04/20] dm delay: use per-bio data instead of a mempool and slab cache Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2014-01-06 14:25   ` Joe Thornber
  2013-12-20 23:37 ` [PATCH for-3.14 06/20] dm persistent data: cleanup dm-thin specific references in text Mike Snitzer
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

The "unable to allocate new metadata block" error can be a particularly
verbose error if there is a systemic issue with the metadata device.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/persistent-data/dm-space-map-metadata.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/persistent-data/dm-space-map-metadata.c b/drivers/md/persistent-data/dm-space-map-metadata.c
index 58fc1ee..e930844 100644
--- a/drivers/md/persistent-data/dm-space-map-metadata.c
+++ b/drivers/md/persistent-data/dm-space-map-metadata.c
@@ -385,13 +385,13 @@ static int sm_metadata_new_block(struct dm_space_map *sm, dm_block_t *b)
 
 	int r = sm_metadata_new_block_(sm, b);
 	if (r) {
-		DMERR("unable to allocate new metadata block");
+		DMERR_LIMIT("unable to allocate new metadata block");
 		return r;
 	}
 
 	r = sm_metadata_get_nr_free(sm, &count);
 	if (r) {
-		DMERR("couldn't get free block count");
+		DMERR_LIMIT("couldn't get free block count");
 		return r;
 	}
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 06/20] dm persistent data: cleanup dm-thin specific references in text
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (4 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 05/20] dm space map metadata: limit errors in sm_metadata_new_block Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2014-01-06 14:25   ` Joe Thornber
  2013-12-20 23:37 ` [PATCH for-3.14 07/20] dm thin: use bool rather than unsigned for flags in structures Mike Snitzer
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

DM's persistent-data library is now used my multiple targets so
exclusive references to "pool" or "thin provisioning" need to be
cleaned up.  Adjust Kconfig's DM_DEBUG_BLOCK_STACK_TRACING text
and remove "pool" from a block manager error message.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Kconfig                            | 6 +++---
 drivers/md/persistent-data/dm-block-manager.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f2ccbc3..7441344 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -250,12 +250,12 @@ config DM_THIN_PROVISIONING
          Provides thin provisioning and snapshots that share a data store.
 
 config DM_DEBUG_BLOCK_STACK_TRACING
-	boolean "Keep stack trace of thin provisioning block lock holders"
-	depends on STACKTRACE_SUPPORT && DM_THIN_PROVISIONING
+	boolean "Keep stack trace of persistent data block lock holders"
+	depends on STACKTRACE_SUPPORT && DM_PERSISTENT_DATA
 	select STACKTRACE
 	---help---
 	  Enable this for messages that may help debug problems with the
-	  block manager locking used by thin provisioning.
+	  block manager locking used by thin provisioning and caching.
 
 	  If unsure, say N.
 
diff --git a/drivers/md/persistent-data/dm-block-manager.c b/drivers/md/persistent-data/dm-block-manager.c
index 064a3c2..455f792 100644
--- a/drivers/md/persistent-data/dm-block-manager.c
+++ b/drivers/md/persistent-data/dm-block-manager.c
@@ -104,7 +104,7 @@ static int __check_holder(struct block_lock *lock)
 
 	for (i = 0; i < MAX_HOLDERS; i++) {
 		if (lock->holders[i] == current) {
-			DMERR("recursive lock detected in pool metadata");
+			DMERR("recursive lock detected in metadata");
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
 			DMERR("previously held here:");
 			print_stack_trace(lock->traces + i, 4);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 07/20] dm thin: use bool rather than unsigned for flags in structures
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (5 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 06/20] dm persistent data: cleanup dm-thin specific references in text Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2014-01-06 14:27   ` Joe Thornber
  2013-12-20 23:37 ` [PATCH for-3.14 08/20] dm thin: return error from alloc_data_block if pool is not in write mode Mike Snitzer
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

Also, move 'err' member in dm_thin_new_mapping structure to eliminate 4
byte hole (reduces size from 88 bytes to 80).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin-metadata.h |  2 +-
 drivers/md/dm-thin.c          | 26 +++++++++++++++-----------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/md/dm-thin-metadata.h b/drivers/md/dm-thin-metadata.h
index 2edf5db..9a36856 100644
--- a/drivers/md/dm-thin-metadata.h
+++ b/drivers/md/dm-thin-metadata.h
@@ -131,7 +131,7 @@ dm_thin_id dm_thin_dev_id(struct dm_thin_device *td);
 
 struct dm_thin_lookup_result {
 	dm_block_t block;
-	unsigned shared:1;
+	bool shared:1;
 };
 
 /*
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 51e656a..70b02fd 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -509,16 +509,16 @@ static void remap_and_issue(struct thin_c *tc, struct bio *bio,
 struct dm_thin_new_mapping {
 	struct list_head list;
 
-	unsigned quiesced:1;
-	unsigned prepared:1;
-	unsigned pass_discard:1;
-	unsigned definitely_not_shared:1;
+	bool quiesced:1;
+	bool prepared:1;
+	bool pass_discard:1;
+	bool definitely_not_shared:1;
 
+	int err;
 	struct thin_c *tc;
 	dm_block_t virt_block;
 	dm_block_t data_block;
 	struct dm_bio_prison_cell *cell, *cell2;
-	int err;
 
 	/*
 	 * If the bio covers the whole area of a block then we can avoid
@@ -549,7 +549,7 @@ static void copy_complete(int read_err, unsigned long write_err, void *context)
 	m->err = read_err || write_err ? -EIO : 0;
 
 	spin_lock_irqsave(&pool->lock, flags);
-	m->prepared = 1;
+	m->prepared = true;
 	__maybe_add_mapping(m);
 	spin_unlock_irqrestore(&pool->lock, flags);
 }
@@ -564,7 +564,7 @@ static void overwrite_endio(struct bio *bio, int err)
 	m->err = err;
 
 	spin_lock_irqsave(&pool->lock, flags);
-	m->prepared = 1;
+	m->prepared = true;
 	__maybe_add_mapping(m);
 	spin_unlock_irqrestore(&pool->lock, flags);
 }
@@ -766,6 +766,10 @@ static struct dm_thin_new_mapping *get_next_mapping(struct pool *pool)
 
 	memset(m, 0, sizeof(struct dm_thin_new_mapping));
 	INIT_LIST_HEAD(&m->list);
+	m->quiesced = false;
+	m->prepared = false;
+	m->pass_discard = false;
+	m->definitely_not_shared = false;
 	m->bio = NULL;
 
 	pool->next_mapping = NULL;
@@ -788,7 +792,7 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	m->cell = cell;
 
 	if (!dm_deferred_set_add_work(pool->shared_read_ds, &m->list))
-		m->quiesced = 1;
+		m->quiesced = true;
 
 	/*
 	 * IO to pool_dev remaps to the pool target's data_dev.
@@ -848,8 +852,8 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	struct pool *pool = tc->pool;
 	struct dm_thin_new_mapping *m = get_next_mapping(pool);
 
-	m->quiesced = 1;
-	m->prepared = 0;
+	m->quiesced = true;
+	m->prepared = false;
 	m->tc = tc;
 	m->virt_block = virt_block;
 	m->data_block = data_block;
@@ -2904,7 +2908,7 @@ static int thin_endio(struct dm_target *ti, struct bio *bio, int err)
 		spin_lock_irqsave(&pool->lock, flags);
 		list_for_each_entry_safe(m, tmp, &work, list) {
 			list_del(&m->list);
-			m->quiesced = 1;
+			m->quiesced = true;
 			__maybe_add_mapping(m);
 		}
 		spin_unlock_irqrestore(&pool->lock, flags);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 08/20] dm thin: return error from alloc_data_block if pool is not in write mode
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (6 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 07/20] dm thin: use bool rather than unsigned for flags in structures Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 09/20] dm thin: add mappings to end of prepared_* lists Mike Snitzer
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Joe Thornber

From: Joe Thornber <ejt@redhat.com>

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 70b02fd..b721d7a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -927,6 +927,9 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 	if (pool->no_free_space)
 		return -ENOSPC;
 
+	if (get_pool_mode(pool) != PM_WRITE)
+		return -EINVAL;
+
 	r = dm_pool_get_free_block_count(pool->pmd, &free_blocks);
 	if (r)
 		return r;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 09/20] dm thin: add mappings to end of prepared_* lists
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (7 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 08/20] dm thin: return error from alloc_data_block if pool is not in write mode Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 10/20] dm thin: factor out check_low_water_mark and use bools Mike Snitzer
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

Mappings could be processed in descending logical block order,
particularly if buffered IO is used.  This could adversely affect the
latency of IO processing.  Fix this by adding mappings to the end of the
'prepared_mappings' and 'prepared_discards' lists.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-thin.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index b721d7a..850e7cd 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -535,7 +535,7 @@ static void __maybe_add_mapping(struct dm_thin_new_mapping *m)
 	struct pool *pool = m->tc->pool;
 
 	if (m->quiesced && m->prepared) {
-		list_add(&m->list, &pool->prepared_mappings);
+		list_add_tail(&m->list, &pool->prepared_mappings);
 		wake_worker(pool);
 	}
 }
@@ -1062,7 +1062,7 @@ static void process_discard(struct thin_c *tc, struct bio *bio)
 
 			if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list)) {
 				spin_lock_irqsave(&pool->lock, flags);
-				list_add(&m->list, &pool->prepared_discards);
+				list_add_tail(&m->list, &pool->prepared_discards);
 				spin_unlock_irqrestore(&pool->lock, flags);
 				wake_worker(pool);
 			}
@@ -2923,7 +2923,7 @@ static int thin_endio(struct dm_target *ti, struct bio *bio, int err)
 		if (!list_empty(&work)) {
 			spin_lock_irqsave(&pool->lock, flags);
 			list_for_each_entry_safe(m, tmp, &work, list)
-				list_add(&m->list, &pool->prepared_discards);
+				list_add_tail(&m->list, &pool->prepared_discards);
 			spin_unlock_irqrestore(&pool->lock, flags);
 			wake_worker(pool);
 		}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 10/20] dm thin: factor out check_low_water_mark and use bools
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (8 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 09/20] dm thin: add mappings to end of prepared_* lists Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 11/20] dm thin: handle metadata failures more consistently Mike Snitzer
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Joe Thornber

From: Joe Thornber <ejt@redhat.com>

Factor check_low_water_mark() out of alloc_data_block().
Change a couple unsigned flags in the pool structure to bool.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 850e7cd..3420ebe5 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -163,8 +163,8 @@ struct pool {
 	int sectors_per_block_shift;
 
 	struct pool_features pf;
-	unsigned low_water_triggered:1;	/* A dm event has been sent */
-	unsigned no_free_space:1;	/* A -ENOSPC warning has been issued */
+	bool low_water_triggered:1;	/* A dm event has been sent */
+	bool no_free_space:1;		/* A -ENOSPC warning has been issued */
 
 	struct dm_bio_prison *prison;
 	struct dm_kcopyd_client *copier;
@@ -913,6 +913,20 @@ static int commit(struct pool *pool)
 	return r;
 }
 
+static void check_low_water_mark(struct pool *pool, dm_block_t free_blocks)
+{
+	unsigned long flags;
+
+	if (free_blocks <= pool->low_water_blocks && !pool->low_water_triggered) {
+		DMWARN("%s: reached low water mark for data device: sending event.",
+		       dm_device_name(pool->pool_md));
+		spin_lock_irqsave(&pool->lock, flags);
+		pool->low_water_triggered = true;
+		spin_unlock_irqrestore(&pool->lock, flags);
+		dm_table_event(pool->ti->table);
+	}
+}
+
 static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 {
 	int r;
@@ -934,14 +948,7 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 	if (r)
 		return r;
 
-	if (free_blocks <= pool->low_water_blocks && !pool->low_water_triggered) {
-		DMWARN("%s: reached low water mark for data device: sending event.",
-		       dm_device_name(pool->pool_md));
-		spin_lock_irqsave(&pool->lock, flags);
-		pool->low_water_triggered = 1;
-		spin_unlock_irqrestore(&pool->lock, flags);
-		dm_table_event(pool->ti->table);
-	}
+	check_low_water_mark(pool, free_blocks);
 
 	if (!free_blocks) {
 		/*
@@ -967,7 +974,7 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 			DMWARN("%s: no free data space available.",
 			       dm_device_name(pool->pool_md));
 			spin_lock_irqsave(&pool->lock, flags);
-			pool->no_free_space = 1;
+			pool->no_free_space = true;
 			spin_unlock_irqrestore(&pool->lock, flags);
 			return -ENOSPC;
 		}
@@ -1784,8 +1791,8 @@ static struct pool *pool_create(struct mapped_device *pool_md,
 	bio_list_init(&pool->deferred_flush_bios);
 	INIT_LIST_HEAD(&pool->prepared_mappings);
 	INIT_LIST_HEAD(&pool->prepared_discards);
-	pool->low_water_triggered = 0;
-	pool->no_free_space = 0;
+	pool->low_water_triggered = false;
+	pool->no_free_space = false;
 	bio_list_init(&pool->retry_on_resume_list);
 
 	pool->shared_read_ds = dm_deferred_set_create();
@@ -2302,8 +2309,8 @@ static void pool_resume(struct dm_target *ti)
 	unsigned long flags;
 
 	spin_lock_irqsave(&pool->lock, flags);
-	pool->low_water_triggered = 0;
-	pool->no_free_space = 0;
+	pool->low_water_triggered = false;
+	pool->no_free_space = false;
 	__requeue_bios(pool);
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 11/20] dm thin: handle metadata failures more consistently
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (9 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 10/20] dm thin: factor out check_low_water_mark and use bools Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 12/20] dm thin: log info when growing the data or metadata device Mike Snitzer
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Joe Thornber

From: Joe Thornber <ejt@redhat.com>

Introduce metadata_operation_failed() wrappers, around set_pool_mode(),
to assist with improving the consistency of how metadata failures are
handled.  Logging is improved and metadata operation failures trigger
read-only mode immediately.

Also, eliminate redundant set_pool_mode() calls in the two
alloc_data_block() caller's error paths.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin.c | 48 +++++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 3420ebe5..ccd917f 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -198,7 +198,7 @@ struct pool {
 };
 
 static enum pool_mode get_pool_mode(struct pool *pool);
-static void set_pool_mode(struct pool *pool, enum pool_mode mode);
+static void metadata_operation_failed(struct pool *pool, const char *op, int r);
 
 /*
  * Target context for a pool.
@@ -641,9 +641,7 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
 	 */
 	r = dm_thin_insert_block(tc->td, m->virt_block, m->data_block);
 	if (r) {
-		DMERR_LIMIT("%s: dm_thin_insert_block() failed: error = %d",
-			    dm_device_name(pool->pool_md), r);
-		set_pool_mode(pool, PM_READ_ONLY);
+		metadata_operation_failed(pool, "dm_thin_insert_block", r);
 		cell_error(pool, m->cell);
 		goto out;
 	}
@@ -904,11 +902,8 @@ static int commit(struct pool *pool)
 		return -EINVAL;
 
 	r = dm_pool_commit_metadata(pool->pmd);
-	if (r) {
-		DMERR_LIMIT("%s: dm_pool_commit_metadata failed: error = %d",
-			    dm_device_name(pool->pool_md), r);
-		set_pool_mode(pool, PM_READ_ONLY);
-	}
+	if (r)
+		metadata_operation_failed(pool, "dm_pool_commit_metadata", r);
 
 	return r;
 }
@@ -945,8 +940,10 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 		return -EINVAL;
 
 	r = dm_pool_get_free_block_count(pool->pmd, &free_blocks);
-	if (r)
+	if (r) {
+		metadata_operation_failed(pool, "dm_pool_get_free_block_count", r);
 		return r;
+	}
 
 	check_low_water_mark(pool, free_blocks);
 
@@ -960,8 +957,10 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 			return r;
 
 		r = dm_pool_get_free_block_count(pool->pmd, &free_blocks);
-		if (r)
+		if (r) {
+			metadata_operation_failed(pool, "dm_pool_get_free_block_count", r);
 			return r;
+		}
 
 		/*
 		 * If we still have no space we set a flag to avoid
@@ -984,11 +983,11 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 	if (r) {
 		if (r == -ENOSPC &&
 		    !dm_pool_get_free_metadata_block_count(pool->pmd, &free_blocks) &&
-		    !free_blocks) {
+		    !free_blocks)
 			DMWARN("%s: no free metadata space available.",
 			       dm_device_name(pool->pool_md));
-			set_pool_mode(pool, PM_READ_ONLY);
-		}
+
+		metadata_operation_failed(pool, "dm_pool_alloc_data_block", r);
 		return r;
 	}
 
@@ -1130,7 +1129,6 @@ static void break_sharing(struct thin_c *tc, struct bio *bio, dm_block_t block,
 	default:
 		DMERR_LIMIT("%s: alloc_data_block() failed: error = %d",
 			    __func__, r);
-		set_pool_mode(pool, PM_READ_ONLY);
 		cell_error(pool, cell);
 		break;
 	}
@@ -1209,7 +1207,6 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 	default:
 		DMERR_LIMIT("%s: alloc_data_block() failed: error = %d",
 			    __func__, r);
-		set_pool_mode(pool, PM_READ_ONLY);
 		cell_error(pool, cell);
 		break;
 	}
@@ -1453,6 +1450,18 @@ static void set_pool_mode(struct pool *pool, enum pool_mode mode)
 	}
 }
 
+/*
+ * Rather than calling set_pool_mode directly, use these which describe the
+ * reason for mode degradation.
+ */
+static void metadata_operation_failed(struct pool *pool, const char *op, int r)
+{
+	DMERR_LIMIT("%s: metadata operation '%s' failed: error = %d",
+		    dm_device_name(pool->pool_md), op, r);
+
+	set_pool_mode(pool, PM_READ_ONLY);
+}
+
 /*----------------------------------------------------------------*/
 
 /*
@@ -2213,9 +2222,7 @@ static int maybe_resize_data_dev(struct dm_target *ti, bool *need_commit)
 	} else if (data_size > sb_data_size) {
 		r = dm_pool_resize_data_dev(pool->pmd, data_size);
 		if (r) {
-			DMERR("%s: failed to resize data device",
-			      dm_device_name(pool->pool_md));
-			set_pool_mode(pool, PM_READ_ONLY);
+			metadata_operation_failed(pool, "dm_pool_resize_data_dev", r);
 			return r;
 		}
 
@@ -2252,8 +2259,7 @@ static int maybe_resize_metadata_dev(struct dm_target *ti, bool *need_commit)
 	} else if (metadata_dev_size > sb_metadata_dev_size) {
 		r = dm_pool_resize_metadata_dev(pool->pmd, metadata_dev_size);
 		if (r) {
-			DMERR("%s: failed to resize metadata device",
-			      dm_device_name(pool->pool_md));
+			metadata_operation_failed(pool, "dm_pool_resize_metadata_dev", r);
 			return r;
 		}
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 12/20] dm thin: log info when growing the data or metadata device
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (10 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 11/20] dm thin: handle metadata failures more consistently Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 13/20] dm thin: cleanup and improve no space handling Mike Snitzer
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-thin.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index ccd917f..3a80c00 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2220,6 +2220,10 @@ static int maybe_resize_data_dev(struct dm_target *ti, bool *need_commit)
 		return -EINVAL;
 
 	} else if (data_size > sb_data_size) {
+		if (sb_data_size)
+			DMINFO("%s: growing the data device from %llu to %llu blocks",
+			       dm_device_name(pool->pool_md),
+			       sb_data_size, (unsigned long long)data_size);
 		r = dm_pool_resize_data_dev(pool->pmd, data_size);
 		if (r) {
 			metadata_operation_failed(pool, "dm_pool_resize_data_dev", r);
@@ -2257,6 +2261,9 @@ static int maybe_resize_metadata_dev(struct dm_target *ti, bool *need_commit)
 		return -EINVAL;
 
 	} else if (metadata_dev_size > sb_metadata_dev_size) {
+		DMINFO("%s: growing the metadata device from %llu to %llu blocks",
+		       dm_device_name(pool->pool_md),
+		       sb_metadata_dev_size, metadata_dev_size);
 		r = dm_pool_resize_metadata_dev(pool->pmd, metadata_dev_size);
 		if (r) {
 			metadata_operation_failed(pool, "dm_pool_resize_metadata_dev", r);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 13/20] dm thin: cleanup and improve no space handling
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (11 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 12/20] dm thin: log info when growing the data or metadata device Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode Mike Snitzer
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

Factor out_of_data_space() out of alloc_data_block().  Eliminate the use
of 'no_free_space' as a latch in alloc_data_block() -- this is no longer
needed now that we switch to read-only mode when we run out of data or
metadata space.  In a later patch, the 'no_free_space' flag will be
eliminated entirely (in favor of checking metadata rather than relying
on a transient flag).

Move no metdata space handling into metdata_operation_failed().  Set
no_free_space when metadata space is exhausted too.  This is useful,
because it offers consistency, for the following patch that will requeue
data IOs if no_free_space.

Also, rename no_space() to retry_bios_on_resume().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-thin.c | 61 +++++++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 3a80c00..f19b9e3 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -198,6 +198,7 @@ struct pool {
 };
 
 static enum pool_mode get_pool_mode(struct pool *pool);
+static void out_of_data_space(struct pool *pool);
 static void metadata_operation_failed(struct pool *pool, const char *op, int r);
 
 /*
@@ -926,16 +927,8 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 {
 	int r;
 	dm_block_t free_blocks;
-	unsigned long flags;
 	struct pool *pool = tc->pool;
 
-	/*
-	 * Once no_free_space is set we must not allow allocation to succeed.
-	 * Otherwise it is difficult to explain, debug, test and support.
-	 */
-	if (pool->no_free_space)
-		return -ENOSPC;
-
 	if (get_pool_mode(pool) != PM_WRITE)
 		return -EINVAL;
 
@@ -962,31 +955,14 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
 			return r;
 		}
 
-		/*
-		 * If we still have no space we set a flag to avoid
-		 * doing all this checking and return -ENOSPC.  This
-		 * flag serves as a latch that disallows allocations from
-		 * this pool until the admin takes action (e.g. resize or
-		 * table reload).
-		 */
 		if (!free_blocks) {
-			DMWARN("%s: no free data space available.",
-			       dm_device_name(pool->pool_md));
-			spin_lock_irqsave(&pool->lock, flags);
-			pool->no_free_space = true;
-			spin_unlock_irqrestore(&pool->lock, flags);
+			out_of_data_space(pool);
 			return -ENOSPC;
 		}
 	}
 
 	r = dm_pool_alloc_data_block(pool->pmd, result);
 	if (r) {
-		if (r == -ENOSPC &&
-		    !dm_pool_get_free_metadata_block_count(pool->pmd, &free_blocks) &&
-		    !free_blocks)
-			DMWARN("%s: no free metadata space available.",
-			       dm_device_name(pool->pool_md));
-
 		metadata_operation_failed(pool, "dm_pool_alloc_data_block", r);
 		return r;
 	}
@@ -1010,7 +986,7 @@ static void retry_on_resume(struct bio *bio)
 	spin_unlock_irqrestore(&pool->lock, flags);
 }
 
-static void no_space(struct pool *pool, struct dm_bio_prison_cell *cell)
+static void retry_bios_on_resume(struct pool *pool, struct dm_bio_prison_cell *cell)
 {
 	struct bio *bio;
 	struct bio_list bios;
@@ -1123,7 +1099,7 @@ static void break_sharing(struct thin_c *tc, struct bio *bio, dm_block_t block,
 		break;
 
 	case -ENOSPC:
-		no_space(pool, cell);
+		retry_bios_on_resume(pool, cell);
 		break;
 
 	default:
@@ -1201,7 +1177,7 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 		break;
 
 	case -ENOSPC:
-		no_space(pool, cell);
+		retry_bios_on_resume(pool, cell);
 		break;
 
 	default:
@@ -1450,15 +1426,42 @@ static void set_pool_mode(struct pool *pool, enum pool_mode mode)
 	}
 }
 
+static void set_no_free_space(struct pool *pool)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->lock, flags);
+	pool->no_free_space = true;
+	spin_unlock_irqrestore(&pool->lock, flags);
+}
+
 /*
  * Rather than calling set_pool_mode directly, use these which describe the
  * reason for mode degradation.
  */
+static void out_of_data_space(struct pool *pool)
+{
+	DMERR_LIMIT("%s: no free data space available.",
+		    dm_device_name(pool->pool_md));
+	set_no_free_space(pool);
+	set_pool_mode(pool, PM_READ_ONLY);
+}
+
 static void metadata_operation_failed(struct pool *pool, const char *op, int r)
 {
+	dm_block_t free_blocks;
+
 	DMERR_LIMIT("%s: metadata operation '%s' failed: error = %d",
 		    dm_device_name(pool->pool_md), op, r);
 
+	if (r == -ENOSPC &&
+	    !dm_pool_get_free_metadata_block_count(pool->pmd, &free_blocks) &&
+	    !free_blocks) {
+		DMERR_LIMIT("%s: no free metadata space available.",
+			    dm_device_name(pool->pool_md));
+		set_no_free_space(pool);
+	}
+
 	set_pool_mode(pool, PM_READ_ONLY);
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (12 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 13/20] dm thin: cleanup and improve no space handling Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2014-01-06 14:35   ` Joe Thornber
  2013-12-20 23:37 ` [PATCH for-3.14 15/20] dm thin: add error_if_no_space feature Mike Snitzer
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

Now that we switch the pool to read-only mode when the data device runs
out of space it causes active writers to get IO errors once we resume
after resizing the data device.

If no_free_space is set, save bios to the 'retry_on_resume_list' and
requeue them on resume (once the data or metadata device may have been
resized).

With this patch the resize_io test passes again (on slower storage):
 dmtest run --suite thin-provisioning -n /resize_io/

Later patches fix some subtle races associated with the pool mode
transitions done as part of the pool's -ENOSPC handling.  These races
are exposed on fast storage (e.g. PCIe SSD).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-thin.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index f19b9e3..6e747eb 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -164,7 +164,7 @@ struct pool {
 
 	struct pool_features pf;
 	bool low_water_triggered:1;	/* A dm event has been sent */
-	bool no_free_space:1;		/* A -ENOSPC warning has been issued */
+	bool no_free_space:1;		/* bios will be requeued if set */
 
 	struct dm_bio_prison *prison;
 	struct dm_kcopyd_client *copier;
@@ -986,6 +986,20 @@ static void retry_on_resume(struct bio *bio)
 	spin_unlock_irqrestore(&pool->lock, flags);
 }
 
+static void handle_unserviceable_bio(struct pool *pool, struct bio *bio)
+{
+	/*
+	 * When pool is read-only, no cell locking is needed because
+	 * nothing is changing.
+	 */
+	WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
+
+	if (pool->no_free_space)
+		retry_on_resume(bio);
+	else
+		bio_io_error(bio);
+}
+
 static void retry_bios_on_resume(struct pool *pool, struct dm_bio_prison_cell *cell)
 {
 	struct bio *bio;
@@ -995,7 +1009,7 @@ static void retry_bios_on_resume(struct pool *pool, struct dm_bio_prison_cell *c
 	cell_release(pool, cell, &bios);
 
 	while ((bio = bio_list_pop(&bios)))
-		retry_on_resume(bio);
+		handle_unserviceable_bio(pool, bio);
 }
 
 static void process_discard(struct thin_c *tc, struct bio *bio)
@@ -1249,7 +1263,7 @@ static void process_bio_read_only(struct thin_c *tc, struct bio *bio)
 	switch (r) {
 	case 0:
 		if (lookup_result.shared && (rw == WRITE) && bio->bi_size)
-			bio_io_error(bio);
+			handle_unserviceable_bio(tc->pool, bio);
 		else {
 			inc_all_io_entry(tc->pool, bio);
 			remap_and_issue(tc, bio, lookup_result.block);
@@ -1258,7 +1272,7 @@ static void process_bio_read_only(struct thin_c *tc, struct bio *bio)
 
 	case -ENODATA:
 		if (rw != READ) {
-			bio_io_error(bio);
+			handle_unserviceable_bio(tc->pool, bio);
 			break;
 		}
 
@@ -1569,9 +1583,9 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		if (get_pool_mode(tc->pool) == PM_READ_ONLY) {
 			/*
 			 * This block isn't provisioned, and we have no way
-			 * of doing so.  Just error it.
+			 * of doing so.
 			 */
-			bio_io_error(bio);
+			handle_unserviceable_bio(tc->pool, bio);
 			return DM_MAPIO_SUBMITTED;
 		}
 		/* fall through */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 15/20] dm thin: add error_if_no_space feature
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (13 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 16/20] dm thin: eliminate the no_free_space flag Mike Snitzer
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

If the pool runs out of data or metadata space, the pool can either
queue or error the IO destined to the data device.  The default is to
queue the IO until more space is added.

An admin may now configure the pool to error IO when no space is
available by setting the 'error_if_no_space' feature when loading the
thin-pool table.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
---
 Documentation/device-mapper/thin-provisioning.txt |  7 +++++
 drivers/md/dm-thin.c                              | 31 ++++++++++++++++++-----
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 50c44cf..8a7a3d4 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -235,6 +235,8 @@ i) Constructor
       read_only: Don't allow any changes to be made to the pool
 		 metadata.
 
+      error_if_no_space: Error IOs, instead of queueing, if no space.
+
     Data block size must be between 64KB (128 sectors) and 1GB
     (2097152 sectors) inclusive.
 
@@ -276,6 +278,11 @@ ii) Status
 	contain the string 'Fail'.  The userspace recovery tools
 	should then be used.
 
+    error_if_no_space|queue_if_no_space
+	If the pool runs out of data or metadata space, the pool will
+	either queue or error the IO destined to the data device.  The
+	default is to queue the IO until more space is added.
+
 iii) Messages
 
     create_thin <dev id>
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 6e747eb..f16c3e4 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -144,6 +144,7 @@ struct pool_features {
 	bool zero_new_blocks:1;
 	bool discard_enabled:1;
 	bool discard_passdown:1;
+	bool error_if_no_space:1;
 };
 
 struct thin_c;
@@ -1444,6 +1445,9 @@ static void set_no_free_space(struct pool *pool)
 {
 	unsigned long flags;
 
+	if (pool->pf.error_if_no_space)
+		return;
+
 	spin_lock_irqsave(&pool->lock, flags);
 	pool->no_free_space = true;
 	spin_unlock_irqrestore(&pool->lock, flags);
@@ -1727,6 +1731,7 @@ static void pool_features_init(struct pool_features *pf)
 	pf->zero_new_blocks = true;
 	pf->discard_enabled = true;
 	pf->discard_passdown = true;
+	pf->error_if_no_space = false;
 }
 
 static void __pool_destroy(struct pool *pool)
@@ -1972,6 +1977,9 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
 		else if (!strcasecmp(arg_name, "read_only"))
 			pf->mode = PM_READ_ONLY;
 
+		else if (!strcasecmp(arg_name, "error_if_no_space"))
+			pf->error_if_no_space = true;
+
 		else {
 			ti->error = "Unrecognised pool feature requested";
 			r = -EINVAL;
@@ -2042,6 +2050,8 @@ static dm_block_t calc_metadata_threshold(struct pool_c *pt)
  *	     skip_block_zeroing: skips the zeroing of newly-provisioned blocks.
  *	     ignore_discard: disable discard
  *	     no_discard_passdown: don't pass discards down to the data device
+ *	     read_only: Don't allow any changes to be made to the pool metadata.
+ *	     error_if_no_space: error IOs, instead of queueing, if no space.
  */
 static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
 {
@@ -2559,7 +2569,8 @@ static void emit_flags(struct pool_features *pf, char *result,
 		       unsigned sz, unsigned maxlen)
 {
 	unsigned count = !pf->zero_new_blocks + !pf->discard_enabled +
-		!pf->discard_passdown + (pf->mode == PM_READ_ONLY);
+		!pf->discard_passdown + (pf->mode == PM_READ_ONLY) +
+		pf->error_if_no_space;
 	DMEMIT("%u ", count);
 
 	if (!pf->zero_new_blocks)
@@ -2573,6 +2584,9 @@ static void emit_flags(struct pool_features *pf, char *result,
 
 	if (pf->mode == PM_READ_ONLY)
 		DMEMIT("read_only ");
+
+	if (pf->error_if_no_space)
+		DMEMIT("error_if_no_space ");
 }
 
 /*
@@ -2667,11 +2681,16 @@ static void pool_status(struct dm_target *ti, status_type_t type,
 			DMEMIT("rw ");
 
 		if (!pool->pf.discard_enabled)
-			DMEMIT("ignore_discard");
+			DMEMIT("ignore_discard ");
 		else if (pool->pf.discard_passdown)
-			DMEMIT("discard_passdown");
+			DMEMIT("discard_passdown ");
+		else
+			DMEMIT("no_discard_passdown ");
+
+		if (pool->pf.error_if_no_space)
+			DMEMIT("error_if_no_space ");
 		else
-			DMEMIT("no_discard_passdown");
+			DMEMIT("queue_if_no_space ");
 
 		break;
 
@@ -2770,7 +2789,7 @@ static struct target_type pool_target = {
 	.name = "thin-pool",
 	.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
 		    DM_TARGET_IMMUTABLE,
-	.version = {1, 9, 0},
+	.version = {1, 10, 0},
 	.module = THIS_MODULE,
 	.ctr = pool_ctr,
 	.dtr = pool_dtr,
@@ -3057,7 +3076,7 @@ static int thin_iterate_devices(struct dm_target *ti,
 
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 9, 0},
+	.version = {1, 10, 0},
 	.module	= THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 16/20] dm thin: eliminate the no_free_space flag
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (14 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 15/20] dm thin: add error_if_no_space feature Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 17/20] dm thin: fix set_pool_mode exposed pool operation races Mike Snitzer
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

The pool's error_if_no_space flag can easily serve the same purpose that
no_free_space did, namely: control whether handle_unserviceable_bio()
will error a bio or requeue it.

This is cleaner since error_if_no_space is established when the pool's
features are processed during table load.  So it avoids managing the
no_free_space flag by taking the pool's spinlock.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index f16c3e4..bfae790 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -165,7 +165,6 @@ struct pool {
 
 	struct pool_features pf;
 	bool low_water_triggered:1;	/* A dm event has been sent */
-	bool no_free_space:1;		/* bios will be requeued if set */
 
 	struct dm_bio_prison *prison;
 	struct dm_kcopyd_client *copier;
@@ -995,10 +994,10 @@ static void handle_unserviceable_bio(struct pool *pool, struct bio *bio)
 	 */
 	WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
 
-	if (pool->no_free_space)
-		retry_on_resume(bio);
-	else
+	if (pool->pf.error_if_no_space)
 		bio_io_error(bio);
+	else
+		retry_on_resume(bio);
 }
 
 static void retry_bios_on_resume(struct pool *pool, struct dm_bio_prison_cell *cell)
@@ -1441,18 +1440,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode mode)
 	}
 }
 
-static void set_no_free_space(struct pool *pool)
-{
-	unsigned long flags;
-
-	if (pool->pf.error_if_no_space)
-		return;
-
-	spin_lock_irqsave(&pool->lock, flags);
-	pool->no_free_space = true;
-	spin_unlock_irqrestore(&pool->lock, flags);
-}
-
 /*
  * Rather than calling set_pool_mode directly, use these which describe the
  * reason for mode degradation.
@@ -1461,7 +1448,6 @@ static void out_of_data_space(struct pool *pool)
 {
 	DMERR_LIMIT("%s: no free data space available.",
 		    dm_device_name(pool->pool_md));
-	set_no_free_space(pool);
 	set_pool_mode(pool, PM_READ_ONLY);
 }
 
@@ -1474,11 +1460,9 @@ static void metadata_operation_failed(struct pool *pool, const char *op, int r)
 
 	if (r == -ENOSPC &&
 	    !dm_pool_get_free_metadata_block_count(pool->pmd, &free_blocks) &&
-	    !free_blocks) {
+	    !free_blocks)
 		DMERR_LIMIT("%s: no free metadata space available.",
 			    dm_device_name(pool->pool_md));
-		set_no_free_space(pool);
-	}
 
 	set_pool_mode(pool, PM_READ_ONLY);
 }
@@ -1823,7 +1807,6 @@ static struct pool *pool_create(struct mapped_device *pool_md,
 	INIT_LIST_HEAD(&pool->prepared_mappings);
 	INIT_LIST_HEAD(&pool->prepared_discards);
 	pool->low_water_triggered = false;
-	pool->no_free_space = false;
 	bio_list_init(&pool->retry_on_resume_list);
 
 	pool->shared_read_ds = dm_deferred_set_create();
@@ -2350,7 +2333,6 @@ static void pool_resume(struct dm_target *ti)
 
 	spin_lock_irqsave(&pool->lock, flags);
 	pool->low_water_triggered = false;
-	pool->no_free_space = false;
 	__requeue_bios(pool);
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 17/20] dm thin: fix set_pool_mode exposed pool operation races
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (15 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 16/20] dm thin: eliminate the no_free_space flag Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 18/20] dm thin: fix pool_preresume resize with heavy IO races Mike Snitzer
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: stable

The pool mode must not be switched until after the corresponding pool
process_* methods have been established.  Otherwise, because
set_pool_mode() isn't interlocked with the IO path for performance
reasons, the IO path can end up executing process_* operations that
don't match the mode.  This patch eliminates problems like the following
(as seen on really fast PCIe SSD storage when transitioning the pool's
mode from PM_READ_ONLY to PM_WRITE):

kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
kernel: device-mapper: thin: 253:2: no free data space available.
kernel: device-mapper: thin: 253:2: switching pool to read-only mode
kernel: device-mapper: thin: 253:2: switching pool to write mode
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 11 PID: 7564 at drivers/md/dm-thin.c:995 handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]()
...
kernel: Workqueue: dm-thin do_worker [dm_thin_pool]
kernel: 00000000000003e3 ffff880308831cc8 ffffffff8152ebcb 00000000000003e3
kernel: 0000000000000000 ffff880308831d08 ffffffff8104c46c ffff88032502a800
kernel: ffff880036409000 ffff88030ec7ce00 0000000000000001 00000000ffffffc3
kernel: Call Trace:
kernel: [<ffffffff8152ebcb>] dump_stack+0x49/0x5e
kernel: [<ffffffff8104c46c>] warn_slowpath_common+0x8c/0xc0
kernel: [<ffffffff8104c4ba>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffffa001e2c6>] handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]
kernel: [<ffffffffa001f276>] process_bio_read_only+0x136/0x180 [dm_thin_pool]
kernel: [<ffffffffa0020b75>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
kernel: [<ffffffffa0020d31>] do_worker+0x51/0x60 [dm_thin_pool]
kernel: [<ffffffff81067823>] process_one_work+0x183/0x490
kernel: [<ffffffff81068c70>] worker_thread+0x120/0x3a0
kernel: [<ffffffff81068b50>] ? manage_workers+0x160/0x160
kernel: [<ffffffff8106e86e>] kthread+0xce/0xf0
kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff8153b3ec>] ret_from_fork+0x7c/0xb0
kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
kernel: ---[ end trace 3f00528e08ffa55c ]---
kernel: device-mapper: thin: pool mode is PM_WRITE not PM_READ_ONLY like expected!?

dm-thin.c:995 was the WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
at the top of handle_unserviceable_bio().  And as the additional
debugging I had conveys: the pool mode was _not_ PM_READ_ONLY like
expected, it was already PM_WRITE, yet pool->process_bio was still set
to process_bio_read_only().

Also, while fixing this up, reduce logging of redundant pool mode
transitions by checking new_mode is different from old_mode.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/md/dm-thin.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index bfae790..4a2432a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1396,16 +1396,16 @@ static enum pool_mode get_pool_mode(struct pool *pool)
 	return pool->pf.mode;
 }

-static void set_pool_mode(struct pool *pool, enum pool_mode mode)
+static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 {
 	int r;
+	enum pool_mode old_mode = pool->pf.mode;

-	pool->pf.mode = mode;
-
-	switch (mode) {
+	switch (new_mode) {
 	case PM_FAIL:
-		DMERR("%s: switching pool to failure mode",
-		      dm_device_name(pool->pool_md));
+		if (old_mode != new_mode)
+			DMERR("%s: switching pool to failure mode",
+			      dm_device_name(pool->pool_md));
 		dm_pool_metadata_read_only(pool->pmd);
 		pool->process_bio = process_bio_fail;
 		pool->process_discard = process_bio_fail;
@@ -1414,13 +1414,15 @@ static void set_pool_mode(struct pool *pool, enum pool_mode mode)
 		break;

 	case PM_READ_ONLY:
-		DMERR("%s: switching pool to read-only mode",
-		      dm_device_name(pool->pool_md));
+		if (old_mode != new_mode)
+			DMERR("%s: switching pool to read-only mode",
+			      dm_device_name(pool->pool_md));
 		r = dm_pool_abort_metadata(pool->pmd);
 		if (r) {
 			DMERR("%s: aborting transaction failed",
 			      dm_device_name(pool->pool_md));
-			set_pool_mode(pool, PM_FAIL);
+			new_mode = PM_FAIL;
+			set_pool_mode(pool, new_mode);
 		} else {
 			dm_pool_metadata_read_only(pool->pmd);
 			pool->process_bio = process_bio_read_only;
@@ -1431,6 +1433,9 @@ static void set_pool_mode(struct pool *pool, enum pool_mode mode)
 		break;

 	case PM_WRITE:
+		if (old_mode != new_mode)
+			DMINFO("%s: switching pool to write mode",
+			       dm_device_name(pool->pool_md));
 		dm_pool_metadata_read_write(pool->pmd);
 		pool->process_bio = process_bio;
 		pool->process_discard = process_discard;
@@ -1438,6 +1443,8 @@ static void set_pool_mode(struct pool *pool, enum pool_mode mode)
 		pool->process_prepared_discard = process_prepared_discard;
 		break;
 	}
+
+	pool->pf.mode = new_mode;
 }

 /*
@@ -1681,6 +1688,17 @@ static int bind_control_target(struct pool *pool, struct dm_target *ti)
 	enum pool_mode new_mode = pt->adjusted_pf.mode;

 	/*
+	 * Don't change the pool's mode until set_pool_mode() below.
+	 * Otherwise the pool's process_* function pointers may
+	 * not match the desired pool mode.
+	 */
+	pt->adjusted_pf.mode = old_mode;
+
+	pool->ti = ti;
+	pool->pf = pt->adjusted_pf;
+	pool->low_water_blocks = pt->low_water_blocks;
+
+	/*
 	 * If we were in PM_FAIL mode, rollback of metadata failed.  We're
 	 * not going to recover without a thin_repair.  So we never let the
 	 * pool move out of the old mode.  On the other hand a PM_READ_ONLY
@@ -1690,10 +1708,6 @@ static int bind_control_target(struct pool *pool, struct dm_target *ti)
 	if (old_mode == PM_FAIL)
 		new_mode = old_mode;

-	pool->ti = ti;
-	pool->low_water_blocks = pt->low_water_blocks;
-	pool->pf = pt->adjusted_pf;
-
 	set_pool_mode(pool, new_mode);

 	return 0;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 18/20] dm thin: fix pool_preresume resize with heavy IO races
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (16 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 17/20] dm thin: fix set_pool_mode exposed pool operation races Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 19/20] dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 20/20] dm cache policy mq: introduce three promotion threshold tunables Mike Snitzer
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel

Before, when a pool is being resized, on resume the pool's mode was
being immediately set to PM_WRITE and the process_* methods would be set
to the normal writeable operations.  This was occuring before
pool_resume() was able to actually resize either the metadata or data
device and resulted in the resize failing.

Now, the pool is forced to stay in read-only mode if it was already in
read-only mode.  This prevents bouncing the pool's mode and associated
process_* methods and consistently keeps read-only processing in place
until the resize(s) complete.  To achieve this the pool can now be in a
PM_READ_ONLY state but the metadata's is !read_only -- so as to allow
the commit() the follows the resize to take place.  Differentiate
between commit_metadata_superblock() and commit() since different
negative checks apply (but factor out a common __commit that each
calls).

With this patch the resize_io test passes (on fast storage):
 dmtest run --suite thin-provisioning -n /resize_io/

Otherwise, problems like the following 2 examples were seen (again when
testing on really fast PCIe SSD storage):

1)
kernel: device-mapper: thin: 253:2: growing the data device from 2048 to 4096 blocks
kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
kernel: device-mapper: thin: 253:2: no free data space available.
kernel: device-mapper: thin: 253:2: switching pool to read-only mode
kernel: device-mapper: thin: 253:2: switching pool to write mode (from PM_WRITE)
kernel: device-mapper: thin: 253:2: growing the data device from 4096 to 6144 blocks
kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
kernel: device-mapper: thin: 253:2: no free data space available.
kernel: device-mapper: thin: 253:2: switching pool to read-only mode
kernel: device-mapper: thin: 253:2: switching pool to write mode (from PM_WRITE)
kernel: device-mapper: thin: 253:2: metadata operation 'dm_pool_commit_metadata' failed: error = -1
kernel: device-mapper: thin: 253:2: switching pool to read-only mode
kernel: device-mapper: thin: 253:2: growing the data device from 6144 to 8192 blocks
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 1 PID: 22083 at drivers/md/dm-thin.c:934 alloc_data_block+0x171/0x1a0 [dm_thin_pool]()

dm-thin.c:934 was a WARN_ON() I added if alloc_data_block() was
called when pool mode != PM_WRITE.

The above clearly illustrated that alloc_data_block was still getting
called like crazy even though the pool went read-only waiting for a
resize.  This, and the following example, occurred due to the thin
device calling wake_worker() which kicked do_worker() during
pool_preresume() -- which caused calls to prcess_* methods to occur
during the resize.

2)
And this failure offers a clear indicator we weren't properly resizing,
but we were operating like we did:

kernel: device-mapper: thin: 253:2: switching pool to write mode
kernel: device-mapper: thin: 253:2: growing the data device from 6144 to 8192 blocks
kernel: device-mapper: thin: 253:2: no free data space available.
kernel: device-mapper: thin: 253:2: switching pool to read-only mode
kernel: device-mapper: thin: 253:2: switching pool to write mode
kernel: device-mapper: thin: 253:2: growing the data device from 6144 to 10240 blocks
kernel: attempt to access beyond end of device
kernel: dm-2: rw=17, want=1310848, limit=1310720
kernel: attempt to access beyond end of device
kernel: dm-2: rw=17, want=1310976, limit=1310720

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-thin-metadata.c | 35 +++++++++++++++++++
 drivers/md/dm-thin-metadata.h |  3 ++
 drivers/md/dm-thin.c          | 78 +++++++++++++++++++++++++++++++++++--------
 3 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 7da3476..974e2cf 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -1586,6 +1586,30 @@ int dm_pool_get_free_metadata_block_count(struct dm_pool_metadata *pmd,
 	return r;
 }
 
+int dm_pool_is_out_of_space(struct dm_pool_metadata *pmd, bool *result)
+{
+	int r;
+	dm_block_t free_blocks;
+
+	*result = false;
+
+	r = dm_pool_get_free_block_count(pmd, &free_blocks);
+	if (r)
+		return r;
+	if (!free_blocks) {
+		*result = true;
+		return 0;
+	}
+
+	r = dm_pool_get_free_metadata_block_count(pmd, &free_blocks);
+	if (r)
+		return r;
+	if (!free_blocks)
+		*result = true;
+
+	return 0;
+}
+
 int dm_pool_get_metadata_dev_size(struct dm_pool_metadata *pmd,
 				  dm_block_t *result)
 {
@@ -1709,6 +1733,17 @@ int dm_pool_resize_metadata_dev(struct dm_pool_metadata *pmd, dm_block_t new_cou
 	return r;
 }
 
+bool dm_pool_metadata_is_read_only(struct dm_pool_metadata *pmd)
+{
+	bool r;
+
+	down_read(&pmd->root_lock);
+	r = pmd->read_only;
+	up_read(&pmd->root_lock);
+
+	return r;
+}
+
 void dm_pool_metadata_read_only(struct dm_pool_metadata *pmd)
 {
 	down_write(&pmd->root_lock);
diff --git a/drivers/md/dm-thin-metadata.h b/drivers/md/dm-thin-metadata.h
index 9a36856..dcbd08c 100644
--- a/drivers/md/dm-thin-metadata.h
+++ b/drivers/md/dm-thin-metadata.h
@@ -177,6 +177,8 @@ int dm_pool_get_free_metadata_block_count(struct dm_pool_metadata *pmd,
 int dm_pool_get_metadata_dev_size(struct dm_pool_metadata *pmd,
 				  dm_block_t *result);
 
+int dm_pool_is_out_of_space(struct dm_pool_metadata *pmd, bool *result);
+
 int dm_pool_get_data_block_size(struct dm_pool_metadata *pmd, sector_t *result);
 
 int dm_pool_get_data_dev_size(struct dm_pool_metadata *pmd, dm_block_t *result);
@@ -194,6 +196,7 @@ int dm_pool_resize_metadata_dev(struct dm_pool_metadata *pmd, dm_block_t new_siz
  * Flicks the underlying block manager into read only mode, so you know
  * that nothing is changing.
  */
+bool dm_pool_metadata_is_read_only(struct dm_pool_metadata *pmd);
 void dm_pool_metadata_read_only(struct dm_pool_metadata *pmd);
 void dm_pool_metadata_read_write(struct dm_pool_metadata *pmd);
 
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 4a2432a..53dcdfb 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -891,22 +891,43 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	}
 }
 
+static int __commit(struct pool *pool)
+{
+	int r = dm_pool_commit_metadata(pool->pmd);
+	if (r)
+		metadata_operation_failed(pool, "dm_pool_commit_metadata", r);
+
+	return r;
+}
+
+/*
+ * Commit that is confined to the metadata superblock due to pool
+ * being in PM_READ_ONLY mode (read-only process_* functions).
+ * A non-zero return indicates metadata is not writable.
+ */
+static int commit_metadata_superblock(struct pool *pool)
+{
+	WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
+
+	if (dm_pool_metadata_is_read_only(pool->pmd)) {
+		DMERR_LIMIT("%s: commit failed: pool metadata is not writable",
+			    dm_device_name(pool->pool_md));
+		return -EINVAL;
+	}
+
+	return __commit(pool);
+}
+
 /*
  * A non-zero return indicates read_only or fail_io mode.
  * Many callers don't care about the return value.
  */
 static int commit(struct pool *pool)
 {
-	int r;
-
 	if (get_pool_mode(pool) != PM_WRITE)
 		return -EINVAL;
 
-	r = dm_pool_commit_metadata(pool->pmd);
-	if (r)
-		metadata_operation_failed(pool, "dm_pool_commit_metadata", r);
-
-	return r;
+	return __commit(pool);
 }
 
 static void check_low_water_mark(struct pool *pool, dm_block_t free_blocks)
@@ -1424,7 +1445,11 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 			new_mode = PM_FAIL;
 			set_pool_mode(pool, new_mode);
 		} else {
-			dm_pool_metadata_read_only(pool->pmd);
+			bool out_of_space;
+			if (!dm_pool_is_out_of_space(pool->pmd, &out_of_space) && out_of_space)
+				dm_pool_metadata_read_write(pool->pmd);
+			else
+				dm_pool_metadata_read_only(pool->pmd);
 			pool->process_bio = process_bio_read_only;
 			pool->process_discard = process_discard;
 			pool->process_prepared_mapping = process_prepared_mapping_fail;
@@ -1701,12 +1726,17 @@ static int bind_control_target(struct pool *pool, struct dm_target *ti)
 	/*
 	 * If we were in PM_FAIL mode, rollback of metadata failed.  We're
 	 * not going to recover without a thin_repair.  So we never let the
-	 * pool move out of the old mode.  On the other hand a PM_READ_ONLY
-	 * may have been due to a lack of metadata or data space, and may
-	 * now work (ie. if the underlying devices have been resized).
+	 * pool move out of the old mode.  Similarly, if in PM_READ_ONLY mode
+	 * and the pool doesn't have free space we must not switch modes here;
+	 * pool_preresume() will transition to PM_WRITE mode if a resize succeeds.
 	 */
 	if (old_mode == PM_FAIL)
 		new_mode = old_mode;
+	else if (old_mode == PM_READ_ONLY) {
+		bool out_of_space;
+		if (!dm_pool_is_out_of_space(pool->pmd, &out_of_space) && out_of_space)
+			new_mode = old_mode;
+	}
 
 	set_pool_mode(pool, new_mode);
 
@@ -2317,6 +2347,7 @@ static int pool_preresume(struct dm_target *ti)
 	bool need_commit1, need_commit2;
 	struct pool_c *pt = ti->private;
 	struct pool *pool = pt->pool;
+	bool out_of_space = false;
 
 	/*
 	 * Take control of the pool object.
@@ -2325,6 +2356,12 @@ static int pool_preresume(struct dm_target *ti)
 	if (r)
 		return r;
 
+	if (get_pool_mode(pool) == PM_READ_ONLY) {
+		r = dm_pool_is_out_of_space(pool->pmd, &out_of_space);
+		if (r)
+			metadata_operation_failed(pool, "dm_pool_is_out_of_space", r);
+	}
+
 	r = maybe_resize_data_dev(ti, &need_commit1);
 	if (r)
 		return r;
@@ -2333,8 +2370,23 @@ static int pool_preresume(struct dm_target *ti)
 	if (r)
 		return r;
 
-	if (need_commit1 || need_commit2)
-		(void) commit(pool);
+	if (need_commit1 || need_commit2) {
+		if (out_of_space) {
+			/*
+			 * Must update metadata's superblock despite read-only mode.
+			 */
+			r = commit_metadata_superblock(pool);
+			if (r || pt->requested_pf.mode == PM_READ_ONLY)
+				dm_pool_metadata_read_only(pool->pmd);
+			else {
+				/*
+				 * If resize succeeded transition the pool to write mode.
+				 */
+				set_pool_mode(pool, PM_WRITE);
+			}
+		} else
+			(void) commit(pool);
+	}
 
 	return 0;
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 19/20] dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (17 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 18/20] dm thin: fix pool_preresume resize with heavy IO races Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  2013-12-20 23:37 ` [PATCH for-3.14 20/20] dm cache policy mq: introduce three promotion threshold tunables Mike Snitzer
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Wei Yongjun

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 64780ad..7f1aaa3 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -287,9 +287,8 @@ static struct entry *alloc_entry(struct entry_pool *ep)
 static struct entry *alloc_particular_entry(struct entry_pool *ep, dm_cblock_t cblock)
 {
 	struct entry *e = ep->entries + from_cblock(cblock);
-	list_del(&e->list);
 
-	INIT_LIST_HEAD(&e->list);
+	list_del_init(&e->list);
 	INIT_HLIST_NODE(&e->hlist);
 	ep->nr_allocated++;
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for-3.14 20/20] dm cache policy mq: introduce three promotion threshold tunables
  2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
                   ` (18 preceding siblings ...)
  2013-12-20 23:37 ` [PATCH for-3.14 19/20] dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD Mike Snitzer
@ 2013-12-20 23:37 ` Mike Snitzer
  19 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2013-12-20 23:37 UTC (permalink / raw)
  To: dm-devel; +Cc: Joe Thornber

From: Joe Thornber <ejt@redhat.com>

Internally the mq policy maintains a promotion threshold variable.  If
the hit count of a block not in the cache goes above this threshold it
gets promoted to the cache.

This patch introduces three new tunables that allow you to tweak the
promotion threshold by adding a small value.  These adjustments depend
on the io type:

   read_promote_adjustment:    READ io, default 4
   write_promote_adjustment:   WRITE io, default 8
   discard_promote_adjustment: READ/WRITE io to a discarded block, default 1

If you're trying to quickly warm a new cache device you may wish to
reduce these to encourage promotion.  Remember to switch them back to
their defaults after the cache fills though.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 Documentation/device-mapper/cache-policies.txt | 16 ++++++-
 drivers/md/dm-cache-policy-mq.c                | 64 +++++++++++++++++---------
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt
index df52a84..66c2774 100644
--- a/Documentation/device-mapper/cache-policies.txt
+++ b/Documentation/device-mapper/cache-policies.txt
@@ -40,8 +40,11 @@ on hit count on entry. The policy aims to take different cache miss
 costs into account and to adjust to varying load patterns automatically.
 
 Message and constructor argument pairs are:
-	'sequential_threshold <#nr_sequential_ios>' and
-	'random_threshold <#nr_random_ios>'.
+	'sequential_threshold <#nr_sequential_ios>'
+	'random_threshold <#nr_random_ios>'
+	'read_promote_adjustment <value>'
+	'write_promote_adjustment <value>'
+	'discard_promote_adjustment <value>'
 
 The sequential threshold indicates the number of contiguous I/Os
 required before a stream is treated as sequential.  The random threshold
@@ -55,6 +58,15 @@ since spindles tend to have good bandwidth. The io_tracker counts
 contiguous I/Os to try to spot when the io is in one of these sequential
 modes.
 
+Internally the mq policy maintains a promotion threshold variable.  If
+the hit count of a block not in the cache goes above this threshold it
+gets promoted to the cache.  The read, write and discard promote adjustment
+tunables allow you to tweak the promotion threshold by adding a small
+value based on the io type.  They default to 4, 8 and 1 respectively.
+If you're trying to quickly warm a new cache device you may wish to
+reduce these to encourage promotion.  Remember to switch them back to
+their defaults after the cache fills though.
+
 cleaner
 -------
 
diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 7f1aaa3..e63e36c 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -390,6 +390,10 @@ struct mq_policy {
 	 */
 	unsigned promote_threshold;
 
+	unsigned discard_promote_adjustment;
+	unsigned read_promote_adjustment;
+	unsigned write_promote_adjustment;
+
 	/*
 	 * The hash table allows us to quickly find an entry by origin
 	 * block.  Both pre_cache and cache entries are in here.
@@ -399,6 +403,10 @@ struct mq_policy {
 	struct hlist_head *table;
 };
 
+#define DEFAULT_DISCARD_PROMOTE_ADJUSTMENT 1
+#define DEFAULT_READ_PROMOTE_ADJUSTMENT 4
+#define DEFAULT_WRITE_PROMOTE_ADJUSTMENT 8
+
 /*----------------------------------------------------------------*/
 
 /*
@@ -641,25 +649,21 @@ static int demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock)
  * We bias towards reads, since they can be demoted at no cost if they
  * haven't been dirtied.
  */
-#define DISCARDED_PROMOTE_THRESHOLD 1
-#define READ_PROMOTE_THRESHOLD 4
-#define WRITE_PROMOTE_THRESHOLD 8
-
 static unsigned adjusted_promote_threshold(struct mq_policy *mq,
 					   bool discarded_oblock, int data_dir)
 {
 	if (data_dir == READ)
-		return mq->promote_threshold + READ_PROMOTE_THRESHOLD;
+		return mq->promote_threshold + mq->read_promote_adjustment;
 
 	if (discarded_oblock && (any_free_cblocks(mq) || any_clean_cblocks(mq))) {
 		/*
 		 * We don't need to do any copying at all, so give this a
 		 * very low threshold.
 		 */
-		return DISCARDED_PROMOTE_THRESHOLD;
+		return mq->discard_promote_adjustment;
 	}
 
-	return mq->promote_threshold + WRITE_PROMOTE_THRESHOLD;
+	return mq->promote_threshold + mq->write_promote_adjustment;
 }
 
 static bool should_promote(struct mq_policy *mq, struct entry *e,
@@ -808,7 +812,7 @@ static int no_entry_found(struct mq_policy *mq, dm_oblock_t oblock,
 			  bool can_migrate, bool discarded_oblock,
 			  int data_dir, struct policy_result *result)
 {
-	if (adjusted_promote_threshold(mq, discarded_oblock, data_dir) == 1) {
+	if (adjusted_promote_threshold(mq, discarded_oblock, data_dir) <= 1) {
 		if (can_migrate)
 			insert_in_cache(mq, oblock, result);
 		else
@@ -1134,20 +1138,28 @@ static int mq_set_config_value(struct dm_cache_policy *p,
 			       const char *key, const char *value)
 {
 	struct mq_policy *mq = to_mq_policy(p);
-	enum io_pattern pattern;
 	unsigned long tmp;
 
-	if (!strcasecmp(key, "random_threshold"))
-		pattern = PATTERN_RANDOM;
-	else if (!strcasecmp(key, "sequential_threshold"))
-		pattern = PATTERN_SEQUENTIAL;
-	else
-		return -EINVAL;
-
 	if (kstrtoul(value, 10, &tmp))
 		return -EINVAL;
 
-	mq->tracker.thresholds[pattern] = tmp;
+	if (!strcasecmp(key, "random_threshold")) {
+		mq->tracker.thresholds[PATTERN_RANDOM] = tmp;
+
+	} else if (!strcasecmp(key, "sequential_threshold")) {
+		mq->tracker.thresholds[PATTERN_SEQUENTIAL] = tmp;
+
+	} else if (!strcasecmp(key, "discard_promote_adjustment"))
+		mq->discard_promote_adjustment = tmp;
+
+	else if (!strcasecmp(key, "read_promote_adjustment"))
+		mq->read_promote_adjustment = tmp;
+
+	else if (!strcasecmp(key, "write_promote_adjustment"))
+		mq->write_promote_adjustment = tmp;
+
+	else
+		return -EINVAL;
 
 	return 0;
 }
@@ -1157,9 +1169,16 @@ static int mq_emit_config_values(struct dm_cache_policy *p, char *result, unsign
 	ssize_t sz = 0;
 	struct mq_policy *mq = to_mq_policy(p);
 
-	DMEMIT("4 random_threshold %u sequential_threshold %u",
+	DMEMIT("10 random_threshold %u "
+	       "sequential_threshold %u "
+	       "discard_promote_adjustment %u "
+	       "read_promote_adjustment %u "
+	       "write_promote_adjustment %u",
 	       mq->tracker.thresholds[PATTERN_RANDOM],
-	       mq->tracker.thresholds[PATTERN_SEQUENTIAL]);
+	       mq->tracker.thresholds[PATTERN_SEQUENTIAL],
+	       mq->discard_promote_adjustment,
+	       mq->read_promote_adjustment,
+	       mq->write_promote_adjustment);
 
 	return 0;
 }
@@ -1212,6 +1231,9 @@ static struct dm_cache_policy *mq_create(dm_cblock_t cache_size,
 	mq->hit_count = 0;
 	mq->generation = 0;
 	mq->promote_threshold = 0;
+	mq->discard_promote_adjustment = DEFAULT_DISCARD_PROMOTE_ADJUSTMENT;
+	mq->read_promote_adjustment = DEFAULT_READ_PROMOTE_ADJUSTMENT;
+	mq->write_promote_adjustment = DEFAULT_WRITE_PROMOTE_ADJUSTMENT;
 	mutex_init(&mq->lock);
 	spin_lock_init(&mq->tick_lock);
 
@@ -1243,7 +1265,7 @@ bad_pre_cache_init:
 
 static struct dm_cache_policy_type mq_policy_type = {
 	.name = "mq",
-	.version = {1, 1, 0},
+	.version = {1, 2, 0},
 	.hint_size = 4,
 	.owner = THIS_MODULE,
 	.create = mq_create
@@ -1251,7 +1273,7 @@ static struct dm_cache_policy_type mq_policy_type = {
 
 static struct dm_cache_policy_type default_policy_type = {
 	.name = "default",
-	.version = {1, 1, 0},
+	.version = {1, 2, 0},
 	.hint_size = 4,
 	.owner = THIS_MODULE,
 	.create = mq_create
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping
  2013-12-20 23:37 ` [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping Mike Snitzer
@ 2014-01-06 14:21   ` Joe Thornber
  0 siblings, 0 replies; 27+ messages in thread
From: Joe Thornber @ 2014-01-06 14:21 UTC (permalink / raw)
  To: device-mapper development

On Fri, Dec 20, 2013 at 06:37:14PM -0500, Mike Snitzer wrote:
> As additional members are added to the dm_thin_new_mapping structure
> care should be taken to make sure they get initialized before use.

ACK.

> +	memset(m, 0, sizeof(struct dm_thin_new_mapping));
> +	INIT_LIST_HEAD(&m->list);
> +	m->bio = NULL;

Remove redundant m->bio assignment?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for-3.14 05/20] dm space map metadata: limit errors in sm_metadata_new_block
  2013-12-20 23:37 ` [PATCH for-3.14 05/20] dm space map metadata: limit errors in sm_metadata_new_block Mike Snitzer
@ 2014-01-06 14:25   ` Joe Thornber
  0 siblings, 0 replies; 27+ messages in thread
From: Joe Thornber @ 2014-01-06 14:25 UTC (permalink / raw)
  To: device-mapper development

ACK

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for-3.14 06/20] dm persistent data: cleanup dm-thin specific references in text
  2013-12-20 23:37 ` [PATCH for-3.14 06/20] dm persistent data: cleanup dm-thin specific references in text Mike Snitzer
@ 2014-01-06 14:25   ` Joe Thornber
  0 siblings, 0 replies; 27+ messages in thread
From: Joe Thornber @ 2014-01-06 14:25 UTC (permalink / raw)
  To: device-mapper development

ACK

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for-3.14 07/20] dm thin: use bool rather than unsigned for flags in structures
  2013-12-20 23:37 ` [PATCH for-3.14 07/20] dm thin: use bool rather than unsigned for flags in structures Mike Snitzer
@ 2014-01-06 14:27   ` Joe Thornber
  0 siblings, 0 replies; 27+ messages in thread
From: Joe Thornber @ 2014-01-06 14:27 UTC (permalink / raw)
  To: device-mapper development

On Fri, Dec 20, 2013 at 06:37:20PM -0500, Mike Snitzer wrote:
> Also, move 'err' member in dm_thin_new_mapping structure to eliminate 4
> byte hole (reduces size from 88 bytes to 80).

ACK.  But ...

> @@ -766,6 +766,10 @@ static struct dm_thin_new_mapping *get_next_mapping(struct pool *pool)
>  
>  	memset(m, 0, sizeof(struct dm_thin_new_mapping));
>  	INIT_LIST_HEAD(&m->list);
> +	m->quiesced = false;
> +	m->prepared = false;
> +	m->pass_discard = false;
> +	m->definitely_not_shared = false;
>  	m->bio = NULL;

... this is redundant.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode
  2013-12-20 23:37 ` [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode Mike Snitzer
@ 2014-01-06 14:35   ` Joe Thornber
  2014-01-06 14:38     ` Joe Thornber
  0 siblings, 1 reply; 27+ messages in thread
From: Joe Thornber @ 2014-01-06 14:35 UTC (permalink / raw)
  To: device-mapper development

On Fri, Dec 20, 2013 at 06:37:27PM -0500, Mike Snitzer wrote:
> +static void handle_unserviceable_bio(struct pool *pool, struct bio *bio)
> +{
> +	/*
> +	 * When pool is read-only, no cell locking is needed because
> +	 * nothing is changing.
> +	 */
> +	WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
> +
> +	if (pool->no_free_space)
> +		retry_on_resume(bio);
> +	else
> +		bio_io_error(bio);
> +}

Could we rename no_free_space please? (requeue_unserviceable_bios?)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode
  2014-01-06 14:35   ` Joe Thornber
@ 2014-01-06 14:38     ` Joe Thornber
  0 siblings, 0 replies; 27+ messages in thread
From: Joe Thornber @ 2014-01-06 14:38 UTC (permalink / raw)
  To: device-mapper development

On Mon, Jan 06, 2014 at 02:35:24PM +0000, Joe Thornber wrote:
> Could we rename no_free_space please? (requeue_unserviceable_bios?)

Never mind, a later patch removes it completely.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-01-06 14:38 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-20 23:37 [PATCH for-next 00/20] dm: current patches queued in linux-dm.git for-next Mike Snitzer
2013-12-20 23:37 ` [PATCH maybe-for-3.13 01/20] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping Mike Snitzer
2014-01-06 14:21   ` Joe Thornber
2013-12-20 23:37 ` [PATCH maybe-for-3.13 02/20] dm thin: fix discard support to a previously shared block Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 03/20] dm table: remove unused buggy code that extends the targets array Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 04/20] dm delay: use per-bio data instead of a mempool and slab cache Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 05/20] dm space map metadata: limit errors in sm_metadata_new_block Mike Snitzer
2014-01-06 14:25   ` Joe Thornber
2013-12-20 23:37 ` [PATCH for-3.14 06/20] dm persistent data: cleanup dm-thin specific references in text Mike Snitzer
2014-01-06 14:25   ` Joe Thornber
2013-12-20 23:37 ` [PATCH for-3.14 07/20] dm thin: use bool rather than unsigned for flags in structures Mike Snitzer
2014-01-06 14:27   ` Joe Thornber
2013-12-20 23:37 ` [PATCH for-3.14 08/20] dm thin: return error from alloc_data_block if pool is not in write mode Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 09/20] dm thin: add mappings to end of prepared_* lists Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 10/20] dm thin: factor out check_low_water_mark and use bools Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 11/20] dm thin: handle metadata failures more consistently Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 12/20] dm thin: log info when growing the data or metadata device Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 13/20] dm thin: cleanup and improve no space handling Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 14/20] dm thin: requeue bios to DM core if no_free_space and in read-only mode Mike Snitzer
2014-01-06 14:35   ` Joe Thornber
2014-01-06 14:38     ` Joe Thornber
2013-12-20 23:37 ` [PATCH for-3.14 15/20] dm thin: add error_if_no_space feature Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 16/20] dm thin: eliminate the no_free_space flag Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 17/20] dm thin: fix set_pool_mode exposed pool operation races Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 18/20] dm thin: fix pool_preresume resize with heavy IO races Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 19/20] dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD Mike Snitzer
2013-12-20 23:37 ` [PATCH for-3.14 20/20] dm cache policy mq: introduce three promotion threshold tunables Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.