* [PATCH 02/14] dm-thin: remove some documentation for the unimplemented 'trim' target message
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 03/14] dm_thin: dm_sm_root_size() was being called for the wrong space-map Joe Thornber
` (12 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
I'd planned a 'trim' target message for shrinking thin devices, but
this is better handled via the discard ioctl.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
Documentation/device-mapper/thin-provisioning.txt | 10 ----------
1 files changed, 0 insertions(+), 10 deletions(-)
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 801d9d1..981aebb 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -237,16 +237,6 @@ iii) Messages
Deletes a thin device. Irreversible.
- trim <dev id> <new size in sectors>
-
- Delete mappings from the end of a thin device. Irreversible.
- You might want to use this if you're reducing the size of
- your thinly-provisioned device. In many cases, due to the
- sharing of blocks between devices, it is not possible to
- determine in advance how much space 'trim' will release. (In
- future a userspace tool might be able to perform this
- calculation.)
-
set_transaction_id <current id> <new id>
Userland volume managers, such as LVM, need a way to
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 03/14] dm_thin: dm_sm_root_size() was being called for the wrong space-map
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
2012-03-16 15:22 ` [PATCH 02/14] dm-thin: remove some documentation for the unimplemented 'trim' target message Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 04/14] dm_thin: tweak a comment Joe Thornber
` (11 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Fix a harmless typo.
The root is a chunk of data that gets written to the superblock. This
data is used to recreate the space map when opening a metadata area.
We have two space maps; one tracking space on the metadata device and
one of the data device. Both of these use the same format for their
root, so this typo was harmless.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin-metadata.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 237571a..a680c76 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -614,7 +614,7 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
if (r < 0)
goto out;
- r = dm_sm_root_size(pmd->metadata_sm, &data_len);
+ r = dm_sm_root_size(pmd->data_sm, &data_len);
if (r < 0)
goto out;
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 04/14] dm_thin: tweak a comment
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
2012-03-16 15:22 ` [PATCH 02/14] dm-thin: remove some documentation for the unimplemented 'trim' target message Joe Thornber
2012-03-16 15:22 ` [PATCH 03/14] dm_thin: dm_sm_root_size() was being called for the wrong space-map Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 05/14] dm_btree: remove redundant arg from value_ptr() Joe Thornber
` (10 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
An extraneous 's'.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 5abcc9b..2620a13 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -72,7 +72,7 @@
* missed out if the io covers the block. (schedule_copy).
*
* iv) insert the new mapping into the origin's btree
- * (process_prepared_mappings). This act of inserting breaks some
+ * (process_prepared_mapping). This act of inserting breaks some
* sharing of btree nodes between the two devices. Breaking sharing only
* effects the btree of that specific device. Btrees for the other
* devices that share the block never change. The btree for the origin
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 05/14] dm_btree: remove redundant arg from value_ptr()
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (2 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 04/14] dm_thin: tweak a comment Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 06/14] dm_btree: fix rebalancing of 3 nodes after remove Joe Thornber
` (9 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Now that the value_size is held within every node of the btrees we can
remove this argument from value_ptr().
For the last few months a BUG_ON has been checking this argument is
the same as that held in the node. No issues were reported. So this
is a safe change.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/persistent-data/dm-btree-internal.h | 7 +----
drivers/md/persistent-data/dm-btree-remove.c | 28 ++++++++++++------------
drivers/md/persistent-data/dm-btree.c | 27 +++++++++++------------
3 files changed, 29 insertions(+), 33 deletions(-)
diff --git a/drivers/md/persistent-data/dm-btree-internal.h b/drivers/md/persistent-data/dm-btree-internal.h
index d279c76..5709bfe 100644
--- a/drivers/md/persistent-data/dm-btree-internal.h
+++ b/drivers/md/persistent-data/dm-btree-internal.h
@@ -108,12 +108,9 @@ static inline void *value_base(struct node *n)
return &n->keys[le32_to_cpu(n->header.max_entries)];
}
-/*
- * FIXME: Now that value size is stored in node we don't need the third parm.
- */
-static inline void *value_ptr(struct node *n, uint32_t index, size_t value_size)
+static inline void *value_ptr(struct node *n, uint32_t index)
{
- BUG_ON(value_size != le32_to_cpu(n->header.value_size));
+ uint32_t value_size = le32_to_cpu(n->header.value_size);
return value_base(n) + (value_size * index);
}
diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c
index 023fbc2..8576d56 100644
--- a/drivers/md/persistent-data/dm-btree-remove.c
+++ b/drivers/md/persistent-data/dm-btree-remove.c
@@ -61,20 +61,20 @@ static void node_shift(struct node *n, int shift)
if (shift < 0) {
shift = -shift;
BUG_ON(shift > nr_entries);
- BUG_ON((void *) key_ptr(n, shift) >= value_ptr(n, shift, value_size));
+ BUG_ON((void *) key_ptr(n, shift) >= value_ptr(n, shift));
memmove(key_ptr(n, 0),
key_ptr(n, shift),
(nr_entries - shift) * sizeof(__le64));
- memmove(value_ptr(n, 0, value_size),
- value_ptr(n, shift, value_size),
+ memmove(value_ptr(n, 0),
+ value_ptr(n, shift),
(nr_entries - shift) * value_size);
} else {
BUG_ON(nr_entries + shift > le32_to_cpu(n->header.max_entries));
memmove(key_ptr(n, shift),
key_ptr(n, 0),
nr_entries * sizeof(__le64));
- memmove(value_ptr(n, shift, value_size),
- value_ptr(n, 0, value_size),
+ memmove(value_ptr(n, shift),
+ value_ptr(n, 0),
nr_entries * value_size);
}
}
@@ -91,16 +91,16 @@ static void node_copy(struct node *left, struct node *right, int shift)
memcpy(key_ptr(left, nr_left),
key_ptr(right, 0),
shift * sizeof(__le64));
- memcpy(value_ptr(left, nr_left, value_size),
- value_ptr(right, 0, value_size),
+ memcpy(value_ptr(left, nr_left),
+ value_ptr(right, 0),
shift * value_size);
} else {
BUG_ON(shift > le32_to_cpu(right->header.max_entries));
memcpy(key_ptr(right, 0),
key_ptr(left, nr_left - shift),
shift * sizeof(__le64));
- memcpy(value_ptr(right, 0, value_size),
- value_ptr(left, nr_left - shift, value_size),
+ memcpy(value_ptr(right, 0),
+ value_ptr(left, nr_left - shift),
shift * value_size);
}
}
@@ -120,8 +120,8 @@ static void delete_at(struct node *n, unsigned index)
key_ptr(n, index + 1),
nr_to_copy * sizeof(__le64));
- memmove(value_ptr(n, index, value_size),
- value_ptr(n, index + 1, value_size),
+ memmove(value_ptr(n, index),
+ value_ptr(n, index + 1),
nr_to_copy * value_size);
}
@@ -175,7 +175,7 @@ static int init_child(struct dm_btree_info *info, struct node *parent,
if (inc)
inc_children(info->tm, result->n, &le64_type);
- *((__le64 *) value_ptr(parent, index, sizeof(__le64))) =
+ *((__le64 *) value_ptr(parent, index)) =
cpu_to_le64(dm_block_location(result->block));
return 0;
@@ -496,7 +496,7 @@ static int remove_raw(struct shadow_spine *s, struct dm_btree_info *info,
*/
if (shadow_has_parent(s)) {
__le64 location = cpu_to_le64(dm_block_location(shadow_current(s)));
- memcpy(value_ptr(dm_block_data(shadow_parent(s)), i, sizeof(__le64)),
+ memcpy(value_ptr(dm_block_data(shadow_parent(s)), i),
&location, sizeof(__le64));
}
@@ -553,7 +553,7 @@ int dm_btree_remove(struct dm_btree_info *info, dm_block_t root,
if (info->value_type.dec)
info->value_type.dec(info->value_type.context,
- value_ptr(n, index, info->value_type.size));
+ value_ptr(n, index));
delete_at(n, index);
}
diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c
index bd1e7ff..d12b2cc 100644
--- a/drivers/md/persistent-data/dm-btree.c
+++ b/drivers/md/persistent-data/dm-btree.c
@@ -74,8 +74,7 @@ void inc_children(struct dm_transaction_manager *tm, struct node *n,
dm_tm_inc(tm, value64(n, i));
else if (vt->inc)
for (i = 0; i < nr_entries; i++)
- vt->inc(vt->context,
- value_ptr(n, i, vt->size));
+ vt->inc(vt->context, value_ptr(n, i));
}
static int insert_at(size_t value_size, struct node *node, unsigned index,
@@ -281,7 +280,7 @@ int dm_btree_del(struct dm_btree_info *info, dm_block_t root)
for (i = 0; i < f->nr_children; i++)
info->value_type.dec(info->value_type.context,
- value_ptr(f->n, i, info->value_type.size));
+ value_ptr(f->n, i));
}
f->current_child = f->nr_children;
}
@@ -320,7 +319,7 @@ static int btree_lookup_raw(struct ro_spine *s, dm_block_t block, uint64_t key,
} while (!(flags & LEAF_NODE));
*result_key = le64_to_cpu(ro_node(s)->keys[i]);
- memcpy(v, value_ptr(ro_node(s), i, value_size), value_size);
+ memcpy(v, value_ptr(ro_node(s), i), value_size);
return 0;
}
@@ -432,7 +431,7 @@ static int btree_split_sibling(struct shadow_spine *s, dm_block_t root,
size = le32_to_cpu(ln->header.flags) & INTERNAL_NODE ?
sizeof(uint64_t) : s->info->value_type.size;
- memcpy(value_ptr(rn, 0, size), value_ptr(ln, nr_left, size),
+ memcpy(value_ptr(rn, 0), value_ptr(ln, nr_left),
size * nr_right);
/*
@@ -443,7 +442,7 @@ static int btree_split_sibling(struct shadow_spine *s, dm_block_t root,
pn = dm_block_data(parent);
location = cpu_to_le64(dm_block_location(left));
__dm_bless_for_disk(&location);
- memcpy_disk(value_ptr(pn, parent_index, sizeof(__le64)),
+ memcpy_disk(value_ptr(pn, parent_index),
&location, sizeof(__le64));
location = cpu_to_le64(dm_block_location(right));
@@ -529,8 +528,8 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)
size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ?
sizeof(__le64) : s->info->value_type.size;
- memcpy(value_ptr(ln, 0, size), value_ptr(pn, 0, size), nr_left * size);
- memcpy(value_ptr(rn, 0, size), value_ptr(pn, nr_left, size),
+ memcpy(value_ptr(ln, 0), value_ptr(pn, 0), nr_left * size);
+ memcpy(value_ptr(rn, 0), value_ptr(pn, nr_left),
nr_right * size);
/* new_parent should just point to l and r now */
@@ -545,12 +544,12 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)
val = cpu_to_le64(dm_block_location(left));
__dm_bless_for_disk(&val);
pn->keys[0] = ln->keys[0];
- memcpy_disk(value_ptr(pn, 0, sizeof(__le64)), &val, sizeof(__le64));
+ memcpy_disk(value_ptr(pn, 0), &val, sizeof(__le64));
val = cpu_to_le64(dm_block_location(right));
__dm_bless_for_disk(&val);
pn->keys[1] = rn->keys[0];
- memcpy_disk(value_ptr(pn, 1, sizeof(__le64)), &val, sizeof(__le64));
+ memcpy_disk(value_ptr(pn, 1), &val, sizeof(__le64));
/*
* rejig the spine. This is ugly, since it knows too
@@ -595,7 +594,7 @@ static int btree_insert_raw(struct shadow_spine *s, dm_block_t root,
__le64 location = cpu_to_le64(dm_block_location(shadow_current(s)));
__dm_bless_for_disk(&location);
- memcpy_disk(value_ptr(dm_block_data(shadow_parent(s)), i, sizeof(uint64_t)),
+ memcpy_disk(value_ptr(dm_block_data(shadow_parent(s)), i),
&location, sizeof(__le64));
}
@@ -710,12 +709,12 @@ static int insert(struct dm_btree_info *info, dm_block_t root,
(!info->value_type.equal ||
!info->value_type.equal(
info->value_type.context,
- value_ptr(n, index, info->value_type.size),
+ value_ptr(n, index),
value))) {
info->value_type.dec(info->value_type.context,
- value_ptr(n, index, info->value_type.size));
+ value_ptr(n, index));
}
- memcpy_disk(value_ptr(n, index, info->value_type.size),
+ memcpy_disk(value_ptr(n, index),
value, info->value_type.size);
}
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 06/14] dm_btree: fix rebalancing of 3 nodes after remove
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (3 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 05/14] dm_btree: remove redundant arg from value_ptr() Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 07/14] dm_space_map: remove entries from the ref_count tree if they're no longer needed Joe Thornber
` (8 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Bugfix.
When we remove an entry from a node we sometimes rebalance with it's
two neighbours. This wasn't being done correctly; in some cases
entries have to move all the way from the right neighbour to the left
neighbour, or vice versa. This patch pretty much re-writes the
balancing code.
This code is barely used currently; only when you delete a thin
device, and then only if you have hundreds of them in the same pool.
Once we have discard support, which removes mappings, this will be used
much more heavily.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/persistent-data/dm-btree-remove.c | 174 +++++++++++++++-----------
1 files changed, 99 insertions(+), 75 deletions(-)
diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c
index 8576d56..aa71e23 100644
--- a/drivers/md/persistent-data/dm-btree-remove.c
+++ b/drivers/md/persistent-data/dm-btree-remove.c
@@ -128,18 +128,9 @@ static void delete_at(struct node *n, unsigned index)
n->header.nr_entries = cpu_to_le32(nr_entries - 1);
}
-static unsigned del_threshold(struct node *n)
-{
- return le32_to_cpu(n->header.max_entries) / 3;
-}
-
static unsigned merge_threshold(struct node *n)
{
- /*
- * The extra one is because we know we're potentially going to
- * delete an entry.
- */
- return 2 * (le32_to_cpu(n->header.max_entries) / 3) + 1;
+ return le32_to_cpu(n->header.max_entries) / 3;
}
struct child {
@@ -188,6 +179,15 @@ static int exit_child(struct dm_btree_info *info, struct child *c)
static void shift(struct node *left, struct node *right, int count)
{
+ uint32_t nr_left = le32_to_cpu(left->header.nr_entries);
+ uint32_t nr_right = le32_to_cpu(right->header.nr_entries);
+ uint32_t max_entries = le32_to_cpu(left->header.max_entries);
+ uint32_t r_max_entries = le32_to_cpu(right->header.max_entries);
+
+ BUG_ON(max_entries != r_max_entries);
+ BUG_ON(nr_left - count > max_entries);
+ BUG_ON(nr_right + count > max_entries);
+
if (!count)
return;
@@ -199,13 +199,8 @@ static void shift(struct node *left, struct node *right, int count)
node_shift(right, count);
}
- left->header.nr_entries =
- cpu_to_le32(le32_to_cpu(left->header.nr_entries) - count);
- BUG_ON(le32_to_cpu(left->header.nr_entries) > le32_to_cpu(left->header.max_entries));
-
- right->header.nr_entries =
- cpu_to_le32(le32_to_cpu(right->header.nr_entries) + count);
- BUG_ON(le32_to_cpu(right->header.nr_entries) > le32_to_cpu(right->header.max_entries));
+ left->header.nr_entries = cpu_to_le32(nr_left - count);
+ right->header.nr_entries = cpu_to_le32(nr_right + count);
}
static void __rebalance2(struct dm_btree_info *info, struct node *parent,
@@ -215,8 +210,9 @@ static void __rebalance2(struct dm_btree_info *info, struct node *parent,
struct node *right = r->n;
uint32_t nr_left = le32_to_cpu(left->header.nr_entries);
uint32_t nr_right = le32_to_cpu(right->header.nr_entries);
+ unsigned threshold = 2 * merge_threshold(left) + 1;
- if (nr_left + nr_right <= merge_threshold(left)) {
+ if (nr_left + nr_right < threshold) {
/*
* Merge
*/
@@ -234,9 +230,6 @@ static void __rebalance2(struct dm_btree_info *info, struct node *parent,
* Rebalance.
*/
unsigned target_left = (nr_left + nr_right) / 2;
- unsigned shift_ = nr_left - target_left;
- BUG_ON(le32_to_cpu(left->header.max_entries) <= nr_left - shift_);
- BUG_ON(le32_to_cpu(right->header.max_entries) <= nr_right + shift_);
shift(left, right, nr_left - target_left);
*key_ptr(parent, r->index) = right->keys[0];
}
@@ -272,6 +265,84 @@ static int rebalance2(struct shadow_spine *s, struct dm_btree_info *info,
return exit_child(info, &right);
}
+/*
+ * We dump as many entries from center as possible into left, then the rest
+ * in right, then rebalance2. This wastes some cpu, but I want something
+ * simple atm.
+ */
+static void delete_center_node(struct dm_btree_info *info, struct node *parent,
+ struct child *l, struct child *c, struct child *r,
+ struct node *left, struct node *center, struct node *right,
+ uint32_t nr_left, uint32_t nr_center, uint32_t nr_right)
+{
+ uint32_t max_entries = le32_to_cpu(left->header.max_entries);
+ unsigned shift = min(max_entries - nr_left, nr_center);
+
+ BUG_ON(nr_left + shift > max_entries);
+ node_copy(left, center, -shift);
+ left->header.nr_entries = cpu_to_le32(nr_left + shift);
+
+ if (shift != nr_center) {
+ shift = nr_center - shift;
+ BUG_ON((nr_right + shift) > max_entries);
+ node_shift(right, shift);
+ node_copy(center, right, shift);
+ right->header.nr_entries = cpu_to_le32(nr_right + shift);
+ }
+ *key_ptr(parent, r->index) = right->keys[0];
+
+ delete_at(parent, c->index);
+ r->index--;
+
+ dm_tm_dec(info->tm, dm_block_location(c->block));
+ __rebalance2(info, parent, l, r);
+}
+
+/*
+ * Redistributes entries among 3 sibling nodes.
+ */
+static void redistribute3(struct dm_btree_info *info, struct node *parent,
+ struct child *l, struct child *c, struct child *r,
+ struct node *left, struct node *center, struct node *right,
+ uint32_t nr_left, uint32_t nr_center, uint32_t nr_right)
+{
+ int s;
+ uint32_t max_entries = le32_to_cpu(left->header.max_entries);
+ unsigned target = (nr_left + nr_center + nr_right) / 3;
+ BUG_ON(target > max_entries);
+
+ if (nr_left < nr_right) {
+ s = nr_left - target;
+
+ if (s < 0 && nr_center < -s) {
+ /* not enough in central node */
+ shift(left, center, nr_center);
+ s = nr_center - target;
+ shift(left, right, s);
+ nr_right += s;
+ } else
+ shift(left, center, s);
+
+ shift(center, right, target - nr_right);
+
+ } else {
+ s = target - nr_right;
+ if (s > 0 && nr_center < s) {
+ /* not enough in central node */
+ shift(center, right, nr_center);
+ s = target - nr_center;
+ shift(left, right, s);
+ nr_left -= s;
+ } else
+ shift(center, right, s);
+
+ shift(left, center, nr_left - target);
+ }
+
+ *key_ptr(parent, c->index) = center->keys[0];
+ *key_ptr(parent, r->index) = right->keys[0];
+}
+
static void __rebalance3(struct dm_btree_info *info, struct node *parent,
struct child *l, struct child *c, struct child *r)
{
@@ -282,62 +353,18 @@ static void __rebalance3(struct dm_btree_info *info, struct node *parent,
uint32_t nr_left = le32_to_cpu(left->header.nr_entries);
uint32_t nr_center = le32_to_cpu(center->header.nr_entries);
uint32_t nr_right = le32_to_cpu(right->header.nr_entries);
- uint32_t max_entries = le32_to_cpu(left->header.max_entries);
- unsigned target;
+ unsigned threshold = merge_threshold(left) * 4 + 1;
BUG_ON(left->header.max_entries != center->header.max_entries);
BUG_ON(center->header.max_entries != right->header.max_entries);
- if (((nr_left + nr_center + nr_right) / 2) < merge_threshold(center)) {
- /*
- * Delete center node:
- *
- * We dump as many entries from center as possible into
- * left, then the rest in right, then rebalance2. This
- * wastes some cpu, but I want something simple atm.
- */
- unsigned shift = min(max_entries - nr_left, nr_center);
-
- BUG_ON(nr_left + shift > max_entries);
- node_copy(left, center, -shift);
- left->header.nr_entries = cpu_to_le32(nr_left + shift);
-
- if (shift != nr_center) {
- shift = nr_center - shift;
- BUG_ON((nr_right + shift) >= max_entries);
- node_shift(right, shift);
- node_copy(center, right, shift);
- right->header.nr_entries = cpu_to_le32(nr_right + shift);
- }
- *key_ptr(parent, r->index) = right->keys[0];
-
- delete_at(parent, c->index);
- r->index--;
-
- dm_tm_dec(info->tm, dm_block_location(c->block));
- __rebalance2(info, parent, l, r);
-
- return;
- }
-
- /*
- * Rebalance
- */
- target = (nr_left + nr_center + nr_right) / 3;
- BUG_ON(target > max_entries);
-
- /*
- * Adjust the left node
- */
- shift(left, center, nr_left - target);
-
- /*
- * Adjust the right node
- */
- shift(center, right, target - nr_right);
- *key_ptr(parent, c->index) = center->keys[0];
- *key_ptr(parent, r->index) = right->keys[0];
+ if ((nr_left + nr_center + nr_right) < threshold)
+ delete_center_node(info, parent, l, c, r, left, center, right,
+ nr_left, nr_center, nr_right);
+ else
+ redistribute3(info, parent, l, c, r, left, center, right,
+ nr_left, nr_center, nr_right);
}
static int rebalance3(struct shadow_spine *s, struct dm_btree_info *info,
@@ -441,9 +468,6 @@ static int rebalance_children(struct shadow_spine *s,
if (r)
return r;
- if (child_entries > del_threshold(n))
- return 0;
-
has_left_sibling = i > 0;
has_right_sibling = i < (le32_to_cpu(n->header.nr_entries) - 1);
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 07/14] dm_space_map: remove entries from the ref_count tree if they're no longer needed
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (4 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 06/14] dm_btree: fix rebalancing of 3 nodes after remove Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 08/14] dm_thin: add support for read-only external snapshot origins Joe Thornber
` (7 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Enhancement.
Ref counts are stored in two places: a bitmap if the ref_count is
below 3, or a btree of uint32_t if 3 or above.
When a ref_count that was above 3 drops below we can remove it from
the tree and save some metadata space. This removal was commented out
before because I was unsure why this was causing under populated btree
nodes. Earlier patches have fixed this issue.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/persistent-data/dm-space-map-common.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/drivers/md/persistent-data/dm-space-map-common.c b/drivers/md/persistent-data/dm-space-map-common.c
index df2494c..ff3beed 100644
--- a/drivers/md/persistent-data/dm-space-map-common.c
+++ b/drivers/md/persistent-data/dm-space-map-common.c
@@ -405,8 +405,6 @@ int sm_ll_insert(struct ll_disk *ll, dm_block_t b,
if (r < 0)
return r;
-#if 0
- /* FIXME: dm_btree_remove doesn't handle this yet */
if (old > 2) {
r = dm_btree_remove(&ll->ref_count_info,
ll->ref_count_root,
@@ -414,7 +412,6 @@ int sm_ll_insert(struct ll_disk *ll, dm_block_t b,
if (r)
return r;
}
-#endif
} else {
__le32 le_rc = cpu_to_le32(ref_count);
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 08/14] dm_thin: add support for read-only external snapshot origins
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (5 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 07/14] dm_space_map: remove entries from the ref_count tree if they're no longer needed Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 09/14] dm_thin: foundation for discard support Joe Thornber
` (6 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Enhancement.
You can use an external, _read only_, device as an origin for a thin
device. Any read to an unprovisioned area of the thin device will be
passed through to the origin. Writes trigger allocation of new blocks
as usual.
One possible use case for this would be VM hosts who want to run
guests on thinp volumes, but have the base image on another device
(possibly shared between many VMs).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
Documentation/device-mapper/thin-provisioning.txt | 38 +++++++++-
drivers/md/dm-thin.c | 84 +++++++++++++++++---
2 files changed, 108 insertions(+), 14 deletions(-)
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 981aebb..0fc0b0f 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -167,6 +167,38 @@ ii) Using an internal snapshot.
dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
+External snapshots
+------------------
+
+You can use an external, _read only_, device as an origin for a thin
+device. Any read to an unprovisioned area of the thin device will be
+passed through to the origin. Writes trigger allocation of new blocks
+as usual.
+
+One possible use case for this would be VM hosts who want to run
+guests on thinp volumes, but have the base image on another device
+(possibly shared between many VMs).
+
+You must not write to the origin device if you use this technique! Of
+course you can write to the thin device, and take internal snapshots
+of the thin.
+
+i) Creating an external snapshot
+
+ Same as creating a thin device. You don't need to mention the
+ origin at this stage.
+
+ dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+ii) Using an external snapshot.
+
+ Add an extra parameter to the thin target specifying the origin:
+
+ dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
+
+ All descendants (internal snapshots) of an external snapshot will
+ need the extra origin argument.
+
Deactivation
------------
@@ -252,7 +284,7 @@ iii) Messages
i) Constructor
- thin <pool dev> <dev id>
+ thin <pool dev> <dev id> [external origin id]
pool dev:
the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
@@ -261,6 +293,10 @@ i) Constructor
the internal device identifier of the device to be
activated.
+ external origin dev:
+ a block device; reads to unprovisioned areas of the thin target
+ will be mapped to here.
+
The pool doesn't store any size against the thin devices. If you
load a thin target that is smaller than you've been using previously,
then you'll have no access to blocks mapped beyond the end. If you
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 2620a13..4b57e69 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -557,6 +557,7 @@ struct pool_c {
*/
struct thin_c {
struct dm_dev *pool_dev;
+ struct dm_dev *origin_dev;
dm_thin_id dev_id;
struct pool *pool;
@@ -674,14 +675,16 @@ static void remap(struct thin_c *tc, struct bio *bio, dm_block_t block)
(bio->bi_sector & pool->offset_mask);
}
-static void remap_and_issue(struct thin_c *tc, struct bio *bio,
- dm_block_t block)
+static void remap_to_origin(struct thin_c *tc, struct bio *bio)
+{
+ bio->bi_bdev = tc->origin_dev->bdev;
+}
+
+static void issue(struct thin_c *tc, struct bio *bio)
{
struct pool *pool = tc->pool;
unsigned long flags;
- remap(tc, bio, block);
-
/*
* Batch together any FUA/FLUSH bios we find and then issue
* a single commit for them in process_deferred_bios().
@@ -694,6 +697,19 @@ static void remap_and_issue(struct thin_c *tc, struct bio *bio,
generic_make_request(bio);
}
+static void remap_to_origin_and_issue(struct thin_c *tc, struct bio *bio)
+{
+ remap_to_origin(tc, bio);
+ issue(tc, bio);
+}
+
+static void remap_and_issue(struct thin_c *tc, struct bio *bio,
+ dm_block_t block)
+{
+ remap(tc, bio, block);
+ issue(tc, bio);
+}
+
/*
* wake_worker() is used when new work is queued and when pool_resume is
* ready to continue deferred IO processing.
@@ -940,7 +956,8 @@ static struct new_mapping *get_next_mapping(struct pool *pool)
}
static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
- dm_block_t data_origin, dm_block_t data_dest,
+ struct dm_dev *origin, dm_block_t data_origin,
+ dm_block_t data_dest,
struct cell *cell, struct bio *bio)
{
int r;
@@ -972,7 +989,7 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
} else {
struct dm_io_region from, to;
- from.bdev = tc->pool_dev->bdev;
+ from.bdev = origin->bdev;
from.sector = data_origin * pool->sectors_per_block;
from.count = pool->sectors_per_block;
@@ -990,6 +1007,22 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
}
}
+static void schedule_internal_copy(struct thin_c *tc, dm_block_t virt_block,
+ dm_block_t data_origin, dm_block_t data_dest,
+ struct cell *cell, struct bio *bio)
+{
+ schedule_copy(tc, virt_block, tc->pool_dev,
+ data_origin, data_dest, cell, bio);
+}
+
+static void schedule_external_copy(struct thin_c *tc, dm_block_t virt_block,
+ dm_block_t data_dest,
+ struct cell *cell, struct bio *bio)
+{
+ schedule_copy(tc, virt_block, tc->origin_dev,
+ virt_block, data_dest, cell, bio);
+}
+
static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
dm_block_t data_block, struct cell *cell,
struct bio *bio)
@@ -1136,8 +1169,8 @@ static void break_sharing(struct thin_c *tc, struct bio *bio, dm_block_t block,
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
- schedule_copy(tc, block, lookup_result->block,
- data_block, cell, bio);
+ schedule_internal_copy(tc, block, lookup_result->block,
+ data_block, cell, bio);
break;
case -ENOSPC:
@@ -1211,7 +1244,10 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
r = alloc_data_block(tc, &data_block);
switch (r) {
case 0:
- schedule_zero(tc, block, data_block, cell, bio);
+ if (tc->origin_dev)
+ schedule_external_copy(tc, block, data_block, cell, bio);
+ else
+ schedule_zero(tc, block, data_block, cell, bio);
break;
case -ENOSPC:
@@ -1262,7 +1298,11 @@ static void process_bio(struct thin_c *tc, struct bio *bio)
break;
case -ENODATA:
- provision_block(tc, bio, block, cell);
+ if (bio_data_dir(bio) == READ && tc->origin_dev) {
+ cell_release_singleton(cell, bio);
+ remap_to_origin_and_issue(tc, bio);
+ } else
+ provision_block(tc, bio, block, cell);
break;
default:
@@ -2225,6 +2265,8 @@ static void thin_dtr(struct dm_target *ti)
__pool_dec(tc->pool);
dm_pool_close_thin_device(tc->td);
dm_put_device(ti, tc->pool_dev);
+ if (tc->origin_dev)
+ dm_put_device(ti, tc->origin_dev);
kfree(tc);
mutex_unlock(&dm_thin_pool_table.mutex);
@@ -2233,21 +2275,22 @@ static void thin_dtr(struct dm_target *ti)
/*
* Thin target parameters:
*
- * <pool_dev> <dev_id>
+ * <pool_dev> <dev_id> [origin_dev]
*
* pool_dev: the path to the pool (eg, /dev/mapper/my_pool)
* dev_id: the internal device identifier
+ * origin_dev: a device external to the pool that should act as the origin
*/
static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
int r;
struct thin_c *tc;
- struct dm_dev *pool_dev;
+ struct dm_dev *pool_dev, *origin_dev;
struct mapped_device *pool_md;
mutex_lock(&dm_thin_pool_table.mutex);
- if (argc != 2) {
+ if (argc != 2 && argc != 3) {
ti->error = "Invalid argument count";
r = -EINVAL;
goto out_unlock;
@@ -2260,6 +2303,15 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
goto out_unlock;
}
+ if (argc == 3) {
+ r = dm_get_device(ti, argv[2], FMODE_READ, &origin_dev);
+ if (r) {
+ ti->error = "Error opening origin device";
+ goto bad_origin_dev;
+ }
+ tc->origin_dev = origin_dev;
+ }
+
r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &pool_dev);
if (r) {
ti->error = "Error opening pool device";
@@ -2312,6 +2364,9 @@ bad_pool_lookup:
bad_common:
dm_put_device(ti, tc->pool_dev);
bad_pool_dev:
+ if (tc->origin_dev)
+ dm_put_device(ti, tc->origin_dev);
+bad_origin_dev:
kfree(tc);
out_unlock:
mutex_unlock(&dm_thin_pool_table.mutex);
@@ -2343,6 +2398,7 @@ static int thin_status(struct dm_target *ti, status_type_t type,
ssize_t sz = 0;
dm_block_t mapped, highest;
char buf[BDEVNAME_SIZE];
+ char buf2[BDEVNAME_SIZE];
struct thin_c *tc = ti->private;
if (!tc->td)
@@ -2370,6 +2426,8 @@ static int thin_status(struct dm_target *ti, status_type_t type,
DMEMIT("%s %lu",
format_dev_t(buf, tc->pool_dev->bdev->bd_dev),
(unsigned long) tc->dev_id);
+ if (tc->origin_dev)
+ DMEMIT(" %s", format_dev_t(buf2, tc->origin_dev->bdev->bd_dev));
break;
}
}
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 09/14] dm_thin: foundation for discard support
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (6 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 08/14] dm_thin: add support for read-only external snapshot origins Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 10/14] dm_thin: add support for REQ_DISCARD Joe Thornber
` (5 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
This patch contains a lot of the ground work needed for supporting
discard.
- The thin target now has an endio function, that replaces
shared_read_endio.
- An explicit 'quiesced' flag has been introduced into the new_mapping
structure. Before, this was implicitly indicated by m->list being
empty.
- The map_info->ptr remains constant for the duration of a bio's trip
through thinp. Making it easier to reason about it.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 125 ++++++++++++++++++++++++++++----------------------
1 files changed, 70 insertions(+), 55 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 4b57e69..3d60d8c 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -531,7 +531,7 @@ struct pool {
struct bio_list retry_on_resume_list;
- struct deferred_set ds; /* FIXME: move to thin_c */
+ struct deferred_set shared_read_ds;
struct new_mapping *next_mapping;
mempool_t *mapping_pool;
@@ -626,6 +626,12 @@ static struct pool *__pool_table_lookup_metadata_dev(struct block_device *md_dev
/*----------------------------------------------------------------*/
+struct endio_hook {
+ struct thin_c *tc;
+ struct deferred_entry *shared_read_entry;
+ struct new_mapping *overwrite_mapping;
+};
+
static void __requeue_bio_list(struct thin_c *tc, struct bio_list *master)
{
struct bio *bio;
@@ -636,7 +642,8 @@ static void __requeue_bio_list(struct thin_c *tc, struct bio_list *master)
bio_list_init(master);
while ((bio = bio_list_pop(&bios))) {
- if (dm_get_mapinfo(bio)->ptr == tc)
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+ if (h->tc == tc)
bio_endio(bio, DM_ENDIO_REQUEUE);
else
bio_list_add(master, bio);
@@ -724,16 +731,11 @@ static void wake_worker(struct pool *pool)
/*
* Bio endio functions.
*/
-struct endio_hook {
- struct thin_c *tc;
- bio_end_io_t *saved_bi_end_io;
- struct deferred_entry *entry;
-};
-
struct new_mapping {
struct list_head list;
- int prepared;
+ unsigned quiesced:1;
+ unsigned prepared:1;
struct thin_c *tc;
dm_block_t virt_block;
@@ -755,7 +757,7 @@ static void __maybe_add_mapping(struct new_mapping *m)
{
struct pool *pool = m->tc->pool;
- if (list_empty(&m->list) && m->prepared) {
+ if (m->quiesced && m->prepared) {
list_add(&m->list, &pool->prepared_mappings);
wake_worker(pool);
}
@@ -778,7 +780,8 @@ static void copy_complete(int read_err, unsigned long write_err, void *context)
static void overwrite_endio(struct bio *bio, int err)
{
unsigned long flags;
- struct new_mapping *m = dm_get_mapinfo(bio)->ptr;
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+ struct new_mapping *m = h->overwrite_mapping;
struct pool *pool = m->tc->pool;
m->err = err;
@@ -789,31 +792,6 @@ static void overwrite_endio(struct bio *bio, int err)
spin_unlock_irqrestore(&pool->lock, flags);
}
-static void shared_read_endio(struct bio *bio, int err)
-{
- struct list_head mappings;
- struct new_mapping *m, *tmp;
- struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
- unsigned long flags;
- struct pool *pool = h->tc->pool;
-
- bio->bi_end_io = h->saved_bi_end_io;
- bio_endio(bio, err);
-
- INIT_LIST_HEAD(&mappings);
- ds_dec(h->entry, &mappings);
-
- spin_lock_irqsave(&pool->lock, flags);
- list_for_each_entry_safe(m, tmp, &mappings, list) {
- list_del(&m->list);
- INIT_LIST_HEAD(&m->list);
- __maybe_add_mapping(m);
- }
- spin_unlock_irqrestore(&pool->lock, flags);
-
- mempool_free(h, pool->endio_hook_pool);
-}
-
/*----------------------------------------------------------------*/
/*
@@ -947,9 +925,7 @@ static int ensure_next_mapping(struct pool *pool)
static struct new_mapping *get_next_mapping(struct pool *pool)
{
struct new_mapping *r = pool->next_mapping;
-
BUG_ON(!pool->next_mapping);
-
pool->next_mapping = NULL;
return r;
@@ -965,6 +941,7 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
struct new_mapping *m = get_next_mapping(pool);
INIT_LIST_HEAD(&m->list);
+ m->quiesced = 0;
m->prepared = 0;
m->tc = tc;
m->virt_block = virt_block;
@@ -973,7 +950,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
m->err = 0;
m->bio = NULL;
- ds_add_work(&pool->ds, &m->list);
+ if (!ds_add_work(&pool->shared_read_ds, &m->list))
+ m->quiesced = 1;
/*
* IO to pool_dev remaps to the pool target's data_dev.
@@ -982,9 +960,10 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
* bio immediately. Otherwise we use kcopyd to clone the data first.
*/
if (io_overwrites_block(pool, bio)) {
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+ h->overwrite_mapping = m;
m->bio = bio;
save_and_set_endio(bio, &m->saved_bi_end_io, overwrite_endio);
- dm_get_mapinfo(bio)->ptr = m;
remap_and_issue(tc, bio, data_dest);
} else {
struct dm_io_region from, to;
@@ -1031,6 +1010,7 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
struct new_mapping *m = get_next_mapping(pool);
INIT_LIST_HEAD(&m->list);
+ m->quiesced = 1;
m->prepared = 0;
m->tc = tc;
m->virt_block = virt_block;
@@ -1048,9 +1028,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
process_prepared_mapping(m);
else if (io_overwrites_block(pool, bio)) {
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+ h->overwrite_mapping = m;
m->bio = bio;
save_and_set_endio(bio, &m->saved_bi_end_io, overwrite_endio);
- dm_get_mapinfo(bio)->ptr = m;
remap_and_issue(tc, bio, data_block);
} else {
@@ -1137,7 +1118,8 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
*/
static void retry_on_resume(struct bio *bio)
{
- struct thin_c *tc = dm_get_mapinfo(bio)->ptr;
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+ struct thin_c *tc = h->tc;
struct pool *pool = tc->pool;
unsigned long flags;
@@ -1203,13 +1185,9 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
if (bio_data_dir(bio) == WRITE)
break_sharing(tc, bio, block, &key, lookup_result, cell);
else {
- struct endio_hook *h;
- h = mempool_alloc(pool->endio_hook_pool, GFP_NOIO);
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
- h->tc = tc;
- h->entry = ds_inc(&pool->ds);
- save_and_set_endio(bio, &h->saved_bi_end_io, shared_read_endio);
- dm_get_mapinfo(bio)->ptr = h;
+ h->shared_read_entry = ds_inc(&pool->shared_read_ds);
cell_release_singleton(cell, bio);
remap_and_issue(tc, bio, lookup_result->block);
@@ -1327,7 +1305,9 @@ static void process_deferred_bios(struct pool *pool)
spin_unlock_irqrestore(&pool->lock, flags);
while ((bio = bio_list_pop(&bios))) {
- struct thin_c *tc = dm_get_mapinfo(bio)->ptr;
+ struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+ struct thin_c *tc = h->tc;
+
/*
* If we've got no free new_mapping structs, and processing
* this bio might require one, we pause until there are some
@@ -1398,6 +1378,18 @@ static void thin_defer_bio(struct thin_c *tc, struct bio *bio)
wake_worker(pool);
}
+static struct endio_hook *thin_hook_bio(struct thin_c *tc, struct bio *bio)
+{
+ struct pool *pool = tc->pool;
+ struct endio_hook *h = mempool_alloc(pool->endio_hook_pool, GFP_NOIO);
+
+ h->tc = tc;
+ h->shared_read_entry = NULL;
+ h->overwrite_mapping = NULL;
+
+ return h;
+}
+
/*
* Non-blocking function called from the thin target's map function.
*/
@@ -1410,11 +1402,7 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio,
struct dm_thin_device *td = tc->td;
struct dm_thin_lookup_result result;
- /*
- * Save the thin context for easy access from the deferred bio later.
- */
- map_context->ptr = tc;
-
+ map_context->ptr = thin_hook_bio(tc, bio);
if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) {
thin_defer_bio(tc, bio);
return DM_MAPIO_SUBMITTED;
@@ -1593,7 +1581,7 @@ static struct pool *pool_create(struct mapped_device *pool_md,
pool->low_water_triggered = 0;
pool->no_free_space = 0;
bio_list_init(&pool->retry_on_resume_list);
- ds_init(&pool->ds);
+ ds_init(&pool->shared_read_ds);
pool->next_mapping = NULL;
pool->mapping_pool =
@@ -2382,6 +2370,32 @@ static int thin_map(struct dm_target *ti, struct bio *bio,
return thin_bio_map(ti, bio, map_context);
}
+static int thin_endio(struct dm_target *ti,
+ struct bio *bio, int err,
+ union map_info *map_context)
+{
+ unsigned long flags;
+ struct endio_hook *h = map_context->ptr;
+ struct list_head work;
+ struct new_mapping *m, *tmp;
+ struct pool *pool = h->tc->pool;
+
+ if (h->shared_read_entry) {
+ INIT_LIST_HEAD(&work);
+ ds_dec(h->shared_read_entry, &work);
+
+ spin_lock_irqsave(&pool->lock, flags);
+ list_for_each_entry_safe(m, tmp, &work, list) {
+ list_del(&m->list);
+ m->quiesced = 1;
+ __maybe_add_mapping(m);
+ }
+ spin_unlock_irqrestore(&pool->lock, flags);
+ }
+
+ return 0;
+}
+
static void thin_postsuspend(struct dm_target *ti)
{
if (dm_noflush_suspending(ti))
@@ -2470,6 +2484,7 @@ static struct target_type thin_target = {
.ctr = thin_ctr,
.dtr = thin_dtr,
.map = thin_map,
+ .end_io = thin_endio,
.postsuspend = thin_postsuspend,
.status = thin_status,
.iterate_devices = thin_iterate_devices,
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 10/14] dm_thin: add support for REQ_DISCARD
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (7 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 09/14] dm_thin: foundation for discard support Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-23 12:45 ` Alasdair G Kergon
2012-03-16 15:22 ` [PATCH 11/14] dm_thin: add pool target flags to control discard Joe Thornber
` (4 subsequent siblings)
13 siblings, 1 reply; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Enhancement.
On discard the corresponding mapping(s) are removed from the thin
device. If the associated block(s) are no longer shared the discard
is passed to the underlying device.
All bios other than discards now have an associated deferred_entry
that is saved to the 'all_io_entry' in endio_hook. When non-discard
IO completes and associated mappings are quiesced any discards that
were deferred, via ds_add_work() in process_discard(), will be queued
for processing by the worker thread.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 173 ++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 159 insertions(+), 14 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 3d60d8c..1691be9 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -528,10 +528,12 @@ struct pool {
struct bio_list deferred_bios;
struct bio_list deferred_flush_bios;
struct list_head prepared_mappings;
+ struct list_head prepared_discards;
struct bio_list retry_on_resume_list;
struct deferred_set shared_read_ds;
+ struct deferred_set all_io_ds;
struct new_mapping *next_mapping;
mempool_t *mapping_pool;
@@ -629,6 +631,7 @@ static struct pool *__pool_table_lookup_metadata_dev(struct block_device *md_dev
struct endio_hook {
struct thin_c *tc;
struct deferred_entry *shared_read_entry;
+ struct deferred_entry *all_io_entry;
struct new_mapping *overwrite_mapping;
};
@@ -736,11 +739,12 @@ struct new_mapping {
unsigned quiesced:1;
unsigned prepared:1;
+ unsigned pass_discard:1;
struct thin_c *tc;
dm_block_t virt_block;
dm_block_t data_block;
- struct cell *cell;
+ struct cell *cell, *cell2;
int err;
/*
@@ -880,7 +884,30 @@ static void process_prepared_mapping(struct new_mapping *m)
mempool_free(m, tc->pool->mapping_pool);
}
-static void process_prepared_mappings(struct pool *pool)
+static void process_prepared_discard(struct new_mapping *m)
+{
+ int r;
+ struct thin_c *tc = m->tc;
+
+ r = dm_thin_remove_block(tc->td, m->virt_block);
+ if (r)
+ DMERR("dm_thin_remove_block() failed");
+
+ /*
+ * Pass the discard down to the underlying device?
+ */
+ if (m->pass_discard)
+ remap_and_issue(tc, m->bio, m->data_block);
+ else
+ bio_endio(m->bio, 0);
+
+ cell_defer_except(tc, m->cell);
+ cell_defer_except(tc, m->cell2);
+ mempool_free(m, tc->pool->mapping_pool);
+}
+
+static void process_prepared(struct pool *pool, struct list_head *head,
+ void (*fn)(struct new_mapping *))
{
unsigned long flags;
struct list_head maps;
@@ -888,21 +915,27 @@ static void process_prepared_mappings(struct pool *pool)
INIT_LIST_HEAD(&maps);
spin_lock_irqsave(&pool->lock, flags);
- list_splice_init(&pool->prepared_mappings, &maps);
+ list_splice_init(head, &maps);
spin_unlock_irqrestore(&pool->lock, flags);
list_for_each_entry_safe(m, tmp, &maps, list)
- process_prepared_mapping(m);
+ fn(m);
}
/*
* Deferred bio jobs.
*/
-static int io_overwrites_block(struct pool *pool, struct bio *bio)
+static int io_overlaps_block(struct pool *pool, struct bio *bio)
{
- return ((bio_data_dir(bio) == WRITE) &&
- !(bio->bi_sector & pool->offset_mask)) &&
+ return !(bio->bi_sector & pool->offset_mask) &&
(bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));
+
+}
+
+static int io_overwrites_block(struct pool *pool, struct bio *bio)
+{
+ return (bio_data_dir(bio) == WRITE) &&
+ io_overlaps_block(pool, bio);
}
static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
@@ -1140,6 +1173,86 @@ static void no_space(struct cell *cell)
retry_on_resume(bio);
}
+static void process_discard(struct thin_c *tc, struct bio *bio)
+{
+ int r;
+ struct pool *pool = tc->pool;
+ struct cell *cell, *cell2;
+ struct cell_key key, key2;
+ dm_block_t block = get_bio_block(tc, bio);
+ struct dm_thin_lookup_result lookup_result;
+ struct new_mapping *m;
+
+ build_virtual_key(tc->td, block, &key);
+ if (bio_detain(tc->pool->prison, &key, bio, &cell))
+ return;
+
+ r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
+ switch (r) {
+ case 0:
+ /*
+ * Check nobody is fiddling with this pool block. This can
+ * happen if someone's in the process of breaking sharing
+ * on this block.
+ */
+ build_data_key(tc->td, lookup_result.block, &key2);
+ if (bio_detain(tc->pool->prison, &key2, bio, &cell2)) {
+ cell_release_singleton(cell, bio);
+ break;
+ }
+
+ if (io_overlaps_block(pool, bio)) {
+ /*
+ * IO may still be going to the destination block. We must
+ * quiesce before we can do the removal.
+ */
+ m = get_next_mapping(pool);
+ m->tc = tc;
+ m->pass_discard = !lookup_result.shared;
+ m->virt_block = block;
+ m->data_block = lookup_result.block;
+ m->cell = cell;
+ m->cell2 = cell2;
+ m->err = 0;
+ m->bio = bio;
+
+ if (!ds_add_work(&pool->all_io_ds, &m->list)) {
+ list_add(&m->list, &pool->prepared_discards);
+ wake_worker(pool);
+ }
+ } else {
+ /*
+ * This path is hit if people are ignoring
+ * limits->discard_granularity. It ignores any
+ * part of the discard that is in a subsequent
+ * block.
+ */
+ sector_t offset = bio->bi_sector - (block << pool->block_shift);
+ unsigned remaining = (pool->sectors_per_block - offset) << 9;
+ bio->bi_size = min(bio->bi_size, remaining);
+
+ cell_release_singleton(cell, bio);
+ cell_release_singleton(cell2, bio);
+ remap_and_issue(tc, bio, lookup_result.block);
+ }
+ break;
+
+ case -ENODATA:
+ /*
+ * It isn't provisioned, just forget it.
+ */
+ cell_release_singleton(cell, bio);
+ bio_endio(bio, 0);
+ break;
+
+ default:
+ DMERR("discard: find block unexpectedly returned %d\n", r);
+ cell_release_singleton(cell, bio);
+ bio_io_error(bio);
+ break;
+ }
+}
+
static void break_sharing(struct thin_c *tc, struct bio *bio, dm_block_t block,
struct cell_key *key,
struct dm_thin_lookup_result *lookup_result,
@@ -1285,6 +1398,7 @@ static void process_bio(struct thin_c *tc, struct bio *bio)
default:
DMERR("dm_thin_find_block() failed, error = %d", r);
+ cell_release_singleton(cell, bio);
bio_io_error(bio);
break;
}
@@ -1320,7 +1434,11 @@ static void process_deferred_bios(struct pool *pool)
break;
}
- process_bio(tc, bio);
+
+ if (bio->bi_rw & REQ_DISCARD)
+ process_discard(tc, bio);
+ else
+ process_bio(tc, bio);
}
/*
@@ -1353,7 +1471,8 @@ static void do_worker(struct work_struct *ws)
{
struct pool *pool = container_of(ws, struct pool, worker);
- process_prepared_mappings(pool);
+ process_prepared(pool, &pool->prepared_mappings, process_prepared_mapping);
+ process_prepared(pool, &pool->prepared_discards, process_prepared_discard);
process_deferred_bios(pool);
}
@@ -1385,6 +1504,7 @@ static struct endio_hook *thin_hook_bio(struct thin_c *tc, struct bio *bio)
h->tc = tc;
h->shared_read_entry = NULL;
+ h->all_io_entry = bio->bi_rw & REQ_DISCARD ? NULL : ds_inc(&pool->all_io_ds);
h->overwrite_mapping = NULL;
return h;
@@ -1403,7 +1523,7 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio,
struct dm_thin_lookup_result result;
map_context->ptr = thin_hook_bio(tc, bio);
- if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) {
+ if (bio->bi_rw & (REQ_DISCARD | REQ_FLUSH | REQ_FUA)) {
thin_defer_bio(tc, bio);
return DM_MAPIO_SUBMITTED;
}
@@ -1578,10 +1698,12 @@ static struct pool *pool_create(struct mapped_device *pool_md,
bio_list_init(&pool->deferred_bios);
bio_list_init(&pool->deferred_flush_bios);
INIT_LIST_HEAD(&pool->prepared_mappings);
+ INIT_LIST_HEAD(&pool->prepared_discards);
pool->low_water_triggered = 0;
pool->no_free_space = 0;
bio_list_init(&pool->retry_on_resume_list);
ds_init(&pool->shared_read_ds);
+ ds_init(&pool->all_io_ds);
pool->next_mapping = NULL;
pool->mapping_pool =
@@ -1821,7 +1943,8 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
pt->low_water_blocks = low_water_blocks;
pt->zero_new_blocks = pf.zero_new_blocks;
ti->num_flush_requests = 1;
- ti->num_discard_requests = 0;
+ ti->num_discard_requests = 1;
+ ti->discards_supported = 1;
ti->private = pt;
pt->callbacks.congested_fn = pool_is_congested;
@@ -2213,6 +2336,17 @@ static int pool_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
}
+static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
+{
+ limits->max_discard_sectors = pool->sectors_per_block;
+
+ /*
+ * This is just a hint, and not enforced. We have to cope with
+ * bios that overlap 2 blocks.
+ */
+ limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+}
+
static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
{
struct pool_c *pt = ti->private;
@@ -2220,6 +2354,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
blk_limits_io_min(limits, 0);
blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
+ set_discard_limits(pool, limits);
}
static struct target_type pool_target = {
@@ -2336,8 +2471,8 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
ti->split_io = tc->pool->sectors_per_block;
ti->num_flush_requests = 1;
- ti->num_discard_requests = 0;
- ti->discards_supported = 0;
+ ti->num_discard_requests = 1;
+ ti->discards_supported = 1;
dm_put(pool_md);
@@ -2393,6 +2528,14 @@ static int thin_endio(struct dm_target *ti,
spin_unlock_irqrestore(&pool->lock, flags);
}
+ if (h->all_io_entry) {
+ INIT_LIST_HEAD(&work);
+ ds_dec(h->all_io_entry, &work);
+ list_for_each_entry_safe(m, tmp, &work, list)
+ list_add(&m->list, &pool->prepared_discards);
+ }
+
+ mempool_free(h, pool->endio_hook_pool);
return 0;
}
@@ -2472,9 +2615,11 @@ static int thin_iterate_devices(struct dm_target *ti,
static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
{
struct thin_c *tc = ti->private;
+ struct pool *pool = tc->pool;
blk_limits_io_min(limits, 0);
- blk_limits_io_opt(limits, tc->pool->sectors_per_block << SECTOR_SHIFT);
+ blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
+ set_discard_limits(pool, limits);
}
static struct target_type thin_target = {
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 11/14] dm_thin: add pool target flags to control discard
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (8 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 10/14] dm_thin: add support for REQ_DISCARD Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-23 12:37 ` Alasdair G Kergon
2012-03-16 15:22 ` [PATCH 12/14] dm_thin: commit metadata every second Joe Thornber
` (3 subsequent siblings)
13 siblings, 1 reply; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Enhancement.
You can turn discard support on or off. Also you can specify whether
the discard gets passed down to the lower layer, or just removes the
mapping within the thinp target.
New, optional pool target flags:
'ignore_discard': disable discard support
'no_discard_passdown': don't pass discards down to the underlying data device
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
Documentation/device-mapper/thin-provisioning.txt | 2 +
drivers/md/dm-thin.c | 78 +++++++++++++++------
2 files changed, 59 insertions(+), 21 deletions(-)
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 0fc0b0f..c33bf28 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -222,6 +222,8 @@ i) Constructor
Optional feature arguments:
- 'skip_block_zeroing': skips the zeroing of newly-provisioned blocks.
+ 'ignore_discard': disable discard support
+ 'no_discard_passdown': don't pass discards down to the underlying data device
Data block size must be between 64KB (128 sectors) and 1GB
(2097152 sectors) inclusive.
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 1691be9..a1b274b 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -499,6 +499,13 @@ static void build_virtual_key(struct dm_thin_device *td, dm_block_t b,
* devices.
*/
struct new_mapping;
+
+struct pool_features {
+ unsigned zero_new_blocks:1;
+ unsigned discard_enabled:1;
+ unsigned discard_passdown:1;
+};
+
struct pool {
struct list_head list;
struct dm_target *ti; /* Only set if a pool target is bound */
@@ -512,7 +519,7 @@ struct pool {
dm_block_t offset_mask;
dm_block_t low_water_blocks;
- unsigned zero_new_blocks:1;
+ struct pool_features pf;
unsigned low_water_triggered:1; /* A dm event has been sent */
unsigned no_free_space:1; /* A -ENOSPC warning has been issued */
@@ -551,7 +558,7 @@ struct pool_c {
struct dm_target_callbacks callbacks;
dm_block_t low_water_blocks;
- unsigned zero_new_blocks:1;
+ struct pool_features pf;
};
/*
@@ -1057,7 +1064,7 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
* zeroing pre-existing data, we can issue the bio immediately.
* Otherwise we use kcopyd to zero the data first.
*/
- if (!pool->zero_new_blocks)
+ if (!pool->pf.zero_new_blocks)
process_prepared_mapping(m);
else if (io_overwrites_block(pool, bio)) {
@@ -1605,7 +1612,7 @@ static int bind_control_target(struct pool *pool, struct dm_target *ti)
pool->ti = ti;
pool->low_water_blocks = pt->low_water_blocks;
- pool->zero_new_blocks = pt->zero_new_blocks;
+ pool->pf = pt->pf;
return 0;
}
@@ -1619,6 +1626,14 @@ static void unbind_control_target(struct pool *pool, struct dm_target *ti)
/*----------------------------------------------------------------
* Pool creation
*--------------------------------------------------------------*/
+/* Initialize pool features. */
+static void pool_features_init(struct pool_features *pf)
+{
+ pf->zero_new_blocks = 1;
+ pf->discard_enabled = 1;
+ pf->discard_passdown = 1;
+}
+
static void __pool_destroy(struct pool *pool)
{
__pool_table_remove(pool);
@@ -1666,7 +1681,7 @@ static struct pool *pool_create(struct mapped_device *pool_md,
pool->block_shift = ffs(block_size) - 1;
pool->offset_mask = block_size - 1;
pool->low_water_blocks = 0;
- pool->zero_new_blocks = 1;
+ pool_features_init(&pool->pf);
pool->prison = prison_create(PRISON_CELLS);
if (!pool->prison) {
*error = "Error creating pool's bio prison";
@@ -1802,10 +1817,6 @@ static void pool_dtr(struct dm_target *ti)
mutex_unlock(&dm_thin_pool_table.mutex);
}
-struct pool_features {
- unsigned zero_new_blocks:1;
-};
-
static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
struct dm_target *ti)
{
@@ -1814,7 +1825,7 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
const char *arg_name;
static struct dm_arg _args[] = {
- {0, 1, "Invalid number of pool feature arguments"},
+ {0, 3, "Invalid number of pool feature arguments"},
};
/*
@@ -1834,6 +1845,12 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
if (!strcasecmp(arg_name, "skip_block_zeroing")) {
pf->zero_new_blocks = 0;
continue;
+ } else if (!strcasecmp(arg_name, "ignore_discard")) {
+ pf->discard_enabled = 0;
+ continue;
+ } else if (!strcasecmp(arg_name, "no_discard_passdown")) {
+ pf->discard_passdown = 0;
+ continue;
}
ti->error = "Unrecognised pool feature requested";
@@ -1851,6 +1868,8 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
*
* Optional feature arguments are:
* skip_block_zeroing: skips the zeroing of newly-provisioned blocks.
+ * ignore_discard: disable discard
+ * no_discard_passdown: don't pass discards down to the data device
*/
static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
@@ -1915,8 +1934,7 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
/*
* Set default pool features.
*/
- memset(&pf, 0, sizeof(pf));
- pf.zero_new_blocks = 1;
+ pool_features_init(&pf);
dm_consume_args(&as, 4);
r = parse_pool_features(&as, &pf, ti);
@@ -1941,10 +1959,12 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
pt->metadata_dev = metadata_dev;
pt->data_dev = data_dev;
pt->low_water_blocks = low_water_blocks;
- pt->zero_new_blocks = pf.zero_new_blocks;
+ pt->pf = pf;
ti->num_flush_requests = 1;
- ti->num_discard_requests = 1;
- ti->discards_supported = 1;
+ if (pf.discard_enabled && pf.discard_passdown) {
+ ti->discards_supported = 1;
+ ti->num_discard_requests = 1;
+ }
ti->private = pt;
pt->callbacks.congested_fn = pool_is_congested;
@@ -2241,7 +2261,7 @@ static int pool_message(struct dm_target *ti, unsigned argc, char **argv)
static int pool_status(struct dm_target *ti, status_type_t type,
char *result, unsigned maxlen)
{
- int r;
+ int r, count;
unsigned sz = 0;
uint64_t transaction_id;
dm_block_t nr_free_blocks_data;
@@ -2304,10 +2324,18 @@ static int pool_status(struct dm_target *ti, status_type_t type,
(unsigned long)pool->sectors_per_block,
(unsigned long long)pt->low_water_blocks);
- DMEMIT("%u ", !pool->zero_new_blocks);
+ count = !pool->pf.zero_new_blocks + !pool->pf.discard_enabled + !pool->pf.discard_passdown;
+ DMEMIT("%u ", count);
- if (!pool->zero_new_blocks)
+ if (!pool->pf.zero_new_blocks)
DMEMIT("skip_block_zeroing ");
+
+ if (!pool->pf.discard_enabled)
+ DMEMIT("skip_discard ");
+
+ if (!pool->pf.discard_passdown)
+ DMEMIT("skip_discard_passdown");
+
break;
}
@@ -2345,6 +2373,7 @@ static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
* bios that overlap 2 blocks.
*/
limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+ limits->discard_zeroes_data = pool->pf.zero_new_blocks;
}
static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
@@ -2354,7 +2383,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
blk_limits_io_min(limits, 0);
blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
- set_discard_limits(pool, limits);
+ if (pool->pf.discard_enabled)
+ set_discard_limits(pool, limits);
}
static struct target_type pool_target = {
@@ -2403,6 +2433,8 @@ static void thin_dtr(struct dm_target *ti)
* pool_dev: the path to the pool (eg, /dev/mapper/my_pool)
* dev_id: the internal device identifier
* origin_dev: a device external to the pool that should act as the origin
+ *
+ * If the pool has discards disabled, they get disabled for the thin as well.
*/
static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
@@ -2471,8 +2503,12 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
ti->split_io = tc->pool->sectors_per_block;
ti->num_flush_requests = 1;
- ti->num_discard_requests = 1;
- ti->discards_supported = 1;
+
+ /* In case the pool supports discards, pass them on. */
+ if (tc->pool->pf.discard_enabled) {
+ ti->discards_supported = 1;
+ ti->num_discard_requests = 1;
+ }
dm_put(pool_md);
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* Re: [PATCH 11/14] dm_thin: add pool target flags to control discard
2012-03-16 15:22 ` [PATCH 11/14] dm_thin: add pool target flags to control discard Joe Thornber
@ 2012-03-23 12:37 ` Alasdair G Kergon
2012-03-23 21:55 ` [PATCH] dm thin: fix pool target flags that " Mike Snitzer
0 siblings, 1 reply; 26+ messages in thread
From: Alasdair G Kergon @ 2012-03-23 12:37 UTC (permalink / raw)
To: device-mapper development; +Cc: Joe Thornber, Mike Snitzer
On Fri, Mar 16, 2012 at 03:22:34PM +0000, Joe Thornber wrote:
> + 'ignore_discard': disable discard support
> + 'no_discard_passdown': don't pass discards down to the underlying data device
If someone reloads the pool target changing either of these options, how do connected
thin targets pick up the change? If they don't pick it up automatically, how do you
determine whether they did or didn't pick it up - is that internal state not
exposed to userspace yet?
Alasdair
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] dm thin: fix pool target flags that control discard
2012-03-23 12:37 ` Alasdair G Kergon
@ 2012-03-23 21:55 ` Mike Snitzer
2012-03-26 14:15 ` Joe Thornber
0 siblings, 1 reply; 26+ messages in thread
From: Mike Snitzer @ 2012-03-23 21:55 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: device-mapper development, Joe Thornber
On Fri, Mar 23 2012 at 8:37am -0400,
Alasdair G Kergon <agk@redhat.com> wrote:
> On Fri, Mar 16, 2012 at 03:22:34PM +0000, Joe Thornber wrote:
> > + 'ignore_discard': disable discard support
> > + 'no_discard_passdown': don't pass discards down to the underlying data device
>
> If someone reloads the pool target changing either of these options, how do connected
> thin targets pick up the change? If they don't pick it up automatically, how do you
> determine whether they did or didn't pick it up - is that internal state not
> exposed to userspace yet?
Here are various fixes for dm_thin-add-pool-target-flags-to-control-discard.patch
o wire up 'discard_passdown' so that it does control whether or not the
discard is passed to the pool's underlying data device
o disallow disabling discards if a pool was already configured with
discards enabled (the default)
- justification is inlined in the code
- required pool_ctr knowing whether pool was created or not; so added
'created' flag to __pool_find()
o if 'discard_passdown' is enabled (the default) verify that the pool's
data device supports discards; if not warn and disable discard_passdown
o the pool device's discard support should _not_ be limited by whether
'discard_passdown' is enabled or not.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 41 ++++++++++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 7 deletions(-)
Index: linux/drivers/md/dm-thin.c
===================================================================
--- linux.orig/drivers/md/dm-thin.c
+++ linux/drivers/md/dm-thin.c
@@ -1209,7 +1209,7 @@ static void process_discard(struct thin_
*/
m = get_next_mapping(pool);
m->tc = tc;
- m->pass_discard = !lookup_result.shared;
+ m->pass_discard = (!lookup_result.shared) & pool->pf.discard_passdown;
m->virt_block = block;
m->data_block = lookup_result.block;
m->cell = cell;
@@ -1790,7 +1790,8 @@ static void __pool_dec(struct pool *pool
static struct pool *__pool_find(struct mapped_device *pool_md,
struct block_device *metadata_dev,
- unsigned long block_size, char **error)
+ unsigned long block_size, char **error,
+ int *created)
{
struct pool *pool = __pool_table_lookup_metadata_dev(metadata_dev);
@@ -1806,8 +1807,10 @@ static struct pool *__pool_find(struct m
return ERR_PTR(-EINVAL);
__pool_inc(pool);
- } else
+ } else {
pool = pool_create(pool_md, metadata_dev, block_size, error);
+ *created = 1;
+ }
}
return pool;
@@ -1887,7 +1890,7 @@ static int parse_pool_features(struct dm
*/
static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
- int r;
+ int r, pool_created = 0;
struct pool_c *pt;
struct pool *pool;
struct pool_features pf;
@@ -1961,12 +1964,36 @@ static int pool_ctr(struct dm_target *ti
}
pool = __pool_find(dm_table_get_md(ti->table), metadata_dev->bdev,
- block_size, &ti->error);
+ block_size, &ti->error, &pool_created);
if (IS_ERR(pool)) {
r = PTR_ERR(pool);
goto out_free_pt;
}
+ /*
+ * 'pool_created' reflects whether this is the first table load.
+ * Discard support is not allowed to be disabled after initial load.
+ * Disabling it would require a pool reload to trigger thin device
+ * changes (e.g. ti->discards_supported and QUEUE_FLAG_DISCARD).
+ */
+ if (!pool_created && !pf.discard_enabled && pool->pf.discard_enabled) {
+ ti->error = "Discard support cannot be disabled once enabled";
+ r = -EINVAL;
+ goto out_free_pt;
+ }
+
+ /*
+ * If discard_passdown was enabled verify that the data device
+ * supports discards. Disable discard_passdown if not.
+ */
+ if (pf.discard_passdown) {
+ struct request_queue *q = bdev_get_queue(data_dev->bdev);
+ if (!q || !blk_queue_discard(q)) {
+ DMWARN("Discard unsupported by data device, disabling discard passdown");
+ pf.discard_passdown = 0;
+ }
+ }
+
pt->pool = pool;
pt->ti = ti;
pt->metadata_dev = metadata_dev;
@@ -1974,9 +2001,9 @@ static int pool_ctr(struct dm_target *ti
pt->low_water_blocks = low_water_blocks;
pt->pf = pf;
ti->num_flush_requests = 1;
- if (pf.discard_enabled && pf.discard_passdown) {
- ti->discards_supported = 1;
+ if (pf.discard_enabled) {
ti->num_discard_requests = 1;
+ ti->discards_supported = 1;
}
ti->private = pt;
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH] dm thin: fix pool target flags that control discard
2012-03-23 21:55 ` [PATCH] dm thin: fix pool target flags that " Mike Snitzer
@ 2012-03-26 14:15 ` Joe Thornber
2012-03-26 15:33 ` Mike Snitzer
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-26 14:15 UTC (permalink / raw)
To: device-mapper development; +Cc: Joe Thornber, Alasdair G Kergon
On Fri, Mar 23, 2012 at 05:55:02PM -0400, Mike Snitzer wrote:
> On Fri, Mar 23 2012 at 8:37am -0400,
> Alasdair G Kergon <agk@redhat.com> wrote:
>
> > On Fri, Mar 16, 2012 at 03:22:34PM +0000, Joe Thornber wrote:
> > > + 'ignore_discard': disable discard support
> > > + 'no_discard_passdown': don't pass discards down to the underlying data device
> >
> > If someone reloads the pool target changing either of these options, how do connected
> > thin targets pick up the change? If they don't pick it up automatically, how do you
> > determine whether they did or didn't pick it up - is that internal state not
> > exposed to userspace yet?
>
> Here are various fixes for dm_thin-add-pool-target-flags-to-control-discard.patch
>
> o wire up 'discard_passdown' so that it does control whether or not the
> discard is passed to the pool's underlying data device
This already works, see test_{enable,disable}_passdown() tests here:
https://github.com/jthornber/thinp-test-suite/blob/master/discard_tests.rb
You've got to remember that there are 2 different interacting targets:
i) discard enabled in the 'thin' target will cause mappings to be removed from the btree.
ii) discard enabled in the 'pool' device will cause discards to be passed down.
The following logic is in the pool_ctr:
if (pf.discard_enabled && pf.discard_passdown) {
ti->discards_supported = 1;
ti->num_discard_requests = 1;
}
> o disallow disabling discards if a pool was already configured with
> discards enabled (the default)
> - justification is inlined in the code
> - required pool_ctr knowing whether pool was created or not; so added
> 'created' flag to __pool_find()
ack, this needs fixing.
> o if 'discard_passdown' is enabled (the default) verify that the pool's
> data device supports discards; if not warn and disable discard_passdown
Why? Do other targets do this? (no)
> o the pool device's discard support should _not_ be limited by whether
> 'discard_passdown' is enabled or not.
Yes it should.
- Joe
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: dm thin: fix pool target flags that control discard
2012-03-26 14:15 ` Joe Thornber
@ 2012-03-26 15:33 ` Mike Snitzer
2012-03-26 19:56 ` Mike Snitzer
2012-03-26 15:34 ` [PATCH] " Joe Thornber
2012-03-26 15:46 ` Joe Thornber
2 siblings, 1 reply; 26+ messages in thread
From: Mike Snitzer @ 2012-03-26 15:33 UTC (permalink / raw)
To: device-mapper development, Alasdair G Kergon, Joe Thornber
On Mon, Mar 26 2012 at 10:15am -0400,
Joe Thornber <thornber@redhat.com> wrote:
> On Fri, Mar 23, 2012 at 05:55:02PM -0400, Mike Snitzer wrote:
> > On Fri, Mar 23 2012 at 8:37am -0400,
> > Alasdair G Kergon <agk@redhat.com> wrote:
> >
> > > On Fri, Mar 16, 2012 at 03:22:34PM +0000, Joe Thornber wrote:
> > > > + 'ignore_discard': disable discard support
> > > > + 'no_discard_passdown': don't pass discards down to the underlying data device
> > >
> > > If someone reloads the pool target changing either of these options, how do connected
> > > thin targets pick up the change? If they don't pick it up automatically, how do you
> > > determine whether they did or didn't pick it up - is that internal state not
> > > exposed to userspace yet?
> >
> > Here are various fixes for dm_thin-add-pool-target-flags-to-control-discard.patch
> >
> > o wire up 'discard_passdown' so that it does control whether or not the
> > discard is passed to the pool's underlying data device
>
> This already works, see test_{enable,disable}_passdown() tests here:
>
> https://github.com/jthornber/thinp-test-suite/blob/master/discard_tests.rb
>
> You've got to remember that there are 2 different interacting targets:
>
> i) discard enabled in the 'thin' target will cause mappings to be removed from the btree.
>
> ii) discard enabled in the 'pool' device will cause discards to be passed down.
>
> The following logic is in the pool_ctr:
>
> if (pf.discard_enabled && pf.discard_passdown) {
> ti->discards_supported = 1;
> ti->num_discard_requests = 1;
> }
We reasoned through this already, your code did work. But for the
benefit of others this is why it worked:
thin will process_discard() via bio being deferring in thin_bio_map() --
provided pf.discard_enabled. Then thin will pass the discard on to the
pool device regardless of pf.discard_passdown. dm-core will then drop
the discard because ti->num_discard_requests is 0; but that results in
-EOPNOTSUPP.
Thing is, we don't want to return -EOPNOTSUPP if passdown was disabled.
So I think my change is an improvement (because process_prepared_discard
will now properly bio_endio(m->bio, 0) the discard).
> > o disallow disabling discards if a pool was already configured with
> > discards enabled (the default)
> > - justification is inlined in the code
> > - required pool_ctr knowing whether pool was created or not; so added
> > 'created' flag to __pool_find()
>
> ack, this needs fixing.
We've discussed that it is easier to just disallow changing discard
configuration. And that there was a bug in my early return. So you'll
be sending a follow-up fixup patch.
> > o if 'discard_passdown' is enabled (the default) verify that the pool's
> > data device supports discards; if not warn and disable discard_passdown
>
> Why? Do other targets do this? (no)
The thin target is the first target to use ti->discards_supported.
Setting that will cause dm_table_supports_discards to skip checking if
the target's underlying device(s) natively support discards.
So passdown needs to only be allowed if the underlying data device
supports discard -- otherwise we'll get a bunch of failed discards from
the SCSI layer, etc.
> > o the pool device's discard support should _not_ be limited by whether
> > 'discard_passdown' is enabled or not.
>
> Yes it should.
Yeah, like I said above your approach works (except for the
-EOPNOTSUPP). I just happen to prefer dm-thin taking a more active role
by short-circuting the discard_passdown directly.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: dm thin: fix pool target flags that control discard
2012-03-26 15:33 ` Mike Snitzer
@ 2012-03-26 19:56 ` Mike Snitzer
0 siblings, 0 replies; 26+ messages in thread
From: Mike Snitzer @ 2012-03-26 19:56 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: device-mapper development, Joe Thornber
On Mon, Mar 26 2012 at 11:33am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Mon, Mar 26 2012 at 10:15am -0400,
> Joe Thornber <thornber@redhat.com> wrote:
>
> > On Fri, Mar 23, 2012 at 05:55:02PM -0400, Mike Snitzer wrote:
> > > On Fri, Mar 23 2012 at 8:37am -0400,
> > > Alasdair G Kergon <agk@redhat.com> wrote:
> > >
> > > > On Fri, Mar 16, 2012 at 03:22:34PM +0000, Joe Thornber wrote:
> > > > > + 'ignore_discard': disable discard support
> > > > > + 'no_discard_passdown': don't pass discards down to the underlying data device
> > > >
> > > > If someone reloads the pool target changing either of these options, how do connected
> > > > thin targets pick up the change? If they don't pick it up automatically, how do you
> > > > determine whether they did or didn't pick it up - is that internal state not
> > > > exposed to userspace yet?
> > >
> > > Here are various fixes for dm_thin-add-pool-target-flags-to-control-discard.patch
> > >
> > > o wire up 'discard_passdown' so that it does control whether or not the
> > > discard is passed to the pool's underlying data device
> >
> > This already works, see test_{enable,disable}_passdown() tests here:
> >
> > https://github.com/jthornber/thinp-test-suite/blob/master/discard_tests.rb
> >
> > You've got to remember that there are 2 different interacting targets:
> >
> > i) discard enabled in the 'thin' target will cause mappings to be removed from the btree.
> >
> > ii) discard enabled in the 'pool' device will cause discards to be passed down.
> >
> > The following logic is in the pool_ctr:
> >
> > if (pf.discard_enabled && pf.discard_passdown) {
> > ti->discards_supported = 1;
> > ti->num_discard_requests = 1;
> > }
>
> We reasoned through this already, your code did work. But for the
> benefit of others this is why it worked:
>
> thin will process_discard() via bio being deferring in thin_bio_map() --
> provided pf.discard_enabled. Then thin will pass the discard on to the
> pool device regardless of pf.discard_passdown. dm-core will then drop
> the discard because ti->num_discard_requests is 0; but that results in
> -EOPNOTSUPP.
>
> Thing is, we don't want to return -EOPNOTSUPP if passdown was disabled.
> So I think my change is an improvement (because process_prepared_discard
> will now properly bio_endio(m->bio, 0) the discard).
>
> > > o disallow disabling discards if a pool was already configured with
> > > discards enabled (the default)
> > > - justification is inlined in the code
> > > - required pool_ctr knowing whether pool was created or not; so added
> > > 'created' flag to __pool_find()
> >
> > ack, this needs fixing.
>
> We've discussed that it is easier to just disallow changing discard
> configuration. And that there was a bug in my early return. So you'll
> be sending a follow-up fixup patch.
Here is a fixup patch that builds on what Alasdair already has staged
in dm_thin-add-pool-target-flags-to-control-discard.patch
I've added comments to help explain the subtlety of the initial thinp
discard code.
---
drivers/md/dm-thin.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
Index: linux/drivers/md/dm-thin.c
===================================================================
--- linux.orig/drivers/md/dm-thin.c
+++ linux/drivers/md/dm-thin.c
@@ -1972,19 +1972,20 @@ static int pool_ctr(struct dm_target *ti
/*
* 'pool_created' reflects whether this is the first table load.
- * Discard support is not allowed to be disabled after initial load.
- * Disabling it would require a pool reload to trigger thin device
- * changes (e.g. ti->discards_supported and QUEUE_FLAG_DISCARD).
+ * Top level discard support is not allowed to be changed after
+ * initial load. This would require a pool reload to trigger thin
+ * device changes.
*/
- if (!pool_created && !pf.discard_enabled && pool->pf.discard_enabled) {
+ if (!pool_created && pf.discard_enabled != pool->pf.discard_enabled) {
ti->error = "Discard support cannot be disabled once enabled";
r = -EINVAL;
- goto out_free_pt;
+ goto out_flags_changed;
}
/*
* If discard_passdown was enabled verify that the data device
- * supports discards. Disable discard_passdown if not.
+ * supports discards. Disable discard_passdown if not; otherwise
+ * -EOPNOTSUPP will be returned.
*/
if (pf.discard_passdown) {
struct request_queue *q = bdev_get_queue(data_dev->bdev);
@@ -2001,8 +2002,18 @@ static int pool_ctr(struct dm_target *ti
pt->low_water_blocks = low_water_blocks;
pt->pf = pf;
ti->num_flush_requests = 1;
- if (pf.discard_enabled) {
+ /*
+ * Only need to enable discards if the pool should pass
+ * them down to the data device. The thin device's discard
+ * processing will cause mappings to be removed from the btree.
+ */
+ if (pf.discard_enabled && pf.discard_passdown) {
ti->num_discard_requests = 1;
+ /*
+ * Setting 'discards_supported' circumvents the normal
+ * stacking of discard limits (this keeps the pool and
+ * thin devices' discard limits consistent).
+ */
ti->discards_supported = 1;
}
ti->private = pt;
@@ -2014,6 +2025,8 @@ static int pool_ctr(struct dm_target *ti
return 0;
+out_flags_changed:
+ __pool_dec(pool);
out_free_pt:
kfree(pt);
out:
@@ -2408,6 +2421,9 @@ static int pool_merge(struct dm_target *
static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
{
+ /*
+ * FIXME: these limits may be incompatible with the pool's data device
+ */
limits->max_discard_sectors = pool->sectors_per_block;
/*
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] dm thin: fix pool target flags that control discard
2012-03-26 14:15 ` Joe Thornber
2012-03-26 15:33 ` Mike Snitzer
@ 2012-03-26 15:34 ` Joe Thornber
2012-03-26 15:46 ` Joe Thornber
2 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-26 15:34 UTC (permalink / raw)
To: device-mapper development, Alasdair G Kergon, Joe Thornber
On Mon, Mar 26, 2012 at 03:15:21PM +0100, Joe Thornber wrote:
> > o disallow disabling discards if a pool was already configured with
> > discards enabled (the default)
> > - justification is inlined in the code
> > - required pool_ctr knowing whether pool was created or not; so added
> > 'created' flag to __pool_find()
>
> ack, this needs fixing.
The following patch prevents people changing the top-level discard
flag when reloading a pool target.
I'll push this test in a bit (dm-cache changes holding things up).
# we don't allow people to change their minds about top level
# discard support.
def test_change_discard_with_reload_fails
with_standard_pool(@size, :discard => true) do |pool|
assert_raise(ExitError) do
table = Table.new(ThinPool.new(@size, @metadata_dev, @data_dev,
@data_block_size, @low_water_mark, false, false, false))
pool.load(table)
end
end
with_standard_pool(@size, :discard => false) do |pool|
assert_raise(ExitError) do
table = Table.new(ThinPool.new(@size, @metadata_dev, @data_dev,
@data_block_size, @low_water_mark, false, true, false))
pool.load(table)
end
end
end
- Joe
commit 9f075e7f5330039addba213a0f512ab7c0ea3ecf
Author: Joe Thornber <ejt@redhat.com>
Date: Mon Mar 26 16:26:33 2012 +0100
dm-thin: don't allow the top level discard flag to be changed on reload.
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index e8fe5db..e5a6ed4 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1445,10 +1445,12 @@ static void __pool_dec(struct pool *pool)
static struct pool *__pool_find(struct mapped_device *pool_md,
struct block_device *metadata_dev,
- unsigned long block_size, char **error)
+ unsigned long block_size, char **error,
+ int *created)
{
struct pool *pool = __pool_table_lookup_metadata_dev(metadata_dev);
+ *created = 0;
if (pool) {
if (pool->pool_md != pool_md)
return ERR_PTR(-EBUSY);
@@ -1461,8 +1463,10 @@ static struct pool *__pool_find(struct mapped_device *pool_md,
return ERR_PTR(-EINVAL);
__pool_inc(pool);
- } else
+ } else {
pool = pool_create(pool_md, metadata_dev, block_size, error);
+ *created = 1;
+ }
}
return pool;
@@ -1542,7 +1546,7 @@ static int parse_pool_features(struct dm_arg_set *as, struct pool_features *pf,
*/
static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
- int r;
+ int r, pool_created;
struct pool_c *pt;
struct pool *pool;
struct pool_features pf;
@@ -1617,12 +1621,24 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
}
pool = __pool_find(dm_table_get_md(ti->table), metadata_dev->bdev,
- block_size, &ti->error);
+ block_size, &ti->error, &pool_created);
if (IS_ERR(pool)) {
r = PTR_ERR(pool);
goto out_free_pt;
}
+ /*
+ * 'pool_created' reflects whether this is the first table load.
+ * Top level discard support is not allowed to be changed after
+ * initial load. This would require a pool reload to trigger thin
+ * device changes.
+ */
+ if (!pool_created && pf.discard_enabled != pool->pf.discard_enabled) {
+ ti->error = "Discard support cannot be changed once enabled";
+ r = -EINVAL;
+ goto out_flags_changed;
+ }
+
pt->pool = pool;
pt->ti = ti;
pt->metadata_dev = metadata_dev;
@@ -1643,6 +1659,8 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
return 0;
+out_flags_changed:
+ __pool_dec(pool);
out_free_pt:
kfree(pt);
out:
^ permalink raw reply related [flat|nested] 26+ messages in thread* Re: [PATCH] dm thin: fix pool target flags that control discard
2012-03-26 14:15 ` Joe Thornber
2012-03-26 15:33 ` Mike Snitzer
2012-03-26 15:34 ` [PATCH] " Joe Thornber
@ 2012-03-26 15:46 ` Joe Thornber
2 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-26 15:46 UTC (permalink / raw)
To: device-mapper development, Alasdair G Kergon, Joe Thornber
commit 1f04d92a1dd6173e38d7dc652dfe22876c40957e
Author: Joe Thornber <ejt@redhat.com>
Date: Mon Mar 26 16:43:36 2012 +0100
dm-thin: don't set ti->discards_supported for the pool target
The pool is little more than a linear target; the default dm decision
about whether to support discards is adequate.
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index e5a6ed4..015cdce 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1646,10 +1646,8 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
pt->low_water_blocks = low_water_blocks;
pt->pf = pf;
ti->num_flush_requests = 1;
- if (pf.discard_enabled && pf.discard_passdown) {
- ti->discards_supported = 1;
+ if (pf.discard_enabled && pf.discard_passdown)
ti->num_discard_requests = 1;
- }
ti->private = pt;
pt->callbacks.congested_fn = pool_is_congested;
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 12/14] dm_thin: commit metadata every second
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (9 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 11/14] dm_thin: add pool target flags to control discard Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-16 15:22 ` [PATCH 13/14] dm_thin: commit just before processing a pool target info request Joe Thornber
` (2 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Enhancement.
Released blocks don't become available until after the next commit
(for crash resilience). Prior to this patch commits were only
triggered by a message to the target or a REQ_{FLUSH,FUA} bio. This
allowed far too big a position to build up.
The interval is hard-coded to 1 second. This is a sensible setting.
I'm not making this user configurable, since there isn't much to be
gained by tweaking this - and a lot lost by setting it far too high.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 28 ++++++++++++++++++++++++++--
1 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index a1b274b..a3a66af 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -23,6 +23,7 @@
#define DEFERRED_SET_SIZE 64
#define MAPPING_POOL_SIZE 1024
#define PRISON_CELLS 1024
+#define COMMIT_PERIOD HZ
/*
* The block size of the device holding pool data must be
@@ -528,8 +529,10 @@ struct pool {
struct workqueue_struct *wq;
struct work_struct worker;
+ struct delayed_work waker;
unsigned ref_count;
+ unsigned long last_commit_jiffies;
spinlock_t lock;
struct bio_list deferred_bios;
@@ -1411,6 +1414,12 @@ static void process_bio(struct thin_c *tc, struct bio *bio)
}
}
+static int need_commit_due_to_time(struct pool *pool)
+{
+ return jiffies < pool->last_commit_jiffies ||
+ jiffies > pool->last_commit_jiffies + COMMIT_PERIOD;
+}
+
static void process_deferred_bios(struct pool *pool)
{
unsigned long flags;
@@ -1458,7 +1467,7 @@ static void process_deferred_bios(struct pool *pool)
bio_list_init(&pool->deferred_flush_bios);
spin_unlock_irqrestore(&pool->lock, flags);
- if (bio_list_empty(&bios))
+ if (bio_list_empty(&bios) && !need_commit_due_to_time(pool))
return;
r = dm_pool_commit_metadata(pool->pmd);
@@ -1469,6 +1478,7 @@ static void process_deferred_bios(struct pool *pool)
bio_io_error(bio);
return;
}
+ pool->last_commit_jiffies = jiffies;
while ((bio = bio_list_pop(&bios)))
generic_make_request(bio);
@@ -1483,6 +1493,17 @@ static void do_worker(struct work_struct *ws)
process_deferred_bios(pool);
}
+/*
+ * We want to commit periodically so that not too much
+ * unwritten data builds up.
+ */
+static void do_waker(struct work_struct *ws)
+{
+ struct pool *pool = container_of(to_delayed_work(ws), struct pool, waker);
+ wake_worker(pool);
+ queue_delayed_work(pool->wq, &pool->waker, COMMIT_PERIOD);
+}
+
/*----------------------------------------------------------------*/
/*
@@ -1709,6 +1730,7 @@ static struct pool *pool_create(struct mapped_device *pool_md,
}
INIT_WORK(&pool->worker, do_worker);
+ INIT_DELAYED_WORK(&pool->waker, do_waker);
spin_lock_init(&pool->lock);
bio_list_init(&pool->deferred_bios);
bio_list_init(&pool->deferred_flush_bios);
@@ -1737,6 +1759,7 @@ static struct pool *pool_create(struct mapped_device *pool_md,
goto bad_endio_hook_pool;
}
pool->ref_count = 1;
+ pool->last_commit_jiffies = jiffies;
pool->pool_md = pool_md;
pool->md_dev = metadata_dev;
__pool_table_insert(pool);
@@ -2072,7 +2095,7 @@ static void pool_resume(struct dm_target *ti)
__requeue_bios(pool);
spin_unlock_irqrestore(&pool->lock, flags);
- wake_worker(pool);
+ do_waker(&pool->waker.work);
}
static void pool_postsuspend(struct dm_target *ti)
@@ -2081,6 +2104,7 @@ static void pool_postsuspend(struct dm_target *ti)
struct pool_c *pt = ti->private;
struct pool *pool = pt->pool;
+ cancel_delayed_work(&pool->waker);
flush_workqueue(pool->wq);
r = dm_pool_commit_metadata(pool->pmd);
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* [PATCH 13/14] dm_thin: commit just before processing a pool target info request
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (10 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 12/14] dm_thin: commit metadata every second Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-19 14:00 ` Alasdair G Kergon
2012-03-16 15:22 ` [PATCH 14/14] dm_thin: bump the target versions Joe Thornber
2012-03-20 18:24 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Alasdair G Kergon
13 siblings, 1 reply; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
Enhancement.
This makes the free block counts more accurate.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index a3a66af..63f8adf 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2305,6 +2305,15 @@ static int pool_status(struct dm_target *ti, status_type_t type,
if (r)
return r;
+ /*
+ * If we're in the middle of a transaction the free block
+ * counts can be quite out of date, so we do a quick
+ * commit.
+ */
+ r = dm_pool_commit_metadata(pool->pmd);
+ if (r)
+ return r;
+
r = dm_pool_get_free_metadata_block_count(pool->pmd,
&nr_free_blocks_metadata);
if (r)
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* Re: [PATCH 13/14] dm_thin: commit just before processing a pool target info request
2012-03-16 15:22 ` [PATCH 13/14] dm_thin: commit just before processing a pool target info request Joe Thornber
@ 2012-03-19 14:00 ` Alasdair G Kergon
2012-03-20 10:12 ` Joe Thornber
0 siblings, 1 reply; 26+ messages in thread
From: Alasdair G Kergon @ 2012-03-19 14:00 UTC (permalink / raw)
To: device-mapper development
On Fri, Mar 16, 2012 at 03:22:36PM +0000, Joe Thornber wrote:
> This makes the free block counts more accurate.
I'm not keen on this one: deferring it.
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -2305,6 +2305,15 @@ static int pool_status(struct dm_target *ti, status_type_t type,
(Used by dmsetup status and wait.)
> + /*
> + * If we're in the middle of a transaction the free block
> + * counts can be quite out of date, so we do a quick
> + * commit.
> + */
> + r = dm_pool_commit_metadata(pool->pmd);
> + if (r)
> + return r;
> +
1) If commit fails (repeatedly?) we still need to get the status.
2) Could the commit ever be too slow? (Or re-use NOFLUSH flag to skip it?)
Alasdair
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 13/14] dm_thin: commit just before processing a pool target info request
2012-03-19 14:00 ` Alasdair G Kergon
@ 2012-03-20 10:12 ` Joe Thornber
0 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-20 10:12 UTC (permalink / raw)
To: device-mapper development
On Mon, Mar 19, 2012 at 02:00:12PM +0000, Alasdair G Kergon wrote:
> On Fri, Mar 16, 2012 at 03:22:36PM +0000, Joe Thornber wrote:
> > This makes the free block counts more accurate.
>
> I'm not keen on this one: deferring it.
>
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> > @@ -2305,6 +2305,15 @@ static int pool_status(struct dm_target *ti, status_type_t type,
>
> (Used by dmsetup status and wait.)
>
> > + /*
> > + * If we're in the middle of a transaction the free block
> > + * counts can be quite out of date, so we do a quick
> > + * commit.
> > + */
> > + r = dm_pool_commit_metadata(pool->pmd);
> > + if (r)
> > + return r;
> > +
>
> 1) If commit fails (repeatedly?) we still need to get the status.
>
> 2) Could the commit ever be too slow? (Or re-use NOFLUSH flag to skip it?)
Agreed. The correct solution is to switch to a read-only mode if the
commit fails, and thus never commit again. Commit does some io
obviously; not sure what constitutes 'too slow'.
- Joe
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 14/14] dm_thin: bump the target versions
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (11 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 13/14] dm_thin: commit just before processing a pool target info request Joe Thornber
@ 2012-03-16 15:22 ` Joe Thornber
2012-03-20 18:24 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Alasdair G Kergon
13 siblings, 0 replies; 26+ messages in thread
From: Joe Thornber @ 2012-03-16 15:22 UTC (permalink / raw)
To: dm-devel; +Cc: Joe Thornber, Mike Snitzer
The target lines for the pool target and thin target both take
addition, optional arguments. Bump up the version numbers so tools
can check for these extra features.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-thin.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 63f8adf..3f65678 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2424,7 +2424,7 @@ static struct target_type pool_target = {
.name = "thin-pool",
.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
DM_TARGET_IMMUTABLE,
- .version = {1, 0, 0},
+ .version = {1, 1, 0},
.module = THIS_MODULE,
.ctr = pool_ctr,
.dtr = pool_dtr,
@@ -2693,7 +2693,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
static struct target_type thin_target = {
.name = "thin",
- .version = {1, 0, 0},
+ .version = {1, 1, 0},
.module = THIS_MODULE,
.ctr = thin_ctr,
.dtr = thin_dtr,
--
1.7.9.1
^ permalink raw reply related [flat|nested] 26+ messages in thread* Re: [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell.
2012-03-16 15:22 ` [PATCH 01/14] dm-thin: don't use the bi_next field for the holder of a cell Joe Thornber
` (12 preceding siblings ...)
2012-03-16 15:22 ` [PATCH 14/14] dm_thin: bump the target versions Joe Thornber
@ 2012-03-20 18:24 ` Alasdair G Kergon
13 siblings, 0 replies; 26+ messages in thread
From: Alasdair G Kergon @ 2012-03-20 18:24 UTC (permalink / raw)
To: Joe Thornber; +Cc: dm-devel, Mike Snitzer
On Fri, Mar 16, 2012 at 03:22:24PM +0000, Joe Thornber wrote:
> This also has the effect of simplifying some code that had to work out
> which bio was the holder.
Further simplification.
- cell isn't uninitialised
- cell2 doesn't need initialising
- avoid else clause indentation
- inmates is never uninitialised (as per comment)
Alasdair
From: Alasdair G Kergon <agk@redhat.com>
Clean up prison code after holders patch.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
---
drivers/md/dm-thin.c | 79 +++++++++++++++++++++++++--------------------------
1 file changed, 39 insertions(+), 40 deletions(-)
Index: linux-3.3/drivers/md/dm-thin.c
===================================================================
--- linux-3.3.orig/drivers/md/dm-thin.c
+++ linux-3.3/drivers/md/dm-thin.c
@@ -225,55 +225,57 @@ static struct cell *__search_bucket(stru
static int bio_detain(struct bio_prison *prison, struct cell_key *key,
struct bio *inmate, struct cell **ref)
{
- int r;
+ int r = 1;
unsigned long flags;
uint32_t hash = hash_key(prison, key);
- struct cell *uninitialized_var(cell), *cell2 = NULL;
+ struct cell *cell, *cell2;
BUG_ON(hash > prison->nr_buckets);
spin_lock_irqsave(&prison->lock, flags);
+
cell = __search_bucket(prison->cells + hash, key);
+ if (cell) {
+ bio_list_add(&cell->bios, inmate);
+ goto out;
+ }
- if (!cell) {
- /*
- * Allocate a new cell
- */
- spin_unlock_irqrestore(&prison->lock, flags);
- cell2 = mempool_alloc(prison->cell_pool, GFP_NOIO);
- spin_lock_irqsave(&prison->lock, flags);
+ /*
+ * Allocate a new cell
+ */
+ spin_unlock_irqrestore(&prison->lock, flags);
+ cell2 = mempool_alloc(prison->cell_pool, GFP_NOIO);
+ spin_lock_irqsave(&prison->lock, flags);
- /*
- * We've been unlocked, so we have to double check that
- * nobody else has inserted this cell in the meantime.
- */
- cell = __search_bucket(prison->cells + hash, key);
+ /*
+ * We've been unlocked, so we have to double check that
+ * nobody else has inserted this cell in the meantime.
+ */
+ cell = __search_bucket(prison->cells + hash, key);
+ if (cell) {
+ mempool_free(cell2, prison->cell_pool);
+ bio_list_add(&cell->bios, inmate);
+ goto out;
+ }
- if (!cell) {
- cell = cell2;
- cell2 = NULL;
-
- cell->prison = prison;
- memcpy(&cell->key, key, sizeof(cell->key));
- cell->holder = inmate;
- bio_list_init(&cell->bios);
- hlist_add_head(&cell->list, prison->cells + hash);
- r = 0;
+ /*
+ * Use new cell.
+ */
+ cell = cell2;
- } else {
- mempool_free(cell2, prison->cell_pool);
- cell2 = NULL;
- r = 1;
- bio_list_add(&cell->bios, inmate);
- }
+ cell->prison = prison;
+ memcpy(&cell->key, key, sizeof(cell->key));
+ cell->holder = inmate;
+ bio_list_init(&cell->bios);
+ hlist_add_head(&cell->list, prison->cells + hash);
- } else {
- r = 1;
- bio_list_add(&cell->bios, inmate);
- }
+ r = 0;
+
+out:
spin_unlock_irqrestore(&prison->lock, flags);
*ref = cell;
+
return r;
}
@@ -286,10 +288,8 @@ static void __cell_release(struct cell *
hlist_del(&cell->list);
- if (inmates) {
- bio_list_add(inmates, cell->holder);
- bio_list_merge(inmates, &cell->bios);
- }
+ bio_list_add(inmates, cell->holder);
+ bio_list_merge(inmates, &cell->bios);
mempool_free(cell, prison->cell_pool);
}
@@ -335,8 +335,7 @@ static void __cell_release_no_holder(str
struct bio_prison *prison = cell->prison;
hlist_del(&cell->list);
- if (inmates)
- bio_list_merge(inmates, &cell->bios);
+ bio_list_merge(inmates, &cell->bios);
mempool_free(cell, prison->cell_pool);
}
^ permalink raw reply [flat|nested] 26+ messages in thread