All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/10] Snapshot merging back from the dead?
@ 2008-04-01  7:32 Mark McLoughlin
  2008-04-01  7:32 ` [PATCH 01/10] dm snapshot: Fix the fraction full percentage Mark McLoughlin
  2008-04-14  8:41 ` [PATCH 0/10] Snapshot merging back from the dead? Mark McLoughlin
  0 siblings, 2 replies; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: dm-devel


Hi Alasdair,
        I've gone back and updated my snapshot-merged patches from
ages ago so that they apply to 2.6.25-rc7.

        Even though these updated patches are completely untested,
I've been quite careful in rebasing them and they do build, so I
figure they're worth posting here for review.

        Hopefully, I'll soon get a chance to test them at length and
also update the userland patches.

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 01/10] dm snapshot: Fix the fraction full percentage
  2008-04-01  7:32 [PATCH 0/10] Snapshot merging back from the dead? Mark McLoughlin
@ 2008-04-01  7:32 ` Mark McLoughlin
  2008-04-01  7:32   ` [PATCH 02/10] dm snapshot: Track snapshot reads Mark McLoughlin
  2008-04-14  8:41 ` [PATCH 0/10] Snapshot merging back from the dead? Mark McLoughlin
  1 sibling, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Make fraction_full() return a percentage of the amount of actual data
the device can hold, rather than of the amount of data+metadata it can
hold.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-exception-store.c |   36 ++++++++++++++++++++++++++++++++++--
 1 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-exception-store.c b/drivers/md/dm-exception-store.c
index 5bbce29..02fd3e9 100644
--- a/drivers/md/dm-exception-store.c
+++ b/drivers/md/dm-exception-store.c
@@ -429,11 +429,43 @@ static struct pstore *get_info(struct exception_store *store)
 	return (struct pstore *) store->context;
 }
 
+/*
+ * Calculate the number of chunks of data we can store in @n_chunks,
+ * given that each metadata area can store @exceptions_per_area
+ * exceptions.
+ */
+static inline uint32_t calc_n_data_chunks(uint32_t n_chunks,
+					  uint32_t exceptions_per_area)
+{
+	uint32_t stride, full_areas, leftovers;
+
+	/* we can't use the header */
+	if (--n_chunks <= 0)
+		return 0;
+
+	stride = exceptions_per_area + 1;
+
+	/* how many metadata areas would we completely fill? */
+	full_areas = n_chunks / stride;
+
+	/* how many chunks could we fit after the full areas? */
+	leftovers = n_chunks % stride;
+	if (leftovers - 1 <= 0)
+		leftovers = 0;
+
+	return (full_areas * exceptions_per_area) + leftovers;
+}
+
 static void persistent_fraction_full(struct exception_store *store,
 				     sector_t *numerator, sector_t *denominator)
 {
-	*numerator = get_info(store)->next_free * store->snap->chunk_size;
-	*denominator = get_dev_size(store->snap->cow->bdev);
+	struct pstore *ps = get_info(store);
+	uint32_t n_chunks;
+
+	n_chunks = get_dev_size(ps->snap->cow->bdev) >> ps->snap->chunk_shift;
+
+	*numerator = calc_n_data_chunks(ps->next_free, ps->exceptions_per_area);
+	*denominator = calc_n_data_chunks(n_chunks, ps->exceptions_per_area);
 }
 
 static void persistent_destroy(struct exception_store *store)
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 02/10] dm snapshot: Track snapshot reads
  2008-04-01  7:32 ` [PATCH 01/10] dm snapshot: Fix the fraction full percentage Mark McLoughlin
@ 2008-04-01  7:32   ` Mark McLoughlin
  2008-04-01  7:32     ` [PATCH 03/10] dm snapshot: Add exception store methods for merging Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Once a read request for an unmodified chunk in a snapshot has been
passed to the underlying original device, it is no longer tracked.

If, subsequently, a write is issued to the snapshot origin for the
same blocks, the code knows nothing about the outstanding read. It
blithely goes ahead and copies the original data across to the COW
device and submits the write request to the original device.

It does not wait for the read to complete first, so if the read from
the snapshot got delayed for any reason it could return changed data
from the origin write! Fortunately this is very rare because the read
is simple and quick but the write involves a series of slow steps.

This patch addresses the problem by adding code to track reads from
snapshot devices.

The pending_exception structure, which previously was only used for
tracking uncommitted writes, is now also used to track reads that get
mapped to original device.

The ref_count variable now keeps track of outstanding reads against
the snapshot, in addition to the already tracked number of chunks in
the process of being copied to the COW device. This prevents
pending_complete() from submitting origin writes to the original
device until all reads have been completed.

Device-mapper offers us private per-bio storage in map_context->ptr,
so we store the pending_exception structure there when mapping the bio
so we can retrieve it in our end_io function.

When the snapshot read is complete, snapshot_end_io will be called.
If there are no more snapshot reads outstanding, and the chunk is not
in the process of being copied to any COW device, then all pending
origin writes are queued for ksnapd to flush.

The scope of the pe_lock spinlock is expanded to protect ref_count,
the pending exception table and the origin_bios list to allow for
snapshot_end_io() to run concurrently in interrupt context.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |  151 +++++++++++++++++++++++++++++++++-----------------
 1 files changed, 101 insertions(+), 50 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index ae24eab..3b71f53 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -67,7 +67,8 @@ struct dm_snap_pending_exception {
 	struct dm_snap_pending_exception *primary_pe;
 
 	/*
-	 * Number of pending_exceptions processing this chunk.
+	 * Number of exception copies or snapshot reads pending on
+	 * this chunk.
 	 * When this drops to zero we must complete the origin bios.
 	 * If incrementing or decrementing this, hold pe->snap->lock for
 	 * the sibling concerned and not pe->primary_pe->snap->lock unless
@@ -705,12 +706,18 @@ static void __invalidate_snapshot(struct dm_snapshot *s, int err)
 static void get_pending_exception(struct dm_snap_pending_exception *pe)
 {
 	atomic_inc(&pe->ref_count);
+	if (pe->primary_pe && pe->primary_pe != pe)
+		atomic_inc(&pe->primary_pe->ref_count);
 }
 
 static struct bio *put_pending_exception(struct dm_snap_pending_exception *pe)
 {
+	struct dm_snapshot *s = pe->snap;
 	struct dm_snap_pending_exception *primary_pe;
 	struct bio *origin_bios = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&s->pe_lock, flags);
 
 	primary_pe = pe->primary_pe;
 
@@ -727,14 +734,21 @@ static struct bio *put_pending_exception(struct dm_snap_pending_exception *pe)
 	 * Free the pe if it's not linked to an origin write or if
 	 * it's not itself a primary pe.
 	 */
-	if (!primary_pe || primary_pe != pe)
+	if ((!primary_pe || primary_pe != pe) &&
+	    atomic_dec_and_test(&pe->ref_count)) {
+		remove_exception(&pe->e);
 		free_pending_exception(pe);
+	}
 
 	/*
 	 * Free the primary pe if nothing references it.
 	 */
-	if (primary_pe && !atomic_read(&primary_pe->ref_count))
+	if (primary_pe && !atomic_read(&primary_pe->ref_count)) {
+		remove_exception(&primary_pe->e);
 		free_pending_exception(primary_pe);
+	}
+
+	spin_unlock_irqrestore(&s->pe_lock, flags);
 
 	return origin_bios;
 }
@@ -778,7 +792,6 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
 	insert_completed_exception(s, e);
 
  out:
-	remove_exception(&pe->e);
 	snapshot_bios = bio_list_get(&pe->snapshot_bios);
 	origin_bios = put_pending_exception(pe);
 
@@ -857,6 +870,9 @@ __find_pending_exception(struct dm_snapshot *s, struct bio *bio)
 	struct dm_snap_exception *e;
 	struct dm_snap_pending_exception *pe;
 	chunk_t chunk = sector_to_chunk(s, bio->bi_sector);
+	unsigned long flags;
+
+	spin_lock_irqsave(&s->pe_lock, flags);
 
 	/*
 	 * Is there a pending exception for this already ?
@@ -865,27 +881,30 @@ __find_pending_exception(struct dm_snapshot *s, struct bio *bio)
 	if (e) {
 		/* cast the exception to a pending exception */
 		pe = container_of(e, struct dm_snap_pending_exception, e);
-		goto out;
+		goto ref_and_out;
 	}
 
 	/*
 	 * Create a new pending exception, we don't want
 	 * to hold the lock while we do this.
 	 */
+	spin_unlock_irqrestore(&s->pe_lock, flags);
 	up_write(&s->lock);
 	pe = alloc_pending_exception();
 	down_write(&s->lock);
+	spin_lock_irqsave(&s->pe_lock, flags);
 
 	if (!s->valid) {
 		free_pending_exception(pe);
-		return NULL;
+		pe = NULL;
+		goto out;
 	}
 
 	e = lookup_exception(&s->pending, chunk);
 	if (e) {
 		free_pending_exception(pe);
 		pe = container_of(e, struct dm_snap_pending_exception, e);
-		goto out;
+		goto ref_and_out;
 	}
 
 	pe->e.old_chunk = chunk;
@@ -896,15 +915,23 @@ __find_pending_exception(struct dm_snapshot *s, struct bio *bio)
 	pe->snap = s;
 	pe->started = 0;
 
-	if (s->store.prepare_exception(&s->store, &pe->e)) {
-		free_pending_exception(pe);
-		return NULL;
-	}
-
-	get_pending_exception(pe);
 	insert_exception(&s->pending, &pe->e);
 
+ ref_and_out:
+	if (bio_rw(bio) != WRITE)
+		get_pending_exception(pe);
+	else if (!pe->started) {
+		if (s->store.prepare_exception(&s->store, &pe->e)) {
+			free_pending_exception(pe);
+			pe = NULL;
+			goto out;
+		}
+		get_pending_exception(pe);
+	}
+
  out:
+	spin_unlock_irqrestore(&s->pe_lock, flags);
+
 	return pe;
 }
 
@@ -949,39 +976,29 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio,
 		goto out_unlock;
 	}
 
-	/*
-	 * Write to snapshot - higher level takes care of RW/RO
-	 * flags so we should only get this if we are
-	 * writeable.
-	 */
-	if (bio_rw(bio) == WRITE) {
-		pe = __find_pending_exception(s, bio);
-		if (!pe) {
-			__invalidate_snapshot(s, -ENOMEM);
-			r = -EIO;
-			goto out_unlock;
-		}
+	pe = __find_pending_exception(s, bio);
+	if (!pe) {
+		__invalidate_snapshot(s, -ENOMEM);
+		r = -EIO;
+		goto out_unlock;
+	}
 
+	if (bio_rw(bio) == WRITE) {
 		remap_exception(s, &pe->e, bio, chunk);
 		bio_list_add(&pe->snapshot_bios, bio);
 
 		r = DM_MAPIO_SUBMITTED;
 
 		if (!pe->started) {
-			/* this is protected by snap->lock */
 			pe->started = 1;
 			up_write(&s->lock);
 			start_copy(pe);
 			goto out;
 		}
-	} else
-		/*
-		 * FIXME: this read path scares me because we
-		 * always use the origin when we have a pending
-		 * exception.  However I can't think of a
-		 * situation where this is wrong - ejt.
-		 */
+	} else {
 		bio->bi_bdev = s->origin->bdev;
+		map_context->ptr = pe;
+	}
 
  out_unlock:
 	up_write(&s->lock);
@@ -989,6 +1006,36 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio,
 	return r;
 }
 
+static int snapshot_end_io(struct dm_target *ti, struct bio *bio,
+			   int error, union map_info *map_context)
+{
+	struct dm_snapshot *s = ti->private;
+	struct dm_snap_pending_exception *pe = map_context->ptr;
+	struct bio *flush;
+	unsigned long flags;
+
+	if (bio_rw(bio) == WRITE || !pe)
+		return error;
+
+	flush = put_pending_exception(pe);
+	if (!flush)
+		return error;
+
+	spin_lock_irqsave(&s->pe_lock, flags);
+
+	while (flush) {
+		struct bio *next = flush->bi_next;
+		bio_list_add(&s->queued_bios, flush);
+		flush = next;
+	}
+
+	queue_work(ksnapd, &s->queued_bios_work);
+
+	spin_unlock_irqrestore(&s->pe_lock, flags);
+
+	return error;
+}
+
 static void snapshot_resume(struct dm_target *ti)
 {
 	struct dm_snapshot *s = ti->private;
@@ -1043,11 +1090,12 @@ static int snapshot_status(struct dm_target *ti, status_type_t type,
  *---------------------------------------------------------------*/
 static int __origin_write(struct list_head *snapshots, struct bio *bio)
 {
-	int r = DM_MAPIO_REMAPPED, first = 0;
+	int r = DM_MAPIO_REMAPPED;
 	struct dm_snapshot *snap;
 	struct dm_snap_exception *e;
 	struct dm_snap_pending_exception *pe, *next_pe, *primary_pe = NULL;
 	chunk_t chunk;
+	unsigned long flags;
 	LIST_HEAD(pe_queue);
 
 	/* Do all the snapshots on this origin */
@@ -1073,9 +1121,6 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 		 * Check exception table to see if block
 		 * is already remapped in this snapshot
 		 * and trigger an exception if not.
-		 *
-		 * ref_count is initialised to 1 so pending_complete()
-		 * won't destroy the primary_pe while we're inside this loop.
 		 */
 		e = lookup_exception(&snap->complete, chunk);
 		if (e)
@@ -1087,6 +1132,8 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 			goto next_snapshot;
 		}
 
+		spin_lock_irqsave(&snap->pe_lock, flags);
+
 		if (!primary_pe) {
 			/*
 			 * Either every pe here has same
@@ -1094,10 +1141,15 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 			 */
 			if (pe->primary_pe)
 				primary_pe = pe->primary_pe;
-			else {
+			else
 				primary_pe = pe;
-				first = 1;
-			}
+
+			/*
+			 * we take a ref here so primary_pe won't
+			 * be destroyed while we're inside this
+			 * loop.
+			 */
+			get_pending_exception(primary_pe);
 
 			bio_list_add(&primary_pe->origin_bios, bio);
 
@@ -1106,9 +1158,14 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 
 		if (!pe->primary_pe) {
 			pe->primary_pe = primary_pe;
-			get_pending_exception(primary_pe);
+			atomic_inc(&pe->ref_count);
+			if (primary_pe != pe)
+				atomic_add(atomic_read(&pe->ref_count),
+					   &primary_pe->ref_count);
 		}
 
+		spin_unlock_irqrestore(&snap->pe_lock, flags);
+
 		if (!pe->started) {
 			pe->started = 1;
 			list_add_tail(&pe->list, &pe_queue);
@@ -1122,18 +1179,11 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 		return r;
 
 	/*
-	 * If this is the first time we're processing this chunk and
-	 * ref_count is now 1 it means all the pending exceptions
+	 * If ref_count is now 1 it means all the pending exceptions
 	 * got completed while we were in the loop above, so it falls to
 	 * us here to remove the primary_pe and submit any origin_bios.
 	 */
-
-	if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
-		flush_bios(bio_list_get(&primary_pe->origin_bios));
-		free_pending_exception(primary_pe);
-		/* If we got here, pe_queue is necessarily empty. */
-		return r;
-	}
+	flush_bios(put_pending_exception(primary_pe));
 
 	/*
 	 * Now that we have a complete pe list we can start the copying.
@@ -1266,6 +1316,7 @@ static struct target_type snapshot_target = {
 	.ctr     = snapshot_ctr,
 	.dtr     = snapshot_dtr,
 	.map     = snapshot_map,
+	.end_io  = snapshot_end_io,
 	.resume  = snapshot_resume,
 	.status  = snapshot_status,
 };
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 03/10] dm snapshot: Add exception store methods for merging
  2008-04-01  7:32   ` [PATCH 02/10] dm snapshot: Track snapshot reads Mark McLoughlin
@ 2008-04-01  7:32     ` Mark McLoughlin
  2008-04-01  7:32       ` [PATCH 04/10] dm snapshot: Split snapshot constructor Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Add prepare_merge() and commit_merge() methods to the exception store to
allow for snapshot chunks to be removed from the exception store as the
chunks are merged back into the origin.

prepare_merge() retrieves the details of the last exception in the store
in preparation for that chunk to be merged back into the origin.

commit_merge() is then later used to remove that same exception from the
store once the merge has completed.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-exception-store.c |   69 +++++++++++++++++++++++++++++++++++++++
 drivers/md/dm-snap.h            |   12 +++++++
 2 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-exception-store.c b/drivers/md/dm-exception-store.c
index 02fd3e9..4fd5381 100644
--- a/drivers/md/dm-exception-store.c
+++ b/drivers/md/dm-exception-store.c
@@ -624,6 +624,71 @@ static void persistent_commit(struct exception_store *store,
 	}
 }
 
+/* Given an exception index within a certain area, return the
+ * corresponding chunk index.
+ */
+static inline uint32_t exception_to_chunk(uint32_t area,
+					  uint32_t exceptions_per_area,
+					  uint32_t exception)
+{
+	/* Header followed by sets of (metadata chunk, exceptions_per_area chunks)
+	 */
+	return 1 + (area * (1 + exceptions_per_area)) + 1 + exception;
+}
+
+static int persistent_prepare_merge(struct exception_store *store,
+				    struct dm_snap_exception *e,
+				    int *empty)
+{
+	struct pstore *ps = get_info(store);
+	struct disk_exception de;
+	int r;
+
+	*empty = 0;
+
+	if (ps->current_committed == 0) {
+		if (ps->current_area == 0) {
+			*empty = 1;
+			return 0;
+		}
+
+		r = area_io(ps, ps->current_area - 1, READ);
+		if (r)
+			return r;
+
+		ps->current_committed = ps->exceptions_per_area - 1;
+	} else {
+		ps->current_committed--;
+	}
+
+	read_exception(ps, ps->current_committed, &de);
+
+	e->old_chunk = de.old_chunk;
+	e->new_chunk = de.new_chunk;
+
+	ps->next_free = exception_to_chunk(ps->current_area,
+					   ps->exceptions_per_area,
+					   ps->current_committed);
+
+	return 0;
+}
+
+static void persistent_commit_merge(struct exception_store *store)
+{
+	struct pstore *ps = get_info(store);
+	struct disk_exception de;
+	int r;
+
+	read_exception(ps, ps->current_committed, &de);
+
+	de.new_chunk = 0;
+	write_exception(ps, ps->current_committed, &de);
+
+	r = area_io(ps, ps->current_area, WRITE);
+	if (r)
+		ps->valid = 0;
+}
+
 static void persistent_drop(struct exception_store *store)
 {
 	struct pstore *ps = get_info(store);
@@ -664,6 +729,8 @@ int dm_create_persistent(struct exception_store *store)
 	store->read_metadata = persistent_read_metadata;
 	store->prepare_exception = persistent_prepare;
 	store->commit_exception = persistent_commit;
+	store->prepare_merge = persistent_prepare_merge;
+	store->commit_merge = persistent_commit_merge;
 	store->drop_snapshot = persistent_drop;
 	store->fraction_full = persistent_fraction_full;
 	store->context = ps;
@@ -727,6 +794,8 @@ int dm_create_transient(struct exception_store *store)
 	store->read_metadata = transient_read_metadata;
 	store->prepare_exception = transient_prepare;
 	store->commit_exception = transient_commit;
+	store->prepare_merge = NULL;
+	store->commit_merge = NULL;
 	store->drop_snapshot = NULL;
 	store->fraction_full = transient_fraction_full;
 
diff --git a/drivers/md/dm-snap.h b/drivers/md/dm-snap.h
index 93bce5d..8b02a42 100644
--- a/drivers/md/dm-snap.h
+++ b/drivers/md/dm-snap.h
@@ -115,6 +115,18 @@ struct exception_store {
 				  void *callback_context);
 
 	/*
+	 * Prepare the next exception for merging.
+	 */
+	int (*prepare_merge) (struct exception_store *store,
+			      struct dm_snap_exception *e,
+			      int *empty);
+
+	/*
+	 * Remove the currently merging exception.
+	 */
+	void (*commit_merge) (struct exception_store *store);
+
+	/*
 	 * The snapshot is invalid, note this in the metadata.
 	 */
 	void (*drop_snapshot) (struct exception_store *store);
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 04/10] dm snapshot: Split snapshot constructor
  2008-04-01  7:32     ` [PATCH 03/10] dm snapshot: Add exception store methods for merging Mark McLoughlin
@ 2008-04-01  7:32       ` Mark McLoughlin
  2008-04-01  7:32         ` [PATCH 05/10] dm snapshot: Split up origin_resume() Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Split parts of snapshot_ctr() into a separate function, snapshot_init()
and parts of snapshot_dtr() into snapshot_destroy().

This is needed by later patches because the merging target re-uses the
dm_snapshot structure, but the merging target doesn't register itself
as a snapshot (i.e. origin writes shouldn't cause an exception), has a
different set of arguments and in some ways behaves more like a
snapshot-origin target.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |  166 +++++++++++++++++++++++++++----------------------
 1 files changed, 91 insertions(+), 75 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 3b71f53..20f12a8 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -435,18 +435,9 @@ static ulong round_up(ulong n, ulong size)
 	return (n + size) & ~size;
 }
 
-static int set_chunk_size(struct dm_snapshot *s, const char *chunk_size_arg,
+static int set_chunk_size(struct dm_snapshot *s, unsigned long chunk_size,
 			  char **error)
 {
-	unsigned long chunk_size;
-	char *value;
-
-	chunk_size = simple_strtoul(chunk_size_arg, &value, 10);
-	if (*chunk_size_arg == '\0' || *value != '\0') {
-		*error = "Invalid chunk size";
-		return -EINVAL;
-	}
-
 	if (!chunk_size) {
 		s->chunk_size = s->chunk_mask = s->chunk_shift = 0;
 		return 0;
@@ -477,61 +468,30 @@ static int set_chunk_size(struct dm_snapshot *s, const char *chunk_size_arg,
 	return 0;
 }
 
-/*
- * Construct a snapshot mapping: <origin_dev> <COW-dev> <p/n> <chunk-size>
- */
-static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+static int snapshot_init(struct dm_target *ti, struct dm_snapshot *s,
+			 const char *origin_path, int origin_mode,
+			 const char *cow_path, int cow_mode,
+			 char type, unsigned long chunk_size)
 {
-	struct dm_snapshot *s;
-	int r = -EINVAL;
-	char persistent;
-	char *origin_path;
-	char *cow_path;
-
-	if (argc != 4) {
-		ti->error = "requires exactly 4 arguments";
-		r = -EINVAL;
-		goto bad1;
-	}
-
-	origin_path = argv[0];
-	cow_path = argv[1];
-	persistent = toupper(*argv[2]);
-
-	if (persistent != 'P' && persistent != 'N') {
-		ti->error = "Persistent flag is not P or N";
-		r = -EINVAL;
-		goto bad1;
-	}
-
-	s = kmalloc(sizeof(*s), GFP_KERNEL);
-	if (s == NULL) {
-		ti->error = "Cannot allocate snapshot context private "
-		    "structure";
-		r = -ENOMEM;
-		goto bad1;
-	}
+	int r;
 
-	r = dm_get_device(ti, origin_path, 0, ti->len, FMODE_READ, &s->origin);
+	r = dm_get_device(ti, origin_path, 0, ti->len, origin_mode, &s->origin);
 	if (r) {
 		ti->error = "Cannot get origin device";
-		goto bad2;
+		goto bad1;
 	}
 
-	r = dm_get_device(ti, cow_path, 0, 0,
-			  FMODE_READ | FMODE_WRITE, &s->cow);
+	r = dm_get_device(ti, cow_path, 0, 0, cow_mode, &s->cow);
 	if (r) {
-		dm_put_device(ti, s->origin);
 		ti->error = "Cannot get COW device";
 		goto bad2;
 	}
 
-	r = set_chunk_size(s, argv[3], &ti->error);
+	r = set_chunk_size(s, chunk_size, &ti->error);
 	if (r)
 		goto bad3;
 
-	s->type = persistent;
-
+	s->type = type;
 	s->valid = 1;
 	s->active = 0;
 	s->last_percent = 0;
@@ -548,7 +508,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 
 	s->store.snap = s;
 
-	if (persistent == 'P')
+	if (s->type == 'P')
 		r = dm_create_persistent(&s->store);
 	else
 		r = dm_create_transient(&s->store);
@@ -578,41 +538,24 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	bio_list_init(&s->queued_bios);
 	INIT_WORK(&s->queued_bios_work, flush_queued_bios);
 
-	/* Add snapshot to the list of snapshots for this origin */
-	/* Exceptions aren't triggered till snapshot_resume() is called */
-	if (register_snapshot(s)) {
-		r = -EINVAL;
-		ti->error = "Cannot register snapshot origin";
-		goto bad6;
-	}
-
-	ti->private = s;
-	ti->split_io = s->chunk_size;
-
 	return 0;
 
  bad6:
 	kcopyd_client_destroy(s->kcopyd_client);
-
  bad5:
 	s->store.destroy(&s->store);
-
  bad4:
 	exit_exception_table(&s->pending, pending_cache);
 	exit_exception_table(&s->complete, exception_cache);
-
  bad3:
 	dm_put_device(ti, s->cow);
-	dm_put_device(ti, s->origin);
-
  bad2:
-	kfree(s);
-
+	dm_put_device(ti, s->origin);
  bad1:
 	return r;
 }
 
-static void __free_exceptions(struct dm_snapshot *s)
+static void snapshot_destroy(struct dm_target *ti, struct dm_snapshot *s)
 {
 	kcopyd_client_destroy(s->kcopyd_client);
 	s->kcopyd_client = NULL;
@@ -621,6 +564,82 @@ static void __free_exceptions(struct dm_snapshot *s)
 	exit_exception_table(&s->complete, exception_cache);
 
 	s->store.destroy(&s->store);
+
+	dm_put_device(ti, s->origin);
+	dm_put_device(ti, s->cow);
+}
+
+/*
+ * Construct a snapshot mapping: <origin_dev> <COW-dev> <p/n> <chunk-size>
+ */
+static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct dm_snapshot *s;
+	int r = -EINVAL;
+	char persistent;
+	char *origin_path;
+	char *cow_path;
+	char *chunk_size_arg;
+	unsigned long chunk_size;
+	char *value;
+
+	if (argc != 4) {
+		ti->error = "requires exactly 4 arguments";
+		r = -EINVAL;
+		goto bad1;
+	}
+
+	origin_path = argv[0];
+	cow_path = argv[1];
+	persistent = toupper(*argv[2]);
+	chunk_size_arg = argv[3];
+
+	if (persistent != 'P' && persistent != 'N') {
+		ti->error = "Persistent flag is not P or N";
+		r = -EINVAL;
+		goto bad1;
+	}
+
+	chunk_size = simple_strtoul(chunk_size_arg, &value, 10);
+	if (*chunk_size_arg == '\0' || *value != '\0') {
+		ti->error = "Invalid chunk size";
+		goto bad1;
+	}
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (s == NULL) {
+		ti->error = "Cannot allocate snapshot context private "
+		    "structure";
+		r = -ENOMEM;
+		goto bad1;
+	}
+
+	r = snapshot_init(ti, s,
+			  origin_path, FMODE_READ,
+			  cow_path, FMODE_READ | FMODE_WRITE,
+			  persistent, chunk_size);
+	if (r)
+		goto bad2;
+
+	/* Add snapshot to the list of snapshots for this origin */
+	/* Exceptions aren't triggered till snapshot_resume() is called */
+	if (register_snapshot(s)) {
+		r = -EINVAL;
+		ti->error = "Cannot register snapshot origin";
+		goto bad3;
+	}
+
+	ti->private = s;
+	ti->split_io = s->chunk_size;
+
+	return 0;
+
+ bad3:
+	snapshot_destroy(ti, s);
+ bad2:
+	kfree(s);
+ bad1:
+	return r;
 }
 
 static void snapshot_dtr(struct dm_target *ti)
@@ -633,10 +652,7 @@ static void snapshot_dtr(struct dm_target *ti)
 	/* After this returns there can be no new kcopyd jobs. */
 	unregister_snapshot(s);
 
-	__free_exceptions(s);
-
-	dm_put_device(ti, s->origin);
-	dm_put_device(ti, s->cow);
+	snapshot_destroy(ti, s);
 
 	kfree(s);
 }
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 05/10] dm snapshot: Split up origin_resume()
  2008-04-01  7:32       ` [PATCH 04/10] dm snapshot: Split snapshot constructor Mark McLoughlin
@ 2008-04-01  7:32         ` Mark McLoughlin
  2008-04-01  7:32           ` [PATCH 06/10] dm snapshot: Make find_pending_exception() not take a bio Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Split the origin_resume() code which calculates the minimum chunk size
of all related snapshots into a separate function, min_chunk_size().

This function will be used by the resume() function for the merged
target.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 20f12a8..7684f8f 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1275,13 +1275,8 @@ static int origin_map(struct dm_target *ti, struct bio *bio,
 
 #define min_not_zero(l, r) (l == 0) ? r : ((r == 0) ? l : min(l, r))
 
-/*
- * Set the target "split_io" field to the minimum of all the snapshots'
- * chunk sizes.
- */
-static void origin_resume(struct dm_target *ti)
+static chunk_t min_chunk_size(struct dm_dev *dev)
 {
-	struct dm_dev *dev = ti->private;
 	struct dm_snapshot *snap;
 	struct origin *o;
 	chunk_t chunk_size = 0;
@@ -1293,7 +1288,18 @@ static void origin_resume(struct dm_target *ti)
 			chunk_size = min_not_zero(chunk_size, snap->chunk_size);
 	up_read(&_origins_lock);
 
-	ti->split_io = chunk_size;
+	return chunk_size;
+}
+
+/*
+ * Set the target "split_io" field to the minimum of all the snapshots'
+ * chunk sizes.
+ */
+static void origin_resume(struct dm_target *ti)
+{
+	struct dm_dev *dev = ti->private;
+
+	ti->split_io = min_chunk_size(dev);
 }
 
 static int origin_status(struct dm_target *ti, status_type_t type, char *result,
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 06/10] dm snapshot: Make find_pending_exception() not take a bio
  2008-04-01  7:32         ` [PATCH 05/10] dm snapshot: Split up origin_resume() Mark McLoughlin
@ 2008-04-01  7:32           ` Mark McLoughlin
  2008-04-01  7:32             ` [PATCH 07/10] dm snapshot: Make origin_write() bio arg optional Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Change __find_pending_exception() so that it takes an old chunk index
and "prepare" flag rather than a bio structure.

Rationale is that the merged target will use the pending exception
table for tracking writes which have been mapped to the COW device. In
such cases, we won't want a new exception to be prepared in the
exception store.

Also, the merged target will want to trigger an exception in all
associated snapshots without a corresponding bio structure. We'll need
to do this when initiating a copy from the COW device to the origin
device.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 7684f8f..e64b9b4 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -881,11 +881,10 @@ static void start_copy(struct dm_snap_pending_exception *pe)
  * this.
  */
 static struct dm_snap_pending_exception *
-__find_pending_exception(struct dm_snapshot *s, struct bio *bio)
+__find_pending_exception(struct dm_snapshot *s, chunk_t chunk, int prepare)
 {
 	struct dm_snap_exception *e;
 	struct dm_snap_pending_exception *pe;
-	chunk_t chunk = sector_to_chunk(s, bio->bi_sector);
 	unsigned long flags;
 
 	spin_lock_irqsave(&s->pe_lock, flags);
@@ -934,7 +933,7 @@ __find_pending_exception(struct dm_snapshot *s, struct bio *bio)
 	insert_exception(&s->pending, &pe->e);
 
  ref_and_out:
-	if (bio_rw(bio) != WRITE)
+	if (!prepare)
 		get_pending_exception(pe);
 	else if (!pe->started) {
 		if (s->store.prepare_exception(&s->store, &pe->e)) {
@@ -992,7 +991,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio,
 		goto out_unlock;
 	}
 
-	pe = __find_pending_exception(s, bio);
+	pe = __find_pending_exception(s, chunk, bio_rw(bio) == WRITE);
 	if (!pe) {
 		__invalidate_snapshot(s, -ENOMEM);
 		r = -EIO;
@@ -1142,7 +1141,7 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 		if (e)
 			goto next_snapshot;
 
-		pe = __find_pending_exception(snap, bio);
+		pe = __find_pending_exception(snap, chunk, 1);
 		if (!pe) {
 			__invalidate_snapshot(snap, -ENOMEM);
 			goto next_snapshot;
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 07/10] dm snapshot: Make origin_write() bio arg optional
  2008-04-01  7:32           ` [PATCH 06/10] dm snapshot: Make find_pending_exception() not take a bio Mark McLoughlin
@ 2008-04-01  7:32             ` Mark McLoughlin
  2008-04-01  7:32               ` [PATCH 08/10] dm snapshot: Add basic version of snapshot-merged target Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Similar to the previous patch, change the arguments to
__origin_write() so that rather than requiring a bio structure, we
take a origin device sector index and optional bio structure.

Rationale is that the merged target will want to trigger an exception
in all associated snapshots before merging the chunk from the COW
to the origin and at this point we don't have a bio structure
corresponding to that copy.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |   21 ++++++++++++++-------
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index e64b9b4..e4ba02c 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1103,7 +1103,8 @@ static int snapshot_status(struct dm_target *ti, status_type_t type,
 /*-----------------------------------------------------------------
  * Origin methods
  *---------------------------------------------------------------*/
-static int __origin_write(struct list_head *snapshots, struct bio *bio)
+static int __origin_write(struct list_head *snapshots, sector_t sector,
+			  struct bio *bio)
 {
 	int r = DM_MAPIO_REMAPPED;
 	struct dm_snapshot *snap;
@@ -1123,14 +1124,14 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 			goto next_snapshot;
 
 		/* Nothing to do if writing beyond end of snapshot */
-		if (bio->bi_sector >= dm_table_get_size(snap->table))
+		if (sector >= dm_table_get_size(snap->table))
 			goto next_snapshot;
 
 		/*
 		 * Remember, different snapshots can have
 		 * different chunk sizes.
 		 */
-		chunk = sector_to_chunk(snap, bio->bi_sector);
+		chunk = sector_to_chunk(snap, sector);
 
 		/*
 		 * Check exception table to see if block
@@ -1166,7 +1167,8 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 			 */
 			get_pending_exception(primary_pe);
 
-			bio_list_add(&primary_pe->origin_bios, bio);
+			if (bio)
+				bio_list_add(&primary_pe->origin_bios, bio);
 
 			r = DM_MAPIO_SUBMITTED;
 		}
@@ -1212,7 +1214,8 @@ static int __origin_write(struct list_head *snapshots, struct bio *bio)
 /*
  * Called on a write from the origin driver.
  */
-static int do_origin(struct dm_dev *origin, struct bio *bio)
+static int do_origin(struct dm_dev *origin, sector_t sector,
+		     struct bio *bio)
 {
 	struct origin *o;
 	int r = DM_MAPIO_REMAPPED;
@@ -1220,7 +1223,7 @@ static int do_origin(struct dm_dev *origin, struct bio *bio)
 	down_read(&_origins_lock);
 	o = __lookup_origin(origin->bdev);
 	if (o)
-		r = __origin_write(&o->snapshots, bio);
+		r = __origin_write(&o->snapshots, sector, bio);
 	up_read(&_origins_lock);
 
 	return r;
@@ -1266,10 +1269,14 @@ static int origin_map(struct dm_target *ti, struct bio *bio,
 		      union map_info *map_context)
 {
 	struct dm_dev *dev = ti->private;
+
 	bio->bi_bdev = dev->bdev;
 
 	/* Only tell snapshots if this is a write */
-	return (bio_rw(bio) == WRITE) ? do_origin(dev, bio) : DM_MAPIO_REMAPPED;
+	if (bio_rw(bio) != WRITE)
+		return DM_MAPIO_REMAPPED;
+
+	return do_origin(dev, bio->bi_sector, bio);
 }
 
 #define min_not_zero(l, r) (l == 0) ? r : ((r == 0) ? l : min(l, r))
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 08/10] dm snapshot: Add basic version of snapshot-merged target
  2008-04-01  7:32             ` [PATCH 07/10] dm snapshot: Make origin_write() bio arg optional Mark McLoughlin
@ 2008-04-01  7:32               ` Mark McLoughlin
  2008-04-01  7:32                 ` [PATCH 09/10] dm snapshot: Track snapshot-merged writes Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Add a very basic version of the snapshot-merged target. The only
arguments required for the target are the origin device path and the
COW device path. The origin device may not be used with another
snapshot-origin target at the same time and the COW device may not be
used with another snapshot target.

The target acts much like a snapshot target exception no new
exceptions are triggered by writes. If an I/O request corresponds to a
chunk which has not already been mapped to the COW device, then the
I/O will always be dispatched to the chunk on the origin
device. However, if it is a write, before dispatching it we will
trigger exceptions in any other associated snapshots much like the
snapshot-origin target does.

Another slight twist is the status() function reports the merging
progress of the target - i.e. if the COW device is full, we report 0%
and if it's empty we report 100%.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |  166 ++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/md/dm-snap.h |    4 +
 2 files changed, 159 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index e4ba02c..5236763 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1326,6 +1326,127 @@ static int origin_status(struct dm_target *ti, status_type_t type, char *result,
 	return 0;
 }
 
+/*
+ * Construct a merged snapshot: <origin_dev> <COW-dev>
+ */
+static int merged_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct dm_merged *merged;
+	char *origin_path;
+	char *cow_path;
+	int r;
+
+	if (argc != 2) {
+		ti->error = "dm-snapshot: requires exactly 2 arguments";
+		return -EINVAL;
+	}
+
+	origin_path = argv[0];
+	cow_path = argv[1];
+
+	merged = kmalloc(sizeof(struct dm_merged), GFP_KERNEL);
+	if (!merged) {
+		ti->error = "Cannot allocate merged snapshot context private "
+			    "structure";
+		return -ENOMEM;
+	}
+
+	r = snapshot_init(ti, &merged->snap,
+			  origin_path, dm_table_get_mode(ti->table),
+			  cow_path, FMODE_READ | FMODE_WRITE,
+			  'P', 0);
+	if (r) {
+		kfree(merged);
+		return r;
+	}
+
+	ti->private = merged;
+
+	return 0;
+}
+
+static void merged_dtr(struct dm_target *ti)
+{
+	struct dm_merged *merged = ti->private;
+
+	snapshot_destroy(ti, &merged->snap);
+
+	kfree(merged);
+}
+
+static int merged_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct dm_merged *merged = ti->private;
+	struct dm_snapshot *s = &merged->snap;
+	struct dm_snap_exception *e;
+	chunk_t chunk;
+	int r = DM_MAPIO_REMAPPED;;
+
+	if (unlikely(bio_barrier(bio)))
+		return -EOPNOTSUPP;
+
+	chunk = sector_to_chunk(s, bio->bi_sector);
+
+	down_read(&s->lock);
+
+	if (!s->valid) {
+		up_read(&s->lock);
+		return -EIO;
+	}
+
+	e = lookup_exception(&s->complete, chunk);
+	if (e)
+		remap_exception(s, e, bio, chunk);
+	else {
+		bio->bi_bdev = s->origin->bdev;
+
+		if (bio_rw(bio) == WRITE)
+			r = do_origin(s->origin, bio->bi_sector, bio);
+	}
+
+	up_read(&s->lock);
+
+	return r;
+}
+
+static void merged_resume(struct dm_target *ti)
+{
+	struct dm_merged *merged = ti->private;
+
+	ti->split_io = min_chunk_size(merged->snap.origin);
+}
+
+static int merged_status(struct dm_target *ti, status_type_t type, char *result,
+			 unsigned int maxlen)
+{
+	struct dm_merged *merged = ti->private;
+	struct dm_snapshot *snap = &merged->snap;
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		if (!snap->valid)
+			snprintf(result, maxlen, "Invalid");
+		else {
+			sector_t numerator, denominator;
+			snap->store.fraction_full(&snap->store,
+						  &numerator,
+						  &denominator);
+			snprintf(result, maxlen, "%llu/%llu",
+				 (unsigned long long)denominator - numerator,
+				 (unsigned long long)denominator);
+		}
+		break;
+
+	case STATUSTYPE_TABLE:
+		snprintf(result, maxlen, "%s %s",
+			 snap->origin->name, snap->cow->name);
+		break;
+	}
+
+	return 0;
+}
+
 static struct target_type origin_target = {
 	.name    = "snapshot-origin",
 	.version = {1, 6, 0},
@@ -1349,6 +1470,17 @@ static struct target_type snapshot_target = {
 	.status  = snapshot_status,
 };
 
+static struct target_type merged_target = {
+	.name    = "snapshot-merged",
+	.version = {1, 1, 1},
+	.module  = THIS_MODULE,
+	.ctr     = merged_ctr,
+	.dtr     = merged_dtr,
+	.map     = merged_map,
+	.resume  = merged_resume,
+	.status  = merged_status,
+};
+
 static int __init dm_snapshot_init(void)
 {
 	int r;
@@ -1365,53 +1497,61 @@ static int __init dm_snapshot_init(void)
 		goto bad1;
 	}
 
+	r = dm_register_target(&merged_target);
+	if (r < 0) {
+		DMERR("Device mapper: Merged: register failed %d\n", r);
+		goto bad2;
+	}
+
 	r = init_origin_hash();
 	if (r) {
 		DMERR("init_origin_hash failed.");
-		goto bad2;
+		goto bad3;
 	}
 
 	exception_cache = KMEM_CACHE(dm_snap_exception, 0);
 	if (!exception_cache) {
 		DMERR("Couldn't create exception cache.");
 		r = -ENOMEM;
-		goto bad3;
+		goto bad4;
 	}
 
 	pending_cache = KMEM_CACHE(dm_snap_pending_exception, 0);
 	if (!pending_cache) {
 		DMERR("Couldn't create pending cache.");
 		r = -ENOMEM;
-		goto bad4;
+		goto bad5;
 	}
 
 	pending_pool = mempool_create_slab_pool(128, pending_cache);
 	if (!pending_pool) {
 		DMERR("Couldn't create pending pool.");
 		r = -ENOMEM;
-		goto bad5;
+		goto bad6;
 	}
 
 	ksnapd = create_singlethread_workqueue("ksnapd");
 	if (!ksnapd) {
 		DMERR("Failed to create ksnapd workqueue.");
 		r = -ENOMEM;
-		goto bad6;
+		goto bad7;
 	}
 
 	return 0;
 
-      bad6:
+bad7:
 	mempool_destroy(pending_pool);
-      bad5:
+bad6:
 	kmem_cache_destroy(pending_cache);
-      bad4:
+bad5:
 	kmem_cache_destroy(exception_cache);
-      bad3:
+bad4:
 	exit_origin_hash();
-      bad2:
+bad3:
+	dm_unregister_target(&merged_target);
+bad2:
 	dm_unregister_target(&origin_target);
-      bad1:
+bad1:
 	dm_unregister_target(&snapshot_target);
 	return r;
 }
@@ -1430,6 +1570,10 @@ static void __exit dm_snapshot_exit(void)
 	if (r)
 		DMERR("origin unregister failed %d", r);
 
+	r = dm_unregister_target(&merged_target);
+	if (r)
+		DMERR("merged unregister failed %d", r);
+
 	exit_origin_hash();
 	mempool_destroy(pending_pool);
 	kmem_cache_destroy(pending_cache);
diff --git a/drivers/md/dm-snap.h b/drivers/md/dm-snap.h
index 8b02a42..8600477 100644
--- a/drivers/md/dm-snap.h
+++ b/drivers/md/dm-snap.h
@@ -188,6 +188,10 @@ struct dm_snapshot {
 	struct work_struct queued_bios_work;
 };
 
+struct dm_merged {
+	struct dm_snapshot snap;
+};
+
 /*
  * Used by the exception stores to load exceptions hen
  * initialising.
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 09/10] dm snapshot: Track snapshot-merged writes
  2008-04-01  7:32               ` [PATCH 08/10] dm snapshot: Add basic version of snapshot-merged target Mark McLoughlin
@ 2008-04-01  7:32                 ` Mark McLoughlin
  2008-04-01  7:32                   ` [PATCH 10/10] dm snapshot: Add snapshot merging code Mark McLoughlin
  0 siblings, 1 reply; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Add the ability to track writes to the snapshot-merged target which
have been mapped to the COW device.

Rationale is that we won't want to start merging a chunk until all
writes to that chunk have completed. Reads don't matter because the
contents of the chunk on the COW device is not affected by the merging
process.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |   36 +++++++++++++++++++++++++++++++-----
 1 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 5236763..bc13f35 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1380,6 +1380,7 @@ static int merged_map(struct dm_target *ti, struct bio *bio,
 	struct dm_merged *merged = ti->private;
 	struct dm_snapshot *s = &merged->snap;
 	struct dm_snap_exception *e;
+	struct dm_snap_pending_exception *pe;
 	chunk_t chunk;
 	int r = DM_MAPIO_REMAPPED;;
 
@@ -1388,28 +1389,52 @@ static int merged_map(struct dm_target *ti, struct bio *bio,
 
 	chunk = sector_to_chunk(s, bio->bi_sector);
 
-	down_read(&s->lock);
+	down_write(&s->lock);
 
 	if (!s->valid) {
-		up_read(&s->lock);
+		up_write(&s->lock);
 		return -EIO;
 	}
 
 	e = lookup_exception(&s->complete, chunk);
-	if (e)
+	if (e) {
+		if (bio_rw(bio) == WRITE) {
+			pe = __find_pending_exception(s, chunk, 0);
+			if (!pe) {
+				__invalidate_snapshot(s, -ENOMEM);
+				r = -EIO;
+				goto out_unlock;
+			}
+                       map_context->ptr = pe;
+		}
+
 		remap_exception(s, e, bio, chunk);
-	else {
+	} else {
 		bio->bi_bdev = s->origin->bdev;
 
 		if (bio_rw(bio) == WRITE)
 			r = do_origin(s->origin, bio->bi_sector, bio);
 	}
 
-	up_read(&s->lock);
+out_unlock:
+	up_write(&s->lock);
 
 	return r;
 }
 
+static int merged_end_io(struct dm_target *ti, struct bio *bio,
+			 int error, union map_info *map_context)
+{
+	struct dm_snap_pending_exception *pe = map_context->ptr;
+
+	if (bio_rw(bio) != WRITE || !pe)
+		return error;
+
+	put_pending_exception(pe);
+
+	return error;
+}
+
 static void merged_resume(struct dm_target *ti)
 {
 	struct dm_merged *merged = ti->private;
@@ -1477,6 +1502,7 @@ static struct target_type merged_target = {
 	.ctr     = merged_ctr,
 	.dtr     = merged_dtr,
 	.map     = merged_map,
+	.end_io  = merged_end_io,
 	.resume  = merged_resume,
 	.status  = merged_status,
 };
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 10/10] dm snapshot: Add snapshot merging code
  2008-04-01  7:32                 ` [PATCH 09/10] dm snapshot: Track snapshot-merged writes Mark McLoughlin
@ 2008-04-01  7:32                   ` Mark McLoughlin
  0 siblings, 0 replies; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-01  7:32 UTC (permalink / raw)
  To: agk; +Cc: Mark McLoughlin, dm-devel

Add the code to actually perform the incremental merging of chunks
from the COW device to the origin device.

Chunks are merged one at a time by ksnapd. While a merge is in
progress the merging mutex is held, allowing us to disable merging
while the target is supended.

We take care not to initiate a copy from the COW device while there
are writes pending on the chunk in question. Likewise we take care to
delay any writes to a chunk while the chunk is being merged.

The merging_work pointer added to the pending_exception structure
allows us to indicate that ksnapd should be woken up when the I/O on
that chunk has completed or when all other associated snapshots have
finished copying the chunk to their COW device. This pointer is
protected by the pe_lock spinlock.

The old_chunk, new_chunk and delayed_bios members of the dm_merged
structure are all protected by the read/write mutex in the dm_snapshot
structure.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/md/dm-snap.c |  233 +++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/md/dm-snap.h |   16 ++++
 2 files changed, 235 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index bc13f35..b2b9762 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -76,6 +76,14 @@ struct dm_snap_pending_exception {
 	 */
 	atomic_t ref_count;
 
+	/*
+	 * If set, merging of a chunk is blocked until all snapshots
+	 * have finished copying the chunk from the origin device
+	 * and until all writes to the chunk on the COW device have
+	 * completed.
+	 */
+	struct work_struct *merging_work;
+
 	/* Pointer back to snapshot context */
 	struct dm_snapshot *snap;
 
@@ -752,6 +760,8 @@ static struct bio *put_pending_exception(struct dm_snap_pending_exception *pe)
 	 */
 	if ((!primary_pe || primary_pe != pe) &&
 	    atomic_dec_and_test(&pe->ref_count)) {
+		if (pe->merging_work)
+			queue_work(ksnapd, pe->merging_work);
 		remove_exception(&pe->e);
 		free_pending_exception(pe);
 	}
@@ -760,6 +770,8 @@ static struct bio *put_pending_exception(struct dm_snap_pending_exception *pe)
 	 * Free the primary pe if nothing references it.
 	 */
 	if (primary_pe && !atomic_read(&primary_pe->ref_count)) {
+		if (primary_pe->merging_work)
+			queue_work(ksnapd, primary_pe->merging_work);
 		remove_exception(&primary_pe->e);
 		free_pending_exception(primary_pe);
 	}
@@ -927,6 +939,7 @@ __find_pending_exception(struct dm_snapshot *s, chunk_t chunk, int prepare)
 	bio_list_init(&pe->snapshot_bios);
 	pe->primary_pe = NULL;
 	atomic_set(&pe->ref_count, 0);
+	pe->merging_work = NULL;
 	pe->snap = s;
 	pe->started = 0;
 
@@ -1104,7 +1117,7 @@ static int snapshot_status(struct dm_target *ti, status_type_t type,
  * Origin methods
  *---------------------------------------------------------------*/
 static int __origin_write(struct list_head *snapshots, sector_t sector,
-			  struct bio *bio)
+			  struct bio *bio, struct work_struct *merging_work)
 {
 	int r = DM_MAPIO_REMAPPED;
 	struct dm_snapshot *snap;
@@ -1169,6 +1182,8 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
 
 			if (bio)
 				bio_list_add(&primary_pe->origin_bios, bio);
+			if (merging_work)
+				primary_pe->merging_work = merging_work;
 
 			r = DM_MAPIO_SUBMITTED;
 		}
@@ -1215,7 +1230,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
  * Called on a write from the origin driver.
  */
 static int do_origin(struct dm_dev *origin, sector_t sector,
-		     struct bio *bio)
+		     struct bio *bio, struct work_struct *merging_work)
 {
 	struct origin *o;
 	int r = DM_MAPIO_REMAPPED;
@@ -1223,7 +1238,7 @@ static int do_origin(struct dm_dev *origin, sector_t sector,
 	down_read(&_origins_lock);
 	o = __lookup_origin(origin->bdev);
 	if (o)
-		r = __origin_write(&o->snapshots, sector, bio);
+		r = __origin_write(&o->snapshots, sector, bio, merging_work);
 	up_read(&_origins_lock);
 
 	return r;
@@ -1276,7 +1291,7 @@ static int origin_map(struct dm_target *ti, struct bio *bio,
 	if (bio_rw(bio) != WRITE)
 		return DM_MAPIO_REMAPPED;
 
-	return do_origin(dev, bio->bi_sector, bio);
+	return do_origin(dev, bio->bi_sector, bio, NULL);
 }
 
 #define min_not_zero(l, r) (l == 0) ? r : ((r == 0) ? l : min(l, r))
@@ -1326,6 +1341,167 @@ static int origin_status(struct dm_target *ti, status_type_t type, char *result,
 	return 0;
 }
 
+#define MERGE_COMPLETE_BIT 0
+#define MERGE_ERROR_BIT    1
+
+static inline void set_merge_complete(struct dm_merged *merged)
+{
+	set_bit(MERGE_COMPLETE_BIT, &merged->status);
+}
+
+static inline void set_merge_error(struct dm_merged *merged)
+{
+	set_bit(MERGE_ERROR_BIT, &merged->status);
+}
+
+static inline int get_merge_complete(struct dm_merged *merged)
+{
+	return test_and_clear_bit(MERGE_COMPLETE_BIT, &merged->status);
+}
+
+static inline int get_merge_error(struct dm_merged *merged)
+{
+	return test_and_clear_bit(MERGE_ERROR_BIT, &merged->status);
+}
+
+static void merge_callback(int read_err, unsigned int write_err, void *context)
+{
+	struct dm_merged *merged = (struct dm_merged *) context;
+
+	if (read_err || write_err)
+		set_merge_error(merged);
+
+	set_merge_complete(merged);
+
+	queue_work(ksnapd, &merged->merging_work);
+}
+
+static int start_merge(struct dm_merged *merged)
+{
+	struct dm_snapshot *s = &merged->snap;
+	struct io_region src, dest;
+	sector_t dev_size, old_sector;
+	struct dm_snap_exception e;
+	struct dm_snap_pending_exception *pe;
+	unsigned long flags;
+	int empty;
+	int r;
+
+	down_write(&s->lock);
+
+	if (merged->new_chunk == 0) {
+		r = s->store.prepare_merge(&s->store, &e, &empty);
+		if (r || empty) {
+			if (empty)
+				dm_table_event(s->table);
+			up_write(&s->lock);
+			return 0;
+		}
+
+		merged->old_chunk = e.old_chunk;
+		merged->new_chunk = e.new_chunk;
+	}
+
+	spin_lock_irqsave(&s->pe_lock, flags);
+
+	pe = (struct dm_snap_pending_exception *)
+		lookup_exception(&s->pending, merged->old_chunk);
+	if (pe) {
+		pe->merging_work = &merged->merging_work;
+		spin_unlock_irqrestore(&s->pe_lock, flags);
+		up_write(&s->lock);
+		return 0; /* punt the copy until pending I/O complete */
+	}
+
+	spin_unlock_irqrestore(&s->pe_lock, flags);
+
+	up_write(&s->lock);
+
+	old_sector = chunk_to_sector(s, merged->old_chunk);
+
+	r = do_origin(s->origin, old_sector, NULL, &merged->merging_work);
+	if (r <= 0)
+		return 0; /* punt until other snapshots finish copying */
+
+	dev_size = get_dev_size(s->origin->bdev);
+
+	dest.bdev = s->origin->bdev;
+	dest.sector = old_sector;
+	dest.count = min(s->chunk_size, dev_size - dest.sector);
+
+	src.bdev = s->cow->bdev;
+	src.sector = chunk_to_sector(s, merged->new_chunk);
+	src.count = dest.count;
+
+	r = kcopyd_copy(s->kcopyd_client, &src, 1, &dest, 0,
+			merge_callback, merged);
+	return r == 0;
+}
+
+static int end_merge(struct dm_merged *merged, int *error)
+{
+	struct dm_snapshot *s = &merged->snap;
+	struct dm_snap_exception *e;
+	struct bio *bio;
+	int r;
+
+	if (!get_merge_complete(merged))
+		return 0;
+
+	*error = get_merge_error(merged);
+
+	down_write(&s->lock);
+
+	e = lookup_exception(&s->complete, merged->old_chunk);
+	BUG_ON(!e);
+
+	merged->old_chunk = 0;
+	merged->new_chunk = 0;
+
+	if (*error) {
+		up_write(&s->lock);
+		return 1;
+	}
+
+	while ((bio = bio_list_pop(&merged->delayed_bios))) {
+		r = do_origin(s->origin, bio->bi_sector, bio, NULL);
+		if (r > 0)
+			generic_make_request(bio);
+		else if (r < 0) {
+			/* error the io and bail out */
+			bio_endio(bio, r);
+			bio_put(bio);
+		}
+	}
+
+	remove_exception(e);
+	free_exception(e);
+	s->store.commit_merge(&s->store);
+
+	up_write(&s->lock);
+
+	return 1;
+}
+
+static void do_merging(struct work_struct *work)
+{
+	struct dm_merged *merged =
+		container_of(work, struct dm_merged, merging_work);
+	int error;
+
+	if (end_merge(merged, &error)) {
+		up(&merged->merging);
+		if (error)
+			return;
+	}
+
+	if (down_trylock(&merged->merging))
+		return;
+
+	if (!start_merge(merged))
+		up(&merged->merging);
+}
+
 /*
  * Construct a merged snapshot: <origin_dev> <COW-dev>
  */
@@ -1360,6 +1536,17 @@ static int merged_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		return r;
 	}
 
+	sema_init(&merged->merging, 0);
+
+	INIT_WORK(&merged->merging_work, do_merging);
+
+	merged->status = 0;
+
+	merged->old_chunk = 0;
+	merged->new_chunk = 0;
+
+	bio_list_init(&merged->delayed_bios);
+
 	ti->private = merged;
 
 	return 0;
@@ -1396,6 +1583,13 @@ static int merged_map(struct dm_target *ti, struct bio *bio,
 		return -EIO;
 	}
 
+	if (bio_rw(bio) == WRITE && chunk == merged->old_chunk) {
+		bio->bi_bdev = s->origin->bdev;
+		bio_list_add(&merged->delayed_bios, bio);
+		up_write(&s->lock);
+		return 0;
+	}
+
 	e = lookup_exception(&s->complete, chunk);
 	if (e) {
 		if (bio_rw(bio) == WRITE) {
@@ -1413,7 +1607,7 @@ static int merged_map(struct dm_target *ti, struct bio *bio,
 		bio->bi_bdev = s->origin->bdev;
 
 		if (bio_rw(bio) == WRITE)
-			r = do_origin(s->origin, bio->bi_sector, bio);
+			r = do_origin(s->origin, bio->bi_sector, bio, NULL);
 	}
 
 out_unlock:
@@ -1440,6 +1634,16 @@ static void merged_resume(struct dm_target *ti)
 	struct dm_merged *merged = ti->private;
 
 	ti->split_io = min_chunk_size(merged->snap.origin);
+
+	up(&merged->merging);
+	queue_work(ksnapd, &merged->merging_work);
+}
+
+static void merged_postsuspend(struct dm_target *ti)
+{
+	struct dm_merged *merged = (struct dm_merged *) ti->private;
+
+	down(&merged->merging);
 }
 
 static int merged_status(struct dm_target *ti, status_type_t type, char *result,
@@ -1496,15 +1700,16 @@ static struct target_type snapshot_target = {
 };
 
 static struct target_type merged_target = {
-	.name    = "snapshot-merged",
-	.version = {1, 1, 1},
-	.module  = THIS_MODULE,
-	.ctr     = merged_ctr,
-	.dtr     = merged_dtr,
-	.map     = merged_map,
-	.end_io  = merged_end_io,
-	.resume  = merged_resume,
-	.status  = merged_status,
+	.name        = "snapshot-merged",
+	.version     = {1, 1, 1},
+	.module      = THIS_MODULE,
+	.ctr         = merged_ctr,
+	.dtr         = merged_dtr,
+	.map         = merged_map,
+	.end_io      = merged_end_io,
+	.resume      = merged_resume,
+	.postsuspend = merged_postsuspend,
+	.status      = merged_status,
 };
 
 static int __init dm_snapshot_init(void)
diff --git a/drivers/md/dm-snap.h b/drivers/md/dm-snap.h
index 8600477..048274c 100644
--- a/drivers/md/dm-snap.h
+++ b/drivers/md/dm-snap.h
@@ -190,6 +190,22 @@ struct dm_snapshot {
 
 struct dm_merged {
 	struct dm_snapshot snap;
+
+	/* held while there is a merge in progress */
+	struct semaphore merging;
+
+	/* merging work for ksnapd */
+	struct work_struct merging_work;
+
+	/* status of current merge - bitfield of MERGE_*_BIT */
+	unsigned long status;
+
+	/* current merge - could be delayed or in progress */
+	chunk_t old_chunk;
+	chunk_t new_chunk;
+
+	/* I/O waiting for current merge to complete */
+	struct bio_list delayed_bios;
 };
 
 /*
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/10] Snapshot merging back from the dead?
  2008-04-01  7:32 [PATCH 0/10] Snapshot merging back from the dead? Mark McLoughlin
  2008-04-01  7:32 ` [PATCH 01/10] dm snapshot: Fix the fraction full percentage Mark McLoughlin
@ 2008-04-14  8:41 ` Mark McLoughlin
  1 sibling, 0 replies; 12+ messages in thread
From: Mark McLoughlin @ 2008-04-14  8:41 UTC (permalink / raw)
  To: device-mapper development; +Cc: agk

On Tue, 2008-04-01 at 09:32 +0200, Mark McLoughlin wrote:
> Hi Alasdair,
>         I've gone back and updated my snapshot-merged patches from
> ages ago so that they apply to 2.6.25-rc7.
> 
>         Even though these updated patches are completely untested,
> I've been quite careful in rebasing them and they do build, so I
> figure they're worth posting here for review.
> 
>         Hopefully, I'll soon get a chance to test them at length and
> also update the userland patches.

	I did eventually do a small bit of testing of these and found some
reads hanging which, at a guess, is probably the read-tracking patch
that is broken.

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-04-14  8:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-01  7:32 [PATCH 0/10] Snapshot merging back from the dead? Mark McLoughlin
2008-04-01  7:32 ` [PATCH 01/10] dm snapshot: Fix the fraction full percentage Mark McLoughlin
2008-04-01  7:32   ` [PATCH 02/10] dm snapshot: Track snapshot reads Mark McLoughlin
2008-04-01  7:32     ` [PATCH 03/10] dm snapshot: Add exception store methods for merging Mark McLoughlin
2008-04-01  7:32       ` [PATCH 04/10] dm snapshot: Split snapshot constructor Mark McLoughlin
2008-04-01  7:32         ` [PATCH 05/10] dm snapshot: Split up origin_resume() Mark McLoughlin
2008-04-01  7:32           ` [PATCH 06/10] dm snapshot: Make find_pending_exception() not take a bio Mark McLoughlin
2008-04-01  7:32             ` [PATCH 07/10] dm snapshot: Make origin_write() bio arg optional Mark McLoughlin
2008-04-01  7:32               ` [PATCH 08/10] dm snapshot: Add basic version of snapshot-merged target Mark McLoughlin
2008-04-01  7:32                 ` [PATCH 09/10] dm snapshot: Track snapshot-merged writes Mark McLoughlin
2008-04-01  7:32                   ` [PATCH 10/10] dm snapshot: Add snapshot merging code Mark McLoughlin
2008-04-14  8:41 ` [PATCH 0/10] Snapshot merging back from the dead? Mark McLoughlin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.