* [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
@ 2011-06-17 12:32 Mikulas Patocka
2011-06-20 8:09 ` Joe Thornber
0 siblings, 1 reply; 7+ messages in thread
From: Mikulas Patocka @ 2011-06-17 12:32 UTC (permalink / raw)
To: Alasdair G. Kergon, dm-devel
Hi
This optimizes full chunk writes for snapshots.
Note that there is a performance bug in device mapper that it doesn't ever
send bios larger than 4k to snapshots. So this patch optimizes only 4k
chunk sizes. Once the 4k limit will be removed from device mapper, this
patch could optimize any chunk size.
Mikulas
---
dm-kcopyd/dm-snap: Don't read the origin on full chunk write
If we write a full chunk in the snapshot, there is no need to read the origin
device (because the whole chunk will be overwritten anyway).
This patch introduces a new function dm_kcopyd_submit_bios that submits the
bio (or more bios linked with bi_next) and signals the completion from the
kcopyd thread.
This patch changes snapshot write logic when full chunk is written.
In this case:
1. allocate the exception
2. dispatch the bio (but don't report the bio completion to device mapper)
3. write the exception metadata
4. report bio completed
Performance test (on snapshots with 4k chunk size):
without the patch:
non-direct-io sequential write (dd): 17.7MB/s
direct-io sequential write (dd): 20.9MB/s
non-direct-io random write (mkfs.ext2): 0.44s
with the patch:
non-direct-io sequential write (dd): 26.5MB/s
direct-io sequential write (dd): 33.2MB/s
non-direct-io random write (mkfs.ext2): 0.27s
It doesn't improve performance with >4k chunk size because there is
a problem in device mapper --- it doesn't ever send bios larger than
page size to snapshots.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
drivers/md/dm-kcopyd.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/md/dm-snap.c | 36 +++++++++++++++++++++++--
include/linux/dm-kcopyd.h | 3 ++
3 files changed, 100 insertions(+), 3 deletions(-)
Index: linux-2.6.39-fast/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.39-fast.orig/drivers/md/dm-snap.c 2011-06-16 19:50:45.000000000 +0200
+++ linux-2.6.39-fast/drivers/md/dm-snap.c 2011-06-16 20:00:51.000000000 +0200
@@ -173,6 +173,10 @@ struct dm_snap_pending_exception {
* kcopyd.
*/
int started;
+
+ struct bio *full_bio;
+ bio_end_io_t *full_bio_end_io;
+ void *full_bio_private;
};
/*
@@ -1373,6 +1377,7 @@ static void pending_complete(struct dm_s
struct dm_snapshot *s = pe->snap;
struct bio *origin_bios = NULL;
struct bio *snapshot_bios = NULL;
+ struct bio *full_bio = NULL;
int error = 0;
if (!success) {
@@ -1412,6 +1417,11 @@ static void pending_complete(struct dm_s
dm_remove_exception(&pe->e);
snapshot_bios = bio_list_get(&pe->snapshot_bios);
origin_bios = bio_list_get(&pe->origin_bios);
+ full_bio = pe->full_bio;
+ if (full_bio) {
+ full_bio->bi_end_io = pe->full_bio_end_io;
+ full_bio->bi_private = pe->full_bio_private;
+ }
free_pending_exception(pe);
increment_pending_exceptions_done_count();
@@ -1419,10 +1429,15 @@ static void pending_complete(struct dm_s
up_write(&s->lock);
/* Submit any pending write bios */
- if (error)
+ if (error) {
+ if (full_bio)
+ bio_io_error(full_bio);
error_bios(snapshot_bios);
- else
+ } else {
+ if (full_bio)
+ bio_endio(full_bio, 0);
flush_bios(snapshot_bios);
+ }
retry_origin_bios(s, origin_bios);
}
@@ -1512,6 +1527,7 @@ __find_pending_exception(struct dm_snaps
bio_list_init(&pe->origin_bios);
bio_list_init(&pe->snapshot_bios);
pe->started = 0;
+ pe->full_bio = NULL;
if (s->store->type->prepare_exception(s->store, &pe->e)) {
free_pending_exception(pe);
@@ -1605,10 +1621,24 @@ static int snapshot_map(struct dm_target
}
remap_exception(s, &pe->e, bio, chunk);
- bio_list_add(&pe->snapshot_bios, bio);
r = DM_MAPIO_SUBMITTED;
+ if (!pe->started &&
+ bio->bi_size == (s->store->chunk_size << SECTOR_SHIFT)) {
+ pe->full_bio_end_io = bio->bi_end_io;
+ pe->full_bio_private = bio->bi_private;
+ pe->full_bio = bio;
+ pe->started = 1;
+ up_write(&s->lock);
+
+ dm_kcopyd_submit_bios(s->kcopyd_client, bio, 0,
+ copy_callback, pe);
+ goto out;
+ }
+
+ bio_list_add(&pe->snapshot_bios, bio);
+
if (!pe->started) {
/* this is protected by snap->lock */
pe->started = 1;
Index: linux-2.6.39-fast/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.39-fast.orig/drivers/md/dm-kcopyd.c 2011-06-16 19:50:45.000000000 +0200
+++ linux-2.6.39-fast/drivers/md/dm-kcopyd.c 2011-06-17 12:52:54.000000000 +0200
@@ -755,6 +755,70 @@ int dm_kcopyd_zero(struct dm_kcopyd_clie
}
EXPORT_SYMBOL(dm_kcopyd_zero);
+static void dm_kcopyd_bio_end_io(struct bio *bio, int error)
+{
+ struct kcopyd_job *job = bio->bi_private;
+
+ if (unlikely(error))
+ job->write_err = 1;
+
+ if (atomic_dec_and_test(&job->sub_jobs)) {
+ struct dm_kcopyd_client *kc = job->kc;
+ push(&kc->complete_jobs, job);
+ wake(kc);
+ }
+}
+
+int dm_kcopyd_submit_bios(struct dm_kcopyd_client *kc,
+ struct bio *bios, unsigned int flags,
+ dm_kcopyd_notify_fn fn, void *context)
+{
+ struct kcopyd_job *job;
+ struct bio *bio;
+ unsigned n_bios;
+
+ /*
+ * Allocate a new job.
+ */
+ job = mempool_alloc(kc->job_pool, GFP_NOIO);
+
+ /*
+ * set up for the write.
+ */
+ job->kc = kc;
+ job->flags = flags;
+ job->read_err = 0;
+ job->write_err = 0;
+ job->pages = NULL;
+
+ job->fn = fn;
+ job->context = context;
+
+ atomic_inc(&kc->nr_jobs);
+
+ n_bios = 0;
+ bio = bios;
+ do {
+ n_bios++;
+ bio = bio->bi_next;
+ } while (bio);
+
+ atomic_set(&job->sub_jobs, n_bios);
+
+ bio = bios;
+ do {
+ struct bio *next_bio = bio->bi_next;
+ bio->bi_next = NULL;
+ bio->bi_end_io = dm_kcopyd_bio_end_io;
+ bio->bi_private = job;
+ generic_make_request(bio);
+ bio = next_bio;
+ } while (bio);
+
+ return 0;
+}
+EXPORT_SYMBOL(dm_kcopyd_submit_bios);
+
/*
* Cancels a kcopyd job, eg. someone might be deactivating a
* mirror.
Index: linux-2.6.39-fast/include/linux/dm-kcopyd.h
===================================================================
--- linux-2.6.39-fast.orig/include/linux/dm-kcopyd.h 2011-06-16 19:50:46.000000000 +0200
+++ linux-2.6.39-fast/include/linux/dm-kcopyd.h 2011-06-16 19:54:17.000000000 +0200
@@ -58,6 +58,9 @@ int dm_kcopyd_copy(struct dm_kcopyd_clie
int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
unsigned num_dests, struct dm_io_region *dests,
unsigned flags, dm_kcopyd_notify_fn fn, void *context);
+int dm_kcopyd_submit_bios(struct dm_kcopyd_client *kc,
+ struct bio *bios, unsigned int flags,
+ dm_kcopyd_notify_fn fn, void *context);
#endif /* __KERNEL__ */
#endif /* _LINUX_DM_KCOPYD_H */
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
2011-06-17 12:32 [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write Mikulas Patocka
@ 2011-06-20 8:09 ` Joe Thornber
2011-06-20 9:51 ` Milan Broz
0 siblings, 1 reply; 7+ messages in thread
From: Joe Thornber @ 2011-06-20 8:09 UTC (permalink / raw)
To: device-mapper development; +Cc: Alasdair G. Kergon
On Fri, Jun 17, 2011 at 08:32:36AM -0400, Mikulas Patocka wrote:
> Hi
>
> This optimizes full chunk writes for snapshots.
>
> Note that there is a performance bug in device mapper that it doesn't ever
> send bios larger than 4k to snapshots. So this patch optimizes only 4k
> chunk sizes. Once the 4k limit will be removed from device mapper, this
> patch could optimize any chunk size.
Nack. If you don't need to copy, don't send it to kcopyd.
- Joe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
2011-06-20 8:09 ` Joe Thornber
@ 2011-06-20 9:51 ` Milan Broz
2011-06-20 10:03 ` Joe Thornber
0 siblings, 1 reply; 7+ messages in thread
From: Milan Broz @ 2011-06-20 9:51 UTC (permalink / raw)
To: device-mapper development; +Cc: Joe Thornber, Alasdair G. Kergon
On 06/20/2011 10:09 AM, Joe Thornber wrote:
> On Fri, Jun 17, 2011 at 08:32:36AM -0400, Mikulas Patocka wrote:
>> This optimizes full chunk writes for snapshots.
>>
>> Note that there is a performance bug in device mapper that it doesn't ever
>> send bios larger than 4k to snapshots. So this patch optimizes only 4k
>> chunk sizes. Once the 4k limit will be removed from device mapper, this
>> patch could optimize any chunk size.
>
> Nack. If you don't need to copy, don't send it to kcopyd.
I think that 4k limitation was removed almost from all the DM code
by introducing merge fn concept. Snapshots seems to be the last area...
(IMHO it was just simplification not bug - bio of page size (4k)
is always allowed so you do not need to solve bio split).
So there is another place where this restriction should be removed.
Anyway, I would like to use kcopyd later for crypt (online reencryption)
so any optimisation is welcome.
For the nack - I do not get it.
Joe, please can you explain the reasons?
Milan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
2011-06-20 9:51 ` Milan Broz
@ 2011-06-20 10:03 ` Joe Thornber
0 siblings, 0 replies; 7+ messages in thread
From: Joe Thornber @ 2011-06-20 10:03 UTC (permalink / raw)
To: Milan Broz; +Cc: device-mapper development, Alasdair G. Kergon
On Mon, Jun 20, 2011 at 11:51:37AM +0200, Milan Broz wrote:
> On 06/20/2011 10:09 AM, Joe Thornber wrote:
> For the nack - I do not get it.
> Joe, please can you explain the reasons?
You know the size of the bio when you submit it. So you're saying to
kcopyd, do this copy, except if it's a particular size, then don't,
just issue the io. I don't see from Mikulas' description why kcopyd
needs to be involved in this.
For instance here's how multisnap handles it:
if (io_covers_block(pool, bio)) {
/* no copy needed, since all data is going to change */
m->bio = bio;
m->bi_end_io = bio->bi_end_io;
m->bi_private = bio->bi_private;
bio->bi_end_io = overwrite_complete;
bio->bi_private = m;
remap_and_issue(pool, bio, data_dest);
} else {
/* use kcopyd */
struct dm_io_region from, to;
from.bdev = pool->pool_dev;
from.sector = data_origin * pool->sectors_per_block;
from.count = pool->sectors_per_block;
to.bdev = pool->pool_dev;
to.sector = data_dest * pool->sectors_per_block;
to.count = pool->sectors_per_block;
r = dm_kcopyd_copy(pool->copier, &from, 1, &to,
0, copy_complete, m);
if (r < 0) {
mempool_free(m, pool->mapping_pool);
printk(KERN_ALERT "dm_kcopyd_copy() failed");
cell_error(cell);
}
}
I don't see why we want to wait for a context switch to kcopyd in
order to issue an io.
If this is about avoiding having to hook the bio endio function then
why use kcopyd rather than dm-io? (I'm not advocating this, just
asking the question).
- Joe
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
@ 2011-06-24 0:25 Mikulas Patocka
2011-06-24 0:51 ` Alasdair G Kergon
2011-06-24 7:35 ` Joe Thornber
0 siblings, 2 replies; 7+ messages in thread
From: Mikulas Patocka @ 2011-06-24 0:25 UTC (permalink / raw)
To: Alasdair G. Kergon, dm-devel
Hi
This is the new patch for optimizing full chunk write. dm-kcopyd provides
only callbacks.
Mikulas
---
dm-kcopyd/dm-snap: Don't read the origin on full chunk write
If we write a full chunk in the snapshot, there is no need to read the origin
device (because the whole chunk will be overwritten anyway).
This patch changes snapshot write logic when full chunk is written.
In this case:
1. allocate the exception
2. dispatch the bio (but don't report the bio completion to device mapper)
3. write the exception record
4. report bio completed
Callbacks must be done through kcopyd thread, because callbacks must not
race with each other. So we create new functions
dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
This functino must not be called from interrupt context.
dm_kcopyd_do_callback: submit callback. This function may be called from
interrupt context.
Performance test (on snapshots with 4k chunk size):
without the patch:
non-direct-io sequential write (dd): 17.7MB/s
direct-io sequential write (dd): 20.9MB/s
non-direct-io random write (mkfs.ext2): 0.44s
with the patch:
non-direct-io sequential write (dd): 26.5MB/s
direct-io sequential write (dd): 33.2MB/s
non-direct-io random write (mkfs.ext2): 0.27s
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
drivers/md/dm-kcopyd.c | 31 +++++++++++++++++++++++++
drivers/md/dm-snap.c | 56 +++++++++++++++++++++++++++++++++++++++++++---
include/linux/dm-kcopyd.h | 11 +++++++++
3 files changed, 95 insertions(+), 3 deletions(-)
Index: linux-2.6.39-fast/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.39-fast.orig/drivers/md/dm-kcopyd.c 2011-06-24 02:05:20.000000000 +0200
+++ linux-2.6.39-fast/drivers/md/dm-kcopyd.c 2011-06-24 02:07:09.000000000 +0200
@@ -725,6 +725,37 @@ int dm_kcopyd_zero(struct dm_kcopyd_clie
}
EXPORT_SYMBOL(dm_kcopyd_zero);
+void *dm_kcopyd_prepare_callback(struct dm_kcopyd_client *kc,
+ dm_kcopyd_notify_fn fn, void *context)
+{
+ struct kcopyd_job *job;
+
+ job = mempool_alloc(kc->job_pool, GFP_NOIO);
+
+ memset(job, 0, sizeof(struct kcopyd_job));
+ job->kc = kc;
+ job->fn = fn;
+ job->context = context;
+
+ atomic_inc(&kc->nr_jobs);
+
+ return job;
+}
+EXPORT_SYMBOL(dm_kcopyd_prepare_callback);
+
+void dm_kcopyd_do_callback(void *j, int read_err, unsigned long write_err)
+{
+ struct kcopyd_job *job = j;
+ struct dm_kcopyd_client *kc = job->kc;
+
+ job->read_err = read_err;
+ job->write_err = write_err;
+
+ push(&kc->complete_jobs, job);
+ wake(kc);
+}
+EXPORT_SYMBOL(dm_kcopyd_do_callback);
+
/*
* Cancels a kcopyd job, eg. someone might be deactivating a
* mirror.
Index: linux-2.6.39-fast/include/linux/dm-kcopyd.h
===================================================================
--- linux-2.6.39-fast.orig/include/linux/dm-kcopyd.h 2011-06-24 02:05:21.000000000 +0200
+++ linux-2.6.39-fast/include/linux/dm-kcopyd.h 2011-06-24 02:13:22.000000000 +0200
@@ -57,6 +57,17 @@ int dm_kcopyd_copy(struct dm_kcopyd_clie
int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
unsigned num_dests, struct dm_io_region *dests,
unsigned flags, dm_kcopyd_notify_fn fn, void *context);
+/*
+ * Prepare a callback and submit it via kcopyd thread.
+ * dm_kcopyd_prepare_callback allocates a callback structure and returns it.
+ * The returned value is then passed to dm_kcopyd_do_callback.
+ * dm_kcopyd_prepare_callback must not be called from interrupt context.
+ * dm_kcopyd_do_callback submits the callback. The callback is done via kcopyd
+ * thread. dm_kcopyd_do_callback may be called from interrupt context.
+ */
+void *dm_kcopyd_prepare_callback(struct dm_kcopyd_client *kc,
+ dm_kcopyd_notify_fn fn, void *context);
+void dm_kcopyd_do_callback(void *job, int read_err, unsigned long write_err);
#endif /* __KERNEL__ */
#endif /* _LINUX_DM_KCOPYD_H */
Index: linux-2.6.39-fast/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.39-fast.orig/drivers/md/dm-snap.c 2011-06-24 02:05:21.000000000 +0200
+++ linux-2.6.39-fast/drivers/md/dm-snap.c 2011-06-24 02:07:09.000000000 +0200
@@ -173,6 +173,10 @@ struct dm_snap_pending_exception {
* kcopyd.
*/
int started;
+
+ struct bio *full_bio;
+ bio_end_io_t *full_bio_end_io;
+ void *full_bio_private;
};
/*
@@ -1373,6 +1377,7 @@ static void pending_complete(struct dm_s
struct dm_snapshot *s = pe->snap;
struct bio *origin_bios = NULL;
struct bio *snapshot_bios = NULL;
+ struct bio *full_bio = NULL;
int error = 0;
if (!success) {
@@ -1412,6 +1417,11 @@ static void pending_complete(struct dm_s
dm_remove_exception(&pe->e);
snapshot_bios = bio_list_get(&pe->snapshot_bios);
origin_bios = bio_list_get(&pe->origin_bios);
+ full_bio = pe->full_bio;
+ if (full_bio) {
+ full_bio->bi_end_io = pe->full_bio_end_io;
+ full_bio->bi_private = pe->full_bio_private;
+ }
free_pending_exception(pe);
increment_pending_exceptions_done_count();
@@ -1419,10 +1429,15 @@ static void pending_complete(struct dm_s
up_write(&s->lock);
/* Submit any pending write bios */
- if (error)
+ if (error) {
+ if (full_bio)
+ bio_io_error(full_bio);
error_bios(snapshot_bios);
- else
+ } else {
+ if (full_bio)
+ bio_endio(full_bio, 0);
flush_bios(snapshot_bios);
+ }
retry_origin_bios(s, origin_bios);
}
@@ -1477,6 +1492,31 @@ static void start_copy(struct dm_snap_pe
&src, 1, &dest, 0, copy_callback, pe);
}
+static void full_bio_end_io(struct bio *bio, int error)
+{
+ void *callback = bio->bi_private;
+ dm_kcopyd_do_callback(callback, 0, error ? 1 : 0);
+}
+
+static void start_full_bio(struct dm_snap_pending_exception *pe,
+ struct bio *bio)
+{
+ struct dm_snapshot *s = pe->snap;
+ void *callback;
+
+ pe->full_bio = bio;
+ pe->full_bio_end_io = bio->bi_end_io;
+ pe->full_bio_private = bio->bi_private;
+
+ callback = dm_kcopyd_prepare_callback(s->kcopyd_client,
+ copy_callback, pe);
+
+ bio->bi_end_io = full_bio_end_io;
+ bio->bi_private = callback;
+
+ generic_make_request(bio);
+}
+
static struct dm_snap_pending_exception *
__lookup_pending_exception(struct dm_snapshot *s, chunk_t chunk)
{
@@ -1512,6 +1552,7 @@ __find_pending_exception(struct dm_snaps
bio_list_init(&pe->origin_bios);
bio_list_init(&pe->snapshot_bios);
pe->started = 0;
+ pe->full_bio = NULL;
if (s->store->type->prepare_exception(s->store, &pe->e)) {
free_pending_exception(pe);
@@ -1605,10 +1646,19 @@ static int snapshot_map(struct dm_target
}
remap_exception(s, &pe->e, bio, chunk);
- bio_list_add(&pe->snapshot_bios, bio);
r = DM_MAPIO_SUBMITTED;
+ if (!pe->started &&
+ bio->bi_size == (s->store->chunk_size << SECTOR_SHIFT)) {
+ pe->started = 1;
+ up_write(&s->lock);
+ start_full_bio(pe, bio);
+ goto out;
+ }
+
+ bio_list_add(&pe->snapshot_bios, bio);
+
if (!pe->started) {
/* this is protected by snap->lock */
pe->started = 1;
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
2011-06-24 0:25 Mikulas Patocka
@ 2011-06-24 0:51 ` Alasdair G Kergon
2011-06-24 7:35 ` Joe Thornber
1 sibling, 0 replies; 7+ messages in thread
From: Alasdair G Kergon @ 2011-06-24 0:51 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: dm-devel
On Thu, Jun 23, 2011 at 08:25:44PM -0400, Mikulas Patocka wrote:
> This is the new patch for optimizing full chunk write. dm-kcopyd provides
> only callbacks.
Yes, I think I prefer this way of doing it as we discussed - it reduces
the kcopyd changes to what is necessary. (And the complexity involved
in avoiding kcopyd isn't worth it.)
Alasdair
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write
2011-06-24 0:25 Mikulas Patocka
2011-06-24 0:51 ` Alasdair G Kergon
@ 2011-06-24 7:35 ` Joe Thornber
1 sibling, 0 replies; 7+ messages in thread
From: Joe Thornber @ 2011-06-24 7:35 UTC (permalink / raw)
To: device-mapper development; +Cc: Alasdair G. Kergon
On Thu, Jun 23, 2011 at 08:25:44PM -0400, Mikulas Patocka wrote:
> Hi
>
> This is the new patch for optimizing full chunk write. dm-kcopyd provides
> only callbacks.
I'm happy with this approach, you keep bios out of the interface.
- Joe
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-06-24 7:35 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-17 12:32 [PATCH] dm-kcopyd/dm-snap: Don't read the origin on full chunk write Mikulas Patocka
2011-06-20 8:09 ` Joe Thornber
2011-06-20 9:51 ` Milan Broz
2011-06-20 10:03 ` Joe Thornber
-- strict thread matches above, loose matches on Subject: below --
2011-06-24 0:25 Mikulas Patocka
2011-06-24 0:51 ` Alasdair G Kergon
2011-06-24 7:35 ` Joe Thornber
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.