* Performance question when blocks in thin-pool are assume zeroed
@ 2014-05-12 8:54 Wayne Chou
0 siblings, 0 replies; 3+ messages in thread
From: Wayne Chou @ 2014-05-12 8:54 UTC (permalink / raw)
To: dm-devel
Hello,
I am using 4 drives to build a RAID5 array, with a thin volume
on top of it. To get better performance, I pass the '-Zn' option
to 'lvcreate' so that the thin pool assumes all blocks are
already zeroed. The chunk size of both the RAID5 array and the
thin pool is 512KB, and stripe_cache_size=4096 on the RAID5.
These are the results I got when writing to the RAID5 device
directly and to the thin volume:
dd if=/dev/zero of=/dev/md5 bs=2M count=1000
1000+0 records in
1000+0 records out
2097152000 bytes (2.1 GB) copied, 6.02630 seconds, 348 MB/s
dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=2M count=1000
1000+0 records in
1000+0 records out
2097152000 bytes (2.1 GB) copied, 11.58648 seconds, 181 MB/s
To find out what causes such a large performance drop, I added
some tracing to the code and got an interesting result. First,
the bio size with the dd command is 4KB, so every 128 bios fill
up one thin block/RAID chunk in my setup. Since I have set
'pool->pf.zero_new_blocks' = false, it seems that when a new
block is provisioned for a bio, the bio is put back on the tail
of the 'pool->deferred_bios' list rather than being issued
immediately. This reorders the incoming bio sequence.
For example, the bi_sector values of the incoming PAGE_SIZE bios
are:
bi_sector : [0, 8, 16, 24, ..., 1024]
After all of them are mapped, they are issued to the lower layer
in a non-sequential order:
bi_sector : [8, 16, 24, ..., 136, 144, 152, ..., 1016] +
[0, 128, 256, 384, 512, 640, 768, 896, 1024]
As you can see, the bios that triggered provision_block() were
reordered and separated from the otherwise consecutive ones. If
the lower-layer device cannot merge them back, this may cause
read-modify-writes or seek latency overhead.
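The reordering described above can be reproduced with a small user-space model. This is only a sketch of the behaviour as described in this message, not the dm-thin code: `issue_order` and `discontinuities` are hypothetical helper names, and the 128-sector block size is an assumption chosen solely because it reproduces the sector numbers in the example (a 512KB block would be 1024 sectors).

```python
SECTOR_BYTES = 512
BIO_BYTES = 4 * 1024        # 4KB bios, as traced with dd
CHUNK_BYTES = 512 * 1024    # thin-pool block / RAID5 chunk size

SECTORS_PER_BIO = BIO_BYTES // SECTOR_BYTES   # sectors covered by one bio
BIOS_PER_CHUNK = CHUNK_BYTES // BIO_BYTES     # bios needed to fill one chunk


def issue_order(sectors, block_sectors):
    """Model the described behaviour: the first bio that touches an
    unprovisioned block goes to the tail of a deferred list, while every
    other bio is issued right away. Deferred bios are issued last."""
    provisioned = set()
    issued = []
    deferred = []
    for sector in sectors:
        block = sector // block_sectors
        if block not in provisioned:
            provisioned.add(block)      # this bio triggers provisioning
            deferred.append(sector)
        else:
            issued.append(sector)
    return issued + deferred


def discontinuities(order, step=SECTORS_PER_BIO):
    """Count adjacent pairs that are not back-to-back on disk."""
    return sum(1 for a, b in zip(order, order[1:]) if b != a + step)


incoming = list(range(0, 1025, SECTORS_PER_BIO))
# Assumption: 128-sector blocks, picked only to match the example above.
reordered = issue_order(incoming, block_sectors=128)

print("bios per chunk:", BIOS_PER_CHUNK)
print("head:", reordered[:3], "tail:", reordered[-9:])
print("breaks in sequence:", discontinuities(incoming), "->",
      discontinuities(reordered))
```

In this model the incoming stream has no breaks, while the reordered stream has one break for every place a provisioning bio was pulled out, plus the jumps inside the deferred tail.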
Based on this observation, I made a rough patch against kernel 3.6
that keeps the bios in sequential order when
'pool->pf.zero_new_blocks' = false:
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index b0a5ed9..76cda40 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1321,6 +1321,7 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 {
 	struct pool *pool = tc->pool;
 	struct new_mapping *m = get_next_mapping(pool);
+	int r;
 	INIT_LIST_HEAD(&m->list);
 	m->quiesced = 1;
@@ -1337,9 +1338,20 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	 * zeroing pre-existing data, we can issue the bio immediately.
 	 * Otherwise we use kcopyd to zero the data first.
 	 */
-	if (!pool->pf.zero_new_blocks)
-		process_prepared_mapping(m);
-
+	if (!pool->pf.zero_new_blocks) {
+		r = dm_thin_insert_block(tc->td, m->virt_block, m->data_block, 0);
+		if (r) {
+			DMERR("schedule_zero() failed");
+			cell_error(m->cell);
+		}
+		else {
+			inc_all_io_entry(pool, bio);
+			cell_defer_except(tc, cell);
+			remap_and_issue(tc, bio, data_block);
+		}
+		list_del(&m->list);
+		mempool_free(m, tc->pool->mapping_pool);
+	}
 	else if (io_overwrites_block(pool, bio)) {
 		struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
 		h->overwrite_mapping = m;
With the patch applied, the performance improved as well:
dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=2M count=1000
1000+0 records in
1000+0 records out
2097152000 bytes (2.1 GB) copied, 6.16819 seconds, 340 MB/s
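As a quick cross-check of the three dd runs, the throughput figures can be recomputed from the byte count and elapsed times that dd printed. The sketch below uses only those reported numbers; the dictionary keys and the helper name `throughput_mb_s` are labels made up for illustration.

```python
BYTES_COPIED = 2_097_152_000   # 1000 records of 2 MiB, as reported by dd

# Elapsed seconds reported by dd for each run.
elapsed = {
    "raid5": 6.02630,
    "thin_unpatched": 11.58648,
    "thin_patched": 6.16819,
}


def throughput_mb_s(seconds):
    """Throughput in MB/s, with MB = 10**6 bytes as dd reports it."""
    return BYTES_COPIED / seconds / 1e6


for name, seconds in elapsed.items():
    relative = elapsed["raid5"] / seconds   # fraction of raw RAID5 speed
    print(f"{name}: {throughput_mb_s(seconds):.0f} MB/s "
          f"({relative:.0%} of raw RAID5)")
```

By these figures the unpatched thin volume reaches roughly half of the raw RAID5 throughput, and the patched one comes within a few percent of it.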
Given that my thin pool is set up with pf->zero_new_blocks = false,
I think it is OK to issue the bio immediately, rather than putting
it back on pool->deferred_bios, once the mapping is known. That way
the sequential order is maintained. However, I wonder whether this
rough patch misses some cases. Any suggestions would be helpful.
* Performance question when blocks in thin-pool are assume zeroed
@ 2014-05-12 9:03 Wayne Chou
2014-05-13 13:59 ` Joe Thornber
0 siblings, 1 reply; 3+ messages in thread
From: Wayne Chou @ 2014-05-12 9:03 UTC (permalink / raw)
To: dm-devel
Best Regards,
- Wayne.Chou
* Re: Performance question when blocks in thin-pool are assume zeroed
2014-05-12 9:03 Performance question when blocks in thin-pool are assume zeroed Wayne Chou
@ 2014-05-13 13:59 ` Joe Thornber
0 siblings, 0 replies; 3+ messages in thread
From: Joe Thornber @ 2014-05-13 13:59 UTC (permalink / raw)
To: device-mapper development
Hi Wayne,
I see you're using a 3.6 kernel. Could I trouble you to try with 3.14
please? Mike Snitzer wrote a very nice patch recently that implements
a separate deferred list for each thin device, and then sorts
individual bios using a btree.
https://github.com/jthornber/linux-2.6/commit/67324ea18812bc952ef96892fbd5817b9050413f
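The idea of keeping one deferred list per thin device and sorting each by sector can be sketched in user space as follows. This is only an illustration of the concept, not the kernel patch: it uses a plain `sorted()` call instead of a tree, and the function name `sort_deferred_per_device` and the device label "lv1" are hypothetical.

```python
from collections import defaultdict


def sort_deferred_per_device(deferred):
    """Split a shared deferred list of (device, sector) pairs into one
    list per device, then sort each device's bios by start sector."""
    per_device = defaultdict(list)
    for device, sector in deferred:
        per_device[device].append(sector)
    return {device: sorted(sectors)
            for device, sectors in per_device.items()}


# Rebuild the out-of-order sequence from Wayne's example: the bios that
# did not trigger provisioning first, then the ones that did.
sequential = list(range(0, 1025, 8))
out_of_order = ([s for s in sequential if s % 128 != 0]
                + [s for s in sequential if s % 128 == 0])

deferred = [("lv1", sector) for sector in out_of_order]
issued = sort_deferred_per_device(deferred)

print(issued["lv1"][:5])
print(issued["lv1"] == sequential)
```

Under this model, sorting before issue restores the original sequential order for the device, which is the effect the 3.14 change is meant to have on workloads like the dd test.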
- Joe
On Mon, May 12, 2014 at 05:03:39PM +0800, Wayne Chou wrote:
> Hello,
>
> I am using 4 drives to construct a RAID5 and build a thin
> volume on it. To get better performance, I use '-Zn' option
> in 'lvcreate' to make the thin pool assume all blocks are
> already zeroed. The chunk size in RAID5 and thin-pool are
> both 512KB and the stripe_cache_size=4096 on RAID5.