* dm-writeboost: Progress Report
From: Akira Hayakawa @ 2013-11-04 5:17 UTC (permalink / raw)
To: dm-devel; +Cc: linux-kernel, david, mpatocka, snitzer
Hi, DM guys,
Let me share a new progress report.
I am sorry I have been away for weeks.
Writeboost is getting better, I believe.
There has been a lot of progress; please git pull
https://github.com/akiradeveloper/dm-writeboost.git
1. Removed version switch macros
Previously the code included ten or more version switches,
which is of course not appropriate for upstream code.
I have removed them all; as a result, Writeboost no longer
builds on older kernels.
2. .postsuspend and .resume implemented
dmsetup suspend and resume work now.
I implemented .postsuspend and .resume for this purpose.
.postsuspend is called after all outstanding I/O has been
submitted to the device. In .postsuspend I make all the
transient data persistent.
static void writeboost_postsuspend(struct dm_target *ti)
{
	int r;
	struct wb_device *wb = ti->private;
	struct wb_cache *cache = wb->cache;

	flush_current_buffer(cache);
	IO(blkdev_issue_flush(cache->device->bdev, GFP_NOIO, NULL));
}
.postsuspend is also called before .dtr,
so I removed the duplicated code from .dtr.
.resume does nothing at all.
3. Blocking up the device is now thoroughly implemented
How to block up the device on I/O error has been bothering me for weeks.
Now Writeboost can block up the device correctly
and elegantly.
A dead device should satisfy the following:
1) It returns an error for all incoming requests.
2) It ACKs all requests it has already accepted.
3) The in-memory data structures can still be cleaned up:
`dmsetup remove` must be accepted even after the device is marked dead.
4) Blocking up must work regardless of any timing issues.
The principle behind how Writeboost blocks up a device
is really simple:
1) All I/O to the underlying devices is wrapped by the IO macro (shown below),
which sets the WB_DEAD flag (using set_bit) when -EIO is returned.
2) Once WB_DEAD is set, all I/O to the underlying devices is ignored.
This is equivalent to having /dev/null as the underlying devices.
Processing on the in-memory data structures continues,
completely agnostic to the fact that
the underlying devices ignore their requests.
This ensures that the logic on the data structures never corrupts
and that the data on the underlying devices never changes after the device is marked dead.
3) Some code paths are wrapped by the LIVE, DEAD or LIVE_DEAD
macros, which switch their behavior according to the
WB_DEAD bit.
I tested the mechanism by wrapping the underlying devices
with dm-flakey (like below). For both cases (cache failing or backing failing)
it seems to work.
sz=`blockdev --getsize ${CACHE}`
dmsetup create cache-flakey --table "0 ${sz} flakey ${CACHE} 0 5 1"
CACHE=/dev/mapper/cache-flakey
sz=`blockdev --getsize ${BACKING}`
dmsetup create backing-flakey --table "0 ${sz} flakey ${BACKING} 0 20 0"
BACKING=/dev/mapper/backing-flakey
3.1 WARN_ONCE introduced
The IO macro branches on the return code of the procedure it wraps.
Only -EIO is treated as a trigger to block up the device.
The remaining problem is that I don't know every error
code that can possibly be returned.
If an unexpected error code shows up, the macro calls WARN_ONCE
to ask the user to report it.
/*
 * Only a returned -EIO is regarded as a signal
 * to block up the whole system.
 *
 * -EOPNOTSUPP can be returned if the target device is
 * a virtual device and we request a discard.
 *
 * -ENOMEM can be returned from blkdev_issue_discard (3.12-rc5),
 * for example. Waiting for a while can make room for the allocation.
 *
 * Other, unknown error codes are ignored and
 * we ask the human users to report them to us.
 */
#define IO(proc) \
	do { \
		r = 0; \
		LIVE(r = proc); \
		if (r == -EOPNOTSUPP) { \
			r = 0; \
		} else if (r == -EIO) { \
			set_bit(WB_DEAD, &wb->flags); \
			wake_up_all(&wb->dead_wait_queue); \
			WBERR("marked as dead"); \
		} else if (r == -ENOMEM) { \
			schedule_timeout_interruptible(msecs_to_jiffies(1000)); \
		} else if (r) { \
			WARN_ONCE(1, "PLEASE REPORT!!! I/O FAILED FOR UNKNOWN REASON %d", r); \
			r = 0; \
		} \
	} while (r)
3.2 The dead state never recovers
Following Dave's advice, I regard the error as fatal and provide no way to restart.
I removed "blockup" from the message interface accordingly.
3.3 The XFS corruption no longer reproduces
I think the previously discussed error was due to
Writeboost misbehaving while blocking up.
For example, some bios were never completed,
which may have corrupted the XFS AIL lock.
4. On-disk metadata layout changed;
max segment size reduced from 1MB to 512KB
struct metablock_device was 13 bytes before.
This means a metablock could straddle two sectors,
which causes a partial write.
To avoid this, I added padding to make it 16 bytes.
As a result, the maximum log size shrinks to 512KB.
5. `flush_supported = true` added
I added `flush_supported = true` to .ctr;
dm-cache and dm-thin do this too.
There is no point in Writeboost ignoring flush requests.
Below are some open questions I would like comments on.
1. Should the daemons stop during suspend?
Which of the levels below should a suspended device satisfy?
1) No transient data remains (currently OK).
2) + No I/O at all to the underlying devices?
(although this is logically irrelevant to the upper layer)
3) + No process execution at all is permitted,
to reduce power consumption for example?
2. Are .merge, .io_hints and .iterate_devices correct?
What is the correct behavior of these methods?
static void writeboost_io_hints(struct dm_target *ti,
				struct queue_limits *limits)
{
	blk_limits_io_min(limits, 512);
	blk_limits_io_opt(limits, 4096);
}

static int writeboost_iterate_devices(struct dm_target *ti,
				      iterate_devices_callout_fn fn, void *data)
{
	struct wb_device *wb = ti->private;
	struct dm_dev *orig = wb->device;
	sector_t start = 0;
	sector_t len = dm_devsize(orig);

	return fn(ti, orig, start, len, data);
}

static int writeboost_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
			    struct bio_vec *biovec, int max_size)
{
	struct wb_device *wb = ti->private;
	struct dm_dev *device = wb->device;
	struct request_queue *q = bdev_get_queue(device->bdev);

	if (!q->merge_bvec_fn)
		return max_size;

	bvm->bi_bdev = device->bdev;
	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
}
Thanks,
Akira
* Re: dm-writeboost: Progress Report
From: Akira Hayakawa @ 2013-11-17 11:23 UTC (permalink / raw)
To: dm-devel; +Cc: linux-kernel, david, mpatocka, snitzer, ruby.wktk
Hi, DM guys,
Here is another progress report.
1) Changed the .ctr arguments
Before the change, the constructor had two major problems:
writeboost <backing dev> <cache dev>
[segment size order]
[rambuf pool amount]
Problem i)
dm-writeboost will add support for persistent memory as the RAM buffer.
In detail, I will support two other types of medium as the RAM buffer:
- (a) A block device: persistent memory will first be exposed through a block interface.
- (b) Persistent memory with a newly designed interface: still under discussion.
I don't want to add new target types such as writeboost-nv-block and writeboost-nv;
instead I want to switch the medium via a type flag in .ctr.
The reason is that adding target types would probably increase the
implementation burden in userland (e.g. LVM2).
Problem ii)
Whereas the first two arguments are essential,
the following two are only optional.
The problem is that the optional arguments are positional:
a user who wants to set the RAM buffer pool amount but keep the
segment size order at its default is out of luck.
To solve these problems I changed the .ctr design.
It now has:
- a "type" argument as the first argument
- key-value optional parameters
(similar to those of dm-flakey)
writeboost <type> <backing dev> <cache dev>
#optional args
[segment_size_order val]
[rambuf_pool_amount val]
Three types (0, 1, 2) will be supported, but
only type 0 is implemented at present.
Type 1 (using a block device as the RAM buffer)
could be implemented right now, but
I want to merge this software upstream before that,
because dm-writeboost is beneficial with type 0 alone.
For example, the arguments for type 1 will be:
1 <backing dev> <cache dev> <rambuf dev> ...
2) .ctr now uses dm_arg_set; deleted the handmade ARG_EXIST macro.
A dm driver should use dm_arg_set and its helpers (such as dm_read_arg).
The writeboost driver didn't use them before, but does now.
Thanks,
Akira
* dm-writeboost: Progress Report
From: Akira Hayakawa @ 2014-05-17 14:44 UTC (permalink / raw)
To: dm-devel; +Cc: linux-kernel, masami.hiramatsu, devel
Hi DM Guys,
I will share the latest progress report about Writeboost.
1. Where we are now
Kernel code
-----------
First of all, Writeboost is now merged into the
thin-dev branch of Joe's tree.
URL: https://github.com/jthornber/linux-2.6
Testing
-------
Tests for Writeboost are merged into the
master branch of Joe's device-mapper-test-suite (dmts).
URL: https://github.com/jthornber/device-mapper-test-suite
Docs
----
You can access the latest documentation here.
URL: https://github.com/akiradeveloper/dm-writeboost/tree/develop/doc
- writeboost.txt : Will be merged into Documentation/ but isn't merged yet.
I would be really thankful if you helped me
improve the wording (I am not a native speaker).
Aside from this,
- writeboost-ja.txt : For Japanese folks
- writeboost.pdf : A very first introduction to Writeboost, in slides
DOWNLOAD: https://github.com/akiradeveloper/dm-writeboost/blob/develop/doc/writeboost.pdf?raw=true
2. New features since the last progress report
The kernel code hasn't changed drastically, but it
includes many important fixes (most of them were revealed
after testing on Joe's tree and dmts; I recommend other
target developers test their code on dmts as well).
Aside from the fixes, two major new features are introduced.
Sorted writeback
----------------
In April, a patch introducing write sorting
for dm-crypt was posted.
I thought it would also be useful for Writeboost
and decided to implement it there, too.
This feature is now in place.
Related thread:
http://www.redhat.com/archives/dm-devel/2014-April/msg00009.html
As a result,
writeback is done really efficiently by Writeboost.
You can see the details in Section 4, "Benchmarking Results".
Persistent Logging and the <type> parameter in the constructor
--------------------------------------------------------------
Writeboost has three layers: the RAM buffer, the SSD (cache device) and the HDD (backing device).
The data in the RAM buffer is written to the SSD when a FLUSH is requested.
In practice this is not very frequent, but under some workloads
Writeboost performs really badly because of the overhead of writing out
the RAM buffer on every FLUSH request.
Persistent Logging solves this problem.
It writes the dirty data to a Persistent Logging device (plog_dev)
to reduce this overhead.
For more detail, please read writeboost.pdf.
DOWNLOAD: https://github.com/akiradeveloper/dm-writeboost/blob/develop/doc/writeboost.pdf?raw=true
3. What are we going to do next?
We are investigating which kinds of workload
are good or bad for Writeboost,
and all the tests are in dmts.
URL: https://github.com/jthornber/device-mapper-test-suite
The ongoing discussion between Joe and me is
accessible in dmts. If you are interested in Writeboost,
I recommend watching the repository.
I would also be thankful if you joined us in this work.
Since my hardware environment is modest,
testing this accelerator in a richer environment
would help me a lot.
In particular, I want to test Writeboost with a
RAID-ed backing device (e.g. 100 HDDs).
4. Benchmarking Results
I will share the latest benchmark results from my hardware environment.
FYI, I previously shared this benchmark (randwrite throughput):
http://www.redhat.com/archives/dm-devel/2014-February/msg00000.html
Summary:
Stacking Writeboost on a spindle block device
shows these benefits:
- It always improves writes, even if the cache device is small
and the dirty data constantly overflows it. This is due to the
optimization in the writeback algorithm; I think the sorting
really matters. This is supported by test (A).
- Writeboost doesn't deteriorate reads much, although it splits
the I/O into 4KB fragments. This is supported by (B).
- Even in a read-heavy workload, Writeboost performs really nicely.
This is supported by (C).
- In a realistic workload, Writeboost Type 1 really improves the score.
This is supported by (D).
Tests: Writeboost compared against the backing device alone (i.e. without Writeboost)
(A) 128MB of writes to the HDD: Type 0 (batch size: 256) improves 396%
(B) 128MB of reads: Type 0 is 1% slower (iosize=128KB)
(C) Git extract: Type 0 is 22% faster (total time)
(D) dbench: Type 1 improves 234%-299% (depending on the options)
Details:
(A) writeback_sorting_effect
To see the effect of the writeback optimization, the time
to complete all writeback is measured.
Since the number of segments batched per writeback is the
major factor in the optimization, we vary this
parameter and observe its effect.
Note that the amount of data written is 128MB.
WriteboostTestsBackingDevice
Elapsed 118.693293268
WriteboostTestsType0
Elapsed 117.053297884: batch_size(4)
Elapsed 76.709325916: batch_size(32)
Elapsed 47.994442515: batch_size(128)
Elapsed 29.953923952: batch_size(256)
The bigger the batch size, the shorter the elapsed time.
The best case is 118.69 -> 29.95 sec (x3.96).
It is easy to imagine that an even higher batch_size would gain even more.
This result means Writeboost has the potential to act as a
really efficient I/O scheduler: with batch_size(256)
it submits 127 * 256 4KB blocks in sorted order
asynchronously.
(B) fio_read_overhead
On reads, Writeboost splits the I/O and performs a
cache lookup for each fragment. This is not free.
This test just reads 128MB with the specified iosize.
With Writeboost it never read-hits, because Writeboost
never caches on reads.
WriteboostTestsBackingDevice
Elapsed 430.314465782: iosize=1k
Elapsed 217.09349141: iosize=2k
Elapsed 110.633395391: iosize=4k
Elapsed 56.652597528: iosize=8k
Elapsed 29.65688052: iosize=16k
Elapsed 16.564318187: iosize=32k
Elapsed 9.679151882: iosize=64k
Elapsed 6.306119032: iosize=128k
WriteboostTestsType0
Elapsed 430.062210932: iosize=1k
Elapsed 217.630954333: iosize=2k
Elapsed 110.115843367: iosize=4k
Elapsed 56.863948191: iosize=8k
Elapsed 29.978668891: iosize=16k
Elapsed 16.532206415: iosize=32k
Elapsed 9.807747472: iosize=64k
Elapsed 6.366230798: iosize=128k
The tendency is that
Writeboost's reads deteriorate
as the iosize gets bigger,
because the splitting overhead grows.
At iosize=128k the deterioration is 1%.
Although it depends on the use case,
this is small enough for real-world systems,
which are equipped with RAM used as a page cache.
As you can imagine, the overhead becomes more dominant
the faster the backing device is. To examine the
case where the backing device is extraordinarily fast,
I ran the experiment with an SSD as the backing device
(a HDD was the backing device in the results above).
WriteboostTestsBackingDevice
Elapsed 7.359187932: iosize=1k
Elapsed 4.810739394: iosize=2k
Elapsed 2.092146925: iosize=4k
Elapsed 3.477345334: iosize=8k
Elapsed 0.992550734: iosize=16k
Elapsed 0.890939955: iosize=32k
Elapsed 0.862750482: iosize=64k
Elapsed 0.964657796: iosize=128k
WriteboostTestsType0
Elapsed 7.603870984: iosize=1k
Elapsed 4.124003115: iosize=2k
Elapsed 2.026922929: iosize=4k
Elapsed 1.779826802: iosize=8k
Elapsed 1.378827526: iosize=16k
Elapsed 1.258259695: iosize=32k
Elapsed 1.219117654: iosize=64k
Elapsed 1.301907586: iosize=128k
I don't know why Writeboost (below)
beats the pure SSD at iosize=2k, 4k and 8k, but
overall it tends to perform worse than the pure SSD.
At iosize=128k, it shows a 26% loss.
However, the throughput is almost 100MB/sec, and
building a RAID of HDDs that achieves 100MB/sec
random reads is really hard. So, practically,
Writeboost's read overhead is acceptable
if we use 10-100 HDDs as the backing device.
(I would like to actually test such an environment but
don't have the hardware...)
(C) git_extract_cache_quick
Git extract does "git checkout" several times on the linux tree.
This test is really read-heavy, and the cache seldom hits on reads.
This benchmark represents an application workload
that is not favorable for Writeboost.
WriteboostTestsBackingDevice
Elapsed 52.494120792: git_prepare
Elapsed 276.545543981: extract all versions
Finished in 331.363683334 seconds
WriteboostTestsType0
Elapsed 46.966797484: git_prepare
Elapsed 215.305219932: extract all versions
Finished in 270.176494226 seconds.
WriteboostTestsType1
Elapsed 83.344358679: git_prepare
Elapsed 236.562481129: extract all versions
Finished in 329.684926274 seconds.
Writeboost beats the pure HDD:
in total time, it is 22% faster.
(D) do_dbench
This test runs the dbench program with three different
options (none, -S, -s). -S means only directory operations
are SYNC, and -s means all operations are SYNC.
dbench is a benchmark program that emulates a fileserver workload.
WriteboostTestsBackingDevice
none: 28.24 MB/sec
-S : 12.21 MB/sec
-s : 4.01 MB/sec
WriteboostTestsType0
none: 29.28 MB/sec
-S : 8.76 MB/sec
-s : 4.67 MB/sec
WriteboostTestsType1 (with Persistent Logging)
none: 66.36 MB/sec
-S : 29.35 MB/sec
-s : 12.00 MB/sec
This benchmark shows that Persistent Logging
really improves the performance (always more than double).
Especially with the -s option (all operations sync),
the performance is tripled.
However, as the Git extract case shows,
Type 1 is not always the winner; it depends on the workload.
In the -S case, Type 0 performs really poorly.
This is because of the overhead mentioned in the "new features" section.
However, we can improve this by tuning the parameters
"segment_size_order" and "barrier_deadline_ms":
setting these parameters to smaller values can improve the response to
FLUSH requests at the sacrifice of maximum write performance.
Thanks for reading
- Akira
* Re: dm-writeboost: Progress Report
From: Akira Hayakawa @ 2014-05-23 13:14 UTC (permalink / raw)
To: dm-devel; +Cc: linux-kernel, masami.hiramatsu, devel
Hi guys,
This progress report includes very important benchmarking results, which show:
i) Writes always improve - it boosts writes (396% in the best case) even with a really small cache (say, 64MB), thanks to the sophisticated writeback optimization.
ii) Reads won't suffer much - it deteriorates reads very little (less than 1% in the SSD+HDD case) because the overhead is insignificant.
iii) Good in application workloads - 22% improvement in a read-intensive workload and 234-299% improvement in a fileserver workload (using dbench).
Yeah, merging Writeboost into the mainline is my goal now, and I really need your feedback.
For more details, please read the previous post.
Cheers,
- Akira