linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] writeback: fix writeback cache thrashing
@ 2012-12-30  5:59 Namjae Jeon
  2012-12-31 11:30 ` Jan Kara
  2013-01-05  3:18 ` Fengguang Wu
  0 siblings, 2 replies; 18+ messages in thread
From: Namjae Jeon @ 2012-12-30  5:59 UTC (permalink / raw)
  To: fengguang.wu
  Cc: linux-fsdevel, linux-mm, linux-kernel, liwanp, Namjae Jeon,
	Namjae Jeon, Vivek Trivedi, Jan Kara, Dave Chinner

From: Namjae Jeon <namjae.jeon@samsung.com>

Consider Process A: huge I/O on sda
        doing heavy write operation - dirty memory becomes more
        than dirty_background_ratio
        on HDD - flusher thread flush-8:0

Consider Process B: small I/O on sdb
        doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
        on Flash device - flusher thread flush-8:16
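
For readers who want to reproduce the Process B workload, here is a
minimal C sketch of the loop described above. The function names and
the file path passed in are hypothetical; the original test program is
not shown in this thread. A short read or write is treated as an error.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)	/* 1024K, as in the description */

/*
 * Read the first `len` bytes of `path` and write them back in place,
 * which re-dirties the same page cache pages.
 * Returns 0 on success, -1 on any error or short transfer.
 */
int rewrite_chunk(const char *path, size_t len)
{
	char *buf = malloc(len);
	int fd, ret = -1;

	if (!buf)
		return -1;
	fd = open(path, O_RDWR);
	if (fd < 0)
		goto out_free;
	if (read(fd, buf, len) != (ssize_t)len)
		goto out_close;
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out_close;
	if (write(fd, buf, len) != (ssize_t)len)
		goto out_close;
	ret = 0;
out_close:
	close(fd);
out_free:
	free(buf);
	return ret;
}

/* Process B: rewrite the same 1024K every 2 seconds until an error. */
void process_b(const char *path)
{
	while (rewrite_chunk(path, CHUNK) == 0)
		sleep(2);
}
```

Calling process_b() on a file of at least 1024K that lives on sdb
should approximate the small, frequently rewritten working set.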

Because Process A is a heavy dirtier, global dirty memory exceeds
dirty_background_thresh. The global_page_state() check in
over_bground_thresh() (the 'if' removed by the diff below) then becomes
true for every bdi, even one with very little dirty data such as sdb.

In this case, even small cached data on 'sdb' is forced to flush
and writeback cache thrashing happens.
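
The effect can be illustrated with a small userspace model of the
pre-patch check. This is only a sketch: the struct and function names
are made up for illustration, and any numbers fed to it are assumed
values, not measurements from the test setup.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel counters; all values in KB. */
struct wb_state {
	unsigned long global_dirty;	 /* NR_FILE_DIRTY + NR_UNSTABLE_NFS */
	unsigned long background_thresh; /* global background limit */
};

struct bdi_model {
	unsigned long reclaimable;	  /* BDI_RECLAIMABLE */
	unsigned long bdi_bground_thresh; /* this bdi's share of the limit */
};

/*
 * Model of the check before the patch: once the global test is true,
 * the per-bdi state is never consulted, so every bdi is reported as
 * over the background threshold.
 */
bool over_bground_thresh_old(const struct wb_state *g,
			     const struct bdi_model *bdi)
{
	if (g->global_dirty > g->background_thresh)
		return true;
	return bdi->reclaimable > bdi->bdi_bground_thresh;
}
```

With a global state pushed over the limit by Process A, the model
returns true for a bdi holding only a few hundred KB, which matches
the flush-8:16 wakeups in the log.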

When we added debug prints inside the 'if' condition above and ran
Process A (heavy dirtier on the bdi served by flush-8:0) together with
Process B (frequent 1024K read/rewrite on the bdi served by flush-8:16),
we got the following output:

[Test setup: ARM dual core CPU, 512 MB RAM]

[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
[over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB

As the log above shows, when global dirty memory > global background_thresh,
even small amounts of cached data are forced out by flush-8:16.

If we remove the global background_thresh check, we can reduce cache
thrashing of frequently used small data.
It would also be useful to reserve a portion of the writeback cache
using min_ratio.

After applying patch:
$ echo 5 > /sys/block/sdb/bdi/min_ratio
$ cat /sys/block/sdb/bdi/min_ratio
5

[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
[over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB

As the logs above show, once cache is reserved for Process B and the
patch is applied, there are no forced wakeups of flush-8:16 from
over_bground_thresh, so there is less writeback cache thrashing on sdb.

The small cached data will still be flushed by periodic writeback,
once every dirty_writeback_interval.
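
Why min_ratio matters here can be sketched with a rough userspace
model of how a bdi's share of a limit is derived. The formula follows
my reading of bdi_dirty_limit() in kernels of this era and should be
treated as an assumption; the function and parameter names are
illustrative only.

```c
#include <stdbool.h>

/*
 * Rough model of a bdi's share of a dirty limit.
 * share_num / share_den stands for the bdi's recent writeout fraction
 * (share_den must be non-zero); sum_min_ratio is the sum of min_ratio
 * over all bdis. All ratios are percentages.
 */
unsigned long bdi_limit_model(unsigned long thresh,
			      unsigned long share_num,
			      unsigned long share_den,
			      unsigned int min_ratio,
			      unsigned int max_ratio,
			      unsigned int sum_min_ratio)
{
	unsigned long long limit, cap;

	/* Part of the limit that is split by writeout fraction. */
	limit = (unsigned long long)thresh * (100 - sum_min_ratio) / 100;
	limit = limit * share_num / share_den;
	/* Reserved part guaranteed by min_ratio. */
	limit += (unsigned long long)thresh * min_ratio / 100;
	/* Never exceed max_ratio of the limit. */
	cap = (unsigned long long)thresh * max_ratio / 100;
	if (limit > cap)
		limit = cap;
	return (unsigned long)limit;
}

/* Patched check in this model: only the per-bdi comparison remains. */
bool over_bdi_bground(unsigned long reclaimable, unsigned long bdi_limit)
{
	return reclaimable > bdi_limit;
}
```

In this model a bdi with a negligible writeout share and min_ratio 0
gets a limit close to zero, so even the per-bdi check would fire for
it. With min_ratio set to 5 it keeps 5% of the threshold, which is
consistent with sdb staying quiet in the log above.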

Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 310972b..070b773 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -756,10 +756,6 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
 
-	if (global_page_state(NR_FILE_DIRTY) +
-	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
-		return true;
-
 	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
 				bdi_dirty_limit(bdi, background_thresh))
 		return true;
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] writeback: fix writeback cache thrashing
  2012-12-30  5:59 [PATCH] writeback: fix writeback cache thrashing Namjae Jeon
@ 2012-12-31 11:30 ` Jan Kara
  2013-01-01  0:51   ` Wanpeng Li
       [not found]   ` <20130101005104.GA23383@hacker.(null)>
  2013-01-05  3:18 ` Fengguang Wu
  1 sibling, 2 replies; 18+ messages in thread
From: Jan Kara @ 2012-12-31 11:30 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: fengguang.wu, linux-fsdevel, linux-mm, linux-kernel, liwanp,
	Namjae Jeon, Vivek Trivedi, Jan Kara, Dave Chinner

On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
> 
> Consider Process A: huge I/O on sda
>         doing heavy write operation - dirty memory becomes more
>         than dirty_background_ratio
>         on HDD - flusher thread flush-8:0
> 
> Consider Process B: small I/O on sdb
>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
>         on Flash device - flusher thread flush-8:16
> 
> As Process A is a heavy dirtier, dirty memory becomes more
> than dirty_background_thresh. Due to this, below check becomes
> true(checking global_page_state in over_bground_thresh)
> for all bdi devices(even for very small dirtied bdi - sdb):
> 
> In this case, even small cached data on 'sdb' is forced to flush
> and writeback cache thrashing happens.
> 
> When we added debug prints inside above 'if' condition and ran
> above Process A(heavy dirtier on bdi with flush-8:0) and
> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
> we got below prints:
> 
> [Test setup: ARM dual core CPU, 512 MB RAM]
> 
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
> 
> As mentioned in above log, when global dirty memory > global background_thresh
> small cached data is also forced to flush by flush-8:16.
> 
> If removing global background_thresh checking code, we can reduce cache
> thrashing of frequently used small data.
  It's not completely clear to me:
  Why is this a problem? Wearing of the flash? Power consumption? I'd like
to understand this before changing the code...

> And It will be great if we can reserve a portion of writeback cache using
> min_ratio.
> 
> After applying patch:
> $ echo 5 > /sys/block/sdb/bdi/min_ratio
> $ cat /sys/block/sdb/bdi/min_ratio
> 5
> 
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
> 
> As mentioned in the above logs, once cache is reserved for Process B,
> and patch is applied there is less writeback cache thrashing on sdb
> by frequent forced writeback by flush-8:16 in over_bground_thresh.
> 
> After all, small cached data will be flushed by periodic writeback
> once every dirty_writeback_interval.
  OK, in principle something like this makes sense to me. But if there are
several BDIs which are roughly equally used, it could happen that none of
them is over its threshold due to percpu counter & rounding errors. So I'd
rather change the conditions to something like:
	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);

	if (reclaimable > bdi_bground_thresh)
		return true;
	/*
	 * If global background limit is exceeded, kick the writeback on
	 * BDI if there's a reasonable amount of data to write (at least
	 * 1/2 of BDI's background dirty limit).
	 */
	if (global_page_state(NR_FILE_DIRTY) +
	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
	    reclaimable * 2 > bdi_bground_thresh)
		return true;
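
The two conditions above can be tried out in userspace with a small
model. This is a sketch only: the kernel counters are replaced by
plain parameters, and the function name is made up for illustration.

```c
#include <stdbool.h>

/*
 * Model of the proposed check.
 * global_dirty stands for NR_FILE_DIRTY + NR_UNSTABLE_NFS,
 * reclaimable for bdi_stat(bdi, BDI_RECLAIMABLE).
 */
bool over_bground_thresh_proposed(unsigned long global_dirty,
				  unsigned long background_thresh,
				  unsigned long reclaimable,
				  unsigned long bdi_bground_thresh)
{
	/* The bdi is over its own background limit. */
	if (reclaimable > bdi_bground_thresh)
		return true;
	/*
	 * Global limit exceeded: kick this bdi only if it holds at
	 * least half of its own background limit.
	 */
	if (global_dirty > background_thresh &&
	    reclaimable * 2 > bdi_bground_thresh)
		return true;
	return false;
}
```

With an assumed per-bdi limit of 2500, a bdi holding 1024 is left
alone even when the global limit is exceeded, while one holding 1500
is kicked only in that case.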

								Honza

> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
> Cc: Fengguang Wu <fengguang.wu@intel.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/fs-writeback.c |    4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 310972b..070b773 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
>  
>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>  
> -	if (global_page_state(NR_FILE_DIRTY) +
> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
> -		return true;
> -
>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
>  				bdi_dirty_limit(bdi, background_thresh))
>  		return true;
> -- 
> 1.7.9.5
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] writeback: fix writeback cache thrashing
  2012-12-31 11:30 ` Jan Kara
@ 2013-01-01  0:51   ` Wanpeng Li
       [not found]   ` <20130101005104.GA23383@hacker.(null)>
  1 sibling, 0 replies; 18+ messages in thread
From: Wanpeng Li @ 2013-01-01  0:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: Namjae Jeon, fengguang.wu, linux-fsdevel, linux-mm, linux-kernel,
	Namjae Jeon, Vivek Trivedi, Dave Chinner

On Mon, Dec 31, 2012 at 12:30:54PM +0100, Jan Kara wrote:
>On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
>> From: Namjae Jeon <namjae.jeon@samsung.com>
>> 
>> Consider Process A: huge I/O on sda
>>         doing heavy write operation - dirty memory becomes more
>>         than dirty_background_ratio
>>         on HDD - flusher thread flush-8:0
>> 
>> Consider Process B: small I/O on sdb
>>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
>>         on Flash device - flusher thread flush-8:16
>> 
>> As Process A is a heavy dirtier, dirty memory becomes more
>> than dirty_background_thresh. Due to this, below check becomes
>> true(checking global_page_state in over_bground_thresh)
>> for all bdi devices(even for very small dirtied bdi - sdb):
>> 
>> In this case, even small cached data on 'sdb' is forced to flush
>> and writeback cache thrashing happens.
>> 
>> When we added debug prints inside above 'if' condition and ran
>> above Process A(heavy dirtier on bdi with flush-8:0) and
>> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
>> we got below prints:
>> 
>> [Test setup: ARM dual core CPU, 512 MB RAM]
>> 
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
>> 
>> As mentioned in above log, when global dirty memory > global background_thresh
>> small cached data is also forced to flush by flush-8:16.
>> 
>> If removing global background_thresh checking code, we can reduce cache
>> thrashing of frequently used small data.
>  It's not completely clear to me:
>  Why is this a problem? Wearing of the flash? Power consumption? I'd like
>to understand this before changing the code...
>
>> And It will be great if we can reserve a portion of writeback cache using
>> min_ratio.
>> 
>> After applying patch:
>> $ echo 5 > /sys/block/sdb/bdi/min_ratio
>> $ cat /sys/block/sdb/bdi/min_ratio
>> 5
>> 
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
>> 
>> As mentioned in the above logs, once cache is reserved for Process B,
>> and patch is applied there is less writeback cache thrashing on sdb
>> by frequent forced writeback by flush-8:16 in over_bground_thresh.
>> 
>> After all, small cached data will be flushed by periodic writeback
>> once every dirty_writeback_interval.
>  OK, in principle something like this makes sense to me. But if there are
>more BDIs which are roughly equally used, it could happen none of them are
>over threshold due to percpu counter & rounding errors. So I'd rather
>change the conditions to something like:
>	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
>	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);
>
>  	if (reclaimable > bdi_bground_thresh)
>		return true;
>	/*
>	 * If global background limit is exceeded, kick the writeback on
>	 * BDI if there's a reasonable amount of data to write (at least
>	 * 1/2 of BDI's background dirty limit).
>	 */
>	if (global_page_state(NR_FILE_DIRTY) +
>	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>	    reclaimable * 2 > bdi_bground_thresh)
>		return true;
>

Hi Jan,

If there are enough BDIs, used roughly equally, and the percpu counter
of each one reads less than 1/2 of that BDI's background dirty limit,
still nothing will be flushed even when we are over the global
background_thresh.
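
A toy model of this case, assuming equal per-BDI shares and a counter
that reads low by a fixed error (both are simplifications, and the
names are illustrative):

```c
#include <stdbool.h>

/* The "at least 1/2 of the limit" test on an approximate reading. */
bool half_limit_kick(unsigned long reading, unsigned long bdi_bground_thresh)
{
	return reading * 2 > bdi_bground_thresh;
}

/*
 * Count how many of nr_bdi equally loaded BDIs would be kicked when
 * each really holds true_pages but its counter reads low by error.
 * A reading that fails this test is also below the full per-BDI
 * limit, so the first condition cannot fire either.
 */
unsigned int bdis_kicked(unsigned int nr_bdi, unsigned long true_pages,
			 unsigned long error,
			 unsigned long background_thresh)
{
	unsigned long limit = background_thresh / nr_bdi; /* equal shares */
	unsigned long reading = true_pages > error ? true_pages - error : 0;
	unsigned int i, kicked = 0;

	for (i = 0; i < nr_bdi; i++)
		if (half_limit_kick(reading, limit))
			kicked++;
	return kicked;
}
```

For example, with 8 BDIs, a global threshold of 8000 and 1100 truly
dirty per BDI, an exact counter kicks all 8, while a counter reading
low by 700 kicks none although the global total is over the limit.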

Regards,
Wanpeng Li 

>								Honza
>
>> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
>> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
>> Cc: Fengguang Wu <fengguang.wu@intel.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> ---
>>  fs/fs-writeback.c |    4 ----
>>  1 file changed, 4 deletions(-)
>> 
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 310972b..070b773 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
>>  
>>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>>  
>> -	if (global_page_state(NR_FILE_DIRTY) +
>> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
>> -		return true;
>> -
>>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
>>  				bdi_dirty_limit(bdi, background_thresh))
>>  		return true;
>> -- 
>> 1.7.9.5
>> 
>-- 
>Jan Kara <jack@suse.cz>
>SUSE Labs, CR
>

* Re: [PATCH] writeback: fix writeback cache thrashing
       [not found]   ` <20130101005104.GA23383@hacker.(null)>
@ 2013-01-02 13:43     ` Jan Kara
  2013-01-03  4:35       ` Namjae Jeon
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2013-01-02 13:43 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Jan Kara, Namjae Jeon, fengguang.wu, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Tue 01-01-13 08:51:04, Wanpeng Li wrote:
> On Mon, Dec 31, 2012 at 12:30:54PM +0100, Jan Kara wrote:
> >On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
> >> From: Namjae Jeon <namjae.jeon@samsung.com>
> >> 
> >> Consider Process A: huge I/O on sda
> >>         doing heavy write operation - dirty memory becomes more
> >>         than dirty_background_ratio
> >>         on HDD - flusher thread flush-8:0
> >> 
> >> Consider Process B: small I/O on sdb
> >>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
> >>         on Flash device - flusher thread flush-8:16
> >> 
> >> As Process A is a heavy dirtier, dirty memory becomes more
> >> than dirty_background_thresh. Due to this, below check becomes
> >> true(checking global_page_state in over_bground_thresh)
> >> for all bdi devices(even for very small dirtied bdi - sdb):
> >> 
> >> In this case, even small cached data on 'sdb' is forced to flush
> >> and writeback cache thrashing happens.
> >> 
> >> When we added debug prints inside above 'if' condition and ran
> >> above Process A(heavy dirtier on bdi with flush-8:0) and
> >> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
> >> we got below prints:
> >> 
> >> [Test setup: ARM dual core CPU, 512 MB RAM]
> >> 
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
> >> 
> >> As mentioned in above log, when global dirty memory > global background_thresh
> >> small cached data is also forced to flush by flush-8:16.
> >> 
> >> If removing global background_thresh checking code, we can reduce cache
> >> thrashing of frequently used small data.
> >  It's not completely clear to me:
> >  Why is this a problem? Wearing of the flash? Power consumption? I'd like
> >to understand this before changing the code...
> >
> >> And It will be great if we can reserve a portion of writeback cache using
> >> min_ratio.
> >> 
> >> After applying patch:
> >> $ echo 5 > /sys/block/sdb/bdi/min_ratio
> >> $ cat /sys/block/sdb/bdi/min_ratio
> >> 5
> >> 
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
> >> 
> >> As mentioned in the above logs, once cache is reserved for Process B,
> >> and patch is applied there is less writeback cache thrashing on sdb
> >> by frequent forced writeback by flush-8:16 in over_bground_thresh.
> >> 
> >> After all, small cached data will be flushed by periodic writeback
> >> once every dirty_writeback_interval.
> >  OK, in principle something like this makes sense to me. But if there are
> >more BDIs which are roughly equally used, it could happen none of them are
> >over threshold due to percpu counter & rounding errors. So I'd rather
> >change the conditions to something like:
> >	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> >	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);
> >
> >  	if (reclaimable > bdi_bground_thresh)
> >		return true;
> >	/*
> >	 * If global background limit is exceeded, kick the writeback on
> >	 * BDI if there's a reasonable amount of data to write (at least
> >	 * 1/2 of BDI's background dirty limit).
> >	 */
> >	if (global_page_state(NR_FILE_DIRTY) +
> >	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
> >	    reclaimable * 2 > bdi_bground_thresh)
> >		return true;
> >
> 
> Hi Jan,
> 
> If there are enough BDIs and percpu counter of each bdi roughly equally
> used less than 1/2 of BDI's background dirty limit, still nothing will 
> be flushed even if over global background_thresh.
  Yes, although then the percpu counter error would have to be quite big.
Anyway, we can change the last condition to:
     if (global_page_state(NR_FILE_DIRTY) +
         global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
         reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)

  That should be safe, and for machines with a reasonable number of CPUs
it should save the wakeup as well.
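
A userspace sketch of the condition with the error margin added. The
counter error is passed in as a plain parameter here rather than taken
from bdi_stat_error(); the function name and the sample values in the
note below are assumptions for illustration.

```c
#include <stdbool.h>

/*
 * Model of the check with the counter error taken into account.
 * reading is the approximate BDI_RECLAIMABLE value and stat_error
 * the maximum amount by which it may be off.
 */
bool kick_with_error_margin(bool global_over, unsigned long reading,
			    unsigned long stat_error,
			    unsigned long bdi_bground_thresh)
{
	if (reading > bdi_bground_thresh)
		return true;
	/* Give the reading the benefit of the doubt by stat_error. */
	return global_over &&
	       reading * 2 + stat_error * 2 > bdi_bground_thresh;
}
```

A reading of 400 with a possible error of 700 against a limit of 1000
is now kicked when the global limit is exceeded, while a reading of
256 with an error of 16 against a limit of 2500 is still left alone.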

								Honza

> >> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> >> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> >> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
> >> Cc: Fengguang Wu <fengguang.wu@intel.com>
> >> Cc: Jan Kara <jack@suse.cz>
> >> Cc: Dave Chinner <dchinner@redhat.com>
> >> ---
> >>  fs/fs-writeback.c |    4 ----
> >>  1 file changed, 4 deletions(-)
> >> 
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 310972b..070b773 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
> >>  
> >>  	global_dirty_limits(&background_thresh, &dirty_thresh);
> >>  
> >> -	if (global_page_state(NR_FILE_DIRTY) +
> >> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
> >> -		return true;
> >> -
> >>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
> >>  				bdi_dirty_limit(bdi, background_thresh))
> >>  		return true;
> >> -- 
> >> 1.7.9.5
> >> 
> >-- 
> >Jan Kara <jack@suse.cz>
> >SUSE Labs, CR
> >
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-02 13:43     ` Jan Kara
@ 2013-01-03  4:35       ` Namjae Jeon
  2013-01-04  0:59         ` Simon Jeons
  0 siblings, 1 reply; 18+ messages in thread
From: Namjae Jeon @ 2013-01-03  4:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: Wanpeng Li, fengguang.wu, linux-fsdevel, linux-mm, linux-kernel,
	Namjae Jeon, Vivek Trivedi, Dave Chinner

2013/1/2, Jan Kara <jack@suse.cz>:
> On Tue 01-01-13 08:51:04, Wanpeng Li wrote:
>> On Mon, Dec 31, 2012 at 12:30:54PM +0100, Jan Kara wrote:
>> >On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
>> >> From: Namjae Jeon <namjae.jeon@samsung.com>
>> >>
>> >> Consider Process A: huge I/O on sda
>> >>         doing heavy write operation - dirty memory becomes more
>> >>         than dirty_background_ratio
>> >>         on HDD - flusher thread flush-8:0
>> >>
>> >> Consider Process B: small I/O on sdb
>> >>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
>> >>         on Flash device - flusher thread flush-8:16
>> >>
>> >> As Process A is a heavy dirtier, dirty memory becomes more
>> >> than dirty_background_thresh. Due to this, below check becomes
>> >> true(checking global_page_state in over_bground_thresh)
>> >> for all bdi devices(even for very small dirtied bdi - sdb):
>> >>
>> >> In this case, even small cached data on 'sdb' is forced to flush
>> >> and writeback cache thrashing happens.
>> >>
>> >> When we added debug prints inside above 'if' condition and ran
>> >> above Process A(heavy dirtier on bdi with flush-8:0) and
>> >> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
>> >> we got below prints:
>> >>
>> >> [Test setup: ARM dual core CPU, 512 MB RAM]
>> >>
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
>> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
>> >>
>> >> As mentioned in above log, when global dirty memory > global
>> >> background_thresh
>> >> small cached data is also forced to flush by flush-8:16.
>> >>
>> >> If removing global background_thresh checking code, we can reduce
>> >> cache
>> >> thrashing of frequently used small data.
>> >  It's not completely clear to me:
>> >  Why is this a problem? Wearing of the flash? Power consumption? I'd
>> > like
>> >to understand this before changing the code...
Hi Jan.
Yes, it can reduce wear and fragmentation of the flash. In at least one
scenario, we think it might also reduce power consumption.

>> >
>> >> And It will be great if we can reserve a portion of writeback cache
>> >> using
>> >> min_ratio.
>> >>
>> >> After applying patch:
>> >> $ echo 5 > /sys/block/sdb/bdi/min_ratio
>> >> $ cat /sys/block/sdb/bdi/min_ratio
>> >> 5
>> >>
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
>> >>
>> >> As mentioned in the above logs, once cache is reserved for Process B,
>> >> and patch is applied there is less writeback cache thrashing on sdb
>> >> by frequent forced writeback by flush-8:16 in over_bground_thresh.
>> >>
>> >> After all, small cached data will be flushed by periodic writeback
>> >> once every dirty_writeback_interval.
>> >  OK, in principle something like this makes sence to me. But if there
>> > are
>> >more BDIs which are roughly equally used, it could happen none of them
>> > are
>> >over threshold due to percpu counter & rounding errors. So I'd rather
>> >change the conditions to something like:
>> >	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
>> >	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);
>> >
>> >  	if (reclaimable > bdi_bground_thresh)
>> >		return true;
>> >	/*
>> >	 * If global background limit is exceeded, kick the writeback on
>> >	 * BDI if there's a reasonable amount of data to write (at least
>> >	 * 1/2 of BDI's background dirty limit).
>> >	 */
>> >	if (global_page_state(NR_FILE_DIRTY) +
>> >	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>> >	    reclaimable * 2 > bdi_bground_thresh)
>> >		return true;
>> >
>>
>> Hi Jan,
>>
>> If there are enough BDIs and percpu counter of each bdi roughly equally
>> used less than 1/2 of BDI's background dirty limit, still nothing will
>> be flushed even if over global background_thresh.
>   Yes, although then the percpu counter error would have to be quite big.
> Anyway, we can change the last condition to:
>      if (global_page_state(NR_FILE_DIRTY) +
>          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
>
>   That should be safe and for machines with resonable number of CPUs it
> should save the wakeup as well.
I agree, and I will send a v2 patch following your suggestion.

Thanks Jan.
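
For reference, the combined check suggested above can be sketched in
userspace C. This is only an illustration, not the v2 patch: the
struct wb_snapshot type and over_bground_thresh_sketch() are hypothetical
names, and the struct fields stand in for the values that
global_page_state(), bdi_stat(), bdi_dirty_limit() and bdi_stat_error()
would return in the kernel. All values are assumed to be in pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs to the check (all in pages). */
struct wb_snapshot {
	unsigned long global_dirty;       /* NR_FILE_DIRTY + NR_UNSTABLE_NFS */
	unsigned long background_thresh;  /* global background threshold */
	unsigned long reclaimable;        /* bdi_stat(bdi, BDI_RECLAIMABLE) */
	unsigned long bdi_bground_thresh; /* bdi_dirty_limit(bdi, background_thresh) */
	unsigned long stat_error;         /* bdi_stat_error(bdi) */
};

/* Userspace sketch of the condition discussed in this thread. */
static bool over_bground_thresh_sketch(const struct wb_snapshot *s)
{
	assert(s != NULL);

	/* The BDI alone is over its share of the background threshold. */
	if (s->reclaimable > s->bdi_bground_thresh)
		return true;

	/*
	 * The global background limit is exceeded: kick writeback on this
	 * BDI only if it holds a reasonable amount of data (at least 1/2
	 * of its background limit), allowing for percpu counter error.
	 */
	if (s->global_dirty > s->background_thresh &&
	    s->reclaimable * 2 + s->stat_error * 2 > s->bdi_bground_thresh)
		return true;

	return false;
}
```

With this shape, a BDI holding only a few dirty pages is not woken just
because another BDI pushed the global count over background_thresh.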
>
> 								Honza
>
>> >> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>> >> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
>> >> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
>> >> Cc: Fengguang Wu <fengguang.wu@intel.com>
>> >> Cc: Jan Kara <jack@suse.cz>
>> >> Cc: Dave Chinner <dchinner@redhat.com>
>> >> ---
>> >>  fs/fs-writeback.c |    4 ----
>> >>  1 file changed, 4 deletions(-)
>> >>
>> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> >> index 310972b..070b773 100644
>> >> --- a/fs/fs-writeback.c
>> >> +++ b/fs/fs-writeback.c
>> >> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct
>> >> backing_dev_info *bdi)
>> >>
>> >>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>> >>
>> >> -	if (global_page_state(NR_FILE_DIRTY) +
>> >> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
>> >> -		return true;
>> >> -
>> >>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
>> >>  				bdi_dirty_limit(bdi, background_thresh))
>> >>  		return true;
>> >> --
>> >> 1.7.9.5
>> >>
>> >--
>> >Jan Kara <jack@suse.cz>
>> >SUSE Labs, CR
>> >
>> >--
>> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >the body to majordomo@kvack.org.  For more info on Linux MM,
>> >see: http://www.linux-mm.org/ .
>> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-03  4:35       ` Namjae Jeon
@ 2013-01-04  0:59         ` Simon Jeons
  2013-01-04  7:41           ` Namjae Jeon
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Jeons @ 2013-01-04  0:59 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Jan Kara, Wanpeng Li, fengguang.wu, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Thu, 2013-01-03 at 13:35 +0900, Namjae Jeon wrote:
> 2013/1/2, Jan Kara <jack@suse.cz>:
> > On Tue 01-01-13 08:51:04, Wanpeng Li wrote:
> >> On Mon, Dec 31, 2012 at 12:30:54PM +0100, Jan Kara wrote:
> >> >On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
> >> >> From: Namjae Jeon <namjae.jeon@samsung.com>
> >> >>
> >> >> Consider Process A: huge I/O on sda
> >> >>         doing heavy write operation - dirty memory becomes more
> >> >>         than dirty_background_ratio
> >> >>         on HDD - flusher thread flush-8:0
> >> >>
> >> >> Consider Process B: small I/O on sdb
> >> >>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
> >> >>         on Flash device - flusher thread flush-8:16
> >> >>
> >> >> As Process A is a heavy dirtier, dirty memory becomes more
> >> >> than dirty_background_thresh. Due to this, below check becomes
> >> >> true(checking global_page_state in over_bground_thresh)
> >> >> for all bdi devices(even for very small dirtied bdi - sdb):
> >> >>
> >> >> In this case, even small cached data on 'sdb' is forced to flush
> >> >> and writeback cache thrashing happens.
> >> >>
> >> >> When we added debug prints inside above 'if' condition and ran
> >> >> above Process A(heavy dirtier on bdi with flush-8:0) and
> >> >> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
> >> >> we got below prints:
> >> >>
> >> >> [Test setup: ARM dual core CPU, 512 MB RAM]
> >> >>
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
> >> >>
> >> >> As mentioned in above log, when global dirty memory > global
> >> >> background_thresh
> >> >> small cached data is also forced to flush by flush-8:16.
> >> >>
> >> >> If removing global background_thresh checking code, we can reduce
> >> >> cache
> >> >> thrashing of frequently used small data.
> >> >  It's not completely clear to me:
> >> >  Why is this a problem? Wearing of the flash? Power consumption? I'd
> >> > like
> >> >to understand this before changing the code...
> Hi Jan.
> Yes, it can reduce wearing and fragmentation of flash. And also from
> one scenario - we
> think it might reduce power consumption also.
> 
> >> >
> >> >> And It will be great if we can reserve a portion of writeback cache
> >> >> using
> >> >> min_ratio.
> >> >>
> >> >> After applying patch:
> >> >> $ echo 5 > /sys/block/sdb/bdi/min_ratio
> >> >> $ cat /sys/block/sdb/bdi/min_ratio
> >> >> 5
> >> >>
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
> >> >>
> >> >> As mentioned in the above logs, once cache is reserved for Process B,
> >> >> and patch is applied there is less writeback cache thrashing on sdb
> >> >> by frequent forced writeback by flush-8:16 in over_bground_thresh.
> >> >>
> >> >> After all, small cached data will be flushed by periodic writeback
> >> >> once every dirty_writeback_interval.
> >> >  OK, in principle something like this makes sence to me. But if there
> >> > are
> >> >more BDIs which are roughly equally used, it could happen none of them
> >> > are
> >> >over threshold due to percpu counter & rounding errors. So I'd rather
> >> >change the conditions to something like:
> >> >	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> >> >	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);
> >> >
> >> >  	if (reclaimable > bdi_bground_thresh)
> >> >		return true;
> >> >	/*
> >> >	 * If global background limit is exceeded, kick the writeback on
> >> >	 * BDI if there's a reasonable amount of data to write (at least
> >> >	 * 1/2 of BDI's background dirty limit).
> >> >	 */
> >> >	if (global_page_state(NR_FILE_DIRTY) +
> >> >	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
> >> >	    reclaimable * 2 > bdi_bground_thresh)
> >> >		return true;
> >> >
> >>
> >> Hi Jan,
> >>
> >> If there are enough BDIs and percpu counter of each bdi roughly equally
> >> used less than 1/2 of BDI's background dirty limit, still nothing will
> >> be flushed even if over global background_thresh.
> >   Yes, although then the percpu counter error would have to be quite big.
> > Anyway, we can change the last condition to:
> >      if (global_page_state(NR_FILE_DIRTY) +
> >          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
> >          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
> >
> >   That should be safe and for machines with resonable number of CPUs it
> > should save the wakeup as well.
> I agree and will send v2 patch as your suggestion.

Hi Namjae,

Why use bdi_stat_error here? What's the meaning of its comment "maximal
error of a stat counter"?

> 
> Thanks Jan.
> >
> > 								Honza
> >
> >> >> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> >> >> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> >> >> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
> >> >> Cc: Fengguang Wu <fengguang.wu@intel.com>
> >> >> Cc: Jan Kara <jack@suse.cz>
> >> >> Cc: Dave Chinner <dchinner@redhat.com>
> >> >> ---
> >> >>  fs/fs-writeback.c |    4 ----
> >> >>  1 file changed, 4 deletions(-)
> >> >>
> >> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> >> index 310972b..070b773 100644
> >> >> --- a/fs/fs-writeback.c
> >> >> +++ b/fs/fs-writeback.c
> >> >> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct
> >> >> backing_dev_info *bdi)
> >> >>
> >> >>  	global_dirty_limits(&background_thresh, &dirty_thresh);
> >> >>
> >> >> -	if (global_page_state(NR_FILE_DIRTY) +
> >> >> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
> >> >> -		return true;
> >> >> -
> >> >>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
> >> >>  				bdi_dirty_limit(bdi, background_thresh))
> >> >>  		return true;
> >> >> --
> >> >> 1.7.9.5
> >> >>
> >> >--
> >> >Jan Kara <jack@suse.cz>
> >> >SUSE Labs, CR
> >> >
> >> >--
> >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> >the body to majordomo@kvack.org.  For more info on Linux MM,
> >> >see: http://www.linux-mm.org/ .
> >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>
> > --
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-04  0:59         ` Simon Jeons
@ 2013-01-04  7:41           ` Namjae Jeon
  2013-01-05  0:46             ` Simon Jeons
  0 siblings, 1 reply; 18+ messages in thread
From: Namjae Jeon @ 2013-01-04  7:41 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Jan Kara, Wanpeng Li, fengguang.wu, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

2013/1/4, Simon Jeons <simon.jeons@gmail.com>:
> On Thu, 2013-01-03 at 13:35 +0900, Namjae Jeon wrote:
>> 2013/1/2, Jan Kara <jack@suse.cz>:
>> > On Tue 01-01-13 08:51:04, Wanpeng Li wrote:
>> >> On Mon, Dec 31, 2012 at 12:30:54PM +0100, Jan Kara wrote:
>> >> >On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
>> >> >> From: Namjae Jeon <namjae.jeon@samsung.com>
>> >> >>
>> >> >> Consider Process A: huge I/O on sda
>> >> >>         doing heavy write operation - dirty memory becomes more
>> >> >>         than dirty_background_ratio
>> >> >>         on HDD - flusher thread flush-8:0
>> >> >>
>> >> >> Consider Process B: small I/O on sdb
>> >> >>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
>> >> >>         on Flash device - flusher thread flush-8:16
>> >> >>
>> >> >> As Process A is a heavy dirtier, dirty memory becomes more
>> >> >> than dirty_background_thresh. Due to this, below check becomes
>> >> >> true(checking global_page_state in over_bground_thresh)
>> >> >> for all bdi devices(even for very small dirtied bdi - sdb):
>> >> >>
>> >> >> In this case, even small cached data on 'sdb' is forced to flush
>> >> >> and writeback cache thrashing happens.
>> >> >>
>> >> >> When we added debug prints inside above 'if' condition and ran
>> >> >> above Process A(heavy dirtier on bdi with flush-8:0) and
>> >> >> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
>> >> >> we got below prints:
>> >> >>
>> >> >> [Test setup: ARM dual core CPU, 512 MB RAM]
>> >> >>
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
>> >> >>
>> >> >> As mentioned in above log, when global dirty memory > global
>> >> >> background_thresh
>> >> >> small cached data is also forced to flush by flush-8:16.
>> >> >>
>> >> >> If removing global background_thresh checking code, we can reduce
>> >> >> cache
>> >> >> thrashing of frequently used small data.
>> >> >  It's not completely clear to me:
>> >> >  Why is this a problem? Wearing of the flash? Power consumption? I'd
>> >> > like
>> >> >to understand this before changing the code...
>> Hi Jan.
>> Yes, it can reduce wearing and fragmentation of flash. And also from
>> one scenario - we
>> think it might reduce power consumption also.
>>
>> >> >
>> >> >> And It will be great if we can reserve a portion of writeback cache
>> >> >> using
>> >> >> min_ratio.
>> >> >>
>> >> >> After applying patch:
>> >> >> $ echo 5 > /sys/block/sdb/bdi/min_ratio
>> >> >> $ cat /sys/block/sdb/bdi/min_ratio
>> >> >> 5
>> >> >>
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
>> >> >>
>> >> >> As mentioned in the above logs, once cache is reserved for Process
>> >> >> B,
>> >> >> and patch is applied there is less writeback cache thrashing on sdb
>> >> >> by frequent forced writeback by flush-8:16 in over_bground_thresh.
>> >> >>
>> >> >> After all, small cached data will be flushed by periodic writeback
>> >> >> once every dirty_writeback_interval.
>> >> >  OK, in principle something like this makes sence to me. But if
>> >> > there
>> >> > are
>> >> >more BDIs which are roughly equally used, it could happen none of
>> >> > them
>> >> > are
>> >> >over threshold due to percpu counter & rounding errors. So I'd rather
>> >> >change the conditions to something like:
>> >> >	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
>> >> >	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);
>> >> >
>> >> >  	if (reclaimable > bdi_bground_thresh)
>> >> >		return true;
>> >> >	/*
>> >> >	 * If global background limit is exceeded, kick the writeback on
>> >> >	 * BDI if there's a reasonable amount of data to write (at least
>> >> >	 * 1/2 of BDI's background dirty limit).
>> >> >	 */
>> >> >	if (global_page_state(NR_FILE_DIRTY) +
>> >> >	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>> >> >	    reclaimable * 2 > bdi_bground_thresh)
>> >> >		return true;
>> >> >
>> >>
>> >> Hi Jan,
>> >>
>> >> If there are enough BDIs and percpu counter of each bdi roughly
>> >> equally
>> >> used less than 1/2 of BDI's background dirty limit, still nothing will
>> >> be flushed even if over global background_thresh.
>> >   Yes, although then the percpu counter error would have to be quite
>> > big.
>> > Anyway, we can change the last condition to:
>> >      if (global_page_state(NR_FILE_DIRTY) +
>> >          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>> >          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
>> >
>> >   That should be safe and for machines with resonable number of CPUs it
>> > should save the wakeup as well.
>> I agree and will send v2 patch as your suggestion.
>
> Hi Namjae,
>
> Why use bdi_stat_error here? What's the meaning of its comment "maximal
> error of a stat counter"?
Hi Simon,

As you know, bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
percpu counters.
When these percpu counters are incremented/decremented simultaneously
on multiple CPUs by small amounts (each CPU's local count staying below
the BDI_STAT_BATCH threshold), a read of the counter can return an
approximate value rather than the exact one.
bdi_stat_error is used to account for this: it is the maximum error
that can occur in percpu bdi stats accounting.

bdi_stat(bdi, BDI_RECLAIMABLE);
 -> This gives an approximate value of BDI_RECLAIMABLE by reading the
previously stored value of the percpu count.

bdi_stat_sum(bdi, BDI_RECLAIMABLE);
 -> This gives the exact value of BDI_RECLAIMABLE. It takes a lock
and adds up the current percpu counts of the individual CPUs.
   It is expensive, so calling it frequently is not recommended. It is
better to use bdi_stat and work with the approximate value of the bdi
stats.

Thanks.
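
To make the difference between the approximate and the exact read
concrete, here is a small userspace model of a batched per-CPU counter.
It is a sketch, not the kernel's percpu_counter code: all names ending
in _sketch are hypothetical, locking is left out, and the CPU count (4)
and batch size (8) are fixed values assumed for illustration.

```c
#include <assert.h>

#define NR_CPUS_SKETCH    4  /* assumed number of CPUs */
#define STAT_BATCH_SKETCH 8  /* assumed per-CPU batch threshold */

/* Simplified model: one shared count plus one local delta per CPU. */
struct pcpu_counter_sketch {
	long count;
	long local[NR_CPUS_SKETCH];
};

/* Add on one CPU; fold into the shared count once the batch is reached. */
static void counter_add_sketch(struct pcpu_counter_sketch *c, int cpu, long amount)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

	long v = c->local[cpu] + amount;
	if (v >= STAT_BATCH_SKETCH || v <= -STAT_BATCH_SKETCH) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap, approximate read (the role bdi_stat plays): shared count only. */
static long counter_read_sketch(const struct pcpu_counter_sketch *c)
{
	return c->count;
}

/* Expensive, exact read (the role bdi_stat_sum plays): add every local delta. */
static long counter_sum_sketch(const struct pcpu_counter_sketch *c)
{
	long sum = c->count;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Upper bound on |exact - approximate| (the role bdi_stat_error plays). */
static long counter_max_error_sketch(void)
{
	return (long)NR_CPUS_SKETCH * STAT_BATCH_SKETCH;
}
```

In this model, adding 5 on each of the 4 CPUs leaves the cheap read at 0
while the exact sum is 20, which is within the bound of 4 * 8 = 32.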
>
>>
>> Thanks Jan.
>> >
>> > 								Honza
>> >
>> >> >> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>> >> >> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
>> >> >> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
>> >> >> Cc: Fengguang Wu <fengguang.wu@intel.com>
>> >> >> Cc: Jan Kara <jack@suse.cz>
>> >> >> Cc: Dave Chinner <dchinner@redhat.com>
>> >> >> ---
>> >> >>  fs/fs-writeback.c |    4 ----
>> >> >>  1 file changed, 4 deletions(-)
>> >> >>
>> >> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> >> >> index 310972b..070b773 100644
>> >> >> --- a/fs/fs-writeback.c
>> >> >> +++ b/fs/fs-writeback.c
>> >> >> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct
>> >> >> backing_dev_info *bdi)
>> >> >>
>> >> >>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>> >> >>
>> >> >> -	if (global_page_state(NR_FILE_DIRTY) +
>> >> >> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
>> >> >> -		return true;
>> >> >> -
>> >> >>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
>> >> >>  				bdi_dirty_limit(bdi, background_thresh))
>> >> >>  		return true;
>> >> >> --
>> >> >> 1.7.9.5
>> >> >>
>> >> >--
>> >> >Jan Kara <jack@suse.cz>
>> >> >SUSE Labs, CR
>> >> >
>> >> >--
>> >> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> >the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> >see: http://www.linux-mm.org/ .
>> >> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>
>> > --
>> > Jan Kara <jack@suse.cz>
>> > SUSE Labs, CR
>> >
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-04  7:41           ` Namjae Jeon
@ 2013-01-05  0:46             ` Simon Jeons
  2013-01-05  3:26               ` Fengguang Wu
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Jeons @ 2013-01-05  0:46 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Jan Kara, Wanpeng Li, fengguang.wu, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Fri, 2013-01-04 at 16:41 +0900, Namjae Jeon wrote:
> 2013/1/4, Simon Jeons <simon.jeons@gmail.com>:
> > On Thu, 2013-01-03 at 13:35 +0900, Namjae Jeon wrote:
> >> 2013/1/2, Jan Kara <jack@suse.cz>:
> >> > On Tue 01-01-13 08:51:04, Wanpeng Li wrote:
> >> >> On Mon, Dec 31, 2012 at 12:30:54PM +0100, Jan Kara wrote:
> >> >> >On Sun 30-12-12 14:59:50, Namjae Jeon wrote:
> >> >> >> From: Namjae Jeon <namjae.jeon@samsung.com>
> >> >> >>
> >> >> >> Consider Process A: huge I/O on sda
> >> >> >>         doing heavy write operation - dirty memory becomes more
> >> >> >>         than dirty_background_ratio
> >> >> >>         on HDD - flusher thread flush-8:0
> >> >> >>
> >> >> >> Consider Process B: small I/O on sdb
> >> >> >>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
> >> >> >>         on Flash device - flusher thread flush-8:16
> >> >> >>
> >> >> >> As Process A is a heavy dirtier, dirty memory becomes more
> >> >> >> than dirty_background_thresh. Due to this, below check becomes
> >> >> >> true(checking global_page_state in over_bground_thresh)
> >> >> >> for all bdi devices(even for very small dirtied bdi - sdb):
> >> >> >>
> >> >> >> In this case, even small cached data on 'sdb' is forced to flush
> >> >> >> and writeback cache thrashing happens.
> >> >> >>
> >> >> >> When we added debug prints inside above 'if' condition and ran
> >> >> >> above Process A(heavy dirtier on bdi with flush-8:0) and
> >> >> >> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
> >> >> >> we got below prints:
> >> >> >>
> >> >> >> [Test setup: ARM dual core CPU, 512 MB RAM]
> >> >> >>
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
> >> >> >>
> >> >> >> As mentioned in above log, when global dirty memory > global background_thresh
> >> >> >> small cached data is also forced to flush by flush-8:16.
> >> >> >>
> >> >> >> If removing global background_thresh checking code, we can reduce cache
> >> >> >> thrashing of frequently used small data.
> >> >> >  It's not completely clear to me:
> >> >> >  Why is this a problem? Wearing of the flash? Power consumption? I'd like
> >> >> >to understand this before changing the code...
> >> Hi Jan.
> >> Yes, it can reduce wear and fragmentation of the flash. In one
> >> scenario we think it might also reduce power consumption.
> >>
> >> >> >
> >> >> >> And It will be great if we can reserve a portion of writeback cache using
> >> >> >> min_ratio.
> >> >> >>
> >> >> >> After applying patch:
> >> >> >> $ echo 5 > /sys/block/sdb/bdi/min_ratio
> >> >> >> $ cat /sys/block/sdb/bdi/min_ratio
> >> >> >> 5
> >> >> >>
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> >> >> >> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
> >> >> >>
> >> >> >> As mentioned in the above logs, once cache is reserved for Process B,
> >> >> >> and patch is applied there is less writeback cache thrashing on sdb
> >> >> >> by frequent forced writeback by flush-8:16 in over_bground_thresh.
> >> >> >>
> >> >> >> After all, small cached data will be flushed by periodic writeback
> >> >> >> once every dirty_writeback_interval.
> >> >> >  OK, in principle something like this makes sense to me. But if there are
> >> >> >more BDIs which are roughly equally used, it could happen that none of them
> >> >> >are over threshold due to percpu counter & rounding errors. So I'd rather
> >> >> >change the conditions to something like:
> >> >> >	reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> >> >> >	bdi_bground_thresh = bdi_dirty_limit(bdi, background_thresh);
> >> >> >
> >> >> >  	if (reclaimable > bdi_bground_thresh)
> >> >> >		return true;
> >> >> >	/*
> >> >> >	 * If global background limit is exceeded, kick the writeback on
> >> >> >	 * BDI if there's a reasonable amount of data to write (at least
> >> >> >	 * 1/2 of BDI's background dirty limit).
> >> >> >	 */
> >> >> >	if (global_page_state(NR_FILE_DIRTY) +
> >> >> >	    global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
> >> >> >	    reclaimable * 2 > bdi_bground_thresh)
> >> >> >		return true;
> >> >> >
> >> >>
> >> >> Hi Jan,
> >> >>
> >> >> If there are enough BDIs, roughly equally used, and the percpu counter
> >> >> of each bdi reads less than 1/2 of that BDI's background dirty limit,
> >> >> still nothing will be flushed even if the global background_thresh is
> >> >> exceeded.
> >> >   Yes, although then the percpu counter error would have to be quite big.
> >> > Anyway, we can change the last condition to:
> >> >      if (global_page_state(NR_FILE_DIRTY) +
> >> >          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
> >> >          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
> >> >
> >> >   That should be safe and for machines with a reasonable number of CPUs it
> >> > should save the wakeup as well.
> >> I agree and will send a v2 patch following your suggestion.
> >
> > Hi Namjae,
> >
> > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > error of a stat counter"?
> Hi Simon,
> 
> As you know, bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> percpu counters.
> When these percpu counters are incremented/decremented simultaneously
> on multiple CPUs by small amounts (each CPU's local count staying below
> the threshold BDI_STAT_BATCH), reading them may return an approximate
> value rather than the exact one.
> To handle this percpu counter error we have used bdi_stat_error.
> bdi_stat_error is the maximum error which can occur in percpu bdi
> stats accounting.
> 
> bdi_stat(bdi, BDI_RECLAIMABLE);
>  -> This gives an approximate value of BDI_RECLAIMABLE by reading the
> previous value of the percpu count.
> 
> bdi_stat_sum(bdi, BDI_RECLAIMABLE);
>  -> This gives the exact value of BDI_RECLAIMABLE. It takes a lock
> and adds the current percpu counts of the individual CPUs.
>    It is not recommended to use it frequently as it is expensive. It is
> better to use “bdi_stat” and work with the approximate value of bdi stats.
> 

Hi Namjae, thanks for the clarification.

But why is the stat error compared against bdi_bground_thresh? What's
the relationship between them? I also see bdi_stat_error compared with
bdi_thresh/bdi_dirty in the function balance_dirty_pages.


> Thanks.
> >
> >>
> >> Thanks Jan.
> >> >
> >> > 								Honza
> >> >
> >> >> >> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> >> >> >> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> >> >> >> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
> >> >> >> Cc: Fengguang Wu <fengguang.wu@intel.com>
> >> >> >> Cc: Jan Kara <jack@suse.cz>
> >> >> >> Cc: Dave Chinner <dchinner@redhat.com>
> >> >> >> ---
> >> >> >>  fs/fs-writeback.c |    4 ----
> >> >> >>  1 file changed, 4 deletions(-)
> >> >> >>
> >> >> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> >> >> index 310972b..070b773 100644
> >> >> >> --- a/fs/fs-writeback.c
> >> >> >> +++ b/fs/fs-writeback.c
> >> >> >> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
> >> >> >>
> >> >> >>  	global_dirty_limits(&background_thresh, &dirty_thresh);
> >> >> >>
> >> >> >> -	if (global_page_state(NR_FILE_DIRTY) +
> >> >> >> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
> >> >> >> -		return true;
> >> >> >> -
> >> >> >>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
> >> >> >>  				bdi_dirty_limit(bdi, background_thresh))
> >> >> >>  		return true;
> >> >> >> --
> >> >> >> 1.7.9.5
> >> >> >>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2012-12-30  5:59 [PATCH] writeback: fix writeback cache thrashing Namjae Jeon
  2012-12-31 11:30 ` Jan Kara
@ 2013-01-05  3:18 ` Fengguang Wu
  2013-01-09  8:26   ` Namjae Jeon
  1 sibling, 1 reply; 18+ messages in thread
From: Fengguang Wu @ 2013-01-05  3:18 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: linux-fsdevel, linux-mm, linux-kernel, liwanp, Namjae Jeon,
	Vivek Trivedi, Jan Kara, Dave Chinner, Simon Jeons

Hi Namjae,

On Sun, Dec 30, 2012 at 02:59:50PM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
> 
> Consider Process A: huge I/O on sda
>         doing heavy write operation - dirty memory becomes more
>         than dirty_background_ratio
>         on HDD - flusher thread flush-8:0
> 
> Consider Process B: small I/O on sdb
>         doing while [1]; read 1024K + rewrite 1024K + sleep 2sec
>         on Flash device - flusher thread flush-8:16
> 
> As Process A is a heavy dirtier, dirty memory becomes more
> than dirty_background_thresh. Due to this, below check becomes
> true(checking global_page_state in over_bground_thresh)
> for all bdi devices(even for very small dirtied bdi - sdb):
> 
> In this case, even small cached data on 'sdb' is forced to flush
> and writeback cache thrashing happens.
> 
> When we added debug prints inside above 'if' condition and ran
> above Process A(heavy dirtier on bdi with flush-8:0) and
> Process B(1024K frequent read/rewrite on bdi with flush-8:16)
> we got below prints:
> 
> [Test setup: ARM dual core CPU, 512 MB RAM]
> 
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84720 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 94720 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   384 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   960 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92160 KB

> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   768 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   256 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   320 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB

Yeah, that IO pattern is not good. Perhaps it's 6 small IOs in /one/
second?  However that's not quite in line with "sleep 2sec" in your
workload description. Note that I assume flush-8:0 works on a hard
disk, so each flush-8:0 line indicates that roughly a 1 second interval
has elapsed. It would be much clearer if the printk timestamps were
turned on (CONFIG_PRINTK_TIME=y).

> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
> 
> As mentioned in above log, when global dirty memory > global background_thresh
> small cached data is also forced to flush by flush-8:16.
> 
> If removing global background_thresh checking code, we can reduce cache
> thrashing of frequently used small data.
> And It will be great if we can reserve a portion of writeback cache using
> min_ratio.
 
> After applying patch:
> $ echo 5 > /sys/block/sdb/bdi/min_ratio
> $ cat /sys/block/sdb/bdi/min_ratio
> 5

The log below looks perfect. However, the min_ratio setup is a
problem. If possible, I'd like the final patch to work reasonably
well with min_ratio=0 (the system default), too.

> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
 
> As mentioned in the above logs, once cache is reserved for Process B,
> and patch is applied there is less writeback cache thrashing on sdb
> by frequent forced writeback by flush-8:16 in over_bground_thresh.
> 
> After all, small cached data will be flushed by periodic writeback
> once every dirty_writeback_interval.
> 
> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
> Cc: Fengguang Wu <fengguang.wu@intel.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/fs-writeback.c |    4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 310972b..070b773 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
>  
>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>  
> -	if (global_page_state(NR_FILE_DIRTY) +
> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
> -		return true;
> -

That global test should be kept in some form (see Jan's proposal).
Because the below per-bdi test can be inaccurate in various ways:

- bdi_stat() may have errors up to bdi_stat_error()

- bdi_dirty_limit() may be arbitrarily shifted by min_ratio etc.

- bdi_dirty_limit() may be totally wrong because the estimate in
  bdi_writeout_fraction() is still at its initial value of 0, or is
  still trying to catch up with sudden workload changes.

>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
>  				bdi_dirty_limit(bdi, background_thresh))
>  		return true;

I suspect that even with the global test removed as in your patch, the
above bdi test will still mostly return true for your described workload,
because bdi_dirty_limit() returns a value close to 0 when the writeout
fraction of sdb is close to 0.

You cleverly avoided this in your test by raising min_ratio to 5.
However, I'd suggest testing with min_ratio=0 and trying solutions that
can work well in such default configuration.
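
To make the point about bdi_dirty_limit() concrete, here is a rough
stand-alone model of how a per-bdi limit can be derived from the global
threshold. It is only a sketch: toy_bdi_dirty_limit() and all of its
parameters are hypothetical names, the formula is a simplified
assumption about the shape of the kernel calculation rather than a copy
of it, and the page counts are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (assumption, not kernel code): the part of the limit that is
 * not reserved by min_ratio settings is shared out by writeout fraction,
 * then this device's own min_ratio reservation is added, and the result
 * is capped by max_ratio. All ratios are percentages.
 */
static uint64_t toy_bdi_dirty_limit(uint64_t dirty,
				    uint64_t numerator, uint64_t denominator,
				    unsigned int min_ratio,
				    unsigned int sum_min_ratio,
				    unsigned int max_ratio)
{
	uint64_t limit, cap;

	assert(denominator != 0);
	assert(sum_min_ratio <= 100);

	/* Share of the unreserved limit, proportional to recent writeout. */
	limit = dirty * (100 - sum_min_ratio) / 100;
	limit = limit * numerator / denominator;

	/* Reserved share for this device. */
	limit += dirty * min_ratio / 100;

	cap = dirty * max_ratio / 100;
	return limit < cap ? limit : cap;
}
```

With a global threshold of 10000 pages and a writeout fraction of
1/1000, the model gives a limit of 10 pages with min_ratio=0 and 509
pages with min_ratio=5, so with the default setting almost any dirty
data on the small device exceeds its limit.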

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-05  0:46             ` Simon Jeons
@ 2013-01-05  3:26               ` Fengguang Wu
  2013-01-05  5:26                 ` Simon Jeons
  0 siblings, 1 reply; 18+ messages in thread
From: Fengguang Wu @ 2013-01-05  3:26 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Namjae Jeon, Jan Kara, Wanpeng Li, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

> > > Hi Namjae,
> > >
> > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > error of a stat counter"?
> > Hi Simon,
> > 
> > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > percpu counters.
> > When these percpu counters are incremented/decremented simultaneously
> > on multiple CPUs by small amount (individual cpu counter less than
> > threshold BDI_STAT_BATCH),
> > it is possible that we get approximate value (not exact value) of
> > these percpu counters.
> > In order, to handle these percpu counter error we have used
> > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > in percpu bdi stats accounting.
> > 
> > bdi_stat(bdi, BDI_RECLAIMABLE);
> >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > previous value of percpu count.
> > 
> > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > and add current percpu count of individual CPUs.
> >    It is not recommended to use it frequently as it is expensive. We
> > can better use “bdi_stat” and work with approx value of bdi stats.
> > 
> 
> Hi Namjae, thanks for your clarify.
> 
> But why compare error stat count to bdi_bground_thresh? What's the

It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
in concept, comparing bdi_stat (with error bound adjustments) to
bdi_bground_thresh.
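
As a stand-alone illustration of the condition Jan proposed, the check
can be written as a pure function of page counts. This is a sketch
only: toy_over_bground_thresh() and its parameter names are
hypothetical, and in the kernel the inputs would come from
global_page_state(), bdi_stat(), bdi_dirty_limit() and
bdi_stat_error().

```c
#include <stdbool.h>

/* Hypothetical user-space rendering of the proposed check. */
static bool toy_over_bground_thresh(unsigned long global_dirty,
				    unsigned long background_thresh,
				    unsigned long reclaimable,
				    unsigned long bdi_bground_thresh,
				    unsigned long stat_error)
{
	/* This BDI alone is over its share of the background limit. */
	if (reclaimable > bdi_bground_thresh)
		return true;

	/*
	 * Globally over the limit: kick this BDI only if it may hold at
	 * least half of its own background limit. The approximate count
	 * is widened by the maximal counter error, so the test errs on
	 * the side of flushing.
	 */
	if (global_dirty > background_thresh &&
	    reclaimable * 2 + stat_error * 2 > bdi_bground_thresh)
		return true;

	return false;
}
```

With made-up numbers (background_thresh = 20000, bdi_bground_thresh =
1000, stat_error = 64), a BDI holding 16 dirty pages is left alone even
when the global count is 30000, while one holding 450 pages is flushed.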

> relationship between them? I also see bdi_stat_error compare to
> bdi_thresh/bdi_dirty in function balance_dirty_pages. 

Here, it's trying to use bdi_stat_sum(), the accurate (however more
costly) version of bdi_stat(), if the error would possibly be large:

                if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
                        bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
                        //...
                } else {
                        bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
                        //...
                }
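
One way to read that test: the counter error is bounded by a fixed
number of pages, so the smaller bdi_thresh is, the larger the error is
relative to it. A minimal sketch of that arithmetic follows; the helper
names and the example values are assumptions for illustration, not
kernel code.

```c
#include <stdbool.h>

/* True when the threshold is small enough for the error to dominate it. */
static bool toy_need_exact_read(unsigned long bdi_thresh,
				unsigned long stat_error)
{
	return bdi_thresh < 2 * stat_error;
}

/* Worst-case error as a percentage of the threshold (bdi_thresh > 0). */
static unsigned long toy_error_percent(unsigned long bdi_thresh,
				       unsigned long stat_error)
{
	return stat_error * 100 / bdi_thresh;
}
```

Assuming a maximal error of 16 pages, a threshold of 20 pages could be
off by 80%, which is where the exact sum is worth its cost, while a
threshold of 10000 pages is off by well under 1% and the cheap
approximate read is good enough.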

Here the comment should have explained it well:

                 * In theory 1 page is enough to keep the comsumer-producer
                 * pipe going: the flusher cleans 1 page => the task dirties 1
                 * more page. However bdi_dirty has accounting errors.  So use
                 * the larger and more IO friendly bdi_stat_error.
                 */
                if (bdi_dirty <= bdi_stat_error(bdi))
                        break;


Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-05  3:26               ` Fengguang Wu
@ 2013-01-05  5:26                 ` Simon Jeons
  2013-01-05  7:38                   ` Fengguang Wu
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Jeons @ 2013-01-05  5:26 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Namjae Jeon, Jan Kara, Wanpeng Li, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > Hi Namjae,
> > > >
> > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > error of a stat counter"?
> > > Hi Simon,
> > > 
> > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > percpu counters.
> > > When these percpu counters are incremented/decremented simultaneously
> > > on multiple CPUs by small amount (individual cpu counter less than
> > > threshold BDI_STAT_BATCH),
> > > it is possible that we get approximate value (not exact value) of
> > > these percpu counters.
> > > In order, to handle these percpu counter error we have used
> > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > in percpu bdi stats accounting.
> > > 
> > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > previous value of percpu count.
> > > 
> > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > and add current percpu count of individual CPUs.
> > >    It is not recommended to use it frequently as it is expensive. We
> > > can better use “bdi_stat” and work with approx value of bdi stats.
> > > 
> > 
> > Hi Namjae, thanks for your clarify.
> > 
> > But why compare error stat count to bdi_bground_thresh? What's the
> 
> It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> in concept, comparing bdi_stat (with error bound adjustments) to
> bdi_bground_thresh.
> 
> > relationship between them? I also see bdi_stat_error compare to
> > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> 

Hi Fengguang,

> Here, it's trying to use bdi_stat_sum(), the accurate (however more
> costly) version of bdi_stat(), if the error would possibly be large:

Why use bdi_stat_sum when the error is large, and bdi_stat when the error is small?

> 
>                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
>                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
>                         //...
>                 } else {
>                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
>                         //...
>                 }
> 
> Here the comment should have explained it well:
> 
>                  * In theory 1 page is enough to keep the comsumer-producer
>                  * pipe going: the flusher cleans 1 page => the task dirties 1
>                  * more page. However bdi_dirty has accounting errors.  So use

Why does bdi_dirty have accounting errors?

>                  * the larger and more IO friendly bdi_stat_error.
>                  */
>                 if (bdi_dirty <= bdi_stat_error(bdi))
>                         break;
> 
> 
> Thanks,
> Fengguang


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-05  5:26                 ` Simon Jeons
@ 2013-01-05  7:38                   ` Fengguang Wu
  2013-01-05  9:41                     ` Simon Jeons
  0 siblings, 1 reply; 18+ messages in thread
From: Fengguang Wu @ 2013-01-05  7:38 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Namjae Jeon, Jan Kara, Wanpeng Li, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > Hi Namjae,
> > > > >
> > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > error of a stat counter"?
> > > > Hi Simon,
> > > > 
> > > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > > percpu counters.
> > > > When these percpu counters are incremented/decremented simultaneously
> > > > on multiple CPUs by small amount (individual cpu counter less than
> > > > threshold BDI_STAT_BATCH),
> > > > it is possible that we get approximate value (not exact value) of
> > > > these percpu counters.
> > > > In order, to handle these percpu counter error we have used
> > > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > > in percpu bdi stats accounting.
> > > > 
> > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > > previous value of percpu count.
> > > > 
> > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > > and add current percpu count of individual CPUs.
> > > >    It is not recommended to use it frequently as it is expensive. We
> > > > can better use “bdi_stat” and work with approx value of bdi stats.
> > > > 
> > > 
> > > Hi Namjae, thanks for your clarify.
> > > 
> > > But why compare error stat count to bdi_bground_thresh? What's the
> > 
> > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > in concept, comparing bdi_stat (with error bound adjustments) to
> > bdi_bground_thresh.
> > 
> > > relationship between them? I also see bdi_stat_error compare to
> > > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> > 
> 
> Hi Fengguang,
> 
> > Here, it's trying to use bdi_stat_sum(), the accurate (however more
> > costly) version of bdi_stat(), if the error would possibly be large:
> 
> Why error is large use bdi_stat_sum and error is few use bdi_stat?

It's the opposite. Please check this per-cpu counter routine to get an idea:

/*
 * Add up all the per-cpu counts, return the result.  This is a more accurate
 * but much slower version of percpu_counter_read_positive()
 */                                                 
s64 __percpu_counter_sum(struct percpu_counter *fbc)

> > 
> >                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> >                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> >                         //...
> >                 } else {
> >                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> >                         //...
> >                 }
> > 
> > Here the comment should have explained it well:
> > 
> >                  * In theory 1 page is enough to keep the comsumer-producer
> >                  * pipe going: the flusher cleans 1 page => the task dirties 1
> >                  * more page. However bdi_dirty has accounting errors.  So use
> 
> Why bdi_dirty has accounting errors?

Because it typically uses bdi_stat() to get the rough sum of the per-cpu
counters.
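
To illustrate where the rough sum comes from, here is a toy
single-threaded model of a batched per-cpu counter. It is a sketch
under stated assumptions: struct toy_counter, the toy_* helpers, and
the NR_CPUS and BATCH values are made up for this example and only
mimic the roles of bdi_stat(), bdi_stat_sum() and bdi_stat_error();
the real percpu_counter code also deals with locking and preemption.

```c
#include <stdint.h>

#define NR_CPUS 4	/* assumption: toy CPU count */
#define BATCH   8	/* assumption: stands in for BDI_STAT_BATCH */

struct toy_counter {
	int64_t count;			/* shared value, cheap to read */
	int32_t local[NR_CPUS];		/* per-CPU deltas not yet folded in */
};

/* Add on one CPU; fold into the shared count once the delta hits BATCH. */
static void toy_add(struct toy_counter *c, int cpu, int32_t amount)
{
	int32_t v = c->local[cpu] + amount;

	if (v >= BATCH || v <= -BATCH) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap approximate read: ignores the unfolded per-CPU deltas. */
static int64_t toy_read(const struct toy_counter *c)
{
	return c->count;
}

/* Exact but slower read: walks every CPU's delta. */
static int64_t toy_sum(const struct toy_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Upper bound on the difference between toy_sum() and toy_read(). */
static int64_t toy_error(void)
{
	return (int64_t)NR_CPUS * BATCH;
}
```

If each of the 4 CPUs adds 7 pages, toy_read() still reports 0 while
toy_sum() reports 28, which stays within the bound of 32 returned by
toy_error().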
 
Thanks,
Fengguang

> >                  * the larger and more IO friendly bdi_stat_error.
> >                  */
> >                 if (bdi_dirty <= bdi_stat_error(bdi))
> >                         break;
> > 
> > 
> > Thanks,
> > Fengguang
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-05  7:38                   ` Fengguang Wu
@ 2013-01-05  9:41                     ` Simon Jeons
  2013-01-05  9:55                       ` Fengguang Wu
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Jeons @ 2013-01-05  9:41 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Namjae Jeon, Jan Kara, Wanpeng Li, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Sat, 2013-01-05 at 15:38 +0800, Fengguang Wu wrote:
> On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> > On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > > Hi Namjae,
> > > > > >
> > > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > > error of a stat counter"?
> > > > > Hi Simon,
> > > > > 
> > > > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > > > percpu counters.
> > > > > When these percpu counters are incremented/decremented simultaneously
> > > > > on multiple CPUs by small amount (individual cpu counter less than
> > > > > threshold BDI_STAT_BATCH),
> > > > > it is possible that we get approximate value (not exact value) of
> > > > > these percpu counters.
> > > > > In order, to handle these percpu counter error we have used
> > > > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > > > in percpu bdi stats accounting.
> > > > > 
> > > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > > > previous value of percpu count.
> > > > > 
> > > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > > > and add current percpu count of individual CPUs.
> > > > >    It is not recommended to use it frequently as it is expensive. We
> > > > > can better use “bdi_stat” and work with approx value of bdi stats.
> > > > > 
> > > > 
> > > > Hi Namjae, thanks for your clarify.
> > > > 
> > > > But why compare error stat count to bdi_bground_thresh? What's the
> > > 
> > > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > > in concept, comparing bdi_stat (with error bound adjustments) to
> > > bdi_bground_thresh.
> > > 
> > > > relationship between them? I also see bdi_stat_error compare to
> > > > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> > > 
> > 
> > Hi Fengguang,
> > 
> > > Here, it's trying to use bdi_stat_sum(), the accurate (however more
> > > costly) version of bdi_stat(), if the error would possibly be large:
> > 
> > Why error is large use bdi_stat_sum and error is few use bdi_stat?
> 

Thanks for your response Fengguang! :)

> It's the opposite. Please check this per-cpu counter routine to get an idea:
> 
> /*
>  * Add up all the per-cpu counts, return the result.  This is a more accurate
>  * but much slower version of percpu_counter_read_positive()
>  */                                                 
> s64 __percpu_counter_sum(struct percpu_counter *fbc)
> 
> > > 
> > >                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> > >                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 } else {
> > >                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 }
> > > 

The comment above this code says:

                 * In order to avoid the stacked BDI deadlock we need
                 * to ensure we accurately count the 'dirty' pages when
                 * the threshold is low.

Why does a low threshold mean the error is large?


> > > Here the comment should have explained it well:
> > > 
> > >                  * In theory 1 page is enough to keep the comsumer-producer
> > >                  * pipe going: the flusher cleans 1 page => the task dirties 1
> > >                  * more page. However bdi_dirty has accounting errors.  So use
> > 
> > Why bdi_dirty has accounting errors?
> 
> Because it typically uses bdi_stat() to get the rough sum of the per-cpu
> counters.
>  
> Thanks,
> Fengguang
> 
> > >                  * the larger and more IO friendly bdi_stat_error.
> > >                  */
> > >                 if (bdi_dirty <= bdi_stat_error(bdi))
> > >                         break;
> > > 
> > > 
> > > Thanks,
> > > Fengguang
> > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-05  9:41                     ` Simon Jeons
@ 2013-01-05  9:55                       ` Fengguang Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2013-01-05  9:55 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Namjae Jeon, Jan Kara, Wanpeng Li, linux-fsdevel, linux-mm,
	linux-kernel, Namjae Jeon, Vivek Trivedi, Dave Chinner

On Sat, Jan 05, 2013 at 03:41:54AM -0600, Simon Jeons wrote:
> On Sat, 2013-01-05 at 15:38 +0800, Fengguang Wu wrote:
> > On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> > > On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > > > Hi Namjae,
> > > > > > >
> > > > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > > > error of a stat counter"?
> > > > > > Hi Simon,
> > > > > > 
> > > > > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > > > > percpu counters.
> > > > > > When these percpu counters are incremented/decremented simultaneously
> > > > > > on multiple CPUs by small amount (individual cpu counter less than
> > > > > > threshold BDI_STAT_BATCH),
> > > > > > it is possible that we get approximate value (not exact value) of
> > > > > > these percpu counters.
> > > > > > In order, to handle these percpu counter error we have used
> > > > > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > > > > in percpu bdi stats accounting.
> > > > > > 
> > > > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > > > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > > > > previous value of percpu count.
> > > > > > 
> > > > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > > > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > > > > and add current percpu count of individual CPUs.
> > > > > >    It is not recommended to use it frequently as it is expensive. We
> > > > > > can better use “bdi_stat” and work with approx value of bdi stats.
> > > > > > 
> > > > > 
> > > > > Hi Namjae, thanks for your clarify.
> > > > > 
> > > > > But why compare error stat count to bdi_bground_thresh? What's the
> > > > 
> > > > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > > > in concept, comparing bdi_stat (with error bound adjustments) to
> > > > bdi_bground_thresh.
> > > > 
> > > > > relationship between them? I also see bdi_stat_error compare to
> > > > > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> > > > 
> > > 
> > > Hi Fengguang,
> > > 
> > > > Here, it's trying to use bdi_stat_sum(), the accurate (however more
> > > > costly) version of bdi_stat(), if the error would possibly be large:
> > > 
> > > Why error is large use bdi_stat_sum and error is few use bdi_stat?
> > 
> 
> Thanks for your response Fengguang! :)

You are welcome.

> > It's the opposite. Please check this per-cpu counter routine to get an idea:
> > 
> > /*
> >  * Add up all the per-cpu counts, return the result.  This is a more accurate
> >  * but much slower version of percpu_counter_read_positive()
> >  */                                                 
> > s64 __percpu_counter_sum(struct percpu_counter *fbc)
> > 
> > > > 
> > > >                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> > > >                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > >                         //...
> > > >                 } else {
> > > >                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > > >                         //...
> > > >                 }
> > > > 
> 
> The comment above this code says:
> 
>                  * In order to avoid the stacked BDI deadlock we need
>                  * to ensure we accurately count the 'dirty' pages when
>                  * the threshold is low.
> 
> Why does a low threshold mean the error is large?

Because bdi_reclaimable is normally less than or at least comparable
to bdi_thresh. So (bdi_thresh < 2 * bdi_stat_error(bdi)) means the
resulting bdi_reclaimable will be small, so small that the bdi_stat()
error is not negligible and should be avoided.
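
To make the approximate-versus-exact distinction concrete, here is a minimal
userspace sketch of a batched per-cpu counter. It is not the kernel's
percpu_counter code: the names model_counter, model_add, model_read,
model_sum and model_error, and the values NR_CPUS = 2 and BATCH = 8, are
assumptions chosen for illustration, and locking is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2   /* assumption: dual core, as in the reported test setup */
#define BATCH   8   /* hypothetical batch size, stands in for BDI_STAT_BATCH */

/* Simplified model of a per-cpu counter (no locking, no real per-cpu data). */
struct model_counter {
	int64_t count;           /* global value, updated only in batches */
	int32_t pcpu[NR_CPUS];   /* per-cpu deltas not yet folded into count */
};

/* Add 'amount' on 'cpu'; fold into count once the local delta reaches BATCH. */
static void model_add(struct model_counter *c, int cpu, int32_t amount)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	int32_t v = c->pcpu[cpu] + amount;

	if (v >= BATCH || v <= -BATCH) {
		c->count += v;
		c->pcpu[cpu] = 0;
	} else {
		c->pcpu[cpu] = v;
	}
}

/* Cheap read, in the spirit of bdi_stat(): ignores unfolded per-cpu deltas. */
static int64_t model_read(const struct model_counter *c)
{
	return c->count;
}

/* Exact read, in the spirit of bdi_stat_sum(): walks every cpu. */
static int64_t model_sum(const struct model_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->pcpu[cpu];
	return sum;
}

/* Upper bound on |model_sum() - model_read()|, like bdi_stat_error(). */
static int64_t model_error(void)
{
	return (int64_t)NR_CPUS * BATCH;
}
```

In this model, if each of the two CPUs has added 7 pages, model_read()
still reports 0 while model_sum() reports 14. When the threshold itself is
only a few multiples of model_error(), a gap of that size is comparable to
the value being tested, which is the case where the exact sum is worth its
cost.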

Thanks,
Fengguang

> 
> > > > Here the comment should have explained it well:
> > > > 
> > > >                  * In theory 1 page is enough to keep the comsumer-producer
> > > >                  * pipe going: the flusher cleans 1 page => the task dirties 1
> > > >                  * more page. However bdi_dirty has accounting errors.  So use
> > > 
> > > Why bdi_dirty has accounting errors?
> > 
> > Because it typically uses bdi_stat() to get the rough sum of the per-cpu
> > counters.
> >  
> > Thanks,
> > Fengguang
> > 
> > > >                  * the larger and more IO friendly bdi_stat_error.
> > > >                  */
> > > >                 if (bdi_dirty <= bdi_stat_error(bdi))
> > > >                         break;
> > > > 
> > > > 
> > > > Thanks,
> > > > Fengguang
> > > 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-05  3:18 ` Fengguang Wu
@ 2013-01-09  8:26   ` Namjae Jeon
  2013-01-09 15:13     ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Namjae Jeon @ 2013-01-09  8:26 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: linux-fsdevel, linux-mm, linux-kernel, liwanp, Namjae Jeon,
	Vivek Trivedi, Jan Kara, Dave Chinner, Simon Jeons

>
> Yeah, that IO pattern is not good. Perhaps it's 6 small IOs in /one/
> second?  However that's not quite in line with "sleep 2sec" in your
> workload description. Note that I assume flush-8:0 works on a hard
> disk, so each flush-8:0 line indicates roughly 1 second interval
> elapsed. It would be much more clear if the printk timestamps are
> turned on (CONFIG_PRINTK_TIME=y).

Okay, I enabled CONFIG_PRINTK_TIME in the kernel.
I made a small change to my workload and removed the 2 sec sleep:

Process A: huge Write on sda
Process B: doing while [1]; read 1024K + rewrite 1024K on sdb

Here sda: USB HDD with write speed ~ 30 MB/s
Here sdb: USB Flash with write speed ~ 5 MB/s

[Test setup: ARM dual core CPU, 512 MB RAM]

Please find the debug log with the original kernel below:

[  229.198121] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94592 KB,
bdi_dirty_limit =      55088 KB
[  232.289630] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        724 KB
[  232.301741] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        724 KB
[  232.311931] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        724 KB
[  232.401708] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88576 KB,
bdi_dirty_limit =      55168 KB
[  232.496078] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94848 KB,
bdi_dirty_limit =      54976 KB
[  232.511644] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1084 KB
[  232.525624] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1084 KB
[  232.554873] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1076 KB
[  233.495648] [1]:flush--8:16 : BDI_RECLAIMABLE =        512 KB,
bdi_dirty_limit =       1152 KB
[  233.503541] [1]:flush--8:16 : BDI_RECLAIMABLE =        576 KB,
bdi_dirty_limit =       1152 KB
[  233.514282] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1152 KB
[  233.537715] [1]:flush--8:16 : BDI_RECLAIMABLE =        128 KB,
bdi_dirty_limit =       1228 KB
[  233.553075] [1]:flush--8:16 : BDI_RECLAIMABLE =        448 KB,
bdi_dirty_limit =       1228 KB
[  233.562214] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1228 KB
[  235.892394] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      82112 KB,
bdi_dirty_limit =      54848 KB
[  238.585652] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        728 KB
[  238.597671] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        728 KB
[  238.612104] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        728 KB
[  238.738163] [1]:flush--8:16 : BDI_RECLAIMABLE =        576 KB,
bdi_dirty_limit =        892 KB
[  238.747117] [1]:flush--8:16 : BDI_RECLAIMABLE =        640 KB,
bdi_dirty_limit =        888 KB
[  238.756542] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        924 KB
[  238.817905] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      91136 KB,
bdi_dirty_limit =      54972 KB
[  238.826022] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      91712 KB,
bdi_dirty_limit =      55016 KB
[  239.726429] [1]:flush--8:16 : BDI_RECLAIMABLE =        448 KB,
bdi_dirty_limit =       1024 KB
[  239.734379] [1]:flush--8:16 : BDI_RECLAIMABLE =        448 KB,
bdi_dirty_limit =       1024 KB
[  239.744833] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1024 KB
[  239.928073] [1]:flush--8:16 : BDI_RECLAIMABLE =        896 KB,
bdi_dirty_limit =       1240 KB
[  239.936026] [1]:flush--8:16 : BDI_RECLAIMABLE =        896 KB,
bdi_dirty_limit =       1240 KB
[  239.946683] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1240 KB
[  242.214657] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      91904 KB,
bdi_dirty_limit =      54816 KB
[  244.666688] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        820 KB
[  244.678468] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        820 KB
[  244.703922] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        820 KB
[  245.319828] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      93056 KB,
bdi_dirty_limit =      55080 KB
[  245.327903] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      93056 KB,
bdi_dirty_limit =      55080 KB
[  248.356755] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      93184 KB,
bdi_dirty_limit =      55484 KB
[  249.753702] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        480 KB
[  249.771723] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        480 KB
[  249.791753] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        476 KB
[  250.769776] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        736 KB
[  250.785677] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        736 KB
[  250.807895] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        732 KB
[  251.005127] [1]:flush--8:16 : BDI_RECLAIMABLE =        512 KB,
bdi_dirty_limit =       1036 KB
[  251.013080] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1036 KB
[  251.024465] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1036 KB
[  251.616792] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90112 KB,
bdi_dirty_limit =      55024 KB
[  251.624734] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90112 KB,
bdi_dirty_limit =      55024 KB
[  252.012840] [1]:flush--8:16 : BDI_RECLAIMABLE =        512 KB,
bdi_dirty_limit =       1168 KB
[  252.029653] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1164 KB
[  252.048298] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1160 KB
[  252.261246] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1400 KB
[  252.269284] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1400 KB
[  252.281098] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1396 KB
[  253.166740] [1]:flush--8:16 : BDI_RECLAIMABLE =        512 KB,
bdi_dirty_limit =       1332 KB
[  253.174682] [1]:flush--8:16 : BDI_RECLAIMABLE =        512 KB,
bdi_dirty_limit =       1332 KB
[  253.184909] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =       1364 KB
[  254.916909] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90240 KB,
bdi_dirty_limit =      54776 KB
[  258.174616] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      77056 KB,
bdi_dirty_limit =      55244 KB
[  258.361648] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        752 KB
[  258.373363] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        752 KB
[  258.396216] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        748 KB
[  259.289888] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        820 KB
[  259.302663] [1]:flush--8:16 : BDI_RECLAIMABLE =        128 KB,
bdi_dirty_limit =        820 KB
[  259.315969] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        864 KB
[  261.029994] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94656 KB,
bdi_dirty_limit =      55176 KB
[  261.087820] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94656 KB,
bdi_dirty_limit =      55180 KB
[  264.177467] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90688 KB,
bdi_dirty_limit =      55448 KB
[  264.345671] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        496 KB
[  264.360635] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        496 KB
[  264.382961] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        492 KB
[  264.421008] [1]:flush--8:16 : BDI_RECLAIMABLE =        192 KB,
bdi_dirty_limit =        532 KB
[  264.429271] [1]:flush--8:16 : BDI_RECLAIMABLE =        192 KB,
bdi_dirty_limit =        532 KB
[  264.440572] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        572 KB
[  265.271753] [1]:flush--8:16 : BDI_RECLAIMABLE =        128 KB,
bdi_dirty_limit =        540 KB
[  265.279611] [1]:flush--8:16 : BDI_RECLAIMABLE =        128 KB,
bdi_dirty_limit =        540 KB
[  265.290591] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        576 KB
[  267.490909] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      87488 KB,
bdi_dirty_limit =      55364 KB
[  267.584972] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      92992 KB,
bdi_dirty_limit =      55388 KB
[  270.329631] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        424 KB
[  270.344304] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        424 KB
[  270.355331] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        424 KB
[  270.809216] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      92544 KB,
bdi_dirty_limit =      55536 KB
[  271.238783] [1]:flush--8:16 : BDI_RECLAIMABLE =        128 KB,
bdi_dirty_limit =        468 KB
[  271.246592] [1]:flush--8:16 : BDI_RECLAIMABLE =        192 KB,
bdi_dirty_limit =        468 KB
[  271.257163] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        496 KB
[  274.248876] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90304 KB,
bdi_dirty_limit =      55592 KB
[  276.377686] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        324 KB
[  276.389785] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        324 KB
[  276.416520] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        320 KB
[  276.482112] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        376 KB
[  276.494751] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        376 KB
[  276.508280] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        376 KB
[  277.550003] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      89856 KB,
bdi_dirty_limit =      55400 KB
[  277.558115] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      89856 KB,
bdi_dirty_limit =      55400 KB
[  280.750010] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        388 KB
[  280.761704] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        388 KB
[  280.788278] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        388 KB
[  280.827970] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        440 KB
[  280.835839] [1]:flush--8:16 : BDI_RECLAIMABLE =          0 KB,
bdi_dirty_limit =        440 KB
[  280.848928] [1]:flush--8:16 : BDI_RECLAIMABLE =         64 KB,
bdi_dirty_limit =        468 KB
[  280.889357] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      78016 KB,
bdi_dirty_limit =      55496 KB

As shown above, when global dirty memory exceeds the background dirty
threshold, over_bground_thresh mostly returns true even for small dirty
chunks such as 64K or 128K.
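
The behaviour in this log can be modelled with a small userspace sketch.
It only mirrors the two comparisons discussed in this thread; the struct
bdi_sample and both function names are hypothetical, all values are in KB,
and nothing here is the actual fs/fs-writeback.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device snapshot; all values in KB. */
struct bdi_sample {
	unsigned long reclaimable_kb;   /* BDI_RECLAIMABLE of this device */
	unsigned long bdi_limit_kb;     /* per-device background limit */
};

/* Model of the original check: the global test is evaluated first. */
static bool over_bground_thresh_orig(unsigned long global_dirty_kb,
				     unsigned long background_thresh_kb,
				     const struct bdi_sample *bdi)
{
	assert(bdi);

	if (global_dirty_kb > background_thresh_kb)
		return true;   /* fires for every device, however small */
	return bdi->reclaimable_kb > bdi->bdi_limit_kb;
}

/* Model of the check with the global test removed, as in the posted patch. */
static bool over_bground_thresh_patched(const struct bdi_sample *bdi)
{
	assert(bdi);

	return bdi->reclaimable_kb > bdi->bdi_limit_kb;
}
```

For a sample such as 64 KB reclaimable against a 724 KB per-device limit,
the first function returns true whenever the global dirty count is above
the global background threshold, while the second returns false.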


>
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92032 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 91968 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =  1024 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =    64 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   576 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 84352 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   192 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =   512 KB
>> [over_bground_thresh]: wakeup flush-8:16 : BDI_RECLAIMABLE =     0 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92608 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE = 92544 KB
>>
>> As mentioned in above log, when global dirty memory > global
>> background_thresh
>> small cached data is also forced to flush by flush-8:16.
>>
>> If removing global background_thresh checking code, we can reduce cache
>> thrashing of frequently used small data.
>> And It will be great if we can reserve a portion of writeback cache using
>> min_ratio.
>
>> After applying patch:
>> $ echo 5 > /sys/block/sdb/bdi/min_ratio
>> $ cat /sys/block/sdb/bdi/min_ratio
>> 5
>
> The below log looks all perfect. However the min_ratio setup is a
> problem. If possible, I'd like the final patch being able to work
> reasonably well with min_ratio=0 (the system default), too.
>
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56064 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  56704 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  84160 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  96960 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  94080 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93120 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  91520 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  89600 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  93696 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  72960 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90624 KB
>> [over_bground_thresh]: wakeup flush-8:0 : BDI_RECLAIMABLE =  90688 KB
>
>> As mentioned in the above logs, once cache is reserved for Process B,
>> and patch is applied there is less writeback cache thrashing on sdb
>> by frequent forced writeback by flush-8:16 in over_bground_thresh.
>>
>> After all, small cached data will be flushed by periodic writeback
>> once every dirty_writeback_interval.
>>
>> Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
>> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
>> Cc: Fengguang Wu <fengguang.wu@intel.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> ---
>>  fs/fs-writeback.c |    4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 310972b..070b773 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -756,10 +756,6 @@ static bool over_bground_thresh(struct
>> backing_dev_info *bdi)
>>
>>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>>
>> -	if (global_page_state(NR_FILE_DIRTY) +
>> -	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
>> -		return true;
>> -
>
> That global test should be kept in some form (see Jan's proposal).
> Because the below per-bdi test can be inaccurate in various ways:
>
> - bdi_stat() may have errors up to bdi_stat_error()
>
> - bdi_dirty_limit() may be arbitrarily shifted by min_ratio etc.
>
> - bdi_dirty_limit() may be totally wrong due to the estimation in
>   bdi_writeout_fraction() is in its initial value 0, or is still
>   trying to catch up with sudden workload changes.
>
>>  	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
>>  				bdi_dirty_limit(bdi, background_thresh))
>>  		return true;
>
> I suspect even removing the global test as in your patch, the above
> bdi test will still mostly return true for your described workload,
> due to bdi_dirty_limit() returning a value close to 0, because the
> writeout fraction of sdb is close to 0.
>
> You cleverly avoided this in your test by raising min_ratio to 5.
> However I'd suggest to test with min_ratio=0 and try solutions that
> can work well in such default configuration.
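
A rough sketch of why the per-device limit follows the writeout fraction,
and how min_ratio shifts it. This is a userspace model under stated
assumptions, not the kernel's bdi_dirty_limit(): the function name
model_bdi_dirty_limit and its parameters are hypothetical, and it assumes
the limit is the device's share of the unreserved part of the global
threshold, plus its own min_ratio reservation, capped at max_ratio.

```c
#include <assert.h>
#include <stdint.h>

/*
 * dirty:                 global threshold being split (any unit, e.g. KB)
 * numerator/denominator: this device's share of recent writeout
 * total_min_ratio:       sum of min_ratio over all devices, in percent
 * min_ratio, max_ratio:  this device's settings, in percent
 */
static uint64_t model_bdi_dirty_limit(uint64_t dirty,
				      uint64_t numerator, uint64_t denominator,
				      unsigned int total_min_ratio,
				      unsigned int min_ratio,
				      unsigned int max_ratio)
{
	assert(denominator > 0);
	assert(total_min_ratio <= 100 && min_ratio <= total_min_ratio);

	/* Share of the part not reserved by any device's min_ratio. */
	uint64_t limit = dirty * (100 - total_min_ratio) / 100;

	limit = limit * numerator / denominator;

	/* Add this device's reserved portion, then cap at max_ratio. */
	limit += dirty * min_ratio / 100;
	if (limit > dirty * max_ratio / 100)
		limit = dirty * max_ratio / 100;
	return limit;
}
```

With an illustrative global threshold of 56000 KB and a device that did 1%
of recent writeout, this model gives 560 KB at min_ratio=0 and 3332 KB at
min_ratio=5, so a 1024 KB rewrite exceeds the limit in the first case but
not in the second.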

Yes, with min_ratio = 0, cache thrashing is reduced
but it is not fixed 100%. Please find the logs with min_ratio = 0 below:

[  446.250089] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      78208 KB,
bdi_dirty_limit =      55308 KB
[  447.961688] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        548 KB
[  447.969524] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        548 KB
[  448.177532] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        924 KB
[  448.189900] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        924 KB
[  449.005822] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      92928 KB,
bdi_dirty_limit =      55160 KB
[  452.052060] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      93696 KB,
bdi_dirty_limit =      55308 KB
[  453.225619] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        568 KB
[  453.233652] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        568 KB
[  453.451407] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        876 KB
[  453.463401] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        876 KB
[  455.437187] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94976 KB,
bdi_dirty_limit =      55068 KB
[  457.573684] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        640 KB
[  457.589837] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        640 KB
[  458.648492] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      91776 KB,
bdi_dirty_limit =      55172 KB
[  458.656590] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      91776 KB,
bdi_dirty_limit =      55172 KB
[  458.657641] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        832 KB
[  458.657664] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        832 KB
[  461.771683] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88128 KB,
bdi_dirty_limit =      55336 KB
[  464.164928] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        484 KB
[  464.185637] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        480 KB
[  464.254258] [2]:flush--8:16 : BDI_RECLAIMABLE =        192 KB,
bdi_dirty_limit =        508 KB
[  464.262889] [2]:flush--8:16 : BDI_RECLAIMABLE =        192 KB,
bdi_dirty_limit =        508 KB
[  464.998619] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      89600 KB,
bdi_dirty_limit =      55268 KB
[  465.006586] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      89600 KB,
bdi_dirty_limit =      55268 KB
[  468.355120] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      78016 KB,
bdi_dirty_limit =      55568 KB
[  469.289622] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        384 KB
[  469.297527] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        384 KB
[  469.445361] [1]:flush--8:16 : BDI_RECLAIMABLE =        768 KB,
bdi_dirty_limit =        552 KB
[  469.453357] [1]:flush--8:16 : BDI_RECLAIMABLE =        768 KB,
bdi_dirty_limit =        552 KB
[  470.431779] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        812 KB
[  470.439594] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        812 KB
[  470.631494] [2]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1160 KB
[  470.643585] [2]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1160 KB
[  471.111608] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90496 KB,
bdi_dirty_limit =      54900 KB
[  471.119563] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90496 KB,
bdi_dirty_limit =      54900 KB
[  471.686473] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1284 KB
[  471.694414] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1284 KB
[  474.252738] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88192 KB,
bdi_dirty_limit =      54868 KB
[  474.261279] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1124 KB
[  474.273592] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1124 KB
[  477.701264] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      82112 KB,
bdi_dirty_limit =      55148 KB
[  477.713485] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      82176 KB,
bdi_dirty_limit =      55164 KB
[  480.281635] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        592 KB
[  480.289460] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        592 KB
[  480.676031] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90432 KB,
bdi_dirty_limit =      55160 KB
[  480.808089] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        856 KB
[  480.829621] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        852 KB
[  483.733722] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        780 KB
[  483.745819] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        780 KB
[  484.021687] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      90816 KB,
bdi_dirty_limit =      54908 KB
[  484.029652] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1128 KB
[  484.045597] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1124 KB
[  485.063624] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =       1256 KB
[  485.071569] [2]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1256 KB
[  487.273324] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94656 KB,
bdi_dirty_limit =      54840 KB
[  487.281646] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      94656 KB,
bdi_dirty_limit =      54840 KB
[  490.233678] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        716 KB
[  490.249621] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        712 KB
[  490.486277] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88000 KB,
bdi_dirty_limit =      55276 KB
[  491.243332] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        836 KB
[  491.268240] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        868 KB
[  493.928206] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      95040 KB,
bdi_dirty_limit =      55248 KB
[  494.329925] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        692 KB
[  494.341743] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        692 KB
[  495.330591] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        864 KB
[  495.341611] [1]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =        864 KB
[  495.540560] [2]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1096 KB
[  495.548499] [2]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1096 KB
[  496.552820] [2]:flush--8:16 : BDI_RECLAIMABLE =        704 KB,
bdi_dirty_limit =       1196 KB
[  496.560772] [2]:flush--8:16 : BDI_RECLAIMABLE =        704 KB,
bdi_dirty_limit =       1196 KB
[  496.698932] [2]:flush--8:16 : BDI_RECLAIMABLE =        640 KB,
bdi_dirty_limit =       1316 KB
[  496.706910] [2]:flush--8:16 : BDI_RECLAIMABLE =       1024 KB,
bdi_dirty_limit =       1316 KB
[  497.333402] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88000 KB,
bdi_dirty_limit =      54784 KB
[  497.341356] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88000 KB,
bdi_dirty_limit =      54784 KB
[  500.524621] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      69760 KB,
bdi_dirty_limit =      54980 KB
[  502.601654] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        664 KB
[  502.616538] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        664 KB
[  502.817760] [2]:flush--8:16 : BDI_RECLAIMABLE =        576 KB,
bdi_dirty_limit =        968 KB
[  502.829866] [2]:flush--8:16 : BDI_RECLAIMABLE =        832 KB,
bdi_dirty_limit =        964 KB
[  503.129652] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      91328 KB,
bdi_dirty_limit =      55028 KB
[  504.905684] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        964 KB
[  504.917666] [2]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        964 KB
[  506.422420] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      93824 KB,
bdi_dirty_limit =      55056 KB
[  509.545213] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      75840 KB,
bdi_dirty_limit =      55416 KB
[  509.635997] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        580 KB
[  509.648012] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        580 KB
[  509.662575] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      80384 KB,
bdi_dirty_limit =      55412 KB
[  512.607418] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      88960 KB,
bdi_dirty_limit =      55372 KB
[  515.643594] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        416 KB
[  515.657627] [1]:flush--8:16 : BDI_RECLAIMABLE =        960 KB,
bdi_dirty_limit =        416 KB
[  515.873800] [1]:flush-- 8:0 : BDI_RECLAIMABLE =      93440 KB,
bdi_dirty_limit =      55252 KB

As mentioned above, after applying the patch 'over_bground_thresh'
returns true only when reclaimable is larger than it had to be in the
original kernel.

We also looked carefully at the change we made to the return condition
of ‘over_bground_thresh’. The code change does help in avoiding
unnecessary write-back or wakeups in some scenarios.

But in one normal scenario, the change actually results in
performance degradation.

Results for ‘dd’ thread on two devices:
Before applying Patch:
#> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
#> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
#>
#> 2000+0 records in
2000+0 records out
2097152000 bytes (2.0GB) copied, 77.205276 seconds, 25.9MB/s  -> USB
HDD WRITE Speed

[2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
#>
#>
#> 800+0 records in
800+0 records out
838860800 bytes (800.0MB) copied, 154.528362 seconds, 5.2MB/s -> USB
Flash WRITE Speed

After applying patch:
#> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
dd if=/
#> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
#>
#> 2000+0 records in
2000+0 records out
2097152000 bytes (2.0GB) copied, 123.844770 seconds, 16.1MB/s ->USB
HDD WRITE Speed
800+0 records in
800+0 records out
838860800 bytes (800.0MB) copied, 141.352945 seconds, 5.7MB/s -> USB
Flash WRITE Speed

[2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
[1]+ Done dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800

So, after applying our changes:
1) USB HDD Write speed dropped from 25.9 -> 16.1 MB/s
2) USB Flash Write speed increased marginally from 5.2 -> 5.7 MB/s
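
The rates above can be re-derived from the byte counts and elapsed
times that dd printed. The following is only a small user-space sketch
for checking the arithmetic, not part of the patch. The helper names
(mib_per_sec, pct_change, print_dd_summary) are made up for
illustration, and it assumes that dd's "MB" means 1048576 bytes, which
is what the printed figures are consistent with.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput in MiB/s (1 MiB = 1048576 bytes) from dd's byte count and time. */
double mib_per_sec(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes / seconds / 1048576.0;
}

/* Relative change from 'before' to 'after' in percent; negative means slower. */
double pct_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Print before/after rates for both devices, using the dd figures above. */
void print_dd_summary(void)
{
	double hdd_before = mib_per_sec(2097152000.0, 77.205276);
	double hdd_after = mib_per_sec(2097152000.0, 123.844770);
	double flash_before = mib_per_sec(838860800.0, 154.528362);
	double flash_after = mib_per_sec(838860800.0, 141.352945);

	printf("USB HDD:   %.1f -> %.1f MiB/s (%+.1f%%)\n",
	       hdd_before, hdd_after, pct_change(hdd_before, hdd_after));
	printf("USB Flash: %.1f -> %.1f MiB/s (%+.1f%%)\n",
	       flash_before, flash_after, pct_change(flash_before, flash_after));
}
```

Calling print_dd_summary() from a main() prints one line per device
with the rate before and after the patch and the relative change.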

Normally, if a USB Flash and a USB HDD are both plugged into the
system and we start ‘dd’ on both devices, then once dirty memory is
more than the background threshold, flushing starts for all BDIs. The
write-back for the devices is kicked by this condition:
if (global_page_state(NR_FILE_DIRTY) +
    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
	return true;
Between them, the slow device and the fast device always make sure
that there is enough DIRTY data in memory to kick write-back.
Since USB Flash is slow, the number of DIRTY pages belonging to this
device is much higher, so over_bground_thresh returns ‘true’ every
time. So, even though the HDD might have only a few KB of dirty data,
it is also flushed immediately.
This frequent flushing of HDD data gradually increases the
bdi_dirty_limit() for HDD.

But when we introduce the change to add per-BDI control, i.e.,
 if (global_page_state(NR_FILE_DIRTY) +
         global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
         reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)

and consider the same scenario, writeback for HDD is kicked only if
‘reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh’.
But this condition is false much of the time, so over_bground_thresh
returns false.
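
To make the difference between the two checks quoted above concrete,
here is a small user-space model. It is only a sketch: struct
wb_snapshot, its field names and all the numbers are hypothetical, and
it models just the two 'if' conditions shown in this mail, not the
rest of over_bground_thresh() or how the kernel derives the
thresholds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters the two checks look at (in pages). */
struct wb_snapshot {
	unsigned long global_dirty;       /* NR_FILE_DIRTY + NR_UNSTABLE_NFS */
	unsigned long background_thresh;  /* global background threshold */
	unsigned long reclaimable;        /* reclaimable pages of this BDI */
	unsigned long stat_error;         /* stands in for bdi_stat_error(bdi) */
	unsigned long bdi_bground_thresh; /* per-BDI background threshold */
};

/* Check quoted for the original kernel: global state only. */
bool over_bground_global(const struct wb_snapshot *s)
{
	return s->global_dirty > s->background_thresh;
}

/* Check quoted for the patched kernel: global state AND per-BDI state. */
bool over_bground_per_bdi(const struct wb_snapshot *s)
{
	return s->global_dirty > s->background_thresh &&
	       s->reclaimable * 2 + s->stat_error * 2 > s->bdi_bground_thresh;
}

/* Two devices sharing one global state; all numbers are made up. */
void demo_two_devices(void)
{
	struct wb_snapshot hdd = {
		.global_dirty = 30000, .background_thresh = 20000,
		.reclaimable = 1000, .stat_error = 64,
		.bdi_bground_thresh = 8000,
	};
	struct wb_snapshot flash = {
		.global_dirty = 30000, .background_thresh = 20000,
		.reclaimable = 20000, .stat_error = 64,
		.bdi_bground_thresh = 2000,
	};

	/* The global-only check kicks writeback for both devices. */
	assert(over_bground_global(&hdd));
	assert(over_bground_global(&flash));

	/* The per-BDI check skips the device with little reclaimable data. */
	assert(!over_bground_per_bdi(&hdd));
	assert(over_bground_per_bdi(&flash));
}
```

With these made-up numbers the global-only check fires for both
devices, while the per-BDI check fires only for the device that holds
most of the dirty pages, which is the behaviour described above.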

This repeated failure to start write-back for HDD actually results
in lowering the bdi_dirty_limit for HDD, in a way PAUSING the writer
thread for HDD.
This in turn results in fewer WRITE operations per second for HDD,
as the ‘dd’ on USB HDD is put into long sleeps (MAX PAUSE) in
balance_dirty_pages.

For USB Flash, meanwhile, bdi_dirty_limit keeps increasing, as it
gets more chances to flush dirty data in over_bground_thresh, since
bdi_reclaimable > bdi_dirty_limit is true. This results in more
WRITE operations per second for USB Flash.
From these observations, we feel that these changes might not be
needed. Please let us know if we are missing any point here; we can
check further on this.

Please share your opinion.

Thanks.
>
> Thanks,
> Fengguang
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-09  8:26   ` Namjae Jeon
@ 2013-01-09 15:13     ` Jan Kara
  2013-01-10  2:50       ` Wanpeng Li
  2013-01-10 11:58       ` Namjae Jeon
  0 siblings, 2 replies; 18+ messages in thread
From: Jan Kara @ 2013-01-09 15:13 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Fengguang Wu, linux-fsdevel, linux-mm, linux-kernel, liwanp,
	Namjae Jeon, Vivek Trivedi, Jan Kara, Dave Chinner, Simon Jeons

On Wed 09-01-13 17:26:36, Namjae Jeon wrote:
<snip>
> But in one normal scenario, the changes actually results in
> performance degradation.
> 
> Results for ‘dd’ thread on two devices:
> Before applying Patch:
> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
> #>
> #> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.0GB) copied, 77.205276 seconds, 25.9MB/s  -> USB
> HDD WRITE Speed
> 
> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
> #>
> #>
> #> 800+0 records in
> 800+0 records out
> 838860800 bytes (800.0MB) copied, 154.528362 seconds, 5.2MB/s -> USB
> Flash WRITE Speed
> 
> After applying patch:
> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
> dd if=/
> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
> #>
> #> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.0GB) copied, 123.844770 seconds, 16.1MB/s ->USB
> HDD WRITE Speed
> 800+0 records in
> 800+0 records out
> 838860800 bytes (800.0MB) copied, 141.352945 seconds, 5.7MB/s -> USB
> Flash WRITE Speed
> 
> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
> [1]+ Done dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800
> 
> So, after applying our changes:
> 1) USB HDD Write speed dropped from 25.9 -> 16.1 MB/s
> 2) USB Flash Write speed increased marginally from 5.2 -> 5.7 MB/s
> 
> Normally if we have a USB Flash and HDD plugged in system. And if we
> initiate the ‘dd’ on both the devices. Once dirty memory is more than
> the background threshold, flushing starts for all BDI (The write-back
> for the devices will be kicked by the condition):
> If (global_page_state(NR_FILE_DIRTY) +
> global_page_state(NR_UNSTABLE_NFS) > background_thresh))
> 	return true;
> As the slow device and the fast device always make sure that there is
> enough DIRTY data in memory to kick write-back.
> Since, USB Flash is slow, the DIRTY pages corresponding to this device
> is much higher, resulting in returning ‘true’ everytime from
> over_bground_thresh. So, even though HDD might have only few KB of
> dirty data, it is also flushed immediately.
> This frequent flushing of HDD data results in gradually increasing the
> bdi_dirty_limit() for HDD.
  Interesting. Thanks for testing! So is this just a problem with initial
writeout fraction estimation? I.e. if you first let dd to USB HDD run for a
couple of seconds to ramp up its fraction and only then start writeout to
USB flash, is there still a problem with USB HDD throughput with the
changed over_bground_thresh() function?

> But, when we introduce the change to control per BDI i.e.,
>  if (global_page_state(NR_FILE_DIRTY) +
>          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
> 
> Now, in this case, when we consider the same scenario, writeback for
> HDD will only be kicked only if ‘reclaimable * 2 + bdi_stat_error(bdi)
> * 2 > bdi_bground_thresh’
> But this condition is not true a lot many number of times, so
> resulting in false.
  I'm surprised it's not true so often... dd(1) should easily fill the
caches. But maybe we are oscillating between below-background-threshold
and at-dirty-limit situations rather quickly. Do you have recordings of
BDI_RECLAIMABLE and BDI_DIRTY from the problematic run?

> This continuous failure to start write-back for HDD actually results
> in lowering the bdi_dirty_limit for HDD, in a way PAUSING the writer
> thread for HDD.
> This is actually resulting in less number of WRITE operations per
> second for HDD. As, the ‘dd’ on USB HDD will be put to long sleep(MAX
> PAUSE) in balance_dirty_pages.
> 
> While for USB Flash, its bdi_dirty_limit is kept on increasing as it
> is getting more chance to flush dirty data in over_bground_thresh. As,
> bdi_reclaimable > bdi_dirty_limit is true. So, resulting more number
> of WRITE operation per second for USB Flash.
> From these observations, we feel that these changes might not be
> needed. Please let us know in case we are missing on any point here,
> we can further check more on this.
  Well, at least we know changing the condition has unexpected side
effects. I'd like to understand those before discarding the idea - because
in your setup the flusher thread must end up writing a rather small number
of pages in each run when it's running continuously and that's not too good
either...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-09 15:13     ` Jan Kara
@ 2013-01-10  2:50       ` Wanpeng Li
  2013-01-10 11:58       ` Namjae Jeon
  1 sibling, 0 replies; 18+ messages in thread
From: Wanpeng Li @ 2013-01-10  2:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Namjae Jeon, Fengguang Wu, linux-fsdevel, linux-mm, linux-kernel,
	Namjae Jeon, Vivek Trivedi, Jan Kara, Dave Chinner, Simon Jeons

On Wed, Jan 09, 2013 at 04:13:54PM +0100, Jan Kara wrote:
>On Wed 09-01-13 17:26:36, Namjae Jeon wrote:
><snip>
>> But in one normal scenario, the changes actually results in
>> performance degradation.
>> 
>> Results for ‘dd’ thread on two devices:
>> Before applying Patch:
>> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
>> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
>> #>
>> #> 2000+0 records in
>> 2000+0 records out
>> 2097152000 bytes (2.0GB) copied, 77.205276 seconds, 25.9MB/s  -> USB
>> HDD WRITE Speed
>> 
>> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
>> #>
>> #>
>> #> 800+0 records in
>> 800+0 records out
>> 838860800 bytes (800.0MB) copied, 154.528362 seconds, 5.2MB/s -> USB
>> Flash WRITE Speed
>> 
>> After applying patch:
>> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
>> dd if=/
>> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
>> #>
>> #> 2000+0 records in
>> 2000+0 records out
>> 2097152000 bytes (2.0GB) copied, 123.844770 seconds, 16.1MB/s ->USB
>> HDD WRITE Speed
>> 800+0 records in
>> 800+0 records out
>> 838860800 bytes (800.0MB) copied, 141.352945 seconds, 5.7MB/s -> USB
>> Flash WRITE Speed
>> 
>> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
>> [1]+ Done dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800
>> 
>> So, after applying our changes:
>> 1) USB HDD Write speed dropped from 25.9 -> 16.1 MB/s
>> 2) USB Flash Write speed increased marginally from 5.2 -> 5.7 MB/s
>> 
>> Normally if we have a USB Flash and HDD plugged in system. And if we
>> initiate the ‘dd’ on both the devices. Once dirty memory is more than
>> the background threshold, flushing starts for all BDI (The write-back
>> for the devices will be kicked by the condition):
>> If (global_page_state(NR_FILE_DIRTY) +
>> global_page_state(NR_UNSTABLE_NFS) > background_thresh))
>> 	return true;
>> As the slow device and the fast device always make sure that there is
>> enough DIRTY data in memory to kick write-back.
>> Since, USB Flash is slow, the DIRTY pages corresponding to this device
>> is much higher, resulting in returning ‘true’ everytime from
>> over_bground_thresh. So, even though HDD might have only few KB of
>> dirty data, it is also flushed immediately.
>> This frequent flushing of HDD data results in gradually increasing the
>> bdi_dirty_limit() for HDD.
>  Interesting. Thanks for testing! So is this just a problem with initial
>writeout fraction estimation. I.e. if you first let dd to USB HDD run for a
>couple of seconds to ramp up its fraction and only then start writeout to
>USB flash, is there still a problem with USB HDD throughput with the
>changed over_bground_thresh() function?
>
>> But, when we introduce the change to control per BDI i.e.,
>>  if (global_page_state(NR_FILE_DIRTY) +
>>          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>>          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
>> 
>> Now, in this case, when we consider the same scenario, writeback for
>> HDD will only be kicked only if ‘reclaimable * 2 + bdi_stat_error(bdi)
>> * 2 > bdi_bground_thresh’
>> But this condition is not true a lot many number of times, so
>> resulting in false.
>  I'm surprised it's not true so often... dd(1) should easily fill the

But after merging the patch, dd can't easily fill the caches, since the
shared writeback cache of the HDD is small.

>caches. But maybe we are oscillating between below-background-threshold
>and at-dirty-limit situations rather quickly. Do you have recordings of
>BDI_RECLAIMABLE and BDI_DIRTY from the problematic run?
>
>> This continuous failure to start write-back for HDD actually results
>> in lowering the bdi_dirty_limit for HDD, in a way PAUSING the writer
>> thread for HDD.
>> This is actually resulting in less number of WRITE operations per
>> second for HDD. As, the ‘dd’ on USB HDD will be put to long sleep(MAX
>> PAUSE) in balance_dirty_pages.
>> 
>> While for USB Flash, its bdi_dirty_limit is kept on increasing as it
>> is getting more chance to flush dirty data in over_bground_thresh. As,
>> bdi_reclaimable > bdi_dirty_limit is true. So, resulting more number
>> of WRITE operation per second for USB Flash.
>> From these observations, we feel that these changes might not be
>> needed. Please let us know in case we are missing on any point here,
>> we can further check more on this.
>  Well, at least we know changing the condition has unexpected side
>effects. I'd like to understand those before discarding the idea - because
>in your setup flusher thread must end up writing rather small amount of
>pages in each run when it's running continuously and that's not too good
>either...
>
>								Honza
>-- 
>Jan Kara <jack@suse.cz>
>SUSE Labs, CR
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] writeback: fix writeback cache thrashing
  2013-01-09 15:13     ` Jan Kara
  2013-01-10  2:50       ` Wanpeng Li
@ 2013-01-10 11:58       ` Namjae Jeon
  1 sibling, 0 replies; 18+ messages in thread
From: Namjae Jeon @ 2013-01-10 11:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Fengguang Wu, linux-fsdevel, linux-mm, linux-kernel, liwanp,
	Namjae Jeon, Vivek Trivedi, Dave Chinner, Simon Jeons

2013/1/10 Jan Kara <jack@suse.cz>:
> On Wed 09-01-13 17:26:36, Namjae Jeon wrote:
> <snip>
>> But in one normal scenario, the changes actually results in
>> performance degradation.
>>
>> Results for ‘dd’ thread on two devices:
>> Before applying Patch:
>> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
>> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
>> #>
>> #> 2000+0 records in
>> 2000+0 records out
>> 2097152000 bytes (2.0GB) copied, 77.205276 seconds, 25.9MB/s  -> USB
>> HDD WRITE Speed
>>
>> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
>> #>
>> #>
>> #> 800+0 records in
>> 800+0 records out
>> 838860800 bytes (800.0MB) copied, 154.528362 seconds, 5.2MB/s -> USB
>> Flash WRITE Speed
>>
>> After applying patch:
>> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
>> dd if=/
>> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
>> #>
>> #> 2000+0 records in
>> 2000+0 records out
>> 2097152000 bytes (2.0GB) copied, 123.844770 seconds, 16.1MB/s ->USB
>> HDD WRITE Speed
>> 800+0 records in
>> 800+0 records out
>> 838860800 bytes (800.0MB) copied, 141.352945 seconds, 5.7MB/s -> USB
>> Flash WRITE Speed
>>
>> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
>> [1]+ Done dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800
>>
>> So, after applying our changes:
>> 1) USB HDD Write speed dropped from 25.9 -> 16.1 MB/s
>> 2) USB Flash Write speed increased marginally from 5.2 -> 5.7 MB/s
>>
>> Normally if we have a USB Flash and HDD plugged in system. And if we
>> initiate the ‘dd’ on both the devices. Once dirty memory is more than
>> the background threshold, flushing starts for all BDI (The write-back
>> for the devices will be kicked by the condition):
>> If (global_page_state(NR_FILE_DIRTY) +
>> global_page_state(NR_UNSTABLE_NFS) > background_thresh))
>>       return true;
>> As the slow device and the fast device always make sure that there is
>> enough DIRTY data in memory to kick write-back.
>> Since, USB Flash is slow, the DIRTY pages corresponding to this device
>> is much higher, resulting in returning ‘true’ everytime from
>> over_bground_thresh. So, even though HDD might have only few KB of
>> dirty data, it is also flushed immediately.
>> This frequent flushing of HDD data results in gradually increasing the
>> bdi_dirty_limit() for HDD.
>   Interesting. Thanks for testing! So is this just a problem with initial
> writeout fraction estimation. I.e. if you first let dd to USB HDD run for a
> couple of seconds to ramp up its fraction and only then start writeout to
> USB flash, is there still a problem with USB HDD throughput with the
> changed over_bground_thresh() function?
#> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=4000 &

-> sleep for 10 seconds so that the USB HDD gets a chance to fill the
cache and its bdi_dirty_limit becomes high.

#>
#> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
#> 800+0 records in
800+0 records out
838860800 bytes (800.0MB) copied, 146.240434 seconds, 5.5MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (3.9GB) copied, 220.184229 seconds, 18.2MB/s

[2]+  Done                       dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800
[1]+  Done                       dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=4000

But there is still a drop in USB HDD WRITE speed, from 25 MB/s to 18.2 MB/s.

>
>> But, when we introduce the change to control per BDI i.e.,
>>  if (global_page_state(NR_FILE_DIRTY) +
>>          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>>          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
>>
>> Now, in this case, when we consider the same scenario, writeback for
>> HDD will only be kicked only if ‘reclaimable * 2 + bdi_stat_error(bdi)
>> * 2 > bdi_bground_thresh’
>> But this condition is not true a lot many number of times, so
>> resulting in false.
>   I'm surprised it's not true so often... dd(1) should easily fill the
> caches. But maybe we are oscillating between below-background-threshold
> and at-dirty-limit situations rather quickly. Do you have recordings of
> BDI_RECLAIMABLE and BDI_DIRTY from the problematic run?

Yes. Below is the log from the problematic run with the following change:

  if (global_page_state(NR_FILE_DIRTY) +
          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)

[Test Setup: ARM dual core CPU, 500 MB RAM, dirty_background_ratio and
dirty_ratio at default settings]

#> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
#> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &

[   97.257777] [1]:flush-8:0  : BDI_DIRTIED =      57152 KB,
BDI_RECLAIMABLE =   57088 KB, bdi_dirty_limit =       0 KB

    ^ Initial BDI dirty limit for HDD
[   97.296096] [1]:flush-8:16 : BDI_DIRTIED =        128 KB,
BDI_RECLAIMABLE =      64 KB, bdi_dirty_limit =       0 KB

    ^ Initial BDI dirty limit for FLASH
[   97.321764] [1]:flush-8:16 : BDI_DIRTIED =        704 KB,
BDI_RECLAIMABLE =     640 KB, bdi_dirty_limit =       0 KB
[   97.358775] [1]:flush-8:16 : BDI_DIRTIED =       1664 KB,
BDI_RECLAIMABLE =      64 KB, bdi_dirty_limit =       0 KB
[   97.382956] [1]:flush-8:16 : BDI_DIRTIED =       2176 KB,
BDI_RECLAIMABLE =     512 KB, bdi_dirty_limit =       0 KB
[   97.393325] [1]:flush-8:16 : BDI_DIRTIED =       2816 KB,
BDI_RECLAIMABLE =    1152 KB, bdi_dirty_limit =      52 KB
[   97.410622] [2]:flush-8:16 : BDI_DIRTIED =       4096 KB,
BDI_RECLAIMABLE =       0 KB, bdi_dirty_limit =     108 KB
[   97.422451] [1]:flush-8:16 : BDI_DIRTIED =       4224 KB,
BDI_RECLAIMABLE =     128 KB, bdi_dirty_limit =     108 KB
[   97.432777] [1]:flush-8:16 : BDI_DIRTIED =       4864 KB,
BDI_RECLAIMABLE =     768 KB, bdi_dirty_limit =     164 KB
[   97.447658] [2]:flush-8:16 : BDI_DIRTIED =       6016 KB,
BDI_RECLAIMABLE =       0 KB, bdi_dirty_limit =     164 KB
[   97.466556] [2]:flush-8:16 : BDI_DIRTIED =       6016 KB,
BDI_RECLAIMABLE =      64 KB, bdi_dirty_limit =     220 KB
[   97.485760] [1]:flush-8:16 : BDI_DIRTIED =       6528 KB,
BDI_RECLAIMABLE =     512 KB, bdi_dirty_limit =     272 KB
[   97.524776] [1]:flush-8:16 : BDI_DIRTIED =       7552 KB,
BDI_RECLAIMABLE =     384 KB, bdi_dirty_limit =     380 KB
[   97.535172] [1]:flush-8:16 : BDI_DIRTIED =       7808 KB,
BDI_RECLAIMABLE =     640 KB, bdi_dirty_limit =     380 KB
[   97.594639] [1]:flush-8:16 : BDI_DIRTIED =      11904 KB,
BDI_RECLAIMABLE =    3456 KB, bdi_dirty_limit =     484 KB
[   97.604975] [1]:flush-8:16 : BDI_DIRTIED =      12224 KB,
BDI_RECLAIMABLE =    3776 KB, bdi_dirty_limit =     536 KB
[   97.667729] [1]:flush-8:16 : BDI_DIRTIED =      16512 KB,
BDI_RECLAIMABLE =    3200 KB, bdi_dirty_limit =     696 KB
[   97.678127] [1]:flush-8:16 : BDI_DIRTIED =      17152 KB,
BDI_RECLAIMABLE =    3840 KB, bdi_dirty_limit =     696 KB
[   97.729258] [1]:flush-8:16 : BDI_DIRTIED =      20608 KB,
BDI_RECLAIMABLE =    1536 KB, bdi_dirty_limit =     744 KB
[   97.739654] [1]:flush-8:16 : BDI_DIRTIED =      20992 KB,
BDI_RECLAIMABLE =    1856 KB, bdi_dirty_limit =     740 KB
[   99.412177] [1]:flush-8:0  : BDI_DIRTIED =     102656 KB,
BDI_RECLAIMABLE =   36608 KB, bdi_dirty_limit =   14476 KB
[  100.942829] [1]:flush-8:0  : BDI_DIRTIED =     140288 KB,
BDI_RECLAIMABLE =   35840 KB, bdi_dirty_limit =   21392 KB
[  101.809082] [1]:flush-8:16 : BDI_DIRTIED =      86720 KB,
BDI_RECLAIMABLE =   51328 KB, bdi_dirty_limit =    4096 KB
[  102.266529] [1]:flush-8:0  : BDI_DIRTIED =     173056 KB,
BDI_RECLAIMABLE =   31232 KB, bdi_dirty_limit =   26304 KB
[  103.474246] [2]:flush-8:0  : BDI_DIRTIED =     189440 KB,
BDI_RECLAIMABLE =   16512 KB, bdi_dirty_limit =   30660 KB
[  104.100257] [2]:flush-8:0  : BDI_DIRTIED =     207616 KB,
BDI_RECLAIMABLE =   16704 KB, bdi_dirty_limit =   32220 KB
[  104.110758] [2]:flush-8:0  : BDI_DIRTIED =     207616 KB,
BDI_RECLAIMABLE =   16704 KB, bdi_dirty_limit =   32236 KB
[  104.808133] [2]:flush-8:0  : BDI_DIRTIED =     225856 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   33536 KB
[  105.451483] [2]:flush-8:0  : BDI_DIRTIED =     244096 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   34704 KB
[  108.993027] [2]:flush-8:0  : BDI_DIRTIED =     306048 KB,
BDI_RECLAIMABLE =   20352 KB, bdi_dirty_limit =   39888 KB
[  109.003379] [2]:flush-8:0  : BDI_DIRTIED =     306048 KB,
BDI_RECLAIMABLE =   20352 KB, bdi_dirty_limit =   39888 KB
[  109.707771] [2]:flush-8:0  : BDI_DIRTIED =     322432 KB,
BDI_RECLAIMABLE =   20352 KB, bdi_dirty_limit =   40512 KB
[  109.718185] [2]:flush-8:0  : BDI_DIRTIED =     322432 KB,
BDI_RECLAIMABLE =   20352 KB, bdi_dirty_limit =   40512 KB
[  110.682318] [2]:flush-8:0  : BDI_DIRTIED =     344320 KB,
BDI_RECLAIMABLE =   21888 KB, bdi_dirty_limit =   40832 KB
[  110.692868] [2]:flush-8:0  : BDI_DIRTIED =     344320 KB,
BDI_RECLAIMABLE =   21888 KB, bdi_dirty_limit =   40808 KB
[  112.607992] [1]:flush-8:16 : BDI_DIRTIED =     171328 KB,
BDI_RECLAIMABLE =   84544 KB, bdi_dirty_limit =    8416 KB
[  115.183151] [2]:flush-8:0  : BDI_DIRTIED =     402112 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   40012 KB
[  115.193685] [2]:flush-8:0  : BDI_DIRTIED =     402112 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   40012 KB
[  115.756987] [2]:flush-8:0  : BDI_DIRTIED =     416960 KB,
BDI_RECLAIMABLE =   22592 KB, bdi_dirty_limit =   40324 KB
[  115.767339] [2]:flush-8:0  : BDI_DIRTIED =     416960 KB,
BDI_RECLAIMABLE =   22592 KB, bdi_dirty_limit =   40324 KB
[  118.357719] [2]:flush-8:0  : BDI_DIRTIED =     440768 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   40396 KB
[  118.377761] [2]:flush-8:0  : BDI_DIRTIED =     441600 KB,
BDI_RECLAIMABLE =   20928 KB, bdi_dirty_limit =   40388 KB
[  118.878817] [2]:flush-8:0  : BDI_DIRTIED =     453504 KB,
BDI_RECLAIMABLE =   20608 KB, bdi_dirty_limit =   40824 KB
[  118.892183] [2]:flush-8:0  : BDI_DIRTIED =     453632 KB,
BDI_RECLAIMABLE =   20736 KB, bdi_dirty_limit =   40828 KB
[  119.986119] [2]:flush-8:0  : BDI_DIRTIED =     477248 KB,
BDI_RECLAIMABLE =   21824 KB, bdi_dirty_limit =   41436 KB
[  119.996632] [2]:flush-8:0  : BDI_DIRTIED =     477248 KB,
BDI_RECLAIMABLE =   21824 KB, bdi_dirty_limit =   41436 KB
[  120.760551] [2]:flush-8:0  : BDI_DIRTIED =     499136 KB,
BDI_RECLAIMABLE =   21888 KB, bdi_dirty_limit =   42000 KB
[  120.771384] [2]:flush-8:0  : BDI_DIRTIED =     499904 KB,
BDI_RECLAIMABLE =   22592 KB, bdi_dirty_limit =   41952 KB
[  122.980263] [2]:flush-8:0  : BDI_DIRTIED =     561472 KB,
BDI_RECLAIMABLE =   21696 KB, bdi_dirty_limit =   43116 KB
[  122.995903] [2]:flush-8:0  : BDI_DIRTIED =     561728 KB,
BDI_RECLAIMABLE =   21888 KB, bdi_dirty_limit =   43124 KB
[  123.751970] [2]:flush-8:0  : BDI_DIRTIED =     574336 KB,
BDI_RECLAIMABLE =   22336 KB, bdi_dirty_limit =   43516 KB
[  123.776339] [2]:flush-8:0  : BDI_DIRTIED =     575616 KB,
BDI_RECLAIMABLE =   23680 KB, bdi_dirty_limit =   43456 KB
[  124.829712] [2]:flush-8:0  : BDI_DIRTIED =     598016 KB,
BDI_RECLAIMABLE =   21760 KB, bdi_dirty_limit =   43696 KB
[  124.841836] [2]:flush-8:0  : BDI_DIRTIED =     598144 KB,
BDI_RECLAIMABLE =   21888 KB, bdi_dirty_limit =   43704 KB
[  124.856201] [2]:flush-8:0  : BDI_DIRTIED =     599360 KB,
BDI_RECLAIMABLE =   23104 KB, bdi_dirty_limit =   43672 KB
[  125.966983] [2]:flush-8:0  : BDI_DIRTIED =     624640 KB,
BDI_RECLAIMABLE =   22912 KB, bdi_dirty_limit =   43816 KB
[  125.977299] [2]:flush-8:0  : BDI_DIRTIED =     624640 KB,
BDI_RECLAIMABLE =   22912 KB, bdi_dirty_limit =   43816 KB

    ^ Max HDD BDI dirty limit during parallel write on USB FLASH and HDD

[  130.396626] [1]:flush-8:16 : BDI_DIRTIED =     259904 KB,
BDI_RECLAIMABLE =   83072 KB, bdi_dirty_limit =   12644 KB
[  131.898466] [2]:flush-8:0  : BDI_DIRTIED =     684864 KB,
BDI_RECLAIMABLE =   21184 KB, bdi_dirty_limit =   40572 KB
[  131.908962] [2]:flush-8:0  : BDI_DIRTIED =     684864 KB,
BDI_RECLAIMABLE =   21184 KB, bdi_dirty_limit =   40572 KB
[  132.357273] [2]:flush-8:0  : BDI_DIRTIED =     692608 KB,
BDI_RECLAIMABLE =   20736 KB, bdi_dirty_limit =   40756 KB
[  132.367774] [2]:flush-8:0  : BDI_DIRTIED =     692608 KB,
BDI_RECLAIMABLE =   20736 KB, bdi_dirty_limit =   40756 KB
[  135.066501] [2]:flush-8:0  : BDI_DIRTIED =     713856 KB,
BDI_RECLAIMABLE =   21312 KB, bdi_dirty_limit =   40092 KB
[  135.076876] [2]:flush-8:0  : BDI_DIRTIED =     713856 KB,
BDI_RECLAIMABLE =   21312 KB, bdi_dirty_limit =   40084 KB
[  138.678435] [2]:flush-8:0  : BDI_DIRTIED =     755456 KB,
BDI_RECLAIMABLE =   20480 KB, bdi_dirty_limit =   38848 KB
[  138.688844] [2]:flush-8:0  : BDI_DIRTIED =     755456 KB,
BDI_RECLAIMABLE =   20480 KB, bdi_dirty_limit =   38848 KB
[  139.980078] [2]:flush-8:0  : BDI_DIRTIED =     763008 KB,
BDI_RECLAIMABLE =   19776 KB, bdi_dirty_limit =   38052 KB
[  139.990835] [2]:flush-8:0  : BDI_DIRTIED =     763008 KB,
BDI_RECLAIMABLE =   19776 KB, bdi_dirty_limit =   38020 KB
[  145.847841] [2]:flush-8:0  : BDI_DIRTIED =     797568 KB,
BDI_RECLAIMABLE =   18112 KB, bdi_dirty_limit =   36100 KB
[  145.859417] [2]:flush-8:0  : BDI_DIRTIED =     797568 KB,
BDI_RECLAIMABLE =   18112 KB, bdi_dirty_limit =   36104 KB
[  146.574201] [1]:flush-8:16 : BDI_DIRTIED =     336576 KB,
BDI_RECLAIMABLE =   76416 KB, bdi_dirty_limit =   19796 KB
[  147.284058] [1]:flush-8:16 : BDI_DIRTIED =     346176 KB,
BDI_RECLAIMABLE =   86016 KB, bdi_dirty_limit =   18892 KB
[  149.589827] [2]:flush-8:0  : BDI_DIRTIED =     819904 KB,
BDI_RECLAIMABLE =   16768 KB, bdi_dirty_limit =   33584 KB
[  149.604891] [2]:flush-8:0  : BDI_DIRTIED =     820736 KB,
BDI_RECLAIMABLE =   17600 KB, bdi_dirty_limit =   33576 KB
[  149.867466] [2]:flush-8:0  : BDI_DIRTIED =     825344 KB,
BDI_RECLAIMABLE =   18112 KB, bdi_dirty_limit =   33876 KB
[  149.877809] [2]:flush-8:0  : BDI_DIRTIED =     825344 KB,
BDI_RECLAIMABLE =   18112 KB, bdi_dirty_limit =   33848 KB
[  152.115439] [2]:flush-8:0  : BDI_DIRTIED =     843584 KB,
BDI_RECLAIMABLE =   18176 KB, bdi_dirty_limit =   33704 KB
[  152.125781] [2]:flush-8:0  : BDI_DIRTIED =     843584 KB,
BDI_RECLAIMABLE =   18176 KB, bdi_dirty_limit =   33692 KB
[  153.339221] [2]:flush-8:0  : BDI_DIRTIED =     861824 KB,
BDI_RECLAIMABLE =   18176 KB, bdi_dirty_limit =   34540 KB
[  153.349732] [2]:flush-8:0  : BDI_DIRTIED =     861824 KB,
BDI_RECLAIMABLE =   18176 KB, bdi_dirty_limit =   34540 KB
[  155.404477] [2]:flush-8:0  : BDI_DIRTIED =     880064 KB,
BDI_RECLAIMABLE =   18176 KB, bdi_dirty_limit =   34528 KB
[  155.414933] [2]:flush-8:0  : BDI_DIRTIED =     880064 KB,
BDI_RECLAIMABLE =   18176 KB, bdi_dirty_limit =   34528 KB
[  158.695935] [2]:flush-8:0  : BDI_DIRTIED =     896512 KB,
BDI_RECLAIMABLE =   16384 KB, bdi_dirty_limit =   32540 KB
[  158.706260] [2]:flush-8:0  : BDI_DIRTIED =     896512 KB,
BDI_RECLAIMABLE =   16384 KB, bdi_dirty_limit =   32540 KB
[  161.719063] [2]:flush-8:0  : BDI_DIRTIED =     912896 KB,
BDI_RECLAIMABLE =   16320 KB, bdi_dirty_limit =   32368 KB
[  161.729534] [2]:flush-8:0  : BDI_DIRTIED =     912896 KB,
BDI_RECLAIMABLE =   16320 KB, bdi_dirty_limit =   32368 KB
[  164.056751] [1]:flush-8:16 : BDI_DIRTIED =     434048 KB,
BDI_RECLAIMABLE =   86656 KB, bdi_dirty_limit =   22952 KB
[  166.220028] [2]:flush-8:0  : BDI_DIRTIED =     929088 KB,
BDI_RECLAIMABLE =   16128 KB, bdi_dirty_limit =   30928 KB
[  166.230538] [2]:flush-8:0  : BDI_DIRTIED =     929088 KB,
BDI_RECLAIMABLE =   16128 KB, bdi_dirty_limit =   30928 KB
[  166.241768] [2]:flush-8:0  : BDI_DIRTIED =     929088 KB,
BDI_RECLAIMABLE =   16128 KB, bdi_dirty_limit =   30912 KB
[  168.417606] [2]:flush-8:0  : BDI_DIRTIED =     945280 KB,
BDI_RECLAIMABLE =   16256 KB, bdi_dirty_limit =   31044 KB
[  168.427966] [2]:flush-8:0  : BDI_DIRTIED =     945280 KB,
BDI_RECLAIMABLE =   16256 KB, bdi_dirty_limit =   31044 KB
[  171.461426] [2]:flush-8:0  : BDI_DIRTIED =     961472 KB,
BDI_RECLAIMABLE =   16128 KB, bdi_dirty_limit =   29900 KB
[  171.471806] [2]:flush-8:0  : BDI_DIRTIED =     961472 KB,
BDI_RECLAIMABLE =   16128 KB, bdi_dirty_limit =   29880 KB
[  174.109197] [2]:flush-8:0  : BDI_DIRTIED =     976768 KB,
BDI_RECLAIMABLE =   15168 KB, bdi_dirty_limit =   29808 KB
[  174.119557] [2]:flush-8:0  : BDI_DIRTIED =     976768 KB,
BDI_RECLAIMABLE =   15168 KB, bdi_dirty_limit =   29792 KB
[  177.967806] [2]:flush-8:0  : BDI_DIRTIED =     991552 KB,
BDI_RECLAIMABLE =   14784 KB, bdi_dirty_limit =   28540 KB
[  177.978373] [2]:flush-8:0  : BDI_DIRTIED =     991552 KB,
BDI_RECLAIMABLE =   14784 KB, bdi_dirty_limit =   28544 KB
[  180.922946] [1]:flush-8:16 : BDI_DIRTIED =     518976 KB,
BDI_RECLAIMABLE =   84864 KB, bdi_dirty_limit =   26508 KB
[  180.933314] [1]:flush-8:16 : BDI_DIRTIED =     518976 KB,
BDI_RECLAIMABLE =   84864 KB, bdi_dirty_limit =   26508 KB

                   ^ Max FLASH BDI dirty limit during parallel write
on USB FLASH and HDD
[  182.799533] [2]:flush-8:0  : BDI_DIRTIED =    1006528 KB,
BDI_RECLAIMABLE =   14912 KB, bdi_dirty_limit =   28460 KB
[  182.809937] [2]:flush-8:0  : BDI_DIRTIED =    1006528 KB,
BDI_RECLAIMABLE =   14912 KB, bdi_dirty_limit =   28460 KB
[  185.829707] [2]:flush-8:0  : BDI_DIRTIED =    1020352 KB,
BDI_RECLAIMABLE =   13824 KB, bdi_dirty_limit =   27788 KB
[  185.852849] [2]:flush-8:0  : BDI_DIRTIED =    1021120 KB,
BDI_RECLAIMABLE =   14592 KB, bdi_dirty_limit =   27760 KB
[  190.442186] [2]:flush-8:0  : BDI_DIRTIED =    1045824 KB,
BDI_RECLAIMABLE =   14464 KB, bdi_dirty_limit =   28316 KB
[  190.452755] [2]:flush-8:0  : BDI_DIRTIED =    1045824 KB,
BDI_RECLAIMABLE =   14464 KB, bdi_dirty_limit =   28320 KB
[  193.282394] [2]:flush-8:0  : BDI_DIRTIED =    1060288 KB,
BDI_RECLAIMABLE =   14400 KB, bdi_dirty_limit =   28052 KB
[  193.292821] [2]:flush-8:0  : BDI_DIRTIED =    1060288 KB,
BDI_RECLAIMABLE =   14400 KB, bdi_dirty_limit =   28052 KB
[  193.849873] [2]:flush-8:0  : BDI_DIRTIED =    1074880 KB,
BDI_RECLAIMABLE =   14592 KB, bdi_dirty_limit =   29168 KB
[  193.860446] [2]:flush-8:0  : BDI_DIRTIED =    1074880 KB,
BDI_RECLAIMABLE =   14592 KB, bdi_dirty_limit =   29168 KB
[  194.456956] [2]:flush-8:0  : BDI_DIRTIED =    1091264 KB,
BDI_RECLAIMABLE =   16384 KB, bdi_dirty_limit =   30524 KB
[  194.470853] [2]:flush-8:0  : BDI_DIRTIED =    1092480 KB,
BDI_RECLAIMABLE =   17536 KB, bdi_dirty_limit =   30528 KB
[  195.117999] [2]:flush-8:0  : BDI_DIRTIED =    1111232 KB,
BDI_RECLAIMABLE =   16832 KB, bdi_dirty_limit =   32060 KB
[  195.128626] [2]:flush-8:0  : BDI_DIRTIED =    1111232 KB,
BDI_RECLAIMABLE =   16832 KB, bdi_dirty_limit =   32052 KB
[  195.582384] [2]:flush-8:0  : BDI_DIRTIED =    1129088 KB,
BDI_RECLAIMABLE =   16832 KB, bdi_dirty_limit =   33404 KB
[  195.593001] [2]:flush-8:0  : BDI_DIRTIED =    1129088 KB,
BDI_RECLAIMABLE =   16832 KB, bdi_dirty_limit =   33424 KB
[  198.305291] [1]:flush-8:16 : BDI_DIRTIED =     596928 KB,
BDI_RECLAIMABLE =   78016 KB, bdi_dirty_limit =   19580 KB
[  199.022989] [2]:flush-8:0  : BDI_DIRTIED =    1163904 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   35628 KB
[  199.033513] [2]:flush-8:0  : BDI_DIRTIED =    1163904 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   35628 KB
[  204.688042] [2]:flush-8:0  : BDI_DIRTIED =    1196800 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   34556 KB
[  204.698595] [2]:flush-8:0  : BDI_DIRTIED =    1196800 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   34524 KB
[  206.925717] [2]:flush-8:0  : BDI_DIRTIED =    1213888 KB,
BDI_RECLAIMABLE =   17088 KB, bdi_dirty_limit =   34332 KB
[  206.936049] [2]:flush-8:0  : BDI_DIRTIED =    1213888 KB,
BDI_RECLAIMABLE =   17088 KB, bdi_dirty_limit =   34332 KB
[  209.405238] [2]:flush-8:0  : BDI_DIRTIED =    1232320 KB,
BDI_RECLAIMABLE =   18368 KB, bdi_dirty_limit =   33764 KB
[  209.415601] [2]:flush-8:0  : BDI_DIRTIED =    1232320 KB,
BDI_RECLAIMABLE =   18368 KB, bdi_dirty_limit =   33744 KB
[  212.195798] [2]:flush-8:0  : BDI_DIRTIED =    1249216 KB,
BDI_RECLAIMABLE =   16896 KB, bdi_dirty_limit =   33256 KB
[  212.209552] [2]:flush-8:0  : BDI_DIRTIED =    1249536 KB,
BDI_RECLAIMABLE =   17152 KB, bdi_dirty_limit =   33240 KB
[  213.773098] [1]:flush-8:16 : BDI_DIRTIED =     680320 KB,
BDI_RECLAIMABLE =   83392 KB, bdi_dirty_limit =   22124 KB
[  213.783536] [1]:flush-8:16 : BDI_DIRTIED =     680320 KB,
BDI_RECLAIMABLE =   83392 KB, bdi_dirty_limit =   22124 KB
[  214.675144] [2]:flush-8:0  : BDI_DIRTIED =    1267776 KB,
BDI_RECLAIMABLE =   16704 KB, bdi_dirty_limit =   33116 KB
[  214.685498] [2]:flush-8:0  : BDI_DIRTIED =    1267776 KB,
BDI_RECLAIMABLE =   16704 KB, bdi_dirty_limit =   33128 KB
[  218.501301] [2]:flush-8:0  : BDI_DIRTIED =    1284224 KB,
BDI_RECLAIMABLE =   16384 KB, bdi_dirty_limit =   32932 KB
[  218.511646] [2]:flush-8:0  : BDI_DIRTIED =    1284224 KB,
BDI_RECLAIMABLE =   16384 KB, bdi_dirty_limit =   32904 KB
[  219.339245] [2]:flush-8:0  : BDI_DIRTIED =    1302272 KB,
BDI_RECLAIMABLE =   18048 KB, bdi_dirty_limit =   34132 KB
[  219.352808] [2]:flush-8:0  : BDI_DIRTIED =    1303168 KB,
BDI_RECLAIMABLE =   18944 KB, bdi_dirty_limit =   34104 KB
[  220.050747] [2]:flush-8:0  : BDI_DIRTIED =    1322368 KB,
BDI_RECLAIMABLE =   18240 KB, bdi_dirty_limit =   35176 KB
[  220.064885] [2]:flush-8:0  : BDI_DIRTIED =    1323264 KB,
BDI_RECLAIMABLE =   19136 KB, bdi_dirty_limit =   35188 KB
[  220.723480] [2]:flush-8:0  : BDI_DIRTIED =    1344256 KB,
BDI_RECLAIMABLE =   18944 KB, bdi_dirty_limit =   36832 KB
[  220.734171] [2]:flush-8:0  : BDI_DIRTIED =    1344256 KB,
BDI_RECLAIMABLE =   18944 KB, bdi_dirty_limit =   36832 KB
[  221.401320] [2]:flush-8:0  : BDI_DIRTIED =    1364352 KB,
BDI_RECLAIMABLE =   20032 KB, bdi_dirty_limit =   38000 KB
[  221.414822] [2]:flush-8:0  : BDI_DIRTIED =    1365440 KB,
BDI_RECLAIMABLE =   21184 KB, bdi_dirty_limit =   38000 KB
[  222.091724] [2]:flush-8:0  : BDI_DIRTIED =    1385792 KB,
BDI_RECLAIMABLE =   19584 KB, bdi_dirty_limit =   38760 KB
[  222.106450] [2]:flush-8:0  : BDI_DIRTIED =    1386048 KB,
BDI_RECLAIMABLE =   19840 KB, bdi_dirty_limit =   38740 KB
[  225.070245] [2]:flush-8:0  : BDI_DIRTIED =    1443328 KB,
BDI_RECLAIMABLE =   21248 KB, bdi_dirty_limit =   41072 KB
[  225.082125] [2]:flush-8:0  : BDI_DIRTIED =    1443328 KB,
BDI_RECLAIMABLE =   21248 KB, bdi_dirty_limit =   41044 KB
[  225.814516] [2]:flush-8:0  : BDI_DIRTIED =    1456128 KB,
BDI_RECLAIMABLE =   21760 KB, bdi_dirty_limit =   41092 KB
[  225.825050] [2]:flush-8:0  : BDI_DIRTIED =    1456128 KB,
BDI_RECLAIMABLE =   21760 KB, bdi_dirty_limit =   41076 KB
[  225.835457] [2]:flush-8:0  : BDI_DIRTIED =    1456128 KB,
BDI_RECLAIMABLE =   21760 KB, bdi_dirty_limit =   41080 KB
[  227.177970] [2]:flush-8:0  : BDI_DIRTIED =    1478272 KB,
BDI_RECLAIMABLE =   22080 KB, bdi_dirty_limit =   41208 KB
[  227.188482] [2]:flush-8:0  : BDI_DIRTIED =    1478272 KB,
BDI_RECLAIMABLE =   22080 KB, bdi_dirty_limit =   41208 KB
[  231.043363] [2]:flush-8:0  : BDI_DIRTIED =    1510528 KB,
BDI_RECLAIMABLE =   19456 KB, bdi_dirty_limit =   39068 KB
[  231.054717] [2]:flush-8:0  : BDI_DIRTIED =    1510528 KB,
BDI_RECLAIMABLE =   19456 KB, bdi_dirty_limit =   39076 KB
[  231.344643] [1]:flush-8:16 : BDI_DIRTIED =     757312 KB,
BDI_RECLAIMABLE =   76928 KB, bdi_dirty_limit =   16704 KB
[  231.494398] [2]:flush-8:0  : BDI_DIRTIED =    1519616 KB,
BDI_RECLAIMABLE =   20480 KB, bdi_dirty_limit =   39264 KB
[  231.505586] [2]:flush-8:0  : BDI_DIRTIED =    1519616 KB,
BDI_RECLAIMABLE =   20480 KB, bdi_dirty_limit =   39260 KB
[  234.975996] [2]:flush-8:0  : BDI_DIRTIED =    1561024 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   39032 KB
[  234.987831] [2]:flush-8:0  : BDI_DIRTIED =    1561024 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   39036 KB
[  235.468408] [2]:flush-8:0  : BDI_DIRTIED =    1570176 KB,
BDI_RECLAIMABLE =   20992 KB, bdi_dirty_limit =   39212 KB
[  235.480320] [2]:flush-8:0  : BDI_DIRTIED =    1570176 KB,
BDI_RECLAIMABLE =   20992 KB, bdi_dirty_limit =   39204 KB
[  240.183116] [2]:flush-8:0  : BDI_DIRTIED =    1606336 KB,
BDI_RECLAIMABLE =   19712 KB, bdi_dirty_limit =   37944 KB
[  240.194988] [2]:flush-8:0  : BDI_DIRTIED =    1606336 KB,
BDI_RECLAIMABLE =   19712 KB, bdi_dirty_limit =   37932 KB
[  242.622183] [2]:flush-8:0  : BDI_DIRTIED =    1626368 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   37244 KB
[  242.632629] [2]:flush-8:0  : BDI_DIRTIED =    1626368 KB,
BDI_RECLAIMABLE =   20096 KB, bdi_dirty_limit =   37244 KB
[  243.988458] [2]:flush-8:0  : BDI_DIRTIED =    1662848 KB,
BDI_RECLAIMABLE =   19584 KB, bdi_dirty_limit =   38784 KB
[  244.002712] [2]:flush-8:0  : BDI_DIRTIED =    1663936 KB,
BDI_RECLAIMABLE =   20672 KB, bdi_dirty_limit =   38760 KB
[  244.307739] [2]:flush-8:0  : BDI_DIRTIED =    1672000 KB,
BDI_RECLAIMABLE =   20608 KB, bdi_dirty_limit =   39248 KB
[  244.322895] [2]:flush-8:0  : BDI_DIRTIED =    1673152 KB,
BDI_RECLAIMABLE =   21696 KB, bdi_dirty_limit =   39244 KB
[  245.009029] [2]:flush-8:0  : BDI_DIRTIED =    1695680 KB,
BDI_RECLAIMABLE =   20352 KB, bdi_dirty_limit =   40196 KB
[  245.023455] [2]:flush-8:0  : BDI_DIRTIED =    1696960 KB,
BDI_RECLAIMABLE =   21632 KB, bdi_dirty_limit =   40196 KB
[  246.891877] [1]:flush-8:16 : BDI_DIRTIED =     819200 KB,
BDI_RECLAIMABLE =   61888 KB, bdi_dirty_limit =   14336 KB
[  247.095847] [1]:flush-8:16 : BDI_DIRTIED =     819200 KB,
BDI_RECLAIMABLE =   61888 KB, bdi_dirty_limit =   14592 KB
[  247.255127] [2]:flush-8:0  : BDI_DIRTIED =    1719808 KB,
BDI_RECLAIMABLE =   20480 KB, bdi_dirty_limit =   41132 KB
[  247.269750] [2]:flush-8:0  : BDI_DIRTIED =    1720192 KB,
BDI_RECLAIMABLE =   20864 KB, bdi_dirty_limit =   41124 KB
[  248.087937] [2]:flush-8:0  : BDI_DIRTIED =    1741056 KB,
BDI_RECLAIMABLE =   20736 KB, bdi_dirty_limit =   41624 KB
[  248.098499] [2]:flush-8:0  : BDI_DIRTIED =    1741056 KB,
BDI_RECLAIMABLE =   20736 KB, bdi_dirty_limit =   41624 KB
800+0 records in
800+0 records out
838860800 bytes (800.0MB) copied, 152.778511 seconds, 5.2MB/s   ->
'dd' finished for USB flash

[  251.958577] [2]:flush-8:0  : BDI_DIRTIED =    1846656 KB,
BDI_RECLAIMABLE =   22400 KB, bdi_dirty_limit =   45012 KB
[  251.973748] [2]:flush-8:0  : BDI_DIRTIED =    1846976 KB,
BDI_RECLAIMABLE =   22720 KB, bdi_dirty_limit =   45024 KB
[  252.520107] [2]:flush-8:0  : BDI_DIRTIED =    1871360 KB,
BDI_RECLAIMABLE =   22464 KB, bdi_dirty_limit =   45136 KB
[  252.534412] [2]:flush-8:0  : BDI_DIRTIED =    1871744 KB,
BDI_RECLAIMABLE =   22784 KB, bdi_dirty_limit =   45120 KB
[  252.548297] [2]:flush-8:0  : BDI_DIRTIED =    1872256 KB,
BDI_RECLAIMABLE =   23360 KB, bdi_dirty_limit =   45120 KB
[  257.032160] [2]:flush-8:0  : BDI_DIRTIED =    2001536 KB,
BDI_RECLAIMABLE =   43712 KB, bdi_dirty_limit =   45680 KB
[  257.046248] [2]:flush-8:0  : BDI_DIRTIED =    2001856 KB,
BDI_RECLAIMABLE =   44032 KB, bdi_dirty_limit =   45652 KB


 ^^^^^^^^^^^^^
After USB flash write is finished, HDD takes over completely and its
BDI dirty limit is increased.

2000+0 records in
2000+0 records out
2097152000 bytes (2.0GB) copied, 161.683544 seconds, 12.4MB/s

>
>> This continuous failure to start write-back for HDD actually results
>> in lowering the bdi_dirty_limit for HDD, in a way PAUSING the writer
>> thread for HDD.
>> This results in fewer WRITE operations per second for HDD, as the
>> ‘dd’ on USB HDD will be put to a long sleep (MAX PAUSE) in
>> balance_dirty_pages.
>>
>> For USB Flash, meanwhile, bdi_dirty_limit keeps increasing because it
>> gets more chances to flush dirty data in over_bground_thresh, as
>> bdi_reclaimable > bdi_dirty_limit is true. This results in more WRITE
>> operations per second for USB Flash.
>> From these observations, we feel that these changes might not be
>> needed. Please let us know in case we are missing any point here;
>> we can check further on this.
>   Well, at least we know changing the condition has unexpected side
> effects. I'd like to understand those before discarding the idea - because
> in your setup flusher thread must end up writing rather small amount of
> pages in each run when it's running continuously and that's not too good
> either...
Yes, we were also surprised by the drop in write speed with this change.
We are keen to check more on this; please let us know if you need any
other information.

>
>                                                                 Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .



Thread overview: 18+ messages
2012-12-30  5:59 [PATCH] writeback: fix writeback cache thrashing Namjae Jeon
2012-12-31 11:30 ` Jan Kara
2013-01-01  0:51   ` Wanpeng Li
     [not found]   ` <20130101005104.GA23383@hacker.(null)>
2013-01-02 13:43     ` Jan Kara
2013-01-03  4:35       ` Namjae Jeon
2013-01-04  0:59         ` Simon Jeons
2013-01-04  7:41           ` Namjae Jeon
2013-01-05  0:46             ` Simon Jeons
2013-01-05  3:26               ` Fengguang Wu
2013-01-05  5:26                 ` Simon Jeons
2013-01-05  7:38                   ` Fengguang Wu
2013-01-05  9:41                     ` Simon Jeons
2013-01-05  9:55                       ` Fengguang Wu
2013-01-05  3:18 ` Fengguang Wu
2013-01-09  8:26   ` Namjae Jeon
2013-01-09 15:13     ` Jan Kara
2013-01-10  2:50       ` Wanpeng Li
2013-01-10 11:58       ` Namjae Jeon
