Re: [PATCH] writeback: fix writeback cache thrashing

From: Jan Kara <jack@suse.cz>
To: Namjae Jeon <linkinjeon@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, liwanp@linux.vnet.ibm.com,
	Namjae Jeon <namjae.jeon@samsung.com>,
	Vivek Trivedi <t.vivek@samsung.com>, Jan Kara <jack@suse.cz>,
	Dave Chinner <dchinner@redhat.com>,
	Simon Jeons <simon.jeons@gmail.com>
Subject: Re: [PATCH] writeback: fix writeback cache thrashing
Date: Wed, 9 Jan 2013 16:13:54 +0100
Message-ID: <20130109151354.GA17353@quack.suse.cz>
In-Reply-To: <CAKYAXd-kTOBwZfW=17Ta0wLB4HWzkk5ta3AdT0cPRK3z2zsLUA@mail.gmail.com>

On Wed 09-01-13 17:26:36, Namjae Jeon wrote:
<snip>
> But in one normal scenario, the changes actually result in
> performance degradation.
> 
> Results for ‘dd’ thread on two devices:
> Before applying Patch:
> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
> #>
> #> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.0GB) copied, 77.205276 seconds, 25.9MB/s -> USB HDD WRITE Speed
> 
> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
> #>
> #>
> #> 800+0 records in
> 800+0 records out
> 838860800 bytes (800.0MB) copied, 154.528362 seconds, 5.2MB/s -> USB Flash WRITE Speed
> 
> After applying patch:
> #> dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800 &
> #> dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000 &
> #>
> #> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.0GB) copied, 123.844770 seconds, 16.1MB/s -> USB HDD WRITE Speed
> 800+0 records in
> 800+0 records out
> 838860800 bytes (800.0MB) copied, 141.352945 seconds, 5.7MB/s -> USB Flash WRITE Speed
> 
> [2]+ Done dd if=/dev/zero of=/mnt/sda6/file2 bs=1048576 count=2000
> [1]+ Done dd if=/dev/zero of=/mnt/sdb2/file1 bs=1048576 count=800
> 
> So, after applying our changes:
> 1) USB HDD Write speed dropped from 25.9 -> 16.1 MB/s
> 2) USB Flash Write speed increased marginally from 5.2 -> 5.7 MB/s
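  For reference, the rates above work out if dd's "MB/s" is read as MiB/s
(1048576 bytes). That unit is an assumption, inferred only from the numbers
matching. A minimal sketch of the arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Throughput in MiB/s (1 MiB = 1048576 bytes). Assumption: this is the
 * unit behind the "MB/s" figures in the dd output quoted above. */
double mib_per_sec(unsigned long long bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / 1048576.0 / seconds;
}

/* Relative change from `before` to `after`, in percent.
 * A negative result means the rate went down. */
double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the figures quoted above, mib_per_sec(2097152000ULL, 77.205276) comes to
about 25.9 and mib_per_sec(2097152000ULL, 123.844770) to about 16.1, i.e. a
drop of roughly 38% for the HDD against a gain of roughly 9% for the flash.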
> 
> Normally, if a USB flash drive and a USB HDD are both plugged into the
> system and we start ‘dd’ on both devices, flushing starts for all BDIs
> once dirty memory exceeds the background threshold (write-back for the
> devices is kicked by this condition):
> if (global_page_state(NR_FILE_DIRTY) +
>     global_page_state(NR_UNSTABLE_NFS) > background_thresh)
> 	return true;
> Between them, the slow device and the fast device always keep enough
> dirty data in memory to kick write-back.
> Since the USB flash is slow, the number of dirty pages belonging to it
> is much higher, so over_bground_thresh() returns ‘true’ every time. So
> even though the HDD might have only a few KB of dirty data, it is also
> flushed immediately.
> This frequent flushing of HDD data gradually increases
> bdi_dirty_limit() for the HDD.
  Interesting. Thanks for testing! So is this just a problem with the initial
writeout fraction estimation? I.e., if you first let the dd to the USB HDD
run for a couple of seconds to ramp up its fraction and only then start
writeout to the USB flash, is there still a problem with USB HDD throughput
with the changed over_bground_thresh() function?

> But when we introduce the change to control this per BDI, i.e.,
>  if (global_page_state(NR_FILE_DIRTY) +
>          global_page_state(NR_UNSTABLE_NFS) > background_thresh &&
>          reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh)
> 
> Now, in the same scenario, writeback for the HDD will be kicked only
> if ‘reclaimable * 2 + bdi_stat_error(bdi) * 2 > bdi_bground_thresh’.
> But this condition is false much of the time, so over_bground_thresh()
> returns false.
  I'm surprised it's not true so often... dd(1) should easily fill the
caches. But maybe we are oscillating between below-background-threshold
and at-dirty-limit situations rather quickly. Do you have recordings of
BDI_RECLAIMABLE and BDI_DIRTY from the problematic run?
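
  To make the difference concrete, here is a minimal user-space sketch of the
two conditions quoted above. It is not kernel code: the struct and function
names are hypothetical, the counters are plain inputs rather than
global_page_state()/bdi_stat() reads, and how bdi_bground_thresh is derived
is left out because the quoted snippet does not show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the counters the two conditions read. */
struct wb_snapshot {
    unsigned long nr_file_dirty;      /* global NR_FILE_DIRTY */
    unsigned long nr_unstable_nfs;    /* global NR_UNSTABLE_NFS */
    unsigned long background_thresh;  /* global background threshold */
    unsigned long bdi_reclaimable;    /* this BDI's reclaimable pages */
    unsigned long bdi_stat_error;     /* this BDI's counter error margin */
    unsigned long bdi_bground_thresh; /* this BDI's background threshold */
};

/* Condition before the patch: only the global dirty count matters. */
bool over_global_only(const struct wb_snapshot *s)
{
    assert(s != NULL);
    return s->nr_file_dirty + s->nr_unstable_nfs > s->background_thresh;
}

/* Condition with the patch: the global check AND a per-BDI check. */
bool over_global_and_bdi(const struct wb_snapshot *s)
{
    assert(s != NULL);
    return over_global_only(s) &&
           s->bdi_reclaimable * 2 + s->bdi_stat_error * 2 >
               s->bdi_bground_thresh;
}
```

With global dirty pages above background_thresh, a BDI holding only a handful
of reclaimable pages passes the first check but fails the second, which is
the HDD behaviour described in the quoted text.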

> This continuous failure to start write-back for the HDD lowers
> bdi_dirty_limit for the HDD, in effect pausing its writer thread.
> This results in fewer WRITE operations per second for the HDD, as the
> ‘dd’ on the USB HDD is put into a long sleep (MAX PAUSE) in
> balance_dirty_pages().
> 
> For the USB flash, meanwhile, bdi_dirty_limit keeps increasing because
> it gets more chances to flush dirty data in over_bground_thresh(), as
> bdi_reclaimable > bdi_dirty_limit is true. This results in more WRITE
> operations per second for the USB flash.
> From these observations, we feel that these changes might not be
> needed. Please let us know if we are missing any point here; we can
> check further.
  Well, at least we know changing the condition has unexpected side
effects. I'd like to understand those before discarding the idea - because
in your setup the flusher thread must end up writing a rather small number
of pages in each run when it's running continuously, and that's not too
good either...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
