Re: [PATCH] writeback: fix writeback cache thrashing

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Simon Jeons <simon.jeons@gmail.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Namjae Jeon <linkinjeon@gmail.com>, Jan Kara <jack@suse.cz>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org,
	Namjae Jeon <namjae.jeon@samsung.com>,
	Vivek Trivedi <t.vivek@samsung.com>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH] writeback: fix writeback cache thrashing
Date: Sat, 05 Jan 2013 03:41:54 -0600	[thread overview]
Message-ID: <1357378914.8716.3.camel@kernel.cn.ibm.com> (raw)
In-Reply-To: <20130105073846.GA11811@localhost>

On Sat, 2013-01-05 at 15:38 +0800, Fengguang Wu wrote:
> On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> > On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > > Hi Namjae,
> > > > > >
> > > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > > error of a stat counter"?
> > > > > Hi Simon,
> > > > > 
> > > > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > > > percpu counters.
> > > > > When these percpu counters are incremented/decremented simultaneously
> > > > > on multiple CPUs by small amount (individual cpu counter less than
> > > > > threshold BDI_STAT_BATCH),
> > > > > it is possible that we get approximate value (not exact value) of
> > > > > these percpu counters.
> > > > > In order, to handle these percpu counter error we have used
> > > > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > > > in percpu bdi stats accounting.
> > > > > 
> > > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > > > previous value of percpu count.
> > > > > 
> > > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > > > and add current percpu count of individual CPUs.
> > > > >    It is not recommended to use it frequently as it is expensive. We
> > > > > can better use “bdi_stat” and work with approx value of bdi stats.
> > > > > 
> > > > 
> > > > Hi Namjae, thanks for your clarify.
> > > > 
> > > > But why compare error stat count to bdi_bground_thresh? What's the
> > > 
> > > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > > in concept, comparing bdi_stat (with error bound adjustments) to
> > > bdi_bground_thresh.
> > > 
> > > > relationship between them? I also see bdi_stat_error compare to
> > > > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> > > 
> > 
> > Hi Fengguang,
> > 
> > > Here, it's trying to use bdi_stat_sum(), the accurate (however more
> > > costly) version of bdi_stat(), if the error would possibly be large:
> > 
> > Why error is large use bdi_stat_sum and error is few use bdi_stat?
> 

Thanks for your response Fengguang! :)

> It's the opposite. Please check this per-cpu counter routine to get an idea:
> 
> /*
>  * Add up all the per-cpu counts, return the result.  This is a more accurate
>  * but much slower version of percpu_counter_read_positive()
>  */                                                 
> s64 __percpu_counter_sum(struct percpu_counter *fbc)
> 
> > > 
> > >                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> > >                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 } else {
> > >                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 }
> > > 

The comment above these codes:

                 * In order to avoid the stacked BDI deadlock we need
                 * to ensure we accurately count the 'dirty' pages when
                 * the threshold is low.

Why your meaning threshold low is error large? 


> > > Here the comment should have explained it well:
> > > 
> > >                  * In theory 1 page is enough to keep the comsumer-producer
> > >                  * pipe going: the flusher cleans 1 page => the task dirties 1
> > >                  * more page. However bdi_dirty has accounting errors.  So use
> > 
> > Why bdi_dirty has accounting errors?
> 
> Because it typically uses bdi_stat() to get the rough sum of the per-cpu
> counters.
>  
> Thanks,
> Fengguang
> 
> > >                  * the larger and more IO friendly bdi_stat_error.
> > >                  */
> > >                 if (bdi_dirty <= bdi_stat_error(bdi))
> > >                         break;
> > > 
> > > 
> > > Thanks,
> > > Fengguang
> > 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Simon Jeons <simon.jeons@gmail.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Namjae Jeon <linkinjeon@gmail.com>, Jan Kara <jack@suse.cz>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Namjae Jeon <namjae.jeon@samsung.com>,
	Vivek Trivedi <t.vivek@samsung.com>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH] writeback: fix writeback cache thrashing
Date: Sat, 05 Jan 2013 03:41:54 -0600	[thread overview]
Message-ID: <1357378914.8716.3.camel@kernel.cn.ibm.com> (raw)
In-Reply-To: <20130105073846.GA11811@localhost>

On Sat, 2013-01-05 at 15:38 +0800, Fengguang Wu wrote:
> On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> > On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > > Hi Namjae,
> > > > > >
> > > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > > error of a stat counter"?
> > > > > Hi Simon,
> > > > > 
> > > > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK a?|) are kept in
> > > > > percpu counters.
> > > > > When these percpu counters are incremented/decremented simultaneously
> > > > > on multiple CPUs by small amount (individual cpu counter less than
> > > > > threshold BDI_STAT_BATCH),
> > > > > it is possible that we get approximate value (not exact value) of
> > > > > these percpu counters.
> > > > > In order, to handle these percpu counter error we have used
> > > > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > > > in percpu bdi stats accounting.
> > > > > 
> > > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > > > previous value of percpu count.
> > > > > 
> > > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > > > and add current percpu count of individual CPUs.
> > > > >    It is not recommended to use it frequently as it is expensive. We
> > > > > can better use a??bdi_stata?? and work with approx value of bdi stats.
> > > > > 
> > > > 
> > > > Hi Namjae, thanks for your clarify.
> > > > 
> > > > But why compare error stat count to bdi_bground_thresh? What's the
> > > 
> > > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > > in concept, comparing bdi_stat (with error bound adjustments) to
> > > bdi_bground_thresh.
> > > 
> > > > relationship between them? I also see bdi_stat_error compare to
> > > > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> > > 
> > 
> > Hi Fengguang,
> > 
> > > Here, it's trying to use bdi_stat_sum(), the accurate (however more
> > > costly) version of bdi_stat(), if the error would possibly be large:
> > 
> > Why error is large use bdi_stat_sum and error is few use bdi_stat?
> 

Thanks for your response Fengguang! :)

> It's the opposite. Please check this per-cpu counter routine to get an idea:
> 
> /*
>  * Add up all the per-cpu counts, return the result.  This is a more accurate
>  * but much slower version of percpu_counter_read_positive()
>  */                                                 
> s64 __percpu_counter_sum(struct percpu_counter *fbc)
> 
> > > 
> > >                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> > >                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 } else {
> > >                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 }
> > > 

The comment above these codes:

                 * In order to avoid the stacked BDI deadlock we need
                 * to ensure we accurately count the 'dirty' pages when
                 * the threshold is low.

Why your meaning threshold low is error large? 


> > > Here the comment should have explained it well:
> > > 
> > >                  * In theory 1 page is enough to keep the comsumer-producer
> > >                  * pipe going: the flusher cleans 1 page => the task dirties 1
> > >                  * more page. However bdi_dirty has accounting errors.  So use
> > 
> > Why bdi_dirty has accounting errors?
> 
> Because it typically uses bdi_stat() to get the rough sum of the per-cpu
> counters.
>  
> Thanks,
> Fengguang
> 
> > >                  * the larger and more IO friendly bdi_stat_error.
> > >                  */
> > >                 if (bdi_dirty <= bdi_stat_error(bdi))
> > >                         break;
> > > 
> > > 
> > > Thanks,
> > > Fengguang
> > 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Simon Jeons <simon.jeons@gmail.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Namjae Jeon <linkinjeon@gmail.com>, Jan Kara <jack@suse.cz>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Namjae Jeon <namjae.jeon@samsung.com>,
	Vivek Trivedi <t.vivek@samsung.com>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH] writeback: fix writeback cache thrashing
Date: Sat, 05 Jan 2013 03:41:54 -0600	[thread overview]
Message-ID: <1357378914.8716.3.camel@kernel.cn.ibm.com> (raw)
In-Reply-To: <20130105073846.GA11811@localhost>

On Sat, 2013-01-05 at 15:38 +0800, Fengguang Wu wrote:
> On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> > On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > > Hi Namjae,
> > > > > >
> > > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > > error of a stat counter"?
> > > > > Hi Simon,
> > > > > 
> > > > > As you know bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > > > percpu counters.
> > > > > When these percpu counters are incremented/decremented simultaneously
> > > > > on multiple CPUs by small amount (individual cpu counter less than
> > > > > threshold BDI_STAT_BATCH),
> > > > > it is possible that we get approximate value (not exact value) of
> > > > > these percpu counters.
> > > > > In order, to handle these percpu counter error we have used
> > > > > bdi_stat_error. bdi_stat_error is the maximum error which can happen
> > > > > in percpu bdi stats accounting.
> > > > > 
> > > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > > >  -> This will give approximate value of BDI_RECLAIMABLE by reading
> > > > > previous value of percpu count.
> > > > > 
> > > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > > >  ->This will give exact value of BDI_RECLAIMABLE. It will take lock
> > > > > and add current percpu count of individual CPUs.
> > > > >    It is not recommended to use it frequently as it is expensive. We
> > > > > can better use “bdi_stat” and work with approx value of bdi stats.
> > > > > 
> > > > 
> > > > Hi Namjae, thanks for your clarify.
> > > > 
> > > > But why compare error stat count to bdi_bground_thresh? What's the
> > > 
> > > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > > in concept, comparing bdi_stat (with error bound adjustments) to
> > > bdi_bground_thresh.
> > > 
> > > > relationship between them? I also see bdi_stat_error compare to
> > > > bdi_thresh/bdi_dirty in function balance_dirty_pages. 
> > > 
> > 
> > Hi Fengguang,
> > 
> > > Here, it's trying to use bdi_stat_sum(), the accurate (however more
> > > costly) version of bdi_stat(), if the error would possibly be large:
> > 
> > Why error is large use bdi_stat_sum and error is few use bdi_stat?
> 

Thanks for your response Fengguang! :)

> It's the opposite. Please check this per-cpu counter routine to get an idea:
> 
> /*
>  * Add up all the per-cpu counts, return the result.  This is a more accurate
>  * but much slower version of percpu_counter_read_positive()
>  */                                                 
> s64 __percpu_counter_sum(struct percpu_counter *fbc)
> 
> > > 
> > >                 if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> > >                         bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 } else {
> > >                         bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > >                         //...
> > >                 }
> > > 

The comment above these codes:

                 * In order to avoid the stacked BDI deadlock we need
                 * to ensure we accurately count the 'dirty' pages when
                 * the threshold is low.

Why your meaning threshold low is error large? 


> > > Here the comment should have explained it well:
> > > 
> > >                  * In theory 1 page is enough to keep the comsumer-producer
> > >                  * pipe going: the flusher cleans 1 page => the task dirties 1
> > >                  * more page. However bdi_dirty has accounting errors.  So use
> > 
> > Why bdi_dirty has accounting errors?
> 
> Because it typically uses bdi_stat() to get the rough sum of the per-cpu
> counters.
>  
> Thanks,
> Fengguang
> 
> > >                  * the larger and more IO friendly bdi_stat_error.
> > >                  */
> > >                 if (bdi_dirty <= bdi_stat_error(bdi))
> > >                         break;
> > > 
> > > 
> > > Thanks,
> > > Fengguang
> >

next prev parent reply	other threads:[~2013-01-05  9:41 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-30  5:59 [PATCH] writeback: fix writeback cache thrashing Namjae Jeon
2012-12-30  5:59 ` Namjae Jeon
2012-12-31 11:30 ` Jan Kara
2012-12-31 11:30   ` Jan Kara
2013-01-01  0:51   ` Wanpeng Li
2013-01-02 13:43     ` Jan Kara
2013-01-02 13:43       ` Jan Kara
2013-01-03  4:35       ` Namjae Jeon
2013-01-03  4:35         ` Namjae Jeon
2013-01-04  0:59         ` Simon Jeons
2013-01-04  0:59           ` Simon Jeons
2013-01-04  7:41           ` Namjae Jeon
2013-01-04  7:41             ` Namjae Jeon
2013-01-04  7:41             ` Namjae Jeon
2013-01-05  0:46             ` Simon Jeons
2013-01-05  0:46               ` Simon Jeons
2013-01-05  0:46               ` Simon Jeons
2013-01-05  3:26               ` Fengguang Wu
2013-01-05  3:26                 ` Fengguang Wu
2013-01-05  3:26                 ` Fengguang Wu
2013-01-05  5:26                 ` Simon Jeons
2013-01-05  5:26                   ` Simon Jeons
2013-01-05  5:26                   ` Simon Jeons
2013-01-05  7:38                   ` Fengguang Wu
2013-01-05  7:38                     ` Fengguang Wu
2013-01-05  7:38                     ` Fengguang Wu
2013-01-05  9:41                     ` Simon Jeons [this message]
2013-01-05  9:41                       ` Simon Jeons
2013-01-05  9:41                       ` Simon Jeons
2013-01-05  9:55                       ` Fengguang Wu
2013-01-05  9:55                         ` Fengguang Wu
2013-01-05  9:55                         ` Fengguang Wu
2013-01-01  0:51   ` Wanpeng Li
2013-01-05  3:18 ` Fengguang Wu
2013-01-05  3:18   ` Fengguang Wu
2013-01-09  8:26   ` Namjae Jeon
2013-01-09  8:26     ` Namjae Jeon
2013-01-09 15:13     ` Jan Kara
2013-01-09 15:13       ` Jan Kara
2013-01-09 15:13       ` Jan Kara
2013-01-10  2:50       ` Wanpeng Li
2013-01-10  2:50       ` Wanpeng Li
2013-01-10  2:50         ` Wanpeng Li
2013-01-10 11:58       ` Namjae Jeon
2013-01-10 11:58         ` Namjae Jeon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1357378914.8716.3.camel@kernel.cn.ibm.com \
    --to=simon.jeons@gmail.com \
    --cc=dchinner@redhat.com \
    --cc=fengguang.wu@intel.com \
    --cc=jack@suse.cz \
    --cc=linkinjeon@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liwanp@linux.vnet.ibm.com \
    --cc=namjae.jeon@samsung.com \
    --cc=t.vivek@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.