Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread


From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"peterz@infradead.org" <peterz@infradead.org>
Subject: Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread
Date: Wed, 23 Jun 2010 09:34:26 +0800
Message-ID: <20100623013426.GA6706@localhost>
In-Reply-To: <20100622224551.GS7869@dastard>

On Wed, Jun 23, 2010 at 06:45:51AM +0800, Dave Chinner wrote:
> On Tue, Jun 22, 2010 at 04:38:56PM +0200, Jan Kara wrote:
> > On Tue 22-06-10 10:31:24, Christoph Hellwig wrote:
> > > On Tue, Jun 22, 2010 at 09:52:34PM +0800, Wu Fengguang wrote:
> > > > 2) most writeback will be submitted by one per-bdi-flusher, so no worry
> > > >    of cache bouncing (this also means the per CPU counter error is
> > > >    normally bounded by the batch size)
> > > 
> > > > What counter are we talking about exactly?  Once balance_dirty_pages
> >   The new per-bdi counter I'd like to introduce.
> > 
> > > stops submitting I/O the per-bdi flusher thread will in fact be
> > > the only thing submitting writeback, unless you count direct invocations
> > > of writeback_single_inode.
> >   Yes, I agree that the per-bdi flusher thread should be the only thread
> > submitting lots of IO (there is direct reclaim or kswapd if we change
> > direct reclaim but those should be negligible). So does this mean that
> > also I/O completions will be local to the CPU running per-bdi flusher
> > thread? Because the counter is incremented from the I/O completion
> > callback.
> 
> By default we set QUEUE_FLAG_SAME_COMP, which means we hand
> completions back to the submitter CPU during blk_complete_request().
> Completion processing is then handled by a softirq on the CPU
> selected for completion processing.

Good to know about that, thanks!

> This was done, IIRC, because it provided some OLTP benchmark 1-2%
> better results. It can, however, be turned off via
> /sys/block/<foo>/queue/rq_affinity, and there's no guarantee that
> the completion processing doesn't get handed off to some other CPU
> (e.g. via a workqueue) so we cannot rely on this completion
> behaviour to avoid cacheline bouncing.

If rq_affinity does not work reliably somewhere in the IO completion
path, why not try to fix it? Otherwise all the page/mapping/zone
cachelines touched by test_set_page_writeback()/test_clear_page_writeback()
(and other functions) will be bounced as well.

Another option is to put atomic accounting into test_set_page_writeback(),
i.e. the IO submission path. This actually matches the current
balance_dirty_pages() behavior. It may then block on get_request().
The downside is that get_request() blocks until the queue depth drops
from nr_congestion_on to nr_congestion_off, which is not as smooth as
the IO completion path. As a result balance_dirty_pages() may be
delayed much longer than necessary when there is only 1 waiter, and
may wake up multiple waiters in bursts.

Thanks,
Fengguang
