From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"peterz@infradead.org" <peterz@infradead.org>
Subject: Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread
Date: Wed, 23 Jun 2010 11:22:13 +0800
Message-ID: <20100623032213.GA13068@localhost>
In-Reply-To: <20100623030604.GM6590@dastard>

On Wed, Jun 23, 2010 at 11:06:04AM +0800, Dave Chinner wrote:
> On Wed, Jun 23, 2010 at 09:34:26AM +0800, Wu Fengguang wrote:
> > On Wed, Jun 23, 2010 at 06:45:51AM +0800, Dave Chinner wrote:
> > > On Tue, Jun 22, 2010 at 04:38:56PM +0200, Jan Kara wrote:
> > > > On Tue 22-06-10 10:31:24, Christoph Hellwig wrote:
> > > > > On Tue, Jun 22, 2010 at 09:52:34PM +0800, Wu Fengguang wrote:
> > > > > > 2) most writeback will be submitted by one per-bdi-flusher, so no worry
> > > > > > of cache bouncing (this also means the per CPU counter error is
> > > > > > normally bounded by the batch size)
> > > > >
> > > > > What counter are we talking about exactly? Once balance_dirty_pages
> > > > The new per-bdi counter I'd like to introduce.
> > > >
> > > > > stops submitting I/O the per-bdi flusher thread will in fact be
> > > > > the only thing submitting writeback, unless you count direct invocations
> > > > > of writeback_single_inode.
> > > > Yes, I agree that the per-bdi flusher thread should be the only thread
> > > > submitting lots of IO (there is direct reclaim or kswapd if we change
> > > > direct reclaim but those should be negligible). So does this mean that
> > > > also I/O completions will be local to the CPU running per-bdi flusher
> > > > thread? Because the counter is incremented from the I/O completion
> > > > callback.
> > >
> > > By default we set QUEUE_FLAG_SAME_COMP, which means we hand
> > > completions back to the submitter CPU during blk_complete_request().
> > > Completion processing is then handled by a softirq on the CPU
> > > selected for completion processing.
> >
> > Good to know about that, thanks!
> >
> > > This was done, IIRC, because it provided some OLTP benchmark 1-2%
> > > better results. It can, however, be turned off via
> > > /sys/block/<foo>/queue/rq_affinity, and there's no guarantee that
> > > the completion processing doesn't get handed off to some other CPU
> > > (e.g. via a workqueue) so we cannot rely on this completion
> > > behaviour to avoid cacheline bouncing.
> >
> > If rq_affinity does not work reliably somewhere in the IO completion
> > path, why not try to fix it?
>
> Because completion on the submitter CPU is not ideal for high
> bandwidth buffered IO.

Yes, there may be heavy post-processing for read data. For writes,
however, isn't it mainly the pre-processing that costs CPU? If so,
perfect rq_affinity should always benefit write IO?

> > Otherwise all the page/mapping/zone
> > cachelines covered by test_set_page_writeback()/test_clear_page_writeback()
> > (and more other functions) will also be bounced.
>
> Yes, but when the flusher thread is approaching being CPU bound for
> high throughput IO, bouncing cachelines to another CPU during
> completion costs far less in terms of throughput compared to
> reducing the amount of time available to issue IO on that CPU.

Yes, reasonable for reads.

> > Another option is to put atomic accounting into test_set_page_writeback(),
> > i.e. the IO submission path. This actually matches the current
> > balance_dirty_pages() behavior. It may then block on get_request().
> > The downside is, get_request() blocks until queue depth goes down
> > from nr_congestion_on to nr_congestion_off, which is not as smooth as
> > the IO completion path. As a result balance_dirty_pages() may get
> > delayed much more than necessary when there is only 1 waiter, and
> > wake up multiple waiters in bursts.
>
> Being reliant on the block layer queuing behaviour for VM congestion
> control is exactly the problem we are trying to avoid...

Yes, this is not a good option. That paragraph reads more like a
statement of a potential benefit of the proposed patch :)

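To make the burstiness concrete, here is a toy user-space model (Python,
not kernel code). It assumes one request completes per tick and that
waiters refill the queue as soon as they are let in. The function name,
the tick model and the thresholds 8/4 are made up for illustration; they
only mimic the nr_congestion_on/nr_congestion_off hysteresis described
in the quoted paragraph.

```python
def admitted_per_tick(ticks, on, off, hysteresis):
    """Return how many new requests are admitted on each tick.

    on/off stand in for nr_congestion_on/nr_congestion_off (assumed values).
    hysteresis=True : waiters stay blocked until depth has fallen to `off`.
    hysteresis=False: waiters are woken on every completion.
    """
    depth = on                       # start with a full (congested) queue
    admitted = []
    for _ in range(ticks):
        depth -= 1                   # exactly one request completes per tick
        wake_at = off if hysteresis else on - 1
        if depth <= wake_at:
            n = on - depth           # waiters refill the queue up to `on`
            depth = on
        else:
            n = 0                    # still blocked, nobody is admitted
        admitted.append(n)
    return admitted


if __name__ == "__main__":
    print(admitted_per_tick(8, on=8, off=4, hysteresis=True))   # [0, 0, 0, 4, 0, 0, 0, 4]
    print(admitted_per_tick(8, on=8, off=4, hysteresis=False))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

With on=8 and off=4 the hysteresis case admits nothing for three ticks
and then four requests at once, while waking on every completion admits
one request per tick. The total over 8 ticks is the same in both cases.
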
Thanks,
Fengguang