Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"peterz@infradead.org" <peterz@infradead.org>
Subject: Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread
Date: Thu, 24 Jun 2010 09:42:37 +1000	[thread overview]
Message-ID: <20100623234237.GA23223@dastard> (raw)
In-Reply-To: <20100623062540.GA25103@localhost>

On Wed, Jun 23, 2010 at 02:25:40PM +0800, Wu Fengguang wrote:
> On Wed, Jun 23, 2010 at 02:03:19PM +0800, Dave Chinner wrote:
> > On Wed, Jun 23, 2010 at 11:22:13AM +0800, Wu Fengguang wrote:
> > > On Wed, Jun 23, 2010 at 11:06:04AM +0800, Dave Chinner wrote:
> > > > On Wed, Jun 23, 2010 at 09:34:26AM +0800, Wu Fengguang wrote:
> > > > > On Wed, Jun 23, 2010 at 06:45:51AM +0800, Dave Chinner wrote:
> > > > > > By default we set QUEUE_FLAG_SAME_COMP, which means we hand
> > > > > > completions back to the submitter CPU during blk_complete_request().
> > > > > > Completion processing is then handled by a softirq on the CPU
> > > > > > selected for completion processing.
> > > > > 
> > > > > Good to know about that, thanks!
> > > > > 
> > > > > > This was done, IIRC, because it provided some OLTP benchmark 1-2%
> > > > > > better results. It can, however, be turned off via
> > > > > > /sys/block/<foo>/queue/rq_affinity, and there's no guarantee that
> > > > > > the completion processing doesn't get handled off to some other CPU
> > > > > > (e.g. via a workqueue) so we cannot rely on this completion
> > > > > > behaviour to avoid cacheline bouncing.
> > > > > 
> > > > > If rq_affinity does not work reliably somewhere in the IO completion
> > > > > path, why not trying to fix it?
> > > > 
> > > > Because completion on the submitter CPU is not ideal for high
> > > > bandwidth buffered IO.
> > > 
> > > Yes there may be heavy post-processing for read data, however for writes
> > > it is mainly the pre-processing that costs CPU?
> > 
> > Could be either - delayed allocation requires significant pre-processing
> > for allocation. Avoiding this by using preallocation just
> > moves the processing load to IO completion which needs to issue
> > transactions to mark the region written.
> 
> Good point, thanks.
> 
> > > So perfect rq_affinity
> > > should always benefit write IO?
> > 
> > No, because the flusher thread gets to be CPU bound just writing
> > pages, allocating blocks and submitting IO. It might take 5-10GB/s
> > to get there (say a million dirty pages a second being processed by
> > a single CPU), but that's the sort of storage subsystem XFS is
> > capable of driving. IO completion time for such a workload is
> > significant, too, so putting that on the same CPU as the flusher
> > thread will slow things down by far more than gain from avoiding
> > cacheline bouncing.
> 
> So super fast storage is going to demand multiple flushers per bdi.
> And once we run multiple flushers for one bdi, it will again be
> beneficial to schedule IO completion to the flusher CPU :)

Yes - that is where we want to get to with XFS. But we don't have
multiple bdi-flusher thread support yet for any filesystem, so
I think it will be a while before the we can ignore this issue...

Cheers,

Dave.> 

-- 
Dave Chinner
david@fromorbit.com

WARNING: multiple messages have this Message-ID (diff)

From: Dave Chinner <david@fromorbit.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"peterz@infradead.org" <peterz@infradead.org>
Subject: Re: [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread
Date: Thu, 24 Jun 2010 09:42:37 +1000	[thread overview]
Message-ID: <20100623234237.GA23223@dastard> (raw)
In-Reply-To: <20100623062540.GA25103@localhost>

On Wed, Jun 23, 2010 at 02:25:40PM +0800, Wu Fengguang wrote:
> On Wed, Jun 23, 2010 at 02:03:19PM +0800, Dave Chinner wrote:
> > On Wed, Jun 23, 2010 at 11:22:13AM +0800, Wu Fengguang wrote:
> > > On Wed, Jun 23, 2010 at 11:06:04AM +0800, Dave Chinner wrote:
> > > > On Wed, Jun 23, 2010 at 09:34:26AM +0800, Wu Fengguang wrote:
> > > > > On Wed, Jun 23, 2010 at 06:45:51AM +0800, Dave Chinner wrote:
> > > > > > By default we set QUEUE_FLAG_SAME_COMP, which means we hand
> > > > > > completions back to the submitter CPU during blk_complete_request().
> > > > > > Completion processing is then handled by a softirq on the CPU
> > > > > > selected for completion processing.
> > > > > 
> > > > > Good to know about that, thanks!
> > > > > 
> > > > > > This was done, IIRC, because it provided some OLTP benchmark 1-2%
> > > > > > better results. It can, however, be turned off via
> > > > > > /sys/block/<foo>/queue/rq_affinity, and there's no guarantee that
> > > > > > the completion processing doesn't get handled off to some other CPU
> > > > > > (e.g. via a workqueue) so we cannot rely on this completion
> > > > > > behaviour to avoid cacheline bouncing.
> > > > > 
> > > > > If rq_affinity does not work reliably somewhere in the IO completion
> > > > > path, why not trying to fix it?
> > > > 
> > > > Because completion on the submitter CPU is not ideal for high
> > > > bandwidth buffered IO.
> > > 
> > > Yes there may be heavy post-processing for read data, however for writes
> > > it is mainly the pre-processing that costs CPU?
> > 
> > Could be either - delayed allocation requires significant pre-processing
> > for allocation. Avoiding this by using preallocation just
> > moves the processing load to IO completion which needs to issue
> > transactions to mark the region written.
> 
> Good point, thanks.
> 
> > > So perfect rq_affinity
> > > should always benefit write IO?
> > 
> > No, because the flusher thread gets to be CPU bound just writing
> > pages, allocating blocks and submitting IO. It might take 5-10GB/s
> > to get there (say a million dirty pages a second being processed by
> > a single CPU), but that's the sort of storage subsystem XFS is
> > capable of driving. IO completion time for such a workload is
> > significant, too, so putting that on the same CPU as the flusher
> > thread will slow things down by far more than gain from avoiding
> > cacheline bouncing.
> 
> So super fast storage is going to demand multiple flushers per bdi.
> And once we run multiple flushers for one bdi, it will again be
> beneficial to schedule IO completion to the flusher CPU :)

Yes - that is where we want to get to with XFS. But we don't have
multiple bdi-flusher thread support yet for any filesystem, so
I think it will be a while before the we can ignore this issue...

Cheers,

Dave.> 

-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-06-23 23:42 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-17 18:04 [PATCH RFC] mm: Implement balance_dirty_pages() through waiting for flusher thread Jan Kara
2010-06-17 18:04 ` Jan Kara
2010-06-18  6:09 ` Dave Chinner
2010-06-18  9:11   ` Peter Zijlstra
2010-06-18 23:29     ` Dave Chinner
2010-06-21 23:36   ` Jan Kara
2010-06-22  5:44     ` Dave Chinner
2010-06-22  6:14       ` Andrew Morton
2010-06-22  7:45         ` Peter Zijlstra
2010-06-22  8:24           ` Andrew Morton
2010-06-22  8:52             ` Peter Zijlstra
2010-06-22 10:09         ` Dave Chinner
2010-06-22 13:17           ` Jan Kara
2010-06-22 13:17             ` Jan Kara
2010-06-22 13:52             ` Wu Fengguang
2010-06-22 13:52               ` Wu Fengguang
2010-06-22 13:59               ` Peter Zijlstra
2010-06-22 13:59                 ` Peter Zijlstra
2010-06-22 14:00               ` Peter Zijlstra
2010-06-22 14:36                 ` Wu Fengguang
2010-06-22 14:02               ` Jan Kara
2010-06-22 14:02                 ` Jan Kara
2010-06-22 14:24                 ` Wu Fengguang
2010-06-22 14:24                   ` Wu Fengguang
2010-06-22 22:29                 ` Dave Chinner
2010-06-23 13:15                   ` Jan Kara
2010-06-23 13:15                     ` Jan Kara
2010-06-23 23:06                     ` Dave Chinner
2010-06-22 14:31               ` Christoph Hellwig
2010-06-22 14:31                 ` Christoph Hellwig
2010-06-22 14:38                 ` Jan Kara
2010-06-22 14:38                   ` Jan Kara
2010-06-22 22:45                   ` Dave Chinner
2010-06-23  1:34                     ` Wu Fengguang
2010-06-23  1:34                       ` Wu Fengguang
2010-06-23  3:06                       ` Dave Chinner
2010-06-23  3:22                         ` Wu Fengguang
2010-06-23  3:22                           ` Wu Fengguang
2010-06-23  6:03                           ` Dave Chinner
2010-06-23  6:03                             ` Dave Chinner
2010-06-23  6:25                             ` Wu Fengguang
2010-06-23  6:25                               ` Wu Fengguang
2010-06-23 23:42                               ` Dave Chinner [this message]
2010-06-23 23:42                                 ` Dave Chinner
2010-06-22 14:41                 ` Wu Fengguang
2010-06-22 11:19       ` Jan Kara
2010-06-22 11:19         ` Jan Kara
2010-06-18 10:21 ` Peter Zijlstra
2010-06-21 13:31   ` Jan Kara
2010-06-18 10:21 ` Peter Zijlstra
2010-06-21 14:02   ` Jan Kara
2010-06-21 14:02     ` Jan Kara
2010-06-21 14:10     ` Jan Kara
2010-06-21 14:10       ` Jan Kara
2010-06-21 14:12       ` Peter Zijlstra
2010-06-18 10:21 ` Peter Zijlstra
2010-06-21 13:42   ` Jan Kara
2010-06-21 13:42     ` Jan Kara
2010-06-22  4:07     ` Wu Fengguang
2010-06-22  4:07       ` Wu Fengguang
2010-06-22 13:27       ` Jan Kara
2010-06-22 13:27         ` Jan Kara
2010-06-22 13:33         ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100623234237.GA23223@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.