From: Dave Chinner <david@fromorbit.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@linux.vnet.ibm.com>, Mel Gorman <mel@csn.ul.ie>,
Itaru Kitayama <kitayama@cl.bb4u.ne.jp>,
Minchan Kim <minchan.kim@gmail.com>,
Linux Memory Management List <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
"Li, Shaohua" <shaohua.li@intel.com>
Subject: Re: [RFC][PATCH v2] writeback: limit number of moved inodes in queue_io()
Date: Sat, 7 May 2011 09:06:19 +1000
Message-ID: <20110506230619.GG26837@dastard>
In-Reply-To: <20110506100648.GA3435@localhost>
On Fri, May 06, 2011 at 06:06:48PM +0800, Wu Fengguang wrote:
> On Fri, May 06, 2011 at 04:42:38PM +0800, Wu Fengguang wrote:
> > > patched trace-tar-dd-ext4-2.6.39-rc3+
> >
> > > flush-8:0-3048 [004] 1929.981734: writeback_queue_io: bdi 8:0: older=4296600898 age=2 enqueue=13227
> >
> > > vanilla trace-tar-dd-ext4-2.6.39-rc3
> >
> > > flush-8:0-2911 [004] 77.158312: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=18938
> >
> > > flush-8:0-2911 [000] 82.461064: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=6957
> >
> > It looks like too much to move 13227 and 18938 inodes at once. So I
> > tried arbitrarily limiting the maximum number moved to 1000, and it
> > helps reduce the lock hold time and contention a lot.
>
> Oh, it seems 1000 is too small, at least for this workload; it hurts
> the dd+tar+sync total elapsed time.
>
> no limit:
> avg 167.486
> stddev 8.996
> limit=1000:
> avg 171.222
> stddev 5.588
> limit=3000:
> avg 165.335
> stddev 5.503
>
> So use 3000 as the new limit.
I don't think that's even enough. The number is going to be workload
dependent, and while a limit might be a good idea, I don't think it
can be chosen from just one simple benchmark. e.g. what does it do to
the performance of workloads that create tens of thousands of small
dirty files a second?
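For reference, the change being debated boils down to something like
this (an untested sketch against the 2.6.39-era move_expired_inodes()
loop in fs/fs-writeback.c; MAX_MOVE_INODES is a made-up name for
whatever cap gets chosen):

#define MAX_MOVE_INODES	3000	/* hypothetical cap under discussion */

	while (!list_empty(delaying_queue)) {
		inode = wb_inode(delaying_queue->prev);
		if (older_than_this &&
		    inode_dirtied_after(inode, *older_than_this))
			break;	/* remaining inodes are too young */
		if (moved >= MAX_MOVE_INODES)
			break;	/* bound the time the list lock is held */
		list_move(&inode->i_wb_list, &tmp);
		moved++;
	}

The smaller the cap, the shorter each lock hold, but the more passes
queue_io() has to make to drain the same backlog.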
....
> class name  con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> vanilla 2.6.39-rc3:
> inode_wb_list_lock:  2063  2065  0.12  2648.66  5948.99  27475  943778  0.09  2704.76  498340.24
I wouldn't consider this a contended lock at all on this workload:
that's roughly 2000 contentions over nearly a million acquisitions, a
contention rate of about 0.2%.
FWIW, my profiles on sustained 8-way small file creation workloads
on ext4 over tens of millions of inodes show a 0.1% contention rate
for the inode_wb_list_lock. That compares to a 2% contention rate
for the inode_lru_lock, a 4% contention rate on the
inode_sb_list_lock and a 6% contention rate on the inode_hash_lock.
So really, the inode_wb_list_lock is not the lock we need to spend
effort on optimising to the nth degree right now...
......
> limit=1000:
>
> dd+tar+sync total elapsed time (10 runs):
> avg 171.222
> stddev 5.588
>
> &(&wb->list_lock)->rlock:  842  842  0.14  101.10  1013.34  20489  970892  0.09  234.11  509829.79
.....
> limit=3000:
>
> dd+tar+sync total elapsed time (10 runs):
> avg 165.335
> stddev 5.503
>
> &(&wb->list_lock)->rlock:  1088  1092  0.11  245.08  3268.75  21124  1718636  0.09  384.53  849827.20
So, from this, acquisitions have nearly doubled and the total lock
hold time has almost doubled as well. That looks like a fair bit of
introduced inefficiency. What does it do to the CPU time consumed by
queue_io()? (perf top is your friend.)
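e.g. something like this, run while the test is going (assuming your
perf supports system-wide call-graph recording):

	perf record -g -a -- sleep 30	# profile everything during the workload
	perf report --stdio | grep queue_io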
FYI, queue_io() is already a _massive_ CPU hog. See commit dcd79a1
("xfs: don't use vfs writeback for pure metadata modifications") for
how XFS tries to avoid putting dirty inodes on the list if at all
possible:
Under heavy multi-way parallel create workloads, the VFS
struggles to write back all the inodes that have been changed in
age order. The bdi flusher thread becomes CPU bound, spending
85% of its time in the VFS code, mostly traversing the
superblock dirty inode list to separate dirty inodes old enough
to flush.
We already keep an index of all metadata changes in age order -
in the AIL - and continued log pressure will do age ordered
writeback without any extra overhead at all. If there is no
pressure on the log, the xfssyncd will periodically write back
metadata in ascending disk address offset order so will be very
efficient.
.....
We're moving towards only tracking inodes with dirty pages in the
b_dirty list for XFS because this time-based expiry is so
inefficient. So anything that reduces the efficiency of
queue_io()....
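To make the idea concrete, the dispatch we're heading for looks
roughly like this conceptual sketch (not the actual XFS code;
fs_metadata_only_change() and fs_log_inode() are hypothetical
helpers):

static void fs_dirty_inode(struct inode *inode)
{
	if (fs_metadata_only_change(inode)) {
		/*
		 * Pure metadata change: track it in the filesystem's
		 * own age-ordered log (the AIL, for XFS) so the inode
		 * never lands on the VFS b_dirty list and queue_io()
		 * never has to walk past it.
		 */
		fs_log_inode(inode);
		return;
	}
	/* inodes with dirty pages still use the normal VFS tracking */
	mark_inode_dirty(inode);
}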
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com