From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@linux.vnet.ibm.com>, Mel Gorman <mel@csn.ul.ie>,
Dave Chinner <david@fromorbit.com>,
Itaru Kitayama <kitayama@cl.bb4u.ne.jp>,
Minchan Kim <minchan.kim@gmail.com>,
Linux Memory Management List <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
"Li, Shaohua" <shaohua.li@intel.com>
Subject: [RFC][PATCH v2] writeback: limit number of moved inodes in queue_io()
Date: Fri, 6 May 2011 18:06:48 +0800
Message-ID: <20110506100648.GA3435@localhost>
In-Reply-To: <20110506084238.GA487@localhost>
On Fri, May 06, 2011 at 04:42:38PM +0800, Wu Fengguang wrote:
> > patched trace-tar-dd-ext4-2.6.39-rc3+
>
> > flush-8:0-3048 [004] 1929.981734: writeback_queue_io: bdi 8:0: older=4296600898 age=2 enqueue=13227
>
> > vanilla trace-tar-dd-ext4-2.6.39-rc3
>
> > flush-8:0-2911 [004] 77.158312: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=18938
>
> > flush-8:0-2911 [000] 82.461064: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=6957
>
> It looks too much to move 13227 and 18938 inodes at once. So I tried
> arbitrarily limiting the max move number to 1000 and it helps reduce
> the lock hold time and contentions a lot.
Oh, it seems 1000 is too small, at least for this workload: it hurts the
dd+tar+sync total elapsed time.
no limit:
avg 167.486
stddev 8.996
limit=1000:
avg 171.222
stddev 5.588
limit=3000:
avg 165.335
stddev 5.503
So use 3000 as the new limit.
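To put the three averages above side by side, here is a minimal sketch of the arithmetic. It is not part of the patch. The struct and function names are made up for illustration, and the run count for the "no limit" case is an assumption (the changelog states 10 runs only for the two limited cases).

```c
#include <assert.h>

/* Summary of one benchmark configuration, as reported in this message. */
struct run_stats {
	double avg;	/* mean dd+tar+sync elapsed time */
	double stddev;	/* standard deviation across runs */
	int runs;	/* number of runs; ASSUMED to be 10 for "no limit" */
};

/* Relative change of x against base, in percent (negative means faster). */
static double percent_change(const struct run_stats *base,
			     const struct run_stats *x)
{
	return (x->avg - base->avg) / base->avg * 100.0;
}

/*
 * Square of Welch's t statistic for the difference of two means.
 * Kept squared so the sketch does not need sqrt() from libm.
 * A value below 1.0 means the difference is smaller than one
 * standard error of the difference.
 */
static double welch_t_squared(const struct run_stats *a,
			      const struct run_stats *b)
{
	double diff, se2;

	assert(a->runs > 1 && b->runs > 1);
	diff = b->avg - a->avg;
	se2 = a->stddev * a->stddev / a->runs +
	      b->stddev * b->stddev / b->runs;
	return diff * diff / se2;
}
```

Fed with the numbers above and 10 runs each, percent_change() gives about +2.2% for limit=1000 and about -1.3% for limit=3000 relative to no limit, and welch_t_squared() for no limit vs limit=3000 comes out at about 0.42, i.e. limit=3000 is within run-to-run noise of the unlimited case under these assumptions.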
Thanks,
Fengguang
---
Subject: writeback: limit number of moved inodes in queue_io()
Date: Fri May 06 13:34:08 CST 2011
Move at most 3000 inodes from b_dirty to b_io at a time. In a simple
dd+tar workload on an 8p test box this cuts the maximum lock hold time
by about 7x (holdtime-max 2704.76 -> 384.53) and roughly halves lock
contentions (2065 -> 1092). The workload was observed to move 10000+
inodes in one shot on ext4, which is clearly too much.
class name                    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
inode_wb_list_lock:                  2063           2065           0.12        2648.66        5948.99          27475         943778           0.09        2704.76      498340.24
------------------
inode_wb_list_lock 89 [<ffffffff8115cf3a>] sync_inode+0x28/0x5f
inode_wb_list_lock 38 [<ffffffff8115ccab>] inode_wait_for_writeback+0xa8/0xc6
inode_wb_list_lock 629 [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
inode_wb_list_lock 842 [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157
------------------
inode_wb_list_lock 891 [<ffffffff8115ce3e>] writeback_single_inode+0x175/0x249
inode_wb_list_lock 13 [<ffffffff8115dc4e>] writeback_inodes_wb+0x3a/0x143
inode_wb_list_lock 499 [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
inode_wb_list_lock 617 [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157
limit=1000:
dd+tar+sync total elapsed time (10 runs):
avg 171.222
stddev 5.588
&(&wb->list_lock)->rlock:             842            842           0.14         101.10        1013.34          20489         970892           0.09         234.11      509829.79
------------------------
&(&wb->list_lock)->rlock 275 [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock 114 [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
&(&wb->list_lock)->rlock 56 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock 132 [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
------------------------
&(&wb->list_lock)->rlock 2 [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock 33 [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
&(&wb->list_lock)->rlock 9 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock 430 [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
limit=3000:
dd+tar+sync total elapsed time (10 runs):
avg 165.335
stddev 5.503
&(&wb->list_lock)->rlock:            1088           1092           0.11         245.08        3268.75          21124        1718636           0.09         384.53      849827.20
------------------------
&(&wb->list_lock)->rlock 518 [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock 3 [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock 54 [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
&(&wb->list_lock)->rlock 10 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
------------------------
&(&wb->list_lock)->rlock 4 [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock 379 [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock 4 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock 446 [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
fs/fs-writeback.c | 2 ++
1 file changed, 2 insertions(+)
--- linux-next.orig/fs/fs-writeback.c 2011-05-06 13:32:41.000000000 +0800
+++ linux-next/fs/fs-writeback.c 2011-05-06 16:44:58.000000000 +0800
@@ -279,6 +279,8 @@ static int move_expired_inodes(struct li
 		sb = inode->i_sb;
 		list_move(&inode->i_wb_list, &tmp);
 		moved++;
+		if (unlikely(moved >= 3000))	/* limit spinlock hold time */
+			break;
 	}
 
 	/* just one sb in list, splice to dispatch_queue and we're done */
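For readers without the kernel tree at hand, the idea in the hunk above can be sketched in plain userspace C. This is an illustration only, not the kernel code: struct node and the helpers below are hypothetical stand-ins for list_head and the list API, the expiry-time check done by move_expired_inodes() is left out, and the limit is a parameter instead of the hard-coded 3000.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (stand-in for list_head). */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static int list_is_empty(const struct node *head)
{
	return head->next == head;
}

/* Unlink n from whatever list it is on. */
static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert n right after head, i.e. at the front of the list. */
static void list_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Move at most `limit` entries from the tail of `from` to the head of
 * `to` and return how many were moved. Bounding the batch bounds the
 * time a caller would spend holding the lock that protects both lists.
 */
static size_t move_bounded(struct node *from, struct node *to, size_t limit)
{
	size_t moved = 0;

	while (!list_is_empty(from) && moved < limit) {
		struct node *n = from->prev;	/* take from the tail */

		list_del_node(n);
		list_add_head(n, to);
		moved++;
	}
	return moved;
}
```

A caller that wants everything moved would call move_bounded() repeatedly, dropping and retaking the lock between batches. Entries left behind stay on the source list and are picked up by the next call.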