From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@linux.vnet.ibm.com>, Mel Gorman <mel@csn.ul.ie>,
	Dave Chinner <david@fromorbit.com>,
	Itaru Kitayama <kitayama@cl.bb4u.ne.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Li, Shaohua" <shaohua.li@intel.com>
Subject: [RFC][PATCH v2] writeback: limit number of moved inodes in queue_io()
Date: Fri, 6 May 2011 18:06:48 +0800
Message-ID: <20110506100648.GA3435@localhost>
In-Reply-To: <20110506084238.GA487@localhost>

On Fri, May 06, 2011 at 04:42:38PM +0800, Wu Fengguang wrote:
> > patched trace-tar-dd-ext4-2.6.39-rc3+
> 
> >        flush-8:0-3048  [004]  1929.981734: writeback_queue_io: bdi 8:0: older=4296600898 age=2 enqueue=13227
> 
> > vanilla trace-tar-dd-ext4-2.6.39-rc3
> 
> >        flush-8:0-2911  [004]    77.158312: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=18938
> 
> >        flush-8:0-2911  [000]    82.461064: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=6957
> 
> It looks too much to move 13227 and 18938 inodes at once. So I tried
> arbitrarily limiting the max move number to 1000 and it helps reduce
> the lock hold time and contentions a lot.

Oh, it seems 1000 is too small, at least for this workload: it hurts
the dd+tar+sync total elapsed time.

no limit:
                avg        167.486 
                stddev       8.996 
limit=1000:
                avg        171.222 
                stddev       5.588 
limit=3000:
                avg        165.335 
                stddev       5.503 

So use 3000 as the new limit.
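
To put the averages above on a common scale, here is a small helper that
expresses a run's average as a percentage change against the no-limit
baseline. It is only an illustration of the arithmetic; the function name
rel_change_pct is made up for this sketch and is not part of the patch.

```c
#include <assert.h>

/*
 * Relative change of a measured average against a baseline, in percent.
 * A negative result means the measured run was faster than the baseline.
 * (Hypothetical helper, for illustration only.)
 */
double rel_change_pct(double baseline, double measured)
{
	assert(baseline > 0.0);
	return (measured - baseline) / baseline * 100.0;
}
```

With the numbers above, rel_change_pct(167.486, 171.222) comes out at
roughly +2.2 (limit=1000 is slower) and rel_change_pct(167.486, 165.335)
at roughly -1.3 (limit=3000 is slightly faster).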

Thanks,
Fengguang
---
Subject: writeback: limit number of moved inodes in queue_io()
Date: Fri May 06 13:34:08 CST 2011

Only move 3000 inodes from b_dirty to b_io at a time. This reduces the
maximum lock hold time and the lock contention many times over in a
simple dd+tar workload on an 8p test box. This workload was observed to
move 10000+ inodes in one shot on ext4, which was obviously too much.
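
The idea can be modelled in userspace roughly as follows. This is a
minimal sketch, not the kernel code: struct node and the list helpers are
simplified stand-ins for list_head and list_move(), move_capped() is a
hypothetical name, and it assumes the oldest entries sit at the tail of
the source list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert n right after head, i.e. at the front of the list. */
void list_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n from its current list and insert it at the front of `to`. */
void list_move_head(struct node *n, struct node *to)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_add_head(n, to);
}

/*
 * Move at most `limit` entries from the tail of `from` to `to` and
 * return how many were moved. Entries beyond the limit stay on `from`
 * and are picked up by a later call, so the time spent in one call
 * (and hence under the caller's lock) is bounded by `limit`.
 */
size_t move_capped(struct node *from, struct node *to, size_t limit)
{
	size_t moved = 0;

	assert(limit > 0);
	while (from->prev != from && moved < limit) {
		list_move_head(from->prev, to);
		moved++;
	}
	return moved;
}
```

Calling move_capped() on a list with more than `limit` entries moves
exactly `limit` of them; a second call continues where the first stopped.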

                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
                      inode_wb_list_lock:          2063           2065           0.12        2648.66        5948.99          27475         943778           0.09        2704.76      498340.24
                      ------------------
                      inode_wb_list_lock             89          [<ffffffff8115cf3a>] sync_inode+0x28/0x5f
                      inode_wb_list_lock             38          [<ffffffff8115ccab>] inode_wait_for_writeback+0xa8/0xc6
                      inode_wb_list_lock            629          [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock            842          [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157
                      ------------------
                      inode_wb_list_lock            891          [<ffffffff8115ce3e>] writeback_single_inode+0x175/0x249
                      inode_wb_list_lock             13          [<ffffffff8115dc4e>] writeback_inodes_wb+0x3a/0x143
                      inode_wb_list_lock            499          [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock            617          [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157

limit=1000:

dd+tar+sync total elapsed time (10 runs):
				avg        171.222 
				stddev       5.588 

                &(&wb->list_lock)->rlock:           842            842           0.14         101.10        1013.34          20489         970892           0.09         234.11      509829.79
                ------------------------
                &(&wb->list_lock)->rlock            275          [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
                &(&wb->list_lock)->rlock            114          [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
                &(&wb->list_lock)->rlock             56          [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
                &(&wb->list_lock)->rlock            132          [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
                ------------------------
                &(&wb->list_lock)->rlock              2          [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
                &(&wb->list_lock)->rlock             33          [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
                &(&wb->list_lock)->rlock              9          [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
                &(&wb->list_lock)->rlock            430          [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e

limit=3000:

dd+tar+sync total elapsed time (10 runs):
				avg        165.335
				stddev       5.503

                &(&wb->list_lock)->rlock:          1088           1092           0.11         245.08        3268.75          21124        1718636           0.09         384.53      849827.20
                ------------------------
                &(&wb->list_lock)->rlock            518          [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
                &(&wb->list_lock)->rlock              3          [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
                &(&wb->list_lock)->rlock             54          [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
                &(&wb->list_lock)->rlock             10          [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
                ------------------------
                &(&wb->list_lock)->rlock              4          [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
                &(&wb->list_lock)->rlock            379          [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
                &(&wb->list_lock)->rlock              4          [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
                &(&wb->list_lock)->rlock            446          [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
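
For reference, the summary rows above can be compared as simple ratios
against the vanilla kernel. The sketch below copies three columns from
those rows; struct lock_row, rows[] and reduction_factor() are names
invented for this illustration and do not exist in the kernel.

```c
#include <assert.h>

/* One summary row of the lock statistics above (subset of the columns). */
struct lock_row {
	const char *label;
	double contentions;
	double waittime_max;
	double holdtime_max;
};

/* Values copied from the three summary rows in this changelog. */
const struct lock_row rows[] = {
	{ "vanilla",    2065.0, 2648.66, 2704.76 },
	{ "limit=1000",  842.0,  101.10,  234.11 },
	{ "limit=3000", 1092.0,  245.08,  384.53 },
};

/* How many times smaller `after` is than `before` (2.0 means halved). */
double reduction_factor(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}
```

For limit=3000 this gives about 7.0x for holdtime-max
(reduction_factor(rows[0].holdtime_max, rows[2].holdtime_max)), about
10.8x for waittime-max and about 1.9x for contentions. Note that the
runs differ in their number of acquisitions, so the contention counts are
not normalized.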

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    2 ++
 1 file changed, 2 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2011-05-06 13:32:41.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-05-06 16:44:58.000000000 +0800
@@ -279,6 +279,8 @@ static int move_expired_inodes(struct li
 		sb = inode->i_sb;
 		list_move(&inode->i_wb_list, &tmp);
 		moved++;
+		if (unlikely(moved >= 3000))	/* limit spinlock hold time */
+			break;
 	}
 
 	/* just one sb in list, splice to dispatch_queue and we're done */
