linux-mm.kvack.org archive mirror
From: Wu Fengguang <fengguang.wu@intel.com>
To: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Chris Mason <chris.mason@oracle.com>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	Jens Axboe <axboe@kernel.dk>, Mel Gorman <mel@csn.ul.ie>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Hellwig <hch@lst.de>, linux-mm <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH] writeback: safety margin for bdi stat errors
Date: Tue, 7 Dec 2010 21:11:36 +0800
Message-ID: <20101207131136.GA20366@localhost>
In-Reply-To: <4CFB9BE1.3030902@redhat.com>

On Sun, Dec 05, 2010 at 10:04:17PM +0800, Rik van Riel wrote:
> On 12/05/2010 01:44 AM, Wu Fengguang wrote:
> > I noticed that my NFSROOT test system becomes slow to respond when there
> > is heavy dd to a local disk. Traces show that the NFSROOT's bdi_limit
> > is near 0 and many tasks in the system are repeatedly stuck in
> > balance_dirty_pages().
> >
> > There are two related problems:
> >
> > - light dirtiers on one device (more often than not the rootfs) get
> >    heavily impacted by heavy dirtiers on another independent device
> >
> > - the lightly dirtied device is heavily throttled because bdi_limit=0,
> >    and the heavy throttling may in turn hold its bdi_limit at 0, since
> >    it cannot dirty pages fast enough to grow the bdi's proportional weight.
> >
> > Fix it by introducing a "low pass" gate: a small (<=8MB) value
> > reserved by others that can be safely "stolen" from the current
> > global dirty margin.  It does not need to be big to help the bdi gain
> > its initial weight.
> 
> Makes a lot of sense to me.
> 
> Acked-by: Rik van Riel <riel@redhat.com>

Thanks. I found the problem while testing the IO-less balance_dirty_pages().
The old kernel may behave a bit better, but should still benefit from
the patch.

Now I've found one more problem, with a fix.

---
Subject: writeback: safety margin for bdi stat error
Date: Tue Dec 07 20:38:28 CST 2010

In a simple dd test on an 8p system with "mem=256M", I found that the
light dirtier tasks on the root fs were all heavily throttled. That
happens because the global limit is exceeded. This looks impossible at
first sight, because the test fs doing the heavy dd is under its bdi
limit. Tracing revealed that

	bdi_dirty < bdi_limit < global_limit < nr_dirty

So the root cause is that bdi_dirty is well under nr_dirty due to
accounting errors in the per-CPU bdi counters. The two should be very
close, because there is only one heavily dirtied bdi in the system. This
can be fixed by using bdi_stat_sum(), but that is costly on large NUMA
machines. So do a cheaper fix instead: lower the bdi limit by the bdi
stat error margin, so that the accounting errors won't lead to the
absurd situation "global limit exceeded but bdi limit not exceeded".

CC: Rik van Riel <riel@redhat.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |    5 +++++
 1 file changed, 5 insertions(+)

--- linux-next.orig/mm/page-writeback.c	2010-12-07 20:35:00.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-12-07 20:37:34.000000000 +0800
@@ -451,6 +451,11 @@ unsigned long bdi_dirty_limit(struct bac
 	u64 bdi_dirty;
 	long numerator, denominator;
 
+	if (likely(dirty > bdi_stat_error(bdi)))
+		dirty -= bdi_stat_error(bdi);
+	else
+		return 0;
+
 	/*
 	 * Calculate this BDI's share of the dirty ratio.
 	 */
