linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>
Subject: [PATCH 2/6] writeback: skip balance_dirty_pages() for in-memory fs
Date: Wed, 04 May 2011 17:17:09 +0800	[thread overview]
Message-ID: <20110504091909.399748028@intel.com> (raw)
In-Reply-To: 20110504091707.910929441@intel.com

[-- Attachment #1: writeback-trace-global-dirty-states-fix.patch --]
[-- Type: text/plain, Size: 2964 bytes --]

This avoids unnecessary checks and dirty throttling on tmpfs/ramfs.

It can also prevent

[  388.126563] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050

in the balance_dirty_pages tracepoint, which will call

	dev_name(mapping->backing_dev_info->dev)

but shmem_backing_dev_info.dev is NULL.

Summary notes about the tmpfs/ramfs behavior changes:

As for 2.6.36 and older kernels, the tmpfs writes will sleep inside
balance_dirty_pages() as long as we are over the (dirty+background)/2
global throttle threshold.  This is because both the dirty pages and
threshold will be 0 for tmpfs/ramfs. Hence this test will always
evaluate to TRUE:

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
                        || (nr_reclaimable + nr_writeback >= dirty_thresh);

For 2.6.37, someone complained that the current logic does not allow the
users to set vm.dirty_ratio=0.  So commit 4cbec4c8b9 changed the test to

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
                        || (nr_reclaimable + nr_writeback > dirty_thresh);

So 2.6.37 will behave differently for tmpfs/ramfs: it will never get
throttled unless the global dirty threshold is exceeded (which is very
unlikely to happen; once happen, will block many tasks).

I'd say that the 2.6.36 behavior is very bad for tmpfs/ramfs. It means
for a busy writing server, tmpfs write()s may get livelocked! The
"inadvertent" throttling can hardly bring help to any workload because
of its "either no throttling, or get throttled to death" property.

So based on 2.6.37, this patch won't bring more noticeable changes.

CC: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-04-13 17:18:10.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-04-13 17:18:10.000000000 +0800
@@ -244,13 +244,8 @@ void task_dirty_inc(struct task_struct *
 static void bdi_writeout_fraction(struct backing_dev_info *bdi,
 		long *numerator, long *denominator)
 {
-	if (bdi_cap_writeback_dirty(bdi)) {
-		prop_fraction_percpu(&vm_completions, &bdi->completions,
+	prop_fraction_percpu(&vm_completions, &bdi->completions,
 				numerator, denominator);
-	} else {
-		*numerator = 0;
-		*denominator = 1;
-	}
 }
 
 static inline void task_dirties_fraction(struct task_struct *tsk,
@@ -495,6 +490,9 @@ static void balance_dirty_pages(struct a
 	bool dirty_exceeded = false;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 
+	if (!bdi_cap_account_dirty(bdi))
+		return;
+
 	for (;;) {
 		struct writeback_control wbc = {
 			.sync_mode	= WB_SYNC_NONE,

  parent reply	other threads:[~2011-05-04  9:17 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-04  9:17 [PATCH 0/6] writeback fixes and trace events Wu Fengguang
2011-05-04  9:17 ` [PATCH 1/6] writeback: add bdi_dirty_limit() kernel-doc Wu Fengguang
2011-05-04  9:17 ` Wu Fengguang [this message]
2011-05-04  9:17 ` [PATCH 3/6] writeback: make nr_to_write a per-file limit Wu Fengguang
2011-05-04  9:42   ` Christoph Hellwig
2011-05-04 11:52     ` Wu Fengguang
2011-05-04 15:51       ` Wu Fengguang
2011-05-04 16:18         ` Christoph Hellwig
2011-05-05 10:47           ` Wu Fengguang
2011-05-04  9:17 ` [PATCH 4/6] writeback: trace event writeback_single_inode Wu Fengguang
2011-05-04  9:17 ` [PATCH 5/6] writeback: trace event writeback_queue_io Wu Fengguang
2011-05-05 16:37   ` [PATCH 5/6] writeback: trace event writeback_queue_io (v2) Wu Fengguang
2011-05-05 17:26     ` Jan Kara
2011-05-04  9:17 ` [PATCH 6/6] writeback: convert to relative older_than_this in trace events Wu Fengguang
2011-05-04 22:23   ` Jan Kara
2011-05-05 12:33     ` Wu Fengguang
2011-05-04  9:46 ` [PATCH 0/6] writeback fixes and " Christoph Hellwig
2011-05-04  9:56   ` Wu Fengguang
2011-05-04 10:06     ` Dave Chinner
2011-05-04 11:18       ` Christoph Hellwig
2011-05-04 13:12       ` Theodore Tso
2011-05-04 22:31         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110504091909.399748028@intel.com \
    --to=fengguang.wu@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).