From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: [PATCH 3/4] writeback: pay attention to wbc->nr_to_write in write_cache_pages
Date: Tue, 20 Apr 2010 12:41:53 +1000	[thread overview]
Message-ID: <1271731314-5893-4-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1271731314-5893-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

If a filesystem writes more than one page in ->writepage, write_cache_pages
fails to notice this and continues to attempt writeback when wbc->nr_to_write
has gone negative - this trace was captured from XFS:


    wbc_writeback_start: towrt=1024
    wbc_writepage: towrt=1024
    wbc_writepage: towrt=0
    wbc_writepage: towrt=-1
    wbc_writepage: towrt=-5
    wbc_writepage: towrt=-21
    wbc_writepage: towrt=-85

This has adverse effects on filesystem writeback behaviour. write_cache_pages()
needs to terminate after a certain number of pages are written, not after a
certain number of calls to ->writepage are made. Make it observe the current
value of wbc->nr_to_write and treat a value <= 0 either as a termination
condition or, for data integrity syncs, as a trigger to reset it to
MAX_WRITEBACK_PAGES.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c                |    9 ---------
 include/linux/writeback.h        |    9 +++++++++
 include/trace/events/writeback.h |    1 +
 mm/page-writeback.c              |   20 +++++++++++++++++++-
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5214b61..d8271d5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -675,15 +675,6 @@ void writeback_inodes_wbc(struct writeback_control *wbc)
 	writeback_inodes_wb(&bdi->wb, wbc);
 }
 
-/*
- * The maximum number of pages to writeout in a single bdi flush/kupdate
- * operation.  We do this so we don't hold I_SYNC against an inode for
- * enormous amounts of time, which would block a userspace task which has
- * been forced to throttle against that inode.  Also, the code reevaluates
- * the dirty each time it has written this many pages.
- */
-#define MAX_WRITEBACK_PAGES     1024
-
 static inline bool over_bground_thresh(void)
 {
 	unsigned long background_thresh, dirty_thresh;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index b2d615f..8533a0f 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -14,6 +14,15 @@ extern struct list_head inode_in_use;
 extern struct list_head inode_unused;
 
 /*
+ * The maximum number of pages to writeout in a single bdi flush/kupdate
+ * operation.  We do this so we don't hold I_SYNC against an inode for
+ * enormous amounts of time, which would block a userspace task which has
+ * been forced to throttle against that inode.  Also, the code reevaluates
+ * the dirty each time it has written this many pages.
+ */
+#define MAX_WRITEBACK_PAGES     1024
+
+/*
  * fs/fs-writeback.c
  */
 enum writeback_sync_modes {
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 02f34a5..3bcbd83 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -241,6 +241,7 @@ DEFINE_WBC_EVENT(wbc_writeback_wait);
 DEFINE_WBC_EVENT(wbc_balance_dirty_start);
 DEFINE_WBC_EVENT(wbc_balance_dirty_written);
 DEFINE_WBC_EVENT(wbc_balance_dirty_wait);
+DEFINE_WBC_EVENT(wbc_writepage);
 
 #endif /* _TRACE_WRITEBACK_H */
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d45f59e..e22af84 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -917,6 +917,7 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			trace_wbc_writepage(wbc);
 			ret = (*writepage)(page, wbc, data);
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
@@ -935,7 +936,7 @@ continue_unlock:
 					done = 1;
 					break;
 				}
- 			}
+			}
 
 			if (nr_to_write > 0) {
 				nr_to_write--;
@@ -955,6 +956,23 @@ continue_unlock:
 					break;
 				}
 			}
+
+			/*
+			 * Some filesystems will write multiple pages in
+			 * ->writepage, so wbc->nr_to_write can change much,
+			 * much faster than nr_to_write. Check this as an exit
+			 * condition, or if we are doing a data integrity sync,
+			 * reset the wbc to MAX_WRITEBACK_PAGES so that such
+			 * filesystems can do optimal writeout here.
+			 */
+			if (wbc->nr_to_write <= 0) {
+				if (wbc->sync_mode == WB_SYNC_NONE) {
+					done = 1;
+					nr_to_write = 0;
+					break;
+				}
+				wbc->nr_to_write = MAX_WRITEBACK_PAGES;
+			}
 		}
 		pagevec_release(&pvec);
 		cond_resched();
-- 
1.6.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
