From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"richard@rsk.demon.co.uk" <richard@rsk.demon.co.uk>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>
Subject: Re: regression in page writeback
Date: Tue, 29 Sep 2009 08:15:04 +0800
Message-ID: <20090929001504.GA18192@localhost>
In-Reply-To: <20090928071507.GA20068@localhost>

On Mon, Sep 28, 2009 at 03:15:07PM +0800, Wu Fengguang wrote:
> On Mon, Sep 28, 2009 at 09:07:00AM +0800, Dave Chinner wrote:
> > 
> > pageout is so horribly inefficient from an IO perspective it is not
> > funny. It is one of the reasons Linux sucks so much when under
> > memory pressure. It basically causes the system to do random 4k
> > writeback of dirty pages (and lumpy reclaim can make it
> > synchronous!). 
> > 
> > pageout needs an enema, and preferably it should defer to background
> > writeback to clean pages. background writeback will clean pages
> > much, much faster than the random crap that pageout spews at the
> > disk right now.
> > 
> > Given that I can basically lock up my 2.6.30-based laptop for 10-15
> > minutes at a time with the disk running flat out in low memory
> > situations simply by starting to copy a large file(*), I think that
> > the way we currently handle dirty page writeback needs a bit of a
> > rethink.
> > 
> > (*) I had this happen 4-5 times last week moving VM images around on
> > my laptop, and it involved the Linux VM switching between pageout
> > and swapping to make more memory available while the copy was
> > hammering the same drive with dirty pages from foreground writeback.
> > It made for extremely fragmented files when the machine finally
> > recovered because of the non-sequential writeback patterns on the
> > single file being copied.  You can't tell me that this is sane,
> > desirable behaviour, and this is the sort of problem that I want
> > sorted out. I don't believe it can be fixed by maintaining the
> > number of uncoordinated, competing writeback mechanisms we currently
> > have.
> 
> I imagined some lumpy pageout policy would help, but didn't realize
> it's such a severe problem that it can happen in daily desktop workloads.
> 
> Below is a quick patch. Any comments?

Wow, it's much easier to reuse write_cache_pages for lumpy pageout :)

---
 mm/page-writeback.c |   36 ++++++++++++++++++++++++------------
 mm/shmem.c          |    1 +
 mm/vmscan.c         |    6 ++++++
 3 files changed, 31 insertions(+), 12 deletions(-)

--- linux.orig/mm/vmscan.c	2009-09-29 07:21:51.000000000 +0800
+++ linux/mm/vmscan.c	2009-09-29 07:46:59.000000000 +0800
@@ -344,6 +344,8 @@ typedef enum {
 	PAGE_CLEAN,
 } pageout_t;
 
+#define LUMPY_PAGEOUT_PAGES	(512 * 1024 / PAGE_CACHE_SIZE)
+
 /*
  * pageout is called by shrink_page_list() for each dirty page.
  * Calls ->writepage().
@@ -408,6 +410,10 @@ static pageout_t pageout(struct page *pa
 			return PAGE_ACTIVATE;
 		}
 
+		wbc.range_start = (page->index + 1) << PAGE_CACHE_SHIFT;
+		wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
+		generic_writepages(mapping, &wbc);
+
 		/*
 		 * Wait on writeback if requested to. This happens when
 		 * direct reclaiming a large contiguous area and the
--- linux.orig/mm/page-writeback.c	2009-09-29 07:33:13.000000000 +0800
+++ linux/mm/page-writeback.c	2009-09-29 08:10:39.000000000 +0800
@@ -799,6 +799,12 @@ retry:
 		if (nr_pages == 0)
 			break;
 
+		if (wbc->for_reclaim && done_index + nr_pages - 1 !=
+					pvec.pages[nr_pages - 1]->index) {
+			pagevec_release(&pvec);
+			break;
+		}
+
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
@@ -852,24 +858,30 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			/*
+			 * active and unevictable pages will be checked at
+			 * rotate time
+			 */
+			if (wbc->for_reclaim)
+				SetPageReclaim(page);
+
 			ret = (*writepage)(page, wbc, data);
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
 					unlock_page(page);
 					ret = 0;
-				} else {
-					/*
-					 * done_index is set past this page,
-					 * so media errors will not choke
-					 * background writeout for the entire
-					 * file. This has consequences for
-					 * range_cyclic semantics (ie. it may
-					 * not be suitable for data integrity
-					 * writeout).
-					 */
-					done = 1;
-					break;
 				}
+				/*
+				 * done_index is set past this page,
+				 * so media errors will not choke
+				 * background writeout for the entire
+				 * file. This has consequences for
+				 * range_cyclic semantics (ie. it may
+				 * not be suitable for data integrity
+				 * writeout).
+				 */
+				done = 1;
+				break;
  			}
 
 			if (nr_to_write > 0) {
--- linux.orig/mm/shmem.c	2009-09-29 08:07:22.000000000 +0800
+++ linux/mm/shmem.c	2009-09-29 08:08:02.000000000 +0800
@@ -1103,6 +1103,7 @@ unlock:
 	 */
 	swapcache_free(swap, NULL);
 redirty:
+	wbc->pages_skipped++;
 	set_page_dirty(page);
 	if (wbc->for_reclaim)
 		return AOP_WRITEPAGE_ACTIVATE;	/* Return with page locked */
