From: Wu Fengguang <fengguang.wu@intel.com>
To: Nikita Danilov <danilov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Theodore Tso <tytso@mit.edu>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	Myklebust Trond <Trond.Myklebust@netapp.com>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
	Jan Kara <jack@suse.cz>, Nick Piggin <npiggin@suse.de>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 30/45] vmscan: lumpy pageout
Date: Wed, 7 Oct 2009 21:42:54 +0800
Message-ID: <20091007134254.GA26244@localhost>
In-Reply-To: <20091007132924.GB20855@localhost>

On Wed, Oct 07, 2009 at 09:29:24PM +0800, Wu Fengguang wrote:
> On Wed, Oct 07, 2009 at 07:37:35PM +0800, Nikita Danilov wrote:
> > 2009/10/7 Wu Fengguang <fengguang.wu@intel.com>:
> > > On Wed, Oct 07, 2009 at 06:38:37PM +0800, Nikita Danilov wrote:
> > >> Hello,
> > >>
> > 
> > [...]
> > 
> > >
> > > Glad to know about your experiences :) Interestingly I started with
> > > ->writepage() and then switched to ->writepages() because filesystems
> > > behave better with the latter (i.e. less file fragmentation).
> > 
> > By the way, why is your patch doing
> > 
> >         ->writepage(page->index);
> >         generic_writepages(page->index + 1, LUMPY_PAGEOUT_PAGES - 1);
> > 
> > instead of
> > 
> >         generic_writepages(page->index, LUMPY_PAGEOUT_PAGES);
> > 
> > ? Is this because of the difficulties with passing back page specific
> > errors from generic_writepages()?
> 
> Yes. It's possible to tell write_cache_pages() to return
> AOP_WRITEPAGE_ACTIVATE. Other ->writepages() don't have to deal with
> this because their ->writepage() won't return AOP_WRITEPAGE_ACTIVATE
> at all.
> 
> But it is going to be ugly to special-case the first locked page in
> every ->writepages() function.
> 
> > >
> > > I'd like to just ignore the shmem case, by adding a
> > > bdi_cap_writeback_dirty() check, because clustered writing to a swap
> > > device may bring less gain.
> > 
> > Or you can just call try_to_unmap() from shmem_writepage() when
> > wbc->for_reclaim is true.
> 
> Hmm, it's more comfortable to stay away from shmem for the initial version.
> But feel free to submit a patch for it in future :)
> 
> > > Page filtering should also be possible in write_cache_pages().  But
> > > what do you mean by "hard-to-fix races against inode reclamation"?
> > 
> > vmscan.c pageout path doesn't take a reference on inode, so the
> > instant ->writepage() releases lock on the page, the inode can be
> > freed.
> 
> Ah, then we could just do igrab() if inode_lock is not locked?
> A bit ugly though.

Since writepage() can sleep, we don't need to worry about inode_lock.
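
The reference discipline discussed above can be modelled in userspace. This is only a sketch under stated assumptions, not kernel code: struct model_inode, model_igrab(), model_iput() and model_pageout() are hypothetical names. It mirrors two properties: igrab() returns NULL for an inode that is already being freed, and every path that follows a successful grab has to drop the reference again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the inode state that matters here. */
struct model_inode {
	int  refcount;
	bool freeing;		/* set once teardown has begun */
	int  lumpy_writes;	/* counts simulated clustered writes */
};

/* Like igrab(): refuse to take a reference on an inode being freed. */
static struct model_inode *model_igrab(struct model_inode *inode)
{
	if (inode->freeing)
		return NULL;
	inode->refcount++;
	return inode;
}

/* Like iput(), minus the teardown that a final put would trigger. */
static void model_iput(struct model_inode *inode)
{
	if (inode)
		inode->refcount--;
}

enum model_result { MODEL_SKIPPED, MODEL_ACTIVATE, MODEL_WRITTEN };

/* Every exit path after a successful grab drops the reference. */
static enum model_result model_pageout(struct model_inode *inode,
				       bool activate, bool lumpy)
{
	struct model_inode *held = model_igrab(inode);

	if (!held)
		return MODEL_SKIPPED;	/* inode is going away, do nothing */
	if (activate) {
		model_iput(held);
		return MODEL_ACTIVATE;
	}
	if (lumpy)
		held->lumpy_writes++;	/* stands in for the clustered write */
	model_iput(held);		/* dropped whether or not we clustered */
	return MODEL_WRITTEN;
}
```

With this shape the reference count is unchanged after any call, whichever branch is taken.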

Here is the updated patch.

Thanks,
Fengguang
---

vmscan: lumpy pageout

When paging out a dirty page, try to piggyback more consecutive dirty
pages (up to 512KB) to improve IO efficiency.

Only filesystems without their own aops->writepages (ext3, reiserfs)
are supported in this initial version.

CC: Dave Chinner <david@fromorbit.com>
CC: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   12 ++++++++++++
 mm/vmscan.c         |   16 ++++++++++++++++
 2 files changed, 28 insertions(+)
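
For illustration only, the range arithmetic behind the 512KB window can be sketched in userspace. The MODEL_* macros, struct lumpy_range and lumpy_range_after() are hypothetical names, and a 4KB page size is assumed. The shift is done in a 64-bit type so that large page indices do not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4KB pages, as on x86. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_LUMPY_PAGES	(512 * 1024 / MODEL_PAGE_SIZE)

/* Hypothetical stand-in for the two writeback_control fields being set. */
struct lumpy_range {
	int64_t range_start;	/* byte offset of the first extra page */
	long    nr_to_write;	/* number of extra pages to attempt */
};

/* Compute the range that follows the page at 'index'. */
static struct lumpy_range lumpy_range_after(uint64_t index)
{
	struct lumpy_range r;

	/* 64-bit shift: a 32-bit index type would wrap past 4GB. */
	r.range_start = (int64_t)((index + 1) << MODEL_PAGE_SHIFT);
	/* The page at 'index' itself was already written separately. */
	r.nr_to_write = (long)MODEL_LUMPY_PAGES - 1;
	return r;
}
```

Under the 4KB assumption the window is 128 pages, so a pageout of page 10 would ask for up to 127 further pages starting at byte offset 45056.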

--- linux.orig/mm/vmscan.c	2009-10-07 21:39:13.000000000 +0800
+++ linux/mm/vmscan.c	2009-10-07 21:39:57.000000000 +0800
@@ -344,6 +344,8 @@ typedef enum {
 	PAGE_CLEAN,
 } pageout_t;
 
+#define LUMPY_PAGEOUT_PAGES	(512 * 1024 / PAGE_CACHE_SIZE)
+
 /*
  * pageout is called by shrink_page_list() for each dirty page.
  * Calls ->writepage().
@@ -399,16 +401,30 @@ static pageout_t pageout(struct page *pa
 			.for_reclaim = 1,
 		};
 
+		igrab(mapping->host);
 		SetPageReclaim(page);
 		res = mapping->a_ops->writepage(page, &wbc);
 		if (res < 0)
 			handle_write_error(mapping, page, res);
 		if (res == AOP_WRITEPAGE_ACTIVATE) {
 			ClearPageReclaim(page);
+			iput(mapping->host);
 			return PAGE_ACTIVATE;
 		}
 
 		/*
+		 * Only write_cache_pages() supports for_reclaim for now.
+		 * Ignore shmem for now, thanks to Nikita.
+		 */
+		if (bdi_cap_writeback_dirty(mapping->backing_dev_info) &&
+		    !mapping->a_ops->writepages) {
+			wbc.range_start = (loff_t)(page->index + 1) << PAGE_CACHE_SHIFT;
+			wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
+			generic_writepages(mapping, &wbc);
+		}
+		iput(mapping->host);
+
+		/*
 		 * Wait on writeback if requested to. This happens when
 		 * direct reclaiming a large contiguous area and the
 		 * first attempt to free a range of pages fails.
--- linux.orig/mm/page-writeback.c	2009-10-07 21:39:13.000000000 +0800
+++ linux/mm/page-writeback.c	2009-10-07 21:39:14.000000000 +0800
@@ -805,6 +805,11 @@ int write_cache_pages(struct address_spa
 				break;
 			}
 
+			if (wbc->for_reclaim && done_index != page->index) {
+				done = 1;
+				break;
+			}
+
 			if (nr_to_write != wbc->nr_to_write &&
 			    done_index + WB_SEGMENT_DIST < page->index &&
 			    --wbc->nr_segments <= 0) {
@@ -846,6 +851,13 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			/*
+			 * active and unevictable pages will be checked at
+			 * rotate time
+			 */
+			if (wbc->for_reclaim)
+				SetPageReclaim(page);
+
 			ret = (*writepage)(page, wbc, data);
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
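
As a rough userspace model of the for_reclaim check added to write_cache_pages() above: the walk stops at the first gap, so only a contiguous run of dirty pages is written. This is a simplified sketch, not the kernel loop. model_write_contiguous() is a hypothetical name, the sorted array stands in for the pagevec lookup, and done_index is assumed to mean "the next index that keeps the run contiguous".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model, not kernel code: 'dirty' is a sorted array of dirty
 * page indices. Returns how many pages a reclaim-style request would
 * write, starting at 'start' and limited to 'nr_to_write' pages.
 */
static size_t model_write_contiguous(const unsigned long *dirty, size_t n,
				     unsigned long start, long nr_to_write)
{
	unsigned long expected = start;	/* next index that keeps the run contiguous */
	size_t written = 0;
	size_t i;

	for (i = 0; i < n && nr_to_write > 0; i++) {
		if (dirty[i] < start)
			continue;	/* before the requested range */
		if (dirty[i] != expected)
			break;		/* gap found: stop, as for_reclaim does */
		written++;
		nr_to_write--;
		expected = dirty[i] + 1;
	}
	return written;
}
```

For dirty pages 11, 12, 13, 15 and 16, a request starting at index 11 writes three pages and stops at the hole at index 14.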