From: Wu Fengguang <fengguang.wu@intel.com>
To: Nikita Danilov <danilov@gmail.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Theodore Tso <tytso@mit.edu>,
Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>,
Chris Mason <chris.mason@oracle.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"Li, Shaohua" <shaohua.li@intel.com>,
Myklebust Trond <Trond.Myklebust@netapp.com>,
"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
Jan Kara <jack@suse.cz>, Nick Piggin <npiggin@suse.de>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 30/45] vmscan: lumpy pageout
Date: Thu, 8 Oct 2009 10:37:01 +0800
Message-ID: <20091008023701.GA20021@localhost>
In-Reply-To: <8acda98c0910070850x14614e0fh832f5cd29b1588f0@mail.gmail.com>
On Wed, Oct 07, 2009 at 11:50:58PM +0800, Nikita Danilov wrote:
> 2009/10/7 Wu Fengguang <fengguang.wu@intel.com>:
>
> [...]
>
> > + */
> > + if (current_is_kswapd() &&
> > + bdi_cap_writeback_dirty(mapping->backing_dev_info) &&
> > + !mapping->a_ops->writepages) {
> > + wbc.range_start = (page->index + 1) << PAGE_CACHE_SHIFT;
> > + wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
> > + generic_writepages(mapping, &wbc);
> > + iput(inode);
> > + }
> > +
> > + /*
>
> One potential problem with this is that generic_writepages() waits on
> page locks and this can stall kswapd (always bad). This can be worked
> around by replacing lock_page() with trylock_page() conditionally on
> wbc->for_reclaim (or wbc->nonblocking?), but then it almost looks
> like a separate function would be better.
IMHO trylock_page() is not necessary. Locked pages are rare in normal
operation, and kswapd already does lock_page() on every page whose state
it examines for reclaim. So it makes sense for lumpy pageout to follow
the same (simple) convention.
> On the bright side, it seems I was wrong and pageout calls iput() already:
> shrink_slab()->prune_icache()->iput().
Not totally wrong ;) iput() is only called there when __GFP_FS is set.
However, pageout() may be called either with __GFP_FS, or with __GFP_IO
for a page in the swap cache (PageSwapCache). So I updated the patch to do
lumpy pageout only under __GFP_FS. In the long term, it would be good to
remove AOP_WRITEPAGE_ACTIVATE and ->writepage() entirely, and to support
shmem as well :)
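The two conditions above can be written out as a small model. The flag values are illustrative and the names are hypothetical. The first function mirrors the gate that lets pageout() run at all: __GFP_FS, or __GFP_IO for a swap-cache page. The second adds the updated patch's restriction that lumpy pageout (and the igrab()/iput() pinning it needs) happens only when __GFP_FS is set, which is the case in which calling iput() from reclaim is considered safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; what matters is that they are distinct. */
#define FAKE_GFP_IO 0x40u
#define FAKE_GFP_FS 0x80u

static_assert((FAKE_GFP_IO & FAKE_GFP_FS) == 0, "flags must be distinct bits");

/* May pageout() be called for this page at all? */
static bool may_enter_fs(unsigned int gfp_mask, bool page_swapcache)
{
	return (gfp_mask & FAKE_GFP_FS) ||
	       (page_swapcache && (gfp_mask & FAKE_GFP_IO));
}

/* Updated patch: lumpy pageout only when __GFP_FS is set. */
static bool may_lumpy_pageout(unsigned int gfp_mask, bool page_swapcache)
{
	return may_enter_fs(gfp_mask, page_swapcache) &&
	       (gfp_mask & FAKE_GFP_FS);
}
```

For example, a swap-cache page reclaimed with only __GFP_IO set may still be written by ->writepage(), but it gets no lumpy follow-up writeback.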
Thanks,
Fengguang
---
vmscan: lumpy pageout
When paging out a dirty page, try to piggyback more consecutive dirty
pages (up to 512KB in total) to improve IO efficiency.
Only filesystems without their own aops->writepages, such as ext3 and
reiserfs, are supported in this initial version.
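The 512KB limit is easiest to check with concrete numbers. This sketch assumes 4KB pages (PAGE_CACHE_SHIFT == 12, as on x86) and uses hypothetical FAKE_* names. LUMPY_PAGEOUT_PAGES works out to 128. ->writepage() writes the dirty page itself, so generic_writepages() is asked for at most 127 more pages, starting at the byte offset of the next page. The sketch also shows why such an offset is normally computed after widening to a 64-bit type, as page_offset() does with loff_t: with a 32-bit pgoff_t, the shift wraps for offsets at or beyond 4GB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB pages. */
#define FAKE_PAGE_SHIFT 12
#define FAKE_PAGE_SIZE (1UL << FAKE_PAGE_SHIFT)

/* Same formula as the patch's LUMPY_PAGEOUT_PAGES. */
#define FAKE_LUMPY_PAGES (512 * 1024 / FAKE_PAGE_SIZE)

static_assert(FAKE_LUMPY_PAGES == 128, "512KB of 4KB pages");

/* Start of the follow-up writeback: the page after @index, widened first. */
static int64_t lumpy_range_start(uint32_t index)
{
	return (int64_t)(index + 1) << FAKE_PAGE_SHIFT;
}

/* The same computation done entirely in 32 bits (a 32-bit pgoff_t). */
static uint32_t lumpy_range_start_32(uint32_t index)
{
	return (index + 1) << FAKE_PAGE_SHIFT;
}
```

At page index 1 << 20 (file offset 4GB), the widened version gives 4294971392. The 32-bit version wraps around to 4096.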
CC: Hugh Dickins <hugh.dickins@tiscali.co.uk>
CC: Dave Chinner <david@fromorbit.com>
CC: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/page-writeback.c | 12 ++++++++++++
mm/vmscan.c | 23 ++++++++++++++++++++++-
2 files changed, 34 insertions(+), 1 deletion(-)
--- linux.orig/mm/vmscan.c 2009-10-08 07:35:06.000000000 +0800
+++ linux/mm/vmscan.c 2009-10-08 10:20:51.000000000 +0800
@@ -344,11 +344,14 @@ typedef enum {
PAGE_CLEAN,
} pageout_t;
+#define LUMPY_PAGEOUT_PAGES (512 * 1024 / PAGE_CACHE_SIZE)
+
/*
* pageout is called by shrink_page_list() for each dirty page.
* Calls ->writepage().
*/
static pageout_t pageout(struct page *page, struct address_space *mapping,
+ struct scan_control *sc,
enum pageout_io sync_writeback)
{
/*
@@ -398,6 +401,10 @@ static pageout_t pageout(struct page *pa
.nonblocking = 1,
.for_reclaim = 1,
};
+ struct inode *inode = NULL;
+
+ if (sc->gfp_mask & __GFP_FS)
+ inode = igrab(mapping->host);
SetPageReclaim(page);
res = mapping->a_ops->writepage(page, &wbc);
@@ -405,10 +412,24 @@ static pageout_t pageout(struct page *pa
handle_write_error(mapping, page, res);
if (res == AOP_WRITEPAGE_ACTIVATE) {
ClearPageReclaim(page);
+ iput(inode);
return PAGE_ACTIVATE;
}
/*
+ * only write_cache_pages() supports for_reclaim for now
+ * ignore shmem for now, thanks to Nikita.
+ */
+ if (inode && current_is_kswapd() &&
+ bdi_cap_writeback_dirty(mapping->backing_dev_info) &&
+ !mapping->a_ops->writepages) {
+ wbc.range_start = (loff_t)(page->index + 1) << PAGE_CACHE_SHIFT;
+ wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
+ generic_writepages(mapping, &wbc);
+ }
+ iput(inode);
+
+ /*
* Wait on writeback if requested to. This happens when
* direct reclaiming a large contiguous area and the
* first attempt to free a range of pages fails.
@@ -684,7 +705,7 @@ static unsigned long shrink_page_list(st
goto keep_locked;
/* Page is dirty, try to write it out here */
- switch (pageout(page, mapping, sync_writeback)) {
+ switch (pageout(page, mapping, sc, sync_writeback)) {
case PAGE_KEEP:
goto keep_locked;
case PAGE_ACTIVATE:
--- linux.orig/mm/page-writeback.c 2009-10-08 07:38:57.000000000 +0800
+++ linux/mm/page-writeback.c 2009-10-08 10:06:33.000000000 +0800
@@ -805,6 +805,11 @@ int write_cache_pages(struct address_spa
break;
}
+ if (wbc->for_reclaim && done_index != page->index) {
+ done = 1;
+ break;
+ }
+
if (nr_to_write != wbc->nr_to_write &&
done_index + WB_SEGMENT_DIST < page->index &&
--wbc->nr_segments <= 0) {
@@ -846,6 +851,13 @@ continue_unlock:
if (!clear_page_dirty_for_io(page))
goto continue_unlock;
+ /*
+ * active and unevictable pages will be checked at
+ * rotate time
+ */
+ if (wbc->for_reclaim)
+ SetPageReclaim(page);
+
ret = (*writepage)(page, wbc, data);
if (unlikely(ret)) {
if (ret == AOP_WRITEPAGE_ACTIVATE) {
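The write_cache_pages() hunk is meant to stop a for_reclaim scan at the first page that does not directly follow the previous one. Below is a hypothetical userspace model of that loop. lumpy_run_length() and its parameters are illustrative names, not kernel API. It assumes, as this reading of the patch suggests, that done_index holds the next expected page index. Given the ascending indices of dirty pages found from the start index onwards, it counts how many would be written before a gap is hit or nr_to_write runs out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * @start: index of the first page after the one ->writepage() wrote.
 * @dirty: ascending indices of dirty pages found at or after @start.
 * Returns how many pages would be written before the first gap.
 */
static size_t lumpy_run_length(unsigned long start, const unsigned long *dirty,
			       size_t n, long nr_to_write)
{
	unsigned long expected = start;	/* plays the role of done_index */
	size_t written = 0;

	assert(dirty != NULL || n == 0);
	for (size_t i = 0; i < n && nr_to_write > 0; i++) {
		if (dirty[i] != expected)
			break;		/* non-consecutive page: stop the scan */
		/* the patch sets PG_reclaim here, then calls ->writepage() */
		written++;
		nr_to_write--;
		expected = dirty[i] + 1;
	}
	return written;
}
```

For dirty pages 11, 12, 13, 20 and 21 with a start index of 11, three pages are written and the scan stops at the gap before 20.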