From: Mel Gorman <mgorman@suse.de>
To: Dave Chinner <david@fromorbit.com>
Cc: Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>, XFS <xfs@oss.sgi.com>,
Christoph Hellwig <hch@infradead.org>,
Johannes Weiner <jweiner@redhat.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Rik van Riel <riel@redhat.com>,
Minchan Kim <minchan.kim@gmail.com>
Subject: Re: [PATCH 5/5] mm: writeback: Prioritise dirty inodes encountered by direct reclaim for background flushing
Date: Thu, 14 Jul 2011 08:30:33 +0100
Message-ID: <20110714073033.GR7529@suse.de>
In-Reply-To: <20110713235606.GX23038@dastard>

On Thu, Jul 14, 2011 at 09:56:06AM +1000, Dave Chinner wrote:
> On Wed, Jul 13, 2011 at 03:31:27PM +0100, Mel Gorman wrote:
> > It is preferable that no dirty pages are dispatched from the page
> > reclaim path. If reclaim is encountering dirty pages, it implies that
> > either reclaim is getting ahead of writeback or use-once logic has
> > prioritised pages for reclaiming that are young relative to when the
> > inode was dirtied.
> >
> > When dirty pages are encountered on the LRU, this patch marks the inodes
> > I_DIRTY_RECLAIM and wakes the background flusher. When the background
> > flusher runs, it moves such inodes immediately to the dispatch queue
> > regardless of inode age. There is no guarantee that pages reclaim
> > cares about will be cleaned first but the expectation is that the
> > flusher threads will clean the page quicker than if reclaim tried to
> > clean a single page.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> > fs/fs-writeback.c | 56 ++++++++++++++++++++++++++++++++++++++++++++-
> > include/linux/fs.h | 5 ++-
> > include/linux/writeback.h | 1 +
> > mm/vmscan.c | 16 ++++++++++++-
> > 4 files changed, 74 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 0f015a0..1201052 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -257,9 +257,23 @@ static void move_expired_inodes(struct list_head *delaying_queue,
> > LIST_HEAD(tmp);
> > struct list_head *pos, *node;
> > struct super_block *sb = NULL;
> > - struct inode *inode;
> > + struct inode *inode, *tinode;
> > int do_sb_sort = 0;
> >
> > + /* Move inodes reclaim found at end of LRU to dispatch queue */
> > + list_for_each_entry_safe(inode, tinode, delaying_queue, i_wb_list) {
> > + /* Move any inode found at end of LRU to dispatch queue */
> > + if (inode->i_state & I_DIRTY_RECLAIM) {
> > + inode->i_state &= ~I_DIRTY_RECLAIM;
> > + list_move(&inode->i_wb_list, &tmp);
> > +
> > + if (sb && sb != inode->i_sb)
> > + do_sb_sort = 1;
> > + sb = inode->i_sb;
> > + }
> > + }
>
> This is not a good idea. move_expired_inodes() already sucks a large
> amount of CPU when there are lots of dirty inodes on the list (think
> hundreds of thousands), and that is when the traversal terminates at
> *older_than_this. It's not uncommon in my testing to see this
> one function consume 30-35% of the bdi-flusher thread CPU usage
> in such conditions.
>

I thought this might be the case. I wasn't sure how bad it could be but
I mentioned in the leader it might be a problem. I'll consider other
ways that pages found at the end of the LRU could be prioritised for
writeback.

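To make the cost Dave describes concrete, here is a minimal userspace
sketch, not kernel code. It assumes a dirty list kept newest-first, so an
age-bounded walk from the tail can stop at the first inode that is too
young, while a walk looking for a flag has to visit every entry.
struct model_inode, MODEL_DIRTY_RECLAIM, scan_expired() and scan_flagged()
are hypothetical names made up for the illustration.

```c
#include <stddef.h>

/* Hypothetical flag for the model; stands in for I_DIRTY_RECLAIM. */
#define MODEL_DIRTY_RECLAIM 0x1u

/* Hypothetical stand-in for an inode on a dirty list (not struct inode). */
struct model_inode {
    unsigned long dirtied_when; /* time the inode was dirtied */
    unsigned int flags;         /* MODEL_DIRTY_RECLAIM or 0 */
};

/*
 * Age-bounded scan. The array is ordered newest first, so the oldest
 * entry is at the tail. Walk from the tail and stop at the first entry
 * dirtied after older_than_this. Returns the number of entries examined
 * and stores the number that would be moved in *moved.
 */
size_t scan_expired(const struct model_inode *list, size_t n,
                    unsigned long older_than_this, size_t *moved)
{
    size_t visited = 0;

    *moved = 0;
    for (size_t i = n; i-- > 0; ) {
        visited++;
        if (list[i].dirtied_when > older_than_this)
            break; /* everything nearer the head is younger still */
        (*moved)++;
    }
    return visited;
}

/*
 * Flag scan. A flagged entry can sit anywhere in the list, so there is
 * no early exit: every entry is examined on every call.
 */
size_t scan_flagged(const struct model_inode *list, size_t n, size_t *moved)
{
    size_t visited = 0;

    *moved = 0;
    for (size_t i = 0; i < n; i++) {
        visited++;
        if (list[i].flags & MODEL_DIRTY_RECLAIM)
            (*moved)++;
    }
    return visited;
}
```

In this model scan_flagged() always examines all n entries, whereas
scan_expired() examines only the expired tail plus at most one more entry.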
> > <SNIP>
> > +
> > + sb = NULL;
> > while (!list_empty(delaying_queue)) {
> > inode = wb_inode(delaying_queue->prev);
> > if (older_than_this &&
> > @@ -968,6 +982,46 @@ void wakeup_flusher_threads(long nr_pages)
> > rcu_read_unlock();
> > }
> >
> > +/*
> > + * Similar to wakeup_flusher_threads except prioritise inodes contained
> > + * in the page_list regardless of age
> > + */
> > +void wakeup_flusher_threads_pages(long nr_pages, struct list_head *page_list)
> > +{
> > + struct page *page;
> > + struct address_space *mapping;
> > + struct inode *inode;
> > +
> > + list_for_each_entry(page, page_list, lru) {
> > + if (!PageDirty(page))
> > + continue;
> > +
> > + if (PageSwapBacked(page))
> > + continue;
> > +
> > + lock_page(page);
> > + mapping = page_mapping(page);
> > + if (!mapping)
> > + goto unlock;
> > +
> > + /*
> > + * Test outside the lock to see if it is already set. Inode
> > + * should be pinned by the lock_page
> > + */
> > + inode = page->mapping->host;
> > + if (inode->i_state & I_DIRTY_RECLAIM)
> > + goto unlock;
> > +
> > + spin_lock(&inode->i_lock);
> > + inode->i_state |= I_DIRTY_RECLAIM;
> > + spin_unlock(&inode->i_lock);
>
> Micro optimisations like this are unnecessary - the inode->i_lock is
> not contended.
>

This patch was brought forward from a time when it would have been
taking the global inode_lock. I wasn't sure how badly inode->i_lock
was being contended and hadn't set up lock stats. Thanks for the
clarification.

> As it is, this code won't really work as you think it might.
> There's no guarantee a dirty inode is on the dirty list - it might
> have already been expired, and it might even currently be under
> writeback. In that case, if it is still dirty it goes to the
> b_more_io list and writeback bandwidth is shared between all the
> other dirty inodes and completely ignores this flag...
>

Ok, it's a total bust. If I revisit this at all, it'll either be in
the context of Wu's approach or calling fdatawrite_range, but it
might be pointless, and overall it might just be better for now to
leave kswapd calling ->writepage if reclaim is failing and priority
is raised.

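For illustration only, Dave's objection can be modelled in a few lines of
userspace C. The sketch assumes, as he describes, that the flag is only
looked at while scanning the dirty list, so a flagged inode that has
already been expired to the io list or requeued on b_more_io is never
noticed. enum model_wb_list, struct model_wb_inode and both helper
functions are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical flag for the model; stands in for I_DIRTY_RECLAIM. */
#define MODEL_DIRTY_RECLAIM 0x1u

/* Which writeback list the modelled inode currently sits on. */
enum model_wb_list { MODEL_B_DIRTY, MODEL_B_IO, MODEL_B_MORE_IO };

/* Hypothetical stand-in for an inode tracked by the flusher. */
struct model_wb_inode {
    enum model_wb_list on_list;
    unsigned int flags;
};

/* Flagged inodes that a scan of the dirty list alone would notice. */
size_t flagged_seen_by_scan(const struct model_wb_inode *inodes, size_t n)
{
    size_t seen = 0;

    for (size_t i = 0; i < n; i++)
        if (inodes[i].on_list == MODEL_B_DIRTY &&
            (inodes[i].flags & MODEL_DIRTY_RECLAIM))
            seen++;
    return seen;
}

/* Flagged inodes on any other list; their flag has no effect. */
size_t flagged_missed_by_scan(const struct model_wb_inode *inodes, size_t n)
{
    size_t missed = 0;

    for (size_t i = 0; i < n; i++)
        if (inodes[i].on_list != MODEL_B_DIRTY &&
            (inodes[i].flags & MODEL_DIRTY_RECLAIM))
            missed++;
    return missed;
}
```

Any inode counted by flagged_missed_by_scan() is one that reclaim marked
but that would get no priority under the scheme in this patch.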
--
Mel Gorman
SUSE Labs