From: Andrea Arcangeli <aarcange@redhat.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
Chris Mason <chris.mason@oracle.com>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible
Date: Tue, 15 Jun 2010 18:14:19 +0200
Message-ID: <20100615161419.GH28052@random.random>
In-Reply-To: <20100615153838.GO26788@csn.ul.ie>
On Tue, Jun 15, 2010 at 04:38:38PM +0100, Mel Gorman wrote:
> That is pretty much what Dave is claiming here at
> http://lkml.org/lkml/2010/4/13/121 where if mempool_alloc_slab() needed
This stack trace shows writepage being called by shrink_page_list...
that contradicts Christoph's claim that xfs already skips writepage
when invoked from direct reclaim.
> to allocate a page and writepage was entered, there would have been a
> problem.
There can't be a problem if a page wasn't available in the mempool,
because we can't nest one writepage on top of another or it'd deadlock
on fs locks; that is the reason for GFP_NOFS, as noted in that email.
It certainly shows writepage getting very close to the stack size
limit... probably not enough to trigger the stack detector but close
enough to worry! Agreed.
I think we need to switch stacks in do_try_to_free_pages to solve it,
rather than only dealing with writepage or the filesystems.
> Broken or not, it's what some of them are doing to avoid stack
> overflows. Worst, they are ignoring both kswapd and direct reclaim when they
> only really needed to ignore kswapd. With this series at least, the
> check for PF_MEMALLOC in ->writepage can be removed
I don't get how we end up in xfs_buf_ioapply above, though, if xfs
writepage is a noop under PF_MEMALLOC. PF_MEMALLOC is definitely set
before try_to_free_pages, but in the above trace writepage still runs
and submits the I/O.
> This series would at least allow kswapd to turn dirty pages into clean
> ones so it's an improvement.
Not saying it's not an improvement, but still it's not necessarily the
right direction.
> Other than a lack of code to do it :/
;)
> If you really feel strongly about this, you could follow on the series
> by extending clean_page_list() to switch stack if !kswapd.
>
> This has actually been the case for a while. I vaguely recall FS people
Again, that's not what it looks like from the stack trace. Also,
grepping for PF_MEMALLOC in fs/xfs shows nothing. In fact it's
ext4_write_inode that skips the write if PF_MEMALLOC is set, not
writepage apparently (I only did a quick grep so I might be wrong). I
suspect ext4_write_inode is the case I just mentioned about slab
shrink, not ->writepage ;).
Inodes are small, so it's no big deal to keep an inode pinned and not
slab-reclaimable because it's dirty, while skipping the real writepage
under memory pressure could really open a regression in oom false
positives! One pagecache page is much bigger than one inode, and there
can be plenty more dirty pagecache than inodes.
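To make the distinction concrete, here is a minimal userspace sketch of
the "skip the inode write when the caller is already in reclaim" check.
It is not the ext4 code: every name in it (SKETCH_PF_MEMALLOC,
sketch_task, sketch_inode, sketch_write_inode) is hypothetical and only
models the idea that a task flag makes the write a noop and leaves the
inode dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task "I am reclaiming" flag. */
#define SKETCH_PF_MEMALLOC 0x1u

struct sketch_task {
	unsigned int flags;
};

struct sketch_inode {
	bool dirty;
};

/*
 * Returns 1 if the inode was written, 0 if the write was skipped.
 * A skipped inode stays dirty, so it stays pinned until a later
 * caller that is not in reclaim writes it.
 */
int sketch_write_inode(const struct sketch_task *cur,
		       struct sketch_inode *inode)
{
	assert(cur != NULL && inode != NULL);

	if (cur->flags & SKETCH_PF_MEMALLOC)
		return 0;	/* in reclaim: do nothing, inode stays dirty */

	inode->dirty = false;	/* models the actual write */
	return 1;
}
```

Doing the same thing in ->writepage is the part that worries me: the
cost of a skipped inode is one small pinned object, the cost of a
skipped page is a whole page that reclaim cannot free.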
> i.e. when direct reclaim encounters N dirty pages, unconditionally ask the
> flusher threads to clean that number of pages, throttle by waiting for them
> to be cleaned, reclaim them if they get cleaned or otherwise scan more pages
> on the LRU.
Not bad at all... throttling is what makes it safe too. The problem is
everything else that isn't solved by this and could be solved with a
stack switch; that's my main reason for considering this a
->writepage-only hack, not complete enough to provide a generic
solution for reclaim issues ending up in fs->dm->iscsi/bio. I also
suspect xfs is more of a stack hog than the others (it might not be a
coincidence that the 7k happens with xfs writepage) and could be
lightened up a bit by looking into it.
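The throttling scheme quoted above can be sketched in userspace as
follows. This is only a toy model under stated assumptions: the names
(sketch_page, sketch_flusher_clean, sketch_reclaim_clean,
sketch_direct_reclaim) are hypothetical, the "flusher" runs
synchronously so the wait is implicit, and flusher_budget is an
invented knob modelling a flusher that may not manage to clean
everything it was asked to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_page {
	bool dirty;
	bool reclaimed;
};

/* Hypothetical flusher: cleans up to nr dirty pages, returns the count. */
size_t sketch_flusher_clean(struct sketch_page *lru, size_t len, size_t nr)
{
	size_t cleaned = 0;

	for (size_t i = 0; i < len && cleaned < nr; i++) {
		if (!lru[i].reclaimed && lru[i].dirty) {
			lru[i].dirty = false;
			cleaned++;
		}
	}
	return cleaned;
}

/* Reclaim every clean page; report how many dirty pages were skipped. */
size_t sketch_reclaim_clean(struct sketch_page *lru, size_t len,
			    size_t *nr_dirty)
{
	size_t freed = 0;

	*nr_dirty = 0;
	for (size_t i = 0; i < len; i++) {
		if (lru[i].reclaimed)
			continue;
		if (lru[i].dirty) {
			(*nr_dirty)++;	/* never written from here */
			continue;
		}
		lru[i].reclaimed = true;
		freed++;
	}
	return freed;
}

/*
 * Direct reclaim that never calls writepage itself: it counts the dirty
 * pages it meets, asks the flusher to clean that many, then reclaims
 * whatever became clean.
 */
size_t sketch_direct_reclaim(struct sketch_page *lru, size_t len,
			     size_t flusher_budget)
{
	size_t nr_dirty, freed;

	assert(lru != NULL);

	freed = sketch_reclaim_clean(lru, len, &nr_dirty);
	if (nr_dirty == 0)
		return freed;

	sketch_flusher_clean(lru, len,
			     nr_dirty < flusher_budget ? nr_dirty
						       : flusher_budget);
	freed += sketch_reclaim_clean(lru, len, &nr_dirty);
	return freed;
}
```

Note that the sketch only takes writepage out of the direct reclaim
path; it says nothing about the stack depth of any other path that
enters reclaim, which is the part a stack switch would cover.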