linux-fsdevel.vger.kernel.org archive mirror
From: Mel Gorman <mel@csn.ul.ie>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible
Date: Tue, 15 Jun 2010 16:38:38 +0100	[thread overview]
Message-ID: <20100615153838.GO26788@csn.ul.ie> (raw)
In-Reply-To: <20100615150850.GF28052@random.random>

On Tue, Jun 15, 2010 at 05:08:50PM +0200, Andrea Arcangeli wrote:
> On Tue, Jun 15, 2010 at 10:43:42AM -0400, Christoph Hellwig wrote:
> > Other callers of ->writepage are fine because they come from a
> > controlled environment with relatively little stack usage.  The problem
> > with direct reclaim is that we splice multiple stack hogs ontop of each
> > other.
> 
> It's not like we're doing a stack recursive algorithm in kernel. These
> have to be "controlled hogs", so we must have space to run 4/5 of them
> on top of each other, that's the whole point.
> 
> I'm aware the ->writepage can run on any alloc_pages, but frankly I
> don't see a whole lot of difference between regular kernel code paths
> or msync. Sure they can be at higher stack usage, but not like with
> only 1000bytes left.
> 

That is pretty much what Dave is claiming at
http://lkml.org/lkml/2010/4/13/121: if mempool_alloc_slab() needed
to allocate a page and writepage was entered, there would have been a
problem.

I disagreed with his fix which is what led to this series as an alternative.

> > And seriously, if the VM isn't stopped from calling ->writepage from
> > reclaim context we FS people will simply ignore any ->writepage from
> > reclaim context.  Been there, done that and never again.
> > 
> > Just wondering, what filesystems do your hugepage testing systems use?
> > If it's any of the ext4/btrfs/xfs above you're already seeing the
> > filesystem refuse ->writepage from both kswapd and direct reclaim,
> > so Mel's series will allow us to reclaim pages from more contexts
> > than before.
> 
> fs ignoring ->writepage during memory pressure (even from kswapd) is
> broken, this is not up to the fs to decide. I'm using ext4 on most of
> my testing, it works ok, but it doesn't make it right (in fact if
> performance declines without that hack, it may prove VM needs fixing,
> it doesn't justify the hack).
> 

Broken or not, it's what some of them are doing to avoid stack
overflows. Worse, they are ignoring both kswapd and direct reclaim when
they only really needed to ignore direct reclaim. With this series at
least, the check for PF_MEMALLOC in ->writepage can be removed.

> If you don't throttle against kswapd, or if even kswapd can't turn a
> dirty page into a clean one, you can get oom false positives. Anything
> is better than that.

This series would at least allow kswapd to turn dirty pages into clean
ones so it's an improvement.

> (provided you've got proper stack instrumentation to notice when there
> is a risk of a stack overflow; it's ages since I've seen a stack
> overflow debug detector report)
> 
> The irq stack must be enabled and this isn't about direct reclaim but
> about irqs in general and their potential nesting with softirq calls
> too.
> 
> Also note, there's nothing that prevents us from switching the stack
> to something else the moment we enter direct reclaim.

Other than a lack of code to do it :/

If you really feel strongly about this, you could follow on the series
by extending clean_page_list() to switch stack if !kswapd.

> It doesn't need
> to be physically contiguous. Just allocate a couple of 4k pages and
> switch to them every time a new hog starts in VM context. The only
> real complexity is in the stack unwind but if irqstack can cope with
> it sure stack unwind can cope with more "special" stacks too.
> 
> Ignoring ->writepage on VM invocations at best can only hide VM
> inefficiencies with the downside of breaking the VM in corner cases
> with heavy VM pressure.
> 

This has actually been the case for a while. I vaguely recall FS people
complaining about writepage from direct reclaim at some conference or
other two years ago.

> Crippling down the kernel by vetoing ->writepage to me looks very
> wrong, but I'd be totally supportive of a "special" writepage stack or
> special iscsi stack etc...
> 

I'm not sure the complexity is justified based on the data I've seen so
far. A simpler option would be to change this:

                if (reclaim_can_writeback(sc)) {
                        cleaned = MAX_SWAP_CLEAN_WAIT;
                        clean_page_list(page_list, sc);
                        goto restart_dirty;
                } else {
                        cleaned++;
                        /*
                         * If lumpy reclaiming, kick the background flusher
                         * and wait for the pages to be cleaned
                         *
                         * XXX: kswapd won't find these isolated pages but
                         *      the background flusher does not prioritise
                         *      pages. It'd be nice to prioritise a list of
                         *      pages somehow
                         */
                        if (sync_writeback == PAGEOUT_IO_SYNC) {
                                wakeup_flusher_threads(nr_dirty);
                                congestion_wait(BLK_RW_ASYNC, HZ/10);
                                goto restart_dirty;
                        }
                }

to

                if (reclaim_can_writeback(sc)) {
                        cleaned = MAX_SWAP_CLEAN_WAIT;
                        clean_page_list(page_list, sc);
                        goto restart_dirty;
                } else {
                        cleaned++;
                        wakeup_flusher_threads(nr_dirty);
                        congestion_wait(BLK_RW_ASYNC, HZ/10);

                        /*
                         * If not in lumpy reclaim, just try these pages
                         * one more time before isolating more pages from
                         * the LRU
                         */
                        if (sync_writeback != PAGEOUT_IO_SYNC)
                                cleaned = MAX_SWAP_CLEAN_WAIT;
                        goto restart_dirty;
                }

i.e. when direct reclaim encounters N dirty pages, unconditionally ask the
flusher threads to clean that number of pages, throttle by waiting for them
to be cleaned, reclaim them if they get cleaned or otherwise scan more pages
on the LRU.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
