From: Mel Gorman <mel@csn.ul.ie>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH 12/14] vmscan: Do not writeback pages in direct reclaim
Date: Wed, 7 Jul 2010 01:24:58 +0100
Message-ID: <20100707002458.GI13780@csn.ul.ie>
In-Reply-To: <AANLkTimOkI95ZkJecE3jxRDDGbHvP9tRUluIoJuhqqMz@mail.gmail.com>

On Wed, Jul 07, 2010 at 07:28:14AM +0900, Minchan Kim wrote:
> On Wed, Jul 7, 2010 at 5:27 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Jul 06, 2010 at 04:25:39PM +0100, Mel Gorman wrote:
> >> On Tue, Jul 06, 2010 at 08:24:57PM +0900, Minchan Kim wrote:
> > >> > but it is still a problem in the case of a swap file.
> > >> > That's because swapout to a swapfile causes a filesystem writepage, which
> > >> > can overflow the kernel stack.
> >>
> > >> I don't *think* this is a problem unless I missed where writing out to
> > >> swap enters the filesystem code. I'll double check.
> >
> > It bypasses the fs.  On swapon, the blocks are resolved
> > (mm/swapfile.c::setup_swap_extents) and then the writeout path uses
> > bios directly (mm/page_io.c::swap_writepage).
> >
> > (GFP_NOFS still includes __GFP_IO, so allows swapping)
> >
> >        Hannes
> 
> Thanks, Hannes. You're right.
> Extents are resolved by setup_swap_extents.
> Sorry for the confusion, Mel.
> 

No confusion. I was 99.99999% certain this was the case and had tested with
a few BUG_ON()s just in case, but confirmation is helpful. Thanks, both.
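As a rough illustration of why swap writeout does not enter the filesystem, here is a hypothetical user-space C sketch. It models two ideas from Hannes's reply: a writeback check of roughly the shape of reclaim's `may_enter_fs` test, under which a swap-cache page may still be written with a `GFP_NOFS`-style mask because that mask carries the IO bit, and an extent table of the kind `setup_swap_extents()` builds at swapon time, which turns a swap slot into a disk block without any filesystem call. All `sk_`-prefixed names, the flag values and the linear extent walk are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocation-mask bits, modelled on the idea of the
 * kernel's __GFP_IO and __GFP_FS; the values are illustrative only. */
#define SK_GFP_IO     0x1u  /* may start physical I/O, e.g. to swap */
#define SK_GFP_FS     0x2u  /* may call into filesystem code */
#define SK_GFP_NOFS   (SK_GFP_IO)
#define SK_GFP_KERNEL (SK_GFP_IO | SK_GFP_FS)

/* May reclaim write this dirty page under gfp_mask?
 * File pages need FS permission; swap-cache pages only need IO
 * permission because swap writeout goes straight to the block layer. */
static bool sk_may_writeback(unsigned int gfp_mask, bool is_swapcache)
{
	if (gfp_mask & SK_GFP_FS)
		return true;
	return is_swapcache && (gfp_mask & SK_GFP_IO);
}

/* A run of consecutive swap slots backed by consecutive disk blocks. */
struct sk_swap_extent {
	unsigned long start_page;        /* first swap slot in the run */
	unsigned long nr_pages;          /* number of slots in the run */
	unsigned long long start_block;  /* disk block backing start_page */
};

/* Map a swap slot to a disk block using only the extent table
 * (a linear walk, for clarity). Returns false if no extent covers
 * the slot. No filesystem code is involved. */
static bool sk_swap_slot_to_block(const struct sk_swap_extent *ext, size_t n,
				  unsigned long slot, unsigned long long *block)
{
	assert(block != NULL);
	for (size_t i = 0; i < n; i++) {
		if (slot >= ext[i].start_page &&
		    slot - ext[i].start_page < ext[i].nr_pages) {
			*block = ext[i].start_block + (slot - ext[i].start_page);
			return true;
		}
	}
	return false;
}
```

For example, with extents {0, 100, 5000} and {100, 50, 9000}, slot 120 maps to block 9020 and slot 150 is not mapped at all. The kernel's own lookup is organised differently; the point is only that, once the extents are resolved, the writeout path needs no help from the filesystem.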

What I have now is direct writeback for anon pages. For file pages, whether
reclaim is running from kswapd or direct reclaim, I kick the flusher threads
pre-emptively to write back an amount based on the number of dirty pages
encountered. I did this because monitoring with systemtap indicated that a
large percentage of dirty file pages were reaching the end of the LRU lists
(bad). Initial sysbench tests show that, with this pre-emptive kicking of the
flusher threads, file writeback from kswapd during page reclaim is reduced by
97%, based on these figures:

                                      traceonly-v4r1  stackreduce-v4r1  flushforward-v4r4
Direct reclaims                                  621               710              30928
Direct reclaim pages scanned                  141316            141184            1912093
Direct reclaim write file async I/O            23904             28714                  0
Direct reclaim write anon async I/O              716               918                 88
Direct reclaim write file sync I/O                 0                 0                  0
Direct reclaim write anon sync I/O                 0                 0                  0
Wake kswapd requests                          713250            735588            5626413
Kswapd wakeups                                  1805              1498                641
Kswapd pages scanned                        17065538          15605327            9524623
Kswapd reclaim write file async I/O           715768            617225              23938  <-- Wooo
Kswapd reclaim write anon async I/O           218003            214051             198746
Kswapd reclaim write file sync I/O                 0                 0                  0
Kswapd reclaim write anon sync I/O                 0                 0                  0
Time stalled direct reclaim (ms)                9.87             11.63             315.30
Time kswapd awake (ms)                       1884.91           2088.23            3542.92
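The 97% figure can be checked against the table. The small helper below (hypothetical names) computes the percentage change between two columns. For the "Kswapd reclaim write file async I/O" row, going from traceonly-v4r1 to flushforward-v4r4 gives roughly -96.7%, which rounds to the 97% reduction quoted; the same helper gives only about -8.8% for the "Kswapd reclaim write anon async I/O" row (218003 to 198746).

```c
#include <assert.h>

/* Percentage change from before to after; negative means a reduction. */
static double sk_pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* "Kswapd reclaim write file async I/O" row from the table above. */
static const double sk_kswapd_file_async[3] = {
	715768.0, /* traceonly-v4r1 */
	617225.0, /* stackreduce-v4r1 */
	23938.0,  /* flushforward-v4r4 */
};
```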

This is "good" IMO because file IO from page reclaim is frowned upon due to
its poor IO patterns. There isn't a launder process I can kick for anon pages
to get overall reclaim IO down, but it's not clear that's worth it at this
juncture because, AFAIK, IO to swap blows anyway. The biggest plus is that,
with my current series, direct reclaim still does not call into the
filesystem, so stack overflows are less of a heartache. As the number of pages
encountered for filesystem writeback is reduced, it's also less of a problem
for memcg.
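The reclaim policy described in this mail can be summarised as a small, hypothetical C model: dirty anon pages are written to swap directly because that path bypasses the filesystem, direct reclaim never writes dirty file pages and leaves them to the flusher threads, and both kswapd and direct reclaim ask the flushers for writeback in proportion to the dirty file pages they encounter. All names are invented. The 1.5x scaling factor is an assumption, since the mail only says the amount is "based on the dirty pages encountered". Letting kswapd still write some file pages through the filesystem is inferred from the non-zero kswapd file-write count in the flushforward-v4r4 column.

```c
#include <assert.h>
#include <stddef.h>

/* Who is doing the reclaim. */
enum sk_reclaimer { SK_KSWAPD, SK_DIRECT_RECLAIM };

/* What kind of dirty page reclaim has found. */
enum sk_page_kind { SK_PAGE_ANON, SK_PAGE_FILE };

/* What reclaim does with a dirty page at the tail of the LRU. */
enum sk_action {
	SK_WRITE_TO_SWAP,     /* block-layer write, no filesystem code */
	SK_WRITE_VIA_FS,      /* ->writepage into the filesystem */
	SK_LEAVE_FOR_FLUSHER, /* skip it; the flusher threads handle it */
};

static enum sk_action sk_dirty_page_action(enum sk_reclaimer who,
					   enum sk_page_kind kind)
{
	if (kind == SK_PAGE_ANON)
		return SK_WRITE_TO_SWAP;     /* swap bypasses the fs */
	if (who == SK_DIRECT_RECLAIM)
		return SK_LEAVE_FOR_FLUSHER; /* never enter the fs here */
	return SK_WRITE_VIA_FS;              /* assumption: kswapd still may */
}

struct sk_batch_result {
	unsigned long swapped;       /* anon pages written to swap */
	unsigned long fs_written;    /* file pages written via ->writepage */
	unsigned long skipped;       /* file pages left for the flushers */
	unsigned long flush_request; /* pages to ask the flushers for */
};

/* Handle one batch of dirty pages and work out the flusher kick. */
static struct sk_batch_result sk_reclaim_dirty_batch(enum sk_reclaimer who,
		const enum sk_page_kind *pages, size_t n)
{
	struct sk_batch_result r = {0, 0, 0, 0};
	unsigned long nr_dirty_file = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (sk_dirty_page_action(who, pages[i])) {
		case SK_WRITE_TO_SWAP:     r.swapped++;    break;
		case SK_WRITE_VIA_FS:      r.fs_written++; break;
		case SK_LEAVE_FOR_FLUSHER: r.skipped++;    break;
		}
		if (pages[i] == SK_PAGE_FILE)
			nr_dirty_file++;
	}
	/* Both reclaimers kick the flushers, scaled by the dirty file
	 * pages seen (the 1.5x factor is an assumption). */
	r.flush_request = nr_dirty_file + nr_dirty_file / 2;
	return r;
}
```

Given a batch of three dirty file pages and one dirty anon page, direct reclaim in this model writes only the anon page to swap, skips all three file pages and asks the flushers for 4 pages (3 + 3/2 in integer arithmetic), while kswapd writes all four itself but still requests the same flusher kick.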

The direct reclaim stall latency increases because of congestion_wait
throttling, but the overall test completes 602 seconds (8%) faster (figures
not included). Scanning rates go up but, given the reduced time to completion,
I think it works out on balance.

Andrew has picked up some of the series, but I have another modification
to the tracepoints to differentiate between anon and file IO, which I now
think is a very important distinction as the flushers work on one but not the
other. I also need to rebase onto an mmotm based on 2.6.35-rc4 before
re-posting the series but, broadly speaking, I think we are going in the right
direction without needing stack-switching tricks.
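To illustrate why separating anon and file IO in the tracepoints matters, here is a hypothetical C sketch of a post-processing step. Each writepage event is assumed to carry flags recording whether the page was anon or file-backed and whether the IO was sync or async, and events are folded into counters matching the "write file/anon async/sync I/O" rows of the table above. The flag names and bit values are invented for illustration and are not the actual tracepoint format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits carried by a writepage trace event. */
#define SK_WB_ANON  0x1u
#define SK_WB_FILE  0x2u
#define SK_WB_SYNC  0x4u
#define SK_WB_ASYNC 0x8u

/* Counters matching the "write {file,anon} {async,sync} I/O" rows. */
struct sk_wb_counts {
	unsigned long file_async, anon_async, file_sync, anon_sync;
};

/* Build the flags a writepage event would carry. */
static unsigned int sk_wb_flags(int is_anon, int is_sync)
{
	return (is_anon ? SK_WB_ANON : SK_WB_FILE) |
	       (is_sync ? SK_WB_SYNC : SK_WB_ASYNC);
}

/* Fold one event into the matching per-category counter. */
static void sk_wb_account(struct sk_wb_counts *c, unsigned int flags)
{
	unsigned long *slot;

	assert(c != NULL);
	if (flags & SK_WB_ANON)
		slot = (flags & SK_WB_SYNC) ? &c->anon_sync : &c->anon_async;
	else
		slot = (flags & SK_WB_SYNC) ? &c->file_sync : &c->file_async;
	(*slot)++;
}
```

Keeping the anon and file counts apart lets the output show which writes the flusher threads could have taken over (file) and which they cannot (anon).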

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

Thread overview: 51+ messages
2010-06-29 11:34 [PATCH 0/14] Avoid overflowing of stack during page reclaim V3 Mel Gorman
2010-06-29 11:34 ` [PATCH 01/14] vmscan: Fix mapping use after free Mel Gorman
2010-06-29 14:27   ` Minchan Kim
2010-07-01  9:53     ` Mel Gorman
2010-06-29 14:44   ` Johannes Weiner
2010-06-29 11:34 ` [PATCH 02/14] tracing, vmscan: Add trace events for kswapd wakeup, sleeping and direct reclaim Mel Gorman
2010-06-29 11:34 ` [PATCH 03/14] tracing, vmscan: Add trace events for LRU page isolation Mel Gorman
2010-06-29 11:34 ` [PATCH 04/14] tracing, vmscan: Add trace event when a page is written Mel Gorman
2010-06-29 11:34 ` [PATCH 05/14] tracing, vmscan: Add a postprocessing script for reclaim-related ftrace events Mel Gorman
2010-06-29 11:34 ` [PATCH 06/14] vmscan: kill prev_priority completely Mel Gorman
2010-06-29 11:34 ` [PATCH 07/14] vmscan: simplify shrink_inactive_list() Mel Gorman
2010-06-29 11:34 ` [PATCH 08/14] vmscan: Remove unnecessary temporary vars in do_try_to_free_pages Mel Gorman
2010-06-29 11:34 ` [PATCH 09/14] vmscan: Setup pagevec as late as possible in shrink_inactive_list() Mel Gorman
2010-06-29 11:34 ` [PATCH 10/14] vmscan: Setup pagevec as late as possible in shrink_page_list() Mel Gorman
2010-06-29 11:34 ` [PATCH 11/14] vmscan: Update isolated page counters outside of main path in shrink_inactive_list() Mel Gorman
2010-06-29 11:34 ` [PATCH 12/14] vmscan: Do not writeback pages in direct reclaim Mel Gorman
2010-07-02 19:51   ` Andrew Morton
2010-07-05 13:49     ` Mel Gorman
2010-07-06  0:36       ` KOSAKI Motohiro
2010-07-06  5:46         ` Minchan Kim
2010-07-06  6:02           ` KOSAKI Motohiro
2010-07-06  6:38             ` Minchan Kim
2010-07-06 10:12         ` Mel Gorman
2010-07-06 11:13           ` KOSAKI Motohiro
2010-07-06 11:24           ` Minchan Kim
2010-07-06 15:25             ` Mel Gorman
2010-07-06 20:27               ` Johannes Weiner
2010-07-06 22:28                 ` Minchan Kim
2010-07-07  0:24                   ` Mel Gorman [this message]
2010-07-07  1:15                     ` Christoph Hellwig
2010-07-07  9:43                       ` Mel Gorman
2010-07-07 12:51                         ` Rik van Riel
2010-07-07  1:14                 ` Christoph Hellwig
2010-07-08  6:39                 ` KOSAKI Motohiro
2010-07-07  5:03       ` Wu Fengguang
2010-07-07  9:50         ` Mel Gorman
2010-07-07 18:09         ` Christoph Hellwig
2010-06-29 11:34 ` [PATCH 13/14] fs,btrfs: Allow kswapd to writeback pages Mel Gorman
2010-06-30 13:05   ` Chris Mason
2010-07-01  9:55     ` Mel Gorman
2010-06-29 11:34 ` [PATCH 14/14] fs,xfs: " Mel Gorman
2010-06-29 12:37   ` Christoph Hellwig
2010-06-29 12:51     ` Mel Gorman
2010-06-30  0:14       ` KAMEZAWA Hiroyuki
2010-07-01 10:30         ` Mel Gorman
2010-07-02  6:26           ` KAMEZAWA Hiroyuki
2010-07-02  6:31             ` KAMEZAWA Hiroyuki
2010-07-05 14:16             ` Mel Gorman
2010-07-06  0:45               ` KAMEZAWA Hiroyuki
2010-07-02 19:33 ` [PATCH 0/14] Avoid overflowing of stack during page reclaim V3 Andrew Morton
2010-07-05  1:35   ` KAMEZAWA Hiroyuki
