linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>,
	Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Andy Whitcroft <apw@shadowen.org>, Rik van Riel <riel@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Nick Piggin <npiggin@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andreas Mohr <andi@lisas.de>, Bill Davidsen <davidsen@tmr.com>,
	Ben Gamari <bgamari.foss@gmail.com>
Subject: Re: [PATCH] vmscan: remove wait_on_page_writeback() from pageout()
Date: Sun, 1 Aug 2010 13:17:57 +0800	[thread overview]
Message-ID: <20100801051756.GA7515@localhost> (raw)
In-Reply-To: <20100728183625.4A7F.A69D9226@jp.fujitsu.com>

On Wed, Jul 28, 2010 at 05:43:41PM +0800, KOSAKI Motohiro wrote:
> > On Wed, Jul 28, 2010 at 04:46:54PM +0800, Wu Fengguang wrote:
> > > The wait_on_page_writeback() call inside pageout() is virtually dead code.
> > > 
> > >         shrink_inactive_list()
> > >           shrink_page_list(PAGEOUT_IO_ASYNC)
> > >             pageout(PAGEOUT_IO_ASYNC)
> > >           shrink_page_list(PAGEOUT_IO_SYNC)
> > >             pageout(PAGEOUT_IO_SYNC)
> > > 
> > > Because shrink_page_list/pageout(PAGEOUT_IO_SYNC) is always called after
> > > a preceding shrink_page_list/pageout(PAGEOUT_IO_ASYNC), the first
> > > pageout(ASYNC) converts dirty pages into writeback pages, the second
> > > shrink_page_list(SYNC) waits on the clean of writeback pages before
> > > calling pageout(SYNC). The second shrink_page_list(SYNC) can hardly run
> > > into dirty pages for pageout(SYNC) unless in some race conditions.
> > > 
> > 
> > It's possible for the second call to run into dirty pages as there is a
> > congestion_wait() call between the first shrink_page_list() call and the
> > second. That's a big window.
> > 
> > > And the wait page-by-page behavior of pageout(SYNC) will lead to very
> > > long stall time if running into some range of dirty pages.
> > 
> > True, but this is also lumpy reclaim which is depending on a contiguous
> > range of pages. It's better for it to wait on the selected range of pages
> > which is known to contain at least one old page than excessively scan and
> > reclaim newer pages.
> 
> Today, I was successful to reproduce the Andres's issue. and I disagree this
> opinion.
> The root cause is, congestion_wait() mean "wait until clear io congestion". but
> if the system have plenty dirty pages, flusher threads are issueing IO conteniously.
> So, io congestion is not cleared long time. eventually, congestion_wait(BLK_RW_ASYNC, HZ/10)
> become to equivalent to sleep(HZ/10).
> 
> I would propose followint patch instead.
> 
> And I've found synchronous lumpy reclaim have more serious problem. I woule like to
> explain it as another mail.
> 
> Thanks.
> 
> 
> 
> >From 0266fb2c23aef659cd4e89fccfeb464f23257b74 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Tue, 27 Jul 2010 14:36:44 +0900
> Subject: [PATCH] vmscan: synchronous lumpy reclaim don't call congestion_wait()
> 
> congestion_wait() mean "waiting for number of requests in IO queue is
> under congestion threshold".
> That said, if the system have plenty dirty pages, flusher thread push
> new request to IO queue conteniously. So, IO queue are not cleared
> congestion status for a long time. thus, congestion_wait(HZ/10) is
> almostly equivalent schedule_timeout(HZ/10).
> 
> If the system 512MB memory, DEF_PRIORITY mean 128kB scan and 4096 times
> shrink_inactive_list call. 4096 times 0.1sec stall makes crazy insane
> long stall. That shouldn't.

Good point. Maybe more clear to say: "It takes 4096 shrink_page_list()
calls to scan 512MB memory."

> In the other hand, this synchronous lumpy reclaim donesn't need this
> congestion_wait() at all. shrink_page_list(PAGEOUT_IO_SYNC) cause to
> call wait_on_page_writeback() and it provide sufficient waiting.

Agreed.

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

Thanks,
Fengguang

  parent reply	other threads:[~2010-08-01  5:31 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-28  7:17 [PATCH] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls Wu Fengguang
2010-07-28  7:49 ` Minchan Kim
2010-07-28  8:46   ` [PATCH] vmscan: remove wait_on_page_writeback() from pageout() Wu Fengguang
2010-07-28  9:10     ` Mel Gorman
2010-07-28  9:30       ` Wu Fengguang
2010-07-28  9:45         ` Mel Gorman
2010-07-28  9:43       ` KOSAKI Motohiro
2010-07-28  9:50         ` Mel Gorman
2010-07-28  9:59           ` KOSAKI Motohiro
2010-08-01  5:27             ` Wu Fengguang
2010-08-01  5:49               ` Wu Fengguang
2010-08-01  8:32               ` KOSAKI Motohiro
2010-08-01  8:35                 ` Wu Fengguang
2010-08-01  8:40                   ` KOSAKI Motohiro
2010-08-01  5:17         ` Wu Fengguang [this message]
2010-07-28 16:29     ` Minchan Kim
2010-07-28 11:40 ` Why PAGEOUT_IO_SYNC stalls for a long time KOSAKI Motohiro
2010-07-28 13:10   ` Mel Gorman
2010-07-29 10:34     ` KOSAKI Motohiro
2010-07-29 14:24       ` Mel Gorman
2010-07-30  4:54         ` KOSAKI Motohiro
2010-07-30 10:30           ` Mel Gorman
2010-08-01  8:47             ` KOSAKI Motohiro
2010-08-04 11:10               ` Mel Gorman
2010-08-05  6:20                 ` KOSAKI Motohiro
2010-08-05  8:09                   ` Andreas Mohr
2010-07-28 17:30   ` Andrew Morton
2010-07-29  1:01     ` KOSAKI Motohiro
2010-07-30 13:17 ` [PATCH] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls Andrea Arcangeli
2010-07-30 13:31   ` Mel Gorman
2010-07-31 16:13 ` Wu Fengguang
2010-07-31 17:33   ` Christoph Hellwig
2010-07-31 17:55     ` Pekka Enberg
2010-07-31 17:59       ` Christoph Hellwig
2010-07-31 18:09         ` Pekka Enberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100801051756.GA7515@localhost \
    --to=fengguang.wu@intel.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@lisas.de \
    --cc=apw@shadowen.org \
    --cc=bgamari.foss@gmail.com \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=davidsen@tmr.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=minchan.kim@gmail.com \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).