linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible
Date: Wed, 9 Jun 2010 10:52:00 +0100	[thread overview]
Message-ID: <20100609095200.GA5650@csn.ul.ie> (raw)
In-Reply-To: <20100609115211.435a45f7.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, Jun 09, 2010 at 11:52:11AM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue,  8 Jun 2010 10:02:19 +0100
> > <SNIP>
> > 
> > Patch 5 writes out contiguous ranges of pages where possible using
> > a_ops->writepages. When writing a range, the inode is pinned and the page
> > lock released before submitting to writepages(). This potentially generates
> > a better IO pattern and it should avoid a lock inversion problem within the
> > filesystem that wants the same page lock held by the VM. The downside with
> > writing ranges is that the VM may not be generating more IO than necessary.
> > 
> > Patch 6 prevents direct reclaim writing out pages at all and instead dirty
> > pages are put back on the LRU. For lumpy reclaim, the caller will briefly
> > wait on dirty pages to be written out before trying to reclaim the dirty
> > pages a second time.
> > 
> > The last patch increases the responsibility of kswapd somewhat because
> > it's now cleaning pages on behalf of direct reclaimers but kswapd seemed
> > a better fit than background flushers to clean pages as it knows where the
> > pages needing cleaning are. As it's async IO, it should not cause kswapd to
> > stall (at least until the queue is congested) but the order that pages are
> > reclaimed on the LRU is altered. Dirty pages that would have been reclaimed
> > by direct reclaimers are getting another lap on the LRU. The dirty pages
> > could have been put on a dedicated list but this increased counter overhead
> > and the number of lists and it is unclear if it is necessary.
> > 
> > <SNIP>
> 
> My concern is how memcg should work. IOW, what changes will be necessary for
> memcg to work with the new vmscan logic as no-direct-writeback.
> 

At worst, memcg waits on background flushers to clean their pages but
obviously this could lead to stalls in containers if it happened to be full
of dirty pages.

Do you have test scenarios already setup for functional and performance
regression testing of containers? If so, can you run tests with this series
and see what sort of impact you find? I haven't done performance testing
with containers to date so I don't know what the expected values are.

> Maybe an ideal solution will be
>  - support buffered I/O tracking in I/O cgroup.
>  - flusher threads should work with I/O cgroup.
>  - memcg itself should support dirty ratio. and add a trigger to kick flusher
>    threads for dirty pages in a memcg.
> But I know it's a long way.
> 

I'm not very familiar with memcg I'm afraid or its requirements so I am
having trouble guessing which of these would behave the best. You could take
a gamble on having memcg doing writeback in direct reclaim but you may run
into the same problem of overflowing stacks.

I'm not sure how a flusher thread would work just within a cgroup. It
would have to do a lot of searching to find the pages it needs
considering that it's looking at inodes rather than pages.

One possibility I guess would be to create a flusher-like thread if a direct
reclaimer finds that the dirty pages in the container are above the dirty
ratio. It would scan and clean all dirty pages in the container LRU on behalf
of dirty reclaimers.

Another possibility would be to have kswapd work in containers.
Specifically, if wakeup_kswapd() is called with a cgroup that it's added
to a list. kswapd gives priority to global reclaim but would
occasionally check if there is a container that needs kswapd on a
pending list and if so, work within the container. Is there a good
reason why kswapd does not work within container groups?

Finally, you could just allow reclaim within a memcg do writeback. Right
now, the check is based on current_is_kswapd() but I could create a helper
function that also checked for sc->mem_cgroup. Direct reclaim from the
page allocator never appears to work within a container group (which
raises questions in itself such as why a process in a container would
reclaim pages outside the container?) so it would remain safe.

> How the new logic works with memcg ? Because memcg doesn't trigger kswapd,
> memcg has to wait for a flusher thread make pages clean ?

Right now, memcg has to wait for a flusher thread to make pages clean.

> Or memcg should have kswapd-for-memcg ?
> 
> Is it okay to call writeback directly when !scanning_global_lru() ?
> memcg's reclaim routine is only called from specific positions, so, I guess
> no stack problem.

It's a judgement call from you really. I see that direct reclaimers do
not set mem_cgroup so it's down to - are you reasonably sure that all
the paths that reclaim based on a container are not deep? I looked
around for a while and the bulk appeared to be in the fault path so I
would guess "yes" but as I'm not familiar with the memcg implementation
I'll have missed a lot.

> But we just have I/O pattern problem.

True.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

  reply	other threads:[~2010-06-09  9:52 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-08  9:02 [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible Mel Gorman
2010-06-08  9:02 ` [PATCH 1/6] tracing, vmscan: Add trace events for kswapd wakeup, sleeping and direct reclaim Mel Gorman
2010-06-08  9:02 ` [PATCH 2/6] tracing, vmscan: Add trace events for LRU page isolation Mel Gorman
2010-06-08  9:02 ` [PATCH 3/6] tracing, vmscan: Add trace event when a page is written Mel Gorman
2010-06-08  9:02 ` [PATCH 4/6] tracing, vmscan: Add a postprocessing script for reclaim-related ftrace events Mel Gorman
2010-06-08  9:02 ` [PATCH 5/6] vmscan: Write out ranges of pages contiguous to the inode where possible Mel Gorman
2010-06-11  6:10   ` Andrew Morton
2010-06-11 12:49     ` Mel Gorman
2010-06-11 19:07       ` Andrew Morton
2010-06-11 20:44         ` Mel Gorman
2010-06-11 21:33           ` Andrew Morton
2010-06-12  0:17             ` Mel Gorman
2010-06-11 16:27     ` Christoph Hellwig
2010-06-08  9:02 ` [PATCH 6/6] vmscan: Do not writeback pages in direct reclaim Mel Gorman
2010-06-11  6:17   ` Andrew Morton
2010-06-11 12:54     ` Mel Gorman
2010-06-11 16:25     ` Christoph Hellwig
2010-06-11 17:43       ` Andrew Morton
2010-06-11 17:49         ` Christoph Hellwig
2010-06-11 18:13           ` Mel Gorman
2010-06-08  9:08 ` [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible Christoph Hellwig
2010-06-08  9:28   ` Mel Gorman
2010-06-11 16:29     ` Christoph Hellwig
2010-06-11 18:15       ` Mel Gorman
2010-06-11 19:12       ` Chris Mason
2010-06-09  2:52 ` KAMEZAWA Hiroyuki
2010-06-09  9:52   ` Mel Gorman [this message]
2010-06-10  0:38     ` KAMEZAWA Hiroyuki
2010-06-10  1:10       ` Mel Gorman
2010-06-10  1:29         ` KAMEZAWA Hiroyuki
2010-06-11  5:57 ` Andrew Morton
2010-06-11 12:33   ` Mel Gorman
2010-06-11 16:30     ` Christoph Hellwig
2010-06-11 18:17       ` Mel Gorman
2010-06-15 14:00 ` Andrea Arcangeli
2010-06-15 14:11   ` Christoph Hellwig
2010-06-15 14:22     ` Andrea Arcangeli
2010-06-15 14:43       ` Christoph Hellwig
2010-06-15 15:08         ` Andrea Arcangeli
2010-06-15 15:25           ` Christoph Hellwig
2010-06-15 15:45             ` Andrea Arcangeli
2010-06-15 16:26               ` Christoph Hellwig
2010-06-15 16:31                 ` Andrea Arcangeli
2010-06-15 16:49                 ` Rik van Riel
2010-06-15 16:54                   ` Christoph Hellwig
2010-06-15 19:13                     ` Rik van Riel
2010-06-15 19:17                       ` Christoph Hellwig
2010-06-15 19:44                         ` Chris Mason
2010-06-16  7:57                       ` Nick Piggin
2010-06-16 16:59                         ` Rik van Riel
2010-06-16 17:04                           ` Andrea Arcangeli
2010-06-15 16:54                   ` Nick Piggin
2010-06-15 15:38           ` Mel Gorman
2010-06-15 16:14             ` Andrea Arcangeli
2010-06-15 16:22               ` Christoph Hellwig
2010-06-15 16:30               ` Mel Gorman
2010-06-15 16:34                 ` Mel Gorman
2010-06-15 16:54                   ` Andrea Arcangeli
2010-06-15 16:35                 ` Christoph Hellwig
2010-06-15 16:37                 ` Andrea Arcangeli
2010-06-15 17:43                   ` Christoph Hellwig
2010-06-15 16:45               ` Christoph Hellwig
2010-06-15 14:51   ` Mel Gorman
2010-06-15 14:55     ` Rik van Riel
2010-06-15 15:08     ` Nick Piggin
2010-06-15 15:10       ` Mel Gorman
2010-06-15 16:28     ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100609095200.GA5650@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).