linux-mm.kvack.org archive mirror
From: Mel Gorman <mel@csn.ul.ie>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Christoph Hellwig <hch@infradead.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 05/12] vmscan: kill prev_priority completely
Date: Mon, 28 Jun 2010 11:35:20 +0100
Message-ID: <20100628103519.GC25379@csn.ul.ie>
In-Reply-To: <20100624211413.802B.A69D9226@jp.fujitsu.com>

On Fri, Jun 25, 2010 at 05:29:41PM +0900, KOSAKI Motohiro wrote:
> 
> Sorry for the long delay.
> (And I'm wondering a bit why I was not CCed on this thread ;)
> 

My fault, I unintentionally deleted your name from the send script.
Sorry about that.

> > On Mon, 14 Jun 2010 12:17:46 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > 
> > > Since 2.6.28, zone->prev_priority has been unused, so it can be removed
> > > safely. This reduces stack usage slightly.
> > > 
> > > Now I have to say that I'm sorry. Two years ago, I thought prev_priority
> > > could be integrated again and would be useful, but four (or more) attempts
> > > have not produced good performance numbers. Thus I am giving up on that
> > > approach.
> > 
> > This would have been badder in earlier days when we were using the
> > scanning priority to decide when to start unmapping pte-mapped pages -
> > page reclaim would have been recirculating large blobs of mapped pages
> > around the LRU until the priority had built to the level where we
> > started to unmap them.
> > 
> > However that priority-based decision got removed and right now I don't
> > recall what it got replaced with.  Aren't we now unmapping pages way
> > too early and suffering an increased major&minor fault rate?  Worried.
> > 
> > 
> > Things which are still broken after we broke prev_priority:
> > 
> > - If page reclaim is having a lot of trouble, prev_priority would
> >   have permitted do_try_to_free_pages() to call disable_swap_token()
> >   earlier on.  As things presently stand, we'll do a lot of
> >   thrash-detection stuff before (presumably correctly) dropping the
> >   swap token.
> > 
> >   So.  What's up with that?  I don't even remember _why_ we disable
> >   the swap token once the scanning priority gets severe and the code
> >   comments there are risible.  And why do we wait until priority==0
> >   rather than priority==1?
> > 
> > - Busted prev_priority means that lumpy reclaim will act oddly. 
> > >   Every time someone goes in to do some reclaim, they'll start out not
> > >   doing lumpy reclaim.  Then, after a while, they'll get a clue and
> > >   will start doing the lumpy thing.  Then they return from reclaim and
> > >   the next reclaim caller will again forget that he should have done
> >   lumpy reclaim.
> > 

The intention of the code was that, for orders < PAGE_ALLOC_COSTLY_ORDER,
there was an expectation that those pages would be free or nearly free
without page reclaim taking special steps with lumpy reclaim. If this has
changed, it's almost certainly because of a greater dependence on high-order
pages than previously which should be resisted (it has cropped up a few
times recently). I do have a script that uses ftrace to count call sites
using high-order allocations and how often they occur which would be of use
if this problem is being investigated.
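
For anyone following the lumpy reclaim point, the per-call decision can be
sketched in userspace roughly as below. This is a simplified model, not the
kernel source: the helper names use_lumpy_reclaim() and first_lumpy_priority()
are hypothetical, and the constants and thresholds are assumed to mirror
shrink_inactive_list() in kernels of this era.

```c
#include <stdbool.h>

/* Values assumed from kernels of this era; treat them as illustrative. */
#define DEF_PRIORITY            12
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical helper modelling the per-call lumpy reclaim decision.
 * A lower priority value means reclaim is having more trouble.
 */
static bool use_lumpy_reclaim(int order, int priority)
{
	/* Costly orders use lumpy reclaim from the start. */
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return true;

	/* Smaller non-zero orders only once scanning has become difficult. */
	if (order && priority < DEF_PRIORITY - 2)
		return true;

	return false;
}

/*
 * Hypothetical helper: the first priority, counting down from
 * DEF_PRIORITY, at which a fresh reclaim caller would switch to lumpy
 * reclaim, or -1 if it never does.
 */
static int first_lumpy_priority(int order)
{
	for (int priority = DEF_PRIORITY; priority >= 0; priority--)
		if (use_lumpy_reclaim(order, priority))
			return priority;
	return -1;
}
```

Under these assumptions an order-2 caller always begins at DEF_PRIORITY without
lumpy reclaim and only switches to it at priority 9, regardless of what the
previous caller learned. That is the forgetfulness described in the quoted
text once prev_priority is gone.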

> >   I dunno what the effects of this are in the real world, but it
> >   seems dumb.
> > 
> > And one has to wonder: if we're making these incorrect decisions based
> > upon a bogus view of the current scanning difficulty, why are these
> > various priority-based thresholding heuristics even in there?  Are they
> > doing anything useful?
> > 
> > So..  either we have a load of useless-crap-and-cruft in there which
> > should be lopped out, or we don't have a load of useless-crap-and-cruft
> > in there, and we should fix prev_priority.
> 
> May I explain my experience? I'd like to explain why prev_priority wouldn't
> work nowadays.
> 
> First of all, yes, current vmscan still has a lot of UP-centric code. It
> exposes some weaknesses on machines with some dozens of CPUs. I think we
> need more and more improvement.
> 
> The problem is that current vmscan mixes up per-system pressure, per-zone
> pressure and per-task pressure a bit. For example, prev_priority tries to
> boost the priority of other concurrent reclaim. But if the other task has a
> mempolicy restriction, that is unnecessary and also causes needlessly large
> latency and excessive reclaim. Per-task priority plus the prev_priority
> adjustment emulates per-system pressure, but it has two issues: 1) the
> emulation is too rough and brutal, 2) we need per-zone pressure, not
> per-system.
> 
> Another example: currently DEF_PRIORITY is 12. That means the LRU rotates
> about 2 cycles (1/4096 + 1/2048 + 1/1024 + .. + 1) before invoking the
> OOM-Killer. But if 10,000 threads enter DEF_PRIORITY reclaim at the same
> time, the system has higher memory pressure than priority==0
> (1/4096*10,000 > 2). prev_priority can't solve such multithreaded workload
> issues.
> 
> In other words, the prev_priority concept assumes the system doesn't have
> lots of threads.
> 
> And I don't think the lumpy reclaim threshold is a big matter, because it
> was introduced for an aim7 corner-case issue. I don't think such a situation
> will occur frequently in real workloads, thus end users can't observe
> such logic.
> 

I'm not aware of current stalls or other problems related to lumpy
reclaim, but it's not something I have specifically investigated. If
there is a known example workload that is felt to trigger lumpy reclaim
more than it should, someone point me in the general direction and I'll
take a look at it with ftrace and see what falls out.
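
The DEF_PRIORITY arithmetic quoted above can be checked with a small sketch.
It assumes, as the quoted text does, that a scan at a given priority covers
1/2^priority of the LRU. The helper names are made up for illustration, and
everything is counted in units of 1/4096 of the list to stay in integers.

```c
#define DEF_PRIORITY 12

/* Portion of the LRU scanned at one priority, in units of 1/4096. */
static unsigned long scan_units(int priority)
{
	return (1UL << DEF_PRIORITY) >> priority;
}

/* Units scanned by one task walking priorities DEF_PRIORITY down to 0. */
static unsigned long full_pass_units(void)
{
	unsigned long total = 0;

	for (int priority = DEF_PRIORITY; priority >= 0; priority--)
		total += scan_units(priority);
	return total;
}

/* Units scanned when nr_tasks tasks each scan once at DEF_PRIORITY. */
static unsigned long concurrent_units(unsigned long nr_tasks)
{
	return nr_tasks * scan_units(DEF_PRIORITY);
}
```

full_pass_units() comes to 8191 units, just under two rotations of 4096 units
each, while concurrent_units(10000) is 10,000 units, roughly 2.44 rotations.
So under this model 10,000 tasks that never leave DEF_PRIORITY already scan
more than one task that goes all the way to priority 0.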

> For the mapped-vs-unmapped thing, I don't know the exact reason. That was
> introduced by Rik; unfortunately I was not involved at design time.
> I can only say that, in my testing, the current code
> works well.
> 
> That said, my conclusion is the opposite. In the long term, we should
> consider killing reclaim priority completely. Instead, we should
> consider introducing per-zone pressure statistics.

Ah, the "what is pressure?" rat-hole :)

> 
> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > ---
> > >  include/linux/mmzone.h |   15 ------------
> > >  mm/page_alloc.c        |    2 -
> > >  mm/vmscan.c            |   57 ------------------------------------------------
> > >  mm/vmstat.c            |    2 -
> > 
> > The patch forgot to remove mem_cgroup_get_reclaim_priority() and friends.
> 
> Sure. thanks.
> Will fix.
> 

I've fixed this up in the current patchset that is V3.

> 
> btw, current zone reclaim has wrong swap token usage.
> 
> 	static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> 	{
> 	(snip)
> 	        disable_swap_token();
> 	        cond_resched();
> 
> 
> I can't understand the reason why zone reclaim _always_ disables the swap
> token. That means, if zone reclaim is enabled on the system, the swap token
> doesn't work at all.
> 
> Perhaps the original author's intention was the following, I guess.
> 
>                 priority = ZONE_RECLAIM_PRIORITY;
>                 do {
>                         if ((zone_reclaim_mode & RECLAIM_SWAP) && !priority)	// here
> 			        disable_swap_token();				// here
> 
>                         note_zone_scanning_priority(zone, priority);
>                         shrink_zone(priority, zone, &sc);
>                         priority--;
>                 } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
> 
> 
> However, if my understanding is correct, we can remove this
> disable_swap_token() completely, because zone reclaim failure doesn't lead
> to the OOM-Killer; instead it merely causes normal try_to_free_pages().
> 
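
The loop suggested in the quoted text can be modelled in userspace to see when
the swap token would be dropped. This is only a sketch: struct reclaim_model
and model_zone_reclaim() are hypothetical, per_pass is a stand-in for whatever
shrink_zone() would reclaim on each pass, and the two constants are assumed to
match mm/vmscan.c of this era.

```c
/* Assumed to match mm/vmscan.c of this era; treat as illustrative values. */
#define ZONE_RECLAIM_PRIORITY 4
#define RECLAIM_SWAP          (1 << 2)

struct reclaim_model {
	int token_disabled_at;       /* priority where the token was dropped, -1 if never */
	unsigned long nr_reclaimed;  /* pages reclaimed over all passes */
};

/* Hypothetical userspace model of the suggested zone reclaim loop. */
static struct reclaim_model model_zone_reclaim(int zone_reclaim_mode,
					       unsigned long nr_pages,
					       unsigned long per_pass)
{
	struct reclaim_model m = { .token_disabled_at = -1, .nr_reclaimed = 0 };
	int priority = ZONE_RECLAIM_PRIORITY;

	do {
		/* Drop the swap token only on the last, most severe pass. */
		if ((zone_reclaim_mode & RECLAIM_SWAP) && !priority)
			m.token_disabled_at = priority;

		m.nr_reclaimed += per_pass;  /* stand-in for shrink_zone() */
		priority--;
	} while (priority >= 0 && m.nr_reclaimed < nr_pages);

	return m;
}
```

In this model the token is left alone whenever an earlier pass reclaims
enough, and is dropped at priority 0 only when RECLAIM_SWAP is set and every
earlier pass has fallen short. The current code, by contrast, drops it
unconditionally on entry.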

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

