From: Mel Gorman <mel@csn.ul.ie>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
Date: Thu, 19 Aug 2010 16:40:33 +0100
Message-ID: <20100819154032.GE19797@csn.ul.ie>
In-Reply-To: <20100819152233.GD6805@barrios-desktop>

On Fri, Aug 20, 2010 at 12:22:33AM +0900, Minchan Kim wrote:
> On Thu, Aug 19, 2010 at 04:07:39PM +0100, Mel Gorman wrote:
> > On Thu, Aug 19, 2010 at 11:34:39PM +0900, Minchan Kim wrote:
> > > On Thu, Aug 19, 2010 at 03:09:46PM +0100, Mel Gorman wrote:
> > > > On Thu, Aug 19, 2010 at 11:01:50PM +0900, Minchan Kim wrote:
> > > > > On Thu, Aug 19, 2010 at 11:38:39AM +0100, Mel Gorman wrote:
> > > > > > On Thu, Aug 19, 2010 at 07:33:57PM +0900, Minchan Kim wrote:
> > > > > > > On Thu, Aug 19, 2010 at 5:06 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> > > > > > > > On Wed, Aug 18, 2010 at 11:57:26PM +0900, Minchan Kim wrote:
> > > > > > > >> On Wed, Aug 18, 2010 at 09:51:23AM +0100, Mel Gorman wrote:
> > > > > > > > >> > > What is the window between the low and min wmark? Maybe I am missing your point.
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > > >> > The window exists because kswapd might not be awake yet, as NR_FREE_PAGES is
> > > > > > > > >> > higher than it should be. The
> > > > > > > >> > system is really somewhere between the low and min watermark but we are not
> > > > > > > >> > taking the accurate measure until kswapd gets woken up. The first allocation
> > > > > > > >> > to notice we are below the low watermark (be it due to vmstat refreshing or
> > > > > > > >> > that NR_FREE_PAGES happens to report we are below the watermark regardless of
> > > > > > > >> > any drift) wakes kswapd and other callers then take an accurate count hence
> > > > > > > >> > "we could breach the watermark but I'm expecting it can only happen for at
> > > > > > > >> > worst one allocation".
> > > > > > > >>
> > > > > > > >> Right. I misunderstood your word.
> > > > > > > >> One more question.
> > > > > > > >>
> > > > > > > >> Could you explain live lock scenario?
> > > > > > > >>
> > > > > > > >
> > > > > > > > Let's say
> > > > > > > >
> > > > > > > > NR_FREE_PAGES     = 256
> > > > > > > > Actual free pages = 8
> > > > > > > >
> > > > > > > > The PCP lists get refilled in batch, taking all 8 pages. Now there are
> > > > > > > > zero free pages. Reclaim kicks in but to reclaim any pages it needs to
> > > > > > > > clean something but all the pages are on a network-backed filesystem. To
> > > > > > > > clean them, it must transmit on the network so it tries to allocate some
> > > > > > > > buffers.
> > > > > > > >
> > > > > > > > The livelock is that to free some memory, an allocation must succeed but
> > > > > > > > for an allocation to succeed, some memory must be freed. The system
> > > > > > > 
> > > > > > > Yes, I understood this as a livelock, but in the end the VM will kill a
> > > > > > > victim process and then it can allocate free pages.
> > > > > > 
> > > > > > And if the exit path for the OOM kill needs to allocate a page what
> > > > > > should it do?
> > > > > 
> > > > > Yeah. It might be livelock. 
> > > > > Then, let's rethink the problem. 
> > > > > 
> > > > > > The problem is as follows.
> > > > > > 
> > > > > > 1. Process A tries to allocate a page
> > > > > > 2. The VM tries to reclaim pages for process A
> > > > > > 3. The VM reclaims some pages but they remain on the PCP lists, so it can't allocate pages for A
> > > > > > 4. The VM tries to kill process B
> > > > > > 5. The exit path needs new pages for exiting process B
> > > > > > 6. Livelock happens (I am not sure, but we need at least a warning if it really happens)
> > > > > 
> > > > 
> > > > The problem this patch is concerned with is about the vmstat counters, not
> > > > the pages on the per-cpu lists. The issue being dealt with is that the page
> > > > allocator grants a page going below the min watermark because NR_FREE_PAGES
> > > > can be inaccurate. The patch aims to fix that by taking greater care
> > > > with NR_FREE_PAGES when memory is low.
> > > 
> > > Your goal is to protect the reserved _min_ pages, right?
> > > I thought your final goal was to prevent the livelock problem.
> > > Hmm.. Sorry for the noise. :(
> > > 
> > 
> > Emm, it's the same thing. If the min watermark is not properly
> > preserved, the system is in danger of being live-locked.
> 
> Totally right. 
> Maybe I am sleeping.
> 
> Let's add the following as a comment about the livelock.
> 

Sure!

> "If NR_FREE_PAGES is much higher than the number of real free pages in buddy,
> the VM can allocate pages below the min watermark (at worst, buddy is zero).
> Although the VM kills some victim to free memory, it can't do so if the
> exit path requires a new page since buddy has zero pages. It can result in
> livelock."
> 

Thanks

> At least, it help to not hurt you in future by me who is fool. 
> 

The patch leader now reads as

Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists. To avoid synchronization overhead,
counter deltas are maintained on a per-cpu basis and drained both periodically
and when the delta is above a threshold. On systems with a large number of
CPUs, the difference between the estimated and real value of NR_FREE_PAGES
can be very high. If NR_FREE_PAGES is much higher than the number of real free
pages in the buddy allocator, the VM can allocate pages below the min
watermark, at worst reducing the real number of free pages to zero. Even if
the OOM killer kills some victim to free memory, it may not free any memory
if the exit path requires a new page, resulting in livelock.

This patch introduces zone_nr_free_pages() to take a slightly more accurate
estimate of NR_FREE_PAGES while kswapd is awake. The estimate is not perfect
and may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.

Is that better?
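
For anyone following along, the drift described in the leader can be modelled
in userspace. The sketch below is only an illustration and is not the patch:
the model_* names, the CPU count and the threshold are made up, and it assumes
the "more accurate estimate" is simply the global counter plus whatever deltas
are still pending on each CPU.

```c
#include <assert.h>

#define MODEL_NR_CPUS   4   /* hypothetical CPU count */
#define MODEL_THRESHOLD 32  /* hypothetical per-cpu drain threshold */

/* Toy model of one zone's free page accounting (not kernel code). */
struct model_zone {
	long nr_free_global;            /* what a cheap counter read sees */
	long cpu_delta[MODEL_NR_CPUS];  /* updates not yet folded in */
};

/*
 * Record a change in free pages on one CPU. The delta is folded into
 * the global counter only once its magnitude exceeds the threshold,
 * so the global value can lag behind reality.
 */
static void model_mod_free_pages(struct model_zone *z, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	z->cpu_delta[cpu] += delta;
	if (z->cpu_delta[cpu] > MODEL_THRESHOLD ||
	    z->cpu_delta[cpu] < -MODEL_THRESHOLD) {
		z->nr_free_global += z->cpu_delta[cpu];
		z->cpu_delta[cpu] = 0;
	}
}

/* Cheap estimate: read the global counter and ignore pending deltas. */
static long model_nr_free_cheap(const struct model_zone *z)
{
	return z->nr_free_global;
}

/* Costlier estimate: also walk every CPU's pending delta. */
static long model_nr_free_accurate(const struct model_zone *z)
{
	long sum = z->nr_free_global;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += z->cpu_delta[cpu];
	return sum < 0 ? 0 : sum;
}

/* Pay for the accurate count only while (modelled) kswapd is awake. */
static long model_nr_free_pages(const struct model_zone *z, int kswapd_awake)
{
	return kswapd_awake ? model_nr_free_accurate(z)
			    : model_nr_free_cheap(z);
}
```

In this model the cheap read can overestimate by up to MODEL_NR_CPUS *
MODEL_THRESHOLD pages. Starting from 256 free pages, if each of the 4 CPUs
allocates 32 pages one at a time, nothing is folded, so the cheap read still
says 256 while the accurate one says 128.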

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

