All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Stephen Wilson <wilsons@start.ca>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: [PATCH 0/8] avoid allocation in show_numa_map()
Date: Wed, 4 May 2011 16:10:20 -0700	[thread overview]
Message-ID: <20110504161020.e2d0a7f2.akpm@linux-foundation.org> (raw)
In-Reply-To: <1303947349-3620-1-git-send-email-wilsons@start.ca>

On Wed, 27 Apr 2011 19:35:41 -0400
Stephen Wilson <wilsons@start.ca> wrote:

> Recently a concern was raised[1] that performing an allocation while holding a
> reference on a tasks mm could lead to a stalemate in the oom killer.  The
> concern was specific to the goings-on in /proc.  Hugh Dickins stated the issue
> thusly:
> 
>     ...imagine what happens if the system is out of memory, and the mm
>     we're looking at is selected for killing by the OOM killer: while we
>     wait in __get_free_page for more memory, no memory is freed from the
>     selected mm because it cannot reach exit_mmap while we hold that
>     reference.
> 
> The primary goal of this series is to eliminate repeated allocation/free cycles
> currently happening in show_numa_maps() while we hold a reference to an mm.
> 
> The strategy is to perform the allocation once when /proc/pid/numa_maps is
> opened, before a reference on the target tasks mm is taken.
> 
> Unfortunately, show_numa_maps() is implemented in mm/mempolicy.c while the
> primary procfs implementation  lives in fs/proc/task_mmu.c.  This makes
> clean cooperation between show_numa_maps() and the other seq_file operations
> (start(), stop(), etc) difficult.
> 
> 
> Patches 1-5 convert show_numa_maps() to use the generic walk_page_range()
> functionality instead of the mempolicy.c specific page table walking logic.
> Also, get_vma_policy() is exported.  This makes the show_numa_maps()
> implementation independent of mempolicy.c. 
> 
> Patch 6 moves show_numa_maps() and supporting routines over to
> fs/proc/task_mmu.c.
> 
> Finally, patches 7 and 8 provide minor cleanup and eliminates the troublesome
> allocation.
> 
>  
> Please note that moving show_numa_maps() into fs/proc/task_mmu.c essentially
> reverts 1a75a6c825 and 48fce3429d.  Also, please see the discussion at [2].  My
> main justifications for moving the code back into task_mmu.c is:
> 
>   - Having the show() operation "miles away" from the corresponding
>     seq_file iteration operations is a maintenance burden. 
>     
>   - The need to export ad hoc info like struct proc_maps_private is
>     eliminated.
> 
> 
> These patches are based on v2.6.39-rc5.

The patches look reasonable.  It would be nice to get some more review
happening (poke).

> 
> Please note that this series is VERY LIGHTLY TESTED.  I have been using
> CONFIG_NUMA_EMU=y thus far as I will not have access to a real NUMA system for
> another week or two.

"lightly tested" evokes fear, but the patches don't look too scary to
me.

Did you look at using apply_to_page_range()?

I'm trying to remember why we're carrying both walk_page_range() and
apply_to_page_range() but can't immediately think of a reason.

There's also an apply_to_page_range_batch() in -mm but that code is
broken on PPC and not much is happening with it, so it will probably go
away again.



WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Stephen Wilson <wilsons@start.ca>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: [PATCH 0/8] avoid allocation in show_numa_map()
Date: Wed, 4 May 2011 16:10:20 -0700	[thread overview]
Message-ID: <20110504161020.e2d0a7f2.akpm@linux-foundation.org> (raw)
In-Reply-To: <1303947349-3620-1-git-send-email-wilsons@start.ca>

On Wed, 27 Apr 2011 19:35:41 -0400
Stephen Wilson <wilsons@start.ca> wrote:

> Recently a concern was raised[1] that performing an allocation while holding a
> reference on a tasks mm could lead to a stalemate in the oom killer.  The
> concern was specific to the goings-on in /proc.  Hugh Dickins stated the issue
> thusly:
> 
>     ...imagine what happens if the system is out of memory, and the mm
>     we're looking at is selected for killing by the OOM killer: while we
>     wait in __get_free_page for more memory, no memory is freed from the
>     selected mm because it cannot reach exit_mmap while we hold that
>     reference.
> 
> The primary goal of this series is to eliminate repeated allocation/free cycles
> currently happening in show_numa_maps() while we hold a reference to an mm.
> 
> The strategy is to perform the allocation once when /proc/pid/numa_maps is
> opened, before a reference on the target tasks mm is taken.
> 
> Unfortunately, show_numa_maps() is implemented in mm/mempolicy.c while the
> primary procfs implementation  lives in fs/proc/task_mmu.c.  This makes
> clean cooperation between show_numa_maps() and the other seq_file operations
> (start(), stop(), etc) difficult.
> 
> 
> Patches 1-5 convert show_numa_maps() to use the generic walk_page_range()
> functionality instead of the mempolicy.c specific page table walking logic.
> Also, get_vma_policy() is exported.  This makes the show_numa_maps()
> implementation independent of mempolicy.c. 
> 
> Patch 6 moves show_numa_maps() and supporting routines over to
> fs/proc/task_mmu.c.
> 
> Finally, patches 7 and 8 provide minor cleanup and eliminates the troublesome
> allocation.
> 
>  
> Please note that moving show_numa_maps() into fs/proc/task_mmu.c essentially
> reverts 1a75a6c825 and 48fce3429d.  Also, please see the discussion at [2].  My
> main justifications for moving the code back into task_mmu.c is:
> 
>   - Having the show() operation "miles away" from the corresponding
>     seq_file iteration operations is a maintenance burden. 
>     
>   - The need to export ad hoc info like struct proc_maps_private is
>     eliminated.
> 
> 
> These patches are based on v2.6.39-rc5.

The patches look reasonable.  It would be nice to get some more review
happening (poke).

> 
> Please note that this series is VERY LIGHTLY TESTED.  I have been using
> CONFIG_NUMA_EMU=y thus far as I will not have access to a real NUMA system for
> another week or two.

"lightly tested" evokes fear, but the patches don't look too scary to
me.

Did you look at using apply_to_page_range()?

I'm trying to remember why we're carrying both walk_page_range() and
apply_to_page_range() but can't immediately think of a reason.

There's also an apply_to_page_range_batch() in -mm but that code is
broken on PPC and not much is happening with it, so it will probably go
away again.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2011-05-04 23:10 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-27 23:35 [PATCH 0/8] avoid allocation in show_numa_map() Stephen Wilson
2011-04-27 23:35 ` Stephen Wilson
2011-04-27 23:35 ` [PATCH 1/8] mm: export get_vma_policy() Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:39   ` KOSAKI Motohiro
2011-05-09  7:39     ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 2/8] mm: use walk_page_range() instead of custom page table walking code Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:38   ` KOSAKI Motohiro
2011-05-09  7:38     ` KOSAKI Motohiro
2011-05-09 19:36     ` Stephen Wilson
2011-05-09 19:36       ` Stephen Wilson
2011-05-10  0:20       ` KOSAKI Motohiro
2011-05-10  0:20         ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 3/8] mm: remove MPOL_MF_STATS Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:44   ` KOSAKI Motohiro
2011-05-09  7:44     ` KOSAKI Motohiro
2011-05-09 19:39     ` Stephen Wilson
2011-05-09 19:39       ` Stephen Wilson
2011-04-27 23:35 ` [PATCH 4/8] mm: make gather_stats() type-safe and remove forward declaration Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:45   ` KOSAKI Motohiro
2011-05-09  7:45     ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 5/8] mm: remove check_huge_range() Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:46   ` KOSAKI Motohiro
2011-05-09  7:46     ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 6/8] mm: proc: move show_numa_map() to fs/proc/task_mmu.c Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:49   ` KOSAKI Motohiro
2011-05-09  7:49     ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 7/8] proc: make struct proc_maps_private truly private Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  7:51   ` KOSAKI Motohiro
2011-05-09  7:51     ` KOSAKI Motohiro
2011-04-27 23:35 ` [PATCH 8/8] proc: allocate storage for numa_maps statistics once Stephen Wilson
2011-04-27 23:35   ` Stephen Wilson
2011-05-09  8:24   ` KOSAKI Motohiro
2011-05-09  8:24     ` KOSAKI Motohiro
2011-05-04 23:10 ` Andrew Morton [this message]
2011-05-04 23:10   ` [PATCH 0/8] avoid allocation in show_numa_map() Andrew Morton
2011-05-05  2:37   ` Stephen Wilson
2011-05-05  2:37     ` Stephen Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110504161020.e2d0a7f2.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jeremy@goop.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wilsons@start.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.