Re: [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v11r2

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rientjes@google.com, clameter@sgi.com
Subject: Re: [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v11r2
Date: Thu, 13 Dec 2007 11:16:09 -0500
Message-ID: <1197562570.5031.44.camel@localhost>
In-Reply-To: <20071213092338.8b10944c.kamezawa.hiroyu@jp.fujitsu.com>

On Thu, 2007-12-13 at 09:23 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 12 Dec 2007 16:32:51 -0500
> Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> 
> > Just this afternoon, I hit a null pointer deref in
> > __mem_cgroup_remove_list() [called from mem_cgroup_uncharge() if I can
> > trust the stack trace] attempting to unmap a page for migration.  I'm
> > just starting to investigate this.
> > 
> > I'll replace the series I have [~V10] with V11r2 and continue testing in
> > anticipation of the day that we can get this into -mm.
> > 
> Hi, Lee-san.
> 
> Do you know what the caller of page migration was?
> A system call? Hot removal? Or some new thing?

Kame-san:

I was testing with my out-of-tree automatic-lazy migration patches.  See
http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/197.pdf
for an overview.  These patches arrange to unmap [remove ptes] for all
anon pages in a task's vmas with default/local policy [and with mapcount
below a tunable threshold] when the load balancer moves the task to a
different node.  Then, the pages will migrate on next touch/fault, if
they are misplaced relative to the policy--which is likely for many of
the pages, as the task is executing on a different node.  For file
backed pages, automatic migration unmaps only the current task's pte,
so that the task takes a fault on next touch and Nick Piggin's
pagecache replication patches either use an existing local copy of the
page or make one if necessary.
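
As a rough illustration of the unmap-then-migrate-on-fault flow described
above, here is a minimal user-space C model.  It is a sketch under stated
assumptions, not code from the patches: struct model_page,
lazy_unmap_on_move() and fault_on_touch() are hypothetical names, every
page is assumed to sit in a vma with default/local policy, and file backed
pages are simply skipped rather than handed to pagecache replication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the patches. */
struct model_page {
	int  nid;       /* node the page currently lives on */
	int  mapcount;  /* number of ptes mapping the page */
	bool anon;      /* anonymous (true) or file backed (false) */
	bool mapped;    /* does the modelled task still have a pte for it? */
};

/*
 * Models the load balancer moving the task to another node: drop the
 * task's pte for each anon page whose mapcount is below the threshold.
 * Returns the number of pages unmapped.
 */
static int lazy_unmap_on_move(struct model_page *pages, size_t n,
			      int mapcount_threshold)
{
	int unmapped = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_page *p = &pages[i];

		if (!p->anon || !p->mapped)
			continue;	/* file backed pages are not modelled */
		if (p->mapcount >= mapcount_threshold)
			continue;	/* too widely shared, leave it mapped */
		p->mapped = false;
		p->mapcount--;
		unmapped++;
	}
	return unmapped;
}

/*
 * Models the fault on next touch: if the page is misplaced relative to
 * the node the task now runs on, "migrate" it, then re-establish the pte.
 * Returns true if the page was migrated.
 */
static bool fault_on_touch(struct model_page *p, int task_nid)
{
	bool migrated = false;

	assert(p != NULL);
	if (p->mapped)
		return false;	/* no fault is taken on a mapped page */
	if (p->anon && p->nid != task_nid) {
		p->nid = task_nid;
		migrated = true;
	}
	p->mapped = true;
	p->mapcount++;
	return migrated;
}
```

In this model a page that was unmapped on the move and is already on the
task's new node just gets its pte back on the fault, with no migration.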

The stack trace was:

try_to_unmap_one -> page_remove_rmap -> mem_cgroup_uncharge, with the
faulting instruction apparently in __mem_cgroup_remove_list().

> 
> Note: 2.6.24-rc4-mm1's cgroup/migration logic.
> 
> In 2.6.24-rc4-mm1, in page migration, mem_cgroup_prepare_migration() increments
> page_cgroup's refcnt before calling try_to_unmap(). This extra refcnt guarantees
> the page_cgroup's refcnt will not drop to 0 in the sequence
> unmap_and_move() -> try_to_unmap() -> page_remove_rmap() -> mem_cgroup_uncharge().

Yes, I've seen that code.  I'm working on page migration/replication in
the background, keeping the patches up to date and tested, so I haven't
had a lot of time to investigate.
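
For reference, the refcount pin quoted above can be modelled in a few
lines of user-space C.  This is only a sketch of the idea, assuming a
plain integer refcount and no locking; struct model_page_cgroup,
model_prepare_migration() and model_uncharge() are hypothetical stand-ins
and not the 2.6.24-rc4-mm1 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page_cgroup; only the refcount is modelled. */
struct model_page_cgroup {
	int  refcnt;
	bool on_list;	/* still linked on its cgroup's list */
};

/* Models the extra reference taken before try_to_unmap() is called. */
static void model_prepare_migration(struct model_page_cgroup *pc)
{
	pc->refcnt++;
}

/*
 * Models the uncharge reached via page_remove_rmap(): drop one
 * reference and unlink from the list only when the last one goes.
 * Returns true if the page_cgroup was unlinked.
 */
static bool model_uncharge(struct model_page_cgroup *pc)
{
	assert(pc->refcnt > 0);
	if (--pc->refcnt > 0)
		return false;	/* still pinned, list entry stays */
	pc->on_list = false;
	return true;
}
```

With the pin in place, the uncharge during unmap leaves the count at 1
and the list entry intact; without it, the same uncharge drops the count
to 0 and unlinks the entry.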

I do have a heavy-handed instrumentation patch to try to trap any null
pointers or stale {page|mem}_cgroup pointers.  [I can send, if you're
interested.]  I restarted the stress test with that patch.  The test ran
quite a bit longer and then hit a different bug.  So, I have a race
somewhere.  If I can definitely pin it on the memory controller or an
interaction of the memory controller with page migration/replication,
I'll let you know--and try to come up with a patch.  Otherwise, you
probably don't need to worry.

Lee
> 
> Thanks,
> -Kame
>
