From: Wu Fengguang <fengguang.wu@intel.com>
To: Christoph Lameter <cl@linux.com>
Cc: Haicheng Li <haicheng.li@linux.intel.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>, Mel Gorman <mel@csn.ul.ie>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH 3/3] mem-hotplug: fix potential race while building zonelist for new populated zone
Date: Tue, 18 May 2010 10:19:23 +0800 [thread overview]
Message-ID: <20100518021923.GA6595@localhost> (raw)
In-Reply-To: <alpine.DEB.2.00.1005171108070.20764@router.home>
On Tue, May 18, 2010 at 12:09:31AM +0800, Christoph Lameter wrote:
> On Mon, 17 May 2010, Haicheng Li wrote:
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 72c1211..0729a82 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2783,6 +2783,20 @@ static __init_refok int __build_all_zonelists(void
> > *data)
> > {
> > int nid;
> > int cpu;
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> > + struct zone_online_info *new = (struct zone_online_info *)data;
> > +
> > + /*
> > + * Populate the new zone before build zonelists, which could
> > + * happen only when onlining a new node after system is booted.
> > + */
> > + if (new) {
> > + /* We are expecting a new memory block here. */
> > + WARN_ON(!new->onlined_pages);
> > + new->zone->present_pages += new->onlined_pages;
> > + new->zone->zone_pgdat->node_present_pages +=
> > new->onlined_pages;
> > + }
> > +#endif
>
>
> Building a zonelist now has the potential side effect of changes to the
> size of the zone?
Yeah, this sounds a bit hacky.
> Can we have a global mutex that protects against size modification of
> zonelists instead? And it could also serialize the pageset setup?
Good suggestion. We could make zone_pageset_mutex a global mutex and
take it in all the functions that call build_all_zonelists() --
currently only online_pages() and numa_zonelist_order_handler().
This can equally fix the possible race:
CPU0 CPU1 CPU2
(1) zone->present_pages += online_pages;
(2) build_all_zonelists();
(3) alloc_page();
(4) free_page();
(5) build_all_zonelists();
(6) __build_all_zonelists();
(7) zone->pageset = alloc_percpu();
In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).
Thanks,
Fengguang
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Christoph Lameter <cl@linux.com>
Cc: Haicheng Li <haicheng.li@linux.intel.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>, Mel Gorman <mel@csn.ul.ie>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH 3/3] mem-hotplug: fix potential race while building zonelist for new populated zone
Date: Tue, 18 May 2010 10:19:23 +0800 [thread overview]
Message-ID: <20100518021923.GA6595@localhost> (raw)
In-Reply-To: <alpine.DEB.2.00.1005171108070.20764@router.home>
On Tue, May 18, 2010 at 12:09:31AM +0800, Christoph Lameter wrote:
> On Mon, 17 May 2010, Haicheng Li wrote:
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 72c1211..0729a82 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2783,6 +2783,20 @@ static __init_refok int __build_all_zonelists(void
> > *data)
> > {
> > int nid;
> > int cpu;
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> > + struct zone_online_info *new = (struct zone_online_info *)data;
> > +
> > + /*
> > + * Populate the new zone before build zonelists, which could
> > + * happen only when onlining a new node after system is booted.
> > + */
> > + if (new) {
> > + /* We are expecting a new memory block here. */
> > + WARN_ON(!new->onlined_pages);
> > + new->zone->present_pages += new->onlined_pages;
> > + new->zone->zone_pgdat->node_present_pages +=
> > new->onlined_pages;
> > + }
> > +#endif
>
>
> Building a zonelist now has the potential side effect of changes to the
> size of the zone?
Yeah, this sounds a bit hacky.
> Can we have a global mutex that protects against size modification of
> zonelists instead? And it could also serialize the pageset setup?
Good suggestion. We could make zone_pageset_mutex a global mutex and
take it in all the functions that call build_all_zonelists() --
currently only online_pages() and numa_zonelist_order_handler().
This can equally fix the possible race:
CPU0 CPU1 CPU2
(1) zone->present_pages += online_pages;
(2) build_all_zonelists();
(3) alloc_page();
(4) free_page();
(5) build_all_zonelists();
(6) __build_all_zonelists();
(7) zone->pageset = alloc_percpu();
In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).
Thanks,
Fengguang
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-05-18 2:19 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-05-17 8:20 [PATCH 3/3] mem-hotplug: fix potential race while building zonelist for new populated zone Haicheng Li
2010-05-17 8:20 ` Haicheng Li
2010-05-17 16:09 ` Christoph Lameter
2010-05-17 16:09 ` Christoph Lameter
2010-05-18 2:19 ` Wu Fengguang [this message]
2010-05-18 2:19 ` Wu Fengguang
2010-05-18 9:02 ` [RESEND][PATCH " Haicheng Li
2010-05-18 9:02 ` Haicheng Li
2010-05-18 13:55 ` Christoph Lameter
2010-05-18 13:55 ` Christoph Lameter
2010-05-19 3:48 ` [RESEND -v2][PATCH " Haicheng Li
2010-05-19 3:48 ` Haicheng Li
2010-05-19 15:21 ` Christoph Lameter
2010-05-19 15:21 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100518021923.GA6595@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=cl@linux.com \
--cc=haicheng.li@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.