All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	Jiang Liu <liuj97@gmail.com>,
	mhocko@suse.cz, bsingharora@gmail.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	paul.gortmaker@windriver.com
Subject: Re: [PATCH] memory cgroup: update root memory cgroup when node is onlined
Date: Fri, 14 Sep 2012 11:46:25 -0400	[thread overview]
Message-ID: <20120914154625.GN1560@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.00.1209131816070.1908@eggly.anvils>

On Thu, Sep 13, 2012 at 06:36:34PM -0700, Hugh Dickins wrote:
> On Thu, 13 Sep 2012, Johannes Weiner wrote:
> > On Thu, Sep 13, 2012 at 03:14:28PM +0800, Wen Congyang wrote:
> > > root_mem_cgroup->info.nodeinfo is initialized when the system boots.
> > > But NODE_DATA(nid) is null if the node is not onlined, so
> > > root_mem_cgroup->info.nodeinfo[nid]->zoneinfo[zone].lruvec.zone contains
> > > an invalid pointer. If we use numactl to bind a program to the node
> > > after onlining the node and its memory, it will cause the kernel
> > > panicked:
> > 
> > Is there any chance we could get rid of the zone backpointer in lruvec
> > again instead?
> 
> It could be done, but it would make me sad :(

We would not want that!

> > But can't
> > we just go back to passing the zone along with the lruvec down
> > vmscan.c paths?  I agree it's ugly to pass both, given their
> > relationship.  But I don't think the backpointer is any cleaner but in
> > addition less robust.
> 
> It's like how we use vma->mm: we could change everywhere to pass mm with
> vma, but it looks cleaner and cuts down on long arglists to have mm in vma.
> >From past experience, one of the things I worried about was adding extra
> args to the reclaim stack.

Ok, you certainly have a point.

> > That being said, the crashing code in particular makes me wonder:
> > 
> > static __always_inline void add_page_to_lru_list(struct page *page,
> > 				struct lruvec *lruvec, enum lru_list lru)
> > {
> > 	int nr_pages = hpage_nr_pages(page);
> > 	mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
> > 	list_add(&page->lru, &lruvec->lists[lru]);
> > 	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, nr_pages);
> > }
> > 
> > Why did we ever pass zone in here and then felt the need to replace it
> > with lruvec->zone in fa9add6 "mm/memcg: apply add/del_page to lruvec"?
> > A page does not roam between zones, its zone is a static property that
> > can be retrieved with page_zone().
> 
> Just as in vmscan.c, we have the lruvec to hand, and that's what we
> mainly want to operate upon, but there is also some need for zone.
> 
> (Both Konstantin and I were looking towards the day when we move the
> lru_lock into the lruvec, removing more dependence on "zone".  Pretty
> much the only reason that hasn't happened yet, is that we have not found
> time to make a performance case convincingly - but that's another topic.)
> 
> Yes, page_zone(page) is a static property of the page, but it's not
> necessarily cheap to evaluate: depends on how complex the memory model
> and the spare page flags space, doesn't it?  We both preferred to
> derive zone from lruvec where convenient.
> 
> How do you feel about this patch, and does it work for you guys?
> 
> You'd be right if you guessed that I started out without the
> mem_cgroup_zone_lruvec part of it, but oops in get_scan_count
> told me that's needed too.
> 
> Description to be filled in later: would it be needed for -stable,
> or is onlining already broken in other ways that you're now fixing up?
> 
> Reported-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Hugh Dickins <hughd@google.com>

This looks good to me, thanks.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	Jiang Liu <liuj97@gmail.com>,
	mhocko@suse.cz, bsingharora@gmail.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	paul.gortmaker@windriver.com
Subject: Re: [PATCH] memory cgroup: update root memory cgroup when node is onlined
Date: Fri, 14 Sep 2012 11:46:25 -0400	[thread overview]
Message-ID: <20120914154625.GN1560@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.00.1209131816070.1908@eggly.anvils>

On Thu, Sep 13, 2012 at 06:36:34PM -0700, Hugh Dickins wrote:
> On Thu, 13 Sep 2012, Johannes Weiner wrote:
> > On Thu, Sep 13, 2012 at 03:14:28PM +0800, Wen Congyang wrote:
> > > root_mem_cgroup->info.nodeinfo is initialized when the system boots.
> > > But NODE_DATA(nid) is null if the node is not onlined, so
> > > root_mem_cgroup->info.nodeinfo[nid]->zoneinfo[zone].lruvec.zone contains
> > > an invalid pointer. If we use numactl to bind a program to the node
> > > after onlining the node and its memory, it will cause the kernel
> > > panicked:
> > 
> > Is there any chance we could get rid of the zone backpointer in lruvec
> > again instead?
> 
> It could be done, but it would make me sad :(

We would not want that!

> > But can't
> > we just go back to passing the zone along with the lruvec down
> > vmscan.c paths?  I agree it's ugly to pass both, given their
> > relationship.  But I don't think the backpointer is any cleaner but in
> > addition less robust.
> 
> It's like how we use vma->mm: we could change everywhere to pass mm with
> vma, but it looks cleaner and cuts down on long arglists to have mm in vma.
> >From past experience, one of the things I worried about was adding extra
> args to the reclaim stack.

Ok, you certainly have a point.

> > That being said, the crashing code in particular makes me wonder:
> > 
> > static __always_inline void add_page_to_lru_list(struct page *page,
> > 				struct lruvec *lruvec, enum lru_list lru)
> > {
> > 	int nr_pages = hpage_nr_pages(page);
> > 	mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
> > 	list_add(&page->lru, &lruvec->lists[lru]);
> > 	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, nr_pages);
> > }
> > 
> > Why did we ever pass zone in here and then felt the need to replace it
> > with lruvec->zone in fa9add6 "mm/memcg: apply add/del_page to lruvec"?
> > A page does not roam between zones, its zone is a static property that
> > can be retrieved with page_zone().
> 
> Just as in vmscan.c, we have the lruvec to hand, and that's what we
> mainly want to operate upon, but there is also some need for zone.
> 
> (Both Konstantin and I were looking towards the day when we move the
> lru_lock into the lruvec, removing more dependence on "zone".  Pretty
> much the only reason that hasn't happened yet, is that we have not found
> time to make a performance case convincingly - but that's another topic.)
> 
> Yes, page_zone(page) is a static property of the page, but it's not
> necessarily cheap to evaluate: depends on how complex the memory model
> and the spare page flags space, doesn't it?  We both preferred to
> derive zone from lruvec where convenient.
> 
> How do you feel about this patch, and does it work for you guys?
> 
> You'd be right if you guessed that I started out without the
> mem_cgroup_zone_lruvec part of it, but oops in get_scan_count
> told me that's needed too.
> 
> Description to be filled in later: would it be needed for -stable,
> or is onlining already broken in other ways that you're now fixing up?
> 
> Reported-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Hugh Dickins <hughd@google.com>

This looks good to me, thanks.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-09-14 15:46 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-13  7:14 [PATCH] memory cgroup: update root memory cgroup when node is onlined Wen Congyang
2012-09-13  7:14 ` Wen Congyang
2012-09-13  7:14 ` Wen Congyang
     [not found] ` <505187D4.7070404-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2012-09-13 10:06   ` Kamezawa Hiroyuki
2012-09-13 10:06     ` Kamezawa Hiroyuki
2012-09-13 10:06     ` Kamezawa Hiroyuki
     [not found]     ` <5051B011.7060905-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2012-09-13 10:18       ` Wen Congyang
2012-09-13 10:18         ` Wen Congyang
2012-09-13 10:18         ` Wen Congyang
2012-09-13 20:59 ` Johannes Weiner
2012-09-13 20:59   ` Johannes Weiner
2012-09-14  1:36   ` Hugh Dickins
2012-09-14  1:36     ` Hugh Dickins
2012-09-14  1:36     ` Hugh Dickins
2012-09-14  1:53     ` Wen Congyang
2012-09-14  1:53       ` Wen Congyang
2012-09-14 15:46     ` Johannes Weiner [this message]
2012-09-14 15:46       ` Johannes Weiner
     [not found]     ` <alpine.LSU.2.00.1209131816070.1908-fupSdm12i1nKWymIFiNcPA@public.gmane.org>
2012-09-15 10:56       ` Konstantin Khlebnikov
2012-09-15 10:56         ` Konstantin Khlebnikov
2012-09-15 10:56         ` Konstantin Khlebnikov
2012-09-17  5:50       ` Wen Congyang
2012-09-17  5:50         ` Wen Congyang
2012-09-17  5:50         ` Wen Congyang
2012-10-16  5:58       ` Wen Congyang
2012-10-16  5:58         ` Wen Congyang
2012-10-16  5:58         ` Wen Congyang
2012-10-18 19:03         ` Hugh Dickins
2012-10-18 19:03           ` Hugh Dickins
     [not found]           ` <alpine.LSU.2.00.1210181129180.2137-fupSdm12i1nKWymIFiNcPA@public.gmane.org>
2012-10-18 22:03             ` Johannes Weiner
2012-10-18 22:03               ` Johannes Weiner
2012-10-18 22:03               ` Johannes Weiner
     [not found]               ` <20121018220306.GA1739-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-11-02  1:28                 ` [PATCH] memcg: fix hotplugged memory zone oops Hugh Dickins
2012-11-02  1:28                   ` Hugh Dickins
2012-11-02  1:28                   ` Hugh Dickins
     [not found]                   ` <alpine.LNX.2.00.1211011822190.20048-fupSdm12i1nKWymIFiNcPA@public.gmane.org>
2012-11-02 10:21                     ` Michal Hocko
2012-11-02 10:21                       ` Michal Hocko
2012-11-02 10:21                       ` Michal Hocko
2012-11-02 10:54                       ` Michal Hocko
2012-11-02 10:54                         ` Michal Hocko
2012-11-02 10:54                         ` Michal Hocko
     [not found]                         ` <20121102105420.GB24073-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-02 23:37                           ` Hugh Dickins
2012-11-02 23:37                             ` Hugh Dickins
2012-11-02 23:37                             ` Hugh Dickins
     [not found]                             ` <alpine.LNX.2.00.1211021631260.11106-fupSdm12i1nKWymIFiNcPA@public.gmane.org>
2012-11-03  7:00                               ` Michal Hocko
2012-11-03  7:00                                 ` Michal Hocko
2012-11-03  7:00                                 ` Michal Hocko
     [not found]                       ` <20121102102159.GA24073-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-02 23:31                         ` Hugh Dickins
2012-11-02 23:31                           ` Hugh Dickins
2012-11-02 23:31                           ` Hugh Dickins
2012-10-16  7:21     ` [PATCH] memory cgroup: update root memory cgroup when node is onlined Kamezawa Hiroyuki
2012-10-16  7:21       ` Kamezawa Hiroyuki
     [not found]   ` <20120913205935.GK1560-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-09-14  1:46     ` Wen Congyang
2012-09-14  1:46       ` Wen Congyang
2012-09-14  1:46       ` Wen Congyang
2012-09-17  6:40       ` Hugh Dickins
2012-09-17  6:40         ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120914154625.GN1560@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=khlebnikov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liuj97@gmail.com \
    --cc=mhocko@suse.cz \
    --cc=paul.gortmaker@windriver.com \
    --cc=wency@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.