linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com,
	aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -V2 4/9] memcg: Add non reclaim resource tracking to memcg
Date: Thu, 08 Mar 2012 17:18:21 +0530	[thread overview]
Message-ID: <871up39uuy.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <20120308145628.f911419d.kamezawa.hiroyu@jp.fujitsu.com>

On Thu, 8 Mar 2012 14:56:28 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Sun, 04 Mar 2012 23:37:22 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > On Fri, 2 Mar 2012 17:38:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > On Thu,  1 Mar 2012 14:46:15 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > > 
> > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> > > 
> > > > +	help
> > > > +	  Add non reclaim resource management to memory resource controller.
> > > > +	  Currently only HugeTLB pages will be managed using this extension.
> > > > +	  The controller limit is enforced during mmap(2), so that
> > > > +	  application can fall back to allocations using smaller page size
> > > > +	  if the memory controller limit prevented them from allocating HugeTLB
> > > > +	  pages.
> > > > +
> > > 
> > > Hm. In other thread, KMEM accounting is discussed. There is 2 proposals and
> > >  - 1st is accounting only reclaimable slabs (as dcache etc.)
> > >  - 2nd is accounting all slab allocations.
> > > 
> > > Here, 2nd one includes NORECLAIM kmem cache. (Discussion is not ended.)
> > > 
> > > So, for your developments,  How about MEM_RES_CTLR_HUGEPAGE ?
> > 
> > Frankly I didn't like the noreclaim name, I also didn't want to indicate
> > HUGEPAGE, because the code doesn't make any huge page assumption.
> 
> You can add this config for HUGEPAGE interfaces.
> Later we can sort out other configs.
> 

Will do

> 
> > > 
> > > 
> > > >  config CGROUP_MEM_RES_CTLR_SWAP
> > > >  	bool "Memory Resource Controller Swap Extension"
> > > >  	depends on CGROUP_MEM_RES_CTLR && SWAP
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 6728a7a..b00d028 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -49,6 +49,7 @@
> > > >  #include <linux/page_cgroup.h>
> > > >  #include <linux/cpu.h>
> > > >  #include <linux/oom.h>
> > > > +#include <linux/region.h>
> > > >  #include "internal.h"
> > > >  #include <net/sock.h>
> > > >  #include <net/tcp_memcontrol.h>
> > > > @@ -214,6 +215,11 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg);
> > > >  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
> > > >  
> > > >  /*
> > > > + * Currently only hugetlbfs pages are tracked using no reclaim
> > > > + * resource count. So we need only MAX_HSTATE res counter
> > > > + */
> > > > +#define MEMCG_MAX_NORECLAIM HUGE_MAX_HSTATE
> > > > +/*
> > > >   * The memory controller data structure. The memory controller controls both
> > > >   * page cache and RSS per cgroup. We would eventually like to provide
> > > >   * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> > > > @@ -235,6 +241,11 @@ struct mem_cgroup {
> > > >  	 */
> > > >  	struct res_counter memsw;
> > > >  	/*
> > > > +	 * the counter to account for non reclaim resources
> > > > +	 * like hugetlb pages
> > > > +	 */
> > > > +	struct res_counter no_rcl_res[MEMCG_MAX_NORECLAIM];
> > > 
> > > struct res_counter hugepages;
> > > 
> > > will be ok.
> > > 
> > 
> > My goal was to make this patch not to mention hugepages, because
> > it doesn't really have any depedency on hugepages. That is one of the reason
> > for adding MEMCG_MAX_NORECLAIM. Later if we want other in memory file system
> > (shmemfs) to limit the resource usage in a similar fashion, we should be
> > able to use this memcg changes.
> > 
> > May be for this patchset I can make the changes you suggested and later
> > when we want to reuse the code make it more generic ?
> > 
> 
> yes. If there is no user interface change, internal code change will be welcomed.
> 

Ok

> 
> > 
> > > 
> > > > +	/*
> > > >  	 * Per cgroup active and inactive list, similar to the
> > > >  	 * per zone LRU lists.
> > > >  	 */
> > > > @@ -4887,6 +4898,7 @@ err_cleanup:
> > > >  static struct cgroup_subsys_state * __ref
> > > >  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > >  {
> > > > +	int idx;
> > > >  	struct mem_cgroup *memcg, *parent;
> > > >  	long error = -ENOMEM;
> > > >  	int node;
> > > > @@ -4922,6 +4934,10 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > >  	if (parent && parent->use_hierarchy) {
> > > >  		res_counter_init(&memcg->res, &parent->res);
> > > >  		res_counter_init(&memcg->memsw, &parent->memsw);
> > > > +		for (idx = 0; idx < MEMCG_MAX_NORECLAIM; idx++) {
> > > > +			res_counter_init(&memcg->no_rcl_res[idx],
> > > > +					 &parent->no_rcl_res[idx]);
> > > > +		}
> > > 
> > > You can remove this kinds of loop and keep your implemenation simple.
> > 
> > 
> > Can you explain this ? How can we remote the loop ?. We want to track
> > each huge page size as a seperate resource. 
> > 
> Ah, sorry. I miseed it. please ignore.
> 
> 
> 
> > > > +long mem_cgroup_try_noreclaim_charge(struct list_head *chg_list,
> > > > +				     unsigned long from, unsigned long to,
> > > > +				     int idx)
> > > > +{
> > > > +	long chg;
> > > > +	int ret = 0;
> > > > +	unsigned long csize;
> > > > +	struct mem_cgroup *memcg;
> > > > +	struct res_counter *fail_res;
> > > > +
> > > > +	/*
> > > > +	 * Get the task cgroup within rcu_readlock and also
> > > > +	 * get cgroup reference to make sure cgroup destroy won't
> > > > +	 * race with page_charge. We don't allow a cgroup destroy
> > > > +	 * when the cgroup have some charge against it
> > > > +	 */
> > > > +	rcu_read_lock();
> > > > +	memcg = mem_cgroup_from_task(current);
> > > > +	css_get(&memcg->css);
> > > 
> > > css_tryget() ?
> > > 
> > 
> > 
> > Why ?
> > 
> 
> current<->cgroup relationship isn't under any locks. So, we do speculative
> access with rcu_read_lock() and css_tryget().
> 

Will update.

Right now i am redoing the patch to see if enforcing limit during
page fault (alloc_huge_page()) and page free (free_huge_page()) simplifies
the patchset. Only problem with that approach is, application should have
a clear idea about it's hugepage usage, or else enforcing the limit at
fault time will result in application getting SIGBUS. But otherwise the approach
simplifies the cgroup removal and brings the code much closer to memcg
design.

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-03-08 11:48 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-01  9:16 [PATCH -V2 0/9] memcg: add HugeTLB resource tracking Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 1/9] mm: move hugetlbfs region tracking function to common code Aneesh Kumar K.V
2012-03-01 22:33   ` Andrew Morton
2012-03-04 17:37     ` Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 2/9] mm: Update region function to take new data arg Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 3/9] hugetlbfs: Use the generic region API and drop local one Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 4/9] memcg: Add non reclaim resource tracking to memcg Aneesh Kumar K.V
2012-03-02  8:38   ` KAMEZAWA Hiroyuki
2012-03-04 18:07     ` Aneesh Kumar K.V
2012-03-08  5:56       ` KAMEZAWA Hiroyuki
2012-03-08 11:48         ` Aneesh Kumar K.V [this message]
2012-03-01  9:16 ` [PATCH -V2 5/9] hugetlbfs: Add memory controller support for shared mapping Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 6/9] hugetlbfs: Add memory controller support for private mapping Aneesh Kumar K.V
2012-05-17 23:16   ` Darrick J. Wong
2012-03-01  9:16 ` [PATCH -V2 7/9] memcg: track resource index in cftype private Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 8/9] hugetlbfs: Add memcg control files for hugetlbfs Aneesh Kumar K.V
2012-03-01  9:16 ` [PATCH -V2 9/9] memcg: Add memory controller documentation for hugetlb management Aneesh Kumar K.V
2012-03-01 22:40 ` [PATCH -V2 0/9] memcg: add HugeTLB resource tracking Andrew Morton
2012-03-02  3:28   ` David Gibson
2012-03-04 18:09     ` Aneesh Kumar K.V
2012-03-06  2:38       ` David Gibson
2012-03-04 19:15   ` Aneesh Kumar K.V
2012-03-05 13:56     ` Hillf Danton
2012-03-06 14:05       ` Aneesh Kumar K.V
2012-03-02  5:48 ` KAMEZAWA Hiroyuki
2012-03-04 18:14   ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871up39uuy.fsf@linux.vnet.ibm.com \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=dhillf@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).