All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH 2/2] memcg: coalescing charges per cpu
Date: Thu, 8 Oct 2009 15:26:20 -0700	[thread overview]
Message-ID: <20091008152620.e81d6209.akpm@linux-foundation.org> (raw)
In-Reply-To: <20091002140343.ae63e932.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, 2 Oct 2009 14:03:43 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This is a patch for coalescing access to res_counter at charging by
> percpu caching. At charge, memcg charges 64pages and remember it in
> percpu cache. Because it's cache, drain/flush is done if necessary.
> 
> This version uses public percpu area.
>  2 benefits of using public percpu area.
>  1. Sum of stocked charge in the system is limited to # of cpus
>     not to the number of memcg. This shows better synchonization.
>  2. drain code for flush/cpuhotplug is very easy (and quick)
> 
> The most important point of this patch is that we never touch res_counter
> in fast path. The res_counter is system-wide shared counter which is modified
> very frequently. We shouldn't touch it as far as we can for avoiding
> false sharing.
> 
> ...
> 
> +/* size of first charge trial. "32" comes from vmscan.c's magic value */
> +#define CHARGE_SIZE	(32 * PAGE_SIZE)
> +struct memcg_stock_pcp {
> +	struct mem_cgroup *cached;
> +	int charge;
> +	struct work_struct work;
> +};
> +static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> +static DEFINE_MUTEX(memcg_drain_mutex);
> +
> +static bool consume_stock(struct mem_cgroup *mem)
> +{
> +	struct memcg_stock_pcp *stock;
> +	bool ret = true;
> +
> +	stock = &get_cpu_var(memcg_stock);
> +	if (mem == stock->cached && stock->charge)
> +		stock->charge -= PAGE_SIZE;
> +	else
> +		ret = false;
> +	put_cpu_var(memcg_stock);
> +	return ret;
> +}

It's unobvious what the return value from this function means.  A nice
comment would help.

> +static void drain_stock(struct memcg_stock_pcp *stock)
> +{
> +	struct mem_cgroup *old = stock->cached;
> +
> +	if (stock->charge) {
> +		res_counter_uncharge(&old->res, stock->charge);
> +		if (do_swap_account)
> +			res_counter_uncharge(&old->memsw, stock->charge);
> +	}
> +	stock->cached = NULL;
> +	stock->charge = 0;
> +}
> +
> +static void drain_local_stock(struct work_struct *dummy)
> +{
> +	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
> +	drain_stock(stock);
> +	put_cpu_var(memcg_stock);
> +}

drain_local_stock() is only ever run by a thread which is pinned to a
particular CPU, so we can use plain old __get_cpu_var() here and remove
the put_cpu_var(), methinks.  If this is done, a comment shuold be
added explaining why we can use that optimisation.

> +static void refill_stock(struct mem_cgroup *mem, int val)
> +{
> +	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
> +
> +	if (stock->cached != mem) {
> +		drain_stock(stock);
> +		stock->cached = mem;
> +	}
> +	stock->charge += val;
> +	put_cpu_var(memcg_stock);
> +}
> +
> +static void drain_all_stock_async(void)
> +{
> +	int cpu;
> +	/* Contention means someone tries to flush. */
> +	if (!mutex_trylock(&memcg_drain_mutex))
> +		return;

Any time I see a trylock I ask "hm, what happens when it fails - what
are the consequences and how to we later rerun the aborted operation".

That's unobvious here and merits a comment.

> +	for_each_online_cpu(cpu) {

Did we need get_online_cpus() here?

> +		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> +		if (work_pending(&stock->work))
> +			continue;
> +		INIT_WORK(&stock->work, drain_local_stock);
> +		schedule_work_on(cpu, &stock->work);
> +	}
> +	mutex_unlock(&memcg_drain_mutex);
> +	/* We don't wait for flush_work */
> +}
> +
> +static void drain_all_stock_sync(void)
> +{
> +	/* called when force_empty is called */
> +	mutex_lock(&memcg_drain_mutex);
> +	schedule_on_each_cpu(drain_local_stock);
> +	mutex_unlock(&memcg_drain_mutex);
> +}

In fact it would be nice were each of these functions to have a brief
note describing their role.

> +static int __cpuinit memcg_stock_cpu_callback(struct notifier_block *nb,
> +					unsigned long action,
> +					void *hcpu)
> +{
> +#ifdef CONFIG_HOTPLUG_CPU
> +	int cpu = (unsigned long)hcpu;
> +	struct memcg_stock_pcp *stock;
> +
> +	if (action != CPU_DEAD)
> +		return NOTIFY_OK;
> +	stock = &per_cpu(memcg_stock, cpu);
> +	drain_stock(stock);
> +#endif
> +	return NOTIFY_OK;
> +}

Is the ifdef needed?  Using hotcpu_notifier() should cause this entire
function to disappear from vmlinux when CONFIG_HOTPLUG_CPU=n.

>  /*
>   * Unlike exported interface, "oom" parameter is added. if oom==true,
>   * oom-killer can be invoked.
> @@ -1269,6 +1370,7 @@ static int __mem_cgroup_try_charge(struc
>  	struct mem_cgroup *mem, *mem_over_limit;
>  	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
>  	struct res_counter *fail_res;
> +	int csize = CHARGE_SIZE;
>  
>  	if (unlikely(test_thread_flag(TIF_MEMDIE))) {
>  		/* Don't account this! */
> @@ -1293,23 +1395,25 @@ static int __mem_cgroup_try_charge(struc
>  		return 0;
>  
>  	VM_BUG_ON(css_is_removed(&mem->css));
> +	if (mem_cgroup_is_root(mem))
> +		goto done;
>  
>  	while (1) {
>  		int ret = 0;
>  		unsigned long flags = 0;
>  
> -		if (mem_cgroup_is_root(mem))
> -			goto done;
> -		ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
> +		if (consume_stock(mem))
> +			goto charged;
> +
> +		ret = res_counter_charge(&mem->res, csize, &fail_res);
>  		if (likely(!ret)) {
>  			if (!do_swap_account)
>  				break;
> -			ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
> -							&fail_res);
> +			ret = res_counter_charge(&mem->memsw, csize, &fail_res);
>  			if (likely(!ret))
>  				break;
>  			/* mem+swap counter fails */
> -			res_counter_uncharge(&mem->res, PAGE_SIZE);
> +			res_counter_uncharge(&mem->res, csize);
>  			flags |= MEM_CGROUP_RECLAIM_NOSWAP;
>  			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
>  									memsw);
> @@ -1318,6 +1422,11 @@ static int __mem_cgroup_try_charge(struc
>  			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
>  									res);
>  
> +		/* reduce request size and retry */
> +		if (csize > PAGE_SIZE) {
> +			csize = PAGE_SIZE;
> +			continue;
> +		}
>  		if (!(gfp_mask & __GFP_WAIT))
>  			goto nomem;
>  
> @@ -1347,6 +1456,9 @@ static int __mem_cgroup_try_charge(struc
>  			goto nomem;
>  		}
>  	}
> +	if (csize > PAGE_SIZE)
> +		refill_stock(mem, csize - PAGE_SIZE);
> +charged:
>  	/*
>  	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
>  	 * if they exceeds softlimit.
> @@ -2463,6 +2575,7 @@ move_account:
>  			goto out;
>  		/* This is for making all *used* pages to be on LRU. */
>  		lru_add_drain_all();
> +		drain_all_stock_sync();
>  		ret = 0;
>  		for_each_node_state(node, N_HIGH_MEMORY) {
>  			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
> @@ -3181,6 +3294,7 @@ mem_cgroup_create(struct cgroup_subsys *
>  		root_mem_cgroup = mem;
>  		if (mem_cgroup_soft_limit_tree_init())
>  			goto free_out;
> +		hotcpu_notifier(memcg_stock_cpu_callback, 0);
>  
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);

WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH 2/2] memcg: coalescing charges per cpu
Date: Thu, 8 Oct 2009 15:26:20 -0700	[thread overview]
Message-ID: <20091008152620.e81d6209.akpm@linux-foundation.org> (raw)
In-Reply-To: <20091002140343.ae63e932.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, 2 Oct 2009 14:03:43 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This is a patch for coalescing access to res_counter at charging by
> percpu caching. At charge, memcg charges 64pages and remember it in
> percpu cache. Because it's cache, drain/flush is done if necessary.
> 
> This version uses public percpu area.
>  2 benefits of using public percpu area.
>  1. Sum of stocked charge in the system is limited to # of cpus
>     not to the number of memcg. This shows better synchonization.
>  2. drain code for flush/cpuhotplug is very easy (and quick)
> 
> The most important point of this patch is that we never touch res_counter
> in fast path. The res_counter is system-wide shared counter which is modified
> very frequently. We shouldn't touch it as far as we can for avoiding
> false sharing.
> 
> ...
> 
> +/* size of first charge trial. "32" comes from vmscan.c's magic value */
> +#define CHARGE_SIZE	(32 * PAGE_SIZE)
> +struct memcg_stock_pcp {
> +	struct mem_cgroup *cached;
> +	int charge;
> +	struct work_struct work;
> +};
> +static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> +static DEFINE_MUTEX(memcg_drain_mutex);
> +
> +static bool consume_stock(struct mem_cgroup *mem)
> +{
> +	struct memcg_stock_pcp *stock;
> +	bool ret = true;
> +
> +	stock = &get_cpu_var(memcg_stock);
> +	if (mem == stock->cached && stock->charge)
> +		stock->charge -= PAGE_SIZE;
> +	else
> +		ret = false;
> +	put_cpu_var(memcg_stock);
> +	return ret;
> +}

It's unobvious what the return value from this function means.  A nice
comment would help.

> +static void drain_stock(struct memcg_stock_pcp *stock)
> +{
> +	struct mem_cgroup *old = stock->cached;
> +
> +	if (stock->charge) {
> +		res_counter_uncharge(&old->res, stock->charge);
> +		if (do_swap_account)
> +			res_counter_uncharge(&old->memsw, stock->charge);
> +	}
> +	stock->cached = NULL;
> +	stock->charge = 0;
> +}
> +
> +static void drain_local_stock(struct work_struct *dummy)
> +{
> +	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
> +	drain_stock(stock);
> +	put_cpu_var(memcg_stock);
> +}

drain_local_stock() is only ever run by a thread which is pinned to a
particular CPU, so we can use plain old __get_cpu_var() here and remove
the put_cpu_var(), methinks.  If this is done, a comment shuold be
added explaining why we can use that optimisation.

> +static void refill_stock(struct mem_cgroup *mem, int val)
> +{
> +	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
> +
> +	if (stock->cached != mem) {
> +		drain_stock(stock);
> +		stock->cached = mem;
> +	}
> +	stock->charge += val;
> +	put_cpu_var(memcg_stock);
> +}
> +
> +static void drain_all_stock_async(void)
> +{
> +	int cpu;
> +	/* Contention means someone tries to flush. */
> +	if (!mutex_trylock(&memcg_drain_mutex))
> +		return;

Any time I see a trylock I ask "hm, what happens when it fails - what
are the consequences and how to we later rerun the aborted operation".

That's unobvious here and merits a comment.

> +	for_each_online_cpu(cpu) {

Did we need get_online_cpus() here?

> +		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> +		if (work_pending(&stock->work))
> +			continue;
> +		INIT_WORK(&stock->work, drain_local_stock);
> +		schedule_work_on(cpu, &stock->work);
> +	}
> +	mutex_unlock(&memcg_drain_mutex);
> +	/* We don't wait for flush_work */
> +}
> +
> +static void drain_all_stock_sync(void)
> +{
> +	/* called when force_empty is called */
> +	mutex_lock(&memcg_drain_mutex);
> +	schedule_on_each_cpu(drain_local_stock);
> +	mutex_unlock(&memcg_drain_mutex);
> +}

In fact it would be nice were each of these functions to have a brief
note describing their role.

> +static int __cpuinit memcg_stock_cpu_callback(struct notifier_block *nb,
> +					unsigned long action,
> +					void *hcpu)
> +{
> +#ifdef CONFIG_HOTPLUG_CPU
> +	int cpu = (unsigned long)hcpu;
> +	struct memcg_stock_pcp *stock;
> +
> +	if (action != CPU_DEAD)
> +		return NOTIFY_OK;
> +	stock = &per_cpu(memcg_stock, cpu);
> +	drain_stock(stock);
> +#endif
> +	return NOTIFY_OK;
> +}

Is the ifdef needed?  Using hotcpu_notifier() should cause this entire
function to disappear from vmlinux when CONFIG_HOTPLUG_CPU=n.

>  /*
>   * Unlike exported interface, "oom" parameter is added. if oom==true,
>   * oom-killer can be invoked.
> @@ -1269,6 +1370,7 @@ static int __mem_cgroup_try_charge(struc
>  	struct mem_cgroup *mem, *mem_over_limit;
>  	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
>  	struct res_counter *fail_res;
> +	int csize = CHARGE_SIZE;
>  
>  	if (unlikely(test_thread_flag(TIF_MEMDIE))) {
>  		/* Don't account this! */
> @@ -1293,23 +1395,25 @@ static int __mem_cgroup_try_charge(struc
>  		return 0;
>  
>  	VM_BUG_ON(css_is_removed(&mem->css));
> +	if (mem_cgroup_is_root(mem))
> +		goto done;
>  
>  	while (1) {
>  		int ret = 0;
>  		unsigned long flags = 0;
>  
> -		if (mem_cgroup_is_root(mem))
> -			goto done;
> -		ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
> +		if (consume_stock(mem))
> +			goto charged;
> +
> +		ret = res_counter_charge(&mem->res, csize, &fail_res);
>  		if (likely(!ret)) {
>  			if (!do_swap_account)
>  				break;
> -			ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
> -							&fail_res);
> +			ret = res_counter_charge(&mem->memsw, csize, &fail_res);
>  			if (likely(!ret))
>  				break;
>  			/* mem+swap counter fails */
> -			res_counter_uncharge(&mem->res, PAGE_SIZE);
> +			res_counter_uncharge(&mem->res, csize);
>  			flags |= MEM_CGROUP_RECLAIM_NOSWAP;
>  			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
>  									memsw);
> @@ -1318,6 +1422,11 @@ static int __mem_cgroup_try_charge(struc
>  			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
>  									res);
>  
> +		/* reduce request size and retry */
> +		if (csize > PAGE_SIZE) {
> +			csize = PAGE_SIZE;
> +			continue;
> +		}
>  		if (!(gfp_mask & __GFP_WAIT))
>  			goto nomem;
>  
> @@ -1347,6 +1456,9 @@ static int __mem_cgroup_try_charge(struc
>  			goto nomem;
>  		}
>  	}
> +	if (csize > PAGE_SIZE)
> +		refill_stock(mem, csize - PAGE_SIZE);
> +charged:
>  	/*
>  	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
>  	 * if they exceeds softlimit.
> @@ -2463,6 +2575,7 @@ move_account:
>  			goto out;
>  		/* This is for making all *used* pages to be on LRU. */
>  		lru_add_drain_all();
> +		drain_all_stock_sync();
>  		ret = 0;
>  		for_each_node_state(node, N_HIGH_MEMORY) {
>  			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
> @@ -3181,6 +3294,7 @@ mem_cgroup_create(struct cgroup_subsys *
>  		root_mem_cgroup = mem;
>  		if (mem_cgroup_soft_limit_tree_init())
>  			goto free_out;
> +		hotcpu_notifier(memcg_stock_cpu_callback, 0);
>  
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-10-08 22:27 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-02  4:55 [PATCH 0/2] memcg: improving scalability by reducing lock contention at charge/uncharge KAMEZAWA Hiroyuki
2009-10-02  4:55 ` KAMEZAWA Hiroyuki
2009-10-02  5:01 ` [PATCH 1/2] memcg: coalescing uncharge at unmap and truncation KAMEZAWA Hiroyuki
2009-10-02  5:01   ` KAMEZAWA Hiroyuki
2009-10-02  6:47   ` Hiroshi Shimamoto
2009-10-02  6:47     ` Hiroshi Shimamoto
2009-10-02  6:53     ` Hiroshi Shimamoto
2009-10-02  6:53       ` Hiroshi Shimamoto
2009-10-02  7:04       ` KAMEZAWA Hiroyuki
2009-10-02  7:04         ` KAMEZAWA Hiroyuki
2009-10-02  7:02     ` [PATCH 1/2] memcg: coalescing uncharge at unmap and truncation (fixed coimpile bug) KAMEZAWA Hiroyuki
2009-10-02  7:02       ` KAMEZAWA Hiroyuki
2009-10-08 22:17       ` Andrew Morton
2009-10-08 22:17         ` Andrew Morton
2009-10-08 23:48         ` KAMEZAWA Hiroyuki
2009-10-08 23:48           ` KAMEZAWA Hiroyuki
2009-10-09  4:01   ` [PATCH 1/2] memcg: coalescing uncharge at unmap and truncation Balbir Singh
2009-10-09  4:01     ` Balbir Singh
2009-10-09  4:17     ` KAMEZAWA Hiroyuki
2009-10-09  4:17       ` KAMEZAWA Hiroyuki
2009-10-02  5:03 ` [PATCH 2/2] memcg: coalescing charges per cpu KAMEZAWA Hiroyuki
2009-10-02  5:03   ` KAMEZAWA Hiroyuki
2009-10-08 22:26   ` Andrew Morton [this message]
2009-10-08 22:26     ` Andrew Morton
2009-10-08 23:54     ` KAMEZAWA Hiroyuki
2009-10-08 23:54       ` KAMEZAWA Hiroyuki
2009-10-09  4:15   ` Balbir Singh
2009-10-09  4:15     ` Balbir Singh
2009-10-09  4:25     ` KAMEZAWA Hiroyuki
2009-10-09  4:25       ` KAMEZAWA Hiroyuki
2009-10-02  8:53 ` [PATCH 0/2] memcg: improving scalability by reducing lock contention at charge/uncharge KAMEZAWA Hiroyuki
2009-10-05  7:18   ` KAMEZAWA Hiroyuki
2009-10-05  7:18     ` KAMEZAWA Hiroyuki
2009-10-05 10:37 ` Balbir Singh
2009-10-05 10:37   ` Balbir Singh
     [not found] ` <604427e00910091737s52e11ce9p256c95d533dc2837@mail.gmail.com>
2009-10-11  2:33   ` KAMEZAWA Hiroyuki
2009-10-11  2:33     ` KAMEZAWA Hiroyuki
     [not found]     ` <604427e00910111134o6f22f0ddg2b87124dd334ec02@mail.gmail.com>
2009-10-12 11:38       ` Balbir Singh
2009-10-12 11:38         ` Balbir Singh
2009-10-13  0:29       ` KAMEZAWA Hiroyuki
2009-10-13  0:29         ` KAMEZAWA Hiroyuki
     [not found]         ` <604427e00910121818w71dd4b7dl8781d7f5bc4f7dd9@mail.gmail.com>
2009-10-13  1:28           ` KAMEZAWA Hiroyuki
2009-10-13  1:28             ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091008152620.e81d6209.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.