Re: [PATCH 2/2] memcg: reduce usage at change limit

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"xemul@openvz.org" <xemul@openvz.org>,
	"menage@google.com" <menage@google.com>,
	"lizf@cn.fujitsu.com" <lizf@cn.fujitsu.com>,
	"yamamoto@valinux.co.jp" <yamamoto@valinux.co.jp>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 2/2] memcg: reduce usage at change limit
Date: Tue, 17 Jun 2008 09:16:31 +0530	[thread overview]
Message-ID: <48573397.608@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080617123604.c8cb1bd5.kamezawa.hiroyu@jp.fujitsu.com>

KAMEZAWA Hiroyuki wrote:
> Reduce the usage of res_counter at the change of limit.
> 
> Changelog v4 -> v5.
>  - moved "feedback" alogrithm from res_counter to memcg.
> 
> Background:
>  - Now, mem->usage is not reduced at limit change. So, the users will see
>    usage > limit case in memcg every time. This patch fixes it.
> 
>  Before:
>  - no usage change at setting limit.
>  - setting limit always returns 0 even if usage can never be less than zero.
>    (This can happen when memory is locked or there is no swap.)
>  - This is BUG, I think.
>  After:
>  - usage will be less than new limit at limit change.
>  - set limit returns -EBUSY when the usage cannot be reduced.
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  Documentation/controllers/memory.txt |    3 -
>  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
>  2 files changed, 56 insertions(+), 15 deletions(-)
> 
> Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -852,18 +852,30 @@ out:
>  	css_put(&mem->css);
>  	return ret;
>  }
> +/*
> + * try to set limit and reduce usage if necessary.
> + * returns 0 at success.
> + * returns -EBUSY if memory cannot be dropped.
> + */
> 
> -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> +					unsigned long long val)
>  {
> -	*tmp = memparse(buf, &buf);
> -	if (*buf != '\0')
> -		return -EINVAL;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	int retry_count = 0;
> +	int progress;
> 
> -	/*
> -	 * Round up the value to the closest page size
> -	 */
> -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> -	return 0;
> +retry:
> +	if (!res_counter_set_limit(&memcg->res, val))
> +		return 0;
> +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> +		return -EBUSY;
> +
> +	cond_resched();

Do we really need this? We do have cond_resched in shrink_page_list(),
shrink_active_list(), do we need it here as well?

> +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> +	if (!progress)
> +		retry_count++;
> +	goto retry;

I don't like upward goto's. Can't we convert this to a nice do {} while or
while() loop?

>  }
> 
>  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
> 
>  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  				struct file *file, const char __user *userbuf,
> -				size_t nbytes, loff_t *ppos)
> +				size_t bbytes, loff_t *ppos)
>  {
> -	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> -				cft->private, userbuf, nbytes, ppos,
> -				mem_cgroup_write_strategy);
> +	char *buf, *end;
> +	unsigned long long val;
> +	int ret;
> +
> +	buf = kmalloc(bbytes + 1, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +	buf[bbytes] = '\0';
> +	ret = -EFAULT;
> +	if (copy_from_user(buf, userbuf, bbytes))
> +		goto out;
> +
> +	ret = -EINVAL;
> +	strstrip(buf);
> +	val = memparse(buf, &end);
> +	if (*end != '\0')
> +		goto out;
> +	/* Round to page size */
> +	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> +
> +	switch(cft->private) {
> +	case RES_LIMIT:
> +		ret = mem_cgroup_resize_limit(cont, val);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (!ret)
> +		ret = bbytes;
> +out:
> +	kfree(buf);
> +	return ret;
>  }
> 
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
>  3. Teach controller to account for shared-pages
> -4. Start reclamation when the limit is lowered
> -5. Start reclamation in the background when the limit is
> +4. Start reclamation in the background when the limit is
>     not yet hit but the usage is getting closer

Except for the minor nits

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

WARNING: multiple messages have this Message-ID (diff)

From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"xemul@openvz.org" <xemul@openvz.org>,
	"menage@google.com" <menage@google.com>,
	"lizf@cn.fujitsu.com" <lizf@cn.fujitsu.com>,
	"yamamoto@valinux.co.jp" <yamamoto@valinux.co.jp>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 2/2] memcg: reduce usage at change limit
Date: Tue, 17 Jun 2008 09:16:31 +0530	[thread overview]
Message-ID: <48573397.608@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080617123604.c8cb1bd5.kamezawa.hiroyu@jp.fujitsu.com>

KAMEZAWA Hiroyuki wrote:
> Reduce the usage of res_counter at the change of limit.
> 
> Changelog v4 -> v5.
>  - moved "feedback" alogrithm from res_counter to memcg.
> 
> Background:
>  - Now, mem->usage is not reduced at limit change. So, the users will see
>    usage > limit case in memcg every time. This patch fixes it.
> 
>  Before:
>  - no usage change at setting limit.
>  - setting limit always returns 0 even if usage can never be less than zero.
>    (This can happen when memory is locked or there is no swap.)
>  - This is BUG, I think.
>  After:
>  - usage will be less than new limit at limit change.
>  - set limit returns -EBUSY when the usage cannot be reduced.
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  Documentation/controllers/memory.txt |    3 -
>  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
>  2 files changed, 56 insertions(+), 15 deletions(-)
> 
> Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -852,18 +852,30 @@ out:
>  	css_put(&mem->css);
>  	return ret;
>  }
> +/*
> + * try to set limit and reduce usage if necessary.
> + * returns 0 at success.
> + * returns -EBUSY if memory cannot be dropped.
> + */
> 
> -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> +					unsigned long long val)
>  {
> -	*tmp = memparse(buf, &buf);
> -	if (*buf != '\0')
> -		return -EINVAL;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	int retry_count = 0;
> +	int progress;
> 
> -	/*
> -	 * Round up the value to the closest page size
> -	 */
> -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> -	return 0;
> +retry:
> +	if (!res_counter_set_limit(&memcg->res, val))
> +		return 0;
> +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> +		return -EBUSY;
> +
> +	cond_resched();

Do we really need this? We do have cond_resched in shrink_page_list(),
shrink_active_list(), do we need it here as well?

> +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> +	if (!progress)
> +		retry_count++;
> +	goto retry;

I don't like upward goto's. Can't we convert this to a nice do {} while or
while() loop?

>  }
> 
>  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
> 
>  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  				struct file *file, const char __user *userbuf,
> -				size_t nbytes, loff_t *ppos)
> +				size_t bbytes, loff_t *ppos)
>  {
> -	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> -				cft->private, userbuf, nbytes, ppos,
> -				mem_cgroup_write_strategy);
> +	char *buf, *end;
> +	unsigned long long val;
> +	int ret;
> +
> +	buf = kmalloc(bbytes + 1, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +	buf[bbytes] = '\0';
> +	ret = -EFAULT;
> +	if (copy_from_user(buf, userbuf, bbytes))
> +		goto out;
> +
> +	ret = -EINVAL;
> +	strstrip(buf);
> +	val = memparse(buf, &end);
> +	if (*end != '\0')
> +		goto out;
> +	/* Round to page size */
> +	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> +
> +	switch(cft->private) {
> +	case RES_LIMIT:
> +		ret = mem_cgroup_resize_limit(cont, val);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (!ret)
> +		ret = bbytes;
> +out:
> +	kfree(buf);
> +	return ret;
>  }
> 
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
>  3. Teach controller to account for shared-pages
> -4. Start reclamation when the limit is lowered
> -5. Start reclamation in the background when the limit is
> +4. Start reclamation in the background when the limit is
>     not yet hit but the usage is getting closer

Except for the minor nits

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-06-17  3:46 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-17  3:31 [PATCH 0/2] memcg: Reduce usage at change limit KAMEZAWA Hiroyuki
2008-06-17  3:31 ` KAMEZAWA Hiroyuki
2008-06-17  3:33 ` [PATCH 1/2] memcg: res counter set limit KAMEZAWA Hiroyuki
2008-06-17  3:33   ` KAMEZAWA Hiroyuki
2008-06-17  3:40   ` Balbir Singh
2008-06-17  3:40     ` Balbir Singh
2008-06-17  4:05     ` KAMEZAWA Hiroyuki
2008-06-17  4:05       ` KAMEZAWA Hiroyuki
2008-06-17  3:36 ` [PATCH 2/2] memcg: reduce usage at change limit KAMEZAWA Hiroyuki
2008-06-17  3:36   ` KAMEZAWA Hiroyuki
2008-06-17  3:46   ` Balbir Singh [this message]
2008-06-17  3:46     ` Balbir Singh
2008-06-17  4:06     ` KAMEZAWA Hiroyuki
2008-06-17  4:06       ` KAMEZAWA Hiroyuki
2008-06-17 10:00       ` KAMEZAWA Hiroyuki
2008-06-17 10:00         ` KAMEZAWA Hiroyuki
2008-06-17 10:14         ` Balbir Singh
2008-06-17 10:14           ` Balbir Singh
2008-06-17 12:03           ` kamezawa.hiroyu
2008-06-17 12:03             ` kamezawa.hiroyu
2008-06-20  5:16   ` Paul Menage
2008-06-20  5:16     ` Paul Menage
2008-06-20  6:44     ` kamezawa.hiroyu
2008-06-20  6:44       ` kamezawa.hiroyu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48573397.608@linux.vnet.ibm.com \
    --to=balbir@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=menage@google.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=xemul@openvz.org \
    --cc=yamamoto@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.