public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Balbir Singh <balbir@in.ibm.com>
To: Pavel Emelianov <xemul@openvz.org>
Cc: Linux MM <linux-mm@kvack.org>,
	dev@openvz.org, ckrm-tech@lists.sourceforge.net,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	haveblue@us.ibm.com, rohitseth@google.com
Subject: Re: [RFC][PATCH 8/8] RSS controller support reclamation
Date: Fri, 10 Nov 2006 14:46:18 +0530	[thread overview]
Message-ID: <45544362.9040805@in.ibm.com> (raw)
In-Reply-To: <45543E36.2080600@openvz.org>

Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Reclaim memory as we hit the max_shares limit. The code for reclamation
>> is inspired from Dave Hansen's challenged memory controller and from the
>> shrink_all_memory() code
>>
>> Reclamation can be triggered from two paths
>>
>> 1. While incrementing the RSS, we hit the limit of the container
>> 2. A container is resized, such that it's new limit is below its current
>>    RSS
>>
>> In (1) reclamation takes place in the background.
> 
> Hmm... This is not a hard limit in this case, right? And in case
> of overloaded system from the moment reclamation thread is woken
> up till the moment it starts shrinking zones container may touch
> too many pages...
> 
> That's not good.

Yes, please see my comments in the TODO's. Hard limits should be easy
to implement, it's a question of calling the correct routine based
on policy.

> 
>> TODO's
>>
>> 1. max_shares currently works like a soft limit. The RSS can grow beyond it's
>>    limit. One possible fix is to introduce a soft limit (reclaim when the
>>    container hits the soft limit) and fail when we hit the hard limit
> 
> Such soft limit doesn't help also. It just makes effects on
> low-loaded system smoother.
> 
> And what about a hard limit - how would you fail in page fault in
> case of limit hit? SIGKILL/SEGV is not an option - in this case we
> should run synchronous reclamation. This is done in beancounter
> patches v6 we've sent recently.
> 

I thought about running synchronous reclamation, but then did not follow
that approach, I was not sure if calling the reclaim routines from the
page fault context is a good thing to do. It's worth trying out, since
it would provide better control over rss.


>> Signed-off-by: Balbir Singh <balbir@in.ibm.com>
>> ---
>>
>> --- linux-2.6.19-rc2/mm/vmscan.c~container-memctlr-reclaim	2006-11-09 22:21:11.000000000 +0530
>> +++ linux-2.6.19-rc2-balbir/mm/vmscan.c	2006-11-09 22:21:11.000000000 +0530
>> @@ -36,6 +36,8 @@
>>  #include <linux/rwsem.h>
>>  #include <linux/delay.h>
>>  #include <linux/kthread.h>
>> +#include <linux/container.h>
>> +#include <linux/memctlr.h>
>>  
>>  #include <asm/tlbflush.h>
>>  #include <asm/div64.h>
>> @@ -65,6 +67,9 @@ struct scan_control {
>>  	int swappiness;
>>  
>>  	int all_unreclaimable;
>> +
>> +	int overlimit;
>> +	void *container;	/* Added as void * to avoid #ifdef's */
>>  };
>>  
>>  /*
>> @@ -811,6 +816,10 @@ force_reclaim_mapped:
>>  		cond_resched();
>>  		page = lru_to_page(&l_hold);
>>  		list_del(&page->lru);
>> +		if (!memctlr_page_reclaim(page, sc->container, sc->overlimit)) {
>> +			list_add(&page->lru, &l_active);
>> +			continue;
>> +		}
>>  		if (page_mapped(page)) {
>>  			if (!reclaim_mapped ||
>>  			    (total_swap_pages == 0 && PageAnon(page)) ||
> 
> [snip] See comment below.
> 
>>  
>> +#ifdef CONFIG_RES_GROUPS_MEMORY
>> +/*
>> + * Modelled after shrink_all_memory
>> + */
>> +unsigned long memctlr_shrink_container_memory(unsigned long nr_pages,
>> +						struct container *container,
>> +						int overlimit)
>> +{
>> +	unsigned long lru_pages;
>> +	unsigned long ret = 0;
>> +	int pass;
>> +	struct zone *zone;
>> +	struct scan_control sc = {
>> +		.gfp_mask = GFP_KERNEL,
>> +		.may_swap = 0,
>> +		.swap_cluster_max = nr_pages,
>> +		.may_writepage = 1,
>> +		.swappiness = vm_swappiness,
>> +		.overlimit = overlimit,
>> +		.container = container,
>> +	};
>> +
> 
> [snip]
> 
>> +		for (prio = DEF_PRIORITY; prio >= 0; prio--) {
>> +			unsigned long nr_to_scan = nr_pages - ret;
>> +
>> +			sc.nr_scanned = 0;
>> +			ret += shrink_all_zones(nr_to_scan, prio, pass, &sc);
>> +			if (ret >= nr_pages)
>> +				break;
>> +
>> +			if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
>> +				blk_congestion_wait(WRITE, HZ / 10);
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +#endif
> 
> Please correct me if I'm wrong, but does this reclamation work like
> "run over all the zones' lists searching for page whose controller
> is sc->container" ?
> 

Yeah, that's correct. The code can also reclaim memory from all over-the-limit
containers (by passing SC_OVERLIMIT_ALL). The idea behind using such a scheme
is to ensure that the global LRU list is not broken.


-- 
	Thanks for the feedback,
	Balbir Singh,
	Linux Technology Center,
	IBM Software Labs

  reply	other threads:[~2006-11-10  9:16 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-09 19:35 [RFC][PATCH 0/8] RSS controller for containers Balbir Singh
2006-11-09 19:35 ` [RFC][PATCH 1/8] Fix resource groups parsing, while assigning shares Balbir Singh
2006-11-09 19:35 ` [RFC][PATCH 2/8] RSS controller setup Balbir Singh
2006-11-09 19:35 ` [RFC][PATCH 3/8] RSS controller add callbacks Balbir Singh
2006-11-09 19:36 ` [RFC][PATCH 4/8] RSS controller accounting Balbir Singh
2006-11-10  9:06   ` Pavel Emelianov
2006-11-10  9:29     ` Balbir Singh
2006-11-09 19:36 ` [RFC][PATCH 5/8] RSS controller task migration support Balbir Singh
2006-11-09 19:36 ` [RFC][PATCH 6/8] RSS controller shares allocation Balbir Singh
2006-11-10  9:11   ` Pavel Emelianov
2006-11-10 10:27     ` [ckrm-tech] " Balbir Singh
2006-11-10 10:32       ` Pavel Emelianov
2006-11-10 12:55         ` Balbir Singh
2006-11-09 19:36 ` [RFC][PATCH 7/8] RSS controller fix resource groups parsing Balbir Singh
2006-11-10  9:13   ` Pavel Emelianov
2006-11-10  9:32     ` Balbir Singh
2006-11-09 19:36 ` [RFC][PATCH 8/8] RSS controller support reclamation Balbir Singh
2006-11-09 19:45   ` Arjan van de Ven
2006-11-10  1:56     ` [ckrm-tech] " Balbir Singh
2006-11-10  8:54   ` Pavel Emelianov
2006-11-10  9:16     ` Balbir Singh [this message]
2006-11-10  9:29       ` Pavel Emelianov
2006-11-10 12:42         ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45544362.9040805@in.ibm.com \
    --to=balbir@in.ibm.com \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=dev@openvz.org \
    --cc=haveblue@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rohitseth@google.com \
    --cc=xemul@openvz.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox