public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Andrea Righi <arighi@develer.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] memcg: dirty pages instrumentation
Date: Tue, 23 Feb 2010 14:56:06 -0500	[thread overview]
Message-ID: <20100223195606.GD11930@redhat.com> (raw)
In-Reply-To: <20100223094040.GC1882@linux>

On Tue, Feb 23, 2010 at 10:40:40AM +0100, Andrea Righi wrote:
> On Mon, Feb 22, 2010 at 11:52:15AM -0500, Vivek Goyal wrote:
> > >  unsigned long determine_dirtyable_memory(void)
> > >  {
> > > -	unsigned long x;
> > > -
> > > -	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
> > > -
> > > +	unsigned long memcg_memory, memory;
> > > +
> > > +	memory = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
> > > +	memcg_memory = mem_cgroup_page_state(MEMCG_NR_FREE_PAGES);
> > > +	if (memcg_memory > 0) {
> > 
> > it could be just 
> > 
> > 	if (memcg_memory) {
> 
> Agreed.
> 
> > 	}
> > 
> > > +		memcg_memory +=
> > > +			mem_cgroup_page_state(MEMCG_NR_RECLAIMABLE_PAGES);
> > > +		if (memcg_memory < memory)
> > > +			return memcg_memory;
> > > +	}
> > >  	if (!vm_highmem_is_dirtyable)
> > > -		x -= highmem_dirtyable_memory(x);
> > > +		memory -= highmem_dirtyable_memory(memory);
> > >  
> > 
> > If vm_highmem_is_dirtyable=0, In that case, we can still return with
> > "memcg_memory" which can be more than "memory".  IOW, highmem is not
> > dirtyable system wide but still we can potetially return back saying
> > for this cgroup we can dirty more pages which can potenailly be acutally
> > be more that system wide allowed?
> > 
> > Because you have modified dirtyable_memory() and made it per cgroup, I
> > think it automatically takes care of the cases of per cgroup dirty ratio,
> > I mentioned in my previous mail. So we will use system wide dirty ratio
> > to calculate the allowed dirty pages in this cgroup (dirty_ratio *
> > available_memory()) and if this cgroup wrote too many pages start
> > writeout? 
> 
> OK, if I've understood well, you're proposing to use per-cgroup
> dirty_ratio interface and do something like:

I think we can use system wide dirty_ratio for per cgroup (instead of
providing configurable dirty_ratio for each cgroup where each memory
cgroup can have different dirty ratio. Can't think of a use case
immediately).
> 
> unsigned long determine_dirtyable_memory(void)
> {
> 	unsigned long memcg_memory, memory;
> 
> 	memory = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
> 	if (!vm_highmem_is_dirtyable)
> 		memory -= highmem_dirtyable_memory(memory);
> 
> 	memcg_memory = mem_cgroup_page_state(MEMCG_NR_FREE_PAGES);
> 	if (!memcg_memory)
> 		return memory + 1;      /* Ensure that we never return 0 */
> 	memcg_memory += mem_cgroup_page_state(MEMCG_NR_RECLAIMABLE_PAGES);
> 	if (!vm_highmem_is_dirtyable)
> 		 memcg_memory -= highmem_dirtyable_memory(memory) *
> 					mem_cgroup_dirty_ratio() / 100;
> 	if (memcg_memory < memory)
> 		return memcg_memory;
> }
> 

This one is tricky and I don't have good answers. I have concerns though.

- While calculating system wide dirtyable memory, we rely on actual memory
  available. (NR_FREE_PAGES + reclaimable_pages). In case of per memory
  cgroup available free pages, we are relying on not necessarily on
  actually available dirtyable memory but based on a user configurable
  limit (LIMIT - USAGE = cgroup_dirtyable_memory).

  This is good as long as total sum of limits of all cgroups is not more
  than available memory. But if somebody sets the "limit" to a high value,
  we will allow lots of write from that cgroup without being throttled.

  So if memory cgroups were not configured right so that limit total
  represents the actual memory in system, then we might end up having lot
  more dirty pages in the system.

- Subtracting high memory pages from dirtyable memory is tricky. Because
  how to account it in per cgroup calculation. May be we can just do
  following.

	calculate_memcg_memory;
	memory = memory - highmem_dirtyable_memory();
	if (memcg_memory < memory)
		return memcg_memory;

 Not sure. This is very crude and leaves the scope of more pages being
 dirty than otherwise would have been. Ideas?

Vivek

> 
> > 
> > > -	return x + 1;	/* Ensure that we never return 0 */
> > > +	return memory + 1;	/* Ensure that we never return 0 */
> > >  }
> > >  
> > >  void
> > > @@ -421,12 +428,13 @@ get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
> > >  		 unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
> > >  {
> > >  	unsigned long background;
> > > -	unsigned long dirty;
> > > +	unsigned long dirty, dirty_bytes;
> > >  	unsigned long available_memory = determine_dirtyable_memory();
> > >  	struct task_struct *tsk;
> > >  
> > > -	if (vm_dirty_bytes)
> > > -		dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
> > > +	dirty_bytes = mem_cgroup_dirty_bytes();
> > > +	if (dirty_bytes)
> > > +		dirty = DIV_ROUND_UP(dirty_bytes, PAGE_SIZE);
> > >  	else {
> > >  		int dirty_ratio;
> > >  
> > > @@ -505,9 +513,17 @@ static void balance_dirty_pages(struct address_space *mapping,
> > >  		get_dirty_limits(&background_thresh, &dirty_thresh,
> > >  				&bdi_thresh, bdi);
> > >  
> > > -		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> > > +		nr_reclaimable = mem_cgroup_page_state(MEMCG_NR_FILE_DIRTY);
> > > +		if (nr_reclaimable == 0) {
> > > +			nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> > >  					global_page_state(NR_UNSTABLE_NFS);
> > > -		nr_writeback = global_page_state(NR_WRITEBACK);
> > > +			nr_writeback = global_page_state(NR_WRITEBACK);
> > > +		} else {
> > > +			nr_reclaimable +=
> > > +				mem_cgroup_page_state(MEMCG_NR_UNSTABLE_NFS);
> > > +			nr_writeback =
> > > +				mem_cgroup_page_state(MEMCG_NR_WRITEBACK);
> > > +		}
> > >  
> > >  		bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > >  		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> > > @@ -660,6 +676,8 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > >  	unsigned long dirty_thresh;
> > >  
> > >          for ( ; ; ) {
> > > +		unsigned long dirty;
> > > +
> > >  		get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
> > >  
> > >                  /*
> > > @@ -668,10 +686,15 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > >                   */
> > >                  dirty_thresh += dirty_thresh / 10;      /* wheeee... */
> > >  
> > > -                if (global_page_state(NR_UNSTABLE_NFS) +
> > > -			global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > -                        	break;
> > > -                congestion_wait(BLK_RW_ASYNC, HZ/10);
> > > +		dirty = mem_cgroup_page_state(MEMCG_NR_WRITEBACK);
> > > +		if (dirty < 0)
> > 
> > dirty is unsigned long. Will above condition be ever true? 
> > 
> > Are you expecting that NR_WRITEBACK can go negative?
> 
> No, this is a bug, indeed. The right check is just "if (dirty)".
> 
> Thanks!
> -Andrea

  parent reply	other threads:[~2010-02-23 19:56 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-21 15:18 [RFC] [PATCH 0/2] memcg: per cgroup dirty limit Andrea Righi
2010-02-21 15:18 ` [PATCH 1/2] memcg: dirty pages accounting and limiting infrastructure Andrea Righi
2010-02-21 21:28   ` David Rientjes
2010-02-21 22:17     ` Andrea Righi
2010-02-22 18:07       ` Vivek Goyal
2010-02-23 11:58         ` Andrea Righi
2010-02-25 15:36           ` Minchan Kim
2010-02-26  0:23             ` KAMEZAWA Hiroyuki
2010-02-26  4:50               ` Minchan Kim
2010-02-26  5:01                 ` KAMEZAWA Hiroyuki
2010-02-26  5:53                   ` Minchan Kim
2010-02-26  6:15                     ` KAMEZAWA Hiroyuki
2010-02-26  6:35                       ` Minchan Kim
2010-02-22  0:22   ` KAMEZAWA Hiroyuki
2010-02-22 18:00     ` Andrea Righi
2010-02-22 21:21       ` David Rientjes
2010-02-22 19:31     ` Vivek Goyal
2010-02-23  9:58       ` Andrea Righi
2010-02-22 15:58   ` Vivek Goyal
2010-02-22 17:29     ` Balbir Singh
2010-02-23  9:26     ` Andrea Righi
2010-02-22 16:14   ` Balbir Singh
2010-02-23  9:28     ` Andrea Righi
2010-02-24  0:09       ` KAMEZAWA Hiroyuki
2010-02-21 15:18 ` [PATCH 2/2] memcg: dirty pages instrumentation Andrea Righi
2010-02-21 21:38   ` David Rientjes
2010-02-21 22:33     ` Andrea Righi
2010-02-22  0:32   ` KAMEZAWA Hiroyuki
2010-02-22 17:57     ` Andrea Righi
2010-02-22 16:52   ` Vivek Goyal
2010-02-23  9:40     ` Andrea Righi
2010-02-23  9:45       ` Andrea Righi
2010-02-23 19:56       ` Vivek Goyal [this message]
2010-02-23 22:22         ` David Rientjes
2010-02-25 14:34           ` Andrea Righi
2010-02-26  0:14             ` KAMEZAWA Hiroyuki
2010-02-22 18:20   ` Peter Zijlstra
2010-02-23  9:46     ` Andrea Righi
2010-02-23 21:29   ` Vivek Goyal
2010-02-25 15:12     ` Andrea Righi
2010-02-26 21:48       ` Vivek Goyal
2010-02-26 22:21         ` Andrea Righi
2010-02-26 22:28           ` Vivek Goyal
2010-03-01  0:47         ` KAMEZAWA Hiroyuki
2010-02-21 23:48 ` [RFC] [PATCH 0/2] memcg: per cgroup dirty limit KAMEZAWA Hiroyuki
2010-02-22 14:27 ` Vivek Goyal
2010-02-22 17:36   ` Balbir Singh
2010-02-22 17:58     ` Vivek Goyal
2010-02-23  0:07       ` KAMEZAWA Hiroyuki
2010-02-23 15:12         ` Vivek Goyal
2010-02-24  0:19           ` KAMEZAWA Hiroyuki
2010-02-22 18:12   ` Andrea Righi
2010-02-22 18:29     ` Vivek Goyal
2010-02-22 21:15       ` David Rientjes
2010-02-23  9:55       ` Andrea Righi
2010-02-23 20:01         ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100223195606.GD11930@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arighi@develer.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=suleiman@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox