Re: [RFC] [PATCH 0/2] memcg: per cgroup dirty limit

From: Vivek Goyal <vgoyal@redhat.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
	Andrea Righi <arighi@develer.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH 0/2] memcg: per cgroup dirty limit
Date: Tue, 23 Feb 2010 10:12:01 -0500
Message-ID: <20100223151201.GB11930@redhat.com>
In-Reply-To: <20100223090704.839d8bef.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, Feb 23, 2010 at 09:07:04AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 22 Feb 2010 12:58:33 -0500
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Mon, Feb 22, 2010 at 11:06:40PM +0530, Balbir Singh wrote:
> > > * Vivek Goyal <vgoyal@redhat.com> [2010-02-22 09:27:45]:
> > > 
> > > 
> > > > 
> > > > >   Maybe we can modify writeback_inodes_wbc() to check the first dirty page
> > > > >   of the inode. If it does not belong to the same memcg as the task that
> > > > >   is performing balance_dirty_pages(), then skip that inode.
> > > 
> > > Do you expect all pages of an inode to be paged in by the same cgroup?
> > 
> > > I guess so, at least in simple cases. I am not sure whether it will cover the
> > > majority of usage, or to what extent that matters.
> > > 
> > > If we start doing background writeout on a per-page basis (like memory reclaim),
> > > then it will probably be slower, and hence flushing out pages sequentially
> > > from the inode makes sense.
> > 
> > > At one point I was thinking: like pages, can we have an inode list per
> > > memory cgroup so that the writeback logic can traverse that inode list to
> > > determine which inodes need to be cleaned? But associating inodes with a
> > > memory cgroup is not very intuitive and, at the same time, we again have the
> > > issue of file pages shared between two different cgroups.
> > 
> > > But I guess a simpler scheme would be to just check the first dirty page of the
> > > inode and, if it does not belong to the memory cgroup of the task being throttled,
> > > skip it.
> > > 
> > > It will not cover the case of file pages shared across memory cgroups, but it is
> > > at least something relatively simple to begin with. Do you have more ideas
> > > on how it can be handled better?
> > 
> 
> If pages are "shared", it's hard to find the _current_ owner.

Is it not the case that the task which touched the page first is the owner of
the page, and that task's memcg is charged for the page? Subsequent shared users
of the page get a free ride?

If yes, why is it hard to find the _current_ owner? Will it not be the memory
cgroup which brought the page into existence?
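
To make the first-touch rule I am asking about concrete, here is a minimal
userspace sketch in C. It is only a model of my understanding, not kernel
code: struct model_page, struct model_memcg and model_touch_page() are
hypothetical names and do not correspond to page_cgroup or the real memcg
charge path.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_OWNER (-1)

/* Hypothetical stand-in for a page cache page (not the kernel's struct page). */
struct model_page {
	int owner_memcg;	/* id of the memcg charged for this page */
};

/* Hypothetical per-memcg accounting: number of pages charged to it. */
struct model_memcg {
	long charged_pages;
};

/*
 * A task in memcg 'id' touches 'page'. Only the first toucher is charged;
 * later users from other memcgs get a free ride. Returns the owner id.
 */
static int model_touch_page(struct model_page *page,
			    struct model_memcg *cgs, int nr_cgs, int id)
{
	assert(page != NULL && cgs != NULL);
	assert(id >= 0 && id < nr_cgs);

	if (page->owner_memcg == MODEL_NO_OWNER) {
		page->owner_memcg = id;
		cgs[id].charged_pages++;
	}
	return page->owner_memcg;
}
```

In this model the owner never changes after the first touch, so looking it
up is trivial. What it does not capture is whether that first toucher is
still the one actually using the page, which may be what "_current_" refers
to.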
 
> Then, what I'm
> thinking of as an update to memcg is a memcg-for-page-cache and page cache
> migration between memcgs.
> 
> The idea is
>   - At first, treat page cache as what we do now.
>   - When a process touches page cache, check process's memcg and page cache's
>     memcg. If process-memcg != pagecache-memcg, we migrate it to a special
>     container as memcg-for-page-cache.
> 
> Then,
>   - read-once page caches are handled by local memcg.
>   - shared page caches are handled in a special memcg for "shared".
> 
> But this will add significant overhead in native implementation.
> (We may have to use page flags rather than page_cgroup's....)
> 
> I'm now wondering about
>   - set a "shared" flag in a page_cgroup if cached pages are accessed.
>   - sweep them to a special memcg in another (kernel) daemon when we hit a
>     threshold or similar.
> 
> But hmm, I'm not sure that memcg-for-shared-page-cache is acceptable
> for anyone.

I have not understood the idea well, hence a few queries/thoughts.

- You seem to be suggesting that shared page cache can be accounted
  separately within memcg. But one page still needs to be associated
  with one specific memcg, and one can only migrate it across memcgs
  based on some policy of who used how much. We are probably trying
  to be too accurate there, and it might not be needed.

  Can you elaborate a little more on what you meant by migrating pages
  to the special container memcg-for-page-cache? Is it a container shared
  across the memory cgroups which are sharing a page?

- The current writeback mechanism flushes on a per-inode basis. I think
  the biggest advantage is faster writeout, as contiguous pages
  are dispatched to disk (irrespective of the memory cgroup the different
  pages belong to), resulting in better merging and fewer seeks.

  Even if we can account shared pages well across memory cgroups, flushing
  these pages to disk will probably become complicated/slow if we start going
  through the pages of a memory cgroup and start flushing these out upon
  hitting the dirty_background/dirty_ratio/dirty_bytes limits.
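
As a rough illustration of per-inode flushing combined with the "check the
first dirty page" idea from earlier in the thread, here is a small userspace
model in C. Everything in it is an assumption made for illustration:
struct model_inode, model_first_dirty_owner() and model_flush_inode() are
hypothetical and say nothing about how writeback_inodes_wbc() is really
structured.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 8
#define MODEL_CLEAN (-1)

/*
 * Hypothetical inode: dirty_owner[i] is the memcg id charged for dirty
 * page i, or MODEL_CLEAN if page i is not dirty.
 */
struct model_inode {
	size_t nr_pages;
	int dirty_owner[MODEL_MAX_PAGES];
};

/* Return the memcg of the first dirty page, or MODEL_CLEAN if none. */
static int model_first_dirty_owner(const struct model_inode *inode)
{
	assert(inode != NULL && inode->nr_pages <= MODEL_MAX_PAGES);

	for (size_t i = 0; i < inode->nr_pages; i++)
		if (inode->dirty_owner[i] != MODEL_CLEAN)
			return inode->dirty_owner[i];
	return MODEL_CLEAN;
}

/*
 * Flush on behalf of a throttled memcg. The inode is skipped unless its
 * first dirty page belongs to that memcg. If it is not skipped, all dirty
 * pages are written in file order, whichever memcg they are charged to,
 * so the I/O stays sequential. Returns the number of pages written.
 */
static size_t model_flush_inode(struct model_inode *inode, int throttled_memcg)
{
	size_t written = 0;

	assert(throttled_memcg >= 0);
	if (model_first_dirty_owner(inode) != throttled_memcg)
		return 0;

	for (size_t i = 0; i < inode->nr_pages; i++) {
		if (inode->dirty_owner[i] != MODEL_CLEAN) {
			inode->dirty_owner[i] = MODEL_CLEAN;
			written++;
		}
	}
	return written;
}
```

With dirty pages owned by memcgs 1, 2, 1 in file order, a flush for memcg 2
skips the inode even though memcg 2 has a dirty page in it, while a flush
for memcg 1 writes all three pages in one sequential pass. That is the
shared-file limitation of the simple check, and also why it keeps the
writeout contiguous.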

Thanks
Vivek
