From: Vivek Goyal <vgoyal@redhat.com>
To: Greg Thelen <gthelen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
Andrea Righi <arighi@develer.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Minchan Kim <minchan.kim@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Ciju Rajan K <ciju@linux.vnet.ibm.com>,
David Rientjes <rientjes@google.com>,
Wu Fengguang <fengguang.wu@intel.com>,
Chad Talbott <ctalbott@google.com>,
Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting
Date: Tue, 15 Mar 2011 14:48:39 -0400
Message-ID: <20110315184839.GB5740@redhat.com>
In-Reply-To: <AANLkTinDNOLMdU7EEMPFkC_f9edCx7ZFc7=qLRNAEmBM@mail.gmail.com>
On Mon, Mar 14, 2011 at 07:41:13PM -0700, Greg Thelen wrote:
> On Mon, Mar 14, 2011 at 1:23 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Mon, Mar 14, 2011 at 11:29:17AM -0700, Greg Thelen wrote:
> >
> > [..]
> >> > We could just crawl the memcg's page LRU and bring things under control
> >> > that way, couldn't we? That would fix it. What were the reasons for
> >> > not doing this?
> >>
> >> My rationale for pursuing bdi writeback was I/O locality. I have heard that
> >> per-page I/O has bad locality. Per inode bdi-style writeback should have better
> >> locality.
> >>
> >> My hunch is the best solution is a hybrid which uses a) bdi writeback with a
> >> target memcg filter and b) using the memcg lru as a fallback to identify the bdi
> >> that needed writeback. I think the part a) memcg filtering is likely something
> >> like:
> >> http://marc.info/?l=linux-kernel&m=129910424431837
> >>
> >> The part b) bdi selection should not be too hard assuming that page-to-mapping
> >> locking is doable.
> >
> > Greg,
> >
> > IIUC, option b) seems to be going through pages of particular memcg and
> > mapping page to inode and start writeback on particular inode?
>
> Yes.
>
> > If yes, this might be reasonably good. In the case when cgroups are not
> > sharing inodes then it automatically maps one inode to one cgroup and
> > once cgroup is over limit, it starts writebacks of its own inode.
> >
> > In case inode is shared, then we get the case of one cgroup writing
> > back the pages of other cgroup. Well I guess that also can be handled
> > by flusher thread where a bunch or group of pages can be compared with
> > the cgroup passed in writeback structure. I guess that might hurt us
> > more than benefit us.
>
> Agreed. For now just writing the entire inode is probably fine.
>
> > IIUC how option b) works then we don't even need option a) where an N level
> > deep cache is maintained?
>
> Originally I was thinking that bdi-wide writeback with memcg filter
> was a good idea. But this may be unnecessarily complex. Now I am
> agreeing with you that option (a) may not be needed. Memcg could
> queue per-inode writeback using the memcg lru to locate inodes
> (lru->page->inode) with something like this in
> [mem_cgroup_]balance_dirty_pages():
>
>     while (memcg_usage() >= memcg_fg_limit) {
>         /* scan lru for a dirty page, then grab mapping & inode */
>         inode = memcg_dirty_inode(cg);
>         sync_inode(inode, &wbc);
>     }
>
>     if (memcg_usage() >= memcg_bg_limit) {
>         /* queue per-memcg bg flush work item */
>     }

I think even for background writeback we shall have to implement some
kind of logic where inodes are selected by traversing the memcg->lru
list, so that for background writes we don't end up writing too many
inodes from other or root groups in an attempt to meet the low
background ratio of the memcg.

So to me it boils down to coming up with a new inode selection logic
for memcg which can be used for both background and foreground writes.
This will make sure we don't end up writing pages from inodes we don't
want to.
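
To make the idea concrete, here is a rough userspace model of one
selection routine shared by the foreground and background paths, where
only the threshold differs. Everything in it is an assumption for
illustration: model_memcg, model_page, model_inode,
memcg_pick_dirty_inode(), model_sync_inode() and memcg_writeback() are
hypothetical names, the array stands in for the memcg LRU list, and no
locking is modelled. It is a sketch of the control flow, not a proposed
kernel implementation.

```c
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel structures. */
struct model_inode {
    int id;
};

struct model_page {
    struct model_inode *inode;  /* stands in for page->mapping->host */
    int dirty;
};

struct model_memcg {
    struct model_page *lru;     /* array standing in for the memcg LRU */
    size_t nr_pages;
    int nr_dirty;               /* dirty pages charged to this memcg */
};

/* Walk the LRU and return the inode of the first dirty page, or NULL. */
static struct model_inode *memcg_pick_dirty_inode(struct model_memcg *cg)
{
    for (size_t i = 0; i < cg->nr_pages; i++)
        if (cg->lru[i].dirty)
            return cg->lru[i].inode;
    return NULL;
}

/* "Write back" the whole inode: clean every page of this memcg on it. */
static void model_sync_inode(struct model_memcg *cg, struct model_inode *inode)
{
    for (size_t i = 0; i < cg->nr_pages; i++) {
        struct model_page *p = &cg->lru[i];

        if (p->inode == inode && p->dirty) {
            p->dirty = 0;
            cg->nr_dirty--;
        }
    }
}

/*
 * Shared by foreground and background writeback; the caller passes the
 * foreground or background threshold. Returns the number of inodes
 * written.
 */
static int memcg_writeback(struct model_memcg *cg, int thresh)
{
    int written = 0;

    while (cg->nr_dirty >= thresh) {
        struct model_inode *inode = memcg_pick_dirty_inode(cg);

        if (!inode)
            break;              /* nothing dirty left on this LRU */
        model_sync_inode(cg, inode);
        written++;
    }
    return written;
}
```

Since inodes are only ever found through this memcg's own LRU, neither
path can pick an inode that has no dirty page charged to the memcg.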

Though we shall also have to come up with some approximation so that,
if there are multiple inodes in the cgroup, we don't end up writing the
same inodes all the time while some inodes don't get written back at
all. Maybe skip a random number of pages from the beginning of the list
before we select an inode.
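
The random-skip approximation could look roughly like the following
userspace sketch. The names lru_page, pick_dirty_from() and
pick_dirty_random() are made up for illustration, the array again
stands in for the memcg LRU list, and rand() is just a placeholder
random source. Wrapping around to the start of the list when the scan
runs off the end is my assumption, added so that a dirty page is still
found if the skip lands past the last one.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page on the memcg LRU. */
struct lru_page {
    int inode_id;   /* stands in for page->mapping->host */
    int dirty;
};

/*
 * Scan the LRU starting 'skip' entries from the beginning, wrapping
 * around at the end. Returns the index of the first dirty page found,
 * or -1 if no page is dirty.
 */
static long pick_dirty_from(const struct lru_page *lru, size_t n, size_t skip)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = (skip + i) % n;

        if (lru[idx].dirty)
            return (long)idx;
    }
    return -1;
}

/* Same scan, but skip a random number of pages first. */
static long pick_dirty_random(const struct lru_page *lru, size_t n)
{
    if (n == 0)
        return -1;      /* empty LRU, and avoids a modulo by zero */
    return pick_dirty_from(lru, n, (size_t)rand() % n);
}
```

With a fixed start, the inode owning the first dirty page would be
chosen every time; varying the start gives inodes further down the list
a chance of being selected.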

This has the disadvantage that we are using different logic for
non-root cgroups, but until we figure out how to retrieve the inodes
belonging to a memory cgroup, it might not be a bad idea.

Thanks
Vivek