All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Greg Thelen <gthelen@google.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Chad Talbott <ctalbott@google.com>,
	Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting
Date: Wed, 16 Mar 2011 22:52:14 +0100	[thread overview]
Message-ID: <20110316215214.GO2140@cmpxchg.org> (raw)
In-Reply-To: <AANLkTim7q3cLGjxnyBS7SDdpJsGi-z34bpPT=MJSka+C@mail.gmail.com>

On Wed, Mar 16, 2011 at 02:19:26PM -0700, Greg Thelen wrote:
> On Wed, Mar 16, 2011 at 6:13 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Mar 15, 2011 at 02:48:39PM -0400, Vivek Goyal wrote:
> >> I think even for background we shall have to implement some kind of logic
> >> where inodes are selected by traversing memcg->lru list so that for
> >> background write we don't end up writting too many inodes from other
> >> root group in an attempt to meet the low background ratio of memcg.
> >>
> >> So to me it boils down to coming up a new inode selection logic for
> >> memcg which can be used both for background as well as foreground
> >> writes. This will make sure we don't end up writting pages from the
> >> inodes we don't want to.
> >
> > Originally for struct page_cgroup reduction, I had the idea of
> > introducing something like
> >
> >        struct memcg_mapping {
> >                struct address_space *mapping;
> >                struct mem_cgroup *memcg;
> >        };
> >
> > hanging off page->mapping to make memcg association no longer per-page
> > and save the pc->memcg linkage (it's not completely per-inode either,
> > multiple memcgs can still refer to a single inode).
> >
> > We could put these descriptors on a per-memcg list and write inodes
> > from this list during memcg-writeback.
> >
> > We would have the option of extending this structure to contain hints
> > as to which subrange of the inode is actually owned by the cgroup, to
> > further narrow writeback to the right pages - iff shared big files
> > become a problem.
> >
> > Does that sound feasible?
> 
> If I understand your memcg_mapping proposal, then each inode could
> have a collection of memcg_mapping objects representing the set of
> memcg that were charged for caching pages of the inode's data.  When a
> new file page is charged to a memcg, then the inode's set of
> memcg_mapping would be scanned to determine if current's memcg is
> already in the memcg_mapping set.  If this is the first page for the
> memcg within the inode, then a new memcg_mapping would be allocated
> and attached to the inode.  The memcg_mapping may be reference counted
> and would be deleted when the last inode page for a particular memcg
> is uncharged.

Dead-on.  Well, on which side you put the list - a per-memcg list of
inodes, or a per-inode list of memcgs - really depends on which way
you want to do the lookups.  But this is the idea, yes.

>   page->mapping = &memcg_mapping
>   inode->i_mapping = collection of memcg_mapping, grows/shrinks with [un]charge

If the memcg_mapping list (or hash-table for quick find-or-create?)
was to be on the inode side, I'd put it in struct address_space, since
this is all about page cache, not so much an fs thing.

Still, correct in general.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Greg Thelen <gthelen@google.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Chad Talbott <ctalbott@google.com>,
	Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting
Date: Wed, 16 Mar 2011 22:52:14 +0100	[thread overview]
Message-ID: <20110316215214.GO2140@cmpxchg.org> (raw)
In-Reply-To: <AANLkTim7q3cLGjxnyBS7SDdpJsGi-z34bpPT=MJSka+C@mail.gmail.com>

On Wed, Mar 16, 2011 at 02:19:26PM -0700, Greg Thelen wrote:
> On Wed, Mar 16, 2011 at 6:13 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Mar 15, 2011 at 02:48:39PM -0400, Vivek Goyal wrote:
> >> I think even for background we shall have to implement some kind of logic
> >> where inodes are selected by traversing memcg->lru list so that for
> >> background write we don't end up writting too many inodes from other
> >> root group in an attempt to meet the low background ratio of memcg.
> >>
> >> So to me it boils down to coming up a new inode selection logic for
> >> memcg which can be used both for background as well as foreground
> >> writes. This will make sure we don't end up writting pages from the
> >> inodes we don't want to.
> >
> > Originally for struct page_cgroup reduction, I had the idea of
> > introducing something like
> >
> >        struct memcg_mapping {
> >                struct address_space *mapping;
> >                struct mem_cgroup *memcg;
> >        };
> >
> > hanging off page->mapping to make memcg association no longer per-page
> > and save the pc->memcg linkage (it's not completely per-inode either,
> > multiple memcgs can still refer to a single inode).
> >
> > We could put these descriptors on a per-memcg list and write inodes
> > from this list during memcg-writeback.
> >
> > We would have the option of extending this structure to contain hints
> > as to which subrange of the inode is actually owned by the cgroup, to
> > further narrow writeback to the right pages - iff shared big files
> > become a problem.
> >
> > Does that sound feasible?
> 
> If I understand your memcg_mapping proposal, then each inode could
> have a collection of memcg_mapping objects representing the set of
> memcg that were charged for caching pages of the inode's data.  When a
> new file page is charged to a memcg, then the inode's set of
> memcg_mapping would be scanned to determine if current's memcg is
> already in the memcg_mapping set.  If this is the first page for the
> memcg within the inode, then a new memcg_mapping would be allocated
> and attached to the inode.  The memcg_mapping may be reference counted
> and would be deleted when the last inode page for a particular memcg
> is uncharged.

Dead-on.  Well, on which side you put the list - a per-memcg list of
inodes, or a per-inode list of memcgs - really depends on which way
you want to do the lookups.  But this is the idea, yes.

>   page->mapping = &memcg_mapping
>   inode->i_mapping = collection of memcg_mapping, grows/shrinks with [un]charge

If the memcg_mapping list (or hash-table for quick find-or-create?)
was to be on the inode side, I'd put it in struct address_space, since
this is all about page cache, not so much an fs thing.

Still, correct in general.

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Greg Thelen <gthelen@google.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Chad Talbott <ctalbott@google.com>,
	Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting
Date: Wed, 16 Mar 2011 22:52:14 +0100	[thread overview]
Message-ID: <20110316215214.GO2140@cmpxchg.org> (raw)
In-Reply-To: <AANLkTim7q3cLGjxnyBS7SDdpJsGi-z34bpPT=MJSka+C@mail.gmail.com>

On Wed, Mar 16, 2011 at 02:19:26PM -0700, Greg Thelen wrote:
> On Wed, Mar 16, 2011 at 6:13 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Mar 15, 2011 at 02:48:39PM -0400, Vivek Goyal wrote:
> >> I think even for background we shall have to implement some kind of logic
> >> where inodes are selected by traversing memcg->lru list so that for
> >> background write we don't end up writting too many inodes from other
> >> root group in an attempt to meet the low background ratio of memcg.
> >>
> >> So to me it boils down to coming up a new inode selection logic for
> >> memcg which can be used both for background as well as foreground
> >> writes. This will make sure we don't end up writting pages from the
> >> inodes we don't want to.
> >
> > Originally for struct page_cgroup reduction, I had the idea of
> > introducing something like
> >
> >        struct memcg_mapping {
> >                struct address_space *mapping;
> >                struct mem_cgroup *memcg;
> >        };
> >
> > hanging off page->mapping to make memcg association no longer per-page
> > and save the pc->memcg linkage (it's not completely per-inode either,
> > multiple memcgs can still refer to a single inode).
> >
> > We could put these descriptors on a per-memcg list and write inodes
> > from this list during memcg-writeback.
> >
> > We would have the option of extending this structure to contain hints
> > as to which subrange of the inode is actually owned by the cgroup, to
> > further narrow writeback to the right pages - iff shared big files
> > become a problem.
> >
> > Does that sound feasible?
> 
> If I understand your memcg_mapping proposal, then each inode could
> have a collection of memcg_mapping objects representing the set of
> memcg that were charged for caching pages of the inode's data.  When a
> new file page is charged to a memcg, then the inode's set of
> memcg_mapping would be scanned to determine if current's memcg is
> already in the memcg_mapping set.  If this is the first page for the
> memcg within the inode, then a new memcg_mapping would be allocated
> and attached to the inode.  The memcg_mapping may be reference counted
> and would be deleted when the last inode page for a particular memcg
> is uncharged.

Dead-on.  Well, on which side you put the list - a per-memcg list of
inodes, or a per-inode list of memcgs - really depends on which way
you want to do the lookups.  But this is the idea, yes.

>   page->mapping = &memcg_mapping
>   inode->i_mapping = collection of memcg_mapping, grows/shrinks with [un]charge

If the memcg_mapping list (or hash-table for quick find-or-create?)
was to be on the inode side, I'd put it in struct address_space, since
this is all about page cache, not so much an fs thing.

Still, correct in general.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-03-16 21:52 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-11 18:43 [PATCH v6 0/9] memcg: per cgroup dirty page accounting Greg Thelen
2011-03-11 18:43 ` Greg Thelen
2011-03-11 18:43 ` [PATCH v6 1/9] memcg: document cgroup dirty memory interfaces Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-14 14:50   ` Minchan Kim
2011-03-14 14:50     ` Minchan Kim
2011-03-11 18:43 ` [PATCH v6 2/9] memcg: add page_cgroup flags for dirty page tracking Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-11 18:43 ` [PATCH v6 3/9] memcg: add dirty page accounting infrastructure Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-14 14:56   ` Minchan Kim
2011-03-14 14:56     ` Minchan Kim
2011-03-11 18:43 ` [PATCH v6 4/9] memcg: add kernel calls for memcg dirty page stats Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-14 15:10   ` Minchan Kim
2011-03-14 15:10     ` Minchan Kim
2011-03-15  6:32     ` Greg Thelen
2011-03-15  6:32       ` Greg Thelen
2011-03-15  6:32       ` Greg Thelen
2011-03-15 13:50       ` Ryusuke Konishi
2011-03-15 13:50         ` Ryusuke Konishi
2011-03-11 18:43 ` [PATCH v6 5/9] memcg: add dirty limits to mem_cgroup Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-11 18:43 ` [PATCH v6 6/9] memcg: add cgroupfs interface to memcg dirty limits Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-14 15:16   ` Minchan Kim
2011-03-14 15:16     ` Minchan Kim
2011-03-15 14:01   ` Mike Heffner
2011-03-15 14:01     ` Mike Heffner
2011-03-16  0:00     ` KAMEZAWA Hiroyuki
2011-03-16  0:00       ` KAMEZAWA Hiroyuki
2011-03-16  0:50     ` Greg Thelen
2011-03-16  0:50       ` Greg Thelen
2011-03-11 18:43 ` [PATCH v6 7/9] memcg: add dirty limiting routines Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-11 18:43 ` [PATCH v6 8/9] memcg: check memcg dirty limits in page writeback Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-14 17:54   ` Vivek Goyal
2011-03-14 17:54     ` Vivek Goyal
2011-03-14 17:59     ` Vivek Goyal
2011-03-14 17:59       ` Vivek Goyal
2011-03-14 21:10     ` Jan Kara
2011-03-14 21:10       ` Jan Kara
2011-03-15  3:27       ` Greg Thelen
2011-03-15  3:27         ` Greg Thelen
2011-03-15 23:12         ` Jan Kara
2011-03-15 23:12           ` Jan Kara
2011-03-15 23:12           ` Jan Kara
2011-03-16  2:35           ` Greg Thelen
2011-03-16  2:35             ` Greg Thelen
2011-03-16  2:35             ` Greg Thelen
2011-03-16 12:35             ` Jan Kara
2011-03-16 12:35               ` Jan Kara
2011-03-16 12:35               ` Jan Kara
2011-03-16 18:07               ` Vivek Goyal
2011-03-16 18:07                 ` Vivek Goyal
2011-03-16 18:07                 ` Vivek Goyal
2011-03-15 16:20       ` Vivek Goyal
2011-03-15 16:20         ` Vivek Goyal
2011-03-11 18:43 ` [PATCH v6 9/9] memcg: make background writeback memcg aware Greg Thelen
2011-03-11 18:43   ` Greg Thelen
2011-03-15 22:54   ` Vivek Goyal
2011-03-15 22:54     ` Vivek Goyal
2011-03-16  1:00     ` Greg Thelen
2011-03-16  1:00       ` Greg Thelen
2011-03-12  1:10 ` [PATCH v6 0/9] memcg: per cgroup dirty page accounting Andrew Morton
2011-03-12  1:10   ` Andrew Morton
2011-03-14 18:29   ` Greg Thelen
2011-03-14 18:29     ` Greg Thelen
2011-03-14 20:23     ` Vivek Goyal
2011-03-14 20:23       ` Vivek Goyal
2011-03-15  2:41       ` Greg Thelen
2011-03-15  2:41         ` Greg Thelen
2011-03-15 18:48         ` Vivek Goyal
2011-03-15 18:48           ` Vivek Goyal
2011-03-15 18:48           ` Vivek Goyal
2011-03-16 13:13           ` Johannes Weiner
2011-03-16 13:13             ` Johannes Weiner
2011-03-16 13:13             ` Johannes Weiner
2011-03-16 14:59             ` Vivek Goyal
2011-03-16 14:59               ` Vivek Goyal
2011-03-16 14:59               ` Vivek Goyal
2011-03-16 16:35               ` Johannes Weiner
2011-03-16 16:35                 ` Johannes Weiner
2011-03-16 16:35                 ` Johannes Weiner
2011-03-16 17:06                 ` Vivek Goyal
2011-03-16 17:06                   ` Vivek Goyal
2011-03-16 21:19             ` Greg Thelen
2011-03-16 21:19               ` Greg Thelen
2011-03-16 21:52               ` Johannes Weiner [this message]
2011-03-16 21:52                 ` Johannes Weiner
2011-03-16 21:52                 ` Johannes Weiner
2011-03-17  4:41                 ` Greg Thelen
2011-03-17  4:41                   ` Greg Thelen
2011-03-17 12:43                   ` Johannes Weiner
2011-03-17 12:43                     ` Johannes Weiner
2011-03-17 14:49                     ` Vivek Goyal
2011-03-17 14:49                       ` Vivek Goyal
2011-03-17 14:53                     ` Jan Kara
2011-03-17 14:53                       ` Jan Kara
2011-03-17 15:42                       ` Curt Wohlgemuth
2011-03-17 15:42                         ` Curt Wohlgemuth
2011-03-17 15:42                         ` Curt Wohlgemuth
2011-03-18  7:57                     ` Greg Thelen
2011-03-18  7:57                       ` Greg Thelen
2011-03-18 14:50                       ` Vivek Goyal
2011-03-18 14:50                         ` Vivek Goyal
2011-03-23  9:06                       ` KAMEZAWA Hiroyuki
2011-03-23  9:06                         ` KAMEZAWA Hiroyuki
2011-03-23  9:06                         ` KAMEZAWA Hiroyuki
2011-03-18 14:29                     ` Vivek Goyal
2011-03-18 14:29                       ` Vivek Goyal
2011-03-18 14:46                       ` Johannes Weiner
2011-03-18 14:46                         ` Johannes Weiner
2011-03-17 14:46                   ` Jan Kara
2011-03-17 14:46                     ` Jan Kara
2011-03-17 17:12                     ` Vivek Goyal
2011-03-17 17:12                       ` Vivek Goyal
2011-03-17 17:59                       ` Jan Kara
2011-03-17 17:59                         ` Jan Kara
2011-03-17 18:15                         ` Vivek Goyal
2011-03-17 18:15                           ` Vivek Goyal
2011-03-15 21:23         ` Vivek Goyal
2011-03-15 21:23           ` Vivek Goyal
2011-03-15 21:23           ` Vivek Goyal
2011-03-15 23:11           ` Vivek Goyal
2011-03-15 23:11             ` Vivek Goyal
2011-03-15 23:11             ` Vivek Goyal
2011-03-15  1:56     ` KAMEZAWA Hiroyuki
2011-03-15  1:56       ` KAMEZAWA Hiroyuki
2011-03-15  2:51       ` Greg Thelen
2011-03-15  2:51         ` Greg Thelen
2011-03-15  2:54         ` KAMEZAWA Hiroyuki
2011-03-15  2:54           ` KAMEZAWA Hiroyuki
2011-03-16 12:45 ` Johannes Weiner
2011-03-16 12:45   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110316215214.GO2140@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=arighi@develer.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=ciju@linux.vnet.ibm.com \
    --cc=containers@lists.osdl.org \
    --cc=ctalbott@google.com \
    --cc=fengguang.wu@intel.com \
    --cc=gthelen@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=rientjes@google.com \
    --cc=teravest@google.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.