From: Andrea Righi <arighi@develer.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Peter Zijlstra <peterz@infradead.org>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
Suleiman Souhlal <suleiman@google.com>,
Greg Thelen <gthelen@google.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Andrew Morton <akpm@linux-foundation.org>,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v6)
Date: Fri, 12 Mar 2010 00:27:09 +0100
Message-ID: <20100311232708.GE2427@linux>
In-Reply-To: <20100311150307.GC29246@redhat.com>

On Thu, Mar 11, 2010 at 10:03:07AM -0500, Vivek Goyal wrote:
> On Thu, Mar 11, 2010 at 06:25:00PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 11 Mar 2010 10:14:25 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > On Thu, 2010-03-11 at 10:17 +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 11 Mar 2010 09:39:13 +0900
> > > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > > > The performance overhead is not so huge in both solutions, but the impact on
> > > > > > performance is even more reduced using a complicated solution...
> > > > > >
> > > > > > Maybe we can go ahead with the simplest implementation for now and start to
> > > > > > think to an alternative implementation of the page_cgroup locking and
> > > > > > charge/uncharge of pages.
> > >
> > > FWIW bit spinlocks suck massive.
> > >
> > > > >
> > > > > maybe. But in this 2 years, one of our biggest concerns was the performance.
> > > > > So, we do something complex in memcg. But complex-locking is , yes, complex.
> > > > > Hmm..I don't want to bet we can fix locking scheme without something complex.
> > > > >
> > > > But overall patch set seems good (to me.) And dirty_ratio and dirty_background_ratio
> > > > will give us much benefit (of performance) than we lose by small overheads.
> > >
> > > Well, the !cgroup or root case should really have no performance impact.
> > >
> > > > IIUC, this series affects trigger for background-write-out.
> > >
> > > Not sure though, while this does the accounting the actual writeout is
> > > still !cgroup aware and can definitely impact performance negatively by
> > > shrinking too much.
> > >
> >
> > Ah, okay, your point is !cgroup (ROOT cgroup case.)
> > I don't think accounting these file cache status against root cgroup is necessary.
> >
>
> I think what peter meant was that with memory cgroups created we will do
> writeouts much more aggressively.
>
> In balance_dirty_pages()
>
>     if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
>         break;
>
> Now with Andrea's patches, we are calculating bdi_thresh per memory cgroup
> (almost)
>
>     bdi_thresh ~= per_memory_cgroup_dirty * bdi_fraction
>
> But bdi_nr_reclaimable and bdi_nr_writeback stats are still global.

Correct. More exactly:

    bdi_thresh = memcg dirty memory limit * BDI's share of the global dirty memory

Before:

    bdi_thresh = global dirty memory limit * BDI's share of the global dirty memory

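To make the difference concrete, here is a small userspace sketch of the two formulas and of the balance_dirty_pages() exit test quoted above. It is only a toy model, not code from the patch set: the helper names, the num/den representation of the BDI share, and every number in the usage note are assumptions for illustration.

```c
#include <assert.h>

/*
 * Toy model, NOT kernel code: names and representation are assumptions.
 *
 * bdi_thresh = dirty limit * BDI's share of the global dirty memory.
 * The share is passed as share_num/share_den to stay in integer math.
 * Intended for small illustrative values (no overflow handling).
 */
static unsigned long bdi_thresh(unsigned long dirty_limit,
                                unsigned long share_num,
                                unsigned long share_den)
{
    assert(share_den != 0 && share_num <= share_den);
    return dirty_limit * share_num / share_den;
}

/*
 * The exit test quoted from balance_dirty_pages(): returns 1 when the
 * (still global) BDI counters are at or below the threshold, 0 otherwise.
 */
static int under_thresh(unsigned long bdi_nr_reclaimable,
                        unsigned long bdi_nr_writeback,
                        unsigned long thresh)
{
    return bdi_nr_reclaimable + bdi_nr_writeback <= thresh;
}
```

For example, with an assumed global dirty limit of 200000 pages, an assumed memcg dirty limit of 20000 pages, and a BDI that holds half of the global dirty memory, the threshold drops from 100000 to 10000 pages. With 30000 reclaimable plus 5000 writeback pages on that BDI (global counters), the test passes before the patches and fails after them, so the task keeps being throttled.
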
>
> So for the same number of dirty pages system wide on this bdi, we will be
> triggering writeouts much more aggressively if somebody has created few
> memory cgroups and tasks are running in those cgroups.
Right, if we don't touch per-cgroup dirty limits.
>
> I guess it might cause performance regressions in case of small file
> writeouts because previously one could have written the file to cache and
> be done with it but with this patch set, there are higher chances that
> you will be throttled to write the pages back to disk.
>
> I guess we need two pieces to resolve this.
> - BDI stats per cgroup.
> - Writeback of inodes from same cgroup.
>
> I think BDI stats per cgroup will increase the complexity.

I think there'll be the opposite problem: the number of dirty pages
(system-wide) will increase, because in this way we'll consider BDI
shares of memcg dirty memory. So I think we need both per-memcg BDI
stats and system-wide BDI stats, and then we need to take the min of the
two when evaluating bdi_thresh. Maybe... I'm not really sure about this,
and I need to figure this part out better. So I started with the simplest
implementation: global BDI stats and per-memcg dirty memory.

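One possible reading of the "min of the two" idea, sketched in userspace C. This is not what the patch set implements (it uses global BDI stats only), and all names here are hypothetical: the sketch computes one threshold from the memcg limit and the BDI's share of that memcg's dirty memory, another from the global limit and the BDI's share of the global dirty memory, and keeps the smaller one.

```c
#include <assert.h>

/* Hypothetical helper, not from the patch set. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Scale a dirty limit by a BDI share expressed as num/den (integer math,
 * meant for small illustrative values, no overflow handling). */
static unsigned long scale_by_share(unsigned long limit,
                                    unsigned long num, unsigned long den)
{
    assert(den != 0 && num <= den);
    return limit * num / den;
}

/*
 * Candidate bdi_thresh, taking the smaller of:
 *   - memcg dirty limit  * BDI's share of the memcg's dirty memory
 *   - global dirty limit * BDI's share of the global dirty memory
 * Which quantities should really be compared is an open question in
 * this thread; this is just one interpretation.
 */
static unsigned long bdi_thresh_min(unsigned long memcg_limit,
                                    unsigned long memcg_num,
                                    unsigned long memcg_den,
                                    unsigned long global_limit,
                                    unsigned long global_num,
                                    unsigned long global_den)
{
    unsigned long per_memcg = scale_by_share(memcg_limit, memcg_num, memcg_den);
    unsigned long global = scale_by_share(global_limit, global_num, global_den);

    return min_ul(per_memcg, global);
}
```

With this shape a cgroup can never get a per-BDI threshold above the system-wide one, and a cgroup with a small dirty limit is still bounded by its own share.
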
I totally agree about the other point, writeback of inodes per cgroup is
another feature that we need.
> I am still setting up the system to test whether we see any speedup in
> writeout of large files with-in a memory cgroup with small memory limits.
> I am assuming that we are expecting a speedup because we will start
> writeouts early and background writeouts probably are faster than direct
> reclaim?
mmh... speedup? I think with a large file write + reduced dirty limits
you'll get a more uniform write-out (more frequent small writes),
compared to a few, less frequent large writes. The system will be more
responsive, but I don't think you'll be able to see a speedup in the
large write itself.
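
A toy userspace model of that expectation, under strong simplifying assumptions: the writer dirties one page per step, and every time the dirty count reaches the limit all dirty pages are written out in one burst. The struct and function names are made up for this sketch, and it says nothing about real disk behaviour. It only shows that a lower limit changes the shape of the write-out (more, smaller bursts) and not the total amount of data written.

```c
#include <assert.h>

/* Summary of the write-out pattern produced by the toy model below. */
struct flush_stats {
    unsigned long flushes; /* number of write-out bursts */
    unsigned long largest; /* pages in the largest burst */
    unsigned long total;   /* pages written out overall */
};

/* Record one burst of `dirty` pages in the statistics. */
static void record_flush(struct flush_stats *s, unsigned long dirty)
{
    s->flushes++;
    if (dirty > s->largest)
        s->largest = dirty;
    s->total += dirty;
}

/*
 * Toy model (assumption, not kernel behaviour): dirty `pages` pages one
 * at a time and flush everything whenever `dirty_limit` is reached.
 * Any remainder is flushed at the end.
 */
static struct flush_stats simulate(unsigned long pages,
                                   unsigned long dirty_limit)
{
    struct flush_stats s = { 0, 0, 0 };
    unsigned long dirty = 0;
    unsigned long i;

    assert(dirty_limit > 0);
    for (i = 0; i < pages; i++) {
        dirty++;
        if (dirty >= dirty_limit) {
            record_flush(&s, dirty);
            dirty = 0;
        }
    }
    if (dirty > 0)
        record_flush(&s, dirty);
    return s;
}
```

Writing 10000 pages with a limit of 1000 gives 10 bursts of 1000 pages; with a limit of 100 it gives 100 bursts of 100 pages. The total is 10000 pages in both cases, which is why I would expect smoother write-out rather than a faster large write.
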
Thanks,
-Andrea