From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Greg Thelen <gthelen@google.com>,
"bsingharora@gmail.com" <bsingharora@gmail.com>,
Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
Ying Han <yinghan@google.com>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
lsf-pc@lists.linux-foundation.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] memcg topics.
Date: Thu, 2 Feb 2012 19:31:15 +0800
Message-ID: <20120202113115.GA21994@localhost>
In-Reply-To: <20120202101525.GD31730@quack.suse.cz>
On Thu, Feb 02, 2012 at 11:15:25AM +0100, Jan Kara wrote:
> On Thu 02-02-12 14:33:45, Wu Fengguang wrote:
> > Hi Greg,
> >
> > On Wed, Feb 01, 2012 at 12:24:25PM -0800, Greg Thelen wrote:
> > > On Tue, Jan 31, 2012 at 4:55 PM, KAMEZAWA Hiroyuki
> > > <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > 4. dirty ratio
> > > >   In the last year, patches were posted but not merged. I'd like to hear
> > > >   works on this area.
> > >
> > > I would like to attend to discuss this topic. I have not had much time to work
> > > on this recently, but should be able to focus more on this soon. The
> > > IO less writeback changes require some redesign and may allow for a
> > > simpler implementation of mem_cgroup_balance_dirty_pages().
> > > Maintaining a per container dirty page counts, ratios, and limits is
> > > fairly easy, but integration with writeback is the challenge. My big
> > > questions are for writeback people:
> > > 1. how to compute per-container pause based on bdi bandwidth, cgroup
> > > dirty page usage.
> > > 2. how to ensure that writeback will engage even if system and bdi are
> > > below respective background dirty ratios, yet a memcg is above its bg
> > > dirty limit.
> >
> > The solution to (1,2) would be something like this:
> >
> > --- linux-next.orig/mm/page-writeback.c 2012-02-02 14:13:45.000000000 +0800
> > +++ linux-next/mm/page-writeback.c 2012-02-02 14:24:11.000000000 +0800
> > @@ -654,6 +654,17 @@ static unsigned long bdi_position_ratio(
> > pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> > pos_ratio += 1 << RATELIMIT_CALC_SHIFT;
> >
> > + if (memcg) {
> > + long long f;
> > + x = div_s64((memcg_setpoint - memcg_dirty) << RATELIMIT_CALC_SHIFT,
> > + memcg_limit - memcg_setpoint + 1);
> > + f = x;
> > + f = f * x >> RATELIMIT_CALC_SHIFT;
> > + f = f * x >> RATELIMIT_CALC_SHIFT;
> > + f += 1 << RATELIMIT_CALC_SHIFT;
> > + pos_ratio = pos_ratio * f >> RATELIMIT_CALC_SHIFT;
> > + }
> > +
> Hmm, so you multiply pos_ratio computed for global situation with
> pos_ratio computed for memcg situation, right? Why? My natural choice would
> be to just use memcg situation for computing pos_ratio since memcg is
> supposed to have less memory & stricter limits than root cgroup (global)...
Yeah, I also started by considering a standalone memcg pos_ratio.
However, the above form frees us from worrying about a misconfigured
memcg dirty limit exceeding the global dirty limit, or the even less
controllable case of the memcg dirty limit exceeding some bdi
threshold.
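A minimal userspace sketch of the arithmetic in the hunk above may help show why the product form is forgiving. It is not kernel code: cubic_factor(), combined_pos_ratio() and RATELIMIT_ONE are hypothetical names, RATELIMIT_CALC_SHIFT is assumed to be 10, and multiplication/division stand in for the patch's << and >> so that negative intermediates stay well defined in portable C.

```c
#include <assert.h>
#include <stdint.h>

#define RATELIMIT_CALC_SHIFT 10	/* assumed value for this sketch */
#define RATELIMIT_ONE ((int64_t)1 << RATELIMIT_CALC_SHIFT)	/* 1.0 in fixed point */

/*
 * 1.0 + ((setpoint - dirty) / (limit - setpoint + 1))^3 in fixed point.
 * This is the same cubic shape the hunk computes into "f" for the memcg,
 * and that the surrounding code computes into pos_ratio globally.
 */
static int64_t cubic_factor(int64_t setpoint, int64_t dirty, int64_t limit)
{
	int64_t x, f;

	assert(limit >= setpoint);	/* keeps the denominator positive */
	x = (setpoint - dirty) * RATELIMIT_ONE / (limit - setpoint + 1);
	f = x;
	f = f * x / RATELIMIT_ONE;
	f = f * x / RATELIMIT_ONE;
	f += RATELIMIT_ONE;
	return f;
}

/* Product of the global and memcg factors, as in the proposed hunk. */
static int64_t combined_pos_ratio(int64_t global_factor, int64_t memcg_factor)
{
	return global_factor * memcg_factor / RATELIMIT_ONE;
}
```

With these definitions, a global factor that has dropped to zero (global dirty pages at the limit) forces the combined ratio to zero no matter how generous the memcg factor is, which is the property that covers a memcg limit configured above the global one.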
> > /*
> > * We have computed basic pos_ratio above based on global situation. If
> > * the bdi is over/under its share of dirty pages, we want to scale
> > @@ -1202,6 +1213,8 @@ static void balance_dirty_pages(struct a
> > freerun = dirty_freerun_ceiling(dirty_thresh,
> > background_thresh);
> > if (nr_dirty <= freerun) {
> > + if (memcg && memcg_dirty > memcg_freerun)
> > + goto start_writeback;
> > current->dirty_paused_when = now;
> > current->nr_dirtied = 0;
> > current->nr_dirtied_pause =
> > @@ -1209,6 +1222,7 @@ static void balance_dirty_pages(struct a
> > break;
> > }
> >
> > +start_writeback:
> > if (unlikely(!writeback_in_progress(bdi)))
> > bdi_start_background_writeback(bdi);
> I guess this should better be coupled with memcg-aware writeback which
> was part of Greg's original patches if I remember right. That way we'd know
> we are making progress on the pages of the right cgroup. But we can
> certainly try this minimal change and see whether cgroups won't get starved
> too much...
Agreed. The complete solution would need more code from Greg to
teach the flusher to focus on the memcg inodes/pages.
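The control flow of the second hunk can be restated as a small standalone predicate. This is only a sketch under assumptions: freerun_check() and the bdp_action enum are hypothetical names, and the real balance_dirty_pages() does considerably more (resetting the task's pause state, computing the pause time) than this decision.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for the free-run test in the hunk. */
enum bdp_action {
	BDP_FREERUN,		/* reset pause bookkeeping and leave the loop */
	BDP_START_WRITEBACK,	/* fall through to the start_writeback label */
};

/*
 * Decide whether a dirtier may run free. Globally below the free-run
 * ceiling normally means "no throttling", but a memcg above its own
 * ceiling still jumps to start_writeback so the flusher gets kicked.
 */
static enum bdp_action freerun_check(unsigned long nr_dirty,
				     unsigned long freerun,
				     bool memcg,
				     unsigned long memcg_dirty,
				     unsigned long memcg_freerun)
{
	if (nr_dirty <= freerun) {
		if (memcg && memcg_dirty > memcg_freerun)
			return BDP_START_WRITEBACK;
		return BDP_FREERUN;
	}
	return BDP_START_WRITEBACK;
}
```

Note that the sketch only captures when background writeback gets started; it says nothing about which pages the flusher then writes, which is the part that needs the memcg-aware writeback discussed above.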
Thanks,
Fengguang