linux-fsdevel.vger.kernel.org archive mirror
From: Vivek Goyal <vgoyal@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Greg Thelen <gthelen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Chad Talbott <ctalbott@google.com>,
	Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH v6 8/9] memcg: check memcg dirty limits in page writeback
Date: Wed, 16 Mar 2011 14:07:07 -0400
Message-ID: <20110316180707.GD13562@redhat.com>
In-Reply-To: <20110316123514.GA4456@quack.suse.cz>

On Wed, Mar 16, 2011 at 01:35:14PM +0100, Jan Kara wrote:
> On Tue 15-03-11 19:35:26, Greg Thelen wrote:
> > On Tue, Mar 15, 2011 at 4:12 PM, Jan Kara <jack@suse.cz> wrote:
> > >  I found out I've already deleted the relevant email and thus have no good
> > > way to reply to it. So in the end I'll write it here: As Vivek pointed out,
> > > you try to introduce background writeback that honors per-cgroup limits but
> > > > the way you do it doesn't quite work. To avoid livelocking of the flusher
> > > > thread, any essentially unbounded work (and background writeback of a bdi,
> > > > or in your case of a cgroup's pages on the bdi, is in principle unbounded) has to
> > > give way to other work items in the queue (like a work submitted by
> > > sync(1)). Thus wb_writeback() stops for_background works if there are other
> > > works to do with the rationale that as soon as that work is finished, we
> > > may happily return to background cleaning (and that other work works for
> > > background cleaning as well anyway).
> > >
> > > But with your introduction of per-cgroup background writeback we are going
> > > > to lose the information about which cgroup we have to get below background
> > > limit. And if we stored the context somewhere and tried to return to it
> > > later, we'd have the above problems with livelocking and we'd have to
> > > really carefully handle cases where more cgroups actually want their limits
> > > observed.
> > >
> > > I'm not decided what would be a good solution for this. It seems that
> > > a flusher thread should check all cgroups whether they are not exceeding
> > > their background limit and if yes, do writeback. I'm not sure how practical
> > > that would be but possibly we could have a list of cgroups with exceeded
> > > limits and flusher thread could check that?
> > 
> > mem_cgroup_balance_dirty_pages() queues a bdi work item which already
> > includes a memcg that is available to wb_writeback() in '[PATCH v6
> > 9/9] memcg: make background writeback memcg aware'.  Background
> > writeback checks the given memcg usage vs memcg limit rather than
> > global usage vs global limit.
>   Yes.
> 
> > If we amend this to requeue an interrupted background work to the end
> > of the per-bdi work_list, then I think that would address the
> > livelocking issue.
>   Yes, that would work. But it would be nice (I'd find that cleaner design)
> if we could keep just one type of background work and make sure that it
> observes all the imposed memcg limits. For that we wouldn't explicitly
> pass memcg to the flusher thread but rather make over_bground_thresh()
> check all the memcg limits - or to make this more effective have some list
> of memcgs which crossed the background limit. What do you think?

A per-bdi list of memcgs which need writeback sounds interesting. It
would also let us keep track of additional state in the memory cgroup,
namely how much IO is in flight per memory cgroup on a bdi. One of
the additional things we wanted to do was to differentiate between
the write speeds of two buffered writers in two groups. The IO
controller at the end device can differentiate between the rates, but
that is only possible if the flusher threads submit enough IO from the
faster moving group and do not get stuck behind the slow group.

So if we can also do some accounting of in-flight IO per memcg per bdi,
then the flusher threads can skip the memcgs which have a lot of pending
IO. A lot of pending IO means the IO controller at the device is holding
back these requests and prioritizing some other group. The flusher
threads can then move on to other memcgs in the list and pick inodes
from those.
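
To make the skipping idea concrete, here is a rough userspace sketch,
not kernel code. The structure memcg_bdi_state, the MAX_INFLIGHT cutoff
and pick_memcg() are all hypothetical names invented for illustration;
nothing like them exists in the posted patches.

```c
#include <stddef.h>

/* Hypothetical per-memcg, per-bdi state (an assumption, not a kernel struct). */
struct memcg_bdi_state {
	const char *name;
	unsigned long nr_dirty;    /* dirty pages of this memcg on the bdi */
	unsigned long bg_thresh;   /* memcg background dirty threshold */
	unsigned long nr_inflight; /* pages submitted but not yet completed */
};

/* Assumed cutoff: above this the device is taken to be holding the group back. */
#define MAX_INFLIGHT 1024UL

/*
 * Walk the per-bdi list and return the first memcg that is over its
 * background threshold and does not already have too much IO in flight.
 * Returns NULL if no memcg qualifies.
 */
static struct memcg_bdi_state *
pick_memcg(struct memcg_bdi_state *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct memcg_bdi_state *s = &list[i];

		if (s->nr_dirty <= s->bg_thresh)
			continue;	/* under its limit, nothing to do */
		if (s->nr_inflight >= MAX_INFLIGHT)
			continue;	/* throttled at the device, skip it */
		return s;
	}
	return NULL;
}
```

With a slow group that has 2048 pages in flight and a fast group that
has 16, both over their thresholds, pick_memcg() returns the fast group,
so the flusher keeps feeding the group the IO controller is willing to
service.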

If there are per-memcg per-bdi structures, then there can be per-memcg
per-bdi wait lists too. A throttled task can sleep on those wait lists,
and one can keep a count of BDI_WRITTEN per memory cgroup and distribute
completions to its tasks. That way, even memory cgroup foreground
writeout becomes IO-less.
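
A rough userspace model of that completion distribution might look like
the following. It is only a sketch under assumptions: struct waiter,
struct memcg_bdi_wait and account_written() are hypothetical names, the
"wakeup" is just a flag rather than a real wait queue, and handing out
credit in FIFO order is one possible policy among several.

```c
#include <stddef.h>

/* A throttled task waiting for some of its memcg's pages to be written. */
struct waiter {
	unsigned long pages_left; /* completions still owed before wakeup */
	int woken;                /* set once the task may continue */
};

/* Hypothetical per-memcg, per-bdi wait state (an assumption). */
struct memcg_bdi_wait {
	unsigned long written;    /* per-memcg analogue of BDI_WRITTEN */
	struct waiter *waiters;   /* waiters in FIFO order */
	size_t nr_waiters;
};

/*
 * Account 'pages' completed writes for this memcg on this bdi and hand
 * the credit to sleeping tasks in FIFO order. Returns how many tasks
 * were woken by this batch of completions.
 */
static size_t account_written(struct memcg_bdi_wait *w, unsigned long pages)
{
	size_t woken = 0;

	w->written += pages;
	for (size_t i = 0; i < w->nr_waiters && pages > 0; i++) {
		struct waiter *t = &w->waiters[i];
		unsigned long credit;

		if (t->woken)
			continue;
		credit = pages < t->pages_left ? pages : t->pages_left;
		t->pages_left -= credit;
		pages -= credit;
		if (t->pages_left == 0) {
			t->woken = 1;	/* stand-in for a real wakeup */
			woken++;
		}
	}
	return woken;
}
```

The throttled tasks never submit IO themselves in this model; they only
wait for completions credited to their own memcg, which is the sense in
which foreground writeout becomes IO-less.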

Thanks
Vivek 


