From: Vivek Goyal <vgoyal@redhat.com>
To: Curt Wohlgemuth <curtw@google.com>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
Wu Fengguang <fengguang.wu@intel.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrew Morton <akpm@linux-foundation.org>,
Greg Thelen <gthelen@google.com>,
Christoph Hellwig <hch@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach)
Date: Thu, 17 Mar 2011 18:56:57 -0400 [thread overview]
Message-ID: <20110317225657.GI10482@redhat.com> (raw)
In-Reply-To: <AANLkTimwUrvyEJdF7s2XZCv4JaC_rsTA1Rg9u68xMs=O@mail.gmail.com>
On Thu, Mar 17, 2011 at 11:55:34AM -0700, Curt Wohlgemuth wrote:
> On Thu, Mar 17, 2011 at 10:32 AM, Jan Kara <jack@suse.cz> wrote:
> > On Thu 17-03-11 08:46:23, Curt Wohlgemuth wrote:
> >> On Tue, Mar 8, 2011 at 2:31 PM, Jan Kara <jack@suse.cz> wrote:
> >> The design of IO-less foreground throttling of writeback in the context of
> >> memory cgroups is being discussed in the memcg patch threads (e.g.,
> >> "[PATCH v6 0/9] memcg: per cgroup dirty page accounting"), but I've got
> >> another concern as well. And that's how restricting per-BDI writeback to a
> >> single task will affect proposed changes for tracking and accounting of
> >> buffered writes to the IO scheduler ("[RFC] [PATCH 0/6] Provide cgroup
> >> isolation for buffered writes", https://lkml.org/lkml/2011/3/8/332 ).
> >>
> >> It seems totally reasonable that reducing competition for write requests to
> >> a BDI -- by using the flusher thread to "handle" foreground writeout --
> >> would increase throughput to that device. At Google, we experimented with
> >> this in a hacked-up fashion several months ago (FG task would enqueue a work
> >> item and sleep for some period of time, wake up and see if it was below the
> >> dirty limit), and found that we were indeed getting better throughput.
> >>
> >> But if one of one's goals is to provide some sort of disk isolation based on
> >> cgroup parameters, then having at most one stream of write requests
> >> effectively neuters the IO scheduler. We saw that in practice, which led to
> >> abandoning our attempt at "IO-less throttling."
>
> > Let me check if I understand: The problem you have with one flusher
> > thread is that when written pages all belong to a single memcg, there is
> > nothing IO scheduler can prioritize, right?
>
> Correct. Well, perhaps. Given that the memory cgroups and the IO
> cgroups may not overlap, it's possible that write requests from a
> single memcg might be targeted to multiple IO cgroups, and scheduling
> priorities can be maintained. Of course, the other way round might be
> the case as well.
[CCing some folks who were involved in other mail thread]
I think that for the buffered write case it makes the most sense when the
memory controller and the IO controller are co-mounted and working with each
other. The reason is that for async writes we need to control both a cgroup's
share of dirty memory and the priority of its IO at the device level. It
would make no sense for a low-priority async group to be choked at the device
level while its footprint in the page cache keeps growing, choking other,
faster writers.
So we need to make sure that slow writers don't build up a huge page cache
footprint, and hence I think using the memory and IO controllers together
makes sense. Do you have use cases where it does not make sense?
>
> The point is just that from however many memcgs the flusher thread is
> working on behalf of, there's only a single stream of requests, which
> are *likely* for a single IO cgroup, and hence there's nothing to
> prioritize.
I think even a single submitter stream can make sense if the underlying
device/bdi is slow while the submitter is fast and switches frequently
between memory cgroups when selecting inodes.
We have IO control at the device level and an IO queue for each cgroup; if
the flusher thread can move quickly from one cgroup to the next (say, submit
512 pages from one cgroup and then move on), we should get the IO
differentiation automatically.
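To make the idea concrete, here is a hypothetical userspace sketch (not
kernel code; the function name, batch size handling, and data layout are
all illustrative) of a flusher that submits a fixed batch from each memory
cgroup in turn, so every cgroup's IO queue at the device stays populated:

```python
from collections import deque

BATCH = 512  # pages submitted per cgroup before moving to the next one

def flush_round_robin(cgroups):
    """cgroups: dict mapping cgroup name -> dirty pages pending writeback.
    Returns the sequence of (cgroup, pages) submissions the flusher makes."""
    queue = deque(cgroups.items())
    submissions = []
    while queue:
        name, dirty = queue.popleft()
        batch = min(BATCH, dirty)
        submissions.append((name, batch))
        dirty -= batch
        if dirty > 0:
            # Not fully flushed yet: go to the back of the line so other
            # cgroups get their IO submitted before we return to this one.
            queue.append((name, dirty))
    return submissions

subs = flush_round_robin({"fast": 1024, "slow": 600})
# -> [("fast", 512), ("slow", 512), ("fast", 512), ("slow", 88)]
```

The point of the interleaving is that the device-level scheduler sees
requests from both cgroups concurrently and can apply its weights, instead
of one cgroup's inodes being drained to completion first.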
In another mail I suggested that if we keep per-memory-cgroup, per-BDI
statistics on the number of writes in progress, the flusher thread can skip
submitting IO from cgroups that are slow and have many pending writebacks.
A high number of writes in flight, simply queued up at the IO scheduler, is
a hint that the scheduler is giving that cgroup lower priority. For
high-priority cgroups that are making progress, the number of pending
writebacks will be small or zero, and the flusher can submit more
inodes/pages from those memory cgroups. That way a higher-weight group
should get more IO done than a slower group.
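A minimal sketch of that hint, again purely hypothetical (the threshold,
function name, and stats layout are assumptions, not anything from the
patch set): the flusher prefers the cgroup with the fewest writebacks in
flight and skips any cgroup whose backlog at the IO scheduler has grown
past a limit.

```python
MAX_IN_FLIGHT = 32  # assumed per-cgroup backlog threshold, illustrative only

def pick_cgroup(stats):
    """stats: dict mapping cgroup name -> (dirty_pages, writebacks_in_flight).
    Returns the cgroup to submit from next, or None if every dirty cgroup
    already has too many requests queued at the IO scheduler."""
    candidates = [
        (in_flight, name)
        for name, (dirty, in_flight) in stats.items()
        if dirty > 0 and in_flight < MAX_IN_FLIGHT
    ]
    # Fewest writes in flight means the IO scheduler is completing this
    # cgroup's requests quickly, i.e. it is the higher-priority group.
    return min(candidates)[1] if candidates else None

nxt = pick_cgroup({"hi": (1000, 2), "lo": (1000, 40)})  # -> "hi"
```

Here "lo" has 40 requests simply sitting at the scheduler, so the flusher
stops feeding it and spends its submission bandwidth on "hi" instead.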
I am assuming that prioritizing async requests matters primarily for slow
media like a single SATA disk. If so, the flusher thread should be able to
submit pages much faster than the device can complete them, keeping the
per-cgroup IO queues at the device busy, and hence the IO scheduler should
be able to prioritize.
Thoughts?
Thanks
Vivek