All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Curt Wohlgemuth <curtw@google.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Wu Fengguang <fengguang.wu@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach)
Date: Thu, 17 Mar 2011 18:32:23 +0100	[thread overview]
Message-ID: <20110317173223.GG4116@quack.suse.cz> (raw)
In-Reply-To: <AANLkTimeH-hFiqtALfzyyrHiLz52qQj0gCisaJ-taCdq@mail.gmail.com>

On Thu 17-03-11 08:46:23, Curt Wohlgemuth wrote:
> On Tue, Mar 8, 2011 at 2:31 PM, Jan Kara <jack@suse.cz> wrote:
> The design of IO-less foreground throttling of writeback in the context of
> memory cgroups is being discussed in the memcg patch threads (e.g.,
> "[PATCH v6 0/9] memcg: per cgroup dirty page accounting"), but I've got
> another concern as well.  And that's how restricting per-BDI writeback to a
> single task will affect proposed changes for tracking and accounting of
> buffered writes to the IO scheduler ("[RFC] [PATCH 0/6] Provide cgroup
> isolation for buffered writes", https://lkml.org/lkml/2011/3/8/332 ).
> 
> It seems totally reasonable that reducing competition for write requests to
> a BDI -- by using the flusher thread to "handle" foreground writeout --
> would increase throughput to that device.  At Google, we experiemented with
> this in a hacked-up fashion several months ago (FG task would enqueue a work
> item and sleep for some period of time, wake up and see if it was below the
> dirty limit), and found that we were indeed getting better throughput.
> 
> But if one of one's goals is to provide some sort of disk isolation based on
> cgroup parameters, than having at most one stream of write requests
> effectively neuters the IO scheduler.  We saw that in practice, which led to
> abandoning our attempt at "IO-less throttling."
  Let me check if I understand: The problem you have with one flusher
thread is that when written pages all belong to a single memcg, there is
nothing IO scheduler can prioritize, right?

> One possible solution would be to put some of the disk isolation smarts into
> the writeback path, so the flusher thread could choose inodes with this as a
> criteria, but this seems ugly on its face, and makes my head hurt.
  Well, I think it could be implemented in a reasonable way but then you
still miss reads and direct IO from the mix so it will be a poor isolation.
But maybe we could propagate the information from IO scheduler to flusher
thread? If IO scheduler sees memcg has run out of its limit, it could hint
to a flusher thread that it should switch to an inode from a different memcg.
But still the details get nasty as I think about them (how to pick next
memcg, how to pick inodes,...). Essentially, we'd have to do with flusher
threads what old pdflush did when handling congested devices. Ugh.

> Otherwise, I'm having trouble thinking of a way to do effective isolation in
> the IO scheduler without having competing threads -- for different cgroups --
> making write requests for buffered data.  Perhaps the best we could do would
> be to enable IO-less throttling in writeback as a config option?
  Well, nothing prevents us to choose to do foreground writeback throttling
for memcgs and IO-less one without them but as Christoph writes, this
doesn't seem very compeling either... I'll let this brew in my head for
some time and maybe something comes.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Curt Wohlgemuth <curtw@google.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Wu Fengguang <fengguang.wu@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach)
Date: Thu, 17 Mar 2011 18:32:23 +0100	[thread overview]
Message-ID: <20110317173223.GG4116@quack.suse.cz> (raw)
In-Reply-To: <AANLkTimeH-hFiqtALfzyyrHiLz52qQj0gCisaJ-taCdq@mail.gmail.com>

On Thu 17-03-11 08:46:23, Curt Wohlgemuth wrote:
> On Tue, Mar 8, 2011 at 2:31 PM, Jan Kara <jack@suse.cz> wrote:
> The design of IO-less foreground throttling of writeback in the context of
> memory cgroups is being discussed in the memcg patch threads (e.g.,
> "[PATCH v6 0/9] memcg: per cgroup dirty page accounting"), but I've got
> another concern as well.  And that's how restricting per-BDI writeback to a
> single task will affect proposed changes for tracking and accounting of
> buffered writes to the IO scheduler ("[RFC] [PATCH 0/6] Provide cgroup
> isolation for buffered writes", https://lkml.org/lkml/2011/3/8/332 ).
> 
> It seems totally reasonable that reducing competition for write requests to
> a BDI -- by using the flusher thread to "handle" foreground writeout --
> would increase throughput to that device.  At Google, we experiemented with
> this in a hacked-up fashion several months ago (FG task would enqueue a work
> item and sleep for some period of time, wake up and see if it was below the
> dirty limit), and found that we were indeed getting better throughput.
> 
> But if one of one's goals is to provide some sort of disk isolation based on
> cgroup parameters, than having at most one stream of write requests
> effectively neuters the IO scheduler.  We saw that in practice, which led to
> abandoning our attempt at "IO-less throttling."
  Let me check if I understand: The problem you have with one flusher
thread is that when written pages all belong to a single memcg, there is
nothing IO scheduler can prioritize, right?

> One possible solution would be to put some of the disk isolation smarts into
> the writeback path, so the flusher thread could choose inodes with this as a
> criteria, but this seems ugly on its face, and makes my head hurt.
  Well, I think it could be implemented in a reasonable way but then you
still miss reads and direct IO from the mix so it will be a poor isolation.
But maybe we could propagate the information from IO scheduler to flusher
thread? If IO scheduler sees memcg has run out of its limit, it could hint
to a flusher thread that it should switch to an inode from a different memcg.
But still the details get nasty as I think about them (how to pick next
memcg, how to pick inodes,...). Essentially, we'd have to do with flusher
threads what old pdflush did when handling congested devices. Ugh.

> Otherwise, I'm having trouble thinking of a way to do effective isolation in
> the IO scheduler without having competing threads -- for different cgroups --
> making write requests for buffered data.  Perhaps the best we could do would
> be to enable IO-less throttling in writeback as a config option?
  Well, nothing prevents us to choose to do foreground writeback throttling
for memcgs and IO-less one without them but as Christoph writes, this
doesn't seem very compeling either... I'll let this brew in my head for
some time and maybe something comes.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2011-03-17 17:32 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-08 22:31 [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach) Jan Kara
2011-03-08 22:31 ` Jan Kara
2011-03-08 22:31 ` [PATCH 1/5] writeback: account per-bdi accumulated written pages Jan Kara
2011-03-08 22:31   ` Jan Kara
2011-03-08 22:31 ` [PATCH 2/5] mm: Properly reflect task dirty limits in dirty_exceeded logic Jan Kara
2011-03-08 22:31   ` Jan Kara
2011-03-09 21:02   ` Vivek Goyal
2011-03-14 20:44     ` Jan Kara
2011-03-14 20:44       ` Jan Kara
2011-03-15 15:21       ` Vivek Goyal
2011-03-08 22:31 ` [PATCH 3/5] mm: Implement IO-less balance_dirty_pages() Jan Kara
2011-03-08 22:31   ` Jan Kara
2011-03-10  0:07   ` Vivek Goyal
2011-03-14 20:48     ` Jan Kara
2011-03-14 20:48       ` Jan Kara
2011-03-15 15:23       ` Vivek Goyal
2011-03-16 21:26         ` Curt Wohlgemuth
2011-03-16 22:53           ` Curt Wohlgemuth
2011-03-16 22:53             ` Curt Wohlgemuth
2011-03-16 16:53   ` Vivek Goyal
2011-03-16 19:10     ` Jan Kara
2011-03-16 19:31       ` Vivek Goyal
2011-03-16 19:58         ` Jan Kara
2011-03-16 19:58           ` Jan Kara
2011-03-16 20:22           ` Vivek Goyal
2011-03-08 22:31 ` [PATCH 4/5] mm: Remove low limit from sync_writeback_pages() Jan Kara
2011-03-08 22:31 ` [PATCH 5/5] mm: Autotune interval between distribution of page completions Jan Kara
2011-03-08 22:31   ` Jan Kara
2011-03-17 15:46 ` [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach) Curt Wohlgemuth
2011-03-17 15:46   ` Curt Wohlgemuth
2011-03-17 15:51   ` Christoph Hellwig
2011-03-17 15:51     ` Christoph Hellwig
2011-03-17 16:24     ` Curt Wohlgemuth
2011-03-17 16:24       ` Curt Wohlgemuth
2011-03-17 16:43       ` Christoph Hellwig
2011-03-17 16:43         ` Christoph Hellwig
2011-03-17 17:32   ` Jan Kara [this message]
2011-03-17 17:32     ` Jan Kara
2011-03-17 18:55     ` Curt Wohlgemuth
2011-03-17 18:55       ` Curt Wohlgemuth
2011-03-17 22:56       ` Vivek Goyal
2011-03-17 22:56         ` Vivek Goyal
2011-03-18 14:30 ` Wu Fengguang
2011-03-18 14:30   ` Wu Fengguang
2011-03-22 21:43   ` Jan Kara
2011-03-22 21:43     ` Jan Kara
2011-03-23  4:41     ` Dave Chinner
2011-03-23  4:41       ` Dave Chinner
2011-03-25 12:59       ` Wu Fengguang
2011-03-25 12:59         ` Wu Fengguang
2011-03-25 13:44     ` Wu Fengguang
2011-03-25 23:05       ` Jan Kara
2011-03-25 23:05         ` Jan Kara
2011-03-28  2:44         ` Wu Fengguang
2011-03-28  2:44           ` Wu Fengguang
2011-03-28 15:08           ` Jan Kara
2011-03-28 15:08             ` Jan Kara
2011-03-29  1:44             ` Wu Fengguang
2011-03-29  1:44               ` Wu Fengguang
2011-03-29  2:14           ` Dave Chinner
2011-03-29  2:41             ` Wu Fengguang
2011-03-29  5:59               ` Dave Chinner
2011-03-29  5:59                 ` Dave Chinner
2011-03-29  7:31                 ` Wu Fengguang
2011-03-29  7:52                   ` Wu Fengguang
2011-03-29  7:52                     ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110317173223.GG4116@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=curtw@google.com \
    --cc=fengguang.wu@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.