From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave Chinner <david@fromorbit.com>
Subject: Re: IO less throttling and cgroup aware writeback (Was: Re: [Lsf]
 Preliminary Agenda and Activities for LSF)
Date: Fri, 1 Apr 2011 11:55:22 +1100
Message-ID: <20110401005522.GE2904@dastard>
References: <1301373398.2590.20.camel@mulgrave.site>
 <BANLkTinw+rOgtfP9WLXhAYydbLCZHgDGpw@mail.gmail.com>
 <20110330041802.GA20849@dastard>
 <20110330153757.GD1291@redhat.com>
 <20110330222002.GB20849@dastard>
 <20110331141637.GA11139@redhat.com>
 <1301581251-sup-987@think>
 <20110331221425.GB2904@dastard>
 <1301614603-sup-6349@think>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Chad Talbott <ctalbott@google.com>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	lsf <lsf@lists.linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
To: Chris Mason <chris.mason@oracle.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from ipmail04.adl6.internode.on.net ([150.101.137.141]:57098 "EHLO
	ipmail04.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752184Ab1DAAz1 (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Thu, 31 Mar 2011 20:55:27 -0400
Content-Disposition: inline
In-Reply-To: <1301614603-sup-6349@think>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Thu, Mar 31, 2011 at 07:43:27PM -0400, Chris Mason wrote:
> Excerpts from Dave Chinner's message of 2011-03-31 18:14:25 -0400:
> > On Thu, Mar 31, 2011 at 10:34:03AM -0400, Chris Mason wrote:
> > > Excerpts from Vivek Goyal's message of 2011-03-31 10:16:37 -0400:
> > > > On Thu, Mar 31, 2011 at 09:20:02AM +1100, Dave Chinner wrote:
> > > > > There are plans to move the bdi-flusher threads to work queues, and
> > > > > once that is done all your concerns about blocking and parallelism
> > > > > are pretty much gone because it's trivial to have multiple writeback
> > > > > works in progress at once on the same bdi with that infrastructure.
> > > > 
> > > > Will this essentially not nullify the advantage of IO less throttling?
> > > > I thought that we did not want to have multiple threads doing writeback
> > > > at the same time to avoid number of seeks and achieve better throughput.
> > > 
> > > Work queues alone are probably not appropriate, at least for spinning
> > > storage.  It will introduce seeks into what would have been
> > > sequential writes.  I had to make the btrfs worker thread pools after
> > > having a lot of trouble cramming writeback into work queues.
> > 
> > That was before the cmwq infrastructure, right? cmwq changes the
> > behaviour of workqueues in such a way that they can simply be
> > thought of as having a thread pool of a specific size....
> > 
> > As a strict translation of the existing one flusher thread per bdi,
> > then only allowing one work at a time to be issued (i.e. workqueue
> > concurrency of 1) would give the same behaviour without having all
> > the thread management issues. i.e. regardless of the writeback
> > parallelism mechanism we have the same issue of managing writeback
> > to minimise seeking. cmwq just makes the implementation far simpler,
> > IMO.
> > 
> > As to whether that causes seeks or not, that depends on how we are
> > driving the concurrent works/threads. If we drive a concurrent work
> > per dirty cgroup that needs writing back, then we achieve the
> > concurrency needed to make the IO scheduler appropriately throttle
> > the IO. For the case of no cgroups, then we still only have a single
> > writeback work in progress at a time and behaviour is no different
> > to the current setup. Hence I don't see any particular problem with
> > using workqueues to achieve the necessary writeback parallelism that
> > cgroup aware throttling requires....
> 
> Yes, as long as we aren't trying to shotgun style spread the
> inodes across a bunch of threads, it should work well enough.  The trick
> will just be making sure we don't end up with a lot of inode
> interleaving in the delalloc allocations.

That's a problem for any concurrent writeback mechanism as it passes
through the filesystem. It comes down to filesystems also needing to
have either concurrency- or cgroup-aware allocation mechanisms. It's
just another piece of the puzzle, really.

In the case of XFS, cgroup awareness could be as simple as
associating each cgroup with a specific allocation group and
keeping each cgroup as isolated as possible. There is precedent for
doing this in XFS - the filestreams allocator makes these sorts of
dynamic associations on a per-directory basis.
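
To make the idea concrete, here is a rough user-space toy model of
such an association. It is only a sketch and not XFS code: the class
CgroupAgMap, its methods, and the "least-shared AG" placement policy
are all hypothetical names and assumptions chosen for illustration.

```python
class CgroupAgMap:
    """Toy model of a sticky cgroup -> allocation group (AG) association.

    Hypothetical illustration only; nothing here mirrors real XFS code.
    """

    def __init__(self, ag_count):
        if ag_count < 1:
            raise ValueError("need at least one allocation group")
        self.ag_count = ag_count
        self.assoc = {}               # cgroup id -> AG number
        self.users = [0] * ag_count   # how many cgroups share each AG

    def ag_for(self, cgroup_id):
        """Return the AG for a cgroup, creating the association on first use."""
        ag = self.assoc.get(cgroup_id)
        if ag is None:
            # Assumed policy: pick the AG shared by the fewest cgroups so
            # that cgroups stay as isolated from each other as possible.
            # Ties go to the lowest-numbered AG.
            ag = min(range(self.ag_count), key=lambda i: self.users[i])
            self.assoc[cgroup_id] = ag
            self.users[ag] += 1
        return ag

    def release(self, cgroup_id):
        """Drop the association, e.g. once the cgroup has no dirty data left."""
        ag = self.assoc.pop(cgroup_id, None)
        if ag is not None:
            self.users[ag] -= 1
```

Allocations made on behalf of a cgroup would call ag_for() and keep
getting the same AG until release() is called. Once there are more
cgroups than AGs, some AGs are necessarily shared and the isolation
degrades accordingly.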

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com