From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Chinner
Subject: Re: IO less throttling and cgroup aware writeback (Was: Re: [Lsf] Preliminary Agenda and Activities for LSF)
Date: Fri, 1 Apr 2011 11:55:22 +1100
Message-ID: <20110401005522.GE2904@dastard>
References: <1301373398.2590.20.camel@mulgrave.site>
 <20110330041802.GA20849@dastard>
 <20110330153757.GD1291@redhat.com>
 <20110330222002.GB20849@dastard>
 <20110331141637.GA11139@redhat.com>
 <1301581251-sup-987@think>
 <20110331221425.GB2904@dastard>
 <1301614603-sup-6349@think>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Vivek Goyal, Chad Talbott, James Bottomley, lsf, linux-fsdevel
To: Chris Mason
Return-path:
Received: from ipmail04.adl6.internode.on.net ([150.101.137.141]:57098
 "EHLO ipmail04.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1752184Ab1DAAz1 (ORCPT );
 Thu, 31 Mar 2011 20:55:27 -0400
Content-Disposition: inline
In-Reply-To: <1301614603-sup-6349@think>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Thu, Mar 31, 2011 at 07:43:27PM -0400, Chris Mason wrote:
> Excerpts from Dave Chinner's message of 2011-03-31 18:14:25 -0400:
> > On Thu, Mar 31, 2011 at 10:34:03AM -0400, Chris Mason wrote:
> > > Excerpts from Vivek Goyal's message of 2011-03-31 10:16:37 -0400:
> > > > On Thu, Mar 31, 2011 at 09:20:02AM +1100, Dave Chinner wrote:
> > > > > There are plans to move the bdi-flusher threads to work queues, and
> > > > > once that is done all your concerns about blocking and parallelism
> > > > > are pretty much gone because it's trivial to have multiple writeback
> > > > > works in progress at once on the same bdi with that infrastructure.
> > > >
> > > > Will this essentially not nullify the advantage of IO less throttling?
> > > > I thought that we did not want to have multiple threads doing writeback
> > > > at the same time to avoid number of seeks and achieve better throughput.
> > >
> > > Work queues alone are probably not appropriate, at least for spinning
> > > storage. It will introduce seeks into what would have been
> > > sequential writes. I had to make the btrfs worker thread pools after
> > > having a lot of trouble cramming writeback into work queues.
> >
> > That was before the cmwq infrastructure, right? cmwq changes the
> > behaviour of workqueues in such a way that they can simply be
> > thought of as having a thread pool of a specific size....
> >
> > As a strict translation of the existing one flusher thread per bdi,
> > then only allowing one work at a time to be issued (i.e. workqueue
> > concurrency of 1) would give the same behaviour without having all
> > the thread management issues. i.e. regardless of the writeback
> > parallelism mechanism we have the same issue of managing writeback
> > to minimise seeking. cmwq just makes the implementation far simpler,
> > IMO.
> >
> > As to whether that causes seeks or not, that depends on how we are
> > driving the concurrent works/threads. If we drive a concurrent work
> > per dirty cgroup that needs writing back, then we achieve the
> > concurrency needed to make the IO scheduler appropriately throttle
> > the IO. For the case of no cgroups, then we still only have a single
> > writeback work in progress at a time and behaviour is no different
> > to the current setup. Hence I don't see any particular problem with
> > using workqueues to achieve the necessary writeback parallelism that
> > cgroup aware throttling requires....
>
> Yes, as long as we aren't trying to shotgun style spread the
> inodes across a bunch of threads, it should work well enough. The trick
> will just be making sure we don't end up with a lot of inode
> interleaving in the delalloc allocations.

That's a problem for any concurrent writeback mechanism as it
passes through the filesystem. It comes down to filesystems also
needing to have either concurrency- or cgroup-aware allocation
mechanisms.
It's just another piece of the puzzle, really.

In the case of XFS, cgroup awareness could be as simple as
associating each cgroup with a specific allocation group and
keeping each cgroup as isolated as possible. There is precedent
for doing this in XFS - the filestreams allocator makes these
sorts of dynamic associations on a per-directory basis.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com