From: Dave Chinner <david@fromorbit.com>
Subject: Re: IO less throttling and cgroup aware writeback (Was: Re: [Lsf] Preliminary Agenda and Activities for LSF)
Date: Thu, 31 Mar 2011 14:00:33 +1100
Message-ID: <20110331030033.GA30279@dastard>
References: <1301373398.2590.20.camel@mulgrave.site> <20110330041802.GA20849@dastard> <20110330153757.GD1291@redhat.com> <20110330222002.GB20849@dastard>
To: Chad Talbott
Cc: Vivek Goyal, James Bottomley, lsf@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org

On Wed, Mar 30, 2011 at 03:49:17PM -0700, Chad Talbott wrote:
> On Wed, Mar 30, 2011 at 3:20 PM, Dave Chinner wrote:
> > On Wed, Mar 30, 2011 at 11:37:57AM -0400, Vivek Goyal wrote:
> >> We are planning to track the IO context of the original submitter of IO
> >> by storing that information in page_cgroup. So that is not the problem.
> >>
> >> The problem the Google guys are trying to raise is whether a single flusher
> >> thread can keep all the groups on a bdi busy in such a way that a higher
> >> prio group can get more IO done.
> >
> > Which has nothing to do with IO-less dirty throttling at all!
>
> Not quite. Pre IO-less dirty throttling, any thread which was
> dirtying did the writeback itself. Because there's no shortage of
> threads to do the work, the IO scheduler sees a bunch of threads doing
> writes against a given BDI and schedules them against each other.
> This is how async IO isolation works for us.

And it's precisely this behaviour that makes foreground throttling a
scalability limitation, both from a list/lock contention POV and from
an IO optimisation POV.

> >> So the concern they raised is whether a single flusher thread per device
> >> is enough to keep the faster cgroup full at the bdi and hence get the
> >> service differentiation.
> >
> > I think there's much bigger problems than that.
>
> We seem to be agreeing that it's a complicated problem. That's why I
> think async write isolation needs some design-level discussion.

From my perspective, we've still got a significant amount of work to
get writeback into a scalable form for current generation machines,
let alone future machines. Fixing the writeback code is a slow
process because of all the subtle interactions with different
filesystems and different workloads, which is made more complex by the
fact that many filesystems implement their own writeback paths and
have their own writeback semantics.

We need to make the right decision on what IO to issue, not just
issue lots of IO and hope it all turns out OK in the end. If we can't
get that decision matrix right for the simple case of a global
context, then we have no hope of extending it to cgroup-aware
writeback.

IOWs, we need to get writeback working in a scalable manner before we
complicate it immensely with all this cgroup and isolation madness.
Hence I think trying to make writeback cgroup-aware is probably 6-12
months premature at this point and trying to do it now will only
serve to make it harder to get the common, simple cases working as we
desire them to...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
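
To make the page_cgroup idea quoted above concrete, the following is a
minimal user-space sketch of the approach Vivek describes: record the
cgroup of the task that dirtied each page, so that a single flusher
thread can later attribute the resulting writeback IO to the original
submitter. All names here (page_tag, account_page_dirtied,
flusher_writeback, the cgroup ids) are hypothetical illustrations, not
the kernel's actual page_cgroup or writeback API.

    /*
     * Illustrative sketch only -- not kernel code. It models the idea of
     * tagging each dirtied page with the dirtier's cgroup so one flusher
     * thread can issue IO on behalf of many cgroups.
     */
    #include <stdio.h>

    #define NR_PAGES 8

    struct page_tag {
            int dirty;              /* page has unwritten data          */
            int cgroup_id;          /* cgroup of the task that dirtied it */
    };

    static struct page_tag pages[NR_PAGES];

    /* Called from the (hypothetical) dirtying path, e.g. a buffered write. */
    static void account_page_dirtied(int page, int cgroup_id)
    {
            pages[page].dirty = 1;
            pages[page].cgroup_id = cgroup_id;
    }

    /*
     * A single flusher walks the dirty pages and issues the IO tagged with
     * the original submitter's cgroup rather than its own.
     */
    static void flusher_writeback(void)
    {
            for (int i = 0; i < NR_PAGES; i++) {
                    if (!pages[i].dirty)
                            continue;
                    printf("writeback page %d on behalf of cgroup %d\n",
                           i, pages[i].cgroup_id);
                    pages[i].dirty = 0;
            }
    }

    int main(void)
    {
            account_page_dirtied(0, 1);     /* task in cgroup 1 dirties page 0 */
            account_page_dirtied(3, 2);     /* task in cgroup 2 dirties page 3 */
            flusher_writeback();            /* one thread flushes for both     */
            return 0;
    }

What the sketch deliberately leaves out is the part actually under
dispute in the thread: how a single flusher decides which cgroup's
pages to write first so that a higher-priority group really does see
more IO done.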