From: Chad Talbott
Subject: Re: IO less throttling and cgroup aware writeback (Was: Re: [Lsf] Preliminary Agenda and Activities for LSF)
Date: Wed, 30 Mar 2011 15:49:17 -0700
To: Dave Chinner
Cc: Vivek Goyal, James Bottomley, lsf@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org

On Wed, Mar 30, 2011 at 3:20 PM, Dave Chinner wrote:
> On Wed, Mar 30, 2011 at 11:37:57AM -0400, Vivek Goyal wrote:
>> We are planning to track the IO context of the original submitter of
>> IO by storing that information in page_cgroup, so that is not the
>> problem.
>>
>> The problem the Google guys are trying to raise is whether a single
>> flusher thread can keep all the groups on a bdi busy in such a way
>> that a higher-prio group can get more IO done.
>
> Which has nothing to do with IO-less dirty throttling at all!

Not quite. Before IO-less dirty throttling, any thread that was
dirtying pages did the writeback itself.
Because there's no shortage of threads to do the work, the IO scheduler
sees a bunch of threads doing writes against a given BDI and schedules
them against each other. That is how async IO isolation works for us
today.

>> It should not happen that the flusher thread gets blocked somewhere
>> (trying to get request descriptors on the request queue)
>
> A major design principle of the bdi-flusher threads is that they
> are supposed to block when the request queue gets full - that's how
> we got rid of all the congestion garbage from the writeback
> stack.

With IO cgroups and async write isolation, there are multiple queues
per disk, and all of them need to be filled for cgroup-aware CFQ to
schedule between them. If the per-BDI threads could be taught to fill
each per-cgroup queue before giving up on a BDI, then IO-less
throttling could work. Alternatively, per-(BDI, blkio cgroup) flusher
threads would also work. I think it's complicated enough to warrant a
discussion.

> There are plans to move the bdi-flusher threads to work queues, and
> once that is done all your concerns about blocking and parallelism
> are pretty much gone because it's trivial to have multiple writeback
> works in progress at once on the same bdi with that infrastructure.

This sounds promising.

>> So the concern they raised is whether a single flusher thread per
>> device is enough to keep a faster cgroup full at the bdi and hence
>> get the service differentiation.
>
> I think there's much bigger problems than that.

We seem to be agreeing that it's a complicated problem. That's why I
think async write isolation needs some design-level discussion.

Chad