From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pavel Emelyanov
Subject: Re: [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages
Date: Fri, 27 Jul 2012 08:01:00 +0400
Message-ID: <5012127C.8070203@parallels.com>
References: <4FF3156E.8030109@parallels.com>
 <4FF3166B.5090800@parallels.com>
 <87obnj38g9.fsf@tucsk.pomaz.szeredi.hu>
 <50038A3B.4090405@parallels.com>
 <87wr22b3tn.fsf@tucsk.pomaz.szeredi.hu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: "fuse-devel@lists.sourceforge.net", Alexander Viro, linux-fsdevel,
 James Bottomley, Kirill Korotaev
To: Miklos Szeredi
Return-path:
Received: from mailhub.sw.ru ([195.214.232.25]:3029 "EHLO relay.sw.ru"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1750756Ab2G0EBP (ORCPT ); Fri, 27 Jul 2012 00:01:15 -0400
In-Reply-To: <87wr22b3tn.fsf@tucsk.pomaz.szeredi.hu>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 07/17/2012 11:11 PM, Miklos Szeredi wrote:
> Pavel Emelyanov writes:

Miklos, sorry for the late response. Please find the answers inline.

>> On 07/13/2012 08:57 PM, Miklos Szeredi wrote:
>>> Pavel Emelyanov writes:
>>>
>>>> Make balance_dirty_pages start the throttling when the WRITEBACK_TEMP
>>>> counter is high enough. This prevents us from having too many dirty
>>>> pages on fuse, thus giving the userspace part of it a chance to write
>>>> stuff properly.
>>>>
>>>> Note, that the existing balance logic is per-bdi, i.e. if the fuse
>>>> user task gets stuck in the function this means, that it either
>>>> writes to the mountpoint it serves (but it can deadlock even without
>>>> the writeback) or it is writing to some _other_ dirty bdi and in the
>>>> latter case someone else will free the memory for it.
>>>
>>> This is not just about deadlocking. Unprivileged fuse filesystems
>>> should not impact the operation of other filesystems. I.e.
>>> a fuse filesystem which is not making progress writing out pages
>>> shouldn't cause a write on an unrelated filesystem to block.
>>>
>>> I believe this patch breaks that promise.
>>
>> Hm... I believe it does not, and here's why.
>>
>> When a task writes to some bdi the balance_dirty_pages will evaluate the
>> amount of time to block this task for, based on this bdi's dirty set
>> counters. The global stats are only used to a) check whether this
>> decision should be made at all
>
> Okay, maybe I'm blind but if this is true, then how is
> balance_dirty_pages() supposed to ensure that the per-bdi limit is not
> exceeded?

The balance_dirty_pages logic is _very_ roughly the following:

Let this_bdi be the bdi the current task is writing to
Let D be the total amount of dirty and writeback memory (and writeback_tmp
    after this patch)
Let L be the limit of dirty memory (L = ram_size * ratio)
Let d be the amount of dirty and writeback memory on this_bdi
And let l be the limit of dirty memory on this_bdi

With that, the balancer logic looks like this:

	while (1) {
		if (D < L)
			return;

		start_background_writeback(this_bdi);

		if (d < l)
			return;

		timeout = get_sleep_timeout(d, l, D, L);
		schedule_timeout(timeout);
	}

The d and l are calculated out of D and L using the this_bdi and global IO
completion proportions (with more complexity, but still). Thus, since we
throttle tasks by looking at d and l only, we cannot affect all the bdis in
the system by live-locking a single one of them.

Accounting for writeback_tmp is required since D should become high when
there are lots of pages in flight in FUSE. Otherwise, balance_dirty_pages
will not limit the task writing to a fuse mount.

>> and b) evaluate the dirty "fraction" of a bdi. That said, even
>> if we stop the fuse daemon (I actually did this) other filesystems won't
>> lock. The global counter would be high, yes, but the dirty set fraction of
>> non-fuse bdi would be low thus allowing others to progress.
>
> That makes some sense, but it looks to me that FUSE, NFS and friends
> want a stricter dirty balancing logic that looks at the bdi thresholds
> even if the global limits are not exceeded.

Probably, but I did a very straightforward test -- I just stopped the fuse
daemon and started writing to a fuse file. After some time the writing task
was blocked in balance_dirty_pages, since the fuse daemon didn't ack the
writeback. At the same time I tried to write to other bdis (disks and nfs)
and none of them was blocked; all the writes succeeded. After I let the fuse
daemon run again, the fuse writer unblocked and went on writing.

Do you have some trickier scenario in mind?

> Thanks,
> Miklos
> .
>

Thanks,
Pavel
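
For illustration, the two comparisons in the balancer loop can be modelled
in user space roughly as below. This is only a sketch, not the kernel code:
would_throttle() and its parameter names are made up here, the per-bdi limit
is passed in rather than derived from the IO completion proportions, and the
sleep itself is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the check in the balancer loop.
 * Not the kernel's balance_dirty_pages(); all names are illustrative.
 *
 *   bdi_dirty    - d: dirty + writeback pages on the bdi being written to
 *   bdi_limit    - l: that bdi's share of the global limit
 *   global_dirty - D: dirty + writeback (+ writeback_tmp) pages system-wide
 *   global_limit - L: the global dirty limit
 *
 * Returns true if a task writing to the bdi would be put to sleep.
 */
bool would_throttle(unsigned long bdi_dirty, unsigned long bdi_limit,
		    unsigned long global_dirty, unsigned long global_limit)
{
	/* Precondition of this model only: a zero global limit is meaningless. */
	assert(global_limit > 0);

	if (global_dirty < global_limit)
		return false;	/* D < L: nobody is throttled */

	if (bdi_dirty < bdi_limit)
		return false;	/* d < l: this bdi may go on writing */

	return true;		/* d >= l: sleep, then re-check */
}
```

With made-up numbers, say L = 1000 pages, a stalled fuse bdi at d = 900,
l = 500 and a disk at d = 100, l = 500, so D = 1000:
would_throttle(900, 500, 1000, 1000) is true, while
would_throttle(100, 500, 1000, 1000) is false. If the 900 in-flight fuse
pages were not counted into D, D would be 100 and
would_throttle(900, 500, 100, 1000) would be false as well, i.e. the fuse
writer would not be limited at all.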