linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Emelyanov <xemul@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: "fuse-devel@lists.sourceforge.net"
	<fuse-devel@lists.sourceforge.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	James Bottomley <jbottomley@parallels.com>,
	Kirill Korotaev <dev@parallels.com>
Subject: Re: [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages
Date: Fri, 27 Jul 2012 08:01:00 +0400	[thread overview]
Message-ID: <5012127C.8070203@parallels.com> (raw)
In-Reply-To: <87wr22b3tn.fsf@tucsk.pomaz.szeredi.hu>

On 07/17/2012 11:11 PM, Miklos Szeredi wrote:
> Pavel Emelyanov <xemul@parallels.com> writes:

Miklos, sorry for the late response. Please, find the answers inline.

>> On 07/13/2012 08:57 PM, Miklos Szeredi wrote:
>>> Pavel Emelyanov <xemul@parallels.com> writes:
>>>
>>>> Make balance_dirty_pages start the throttling when the WRITEBACK_TEMP
>>>> counter is hight ehough. This prevents us from having too many dirty
>>>> pages on fuse, thus giving the userspace part of it a chance to write
>>>> stuff properly.
>>>>
>>>> Note, that the existing balance logic is per-bdi, i.e. if the fuse
>>>> user task gets stuck in the function this means, that it either
>>>> writes to the mountpoint it serves (but it can deadlock even without
>>>> the writeback) or it is wrting to some _other_ dirty bdi and in the
>>>> latter case someone else will free the memory for it.
>>>
>>> This is not just about deadlocking.  Unprivileged fuse filesystems
>>> should not impact the operation of other filesystems.  I.e. a fuse
>>> filesystem which is not making progress writing out pages shouln't cause
>>> a write on an unrelated filesystem to block.
>>>
>>> I believe this patch breaks that promise.
>>
>> Hm... I believe it does not, and that's why.
>>
>> When a task writes to some bdi the balance_dirty_pages will evaluate the
>> amount of time to block this task on based on this bdi dirty set counters.
>> The global stats are only used to a) check whether this decision should be
>> made at all
> 
> Okay, maybe I'm blind but if this is true, then how is
> balance_dirty_pages() supposed to ensure that the per-bdi limit is not
> exceeded?

The balance_dirty_pages logic is _very_ roughly the the following:

Let this_bdi be a bdi the current task is writing to
Let D be the total amount of dirty and writeback memory (and writeback_tmp after this patch)
Let L be the limit of dirty memory (L = ram_size * ratio)
Let d be the amount of dirty and writeback on this_bdi
And let l be the limit of dirty memory on this_bdi

With that the balancer logic look like

while (1) {
	if (D < L)
		return;

	start_background_writeback(this_bdi);

	if (d < l)
		return;

	timeout = get_sleep_timeout(d, l, D, L);
	shcedule_timeout(timeout);
}

The d and l are calculated out of the D and L using this_bdi and global IO completions
proportions (with more complexity, but still).

Thus, since we throttle tasks looking ad d and l only we cannot affect all the bdis in
the system by live-locking a single one of them.

Accounting for writeback_tmp is required since the D should become high when there are
lots of pages in-flight in FUSE. Otherwise, the balance_dirty_pages will not limit the
task writing on a fuse mount.

>> and b) evaluate the dirty "fraction" of a bdi. That said, even
>> if we stop the fuse daemon (I actually did this) other filesystems won't
>> lock. The global counter would be high, yes, but the dirty set fraction of 
>> non-fuse bdi would be low thus allowing others to progress.
> 
> That makes some sense, but it looks to me that FUSE, NFS and friends
> want a stricter dirty balancing logic that looks at the bdi thresholds
> even if the global limits are not exceeded.

Probably, but I did a very straighforward test -- I just stopped the fuse daemon and started
writing to a fuse file. After some time the writing task was locked in balance_dirty_pages,
since fuse daemon didn't ack-ed writeback. At the same time I tried to write to other bdis
(disks and nfs) and none of them was locked, all the writes succeeded. After I let the fuse
daemon run again the fuse-writer unlocked and went on writing.

Do you have some trickier scenario in mind?

> Thanks,
> Miklos
> .
> 

Thanks,
Pavel

  reply	other threads:[~2012-07-27  4:01 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-03 15:53 [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Pavel Emelyanov
2012-07-03 15:53 ` [PATCH 1/10] fuse: Linking file to inode helper Pavel Emelyanov
2012-07-03 15:54 ` [PATCH 2/10] fuse: Getting file for writeback helper Pavel Emelyanov
2012-07-03 15:54 ` [PATCH 3/10] fuse: Prepare to handle short reads Pavel Emelyanov
2012-07-03 15:55 ` [PATCH 4/10] fuse: Prepare to handle multiple pages in writeback Pavel Emelyanov
2012-07-04 13:06   ` Miklos Szeredi
2012-07-04 14:26     ` Pavel Emelyanov
2012-07-03 15:55 ` [PATCH 6/10] fuse: Trust kernel i_size only Pavel Emelyanov
2012-07-04 14:39   ` Miklos Szeredi
2012-07-05 14:10     ` Pavel Emelyanov
2012-07-10  5:53     ` Pavel Emelyanov
2012-07-13 16:30       ` Miklos Szeredi
2012-07-16  3:32         ` Pavel Emelyanov
2012-07-17 15:17           ` Miklos Szeredi
     [not found]     ` <8762a3pp3m.fsf-d8RdFUjzFsbxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2012-10-01 17:30       ` Maxim V. Patlasov
2012-11-16  9:49         ` Miklos Szeredi
2012-11-16 10:32           ` Maxim V. Patlasov
     [not found] ` <4FF3156E.8030109-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-03 15:55   ` [PATCH 5/10] fuse: Connection bit for enabling writeback Pavel Emelyanov
2012-07-03 15:56   ` [PATCH 7/10] fuse: Flush files on wb close Pavel Emelyanov
2012-07-03 15:56   ` [PATCH 8/10] fuse: Implement writepages and write_begin/write_end callbacks Pavel Emelyanov
2012-07-03 15:57   ` [PATCH 9/10] fuse: Turn writeback on Pavel Emelyanov
2012-07-04  3:01   ` [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Nikolaus Rath
     [not found]     ` <87a9zg1b7q.fsf-sKB8Sp2ER+yL2G7IJ6k9tw@public.gmane.org>
2012-07-04  7:11       ` Pavel Emelyanov
2012-07-04 13:22         ` Nikolaus Rath
     [not found]           ` <4FF4438B.8050807-BTH8mxji4b0@public.gmane.org>
2012-07-04 14:33             ` Pavel Emelyanov
     [not found]               ` <4FF45447.5000705-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-04 17:08                 ` Nikolaus Rath
2012-07-05  9:01                   ` Pavel Emelyanov
2012-07-05 13:07         ` Nikolaus Rath
2012-07-05 14:08           ` Pavel Emelyanov
2012-07-05 14:29             ` Nikolaus Rath
2012-07-05 14:34               ` Pavel Emelyanov
2012-07-06  2:04                 ` Nikolaus Rath
     [not found]                   ` <8762a1odbf.fsf-sKB8Sp2ER+yL2G7IJ6k9tw@public.gmane.org>
2012-07-06  8:46                     ` Pavel Emelyanov
2012-07-05 19:31   ` Anand Avati
     [not found]     ` <4FF5EB85.1050701-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-07-05 20:07       ` Pavel Emelyanov
2012-07-06 11:52         ` [fuse-devel] " Kirill Korotaev
2012-07-03 15:57 ` [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages Pavel Emelyanov
     [not found]   ` <4FF3166B.5090800-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-13 16:57     ` Miklos Szeredi
2012-07-16  3:27       ` Pavel Emelyanov
2012-07-17 19:11         ` Miklos Szeredi
2012-07-27  4:01           ` Pavel Emelyanov [this message]
     [not found]             ` <5012127C.8070203-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-08-07 17:30               ` Miklos Szeredi
2012-07-05 19:26 ` [fuse-devel] [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Anand Avati

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5012127C.8070203@parallels.com \
    --to=xemul@parallels.com \
    --cc=dev@parallels.com \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=jbottomley@parallels.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).