All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Emelyanov <xemul@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: "fuse-devel@lists.sourceforge.net"
	<fuse-devel@lists.sourceforge.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	James Bottomley <jbottomley@parallels.com>,
	Kirill Korotaev <dev@parallels.com>
Subject: Re: [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages
Date: Fri, 27 Jul 2012 08:01:00 +0400	[thread overview]
Message-ID: <5012127C.8070203@parallels.com> (raw)
In-Reply-To: <87wr22b3tn.fsf@tucsk.pomaz.szeredi.hu>

On 07/17/2012 11:11 PM, Miklos Szeredi wrote:
> Pavel Emelyanov <xemul@parallels.com> writes:

Miklos, sorry for the late response. Please, find the answers inline.

>> On 07/13/2012 08:57 PM, Miklos Szeredi wrote:
>>> Pavel Emelyanov <xemul@parallels.com> writes:
>>>
>>>> Make balance_dirty_pages start the throttling when the WRITEBACK_TEMP
>>>> counter is hight ehough. This prevents us from having too many dirty
>>>> pages on fuse, thus giving the userspace part of it a chance to write
>>>> stuff properly.
>>>>
>>>> Note, that the existing balance logic is per-bdi, i.e. if the fuse
>>>> user task gets stuck in the function this means, that it either
>>>> writes to the mountpoint it serves (but it can deadlock even without
>>>> the writeback) or it is wrting to some _other_ dirty bdi and in the
>>>> latter case someone else will free the memory for it.
>>>
>>> This is not just about deadlocking.  Unprivileged fuse filesystems
>>> should not impact the operation of other filesystems.  I.e. a fuse
>>> filesystem which is not making progress writing out pages shouln't cause
>>> a write on an unrelated filesystem to block.
>>>
>>> I believe this patch breaks that promise.
>>
>> Hm... I believe it does not, and that's why.
>>
>> When a task writes to some bdi the balance_dirty_pages will evaluate the
>> amount of time to block this task on based on this bdi dirty set counters.
>> The global stats are only used to a) check whether this decision should be
>> made at all
> 
> Okay, maybe I'm blind but if this is true, then how is
> balance_dirty_pages() supposed to ensure that the per-bdi limit is not
> exceeded?

The balance_dirty_pages logic is _very_ roughly the the following:

Let this_bdi be a bdi the current task is writing to
Let D be the total amount of dirty and writeback memory (and writeback_tmp after this patch)
Let L be the limit of dirty memory (L = ram_size * ratio)
Let d be the amount of dirty and writeback on this_bdi
And let l be the limit of dirty memory on this_bdi

With that the balancer logic look like

while (1) {
	if (D < L)
		return;

	start_background_writeback(this_bdi);

	if (d < l)
		return;

	timeout = get_sleep_timeout(d, l, D, L);
	shcedule_timeout(timeout);
}

The d and l are calculated out of the D and L using this_bdi and global IO completions
proportions (with more complexity, but still).

Thus, since we throttle tasks looking ad d and l only we cannot affect all the bdis in
the system by live-locking a single one of them.

Accounting for writeback_tmp is required since the D should become high when there are
lots of pages in-flight in FUSE. Otherwise, the balance_dirty_pages will not limit the
task writing on a fuse mount.

>> and b) evaluate the dirty "fraction" of a bdi. That said, even
>> if we stop the fuse daemon (I actually did this) other filesystems won't
>> lock. The global counter would be high, yes, but the dirty set fraction of 
>> non-fuse bdi would be low thus allowing others to progress.
> 
> That makes some sense, but it looks to me that FUSE, NFS and friends
> want a stricter dirty balancing logic that looks at the bdi thresholds
> even if the global limits are not exceeded.

Probably, but I did a very straighforward test -- I just stopped the fuse daemon and started
writing to a fuse file. After some time the writing task was locked in balance_dirty_pages,
since fuse daemon didn't ack-ed writeback. At the same time I tried to write to other bdis
(disks and nfs) and none of them was locked, all the writes succeeded. After I let the fuse
daemon run again the fuse-writer unlocked and went on writing.

Do you have some trickier scenario in mind?

> Thanks,
> Miklos
> .
> 

Thanks,
Pavel

  reply	other threads:[~2012-07-27  4:01 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-03 15:53 [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Pavel Emelyanov
2012-07-03 15:53 ` [PATCH 1/10] fuse: Linking file to inode helper Pavel Emelyanov
2012-07-03 15:54 ` [PATCH 2/10] fuse: Getting file for writeback helper Pavel Emelyanov
2012-07-03 15:54 ` [PATCH 3/10] fuse: Prepare to handle short reads Pavel Emelyanov
2012-07-03 15:55 ` [PATCH 4/10] fuse: Prepare to handle multiple pages in writeback Pavel Emelyanov
2012-07-04 13:06   ` Miklos Szeredi
2012-07-04 14:26     ` Pavel Emelyanov
2012-07-03 15:55 ` [PATCH 6/10] fuse: Trust kernel i_size only Pavel Emelyanov
2012-07-04 14:39   ` Miklos Szeredi
2012-07-05 14:10     ` Pavel Emelyanov
2012-07-10  5:53     ` Pavel Emelyanov
2012-07-13 16:30       ` Miklos Szeredi
2012-07-16  3:32         ` Pavel Emelyanov
2012-07-17 15:17           ` Miklos Szeredi
     [not found]     ` <8762a3pp3m.fsf-d8RdFUjzFsbxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2012-10-01 17:30       ` Maxim V. Patlasov
2012-11-16  9:49         ` Miklos Szeredi
2012-11-16 10:32           ` Maxim V. Patlasov
     [not found] ` <4FF3156E.8030109-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-03 15:55   ` [PATCH 5/10] fuse: Connection bit for enabling writeback Pavel Emelyanov
2012-07-03 15:56   ` [PATCH 7/10] fuse: Flush files on wb close Pavel Emelyanov
2012-07-03 15:56   ` [PATCH 8/10] fuse: Implement writepages and write_begin/write_end callbacks Pavel Emelyanov
2012-07-03 15:57   ` [PATCH 9/10] fuse: Turn writeback on Pavel Emelyanov
2012-07-04  3:01   ` [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Nikolaus Rath
     [not found]     ` <87a9zg1b7q.fsf-sKB8Sp2ER+yL2G7IJ6k9tw@public.gmane.org>
2012-07-04  7:11       ` Pavel Emelyanov
2012-07-04 13:22         ` Nikolaus Rath
     [not found]           ` <4FF4438B.8050807-BTH8mxji4b0@public.gmane.org>
2012-07-04 14:33             ` Pavel Emelyanov
     [not found]               ` <4FF45447.5000705-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-04 17:08                 ` Nikolaus Rath
2012-07-05  9:01                   ` Pavel Emelyanov
2012-07-05 13:07         ` Nikolaus Rath
2012-07-05 14:08           ` Pavel Emelyanov
2012-07-05 14:29             ` Nikolaus Rath
2012-07-05 14:34               ` Pavel Emelyanov
2012-07-06  2:04                 ` Nikolaus Rath
     [not found]                   ` <8762a1odbf.fsf-sKB8Sp2ER+yL2G7IJ6k9tw@public.gmane.org>
2012-07-06  8:46                     ` Pavel Emelyanov
2012-07-05 19:31   ` Anand Avati
     [not found]     ` <4FF5EB85.1050701-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-07-05 20:07       ` Pavel Emelyanov
2012-07-06 11:52         ` [fuse-devel] " Kirill Korotaev
2012-07-03 15:57 ` [PATCH 10/10] mm: Account for WRITEBACK_TEMP in balance_dirty_pages Pavel Emelyanov
     [not found]   ` <4FF3166B.5090800-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-07-13 16:57     ` Miklos Szeredi
2012-07-16  3:27       ` Pavel Emelyanov
2012-07-17 19:11         ` Miklos Szeredi
2012-07-27  4:01           ` Pavel Emelyanov [this message]
     [not found]             ` <5012127C.8070203-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-08-07 17:30               ` Miklos Szeredi
2012-07-05 19:26 ` [fuse-devel] [PATCH 0/10] fuse: An attempt to implement a write-back cache policy Anand Avati

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5012127C.8070203@parallels.com \
    --to=xemul@parallels.com \
    --cc=dev@parallels.com \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=jbottomley@parallels.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.