From: Maxim Patlasov <mpatlasov@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: <riel@redhat.com>, Kirill Korotaev <dev@parallels.com>,
Pavel Emelianov <xemul@parallels.com>,
fuse-devel <fuse-devel@lists.sourceforge.net>,
Brian Foster <bfoster@redhat.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
James Bottomley <jbottomley@parallels.com>, <linux-mm@kvack.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
<fengguang.wu@intel.com>, <devel@openvz.org>,
Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 10/16] fuse: Implement writepages callback
Date: Fri, 9 Aug 2013 19:02:12 +0400 [thread overview]
Message-ID: <52050474.8040608@parallels.com> (raw)
In-Reply-To: <CAJfpegtr4+vv_ZzuM7EE7MkHPqNi4brQamg4ZOWb2Me+iG87JQ@mail.gmail.com>
Hi Miklos,
08/06/2013 08:25 PM, Miklos Szeredi пишет:
> On Fri, Aug 2, 2013 at 5:40 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
>> 07/19/2013 08:50 PM, Miklos Szeredi пишет:
>>
>>> On Sat, Jun 29, 2013 at 09:45:29PM +0400, Maxim Patlasov wrote:
>>>> From: Pavel Emelyanov <xemul@openvz.org>
>>>>
>>>> The .writepages one is required to make each writeback request carry more
>>>> than
>>>> one page on it. The patch enables optimized behaviour unconditionally,
>>>> i.e. mmap-ed writes will benefit from the patch even if
>>>> fc->writeback_cache=0.
>>> I rewrote this a bit, so we won't have to do the thing in two passes,
>>> which
>>> makes it simpler and more robust. Waiting for page writeback here is
>>> wrong
>>> anyway, see comment above fuse_page_mkwrite(). BTW we had a race there
>>> because
>>> fuse_page_mkwrite() didn't take the page lock. I've also fixed that up
>>> and
>>> pushed a series containing these patches up to implementing ->writepages()
>>> to
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
>>> writepages
>>>
>>> Passed some trivial testing but more is needed.
>>
>> Thanks a lot for efforts. The approach you implemented looks promising, but
>> it introduces the following assumption: a page cannot become dirty before we
>> have a chance to wait on fuse writeback holding the page locked. This is
>> already true for mmap-ed writes (due to your fixes) and it seems doable for
>> cached writes as well (like we do in fuse_perform_write). But the assumption
>> seems to be broken in case of direct read from local fs (e.g. ext4) to a
>> memory region mmap-ed to a file on fuse fs. See how dio_bio_submit() marks
>> pages dirty by bio_set_pages_dirty(). I can't see any solution for this
>> use-case. Do you?
> Hmm. Direct IO on an mmaped file will do get_user_pages() which will
> do the necessary page fault magic and ->page_mkwrite() will be called.
> At least AFAICS.
Yes, I agree.
>
> The page cannot become dirty through a memory mapping without first
> switching the pte from read-only to read-write first. Page accounting
> logic relies on this too. The other way the page can become dirty is
> through write(2) on the fs. But we do get notified about that too.
Yes, that's correct, but I don't understand why you disregard two other
cases of marking page dirty (both related to direct AIO read from a file
to a memory region mmap-ed to a fuse file):
1. dio_bio_submit() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
2. dio_bio_complete() -->
bio_check_pages_dirty() -->
bio_dirty_fn() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
As soon as a page became dirty through a memory mapping (exactly as you
explained), nothing would prevent it to be written-back. And fuse will
call end_page_writeback almost immediately after copying the real page
to a temporary one. Then dio_bio_submit may re-dirty page speculatively
w/o notifying fuse. And again, since then nothing would prevent it to be
written-back once more. Hence we can end up in more then one temporary
page in fuse write-back. And similar concern for dio_bio_complete()
re-dirty.
This make me think that we do need fuse_page_is_writeback() in
fuse_writepages_fill(). But it shouldn't be harmful because it will
no-op practically always due to waiting for fuse writeback in
->page_mkwrite() and in course of handling write(2).
Thanks,
Maxim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Maxim Patlasov <mpatlasov@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: riel@redhat.com, Kirill Korotaev <dev@parallels.com>,
Pavel Emelianov <xemul@parallels.com>,
fuse-devel <fuse-devel@lists.sourceforge.net>,
Brian Foster <bfoster@redhat.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
James Bottomley <jbottomley@parallels.com>,
linux-mm@kvack.org, Al Viro <viro@zeniv.linux.org.uk>,
Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
fengguang.wu@intel.com, devel@openvz.org,
Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 10/16] fuse: Implement writepages callback
Date: Fri, 9 Aug 2013 19:02:12 +0400 [thread overview]
Message-ID: <52050474.8040608@parallels.com> (raw)
In-Reply-To: <CAJfpegtr4+vv_ZzuM7EE7MkHPqNi4brQamg4ZOWb2Me+iG87JQ@mail.gmail.com>
Hi Miklos,
08/06/2013 08:25 PM, Miklos Szeredi D?D,N?DuN?:
> On Fri, Aug 2, 2013 at 5:40 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
>> 07/19/2013 08:50 PM, Miklos Szeredi D?D,N?DuN?:
>>
>>> On Sat, Jun 29, 2013 at 09:45:29PM +0400, Maxim Patlasov wrote:
>>>> From: Pavel Emelyanov <xemul@openvz.org>
>>>>
>>>> The .writepages one is required to make each writeback request carry more
>>>> than
>>>> one page on it. The patch enables optimized behaviour unconditionally,
>>>> i.e. mmap-ed writes will benefit from the patch even if
>>>> fc->writeback_cache=0.
>>> I rewrote this a bit, so we won't have to do the thing in two passes,
>>> which
>>> makes it simpler and more robust. Waiting for page writeback here is
>>> wrong
>>> anyway, see comment above fuse_page_mkwrite(). BTW we had a race there
>>> because
>>> fuse_page_mkwrite() didn't take the page lock. I've also fixed that up
>>> and
>>> pushed a series containing these patches up to implementing ->writepages()
>>> to
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
>>> writepages
>>>
>>> Passed some trivial testing but more is needed.
>>
>> Thanks a lot for efforts. The approach you implemented looks promising, but
>> it introduces the following assumption: a page cannot become dirty before we
>> have a chance to wait on fuse writeback holding the page locked. This is
>> already true for mmap-ed writes (due to your fixes) and it seems doable for
>> cached writes as well (like we do in fuse_perform_write). But the assumption
>> seems to be broken in case of direct read from local fs (e.g. ext4) to a
>> memory region mmap-ed to a file on fuse fs. See how dio_bio_submit() marks
>> pages dirty by bio_set_pages_dirty(). I can't see any solution for this
>> use-case. Do you?
> Hmm. Direct IO on an mmaped file will do get_user_pages() which will
> do the necessary page fault magic and ->page_mkwrite() will be called.
> At least AFAICS.
Yes, I agree.
>
> The page cannot become dirty through a memory mapping without first
> switching the pte from read-only to read-write first. Page accounting
> logic relies on this too. The other way the page can become dirty is
> through write(2) on the fs. But we do get notified about that too.
Yes, that's correct, but I don't understand why you disregard two other
cases of marking page dirty (both related to direct AIO read from a file
to a memory region mmap-ed to a fuse file):
1. dio_bio_submit() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
2. dio_bio_complete() -->
bio_check_pages_dirty() -->
bio_dirty_fn() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
As soon as a page became dirty through a memory mapping (exactly as you
explained), nothing would prevent it to be written-back. And fuse will
call end_page_writeback almost immediately after copying the real page
to a temporary one. Then dio_bio_submit may re-dirty page speculatively
w/o notifying fuse. And again, since then nothing would prevent it to be
written-back once more. Hence we can end up in more then one temporary
page in fuse write-back. And similar concern for dio_bio_complete()
re-dirty.
This make me think that we do need fuse_page_is_writeback() in
fuse_writepages_fill(). But it shouldn't be harmful because it will
no-op practically always due to waiting for fuse writeback in
->page_mkwrite() and in course of handling write(2).
Thanks,
Maxim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Maxim Patlasov <mpatlasov@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: <riel@redhat.com>, Kirill Korotaev <dev@parallels.com>,
Pavel Emelianov <xemul@parallels.com>,
fuse-devel <fuse-devel@lists.sourceforge.net>,
Brian Foster <bfoster@redhat.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
James Bottomley <jbottomley@parallels.com>, <linux-mm@kvack.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
<fengguang.wu@intel.com>, <devel@openvz.org>,
Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 10/16] fuse: Implement writepages callback
Date: Fri, 9 Aug 2013 19:02:12 +0400 [thread overview]
Message-ID: <52050474.8040608@parallels.com> (raw)
In-Reply-To: <CAJfpegtr4+vv_ZzuM7EE7MkHPqNi4brQamg4ZOWb2Me+iG87JQ@mail.gmail.com>
Hi Miklos,
08/06/2013 08:25 PM, Miklos Szeredi пишет:
> On Fri, Aug 2, 2013 at 5:40 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
>> 07/19/2013 08:50 PM, Miklos Szeredi пишет:
>>
>>> On Sat, Jun 29, 2013 at 09:45:29PM +0400, Maxim Patlasov wrote:
>>>> From: Pavel Emelyanov <xemul@openvz.org>
>>>>
>>>> The .writepages one is required to make each writeback request carry more
>>>> than
>>>> one page on it. The patch enables optimized behaviour unconditionally,
>>>> i.e. mmap-ed writes will benefit from the patch even if
>>>> fc->writeback_cache=0.
>>> I rewrote this a bit, so we won't have to do the thing in two passes,
>>> which
>>> makes it simpler and more robust. Waiting for page writeback here is
>>> wrong
>>> anyway, see comment above fuse_page_mkwrite(). BTW we had a race there
>>> because
>>> fuse_page_mkwrite() didn't take the page lock. I've also fixed that up
>>> and
>>> pushed a series containing these patches up to implementing ->writepages()
>>> to
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
>>> writepages
>>>
>>> Passed some trivial testing but more is needed.
>>
>> Thanks a lot for efforts. The approach you implemented looks promising, but
>> it introduces the following assumption: a page cannot become dirty before we
>> have a chance to wait on fuse writeback holding the page locked. This is
>> already true for mmap-ed writes (due to your fixes) and it seems doable for
>> cached writes as well (like we do in fuse_perform_write). But the assumption
>> seems to be broken in case of direct read from local fs (e.g. ext4) to a
>> memory region mmap-ed to a file on fuse fs. See how dio_bio_submit() marks
>> pages dirty by bio_set_pages_dirty(). I can't see any solution for this
>> use-case. Do you?
> Hmm. Direct IO on an mmaped file will do get_user_pages() which will
> do the necessary page fault magic and ->page_mkwrite() will be called.
> At least AFAICS.
Yes, I agree.
>
> The page cannot become dirty through a memory mapping without first
> switching the pte from read-only to read-write first. Page accounting
> logic relies on this too. The other way the page can become dirty is
> through write(2) on the fs. But we do get notified about that too.
Yes, that's correct, but I don't understand why you disregard two other
cases of marking page dirty (both related to direct AIO read from a file
to a memory region mmap-ed to a fuse file):
1. dio_bio_submit() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
2. dio_bio_complete() -->
bio_check_pages_dirty() -->
bio_dirty_fn() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
As soon as a page became dirty through a memory mapping (exactly as you
explained), nothing would prevent it to be written-back. And fuse will
call end_page_writeback almost immediately after copying the real page
to a temporary one. Then dio_bio_submit may re-dirty page speculatively
w/o notifying fuse. And again, since then nothing would prevent it to be
written-back once more. Hence we can end up in more then one temporary
page in fuse write-back. And similar concern for dio_bio_complete()
re-dirty.
This make me think that we do need fuse_page_is_writeback() in
fuse_writepages_fill(). But it shouldn't be harmful because it will
no-op practically always due to waiting for fuse writeback in
->page_mkwrite() and in course of handling write(2).
Thanks,
Maxim
next prev parent reply other threads:[~2013-08-09 15:02 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-29 17:41 [PATCH v5 00/16] fuse: An attempt to implement a write-back cache policy Maxim Patlasov
2013-06-29 17:41 ` Maxim Patlasov
2013-06-29 17:42 ` [PATCH 01/16] fuse: Linking file to inode helper Maxim Patlasov
2013-06-29 17:42 ` Maxim Patlasov
2013-06-29 17:42 ` [PATCH 02/16] fuse: Getting file for writeback helper Maxim Patlasov
2013-06-29 17:42 ` Maxim Patlasov
[not found] ` <20130629172211.20175.70154.stgit-vWG5eQQidJHciZdyczg/7Q@public.gmane.org>
2013-06-29 17:42 ` [PATCH 03/16] fuse: Prepare to handle short reads Maxim Patlasov
2013-06-29 17:42 ` Maxim Patlasov
2013-06-29 17:42 ` Maxim Patlasov
2013-06-29 17:45 ` [PATCH 09/16] fuse: restructure fuse_readpage() Maxim Patlasov
2013-06-29 17:45 ` Maxim Patlasov
2013-06-29 17:45 ` Maxim Patlasov
2013-06-29 17:42 ` [PATCH 04/16] fuse: Prepare to handle multiple pages in writeback Maxim Patlasov
2013-06-29 17:42 ` Maxim Patlasov
2013-06-29 17:42 ` [PATCH 05/16] fuse: Connection bit for enabling writeback Maxim Patlasov
2013-06-29 17:42 ` Maxim Patlasov
2013-06-29 17:44 ` [PATCH 06/16] fuse: Trust kernel i_size only - v4 Maxim Patlasov
2013-06-29 17:44 ` Maxim Patlasov
2013-06-29 17:44 ` [PATCH 07/16] fuse: Trust kernel i_mtime only Maxim Patlasov
2013-06-29 17:44 ` Maxim Patlasov
2013-07-11 16:14 ` [PATCH 07/16] fuse: Trust kernel i_mtime only -v2 Maxim Patlasov
2013-06-29 17:45 ` [PATCH 08/16] fuse: Flush files on wb close Maxim Patlasov
2013-06-29 17:45 ` Maxim Patlasov
2013-07-11 16:18 ` [PATCH 08/16] fuse: Flush files on wb close -v2 Maxim Patlasov
2013-06-29 17:45 ` [PATCH 10/16] fuse: Implement writepages callback Maxim Patlasov
2013-06-29 17:45 ` Maxim Patlasov
2013-07-19 16:50 ` Miklos Szeredi
2013-07-19 16:50 ` Miklos Szeredi
2013-08-02 15:40 ` Maxim Patlasov
2013-08-02 15:40 ` Maxim Patlasov
2013-08-02 15:40 ` Maxim Patlasov
2013-08-06 16:25 ` Miklos Szeredi
2013-08-06 16:25 ` Miklos Szeredi
2013-08-06 16:26 ` Eric Boxer
2013-08-09 15:02 ` Maxim Patlasov [this message]
2013-08-09 15:02 ` Maxim Patlasov
2013-08-09 15:02 ` Maxim Patlasov
2013-08-30 10:12 ` Miklos Szeredi
2013-08-30 10:12 ` Miklos Szeredi
2013-08-30 10:12 ` Miklos Szeredi
2013-08-30 14:50 ` Maxim Patlasov
2013-08-30 14:50 ` Maxim Patlasov
2013-08-30 14:50 ` Maxim Patlasov
2013-09-03 10:31 ` Miklos Szeredi
2013-09-03 10:31 ` Miklos Szeredi
2013-09-03 10:31 ` Miklos Szeredi
2013-09-03 16:02 ` Maxim Patlasov
2013-09-03 16:02 ` Maxim Patlasov
2013-09-03 16:02 ` Maxim Patlasov
2013-06-29 17:45 ` [PATCH 11/16] fuse: Implement write_begin/write_end callbacks Maxim Patlasov
2013-06-29 17:45 ` Maxim Patlasov
2013-06-29 17:46 ` [PATCH 12/16] fuse: fuse_writepage_locked() should wait on writeback Maxim Patlasov
2013-06-29 17:46 ` Maxim Patlasov
2013-06-29 17:46 ` [PATCH 13/16] fuse: fuse_flush() " Maxim Patlasov
2013-06-29 17:46 ` Maxim Patlasov
2013-06-29 17:46 ` [PATCH 14/16] fuse: Fix O_DIRECT operations vs cached writeback misorder - v2 Maxim Patlasov
2013-06-29 17:46 ` Maxim Patlasov
2013-06-29 17:47 ` [PATCH 15/16] fuse: Turn writeback cache on Maxim Patlasov
2013-06-29 17:47 ` Maxim Patlasov
2013-06-29 17:48 ` [PATCH 16/16] mm: strictlimit feature Maxim Patlasov
2013-06-29 17:48 ` Maxim Patlasov
2013-07-01 21:16 ` Andrew Morton
2013-07-01 21:16 ` Andrew Morton
2013-07-02 8:33 ` Maxim Patlasov
2013-07-02 8:33 ` Maxim Patlasov
2013-07-02 8:33 ` Maxim Patlasov
2013-07-02 17:44 ` [PATCH] mm: strictlimit feature -v2 Maxim Patlasov
2013-07-02 17:44 ` Maxim Patlasov
2013-07-02 19:38 ` Andrew Morton
2013-07-02 19:38 ` Andrew Morton
2013-07-03 11:01 ` Maxim Patlasov
2013-07-03 11:01 ` Maxim Patlasov
2013-07-03 11:01 ` Maxim Patlasov
2013-07-03 23:16 ` Jan Kara
2013-07-03 23:16 ` Jan Kara
2013-07-03 23:16 ` Jan Kara
2013-07-05 13:14 ` Maxim Patlasov
2013-07-05 13:14 ` Maxim Patlasov
2013-07-05 13:14 ` Maxim Patlasov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52050474.8040608@parallels.com \
--to=mpatlasov@parallels.com \
--cc=akpm@linux-foundation.org \
--cc=bfoster@redhat.com \
--cc=dev@parallels.com \
--cc=devel@openvz.org \
--cc=fengguang.wu@intel.com \
--cc=fuse-devel@lists.sourceforge.net \
--cc=jbottomley@parallels.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=miklos@szeredi.hu \
--cc=riel@redhat.com \
--cc=viro@zeniv.linux.org.uk \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.