From: Maxim Patlasov <mpatlasov@parallels.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: <xemul@parallels.com>, <fuse-devel@lists.sourceforge.net>,
<bfoster@redhat.com>, <linux-kernel@vger.kernel.org>,
<jbottomley@parallels.com>, <linux-fsdevel@vger.kernel.org>,
<devel@openvz.org>
Subject: Re: [PATCH] fuse: hotfix truncate_pagecache() issue
Date: Thu, 29 Aug 2013 17:01:06 +0400 [thread overview]
Message-ID: <521F4612.9090708@parallels.com> (raw)
In-Reply-To: <20130829092533.GA18044@tucsk.piliscsaba.szeredi.hu>
Hi,
08/29/2013 01:25 PM, Miklos Szeredi пишет:
> On Wed, Aug 28, 2013 at 04:21:46PM +0400, Maxim Patlasov wrote:
>> The way how fuse calls truncate_pagecache() from fuse_change_attributes()
>> is completely wrong. Because, w/o i_mutex held, we never sure whether
>> 'oldsize' and 'attr->size' are valid by the time of execution of
>> truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
>> released fc->lock in the middle of fuse_change_attributes(), we completely
>> loose control of actions which may happen with given inode until we reach
>> truncate_pagecache. The list of potentially dangerous actions includes mmap-ed
>> reads and writes, ftruncate(2) and write(2) extending file size.
>>
>> The typical outcome of doing truncate_pagecache() with outdated arguments is
>> data corruption from user point of view. This is (in some sense) acceptable
>> in cases when the issue is triggered by a change of the file on the server
>> (i.e. externally wrt fuse operation), but it is absolutely intolerable in
>> scenarios when a single fuse client modifies a file without any external
>> intervention. A real life case I discovered by fsx-linux looked like this:
>>
>> 1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
>> FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
>> 2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
>> to the server synchronously, then calls fuse_change_attributes(). The latter
>> updates i_size, releases fc->lock, but before comparing oldsize vs attr->size..
>> 3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
>> updating attributes and i_size, but now oldsize is equal to outarg.attr.size
>> because i_size has just been updated (step 2). Hence, fuse_do_setattr()
>> returns w/o calling truncate_pagecache().
>> 4. As soon as ftruncate(2) completes, the user extends file size by write(2)
>> making a hole in the middle of file, then reads data from the hole either by
>> read(2) or mmap-ed read. The user expects to get zero data from the hole, but
>> gets stale data because truncate_pagecache() is not executed yet.
>>
>> The patch is a hotfix resolving the issue in a simplistic way: let's skip
>> dangerous i_size update and truncate_pagecache if an operation changing file
>> size is in progress. This simplistic approach looks correct for the cases
>> w/o external changes. And to handle them properly, more sophisticated and
>> intrusive techniques (e.g. NFS-like one) would be required. I'd like
>> to postpone it until the issue is well discussed on the mailing list(s).
> Thanks for the analysis!
>
> Okay, what about this even more simplistic approach?
>
> Not tested, but I think it addresses the very crux of the issue: not truncating
> the page cache even though we should.
The patch looks fine, but it solves only one side of the problem
(exactly what you formulated above). Another side is opposite:
truncating page cache too late, when the state of inode changed
significantly. The beginning may be as in the scenario above:
fuse_dentry_revalidate() discovered that i_size changed (due to our own
fuse_do_setattr()) and is going to call truncate_pagecache() for some
'new_size' it believes valid right now. But by the time that particular
truncate_pagecache() is called, a lot of things may happen:
1) fuse_do_setattr() called truncate_pagecache() according to your patch
2) the file was extended either by write(2) or ftruncate(2) or fallocate(2).
3) mmap-ed write make a page in the extended region dirty
The result will be the lost of data user wrote on the step '3)'. (my
patch solves the issue at least for all cases w/o external changes)
>
> AFAICS there's no such issue with write(2) or fallocate(2). But I haven't
> thought about this very deeply.
I added bits to the fuse_perform_write() to address that other side of
the issue. Fixing fallocate is also required, but I postponed it until
you include that another fix for fallocate (incorrect use of
fuse_set_nowrite()).
Thanks,
Maxim
next prev parent reply other threads:[~2013-08-29 13:01 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-28 12:21 [PATCH] fuse: hotfix truncate_pagecache() issue Maxim Patlasov
2013-08-29 9:25 ` Miklos Szeredi
2013-08-29 13:01 ` Maxim Patlasov [this message]
2013-08-29 16:18 ` Miklos Szeredi
[not found] ` <CAJfpegvLcu1o+FUyRL1phCGMKvB4tReyjMLXBb6JabNJxooRSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-08-30 13:06 ` [PATCH] fuse: hotfix truncate_pagecache() issue -v2 Maxim Patlasov
2013-08-30 13:06 ` Maxim Patlasov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=521F4612.9090708@parallels.com \
--to=mpatlasov@parallels.com \
--cc=bfoster@redhat.com \
--cc=devel@openvz.org \
--cc=fuse-devel@lists.sourceforge.net \
--cc=jbottomley@parallels.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.