From: Niels de Vos <ndevos-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: "Myklebust,
Trond" <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Cc: "linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Jeffrey Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
Date: Mon, 07 May 2012 10:49:23 +0200 [thread overview]
Message-ID: <4FA78C93.3040204@redhat.com> (raw)
In-Reply-To: <1336156178.2582.15.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
On 05/04/2012 08:29 PM, Myklebust, Trond wrote:
> On Fri, 2012-05-04 at 18:03 +0200, Niels de Vos wrote:
>> On 05/03/2012 07:26 PM, Myklebust, Trond wrote:
>> > On Thu, 2012-05-03 at 19:07 +0200, Niels de Vos wrote:
>> >> On 05/03/2012 05:43 PM, Myklebust, Trond wrote:
>> >> > On Thu, 2012-05-03 at 17:34 +0200, Niels de Vos wrote:
>> >> >> When an application on an NFS-client (tested with NFSv3)
executes the
>> >> >> following steps, data written after the close() is never
flushed to the
>> >> >> server:
>> >> >>
>> >> >> 1. open()
>> >> >> 2. mmap()
>> >> >> 3. close()
>> >> >> 4.<modify data in the mmap'ed area>
>> >> >> 5. munmap()
>> >> >>
>> >> >> Dropping the caches (via /proc/sys/vm/drop_caches) or
unmounting does not
>> >> >> result in the data being sent to the server.
>> >> >>
>> >> >> The man-page for mmap (man 2 mmap) does mention that closing
the file-
>> >> >> descriptor does not munmap() the area. Using the mmap'ed area
after a
>> >> >> close() sound valid to me (even if it may be bad practice).
>> >> >>
>> >> >> Investigation and checking showed that the NFS-client does not
handle
>> >> >> munmap(), and only flushes on close(). To solve this problem,
least two
>> >> >> solutions can be proposed:
>> >> >>
>> >> >> a. f_ops->release() is called on munmap() as well as on close(),
>> >> >> therefore release() can be used to flush data as well.
>> >> >> b. In the 'struct vm_operations_struct' add a .close to the
>> >> >> 'struct vm_area_struct' on calling mmap()/nfs_file_mmap()
and flush
>> >> >> the data in the new close() function.
>> >> >>
>> >> >> Solution a. contains currently very few code changes:
>> >> >>
>> >> >> --- a/fs/nfs/inode.c
>> >> >> +++ b/fs/nfs/inode.c
>> >> >> @@ -713,6 +713,8 @@ int nfs_open(struct inode *inode, struct
file *filp)
>> >> >>
>> >> >> int nfs_release(struct inode *inode, struct file *filp)
>> >> >> {
>> >> >> + if (S_ISREG(inode->i_mode)&&
inode->i_mapping->nrpages != 0) {
>> >> >> + nfs_sync_mapping(inode->i_mapping);
>> >> >> nfs_file_clear_open_context(filp);
>> >> >> return 0;
>> >> >> }
>> >> >>
>> >> >> The disadvantage is, that nfs_release() is called on close()
too. That
>> >> >> means this causes a flushing of dirty pages, and just after
that the
>> >> >> nfs_file_clear_open_context() might flush again. The advantage
is that
>> >> >> it is possible (though not done at the moment) to return an
error in
>> >> >> case flushing failed.
>> >> >>
>> >> >> Solution b. does not provide an option to return an error, but
does not
>> >> >> get called on each close():
>> >> >>
>> >> >> --- a/fs/nfs/file.c
>> >> >> +++ b/fs/nfs/file.c
>> >> >> @@ -547,9 +547,17 @@ out:
>> >> >> return ret;
>> >> >> }
>> >> >>
>> >> >> +static void nfs_vm_close(struct vm_area_struct * vma)
>> >> >> +{
>> >> >> + struct file *filp = vma->vm_file;
>> >> >> +
>> >> >> + nfs_file_flush(filp, (fl_owner_t)filp);
>> >> >> +}
>> >> >> +
>> >> >> static const struct vm_operations_struct nfs_file_vm_ops = {
>> >> >> .fault = filemap_fault,
>> >> >> .page_mkwrite = nfs_vm_page_mkwrite,
>> >> >> + .close = nfs_vm_close,
>> >> >> };
>> >> >>
>> >> >> static int nfs_need_sync_write(struct file *filp, struct
inode *inode)
>> >> >>
>> >> >> I would like some feedback on what solution is most acceptable,
or any
>> >> >> other suggestions.
>> >> >
>> >> > Neither solution is acceptable. This isn't a close-to-open cache
>> >> > consistency issue.
>> >> >
>> >> > The syntax of mmap() for both block and NFS mounts is the same:
writes
>> >> > are not guaranteed to hit the disk until your application explicitly
>> >> > calls msync().
>> >> >
>> >>
>> >> Okay, that makes sense. But if the application never calls msync(), and
>> >> just munmap()'s the area, when should the changes be written? I did not
>> >> expect that unmounting just disregards the data.
>> >
>> > That suggests that the VM is failing to dirty the pages on munmap()
>> > before releasing the vma->vm_file. If so, then that would be a VM bug...
>> >
>>
>> I've checked if the VM tags the pages as dirty:
>> - f_ops->release() is called on munmap(). An added printk there, shows
>> that inode->i_state is set to I_DIRTY_PAGE.
>> - mapping_tagged(filp->f_mapping, PAGECACHE_TAG_DIRTY) also returns true
>>
>> From my understanding this is what the VM is expected to do, and the
>> pages are marked dirty correctly.
>>
>> However, nfs_inode->ndirty and nfs_inode->ncommit are both 0. It is
>> unclear to me how the VM is supposed to interact with the nfs_inode.
>> Some clarification or suggestion what to look into would be much
>> appreciated.
>
> The first time the page is touched, it will to trigger a ->pg_mkwrite(),
> which in the case of NFS will set up the necessary tracking structures
> to ensure that the page is written out using the correct credentials
> etc. In the case of NFSv4, it will also ensure that the file doesn't get
> closed on the server until the page is written out to disk.
>
> When the page is cleaned (i.e. something calls clear_page_dirty_for_io()
> as part of a write to disk), the call to page_mkclean() is supposed to
> re-write-protect the pte, ensuring that any future changes will
> re-trigger pg_mkwrite().
>
> You should be able to check if/when nfs_vm_page_mkwrite() is triggered
> using 'rpcdebug -m nfs -s pagecache' to turn on the NFS page cache
> debugging printks.
>
Many thanks for the explanation! At the moment I'm a little uncertain
where the problem lays, as the problem does not occur with more recent
kernels anymore. There likely was some invalid testing on my side :-/
I think I have to hunt down what changes were made and how this affects
the writing to mmap()'d files. The explanation you gave helps a lot in
understanding how NFS handles this all.
Thanks again, and sorry for any confusion,
Niels
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2012-05-07 8:49 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-03 15:34 [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor Niels de Vos
[not found] ` <4FA2A56B.1030208-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-05-03 15:43 ` Myklebust, Trond
2012-05-03 17:07 ` Niels de Vos
[not found] ` <4FA2BB68.7070304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-05-03 17:26 ` Myklebust, Trond
[not found] ` <1336066014.5385.41.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2012-05-04 16:03 ` Niels de Vos
2012-05-04 18:29 ` Myklebust, Trond
[not found] ` <1336156178.2582.15.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2012-05-07 8:49 ` Niels de Vos [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FA78C93.3040204@redhat.com \
--to=ndevos-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
--cc=dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).