From: Niels de Vos <ndevos@redhat.com>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
David Howells <dhowells@redhat.com>,
Jeffrey Layton <jlayton@redhat.com>
Subject: Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
Date: Fri, 04 May 2012 18:03:39 +0200
Message-ID: <4FA3FDDB.6070600@redhat.com>
In-Reply-To: <1336066014.5385.41.camel@lade.trondhjem.org>

On 05/03/2012 07:26 PM, Myklebust, Trond wrote:
> On Thu, 2012-05-03 at 19:07 +0200, Niels de Vos wrote:
>> On 05/03/2012 05:43 PM, Myklebust, Trond wrote:
>> > On Thu, 2012-05-03 at 17:34 +0200, Niels de Vos wrote:
>> >> When an application on an NFS-client (tested with NFSv3) executes the
>> >> following steps, data written after the close() is never flushed to the
>> >> server:
>> >>
>> >> 1. open()
>> >> 2. mmap()
>> >> 3. close()
>> >> 4.<modify data in the mmap'ed area>
>> >> 5. munmap()
>> >>
>> >> Dropping the caches (via /proc/sys/vm/drop_caches) or unmounting does not
>> >> result in the data being sent to the server.
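>> >>
>> >> A minimal reproducer for the sequence above could look like the sketch
>> >> below (illustrative only; error handling is trimmed and the file name
>> >> is a placeholder for a file on an NFS mount):
>> >>
>> >> 	#include <fcntl.h>
>> >> 	#include <string.h>
>> >> 	#include <sys/mman.h>
>> >> 	#include <unistd.h>
>> >>
>> >> 	int main(void)
>> >> 	{
>> >> 		/* 1. open() a file on the NFS mount */
>> >> 		int fd = open("/mnt/nfs/testfile", O_RDWR | O_CREAT, 0644);
>> >> 		ftruncate(fd, 4096);
>> >>
>> >> 		/* 2. mmap() the file shared and writable */
>> >> 		char *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
>> >> 				  MAP_SHARED, fd, 0);
>> >>
>> >> 		/* 3. close() the file-descriptor */
>> >> 		close(fd);
>> >>
>> >> 		/* 4. modify data in the mmap'ed area */
>> >> 		memcpy(addr, "modified after close\n", 21);
>> >>
>> >> 		/* 5. munmap() without an explicit msync() */
>> >> 		munmap(addr, 4096);
>> >> 		return 0;
>> >> 	}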
>> >>
>> >> The man-page for mmap (man 2 mmap) does mention that closing the file-
>> >> descriptor does not munmap() the area. Using the mmap'ed area after a
>> >> close() sounds valid to me (even if it may be bad practice).
>> >>
>> >> Investigation and checking showed that the NFS-client does not handle
>> >> munmap(), and only flushes on close(). To solve this problem, at least
>> >> two solutions can be proposed:
>> >>
>> >> a. f_ops->release() is called on munmap() as well as on close(),
>> >>    therefore release() can be used to flush data as well.
>> >> b. Add a .close handler to the 'struct vm_operations_struct' that is
>> >>    installed on the 'struct vm_area_struct' in mmap()/nfs_file_mmap(),
>> >>    and flush the data from that new close() function.
>> >>
>> >> Solution a. currently requires very few code changes:
>> >>
>> >> --- a/fs/nfs/inode.c
>> >> +++ b/fs/nfs/inode.c
>> >> @@ -713,6 +713,8 @@ int nfs_open(struct inode *inode, struct file *filp)
>> >>
>> >>  int nfs_release(struct inode *inode, struct file *filp)
>> >>  {
>> >> +	if (S_ISREG(inode->i_mode) && inode->i_mapping->nrpages != 0)
>> >> +		nfs_sync_mapping(inode->i_mapping);
>> >>  	nfs_file_clear_open_context(filp);
>> >>  	return 0;
>> >>  }
>> >>
>> >> The disadvantage is that nfs_release() is called on close() too. That
>> >> means dirty pages are flushed there as well, and just after that
>> >> nfs_file_clear_open_context() might flush again. The advantage is that
>> >> it is possible (though not done at the moment) to return an error in
>> >> case flushing fails.
>> >>
>> >> Solution b. does not provide a way to return an error, but its close()
>> >> handler does not get called on every close():
>> >>
>> >> --- a/fs/nfs/file.c
>> >> +++ b/fs/nfs/file.c
>> >> @@ -547,9 +547,17 @@ out:
>> >>  	return ret;
>> >>  }
>> >>  
>> >> +static void nfs_vm_close(struct vm_area_struct *vma)
>> >> +{
>> >> +	struct file *filp = vma->vm_file;
>> >> +
>> >> +	nfs_file_flush(filp, (fl_owner_t)filp);
>> >> +}
>> >> +
>> >>  static const struct vm_operations_struct nfs_file_vm_ops = {
>> >>  	.fault = filemap_fault,
>> >>  	.page_mkwrite = nfs_vm_page_mkwrite,
>> >> +	.close = nfs_vm_close,
>> >>  };
>> >>  
>> >>  static int nfs_need_sync_write(struct file *filp, struct inode *inode)
>> >>
>> >> I would like some feedback on which of these solutions is more
>> >> acceptable, or any other suggestions.
>> >
>> > Neither solution is acceptable. This isn't a close-to-open cache
>> > consistency issue.
>> >
>> > The semantics of mmap() for both block and NFS mounts are the same: writes
>> > are not guaranteed to hit the disk until your application explicitly
>> > calls msync().
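>> >
>> > Illustratively, relative to the reproducer quoted above, that means the
>> > application is expected to do something like this before tearing the
>> > mapping down:
>> >
>> > 	/* flush the modified range back before unmapping; MS_SYNC waits
>> > 	 * until the writeback has completed */
>> > 	if (msync(addr, 4096, MS_SYNC) == -1)
>> > 		perror("msync");
>> > 	munmap(addr, 4096);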
>> >
>>
>> Okay, that makes sense. But if the application never calls msync(), and
>> just munmap()'s the area, when should the changes be written? I did not
>> expect that unmounting would simply disregard the data.
>
> That suggests that the VM is failing to dirty the pages on munmap()
> before releasing the vma->vm_file. If so, then that would be a VM bug...
>
I've checked whether the VM tags the pages as dirty:
- f_ops->release() is called on munmap(). An added printk there shows
  that inode->i_state has I_DIRTY_PAGES set.
- mapping_tagged(filp->f_mapping, PAGECACHE_TAG_DIRTY) also returns true.
From my understanding this is what the VM is expected to do, and the
pages are marked dirty correctly.
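
For reference, the check I added looks roughly like this (simplified,
debugging only; a helper called from nfs_release() in fs/nfs/inode.c,
which already pulls in the needed headers):

	static void nfs_debug_dirty_state(struct inode *inode, struct file *filp)
	{
		/* report the inode dirty flags and whether the page cache of
		 * this mapping carries the PAGECACHE_TAG_DIRTY tag */
		printk(KERN_DEBUG "nfs_release: i_state=0x%lx dirty-tagged=%d\n",
		       inode->i_state,
		       mapping_tagged(filp->f_mapping, PAGECACHE_TAG_DIRTY));
	}
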
However, nfs_inode->ndirty and nfs_inode->ncommit are both 0. It is
unclear to me how the VM is supposed to interact with the nfs_inode.
Some clarification, or a suggestion on what to look into, would be much
appreciated.
Cheers,
Niels