* [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-03 15:34 Niels de Vos
  To: linux-nfs@vger.kernel.org
  Cc: linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

When an application on an NFS-client (tested with NFSv3) executes the
following steps, data written after the close() is never flushed to the
server:

1. open()
2. mmap()
3. close()
4. <modify data in the mmap'ed area>
5. munmap()

Dropping the caches (via /proc/sys/vm/drop_caches) or unmounting does not
result in the data being sent to the server.

The man-page for mmap (man 2 mmap) does mention that closing the file-
descriptor does not munmap() the area. Using the mmap'ed area after a
close() sounds valid to me (even if it may be bad practice).

Investigation and checking showed that the NFS-client does not handle
munmap(), and only flushes on close(). To solve this problem, at least
two solutions can be proposed:

a. f_ops->release() is called on munmap() as well as on close(),
   therefore release() can be used to flush data as well.
b. Add a .close to the 'struct vm_operations_struct' that is attached
   to the 'struct vm_area_struct' by mmap()/nfs_file_mmap(), and flush
   the data in the new close() function.

Solution a. currently requires very few code changes:

--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -713,6 +713,8 @@ int nfs_open(struct inode *inode, struct file *filp)
 
 int nfs_release(struct inode *inode, struct file *filp)
 {
+	if (S_ISREG(inode->i_mode) && inode->i_mapping->nrpages != 0)
+		nfs_sync_mapping(inode->i_mapping);
 	nfs_file_clear_open_context(filp);
 	return 0;
 }

The disadvantage is that nfs_release() is called on close() too. That
means this causes a flushing of dirty pages, and just after that the
nfs_file_clear_open_context() might flush again. The advantage is that
it is possible (though not done at the moment) to return an error in
case flushing failed.

Solution b. does not provide an option to return an error, but does not
get called on each close():

--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -547,9 +547,17 @@ out:
 	return ret;
 }
 
+static void nfs_vm_close(struct vm_area_struct *vma)
+{
+	struct file *filp = vma->vm_file;
+
+	nfs_file_flush(filp, (fl_owner_t)filp);
+}
+
 static const struct vm_operations_struct nfs_file_vm_ops = {
 	.fault = filemap_fault,
 	.page_mkwrite = nfs_vm_page_mkwrite,
+	.close = nfs_vm_close,
 };
 
 static int nfs_need_sync_write(struct file *filp, struct inode *inode)

I would like some feedback on what solution is most acceptable, or any
other suggestions.

Many thanks,
Niels
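
The sequence above corresponds to roughly the following test program (a
hypothetical reproducer, not taken from the original report; it assumes
the target file already exists on the NFS mount and is at least one page
long):

/* Hypothetical reproducer: open, mmap, close, modify, munmap on a file
 * that already exists on an NFS mount and is at least one page long. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/mnt/nfs/testfile";
	char *map;
	int fd;

	fd = open(path, O_RDWR);                  /* 1. open()  */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);             /* 2. mmap()  */
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	close(fd);                                 /* 3. close() before modifying */
	memcpy(map, "written after close\n", 20); /* 4. dirty the mapping        */
	munmap(map, 4096);                         /* 5. munmap() without msync() */
	return 0;
}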
* Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-03 15:43 Myklebust, Trond
  To: Niels de Vos
  Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

On Thu, 2012-05-03 at 17:34 +0200, Niels de Vos wrote:
> When an application on an NFS-client (tested with NFSv3) executes the
> following steps, data written after the close() is never flushed to the
> server:
>
> 1. open()
> 2. mmap()
> 3. close()
> 4. <modify data in the mmap'ed area>
> 5. munmap()
>
[...]
> I would like some feedback on what solution is most acceptable, or any
> other suggestions.

Neither solution is acceptable. This isn't a close-to-open cache
consistency issue.

The semantics of mmap() for both block and NFS mounts are the same:
writes are not guaranteed to hit the disk until your application
explicitly calls msync().

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
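
In practical terms, a portable application flushes the mapping itself
before tearing it down, along these lines (an illustrative sketch, not
code from the thread; the helper name is made up):

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Flush a shared mapping explicitly before unmapping it, since munmap()
 * alone gives no guarantee that dirty pages reach the server. */
static int unmap_with_sync(void *map, size_t len)
{
	int ret = 0;

	if (msync(map, len, MS_SYNC) < 0) {   /* synchronously write back dirty pages */
		perror("msync");
		ret = -1;
	}
	if (munmap(map, len) < 0) {
		perror("munmap");
		ret = -1;
	}
	return ret;
}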
* Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-03 17:07 Niels de Vos
  To: Myklebust, Trond
  Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

On 05/03/2012 05:43 PM, Myklebust, Trond wrote:
> Neither solution is acceptable. This isn't a close-to-open cache
> consistency issue.
>
> The semantics of mmap() for both block and NFS mounts are the same:
> writes are not guaranteed to hit the disk until your application
> explicitly calls msync().

Okay, that makes sense. But if the application never calls msync(), and
just munmap()'s the area, when should the changes be written? I did not
expect that unmounting just disregards the data.

Any suggestions on correcting this (if it should be corrected) are
welcome.

Thanks again,
Niels
* Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-03 17:26 Myklebust, Trond
  To: Niels de Vos
  Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

On Thu, 2012-05-03 at 19:07 +0200, Niels de Vos wrote:
> Okay, that makes sense. But if the application never calls msync(), and
> just munmap()'s the area, when should the changes be written? I did not
> expect that unmounting just disregards the data.

That suggests that the VM is failing to dirty the pages on munmap()
before releasing the vma->vm_file. If so, then that would be a VM bug...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
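
What "dirty the pages on munmap()" refers to is, roughly, the following
step in the pte teardown path (a heavily simplified, non-compilable
sketch under that assumption, not the actual mm/memory.c code):

/* Simplified sketch only -- not the real zap_pte_range(). For each
 * present pte of a shared file mapping being unmapped: */
static void zap_pte_sketch(pte_t ptent, struct page *page)
{
	if (pte_dirty(ptent))
		set_page_dirty(page);	/* hand the pte dirty bit to the page cache */
	page_remove_rmap(page);		/* drop the reverse mapping for this pte */
}

If that transfer did not happen, writeback would never see the modified
pages, which is the VM bug Trond is describing.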
* Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-04 16:03 Niels de Vos
  To: Myklebust, Trond
  Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

On 05/03/2012 07:26 PM, Myklebust, Trond wrote:
> That suggests that the VM is failing to dirty the pages on munmap()
> before releasing the vma->vm_file. If so, then that would be a VM bug...

I've checked whether the VM tags the pages as dirty:
- f_ops->release() is called on munmap(). An added printk there shows
  that inode->i_state is set to I_DIRTY_PAGES.
- mapping_tagged(filp->f_mapping, PAGECACHE_TAG_DIRTY) also returns true.

From my understanding this is what the VM is expected to do, and the
pages are marked dirty correctly.

However, nfs_inode->ndirty and nfs_inode->ncommit are both 0. It is
unclear to me how the VM is supposed to interact with the nfs_inode.
Some clarification or a suggestion on what to look into would be much
appreciated.

Cheers,
Niels
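
The checks described above amount to instrumentation of roughly this
shape (a hypothetical debugging printk dropped into nfs_release() for
testing, not a patch that was posted):

/* Hypothetical debug instrumentation, for testing only. */
printk(KERN_DEBUG "nfs_release: ino=%lu i_state=0x%lx dirty_tagged=%d\n",
       inode->i_ino, inode->i_state,
       mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY));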
* Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-04 18:29 Myklebust, Trond
  To: Niels de Vos
  Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

On Fri, 2012-05-04 at 18:03 +0200, Niels de Vos wrote:
> From my understanding this is what the VM is expected to do, and the
> pages are marked dirty correctly.
>
> However, nfs_inode->ndirty and nfs_inode->ncommit are both 0. It is
> unclear to me how the VM is supposed to interact with the nfs_inode.
> Some clarification or a suggestion on what to look into would be much
> appreciated.

The first time the page is touched, it will trigger a ->page_mkwrite(),
which in the case of NFS will set up the necessary tracking structures
to ensure that the page is written out using the correct credentials
etc. In the case of NFSv4, it will also ensure that the file doesn't get
closed on the server until the page is written out to disk.

When the page is cleaned (i.e. something calls clear_page_dirty_for_io()
as part of a write to disk), the call to page_mkclean() is supposed to
re-write-protect the pte, ensuring that any future changes will
re-trigger ->page_mkwrite().

You should be able to check if/when nfs_vm_page_mkwrite() is triggered
by using 'rpcdebug -m nfs -s pagecache' to turn on the NFS page cache
debugging printks.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
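
The write-fault half of that cycle, in sketch form (simplified, not the
exact fs/nfs/file.c implementation; nfs_page_mkwrite_sketch is a made-up
name):

/* Simplified sketch of the first-write fault handling described above. */
static int nfs_page_mkwrite_sketch(struct vm_area_struct *vma, struct page *page)
{
	struct file *filp = vma->vm_file;

	lock_page(page);
	/* Attach the page to an nfs_page request carrying the open context,
	 * so later writeback uses the right credentials (and, for NFSv4,
	 * keeps the server-side open state alive until the data is flushed). */
	nfs_flush_incompatible(filp, page);
	nfs_updatepage(filp, page, 0, nfs_page_length(page));
	return VM_FAULT_LOCKED;	/* page stays locked; the VM unlocks it */
}

Once the relevant dprintk()s are enabled with 'rpcdebug -m nfs -s
pagecache', they can be switched off again with 'rpcdebug -m nfs -c
pagecache'.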
* Re: [RFC] Dataloss on NFS-clients who modify an mmap()'d area after closing the file-descriptor
@ 2012-05-07  8:49 Niels de Vos
  To: Myklebust, Trond
  Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells, Jeffrey Layton

On 05/04/2012 08:29 PM, Myklebust, Trond wrote:
> You should be able to check if/when nfs_vm_page_mkwrite() is triggered
> by using 'rpcdebug -m nfs -s pagecache' to turn on the NFS page cache
> debugging printks.

Many thanks for the explanation! At the moment I'm a little uncertain
where the problem lies, as it no longer occurs with more recent kernels.
There was likely some invalid testing on my side :-/ I think I have to
hunt down what changes were made and how they affect writing to mmap()'d
files. The explanation you gave helps a lot in understanding how NFS
handles all of this.

Thanks again, and sorry for any confusion,
Niels