* NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-12 19:21 UTC
To: trond.myklebust; +Cc: linux-kernel
Hi Trond,
On our 2.6.9-based systems, data written using mmap(MAP_SHARED) on an NFS
client is *never* being pushed out to the server if an explicit msync call
is not issued before the munmap.
On 11/12/04, there was a message thread concerning NFS corruption when
using mmap/munmap:
http://marc.theaimsgroup.com/?l=linux-nfs&m=110028817508318&w=2
In this thread you stated:
  mmap() offers absolutely NO guarantees that the file will be synced to
  disk on close. Use msync(MS_SYNC) if you want such a guarantee.
Are you saying that the data will *never* be written to the server? Could
you please clarify your position on this further?
Thanks!
Linda
* Re: NFS: msync required for data writes to server?
From: Andrew Morton @ 2005-05-13 0:57 UTC
To: linda.dunaphant; +Cc: trond.myklebust, linux-kernel

Linda Dunaphant <linda.dunaphant@ccur.com> wrote:
>
> Are you saying that the data will *never* be written to the server? Could
> you please clarify your position on this further?

The dirty pages will float about in memory until something causes them to
be written back. That "something" could be
msync/fsync/sync/pdflush/journal commit or, eventually, the VM system
deciding that it wants to reuse that physical page for something else.

So yes, the page will eventually be written to the server, but not for
quite some time.

In the case where the page was dirtied by mmap and was then unmapped (via
munmap or via program exit), the page will be marked dirty in pagecache
when its pagetable entry is unmapped. That makes the page's dirtiness
visible to the VFS and the page will be written out approximately 30
seconds later by pdflush.
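Andrew's description means that, without an explicit sync, pages dirtied through a shared mapping are left to background writeback, and Trond's earlier advice was to call msync(MS_SYNC) when a guarantee is needed. The following is a minimal userspace sketch of that advice, assuming a POSIX system; the helper name write_via_mmap_synced() is hypothetical and not part of any library.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `data` into `path` through a
 * MAP_SHARED mapping, then call msync(MS_SYNC) before munmap() so the
 * dirty pages are written back now instead of waiting for background
 * flushing. Returns 0 on success, -1 on error (errno set by the failing call).
 */
static int write_via_mmap_synced(const char *path, const void *data, size_t len)
{
    assert(len > 0); /* a zero-length mmap() fails with EINVAL */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) { /* size the file to fit the data */
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    close(fd); /* the mapping stays valid after the descriptor is closed */

    memcpy(map, data, len);

    /* Request synchronous writeback of the mapped range before unmapping. */
    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) < 0)
        rc = -1;
    return rc;
}
```

Closing the descriptor before writing mirrors the sequence in the test program described later in the thread; POSIX specifies that mmap() holds its own reference to the file, which close() does not drop, so the file stays referenced until munmap().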
* Re: NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-13 2:21 UTC
To: Andrew Morton; +Cc: trond.myklebust, linux-kernel

On Thu, 2005-05-12 at 20:57, Andrew Morton wrote:
> In the case where the page was dirtied by mmap and was then unmapped (via
> munmap or via program exit), the page will be marked dirty in pagecache
> when its pagetable entry is unmapped. That makes the page's dirtiness
> visible to the VFS and the page will be written out approximately 30
> seconds later by pdflush.

Thank you for responding, Andrew!

The behavior that you describe is what I expected to see. However, with
my test program that mmap's the NFS file on the client, writes the data,
and then munmap's the file, this wasn't the case with the 2.6.9 based
kernel I was using. The file data was NEVER being written to the server.

This afternoon I downloaded and built several later kernels. I found
that with 2.6.11, the problem still occurred. With 2.6.12-rc1, the
problem did not occur. I could see the proper data on the server.

Looking at the differences in fs/nfs between these trees I found a
change to nfs_file_release() in fs/nfs/file.c. When I applied this
change to my 2.6.9 tree, the data was written out to the server.

@@ -105,6 +108,9 @@
 static int
 nfs_file_release(struct inode *inode, struct file *filp)
 {
+	/* Ensure that dirty pages are flushed out with the right creds */
+	if (filp->f_mode & FMODE_WRITE)
+		filemap_fdatawrite(filp->f_mapping);
 	return NFS_PROTO(inode)->file_release(inode, filp);
 }

Thanks for your help!
Linda
* Re: NFS: msync required for data writes to server?
From: Andrew Morton @ 2005-05-13 2:42 UTC
To: linda.dunaphant; +Cc: trond.myklebust, linux-kernel

Linda Dunaphant <linda.dunaphant@ccur.com> wrote:
>
> The behavior that you describe is what I expected to see. However, with
> my test program that mmap's the NFS file on the client, writes the data,
> and then munmap's the file, this wasn't the case with the 2.6.9 based
> kernel I was using. The file data was NEVER being written to the server.

There's something very wrong with that.

> This afternoon I downloaded and built several later kernels. I found
> that with 2.6.11, the problem still occurred. With 2.6.12-rc1, the
> problem did not occur. I could see the proper data on the server.
>
> Looking at the differences in fs/nfs between these trees I found a
> change to nfs_file_release() in fs/nfs/file.c. When I applied this
> change to my 2.6.9 tree, the data was written out to the server.
>
> @@ -105,6 +108,9 @@
>  static int
>  nfs_file_release(struct inode *inode, struct file *filp)
>  {
> +	/* Ensure that dirty pages are flushed out with the right creds */
> +	if (filp->f_mode & FMODE_WRITE)
> +		filemap_fdatawrite(filp->f_mapping);
>  	return NFS_PROTO(inode)->file_release(inode, filp);
>  }

Well yes, that'll sync the file on close, but it doesn't explain the
original problem.
* Re: NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-13 3:41 UTC
To: Andrew Morton; +Cc: trond.myklebust, linux-kernel

On Thu, 2005-05-12 at 22:42, Andrew Morton wrote:
> Well yes, that'll sync the file on close, but it doesn't explain the
> original problem.

The original problem that I was trying to track down occurred with an Ada
test suite. During the initialization phase it creates several data files
with mmap(MAP_SHARED) that contain information used by later phases of the
test. We were getting failures in random tests because the testsuite
running on the client was occasionally reading nulls from these files.
Using Ethereal to trace several test runs (~2.5 hrs for a complete run), I
never saw any writes for these files being issued to the server.

My original theory was that as long as the pages associated with the file
were still in memory, the data was correct for applications running on the
client; but if a page was dropped without being written to the server and
someone referenced that offset later, the data would be reread from the
server and nulls would be returned.

After I found Trond's statement that msync was required to flush the data
to the server, I created a small test program that creates a file,
ftruncates it to 16384 bytes, mmaps it, closes it, writes the data, then
munmaps it. With the 2.6.9-based kernel, I never saw the correct data on
the server, even after waiting over an hour. If I looked at the file from
the client, I saw the correct data. I also tried unmounting the filesystem
to see if that would flush the data, but it never did.

We also tried adding msync calls to the testsuite, and the original problem
went away. That is why I originally asked whether an msync is required
when the file is on NFS.

Tomorrow we will try running the testsuite with the above change in place
to see if it fixes the original problem. I suspect the potential still
exists for a page to be dropped before the file release, which could still
cause incorrect data to be read back from the server.

Linda
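The small test program Linda describes can be sketched roughly as follows, assuming POSIX APIs. The function names, the fill pattern, and the idea of counting NUL bytes on read-back are illustrative assumptions, not her actual program. The intent is to run make_reproducer_file() on the NFS client and count_nul_bytes() against the same file on the server; the thread reports that on an affected 2.6.9-based client the server copy never received the data.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define REPRO_SIZE 16384 /* file size used in the test described above */

/*
 * Hypothetical reproducer: create the file, ftruncate() it, mmap() it
 * MAP_SHARED, close() the descriptor, write through the mapping, then
 * munmap() WITHOUT calling msync(). Returns 0 on success, -1 on error.
 */
static int make_reproducer_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, REPRO_SIZE) < 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, REPRO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    close(fd); /* the mapping keeps the file referenced */

    /* Fill with a non-zero pattern so lost writes show up as NUL bytes. */
    for (size_t i = 0; i < REPRO_SIZE; i++)
        map[i] = (char)('A' + i % 26);

    return munmap(map, REPRO_SIZE); /* deliberately no msync() */
}

/*
 * Read the file back with read() and count NUL bytes. A non-zero result
 * matches the "reading nulls" symptom. Returns -1 on error.
 */
static long count_nul_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char buf[4096];
    long zeros = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\0')
                zeros++;
    close(fd);
    return n < 0 ? -1 : zeros;
}
```

On a local filesystem both calls run on the same page cache, so the count is 0; the discrepancy only appears when the read-back happens on a machine that never received the client's dirty pages.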
* Re: NFS: msync required for data writes to server?
From: Trond Myklebust @ 2005-05-13 3:48 UTC
To: Andrew Morton; +Cc: linda.dunaphant, linux-kernel

On Thu 12.05.2005 at 19:42 (-0700), Andrew Morton wrote:
> > Looking at the differences in fs/nfs between these trees I found a
> > change to nfs_file_release() in fs/nfs/file.c. When I applied this
> > change to my 2.6.9 tree, the data was written out to the server.
> >
> > @@ -105,6 +108,9 @@
> >  static int
> >  nfs_file_release(struct inode *inode, struct file *filp)
> >  {
> > +	/* Ensure that dirty pages are flushed out with the right creds */
> > +	if (filp->f_mode & FMODE_WRITE)
> > +		filemap_fdatawrite(filp->f_mapping);
> >  	return NFS_PROTO(inode)->file_release(inode, filp);
> >  }
>
> Well yes, that'll sync the file on close, but it doesn't explain the
> original problem.

See the above comment and the changelog entry for that patch.

The problem is and was that writepage() does not take a struct file
argument, and so we have to guess which RPC credentials to use when
writing out the dirty pages.

Before we had RPCSEC_GSS, we could cache the credential of the last person
who opened the file, and expect that it would be usable for writing out
dirty pages, since AUTH_SYS credentials have an infinite lifetime. With the
advent of strong security, and credentials with a finite lifetime, that
became risky behaviour, so we now actually look for which files are still
open at the moment when writepage() is called.

The problem was that the VFS does not flush dirty pages to disk on
file_release(); hence the above patch, which at least does so for NFS.

Cheers,
  Trond
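Trond's explanation can be illustrated with a deliberately simplified toy model. This is hypothetical userspace code, not the NFS client: the struct names, the uid standing in for an RPC credential, and the immediate "flush" are all assumptions made for illustration. writepage() receives no file, so it must borrow a credential from a file that is still open for writing; once the last such file is released, dirty pages have no credential to be written with unless the release path flushes them first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model (NOT kernel code) of the credential problem. */
#define TOY_MAX_FILES 4

struct toy_file {
    int  cred_uid; /* stands in for the opener's RPC credential */
    bool writable; /* stands in for FMODE_WRITE */
    bool open;
};

struct toy_inode {
    struct toy_file files[TOY_MAX_FILES];
    size_t nfiles;
    int dirty_pages; /* pages dirtied through a shared mapping */
};

/* Return the uid of any still-open writable file, or -1 if there is none. */
static int toy_find_write_cred(const struct toy_inode *ino)
{
    for (size_t i = 0; i < ino->nfiles; i++)
        if (ino->files[i].open && ino->files[i].writable)
            return ino->files[i].cred_uid;
    return -1;
}

/* "Write out" all dirty pages, but only if a usable credential exists. */
static bool toy_writepage(struct toy_inode *ino)
{
    if (toy_find_write_cred(ino) < 0)
        return false; /* no credential: the pages stay dirty */
    ino->dirty_pages = 0;
    return true;
}

/*
 * Release one file. With flush_on_release set, mimic the patch by starting
 * writeback while this file's credential is still available. Simplification:
 * the toy flush completes immediately, whereas filemap_fdatawrite() only
 * starts the I/O.
 */
static void toy_release(struct toy_inode *ino, size_t idx, bool flush_on_release)
{
    if (flush_on_release && ino->files[idx].writable)
        toy_writepage(ino);
    ino->files[idx].open = false;
}
```

In this model, releasing the only writable file without the flush leaves dirty_pages stranded and every later toy_writepage() call fails, which corresponds to the pages Linda never saw reach the server; with the flush, the pages are written while a credential still exists.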
* Re: NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-13 23:57 UTC
To: Trond Myklebust, Andrew Morton; +Cc: linux-kernel

Trond & Andrew,

Thank you both so much for your help! The change fixed the problem we were
having with the testsuite with mmap of NFS files. We had 3 successful test
runs today with the fix in place.

I found the changeset entry for this fix:

ChangeSet@1.2181.9.9, 2005-03-15 19:44:28-08:00, trond.myklebust@fys.uio.no
  [PATCH] NFS: Ensure that dirty pages are written with the right creds.

  When doing shared mmap writes, the resulting dirty NFS pages may find
  themselves incapable of being flushed out if I/O is started after the
  file was released.

  Make sure we start I/O on all existing dirty pages in nfs_file_release().

Cheers!
Linda