public inbox for linux-kernel@vger.kernel.org
* NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-12 19:21 UTC
  To: trond.myklebust; +Cc: linux-kernel

Hi Trond,

On our 2.6.9-based systems, data written using mmap(MAP_SHARED) on an NFS
client is *never* pushed out to the server if an explicit msync call is
not issued before the munmap.

On 11/12/04, there was a message thread concerning NFS corruption when
using mmap/munmap:

http://marc.theaimsgroup.com/?l=linux-nfs&m=110028817508318&w=2

In this thread you stated:

     mmap() offers absolutely NO guarantees that the file will be synced to
     disk on close. Use msync(MS_SYNC) if you want such a guarantee.

Are you saying that the data will *never* be written to the server?  Could
you please clarify your position on this further? 

Thanks!
Linda



* Re: NFS: msync required for data writes to server?
From: Andrew Morton @ 2005-05-13  0:57 UTC
  To: linda.dunaphant; +Cc: trond.myklebust, linux-kernel

Linda Dunaphant <linda.dunaphant@ccur.com> wrote:
> Are you saying that the data will *never* be written to the server?  Could
> you please clarify your position on this further? 

The dirty pages will float about in memory until something causes them to
be written back.  That "something" could be
msync/fsync/sync/pdflush/journal commit or, eventually, the VM system
deciding that it wants to reuse that physical page for something else.

So yes, the page will eventually be written to the server, but not for
quite some time.

In the case where the page was dirtied by mmap and was then unmapped (via
munmap or via program exit), the page will be marked dirty in pagecache
when its pagetable entry is unmapped.  That makes the page's dirtiness
visible to the VFS and the page will be written out approximately 30
seconds later by pdflush.
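An application that needs the data on the server at a known point, rather than whenever pdflush or memory pressure gets to it, can force write-back itself with msync(MS_SYNC) before unmapping, as Trond suggested. The following is a minimal sketch under stated assumptions: the helper name write_via_mmap_synced is hypothetical, the file must already be at least len bytes long, and len must be greater than zero (mmap() rejects a zero length). Keeping the descriptor open until after munmap() is a conservative choice, not a requirement stated in this thread.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes (len > 0) from `buf` into the
 * start of the existing file `path` through a MAP_SHARED mapping, then
 * force write-back with msync(MS_SYNC) before unmapping.  The file must
 * already be at least `len` bytes long.  Returns 0 on success, -1 on error. */
static int write_via_mmap_synced(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, buf, len);  /* dirties the mapped pages */

    /* Without msync(), the dirty pages are written back only when
     * something else triggers it (fsync, sync, pdflush, memory pressure). */
    int rc = msync(p, len, MS_SYNC);

    if (munmap(p, len) != 0)
        rc = -1;
    if (close(fd) != 0)   /* descriptor kept open until after the flush */
        rc = -1;
    return rc;
}
```

The return value matters: msync(MS_SYNC) is where a write-back error would be reported to the caller, whereas background write-back by pdflush has no caller to report to.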

* Re: NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-13  2:21 UTC
  To: Andrew Morton; +Cc: trond.myklebust, linux-kernel

On Thu, 2005-05-12 at 20:57, Andrew Morton wrote:
> The dirty pages will float about in memory until something causes them to
> be written back.  That "something" could be
> msync/fsync/sync/pdflush/journal commit or, eventually, the VM system
> deciding that it wants to reuse that physical page for something else.
> 
> So yes, the page will eventually be written to the server, but not for
> quite some time.
> 
> In the case where the page was dirtied by mmap and was then unmapped (via
> munmap or via program exit), the page will be marked dirty in pagecache
> when its pagetable entry is unmapped.  That makes the page's dirtiness
> visible to the VFS and the page will be written out approximately 30
> seconds later by pdflush.

Thank you for responding, Andrew!

The behavior you describe is what I expected to see. However, with my
test program, which mmaps the NFS file on the client, writes the data,
and then munmaps the file, that was not the case on the 2.6.9-based
kernel I was using: the file data was NEVER written to the server.

This afternoon I downloaded and built several later kernels. I found
that with 2.6.11 the problem still occurred, but with 2.6.12-rc1 it did
not: I could see the correct data on the server.

Looking at the differences in fs/nfs between these trees, I found a
change to nfs_file_release() in fs/nfs/file.c. When I applied this
change to my 2.6.9 tree, the data was written out to the server.

@@ -105,6 +108,9 @@
 static int
 nfs_file_release(struct inode *inode, struct file *filp)
 {
+       /* Ensure that dirty pages are flushed out with the right creds */
+       if (filp->f_mode & FMODE_WRITE)
+               filemap_fdatawrite(filp->f_mapping);
        return NFS_PROTO(inode)->file_release(inode, filp);
 }

Thanks for your help!
Linda


* Re: NFS: msync required for data writes to server?
From: Andrew Morton @ 2005-05-13  2:42 UTC
  To: linda.dunaphant; +Cc: trond.myklebust, linux-kernel

Linda Dunaphant <linda.dunaphant@ccur.com> wrote:
>
> The behavior that you describe is what I expected to see. However, with
>  my test program that mmap's the NFS file on the client, writes the data,
>  and then munmap's the file, this wasn't the case with the 2.6.9 based
>  kernel I was using. The file data was NEVER being written to the server.

There's something very wrong with that.

>  This afternoon I downloaded and built several later kernels. I found
>  that with 2.6.11, the problem still occurred. With 2.6.12-rc1, the
>  problem did not occur. I could see the proper data on the server. 
> 
>  Looking at the differences in fs/nfs between these trees I found a
>  change to nfs_file_release() in fs/nfs/file.c. When I applied this
>  change to my 2.6.9 tree, the data was written out to the server.
> 
>  @@ -105,6 +108,9 @@
>   static int
>   nfs_file_release(struct inode *inode, struct file *filp)
>   {
>  +       /* Ensure that dirty pages are flushed out with the right creds */
>  +       if (filp->f_mode & FMODE_WRITE)
>  +               filemap_fdatawrite(filp->f_mapping);
>          return NFS_PROTO(inode)->file_release(inode, filp);
>   }

Well yes, that'll sync the file on close, but it doesn't explain the
original problem.


* Re: NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-13  3:41 UTC
  To: Andrew Morton; +Cc: trond.myklebust, linux-kernel

On Thu, 2005-05-12 at 22:42, Andrew Morton wrote:
> Well yes, that'll sync the file on close, but it doesn't explain the
> original problem.
> 

The original problem I was trying to track down occurred with an Ada
test suite. During its initialization phase, it creates several data
files with mmap(MAP_SHARED) that contain information used by later
phases of the test. We were getting failures in random tests because
the test suite running on the client occasionally read nulls from these
files. Using Ethereal to trace several test runs (about 2.5 hours per
complete run), I never saw any writes for these files being issued to
the server. My original theory was that as long as the pages associated
with the file were still in memory, the data was correct for
applications running on the client. But if a page was dropped without
being written to the server and someone later referenced that offset,
the data would be reread from the server and nulls would be returned.
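One way to check for the symptom described above is to scan the file for NUL bytes, for example on the server itself, where the client's page cache cannot mask missing writes. This is a minimal sketch; the name first_nul_offset is hypothetical, and it assumes the expected file contents contain no NUL bytes, so any NUL found suggests a region that never reached the server.

```c
#include <stdio.h>

/* Hypothetical checker: return the offset of the first NUL byte in the
 * file at `path`, -1 if there is none, or -2 if the file cannot be read. */
static long first_nul_offset(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -2;

    long off = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == 0) {
            fclose(f);
            return off;
        }
        off++;
    }

    int err = ferror(f);  /* distinguish a read error from end of file */
    fclose(f);
    return err ? -2 : -1;
}
```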

After I found Trond's statement that msync was required to flush the
data to the server, I created a small test program that creates a file,
ftruncates it to 16384 bytes, mmaps it, closes it, writes the data, and
then munmaps it. With the 2.6.9-based kernel, I never saw the correct
data on the server, even after waiting over an hour. Looking at the file
from the client, I saw the correct data. I also tried unmounting the
filesystem to see whether that would flush the data, but it never did.
We also tried adding msync calls to the test suite, and the original
problem went away. This is why I originally asked whether msync is
required when the file is on NFS.
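The test sequence described above can be sketched in C as follows. The original program was not posted, so the function name reproduce_sequence and the choice to fill every byte with a single value are assumptions; only the order of calls and the 16384-byte size come from the description. Note that no msync() is issued, so on a correctly behaving kernel the data reaches the server only through later background write-back.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TEST_SIZE 16384  /* file size used in the test described above */

/* Hypothetical reproduction: create, ftruncate, mmap, close, write through
 * the mapping, munmap.  Writing `fill` to every byte is an assumption; the
 * original test data was not posted.  Returns 0 on success, -1 on error. */
static int reproduce_sequence(const char *path, unsigned char fill)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, TEST_SIZE) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    /* Close before writing, as in the test; the mapping holds its own
     * reference to the open file, so it stays valid. */
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memset(p, fill, TEST_SIZE);   /* "writes the data" */
    return munmap(p, TEST_SIZE);  /* no msync() beforehand */
}
```

Run against a path on an NFS mount, the interesting observation is on the server side (or in a packet trace), not the return value, since the client's page cache will show the new data either way.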

Tomorrow we will run the test suite with the above change in place to
see whether it fixes the original problem. I suspect a page could still
be dropped before the file is released, which would still cause
incorrect data to be read back from the server.

Linda


* Re: NFS: msync required for data writes to server?
From: Trond Myklebust @ 2005-05-13  3:48 UTC
  To: Andrew Morton; +Cc: linda.dunaphant, linux-kernel

On Thu, 12.05.2005 at 19:42 (-0700), Andrew Morton wrote:
> >  Looking at the differences in fs/nfs between these trees I found a
> >  change to nfs_file_release() in fs/nfs/file.c. When I applied this
> >  change to my 2.6.9 tree, the data was written out to the server.
> > 
> >  @@ -105,6 +108,9 @@
> >   static int
> >   nfs_file_release(struct inode *inode, struct file *filp)
> >   {
> >  +       /* Ensure that dirty pages are flushed out with the right creds */
> >  +       if (filp->f_mode & FMODE_WRITE)
> >  +               filemap_fdatawrite(filp->f_mapping);
> >          return NFS_PROTO(inode)->file_release(inode, filp);
> >   }
> 
> Well yes, that'll sync the file on close, but it doesn't explain the
> original problem.

See the above comment and the changelog entry for that patch. The
problem is and was that writepage() does not take a struct file
argument, and so we have to guess which RPC credentials to use when
writing out the dirty pages.

Before we had RPCSEC_GSS, we could cache the credential of the last
person who opened the file, and expect that it would be usable for
writing out dirty pages since the AUTH_SYS credentials have an infinite
lifetime. With the advent of strong security, and credentials with a
finite lifetime, that became risky behaviour, and so we now actually
look for which files are still open at the moment when writepage is
called.

The problem was that the VFS does not flush dirty pages to disk on
file_release(); hence the above patch, which at least does so for
NFS.
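Given this explanation, a user-space program that cannot rely on a kernel carrying the patch might keep its own descriptor open across the whole write-back: write through the mapping, munmap() (which, per Andrew's description, makes the pages' dirtiness visible in the page cache), then fsync() the still-open descriptor before close(). This is a minimal sketch; the helper name mmap_write_then_fsync is hypothetical, len must be greater than zero, and whether this ordering avoids the credential problem on a particular unpatched kernel is an assumption that was not tested in this thread.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of `path` with `len` bytes
 * (len > 0) from `buf`, writing through a MAP_SHARED mapping.  The
 * descriptor stays open until the data has been fsync()ed, so an open
 * file exists for the whole write-back.  Returns 0 on success, -1 on error. */
static int mmap_write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, buf, len);
            /* On kernels of this era, unmapping transfers the dirty state
             * from the page tables to the page cache ... */
            if (munmap(p, len) == 0)
                rc = fsync(fd);  /* ... and fsync() writes it back while
                                    fd is still open. */
        }
    }

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```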

Cheers,
  Trond


* Re: NFS: msync required for data writes to server?
From: Linda Dunaphant @ 2005-05-13 23:57 UTC
  To: Trond Myklebust, Andrew Morton; +Cc: linux-kernel

Trond & Andrew,

Thank you both so much for your help!

The change fixed the problem we were having with the test suite and mmap
of NFS files. We had 3 successful test runs today with the fix in place.
I found the changeset entry for this fix:

ChangeSet@1.2181.9.9, 2005-03-15 19:44:28-08:00, trond.myklebust@fys.uio.no
  [PATCH] NFS: Ensure that dirty pages are written with the right creds.

   When doing shared mmap writes, the resulting dirty NFS pages may
   find themselves incapable of being flushed out if I/O is started
   after the file was released.

   Make sure we start I/O on all existing dirty pages in nfs_file_release().

Cheers!
Linda

