* nfs/mmap/rename file corruption
@ 2003-08-27  8:02 Martin Pool
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Pool @ 2003-08-27  8:02 UTC (permalink / raw)
  To: nfs

There is a fairly easily reproducible bug in NFS in 2.4.22 that can
cause files to read back as full of nulls.  I have a tcpdump that
shows what is going wrong. 

Gavrie Philipson reported corruption happening when distcc and ccache
are used together with the cache on NFS.

  http://lists.samba.org/pipermail/distcc/2003q3/001556.html

To reproduce the bug you just need to install ccache 2.2 and distcc
2.10.1.  Set CCACHE_DIR to an empty directory on an NFS filesystem
mounted with default/rw options.  Build a file with a command like
this:

  ccache distcc -c ./hello.c

The first (only the first) time that you run this, the output file
(hello.o) will be the correct size, but contain only \0 bytes.

What is basically happening here is:

 - ccache runs distcc with output to a temporary file
 - distcc opens, mmaps, writes to, munmaps, and closes the temporary
   file
 - distcc exits
 - ccache renames the temporary file to its proper location in the
   ccache
 - ccache opens the file read only, and reads from it

ccache ought to see the proper contents as written by mmap, but when
the cache is on NFS it just sees \0s.  It works correctly and reliably
on reiserfs and ext3.  However, if you look a second later at the
file ccache was trying to read, it seems to have the right contents.

I tried writing a standalone test case but I couldn't reproduce it,
perhaps because of some timing issue.  It is quite reproducible both
on my machine and Gavrie's.
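
The sequence described above can be sketched in a single process as
follows.  This is a minimal Python sketch, not the code distcc or
ccache actually run: the directory and file names are hypothetical,
and in the real case the writer and the reader are separate
processes.  A standalone test along these lines may well not trigger
the problem.

```python
import mmap
import os


def write_via_mmap(path: str, payload: bytes) -> None:
    """Create `path` and fill it through a shared mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, len(payload))          # size the file before mapping
        with mmap.mmap(fd, len(payload)) as m:  # shared mapping by default
            m[:] = payload                      # dirty the pages, no explicit sync
    finally:
        os.close(fd)


def rename_and_read(cache_dir: str, payload: bytes) -> bytes:
    """Write a temporary file via mmap, rename it, then read it back."""
    # Hypothetical layout: temporary and final files in different directories.
    tmp_dir = os.path.join(cache_dir, "tmp")
    final_dir = os.path.join(cache_dir, "c")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(final_dir, exist_ok=True)

    tmp_path = os.path.join(tmp_dir, "tmp.hash.o")
    final_path = os.path.join(final_dir, "object")

    write_via_mmap(tmp_path, payload)
    os.rename(tmp_path, final_path)
    with open(final_path, "rb") as f:
        return f.read()
```

Pointing `cache_dir` at a directory on the NFS mount and comparing
the returned bytes with `payload` shows whether the read came back
as nulls.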

If distcc is configured to not use mmap for writing, the problem is
hidden.  

A tcpdump of the problem is available here:

  http://distcc.samba.org/ftp/distcc/misc/mmap-bug/nfs-20030827T1351.pcap.gz

Here are the significant bits:

frame 79

   renames tmp.hash.vexed.7897.o to the final object filename,
   cbfc5ca42b1a693a5bca9bb8b23c5b-17387

frame 105 (also frame 107)

   looks up a filehandle for the final object filename, and gets the
   hash 0xed8222404

frame 115

   reads back from the final object file, 0xed8222404

frame 116

   is the reply to the read and it is full of nulls

frame 127 

   writes the ELF output into the temporary object file,
   tmp.hash.vexed.7897.o, which has file hash 0xf27c2204.

The problem is that the NFS client tries to read from the destination
file before it has written to the temporary file!  Frame 127 is far
too late.
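
The ordering problem can be checked mechanically.  The sketch below
simply transcribes the frames listed above into a Python list (the
operation labels are informal names chosen here, not tcpdump output)
and confirms that the first WRITE comes after the first READ.

```python
# (frame number, operation, file hash) transcribed from the list above.
TRACE = [
    (79, "RENAME", None),
    (105, "LOOKUP", "0xed8222404"),
    (107, "LOOKUP", "0xed8222404"),
    (115, "READ", "0xed8222404"),
    (116, "READ reply (nulls)", "0xed8222404"),
    (127, "WRITE", "0xf27c2204"),
]


def first_frame(trace, op):
    """Return the frame number of the first entry whose operation is `op`."""
    return next(frame for frame, name, _ in trace if name == op)


def write_is_late(trace):
    """True if the data is written only after it has already been read."""
    return first_frame(trace, "WRITE") > first_frame(trace, "READ")
```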

It seems to me like there are two possible solutions: either flush out
all cached data for a file before it's renamed, or make the rename
smart enough to 'take over' any data cached under an old name.  To me
the first seems more robust if a little slower.
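
For illustration only, the first option has a rough application-level
analogue: sync the mapping and the file descriptor before the rename.
The Python sketch below is an assumption about what such a workaround
could look like in user space, with hypothetical paths; it is not the
kernel change proposed here, and it has not been tested against this
bug.

```python
import mmap
import os


def write_flush_rename(tmp_path: str, final_path: str, payload: bytes) -> None:
    """Write via mmap, push the data out explicitly, then rename."""
    fd = os.open(tmp_path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as m:
            m[:] = payload
            m.flush()   # msync(): write dirty mapped pages back to the file
        os.fsync(fd)    # ask for the file data to be committed before close
    finally:
        os.close(fd)
    os.rename(tmp_path, final_path)
```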

You can see something similar going on in this NFS log:

  http://distcc.samba.org/ftp/distcc/misc/mmap-bug/nfsdebug-20030827T1609.log.gz

The flush(b/49777) call comes long after the rename and the attempt to
read from the new file.

I'll try to draft a patch for this.

-- 
Martin


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: nfs/mmap/rename file corruption
  2003-08-28  1:03 Martin Pool
@ 2003-08-28  1:37 ` Trond Myklebust
  2003-08-28  2:14   ` Martin Pool
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2003-08-28  1:37 UTC (permalink / raw)
  To: Martin Pool; +Cc: nfs

>>>>> " " == Martin Pool <mbp@sourcefrog.net> writes:

     > - ccache runs distcc with output to a temporary file
     > - distcc opens, mmaps, writes to, munmaps, and closes the
     >   temporary file
     > - distcc exits
     > - ccache renames the temporary file to its proper location in
     >   the ccache
     > - ccache opens the file read only, and reads from it

Is this a rename from one directory to the other? If so, are you using
the 'no_subtree_check' option on the server? Without the latter option
enabled, I would indeed expect the behaviour that you describe.

Cheers,
  Trond



* Re: nfs/mmap/rename file corruption
  2003-08-28  1:37 ` Trond Myklebust
@ 2003-08-28  2:14   ` Martin Pool
  2003-08-28 14:04     ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Pool @ 2003-08-28  2:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: nfs

On 27 Aug 2003 21:37:38 -0400
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

Thanks for responding so quickly!

> >>>>> " " == Martin Pool <mbp@sourcefrog.net> writes:
> 
>      > - ccache runs distcc with output to a temporary file
>      > - distcc opens, mmaps, writes to, munmaps, and closes the
>      >   temporary file
>      > - distcc exits
>      > - ccache renames the temporary file to its proper location in
>      >   the ccache
>      > - ccache opens the file read only, and reads from it
> 
> Is this a rename from one directory to the other? 

Yes.

> If so, are you using
> the 'no_subtree_check' option on the server? 

No, I was not.  It happens that the filesystem is exported at its
root.

The manpage from Debian's nfs-kernel-server 1:1.0.3-2 says

    In order to perform this check, the  server  must  include  some
    information  about  the location of the file in the "filehandle"
    that is given to the  client.   This  can  cause  problems  with
    accessing  files  that  are renamed while a client has them open
    (though in many simple cases it will still work).

In this case, the file is not still open at the time it is renamed.
It just still has some dirty pages in the client's memory.

When I set this option on the server then the renamed file gets the
same filehandle and things work properly.  I'll suggest to the user
that they should set it.
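
As an illustration, an export entry with the option set might look
like the fragment below.  The exported path and the client pattern
are hypothetical; only the no_subtree_check option comes from this
thread.

```
# /etc/exports on the server (hypothetical path and client pattern)
/srv/ccache   *.example.com(rw,no_subtree_check)
```

After editing the file, the export table would normally be reloaded
on the server, for example with `exportfs -ra`.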

> Without the latter option enabled, I would indeed expect the
> behaviour that you describe.

It seems a bit unfortunate that we can get corruption unless a special
option is set.  Will it work on non-Linux nfs servers?

Wouldn't it still be possible to get the client to flush data out
before renaming it?  I tried naively calling nfs_flush_file before
renaming but that didn't seem to do it.

-- 
Martin 



* Re: nfs/mmap/rename file corruption
  2003-08-28  2:14   ` Martin Pool
@ 2003-08-28 14:04     ` Trond Myklebust
  0 siblings, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2003-08-28 14:04 UTC (permalink / raw)
  To: Martin Pool; +Cc: Trond Myklebust, nfs

>>>>> " " == Martin Pool <mbp@sourcefrog.net> writes:


     >     In order to perform this check, the server must include
     >     some information about the location of the file in the
     >     "filehandle" that is given to the client.  This can cause
     >     problems with accessing files that are renamed while a
     >     client has them open (though in many simple cases it will
     >     still work).

     > In this case, the file is not still open at the time it is
     > renamed.  It just still has some dirty pages in the client's
     > memory.

That is the same as being 'open' for the purposes of the above
paragraph.

    >> Without the latter option enabled, I would indeed expect the
    >> behaviour that you describe.

     > It seems a bit unfortunate that we can get corruption unless a
     > special option is set.  Will it work on non-Linux nfs servers?

Yes. 'subtree checking' is a Linux-only concept. Most servers just
open their files by inode number and leave it at that.
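
A toy model may make the difference concrete.  The Python sketch
below assumes, purely for illustration, that a filehandle is the
inode number plus (with subtree checking) the inode of the parent
directory; the real filehandle layout is not described in this
thread, and the inode numbers are made up.

```python
from typing import NamedTuple, Optional


class FileHandle(NamedTuple):
    inode: int
    parent_inode: Optional[int]  # only filled in when subtree checking is on


def make_handle(inode: int, parent_inode: int, subtree_check: bool) -> FileHandle:
    """Build the toy filehandle a server would hand to the client."""
    return FileHandle(inode, parent_inode if subtree_check else None)


def handle_survives_rename(inode: int, old_parent: int, new_parent: int,
                           subtree_check: bool) -> bool:
    """True if the handle issued before the rename equals the one issued after."""
    before = make_handle(inode, old_parent, subtree_check)
    after = make_handle(inode, new_parent, subtree_check)
    return before == after
```

In this model a rename across directories changes the handle only
when location information is embedded in it, which matches the
observation that the renamed file keeps its filehandle once
no_subtree_check is set.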

     > Wouldn't it still be possible to get the client to flush data
     > out before renaming it?  I tried naively calling nfs_flush_file
     > before renaming but that didn't seem to do it.

mmap() is 'special': there are all sorts of silly races possible, and
many of them appear to lie deep in the mm layer. If you want to
trigger some really nasty ones, try playing with mmap() +
truncate()....

In principle, you could get what you want by calling the combination

filemap_fdatasync(inode->i_mapping);
nfs_wb_all(inode);
filemap_fdatawait(inode->i_mapping);

like we do in the file locking code. However that too appears to be
race prone due to races with the swap code.

In any case, doctoring the client in order to get around a bug in the
server is not usually my first choice...

Cheers,
  Trond

