public inbox for linux-kernel@vger.kernel.org
* NFS client behavior on close
@ 2004-05-31 21:38 Simon Kirby
  2004-06-02  6:55 ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Simon Kirby @ 2004-05-31 21:38 UTC (permalink / raw)
  To: linux-kernel, Trond Myklebust

'lo,

I have a simple script which downloads pictures from my camera's flash
card and writes them to the current directory, which is often on an NFS
mount to a box with available HD space.

I noticed that with files of this size (~5 MB each), there seems to be a
lot of blocking with "cp" (or "mv"), visible as the LED on the card reader
going idle between each file.  What seems to be happening is that "cp"
will open() the source and destination, read and write a bunch of times,
and then close() the destination and source.  An "strace -r" reveals that
on the close() of the destination, "cp" blocks for quite a while. 
Looking at the network, it appears this is happening:

- cp starts copying, data goes to dirty cache (not yet to NFS server)
- cp closes destination
- cp blocks while NFS client writes dirty cache to NFS server
- cp wakes back up when NFS has written all data
- cp proceeds to the next file

This is quite suboptimal: while the file is being read from the
(relatively slow) compact flash reader, nothing is yet being written
over the network, and then between each file the client flushes all of
the data at once.

Is the NFS client required to write all data on close?  If I mount with
"noac", downloads are actually faster because each block is written
immediately and there's nothing to unclog on close().  However, with this
option all other bulk transfers are slower (the network never saturates).

I'm using NFSv3.  Would NFSv4 or another version behave differently here?

2.6.6 server and client, btw.  The server exports with "async", though
exporting with "sync" appears equivalent from the client's point of view.

...
     0.000073 read(3, "\f\0\322\0z\0\200\0\r\0\n\0"..., 4096) = 800
     0.000065 write(4, "\f\0\322\0z\0\200\0\r\0\n\0"..., 800) = 800
     0.000066 read(3, "", 4096)         = 0
     0.000055 close(4)                  = 0
     0.906548 close(3)                  = 0
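The behavior above can be reproduced without strace by timing the close()
calls directly.  A minimal sketch in Python (the function name, buffer
size, and paths are illustrative, not from the original report):

```python
import os
import time

def timed_copy(src_path, dst_path, bufsize=4096):
    """Copy src to dst with a plain read/write loop (like cp),
    returning the time spent in each close()."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    while True:
        buf = os.read(src, bufsize)
        if not buf:
            break
        os.write(dst, buf)

    t0 = time.monotonic()
    os.close(dst)   # on NFS this flushes all dirty pages before returning
    dst_close_secs = time.monotonic() - t0

    t0 = time.monotonic()
    os.close(src)
    src_close_secs = time.monotonic() - t0
    return dst_close_secs, src_close_secs
```

On a local filesystem both close() calls return almost immediately; run
against a destination on an NFS mount, the destination close() absorbs
the whole flush, matching the ~0.9 s gap in the strace output.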

Thanks,

Simon-

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NFS client behavior on close
  2004-05-31 21:38 NFS client behavior on close Simon Kirby
@ 2004-06-02  6:55 ` Trond Myklebust
  2004-06-02 15:41   ` Simon Kirby
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2004-06-02  6:55 UTC (permalink / raw)
  To: Simon Kirby; +Cc: linux-kernel

On Mon, 31/05/2004 at 14:38, Simon Kirby wrote:

> Is the NFS client required to write all data on close?

Yes. That is the basis of the NFSv2/v3 caching model...

Cheers,
  Trond


* Re: NFS client behavior on close
  2004-06-02  6:55 ` Trond Myklebust
@ 2004-06-02 15:41   ` Simon Kirby
  2004-06-02 16:38     ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Simon Kirby @ 2004-06-02 15:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

Hi Trond,

On Tue, Jun 01, 2004 at 11:55:27PM -0700, Trond Myklebust wrote:

> On Mon, 31/05/2004 at 14:38, Simon Kirby wrote:
> 
> > Is the NFS client required to write all data on close?
> 
> Yes. That is the basis of the NFSv2/v3 caching model...

In that case, is there any reason why we would ever want to wait
before sending data to the server, except for a minimal time to allow
merging into wsize blocks?  With no delay, avoiding the write to disk
for temporary files can still happen on the server side (async). 
Mass file writes from a single thread should be faster if the client
write buffering is minimized.

Perhaps there is no way to easily separate the NFS client case from
the normal page cache behavior?
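A userspace approximation of what Simon is asking for (kicking writeback
while the copy is still reading, so the slow card reads overlap with the
network writes) can be sketched like this; the function name and flush
threshold are made up for illustration:

```python
import os
import threading

def copy_with_early_writeback(src_path, dst_path,
                              bufsize=65536, flush_every=1 << 20):
    """Copy src to dst, asking the kernel to flush the destination
    roughly every flush_every bytes from a helper thread, so writeback
    overlaps with reads from the slow source device."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    pending = 0
    flusher = None
    while True:
        buf = os.read(src, bufsize)
        if not buf:
            break
        os.write(dst, buf)
        pending += len(buf)
        if pending >= flush_every:
            pending = 0
            # fdatasync() blocks, so run it in a thread; the main loop
            # keeps reading from the source meanwhile.
            if flusher is None or not flusher.is_alive():
                flusher = threading.Thread(target=os.fdatasync, args=(dst,))
                flusher.start()
    if flusher is not None:
        flusher.join()
    os.close(dst)   # little dirty data left to flush at this point
    os.close(src)
```

This only spreads the flush out in time; it does not change the NFS
client's flush-on-close obligation itself.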

Simon-


* Re: NFS client behavior on close
  2004-06-02 15:41   ` Simon Kirby
@ 2004-06-02 16:38     ` Trond Myklebust
  2004-06-02 19:16       ` Simon Kirby
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2004-06-02 16:38 UTC (permalink / raw)
  To: Simon Kirby; +Cc: linux-kernel

On Wed, 02/06/2004 at 08:41, Simon Kirby wrote:

> In that case, is there any reason why we would ever want to wait
> before sending data to the server, except for a minimal time to allow
> merging into wsize blocks?  With no delay, avoiding the write to disk
> for temporary files can still happen on the server side (async). 

NO! async is a stupidity that was introduced in order to get round the
fact that NFSv2 had no server-side equivalent of the "fsync()" command.
Async breaks O_SYNC writes, fsync(), sync(), ... Most importantly, it
removes all the normal guarantees that clients can recover safely if the
server reboots or crashes.

<RANT>I find it hard to understand how people, who would normally scream
if you told them that "fsync()" on their desktop PC was broken and
didn't actually flush data to disk, can find it quite acceptable as long
as it's "only" their central storage units that are broken in the same
way.</RANT>

In any case, the performance benefit of using "async" should be very
small these days.

> Mass file writes from a single thread should be faster if the client
> write buffering is minimized.

Not necessarily. Consider the case of a random workload in which you
touch the same page more than once. Why then flush those same pages out
to disk more than once?

Cheers,
  Trond


* Re: NFS client behavior on close
  2004-06-02 16:38     ` Trond Myklebust
@ 2004-06-02 19:16       ` Simon Kirby
  2004-06-02 19:45         ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Simon Kirby @ 2004-06-02 19:16 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

On Wed, Jun 02, 2004 at 09:38:28AM -0700, Trond Myklebust wrote:

> On Wed, 02/06/2004 at 08:41, Simon Kirby wrote:
> 
> > In that case, is there any reason why we would ever want to wait
> > before sending data to the server, except for a minimal time to allow
> > merging into wsize blocks?  With no delay, avoiding the write to disk
> > for temporary files can still happen on the server side (async). 
> 
> NO! async is a stupidity that was introduced in order to get round the
> fact that NFSv2 had no server-side equivalent of the "fsync()" command.
> Async breaks O_SYNC writes, fsync(), sync(), ... Most importantly, it
> removes all the normal guarantees that clients can recover safely if the
> server reboots or crashes.

Ok, that makes sense -- if NFSv2 has no fsync(), then using "async" mode
definitely sounds broken.  But is this the same with NFSv3?

> <RANT>I find it hard to understand how people, who would normally scream
> if you told them that "fsync()" on their desktop PC was broken and
> didn't actually flush data to disk, can find it quite acceptable as long
> as it's "only" their central storage units that are broken in the same
> way.</RANT>

I'm of the (probably small) school of thought where I'd rather have my
data disappear than have to wait for all of the stupid uses of sync() and
fsync() in applications everywhere these days.  In fact, I've even
considered writing an SMTP gateway which attempts delivery to the remote
host between the end-of-message marker and the response in order to avoid
having to fsync() to a queue (and still RFC compliant :) ).

Instead, I think applications should be woken up so that they can exit or
reply "OK" once the dirty data has been flushed, overwritten, or toasted,
rather than the application requesting the flush and blocking.  The same
sort of idea, but the other way around.  Maybe fsync() could just change
into more of an "I'd like to participate in the next round of writes"
kind of call.

> Not necessarily. Consider the case of a random workload in which you
> touch the same page more than once. Why then flush those same pages out
> to disk more than once?

Well, if the client sends immediately _and_ the server writes it
instantly to disk, then, yes, that would not be optimal.

NFS should just extend fsync() back to the server -- with minimal caching
on the client, normal write-back caching on the server, and where fsync()
on the client forces the server to write before fsync() returns on the
client.  Forcing this flush to happen on close() doesn't even line up
with local file systems.
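For comparison, on a local filesystem the durability point is indeed
fsync(), not close().  A sketch of the usual pattern (the helper name is
made up; the directory fsync is the standard extra step for making the
new directory entry itself durable):

```python
import os

def durable_write(path, data):
    """Write data to path so that it survives a crash on a local fs:
    fsync the file, then fsync its containing directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)    # durability point: data reaches stable storage
    finally:
        os.close(fd)    # on a local fs, close() implies no flush
    # The directory entry for the new file needs its own fsync.
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Under NFSv2/v3 close-to-open semantics, by contrast, the close() itself
carries the flush, which is exactly the mismatch being discussed.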

Simon-


* Re: NFS client behavior on close
  2004-06-02 19:16       ` Simon Kirby
@ 2004-06-02 19:45         ` Trond Myklebust
  0 siblings, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2004-06-02 19:45 UTC (permalink / raw)
  To: Simon Kirby; +Cc: linux-kernel

On Wed, 02/06/2004 at 12:16, Simon Kirby wrote:

> Ok, that makes sense -- if NFSv2 has no fsync(), then using "async" mode
> definitely sounds broken.  But is this the same with NFSv3?

The problem is that Linux's "async" implementation short-circuits the
NFSv3 fsync() equivalent. Not good!


> NFS should just extend fsync() back to the server -- with minimal caching
> on the client, normal write-back caching on the server, and where fsync()
> on the client forces the server to write before returning on the client. 
> Forcing this to happen on close() doesn't even line up with local file
> systems.

That still leaves room for races with other clients trying to open the
file after the server comes up after a crash, then finding stale data.
(Free|Net|Open)BSD choose to ignore that race, and do the above. I'm not
aware of anybody else doing so, though...

Performance is good, but it should always take second place to data
integrity. There are more than enough people out there who are
entrusting research projects, banking data,... to their NFS server.

Cheers,
  Trond


end of thread, other threads:[~2004-06-02 19:45 UTC | newest]

Thread overview: 6+ messages
2004-05-31 21:38 NFS client behavior on close Simon Kirby
2004-06-02  6:55 ` Trond Myklebust
2004-06-02 15:41   ` Simon Kirby
2004-06-02 16:38     ` Trond Myklebust
2004-06-02 19:16       ` Simon Kirby
2004-06-02 19:45         ` Trond Myklebust
