public inbox for linux-xfs@vger.kernel.org
* Long sleep with i_mutex in xfs_flush_device(), affects NFS service
@ 2006-09-26 18:51 Stephane Doyon
  2006-09-26 19:06 ` [NFS] " Trond Myklebust
  2006-09-27 11:33 ` Shailendra Tripathi
  0 siblings, 2 replies; 17+ messages in thread
From: Stephane Doyon @ 2006-09-26 18:51 UTC (permalink / raw)
  To: xfs, nfs

Hi,

I'm seeing an unpleasant behavior when an XFS file system becomes full, 
particularly when accessed over NFS. Both XFS and the linux NFS client 
appear to be contributing to the problem.

When the file system becomes nearly full, we eventually call down to 
xfs_flush_device(), which sleeps for 0.5 seconds, waiting for xfssyncd to 
do some work.

xfs_flush_space() does
         xfs_iunlock(ip, XFS_ILOCK_EXCL);
before calling xfs_flush_device(), but i_mutex is still held, at least 
when we're being called from under xfs_write(). That seems like a fairly 
long time to hold a mutex, and I wonder whether it's really necessary to 
keep going through all of that again and again for every new request 
after we've hit ENOSPC.

In particular this can cause a pileup when several threads are writing 
concurrently to the same file. Some specialized apps might do that, and 
nfsd threads do it all the time.

To reproduce locally, on an already full file system:
#!/bin/sh
# fire off 30 concurrent 1-byte writes to the same file; each one
# in turn waits out the 0.5s sleep in xfs_flush_device()
for i in `seq 30`; do
   dd if=/dev/zero of=f bs=1 count=1 &
done
wait
Timing that, it takes almost exactly 15 seconds (30 writers x 0.5s each).
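
To make the arithmetic concrete, here is a minimal userspace model of the 
pileup, plain C with pthreads and nothing XFS-specific (names are made up 
for illustration; build with cc -pthread): each "writer" takes a mutex 
standing in for i_mutex, drops a second lock standing in for the ilock, 
and then sleeps 500ms the way xfs_flush_device() does. With 30 writers it 
runs for roughly 15 seconds, matching the reproducer above.

/* Userspace model of the pileup: N writers serialize on a mutex that is
 * held across a 500ms sleep.  Purely illustrative; not XFS code. */
#include <pthread.h>
#include <unistd.h>

#define WRITERS 30

static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER; /* like i_mutex */
static pthread_mutex_t ilock   = PTHREAD_MUTEX_INITIALIZER; /* like the XFS ilock */

static void *writer(void *arg)
{
        pthread_mutex_lock(&i_mutex);   /* taken higher up in the write path */
        pthread_mutex_lock(&ilock);

        /* xfs_flush_space() drops the ilock... */
        pthread_mutex_unlock(&ilock);
        /* ...but the 500ms wait for xfssyncd happens with i_mutex held */
        usleep(500000);
        pthread_mutex_lock(&ilock);

        pthread_mutex_unlock(&ilock);
        pthread_mutex_unlock(&i_mutex);
        return NULL;
}

int main(void)
{
        pthread_t tid[WRITERS];
        int i;

        for (i = 0; i < WRITERS; i++)
                pthread_create(&tid[i], NULL, writer, NULL);
        for (i = 0; i < WRITERS; i++)
                pthread_join(tid[i], NULL);
        /* with 30 writers this takes about 30 * 0.5s = 15s of wall time */
        return 0;
}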

The linux NFS client typically sends batches of 16 requests, so if the 
client is writing a single file, some NFS requests can be delayed by up 
to 8 seconds, which is rather long for NFS.

What's worse, when my linux NFS client writes out a file's pages, it does 
not react immediately on receiving an ENOSPC error. It remembers the error 
and reports it later on close(), but it still goes ahead and issues write 
requests for each page of the file. So even if there isn't a pileup on the 
i_mutex on the server, the NFS client still waits 0.5s for each (typically 
32K) request. So on an NFS client on a gigabit network, on an already full 
filesystem, if I open and write a 10M file and close() it, it takes 
2m40.083s for it to issue all the requests, get an ENOSPC for each, and 
finally have my close() call return ENOSPC (10M in 32K requests is about 
320 requests, at 0.5s apiece that's 160s). That can stretch to several 
hours for gigabyte-sized files, which is how I noticed the problem.

I'm not too familiar with the NFS client code, but would it not be 
possible for it to give up once it encounters ENOSPC? Or is there some 
reason why that wouldn't be desirable?

The rough workaround I have come up with for the problem is to have 
xfs_flush_space() skip calling xfs_flush_device() if we are within 
2 seconds of having returned ENOSPC. I have verified that this workaround 
is effective, but I imagine there might be a cleaner solution.
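
For illustration, a minimal userspace sketch of that throttling idea is 
below. The names are hypothetical, and the real change would of course 
live in the XFS code around xfs_flush_space(), keyed off jiffies rather 
than time(), but it shows the shape of it:

/* Sketch of the workaround: remember when we last hit ENOSPC and skip
 * the flush-and-wait if another caller comes in within 2 seconds.
 * Illustrative only; these are not the actual XFS functions. */
#include <stdio.h>
#include <time.h>

static time_t last_enospc;      /* 0 means "no recent ENOSPC" */

/* would we bother flushing the device and sleeping 500ms? */
static int worth_flushing(void)
{
        time_t now = time(NULL);

        if (last_enospc && now - last_enospc < 2)
                return 0;       /* device was full moments ago: fail fast */
        return 1;
}

/* called on the path that is about to return ENOSPC to the caller */
static void note_enospc(void)
{
        last_enospc = time(NULL);
}

int main(void)
{
        if (worth_flushing())
                printf("flush the device and retry the allocation\n");
        else
                printf("recent ENOSPC, return ENOSPC without the 0.5s wait\n");
        note_enospc();
        return 0;
}

In the kernel the "last ENOSPC" timestamp would presumably have to live 
per-mount and be updated on the path that actually returns ENOSPC.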

Thanks


Thread overview: 17+ messages
2006-09-26 18:51 Long sleep with i_mutex in xfs_flush_device(), affects NFS service Stephane Doyon
2006-09-26 19:06 ` [NFS] " Trond Myklebust
2006-09-26 20:05   ` Stephane Doyon
2006-09-26 20:29     ` Trond Myklebust
2006-09-27 11:33 ` Shailendra Tripathi
2006-10-02 14:45   ` Stephane Doyon
2006-10-02 22:30     ` David Chinner
2006-10-03 13:39       ` several messages Stephane Doyon
2006-10-03 16:40         ` Trond Myklebust
2006-10-05 15:39           ` Stephane Doyon
2006-10-06  0:33             ` David Chinner
2006-10-06 13:25               ` Stephane Doyon
2006-10-05  8:30         ` David Chinner
2006-10-05 16:33           ` Stephane Doyon
2006-10-05 23:29             ` David Chinner
2006-10-06 13:03               ` Stephane Doyon
     [not found] <9E397A467F4DB34884A1FD0D5D27CF43018903F96E@msxaoa4.twosigma.com>
2008-06-12 16:54 ` Benjamin L. Shi
