All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier.
@ 2007-07-12  0:12 Neil Brown
  2007-07-12 20:31 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2007-07-12  0:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: nfs


When a write to a local filesystem hits a space limitation such as
filesystem-full or quota-exhausted, the write fails synchronously.
You get ENOSPC or EDQUOT immediately.

NFS cannot do that efficiently.

Currently, you don't get these errors until 'fsync' or 'close'.  That
is very different from local filesystem behaviour, and is less than
ideal.

The following patch causes these two errors to be returned through the
next write call once they are known about.

A possible extension would be to set a flag when we first get such an
error, clear it when a write succeeds, and while the flag is set, do
all writes synchronously.  This would be even closer to
local-filesystem semantics, but I'm not sure it is worth it.

Thoughts?

Thanks,
NeilBrown

Make NFS writes return ENOSPC and EDQUOT more promptly.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfs/file.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff .prev/fs/nfs/file.c ./fs/nfs/file.c
--- .prev/fs/nfs/file.c	2007-07-12 08:57:09.000000000 +1000
+++ ./fs/nfs/file.c	2007-07-12 09:56:58.000000000 +1000
@@ -367,6 +367,8 @@ const struct address_space_operations nf
 static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
 				unsigned long nr_segs, loff_t pos)
 {
+	struct nfs_open_context *ctx =
+		(struct nfs_open_context *)iocb->ki_filp->private_data;
 	struct dentry * dentry = iocb->ki_filp->f_path.dentry;
 	struct inode * inode = dentry->d_inode;
 	ssize_t result;
@@ -397,6 +399,16 @@ static ssize_t nfs_file_write(struct kio
 	if (!count)
 		goto out;
 
+	if (ctx->error == -EDQUOT || ctx->error == -ENOSPC) {
+		/* Local file systems return these errors synchronously,
+		 * NFS cannot do that efficiently, so just return them
+		 * at the first opportunity.
+		 */
+		result = ctx->error;
+		ctx->error = 0;
+		goto out;
+	}
+
 	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
 	result = generic_file_aio_write(iocb, iov, nr_segs, pos);
 	/* Return error values for O_SYNC and IS_SYNC() */

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier.
  2007-07-12  0:12 [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier Neil Brown
@ 2007-07-12 20:31 ` Trond Myklebust
  2007-07-13  0:01   ` Neil Brown
  2007-07-13 17:16   ` Chuck Lever
  0 siblings, 2 replies; 5+ messages in thread
From: Trond Myklebust @ 2007-07-12 20:31 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

On Thu, 2007-07-12 at 10:12 +1000, Neil Brown wrote:
> When a write to a local filesystem hits a space limitation such as
> filesystem-full or quota-exhausted, the write fails synchronously.
> You get ENOSPC or EDQUOT immediately.
> 
> NFS cannot do that efficiently.
> 
> Currently, you don't get these errors until 'fsync' or 'close'.  That
> is very different from local filesystem behaviour, and is less than
> ideal.
> 
> The following patch causes these two errors to be returned through the
> next write call once they are known about.
> 
> A possible extension would be to set a flag when we first get such an
> error, clear it when a write succeeds, and while the flag is set, do
> all writes synchronously.  This would be even closer to
> local-filesystem semantics, but I'm not sure it is worth it.

Well... I've been thinking along these last lines myself (i.e. forcing
all subsequent writes to be synchronous whenever an error occurs).

My main worry is actually rather about returning the error value. The
point is that you are returning an error that probably isn't relevant to
the write() system call that you are servicing. In fact the write
syscall that returns the error may actually succeed in writing all its
data to stable storage. Under those circumstances, I'm not sure that
returning an error at all is a good idea. 
BTW, I absolutely _loathe_ the mapping->flags AS_EIO and AS_ENOSPC hack.
In that case you are returning the error to a random process that just
happened to complete wait_on_page_writeback_range() before all the other
processes that called it did.

Cheers
  Trond


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier.
  2007-07-12 20:31 ` Trond Myklebust
@ 2007-07-13  0:01   ` Neil Brown
  2007-07-13 17:16   ` Chuck Lever
  1 sibling, 0 replies; 5+ messages in thread
From: Neil Brown @ 2007-07-13  0:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: nfs

On Thursday July 12, trond.myklebust@fys.uio.no wrote:
> On Thu, 2007-07-12 at 10:12 +1000, Neil Brown wrote:
> > When a write to a local filesystem hits a space limitation such as
> > filesystem-full or quota-exhausted, the write fails synchronously.
> > You get ENOSPC or EDQUOT immediately.
> > 
> > NFS cannot do that efficiently.
> > 
> > Currently, you don't get these errors until 'fsync' or 'close'.  That
> > is very different from local filesystem behaviour, and is less than
> > ideal.
> > 
> > The following patch causes these two errors to be returned through the
> > next write call once they are known about.
> > 
> > A possible extension would be to set a flag when we first get such an
> > error, clear it when a write succeeds, and while the flag is set, do
> > all writes synchronously.  This would be even closer to
> > local-filesystem semantics, but I'm not sure it is worth it.
> 
> Well... I've been thinking along these last lines myself (i.e. forcing
> all subsequent writes to be synchronous whenever an error occurs).
> 
> My main worry is actually rather about returning the error value. The
> point is that you are returning an error that probably isn't relevant to
> the write() system call that you are servicing. In fact the write
> syscall that returns the error may actually succeed in writing all its
> data to stable storage. Under those circumstances, I'm not sure that
> returning an error at all is a good idea. 

Yes, I had those same thoughts, but decided that it wasn't really
worth worrying about.
If you want precise errors from writes, you really need to use O_SYNC,
or fsync when you really care.
And the nature of ENOSPC is that you could quite possibly get it once,
then try the same write again and it will succeed (if someone else
deleted a file).  So returning it here without even trying the write
isn't necessarily wrong.

On the other size of the coin, I looked briefly at switching to sync
writes, and it didn't look at all straight forward.  The awkwardness
is in switching from async to sync.  There are mismatched assumptions.

With O_SYNC, nfs_file_write always calls nfs_fsync, which will return
and clear ctx->error.  So any error from any preceding write will be
returned.  But that is OK - as the file is O_SYNC, all preceding
writes will have already returned their errors and anything in
ctx->error must be from this write (unless two process use pwrite to
write to different places of the same fd ...).

To do a single sync write we would need to:
  nfs_wb_all
  take a copy of ctx->error and clear it.
  submit the write
  nfs_wb_all
  copy the new ctx->error
  restore the saved ctx->error
  return the ctx->error relevant for this write.

Which seems rather clumsy and racy (though I guess we hold i_mutex, so
it is possibly safe).

So I went for the approach that was simplest....


> BTW, I absolutely _loathe_ the mapping->flags AS_EIO and AS_ENOSPC hack.
> In that case you are returning the error to a random process that just
> happened to complete wait_on_page_writeback_range() before all the other
> processes that called it did.

Hmmm... returning errors for async writes is something that the POSIX
API doesn't really support well.  The best you could hope for is that
if a write error occurs, then anyone who could have written to that
file gets an error on the next write, fsync, close, msync, ...
That would mean a counter in the address_space for each error,
and a matching counter in the struct file.
On open:  file->enospc_cnt = mapping->enospc_cnt
On fsync etc: 
          cnt = mapping->enospc_cnt;
	  if (file->enospc_cnt != cnt) {
		file->enospc_cnt = cnt;
		return -ENOSPC;
	  }

One counter each for EIO, ENOSPC, EDQUOT - I think that is all (I note
there is no AS_EDQUOT...).

NeilBrown

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier.
  2007-07-12 20:31 ` Trond Myklebust
  2007-07-13  0:01   ` Neil Brown
@ 2007-07-13 17:16   ` Chuck Lever
  2007-07-13 21:12     ` Trond Myklebust
  1 sibling, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2007-07-13 17:16 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Neil Brown, nfs

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]

Trond Myklebust wrote:
> On Thu, 2007-07-12 at 10:12 +1000, Neil Brown wrote:
>> When a write to a local filesystem hits a space limitation such as
>> filesystem-full or quota-exhausted, the write fails synchronously.
>> You get ENOSPC or EDQUOT immediately.
>>
>> NFS cannot do that efficiently.
>>
>> Currently, you don't get these errors until 'fsync' or 'close'.  That
>> is very different from local filesystem behaviour, and is less than
>> ideal.
>>
>> The following patch causes these two errors to be returned through the
>> next write call once they are known about.
>>
>> A possible extension would be to set a flag when we first get such an
>> error, clear it when a write succeeds, and while the flag is set, do
>> all writes synchronously.  This would be even closer to
>> local-filesystem semantics, but I'm not sure it is worth it.
> 
> Well... I've been thinking along these last lines myself (i.e. forcing
> all subsequent writes to be synchronous whenever an error occurs).

OK, but...

After it is triggered by a write error, how would the synchronous 
behavior be turned off?  Both EDQUOT and ENOSPC, for example, are really 
a temporary error; if applications are smart enough, they can retry an 
EDQUOT and might get a successful write.  In that case, performance 
would be hosed until.... the client decides it's OK to use unstable 
writes again.

[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 315 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
email;internet:chuck dot lever at nospam oracle dot com
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard


[-- Attachment #3: Type: text/plain, Size: 286 bytes --]

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

[-- Attachment #4: Type: text/plain, Size: 140 bytes --]

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier.
  2007-07-13 17:16   ` Chuck Lever
@ 2007-07-13 21:12     ` Trond Myklebust
  0 siblings, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2007-07-13 21:12 UTC (permalink / raw)
  To: chuck.lever; +Cc: Neil Brown, nfs

On Fri, 2007-07-13 at 13:16 -0400, Chuck Lever wrote:
> OK, but...
> 
> After it is triggered by a write error, how would the synchronous 
> behavior be turned off?  Both EDQUOT and ENOSPC, for example, are really 
> a temporary error; if applications are smart enough, they can retry an 
> EDQUOT and might get a successful write.  In that case, performance 
> would be hosed until.... the client decides it's OK to use unstable 
> writes again.

Just as Neil said: you disable the forced synchronous behaviour upon the
first successful write. Obviously, it would be quite simple for the
application to hose this heuristic. You could do so trivially by writing
one byte at a time in order to get the error cleared, and then try to
build up a large asynchronous write backlog.

Trond


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-07-13 21:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-12  0:12 [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier Neil Brown
2007-07-12 20:31 ` Trond Myklebust
2007-07-13  0:01   ` Neil Brown
2007-07-13 17:16   ` Chuck Lever
2007-07-13 21:12     ` Trond Myklebust

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.