From: Neil Brown <neilb@suse.de>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: nfs@lists.sourceforge.net
Subject: Re: [PATCH/RFC] Return ENOSPC and EDQUOT for nfs writes earlier.
Date: Fri, 13 Jul 2007 10:01:53 +1000
Message-ID: <18070.49393.13232.932323@notabene.brown>
In-Reply-To: message from Trond Myklebust on Thursday July 12

On Thursday July 12, trond.myklebust@fys.uio.no wrote:
> On Thu, 2007-07-12 at 10:12 +1000, Neil Brown wrote:
> > When a write to a local filesystem hits a space limitation such as
> > filesystem-full or quota-exhausted, the write fails synchronously.
> > You get ENOSPC or EDQUOT immediately.
> >
> > NFS cannot do that efficiently.
> >
> > Currently, you don't get these errors until 'fsync' or 'close'. That
> > is very different from local filesystem behaviour, and is less than
> > ideal.
> >
> > The following patch causes these two errors to be returned through the
> > next write call once they are known about.
> >
> > A possible extension would be to set a flag when we first get such an
> > error, clear it when a write succeeds, and while the flag is set, do
> > all writes synchronously. This would be even closer to
> > local-filesystem semantics, but I'm not sure it is worth it.
>
> Well... I've been thinking along these last lines myself (i.e. forcing
> all subsequent writes to be synchronous whenever an error occurs).
>
> My main worry is actually rather about returning the error value. The
> point is that you are returning an error that probably isn't relevant to
> the write() system call that you are servicing. In fact the write
> syscall that returns the error may actually succeed in writing all its
> data to stable storage. Under those circumstances, I'm not sure that
> returning an error at all is a good idea.

Yes, I had those same thoughts, but decided that it wasn't really
worth worrying about.

If you want precise errors from writes, you really need to use O_SYNC,
or call fsync when you really care.

And the nature of ENOSPC is that you could quite possibly get it once,
then try the same write again and have it succeed (if someone else
deleted a file in the meantime). So returning it here without even
trying the write isn't necessarily wrong.
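
For an application that does care, a minimal user-space sketch of the
"write, then fsync" pattern: write the data, then call fsync() and treat
an error from either call (ENOSPC, EDQUOT, EIO, ...) as a failure of
this batch. The helper name write_and_sync is hypothetical, and EDQUOT
is assumed to be provided by <errno.h>, as it is on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, then fsync() so that an error the
 * server only reports at flush time (e.g. ENOSPC or EDQUOT on NFS) reaches
 * the caller before it moves on. Returns 0 on success, or a positive errno
 * value on failure. */
int write_and_sync(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return errno;       /* may already be ENOSPC or EDQUOT */
        }
        p += n;
        len -= (size_t)n;
    }

    /* On NFS the write() calls above may only have dirtied the page cache;
     * fsync() pushes the data to the server and reports any error the
     * server returned for it. */
    if (fsync(fd) < 0)
        return errno;
    return 0;
}
```

On a local temporary file this simply returns 0; on an NFS mount whose
export is full or over quota, the error would be expected to come back
from either the write() or the fsync() rather than being deferred to
close().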

On the other side of the coin, I looked briefly at switching to sync
writes, and it didn't look at all straightforward. The awkwardness
is in switching from async to sync: the two paths make mismatched
assumptions about ctx->error.

With O_SYNC, nfs_file_write always calls nfs_fsync, which returns
and clears ctx->error, so any error from any preceding write will be
returned. But that is OK: as the file is O_SYNC, all preceding
writes will already have returned their errors, and anything in
ctx->error must be from this write (unless two processes use pwrite to
write to different places in the same fd ...).

To do a single sync write we would need to:
  1. nfs_wb_all
  2. take a copy of ctx->error and clear it
  3. submit the write
  4. nfs_wb_all
  5. copy the new ctx->error
  6. restore the saved ctx->error
  7. return the ctx->error relevant to this write.
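
The seven steps can be modelled in user space to show why the save and
restore of ctx->error is needed. This is a hypothetical sketch:
struct ctx_model, wb_all_model() and submit_write_model() are made-up
stand-ins for the NFS open context, nfs_wb_all() and the write
submission path, not kernel APIs. Errors are stored as negative errno
values following kernel convention, and locking (i_mutex) is ignored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the open context: 'error' plays the role of
 * ctx->error, latched when a flushed async WRITE fails. */
struct ctx_model {
    int error;          /* latched error (negative errno), 0 if none */
    int pending;        /* number of queued, unflushed writes */
    int pending_err;    /* error the server will return when they flush */
};

/* Stand-in for nfs_wb_all(): flush everything queued; first error wins. */
static void wb_all_model(struct ctx_model *ctx)
{
    if (ctx->pending > 0 && ctx->pending_err != 0 && ctx->error == 0)
        ctx->error = ctx->pending_err;
    ctx->pending = 0;
    ctx->pending_err = 0;
}

/* Stand-in for queueing one write whose server reply will be server_err. */
static void submit_write_model(struct ctx_model *ctx, int server_err)
{
    ctx->pending++;
    if (server_err != 0 && ctx->pending_err == 0)
        ctx->pending_err = server_err;
}

/* One synchronous write on an otherwise async file, step by step. */
static int sync_write_model(struct ctx_model *ctx, int server_err)
{
    int saved, mine;

    wb_all_model(ctx);                   /* 1. flush earlier async writes */
    saved = ctx->error;                  /* 2. save ctx->error ...        */
    ctx->error = 0;                      /*    ... and clear it           */
    submit_write_model(ctx, server_err); /* 3. submit this write          */
    wb_all_model(ctx);                   /* 4. flush it                   */
    mine = ctx->error;                   /* 5. error from this write only */
    ctx->error = saved;                  /* 6. restore the earlier error  */
    return mine;                         /* 7. report this write's error  */
}
```

The restore in step 6 is what lets an error from an earlier async write
survive to be reported by a later fsync or close, while the sync write
itself reports only its own result.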

That seems rather clumsy and racy (though I guess we hold i_mutex, so
it is possibly safe).

So I went for the simplest approach...

> BTW, I absolutely _loathe_ the mapping->flags AS_EIO and AS_ENOSPC hack.
> In that case you are returning the error to a random process that just
> happened to complete wait_on_page_writeback_range() before all the other
> processes that called it did.

Hmmm... returning errors for async writes is something that the POSIX
API doesn't really support well. The best you could hope for is that
if a write error occurs, then anyone who could have written to that
file gets an error on the next write, fsync, close, msync, ...

That would mean a counter in the address_space for each error type,
and a matching counter in each struct file:
  On open:
      file->enospc_cnt = mapping->enospc_cnt;
  On fsync etc.:
      cnt = mapping->enospc_cnt;
      if (file->enospc_cnt != cnt) {
          file->enospc_cnt = cnt;
          return -ENOSPC;
      }
One counter each for EIO, ENOSPC, EDQUOT - I think that is all (I note
there is no AS_EDQUOT...).
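
A hypothetical user-space model of the counter scheme, generalised to
one counter each for EIO, ENOSPC and EDQUOT. struct mapping_model and
struct file_model stand in for the address_space and struct file; the
names are illustrative only. The error side, where the mapping's
counter is bumped when a write fails, is an assumption the pseudocode
above leaves implicit.

```c
#include <assert.h>
#include <errno.h>

enum { ERR_EIO, ERR_ENOSPC, ERR_EDQUOT, ERR_NTYPES };

static const int err_codes[ERR_NTYPES] = { EIO, ENOSPC, EDQUOT };

/* Stand-in for the address_space: one counter per error type. */
struct mapping_model {
    unsigned int cnt[ERR_NTYPES];   /* bumped on each async write error */
};

/* Stand-in for struct file: the counts last reported to this opener. */
struct file_model {
    struct mapping_model *mapping;
    unsigned int seen[ERR_NTYPES];
};

/* On open: snapshot the counters, so errors that happened before this
 * open are not reported through this file. */
static void file_open_model(struct file_model *f, struct mapping_model *m)
{
    f->mapping = m;
    for (int i = 0; i < ERR_NTYPES; i++)
        f->seen[i] = m->cnt[i];
}

/* Assumed hook on the writeback completion path when a write fails. */
static void mapping_record_error(struct mapping_model *m, int err)
{
    for (int i = 0; i < ERR_NTYPES; i++)
        if (err_codes[i] == err)
            m->cnt[i]++;
}

/* On write, fsync, close, msync, ...: report each new error once per file. */
static int file_check_error(struct file_model *f)
{
    for (int i = 0; i < ERR_NTYPES; i++) {
        unsigned int cnt = f->mapping->cnt[i];
        if (f->seen[i] != cnt) {
            f->seen[i] = cnt;
            return -err_codes[i];
        }
    }
    return 0;
}
```

Because each open file keeps its own snapshot, every file that was open
when the error happened sees it exactly once, rather than only the
process that happened to complete wait_on_page_writeback_range() first.
If several error types are pending, this sketch reports them one per
call, EIO first.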
NeilBrown
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs