From: NeilBrown <neilb@suse.de>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: NFS <linux-nfs@vger.kernel.org>
Subject: NFS regression - EIO is returned instead of ENOSPC
Date: Wed, 12 Dec 2012 09:28:13 +1100 [thread overview]
Message-ID: <20121212092813.23afbc5f@notabene.brown> (raw)
[-- Attachment #1: Type: text/plain, Size: 4889 bytes --]
Hi Trond et al,
we seem to have a regression introduced by
commit 7b281ee026552f10862b617a2a51acf49c829554
NFS: fsync() must exit with an error if page writeback failed
which has found it's way (in different form into -stable releases).
The problem is that an NFSERR_NOSPC comes through as EIO.
e.g. if /mnt2/ if an nfs mounted filesystem that has no space then
strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1
reported Input/output error and the relevant parts of the strace output are:
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1) = -1 EIO (Input/output error)
close(1) = -1 ENOSPC (No space left on device)
i.e. we get an EIO from fsync, then the ENOSPC comes with the close.
If don't do the fsync, then:
strace dd if=/dev/zero >> /mnt2/afile count=1
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
close(1) = -1 EIO (Input/output error)
we don't see the ENOSPC at all.
The problem is that filemap_fdatawait_range, when it sees a page with
PageError set, will return -EIO unless AS_ENOSPC is set. NFS never sets
AS_ENOSPC so we get -EIO.
Then nfs_file_fsync() will return as soon as it sees an error from
filemap_write_and_wait_range(), which will be that EIO.
nfs_file_fsync_commit knows to prefer the error from ctx->error over any
other error, but nfs_file_fsync() doesn't.
I see two ways to "fix" this.
We could get nfs{,4}_file_fsync() to always call nfs_file_fsync_commit() and
use the error from the later in preference to the error from
filemap_write_and_wait_range().
That results in this, more correct, behaviour:
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1) = -1 ENOSPC (No space left on device)
close(1) = 0
Or we could get nfs_context_set_write_error() to call mapping_set_error(),
which would result in AS_ENOSPC being set and so
filemap_write_and_wait_range() will return the "correct" error.
This results in this behaviour:
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1) = -1 ENOSPC (No space left on device)
close(1) = -1 ENOSPC (No space left on device)
which is a bit odd. The first ENOSPC is from AS_ENOSPC. The second is from
ctx->error.
I feel that calling mapping_set_error() is the "right" thing to do, but it
would need a bit more work to avoid the double errors.
What approach would you prefer?
The two patches I used are:
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 582bb88..19c06b9 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -295,12 +295,12 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
struct inode *inode = file->f_path.dentry->d_inode;
do {
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
- if (ret != 0)
- break;
+ int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end);
mutex_lock(&inode->i_mutex);
ret = nfs_file_fsync_commit(file, start, end, datasync);
mutex_unlock(&inode->i_mutex);
+ if (ret == 0)
+ ret = ret1;
/*
* If nfs_file_fsync_commit detected a server reboot, then
* resend all dirty pages that might have been covered by
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index afddd66..f0d9a88 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -96,15 +96,15 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
struct inode *inode = file->f_path.dentry->d_inode;
do {
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
- if (ret != 0)
- break;
+ int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end);
mutex_lock(&inode->i_mutex);
ret = nfs_file_fsync_commit(file, start, end, datasync);
if (!ret && !datasync)
/* application has asked for meta-data sync */
ret = pnfs_layoutcommit_inode(inode, true);
mutex_unlock(&inode->i_mutex);
+ if (ret == 0)
+ ret = ret1;
/*
* If nfs_file_fsync_commit detected a server reboot, then
* resend all dirty pages that might have been covered by
and
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 9347ab7..b0f5bb7 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -140,6 +140,7 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
ctx->error = error;
smp_wmb();
set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
+ mapping_set_error(ctx->dentry->d_inode->i_mapping, error);
}
static struct nfs_page *
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next reply other threads:[~2012-12-11 22:28 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-11 22:28 NeilBrown [this message]
2012-12-11 22:53 ` NFS regression - EIO is returned instead of ENOSPC NeilBrown
2012-12-11 23:16 ` Myklebust, Trond
[not found] ` <1355267807.23203.16.camel@lade.trondhjem.org>
2012-12-11 23:20 ` Myklebust, Trond
2012-12-12 0:05 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121212092813.23afbc5f@notabene.brown \
--to=neilb@suse.de \
--cc=Trond.Myklebust@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).