linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NFS regression - EIO is returned instead of ENOSPC
@ 2012-12-11 22:28 NeilBrown
  2012-12-11 22:53 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2012-12-11 22:28 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: NFS

[-- Attachment #1: Type: text/plain, Size: 4889 bytes --]


Hi Trond et al,
 we seem to have a regression introduced by 

commit 7b281ee026552f10862b617a2a51acf49c829554
    NFS: fsync() must exit with an error if page writeback failed

which has found it's way (in different form into -stable releases).

The problem is that an NFSERR_NOSPC comes through as EIO.

e.g. if /mnt2/ if an nfs mounted filesystem that has no space then

strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1

reported Input/output error and the relevant parts of the strace output are:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 EIO (Input/output error)
close(1)                                = -1 ENOSPC (No space left on device)

i.e. we get an EIO from fsync, then the ENOSPC comes with the close.

If don't do the fsync, then:

strace dd if=/dev/zero >> /mnt2/afile count=1


write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
close(1)                                = -1 EIO (Input/output error)


we don't see the ENOSPC at all.

The problem is that filemap_fdatawait_range, when it sees a page with
PageError set, will return -EIO unless AS_ENOSPC is set.  NFS never sets
AS_ENOSPC so we get -EIO.

Then nfs_file_fsync() will return as soon as it sees an error from
filemap_write_and_wait_range(), which will be that EIO.

nfs_file_fsync_commit knows to prefer the error from ctx->error over any
other error, but nfs_file_fsync() doesn't.

I see two ways to "fix" this.

We could get nfs{,4}_file_fsync() to always call nfs_file_fsync_commit() and
use the error from the later in preference to the error from
filemap_write_and_wait_range().

That results in this, more correct, behaviour:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 ENOSPC (No space left on device)
close(1)                                = 0

Or we could get nfs_context_set_write_error() to call mapping_set_error(),
which would result in AS_ENOSPC being set and so
filemap_write_and_wait_range() will return the "correct" error.
This results in this behaviour:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 ENOSPC (No space left on device)
close(1)                                = -1 ENOSPC (No space left on device)

which is a bit odd.  The first ENOSPC is from AS_ENOSPC.  The second is from
ctx->error.

I feel that calling mapping_set_error() is the "right" thing to do, but it
would need a bit more work to avoid the double errors.

What approach would you prefer?

The two patches I used are:

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 582bb88..19c06b9 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -295,12 +295,12 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	do {
-		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-		if (ret != 0)
-			break;
+		int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		mutex_lock(&inode->i_mutex);
 		ret = nfs_file_fsync_commit(file, start, end, datasync);
 		mutex_unlock(&inode->i_mutex);
+		if (ret == 0)
+			ret = ret1;
 		/*
 		 * If nfs_file_fsync_commit detected a server reboot, then
 		 * resend all dirty pages that might have been covered by
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index afddd66..f0d9a88 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -96,15 +96,15 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	do {
-		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-		if (ret != 0)
-			break;
+		int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		mutex_lock(&inode->i_mutex);
 		ret = nfs_file_fsync_commit(file, start, end, datasync);
 		if (!ret && !datasync)
 			/* application has asked for meta-data sync */
 			ret = pnfs_layoutcommit_inode(inode, true);
 		mutex_unlock(&inode->i_mutex);
+		if (ret == 0)
+			ret = ret1;
 		/*
 		 * If nfs_file_fsync_commit detected a server reboot, then
 		 * resend all dirty pages that might have been covered by


and 



diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 9347ab7..b0f5bb7 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -140,6 +140,7 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
 	ctx->error = error;
 	smp_wmb();
 	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
+	mapping_set_error(ctx->dentry->d_inode->i_mapping, error);
 }
 
 static struct nfs_page *



Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-12-12  0:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-11 22:28 NFS regression - EIO is returned instead of ENOSPC NeilBrown
2012-12-11 22:53 ` NeilBrown
2012-12-11 23:16   ` Myklebust, Trond
     [not found]   ` <1355267807.23203.16.camel@lade.trondhjem.org>
2012-12-11 23:20     ` Myklebust, Trond
2012-12-12  0:05       ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).