All of lore.kernel.org
 help / color / mirror / Atom feed
From: shichao <shichao@cn.fujitsu.com>
To: "nfs@lists.sourceforge.net" <nfs@lists.sourceforge.net>,
	"nfsv4@linux-nfs.org" <nfsv4@linux-nfs.org>
Subject: [NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly
Date: Wed, 27 Jun 2007 11:33:38 +0800	[thread overview]
Message-ID: <4681DA92.8020303@cn.fujitsu.com> (raw)

Hi everybody,

When I write data on file over NFS (mounted aync)  both on client and on Server, 
sometimes the client seems to miss the server's updated data and commit the stale cache data to the nfs server. This
leads to a file writed uncorrectly.

Example on a local dir /local/testdir/foo NFS-exported to the localhost on /nfs/foo:
 The exportfs option in server is: /nfs/  *(rw,sync,root_no_squash)

[root@rhel5~]# mount server:/nfs  /local/testdir

 My test program's main flow is as follows:
  1) fd = open("/local/testdir/foo", oflags, CHMOD_RW )
  2) wbuf="testmsg-1-"; write(fd, wbuf, strlen(wbuf))
  3) On nfs Server: echo "testmsg-2-" >> /nfs/foo    (using rsh)
  4) sleep(6); lseek(fd, 0L, SEEK_END)
  5) wbuf="testmsg-3-"; write(fd, wbuf, strlen(wbuf))
  6) lseek(fd, 0L, 0); read(fd, rbuf, MAX_DATA_SIZE)

 It is expected that the file data should be "testmsg-1-testmsg-2-testmsg-3",
 but in fact, after the program excuted, the file data is "testmsg-1-\0\0\0\0\0\0\0\0\0-testmsg-3".
 This problem just happens in the NFS mounted in the "sync" mode.

 I have investigated the problem and found the problem may be resolved by fixing the nfs_updatepage().

 When the lseek(fs,0L,SEEK_END) is invoked, the inode will be found invalid, and the nfs_revalidae_inode() will 
 update the flle size correctly, but the cache corresponding to the inode is not updately at the same time.
 Then at the final, the write(fd) write the "testmsg-3) from the update tail of the page cache, Because the NFS is mounted 
 in async mode, the write_updatepage just write the date in the local cache, do not commit the data to server in real time.
 In fact, the middle of the page cache of the inode is just filled with '\0'. 
 Finally, when the read(fd) is invoked, the nfsi->cache_validity is invalid and the local page cache is fresher than the cache sever,
 So the kernel commit all the local page cache to the server, as a result, the file on the sever is updated by the 
 invalid data "testmsg-1-\0\0\0\0\0\0\0\0\0-testmsg-3".

 To fix the problem, I think that the page cache should be checked in nfs_updatepage even in async mode. If the page is founded 
 that the nfsi->cache_validity is marked or the local inode->i_mtime and sever fattr.mtime is not the same, the kernel should 
 invoke the nfs_writepage_sync() to commit the writed data at real time.
 Then at the read(), even some stale cache data exists in the middle of the page cache, the nfsi->cache_validity is valid and the local page
 cache timestamp is the same as the cache sever, so the kernel will not commit the local invalid page cache to the server, 
 then finally the local page cache will be refreshed by the read with the correct file data on sever.
 
 I have made the patch attached below, it is confirmed the patch can fix the issue. 
 But I am still wonder that is the patch really reasonable and correct, or is the patch just not needed, this issue may can not
 be avoid in async in fact?
 
 Can somebody give your suggestion or other better fix method? 

Thanks

Regards.
ShiChao

Signed-off-by: ShiChao <shichao@cn.fujitsu.com>
--- linux/fs/nfs/write.c	2007-06-27 10:32:52.000000000 +0800
+++ linux-fix/fs/nfs/write.c	2007-06-27 10:29:55.000000000 +0800
@@ -836,6 +836,9 @@
 	struct inode	*inode = page->mapping->host;
 	struct nfs_page	*req;
 	int		status = 0;
+	int		result = 0 ;
+	struct nfs_inode *nfsi = NFS_I(inode);
+	struct nfs_fattr fattr;
 
 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
 
@@ -852,8 +855,27 @@
	if (IS_SYNC(inode)) {
		status = nfs_writepage_sync(ctx, inode, page, offset, count, 0);
		if (status > 0) {
			if (offset == 0 && status == PAGE_CACHE_SIZE)
				SetPageUptodate(page);
			return 0;
		}
		return status;
-	}
-
+	}else{
+			   if (nfsi->cache_validity & NFS_INO_INVALID_DATA) 
+                           {	
+			           status = nfs_writepage_sync(ctx, inode, page, offset, count, 0);
+			           if (status > 0) {
+				        if (offset == 0 && status == PAGE_CACHE_SIZE)
+					   SetPageUptodate(page);
+				        return 0;
+			           }
+			   }else if ( nfs_attribute_timeout(inode)){
+                           result = NFS_PROTO(inode)->getattr(NFS_SERVER(inode), NFS_FH(inode), &fattr); 
+	                     if (!timespec_equal(&inode->i_mtime, &fattr.mtime)){
+			            status = nfs_writepage_sync(ctx, inode, page, offset, count, 0);
+			           if (status > 0) {
+				        if (offset == 0 && status == PAGE_CACHE_SIZE)
+					   SetPageUptodate(page);
+				        return 0;
+			              }     
+			    }
+			}
+        }
 	/* If we're not using byte range locks, and we know the page
 	 * is entirely in cache, it may be more efficient to avoid
 	 * fragmenting write requests.



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

             reply	other threads:[~2007-06-27  3:32 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-27  3:33 shichao [this message]
2007-06-27  3:39 ` [NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4681DA92.8020303@cn.fujitsu.com \
    --to=shichao@cn.fujitsu.com \
    --cc=nfs@lists.sourceforge.net \
    --cc=nfsv4@linux-nfs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.