[NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly

All of lore.kernel.org
 help / color / mirror / Atom feed

* [NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly
@ 2007-06-27  3:33 shichao
  2007-06-27  3:39 ` Trond Myklebust
  0 siblings, 1 reply; 2+ messages in thread
From: shichao @ 2007-06-27  3:33 UTC (permalink / raw)
  To: nfs@lists.sourceforge.net, nfsv4@linux-nfs.org

Hi everybody,

When I write data on file over NFS (mounted aync)  both on client and on Server, 
sometimes the client seems to miss the server's updated data and commit the stale cache data to the nfs server. This
leads to a file writed uncorrectly.

Example on a local dir /local/testdir/foo NFS-exported to the localhost on /nfs/foo:
 The exportfs option in server is: /nfs/  *(rw,sync,root_no_squash)

[root@rhel5~]# mount server:/nfs  /local/testdir

 My test program's main flow is as follows:
  1) fd = open("/local/testdir/foo", oflags, CHMOD_RW )
  2) wbuf="testmsg-1-"; write(fd, wbuf, strlen(wbuf))
  3) On nfs Server: echo "testmsg-2-" >> /nfs/foo    (using rsh)
  4) sleep(6); lseek(fd, 0L, SEEK_END)
  5) wbuf="testmsg-3-"; write(fd, wbuf, strlen(wbuf))
  6) lseek(fd, 0L, 0); read(fd, rbuf, MAX_DATA_SIZE)

 It is expected that the file data should be "testmsg-1-testmsg-2-testmsg-3",
 but in fact, after the program excuted, the file data is "testmsg-1-\0\0\0\0\0\0\0\0\0-testmsg-3".
 This problem just happens in the NFS mounted in the "sync" mode.

 I have investigated the problem and found the problem may be resolved by fixing the nfs_updatepage().

 When the lseek(fs,0L,SEEK_END) is invoked, the inode will be found invalid, and the nfs_revalidae_inode() will 
 update the flle size correctly, but the cache corresponding to the inode is not updately at the same time.
 Then at the final, the write(fd) write the "testmsg-3) from the update tail of the page cache, Because the NFS is mounted 
 in async mode, the write_updatepage just write the date in the local cache, do not commit the data to server in real time.
 In fact, the middle of the page cache of the inode is just filled with '\0'. 
 Finally, when the read(fd) is invoked, the nfsi->cache_validity is invalid and the local page cache is fresher than the cache sever,
 So the kernel commit all the local page cache to the server, as a result, the file on the sever is updated by the 
 invalid data "testmsg-1-\0\0\0\0\0\0\0\0\0-testmsg-3".

 To fix the problem, I think that the page cache should be checked in nfs_updatepage even in async mode. If the page is founded 
 that the nfsi->cache_validity is marked or the local inode->i_mtime and sever fattr.mtime is not the same, the kernel should 
 invoke the nfs_writepage_sync() to commit the writed data at real time.
 Then at the read(), even some stale cache data exists in the middle of the page cache, the nfsi->cache_validity is valid and the local page
 cache timestamp is the same as the cache sever, so the kernel will not commit the local invalid page cache to the server, 
 then finally the local page cache will be refreshed by the read with the correct file data on sever.

 I have made the patch attached below, it is confirmed the patch can fix the issue. 
 But I am still wonder that is the patch really reasonable and correct, or is the patch just not needed, this issue may can not
 be avoid in async in fact?

 Can somebody give your suggestion or other better fix method? 

Thanks

Regards.
ShiChao

Signed-off-by: ShiChao <shichao@cn.fujitsu.com>
--- linux/fs/nfs/write.c	2007-06-27 10:32:52.000000000 +0800
+++ linux-fix/fs/nfs/write.c	2007-06-27 10:29:55.000000000 +0800
@@ -836,6 +836,9 @@
 	struct inode	*inode = page->mapping->host;
 	struct nfs_page	*req;
 	int		status = 0;
+	int		result = 0 ;
+	struct nfs_inode *nfsi = NFS_I(inode);
+	struct nfs_fattr fattr;

 	nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);

@@ -852,8 +855,27 @@
	if (IS_SYNC(inode)) {
		status = nfs_writepage_sync(ctx, inode, page, offset, count, 0);
		if (status > 0) {
			if (offset == 0 && status == PAGE_CACHE_SIZE)
				SetPageUptodate(page);
			return 0;
		}
		return status;
-	}
-
+	}else{
+			   if (nfsi->cache_validity & NFS_INO_INVALID_DATA) 
+                           {	
+			           status = nfs_writepage_sync(ctx, inode, page, offset, count, 0);
+			           if (status > 0) {
+				        if (offset == 0 && status == PAGE_CACHE_SIZE)
+					   SetPageUptodate(page);
+				        return 0;
+			           }
+			   }else if ( nfs_attribute_timeout(inode)){
+                           result = NFS_PROTO(inode)->getattr(NFS_SERVER(inode), NFS_FH(inode), &fattr); 
+	                     if (!timespec_equal(&inode->i_mtime, &fattr.mtime)){
+			            status = nfs_writepage_sync(ctx, inode, page, offset, count, 0);
+			           if (status > 0) {
+				        if (offset == 0 && status == PAGE_CACHE_SIZE)
+					   SetPageUptodate(page);
+				        return 0;
+			              }     
+			    }
+			}
+        }
 	/* If we're not using byte range locks, and we know the page
 	 * is entirely in cache, it may be more efficient to avoid
 	 * fragmenting write requests.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly
  2007-06-27  3:33 [NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly shichao
@ 2007-06-27  3:39 ` Trond Myklebust
  0 siblings, 0 replies; 2+ messages in thread
From: Trond Myklebust @ 2007-06-27  3:39 UTC (permalink / raw)
  To: shichao; +Cc: nfsv4@linux-nfs.org, nfs@lists.sourceforge.net

On Wed, 2007-06-27 at 11:33 +0800, shichao wrote:
> Hi everybody,
> 
> When I write data on file over NFS (mounted aync)  both on client and on Server, 
> sometimes the client seems to miss the server's updated data and commit the stale cache data to the nfs server. This
> leads to a file writed uncorrectly.
> 
> Example on a local dir /local/testdir/foo NFS-exported to the localhost on /nfs/foo:
>  The exportfs option in server is: /nfs/  *(rw,sync,root_no_squash)
> 
> [root@rhel5~]# mount server:/nfs  /local/testdir
> 
>  My test program's main flow is as follows:
>   1) fd = open("/local/testdir/foo", oflags, CHMOD_RW )
>   2) wbuf="testmsg-1-"; write(fd, wbuf, strlen(wbuf))
>   3) On nfs Server: echo "testmsg-2-" >> /nfs/foo    (using rsh)
>   4) sleep(6); lseek(fd, 0L, SEEK_END)
>   5) wbuf="testmsg-3-"; write(fd, wbuf, strlen(wbuf))
>   6) lseek(fd, 0L, 0); read(fd, rbuf, MAX_DATA_SIZE)
> 
>  It is expected that the file data should be "testmsg-1-testmsg-2-testmsg-3",

No. The expectation in this case is precisely what you saw. You should
not be changing the file on the server unless you are using file locking
or some other synchronisation mechanism.

Please read the NFS FAQ on close-to-open cache consistency:
	http://nfs.sourceforge.net/#faq_a8

Trond


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2007-06-27  3:39 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-27  3:33 [NFS PATCH] Fix the issue of writing data over NFS concurrently both on client and Server uncorrectly shichao
2007-06-27  3:39 ` Trond Myklebust

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.