* [PATCH 000 of 2] A couple of possible nfs client patches
@ 2007-03-06 5:40 NeilBrown
2007-03-06 5:40 ` [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results NeilBrown
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: NeilBrown @ 2007-03-06 5:40 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Hi Trond,
I wonder if you would consider the following two patches.
The first fixes a real bug that is trivial to demonstrate: script
below. The problem is that the fattr which is extracted from
READDIRPLUS results does not get a ->time_start set. This can result
is incorrect attributes being set on files.
The second is really a work-around for a deeper problem (invalidate_inode_pages2
racing with do_no_page) but has value in itself I think. It extents the combination
-o nolock,nocto
to avoid flush and invalidate on locking requests.
Thanks,
NeilBrown
[PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results.
[PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed.
#!/bin/bash
#
# This script will produce the following errormessage from tar
# when run in an NFS mounted directory:
#
# tar: newdir/innerdir/innerfile: file changed as we read it
# create dirs
rm -rf nfstest
mkdir -p nfstest/dir/innerdir
# create files (should not be empty)
echo "Hello World!" >nfstest/dir/file
echo "Hello World!" >nfstest/dir/innerdir/innerfile
# problem only happens if we sleep before chmod
sleep 1
# change file modes
chmod -R a+r nfstest
# rename dir
mv nfstest/dir nfstest/newdir
# tar it
tar -cf nfstest/nfstest.tar -C nfstest newdir
# restore old dir name
mv nfstest/newdir nfstest/dir
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results. 2007-03-06 5:40 [PATCH 000 of 2] A couple of possible nfs client patches NeilBrown @ 2007-03-06 5:40 ` NeilBrown 2007-04-14 22:13 ` Trond Myklebust 2007-03-06 5:40 ` [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed NeilBrown 2007-03-06 14:57 ` [PATCH 000 of 2] A couple of possible nfs client patches Chuck Lever 2 siblings, 1 reply; 14+ messages in thread From: NeilBrown @ 2007-03-06 5:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: nfs The 'fattr' structure filled in by nfs3_decode_direct does not get a value for ->time_start set. Thus if an entry is for an inode that we already have in cache, when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode and may update the inode with out-of-date information. Directories are read a page at a time, so each page could have a different timestamp that "should" be used to set the time_start for the fattr for info in that page. However storing the timestamp per page is awkward. (We could stick in the first 4 bytes and only read 4092 bytes, but that is a bigger code change than I was interested it). This patch records the timestamp when the first page of a directory is read and uses it to set the time_start for the fattr for each each entry read from the directory. This if an inode has been updated more recently than the directory was read, that information will not be destroyed. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./fs/nfs/dir.c | 4 ++++ 1 file changed, 4 insertions(+) diff .prev/fs/nfs/dir.c ./fs/nfs/dir.c --- .prev/fs/nfs/dir.c 2007-03-06 15:35:44.000000000 +1100 +++ ./fs/nfs/dir.c 2007-03-06 15:46:46.000000000 +1100 @@ -153,6 +153,7 @@ typedef struct { decode_dirent_t decode; int plus; int error; + unsigned long timestamp; } nfs_readdir_descriptor_t; /* Now we cache directories properly, by stuffing the dirent @@ -202,6 +203,8 @@ int nfs_readdir_filler(nfs_readdir_descr * Note: assumes we have exclusive access to this mapping either * through inode->i_mutex or some other mechanism. */ + if (page->index == 0) + desc->timestamp = timestamp; if (page->index == 0 && invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1) < 0) { /* Should never happen */ nfs_zap_mapping(inode, inode->i_mapping); @@ -1145,6 +1148,7 @@ static struct dentry *nfs_readdir_lookup if (dentry == NULL) return NULL; dentry->d_op = NFS_PROTO(dir)->dentry_ops; + entry->fattr->time_start = desc->timestamp; inode = nfs_fhget(dentry->d_sb, entry->fh, entry->fattr); if (IS_ERR(inode)) { dput(dentry); ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results. 2007-03-06 5:40 ` [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results NeilBrown @ 2007-04-14 22:13 ` Trond Myklebust 2007-04-15 23:35 ` Neil Brown 0 siblings, 1 reply; 14+ messages in thread From: Trond Myklebust @ 2007-04-14 22:13 UTC (permalink / raw) To: NeilBrown; +Cc: nfs On Tue, 2007-03-06 at 16:40 +1100, NeilBrown wrote: > The 'fattr' structure filled in by nfs3_decode_direct does not get a > value for ->time_start set. > Thus if an entry is for an inode that we already have in cache, > when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode > and may update the inode with out-of-date information. > > Directories are read a page at a time, so each page could have a > different timestamp that "should" be used to set the time_start for > the fattr for info in that page. However storing the timestamp per > page is awkward. (We could stick in the first 4 bytes and only read 4092 > bytes, but that is a bigger code change than I was interested it). > > This patch records the timestamp when the first page of a directory is > read and uses it to set the time_start for the fattr for each each > entry read from the directory. This if an inode has been updated more > recently than the directory was read, that information will not be > destroyed. > > Signed-off-by: Neil Brown <neilb@suse.de> > > ### Diffstat output > ./fs/nfs/dir.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff .prev/fs/nfs/dir.c ./fs/nfs/dir.c > --- .prev/fs/nfs/dir.c 2007-03-06 15:35:44.000000000 +1100 > +++ ./fs/nfs/dir.c 2007-03-06 15:46:46.000000000 +1100 > @@ -153,6 +153,7 @@ typedef struct { > decode_dirent_t decode; > int plus; > int error; > + unsigned long timestamp; > } nfs_readdir_descriptor_t; > > /* Now we cache directories properly, by stuffing the dirent > @@ -202,6 +203,8 @@ int nfs_readdir_filler(nfs_readdir_descr > * Note: assumes we have exclusive access to this mapping either > * through inode->i_mutex or some other mechanism. > */ > + if (page->index == 0) > + desc->timestamp = timestamp; > if (page->index == 0 && invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1) < 0) { > /* Should never happen */ > nfs_zap_mapping(inode, inode->i_mapping); > @@ -1145,6 +1148,7 @@ static struct dentry *nfs_readdir_lookup > if (dentry == NULL) > return NULL; > dentry->d_op = NFS_PROTO(dir)->dentry_ops; > + entry->fattr->time_start = desc->timestamp; > inode = nfs_fhget(dentry->d_sb, entry->fh, entry->fattr); > if (IS_ERR(inode)) { > dput(dentry); Hmm... NACK on this one. If page->index != 0, or if we're reading a cached page then desc->timestamp is uninitialised. One solution might be to detect if find_dirent_page called nfs_readdir_filler, and then to have dir_decode() set NFS_ATTR_FATTR only if we're reading a fresh page? Cheers Trond ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results. 2007-04-14 22:13 ` Trond Myklebust @ 2007-04-15 23:35 ` Neil Brown 2007-04-16 15:23 ` Trond Myklebust 0 siblings, 1 reply; 14+ messages in thread From: Neil Brown @ 2007-04-15 23:35 UTC (permalink / raw) To: Trond Myklebust; +Cc: nfs On Saturday April 14, trond.myklebust@fys.uio.no wrote: > > Hmm... NACK on this one. If page->index != 0, or if we're reading a > cached page then desc->timestamp is uninitialised. Well... not completely uninitialised. The 'memset' in nfs_readdir sets it to zero, but that isn't really helpful. Yes, the patch is broken as it stands. > > One solution might be to detect if find_dirent_page called > nfs_readdir_filler, and then to have dir_decode() set NFS_ATTR_FATTR > only if we're reading a fresh page? That probably works well enough. If the app is doing very small getdirents call, the first few entries from each page will have the readdir_plus info honoured and the remainder will have it ignored. I doubt that is a problem in real-life.... unless it means the d_type field doesn't get set properly. Anyway, here is an updated patch that uses the above idea and has been tested to solve the particular symptom. Thanks, NeilBrown Don't use uninitialsed value for fattr->time_start in readdirplus results. The 'fattr' structure filled in by nfs3_decode_direct does not get a value for ->time_start set. Thus if an entry is for an inode that we already have in cache, when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode and may update the inode with out-of-date information. Directories are read a page at a time, so each page could have a different timestamp that "should" be used to set the time_start for the fattr for info in that page. However storing the timestamp per page is awkward. (We could stick in the first 4 bytes and only read 4092 bytes, but that is a bigger code change than I am interested it). This patch ignores the readdir_plus attributes if a readdir finds the information already in cache, and otherwise sets ->time_start to the time the readdir request was sent to the server. It might be nice to store - in the directory inode - the time stamp for the earliest readdir request that is still in the page cache, so that we don't ignore attribute data that we don't have to. This patch doesn't do that. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./fs/nfs/dir.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff .prev/fs/nfs/dir.c ./fs/nfs/dir.c --- .prev/fs/nfs/dir.c 2007-04-16 09:15:12.000000000 +1000 +++ ./fs/nfs/dir.c 2007-04-16 09:30:30.000000000 +1000 @@ -153,6 +153,8 @@ typedef struct { decode_dirent_t decode; int plus; int error; + unsigned long timestamp; + int timestamp_valid; } nfs_readdir_descriptor_t; /* Now we cache directories properly, by stuffing the dirent @@ -194,6 +196,8 @@ int nfs_readdir_filler(nfs_readdir_descr } goto error; } + desc->timestamp = timestamp; + desc->timestamp_valid = 1; SetPageUptodate(page); spin_lock(&inode->i_lock); NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATIME; @@ -224,6 +228,10 @@ int dir_decode(nfs_readdir_descriptor_t if (IS_ERR(p)) return PTR_ERR(p); desc->ptr = p; + if (desc->timestamp_valid) + desc->entry->fattr->time_start = desc->timestamp; + else + desc->entry->fattr->valid &= ~NFS_ATTR_FATTR; return 0; } @@ -315,6 +323,10 @@ int find_dirent_page(nfs_readdir_descrip __FUNCTION__, desc->page_index, (long long) *desc->dir_cookie); + /* If we find the page in the page_cache, we cannot be sure + * how fresh the data is, so we will ignore readdir_plus attributes. + */ + desc->timestamp_valid = 0; page = read_cache_page(inode->i_mapping, desc->page_index, (filler_t *)nfs_readdir_filler, desc); if (IS_ERR(page)) { @@ -462,6 +474,7 @@ int uncached_readdir(nfs_readdir_descrip struct rpc_cred *cred = nfs_file_cred(file); struct page *page = NULL; int status; + unsigned long timestamp; dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n", (unsigned long long)*desc->dir_cookie); @@ -471,6 +484,7 @@ int uncached_readdir(nfs_readdir_descrip status = -ENOMEM; goto out; } + timestamp = jiffies; desc->error = NFS_PROTO(inode)->readdir(file->f_path.dentry, cred, *desc->dir_cookie, page, NFS_SERVER(inode)->dtsize, @@ -481,6 +495,8 @@ int uncached_readdir(nfs_readdir_descrip desc->page = page; desc->ptr = kmap(page); /* matching kunmap in nfs_do_filldir */ if (desc->error >= 0) { + desc->timestamp = timestamp; + desc->timestamp_valid = 1; if ((status = dir_decode(desc)) == 0) desc->entry->prev_cookie = *desc->dir_cookie; } else ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results. 2007-04-15 23:35 ` Neil Brown @ 2007-04-16 15:23 ` Trond Myklebust 0 siblings, 0 replies; 14+ messages in thread From: Trond Myklebust @ 2007-04-16 15:23 UTC (permalink / raw) To: Neil Brown; +Cc: nfs On Mon, 2007-04-16 at 09:35 +1000, Neil Brown wrote: > > > > One solution might be to detect if find_dirent_page called > > nfs_readdir_filler, and then to have dir_decode() set NFS_ATTR_FATTR > > only if we're reading a fresh page? > > That probably works well enough. If the app is doing very small > getdirents call, the first few entries from each page will have the > readdir_plus info honoured and the remainder will have it ignored. I > doubt that is a problem in real-life.... unless it means the d_type > field doesn't get set properly. > > Anyway, here is an updated patch that uses the above idea and has been > tested to solve the particular symptom. This looks better. I'm putting it into testing with a view to inclusion in 2.6.22. Thanks Trond ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-03-06 5:40 [PATCH 000 of 2] A couple of possible nfs client patches NeilBrown 2007-03-06 5:40 ` [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results NeilBrown @ 2007-03-06 5:40 ` NeilBrown 2007-03-06 14:40 ` Trond Myklebust 2007-03-06 14:57 ` [PATCH 000 of 2] A couple of possible nfs client patches Chuck Lever 2 siblings, 1 reply; 14+ messages in thread From: NeilBrown @ 2007-03-06 5:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: nfs If nolock and nocto mount options are set, then the implication is that cache consistency with other clients is not desired. This is likely to be the case when the filesystem is only access by the one client - as a nfs-mounted root might be. In that case, there is no benefit in flushing writes and invalidating caches around lock/unlock requests. This patch makes those flushes and invalidates conditional on either nolock or nocto being clear. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./fs/nfs/file.c | 46 +++++++++++++++++++++++++++------------------- 1 file changed, 27 insertions(+), 19 deletions(-) diff .prev/fs/nfs/file.c ./fs/nfs/file.c --- .prev/fs/nfs/file.c 2007-03-06 16:13:00.000000000 +1100 +++ ./fs/nfs/file.c 2007-03-06 16:13:06.000000000 +1100 @@ -445,11 +445,14 @@ static int do_unlk(struct file *filp, in struct inode *inode = filp->f_mapping->host; int status; - /* - * Flush all pending writes before doing anything - * with locks.. - */ - nfs_sync_mapping(filp->f_mapping); + if (!((NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) && + (NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))) { + /* + * Flush all pending writes before doing anything + * with locks.. + */ + nfs_sync_mapping(filp->f_mapping); + } /* NOTE: special case * If we're signalled while cleaning up locks on process exit, we @@ -470,14 +473,16 @@ static int do_setlk(struct file *filp, i struct inode *inode = filp->f_mapping->host; int status; - /* - * Flush all pending writes before doing anything - * with locks.. - */ - status = nfs_sync_mapping(filp->f_mapping); - if (status != 0) - goto out; - + if (!((NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) && + (NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))) { + /* + * Flush all pending writes before doing anything + * with locks.. + */ + status = nfs_sync_mapping(filp->f_mapping); + if (status != 0) + goto out; + } lock_kernel(); /* Use local locking if mounted with "-onolock" */ if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM)) { @@ -495,12 +500,15 @@ static int do_setlk(struct file *filp, i unlock_kernel(); if (status < 0) goto out; - /* - * Make sure we clear the cache whenever we try to get the lock. - * This makes locking act as a cache coherency point. - */ - nfs_sync_mapping(filp->f_mapping); - nfs_zap_caches(inode); + if (!((NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) && + (NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))) { + /* + * Make sure we clear the cache whenever we try to get the lock. + * This makes locking act as a cache coherency point. + */ + nfs_sync_mapping(filp->f_mapping); + nfs_zap_caches(inode); + } out: return status; } ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-03-06 5:40 ` [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed NeilBrown @ 2007-03-06 14:40 ` Trond Myklebust 2007-03-07 23:03 ` Neil Brown 0 siblings, 1 reply; 14+ messages in thread From: Trond Myklebust @ 2007-03-06 14:40 UTC (permalink / raw) To: NeilBrown; +Cc: nfs On Tue, 2007-03-06 at 16:40 +1100, NeilBrown wrote: > If nolock and nocto mount options are set, then the implication is > that cache consistency with other clients is not desired. This is > likely to be the case when the filesystem is only access by the one > client - as a nfs-mounted root might be. > > In that case, there is no benefit in flushing writes and invalidating > caches around lock/unlock requests. > > This patch makes those flushes and invalidates conditional on either > nolock or nocto being clear. > > Signed-off-by: Neil Brown <neilb@suse.de> NACK. Disabling close to open cache consistency using the "nocto" option is not equivalent to disabling _all_ cache consistency. If people are using remote locking, then it is because they are expecting a situation where synchronisation is needed. You can argue for or against the need to invalidate caches when the user has specified "nolock" (it might be nice to add a Solaris-style "llock" alias to this one day, btw: nolock == local locking != no locking at all). We used to do this in early 2.4.x kernels, but that was changed. The reason is that locking is the only way of ensuring that the NFS cache is in sync with the server, and the "nolock" exception was breaking this. Cheers, Trond > ### Diffstat output > ./fs/nfs/file.c | 46 +++++++++++++++++++++++++++------------------- > 1 file changed, 27 insertions(+), 19 deletions(-) > > diff .prev/fs/nfs/file.c ./fs/nfs/file.c > --- .prev/fs/nfs/file.c 2007-03-06 16:13:00.000000000 +1100 > +++ ./fs/nfs/file.c 2007-03-06 16:13:06.000000000 +1100 > @@ -445,11 +445,14 @@ static int do_unlk(struct file *filp, in > struct inode *inode = filp->f_mapping->host; > int status; > > - /* > - * Flush all pending writes before doing anything > - * with locks.. > - */ > - nfs_sync_mapping(filp->f_mapping); > + if (!((NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) && > + (NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))) { > + /* > + * Flush all pending writes before doing anything > + * with locks.. > + */ > + nfs_sync_mapping(filp->f_mapping); > + } > > /* NOTE: special case > * If we're signalled while cleaning up locks on process exit, we > @@ -470,14 +473,16 @@ static int do_setlk(struct file *filp, i > struct inode *inode = filp->f_mapping->host; > int status; > > - /* > - * Flush all pending writes before doing anything > - * with locks.. > - */ > - status = nfs_sync_mapping(filp->f_mapping); > - if (status != 0) > - goto out; > - > + if (!((NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) && > + (NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))) { > + /* > + * Flush all pending writes before doing anything > + * with locks.. > + */ > + status = nfs_sync_mapping(filp->f_mapping); > + if (status != 0) > + goto out; > + } > lock_kernel(); > /* Use local locking if mounted with "-onolock" */ > if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM)) { > @@ -495,12 +500,15 @@ static int do_setlk(struct file *filp, i > unlock_kernel(); > if (status < 0) > goto out; > - /* > - * Make sure we clear the cache whenever we try to get the lock. > - * This makes locking act as a cache coherency point. > - */ > - nfs_sync_mapping(filp->f_mapping); > - nfs_zap_caches(inode); > + if (!((NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) && > + (NFS_SERVER(inode)->flags & NFS_MOUNT_NOCTO))) { > + /* > + * Make sure we clear the cache whenever we try to get the lock. > + * This makes locking act as a cache coherency point. > + */ > + nfs_sync_mapping(filp->f_mapping); > + nfs_zap_caches(inode); > + } > out: > return status; > } ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-03-06 14:40 ` Trond Myklebust @ 2007-03-07 23:03 ` Neil Brown 2007-04-14 22:21 ` Trond Myklebust 0 siblings, 1 reply; 14+ messages in thread From: Neil Brown @ 2007-03-07 23:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: NeilBrown, nfs On Tuesday March 6, trond.myklebust@fys.uio.no wrote: > On Tue, 2007-03-06 at 16:40 +1100, NeilBrown wrote: > > If nolock and nocto mount options are set, then the implication is > > that cache consistency with other clients is not desired. This is > > likely to be the case when the filesystem is only access by the one > > client - as a nfs-mounted root might be. > > > > In that case, there is no benefit in flushing writes and invalidating > > caches around lock/unlock requests. > > > > This patch makes those flushes and invalidates conditional on either > > nolock or nocto being clear. > > > > Signed-off-by: Neil Brown <neilb@suse.de> > > NACK. > > Disabling close to open cache consistency using the "nocto" option is > not equivalent to disabling _all_ cache consistency. If people are using > remote locking, then it is because they are expecting a situation where > synchronisation is needed. "If people are using remote locking..."... but in this case (nolock) they aren't. > > You can argue for or against the need to invalidate caches when the user > has specified "nolock" (it might be nice to add a Solaris-style "llock" > alias to this one day, btw: nolock == local locking != no locking at > all). We used to do this in early 2.4.x kernels, but that was changed. > The reason is that locking is the only way of ensuring that the NFS > cache is in sync with the server, and the "nolock" exception was > breaking this. Yes, the name "nolock" does seem to have been unfortunate. We had one customer wondering why it wasn't producing an 'ENOLCK' error status. We could create an alias. "nonlm"? If you have chosen nocto,nolock, I cannot see any justification in expecting any way of ensuring the NFS cache is in sync with the server except at unmount. However if you don't like it, that's fine. It was just a "this might be useful, what do you think" rather than "this is important". The other patch of the two was the real bugfix. Did that pass muster? But I worry about adding an 'llock' option that was different from 'nolock'. How would you explain the difference in a way that didn't confuse people? NeilBrown ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-03-07 23:03 ` Neil Brown @ 2007-04-14 22:21 ` Trond Myklebust 2007-04-16 1:49 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: Trond Myklebust @ 2007-04-14 22:21 UTC (permalink / raw) To: Neil Brown; +Cc: nfs On Thu, 2007-03-08 at 10:03 +1100, Neil Brown wrote: > > > > You can argue for or against the need to invalidate caches when the user > > has specified "nolock" (it might be nice to add a Solaris-style "llock" > > alias to this one day, btw: nolock == local locking != no locking at > > all). We used to do this in early 2.4.x kernels, but that was changed. > > The reason is that locking is the only way of ensuring that the NFS > > cache is in sync with the server, and the "nolock" exception was > > breaking this. > > Yes, the name "nolock" does seem to have been unfortunate. We had one > customer wondering why it wasn't producing an 'ENOLCK' error status. > We could create an alias. "nonlm"? I'd prefer "llock", simply because that mirrors the same option on Solaris. > If you have chosen nocto,nolock, I cannot see any justification in > expecting any way of ensuring the NFS cache is in sync with the > server except at unmount. > > However if you don't like it, that's fine. It was just a "this might > be useful, what do you think" rather than "this is important". The > other patch of the two was the real bugfix. Did that pass muster? > > But I worry about adding an 'llock' option that was different from > 'nolock'. How would you explain the difference in a way that didn't > confuse people? Are we sure that 'llock' differs from 'nolock'? If Chuck is right, then we could simply make 'llock' set both the 'nolock' 'nocto' flags, and just document that. I'm not sure that he is, though. According to the manpage, Solaris documents nocto No close-to-open consistency. llock Local locking being used (no lock manager). Cheers Trond ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-04-14 22:21 ` Trond Myklebust @ 2007-04-16 1:49 ` Chuck Lever 2007-04-16 14:12 ` Peter Staubach 2007-04-16 14:41 ` Trond Myklebust 0 siblings, 2 replies; 14+ messages in thread From: Chuck Lever @ 2007-04-16 1:49 UTC (permalink / raw) To: Trond Myklebust; +Cc: Neil Brown, nfs [-- Attachment #1: Type: text/plain, Size: 3080 bytes --] Trond Myklebust wrote: > On Thu, 2007-03-08 at 10:03 +1100, Neil Brown wrote: >>> You can argue for or against the need to invalidate caches when the user >>> has specified "nolock" (it might be nice to add a Solaris-style "llock" >>> alias to this one day, btw: nolock == local locking != no locking at >>> all). We used to do this in early 2.4.x kernels, but that was changed. >>> The reason is that locking is the only way of ensuring that the NFS >>> cache is in sync with the server, and the "nolock" exception was >>> breaking this. Why not create a special fcntl for NFS files that invalidates cached data for that file? That way, no matter what semantics a lock or unlock operation has, there is always a well-defined way to flush the page cache. >> Yes, the name "nolock" does seem to have been unfortunate. We had one >> customer wondering why it wasn't producing an 'ENOLCK' error status. >> We could create an alias. "nonlm"? > > I'd prefer "llock", simply because that mirrors the same option on > Solaris. > >> If you have chosen nocto,nolock, I cannot see any justification in >> expecting any way of ensuring the NFS cache is in sync with the >> server except at unmount. >> >> However if you don't like it, that's fine. It was just a "this might >> be useful, what do you think" rather than "this is important". The >> other patch of the two was the real bugfix. Did that pass muster? >> >> But I worry about adding an 'llock' option that was different from >> 'nolock'. How would you explain the difference in a way that didn't >> confuse people? > > Are we sure that 'llock' differs from 'nolock'? If Chuck is right, then > we could simply make 'llock' set both the 'nolock' 'nocto' flags, and > just document that. I'm not sure that he is, though. According to the > manpage, Solaris documents > > nocto > > No close-to-open consistency. > > llock > > Local locking being used (no lock manager). We should be careful here. Solaris, IIRC, has different cache coherency semantics from Linux when a file is locked, so it might be difficult to compare mount options directly between Linux and Solaris. Locked NFS files on Solaris get uncached I/O limited to 8KB. I don't know exactly how llock changes this, but I suspect it probably doesn't change it at all. Perhaps someone with Solaris NFS development experience can shed some light. IMO an llock mount option is sorely missing on Linux... and I think it should disable the "automatic" cache flushing behaviors as well. When local locking is used, clearly cache coherency with other clients isn't needed, and therefore invalidate-on-lock and flush-on-unlock semantics are not necessary and serve only to slow things down. I don't necessarily disagree with Trond about requiring some mechanism for cache invalidation to be available. However, as an aside, if the NFS cache invalidation logic is working overall, it really shouldn't ever be necessary for applications to trigger invalidation by hand. [-- Attachment #2: chuck.lever.vcf --] [-- Type: text/x-vcard, Size: 327 bytes --] begin:vcard fn:Chuck Lever n:Lever;Chuck org:Oracle Corporation;Corporate Architecture: Linux Projects Group adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA email;internet:chuck dot lever at nospam oracle dot com title:Principal Member of Staff tel;work:+1 248 614 5091 x-mozilla-html:FALSE version:2.1 end:vcard [-- Attachment #3: Type: text/plain, Size: 286 bytes --] ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ [-- Attachment #4: Type: text/plain, Size: 140 bytes --] _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-04-16 1:49 ` Chuck Lever @ 2007-04-16 14:12 ` Peter Staubach 2007-04-16 14:41 ` Trond Myklebust 1 sibling, 0 replies; 14+ messages in thread From: Peter Staubach @ 2007-04-16 14:12 UTC (permalink / raw) To: chuck.lever; +Cc: Neil Brown, nfs, Trond Myklebust Chuck Lever wrote: > Trond Myklebust wrote: >> On Thu, 2007-03-08 at 10:03 +1100, Neil Brown wrote: >>>> You can argue for or against the need to invalidate caches when the >>>> user >>>> has specified "nolock" (it might be nice to add a Solaris-style >>>> "llock" >>>> alias to this one day, btw: nolock == local locking != no locking at >>>> all). We used to do this in early 2.4.x kernels, but that was changed. >>>> The reason is that locking is the only way of ensuring that the NFS >>>> cache is in sync with the server, and the "nolock" exception was >>>> breaking this. > > Why not create a special fcntl for NFS files that invalidates cached > data for that file? That way, no matter what semantics a lock or > unlock operation has, there is always a well-defined way to flush the > page cache. > >>> Yes, the name "nolock" does seem to have been unfortunate. We had one >>> customer wondering why it wasn't producing an 'ENOLCK' error status. >>> We could create an alias. "nonlm"? >> >> I'd prefer "llock", simply because that mirrors the same option on >> Solaris. >> >>> If you have chosen nocto,nolock, I cannot see any justification in >>> expecting any way of ensuring the NFS cache is in sync with the >>> server except at unmount. >>> >>> However if you don't like it, that's fine. It was just a "this might >>> be useful, what do you think" rather than "this is important". The >>> other patch of the two was the real bugfix. Did that pass muster? >>> >>> But I worry about adding an 'llock' option that was different from >>> 'nolock'. How would you explain the difference in a way that didn't >>> confuse people? >> >> Are we sure that 'llock' differs from 'nolock'? If Chuck is right, then >> we could simply make 'llock' set both the 'nolock' 'nocto' flags, and >> just document that. I'm not sure that he is, though. According to the >> manpage, Solaris documents >> >> nocto >> No close-to-open consistency. >> llock >> Local locking being used (no lock manager). > > We should be careful here. Solaris, IIRC, has different cache > coherency semantics from Linux when a file is locked, so it might be > difficult to compare mount options directly between Linux and Solaris. > > Locked NFS files on Solaris get uncached I/O limited to 8KB. I don't > know exactly how llock changes this, but I suspect it probably doesn't > change it at all. Perhaps someone with Solaris NFS development > experience can shed some light. > "llock" or "local lock" implies that only local locks will be used. Thus, only the locks that the kernel is managing are used or kept track of. No over the wire traffic for locking is issued. With this option, caching is enabled. Actually, with current versions of Solaris, caching is also enabled when the entire file is locked. There is a flush on unlock and a check for consistency on lock, much like the "nocto" handling, but otherwise normal page caching applies. > IMO an llock mount option is sorely missing on Linux... and I think it > should disable the "automatic" cache flushing behaviors as well. When > local locking is used, clearly cache coherency with other clients > isn't needed, and therefore invalidate-on-lock and flush-on-unlock > semantics are not necessary and serve only to slow things down. > Yes, this last is very true and mirrors the Solaris "llock" implementation semantics. > I don't necessarily disagree with Trond about requiring some mechanism > for cache invalidation to be available. However, as an aside, if the > NFS cache invalidation logic is working overall, it really shouldn't > ever be necessary for applications to trigger invalidation by hand. I would agree with this last. The kernel should always "do the right thing", from a correctness viewpoint and not depend upon applications to do it for the kernel. Thanx... ps ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-04-16 1:49 ` Chuck Lever 2007-04-16 14:12 ` Peter Staubach @ 2007-04-16 14:41 ` Trond Myklebust 2007-04-16 14:48 ` Peter Staubach 1 sibling, 1 reply; 14+ messages in thread From: Trond Myklebust @ 2007-04-16 14:41 UTC (permalink / raw) To: chuck.lever, Peter Staubach; +Cc: Neil Brown, nfs On Sun, 2007-04-15 at 21:49 -0400, Chuck Lever wrote: > Trond Myklebust wrote: > > On Thu, 2007-03-08 at 10:03 +1100, Neil Brown wrote: > >>> You can argue for or against the need to invalidate caches when the user > >>> has specified "nolock" (it might be nice to add a Solaris-style "llock" > >>> alias to this one day, btw: nolock == local locking != no locking at > >>> all). We used to do this in early 2.4.x kernels, but that was changed. > >>> The reason is that locking is the only way of ensuring that the NFS > >>> cache is in sync with the server, and the "nolock" exception was > >>> breaking this. > > Why not create a special fcntl for NFS files that invalidates cached > data for that file? That way, no matter what semantics a lock or unlock > operation has, there is always a well-defined way to flush the page cache. That would break the code portability requirement. Locking is the only operation that has known and portable semantics w.r.t. cache consistency. > I don't necessarily disagree with Trond about requiring some mechanism > for cache invalidation to be available. However, as an aside, if the > NFS cache invalidation logic is working overall, it really shouldn't > ever be necessary for applications to trigger invalidation by hand. Not sure what you mean here. Portable applications may have synchronisation needs that go outside or beyond the usual close-to-open model. If they use NLM locking, then they have no problems. If they have their own locking, then they might not want to use NLM, but they still may have a need to invalidate the cache. On Mon, 16 Apr 2007 at 10:12:09 -0400, Peter Staubach wrote: > "llock" or "local lock" implies that only local locks will be used. > Thus, only the locks that the kernel is managing are used or kept > track of. No over the wire traffic for locking is issued. > > With this option, caching is enabled. So are you saying that 'llock' basically mirrors our 'nolock', or does it additionally disable the flushing operations? If the latter, then I suggest that we add a mount flag, NFS_MOUNT_LLOCK which automatically sets NFS_MOUNT_NONLM + a new flag that just disables the cache flush on lock/unlock. Cheers Trond ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed. 2007-04-16 14:41 ` Trond Myklebust @ 2007-04-16 14:48 ` Peter Staubach 0 siblings, 0 replies; 14+ messages in thread From: Peter Staubach @ 2007-04-16 14:48 UTC (permalink / raw) To: Trond Myklebust; +Cc: Neil Brown, nfs Trond Myklebust wrote: > > > On Mon, 16 Apr 2007 at 10:12:09 -0400, Peter Staubach wrote: > >> "llock" or "local lock" implies that only local locks will be used. >> Thus, only the locks that the kernel is managing are used or kept >> track of. No over the wire traffic for locking is issued. >> >> With this option, caching is enabled. >> > > So are you saying that 'llock' basically mirrors our 'nolock', or does > it additionally disable the flushing operations? If the latter, then I > suggest that we add a mount flag, NFS_MOUNT_LLOCK which automatically > sets NFS_MOUNT_NONLM + a new flag that just disables the cache flush on > lock/unlock. I believe that the use of "llock" also disables the cache flush at lock/unlock. Perhaps someone with access to the OpenSolaris code, who isn't worried about CDDL versus GPL issues, could look to verify. It has been a bit since I've worked on that code. I do remember that the "llock" code was designed to be used for things like root file systems, where there shouldn't be any conflicts between different clients, and hence, no need for special page cache handling. Thanx... ps ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 000 of 2] A couple of possible nfs client patches 2007-03-06 5:40 [PATCH 000 of 2] A couple of possible nfs client patches NeilBrown 2007-03-06 5:40 ` [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results NeilBrown 2007-03-06 5:40 ` [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed NeilBrown @ 2007-03-06 14:57 ` Chuck Lever 2 siblings, 0 replies; 14+ messages in thread From: Chuck Lever @ 2007-03-06 14:57 UTC (permalink / raw) To: NeilBrown; +Cc: nfs, Trond Myklebust [-- Attachment #1: Type: text/plain, Size: 808 bytes --] NeilBrown wrote: > Hi Trond, > I wonder if you would consider the following two patches. > > The first fixes a real bug that is trivial to demonstrate: script > below. The problem is that the fattr which is extracted from > READDIRPLUS results does not get a ->time_start set. This can result > is incorrect attributes being set on files. > > The second is really a work-around for a deeper problem (invalidate_inode_pages2 > racing with do_no_page) but has value in itself I think. It extents the combination > -o nolock,nocto > to avoid flush and invalidate on locking requests. This sounds like the behavior enabled by the 'llock' mount option on Solaris. Is that your intent? IMO it is a clever idea to enable 'llock' behavior on Linux with the combination of already existing mount options. [-- Attachment #2: chuck.lever.vcf --] [-- Type: text/x-vcard, Size: 265 bytes --] begin:vcard fn:Chuck Lever n:Lever;Chuck org:Oracle Corporation;Corporate Architecture Linux Projects Group email;internet:chuck dot lever at nospam oracle dot com title:Principal Member of Staff tel;work:+1 248 614 5091 x-mozilla-html:FALSE version:2.1 end:vcard [-- Attachment #3: Type: text/plain, Size: 345 bytes --] ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV [-- Attachment #4: Type: text/plain, Size: 140 bytes --] _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-04-16 15:23 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-06 5:40 [PATCH 000 of 2] A couple of possible nfs client patches NeilBrown 2007-03-06 5:40 ` [PATCH 001 of 2] Set meaningful value for fattr->time_start in readdirplus results NeilBrown 2007-04-14 22:13 ` Trond Myklebust 2007-04-15 23:35 ` Neil Brown 2007-04-16 15:23 ` Trond Myklebust 2007-03-06 5:40 ` [PATCH 002 of 2] Avoid flush-when-locking when interclient consistency not needed NeilBrown 2007-03-06 14:40 ` Trond Myklebust 2007-03-07 23:03 ` Neil Brown 2007-04-14 22:21 ` Trond Myklebust 2007-04-16 1:49 ` Chuck Lever 2007-04-16 14:12 ` Peter Staubach 2007-04-16 14:41 ` Trond Myklebust 2007-04-16 14:48 ` Peter Staubach 2007-03-06 14:57 ` [PATCH 000 of 2] A couple of possible nfs client patches Chuck Lever
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.