* Use of READDIRPLUS on large directories
@ 2011-03-16 4:55 NeilBrown
2011-03-16 12:30 ` peter.staubach
2011-03-16 13:43 ` Chuck Lever
0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2011-03-16 4:55 UTC
To: Trond Myklebust, Bryan Schumaker; +Cc: linux-nfs
Hi Trond / Bryan et al.
Now that openSUSE 11.4 is out I have started getting a few reports
of regressions that can be traced to
commit 0715dc632a271fc0fedf3ef4779fe28ac1e53ef4
Author: Bryan Schumaker <bjschuma@netapp.com>
Date: Fri Sep 24 18:50:01 2010 -0400
NFS: remove readdir plus limit
We will now use readdir plus even on directories that are very large.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This particularly affects users with their home directory over
NFS, and with largish maildir mail folders.
Where it used to take a smallish number of seconds for (e.g.)
xbiff to start up and read through the various directories, it now
takes multiple minutes.
I can confirm that the slow down is due to readdirplus by mounting the
filesystem with nordirplus.
While I can understand that there are sometimes benefits in using
readdirplus for very large directories, there are also obviously real
costs. So I think we have to see this patch as a regression that should
be reverted.
It would quite possibly make sense to create a tunable (mount option or
sysctl I guess) to set the max size for directories to use readdirplus,
but I think it really should be an opt-in situation.
[[ It would also be really nice if the change-log for such a significant
change contained a little more justification.... :-( ]]
Thoughts?
Thanks,
NeilBrown

* RE: Use of READDIRPLUS on large directories
From: peter.staubach @ 2011-03-16 12:30 UTC
To: neilb, Trond.Myklebust, bjschuma; +Cc: linux-nfs

Perhaps the use of a heuristic that enables readdirplus only after the
application has shown that it is interested in the attributes for each
entry in the directory? Thus, if the application does
readdir()/stat()/stat()/stat()/readdir()/... then the NFS client could
use readdirplus to fill the caches. If the application is just reading
the directory and looking at the names, then the client could just use
readdir.

ps

* RE: Use of READDIRPLUS on large directories
From: Trond Myklebust @ 2011-03-16 13:50 UTC
To: peter.staubach; +Cc: neilb, bjschuma, linux-nfs

On Wed, 2011-03-16 at 08:30 -0400, peter.staubach@emc.com wrote:
> Perhaps the use of a heuristic that enables readdirplus only after the
> application has shown that it is interested in the attributes for each
> entry in the directory? Thus, if the application does
> readdir()/stat()/stat()/stat()/readdir()/... then the NFS client could
> use readdirplus to fill the caches. If the application is just reading
> the directory and looking at the names, then the client could just use
> readdir.
>
> ps

Yes, possibly. The thing that convinced me that we should get rid of the
limit was when Bryan was testing directories with 10^6 entries, and was
seeing an order of magnitude improvement when comparing readdirplus vs.
readdir on 'ls -l' workloads. I wish he had published the actual numbers
in the changelog.

As I recall, the slowdown when comparing readdirplus vs readdir on 'ls'
workloads was far less.

You can easily test that yourself, using the "-onordirplus" mount option
to turn off readdirplus (which, btw, remains a workaround for people who
don't care about 'ls -l' workloads).

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

* Re: Use of READDIRPLUS on large directories
From: NeilBrown @ 2011-03-16 21:40 UTC
To: peter.staubach; +Cc: Trond.Myklebust, bjschuma, linux-nfs

On Wed, 16 Mar 2011 08:30:20 -0400 <peter.staubach@emc.com> wrote:

> Perhaps the use of a heuristic that enables readdirplus only after the
> application has shown that it is interested in the attributes for each
> entry in the directory? Thus, if the application does
> readdir()/stat()/stat()/stat()/readdir()/... then the NFS client could
> use readdirplus to fill the caches. If the application is just reading
> the directory and looking at the names, then the client could just use
> readdir.

I think this could work very well.
"ls -l" certainly calls 'stat' on each file after each 'getdents' call.

So we could arrange that the first readdir call on a directory
always uses the 'plus' version, and clears a "seen any getattr calls"
flag on the directory.

nfs_getattr then sets that flag on the parent.

Subsequent readdir calls only use 'plus' if the flag was set, and
clear the flag again.

There might be odd issues with multiple processes reading and stating
in the same directory, but they probably aren't very serious.

I might give this idea a try ... but I still think the original
switch to always use readdirplus is a regression and should be reverted.

Thanks,
NeilBrown
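
The flag-based heuristic described in the message above can be sketched in plain user-space C. This is only an illustration of the decision logic, not kernel code: `struct dir_hint`, `readdir_use_plus()` and `note_child_getattr()` are hypothetical names invented for the sketch, and locking between concurrent readers is ignored.

```c
#include <stdbool.h>

/* Hypothetical per-directory state; the names are illustrative only. */
struct dir_hint {
	bool seen_getattr;	/* set when a child of this directory is stat()ed */
};

/*
 * Decide whether the next readdir on this directory should use the
 * 'plus' variant.  pos is the current directory offset: 0 means this
 * is the first readdir call, which always uses 'plus'.  The flag is
 * cleared on every call, so each decision reflects only the getattr
 * activity seen since the previous readdir.
 */
static bool readdir_use_plus(struct dir_hint *dir, long pos)
{
	bool plus = (pos == 0) || dir->seen_getattr;

	dir->seen_getattr = false;
	return plus;
}

/* Record that an attribute lookup was made on a child of this directory. */
static void note_child_getattr(struct dir_hint *dir)
{
	dir->seen_getattr = true;
}
```

With this logic a plain "ls" falls back to the non-plus variant after the first call, while an "ls -l" style readdir/stat/stat/... pattern keeps setting the flag and so stays on the 'plus' variant.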

* Re: Use of READDIRPLUS on large directories
From: NeilBrown @ 2011-03-17 0:55 UTC
To: peter.staubach, Trond.Myklebust; +Cc: bjschuma, linux-nfs

On Thu, 17 Mar 2011 08:40:38 +1100 NeilBrown <neilb@suse.de> wrote:

> I might give this idea a try ... but I still think the original
> switch to always use readdirplus is a regression and should be reverted.

I've been experimenting with this some more.

I've been using an otherwise unloaded NFS server (4 year old consumer
Linux box) with ordinary drives and networking etc.
Mounting with NFSv3 and default options (unless specified).

I created a directory with 30,000 small files.

    echo 3 > /proc/sys/vm/drop_caches

on both server and client before running a test. All timing runs on
client (of course).

% time ls --color=never > /dev/null

This takes about 4 seconds on 2.6.38, using READDIRPLUS.
With the patch below applied it takes about 1.5 seconds.
The first 44 requests are READDIRPLUS, which provide the 1024 entries
requested by the getdents64 call. The remaining requests are READDIR.

So on a big directory I get a factor-of-2 speed up. On a more loaded NFS
server the real numbers might be bigger(??)

% time ls -l --color=never > /dev/null

This takes about 25 seconds when using READDIRPLUS; with or without
that patch the same sequence of requests is sent.
With READDIR (using -o nordirplus) it takes about 40 seconds.
Much of the 25 seconds is due to GETACL requests.

So while this only provides a 2 second speed-up for me, it is a real
speed up.

The only cost I can find is that the sequence:
   ls
   ls -l
becomes slower. The "ls -l" doesn't perform any READDIR as the directory
listing is in cache. So that means we need 30,000 GETATTR calls and
30,000 GETACL calls, which all take a while.

What do people think?

Strangely, when I try NFSv4 I don't get what I would expect.

"ls" on an unpatched 2.6.38 takes over 5 seconds rather than around 4.
With the patch it goes back down to about 2. (still NFSv3 at 1.5).
Why would NFSv4 be slower?
On v3 we make 44 READDIRPLUS calls and 284 READDIR calls - total of 328
  READDIRPLUS have about 30 names, READDIR have about 100
On v4 we make 633 READDIR calls - nearly double.
  Early packets contain about 19 names, later ones about 70

Is nfsd (2.6.32) just not packing enough answers in the reply?
Client asks for a dircount of 16384 and a maxcount of 32768, and gets
packets which are about 4K long - I guess that is PAGE_SIZE ??

"ls -l" still takes around 25 seconds - even though READDIR is asking
for and receiving all the 'plus' attributes, I see 30,000 "GETATTR"
requests for exactly the same set of attributes. Something is wrong there.

NeilBrown

From: NeilBrown <neilb@suse.de>
Subject: Make selection of 'readdir-plus' adapt to usage patterns.

While the use of READDIRPLUS is significantly more efficient than
READDIR followed by many GETATTR calls, it is still less efficient
than just READDIR if the attributes are not required.

We can get a hint as to whether the application requires attr information
by looking at whether any ->getattr calls are made between
->readdir calls.
If there are any, then getting the attributes seems to be worth while.

This patch tracks whether there have been recent getattr calls on
children of a directory and uses that information to selectively
disable READDIRPLUS on that directory.

The first 'readdir' call is always served using READDIRPLUS.
Subsequent calls only use READDIRPLUS if there was a getattr on a child
in the mean time.

The locking of ->d_parent access needs to be reviewed.
As the bit is simply a hint, it isn't critical that it is set
on the "correct" parent if a rename is happening, but it is
critical that the 'set' doesn't set a bit in something that
isn't even an inode any more.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2c3eb33..6882e14 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -804,6 +804,9 @@ static int nfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	desc->dir_cookie = &nfs_file_open_context(filp)->dir_cookie;
 	desc->decode = NFS_PROTO(inode)->decode_dirent;
 	desc->plus = NFS_USE_READDIRPLUS(inode);
+	if (filp->f_pos > 0 && !test_bit(NFS_INO_SEEN_GETATTR, &NFS_I(inode)->flags))
+		desc->plus = 0;
+	clear_bit(NFS_INO_SEEN_GETATTR, &NFS_I(inode)->flags);
 
 	nfs_block_sillyrename(dentry);
 	res = nfs_revalidate_mapping(inode, filp->f_mapping);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 2f8e618..4cb17df 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -505,6 +505,15 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	struct inode *inode = dentry->d_inode;
 	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
 	int err;
+	struct dentry *p;
+	struct inode *pi;
+
+	rcu_read_lock();
+	p = dentry->d_parent;
+	pi = rcu_dereference(p)->d_inode;
+	if (pi && !test_bit(NFS_INO_SEEN_GETATTR, &NFS_I(pi)->flags))
+		set_bit(NFS_INO_SEEN_GETATTR, &NFS_I(pi)->flags);
+	rcu_read_unlock();
 
 	/* Flush out writes to the server in order to update c/mtime. */
 	if (S_ISREG(inode->i_mode)) {
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 6023efa..2a04ed5 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -219,6 +219,10 @@ struct nfs_inode {
 #define NFS_INO_FSCACHE		(5)		/* inode can be cached by FS-Cache */
 #define NFS_INO_FSCACHE_LOCK	(6)		/* FS-Cache cookie management lock */
 #define NFS_INO_COMMIT		(7)		/* inode is committing unstable writes */
+#define NFS_INO_SEEN_GETATTR	(8)		/* flag to track if app is calling
+						 * getattr in a directory during
+						 * readdir
+						 */
 
 static inline struct nfs_inode *NFS_I(const struct inode *inode)
 {
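
The request counts quoted in the message above (44 READDIRPLUS replies of about 30 names each, 284 READDIR replies of about 100 names each, for a 30,000-file directory) can be cross-checked with simple arithmetic. The helper names below are made up for this sketch, and the per-reply name counts are the approximate figures from the message, not exact measurements.

```c
/* Estimate how many directory entries a mixed sequence of replies carried. */
static long entries_returned(long plus_calls, long names_per_plus,
			     long plain_calls, long names_per_plain)
{
	return plus_calls * names_per_plus + plain_calls * names_per_plain;
}

/* Round-up division: replies needed to return `entries` names. */
static long replies_needed(long entries, long names_per_reply)
{
	return (entries + names_per_reply - 1) / names_per_reply;
}
```

Under these assumptions, entries_returned(44, 30, 284, 100) gives 29,720, which is consistent with a 30,000-entry directory. Reading the whole directory at roughly 30 names per reply would take about 1,000 requests, against about 300 at roughly 100 names per reply.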

* Re: Use of READDIRPLUS on large directories
From: J. Bruce Fields @ 2011-03-17 17:44 UTC
To: NeilBrown; +Cc: peter.staubach, Trond.Myklebust, bjschuma, linux-nfs

On Thu, Mar 17, 2011 at 11:55:22AM +1100, NeilBrown wrote:
> Strangely, when I try NFSv4 I don't get what I would expect.
>
> "ls" on an unpatched 2.6.38 takes over 5 seconds rather than around 4.
> With the patch it goes back down to about 2. (still NFSv3 at 1.5).
> Why would NFSv4 be slower?
> On v3 we make 44 READDIRPLUS calls and 284 READDIR calls - total of 328
>   READDIRPLUS have about 30 names, READDIR have about 100
> On v4 we make 633 READDIR calls - nearly double.
>   Early packets contain about 19 names, later ones about 70
>
> Is nfsd (2.6.32) just not packing enough answers in the reply?
> Client asks for a dircount of 16384 and a maxcount of 32768, and gets
> packets which are about 4K long - I guess that is PAGE_SIZE ??

From nfsd4_encode_readdir():

	maxcount = PAGE_SIZE;
	if (maxcount > readdir->rd_maxcount)
		maxcount = readdir->rd_maxcount;

Unfortunately, I don't think the xdr encoding is equipped to deal with
page boundaries. It should be.

--b.

* Re: Use of READDIRPLUS on large directories
From: NeilBrown @ 2011-03-18 4:27 UTC
To: J. Bruce Fields; +Cc: peter.staubach, Trond.Myklebust, bjschuma, linux-nfs

On Thu, 17 Mar 2011 13:44:53 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote:

> From nfsd4_encode_readdir():
>
> 	maxcount = PAGE_SIZE;
> 	if (maxcount > readdir->rd_maxcount)
> 		maxcount = readdir->rd_maxcount;
>
> Unfortunately, I don't think the xdr encoding is equipped to deal with
> page boundaries. It should be.

Bah humbug.

NFSv3 gets it right - it just encodes into the next page and then
copies back.

Sounds like a simple afternoon's project .... now if only we could find
someone with a simple afternoon :-)

Getting a realistic upper limit on the size of the reply (which is more
variable for v4 than for v3) would be the only tricky bit..
Though nfsd4_encode_fattr looks fairly idempotent, so you could just try
to encode and if it doesn't fit:
   allocate next page
   encode into there
   copy some into previous page
   copy rest down.

NeilBrown
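
The "encode into the next page, then copy back" steps listed in the message above can be sketched in user-space C. This is a toy illustration under stated assumptions, not nfsd code: `struct two_pages`, `SKETCH_PAGE_SIZE` and `append_straddling()` are hypothetical names, the page size is tiny so the split is easy to see, and the entry is assumed to be already encoded into a byte string.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16	/* tiny on purpose; a real page is much larger */

/*
 * Hypothetical two-page reply buffer.  `used` is the number of bytes
 * already encoded into `prev`.
 */
struct two_pages {
	unsigned char prev[SKETCH_PAGE_SIZE];
	unsigned char next[SKETCH_PAGE_SIZE];
	size_t used;
};

/*
 * Append an encoded entry that may straddle the page boundary.
 * The entry is first placed at the start of the next page; the part
 * that fits is then copied into the tail of the previous page and the
 * remainder is moved down to the start of the next page.
 * Returns the number of bytes left in `next`, or -1 if the entry is
 * larger than one page.
 */
static long append_straddling(struct two_pages *b,
			      const unsigned char *entry, size_t len)
{
	size_t room, head;

	if (len > SKETCH_PAGE_SIZE)
		return -1;

	memcpy(b->next, entry, len);			/* encode into there */

	room = SKETCH_PAGE_SIZE - b->used;
	head = len < room ? len : room;
	memcpy(b->prev + b->used, b->next, head);	/* copy some into previous page */
	memmove(b->next, b->next + head, len - head);	/* copy rest down */

	b->used += head;
	return (long)(len - head);
}
```

For example, with 12 of 16 bytes already used, a 7-byte entry leaves its first 4 bytes at the end of the previous page and its last 3 bytes at the start of the next one.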

* Re: Use of READDIRPLUS on large directories
From: Chuck Lever @ 2011-03-16 13:43 UTC
To: NeilBrown, Bryan Schumaker; +Cc: Trond Myklebust, Linux NFS Mailing List

On Mar 16, 2011, at 12:55 AM, NeilBrown wrote:

> Hi Trond / Bryan et al.
>
> Now that openSUSE 11.4 is out I have started getting a few reports
> of regressions that can be traced to
>
> commit 0715dc632a271fc0fedf3ef4779fe28ac1e53ef4
> Author: Bryan Schumaker <bjschuma@netapp.com>
> Date: Fri Sep 24 18:50:01 2010 -0400
>
>     NFS: remove readdir plus limit
>
>     We will now use readdir plus even on directories that are very large.
>
>     Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> This particularly affects users with their home directory over
> NFS, and with largish maildir mail folders.
>
> Where it used to take a smallish number of seconds for (e.g.)
> xbiff to start up and read through the various directories, it now
> takes multiple minutes.
>
> I can confirm that the slow down is due to readdirplus by mounting the
> filesystem with nordirplus.

Back in the dark ages, I discovered that this kind of slowdown was often
the result of server slowness. The problem is that a simple readdir is
often a sequential read from physical media. When you include attribute
information, the server has to pick up the inodes, which is a series of
small random reads. It could cause each readdir request to become slower
by a factor of 10. This is a problem on NFS servers where the inode
cache is turning over often (small home directory servers, for instance).

In addition, as more information per file is delivered by READDIRPLUS,
each request can hold fewer entries, so more requests and more packets
are needed to read a directory. We hold the request count down now by
allowing multi-page directory reads, if the server supports it.

In any event, applications will see this slow down immediately, but it
can also be a significant scalability problem for servers.

> While I can understand that there are sometimes benefits in using
> readdirplus for very large directories, there are also obviously real
> costs. So I think we have to see this patch as a regression that should
> be reverted.

It would be useful to understand what it is about these workloads that
is causing slow downs. Is it simply the size of the directory? Or is
there a bug on the server or client that is causing the issue? Is it a
problem only on certain servers or with certain configurations?

> It would quite possibly make sense to create a tunable (mount option or
> sysctl I guess) to set the max size for directories to use readdirplus,
> but I think it really should be an opt-in situation.

Giving users another knob usually results in higher support costs and
confused users. ;-)

> [[ It would also be really nice if the change-log for such a significant
> change contained a little more justification.... :-( ]]

I had asked, before this series was included in upstream, for some tests
to discover where the knee of the performance curve between readdir and
readdirplus was. Bryan, can you publish the results of those tests? I
had hoped the test results would appear in the patch description to help
justify this change.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

* Re: Use of READDIRPLUS on large directories
From: Bryan Schumaker @ 2011-03-16 14:14 UTC
To: Chuck Lever; +Cc: NeilBrown, Trond Myklebust, Linux NFS Mailing List

I guess I misunderstood what to publish test results for? I know I
included numbers on one of the patches (commit
82f2e5472e2304e531c2fa85e457f4a71070044e, copied below)... I'll find the
numbers you're asking about and post them.

-Bryan

commit 82f2e5472e2304e531c2fa85e457f4a71070044e
Author: Bryan Schumaker <bjschuma@netapp.com>
Date: Thu Oct 21 16:33:18 2010 -0400

    NFS: Readdir plus in v4

    By requesting more attributes during a readdir, we can mimic the readdir plus
    operation that was in NFSv3.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access  |         3 |         1 |         1 |         4 |        31
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
    readdir |         2 |        16 |       158 |     1,575 |    15,749
    total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784

    With readdir plus enabled, I see this:

    n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access  |         3 |         1 |         1 |         1 |         7
    getattr |         2 |         1 |         1 |         1 |         1
    lookup  |         4 |         3 |         3 |         3 |         3
    readdir |         6 |        62 |       630 |     6,300 |    62,993
    total   |        15 |        67 |       635 |     6,305 |    63,004

    Readdir plus disabled has about a 16x increase in the number of rpc calls and
    is 4 - 5 times slower on large directories.

    Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
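
The "about 16x" and "4 - 5 times" figures in the commit message above follow directly from the 1,000,000-file column of the two tables. A minimal sketch of that arithmetic, using helper names that are invented here for illustration:

```c
/* Ratio of two measurements, e.g. RPC counts or elapsed seconds. */
static double ratio(double without_plus, double with_plus)
{
	return without_plus / with_plus;
}

/* Convert a `time`-style "XmY.YYYs" reading to seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}
```

Applied to the table values, ratio(1015784, 63004) is roughly 16.1 for total RPC calls, and ratio(to_seconds(9, 59.128), to_seconds(2, 7.528)) is roughly 4.7 for elapsed time. Note that these measurements are for an `ls -l` style workload only; they say nothing about a plain `ls`.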

* Re: Use of READDIRPLUS on large directories
From: Trond Myklebust @ 2011-03-16 14:20 UTC
To: Bryan Schumaker; +Cc: Chuck Lever, NeilBrown, Linux NFS Mailing List

On Wed, 2011-03-16 at 10:14 -0400, Bryan Schumaker wrote:
> Readdir plus disabled has about a 16x increase in the number of rpc calls and
> is 4 - 5 times slower on large directories.

Right. Those are the numbers that convinced me...
* Re: Use of READDIRPLUS on large directories 2011-03-16 14:20 ` Trond Myklebust @ 2011-03-16 21:30 ` NeilBrown 2011-03-16 21:42 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: NeilBrown @ 2011-03-16 21:30 UTC (permalink / raw) To: Trond Myklebust; +Cc: Bryan Schumaker, Chuck Lever, Linux NFS Mailing List On Wed, 16 Mar 2011 10:20:03 -0400 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Wed, 2011-03-16 at 10:14 -0400, Bryan Schumaker wrote: > > I guess I misunderstood what to publish test results for? I know I included numbers on one of the patches (commit 82f2e5472e2304e531c2fa85e457f4a71070044e, copied below)... I'll find the numbers you're asking about and post them. > > > > -Bryan > > > > commit 82f2e5472e2304e531c2fa85e457f4a71070044e > > Author: Bryan Schumaker <bjschuma@netapp.com> > > Date: Thu Oct 21 16:33:18 2010 -0400 > > > > NFS: Readdir plus in v4 > > > > By requsting more attributes during a readdir, we can mimic the readdir plus > > operation that was in NFSv3. > > > > To test, I ran the command `ls -lU --color=none` on directories with various > > numbers of files. 
> > Without readdir plus, I see this:
> >
> > n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
> > --------+-----------+-----------+-----------+-----------+----------
> > real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
> > user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
> > sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
> > access  |         3 |         1 |         1 |         4 |        31
> > getattr |         2 |         1 |         1 |         1 |         1
> > lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
> > readdir |         2 |        16 |       158 |     1,575 |    15,749
> > total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784
> >
> > With readdir plus enabled, I see this:
> >
> > n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
> > --------+-----------+-----------+-----------+-----------+----------
> > real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
> > user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
> > sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
> > access  |         3 |         1 |         1 |         1 |         7
> > getattr |         2 |         1 |         1 |         1 |         1
> > lookup  |         4 |         3 |         3 |         3 |         3
> > readdir |         6 |        62 |       630 |     6,300 |    62,993
> > total   |        15 |        67 |       635 |     6,305 |    63,004
> >
> > Readdir plus disabled has about a 16x increase in the number of rpc calls an
> > is 4 - 5 times slower on large directories.
>
> Right. Those are the numbers that convinced me...
>

Lies, Damn Lies, and ......


while these are impressive numbers they only tell half the story.

If a change makes one common operation 4 times faster, and another common
operation 10 times slower, it is a good change? or even an acceptable change?

(The "10 times" is not a definite statistic - it is a guess based on
a low-detail report)

So it is obvious that there is sometimes value in using readdirplus,
it is equally obvious that there is sometimes a cost.

Switching the default from "not paying the cost when it is big" to
"always paying the cost" is wrong.

NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Use of READDIRPLUS on large directories
2011-03-16 21:30 ` NeilBrown
@ 2011-03-16 21:42 ` Trond Myklebust
2011-03-16 22:40 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2011-03-16 21:42 UTC (permalink / raw)
To: NeilBrown; +Cc: Bryan Schumaker, Chuck Lever, Linux NFS Mailing List

On Thu, 2011-03-17 at 08:30 +1100, NeilBrown wrote:
> On Wed, 16 Mar 2011 10:20:03 -0400 Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
>
> > On Wed, 2011-03-16 at 10:14 -0400, Bryan Schumaker wrote:
> > > I guess I misunderstood what to publish test results for? I know I included numbers on one of the patches (commit 82f2e5472e2304e531c2fa85e457f4a71070044e, copied below)... I'll find the numbers you're asking about and post them.
> > >
> > > -Bryan
> > >
> > > commit 82f2e5472e2304e531c2fa85e457f4a71070044e
> > > Author: Bryan Schumaker <bjschuma@netapp.com>
> > > Date: Thu Oct 21 16:33:18 2010 -0400
> > >
> > > NFS: Readdir plus in v4
> > >
> > > By requsting more attributes during a readdir, we can mimic the readdir plus
> > > operation that was in NFSv3.
> > >
> > > To test, I ran the command `ls -lU --color=none` on directories with various
> > > numbers of files.
> > > Without readdir plus, I see this:
> > >
> > > n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
> > > --------+-----------+-----------+-----------+-----------+----------
> > > real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
> > > user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
> > > sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
> > > access  |         3 |         1 |         1 |         4 |        31
> > > getattr |         2 |         1 |         1 |         1 |         1
> > > lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
> > > readdir |         2 |        16 |       158 |     1,575 |    15,749
> > > total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784
> > >
> > > With readdir plus enabled, I see this:
> > >
> > > n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
> > > --------+-----------+-----------+-----------+-----------+----------
> > > real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
> > > user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
> > > sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
> > > access  |         3 |         1 |         1 |         1 |         7
> > > getattr |         2 |         1 |         1 |         1 |         1
> > > lookup  |         4 |         3 |         3 |         3 |         3
> > > readdir |         6 |        62 |       630 |     6,300 |    62,993
> > > total   |        15 |        67 |       635 |     6,305 |    63,004
> > >
> > > Readdir plus disabled has about a 16x increase in the number of rpc calls an
> > > is 4 - 5 times slower on large directories.
> >
> > Right. Those are the numbers that convinced me...
> >
> >
>
> Lies, Damn Lies, and ......
>
>
> while these are impressive numbers they only tell half the story.
>
> If a change makes one common operation 4 times faster, and another common
> operation 10 times slower, it is a good change? or even an acceptable change?
>
> (The "10 times" is not a definite statistic - it is a guess based on
> a low-detail report)
>
> So it is obvious that there is sometimes value in using readdirplus,
> it is equally obvious that there is sometimes a cost.
>
> Switching the default from "not paying the cost when it is big" to
> "always paying the cost" is wrong.

That's what the nordirplus mount flag is for. Keeping an arbitrary limit
in the face of evidence that it is hurting is equally wrong.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Use of READDIRPLUS on large directories
2011-03-16 21:42 ` Trond Myklebust
@ 2011-03-16 22:40 ` NeilBrown
2011-03-17 17:18 ` J. Bruce Fields
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2011-03-16 22:40 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Bryan Schumaker, Chuck Lever, Linux NFS Mailing List

On Wed, 16 Mar 2011 17:42:35 -0400 Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:

> > So it is obvious that there is sometimes value in using readdirplus,
> > it is equally obvious that there is sometimes a cost.
> >
> > Switching the default from "not paying the cost when it is big" to
> > "always paying the cost" is wrong.
>
> That's what the nordirplus mount flag is for. Keeping an arbitrary limit
> in the face of evidence that it is hurting is equally wrong.
>

If people didn't need 'nordirplus' previously to get acceptable
performance, and do need it now, then that is a regression.

NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
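[Editorial sketch, not part of the thread.] Since the exchange above turns on whether affected users have to add 'nordirplus' themselves, it can help to see which NFS mounts on a client carry that option. The snippet below is a rough illustration that parses mount-table text in the /proc/mounts layout (device, mount point, type, comma-separated options). The function name is hypothetical, and it is an assumption that the kernel reports the option under the literal name "nordirplus" in that file.

```python
def rdirplus_status(mounts_text):
    """Map each NFS mount point to True if READDIRPLUS is NOT disabled.

    `mounts_text` is expected to be in /proc/mounts format. A mount is
    treated as having readdirplus disabled only if its option list contains
    the literal option "nordirplus" (an assumption about the kernel output).
    """
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS mounts are of interest here
        status[mountpoint] = "nordirplus" not in options.split(",")
    return status


if __name__ == "__main__":
    # Reads the live mount table on Linux; prints one line per NFS mount.
    with open("/proc/mounts") as f:
        for mountpoint, enabled in rdirplus_status(f.read()).items():
            state = "readdirplus allowed" if enabled else "nordirplus set"
            print(f"{mountpoint}: {state}")
```

A mount reported as "readdirplus allowed" only means the option was not given; whether the client actually issues READDIRPLUS still depends on the kernel version and its heuristics.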
* Re: Use of READDIRPLUS on large directories
2011-03-16 22:40 ` NeilBrown
@ 2011-03-17 17:18 ` J. Bruce Fields
2011-04-04 20:14 ` Bryan Schumaker
0 siblings, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2011-03-17 17:18 UTC (permalink / raw)
To: NeilBrown
Cc: Trond Myklebust, Bryan Schumaker, Chuck Lever, Linux NFS Mailing List

On Thu, Mar 17, 2011 at 09:40:19AM +1100, NeilBrown wrote:
> On Wed, 16 Mar 2011 17:42:35 -0400 Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
>
>
> > > So it is obvious that there is sometimes value in using readdirplus,
> > > it is equally obvious that there is sometimes a cost.
> > >
> > > Switching the default from "not paying the cost when it is big" to
> > > "always paying the cost" is wrong.
> >
> > That's what the nordirplus mount flag is for. Keeping an arbitrary limit
> > in the face of evidence that it is hurting is equally wrong.
> >
>
> If people didn't need 'nordirplus' previously to get acceptable
> performance, and do need it now, then that is a regression.

Agreed.

Unfortunately, reversion at this point would also be a regression for a
different group of folks. A smaller one, since *their* problem was
fixed only more recently, but still there's probably no sensible way out
of this but forwards....

--b.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Use of READDIRPLUS on large directories
2011-03-17 17:18 ` J. Bruce Fields
@ 2011-04-04 20:14 ` Bryan Schumaker
2011-04-05 12:20 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: Bryan Schumaker @ 2011-04-04 20:14 UTC (permalink / raw)
To: J. Bruce Fields
Cc: NeilBrown, Trond Myklebust, Chuck Lever, Linux NFS Mailing List

I've done some more testing and posted my initial results here: https://wiki.linux-nfs.org/wiki/index.php/Readdir_performance_results. If anybody has suggestions for better ways to organize the data, please let me know. I'll also try to post some graphs in the next couple of days.

- Bryan

On 03/17/2011 01:18 PM, J. Bruce Fields wrote:
> On Thu, Mar 17, 2011 at 09:40:19AM +1100, NeilBrown wrote:
>> On Wed, 16 Mar 2011 17:42:35 -0400 Trond Myklebust
>> <Trond.Myklebust@netapp.com> wrote:
>>
>>
>>>> So it is obvious that there is sometimes value in using readdirplus,
>>>> it is equally obvious that there is sometimes a cost.
>>>>
>>>> Switching the default from "not paying the cost when it is big" to
>>>> "always paying the cost" is wrong.
>>>
>>> That's what the nordirplus mount flag is for. Keeping an arbitrary limit
>>> in the face of evidence that it is hurting is equally wrong.
>>>
>>
>> If people didn't need 'nordirplus' previously to get acceptable
>> performance, and do need it now, then that is a regression.
>
> Agreed.
>
> Unfortunately, reversion at this point would also be a regression for a
> different group of folks. A smaller one, since *their* problem was
> fixed only more recently, but still there's probably no sensible way out
> of this but forwards....
>
> --b.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Use of READDIRPLUS on large directories
2011-04-04 20:14 ` Bryan Schumaker
@ 2011-04-05 12:20 ` NeilBrown
2011-04-07 14:28 ` Bryan Schumaker
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2011-04-05 12:20 UTC (permalink / raw)
To: Bryan Schumaker
Cc: J. Bruce Fields, Trond Myklebust, Chuck Lever, Linux NFS Mailing List

On Mon, 04 Apr 2011 16:14:48 -0400 Bryan Schumaker <bjschuma@netapp.com>
wrote:

> I've done some more testing and posted my initial results here: https://wiki.linux-nfs.org/wiki/index.php/Readdir_performance_results. If anybody has suggestions for better ways to organize the data, please let me know. I'll also try to post some graphs in the next couple of days.

I think graphs would certainly help.
Also it might be good to be explicit about the server hardware/config as that
can make a real performance difference.

No bright ideas about how to organise the graphs...
I'd probably try just graphing the 'real' time against kernel version
with one line for each different directory size.

Then you get 16 graphs, 4 different configs (v3/v4 x rddirplus/norddirplus)
and 4 different tests (ls -f, ls -lU, ls -U, rm -r... though I can't see how
"ls -U" is different from "ls -f").

NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
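[Editorial sketch, not part of the thread.] The "16 graphs" in the message above is simply the cross product of 4 configurations and 4 tests. As a small illustration of that layout, the snippet below enumerates the grid; the labels use the mount-option spellings "rdirplus"/"nordirplus" rather than the spellings in the message, and the function name is made up.

```python
from itertools import product

VERSIONS = ("v3", "v4")
MODES = ("rdirplus", "nordirplus")
TESTS = ("ls -f", "ls -lU", "ls -U", "rm -r")


def graph_grid():
    """Return one (config, test) pair per proposed graph.

    Each graph would plot 'real' time against kernel version, with one
    line per directory size, as suggested in the message above.
    """
    configs = product(VERSIONS, MODES)  # 2 x 2 = 4 configurations
    return [(f"{v} {m}", t) for (v, m), t in product(configs, TESTS)]


if __name__ == "__main__":
    for i, (config, test) in enumerate(graph_grid(), start=1):
        print(f"graph {i:2}: {config:<14} {test}")
```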
* Re: Use of READDIRPLUS on large directories
2011-04-05 12:20 ` NeilBrown
@ 2011-04-07 14:28 ` Bryan Schumaker
0 siblings, 0 replies; 17+ messages in thread
From: Bryan Schumaker @ 2011-04-07 14:28 UTC (permalink / raw)
To: NeilBrown
Cc: J. Bruce Fields, Trond Myklebust, Chuck Lever, Linux NFS Mailing List

On 04/05/2011 08:20 AM, NeilBrown wrote:
> On Mon, 04 Apr 2011 16:14:48 -0400 Bryan Schumaker <bjschuma@netapp.com>
> wrote:
>
>> I've done some more testing and posted my initial results here: https://wiki.linux-nfs.org/wiki/index.php/Readdir_performance_results. If anybody has suggestions for better ways to organize the data, please let me know. I'll also try to post some graphs in the next couple of days.
>
> I think graphs would certainly help.
> Also it might be good to be explicit about the server hardware/config as that
> can make a real performance difference.

I've added this to the readdir performance page. Is there anything else I should put up about the server?

> No bright ideas about how to organise the graphs...
> I'd probably try just graphing the 'real' time against kernel version
> with one line for each different directory size.
>
> Then you get 16 graphs, 4 different configs (v3/v4 x rddirplus/norddirplus)
> and 4 different tests (ls -f, ls -lU, ls -U, rm -r... though I can't see how
> "ls -U" is different from "ls -f").

I've added graphs showing real time. I'll be putting up graphs showing sys time and total number of RPC calls throughout the day.

> NeilBrown
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-04-07 14:28 UTC | newest]

Thread overview: 17+ messages
2011-03-16 4:55 Use of READDIRPLUS on large directories NeilBrown
2011-03-16 12:30 ` peter.staubach
2011-03-16 13:50 ` Trond Myklebust
2011-03-16 21:40 ` NeilBrown
2011-03-17 0:55 ` NeilBrown
2011-03-17 17:44 ` J. Bruce Fields
2011-03-18 4:27 ` NeilBrown
2011-03-16 13:43 ` Chuck Lever
2011-03-16 14:14 ` Bryan Schumaker
2011-03-16 14:20 ` Trond Myklebust
2011-03-16 21:30 ` NeilBrown
2011-03-16 21:42 ` Trond Myklebust
2011-03-16 22:40 ` NeilBrown
2011-03-17 17:18 ` J. Bruce Fields
2011-04-04 20:14 ` Bryan Schumaker
2011-04-05 12:20 ` NeilBrown
2011-04-07 14:28 ` Bryan Schumaker