Subject: Concurrent `ls` takes out the thrash
From: Benjamin Coddington
Date: 2016-12-07 13:28 UTC
To: linux-nfs list

I was asked to figure out why the listing of very large directories was
slow.  More specifically, why concurrently listing the same large directory
is /very/ slow.  It seems that sometimes a user's reaction to waiting for
'ls' to complete is to start a few more.. and then their machine takes a
very long time to complete that work.

I can reproduce that finding.  As an example:

time ls -fl /dir/with/200000/entries/ >/dev/null

real    0m10.766s
user    0m0.716s
sys     0m0.827s

But..

for i in {1..10}; do time ls -fl /dir/with/200000/entries/ >/dev/null & done

Each of these ^^ 'ls' commands will take 4 to 5 minutes to complete.

The problem is that concurrent 'ls' commands stack up in nfs_readdir(),
both waiting on the next page and taking turns filling the next page with
XDR, but only one of them will have desc->plus set, because setting it
clears the readdirplus advisory flag on the directory.  So if a page is
filled by a process that doesn't have desc->plus, then on the next pass
through lookup() the entire page cache gets dumped by
nfs_force_use_readdirplus().  Then the next readdir starts all over again
filling the pagecache.  Forward progress happens, but only after many
steps back re-filling the pagecache.
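
For reference, the shape of that interaction as I understand it -- this is
a paraphrased sketch of the behavior described above, not the verbatim
fs/nfs/dir.c source, and the bodies are simplified:

/* Paraphrased sketch, not the actual kernel code: only one queued reader
 * consumes the advisory bit, so the rest fill pages without readdirplus. */
static bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx)
{
	/* setting desc->plus consumes the flag for everyone else */
	return test_and_clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags);
}

void nfs_force_use_readdirplus(struct inode *dir)
{
	/* re-arm the flag and dump the directory's readdir pagecache, so
	 * the next nfs_readdir() has to refill it from the server */
	set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags);
	invalidate_mapping_pages(dir->i_mapping, 0, -1);
}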

To me the most obvious fix would be to serialize nfs_readdir() on the
directory inode, so I'll follow up with a patch that does that with
nfsi->rwsem.  With that, each of the above parallel 'ls' commands completes
in 12 seconds.
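
Roughly, the idea is just to take the inode's rwsem around the whole
readdir pass.  A minimal sketch of that idea (not the patch itself;
__nfs_readdir_locked is a hypothetical stand-in for the current body of
nfs_readdir()):

static int nfs_readdir(struct file *file, struct dir_context *ctx)
{
	struct inode *inode = file_inode(file);
	struct nfs_inode *nfsi = NFS_I(inode);
	int res;

	/* only one reader at a time fills the directory's pagecache, so a
	 * waiter resumes against the pages its predecessor just filled */
	down_write(&nfsi->rwsem);
	res = __nfs_readdir_locked(file, ctx);	/* hypothetical helper */
	up_write(&nfsi->rwsem);
	return res;
}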

This only works because concurrent 'ls' commands use a consistent buffer
size, so a waiting nfs_readdir() that started in the same place on an
unmodified directory should always hit the cache after waiting.
Serializing nfs_readdir() will not solve this problem for concurrent
callers with differing buffer sizes, or ones starting at different
offsets, since there's a good chance the waiting readdir() will not see
the readdirplus flag when it resumes and so will not prime the dcache.

While I think it's an OK fix, it feels bad to serialize.  At the same
time, nfs_readdir() is already serialized on the pagecache when concurrent
callers need to go to the server.  There might be other problems I haven't
thought about.

Maybe there's another way to fix this, or maybe we can just say "Don't do
ls more than once, you impatient bastards!"

Ben


Thread overview: 15+ messages in thread
2016-12-07 13:28 Concurrent `ls` takes out the thrash Benjamin Coddington
2016-12-07 13:37 ` [PATCH] NFS: Serialize nfs_readdir() Benjamin Coddington
2016-12-07 16:30   ` Christoph Hellwig
2016-12-07 19:40     ` Benjamin Coddington
2016-12-07 17:01   ` kbuild test robot
2016-12-07 15:46 ` Concurrent `ls` takes out the thrash Trond Myklebust
2016-12-07 19:46   ` Benjamin Coddington
2016-12-07 22:34     ` Benjamin Coddington
2016-12-07 22:41       ` Trond Myklebust
2016-12-07 22:55         ` Benjamin Coddington
2016-12-07 22:59           ` Trond Myklebust
2016-12-07 23:10             ` Benjamin Coddington
2016-12-08 14:18               ` Benjamin Coddington
2016-12-08 16:13                 ` Benjamin Coddington
2016-12-08 21:48                   ` Benjamin Coddington
