linux-fsdevel.vger.kernel.org archive mirror
* inode rwlock instead of semaphore
@ 2003-02-02 17:01 Andreas Dilger
  2003-02-02 17:42 ` Matthew Wilcox
  2003-02-03 13:13 ` Jan Hudec
  0 siblings, 2 replies; 5+ messages in thread
From: Andreas Dilger @ 2003-02-02 17:01 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Alexander Viro

Al,
I'm wondering why we use a semaphore rather than a rwlock to lock
directories on lookups.  A rwlock would allow parallel lookups of directory
entries instead of single-threading them.  We need directories with millions
of files in them, and I think being able to start lookups in parallel would
be a big performance boost.

AFAICS, the dcache is already SMP safe everywhere, but real_lookup(), for
example, is single threaded when it calls into the filesystem.  Similarly,
lookup_one_len() is SMP safe as far as the dcache goes, but we need to hold
the dir i_sem because of the call into the filesystem lookup method
in lookup_hash().  That is fine for small directories where everything
can be expected to fit into the dcache, but with very large directories
(which will almost always have a cold dcache) every lookup pays disk I/O
latency while holding i_sem, so the lookups are serialized behind each other.
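
The serialization described above can be sketched in userspace C. This is only a model, not kernel code: struct model_dir, model_fs_lookup() and model_real_lookup() are hypothetical names, a pthread mutex stands in for i_sem, and the unlocked dcache probe that precedes the slow path in the kernel is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_ENTRIES 16
#define NAME_LEN 32

/* Hypothetical userspace stand-in for a directory inode; not a kernel type. */
struct model_dir {
    pthread_mutex_t i_sem;               /* plays the role of dir->i_sem */
    char cached[MAX_ENTRIES][NAME_LEN];  /* names already "in the dcache" */
    int ncached;
    int fs_calls;                        /* how many times the slow path ran */
};

static void model_dir_init(struct model_dir *dir)
{
    pthread_mutex_init(&dir->i_sem, NULL);
    dir->ncached = 0;
    dir->fs_calls = 0;
}

/* Stand-in for the filesystem lookup method; a real one may block on disk I/O. */
static void model_fs_lookup(struct model_dir *dir, const char *name)
{
    assert(dir->ncached < MAX_ENTRIES);
    strncpy(dir->cached[dir->ncached], name, NAME_LEN - 1);
    dir->cached[dir->ncached][NAME_LEN - 1] = '\0';
    dir->ncached++;
    dir->fs_calls++;
}

/* Linear probe of the model cache; caller must hold i_sem. */
static int model_cache_find(const struct model_dir *dir, const char *name)
{
    for (int i = 0; i < dir->ncached; i++)
        if (strcmp(dir->cached[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the name was already cached, 0 if the slow path had to run. */
static int model_real_lookup(struct model_dir *dir, const char *name)
{
    int hit;

    pthread_mutex_lock(&dir->i_sem);    /* every miss on this directory queues here */
    hit = model_cache_find(dir, name);  /* recheck: another thread may have filled it */
    if (!hit)
        model_fs_lookup(dir, name);     /* the "disk I/O" happens with the lock held */
    pthread_mutex_unlock(&dir->i_sem);
    return hit;
}
```

In this model, N threads missing on N different names in the same directory run model_fs_lookup() strictly one after another, so the total time is roughly N times the per-lookup I/O latency.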

One possibility is to change the VFS to use down_read(&dir->i_rwlock) or
similar.  Alternatively, we could create VFS methods that push the locking
down into the filesystem, so that it could use a rwlock or even a
dir+name-based lock (e.g. for ext3+htree) and lock only subsets of the
directory for both read and write operations.
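
A minimal userspace sketch of the first option, assuming a POSIX system with pthread barriers: lookups take the directory lock shared, and creates take it exclusive. All names here (struct rw_dir, rw_dir_create(), run_parallel_lookups()) are hypothetical and only illustrate the locking pattern; they are not a proposed VFS interface. The barrier is there to show that two lookups really are inside the locked region at the same time, which would deadlock under an exclusive lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_ENTRIES 16
#define NAME_LEN 32

/* Hypothetical model of a directory guarded by a reader/writer lock. */
struct rw_dir {
    pthread_rwlock_t i_rwlock;          /* stands in for the proposed dir->i_rwlock */
    char names[MAX_ENTRIES][NAME_LEN];
    int nnames;
};

static void rw_dir_init(struct rw_dir *dir)
{
    pthread_rwlock_init(&dir->i_rwlock, NULL);
    dir->nnames = 0;
}

/* Modifying the directory still needs exclusive access. Returns 1 on success. */
static int rw_dir_create(struct rw_dir *dir, const char *name)
{
    int ok = 0;

    pthread_rwlock_wrlock(&dir->i_rwlock);
    if (dir->nnames < MAX_ENTRIES) {
        strncpy(dir->names[dir->nnames], name, NAME_LEN - 1);
        dir->names[dir->nnames][NAME_LEN - 1] = '\0';
        dir->nnames++;
        ok = 1;
    }
    pthread_rwlock_unlock(&dir->i_rwlock);
    return ok;
}

struct lookup_job {
    struct rw_dir *dir;
    const char *name;
    pthread_barrier_t *rendezvous;
    int found;
};

static void *lookup_thread(void *arg)
{
    struct lookup_job *job = arg;

    pthread_rwlock_rdlock(&job->dir->i_rwlock);  /* shared: readers do not exclude each other */
    /* Both lookups must hold the read lock at once to get past this point. */
    pthread_barrier_wait(job->rendezvous);
    job->found = 0;
    for (int i = 0; i < job->dir->nnames; i++)
        if (strcmp(job->dir->names[i], job->name) == 0)
            job->found = 1;
    pthread_rwlock_unlock(&job->dir->i_rwlock);
    return NULL;
}

/* Runs two lookups concurrently and returns how many of the names were found. */
static int run_parallel_lookups(struct rw_dir *dir, const char *a, const char *b)
{
    pthread_barrier_t rendezvous;
    struct lookup_job jobs[2] = {
        { dir, a, &rendezvous, 0 },
        { dir, b, &rendezvous, 0 },
    };
    pthread_t tids[2];

    pthread_barrier_init(&rendezvous, NULL, 2);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tids[i], NULL, lookup_thread, &jobs[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&rendezvous);
    return jobs[0].found + jobs[1].found;
}
```

The dir+name-based variant would go one step further and pick a lock from the name (for example by hashing it), so that writers touching different parts of the directory need not exclude each other either; that is not shown here.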

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Thread overview: 5+ messages
2003-02-02 17:01 inode rwlock instead of semaphore Andreas Dilger
2003-02-02 17:42 ` Matthew Wilcox
2003-02-02 22:32   ` Andrew Morton
2003-02-03 13:13 ` Jan Hudec
2003-02-03 17:47   ` Andreas Dilger
