* inode rwlock instead of semaphore
@ 2003-02-02 17:01 Andreas Dilger
2003-02-02 17:42 ` Matthew Wilcox
2003-02-03 13:13 ` Jan Hudec
0 siblings, 2 replies; 5+ messages in thread
From: Andreas Dilger @ 2003-02-02 17:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Alexander Viro
Al,
I'm wondering why we use a semaphore to lock directories on lookups instead
of a rwlock? This would allow parallel lookups on directory entries instead
of single threading. We have a need for directories with millions of files
in them, and being able to start parallel lookups would be a big performance
boost I think.
AFAICS, the dcache is already SMP safe everywhere, but e.g. real_lookup()
is single threaded calling into the filesystem, and similarly
lookup_one_len() is SMP safe for the dcache, but we need to hold
the dir i_sem because of the call into the filesystem lookup method
in lookup_hash(). That is fine if you have small directories where
everything could be expected to fit into the dcache, but with very large
directories (which will almost always have a cold dcache) this causes
disk I/O latency for each lookup.
One possibility is to change the VFS to use down_read(&dir->i_rwlock) or
similar, or alternately create VFS methods that allow pushing the locking
down into the filesystem so they could use a rwlock or even a dir+name-based
lock (e.g. for ext3+htree) so they can lock subsets of the directory for
both read and write operations.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: inode rwlock instead of semaphore
From: Matthew Wilcox @ 2003-02-02 17:42 UTC (permalink / raw)
To: linux-fsdevel, Alexander Viro
On Sun, Feb 02, 2003 at 10:01:55AM -0700, Andreas Dilger wrote:
> I'm wondering why we use a semaphore to lock directories on lookups instead
> of a rwlock? This would allow parallel lookups on directory entries instead
> of single threading. We have a need for directories with millions of files
> in them, and being able to start parallel lookups would be a big performance
> boost I think.
You mean a rwsem, not a rwlock, I assume? How about starvation issues?
--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk
* Re: inode rwlock instead of semaphore
From: Andrew Morton @ 2003-02-02 22:32 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-fsdevel, viro
Matthew Wilcox <willy@debian.org> wrote:
>
> On Sun, Feb 02, 2003 at 10:01:55AM -0700, Andreas Dilger wrote:
> > I'm wondering why we use a semaphore to lock directories on lookups instead
> > of a rwlock? This would allow parallel lookups on directory entries instead
> > of single threading. We have a need for directories with millions of files
> > in them, and being able to start parallel lookups would be a big performance
> > boost I think.
>
> You mean a rwsem, not a rwlock, I assume? How about starvation issues?
>
Well... things like starvation we could presumably fix with a new lock type
or whatever.
But Andreas is correct - holding i_sem on a directory while the holder is
performing synchronous I/O is a very serious scalability problem.
I hit it (badly) against /tmp: one process was unlinking a file in /tmp (and
hence waiting on underway writeback in truncate). This prevented everything
on the machine from creating new files in /tmp until that I/O completed.
I fixed that specific problem by running the actual truncate outside i_sem in
sys_unlink(). But surely there will be other such problems. sync and
dirsync mounts come to mind, as well as huge directories.
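The fix described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct toy_tmpdir`, `toy_unlink()` and `toy_slow_truncate()` are hypothetical names, a `pthread_mutex_t` stands in for `i_sem`, and the real `sys_unlink()` change is not reproduced here. The directory entry is removed under the lock, and the slow truncate runs only after the lock is dropped. The `pthread_mutex_trylock()` probe records whether another task could have taken the directory lock during the truncate.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical file and directory types (assumptions for illustration). */
struct toy_file {
    long size;
};

struct toy_tmpdir {
    pthread_mutex_t i_sem;            /* stand-in for the directory's i_sem */
    struct toy_file *entry;           /* a single directory slot */
    int lock_free_during_truncate;    /* set by the probe below */
};

static int toy_tmpdir_init(struct toy_tmpdir *dir, struct toy_file *f)
{
    assert(dir != NULL);
    dir->entry = f;
    dir->lock_free_during_truncate = 0;
    return pthread_mutex_init(&dir->i_sem, NULL);
}

/* Stand-in for the slow part (waiting on writeback, freeing blocks). */
static void toy_slow_truncate(struct toy_tmpdir *dir, struct toy_file *f)
{
    /* Probe: would a creator in this directory be able to get the lock? */
    if (pthread_mutex_trylock(&dir->i_sem) == 0) {
        dir->lock_free_during_truncate = 1;
        pthread_mutex_unlock(&dir->i_sem);
    } else {
        dir->lock_free_during_truncate = 0;
    }
    f->size = 0;
}

static int toy_unlink(struct toy_tmpdir *dir)
{
    struct toy_file *victim;

    /* Only the name removal happens under the directory lock. */
    pthread_mutex_lock(&dir->i_sem);
    victim = dir->entry;
    dir->entry = NULL;
    pthread_mutex_unlock(&dir->i_sem);

    if (victim == NULL)
        return -ENOENT;

    /* The slow work runs with the directory lock released. */
    toy_slow_truncate(dir, victim);
    return 0;
}
```

This shortens the hold time for one operation only; it does not help a lookup or create that itself has to wait for I/O while holding the lock.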
* Re: inode rwlock instead of semaphore
From: Jan Hudec @ 2003-02-03 13:13 UTC (permalink / raw)
To: linux-fsdevel, Alexander Viro
On Sun, Feb 02, 2003 at 10:01:55AM -0700, Andreas Dilger wrote:
> Al,
> One possibility is to change the VFS to use down_read(&dir->i_rwlock) or
> similar, or alternately create VFS methods that allow pushing the locking
> down into the filesystem so they could use a rwlock or even a dir+name-based
> lock (e.g. for ext3+htree) so they can lock subsets of the directory for
> both read and write operations.
Are you sure lookup is a reader? AFAICT, from the VFS point of view it's
a writer, and from the disk's point of view the filesystem driver, not
the VFS, should do the locking (e.g. network filesystems would only do
this locking on the server).
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
* Re: inode rwlock instead of semaphore
From: Andreas Dilger @ 2003-02-03 17:47 UTC (permalink / raw)
To: Jan Hudec, linux-fsdevel, Alexander Viro
On Feb 03, 2003 14:13 +0100, Jan Hudec wrote:
> On Sun, Feb 02, 2003 at 10:01:55AM -0700, Andreas Dilger wrote:
> > One possibility is to change the VFS to use down_read(&dir->i_rwlock) or
> > similar, or alternately create VFS methods that allow pushing the locking
> > down into the filesystem so they could use a rwlock or even a dir+name-based
> > lock (e.g. for ext3+htree) so they can lock subsets of the directory for
> > both read and write operations.
>
> Are you sure lookup is a reader? AFAICT, from the VFS point of view it's
> a writer, and from the disk's point of view the filesystem driver, not
> the VFS, should do the locking (e.g. network filesystems would only do
> this locking on the server).
We have already created an API on the client side which allows the network
filesystem to handle the locking itself, and we are now using the server-side
distributed lock manager to bypass the VFS locking entirely.
It looks like we don't actually need any changes for our network filesystem,
but it would still be a performance win for local filesystems to be able to
handle locking as needed for maximum performance.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/