* fs/locks.c BKL removal
@ 2002-05-10 21:48 Dave Hansen
2002-05-10 22:13 ` Dave Hansen
2002-05-11 19:45 ` Matthew Wilcox
0 siblings, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2002-05-10 21:48 UTC (permalink / raw)
To: matthew; +Cc: linux-kernel
Matthew,
Al Viro pointed me your way.
I'm looking into the fs/locks.c mess. It appears that there was an
attempt to convert this over to a semaphore, but it was removed just
before the 2.4 release because of some deadlocks.
Whenever the i_flock list is traversed, the BKL is held. It is also
held while running through the file_lock_list which I think is used
only for /proc/locks.
We definitely need a semaphore because of all the blocking that goes
on. We can either have a global lock for all of them, which I think
was tried last time. Or, we can split it up a bit more. With the
current design, there will need to be a lock for the global list, each
individual list, and one for each individual lock to protect against
access from the reference in the file_lock_list and the inode->i_flock
list.
However, I think that the file_lock_list complexity may be able to be
reduced. If we make the file_lock_list a list of inodes (or just the
i_flocks) with active locks, we can avoid the complexity of having an
individual file_lock lock. That way, we at least reduce the number of
_types_ of locks. It increases the number of dereferences, but this
is /proc we're talking about. Any comments?
Talking about locks for locks is confusing :)
--
Dave Hansen
haveblue@us.ibm.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fs/locks.c BKL removal
2002-05-10 21:48 fs/locks.c BKL removal Dave Hansen
@ 2002-05-10 22:13 ` Dave Hansen
2002-05-10 23:17 ` Andrew Morton
2002-05-11 19:45 ` Matthew Wilcox
1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2002-05-10 22:13 UTC (permalink / raw)
Cc: matthew, linux-kernel
As Linus pointed out, a semaphore is probably the wrong way to go.
The only things that really needs to be protected are the list
operations themselves.
> No, I really think the code should use a spinlock for the global list, and
> then on a per-lock basis something like a reference count and a blocking
> lock (which might be a semaphore).
--
Dave Hansen
haveblue@us.ibm.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fs/locks.c BKL removal
2002-05-10 22:13 ` Dave Hansen
@ 2002-05-10 23:17 ` Andrew Morton
2002-05-11 19:48 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2002-05-10 23:17 UTC (permalink / raw)
To: Dave Hansen; +Cc: matthew, linux-kernel
Dave Hansen wrote:
>
> As Linus pointed out, a semaphore is probably the wrong way to go.
> The only things that really needs to be protected are the list
> operations themselves.
>
It was I who put the BKL back into locks.c, much to
Matthew's disgust...
The problem was that replacing the BKL with a semaphore
seriously damaged Apache thoughput on 8-way. Apache
was using flock()-based synchronisation and replacing
a spin with a schedule just killed it.
So.. Apache isn't doing that any more, but it is an
instructive case. Replacing the BKL with a semaphore
can sometimes be a very bad thing.
See http://www.uwsg.iu.edu/hypermail/linux/kernel/0010.3/ -
search for "scalability"
-
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fs/locks.c BKL removal
2002-05-10 23:17 ` Andrew Morton
@ 2002-05-11 19:48 ` Matthew Wilcox
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2002-05-11 19:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave Hansen, matthew, linux-kernel
On Fri, May 10, 2002 at 04:17:29PM -0700, Andrew Morton wrote:
> It was I who put the BKL back into locks.c, much to
> Matthew's disgust...
The disgust was targetted more at removing the abstraction of
locking scheme which I'd put in and having explicit lock_kernel() /
unlock_kernel() calls. I'd used (iirc) acquire_lock() / release_lock()
macros which could have just been redefined.
> The problem was that replacing the BKL with a semaphore
> seriously damaged Apache thoughput on 8-way. Apache
> was using flock()-based synchronisation and replacing
> a spin with a schedule just killed it.
Which says that our semaphores suck, because they don't try to spin for a
bit before scheduling. Of course, your change back was the right thing
to do in the 2.3.late timeframe.
--
Revolutions do not require corporate support.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fs/locks.c BKL removal
2002-05-10 21:48 fs/locks.c BKL removal Dave Hansen
2002-05-10 22:13 ` Dave Hansen
@ 2002-05-11 19:45 ` Matthew Wilcox
2002-05-12 1:40 ` Dave Hansen
1 sibling, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2002-05-11 19:45 UTC (permalink / raw)
To: Dave Hansen; +Cc: matthew, linux-kernel
On Fri, May 10, 2002 at 02:48:39PM -0700, Dave Hansen wrote:
> I'm looking into the fs/locks.c mess. It appears that there was an
> attempt to convert this over to a semaphore, but it was removed just
> before the 2.4 release because of some deadlocks.
Actually, performance problems ...
> Whenever the i_flock list is traversed, the BKL is held. It is also
> held while running through the file_lock_list which I think is used
> only for /proc/locks.
Correct.
> We definitely need a semaphore because of all the blocking that goes
> on. We can either have a global lock for all of them, which I think
> was tried last time. Or, we can split it up a bit more. With the
> current design, there will need to be a lock for the global list, each
> individual list, and one for each individual lock to protect against
> access from the reference in the file_lock_list and the inode->i_flock
> list.
Nah. Though I'm glad you missed it too; it means that I'm not as
stupid as I thought I was for only noticing it 2 years later. Look at
locks_wake_up_blocks (this is basically the _only_ tricky part). This has
to be called with a wait argument which is true. The only time that
can happen is if locks_delete_lock is called with a `true' parameter.
And the only time _that_ happens is when the _type_ of an flock lock is
being changed.
And really, what's happening here? We have a BSD flock which is blocking
one or more locks. Those processes have to have the opportunity
to acquire the lock before the previously-blocking process gets the
opportunity to acquire its lock. But that doesn't mean we need to schedule once
for _each_ task which is blocked; we only need to yield once.
So we can eliminate the `wait' argument to locks_delete_lock,
locks_wake_up_blocks and the arm of the conditional in
locks_wake_up_blocks which sleeps. We only need to check in
flock_lock_file whether we're unlocking and yield if we aren't.
I'm currently doing a major restructure of fs/locks.c, and this problem
(along with several others) simply disappears. I'm looking for a
testsuite before I release this code to the world ... anybody got one?
> However, I think that the file_lock_list complexity may be able to be
> reduced. If we make the file_lock_list a list of inodes (or just the
> i_flocks) with active locks, we can avoid the complexity of having an
> individual file_lock lock. That way, we at least reduce the number of
> _types_ of locks. It increases the number of dereferences, but this
> is /proc we're talking about. Any comments?
Ick... I'd really like to see one spinlock protecting all activity in this
area. And obviously not the magic BKL ;-)
> Talking about locks for locks is confusing :)
Tell me about it! I'm close to calling things `blocks' `plocks',
`leases' and `mlocks', just to reduce the namespace conflicts. But it's
not obvious those refer to BSD locks, POSIX locks and Mandatory locks...
--
Revolutions do not require corporate support.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fs/locks.c BKL removal
2002-05-11 19:45 ` Matthew Wilcox
@ 2002-05-12 1:40 ` Dave Hansen
2002-05-12 2:07 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2002-05-12 1:40 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel
Matthew Wilcox wrote:
> Ick... I'd really like to see one spinlock protecting all activity in this
> area. And obviously not the magic BKL ;-)
Do you really think a single lock is the way to go? Maybe I'm just
paranoid, but somebody is going to run into a locking bottleneck here
eventually. I also just don't like global locks.
I'll ask our benchmarking team if they have test suites for file
locking. I crossing my fingers.
--
Dave Hansen
haveblue@us.ibm.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fs/locks.c BKL removal
2002-05-12 1:40 ` Dave Hansen
@ 2002-05-12 2:07 ` Matthew Wilcox
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2002-05-12 2:07 UTC (permalink / raw)
To: Dave Hansen; +Cc: Matthew Wilcox, linux-kernel
On Sat, May 11, 2002 at 06:40:00PM -0700, Dave Hansen wrote:
> Do you really think a single lock is the way to go? Maybe I'm just
> paranoid, but somebody is going to run into a locking bottleneck here
> eventually. I also just don't like global locks.
Well, we have a global entity being protected -- the file_lock_list.
Clearly we don't want to use one of the existing per-inode semaphores
since semaphores have a noted bad effect on both benchmarks and real-world
scenarios. And I don't think introducing a new per-inode lock would
really be welcome.
> I'll ask our benchmarking team if they have test suites for file
> locking. I crossing my fingers.
Cool, thanks.
--
Revolutions do not require corporate support.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2002-05-12 2:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-05-10 21:48 fs/locks.c BKL removal Dave Hansen
2002-05-10 22:13 ` Dave Hansen
2002-05-10 23:17 ` Andrew Morton
2002-05-11 19:48 ` Matthew Wilcox
2002-05-11 19:45 ` Matthew Wilcox
2002-05-12 1:40 ` Dave Hansen
2002-05-12 2:07 ` Matthew Wilcox
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox