public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* fs/locks.c BKL removal
@ 2002-05-10 21:48 Dave Hansen
  2002-05-10 22:13 ` Dave Hansen
  2002-05-11 19:45 ` Matthew Wilcox
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2002-05-10 21:48 UTC (permalink / raw)
  To: matthew; +Cc: linux-kernel

Matthew,
Al Viro pointed me your way.

I'm looking into the fs/locks.c mess.  It appears that there was an 
attempt to convert this over to a semaphore, but it was removed just 
before the 2.4 release because of some deadlocks.

Whenever the i_flock list is traversed, the BKL is held.  It is also 
held while running through the file_lock_list which I think is used 
only for /proc/locks.

We definitely need a semaphore because of all the blocking that goes 
on.  We can either have a global lock for all of them, which I think 
was tried last time.  Or, we can split it up a bit more.  With the 
current design, there will need to be a lock for the global list, each 
individual list, and one for each individual lock to protect against 
access from the reference in the file_lock_list and the inode->i_flock 
list.

However, I think that the file_lock_list complexity may be able to be 
reduced.  If we make the file_lock_list a list of inodes (or just the 
i_flocks) with active locks, we can avoid the complexity of having an 
individual file_lock lock.  That way, we at least reduce the number of 
_types_ of locks.  It increases the number of dereferences, but this 
is /proc we're talking about.  Any comments?

Talking about locks for locks is confusing :)

-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: fs/locks.c BKL removal
  2002-05-10 21:48 fs/locks.c BKL removal Dave Hansen
@ 2002-05-10 22:13 ` Dave Hansen
  2002-05-10 23:17   ` Andrew Morton
  2002-05-11 19:45 ` Matthew Wilcox
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2002-05-10 22:13 UTC (permalink / raw)
  Cc: matthew, linux-kernel

As Linus pointed out, a semaphore is probably the wrong way to go. 
The only things that really needs to be protected are the list 
operations themselves.

> No, I really think the code should use a spinlock for the global list, and
> then on a per-lock basis something like a reference count and a blocking
> lock (which might be a semaphore).


-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: fs/locks.c BKL removal
  2002-05-10 22:13 ` Dave Hansen
@ 2002-05-10 23:17   ` Andrew Morton
  2002-05-11 19:48     ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2002-05-10 23:17 UTC (permalink / raw)
  To: Dave Hansen; +Cc: matthew, linux-kernel

Dave Hansen wrote:
> 
> As Linus pointed out, a semaphore is probably the wrong way to go.
> The only things that really needs to be protected are the list
> operations themselves.
> 

It was I who put the BKL back into locks.c, much to
Matthew's disgust...

The problem was that replacing the BKL with a semaphore
seriously damaged Apache thoughput on 8-way.  Apache
was using flock()-based synchronisation and replacing
a spin with a schedule just killed it.

So.. Apache isn't doing that any more, but it is an
instructive case.  Replacing the BKL with a semaphore
can sometimes be a very bad thing.

See http://www.uwsg.iu.edu/hypermail/linux/kernel/0010.3/ -
search for "scalability"

-

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: fs/locks.c BKL removal
  2002-05-10 21:48 fs/locks.c BKL removal Dave Hansen
  2002-05-10 22:13 ` Dave Hansen
@ 2002-05-11 19:45 ` Matthew Wilcox
  2002-05-12  1:40   ` Dave Hansen
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2002-05-11 19:45 UTC (permalink / raw)
  To: Dave Hansen; +Cc: matthew, linux-kernel

On Fri, May 10, 2002 at 02:48:39PM -0700, Dave Hansen wrote:
> I'm looking into the fs/locks.c mess.  It appears that there was an 
> attempt to convert this over to a semaphore, but it was removed just 
> before the 2.4 release because of some deadlocks.

Actually, performance problems ...

> Whenever the i_flock list is traversed, the BKL is held.  It is also 
> held while running through the file_lock_list which I think is used 
> only for /proc/locks.

Correct.

> We definitely need a semaphore because of all the blocking that goes 
> on.  We can either have a global lock for all of them, which I think 
> was tried last time.  Or, we can split it up a bit more.  With the 
> current design, there will need to be a lock for the global list, each 
> individual list, and one for each individual lock to protect against 
> access from the reference in the file_lock_list and the inode->i_flock 
> list.

Nah.  Though I'm glad you missed it too; it means that I'm not as
stupid as I thought I was for only noticing it 2 years later.  Look at
locks_wake_up_blocks (this is basically the _only_ tricky part).  This has
to be called with a wait argument which is true.  The only time that
can happen is if locks_delete_lock is called with a `true' parameter.
And the only time _that_ happens is when the _type_ of an flock lock is
being changed.

And really, what's happening here?  We have a BSD flock which is blocking
one or more locks.  Those processes have to have the opportunity
to acquire the lock before the previously-blocking process gets the
opportunity to acquire its lock.  But that doesn't mean we need to schedule once
for _each_ task which is blocked; we only need to yield once.

So we can eliminate the `wait' argument to locks_delete_lock,
locks_wake_up_blocks and the arm of the conditional in
locks_wake_up_blocks which sleeps.  We only need to check in
flock_lock_file whether we're unlocking and yield if we aren't.

I'm currently doing a major restructure of fs/locks.c, and this problem
(along with several others) simply disappears.  I'm looking for a
testsuite before I release this code to the world ... anybody got one?

> However, I think that the file_lock_list complexity may be able to be 
> reduced.  If we make the file_lock_list a list of inodes (or just the 
> i_flocks) with active locks, we can avoid the complexity of having an 
> individual file_lock lock.  That way, we at least reduce the number of 
> _types_ of locks.  It increases the number of dereferences, but this 
> is /proc we're talking about.  Any comments?

Ick... I'd really like to see one spinlock protecting all activity in this
area.  And obviously not the magic BKL ;-)

> Talking about locks for locks is confusing :)

Tell me about it!  I'm close to calling things `blocks' `plocks',
`leases' and `mlocks', just to reduce the namespace conflicts.  But it's
not obvious those refer to BSD locks, POSIX locks and Mandatory locks...

-- 
Revolutions do not require corporate support.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: fs/locks.c BKL removal
  2002-05-10 23:17   ` Andrew Morton
@ 2002-05-11 19:48     ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2002-05-11 19:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Hansen, matthew, linux-kernel

On Fri, May 10, 2002 at 04:17:29PM -0700, Andrew Morton wrote:
> It was I who put the BKL back into locks.c, much to
> Matthew's disgust...

The disgust was targetted more at removing the abstraction of
locking scheme which I'd put in and having explicit lock_kernel() /
unlock_kernel() calls.  I'd used (iirc) acquire_lock() / release_lock()
macros which could have just been redefined.

> The problem was that replacing the BKL with a semaphore
> seriously damaged Apache thoughput on 8-way.  Apache
> was using flock()-based synchronisation and replacing
> a spin with a schedule just killed it.

Which says that our semaphores suck, because they don't try to spin for a
bit before scheduling.  Of course, your change back was the right thing
to do in the 2.3.late timeframe.

-- 
Revolutions do not require corporate support.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: fs/locks.c BKL removal
  2002-05-11 19:45 ` Matthew Wilcox
@ 2002-05-12  1:40   ` Dave Hansen
  2002-05-12  2:07     ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2002-05-12  1:40 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-kernel

Matthew Wilcox wrote:
> Ick... I'd really like to see one spinlock protecting all activity in this
> area.  And obviously not the magic BKL ;-)

Do you really think a single lock is the way to go?  Maybe I'm just 
paranoid, but somebody is going to run into a locking bottleneck here 
eventually.  I also just don't like global locks.

I'll ask our benchmarking team if they have test suites for file 
locking.  I crossing my fingers.

-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: fs/locks.c BKL removal
  2002-05-12  1:40   ` Dave Hansen
@ 2002-05-12  2:07     ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2002-05-12  2:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Matthew Wilcox, linux-kernel

On Sat, May 11, 2002 at 06:40:00PM -0700, Dave Hansen wrote:
> Do you really think a single lock is the way to go?  Maybe I'm just 
> paranoid, but somebody is going to run into a locking bottleneck here 
> eventually.  I also just don't like global locks.

Well, we have a global entity being protected -- the file_lock_list.
Clearly we don't want to use one of the existing per-inode semaphores
since semaphores have a noted bad effect on both benchmarks and real-world
scenarios.  And I don't think introducing a new per-inode lock would
really be welcome.

> I'll ask our benchmarking team if they have test suites for file 
> locking.  I crossing my fingers.

Cool, thanks.

-- 
Revolutions do not require corporate support.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2002-05-12  2:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-05-10 21:48 fs/locks.c BKL removal Dave Hansen
2002-05-10 22:13 ` Dave Hansen
2002-05-10 23:17   ` Andrew Morton
2002-05-11 19:48     ` Matthew Wilcox
2002-05-11 19:45 ` Matthew Wilcox
2002-05-12  1:40   ` Dave Hansen
2002-05-12  2:07     ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox