From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave Hansen <haveblue@us.ibm.com>
Subject: Re: BKL removal
Date: Thu, 04 Jul 2002 11:44:29 -0700
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <3D24978D.1030602@us.ibm.com>
References: <3D23F306.50703@us.ibm.com> <20020704134122.V27706@parcelfarce.linux.theplanet.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kernel-janitor-discuss
	<kernel-janitor-discuss@lists.sourceforge.net>,
	linux-fsdevel@vger.kernel.org
Return-path: <linux-fsdevel-owner@vger.kernel.org>
To: Matthew Wilcox <willy@debian.org>
List-Id: linux-fsdevel.vger.kernel.org

Matthew Wilcox wrote:
> aw.  you implemented the less useful check of the two you mentioned!
> i think the release-on-sleep property of the BKL is far more harmful
> than the recursive-holds.  release-on-sleep means that you probably
> have a race there you didn't know about (eg the one i found in setfl
> earlier this week), and means you need to do some serious work in order
> to redesign your code to eliminate the race.

Be careful what you ask for.  These release-on-sleep calls happen 
quite often, maybe as often as a few a second and I can't think of any 
elegant ways of limiting the number of unique printk()s in schedule(). 
  It was easy to limit the printing for a single un/lock_kernel() call 
with static variables, but the scheduling ones are a bit more 
complicated.  I can think of some convoluted ways of doing this, but 
no simple ones.  If you apply this patch, be ready for a load of stuff 
in dmesg!

> once you've done that redesign, chances are you'll understand the locking
> in that area well enough to know that there are no recursive holds in your
> code and you'll switch to a spinlock.  this will automagically reduce the
> number of recursive-holds simply because fewer places will take the BKL.

Some of the times that I've either broken someone else's code or 
simply pointed it out to them, they've fixed it.

> as an aside, replacing the BKL by a non-recursive spinlock isn't always a
> great idea:
...<snip> look at the original if you want to see the whole call sequence!
> it doesn't look _too_ hard to persuade ext2_new_block (btw, the problem
> also occurs with ext2_free_blocks) to unlock_super and retry, but that's
> slightly more dangerous work than i want to do right now ... particularly
> since some of these interfaces may change dramatically as a result of
> the aio work.

That's icky.  So, was this potential problem introduced during Al's 
BKL replacement in ext2?  The real problem here is the faulting 
operation while holding a spinlock which requires the spinlock to 
resolve, right?  I think that all you can do is release the lock, try 
to get some memory, and start the process over.  This is the kind of 
thing that I hope my patch will help us find so that we don't all need 
to have a Viroesque understanding of VFS :)

BTW, the Brits don't have any big parties today, do they?

-- 
Dave Hansen
haveblue@us.ibm.com