From mboxrd@z Thu Jan 1 00:00:00 1970 From: Matthew Wilcox Subject: Re: BKL removal Date: Thu, 4 Jul 2002 13:41:22 +0100 Sender: linux-fsdevel-owner@vger.kernel.org Message-ID: <20020704134122.V27706@parcelfarce.linux.theplanet.co.uk> References: <3D23F306.50703@us.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: kernel-janitor-discuss , linux-fsdevel@vger.kernel.org Return-path: To: Dave Hansen Content-Disposition: inline In-Reply-To: <3D23F306.50703@us.ibm.com>; from haveblue@us.ibm.com on Thu, Jul 04, 2002 at 12:02:30AM -0700 List-Id: linux-fsdevel.vger.kernel.org On Thu, Jul 04, 2002 at 12:02:30AM -0700, Dave Hansen wrote: > I've created a patch that helps find bad uses of the big kernel lock. > It is explained in detail here: > http://marc.theaimsgroup.com/?l=linux-kernel&m=102535814329494&w=2 aw. you implemented the less useful check of the two you mentioned! i think the release-on-sleep property of the BKL is far more harmful than the recursive-holds. release-on-sleep means that you probably have a race there you didn't know about (eg the one i found in setfl earlier this week), and means you need to do some serious work in order to redesign your code to eliminate the race. once you've done that redesign, chances are you'll understand the locking in that area well enough to know that there are no recursive holds in your code and you'll switch to a spinlock. this will automagically reduce the number of recursive-holds simply because fewer places will take the BKL. as an aside, replacing the BKL by a non-recursive spinlock isn't always a great idea: /* * Nasty deadlock avoidance. * * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache-> * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op-> * put_inode->ext2_discard_prealloc->ext2_free_blocks->lock_super-> * DEADLOCK. * * We should make sure we don't hold the superblock lock over * block allocations, but for now: */ if (!(gfp_mask & __GFP_FS)) return 0; so we now decline to scavenge memory from the dcache or the icache if a filesystem causes the system to decide it's low on memory. this isn't a good thing. btw, this comment is somewhat abbreviated. the sequence is actually: ext2_new_block->load_block_bitmap->read_block_bitmap->sb_bread->__bread-> __getblk->grow_buffers->grow_dev_page->find_or_create_page->alloc_page-> ... it doesn't look _too_ hard to persuade ext2_new_block (btw, the problem also occurs with ext2_free_blocks) to unlock_super and retry, but that's slightly more dangerous work than i want to do right now ... particularly since some of these interfaces may change dramatically as a result of the aio work. -- Revolutions do not require corporate support.