From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: RE: oops in 2.4.25 prune_icache() called from kswapd Date: Sat, 18 Jun 2005 17:33:42 -0300 Message-ID: <20050618203342.GB8385@logos.cnet> References: <390de35bed.35bed390de@llnl.gov> <20050618194522.GA8385@logos.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Albert Chu , David Woodhouse , Al Viro , Prakash Velupula , linux-fsdevel@vger.kernel.org, lwoodman@redhat.com Return-path: Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:59289 "EHLO parcelfarce.linux.theplanet.co.uk") by vger.kernel.org with ESMTP id S261250AbVFSBvI (ORCPT ); Sat, 18 Jun 2005 21:51:08 -0400 To: Chris Caputo Content-Disposition: inline In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Sun, Jun 19, 2005 at 01:11:20AM +0000, Chris Caputo wrote: > Hi, > > Marcello, your patch didn't come through. Unfortunately I can't test it > anymore since my environment has now changed, but hopefully Albert or > Prakash can try it when you resend. Albert, Prakash, any of you using stock v2.4? Here's the diff: --- a/fs/inode.c.orig 2005-06-18 11:19:21.508857600 -0300 +++ b/fs/inode.c 2005-06-18 11:21:03.925287936 -0300 @@ -1238,15 +1238,18 @@ BUG(); } else { if (!list_empty(&inode->i_hash)) { - if (!(inode->i_state & (I_DIRTY|I_LOCK))) - __refile_inode(inode); - inodes_stat.nr_unused++; - spin_unlock(&inode_lock); - if (!sb || (sb->s_flags & MS_ACTIVE)) + if (!sb || (sb->s_flags & MS_ACTIVE)) { + if (!(inode->i_state & + (I_DIRTY|I_LOCK))) { + __refile_inode(inode); + inodes_stat.nr_unused++; + } + spin_unlock(&inode_lock); return; + } + spin_unlock(&inode_lock); write_inode_now(inode, 1); spin_lock(&inode_lock); - inodes_stat.nr_unused--; list_del_init(&inode->i_hash); } list_del_init(&inode->i_list); > > Thanks, > Chris > > On Sat, 18 Jun 2005, Marcelo Tosatti wrote: > >Hi, > > > >Shame the RH bugzilla (#155289) requires super priviledges to be > >accessed. > > > >I've got around reading Albert's description of the race - thanks BTW. > >(its attached below for reference) > > > >It seems to me that only window open in mainline is between iput() and > >prune_icache(), while iput() sleeps on sync_one() with the inode > >being: > > > >- on the unused list > >- and with i_count set to zero > > > >prune_icache() is free to invalidate and destroy the inode in the meantime, > >causing iput()'s sync_one() to __refile_inode() the NULL entry to the > >unused list later on. > > > >If that is indeed the case, removing __refile_inode() from the nonzero > >inode->i_nlink path should close that window. > > > >Chris, can you please test the attached "iput-iprune-race.patch" with > >your usual irqbalance enable environment ? > > > > > >On Sat, Jun 18, 2005 at 01:02:20PM -0700, Albert Chu wrote: > >>Howdy everyone, > >> > >>>Albert Chu, CC'ed, has suggested the below as a fix. Albert, any new > >>>info on this, or have these two patches cleared up the problem well? > >> > >>The __refile_inode() patch below fixed the problem for us on our > >>clusters (running RHEL3). The clear_inode() patch is something Redhat > >>(I think Larry Woodman) gave to us to fix a problem he believes is in > >>the same general area. We haven't had any additional problems adding > >>his second patch. > >> > >>>BTW, the below mail says that these are workarounds and a real fix > >>>is on the way. Has that been rolled in as well? > >> > >>Sorry, not too sure. :-( > > > >Albert's description: > > > >>Just thought I'd let you know what's up. We think we're close to > >>getting to the bottom of this. We think the race is between iput() and > >>__refile_inode(). The Redhat kernel is of course different than the > >>mainline kernel, but I think the same bug exists in the mainline. The > >>example below illustrates the race between iput() and __sync_one(), but > >>it could occur with other areas that all __refile_inode(). For us, I > >>think we're hitting it in __sync_one() and prune_icache(). (The > >>prune_icache() call to __refile_inode() doesn't seem to be in the > >>mainline though). > >> > >>proc 0: > >>calls iput() and locks inode_lock. > >>iput removes the inode off of the i_list > >>unlocks inode_lock (proc 1 will now at some point grab inode_lock) > >>calls clear_inode() > >>(this is key) gets past the call to wait_on_inode(); > >>at this point clear_inode() and the remainder of iput() does not care > >>about I_LOCK or inode_lock. > >> > >>proc 1: > >>calls __sync_one() with inode_lock. > >>sets I_LOCK > >>do stuff before __refile_inode() is called, all I_LOCK/inode_lock stuff > >>doesn't matter. > >> > >>proc 0: > >>sets i_state = I_CLEAR > >>iput calls destroy_inode() > >> > >>proc 1: > >>calls __refile_inode > >> > >>and we have ourselves a corrupted inode on the inode_unused list. > >> > >>I'm not sure if you can see it or not, but Redhat bugzilla 155289 is > >>tracking this. > >> > >>Al