From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: RE: oops in 2.4.25 prune_icache() called from kswapd Date: Sat, 18 Jun 2005 16:45:22 -0300 Message-ID: <20050618194522.GA8385@logos.cnet> References: <390de35bed.35bed390de@llnl.gov> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: "Velupula, Prakash" , Chris Caputo , linux-fsdevel@vger.kernel.org, lwoodman@redhat.com Return-path: Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:152 "EHLO parcelfarce.linux.theplanet.co.uk") by vger.kernel.org with ESMTP id S261452AbVFSBAZ (ORCPT ); Sat, 18 Jun 2005 21:00:25 -0400 To: Albert Chu , David Woodhouse , Al Viro Content-Disposition: inline In-Reply-To: <390de35bed.35bed390de@llnl.gov> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Hi, Shame the RH bugzilla (#155289) requires super priviledges to be accessed. I've got around reading Albert's description of the race - thanks BTW. (its attached below for reference) It seems to me that only window open in mainline is between iput() and prune_icache(), while iput() sleeps on sync_one() with the inode being: - on the unused list - and with i_count set to zero prune_icache() is free to invalidate and destroy the inode in the meantime, causing iput()'s sync_one() to __refile_inode() the NULL entry to the unused list later on. If that is indeed the case, removing __refile_inode() from the nonzero inode->i_nlink path should close that window. Chris, can you please test the attached "iput-iprune-race.patch" with your usual irqbalance enable environment ? On Sat, Jun 18, 2005 at 01:02:20PM -0700, Albert Chu wrote: > Howdy everyone, > > > Albert Chu, CC'ed, has suggested the below as a fix. Albert, any new > > info on this, or have these two patches cleared up the problem well? > > The __refile_inode() patch below fixed the problem for us on our > clusters (running RHEL3). The clear_inode() patch is something Redhat > (I think Larry Woodman) gave to us to fix a problem he believes is in > the same general area. We haven't had any additional problems adding > his second patch. > > > BTW, the below mail says that these are workarounds and a real fix > > is on the way. Has that been rolled in as well? > > Sorry, not too sure. :-( Albert's description: > Just thought I'd let you know what's up. We think we're close to > getting to the bottom of this. We think the race is between iput() and > __refile_inode(). The Redhat kernel is of course different than the > mainline kernel, but I think the same bug exists in the mainline. The > example below illustrates the race between iput() and __sync_one(), but > it could occur with other areas that all __refile_inode(). For us, I > think we're hitting it in __sync_one() and prune_icache(). (The > prune_icache() call to __refile_inode() doesn't seem to be in the > mainline though). > > proc 0: > calls iput() and locks inode_lock. > iput removes the inode off of the i_list > unlocks inode_lock (proc 1 will now at some point grab inode_lock) > calls clear_inode() > (this is key) gets past the call to wait_on_inode(); > at this point clear_inode() and the remainder of iput() does not care > about I_LOCK or inode_lock. > > proc 1: > calls __sync_one() with inode_lock. > sets I_LOCK > do stuff before __refile_inode() is called, all I_LOCK/inode_lock stuff > doesn't matter. > > proc 0: > sets i_state = I_CLEAR > iput calls destroy_inode() > > proc 1: > calls __refile_inode > > and we have ourselves a corrupted inode on the inode_unused list. > > I'm not sure if you can see it or not, but Redhat bugzilla 155289 is > tracking this. > > Al