linux-fsdevel.vger.kernel.org archive mirror
* Re: RE: RE: oops in 2.4.25 prune_icache() called from kswapd
@ 2005-06-20  6:20 Albert Chu
From: Albert Chu @ 2005-06-20  6:20 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Velupula, Prakash, Marcelo Tosatti, David Woodhouse, Al Viro,
	linux-fsdevel, lwoodman, behlendorf1

> > Albert, Prakash, any of you using stock v2.4?

Sorry, we don't use a stock kernel in our environment :-(

> Do you have any inputs on reproducing this?

Sorry.  We were a bit resource fortunate when debugging this.  Heavy I/O
scientific simulations on a 1000 node cluster got us about 5-10 hits a day.

Al

--
Albert Chu
chu11@llnl.gov
Lawrence Livermore National Laboratory

----- Original Message -----
From: Chris Caputo <ccaputo@alt.net>
Date: Sunday, June 19, 2005 4:07 pm
Subject: RE: RE: oops in 2.4.25 prune_icache() called from kswapd

> My basic repro method was:
> 
> --
> 0) start irqbalance
> 1) run loop_dbench, which is the following dbench script which uses
>    client_plain.txt:
> 
>    #!/bin/sh
> 
>    while [ 1 ]
>    do
>         date
>         dbench 2
>    done
> 
> 2) wait for oops
> --
> 
> I think I was using dbench-2.1:
> 
>   http://samba.org/ftp/tridge/dbench/dbench-2.1.tar.gz
> 
> In my case irqbalance was key.  If I didn't run it I never got the
> problem.  I think irqbalance just did a good job of exacerbating a race
> condition in some way.
> 
> Chris
> 
> On Sun, 19 Jun 2005, Velupula, Prakash wrote:
> > Hi Marcelo,
> >
> > We are using 2.4.20 and would be able to test it. But before that, we are
> > trying to recreate the problem. We just had one occurrence of this
> > problem. Do you have any inputs on reproducing this?
> >
> > Thanks,
> > Prakash
> >
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> > Sent: Saturday, June 18, 2005 3:34 PM
> > To: Chris Caputo
> > Cc: Albert Chu; David Woodhouse; Al Viro; Velupula, Prakash;
> > linux-fsdevel@vger.kernel.org; lwoodman@redhat.com
> > Subject: Re: RE: oops in 2.4.25 prune_icache() called from kswapd
> >
> > On Sun, Jun 19, 2005 at 01:11:20AM +0000, Chris Caputo wrote:
> >> Hi,
> >>
> >> Marcelo, your patch didn't come through.  Unfortunately I can't test
> >> it anymore since my environment has now changed, but hopefully Albert
> >> or Prakash can try it when you resend.
> >
> > Albert, Prakash, any of you using stock v2.4?
> >
> > Here's the diff:
> >
> > --- a/fs/inode.c.orig	2005-06-18 11:19:21.508857600 -0300
> > +++ b/fs/inode.c	2005-06-18 11:21:03.925287936 -0300
> > @@ -1238,15 +1238,18 @@
> >                         	BUG();
> >         	} else {
> >                 	if (!list_empty(&inode->i_hash)) {
> > -                        	if (!(inode->i_state & (I_DIRTY|I_LOCK)))
> > -                                	__refile_inode(inode);
> > -                        	inodes_stat.nr_unused++;
> > -                        	spin_unlock(&inode_lock);
> > -                        	if (!sb || (sb->s_flags & MS_ACTIVE))
> > +                        	if (!sb || (sb->s_flags & MS_ACTIVE)) {
> > +                                	if (!(inode->i_state &
> > +                                	(I_DIRTY|I_LOCK))) {
> > +                                        	__refile_inode(inode);
> > +                                        	inodes_stat.nr_unused++;
> > +                                	}
> > +                                	spin_unlock(&inode_lock);
> >                                 	return;
> > +                        	}
> > +                        	spin_unlock(&inode_lock);
> >                         	write_inode_now(inode, 1);
> >                         	spin_lock(&inode_lock);
> > -                        	inodes_stat.nr_unused--;
> >                         	list_del_init(&inode->i_hash);
> >                 	}
> >                 	list_del_init(&inode->i_list);
> >>
> >> Thanks,
> >> Chris
> >>
> >> On Sat, 18 Jun 2005, Marcelo Tosatti wrote:
> >>> Hi,
> >>>
> >>> Shame the RH bugzilla (#155289) requires super privileges to be
> >>> accessed.
> >>>
> >>> I've got around to reading Albert's description of the race - thanks
> >>> BTW.
> >>> (it's attached below for reference)
> >>>
> >>> It seems to me that the only window open in mainline is between iput()
> >>> and prune_icache(), while iput() sleeps on sync_one() with the inode
> >>> being:
> >>>
> >>> - on the unused list
> >>> - and with i_count set to zero
> >>>
> >>> prune_icache() is free to invalidate and destroy the inode in the
> >>> meantime, causing iput()'s sync_one() to __refile_inode() the NULL
> >>> entry to the unused list later on.
> >>>
> >>> If that is indeed the case, removing __refile_inode() from the nonzero
> >>> inode->i_nlink path should close that window.
> >>>
> >>> Chris, can you please test the attached "iput-iprune-race.patch" with
> >>> your usual irqbalance-enabled environment?
> >>>
> >>>
> >>> On Sat, Jun 18, 2005 at 01:02:20PM -0700, Albert Chu wrote:
> >>>> Howdy everyone,
> >>>>
> >>>>> Albert Chu, CC'ed, has suggested the below as a fix.  Albert, any
> >>>>> new info on this, or have these two patches cleared up the problem
> >>>>> well?
> >>>>
> >>>> The __refile_inode() patch below fixed the problem for us on our
> >>>> clusters (running RHEL3).  The clear_inode() patch is something
> >>>> Redhat (I think Larry Woodman) gave to us to fix a problem he
> >>>> believes is in the same general area.  We haven't had any additional
> >>>> problems adding his second patch.
> >>>>
> >>>>> BTW, the below mail says that these are workarounds and a real fix
> >>>>> is on the way. Has that been rolled in as well?
> >>>>
> >>>> Sorry, not too sure.  :-(
> >>>
> >>> Albert's description:
> >>>
> >>>> Just thought I'd let you know what's up.  We think we're close to
> >>>> getting to the bottom of this.  We think the race is between iput()
> >>>> and __refile_inode().  The Redhat kernel is of course different than
> >>>> the mainline kernel, but I think the same bug exists in the
> >>>> mainline.  The example below illustrates the race between iput() and
> >>>> __sync_one(), but it could occur with other areas that call
> >>>> __refile_inode().  For us, I think we're hitting it in __sync_one()
> >>>> and prune_icache().  (The prune_icache() call to __refile_inode()
> >>>> doesn't seem to be in the mainline though).
> >>>>
> >>>> proc 0:
> >>>> calls iput() and locks inode_lock.
> >>>> iput removes the inode off of the i_list
> >>>> unlocks inode_lock (proc 1 will now at some point grab inode_lock)
> >>>> calls clear_inode() (this is key)
> >>>> gets past the call to wait_on_inode(); at this point clear_inode()
> >>>> and the remainder of iput() does not care about I_LOCK or
> >>>> inode_lock.
> >>>>
> >>>> proc 1:
> >>>> calls __sync_one() with inode_lock.
> >>>> sets I_LOCK
> >>>> do stuff before __refile_inode() is called, all I_LOCK/inode_lock
> >>>> stuff doesn't matter.
> >>>>
> >>>> proc 0:
> >>>> sets i_state = I_CLEAR
> >>>> iput calls destroy_inode()
> >>>>
> >>>> proc 1:
> >>>> calls __refile_inode
> >>>>
> >>>> and we have ourselves a corrupted inode on the inode_unused list.
> >>>>
> >>>> I'm not sure if you can see it or not, but Redhat bugzilla 155289 is
> >>>> tracking this.
> >>>>
> >>>> Al
> >
> 

