All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: RE: RE: oops in 2.4.25 prune_icache() called from kswapd
@ 2005-06-20  6:20 Albert Chu
  0 siblings, 0 replies; only message in thread
From: Albert Chu @ 2005-06-20  6:20 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Velupula, Prakash, Marcelo Tosatti, David Woodhouse, Al Viro,
	linux-fsdevel, lwoodman, behlendorf1

> > Albert, Prakash, any of you using stock v2.4?

Sorry, we don't use a stock kernel in our environment :-(

> Do you have any inputs on reproducing this?

Sorry.  We were a bit resource fortunate when debugging this.  Heavy I/O
scientific simulations on a 1000 node cluster got us about 5-10 hits a day.

Al

--
Albert Chu
chu11@llnl.gov
Lawrence Livermore National Laboratory

----- Original Message -----
From: Chris Caputo <ccaputo@alt.net>
Date: Sunday, June 19, 2005 4:07 pm
Subject: RE: RE: oops in 2.4.25 prune_icache() called from kswapd

> My basic repro method was:
> 
> --
> 0) start irqbalance
> 1) run loop_dbench, which is the following dbench script which uses
>    client_plain.txt:
> 
>    #!/bin/sh
> 
>    while [ 1 ]
>    do
>         date
>         dbench 2
> 2) wait for oops
> --
> 
> I think I was using dbench-2.1:
> 
>   http://samba.org/ftp/tridge/dbench/dbench-2.1.tar.gz
> 
> In my case irqbalance was key.  If I didn't run it I never got the 
> problem.  I think irqbalance just did a good job of exasperating a 
> race 
> condition in some way.
> 
> Chris
> 
> On Sun, 19 Jun 2005, Velupula, Prakash wrote:
> > Hi Marcelo,
> >
> > We are using 2.4.20 and would be able test it. But before that, 
> we are
> > trying to recreate the problem. We just had one occurrence of this
> > problem. Do you have any inputs on reproducing this?
> >
> > Thanks,
> > Prakash
> >
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> > Sent: Saturday, June 18, 2005 3:34 PM
> > To: Chris Caputo
> > Cc: Albert Chu; David Woodhouse; Al Viro; Velupula, Prakash;
> > linux-fsdevel@vger.kernel.org; lwoodman@redhat.com
> > Subject: Re: RE: oops in 2.4.25 prune_icache() called from kswapd
> >
> > On Sun, Jun 19, 2005 at 01:11:20AM +0000, Chris Caputo wrote:
> >> Hi,
> >>
> >> Marcello, your patch didn't come through.  Unfortunately I can't 
> test>> it anymore since my environment has now changed, but 
> hopefully Albert
> >> or Prakash can try it when you resend.
> >
> > Albert, Prakash, any of you using stock v2.4?
> >
> > Here's the diff:
> >
> > --- a/fs/inode.c.orig	2005-06-18 11:19:21.508857600 -0300
> > +++ b/fs/inode.c	2005-06-18 11:21:03.925287936 -0300
> > @@ -1238,15 +1238,18 @@
> >                         	BUG();
> >         	} else {
> >                 	if (!list_empty(&inode->i_hash)) {
> > -                        	if (!(inode->i_state &
> > (I_DIRTY|I_LOCK)))
> > -                                	__refile_inode(inode);
> > -                        	inodes_stat.nr_unused++;
> > -                        	spin_unlock(&inode_lock);
> > -                        	if (!sb || (sb->s_flags & MS_ACTIVE))
> > +                        	if (!sb || (sb->s_flags & MS_ACTIVE)) {
> > +                                	if (!(inode->i_state &
> > +                                	(I_DIRTY|I_LOCK))) {
> > +                                        	__refile_inode(inode);
> > +                                        	inodes_stat.nr_unused++;
> > +                                	}
> > +                                	spin_unlock(&inode_lock);
> >                                 	return;
> > +                        	}
> > +                        	spin_unlock(&inode_lock);
> >                         	write_inode_now(inode, 1);
> >                         	spin_lock(&inode_lock);
> > -                        	inodes_stat.nr_unused--;
> >                         	list_del_init(&inode->i_hash);
> >                 	}
> >                 	list_del_init(&inode->i_list);
> >>
> >> Thanks,
> >> Chris
> >>
> >> On Sat, 18 Jun 2005, Marcelo Tosatti wrote:
> >>> Hi,
> >>>
> >>> Shame the RH bugzilla (#155289) requires super priviledges to be
> >>> accessed.
> >>>
> >>> I've got around reading Albert's description of the race - thanks
> > BTW.
> >>> (its attached below for reference)
> >>>
> >>> It seems to me that only window open in mainline is between iput()
> >>> and prune_icache(), while iput() sleeps on sync_one() with the 
> inode>>> being:
> >>>
> >>> - on the unused list
> >>> - and with i_count set to zero
> >>>
> >>> prune_icache() is free to invalidate and destroy the inode in the
> >>> meantime, causing iput()'s sync_one() to __refile_inode() the NULL
> >>> entry to the unused list later on.
> >>>
> >>> If that is indeed the case, removing __refile_inode() from the
> >>> nonzero
> >>> inode->i_nlink path should close that window.
> >>>
> >>> Chris, can you please test the attached "iput-iprune-
> race.patch" with
> >
> >>> your usual irqbalance enable environment ?
> >>>
> >>>
> >>> On Sat, Jun 18, 2005 at 01:02:20PM -0700, Albert Chu wrote:
> >>>> Howdy everyone,
> >>>>
> >>>>> Albert Chu, CC'ed, has suggested the below as a fix.  Albert, 
> any>>>>> new info on this, or have these two patches cleared up the 
> problem> well?
> >>>>
> >>>> The __refile_inode() patch below fixed the problem for us on our
> >>>> clusters (running RHEL3).  The clear_inode() patch is something
> >>>> Redhat (I think Larry Woodman) gave to us to fix a problem he
> >>>> believes is in the same general area.  We haven't had any 
> additional>
> >>>> problems adding his second patch.
> >>>>
> >>>>> BTW, the below mail says that these are workarounds and a 
> real fix
> >>>>> is on the way. Has that been rolled in as well?
> >>>>
> >>>> Sorry, not too sure.  :-(
> >>>
> >>> Albert's description:
> >>>
> >>>> Just thought I'd let you know what's up.  We think we're close to
> >>>> getting to the bottom of this.  We think the race is between 
> iput()>>>> and __refile_inode().  The Redhat kernel is of course 
> different than
> >
> >>>> the mainline kernel, but I think the same bug exists in the
> >>>> mainline.  The example below illustrates the race between 
> iput() and
> >
> >>>> __sync_one(), but it could occur with other areas that all
> >>>> __refile_inode().  For us, I think we're hitting it in 
> __sync_one()>>>> and prune_icache().  (The
> >>>> prune_icache() call to __refile_inode() doesn't seem to be in the
> >>>> mainline though).
> >>>>
> >>>> proc 0:
> >>>> calls iput() and locks inode_lock.
> >>>> iput removes the inode off of the i_list unlocks inode_lock 
> (proc 1
> >>>> will now at some point grab inode_lock) calls clear_inode() 
> (this is
> >
> >>>> key) gets past the call to wait_on_inode(); at this point
> >>>> clear_inode() and the remainder of iput() does not care about 
> I_LOCK>
> >>>> or inode_lock.
> >>>>
> >>>> proc 1:
> >>>> calls __sync_one() with inode_lock.
> >>>> sets I_LOCK
> >>>> do stuff before __refile_inode() is called, all I_LOCK/inode_lock
> >>>> stuff doesn't matter.
> >>>>
> >>>> proc 0:
> >>>> sets i_state = I_CLEAR
> >>>> iput calls destroy_inode()
> >>>>
> >>>> proc 1:
> >>>> calls __refile_inode
> >>>>
> >>>> and we have ourselves a corrupted inode on the inode_unused list.
> >>>>
> >>>> I'm not sure if you can see it or not, but Redhat bugzilla 
> 155289 is
> >
> >>>> tracking this.
> >>>>
> >>>> Al
> >
> 


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2005-06-20  6:21 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-06-20  6:20 RE: RE: oops in 2.4.25 prune_icache() called from kswapd Albert Chu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.