linux-scsi.vger.kernel.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Keith Hopkins <vger@hopnet.net>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>, linux-scsi@vger.kernel.org
Subject: Re: [PATCH] aic94xx: Don't free ABORT_TASK SCBs that are timed out (Was: Re: aic94xx: failing on high load)
Date: Thu, 28 Feb 2008 08:10:14 -0800
Message-ID: <1204215014.3377.5.camel@localhost.localdomain>
In-Reply-To: <47C6CB94.4070904@hopnet.net>

On Thu, 2008-02-28 at 22:56 +0800, Keith Hopkins wrote:
> On 02/20/2008 02:44 AM, Darrick J. Wong wrote:
> > If we send an ABORT_TASK ascb that doesn't return within the timeout period,
> > we should not free that ascb because the sequencer is still holding onto it.
> > Hopefully it will fix what James Bottomley describes below:
> > 
> > On Tue, Feb 19, 2008 at 10:22:20AM -0600, James Bottomley wrote:
> > 
> >> Unfortunately, there's a bug in TMF timeout handling in the driver, it
> >> leaves the sequencer entry pending, but frees the ascb.  If the
> >> sequencer ever picks this up it will get very confused, as it does a
> >> while down in the trace:
> >>
> >>> aic94xx: BUG:sequencer:dl:no ascb?!
> >>> aic94xx: BUG:sequencer:dl:no ascb?!
> >> That's where the sequencer adds an ascb to the done list that we've
> >> already freed.  From this point on confusion reigns and the error
> >> handler eventually offlines the device.
> >>
> >> I'll see if I can come up with patches to fix this ... or at least
> >> mitigate the problems it causes.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> > ---
> > 
> >  drivers/scsi/aic94xx/aic94xx_tmf.c |    7 ++++++-
> >  1 files changed, 6 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c
> > index b52124f..4b24bd3 100644
> > --- a/drivers/scsi/aic94xx/aic94xx_tmf.c
> > +++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
> > @@ -463,7 +463,7 @@ int asd_abort_task(struct sas_task *task)
> >  						       AIC94XX_SCB_TIMEOUT);
> >  		spin_lock_irqsave(&task->task_state_lock, flags);
> >  		if (leftover < 1)
> > -			res = TMF_RESP_FUNC_FAILED;
> > +			goto out_not_reported;
> >  		if (task->task_state_flags & SAS_TASK_STATE_DONE)
> >  			res = TMF_RESP_FUNC_COMPLETE;
> >  		spin_unlock_irqrestore(&task->task_state_lock, flags);
> > @@ -487,6 +487,11 @@ out:
> >  	asd_ascb_free(ascb);
> >  	ASD_DPRINTK("task 0x%p aborted, res: 0x%x\n", task, res);
> >  	return res;
> > +
> > +out_not_reported:
> > +	spin_unlock_irqrestore(&task->task_state_lock, flags);
> > +	ASD_DPRINTK("task 0x%p aborted? but not reported.\n", task);
> > +	return res;
> >  }
> >  
> >  /**
> > -
> 
> Hi Darrick,
> 
>   Is this the only patch for ascb sequencer use after free problems, or are you still looking into that?
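The ownership rule behind Darrick's patch can be sketched as a small, self-contained C model. This is a hypothetical illustration, not aic94xx code: `struct ctl_block`, `post_to_sequencer()`, `sequencer_complete()`, `abort_task_old()`, `abort_task_fixed()` and `reclaim()` are invented names, and a `freed` flag stands in for `asd_ascb_free()` so that a use-after-free is detected instead of crashing. The model assumes that a block posted to the sequencer may only be released once the sequencer hands it back via the done list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: none of these names exist in aic94xx or libsas. */
enum cb_owner { CB_OWNER_HOST, CB_OWNER_SEQUENCER };

struct ctl_block {
	int id;
	enum cb_owner owner;
	bool freed;	/* flag instead of free() so misuse is observable */
};

/* Host hands the block to the sequencer; ownership moves with it. */
static void post_to_sequencer(struct ctl_block *cb)
{
	assert(cb->owner == CB_OWNER_HOST && !cb->freed);
	cb->owner = CB_OWNER_SEQUENCER;
}

/*
 * The sequencer finishes with the block and places it on the done list.
 * Returns false if the host already released it: the model's analogue
 * of "BUG:sequencer:dl:no ascb?!".
 */
static bool sequencer_complete(struct ctl_block *cb)
{
	if (cb->freed) {
		fprintf(stderr, "done list: block %d already freed\n", cb->id);
		return false;
	}
	cb->owner = CB_OWNER_HOST;
	return true;
}

/* Pre-patch behaviour: release the block even if the wait timed out. */
static void abort_task_old(struct ctl_block *cb, bool timed_out)
{
	(void)timed_out;
	cb->freed = true;
}

/*
 * Patched behaviour: on timeout the sequencer may still hold the block,
 * so leave it allocated ("aborted? but not reported").  Otherwise the
 * sequencer has already handed it back and it is safe to release.
 */
static void abort_task_fixed(struct ctl_block *cb, bool timed_out)
{
	if (timed_out)
		return;
	assert(cb->owner == CB_OWNER_HOST);
	cb->freed = true;
}

/* Late completion of a timed-out block: the host may now release it. */
static void reclaim(struct ctl_block *cb)
{
	assert(cb->owner == CB_OWNER_HOST && !cb->freed);
	cb->freed = true;
}
```

In this model, calling `abort_task_old()` on a timed-out block and then `sequencer_complete()` returns false, mirroring the done-list complaint in the trace, while `abort_task_fixed()` leaves the block alive until `reclaim()` runs after the late completion. The cost of this approach is that a block the sequencer never returns is never reclaimed.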

Sorry, I forgot to cc you.  Actually, this is the complete fix:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=e2396f1e4ecd438a15fa653a028b93e95013caa3

Unfortunately, you'll also need another five patches from that git tree
before we can see whether aic94xx works on your box.

If you're willing, could you use 2.6.25-rc3 as the base kernel and just
apply

http://www.kernel.org/pub/linux/kernel/people/jejb/scsi-rc-fixes-2.6.diff

on top of it?  That should give you a kernel patched with all of the
pending aic94xx and libsas fixes.

Thanks,

James


