public inbox for linux-scsi@vger.kernel.org
From: Andrew Morton <akpm@digeo.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: aic7xxx woes in 2.5
Date: Mon, 16 Dec 2002 01:40:25 -0800
Message-ID: <3DFD9F89.4B994586@digeo.com>
In-Reply-To: 23290000.1039982976@aslan.btc.adaptec.com

"Justin T. Gibbs" wrote:
> 
> > For about six months in the 2.5 series, using aic7xxx, about every fourth
> > boot one of my disks tends to get:
> >
> > (scsi1:A:4:0): parity-error detected in Data-in phase: SEQADDR(0x1ae)
> > SCSIRATE(0x88) scsi1:0:4:0: Attempting to queue an ABORT message
> >
> > This is invariably fatal.
> 
> ...
> 
> > This never happens in 2.4 kernels.
> >
> > It seems to happen a little more frequently on uniprocessor builds.
> >
> > So relevant questions would be:
> >
> > 1) Why does only 2.5 get the parity error?
> 
> Most likely different loads on your SCSI bus.  The driver can't "make up"
> SCSI bus parity errors.

It's very consistent.  Never seen on 2.4.

> > 2) Why does the recovery lock up?
> 
> I would actually have to know the sequencer instruction that we
> are blocked on in the clear_critical_sections code to be able to
> say.  Several recovery bugs have been fixed in later driver versions.

OK, let's move on then.
 
> > 3) Does anyone have a diff for Justin's new driver?
> 
> Just populate the scsi/aic7xxx directory with the files found
> here:
> 
> http://people.FreeBSD.org/~gibbs/linux/SRC/
> 
> You will need to merge in the Kconfig and Makefile for the scsi
> directory, but if you are running a fairly recent kernel, you
> can just overwrite those files with those supplied in the linux-2.5
> archive supplied at the above URL.

That's very awkward and will hamper efforts to get testing done.

I grafted it into the 2.5.52 tree.  The Kconfig entries for
aic7xxx_old seem to have been lost.


The driver still has a serious bug in ahc_linux_queue_recovery_cmd().
It does

	ahc_unlock(ahc, &s);

where the local variable `s' is uninitialised.  That garbage value
then gets copied into the CPU's interrupt flag.
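
To illustrate why that matters, here is a minimal user-space sketch. It
assumes ahc_unlock() restores saved flags in the style of
spin_unlock_irqrestore(); the driver source is not quoted here, so that is
an assumption. Everything named sim_* or recovery_path_* is hypothetical,
and the "garbage" parameter stands in for whatever happens to be on the
stack, since reading a truly uninitialised variable is undefined behaviour
in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the x86 interrupt-enable flag. */
#define SIM_IF 0x200UL                      /* bit 9 of EFLAGS is IF */

static unsigned long sim_eflags = SIM_IF;   /* interrupts start enabled */

/* Models a lock-with-irqsave: save the current flags, then disable IRQs. */
static void sim_lock(unsigned long *flags)
{
    *flags = sim_eflags;
    sim_eflags &= ~SIM_IF;
}

/* Models an unlock-with-irqrestore: write the caller's saved flags back. */
static void sim_unlock(const unsigned long *flags)
{
    sim_eflags = *flags;
}

static bool sim_irqs_enabled(void)
{
    return (sim_eflags & SIM_IF) != 0;
}

/* Correct pairing: `s' is filled in by the lock before the unlock uses it. */
static bool recovery_path_paired(void)
{
    unsigned long s;

    sim_lock(&s);
    assert(!sim_irqs_enabled());    /* critical section runs with IRQs off */
    sim_unlock(&s);
    return sim_irqs_enabled();
}

/* Unpaired unlock: `s' was never written by a matching lock, so the
 * interrupt state afterwards is whatever the stale value dictates. */
static bool recovery_path_unpaired(unsigned long garbage)
{
    unsigned long s = garbage;      /* stands in for uninitialised stack */

    sim_unlock(&s);
    return sim_irqs_enabled();
}
```

With the paired version the interrupt state after the unlock always matches
the state before the lock. With the unpaired version it depends entirely on
the stale value: a word with bit 9 clear leaves interrupts disabled, which
would be one way to end up with a machine that only the NMI watchdog can
still reach.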


The driver got through recognising the disks and then locked up
strangely:

Program received signal SIGEMT, Emulation trap.
cache_alloc_refill (cachep=0xd00675a0, flags=0) at include/linux/list.h:127
127             prev->next = next;
(gdb) bt
#0  cache_alloc_refill (cachep=0xd00675a0, flags=0) at include/linux/list.h:127
#1  0x00000246 in ?? ()
#2  0xc0135947 in kmalloc (size=256, flags=0) at mm/slab.c:1652
#3  0xc0239835 in ahc_linux_dv_inq (ahc=0xc175e400, cmd=0xc3dd0c00, devinfo=0xc3d77fb0, targ=0xc3dcee00, request_length=96)
    at drivers/scsi/aic7xxx/aic7xxx_osm.c:3303
#4  0xc0237f5d in ahc_linux_dv_target (ahc=0xc175e400, target_offset=4) at drivers/scsi/aic7xxx/aic7xxx_osm.c:2060
#5  0xc0237d47 in ahc_linux_dv_thread (data=0xc175e400) at drivers/scsi/aic7xxx/aic7xxx_osm.c:1955

This is an NMI watchdog interrupt.  In here:

                while (slabp->inuse < cachep->num && batchcount--)
                        ac_entry(ac)[ac->avail++] =
                                cache_alloc_one_tail(cachep, slabp);

Presumably due to errors in use of slab-allocated memory.


So I enabled slab debugging and:


Program received signal SIGTRAP, Trace/breakpoint trap.
0xc013606f in kfree (objp=0xc3da5ed4) at mm/slab.c:1452
1452                            BUG();
(gdb) bt
#0  0xc013606f in kfree (objp=0xc3da5ed4) at mm/slab.c:1452
#1  0xc023bca1 in ahc_linux_free_target (ahc=0xc175c000, targ=0xc3dcf800) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4588
#2  0xc023bdbd in ahc_linux_free_device (ahc=0xc175c000, dev=0xc3da4ba4) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4642
#3  0xc023c36d in ahc_done (ahc=0xc175c000, scb=0xc3d78070) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4858
#4  0xc02296bd in ahc_run_qoutfifo (ahc=0xc175c000) at drivers/scsi/aic7xxx/aic7xxx_core.c:344
#5  0xc023b93a in ahc_linux_isr (irq=35, dev_id=0xc175c000, regs=0xd0003f74) at drivers/scsi/aic7xxx/aic7xxx_inline.h:600
#6  0xc010c710 in handle_IRQ_event (irq=35, regs=0xd0003f74, action=0xc3d9c974) at arch/i386/kernel/irq.c:210
#7  0xc010c8f2 in do_IRQ (regs=
      {ebx = -805298176, ecx = 384, edx = -805298176, esi = -1072657832, edi = 0, ebp = -805290072, eax = 17, xds = 104, xes = -1072693144, orig_eax = -221, eip = -1072657788, xcs = 96, eflags = 582, esp = -805290056, xss = -1072657690}) at arch/i386/kernel/irq.c:391
#8  0xc010b114 in common_interrupt () at include/linux/kallsyms.h:39
#9  0xc0108ae6 in cpu_idle () at arch/i386/kernel/process.c:144
#10 0xc039553a in start_secondary (unused=0xc038692d) at arch/i386/kernel/smpboot.c:467

That's in here:

        if (cachep->flags & SLAB_RED_ZONE) {
                objp -= BYTES_PER_WORD;
                if (xchg((unsigned long *)objp, RED_MAGIC1) != RED_MAGIC2)
                        /* Either write before start, or a double free. */
                        BUG();

Presumably a double free in ahc_linux_free_target().
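
The red-zone check quoted above can be modelled in user space roughly as
follows. This is a sketch under stated assumptions, not the slab
allocator's code: rz_alloc(), rz_free() and rz_release() are hypothetical
helpers, the magic values are assumed rather than taken from the quoted
excerpt, and a plain read-then-write replaces the atomic xchg(). The guard
word holds RED_MAGIC2 while the object is live and RED_MAGIC1 once it has
been freed.

```c
#include <assert.h>
#include <stdlib.h>

#define RED_MAGIC1 0x5A2CF071UL   /* guard value of a freed object (assumed) */
#define RED_MAGIC2 0x170FC2A5UL   /* guard value of a live object (assumed) */

/* Hypothetical allocator: one guard word sits just before the payload. */
static void *rz_alloc(size_t size)
{
    unsigned long *base = malloc(sizeof(unsigned long) + size);

    if (base == NULL)
        return NULL;
    *base = RED_MAGIC2;           /* mark the object as live */
    return base + 1;
}

/* Mirrors the quoted check: swap in the "free" magic and inspect the old
 * value.  Returns 0 for a clean free, -1 where the kernel would BUG().
 * The memory is deliberately kept so a second call can still read the
 * guard word; rz_release() hands it back to the C library. */
static int rz_free(void *objp)
{
    unsigned long *base;
    unsigned long old;

    assert(objp != NULL);
    base = (unsigned long *)objp - 1;
    old = *base;
    *base = RED_MAGIC1;
    /* Anything but RED_MAGIC2 means a write before the start of the
     * object, or a double free. */
    return old == RED_MAGIC2 ? 0 : -1;
}

static void rz_release(void *objp)
{
    free((unsigned long *)objp - 1);
}
```

The first rz_free() on an object finds RED_MAGIC2 and succeeds. A second
rz_free() on the same pointer finds RED_MAGIC1 and fails, as does a free
after something has scribbled on the word just before the payload. The
check cannot tell those two cases apart, which is why the double free is
only a presumption.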

I can debug further if you like, but would really appreciate unified
diffs, thanks.


Thread overview: 11+ messages
2002-12-15  4:31 aic7xxx woes in 2.5 Andrew Morton
2002-12-15  6:06 ` Ishikawa
2002-12-15  6:48   ` Andrew Morton
2002-12-15 13:48     ` Ishikawa
2002-12-15 20:17   ` Justin T. Gibbs
2002-12-15 20:09 ` Justin T. Gibbs
2002-12-16  9:40   ` Andrew Morton [this message]
2002-12-16 18:52     ` Justin T. Gibbs
2002-12-16 19:03       ` Christoph Hellwig
2002-12-16 19:08       ` Andrew Morton
2002-12-16 19:26         ` Justin T. Gibbs
