From: Andrew Morton <akpm@digeo.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: aic7xxx woes in 2.5
Date: Mon, 16 Dec 2002 01:40:25 -0800
Message-ID: <3DFD9F89.4B994586@digeo.com>
In-Reply-To: 23290000.1039982976@aslan.btc.adaptec.com

"Justin T. Gibbs" wrote:
>
> > For about six months in the 2.5 series, using aic7xxx, about every fourth
> > boot one of my disks tends to get:
> >
> > (scsi1:A:4:0): parity-error detected in Data-in phase: SEQADDR(0x1ae) SCSIRATE(0x88)
> > scsi1:0:4:0: Attempting to queue an ABORT message
> >
> > This is invariably fatal.
>
> ...
>
> > This never happens in 2.4 kernels.
> >
> > It seems to happen a little more frequently on uniprocessor builds.
> >
> > So relevant questions would be:
> >
> > 1) Why does only 2.5 get the parity error?
>
> Most likely different loads on your SCSI bus. The driver can't "make up"
> SCSI bus parity errors.

It's very consistent. Never seen on 2.4.

> > 2) Why does the recovery lock up?
>
> I would actually have to know the sequencer instruction that we
> are blocked on in the clear_critical_sections code to be able to
> say. Several recovery bugs have been fixed in later driver versions.

OK, let's move on then.

> > 3) Does anyone have a diff for Justin's new driver?
>
> Just populate the scsi/aic7xxx directory with the files found
> here:
>
> http://people.FreeBSD.org/~gibbs/linux/SRC/
>
> You will need to merge in the Kconfig and Makefile for the scsi
> directory, but if you are running a fairly recent kernel, you
> can just overwrite those files with those supplied in the linux-2.5
> archive supplied at the above URL.

That's very awkward and will hamper efforts to get testing done.

I grafted it into the 2.5.52 tree. The Kconfig entries for
aic7xxx_old seem to be lost.

The driver still has a serious bug in ahc_linux_queue_recovery_cmd().
It does

    ahc_unlock(ahc, &s);

where local variable `s' is uninitialised. But that gets copied
into the CPU's interrupt flag.
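
To make the hazard concrete, here is a minimal userspace sketch. It
assumes ahc_unlock() restores saved interrupt state in the style of an
irqsave/irqrestore lock pair, which is what the description above
implies; `irqs_enabled', model_lock() and model_unlock() are made-up
names for illustration and are not driver or kernel symbols. The stale
value is passed in explicitly so the demo stays well-defined C instead
of reading an uninitialised variable.

```c
#include <assert.h>   /* only needed for checks such as those described below */
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Model of an irqsave-style lock: record the current interrupt
 * state in *flags, then disable interrupts. */
static void model_lock(unsigned long *flags)
{
    *flags = irqs_enabled ? 1UL : 0UL;
    irqs_enabled = false;
}

/* Model of the matching unlock: blindly restore whatever *flags holds. */
static void model_unlock(const unsigned long *flags)
{
    irqs_enabled = (*flags != 0UL);
}

/* Correct pairing: `s' is written by the lock before the unlock reads it,
 * so the interrupt state on exit equals the state on entry. */
bool paired_section(bool initially_enabled)
{
    unsigned long s;

    irqs_enabled = initially_enabled;
    model_lock(&s);
    /* ... critical section ... */
    model_unlock(&s);
    return irqs_enabled;
}

/* Buggy shape: the unlock consumes a flags word that no lock filled in.
 * `stale' plays the role of whatever happened to be on the stack. */
bool unpaired_unlock(unsigned long stale)
{
    unsigned long s = stale;

    irqs_enabled = false;   /* we are inside a section with interrupts off */
    model_unlock(&s);
    return irqs_enabled;    /* now depends entirely on the stale value */
}
```

In this model paired_section() always returns the state it was entered
with, while unpaired_unlock(0) leaves interrupts off and
unpaired_unlock(1) turns them on, so the outcome is decided by stack
garbage.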

The driver got through recognising the disks and then locked up
strangely:

    Program received signal SIGEMT, Emulation trap.
    cache_alloc_refill (cachep=0xd00675a0, flags=0) at include/linux/list.h:127
    127         prev->next = next;
    (gdb) bt
    #0  cache_alloc_refill (cachep=0xd00675a0, flags=0) at include/linux/list.h:127
    #1  0x00000246 in ?? ()
    #2  0xc0135947 in kmalloc (size=256, flags=0) at mm/slab.c:1652
    #3  0xc0239835 in ahc_linux_dv_inq (ahc=0xc175e400, cmd=0xc3dd0c00, devinfo=0xc3d77fb0, targ=0xc3dcee00, request_length=96)
        at drivers/scsi/aic7xxx/aic7xxx_osm.c:3303
    #4  0xc0237f5d in ahc_linux_dv_target (ahc=0xc175e400, target_offset=4) at drivers/scsi/aic7xxx/aic7xxx_osm.c:2060
    #5  0xc0237d47 in ahc_linux_dv_thread (data=0xc175e400) at drivers/scsi/aic7xxx/aic7xxx_osm.c:1955

This is an NMI watchdog interrupt. In here:

    1571        while (slabp->inuse < cachep->num && batchcount--)
    1572            ac_entry(ac)[ac->avail++] =
    1573                cache_alloc_one_tail(cachep, slabp);

Presumably due to errors in use of slab-allocated memory.

So I enabled slab debugging and:

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0xc013606f in kfree (objp=0xc3da5ed4) at mm/slab.c:1452
    1452        BUG();
    (gdb) bt
    #0  0xc013606f in kfree (objp=0xc3da5ed4) at mm/slab.c:1452
    #1  0xc023bca1 in ahc_linux_free_target (ahc=0xc175c000, targ=0xc3dcf800) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4588
    #2  0xc023bdbd in ahc_linux_free_device (ahc=0xc175c000, dev=0xc3da4ba4) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4642
    #3  0xc023c36d in ahc_done (ahc=0xc175c000, scb=0xc3d78070) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4858
    #4  0xc02296bd in ahc_run_qoutfifo (ahc=0xc175c000) at drivers/scsi/aic7xxx/aic7xxx_core.c:344
    #5  0xc023b93a in ahc_linux_isr (irq=35, dev_id=0xc175c000, regs=0xd0003f74) at drivers/scsi/aic7xxx/aic7xxx_inline.h:600
    #6  0xc010c710 in handle_IRQ_event (irq=35, regs=0xd0003f74, action=0xc3d9c974) at arch/i386/kernel/irq.c:210
    #7  0xc010c8f2 in do_IRQ (regs=
        {ebx = -805298176, ecx = 384, edx = -805298176, esi = -1072657832, edi = 0, ebp = -805290072, eax = 17, xds = 104, xes = -1072693144, orig_eax = -221, eip = -1072657788, xcs = 96, eflags = 582, esp = -805290056, xss = -1072657690}) at arch/i386/kernel/irq.c:391
    #8  0xc010b114 in common_interrupt () at include/linux/kallsyms.h:39
    #9  0xc0108ae6 in cpu_idle () at arch/i386/kernel/process.c:144
    #10 0xc039553a in start_secondary (unused=0xc038692d) at arch/i386/kernel/smpboot.c:467

That's in here:

    if (cachep->flags & SLAB_RED_ZONE) {
        objp -= BYTES_PER_WORD;
        if (xchg((unsigned long *)objp, RED_MAGIC1) != RED_MAGIC2)
            /* Either write before start, or a double free. */
            BUG();

Presumably a double free in ahc_linux_free_target().
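
For anyone unfamiliar with how that check catches a double free, here
is a rough userspace sketch of the same idea: a guard word in front of
each object is swapped to a "free" magic on release, and the old value
must have been the "in use" magic. Everything in it is an assumption
made for illustration: the fixed pool, the demo_alloc()/demo_free()
names and the two magic values are invented and are not the slab
allocator's real code or constants.

```c
#include <assert.h>   /* only needed for checks such as those described below */
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary demo values, not the kernel's RED_MAGIC constants. */
#define DEMO_MAGIC_FREE   0xF4EEF4EEUL
#define DEMO_MAGIC_INUSE  0xA110CA7EUL

#define DEMO_SLOTS    4
#define DEMO_PAYLOAD  32

/* Each object carries a guard word immediately before its payload. */
struct demo_slot {
    unsigned long guard;
    unsigned char payload[DEMO_PAYLOAD];
};

static struct demo_slot pool[DEMO_SLOTS];
static bool pool_ready;

/* Hand out the first slot whose guard says "free" and mark it in use. */
void *demo_alloc(void)
{
    size_t i;

    if (!pool_ready) {
        for (i = 0; i < DEMO_SLOTS; i++)
            pool[i].guard = DEMO_MAGIC_FREE;
        pool_ready = true;
    }
    for (i = 0; i < DEMO_SLOTS; i++) {
        if (pool[i].guard == DEMO_MAGIC_FREE) {
            pool[i].guard = DEMO_MAGIC_INUSE;
            return pool[i].payload;
        }
    }
    return NULL; /* pool exhausted */
}

/* Release an object. Returns true if the guard held the "in use" magic,
 * false if it did not (write before the start, or a double free).
 * The exchange here is not atomic; this is a single-threaded demo. */
bool demo_free(void *p)
{
    struct demo_slot *slot = (struct demo_slot *)
        ((unsigned char *)p - offsetof(struct demo_slot, payload));
    unsigned long old = slot->guard;

    slot->guard = DEMO_MAGIC_FREE;
    return old == DEMO_MAGIC_INUSE;
}
```

The first demo_free() on a pointer returns true. A second demo_free()
on the same pointer finds the "free" magic already in place and returns
false, which is the condition that hits BUG() in the kernel code above.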

I can debug further if you like, but would really appreciate unified
diffs, thanks.