From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Lord
Subject: Re: sata_mv dropping disks
Date: Fri, 19 May 2006 17:06:26 -0400
Message-ID: <446E3352.20405@rtr.ca>
References: <20060518213131.GA10777@virasto.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from rtr.ca ([64.26.128.89]:3272 "EHLO mail.rtr.ca") by vger.kernel.org with ESMTP id S1751080AbWESVG2 (ORCPT ); Fri, 19 May 2006 17:06:28 -0400
In-Reply-To: <20060518213131.GA10777@virasto.com>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Onis
Cc: linux-ide@vger.kernel.org

Onis wrote:
> Hello
>
> Got warnings while rebuilding md raid5 array. Controller is 88SX5081 with
> 8xMaxtor 300GB 7V300F0. I've ran badblock -w on all disks, smartctl doesn't
> report errors.
>
> ----
> BUG: warning at drivers/scsi/sata_mv.c:1884/mv_channel_reset()
>
> Call Trace: {mv_channel_reset+238}
> {mv_stop_and_reset+55}
> {mv_interrupt+631}
> {handle_IRQ_event+44}
> {__do_IRQ+176}
...

I'm not sure what the complaint is about there.
I see this on line 1884:

    mdelay(1);

But maybe the 2.6.17-rc4-mm1 version is different from the
2.6.17-rc4-git2-libata1 that I have handy right now. (?)

> BUG: warning at drivers/scsi/sata_mv.c:1904/__msleep()

Similarly, on that line I see:

    mdelay(20);

Is there something different about mdelay() in -mm now?

..
> What does "PCI IRQ cause=0x28000020" mean?

"MWrPerr: SErr# asserted upon a PErr# response to write data
by the PCI master"

In other words, a PCI bus parity error was detected.
Noisy bus, or buggy hardware.

> ata4: translated ATA stat/err 0x50/01 to SCSI SK/ASC/ASCQ 0x3/13/00
> ata4: status=0x50 { DriveReady SeekComplete }
> ata4: error=0x01 { AddrMarkNotFound }

That is wrong (bug). I *think* this may be fixed by the sata_mv
patch series I just posted today. The response should be to reset
the bus (well, at least that's what it does now) and then retry
the operation, not fail it immediately.

..
> Also I'm getting a lots of these on all ports on boot. smartctl also triggers
> these:
> ----
> ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata3: status=0xd0 { Busy }
> ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata1: status=0xd0 { Busy }
> ...

That's due to a Marvell chip bug. A workaround for that got posted
in my patch series today.

Cheers
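
For anyone reading the stat/err bytes in the quoted logs: they are plain
bit fields from the ATA status and error registers. Below is a minimal
userspace sketch of how they decode, assuming the classic ATA bit
layout. The helper names (decode_status, decode_error, append_names) and
the tables are hypothetical, written for illustration only; this is not
the libata code, and labels that do not appear in the logs above are my
own wording.

```c
#include <stdio.h>
#include <string.h>

/* One named bit of an 8-bit ATA register (illustrative type). */
struct bit_name {
    unsigned char mask;
    const char *name;
};

/* Status register bits below BSY (0x80), classic ATA layout. */
static const struct bit_name status_bits[] = {
    { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },
    { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },
};

/* Error register bits, classic ATA layout. */
static const struct bit_name error_bits[] = {
    { 0x80, "BadCRC" },
    { 0x40, "UncorrectableError" },
    { 0x20, "MediaChanged" },
    { 0x10, "SectorIdNotFound" },
    { 0x08, "MediaChangeRequested" },
    { 0x04, "Aborted" },
    { 0x02, "TrackZeroNotFound" },
    { 0x01, "AddrMarkNotFound" },
};

/* Append the name of every set bit to out, separated by spaces. */
static void append_names(unsigned char value, const struct bit_name *table,
                         size_t count, char *out, size_t len)
{
    for (size_t i = 0; i < count; i++) {
        if (!(value & table[i].mask))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", len - strlen(out) - 1);
        strncat(out, table[i].name, len - strlen(out) - 1);
    }
}

/* Decode a status byte. While BSY is set the other bits are not
 * valid, so only "Busy" is reported in that case. */
void decode_status(unsigned char stat, char *out, size_t len)
{
    if (len == 0)
        return;
    out[0] = '\0';
    if (stat & 0x80) {
        snprintf(out, len, "Busy");
        return;
    }
    append_names(stat, status_bits,
                 sizeof status_bits / sizeof status_bits[0], out, len);
}

/* Decode an error byte into the names of its set bits. */
void decode_error(unsigned char err, char *out, size_t len)
{
    if (len == 0)
        return;
    out[0] = '\0';
    append_names(err, error_bits,
                 sizeof error_bits / sizeof error_bits[0], out, len);
}
```

With a 64-byte buffer, decode_status(0x50, ...) gives "DriveReady
SeekComplete", decode_error(0x01, ...) gives "AddrMarkNotFound", and
decode_status(0xd0, ...) gives "Busy", which lines up with the log lines
quoted above.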