sata_mv dropping disks


* sata_mv dropping disks
@ 2006-05-18 21:31 Onis
  2006-05-19 21:06 ` Mark Lord
  0 siblings, 1 reply; 3+ messages in thread
From: Onis @ 2006-05-18 21:31 UTC (permalink / raw)
  To: linux-ide

Hello

I got warnings while rebuilding an md raid5 array. The controller is an
88SX5081 with 8 x Maxtor 300GB 7V300F0 drives. I've run badblocks -w on all
disks, and smartctl doesn't report errors.

----
BUG: warning at drivers/scsi/sata_mv.c:1884/mv_channel_reset()

Call Trace: <IRQ> <ffffffff803a39ce>{mv_channel_reset+238}
       <ffffffff803a4277>{mv_stop_and_reset+55}
<ffffffff803a45f7>{mv_interrupt+631}
       <ffffffff8024e9fc>{handle_IRQ_event+44}
<ffffffff8024eae0>{__do_IRQ+176}
       <ffffffff8020c5d2>{do_IRQ+66} <ffffffff80209c88>{ret_from_intr+0} <EOI>
              <ffffffff80322773>{get_request_wait+35}
<ffffffff803b78bf>{xor_sse_5+191}
       <ffffffff803b09fd>{compute_block+221}
<ffffffff80323b5f>{generic_make_request+495}
       <ffffffff803b3520>{handle_stripe+7840} <ffffffff803b487d>{raid5d+349}
              <ffffffff80241268>{prepare_to_wait+24}
<ffffffff80240dd0>{keventd_create_kthread+0}
       <ffffffff803bc2ac>{md_thread+300}
<ffffffff802413e0>{autoremove_wake_function+0}
       <ffffffff802413e0>{autoremove_wake_function+0}
<ffffffff803bc180>{md_thread+0}
       <ffffffff80240d89>{kthread+217} <ffffffff8020a5da>{child_rip+8}
              <ffffffff80240dd0>{keventd_create_kthread+0}
<ffffffff80240cb0>{kthread+0}
       <ffffffff8020a5d2>{child_rip+0}
BUG: warning at drivers/scsi/sata_mv.c:1904/__msleep()

Call Trace: <IRQ> <ffffffff803a3f21>{__mv_phy_reset+241}
              <ffffffff803a39da>{mv_channel_reset+250}
<ffffffff803a45f7>{mv_interrupt+631}
       <ffffffff8024e9fc>{handle_IRQ_event+44}
<ffffffff8024eae0>{__do_IRQ+176}
       <ffffffff8020c5d2>{do_IRQ+66} <ffffffff80209c88>{ret_from_intr+0} <EOI>
              <ffffffff80322773>{get_request_wait+35}
<ffffffff803b78bf>{xor_sse_5+191}
       <ffffffff803b09fd>{compute_block+221}
<ffffffff80323b5f>{generic_make_request+495}
       <ffffffff803b3520>{handle_stripe+7840} <ffffffff803b487d>{raid5d+349}
              <ffffffff80241268>{prepare_to_wait+24}
<ffffffff80240dd0>{keventd_create_kthread+0}
       <ffffffff803bc2ac>{md_thread+300}
<ffffffff802413e0>{autoremove_wake_function+0}
       <ffffffff802413e0>{autoremove_wake_function+0}
<ffffffff803bc180>{md_thread+0}
       <ffffffff80240d89>{kthread+217} <ffffffff8020a5da>{child_rip+8}
              <ffffffff80240dd0>{keventd_create_kthread+0}
<ffffffff80240cb0>{kthread+0}
       <ffffffff8020a5d2>{child_rip+0}
ata4: translated ATA stat/err 0x50/01 to SCSI SK/ASC/ASCQ 0x3/13/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: error=0x01 { AddrMarkNotFound }
sata_mv: PCI ERROR; PCI IRQ cause=0x28000020


What does "PCI IRQ cause=0x28000020" mean?

A few minutes after that, the rebuild stopped:
----
sd 6:0:0:0: SCSI error: return code = 0x40000
end_request: I/O error, dev sdg, sector 403739536
sd 6:0:0:0: SCSI error: return code = 0x40000
end_request: I/O error, dev sdg, sector 403739544
sd 6:0:0:0: SCSI error: return code = 0x40000
end_request: I/O error, dev sdg, sector 403739552
md: md0: sync done.
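
For readers decoding the "return code = 0x40000" lines above by hand, here is a minimal sketch. It assumes the usual Linux SCSI result layout (status in the low byte, then message, host, and driver bytes) and the host byte names from the kernel's SCSI headers. The function name and the table are illustrative and are not taken from this thread.

```python
# Sketch only: split a Linux SCSI result word into its four bytes.
# Assumed layout: driver << 24 | host << 16 | msg << 8 | status.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def split_scsi_result(result):
    """Return the status, msg, host and driver bytes of a SCSI result."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": host,
        "driver": (result >> 24) & 0xFF,
        # Unknown host bytes are reported in hex rather than guessed.
        "host_name": HOST_BYTE_NAMES.get(host, hex(host)),
    }


if __name__ == "__main__":
    print(split_scsi_result(0x40000))
```

Under these assumptions, 0x40000 carries only a host byte of 0x04 (DID_BAD_TARGET), which suggests the target stopped responding rather than the drive returning a sense code. That would be consistent with the failed hdparm identify shown in this message.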

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sda[0] sdg[8](F) sdh[7] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      2051400960 blocks level 5, 128k chunk, algorithm 2 [8/7] [UUUUUU_U]
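
The "[8/7] [UUUUUU_U]" field above can be checked mechanically. The sketch below is an illustration only: the function name and regular expression are assumptions, and it handles just the status-line shape shown in this message.

```python
import re

# Sketch only: pull "[total/active] [U_U...]" out of one /proc/mdstat line.
MD_STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_md_status(line):
    """Return (total, active, missing_slots), or None if no status field."""
    match = MD_STATUS_RE.search(line)
    if match is None:
        return None
    total = int(match.group(1))
    active = int(match.group(2))
    # "_" marks a slot with no working member; slots are counted from 0.
    missing = [i for i, flag in enumerate(match.group(3)) if flag == "_"]
    return total, active, missing


if __name__ == "__main__":
    line = ("      2051400960 blocks level 5, 128k chunk, "
            "algorithm 2 [8/7] [UUUUUU_U]")
    print(parse_md_status(line))
```

For the line above this reports 8 configured members, 7 active, and slot 6 missing, which matches sdg being marked (F).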

# hdparm -I /dev/sdg

/dev/sdg:
 HDIO_DRIVE_CMD(identify) failed: Input/output error


I'm also getting a lot of these on all ports at boot. smartctl also triggers
them:
----
ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xd0 { Busy }
ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xd0 { Busy }
...

System
------
* Tyan Thunder S2882 Dual Opteron 240
* Marvell Technology Group Ltd. MV88SX5081 8-port SATA I PCI-X Controller
* 8 x Maxtor Maxline III 300GB SATA2
* Debian Sarge AMD64
* kernel 2.6.17-rc4-mm1

- Onis


* Re: sata_mv dropping disks
  2006-05-18 21:31 sata_mv dropping disks Onis
@ 2006-05-19 21:06 ` Mark Lord
  2006-05-19 22:25   ` Onis
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Lord @ 2006-05-19 21:06 UTC (permalink / raw)
  To: Onis; +Cc: linux-ide

Onis wrote:
> Hello
> 
> Got warnings while rebuilding md raid5 array. Controller is 88SX5081 with
> 8xMaxtor 300GB 7V300F0. I've ran badblock -w on all disks, smartctl doesn't
> report errors.
> 
> ----
> BUG: warning at drivers/scsi/sata_mv.c:1884/mv_channel_reset()
> 
> Call Trace: <IRQ> <ffffffff803a39ce>{mv_channel_reset+238}
>        <ffffffff803a4277>{mv_stop_and_reset+55}
> <ffffffff803a45f7>{mv_interrupt+631}
>        <ffffffff8024e9fc>{handle_IRQ_event+44}
> <ffffffff8024eae0>{__do_IRQ+176}
...

I'm not sure what the complaint is about there.
I see this on line 1884:  mdelay(1);
But maybe the 2.6.17-rc4-mm1 version is different from
the 2.6.17-rc4-git2-libata1 that I have handy right now. (?)

>        BUG: warning at drivers/scsi/sata_mv.c:1904/__msleep()

Similarly, on that line I see:  mdelay(20);
Is there something different about mdelay() in -mm now?

..
> What does "PCI IRQ cause=0x28000020" mean?

"MWrPerr: SErr# asserted upon a PErr# response to write data by the PCI master"

In other words, a PCI bus parity error was detected.
Noisy bus, or buggy hardware.
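
As a small aid for reading a cause value like this one, the sketch below lists which bit positions are set. It deliberately does not name the bits: the mapping from positions to names such as MWrPerr comes from the Marvell documentation and is not reproduced here. The helper name is hypothetical.

```python
def set_bits(value):
    """Return the positions of all set bits in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    cause = 0x28000020
    print(f"0x{cause:08x} has bits set at positions {set_bits(cause)}")
```

For 0x28000020 this gives positions 5, 27 and 29, so the register is reporting three conditions at once, not a single flag.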

>        ata4: translated ATA stat/err 0x50/01 to SCSI SK/ASC/ASCQ 0x3/13/00
>        ata4: status=0x50 { DriveReady SeekComplete }
>        ata4: error=0x01 { AddrMarkNotFound }

That is wrong (bug).  I *think* this may be fixed by the sata_mv
patch series I just posted today.  The response should be to reset
the bus (well, at least that's what it does now) and then retry
the operation, not fail it immediately.
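
For reference, the status values quoted in this thread (0x50 and 0xd0) can be decoded with the standard ATA status register bit layout. This is a minimal sketch; the table uses the conventional ATA bit abbreviations, not the labels the kernel prints, and the function name is an assumption.

```python
# Sketch only: standard ATA status register bits, highest bit first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # busy
    (0x40, "DRDY"),  # drive ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0x50, 0xD0):
        print(f"0x{value:02x}: {decode_ata_status(value)}")
```

Note that 0x50 has DRDY and DSC set but ERR clear, even though an error register value (0x01) was reported alongside it. In 0xd0 the BSY bit is set; the other status bits are not meaningful while BSY is set, which is why the log prints only "Busy".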

..
> Also I'm getting a lots of these on all ports on boot. smartctl also triggers
> these:
> ----
> ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata3: status=0xd0 { Busy }
> ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata1: status=0xd0 { Busy }
> ...

That's due to a Marvell chip bug.  A workaround for that got posted in
my patch series today.

Cheers


* Re: sata_mv dropping disks
  2006-05-19 21:06 ` Mark Lord
@ 2006-05-19 22:25   ` Onis
  0 siblings, 0 replies; 3+ messages in thread
From: Onis @ 2006-05-19 22:25 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-ide

Mark Lord wrote:

> >BUG: warning at drivers/scsi/sata_mv.c:1884/mv_channel_reset()
> ...
> >What does "PCI IRQ cause=0x28000020" mean?
> 
> "MWrPerr: SErr# asserted upon a PErr# response to write data by the PCI 
> master"
> 
> In other words, a PCI bus parity error was detected.
> Noisy bus, or buggy hardware.

Yes, that was fixed by lowering the bus speed from 133MHz to 66MHz; ignore it. My bad.

> >       ata4: translated ATA stat/err 0x50/01 to SCSI SK/ASC/ASCQ 0x3/13/00
> >       ata4: status=0x50 { DriveReady SeekComplete }
> >       ata4: error=0x01 { AddrMarkNotFound }
> 
> That is wrong (bug).  I *think* this may be fixed by the sata_mv
> patch series I just posted today.  The response should be to reset
> the bus (well, at least that's what it does now) and then retry
> the operation, not fail it immediately.

I think this was also related to the bus speed. I haven't seen this error before.

> >Also I'm getting a lots of these on all ports on boot. smartctl also 
> >triggers
> >these:
> >----
> >ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> >ata3: status=0xd0 { Busy }
> >ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> >ata1: status=0xd0 { Busy }
> >...
> 
> That's due to a Marvell chip bug.  A workaround for that got posted in
> my patch series today.

Thanks a lot for the patch, Mark! I grabbed it immediately and applied it
against 2.6.17-rc4. Is that okay?

I'm now running the rebuild with a 6081 controller. Everything seems great: no
ata busy warnings or anything.

> Cheers

Cheers!
