Re: What's in linux-2.6-block.git for 2.6.24

public inbox for linux-scsi@vger.kernel.org

* Re: What's in linux-2.6-block.git for 2.6.24
       [not found] <20070921085711.GG2367@kernel.dk>
@ 2007-09-23 13:19 ` Torsten Kaiser
  2007-09-23 13:55   ` FUJITA Tomonori
  2007-09-23 14:11   ` Alan Cox
  0 siblings, 2 replies; 7+ messages in thread
From: Torsten Kaiser @ 2007-09-23 13:19 UTC
  To: Jens Axboe; +Cc: linux-kernel, akpm, linux-scsi, linux-ide

On 9/21/07, Jens Axboe <jens.axboe@oracle.com> wrote:
> SG chaining bits:
> - This is the bulk of the patchset. It consists of three major
>   components:
>
>         - sglist-core, which add helpers for iterating sg lists and
>           switches the block layer and SCSI to use those. Should not
>           have any functional changes.
>         - sglist-drivers, which converts drivers to use the sg list
>           helpers. Again, should not contain functional changes.
>         - sglist-arch, which adds support to most architectures and
>           actually enables sg chaining.

Adding linux-ide and linux-scsi as CC like Andrew did with my last report.

The trouble with my Silicon Image, Inc. SiI 3132 Serial ATA Raid II
Controller that I reported on 2.6.23-rc4-mm1 still occurs on the new
2.6.23-rc6-mm1.

I'm not 100% sure if this is caused by the sg chaining, but the patch
from http://lkml.org/lkml/2007/9/10/251, which touches that chaining,
makes a difference, so it might be related.

First report: http://lkml.org/lkml/2007/9/1/92
With patch it fails fewer times: http://lkml.org/lkml/2007/9/14/107

To update the statistics:
prior to 2.6.23-rc4-mm1: no trouble with any drives on the SiI 3132.
2.6.23-rc4-mm1 without patch: 2 out of 2 bad.
back to 2.6.23-rc3-mm1: 18x good.
2.6.23-rc4-mm1 with patch: 2 out of 8 bad
after that second mail:
2.6.23-rc4-mm1 with patch: 1 out of 5 bad
2.6.23-rc6-mm1: 1 out of 2 bad
switching back to 2.6.23-rc3-mm1 to rule out the hardware:
2.6.23-rc3-mm1: 6x good

The error messages from the failed 2.6.23-rc6-mm1:
Sep 18 18:50:01 treogen [   33.340000] md1: bitmap initialized from disk: read 10/10 pages, set 0 bits
Sep 18 18:50:01 treogen [   33.340000] created bitmap (145 pages) for device md1
Sep 18 18:50:01 treogen [   63.440000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 18 18:50:01 treogen [   63.440000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 18 18:50:01 treogen [   63.440000]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 18 18:50:01 treogen [   63.440000] ata1.00: status: {DRDY }
Sep 18 18:50:01 treogen [   63.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [   65.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [   65.740000] ata1: reset failed (errno=-5), retrying in 8 secs
Sep 18 18:50:01 treogen [   73.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [   75.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [   75.740000] ata1: reset failed (errno=-5), retrying in 8 secs
Sep 18 18:50:01 treogen [   83.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [   85.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [   85.740000] ata1: reset failed (errno=-5), retrying in 33 secs
Sep 18 18:50:01 treogen [  118.440000] ata1: limiting SATA link speed to 1.5 Gbps
Sep 18 18:50:01 treogen [  118.440000] ata1: hard resetting link
Sep 18 18:50:01 treogen [  120.740000] ata1: softreset failed (port not ready)
Sep 18 18:50:01 treogen [  120.740000] ata1: reset failed, giving up
Sep 18 18:50:01 treogen [  120.740000] ata1.00: disabled
Sep 18 18:50:01 treogen [  120.740000] ata1: EH complete
Sep 18 18:50:01 treogen [  120.740000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 18:50:01 treogen [  120.740000] end_request: I/O error, dev sda, sector 625137161
Sep 18 18:50:01 treogen [  120.740000] md: super_written gets error=-5, uptodate=0
Sep 18 18:50:01 treogen [  120.740000] raid5: Disk failure on sda2, disabling device. Operation continuing on 2 devices

After that, many more errors like this followed, differing only in the sector number:
Sep 18 18:50:01 treogen [  120.810000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 18:50:01 treogen [  120.810000] end_request: I/O error, dev sda, sector 19550919

Do you need any more information?

Torsten


* Re: What's in linux-2.6-block.git for 2.6.24
  2007-09-23 13:19 ` What's in linux-2.6-block.git for 2.6.24 Torsten Kaiser
@ 2007-09-23 13:55   ` FUJITA Tomonori
  2007-09-23 15:31     ` Torsten Kaiser
  2007-09-23 14:11   ` Alan Cox
  1 sibling, 1 reply; 7+ messages in thread
From: FUJITA Tomonori @ 2007-09-23 13:55 UTC
  To: just.for.lkml
  Cc: jens.axboe, linux-kernel, akpm, linux-scsi, linux-ide,
	fujita.tomonori

On Sun, 23 Sep 2007 15:19:13 +0200
"Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:

> On 9/21/07, Jens Axboe <jens.axboe@oracle.com> wrote:
> > SG chaining bits:
> > - This is the bulk of the patchset. It consists of three major
> >   components:
> >
> >         - sglist-core, which add helpers for iterating sg lists and
> >           switches the block layer and SCSI to use those. Should not
> >           have any functional changes.
> >         - sglist-drivers, which converts drivers to use the sg list
> >           helpers. Again, should not contain functional changes.
> >         - sglist-arch, which adds support to most architectures and
> >           actually enables sg chaining.
> 
> Adding linux-ide and linux-scsi as CC like Andrew did with my last report.
> 
> I still have trouble with my Silicon Image, Inc. SiI 3132 Serial ATA
> Raid II Controller as reported on 2.6.23-rc4-mm1 on the new
> 2.6.23-rc6-mm1.
> 
> I'm not 100% sure if this caused by the sg chaining, but the patch
> from http://lkml.org/lkml/2007/9/10/251 which touches that chaining
> makes a difference, so it might be related.
> 
> First report: http://lkml.org/lkml/2007/9/1/92
> With patch it fails fewer times: http://lkml.org/lkml/2007/9/14/107
> 
> To update the statistik:
> prior to 2.6.23-rc4-mm1: no trouble with any drives on the SiI 3132.
> 2.6.23-rc4-mm1 without patch: 2 out of 2 bad.
> back to 2.6.23-rc3-mm1: 18x good.
> 2.6.23-rc4-mm1 with patch:  2 out of 8 bad
> after that second mail:
> 2.6.23-rc4-mm1 with patch: 1 out of 5 bad
> 2.6.23-rc6-mm1: 1 out of 2 bad

git-block.patch in 2.6.23-rc6-mm1 includes my patch that disables sg
chaining for libata, but it still includes libata's sg chaining
changes. So either these changes break libata, or libata was broken by
something else after 2.6.23-rc3-mm1.

Can you try Jens's sglist-arch branch? If it works, libata in -mm
probably has a bug.

For your convenience, I've put up a patch of the sglist-arch branch against v2.6.23-rc7:

http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2



* Re: What's in linux-2.6-block.git for 2.6.24
  2007-09-23 13:55   ` FUJITA Tomonori
@ 2007-09-23 15:31     ` Torsten Kaiser
  2007-09-24 18:48       ` Torsten Kaiser
  0 siblings, 1 reply; 7+ messages in thread
From: Torsten Kaiser @ 2007-09-23 15:31 UTC
  To: FUJITA Tomonori
  Cc: jens.axboe, linux-kernel, akpm, linux-scsi, linux-ide,
	fujita.tomonori

On 9/23/07, FUJITA Tomonori <tomof@acm.org> wrote:
> On Sun, 23 Sep 2007 15:19:13 +0200
> "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:
> > To update the statistik:
> > prior to 2.6.23-rc4-mm1: no trouble with any drives on the SiI 3132.
> > 2.6.23-rc4-mm1 without patch: 2 out of 2 bad.
> > back to 2.6.23-rc3-mm1: 18x good.
> > 2.6.23-rc4-mm1 with patch:  2 out of 8 bad
> > after that second mail:
> > 2.6.23-rc4-mm1 with patch: 1 out of 5 bad
> > 2.6.23-rc6-mm1: 1 out of 2 bad
>
> git-block.patch in 2.6.23-rc6-mm1 includes my patch that disables sg
> chaining for libata but it still includes libata's sg chaining
> changes. So these changes breaks libata or libata was broken after
> 2.6.23-rc3-mm1.
>
> Can you try Jens's sglist-arch branch? If it works, probably libata in
> -mm has bugs.
>
> For your convenience, I put a sglist-arch branch patch against v2.6.23-rc7:
>
> http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2

Thanks for the patch.
I tried it and 3 out of 3 boot attempts worked without problems.
But I can't rule out that the bug is still there, as I have no way to
trigger it on demand.

Torsten


* Re: What's in linux-2.6-block.git for 2.6.24
  2007-09-23 15:31     ` Torsten Kaiser
@ 2007-09-24 18:48       ` Torsten Kaiser
  2007-09-25  5:52         ` Torsten Kaiser
  0 siblings, 1 reply; 7+ messages in thread
From: Torsten Kaiser @ 2007-09-24 18:48 UTC
  To: FUJITA Tomonori
  Cc: jens.axboe, linux-kernel, akpm, linux-scsi, linux-ide,
	fujita.tomonori

On 9/23/07, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> On 9/23/07, FUJITA Tomonori <tomof@acm.org> wrote:
> > Can you try Jens's sglist-arch branch? If it works, probably libata in
> > -mm has bugs.
> >
> > For your convenience, I put a sglist-arch branch patch against v2.6.23-rc7:
> >
> > http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
>
> Thanks for the patch.
> I tried it and 3 out of 3 boot attempts worked without problems.
> But I can't rule out that the bug is still there, as I have no way to
> trigger it on demand.

Short update:
Two more boots with that kernel also worked.
I have just installed 2.6.23-rc7-mm1 and booted it three times,
also without problems.

I will keep using 2.6.23-rc7-mm1 and post again if the error shows up.

Torsten


* Re: What's in linux-2.6-block.git for 2.6.24
  2007-09-24 18:48       ` Torsten Kaiser
@ 2007-09-25  5:52         ` Torsten Kaiser
  0 siblings, 0 replies; 7+ messages in thread
From: Torsten Kaiser @ 2007-09-25  5:52 UTC
  To: FUJITA Tomonori
  Cc: jens.axboe, linux-kernel, akpm, linux-scsi, linux-ide,
	fujita.tomonori

On 9/24/07, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> I will keep on using 2.6.23-rc7-mm1 and post again, if the error shows up again.

On the next boot it did show up again, so 2.6.23-rc7-mm1 still has the bug.

[   33.810000] md1: bitmap initialized from disk: read 10/10 pages, set 0 bits
[   33.810000] created bitmap (145 pages) for device md1
[   63.910000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[   63.910000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
[   63.910000]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   63.910000] ata1.00: status: {DRDY }
[   63.910000] ata1: hard resetting link
[   66.210000] ata1: softreset failed (port not ready)
[   66.210000] ata1: reset failed (errno=-5), retrying in 8 secs
[   73.910000] ata1: hard resetting link
[   76.210000] ata1: softreset failed (port not ready)
[   76.210000] ata1: reset failed (errno=-5), retrying in 8 secs
[   83.910000] ata1: hard resetting link
[   86.210000] ata1: softreset failed (port not ready)
[   86.210000] ata1: reset failed (errno=-5), retrying in 33 secs
[  118.910000] ata1: limiting SATA link speed to 1.5 Gbps
[  118.910000] ata1: hard resetting link
[  121.210000] ata1: softreset failed (port not ready)
[  121.210000] ata1: reset failed, giving up
[  121.210000] ata1.00: disabled
[  121.210000] ata1: EH complete
[  121.210000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[  121.210000] end_request: I/O error, dev sda, sector 625137161
[  121.210000] md: super_written gets error=-5, uptodate=0
[  121.210000] raid5: Disk failure on sda2, disabling device. Operation continuing on 2 devices

After that, there are many more errors like these in the log:
[  135.760000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[  135.760000] end_request: I/O error, dev sda, sector 19551113
[  135.760000] Buffer I/O error on device sda2, logical block 1
or:
[  135.760000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[  135.760000] end_request: I/O error, dev sda, sector 19551105

Torsten


* Re: What's in linux-2.6-block.git for 2.6.24
  2007-09-23 13:19 ` What's in linux-2.6-block.git for 2.6.24 Torsten Kaiser
  2007-09-23 13:55   ` FUJITA Tomonori
@ 2007-09-23 14:11   ` Alan Cox
  2007-09-23 15:40     ` Torsten Kaiser
  1 sibling, 1 reply; 7+ messages in thread
From: Alan Cox @ 2007-09-23 14:11 UTC
  To: Torsten Kaiser; +Cc: Jens Axboe, linux-kernel, akpm, linux-scsi, linux-ide

> Sep 18 18:50:01 treogen [   63.440000] ata1.00: status: {DRDY }
> Sep 18 18:50:01 treogen [   63.440000] ata1: hard resetting link

It timed out waiting for a data transfer that never completed. That
does sound like the device was told the wrong transfer size.


It then falls off the bus because Jeff hasn't merged Mark Lord's DRQ
draining patch.

Alan


* Re: What's in linux-2.6-block.git for 2.6.24
  2007-09-23 14:11   ` Alan Cox
@ 2007-09-23 15:40     ` Torsten Kaiser
  0 siblings, 0 replies; 7+ messages in thread
From: Torsten Kaiser @ 2007-09-23 15:40 UTC
  To: Alan Cox; +Cc: Jens Axboe, linux-kernel, akpm, linux-scsi, linux-ide

On 9/23/07, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Sep 18 18:50:01 treogen [   63.440000] ata1.00: status: {DRDY }
> > Sep 18 18:50:01 treogen [   63.440000] ata1: hard resetting link
>
> Timed out waiting for data transfers to complete that didn't. Does sound
> like the device got told the wrong sized transfer.
>
>
> It then falls off the bus because Jeff hasn't merged Mark Lord's DRQ
> draining patch.

One time the error was different:
Sep 11 19:19:24 treogen [   33.340000] ata1.00: exception Emask 0x20 SAct 0x1 SErr 0x0 action 0x2
Sep 11 19:19:24 treogen [   33.340000] ata1.00: irq_stat 0x00020002, PCI master abort while fetching SGT
Sep 11 19:19:24 treogen [   33.340000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 11 19:19:24 treogen [   33.340000]          res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask 0x20 (host bus error)
Sep 11 19:19:24 treogen [   33.340000] ata1.00: status: {DRDY }
Sep 11 19:19:24 treogen [   33.670000] ata1: soft resetting link
Sep 11 19:19:24 treogen [   33.710000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 11 19:19:24 treogen [   33.800000] ata1.00: configured for UDMA/100
Sep 11 19:19:24 treogen [   33.800000] ata1: EH complete

This was repeated 12 times.
(A diff between a good boot and one with that error is here:
http://lkml.org/lkml/2007/9/14/107 )

Torsten

