linux-ide.vger.kernel.org archive mirror
* Libata VIA woes continue. Worked around
@ 2004-08-27 13:58 Brad Campbell
  2004-08-29  7:32 ` Libata VIA woes continue. Worked around - *wrong* Brad Campbell
  0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-27 13:58 UTC (permalink / raw)
  To: linux-ide

Ok, so after a couple of reboots with max_sectors set to 200, the problem recurs.

It must be something to do with programming the controller or timing or some other issue.

I have worked around it by putting my 2 raid-0 drives on my spare Promise ports, and at UDMA/100 with 
transfers of 2048 sectors they behave fine no matter what I throw at them.

It seems with the VIA interface it either works first time for a big transfer or it does not, 
depending on the cold boot.

If I power cycle the machine 5 times it might work perfectly 3 out of 5. If it works ok for the 
first couple of transfers it will work ok for the entire uptime of the machine; if not, it locks 
the interface up.

I'm going to sit on it for a few days. It's no longer an issue for me and perhaps I'll stop chasing 
it until someone else reports a similar issue with a VIA controller.

I'll see if I can jam my 2 new Maxtors in the machine on these VIA ports and give them a good 
workout. I only rebooted the machine 3 times with them connected and I might have been lucky with 3 
working reboots. (There are only so many times you want to power cycle a box that has 15 hard disks 
in it!)

Regards,
Brad (Dazed, confused and a little narked!)


* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-27 13:58 Libata VIA woes continue. Worked around Brad Campbell
@ 2004-08-29  7:32 ` Brad Campbell
  2004-08-29  7:57   ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-29  7:32 UTC (permalink / raw)
  To: linux-ide

Brad Campbell wrote:
> Ok, so after a couple of reboots with max_sectors set to 200, the problem 
> recurs.
> 
> It must be something to do with programming the controller or timing or 
> some other issue.
> 
> I have worked around it by putting my 2 raid-0 drives on my spare 
> Promise ports, and at UDMA/100 with transfers of 2048 sectors they behave 
> fine no matter what I throw at them.

Scratch that. After a couple of days of intensive testing/rebooting and abuse, they play up on the
Promise controller in exactly the same failure mode. Just far harder to trigger.

I have removed these bridge boards from my system now and thus the problem no longer exists. I'm a
little concerned that this might show itself for other people in the future, but then I guess most
sane people buy SATA hard disks rather than re-use old ATA drives with bridge boards.
Cross that bridge if we come to it I guess.

Regards,
Brad (again and again and again)



* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-29  7:32 ` Libata VIA woes continue. Worked around - *wrong* Brad Campbell
@ 2004-08-29  7:57   ` Jeff Garzik
  2004-08-29  8:17     ` Brad Campbell
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-08-29  7:57 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-ide

Brad Campbell wrote:
> Brad Campbell wrote:
> 
>> Ok, so after a couple of reboots with max_sectors set to 200, the 
>> problem recurs.
>>
>> It must be something to do with programming the controller or timing 
>> or some other issue.
>>
>> I have worked around it by putting my 2 raid-0 drives on my spare 
>> Promise ports, and at UDMA/100 with transfers of 2048 sectors they 
>> behave fine no matter what I throw at them.
> 
> 
> Scratch that. After a couple of days of intensive testing/rebooting and 
> abuse, they play up on the
> Promise controller in exactly the same failure mode. Just far harder to 
> trigger.
> 
> I have removed these bridge boards from my system now and thus the 
> problem no longer exists. I'm a
> little concerned that this might show itself for other people in the 
> future, but then I guess most
> sane people buy SATA hard disks rather than re-use old ATA drives with 
> bridge boards.

Well, there are some cases on a few controllers (SiI is one that comes 
to mind) where -- IIRC -- bridges dictate the max is UDMA/100, not 
UDMA/133, even if the underlying device is UDMA/133.

In sata_promise.c or sata_via.c, what happens if you change udma_mask 
from 0x7f to 0x3f?  Do the failures go away?
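
For anyone following along in the source: the UDMA capability those drivers
advertise is a bitmap in the port info, bit N meaning "UDMA mode N
supported".  A minimal sketch of the change being suggested (the variable
and surrounding field names here are from memory and may not match the
2.6.9-rc1 files exactly; udma_mask is the knob in question):

static struct ata_port_info svia_port_info = {
	/* ... other fields unchanged ... */
	.pio_mask	= 0x1f,		/* PIO modes 0-4 */
	.mwdma_mask	= 0x07,		/* MWDMA modes 0-2 */
	/*
	 * bit N = UDMA mode N:
	 *   0x7f = udma0-6, i.e. up to UDMA/133 (the current value)
	 *   0x3f = udma0-5, i.e. capped at UDMA/100
	 *   0x1f = udma0-4, i.e. capped at UDMA/66
	 */
	.udma_mask	= 0x3f,		/* was 0x7f */
};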


> Cross that bridge if we come to it I guess.

Guffaw ;-)

	Jeff




* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-29  7:57   ` Jeff Garzik
@ 2004-08-29  8:17     ` Brad Campbell
  2004-08-29  9:04       ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-29  8:17 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-ide

Jeff Garzik wrote:

> 
> Well, there are some cases on a few controllers (SiI is one that comes 
> to mind) where -- IIRC -- bridges dictate the max is UDMA/100, not 
> UDMA/133, even if the underlying device is UDMA/133.
> 
> In sata_promise.c or sata_via.c, what happens if you change udma_mask 
> from 0x7f to 0x3f?  Do the failures go away?

These drives are UDMA/100. On the VIA controller I changed the udma_mask to 0x1f and the failures 
"appeared" to go away but that was before I realised the exact nature of the failure mode. (That 
being it will either fail on bootup, or very soon after or it will work perfectly until the next boot)

I can always hook the drives up and hammer them if you'd like me to do further testing, but I'm not 
sure how we can then let libata know that the connected drives need to be slowed down, as we can't 
really identify that we have a bridge connected.

I'm still not convinced that it's not something else.
Sure transfers > 200 sectors killed it on the VIA controller at UDMA/100 while they appeared to work 
ok at UDMA/66. I guess I need to run a defined array of tests.

- Large transfers (> 200) at UDMA/100 and UDMA/66
- Small transfers (<=200) at UDMA/100 and UDMA/66
- Something like 10 reboot cycles of each.

It's very hard to hit on the Promise controller (Perhaps < 10% of reboots) while on the VIA 
controller it happens maybe 60% of the time.

And of course 2.6.5 never hits it at all. (And given that I patched the VIA driver in 2.6.9-rc1 to keep 
transfers < 200 sectors and still hit the bug, it's not that!)
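
For reference, the 200-sector figure is the old pre-lba48 per-command ceiling
in libata (ATA_MAX_SECTORS), if memory serves.  The actual patch isn't in this
thread, but a rough sketch of that kind of clamp, assuming the driver's
scsi_host_template is where the limit ends up applied (the lba48 code may
raise the block-layer limit again elsewhere), would be:

static struct scsi_host_template svia_sht = {
	/* ... other fields unchanged ... */
	.max_sectors	= 200,	/* sketch only: clamp transfers to 200 sectors */
};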

>> Cross that bridge if we come to it I guess.
> 
> 
> Guffaw ;-)

Oh bugger.. I had not even realised what I said! :p) (Slow news day obviously)

Regards,
Brad


* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-29  8:17     ` Brad Campbell
@ 2004-08-29  9:04       ` Jeff Garzik
  2004-08-29  9:24         ` Brad Campbell
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-08-29  9:04 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-ide

Brad Campbell wrote:
> Jeff Garzik wrote:
> 
>>
>> Well, there are some cases on a few controllers (SiI is one that comes 
>> to mind) where -- IIRC -- bridges dictate the max is UDMA/100, not 
>> UDMA/133, even if the underlying device is UDMA/133.
>>
>> In sata_promise.c or sata_via.c, what happens if you change udma_mask 
>> from 0x7f to 0x3f?  Do the failures go away?
> 
> 
> These drives are UDMA/100. On the VIA controller I changed the udma_mask 
> to 0x1f and the failures "appeared" to go away but that was before I 
> realised the exact nature of the failure mode. (That being it will 
> either fail on bootup, or very soon after or it will work perfectly 
> until the next boot)
> 
> I can always hook the drives up and hammer them if you'd like me to do 
> further testing, but I'm not sure how we can then let libata know that 
> the connected drives need to be slowed down, as we can't really identify 
> that we have a bridge connected.
> 
> I'm still not convinced that it's not something else.
> Sure transfers > 200 sectors killed it on the VIA controller at UDMA/100 
> while they appeared to work ok at UDMA/66. I guess I need to run a 
> defined array of tests.
> 
> - Large transfers (> 200) at UDMA/100 and UDMA/66
> - Small transfers (<=200) at UDMA/100 and UDMA/66
> - Something like 10 reboot cycles of each.
> 
> It's very hard to hit on the Promise controller (Perhaps < 10% of 
> reboots) while on the VIA controller it happens maybe 60% of the time.
> 
> And of course 2.6.5 never hits it at all. (And given that I patched the VIA 
> driver in 2.6.9-rc1 to keep transfers < 200 sectors and still hit the 
> bug, it's not that!)

Well, if you are completely unable to reproduce it in 2.6.5, there are a 
couple of things to try:

* copy drivers/scsi/libata*, drivers/scsi/sata_*, 
drivers/scsi/ata_piix.c, include/linux/libata.h, include/linux/ata.h 
from 2.6.9-rc1-bk into 2.6.5, and see if you can reproduce the failure. 
(I can help if there are any compile/API problems you can't figure out.)  
That will eliminate non-libata changes at least.

* look at the changes from 2.6.5 -> 2.6.6 and see which change breaks 
things.  You can get a list of each change like this:

	bk changes -rv2.6.5..v2.6.6

then you can revert each patch in order, or bsearch.  Here's an example 
of reverting each libata patch in order:

bk clone http://linux.bkbits.net/linux-2.5 vanilla-2.6
bk clone -ql -rv2.6.6 vanilla-2.6 brad-test-2.6.6
cd brad-test-2.6.6
bk -r co -Sq
bk changes -rv2.6.5.. > /tmp/changes-list.txt
less /tmp/changes-list.txt	# scan for a libata-related change
bk cset -x1.1587.39.2		# applies reverse of cset 1.1587.39.2
make				# create test
				# ... test fails
bk cset -x1.1587.39.1		# applies reverse of cset 1.1587.39.1
				# _on top of_ previous reverted patch


* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-29  9:04       ` Jeff Garzik
@ 2004-08-29  9:24         ` Brad Campbell
  2004-08-29  9:38           ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-29  9:24 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-ide, linux-kernel

Jeff Garzik wrote:

> * look at the changes from 2.6.5 -> 2.6.6 and see which change breaks 
> things.  You can get a list of each change like this:
> 
>     bk changes -rv2.6.5..v2.6.6
> 
> then you can revert each patch in order, or bsearch.  Here's an example 
> of reverting each libata patch in order:
> 
> bk clone http://linux.bkbits.net/linux-2.5 vanilla-2.6
> bk clone -ql -rv2.6.6 vanilla-2.6 brad-test-2.6.6
> cd brad-test-2.6.6
> bk -r co -Sq
> bk changes -rv2.6.5.. > /tmp/changes-list.txt
> less /tmp/changes-list.txt    # scan for a libata-related change
> bk cset -x1.1587.39.2        # applies reverse of cset 1.1587.39.2
> make                # create test
>                 # ... test fails
> bk cset -x1.1587.39.1        # applies reverse of cset 1.1587.39.1
>                 # _on top of_ previous reverted patch
> -

Ooooohh. I have been looking for a "Dummies guide to regression testing with BK" and not been able 
to find one. I have cc'd this to linux-kernel purely for the purpose of more googleable archives for 
future reference for BK newbies like me.

Cheers Jeff!

I'll start hammering on this tonight.

(It's actually between 2.6.6 and 2.6.7-rc1 that the breakage occurs. I had just been running 2.6.5 
until I recently got a dodgy hard disk which showed up flaws in the libata error handling, so I 
tried to move to 2.6.8.1 to debug that and found it broke some of my drives in other ways. I have 
already cloned the relevant trees; I just could not figure out how to break it down to cset granularity.)

Regards,
Brad


* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-29  9:24         ` Brad Campbell
@ 2004-08-29  9:38           ` Jeff Garzik
  2004-08-30 14:45             ` Larry McVoy
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-08-29  9:38 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-ide, linux-kernel, Larry McVoy, Linus Torvalds

Brad Campbell wrote:
> Jeff Garzik wrote:
> 
>> * look at the changes from 2.6.5 -> 2.6.6 and see which change breaks 
>> things.  You can get a list of each change like this:
>>
>>     bk changes -rv2.6.5..v2.6.6
>>
>> then you can revert each patch in order, or bsearch.  Here's an 
>> example of reverting each libata patch in order:
>>
>> bk clone http://linux.bkbits.net/linux-2.5 vanilla-2.6
>> bk clone -ql -rv2.6.6 vanilla-2.6 brad-test-2.6.6
>> cd brad-test-2.6.6
>> bk -r co -Sq
>> bk changes -rv2.6.5.. > /tmp/changes-list.txt
>> less /tmp/changes-list.txt    # scan for a libata-related change
>> bk cset -x1.1587.39.2        # applies reverse of cset 1.1587.39.2
>> make                # create test
>>                 # ... test fails
>> bk cset -x1.1587.39.1        # applies reverse of cset 1.1587.39.1
>>                 # _on top of_ previous reverted patch
>> -
> 
> 
> Ooooohh. I have been looking for a "Dummies guide to regression testing 
> with BK" and not been able to find one. I have cc'd this to linux-kernel 
> purely for the purpose of more googleable archives for future reference 
> for BK newbies like me.
> 
> Cheers Jeff!
> 
> I'll start hammering on this tonight.

Groovy :)

Since BK changesets are ordered as a progression, you can also do a 
bsearch by cloning trees at specific changesets, such as:

bk changes -rv2.6.6..v2.6.7 > /tmp/changes.txt
# view changes.txt, pick out cset 1.1587.39.1 as your "top of tree"
bk clone -r1.1587.39.1 vanilla-2.6 brad-test-2.6.6-bk
# compile and test the kernel in brad-test-2.6.6-bk

Since we're CC'ing lkml to add to the collective wisdom, maybe Larry or 
Linus have something to add, WRT tips on efficiently narrowing down a 
regression in the kernel, using BK.  I am _definitely_ not a BK wizard 
in this specific area.

	Jeff




* Re: Libata VIA woes continue. Worked around - *wrong*
  2004-08-29  9:38           ` Jeff Garzik
@ 2004-08-30 14:45             ` Larry McVoy
  0 siblings, 0 replies; 8+ messages in thread
From: Larry McVoy @ 2004-08-30 14:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Brad Campbell, linux-ide, linux-kernel, Larry McVoy,
	Linus Torvalds

> Since BK changesets are ordered as a progression, you can also do a 
> bsearch by cloning trees at specific changesets, such as:
> 
> bk changes -rv2.6.6..v2.6.7 > /tmp/changes.txt
> # view changes.txt, pick out cset 1.1587.39.1 as your "top of tree"
> bk clone -r1.1587.39.1 vanilla-2.6 brad-test-2.6.6-bk
> # compile and test the kernel in brad-test-2.6.6-bk

A couple of comments:
    - BK changesets are not a linear progression; they are in the form of
      a graph called a lattice.  Getting a path through it that you can
      do a binary search on is not straightforward.

    - The CVS tree represents one such straight path; get just the ChangeSet
      file from the CVS tree and do an rlog on it - you are looking for
      lines like:

      BKrev: 41316382Cxbyp1_yHDX8LmymGot3Ww

      That rev is the "md5key" of the BK rev and can be used anywhere a BK
      rev may be used (bk clone -r41316382Cxbyp1_yHDX8LmymGot3Ww ...)

    - The biggest time saver is knowing where to look for your bug.  If you
      knew that the bug was in drivers/scsi/libata-core.c then you could
      find each changeset which touched that file like so

      $ bk rset -lv2.6.6 | grep drivers/scsi/libata-core.c
      drivers/scsi/libata-core.c|1.39
      $ bk prs -hnd:I: -r1.39.. drivers/scsi/libata-core.c | while read rev
      do  bk r2c -r$rev drivers/scsi/libata-core.c
      done

      That will crunch away and spit out (in this case) 63 revs like

      1.1803.1.40
      1.1803.1.39
      1.1803.1.38
      ...

      and a binary search over those revs is likely to be far more fruitful
      because the history of that one file is pretty linear.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
