* Libata VIA woes continue. Worked around
@ 2004-08-27 13:58 Brad Campbell
2004-08-29 7:32 ` Libata VIA woes continue. Worked around - *wrong* Brad Campbell
0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-27 13:58 UTC (permalink / raw)
To: linux-ide
Ok, so after a couple of reboots with max_sector set to 200 the problem re-occurs.
It must be something to do with programming the controller or timing or some other issue.
I have worked around it by putting my 2 raid-0 drives on my spare Promise ports, and at UDMA/100 with
transfers of 2048 sectors they behave fine no matter what I throw at them.
It seems that with the VIA interface it either works the first time for a big transfer or it does not,
depending on the cold boot.
If I power cycle the machine 5 times it might work perfectly 3 out of 5, and if it works OK for the
first couple of transfers it will work OK for the entire uptime of the machine. If not, it locks
the interface up.
I'm going to sit on it for a few days. It's no longer an issue for me and perhaps I'll stop chasing
it until someone else reports a similar issue with a VIA controller.
I'll see if I can jam my 2 new Maxtors in the machine on these VIA ports and give them a good
workout. I only rebooted the machine 3 times with them connected and I might have been lucky with 3
working reboots. (There are only so many times you want to power cycle a box that has 15 hard disks
in it!)
Regards,
Brad (Dazed, confused and a little narked!)
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-27 13:58 Libata VIA woes continue. Worked around Brad Campbell
@ 2004-08-29 7:32 ` Brad Campbell
2004-08-29 7:57 ` Jeff Garzik
0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-29 7:32 UTC (permalink / raw)
To: linux-ide
Brad Campbell wrote:
> Ok, so after a couple of reboots with max_sector set to 200 the problem
> re-occurs.
>
> It must be something to do with programming the controller or timing or
> some other issue.
>
> I have worked around it by putting my 2 raid-0 drives on my spare
> Promise ports, and at UDMA/100 with transfers of 2048 sectors they behave
> fine no matter what I throw at them.
Scratch that. After a couple of days of intensive testing/rebooting and abuse they play up on the
Promise controller in exactly the same failure mode. Just far harder to trigger.
I have removed these bridge boards from my system now and thus the problem no longer exists. I'm a
little concerned that this might show itself for other people in the future but then I guess most
sane people buy SATA hard disks rather than re-use old ATA drives with bridge boards.
Cross that bridge if we come to it I guess.
Regards,
Brad (again and again and again)
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-29 7:32 ` Libata VIA woes continue. Worked around - *wrong* Brad Campbell
@ 2004-08-29 7:57 ` Jeff Garzik
2004-08-29 8:17 ` Brad Campbell
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-08-29 7:57 UTC (permalink / raw)
To: Brad Campbell; +Cc: linux-ide
Brad Campbell wrote:
> Brad Campbell wrote:
>
>> Ok, so after a couple of reboots with max_sector set to 200 the
>> problem re-occurs.
>>
>> It must be something to do with programming the controller or timing
>> or some other issue.
>>
>> I have worked around it by putting my 2 raid-0 drives on my spare
>> Promise ports, and at UDMA/100 with transfers of 2048 sectors they
>> behave fine no matter what I throw at them.
>
>
> Scratch that. After a couple of days of intensive testing/rebooting and
> abuse they play up on the
> Promise controller in exactly the same failure mode. Just far harder to
> trigger.
>
> I have removed these bridge boards from my system now and thus the
> problem no longer exists. I'm a
> little concerned that this might show itself for other people in the
> future but then I guess most
> sane people buy SATA hard disks rather than re-use old ATA drives with
> bridge boards.
Well, there are some cases on a few controllers (SiI is one that comes
to mind) where -- IIRC -- bridges dictate the max is UDMA/100, not
UDMA/133, even if the underlying device is UDMA/133.
In sata_promise.c or sata_via.c, what happens if you change udma_mask
from 0x7f to 0x3f? Do the failures go away?
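For reference, the mask in question lives in the driver's port-info structure, and each bit in
udma_mask enables one UDMA mode: 0x7f allows modes 0-6 (up to UDMA/133), 0x3f caps it at UDMA/100,
and 0x1f at UDMA/66. A rough sketch of the suggested change, assuming the 2.6.9-rc1-era layout of
sata_via.c (a sketch only, not a tested patch):

	/* drivers/scsi/sata_via.c -- sketch only; field names as in the
	 * 2.6.9-rc1-era struct ata_port_info from include/linux/libata.h */
	static struct ata_port_info svia_port_info = {
		.sht		= &svia_sht,
		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_SRST | ATA_FLAG_NO_LEGACY,
		.pio_mask	= 0x1f,		/* PIO modes 0-4 */
		.mwdma_mask	= 0x07,		/* MWDMA modes 0-2 */
		.udma_mask	= 0x3f,		/* was 0x7f (UDMA/133); 0x3f caps at UDMA/100 */
		.port_ops	= &svia_sata_ops,
	};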
> Cross that bridge if we come to it I guess.
Guffaw ;-)
Jeff
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-29 7:57 ` Jeff Garzik
@ 2004-08-29 8:17 ` Brad Campbell
2004-08-29 9:04 ` Jeff Garzik
0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-29 8:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-ide
Jeff Garzik wrote:
>
> Well, there are some cases on a few controllers (SiI is one that comes
> to mind) where -- IIRC -- bridges dictate the max is UDMA/100, not
> UDMA/133, even if the underlying device is UDMA/133.
>
> In sata_promise.c or sata_via.c, what happens if you change udma_mask
> from 0x7f to 0x3f? Do the failures go away?
These drives are UDMA/100. On the VIA controller I changed the udma_mask to 0x1f and the failures
"appeared" to go away, but that was before I realised the exact nature of the failure mode (that
being: it will either fail on bootup or very soon after, or it will work perfectly until the next boot).
I can always hook the drives up and hammer them if you'd like me to do further testing, but I'm not
sure how we can then let libata know that the connected drives need to be slowed down, as we can't
really identify that a bridge is connected.
I'm still not convinced that it's not something else.
Sure transfers > 200 sectors killed it on the VIA controller at UDMA/100 while they appeared to work
ok at UDMA/66. I guess I need to run a defined array of tests.
- Large transfers (> 200) at UDMA/100 and UDMA/66
- Small transfers (<=200) at UDMA/100 and UDMA/66
- Something like 10 reboot cycles of each.
It's very hard to hit on the Promise controller (Perhaps < 10% of reboots) while on the VIA
controller it happens maybe 60% of the time.
And of course 2.6.5 never hits it at all. (And given I patched the VIA driver in 2.6.9-rc1 to keep
transfers < 200 sectors and still hit the bug it's not that!)
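(Brad's actual patch isn't quoted in the thread. Purely as an illustration of the kind of cap he
describes, and assuming the 2.6.x-era libata SCSI slave-configure hook and block-layer call, something
along these lines would pin requests back to the 200-sector default:)

	/* Hypothetical sketch, not Brad's patch: cap every request at
	 * ATA_MAX_SECTORS (200 sectors, i.e. 100 KiB) when the SCSI layer
	 * configures the device, regardless of any larger limit the core
	 * may apply for LBA48-capable drives.
	 */
	int ata_scsi_slave_config(struct scsi_device *sdev)
	{
		/* ... existing libata slave-config setup ... */

		blk_queue_max_sectors(sdev->request_queue, ATA_MAX_SECTORS);

		return 0;
	}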
>> Cross that bridge if we come to it I guess.
>
>
> Guffaw ;-)
Oh bugger.. I had not even realised what I said! :p) (Slow news day obviously)
Regards,
Brad
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-29 8:17 ` Brad Campbell
@ 2004-08-29 9:04 ` Jeff Garzik
2004-08-29 9:24 ` Brad Campbell
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-08-29 9:04 UTC (permalink / raw)
To: Brad Campbell; +Cc: linux-ide
Brad Campbell wrote:
> Jeff Garzik wrote:
>
>>
>> Well, there are some cases on a few controllers (SiI is one that comes
>> to mind) where -- IIRC -- bridges dictate the max is UDMA/100, not
>> UDMA/133, even if the underlying device is UDMA/133.
>>
>> In sata_promise.c or sata_via.c, what happens if you change udma_mask
>> from 0x7f to 0x3f? Do the failures go away?
>
>
> These drives are UDMA/100. On the VIA controller I changed the udma_mask
> to 0x1f and the failures "appeared" to go away, but that was before I
> realised the exact nature of the failure mode (that being: it will
> either fail on bootup or very soon after, or it will work perfectly
> until the next boot).
>
> I can always hook the drives up and hammer them if you'd like me to do
> further testing, but I'm not sure how we can then let libata know that
> the connected drives need to be slowed down, as we can't really identify
> that a bridge is connected.
>
> I'm still not convinced that it's not something else.
> Sure transfers > 200 sectors killed it on the VIA controller at UDMA/100
> while they appeared to work ok at UDMA/66. I guess I need to run a
> defined array of tests.
>
> - Large transfers (> 200) at UDMA/100 and UDMA/66
> - Small transfers (<=200) at UDMA/100 and UDMA/66
> - Something like 10 reboot cycles of each.
>
> It's very hard to hit on the Promise controller (Perhaps < 10% of
> reboots) while on the VIA controller it happens maybe 60% of the time.
>
> And of course 2.6.5 never hits it at all. (And given I patched the VIA
> driver in 2.6.9-rc1 to keep transfers < 200 sectors and still hit the
> bug it's not that!)
Well, if you are completely unable to reproduce in 2.6.5, there are a
couple of things to try:
* copy drivers/scsi/libata*, drivers/scsi/sata_*,
drivers/scsi/ata_piix.c, include/linux/libata.h, include/linux/ata.h
from 2.6.9-rc1-bk into 2.6.5, and see if you can reproduce the failure.
(I can help if there are any compile/API problems you can't figure
out.) That will eliminate non-libata changes, at least.
* look at the changes from 2.6.5 -> 2.6.6 and see which change breaks
things. You can get a list of each change like this:
bk changes -rv2.6.5..v2.6.6
then you can revert each patch in order, or bsearch. Here's an example
of reverting each libata patch in order:
bk clone http://linux.bkbits.net/linux-2.5 vanilla-2.6
bk clone -ql -rv2.6.6 vanilla-2.6 brad-test-2.6.6
cd brad-test-2.6.6
bk -r co -Sq
bk changes -rv2.6.5.. > /tmp/changes-list.txt
less /tmp/changes-list.txt # scan for a libata-related change
bk cset -x1.1587.39.2 # applies reverse of cset 1.1587.39.2
make # create test
# ... test fails
bk cset -x1.1587.39.1 # applies reverse of cset 1.1587.39.1
# _on top of_ previous reverted patch
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-29 9:04 ` Jeff Garzik
@ 2004-08-29 9:24 ` Brad Campbell
2004-08-29 9:38 ` Jeff Garzik
0 siblings, 1 reply; 8+ messages in thread
From: Brad Campbell @ 2004-08-29 9:24 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-ide, linux-kernel
Jeff Garzik wrote:
> * look at the changes from 2.6.5 -> 2.6.6 and see which change breaks
> things. You can get a list of each change like this:
>
> bk changes -rv2.6.5..v2.6.6
>
> then you can revert each patch in order, or bsearch. Here's an example
> of reverting each libata patch in order:
>
> bk clone http://linux.bkbits.net/linux-2.5 vanilla-2.6
> bk clone -ql -rv2.6.6 vanilla-2.6 brad-test-2.6.6
> cd brad-test-2.6.6
> bk -r co -Sq
> bk changes -rv2.6.5.. > /tmp/changes-list.txt
> less /tmp/changes-list.txt # scan for a libata-related change
> bk cset -x1.1587.39.2 # applies reverse of cset 1.1587.39.2
> make # create test
> # ... test fails
> bk cset -x1.1587.39.1 # applies reverse of cset 1.1587.39.1
> # _on top of_ previous reverted patch
> -
Ooooohh. I have been looking for a "Dummies guide to regression testing with BK" and have not been
able to find one. I have cc'd this to linux-kernel purely to make the archives more googleable for
future reference by BK newbies like me.
Cheers Jeff!
I'll start hammering on this tonight.
(It's actually between 2.6.6 and 2.6.7-rc1 that the breakage occurs. I had just been running 2.6.5
until I recently got a dodgy hard disk which showed up flaws in the libata error handling, so I
tried to move to 2.6.8.1 to debug that and found it broke some of my drives in other ways. I have
already cloned the relevant trees; I just could not figure out how to break it down to cset granularity.)
Regards,
Brad
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-29 9:24 ` Brad Campbell
@ 2004-08-29 9:38 ` Jeff Garzik
2004-08-30 14:45 ` Larry McVoy
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-08-29 9:38 UTC (permalink / raw)
To: Brad Campbell; +Cc: linux-ide, linux-kernel, Larry McVoy, Linus Torvalds
Brad Campbell wrote:
> Jeff Garzik wrote:
>
>> * look at the changes from 2.6.5 -> 2.6.6 and see which change breaks
>> things. You can get a list of each change like this:
>>
>> bk changes -rv2.6.5..v2.6.6
>>
>> then you can revert each patch in order, or bsearch. Here's an
>> example of reverting each libata patch in order:
>>
>> bk clone http://linux.bkbits.net/linux-2.5 vanilla-2.6
>> bk clone -ql -rv2.6.6 vanilla-2.6 brad-test-2.6.6
>> cd brad-test-2.6.6
>> bk -r co -Sq
>> bk changes -rv2.6.5.. > /tmp/changes-list.txt
>> less /tmp/changes-list.txt # scan for a libata-related change
>> bk cset -x1.1587.39.2 # applies reverse of cset 1.1587.39.2
>> make # create test
>> # ... test fails
>> bk cset -x1.1587.39.1 # applies reverse of cset 1.1587.39.1
>> # _on top of_ previous reverted patch
>> -
>
>
> Ooooohh. I have been looking for a "Dummies guide to regression testing
> with BK" and not been able to find one. I have cc'd this to linux-kernel
> purely for the purpose of more googleable archives for future reference
> for BK newbies like me.
>
> Cheers Jeff!
>
> I'll start hammering on this tonight.
Groovy :)
Since BK changesets are ordered as a progression, you can also do a
bsearch by cloning trees at specific changesets, such as
bk changes -rv2.6.6..v2.6.7 > /tmp/changes.txt
# view changes.txt, pick out cset 1.1587.39.1 as your "top of tree"
bk clone -r1.1587.39.1 vanilla-2.6 brad-test-2.6.6-bk
# compile and test the kernel in brad-test-2.6.6-bk
Since we're CC'ing lkml to add to the collective wisdom, maybe Larry or
Linus have something to add, WRT tips on efficiently narrowing down a
regression in the kernel, using BK. I am _definitely_ not a BK wizard
in this specific area.
Jeff
* Re: Libata VIA woes continue. Worked around - *wrong*
2004-08-29 9:38 ` Jeff Garzik
@ 2004-08-30 14:45 ` Larry McVoy
0 siblings, 0 replies; 8+ messages in thread
From: Larry McVoy @ 2004-08-30 14:45 UTC (permalink / raw)
To: Jeff Garzik
Cc: Brad Campbell, linux-ide, linux-kernel, Larry McVoy,
Linus Torvalds
> Since BK changesets are ordered as a progression, you can also do a
> bsearch by cloning trees at specific changesets, such as
>
> bk changes -rv2.6.6..v2.6.7 > /tmp/changes.txt
> # view changes.txt, pick out cset 1.1587.39.1 as your "top of tree"
> bk clone -r1.1587.39.1 vanilla-2.6 brad-test-2.6.6-bk
> # compile and test the kernel in brad-test-2.6.6-bk
A couple of comments:
- BK changesets are not a linear progression; they are in the form of
a graph called a lattice. Getting a path through there that you can
do a binary search on is not straightforward.
- The CVS tree represents one such straight path; get just the ChangeSet
file from the CVS tree and do an rlog on it - you are looking for
lines like:
BKrev: 41316382Cxbyp1_yHDX8LmymGot3Ww
That rev is the "md5key" of the BK rev and can be used anywhere a BK
rev may be used (bk clone -r41316382Cxbyp1_yHDX8LmymGot3Ww ...)
- The biggest time saver is knowing where to look for your bug. If you
knew that the bug was in drivers/scsi/libata-core.c then you could
find each changeset which touched that file like so
$ bk rset -lv2.6.6 | grep drivers/scsi/libata-core.c
drivers/scsi/libata-core.c|1.39
$ bk prs -hnd:I: -r1.39.. drivers/scsi/libata-core.c | while read rev
do bk r2c -r$rev drivers/scsi/libata-core.c
done
That will crunch away and spit out (in this case) 63 revs like
1.1803.1.40
1.1803.1.39
1.1803.1.38
...
and a binary search over those revs is likely to be far more fruitful
because the history of that one file is pretty linear.
--
---
Larry McVoy lm at bitmover.com http://www.bitkeeper.com