* [2.6.19.1] ESP regression ?
@ 2007-01-05 9:47 BERTRAND Joël
2007-01-05 17:45 ` Tom 'spot' Callaway
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: BERTRAND Joël @ 2007-01-05 9:47 UTC
To: sparclinux
Hello,
Is there trouble again with ESP DMA in the 2.6.19 kernel? Under very high
disk load (RAID reconstruction or apt-get dist-upgrade), I see ESP
DMA errors on an SS20 workstation. This problem had been fixed in 2.6.18.
Is there any chance of a fix in the latest 2.6.20-rc?
Test config:
- dual SS-II/75 MHz
- 448 MB
- 8MB VSIMM
- 2 36 GB internal SCSI disks
- 2.6.19.1 kernel
Thanks for your help,
JKB
* Re: [2.6.19.1] ESP regression ?
From: Tom 'spot' Callaway @ 2007-01-05 17:45 UTC
To: sparclinux
On Fri, 2007-01-05 at 10:47 +0100, BERTRAND Joël wrote:
> Hello,
>
> Is there trouble again with ESP DMA in the 2.6.19 kernel? Under very high
> disk load (RAID reconstruction or apt-get dist-upgrade), I see ESP
> DMA errors on an SS20 workstation. This problem had been fixed in 2.6.18.
> Is there any chance of a fix in the latest 2.6.20-rc?
>
> Test config:
> - dual SS-II/75 MHz
> - 448 MB
> - 8MB VSIMM
> - 2 36 GB internal SCSI disks
> - 2.6.19.1 kernel
For what it is worth, I'm not actually able to get esp.ko (Aurora builds
esp as a module) working at all on any sparc32 systems (immediate
testing on ss4 and ss20). Several of the Aurora folks have been helping
me try to track down the failure, and here is what we know so far:
2.6.16 works properly on the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.16-1.2241sp7.1-dmesg.txt
We see the disk on the esp controller at target three, and life is good.
2.6.18.1 does not work properly on the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.18-1.2798.al3.3.txt
The esp.ko module loads, but it doesn't see the disk on the esp
controller at target three.
We then tested on an Ultra 2 to see if that hardware (which has multiple
esp controllers in it) would detect devices correctly, and it does:
http://beer.tclug.org/jima/text/badger-2.6.18-1.2798.al3.1smp-dmesg.txt
I also tested 2.6.20-rc1-git5 on the sparc32s, but it also fails to see
any of the attached devices (disk on the ss4, disk and cdrom on the
ss20).
At this point, I built 2.6.16 and 2.6.18.1 kernels with all the DEBUG_
defines enabled, in the hopes of exposing the differences in behavior.
Here is the 2.6.16 debugging output from the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.16-1.2241sp7.2.txt
Here is the 2.6.18.1 debugging output from the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.18-1.2798.al3.4.txt
The differences are rather staggering: the 2.6.16 kernel seems to call
esp_queue multiple times upon discovering the disk on target 3, but the
2.6.18.1 kernel only calls it once.
Comparing the output on the ss4:
On 2.6.16, we see:
<SLCTNORM>I[0:0](
<CLUELESS>esp_do_data:
<DATAIN>newphase
<DATAIN> hmuch<36> DMA|TI --> do_intr_end )
I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(36), bytes_sent(36),
<CLUELESS>!bogus_data, to new phase
At the same point in 2.6.18.1, we see:
<SLCTNORM>I[0:0](
<CLUELESS>esp_do_data:
<DATAIN>newphase
<DATAIN> hmuch<252> DMA|TI --> do_intr_end )
I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(252), bytes_sent(18),
<CLUELESS>!bogus_data, to new phase
Note the difference in trans_sz, bytes_sent, hmuch... not sure if that
is relevant, or just noise.
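A quick way to find every such mismatch in a full debug log is a small script along these lines. This is only a sketch: it assumes the `trans_sz(N), bytes_sent(M)` format shown in the excerpts above, and the function name `find_transfer_mismatches` is made up for illustration; it is not part of the esp driver or any kernel tool.

```python
import re

# Assumed format, taken from the esp_do_data_finale debug lines quoted above:
#   "... trans_sz(252), bytes_sent(18), ..."
FINALE_RE = re.compile(r"trans_sz\((\d+)\),\s*bytes_sent\((\d+)\)")


def find_transfer_mismatches(log_text):
    """Return (trans_sz, bytes_sent) pairs where the two counts differ."""
    mismatches = []
    for match in FINALE_RE.finditer(log_text):
        trans_sz = int(match.group(1))
        bytes_sent = int(match.group(2))
        if bytes_sent != trans_sz:
            mismatches.append((trans_sz, bytes_sent))
    return mismatches


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_esp_log.py dmesg.txt
    with open(sys.argv[1], errors="replace") as log_file:
        for trans_sz, bytes_sent in find_transfer_mismatches(log_file.read()):
            print(f"requested {trans_sz} bytes, sent {bytes_sent}")
```

Run against the two ss4 logs, this would list only the transfers where the driver reported sending a different number of bytes than it set up, which makes it easier to tell whether the 252/18 case is a one-off.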
After this, both kernels output:
<STATUS>esp_do_status: ack msg, got something, got both, status= 0 msg= 0, and was COMMAND_COMPLETE
<FREEING>F<03,00>)
But the 2.6.16 kernel then continues on target 3, calling esp_queue
again and finding the disk there:
esp_queue: target=3 lun=0 N<03,00>
esp: Selecting device for first time. target=3 lun=0
<SLCTNORM>I[0:0](<CLUELESS>esp_do_data: <DATAIN>newphase<DATAIN> hmuch<144> DMA|TI --> do_intr_end
)I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(144), bytes_sent(144), <CLUELESS>!bogus_data, to new phase
<STATUS>esp_do_status: ack msg, got something, got both, status= 0 msg= 0, and was COMMAND_COMPLETE
<FREEING>F<03,00>)<5> Vendor: SEAGATE Model: ST34573WC Rev: 6244
Type: Direct-Access ANSI SCSI revision: 02
.....
The 2.6.18.1 kernel stops and moves on to target 4+:
esp_queue: target=4 lun=0 N<04,00>
esp: Selecting device for first time. target=4 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 4 lun 0
)esp_queue: target=5 lun=0 N<05,00>
esp: Selecting device for first time. target=5 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 5 lun 0
)esp_queue: target=6 lun=0 N<06,00>
esp: Selecting device for first time. target=6 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 6 lun 0
)
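To compare the probe sequence of two kernels side by side, the per-target outcome can be pulled out of each log with something like the sketch below. It assumes the `esp_queue: target=N lun=M` and `selection failure` strings appear exactly as in the excerpts above; `probe_results` is a hypothetical helper name, not an existing tool.

```python
import re

# Assumed format of the queueing debug line, as quoted in the logs above.
QUEUE_RE = re.compile(r"esp_queue: target=(\d+) lun=(\d+)")


def probe_results(log_text):
    """Map (target, lun) to the outcome of the last esp_queue call seen for it."""
    results = {}
    # With two capture groups, re.split yields:
    #   [prefix, target, lun, body, target, lun, body, ...]
    parts = QUEUE_RE.split(log_text)
    for i in range(1, len(parts), 3):
        key = (int(parts[i]), int(parts[i + 1]))
        body = parts[i + 2]
        if "selection failure" in body:
            results[key] = "selection failure"
        elif "COMMAND_COMPLETE" in body:
            results[key] = "command complete"
        else:
            results[key] = "unknown"
        # A later esp_queue call for the same target overwrites the earlier one.
    return results
```

Diffing the two dictionaries for the 2.6.16 and 2.6.18.1 logs should show at a glance which targets were queued again after the first command and which were skipped.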
I'm not sure why this is failing. Looking at git, the changes to esp to
port it to the new SBUS layer are the most significant differences:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=411aa5540536feace62c97478a8ea5dab7469377
But, I'm not sure why this would break sparc32 and not sparc64, and
reverting it really isn't the way to go here.
I've also got the debugging output from esp.ko on 2.6.18.1 on the Ultra
2:
http://beer.tclug.org/jima/text/badger-2.6.18-1.2798.al3.4smp.txt
It's worth noting that the trans_sz, bytes_sent and hmuch values here
match the values from the working 2.6.16 kernel on the ss4.
All kernels were built with the same toolchain:
gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)
Dave, any assistance you can offer here would be greatly appreciated, as
this is pretty much a showstopper for the next Aurora release (broken
scsi).
~spot
* Re: [2.6.19.1] ESP regression ?
From: Tom 'spot' Callaway @ 2007-01-05 19:35 UTC
To: sparclinux
On Fri, 2007-01-05 at 11:45 -0600, Tom 'spot' Callaway wrote:
> For what it is worth, I'm not actually able to get esp.ko (Aurora builds
> esp as a module) working at all on any sparc32 systems (immediate
> testing on ss4 and ss20).
Building the SCSI and esp components into the kernel does not change the
results, it still doesn't work on 2.6.18.1+ on sparc32.
If anyone has a working sparc32 kernel config that works on a sparc32
w/esp, please post it here.
Thanks,
~spot
* Re: [2.6.19.1] ESP regression ?
From: Jurij Smakov @ 2007-01-06 5:22 UTC
To: sparclinux
On Fri, Jan 05, 2007 at 01:35:05PM -0600, Tom 'spot' Callaway wrote:
>
> Building the SCSI and esp components into the kernel does not change the
> results, it still doesn't work on 2.6.18.1+ on sparc32.
>
> If anyone has a working sparc32 kernel config that works on a sparc32
> w/esp, please post it here.
Tom,
The 2.6.18 kernel (it is actually 2.6.18.5 plus some patches) we
currently have in Debian works fine on my SS20. You can find the
config, kernel, and matching initrd at
http://www.wooyd.org/sparc/
The only patch which touches anything in arch/sparc is also included
there (bus-id-size.patch), other patches may be viewed at
http://svn.debian.org/wsvn/kernel/dists/sid/linux-2.6/debian/patches/
One esp problem I'm aware of is broken CD-ROM support, the bug report
with some discussion is available at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=393894
I've got as far as figuring out that this problem seems to be due to a
miscompilation with gcc 4.1. If the kernel is built with gcc 4.0,
CD-ROM actually works again, so it might be some tool chain
regression. At this point, however, I don't really have time to track
it down.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
* Re: [2.6.19.1] ESP regression ?
From: BERTRAND Joël @ 2007-01-06 9:09 UTC
To: sparclinux
Jurij Smakov wrote:
> On Fri, Jan 05, 2007 at 01:35:05PM -0600, Tom 'spot' Callaway wrote:
>> Building the SCSI and esp components into the kernel does not change the
>> results, it still doesn't work on 2.6.18.1+ on sparc32.
>>
>> If anyone has a working sparc32 kernel config that works on a sparc32
>> w/esp, please post it here.
>
> Tom,
Jurij,
> The 2.6.18 kernel (it is actually 2.6.18.5 plus some patches) we
> currently have in Debian works fine on my SS20. You can find the
> config, kernel, and matching initrd at
>
> http://www.wooyd.org/sparc/
I have tried the official 2.6.18 tree. ESP works fine, but due to a bug
in the VM, the SMP kernel is not stable. I get some Oopses with
syscall_too_hard in read_pipe. For example, it is impossible to untar the
kernel tree from its tar.bz2 file. I have tried both 2.6.19 and
2.6.19.1. With these kernels, I do not get any syscall_too_hard errors
or Oopses, but:
1/ sunlance is totally broken (sunhme works fine). I suspect a bug due to
building with gcc-4.1, but I don't have enough time today (I shall
try tomorrow ;-) ) to build gcc-4.0 and rebuild the current kernel;
2/ I have seen a deadlock (without any Oops, tar xvfj
linux-2.6.20-rc3.tar.bz2 remains in D state!).
2.6.20-rc3 hangs after floppy controller initialization (but I
built this kernel with gcc-4.1).
Regards,
JKB
* Re: [2.6.19.1] ESP regression ?
From: BERTRAND Joël @ 2007-01-07 14:23 UTC
To: sparclinux
Jurij Smakov wrote:
> On Fri, Jan 05, 2007 at 01:35:05PM -0600, Tom 'spot' Callaway wrote:
>> Building the SCSI and esp components into the kernel does not change the
>> results, it still doesn't work on 2.6.18.1+ on sparc32.
>>
>> If anyone has a working sparc32 kernel config that works on a sparc32
>> w/esp, please post it here.
>
> Tom,
>
> The 2.6.18 kernel (it is actually 2.6.18.5 plus some patches) we
> currently have in Debian works fine on my SS20. You can find the
> config, kernel, and matching initrd at
>
> http://www.wooyd.org/sparc/
>
> The only patch which touches anything in arch/sparc is also included
> there (bus-id-size.patch), other patches may be viewed at
>
> http://svn.debian.org/wsvn/kernel/dists/sid/linux-2.6/debian/patches/
>
> One esp problem I'm aware of is broken CD-ROM support, the bug report
> with some discussion is available at
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=393894
>
> I've got as far as figuring out that this problem seems to be due to a
> miscompilation with gcc 4.1. If the kernel is built with gcc 4.0,
> CD-ROM actually works again, so it might be some tool chain
> regression. At this point, however, I don't really have time to track
> it down.
For information, 2.6.19.1 runs better when it is built with
gcc-4.1. I rebuilt this kernel with gcc-4.0 and saw
several deadlocks.
Regards,
JKB