* [2.6.19.1] ESP regression ?
@ 2007-01-05  9:47 BERTRAND Joël
  2007-01-05 17:45 ` Tom 'spot' Callaway
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: BERTRAND Joël @ 2007-01-05  9:47 UTC
  To: sparclinux

	Hello,

	Is there trouble again with ESP DMA in the 2.6.19 kernel? Under very high
disk load (RAID reconstruction or apt-get dist-upgrade), I see ESP
DMA errors on an SS20 workstation. This problem was fixed in 2.6.18.
Is there any chance it will be fixed in the latest 2.6.20-rc?

	Test config :
- dual SS-II/75 MHz
- 448 MB
- 8MB VSIMM
- 2 36 GB internal SCSI disks
- 2.6.19.1 kernel

	Thanks for your help,

	JKB

* Re: [2.6.19.1] ESP regression ?
  2007-01-05  9:47 [2.6.19.1] ESP regression ? BERTRAND Joël
@ 2007-01-05 17:45 ` Tom 'spot' Callaway
  2007-01-05 19:35 ` Tom 'spot' Callaway
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Tom 'spot' Callaway @ 2007-01-05 17:45 UTC
  To: sparclinux

On Fri, 2007-01-05 at 10:47 +0100, BERTRAND Joël wrote:
> 	Hello,
> 
> 	Is there trouble again with ESP DMA in the 2.6.19 kernel? Under very high
> disk load (RAID reconstruction or apt-get dist-upgrade), I see ESP
> DMA errors on an SS20 workstation. This problem was fixed in 2.6.18.
> Is there any chance it will be fixed in the latest 2.6.20-rc?
> 
> 	Test config :
> - dual SS-II/75 MHz
> - 448 MB
> - 8MB VSIMM
> - 2 36 GB internal SCSI disks
> - 2.6.19.1 kernel

For what it is worth, I'm not actually able to get esp.ko (Aurora builds
esp as a module) working at all on any sparc32 systems (immediate
testing on ss4 and ss20). Several of the Aurora folks have been helping
me try to track down the failure, and here is what we know so far:

2.6.16 works properly on the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.16-1.2241sp7.1-dmesg.txt

We see the disk on the esp controller at target three, and life is good.

2.6.18.1 does not work properly on the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.18-1.2798.al3.3.txt

The esp.ko module loads, but it doesn't see the disk on the esp
controller at target three.

We then tested on an Ultra 2 to see if that hardware (which has multiple
esp controllers in it) would detect devices correctly, and it does:
http://beer.tclug.org/jima/text/badger-2.6.18-1.2798.al3.1smp-dmesg.txt

I also tested 2.6.20-rc1-git5 on the sparc32s, but it also fails to see
any of the attached devices (disk on the ss4, disk and cdrom on the
ss20).

At this point, I built 2.6.16 and 2.6.18.1 kernels with all the DEBUG_
defines enabled, in the hopes of exposing the differences in behavior.

Here is the 2.6.16 debugging output from the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.16-1.2241sp7.2.txt

Here is the 2.6.18.1 debugging output from the ss4:
http://beer.tclug.org/jima/text/tmi-2.6.18-1.2798.al3.4.txt

The differences are rather staggering: the 2.6.16 kernel seems to call
esp_queue multiple times upon discovering the disk on target 3, but the
2.6.18.1 kernel only calls it once.

Comparing the output on the ss4:

On 2.6.16, we see:
<SLCTNORM>I[0:0]( 
<CLUELESS>esp_do_data: 
<DATAIN>newphase
<DATAIN> hmuch<36> DMA|TI --> do_intr_end )
I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(36), bytes_sent(36),
<CLUELESS>!bogus_data, to new phase

At the same point in 2.6.18.1, we see:
<SLCTNORM>I[0:0](
<CLUELESS>esp_do_data: 
<DATAIN>newphase
<DATAIN> hmuch<252> DMA|TI --> do_intr_end )
I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(252), bytes_sent(18),
<CLUELESS>!bogus_data, to new phase

Note the difference in trans_sz, bytes_sent, hmuch... not sure if that
is relevant, or just noise.
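
A minimal sketch for comparing these fields across the debug logs, assuming
the fragments look like the excerpts quoted here (hmuch<N> followed by
trans_sz(N), bytes_sent(N), with the field spelled trans_sz). The function
names and sample strings are illustrative and are not part of the esp driver.
The script only flags data phases where bytes_sent differs from trans_sz; it
does not explain why they differ.

```python
import re

# Patterns for the debug fragments quoted in this message (assumed format).
HMUCH_RE = re.compile(r"hmuch<(\d+)>")
FINALE_RE = re.compile(r"trans_sz\((\d+)\),\s*bytes_sent\((\d+)\)")

# Sample excerpts copied from the ss4 logs quoted above.
LOG_2_6_16 = """<DATAIN> hmuch<36> DMA|TI --> do_intr_end )
I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(36), bytes_sent(36),
"""
LOG_2_6_18_1 = """<DATAIN> hmuch<252> DMA|TI --> do_intr_end )
I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(252), bytes_sent(18),
"""


def data_phases(log_text):
    """Return one (hmuch, trans_sz, bytes_sent) tuple per data phase."""
    phases = []
    pending_hmuch = None  # last hmuch<N> seen before the finale line
    for line in log_text.splitlines():
        m = HMUCH_RE.search(line)
        if m:
            pending_hmuch = int(m.group(1))
        m = FINALE_RE.search(line)
        if m:
            phases.append((pending_hmuch, int(m.group(1)), int(m.group(2))))
            pending_hmuch = None
    return phases


def short_transfers(phases):
    """Keep only the phases where bytes_sent differs from trans_sz."""
    return [p for p in phases if p[2] != p[1]]


for name, text in (("2.6.16", LOG_2_6_16), ("2.6.18.1", LOG_2_6_18_1)):
    print(name, "short transfers:", short_transfers(data_phases(text)))
```

On the two excerpts above, this flags only the 2.6.18.1 data phase (252
requested, 18 sent); the 2.6.16 phase is not flagged.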

After this, both kernels output:
<STATUS>esp_do_status: ack msg, got something, got both, status= 0 msg= 0, and was COMMAND_COMPLETE
<FREEING>F<03,00>)

But the 2.6.16 kernel continues on target 3, calling esp_queue again
and finding the disk there:

esp_queue: target=3 lun=0 N<03,00>
esp: Selecting device for first time. target=3 lun=0
<SLCTNORM>I[0:0](<CLUELESS>esp_do_data: <DATAIN>newphase<DATAIN> hmuch<144> DMA|TI --> do_intr_end
)I[0:0](esp_work_bus: esp_do_data_finale: trans_sz(144), bytes_sent(144), <CLUELESS>!bogus_data, to new phase
<STATUS>esp_do_status: ack msg, got something, got both, status= 0 msg= 0, and was COMMAND_COMPLETE
<FREEING>F<03,00>)<5>  Vendor: SEAGATE   Model: ST34573WC         Rev: 6244
  Type:   Direct-Access                      ANSI SCSI revision: 02
.....

The 2.6.18.1 kernel stops and moves on to target 4+:
esp_queue: target=4 lun=0 N<04,00>
esp: Selecting device for first time. target=4 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 4 lun 0
)esp_queue: target=5 lun=0 N<05,00>
esp: Selecting device for first time. target=5 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 5 lun 0
)esp_queue: target=6 lun=0 N<06,00>
esp: Selecting device for first time. target=6 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 6 lun 0
)
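
A rough sketch of how the bus scan in these logs could be summarized per
target, assuming the message formats quoted above (esp_queue: target=N lun=M,
"selection failure", and the "Vendor:" line printed once a device is found).
The function name, the three status labels, and the sample strings are
illustrative. It also assumes a "selection failure" line never shares a line
with the next target's esp_queue message, which holds for these excerpts.

```python
import re

QUEUE_RE = re.compile(r"esp_queue: target=(\d+) lun=(\d+)")

# Excerpts copied from the logs quoted above.
SCAN_2_6_16 = """esp_queue: target=3 lun=0 N<03,00>
esp: Selecting device for first time. target=3 lun=0
<STATUS>esp_do_status: ack msg, got something, got both, status= 0 msg= 0, and was COMMAND_COMPLETE
<FREEING>F<03,00>)<5>  Vendor: SEAGATE   Model: ST34573WC         Rev: 6244
"""
SCAN_2_6_18_1 = """esp_queue: target=4 lun=0 N<04,00>
esp: Selecting device for first time. target=4 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 4 lun 0
)esp_queue: target=5 lun=0 N<05,00>
esp: Selecting device for first time. target=5 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 5 lun 0
)esp_queue: target=6 lun=0 N<06,00>
esp: Selecting device for first time. target=6 lun=0
<SLCTNORM>I[0:0](esp: selection failure, maybe nobody there?
esp: target 6 lun 0
)
"""


def scan_summary(log_text):
    """Map each probed (target, lun) to 'device', 'no response' or 'unresolved'."""
    results = {}
    current = None  # (target, lun) of the most recent esp_queue call
    for line in log_text.splitlines():
        m = QUEUE_RE.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            results.setdefault(current, "unresolved")
        if current is None:
            continue
        if "selection failure" in line:
            results[current] = "no response"
        elif "Vendor:" in line:
            results[current] = "device"
    return results


for name, text in (("2.6.16", SCAN_2_6_16), ("2.6.18.1", SCAN_2_6_18_1)):
    print(name, scan_summary(text))
```

Run over a full log, a target that stays "unresolved" (queued, but with
neither a Vendor line nor a selection failure) would be the interesting case
here, since that is what target 3 looks like on 2.6.18.1.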

I'm not sure why this is failing. Looking at git, the changes to esp to
port it to the new SBUS layer are the most significant differences:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;hA1aa5540536feace62c97478a8ea5dab7469377

But, I'm not sure why this would break sparc32 and not sparc64, and
reverting it really isn't the way to go here.

I've also got the debugging output from esp.ko on 2.6.18.1 on the Ultra
2:
http://beer.tclug.org/jima/text/badger-2.6.18-1.2798.al3.4smp.txt

It's worth noting that the trans_sz, bytes_sent and hmuch values here
match the values from the working 2.6.16 kernel on the ss4.

All kernels were built with the same toolchain:
gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)

Dave, any assistance you can offer here would be greatly appreciated, as
this is pretty much a showstopper for the next Aurora release (broken
scsi).

~spot


* Re: [2.6.19.1] ESP regression ?
  2007-01-05  9:47 [2.6.19.1] ESP regression ? BERTRAND Joël
  2007-01-05 17:45 ` Tom 'spot' Callaway
@ 2007-01-05 19:35 ` Tom 'spot' Callaway
  2007-01-06  5:22 ` Jurij Smakov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Tom 'spot' Callaway @ 2007-01-05 19:35 UTC
  To: sparclinux

On Fri, 2007-01-05 at 11:45 -0600, Tom 'spot' Callaway wrote:

> For what it is worth, I'm not actually able to get esp.ko (Aurora builds
> esp as a module) working at all on any sparc32 systems (immediate
> testing on ss4 and ss20). 

Building the SCSI and esp components into the kernel does not change the
results; it still doesn't work on 2.6.18.1+ on sparc32.

If anyone has a kernel config that works on a sparc32 w/esp, please post
it here.

Thanks,

~spot


* Re: [2.6.19.1] ESP regression ?
  2007-01-05  9:47 [2.6.19.1] ESP regression ? BERTRAND Joël
  2007-01-05 17:45 ` Tom 'spot' Callaway
  2007-01-05 19:35 ` Tom 'spot' Callaway
@ 2007-01-06  5:22 ` Jurij Smakov
  2007-01-06  9:09 ` BERTRAND Joël
  2007-01-07 14:23 ` BERTRAND Joël
  4 siblings, 0 replies; 6+ messages in thread
From: Jurij Smakov @ 2007-01-06  5:22 UTC
  To: sparclinux

On Fri, Jan 05, 2007 at 01:35:05PM -0600, Tom 'spot' Callaway wrote:
> 
> Building the SCSI and esp components into the kernel does not change the
> results, it still doesn't work on 2.6.18.1+ on sparc32.
> 
> If anyone has a working sparc32 kernel config that works on a sparc32
> w/esp, please post it here.

Tom,

The 2.6.18 kernel (it is actually 2.6.18.5 plus some patches) we 
currently have in Debian works fine on my SS20. You can find the 
config, kernel, and matching initrd at

http://www.wooyd.org/sparc/

The only patch which touches anything in arch/sparc is also included 
there (bus-id-size.patch), other patches may be viewed at

http://svn.debian.org/wsvn/kernel/dists/sid/linux-2.6/debian/patches/

One esp problem I'm aware of is broken CD-ROM support; the bug report 
with some discussion is available at

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug93894

I've got as far as figuring out that this problem seems to be due to a 
miscompilation with gcc 4.1. If the kernel is built with gcc 4.0, 
CD-ROM actually works again, so it might be some tool chain 
regression. At this point, however, I don't really have time to track 
it down.

Best regards,
-- 
Jurij Smakov                                           jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC

* Re: [2.6.19.1] ESP regression ?
  2007-01-05  9:47 [2.6.19.1] ESP regression ? BERTRAND Joël
                   ` (2 preceding siblings ...)
  2007-01-06  5:22 ` Jurij Smakov
@ 2007-01-06  9:09 ` BERTRAND Joël
  2007-01-07 14:23 ` BERTRAND Joël
  4 siblings, 0 replies; 6+ messages in thread
From: BERTRAND Joël @ 2007-01-06  9:09 UTC
  To: sparclinux

Jurij Smakov wrote:
> On Fri, Jan 05, 2007 at 01:35:05PM -0600, Tom 'spot' Callaway wrote:
>> Building the SCSI and esp components into the kernel does not change the
>> results, it still doesn't work on 2.6.18.1+ on sparc32.
>>
>> If anyone has a working sparc32 kernel config that works on a sparc32
>> w/esp, please post it here.
> 
> Tom,

	Jurij,

> The 2.6.18 kernel (it is actually 2.6.18.5 plus some patches) we 
> currently have in Debian works fine on my SS20. You can find the 
> config, kernel, and matching initrd at
> 
> http://www.wooyd.org/sparc/

	I have tried the official 2.6.18 tree. ESP works fine, but due to a bug 
in the VM, the SMP kernel is not stable. I get some Oopses with 
syscall_too_hard in read_pipe. For example, it is impossible to untar 
the kernel tree from its tar.bz2 file. I have tried both 2.6.19 and 
2.6.19.1. With these kernels, I do not get any syscall_too_hard errors 
or Oopses, but:

1/ sunlance is totally broken (sunhme works fine). I suspect a bug due to 
a build with gcc-4.1, but I don't have enough time today to build gcc-4.0 
and rebuild the current kernel (I shall try tomorrow ;-) );
2/ I have seen a deadlock (without any Oops, tar xvfj 
linux-2.6.20-rc3.tar.bz2 remains in D state!).

	The 2.6.20-rc3 kernel hangs after floppy controller initialization (but 
I built this kernel with gcc-4.1).

	Regards,

	JKB

* Re: [2.6.19.1] ESP regression ?
  2007-01-05  9:47 [2.6.19.1] ESP regression ? BERTRAND Joël
                   ` (3 preceding siblings ...)
  2007-01-06  9:09 ` BERTRAND Joël
@ 2007-01-07 14:23 ` BERTRAND Joël
  4 siblings, 0 replies; 6+ messages in thread
From: BERTRAND Joël @ 2007-01-07 14:23 UTC
  To: sparclinux

Jurij Smakov wrote:
> On Fri, Jan 05, 2007 at 01:35:05PM -0600, Tom 'spot' Callaway wrote:
>> Building the SCSI and esp components into the kernel does not change the
>> results, it still doesn't work on 2.6.18.1+ on sparc32.
>>
>> If anyone has a working sparc32 kernel config that works on a sparc32
>> w/esp, please post it here.
> 
> Tom,
> 
> The 2.6.18 kernel (it is actually 2.6.18.5 plus some patches) we 
> currently have in Debian works fine on my SS20. You can find the 
> config, kernel, and matching initrd at
> 
> http://www.wooyd.org/sparc/
> 
> The only patch which touches anything in arch/sparc is also included 
> there (bus-id-size.patch), other patches may be viewed at
> 
> http://svn.debian.org/wsvn/kernel/dists/sid/linux-2.6/debian/patches/
> 
> One esp problem I'm aware of is broken CD-ROM support, the bug report 
> with some discussion is available at
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug93894
> 
> I've got as far as figuring out that this problem seems to be due to a 
> miscompilation with gcc 4.1. If the kernel is built with gcc 4.0, 
> CD-ROM actually works again, so it might be some tool chain 
> regression. At this point, however, I don't really have time to track 
> it down.

	For information, 2.6.19.1 runs better when it is built with 
gcc-4.1. I rebuilt this kernel with gcc-4.0 and saw several 
deadlocks.

	Regards,

	JKB
