Linux PARISC architecture development
* [parisc-linux] Hanging with kernels >= 2.4.22
From: Stuart Brady @ 2004-02-18 23:52 UTC
  To: parisc-linux

Hi,

Firstly, thanks for the great work on parisc-linux. I'm amazed at how
well it's working. I hope I'll be able to give something back, soon.

I'm having problems with kernels >= 2.4.22 on a 715/100.

I'm looking at http://parisc-linux.org/faq/kernelbug-howto.html.
The guidelines at http://bugs.parisc-linux.org/Reporting.html (linked to
from http://parisc-linux.org/) appear to be out of date, so I'm not sure
if I'm doing the right thing. Console output is a little difficult for
me to provide at the moment.

While booting (i.e. while init is bringing up services, not during
kernel startup), or after booting, something appears to be hanging.
There doesn't seem to be much of a pattern regarding when the hangs
occur, although I've noticed them happening at the same time between
subsequent reboots, e.g. after starting statd, or trying to load the
mixer settings.

If I set the default runlevel to 1, I can log in using single user
mode. Before starting inetd, telnet to port 37 (time) fails. After
running /etc/init.d/inetd start, telnet to port 37 appears to work
(I get a valid response, e.g. 0xc3de42ac), and telnetting to port 13
seems correct: "Wed Feb 18 20:04:23 2004". If I start sshd and remove
/etc/nologin, I can connect with ssh.
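
As an illustration of why the port 37 reply looks valid: the time
protocol (RFC 868) returns a 32-bit count of seconds since
1900-01-01 00:00:00 UTC. A minimal Python sketch that decodes the value
quoted above follows. The function and constant names are made up for
this example and are not part of any tool mentioned in this thread.

```python
from datetime import datetime, timezone

# RFC 868 counts seconds from 1900-01-01 00:00:00 UTC, while Unix time
# counts from 1970-01-01 00:00:00 UTC. This is the gap between the two.
RFC868_TO_UNIX_OFFSET = 2208988800


def decode_time_reply(value: int) -> datetime:
    """Convert a 32-bit RFC 868 time value into a UTC datetime."""
    return datetime.fromtimestamp(value - RFC868_TO_UNIX_OFFSET, tz=timezone.utc)


reply = 0xC3DE42AC  # the value quoted in the message
print(decode_time_reply(reply).isoformat())
```

The decoded value falls on 2004-02-18 at 19:59:40 UTC, a few minutes
before the daytime string from port 13, so the two replies look
consistent with each other.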

After the hang has taken place, I can still ping the machine.
Anything I type at the console is echoed correctly. The heartbeat's
working. Magic sysrq works. I can make connections to port 22, but
after the connection is made, nothing happens (i.e. I don't get the
SSH greeting). If I telnet to port 37 or port 13, I get a valid
response, which indicates that inetd is running and still working.
If I have bash running I can use tab completion to list directory
contents.

But after a while, bash freezes during tab completion. Sometimes,
telnetting to port 13 ceases to work. Still, I can ping the machine,
text is echoed, I even get a response from telnetting to port 37,
the heartbeat works, and so does magic sysrq.

uname -a reports:

Linux 1986u10 2.4.24-pa0 #6 Wed Feb 18 16:58:09 GMT 2004 parisc GNU/Linux

I'm using the gcc 3.3.3 (-0pre3) package to build the kernel, with
the binutils 2.14.90.0.7-5 package, and 2.4.24 from CVS. The same
thing happens with all of the pre-compiled packages from
http://cvs.parisc-linux.org/download/autobuild-kernels/32/
that I've tried newer than palinux-32-2.4.21-pa7.

Debian's kernel-image-2.4.17-32 and kernel-image-2.4.21-32 seem to work.
palinux-32-2.4.20-pa35, palinux-32-2.4.21-pa2 and palinux-32-2.4.21-pa7
work correctly. palinux-32-2.4.22-pa7, palinux-32-2.4.22-pa10 and
palinux-32-2.4.24-pa0 don't, though, nor do any of the 2.4.24 images
that I've built from CVS.

The BootRom version is 1.5. The memory in this machine was upgraded to
160MB. There's also an extra graphics card, which I'm not using at the
moment. The problem occurs regardless of whether I'm using stifb or
sticon.

Palo's command line is: 2/vmlinux root=/dev/sda6 HOME=/ console=tty0
sti=1 sti_font=VGA8x16 TERM=linux 

I've uploaded the System.map and config files:
   http://homepage.ntlworld.com/wholehog/System.map-32-2.4.24-pa0.gz
   http://homepage.ntlworld.com/wholehog/config-32-2.4.24-pa0

Also, when doing a fsck (ext3) with the new kernel, it got to the 70.1%
mark, and froze. 70% seems to be the beginning of a particular stage in
fsck's checking.

Typical /var/log/messages output is at:
   http://homepage.ntlworld.com/wholehog/messages-2.4.21-pa7
   http://homepage.ntlworld.com/wholehog/messages-2.4.24-pa0

This may be unrelated, but sometimes the keyboard isn't detected. This
seems to affect only the newer kernels, but I may be mistaken.

I can't seem to clear the PIM TOC info, and pressing the TOC button
doesn't seem to be setting it. I think the TOC button works, because
pressing it just after powering up results in the serial port being
used for the console. At least I assume it does, because it doesn't
use the graphics device.

I'd be very grateful for any help finding the cause of this. I'll be
happy to test anyone's patches in order to find out what's causing
things to break, or to test any fixes. Please let me know if there's
any more information or log output that would be of use.
-- 
Stuart Brady

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Grant Grundler @ 2004-02-19  6:29 UTC
  To: Stuart Brady; +Cc: parisc-linux

On Wed, Feb 18, 2004 at 11:52:25PM +0000, Stuart Brady wrote:
> I'm having problems with kernels >= 2.4.22 on a 715/100.

[ delete descriptions of hang symptoms ]

Stuart,
All your symptoms point at disk IO hanging.
Setting the default queue depth to 1 would be worth trying.
I've forgotten the details for setting queue depth.
 
Search the parisc-linux mailing list archives for "scsi queue tags":
http://lists.parisc-linux.org/pipermail/parisc-linux/2004-January/022087.html

btw, 715/50 is NOT the same as 715/100.
715/100 should have "coherent" DMA and 715/50 does not.

Maybe a FAQ entry for "715/xxx hangs" could be your first contribution?

hth,
grant

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Stuart Brady @ 2004-02-21  7:58 UTC
  To: parisc-linux

On Wed, Feb 18, 2004 at 11:29:07PM -0700, Grant Grundler wrote:
> On Wed, Feb 18, 2004 at 11:52:25PM +0000, Stuart Brady wrote:
> > I'm having problems with kernels >= 2.4.22 on a 715/100.
> 
> Stuart,
> All your symptoms point at disk IO hanging.
> Setting the default queue depth to 1 would be worth trying.
> I've forgotten the details for setting queue depth.
>  
> search parisc-linux mailing list archives "scsi queue tags".
> http://lists.parisc-linux.org/pipermail/parisc-linux/2004-January/022087.html

Thanks! Changing NCR_700_MAX_TAGS to 1 in drivers/scsi/53c700.h did the
trick. Will I now have poor disk performance? If so, might I get away
with setting it to something higher, like 2 or 4, maybe?
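
The change described above amounts to editing one constant in the
driver header and rebuilding the kernel. A minimal Python sketch of that
edit, applied to a header's text, follows. The helper name `set_define`
and the sample line are illustrative assumptions: the exact layout of
the definition in drivers/scsi/53c700.h is not shown in this thread, so
check the real file before applying anything like this.

```python
import re


def set_define(header_text: str, name: str, value: int) -> str:
    """Return header_text with the numeric value of `#define <name>` replaced."""
    pattern = re.compile(
        r"^(\s*#\s*define\s+" + re.escape(name) + r"\s+)\d+",
        re.MULTILINE,
    )
    # A function is used as the replacement so the new digits cannot be
    # misread as part of a group reference.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), header_text)
    if count != 1:
        raise ValueError(f"expected exactly one definition of {name}, found {count}")
    return new_text


# Illustrative excerpt only, not the real header contents.
sample = "/* illustrative excerpt, not the real header */\n#define NCR_700_MAX_TAGS\t16\n"
print(set_define(sample, "NCR_700_MAX_TAGS", 1))
```

Refusing to proceed unless exactly one definition is found is a
deliberate choice here: a silent no-op would leave the old queue depth
in place and the hang would appear unfixed.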

I've not heard of tagged command queues before - the idea seems to be to
transfer data to the drive as early as possible, but make decisions as
to what should be written first at a later stage. Is that correct?

http://lists.parisc-linux.org/pipermail/parisc-linux/2002-February/015465.html
says that no tagged queue is a bad thing. Why is that?

> btw, 715/50 is NOT the same as 715/100.
> 715/100 should have "coherent" DMA and 715/50 does not.
> 
> Maybe a FAQ entry for "715/xxx hangs" could be your first contribution?

That's a good idea. Where should I send the entry? Here?
-- 
Stuart Brady

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Carlos O'Donell @ 2004-02-21 17:05 UTC
  To: Stuart Brady; +Cc: parisc-linux

> Thanks! Changing NCR_700_MAX_TAGS to 1 in drivers/scsi/53c700.h did the
> trick. Will I now have poor disk performance? If so, might I get away
> with setting it to something higher, like 2 or 4, maybe?

I could never run IO stably at anything higher than 1. The drives don't
seem to handle it properly, the error recovery mechanisms seem less
than perfect, and the box just hangs.
 
> I've not heard of tagged command queues before - the idea seems to be to
> transfer data to the drive as early as possible, but make decisions as
> to what should be written first at a later stage. Is that correct?
> 
> http://lists.parisc-linux.org/pipermail/parisc-linux/2002-February/015465.html
> says that no tagged queue is a bad thing. Why is that?

You misunderstand, I *removed* the tag-queue code from the driver and
*that* was bad. We're only asking that you reduce the queue size to 1 or
2. In fact higher numbers might be okay, but 16 is definitely too high.

> That's a good idea. Where should I send the entry? Here?

You can check out the entire website via CVS from cvs.parisc-linux.org;
there are CVS instructions on www.parisc-linux.org. The CVS file you
are particularly interested in is:

http://cvs.parisc-linux.org/web/src/faq/index.x?rev=1.40&content-type=text/vnd.viewcvs-markup

Generate a diff against that file, which includes your new entry, and 
voila you've added an entry to the FAQ, praise will rain down on you,
and you will never be forgotten :)

Many thanks,

c.

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Matthew Wilcox @ 2004-02-21 18:24 UTC
  To: Carlos O'Donell; +Cc: parisc-linux

On Sat, Feb 21, 2004 at 12:05:09PM -0500, Carlos O'Donell wrote:
> > Thanks! Changing NCR_700_MAX_TAGS to 1 in drivers/scsi/53c700.h did the
> > trick. Will I now have poor disk performance? If so, might I get away
> > with setting it to something higher, like 2 or 4, maybe?
> 
> I could never run IO stably at anything higher than 1. The drives don't
> seem to handle it properly, the error recovery mechanisms seem less
> than perfect, and the box just hangs.

So, um, maybe we should change the default?  Nobody seems to be
investigating why this happens, so I don't see why our users should
suffer.  I doubt anyone's going to be terribly motivated to track this
problem down for such old machines, so let's just sacrifice a little
performance for stability.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Carlos O'Donell @ 2004-02-22  6:04 UTC
  To: Matthew Wilcox; +Cc: parisc-linux

On Sat, Feb 21, 2004 at 06:24:36PM +0000, Matthew Wilcox wrote:
> > I could never run IO stably at anything higher than 1. The drives don't
> > seem to handle it properly, the error recovery mechanisms seem less
> > than perfect, and the box just hangs.
> 
> So, um, maybe we should change the default?  Nobody seems to be
> investigating why this happens, so I don't see why our users should
> suffer.  I doubt anyone's going to be terribly motivated to track this
> problem down for such old machines, so let's just sacrifice a little
> performance for stability.

It just doesn't work on those older boxes? I'm not quite sure how much
more investigation I can do. I mean I've got my 715/50's all running, I
could build a few more kernels and see what's going on.

I could also commit a change to lower the max tags down to 1. That seems
a little harsh for performance, but all those perf people can crank it
back up. Perhaps a build time configure option?

c.

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Grant Grundler @ 2004-02-22  6:21 UTC
  To: Stuart Brady; +Cc: parisc-linux

On Sat, Feb 21, 2004 at 07:58:49AM +0000, Stuart Brady wrote:
> Thanks! Changing NCR_700_MAX_TAGS to 1 in drivers/scsi/53c700.h did the
> trick. Will I now have poor disk performance? If so, might I get away
> with setting it to something higher, like 2 or 4, maybe?

You can try, but I don't think it's worth it.

> I've not heard of tagged command queues before - the idea seems to be to
> transfer data to the drive as early as possible, but make decisions as
> to what should be written first at a later stage. Is that correct?

Sort of yes. The reasons are a bit more complicated than that.
Key reasons are better utilization of disk buffer (w/o enabling WCE)
and allow disk firmware to optimize for maximum throughput.
There are tradeoffs and caveats to both.
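
A toy model can make the throughput argument concrete. The Python sketch
below is not how any particular drive firmware works. It assumes a drive
that always services the nearest queued block next, and compares total
head travel for a queue depth of 1 against a deeper queue. The function
names, the request list, and the "nearest first" policy are all
assumptions made for illustration.

```python
def head_travel(lbas, start=0):
    """Total distance the head moves when servicing lbas in the given order."""
    travel, pos = 0, start
    for lba in lbas:
        travel += abs(lba - pos)
        pos = lba
    return travel


def service(requests, depth, start=0):
    """Return the order in which a toy drive services the requests.

    At most `depth` requests are outstanding at the drive at once, and
    the drive always picks the queued request nearest to the head.
    """
    pending = list(requests)  # requests the host has not sent yet
    queued, order, pos = [], [], start
    while pending or queued:
        # The host tops the drive's queue up to the allowed depth.
        while pending and len(queued) < depth:
            queued.append(pending.pop(0))
        nxt = min(queued, key=lambda lba: abs(lba - pos))
        queued.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


requests = [900, 10, 880, 30, 860, 50]
for depth in (1, 4):
    order = service(requests, depth)
    print(depth, order, head_travel(order))
```

With a depth of 1 the drive has no choice and services requests in
arrival order. With a deeper queue it can group nearby blocks, which is
the kind of optimization the firmware is being given room to make.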

> > Maybe a FAQ entry for "715/xxx hangs" could be your first contribution?
> 
> That's a good idea. Where should I send the entry? Here?

Yes please. Making a diff against the file carlos pointed at
would be easiest for me. But I'll take a single paragraph
in plain text as well.

The FAQ entry should mention which machines/SCSI controllers/disks
are affected and how to set the queuedepth. Anything else is
extra credit and I reserve the right to edit it out. :^)

thanks,
grant

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Riccardo @ 2004-02-22 14:46 UTC
  To: Carlos O'Donell; +Cc: parisc-linux

Carlos O'Donell wrote:
> 
> > Thanks! Changing NCR_700_MAX_TAGS to 1 in drivers/scsi/53c700.h did the
> > trick. Will I now have poor disk performance? If so, might I get away
> > with setting it to something higher, like 2 or 4, maybe?
> 
> I could never run IO stably at anything higher than 1. The drives don't
> seem to handle it properly, the error recovery mechanisms seem less
> than perfect, and the box just hangs.
It really depends on the drive. I have a Fujitsu Enterprise and it has a
specified tag queue of 128 commands. So if something fails, it is the
driver or HP's hardware.

I have a 715 scorpio and when I used ext2 or reiser with a tag queue of
16 I had very frequent freezes, up to a point where the system wouldn't
even mount the partition.

There was a discussion that HP set it to 2 for workstations and to 8 for
servers. I have set it right now to 8 (since maybe the NCR 7100 I have
doesn't even support more) and I use XFS instead of reiser. I had no
more problems...

I would set the tag queue as default to 8 and not to 16, to stay on the
safe side. Couldn't it be made configurable from the kernel
configuration menu?


What I noticed is that some drives seem to be incompatible. I had 2 hard
disks that worked together (an original Quantum drive rebranded by HP and
an IBM disk). Since my IBM disk died I substituted it with the Fujitsu,
used a new filesystem and a newer kernel. There was no way to see both
disks when Linux booted. I had to remove the original HP disk and
substitute it (a nuisance, since it contained the home directories).
With only one of the two disks attached at a time, each was recognized
correctly, but both together were not. Another disk had no problem.

-Riccardo

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Joel Soete @ 2004-02-22 15:30 UTC
  To: Riccardo; +Cc: Carlos O'Donell, parisc-linux



Riccardo wrote:
> Carlos O'Donell wrote:
> 
>>>Thanks! Changing NCR_700_MAX_TAGS to 1 in drivers/scsi/53c700.h did the
>>>trick. Will I now have poor disk performance? If so, might I get away
>>>with setting it to something higher, like 2 or 4, maybe?
>>
>>I could never run IO stably at anything higher than 1. The drives don't
>>seem to handle it properly, the error recovery mechanisms seem less
>>than perfect, and the box just hangs.
> 
> It really depends on the drive. I have a Fujitsu Enterprise and it has a
> specified tag queue of 128 commands. So if something fails, it is the
> driver or HP's hardware.
> 
> I have a 715 scorpio and when I used ext2 or reiser with a tag queue of
> 16 I had very frequent freezes, up to a point where the system wouldn't
> even mount the partition.
> 
> There was a discussion that HP set it to 2 for workstations and to 8 for
> servers. I have set it right now to 8 (since maybe the NCR 7100 I have
> doesn't even support more) and I use XFS instead of reiser. I had no
> more problems...
> 
> I would set the tag queue as default to 8 and not to 16, to stay on the
> safe side. Couldn't it be made configurable from the kernel
> configuration menu?
> 
Good idea.
> 
> What I noticed is that some drives seem to be incompatible. I had 2 hard
> disks that worked together (an original Quantum drive rebranded by HP and
> an IBM disk). Since my IBM disk died I substituted it with the Fujitsu,
> used a new filesystem and a newer kernel. There was no way to see both
> disks when Linux booted.
And you are sure that the SCSI IDs are all different and the SCSI chain
is properly terminated, I suppose?
> I had to remove the original HP disk and
> substitute it (a nuisance, since it contained the home directories).
> With only one of the two disks attached at a time, each was recognized
> correctly, but both together were not. Another disk had no problem.
> 
Hmm, I had the same experience, but with two external disks of exactly
the same type (same supplier: HP, same manufacturer: Seagate, same
product reference) and a small firmware revision difference.
Unfortunately, this problem only occurred under Linux :(. Now the disk
is broken again, so there is no chance to test it with a more recent
2.4 or 2.6 kernel :(

Joel

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Riccardo @ 2004-02-22 17:16 UTC
  To: parisc-linux

pre-scriptum: I am sending this to the list because when trying to send
an email directly to Joel I get his MTA answering I have invalid headers
(tried two times)
Joel Soete wrote:

> > I would set the tag queue as default to 8 and not to 16, to stay on the
> > safe side. Couldn't it be made configurable from the kernel
> > configuration menu?
> >
> Good idea.
I am not the one who can make changes there, but it seems to me that 8 is
reasonable. Also, a kernel config option would make this parameter
obvious to everybody. I had only a small performance decrease from 16 to
8 tags (well, disk IO sucks anyway under linux-pa and my old box), while
2 is pure suffering :)

> And you are sure that the SCSI IDs are all different and the SCSI chain
> is properly terminated, I suppose?
Yes, the internal chain is terminated with the original HP terminator
and I removed the disks' internal ones.
In fact, substituting the original HP disk with another one removed the
problem (strangely enough, with the older IBM drive as a second drive I
sometimes had freezes, but this problem did not appear).


> Hmm, I had the same experience, but with two external disks of exactly
> the same type (same supplier: HP, same manufacturer: Seagate, same
> product reference) and a small firmware revision difference.
> Unfortunately, this problem only occurred under Linux :(. Now the disk
> is broken again, so there is no chance to test it with a more recent
> 2.4 or 2.6 kernel :(

I used a fairly recent kernel for the test, I believe 2.4.23.

The SCSI code doesn't seem to be as stable or to perform as well as
HP-UX's....

-Ric

* Re: [parisc-linux] Hanging with kernels >= 2.4.22
From: Joel Soete @ 2004-02-22 17:54 UTC
  To: Riccardo; +Cc: parisc-linux

Hi Riccardo,

Riccardo wrote:
> pre-scriptum: I am sending this to the list because when trying to send
> an email directly to Joel I get his MTA answering I have invalid headers
> (tried two times)

Sorry, this is certainly a problem with my ISP; I will try to check with
support. Thanks for the advice ;)

> Joel Soete wrote:
> 
> 
>>>I would set the tag queue as default to 8 and not to 16, to stay on the
>>>safe side. Couldn't it be made configurable from the kernel
>>>configuration menu?
>>>
>>
>>Good idea.
> 
> I am not the one who can make changes there, but it seems to me that 8 is
> reasonable.
Agreed (anyway, you can always suggest a patch; they are always welcome :) )
> Also, a kernel config option would make this parameter
> obvious to everybody. I had only a small performance decrease from 16 to
> 8 tags (well, disk IO sucks anyway under linux-pa and my old box), while
> 2 is pure suffering :)
> 
> 
>>And you are sure that the SCSI IDs are all different and the SCSI chain
>>is properly terminated, I suppose?
> 
> Yes, the internal chain is terminated with the original HP terminator
> and I removed the disks' internal ones.
> In fact, substituting the original HP disk with another one removed the
> problem (strangely enough, with the older IBM drive as a second drive I
> sometimes had freezes, but this problem did not appear).
> 
> 
> 
>>Hmm, I had the same experience, but with two external disks of exactly
>>the same type (same supplier: HP, same manufacturer: Seagate, same
>>product reference) and a small firmware revision difference.
>>Unfortunately, this problem only occurred under Linux :(. Now the disk
>>is broken again, so there is no chance to test it with a more recent
>>2.4 or 2.6 kernel :(
> 
> 
> I used a fairly recent kernel for the test, I believe 2.4.23.
> 
> The SCSI code doesn't seem to be as stable or to perform as well as
> HP-UX's....
> 
Certainly, for numerous reasons, but on the other hand it is not ported
(I don't mean portable, I am quite sure it is) to other platforms :).

Thanks for info,
	Joel
