* [parisc-linux] Machine hanging during high-traffic NFS
@ 2005-07-19 21:02 Kurt Fitzner
2005-07-19 23:36 ` Michael S. Zick
2005-07-20 1:04 ` Kyle McMartin
0 siblings, 2 replies; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-19 21:02 UTC
To: parisc-linux
I've been using NFS to try and save backup images from my B132L
(2.6.12-pa2) with a simple:
dd if=/dev/sda of=/mnt/bulk/sda-image bs=512
Every time, the machine hangs solid - the heartbeat LED even stops.
Usually it hangs after around 1 to 2 GB have been transferred. There
are no log entries at the time of the hang. It just... stops.
I'm using a 3c905 PCI ethernet card rather than the stock 10 megabit
LASI on board.
I'm wondering if this might be an issue with the ethernet driver when
compiled for PARISC. I've tried very large ftp transfers and can't
reproduce the problem that way.
I've also tried NFS over TCP and tried reducing the rsize/wsize below
1500 bytes to prevent IP fragmentation. Neither seems to help.
Are there any known NFS issues right now? Any ideas? Suggestions?
Kurt.
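The arithmetic behind keeping NFS-over-UDP requests under the Ethernet MTU, as in the rsize/wsize experiment above, can be sketched as follows. This is a minimal illustration, not something posted to the thread: it assumes IPv4 without options (20-byte header), an 8-byte UDP header and a 1500-byte MTU, and it treats the RPC/NFS header overhead as a caller-supplied number (the 200 bytes used here is a placeholder, not a measured value). It shows why a wsize just under 1500 bytes can still fragment once the RPC headers are added.

```python
import math

# Assumptions for this sketch: IPv4 without options and plain UDP.
IPV4_HEADER = 20  # bytes
UDP_HEADER = 8    # bytes


def udp_payload_limit(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in one IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ip_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Each fragment carries at most (mtu - IP header) bytes of data,
    # rounded down to a multiple of 8 as fragment offsets require.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((UDP_HEADER + udp_payload) / per_fragment)


def request_fits(wsize: int, rpc_overhead: int, mtu: int = 1500) -> bool:
    """True if wsize data bytes plus RPC/NFS headers avoid fragmentation."""
    return wsize + rpc_overhead <= udp_payload_limit(mtu)


if __name__ == "__main__":
    overhead = 200  # placeholder RPC/NFS header size, not a measured value
    for wsize in (1024, 1400, 8192):
        print(wsize, request_fits(wsize, overhead),
              ip_fragments(wsize + overhead))
```

With the placeholder overhead, a 1400-byte wsize already needs two fragments, while 1024 fits in one frame. Over TCP the transport segments the byte stream itself, so this calculation does not apply in the same way.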
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-19 21:02 [parisc-linux] Machine hanging during high-traffic NFS Kurt Fitzner
@ 2005-07-19 23:36 ` Michael S. Zick
2005-07-20 1:04 ` Kyle McMartin
1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Zick @ 2005-07-19 23:36 UTC
To: parisc-linux
On Tue July 19 2005 16:02, Kurt Fitzner wrote:
> I've been using nfs to try and save backup images from my B132L
> (2.6.12-pa2) with a simple:
> dd if=/dev/sda of=/mnt/bulk/sda-image bs=512
>
> Every time the machine hangs solid - the heartbeat LED even stops.
> Usually it hangs after around 1 to 2 gigs have been transferred. There
> are no log entries at the time of the hang. It just... stops.
>
> I'm using a 3c905 PCI ethernet card rather than the stock 10 megabit
> LASI on board.
>
> I'm wondering if this might be an issue with the ethernet driver when
> compiled for PARISC. I've tried very large ftp transfers and can't
> reproduce the problem that way.
>
> I've also tried NFS over TCP and tried reducing the rsize/wsize below
> 1500 bytes to prevent IP fragmentation. Neither seems to help.
>
> Are there any known NFS issues right now? Any ideas? Suggestions?
>
Questions/Suggestions only.
Any hints in the log of the receiving (nfs server) side?
Any portion of /dev/sda mounted somewhere?
Is the /mnt/bulk/sda-image mount point on /dev/sda* ?
That is, is there a drive in common with '/', '/mnt', '/dev'
and the entire device '/dev/sda' ?
Can you achieve your goal with a file copy rather than
a disk image? Have you tried running rsync?
Can you successfully transfer (dd) a single file larger than
your trouble point size when trying to transfer the entire device?
Have you tried a block size other than 512 with the dd command?
Perhaps an even sub-multiple of the packet size, so that the
network stack does not have to fragment the dd blocks.
Mike
> Kurt.
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-19 21:02 [parisc-linux] Machine hanging during high-traffic NFS Kurt Fitzner
2005-07-19 23:36 ` Michael S. Zick
@ 2005-07-20 1:04 ` Kyle McMartin
2005-07-20 3:31 ` John David Anglin
2005-07-20 6:59 ` Kurt Fitzner
1 sibling, 2 replies; 14+ messages in thread
From: Kyle McMartin @ 2005-07-20 1:04 UTC
To: Kurt Fitzner; +Cc: parisc-linux
On Tue, Jul 19, 2005 at 03:02:55PM -0600, Kurt Fitzner wrote:
> I'm using a 3c905 PCI ethernet card rather than the stock 10 megabit
> LASI on board.
>
> I'm wondering if this might be an issue with the ethernet driver when
> compiled for PARISC. I've tried very large ftp transfers and can't
> reproduce the problem that way.
TOC dump, IIR, IAOQ/IASQ locations? Come on, people... It's really hard
to even begin to figure out what's wrong if no debugging information
has been provided...
http://www.parisc-linux.org/faq/kernelbug-howto.html
--
Kyle McMartin
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-20 3:31 ` John David Anglin
@ 2005-07-20 2:57 ` Thibaut VARENE
2005-07-20 14:56 ` Matthew Wilcox
0 siblings, 1 reply; 14+ messages in thread
From: Thibaut VARENE @ 2005-07-20 2:57 UTC
To: John David Anglin; +Cc: Kyle McMartin, parisc-linux
JDA wrote:
> It might help to have a "stable" branch that is maintained longer
> than is current practice. At a minimum, the current tree needs to
> be slushed until the main problems are resolved.
I definitely concur on that. That's something I already suggested on
IRC a while ago, and I believe we all probably agree that there's
such a need. The open questions are how to do it properly (maintaining a
separate "stable" branch is not exactly trivial), and also *who* is
going to maintain it.
In any case, we really want to take time to fix our bugs. I don't know
if we need to somehow "freeze" our tree to do that. I believe we sort
of need that, since we keep injecting new bugs on top of mostly
unknown existing ones involving and/or impacting many different
subsystems.
Maybe some sort of "puffinfest" would help clean up our kernel
before the situation gets out of control.
I'd really like us to talk about that at OLS, with the guys who
are attending ;)
my 2c
T-Bone
--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-20 1:04 ` Kyle McMartin
@ 2005-07-20 3:31 ` John David Anglin
2005-07-20 2:57 ` Thibaut VARENE
2005-07-20 6:59 ` Kurt Fitzner
1 sibling, 1 reply; 14+ messages in thread
From: John David Anglin @ 2005-07-20 3:31 UTC
To: Kyle McMartin; +Cc: parisc-linux
> TOC dump, IIR, IAOQ/IASQ locations? Come on people... It's really hard
> to even begin to figure out what's wrong if no debugging information
> has been provided...
2.6.8.1-pa11 is quite stable (12 days up and numerous GCC builds).
As Joel has indicated, this can be pushed a bit further. However,
2.6.10 and later are not stable. Randolph and James resolved one
of the major bugs (fp register bug). However, this wasn't sufficient
to stabilize 2.6.12 under load. I spent considerable time trying
to isolate the change(s) that introduced the instability but this
is difficult and time consuming.
It might help to have a "stable" branch that is maintained longer
than is current practice. At a minimum, the current tree needs to
be slushed until the main problems are resolved.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
PS: TOC on Linux gsyprf11.external.hp.com 2.6.11-pa4 #5 SMP Sat May 21 19:09:19 PDT 2005 parisc64 GNU/Linux
Proc 0 r2 -> return from call to __brelse in journal_put_journal_head
Proc 0 IIA Offset -> location in __brelse
Proc 1 IIA Offset -> to final loop in panic.
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 000000001055f4c0 000000001023b3f4 0000000008108a24
04-07 0000000010552cc0 0000000204228918 0000000010517a40 000000013a7a0868
08-11 0000000000000001 0000000204228918 00000000ff85fc00 0000000000001000
12-15 0000000000000001 0000000013566ee8 0000000000000000 0000000204228918
16-19 0000000000000001 0000000000080000 0000000010553cc0 0000000000000000
20-23 0000000010517a40 0000000008108a24 000000000800000f 0000000008108a24
24-27 0000000000000001 00000000ffee4400 0000000204228930 0000000010552cc0
28-31 000000013a7a0868 00000000f8688a80 00000000f8688b30 0000000000000040
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000010c02 0000000000000000 00000000000000c0 000000000000003f
12-15 0000000000000000 0000000000000000 0000000000106000 fff0000000000000
16-19 0002b96338e536de 0000000000000000 00000000101b68dc 0000000008000240
20-23 0000000000000000 0000000000000000 00000000080c000e e200000000000000
24-27 00000000004cc000 00000000a4152000 0000000000041020 0000000041198b80
28-31 5555555555555555 5555555555555555 00000000f8688000 000000001051c000
Space Registers 0 - 7
00-03 04300800 04300800 00000000 04300800
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x00000000101b68d4
CPU State = 0x9e000001
----------------- Processor 1 HPMC Information - PDC Version: 42.09 ------
* * * No valid timestamp * * *
No HPMC chassis codes logged
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x0000000000000000
Check Type = 0x00000000
CPU State = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
Floating Point Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Check Summary = 0x0000000000000000
Available Memory = 0x0000000000000000
CPU Diagnose Register 2 = 0x0000000000000000
CPU Status Register 0 = 0x0000000000000000
CPU Status Register 1 = 0x0000000000000000
SADD LOG = 0x0000000000000000
Read Short LOG = 0x0000000000000000
----------------- Processor 1 LPMC Information ------------------
Check Type = 0x00000000
IC Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
----------------- Processor 1 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 00000000103c5ca0 000000001014c628 00000000fe38c620
04-07 0000000010552cc0 00000000046db5d4 00000000105cc360 00000000105ccfb0
08-11 0000000000000018 0000000010424050 0000000000000001 00000000bd893300
12-15 0000000000000000 00000000ff85fc00 0000000000000008 00000000fe38c288
16-19 00000000fe38c620 00000000ff85b400 0000000000000000 0000000000024089
20-23 0002b96357cd9ae6 000000000009eb10 000000000000ff00 fffffff0f0430ed8
24-27 0000000000000520 0000000000000000 00000000046db5d4 0000000010552cc0
28-31 0000000000000000 00000000fe38cc20 00000000fe38cc50 0203010200802004
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 000000000000e486 0000000000000000 00000000000000c0 0000000000000038
12-15 0000000000000000 0000000000000000 0000000000106000 0000000000000000
16-19 0002b96357d799de 0000000000000000 000000001014c67c 00000000020008b3
20-23 00000000103403b8 00000000e338cba8 000000ff0804fd0f 8140000000000000
24-27 00000000004cc000 00000000d153a000 0000000000041020 000000f0f0165650
28-31 000000f0f0165650 5555555555555555 00000000fe38c000 0000000000008020
Space Registers 0 - 7
00-03 03921800 00000000 00000000 03921800
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x000000001014c680
CPU State = 0x9e000001
-------------- Memory Error Log Information --------------
Bus 0 Log Information
No errors logged for this bus
------------ I/O Module Error Log Information ------------
No I/O module errors logged
Service Menu: Enter command >
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-20 1:04 ` Kyle McMartin
2005-07-20 3:31 ` John David Anglin
@ 2005-07-20 6:59 ` Kurt Fitzner
2005-07-20 16:40 ` Grant Grundler
1 sibling, 1 reply; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-20 6:59 UTC
To: Kyle McMartin; +Cc: parisc-linux
Kyle McMartin wrote:
> TOC dump, IIR, IAOQ/IASQ locations? Come on people... It's really hard
> to even begin to figure out what's wrong if no debugging information
> has been provided...
>
> http://www.parisc-linux.org/faq/kernelbug-howto.html
I apologize - I had not seen that page before. I should have done more
research myself before reporting the issue.
There is no console output prior to the hang, and no kernel fault to
obtain the IAOQ/IASQ information from. I did perform a TOC. The data
from that is below.
If there is any further information that might help, please let me know.
Kurt.
Information about the machine/kernel:
- Kernel 2.6.12-pa2
- Compiled with gcc 3.3.5 (Debian 1:3.3.5-13), Binutils 2.15-6
- B132L w/ 3COM 3c905 ethernet card
- System map at http://www.excelcia.org/~kfitzner/System.map-2.6.12
- Kernel config at http://www.excelcia.org/~kfitzner/config-2.6.12
Output of "ser pim toc":
General Registers 0 - 31
0 - 3 0x00000000 0x10000000 0x101e3910 0x00000000
4 - 7 0x1389a14c 0x1389a034 0x105fcf60 0x1389a108
8 - 11 0x00000000 0x1389a14c 0x00000200 0x15273720
12 - 15 0x00000200 0x00000200 0x00000200 0x00000000
16 - 19 0x14502640 0x10428768 0x00000001 0x000280ca
20 - 23 0x17468122 0x000280ca 0x00000015 0x00000000
24 - 27 0x0000010f 0x1389a0f8 0x17468122 0x10412010
28 - 31 0x00000000 0x03980700 0x13980940 0x1014b700
Control Registers 0 - 31
0 - 3 0x00000000 0x00000000 0x00000000 0x00000000
4 - 7 0x00000000 0x00000000 0x00000000 0x00000000
8 - 11 0x00002632 0x00000000 0x000000c0 0x00000010
12 - 15 0x00000000 0x00000000 0x0010b800 0xf1000000
16 - 19 0x4c31b913 0x00000000 0x1010c1b0 0x001f0e60
20 - 23 0x00000000 0x1010c19c 0x0004ff00 0x01000000
24 - 27 0x004a0000 0x0342a000 0xffffffff 0x40e5fb80
28 - 31 0xaaaaaaaa 0x11111111 0x13980000 0x104ac000
Space Registers 0 - 7
0 - 3 0x00000000 0x00000000 0x00000000 0x00001319
4 - 7 0x00000000 0x00000000 0x00000000 0x00000000
IIA Space = 0x00000000
IIA Offset = 0x1010c1b0
CPU State = 0x9e000001
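Pulling GR02 and the IIA Offset out of a "ser pim toc" dump like the one above is mechanical. A rough sketch in Python, assuming the layouts seen in this thread: a "General Registers" header followed by rows such as "0 - 3 0x... 0x..." or "00-03 0000...", and an "IIA Offset = 0x..." line with or without "(back entry)". The name parse_toc is hypothetical, and if a dump covers several processors only the last one's registers are kept.

```python
import re

# Row like " 0 - 3 0x00000000 ..." or "00-03 0000000000000000 ..."
ROW = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s+(.+)$")
# "IIA Offset = 0x..." or "IIA Offset (back entry) = 0x..."
IIA = re.compile(r"IIA Offset(?:\s*\(back entry\))?\s*=\s*(0x[0-9a-fA-F]+)")


def parse_toc(text: str):
    """Return (general_registers, iia_offset) parsed from a PIM TOC dump.

    general_registers maps register number -> value. If the dump holds
    several processors, later ones overwrite earlier ones.
    """
    gr = {}
    iia = None
    in_general = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("General Registers"):
            in_general = True
            continue
        if "Registers" in stripped:  # Control/Space/Floating Point sections
            in_general = False
            continue
        m = IIA.search(stripped)
        if m:
            iia = int(m.group(1), 16)
            continue
        m = ROW.match(line)
        if m and in_general:
            first = int(m.group(1))
            # int(..., 16) accepts both "0x1234" and bare "1234"
            for k, value in enumerate(m.group(3).split()):
                gr[first + k] = int(value, 16)
    return gr, iia
```

Applied to the dump above, this yields GR02 = 0x101e3910 and IIA Offset = 0x1010c1b0, which are the two values a symbol lookup needs.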
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-20 2:57 ` Thibaut VARENE
@ 2005-07-20 14:56 ` Matthew Wilcox
0 siblings, 0 replies; 14+ messages in thread
From: Matthew Wilcox @ 2005-07-20 14:56 UTC
To: Thibaut VARENE; +Cc: Kyle McMartin, John David Anglin, parisc-linux
On Wed, Jul 20, 2005 at 05:57:19AM +0300, Thibaut VARENE wrote:
> I definitely concur on that. That's something I already suggested on
> IRC a while ago, and I believe that we probably all agree that there's
> such a need. The questions being how to do it properly (maintaining a
> separate "stable" branch is not absolutely trivial), and also *who* is
> going to maintain it.
I believe I said at the time that you were more than welcome to maintain
such a thing. If you're just volunteering me to do more work ... sorry,
not interested.
> In any case, we really want to take time to fix our bugs. I don't know
> if we need to somehow "freeze" our tree to do that. I believe we sort
> of need that, since we keep injecting new bugs on top of mostly
> unknown existing ones involving and/or impacting many different
> subsystems.
Sounds good to me
> Maybe some sort of "puffinfest" would help cleaning up our kernel
> before the situation gets out of control.
>
> I'd really wish we talk about that while at OLS, with the guys that
> are attending it ;)
We can certainly get together at some point ... this week's pretty busy though!
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-20 6:59 ` Kurt Fitzner
@ 2005-07-20 16:40 ` Grant Grundler
2005-07-21 7:42 ` Kurt Fitzner
0 siblings, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-07-20 16:40 UTC
To: Kurt Fitzner; +Cc: Kyle McMartin, parisc-linux
On Wed, Jul 20, 2005 at 12:59:42AM -0600, Kurt Fitzner wrote:
> I apologize - I had not seen that page before. I should have done more
> research myself before reporting the issue.
FAQ has a reference to it. But thanks for reporting the bug.
> There is no console output prior to the hang, and no kernel fault to
> obtain the IAOQ/IASQ information from. I did perform a TOC. The data
> from that is below.
>
> If there is any further information that might help, please let me know.
thanks - the key bits to start with are GR02 and IAOQ:
GR02 0x101e3910 nfs_mark_request_dirty+24
IAOQ 0x1010c1b0 intr_restore+11c
Sounds like either an interrupt storm from the card or a deadlock
in the NFS code. Unfortunately, the TOC doesn't provide more stack trace
information. And I'm not able to chase NFS issues at the moment.
grant
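Turning a raw address such as GR02 or the IAOQ into the symbol+offset form quoted above amounts to finding the highest text symbol in System.map at or below that address. A minimal sketch, assuming the usual System.map line format "<hex address> <type> <name>" and hexadecimal offsets (as in intr_restore+11c); load_text_symbols and resolve are illustrative names, not an existing tool.

```python
import bisect


def load_text_symbols(lines):
    """Parse System.map lines ('<hex addr> <type> <name>'), keeping text symbols."""
    syms = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        addr, sym_type, name = parts[0], parts[1], parts[2]
        if sym_type.lower() != "t":  # 't'/'T' mark code (text) symbols
            continue
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def resolve(syms, addr):
    """Return 'name+offset' (offset in hex) for the closest symbol <= addr."""
    starts = [a for a, _ in syms]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = syms[i]
    return f"{name}+{addr - base:x}"
```

For example, resolve(load_text_symbols(open("System.map-2.6.12")), 0x101e3910) should reproduce the GR02 reading above, provided the map file matches the exact kernel that produced the TOC; a map from a different build gives meaningless answers.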
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-20 16:40 ` Grant Grundler
@ 2005-07-21 7:42 ` Kurt Fitzner
2005-07-21 12:36 ` Grant Grundler
2005-07-21 16:04 ` Kyle McMartin
0 siblings, 2 replies; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-21 7:42 UTC
To: parisc-linux
John David Anglin wrote:
> 2.6.8.1-pa11 is quite stable (12 days up and numerous GCC builds).
I have switched to that version and now cannot reproduce the hang
problem. Thank you for the suggestion.
> It might help to have a "stable" branch that is maintained longer
> than is current practice. At a minimum, the current tree needs to
> be slushed until the main problems are resolved.
I am used to the old classic 'stable' line, where each successive kernel
release in the stable tree was (theoretically) more stable than the
previous one. Perhaps, as a suggestion, a compromise can be reached by
relabelling kernels: when one is found to be quite stable, label it
2.6.N-paX; otherwise, call it 2.6.N-paX-test.
It shouldn't require too much in the way of maintenance, and it might
keep naive users (like me) from running unstable kernels until they
can give meaningful bug reports and feedback on problems in them.
Grant Grundler wrote:
> Sounds like either an interrupt storm from the card or a deadlock
> in nfs code. Unfortunately TOC doesn't provide more stack trace
> information. And I'm not able to chase NFS issues at the moment.
I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
to reproduce the hang. Would it be helpful if I were to identify the
exact kernel version where the hang first begins to occur?
Kurt.
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-21 7:42 ` Kurt Fitzner
@ 2005-07-21 12:36 ` Grant Grundler
2005-07-21 23:28 ` John David Anglin
2005-07-21 16:04 ` Kyle McMartin
1 sibling, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-07-21 12:36 UTC
To: Kurt Fitzner; +Cc: parisc-linux
On Thu, Jul 21, 2005 at 01:42:12AM -0600, Kurt Fitzner wrote:
> I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
> to reproduce the hang. Would it be helpful if I were to identify the
> exact kernel version where the hang first begins to occur?
Definitely. Be warned that this can be very time consuming.
If you can narrow the window to a major kernel release,
that would already be very helpful. The exact -paX version
would of course be perfect.
thanks,
grant
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-21 7:42 ` Kurt Fitzner
2005-07-21 12:36 ` Grant Grundler
@ 2005-07-21 16:04 ` Kyle McMartin
1 sibling, 0 replies; 14+ messages in thread
From: Kyle McMartin @ 2005-07-21 16:04 UTC
To: Kurt Fitzner; +Cc: parisc-linux
On Thu, Jul 21, 2005 at 01:42:12AM -0600, Kurt Fitzner wrote:
> I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
> to reproduce the hang. Would it be helpful if I were to identify the
> exact kernel version where the hang first begins to occur?
>
Binary searching from 2.6.8.1-pa11 onwards would be helpful.
[Pick the middle version between 2.6.8.1-pa11 and current. If it's broken,
continue with the range from 2.6.8.1-pa11 to the middle; otherwise, continue
with the range from the middle to current. Repeat until you can narrow the
timeframe.]
Cheers,
--
Kyle McMartin
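The procedure above is an ordinary binary search over the ordered list of -paX releases. A sketch in Python, assuming the tester fills in that list (from 2.6.8.1-pa11, known good, to 2.6.12-pa2, known bad) and supplies is_bad, a hypothetical callback that boots the kernel and reruns the NFS copy. Because the hang only appears after 1 to 2 GB, a single clean run is weak evidence that a kernel is good, so is_bad should repeat the test before answering False.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of kernel versions.

    Assumes versions[0] is known good and versions[-1] is known bad, and that
    once a version is bad every later one is bad too. is_bad is a placeholder
    for "build/boot this kernel and rerun the dd-over-NFS test".
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # breakage is at or before mid
        else:
            lo = mid  # breakage is after mid
    return versions[hi]
```

Each test run halves the remaining window, so ten candidate releases need at most four runs.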
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-21 12:36 ` Grant Grundler
@ 2005-07-21 23:28 ` John David Anglin
2005-07-22 0:29 ` Kurt Fitzner
0 siblings, 1 reply; 14+ messages in thread
From: John David Anglin @ 2005-07-21 23:28 UTC
To: Grant Grundler; +Cc: parisc-linux
> On Thu, Jul 21, 2005 at 01:42:12AM -0600, Kurt Fitzner wrote:
> > I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
> > to reproduce the hang. Would it be helpful if I were to identify the
> > exact kernel version where the hang first begins to occur?
>
> Definitely. Be warned that this can be very time consuming.
> If you can narrow the window to a major kernel release,
> that would already be very helpful. The exact -paX version
> would of course be perfect.
I know that 32-bit 2.6.10 isn't stable on my c3k. There is a known
bug with kernel memcpy and fpregs. Either James' fix needs to be
backported, or builds need to be done with gcc-4.0.0 (or 4.0.1) using
the -mfixed-range option as discussed previously on the list. I haven't
had time to try this.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-21 23:28 ` John David Anglin
@ 2005-07-22 0:29 ` Kurt Fitzner
2005-07-22 3:55 ` Grant Grundler
0 siblings, 1 reply; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-22 0:29 UTC
To: parisc-linux
John David Anglin wrote:
> I know that 32-bit 2.6.10 isn't stable on my c3k. There is a known
> bug with kernel memcpy and fpregs.
Well, as far as the bug I am reporting goes, so far I have narrowed it
down to a kernel later than 2.6.10-pa11 and before 2.6.11-pa4. It
appears that whatever went into 2.6.10 isn't to blame.
It looks like the interrupt storm theory is best. The functions I get
from the TOC data this time are:
GR02 0x101060e0 handle_interruption+6c
IAOQ 0x101120dc handle_unaligned+2c0
I am curious, though. This time when it hung, the heartbeat didn't stop.
Does this mean that it didn't hang as solid as the other times? Are
interrupts still being handled at some kernel level if the heartbeat LED
is flashing normally? If this is the case, then this means the TOC data
may be useless this time around, right?
Kurt.
* Re: [parisc-linux] Machine hanging during high-traffic NFS
2005-07-22 0:29 ` Kurt Fitzner
@ 2005-07-22 3:55 ` Grant Grundler
0 siblings, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-07-22 3:55 UTC
To: Kurt Fitzner; +Cc: parisc-linux
On Thu, Jul 21, 2005 at 06:29:46PM -0600, Kurt Fitzner wrote:
> John David Anglin wrote:
>
> > I know that 32-bit 2.6.10 isn't stable on my c3k. There is a known
> > bug with kernel memcpy and fpregs.
>
> Well, as far as the bug I am reporting goes, so far I have narrowed it
> down to a kernel later than 2.6.10-pa11 and before 2.6.11-pa4. It
> appears that whatever went into 2.6.10 isn't to blame.
OK... If you were to try one more kernel, could it be 2.6.11-pa1?
> It looks like the interrupt storm theory is best. The functions I get
> from the TOC data this time are:
>
> GR02 0x101060e0 handle_interruption+6c
> IAOQ 0x101120dc handle_unaligned+2c0
yes, that seems likely to me too.
> I am curious, though. This time when it hung, the heartbeat didn't stop.
> Does this mean that it didn't hang as solid as the other times?
that would be my guess too.
> Are
> interrupts still being handled at some kernel level if the heartbeat LED
> is flashing normally?
yes
> If this is the case, then this means the TOC data
> may be useless this time around, right?
Not necessarily. The TOC data may still be useful
for register state.
grant
Thread overview: 14+ messages
2005-07-19 21:02 [parisc-linux] Machine hanging during high-traffic NFS Kurt Fitzner
2005-07-19 23:36 ` Michael S. Zick
2005-07-20 1:04 ` Kyle McMartin
2005-07-20 3:31 ` John David Anglin
2005-07-20 2:57 ` Thibaut VARENE
2005-07-20 14:56 ` Matthew Wilcox
2005-07-20 6:59 ` Kurt Fitzner
2005-07-20 16:40 ` Grant Grundler
2005-07-21 7:42 ` Kurt Fitzner
2005-07-21 12:36 ` Grant Grundler
2005-07-21 23:28 ` John David Anglin
2005-07-22 0:29 ` Kurt Fitzner
2005-07-22 3:55 ` Grant Grundler
2005-07-21 16:04 ` Kyle McMartin