* [parisc-linux] Machine hanging during high-traffic NFS
@ 2005-07-19 21:02 Kurt Fitzner
  2005-07-19 23:36 ` Michael S. Zick
  2005-07-20  1:04 ` Kyle McMartin
  0 siblings, 2 replies; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-19 21:02 UTC
  To: parisc-linux

I've been using nfs to try and save backup images from my B132L
(2.6.12-pa2) with a simple:

    dd if=/dev/sda of=/mnt/bulk/sda-image bs=512

Every time the machine hangs solid - the heartbeat LED even stops.
Usually it hangs after around 1 to 2 gigs have been transferred. There
are no log entries at the time of the hang. It just... stops.

I'm using a 3c905 PCI ethernet card rather than the stock 10 megabit
LASI on board.

I'm wondering if this might be an issue with the ethernet driver when
compiled for PARISC. I've tried very large ftp transfers and can't
reproduce the problem that way.

I've also tried NFS over TCP and tried reducing the rsize/wsize below
1500 bytes to prevent IP fragmentation. Neither seems to help.

Are there any known NFS issues right now? Any ideas? Suggestions?

Kurt.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-19 21:02 [parisc-linux] Machine hanging during high-traffic NFS Kurt Fitzner
@ 2005-07-19 23:36 ` Michael S. Zick
  2005-07-20  1:04 ` Kyle McMartin
  1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Zick @ 2005-07-19 23:36 UTC
  To: parisc-linux

On Tue July 19 2005 16:02, Kurt Fitzner wrote:
> I've been using nfs to try and save backup images from my B132L
> (2.6.12-pa2) with a simple:
> dd if=/dev/sda of=/mnt/bulk/sda-image bs=512
>
> Every time the machine hangs solid - the heartbeat LED even stops.
> Usually it hangs after around 1 to 2 gigs have been transferred. There
> are no log entries at the time of the hang. It just... stops.
>
> I'm using a 3c905 PCI ethernet card rather than the stock 10 megabit
> LASI on board.
>
> I'm wondering if this might be an issue with the ethernet driver when
> compiled for PARISC. I've tried very large ftp transfers and can't
> reproduce the problem that way.
>
> I've also tried NFS over TCP and tried reducing the rsize/wsize below
> 1500 bytes to prevent IP fragmentation. Neither of which seem to help.
>
> Are there any known NFS issues right now? Any ideas? Suggestions?
>

Questions/Suggestions only.

Any hints in the log of the receiving (nfs server) side?

Any portion of /dev/sda mounted somewhere?

Is the /mnt/bulk/sda-image mount point on /dev/sda* ?
That is, is there a drive in common with '/', '/mnt', '/dev'
and the entire device '/dev/sda' ?

Can you achieve your goal with a file copy rather than a disk image?

Have you tried running rsync?

Can you successfully transfer (dd) a single file larger than
your trouble point size when trying to transfer the entire device?

Have you tried a blocksize != 512 with the dd command?
Perhaps an even sub-multiple of the packet size so that the
network stack does not have to fragment the dd blocks.

Mike

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-19 21:02 [parisc-linux] Machine hanging during high-traffic NFS Kurt Fitzner
  2005-07-19 23:36 ` Michael S. Zick
@ 2005-07-20  1:04 ` Kyle McMartin
  2005-07-20  3:31   ` John David Anglin
  2005-07-20  6:59   ` Kurt Fitzner
  1 sibling, 2 replies; 14+ messages in thread
From: Kyle McMartin @ 2005-07-20 1:04 UTC
  To: Kurt Fitzner; +Cc: parisc-linux

On Tue, Jul 19, 2005 at 03:02:55PM -0600, Kurt Fitzner wrote:
> I'm using a 3c905 PCI ethernet card rather than the stock 10 megabit
> LASI on board.
>
> I'm wondering if this might be an issue with the ethernet driver when
> compiled for PARISC. I've tried very large ftp transfers and can't
> reproduce the problem that way.

TOC dump, IIR, IAOQ/IASQ locations? Come on people... It's really hard
to even begin to figure out what's wrong if no debugging information
has been provided...

http://www.parisc-linux.org/faq/kernelbug-howto.html

--
Kyle McMartin
* Re: [parisc-linux] Machine hanging during high-traffic NFS 2005-07-20 1:04 ` Kyle McMartin @ 2005-07-20 3:31 ` John David Anglin 2005-07-20 2:57 ` Thibaut VARENE 2005-07-20 6:59 ` Kurt Fitzner 1 sibling, 1 reply; 14+ messages in thread From: John David Anglin @ 2005-07-20 3:31 UTC (permalink / raw) To: Kyle McMartin; +Cc: parisc-linux > TOC dump, IIR, IOAQ/IASQ locations? Come on people... It's really hard > to even begin to figure out what's wrong if no debugging information > has been provided... 2.6.8.1-pa11 is quite stable (12 days up and numerous GCC builds). As Joel has indicated, this can be pushed a bit further. However, 2.6.10 and later are not stable. Randolph and James resolved one of the major bugs (fp register bug). However, this wasn't sufficient to stabilize 2.6.12 under load. I spent considerable time trying to isolate the change(s) that introduced the instability but this is difficult and time consuming. It might help to have a "stable" branch that is maintained longer than is current practice. At a minimum, the current tree needs to be slushed until the main problems are resolved. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) PS: TOC on Linux gsyprf11.external.hp.com 2.6.11-pa4 #5 SMP Sat May 21 19:09:19 PDT 2005 parisc64 GNU/Linux Proc 0 r2 -> return from call to __brelse in journal_put_journal_head Proc 0 IIA Offset -> location in __brelse Proc 1 IIA Offset -> to final loop in panic. 
----------------- Processor 0 TOC Information ------------------- General Registers 0 - 31 00-03 0000000000000000 000000001055f4c0 000000001023b3f4 0000000008108a24 04-07 0000000010552cc0 0000000204228918 0000000010517a40 000000013a7a0868 08-11 0000000000000001 0000000204228918 00000000ff85fc00 0000000000001000 12-15 0000000000000001 0000000013566ee8 0000000000000000 0000000204228918 16-19 0000000000000001 0000000000080000 0000000010553cc0 0000000000000000 20-23 0000000010517a40 0000000008108a24 000000000800000f 0000000008108a24 24-27 0000000000000001 00000000ffee4400 0000000204228930 0000000010552cc0 28-31 000000013a7a0868 00000000f8688a80 00000000f8688b30 0000000000000040 Control Registers 0 - 31 00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 08-11 0000000000010c02 0000000000000000 00000000000000c0 000000000000003f 12-15 0000000000000000 0000000000000000 0000000000106000 fff0000000000000 16-19 0002b96338e536de 0000000000000000 00000000101b68dc 0000000008000240 20-23 0000000000000000 0000000000000000 00000000080c000e e200000000000000 24-27 00000000004cc000 00000000a4152000 0000000000041020 0000000041198b80 28-31 5555555555555555 5555555555555555 00000000f8688000 000000001051c000 Space Registers 0 - 7 00-03 04300800 04300800 00000000 04300800 04-07 00000000 00000000 00000000 00000000 IIA Space (back entry) = 0x0000000000000000 IIA Offset (back entry) = 0x00000000101b68d4 CPU State = 0x9e000001 ----------------- Processor 1 HPMC Information - PDC Version: 42.09 ------ * * * No valid timestamp * * * No HPMC chassis codes logged General Registers 0 - 31 00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000 12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000 16-19 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000 24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000 28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Control Registers 0 - 31 00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000 12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000 16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000 20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000 24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000 28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Space Registers 0 - 7 00-03 00000000 00000000 00000000 00000000 04-07 00000000 00000000 00000000 00000000 IIA Space (back entry) = 0x0000000000000000 IIA Offset (back entry) = 0x0000000000000000 Check Type = 0x00000000 CPU State = 0x00000000 Cache Check = 0x00000000 TLB Check = 0x00000000 Bus Check = 0x00000000 Assists Check = 0x00000000 Assist State = 0x00000000 Path Info = 0x00000000 System Responder Address = 0x0000000000000000 System Requestor Address = 0x0000000000000000 Floating Point Registers 0 - 31 00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000 12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000 16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000 20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000 24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000 28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
Check Summary = 0x0000000000000000 Available Memory = 0x0000000000000000 CPU Diagnose Register 2 = 0x0000000000000000 CPU Status Register 0 = 0x0000000000000000 CPU Status Register 1 = 0x0000000000000000 SADD LOG = 0x0000000000000000 Read Short LOG = 0x0000000000000000 ----------------- Processor 1 LPMC Information ------------------ Check Type = 0x00000000 IC Parity Info = 0x00000000 Cache Check = 0x00000000 TLB Check = 0x00000000 Bus Check = 0x00000000 Assists Check = 0x00000000 Assist State = 0x00000000 Path Info = 0x00000000 System Responder Address = 0x0000000000000000 System Requestor Address = 0x0000000000000000 ----------------- Processor 1 TOC Information ------------------- General Registers 0 - 31 00-03 0000000000000000 00000000103c5ca0 000000001014c628 00000000fe38c620 04-07 0000000010552cc0 00000000046db5d4 00000000105cc360 00000000105ccfb0 08-11 0000000000000018 0000000010424050 0000000000000001 00000000bd893300 12-15 0000000000000000 00000000ff85fc00 0000000000000008 00000000fe38c288 16-19 00000000fe38c620 00000000ff85b400 0000000000000000 0000000000024089 20-23 0002b96357cd9ae6 000000000009eb10 000000000000ff00 fffffff0f0430ed8 24-27 0000000000000520 0000000000000000 00000000046db5d4 0000000010552cc0 28-31 0000000000000000 00000000fe38cc20 00000000fe38cc50 0203010200802004 Control Registers 0 - 31 00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 08-11 000000000000e486 0000000000000000 00000000000000c0 0000000000000038 12-15 0000000000000000 0000000000000000 0000000000106000 0000000000000000 16-19 0002b96357d799de 0000000000000000 000000001014c67c 00000000020008b3 20-23 00000000103403b8 00000000e338cba8 000000ff0804fd0f 8140000000000000 24-27 00000000004cc000 00000000d153a000 0000000000041020 000000f0f0165650 28-31 000000f0f0165650 5555555555555555 00000000fe38c000 0000000000008020 Space Registers 0 - 7 00-03 03921800 00000000 00000000 03921800 04-07 
00000000 00000000 00000000 00000000 IIA Space (back entry) = 0x0000000000000000 IIA Offset (back entry) = 0x000000001014c680 CPU State = 0x9e000001 -------------- Memory Error Log Information -------------- Bus 0 Log Information No errors logged for this bus ------------ I/O Module Error Log Information ------------ No I/O module errors logged Service Menu: Enter command > _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 14+ messages in thread

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-20  3:31 ` John David Anglin
@ 2005-07-20  2:57   ` Thibaut VARENE
  2005-07-20 14:56     ` Matthew Wilcox
  0 siblings, 1 reply; 14+ messages in thread
From: Thibaut VARENE @ 2005-07-20 2:57 UTC
  To: John David Anglin; +Cc: Kyle McMartin, parisc-linux

JDA wrote:
> It might help to have a "stable" branch that is maintained longer
> than is current practice. At a minimum, the current tree needs to
> be slushed until the main problems are resolved.

I definitely concur on that. That's something I already suggested on
IRC a while ago, and I believe that we probably all agree that there's
such a need. The questions being how to do it properly (maintaining a
separate "stable" branch is not absolutely trivial), and also *who* is
going to maintain it.

In any case, we really want to take time to fix our bugs. I don't know
if we need to somehow "freeze" our tree to do that. I believe we sort
of need that, since we keep injecting new bugs on top of mostly
unknown existing ones involving and/or impacting many different
subsystems.

Maybe some sort of "puffinfest" would help cleaning up our kernel
before the situation gets out of control.

I'd really wish we talk about that while at OLS, with the guys that
are attending it ;)

my 2c

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-20  2:57   ` Thibaut VARENE
@ 2005-07-20 14:56     ` Matthew Wilcox
  0 siblings, 0 replies; 14+ messages in thread
From: Matthew Wilcox @ 2005-07-20 14:56 UTC
  To: Thibaut VARENE; +Cc: Kyle McMartin, John David Anglin, parisc-linux

On Wed, Jul 20, 2005 at 05:57:19AM +0300, Thibaut VARENE wrote:
> I definitely concur on that. That's something I already suggested on
> IRC a while ago, and I believe that we probably all agree that there's
> such a need. The questions being how to do it properly (maintaining a
> separate "stable" branch is not absolutely trivial), and also *who* is
> going to maintain it.

I believe I said at the time that you were more than welcome to maintain
such a thing. If you're just volunteering me to do more work ... sorry,
not interested.

> In any case, we really want to take time to fix our bugs. I don't know
> if we need to somehow "freeze" our tree to do that. I believe we sort
> of need that, since we keep injecting new bugs on top of mostly
> unknown existing ones involving and/or impacting many different
> subsystems.

Sounds good to me

> Maybe some sort of "puffinfest" would help cleaning up our kernel
> before the situation gets out of control.
>
> I'd really wish we talk about that while at OLS, with the guys that
> are attending it ;)

We can certainly get together at some point ... this week's pretty busy
though!

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-20  1:04 ` Kyle McMartin
  2005-07-20  3:31   ` John David Anglin
@ 2005-07-20  6:59   ` Kurt Fitzner
  2005-07-20 16:40     ` Grant Grundler
  1 sibling, 1 reply; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-20 6:59 UTC
  To: Kyle McMartin; +Cc: parisc-linux

Kyle McMartin wrote:
> TOC dump, IIR, IAOQ/IASQ locations? Come on people... It's really hard
> to even begin to figure out what's wrong if no debugging information
> has been provided...
>
> http://www.parisc-linux.org/faq/kernelbug-howto.html

I apologize - I had not seen that page before. I should have done more
research myself before reporting the issue.

There is no console output prior to the hang, and no kernel fault to
obtain the IAOQ/IASQ information from. I did perform a TOC. The data
from that is below.

If there is any further information that might help, please let me know.

Kurt.

Information about the machine/kernel:
- Kernel 2.6.12-pa2
- Compiled with gcc 3.3.5 (Debian 1:3.3.5-13), Binutils 2.15-6
- B132L w/ 3COM 3c905 ethernet card
- System map at http://www.excelcia.org/~kfitzner/System.map-2.6.12
- Kernel config at http://www.excelcia.org/~kfitzner/config-2.6.12

Output of "ser pim toc":

General Registers 0 - 31
 0 -  3  0x00000000  0x10000000  0x101e3910  0x00000000
 4 -  7  0x1389a14c  0x1389a034  0x105fcf60  0x1389a108
 8 - 11  0x00000000  0x1389a14c  0x00000200  0x15273720
12 - 15  0x00000200  0x00000200  0x00000200  0x00000000
16 - 19  0x14502640  0x10428768  0x00000001  0x000280ca
20 - 23  0x17468122  0x000280ca  0x00000015  0x00000000
24 - 27  0x0000010f  0x1389a0f8  0x17468122  0x10412010
28 - 31  0x00000000  0x03980700  0x13980940  0x1014b700

Control Registers 0 - 31
 0 -  3  0x00000000  0x00000000  0x00000000  0x00000000
 4 -  7  0x00000000  0x00000000  0x00000000  0x00000000
 8 - 11  0x00002632  0x00000000  0x000000c0  0x00000010
12 - 15  0x00000000  0x00000000  0x0010b800  0xf1000000
16 - 19  0x4c31b913  0x00000000  0x1010c1b0  0x001f0e60
20 - 23  0x00000000  0x1010c19c  0x0004ff00  0x01000000
24 - 27  0x004a0000  0x0342a000  0xffffffff  0x40e5fb80
28 - 31  0xaaaaaaaa  0x11111111  0x13980000  0x104ac000

Space Registers 0 - 7
 0 -  3  0x00000000  0x00000000  0x00000000  0x00001319
 4 -  7  0x00000000  0x00000000  0x00000000  0x00000000

IIA Space  = 0x00000000
IIA Offset = 0x1010c1b0
CPU State  = 0x9e000001

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-20  6:59   ` Kurt Fitzner
@ 2005-07-20 16:40     ` Grant Grundler
  2005-07-21  7:42       ` Kurt Fitzner
  0 siblings, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-07-20 16:40 UTC
  To: Kurt Fitzner; +Cc: Kyle McMartin, parisc-linux

On Wed, Jul 20, 2005 at 12:59:42AM -0600, Kurt Fitzner wrote:
> I apologize - I had not seen that page before. I should have done more
> research myself before reporting the issue.

FAQ has a reference to it.
But thanks for reporting the bug.

> There is no console output prior to the hang, and no kernel fault to
> obtain the IAOQ/IASQ information from. I did perform a TOC. The data
> from that is below.
>
> If there is any further information that might help, please let me know.

thanks - the key bit to start with is GR02 and IAOQ:

    GR02 0x101e3910 nfs_mark_request_dirty+24
    IAOQ 0x1010c1b0 intr_restore+11c

Sounds like either an interrupt storm from the card or a deadlock
in nfs code. Unfortunately TOC doesn't provide more stack trace
information. And I'm not able to chase NFS issues at the moment.

grant
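
The symbol+offset pairs Grant quotes come from looking each address up against the kernel's System.map. A minimal sketch of that lookup follows, assuming the usual three-column `address type name` map format. The two map entries in `SAMPLE_MAP` are not taken from Kurt's real System.map: they are back-computed from the offsets quoted above, on the assumption that those offsets are hexadecimal, purely so the example has something to resolve against.

```python
import bisect


def load_symbols(map_text):
    """Parse System.map-style lines ("address type name") into a sorted list."""
    symbols = []
    for line in map_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return hex(addr)  # address lies below every known symbol
    base, name = symbols[i]
    return f"{name}+{addr - base:#x}"


# Hypothetical entries, back-computed from the quoted offsets (see above).
SAMPLE_MAP = """\
1010c094 t intr_restore
101e38ec T nfs_mark_request_dirty
"""

symbols = load_symbols(SAMPLE_MAP)
print("GR02", resolve(symbols, 0x101E3910))
print("IAOQ", resolve(symbols, 0x1010C1B0))
```

With the real System.map loaded instead of `SAMPLE_MAP`, the same two calls would be fed GR02 and the IIA Offset from the TOC output.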

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-20 16:40     ` Grant Grundler
@ 2005-07-21  7:42       ` Kurt Fitzner
  2005-07-21 12:36         ` Grant Grundler
  2005-07-21 16:04         ` Kyle McMartin
  0 siblings, 2 replies; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-21 7:42 UTC
  To: parisc-linux

John David Anglin wrote:
> 2.6.8.1-pa11 is quite stable (12 days up and numerous GCC builds).

I have switched to that version and now cannot reproduce the hang
problem. Thank you for the suggestion.

> It might help to have a "stable" branch that is maintained longer
> than is current practice. At a minimum, the current tree needs to
> be slushed until the main problems are resolved.

I am used to the old classic 'stable' line where each successive kernel
release under the stable tree was (theoretically) more stable than the
previous one. Perhaps, as a suggestion, a compromise can be reached by
relabelling kernels. When one is found to be quite stable, label it
2.6.N-paX. Otherwise, call it 2.6.N-paX-test. It shouldn't require too
much in the way of maintenance, and it might keep naive users (like me)
from using unstable kernels before they are ready to give meaningful bug
reports and feedback on problems in them.

Grant Grundler wrote:
> Sounds like either an interrupt storm from the card or a deadlock
> in nfs code. Unfortunately TOC doesn't provide more stack trace
> information. And I'm not able to chase NFS issues at the moment.

I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
to reproduce the hang. Would it be helpful if I were to identify the
exact kernel version where the hang first begins to occur?

Kurt.

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-21  7:42       ` Kurt Fitzner
@ 2005-07-21 12:36         ` Grant Grundler
  2005-07-21 23:28           ` John David Anglin
  2005-07-21 16:04         ` Kyle McMartin
  1 sibling, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-07-21 12:36 UTC
  To: Kurt Fitzner; +Cc: parisc-linux

On Thu, Jul 21, 2005 at 01:42:12AM -0600, Kurt Fitzner wrote:
> I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
> to reproduce the hang. Would it be helpful if I were to identify the
> exact kernel version where the hang first begins to occur?

Definitely. Be warned that this can be very time consuming.
If you can narrow the window to a major kernel release,
that would already be very helpful. The exact -paX version
would of course be perfect.

thanks,
grant

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-21 12:36         ` Grant Grundler
@ 2005-07-21 23:28           ` John David Anglin
  2005-07-22  0:29             ` Kurt Fitzner
  0 siblings, 1 reply; 14+ messages in thread
From: John David Anglin @ 2005-07-21 23:28 UTC
  To: Grant Grundler; +Cc: parisc-linux

> On Thu, Jul 21, 2005 at 01:42:12AM -0600, Kurt Fitzner wrote:
> > I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
> > to reproduce the hang. Would it be helpful if I were to identify the
> > exact kernel version where the hang first begins to occur?
>
> Definitely. Be warned that this can be very time consuming.
> If you can narrow the window to a major kernel release,
> that would already be very helpful. The exact -paX version
> would of course be perfect.

I know that 32-bit 2.6.10 isn't stable on my c3k. There is a known
bug with kernel memcpy and fpregs. Either James' fix needs to be
backported, or builds need to be done with gcc-4.0.0 (or 4.0.1) using
the -mfixed-range as discussed previously on the list. I haven't had
time to try this.

Dave
--
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-21 23:28           ` John David Anglin
@ 2005-07-22  0:29             ` Kurt Fitzner
  2005-07-22  3:55               ` Grant Grundler
  0 siblings, 1 reply; 14+ messages in thread
From: Kurt Fitzner @ 2005-07-22 0:29 UTC
  To: parisc-linux

John David Anglin wrote:

> I know that 32-bit 2.6.10 isn't stable on my c3k. There is a known
> bug with kernel memcpy and fpregs.

Well, as far as the bug I am reporting goes, so far I have narrowed it
down to a kernel later than 2.6.10-pa11 and before 2.6.11-pa4. It
appears that whatever went into 2.6.10 isn't to blame.

It looks like the interrupt storm theory is best. The functions I get
from the TOC data this time are:

    GR02 0x101060e0 handle_interruption+6c
    IAOQ 0x101120dc handle_unaligned+2c0

I am curious, though. This time when it hung the heartbeat didn't stop.
Does this mean that it didn't hang as solid as the other times? Are
interrupts still being handled at some kernel level if the heartbeat LED
is flashing normally? If this is the case, then this means the TOC data
may be useless this time around, right?

Kurt.

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-22  0:29             ` Kurt Fitzner
@ 2005-07-22  3:55               ` Grant Grundler
  0 siblings, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-07-22 3:55 UTC
  To: Kurt Fitzner; +Cc: parisc-linux

On Thu, Jul 21, 2005 at 06:29:46PM -0600, Kurt Fitzner wrote:
> John David Anglin wrote:
>
> > I know that 32-bit 2.6.10 isn't stable on my c3k. There is a known
> > bug with kernel memcpy and fpregs.
>
> Well, as far as the bug I am reporting goes, so far I have narrowed it
> down to a kernel later than 2.6.10-pa11 and before 2.6.11-pa4. It
> appears that whatever went into 2.6.10 isn't to blame.

Ok...If you were to try one more kernel, could it be 2.6.11-pa1?

> It looks like the interrupt storm theory is best. The functions I get
> from the TOC data this time are:
>
> GR02 0x101060e0 handle_interruption+6c
> IAOQ 0x101120dc handle_unaligned+2c0

yes, seems like it's likely too.

> I am curious, though. This time when it hung the heartbeat didn't stop.
> Does this mean that it didn't hang as solid as the other times?

that would be my guess too.

> Are
> interrupts still being handled at some kernel level if the heartbeat LED
> is flashing normally?

yes

> If this is the case, then this means the TOC data
> may be useless this time around, right?

Not necessarily.
The TOC data may still be useful for register state.

grant

* Re: [parisc-linux] Machine hanging during high-traffic NFS
  2005-07-21  7:42       ` Kurt Fitzner
  2005-07-21 12:36         ` Grant Grundler
@ 2005-07-21 16:04         ` Kyle McMartin
  1 sibling, 0 replies; 14+ messages in thread
From: Kyle McMartin @ 2005-07-21 16:04 UTC
  To: Kurt Fitzner; +Cc: parisc-linux

On Thu, Jul 21, 2005 at 01:42:12AM -0600, Kurt Fitzner wrote:
> I 'downgraded' to 2.6.8.1-pa11 as Mr. Anglin suggested and I am not able
> to reproduce the hang. Would it be helpful if I were to identify the
> exact kernel version where the hang first begins to occur?
>

Binary searching from 2.6.8.1-pa11 onwards would be helpful.

[Pick the middle version between 2.6.8.1-pa11 and current. If it's
broken, try the middle of the range from 2.6.8.1-pa11 to that version;
otherwise try the middle of the range from that version to current.
Continue until you have narrowed the timeframe.]

Cheers,
--
Kyle McMartin
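
The halving procedure Kyle describes can be sketched as below. This is only an illustration under stated assumptions: the version labels are placeholders rather than real -paX releases, and `hangs()` stands in for the manual step of booting a kernel and re-running the NFS `dd` test. The only requirements are that the list is ordered oldest to newest, the first entry is known good, and the last is known bad.

```python
def first_bad(versions, is_bad):
    """Binary-search an ordered version list for the first bad entry.

    Assumes versions[0] is known good and versions[-1] is known bad,
    and that every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return versions[hi]


# Placeholder labels, oldest to newest (hypothetical, not real releases).
candidates = [f"kernel-{i}" for i in range(8)]

# Hypothetical stand-in for "boot this kernel and see whether the dd hangs".
tested = []

def hangs(version):
    tested.append(version)
    return int(version.split("-")[1]) >= 5  # made-up breakage point

culprit = first_bad(candidates, hangs)
print(culprit, "found after testing", tested)
```

Each round of testing halves the remaining window, so eight candidates need only three boots here instead of up to six for a linear walk.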