* Linux 2.4.21-rc7
From: Marcelo Tosatti @ 2003-06-03 17:04 UTC
To: lkml
Hallo,
Now I really hope its the last one, all this rc's are making me mad.
Ok, here it is.
Summary of changes from v2.4.21-rc6 to v2.4.21-rc7
============================================
<ehabkost@conectiva.com.br>:
o [SPARC]: Export phys_base on sparc32
<jgarzik@pobox.com>:
o fix olympic driver build
<lethal@linux-sh.org>:
o Fix Solution Engine 7751 Build
o Define VM_DATA_DEFAULT_FLAGS for SH
<wesolows@foobazco.org>:
o [sparc]: Attempt mul/div emulation handling on all cpus
David S. Miller <davem@nuts.ninka.net>:
o [SPARC]: Fix sys_ipc to return ENOSYS instead of EINVAL as appropriate
o [SPARC64]: Implement dump_stack in 2.4.x
o [SPARC64]: Only use power interrupt when button property exists
o [IPV4/IPV6]: Use Jenkins hash for fragment reassembly handling
o [IPV6]: Input full addresses into TCP_SYNQ hash function
o [IPV4]: Add sysctl to control ipfrag_secret_interval
o [SPARC64]: Fix probe error handling in envctrl.c driver
o [SPARC64]: Fix probe error handling in bbc_{envctrl,i2c}.c driver
o [SPARC64]: Fix exploitable holes and bugs in ioctl32 translations
Douglas Gilbert <dougg@torque.net>:
o sg: Fix side effect introduced by last "off by one" fix
Eric Brower <ebrower@usa.net>:
o [SPARC]: Refactor AUXIO support
Marcelo Tosatti <marcelo@freak.distro.conectiva>:
o Changed EXTRAVERSION to -rc7
Pete Zaitcev <zaitcev@redhat.com>:
o [sparc] Force type in __put_user
o [SPARC]: Fix gcc-3.x builds
Rob Radez <rob@osinvestor.com>:
o [sparc]: Fix uninitialized spinlock in SRMMU code
o [SPARC]: Kill initialize_secondary, unused

* Re: Linux 2.4.21-rc7
From: Tomas Szepe @ 2003-06-03 18:02 UTC
To: Marcelo Tosatti; +Cc: lkml, alan
> [marcelo@conectiva.com.br]
>
> Now I really hope its the last one, all this rc's are making me mad.

Are you quite sure you don't want Alan to get you the updates necessary
for IDE to build as modules for .21 final?

--
Tomas Szepe <szepe@pinerecords.com>

* Re: Linux 2.4.21-rc7
From: Marcelo Tosatti @ 2003-06-03 18:07 UTC
To: Tomas Szepe; +Cc: lkml, alan
On Tue, 3 Jun 2003, Tomas Szepe wrote:
> > [marcelo@conectiva.com.br]
> >
> > Now I really hope its the last one, all this rc's are making me mad.
>
> Are you quite sure you don't want Alan to get you the updates necessary
> for IDE to build as modules for .21 final?

Well, I can for sure release -rc8 with that.

I just want this possible -rc8 to be released no later than tonight.

Alan?

* Re: Linux 2.4.21-rc7
From: lk @ 2003-06-03 19:15 UTC
To: Marcelo Tosatti; +Cc: lkml
> > > Now I really hope its the last one, all this rc's are making me mad.
> >
> > Are you quite sure you don't want Alan to get you the updates necessary
> > for IDE to build as modules for .21 final?
>
> Well, I can for sure release -rc8 with that.
>
> I just want this possible -rc8 to be released no later than tonight.

Unfortunately I just committed my test box to production and can't test
Alan's SiImage fixes in rc6-ac2, but if they pan out, please try to
include them in -rc8 as well.

* Re: Linux 2.4.21-rc7
From: Alan Cox @ 2003-06-03 19:40 UTC
To: lk; +Cc: Marcelo Tosatti, lkml
On Maw, 2003-06-03 at 20:15, lk@trolloc.com wrote:
> Unfortunately I just committed my test box to production and can't test
> Alan's SiImage fixes in rc6-ac2, but if they pan out, please try to
> include them in -rc8 as well.

You could add the dma autoenable but the rest should be avoided

* Re: Linux 2.4.21-rc7
From: Alex Romosan @ 2003-06-03 18:30 UTC
To: Marcelo Tosatti; +Cc: lkml
Marcelo Tosatti <marcelo@conectiva.com.br> writes:

> Now I really hope its the last one, all this rc's are making me mad.

i still can't get it to compile for sparc32:

gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
/usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:81: error: asm-specifier for variable `s' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_to_user':
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `d' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `l' conflicts with asm clobber list
/usr/src/linux/include/asm/checksum.h:108: error: asm-specifier for variable `s' conflicts with asm clobber list
make[3]: *** [ksyms.o] Error 1
make[3]: Leaving directory `/usr/src/linux/kernel'
make[2]: *** [first_rule] Error 2
make[2]: Leaving directory `/usr/src/linux/kernel'
make[1]: *** [_dir_kernel] Error 2
make[1]: Leaving directory `/usr/src/linux'
make: *** [stamp-build] Error 2

not sure when this started. the last kernel i managed to compile was
rc2 (skipped rc3 and rc4, rc5 didn't compile). the last one that will
boot was 2.4.21-pre1. this is on a sun4m Fujitsu TurboSparc.

--alex--

--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with   |
| automatism and other passive states) to systematize confusion   |
| and thus to help to discredit completely the world of reality.  |

* Re: Linux 2.4.21-rc7
From: Jeff Garzik @ 2003-06-03 19:27 UTC
To: Alex Romosan; +Cc: Marcelo Tosatti, lkml
On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
>
> > Now I really hope its the last one, all this rc's are making me mad.
>
> i still can't get it to compile for sparc32:
>
> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':

That looks like you either need a different compiler version,
or different binutils version...

	Jeff

* Re: Linux 2.4.21-rc7
From: Alex Romosan @ 2003-06-03 19:58 UTC
To: Jeff Garzik; +Cc: Marcelo Tosatti, lkml
Jeff Garzik <jgarzik@pobox.com> writes:

> On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
>> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
>>
>> > Now I really hope its the last one, all this rc's are making me mad.
>>
>> i still can't get it to compile for sparc32:
>>
>> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
>> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
>> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
>> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
>> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
>
> That looks like you either need a different compiler version,
> or different binutils version...

gcc (GCC) 3.3 (Debian)
GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux

the same versions work on i386 though...

--alex--

--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with   |
| automatism and other passive states) to systematize confusion   |
| and thus to help to discredit completely the world of reality.  |

* Re: Linux 2.4.21-rc7
From: Tom Rini @ 2003-06-03 20:14 UTC
To: Alex Romosan; +Cc: Jeff Garzik, Marcelo Tosatti, lkml
On Tue, Jun 03, 2003 at 12:58:40PM -0700, Alex Romosan wrote:
> Jeff Garzik <jgarzik@pobox.com> writes:
>
> > On Tue, Jun 03, 2003 at 11:30:59AM -0700, Alex Romosan wrote:
> >> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> >>
> >> > Now I really hope its the last one, all this rc's are making me mad.
> >>
> >> i still can't get it to compile for sparc32:
> >>
> >> gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -m32 -pipe -mno-fpu -fcall-used-g5 -fcall-used-g7 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ksyms -DEXPORT_SYMTAB -c ksyms.c
> >> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_nocheck':
> >> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `d' conflicts with asm clobber list
> >> /usr/src/linux/include/asm/checksum.h:59: error: asm-specifier for variable `l' conflicts with asm clobber list
> >> /usr/src/linux/include/asm/checksum.h: In function `csum_partial_copy_from_user':
> >
> > That looks like you either need a different compiler version,
> > or different binutils version...
>
> gcc (GCC) 3.3 (Debian)
> GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux

That would do it.

> the same versions work on i386 though...

Yes, but i386 either didn't have the now-invalid clobber lists, or they
were fixed in the -pre portion (as they were on PPC32 as well).

--
Tom Rini
http://gate.crashing.org/~trini/

* Re: Linux 2.4.21-rc7
From: David S. Miller @ 2003-06-04 3:35 UTC
To: Tom Rini; +Cc: Alex Romosan, Jeff Garzik, Marcelo Tosatti, lkml
On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
> > gcc (GCC) 3.3 (Debian)
> > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
>
> That would do it.

I don't trust anything past gcc-3.2.x on sparc and sparc64.
Use 3.3.x and later at your own peril.

--
David S. Miller <davem@redhat.com>

* Re: Linux 2.4.21-rc7
From: Mr. James W. Laferriere @ 2003-06-04 15:09 UTC
To: David S. Miller
Cc: Tom Rini, Alex Romosan, Jeff Garzik, Marcelo Tosatti, lkml
	Hello Dave, thank you for the warning. Now how about the why,
	layman's style? TIA, JimL

On Tue, 3 Jun 2003, David S. Miller wrote:
> On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
> > > gcc (GCC) 3.3 (Debian)
> > > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
> > That would do it.
> I don't trust anything past gcc-3.2.x on sparc and sparc64.
> Use 3.3.x and later at your own peril.

--
       +------------------------------------------------------------------+
       | James   W.   Laferriere | System    Techniques | Give me VMS     |
       | Network        Engineer |     P.O. Box 854     |  Give me Linux  |
       | babydr@baby-dragons.com | Coudersport PA 16915 |   only  on  AXP |
       +------------------------------------------------------------------+

* Re: Linux 2.4.21-rc7
From: Alex Romosan @ 2003-06-04 23:37 UTC
To: David S. Miller; +Cc: Tom Rini, Jeff Garzik, Marcelo Tosatti, lkml
"David S. Miller" <davem@redhat.com> writes:

> On Tue, 2003-06-03 at 13:14, Tom Rini wrote:
>> > gcc (GCC) 3.3 (Debian)
>> > GNU ld version 2.14.90.0.4 20030523 Debian GNU/Linux
>>
>> That would do it.
>
> I don't trust anything past gcc-3.2.x on sparc and sparc64.
> Use 3.3.x and later at your own peril.

recompiled with gcc-3.2.3 and the kernel not only compiled but also
booted. thank you.

--alex--

--
| I believe the moment is at hand when, by a paranoiac and active |
| advance of the mind, it will be possible (simultaneously with   |
| automatism and other passive states) to systematize confusion   |
| and thus to help to discredit completely the world of reality.  |
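
The advice in this sub-thread boils down to a compiler version ceiling for sparc and sparc64 builds. As a rough sketch only (it is not part of any message above and not taken from the kernel build system), the check could be expressed as follows. The names parse_gcc_version, trusted_for_sparc and TRUSTED_MAX are hypothetical, and the 3.2 ceiling is simply David Miller's remark turned into a constant.

```python
import re

# Ceiling taken from the remark in this thread ("I don't trust anything
# past gcc-3.2.x on sparc and sparc64"); an assumption, not a kernel rule.
TRUSTED_MAX = (3, 2)


def parse_gcc_version(banner):
    """Extract (major, minor, patch) from a `gcc --version` banner line."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError("no version number found in %r" % banner)
    major, minor, patch = match.groups()
    # A banner such as "3.3" carries no patch level; treat it as 0.
    return int(major), int(minor), int(patch or 0)


def trusted_for_sparc(banner):
    """Return True if the compiler is no newer than the 3.2.x series."""
    return parse_gcc_version(banner)[:2] <= TRUSTED_MAX


if __name__ == "__main__":
    # The two compilers mentioned in the thread.
    for banner in ("gcc (GCC) 3.3 (Debian)", "gcc (GCC) 3.2.3"):
        print(banner, "->", trusted_for_sparc(banner))
```

Comparing only the (major, minor) pair keeps every 3.2.x patch release on the trusted side, which matches the outcome reported above: gcc 3.3 failed to build the sparc32 kernel, while gcc-3.2.3 built one that booted.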

* Re: Linux 2.4.21-rc7
From: Andreas Haumer @ 2003-06-05 12:09 UTC
To: Marcelo Tosatti; +Cc: lkml
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Marcelo Tosatti wrote:
> Hallo,
>
> Now I really hope its the last one, all this rc's are making me mad.
>
;-)

So, here's a report on the more positive side...

As I mentioned in some e-mails in the last few days,
I'm currently testing an Asus AP1700-S5 server with
a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
4x36GB U320SCSI drives (3 of them are assembled as RAID5),
connected via GBit Ethernet to our internal network

root@setup:~ {533} $ lspci
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
03:02.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)

root@setup:~ {538} $ uptime
  2:05pm  up 18:09, 11 users,  load average: 8.03, 8.45, 8.15

This system has been running 2.4.21-rc7 for more than 18 hours now
with the following load:

*) an endless loop to create and remove a large file on the
   RAID5 (ext3 filesystem):

   while true; do
       time dd if=/dev/zero of=/var/tmp/largefile bs=1M count=2000
       rm -f /var/tmp/largefile
   done

*) some commands to create additional load:

   cd /
   find . boot/ usr/ tmp/ opt/ var/ -xdev -type f -exec md5sum {} \;

*) NFS copy of a whole 40GB filesystem tree from a Linux NFS
   server to the RAID5 (in a loop)

*) the system is also NFS serving a Linux NFS client, which
   copies the whole server filesystem into /dev/null

*) Additionally, I have the following programs running:
   - Squid (currently used as proxy for our internal web browsers)
   - Apache
   - jedit (with j2sdk-1.4.1_01)
   - StarOffice-5.2
   - Mozilla-1.3.1
   - and lots of additional programs (shell, sshd, emacs),
     but no X server (we are using Linux workstations as
     X-Terminals)

All in all, there are more than 190 processes at any point
in time in the past 18 hours.

This all produces a permanent load between 7 and 9

vmstat 1
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 0  4  4 111720   3220  11344 423820    0    0     4 18976 4892  4273   2  68  30
 0  4  3 111720   3204  11352 423728   32    0    80 25216 1460  2095   0  15  85
 0  4  3 111716   3332  11352 423364   76    0    92 25796 1432  1895   2  14  84
 0  4  3 111716   3208  11372 423392   48    0   712 26336 1566  2346   4  14  81
 0  6  3 111716   3208  11412 423196  132    0   420 32820 1774  3113  12  19  69
 0  5  3 111716   3376  11440 422340  704    0   924 24444 1570  2811   3  17  79
 6  2  4 111716   2328  11560 423988  536    0   700 32088 2268  4590   6  73  21
11  3  4 111764  63352  11604 321148   16  308   310 36868 2267  5390  12  46  42

root@setup:~ {537} $ uptime
  1:37pm  up 17:41, 10 users,  load average: 7.94, 7.31, 7.18

Under these circumstances, I made the following observations:

a) The system has been running stable for more than 18 hours now

b) It seems to behave quite fine, given the load.
   Response time for all services (web-proxy, web-server)
   is reasonably low (you almost don't notice any delay)

c) Interactive programs (Mozilla, StarOffice, JEdit) are
   still quite usable. There is some delay when opening a
   file in SO (say, about 2-3 seconds), but that's fine

d) Sometimes (but not really reproducible) I noticed a _big_
   delay when connecting to the server using SSH (with "big",
   I mean 1 minute or so). I eventually get a connection, and
   then can work as normal.

e) The server uses a single, but hyperthreaded CPU.
   Hyperthreading is enabled, and Linux shows both
   logical CPUs:

root@setup:~ {529} $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.169
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4771.02

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.169
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4771.02

But interrupt distribution seems a little bit strange:

root@setup:~ {530} $ cat /proc/interrupts
           CPU0       CPU1
  0:    6318080          0    IO-APIC-edge  timer
  1:        967          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  4:      32477          0    IO-APIC-edge  serial
  5:   55629300          0   IO-APIC-level  eth0
  9:   85639064          0   IO-APIC-level  acpi, ioc0, ioc1
 11:          0          0   IO-APIC-level  usb-ohci
 15:          2          0    IO-APIC-edge  ide1
NMI:          0          0
LOC:    6318529    6318527
ERR:          0
MIS:          0

With 2.4.21-rc6-ac1, interrupts were counted for both
logical CPUs. Is this a bug or a feature?

HTH

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+3zMOxJmyeGcXPhERAu6CAKCILyOUfPyGaKG8pvbl4droch6B+ACbBNB/
Dw1L/tRv2JSrOHA12B8BaHM=
=rWPF
-----END PGP SIGNATURE-----
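
The skew in the /proc/interrupts listing above can be put into numbers with a few lines of script. What follows is a minimal sketch and not part of the original report: it assumes the 2.4-style layout shown in the message (a header row of CPU names, then one row per IRQ with one counter per CPU), and the function name per_cpu_totals and the shortened SAMPLE text are illustrative only.

```python
def per_cpu_totals(text):
    """Sum the per-CPU counters of the numbered IRQ rows in /proc/interrupts."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()              # header row, e.g. ['CPU0', 'CPU1']
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue                     # skip the NMI, LOC, ERR, MIS rows
        # The first len(cpus) fields after the colon are the counters.
        for cpu, count in zip(cpus, rest.split()):
            if count.isdigit():
                totals[cpu] += int(count)
    return totals


# Three rows copied from the report above, plus the local timer row.
SAMPLE = """\
           CPU0       CPU1
  0:    6318080          0    IO-APIC-edge  timer
  5:   55629300          0   IO-APIC-level  eth0
  9:   85639064          0   IO-APIC-level  acpi, ioc0, ioc1
LOC:    6318529    6318527
"""

if __name__ == "__main__":
    print(per_cpu_totals(SAMPLE))
```

On the sample rows every device interrupt is accounted to CPU0 and none to CPU1, which is the behaviour the report asks about. The LOC row is skipped on purpose, since the local APIC timer ticks on each logical CPU regardless of how device interrupts are routed.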

* Re: Linux 2.4.21-rc7
From: Andreas Haumer @ 2003-06-07 15:46 UTC
To: Andreas Haumer; +Cc: Marcelo Tosatti, lkml
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Andreas Haumer wrote:
> Hi!
>
> Marcelo Tosatti wrote:
>
>>Hallo,
>>
>>Now I really hope its the last one, all this rc's are making me mad.
>>
>
> ;-)
>
> So, here's a report on the more positive side...
>
I think, I have to take that back... :-((

> As I mentioned in some e-mails in the last few days,
> I'm currently testing an Asus AP1700-S5 server with
> a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
> 4x36GB U320SCSI drives (3 of them are assembled as RAID5),
> connected via GBit Ethernet to our internal network
>
I had this system running under heavy load for about 24 hours
without problems. I then stopped the stress testing, and had
several system freezes since then.

With system freeze I mean:

*) machine doesn't answer to ping, no reaction to console
   keyboard, no message on the console screen, no message
   in logfile, no oops, no noticeable system activity

I changed several BIOS settings (disabled hyperthreading,
disabled USB, disabled power management) and tried to run
the kernel with "acpi=off" and "noapic". I also changed
root disk, because I found a SCSI error message in the logs
once. Nothing seems to help. The system just freezes under
light load at some time between 1 and 8 hours uptime.
It's really strange that it survived heavy load for more
than 24 hours in the first place.

I found some problem reports from several people, which
sound quite similar to the freeze I see here. These people
all had motherboards with serverworks chipset, GBit ethernet
and noticed similar lockups or system freeze symptoms.
From the reports I'm not sure if the problems still persist
or if they should be solved now.
Can someone please comment on that?

Here are some infos from the system again:

root@server:~ {505} $ cat /proc/interrupts
           CPU0
  0:     118748    IO-APIC-edge  timer
  1:        274    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  4:       7011    IO-APIC-edge  serial
  9:    1181037   IO-APIC-level  ioc0, ioc1
 14:       1685   IO-APIC-level  eth0
 15:          2    IO-APIC-edge  ide1
NMI:          0
LOC:     118700
ERR:          0
MIS:          0

root@server:~ {506} $ cat /proc/cmdline
auto BOOT_IMAGE=lx2421rc7 ro root=100 acpi=off

root@server:~ {507} $ uname -a
Linux server 2.4.21-rc7 #1 SMP Wed Jun 4 18:31:15 CEST 2003 i686 unknown

root@server:~ {508} $ lsmod
Module                  Size  Used by    Not tainted
af_packet              13256   1  (autoclean)
e1000                  50028   1  (autoclean)
ext3                   60832   2  (autoclean)
jbd                    40056   2  (autoclean) [ext3]
raid5                  17704   1  (autoclean)
md                     57472   2  (autoclean) [raid5]
xor                     8868   0  (autoclean) [raid5]
unix                   15664  38  (autoclean)
ext2                   33440   4  (autoclean)
sd_mod                 10652  18  (autoclean)
isense                 32404   0  (autoclean) (unused)
mptctl                 19116   0  (autoclean) (unused)
mptscsih               29696   9  (autoclean)
mptbase                32640   5  (autoclean) [isense mptctl mptscsih]
scsi_mod               95748   2  (autoclean) [sd_mod mptscsih]

root@server:~ {511} $ lspci -vvvv
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Intel Corp. 82540EM Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (63750ns min), cache line size 08
        Interrupt: pin A routed to IRQ 14
        Region 0: Memory at fd800000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at d800 [size=64]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000

00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc: Unknown device 8008
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: I/O ports at d400 [size=256]
        Region 2: Memory at fb800000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at febe0000 [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
        Subsystem: ServerWorks CSB5 South Bridge
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 32

00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 88 [Master SecP])
        Subsystem: ServerWorks CSB5 IDE Controller
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Region 0: I/O ports at <ignored>
        Region 1: I/O ports at <ignored>
        Region 2: I/O ports at <ignored>
        Region 3: I/O ports at <ignored>
        Region 4: I/O ports at a800 [size=16]

00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
        Subsystem: ServerWorks: Unknown device 0230
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]

00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]

00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]

00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60]

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
        Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (4250ns min, 4500ns max), cache line size 08
        Interrupt: pin A routed to IRQ 9
        Region 0: I/O ports at a000 [size=256]
        Region 1: Memory at fa000000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at f9800000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe900000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [68]

02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
        Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (4250ns min, 4500ns max), cache line size 08
        Interrupt: pin B routed to IRQ 9
        Region 0: I/O ports at 9800 [size=256]
        Region 1: Memory at f9000000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at f8800000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe800000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [68]

03:02.0 Ethernet controller: Intel Corp. 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
        Subsystem: Intel Corp.: Unknown device 110d
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (63750ns min), cache line size 08
        Interrupt: pin A routed to IRQ 5
        Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at f7800000 (64-bit, non-prefetchable) [size=128K]
        Region 4: I/O ports at 9400 [size=32]
        Expansion ROM at fe7e0000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000

Any idea how I should proceed now?
I really could use some help here, I'm running out of ideas... :-((

- - andreas

- --
Andreas Haumer                     | mailto:andreas@xss.co.at
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+4gjsxJmyeGcXPhERAsT4AJ9sylkxso5kXO51+6c5bfskVV2meACgrF33
t8xXYpu6FGPsiQ9VBmnk6ek=
=Yov+
-----END PGP SIGNATURE-----
* [2.4.21-rc7] AP1700-S5 system freeze :-((
2003-06-07 15:46 ` Andreas Haumer
@ 2003-06-09 10:16 ` Andreas Haumer
2003-06-09 11:46 ` Stephan von Krawczynski
2003-06-11 20:48 ` Linux 2.4.21-rc7 Marcelo Tosatti
1 sibling, 1 reply; 19+ messages in thread
From: Andreas Haumer @ 2003-06-09 10:16 UTC (permalink / raw)
To: Andreas Haumer; +Cc: lkml

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Note: I'm reporting this with a different subject line now,
as I got zero replies to my first bug report. This is still
the same Asus AP1700-S5 server as in my previous reports, though:

Asus AP1700-S5 server, single Xeon 2.4GHz CPU (FSB533)
512MB registered DDR with ECC, Asus PR-DLS533 motherboard
with ServerWorks GCLE chipset

root@server:~ {535} $ lspci
00:00.0 Host bridge: ServerWorks CNB20-HE Host Bridge (rev 31)
00:00.1 Host bridge: ServerWorks CNB20-HE Host Bridge
00:00.2 Host bridge: ServerWorks CNB20-HE Host Bridge
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.3 Host bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
01:02.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)

Andreas Haumer wrote:
[...]
> I had this system running under heavy load for about 24 hours
> without problems. I then stopped the stress testing, and had
> several system freezes since then.
>
> With system freeze I mean:
>
> *) machine doesn't answer to ping, no reaction to console
>    keyboard, no message on the console screen, no message
>    in logfile, no oops, no noticeable system activity
>
I just had another freeze or lockup of this system, after
1 day and 14 hours uptime. :-(

This time the machine was running with a 3Com 3c905c 100MBit
NIC, with the onboard e1000 GBit controllers disabled.
Obviously, this didn't help either...

When I noticed the freeze, I tried to ping the server, and
got a few replies back, but with a delay of more than 60 seconds!

I didn't wait that long when I tried to ping the server on the
previous lockups, so maybe the "no answer to ping" symptom I
described is more a "big delay in answering ping packets" symptom.

Does that ring any bell?
Any idea anyone?

- - andreas

- --
Andreas Haumer         | mailto:andreas@xss.co.at
*x Software + Systeme  | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+5F6HxJmyeGcXPhERApOfAJ4klAsR0lA8Zzk5s22quImzxud6agCgvAi1
FXZuNQV3C4UaKVi9gOvtJFM=
=qL4B
-----END PGP SIGNATURE-----
* Re: [2.4.21-rc7] AP1700-S5 system freeze :-((
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
@ 2003-06-09 11:46 ` Stephan von Krawczynski
2003-06-09 12:21 ` Andreas Haumer
0 siblings, 1 reply; 19+ messages in thread
From: Stephan von Krawczynski @ 2003-06-09 11:46 UTC (permalink / raw)
To: Andreas Haumer; +Cc: linux-kernel

Hello Andreas,

I am not quite sure if you are experiencing something similar to my problem.
Fact is this:

I have a serverworks based dual PIII board and I am experiencing freezes just
about every day.

Equal setups:

Kernel 2.4.21-rc7
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (me: rev 23 you: rev 31)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)

Lockups during light load


Differing:

Just about everything else:
                yours:        mine:
Storage System: Symbios       AIC
VGA           : ATI Rage XL   ATI Radeon RV200
Network       : Intel/3com    Intel/Broadcom
Processor     : Xeon UP       PIII SMP


I could already produce oops-messages on the problem and mine all come up in
kmem_cache_alloc_batch. It would be interesting where your box freezes. It
cannot be at this same place, because the code is not there in UP.
Try this (in case you are not working in front of the box):

Start box and switch to text console, enter "setterm -blank 0" to disable
screen blanker. Wait for oops. If we are lucky you will see something, get a
pencil then :-)

--
Regards,
Stephan
* Re: [2.4.21-rc7] AP1700-S5 system freeze :-((
2003-06-09 11:46 ` Stephan von Krawczynski
@ 2003-06-09 12:21 ` Andreas Haumer
0 siblings, 0 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-09 12:21 UTC (permalink / raw)
To: Stephan von Krawczynski; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Many thanks for your reply!

Stephan von Krawczynski wrote:
> Hello Andreas,
>
> I am not quite sure if you are experiencing something similar to my problem.
> Fact is this:
>
> I have a serverworks based dual PIII board and I am experiencing freezes just
> about every day.
>
> Equal setups:
>
> Kernel 2.4.21-rc7
> 00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (me: rev 23 you: rev 31)
> 00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
>
> Lockups during light load
>
Me too. I had it running for 24 hours with heavy stress testing
and a load above 7 all the time without problems. I then stopped
this test, and the box locked up 2 hours later, and locked up
about 7 or 8 times in the past few days :-(

>
> Differing:
>
> Just about everything else:
>                 yours:        mine:
> Storage System: Symbios       AIC

This is not a "normal" symbios logic "sym53c8xx" storage
controller, but a "Symbios Logic 53c1030", which uses the
Fusion MPT driver. This is the first time I'm running this
driver, so I don't know if it's considered stable (but I
guess so)

Unfortunately I can't replace it as I don't have any spare
SCSI controller which fits right now.

> VGA           : ATI Rage XL   ATI Radeon RV200
> Network       : Intel/3com    Intel/Broadcom
> Processor     : Xeon UP       PIII SMP
>
>
> I could already produce oops-messages on the problem and mine all come up in
> kmem_cache_alloc_batch. It would be interesting where your box freezes. It
> cannot be at this same place, because the code is not there in UP.
> Try this (in case you are not working in front of the box):
>
> Start box and switch to text console, enter "setterm -blank 0" to disable
> screen blanker. Wait for oops. If we are lucky you will see something, get a
> pencil then :-)
>
I always have the system running with text console and screen
blanking disabled. Alas, I see no oops :-(

IMHO it doesn't look like the kernel crashes with an oops,
it does look more like it suddenly goes into an endless loop
or ridiculously high load somehow.

Last time I had this freeze, I noticed that the system
answered my ICMP ping messages with a delay of more than
60 seconds. This looked like the system was very busy at
that time.

I'm now running with 2.4.20rc2, and also have syslog routed
to another system on the network. We'll see if I can get any
more information out of this.

- - andreas

- --
Andreas Haumer         | mailto:andreas@xss.co.at
*x Software + Systeme  | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+5HvjxJmyeGcXPhERAvOvAJ94cQS4tlzylHiVU084v7FK/e/aowCgw4w9
M3YWSHXzx9IuKeU4Z6WicEk=
=8102
-----END PGP SIGNATURE-----
* Re: Linux 2.4.21-rc7
2003-06-07 15:46 ` Andreas Haumer
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
@ 2003-06-11 20:48 ` Marcelo Tosatti
[not found] ` <1055408183.2552.18.camel@tor.trudheim.com>
1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2003-06-11 20:48 UTC (permalink / raw)
To: Andreas Haumer; +Cc: lkml

On Sat, 7 Jun 2003, Andreas Haumer wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> Andreas Haumer wrote:
> > Hi!
> >
> > Marcelo Tosatti wrote:
> >
> >>Hallo,
> >>
> >>Now I really hope its the last one, all this rc's are making me mad.
> >>
> >
> > ;-)
> >
> > So, here's a report on the more positive side...
> >
> I think, I have to take that back... :-((
>
> > As I mentioned in some e-mails in the last few days,
> > I'm currently testing an Asus AP1700-S5 server with
> > a single Xeon 2.4GHz CPU (FSB533), 512MB RAM and
> > 4x36GB U320SCSI drives (3 of them are assembled as RAID5),
> > connected via GBit Ethernet to our internal network
> >
> I had this system running under heavy load for about 24 hours
> without problems. I then stopped the stress testing, and had
> several system freezes since then.
>
> With system freeze I mean:
>
> *) machine doesn't answer to ping, no reaction to console
>    keyboard, no message on the console screen, no message
>    in logfile, no oops, no noticeable system activity

Maybe the NMI oopser helps?
[parent not found: <1055408183.2552.18.camel@tor.trudheim.com>]
* Re: Linux 2.4.21-rc7
[not found] ` <1055408183.2552.18.camel@tor.trudheim.com>
@ 2003-06-12 9:35 ` Andreas Haumer
0 siblings, 0 replies; 19+ messages in thread
From: Andreas Haumer @ 2003-06-12 9:35 UTC (permalink / raw)
To: Anders Karlsson; +Cc: Marcelo Tosatti, Linux Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Anders Karlsson wrote:
> On Wed, 2003-06-11 at 21:48, Marcelo Tosatti wrote:
>
>>On Sat, 7 Jun 2003, Andreas Haumer wrote:
>
> [snip]
>
>>>I had this system running under heavy load for about 24 hours
>>>without problems. I then stopped the stress testing, and had
>>>several system freezes since then.
>>>
>>>With system freeze I mean:
>>>
>>>*) machine doesn't answer to ping, no reaction to console
>>>   keyboard, no message on the console screen, no message
>>>   in logfile, no oops, no noticeable system activity
>
>
> I have this problem without actually stressing the machine too hard. The
> average load on my Thinkpad over a weekend would perhaps be 0.05, yet I
> can have several hard hangs where there seems to be no trace of a hang
> at all in logfiles.
>
I have to admit that "system freeze" is a quite unspecific
symptom. It could have a zillion different reasons.

In my case I'm currently chasing SCSI errors which I think
could have something to do with it (besides, it's _not_ an
Adaptec controller, but an LSI 53c1030 with Fusion MPT
driver... :-)

In my server logs I sometimes see SCSI timeouts like this:

[...]
scsi : aborting command due to timeout : pid 1148093, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 00 0f af 00 00 10 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=dfca8e00)
  IOs outstanding = 3
mptscsih: ioc0: Issue of TaskMgmt Successful!
SCSI host 0 abort (pid 1148093) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=dfca8e00)
  IOs outstanding = 4
SCSI Error Report =-=-= (0:0:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 2A 00 00 3C 4D 78 00 00 02 00 - "WRITE(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:1:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 28 00 00 00 0F AF 00 00 10 00 - "READ(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:2:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 28 00 00 4E 0A 37 00 00 08 00 - "READ(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
SCSI Error Report =-=-= (0:3:0)
  SCSI_Status=02h (CHECK CONDITION)
  Original_CDB[]: 28 00 03 B0 08 6F 00 00 08 00 - "READ(10)"
  SenseData[20h]: 70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00 ...
  SenseKey=6h (UNIT ATTENTION); FRU=00h
  ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
[...]

There are 4 hot swap SCSI disks in the server, and all of them
eventually report those timeouts (so it's not specific to a
single disk)

I already replaced cabling, tried a different hot swap (SCA)
cage, and I'm now trying to replace the disks one by one to
eventually find the culprit.

There are two problems with this approach:

1.) After each change I have to wait several hours up to two days
    for a SCSI timeout to occur as I can not reproduce the problem
    at will.

2.) I'm not _sure_ if those SCSI timeouts are related to the
    server freeze symptoms I see. It's just an assumption.

IMHO it could work as follows: SCSI timeouts occur sometimes.
The driver then aborts the command and resets the SCSI bus to
get it into a sane state again. But what if the bus reset doesn't
work as expected and the bus remains unusable for a while?
Could this bring the whole system into this "freeze" state (the
system is still running, but everything waits for the SCSI bus
to recover)? Could this explain the symptom of those big delays
of ICMP ping answer messages I saw?

So the most precious resource for chasing this problem is time,
and this is also the resource which I don't have available as
much as I'd like to... :-(

>
>>Maybe the NMI oopser helps?
>
>
> Marcelo, where can I get hold of this and would there be documentation
> included with it for how to install/use it?
>
Look at /usr/src/linux/Documentation/nmi_watchdog.txt

Regards,

- - andreas

- --
Andreas Haumer         | mailto:andreas@xss.co.at
*x Software + Systeme  | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+6El7xJmyeGcXPhERAqykAKCumORTm/lDofkrg52FX33rOfgC/ACeNxR7
l9/znrbi0lZoR/zw+LTdNhI=
=W7Gt
-----END PGP SIGNATURE-----
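
[Editorial note, not part of the original thread] The Original_CDB and SenseData bytes in the SCSI error reports quoted in the message above can be checked by hand against the driver's own summary lines. The following is only a rough sketch, assuming fixed-format sense data (response code 70h) and 10-byte READ/WRITE CDBs as seen in the log; the helper names are hypothetical and are not taken from the Fusion MPT driver or any other kernel code.

```python
# Illustrative helpers only; the names are hypothetical, not from any driver.
SENSE_KEY_NAMES = {0x6: "UNIT ATTENTION"}           # only the key seen in the log
OPCODE_NAMES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def parse_hex(dump):
    """Turn a space-separated hex dump such as '28 00 0F' into bytes."""
    return bytes(int(token, 16) for token in dump.split())


def decode_cdb10(cdb):
    """Decode a 10-byte READ(10)/WRITE(10) command descriptor block."""
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "opcode": OPCODE_NAMES.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:6], "big"),      # bytes 2..5, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),   # bytes 7..8, big-endian
    }


def decode_fixed_sense(sense):
    """Decode the fields of fixed-format sense data that the log prints."""
    if len(sense) < 15 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F                            # low nibble of byte 2
    return {
        "sense_key": key,
        "sense_key_name": SENSE_KEY_NAMES.get(key, "unknown"),
        "asc": sense[12],                            # additional sense code
        "ascq": sense[13],                           # its qualifier
        "fru": sense[14],                            # field replaceable unit code
    }


if __name__ == "__main__":
    # Byte strings copied from the report for target (0:1:0) above.
    print(decode_cdb10(parse_hex("28 00 00 00 0F AF 00 00 10 00")))
    print(decode_fixed_sense(parse_hex(
        "70 00 06 00 00 00 00 18 00 00 00 00 29 02 00 00 00 00")))
```

Decoded this way, every report carries the same sense key 6h with ASC/ASCQ 29h/02h, which only says that the target saw the bus reset; it does not identify which command originally timed out. That is consistent with the observation in the message that all four disks report the condition.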
Thread overview: 19+ messages
2003-06-03 17:04 Linux 2.4.21-rc7 Marcelo Tosatti
2003-06-03 18:02 ` Tomas Szepe
2003-06-03 18:07 ` Marcelo Tosatti
2003-06-03 19:15 ` lk
2003-06-03 19:40 ` Alan Cox
2003-06-03 18:30 ` Alex Romosan
2003-06-03 19:27 ` Jeff Garzik
2003-06-03 19:58 ` Alex Romosan
2003-06-03 20:14 ` Tom Rini
2003-06-04 3:35 ` David S. Miller
2003-06-04 15:09 ` Mr. James W. Laferriere
2003-06-04 23:37 ` Alex Romosan
2003-06-05 12:09 ` Andreas Haumer
2003-06-07 15:46 ` Andreas Haumer
2003-06-09 10:16 ` [2.4.21-rc7] AP1700-S5 system freeze :-(( Andreas Haumer
2003-06-09 11:46 ` Stephan von Krawczynski
2003-06-09 12:21 ` Andreas Haumer
2003-06-11 20:48 ` Linux 2.4.21-rc7 Marcelo Tosatti
[not found] ` <1055408183.2552.18.camel@tor.trudheim.com>
2003-06-12 9:35 ` Andreas Haumer