* network stack oops 2.4.1/gcc 2.95.3
@ 2001-01-27 13:36 Iain Sandoe
0 siblings, 0 replies; 9+ messages in thread
From: Iain Sandoe @ 2001-01-27 13:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Franz Sirl
Hi,
This is not a completely repeatable scenario.
Yesterday, when my network connection was very busy, portmap & named timed
out. Today they both oops'ed in the same place - the system came up OK apart
from a dead network stack.
Iain.
----
Linux version 2.4.1-pre10-iain1 (iain-s@athena)
(gcc version 2.95.3 20010111 (prerelease/franzo/20010111)) #1 Fri Jan 26
NOTE1: ^ ^ ^ ^
Oops: Exception in kernel mode, sig: 4
NIP: C017AF80 XER: 20000000 LR: C017AF74 SP: CF52BD00 REGS: cf52bc50 TRAP: 0700
NOTE2: 0700 is Illegal Instruction.
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = cf52a000[227] 'portmap' Last syscall: 102
last math cf390000 last altivec 00000000
GPR00: 00000010 CF52BD00 CF52A000 00000038 00001032 00000000 CF54D458 00000000
GPR08: C099C000 00000001 0000000E C028DC60 22444849 1001E9E8 00000000 100B5390
GPR16: 100CEF50 00000000 00000000 00000000 00009032 0F52BE80 00000000 C00043F0
GPR24: C0004120 7FFFF738 10017948 00002260 CF52BD68 00000000 CF550CA8 CF52BD68
Call backtrace:
C017AF74 C0145E3C C0146D5C C0147578 C000417C 10017B28 00000709
0FF7F248 0FF7FAB0 10001980 0FED2734 00000000
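As a cross-check on NOTE2, the MSR value in the dump can be decoded by hand. The Python sketch below assumes the classic 32-bit PowerPC MSR bit layout, and assumes that the value printed as MSR is the saved SRR1, whose upper bits carry the program-exception reason. The bit masks and the name decode_msr are illustrative assumptions; none of them appear in the oops text itself.

```python
# Assumed classic 32-bit PowerPC MSR bits (not stated in the oops itself).
MSR_BITS = {
    "EE": 0x8000,  # external interrupts enabled
    "PR": 0x4000,  # problem (user) state
    "FP": 0x2000,  # floating point available
    "ME": 0x1000,  # machine check enabled
    "IR": 0x0020,  # instruction address translation
    "DR": 0x0010,  # data address translation
}

# Assumed SRR1 reason bits for a program exception (trap 0x700).
PROGRAM_REASONS = {
    0x100000: "floating-point enabled exception",
    0x080000: "illegal instruction",
    0x040000: "privileged instruction",
    0x020000: "trap",
}


def decode_msr(value):
    """Split a saved MSR/SRR1 word into named flags and exception reasons."""
    flags = {name: int(bool(value & mask)) for name, mask in MSR_BITS.items()}
    reasons = [text for mask, text in PROGRAM_REASONS.items() if value & mask]
    return flags, reasons


# MSR value copied from the oops above.
flags, reasons = decode_msr(0x00089032)
print(flags)
print(reasons)
```

Under these assumptions the decoded flags agree with the EE/PR/FP/ME/IR/DR fields printed in the oops, and the only reason bit set is the illegal-instruction one.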
from System.map:
c017af38 T inet_recvmsg
c017af98 T inet_sendmsg
c0145de4 T sock_recvmsg
c0145ee8 t sock_lseek
c0146cbc T sys_recvfrom
c0146dc0 T sys_recv
c01473fc T sys_socketcall
c01475dc T sock_register
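The backtrace addresses can be matched against the System.map excerpt above by finding the nearest preceding symbol. A minimal Python sketch follows; the helper names parse_system_map and resolve are hypothetical, and because only an excerpt of System.map is used, an address outside the listed symbols either resolves to nothing or could be attributed to the wrong neighbour.

```python
import bisect

# System.map excerpt copied from the message above.
SYSTEM_MAP = """\
c017af38 T inet_recvmsg
c017af98 T inet_sendmsg
c0145de4 T sock_recvmsg
c0145ee8 t sock_lseek
c0146cbc T sys_recvfrom
c0146dc0 T sys_recv
c01473fc T sys_socketcall
c01475dc T sock_register
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) of the nearest symbol at or below addr, or None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below every symbol in the excerpt
    start, name = symbols[i]
    return name, addr - start


symbols = parse_system_map(SYSTEM_MAP)
# NIP followed by the kernel-side entries of the call backtrace.
for addr in (0xC017AF80, 0xC017AF74, 0xC0145E3C, 0xC0146D5C, 0xC0147578):
    name, offset = resolve(symbols, addr)
    print("%08x  %s+0x%x" % (addr, name, offset))
```

With this excerpt, both NIP and LR fall inside inet_recvmsg, and the remaining kernel addresses fall in sock_recvmsg, sys_recvfrom and sys_socketcall, which matches the recv path that the listed symbols were chosen to show.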
======================================================
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
* Re: network stack oops 2.4.1/gcc 2.95.3
@ 2001-01-29 19:05 David Edelsohn
0 siblings, 0 replies; 9+ messages in thread
From: David Edelsohn @ 2001-01-29 19:05 UTC (permalink / raw)
To: iain; +Cc: Franz Sirl, linuxppc-dev
Has there been any follow up to this problem report? Is this a
regression from gcc-2.95.2 behavior? Franz's pre-releases do not exactly
correspond to the GCC development pre-releases, so I have no idea when
this problem began or whether it is local to the linuxppc branch.
David
* Re: network stack oops 2.4.1/gcc 2.95.3
@ 2001-01-29 19:12 Iain Sandoe
2001-01-29 19:18 ` David Edelsohn
0 siblings, 1 reply; 9+ messages in thread
From: Iain Sandoe @ 2001-01-29 19:12 UTC (permalink / raw)
To: David Edelsohn; +Cc: Franz Sirl, linuxppc-dev
> Has there been any follow up to this problem report? Is this a
> regression from gcc-2.95.2 behavior? Franz's pre-releases do not exactly
> correspond to the GCC development pre-releases, so I have no idea when
> this problem began or whether it is local to the linuxppc branch.
It is not a regression test. It would not be particularly easy, either to
put 2.95.2 back up and wind back the system to the build conditions...
but the version used was 2.95.3-test2. and the bk pull was 2.4.1-pre10.
The only reason I associated gcc at all (and copied this to Franz) was that
I don't recall ever seeing an Illegal Instruction oops before.
Unfortunately, (or fortunately depending on your POV) it is not reproducible
to order. It seems to depend on network load - maybe one time in 20 with a
heavily loaded network?
The system is booted OK
It is in the transition from single to init 5 (where portmap & named are
launched in my set-up).
Iain.
* Re: network stack oops 2.4.1/gcc 2.95.3
2001-01-29 19:12 Iain Sandoe
@ 2001-01-29 19:18 ` David Edelsohn
0 siblings, 0 replies; 9+ messages in thread
From: David Edelsohn @ 2001-01-29 19:18 UTC (permalink / raw)
To: Iain Sandoe; +Cc: Franz Sirl, linuxppc-dev
>>>>> "Iain Sandoe" writes:
Iain> It is not a regression test. It would not be particularly easy, either to
Iain> put 2.95.2 back up and wind back the system to the build conditions...
Iain> but the version used was 2.95.3-test2. and the bk pull was 2.4.1-pre10.
A regression is different from a regression test. I never called
it a regression test.
I do not mean rewinding all of the way to gcc-2.95.2, but to
Franz's patched version of GCC prior to the gcc-2.95.3 changes.
I am just trying to narrow down whether this is something new.
David
* Re: network stack oops 2.4.1/gcc 2.95.3
@ 2001-01-29 23:54 Iain Sandoe
2001-01-30 12:39 ` Franz Sirl
0 siblings, 1 reply; 9+ messages in thread
From: Iain Sandoe @ 2001-01-29 23:54 UTC (permalink / raw)
To: David Edelsohn; +Cc: Franz Sirl, linuxppc-dev
Hi David,
Mon, Jan 29, 2001, David Edelsohn wrote:
>>> "Iain Sandoe" writes:
> Iain> It is not a regression test. It would not be particularly easy, either
> Iain> to put 2.95.2 back up and wind back the system to the build conditions...
> Iain> but the version used was 2.95.3-test2. and the bk pull was 2.4.1-pre10.
>
> A regression is different from a regression test. I never called
> it a regression test.
RTFM(ail) properly Iain ;)
> I do not mean rewinding all of the way to gcc-2.95.2, but to
> Franz's patched version of GCC prior to the gcc-2.95.3 changes.
OK. It might be possible - I have a build timestamp on the kernel (I keep
the last four or five builds) and I guess I could somehow track down which
revs to get from bk. I still have the previous glibc & gcc rpms.
However, (see below) I will only do this if someone really believes they
will get useful info... I'm mid-way through a fairly large chunk of work
ATM.
> I am just trying to narrow down whether this is something new.
I believe so. Because I really think I'd built that particular pull before
I upgraded the tool-chain. BUT because it only happens under fairly unusual
circumstances (for my set-up) I can't be 100% sure.
Also it involves a depressingly large amount of system context: kernel, X,
inetd, da-da-da...
If it happens again - I'll see if I really have a genuine Illegal
Instruction in the code stream (or I'm just trying to execute a format
string ;)
ciao,
Iain.
* Re: network stack oops 2.4.1/gcc 2.95.3
2001-01-29 23:54 Iain Sandoe
@ 2001-01-30 12:39 ` Franz Sirl
2001-01-30 17:52 ` David Edelsohn
0 siblings, 1 reply; 9+ messages in thread
From: Franz Sirl @ 2001-01-30 12:39 UTC (permalink / raw)
To: Iain Sandoe; +Cc: David Edelsohn, linuxppc-dev
At 00:54 2001-01-30, Iain Sandoe wrote:
> > I do not mean rewinding all of the way to gcc-2.95.2, but to
> > Franz's patched version of GCC prior to the gcc-2.95.3 changes.
>
>OK. It might be possible - I have a build timestamp on the kernel (I keep
>the last four or five builds) and I guess I could somehow track down which
>revs to get from bk. I still have the previous glibc & gcc rpms.
>
>However, (see below) I will only do this if someone really believes they
>will get useful info... I'm mid-way through a fairly large chunk of work
>ATM.
At least it would be nice if you could tell me the version of the RPM you
were using before. I haven't added new patches since early December and
most of my RPM patches are in test2 (even more in test3) now.
FYI, here's my current list of patches that are not in test3:
2000-12-04 Franz Sirl <Franz.Sirl-kernel@lauterbach.com>
2000-08-24 Jim Wilson <wilson@cygnus.com>
* c-common.c (decl_attributes, case A_ALIGN): Revert last change.
Copy type in a TYPE_DECL, just like pushdecl does.
2000-10-17 Franz Sirl <Franz.Sirl-kernel@lauterbach.com>
2000-10-17 Franz Sirl <Franz.Sirl-kernel@lauterbach.com>
* function.c (locate_and_pad_parm): Don't align stack unconditionally.
1999-12-06 Jakub Jelinek <jakub@redhat.com>
* calls.c (save_fixed_argument_area): If save_mode is BLKmode,
always use move_by_pieces to avoid infinite recursion.
(restore_fixed_argument_area): Likewise.
2000-10-14 Franz Sirl <Franz.Sirl-kernel@lauterbach.com>
2000-03-17 Martin v. Löwis <loewis@informatik.hu-berlin.de>
* calls.c (special_function_p): It is only malloc if it returns
Pmode.
2000-04-12 Mark Mitchell <mark@codesourcery.com>
* function.c (aggregate_value_p): VOID_TYPE nodes are never
aggregates.
Tue Sep 14 01:33:15 1999 Andreas Schwab <schwab@suse.de>
* loop.c (strength_reduce): Don't call reg_used_between_p if the
insn from BL2 is after the insn from BL.
Tue Dec 14 18:13:32 1999 J"orn Rennecke <amylaar@cygnus.co.uk>
* loop.c (strength_reduce): Fix sign of giv lifetime calculation
for givs made from biv increments.
Mon Feb 28 11:34:43 2000 J"orn Rennecke <amylaar@cygnus.co.uk>
* loop.c (reg_in_basic_block_p): Don't abort when falling through
to the end of the function.
Sat Apr 22 22:35:38 MET DST 2000 Jan Hubicka <jh@suse.cz>
* loop.c (strength_reduce): Fix biv removal code.
Thu Oct 14 03:59:57 1999 Stephane Carrez <stcarrez@worldnet.fr>
* stor-layout.c (layout_union): Use HOST_WIDE_INT for const_size;
check for member bit-size overflow and use var_size if it occurs.
(layout_record): Use bitsize_int() to define the type size in bits.
Likewise for computation and assignment to DECL_FIELD_BITPOS.
(layout_decl): Likewise when assigning to DECL_SIZE.
Thu Oct 28 10:20:02 1999 Geoffrey Keating <geoffk@cygnus.com>
* config/rs6000/rs6000.md (movsf): Don't convert a SUBREG
of the function return register into a plain REG until
after function inlining is done.
Another datapoint:
[fsirl@entropy:~]$ cat /proc/version
Linux version 2.4.0 (trini@entropy.crashing.org) (gcc version 2.95.3
20010111 (prerelease/franzo/20010111)) #1 Sun Jan 14 15:10:21 MST 2001
[fsirl@entropy:~]$ uptime
5:21am up 11 days, 20:50, 2 users, load average: 0.00, 0.00, 0.00
But this machine has a fairly weak net connection, so maybe this is never
triggered (define heavy network load?).
> > I am just trying to narrow down whether this is something new.
>
>I believe so. Because I really think I'd built that particular pull before
>I upgraded the tool-chain. BUT because it only happens under fairly unusual
>circumstances (for my set-up) I can't be 100% sure.
>
>Also it involves a depressingly large amount of system context: kernel, X,
>inetd, da-da-da...
>
>If it happens again - I'll see if I really have a genuine Illegal
>Instruction in the code stream (or I'm just trying to execute a format
>string ;)
I would rather think this is a new kernel bug...
Franz.
* Re: network stack oops 2.4.1/gcc 2.95.3
@ 2001-01-30 15:45 Iain Sandoe
0 siblings, 0 replies; 9+ messages in thread
From: Iain Sandoe @ 2001-01-30 15:45 UTC (permalink / raw)
To: Franz Sirl; +Cc: David Edelsohn, linuxppc-dev
Hi Franz,
You are probably right - it's likely to be a red herring.
Mind you - an Oops is an Oops ;)
- the compiler issue may be a red herring, but the oops was definitely real.
On Tue, Jan 30, 2001, Franz Sirl wrote:
> At least it would be nice if you could tell me the version of the RPM you
> were using before. I haven't added new patches since early December and
> most of my RPM patches are in test2 (even more in test3) now.
the 'fault' was with test2 (2.95.3-t2) + glibc 2.1.3-15g
the previous version (I have been using for some considerable time) was:
2.95.2-1f + glibc 2.1.3-4a
> Another datapoint:
> [fsirl@entropy:~]$ cat /proc/version
> Linux version 2.4.0 (trini@entropy.crashing.org) (gcc version 2.95.3
> 20010111 (prerelease/franzo/20010111)) #1 Sun Jan 14 15:10:21 MST 2001
> [fsirl@entropy:~]$ uptime
> 5:21am up 11 days, 20:50, 2 users, load average: 0.00, 0.00, 0.00
FWIW: once it's up, mine stays there as well... but I'm rebooting a lot at
the moment because of what I'm doing. Also I'm using bk 2.4.1.
I have a _hunch_ that it is a failure/timeout in connecting to the DNS that
triggers the effect.
> But this machine has a fairly weak net connection, so maybe this is never
> triggered (define heavy network load?).
I have a combined ethernet bridge/NAT/ISDN modem - this occurred when it was
operating at full capacity downloading binary data (i.e. quite likely to
drop packets).
> I would rather think this is a new kernel bug...
yeah, probably... I assume program error first, compiler bug second...
It was just a little unusual... I'll try and find some more hard facts
(otherwise I guess we should forget it unless someone else sees something).
I haven't had time, yet, to look if any of the network code changed around
that time.
sorry if I wasted time here..
Iain.
* Re: network stack oops 2.4.1/gcc 2.95.3
2001-01-30 12:39 ` Franz Sirl
@ 2001-01-30 17:52 ` David Edelsohn
0 siblings, 0 replies; 9+ messages in thread
From: David Edelsohn @ 2001-01-30 17:52 UTC (permalink / raw)
To: Franz Sirl; +Cc: Iain Sandoe, linuxppc-dev
>>>>> Franz Sirl writes:
Franz> I would rather think this is a new kernel bug...
I am pretty confident that this is a kernel bug. The question in
my mind is whether this is a kernel bug that is newly elicited or
exacerbated by changes in gcc-2.95.3. I want to at least give
the Linux kernel developers a heads-up so that we can avoid a "GCC is
broken" flame fest on the kernel mailing lists.
Thanks, David
* Re: network stack oops 2.4.1/gcc 2.95.3
@ 2001-02-01 0:53 Iain Sandoe
0 siblings, 0 replies; 9+ messages in thread
From: Iain Sandoe @ 2001-02-01 0:53 UTC (permalink / raw)
To: David Edelsohn, Franz Sirl; +Cc: linuxppc-dev
> Franz> I would rather think this is a new kernel bug...
Franz is quite right, it is.
=====
It is a particularly nasty bug - because it is one of those where something
drops bombs in memory.
====
The code generated by gcc 2.95.3-t2 is fine - from asm through to vmlinux.
=====
However, every now and then - at run time - something trashes the location
which throws up the Illegal Instruction.
as I said - nasty - it could be *anything* and I'm not really sure how/where
to start looking...
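One low-tech way to confirm this kind of corruption is to compare the instruction words expected from the kernel image with the words actually found in memory around the faulting address. The Python sketch below assumes both byte ranges have already been extracted by some other means; diff_words and looks_like_text are hypothetical helpers, and the demo bytes are made up for illustration, not recovered from the crashed machine.

```python
import struct


def diff_words(expected, observed, base_addr):
    """List (address, expected_word, observed_word) for each differing
    big-endian 32-bit word in two equal-length byte buffers."""
    if len(expected) != len(observed):
        raise ValueError("buffers must be the same length")
    if len(expected) % 4:
        raise ValueError("length must be a multiple of 4")
    count = len(expected) // 4
    exp_words = struct.unpack(">%dI" % count, expected)
    obs_words = struct.unpack(">%dI" % count, observed)
    return [
        (base_addr + 4 * i, e, o)
        for i, (e, o) in enumerate(zip(exp_words, obs_words))
        if e != o
    ]


def looks_like_text(word):
    """True if all four bytes of the word are printable ASCII."""
    return all(0x20 <= b < 0x7F for b in struct.pack(">I", word))


# Made-up demo data: three arbitrary words, with the middle one
# overwritten by the ASCII bytes "%s: " in the observed copy.
expected = struct.pack(">3I", 0x7C0802A6, 0x9421FFF0, 0x4E800020)
observed = expected[:4] + b"%s: " + expected[8:]

diffs = diff_words(expected, observed, base_addr=0x1000)
for addr, exp, obs in diffs:
    print("%08x: expected %08x, found %08x, text-like: %s"
          % (addr, exp, obs, looks_like_text(obs)))
```

The text-like check is only a hint: it would flag the "executing a format string" case mentioned earlier in the thread, but it says nothing about what did the overwriting.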
much though I hate to admit it (being a firm believer in low-tech printk
debugging) - this is one of those cases where an ICE would really triumph...
ciao,
Iain.