From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernard Pidoux F6BVP
Subject: Re: 2.4.19 panic, netrom
Date: Sun, 04 May 2003 11:16:46 +0200
Sender: linux-hams-owner@localhost
Message-ID: <3EB4DA7E.1010707@ccr.jussieu.fr>
References: <3EAE17DF.6060904@algonet.se> <200304291858.54384.sm6rpz@home.se>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <200304291858.54384.sm6rpz@home.se>
List-Id:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: "Lars E. Pettersson"
Cc: linux-hams@vger.kernel.org

Lars E. Pettersson wrote:

>On Tuesday 29 April 2003 08:12, Kjell Jarl wrote:
>...
>
>
>>- I issue a netrom connect to a distant node.
>>- When connected, I send several "b" commands.
>>- When (unsure of the number) one or two "b" has been sent, there comes
>>a netrom disconnect from the remote station.
>>- In my vnc window, the last thing I saw was another b being sent to my
>>neighboring node after the disconnect arrived.
>>
>>
>
>My kernel panics (it get hung in the interrupt handler) seem to come when I
>have a netrom connection with outstanding frames (if I remember correctly)
>and we get a timeout from the ax25 connection. When the ax25 connection,
>initiated by the netrom connect, times out, we get the hang.
>
>Anyone with kernel knowledge that gets any wiser by this?
>
>73 de Lars, sm6rpz
>
>

I have already reported to the list my findings about kernel 2.4.x panics.
For me there is no question about the origin of the problem: it is not
specifically related to netrom but to ax25.

Here I am using a serial mkiss interface, and I think the problem lies in
the serial management part of the code, which makes intensive use of clear
interrupt (cli) instructions. In 2.4 kernels, interrupts are handled
differently than in 2.2 kernels, through a new mechanism called softirq,
which apparently is unable to recover from all the interrupts generated by
the ax25 code.
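
To make the cli() pattern mentioned above concrete, here is a minimal user-space analogy in C. It is not kernel code and not taken from the ax25 sources: a POSIX signal mask stands in for the CPU interrupt flag, so sigprocmask(SIG_BLOCK) plays the role of save_flags()/cli() and restoring the saved mask plays the role of restore_flags(). The names pending_frames, on_signal and masked_update_demo are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical shared state, touched by the handler and by normal code. */
static volatile sig_atomic_t pending_frames = 0;
static volatile sig_atomic_t in_critical_section = 0;
static volatile sig_atomic_t handler_ran_inside = 0;

/* Stand-in for an interrupt handler. */
static void on_signal(int signo)
{
    (void)signo;
    if (in_critical_section)
        handler_ran_inside = 1;   /* must never happen while masked */
    pending_frames++;
}

/* Returns the final counter value, or -1 on error or if the handler
 * ran inside the protected region. Assumes a single-threaded caller. */
int masked_update_demo(void)
{
    struct sigaction sa;
    sigset_t block, saved;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    /* Analogue of save_flags(flags); cli(); */
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    in_critical_section = 1;
    raise(SIGUSR1);               /* arrives while masked: stays pending */
    pending_frames += 10;         /* the protected update */
    in_critical_section = 0;

    /* Analogue of restore_flags(flags): the deferred signal runs here. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;

    return handler_ran_inside ? -1 : (int)pending_frames;
}
```

The point of the analogy is only that work arriving while everything is masked is deferred, not lost, and has to be delivered once the mask is lifted. Whether the 2.4 ax25 code actually mishandles that step is the hypothesis of this message, not something the sketch demonstrates.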

I have traced the oops five times following the recommendations in
Documentation/oops-tracing.txt (see the REPORTING-BUGS and README files in
/usr/src/linux/).

All reports gave the same message:

<0> Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

To make it short (I have 5 complete listings of traces, with the addresses
of the last subroutines processed by the CPU): the sequence of 29 routines
given by the trace before the kernel panics is not always exactly the same,
but it always starts at sock_def_write_space (sock.c), and the last 13 are
always the same, beginning with do_sysctl_strategy in sysctl.c and finishing
with ksoftirq in softirq.c.

I guess that the important point is how the code sequence leading to a
kernel panic is started and which routines are involved. In my case,
subroutine ax25_rcv, which makes a lot of cli() calls, and ax25_kiss_rcv
were often involved in the fatal sequence.

I am aware that a lot of code cleaning is being performed in 2.5.x kernels
following the decision to remove the cli() / sti() mechanism. This should
prevent the system from hanging, but so far I have not been able to run such
an experimental kernel (no display at boot!). I would certainly be
interested in testing ax25 after these intensive modifications to the
interrupt mechanism.

73 de Bernard F6BVP