From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bernard Pidoux F6BVP <pidoux@ccr.jussieu.fr>
Subject: Re: 2.4.19 panic, netrom
Date: Sun, 04 May 2003 11:16:46 +0200
Sender: linux-hams-owner@localhost
Message-ID: <3EB4DA7E.1010707@ccr.jussieu.fr>
References: <3EAE17DF.6060904@algonet.se> <200304291858.54384.sm6rpz@home.se>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path: <linux-hams-owner@vger.kernel.org>
In-Reply-To: <200304291858.54384.sm6rpz@home.se>
List-Id: <linux-hams.vger.kernel.org>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: "Lars E. Pettersson" <sm6rpz@home.se>
Cc: linux-hams@vger.kernel.org

Lars E. Pettersson wrote:

>On Tuesday 29 April 2003 08:12, Kjell Jarl wrote:
>...
>  
>
>>- I issue a netrom connect to a distant node.
>>- When connected, I send several "b" commands.
>>- When (unsure of the number) one or two "b" has been sent, there comes
>>a netrom disconnect from the remote station.
>>- In my vnc window, the last thing I saw was another b being sent to my
>>neighboring node after the disconnect arrived.
>>    
>>
>
>My kernel panics (it gets hung in the interrupt handler) seem to come when I 
>have a netrom connection with outstanding frames (if I remember correctly) 
>and we get a timeout from the ax25 connection. When the ax25 connection, 
>initiated by the netrom connect, times out, we get the hang.
>
>Anyone with kernel knowledge that gets any wiser by this?
>
>73 de Lars, sm6rpz
>  
>
I have already reported my findings about kernel 2.4.x panics to the list.
For me there is no question about the origin of the problem: it is not
specific to netrom but lies in ax25.

Here I am using a serial mkiss interface, and I think the problem is
related to the serial handling part of the code, which makes intensive
use of clear interrupt (cli) instructions. In 2.4 kernels interrupts are
handled differently than in 2.2 kernels, through a new mechanism called
softirq, which apparently is unable to recover from all the interrupts
generated by the ax25 code.

I have traced the oops five times following the recommendations in
Documentation/oops-tracing.txt (see also the REPORTING-BUGS and README
files in /usr/src/linux/).

All reports gave the same message:
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

To keep it short (I have five complete trace listings with the
addresses of the last routines executed by the CPU): the sequence of 29
routines given by the trace before the kernel panics is not always
exactly the same, but it always starts at

sock_def_write_space (sock.c)

and the last 13 are always the same, beginning with do_sysctl_strategy
in sysctl.c and ending with ksoftirq in softirq.c.
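
For anyone who wants to repeat this kind of comparison on their own
traces, here is a minimal sketch in Python that finds the routines
shared at the start and at the end of several decoded traces. The
function names common_prefix and common_suffix are my own, and the two
sample lists are made up and shortened for illustration (routine_a,
routine_b and routine_c are placeholders). They are not my real
listings.

```python
def common_prefix(traces):
    """Return the leading routines shared by every trace."""
    shared = []
    # zip() stops at the shortest trace, so no index can run off the end.
    for names in zip(*traces):
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    return shared


def common_suffix(traces):
    """Return the trailing routines shared by every trace."""
    reversed_traces = [list(reversed(t)) for t in traces]
    return list(reversed(common_prefix(reversed_traces)))


# Hypothetical, shortened traces for illustration only (not real data).
example_traces = [
    ["sock_def_write_space", "routine_a", "routine_b",
     "do_sysctl_strategy", "ksoftirq"],
    ["sock_def_write_space", "routine_c",
     "do_sysctl_strategy", "ksoftirq"],
]

if __name__ == "__main__":
    print("common start:", common_prefix(example_traces))
    print("common end:  ", common_suffix(example_traces))
```

Each trace is assumed to be a list of routine names in the order they
were executed, one list per oops.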

I guess that the important point is how the code sequence leading to
a kernel panic starts and which routines are involved.

In my case, the routine ax25_rcv, which makes a lot of cli() calls,
and ax25_kiss_rcv were often involved in the fatal sequence.

I am aware that a lot of code cleanup is being done in 2.5.x kernels
following the decision to remove the cli() / sti() mechanism. This
should prevent the system from hanging, but so far I have not been able
to run such an experimental kernel (no display at boot!).

I would certainly be interested in testing ax25 after these extensive
changes to the interrupt mechanism.


73 de Bernard F6BVP