Linux HAM/Amateur Radio development
* libax25 axio_flush bug hangs node and fix
@ 2002-10-15  9:52 Tihomir Heidelberg
  2002-10-15 18:22 ` Paul Lewis
  2002-10-16  3:21 ` Craig Small
  0 siblings, 2 replies; 9+ messages in thread
From: Tihomir Heidelberg @ 2002-10-15  9:52 UTC
  To: linux-hams

Hi

I noticed that my awznode (a variant of Linux node) hangs and
uses maximum CPU when a user starts any external command and
the incoming AX.25 connection gets broken.

After tracing, I found something strange in ax25io.c in
libax25-0.0.10.

In the function static int flush_obuf(ax25io *p),
-1 is returned if the write to the output file descriptor fails:
if ((ret = write(p->ofd, p->obuf, p->optr < p->paclen ? p->optr : p->paclen)) < 0)
        return -1;

But in axio_flush(ax25io *p)
we have the following loop:
while (p->optr) {
        FD_ZERO(&fdset);
        FD_SET(p->ofd, &fdset);
        if (select(p->ofd+1, NULL, &fdset, NULL, NULL)<0)
                return -1;
        flushed+=flush_obuf(p);
        flushed+=j;
}
Obviously this loop will never end if the write mentioned above
fails, since p->optr is then never reduced to zero. After I changed the
loop to break out when flush_obuf returns -1, my awznode no longer hangs.
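
A minimal sketch of the loop with that check added, for illustration only. It does not use the real libax25 types: struct sketch_io and the sketch_* functions are hypothetical stand-ins that carry only the fields quoted above (ofd, obuf, optr, paclen), and the buffer handling after a successful write is an assumption. It is not the patch itself.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical stand-in for ax25io; only the fields quoted in this thread. */
struct sketch_io {
        int ofd;            /* output file descriptor */
        char obuf[4096];    /* pending output */
        size_t optr;        /* number of pending bytes in obuf */
        size_t paclen;      /* maximum bytes per write */
};

/* Write at most paclen bytes; return bytes written or -1 on error. */
static int sketch_flush_obuf(struct sketch_io *p)
{
        size_t chunk = p->optr < p->paclen ? p->optr : p->paclen;
        ssize_t ret;

        assert(p->paclen > 0);  /* a zero paclen could never make progress */
        ret = write(p->ofd, p->obuf, chunk);
        if (ret < 0)
                return -1;      /* optr is left untouched on failure */
        /* Assumed bookkeeping: drop the bytes that were written. */
        memmove(p->obuf, p->obuf + ret, p->optr - (size_t)ret);
        p->optr -= (size_t)ret;
        return (int)ret;
}

/* Flush everything; stop as soon as a write fails instead of retrying. */
static int sketch_flush(struct sketch_io *p)
{
        int flushed = 0;
        fd_set fdset;

        while (p->optr) {
                int n;

                FD_ZERO(&fdset);
                FD_SET(p->ofd, &fdset);
                if (select(p->ofd + 1, NULL, &fdset, NULL, NULL) < 0)
                        return -1;
                n = sketch_flush_obuf(p);
                if (n < 0)
                        return -1;      /* the added check: leave the loop */
                flushed += n;
        }
        return flushed;
}

/* Usage: flush into a pipe whose reading end has already been closed. */
static int sketch_demo_broken_pipe(void)
{
        int fds[2];
        int ret;
        struct sketch_io io = { .optr = 5, .paclen = 256 };

        if (pipe(fds) < 0)
                return -2;
        signal(SIGPIPE, SIG_IGN);       /* get EPIPE from write(), no signal */
        io.ofd = fds[1];
        memcpy(io.obuf, "hello", 5);
        close(fds[0]);                  /* the other end goes away */
        ret = sketch_flush(&io);
        close(fds[1]);
        return ret;
}
```

With the check in place the demo function should return -1 instead of spinning. Without it, the failed write leaves optr unchanged and the while condition stays true forever.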

Can the maintainer of libax25 fix this in the next release?

73 de Tihomir Heidelberg, 9a4gl@9a0tcp.ampr.org

* Re: libax25 axio_flush bug hangs node and fix
  2002-10-15  9:52 libax25 axio_flush bug hangs node and fix Tihomir Heidelberg
@ 2002-10-15 18:22 ` Paul Lewis
  2002-10-15 19:58   ` Tihomir Heidelberg
  2002-10-16  3:21 ` Craig Small
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Lewis @ 2002-10-15 18:22 UTC
  To: Tihomir Heidelberg; +Cc: linux-hams

Do the symptoms of this show up as follows?
Internal connections to my linux_node, then the onward connection fails.
Under certain conditions the entry in the node user table stays there
until I kill the associated node process.

Also, incoming netrom connections to my linux_node where the link breaks
between the remote and my system do not clear down and leave the
user entry (USERS command) until I get round to killing the associated
node process. This seems to date from the time I installed libax25-0.0.10.

I had been thinking that I was looking for a netrom/node patch, as I had
to rebuild my system in August due to the loss of a hard disk, and installed
the latest libax25 as part of the rebuild.
de Paul g4apl (gb7cip Caterham_Uk)




-- 
paul@skywaves.demon.co.uk

* Re: libax25 axio_flush bug hangs node and fix
  2002-10-15 18:22 ` Paul Lewis
@ 2002-10-15 19:58   ` Tihomir Heidelberg
  0 siblings, 0 replies; 9+ messages in thread
From: Tihomir Heidelberg @ 2002-10-15 19:58 UTC
  To: paul; +Cc: linux-hams

Hi

>Internal connections to my linux_node, then the onward connection fails.
>Under certain conditions the entry in the node user table stays there
>until I kill the associated node process.

It happens when someone connects to the node, then executes any external
command (one that calls any Linux command), and then the connection gets
broken.

But because axio_flush enters an endless loop when one end of the pipe
dies, it is very possible that this bug also causes node hangs in other
situations.
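
A small probe, offered as a sketch, of why the hang shows up as maximum CPU rather than a sleeping process. It assumes ordinary Linux pipe behaviour: once the reading end is gone, select() still reports the writing end as ready, and write() fails immediately with EPIPE, so a retry loop never blocks. The name probe_broken_pipe is made up for this example, and the one-second timeout is there only so the probe itself cannot block.

```c
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Create a pipe, close its reading end, then ask select() whether the
 * writing end is ready and try to write one byte to it.
 * Stores the select() result in *select_ready and returns the errno
 * of the write (0 if the write succeeded, -1 if the pipe could not
 * be created).
 */
static int probe_broken_pipe(int *select_ready)
{
        int fds[2];
        int err;
        ssize_t n;
        fd_set wfds;
        struct timeval tv = { 1, 0 };   /* safety net, not part of the bug */

        if (pipe(fds) < 0)
                return -1;
        signal(SIGPIPE, SIG_IGN);       /* report EPIPE instead of killing us */
        close(fds[0]);                  /* the reader "dies" */

        FD_ZERO(&wfds);
        FD_SET(fds[1], &wfds);
        *select_ready = select(fds[1] + 1, NULL, &wfds, NULL, &tv);

        n = write(fds[1], "x", 1);
        err = n < 0 ? errno : 0;
        close(fds[1]);
        return err;
}
```

If select() reports the descriptor as ready and the write fails every time, a loop that waits in select() and then retries the write never sleeps, which would match the CPU load seen here.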

As far as I can see, this loop does not exist in libax25-0.0.9, so it is
very possible that I got this problem after installing libax25-0.0.10.

At http://ham2.cc.fer.hr/9a4gl/ax25io.patch there is a patch you can apply
to libax25-0.0.10 to see if it helps in your situation.

Of course, afterwards do:
make
make install
ldconfig

73 de Tihomir, 9a4gl@9a0tcp.ampr.org

* Re: libax25 axio_flush bug hangs node and fix
  2002-10-15  9:52 libax25 axio_flush bug hangs node and fix Tihomir Heidelberg
  2002-10-15 18:22 ` Paul Lewis
@ 2002-10-16  3:21 ` Craig Small
  2002-10-16 12:56   ` Tomi Manninen OH2BNS
  1 sibling, 1 reply; 9+ messages in thread
From: Craig Small @ 2002-10-16  3:21 UTC
  To: Tihomir Heidelberg; +Cc: linux-hams

On Tue, Oct 15, 2002 at 11:52:11AM +0200, Tihomir Heidelberg wrote:
> I noticed that my awznode (a variant of Linux node) hangs and
> uses maximum CPU when a user starts any external command and
> the incoming AX.25 connection gets broken.

Damn, I thought this had been fixed in 10. Tomi and someone else were
discussing it; I'll need their help again.

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/                <csmall@enc.com.au>
MIEEE <csmall@ieee.org>                 Debian developer <csmall@debian.org>

* Re: libax25 axio_flush bug hangs node and fix
  2002-10-16  3:21 ` Craig Small
@ 2002-10-16 12:56   ` Tomi Manninen OH2BNS
  2002-10-17  7:46     ` Kjell Jarl
  0 siblings, 1 reply; 9+ messages in thread
From: Tomi Manninen OH2BNS @ 2002-10-16 12:56 UTC
  To: Craig Small; +Cc: linux-hams

On Wed, 16 Oct 2002, Craig Small wrote:

> On Tue, Oct 15, 2002 at 11:52:11AM +0200, Tihomir Heidelberg wrote:
> > I noticed that my awznode (a variant of Linux node) hangs and
> > uses maximum CPU when a user starts any external command and
> > the incoming AX.25 connection gets broken.
> 
> Damn, I thought this had been fixed in 10. Tomi and someone else were
> discussing it; I'll need their help again.

No, actually the bug was introduced in .10 ...

Anyway, I spent last evening thinking back to what was discussed with
Jeroen last spring and then cleaning up his last patch a bit. I will send
that to you in a few days.

Now, the problem is that this patch will revert the "fix" that was tried
in 10. Jeroen tried to fix a problem in the applications with a fix in the
lib. After the discussion we finally agreed that the .9 behaviour is
correct, only a few relatively minor bugs needed to be squashed. The "big"
bug is in the applications using libax25io and _they_ need to be fixed.

So finally yesterday I also started to really think how to fix node. The
fix won't be trivial but I'll try and produce a working fix soon. Gosh...
That would be the first new node release since -99 ...

(If you want a quick fix, then Tihomir's patch seems ok to me. But I
really wouldn't like to see that in an official release.)

-- 
Tomi Manninen           Internet:  oh2bns@sral.fi
OH2BNS                  AX.25:     oh2bns@oh2rbi.fin.eu
KP20ME04                Amprnet:   oh2bns@oh2rbi.ampr.org


* Re: libax25 axio_flush bug hangs node and fix
  2002-10-16 12:56   ` Tomi Manninen OH2BNS
@ 2002-10-17  7:46     ` Kjell Jarl
  2002-10-17 13:41       ` Tomi Manninen
  0 siblings, 1 reply; 9+ messages in thread
From: Kjell Jarl @ 2002-10-17  7:46 UTC
  To: linux-hams

Tomi Manninen OH2BNS wrote:
> No, actually the bug was introduced in .10 ...

I am running libax25-0.0.7-7 under Red Hat 7.1, kernel 2.4.19 (2.4.13 was
the same). I feel I have the same problem with freezing after broken
connections, probably affecting more than just node.
Is this the same bug?
73
Kjell sm7gvf

* Re: libax25 axio_flush bug hangs node and fix
  2002-10-17  7:46     ` Kjell Jarl
@ 2002-10-17 13:41       ` Tomi Manninen
  2002-10-17 18:41         ` Kjell Jarl
  0 siblings, 1 reply; 9+ messages in thread
From: Tomi Manninen @ 2002-10-17 13:41 UTC
  To: Kjell Jarl; +Cc: linux-hams

On Thu, 17 Oct 2002, Kjell Jarl wrote:

> Tomi Manninen OH2BNS wrote:
> > No, actually the bug was introduced in .10 ...
> 
> I am running libax25-0.0.7-7 under Red Hat 7.1, kernel 2.4.19 (2.4.13 was
> the same). I feel I have the same problem with freezing after broken
> connections, probably affecting more than just node.
> Is this the same bug?

No it isn't. This bug is only in 0.0.10. What exactly freezes?


-- 
Tomi Manninen           Internet:  oh2bns@sral.fi
OH2BNS                  AX.25:     oh2bns@oh2rbi.fin.eu
KP20ME04                Amprnet:   oh2bns@oh2rbi.ampr.org


* Re: libax25 axio_flush bug hangs node and fix
  2002-10-17 13:41       ` Tomi Manninen
@ 2002-10-17 18:41         ` Kjell Jarl
  2002-10-17 19:17           ` Jeroen Vreeken
  0 siblings, 1 reply; 9+ messages in thread
From: Kjell Jarl @ 2002-10-17 18:41 UTC
  To: linux-hams

> No it isn't. This bug is only in 0.0.10. What exactly freezes?

I get a kernel panic. Similar to

> <0> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing

I do not have a recent one written down.

I am using a USCC card, which I have suspected before. But it
seems to be related to either netrom/node disconnects, possibly with
outstanding frames, or TCP/IP started from node, or a DISC/UA at the
AX.25 link layer.

I have seen the DISC/UA in listen on a frozen VNC screen before the (VNC)
TCP connection ran out of retries and the screen vanished.

Nothing in the logs about the failure, except

Oct 16 21:55:27 pc2 node[650]: sm6tpn @ 213.x.x.x logged in
Oct 16 21:55:45 pc2 node[650]: Connected to VAXJO:SK7HW-5
Oct 17 01:47:21 pc2 syslogd 1.4-0: restart.

suggesting node was involved; probably sm6tpn issued a netrom connect.

73
Kjell

RH7.1
Linux pc2 2.4.19 #3 Tue Sep 17 18:33:09 CEST 2002 i586 unknown
libax25-devel-0.0.7-7
libax25-0.0.7-7
ax25-tools-0.0.6-13
ax25-apps-0.0.4-9
node-0.3.0-5

* Re: libax25 axio_flush bug hangs node and fix
  2002-10-17 18:41         ` Kjell Jarl
@ 2002-10-17 19:17           ` Jeroen Vreeken
  0 siblings, 0 replies; 9+ messages in thread
From: Jeroen Vreeken @ 2002-10-17 19:17 UTC
  To: Kjell Jarl, linux-hams

On 2002.10.17 20:41:54 +0200 Kjell Jarl wrote:
> > No it isn't. This bug is only in 0.0.10. What exactly freezes?
> 
> I get a kernel panic. Similar to
> 
> > <0> Kernel panic: Aiee, killing interrupt handler!
> > In interrupt handler - not syncing
> 
> I do not have a recent one written down.
> 
> I am using a USCC card, which I have suspected before. But it
> seems to be related to either netrom/node disconnects, possibly with
> outstanding frames, or TCP/IP started from node, or a DISC/UA at the
> AX.25 link layer.
> 
> I have seen the DISC/UA in listen on a frozen VNC screen before the (VNC)
> TCP connection ran out of retries and the screen vanished.

I have been chasing a bug that seems similar lately...
It looks like something goes wrong in sock_def_write_space(). It is called
as a result of kfree_skb().
I have been trying to find the cause of it but have been unsuccessful
so far.
The problem I have is that it only occurs about once a week, so any fix I
think I have made takes at least a week to test :( unless I find a way to
trigger it faster.

Jeroen


