* libax25 axio_flush bug hangs node and fix
@ 2002-10-15 9:52 Tihomir Heidelberg
2002-10-15 18:22 ` Paul Lewis
2002-10-16 3:21 ` Craig Small
0 siblings, 2 replies; 9+ messages in thread
From: Tihomir Heidelberg @ 2002-10-15 9:52 UTC (permalink / raw)
To: linux-hams
Hi
I noticed that my awznode (a variant of linux node) hangs and
uses maximum CPU when a user starts any external command and
the incoming AX.25 connection then gets broken.
After tracing, I found something strange in ax25io.c in
libax25-0.0.10.
The function
static int flush_obuf(ax25io *p)
returns -1 if the write to the output file descriptor fails:
    if ((ret = write(p->ofd, p->obuf, p->optr < p->paclen ? p->optr : p->paclen)) < 0)
        return -1;
But in axio_flush(ax25io *p) we have the following loop:
    while (p->optr) {
        FD_ZERO(&fdset);
        FD_SET(p->ofd, &fdset);
        if (select(p->ofd+1, NULL, &fdset, NULL, NULL)<0)
            return -1;
        flushed+=flush_obuf(p);
        flushed+=j;
    }
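It is obvious that this loop will never end if the above-mentioned
write fails, because the return value of flush_obuf is not checked and
p->optr never reaches zero. After I changed the loop to break out when
flush_obuf returns -1, my awznode no longer hangs.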
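To illustrate the kind of check meant here, the following is a minimal
self-contained sketch, not the actual patch and not the real libax25
code. The struct sketch_io and the sketch_* function names are
hypothetical stand-ins; only the fields ofd, obuf, optr and paclen are
taken from the snippets above, and the buffer handling after a
successful write is an assumption.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical stand-in for ax25io; only the fields used above. */
struct sketch_io {
    int ofd;                    /* output file descriptor */
    unsigned char obuf[4096];   /* pending output */
    size_t optr;                /* number of pending bytes in obuf */
    size_t paclen;              /* maximum bytes per write */
};

/* Write at most one paclen-sized chunk; return bytes written or -1. */
static int sketch_flush_obuf(struct sketch_io *p)
{
    size_t chunk = p->optr < p->paclen ? p->optr : p->paclen;
    ssize_t ret = write(p->ofd, p->obuf, chunk);

    if (ret < 0)
        return -1;              /* optr is left unchanged on failure */

    /* Assumed bookkeeping: move the unsent bytes to the front. */
    memmove(p->obuf, p->obuf + ret, p->optr - (size_t)ret);
    p->optr -= (size_t)ret;
    return (int)ret;
}

/* Flush everything; return total bytes written or -1 on any error. */
static int sketch_flush(struct sketch_io *p)
{
    int flushed = 0;
    fd_set fdset;

    assert(p->paclen > 0 && p->optr <= sizeof p->obuf);

    while (p->optr) {
        FD_ZERO(&fdset);
        FD_SET(p->ofd, &fdset);
        if (select(p->ofd + 1, NULL, &fdset, NULL, NULL) < 0)
            return -1;

        int n = sketch_flush_obuf(p);
        if (n < 0)
            return -1;          /* leave the loop instead of spinning */
        flushed += n;
    }
    return flushed;
}

/* With SIGPIPE ignored, write() on a broken pipe fails with EPIPE
 * instead of killing the process. */
static void sketch_ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}
```

This sketch treats every write error as fatal, including EINTR and
EAGAIN; whether the library should retry those is a separate question.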
Can the maintainer of libax25 fix this in the next release?
73 de Tihomir Heidelberg, 9a4gl@9a0tcp.ampr.org
* Re: libax25 axio_flush bug hangs node and fix
2002-10-15 9:52 libax25 axio_flush bug hangs node and fix Tihomir Heidelberg
@ 2002-10-15 18:22 ` Paul Lewis
2002-10-15 19:58 ` Tihomir Heidelberg
2002-10-16 3:21 ` Craig Small
1 sibling, 1 reply; 9+ messages in thread
From: Paul Lewis @ 2002-10-15 18:22 UTC (permalink / raw)
To: Tihomir Heidelberg; +Cc: linux-hams
Are the symptoms of this as follows?
Internal connections to my Linux_node, after which the onward connection
fails. Under certain conditions the entry in the node user table stays
there until I kill the associated node process.
Also, incoming netrom connections to my linux_node, where the link
breaks between the remote and my system, do not clear down and leave
the user entry (USERS command) until I get round to killing the
associated node process. This seems to date from the time I installed
libax25-0.0.10.
I had been thinking that I was looking for a netrom/node patch, as I had
to rebuild my system in August due to the loss of a hard disk, and I
installed the latest libax25 as part of the rebuild.
de Paul g4apl (gb7cip Caterham_Uk)
--
paul@skywaves.demon.co.uk
* Re: libax25 axio_flush bug hangs node and fix
2002-10-15 18:22 ` Paul Lewis
@ 2002-10-15 19:58 ` Tihomir Heidelberg
0 siblings, 0 replies; 9+ messages in thread
From: Tihomir Heidelberg @ 2002-10-15 19:58 UTC (permalink / raw)
To: paul; +Cc: linux-hams
Hi
>Internal connections to my Linux_node, after which the onward connection
>fails. Under certain conditions the entry in the node user table stays
>there until I kill the associated node process.
It happened when someone connected to the node, then executed any
external command (one that calls any Linux command), and then the
connection got broken.
But because axio_flush enters an endless loop when one end of the pipe
dies, it is quite possible that this bug also makes node hang in other
situations.
As far as I can see, this loop does not exist in libax25-0.0.9, so it is
very likely that I got this problem after installing libax25-0.0.10.
At http://ham2.cc.fer.hr/9a4gl/ax25io.patch there is a patch you can
apply to libax25-0.0.10 to see if it helps in your situation.
Of course, then do:
make
make install
ldconfig
73 de Tihomir, 9a4gl@9a0tcp.ampr.org
* Re: libax25 axio_flush bug hangs node and fix
2002-10-15 9:52 libax25 axio_flush bug hangs node and fix Tihomir Heidelberg
2002-10-15 18:22 ` Paul Lewis
@ 2002-10-16 3:21 ` Craig Small
2002-10-16 12:56 ` Tomi Manninen OH2BNS
1 sibling, 1 reply; 9+ messages in thread
From: Craig Small @ 2002-10-16 3:21 UTC (permalink / raw)
To: Tihomir Heidelberg; +Cc: linux-hams
On Tue, Oct 15, 2002 at 11:52:11AM +0200, Tihomir Heidelberg wrote:
> I noticed that my awznode (a variant of linux node) hangs and
> uses maximum CPU when a user starts any external command and
> the incoming AX.25 connection then gets broken.
Damn, I thought this had been fixed in 10. Tomi and someone else were
discussing it; I'll need their help again.
- Craig
--
Craig Small VK2XLZ GnuPG:1C1B D893 1418 2AF4 45EE 95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/ <csmall@enc.com.au>
MIEEE <csmall@ieee.org> Debian developer <csmall@debian.org>
* Re: libax25 axio_flush bug hangs node and fix
2002-10-16 3:21 ` Craig Small
@ 2002-10-16 12:56 ` Tomi Manninen OH2BNS
2002-10-17 7:46 ` Kjell Jarl
0 siblings, 1 reply; 9+ messages in thread
From: Tomi Manninen OH2BNS @ 2002-10-16 12:56 UTC (permalink / raw)
To: Craig Small; +Cc: linux-hams
On Wed, 16 Oct 2002, Craig Small wrote:
> On Tue, Oct 15, 2002 at 11:52:11AM +0200, Tihomir Heidelberg wrote:
> > I noticed that my awznode (a variant of linux node) hangs and
> > uses maximum CPU when a user starts any external command and
> > the incoming AX.25 connection then gets broken.
>
> Damn, I thought this had been fixed in 10. Tomi and someone else were
> discussing it; I'll need their help again.
No, actually the bug was introduced in .10 ...
Anyway, I spent last evening thinking back over what was discussed with
Jeroen last spring and then cleaning up his last patch a bit. I will send
that to you in a few days.
Now, the problem is that this patch will revert the "fix" that was tried
in 10. Jeroen tried to fix a problem in the applications with a fix in the
lib. After the discussion we finally agreed that the .9 behaviour is
correct, only a few relatively minor bugs needed to be squashed. The "big"
bug is in the applications using libax25io and _they_ need to be fixed.
So finally yesterday I also started to really think how to fix node. The
fix won't be trivial but I'll try and produce a working fix soon. Gosh...
That would be the first new node release since -99 ...
(If you want a quick fix, then Tihomir's patch seems ok to me. But I
really wouldn't like to see that in an official release.)
--
Tomi Manninen Internet: oh2bns@sral.fi
OH2BNS AX.25: oh2bns@oh2rbi.fin.eu
KP20ME04 Amprnet: oh2bns@oh2rbi.ampr.org
* Re: libax25 axio_flush bug hangs node and fix
2002-10-16 12:56 ` Tomi Manninen OH2BNS
@ 2002-10-17 7:46 ` Kjell Jarl
2002-10-17 13:41 ` Tomi Manninen
0 siblings, 1 reply; 9+ messages in thread
From: Kjell Jarl @ 2002-10-17 7:46 UTC (permalink / raw)
To: linux-hams
Tomi Manninen OH2BNS wrote:
> No, actually the bug was introduced in .10 ...
I am running libax25-0.0.7-7 under Red Hat 7.1, kernel 2.4.19 (2.4.13 was
the same). I feel I have the same problem, with freezing after broken
connections, and probably in more than just node.
Is this the same bug?
73
Kjell sm7gvf
* Re: libax25 axio_flush bug hangs node and fix
2002-10-17 7:46 ` Kjell Jarl
@ 2002-10-17 13:41 ` Tomi Manninen
2002-10-17 18:41 ` Kjell Jarl
0 siblings, 1 reply; 9+ messages in thread
From: Tomi Manninen @ 2002-10-17 13:41 UTC (permalink / raw)
To: Kjell Jarl; +Cc: linux-hams
On Thu, 17 Oct 2002, Kjell Jarl wrote:
> Tomi Manninen OH2BNS wrote:
> > No, actually the bug was introduced in .10 ...
>
> I am running libax25-0.0.7-7 under Red Hat 7.1, kernel 2.4.19 (2.4.13 was
> the same). I feel I have the same problem, with freezing after broken
> connections, and probably in more than just node.
> Is this the same bug?
No it isn't. This bug is only in 0.0.10. What exactly freezes?
--
Tomi Manninen Internet: oh2bns@sral.fi
OH2BNS AX.25: oh2bns@oh2rbi.fin.eu
KP20ME04 Amprnet: oh2bns@oh2rbi.ampr.org
* Re: libax25 axio_flush bug hangs node and fix
2002-10-17 13:41 ` Tomi Manninen
@ 2002-10-17 18:41 ` Kjell Jarl
2002-10-17 19:17 ` Jeroen Vreeken
0 siblings, 1 reply; 9+ messages in thread
From: Kjell Jarl @ 2002-10-17 18:41 UTC (permalink / raw)
To: linux-hams
> No it isn't. This bug is only in 0.0.10. What exactly freezes?
I get a kernel panic, similar to
> <0> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
I do not have a recent one written down.
I am using a USCC card, which I have suspected before. But it
seems to be related to either netrom/node disconnects, possibly with
outstanding frames, or TCP/IP started from node, or a DISC/UA at the
AX.25 link layer.
I have seen the DISC/UA in listen on a frozen VNC screen before the
(VNC) TCP connection ran out of retries and the screen vanished.
There is nothing in the logs about the failure, except
Oct 16 21:55:27 pc2 node[650]: sm6tpn @ 213.x.x.x logged in
Oct 16 21:55:45 pc2 node[650]: Connected to VAXJO:SK7HW-5
Oct 17 01:47:21 pc2 syslogd 1.4-0: restart.
suggesting node is involved; probably sm6tpn issued a netrom connect.
73
Kjell
RH7.1
Linux pc2 2.4.19 #3 Tue Sep 17 18:33:09 CEST 2002 i586 unknown
libax25-devel-0.0.7-7
libax25-0.0.7-7
ax25-tools-0.0.6-13
ax25-apps-0.0.4-9
node-0.3.0-5
* Re: libax25 axio_flush bug hangs node and fix
2002-10-17 18:41 ` Kjell Jarl
@ 2002-10-17 19:17 ` Jeroen Vreeken
0 siblings, 0 replies; 9+ messages in thread
From: Jeroen Vreeken @ 2002-10-17 19:17 UTC (permalink / raw)
To: Kjell Jarl, linux-hams
On 2002.10.17 20:41:54 +0200 Kjell Jarl wrote:
> > No it isn't. This bug is only in 0.0.10. What exactly freezes?
>
> I get a kernel panic, similar to
>
> > <0> Kernel panic: Aiee, killing interrupt handler!
> > In interrupt handler - not syncing
>
> I do not have a recent one written down.
>
> I am using a USCC card, which I have suspected before. But it
> seems to be related to either netrom/node disconnects, possibly with
> outstanding frames, or TCP/IP started from node, or a DISC/UA at the
> AX.25 link layer.
>
> I have seen the DISC/UA in listen on a frozen VNC screen before the
> (VNC) TCP connection ran out of retries and the screen vanished.
I have been chasing a bug that seems similar lately...
It looks like something goes wrong in sock_def_write_space(). It is called
as a result of kfree_skb().
I have been trying to find the cause of it but have been unsuccessful
so far.
The problem I have is that it only occurs about once a week, so any fix I
think I have made takes at least a week to test :( unless I find a way to
trigger it faster.
Jeroen