All of lore.kernel.org
 help / color / mirror / Atom feed
* AX.25 unaccepted socket makes problems
@ 2003-05-27 16:17 Tihomir Heidelberg
  2003-05-28  0:04 ` Thomas Osterried
  0 siblings, 1 reply; 6+ messages in thread
From: Tihomir Heidelberg @ 2003-05-27 16:17 UTC (permalink / raw)
  To: linux-hams

Hi linux-hams,

finaly I found a way how to produce AX.25 kernel get mad,
I mean, when "cat /proc/net/ax25" produce segmentation fault,
and everything got unstable.

If your program bind ax.25 socket for listening, other station
connects to that listening socket and if listening socket dies
without accepting new connection from you program, after short
period the things got mad...

In ax25_destroy_socket function (af_ax25.c), there is a while
loop for dequeuing those unaccepted connections and preparing
them to die when heardbeat catch them. Also, there is code in
ax25_std_heartbeat_expiry function (ax25_std_timer.c) that
should destroy those connections, unfortunately those connection
are NOT destroyed here ! A condition:
        (ax25->sk->state == TCP_LISTEN && ax25->sk->dead)
is not true because state of those connections are not
TCP_LISTEN. State is TCP_ESTABLISHED or TCP_CLOSE, depending if
other end disconnected or not. Cannot get the reason why
segmentation fault is happening, but hope someone else can find
a reason when problem is reproducable.

To avoid this problem I added to ax25_destroy_socket function
(af_ax25.c) following:
        if (skb->sk != ax25->sk) {
+               skb->sk->state     = TCP_LISTEN;
+               if (!skb->sk->dead) {
+                       skb->sk->state_change(skb->sk);
+               }
                skb->sk->dead  = 1;
                ax25_start_heartbeat(skb->sk->protinfo.ax25);
                skb->sk->protinfo.ax25->state = AX25_STATE_0;
        }

I am pretty sure that this is not the way how problem should be
fixed, there should be a different way how to mark socket for
destroying, but at least, I got rid of this problem.

Also, some months ago I mention here that regulary I get this
AX.25 kernel behavior after few days of running 9A0TCP gateway
machine. I noticed that very often ax25d died or had to restart
ax25d because it was not handling connections. Think this
bind/non-accept kernel problem is very probably the reason.

Anyone agree with me ? Can anyone make a resonable patch/fix
for this problem ? Maybe setting skb->sk->destroy to 1 is right
way ?

73 de Tihomir Heidelberg, 9a4gl@9a0tcp.ampr.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: AX.25 unaccepted socket makes problems
  2003-05-27 16:17 Tihomir Heidelberg
@ 2003-05-28  0:04 ` Thomas Osterried
  2003-05-28  6:46   ` Tihomir Heidelberg
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Osterried @ 2003-05-28  0:04 UTC (permalink / raw)
  To: Tihomir Heidelberg; +Cc: linux-hams

> are NOT destroyed here ! A condition:
>         (ax25->sk->state == TCP_LISTEN && ax25->sk->dead)
> is not true because state of those connections are not

as far as i got into the code when i looked for the reason for the
segfault problem, former connected sessions come to the state TCP_LISTEN
when they get disconnected.

> +               skb->sk->state     = TCP_LISTEN;
> +               if (!skb->sk->dead) {
> +                       skb->sk->state_change(skb->sk);
> +               }

by insuring with with your patch should at least do not harm.

perhaps there's the problem for this phaenome:
|    DB0TUD-0   DB0TUD-1   ax0     LISTENING    007/005  0       320
|    DL6MPG-1   DB0TUD-15  ax0     LISTENING    005/007  0       432
|    DB0TUD-0   DB0TUD-4   ax0     LISTENING    002/005  0       80
|    DB0DSD-0   DB0TUD-1   ax0     LISTENING    001/000  0       80
|    DL6MPG-4   DB0TUD-15  ax0     LISTENING    002/001  0       816

> Cannot get the reason why
> segmentation fault is happening, but hope someone else can find
> a reason when problem is reproducable.

as i noted abt a month ago in a mail to this list, i tracked the
problem down to the following point:
there's a list of active ax25 control blocks (ax25_cb *ax25_list).
this list gets corrupted. while walking through ax25_list->next->....->next,
somewhere in this reference is pointing to somewhere where it should not.

ax25_get_info(), which is called when doing "netstat -a" or
"cat /proc/net/ax25" from userspace, walks through this list. i had a chance
to let show me the data of all ax25_cb elements while walking through
the list (after i spicked my kernel with debug routines).
somewhere in the middle of the list, the reference to "next" was inkonsistant
in comparison to the time it was before, and the kernel oopsed while walking
over the next list element pointer.

i looked several times over the code (routines adding and removing list
elements, checking for cli() and restore_flags()). everything looks ok
for me.

currently, i assume this error is such difficile, because it may not
be caused by the routines which change the list, but because _maybe_
another routine in ax25.o allocates a buffer / struct, writes more
data as it should to it, and overwrites parts of the struct where
the ax25 control block is stored, causing corrupted data in the linked
list.

[as i mentioned also, ax25rtd which adds ax25 route lists to the kernel,
causes troubles to the kernel. perhaps it's one of those routines which
messes up the ax25 cb lists as side effect]

btw, a few days ago i announced a bugfix (the [invalid] to [invalid] SABM
problem). it does fix the problem. but my hope that the kernel oopses
will go away was not proven: they still occured.

> Also, some months ago I mention here that regulary I get this
> AX.25 kernel behavior after few days of running 9A0TCP gateway
> machine. I noticed that very often ax25d died or had to restart
> ax25d because it was not handling connections. Think this
> bind/non-accept kernel problem is very probably the reason.

afik, it's caused by the inkonsistent list:
ax25_find_listener() walks over ax25_list. and may not find the ax25
control block which actually listens.
ax25_find_listener() is called by ax25_rcv() in ax25_in.c when an SABM
is received.

a linux with a flexnet digipeater as neighbour (which connects quite often
for link test probes) gets most oopses this way.


on the other hand, an ax25 socket which is listening for incoming connections,
for e.g. by ax25d, gets sometimes
[pid  4924] accept(7, 0xbffff714, [72]) = -1 ECONNABORTED (Software caused connection abort)
that should never happen.


73,
	- thomas  dl9sau


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: AX.25 unaccepted socket makes problems
  2003-05-28  0:04 ` Thomas Osterried
@ 2003-05-28  6:46   ` Tihomir Heidelberg
  2003-05-28 10:20     ` Steven Whitehouse
  2003-05-28 10:25     ` Thomas Osterried
  0 siblings, 2 replies; 6+ messages in thread
From: Tihomir Heidelberg @ 2003-05-28  6:46 UTC (permalink / raw)
  To: thomas; +Cc: linux-hams

Hi,

>> are NOT destroyed here ! A condition:
>>         (ax25->sk->state == TCP_LISTEN && ax25->sk->dead)
>> is not true because state of those connections are not
>
>as far as i got into the code when i looked for the reason for the
>segfault problem, former connected sessions come to the state TCP_LISTEN
>when they get disconnected.

I couldnt get TCP_LISTEN state, ax25_rcv in ax25_in.c set that state
to TCP_ESTABLISHED, dont see where it can be changed to TCP_LISTEN,
debugging messages reports TCP_ESTABLISHED or TCP_CLOSE, depending
if remote station disconnected or not.

>> +               skb->sk->state     = TCP_LISTEN;
>> +               if (!skb->sk->dead) {
>> +                       skb->sk->state_change(skb->sk);
>> +               }
>
>by insuring with with your patch should at least do not harm.

yap, but TCP_LISTEN state is not proper state for such connections,
so think there should be more reasonable fix for this

also I think that skb->sk->pair should be set to NULL, because from this
point the pair (listening socket) does not exist any more and reference
to this can cause problem, right ?

> there's a list of active ax25 control blocks (ax25_cb *ax25_list).
> this list gets corrupted. while walking through ax25_list->next->....->next,
> somewhere in this reference is pointing to somewhere where it should not.
...
> i looked several times over the code (routines adding and removing list
> elements, checking for cli() and restore_flags()). everything looks ok
> for me.

yap, I also spent a lot of hours looking in code that handle this list,
but cannot find anything that can hurt that list

> [as i mentioned also, ax25rtd which adds ax25 route lists to the kernel,
> causes troubles to the kernel. perhaps it's one of those routines which
> messes up the ax25 cb lists as side effect]

ax25rtd/axparms calls ioctl(SIOCADDRT) on ax.25 socket, I see that
ax25_route_list is not protected with cli() stuff, maybe we should
protect this list too ? dont know how this can hurt ax25_list, but
protecting this list will not hurt anyone...

anyway, I read that Ralf is doing some things in 2.5 kernel tree,
spinlock instead cli() protection should be used in future ax.25
kernel, as I read Ralf works on spinlock in ax.25 code... should
we move to spinlocks in 2.4 kernels too or we will wait for 2.6 ?

>> Also, some months ago I mention here that regulary I get this
>> AX.25 kernel behavior after few days of running 9A0TCP gateway
>> machine. I noticed that very often ax25d died or had to restart
>> ax25d because it was not handling connections. Think this
>> bind/non-accept kernel problem is very probably the reason.
>
>afik, it's caused by the inkonsistent list:
>ax25_find_listener() walks over ax25_list. and may not find the ax25
>control block which actually listens.
>ax25_find_listener() is called by ax25_rcv() in ax25_in.c when an SABM
>is received.

yup, but if ax25d stops accepting connections, you will get ooops, dont know
what came first, ax25d problem or kernel problem... but for sure ax25d
problem may cause kernel problem...

73 de Tihomir Heidelberg, 9a4gl@9a0tcp.ampr.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: AX.25 unaccepted socket makes problems
  2003-05-28  6:46   ` Tihomir Heidelberg
@ 2003-05-28 10:20     ` Steven Whitehouse
  2003-05-28 10:25     ` Thomas Osterried
  1 sibling, 0 replies; 6+ messages in thread
From: Steven Whitehouse @ 2003-05-28 10:20 UTC (permalink / raw)
  To: Tihomir Heidelberg; +Cc: thomas, linux-hams

Hi,

I've got a few odd bits of patches for AX.25 for 2.5 kernels that I'm intending
to feed into the kernel shortly. I'll take a look at the problem you've
brought up at the same time as I'd like to have that solved. My main
interest though is 2.5 rather than fixing up 2.4.

[snip]
> 
> > [as i mentioned also, ax25rtd which adds ax25 route lists to the kernel,
> > causes troubles to the kernel. perhaps it's one of those routines which
> > messes up the ax25 cb lists as side effect]
> 
> ax25rtd/axparms calls ioctl(SIOCADDRT) on ax.25 socket, I see that
> ax25_route_list is not protected with cli() stuff, maybe we should
> protect this list too ? dont know how this can hurt ax25_list, but
> protecting this list will not hurt anyone...
> 
> anyway, I read that Ralf is doing some things in 2.5 kernel tree,
> spinlock instead cli() protection should be used in future ax.25
> kernel, as I read Ralf works on spinlock in ax.25 code... should
> we move to spinlocks in 2.4 kernels too or we will wait for 2.6 ?
>
Yes, some of his work has made it in, some more can be found in the linux-hams
bkbits tree. If you are looking at structures that are not currently
protected in 2.4, then I'd suggest looking at spinlocks right away. You
might find the solution in Ralf's work for 2.5 and it would be a good
idea to try and keep 2.4 similar to 2.5 where possible.

The last day or two I've been quite busy and I hope to have some time to
put out my patches for testing shortly,

Steve.
G7RRM


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: AX.25 unaccepted socket makes problems
  2003-05-28  6:46   ` Tihomir Heidelberg
  2003-05-28 10:20     ` Steven Whitehouse
@ 2003-05-28 10:25     ` Thomas Osterried
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Osterried @ 2003-05-28 10:25 UTC (permalink / raw)
  To: Tihomir Heidelberg

> also I think that skb->sk->pair should be set to NULL, because from this
> point the pair (listening socket) does not exist any more and reference
> to this can cause problem, right ?

hmm. the listening socket exists as long the userspace application does
listen().
but i do not really understand the sk->pair concept.


also i have problems in understanding the concept in ax25_destroy_socket():
skb->sk->dead is set to 1, then ax25_start_heartbeat() is called, which
is a timer (which expires in +5s).

if we set it to skb->sk->state = TCP_LISTEN and our state is
AX25_STATE_0, then this timer will call ax25_destroy_socket() again.
due to the timer it is not a loop (hopefully), but after our first
ax25_destroy_socket() goes further down (ax25->sk == NULL), we will call
ax25_free_cb(ax25). what will our timer do, which we restarted above and
which will work on this ax25 control block?

and: if ax25->sk is != NULL and there are no in/outstanding buffer, there's
only an sk_free(ax25->sk). the socket is not destroyed, but ax25->sk is not
set to NULL. does it refer now to an unassigned memory segment? should'n
in this case ax25_free_cb() called too? here also the problem with a
potentially running ax25_std_heartbeat_expiry(), which will use ax25->sk->...
we have just free'd.

well, i deeply hope there's someone out there who knows how this code works ;)


> > [as i mentioned also, ax25rtd which adds ax25 route lists to the kernel,
> > causes troubles to the kernel. perhaps it's one of those routines which
> > messes up the ax25 cb lists as side effect]
> 
> ax25rtd/axparms calls ioctl(SIOCADDRT) on ax.25 socket, I see that
> ax25_route_list is not protected with cli() stuff, maybe we should
> protect this list too ? dont know how this can hurt ax25_list, but
> protecting this list will not hurt anyone...

just thought the same some weeks ago.

i have done this (interested in?). after the observation that when
ax25rtd is running, the oops problem on ax25 cb's (which are independend
of the ax25 route list) occorued.

unfotunately, i never found a way to produce the corrupted ax25 cb list
by force. thus i only could wait for oopses. and my system where i tried
this never oopsed :(

> anyway, I read that Ralf is doing some things in 2.5 kernel tree,
> spinlock instead cli() protection should be used in future ax.25
> kernel, as I read Ralf works on spinlock in ax.25 code... should
> we move to spinlocks in 2.4 kernels too or we will wait for 2.6 ?

i'd advice not to hack too much in the current code. we may get more
problems or side-effects.


73,
	- thomas


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: AX.25 unaccepted socket makes problems
@ 2003-05-28 12:29 Tihomir Heidelberg
  0 siblings, 0 replies; 6+ messages in thread
From: Tihomir Heidelberg @ 2003-05-28 12:29 UTC (permalink / raw)
  To: thomas; +Cc: linux-hams

Hi,

>> also I think that skb->sk->pair should be set to NULL, because from this
>> point the pair (listening socket) does not exist any more and reference
>> to this can cause problem, right ?
>
>hmm. the listening socket exists as long the userspace application does
>listen().
>but i do not really understand the sk->pair concept.

yap, but I am talking about when listening socket is going to be
destroyed and there are some unaccepted connections made by listening
socket (lets call them CHILDs)... those CHILDs keeps living in kernel after
listening socket is destroyed... CHILDs have sk->pair pointing
to its listening socket (I really do not know who use/need that sk->pair)
but after listening socket is destroyed I think that CHILDs must not have
a reference to it anymore...

> also i have problems in understanding the concept in ax25_destroy_socket():
> skb->sk->dead is set to 1, then ax25_start_heartbeat() is called, which
> is a timer (which expires in +5s).

yap, CHILDs are destroyed after +5s (heardbeat expirity), I really do not
know why they are destroyed this way with a delay ?! Why dont we call
ax25_destroy_socket recursivly to destroy all CHILDs ? Anyone know the
reason ? They are anyway destroyed the same way on heardbeat exipirity.

> if we set it to skb->sk->state = TCP_LISTEN and our state is
> AX25_STATE_0, then this timer will call ax25_destroy_socket() again.

again, but first time it destroyed listening socket, and from timer it
destroy unaccepted socket...

> potentially running ax25_std_heartbeat_expiry(), which will use
> ax25->sk->... we have just free'd.

no, timer is handling nondestroyed socket (the CHILD)

> i have done this (interested in?). after the observation that when
> ax25rtd is running, the oops problem on ax25 cb's (which are independend
> of the ax25 route list) occorued.

yup, but when ax25_route_list is unprotected, it get broken, it is very
probably that new element in ax25_list is at memory which was used a bit
ago in ax25_route_list, anyone know if ax25_route_list also get broken
when ax25_list is broken ?

73 de Tihomir, 9a4gl@9a0tcp.ampr.org


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2003-05-28 12:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-05-28 12:29 AX.25 unaccepted socket makes problems Tihomir Heidelberg
  -- strict thread matches above, loose matches on Subject: below --
2003-05-27 16:17 Tihomir Heidelberg
2003-05-28  0:04 ` Thomas Osterried
2003-05-28  6:46   ` Tihomir Heidelberg
2003-05-28 10:20     ` Steven Whitehouse
2003-05-28 10:25     ` Thomas Osterried

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.