* [PATCH] negative dev use in /proc/net/rose_neigh
@ 2008-09-28 19:56 Bernard Pidoux
2008-09-30 14:32 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Bernard Pidoux @ 2008-09-28 19:56 UTC (permalink / raw)
To: netdev, davem; +Cc: bpidoux
[-- Attachment #1: Type: text/plain, Size: 1 bytes --]
[-- Attachment #2: [PATCH] negative dev use in /proc/net/rose_neigh --]
[-- Type: text/plain, Size: 1660 bytes --]
When running rose network applications, rose_neigh use counter can become negative as shown below.
Number 65535 actually represents a short integer underflow, meaning that use counter has been
decremented while equal to zero.
Then use counter continues to decrease by one each time the function is called.
proc/net/rose_neigh
addr callsign dev count use mode restart t0 tf digipeaters
00005 F5KCK-11 ax1 4 1 DTE yes 0 0
00004 F6BVP-5 ax4 6 0 DTE no 0 0
00003 F6BVP-7 ax4 6 0 DCE yes 0 0
00002 F6BVP-11 ax4 6 65535 DCE yes 0 0
00001 RSLOOP-0 ??? 1 4 DCE yes 0 0
After investigations I found that use counter value was going negative when
rose_kill_by_neigh() (in af_rose.c) was called and sk_for_each() macro loop activated
rose->neighbour->use-- more than once.
I propose the following patch to avoid use counter underflow.
However a KERN_WARNING message could be better instead of KERN_ERR.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
---
net/rose/af_rose.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index a7f1ce1..8a54cff 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -175,7 +175,10 @@ void rose_kill_by_neigh(struct rose_neigh *neigh)
if (rose->neighbour == neigh) {
rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
- rose->neighbour->use--;
+ if (rose->neighbour->use > 0 )
+ rose->neighbour->use--;
+ else
+ printk(KERN_ERR "ROSE: rose_kill_by_neigh() - neighbour->use-- could be < 0\n");
rose->neighbour = NULL;
}
}
--
1.5.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] negative dev use in /proc/net/rose_neigh
2008-09-28 19:56 [PATCH] negative dev use in /proc/net/rose_neigh Bernard Pidoux
@ 2008-09-30 14:32 ` David Miller
2008-09-30 21:44 ` Bernard Pidoux
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2008-09-30 14:32 UTC (permalink / raw)
To: bpidoux; +Cc: netdev
From: Bernard Pidoux <bpidoux@free.fr>
Date: Sun, 28 Sep 2008 21:56:38 +0200
> I propose the following patch to avoid use counter underflow.
> However a KERN_WARNING message could be better instead of KERN_ERR.
I don't see any large value in adding this patch right now.
At best it's a new BUG check, it doesn't actually fix the
problem. I'd rather apply a fix :-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] negative dev use in /proc/net/rose_neigh
2008-09-30 14:32 ` David Miller
@ 2008-09-30 21:44 ` Bernard Pidoux
2008-10-13 7:30 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Bernard Pidoux @ 2008-09-30 21:44 UTC (permalink / raw)
To: David Miller; +Cc: netdev
I slightly disagree. The patch actually prevents use counter to become
underflowed (starting with 65535 then 65534 etc...)
The patch makes sure that use never becomes less than 0.
This may prevent further problems.
However I agree that the real reason of the bug is still unknown.
It must be hidden in the very obscure loop (at least for me).
Here is the original code :
void rose_kill_by_neigh( struct rose_neigh *neigh)
{
struct sock *s;
struct hlist_node *node;
spin_lock_bh(&rose_list_lock);
sk_for_each(s,node,&rose_list) {
struct rose_sock *rose = rose_sk(s);
if (rose->neighbour == neigh) {
rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
rose->neighbour->use--;
rose->neighbour = NULL;
}
}
spin_unlock_bh(&rose_list_lock);
}
I suspect that the bug was unravelled when we added more than one
neighbour per route. The protocole accepts three, but this was not much
used during the early days since the density of radio stations on the
network was not big (only one node station per destination address
usually). The network is now denser with Internet links.
However, I don't understand why the test
if (rose->neighbour == neigh)
does not work, for
rose->neighbour = NULL;
should prevent next comparison to be valid and thus instruction
rose->neighbour->use--; not executed.
I have seen that a problem with sk_for_each() macro was identified a
while ago into ax25 code. The problem here could be similar ?
Bernard
David Miller wrote:
> From: Bernard Pidoux <bpidoux@free.fr>
> Date: Sun, 28 Sep 2008 21:56:38 +0200
>
>> I propose the following patch to avoid use counter underflow.
>> However a KERN_WARNING message could be better instead of KERN_ERR.
>
> I don't see any large value in adding this patch right now.
> At best it's a new BUG check, it doesn't actually fix the
> problem. I'd rather apply a fix :-)
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] negative dev use in /proc/net/rose_neigh
2008-09-30 21:44 ` Bernard Pidoux
@ 2008-10-13 7:30 ` David Miller
2008-10-21 14:14 ` pidoux
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2008-10-13 7:30 UTC (permalink / raw)
To: bpidoux; +Cc: netdev
From: Bernard Pidoux <bpidoux@free.fr>
Date: Tue, 30 Sep 2008 23:44:55 +0200
> I suspect that the bug was unravelled when we added more than one
> neighbour per route. The protocole accepts three, but this was not
> much used during the early days since the density of radio stations
> on the network was not big (only one node station per destination
> address usually). The network is now denser with Internet links.
>
> However, I don't understand why the test
>
> if (rose->neighbour == neigh)
>
> does not work, for
> rose->neighbour = NULL;
> should prevent next comparison to be valid and thus instruction
> rose->neighbour->use--; not executed.
>
> I have seen that a problem with sk_for_each() macro was identified a
> while ago into ax25 code. The problem here could be similar ?
I took a look at this some more.
That neighbour case loop you mention does set ->neighbour to NULL.
But other paths do not. Just look for all of the pieces of code
that do "rose->neighbour->use--;" and you'll find a few that do not
NULL it out.
One such example is rose_kill_by_device().
That would cause a problem because, while rose_disconnect() marks
the socket DEAD, it doesn't actually remove it from "rose_list".
That happens later when the user actually closes the socket or
some other similar event occurs.
Therefore if rose_kill_by_neigh() happens next, the ->neighbour could
match and we'll decrement again.
But I have no idea how safe it is to NULL out this ->neighbour
unconditionally. Lots of code seems to deref the thing unconditionally.
For example the ROSE_STATE_2 handling in rose_release().
I guess since rose_disconnect() sets sk->sk_state to TCP_CLOSE, we'll
be OK here.
Can you try this patch?
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index a7f1ce1..41dd630 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -197,6 +197,7 @@ static void rose_kill_by_device(struct net_device *dev)
if (rose->device == dev) {
rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
rose->neighbour->use--;
+ rose->neighbour = NULL;
rose->device = NULL;
}
}
@@ -625,6 +626,7 @@ static int rose_release(struct socket *sock)
case ROSE_STATE_2:
rose->neighbour->use--;
+ rose->neighbour = NULL;
release_sock(sk);
rose_disconnect(sk, 0, -1, -1);
lock_sock(sk);
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] negative dev use in /proc/net/rose_neigh
2008-10-13 7:30 ` David Miller
@ 2008-10-21 14:14 ` pidoux
0 siblings, 0 replies; 5+ messages in thread
From: pidoux @ 2008-10-21 14:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev
David,
I tried the patch you proposed.
I agree that it makes sense, however it does not prevent rose->neighbour->use to become negative as displayed in
/proc/net/rose_neigh use field value.
I already looked at all of the pieces of code that do "rose->neighbour->use--;"
The only place that caused use to underflow (negative) is actually inside rose_kill_by_neigh().
This is why I had put a test and a warning there.
I think that inside of sk_for_each() loop in rose_kill_by_neigh() when rose->neighbour->use-- becomes = 0 then
rose->neighbour should be NULLed and in that case only.
However it seems that rose->neighbour is not actually NULLed for in that case the comparison would not be true anymore
and the decrement would not occur.
I will soon report the printk output of rose->neighbour->use-- inside of the loop to illustrate what happens here.
Bernard Pidoux
David Miller wrote:
> From: Bernard Pidoux <bpidoux@free.fr>
> Date: Tue, 30 Sep 2008 23:44:55 +0200
>
>> I suspect that the bug was unravelled when we added more than one
>> neighbour per route. The protocole accepts three, but this was not
>> much used during the early days since the density of radio stations
>> on the network was not big (only one node station per destination
>> address usually). The network is now denser with Internet links.
>>
>> However, I don't understand why the test
>>
>> if (rose->neighbour == neigh)
>>
>> does not work, for
>> rose->neighbour = NULL;
>> should prevent next comparison to be valid and thus instruction
>> rose->neighbour->use--; not executed.
>>
>> I have seen that a problem with sk_for_each() macro was identified a
>> while ago into ax25 code. The problem here could be similar ?
>
> I took a look at this some more.
>
> That neighbour case loop you mention does set ->neighbour to NULL.
>
> But other paths do not. Just look for all of the pieces of code
> that do "rose->neighbour->use--;" and you'll find a few that do not
> NULL it out.
>
> One such example is rose_kill_by_device().
>
> That would cause a problem because, while rose_disconnect() marks
> the socket DEAD, it doesn't actually remove it from "rose_list".
> That happens later when the user actually closes the socket or
> some other similar event occurs.
>
> Therefore if rose_kill_by_neigh() happens next, the ->neighbour could
> match and we'll decrement again.
>
> But I have no idea how safe it is to NULL out this ->neighbour
> unconditionally. Lots of code seems to deref the thing unconditionally.
> For example the ROSE_STATE_2 handling in rose_release().
>
> I guess since rose_disconnect() sets sk->sk_state to TCP_CLOSE, we'll
> be OK here.
>
> Can you try this patch?
>
> diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
> index a7f1ce1..41dd630 100644
> --- a/net/rose/af_rose.c
> +++ b/net/rose/af_rose.c
> @@ -197,6 +197,7 @@ static void rose_kill_by_device(struct net_device *dev)
> if (rose->device == dev) {
> rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
> rose->neighbour->use--;
> + rose->neighbour = NULL;
> rose->device = NULL;
> }
> }
> @@ -625,6 +626,7 @@ static int rose_release(struct socket *sock)
>
> case ROSE_STATE_2:
> rose->neighbour->use--;
> + rose->neighbour = NULL;
> release_sock(sk);
> rose_disconnect(sk, 0, -1, -1);
> lock_sock(sk);
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-10-21 14:14 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-28 19:56 [PATCH] negative dev use in /proc/net/rose_neigh Bernard Pidoux
2008-09-30 14:32 ` David Miller
2008-09-30 21:44 ` Bernard Pidoux
2008-10-13 7:30 ` David Miller
2008-10-21 14:14 ` pidoux
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).