* [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
@ 2003-06-06 5:49 Kazunori Miyazawa
2003-06-06 5:55 ` David S. Miller
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Kazunori Miyazawa @ 2003-06-06 5:49 UTC (permalink / raw)
To: davem, kuznet; +Cc: usagi, netdev
Hello,
I observed invalid refcnt incrementation when using IPsec in IPv6.
I configured IPsec and did ping6 then refcnt of dst
was incremented two by two. I observed it with using "route -A inet6".
I also check it with using printk.
This patch fixes dst reference count management.
In dst_pop refernce cound of dsts except for last are incremented in
dst_clone and decremented in next call dst_pop but last dst refernce
count will be never decremented.
All dst are held by xfrm_policy and there is no need to touch the
refernce count here.
In output functions, dst is changed by xfrm_lookup if there is
any matching policy. Therefore original dst which is held before
calling xfrm_lookup will be never released.
When xfrm_lookup scceeds and dst is changed, original dst should
be release.
Patch-Name: fix dst refcnt with xfrm
Patch-Id: FIX_2_5_70+CS1_1259_DST_REFCNT_WITH_XFRM
Patch-Author: Kazunori Miyazawa / USAGI Project <miyazawa@linux-ipv6.org>
Credit: Kazunori Miyazawa / USAGI Project <miyazawa@linux-ipv6.org>
Index: linux25/include/net/dst.h
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/include/net/dst.h,v
retrieving revision 1.1.1.9
retrieving revision 1.1.1.9.22.1
diff -u -r1.1.1.9 -r1.1.1.9.22.1
--- linux25/include/net/dst.h 17 Apr 2003 18:15:56 -0000 1.1.1.9
+++ linux25/include/net/dst.h 6 Jun 2003 05:02:36 -0000 1.1.1.9.22.1
@@ -160,10 +160,7 @@
static inline struct dst_entry *dst_pop(struct dst_entry *dst)
{
- struct dst_entry *child = dst_clone(dst->child);
-
- dst_release(dst);
- return child;
+ return dst->child;
}
extern void * dst_alloc(struct dst_ops * ops);
Index: linux25/net/ipv6/ip6_output.c
===================================================================
RCS file: /cvsroot/usagi/usagi-backport/linux25/net/ipv6/ip6_output.c,v
retrieving revision 1.1.1.16
retrieving revision 1.1.1.16.14.1
diff -u -r1.1.1.16 -r1.1.1.16.14.1
--- linux25/net/ipv6/ip6_output.c 26 May 2003 08:04:10 -0000 1.1.1.16
+++ linux25/net/ipv6/ip6_output.c 6 Jun 2003 05:00:58 -0000 1.1.1.16.14.1
@@ -211,6 +211,8 @@
if ((err = xfrm_lookup(&skb->dst, fl, sk, 0)) < 0) {
return err;
}
+ if (dst != skb->dst)
+ dst_release(dst);
if (opt) {
int head_room;
@@ -595,10 +597,13 @@
pktlength = length;
if (dst) {
+ struct dst_entry *dst0 = dst;
if ((err = xfrm_lookup(&dst, fl, sk, 0)) < 0) {
dst_release(dst);
return -ENETUNREACH;
}
+ if (dst0 != dst)
+ dst_release(dst0);
}
if (hlimit < 0) {
@@ -1194,10 +1199,13 @@
}
if (*dst) {
+ struct dst_entry *dst0 = *dst;
if ((err = xfrm_lookup(dst, fl, sk, 0)) < 0) {
dst_release(*dst);
return -ENETUNREACH;
}
+ if (*dst != dst0)
+ dst_release(dst0);
}
return 0;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
2003-06-06 5:49 [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Kazunori Miyazawa
@ 2003-06-06 5:55 ` David S. Miller
2003-06-06 6:37 ` Kazunori Miyazawa
2003-06-06 7:27 ` David S. Miller
2003-06-08 15:47 ` James Morris
2 siblings, 1 reply; 6+ messages in thread
From: David S. Miller @ 2003-06-06 5:55 UTC (permalink / raw)
To: kazunori; +Cc: kuznet, usagi, netdev
From: Kazunori Miyazawa <kazunori@miyazawa.org>
Date: Fri, 6 Jun 2003 14:49:25 +0900
In dst_pop refernce cound of dsts except for last are incremented in
dst_clone and decremented in next call dst_pop but last dst refernce
count will be never decremented.
All dst are held by xfrm_policy and there is no need to touch the
refernce count here.
Ok, so the idea is to hold onto top-level parent DST entry the entire
time, and this prevents the DST and all it's children from being
destroyed. Is this correct?
Let me study this a little bit, I want to make sure it is correct.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
2003-06-06 5:55 ` David S. Miller
@ 2003-06-06 6:37 ` Kazunori Miyazawa
0 siblings, 0 replies; 6+ messages in thread
From: Kazunori Miyazawa @ 2003-06-06 6:37 UTC (permalink / raw)
To: davem, kuznet; +Cc: netdev, usagi
On Thu, 05 Jun 2003 22:55:47 -0700 (PDT)
"David S. Miller" <davem@redhat.com> wrote:
> From: Kazunori Miyazawa <kazunori@miyazawa.org>
> Date: Fri, 6 Jun 2003 14:49:25 +0900
>
> In dst_pop refernce cound of dsts except for last are incremented in
> dst_clone and decremented in next call dst_pop but last dst refernce
> count will be never decremented.
> All dst are held by xfrm_policy and there is no need to touch the
> refernce count here.
>
> Ok, so the idea is to hold onto top-level parent DST entry the entire
> time, and this prevents the DST and all it's children from being
> destroyed. Is this correct?
>
Yes.
Additionally DST is incremented in the process but never decremented correctly.
Let me explain it. It must be "Don't try to teach your grandmother to suck eggs" :-)
"O" is original dst structure and its refcnt 1 in routing table.
"C" is the child
"DEST" is some paramter in the stack.
(X) after "O" or "C" represents reference count of it.
At first in the result of routing lookup DEST holds "O" with calling dst_hold/dst_clone.
DEST=>O(2)
In xfrm_lookup and related functions the child is created and connect to "O".
Those referenct count are incremented for xfrm_policy holding them.
Then the stack builds up stackable destination like this
DEST=>C(1)
|=>O(3)
After this the stack regards "C" as the original destination.
I assume the process is datagram. The stack call dst_clone before passing DST
to skb->dst.
skb->dst=DEST=>C(2)
|=>O(3)
In dst_pop it increments O with dst_clone and release C with dst_release
skb->dst => C(2)
|=>O(3)
call dst_pop....
skb->dst =>O(4)
DST=>C(1)
|=>O(4)
The stack done the process and it release DST with dst_release.
"O"'s reference count is decremented in kfree_skb.
skb->dst=DEST=>C(0)
|=>O(3)
I hope this helps you.
I don't think I understand whole dst life cycle.
Please teach me if I misunderstand.
BTW, why the stack set "0" to dst refernce count at the initialization.
IMHO it should be "1".
Thank you,
--Kazunori Miyazawa (Yokogawa Electric Corporation)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
2003-06-06 5:49 [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Kazunori Miyazawa
2003-06-06 5:55 ` David S. Miller
@ 2003-06-06 7:27 ` David S. Miller
2003-06-08 15:47 ` James Morris
2 siblings, 0 replies; 6+ messages in thread
From: David S. Miller @ 2003-06-06 7:27 UTC (permalink / raw)
To: kazunori; +Cc: kuznet, usagi, netdev
From: Kazunori Miyazawa <kazunori@miyazawa.org>
Date: Fri, 6 Jun 2003 14:49:25 +0900
In dst_pop refernce cound of dsts except for last are incremented in
dst_clone and decremented in next call dst_pop but last dst refernce
count will be never decremented.
All dst are held by xfrm_policy and there is no need to touch the
refernce count here.
Ok, there is problem with this logic.
Final dst is set to skb->dst, and when SKB is freed then
we do dst_release(skb->dst). Therefore it _IS_ decremented.
(see net/core/skbuff.c:__kfree_skb(), it is where this final
DST reference is decremented).
Something is going wrong in ipv6 code if this is not happening.
If you modify skb->dst, it is your job to maintain reference
properly.
Look at how ipv4 works, we do all the work in the route lookup
and furthermore we never pass &skb->dst into these lookups.
What ipv6 output does looks really really strange. It is silly
to do flow lookups in places like ip6_xmit(). And this is where
all the refcount bugs are really coming from. Like ipv4, flow lookups
should be occuring at end of ip6_route_output() processing.
As far as I can tell, ip6_xmit() makes calculations based upon "dst"
and this is wrong. It updates only skb->dst, but this is not what
that function uses to make decisions. 'dst' is old copy :(
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
2003-06-06 5:49 [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Kazunori Miyazawa
2003-06-06 5:55 ` David S. Miller
2003-06-06 7:27 ` David S. Miller
@ 2003-06-08 15:47 ` James Morris
2003-06-09 0:49 ` Kazunori Miyazawa
2 siblings, 1 reply; 6+ messages in thread
From: James Morris @ 2003-06-08 15:47 UTC (permalink / raw)
To: Kazunori Miyazawa; +Cc: davem, kuznet, usagi, netdev
On Fri, 6 Jun 2003, Kazunori Miyazawa wrote:
> In output functions, dst is changed by xfrm_lookup if there is
> any matching policy. Therefore original dst which is held before
> calling xfrm_lookup will be never released.
> When xfrm_lookup scceeds and dst is changed, original dst should
> be release.
It is released in xfrm_lookup():
*dst_p = dst;
ip_rt_put(rt);
xfrm_pol_put(policy);
return 0;
- James
--
James Morris
<jmorris@intercode.com.au>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][IPV6] keeping dst refcnt correctly with using xfrm
2003-06-08 15:47 ` James Morris
@ 2003-06-09 0:49 ` Kazunori Miyazawa
0 siblings, 0 replies; 6+ messages in thread
From: Kazunori Miyazawa @ 2003-06-09 0:49 UTC (permalink / raw)
To: James Morris; +Cc: davem, kuznet, usagi, netdev
On Mon, 9 Jun 2003 01:47:52 +1000 (EST)
James Morris <jmorris@intercode.com.au> wrote:
> On Fri, 6 Jun 2003, Kazunori Miyazawa wrote:
>
> > In output functions, dst is changed by xfrm_lookup if there is
> > any matching policy. Therefore original dst which is held before
> > calling xfrm_lookup will be never released.
> > When xfrm_lookup scceeds and dst is changed, original dst should
> > be release.
>
> It is released in xfrm_lookup():
>
> *dst_p = dst;
> ip_rt_put(rt);
> xfrm_pol_put(policy);
> return 0;
>
>
I overlooked it. Thank you.
--Kazunori Miyazawa (Yokogawa Electric Corporation)
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2003-06-09 0:49 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-06-06 5:49 [PATCH][IPV6] keeping dst refcnt correctly with using xfrm Kazunori Miyazawa
2003-06-06 5:55 ` David S. Miller
2003-06-06 6:37 ` Kazunori Miyazawa
2003-06-06 7:27 ` David S. Miller
2003-06-08 15:47 ` James Morris
2003-06-09 0:49 ` Kazunori Miyazawa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).