netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs.
@ 2005-05-14 13:48 Evgeniy Polyakov
  2005-05-15  8:01 ` Herbert Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Evgeniy Polyakov @ 2005-05-14 13:48 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

It looks like skb_cow_data() does not set 
proper owner for newly created skb.

If we have several fragments for skb and some of them
are shared(?) or cloned (like in async IPsec) there 
might be a situation when we require recreating skb and 
thus using skb_copy() for it.
Newly created skb has neither a destructor nor a socket
assotiated with it, which must be copied from the old skb.
As far as I can see, current code sets destructor and socket
for the first one skb only and uses truesize of the first skb
only to increment sk_wmem_alloc value.

If above "analysis" is correct then attached patch fixes that.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

--- ./net/xfrm/xfrm_algo.c~	2005-04-27 12:08:59.000000000 +0400
+++ ./net/xfrm/xfrm_algo.c	2005-05-14 17:36:52.000000000 +0400
@@ -698,7 +698,7 @@
 				return -ENOMEM;
 
 			if (skb1->sk)
-				skb_set_owner_w(skb, skb1->sk);
+				skb_set_owner_w(skb2, skb1->sk);
 
 			/* Looking around. Are we still alive?
 			 * OK, link new skb, drop old one */
--
	Evgeniy Polyakov

Crash is better than data corruption. -- Artur Grabowski

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs.
  2005-05-14 13:48 [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs Evgeniy Polyakov
@ 2005-05-15  8:01 ` Herbert Xu
  2005-05-15 10:40   ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames Herbert Xu
  2005-05-15 22:18   ` [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs David S. Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Herbert Xu @ 2005-05-15  8:01 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, davem

Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
>
> As far as I can see, current code sets destructor and socket
> for the first one skb only and uses truesize of the first skb
> only to increment sk_wmem_alloc value.
> 
> If above "analysis" is correct then attached patch fixes that.

Yes you're absolutely right.

> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

In fact you've uncovered a couple of bugs elsewhere too.  IPv4's
ip_push_pending_frames attributes all the data to the first skb
which breaks when the packet is sent through ip_fragment since
the latter does not undo the truesize attribution.  skb_linearize
is broken as well since it ignores truesize/owner altogether.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15  8:01 ` Herbert Xu
@ 2005-05-15 10:40   ` Herbert Xu
  2005-05-15 11:41     ` Herbert Xu
  2005-05-15 22:18   ` [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs David S. Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Herbert Xu @ 2005-05-15 10:40 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, davem

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

On Sun, May 15, 2005 at 06:01:09PM +1000, Herbert Xu wrote:
> 
> In fact you've uncovered a couple of bugs elsewhere too.  IPv4's
> ip_push_pending_frames attributes all the data to the first skb
> which breaks when the packet is sent through ip_fragment since
> the latter does not undo the truesize attribution.  skb_linearize
> is broken as well since it ignores truesize/owner altogether.

Here is a patch that addresses the first issue.


Since frag_list skb's constructed by ip*_push_pending_frames are
usually going to be fragmented in ip*_fragment and sent separately,
it makes no sense to account them all in the head skb.  Doing so
means that ip*_fragment would have to undo it all in order to keep
the correct wmem accounting.

In fact as it is IPv4 is attributing all the memory to the head skb
in ip_push_pending_frames and not undoing the attribution correctly
in ip_fragment.  On the fast path it will not undo the work at all
which means that as soon as the head skb is transmitted the quota
which may still be used by the other skb's can be used by new data.
On the slow path it double-charges the skb's which may unnecssarily
delay new data from being sent.

This patch simply leaves the wmem accounting with each skb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 888 bytes --]

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1138,10 +1138,6 @@ int ip_push_pending_frames(struct sock *
 		tail_skb = &(tmp_skb->next);
 		skb->len += tmp_skb->len;
 		skb->data_len += tmp_skb->len;
-		skb->truesize += tmp_skb->truesize;
-		__sock_put(tmp_skb->sk);
-		tmp_skb->destructor = NULL;
-		tmp_skb->sk = NULL;
 	}
 
 	/* Unless user demanded real pmtu discovery (IP_PMTUDISC_DO), we allow
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1116,12 +1116,6 @@ int ip6_push_pending_frames(struct sock 
 		tail_skb = &(tmp_skb->next);
 		skb->len += tmp_skb->len;
 		skb->data_len += tmp_skb->len;
-#if 0 /* Logically correct, but useless work, ip_fragment() will have to undo */
-		skb->truesize += tmp_skb->truesize;
-		__sock_put(tmp_skb->sk);
-		tmp_skb->destructor = NULL;
-		tmp_skb->sk = NULL;
-#endif
 	}
 
 	ipv6_addr_copy(final_dst, &fl->fl6_dst);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15 10:40   ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames Herbert Xu
@ 2005-05-15 11:41     ` Herbert Xu
  2005-05-15 12:22       ` [IPV4/IPV6] Ensure all frag_list members have NULL sk Herbert Xu
  2005-05-15 22:24       ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames David S. Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Herbert Xu @ 2005-05-15 11:41 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, davem

On Sun, May 15, 2005 at 08:40:16PM +1000, herbert wrote:

Please discard this patch.

> Since frag_list skb's constructed by ip*_push_pending_frames are
> usually going to be fragmented in ip*_fragment and sent separately,
> it makes no sense to account them all in the head skb.  Doing so
> means that ip*_fragment would have to undo it all in order to keep
> the correct wmem accounting.

Unfortunately this leads to nightmares with partially cloned frag skb's.
The reason is that once you unleash a skb with a frag_list that has
individual sk ownerships into the stack you can never undo those
ownerships safely as they may have been cloned by things like netfilter.
Since we have to undo them in order to make skb_linearize happy this
approach leads to a dead-end.

So let's go the other way and make this an invariant:

	For any skb on a frag_list, skb->sk must be NULL.

That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.

> On the slow path it double-charges the skb's which may unnecssarily
> delay new data from being sent.

Especially because I was wrong about this.  The slow paths in ip*_fragment
is actually correct since it frees the original skb after constructing
the new fragments.

I'll post a new patch soon.  However, since this is a pretty major change
and the bugs it fixes aren't that important it should probably be delayed
until 2.6.13.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [IPV4/IPV6] Ensure all frag_list members have NULL sk
  2005-05-15 11:41     ` Herbert Xu
@ 2005-05-15 12:22       ` Herbert Xu
  2005-05-15 17:33         ` Evgeniy Polyakov
  2005-05-15 22:24       ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames David S. Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Herbert Xu @ 2005-05-15 12:22 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, davem

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

On Sun, May 15, 2005 at 09:41:21PM +1000, herbert wrote:
> 
> I'll post a new patch soon.  However, since this is a pretty major change
> and the bugs it fixes aren't that important it should probably be delayed
> until 2.6.13.

Here it is:


Having frag_list members which holds wmem of an sk leads to nightmares
with partially cloned frag skb's.  The reason is that once you unleash
a skb with a frag_list that has individual sk ownerships into the stack
you can never undo those ownerships safely as they may have been cloned
by things like netfilter.  Since we have to undo them in order to make
skb_linearize happy this approach leads to a dead-end.

So let's go the other way and make this an invariant:

	For any skb on a frag_list, skb->sk must be NULL.

That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.

The above invariant is actually violated in the following patch
for a short duration inside ip_fragment.  This is OK because the
offending frag_list member is either destroyed at the end of the
slow path without being sent anywhere, or it is detached from
the frag_list before being sent.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 1368 bytes --]

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -490,6 +490,14 @@ int ip_fragment(struct sk_buff *skb, int
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
 				goto slow_path;
+
+			BUG_ON(frag->sk);
+			if (skb->sk) {
+				sock_hold(skb->sk);
+				frag->sk = skb->sk;
+				frag->destructor = sock_wfree;
+				skb->truesize -= frag->truesize;
+			}
 		}
 
 		/* Everything is OK. Generate! */
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -552,13 +552,17 @@ static int ip6_fragment(struct sk_buff *
 			    skb_headroom(frag) < hlen)
 			    goto slow_path;
 
-			/* Correct socket ownership. */
-			if (frag->sk == NULL)
-				goto slow_path;
-
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
 				goto slow_path;
+
+			BUG_ON(frag->sk);
+			if (skb->sk) {
+				sock_hold(skb->sk);
+				frag->sk = skb->sk;
+				frag->destructor = sock_wfree;
+				skb->truesize -= frag->truesize;
+			}
 		}
 
 		err = 0;
@@ -1116,12 +1120,10 @@ int ip6_push_pending_frames(struct sock 
 		tail_skb = &(tmp_skb->next);
 		skb->len += tmp_skb->len;
 		skb->data_len += tmp_skb->len;
-#if 0 /* Logically correct, but useless work, ip_fragment() will have to undo */
 		skb->truesize += tmp_skb->truesize;
 		__sock_put(tmp_skb->sk);
 		tmp_skb->destructor = NULL;
 		tmp_skb->sk = NULL;
-#endif
 	}
 
 	ipv6_addr_copy(final_dst, &fl->fl6_dst);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Ensure all frag_list members have NULL sk
  2005-05-15 12:22       ` [IPV4/IPV6] Ensure all frag_list members have NULL sk Herbert Xu
@ 2005-05-15 17:33         ` Evgeniy Polyakov
  2005-05-15 21:29           ` Herbert Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Evgeniy Polyakov @ 2005-05-15 17:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, davem

On Sun, 15 May 2005 22:22:56 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Sun, May 15, 2005 at 09:41:21PM +1000, herbert wrote:
> > 
> > I'll post a new patch soon.  However, since this is a pretty major change
> > and the bugs it fixes aren't that important it should probably be delayed
> > until 2.6.13.
> 
> Here it is:
> 
> 
> Having frag_list members which holds wmem of an sk leads to nightmares
> with partially cloned frag skb's.  The reason is that once you unleash
> a skb with a frag_list that has individual sk ownerships into the stack
> you can never undo those ownerships safely as they may have been cloned
> by things like netfilter.  Since we have to undo them in order to make
> skb_linearize happy this approach leads to a dead-end.
> 
> So let's go the other way and make this an invariant:
> 
> 	For any skb on a frag_list, skb->sk must be NULL.

This requires skb_set_owner_* to check if it is called
for head skb or one from fragment and does nothing if
it is from frag_list.
Or to check the whole tree for ownering calls...

> That is, the socket ownership always belongs to the head skb.
> It turns out that the implementation is actually pretty simple.
> 
> The above invariant is actually violated in the following patch
> for a short duration inside ip_fragment.  This is OK because the
> offending frag_list member is either destroyed at the end of the
> slow path without being sent anywhere, or it is detached from
> the frag_list before being sent.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Ensure all frag_list members have NULL sk
  2005-05-15 17:33         ` Evgeniy Polyakov
@ 2005-05-15 21:29           ` Herbert Xu
  0 siblings, 0 replies; 16+ messages in thread
From: Herbert Xu @ 2005-05-15 21:29 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, davem

On Sun, May 15, 2005 at 09:33:41PM +0400, Evgeniy Polyakov wrote:
>
> > So let's go the other way and make this an invariant:
> > 
> > 	For any skb on a frag_list, skb->sk must be NULL.
> 
> This requires skb_set_owner_* to check if it is called
> for head skb or one from fragment and does nothing if
> it is from frag_list.
> Or to check the whole tree for ownering calls...

Not really.  The frag_list skb's owned by sk's are generated in
one place only, and that place is ip*_push_pending_frames.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs.
  2005-05-15  8:01 ` Herbert Xu
  2005-05-15 10:40   ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames Herbert Xu
@ 2005-05-15 22:18   ` David S. Miller
  1 sibling, 0 replies; 16+ messages in thread
From: David S. Miller @ 2005-05-15 22:18 UTC (permalink / raw)
  To: herbert; +Cc: johnpol, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs.
Date: Sun, 15 May 2005 18:01:09 +1000

> IPv4's ip_push_pending_frames attributes all the data to the first
> skb which breaks when the packet is sent through ip_fragment since
> the latter does not undo the truesize attribution.

I'm %99 sure this one is on purpose.  I'll have to dig into
the archives and source repo history to get the details
though.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15 11:41     ` Herbert Xu
  2005-05-15 12:22       ` [IPV4/IPV6] Ensure all frag_list members have NULL sk Herbert Xu
@ 2005-05-15 22:24       ` David S. Miller
  2005-05-15 23:20         ` Herbert Xu
  1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2005-05-15 22:24 UTC (permalink / raw)
  To: herbert; +Cc: johnpol, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
Date: Sun, 15 May 2005 21:41:21 +1000

> So let's go the other way and make this an invariant:
> 
> 	For any skb on a frag_list, skb->sk must be NULL.
> 
> That is, the socket ownership always belongs to the head skb.
> It turns out that the implementation is actually pretty simple.

Yes, I was just about to inform you that when SKBs are
free'd up, only the head SKB's sk gets the destructor run
on it.

Please refer to this patch from July of 2004:

[IPV4]: Fix multicast socket hangs.

If a multicast packet gets looped back, the sending
socket can hang if a local read just sits and does
not empty its receive queue.

The problem is that when an SKB clone is freed up,
the destructor is only invoked for the head SKB when
there is a fraglist (which is created for fragmentation).

The solution is to account the fragment list SKB lengths
in the top-level head SKB, then it all works out.

Signed-off-by: David S. Miller <davem@redhat.com>

--- 1/net/ipv4/ip_output.c	2005-05-15 15:23:19 -07:00
+++ 2/net/ipv4/ip_output.c	2005-05-15 15:23:19 -07:00
@@ -498,10 +498,6 @@
 			    skb_headroom(frag) < hlen)
 			    goto slow_path;
 
-			/* Correct socket ownership. */
-			if (frag->sk == NULL && skb->sk)
-				goto slow_path;
-
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
 				goto slow_path;
@@ -1113,12 +1109,10 @@
 		tail_skb = &(tmp_skb->next);
 		skb->len += tmp_skb->len;
 		skb->data_len += tmp_skb->len;
-#if 0 /* Logically correct, but useless work, ip_fragment() will have to undo */
 		skb->truesize += tmp_skb->truesize;
 		__sock_put(tmp_skb->sk);
 		tmp_skb->destructor = NULL;
 		tmp_skb->sk = NULL;
-#endif
 	}
 
 	/* Unless user demanded real pmtu discovery (IP_PMTUDISC_DO), we allow

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15 22:24       ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames David S. Miller
@ 2005-05-15 23:20         ` Herbert Xu
  2005-05-15 23:59           ` Herbert Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Herbert Xu @ 2005-05-15 23:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: johnpol, netdev

On Sun, May 15, 2005 at 03:24:41PM -0700, David S. Miller wrote:
>
> > So let's go the other way and make this an invariant:
> > 
> > 	For any skb on a frag_list, skb->sk must be NULL.
> > 
> > That is, the socket ownership always belongs to the head skb.
> > It turns out that the implementation is actually pretty simple.
> 
> Yes, I was just about to inform you that when SKBs are
> free'd up, only the head SKB's sk gets the destructor run
> on it.

Dave, I'm actually agreeing with you :) My patch does these two things:

1) Do what you did below for IPv4 to IPv6 as well.
2) Do what the comment (the one that you removed below) said and actually
   undo the truesize aggregation on the fast path in ip*_fragment.

I presume you won't have any problems with 1) since that's exactly
the same as your patch below except that it's for IPv6.

2) is needed because on the fast path the frag_list is unraveled so it
no longer makes sense to have the wmem all accounted to the head skb.
Please note that we only undo the attribution if the frag is not shared.
So the problem that your patch is meant fix won't occur here.

> @@ -1113,12 +1109,10 @@
>  		tail_skb = &(tmp_skb->next);
>  		skb->len += tmp_skb->len;
>  		skb->data_len += tmp_skb->len;
> -#if 0 /* Logically correct, but useless work, ip_fragment() will have to undo */
>  		skb->truesize += tmp_skb->truesize;
>  		__sock_put(tmp_skb->sk);
>  		tmp_skb->destructor = NULL;
>  		tmp_skb->sk = NULL;
> -#endif

All I'm saying is that if we remove the #if 0 then we should do as it
suggests and undo this accounting in ip_fragment.

BTW, I missed the netfilter defrag code in my patch.  I'll send an
updated version soon.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15 23:20         ` Herbert Xu
@ 2005-05-15 23:59           ` Herbert Xu
  2005-05-16  0:32             ` David S. Miller
  2005-05-16  0:39             ` David S. Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Herbert Xu @ 2005-05-15 23:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: johnpol, netdev

On Mon, May 16, 2005 at 09:20:20AM +1000, herbert wrote:
> 
> BTW, I missed the netfilter defrag code in my patch.  I'll send an
> updated version soon.

Actually that turns out to be OK since ip_ct_gather_frags zeros the
skb->sk field before feeding it to ip_defrag.  Once the packet has
been reassembled it tries to put the sk back which fails unless the
head skb comes in last which is never the case on the output path.

Although the second part is buggy, it does not violate the invariant
I stated before.

Please let me know if you still have any questions/problems with my
patch.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15 23:59           ` Herbert Xu
@ 2005-05-16  0:32             ` David S. Miller
  2005-05-16  0:39             ` David S. Miller
  1 sibling, 0 replies; 16+ messages in thread
From: David S. Miller @ 2005-05-16  0:32 UTC (permalink / raw)
  To: herbert; +Cc: johnpol, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 16 May 2005 09:59:13 +1000

> Please let me know if you still have any questions/problems with my
> patch.

No objections, I see what you are doing now.

It is basically the inverse of the case where
we hold the frag truesizes in the head SKB.

Once we rip apart the fraglist, we have to
repropagate the truesize and sock ownership
down from the head SKB to the frags.

I'll apply your patch, and Evgeniy's too.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-15 23:59           ` Herbert Xu
  2005-05-16  0:32             ` David S. Miller
@ 2005-05-16  0:39             ` David S. Miller
  2005-05-16  1:00               ` Herbert Xu
  1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2005-05-16  0:39 UTC (permalink / raw)
  To: herbert; +Cc: johnpol, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 16 May 2005 09:59:13 +1000

> Actually that turns out to be OK since ip_ct_gather_frags zeros the
> skb->sk field before feeding it to ip_defrag.  Once the packet has
> been reassembled it tries to put the sk back which fails unless the
> head skb comes in last which is never the case on the output path.
> 
> Although the second part is buggy, it does not violate the invariant
> I stated before.

This code should therefore just skb_orphan(), and nothing
more.  Right?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-16  0:39             ` David S. Miller
@ 2005-05-16  1:00               ` Herbert Xu
  2005-05-19 19:35                 ` David S. Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Herbert Xu @ 2005-05-16  1:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: johnpol, netdev

On Sun, May 15, 2005 at 05:39:44PM -0700, David S. Miller wrote:
> 
> This code should therefore just skb_orphan(), and nothing
> more.  Right?

Ideally we should move the ownership to the head skb here as well.
However, doing this would cause havoc when you have nasty users
sending fragments from two different sockets :)

So let's stick with just the skb_orphan.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-16  1:00               ` Herbert Xu
@ 2005-05-19 19:35                 ` David S. Miller
  2005-05-19 21:48                   ` Herbert Xu
  0 siblings, 1 reply; 16+ messages in thread
From: David S. Miller @ 2005-05-19 19:35 UTC (permalink / raw)
  To: herbert; +Cc: johnpol, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 16 May 2005 11:00:58 +1000

[ For other readers, Herbert and I are discussing ip_ct_gather_frags(). ]

> On Sun, May 15, 2005 at 05:39:44PM -0700, David S. Miller wrote:
> > 
> > This code should therefore just skb_orphan(), and nothing
> > more.  Right?
> 
> Ideally we should move the ownership to the head skb here as well.
> However, doing this would cause havoc when you have nasty users
> sending fragments from two different sockets :)
> 
> So let's stick with just the skb_orphan.

Ok, based upon this I am adding the following patch to my
tree.  Thanks.

--- 1/net/ipv4/netfilter/ip_conntrack_core.c.~1~	2005-05-18 22:45:26.000000000 -0700
+++ 2/net/ipv4/netfilter/ip_conntrack_core.c	2005-05-19 12:32:26.000000000 -0700
@@ -940,37 +940,25 @@
 struct sk_buff *
 ip_ct_gather_frags(struct sk_buff *skb, u_int32_t user)
 {
-	struct sock *sk = skb->sk;
 #ifdef CONFIG_NETFILTER_DEBUG
 	unsigned int olddebug = skb->nf_debug;
 #endif
 
-	if (sk) {
-		sock_hold(sk);
-		skb_orphan(skb);
-	}
+	skb_orphan(skb);
 
 	local_bh_disable(); 
 	skb = ip_defrag(skb, user);
 	local_bh_enable();
 
-	if (!skb) {
-		if (sk)
-			sock_put(sk);
-		return skb;
-	}
-
-	if (sk) {
-		skb_set_owner_w(skb, sk);
-		sock_put(sk);
-	}
-
-	ip_send_check(skb->nh.iph);
-	skb->nfcache |= NFC_ALTERED;
+	if (skb) {
+		ip_send_check(skb->nh.iph);
+		skb->nfcache |= NFC_ALTERED;
 #ifdef CONFIG_NETFILTER_DEBUG
-	/* Packet path as if nothing had happened. */
-	skb->nf_debug = olddebug;
+		/* Packet path as if nothing had happened. */
+		skb->nf_debug = olddebug;
 #endif
+	}
+
 	return skb;
 }
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames
  2005-05-19 19:35                 ` David S. Miller
@ 2005-05-19 21:48                   ` Herbert Xu
  0 siblings, 0 replies; 16+ messages in thread
From: Herbert Xu @ 2005-05-19 21:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: johnpol, netdev

On Thu, May 19, 2005 at 12:35:30PM -0700, David S. Miller wrote:
> 
> Ok, based upon this I am adding the following patch to my
> tree.  Thanks.

Thanks Dave, this looks good to me.

> --- 1/net/ipv4/netfilter/ip_conntrack_core.c.~1~	2005-05-18 22:45:26.000000000 -0700
> +++ 2/net/ipv4/netfilter/ip_conntrack_core.c	2005-05-19 12:32:26.000000000 -0700
> @@ -940,37 +940,25 @@
>  struct sk_buff *
>  ip_ct_gather_frags(struct sk_buff *skb, u_int32_t user)
>  {
> -	struct sock *sk = skb->sk;
>  #ifdef CONFIG_NETFILTER_DEBUG
>  	unsigned int olddebug = skb->nf_debug;
>  #endif
>  
> -	if (sk) {
> -		sock_hold(sk);
> -		skb_orphan(skb);
> -	}
> +	skb_orphan(skb);
>  
>  	local_bh_disable(); 
>  	skb = ip_defrag(skb, user);
>  	local_bh_enable();
>  
> -	if (!skb) {
> -		if (sk)
> -			sock_put(sk);
> -		return skb;
> -	}
> -
> -	if (sk) {
> -		skb_set_owner_w(skb, sk);
> -		sock_put(sk);
> -	}
> -
> -	ip_send_check(skb->nh.iph);
> -	skb->nfcache |= NFC_ALTERED;
> +	if (skb) {
> +		ip_send_check(skb->nh.iph);
> +		skb->nfcache |= NFC_ALTERED;
>  #ifdef CONFIG_NETFILTER_DEBUG
> -	/* Packet path as if nothing had happened. */
> -	skb->nf_debug = olddebug;
> +		/* Packet path as if nothing had happened. */
> +		skb->nf_debug = olddebug;
>  #endif
> +	}
> +
>  	return skb;
>  }
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2005-05-19 21:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-14 13:48 [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs Evgeniy Polyakov
2005-05-15  8:01 ` Herbert Xu
2005-05-15 10:40   ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames Herbert Xu
2005-05-15 11:41     ` Herbert Xu
2005-05-15 12:22       ` [IPV4/IPV6] Ensure all frag_list members have NULL sk Herbert Xu
2005-05-15 17:33         ` Evgeniy Polyakov
2005-05-15 21:29           ` Herbert Xu
2005-05-15 22:24       ` [IPV4/IPV6] Keep wmem accounting separate in ip*_push_pending_frames David S. Miller
2005-05-15 23:20         ` Herbert Xu
2005-05-15 23:59           ` Herbert Xu
2005-05-16  0:32             ` David S. Miller
2005-05-16  0:39             ` David S. Miller
2005-05-16  1:00               ` Herbert Xu
2005-05-19 19:35                 ` David S. Miller
2005-05-19 21:48                   ` Herbert Xu
2005-05-15 22:18   ` [1/1] xfrm: skb_cow_data() does not set proper owner for new skbs David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).