netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* pskb change in dst->output
@ 2004-07-07 13:06 Herbert Xu
  2004-07-07 14:58 ` James Morris
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2004-07-07 13:06 UTC (permalink / raw)
  To: James Morris, David S. Miller, netdev

Hi James:

I'm working on merging the IPsec tunnel encapsulation code so that
we have only one copy of it instead of four.

In doing so I'm wondering why you changed dst->output to take a
struct sk_buff ** from a struct sk_buff *.  All of the dst->output
functions already assumed that they have exclusive access to the skb.
This is justified because all callers to dst_output() makes sure that
the packet is neither shared nor cloned.

So in practice the calls to skb_checksum_help() from AH/ESP/IPCOMP
are never going to unshare the skb.  In that case, couldn't we just
have a version of skb_checksum_help() that was for the case where
the skb is never shared?

This would allow us to go back to the single pointer in dst->output.
The double pointer is also an eye sore because in most of these
dst->output functions, the first access to the skb after
skb_checksum_help() usually assumes that the skb is neither
shared nor cloned.

This is bad because someone in future could assume that dst->output
can take a shared/cloned skb based on the fact that it takes a
struct sk_buff **.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-07 13:06 pskb change in dst->output Herbert Xu
@ 2004-07-07 14:58 ` James Morris
  2004-07-07 21:28   ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: James Morris @ 2004-07-07 14:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Wed, 7 Jul 2004, Herbert Xu wrote:

> In doing so I'm wondering why you changed dst->output to take a
> struct sk_buff ** from a struct sk_buff *.

This was because skb_checksum_help() was changed to potentially replace 
the skb in the output path.

>  All of the dst->output functions already assumed that they have
> exclusive access to the skb. This is justified because all callers to
> dst_output() makes sure that the packet is neither shared nor cloned.

Cloned skbs are regularly passed to dst_output(), thus we need to use the 
double pointer for skb_checksum_help() in case the skb is replaced.


- James
-- 
James Morris
<jmorris@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-07 14:58 ` James Morris
@ 2004-07-07 21:28   ` Herbert Xu
  2004-07-07 22:01     ` David S. Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2004-07-07 21:28 UTC (permalink / raw)
  To: James Morris; +Cc: David S. Miller, netdev

On Wed, Jul 07, 2004 at 10:58:21AM -0400, James Morris wrote:
> 
> >  All of the dst->output functions already assumed that they have
> > exclusive access to the skb. This is justified because all callers to
> > dst_output() makes sure that the packet is neither shared nor cloned.
> 
> Cloned skbs are regularly passed to dst_output(), thus we need to use the 
> double pointer for skb_checksum_help() in case the skb is replaced.

OK.  Can you please tell me which caller of dst_output() passes
a cloned skb to it?

I need to know this because if this is the case, we need fix the various
IPsec output functions to copy the skb.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-07 21:28   ` Herbert Xu
@ 2004-07-07 22:01     ` David S. Miller
  2004-07-07 23:12       ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: David S. Miller @ 2004-07-07 22:01 UTC (permalink / raw)
  To: Herbert Xu; +Cc: jmorris, netdev

On Thu, 8 Jul 2004 07:28:05 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> OK.  Can you please tell me which caller of dst_output() passes
> a cloned skb to it?
> 
> I need to know this because if this is the case, we need fix the various
> IPsec output functions to copy the skb.

They do already, via skb_cow().

The case James's patch tries to deal with is when multicast
loops back packets to localhost, see ip_mc_output().
Since the SKB is a clone, if we do anything with the checksum
state of the clone, this messes up the hw checksumming state
of the original SKB and we get output packets with corrupt
checksums.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-07 22:01     ` David S. Miller
@ 2004-07-07 23:12       ` Herbert Xu
  2004-07-07 23:33         ` James Morris
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2004-07-07 23:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: jmorris, netdev

On Wed, Jul 07, 2004 at 03:01:32PM -0700, David S. Miller wrote:
> On Thu, 8 Jul 2004 07:28:05 +1000
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > OK.  Can you please tell me which caller of dst_output() passes
> > a cloned skb to it?
> 
> The case James's patch tries to deal with is when multicast
> loops back packets to localhost, see ip_mc_output().
> Since the SKB is a clone, if we do anything with the checksum
> state of the clone, this messes up the hw checksumming state
> of the original SKB and we get output packets with corrupt
> checksums.

Thanks.  I have no problems with that.  But ip_mc_output() as far
as I can see does not call dst_output.

So is there a caller to dst_output that does this as well?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-07 23:12       ` Herbert Xu
@ 2004-07-07 23:33         ` James Morris
  2004-07-08  0:04           ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: James Morris @ 2004-07-07 23:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Thu, 8 Jul 2004, Herbert Xu wrote:

> Thanks.  I have no problems with that.  But ip_mc_output() as far
> as I can see does not call dst_output.
> 
> So is there a caller to dst_output that does this as well?

The TCP code often clones skbs to be transmitted.


- James
-- 
James Morris
<jmorris@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-07 23:33         ` James Morris
@ 2004-07-08  0:04           ` Herbert Xu
  2004-07-08  0:17             ` David S. Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2004-07-08  0:04 UTC (permalink / raw)
  To: James Morris; +Cc: David S. Miller, netdev

On Wed, Jul 07, 2004 at 07:33:18PM -0400, James Morris wrote:
> 
> > So is there a caller to dst_output that does this as well?
> 
> The TCP code often clones skbs to be transmitted.

Thanks for the pointer.

I've looked at it and that appears to be safe.  The logic appears
to be that the first packet passed to dst_output() via ip_queue_xmit()
is a clone of the packet on the TCP output queue.  The output
functions are allowed to modify all parts of the skb except the
TCP payload.

Subsequent retransmits will unclone the packet before adding the
TCP header and passing it to ip_queue_xmit().

So the bottom line is that we don't have to unclone the packet before
touching the checksum in dst_output().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  0:04           ` Herbert Xu
@ 2004-07-08  0:17             ` David S. Miller
  2004-07-08  0:35               ` Herbert Xu
  2004-07-08  1:05               ` James Morris
  0 siblings, 2 replies; 21+ messages in thread
From: David S. Miller @ 2004-07-08  0:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: jmorris, netdev

On Thu, 8 Jul 2004 10:04:21 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Jul 07, 2004 at 07:33:18PM -0400, James Morris wrote:
> > 
> > > So is there a caller to dst_output that does this as well?
> > 
> > The TCP code often clones skbs to be transmitted.
> 
> Thanks for the pointer.

This isn't the case that prompted James's changes though.

It's a UDP packet, to multicast, with hw checksumming enabled,
that gets looped back via ip_mc_output() here:

	if (rt->rt_flags&RTCF_MULTICAST) {
		if ((!sk || inet_sk(sk)->mc_loop)
#ifdef CONFIG_IP_MROUTE
		/* Small optimization: do not loopback not local frames,
		   which returned after forwarding; they will be  dropped
		   by ip_mr_input in any case.
		   Note, that local frames are looped back to be delivered
		   to local recipients.

		   This check is duplicated in ip_mr_input at the moment.
		 */
		    && ((rt->rt_flags&RTCF_LOCAL) || !(IPCB(skb)->flags&IPSKB_FORWARDED))
#endif
		) {
			struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
			if (newskb)
				NF_HOOK(PF_INET, NF_IP_POST_ROUTING, newskb, NULL,
					newskb->dev, 
					ip_dev_loopback_xmit);
		}

If this goes through netfilter, before James's changes, in any way
shape or form, the checksum of the original SKB will be corrupted.
This breaks dhcp for example, and it's really common because if you
just build selinux even without using any rules, packets go through
netfilter.

That is what James's changes, to move the actual packet mucking deeper
in the netfilter call chain (to where packet modifications really happen)
so that we can fix the above described case without unneeded expense
added.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  0:17             ` David S. Miller
@ 2004-07-08  0:35               ` Herbert Xu
  2004-07-08  1:05               ` James Morris
  1 sibling, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2004-07-08  0:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: jmorris, netdev

On Wed, Jul 07, 2004 at 05:17:59PM -0700, David S. Miller wrote:
> 
> If this goes through netfilter, before James's changes, in any way
> shape or form, the checksum of the original SKB will be corrupted.
> This breaks dhcp for example, and it's really common because if you
> just build selinux even without using any rules, packets go through
> netfilter.

Thanks, I've got no problems with the netfilter part of the change
at all.

What I am trying to find out is what has this got to do with dst_output().

Let's take ah_output() as an example.  At the top of the function we've
got a conditional call to skb_checksum_help() which will unclone the
skb.  This ostensibly allows us to call ah_output() with a cloned skb.

However, if we look further down in ah_output() we will find code that
modifies the contents of skb->data without uncloning it.  So if
someone calls ah_output() with a cloned packet that does not require
skb_checksum_help() this will blow up.  The TCP case is safe as the
skb it provides is practically uncloned as long as you don't touch the
TCP payload.

This works currently as all callers to dst_output() provide it with
an skb that is practically uncloned in the sense that you can modify
all the headers at will.  But if that is the case, I don't see why
dst_output() needs to take a pskb instead of a plain skb.

Of course the current call to skb_checksum_help() will unclone
packets such as those generated by TCP, but I think that we ought
to have a version of it that doesn't unclone the packet and call
that in dst->output() functions like ah_output().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  0:17             ` David S. Miller
  2004-07-08  0:35               ` Herbert Xu
@ 2004-07-08  1:05               ` James Morris
  2004-07-08  1:11                 ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: James Morris @ 2004-07-08  1:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: Herbert Xu, netdev

On Wed, 7 Jul 2004, David S. Miller wrote:

> That is what James's changes, to move the actual packet mucking deeper
> in the netfilter call chain (to where packet modifications really happen)
> so that we can fix the above described case without unneeded expense
> added.

Btw, this should also lead to a performance increase for Netfilter, 
although I haven't measured it.


- James
-- 
James Morris
<jmorris@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  1:05               ` James Morris
@ 2004-07-08  1:11                 ` Herbert Xu
  2004-07-08  1:19                   ` James Morris
  2004-07-08  3:34                   ` James Morris
  0 siblings, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2004-07-08  1:11 UTC (permalink / raw)
  To: James Morris; +Cc: David S. Miller, netdev

On Wed, Jul 07, 2004 at 09:05:11PM -0400, James Morris wrote:
> 
> Btw, this should also lead to a performance increase for Netfilter, 
> although I haven't measured it.

I'm totally happy with the netfilter part of the change.

I'm just a bit unsure whether the pskb change is needed in dst_output().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  1:11                 ` Herbert Xu
@ 2004-07-08  1:19                   ` James Morris
  2004-07-08  3:34                   ` James Morris
  1 sibling, 0 replies; 21+ messages in thread
From: James Morris @ 2004-07-08  1:19 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Thu, 8 Jul 2004, Herbert Xu wrote:

> I'm just a bit unsure whether the pskb change is needed in dst_output().

I'm looking into it now, it may not be after all.


- James
-- 
James Morris
<jmorris@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  1:11                 ` Herbert Xu
  2004-07-08  1:19                   ` James Morris
@ 2004-07-08  3:34                   ` James Morris
  2004-07-08  4:02                     ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: James Morris @ 2004-07-08  3:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Thu, 8 Jul 2004, Herbert Xu wrote:

> I'm just a bit unsure whether the pskb change is needed in dst_output().

Ok, dst_output() iterates over the skb->dst stack:

	for (;;) {
                err = skb->dst->output(&skb);

                if (likely(err == 0))
                        return err;
                if (unlikely(err != NET_XMIT_BYPASS))
                        return err;
        }

We need to pass a pointer to skb to skb->dst->output() because the skb can
be changed by skb_checksum_help() in one of these output paths.  Further
iteration would otherwise pass an incorrect (freed) skb.

The change to dst->output() which implements a double skb pointer is then
propagated throughout the code, e.g.  ah_output() has to change because it
becomes an output function via xfrm->type->output().


- James
-- 
James Morris
<jmorris@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  3:34                   ` James Morris
@ 2004-07-08  4:02                     ` Herbert Xu
  2004-07-09  8:14                       ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2004-07-08  4:02 UTC (permalink / raw)
  To: James Morris; +Cc: David S. Miller, netdev

On Wed, Jul 07, 2004 at 11:34:59PM -0400, James Morris wrote:
> 
> We need to pass a pointer to skb to skb->dst->output() because the skb can
> be changed by skb_checksum_help() in one of these output paths.  Further
> iteration would otherwise pass an incorrect (freed) skb.

That's only because the dst->output() functions are calling
skb_checksum_help().  Since those same functions assume the
skb to be "uncloned" anyway (they modify it directly by adding
headers etc.), they only need to call a version of
skb_checksum_help() that does not do a copy of the skb.

This is what dst->output() did before this change anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-08  4:02                     ` Herbert Xu
@ 2004-07-09  8:14                       ` Herbert Xu
  2004-07-09 14:02                         ` James Morris
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2004-07-09  8:14 UTC (permalink / raw)
  To: James Morris; +Cc: David S. Miller, netdev

On Thu, Jul 08, 2004 at 02:02:35PM +1000, herbert wrote:
> 
> That's only because the dst->output() functions are calling
> skb_checksum_help().  Since those same functions assume the
> skb to be "uncloned" anyway (they modify it directly by adding
> headers etc.), they only need to call a version of
> skb_checksum_help() that does not do a copy of the skb.

If there are no objections, I'd like to create a version of
skb_checksum_help() that doesn't copy the packet, and call
that version from ah_output()/esp_output()/ipcomp_output().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-09  8:14                       ` Herbert Xu
@ 2004-07-09 14:02                         ` James Morris
  2004-07-09 19:36                           ` David S. Miller
  0 siblings, 1 reply; 21+ messages in thread
From: James Morris @ 2004-07-09 14:02 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Fri, 9 Jul 2004, Herbert Xu wrote:

> On Thu, Jul 08, 2004 at 02:02:35PM +1000, herbert wrote:
> > 
> > That's only because the dst->output() functions are calling
> > skb_checksum_help().  Since those same functions assume the
> > skb to be "uncloned" anyway (they modify it directly by adding
> > headers etc.), they only need to call a version of
> > skb_checksum_help() that does not do a copy of the skb.
> 
> If there are no objections, I'd like to create a version of
> skb_checksum_help() that doesn't copy the packet, and call
> that version from ah_output()/esp_output()/ipcomp_output().

This will break when cloned packets are passed to these functions.


- James
-- 
James Morris
<jmorris@redhat.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-09 14:02                         ` James Morris
@ 2004-07-09 19:36                           ` David S. Miller
  2004-07-09 20:42                             ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: David S. Miller @ 2004-07-09 19:36 UTC (permalink / raw)
  To: James Morris; +Cc: herbert, netdev

On Fri, 9 Jul 2004 10:02:25 -0400 (EDT)
James Morris <jmorris@redhat.com> wrote:

> On Fri, 9 Jul 2004, Herbert Xu wrote:
> 
> > On Thu, Jul 08, 2004 at 02:02:35PM +1000, herbert wrote:
> > > 
> > > That's only because the dst->output() functions are calling
> > > skb_checksum_help().  Since those same functions assume the
> > > skb to be "uncloned" anyway (they modify it directly by adding
> > > headers etc.), they only need to call a version of
> > > skb_checksum_help() that does not do a copy of the skb.
> > 
> > If there are no objections, I'd like to create a version of
> > skb_checksum_help() that doesn't copy the packet, and call
> > that version from ah_output()/esp_output()/ipcomp_output().
> 
> This will break when cloned packets are passed to these functions.

James is right Herbert.  TCP will send clones down into these routines
all the time.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-09 19:36                           ` David S. Miller
@ 2004-07-09 20:42                             ` Herbert Xu
  2004-07-09 21:07                               ` Herbert Xu
  2004-07-09 21:21                               ` David S. Miller
  0 siblings, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2004-07-09 20:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: James Morris, netdev

On Fri, Jul 09, 2004 at 12:36:08PM -0700, David S. Miller wrote:
>
> > > If there are no objections, I'd like to create a version of
> > > skb_checksum_help() that doesn't copy the packet, and call
> > > that version from ah_output()/esp_output()/ipcomp_output().
> > 
> > This will break when cloned packets are passed to these functions.
> 
> James is right Herbert.  TCP will send clones down into these routines
> all the time.

The first TCP transmission will always be a clone of a packet off
its output queue.  However, the TCP code is written such that you
can modify any part of the skb except the TCP payload.  This
includes the TCP header which is where the TCP checksum is.

If this weren't the case then you'd have to copy the packet much earlier.
This assumption is already made by tcp_transmit_skb(), ip_queue_xmit()
and all the functions called by dst_output().

When TCP retransmits the packet, it will do a pskb_copy() on it so
it's no longer a clone.

So unless I've missed another case where someone will pass a clone
down, it is safe to change the checksum on the TCP clones.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-09 20:42                             ` Herbert Xu
@ 2004-07-09 21:07                               ` Herbert Xu
  2004-07-09 21:21                               ` David S. Miller
  1 sibling, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2004-07-09 21:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: James Morris, netdev

On Sat, Jul 10, 2004 at 06:42:28AM +1000, herbert wrote:
> 
> So unless I've missed another case where someone will pass a clone
> down, it is safe to change the checksum on the TCP clones.

Please note that even if you decide at the API-level that it is not
OK for dst->output() functions to modify anything beyond the IP
headers, we still don't need to copy the entire skb.

Since it's already a clone anyway, skb_cow() suffices to allow
changing the checksum.

Is there a situation where someone will pass a shared (not cloned) skb
down on the netfilter path? If not, then we can do skb_cow() there as
well.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-09 20:42                             ` Herbert Xu
  2004-07-09 21:07                               ` Herbert Xu
@ 2004-07-09 21:21                               ` David S. Miller
  2004-07-09 21:43                                 ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: David S. Miller @ 2004-07-09 21:21 UTC (permalink / raw)
  To: Herbert Xu; +Cc: jmorris, netdev

On Sat, 10 Jul 2004 06:42:28 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Fri, Jul 09, 2004 at 12:36:08PM -0700, David S. Miller wrote:
> >
> > > > If there are no objections, I'd like to create a version of
> > > > skb_checksum_help() that doesn't copy the packet, and call
> > > > that version from ah_output()/esp_output()/ipcomp_output().
> > > 
> > > This will break when cloned packets are passed to these functions.
> > 
> > James is right Herbert.  TCP will send clones down into these routines
> > all the time.
> 
> The first TCP transmission will always be a clone of a packet off
> its output queue.  However, the TCP code is written such that you
> can modify any part of the skb except the TCP payload.  This
> includes the TCP header which is where the TCP checksum is.
>
> If this weren't the case then you'd have to copy the packet much earlier.
> This assumption is already made by tcp_transmit_skb(), ip_queue_xmit()
> and all the functions called by dst_output().
> 
> When TCP retransmits the packet, it will do a pskb_copy() on it so
> it's no longer a clone.

Not necessarily true.  If the device has finished transmission,
which is true %99.9999 of the time when a retransmission happens,
another clone will be made against the original SKB sitting in
the write queue.

> So unless I've missed another case where someone will pass a clone
> down, it is safe to change the checksum on the TCP clones.

The hw checksumming state is what we care about.  And skb_cow()'s implementation
is:

1) Always copy all data if cloned

2) Allocate a unique data area, and even the shared private skb
   area becomes local to the skb.

In short only the data is uncloned.

However, skb_checksum_help() is doing something entirely different.
It makes a fully new skb, both data and sk_buff struct are uncloned.

This is particularly important for the very case which ah_output()
cares about, for example.  If the skb is CHECKSUM_HW we have to unclone
the full SKB.  ah_output() does not use things like skb_cow() like
ESP and others do.

I really think the dst->output() SKB pointer passing is truly needed.

You still won't be convinced, I know :-)  So propose a patch and we'll
shoot holes in it so we can discuss something concrete, ok? :)))

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pskb change in dst->output
  2004-07-09 21:21                               ` David S. Miller
@ 2004-07-09 21:43                                 ` Herbert Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2004-07-09 21:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: jmorris, netdev

On Fri, Jul 09, 2004 at 02:21:44PM -0700, David S. Miller wrote:
>
> > When TCP retransmits the packet, it will do a pskb_copy() on it so
> > it's no longer a clone.
> 
> Not necessarily true.  If the device has finished transmission,
> which is true %99.9999 of the time when a retransmission happens,
> another clone will be made against the original SKB sitting in
> the write queue.

You're right.  But that case is identical to the clone during the
initial transmission.

> The hw checksumming state is what we care about.  And skb_cow()'s implementation
> is:
> 
> 1) Always copy all data if cloned
> 
> 2) Allocate a unique data area, and even the shared private skb
>    area becomes local to the skb.
> 
> In short only the data is uncloned.
> 
> However, skb_checksum_help() is doing something entirely different.
> It makes a fully new skb, both data and sk_buff struct are uncloned.

I completely agree with these facts.

> This is particularly important for the very case which ah_output()
> cares about, for example.  If the skb is CHECKSUM_HW we have to unclone
> the full SKB.  ah_output() does not use things like skb_cow() like
> ESP and others do.
>
> I really think the dst->output() SKB pointer passing is truly needed.

That's where we disagree.  The very first thing ah_output() does after
the *conditional* skb_checksum_help() call is skb_push().  This cannot
be done on a shared skb.  Therefore the skb must be totally private
or at most cloned.

This is the fundamental reason why there is no need to make a completely
new skb because we already have (or assume to have) exlucsive access
to the fields in the sk_buff struct.

Now whether we need to unclone the data or not is a different question.
My opinion is that we don't, but I'm not to fussed over that.

> You still won't be convinced, I know :-)  So propose a patch and we'll
> shoot holes in it so we can discuss something concrete, ok? :)))

OK, that's what I'll do once I finish working on the IPv4 IPsec
encapsulation stuff.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2004-07-09 21:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-07 13:06 pskb change in dst->output Herbert Xu
2004-07-07 14:58 ` James Morris
2004-07-07 21:28   ` Herbert Xu
2004-07-07 22:01     ` David S. Miller
2004-07-07 23:12       ` Herbert Xu
2004-07-07 23:33         ` James Morris
2004-07-08  0:04           ` Herbert Xu
2004-07-08  0:17             ` David S. Miller
2004-07-08  0:35               ` Herbert Xu
2004-07-08  1:05               ` James Morris
2004-07-08  1:11                 ` Herbert Xu
2004-07-08  1:19                   ` James Morris
2004-07-08  3:34                   ` James Morris
2004-07-08  4:02                     ` Herbert Xu
2004-07-09  8:14                       ` Herbert Xu
2004-07-09 14:02                         ` James Morris
2004-07-09 19:36                           ` David S. Miller
2004-07-09 20:42                             ` Herbert Xu
2004-07-09 21:07                               ` Herbert Xu
2004-07-09 21:21                               ` David S. Miller
2004-07-09 21:43                                 ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).