netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlad Yasevich <vladislav.yasevich@hp.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	yjwei@cn.fujitsu.com, netdev@vger.kernel.org
Subject: Re: [PATCH] inet6: Fix paramater issue of inet6_csk_xmit
Date: Fri, 01 Aug 2008 10:50:17 -0400	[thread overview]
Message-ID: <489322A9.3080906@hp.com> (raw)
In-Reply-To: <20080801131453.GA19851@gondor.apana.org.au>

Herbert Xu wrote:
> On Fri, Aug 01, 2008 at 07:50:41PM +0800, Herbert Xu wrote:
>> So we should be able to simply replace ipfragok with skb->local_df.
>> The only trouble is that SCTP doesn't seem to keep its PMTU setting
>> in the usual spot so we'll need to fix that before this can be done.
> 
> OK I think this should work:
> 
> sctp: Drop ipfargok in sctp_xmit function
> 
> The ipfragok flag controls whether the packet may be fragmented
> either on the local host on beyond.  The latter is only valid on
> IPv4.
> 
> In fact, we never want to do the latter even on IPv4 when PMTU is
> enabled.  This is because even though we can't fragment packets

I think you mean that we can't "re-fragment packets".  The reason
for that is how SCTP handles sequence numbering,  Instead of numbering
bytes, it number messages.  Thus once the number is assigned, you can't
re-fragment it as that would result in different sequence numbers.

> within SCTP due to the prtocol's inherent faults, we can still
> fragment it at IP layer.  By setting the DF bit we will improve
> the PMTU process.
> 
> RFC 2960 only says that we SHOULD clear the DF bit in this case,
> so we're compliant even if we set the DF bit.

RFC 4960 doesn't mention DF bit anywhere, so that's not an issue.

> 
> Once we make this change, we only need to control the local
> fragmentation.  There is already a bit in the skb which controls
> that, local_df.  So this patch sets that instead of using the
> ipfragok argument.
> 
> The only complication is that there isn't a struct sock object
> per transport, so for IPv4 we have to resort to changing the
> pmtudisc field for every packet.  This should be safe though
> as the protocol is single-threaded.
> 
> Note that after this patch we can remove ipfragok from the rest
> of the stack too.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 535a18f..ab1c472 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -524,8 +524,7 @@ static inline void sctp_ssn_skip(struct sctp_stream *stream, __u16 id,
>   */
>  struct sctp_af {
>  	int		(*sctp_xmit)	(struct sk_buff *skb,
> -					 struct sctp_transport *,
> -					 int ipfragok);
> +					 struct sctp_transport *);
>  	int		(*setsockopt)	(struct sock *sk,
>  					 int level,
>  					 int optname,
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index a238d68..11dab07 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -195,8 +195,7 @@ out:
>  }
>  
>  /* Based on tcp_v6_xmit() in tcp_ipv6.c. */
> -static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport,
> -			int ipfragok)
> +static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
>  {
>  	struct sock *sk = skb->sk;
>  	struct ipv6_pinfo *np = inet6_sk(sk);
> @@ -231,7 +230,10 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport,
>  
>  	SCTP_INC_STATS(SCTP_MIB_OUTSCTPPACKS);
>  
> -	return ip6_xmit(sk, skb, &fl, np->opt, ipfragok);
> +	if (!(tp->param_flags & SPP_PMTUD_ENABLE))
> +		skb->local_df = 1;
> +
> +	return ip6_xmit(sk, skb, &fl, np->opt, 0);
>  }
>  
>  /* Returns the dst cache entry for the given source and destination ip
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 4568464..0dc4a7d 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -586,10 +586,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>  	SCTP_DEBUG_PRINTK("***sctp_transmit_packet*** skb len %d\n",
>  			  nskb->len);
>  
> -	if (tp->param_flags & SPP_PMTUD_ENABLE)
> -		(*tp->af_specific->sctp_xmit)(nskb, tp, packet->ipfragok);
> -	else
> -		(*tp->af_specific->sctp_xmit)(nskb, tp, 1);
> +	nskb->local_df = packet->ipfragok;
> +	(*tp->af_specific->sctp_xmit)(nskb, tp);
>  
>  out:
>  	packet->size = packet->overhead;
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index a6e0818..80b8efb 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -862,16 +862,25 @@ static int sctp_inet_supported_addrs(const struct sctp_sock *opt,
>  
>  /* Wrapper routine that calls the ip transmit routine. */
>  static inline int sctp_v4_xmit(struct sk_buff *skb,
> -			       struct sctp_transport *transport, int ipfragok)
> +			       struct sctp_transport *transport)
>  {
> +	struct inet_sock *inet = inet_sk(skb->sk);
> +
>  	SCTP_DEBUG_PRINTK("%s: skb:%p, len:%d, "
>  			  "src:%u.%u.%u.%u, dst:%u.%u.%u.%u\n",
>  			  __func__, skb, skb->len,
>  			  NIPQUAD(skb->rtable->rt_src),
>  			  NIPQUAD(skb->rtable->rt_dst));
>  
> +	/*
> +	 * You'd think with a protocol this big they could afford
> +	 * a struct sock per transport.
> +	 */

Please don't add comments like this.  It does not contribute anything.

> +	inet->pmtudisc = transport->param_flags & SPP_PMTUD_ENABLE ?
> +			 IP_PMTUDISC_DO : IP_PMTUDISC_DONT;

If you insist on doing this, you need to save the old value and restore
it after ip_queue_xmit(). 

The reason is that PMTU discovery change on a specific transport or
association should not affect the socket, since there could be multiple
associations on a given socket.  Each association has it's own control.

Thanks
-vlad

> +
>  	SCTP_INC_STATS(SCTP_MIB_OUTSCTPPACKS);
> -	return ip_queue_xmit(skb, ipfragok);
> +	return ip_queue_xmit(skb, 0);
>  }
>  
>  static struct sctp_af sctp_af_inet;
> 
> Cheers,


  reply	other threads:[~2008-08-01 14:50 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-31  9:56 [PATCH] inet6: Fix paramater issue of inet6_csk_xmit Wei Yongjun
2008-08-01  3:48 ` David Miller
2008-08-01  5:17   ` Wei Yongjun
2008-08-01  9:11     ` Herbert Xu
2008-08-01  9:18       ` David Miller
2008-08-01  9:25         ` Herbert Xu
2008-08-01 11:50           ` Herbert Xu
2008-08-01 13:14             ` Herbert Xu
2008-08-01 14:50               ` Vlad Yasevich [this message]
2008-08-02  0:30                 ` Herbert Xu
2008-08-02 23:26                   ` Vlad Yasevich
2008-08-03  2:01                     ` Herbert Xu
2008-08-03  6:56                     ` David Miller
2008-08-04  1:03               ` Vlad Yasevich
2008-08-04  1:55                 ` Herbert Xu
2008-08-04  2:58                   ` Wei Yongjun
2008-08-04  3:05                     ` Herbert Xu
2008-08-04  4:16                       ` David Miller
2008-08-04  4:15                   ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=489322A9.3080906@hp.com \
    --to=vladislav.yasevich@hp.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=netdev@vger.kernel.org \
    --cc=yjwei@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).