All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David Miller <davem@davemloft.net>,
	wenxu@ucloud.cn, kuznet@ms2.inr.ac.ru, jmorris@namei.org,
	kaber@trash.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
	wenx05124561@163.com
Subject: Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for gre tunneled skbs
Date: Fri, 19 Aug 2016 16:40:07 +0300	[thread overview]
Message-ID: <20160819164007.527fe984@halley> (raw)
In-Reply-To: <d9fcc388-4722-02c6-e3dc-b68893c6cc15@stressinduktion.org>

Hi,

On Fri, 19 Aug 2016 11:20:40 +0200 Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> >> Maybe we can change our criteria in the following manner:
> >>  
> >> -	if (skb_iif && proto == IPPROTO_UDP) {
> >> +	if (skb_iif && !(df & htons(IP_DF))) {
> >> 		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> >>
> >> This way, any tunnel explicitly providing DF is NOT allowed to
> >> further fragment the resulting segments (leading to tx segments being
> >> dropped).
> >> Others tunnels, that do not care (e.g. vxlan and geneve, and probably
> >> ovs vport-gre, or other ovs encap vports, in df_default=false mode),
> >> will behave same for gso and non-gso.
> >>
> 
> I am really not sure...
> 
> Probably we have no other choice.

Further diving into this, seems the !IP_DF approach is more correct 
then the IPPROTO_UDP approach (WRT packets/segments arriving from other
interface, that exceed egress mtu):

vxlan/geneve:
  Both set df to zero.
  !IP_DF approach acts same as IPPROTO_UDP approach

vxlan/geneve in collect_md (e.g. OvS):
  They set df according to tun_flags & TUNNEL_DONT_FRAGMENT.
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS gets set unconditionally.
    In case TUNNEL_DONT_FRAGMENT requested, non-gso get dropped
    due to IPSTATS_MIB_FRAGFAILS, whereas gso gets segmented+fragmented (!)
  !IP_DF approach:
    Aligned, both non-gso and gso gets dropped for TUNNEL_DONT_FRAGMENT.

ip_gre in collect_md (e.g. OvS):
  Sets df according to tun_flags & TUNNEL_DONT_FRAGMENT.
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS is never set.
    Therefore in the case were df is NOT set, non-gso are fragged and
    passed, whereas gso gets dropped (!)
  !IP_DF approach:
    Non-gso vs gso aligned.

ip_gre in nopmtudisc:
  Will pass tnl_update_pmtu checks; Then, df inherrited from inner_iph
  (or stays unset if IFLA_GRE_IGNORE_DF specified).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS never set.
    Therefore in the case were df is NOT set, non-gso are fragged and
    passed, whereas gso gets dropped (!)
  !IP_DF approach:
    Aligned.
    
ip_gre in fou/gue mode in nopmtudisc:
  Assuming they pass tnl_update_pmtu checks; Then, df inherrited from
  inner_iph (or stays unset if IFLA_GRE_IGNORE_DF specified).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS gets always set (since proto==IPPROTO_UDP).
    In the case df is set, non-gso dropped by IPSTATS_MIB_FRAGFAILS,
    whereas gso gets segmented+fragmented (!)
  !IP_DF approach: 
    Aligned.

ip_gre in pmtudisc:
  Sets df to IP_DF.
  Non-gso will fail tnl_update_pmtu checks (gso should pass).
  IPPROTO_UDP approach:
    IPSKB_FRAG_SEGS never set. This leads the gso skbs to be eventually
    dropped. okay.
  !IP_DF approach:
    IPSKB_FRAG_SEGS not set, since IP_DF is true.
    This leads to gso skbs to be eventually dropped. okay.

(truely appreciate if you can review my above analysis)

Therefore using !(df & htons(IP_DF)) actually fixes some oversights of
our former proto==IPPROTO_UDP approach.

I'll send a patch.

Thanks
Shmulik

      reply	other threads:[~2016-08-19 13:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-09  7:04 [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for gre tunneled skbs wenxu
2016-08-11  0:35 ` David Miller
2016-08-11 19:41   ` Shmulik Ladkani
2016-08-12  4:29     ` wenxu
     [not found]     ` <80d116d7-e61c-1bbd-64bf-e3b1f809419b@ucloud.cn>
2016-08-12  5:18       ` Shmulik Ladkani
2016-08-12 11:11     ` Hannes Frederic Sowa
2016-08-15 11:16       ` Shmulik Ladkani
2016-08-16  7:12         ` wenxu
2016-08-19  7:26         ` Shmulik Ladkani
2016-08-19  9:20           ` Hannes Frederic Sowa
2016-08-19 13:40             ` Shmulik Ladkani [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160819164007.527fe984@halley \
    --to=shmulik.ladkani@gmail.com \
    --cc=davem@davemloft.net \
    --cc=hannes@stressinduktion.org \
    --cc=jmorris@namei.org \
    --cc=kaber@trash.net \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=netdev@vger.kernel.org \
    --cc=wenx05124561@163.com \
    --cc=wenxu@ucloud.cn \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.