netdev.vger.kernel.org archive mirror
From: Fan Du <fengyuleidian0615@gmail.com>
To: Jesse Gross <jesse@nicira.com>
Cc: "Du, Fan" <fan.du@intel.com>, Thomas Graf <tgraf@suug.ch>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"fw@strlen.de" <fw@strlen.de>,
	"dev@openvswitch.org" <dev@openvswitch.org>,
	"pshelar@nicira.com" <pshelar@nicira.com>
Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
Date: Tue, 06 Jan 2015 17:34:11 +0800
Message-ID: <54ABAC13.9070402@gmail.com>
In-Reply-To: <CAEP_g=-R1FrcA0sTJJjQhypRtVCwoRZ+LieKaSJxqA-HACZqEw@mail.gmail.com>


On 2015/1/6 1:58, Jesse Gross wrote:
> On Mon, Jan 5, 2015 at 1:02 AM, Fan Du <fengyuleidian0615@gmail.com> wrote:
>> On 2014-12-03 10:31, Du, Fan wrote:
>>
>>>
>>>> -----Original Message-----
>>>> From: Thomas Graf [mailto:tgr@infradead.org] On Behalf Of Thomas Graf
>>>> Sent: Wednesday, December 3, 2014 1:42 AM
>>>> To: Michael S. Tsirkin
>>>> Cc: Du, Fan; 'Jason Wang'; netdev@vger.kernel.org; davem@davemloft.net;
>>>> fw@strlen.de; dev@openvswitch.org; jesse@nicira.com; pshelar@nicira.com
>>>> Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than
>>>> MTU
>>>>
>>>> On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
>>>>> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
>>>>>> On 12/02/14 at 01:48pm, Flavio Leitner wrote:
>>>>>>> What about containers or any other virtualization environment that
>>>>>>> doesn't use Virtio?
>>>>>>
>>>>>> The host can dictate the MTU in that case for both veth or OVS
>>>>>> internal which would be primary container plumbing techniques.
>>>>>
>>>>> It typically can't do this easily for VMs with emulated devices:
>>>>> real ethernet uses a fixed MTU.
>>>>>
>>>>> IMHO it's confusing to suggest MTU as a fix for this bug, it's an
>>>>> unrelated optimization.
>>>>> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
>>>>
>>>> PMTU discovery only resolves the issue if an actual IP stack is running
>>>> inside the
>>>> VM. This may not be the case at all.
>>>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> Some thoughts here:
>>>
>>> Put another way, this is exactly where the host stack should forge an
>>> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED message using the _inner_ skb network
>>> and transport headers, apply whatever type of encapsulation is in use,
>>> and then push that packet up to the guest/container, so that it appears
>>> to come from an intermediate node or from the peer. PMTU discovery should
>>> then be expected to work correctly. The same behavior should be shared by
>>> every other encapsulation technology that suffers from this problem.
>>
>> Hi David, Jesse and Thomas
>>
>> As discussed here:
>> https://www.marc.info/?l=linux-netdev&m=141764712631150&w=4
>> and quoting Jesse:
>> My proposal would be something like this:
>>   * For L2, reduce the VM MTU to the lowest common denominator on the
>> segment.
>>   * For L3, use path MTU discovery or fragment inner packet (i.e.
>> normal routing behavior).
>>   * As a last resort (such as if using an old version of virtio in the
>> guest), fragment the tunnel packet.
>>
>>
>> For L2, it's an administrative action.
>> For L3, the PMTU approach looks better: once the sender is alerted to the
>> reduced MTU, the packet size after encapsulation will not exceed the
>> physical MTU, so no additional fragmentation effort is needed.
>> For "As a last resort... fragment the tunnel packet", the original patch:
>> https://www.marc.info/?l=linux-netdev&m=141715655024090&w=4 did the job, but
>> it seems it's not welcomed.
> This needs to be properly integrated into IP processing if it is to
> work correctly.
Do you mean the original patch in this thread? Yes, it works correctly
in my cloud environment. If you have any other concerns, please let me
know. :)
> One of the reasons for only doing path MTU discovery
> for L3 is that it operates seamlessly as part of normal operation -
> there is no need to forge addresses or potentially generate ICMP when
> on an L2 network. However, this ignores the IP handling that is going
> on (note that in OVS it is possible for L3 to be implemented as a set
> of flows coming from a controller).
>
> It also should not be VXLAN specific or duplicate VXLAN encapsulation
> code. As this is happening before encapsulation, the generated ICMP
> does not need to be encapsulated either if it is created in the right
> location.
Yes, I agree. Judging from the code flow, GRE shares the same issue.
Pushing the ICMP message back without encapsulating it, and without
sending it down to the physical device, is possible. The "right location",
as far as I know, could only be in ovs_vport_send. In addition, this
probably requires a wrapper around the route lookup for GRE/VXLAN: once we
get the underlay device MTU from the routing information, calculating the
reduced MTU becomes feasible.
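
To make the reduced-MTU calculation concrete, here is a minimal userspace
sketch of the arithmetic. It is not kernel code and not taken from any
patch. It assumes an IPv4 underlay without IP options, a GRE header
without key/checksum/sequence fields, and that the tunnel carries full
Ethernet frames. The names tunnel_overhead, reduced_mtu and
needs_frag_needed are hypothetical helpers made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Header sizes in bytes. Assumption: IPv4 underlay, no IP options. */
enum {
    OUTER_IPV4_HLEN = 20,
    OUTER_UDP_HLEN  = 8,
    VXLAN_HLEN      = 8,
    GRE_BASE_HLEN   = 4,   /* assumption: no key/checksum/sequence */
    INNER_ETH_HLEN  = 14,  /* the tunnel carries an inner Ethernet frame */
};

enum tunnel_type { TUNNEL_VXLAN, TUNNEL_GRE };

/* Hypothetical helper: bytes added in front of the inner IP packet. */
static size_t tunnel_overhead(enum tunnel_type type)
{
    switch (type) {
    case TUNNEL_VXLAN:
        return OUTER_IPV4_HLEN + OUTER_UDP_HLEN + VXLAN_HLEN + INNER_ETH_HLEN;
    case TUNNEL_GRE:
        return OUTER_IPV4_HLEN + GRE_BASE_HLEN + INNER_ETH_HLEN;
    }
    return 0;
}

/*
 * Hypothetical helper: the MTU the sender should be told about, given the
 * underlay device MTU obtained from the route lookup. Returns 0 if the
 * underlay MTU cannot even hold the tunnel headers.
 */
static size_t reduced_mtu(size_t underlay_mtu, enum tunnel_type type)
{
    size_t overhead = tunnel_overhead(type);

    return underlay_mtu > overhead ? underlay_mtu - overhead : 0;
}

/*
 * Hypothetical helper: an ICMP_FRAG_NEEDED reply is only warranted when
 * the inner IP packet has DF set and does not fit the reduced MTU.
 */
static bool needs_frag_needed(size_t inner_ip_len, bool df_set, size_t mtu)
{
    return df_set && inner_ip_len > mtu;
}
```

Under these assumptions, an underlay MTU of 1500 gives a reduced MTU of
1450 for VXLAN and 1462 for GRE, so a 1500-byte inner packet with DF set
would trigger the ICMP reply on either tunnel type.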
