From: Joseph Gasparakis <joseph.gasparakis@intel.com>
To: Or Gerlitz <or.gerlitz@gmail.com>
Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
Jerry Chu <hkchu@google.com>, Or Gerlitz <ogerlitz@mellanox.com>,
Eric Dumazet <edumazet@google.com>,
Alexei Starovoitov <ast@plumgrid.com>,
Pravin B Shelar <pshelar@nicira.com>,
David Miller <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>
Subject: Re: vxlan/veth performance issues on net.git + latest kernels
Date: Tue, 3 Dec 2013 16:35:39 -0800 (PST)
Message-ID: <alpine.LFD.2.03.1312031626400.7539@intel.com>
In-Reply-To: <CAJZOPZLT_3msfz_XY45GOBMWK_y8WEYBTq3rGRUijpYyXE1ddg@mail.gmail.com>
On Tue, 3 Dec 2013, Or Gerlitz wrote:
> On Wed, Dec 4, 2013 at 1:13 AM, Joseph Gasparakis
> <joseph.gasparakis@intel.com> wrote:
> >
> >
> > On Tue, 3 Dec 2013, Or Gerlitz wrote:
> >
> >> On Tue, Dec 3, 2013 at 11:11 PM, Joseph Gasparakis
> >> <joseph.gasparakis@intel.com> wrote:
> >>
> >> >>> lack of GRO : receiver seems to not be able to receive as fast as you want.
> >> >>>> TCPOFOQueue: 3167879
> >> >>> So many packets are received out of order (because of losses)
> >>
> >> >> I see that there's no GRO also for the non-veth tests which involve
> >> >> vxlan, and over there the receiving side is able to consume the
> >> >> packets. Do you have a rough explanation why adding veth to the chain
> >> >> is such a game changer that makes things start falling apart?
> >>
> >> > I have seen this before. Here are my findings:
> >> >
> >> > The gso_type is different depending on whether the skb comes from veth
> >> > or not. From veth, you will see SKB_GSO_DODGY set. This breaks things:
> >> > when the skb with DODGY set moves from vxlan to the driver through
> >> > dev_hard_start_xmit(), the stack drops it silently. I never got the time
> >> > to find the root cause for this, but I know it causes re-transmissions
> >> > and a big performance degradation.
> >> >
> >> > I went as far as quickly hacking a one-liner unsetting the DODGY bit
> >> > in vxlan.c, and that bypassed the issue and recovered the performance,
> >> > but obviously this is not a real fix.
> >>
> >> thanks for the heads up, a few quick questions/clarifications --
> >>
> >> -- you are talking about drops done @ the sender side, correct? Eric was
> >> saying we have evidence that the drops happen on the receiver.
> >
> > I am *guessing* drops on the Rx are due to the drops at the Tx. See my
> > answer to your next question for more info.
> >
> >>
> >> -- without the hack you did, packets are still sent/received, so what
> >> makes the stack drop only some of them?
> >>
> >
> > What I had seen is GSOs getting dropped on the Tx side. Basically the GSOs
> > never made it to the driver; they were broken into smaller non-GSO skbs by
> > the stack. I think the stack is not handling the GSO with the DODGY bit
> > set well, and that maybe causes the packet to be only partially emitted,
> > causing the re-transmits (and maybe the drops on your Rx end)? Of course
> > all this is speculation. The fact I do know is that as soon as I was
> > forcing the gso type I saw offloaded VXLAN encapsulated traffic at decent speeds.
> >
> >> -- why packets coming from veth would have the SKB_GSO_DODGY bit set?
> >
> > That is something I would love to know too. I am guessing this is a way
> > for the VM to say it is a non-trusted packet? And maybe all this can be
> > fixed by setting something on the VM through a userspace tool that
> > will stop the veth from setting the DODGY bit?
> >
> >>
> >> -- so where is now (say net.git or 3.12.x) this one line you commented
> >> out? I don't see in vxlan.c or in ip_tunnel_core.c / ip_tunnel.c
> >> explicit setting of SKB_GSO_DODGY
> >
> > I did not commit it, as this was just a workaround to prove to myself that
> > the problem I was seeing was due to the gso_type, and it would actually
> > just hide the problem and not give a proper solution to it.
> >
> >>
> >> Also, I am pretty sure the problem also exists when sending/receiving
> >> guest traffic through tap/macvtap <--> vhost/virtio-net and friends. I
> >> just stuck to the veth flavour b/c it's one (== the hypervisor)
> >> network stack to debug and not two (+ the guest one).
>
> understood, can you point to the line/area you hacked? I'd like to try it
> too and see the impact

I was printing the gso_type in vxlan_xmit_skb(), right before
iptunnel_xmit() gets called (I was focusing on UDPv4 encap only). Then I saw
the gso_type was different when a VM was involved and when it was not
(although I was transmitting exactly the same packet), and then I replaced
my printk with something like skb_shinfo(skb)->gso_type = <the gso type I had
for non-VM skb> and it all worked.

Then I looked into what was different between the two gso_types, and the
only difference was that SKB_GSO_DODGY was set when Tx'ing from the VM.

I am sure I could have been more delicate with the approach, but hey, it
worked for me.

I would be curious to see if this is the same issue as mine. It seems like
it is.
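
For anyone who wants to reproduce the comparison, here is a minimal
userspace sketch of the bit arithmetic involved. It is not the change I
made in vxlan.c. The enum values are assumed to mirror
include/linux/skbuff.h of the 3.12 era (check your own tree), and the
helper names gso_type_diff(), gso_type_clear_dodgy() and gso_type_print()
are made up for illustration.

```c
#include <stdio.h>

/* Userspace copy of the gso_type bits. ASSUMPTION: these match
 * include/linux/skbuff.h around 3.12; later kernels renumber them. */
enum {
	SKB_GSO_TCPV4      = 1 << 0,
	SKB_GSO_UDP        = 1 << 1,
	SKB_GSO_DODGY      = 1 << 2,	/* skb is from an untrusted source */
	SKB_GSO_TCP_ECN    = 1 << 3,
	SKB_GSO_TCPV6      = 1 << 4,
	SKB_GSO_FCOE       = 1 << 5,
	SKB_GSO_GRE        = 1 << 6,
	SKB_GSO_UDP_TUNNEL = 1 << 7,
};

/* Bits that are set in exactly one of the two gso_type values. */
static unsigned int gso_type_diff(unsigned int a, unsigned int b)
{
	return a ^ b;
}

/* A narrower variant of the workaround: clear only DODGY and keep every
 * other bit, instead of overwriting the whole gso_type. In the kernel
 * this would presumably be applied to skb_shinfo(skb)->gso_type. */
static unsigned int gso_type_clear_dodgy(unsigned int gso_type)
{
	return gso_type & ~(unsigned int)SKB_GSO_DODGY;
}

/* Hypothetical stand-in for the debug printk. */
static void gso_type_print(const char *tag, unsigned int gso_type)
{
	printf("%s: gso_type=0x%x%s\n", tag, gso_type,
	       (gso_type & SKB_GSO_DODGY) ? " (DODGY)" : "");
}
```

Feeding gso_type_diff() the value printed for the host-originated skb and
the one printed for the VM-originated skb should leave only SKB_GSO_DODGY
set if this is the same issue. Clearing that single bit is less invasive
than forcing the whole field, but it hides the problem in the same way.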