From: annie li <annie.li@oracle.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: Anirban Chakraborty <abchak@juniper.net>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: large packet support in netfront driver and guest network throughput
Date: Tue, 17 Sep 2013 10:09:21 +0800 [thread overview]
Message-ID: <5237B9D1.8020004@oracle.com> (raw)
In-Reply-To: <20130916142134.GE22986@zion.uk.xensource.com>
On 2013-9-16 22:21, Wei Liu wrote:
> On Fri, Sep 13, 2013 at 05:09:48PM +0000, Anirban Chakraborty wrote:
>> On Sep 13, 2013, at 4:44 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>>
>>> On Thu, Sep 12, 2013 at 05:53:02PM +0000, Anirban Chakraborty wrote:
>>>> Hi All,
>>>>
>>>> I am sure this has been answered somewhere on the list in the past, but I can't find it. I was wondering whether the Linux guest netfront driver has GRO support in it. tcpdump shows packets coming in at 1500 bytes, although eth0 in dom0 and the vif corresponding to the Linux guest in dom0 show that they receive large packets:
>>>>
>>>> In dom0:
>>>> eth0 Link encap:Ethernet HWaddr 90:E2:BA:3A:B1:A4
>>>> UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
>>>> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
>>>> 17:38:25.155373 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>>>> 10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
>>>>
>>>> vif4.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
>>>> UP BROADCAST RUNNING NOARP PROMISC MTU:1500 Metric:1
>>>> tcpdump -i vif4.0 -nnvv -s 1500 src 10.84.20.214
>>>> 17:38:25.156364 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>>>> 10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
>>>>
>>>>
>>>> In the guest:
>>>> eth0 Link encap:Ethernet HWaddr CA:FD:DE:AB:E1:E4
>>>> inet addr:10.84.20.213 Bcast:10.84.20.255 Mask:255.255.255.0
>>>> inet6 addr: fe80::c8fd:deff:feab:e1e4/64 Scope:Link
>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
>>>> 10:38:25.071418 IP (tos 0x0, ttl 64, id 15074, offset 0, flags [DF], proto TCP (6), length 1500)
>>>> 10.84.20.214.51040 > 10.84.20.213.5001: Flags [.], seq 17400:18848, ack 1, win 229, options [nop,nop,TS val 65594013 ecr 65569213], length 1448
>>>>
>>>> Is the packet segmented into MTU-sized chunks on transfer from netback to netfront? Is GRO not supported in the guest?
>>> Here is what I see in the guest, with the iperf server running in the guest
>>> and the iperf client running in Dom0. Tcpdump was run with the command you provided.
>>>
>>> 10.80.238.213.38895 > 10.80.239.197.5001: Flags [.], seq
>>> 5806480:5818064, ack 1, win 229, options [nop,nop,TS val 21968973 ecr
>>> 21832969], length 11584
>>>
>>> This is an upstream kernel. The throughput from Dom0 to DomU is ~7.2Gb/s.
>> Thanks for your reply. The tcpdump was captured on the dom0 of the guest [at both the vif and the physical interface], i.e. on the receive path of the server. The iperf server was running on the guest (10.84.20.213) and the client was another guest (on a different server) with IP 10.84.20.214. The traffic was between two guests, not between dom0 and the guest.
>>
>>>> I am seeing extremely low throughput on a 10Gb/s link. Two Linux guests (CentOS 6.4 64-bit, 4 vCPUs and 4GB of memory) are running on two different XenServer 6.1 hosts, and an iperf session between them shows at most 3.2 Gbps.
>>> XenServer might use a different Dom0 kernel with its own tuning. You could
>>> also contact XenServer support for a better idea.
>>>
>> XenServer 6.1 is running the 2.6.32.43 kernel. Since the issue appears, from the tcpdump, to be in the netfront driver, I thought I would post it here. Note that the checksum offloads of the interfaces (virtual and physical) were not touched; the default setting (on) was used.
>>
>>> In general, off-host communication can be affected by various things. It
>>> would be quite useful to identify the bottleneck first.
>>>
>>> Try to run:
>>> 1. Dom0 to Dom0 iperf (or your workload)
>>> 2. Dom0 to DomU iperf
>>> 3. DomU to Dom0 iperf
>> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected (with GRO turned on on the physical interface). However, when I run guest to guest, throughput falls off. Are large packets not supported in netfront? I thought otherwise. I looked at the code and I do not see any call to napi_gro_receive(); rather it is using netif_receive_skb(). netback seems to be sending GSO packets to netfront, but they are being segmented into 1500-byte packets (as it appears from the tcpdump).
>>
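(As a rough illustration of the point above: a minimal sketch of what delivering received skbs through the GRO API, instead of netif_receive_skb(), might look like in a netfront-style NAPI poll handler. The structure and function names below are made up for the example and are not the actual xen-netfront code.)

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical per-queue receive context; not the real xen-netfront
 * structures. */
struct example_rxq {
        struct napi_struct napi;
        struct sk_buff_head rx_batch;   /* skbs already pulled off the ring */
};

/* NAPI poll handler sketch: hand received skbs to the GRO layer so the
 * stack can coalesce consecutive TCP segments back into large packets
 * before they reach the protocol layers. */
static int example_poll(struct napi_struct *napi, int budget)
{
        struct example_rxq *rxq = container_of(napi, struct example_rxq, napi);
        struct sk_buff *skb;
        int work_done = 0;

        while (work_done < budget &&
               (skb = __skb_dequeue(&rxq->rx_batch)) != NULL) {
                /* Non-GRO delivery, each skb handed up individually: */
                /* netif_receive_skb(skb); */

                /* GRO delivery: lets the guest merge the MTU-sized
                 * segments seen in the tcpdump above. */
                napi_gro_receive(napi, skb);
                work_done++;
        }

        if (work_done < budget)
                napi_complete(napi);

        return work_done;
}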
> OK, I get your problem.
>
> Indeed netfront doesn't make use of GRO API at the moment.
This is true.
But I am wondering why the large packets are not segmented into MTU-sized
packets with an upstream kernel? I did see large packets with an upstream
kernel on the receiving guest (tested between two domUs on the same host).
Thanks
Annie
> I've added
> this to my list to work on. I will keep you posted when I get to that.
>
> Thanks!
>
> Wei.
>
>>> In order to get line rate, you need to at least get line rate from Dom0 to
>>> Dom0, IMHO. 10Gb/s line rate from guest to guest has not yet been achieved…
>> What is the current number, without vCPU pinning etc., for a 1500-byte MTU? I am getting 2.2-3.2 Gbps for a 4-vCPU guest with 4GB of memory. It is the only VM running on that server, with no other traffic.
>>
>> -Anirban
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Thread overview: 16+ messages
2013-09-12 17:53 large packet support in netfront driver and guest network throughput Anirban Chakraborty
2013-09-13 11:44 ` Wei Liu
2013-09-13 17:09 ` Anirban Chakraborty
2013-09-16 14:21 ` Wei Liu
2013-09-17 2:09 ` annie li [this message]
2013-09-17 8:25 ` Wei Liu
2013-09-17 17:53 ` Anirban Chakraborty
2013-09-18 2:28 ` annie li
2013-09-18 21:06 ` Anirban Chakraborty
2013-09-18 15:48 ` Wei Liu
2013-09-18 20:38 ` Anirban Chakraborty
2013-09-19 9:41 ` Wei Liu
2013-09-19 16:59 ` Anirban Chakraborty
2013-09-19 18:43 ` Wei Liu
2013-09-19 19:04 ` Wei Liu
2013-09-19 20:54 ` Anirban Chakraborty