From: Wei Liu <wei.liu2@citrix.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Paul Durrant <paul.durrant@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [RFC] netif: staging grants for requests
Date: Wed, 4 Jan 2017 13:54:56 +0000
Message-ID: <20170104135456.GM13806@citrix.com>
In-Reply-To: <58518B40.3050408@oracle.com>

Hey!
Thanks for writing this detailed document!
On Wed, Dec 14, 2016 at 06:11:12PM +0000, Joao Martins wrote:
> Hey,
>
> Back in the Xen hackathon '16 networking session there were a couple of ideas
> brought up. One of them was about exploring permanently mapped grants between
> xen-netback/xen-netfront.
>
> I started experimenting and came up with a sort of design document (in pandoc)
> on what it would look like if proposed. This is meant as a seed for discussion
> and also as a request for input on whether this is a good direction. Of course, I
> am willing to try alternatives that we come up with beyond the contents of the
> spec, or any other suggested changes ;)
>
> Any comments or feedback is welcome!
>
> Cheers,
> Joao
>
> ---
> % Staging grants for network I/O requests
> % Joao Martins <<joao.m.martins@oracle.com>>
> % Revision 1
>
> \clearpage
>
> --------------------------------------------------------------------
> Status: **Experimental**
>
> Architecture(s): x86 and ARM
>
Any.
> Component(s): Guest
>
> Hardware: Intel and AMD
No need to specify this.
> --------------------------------------------------------------------
>
> # Background and Motivation
>
I skimmed through the middle. I think your description of transmission
in both directions is accurate.
The proposal to replace some steps with an explicit memcpy is also
sensible.
> \clearpage
>
> ## Performance
>
> Numbers that give a rough idea of the performance benefits of this extension.
> These are Guest <-> Dom0 measurements, which test the communication between
> backend and frontend, excluding other bottlenecks in the datapath (the software
> switch).
>
> ```
> # grant copy
> Guest TX (1vcpu, 64b, UDP in pps): 1 506 170 pps
> Guest TX (4vcpu, 64b, UDP in pps): 4 988 563 pps
> Guest TX (1vcpu, 256b, UDP in pps): 1 295 001 pps
> Guest TX (4vcpu, 256b, UDP in pps): 4 249 211 pps
>
> # grant copy + grant map (see next subsection)
> Guest TX (1vcpu, 260b, UDP in pps): 577 782 pps
> Guest TX (4vcpu, 260b, UDP in pps): 1 218 273 pps
>
> # drop at the guest network stack
> Guest RX (1vcpu, 64b, UDP in pps): 1 549 630 pps
> Guest RX (4vcpu, 64b, UDP in pps): 2 870 947 pps
> ```
>
> With this extension:
> ```
> # memcpy
> data-len=256 TX (1vcpu, 64b, UDP in pps): 3 759 012 pps
> data-len=256 TX (4vcpu, 64b, UDP in pps): 12 416 436 pps
This basically means we can almost reach line rate on a 10Gb link.
It is already a good result. I'm interested in knowing whether it is
possible to approach 40 or 100 Gb/s. It would be good to design
this extension with higher goals in mind.
> data-len=256 TX (1vcpu, 256b, UDP in pps): 3 248 392 pps
> data-len=256 TX (4vcpu, 256b, UDP in pps): 11 165 355 pps
>
> # memcpy + grant map (see next subsection)
> data-len=256 TX (1vcpu, 260b, UDP in pps): 588 428 pps
> data-len=256 TX (4vcpu, 260b, UDP in pps): 1 668 044 pps
>
> # (drop at the guest network stack)
> data-len=256 RX (1vcpu, 64b, UDP in pps): 3 285 362 pps
> data-len=256 RX (4vcpu, 64b, UDP in pps): 11 761 847 pps
>
> # (drop with guest XDP_DROP prog)
> data-len=256 RX (1vcpu, 64b, UDP in pps): 9 466 591 pps
> data-len=256 RX (4vcpu, 64b, UDP in pps): 33 006 157 pps
> ```
>
> Latency measurements (netperf TCP_RR request size 1 and response size 1):
> ```
> 24 KTps vs 28 KTps
> 39 KTps vs 50 KTps (with kernel busy poll)
> ```
>
> TCP bulk transfer measurements aren't showing a significant increase in
> maximum throughput (sometimes ~10%), but rather fewer retransmissions and
> more stable throughput. This is probably because of a slight decrease in RTT
> (i.e. the receiver acknowledging data more quickly). I am currently exploring
> other data list sizes and will probably have a better idea of the effects of
> this.
>
> ## Linux grant copy vs map remark
>
> Based on the numbers above, there's a sudden drop of more than 2x when we switch
> from grant copy to also grant mapping the `gref`: 1 295 001 vs 577 782 pps for
> 256- and 260-byte packets respectively. This is all the more visible when the
> grant copy is replaced with memcpy in this extension (3 248 392 vs 588 428 pps).
> While there have been discussions about avoiding the TLB flush on unmap, one
> could wonder what the threshold of that improvement would be. Chances are that
> this is the least of our concerns on a fully populated host (or an
> oversubscribed one). Would it be worth experimenting with increasing the
> threshold of the copy beyond the header?
>
Yes, it would be interesting to see more data points and provide a
sensible default. But I think this is a secondary goal because a "sensible
default" can change over time and across different environments.
> \clearpage
>
> # History
>
> A table of changes to the document, in chronological order.
>
> ------------------------------------------------------------------------
> Date       Revision Version  Notes
> ---------- -------- -------- -------------------------------------------
> 2016-12-14 1        Xen 4.9  Initial version.
> ---------- -------- -------- -------------------------------------------
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel