From: Wei Liu <wei.liu2@citrix.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Paul Durrant <paul.durrant@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [RFC] netif: staging grants for requests
Date: Wed, 4 Jan 2017 13:54:56 +0000 [thread overview]
Message-ID: <20170104135456.GM13806@citrix.com> (raw)
In-Reply-To: <58518B40.3050408@oracle.com>
Hey!
Thanks for writing this detailed document!
On Wed, Dec 14, 2016 at 06:11:12PM +0000, Joao Martins wrote:
> Hey,
>
> Back in the Xen hackaton '16 networking session there were a couple of ideas
> brought up. One of them was about exploring permanently mapped grants between
> xen-netback/xen-netfront.
>
> I started experimenting and came up with a rough design document (in pandoc)
> describing what is being proposed. This is meant as a seed for discussion and
> also a request for input on whether this is a good direction. Of course, I am
> willing to try alternatives we come up with beyond the contents of the spec,
> or any other suggested changes ;)
>
> Any comments or feedback are welcome!
>
> Cheers,
> Joao
>
> ---
> % Staging grants for network I/O requests
> % Joao Martins <<joao.m.martins@oracle.com>>
> % Revision 1
>
> \clearpage
>
> --------------------------------------------------------------------
> Status: **Experimental**
>
> Architecture(s): x86 and ARM
>
Any.
> Component(s): Guest
>
> Hardware: Intel and AMD
No need to specify this.
> --------------------------------------------------------------------
>
> # Background and Motivation
>
I skimmed through the middle -- I think your description of transmissions
in both directions is accurate.
The proposal to replace some steps with explicit memcpy is also
sensible.
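To make sure we are talking about the same mechanism, here is how I
picture it in a minimal C sketch (all names here are made up for
illustration, not the actual netback/netfront symbols): the frontend
grants a fixed pool of staging buffers once at setup, the backend maps
that pool once, and per-packet data movement then becomes a plain
memcpy instead of one grant operation per request.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: a pool of staging buffers granted once by the frontend
 * and kept permanently mapped by the backend. */
struct staging_pool {
    void     *vaddr;     /* backend's permanent mapping of the pool */
    size_t    slot_size; /* bytes per slot, e.g. data-len=256 */
    unsigned  nr_slots;
};

/* TX path: copy a request's payload out of its pre-mapped slot,
 * replacing what would otherwise be a grant copy per request. */
static void staging_pull(const struct staging_pool *p, unsigned slot,
                         void *dst, size_t len)
{
    memcpy(dst, (const char *)p->vaddr + (size_t)slot * p->slot_size, len);
}
```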
> \clearpage
>
> ## Performance
>
> Numbers that give a rough idea of the performance benefits of this extension.
> These are Guest <-> Dom0 measurements, which test the communication between
> backend and frontend while excluding other bottlenecks in the datapath (the
> software switch).
>
> ```
> # grant copy
> Guest TX (1vcpu, 64b, UDP in pps): 1 506 170 pps
> Guest TX (4vcpu, 64b, UDP in pps): 4 988 563 pps
> Guest TX (1vcpu, 256b, UDP in pps): 1 295 001 pps
> Guest TX (4vcpu, 256b, UDP in pps): 4 249 211 pps
>
> # grant copy + grant map (see next subsection)
> Guest TX (1vcpu, 260b, UDP in pps): 577 782 pps
> Guest TX (4vcpu, 260b, UDP in pps): 1 218 273 pps
>
> # drop at the guest network stack
> Guest RX (1vcpu, 64b, UDP in pps): 1 549 630 pps
> Guest RX (4vcpu, 64b, UDP in pps): 2 870 947 pps
> ```
>
> With this extension:
> ```
> # memcpy
> data-len=256 TX (1vcpu, 64b, UDP in pps): 3 759 012 pps
> data-len=256 TX (4vcpu, 64b, UDP in pps): 12 416 436 pps
This basically means we can almost reach line rate for a 10Gb link,
which is already a good result. I'm interested in knowing whether it is
possible to approach 40 or 100 Gb/s; it would be good if we designed
this extension with those higher goals in mind.
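For concreteness, assuming "64b" here means minimum-sized 64-byte
Ethernet frames, the usual line rate arithmetic is:

```
on-wire cost per frame: 64B frame + 20B preamble/IFG = 84B = 672 bits
10GbE line rate:        10^9 / 672 ≈ 14.88 Mpps
12 416 436 / 14 880 952 ≈ 83% of line rate
```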
> data-len=256 TX (1vcpu, 256b, UDP in pps): 3 248 392 pps
> data-len=256 TX (4vcpu, 256b, UDP in pps): 11 165 355 pps
>
> # memcpy + grant map (see next subsection)
> data-len=256 TX (1vcpu, 260b, UDP in pps): 588 428 pps
> data-len=256 TX (4vcpu, 260b, UDP in pps): 1 668 044 pps
>
> # (drop at the guest network stack)
> data-len=256 RX (1vcpu, 64b, UDP in pps): 3 285 362 pps
> data-len=256 RX (4vcpu, 64b, UDP in pps): 11 761 847 pps
>
> # (drop with guest XDP_DROP prog)
> data-len=256 RX (1vcpu, 64b, UDP in pps): 9 466 591 pps
> data-len=256 RX (4vcpu, 64b, UDP in pps): 33 006 157 pps
> ```
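For anyone unfamiliar with the last case: "XDP_DROP prog" means the
guest runs an eBPF program at the driver's earliest receive hook, which
drops every packet before the network stack sees it. The minimal such
program (not necessarily the exact one used for these numbers) is just:

```c
/* Minimal XDP drop-everything program; build with:
 *   clang -O2 -target bpf -c xdp_drop.c -o xdp_drop.o */
#include <linux/bpf.h>

__attribute__((section("xdp"), used))
int xdp_drop_all(struct xdp_md *ctx)
{
    return XDP_DROP;   /* drop the packet at the earliest point */
}

char _license[] __attribute__((section("license"), used)) = "GPL";
```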
>
> Latency measurements (netperf TCP_RR request size 1 and response size 1):
> ```
> 24 KTps vs 28 KTps
> 39 KTps vs 50 KTps (with kernel busy poll)
> ```
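For reference, I assume these come from an invocation along the lines
of the following (my guess at the exact options; the busy poll variant
additionally needs net.core.busy_read/busy_poll set in the guest):

```
netperf -H <guest-ip> -t TCP_RR -- -r 1,1
```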
>
> TCP bulk transfer measurements aren't showing a representative increase in
> maximum throughput (sometimes ~10%), but rather fewer retransmissions and more
> stable throughput. This is probably because of a slight decrease in RTT
> (i.e. the receiver acknowledging data quicker). I am currently exploring other
> data list sizes and will probably have a better idea of the effects then.
>
> ## Linux grant copy vs map remark
>
> Based on the numbers above, there is a sudden 2x performance drop when we
> switch from grant copying to also grant mapping the `gref`: 1 295 001 vs
> 577 782 pps for 256-byte and 260-byte packets respectively. This is all the
> more visible once the grant copy is replaced with a memcpy in this extension
> (3 248 392 vs 588 428 pps). While there have been discussions about avoiding
> the TLB flush on unmap, one could wonder what the threshold of that
> improvement would be. Chances are that this is the least of our concerns on a
> fully populated host (or an oversubscribed one). Would it be worth
> experimenting with increasing the copy threshold beyond the header?
>
Yes, it would be interesting to see more data points and to provide a
sensible default. But I think this is a secondary goal, because the
"sensible default" can change over time and across environments.
> \clearpage
>
> # History
>
> A table of changes to the document, in chronological order.
>
> ------------------------------------------------------------------------
> Date         Revision   Version   Notes
> -----------  ---------  --------  --------------------------------------
> 2016-12-14   1          Xen 4.9   Initial version.
> ------------------------------------------------------------------------