Re: Interesting observation with network event notification and batching

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: annie.li@oracle.com, stefano.stabellini@eu.citrix.com,
	andrew.bennieston@citrix.com, ian.campbell@citrix.com,
	xen-devel@lists.xen.org
Subject: Re: Interesting observation with network event notification and batching
Date: Fri, 14 Jun 2013 14:53:03 -0400
Message-ID: <20130614185303.GC21280@phenom.dumpdata.com>
In-Reply-To: <20130612101451.GF2765@zion.uk.xensource.com>

On Wed, Jun 12, 2013 at 11:14:51AM +0100, Wei Liu wrote:
> Hi all
> 
> I'm hacking on netback, trying to identify whether TLB flushes cause a
> heavy performance penalty on the Tx path. The hack is quite nasty (you would
> not want to know, trust me).
> 
> Basically what is doesn't is, 1) alter network protocol to pass along

You probably meant "what it does"?

> mfns instead of grant references, 2) when the backend sees a new mfn,
> map it RO and cache it in its own address space.
> 
> With this hack, now we have some sort of zero-copy TX path. Backend
> doesn't need to issue any grant copy / map operation any more. When it
> sees a new packet in the ring, it just needs to pick up the pages
> in its own address space and assemble packets with those pages then pass
> the packet on to network stack.

Uh, I'm not sure I understand the RO part. If dom0 is mapping it, won't
that trigger a PTE update? And doesn't somebody (either the guest or the
initial domain) do a grant mapping to let the hypervisor know it is
OK to map a grant?

Or is dom0 actually permitted to map the MFN of any guest without using
grants? In which case you are using _PAGE_IOMAP somewhere and setting
up vmap entries with MFNs that point to the foreign domain - I think?

> 
> In theory this should boost performance, but in practice it is the other
> way around. This hack makes Xen network more than 50% slower than before
> (OMG). Further investigation shows that with this hack the batching
> ability is gone. Before this hack, netback batches like 64 slots in one

That is quite interesting.

> interrupt event, however after this hack, it only batches 3 slots in one
> interrupt event -- that's no batching at all because we can expect one
> packet to occupy 3 slots.

Right.
> 
> Time to have some figures (iperf from DomU to Dom0).
> 
> Before the hack, doing grant copy, throughput: 7.9 Gb/s, average slots
> per batch 64.
> 
> After the hack, throughput: 2.5 Gb/s, average slots per batch 3.
> 
> After the hack, adds in 64 HYPERVISOR_xen_version (it just does context
> switch into hypervisor) in Tx path, throughput: 3.2 Gb/s, average slots
> per batch 6.
> 
> After the hack, adds in 256 HYPERVISOR_xen_version (it just does context
> switch into hypervisor) in Tx path, throughput: 5.2 Gb/s, average slots
> per batch 26.
> 
> After the hack, adds in 512 HYPERVISOR_xen_version (it just does context
> switch into hypervisor) in Tx path, throughput: 7.9 Gb/s, average slots
> per batch 26.
> 
> After the hack, adds in 768 HYPERVISOR_xen_version (it just does context
> switch into hypervisor) in Tx path, throughput: 5.6 Gb/s, average slots
> per batch 25.
> 
> After the hack, adds in 1024 HYPERVISOR_xen_version (it just does context
> switch into hypervisor) in Tx path, throughput: 4.4 Gb/s, average slots
> per batch 25.
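
Lining the figures above up side by side makes the trend easier to see. This is only a sketch that restates the numbers quoted in this mail; the names (BASELINE, hacked, summarize) are made up for illustration, and treating the plain "after the hack" run as 0 extra calls is an assumption.

```python
# Measurements quoted above (iperf, DomU -> Dom0).
# Each row: (extra HYPERVISOR_xen_version calls in the Tx path,
#            throughput in Gb/s, average slots per batch).
BASELINE = (7.9, 64)  # before the hack, doing grant copy

hacked = [
    (0, 2.5, 3),      # assumption: the plain hack adds no extra calls
    (64, 3.2, 6),
    (256, 5.2, 26),
    (512, 7.9, 26),
    (768, 5.6, 25),
    (1024, 4.4, 25),
]


def summarize(rows, baseline_gbps):
    """Return (calls, Gb/s, slots, percent of baseline throughput) per row."""
    return [
        (calls, gbps, slots, round(100.0 * gbps / baseline_gbps))
        for calls, gbps, slots in rows
    ]


summary = summarize(hacked, BASELINE[0])
best = max(summary, key=lambda row: row[1])

for calls, gbps, slots, pct in summary:
    print(f"{calls:5d} calls  {gbps:4.1f} Gb/s  {slots:3d} slots/batch  {pct:3d}% of baseline")
print("peak throughput at", best[0], "extra calls")
```

Read this way, slots per batch levels off at around 25 once 256 or more calls are added, while throughput keeps moving: it peaks at 512 calls and drops again beyond that.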
> 

How do you get it to do more HYPERVISOR_xen_version calls? Did you just
add a

	for (i = 1024; i > 0; i--)
		hypervisor_yield();

loop in netback?
> Average slots per batch is calculated as follows:
>  1. count total_slots processed from start of day
>  2. count tx_count, which is the number of times the tx_action function
>     gets invoked
>  3. avg_slots_per_tx = total_slots / tx_count
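
The three steps above amount to two running counters and one division. A minimal sketch, assuming a hypothetical TxStats helper (not anything in netback itself) that is fed the slot count of each tx_action run:

```python
class TxStats:
    """Hypothetical counters mirroring the three steps above."""

    def __init__(self):
        self.total_slots = 0  # slots processed since start of day
        self.tx_count = 0     # number of tx_action invocations

    def record_tx_action(self, slots_processed):
        """Call once per tx_action run with the slots it consumed."""
        self.tx_count += 1
        self.total_slots += slots_processed

    def avg_slots_per_tx(self):
        """total_slots / tx_count, or 0.0 before the first invocation."""
        if self.tx_count == 0:
            return 0.0
        return self.total_slots / self.tx_count


stats = TxStats()
for batch in (3, 3, 6, 64):  # made-up batch sizes, not measurements
    stats.record_tx_action(batch)
print(stats.avg_slots_per_tx())
```

Note that in this sketch an invocation that finds no slots still bumps tx_count, so empty runs pull the average down.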
> 
> The counter-intuitive figures imply that there is something wrong with
> the current batching mechanism. Probably we need to fine-tune the
> batching behavior for network and play with event pointers in the ring
> (actually I'm looking into it now). It would be good to have some input
> on this.

I am still not sure I understand how your changes would incur more
of the yields.
> 
> Konrad, IIRC you once mentioned you discovered something with event
> notification, what's that?

They were bizarre. I naively expected the number of physical NIC
interrupts to be about the same as for the VIF, or less. And I figured
that the number of interrupts would be constant regardless of the
size of the packets. In other words, #packets == #interrupts.

In reality the number of interrupts the VIF had was about the same while
for the NIC it would fluctuate. (I can't remember the details).

But it was odd and I didn't dig deeper to figure out what was
happening, or to figure out whether for the VIF we could do something
so that #packets != #interrupts. Hopefully with some mechanism to
adjust it so that there would be fewer interrupts per packet (hand
waving here).

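To put rough numbers on the #packets vs. #interrupts point, here is a purely illustrative sketch. It assumes a hypothetical scheme that raises one interrupt per fixed number of packets; that is not how the VIF or the NIC actually behaves, and count_interrupts is a made-up name.

```python
def count_interrupts(num_packets, packets_per_interrupt):
    """Interrupts needed if one is raised per `packets_per_interrupt` packets.

    Purely illustrative: a real scheme would also need a timeout so that a
    trailing partial batch is not delayed indefinitely.
    """
    if packets_per_interrupt < 1:
        raise ValueError("packets_per_interrupt must be >= 1")
    # Ceiling division: a final partial batch still needs one interrupt.
    return -(-num_packets // packets_per_interrupt)


for n in (1, 3, 64):
    irqs = count_interrupts(1000, n)
    print(f"{n:2d} packets/interrupt -> {irqs} interrupts for 1000 packets")
```

With one packet per interrupt this is the #packets == #interrupts case; anything larger is the #packets != #interrupts case hand-waved about above.
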