From mboxrd@z Thu Jan  1 00:00:00 1970
From: annie li <annie.li@oracle.com>
Subject: Re: Interesting observation with network event
	notification and batching
Date: Mon, 01 Jul 2013 23:59:45 +0800
Message-ID: <51D1A771.4000302@oracle.com>
References: <20130612101451.GF2765@zion.uk.xensource.com>
	<20130628161542.GF16643@zion.uk.xensource.com>
	<51D13456.1040609@oracle.com>
	<alpine.DEB.2.02.1307011519180.4525@kaball.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <alpine.DEB.2.02.1307011519180.4525@kaball.uk.xensource.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: andrew.bennieston@citrix.com, Wei Liu <wei.liu2@citrix.com>, ian.campbell@citrix.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org


On 2013-7-1 22:19, Stefano Stabellini wrote:
> Could you please use plain text emails in the future?

Sure, sorry about that.

Thanks
Annie
>
> On Mon, 1 Jul 2013, annie li wrote:
>> On 2013-6-29 0:15, Wei Liu wrote:
>>
>> Hi all,
>>
>> After collecting more stats and comparing copying / mapping cases, I now
> >> have some more interesting findings, which might contradict what I said
>> before.
>>
> >> I tuned the runes I used for benchmarking to make sure iperf and netperf
> >> generate large packets (~64K). Here are the runes I use:
>>
>>    iperf -c 10.80.237.127 -t 5 -l 131072 -w 128k (see note)
>>    netperf -H 10.80.237.127 -l10 -f m -- -s 131072 -S 131072
>>
>>                            COPY                    MAP
>> iperf    Tput:             6.5Gb/s             14Gb/s (was 2.5Gb/s)
>>
>>
> >> So with the default iperf settings, copy is about 7.9Gb/s and map is about 2.5Gb/s? What are the netperf results without large packets?
>>
>>           PPI               2.90                  1.07
>>           SPI               37.75                 13.69
>>           PPN               2.90                  1.07
>>           SPN               37.75                 13.69
>>           tx_count           31808                174769
>>
>>
> >> It seems the interrupt count does not affect performance at all with -l 131072 -w 128k.
>>
>>           nr_napi_schedule   31805                174697
>>           total_packets      92354                187408
>>           total_reqs         1200793              2392614
>>
>> netperf  Tput:            5.8Gb/s             10.5Gb/s
>>           PPI               2.13                   1.00
>>           SPI               36.70                  16.73
>>           PPN               2.13                   1.31
>>           SPN               36.70                  16.75
>>           tx_count           57635                205599
>>           nr_napi_schedule   57633                205311
>>           total_packets      122800               270254
>>           total_reqs         2115068              3439751
>>
>>    PPI: packets processed per interrupt
>>    SPI: slots processed per interrupt
>>    PPN: packets processed per napi schedule
>>    SPN: slots processed per napi schedule
>>    tx_count: interrupt count
>>    total_reqs: total slots used during test
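
For reference, the per-interrupt and per-NAPI figures appear to be plain ratios of the raw counters in the table. The sketch below assumes PPI = total_packets / tx_count, SPI = total_reqs / tx_count, and the same with nr_napi_schedule for PPN / SPN. That formula is inferred from the numbers, not stated in the mail, and the names `runs` and `per_event` are made up for illustration.

```python
# Raw counters copied from the iperf rows of the table above.
runs = {
    "copy": {"tx_count": 31808, "nr_napi_schedule": 31805,
             "total_packets": 92354, "total_reqs": 1200793},
    "map":  {"tx_count": 174769, "nr_napi_schedule": 174697,
             "total_packets": 187408, "total_reqs": 2392614},
}


def per_event(counters):
    """Derive PPI/SPI/PPN/SPN as ratios of the raw counters (assumed formula)."""
    irqs = counters["tx_count"]           # interrupt count
    napi = counters["nr_napi_schedule"]   # NAPI schedule count
    return {
        "PPI": counters["total_packets"] / irqs,
        "SPI": counters["total_reqs"] / irqs,
        "PPN": counters["total_packets"] / napi,
        "SPN": counters["total_reqs"] / napi,
    }


if __name__ == "__main__":
    for mode, counters in runs.items():
        stats = per_event(counters)
        print(mode, ", ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Under that assumption the iperf rows reproduce to within rounding. Applying the same ratios to the netperf MAP counters (270254 / 205599) gives a PPI of roughly 1.31, which agrees with the PPN column rather than the 1.00 listed, so that entry may be worth double-checking.
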
>>
>> * Notification and batching
>>
> >> Are notification and batching really a problem? I'm not so sure now. My
> >> first thought, before I measured PPI / PPN / SPI / SPN in the copying
> >> case, was that "in that case netback *must* have better batching", which
> >> turned out not to be quite true -- copying mode makes netback slower, but
> >> the batching gained is not huge.
>>
> >> Ideally we still want to batch as much as possible. One possible way
> >> is to play with the 'weight' parameter in NAPI. But as the figures
> >> show, batching does not seem to be very important for throughput, at
> >> least for now. If the NAPI framework and netfront / netback are doing
> >> their jobs as designed, we might not need to worry about this now.
>>
> >> Andrew, do you have any thoughts on this? You found out that NAPI didn't
> >> scale well with multi-threaded iperf in DomU; do you have any idea how
> >> that can happen?
>>
>> * Thoughts on zero-copy TX
>>
> >> With this hack we are able to achieve 10Gb/s single stream, which is
> >> good. But with the classic XenoLinux kernel, which has zero-copy TX, we
> >> weren't able to achieve this. I also developed another zero-copy netback
> >> prototype one year ago with Ian's out-of-tree skb frag destructor patch
> >> series. That prototype couldn't achieve 10Gb/s either (IIRC the
> >> performance was more or less the same as copying mode, about 6~7Gb/s).
>>
> >> My hack maps all necessary pages permanently. There is no unmap, so we
> >> skip lots of page table manipulation and TLB flushes. So my basic
> >> conclusion is that page table manipulation and TLB flushes do incur
> >> a heavy performance penalty.
>>
> >> There is no way this hack can be upstreamed. If we're to re-introduce
> >> zero-copy TX, we would need to implement some sort of lazy flushing
> >> mechanism. I haven't thought this through. Presumably this mechanism
> >> would also benefit blk somehow? I'm not sure yet.
>>
>> Could persistent mapping (with the to-be-developed reclaim / MRU list
>> mechanism) be useful here? So that we can unify blk and net drivers?
>>
>> * Changes required to introduce zero-copy TX
>>
> >> 1. SKB frag destructor series: to track the life cycle of SKB frags.
> >> This is not yet upstreamed.
>>
>>
> >> Are you referring to this one: http://old-list-archives.xen.org/archives/html/xen-devel/2011-06/msg01711.html ?
>>
>>
> >> 2. Mechanism to negotiate the max slots the frontend can use: mapping
> >> requires backend's MAX_SKB_FRAGS >= frontend's MAX_SKB_FRAGS.
>>
>> 3. Lazy flushing mechanism or persistent grants: ???
>>
>>
> >> I did some tests with persistent grants before, and they did not show better performance than grant copy. But I was using the default
> >> netperf parameters and had not tried large packet sizes. Your results remind me that persistent grants might show similar
> >> gains with larger packet sizes too.
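
On item 2 above, the constraint could be checked roughly as in the sketch below. This is only an illustration of the inequality, not an existing netfront / netback interface: the function name `negotiate_max_slots` and the example values 17 and 18 are hypothetical.

```python
def negotiate_max_slots(frontend_max_frags, backend_max_frags):
    """Return (slot_limit, can_map) for a frontend / backend pair.

    slot_limit: the most slots per packet both ends can handle.
    can_map:    True when the backend's MAX_SKB_FRAGS is at least the
                frontend's, i.e. the mapping requirement from item 2 holds
                without the frontend lowering its limit.
    """
    if frontend_max_frags <= 0 or backend_max_frags <= 0:
        raise ValueError("fragment limits must be positive")
    slot_limit = min(frontend_max_frags, backend_max_frags)
    can_map = backend_max_frags >= frontend_max_frags
    return slot_limit, can_map


if __name__ == "__main__":
    # Hypothetical values, chosen only to show both outcomes.
    print(negotiate_max_slots(17, 18))  # backend can cover the frontend
    print(negotiate_max_slots(18, 17))  # frontend must cap itself at the backend's limit
```

If the second case occurs, the frontend would have to restrict itself to the negotiated limit before mapping could be used.
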
>>
>> Thanks
>> Annie
>>
>>
>>