From: William Cohen
Subject: Re: More network tests with xenoprofile this time
Date: Tue, 07 Jun 2005 17:47:22 -0400
Message-ID: <42A615EA.5000100@redhat.com>
In-Reply-To: <6C21311CEE34E049B74CC0EF339464B924B3B7@cacexc12.americas.cpqcorp.net>
To: "Santos, Jose Renato G"
Cc: Ian Pratt, xen-devel@lists.xensource.com, "Turner, Yoshio", Andrew Theurer, Aravind Menon, G John Janakiraman
List-Id: xen-devel@lists.xenproject.org

Santos, Jose Renato G wrote:
> Andrew,
>
> You may want to take a look at the following paper
> which is being presented at VEE'05 (June 11 and 12, 2005).
>
> http://www.hpl.hp.com/research/dca/system/papers/xenoprof-vee05.pdf
>
> It presents network performance results using xenoprof.
> This was done for xen 2.0.3. The profile you reported
> has some similarities with our results although the
> exact numbers are different. But that is expected, since
> you are running a different version of Xen on a different
> hardware.
> We have seen that a significant amount of time was spent
> on handling interrupts in Xen, as well.
> We have also seen that a significant amount of time is
> spent on the hypervisor (+/- 40%) for the dom1 <-> external
> case, measured both at dom1 and at dom0.
> (in our case we instrumented the receive side)
> When we run the benchmark on dom0 the time spent on Xen
> is reduced to (+/-20%).
> Most of this extra Xen overhead when running a guest
> seems to come from the page transfer between
> domain 0 and the guest (see table 6 and discussion
> on paper).
>
> The paper omits the complete oprofile reports
> for brevity.
> I will be happy to send you any
> detailed oprofile report we have generated for the
> paper, if you want to compare it with your results.
> Just let me know ...
>
> Renato

Hi Renato,

The article was an interesting application of xenoprof. It seems like
it would be useful to also have data collected using cycle counts
(GLOBAL_POWER_EVENTS on P4) to give some indication of areas with
high-overhead operations. There may be some areas with a few very
expensive instructions. Calling attention to those areas would help
improve performance.

The increases in I-TLB and D-TLB events for Xen-domain0 shown in
Figure 4 are surprising. Why would the working sets be that much larger
for Xen-domain0 than for regular Linux, particularly for code? Is there
a table similar to Table 3 for I-TLB event sample locations? Can't the
VMM use a 4-MB page? And the Xen-domain0 kernel shouldn't be that much
larger than a regular Linux kernel.

How were TLB flushes ruled out as a cause? Could the PERFCOUNTER_CPU
counters in perfc_defn.h be used to see if the VMM is doing a lot of
TLB flushes?

Also, how much of the I-TLB and D-TLB events are due to the P4
architecture? Are the results as dramatic on Athlon or AMD64
processors?

-Will