* RE: New MPI benchmark performance results (update)
@ 2005-05-03 13:56 Ian Pratt
  2005-05-03 16:48 ` xuehai zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Pratt @ 2005-05-03 13:56 UTC (permalink / raw)
  To: xuehai zhang, Xen-devel

> 
> In the graphs presented on the webpage, we take the results 
> of native Linux as the reference and normalize the other 3 
> scenarios to it. We observe a general pattern: dom0 usually 
> performs better than domU with SMP, which in turn performs 
> better than domU without SMP (here better performance means 
> lower latency and higher throughput). However, we also notice 
> a very big performance gap between domU (w/o SMP) and native 
> Linux (or dom0, because dom0 generally performs very similarly 
> to native Linux). Some distinct examples are: 8-node SendRecv 
> latency (max domU/linux score ~ 18), 8-node Allgather latency 
> (max domU/linux score ~ 17), and 8-node Alltoall latency (max 
> domU/linux > 60). The performance difference in the last 
> example is HUGE and we could not think of a reasonable 
> explanation for why transferring 512B messages differs so 
> much from other sizes. We would appreciate any insight you can 
> provide into such a big performance problem in these benchmarks.

I still don't quite understand your experimental setup. What version of
Xen are you using? How many CPUs does each node have? How many domU's do
you run on a single node?

As regards the anomalous result for 512B AlltoAll performance, the best
way to track this down would be to use xen-oprofile. Is it reliably
repeatable? Really bad results are usually due to packets being dropped
somewhere -- there hasn't been a whole lot of effort put into UDP
performance because so few applications use it.

Ian


 

* RE: New MPI benchmark performance results (update)
@ 2005-05-03 19:09 Santos, Jose Renato G (Jose Renato Santos)
  0 siblings, 0 replies; 10+ messages in thread
From: Santos, Jose Renato G (Jose Renato Santos) @ 2005-05-03 19:09 UTC (permalink / raw)
  To: xuehai zhang, Ian Pratt; +Cc: Xen-devel

[-- Attachment #1: Type: text/plain, Size: 789 bytes --]



> I am not very familiar with xen-oprofile. I notice there are 
> some discussions about it in the mailing 
> list. I wonder if there are any other documents that I can 
> refer to. Thanks.
> 

  Please see http://xenoprof.sourceforge.net for a description
  of xenoprof and for downloading patches. (You will need 3 
  patches: one for Xen, one for Linux and one for OProfile.) 
  You need to be familiar with OProfile to use xenoprof. Please check
  http://oprofile.sourceforge.net/ for more info on OProfile.

  Xenoprof is currently available only for Xen 2.0.5.
  I am working on porting it to xen-unstable, but there
  is a problem with NMI handling which has not been solved yet.
   
  I have also attached a text file that gives an overview
  of xenoprof.

  Renato

[-- Attachment #2: xenoprof.txt --]
[-- Type: text/plain, Size: 9206 bytes --]


                XENOPROF - Performance profiling in Xen
               =========================================

                           User Guide
                          ============

 Version: 1.0
 Date: April 8, 2005
 Copyright (C) 2005 Hewlett-Packard Co. (http://xenoprof.sourceforge.net)

 (Aravind Menon, Jose Renato Santos, Yoshio Turner, G.(John) Janakiraman)
 

1. Overview
===========

This file provides an overview of Xenoprof, a system-wide statistical 
profiling toolkit implemented for the Xen virtual machine environment.
The Xenoprof toolkit supports system-wide coordinated profiling in a 
Xen environment to obtain the distribution of hardware events such as 
clock cycles, instruction execution, TLB and cache misses, etc. Xenoprof 
allows profiling of concurrently executing virtual machines (which 
includes the operating system and applications running in each virtual 
machine) and the Xen VMM itself. Xenoprof provides profiling data at the 
fine granularity of individual processes and routines executing in either 
the virtual machine or in the Xen VMM.

Xenoprof was developed at HP Labs by modifying and extending the 
original OProfile code for Linux (http://oprofile.sourceforge.net).  
We assume the reader is familiar with OProfile and its tools. If you
are not familiar with OProfile we suggest that you read the OProfile
user manual at http://oprofile.sourceforge.net/docs before using Xenoprof.

System-wide profiling in Xen requires the cooperation of three software
components at different levels of the software stack.

a) Xenoprof: 

   Extensions to the Xen hypervisor to support system-wide statistical
   profiling.  Xenoprof programs hardware performance counters to
   generate sampling interrupts at regular event count intervals, and
   handles the Non Maskable Interrupts (NMI) generated by the
   performance counters at overflow.  The NMI handler samples the
   program counter (PC) at the time of interrupt and stores the PC
   value in a per domain sample buffer.  Domains interact with
   Xenoprof using a specific hypercall.  This hypercall enables
   domains to define the hardware performance events to be sampled and
   their parameters (e.g., overflow interval), as well as to control
   the start and end of profiling.  Domains are notified of new PC
   samples in their respective sample buffers using the virtual
   interrupt mechanism provided by Xen (e.g., event notification).

b) OProfile kernel module:

   This module is responsible for interpreting the PC samples received
   from Xenoprof and mapping each PC sample to the appropriate routine
   at the user, kernel or hypervisor level.  The original OProfile
   kernel module for Linux was modified to use the Xenoprof interface
   instead of accessing the hardware counters directly.

   The OProfile module is organized in two main components: a low
   level driver, specific to a particular CPU model, and a generic
   module that is independent of the specific CPU model and implements
   the higher level profiling functions.  To enable OProfile to be
   used with Xenoprof, a new low level driver specific to Xen was
   created.  This driver accesses the hypervisor through the exposed
   Xenoprof interface, while the high level generic module was kept
   almost unmodified, except for minor changes necessary to interpret
   performance events associated with the hypervisor.

c) OProfile user level daemon and tools:

   The user level daemon is responsible for collecting the performance
   event samples from the kernel module and storing them in files for
   later processing and reporting.  The user level tools implement
   commands that enable the user to start and stop a profiling
   session, select the appropriate performance events and
   parameters, and generate reports.  These tools were slightly
   modified for use in a Xen environment.  In particular, new
   command line options were added to the opcontrol command, as
   described below.
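
As an illustration of the mapping step performed by component (b), the
sketch below resolves PC samples to routines with a sorted symbol table
and a binary search.  It is not the OProfile implementation: the
addresses, the table contents and the names resolve_pc and histogram
are assumptions made up for this example.

```python
from bisect import bisect_right

# Hypothetical symbol table: (start_address, routine_name), sorted by
# address.  Real tables would come from the kernel, application or
# hypervisor images; these entries are invented for illustration.
SYMBOLS = [
    (0x1000, "netif_rx"),
    (0x1400, "tcp_sendmsg"),
    (0x2000, "do_softirq"),
]
STARTS = [start for start, _ in SYMBOLS]


def resolve_pc(pc):
    """Return the routine whose start address is the largest one <= pc."""
    index = bisect_right(STARTS, pc) - 1
    if index < 0:
        return None  # pc lies below the first known symbol
    return SYMBOLS[index][1]


def histogram(pc_samples):
    """Count samples per routine, the basic content of a profile report."""
    counts = {}
    for pc in pc_samples:
        name = resolve_pc(pc)
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Because sampling is statistical, a routine's share of the samples
approximates its share of the profiled event (for example clock
cycles).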


2. Profiling multiple domains
=============================

A profiling session may profile one, a subset, or all domains running
in a particular physical machine.  In every profiling session one of
the domains takes the role of the initiator, which is responsible for
configuring, starting and stopping the session.  Other domains can be
included in the session as active participants or passive
participants.  Active participants are domains which have an active
OProfile kernel module that can map a PC sample to the appropriate
routine in user, kernel or hypervisor level, given that the CPU was
executing that domain when the PC was sampled.  Passive participants
do not need to be executing an OProfile kernel module.  For these
domains performance profiling is done at a coarser granularity, with PC
samples being assigned to the domain as a whole, instead of to specific
routines.  Passive domains are useful when profiling systems running
domains with operating systems that do not support the OProfile kernel
module or equivalent.  Note that the initiator must always be an
active domain.  The initiator will process the PC samples of all
passive domains.

A performance event (generated when one of the hardware performance
counters overflows) is delivered to the appropriate domain, depending
on the type of domain running at the time of the event.  If the
running domain is an active domain the PC sample is delivered to that
domain.  If the running domain is a passive domain, the PC sample is
delivered to the initiator.  If the running domain is not included in
the profiling session, the PC sample is discarded.
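
The delivery rule above can be modelled in a few lines.  This is only a
sketch of the rule as stated in this section, not code from Xenoprof;
the function name route_sample and the use of integer domain ids in
Python sets are assumptions.

```python
def route_sample(running_domain, initiator, active, passive):
    """Return the domain id that receives the PC sample, or None.

    running_domain: id of the domain executing when the counter overflowed
    initiator:      id of the initiator (always treated as active)
    active/passive: sets of domain ids taking part in the session
    """
    if running_domain == initiator or running_domain in active:
        return running_domain      # active domains get their own samples
    if running_domain in passive:
        return initiator           # initiator processes passive samples
    return None                    # domain not in the session: discard
```

For example, with initiator 0, active domains {2, 5} and passive
domain {7}, a sample taken while domain 7 runs is delivered to
domain 0, and a sample taken while domain 9 runs is dropped.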


3. Extensions to OProfile user level commands
=============================================

A few command line options were added to the OProfile command
"opcontrol" for use in Xen environments. The new command line options are:

a) --xen=<xen_image_file>
   This option is used to specify the xen image (e.g. xen-syms). This
   is used to resolve PC samples collected when executing the
   hypervisor.

b) --active-domains=<list> 
   (where <list> is a list of comma separated domain ids)

   This option is used in the initiator domain to specify the list of
   active domains to be profiled.  The initiator domain is always
   considered an active domain, so including its id in the list is
   optional.

   For example: --active-domains=2,5,6 indicates that domains 2, 5 and
   6 are active domains. Assuming that domain 0 is the initiator, the
   previous specification is equivalent to
   --active-domains=0,2,5,6.

c) --passive-domains=<list>
   This option is used to specify the list of passive domains.
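
The rule that the initiator is implicitly part of the active list can
be written down as a small helper.  This is a sketch for illustration
only; effective_active_domains is a hypothetical name and does not
exist in opcontrol.

```python
def effective_active_domains(initiator, active_option):
    """Parse an --active-domains value and add the initiator if missing.

    active_option is the comma separated list given on the command
    line, e.g. "2,5,6".  The result is the sorted list of active ids.
    """
    ids = {int(field) for field in active_option.split(",") if field.strip()}
    ids.add(initiator)  # the initiator is always an active domain
    return sorted(ids)
```

With domain 0 as the initiator, both "2,5,6" and "0,2,5,6" yield the
same effective list.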


Besides opcontrol, no other OProfile commands were modified for use in
Xen environments.

Full system profiling reports can be obtained by running the regular
opreport command in each active domain and concatenating the
individual reports.  New tools that combine multiple reports into a
single system-wide report could be implemented in the future.

4. Multi-domain profiling
=========================

In order to start and stop a profiling session across multiple domains,
a set of OProfile commands must be executed in those domains in
a coordinated way.  Typical sequences of commands for starting and
stopping profiling are listed below.

 A) Sequence of commands to start profiling:
     1) On the initiator domain
         > opcontrol --reset
           (clear out any previous data of current session)
         > opcontrol --start-daemon
                     [--active-domains=<active_list>]
                     [--passive-domains=<passive_list>] ...
           (start OProfile daemon and specify the set of active and
           passive domains in the session)

     2) On each active domain
         > opcontrol --reset
         > opcontrol --start
         (indicates domain is ready to process performance events)
     3) On initiator
         > opcontrol --start
         (Multi-domain profiling session starts)
         (This is only successful if all active domains are ready)

 B) Sequence of commands to stop profiling
     1) On each active domain
         > opcontrol --stop
     2) On initiator domain
         > opcontrol --stop
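
The ordering in A) and B) can be captured as a plan of (domain,
command) pairs, which is convenient when the commands are dispatched
from a script.  The sketch below only builds the plan and runs
nothing.  The function names are hypothetical, and it assumes that
"each active domain" in step 2 means the active domains other than the
initiator.  Further opcontrol options (the "..." above) are left out.

```python
def start_plan(initiator, active, passive):
    """List (domain, command) pairs in the order given in A)."""
    others = [d for d in active if d != initiator]
    daemon = "opcontrol --start-daemon"
    if others:
        daemon += " --active-domains=" + ",".join(map(str, others))
    if passive:
        daemon += " --passive-domains=" + ",".join(map(str, passive))
    # Step 1: the initiator resets and starts the daemon.
    plan = [(initiator, "opcontrol --reset"), (initiator, daemon)]
    # Step 2: every other active domain signals that it is ready.
    for domain in others:
        plan.append((domain, "opcontrol --reset"))
        plan.append((domain, "opcontrol --start"))
    # Step 3: the initiator starts the session last.
    plan.append((initiator, "opcontrol --start"))
    return plan


def stop_plan(initiator, active):
    """List (domain, command) pairs in the order given in B)."""
    others = [d for d in active if d != initiator]
    plan = [(d, "opcontrol --stop") for d in others]
    plan.append((initiator, "opcontrol --stop"))
    return plan
```

Keeping the initiator's --start as the final entry reflects the note
that the session only starts successfully once all active domains are
ready.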

5. Currently supported configurations
=====================================

  a) Xen versions:           Xen 2.0.3 to 2.0.5
  b) Processor architecture: x86
  c) Processor models:       Pentium 4, Pentium III
  d) Active Domains:         Uniprocessor Linux 2.6 (no SMP, no Linux 2.4)
  e) Passive Domains:        No restriction

6. Patch files
==============

  In order to run OProfile in Xen environments, three patches are needed:
  a) xenoprof-1.0-xen-2.0.5.patch
     Patch for Xen hypervisor.
  b) xenoprof-1.0-linux-2.6.10.patch
     Patch for Linux 2.6.10 (Apply to linux-sparse tree in Xen source tree)
  c) xenoprof-1.0-oprofile-0.8.1.patch
     Patch for OProfile 0.8.1

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* New MPI benchmark performance results (update)
@ 2005-05-03  9:11 xuehai zhang
  2005-05-03  9:28 ` Steven Hand
  2005-05-03 20:24 ` Nivedita Singhvi
  0 siblings, 2 replies; 10+ messages in thread
From: xuehai zhang @ 2005-05-03  9:11 UTC (permalink / raw)
  To: Xen-devel

Hi all,

In a post I sent in early April 
(http://lists.xensource.com/archives/html/xen-devel/2005-04/msg00091.html), I reported a 
performance gap when running the PMB SendRecv benchmark on native Linux and domU. I have now prepared 
a webpage comparing the performance of 8 PMB benchmarks under 4 scenarios (native Linux, dom0, domU with 
SMP, and domU without SMP) at http://people.cs.uchicago.edu/~hai/vm1/vcluster/PMB/.

In the graphs presented on the webpage, we take the results of native Linux as the reference and 
normalize the other 3 scenarios to it. We observe a general pattern: dom0 usually performs better 
than domU with SMP, which in turn performs better than domU without SMP (here better performance 
means lower latency and higher throughput). However, we also notice a very big performance gap 
between domU (w/o SMP) and native Linux (or dom0, because dom0 generally performs very similarly to 
native Linux). Some distinct examples are: 8-node SendRecv latency (max domU/linux score ~ 18), 
8-node Allgather latency (max domU/linux score ~ 17), and 8-node Alltoall latency (max domU/linux 
> 60). The performance difference in the last example is HUGE and we could not think of a reasonable 
explanation for why transferring 512B messages differs so much from other sizes. We would appreciate 
any insight you can provide into such a big performance problem in these benchmarks.

BTW, all the benchmarking is based on the original Xen code. That is, we didn't modify 
net_rx_action in netback to kick the frontend after every packet, as suggested by Ian in the following 
post (http://lists.xensource.com/archives/html/xen-devel/2005-04/msg00180.html).

Please let me know if you have any questions about the configuration of the benchmarking 
experiments. I am looking forward to your insightful explanations.

Thanks.

Xuehai
