From mboxrd@z Thu Jan  1 00:00:00 1970
From: xuehai zhang <hai@cs.uchicago.edu>
Subject: Re: New MPI benchmark performance results (update)
Date: Tue, 03 May 2005 11:48:38 -0500
Message-ID: <4277AB66.1010002@cs.uchicago.edu>
References: <A95E2296287EAD4EB592B5DEEFCE0E9D1E3E66@liverpoolst.ad.cl.cam.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <A95E2296287EAD4EB592B5DEEFCE0E9D1E3E66@liverpoolst.ad.cl.cam.ac.uk>
List-Unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Cc: Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Ian,

Thanks for the response.

>>In the graphs presented on the webpage, we take the results 
>>of native Linux as the reference and normalize the other 3 
>>scenarios to it. We observe a general pattern that usually 
>>dom0 has a better performance than domU with SMP than domU 
>>without SMP (here better performance means low latency and 
>>high throughput). However, we also notice very big 
>>performance gap between domU (w/o SMP) and native linux (or 
>>dom0 because generally dom0 has a very similar performance as 
>>native linux). Some distinct examples are: 8-node SendRecv 
>>latency (max domU/linux score ~ 18), 8-node Allgather latency 
>>(max domU/linux score ~ 17), and 8-node Alltoall latency (max 
>>domU/linux > 60). The performance difference in the last 
>>example is HUGE and we could not think about a reasonable 
>>explanation why transferring 512B message size is so much 
>>different than other sizes. We appreciate if you can provide 
>>your insight to such a big performance problem in these benchmarks.
> 
> 
> I still don't quite understand your experimental setup. What version of
> Xen are you using? How many CPUs does each node have? How many domU's do
> you run on a single node?

We are using Xen 2.0, and each node has 2 CPUs. By "domU with SMP" in my previous email I mean that 
Xen is booted with SMP support (no "nosmp" option), with dom0 pinned to the 1st CPU and domU pinned 
to the 2nd CPU. By "domU without SMP" I mean that Xen is booted with the "nosmp" option, so dom0 and 
domU share the same single CPU. Only 1 domU runs on each node in every experiment.

> As regards the anomalous result for 512B AlltoAll performance, the best
> way to track this down would be to use xen-oprofile. 

I am not very familiar with xen-oprofile. I have noticed some discussions about it on the mailing 
list. Are there any other documents I can refer to? Thanks.

> Is it reliably repeatable? 

Yes, the anomaly is reliably repeatable. Each data point in the graph is the average of 10 
separate runs of the same experiment at different times.
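
In case it helps to see how each plotted point is derived, here is a minimal sketch: average the 
repeated runs, then divide by the native Linux average. The function names and the sample numbers 
are made up for illustration; they are not our actual scripts or measured data.

```python
from statistics import mean


def average_runs(samples):
    """Average the measurements from repeated runs of one experiment."""
    if not samples:
        raise ValueError("need at least one run")
    return mean(samples)


def normalized_score(scenario_runs, native_runs):
    """Ratio of a scenario's mean to the native Linux mean (1.0 means parity)."""
    return average_runs(scenario_runs) / average_runs(native_runs)


# Hypothetical latencies in microseconds, NOT measured data.
native_linux = [100.0] * 10
domu_nosmp = [1800.0] * 10

score = normalized_score(domu_nosmp, native_linux)
print(score)
```

For latency, a score above 1.0 means the scenario is slower than native Linux; for throughput, a 
score above 1.0 means it is faster.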

> Really bad results are usually due to packets being dropped
> somewhere -- there hasn't been a whole lot of effort put into UDP
> performance because so few applications use it.

To clarify: do you mean that a benchmark like AlltoAll might use UDP rather than TCP as its 
transport protocol?

Thanks again for the help.

Xuehai