From mboxrd@z Thu Jan 1 00:00:00 1970
From: xuehai zhang
Subject: Re: New MPI benchmark performance results (update)
Date: Tue, 03 May 2005 11:48:38 -0500
Message-ID: <4277AB66.1010002@cs.uchicago.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Pratt
Cc: Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Ian,

Thanks for the response.

>>In the graphs presented on the webpage, we take the results
>>of native Linux as the reference and normalize the other 3
>>scenarios to it. We observe a general pattern that usually
>>dom0 has a better performance than domU with SMP than domU
>>without SMP (here better performance means low latency and
>>high throughput). However, we also notice very big
>>performance gap between domU (w/o SMP) and native linux (or
>>dom0 because generally dom0 has a very similar performance as
>>native linux). Some distinct examples are: 8-node SendRecv
>>latency (max domU/linux score ~ 18), 8-node Allgather latency
>>(max domU/linux score ~ 17), and 8-node Alltoall latency (max
>>domU/linux > 60). The performance difference in the last
>>example is HUGE and we could not think about a reasonable
>>explanation why transferring 512B message size is so much
>>different than other sizes. We appreciate if you can provide
>>your insight to such a big performance problem in these benchmarks.
>
>
> I still don't quite understand your experimental setup. What version of
> Xen are you using? How many CPUs does each node have? How many domU's do
> you run on a single node?

The Xen version is 2.0. Each node has 2 CPUs.
"domU with SMP" in my previous email means that Xen is booted with SMP
support (without the "nosmp" option), with dom0 pinned to the 1st CPU
and domU pinned to the 2nd CPU. "domU with no SMP" means that Xen is
booted without SMP support (with the "nosmp" option), so dom0 and domU
share the same single CPU. Only 1 domU runs on each node in every
experiment.

> As regards the anomalous result for 512B AlltoAll performance, the best
> way to track this down would be to use xen-oprofile.

I am not very familiar with xen-oprofile. I have seen some discussion of
it on the mailing list. Are there any other documents I can refer to?
Thanks.

> Is it reliably repeatable?

Yes, the anomaly is repeatable. Each data point reported in the graph is
the average of 10 runs of the same experiment at different times.

> Really bad results are usually due to packets being dropped
> somewhere -- there hasn't been a whole lot of effort put into UDP
> performance because so few applications use it.

To clarify: are you suggesting that a benchmark like AlltoAll might use
UDP rather than TCP as its transport protocol?

Thanks again for the help.

Xuehai