MPI benchmark performance gap between native linux and domU

* MPI benchmark performance gap between native linux and domU
@ 2005-04-04 22:43 xuehai zhang
  0 siblings, 0 replies; 4+ messages in thread
From: xuehai zhang @ 2005-04-04 22:43 UTC (permalink / raw)
  To: xen-devel

Hi all,

I ran the following experiments to compare MPI application performance on native
Linux machines and inside unprivileged Xen user domains. I used 8 machines with
identical hardware (498.756 MHz dual CPU, 512MB memory, on a 10MB/sec LAN) and the
Pallas MPI Benchmarks (PMB).

Experiment 1: I boot all 8 nodes with native linux (nosmp, kernel 2.4.29) and use all
of them for PMB tests.

Experiment 2: I boot all 8 nodes with Xen running and start a single user domain
(port 2.6.10, using file-backed VBD) on each node with 360MB memory. Then I run the
same PMB tests among these 8 user domains.

The experiment results show that running the same MPI benchmark in user domains
usually gives worse (sometimes much worse) performance than on native Linux
machines. Below are the PMB SendRecv results for both experiments (Table 1 and
Table 2 report throughput and latency respectively). As you may notice, SendRecv
achieves up to 14.9 MB/sec throughput on native Linux machines but at most
7.07 MB/sec inside user domains. The latency results also show a big gap.

Clearly, there is a difference between the memory available to the native Linux
machine in Experiment 1 (512MB) and to the user domain in Experiment 2 (360MB; it
cannot go higher because dom0 is started with 128MB memory). However, I don't think
this is the main cause of the performance gap, because the tested message sizes are
much smaller than both memory sizes.

I would appreciate your help if you have had a similar experience and are willing to
share your insights.

BTW, if you are not familiar with the PMB SendRecv benchmark, you can find a detailed
explanation at http://people.cs.uchicago.edu/~hai/PMB-MPI1.pdf (see section 4.3.1).

Thanks in advance for your help.

Xuehai

P.S. Table 1: SendRecv throughput (MB/sec) performance

Message_Size(bytes) Experiment_1  Experiment_2
0                   0             0
1                   0             0
2                   0             0
4                   0             0
8                   0.04          0.01
16                  0.16          0.01
32                  0.34          0.02
64                  0.65          0.04
128                 1.17          0.09
256                 2.15          0.59
512                 3.4           1.23
1K                  5.29          2.57
2K                  7.68          3.5
4K                  10.7          4.96
8K                  13.35         7.07
16K                 14.9          3.77
32K                 9.85          3.68
64K                 5.06          3.02
128K                7.91          4.94
256K                7.85          5.25
512K                7.93          6.11
1M                  7.85          6.5
2M                  8.18          5.44
4M                  7.55          4.93
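
To make the throughput gap easier to inspect, here is a minimal Python sketch that
finds the peak in each column of Table 1 and the fraction of native throughput that
domU reaches at each message size. The numbers are copied from the table; the names
TABLE1, peak and domu_fraction are made up for this illustration, and rows where the
native throughput is reported as 0 are left out to avoid dividing by zero.

```python
# Rows copied from Table 1: (message size, Experiment_1 MB/sec, Experiment_2 MB/sec).
# Sizes 0..4 bytes are omitted because both columns are reported as 0.
TABLE1 = [
    ("8", 0.04, 0.01), ("16", 0.16, 0.01), ("32", 0.34, 0.02),
    ("64", 0.65, 0.04), ("128", 1.17, 0.09), ("256", 2.15, 0.59),
    ("512", 3.4, 1.23), ("1K", 5.29, 2.57), ("2K", 7.68, 3.5),
    ("4K", 10.7, 4.96), ("8K", 13.35, 7.07), ("16K", 14.9, 3.77),
    ("32K", 9.85, 3.68), ("64K", 5.06, 3.02), ("128K", 7.91, 4.94),
    ("256K", 7.85, 5.25), ("512K", 7.93, 6.11), ("1M", 7.85, 6.5),
    ("2M", 8.18, 5.44), ("4M", 7.55, 4.93),
]


def peak(rows, column):
    """Return (message size, throughput) of the best row; column is 1 or 2."""
    best = max(rows, key=lambda row: row[column])
    return best[0], best[column]


def domu_fraction(rows):
    """Map each message size to domU throughput divided by native throughput."""
    return {size: domu / native for size, native, domu in rows}


if __name__ == "__main__":
    print("native peak:", peak(TABLE1, 1))
    print("domU peak:  ", peak(TABLE1, 2))
    for size, fraction in domu_fraction(TABLE1).items():
        print(f"{size:>5}  domU reaches {fraction:6.1%} of native")
```

The two peaks fall at different message sizes (16K native, 8K in domU), so a per-size
fraction is a fairer comparison than the two maxima alone.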

Table 2: SendRecv latency (millisec) performance

Message_Size(bytes) Experiment_1  Experiment_2
0                   1979.6        3010.96
1                   1724.16       3218.88
2                   1669.65       3185.3
4                   1637.26       3055.67
8                   406.77        2966.17
16                  185.76        2777.89
32                  181.06        2791.06
64                  189.12        2940.82
128                 210.51        2716.3
256                 227.36        843.94
512                 287.28        796.71
1K                  368.72        758.19
2K                  508.65        1144.24
4K                  730.59        1612.66
8K                  1170.22       2471.65
16K                 2096.86       8300.18
32K                 6340.45       17017.99
64K                 24640.78      41264.5
128K                31709.09      50608.97
256K                63680.67      94918.13
512K                125531.7      162168.47
1M                  251566.94     321451.02
2M                  477431.32     707981
4M                  997768.35     1503987.61
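
The latency gap can be summarized the same way. The Python sketch below computes the
domU/native latency ratio per message size from Table 2, and also cross-checks the
two tables against each other. The cross-check is an assumption on my part, not
something stated in the post: it supposes the PMB convention for Sendrecv throughput,
2 * size / 1.048576 / time, with time in microseconds. Under that assumption the
Experiment_1 latencies reproduce the Experiment_1 throughputs in Table 1, which
suggests the latency column may be in microseconds. The names TABLE2, slowdown and
implied_throughput are made up for this illustration.

```python
# Rows copied from Table 2: (message size in bytes, Experiment_1, Experiment_2).
K, M = 1024, 1024 * 1024
TABLE2 = [
    (0, 1979.6, 3010.96), (1, 1724.16, 3218.88), (2, 1669.65, 3185.3),
    (4, 1637.26, 3055.67), (8, 406.77, 2966.17), (16, 185.76, 2777.89),
    (32, 181.06, 2791.06), (64, 189.12, 2940.82), (128, 210.51, 2716.3),
    (256, 227.36, 843.94), (512, 287.28, 796.71), (1 * K, 368.72, 758.19),
    (2 * K, 508.65, 1144.24), (4 * K, 730.59, 1612.66),
    (8 * K, 1170.22, 2471.65), (16 * K, 2096.86, 8300.18),
    (32 * K, 6340.45, 17017.99), (64 * K, 24640.78, 41264.5),
    (128 * K, 31709.09, 50608.97), (256 * K, 63680.67, 94918.13),
    (512 * K, 125531.7, 162168.47), (1 * M, 251566.94, 321451.02),
    (2 * M, 477431.32, 707981), (4 * M, 997768.35, 1503987.61),
]


def slowdown(rows):
    """Map each message size to domU latency divided by native latency."""
    return {size: domu / native for size, native, domu in rows}


def implied_throughput(size_bytes, time_usec):
    """MB/sec implied by one Sendrecv sample (assumed PMB formula, time in usec)."""
    return 2 * size_bytes / 1.048576 / time_usec


if __name__ == "__main__":
    ratios = slowdown(TABLE2)
    worst = max(ratios, key=ratios.get)
    print(f"largest slowdown: {ratios[worst]:.2f}x at {worst} bytes")
    for size, native, _ in TABLE2:
        if size:
            print(f"{size:>8} bytes -> {implied_throughput(size, native):.2f} MB/sec")
```

The Experiment_2 columns do not line up as closely under the same formula, so the
two domU tables may have been taken from different timing columns or runs.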


* Re: MPI benchmark performance gap between native linux and domU
  2005-04-04 23:18 MPI benchmark performance gap between native linux and domU xuehai zhang
@ 2005-04-04 23:37 ` Nivedita Singhvi
  2005-04-05  4:49   ` xuehai zhang
  0 siblings, 1 reply; 4+ messages in thread
From: Nivedita Singhvi @ 2005-04-04 23:37 UTC (permalink / raw)
  To: xuehai zhang; +Cc: Xen-devel

xuehai zhang wrote:


> Experiment 1: I boot all 8 nodes with native linux (nosmp, kernel 
> 2.4.29) and use all

> Experiment 2: I boot all 8 nodes with Xen running and start a single 
> user domain
> (port 2.6.10,using file-backed VBD) on each node with 360MB memory. Then 


What do you get when you compare 2.4.29 native Linux
against 2.6.10 native Linux, without Xen involved at
all?

thanks,
Nivedita


* Re: MPI benchmark performance gap between native linux and domU
  2005-04-04 23:37 ` Nivedita Singhvi
@ 2005-04-05  4:49   ` xuehai zhang
  0 siblings, 0 replies; 4+ messages in thread
From: xuehai zhang @ 2005-04-05  4:49 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Xen-devel

Nivedita Singhvi wrote:
> xuehai zhang wrote:
> 
> 
>> Experiment 1: I boot all 8 nodes with native linux (nosmp, kernel 
>> 2.4.29) and use all
> 
> 
>> Experiment 2: I boot all 8 nodes with Xen running and start a single 
>> user domain
>> (port 2.6.10,using file-backed VBD) on each node with 360MB memory. Then 
> 
> 
> 
> What do you get when you compare 2.4.29 native Linux
> against 2.6.10 native Linux, without Xen involved at
> all?

2.6.10 is not for native Linux but for domU (Xen is running on the machine).
Xuehai
