xen-devel.lists.xenproject.org archive mirror
* memory performance 20% degradation in DomU -- Sisu
@ 2014-03-04 22:49 Sisu Xi
  2014-03-04 23:00 ` Sisu Xi
  0 siblings, 1 reply; 15+ messages in thread
From: Sisu Xi @ 2014-03-04 22:49 UTC (permalink / raw)
  To: xen-devel; +Cc: Meng Xu


[-- Attachment #1.1: Type: text/plain, Size: 1809 bytes --]

Hi, all:

I am trying to study the cache/memory performance under Xen, and have
encountered some problems.

My machine has an Intel Core i7 X980 processor with 6 physical cores. I
disabled hyper-threading and frequency scaling, so it should be running at a
constant speed.
Dom0 was booted with 1 VCPU pinned to 1 core, with 2 GB of memory.

After that, I booted up DomU with 1 VCPU pinned to a separate core, with 1 GB
of memory. The credit scheduler is used, and no cap is set for either domain,
so DomU should be able to access all resources.
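(For reference, this kind of setup is typically achieved with the
dom0_max_vcpus=1 dom0_vcpus_pin Xen boot options plus a cpus= line in the
guest config, or at runtime with xl; the guest name below is illustrative.)

    xl vcpu-pin Domain-0 0 0    # keep Dom0's vCPU 0 on physical core 0
    xl vcpu-pin domu1    0 1    # keep the guest's vCPU 0 on physical core 1
    xl vcpu-list                # verify that the affinities actually took effect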

Each physical core has a dedicated 32KB L1 cache and a dedicated 256KB L2
cache, and all cores share a 12MB L3 cache.

I created a simple program that allocates an array of a specified size, loads
it once, and then randomly accesses every cache line once (one cache line is
64B on my machine).
rdtsc is used to record the duration of the random accesses.
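(As a side note, rdtsc itself is not a serializing instruction, so timed
sections are often bracketed with lfence to keep out-of-order execution from
blurring the start/end points. A minimal sketch of such a timed pointer chase,
with illustrative names and assuming GCC/Clang on x86, is:)

    #include <cstddef>
    #include <cstdint>
    #include <x86intrin.h>   // __rdtsc() and _mm_lfence()

    // Serialized read of the time-stamp counter.
    static inline std::uint64_t fenced_rdtsc() {
        _mm_lfence();
        std::uint64_t t = __rdtsc();
        _mm_lfence();
        return t;
    }

    // 'chain' is assumed to hold a random permutation of indices, one entry
    // per cache line, as described above.
    std::uint64_t time_chase(const std::size_t *chain, std::size_t lines) {
        static volatile std::size_t sink;     // keeps the loop from being optimized away
        std::size_t idx = 0;
        std::uint64_t start = fenced_rdtsc();
        for (std::size_t i = 0; i < lines; ++i)
            idx = chain[idx];                 // dependent loads, one per cache line
        std::uint64_t end = fenced_rdtsc();
        sink = idx;
        return end - start;                   // elapsed cycles for 'lines' accesses
    }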

I tried different data sizes, with 1000 repeats for each data size.
Attached is a boxplot of the average access time per cache line.

The x axis is the data size, the y axis is CPU cycles. The three vertical
lines at 32KB, 256KB, and 12MB mark the sizes of the L1, L2, and L3 caches on
my machine.
*The black boxes are the results I got when running non-virtualized, while
the blue boxes are the results I got in DomU.*

For some reason, the results in DomU vary much more than the results in the
non-virtualized environment.
I also repeated the same experiments in DomU at run level 1; the results are
the same.

Can anyone give some suggestions about what might be the reason for this?

Thanks very much!

Sisu

-- 
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

[-- Attachment #1.2: Type: text/html, Size: 2281 bytes --]

[-- Attachment #2: cache_latency_size_boxplot.jpg --]
[-- Type: image/jpeg, Size: 205023 bytes --]

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-04 22:49 memory performance 20% degradation in DomU -- Sisu Sisu Xi
@ 2014-03-04 23:00 ` Sisu Xi
  2014-03-05 17:33   ` Konrad Rzeszutek Wilk
  2014-03-11 12:03   ` George Dunlap
  0 siblings, 2 replies; 15+ messages in thread
From: Sisu Xi @ 2014-03-04 23:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Meng Xu


[-- Attachment #1.1: Type: text/plain, Size: 2988 bytes --]

Hi, all:

I also used ramspeed to measure memory throughput.
http://alasir.com/software/ramspeed/

I am using v2.6, the single-core version. The commands I used are ./ramspeed
-b 3 (for int) and ./ramspeed -b 6 (for float).
The benchmark measures four operations: add, copy, scale, and triad, and also
gives an average number across all four operations.
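(For readers who haven't used it: those four operations are conventionally the
STREAM-style kernels sketched below. This is only an illustration of what each
one computes, not ramspeed's actual source; a, b, c and s are placeholder
names.)

    #include <cstddef>
    #include <vector>

    // Sketch of the four memory-bandwidth kernels (copy, scale, add, triad).
    void kernels(std::vector<double> &a, const std::vector<double> &b,
                 const std::vector<double> &c, double s) {
        const std::size_t n = a.size();
        for (std::size_t i = 0; i < n; ++i) a[i] = b[i];            // copy
        for (std::size_t i = 0; i < n; ++i) a[i] = s * b[i];        // scale
        for (std::size_t i = 0; i < n; ++i) a[i] = b[i] + c[i];     // add
        for (std::size_t i = 0; i < n; ++i) a[i] = b[i] + s * c[i]; // triad
    }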

The results in DomU show around 20% performance degradation compared to the
non-virtualized results.

Attached are the results. The left part shows the results for int, while the
right part shows the results for float. The y axis is the measured throughput.
Each box contains 100 experiment repeats.
The black boxes are the results in the non-virtualized environment, while the
blue ones are the results I got in DomU.

The Xen version I am using is 4.3.0, 64-bit.

Thanks very much!

Sisu



On Tue, Mar 4, 2014 at 4:49 PM, Sisu Xi <xisisu@gmail.com> wrote:

> Hi, all:
>
> I am trying to study the cache/memory performance under Xen, and has
> encountered some problems.
>
> My machine is has an Intel Core i7 X980 processor with 6 physical cores. I
> disabled hyper-threading, frequency scaling, so it should be running at
> constant speed.
> Dom0 was boot with 1 VCPU pinned to 1 core, with 2 GB of memory.
>
> After that, I boot up DomU with 1 VCPU pinned to a separate core, with 1
> GB of memory. The credit scheduler is used, and no cap is set for them. So
> DomU should be able to access all resources.
>
> Each physical core has a 32KB dedicated L1 cache, 256KB dedicated L2
> cache. And all cores share a 12MB L3 cache.
>
> I created a simple program to create an array of specified size. Load them
> once, and then randomly access every cache line once. (1 cache line is 64B
> on my machine).
> rdtsc is used to record the duration for the random access.
>
> I tried different data sizes, with 1000 repeat for each data sizes.
> Attached is the boxplot for average access time for one cache line.
>
> The x axis is the different Data Size, the y axis is the CPU cycle. The
> three vertical lines at 32KB, 256KB, and 12MB represents the size
> difference in L1, L2, and L3 cache on my machine.
> *The black box are the results I got when I run it in non-virtualized,
> while the blue box are the results I got in DomU.*
>
> For some reason, the results in DomU varies much more than the results in
> non-virtualized environment.
> I also repeated the same experiments in DomU with Run Level 1, the results
> are the same.
>
> Can anyone give some suggestions about what might be the reason for this?
>
> Thanks very much!
>
> Sisu
>
> --
> Sisu Xi, PhD Candidate
>
> http://www.cse.wustl.edu/~xis/
> Department of Computer Science and Engineering
> Campus Box 1045
> Washington University in St. Louis
> One Brookings Drive
> St. Louis, MO 63130
>



-- 
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

[-- Attachment #1.2: Type: text/html, Size: 4144 bytes --]

[-- Attachment #2: cache_ramspeed.jpg --]
[-- Type: image/jpeg, Size: 151519 bytes --]

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-04 23:00 ` Sisu Xi
@ 2014-03-05 17:33   ` Konrad Rzeszutek Wilk
  2014-03-05 20:09     ` Sisu Xi
  2014-03-11 12:03   ` George Dunlap
  1 sibling, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-05 17:33 UTC (permalink / raw)
  To: Sisu Xi; +Cc: Meng Xu, xen-devel

On Tue, Mar 04, 2014 at 05:00:46PM -0600, Sisu Xi wrote:
> Hi, all:
> 
> I also used the ramspeed to measure memory throughput.
> http://alasir.com/software/ramspeed/
> 
> I am using the v2.6, single core version. The command I used is ./ramspeed
> -b 3 (for int) and ./ramspeed -b 6 (for float).
> The benchmark measures four operations: add, copy, scale, and triad. And
> also gives an average number for all four operations.
> 
> The results in DomU shows around 20% performance degradation compared to
> non-virt results.

What kind of domU? PV or HVM?
> 
> Attached is the results. The left part are results for int, while the right
> part is the results for float. The Y axis is the measured throughput. Each
> box contains 100 experiment repeats.
> The black boxes are the results in non-virtualized environment, while the
> blue ones are the results I got in DomU.
> 
> The Xen version I am using is 4.3.0, 64bit.
> 
> Thanks very much!
> 
> Sisu
> 
> 
> 
> On Tue, Mar 4, 2014 at 4:49 PM, Sisu Xi <xisisu@gmail.com> wrote:
> 
> > Hi, all:
> >
> > I am trying to study the cache/memory performance under Xen, and has
> > encountered some problems.
> >
> > My machine is has an Intel Core i7 X980 processor with 6 physical cores. I
> > disabled hyper-threading, frequency scaling, so it should be running at
> > constant speed.
> > Dom0 was boot with 1 VCPU pinned to 1 core, with 2 GB of memory.
> >
> > After that, I boot up DomU with 1 VCPU pinned to a separate core, with 1
> > GB of memory. The credit scheduler is used, and no cap is set for them. So
> > DomU should be able to access all resources.
> >
> > Each physical core has a 32KB dedicated L1 cache, 256KB dedicated L2
> > cache. And all cores share a 12MB L3 cache.
> >
> > I created a simple program to create an array of specified size. Load them
> > once, and then randomly access every cache line once. (1 cache line is 64B
> > on my machine).
> > rdtsc is used to record the duration for the random access.
> >
> > I tried different data sizes, with 1000 repeat for each data sizes.
> > Attached is the boxplot for average access time for one cache line.
> >
> > The x axis is the different Data Size, the y axis is the CPU cycle. The
> > three vertical lines at 32KB, 256KB, and 12MB represents the size
> > difference in L1, L2, and L3 cache on my machine.
> > *The black box are the results I got when I run it in non-virtualized,
> > while the blue box are the results I got in DomU.*
> >
> > For some reason, the results in DomU varies much more than the results in
> > non-virtualized environment.
> > I also repeated the same experiments in DomU with Run Level 1, the results
> > are the same.
> >
> > Can anyone give some suggestions about what might be the reason for this?
> >
> > Thanks very much!
> >
> > Sisu
> >
> > --
> > Sisu Xi, PhD Candidate
> >
> > http://www.cse.wustl.edu/~xis/
> > Department of Computer Science and Engineering
> > Campus Box 1045
> > Washington University in St. Louis
> > One Brookings Drive
> > St. Louis, MO 63130
> >
> 
> 
> 
> -- 
> Sisu Xi, PhD Candidate
> 
> http://www.cse.wustl.edu/~xis/
> Department of Computer Science and Engineering
> Campus Box 1045
> Washington University in St. Louis
> One Brookings Drive
> St. Louis, MO 63130


> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-05 17:33   ` Konrad Rzeszutek Wilk
@ 2014-03-05 20:09     ` Sisu Xi
  2014-03-05 21:29       ` Gordan Bobic
  2014-03-05 22:09       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 15+ messages in thread
From: Sisu Xi @ 2014-03-05 20:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3946 bytes --]

Hi, Konrad:

It is a PV domU.

Thanks.

Sisu


On Wed, Mar 5, 2014 at 11:33 AM, Konrad Rzeszutek Wilk <
konrad.wilk@oracle.com> wrote:

> On Tue, Mar 04, 2014 at 05:00:46PM -0600, Sisu Xi wrote:
> > Hi, all:
> >
> > I also used the ramspeed to measure memory throughput.
> > http://alasir.com/software/ramspeed/
> >
> > I am using the v2.6, single core version. The command I used is
> ./ramspeed
> > -b 3 (for int) and ./ramspeed -b 6 (for float).
> > The benchmark measures four operations: add, copy, scale, and triad. And
> > also gives an average number for all four operations.
> >
> > The results in DomU shows around 20% performance degradation compared to
> > non-virt results.
>
> What kind of domU? PV or HVM?
> >
> > Attached is the results. The left part are results for int, while the
> right
> > part is the results for float. The Y axis is the measured throughput.
> Each
> > box contains 100 experiment repeats.
> > The black boxes are the results in non-virtualized environment, while the
> > blue ones are the results I got in DomU.
> >
> > The Xen version I am using is 4.3.0, 64bit.
> >
> > Thanks very much!
> >
> > Sisu
> >
> >
> >
> > On Tue, Mar 4, 2014 at 4:49 PM, Sisu Xi <xisisu@gmail.com> wrote:
> >
> > > Hi, all:
> > >
> > > I am trying to study the cache/memory performance under Xen, and has
> > > encountered some problems.
> > >
> > > My machine is has an Intel Core i7 X980 processor with 6 physical
> cores. I
> > > disabled hyper-threading, frequency scaling, so it should be running at
> > > constant speed.
> > > Dom0 was boot with 1 VCPU pinned to 1 core, with 2 GB of memory.
> > >
> > > After that, I boot up DomU with 1 VCPU pinned to a separate core, with
> 1
> > > GB of memory. The credit scheduler is used, and no cap is set for
> them. So
> > > DomU should be able to access all resources.
> > >
> > > Each physical core has a 32KB dedicated L1 cache, 256KB dedicated L2
> > > cache. And all cores share a 12MB L3 cache.
> > >
> > > I created a simple program to create an array of specified size. Load
> them
> > > once, and then randomly access every cache line once. (1 cache line is
> 64B
> > > on my machine).
> > > rdtsc is used to record the duration for the random access.
> > >
> > > I tried different data sizes, with 1000 repeat for each data sizes.
> > > Attached is the boxplot for average access time for one cache line.
> > >
> > > The x axis is the different Data Size, the y axis is the CPU cycle. The
> > > three vertical lines at 32KB, 256KB, and 12MB represents the size
> > > difference in L1, L2, and L3 cache on my machine.
> > > *The black box are the results I got when I run it in non-virtualized,
> > > while the blue box are the results I got in DomU.*
> > >
> > > For some reason, the results in DomU varies much more than the results
> in
> > > non-virtualized environment.
> > > I also repeated the same experiments in DomU with Run Level 1, the
> results
> > > are the same.
> > >
> > > Can anyone give some suggestions about what might be the reason for
> this?
> > >
> > > Thanks very much!
> > >
> > > Sisu
> > >
> > > --
> > > Sisu Xi, PhD Candidate
> > >
> > > http://www.cse.wustl.edu/~xis/
> > > Department of Computer Science and Engineering
> > > Campus Box 1045
> > > Washington University in St. Louis
> > > One Brookings Drive
> > > St. Louis, MO 63130
> > >
> >
> >
> >
> > --
> > Sisu Xi, PhD Candidate
> >
> > http://www.cse.wustl.edu/~xis/
> > Department of Computer Science and Engineering
> > Campus Box 1045
> > Washington University in St. Louis
> > One Brookings Drive
> > St. Louis, MO 63130
>
>
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
>
>


-- 
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

[-- Attachment #1.2: Type: text/html, Size: 5592 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-05 20:09     ` Sisu Xi
@ 2014-03-05 21:29       ` Gordan Bobic
  2014-03-05 22:28         ` Konrad Rzeszutek Wilk
  2014-03-05 22:09       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 15+ messages in thread
From: Gordan Bobic @ 2014-03-05 21:29 UTC (permalink / raw)
  To: Sisu Xi, Konrad Rzeszutek Wilk; +Cc: Meng Xu, xen-devel

Just out of interest, have you tried the same test with an HVM DomU? The 
two have different characteristics, and IIRC for some workloads PV can 
be slower than HVM. The recent PVHVM work was intended to combine the 
best aspects of both, but that is more recent than Xen 4.3.0.
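(For comparison, a minimal xl config for an HVM guest with the PV-on-HVM
drivers enabled would look roughly like this; the name, disk path and bridge
are placeholders.)

    name    = "hvm-test"
    builder = "hvm"
    memory  = 1024
    vcpus   = 1
    cpus    = "1"                 # same single-core pinning as the PV test
    disk    = [ 'phy:/dev/vg0/hvm-test,xvda,w' ]
    vif     = [ 'bridge=xenbr0' ]
    xen_platform_pci = 1          # expose the platform device the PV drivers bind to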

It is also interesting that your findings are approximately similar to 
mine, albeit with a very different testing methodology:

http://goo.gl/lIUk4y

Gordan

On 03/05/2014 08:09 PM, Sisu Xi wrote:
> Hi, Konrad:
>
> It is the PV domU.
>
> Thanks.
>
> Sisu
>
>
> On Wed, Mar 5, 2014 at 11:33 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com <mailto:konrad.wilk@oracle.com>> wrote:
>
>     On Tue, Mar 04, 2014 at 05:00:46PM -0600, Sisu Xi wrote:
>      > Hi, all:
>      >
>      > I also used the ramspeed to measure memory throughput.
>      > http://alasir.com/software/ramspeed/
>      >
>      > I am using the v2.6, single core version. The command I used is
>     ./ramspeed
>      > -b 3 (for int) and ./ramspeed -b 6 (for float).
>      > The benchmark measures four operations: add, copy, scale, and
>     triad. And
>      > also gives an average number for all four operations.
>      >
>      > The results in DomU shows around 20% performance degradation
>     compared to
>      > non-virt results.
>
>     What kind of domU? PV or HVM?
>      >
>      > Attached is the results. The left part are results for int, while
>     the right
>      > part is the results for float. The Y axis is the measured
>     throughput. Each
>      > box contains 100 experiment repeats.
>      > The black boxes are the results in non-virtualized environment,
>     while the
>      > blue ones are the results I got in DomU.
>      >
>      > The Xen version I am using is 4.3.0, 64bit.
>      >
>      > Thanks very much!
>      >
>      > Sisu
>      >
>      >
>      >
>      > On Tue, Mar 4, 2014 at 4:49 PM, Sisu Xi <xisisu@gmail.com
>     <mailto:xisisu@gmail.com>> wrote:
>      >
>      > > Hi, all:
>      > >
>      > > I am trying to study the cache/memory performance under Xen,
>     and has
>      > > encountered some problems.
>      > >
>      > > My machine is has an Intel Core i7 X980 processor with 6
>     physical cores. I
>      > > disabled hyper-threading, frequency scaling, so it should be
>     running at
>      > > constant speed.
>      > > Dom0 was boot with 1 VCPU pinned to 1 core, with 2 GB of memory.
>      > >
>      > > After that, I boot up DomU with 1 VCPU pinned to a separate
>     core, with 1
>      > > GB of memory. The credit scheduler is used, and no cap is set
>     for them. So
>      > > DomU should be able to access all resources.
>      > >
>      > > Each physical core has a 32KB dedicated L1 cache, 256KB
>     dedicated L2
>      > > cache. And all cores share a 12MB L3 cache.
>      > >
>      > > I created a simple program to create an array of specified
>     size. Load them
>      > > once, and then randomly access every cache line once. (1 cache
>     line is 64B
>      > > on my machine).
>      > > rdtsc is used to record the duration for the random access.
>      > >
>      > > I tried different data sizes, with 1000 repeat for each data sizes.
>      > > Attached is the boxplot for average access time for one cache line.
>      > >
>      > > The x axis is the different Data Size, the y axis is the CPU
>     cycle. The
>      > > three vertical lines at 32KB, 256KB, and 12MB represents the size
>      > > difference in L1, L2, and L3 cache on my machine.
>      > > *The black box are the results I got when I run it in
>     non-virtualized,
>      > > while the blue box are the results I got in DomU.*
>      > >
>      > > For some reason, the results in DomU varies much more than the
>     results in
>      > > non-virtualized environment.
>      > > I also repeated the same experiments in DomU with Run Level 1,
>     the results
>      > > are the same.
>      > >
>      > > Can anyone give some suggestions about what might be the reason
>     for this?
>      > >
>      > > Thanks very much!
>      > >
>      > > Sisu
>      > >
>      > > --
>      > > Sisu Xi, PhD Candidate
>      > >
>      > > http://www.cse.wustl.edu/~xis/
>      > > Department of Computer Science and Engineering
>      > > Campus Box 1045
>      > > Washington University in St. Louis
>      > > One Brookings Drive
>      > > St. Louis, MO 63130
>      > >
>      >
>      >
>      >
>      > --
>      > Sisu Xi, PhD Candidate
>      >
>      > http://www.cse.wustl.edu/~xis/
>      > Department of Computer Science and Engineering
>      > Campus Box 1045
>      > Washington University in St. Louis
>      > One Brookings Drive
>      > St. Louis, MO 63130
>
>
>      > _______________________________________________
>      > Xen-devel mailing list
>      > Xen-devel@lists.xen.org <mailto:Xen-devel@lists.xen.org>
>      > http://lists.xen.org/xen-devel
>
>
>
>
> --
> Sisu Xi, PhD Candidate
>
> http://www.cse.wustl.edu/~xis/
> Department of Computer Science and Engineering
> Campus Box 1045
> Washington University in St. Louis
> One Brookings Drive
> St. Louis, MO 63130
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-05 20:09     ` Sisu Xi
  2014-03-05 21:29       ` Gordan Bobic
@ 2014-03-05 22:09       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-05 22:09 UTC (permalink / raw)
  To: Sisu Xi; +Cc: Meng Xu, xen-devel

On Wed, Mar 05, 2014 at 02:09:20PM -0600, Sisu Xi wrote:
> Hi, Konrad:
> 
> It is the PV domU.

Please try PVHVM as well.

And please don't top post.
> 
> Thanks.
> 
> Sisu
> 
> 
> On Wed, Mar 5, 2014 at 11:33 AM, Konrad Rzeszutek Wilk <
> konrad.wilk@oracle.com> wrote:
> 
> > On Tue, Mar 04, 2014 at 05:00:46PM -0600, Sisu Xi wrote:
> > > Hi, all:
> > >
> > > I also used the ramspeed to measure memory throughput.
> > > http://alasir.com/software/ramspeed/
> > >
> > > I am using the v2.6, single core version. The command I used is
> > ./ramspeed
> > > -b 3 (for int) and ./ramspeed -b 6 (for float).
> > > The benchmark measures four operations: add, copy, scale, and triad. And
> > > also gives an average number for all four operations.
> > >
> > > The results in DomU shows around 20% performance degradation compared to
> > > non-virt results.
> >
> > What kind of domU? PV or HVM?
> > >
> > > Attached is the results. The left part are results for int, while the
> > right
> > > part is the results for float. The Y axis is the measured throughput.
> > Each
> > > box contains 100 experiment repeats.
> > > The black boxes are the results in non-virtualized environment, while the
> > > blue ones are the results I got in DomU.
> > >
> > > The Xen version I am using is 4.3.0, 64bit.
> > >
> > > Thanks very much!
> > >
> > > Sisu
> > >
> > >
> > >
> > > On Tue, Mar 4, 2014 at 4:49 PM, Sisu Xi <xisisu@gmail.com> wrote:
> > >
> > > > Hi, all:
> > > >
> > > > I am trying to study the cache/memory performance under Xen, and has
> > > > encountered some problems.
> > > >
> > > > My machine is has an Intel Core i7 X980 processor with 6 physical
> > cores. I
> > > > disabled hyper-threading, frequency scaling, so it should be running at
> > > > constant speed.
> > > > Dom0 was boot with 1 VCPU pinned to 1 core, with 2 GB of memory.
> > > >
> > > > After that, I boot up DomU with 1 VCPU pinned to a separate core, with
> > 1
> > > > GB of memory. The credit scheduler is used, and no cap is set for
> > them. So
> > > > DomU should be able to access all resources.
> > > >
> > > > Each physical core has a 32KB dedicated L1 cache, 256KB dedicated L2
> > > > cache. And all cores share a 12MB L3 cache.
> > > >
> > > > I created a simple program to create an array of specified size. Load
> > them
> > > > once, and then randomly access every cache line once. (1 cache line is
> > 64B
> > > > on my machine).
> > > > rdtsc is used to record the duration for the random access.
> > > >
> > > > I tried different data sizes, with 1000 repeat for each data sizes.
> > > > Attached is the boxplot for average access time for one cache line.
> > > >
> > > > The x axis is the different Data Size, the y axis is the CPU cycle. The
> > > > three vertical lines at 32KB, 256KB, and 12MB represents the size
> > > > difference in L1, L2, and L3 cache on my machine.
> > > > *The black box are the results I got when I run it in non-virtualized,
> > > > while the blue box are the results I got in DomU.*
> > > >
> > > > For some reason, the results in DomU varies much more than the results
> > in
> > > > non-virtualized environment.
> > > > I also repeated the same experiments in DomU with Run Level 1, the
> > results
> > > > are the same.
> > > >
> > > > Can anyone give some suggestions about what might be the reason for
> > this?
> > > >
> > > > Thanks very much!
> > > >
> > > > Sisu
> > > >
> > > > --
> > > > Sisu Xi, PhD Candidate
> > > >
> > > > http://www.cse.wustl.edu/~xis/
> > > > Department of Computer Science and Engineering
> > > > Campus Box 1045
> > > > Washington University in St. Louis
> > > > One Brookings Drive
> > > > St. Louis, MO 63130
> > > >
> > >
> > >
> > >
> > > --
> > > Sisu Xi, PhD Candidate
> > >
> > > http://www.cse.wustl.edu/~xis/
> > > Department of Computer Science and Engineering
> > > Campus Box 1045
> > > Washington University in St. Louis
> > > One Brookings Drive
> > > St. Louis, MO 63130
> >
> >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xen.org
> > > http://lists.xen.org/xen-devel
> >
> >
> 
> 
> -- 
> Sisu Xi, PhD Candidate
> 
> http://www.cse.wustl.edu/~xis/
> Department of Computer Science and Engineering
> Campus Box 1045
> Washington University in St. Louis
> One Brookings Drive
> St. Louis, MO 63130

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-05 21:29       ` Gordan Bobic
@ 2014-03-05 22:28         ` Konrad Rzeszutek Wilk
  2014-03-06 10:31           ` Gordan Bobic
  0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-05 22:28 UTC (permalink / raw)
  To: Gordan Bobic; +Cc: Meng Xu, Sisu Xi, xen-devel

On Wed, Mar 05, 2014 at 09:29:30PM +0000, Gordan Bobic wrote:
> Just out of interest, have you tried the same test with HVM DomU?
> The two have different characteristics, and IIRC for some workloads
> PV can be slower than HVM. The recent PVHVM work was intended to
> result in the best aspects of both, but that is more recent than Xen
> 4.3.0.
> 
> It is also interesting that your findings are approximately similar
> to mine, albeit with a very different testing methodology:
> 
> http://goo.gl/lIUk4y

I don't know whether you used PV drivers (for HVM) and whether you used a
block device, rather than a file, as the backend.

But it also helps to use 'fio' to test this sort of thing.
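(Something along these lines, run against the raw backend device, usually
gives a useful picture; the device path is just an example.)

    fio --name=randread --filename=/dev/xvdb --direct=1 --ioengine=libaio \
        --rw=randread --bs=4k --iodepth=16 --runtime=60 --time_based --group_reporting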

> 
> Gordan
> 
> On 03/05/2014 08:09 PM, Sisu Xi wrote:
> >Hi, Konrad:
> >
> >It is the PV domU.
> >
> >Thanks.
> >
> >Sisu
> >
> >
> >On Wed, Mar 5, 2014 at 11:33 AM, Konrad Rzeszutek Wilk
> ><konrad.wilk@oracle.com <mailto:konrad.wilk@oracle.com>> wrote:
> >
> >    On Tue, Mar 04, 2014 at 05:00:46PM -0600, Sisu Xi wrote:
> >     > Hi, all:
> >     >
> >     > I also used the ramspeed to measure memory throughput.
> >     > http://alasir.com/software/ramspeed/
> >     >
> >     > I am using the v2.6, single core version. The command I used is
> >    ./ramspeed
> >     > -b 3 (for int) and ./ramspeed -b 6 (for float).
> >     > The benchmark measures four operations: add, copy, scale, and
> >    triad. And
> >     > also gives an average number for all four operations.
> >     >
> >     > The results in DomU shows around 20% performance degradation
> >    compared to
> >     > non-virt results.
> >
> >    What kind of domU? PV or HVM?
> >     >
> >     > Attached is the results. The left part are results for int, while
> >    the right
> >     > part is the results for float. The Y axis is the measured
> >    throughput. Each
> >     > box contains 100 experiment repeats.
> >     > The black boxes are the results in non-virtualized environment,
> >    while the
> >     > blue ones are the results I got in DomU.
> >     >
> >     > The Xen version I am using is 4.3.0, 64bit.
> >     >
> >     > Thanks very much!
> >     >
> >     > Sisu
> >     >
> >     >
> >     >
> >     > On Tue, Mar 4, 2014 at 4:49 PM, Sisu Xi <xisisu@gmail.com
> >    <mailto:xisisu@gmail.com>> wrote:
> >     >
> >     > > Hi, all:
> >     > >
> >     > > I am trying to study the cache/memory performance under Xen,
> >    and has
> >     > > encountered some problems.
> >     > >
> >     > > My machine is has an Intel Core i7 X980 processor with 6
> >    physical cores. I
> >     > > disabled hyper-threading, frequency scaling, so it should be
> >    running at
> >     > > constant speed.
> >     > > Dom0 was boot with 1 VCPU pinned to 1 core, with 2 GB of memory.
> >     > >
> >     > > After that, I boot up DomU with 1 VCPU pinned to a separate
> >    core, with 1
> >     > > GB of memory. The credit scheduler is used, and no cap is set
> >    for them. So
> >     > > DomU should be able to access all resources.
> >     > >
> >     > > Each physical core has a 32KB dedicated L1 cache, 256KB
> >    dedicated L2
> >     > > cache. And all cores share a 12MB L3 cache.
> >     > >
> >     > > I created a simple program to create an array of specified
> >    size. Load them
> >     > > once, and then randomly access every cache line once. (1 cache
> >    line is 64B
> >     > > on my machine).
> >     > > rdtsc is used to record the duration for the random access.
> >     > >
> >     > > I tried different data sizes, with 1000 repeat for each data sizes.
> >     > > Attached is the boxplot for average access time for one cache line.
> >     > >
> >     > > The x axis is the different Data Size, the y axis is the CPU
> >    cycle. The
> >     > > three vertical lines at 32KB, 256KB, and 12MB represents the size
> >     > > difference in L1, L2, and L3 cache on my machine.
> >     > > *The black box are the results I got when I run it in
> >    non-virtualized,
> >     > > while the blue box are the results I got in DomU.*
> >     > >
> >     > > For some reason, the results in DomU varies much more than the
> >    results in
> >     > > non-virtualized environment.
> >     > > I also repeated the same experiments in DomU with Run Level 1,
> >    the results
> >     > > are the same.
> >     > >
> >     > > Can anyone give some suggestions about what might be the reason
> >    for this?
> >     > >
> >     > > Thanks very much!
> >     > >
> >     > > Sisu
> >     > >
> >     > > --
> >     > > Sisu Xi, PhD Candidate
> >     > >
> >     > > http://www.cse.wustl.edu/~xis/
> >     > > Department of Computer Science and Engineering
> >     > > Campus Box 1045
> >     > > Washington University in St. Louis
> >     > > One Brookings Drive
> >     > > St. Louis, MO 63130
> >     > >
> >     >
> >     >
> >     >
> >     > --
> >     > Sisu Xi, PhD Candidate
> >     >
> >     > http://www.cse.wustl.edu/~xis/
> >     > Department of Computer Science and Engineering
> >     > Campus Box 1045
> >     > Washington University in St. Louis
> >     > One Brookings Drive
> >     > St. Louis, MO 63130
> >
> >
> >     > _______________________________________________
> >     > Xen-devel mailing list
> >     > Xen-devel@lists.xen.org <mailto:Xen-devel@lists.xen.org>
> >     > http://lists.xen.org/xen-devel
> >
> >
> >
> >
> >--
> >Sisu Xi, PhD Candidate
> >
> >http://www.cse.wustl.edu/~xis/
> >Department of Computer Science and Engineering
> >Campus Box 1045
> >Washington University in St. Louis
> >One Brookings Drive
> >St. Louis, MO 63130
> >
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@lists.xen.org
> >http://lists.xen.org/xen-devel
> >
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-05 22:28         ` Konrad Rzeszutek Wilk
@ 2014-03-06 10:31           ` Gordan Bobic
  0 siblings, 0 replies; 15+ messages in thread
From: Gordan Bobic @ 2014-03-06 10:31 UTC (permalink / raw)
  To: xen-devel

On 2014-03-05 22:28, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 05, 2014 at 09:29:30PM +0000, Gordan Bobic wrote:
>> Just out of interest, have you tried the same test with HVM DomU?
>> The two have different characteristics, and IIRC for some workloads
>> PV can be slower than HVM. The recent PVHVM work was intended to
>> result in the best aspects of both, but that is more recent than Xen
>> 4.3.0.
>> 
>> It is also interesting that your findings are approximately similar
>> to mine, albeit with a very different testing methodology:
>> 
>> http://goo.gl/lIUk4y
> 
> Don't know if you used PV drivers (for HVM) and if you used as a 
> backend a
> block device instead of a file.
> 
> But it also helps in using 'fio' to test this sort of thing.

I used a dedicated disk which was not altered between the tests. Otherwise
I wouldn't have been able to run the same installation on bare metal and
virtualized.

I don't think disk I/O was particularly relevant in the test - the CPU
was always the bottleneck, with no iowait time. My impression was that
it was the context switching that really crippled virtualized performance,
especially in multi-socket or NUMA cases. The C2Q I tested on can be
considered a dual-socket non-NUMA system in this context, since the two
dies on it don't share any caches, which means higher migration penalties.
Throw in the extra Heisenbergism of the domU kernel not having any idea
where the hypervisor might schedule the virtual CPU mapping (I didn't
pin cores in the test, perhaps I should have) and it is easy to see a
case where it gets quite bad when you push the system to saturation.

Gordan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-04 23:00 ` Sisu Xi
  2014-03-05 17:33   ` Konrad Rzeszutek Wilk
@ 2014-03-11 12:03   ` George Dunlap
  2014-03-11 15:46     ` Sisu Xi
  1 sibling, 1 reply; 15+ messages in thread
From: George Dunlap @ 2014-03-11 12:03 UTC (permalink / raw)
  To: Sisu Xi; +Cc: Meng Xu, xen-devel

On Tue, Mar 4, 2014 at 11:00 PM, Sisu Xi <xisisu@gmail.com> wrote:
> Hi, all:
>
> I also used the ramspeed to measure memory throughput.
> http://alasir.com/software/ramspeed/
>
> I am using the v2.6, single core version. The command I used is ./ramspeed
> -b 3 (for int) and ./ramspeed -b 6 (for float).
> The benchmark measures four operations: add, copy, scale, and triad. And
> also gives an average number for all four operations.
>
> The results in DomU shows around 20% performance degradation compared to
> non-virt results.
>
> Attached is the results. The left part are results for int, while the right
> part is the results for float. The Y axis is the measured throughput. Each
> box contains 100 experiment repeats.
> The black boxes are the results in non-virtualized environment, while the
> blue ones are the results I got in DomU.
>
> The Xen version I am using is 4.3.0, 64bit.

Have you tried a CPU- but non-memory-intensive benchmark?  There was
an issue some time back with the highest performance modes of the CPU
not being enabled by default, but needing special patches in either
dom0 or Xen.  Is it possible you're seeing something like that?

 -George

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-11 12:03   ` George Dunlap
@ 2014-03-11 15:46     ` Sisu Xi
  2014-03-11 20:21       ` Sisu Xi
  2014-03-12  8:59       ` Dario Faggioli
  0 siblings, 2 replies; 15+ messages in thread
From: Sisu Xi @ 2014-03-11 15:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2154 bytes --]

Hi, George:

Thanks for the reply.

Yes, I have tried a CPU-intensive benchmark; the performance is almost the
same as in the native case.
We also tried the same image, but on another physical host with a 3rd
generation i7 processor. This time, the memory-intensive benchmark incurs
less than 5% performance degradation, and the latency is identical to the
native case.
We are looking into this issue now and hopefully will figure it out.

Can you be more specific about the highest performance modes of the CPU?
For all the experiments, I disabled hyperthreading and frequency scaling,
so the CPU should work at a constant speed.

Sisu



On Tue, Mar 11, 2014 at 7:03 AM, George Dunlap
<George.Dunlap@eu.citrix.com>wrote:

> On Tue, Mar 4, 2014 at 11:00 PM, Sisu Xi <xisisu@gmail.com> wrote:
> > Hi, all:
> >
> > I also used the ramspeed to measure memory throughput.
> > http://alasir.com/software/ramspeed/
> >
> > I am using the v2.6, single core version. The command I used is
> ./ramspeed
> > -b 3 (for int) and ./ramspeed -b 6 (for float).
> > The benchmark measures four operations: add, copy, scale, and triad. And
> > also gives an average number for all four operations.
> >
> > The results in DomU shows around 20% performance degradation compared to
> > non-virt results.
> >
> > Attached is the results. The left part are results for int, while the
> right
> > part is the results for float. The Y axis is the measured throughput.
> Each
> > box contains 100 experiment repeats.
> > The black boxes are the results in non-virtualized environment, while the
> > blue ones are the results I got in DomU.
> >
> > The Xen version I am using is 4.3.0, 64bit.
>
> Have you tried a CPU- but non-memory-intensive benchmark?  There was
> an issue some time back with the highest performance modes of the CPU
> not being enabled by default, but needing special patches in either
> dom0 or Xen.  Is it possible you're seeing something like that?
>
>  -George
>



-- 
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

[-- Attachment #1.2: Type: text/html, Size: 3077 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-11 15:46     ` Sisu Xi
@ 2014-03-11 20:21       ` Sisu Xi
  2014-03-12  8:55         ` Dario Faggioli
  2014-03-12  8:59       ` Dario Faggioli
  1 sibling, 1 reply; 15+ messages in thread
From: Sisu Xi @ 2014-03-11 20:21 UTC (permalink / raw)
  To: George Dunlap; +Cc: Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3152 bytes --]

By the way, since the same DomU image gets better results on another
hardware machine, we first assumed there was some interference from Dom-0.

However, when I run the same program in Dom-0, the results look very good,
almost the same as the native case, with just a few outliers. Does this mean
the interference from Dom-0 is not causing trouble for Dom-0 itself, but can
interfere with the cache program in Dom-U? Is this assumption valid?

The Dom-0 configurations I tried are:
1. Ubuntu 12.04.2, default 3.2 kernel
2. Ubuntu 12.04.2, self compiled 3.11 kernel
3. Ubuntu 12.04.4, default 3.8 kernel
4. CentOS 6.2, self compiled 3.13.6 kernel

Thanks.

Sisu



On Tue, Mar 11, 2014 at 10:46 AM, Sisu Xi <xisisu@gmail.com> wrote:

> Hi, George:
>
> Thanks for the reply.
>
> Yes, I have tried CPU intensive benchmark. the performance is almost the
> same as native case.
> We also tried the same image, but on another physical host with a 3rd
> generation i7 processor. This time, the memory-intensive benchmark only
> incurs less than 5% performance degradation, and the latency is identical
> to native case.
> We are looking into this issue now and hopefully will figure this out.
>
> Can you be more specific about the highest performance modes of the CPU?
> For all the experiments, I disabled hyperthreading and frequency scaling,
> so CPU should work at a constant speed.
>
> Sisu
>
>
>
> On Tue, Mar 11, 2014 at 7:03 AM, George Dunlap <
> George.Dunlap@eu.citrix.com> wrote:
>
>> On Tue, Mar 4, 2014 at 11:00 PM, Sisu Xi <xisisu@gmail.com> wrote:
>> > Hi, all:
>> >
>> > I also used the ramspeed to measure memory throughput.
>> > http://alasir.com/software/ramspeed/
>> >
>> > I am using the v2.6, single core version. The command I used is
>> ./ramspeed
>> > -b 3 (for int) and ./ramspeed -b 6 (for float).
>> > The benchmark measures four operations: add, copy, scale, and triad. And
>> > also gives an average number for all four operations.
>> >
>> > The results in DomU shows around 20% performance degradation compared to
>> > non-virt results.
>> >
>> > Attached is the results. The left part are results for int, while the
>> right
>> > part is the results for float. The Y axis is the measured throughput.
>> Each
>> > box contains 100 experiment repeats.
>> > The black boxes are the results in non-virtualized environment, while
>> the
>> > blue ones are the results I got in DomU.
>> >
>> > The Xen version I am using is 4.3.0, 64bit.
>>
>> Have you tried a CPU- but non-memory-intensive benchmark?  There was
>> an issue some time back with the highest performance modes of the CPU
>> not being enabled by default, but needing special patches in either
>> dom0 or Xen.  Is it possible you're seeing something like that?
>>
>>  -George
>>
>
>
>
> --
> Sisu Xi, PhD Candidate
>
> http://www.cse.wustl.edu/~xis/
> Department of Computer Science and Engineering
> Campus Box 1045
> Washington University in St. Louis
> One Brookings Drive
> St. Louis, MO 63130
>



-- 
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

[-- Attachment #1.2: Type: text/html, Size: 4660 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-11 20:21       ` Sisu Xi
@ 2014-03-12  8:55         ` Dario Faggioli
  2014-03-12 16:50           ` Sisu Xi
  0 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2014-03-12  8:55 UTC (permalink / raw)
  To: Sisu Xi; +Cc: George Dunlap, Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1102 bytes --]

On Tue, 2014-03-11 at 15:21 -0500, Sisu Xi wrote:
> by the way, since the same DomU image can get better results on
> another hardware machine, we first assume there are some interference
> from Dom-0.
> 
Are you able to share the source of the test program, so that we can try
to reproduce what you're seeing?

> However, when I run the same program in Dom-0, the results looks very
> good, almost the same as native case, just a few out liars. Which
> means the interference form Dom-0 is not causing trouble for dom-0,
> but can interfere with cache program in Dom-U? Is this assumption
> valid?
> 
I'm shooting a bit in the dark, but:
 - what is Dom0 doing while the DomU is running the workload?
 - to what pCPUs are you pinning Dom0's and DomU's vCPUs? Do they 
   share any level of the cache hierarchy?
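(For the cache-sharing part of that question, the host topology can be checked
from Dom0, e.g. with cpu0 as an example:)

    # which logical CPUs share each of cpu0's cache levels
    grep . /sys/devices/system/cpu/cpu0/cache/index*/shared_cpu_list
    lscpu        # socket/core layout and cache sizes
    xl info -n   # Xen's view of the CPU and NUMA topology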

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-11 15:46     ` Sisu Xi
  2014-03-11 20:21       ` Sisu Xi
@ 2014-03-12  8:59       ` Dario Faggioli
  1 sibling, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2014-03-12  8:59 UTC (permalink / raw)
  To: Sisu Xi; +Cc: George Dunlap, Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 921 bytes --]

On Tue, 2014-03-11 at 10:46 -0500, Sisu Xi wrote:


> Can you be more specific about the highest performance modes of the
> CPU? For all the experiments, I disabled hyperthreading and frequency
> scaling, so CPU should work at a constant speed.
> 
It was about TurboBoost (or whatever it's called) mode:
 http://en.wikipedia.org/wiki/Intel_Turbo_Boost

Some more info on this old blog post:
 http://blog.xen.org/index.php/2011/11/29/baremetal-vs-xen-vs-kvm-redux/

If you're completely disabling cpufreq via the BIOS, I'd say this is ruled
out, but I can't be 100% sure. Also, that should be fixed in a recent
enough Xen and Dom0 kernel.
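(If you want to double-check from the Xen side rather than relying on the
BIOS setting, xenpm reports what the hypervisor sees; the exact output
differs between Xen versions.)

    xenpm get-cpufreq-para                   # per-CPU governor, available P-states, turbo status
    xenpm set-scaling-governor performance   # pin the governor, if cpufreq is active at all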

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-12  8:55         ` Dario Faggioli
@ 2014-03-12 16:50           ` Sisu Xi
  2014-03-13 10:25             ` George Dunlap
  0 siblings, 1 reply; 15+ messages in thread
From: Sisu Xi @ 2014-03-12 16:50 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2725 bytes --]

Hi, Dario:

Thanks for the reply.

The CPU I am using is an i7 X980 @ 3.33 GHz;
each core has a dedicated L1 (32K data, 32K inst) and L2 (256K unified) cache,
and all 6 cores share a 12MB L3 cache.
I pinned Dom-0 to core 0 and Dom-U to core 1.

The program I used is attached. It takes one input parameter, the data array
size (in KB).

It can be divided into the following steps:
1. initialize the data array;
2. divide the array by the cache line size (on my machine, 64B), then randomly
   permute the first element of each cache line;
3. read each cache line once to warm up the cache;
4. read it a second time, and record the time for this round;
5. print out the time spent in step 4 (total time and per-cache-line time).

The randomization is done to defeat the cache prefetcher, and since the
accessed elements are 64B apart, no two accessed elements should fall on the
same cache line.

I compile it with: g++ -O0 cache_latency_size_boxplot.cc -o
cache_latency_size

The script I used to run the experiment is also attached. Basically it tries
different array sizes, each 1000 times.

For the throughput experiment, I used ramspeed to measure memory throughput.
http://alasir.com/software/ramspeed/
I used v2.6, the single-core version. The commands I used are ./ramspeed -b 3
(for int) and ./ramspeed -b 6 (for float).


Thanks very much!

Sisu





On Wed, Mar 12, 2014 at 3:55 AM, Dario Faggioli
<dario.faggioli@citrix.com>wrote:

> On Tue, 2014-03-11 at 15:21 -0500, Sisu Xi wrote:
> > by the way, since the same DomU image can get better results on
> > another hardware machine, we first assume there are some interference
> > from Dom-0.
> >
> Are you able to share the source of the test program, so that we can try
> to reproduce what you're seeing?
>
> > However, when I run the same program in Dom-0, the results looks very
> > good, almost the same as native case, just a few out liars. Which
> > means the interference form Dom-0 is not causing trouble for dom-0,
> > but can interfere with cache program in Dom-U? Is this assumption
> > valid?
> >
> I'm shooting a bit in the dark, but:
>  - what is Dom0 doing while the DomU is running the workload?
>  - to what pCPUs are you pinning Dom0's and DomU's vCPUs? Do they
>    share any level of the cache hierarchy?
>
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>
>


-- 
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

[-- Attachment #1.2: Type: text/html, Size: 3992 bytes --]

[-- Attachment #2: cache_latency_size_boxplot.cc --]
[-- Type: text/x-c++src, Size: 1853 bytes --]

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <ctime>
#include <algorithm>  // std::swap

using namespace std;

#if defined(__i386__)
static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}
#elif defined(__x86_64__)
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
#endif

#define CACHE_LINE_SIZE	64

#define WSS 24567 /* 24 Mb */
#define NUM_VARS WSS * 1024 / sizeof(long)

// ./a.out memsize(in KB)
int main(int argc, char** argv)
{
	unsigned long mem_size_KB = atol(argv[1]);  // mem size in KB
	unsigned long mem_size_B  = mem_size_KB * 1024;	// mem size in Byte
    unsigned long count       = mem_size_B / sizeof(long);
    unsigned long row         = mem_size_B / CACHE_LINE_SIZE;
    int           col         = CACHE_LINE_SIZE / sizeof(long);
    
    unsigned long long start, finish, dur1;
    unsigned long temp;

    long *buffer;
    buffer = new long[count];

    // init array
    for (unsigned long i = 0; i < count; ++i)
        buffer[i] = i;

    for (unsigned long i = row-1; i >0; --i) {
        temp = rand()%i;
        swap(buffer[i*col], buffer[temp*col]);
    }

    // warm the cache again
    temp = buffer[0];
    for (unsigned long i = 0; i < row-1; ++i) {
        temp = buffer[temp];
    }

    // should be cache hit
    temp = buffer[0];
    start = rdtsc();
    int sum = 0;
    for (unsigned long i = 0; i < row-1; ++i) {
        if (i%2 == 0) sum += buffer[temp];
        else sum -= buffer[temp];
        temp = buffer[temp];
    }
    finish = rdtsc();
    dur1 = finish-start;

    // Result: total cycles, and cycles per cache line
    printf("%llu %llu\n", dur1, dur1/row);

    delete[] buffer;   // buffer came from new[], so use delete[] rather than free()

	return 0;
}

[-- Attachment #3: cache_latency_1_measure_size.sh --]
[-- Type: application/x-sh, Size: 902 bytes --]

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: memory performance 20% degradation in DomU -- Sisu
  2014-03-12 16:50           ` Sisu Xi
@ 2014-03-13 10:25             ` George Dunlap
  0 siblings, 0 replies; 15+ messages in thread
From: George Dunlap @ 2014-03-13 10:25 UTC (permalink / raw)
  To: Sisu Xi, Dario Faggioli; +Cc: Meng Xu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 507 bytes --]

On 03/12/2014 04:50 PM, Sisu Xi wrote:
> Hi, Dario:
>
> Thanks for the reply.
>
> The CPU i am using is i7 X 980 @ 3.33 GHz,
> each core has dedicated L1(32K data, 32K inst) and L2 (256K unified) 
> cache, all 6 cores share a 12MB L3 cache.
> I pinned Dom-0 to core 0, and Dom-U to core 1.

It might be worth switching those around -- e.g., try pinning domU to 
core 4 / 5, or try pinning domU to core 0 and dom0 to core 1, just to 
rule out some kind of strange NUMA / microarchitecture effect.
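(With the guest's domain name substituted, something like this should be
enough to test both layouts without rebooting:)

    xl vcpu-pin <domU-name> 0 5     # move the guest's vCPU 0 to core 5
    # ...or swap the two domains around:
    xl vcpu-pin <domU-name> 0 0
    xl vcpu-pin Domain-0    0 1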

  -George


[-- Attachment #1.2: Type: text/html, Size: 2084 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-03-13 10:25 UTC | newest]

Thread overview: 15+ messages
2014-03-04 22:49 memory performance 20% degradation in DomU -- Sisu Sisu Xi
2014-03-04 23:00 ` Sisu Xi
2014-03-05 17:33   ` Konrad Rzeszutek Wilk
2014-03-05 20:09     ` Sisu Xi
2014-03-05 21:29       ` Gordan Bobic
2014-03-05 22:28         ` Konrad Rzeszutek Wilk
2014-03-06 10:31           ` Gordan Bobic
2014-03-05 22:09       ` Konrad Rzeszutek Wilk
2014-03-11 12:03   ` George Dunlap
2014-03-11 15:46     ` Sisu Xi
2014-03-11 20:21       ` Sisu Xi
2014-03-12  8:55         ` Dario Faggioli
2014-03-12 16:50           ` Sisu Xi
2014-03-13 10:25             ` George Dunlap
2014-03-12  8:59       ` Dario Faggioli
