[XenRT] Cache-Aware Real-Time Xen: Partition shared cache for guest domains in Xen via page coloring

* [XenRT] Cache-Aware Real-Time Xen: Partition shared cache for guest domains in Xen via page coloring
@ 2015-05-11  2:36 Meng Xu
  2015-05-12 22:59 ` Dario Faggioli
  0 siblings, 1 reply; 3+ messages in thread
From: Meng Xu @ 2015-05-11  2:36 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap
  Cc: Andrew Cooper, Jan Beulich, xen-devel@lists.xen.org

Hi Dario and George,

I'm working on incorporating shared-cache interference effects into
the schedulers (both in the VMM and in the VMs) to improve the
schedulability of the whole system. Specifically, I'm doing the
following:
(1) Investigating how shared-cache interference affects the real-time
performance (such as worst-case execution time) of applications in VMs
that share the last-level cache (LLC) on Xen;
(2) Eliminating such shared-cache interference by statically
partitioning the shared cache among VMs via a page-coloring mechanism,
and evaluating the effectiveness of this mechanism;
(3) Making better use of the shared cache by dynamically
increasing/decreasing/changing the cache partitions of a VM online;
(4) Incorporating cache effects into the VMM's scheduling algorithm
to improve the schedulability of the whole system.
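
For readers new to page coloring, the idea behind step (2) can be
sketched roughly as follows. This is an illustration only, not code
from this work: the 32-color geometry (COLOR_BITS = 5) and the function
names are assumptions made for the example, and it ignores the address
hashing that some CPUs apply across LLC slices.

```python
PAGE_SHIFT = 12   # 4KB pages
COLOR_BITS = 5    # assumed geometry: 32 page colors


def page_color(machine_addr):
    """Color of the 4KB frame that contains machine_addr.

    The color is taken from the low bits of the frame number, i.e. the
    cache set-index bits that sit just above the page offset.
    """
    return (machine_addr >> PAGE_SHIFT) & ((1 << COLOR_BITS) - 1)


def frames_of_color(color, first_mfn, count):
    """Frame numbers in [first_mfn, first_mfn + count) with this color."""
    return [mfn for mfn in range(first_mfn, first_mfn + count)
            if page_color(mfn << PAGE_SHIFT) == color]
```

Giving two VMs frames from disjoint color sets keeps their data in
disjoint groups of cache sets, which is what the static partitioning
relies on.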

Right now, I have almost finished the first two steps and have some
preliminary results on the real-time performance of Xen with the static
cache partitioning mechanism. I put together some quick slides to
summarize the current work and the future plan.
The slides can be found at:
http://www.cis.upenn.edu/~mengxu/cart-xen/2015-05-01-CARTXen-WiP.pdf

My question is:
Do you have any comments or concerns about the current software-based
cache management work?
I hope to hear your opinions and incorporate them into my ongoing
work, rather than drifting too far away from mainstream Xen ideas. :-)

Thank you very much!

Best regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

* Re: [XenRT] Cache-Aware Real-Time Xen: Partition shared cache for guest domains in Xen via page coloring
  2015-05-11  2:36 [XenRT] Cache-Aware Real-Time Xen: Partition shared cache for guest domains in Xen via page coloring Meng Xu
@ 2015-05-12 22:59 ` Dario Faggioli
  2015-05-13  2:01   ` Meng Xu
  0 siblings, 1 reply; 3+ messages in thread
From: Dario Faggioli @ 2015-05-12 22:59 UTC (permalink / raw)
  To: Meng Xu
  Cc: artem.mygaiev@globallogic.com, George Dunlap, Andrew Cooper,
	xen-devel@lists.xen.org, Jan Beulich, Andrii Tseglytskyi


On Sun, 2015-05-10 at 22:36 -0400, Meng Xu wrote:
> Hi Dario and George,
> 
Hi Meng,

I gave a quick look at the slides. Nice work.

Although I don't have much time, I also wanted to take a quick glance at
the code, and looked it up on GitHub, where you usually host your stuff,
but couldn't find it (maybe because I'm really ignorant about how that
site works! :-P).. Is it hosted somewhere public already?

> Right now, I have almost finished the first two steps and have some
> preliminary results on the real-time performance of Xen with the static
> cache partitioning mechanism. I put together some quick slides to
> summarize the current work and the future plan.
> The slides can be found at:
> http://www.cis.upenn.edu/~mengxu/cart-xen/2015-05-01-CARTXen-WiP.pdf
> 
The results look nice and promising, at least for real-time
(virtualization) workloads. I'm quite sure folks working on
embedded/automotive projects, that are looking at RTDS, would find them
very interesting (provided it works similarly well on ARM, as that's
what they use), and in fact I'm adding people from GlobalLogic to the Cc
list.

> My question is:
> Do you have any comments or concerns about the current software-based
> cache management work?
>
My first thought is the one I sort of expressed above already: do you
think it could work on ARM as well? I'm quite sure they have caches, so
the basic idea is applicable there too, but I know too little about that
architecture to tell how well or badly it would behave there! Would you
have the chance and the interest to try to find that out?

Another question: what are (if any) the limitations and the restrictions
we have to accept, in order to be able to take advantage of this? E.g.,
I remember someone (I think it was Andrew) mentioning that playing the
tricks you play with addresses would make it hard/impossible to use
superpages. I also remember that you were having problems finding
large enough chunks of contiguous memory... Are these still open issues?

> I hope to hear your opinions and incorporate them into my ongoing
> work, rather than drifting too far away from mainstream Xen ideas. :-)
> 
Some more thoughts. This looks like something that could work quite well
in environments and use cases where:
 1. things are rather static (i.e., little or no dynamic domain
    creation, a well-defined workload inside each domain, etc.)
 2. there are not too many domains around. I mean, if you have hundreds
    of guests, it's very unlikely that you'll be able to arrange for a
    similar number of properly sized partitions, isn't it?

That is actually why I really think this could be useful for the
embedded virt people: this is exactly what their environment looks
like! :-)

I think we should definitely consider merging something that will
potentially help the emerging embedded/automotive use cases, provided
(as usual):
 1. the benefits are real and really useful (e.g., they are still there 
    on ARM)
 2. it does not disrupt other workloads, it does not impact other 
    features and it does not make the code worse (i.e., more difficult 
    to understand and to maintain)

I'd suggest, if you agree with my analysis, you try to assess 1... Maybe
GlobalLogic people could help, if they're interested.

Thanks and Regards,
Dario


* Re: [XenRT] Cache-Aware Real-Time Xen: Partition shared cache for guest domains in Xen via page coloring
  2015-05-12 22:59 ` Dario Faggioli
@ 2015-05-13  2:01   ` Meng Xu
  0 siblings, 0 replies; 3+ messages in thread
From: Meng Xu @ 2015-05-13  2:01 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: artem.mygaiev@globallogic.com, George Dunlap, Andrew Cooper,
	xen-devel@lists.xen.org, Jan Beulich, Andrii Tseglytskyi

2015-05-12 18:59 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On Sun, 2015-05-10 at 22:36 -0400, Meng Xu wrote:
>> Hi Dario and George,
>>
> Hi Meng,

Hi Dario,

>
> I gave a quick look at the slides. Nice work.

Thanks for your encouragement! :-)

>
> Although I don't have much time, I also wanted to take a quick glance at
> the code, and looked it up on GitHub, where you usually host your stuff,
> but couldn't find it (maybe because I'm really ignorant about how that
> site works! :-P).. Is it hosted somewhere public already?

The code is not publicly hosted right now.
One reason is that it is really "ugly" at the moment (I directly
copied some functions and changed their names and interfaces to make
it work. Of course, I can rework the code to bring it up to the Xen
coding standard.). :-(
Another reason is that I'm trying to publish/submit a paper on
this, which is what I need for graduation, before making all the code
fully public. :-P
But I can add you, and anyone else who is interested, to my private
repository on GitHub. Is that ok?

>
>> Right now, I have almost finished the first two steps and have some
>> preliminary results on the real-time performance of Xen with the static
>> cache partitioning mechanism. I put together some quick slides to
>> summarize the current work and the future plan.
>> The slides can be found at:
>> http://www.cis.upenn.edu/~mengxu/cart-xen/2015-05-01-CARTXen-WiP.pdf
>>
> The results look nice and promising, at least for real-time
> (virtualization) workloads. I'm quite sure folks working on
> embedded/automotive projects, that are looking at RTDS, would find them
> very interesting (provided it works similarly well on ARM, as that's
> what they use), and in fact I'm adding people from GlobalLogic to the Cc
> list.
>
>> My question is:
>> Do you have any comments or concerns about the current software-based
>> cache management work?

Yes! :-)
The concerns/shortcomings of this software-based cache management are:
(1) It is limited to a page size of 4KB. As Andrew/Jan pointed
out before, when the page size becomes 2MB, we won't be able to
partition the cache with this approach. We need to manage memory
at a finer granularity so that we can control the machine address bits
that determine which area of the cache is used. Actually, I haven't
yet looked into the HVM case, which uses superpages, so I cannot say
it is impossible, but it will be harder for sure.
(2) If we want to migrate one of a domain's cache partitions to another
cache partition, we will have to copy the related pages, which is
expensive in my opinion. Of course, we can do some lazy copying and
only copy a page when it is used, but in the worst case we still
need to copy those dirty pages. This means dynamic cache management
will have relatively high overhead. Actually, this is what I'm working
on right now, trying to measure how large the overhead is.
(3) Potential TLB miss issues. This overhead is not as large as we
first thought. My speculation is that the TLB is a special cache which
may also do prefetching, and the prefetching may help reduce the TLB
overhead. That would explain why we didn't see much TLB overhead in
the evaluation.
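
To make concern (1) concrete, here is a rough calculation of how many
colors remain at each page size. It is a sketch under assumed numbers
(a 2MB, 16-way LLC; the function names are made up for the example),
not a measurement from this work.

```python
def colors_for(page_bytes, cache_bytes, ways):
    """Number of distinct page colors available at a given page size.

    One cache way covers cache_bytes // ways bytes of consecutive
    addresses. A page at least that large touches every cache set,
    so only a single "color" is left.
    """
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


def can_partition(page_bytes, cache_bytes, ways):
    """Page coloring needs at least two colors to separate two domains."""
    return colors_for(page_bytes, cache_bytes, ways) >= 2
```

With these assumed numbers, 4KB pages give 32 colors, while a 2MB
superpage covers the whole cache and leaves nothing to partition.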

I'm actually interested in comparing this software-based approach with
a hardware-based approach, such as the CAT feature introduced by Intel.
But I don't have the hardware (which is so expensive :-() right now.

>>
> My first thought is the one I sort of expressed above already: do you
> think it could work on ARM as well? I'm quite sure they have caches, so
> the basic idea is applicable there too, but I know too little about that
> architecture to tell how well or badly it would behave there! Would you
> have the chance and the interest to try to find that out?

I think the implementation will work on ARM as well, because the code I
touch does not really depend on the arch. (Except that I have to
assume the page size is 4KB.)
The only concern is that the latest ARM core I know of only has a 2MB
shared cache, which is relatively small.

Actually, Hyon-Young (who is a visiting professor of Insup) and I had
a look at an ARM board last year. We tried the Samsung Exynos 5420
Arndale Octa Board and ran into the signature issue (U-Boot cannot
bring up the CPU in hypervisor mode:
http://lists.xen.org/archives/html/xen-devel/2014-04/msg01794.html).
Then we tried the Cubieboard, which seemed to work. But I was
busy with some other things and didn't continue in this direction last
year. :-(
If we can have a good case study on an ARM board, I think it will
be interesting and we can continue working on it. :-)

>
> Another question: what are (if any) the limitations and the restrictions
> we have to accept, in order to be able to take advantage of this? E.g.,
> I remember someone (I think it was Andrew) mentioning that playing the
> tricks you play with addresses would make it hard/impossible to use
> superpages.

As I said, superpages are an issue as far as I know right now. We have
to control some of the bits that are used to index the cache in order
to control which area of the cache is used. I haven't looked into
superpages yet, but I will have a look; I think it must be an
interesting problem to work out.

> I also remember that you were having problems finding
> large enough chunks of contiguous memory... Are these still open issues?

Ah, that is solved. :-) I guess it was because I added a field to the
page_info structure and bloated the struct too much. Then the
memory needed to hold all the page_info entries became too large and
ate up some memory that is reserved for DMA. (This is my guess because,
after I made the added field in the page_info structure smaller, it
worked. :-P Please correct me if I'm wrong.)
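
As a back-of-the-envelope check of how much a per-page field can cost,
the sketch below multiplies the number of 4KB frames by the size of the
added field. The host size and field size used in the note after it are
made-up inputs, and the calculation does not confirm the guess above;
it only shows the scale involved.

```python
PAGE_SIZE = 4096  # bytes; the 4KB page size assumed throughout this thread


def frame_table_bytes(ram_bytes, entry_bytes):
    """Memory needed for one metadata entry per 4KB frame of RAM."""
    frames = ram_bytes // PAGE_SIZE
    return frames * entry_bytes


def growth_bytes(ram_bytes, extra_bytes_per_entry):
    """Extra memory consumed when every per-frame entry grows."""
    return frame_table_bytes(ram_bytes, extra_bytes_per_entry)
```

For example, on a hypothetical 8GB host, growing every entry by 8 bytes
costs growth_bytes(8 * 2**30, 8), which is 16MB of additional memory.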

>
>> I hope to hear your opinions and incorporate them into my ongoing
>> work, rather than drifting too far away from mainstream Xen ideas. :-)
>>
> Some more thoughts. This looks like something that could work quite well
> in environments and use cases where:
>  1. things are rather static (i.e., little or no dynamic domain
>     creation, a well-defined workload inside each domain, etc.)
>  2. there are not too many domains around. I mean, if you have hundreds
>     of guests, it's very unlikely that you'll be able to arrange for a
>     similar number of properly sized partitions, isn't it?
>
> That is actually why I really think this could be useful for the
> embedded virt people: this is exactly what their environment looks
> like! :-)

Do you or anyone else have a practical use case that we can run on
RTDS? That would be an interesting case study.

>
> I think we should definitely consider merging something that will
> potentially help the emerging embedded/automotive use cases, provided
> (as usual):
>  1. the benefits are real and really useful (e.g., they are still there
>     on ARM)

I think it will work on ARM as long as we manage pages at 4KB
granularity there. As to usefulness, I'm willing to get some practical
applications running on the RTDS scheduler with the cache partitioning
mechanism. This would demonstrate that the benefit is real.

>  2. it does not disrupt other workloads, it does not impact other
>     features and it does not make the code worse (i.e., more difficult
>     to understand and to maintain)

As to static cache partitioning, I think it should not impact other
features. It is more like adding new hypercalls and functions, and
redirecting the existing memory allocation/free functions to the newly
added functions if the feature is configured.
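
The redirection described here could look roughly like the toy model
below. It is a sketch only: the class, its methods and the `enabled`
switch are hypothetical names for illustration, not the hypercalls or
functions in the actual patch, and frame numbers stand in for real
pages.

```python
from collections import defaultdict


class ColoredAllocator:
    """Toy model: free frames are bucketed by color, and a domain draws
    frames only from the colors assigned to it."""

    def __init__(self, free_frames, colors, enabled=True):
        self.colors = colors
        self.enabled = enabled  # models "if the feature is configured"
        self.free = defaultdict(list)
        for mfn in sorted(free_frames):
            self.free[mfn % colors].append(mfn)

    def alloc(self, allowed_colors=None):
        """Return one free frame number, or None if none is suitable."""
        if not self.enabled or allowed_colors is None:
            candidates = range(self.colors)  # behave like a plain allocator
        else:
            candidates = sorted(allowed_colors)
        for color in candidates:
            if self.free[color]:
                return self.free[color].pop(0)
        return None

    def free_frame(self, mfn):
        """Give a frame back to the free list of its color."""
        self.free[mfn % self.colors].append(mfn)
```

With the switch off, `alloc` ignores colors entirely, which mirrors the
point that existing behaviour is unchanged unless the feature is used.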

>
> I'd suggest, if you agree with my analysis, you try to assess 1.

As for assessing 1, do you mean that I should try to run the code
on some ARM board?

> .. Maybe
> GlobalLogic people could help, if they're interested.

I would really appreciate it if they could help! :-P
Actually, we also want to see real applications running on Xen
with the RT features.

Thank you very much for your comments and advice!

Best regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
