From: Dario Faggioli <dario.faggioli@citrix.com>
To: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
    Ian Campbell <Ian.Campbell@citrix.com>,
    George Dunlap <George.Dunlap@citrix.com>,
    "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
    "JBeulich@suse.com" <JBeulich@suse.com>,
    "chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>
Subject: Re: [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities
Date: Tue, 7 Apr 2015 13:10:01 +0000
Message-ID: <1428412199.5671.94.camel@citrix.com>
In-Reply-To: <5523B0FB.8020509@citrix.com>
On Tue, 2015-04-07 at 11:27 +0100, Andrew Cooper wrote:
> On 04/04/2015 03:14, Dario Faggioli wrote:
>
> > I'm putting here in the cover letter a markdown document I wrote to better
> > describe my findings and ideas (sorry if it's a bit long! :-D). You can also
> > fetch it at the following links:
> >
> > * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.pdf
> > * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.markdown
> >
> > See the document itself and the changelog of the various patches for details.
>
> There seem to be several areas of confusion indicated in your document.
>
I see. Sorry for that then.
> I am unsure whether this is a side effect of the way you have written
> it, but here are (hopefully) some words of clarification.
>
And thanks for this. :-)
> PSR CMT works by tagging cache lines with the currently-active RMID.
> The cache utilisation is a count of the number of lines which are tagged
> with a specific RMID. MBM on the other hand counts the number of cache
> line fills and cache line evictions tagged with a specific RMID.
>
Ok.
> By this nature, the information will never reveal the exact state of
> play. e.g. a core with RMID A which gets a cache line hit against a
> line currently tagged with RMID B will not alter any accounting.
>
So, you're saying that the information we get is an approximation of
reality, not a 100% accurate representation of it. That is no news, IMO.
When, inside Credit2, we try to track the average load on each runqueue,
that is an approximation. When, in Credit1, we consider a vcpu "cache
hot" if it ran recently, that is an approximation. Etc. These
approximations happen fully in software, because that is possible in
those cases.

PSR provides data and insights on something that, without hardware
support, we couldn't possibly hope to know anything about. Whether or
not we should think about using such data depends on whether they
represent a (basis for a) reasonable enough approximation, or are
just a bunch of pseudo-random numbers.

It seems to me that you are suggesting the latter to be more likely than
the former, i.e., that PSR does not provide a good enough approximation
for being used from inside Xen and the toolstack. Is my understanding
correct?
> Furthermore, as alterations of the RMID only occur in
> __context_switch(), Xen actions such as handling an interrupt will be
> accounted against the currently active domain (or other future
> granularity of RMID).
>
Yes, I thought about this. This is certainly important for per-domain,
or for an (unlikely) future per-vcpu, monitoring, but if you attach an
RMID to a pCPU (or to a group of pCPUs) then it is not really a
problem.

Actually, it's the correct behavior: running Xen and serving interrupts
on a certain core, in that case, *do* need to be accounted! So,
considering that both the document and the RFC series are mostly focused
on introducing per-pcpu/core/socket monitoring, rather than on
per-domain monitoring, and given that the document was becoming quite
long, I decided not to add a section about this.
> "max_rmid" is a per-socket property. There is no requirement for it to
> be the same for each socket in a system, although it is likely, given a
> homogeneous system.
>
I know. Again, this was not mentioned for document length reasons, but I
planned to ask about it (and in fact I already did this morning, as
you can see. :-D).

In this case, though, it probably was something worth mentioning,
so I will mention it if there is ever a v2 of the document. :-)

Mostly, I was curious to learn why that is not reflected in the current
implementation, i.e., whether there are any reasons why we should not
take advantage of the per-socketness of RMIDs, as reported by the SDM,
as that can greatly help mitigate RMID shortage in the
per-CPU/core/socket configuration (in general, actually, but it's
per-cpu that I'm interested in).
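The arithmetic behind that point can be sketched as follows. Everything
here is an illustrative assumption: the function name
`monitorable_cpus`, the machine shape (4 sockets, 36 pCPUs each) and the
max_rmid value of 63 are made up, and RMID 0 is assumed to be reserved,
leaving RMIDs 1..max_rmid usable.

```python
def monitorable_cpus(sockets, cpus_per_socket, max_rmid, per_socket):
    """How many pCPUs can each get an RMID of their own.

    Assumes RMID 0 is reserved, so max_rmid RMIDs are usable.
    """
    if per_socket:
        # Each socket has its own RMID space, so the same RMID value
        # can be handed out once on every socket.
        return sockets * min(cpus_per_socket, max_rmid)
    # One global pool: an RMID value is consumed system-wide.
    return min(sockets * cpus_per_socket, max_rmid)


# Hypothetical box: 4 sockets x 36 pCPUs, max_rmid = 63.
print(monitorable_cpus(4, 36, 63, per_socket=False))  # prints: 63
print(monitorable_cpus(4, 36, 63, per_socket=True))   # prints: 144
```

With a single global pool, fewer than half of the pCPUs of this
hypothetical box could be monitored; with per-socket pools all of them
could.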
> The limit on RMID is based on the size of the
> accounting table.
>
I did not know the details, but it makes sense. Getting feedback on
what should be expected as the number of available RMIDs in current and
future hardware, from Intel people and from everyone who knows (like
you :-D ), was the main purpose of sending this out, so thanks.
> As far as MSRs themselves go, an extra MSR write in the context switch
> path is likely to pale into the noise. However, querying the data is an
> indirect MSR read (write to the event select MSR, read from the data
> MSR). Furthermore there is no way to atomically read all data at once
> which means that activity on other cores can interleave with
> back-to-back reads in the scheduler.
>
All true. And in fact, how, and how frequently, data should be gathered
remains to be decided (as said in the document). I was thinking more of
some periodic sampling, rather than of throwing handfuls of rdmsr/wrmsr
at the code that makes scheduling decisions! :-D
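The indirect read described above, and why a batch of such reads is not
an atomic snapshot, can be sketched with a simulated socket. The MSR
addresses and bit layout follow my reading of the SDM (IA32_QM_EVTSEL at
0xC8D, IA32_QM_CTR at 0xC8E, event ID 1 for L3 occupancy); the
`FakeSocket` class and the helpers `read_occupancy()` and `sample_all()`
are hypothetical and are not Xen code. The raw counter would also still
need scaling by the factor reported via CPUID, which is left out here.

```python
IA32_QM_EVTSEL = 0xC8D  # event select: event ID in bits 7:0, RMID in 41:32
IA32_QM_CTR = 0xC8E     # data: bit 63 = error, bit 62 = unavailable
EVT_L3_OCCUPANCY = 0x1


class FakeSocket:
    """Hypothetical stand-in for one socket's monitoring MSRs."""

    def __init__(self, occupancy_by_rmid):
        self.occupancy_by_rmid = occupancy_by_rmid
        self.evtsel = 0

    def wrmsr(self, msr, value):
        if msr != IA32_QM_EVTSEL:
            raise ValueError("MSR not modelled in this sketch")
        self.evtsel = value

    def rdmsr(self, msr):
        if msr != IA32_QM_CTR:
            raise ValueError("MSR not modelled in this sketch")
        event = self.evtsel & 0xFF
        rmid = (self.evtsel >> 32) & 0x3FF
        if event != EVT_L3_OCCUPANCY or rmid not in self.occupancy_by_rmid:
            return 1 << 63  # report an error for unknown event/RMID
        return self.occupancy_by_rmid[rmid]


def read_occupancy(sock, rmid):
    """Two-step read: select (RMID, event), then read the counter."""
    sock.wrmsr(IA32_QM_EVTSEL, (rmid << 32) | EVT_L3_OCCUPANCY)
    raw = sock.rdmsr(IA32_QM_CTR)
    if raw & (0x3 << 62):  # error or unavailable
        return None
    return raw


def sample_all(sock, rmids):
    """One periodic sample. RMIDs are read one after another, so other
    cores can change the counters between reads: not a snapshot."""
    return {rmid: read_occupancy(sock, rmid) for rmid in rmids}


sock = FakeSocket({1: 120, 2: 340})
print(sample_all(sock, [1, 2, 3]))  # prints: {1: 120, 2: 340, 3: None}
```

Calling something like `sample_all()` from a timer, and letting the
scheduler consume the last stored sample, keeps the MSR traffic out of
the scheduling hot path.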
> As far as the plans here go, I have some concerns. PSR is only
> available on server platforms, which will be 2/4 socket systems with
> large numbers of cores. As you have discovered, there are insufficient
> RMIDs for redbrick pcpus, and on a system that size, XenServer typically
> gets 7x vcpus to pcpus.
>
> I think it is unrealistic to expect to use any scheduler scheme which is
> per-pcpu or per-vcpu while the RMID limit is as small as it is.
>
On the per-vcpu schemes, I fully agree. However, it was necessary to
mention them, IMO, and explain why that is the case... Being able to
monitor single vCPUs would be pretty cool, and whether it is possible
is likely one of the first things that someone looking at this
technology for the first time would want to know. It's not, and I
thought that not stating so, and not explaining the reasons why, would
have been quite a deficiency of such a document.

On per-pcpu schemes, I mostly agree. Although exploiting the per-socket
nature of RMIDs, if possible, seems to offer a viable solution.

What I'm not sure I got is your opinion on per-pcpu or per-socket
schemes.
> Depending on workload, even a per-domain scheme might be problematic.
> One of our tests involves running 500xWin7 VMs on that particular box.
>
Yep. And in fact, I didn't even mention using any per-domain scheme for
scheduling, as it has the same disadvantages as per-vcpu schemes in
terms of RMID usage (a few multi-vcpu domains == many single-vcpu
domains), and it's useless for the scheduler, which barely knows what
a domain is.

Regards, and thanks a lot for your feedback. :-)
Dario