public inbox for linux-kernel@vger.kernel.org
From: Marcelo Tosatti <mtosatti@redhat.com>
To: "Yu, Fenghua" <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"x86@kernel.org" <x86@kernel.org>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	"Shivappa, Vikas" <vikas.shivappa@intel.com>,
	Tejun Heo <tj@kernel.org>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [RFD] CAT user space interface revisited
Date: Wed, 23 Dec 2015 08:28:20 -0200	[thread overview]
Message-ID: <20151223102820.GA2559@amt.cnet> (raw)
In-Reply-To: <3E5A0FA7E9CA944F9D5414FEC6C712205DF4E157@ORSMSX106.amr.corp.intel.com>

On Tue, Dec 22, 2015 at 06:12:05PM +0000, Yu, Fenghua wrote:
> > From: Thomas Gleixner [mailto:tglx@linutronix.de]
> > Sent: Wednesday, November 18, 2015 10:25 AM
> > Folks!
> > 
> > After rereading the mail flood on CAT and staring into the SDM for a while, I
> > think we all should sit back and look at it from scratch again w/o our
> > preconceptions - I certainly had to put my own away.
> > 
> > Let's look at the properties of CAT again:
> > 
> >    - It's a per socket facility
> > 
> >    - CAT slots can be associated to external hardware. This
> >      association is per socket as well, so different sockets can have
> >      different behaviour. I missed that detail when staring at it the
> >      first time, thanks for the pointer!
> > 
> >    - The association itself is per CPU. The COS selection happens on a
> >      CPU while the set of masks which are selected via COS are shared
> >      by all CPUs on a socket.
> > 
> > There are restrictions which CAT imposes in terms of configurability:
> > 
> >    - The bits which select a cache partition need to be consecutive
> > 
> >    - The number of possible cache association masks is limited
> > 
> > Let's look at the configurations (CDP omitted and size restricted)
> > 
> > Default:   1 1 1 1 1 1 1 1
> > 	   1 1 1 1 1 1 1 1
> > 	   1 1 1 1 1 1 1 1
> > 	   1 1 1 1 1 1 1 1
> > 
> > Shared:	   1 1 1 1 1 1 1 1
> > 	   0 0 1 1 1 1 1 1
> > 	   0 0 0 0 1 1 1 1
> > 	   0 0 0 0 0 0 1 1
> > 
> > Isolated:  1 1 1 1 0 0 0 0
> > 	   0 0 0 0 1 1 0 0
> > 	   0 0 0 0 0 0 1 0
> > 	   0 0 0 0 0 0 0 1
> > 
> > Or any combination thereof. Surely some combinations will not make any
> > sense, but we really should not make any restrictions on the stupidity of a
> > sysadmin. The worst outcome might be L3 disabled for everything, so what?
> > 
> > Now that gets even more convoluted if CDP comes into play and we really
> > need to look at CDP right now. We might end up with something which looks
> > like this:
> > 
> >    	   1 1 1 1 0 0 0 0	Code
> > 	   1 1 1 1 0 0 0 0	Data
> > 	   0 0 0 0 0 0 1 0	Code
> > 	   0 0 0 0 1 1 0 0	Data
> > 	   0 0 0 0 0 0 0 1	Code
> > 	   0 0 0 0 1 1 0 0	Data
> > or
> > 	   0 0 0 0 0 0 0 1	Code
> > 	   0 0 0 0 1 1 0 0	Data
> > 	   0 0 0 0 0 0 0 1	Code
> > 	   0 0 0 0 0 1 1 0	Data
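The code/data pairing above is why CDP halves the number of usable COS ids. A sketch of the MSR indexing, assuming the SDM layout where the L3 mask MSRs start at IA32_L3_MASK_0 (0xC90) and CDP assigns each COS id a pair of them (even index for data, odd for code):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_L3_MASK_0 0xC90u  /* base of the L3 mask MSR array */

/* With CDP enabled, COS id n consumes mask MSRs 2n (data) and
 * 2n+1 (code), so 16 mask MSRs yield only 8 COS ids. */
static uint32_t cdp_data_mask_msr(unsigned int cos)
{
    return MSR_IA32_L3_MASK_0 + 2 * cos;
}

static uint32_t cdp_code_mask_msr(unsigned int cos)
{
    return MSR_IA32_L3_MASK_0 + 2 * cos + 1;
}
```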
> > 
> > Let's look at partitioning itself. We have two options:
> > 
> >    1) Per task partitioning
> > 
> >    2) Per CPU partitioning
> > 
> > So far we only talked about #1, but I think that #2 has a value as well. Let me
> > give you a simple example.
> > 
> > Assume that you have isolated a CPU and run your important task on it. You
> > give that task a slice of cache. Now that task needs kernel services which run
> > in kernel threads on that CPU. We really don't want to (and cannot) hunt
> > down random kernel threads (think cpu bound worker threads, softirq
> > threads ....) and give them another slice of cache. What we really want is:
> > 
> >     	 1 1 1 1 0 0 0 0    <- Default cache
> > 	 0 0 0 0 1 1 1 0    <- Cache for important task
> > 	 0 0 0 0 0 0 0 1    <- Cache for CPU of important task
> > 
> > It would even be sufficient for particular use cases to just associate a piece of
> > cache with a given CPU and not bother with tasks at all.
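The per-CPU case described above amounts to a simple fallback rule. The names below are hypothetical, purely to sketch the idea: a task with an explicitly assigned COS id keeps it, and any other task running on the CPU (kernel threads, softirq threads, ...) falls back to the CPU's default COS id, so nobody has to hunt those threads down.

```c
#include <assert.h>

#define COS_NONE (-1)  /* task has no explicit COS assignment */

/* Hypothetical selection rule: per-task COS wins if set,
 * otherwise the CPU's default COS applies. */
static int effective_cos(int task_cos, int cpu_cos)
{
    return task_cos != COS_NONE ? task_cos : cpu_cos;
}
```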
> > 
> > We really need to make this as configurable as possible from userspace
> > without imposing random restrictions to it. I played around with it on my new
> > intel toy and the restriction to 16 COS ids (that's 8 with CDP
> > enabled) makes it really useless if we force the ids to have the same meaning
> > on all sockets and restrict it to per task partitioning.
> > 
> > Even if next generation systems will have more COS ids available, there are
> > not going to be enough to have a system wide consistent view unless we
> > have COS ids > nr_cpus.
> > 
> > Aside from that, I don't think that a system wide consistent view is useful at all.
> > 
> >  - If a task migrates between sockets, it's going to suffer anyway.
> >    Real sensitive applications will simply pin tasks on a socket to
> >    avoid that in the first place. If we make the whole thing
> >    configurable enough then the sysadmin can set it up to support
> >    even the nonsensical case of identical cache partitions on all
> >    sockets and let tasks use the corresponding partitions when
> >    migrating.
> > 
> >  - The number of cache slices is going to be limited no matter what,
> >    so one still has to come up with a sensible partitioning scheme.
> > 
> >  - Even if we have enough cos ids the system wide view will not make
> >    the configuration problem any simpler as it remains per socket.
> > 
> > It's hard. Policies are hard by definition, but this one is harder than most
> > other policies due to the inherent limitations.
> > 
> > So now to the interface part. Unfortunately we need to expose this very
> > close to the hardware implementation as there are really no abstractions
> > which allow us to express the various bitmap combinations. Any abstraction I
> > tried to come up with renders that thing completely useless.
> > 
> > I was not able to identify any existing infrastructure where this really fits in. I
> > chose a directory/file based representation. We certainly could do the same
> 
> Would this be under /sys/devices/system/?
> Then we create a qos/cat directory there. In the future, other directories
> may be created, e.g. qos/mbm?
> 
> Thanks.
> 
> -Fenghua

Fenghua, 

I suppose Thomas is talking about the socketmask only, as discussed in
the call with Intel.

Thomas, is that correct? (If you want a change in the directory structure,
please explain why, because as far as I can see we don't need one.)





Thread overview: 32+ messages
2015-11-18 18:25 [RFD] CAT user space interface revisited Thomas Gleixner
2015-11-18 19:38 ` Luiz Capitulino
2015-11-18 19:55   ` Auld, Will
2015-11-18 22:34 ` Marcelo Tosatti
2015-11-19  0:34   ` Marcelo Tosatti
2015-11-19  8:35     ` Thomas Gleixner
2015-11-19 13:44       ` Luiz Capitulino
2015-11-20 14:15       ` Marcelo Tosatti
2015-11-19  8:11   ` Thomas Gleixner
2015-11-19  0:01 ` Marcelo Tosatti
2015-11-19  1:05   ` Marcelo Tosatti
2015-11-19  9:09     ` Thomas Gleixner
2015-11-19 20:59       ` Marcelo Tosatti
2015-11-20  7:53         ` Thomas Gleixner
2015-11-20 17:51           ` Marcelo Tosatti
2015-11-19 20:30     ` Marcelo Tosatti
2015-11-19  9:07   ` Thomas Gleixner
2015-11-24  8:27   ` Chao Peng
     [not found]     ` <20151124212543.GA11303@amt.cnet>
2015-11-25  1:29       ` Marcelo Tosatti
2015-11-24  7:31 ` Chao Peng
2015-11-24 23:06   ` Marcelo Tosatti
2015-12-22 18:12 ` Yu, Fenghua
2015-12-23 10:28   ` Marcelo Tosatti [this message]
2015-12-29 12:44     ` Thomas Gleixner
2015-12-31 19:22       ` Marcelo Tosatti
2015-12-31 22:30         ` Thomas Gleixner
2016-01-04 17:20           ` Marcelo Tosatti
2016-01-04 17:44             ` Marcelo Tosatti
2016-01-05 23:09             ` Thomas Gleixner
2016-01-06 12:46               ` Marcelo Tosatti
2016-01-06 13:10                 ` Tejun Heo
2016-01-08 20:21               ` Thomas Gleixner
