linux-kernel.vger.kernel.org archive mirror
From: Martin Kletzander <mkletzan@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>,
	"Auld, Will" <will.auld@intel.com>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"tj@kernel.org" <tj@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"Fleming, Matt" <matt.fleming@intel.com>,
	"Williamson, Glenn P" <glenn.p.williamson@intel.com>,
	"Juvva, Kanaka D" <kanaka.d.juvva@intel.com>
Subject: Re: [PATCH 3/9] x86/intel_rdt: Cache Allocation documentation and cgroup usage guide
Date: Sun, 2 Aug 2015 17:48:07 +0200
Message-ID: <20150802154807.GA19188@wheatley>
In-Reply-To: <20150730200812.GA10832@amt.cnet>


On Thu, Jul 30, 2015 at 05:08:13PM -0300, Marcelo Tosatti wrote:
>On Thu, Jul 30, 2015 at 10:47:23AM -0700, Vikas Shivappa wrote:
>>
>>
>> Marcelo,
>>
>>
>> On Wed, 29 Jul 2015, Marcelo Tosatti wrote:
>> >
>> >How about this:
>> >
>> >desiredclos (closid  p1  p2  p3 p4)
>> >	     1       1   0   0  0
>> >	     2	     0	 0   0  1
>> >	     3	     0   1   1  0
>>
>> #1 Currently in the rdt cgroup, the root cgroup always has all the
>> bits set and can't be changed (the cgroup hierarchy would by default
>> force this, since all the children need to have a subset of the
>> root's bitmask). So if the user creates a cgroup and does not put
>> any task in it, the tasks in the root cgroup could still be using
>> that part of the cache. That's the reason I say we can't have
>> really 'exclusive' masks.
>>
>> Or in other words - there is always a desired clos (0) which has all
>> parts set and acts like a default pool.
>>
>> Also the parts can overlap.  Please apply this to all the comments
>> below, as it changes the way they work.
>>
>> >
>> >p means part.
>>
>> I am assuming p = (a contiguous cache capacity bit mask)
>
>Yes.
>
>> >closid 1 is an exclusive cgroup.
>> >closid 2 is a "cache hog" class.
>> >closid 3 is "default closid".
>> >
>> >Desiredclos is what user has specified.
>> >
>> >Transition 1: desiredclos --> effectiveclos
>> >Clean all bits of unused closid's
>> >(that must be updated whenever a
>> >closid1 cgroup goes from empty->nonempty
>> >and vice-versa).
>> >
>> >effectiveclos (closid  p1  p2  p3 p4)
>> >	       1       0   0   0  0
>> >	       2       0   0   0  1
>> >	       3       0   1   1  0
>>
>> >
>> >Transition 2: effectiveclos --> expandedclos
>> >expandedclos (closid  p1  p2  p3 p4)
>> >	       1       0   0   0  0
>> >	       2       0   0   0  1
>> >	       3       1   1   1  0
>> >Then you have different inplacecos for each
>> >CPU (see pseudo-code below):
>> >
>> >On the following events.
>> >
>> >- task migration to new pCPU:
>> >- task creation:
>> >
>> >	id = smp_processor_id();
>> >	for (part = desiredclos.p1; ...; part++)
>> >		/* if my cosid is set and any other
>> >	   	   cosid is clear, for the part,
>> >		   synchronize desiredclos --> inplacecos */
>> >		if (part[mycosid] == 1 &&
>> >		    part[any_othercosid] == 0)
>> >			wrmsr(part, desiredclos);
>> >
>>
>> Currently the root cgroup would have all the bits set, which will act
>> like a default cgroup where all the otherwise unused parts (assuming
>> they are a set of contiguous cache capacity bits) will be used.
>>
>> Otherwise the question with expandedclos is: who decides to expand
>> the closx parts to include some of the unused parts - could that
>> just always be the default root?
>
>Right, so the problem is that for certain closid's you might never
>want to expand (because doing so would cause data to be cached in a
>cache way which might have a high eviction rate in the future).
>See the example from Will.
>
>But for the default class (that is, "unclassified applications"),
>I suppose it is beneficial to expand in most cases, that is, to use
>the maximum amount of cache irrespective of eviction rate, which
>is the behaviour that exists now without CAT.
>
>So perhaps a new flag "expand=y/n" can be added to the cgroup
>directories... What do you say?
>
>Userspace representation of CAT
>-------------------------------
>
>Usage model:
>1) measure application performance without L3 cache reservation.
>2) measure application perf with L3 cache reservation and
>X number of cache ways until desired performance is attained.
>
>Requirements:
>1) Persistence of CLOS configuration across hardware. On migration
>of the operating system or an application between different hardware
>systems, we'd like the following to be maintained:
>        - exclusive number of bytes (*) reserved to a certain CLOSid.
>        - shared number of bytes (*) reserved between a certain group
>          of CLOSid's.
>
>For both code and data, rounded down or up in cache way size.
>
>2) Reasoning:
>Different CBM masks in different hardware platforms might be necessary
>to specify the same CLOS configuration, in terms of exclusive number of
>bytes and shared number of bytes. (cache-way rounded number of bytes).
>For example, due to L3 allocation by other hardware entities in certain parts
>of the cache it might be necessary to relocate CBM mask to achieve
>the same CLOS configuration.
>
>3) Proposed format:
>

A few questions from a random listener; I apologise if some of them
are in the wrong place due to me missing some information from past
threads.

I'm not sure whether the following proposal for the format is the
internal structure or what's going to be in cgroups.  If this is a
user-visible interface, I think it could be a little less detailed.

>sharedregionK.exclusive - Number of exclusive cache bytes reserved for
>			shared region.
>sharedregionK.excl_data - Number of exclusive cache data bytes reserved for
>	 		shared region.
>sharedregionK.excl_code - Number of exclusive cache code bytes reserved for
>			shared region.
>sharedregionK.round_down - Round down to cache way bytes from respective number
>		     specification (default is round up).
>sharedregionK.expand - y/n - Expand shared region to more cache ways
> 			when available (default N).
>
>cgroupN.exclusive - Number of exclusive L3 cache bytes reserved
>		    for cgroup.
>cgroupN.excl_data - Number of exclusive L3 data cache bytes reserved
>		    for cgroup.
>cgroupN.excl_code - Number of exclusive L3 code cache bytes reserved
>		    for cgroup.

By exclusive, you mean that it's exclusive to the tasks in this
cgroup?

The thing is that we must differentiate between limiting some
processes from hogging the cache (like example 2 below) and making
some part of the cache exclusive to a particular application (example
1 below).

I just hope we won't need to add something similar to 'isolcpus=' just
so we can make sure none of the tasks in the root cgroup can spoil the
part of the cache we need to have exclusive.

I'm not sure creating a new subgroup and moving all the tasks there
would work; it certainly is not possible with other cgroups, like the
cpuset cgroup mentioned beforehand.

I also don't quite fully understand how the co-mounting with the
cpuset cgroup should work, but that's not design-related.

One more question: how does this work on systems with multiple L3
caches (e.g. large NUMA systems)?  I'm guessing that if the process is
running only on some CPUs, the wrmsr() will be called on those
particular CPUs, right?

>cgroupN.round_down - Round down to cache way bytes from respective number
>		     specification (default is round up).
>cgroupN.expand - y/n - Expand shared region to more cache ways when
>		       available (default N).
>cgroupN.shared = { sharedregion1, sharedregion2, ... } (list of shared
>regions)
>
>Example 1:
>One application with 2M exclusive cache, two applications
>with 1M exclusive each, sharing an expansive shared region of 1M.
>
>cgroup1.exclusive = 2M
>
>sharedregion1.exclusive = 1M
>sharedregion1.expand = Y
>
>cgroup2.exclusive = 1M
>cgroup2.shared = sharedregion1
>
>cgroup3.exclusive = 1M
>cgroup3.shared = sharedregion1
>
>Example 2:
>3 high performance applications running, one of which is a cache hog
>with no cache locality.
>
>cgroup1.exclusive = 8M
>cgroup2.exclusive = 8M
>
>cgroup3.exclusive = 512K
>cgroup3.round_down = Y
>
>In all cases the default cgroup (which requires no explicit
>specification) is expansive and uses the remaining cache
>ways, including the ways shared by other hardware entities.
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/


