public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	H Peter Anvin <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>
Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support
Date: Fri, 16 Oct 2015 11:44:52 +0200	[thread overview]
Message-ID: <20151016094452.GO3816@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20151016001715.GB31794@amt.cnet>

On Thu, Oct 15, 2015 at 09:17:16PM -0300, Marcelo Tosatti wrote:
> On Thu, Oct 15, 2015 at 01:37:02PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 13, 2015 at 07:40:58PM -0300, Marcelo Tosatti wrote:
> > > How can you fix the issue of sockets with different reserved cache
> > > regions with hw in the cgroup interface?
> > 
> > No idea what you're referring to. But IOCTLs blow.
> 
> Tejun brought up syscalls. Syscalls seem too generic.
> So ioctls were chosen instead.
> 
> It is necessary to perform the following operations:
> 
> 1) create cache reservation (params = size, type).

mkdir

> 2) delete cache reservation.

rmdir

> 3) attach cache reservation (params = cache reservation id, pid).
> 4) detach cache reservation (params = cache reservation id, pid).

echo $pid > tasks

> Can it be done via cgroups? If so, works for me.

Trivially.
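Concretely, the four operations map onto plain cgroupfs file operations. A minimal sketch, assuming a CAT cgroup controller; the mount point and the `l3_cbm` file name are hypothetical (this interface was never merged in this form), and a scratch directory stands in for the real cgroup mount so the sketch runs anywhere:

```shell
# Scratch directory standing in for a (hypothetical) CAT cgroup mount.
CG=$(mktemp -d)

# 1) create a cache reservation: mkdir a group, write its bitmask.
#    (In a real cgroupfs the control files appear automatically.)
mkdir "$CG/db_reservation"
echo 0x000ff > "$CG/db_reservation/l3_cbm"    # hypothetical file name

# 3) attach a task: write its pid into the group's tasks file.
echo 1234 > "$CG/db_reservation/tasks"

# 4) detach: write the pid back into the root group's tasks file.
echo 1234 > "$CG/tasks"

# 2) delete the reservation; a real cgroup is removed with plain
#    rmdir, rm -r here only because this is a scratch directory.
rm -r "$CG/db_reservation"
```

The pid 1234 is illustrative; a real tasks file rejects pids of tasks that do not exist.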

> A list of problems with the cgroup interface has been written,
> in the thread... and we found another problem.

Which was endless and tiresome so I stopped reading.

> List of problems with cgroup interface:
> 
> 1) Global IPI on CBM <---> task change does not scale.
> 
> /*
>  * cbm_update_all() - Update the cache bit mask for all packages.
>  */
> static inline void cbm_update_all(u32 closid)
> {
>         on_each_cpu_mask(&rdt_cpumask, cbm_cpu_update, (void *)closid, 1);
> }

There is no way around that, the moment you view the CBM as a global
resource; ie. a CBM is configured the same on all sockets; you need to
do this because a task using that CBM might run on any CPU at any time.

This is not because of the cgroup interface at all. This is because you
want CBMs to be the same machine wide.

The only way to actually change that is to _be_ a cgroup and co-mount
with cpusets and be incestuous and look at the cpusets state and
discover disjoint groups.

> 2) Syscall interface specification is in kbytes, not
> cache ways (which is what must be recorded by the OS
> to allow migration of the OS between different
> hardware systems).

Meh, that again is nothing fundamental. The cgroup interface could do
bytes just the same.
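The conversion is trivial in either direction; a sketch with assumed SKU numbers (a 20-way, 20 MB L3; the real values come from CPUID leaf 0x10):

```shell
# Illustrative per-SKU numbers; real values come from CPUID leaf 0x10.
L3_BYTES=$((20 * 1024 * 1024))   # assumed 20 MB L3
CBM_LEN=20                       # assumed 20 capacity bitmask bits

bytes_per_way=$((L3_BYTES / CBM_LEN))

# Round a byte request up to whole cache ways.
request=$((3 * 1024 * 1024 + 1))   # just over 3 MB
ways=$(( (request + bytes_per_way - 1) / bytes_per_way ))
echo "$ways"   # 4
```

Rounding up to whole ways means an interface can accept bytes while the hardware still gets a way mask.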

> 3) Compilers are able to configure cache optimally for
> given ranges of code inside applications, easily,
> if desired.

Yeah, so? Every SKU has a different cache size, so once you're down to
that level you're pretty hard set in your configuration and it really
doesn't matter if you give bytes or ways, you _KNOW_ what your
configuration will be.

> 4) Problem-2: The decision to allocate cache is tied to application
> initialization / destruction, and application initialization is
> essentially random from the POV of the system (the events which trigger
> the execution of the application are not visible from the system).
> 
> Think of a machine running two different servers: one database
> with requests that are received with poisson distribution, average 30
> requests per hour, and every request takes 1 minute.
> 
> One httpd server with nearly constant load.
> 
> Without cache reservations, a database request takes 2 minutes.
> That is not acceptable for the database clients.
> But with cache reservation, a database request takes 1 minute.
> 
> You want to maximize performance of httpd and database requests.
> What do you do? You allow the database server to perform cache
> reservation once a request comes in, and to undo the reservation
> once the request is finished.

> It's impossible to perform this with a centralized interface.

Not so; just a wee bit more fragile than desired. But this is a
pre-existing problem with cgroups and needs to be solved; not using
cgroups because of this is silly.

Every cgroup that can work on tasks suffers this and arguably a few
more.

> 5) Modify scenario 2 above as follows: each database request
> is handled by two newly created threads, and they share a certain
> percentage of data cache, and a certain percentage of code cache.
> 
> So the dispatcher thread, on arrival of request, has to:
> 
>         - create data cache reservation = tcrid-A.
>         - create code cache reservation = tcrid-B.
>         - create thread-1.
>         - assign tcrid-A and B to thread-1.
>         - create thread-2.
>         - assign tcrid-A and B to thread-2.
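For what it's worth, that sequence maps onto filesystem operations just as easily. A sketch, with everything hypothetical: the split code/data hierarchies are merely one way a cgroup-style interface could express two independent reservations per thread, the tids are made up, and a scratch directory stands in for the mounts:

```shell
# Scratch tree standing in for two hypothetical cgroup hierarchies,
# one holding code-cache masks, one holding data-cache masks.
CG=$(mktemp -d)
mkdir -p "$CG/data/tcrid-A" "$CG/code/tcrid-B"

# Dispatcher: on request arrival, tag both worker threads (made-up
# tids) with the data and the code reservation.
for tid in 1001 1002; do
    echo "$tid" >> "$CG/data/tcrid-A/tasks"
    echo "$tid" >> "$CG/code/tcrid-B/tasks"
done

cat "$CG/data/tcrid-A/tasks"   # both worker tids are listed
rm -r "$CG"
```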
> 
> 6) Create reservations in such a way that the sum is larger than
> total amount of cache, and CPU pinning (example from Karen Noel):
> 
> VM-1 on socket-1 with 80% of reservation.
> VM-2 on socket-2 with 80% of reservation.
> VM-1 pinned to socket-1.
> VM-2 pinned to socket-2.
> 
> Cgroups interface attempts to set a cache mask globally. This is the
> problem the "expand" proposal solves:
> https://lkml.org/lkml/2015/7/29/682

That email is unparsable. But the only way to sanely do that is to
closely intertwine oneself with cpusets; doing that with anything
other than another cgroup controller is absolutely, full-on insane.

> 7) Consider two sockets with different region of L3 cache
> shared with HW:
> 
> — CPUID.(EAX=10H, ECX=1):EBX[31:0] reports a bit mask. Each set bit
> within the length of the CBM indicates the corresponding unit of the
> L3 allocation may be used by other entities in the platform (e.g. an
> integrated graphics engine or hardware units outside the processor
> core and have direct access to L3). Each cleared bit within the
> length of the CBM indicates the corresponding allocation unit can be
> configured to implement a priority-based allocation scheme chosen by
> an OS/VMM without interference with other hardware agents in the
> system. Bits outside the length of the CBM are reserved.
> 
> You want the kernel to maintain different bitmasks in the CBM:
> 
>         socket1 [range-A]
>         socket2 [range-B]
> 
> And the kernel will automatically switch from range A to range B
> when the thread switches sockets.

This is firmly in the insane range of things; not going to happen,
full stop.

If a thread can freely schedule between two CPUs, its configuration on
those two CPUs had better bloody be the same.

