All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"x86@kernel.org" <x86@kernel.org>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	"Shivappa, Vikas" <vikas.shivappa@intel.com>,
	Tejun Heo <tj@kernel.org>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [RFD] CAT user space interface revisited
Date: Mon, 4 Jan 2016 15:20:54 -0200	[thread overview]
Message-ID: <20160104172054.GA21926@amt.cnet> (raw)
In-Reply-To: <alpine.DEB.2.11.1512312251430.28591@nanos>

On Thu, Dec 31, 2015 at 11:30:57PM +0100, Thomas Gleixner wrote:
> Marcelo,
> 
> On Thu, 31 Dec 2015, Marcelo Tosatti wrote:
> 
> First of all thanks for the explanation.
> 
> > There is one directory structure in this topic, CAT. That is the
> > directory structure which is exposed to userspace to control the 
> > CAT HW. 
> > 
> > With the current patchset posted by Intel ("Subject: [PATCH V16 00/11]
> > x86: Intel Cache Allocation Technology Support"), the directory
> > structure there (the files and directories exposed by that patchset)
> > (*1) does not allow one to configure different CBM masks on each socket
> > (that is, it forces the user to configure the same mask CBM on every
> > socket). This is a blocker for us, and it is one of the points in your
> > proposal.
> > 
> > There was a call between Red Hat and Intel where it was communicated
> > to Intel, and Intel agreed, that it was necessary to fix this (fix this
> > == allow different CBM masks on different sockets).
> > 
> > Now, that is one change to the current directory structure (*1).
> 
> I don't have an idea how that would look like. The current structure is a
> cgroups based hierarchy oriented approach, which does not allow simple things
> like
> 
> T1	00001111
> T2	00111100
> 
> at least not in a way which is natural to the problem at hand.



	cgroupA/

		cbm_mask  (if set, set for all CPUs)

		socket1/cbm_mask
		socket2/cbm_mask
		...
		socketN/cbm_mask (if set, overrides global
		cbm_mask).

Something along those lines.

Do you see any problem with it?

> > (*1) modified to allow for different CBM masks on different sockets, 
> > lets say (*2), is what we have been waiting for Intel to post. 
> > It would handle our usecase, and all use-cases which the current
> > patchset from Intel already handles (Vikas posted emails mentioning
> > there are happy users of the current interface, feel free to ask 
> > him for more details).
> 
> I cannot imagine how that modification to the current interface would solve
> that. Not to talk about per CPU associations which are not related to tasks at
> all.

Not sure what you mean by per CPU associations.

If you fix a cbmmask on a given pCPU, say CPU1, and control which tasks
run on that pCPU, then you control the cbmmask for all tasks (say
tasklist-1) on that CPU, fine.

Can achieve the same by putting all tasks from tasklist-1 into a
cgroup.

> > What i have asked you, and you replied "to go Google read my previous
> > post" is this:
> > What are the advantages over you proposal (which is a completely
> > different directory structure, requiring a complete rewrite),
> > over (*2) ?
> > 
> > (what is my reason behind this: the reason is that if you, with
> > maintainer veto power, forces your proposal to be accepted, it will be
> > necessary to wait for another rewrite (a new set of problems, fully
> > think through your proposal, test it, ...) rather than simply modify an
> > already known, reviewed, already used directory structure.
> > 
> > And functionally, your proposal adds nothing to (*2) (other than, well,
> > being a different directory structure).
> 
> Sorry. I cannot see at all how a modification to the existing interface would
> cover all the sensible use cases I described in a coherent way. I really want
> to see a proper description of the interface before people start hacking on it
> in a frenzy. What you described is: "let's say (*2)" modification. That's
> pretty meager.
> 
> > If Fenghua or you post a patchset, say in 2 weeks, with your proposal,
> > i am fine with that. But i since i doubt that will be the case, i am 
> > pushing for the interface which requires the least amount of changes
> > (and therefore the least amount of time) to be integrated.
> > 
> > >From your email:
> > 
> > "It would even be sufficient for particular use cases to just associate
> > a piece of cache to a given CPU and do not bother with tasks at all.
> > 
> > We really need to make this as configurable as possible from userspace
> > without imposing random restrictions to it. I played around with it on
> > my new intel toy and the restriction to 16 COS ids (that's 8 with CDP
> > enabled) makes it really useless if we force the ids to have the same
> > meaning on all sockets and restrict it to per task partitioning."
> > 
> > Yes, thats the issue we hit, that is the modification that was agreed
> > with Intel, and thats what we are waiting for them to post.
> 
> How do you implement the above - especially that part:
> 
>  "It would even be sufficient for particular use cases to just associate a
>   piece of cache to a given CPU and do not bother with tasks at all."
> 
> as a "simple" modification to (*1) ?

As noted above.
>  
> > > I described a directory structure for that qos/cat stuff in my proposal and
> > > that's complete AFAICT.
> > 
> > Ok, lets make the job for the submitter easier. You are the maintainer,
> > so you decide.
> > 
> > Is it enough for you to have (*2) (which was agreed with Intel), or 
> > would you rather prefer to integrate the directory structure at 
> > "[RFD] CAT user space interface revisited" ?
> 
> The only thing I care about as a maintainer is, that we merge something which
> actually reflects the properties of the hardware and gives the admin the
> required flexibility to utilize it fully. I don't care at all if it's my
> proposal or something else which allows to do the same.
> 
> Let me copy the relevant bits from my proposal here once more and let me ask
> questions to the various points so you can tell me how that modification to
> (*1) is going to deal with that.
> 
> >> At top level:
> >>
> >>  xxxxxxx/cat/max_cosids <- Assume that all CPUs are the same
> >>  xxxxxxx/cat/max_maskbits <- Assume that all CPUs are the same

This can be exposed to userspace via a file.

> >>  xxxxxxx/cat/cdp_enable <- Depends on CDP availability
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

Intel has come up with a scheme to implement CDP. I'll go read 
that and reply to this email afterwards.

> >> Per socket data:
> >>
> >>  xxxxxxx/cat/socket-0/
> >>  ...
> >>  xxxxxxx/cat/socket-N/l3_size
> >>  xxxxxxx/cat/socket-N/hwsharedbits
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

l3_size: userspace can figure that by itself (exposed somewhere in
sysfs).

hwsharedbits: All userspace needs to know is
which bits are shared with HW, to decide whether or not to use that
region of a given socket for a given cbmmask.

So expose that userspace, fine. Can do that in cgroups.

> >> Per socket mask data:
> >>
> >>  xxxxxxx/cat/socket-N/cos-id-0/
> >>  ...
> >>  xxxxxxx/cat/socket-N/cos-id-N/inuse
> >>                               /cat_mask
> >>                               /cdp_mask <- Data mask if CDP enabled
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

Unsure - will reply in next email (but per-socket information seems
independent of that).

> 
> >> Per cpu default cos id for the cpus on that socket:
> >> 
> >>  xxxxxxx/cat/socket-N/cpu-x/default_cosid
> >>  ...
> >>  xxxxxxx/cat/socket-N/cpu-N/default_cosid
> >>
> >> The above allows a simple cpu based partitioning. All tasks which do
> >> not have a cache partition assigned on a particular socket use the
> >> default one of the cpu they are running on.
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

Not required because with current Intel patchset you'd do:


# mount | grep rdt
cgroup on /sys/fs/cgroup/intel_rdt type cgroup
(rw,nosuid,nodev,noexec,relatime,intel_rdt)
# cd /sys/fs/cgroup/intel_rdt
# ls
cgroupALL              cgroup.procs  cgroup.sane_behavior
notify_on_release  tasks
cgroup.clone_children  cgroupRSVD    intel_rdt.l3_cbm	   release_agent
# cat tasks
1042
1066
1067
1069
...
# cd cgroupALL/
ps auxw  | while read i; do echo $i ; done
| cut -f 2 -d " "   | grep -v PID | while read x ; do echo $x > tasks;
done
-bash: echo: write error: No such process
-bash: echo: write error: No such process
-bash: echo: write error: No such process
-bash: echo: write error: No such process

# cat ../tasks | while read i; do echo $i > tasks; done
# cat ../tasks  | wc -l
0
(no tasks on root cgroup)

# cd ../cgroupRSVD
# cgroupRSVD]# cat tasks
# ps auxw | grep postfix
root	   1942  0.0  0.0  91136  4860 ?        Ss   Nov25   0:00
/usr/libexec/postfix/master -w
postfix    1981  0.0  0.0  91308  6520 ?        S    Nov25   0:00 qmgr
-l -t unix -u
postfix    4416  0.0  0.0  91240  6296 ?        S    17:05   0:00 pickup
-l -t unix -u
root	   4486  0.0  0.0 112652  2304 pts/0    S+   17:31   0:00 grep
--color=auto postfix
# echo 4416 > tasks
# cat intel_rdt.l3_cbm
000ffff0
# cat ../cgroupALL/intel_rdt.l3_cbm
000000ff

Bits f0 are shared between cgroupRSVD and cgroupALL. Lets change:
# echo 0xf > ../cgroupALL/intel_rdt.l3_cbm
# cat ../cgroupALL/intel_rdt.l3_cbm
0000000f

Now they share none.

------

So i have created "cgroupALL" (what you call default_cosid) and
"cgroupRSVD".

> 
> >> Now for the task(s) partitioning:
> >>
> >>  xxxxxxx/cat/partitions/
> >>
> >> Under that directory one can create partitions
> >> 
> >>  xxxxxxx/cat/partitions/p1/tasks
> >>                           /socket-0/cosid
> >>                           ...
> >>                           /socket-n/cosid
> >> 
> >> The default value for the per socket cosid is COSID_DEFAULT, which
> >> causes the task(s) to use the per cpu default id. 
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.
> 
> Yes. I ask the same question several times and I really want to see the
> directory/interface structure which solves all of the above before anyone
> starts to implement it. 

I don't see the problem, have a sequence of commands above which shows
to set a directory structure which is useful and does what the HW 
interface is supposed to do.

> We already have a completely useless interface (*1)
> and there is no point to implement another one based on it (*2) just because
> it solves your particular issue and is the fastest way forward. User space
> interfaces are hard and we really do not need some half baken solution which
> we have to support forever.

Fine. Can you please tell me what i can't do with the current interface?
AFAICS everything can be done (except missing support for (*2)).

> 
> Let me enumerate the required points again:
> 
>     1) Information about the hardware properties

Fine. Intel should expose that information.

>     2) Integration of CAT and CDP 

Fine, Intel has comeup with a directory structure for that, 
let me read the patchset again and i'll reply to you.

>     3) Per socket cos-id partitioning

(*2) as listed in the beginning of this e-mail.

>     4) Per cpu default cos-id association

This already exists, and as noted in the command sequence above,
works just fine. Please explain what problem are you seeing.

>     5) Task association to cos-id

Not sure what that means. Please explain.

> 
> Can you please explain in a simple directory based scheme, like the one I gave
> you how all of these points are going to be solved with a modification to (*1)?
> 
> Thanks,
> 
> 	tglx

Thanks Thomas, this style discussion is quite useful.


  reply	other threads:[~2016-01-04 17:21 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-18 18:25 [RFD] CAT user space interface revisited Thomas Gleixner
2015-11-18 19:38 ` Luiz Capitulino
2015-11-18 19:55   ` Auld, Will
2015-11-18 22:34 ` Marcelo Tosatti
2015-11-19  0:34   ` Marcelo Tosatti
2015-11-19  8:35     ` Thomas Gleixner
2015-11-19 13:44       ` Luiz Capitulino
2015-11-20 14:15       ` Marcelo Tosatti
2015-11-19  8:11   ` Thomas Gleixner
2015-11-19  0:01 ` Marcelo Tosatti
2015-11-19  1:05   ` Marcelo Tosatti
2015-11-19  9:09     ` Thomas Gleixner
2015-11-19 20:59       ` Marcelo Tosatti
2015-11-20  7:53         ` Thomas Gleixner
2015-11-20 17:51           ` Marcelo Tosatti
2015-11-19 20:30     ` Marcelo Tosatti
2015-11-19  9:07   ` Thomas Gleixner
2015-11-24  8:27   ` Chao Peng
     [not found]     ` <20151124212543.GA11303@amt.cnet>
2015-11-25  1:29       ` Marcelo Tosatti
2015-11-24  7:31 ` Chao Peng
2015-11-24 23:06   ` Marcelo Tosatti
2015-12-22 18:12 ` Yu, Fenghua
2015-12-23 10:28   ` Marcelo Tosatti
2015-12-29 12:44     ` Thomas Gleixner
2015-12-31 19:22       ` Marcelo Tosatti
2015-12-31 22:30         ` Thomas Gleixner
2016-01-04 17:20           ` Marcelo Tosatti [this message]
2016-01-04 17:44             ` Marcelo Tosatti
2016-01-05 23:09             ` Thomas Gleixner
2016-01-06 12:46               ` Marcelo Tosatti
2016-01-06 13:10                 ` Tejun Heo
2016-01-08 20:21               ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160104172054.GA21926@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=fenghua.yu@intel.com \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=vikas.shivappa@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.