From: "Daniel P. Berrange" <berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
kay.sievers-tD+1rO4QERM@public.gmane.org,
Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH cgroup/for-3.11 1/3] cgroup: mark "tasks" cgroup file as insane
Date: Fri, 7 Jun 2013 11:12:20 +0100
Message-ID: <20130607101220.GE10742@redhat.com>
In-Reply-To: <20130606211410.GF5045-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
On Thu, Jun 06, 2013 at 02:14:10PM -0700, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 06, 2013 at 10:20:55AM +0100, Daniel P. Berrange wrote:
> > Unless I'm mistaken there is no alternative that can work. With QEMU
> > we need to apply scheduling controls to
> >
> > 1. Individual vCPU threads
> > 2. All non-vCPU threads (ie QEMU's I/O threads)
> >
> > We can use per-thread APIs for 1, but for 2 we require something that
> > applies to the group of threads as a whole, without also impacting the
> > controls set for the vCPU threads. AFAIK, nothing except cgroups as
> > we use them today can satisfy that requirement ? Am I wrong ? Is there
> > something else that can achieve this same setup ?
>
> Can you please explain more about your requirements on !vCPU threads?

Well, we pretty much need the tunables in the cpu, cpuset and cpuacct
controllers to be available for the set of non-vCPU threads as a
group, e.g. cpu_shares, cfs_period_us, cfs_quota_us, cpuacct.usage,
cpuacct.usage_percpu, cpuset.cpus and cpuset.mems.

CPU/memory affinity could possibly be done with a combination of
sched_setaffinity + libnuma, but I'm not sure that it has quite
the same semantics. IIUC, with the cpuset cgroup, changing affinity
will cause the kernel to migrate existing memory allocations to
the newly specified node masks, but this isn't done if you just
use sched_setaffinity/libnuma.

For CPU accounting, you'd have to look at the overall cgroup usage
and then subtract the usage accounted to each vCPU thread to get
the non-vCPU thread group total. Possible, but slightly tedious and
less accurate, since there are timing delays in reading the info
from each thread's /proc files.
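
To illustrate the subtraction described above, here is a minimal Python
sketch. It is not libvirt code. It assumes the group total comes from
cpuacct.usage (nanoseconds) and the per-vCPU figures from the utime and
stime fields of /proc/<pid>/task/<tid>/stat (clock ticks). The function
names and the default tick rate of 100 are assumptions made for the
example.

```python
from typing import Iterable

NSEC_PER_SEC = 1_000_000_000


def thread_cpu_ns(stat_line: str, clk_tck: int = 100) -> int:
    """CPU time (utime + stime) of one thread in nanoseconds.

    stat_line is the content of /proc/<pid>/task/<tid>/stat.
    clk_tck is an assumed tick rate; on a real host it can be read
    with os.sysconf("SC_CLK_TCK").
    """
    # The comm field may itself contain spaces or ')', so split after
    # the last ')'. The first remaining token is then field 3 (state).
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return (utime + stime) * NSEC_PER_SEC // clk_tck


def non_vcpu_usage_ns(cgroup_usage_ns: int,
                      vcpu_stat_lines: Iterable[str],
                      clk_tck: int = 100) -> int:
    """Group total minus the sum of the vCPU threads' usage."""
    vcpu_total = sum(thread_cpu_ns(s, clk_tck) for s in vcpu_stat_lines)
    # The readings are not taken atomically, so the difference can
    # come out slightly negative; clamp it at zero.
    return max(0, cgroup_usage_ns - vcpu_total)
```

Every stat file is read at a slightly different moment from the group
total, and the per-thread figures only have clock-tick resolution, which
is where the inaccuracy comes from.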

I don't see any way to do cpu_shares, cfs_period_us and cfs_quota_us
for the group of non-vCPU threads as a whole. You can't set these
at the per-thread level, since that is semantically very different.
You can't set these for the process as a whole at the cgroup level,
since that would confine the vCPU threads at the same time, which is
also semantically very different.

We need completely separate scheduler tuning for the set of non-vCPU
threads vs. each vCPU thread. AFAICT the only way to do this is with
cgroups, grouping together subsets of threads within the process.
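
As a rough illustration of that grouping, the Python sketch below creates
one child directory per vCPU thread plus one shared directory for all
remaining threads, and writes each thread ID into the child's "tasks"
file. The directory names (vcpu0, vcpu1, ..., emulator), the function
names and the paths are assumptions for the example, not a statement of
what any particular tool does.

```python
import os
from typing import Dict, Iterable, List


def move_thread(group_dir: str, tid: int) -> None:
    """Create the group directory if needed and add one thread to it."""
    os.makedirs(group_dir, exist_ok=True)
    # One TID per write: each write to "tasks" moves a single thread.
    with open(os.path.join(group_dir, "tasks"), "a") as f:
        f.write(f"{tid}\n")


def split_qemu_threads(vm_dir: str,
                       vcpu_tids: Iterable[int],
                       other_tids: Iterable[int]) -> Dict[str, List[int]]:
    """Place each vCPU thread in its own leaf and the rest in one leaf.

    vm_dir is the (hypothetical) directory the process was initially
    placed in, within one controller's hierarchy.
    """
    layout = {f"vcpu{i}": [tid] for i, tid in enumerate(vcpu_tids)}
    layout["emulator"] = list(other_tids)
    for name, tids in layout.items():
        for tid in tids:
            move_thread(os.path.join(vm_dir, name), tid)
    return layout
```

With such a layout, the group-wide tunables for the non-vCPU threads
would be set on the shared leaf, independently of each per-vCPU leaf.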

> > I understand that having wildly distinct hierarchies across different
> > controllers causes a lot of pain for the kernel. Libvirt doesn't
> > actually require that full level of flexibility though. Our needs
> > are very much simpler. We're happy with the same core hierarchy
> > across all controllers. We just want to be able to create an extra
> > leaf node in some controllers to move threads about.
> >
> > It would be fine with us if the kernel required that the same directory
> > hierarchy exists in all controllers, and mandated that threads can only
> > be moved to a directory immediately below where the process is initially
> > placed.
>
> The problem is that it doesn't make any sense to split threads of the
> same process for at least two major controllers and you end up with
> situation where you can't identify a resource to be belonging to a
> certain cgroup because such level of granularity is simply undefined.
> As I wrote before, we can special case certain controllers but I'm
> extremely reluctant. If you need it, please convince me.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|