cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
	luto@amacapital.net, Mike Galbraith <efault@gmx.de>,
	torvalds@linux-foundation.org, Roman Gushchin <guro@fb.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>
Subject: Re: [PATCH v11 7/9] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root
Date: Wed, 18 Jul 2018 11:21:18 -0400	[thread overview]
Message-ID: <a1eb4971-ff23-6fd8-d75b-112d63f0aee9@redhat.com> (raw)
In-Reply-To: <151fc655-7e29-f060-789e-ee6c5c23d132@redhat.com>

On 07/10/2018 11:23 AM, Waiman Long wrote:
> On 07/06/2018 04:32 PM, Waiman Long wrote:
>> On 07/03/2018 11:58 AM, Tejun Heo wrote:
>>> Hello, Waiman.
>>>
>>> On Tue, Jul 03, 2018 at 08:41:31AM +0800, Waiman Long wrote:
>>>>> So, effective changing when enabling partition on a child feels wrong
>>>>> to me.  It's supposed to contain what's actually allowed to the cgroup
>>>>> from its parent and that shouldn't change regardless of how those
>>>>> resources are used.  It's still given to the cgroup from its parent.
>>>> Another way to work around this issue is to expose the reserved_cpus in
>>>> the parent for holding CPUs that can taken by a chid partition. That
>>>> will require adding one more cpuset file for those cgroups that are
>>>> partition roots.
>>> Yeah, that should work.
>>>
>> Thinking about it a bit more, that approach will make creating a
>> partition a multi-step process:
>>
>> 1) Reserve the CPUs in reserved_cpus.
>> 2) enable sched.partition
>> 3) Write the CPUs list into cpus.
>>
>> There are also more exception cases that need to be handled. The current
>> approach, on the other hands, is much simpler and easier to understand
>> and use.
>>
>>>> I don't mind restricting that to the first level children for now. That
>>>> does restrict where we can put the container root if we want a separate
>>>> partition for a container. Let's hear if others have any objection about
>>>> that.
>>> As currently implemented, partioning locks away the cpus which should
>>> be a system level decision, not container level, so it makes sense to
>>> me that it is only available to system root.
>> So my preference is to allow partition only on the first level children
>> of the root for the time being. I think it should cover most of the use
>> cases. I will update the patchset to reflect that.
>>
>> Cheers,
>> Longman
>>
> Below is the incremental patch that allow partitioning only on the first
> level children of the root. Please let me know your thourght on that.
>
> Thanks,
> Longman
>
> -------------[ Cut here ]-------------------------
>
> From 5a41209da94385efff87d79f6523265c710cbea5 Mon Sep 17 00:00:00 2001
> From: Waiman Long <longman@redhat.com>
> Date: Tue, 10 Jul 2018 10:23:16 -0400
> Subject: [PATCH v11 10/10] cpuset: Restrict sched.partition to first level
>  children of root only
>
> Enabling partition on a v2 cpuset has the side effect of affecting the
> effective CPUs of its parent which is currently unique to partitioning.
> As we are not sure about the repercussion of enabling that globally,
> we are now restricting the enabling of sched.partition to the first
> level children of the root cgroup in the default hierarchy.
>
> This is done by removing the "cpuset.sched.partition" control file
> on cgroups that are not the first level children of the root. A new
> show_cfile function pointer is added to the cftype structure. If it
> is defined, it will be called to return a boolean value to determine
> if the corresponding control file should show up in the cgroup. It
> provides a more flexible mechanism to determine the visibility of the
> control file than a simple CFTYPE_* flag can do.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 10 +++++-----
>  include/linux/cgroup-defs.h             |  9 +++++++++
>  kernel/cgroup/cgroup.c                  |  4 ++++
>  kernel/cgroup/cpuset.c                  | 10 ++++++++++
>  4 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst
> b/Documentation/admin-guide/cgroup-v2.rst
> index f7cde15..cf7cd88 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1585,10 +1585,10 @@ Cpuset Interface Files
>      Its value will be affected by memory nodes hotplug events.
>  
>    cpuset.sched.partition
> -    A read-write single value file which exists on non-root
> -    cpuset-enabled cgroups.  It is a binary value flag that accepts
> -    either "0" (off) or "1" (on).  This flag is set and owned by the
> -        parent cgroup.
> +    A read-write single value file which exists on the first level
> +    children of the root cgroup.  It is a binary value flag that
> +    accepts either "0" (off) or "1" (on).  This flag is set and
> +    owned by the parent cgroup.
>  
>      If set, it indicates that the current cgroup is the root of a
>      new partition or scheduling domain that comprises itself and
> @@ -1603,7 +1603,7 @@ Cpuset Interface Files
>         exclusive, i.e. they are not shared by any of its siblings.
>      2) The "cpuset.cpus" is also a proper subset of the parent's
>         "cpuset.cpus.effective".
> -    3) The parent cgroup is a partition root.
> +    3) The parent cgroup is the root cgroup.
>      4) There is no child cgroups with cpuset enabled.  This is for
>         eliminating corner cases that have to be handled if such a
>         condition is allowed.
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index c0e68f9..be79487 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -565,6 +565,15 @@ struct cftype {
>      ssize_t (*write)(struct kernfs_open_file *of,
>               char *buf, size_t nbytes, loff_t off);
>  
> +    /*
> +     * show_cfile(), if defined, will return a boolean value to
> +     * determine if the control file should show up in the cgroup.
> +     * It provides more flexibility in deciding where the control
> +     * file should appear than simple criteria like on-root or
> +     * not-on-root.
> +     */
> +    bool (*show_cfile)(struct cgroup_subsys_state *css);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>      struct lock_class_key    lockdep_key;
>  #endif
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 077370b..0afdab8 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3612,6 +3612,10 @@ static int cgroup_addrm_files(struct
> cgroup_subsys_state *css,
>          if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgroup_parent(cgrp))
>              continue;
>  
> +        /* Should the control file show up in the cgroup */
> +        if (cft->show_cfile && !cft->show_cfile(css))
> +            continue;
> +
>          if (is_add) {
>              ret = cgroup_add_file(css, cgrp, cft);
>              if (ret) {
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 62b7e61..2f85c1e 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2106,6 +2106,15 @@ static s64 cpuset_read_s64(struct
> cgroup_subsys_state *css, struct cftype *cft)
>  }
>  
>  /*
> + * The sched.partition control file should only show up in the first
> + * level children of the root cgroup.
> + */
> +static bool cpuset_show_partition(struct cgroup_subsys_state *css)
> +{
> +    return parent_cs(css_cs(css)) == &top_cpuset;
> +}
> +
> +/*
>   * for the common functions, 'private' gives the type of file
>   */
>  
> @@ -2250,6 +2259,7 @@ static s64 cpuset_read_s64(struct
> cgroup_subsys_state *css, struct cftype *cft)
>          .name = "sched.partition",
>          .read_u64 = cpuset_read_u64,
>          .write_u64 = cpuset_write_u64,
> +        .show_cfile = cpuset_show_partition,
>          .private = FILE_PARTITION_ROOT,
>          .flags = CFTYPE_NOT_ON_ROOT,
>      },

Tejun,

What do you think about this patch?

Are there any other issues you have with this patchset?

Thanks,
Longman


  reply	other threads:[~2018-07-18 15:21 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-24  7:30 [PATCH v11 0/9] cpuset: Enable cpuset controller in default hierarchy Waiman Long
2018-06-24  7:30 ` [PATCH v11 1/9] " Waiman Long
2018-06-24  7:30 ` [PATCH v11 2/9] cpuset: Add new v2 cpuset.sched.partition flag Waiman Long
2018-06-24  7:30 ` [PATCH v11 3/9] cpuset: Simulate auto-off of sched.partition at cgroup removal Waiman Long
2018-06-24  7:30 ` [PATCH v11 4/9] cpuset: Allow changes to cpus in a partition root Waiman Long
2018-06-24  7:30 ` [PATCH v11 5/9] cpuset: Make sure that partition flag work properly with CPU hotplug Waiman Long
2018-06-24  7:30 ` [PATCH v11 6/9] cpuset: Make generate_sched_domains() recognize reserved_cpus Waiman Long
2018-06-24  7:30 ` [PATCH v11 7/9] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root Waiman Long
2018-07-02 16:53   ` Tejun Heo
2018-07-03  0:41     ` Waiman Long
2018-07-03 15:58       ` Tejun Heo
2018-07-06 20:32         ` Waiman Long
2018-07-10 15:23           ` Waiman Long
2018-07-18 15:21             ` Waiman Long [this message]
2018-07-18 15:31               ` Tejun Heo
2018-07-18 21:12                 ` Waiman Long
2018-07-19 13:52         ` Peter Zijlstra
2018-07-19 14:04           ` Waiman Long
2018-07-19 15:30             ` Tejun Heo
2018-07-19 15:52               ` Waiman Long
2018-07-19 16:52                 ` Tejun Heo
2018-07-19 17:22                   ` Waiman Long
2018-07-19 17:25                     ` Tejun Heo
2018-07-19 17:38                       ` Waiman Long
2018-07-20 11:32                     ` Peter Zijlstra
2018-07-20 11:31                   ` Peter Zijlstra
2018-07-20 11:45                     ` Tejun Heo
2018-07-20 12:04                       ` Tejun Heo
2018-07-20 15:44                       ` Peter Zijlstra
2018-07-20 15:56                         ` Tejun Heo
2018-07-20 16:19                           ` Waiman Long
2018-07-20 16:37                             ` Peter Zijlstra
2018-07-20 17:09                               ` Waiman Long
2018-07-20 17:41                                 ` Tejun Heo
2018-08-13 17:56                                   ` Waiman Long
2018-08-17 15:59                                     ` Tejun Heo
2018-08-18  1:03                                       ` Waiman Long
2018-07-27 21:21                                 ` Waiman Long
2018-07-20 16:25                           ` Peter Zijlstra
2018-07-20 15:57                         ` Waiman Long
2018-07-20 11:29               ` Peter Zijlstra
2018-06-24  7:30 ` [PATCH v11 8/9] cpuset: Don't rebuild sched domains if cpu changes in non-partition root Waiman Long
2018-06-24  7:30 ` [PATCH v11 9/9] cpuset: Allow reporting of sched domain generation info Waiman Long
2018-07-19 13:54   ` Peter Zijlstra
2018-07-19 13:56     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1eb4971-ff23-6fd8-d75b-112d63f0aee9@redhat.com \
    --to=longman@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=efault@gmx.de \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).