From: Juri Lelli <juri.lelli@redhat.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Waiman Long <longman@redhat.com>, Tejun Heo <tj@kernel.org>,
Li Zefan <lizefan@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
luto@amacapital.net, Mike Galbraith <efault@gmx.de>,
torvalds@linux-foundation.org, Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
Date: Fri, 25 May 2018 14:52:17 +0200 [thread overview]
Message-ID: <20180525125217.GC678@localhost.localdomain> (raw)
In-Reply-To: <20180525103147.GC30654@e110439-lin>
On 25/05/18 11:31, Patrick Bellasi wrote:
[...]
> Right, so the problem seems to be that we "need" to call
> arch_update_cpu_topology() and we do that by calling
> partition_sched_domains() which was initially introduced by:
>
> 029190c515f1 ("cpuset sched_load_balance flag")
>
> back in 2007, where it's also quite well explained the reasons behind
> the sched_load_balance flag and the idea to have "partitioned" SDs.
>
> I also (hopefully) understood that there are at least two actors involved:
>
> - A) arch code
> which creates SDs and SGs, usually to group CPUs depending on the
> memory hierarchy, to support different time granularity of load
> balancing operations
>
> Special case here are HP and hibernation which, by on-/off-lining
> CPUs they directly affect the SDs/SGs definitions.
>
> - B) cpusets
> which expose to userspace the possibility to define,
> _if possible_, a finer granularity set of SGs to further restrict the
> scope of load balancing operations
>
> Since B is a "possible finer granularity" refinement of A, then we
> trigger A's reconfigurations based on B's constraints.
>
> That's why, for example, in consequence of an HP online event,
> we have:
>
> --- core.c -------------------
> HP[sched:active]
> | sched_cpu_activate()
> | cpuset_cpu_active()
> --- cpuset.c -----------------
> | cpuset_update_active_cpus()
> | schedule_work(&cpuset_hotplug_work)
> \.. System Kworker \
> | cpuset_hotplug_workfn()
> if (cpus_updated || force_rebuild)
> | rebuild_sched_domains()
> | rebuild_sched_domains_locked()
> | generate_sched_domains()
> --- topology.c ---------------
> | partition_sched_domains()
> | arch_update_cpu_topology()
>
>
> IOW, we need to pass via cpusets to rebuild the SDs whenever we
> there are HP events or we "need" to do an arch_update_cpu_topology()
> via the arch topology driver (drivers/base/arch_topology.c).
I don't think the arch topology driver is always involved in this (e.g.,
arch/x86/kernel/itmt::sched_itmt_update_handler()).
Still we need to check if topology changed, as you say.
> This last bit is also interesting, whenever we detect arch topology
> information that required an SD rebuild, we need to force a
> partition_sched_domains(). But, for that, in:
>
> commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
>
> we just introduced the support for the "force_rebuild" flag to be set.
>
> Thus, potentially we can just extend the check I've proposed to consider the
> force rebuild flag, to be something like:
>
> ---8<---
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8f586e8bdc98..1f051fafaa3a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -874,11 +874,19 @@ static void rebuild_sched_domains_locked(void)
> !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
> goto out;
>
> + /* Special case for the 99% of systems with one, full, sched domain */
> + if (!force_rebuild &&
> + !top_cpuset.isolation_count &&
> + is_sched_load_balance(&top_cpuset))
> + goto out;
> + force_rebuild = false;
> +
> /* Generate domain masks and attrs */
> ndoms = generate_sched_domains(&doms, &attr);
>
> /* Have scheduler rebuild the domains */
> partition_sched_domains(ndoms, doms, attr);
> out:
> put_online_cpus();
> ---8<---
>
>
> Which would still allow to use something like:
>
> cpuset_force_rebuild()
> rebuild_sched_domains()
>
> to actually rebuild SD in consequence of arch topology changes.
That might work.
>
> >
> > Maybe we could move the check you are proposing in update_cpumasks_
> > hier() ?
>
> Yes, that's another option... although there we are outside of
> get_online_cpus(). Could be a problem?
Mmm, using force_rebuild flag seems safer indeed.
WARNING: multiple messages have this Message-ID (diff)
From: Juri Lelli <juri.lelli@redhat.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Waiman Long <longman@redhat.com>, Tejun Heo <tj@kernel.org>,
Li Zefan <lizefan@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
luto@amacapital.net, Mike Galbraith <efault@gmx.de>,
torvalds@linux-foundation.org, Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
Date: Fri, 25 May 2018 14:52:17 +0200 [thread overview]
Message-ID: <20180525125217.GC678@localhost.localdomain> (raw)
In-Reply-To: <20180525103147.GC30654@e110439-lin>
On 25/05/18 11:31, Patrick Bellasi wrote:
[...]
> Right, so the problem seems to be that we "need" to call
> arch_update_cpu_topology() and we do that by calling
> partition_sched_domains() which was initially introduced by:
>
> 029190c515f1 ("cpuset sched_load_balance flag")
>
> back in 2007, where it's also quite well explained the reasons behind
> the sched_load_balance flag and the idea to have "partitioned" SDs.
>
> I also (hopefully) understood that there are at least two actors involved:
>
> - A) arch code
> which creates SDs and SGs, usually to group CPUs depending on the
> memory hierarchy, to support different time granularity of load
> balancing operations
>
> Special case here are HP and hibernation which, by on-/off-lining
> CPUs they directly affect the SDs/SGs definitions.
>
> - B) cpusets
> which expose to userspace the possibility to define,
> _if possible_, a finer granularity set of SGs to further restrict the
> scope of load balancing operations
>
> Since B is a "possible finer granularity" refinement of A, then we
> trigger A's reconfigurations based on B's constraints.
>
> That's why, for example, in consequence of an HP online event,
> we have:
>
> --- core.c -------------------
> HP[sched:active]
> | sched_cpu_activate()
> | cpuset_cpu_active()
> --- cpuset.c -----------------
> | cpuset_update_active_cpus()
> | schedule_work(&cpuset_hotplug_work)
> \.. System Kworker \
> | cpuset_hotplug_workfn()
> if (cpus_updated || force_rebuild)
> | rebuild_sched_domains()
> | rebuild_sched_domains_locked()
> | generate_sched_domains()
> --- topology.c ---------------
> | partition_sched_domains()
> | arch_update_cpu_topology()
>
>
> IOW, we need to pass via cpusets to rebuild the SDs whenever we
> there are HP events or we "need" to do an arch_update_cpu_topology()
> via the arch topology driver (drivers/base/arch_topology.c).
I don't think the arch topology driver is always involved in this (e.g.,
arch/x86/kernel/itmt::sched_itmt_update_handler()).
Still we need to check if topology changed, as you say.
> This last bit is also interesting, whenever we detect arch topology
> information that required an SD rebuild, we need to force a
> partition_sched_domains(). But, for that, in:
>
> commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
>
> we just introduced the support for the "force_rebuild" flag to be set.
>
> Thus, potentially we can just extend the check I've proposed to consider the
> force rebuild flag, to be something like:
>
> ---8<---
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8f586e8bdc98..1f051fafaa3a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -874,11 +874,19 @@ static void rebuild_sched_domains_locked(void)
> !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
> goto out;
>
> + /* Special case for the 99% of systems with one, full, sched domain */
> + if (!force_rebuild &&
> + !top_cpuset.isolation_count &&
> + is_sched_load_balance(&top_cpuset))
> + goto out;
> + force_rebuild = false;
> +
> /* Generate domain masks and attrs */
> ndoms = generate_sched_domains(&doms, &attr);
>
> /* Have scheduler rebuild the domains */
> partition_sched_domains(ndoms, doms, attr);
> out:
> put_online_cpus();
> ---8<---
>
>
> Which would still allow to use something like:
>
> cpuset_force_rebuild()
> rebuild_sched_domains()
>
> to actually rebuild SD in consequence of arch topology changes.
That might work.
>
> >
> > Maybe we could move the check you are proposing in update_cpumasks_
> > hier() ?
>
> Yes, that's another option... although there we are outside of
> get_online_cpus(). Could be a problem?
Mmm, using force_rebuild flag seems safer indeed.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-05-25 12:52 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-17 20:55 [PATCH v8 0/6] Enable cpuset controller in default hierarchy Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-17 20:55 ` [PATCH v8 1/6] cpuset: " Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-21 11:55 ` Patrick Bellasi
2018-05-21 11:55 ` Patrick Bellasi
2018-05-21 13:55 ` Waiman Long
2018-05-21 13:55 ` Waiman Long
2018-05-21 15:09 ` Patrick Bellasi
2018-05-21 15:09 ` Patrick Bellasi
2018-05-21 16:10 ` Waiman Long
2018-05-21 16:10 ` Waiman Long
2018-05-17 20:55 ` [PATCH v8 2/6] cpuset: Add new v2 cpuset.sched.domain flag Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-22 12:57 ` Juri Lelli
2018-05-22 12:57 ` Juri Lelli
2018-05-22 13:20 ` Waiman Long
2018-05-22 13:20 ` Waiman Long
2018-05-29 0:55 ` Waiman Long
2018-05-29 0:55 ` Waiman Long
2018-05-24 15:41 ` Peter Zijlstra
2018-05-24 15:41 ` Peter Zijlstra
2018-05-24 18:53 ` Waiman Long
2018-05-24 18:53 ` Waiman Long
2018-05-25 7:15 ` Peter Zijlstra
2018-05-25 7:15 ` Peter Zijlstra
2018-05-17 20:55 ` [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2 Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-24 14:36 ` Juri Lelli
2018-05-24 14:36 ` Juri Lelli
2018-05-24 15:09 ` Waiman Long
2018-05-24 15:09 ` Waiman Long
2018-05-24 15:16 ` Juri Lelli
2018-05-24 15:16 ` Juri Lelli
2018-05-24 15:22 ` Waiman Long
2018-05-24 15:22 ` Waiman Long
2018-05-25 9:40 ` Patrick Bellasi
2018-05-25 9:40 ` Patrick Bellasi
2018-05-25 14:45 ` Waiman Long
2018-05-25 14:45 ` Waiman Long
2018-05-24 15:43 ` Peter Zijlstra
2018-05-24 15:43 ` Peter Zijlstra
2018-05-24 18:55 ` Waiman Long
2018-05-24 18:55 ` Waiman Long
2018-05-28 12:45 ` Peter Zijlstra
2018-05-28 12:45 ` Peter Zijlstra
2018-05-28 18:31 ` Waiman Long
2018-05-28 18:31 ` Waiman Long
2018-05-17 20:55 ` [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-23 17:34 ` Patrick Bellasi
2018-05-23 17:34 ` Patrick Bellasi
2018-05-23 20:18 ` Waiman Long
2018-05-23 20:18 ` Waiman Long
2018-05-24 9:04 ` Patrick Bellasi
2018-05-24 9:04 ` Patrick Bellasi
2018-05-24 9:04 ` Patrick Bellasi
2018-05-24 10:39 ` Juri Lelli
2018-05-24 10:39 ` Juri Lelli
2018-05-25 10:31 ` Patrick Bellasi
2018-05-25 10:31 ` Patrick Bellasi
2018-05-25 12:52 ` Juri Lelli [this message]
2018-05-25 12:52 ` Juri Lelli
2018-05-24 10:28 ` Juri Lelli
2018-05-24 10:28 ` Juri Lelli
2018-05-29 1:12 ` Waiman Long
2018-05-29 1:12 ` Waiman Long
2018-05-29 1:24 ` Waiman Long
2018-05-29 1:24 ` Waiman Long
2018-05-29 6:27 ` Juri Lelli
2018-05-29 6:27 ` Juri Lelli
2018-05-29 12:40 ` Waiman Long
2018-05-29 12:40 ` Waiman Long
2018-05-29 13:12 ` Juri Lelli
2018-05-29 13:12 ` Juri Lelli
2018-05-17 20:55 ` [PATCH v8 5/6] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-17 20:55 ` [PATCH v8 6/6] cpuset: Allow reporting of sched domain generation info Waiman Long
2018-05-17 20:55 ` Waiman Long
2018-05-22 13:53 ` Juri Lelli
2018-05-22 13:53 ` Juri Lelli
2018-05-29 1:04 ` Waiman Long
2018-05-29 1:04 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180525125217.GC678@localhost.localdomain \
--to=juri.lelli@redhat.com \
--cc=cgroups@vger.kernel.org \
--cc=efault@gmx.de \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=longman@redhat.com \
--cc=luto@amacapital.net \
--cc=mingo@redhat.com \
--cc=patrick.bellasi@arm.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.