From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Qais Yousef <qais.yousef@arm.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Qais Yousef (Google)" <qyousef@layalina.io>
Subject: [PATCH 6.1 79/98] sched/fair: Detect capacity inversion
Date: Mon, 24 Apr 2023 15:17:42 +0200
Message-ID: <20230424131136.927856147@linuxfoundation.org>
In-Reply-To: <20230424131133.829259077@linuxfoundation.org>
From: Qais Yousef <qais.yousef@arm.com>
commit: 44c7b80bffc3a657a36857098d5d9c49d94e652b upstream.
Check each performance domain to see if thermal pressure is causing its
capacity to be lower than another performance domain.

We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c.

We also assume that thermal pressure impacts all CPUs in a performance
domain equally.

If there are multiple performance domains with the same capacity_orig, we
will flag a capacity inversion for the domain that is under higher thermal
pressure.

The new cpu_in_capacity_inversion() should help users to know when
information about capacity_orig is not reliable, so they can opt in to use
the inverted capacity as the 'actual' capacity_orig.
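
To make the detection rule easier to follow, here is a minimal user-space
sketch of the same comparison. It is not kernel code: struct pd_model,
detect_inversion() and effective_cap_orig() are hypothetical names made up
for illustration, thermal pressure is passed in as a plain number instead of
coming from thermal_load_avg(), and it assumes thermal pressure never
exceeds capacity_orig.

```c
/* Hypothetical user-space model of a performance domain; not a kernel type. */
struct pd_model {
	unsigned long cap_orig;	/* capacity_orig shared by all CPUs in the domain */
	unsigned long thermal;	/* thermal pressure in capacity units, <= cap_orig */
};

/*
 * Return the inverted capacity of domain 'self', or 0 if it is not in
 * capacity inversion. The comparisons follow the ones this patch adds to
 * update_cpu_capacity().
 */
static unsigned long detect_inversion(const struct pd_model *pds, int nr, int self)
{
	unsigned long cap_orig = pds[self].cap_orig;
	unsigned long inv_cap = cap_orig - pds[self].thermal;
	int i;

	for (i = 0; i < nr; i++) {
		unsigned long pd_cap_orig = pds[i].cap_orig;

		/* A domain that is bigger to begin with cannot invert this one. */
		if (cap_orig < pd_cap_orig)
			continue;

		if (cap_orig == pd_cap_orig) {
			/* Same size: compare capacities after thermal pressure. */
			if (pd_cap_orig - pds[i].thermal > inv_cap)
				return inv_cap;
		} else if (pd_cap_orig > inv_cap) {
			/* A smaller domain now offers more capacity than we do. */
			return inv_cap;
		}
	}

	return 0;
}

/*
 * Hypothetical consumer: treat the inverted capacity as the 'actual'
 * capacity_orig when the domain is in inversion.
 */
static unsigned long effective_cap_orig(const struct pd_model *pds, int nr, int self)
{
	unsigned long inv = detect_inversion(pds, nr, self);

	return inv ? inv : pds[self].cap_orig;
}
```

With a little domain at 512 and a big domain at 1024 under a thermal
pressure of 600, the big domain is left with 424, which is below the little
domain, so the sketch reports 424 for the big domain and 0 for the little
one.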
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
(cherry picked from commit 44c7b80bffc3a657a36857098d5d9c49d94e652b)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/sched/fair.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++---
kernel/sched/sched.h | 19 +++++++++++++++
2 files changed, 79 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8866,16 +8866,73 @@ static unsigned long scale_rt_capacity(i
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
+ unsigned long capacity_orig = arch_scale_cpu_capacity(cpu);
unsigned long capacity = scale_rt_capacity(cpu);
struct sched_group *sdg = sd->groups;
+ struct rq *rq = cpu_rq(cpu);
- cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu);
+ rq->cpu_capacity_orig = capacity_orig;
if (!capacity)
capacity = 1;
- cpu_rq(cpu)->cpu_capacity = capacity;
- trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+ rq->cpu_capacity = capacity;
+
+ /*
+ * Detect if the performance domain is in capacity inversion state.
+ *
+ * Capacity inversion happens when another perf domain with equal or
+ * lower capacity_orig_of() ends up having higher capacity than this
+ * domain after subtracting thermal pressure.
+ *
+ * We only take into account thermal pressure in this detection as it's
+ * the only metric that actually results in *real* reduction of
+ * capacity due to performance points (OPPs) being dropped/become
+ * unreachable due to thermal throttling.
+ *
+ * We assume:
+ * * That all cpus in a perf domain have the same capacity_orig
+ * (same uArch).
+ * * Thermal pressure will impact all cpus in this perf domain
+ * equally.
+ */
+ if (static_branch_unlikely(&sched_asym_cpucapacity)) {
+ unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
+ struct perf_domain *pd = rcu_dereference(rq->rd->pd);
+
+ rq->cpu_capacity_inverted = 0;
+
+ for (; pd; pd = pd->next) {
+ struct cpumask *pd_span = perf_domain_span(pd);
+ unsigned long pd_cap_orig, pd_cap;
+
+ cpu = cpumask_any(pd_span);
+ pd_cap_orig = arch_scale_cpu_capacity(cpu);
+
+ if (capacity_orig < pd_cap_orig)
+ continue;
+
+ /*
+ * handle the case of multiple perf domains have the
+ * same capacity_orig but one of them is under higher
+ * thermal pressure. We record it as capacity
+ * inversion.
+ */
+ if (capacity_orig == pd_cap_orig) {
+ pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
+
+ if (pd_cap > inv_cap) {
+ rq->cpu_capacity_inverted = inv_cap;
+ break;
+ }
+ } else if (pd_cap_orig > inv_cap) {
+ rq->cpu_capacity_inverted = inv_cap;
+ break;
+ }
+ }
+ }
+
+ trace_sched_cpu_capacity_tp(rq);
sdg->sgc->capacity = capacity;
sdg->sgc->min_capacity = capacity;
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1041,6 +1041,7 @@ struct rq {
unsigned long cpu_capacity;
unsigned long cpu_capacity_orig;
+ unsigned long cpu_capacity_inverted;
struct balance_callback *balance_callback;
@@ -2878,6 +2879,24 @@ static inline unsigned long capacity_ori
return cpu_rq(cpu)->cpu_capacity_orig;
}
+/*
+ * Returns inverted capacity if the CPU is in capacity inversion state.
+ * 0 otherwise.
+ *
+ * Capacity inversion detection only considers thermal impact where actual
+ * performance points (OPPs) gets dropped.
+ *
+ * Capacity inversion state happens when another performance domain that has
+ * equal or lower capacity_orig_of() becomes effectively larger than the perf
+ * domain this CPU belongs to due to thermal pressure throttling it hard.
+ *
+ * See comment in update_cpu_capacity().
+ */
+static inline unsigned long cpu_in_capacity_inversion(int cpu)
+{
+ return cpu_rq(cpu)->cpu_capacity_inverted;
+}
+
/**
* enum cpu_util_type - CPU utilization type
* @FREQUENCY_UTIL: Utilization used to select frequency