From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Odin Ugedal <odin@uged.al>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 49/90] sched/fair: Correctly insert cfs_rqs to list on unthrottle
Date: Mon, 21 Jun 2021 18:15:24 +0200
Message-ID: <20210621154905.799543802@linuxfoundation.org>
In-Reply-To: <20210621154904.159672728@linuxfoundation.org>

From: Odin Ugedal <odin@uged.al>

[ Upstream commit a7b359fc6a37faaf472125867c8dc5a068c90982 ]

Fix an issue where fairness is decreased since cfs_rq's can end up not
being decayed properly. For two sibling control groups with the same
priority, this can often lead to a load ratio of 99/1 (!!).

This happens because when a cfs_rq is throttled, all the descendant
cfs_rq's are removed from the leaf list. When the initial cfs_rq
is unthrottled, it currently only re-adds descendant cfs_rq's if
they have one or more entities enqueued. This is not a perfect
heuristic, since a cfs_rq with no enqueued entities can still carry
load that has not yet decayed.
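
As a rough illustration of why the enqueued-entities check alone falls
short, the sketch below models the leaf list in plain userspace C. The
names toy_cfs_rq, toy_unthrottle_old() and toy_decay_pass() are
hypothetical and exist only for this example. The decay step simply
halves the load on each pass; it is not the kernel's PELT arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct cfs_rq. */
struct toy_cfs_rq {
	unsigned int nr_running;	/* entities currently enqueued */
	uint64_t load_sum;		/* load history that has not decayed yet */
	bool on_leaf_list;		/* only listed cfs_rqs get decayed */
};

/* Pre-patch rule: re-add on unthrottle only if something is enqueued. */
static void toy_unthrottle_old(struct toy_cfs_rq *rq)
{
	if (rq->nr_running >= 1)
		rq->on_leaf_list = true;
}

/* Toy decay pass (assumption: halving per pass); unlisted cfs_rqs are skipped. */
static void toy_decay_pass(struct toy_cfs_rq *rqs, int n)
{
	for (int i = 0; i < n; i++) {
		if (rqs[i].on_leaf_list)
			rqs[i].load_sum /= 2;
	}
}
```

In this model, a cfs_rq that is idle at unthrottle time but still has a
non-zero load_sum is never put back on the list, so repeated calls to
toy_decay_pass() leave its load untouched while its busy sibling decays
normally.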

Instead, insert every cfs_rq that contains one or more enqueued
entities, or whose load is not completely decayed.
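
The difference between the old and the new condition can be sketched in
userspace C as follows. The struct and the sketch_* helpers are
hypothetical mock-ups for illustration; only the field names and the
two conditions are taken from the diff in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the few cfs_rq fields the patch inspects. */
struct sketch_cfs_rq {
	unsigned int nr_running;
	unsigned long load_weight;		/* stands in for load.weight */
	unsigned long load_sum;			/* stands in for avg.load_sum */
	unsigned long util_sum;			/* stands in for avg.util_sum */
	unsigned long runnable_load_sum;	/* stands in for avg.runnable_load_sum */
};

/* Mirrors the shape of cfs_rq_is_decayed(): true only if every sum is zero. */
static bool sketch_is_decayed(const struct sketch_cfs_rq *rq)
{
	return !rq->load_weight && !rq->load_sum &&
	       !rq->util_sum && !rq->runnable_load_sum;
}

/* Old condition in tg_unthrottle_up(): enqueued entities only. */
static bool sketch_readd_old(const struct sketch_cfs_rq *rq)
{
	return rq->nr_running >= 1;
}

/* New condition: remaining load or enqueued entities. */
static bool sketch_readd_new(const struct sketch_cfs_rq *rq)
{
	return !sketch_is_decayed(rq) || rq->nr_running;
}
```

For an idle cfs_rq with a leftover load_sum, sketch_readd_old() returns
false and sketch_readd_new() returns true; for a fully decayed idle
cfs_rq both return false, so the list does not grow needlessly.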

Without this fix, equally weighted control groups can often end up
in situations like this:

  $ ps u -C stress
  USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root       10009 88.8  0.0   3676   100 pts/1    R+   11:04   0:13 stress --cpu 1
  root       10023  3.0  0.0   3676   104 pts/1    R+   11:04   0:00 stress --cpu 1

Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
[vingo: !SMP build fix]
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210612112815.61678-1-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/sched/fair.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3f4113e87de..877672df822f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3131,6 +3131,24 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
 
 #ifdef CONFIG_SMP
 #ifdef CONFIG_FAIR_GROUP_SCHED
+
+static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
+{
+	if (cfs_rq->load.weight)
+		return false;
+
+	if (cfs_rq->avg.load_sum)
+		return false;
+
+	if (cfs_rq->avg.util_sum)
+		return false;
+
+	if (cfs_rq->avg.runnable_load_sum)
+		return false;
+
+	return true;
+}
+
 /**
  * update_tg_load_avg - update the tg's load avg
  * @cfs_rq: the cfs_rq whose avg changed
@@ -3833,6 +3851,11 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 
 #else /* CONFIG_SMP */
 
+static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
+{
+	return true;
+}
+
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
@@ -4488,8 +4511,8 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 		cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
 					     cfs_rq->throttled_clock_task;
 
-		/* Add cfs_rq with already running entity in the list */
-		if (cfs_rq->nr_running >= 1)
+		/* Add cfs_rq with load or one or more already running entities to the list */
+		if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running)
 			list_add_leaf_cfs_rq(cfs_rq);
 	}
 
@@ -7620,23 +7643,6 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
-{
-	if (cfs_rq->load.weight)
-		return false;
-
-	if (cfs_rq->avg.load_sum)
-		return false;
-
-	if (cfs_rq->avg.util_sum)
-		return false;
-
-	if (cfs_rq->avg.runnable_load_sum)
-		return false;
-
-	return true;
-}
-
 static bool __update_blocked_fair(struct rq *rq, bool *done)
 {
 	struct cfs_rq *cfs_rq, *pos;
-- 
2.30.2



