public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Tejun Heo <tj@kernel.org>,
	Dan Schatzberg <dschatzberg@fb.com>, Jens Axboe <axboe@kernel.dk>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 19/24] blk-iocost: fix weight updates of inner active iocgs
Date: Tue, 17 Dec 2024 18:07:17 +0100	[thread overview]
Message-ID: <20241217170519.790791205@linuxfoundation.org> (raw)
In-Reply-To: <20241217170519.006786596@linuxfoundation.org>

5.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

[ Upstream commit e9f4eee9a0023ba22db9560d4cc6ee63f933dae8 ]

When the weight of an active iocg is updated, weight_updated() is called
which in turn calls __propagate_weights() to update the active and inuse
weights so that the effective hierarchical weights are update accordingly.

The current implementation is incorrect for inner active nodes. For an
active leaf iocg, inuse can be any value between 1 and active and the
difference represents how much the iocg is donating. When weight is updated,
as long as inuse is clamped between 1 and the new weight, we're alright and
this is what __propagate_weights() currently implements.

However, that's not how an active inner node's inuse is set. An inner node's
inuse is solely determined by the ratio between the sums of inuse's and
active's of its children - ie. they're results of propagating the leaves'
active and inuse weights upwards. __propagate_weights() incorrectly applies
the same clamping as for a leaf when an active inner node's weight is
updated. Consider a hierarchy which looks like the following with saturating
workloads in AA and BB.

     R
   /   \
  A     B
  |     |
 AA     BB

1. For both A and B, active=100, inuse=100, hwa=0.5, hwi=0.5.

2. echo 200 > A/io.weight

3. __propagate_weights() update A's active to 200 and leave inuse at 100 as
   it's already between 1 and the new active, making A:active=200,
   A:inuse=100. As R's active_sum is updated along with A's active,
   A:hwa=2/3, B:hwa=1/3. However, because the inuses didn't change, the
   hwi's remain unchanged at 0.5.

4. The weight of A is now twice that of B but AA and BB still have the same
   hwi of 0.5 and thus are doing the same amount of IOs.

Fix it by making __propgate_weights() always calculate the inuse of an
active inner iocg based on the ratio of child_inuse_sum to child_active_sum.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Link: https://lore.kernel.org/r/YJsxnLZV1MnBcqjj@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 57e420c84f9a ("blk-iocost: Avoid using clamp() on inuse in __propagate_weights()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/blk-iocost.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 75eb90b3241e..1acd96c9d5c0 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -907,7 +907,17 @@ static void __propagate_active_weight(struct ioc_gq *iocg, u32 active, u32 inuse
 
 	lockdep_assert_held(&ioc->lock);
 
-	inuse = clamp_t(u32, inuse, 1, active);
+	/*
+	 * For an active leaf node, its inuse shouldn't be zero or exceed
+	 * @active. An active internal node's inuse is solely determined by the
+	 * inuse to active ratio of its children regardless of @inuse.
+	 */
+	if (list_empty(&iocg->active_list) && iocg->child_active_sum) {
+		inuse = DIV64_U64_ROUND_UP(active * iocg->child_inuse_sum,
+					   iocg->child_active_sum);
+	} else {
+		inuse = clamp_t(u32, inuse, 1, active);
+	}
 
 	if (active == iocg->active && inuse == iocg->inuse)
 		return;
@@ -920,7 +930,7 @@ static void __propagate_active_weight(struct ioc_gq *iocg, u32 active, u32 inuse
 		/* update the level sums */
 		parent->child_active_sum += (s32)(active - child->active);
 		parent->child_inuse_sum += (s32)(inuse - child->inuse);
-		/* apply the udpates */
+		/* apply the updates */
 		child->active = active;
 		child->inuse = inuse;
 
-- 
2.39.5




  parent reply	other threads:[~2024-12-17 17:10 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-17 17:06 [PATCH 5.4 00/24] 5.4.288-rc1 review Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 5.4 01/24] usb: host: max3421-hcd: Correctly abort a USB request Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 02/24] ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 03/24] usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 04/24] usb: ehci-hcd: fix call balance of clocks handling routines Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 05/24] usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 06/24] xfs: dont drop errno values when we fail to ficlone the entire range Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 07/24] bpf, sockmap: Fix update element with same Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 08/24] batman-adv: Do not send uninitialized TT changes Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 09/24] batman-adv: Remove uninitialized data in full table TT response Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 10/24] batman-adv: Do not let TT changes list grows indefinitely Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 11/24] tipc: fix NULL deref in cleanup_bearer() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 12/24] net: lapb: increase LAPB_HEADER_LEN Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 13/24] ACPI: resource: Fix memory resource type union access Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 14/24] qca_spi: Fix clock speed for multiple QCA7000 Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 15/24] qca_spi: Make driver probing reliable Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 16/24] net/sched: netem: account for backlog updates from child qdisc Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 17/24] ACPICA: events/evxfregn: dont release the ContextMutex that was never acquired Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 18/24] blk-iocost: clamp inuse and skip noops in __propagate_weights() Greg Kroah-Hartman
2024-12-17 17:07 ` Greg Kroah-Hartman [this message]
2024-12-17 17:07 ` [PATCH 5.4 20/24] blk-iocost: Avoid using clamp() on inuse " Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 21/24] KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 22/24] tracing/kprobes: Skip symbol counting logic for module symbols in create_local_trace_kprobe() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 23/24] xen/netfront: fix crash when removing device Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 5.4 24/24] ALSA: usb-audio: Fix a DMA to stack memory bug Greg Kroah-Hartman
2024-12-17 18:52 ` [PATCH 5.4 00/24] 5.4.288-rc1 review Florian Fainelli
2024-12-18  1:13 ` Shuah Khan
2024-12-18 12:40 ` Mark Brown
2024-12-18 13:32 ` Naresh Kamboju
2024-12-18 17:20 ` Jon Hunter
2024-12-19  9:11 ` Harshit Mogalapalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241217170519.790791205@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=dschatzberg@fb.com \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox