From: "Aaron Lu"
To: "Zicheng Qu"
Cc: "K Prateek Nayak"
Subject: Re: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
Date: Mon, 2 Feb 2026 15:09:15 +0800
Message-Id: <20260202070915.GA3246252@bytedance.com>
In-Reply-To: <1594f461-549c-4db9-b80e-63c48818fc5b@huawei.com>
References: <20260120032549.186733-1-quzicheng@huawei.com> <20260130083438.1122457-1-quzicheng@huawei.com> <1594f461-549c-4db9-b80e-63c48818fc5b@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 30, 2026 at 05:03:49PM +0800, Zicheng Qu wrote:
> On 1/30/2026 4:34 PM, Zicheng Qu wrote:
>
> > 4) For kernel <= 5.10: Later, cgroup A is unthrottled. However, the task
> > P has already been migrated out of cgroup A, so unthrottle_cfs_rq()
> > may observe load_weight == 0 and return early without resched_curr()
> > called. For kernel >= 6.6: The unthrottling path normally triggers
> > `resched_curr()` almost cases even when no runnable tasks remain in the
> > unthrottled cgroup, preventing the idle stall described above. However,
> > if cgroup A is removed before it gets unthrottled, the unthrottling path
> > for cgroup A is never executed. In a result, no `resched_curr()` can be
> > called.

I think you are right.
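
To make the <= 5.10 sequence quoted above concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code: every toy_* name is
made up for illustration, and the resched_on_move flag only approximates
the idea in the patch subject (re-evaluating scheduling when a queued task
leaves a throttled cgroup), not the actual diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the kernel's real structures. */
struct toy_group {
	bool throttled;
	unsigned int load_weight;	/* summed weight of queued tasks */
};

struct toy_rq {
	bool curr_is_idle;		/* CPU is running the idle task */
	bool need_resched;		/* set by toy_resched_curr() */
	unsigned int nr_runnable;	/* tasks queued in unthrottled groups */
};

static void toy_resched_curr(struct toy_rq *rq)
{
	rq->need_resched = true;
}

/* Move one queued task of weight w from src to dst on the same rq. */
static void toy_move_task(struct toy_rq *rq, struct toy_group *src,
			  struct toy_group *dst, unsigned int w,
			  bool resched_on_move)
{
	assert(src->load_weight >= w);
	src->load_weight -= w;
	dst->load_weight += w;

	if (!dst->throttled) {
		rq->nr_runnable++;
		/* Hypothetical fix: kick the idle CPU at migration time. */
		if (resched_on_move && rq->curr_is_idle)
			toy_resched_curr(rq);
	}
}

/* Models the early return described for kernels <= 5.10. */
static void toy_unthrottle(struct toy_rq *rq, struct toy_group *g)
{
	g->throttled = false;
	if (!g->load_weight)
		return;			/* nothing queued: no resched */
	if (rq->curr_is_idle)
		toy_resched_curr(rq);
}

/* Returns true if the CPU stays idle although a task is runnable. */
bool toy_cpu_stalls(bool resched_on_move)
{
	struct toy_rq rq = { .curr_is_idle = true };
	struct toy_group a = { .throttled = true, .load_weight = 1024 };
	struct toy_group b = { 0 };

	toy_move_task(&rq, &a, &b, 1024, resched_on_move);	/* P: A -> B */
	toy_unthrottle(&rq, &a);				/* A unthrottled later */

	return rq.nr_runnable > 0 && !rq.need_resched;
}
```

In this model, toy_cpu_stalls(false) reports a stall because the only
reschedule point sits behind the load_weight check, while
toy_cpu_stalls(true) does not.
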
> Hi Aaron,
>
> Apologies for the confusion in my earlier description -- the original
> failure model was identified and analyzed on kernels based on LTS 5.10.
>
> Later I realized that on v6.6 and mainline, the issue becomes much harder
> to reproduce due to additional conditions introduced in the condition
> (cfs_rq->on_list) in unthrottle_cfs_rq(), which effectively mask the
> original reproduction path.
>
> As a result, I adjusted the reproducer accordingly. With the updated
> reproducer, the issue can still be triggered on mainline by explicitly
> bypassing the unthrottling reschedule path, as described in the commit
> message.
>

I can now reproduce the problem with your reproducer, and I have verified
that your patch fixes it, so feel free to add:

Tested-by: Aaron Lu