From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree
Reply-To: sashiko-reviews@lists.linux.dev
To: gregkh@linuxfoundation.org
Cc: sched-ext@lists.linux.dev
In-Reply-To: <2026051520-oak-marlin-d109@gregkh>
References: <2026051520-oak-marlin-d109@gregkh>
Date: Fri, 15 May 2026 09:39:01 +0000
Message-Id: <20260515093902.D29D9C2BCB8@smtp.kernel.org>
X-Mailing-List: sched-ext@lists.linux.dev

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

- [High] Unbounded iteration with interrupts disabled in `bypass_lb_cpu()` due to skipping tasks without yielding.

--
commit a8567d453e072dcdd0102db7ef988700bdb7dc40
Author: Greg Kroah-Hartman

    sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()

    This commit updates the bypass_lb_cpu() load balancer to skip tasks
    that have a stale task_rq. This prevents moving tasks between per-CPU
    bypass dispatch queues without holding the correct runqueue lock.

> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c

[ ... ]

> @@ -4005,6 +4005,15 @@ static u32 bypass_lb_cpu(struct scx_sched *sch, struct rq *rq,
> 		if (cpumask_empty(donee_mask))
> 			break;
>
> +		/*
> +		 * If an earlier pass placed @p on @donor_dsq from a different
> +		 * CPU and the donee hasn't consumed it yet, @p is still on the
> +		 * previous CPU and task_rq(@p) != @rq. @p can't be moved
> +		 * without its rq locked. Skip.
> +		 */
> +		if (task_rq(p) != rq)
> +			continue;

Does this change introduce unbounded iteration with local interrupts disabled?
Looking at bypass_lb_cpu(), it iterates over the tasks in donor_dsq while holding rq->lock and donor_dsq->lock. The function includes a batching mechanism that periodically drops these locks to prevent CPU stalls:

    nr_balanced++;
    if (!(nr_balanced % SCX_BYPASS_LB_BATCH) && n) {
        list_move_tail(&cursor.node, &n->scx.dsq_list.node);
        raw_spin_unlock(&donor_dsq->lock);
        raw_spin_rq_unlock_irq(rq);
        cpu_relax();
        ...

When a task is skipped via the new continue statement, nr_balanced is not incremented. If the bypass dispatch queue contains a large number of tasks that were recently load-balanced but not yet consumed, they will all have a stale task_rq. Since every one of them is skipped, the loop can walk the entire queue without ever incrementing nr_balanced, so the lock-dropping yield point is never reached and interrupts stay disabled for the whole walk.

Would it be safer to trigger the yield from a separate loop-iteration counter rather than from the number of successfully balanced tasks?

--
Sashiko AI review · https://sashiko.dev/#/patchset/2026051520-oak-marlin-d109@gregkh?part=1