From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 13 May 2026 14:53:21 -1000
From: Tejun Heo
To: Steven Rostedt
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Kyle McMartin, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, Linux RT Development, Clark Williams,
	Sebastian Andrzej Siewior, John Kacur
Subject: Re: [PATCH sched/core] sched/rt: Fix RT_PUSH_IPI soft lockup loop
Message-ID:
References: <20260506235716.2530720-1-tj@kernel.org>
	<20260507141437.GJ3102624@noisy.programming.kicks-ass.net>
	<20260512113754.448c1f5b@gandalf.local.home>
	<056f95bc5805f7e161458984fff4b3cb@kernel.org>
	<20260512172847.5024e5e8@gandalf.local.home>
	<20260513193914.1593369-1-tj@kernel.org>
	<20260513202432.18dd7b9f@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260513202432.18dd7b9f@gandalf.local.home>

Hello,

On Wed, May 13, 2026 at 08:24:32PM -0400, Steven Rostedt wrote:
> > - Per-target hrtimer (HRTIMER_MODE_REL_PINNED_HARD) fires every
> >   750us. Each fire schedules one tasklet round-robin from a pool
> >   of 20k distinct tasklets. Each tasklet body is a 500us cpu_relax
> >   loop, standing in for "process one item of softirq work".
>
> So you are running a softirq for 500us every 750us?
>
> This basically prevents any task from running on these CPUs while the
> softirq is executing.

Hmmm? The utilization is high at around 70%, but the CPU can still run
something and wouldn't lock up. The prod repro case isn't this high -
more like 30-40%. It's just difficult to make the synthetic repro
reliable at that level.
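As a sanity check on the numbers above, a 500us tasklet body fired every
750us works out to the following duty cycle (a quick back-of-the-envelope
model, not part of the repro code):

```python
# Duty cycle implied by the synthetic load: a 500us cpu_relax() loop
# scheduled from an hrtimer that fires every 750us on each target CPU.
work_us = 500     # tasklet body duration
period_us = 750   # hrtimer period

utilization = work_us / period_us
print(f"softirq utilization: {utilization:.0%}")  # -> softirq utilization: 67%
```

So roughly a third of each period is still available to run tasks, which
is why the CPU doesn't simply lock up from the load alone.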
> > - Storm driver: 190 SCHED_FIFO-50 nanosleep loops on non-target
> >   CPUs drive tell_cpu_to_push from balance_rt. Two synthetic
> >   psimon-shaped kthreads (FIFO 1) bound to the targets to pin
> >   them into rto_mask.
>
> What exactly are these synthetic kthreads doing. Have code to share?

They're just looping a set number of times. Here's the slop:

  https://gist.github.com/htejun/ba43a0a7bc6f6503602ada850f45ce4d

> The IPI walker should only go to the CPUs with overloaded RT tasks. Are
> you making all the CPUs have overloaded RT tasks?

Only 2 CPUs are overloaded. I don't know why it used FIFO threads on the
CPUs that aren't overloaded; it's just using them to pulse the CPUs to
trigger need_pull_rt_task().

> So this is showing that the IPI logic is just extending the softirq
> work load to something greater than the period of execution and causing
> a live lock of softirqs.
>
> This still doesn't explain to me why the current process is of a lower
> priority than a waiting RT task.

1. The CPU was running a fair task.
2. An IRQ triggers, which creates softirq work.
3. Either the IRQ, the softirq, or another CPU wakes up multiple RT
   tasks on the CPU.
4. The CPU enters softirq processing.
5. Other CPUs keep sending pull IPIs, slowing softirq processing.
6. Before softirq processing finishes, another IRQ arrives and creates
   more softirq work. Go back to 4.

> I'm really starting to think you are fixing a symptom and not the cause.

It seems relatively straightforward to me. The CPU was fairly loaded with
irq/softirq work. While it was in irq context, RT tasks were woken on it,
and then the CPU got hammered by pull IPIs to the point where it was
constantly chasing new softirq work and thus couldn't leave irq context
in a reasonable amount of time. What am I missing?

Thanks.

-- 
tejun
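The feedback loop in steps 4-6 above can be sketched with a toy model.
All numbers here are assumed for illustration; `ipi_us` in particular is
a hypothetical per-period cost of pull-IPI interference, not a measured
value from the repro:

```python
# Toy model of the livelock in steps 4-6: each IRQ queues 500us of
# softirq work every 750us, and pull IPIs landing during softirq
# processing steal time that would otherwise drain the queue.
def backlog_after(periods, work_us=500, period_us=750, ipi_us=0):
    """Softirq backlog (us) left after `periods` timer periods, when
    pull IPIs consume `ipi_us` of each period on top of queued work."""
    backlog = 0
    for _ in range(periods):
        backlog += work_us                  # new softirq work queued
        budget = period_us - ipi_us         # time left after IPI churn
        backlog = max(0, backlog - budget)  # drain what fits
    return backlog

# Without IPI pressure the queue drains every period; with enough of
# it, the backlog only grows and the CPU never leaves irq context.
print(backlog_after(100, ipi_us=0))    # 0us: 500us of work fits in 750us
print(backlog_after(100, ipi_us=300))  # 5000us: grows 50us per period
```

The point of the model is only that once interference pushes the
effective drain budget below the arrival rate, the backlog diverges,
matching the "go back to 4" loop above.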