From: Thomas Gleixner
To: Thierry Reding, Marc Zyngier
Cc: linux-tegra@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: IRQ thread timeouts and affinity
Date: Thu, 16 Oct 2025 20:53:59 +0200
Message-ID: <87cy6m1xvc.ffs@tglx>

On Thu, Oct 09 2025 at 13:38, Thierry Reding wrote:
> We've been running into an issue on some systems (NVIDIA Grace chips)
> where either during boot or at runtime, CPU 0 can be under very high
> load and cause some IRQ thread functions to be delayed to a point where
> we encounter the timeout in the work submission parts of the driver.
>
> Specifically this happens for the Tegra QSPI controller driver found
> in drivers/spi/spi-tegra210-quad.c. This driver uses an IRQ thread to
> wait for and process "transfer ready" interrupts (which need to run
> DMA transfers or copy from the hardware FIFOs using PIO to get the
> SPI transfer data). Under heavy load, we've seen the IRQ thread run
> with up to multiple seconds of delay.

If the interrupt thread, which runs with SCHED_FIFO, is delayed for
multiple seconds, then there is something seriously wrong to begin
with. You fail to explain how that happens in the first place. Heavy
load is not really a good explanation for that.

> Alternatively, would it be possible (and make sense) to make the IRQ
> core code schedule threads across more CPUs? Is there a particular
> reason that the IRQ thread runs on the same CPU that services the IRQ?

Locality. Also remote wakeups are way more expensive than local
wakeups. Though there is no actual hard requirement to force it onto
the same CPU.

What could be done is to have a flag which binds the thread to the
real affinity mask instead of the effective affinity mask, so it can
be scheduled freely. Needs some thought, but should work.

> Maybe another way would be to "reserve" CPU 0 for the type of core OS
> driver like QSPI (the TPM is connected to this controller) and make sure
> all CPU intensive tasks do not run on that CPU?
>
> I know that things like irqbalance and taskset exist to solve some of
> these problems, but they do not work when we hit these cases at boot
> time.

I'm still completely failing to see how you end up with multiple
seconds of delay of that thread, especially during boot. What exactly
keeps it from getting scheduled?

Thanks,

        tglx
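[Editor's sketch, not part of the mail: the real-vs-effective affinity
distinction discussed above can be modeled with toy mask values. The
CPU sets below are hypothetical, not taken from the thread; on a real
system the two masks correspond to /proc/irq/N/smp_affinity and
/proc/irq/N/effective_affinity.]

```python
# Toy model (hypothetical values): the IRQ's configured ("real")
# affinity mask allows CPUs 0-3, but the interrupt controller's
# effective affinity delivers the interrupt to CPU 0 only.
real_affinity = {0, 1, 2, 3}   # configured mask (smp_affinity)
effective_affinity = {0}       # where the IRQ actually fires

# Today the irq thread is bound to the effective affinity mask, so a
# busy CPU 0 delays the thread along with everything else on that CPU:
thread_cpus_today = effective_affinity

# The proposal: a flag that binds the thread to the real affinity mask
# instead, so the scheduler may place it on any allowed CPU:
thread_cpus_proposed = real_affinity

# The effective mask is always a subset of the configured mask, so the
# proposal can only widen where the thread may run.
assert thread_cpus_today <= thread_cpus_proposed
print(sorted(thread_cpus_proposed))
```

The trade-off tglx points out still applies: running the thread on a
different CPU than the one handling the hard interrupt costs a remote
wakeup and loses cache locality.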