Date: Mon, 22 Dec 2025 16:16:29 +0000
Message-ID: <86ms3amqzm.wl-maz@kernel.org>
From: Marc Zyngier
To: Luigi Rizzo
Cc: tglx@linutronix.de, bhelgaas@google.com, linux-kernel@vger.kernel.org
Subject: Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
References: <20250903135433.380783272@linutronix.de> <20251220193120.3339162-1-lrizzo@google.com> <87tsxkdp6s.wl-maz@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 21 Dec 2025 12:41:37 +0000,
Luigi Rizzo wrote:
>
> On Sun, Dec 21, 2025 at 12:55 PM Marc Zyngier wrote:
> >
> > On Sat, 20 Dec 2025 19:31:19 +0000,
> > Luigi Rizzo wrote:
> > >
> > > There are platforms (including some ARM SoC) where the MSIx
> > > writes are a performance killer, because they are exceedingly
> > > serializing on the PCIe root port.
> > >
> > > These platforms are the key motivation for Global Software
> > > Interrupt Moderation (GSIM) which relies on actually masking
> > > device interrupts so the MSIx writes are not generated.
> > > https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
> > >
> > > Overriding mask/unmask with irq_chip_mask_parent() makes software
> > > moderation ineffective. GSIM works great on ARM platforms before
> > > this patch, but becomes ineffective afterwards, e.g. on linux 6.18.
> >
> > You do realise that "ARM platforms" means nothing at all, right? What
> > you actually mean is "the ARM machines I have access to exhibit some
> > platform-specific behaviour that may or may not be a general
> > behaviour".
> >
> > Your particular circumstances are not in any way something you can
> > generalise, unless you demonstrate this is caused by an architectural
> > requirement rather than an implementation defect.
>
> You are right, I should have been more precise and clarified "some arm
> machines I have access to". Note though that the problem addressed by
> https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
> is not for one broken snowflake. It affects multiple SoC families from
> all vendors (Intel, AMD, ARM), and is not new at all. Back in 2020,
> Eric Dumazet and I developed napi_defer_hard_irqs to address this very
> problem on a specific platform (x86).
> And sure, there are platforms that tolerate 30M intrs/s without a sweat.
>
> Anyways. Systems are what they are, some have suboptimal
> implementations which make certain operations more expensive
> than they could be. We can just say "tough luck" and write them off
> as broken, or try to mitigate the problem, and I am just exploring
> how we can do the latter without harming common cases.
>
> > > The round trip through the PCI endpoint for mask_irq(), caused by the
> > > readback to make sure the PCI write has been sent, is almost always
> > > (or really always) unnecessary. Masking is inherently racy; waiting
> > > until the PCIe write has arrived at the device won't guarantee that no
> > > interrupt has arrived in the meantime, so there is really no benefit
> > > in the readback (which, for instance, can be conditionally removed with
> > > code like the one below).
> > >
> > > I measured the cost of pci_irq_mask_msix() and it goes from 1000-1500ns
> > > with the readl(), down to 40-50ns without it.
> > >
> > > Once we remove the costly readback, is there any remaining reason
> > > to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
> >
> > So you are effectively not masking at all and just rely on hope
> > instead. I have the utmost confidence in this sort of stuff. Totally.
>
> I don't understand the above comment.
> Masking happens as a result of the PCIe write,
> which will eventually reach the device. The presence of the
> readback does nothing to accelerate the landing of the write.

It doesn't accelerate it. It *guarantees* that the write is observed
and has taken effect. It acts as a completion barrier. Without it, the
write can be buffered at an arbitrary location in the interconnect, or
stored in the device but not acted upon.

What you have here is the equivalent of throwing a message in a bottle
at sea, and expecting a guaranteed reply.

>
> If the expectation was "after the readl() there are no interrupts",
> that is incorrect, because one may have been generated before
> the mask landed, be in flight in the interrupt controller,
> and fire after the readl() completes.

What is guaranteed is that there is no *new* interrupt. Without the
read, there is no requirement that the mask ever takes effect.

	M.

-- 
Without deviation from the norm, progress is not possible.
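
The completion-barrier argument above can be illustrated with a toy
model. This is only a sketch and not kernel code: struct toy_dev,
toy_writel(), toy_readl() and toy_dev_can_raise_irq() are hypothetical
names invented for the illustration, and the one-entry buffer is a
simplifying assumption standing in for whatever buffering the
interconnect really does. The only property it relies on is that a
read to the device cannot complete ahead of earlier posted writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not kernel code): a device mask register
 * sitting behind a one-entry posted-write buffer in the interconnect. */
struct toy_dev {
	uint32_t mask_reg;    /* value the device actually acts on */
	uint32_t pending_val; /* write still buffered in the fabric */
	bool pending;         /* true while a posted write is in flight */
};

/* Posted write: returns immediately, the device has not seen it yet. */
static void toy_writel(struct toy_dev *d, uint32_t val)
{
	d->pending_val = val;
	d->pending = true;
}

/* Non-posted read: earlier posted writes must be delivered to the
 * device before the read can complete, so the read flushes them. */
static uint32_t toy_readl(struct toy_dev *d)
{
	if (d->pending) {
		d->mask_reg = d->pending_val;
		d->pending = false;
	}
	return d->mask_reg;
}

/* The device raises a *new* interrupt only while it sees itself unmasked. */
static bool toy_dev_can_raise_irq(const struct toy_dev *d)
{
	return d->mask_reg == 0;
}
```

In this model, calling toy_writel(&d, 1) on a zero-initialised device
leaves toy_dev_can_raise_irq(&d) true until something drains the
buffer. Only after toy_readl(&d) returns is the device known to be
masked. Interrupts raised before that point may still be in flight,
which is the "no *new* interrupt" distinction made above.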