From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F7A61B86C9 for ; Tue, 9 Jul 2024 19:24:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720553081; cv=none; b=tK4OZ4XvAJHVrrihw6/fPgmbUEaJcKhxoyK0SpTA/Zpe3w5SmGrCDwdfjIjW9YRcBL0IAkDm3mooPS1EynnXVOvlWughdk13p1e2T5id1nsuRCtZdH/xL9qwwNNZ45NV2AFouomnrxdmKJ5xJ9m7vZOiBGM30xsdHIg7n3ZKXwg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720553081; c=relaxed/simple; bh=TM87fZir4AzOiwacUrIBN/pn/2Ab9eCt2fUU0JMqOzk=; h=Date:Message-ID:From:To:Cc:Subject:In-Reply-To:References: MIME-Version:Content-Type; b=MB66VRHS2SjEXBjN01SwHCj6veTPSlVjAW+20KJbgOwSQC6S0VYnfTYhDFBbI1yqcmkE+aK5roXr/aKZuy6cpA1BBCetv4IAufJLp24OXkHiY5YKtiAqeVdNFOMaWqD3j7GSH8838P6qNTDMZJLLUF2G9UDHMiJ+m6IF5FXwQ1U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pyds89BP; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pyds89BP" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C9E63C3277B; Tue, 9 Jul 2024 19:24:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720553080; bh=TM87fZir4AzOiwacUrIBN/pn/2Ab9eCt2fUU0JMqOzk=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=pyds89BPMVT7ITYJF3R5oPiyTP7y3j4rmolEgF7cDgz1YNntcbegkU+f2llzUPlt2 mX71aXeR6uwSt6zfhsXgjZrbBAL+sBCIadUsOYkkgiYiKyUKIn9Wc01MBrp3HOigh7 0BdbMwR3DAbG9aI2RkBeRu0Z0nTvlGRQGSjfhm2SypG7WnKu/eJ4Un3nrbTNqyY2eD rGxZPTtaovl9U6G4d2MQL6mNxS2B1GBQDUXp7sruKUS8BKzvwuBjyqot7s/kEByizX FsO5j1Mz39K1WKYygtcoF9/r+r6bgwD+nBvtL11Nfj1L8exvlCt/A0cft+QWCWN0aU MC97aQJ0SNAzg== Received: from ip-185-104-136-29.ptr.icomera.net ([185.104.136.29] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1sRGS6-00B1Er-Ic; Tue, 09 Jul 2024 20:24:38 +0100 Date: Tue, 09 Jul 2024 20:24:37 +0100 Message-ID: <877cdupdvu.wl-maz@kernel.org> From: Marc Zyngier To: Manivannan Sadhasivam Cc: tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: MSIs not freed in GICv3 ITS driver In-Reply-To: <20240709173708.GA44420@thinkpad> References: <20240708153933.GC5745@thinkpad> <865xtf4woi.wl-maz@kernel.org> <20240709173708.GA44420@thinkpad> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.104.136.29 X-SA-Exim-Rcpt-To: manivannan.sadhasivam@linaro.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false On Tue, 09 Jul 2024 18:37:08 +0100, Manivannan Sadhasivam wrote: > > On Mon, Jul 08, 2024 at 06:31:57PM +0100, Marc Zyngier wrote: > > Mani, > > > > On Mon, 08 Jul 2024 16:39:33 +0100, > > Manivannan Sadhasivam wrote: > > > > > > Hi Marc, Thomas, > > > > > > I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from > > > PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to > > > allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds > > > the MSI count to power of 2 to find the order. In this case, the order becomes 2 > > > in its_alloc_device_irq(). > > > > That's because we can only allocate EventIDs as a number of ID > > bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or > > 2^24. This is a power-of-two architecture. > > > > Ah okay. > > > > So 4 entries are allocated by bitmap_find_free_region(). > > > > Assuming you're calling about its_alloc_device_irq(), it looks like a > > bug. Or rather, some laziness on my part. The thing is, this bitmap is > > only dealing with sub-allocation in the pool that has been given to > > the endpoint. So the power-of-two crap doesn't really matter unless > > you are dealing with Multi-MSI, which has actual alignment > > requirements. > > > > Okay. > > > > > > > But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc() > > > will only allocate 3 MSIs, leaving one bitmap entry unused. > > > > > > And when the driver frees the MSIs using pci_free_irq_vectors(), only 3 > > > allocated MSIs were freed and their bitmap entries were also released. But the > > > entry for the additional bitmap was never released. Due to this, > > > its_free_device() was also never called, resulting in the ITS device not getting > > > freed. > > > > > > So when the PCIe driver tries to request the MSIs again (PCIe device being > > > removed and inserted back), because the ITS device was not freed previously, > > > MSIs were again requested for the same ITS device. And due to the stale bitmap > > > entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were > > > available. This forces the PCIe driver to reduce the MSI count, which is sub > > > optimal. > > > > > > This behavior might be applicable to other irqchip drivers handling MSI as well. > > > I want to know if this behavior is already known with MSI and irqchip drivers? > > > > > > For fixing this issue, the PCIe drivers could always request MSIs of power of 2, > > > and use a dummy MSI handler for the extra number of MSIs allocated. This could > > > also be done in the generic MSI driver itself to avoid changes in the PCIe > > > drivers. But I wouldn't say it is the best possible fix. > > > > No, that's terrible. This is just papering over a design mistake, and > > I refuse to go down that road. > > > > Agree. But what about other MSI drivers? And because of the MSI design, they > also round the requested MSI count to power of 2, leading to unused vectors and > those also wouldn't get freed. This has absolutely nothing to do with the "design" of MSIs. It has everything to do with not special-casing Multi-MSI. > I think this power of 2 limitation should be > imposed at the API level or in the MSI driver instead of silently keeping unused > vectors in irqchip drivers. You really have the wrong end of the stick. The MSi API has *zero* control over the payload allocation. How could it? The whole point of having an MSI driver is to insulate the core code from such stuff. > > > > > > > Is there any other way to address this issue? Or am I missing something > > > completely? > > > > Well, since each endpoint handled by an ITS has its allocation tracked > > by a bitmap, it makes more sense to precisely track the allocation. > > > > Here's a quick hack that managed to survive a VM boot. It may even > > work. The only problem with it is that it probably breaks a Multi-MSi > > device sitting behind a non-transparent bridge that would get its MSIs > > allocated after another device. In this case, we wouldn't honor the > > required alignment and things would break. > > > > So take this as a proof of concept. If that works, I'll think of how > > to deal with this crap in a more suitable way... > > > > This works fine. Now the ITS driver allocates requested number of MSIs, thanks! Well, as I said, this breaks tons of other things so I'm not going to merge this any time soon. Certainly not before Thomas gets his MSI rework upstream. And then I need to work out how to deal with Multi-MSI in the correct way. So don't hold your breath. M. -- Without deviation from the norm, progress is not possible.