From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 76880C2BD09 for ; Tue, 9 Jul 2024 19:25:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:MIME-Version: References:In-Reply-To:Subject:Cc:To:From:Message-ID:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=vquiTeHKxT5h6LIOMY/8AmTtID8EtcGA2ONt5OkqE0g=; b=P/bArTzRbzPZR4a+jxQBHMcQ9U 0Z86EzpqaHHJAEEo8eN8z2b/Q+65adCM0QmfCHBLLdqGl5XwbfUBZXl7Vl22VfsOtIMGjYw0SXvDd ytBwpD6i+w0kpa2CISqmbuhCJ23dI/LsqeG+mK73Rlx4CSV4OA/YID0kpPeYNiEX14kYGHXB06kM2 BhugxIk2F+D1fu9Gt4/H5Kfsk3OngcDIE1MotMjEcz4R/hdTHpde6IOSFZgoPh3mQPEFrXwVUvHEe rs8KPPtyvQH7Qk2qHiEDeFuTn3LgT/iqvugtzPKs1m3EqpXqBivkwOqPZRmJk4GMHsLFpjRG8DX/t DYN0oBVQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sRGSR-00000008OK6-25uT; Tue, 09 Jul 2024 19:24:59 +0000 Received: from dfw.source.kernel.org ([2604:1380:4641:c500::1]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sRGSA-00000008OGU-1cXY for linux-arm-kernel@lists.infradead.org; Tue, 09 Jul 2024 19:24:43 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 23A9161662; Tue, 9 Jul 2024 19:24:41 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C9E63C3277B; Tue, 9 Jul 2024 19:24:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720553080; bh=TM87fZir4AzOiwacUrIBN/pn/2Ab9eCt2fUU0JMqOzk=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=pyds89BPMVT7ITYJF3R5oPiyTP7y3j4rmolEgF7cDgz1YNntcbegkU+f2llzUPlt2 mX71aXeR6uwSt6zfhsXgjZrbBAL+sBCIadUsOYkkgiYiKyUKIn9Wc01MBrp3HOigh7 0BdbMwR3DAbG9aI2RkBeRu0Z0nTvlGRQGSjfhm2SypG7WnKu/eJ4Un3nrbTNqyY2eD rGxZPTtaovl9U6G4d2MQL6mNxS2B1GBQDUXp7sruKUS8BKzvwuBjyqot7s/kEByizX FsO5j1Mz39K1WKYygtcoF9/r+r6bgwD+nBvtL11Nfj1L8exvlCt/A0cft+QWCWN0aU MC97aQJ0SNAzg== Received: from ip-185-104-136-29.ptr.icomera.net ([185.104.136.29] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1sRGS6-00B1Er-Ic; Tue, 09 Jul 2024 20:24:38 +0100 Date: Tue, 09 Jul 2024 20:24:37 +0100 Message-ID: <877cdupdvu.wl-maz@kernel.org> From: Marc Zyngier To: Manivannan Sadhasivam Cc: tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: MSIs not freed in GICv3 ITS driver In-Reply-To: <20240709173708.GA44420@thinkpad> References: <20240708153933.GC5745@thinkpad> <865xtf4woi.wl-maz@kernel.org> <20240709173708.GA44420@thinkpad> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.104.136.29 X-SA-Exim-Rcpt-To: manivannan.sadhasivam@linaro.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240709_122442_532228_C948A211 X-CRM114-Status: GOOD ( 50.69 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, 09 Jul 2024 18:37:08 +0100, Manivannan Sadhasivam wrote: > > On Mon, Jul 08, 2024 at 06:31:57PM +0100, Marc Zyngier wrote: > > Mani, > > > > On Mon, 08 Jul 2024 16:39:33 +0100, > > Manivannan Sadhasivam wrote: > > > > > > Hi Marc, Thomas, > > > > > > I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from > > > PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to > > > allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds > > > the MSI count to power of 2 to find the order. In this case, the order becomes 2 > > > in its_alloc_device_irq(). > > > > That's because we can only allocate EventIDs as a number of ID > > bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or > > 2^24. This is a power-of-two architecture. > > > > Ah okay. > > > > So 4 entries are allocated by bitmap_find_free_region(). > > > > Assuming you're calling about its_alloc_device_irq(), it looks like a > > bug. Or rather, some laziness on my part. The thing is, this bitmap is > > only dealing with sub-allocation in the pool that has been given to > > the endpoint. So the power-of-two crap doesn't really matter unless > > you are dealing with Multi-MSI, which has actual alignment > > requirements. > > > > Okay. > > > > > > > But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc() > > > will only allocate 3 MSIs, leaving one bitmap entry unused. > > > > > > And when the driver frees the MSIs using pci_free_irq_vectors(), only 3 > > > allocated MSIs were freed and their bitmap entries were also released. But the > > > entry for the additional bitmap was never released. Due to this, > > > its_free_device() was also never called, resulting in the ITS device not getting > > > freed. > > > > > > So when the PCIe driver tries to request the MSIs again (PCIe device being > > > removed and inserted back), because the ITS device was not freed previously, > > > MSIs were again requested for the same ITS device. And due to the stale bitmap > > > entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were > > > available. This forces the PCIe driver to reduce the MSI count, which is sub > > > optimal. > > > > > > This behavior might be applicable to other irqchip drivers handling MSI as well. > > > I want to know if this behavior is already known with MSI and irqchip drivers? > > > > > > For fixing this issue, the PCIe drivers could always request MSIs of power of 2, > > > and use a dummy MSI handler for the extra number of MSIs allocated. This could > > > also be done in the generic MSI driver itself to avoid changes in the PCIe > > > drivers. But I wouldn't say it is the best possible fix. > > > > No, that's terrible. This is just papering over a design mistake, and > > I refuse to go down that road. > > > > Agree. But what about other MSI drivers? And because of the MSI design, they > also round the requested MSI count to power of 2, leading to unused vectors and > those also wouldn't get freed. This has absolutely nothing to do with the "design" of MSIs. It has everything to do with not special-casing Multi-MSI. > I think this power of 2 limitation should be > imposed at the API level or in the MSI driver instead of silently keeping unused > vectors in irqchip drivers. You really have the wrong end of the stick. The MSi API has *zero* control over the payload allocation. How could it? The whole point of having an MSI driver is to insulate the core code from such stuff. > > > > > > > Is there any other way to address this issue? Or am I missing something > > > completely? > > > > Well, since each endpoint handled by an ITS has its allocation tracked > > by a bitmap, it makes more sense to precisely track the allocation. > > > > Here's a quick hack that managed to survive a VM boot. It may even > > work. The only problem with it is that it probably breaks a Multi-MSi > > device sitting behind a non-transparent bridge that would get its MSIs > > allocated after another device. In this case, we wouldn't honor the > > required alignment and things would break. > > > > So take this as a proof of concept. If that works, I'll think of how > > to deal with this crap in a more suitable way... > > > > This works fine. Now the ITS driver allocates requested number of MSIs, thanks! Well, as I said, this breaks tons of other things so I'm not going to merge this any time soon. Certainly not before Thomas gets his MSI rework upstream. And then I need to work out how to deal with Multi-MSI in the correct way. So don't hold your breath. M. -- Without deviation from the norm, progress is not possible.