From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D942AC3DA41 for ; Mon, 8 Jul 2024 17:32:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:MIME-Version: References:In-Reply-To:Subject:Cc:To:From:Message-ID:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=36ZRG7t2NsaC0A8qieCzuTEdMcFELV9Ox8phA1TGkjE=; b=tdCaCc+et9ZcOFIY3m1gWI4dwN zVcmEb8f9lkgYnNsGjJRXRHCLT2OaDtfycyKsRQQmnAdJg+t/duKr+Lr/oBjZ2lVfpqrvbwG42yNy 2pu5BkWRuruYlOMAst6Num1rszMGQ/TSdIu40JhkRj8c13spbMF5ubIe2HjkiGHnW4wJVdOh4r9Fu LHwhGwR+boVsnD6VxJU5qHXZ+FstSnjJ+qSONeJ6MSxKCpso4a0FMduv1FKlEg20+s3a4iiT1RHMu izBGbaRkexc63E5U2Snd9OxJaiIED3zM3REiIOjU6wdgXf0gggUHGmxZMY5Jp+0WPGNjov9ZvPbbg 5/A1fwbQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sQsDq-00000004b27-1YAG; Mon, 08 Jul 2024 17:32:18 +0000 Received: from dfw.source.kernel.org ([139.178.84.217]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sQsDa-00000004axi-3FE6 for linux-arm-kernel@lists.infradead.org; Mon, 08 Jul 2024 17:32:04 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 0B01360EAA; Mon, 8 Jul 2024 17:32:02 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4D3EC116B1; Mon, 8 Jul 2024 17:32:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720459921; bh=zsEq83nm7Ozv8Eb4jLEb08q+BCcaa6z3adsiWS4LS28=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=kFaZVUysheBnN0iK09LLpNTdgO3oIbTk3x+n8v+jUpAIyf+3FDBr2MD+XsR4Mgvoe uNlV08Jg6VUcZRfN0v0p30CKsIU70pB2Um/2C0DZAkI3keBVtdyFCoL8t0MAHiSTti 98B9JClhalu8HIkqGrpg/3fgip+KKOdqIBZ8wfdwmY29pVIcarS9r47gB+wG6U2c+D m2d/435NRLsw8H5M3EQ+ogaAuJqd1JLpqtMXSZyhHSAtE4UquMFQsVGlrsNLDo7Fif kfIVpqVsscNtxbz6VuajnJrB+5nrHxO7Z+qPSeK1HR8dFmXiz9cSwmsvxTzGNZCC9P UVTK7S2TIXnrw== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1sQsDW-00Aftu-7Y; Mon, 08 Jul 2024 18:31:58 +0100 Date: Mon, 08 Jul 2024 18:31:57 +0100 Message-ID: <865xtf4woi.wl-maz@kernel.org> From: Marc Zyngier To: Manivannan Sadhasivam Cc: tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: MSIs not freed in GICv3 ITS driver In-Reply-To: <20240708153933.GC5745@thinkpad> References: <20240708153933.GC5745@thinkpad> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.3 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: manivannan.sadhasivam@linaro.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240708_103203_022199_197CA04E X-CRM114-Status: GOOD ( 39.24 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Mani, On Mon, 08 Jul 2024 16:39:33 +0100, Manivannan Sadhasivam wrote: > > Hi Marc, Thomas, > > I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from > PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to > allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds > the MSI count to power of 2 to find the order. In this case, the order becomes 2 > in its_alloc_device_irq(). That's because we can only allocate EventIDs as a number of ID bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or 2^24. This is a power-of-two architecture. > So 4 entries are allocated by bitmap_find_free_region(). Assuming you're calling about its_alloc_device_irq(), it looks like a bug. Or rather, some laziness on my part. The thing is, this bitmap is only dealing with sub-allocation in the pool that has been given to the endpoint. So the power-of-two crap doesn't really matter unless you are dealing with Multi-MSI, which has actual alignment requirements. > > But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc() > will only allocate 3 MSIs, leaving one bitmap entry unused. > > And when the driver frees the MSIs using pci_free_irq_vectors(), only 3 > allocated MSIs were freed and their bitmap entries were also released. But the > entry for the additional bitmap was never released. Due to this, > its_free_device() was also never called, resulting in the ITS device not getting > freed. > > So when the PCIe driver tries to request the MSIs again (PCIe device being > removed and inserted back), because the ITS device was not freed previously, > MSIs were again requested for the same ITS device. And due to the stale bitmap > entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were > available. This forces the PCIe driver to reduce the MSI count, which is sub > optimal. > > This behavior might be applicable to other irqchip drivers handling MSI as well. > I want to know if this behavior is already known with MSI and irqchip drivers? > > For fixing this issue, the PCIe drivers could always request MSIs of power of 2, > and use a dummy MSI handler for the extra number of MSIs allocated. This could > also be done in the generic MSI driver itself to avoid changes in the PCIe > drivers. But I wouldn't say it is the best possible fix. No, that's terrible. This is just papering over a design mistake, and I refuse to go down that road. > > Is there any other way to address this issue? Or am I missing something > completely? Well, since each endpoint handled by an ITS has its allocation tracked by a bitmap, it makes more sense to precisely track the allocation. Here's a quick hack that managed to survive a VM boot. It may even work. The only problem with it is that it probably breaks a Multi-MSi device sitting behind a non-transparent bridge that would get its MSIs allocated after another device. In this case, we wouldn't honor the required alignment and things would break. So take this as a proof of concept. If that works, I'll think of how to deal with this crap in a more suitable way... M. diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 3c755d5dad6e6..43479c9e7f8d2 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3475,15 +3475,16 @@ static void its_free_device(struct its_device *its_dev) static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq) { - int idx; + unsigned long idx; /* Find a free LPI region in lpi_map and allocate them. */ - idx = bitmap_find_free_region(dev->event_map.lpi_map, - dev->event_map.nr_lpis, - get_count_order(nvecs)); - if (idx < 0) + idx = bitmap_find_next_zero_area(dev->event_map.lpi_map, + dev->event_map.nr_lpis, 0, nvecs, 0); + if (idx >= dev->event_map.nr_lpis) return -ENOSPC; + bitmap_set(dev->event_map.lpi_map, idx, nvecs); + *hwirq = dev->event_map.lpi_base + idx; return 0; @@ -3653,9 +3654,9 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq, struct its_node *its = its_dev->its; int i; - bitmap_release_region(its_dev->event_map.lpi_map, - its_get_event_id(irq_domain_get_irq_data(domain, virq)), - get_count_order(nr_irqs)); + bitmap_clear(its_dev->event_map.lpi_map, + its_get_event_id(irq_domain_get_irq_data(domain, virq)), + nr_irqs); for (i = 0; i < nr_irqs; i++) { struct irq_data *data = irq_domain_get_irq_data(domain, -- Without deviation from the norm, progress is not possible.