From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 91E6FCA0EC4 for ; Mon, 11 Aug 2025 22:03:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: Content-Type:In-Reply-To:From:References:Cc:To:Subject:MIME-Version:Date: Message-ID:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=L+1rnDgw8Tv80nH+ey7Ku6m5JqdXMIGwd5iAOUOprBM=; b=Bcp+aMFjx/mFJsRS1fhIk0fWMm UUthfPNrTcb7xJMZljt7+yTQiGMye8kzdNw/6E6Em+NO13slXscDKr2FugMKLISoLJsxgXqeMAQrv w3aMOpxWfm8WeHYZ33jO4N1sp8Hvntw23LGwxbh02vJafL1MJlHedra1HRVcBwcJORiKr2XWoxXr8 XafHWjou/Jr87IiN+yaX8dh/Jv3doGyxW9WDYjdyxbYuAVq03kua7/DNdlrpcAq34sJzmhDfK8Ovo k/ZAkv+KvR3zyrUFgvPMDN3O+qycazTjLAUNL3B243NdUldeeTrzsLgE3p2gczY4LEg2K0Unwsfay LFoJ3cyA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1ulabW-00000009Anp-0UUL; Mon, 11 Aug 2025 22:02:54 +0000 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1ulVeS-00000008SVl-3SM2 for linux-arm-kernel@lists.infradead.org; Mon, 11 Aug 2025 16:45:38 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754930734; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L+1rnDgw8Tv80nH+ey7Ku6m5JqdXMIGwd5iAOUOprBM=; b=IeaYvgAWKbmLAQakgK986hgLw34lTdmkIJY22LWl2L7Gf7ETxFbiZrWLexVi+FZRAROl8S NaDyVVpYqaHEjphFxlam/mCTWHJj3078hcnyzrSUQpwvgHTCIEZ0akCRlzzDW1DVeKou4T guSTdqCL1QqMSvGKQQ6LROgAsfuLg8s= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-644-6cJCwXvwNoiQlP9stqJ-CA-1; Mon, 11 Aug 2025 12:44:05 -0400 X-MC-Unique: 6cJCwXvwNoiQlP9stqJ-CA-1 X-Mimecast-MFC-AGG-ID: 6cJCwXvwNoiQlP9stqJ-CA_1754930644 Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-3b8dac70703so2324279f8f.1 for ; Mon, 11 Aug 2025 09:44:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1754930644; x=1755535444; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=L+1rnDgw8Tv80nH+ey7Ku6m5JqdXMIGwd5iAOUOprBM=; b=VvnK7O55WyJEhq/pBwmLSvrRCFfgeOVb3hNx2r1gweW2coUF+d8AaPhcOHtL343ifZ VXTnA21mL85d3+1Uf2XDGA4OzQn+0TcxPBsI6VbNbxCD2duxzUXrHK6veh+mLxef50XL Bblex+9u+ZJGjks4sWS6pd4/kaT/lYAR8AbFfTfSHuHvtzGfrN5tL2K036xnbuoA8sM/ xzaBGrewnc5luWx1wN70pN67Intg/UruQ695SpVTe9cpbciTdGoRZ3aKPFWlQL/gwVPZ 4T4mGs76sFIesOtyTm3BQqSdL+MDbnc8J30jUz/rADa4imISZQWoSu0snhBMq+0x2CKP HBkw== X-Forwarded-Encrypted: i=1; AJvYcCWP7eZpzGdtFEOQ0xlt80iFBZ8OVaARLhGl8L63aQ81sctPe9niGwSOs2ypzWbORVvunaNGrn1oOcSSGstMqObL@lists.infradead.org X-Gm-Message-State: AOJu0Yzjxi3MbOQorBXeltGrmJwFnkIJcS29hLXP0DXVysia4vlfizsG nAsPlctiFwSWC9tY6tCTnnlY6rbgg5xlw2x7aoEZGQy/8ahcRPO6rDyqDC7cV0DM+WwbbBBJ2U/ JIg+kYdsus2eOTTFNE5R6D7IWLTfZ8dGZ+h84wwLbRvbtlqjG/AeJpUoiLr+zLYG2EFLV+UCqkA iw X-Gm-Gg: ASbGncs4nbX0AqiK7OdwWdxp9M68GdRja5X42Yav2EMhMissxwjJCs3Ej+PXH0G2a/P awTmt2w5LpkdP9NhDp8iw5yXL6PEZQNLSorpncD7LEE1zaGy4ej5JlX4O8/6XaOw0UKki+GOJEJ 15tqulSjSH+fPZRx+kJMaSoA5zyZKWn6NpCp5nxdh1ODyYjNMzt4sB9q9N5K46Pb/8YwRfLJmFI 6tnERj9qLBDR6UjFYVXLd19bg6GgZJ6ynM1OTvHTMTJKZTMEL5ox1xdYTM1dX2lO9FE7Rs/d+7i ReWh+djytPk1mIgctf7iCAgki9dONhtUY4pXu/eWfTc2HoxBp4ssLqRjBcauldOOydXf2tjHL+J dsqXYJA== X-Received: by 2002:a05:6000:250d:b0:3a5:8934:4959 with SMTP id ffacd0b85a97d-3b9111f4b67mr224488f8f.27.1754930643996; Mon, 11 Aug 2025 09:44:03 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF25oCoGLeshSvyEUotSfgJPp3ymDVFFAcZXUEv7L40wYhQ5+RuhrVgr4ahVqH9ylExHZdhLQ== X-Received: by 2002:a05:6000:250d:b0:3a5:8934:4959 with SMTP id ffacd0b85a97d-3b9111f4b67mr224430f8f.27.1754930643412; Mon, 11 Aug 2025 09:44:03 -0700 (PDT) Received: from ?IPV6:2a01:e0a:f0e:9070:527b:9dff:feef:3874? ([2a01:e0a:f0e:9070:527b:9dff:feef:3874]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-459db13fc9fsm335275495e9.7.2025.08.11.09.44.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 11 Aug 2025 09:44:02 -0700 (PDT) Message-ID: Date: Mon, 11 Aug 2025 18:44:01 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path To: Robin Murphy , Lorenzo Pieralisi , Hanjun Guo , Sudeep Holla , "Rafael J. Wysocki" , Len Brown , Russell King , Greg Kroah-Hartman , Danilo Krummrich , Stuart Yoder , Laurentiu Tudor , Nipun Gupta , Nikhil Agarwal , Joerg Roedel , Will Deacon , Rob Herring , Saravana Kannan , Bjorn Helgaas , Jerry Snitselaar , Jean-Philippe Brucker Cc: linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, devicetree@vger.kernel.org, linux-pci@vger.kernel.org, Charan Teja Kalla References: From: Eric Auger In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: ww2Jn16PPsA8jAk-0ezi4Y65sfkHMZsKMQ1g-MgLWto_1754930644 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250811_094537_008140_192D8F70 X-CRM114-Status: GOOD ( 50.31 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Robin, On 2/28/25 4:46 PM, Robin Murphy wrote: > In hindsight, there were some crucial subtleties overlooked when moving > {of,acpi}_dma_configure() to driver probe time to allow waiting for > IOMMU drivers with -EPROBE_DEFER, and these have become an > ever-increasing source of problems. The IOMMU API has some fundamental > assumptions that iommu_probe_device() is called for every device added > to the system, in the order in which they are added. Calling it in a > random order or not at all dependent on driver binding leads to > malformed groups, a potential lack of isolation for devices with no > driver, and all manner of unexpected concurrency and race conditions. > We've attempted to mitigate the latter with point-fix bodges like > iommu_probe_device_lock, but it's a losing battle and the time has come > to bite the bullet and address the true source of the problem instead. > > The crux of the matter is that the firmware parsing actually serves two > distinct purposes; one is identifying the IOMMU instance associated with > a device so we can check its availability, the second is actually > telling that instance about the relevant firmware-provided data for the > device. However the latter also depends on the former, and at the time > there was no good place to defer and retry that separately from the > availability check we also wanted for client driver probe. > > Nowadays, though, we have a proper notion of multiple IOMMU instances in > the core API itself, and each one gets a chance to probe its own devices > upon registration, so we can finally make that work as intended for > DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to > be able to run the iommu_fwspec machinery currently buried deep in the > wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be > surprisingly straightforward to bootstrap this transformation by pretty > much just calling the same path twice. At client driver probe time, > dev->driver is obviously set; conversely at device_add(), or a > subsequent bus_iommu_probe(), any device waiting for an IOMMU really > should *not* have a driver already, so we can use that as a condition to > disambiguate the two cases, and avoid recursing back into the IOMMU core > at the wrong times. > > Obviously this isn't the nicest thing, but for now it gives us a > functional baseline to then unpick the layers in between without many > more awkward cross-subsystem patches. There are some minor side-effects > like dma_range_map potentially being created earlier, and some debug > prints being repeated, but these aren't significantly detrimental. Let's > make things work first, then deal with making them nice. > > With the basic flow finally in the right order again, the next step is > probably turning the bus->dma_configure paths inside-out, since all we > really need from bus code is its notion of which device and input ID(s) > to parse the common firmware properties with... > > Acked-by: Bjorn Helgaas # pci-driver.c > Acked-by: Rob Herring (Arm) # of/device.c > Signed-off-by: Robin Murphy This patch seems to break virtio-iommu along with qemu. After applying it we cannot see any iommu group. I don't have any specific warning in dmesg though. Reverting bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") fixes the issue for me. Added Jerry and jean-Philippe in the loop. Thanks Eric > --- > > v2: > - Comment bus driver changes for clarity > - Use dev->iommu as the now-robust replay condition > - Drop the device_iommu_mapped() checks in the firmware paths as they > weren't doing much - we can't replace probe_device_lock just yet... > > drivers/acpi/arm64/dma.c | 5 +++++ > drivers/acpi/scan.c | 7 ------- > drivers/amba/bus.c | 3 ++- > drivers/base/platform.c | 3 ++- > drivers/bus/fsl-mc/fsl-mc-bus.c | 3 ++- > drivers/cdx/cdx.c | 3 ++- > drivers/iommu/iommu.c | 24 +++++++++++++++++++++--- > drivers/iommu/of_iommu.c | 7 ++++++- > drivers/of/device.c | 7 ++++++- > drivers/pci/pci-driver.c | 3 ++- > 10 files changed, 48 insertions(+), 17 deletions(-) > > diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c > index 52b2abf88689..f30f138352b7 100644 > --- a/drivers/acpi/arm64/dma.c > +++ b/drivers/acpi/arm64/dma.c > @@ -26,6 +26,11 @@ void acpi_arch_dma_setup(struct device *dev) > else > end = (1ULL << 32) - 1; > > + if (dev->dma_range_map) { > + dev_dbg(dev, "dma_range_map already set\n"); > + return; > + } > + > ret = acpi_dma_get_range(dev, &map); > if (!ret && map) { > end = dma_range_map_max(map); > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > index 9f4efa8f75a6..fb1fe9f3b1a3 100644 > --- a/drivers/acpi/scan.c > +++ b/drivers/acpi/scan.c > @@ -1632,13 +1632,6 @@ static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in) > err = viot_iommu_configure(dev); > mutex_unlock(&iommu_probe_device_lock); > > - /* > - * If we have reason to believe the IOMMU driver missed the initial > - * iommu_probe_device() call for dev, replay it to get things in order. > - */ > - if (!err && dev->bus) > - err = iommu_probe_device(dev); > - > return err; > } > > diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c > index 8ef259b4d037..71482d639a6d 100644 > --- a/drivers/amba/bus.c > +++ b/drivers/amba/bus.c > @@ -364,7 +364,8 @@ static int amba_dma_configure(struct device *dev) > ret = acpi_dma_configure(dev, attr); > } > > - if (!ret && !drv->driver_managed_dma) { > + /* @drv may not be valid when we're called from the IOMMU layer */ > + if (!ret && dev->driver && !drv->driver_managed_dma) { > ret = iommu_device_use_default_domain(dev); > if (ret) > arch_teardown_dma_ops(dev); > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index 6f2a33722c52..1813cfd0c4bd 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev) > attr = acpi_get_dma_attr(to_acpi_device_node(fwnode)); > ret = acpi_dma_configure(dev, attr); > } > - if (ret || drv->driver_managed_dma) > + /* @drv may not be valid when we're called from the IOMMU layer */ > + if (ret || !dev->driver || drv->driver_managed_dma) > return ret; > > ret = iommu_device_use_default_domain(dev); > diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c > index d1f3d327ddd1..a8be8cf246fb 100644 > --- a/drivers/bus/fsl-mc/fsl-mc-bus.c > +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c > @@ -153,7 +153,8 @@ static int fsl_mc_dma_configure(struct device *dev) > else > ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id); > > - if (!ret && !mc_drv->driver_managed_dma) { > + /* @mc_drv may not be valid when we're called from the IOMMU layer */ > + if (!ret && dev->driver && !mc_drv->driver_managed_dma) { > ret = iommu_device_use_default_domain(dev); > if (ret) > arch_teardown_dma_ops(dev); > diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c > index c573ed2ee71a..780fb0c4adba 100644 > --- a/drivers/cdx/cdx.c > +++ b/drivers/cdx/cdx.c > @@ -360,7 +360,8 @@ static int cdx_dma_configure(struct device *dev) > return ret; > } > > - if (!ret && !cdx_drv->driver_managed_dma) { > + /* @cdx_drv may not be valid when we're called from the IOMMU layer */ > + if (!ret && dev->driver && !cdx_drv->driver_managed_dma) { > ret = iommu_device_use_default_domain(dev); > if (ret) > arch_teardown_dma_ops(dev); > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index a3b45b84f42b..1cec7074367a 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev) > if (!dev_iommu_get(dev)) > return -ENOMEM; > /* > - * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU > - * instances with non-NULL fwnodes, and client devices should have been > - * identified with a fwspec by this point. Otherwise, we can currently > + * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing > + * is buried in the bus dma_configure path. Properly unpicking that is > + * still a big job, so for now just invoke the whole thing. The device > + * already having a driver bound means dma_configure has already run and > + * either found no IOMMU to wait for, or we're in its replay call right > + * now, so either way there's no point calling it again. > + */ > + if (!dev->driver && dev->bus->dma_configure) { > + mutex_unlock(&iommu_probe_device_lock); > + dev->bus->dma_configure(dev); > + mutex_lock(&iommu_probe_device_lock); > + } > + /* > + * At this point, relevant devices either now have a fwspec which will > + * match ops registered with a non-NULL fwnode, or we can reasonably > * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can > * be present, and that any of their registered instances has suitable > * ops for probing, and thus cheekily co-opt the same mechanism. > @@ -426,6 +438,12 @@ static int iommu_init_device(struct device *dev) > ret = -ENODEV; > goto err_free; > } > + /* > + * And if we do now see any replay calls, they would indicate someone > + * misusing the dma_configure path outside bus code. > + */ > + if (dev->driver) > + dev_WARN(dev, "late IOMMU probe at driver bind, something fishy here!\n"); > > if (!try_module_get(ops->owner)) { > ret = -EINVAL; > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c > index e10a68b5ffde..6b989a62def2 100644 > --- a/drivers/iommu/of_iommu.c > +++ b/drivers/iommu/of_iommu.c > @@ -155,7 +155,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np, > dev_iommu_free(dev); > mutex_unlock(&iommu_probe_device_lock); > > - if (!err && dev->bus) > + /* > + * If we're not on the iommu_probe_device() path (as indicated by the > + * initial dev->iommu) then try to simulate it. This should no longer > + * happen unless of_dma_configure() is being misused outside bus code. > + */ > + if (!err && dev->bus && !dev_iommu_present) > err = iommu_probe_device(dev); > > if (err && err != -EPROBE_DEFER) > diff --git a/drivers/of/device.c b/drivers/of/device.c > index edf3be197265..5053e5d532cc 100644 > --- a/drivers/of/device.c > +++ b/drivers/of/device.c > @@ -99,6 +99,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, > bool coherent, set_map = false; > int ret; > > + if (dev->dma_range_map) { > + dev_dbg(dev, "dma_range_map already set\n"); > + goto skip_map; > + } > + > if (np == dev->of_node) > bus_np = __of_get_dma_parent(np); > else > @@ -119,7 +124,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, > end = dma_range_map_max(map); > set_map = true; > } > - > +skip_map: > /* > * If @dev is expected to be DMA-capable then the bus code that created > * it should have initialised its dma_mask pointer by this point. For > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index f57ea36d125d..4468b7327cab 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -1653,7 +1653,8 @@ static int pci_dma_configure(struct device *dev) > > pci_put_host_bridge_device(bridge); > > - if (!ret && !driver->driver_managed_dma) { > + /* @driver may not be valid when we're called from the IOMMU layer */ > + if (!ret && dev->driver && !driver->driver_managed_dma) { > ret = iommu_device_use_default_domain(dev); > if (ret) > arch_teardown_dma_ops(dev);