devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jon Hunter <jonathanh@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>, Christoph Hellwig <hch@lst.de>
Cc: Vineet Gupta <vgupta@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Hanjun Guo <guohanjun@huawei.com>,
	Sudeep Holla <sudeep.holla@arm.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Rob Herring <robh+dt@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	iommu@lists.linux.dev, devicetree@vger.kernel.org,
	Jason Gunthorpe <jgg@nvidia.com>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>
Subject: Re: [PATCH v4 5/7] iommu/dma: Make limit checks self-contained
Date: Fri, 17 May 2024 15:21:53 +0100	[thread overview]
Message-ID: <46fc1b7f-7d10-4233-b089-aa173ad3bbeb@nvidia.com> (raw)
In-Reply-To: <48c39306-c226-4e7f-a013-d679ca80157e@arm.com>


On 15/05/2024 15:59, Robin Murphy wrote:
> Hi Jon,
> 
> On 2024-05-14 2:27 pm, Jon Hunter wrote:
>> Hi Robin,
>>
>> On 19/04/2024 17:54, Robin Murphy wrote:
>>> It's now easy to retrieve the device's DMA limits if we want to check
>>> them against the domain aperture, so do that ourselves instead of
>>> relying on them being passed through the callchain.
>>>
>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Tested-by: Hanjun Guo <guohanjun@huawei.com>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   drivers/iommu/dma-iommu.c | 21 +++++++++------------
>>>   1 file changed, 9 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index a3039005b696..f542eabaefa4 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -660,19 +660,16 @@ static void iommu_dma_init_options(struct 
>>> iommu_dma_options *options,
>>>   /**
>>>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>>>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>>> - * @base: IOVA at which the mappable address space starts
>>> - * @limit: Last address of the IOVA space
>>>    * @dev: Device the domain is being initialised for
>>>    *
>>> - * @base and @limit + 1 should be exact multiples of IOMMU page 
>>> granularity to
>>> - * avoid rounding surprises. If necessary, we reserve the page at 
>>> address 0
>>> + * If the geometry and dma_range_map include address 0, we reserve 
>>> that page
>>>    * to ensure it is an invalid IOVA. It is safe to reinitialise a 
>>> domain, but
>>>    * any change which could make prior IOVAs invalid will fail.
>>>    */
>>> -static int iommu_dma_init_domain(struct iommu_domain *domain, 
>>> dma_addr_t base,
>>> -                 dma_addr_t limit, struct device *dev)
>>> +static int iommu_dma_init_domain(struct iommu_domain *domain, struct 
>>> device *dev)
>>>   {
>>>       struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>> +    const struct bus_dma_region *map = dev->dma_range_map;
>>>       unsigned long order, base_pfn;
>>>       struct iova_domain *iovad;
>>>       int ret;
>>> @@ -684,18 +681,18 @@ static int iommu_dma_init_domain(struct 
>>> iommu_domain *domain, dma_addr_t base,
>>>       /* Use the smallest supported page size for IOVA granularity */
>>>       order = __ffs(domain->pgsize_bitmap);
>>> -    base_pfn = max_t(unsigned long, 1, base >> order);
>>> +    base_pfn = 1;
>>>       /* Check the domain allows at least some access to the 
>>> device... */
>>> -    if (domain->geometry.force_aperture) {
>>> +    if (map) {
>>> +        dma_addr_t base = dma_range_map_min(map);
>>>           if (base > domain->geometry.aperture_end ||
>>> -            limit < domain->geometry.aperture_start) {
>>> +            dma_range_map_max(map) < domain->geometry.aperture_start) {
>>>               pr_warn("specified DMA range outside IOMMU capability\n");
>>>               return -EFAULT;
>>>           }
>>>           /* ...then finally give it a kicking to make sure it fits */
>>> -        base_pfn = max_t(unsigned long, base_pfn,
>>> -                domain->geometry.aperture_start >> order);
>>> +        base_pfn = max(base, domain->geometry.aperture_start) >> order;
>>>       }
>>>       /* start_pfn is always nonzero for an already-initialised 
>>> domain */
>>> @@ -1760,7 +1757,7 @@ void iommu_setup_dma_ops(struct device *dev, 
>>> u64 dma_base, u64 dma_limit)
>>>        * underlying IOMMU driver needs to support via the dma-iommu 
>>> layer.
>>>        */
>>>       if (iommu_is_dma_domain(domain)) {
>>> -        if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
>>> +        if (iommu_dma_init_domain(domain, dev))
>>>               goto out_err;
>>>           dev->dma_ops = &iommu_dma_ops;
>>>       }
>>
>>
>> I have noticed some random test failures on Tegra186 and Tegra194 and 
>> bisect is pointing to this commit. Reverting this along with the 
>> various dependencies does fix the problem. On Tegra186 CPU hotplug is 
>> failing and on Tegra194 suspend is failing. Unfortunately, on neither 
>> platform do I see any particular crash but the boards hang somewhere.
> 
> That is... thoroughly bemusing :/ Not only is there supposed to be no 
> real functional change here - we should merely be recalculating the same 
> information from dev->dma_range_map that the callers were already doing 
> to generate the base/limit arguments - but the act of initially setting 
> up a default domain for a device behind an IOMMU should have no 
> connection whatsoever to suspend and especially not to CPU hotplug.


Yes it does look odd, but this is what bisect reported ...

git bisect start
# good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# bad: [6ba6c795dc73c22ce2c86006f17c4aa802db2a60] Add linux-next specific files for 20240513
git bisect bad 6ba6c795dc73c22ce2c86006f17c4aa802db2a60
# good: [29e7f949865a023a21ecdfbd82d68ac697569f34] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 29e7f949865a023a21ecdfbd82d68ac697569f34
# skip: [150e6cc14e51f2a07034106a4529cdaafd812c46] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect skip 150e6cc14e51f2a07034106a4529cdaafd812c46
# good: [f5d75327d30af49acf2e4b55f35ce2e6c45d1287] drm/amd/display: Fix invalid Copyright notice
git bisect good f5d75327d30af49acf2e4b55f35ce2e6c45d1287
# skip: [f1ec9a9ffc526df7c9523006c2abbb8ea554cdd8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git
git bisect skip f1ec9a9ffc526df7c9523006c2abbb8ea554cdd8
# bad: [f091e93306e0429ebb7589b9874590b6a9705e64] dma-mapping: Simplify arch_setup_dma_ops()
git bisect bad f091e93306e0429ebb7589b9874590b6a9705e64
# good: [91cfd679f9e8b9a7bf2f26adf66eff99dbe2026b] ACPI/IORT: Handle memory address size limits as limits
git bisect good 91cfd679f9e8b9a7bf2f26adf66eff99dbe2026b
# bad: [ad4750b07d3462ce29a0c9b1e88b2a1f9795290e] iommu/dma: Make limit checks self-contained
git bisect bad ad4750b07d3462ce29a0c9b1e88b2a1f9795290e
# good: [fece6530bf4b59b01a476a12851e07751e73d69f] dma-mapping: Add helpers for dma_range_map bounds
git bisect good fece6530bf4b59b01a476a12851e07751e73d69f
# first bad commit: [ad4750b07d3462ce29a0c9b1e88b2a1f9795290e] iommu/dma: Make limit checks self-contained

There is a couple skips in there and so I will try this again.

>> If you have any ideas on things we can try let me know.
> 
> Since the symptom seems inexplicable, I'd throw the usual memory 
> debugging stuff like KASAN at it first. I'd also try 
> "no_console_suspend" to check whether any late output is being missed in 
> the suspend case (and if it's already broken, then any additional issues 
> that may be caused by the console itself hopefully shouldn't matter).
> 
> For more base-covering, do you have the "arm64: Properly clean up 
> iommu-dma remnants" fix in there already as well? That bug has bisected 
> to patch #6 each time though, so I do still suspect that what you're 
> seeing is likely something else. It does seem potentially significant 
> that those Tegra platforms are making fairly wide use of dma-ranges, but 
> there's no clear idea forming out of that observation just yet...

I was hoping it was the same issue other people had reported,
but the fix provided did not help. I have also tried today's
-next and I am still seeing the issue.

I should have more time next week to look at this further. Let
me confirm which change is causing this and add more debug.

Jon

-- 
nvpublic

  reply	other threads:[~2024-05-17 14:22 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-19 16:54 [PATCH v4 0/7] iommu, dma-mapping: Simplify arch_setup_dma_ops() Robin Murphy
2024-04-19 16:54 ` [PATCH v4 1/7] OF: Retire dma-ranges mask workaround Robin Murphy
2024-04-19 16:54 ` [PATCH v4 2/7] OF: Simplify DMA range calculations Robin Murphy
2024-04-19 16:54 ` [PATCH v4 3/7] ACPI/IORT: Handle memory address size limits as limits Robin Murphy
2024-04-19 16:54 ` [PATCH v4 4/7] dma-mapping: Add helpers for dma_range_map bounds Robin Murphy
2024-04-19 16:54 ` [PATCH v4 5/7] iommu/dma: Make limit checks self-contained Robin Murphy
2024-05-14 13:27   ` Jon Hunter
2024-05-15 14:59     ` Robin Murphy
2024-05-17 14:21       ` Jon Hunter [this message]
2024-05-17 15:03         ` Robin Murphy
2024-05-17 15:54           ` Jerry Snitselaar
2024-05-18 18:31           ` Jerry Snitselaar
2024-05-20 10:26             ` Robin Murphy
2024-05-20 18:11           ` Jon Hunter
     [not found]   ` <aeb13631-7504-4c3c-ba7b-812bf121a60f@oldschoolsolutions.biz>
2024-06-03 19:46     ` Robin Murphy
2024-06-03 19:49       ` Jens Glathe
2024-06-03 19:46     ` Jens Glathe
2024-04-19 16:54 ` [PATCH v4 6/7] iommu/dma: Centralise iommu_setup_dma_ops() Robin Murphy
2024-04-23  9:57   ` Catalin Marinas
2024-04-29 16:31   ` Dmitry Baryshkov
2024-04-29 21:26     ` Dmitry Baryshkov
2024-04-30 12:23       ` Konrad Dybcio
2024-04-30 12:33         ` Robin Murphy
2024-04-29 22:26     ` Robin Murphy
2024-04-30  0:41       ` Dmitry Baryshkov
2024-04-30 10:20         ` Robin Murphy
2024-04-30 10:56           ` Dmitry Baryshkov
2024-04-30 14:39   ` Niklas Schnelle
2024-04-19 16:54 ` [PATCH v4 7/7] dma-mapping: Simplify arch_setup_dma_ops() Robin Murphy
2024-04-23  9:59   ` Catalin Marinas
2024-04-22  6:37 ` [PATCH v4 0/7] iommu, " Christoph Hellwig
2024-04-26 10:07 ` Joerg Roedel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46fc1b7f-7d10-4233-b089-aa173ad3bbeb@nvidia.com \
    --to=jonathanh@nvidia.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=baolu.lu@linux.intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=chenhuacai@kernel.org \
    --cc=decui@microsoft.com \
    --cc=devicetree@vger.kernel.org \
    --cc=dwmw2@infradead.org \
    --cc=frowand.list@gmail.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=guohanjun@huawei.com \
    --cc=haiyangz@microsoft.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=jgg@nvidia.com \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=kernel@xen0n.name \
    --cc=kys@microsoft.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=lpieralisi@kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mjrosato@linux.ibm.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=sudeep.holla@arm.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=vgupta@kernel.org \
    --cc=wei.liu@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).