From: Robin Murphy <robin.murphy@arm.com>
To: Christian Schrefl <chrisi.schrefl@gmail.com>,
Jason Gunthorpe <jgg@ziepe.ca>
Cc: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>,
Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>,
Rob Clark <robin.clark@oss.qualcomm.com>,
Matti Vaittinen <mazziesaccount@gmail.com>,
iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org,
Rudraksha Gupta <guptarud@gmail.com>
Subject: Re: [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064
Date: Fri, 23 Jan 2026 18:02:08 +0000 [thread overview]
Message-ID: <da3fcb7f-4ac4-41c0-b3ad-055c6766fd5c@arm.com> (raw)
In-Reply-To: <5cab0953-c2a3-4baa-af91-e9519afef092@gmail.com>
On 2026-01-15 7:02 pm, Christian Schrefl wrote:
> Hi Robin,
>
> On 1/6/26 7:50 PM, Robin Murphy wrote:
>> On 2026-01-05 6:02 pm, Jason Gunthorpe wrote:
>>> Though this looks really weird:
>>>
>>> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
>>> int sid;
>>>
>>> if (list_empty(&(*iommu)->ctx_list)) {
>>> master = kzalloc(sizeof(*master), GFP_ATOMIC);
>>>
>>> So maybe master is NULL and !list_empty()?
>>
>> AFAICS that could happen if of_xlate has run once, and then for whatever
>> reason dev->iommu is torn down and the whole process started from
>> scratch a second time - although I'm struggling to see any obvious cause
>> for that in the absence of any other visible errors or async probe
>> races (and assuming that the IOMMUs on this platform ever actually
>> worked at all, of course...)
>>
>> However it certainly stands out as a bit wrong that of_xlate is touching
>> the IOMMU instance itself - that should never have been expected to work
>> well. Does the diff below help?
>>
>> Thanks,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>> index 819add75a665..e57780fc3287 100644
>> --- a/drivers/iommu/msm_iommu.c
>> +++ b/drivers/iommu/msm_iommu.c
>> @@ -360,14 +360,11 @@ static int msm_iommu_domain_config(struct msm_priv *priv)
>> /* Must be called under msm_iommu_lock */
>> static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
>> {
>> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> struct msm_iommu_dev *iommu, *ret = NULL;
>> - struct msm_iommu_ctx_dev *master;
>>
>> list_for_each_entry(iommu, &qcom_iommu_devices, dev_node) {
>> - master = list_first_entry(&iommu->ctx_list,
>> - struct msm_iommu_ctx_dev,
>> - list);
>> - if (master->of_node == dev->of_node) {
>> + if (iommu->dev->fwnode == fwspec->iommu_fwnode) {
>> ret = iommu;
>> break;
>> }
>> @@ -378,6 +375,7 @@ static struct msm_iommu_dev *find_iommu_for_dev(struct device *dev)
>>
>> static struct iommu_device *msm_iommu_probe_device(struct device *dev)
>> {
>> + struct msm_iommu_ctx_dev *master;
>> struct msm_iommu_dev *iommu;
>> unsigned long flags;
>>
>> @@ -388,6 +386,8 @@ static struct iommu_device *msm_iommu_probe_device(struct device *dev)
>> if (!iommu)
>> return ERR_PTR(-ENODEV);
>>
>> + master = dev_iommu_priv_get(dev);
>> + list_add(&master->list, &iommu->ctx_list);
>> return &iommu->iommu;
>> }
>>
>> @@ -604,14 +604,13 @@ static int insert_iommu_master(struct device *dev,
>> struct msm_iommu_ctx_dev *master = dev_iommu_priv_get(dev);
>> int sid;
>>
>> - if (list_empty(&(*iommu)->ctx_list)) {
>> + if (!master) {
>> master = kzalloc(sizeof(*master), GFP_ATOMIC);
>> if (!master) {
>> dev_err(dev, "Failed to allocate iommu_master\n");
>> return -ENOMEM;
>> }
>> master->of_node = dev->of_node;
>> - list_add(&master->list, &(*iommu)->ctx_list);
>> dev_iommu_priv_set(dev, master);
>> }
>>
>
> Sorry for taking so long until I got to trying it out.
> The patch fixes the crash!
Bah, in the process of writing this up properly I've realised that
although it fixes the crash, I think it breaks the multi-IOMMU
functionality, as .add_device will only be called for the first IOMMU
instance, whereas the current code is cheekily abusing .of_xlate to link
the device to both instances. And in fact that implies the existing code
is yet more buggy, as it seems it will leak and reallocate
dev_iommu_priv the first time it looks up the second instance (whose
ctx_list will be empty), thus it's only working at all because in all
cases the DT happens to list all the IDs for one instance before all for
the other, and both use an identical set of ID values. Oh dear...
Are you using the GPU and/or display device(s) enough to be able to tell
whether their IOMMUs are working as expected?
Thanks,
Robin.
>
> Cheers,
> Christian
>
next prev parent reply other threads:[~2026-01-23 18:02 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-29 22:26 [REGRESSION] qcom: iommu: nullpointer dereference on boot on apq8064 Christian Schrefl
2026-01-05 18:02 ` Jason Gunthorpe
2026-01-06 18:50 ` Robin Murphy
2026-01-15 19:02 ` Christian Schrefl
2026-01-23 18:02 ` Robin Murphy [this message]
2026-01-23 18:10 ` Christian Schrefl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=da3fcb7f-4ac4-41c0-b3ad-055c6766fd5c@arm.com \
--to=robin.murphy@arm.com \
--cc=chrisi.schrefl@gmail.com \
--cc=dmitry.baryshkov@oss.qualcomm.com \
--cc=guptarud@gmail.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@ziepe.ca \
--cc=konrad.dybcio@oss.qualcomm.com \
--cc=linux-arm-msm@vger.kernel.org \
--cc=mazziesaccount@gmail.com \
--cc=robin.clark@oss.qualcomm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox