From: Baolu Lu <baolu.lu@linux.intel.com>
To: Yi Liu <yi.l.liu@intel.com>,
"Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>
Cc: baolu.lu@linux.intel.com,
"dwmw2@infradead.org" <dwmw2@infradead.org>,
"joro@8bytes.org" <joro@8bytes.org>,
"will@kernel.org" <will@kernel.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"Peng, Chao P" <chao.p.peng@intel.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Date: Thu, 12 Dec 2024 19:50:43 +0800
Message-ID: <65f58697-3899-41eb-892b-44f66df97876@linux.intel.com>
In-Reply-To: <9a52713b-3a33-4e64-ad8d-8394e9504d75@intel.com>
On 2024/12/12 19:00, Yi Liu wrote:
>
>
> On 2024/12/12 18:01, Duan, Zhenzhong wrote:
>> Hi Yi,
>>
>>> -----Original Message-----
>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>> Sent: Thursday, December 12, 2024 5:29 PM
>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>> cache_tag_flush_range_np()
>>>
>>> On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>> Sent: Thursday, December 12, 2024 3:45 PM
>>>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>>>> cache_tag_flush_range_np()
>>>>>
>>>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>>>> When setting up a mapping on a paging domain before the domain is
>>>>>> attached to any device, a NULL pointer dereference triggers as below:
>>>>>>
>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>>>> #PF: supervisor read access in kernel mode
>>>>>> #PF: error_code(0x0000) - not-present page
>>>>>> RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>>>> ...
>>>>>> Call Trace:
>>>>>> <TASK>
>>>>>> ? __die+0x20/0x70
>>>>>> ? page_fault_oops+0x80/0x150
>>>>>> ? do_user_addr_fault+0x5f/0x670
>>>>>> ? pfn_to_dma_pte+0xca/0x280
>>>>>> ? exc_page_fault+0x78/0x170
>>>>>> ? asm_exc_page_fault+0x22/0x30
>>>>>> ? cache_tag_flush_range_np+0x114/0x1f0
>>>>>> intel_iommu_iotlb_sync_map+0x16/0x20
>>>>>> iommu_map+0x59/0xd0
>>>>>> batch_to_domain+0x154/0x1e0
>>>>>> iopt_area_fill_domains+0x106/0x300
>>>>>> iopt_map_pages+0x1bc/0x290
>>>>>> iopt_map_user_pages+0xe8/0x1e0
>>>>>> ? xas_load+0x9/0xb0
>>>>>> iommufd_ioas_map+0xc9/0x1c0
>>>>>> iommufd_fops_ioctl+0xff/0x1b0
>>>>>> __x64_sys_ioctl+0x87/0xc0
>>>>>> do_syscall_64+0x50/0x110
>>>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>
>>>>>> The qi_batch structure is allocated when a domain is attached to a
>>>>>> device for the first time. When a mapping is set up before that,
>>>>>> qi_batch is dereferenced to do a batched flush, which triggers the
>>>>>> above issue.
>>>>>>
>>>>>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>>>>>> if no device is attached.
>>>>>>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> ---
>>>>>> drivers/iommu/intel/cache.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>>>> --- a/drivers/iommu/intel/cache.c
>>>>>> +++ b/drivers/iommu/intel/cache.c
>>>>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>>>>
>>>>>> static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>>>> {
>>>>>> - if (!iommu || !batch->index)
>>>>>> + if (!iommu || !batch || !batch->index)
>>>>>
>>>>> This is the same issue as in the link below. 🙂 We should fix it by
>>>>> allocating the qi_batch for parent domains. A nested parent domain is
>>>>> never attached to a device at all; it acts more as a background
>>>>> domain, so this fix would skip its future cache flushes, which would
>>>>> cause bigger problems. 🙂
>>>>>
>>>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
>>>>
>>>> Ah, I just saw this 😊
>>>> This patch tries to fix an issue where a mapping is set up on a paging
>>>> domain before it's attached to a device, whereas your patch fixes an
>>>> issue where the nested parent domain's qi_batch is not allocated even
>>>> if the nested domain is attached to a device. I think they are
>>>> different issues?
>>>
>>> Oops. I see. I think your case is allocating a hwpt based on an IOAS
>>> that already has mappings. When the hwpt is added to it, all the
>>> mappings of this IOAS are pushed to the hwpt. But the hwpt has not
>>> been attached yet, so it hits this issue. I remember there is an
>>> immediate_attach bool to let iommufd_hwpt_paging_alloc() do an attach
>>> before calling iopt_table_add_domain(), which pushes the IOAS mappings
>>> to the domain.
>>>
>>> One possible fix is to set immediate_attach to true. But I doubt it
>>> would be accepted, since it was introduced due to some gap on the ARM
>>> side. If that gap has been resolved, this behavior would go away as
>>> well.
>>>
>>> So back to this issue: with this change, the flush would be skipped.
>>> It looks OK to me to skip the cache flush in the map path. And we
>>> should not expect any unmap on this domain since there is no device
>>> attached (a parent domain is an exception), hence there is nothing to
>>> flush even if there is an unmap in the domain's IOAS. So it appears to
>>> be an acceptable fix. @Baolu, your opinion?
>>
>> Hold on, it looks like I was wrong in analyzing the related code in
>> qi_batch_flush_descs().
>> The iommu should always be NULL in my suspected case, so
>> qi_batch_flush_descs() will return early without issue.
>>
>> I reproduced the backtrace when playing with iommufd QEMU nesting. I
>> think your previous comment is correct and I misunderstood the root
>> cause: it's indeed due to the nesting parent domain, not the paging
>> domain. Please ignore this patch.
>
> Great. I also tried allocating a hwpt with an IOAS that already has a
> bunch of mappings, and it works since the iommu is NULL.
>
> @Baolu, you may ignore this patch as it's already fixed.
Okay, thank you!
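
For anyone reading this thread later, the short-circuit behaviour Zhenzhong describes above can be sketched as a small user-space model in C. This is only an illustration under stated assumptions: struct intel_iommu_model, struct qi_desc_model, struct qi_batch_model, BATCH_DESC_COUNT_MODEL and flush_descs_model() are hypothetical stand-ins, not the driver's definitions. The layout (16 descriptors of 32 bytes each, followed by index) is assumed rather than quoted from the thread; if it holds, index sits at offset 0x200, which would be consistent with the faulting address in the oops when the batch pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct intel_iommu_model { int unused; };

/* Assumed: one invalidation descriptor is four 64-bit words (32 bytes). */
struct qi_desc_model { uint64_t qw0, qw1, qw2, qw3; };

#define BATCH_DESC_COUNT_MODEL 16	/* assumed batch capacity */

struct qi_batch_model {
	struct qi_desc_model descs[BATCH_DESC_COUNT_MODEL];
	unsigned int index;		/* number of queued descriptors */
};

/*
 * Mirrors the pre-patch condition "!iommu || !batch->index".
 * Returns 1 when queued descriptors would be submitted, 0 on early return.
 */
static int flush_descs_model(struct intel_iommu_model *iommu,
			     struct qi_batch_model *batch)
{
	/* || short-circuits: batch is only dereferenced when iommu != NULL. */
	if (!iommu || !batch->index)
		return 0;

	batch->index = 0;		/* pretend the descriptors were submitted */
	return 1;
}
```

In this model, flush_descs_model(NULL, NULL) returns 0 without touching batch, which corresponds to the unattached paging domain case. A non-NULL iommu with a NULL batch is the nested parent case and would fault on batch->index, which is why the thread settles on allocating qi_batch for parent domains rather than adding a !batch check.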