From: Yi Liu <yi.l.liu@intel.com>
To: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>
Cc: "dwmw2@infradead.org" <dwmw2@infradead.org>,
"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"will@kernel.org" <will@kernel.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"Peng, Chao P" <chao.p.peng@intel.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Date: Thu, 12 Dec 2024 19:00:23 +0800
Message-ID: <9a52713b-3a33-4e64-ad8d-8394e9504d75@intel.com>
In-Reply-To: <SJ0PR11MB6744EF3EB81780C1EA07FB1F923F2@SJ0PR11MB6744.namprd11.prod.outlook.com>
On 2024/12/12 18:01, Duan, Zhenzhong wrote:
> Hi Yi,
>
>> -----Original Message-----
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, December 12, 2024 5:29 PM
>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>> cache_tag_flush_range_np()
>>
>> On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>> Sent: Thursday, December 12, 2024 3:45 PM
>>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>>> cache_tag_flush_range_np()
>>>>
>>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>>> When a mapping is set up on a paging domain before the domain is attached
>>>>> to any device, a NULL pointer dereference triggers as below:
>>>>>
>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>>> #PF: supervisor read access in kernel mode
>>>>> #PF: error_code(0x0000) - not-present page
>>>>> RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>>> ...
>>>>> Call Trace:
>>>>> <TASK>
>>>>> ? __die+0x20/0x70
>>>>> ? page_fault_oops+0x80/0x150
>>>>> ? do_user_addr_fault+0x5f/0x670
>>>>> ? pfn_to_dma_pte+0xca/0x280
>>>>> ? exc_page_fault+0x78/0x170
>>>>> ? asm_exc_page_fault+0x22/0x30
>>>>> ? cache_tag_flush_range_np+0x114/0x1f0
>>>>> intel_iommu_iotlb_sync_map+0x16/0x20
>>>>> iommu_map+0x59/0xd0
>>>>> batch_to_domain+0x154/0x1e0
>>>>> iopt_area_fill_domains+0x106/0x300
>>>>> iopt_map_pages+0x1bc/0x290
>>>>> iopt_map_user_pages+0xe8/0x1e0
>>>>> ? xas_load+0x9/0xb0
>>>>> iommufd_ioas_map+0xc9/0x1c0
>>>>> iommufd_fops_ioctl+0xff/0x1b0
>>>>> __x64_sys_ioctl+0x87/0xc0
>>>>> do_syscall_64+0x50/0x110
>>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>
>>>>> The qi_batch structure is allocated when the domain is attached to a
>>>>> device for the first time. When a mapping is set up before that, qi_batch
>>>>> is dereferenced to do a batched flush, triggering the above issue.
>>>>>
>>>>> Fix it by checking the qi_batch pointer and bypassing batched flushing if
>>>>> no device is attached.
>>>>>
>>>>> Cc: stable@vger.kernel.org
>>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>> drivers/iommu/intel/cache.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>>> --- a/drivers/iommu/intel/cache.c
>>>>> +++ b/drivers/iommu/intel/cache.c
>>>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>>>
>>>>> static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>>> {
>>>>> - if (!iommu || !batch->index)
>>>>> + if (!iommu || !batch || !batch->index)
>>>>
>>>> This is the same issue as in the link below. :) We should fix it by
>>>> allocating the qi_batch for parent domains. A nested parent domain is
>>>> not going to be attached to a device at all; it acts more as a background
>>>> domain, so this fix would miss future cache flushes and cause bigger
>>>> problems. :)
>>>>
>>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
>>>
>>> Ah, just saw this 😊
>>> This patch tries to fix an issue where a mapping is set up on a paging
>>> domain before it's attached to a device, while your patch fixed an issue
>>> where the nested parent domain's qi_batch is not allocated even though the
>>> nested domain is attached to a device. I think they are different issues?
>>
>> Oops, I see. I think your case is allocating a hwpt based on an IOAS that
>> already has mappings. When the hwpt is added to it, all the mappings of
>> this IOAS are pushed to the hwpt. But the hwpt has not been attached
>> yet, so it hits this issue. I remember there is an immediate_attach bool to
>> let iommufd_hwpt_paging_alloc() do an attach before calling
>> iopt_table_add_domain(), which pushes the IOAS mappings to the domain.
>>
>> One possible fix is to set immediate_attach to true. But I doubt that
>> will be agreed on, since it was introduced due to a gap on the ARM side. If
>> that gap has been resolved, this behavior would go away as well.
>>
>> So back to this issue: with this change, the flush would be skipped. It
>> looks OK to me to skip the cache flush for the map path. And we should not
>> expect any unmap on this domain since there is no device attached (parent
>> domains are an exception), hence there is nothing to be flushed even if
>> there is an unmap in the domain's IOAS. So it appears to be an acceptable
>> fix. @Baolu, your opinion?
>
> Hold on, it looks like I was wrong in analyzing the related code,
> qi_batch_flush_descs(). The iommu pointer should always be NULL in my
> suspected case, so qi_batch_flush_descs() will return early without issue.
>
> I reproduced the backtrace when playing with iommufd QEMU nesting. I think
> your previous comment is correct; I misunderstood the root cause. It's
> indeed due to the nested parent domain, not the paging domain. Please
> ignore this patch.
Great. I also tried allocating a hwpt with an IOAS that already has a bunch
of mappings; it works, as the iommu is NULL.
@Baolu, you may ignore this patch as it's already fixed.
--
Regards,
Yi Liu
2024-12-12 7:24 [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np() Zhenzhong Duan
2024-12-12 7:45 ` Yi Liu
2024-12-12 8:19 ` Duan, Zhenzhong
2024-12-12 9:28 ` Yi Liu
2024-12-12 10:01 ` Duan, Zhenzhong
2024-12-12 11:00 ` Yi Liu [this message]
2024-12-12 11:50 ` Baolu Lu