Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Yi Liu <yi.l.liu@intel.com>
To: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>
Cc: "dwmw2@infradead.org" <dwmw2@infradead.org>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"Peng, Chao P" <chao.p.peng@intel.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Date: Thu, 12 Dec 2024 19:00:23 +0800	[thread overview]
Message-ID: <9a52713b-3a33-4e64-ad8d-8394e9504d75@intel.com> (raw)
In-Reply-To: <SJ0PR11MB6744EF3EB81780C1EA07FB1F923F2@SJ0PR11MB6744.namprd11.prod.outlook.com>



On 2024/12/12 18:01, Duan, Zhenzhong wrote:
> Hi Yi,
> 
>> -----Original Message-----
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, December 12, 2024 5:29 PM
>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>> cache_tag_flush_range_np()
>>
>> On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>> Sent: Thursday, December 12, 2024 3:45 PM
>>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>>> cache_tag_flush_range_np()
>>>>
>>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>>> When setup mapping on an paging domain before the domain is attached to
>>>> any
>>>>> device, a NULL pointer dereference triggers as below:
>>>>>
>>>>>     BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>>>     #PF: supervisor read access in kernel mode
>>>>>     #PF: error_code(0x0000) - not-present page
>>>>>     RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>>>     ...
>>>>>     Call Trace:
>>>>>      <TASK>
>>>>>      ? __die+0x20/0x70
>>>>>      ? page_fault_oops+0x80/0x150
>>>>>      ? do_user_addr_fault+0x5f/0x670
>>>>>      ? pfn_to_dma_pte+0xca/0x280
>>>>>      ? exc_page_fault+0x78/0x170
>>>>>      ? asm_exc_page_fault+0x22/0x30
>>>>>      ? cache_tag_flush_range_np+0x114/0x1f0
>>>>>      intel_iommu_iotlb_sync_map+0x16/0x20
>>>>>      iommu_map+0x59/0xd0
>>>>>      batch_to_domain+0x154/0x1e0
>>>>>      iopt_area_fill_domains+0x106/0x300
>>>>>      iopt_map_pages+0x1bc/0x290
>>>>>      iopt_map_user_pages+0xe8/0x1e0
>>>>>      ? xas_load+0x9/0xb0
>>>>>      iommufd_ioas_map+0xc9/0x1c0
>>>>>      iommufd_fops_ioctl+0xff/0x1b0
>>>>>      __x64_sys_ioctl+0x87/0xc0
>>>>>      do_syscall_64+0x50/0x110
>>>>>      entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>
>>>>> qi_batch structure is allocated when domain is attached to a device for the
>>>>> first time, when a mapping is setup before that, qi_batch is referenced to
>>>>> do batched flush and trigger above issue.
>>>>>
>>>>> Fix it by checking qi_batch pointer and bypass batched flushing if no
>>>>> device is attached.
>>>>>
>>>>> Cc: stable@vger.kernel.org
>>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>     drivers/iommu/intel/cache.c | 2 +-
>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>>> --- a/drivers/iommu/intel/cache.c
>>>>> +++ b/drivers/iommu/intel/cache.c
>>>>> @@ -264,7 +264,7 @@ static unsigned long
>>>> calculate_psi_aligned_address(unsigned long start,
>>>>>
>>>>>     static void qi_batch_flush_descs(struct intel_iommu *iommu, struct
>> qi_batch
>>>> *batch)
>>>>>     {
>>>>> -	if (!iommu || !batch->index)
>>>>> +	if (!iommu || !batch || !batch->index)
>>>>
>>>> this is the same issue in the below link. :) And we should fix it by
>>>> allocating the qi_batch for parent domains. The nested parent domains is
>>>> not going to be attached to device at all. It acts more as a background
>>>> domain, so this fix will miss the future cache flushes. It would have
>>>> bigger problems. :)
>>>>
>>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-
>>>> yi.l.liu@intel.com/
>>>
>>> Ah, just see this😊
>>> This patch tries to fix an issue that mapping setup on a paging domain before
>>> it's attached to a device, your patch fixed an issue that nested parent
>>> domain's qi_batch is not allocated even if nested domain is attached to
>>> a device. I think they are different issues?
>>
>> Oops. I see. I think your case is allocating a hwpt based on an IOAS that
>> already has mappings. When the hwpt is added to it, all the mappings of
>> this IOAS would be pushing to the hwpt. But the hwpt has not been attached
>> yet, so hit this issue. I remember there is a immediate_attach bool to let
>> iommufd_hwpt_paging_alloc() do an attach before calling
>> iopt_table_add_domain() which pushes the IOAS mappings to domain.
>>
>> One possible fix is to set the immediate_attach to be true. But I doubt if
>> it will be agreed since it was introduced due to some gap on ARM side. If
>> that gap has been resolved, this behavior would go away as well.
>>
>> So back to this issue, with this change, the flush would be skipped. It
>> looks ok to me to skip cache flush for map path. And we should not expect
>> any unmap on this domain since there is no device attached (parent domain
>> is an exception), hence nothing to be flushed even there is unmap in the
>> domain's IOAS. So it appears to be a acceptable fix. @Baolu, your opinion?
> 
> Hold on, it looks I'm wrong on analyzing related code qi_batch_flush_descs().
> The iommu should always be NULL in my suspected case, so
> qi_batch_flush_descs() will return early without issue.
> 
> I reproduced the backtrace when playing with iommufd qemu nesting, I think your
> previous comment is correct, I misunderstood the root cause of it, it's indeed
> due to nesting parent domain not paging domain. Please ignore this patch.

Great. I also had a try to allocate hwpt with an IOAS that has already got
a bunch of mappings, it can work as the iommu is null.

@Baolu, you may ignore this patch as it's already fixed.

-- 
Regards,
Yi Liu

  reply	other threads:[~2024-12-12 10:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-12  7:24 [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np() Zhenzhong Duan
2024-12-12  7:45 ` Yi Liu
2024-12-12  8:19   ` Duan, Zhenzhong
2024-12-12  9:28     ` Yi Liu
2024-12-12 10:01       ` Duan, Zhenzhong
2024-12-12 11:00         ` Yi Liu [this message]
2024-12-12 11:50           ` Baolu Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9a52713b-3a33-4e64-ad8d-8394e9504d75@intel.com \
    --to=yi.l.liu@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=chao.p.peng@intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=will@kernel.org \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox