From: "Liu, Jingqi" <jingqi.liu@intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Lu Baolu <baolu.lu@linux.intel.com>,
David Woodhouse <dwmw2@infradead.org>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] iommu/vt-d: debugfs: Increment the reference count of page table page
Date: Mon, 3 Jul 2023 22:48:06 +0800 [thread overview]
Message-ID: <cf8b0163-56c9-0540-c635-b541132a77cd@intel.com> (raw)
In-Reply-To: <BN9PR11MB527672460238BCBA63CD3CA58C29A@BN9PR11MB5276.namprd11.prod.outlook.com>
On 7/3/2023 3:00 PM, Tian, Kevin wrote:
>> From: Liu, Jingqi <jingqi.liu@intel.com>
>> Sent: Sunday, June 25, 2023 10:28 AM
>>
>> There may be a race with iommu_unmap() interface when traversing a page
>> table.
>>
>> When debugfs traverses an IOMMU page table, iommu_unmap() may clear
>> entries and free the page table pages pointed to by the entries.
>> So debugfs may read invalid or freed pages.
>>
>> To avoid this, increment the refcount of a page table page before
>> traversing the page, and decrement its refcount after traversing it.
> I'm not sure how this race can be fully avoided w/o cooperation in the
> unmap path...
Thanks.
Indeed, in order to fully avoid this, need to cooperate in the unmap
path. :)
>> Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com>
>> ---
>> drivers/iommu/intel/debugfs.c | 36 +++++++++++++++++++++++++++++++++-
>> -
>> 1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
>> index 1f925285104e..d228e1580aec 100644
>> --- a/drivers/iommu/intel/debugfs.c
>> +++ b/drivers/iommu/intel/debugfs.c
>> @@ -333,9 +333,41 @@ static void pgtable_walk_level(struct seq_file *m,
>> struct dma_pte *pde,
>> path[level] = pde->val;
>> if (dma_pte_superpage(pde) || level == 1)
>> dump_page_info(m, start, path);
>> - else
>> - pgtable_walk_level(m,
>> phys_to_virt(dma_pte_addr(pde)),
>> + else {
>> + struct page *pg;
>> + u64 pte_addr;
>> +
>> + /*
>> + * The entry references a Page-Directory Table
>> + * or a Page Table.
>> + */
>> +retry:
>> + pte_addr = dma_pte_addr(pde);
>> + pg = pfn_to_page(pte_addr >> PAGE_SHIFT);
>> + if (!get_page_unless_zero(pg))
>> + /*
>> + * If this page has a refcount of zero,
>> + * it has been freed, or will be freed.
>> + */
>> + continue;
>> +
>> + /* Check if the value of the entry is changed. */
>> + if (pde->val != path[level]) {
>> + put_page(pg);
>> +
>> + if (!dma_pte_present(pde))
>> + /* The entry is invalid. Skip it. */
>> + continue;
>> +
>> + /* The entry has been updated. */
>> + path[level] = pde->val;
>> + goto retry;
>> + }
>> +
>> + pgtable_walk_level(m, phys_to_virt(pte_addr),
>> level - 1, start, path);
> What about pde->val getting cleared after phys_to_virt(pte_addr) leading
> to all the levels below 'pg' being freed? In that case this code still walks
> the stale 'pg' content which however all point to invalid pages then.
There are 2 cases for the page pointed to by the PTE below 'pg'.
1) the page has been freed.
It will be skipped after the following check:
if (!get_page_unless_zero(pg))
/*
* If this page has a refcount of zero,
* it has been freed, or will be freed.
*/
continue;
Debugfs won't walk further.
2) The page has not been freed.
The content of this page is stale.
Dumping these stale content seems to be acceptable for debugfs.
If all the PTEs below 'pg' can be cleared before being freed in
the unmap path,
the following check can avoid to walk these stale pages.
if (!dma_pte_present(pde))
continue;
> It's probably simpler to just check the format of each PTE (e.g. whether
> any reserved bit is set) and if abnormal then break the current level of
> walk.
Good point.
It seems the bit used to check must be differentiated between valid and
stale.
But the PTE may not change whether it is valid or stale.
Thanks,
Jingqi
>> + put_page(pg);
>> + }
>> path[level] = 0;
>> }
>> }
>> --
>> 2.21.3
next prev parent reply other threads:[~2023-07-03 14:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-25 2:28 [PATCH] iommu/vt-d: debugfs: Increment the reference count of page table page Jingqi Liu
2023-07-03 7:00 ` Tian, Kevin
2023-07-03 14:48 ` Liu, Jingqi [this message]
2023-08-03 11:21 ` Liu, Jingqi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cf8b0163-56c9-0540-c635-b541132a77cd@intel.com \
--to=jingqi.liu@intel.com \
--cc=baolu.lu@linux.intel.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox