From: Baolu Lu <baolu.lu@linux.intel.com>
To: Yi Liu <yi.l.liu@intel.com>, joro@8bytes.org, kevin.tian@intel.com
Cc: chao.p.peng@linux.intel.com, iommu@lists.linux.dev
Subject: Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
Date: Mon, 9 Dec 2024 09:39:41 +0800 [thread overview]
Message-ID: <ce413d97-ec46-4cd9-932a-af804cd531cf@linux.intel.com> (raw)
In-Reply-To: <20241207120304.5710-1-yi.l.liu@intel.com>
On 12/7/24 20:03, Yi Liu wrote:
> The qi_batch is allocated when assigning cache tag for a domain. While
> for nested parent domain, it is missed. Hence, when trying to map pages
> to the nested parent, NULL dereference occurred.
Yes. Good catch!
> To solve it, allocate qi_batch for the nested parent domain when assigning
> cache tag for the nested domain is enough. However, it seems not quite
> reliable to allocate qi_batch in cache tag as a domain may be used by
> multiple devices, and there is no lock around the domain->qi_batch check.
> As all the domains (except blocking domain and identity domain) are supposed
> to have qi_batch, this fix just allocates the qi_batch in the domain
> allocation.
So there appears to be two problems, the domain->qi_batch is not
allocated for all paths and it's allocated without lock protection that
creates a race case and possibly result in memory leak.
But I don't think there is a need to move the domain->qi_batch
allocation to the domain allocation path. That's a kind of refactoring
and there is no need to mess it with a fix patch like this.
Perhaps add a helper to handle the domain->qi_batch allocation? And then
call it in the necessary paths? Something like this:
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain
*domain, u16 did,
spin_unlock_irqrestore(&domain->cache_lock, flags);
}
+/* domain->qi_batch will be freed in iommu_free_domain() path. */
+static int domain_qi_batch_alloc(struct dmar_domain *domain)
+{
+ unsigned long flags;
+ int ret = 0;
+
+ spin_lock_irqsave(&domain->cache_lock, flags);
+ if (domain->qi_batch)
+ goto out_unlock;
+
+ domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
+ if (!domain->qi_batch)
+ ret = -ENOMEM;
+out_unlock:
+ spin_unlock_irqrestore(&domain->cache_lock, flags);
+
+ return ret;
+}
+
static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
struct device *dev, ioasid_t pasid)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
int ret;
+ ret = domain_qi_batch_alloc(domain);
+ if (ret)
+ return ret;
+
ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_IOTLB);
if (ret || !info->ats_enabled)
return ret;
@@ -139,6 +162,10 @@ static int __cache_tag_assign_parent_domain(struct
dmar_domain *domain, u16 did,
struct device_domain_info *info = dev_iommu_priv_get(dev);
int ret;
+ ret = domain_qi_batch_alloc(domain);
+ if (ret)
+ return ret;
+
ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB);
if (ret || !info->ats_enabled)
return ret;
@@ -190,13 +217,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain,
u16 did = domain_get_id_for_dev(domain, dev);
int ret;
- /* domain->qi_bach will be freed in iommu_free_domain() path. */
- if (!domain->qi_batch) {
- domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
- if (!domain->qi_batch)
- return -ENOMEM;
- }
-
ret = __cache_tag_assign_domain(domain, did, dev, pasid);
if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
return ret;
>
> [ 176.749284] BUG: kernel NULL pointer dereference, address: 0000000000000200
> [ 176.758178] #PF: supervisor read access in kernel mode
> [ 176.765052] #PF: error_code(0x0000) - not-present page
> [ 176.771985] PGD 8104795067 P4D 0
> [ 176.776900] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 176.783640] CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
> [ 176.797868] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0107.D52.2311070228 11/07/2023
> [ 176.812762] RIP: 0010:cache_tag_flush_range_np+0x13c/0x260
> [ 176.820656] Code: 8b 6e 18 41 f6 45 18 80 0f 85 66 ff ff ff 4c 89 ef e8 e8 98 ff ff 4d 8b 36 49 39 ee 75 a3 4d 85 ed 48 8b ab 28 01 00 00 74 0e <8b> 95 00 02 00 00 85 d2 0f 85 88 00 00 00 48 8b 74 24 08 4c 89 ff
> [ 176.847296] RSP: 0018:ff2e80d8b4887aa0 EFLAGS: 00010086
> [ 176.855207] RAX: 0000000000000000 RBX: ff18cd64bca69000 RCX: 0000000000000001
> [ 176.865337] RDX: 0000000000000000 RSI: ffffffff98206127 RDI: ff18cd6440139c00
> [ 176.875453] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [ 176.885622] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 176.895802] R13: ff18cd6440139c00 R14: ff18cd64bca69118 R15: ff18cd64bca690d0
> [ 176.906056] FS: 00007f070a7fc640(0000) GS:ff18cde22fa00000(0000) knlGS:0000000000000000
> [ 176.917459] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 176.926274] CR2: 0000000000000200 CR3: 00000080996ee003 CR4: 0000000000f73ef0
> [ 176.936703] PKRU: 55555554
> [ 176.942175] Call Trace:
> [ 176.947385] <TASK>
> [ 176.952216] ? __die+0x24/0x70
> [ 176.958114] ? page_fault_oops+0x80/0x150
> [ 176.965133] ? do_user_addr_fault+0x63/0x7b0
> [ 176.972462] ? exc_page_fault+0x7c/0x220
> [ 176.979437] ? asm_exc_page_fault+0x26/0x30
> [ 176.986755] ? cache_tag_flush_range_np+0x13c/0x260
> [ 176.994903] intel_iommu_iotlb_sync_map+0x1a/0x30
> [ 177.002880] iommu_map+0x61/0xf0
> [ 177.009198] batch_to_domain+0x188/0x250
> [ 177.016344] iopt_area_fill_domains+0x125/0x320
> [ 177.024220] ? rcu_is_watching+0x11/0x50
> [ 177.031442] iopt_map_pages+0x63/0x100
> [ 177.038420] iopt_map_common.isra.0+0xa7/0x190
> [ 177.046131] iopt_map_user_pages+0x6a/0x80
> [ 177.053435] iommufd_ioas_map+0xcd/0x1d0
> [ 177.060537] iommufd_fops_ioctl+0x118/0x1c0
> [ 177.067918] __x64_sys_ioctl+0x93/0xc0
> [ 177.074736] do_syscall_64+0x71/0x140
> [ 177.081376] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 177.089489] RIP: 0033:0x7f071491a94f
> [ 177.095876] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [ 177.124032] RSP: 002b:00007f070a7f76d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 177.134964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f071491a94f
> [ 177.145390] RDX: 00007f070a7f7760 RSI: 0000000000003b85 RDI: 0000000000000015
> [ 177.155834] RBP: 0000000000000015 R08: 00007f0603e00000 R09: 0000000000000000
> [ 177.166247] R10: 000056326cd25970 R11: 0000000000000246 R12: 00000000000a0000
> [ 177.177421] R13: 0000000000000004 R14: 0000000000000000 R15: 00000000000a0000
> [ 177.187869] </TASK>
> [ 177.192700] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ofpart snd_hda_codec pmt_telemetry kvm_intel binfmt_misc pmt_class intel_sdsi snd_hda_core spi_nor isst_if_mbox_pci isst_if_mmio kvm dax_hmem mei_me joydev mtd isst_if_common snd_pcm intel_vsec idxd mei snd_timer cxl_acpi cxl_port cxl_core einj acpi_power_meter dm_multipath fuse ip_tables crc32c_intel i2c_i801 spi_intel_pci i2c_smbus i2c_ismt spi_intel drm_shmem_helper igc pinctrl_emmitsburg pinctrl_intel pwm_lpss
Strip the kernel message and leave only valuable messages that
developers might be interested. Something like this,
BUG: kernel NULL pointer dereference, address: 0000000000000200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8104795067 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted
6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
Call Trace:
? __die+0x24/0x70
? page_fault_oops+0x80/0x150
? do_user_addr_fault+0x63/0x7b0
? exc_page_fault+0x7c/0x220
? asm_exc_page_fault+0x26/0x30
? cache_tag_flush_range_np+0x13c/0x260
intel_iommu_iotlb_sync_map+0x1a/0x30
iommu_map+0x61/0xf0
batch_to_domain+0x188/0x250
iopt_area_fill_domains+0x125/0x320
? rcu_is_watching+0x11/0x50
iopt_map_pages+0x63/0x100
iopt_map_common.isra.0+0xa7/0x190
iopt_map_user_pages+0x6a/0x80
iommufd_ioas_map+0xcd/0x1d0
iommufd_fops_ioctl+0x118/0x1c0
__x64_sys_ioctl+0x93/0xc0
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/iommu/intel/cache.c | 7 -------
> drivers/iommu/intel/iommu.c | 7 +++++++
> drivers/iommu/intel/nested.c | 6 ++++++
> drivers/iommu/intel/svm.c | 7 +++++++
> 4 files changed, 20 insertions(+), 7 deletions(-)
[...]
Thanks,
baolu
next prev parent reply other threads:[~2024-12-09 1:41 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-07 12:03 [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain Yi Liu
2024-12-09 1:39 ` Baolu Lu [this message]
2024-12-09 3:07 ` Yi Liu
2024-12-09 3:05 ` Baolu Lu
2024-12-11 8:32 ` Tian, Kevin
2024-12-11 9:29 ` Yi Liu
2024-12-09 3:10 ` Yi Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ce413d97-ec46-4cd9-932a-af804cd531cf@linux.intel.com \
--to=baolu.lu@linux.intel.com \
--cc=chao.p.peng@linux.intel.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=yi.l.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.