* [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
@ 2024-12-07 12:03 Yi Liu
2024-12-09 1:39 ` Baolu Lu
0 siblings, 1 reply; 7+ messages in thread
From: Yi Liu @ 2024-12-07 12:03 UTC (permalink / raw)
To: joro, kevin.tian, baolu.lu; +Cc: chao.p.peng, yi.l.liu, iommu
The qi_batch is allocated when assigning cache tag for a domain. While
for nested parent domain, it is missed. Hence, when trying to map pages
to the nested parent, NULL dereference occurred.
To solve it, allocate qi_batch for the nested parent domain when assigning
cache tag for the nested domain is enough. However, it seems not quite
reliable to allocate qi_batch in cache tag as a domain may be used by
multiple devices, and there is no lock around the domain->qi_batch check.
As all the domains (except blocking domain and identity domain) are supposed
to have qi_batch, this fix just allocates the qi_batch in the domain
allocation.
[ 176.749284] BUG: kernel NULL pointer dereference, address: 0000000000000200
[ 176.758178] #PF: supervisor read access in kernel mode
[ 176.765052] #PF: error_code(0x0000) - not-present page
[ 176.771985] PGD 8104795067 P4D 0
[ 176.776900] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 176.783640] CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
[ 176.797868] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0107.D52.2311070228 11/07/2023
[ 176.812762] RIP: 0010:cache_tag_flush_range_np+0x13c/0x260
[ 176.820656] Code: 8b 6e 18 41 f6 45 18 80 0f 85 66 ff ff ff 4c 89 ef e8 e8 98 ff ff 4d 8b 36 49 39 ee 75 a3 4d 85 ed 48 8b ab 28 01 00 00 74 0e <8b> 95 00 02 00 00 85 d2 0f 85 88 00 00 00 48 8b 74 24 08 4c 89 ff
[ 176.847296] RSP: 0018:ff2e80d8b4887aa0 EFLAGS: 00010086
[ 176.855207] RAX: 0000000000000000 RBX: ff18cd64bca69000 RCX: 0000000000000001
[ 176.865337] RDX: 0000000000000000 RSI: ffffffff98206127 RDI: ff18cd6440139c00
[ 176.875453] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 176.885622] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 176.895802] R13: ff18cd6440139c00 R14: ff18cd64bca69118 R15: ff18cd64bca690d0
[ 176.906056] FS: 00007f070a7fc640(0000) GS:ff18cde22fa00000(0000) knlGS:0000000000000000
[ 176.917459] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 176.926274] CR2: 0000000000000200 CR3: 00000080996ee003 CR4: 0000000000f73ef0
[ 176.936703] PKRU: 55555554
[ 176.942175] Call Trace:
[ 176.947385] <TASK>
[ 176.952216] ? __die+0x24/0x70
[ 176.958114] ? page_fault_oops+0x80/0x150
[ 176.965133] ? do_user_addr_fault+0x63/0x7b0
[ 176.972462] ? exc_page_fault+0x7c/0x220
[ 176.979437] ? asm_exc_page_fault+0x26/0x30
[ 176.986755] ? cache_tag_flush_range_np+0x13c/0x260
[ 176.994903] intel_iommu_iotlb_sync_map+0x1a/0x30
[ 177.002880] iommu_map+0x61/0xf0
[ 177.009198] batch_to_domain+0x188/0x250
[ 177.016344] iopt_area_fill_domains+0x125/0x320
[ 177.024220] ? rcu_is_watching+0x11/0x50
[ 177.031442] iopt_map_pages+0x63/0x100
[ 177.038420] iopt_map_common.isra.0+0xa7/0x190
[ 177.046131] iopt_map_user_pages+0x6a/0x80
[ 177.053435] iommufd_ioas_map+0xcd/0x1d0
[ 177.060537] iommufd_fops_ioctl+0x118/0x1c0
[ 177.067918] __x64_sys_ioctl+0x93/0xc0
[ 177.074736] do_syscall_64+0x71/0x140
[ 177.081376] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 177.089489] RIP: 0033:0x7f071491a94f
[ 177.095876] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[ 177.124032] RSP: 002b:00007f070a7f76d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 177.134964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f071491a94f
[ 177.145390] RDX: 00007f070a7f7760 RSI: 0000000000003b85 RDI: 0000000000000015
[ 177.155834] RBP: 0000000000000015 R08: 00007f0603e00000 R09: 0000000000000000
[ 177.166247] R10: 000056326cd25970 R11: 0000000000000246 R12: 00000000000a0000
[ 177.177421] R13: 0000000000000004 R14: 0000000000000000 R15: 00000000000a0000
[ 177.187869] </TASK>
[ 177.192700] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ofpart snd_hda_codec pmt_telemetry kvm_intel binfmt_misc pmt_class intel_sdsi snd_hda_core spi_nor isst_if_mbox_pci isst_if_mmio kvm dax_hmem mei_me joydev mtd isst_if_common snd_pcm intel_vsec idxd mei snd_timer cxl_acpi cxl_port cxl_core einj acpi_power_meter dm_multipath fuse ip_tables crc32c_intel i2c_i801 spi_intel_pci i2c_smbus i2c_ismt spi_intel drm_shmem_helper igc pinctrl_emmitsburg pinctrl_intel pwm_lpss
Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/iommu/intel/cache.c | 7 -------
drivers/iommu/intel/iommu.c | 7 +++++++
drivers/iommu/intel/nested.c | 6 ++++++
drivers/iommu/intel/svm.c | 7 +++++++
4 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index e5b89f728ad3..66283e065ae2 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -190,13 +190,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain,
u16 did = domain_get_id_for_dev(domain, dev);
int ret;
- /* domain->qi_bach will be freed in iommu_free_domain() path. */
- if (!domain->qi_batch) {
- domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
- if (!domain->qi_batch)
- return -ENOMEM;
- }
-
ret = __cache_tag_assign_domain(domain, did, dev, pasid);
if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
return ret;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c8f9c70a04ab..238936c3eb55 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3277,6 +3277,12 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
if (!domain)
return ERR_PTR(-ENOMEM);
+ domain->qi_batch = kzalloc(sizeof(struct qi_batch), GFP_KERNEL);
+ if (!domain->qi_batch) {
+ kfree(domain);
+ return ERR_PTR(-ENOMEM);
+ }
+
INIT_LIST_HEAD(&domain->devices);
INIT_LIST_HEAD(&domain->dev_pasids);
INIT_LIST_HEAD(&domain->cache_tags);
@@ -3319,6 +3325,7 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
/* always allocate the top pgd */
domain->pgd = iommu_alloc_page_node(domain->nid, GFP_KERNEL);
if (!domain->pgd) {
+ kfree(domain->qi_batch);
kfree(domain);
return ERR_PTR(-ENOMEM);
}
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index aba92c00b427..90ea7ebc9708 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -217,6 +217,12 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
if (!domain)
return ERR_PTR(-ENOMEM);
+ domain->qi_batch = kzalloc(sizeof(struct qi_batch), GFP_KERNEL);
+ if (!domain->qi_batch) {
+ kfree(domain);
+ return ERR_PTR(-ENOMEM);
+ }
+
domain->use_first_level = true;
domain->s2_domain = s2_domain;
domain->s1_cfg = vtd;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index f5569347591f..e036fa83a0da 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -165,6 +165,12 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
if (!domain)
return ERR_PTR(-ENOMEM);
+ domain->qi_batch = kzalloc(sizeof(struct qi_batch), GFP_KERNEL);
+ if (!domain->qi_batch) {
+ kfree(domain);
+ return ERR_PTR(-ENOMEM);
+ }
+
domain->domain.ops = &intel_svm_domain_ops;
domain->use_first_level = true;
INIT_LIST_HEAD(&domain->dev_pasids);
@@ -175,6 +181,7 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
domain->notifier.ops = &intel_mmuops;
ret = mmu_notifier_register(&domain->notifier, mm);
if (ret) {
+ kfree(domain->qi_batch);
kfree(domain);
return ERR_PTR(ret);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
2024-12-07 12:03 [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain Yi Liu
@ 2024-12-09 1:39 ` Baolu Lu
2024-12-09 3:07 ` Yi Liu
2024-12-09 3:10 ` Yi Liu
0 siblings, 2 replies; 7+ messages in thread
From: Baolu Lu @ 2024-12-09 1:39 UTC (permalink / raw)
To: Yi Liu, joro, kevin.tian; +Cc: chao.p.peng, iommu
On 12/7/24 20:03, Yi Liu wrote:
> The qi_batch is allocated when assigning cache tag for a domain. While
> for nested parent domain, it is missed. Hence, when trying to map pages
> to the nested parent, NULL dereference occurred.
Yes. Good catch!
> To solve it, allocate qi_batch for the nested parent domain when assigning
> cache tag for the nested domain is enough. However, it seems not quite
> reliable to allocate qi_batch in cache tag as a domain may be used by
> multiple devices, and there is no lock around the domain->qi_batch check.
> As all the domains (except blocking domain and identity domain) are supposed
> to have qi_batch, this fix just allocates the qi_batch in the domain
> allocation.
So there appears to be two problems, the domain->qi_batch is not
allocated for all paths and it's allocated without lock protection that
creates a race case and possibly result in memory leak.
But I don't think there is a need to move the domain->qi_batch
allocation to the domain allocation path. That's a kind of refactoring
and there is no need to mess it with a fix patch like this.
Perhaps add a helper to handle the domain->qi_batch allocation? And then
call it in the necessary paths? Something like this:
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain
*domain, u16 did,
spin_unlock_irqrestore(&domain->cache_lock, flags);
}
+/* domain->qi_batch will be freed in iommu_free_domain() path. */
+static int domain_qi_batch_alloc(struct dmar_domain *domain)
+{
+ unsigned long flags;
+ int ret = 0;
+
+ spin_lock_irqsave(&domain->cache_lock, flags);
+ if (domain->qi_batch)
+ goto out_unlock;
+
+ domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
+ if (!domain->qi_batch)
+ ret = -ENOMEM;
+out_unlock:
+ spin_unlock_irqrestore(&domain->cache_lock, flags);
+
+ return ret;
+}
+
static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
struct device *dev, ioasid_t pasid)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
int ret;
+ ret = domain_qi_batch_alloc(domain);
+ if (ret)
+ return ret;
+
ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_IOTLB);
if (ret || !info->ats_enabled)
return ret;
@@ -139,6 +162,10 @@ static int __cache_tag_assign_parent_domain(struct
dmar_domain *domain, u16 did,
struct device_domain_info *info = dev_iommu_priv_get(dev);
int ret;
+ ret = domain_qi_batch_alloc(domain);
+ if (ret)
+ return ret;
+
ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB);
if (ret || !info->ats_enabled)
return ret;
@@ -190,13 +217,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain,
u16 did = domain_get_id_for_dev(domain, dev);
int ret;
- /* domain->qi_bach will be freed in iommu_free_domain() path. */
- if (!domain->qi_batch) {
- domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
- if (!domain->qi_batch)
- return -ENOMEM;
- }
-
ret = __cache_tag_assign_domain(domain, did, dev, pasid);
if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
return ret;
>
> [ 176.749284] BUG: kernel NULL pointer dereference, address: 0000000000000200
> [ 176.758178] #PF: supervisor read access in kernel mode
> [ 176.765052] #PF: error_code(0x0000) - not-present page
> [ 176.771985] PGD 8104795067 P4D 0
> [ 176.776900] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 176.783640] CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
> [ 176.797868] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0107.D52.2311070228 11/07/2023
> [ 176.812762] RIP: 0010:cache_tag_flush_range_np+0x13c/0x260
> [ 176.820656] Code: 8b 6e 18 41 f6 45 18 80 0f 85 66 ff ff ff 4c 89 ef e8 e8 98 ff ff 4d 8b 36 49 39 ee 75 a3 4d 85 ed 48 8b ab 28 01 00 00 74 0e <8b> 95 00 02 00 00 85 d2 0f 85 88 00 00 00 48 8b 74 24 08 4c 89 ff
> [ 176.847296] RSP: 0018:ff2e80d8b4887aa0 EFLAGS: 00010086
> [ 176.855207] RAX: 0000000000000000 RBX: ff18cd64bca69000 RCX: 0000000000000001
> [ 176.865337] RDX: 0000000000000000 RSI: ffffffff98206127 RDI: ff18cd6440139c00
> [ 176.875453] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [ 176.885622] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 176.895802] R13: ff18cd6440139c00 R14: ff18cd64bca69118 R15: ff18cd64bca690d0
> [ 176.906056] FS: 00007f070a7fc640(0000) GS:ff18cde22fa00000(0000) knlGS:0000000000000000
> [ 176.917459] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 176.926274] CR2: 0000000000000200 CR3: 00000080996ee003 CR4: 0000000000f73ef0
> [ 176.936703] PKRU: 55555554
> [ 176.942175] Call Trace:
> [ 176.947385] <TASK>
> [ 176.952216] ? __die+0x24/0x70
> [ 176.958114] ? page_fault_oops+0x80/0x150
> [ 176.965133] ? do_user_addr_fault+0x63/0x7b0
> [ 176.972462] ? exc_page_fault+0x7c/0x220
> [ 176.979437] ? asm_exc_page_fault+0x26/0x30
> [ 176.986755] ? cache_tag_flush_range_np+0x13c/0x260
> [ 176.994903] intel_iommu_iotlb_sync_map+0x1a/0x30
> [ 177.002880] iommu_map+0x61/0xf0
> [ 177.009198] batch_to_domain+0x188/0x250
> [ 177.016344] iopt_area_fill_domains+0x125/0x320
> [ 177.024220] ? rcu_is_watching+0x11/0x50
> [ 177.031442] iopt_map_pages+0x63/0x100
> [ 177.038420] iopt_map_common.isra.0+0xa7/0x190
> [ 177.046131] iopt_map_user_pages+0x6a/0x80
> [ 177.053435] iommufd_ioas_map+0xcd/0x1d0
> [ 177.060537] iommufd_fops_ioctl+0x118/0x1c0
> [ 177.067918] __x64_sys_ioctl+0x93/0xc0
> [ 177.074736] do_syscall_64+0x71/0x140
> [ 177.081376] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 177.089489] RIP: 0033:0x7f071491a94f
> [ 177.095876] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [ 177.124032] RSP: 002b:00007f070a7f76d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 177.134964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f071491a94f
> [ 177.145390] RDX: 00007f070a7f7760 RSI: 0000000000003b85 RDI: 0000000000000015
> [ 177.155834] RBP: 0000000000000015 R08: 00007f0603e00000 R09: 0000000000000000
> [ 177.166247] R10: 000056326cd25970 R11: 0000000000000246 R12: 00000000000a0000
> [ 177.177421] R13: 0000000000000004 R14: 0000000000000000 R15: 00000000000a0000
> [ 177.187869] </TASK>
> [ 177.192700] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ofpart snd_hda_codec pmt_telemetry kvm_intel binfmt_misc pmt_class intel_sdsi snd_hda_core spi_nor isst_if_mbox_pci isst_if_mmio kvm dax_hmem mei_me joydev mtd isst_if_common snd_pcm intel_vsec idxd mei snd_timer cxl_acpi cxl_port cxl_core einj acpi_power_meter dm_multipath fuse ip_tables crc32c_intel i2c_i801 spi_intel_pci i2c_smbus i2c_ismt spi_intel drm_shmem_helper igc pinctrl_emmitsburg pinctrl_intel pwm_lpss
Strip the kernel message and leave only valuable messages that
developers might be interested. Something like this,
BUG: kernel NULL pointer dereference, address: 0000000000000200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8104795067 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted
6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
Call Trace:
? __die+0x24/0x70
? page_fault_oops+0x80/0x150
? do_user_addr_fault+0x63/0x7b0
? exc_page_fault+0x7c/0x220
? asm_exc_page_fault+0x26/0x30
? cache_tag_flush_range_np+0x13c/0x260
intel_iommu_iotlb_sync_map+0x1a/0x30
iommu_map+0x61/0xf0
batch_to_domain+0x188/0x250
iopt_area_fill_domains+0x125/0x320
? rcu_is_watching+0x11/0x50
iopt_map_pages+0x63/0x100
iopt_map_common.isra.0+0xa7/0x190
iopt_map_user_pages+0x6a/0x80
iommufd_ioas_map+0xcd/0x1d0
iommufd_fops_ioctl+0x118/0x1c0
__x64_sys_ioctl+0x93/0xc0
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/iommu/intel/cache.c | 7 -------
> drivers/iommu/intel/iommu.c | 7 +++++++
> drivers/iommu/intel/nested.c | 6 ++++++
> drivers/iommu/intel/svm.c | 7 +++++++
> 4 files changed, 20 insertions(+), 7 deletions(-)
[...]
Thanks,
baolu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
2024-12-09 3:07 ` Yi Liu
@ 2024-12-09 3:05 ` Baolu Lu
2024-12-11 8:32 ` Tian, Kevin
1 sibling, 0 replies; 7+ messages in thread
From: Baolu Lu @ 2024-12-09 3:05 UTC (permalink / raw)
To: Yi Liu, joro, kevin.tian; +Cc: chao.p.peng, iommu
On 12/9/24 11:07, Yi Liu wrote:
>> Perhaps add a helper to handle the domain->qi_batch allocation? And then
>> call it in the necessary paths? Something like this:
>>
>> --- a/drivers/iommu/intel/cache.c
>> +++ b/drivers/iommu/intel/cache.c
>> @@ -105,12 +105,35 @@ static void cache_tag_unassign(struct
>> dmar_domain *domain, u16 did,
>> spin_unlock_irqrestore(&domain->cache_lock, flags);
>> }
>>
>> +/* domain->qi_batch will be freed in iommu_free_domain() path. */
>> +static int domain_qi_batch_alloc(struct dmar_domain *domain)
>> +{
>> + unsigned long flags;
>> + int ret = 0;
>> +
>> + spin_lock_irqsave(&domain->cache_lock, flags);
>
> I'm wondering if there is any difference using the domain->lock v.s. using
> the domain->cache_lock. The RID attach path acquires/releases the
> domain->lock first, and then the domain->cache_lock. While the PASID path
> does the reverse order. Although neither of the two paths hold the locks
> for a long duration (no A-B-B-A locking issue), it still means when a
> domain is shared between RID and PASID path, the domain->qi_batch check is
> not in the same phase of the two paths.
You are right. domain->lock is the right one.
>
>> + if (domain->qi_batch)
>> + goto out_unlock;
>> +
>> + domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
>> + if (!domain->qi_batch)
>> + ret = -ENOMEM;
>> +out_unlock:
>> + spin_unlock_irqrestore(&domain->cache_lock, flags);
>> +
>> + return ret;
>> +}
>> +
>> static int __cache_tag_assign_domain(struct dmar_domain *domain, u16
>> did,
>> struct device *dev, ioasid_t pasid)
>> {
>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>> int ret;
>>
>> + ret = domain_qi_batch_alloc(domain);
>> + if (ret)
>> + return ret;
>> +
>
> also need to call it in __cache_tag_assign_parent_domain(). BTW. Is the
> domain->qi_batch required if there is no ATS?
Yes.
Thanks,
baolu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
2024-12-09 1:39 ` Baolu Lu
@ 2024-12-09 3:07 ` Yi Liu
2024-12-09 3:05 ` Baolu Lu
2024-12-11 8:32 ` Tian, Kevin
2024-12-09 3:10 ` Yi Liu
1 sibling, 2 replies; 7+ messages in thread
From: Yi Liu @ 2024-12-09 3:07 UTC (permalink / raw)
To: Baolu Lu, joro, kevin.tian; +Cc: chao.p.peng, iommu
On 2024/12/9 09:39, Baolu Lu wrote:
> On 12/7/24 20:03, Yi Liu wrote:
>> The qi_batch is allocated when assigning cache tag for a domain. While
>> for nested parent domain, it is missed. Hence, when trying to map pages
>> to the nested parent, NULL dereference occurred.
>
> Yes. Good catch!
>
>> To solve it, allocate qi_batch for the nested parent domain when assigning
>> cache tag for the nested domain is enough. However, it seems not quite
>> reliable to allocate qi_batch in cache tag as a domain may be used by
>> multiple devices, and there is no lock around the domain->qi_batch check.
>> As all the domains (except blocking domain and identity domain) are supposed
>> to have qi_batch, this fix just allocates the qi_batch in the domain
>> allocation.
>
> So there appears to be two problems, the domain->qi_batch is not
> allocated for all paths and it's allocated without lock protection that
> creates a race case and possibly result in memory leak.
yes, it is although we haven't encountered it for now.
> But I don't think there is a need to move the domain->qi_batch
> allocation to the domain allocation path. That's a kind of refactoring
> and there is no need to mess it with a fix patch like this.
aha, yes. I felt the qi_batch eventually would be allocating it in domain
allocation, hence make the fix like this.
> Perhaps add a helper to handle the domain->qi_batch allocation? And then
> call it in the necessary paths? Something like this:
>
> --- a/drivers/iommu/intel/cache.c
> +++ b/drivers/iommu/intel/cache.c
> @@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain
> *domain, u16 did,
> spin_unlock_irqrestore(&domain->cache_lock, flags);
> }
>
> +/* domain->qi_batch will be freed in iommu_free_domain() path. */
> +static int domain_qi_batch_alloc(struct dmar_domain *domain)
> +{
> + unsigned long flags;
> + int ret = 0;
> +
> + spin_lock_irqsave(&domain->cache_lock, flags);
I'm wondering if there is any difference using the domain->lock v.s. using
the domain->cache_lock. The RID attach path acquires/releases the
domain->lock first, and then the domain->cache_lock. While the PASID path
does the reverse order. Although neither of the two paths hold the locks
for a long duration (no A-B-B-A locking issue), it still means when a
domain is shared between RID and PASID path, the domain->qi_batch check is
not in the same phase of the two paths.
> + if (domain->qi_batch)
> + goto out_unlock;
> +
> + domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
> + if (!domain->qi_batch)
> + ret = -ENOMEM;
> +out_unlock:
> + spin_unlock_irqrestore(&domain->cache_lock, flags);
> +
> + return ret;
> +}
> +
> static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
> struct device *dev, ioasid_t pasid)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
> int ret;
>
> + ret = domain_qi_batch_alloc(domain);
> + if (ret)
> + return ret;
> +
also need to call it in __cache_tag_assign_parent_domain(). BTW. Is the
domain->qi_batch required if there is no ATS?
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
2024-12-09 1:39 ` Baolu Lu
2024-12-09 3:07 ` Yi Liu
@ 2024-12-09 3:10 ` Yi Liu
1 sibling, 0 replies; 7+ messages in thread
From: Yi Liu @ 2024-12-09 3:10 UTC (permalink / raw)
To: Baolu Lu, joro, kevin.tian; +Cc: chao.p.peng, iommu
On 2024/12/9 09:39, Baolu Lu wrote:
> On 12/7/24 20:03, Yi Liu wrote:
>> The qi_batch is allocated when assigning cache tag for a domain. While
>> for nested parent domain, it is missed. Hence, when trying to map pages
>> to the nested parent, NULL dereference occurred.
>
> Strip the kernel message and leave only valuable messages that
> developers might be interested. Something like this,
got it.
> BUG: kernel NULL pointer dereference, address: 0000000000000200
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 8104795067 P4D 0
> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted
> 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
> Call Trace:
> ? __die+0x24/0x70
> ? page_fault_oops+0x80/0x150
> ? do_user_addr_fault+0x63/0x7b0
> ? exc_page_fault+0x7c/0x220
> ? asm_exc_page_fault+0x26/0x30
> ? cache_tag_flush_range_np+0x13c/0x260
> intel_iommu_iotlb_sync_map+0x1a/0x30
> iommu_map+0x61/0xf0
> batch_to_domain+0x188/0x250
> iopt_area_fill_domains+0x125/0x320
> ? rcu_is_watching+0x11/0x50
> iopt_map_pages+0x63/0x100
> iopt_map_common.isra.0+0xa7/0x190
> iopt_map_user_pages+0x6a/0x80
> iommufd_ioas_map+0xcd/0x1d0
> iommufd_fops_ioctl+0x118/0x1c0
> __x64_sys_ioctl+0x93/0xc0
> do_syscall_64+0x71/0x140
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
2024-12-09 3:07 ` Yi Liu
2024-12-09 3:05 ` Baolu Lu
@ 2024-12-11 8:32 ` Tian, Kevin
2024-12-11 9:29 ` Yi Liu
1 sibling, 1 reply; 7+ messages in thread
From: Tian, Kevin @ 2024-12-11 8:32 UTC (permalink / raw)
To: Liu, Yi L, Baolu Lu, joro@8bytes.org
Cc: chao.p.peng@linux.intel.com, iommu@lists.linux.dev
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 9, 2024 11:07 AM
>
> On 2024/12/9 09:39, Baolu Lu wrote:
> > +/* domain->qi_batch will be freed in iommu_free_domain() path. */
> > +static int domain_qi_batch_alloc(struct dmar_domain *domain)
> > +{
> > + unsigned long flags;
> > + int ret = 0;
> > +
> > + spin_lock_irqsave(&domain->cache_lock, flags);
>
> I'm wondering if there is any difference using the domain->lock v.s. using
> the domain->cache_lock. The RID attach path acquires/releases the
> domain->lock first, and then the domain->cache_lock. While the PASID path
> does the reverse order. Although neither of the two paths hold the locks
> for a long duration (no A-B-B-A locking issue), it still means when a
> domain is shared between RID and PASID path, the domain->qi_batch check
> is
> not in the same phase of the two paths.
>
why is there different order between RID vs. PASID attach?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
2024-12-11 8:32 ` Tian, Kevin
@ 2024-12-11 9:29 ` Yi Liu
0 siblings, 0 replies; 7+ messages in thread
From: Yi Liu @ 2024-12-11 9:29 UTC (permalink / raw)
To: Tian, Kevin, Baolu Lu, joro@8bytes.org
Cc: chao.p.peng@linux.intel.com, iommu@lists.linux.dev
On 2024/12/11 16:32, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, December 9, 2024 11:07 AM
>>
>> On 2024/12/9 09:39, Baolu Lu wrote:
>>> +/* domain->qi_batch will be freed in iommu_free_domain() path. */
>>> +static int domain_qi_batch_alloc(struct dmar_domain *domain)
>>> +{
>>> + unsigned long flags;
>>> + int ret = 0;
>>> +
>>> + spin_lock_irqsave(&domain->cache_lock, flags);
>>
>> I'm wondering if there is any difference using the domain->lock v.s. using
>> the domain->cache_lock. The RID attach path acquires/releases the
>> domain->lock first, and then the domain->cache_lock. While the PASID path
>> does the reverse order. Although neither of the two paths hold the locks
>> for a long duration (no A-B-B-A locking issue), it still means when a
>> domain is shared between RID and PASID path, the domain->qi_batch check
>> is
>> not in the same phase of the two paths.
>>
>
> why is there different order between RID vs. PASID attach?
I think they should be consolidated to follow the same order to have
better readability although there is no actual problem for now.
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-12-11 9:24 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-07 12:03 [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain Yi Liu
2024-12-09 1:39 ` Baolu Lu
2024-12-09 3:07 ` Yi Liu
2024-12-09 3:05 ` Baolu Lu
2024-12-11 8:32 ` Tian, Kevin
2024-12-11 9:29 ` Yi Liu
2024-12-09 3:10 ` Yi Liu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.