All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
@ 2024-12-07 12:03 Yi Liu
  2024-12-09  1:39 ` Baolu Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Yi Liu @ 2024-12-07 12:03 UTC (permalink / raw)
  To: joro, kevin.tian, baolu.lu; +Cc: chao.p.peng, yi.l.liu, iommu

The qi_batch is allocated when assigning cache tag for a domain. While
for nested parent domain, it is missed. Hence, when trying to map pages
to the nested parent, NULL dereference occurred.

To solve it, allocate qi_batch for the nested parent domain when assigning
cache tag for the nested domain is enough. However, it seems not quite
reliable to allocate qi_batch in cache tag as a domain may be used by
multiple devices, and there is no lock around the domain->qi_batch check.
As all the domains (except blocking domain and identity domain) are supposed
to have qi_batch, this fix just allocates the qi_batch in the domain
allocation.

[  176.749284] BUG: kernel NULL pointer dereference, address: 0000000000000200
[  176.758178] #PF: supervisor read access in kernel mode
[  176.765052] #PF: error_code(0x0000) - not-present page
[  176.771985] PGD 8104795067 P4D 0
[  176.776900] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[  176.783640] CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
[  176.797868] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0107.D52.2311070228 11/07/2023
[  176.812762] RIP: 0010:cache_tag_flush_range_np+0x13c/0x260
[  176.820656] Code: 8b 6e 18 41 f6 45 18 80 0f 85 66 ff ff ff 4c 89 ef e8 e8 98 ff ff 4d 8b 36 49 39 ee 75 a3 4d 85 ed 48 8b ab 28 01 00 00 74 0e <8b> 95 00 02 00 00 85 d2 0f 85 88 00 00 00 48 8b 74 24 08 4c 89 ff
[  176.847296] RSP: 0018:ff2e80d8b4887aa0 EFLAGS: 00010086
[  176.855207] RAX: 0000000000000000 RBX: ff18cd64bca69000 RCX: 0000000000000001
[  176.865337] RDX: 0000000000000000 RSI: ffffffff98206127 RDI: ff18cd6440139c00
[  176.875453] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[  176.885622] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  176.895802] R13: ff18cd6440139c00 R14: ff18cd64bca69118 R15: ff18cd64bca690d0
[  176.906056] FS:  00007f070a7fc640(0000) GS:ff18cde22fa00000(0000) knlGS:0000000000000000
[  176.917459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  176.926274] CR2: 0000000000000200 CR3: 00000080996ee003 CR4: 0000000000f73ef0
[  176.936703] PKRU: 55555554
[  176.942175] Call Trace:
[  176.947385]  <TASK>
[  176.952216]  ? __die+0x24/0x70
[  176.958114]  ? page_fault_oops+0x80/0x150
[  176.965133]  ? do_user_addr_fault+0x63/0x7b0
[  176.972462]  ? exc_page_fault+0x7c/0x220
[  176.979437]  ? asm_exc_page_fault+0x26/0x30
[  176.986755]  ? cache_tag_flush_range_np+0x13c/0x260
[  176.994903]  intel_iommu_iotlb_sync_map+0x1a/0x30
[  177.002880]  iommu_map+0x61/0xf0
[  177.009198]  batch_to_domain+0x188/0x250
[  177.016344]  iopt_area_fill_domains+0x125/0x320
[  177.024220]  ? rcu_is_watching+0x11/0x50
[  177.031442]  iopt_map_pages+0x63/0x100
[  177.038420]  iopt_map_common.isra.0+0xa7/0x190
[  177.046131]  iopt_map_user_pages+0x6a/0x80
[  177.053435]  iommufd_ioas_map+0xcd/0x1d0
[  177.060537]  iommufd_fops_ioctl+0x118/0x1c0
[  177.067918]  __x64_sys_ioctl+0x93/0xc0
[  177.074736]  do_syscall_64+0x71/0x140
[  177.081376]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  177.089489] RIP: 0033:0x7f071491a94f
[  177.095876] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[  177.124032] RSP: 002b:00007f070a7f76d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  177.134964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f071491a94f
[  177.145390] RDX: 00007f070a7f7760 RSI: 0000000000003b85 RDI: 0000000000000015
[  177.155834] RBP: 0000000000000015 R08: 00007f0603e00000 R09: 0000000000000000
[  177.166247] R10: 000056326cd25970 R11: 0000000000000246 R12: 00000000000a0000
[  177.177421] R13: 0000000000000004 R14: 0000000000000000 R15: 00000000000a0000
[  177.187869]  </TASK>
[  177.192700] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ofpart snd_hda_codec pmt_telemetry kvm_intel binfmt_misc pmt_class intel_sdsi snd_hda_core spi_nor isst_if_mbox_pci isst_if_mmio kvm dax_hmem mei_me joydev mtd isst_if_common snd_pcm intel_vsec idxd mei snd_timer cxl_acpi cxl_port cxl_core einj acpi_power_meter dm_multipath fuse ip_tables crc32c_intel i2c_i801 spi_intel_pci i2c_smbus i2c_ismt spi_intel drm_shmem_helper igc pinctrl_emmitsburg pinctrl_intel pwm_lpss

Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/cache.c  | 7 -------
 drivers/iommu/intel/iommu.c  | 7 +++++++
 drivers/iommu/intel/nested.c | 6 ++++++
 drivers/iommu/intel/svm.c    | 7 +++++++
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index e5b89f728ad3..66283e065ae2 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -190,13 +190,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain,
 	u16 did = domain_get_id_for_dev(domain, dev);
 	int ret;
 
-	/* domain->qi_bach will be freed in iommu_free_domain() path. */
-	if (!domain->qi_batch) {
-		domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
-		if (!domain->qi_batch)
-			return -ENOMEM;
-	}
-
 	ret = __cache_tag_assign_domain(domain, did, dev, pasid);
 	if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
 		return ret;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c8f9c70a04ab..238936c3eb55 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3277,6 +3277,12 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 	if (!domain)
 		return ERR_PTR(-ENOMEM);
 
+	domain->qi_batch = kzalloc(sizeof(struct qi_batch), GFP_KERNEL);
+	if (!domain->qi_batch) {
+		kfree(domain);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	INIT_LIST_HEAD(&domain->devices);
 	INIT_LIST_HEAD(&domain->dev_pasids);
 	INIT_LIST_HEAD(&domain->cache_tags);
@@ -3319,6 +3325,7 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 	/* always allocate the top pgd */
 	domain->pgd = iommu_alloc_page_node(domain->nid, GFP_KERNEL);
 	if (!domain->pgd) {
+		kfree(domain->qi_batch);
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
 	}
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index aba92c00b427..90ea7ebc9708 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -217,6 +217,12 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	if (!domain)
 		return ERR_PTR(-ENOMEM);
 
+	domain->qi_batch = kzalloc(sizeof(struct qi_batch), GFP_KERNEL);
+	if (!domain->qi_batch) {
+		kfree(domain);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	domain->use_first_level = true;
 	domain->s2_domain = s2_domain;
 	domain->s1_cfg = vtd;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index f5569347591f..e036fa83a0da 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -165,6 +165,12 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
 	if (!domain)
 		return ERR_PTR(-ENOMEM);
 
+	domain->qi_batch = kzalloc(sizeof(struct qi_batch), GFP_KERNEL);
+	if (!domain->qi_batch) {
+		kfree(domain);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	domain->domain.ops = &intel_svm_domain_ops;
 	domain->use_first_level = true;
 	INIT_LIST_HEAD(&domain->dev_pasids);
@@ -175,6 +181,7 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
 	domain->notifier.ops = &intel_mmuops;
 	ret = mmu_notifier_register(&domain->notifier, mm);
 	if (ret) {
+		kfree(domain->qi_batch);
 		kfree(domain);
 		return ERR_PTR(ret);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  2024-12-07 12:03 [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain Yi Liu
@ 2024-12-09  1:39 ` Baolu Lu
  2024-12-09  3:07   ` Yi Liu
  2024-12-09  3:10   ` Yi Liu
  0 siblings, 2 replies; 7+ messages in thread
From: Baolu Lu @ 2024-12-09  1:39 UTC (permalink / raw)
  To: Yi Liu, joro, kevin.tian; +Cc: chao.p.peng, iommu

On 12/7/24 20:03, Yi Liu wrote:
> The qi_batch is allocated when assigning cache tag for a domain. While
> for nested parent domain, it is missed. Hence, when trying to map pages
> to the nested parent, NULL dereference occurred.

Yes. Good catch!

> To solve it, allocate qi_batch for the nested parent domain when assigning
> cache tag for the nested domain is enough. However, it seems not quite
> reliable to allocate qi_batch in cache tag as a domain may be used by
> multiple devices, and there is no lock around the domain->qi_batch check.
> As all the domains (except blocking domain and identity domain) are supposed
> to have qi_batch, this fix just allocates the qi_batch in the domain
> allocation.

So there appears to be two problems, the domain->qi_batch is not
allocated for all paths and it's allocated without lock protection that
creates a race case and possibly result in memory leak.

But I don't think there is a need to move the domain->qi_batch
allocation to the domain allocation path. That's a kind of refactoring
and there is no need to mess it with a fix patch like this.

Perhaps add a helper to handle the domain->qi_batch allocation? And then
call it in the necessary paths? Something like this:

--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain 
*domain, u16 did,
  	spin_unlock_irqrestore(&domain->cache_lock, flags);
  }

+/* domain->qi_batch will be freed in iommu_free_domain() path. */
+static int domain_qi_batch_alloc(struct dmar_domain *domain)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	if (domain->qi_batch)
+		goto out_unlock;
+
+	domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
+	if (!domain->qi_batch)
+		ret = -ENOMEM;
+out_unlock:
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+
+	return ret;
+}
+
  static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
  				     struct device *dev, ioasid_t pasid)
  {
  	struct device_domain_info *info = dev_iommu_priv_get(dev);
  	int ret;

+	ret = domain_qi_batch_alloc(domain);
+	if (ret)
+		return ret;
+
  	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_IOTLB);
  	if (ret || !info->ats_enabled)
  		return ret;
@@ -139,6 +162,10 @@ static int __cache_tag_assign_parent_domain(struct 
dmar_domain *domain, u16 did,
  	struct device_domain_info *info = dev_iommu_priv_get(dev);
  	int ret;

+	ret = domain_qi_batch_alloc(domain);
+	if (ret)
+		return ret;
+
  	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB);
  	if (ret || !info->ats_enabled)
  		return ret;
@@ -190,13 +217,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain,
  	u16 did = domain_get_id_for_dev(domain, dev);
  	int ret;

-	/* domain->qi_bach will be freed in iommu_free_domain() path. */
-	if (!domain->qi_batch) {
-		domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
-		if (!domain->qi_batch)
-			return -ENOMEM;
-	}
-
  	ret = __cache_tag_assign_domain(domain, did, dev, pasid);
  	if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
  		return ret;

> 
> [  176.749284] BUG: kernel NULL pointer dereference, address: 0000000000000200
> [  176.758178] #PF: supervisor read access in kernel mode
> [  176.765052] #PF: error_code(0x0000) - not-present page
> [  176.771985] PGD 8104795067 P4D 0
> [  176.776900] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  176.783640] CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
> [  176.797868] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0107.D52.2311070228 11/07/2023
> [  176.812762] RIP: 0010:cache_tag_flush_range_np+0x13c/0x260
> [  176.820656] Code: 8b 6e 18 41 f6 45 18 80 0f 85 66 ff ff ff 4c 89 ef e8 e8 98 ff ff 4d 8b 36 49 39 ee 75 a3 4d 85 ed 48 8b ab 28 01 00 00 74 0e <8b> 95 00 02 00 00 85 d2 0f 85 88 00 00 00 48 8b 74 24 08 4c 89 ff
> [  176.847296] RSP: 0018:ff2e80d8b4887aa0 EFLAGS: 00010086
> [  176.855207] RAX: 0000000000000000 RBX: ff18cd64bca69000 RCX: 0000000000000001
> [  176.865337] RDX: 0000000000000000 RSI: ffffffff98206127 RDI: ff18cd6440139c00
> [  176.875453] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [  176.885622] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [  176.895802] R13: ff18cd6440139c00 R14: ff18cd64bca69118 R15: ff18cd64bca690d0
> [  176.906056] FS:  00007f070a7fc640(0000) GS:ff18cde22fa00000(0000) knlGS:0000000000000000
> [  176.917459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  176.926274] CR2: 0000000000000200 CR3: 00000080996ee003 CR4: 0000000000f73ef0
> [  176.936703] PKRU: 55555554
> [  176.942175] Call Trace:
> [  176.947385]  <TASK>
> [  176.952216]  ? __die+0x24/0x70
> [  176.958114]  ? page_fault_oops+0x80/0x150
> [  176.965133]  ? do_user_addr_fault+0x63/0x7b0
> [  176.972462]  ? exc_page_fault+0x7c/0x220
> [  176.979437]  ? asm_exc_page_fault+0x26/0x30
> [  176.986755]  ? cache_tag_flush_range_np+0x13c/0x260
> [  176.994903]  intel_iommu_iotlb_sync_map+0x1a/0x30
> [  177.002880]  iommu_map+0x61/0xf0
> [  177.009198]  batch_to_domain+0x188/0x250
> [  177.016344]  iopt_area_fill_domains+0x125/0x320
> [  177.024220]  ? rcu_is_watching+0x11/0x50
> [  177.031442]  iopt_map_pages+0x63/0x100
> [  177.038420]  iopt_map_common.isra.0+0xa7/0x190
> [  177.046131]  iopt_map_user_pages+0x6a/0x80
> [  177.053435]  iommufd_ioas_map+0xcd/0x1d0
> [  177.060537]  iommufd_fops_ioctl+0x118/0x1c0
> [  177.067918]  __x64_sys_ioctl+0x93/0xc0
> [  177.074736]  do_syscall_64+0x71/0x140
> [  177.081376]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  177.089489] RIP: 0033:0x7f071491a94f
> [  177.095876] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [  177.124032] RSP: 002b:00007f070a7f76d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  177.134964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f071491a94f
> [  177.145390] RDX: 00007f070a7f7760 RSI: 0000000000003b85 RDI: 0000000000000015
> [  177.155834] RBP: 0000000000000015 R08: 00007f0603e00000 R09: 0000000000000000
> [  177.166247] R10: 000056326cd25970 R11: 0000000000000246 R12: 00000000000a0000
> [  177.177421] R13: 0000000000000004 R14: 0000000000000000 R15: 00000000000a0000
> [  177.187869]  </TASK>
> [  177.192700] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ofpart snd_hda_codec pmt_telemetry kvm_intel binfmt_misc pmt_class intel_sdsi snd_hda_core spi_nor isst_if_mbox_pci isst_if_mmio kvm dax_hmem mei_me joydev mtd isst_if_common snd_pcm intel_vsec idxd mei snd_timer cxl_acpi cxl_port cxl_core einj acpi_power_meter dm_multipath fuse ip_tables crc32c_intel i2c_i801 spi_intel_pci i2c_smbus i2c_ismt spi_intel drm_shmem_helper igc pinctrl_emmitsburg pinctrl_intel pwm_lpss

Strip the kernel message and leave only valuable messages that
developers might be interested. Something like this,

  BUG: kernel NULL pointer dereference, address: 0000000000000200
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8104795067 P4D 0
  Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 
6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
  Call Trace:
   ? __die+0x24/0x70
   ? page_fault_oops+0x80/0x150
   ? do_user_addr_fault+0x63/0x7b0
   ? exc_page_fault+0x7c/0x220
   ? asm_exc_page_fault+0x26/0x30
   ? cache_tag_flush_range_np+0x13c/0x260
   intel_iommu_iotlb_sync_map+0x1a/0x30
   iommu_map+0x61/0xf0
   batch_to_domain+0x188/0x250
   iopt_area_fill_domains+0x125/0x320
   ? rcu_is_watching+0x11/0x50
   iopt_map_pages+0x63/0x100
   iopt_map_common.isra.0+0xa7/0x190
   iopt_map_user_pages+0x6a/0x80
   iommufd_ioas_map+0xcd/0x1d0
   iommufd_fops_ioctl+0x118/0x1c0
   __x64_sys_ioctl+0x93/0xc0
   do_syscall_64+0x71/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e


> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/cache.c  | 7 -------
>   drivers/iommu/intel/iommu.c  | 7 +++++++
>   drivers/iommu/intel/nested.c | 6 ++++++
>   drivers/iommu/intel/svm.c    | 7 +++++++
>   4 files changed, 20 insertions(+), 7 deletions(-)

[...]

Thanks,
baolu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  2024-12-09  3:07   ` Yi Liu
@ 2024-12-09  3:05     ` Baolu Lu
  2024-12-11  8:32     ` Tian, Kevin
  1 sibling, 0 replies; 7+ messages in thread
From: Baolu Lu @ 2024-12-09  3:05 UTC (permalink / raw)
  To: Yi Liu, joro, kevin.tian; +Cc: chao.p.peng, iommu

On 12/9/24 11:07, Yi Liu wrote:
>> Perhaps add a helper to handle the domain->qi_batch allocation? And then
>> call it in the necessary paths? Something like this:
>>
>> --- a/drivers/iommu/intel/cache.c
>> +++ b/drivers/iommu/intel/cache.c
>> @@ -105,12 +105,35 @@ static void cache_tag_unassign(struct 
>> dmar_domain *domain, u16 did,
>>       spin_unlock_irqrestore(&domain->cache_lock, flags);
>>   }
>>
>> +/* domain->qi_batch will be freed in iommu_free_domain() path. */
>> +static int domain_qi_batch_alloc(struct dmar_domain *domain)
>> +{
>> +    unsigned long flags;
>> +    int ret = 0;
>> +
>> +    spin_lock_irqsave(&domain->cache_lock, flags);
> 
> I'm wondering if there is any difference using the domain->lock v.s. using
> the domain->cache_lock. The RID attach path acquires/releases the
> domain->lock first, and then the domain->cache_lock. While the PASID path
> does the reverse order. Although neither of the two paths hold the locks
> for a long duration (no A-B-B-A locking issue), it still means when a
> domain is shared between RID and PASID path, the domain->qi_batch check is
> not in the same phase of the two paths.

You are right. domain->lock is the right one.

> 
>> +    if (domain->qi_batch)
>> +        goto out_unlock;
>> +
>> +    domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
>> +    if (!domain->qi_batch)
>> +        ret = -ENOMEM;
>> +out_unlock:
>> +    spin_unlock_irqrestore(&domain->cache_lock, flags);
>> +
>> +    return ret;
>> +}
>> +
>>   static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 
>> did,
>>                        struct device *dev, ioasid_t pasid)
>>   {
>>       struct device_domain_info *info = dev_iommu_priv_get(dev);
>>       int ret;
>>
>> +    ret = domain_qi_batch_alloc(domain);
>> +    if (ret)
>> +        return ret;
>> +
> 
> also need to call it in __cache_tag_assign_parent_domain(). BTW. Is the
> domain->qi_batch required if there is no ATS?

Yes.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  2024-12-09  1:39 ` Baolu Lu
@ 2024-12-09  3:07   ` Yi Liu
  2024-12-09  3:05     ` Baolu Lu
  2024-12-11  8:32     ` Tian, Kevin
  2024-12-09  3:10   ` Yi Liu
  1 sibling, 2 replies; 7+ messages in thread
From: Yi Liu @ 2024-12-09  3:07 UTC (permalink / raw)
  To: Baolu Lu, joro, kevin.tian; +Cc: chao.p.peng, iommu

On 2024/12/9 09:39, Baolu Lu wrote:
> On 12/7/24 20:03, Yi Liu wrote:
>> The qi_batch is allocated when assigning cache tag for a domain. While
>> for nested parent domain, it is missed. Hence, when trying to map pages
>> to the nested parent, NULL dereference occurred.
> 
> Yes. Good catch!
> 
>> To solve it, allocate qi_batch for the nested parent domain when assigning
>> cache tag for the nested domain is enough. However, it seems not quite
>> reliable to allocate qi_batch in cache tag as a domain may be used by
>> multiple devices, and there is no lock around the domain->qi_batch check.
>> As all the domains (except blocking domain and identity domain) are supposed
>> to have qi_batch, this fix just allocates the qi_batch in the domain
>> allocation.
> 
> So there appears to be two problems, the domain->qi_batch is not
> allocated for all paths and it's allocated without lock protection that
> creates a race case and possibly result in memory leak.

yes, it is although we haven't encountered it for now.

> But I don't think there is a need to move the domain->qi_batch
> allocation to the domain allocation path. That's a kind of refactoring
> and there is no need to mess it with a fix patch like this.

aha, yes. I felt the qi_batch eventually would be allocating it in domain
allocation, hence make the fix like this.

> Perhaps add a helper to handle the domain->qi_batch allocation? And then
> call it in the necessary paths? Something like this:
> 
> --- a/drivers/iommu/intel/cache.c
> +++ b/drivers/iommu/intel/cache.c
> @@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain 
> *domain, u16 did,
>       spin_unlock_irqrestore(&domain->cache_lock, flags);
>   }
> 
> +/* domain->qi_batch will be freed in iommu_free_domain() path. */
> +static int domain_qi_batch_alloc(struct dmar_domain *domain)
> +{
> +    unsigned long flags;
> +    int ret = 0;
> +
> +    spin_lock_irqsave(&domain->cache_lock, flags);

I'm wondering if there is any difference using the domain->lock v.s. using
the domain->cache_lock. The RID attach path acquires/releases the
domain->lock first, and then the domain->cache_lock. While the PASID path
does the reverse order. Although neither of the two paths hold the locks
for a long duration (no A-B-B-A locking issue), it still means when a
domain is shared between RID and PASID path, the domain->qi_batch check is
not in the same phase of the two paths.

> +    if (domain->qi_batch)
> +        goto out_unlock;
> +
> +    domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
> +    if (!domain->qi_batch)
> +        ret = -ENOMEM;
> +out_unlock:
> +    spin_unlock_irqrestore(&domain->cache_lock, flags);
> +
> +    return ret;
> +}
> +
>   static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
>                        struct device *dev, ioasid_t pasid)
>   {
>       struct device_domain_info *info = dev_iommu_priv_get(dev);
>       int ret;
> 
> +    ret = domain_qi_batch_alloc(domain);
> +    if (ret)
> +        return ret;
> +

also need to call it in __cache_tag_assign_parent_domain(). BTW. Is the
domain->qi_batch required if there is no ATS?

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  2024-12-09  1:39 ` Baolu Lu
  2024-12-09  3:07   ` Yi Liu
@ 2024-12-09  3:10   ` Yi Liu
  1 sibling, 0 replies; 7+ messages in thread
From: Yi Liu @ 2024-12-09  3:10 UTC (permalink / raw)
  To: Baolu Lu, joro, kevin.tian; +Cc: chao.p.peng, iommu

On 2024/12/9 09:39, Baolu Lu wrote:
> On 12/7/24 20:03, Yi Liu wrote:
>> The qi_batch is allocated when assigning cache tag for a domain. While
>> for nested parent domain, it is missed. Hence, when trying to map pages
>> to the nested parent, NULL dereference occurred.

> 
> Strip the kernel message and leave only valuable messages that
> developers might be interested. Something like this,

got it.

>   BUG: kernel NULL pointer dereference, address: 0000000000000200
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 8104795067 P4D 0
>   Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 
> 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
>   Call Trace:
>    ? __die+0x24/0x70
>    ? page_fault_oops+0x80/0x150
>    ? do_user_addr_fault+0x63/0x7b0
>    ? exc_page_fault+0x7c/0x220
>    ? asm_exc_page_fault+0x26/0x30
>    ? cache_tag_flush_range_np+0x13c/0x260
>    intel_iommu_iotlb_sync_map+0x1a/0x30
>    iommu_map+0x61/0xf0
>    batch_to_domain+0x188/0x250
>    iopt_area_fill_domains+0x125/0x320
>    ? rcu_is_watching+0x11/0x50
>    iopt_map_pages+0x63/0x100
>    iopt_map_common.isra.0+0xa7/0x190
>    iopt_map_user_pages+0x6a/0x80
>    iommufd_ioas_map+0xcd/0x1d0
>    iommufd_fops_ioctl+0x118/0x1c0
>    __x64_sys_ioctl+0x93/0xc0
>    do_syscall_64+0x71/0x140
>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
>

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  2024-12-09  3:07   ` Yi Liu
  2024-12-09  3:05     ` Baolu Lu
@ 2024-12-11  8:32     ` Tian, Kevin
  2024-12-11  9:29       ` Yi Liu
  1 sibling, 1 reply; 7+ messages in thread
From: Tian, Kevin @ 2024-12-11  8:32 UTC (permalink / raw)
  To: Liu, Yi L, Baolu Lu, joro@8bytes.org
  Cc: chao.p.peng@linux.intel.com, iommu@lists.linux.dev

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 9, 2024 11:07 AM
> 
> On 2024/12/9 09:39, Baolu Lu wrote:
> > +/* domain->qi_batch will be freed in iommu_free_domain() path. */
> > +static int domain_qi_batch_alloc(struct dmar_domain *domain)
> > +{
> > +    unsigned long flags;
> > +    int ret = 0;
> > +
> > +    spin_lock_irqsave(&domain->cache_lock, flags);
> 
> I'm wondering if there is any difference using the domain->lock v.s. using
> the domain->cache_lock. The RID attach path acquires/releases the
> domain->lock first, and then the domain->cache_lock. While the PASID path
> does the reverse order. Although neither of the two paths hold the locks
> for a long duration (no A-B-B-A locking issue), it still means when a
> domain is shared between RID and PASID path, the domain->qi_batch check
> is
> not in the same phase of the two paths.
> 

why is there different order between RID vs. PASID attach?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  2024-12-11  8:32     ` Tian, Kevin
@ 2024-12-11  9:29       ` Yi Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Yi Liu @ 2024-12-11  9:29 UTC (permalink / raw)
  To: Tian, Kevin, Baolu Lu, joro@8bytes.org
  Cc: chao.p.peng@linux.intel.com, iommu@lists.linux.dev

On 2024/12/11 16:32, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, December 9, 2024 11:07 AM
>>
>> On 2024/12/9 09:39, Baolu Lu wrote:
>>> +/* domain->qi_batch will be freed in iommu_free_domain() path. */
>>> +static int domain_qi_batch_alloc(struct dmar_domain *domain)
>>> +{
>>> +    unsigned long flags;
>>> +    int ret = 0;
>>> +
>>> +    spin_lock_irqsave(&domain->cache_lock, flags);
>>
>> I'm wondering if there is any difference using the domain->lock v.s. using
>> the domain->cache_lock. The RID attach path acquires/releases the
>> domain->lock first, and then the domain->cache_lock. While the PASID path
>> does the reverse order. Although neither of the two paths hold the locks
>> for a long duration (no A-B-B-A locking issue), it still means when a
>> domain is shared between RID and PASID path, the domain->qi_batch check
>> is
>> not in the same phase of the two paths.
>>
> 
> why is there different order between RID vs. PASID attach?

I think they should be consolidated to follow the same order to have
better readability although there is no actual problem for now.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-12-11  9:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-07 12:03 [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain Yi Liu
2024-12-09  1:39 ` Baolu Lu
2024-12-09  3:07   ` Yi Liu
2024-12-09  3:05     ` Baolu Lu
2024-12-11  8:32     ` Tian, Kevin
2024-12-11  9:29       ` Yi Liu
2024-12-09  3:10   ` Yi Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.