All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baolu Lu <baolu.lu@linux.intel.com>
To: Yi Liu <yi.l.liu@intel.com>, joro@8bytes.org, kevin.tian@intel.com
Cc: chao.p.peng@linux.intel.com, iommu@lists.linux.dev
Subject: Re: [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
Date: Mon, 9 Dec 2024 09:39:41 +0800	[thread overview]
Message-ID: <ce413d97-ec46-4cd9-932a-af804cd531cf@linux.intel.com> (raw)
In-Reply-To: <20241207120304.5710-1-yi.l.liu@intel.com>

On 12/7/24 20:03, Yi Liu wrote:
> The qi_batch is allocated when assigning cache tag for a domain. While
> for nested parent domain, it is missed. Hence, when trying to map pages
> to the nested parent, NULL dereference occurred.

Yes. Good catch!

> To solve it, allocate qi_batch for the nested parent domain when assigning
> cache tag for the nested domain is enough. However, it seems not quite
> reliable to allocate qi_batch in cache tag as a domain may be used by
> multiple devices, and there is no lock around the domain->qi_batch check.
> As all the domains (except blocking domain and identity domain) are supposed
> to have qi_batch, this fix just allocates the qi_batch in the domain
> allocation.

So there appears to be two problems, the domain->qi_batch is not
allocated for all paths and it's allocated without lock protection that
creates a race case and possibly result in memory leak.

But I don't think there is a need to move the domain->qi_batch
allocation to the domain allocation path. That's a kind of refactoring
and there is no need to mess it with a fix patch like this.

Perhaps add a helper to handle the domain->qi_batch allocation? And then
call it in the necessary paths? Something like this:

--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain 
*domain, u16 did,
  	spin_unlock_irqrestore(&domain->cache_lock, flags);
  }

+/* domain->qi_batch will be freed in iommu_free_domain() path. */
+static int domain_qi_batch_alloc(struct dmar_domain *domain)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	if (domain->qi_batch)
+		goto out_unlock;
+
+	domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC);
+	if (!domain->qi_batch)
+		ret = -ENOMEM;
+out_unlock:
+	spin_unlock_irqrestore(&domain->cache_lock, flags);
+
+	return ret;
+}
+
  static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did,
  				     struct device *dev, ioasid_t pasid)
  {
  	struct device_domain_info *info = dev_iommu_priv_get(dev);
  	int ret;

+	ret = domain_qi_batch_alloc(domain);
+	if (ret)
+		return ret;
+
  	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_IOTLB);
  	if (ret || !info->ats_enabled)
  		return ret;
@@ -139,6 +162,10 @@ static int __cache_tag_assign_parent_domain(struct 
dmar_domain *domain, u16 did,
  	struct device_domain_info *info = dev_iommu_priv_get(dev);
  	int ret;

+	ret = domain_qi_batch_alloc(domain);
+	if (ret)
+		return ret;
+
  	ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB);
  	if (ret || !info->ats_enabled)
  		return ret;
@@ -190,13 +217,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain,
  	u16 did = domain_get_id_for_dev(domain, dev);
  	int ret;

-	/* domain->qi_bach will be freed in iommu_free_domain() path. */
-	if (!domain->qi_batch) {
-		domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
-		if (!domain->qi_batch)
-			return -ENOMEM;
-	}
-
  	ret = __cache_tag_assign_domain(domain, did, dev, pasid);
  	if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED)
  		return ret;

> 
> [  176.749284] BUG: kernel NULL pointer dereference, address: 0000000000000200
> [  176.758178] #PF: supervisor read access in kernel mode
> [  176.765052] #PF: error_code(0x0000) - not-present page
> [  176.771985] PGD 8104795067 P4D 0
> [  176.776900] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  176.783640] CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
> [  176.797868] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0107.D52.2311070228 11/07/2023
> [  176.812762] RIP: 0010:cache_tag_flush_range_np+0x13c/0x260
> [  176.820656] Code: 8b 6e 18 41 f6 45 18 80 0f 85 66 ff ff ff 4c 89 ef e8 e8 98 ff ff 4d 8b 36 49 39 ee 75 a3 4d 85 ed 48 8b ab 28 01 00 00 74 0e <8b> 95 00 02 00 00 85 d2 0f 85 88 00 00 00 48 8b 74 24 08 4c 89 ff
> [  176.847296] RSP: 0018:ff2e80d8b4887aa0 EFLAGS: 00010086
> [  176.855207] RAX: 0000000000000000 RBX: ff18cd64bca69000 RCX: 0000000000000001
> [  176.865337] RDX: 0000000000000000 RSI: ffffffff98206127 RDI: ff18cd6440139c00
> [  176.875453] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [  176.885622] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [  176.895802] R13: ff18cd6440139c00 R14: ff18cd64bca69118 R15: ff18cd64bca690d0
> [  176.906056] FS:  00007f070a7fc640(0000) GS:ff18cde22fa00000(0000) knlGS:0000000000000000
> [  176.917459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  176.926274] CR2: 0000000000000200 CR3: 00000080996ee003 CR4: 0000000000f73ef0
> [  176.936703] PKRU: 55555554
> [  176.942175] Call Trace:
> [  176.947385]  <TASK>
> [  176.952216]  ? __die+0x24/0x70
> [  176.958114]  ? page_fault_oops+0x80/0x150
> [  176.965133]  ? do_user_addr_fault+0x63/0x7b0
> [  176.972462]  ? exc_page_fault+0x7c/0x220
> [  176.979437]  ? asm_exc_page_fault+0x26/0x30
> [  176.986755]  ? cache_tag_flush_range_np+0x13c/0x260
> [  176.994903]  intel_iommu_iotlb_sync_map+0x1a/0x30
> [  177.002880]  iommu_map+0x61/0xf0
> [  177.009198]  batch_to_domain+0x188/0x250
> [  177.016344]  iopt_area_fill_domains+0x125/0x320
> [  177.024220]  ? rcu_is_watching+0x11/0x50
> [  177.031442]  iopt_map_pages+0x63/0x100
> [  177.038420]  iopt_map_common.isra.0+0xa7/0x190
> [  177.046131]  iopt_map_user_pages+0x6a/0x80
> [  177.053435]  iommufd_ioas_map+0xcd/0x1d0
> [  177.060537]  iommufd_fops_ioctl+0x118/0x1c0
> [  177.067918]  __x64_sys_ioctl+0x93/0xc0
> [  177.074736]  do_syscall_64+0x71/0x140
> [  177.081376]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  177.089489] RIP: 0033:0x7f071491a94f
> [  177.095876] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [  177.124032] RSP: 002b:00007f070a7f76d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  177.134964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f071491a94f
> [  177.145390] RDX: 00007f070a7f7760 RSI: 0000000000003b85 RDI: 0000000000000015
> [  177.155834] RBP: 0000000000000015 R08: 00007f0603e00000 R09: 0000000000000000
> [  177.166247] R10: 000056326cd25970 R11: 0000000000000246 R12: 00000000000a0000
> [  177.177421] R13: 0000000000000004 R14: 0000000000000000 R15: 00000000000a0000
> [  177.187869]  </TASK>
> [  177.192700] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ofpart snd_hda_codec pmt_telemetry kvm_intel binfmt_misc pmt_class intel_sdsi snd_hda_core spi_nor isst_if_mbox_pci isst_if_mmio kvm dax_hmem mei_me joydev mtd isst_if_common snd_pcm intel_vsec idxd mei snd_timer cxl_acpi cxl_port cxl_core einj acpi_power_meter dm_multipath fuse ip_tables crc32c_intel i2c_i801 spi_intel_pci i2c_smbus i2c_ismt spi_intel drm_shmem_helper igc pinctrl_emmitsburg pinctrl_intel pwm_lpss

Strip the kernel message and leave only valuable messages that
developers might be interested. Something like this,

  BUG: kernel NULL pointer dereference, address: 0000000000000200
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8104795067 P4D 0
  Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 
6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
  Call Trace:
   ? __die+0x24/0x70
   ? page_fault_oops+0x80/0x150
   ? do_user_addr_fault+0x63/0x7b0
   ? exc_page_fault+0x7c/0x220
   ? asm_exc_page_fault+0x26/0x30
   ? cache_tag_flush_range_np+0x13c/0x260
   intel_iommu_iotlb_sync_map+0x1a/0x30
   iommu_map+0x61/0xf0
   batch_to_domain+0x188/0x250
   iopt_area_fill_domains+0x125/0x320
   ? rcu_is_watching+0x11/0x50
   iopt_map_pages+0x63/0x100
   iopt_map_common.isra.0+0xa7/0x190
   iopt_map_user_pages+0x6a/0x80
   iommufd_ioas_map+0xcd/0x1d0
   iommufd_fops_ioctl+0x118/0x1c0
   __x64_sys_ioctl+0x93/0xc0
   do_syscall_64+0x71/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e


> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/cache.c  | 7 -------
>   drivers/iommu/intel/iommu.c  | 7 +++++++
>   drivers/iommu/intel/nested.c | 6 ++++++
>   drivers/iommu/intel/svm.c    | 7 +++++++
>   4 files changed, 20 insertions(+), 7 deletions(-)

[...]

Thanks,
baolu

  reply	other threads:[~2024-12-09  1:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-07 12:03 [PATCH] iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain Yi Liu
2024-12-09  1:39 ` Baolu Lu [this message]
2024-12-09  3:07   ` Yi Liu
2024-12-09  3:05     ` Baolu Lu
2024-12-11  8:32     ` Tian, Kevin
2024-12-11  9:29       ` Yi Liu
2024-12-09  3:10   ` Yi Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ce413d97-ec46-4cd9-932a-af804cd531cf@linux.intel.com \
    --to=baolu.lu@linux.intel.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.