* [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next
@ 2024-03-05 12:21 Lu Baolu
2024-03-05 12:21 ` [PATCH 1/8] PCI: Make pci_dev_is_disconnected() helper public for other drivers Lu Baolu
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
Hi Joerg,
The following fixes have been queued for iommu next:
- Fix hard lockup on hotplug of an ATS-capable device
- Fix NULL domain pointer dereference on device release
These changes are not critical for v6.8, so can you please consider them
for iommu next?
Best regards,
baolu
Ethan Zhao (3):
PCI: Make pci_dev_is_disconnected() helper public for other drivers
iommu/vt-d: Don't issue ATS Invalidation request when device is
disconnected
iommu/vt-d: Improve ITE fault handling if target device isn't present
Lu Baolu (5):
iommu: Add static iommu_ops->release_domain
iommu/vt-d: Fix NULL domain on device release
iommu/vt-d: Setup scalable mode context entry in probe path
iommu/vt-d: Remove scalable mode context entry setup from attach_dev
iommu/vt-d: Remove scalable mode in domain_context_clear_one()
include/linux/iommu.h | 1 +
include/linux/pci.h | 5 +
drivers/iommu/intel/pasid.h | 2 +
drivers/pci/pci.h | 5 -
drivers/iommu/intel/dmar.c | 22 ++++
drivers/iommu/intel/iommu.c | 214 +++++++++++-------------------------
drivers/iommu/intel/pasid.c | 205 ++++++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 19 +++-
8 files changed, 313 insertions(+), 160 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/8] PCI: Make pci_dev_is_disconnected() helper public for other drivers
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 2/8] iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected Lu Baolu
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
From: Ethan Zhao <haifeng.zhao@linux.intel.com>
Make pci_dev_is_disconnected() public so that it can be called from the
Intel VT-d driver to fix (or work around) the surprise removal unplug
hang for ATS-capable devices on hotplug-capable PCIe switch downstream
ports.
Unlike pci_device_is_present(), this helper performs no config space
access, so it is light enough to use in both the normal surprise removal
and safe removal flows.
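As a rough illustration of why the helper is cheap, the sketch below models the two checks in user-space C. The struct mock_pci_dev type, the mock_* functions, and the config_reads counter are assumptions made for this example only; they are not kernel code, and the presence check is a simplified model of pci_device_is_present().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types, for illustration only. */
enum mock_channel_state {
	MOCK_IO_NORMAL = 1,
	MOCK_IO_FROZEN,
	MOCK_IO_PERM_FAILURE,
};

struct mock_pci_dev {
	enum mock_channel_state error_state; /* cached in memory */
	bool physically_present;             /* what the hardware would report */
	unsigned int config_reads;           /* counts simulated config accesses */
};

/* Same shape as pci_dev_is_disconnected(): a single memory compare. */
bool mock_dev_is_disconnected(const struct mock_pci_dev *dev)
{
	return dev->error_state == MOCK_IO_PERM_FAILURE;
}

/* Simulated config read: returns all ones when the device is gone. */
uint32_t mock_read_vendor_id(struct mock_pci_dev *dev)
{
	dev->config_reads++;
	return dev->physically_present ? 0x8086u : 0xffffffffu;
}

/* Simplified presence check: may need a (slow) config space access. */
bool mock_device_is_present(struct mock_pci_dev *dev)
{
	if (mock_dev_is_disconnected(dev))
		return false;
	return mock_read_vendor_id(dev) != 0xffffffffu;
}

/* Usage: the cached check never touches the simulated config space. */
void mock_demo(void)
{
	struct mock_pci_dev dev = { MOCK_IO_PERM_FAILURE, false, 0 };

	assert(mock_dev_is_disconnected(&dev));
	assert(!mock_device_is_present(&dev));
	assert(dev.config_reads == 0);
}
```

In this model the disconnected check costs one comparison, while the presence check may cost a bus transaction, which is the trade-off the commit message describes.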
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Link: https://lore.kernel.org/r/20240301080727.3529832-2-haifeng.zhao@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
include/linux/pci.h | 5 +++++
drivers/pci/pci.h | 5 -----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7ab0d13672da..213109d3c601 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2517,6 +2517,11 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
return NULL;
}
+static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
+{
+ return dev->error_state == pci_channel_io_perm_failure;
+}
+
void pci_request_acs(void);
bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
bool pci_acs_path_enabled(struct pci_dev *start,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e9750b1b19ba..bfc56f7bee1c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -368,11 +368,6 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
return 0;
}
-static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
-{
- return dev->error_state == pci_channel_io_perm_failure;
-}
-
/* pci_dev priv_flags */
#define PCI_DEV_ADDED 0
#define PCI_DPC_RECOVERED 1
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/8] iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
2024-03-05 12:21 ` [PATCH 1/8] PCI: Make pci_dev_is_disconnected() helper public for other drivers Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 3/8] iommu/vt-d: Improve ITE fault handling if target device isn't present Lu Baolu
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
From: Ethan Zhao <haifeng.zhao@linux.intel.com>
For endpoint devices connected to the system via hotplug-capable ports,
users can request a hot reset of the device by flapping the device's link
through the slot's Link Control register. In response to the resulting
DLLSC interrupt, pciehp_ist() unloads the device driver and then powers
the device off. This causes an IOMMU device-TLB invalidation request (the
Intel VT-d spec term; ATS Invalidation in PCIe spec r6.1) to be sent to a
target device that no longer exists, and the request is then retried in
an endless loop after the ITE fault is triggered in interrupt context.
This causes the following continuous hard lockup warnings and a system hang:
[ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
[ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present
[ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
[ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S
OE kernel version xxxx
[ 4223.822623] Hardware name: vendorname xxxx 666-106,
BIOS 01.01.02.03.01 05/15/2023
[ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b
57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10
74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822626] FS: 0000000000000000(0000) GS:ffffa237ae400000(0000)
knlGS:0000000000000000
[ 4223.822627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0
[ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 4223.822628] PKRU: 55555554
[ 4223.822628] Call Trace:
[ 4223.822628] qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822628] __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822629] dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822629] intel_iommu_release_device+0x1f/0x30
[ 4223.822629] iommu_release_device+0x33/0x60
[ 4223.822629] iommu_bus_notifier+0x7f/0x90
[ 4223.822630] blocking_notifier_call_chain+0x60/0x90
[ 4223.822630] device_del+0x2e5/0x420
[ 4223.822630] pci_remove_bus_device+0x70/0x110
[ 4223.822630] pciehp_unconfigure_device+0x7c/0x130
[ 4223.822631] pciehp_disable_slot+0x6b/0x100
[ 4223.822631] pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822631] pciehp_ist+0x176/0x180
[ 4223.822631] ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822632] irq_thread_fn+0x19/0x50
[ 4223.822632] irq_thread+0x104/0x190
[ 4223.822632] ? irq_forced_thread_fn+0x90/0x90
[ 4223.822632] ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822633] kthread+0x114/0x130
[ 4223.822633] ? __kthread_cancel_work+0x40/0x40
[ 4223.822633] ret_from_fork+0x1f/0x30
[ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
[ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S
OE kernel version xxxx
[ 4223.822634] Hardware name: vendorname xxxx 666-106,
BIOS 01.01.02.03.01 05/15/2023
[ 4223.822634] Call Trace:
[ 4223.822634] <NMI>
[ 4223.822635] dump_stack+0x6d/0x88
[ 4223.822635] panic+0x101/0x2d0
[ 4223.822635] ? ret_from_fork+0x11/0x30
[ 4223.822635] nmi_panic.cold.14+0xc/0xc
[ 4223.822636] watchdog_overflow_callback.cold.8+0x6d/0x81
[ 4223.822636] __perf_event_overflow+0x4f/0xf0
[ 4223.822636] handle_pmi_common+0x1ef/0x290
[ 4223.822636] ? __set_pte_vaddr+0x28/0x40
[ 4223.822637] ? flush_tlb_one_kernel+0xa/0x20
[ 4223.822637] ? __native_set_fixmap+0x24/0x30
[ 4223.822637] ? ghes_copy_tofrom_phys+0x70/0x100
[ 4223.822637] ? __ghes_peek_estatus.isra.16+0x49/0xa0
[ 4223.822637] intel_pmu_handle_irq+0xba/0x2b0
[ 4223.822638] perf_event_nmi_handler+0x24/0x40
[ 4223.822638] nmi_handle+0x4d/0xf0
[ 4223.822638] default_do_nmi+0x49/0x100
[ 4223.822638] exc_nmi+0x134/0x180
[ 4223.822639] end_repeat_nmi+0x16/0x67
[ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b
57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10
74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822641] ? qi_submit_sync+0x2c0/0x490
[ 4223.822642] ? qi_submit_sync+0x2c0/0x490
[ 4223.822642] </NMI>
[ 4223.822642] qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822642] __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822643] dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822643] intel_iommu_release_device+0x1f/0x30
[ 4223.822643] iommu_release_device+0x33/0x60
[ 4223.822643] iommu_bus_notifier+0x7f/0x90
[ 4223.822644] blocking_notifier_call_chain+0x60/0x90
[ 4223.822644] device_del+0x2e5/0x420
[ 4223.822644] pci_remove_bus_device+0x70/0x110
[ 4223.822644] pciehp_unconfigure_device+0x7c/0x130
[ 4223.822644] pciehp_disable_slot+0x6b/0x100
[ 4223.822645] pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822645] pciehp_ist+0x176/0x180
[ 4223.822645] ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822645] irq_thread_fn+0x19/0x50
[ 4223.822646] irq_thread+0x104/0x190
[ 4223.822646] ? irq_forced_thread_fn+0x90/0x90
[ 4223.822646] ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822646] kthread+0x114/0x130
[ 4223.822647] ? __kthread_cancel_work+0x40/0x40
[ 4223.822647] ret_from_fork+0x1f/0x30
[ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffffbfffffff)
This issue can be triggered by any kind of ordinary surprise removal
hotplug operation, such as:
1. pulling the EP (endpoint device) out directly.
2. turning off the EP's power.
3. bringing the link down.
etc.
This patch aims to handle both regular safe removal and surprise removal
unplug. The hot unplug handling is changed to fix the ATS Invalidation
hang by calling pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() to check the target device state, which
avoids sending a meaningless ATS Invalidation request to the IOMMU when
the device is gone (see the IMPLEMENTATION NOTE in PCIe spec r6.1 section
10.3.1).
For safe removal, the device is not removed until the whole software
handling process is done, so it does not trigger the hard lockup caused
by an overly long ATS Invalidation timeout wait. In the safe removal
path, pciehp_unconfigure_device() checks the 'presence' parameter and
does not set the device state to pci_channel_io_perm_failure, so
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() returns
false and the function proceeds as before.
For surprise removal, pciehp_unconfigure_device() sets the device state
to pci_channel_io_perm_failure, which means the device is already gone
(disconnected). pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() then returns true, and the function
returns early instead of blindly sending an ATS Invalidation request to
the disconnected device. This avoids triggering a further ITE fault; an
ITE fault blocks all invalidation requests from being handled, and
retrying the timed-out request could trigger a hard lockup.
safe removal (present) & surprise removal (not present)
pciehp_ist()
pciehp_handle_presence_or_link_change()
pciehp_disable_slot()
remove_board()
pciehp_unconfigure_device(presence) {
if (!presence)
pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
}
This patch works for regular safe removal and surprise removal of
ATS-capable endpoints on PCIe switch downstream ports.
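The flow above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: struct model_dev and the model_* functions are hypothetical names invented for this example, and the invalidation counter stands in for actually queuing a request to the IOMMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
	bool ats_enabled;
	bool disconnected;      /* models error_state == pci_channel_io_perm_failure */
	int invalidations_sent; /* counts modelled ATS Invalidation requests */
};

/* Models pciehp_unconfigure_device(presence): mark on surprise removal only. */
void model_unconfigure_device(struct model_dev *dev, bool presence)
{
	if (!presence)
		dev->disconnected = true;
}

/* Models the guard this patch adds to devtlb_invalidation_with_pasid(). */
void model_devtlb_invalidation(struct model_dev *dev)
{
	if (!dev->ats_enabled)
		return;
	if (dev->disconnected)
		return; /* device is gone: skip a request that could only time out */
	dev->invalidations_sent++;
}

/* Usage: surprise removal suppresses the request, safe removal does not. */
void model_removal_demo(void)
{
	struct model_dev safe = { true, false, 0 };
	struct model_dev gone = { true, false, 0 };

	model_unconfigure_device(&safe, true);
	model_devtlb_invalidation(&safe);
	assert(safe.invalidations_sent == 1);

	model_unconfigure_device(&gone, false);
	model_devtlb_invalidation(&gone);
	assert(gone.invalidations_sent == 0);
}
```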
Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Link: https://lore.kernel.org/r/20240301080727.3529832-3-haifeng.zhao@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/pasid.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 108158e2b907..746c7abe2237 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
if (!info || !info->ats_enabled)
return;
+ if (pci_dev_is_disconnected(to_pci_dev(dev)))
+ return;
+
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
pfsid = info->pfsid;
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/8] iommu/vt-d: Improve ITE fault handling if target device isn't present
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
2024-03-05 12:21 ` [PATCH 1/8] PCI: Make pci_dev_is_disconnected() helper public for other drivers Lu Baolu
2024-03-05 12:21 ` [PATCH 2/8] iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 4/8] iommu: Add static iommu_ops->release_domain Lu Baolu
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
From: Ethan Zhao <haifeng.zhao@linux.intel.com>
Surprise removal can happen at any time. For example, a user could
request safe removal of an EP (endpoint device) via sysfs and
concurrently bring its link down, which amounts to a surprise removal.
Such aggressive cases cause an ATS invalidation request to be issued to
a target device that no longer exists, followed by an endless loop that
retries the request after the ITE fault is triggered in interrupt
context.
This patch improves ITE handling by checking the presence state of the
target device, so that the timed-out request is not retried blindly,
which avoids a hard lockup or system hang.
Device TLBs should only be invalidated when the devices are in the
iommu->device_rbtree (probed, not released) and present.
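The retry decision described above can be sketched as follows. This is a simplified user-space model, not the driver code: struct model_probed_dev, the linear lookup, and model_ite_decision() are assumptions for illustration, and the model returns -EAGAIN unconditionally in the retry case, whereas the real function only does so for an aborted wait descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a probed device; not the kernel's data structures. */
struct model_probed_dev {
	uint16_t sid; /* source ID (bus << 8 | devfn) */
	bool is_pci;
	bool present;
};

/* Linear lookup standing in for the driver's rbtree search by source ID. */
const struct model_probed_dev *
model_find_by_sid(const struct model_probed_dev *devs, size_t n, uint16_t sid)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].sid == sid)
			return &devs[i];
	return NULL;
}

/*
 * Decide what to do after an ITE: -ETIMEDOUT means give up, -EAGAIN means
 * the caller may resubmit. An ite_sid of 0 models hardware that does not
 * report the source ID, in which case the old retry behaviour is kept.
 */
int model_ite_decision(const struct model_probed_dev *devs, size_t n,
		       uint16_t ite_sid)
{
	if (ite_sid) {
		const struct model_probed_dev *dev =
			model_find_by_sid(devs, n, ite_sid);

		if (!dev || !dev->is_pci || !dev->present)
			return -ETIMEDOUT;
	}
	return -EAGAIN;
}

/* Usage: a released (unknown) device is never retried. */
void model_ite_demo(void)
{
	struct model_probed_dev devs[] = { { 0x1700, true, true } };

	assert(model_ite_decision(devs, 1, 0x1700) == -EAGAIN);
	assert(model_ite_decision(devs, 1, 0x1800) == -ETIMEDOUT);
}
```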
Fixes: 6ba6c3a4cacf ("VT-d: add device IOTLB invalidation support")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Link: https://lore.kernel.org/r/20240301080727.3529832-4-haifeng.zhao@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/dmar.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d14797aabb7a..36d7427b1202 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1273,6 +1273,8 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
{
u32 fault;
int head, tail;
+ struct device *dev;
+ u64 iqe_err, ite_sid;
struct q_inval *qi = iommu->qi;
int shift = qi_shift(iommu);
@@ -1317,6 +1319,13 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
tail = readl(iommu->reg + DMAR_IQT_REG);
tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
+ /*
+ * SID field is valid only when the ITE field is Set in FSTS_REG
+ * see Intel VT-d spec r4.1, section 11.4.9.9
+ */
+ iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
+ ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
+
writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
pr_info("Invalidation Time-out Error (ITE) cleared\n");
@@ -1326,6 +1335,19 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
head = (head - 2 + QI_LENGTH) % QI_LENGTH;
} while (head != tail);
+ /*
+ * If device was released or isn't present, no need to retry
+ * the ATS invalidate request anymore.
+ *
+ * 0 value of ite_sid means old VT-d device, no ite_sid value.
+ * see Intel VT-d spec r4.1, section 11.4.9.9
+ */
+ if (ite_sid) {
+ dev = device_rbtree_find(iommu, ite_sid);
+ if (!dev || !dev_is_pci(dev) ||
+ !pci_device_is_present(to_pci_dev(dev)))
+ return -ETIMEDOUT;
+ }
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/8] iommu: Add static iommu_ops->release_domain
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
` (2 preceding siblings ...)
2024-03-05 12:21 ` [PATCH 3/8] iommu/vt-d: Improve ITE fault handling if target device isn't present Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 5/8] iommu/vt-d: Fix NULL domain on device release Lu Baolu
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
The current device_release callback for individual iommu drivers does the
following:
1) Silent IOMMU DMA translation: It detaches any existing domain from the
device and puts it into a blocking state (some drivers might use the
identity state).
2) Resource release: It releases resources allocated during the
device_probe callback and restores the device to its pre-probe state.
Step 1 is challenging for individual iommu drivers because each must check
if a domain is already attached to the device. Additionally, if a deferred
attach never occurred, the device_release should avoid modifying hardware
configuration regardless of the reason for its call.
To simplify this process, introduce a static release_domain within the
iommu_ops structure. It can be either a blocking or identity domain
depending on the iommu hardware. The iommu core will decide whether to
attach this domain before the device_release callback, eliminating the
need for repetitive code in various drivers.
Consequently, the device_release callback can focus solely on the opposite
operations of device_probe, including releasing all resources allocated
during that callback.
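The division of labour described above can be sketched in user-space C. The model_* types and functions below are hypothetical simplifications invented for this example; only the ordering (attach the release domain unless the attach was deferred, then call the driver's release callback) is taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the structures named in the text. */
struct model_device;

struct model_domain {
	const char *name;
	void (*attach_dev)(struct model_domain *domain, struct model_device *dev);
};

struct model_ops {
	struct model_domain *release_domain;              /* may be NULL */
	void (*release_device)(struct model_device *dev); /* may be NULL */
};

struct model_device {
	bool attach_deferred;          /* true if a deferred attach never ran */
	struct model_domain *attached; /* domain the hardware currently uses */
	bool released;
};

void model_attach(struct model_domain *domain, struct model_device *dev)
{
	dev->attached = domain;
}

void model_release(struct model_device *dev)
{
	dev->released = true; /* undo probe-time allocations only */
}

/* Core-side step: park the device first, then let the driver clean up. */
void model_deinit_device(const struct model_ops *ops, struct model_device *dev)
{
	if (!dev->attach_deferred && ops->release_domain)
		ops->release_domain->attach_dev(ops->release_domain, dev);

	if (ops->release_device)
		ops->release_device(dev);
}

/* Usage: a deferred device is released without touching its translation. */
void model_release_demo(void)
{
	struct model_domain blocking = { "blocking", model_attach };
	struct model_ops ops = { &blocking, model_release };
	struct model_device deferred = { true, NULL, false };

	model_deinit_device(&ops, &deferred);
	assert(deferred.attached == NULL);
	assert(deferred.released);
}
```

Keeping the domain switch in the core means the driver callback in this model never needs to inspect which domain, if any, is attached.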
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240305013305.204605-2-baolu.lu@linux.intel.com
---
include/linux/iommu.h | 1 +
drivers/iommu/iommu.c | 19 +++++++++++++++----
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index af6c367ed673..2e925b5eba53 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -585,6 +585,7 @@ struct iommu_ops {
struct module *owner;
struct iommu_domain *identity_domain;
struct iommu_domain *blocked_domain;
+ struct iommu_domain *release_domain;
struct iommu_domain *default_domain;
};
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index eb50543bf956..098869007c69 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -462,13 +462,24 @@ static void iommu_deinit_device(struct device *dev)
/*
* release_device() must stop using any attached domain on the device.
- * If there are still other devices in the group they are not effected
+ * If there are still other devices in the group, they are not affected
* by this callback.
*
- * The IOMMU driver must set the device to either an identity or
- * blocking translation and stop using any domain pointer, as it is
- * going to be freed.
+ * If the iommu driver provides release_domain, the core code ensures
+ * that domain is attached prior to calling release_device. Drivers can
+ * use this to enforce a translation on the idle iommu. Typically, the
+ * global static blocked_domain is a good choice.
+ *
+ * Otherwise, the iommu driver must set the device to either an identity
+ * or a blocking translation in release_device() and stop using any
+ * domain pointer, as it is going to be freed.
+ *
+ * Regardless, if a delayed attach never occurred, then the release
+ * should still avoid touching any hardware configuration either.
*/
+ if (!dev->iommu->attach_deferred && ops->release_domain)
+ ops->release_domain->ops->attach_dev(ops->release_domain, dev);
+
if (ops->release_device)
ops->release_device(dev);
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/8] iommu/vt-d: Fix NULL domain on device release
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
` (3 preceding siblings ...)
2024-03-05 12:21 ` [PATCH 4/8] iommu: Add static iommu_ops->release_domain Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 6/8] iommu/vt-d: Setup scalable mode context entry in probe path Lu Baolu
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
mode, info->domain may not yet be assigned by the time the release_device
function is called. It leads to the following crash in the crash kernel:
BUG: kernel NULL pointer dereference, address: 000000000000003c
...
RIP: 0010:do_raw_spin_lock+0xa/0xa0
...
_raw_spin_lock_irqsave+0x1b/0x30
intel_iommu_release_device+0x96/0x170
iommu_deinit_device+0x39/0xf0
__iommu_group_remove_device+0xa0/0xd0
iommu_bus_notifier+0x55/0xb0
notifier_call_chain+0x5a/0xd0
blocking_notifier_call_chain+0x41/0x60
bus_notify+0x34/0x50
device_del+0x269/0x3d0
pci_remove_bus_device+0x77/0x100
p2sb_bar+0xae/0x1d0
...
i801_probe+0x423/0x740
Use the release_domain mechanism to fix it. The scalable mode context
entry which is not part of release domain should be cleared in
release_device().
Fixes: 586081d3f6b1 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
Reported-by: Eric Badger <ebadger@purestorage.com>
Closes: https://lore.kernel.org/r/20240113181713.1817855-1-ebadger@purestorage.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240305013305.204605-3-baolu.lu@linux.intel.com
---
drivers/iommu/intel/pasid.h | 1 +
drivers/iommu/intel/iommu.c | 31 ++++--------------
drivers/iommu/intel/pasid.c | 64 +++++++++++++++++++++++++++++++++++++
3 files changed, 71 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 487ede039bdd..42fda97fd851 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,4 +318,5 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
bool fault_ignore);
void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
struct device *dev, u32 pasid);
+void intel_pasid_teardown_sm_context(struct device *dev);
#endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc3994efd362..f74d42d3258f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3869,30 +3869,6 @@ static void domain_context_clear(struct device_domain_info *info)
&domain_context_clear_one_cb, info);
}
-static void dmar_remove_one_dev_info(struct device *dev)
-{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct dmar_domain *domain = info->domain;
- struct intel_iommu *iommu = info->iommu;
- unsigned long flags;
-
- if (!dev_is_real_dma_subdevice(info->dev)) {
- if (dev_is_pci(info->dev) && sm_supported(iommu))
- intel_pasid_tear_down_entry(iommu, info->dev,
- IOMMU_NO_PASID, false);
-
- iommu_disable_pci_caps(info);
- domain_context_clear(info);
- }
-
- spin_lock_irqsave(&domain->lock, flags);
- list_del(&info->link);
- spin_unlock_irqrestore(&domain->lock, flags);
-
- domain_detach_iommu(domain, iommu);
- info->domain = NULL;
-}
-
/*
* Clear the page table pointer in context or pasid table entries so that
* all DMA requests without PASID from the device are blocked. If the page
@@ -4431,7 +4407,11 @@ static void intel_iommu_release_device(struct device *dev)
mutex_lock(&iommu->iopf_lock);
device_rbtree_remove(info);
mutex_unlock(&iommu->iopf_lock);
- dmar_remove_one_dev_info(dev);
+
+ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+ !context_copied(iommu, info->bus, info->devfn))
+ intel_pasid_teardown_sm_context(dev);
+
intel_pasid_free_table(dev);
intel_iommu_debugfs_remove_dev(info);
kfree(info);
@@ -4922,6 +4902,7 @@ static const struct iommu_dirty_ops intel_dirty_ops = {
const struct iommu_ops intel_iommu_ops = {
.blocked_domain = &blocking_domain,
+ .release_domain = &blocking_domain,
.capable = intel_iommu_capable,
.hw_info = intel_iommu_hw_info,
.domain_alloc = intel_iommu_domain_alloc,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 746c7abe2237..a51e895d9a17 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -670,3 +670,67 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
return 0;
}
+
+/*
+ * Interfaces to setup or teardown a pasid table to the scalable-mode
+ * context table entry:
+ */
+
+static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct context_entry *context;
+
+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, bus, devfn, false);
+ if (!context) {
+ spin_unlock(&iommu->lock);
+ return;
+ }
+
+ context_clear_entry(context);
+ __iommu_flush_cache(iommu, context, sizeof(*context));
+ spin_unlock(&iommu->lock);
+
+ /*
+ * Cache invalidation for changes to a scalable-mode context table
+ * entry.
+ *
+ * Section 6.5.3.3 of the VT-d spec:
+ * - Device-selective context-cache invalidation;
+ * - Domain-selective PASID-cache invalidation to affected domains
+ * (can be skipped if all PASID entries were not-present);
+ * - Domain-selective IOTLB invalidation to affected domains;
+ * - Global Device-TLB invalidation to affected functions.
+ *
+ * The iommu has been parked in the blocking state. All domains have
+ * been detached from the device or PASID. The PASID and IOTLB caches
+ * have been invalidated during the domain detach path.
+ */
+ iommu->flush.flush_context(iommu, 0, PCI_DEVID(bus, devfn),
+ DMA_CCMD_MASK_NOBIT, DMA_CCMD_DEVICE_INVL);
+ devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+}
+
+static int pci_pasid_table_teardown(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct device *dev = data;
+
+ if (dev == &pdev->dev)
+ device_pasid_table_teardown(dev, PCI_BUS_NUM(alias), alias & 0xff);
+
+ return 0;
+}
+
+void intel_pasid_teardown_sm_context(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+ if (!dev_is_pci(dev)) {
+ device_pasid_table_teardown(dev, info->bus, info->devfn);
+ return;
+ }
+
+ pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 6/8] iommu/vt-d: Setup scalable mode context entry in probe path
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
` (4 preceding siblings ...)
2024-03-05 12:21 ` [PATCH 5/8] iommu/vt-d: Fix NULL domain on device release Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 7/8] iommu/vt-d: Remove scalable mode context entry setup from attach_dev Lu Baolu
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
In contrast to legacy mode, the DMA translation table is configured in
the PASID table entry instead of the context entry for scalable mode.
For this reason, it is more appropriate to set up the scalable mode
context entry in the device_probe callback and direct it to the
appropriate PASID table.
The iommu domain attach/detach operations only affect the PASID table
entry. Therefore, there is no need to modify the context entry when
configuring the translation type and page table.
The only exception is the kdump case, where context entry setup is
postponed until the device driver invokes the first DMA interface.
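The split between the probe path and the kdump exception can be modelled as below. This is a sketch under assumptions: struct model_sm_dev and the model_* functions are hypothetical, and the model assumes that programming the context entry replaces a copied entry so that the setup runs exactly once per device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct model_sm_dev {
	bool context_copied; /* entry inherited from the previous kernel (kdump) */
	int context_setups;  /* how many times the context entry was programmed */
};

/* Stand-in for programming the scalable-mode context entry. */
int model_setup_sm_context(struct model_sm_dev *dev)
{
	dev->context_copied = false; /* assumption: the copied entry is replaced */
	dev->context_setups++;
	return 0;
}

/* Probe path: a normal boot programs the entry here. */
int model_probe_device(struct model_sm_dev *dev)
{
	if (!dev->context_copied)
		return model_setup_sm_context(dev);
	return 0;
}

/* Attach path: only the kdump case still needs the postponed setup. */
int model_prepare_attach(struct model_sm_dev *dev)
{
	if (dev->context_copied)
		return model_setup_sm_context(dev);
	return 0;
}

/* Usage: either way the entry is programmed once. */
void model_sm_demo(void)
{
	struct model_sm_dev normal = { false, 0 };
	struct model_sm_dev kdump = { true, 0 };

	model_probe_device(&normal);
	model_prepare_attach(&normal);
	assert(normal.context_setups == 1);

	model_probe_device(&kdump);
	assert(kdump.context_setups == 0); /* postponed */
	model_prepare_attach(&kdump);
	assert(kdump.context_setups == 1);
}
```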
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240305013305.204605-4-baolu.lu@linux.intel.com
---
drivers/iommu/intel/pasid.h | 1 +
drivers/iommu/intel/iommu.c | 12 ++++
drivers/iommu/intel/pasid.c | 138 ++++++++++++++++++++++++++++++++++++
3 files changed, 151 insertions(+)
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 42fda97fd851..da9978fef7ac 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,5 +318,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
bool fault_ignore);
void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
struct device *dev, u32 pasid);
+int intel_pasid_setup_sm_context(struct device *dev);
void intel_pasid_teardown_sm_context(struct device *dev);
#endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f74d42d3258f..9b96d36b9d2a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4073,6 +4073,10 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
dmar_domain->agaw--;
}
+ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+ context_copied(iommu, info->bus, info->devfn))
+ return intel_pasid_setup_sm_context(dev);
+
return 0;
}
@@ -4386,11 +4390,19 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
dev_err(dev, "PASID table allocation failed\n");
goto clear_rbtree;
}
+
+ if (!context_copied(iommu, info->bus, info->devfn)) {
+ ret = intel_pasid_setup_sm_context(dev);
+ if (ret)
+ goto free_table;
+ }
}
intel_iommu_debugfs_create_dev(info);
return &iommu->iommu;
+free_table:
+ intel_pasid_free_table(dev);
clear_rbtree:
device_rbtree_remove(info);
free:
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index a51e895d9a17..11f0b856d74c 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -734,3 +734,141 @@ void intel_pasid_teardown_sm_context(struct device *dev)
pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
}
+
+/*
+ * Get the PASID directory size for scalable mode context entry.
+ * Value of X in the PDTS field of a scalable mode context entry
+ * indicates PASID directory with 2^(X + 7) entries.
+ */
+static unsigned long context_get_sm_pds(struct pasid_table *table)
+{
+ unsigned long pds, max_pde;
+
+ max_pde = table->max_pasid >> PASID_PDE_SHIFT;
+ pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
+ if (pds < 7)
+ return 0;
+
+ return pds - 7;
+}
+
+static int context_entry_set_pasid_table(struct context_entry *context,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct pasid_table *table = info->pasid_table;
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long pds;
+
+ context_clear_entry(context);
+
+ pds = context_get_sm_pds(table);
+ context->lo = (u64)virt_to_phys(table->table) | context_pdts(pds);
+ context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
+
+ if (info->ats_supported)
+ context_set_sm_dte(context);
+ if (info->pri_supported)
+ context_set_sm_pre(context);
+ if (info->pasid_supported)
+ context_set_pasid(context);
+
+ context_set_fault_enable(context);
+ context_set_present(context);
+ __iommu_flush_cache(iommu, context, sizeof(*context));
+
+ return 0;
+}
+
+static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct context_entry *context;
+
+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, bus, devfn, true);
+ if (!context) {
+ spin_unlock(&iommu->lock);
+ return -ENOMEM;
+ }
+
+ if (context_present(context) && !context_copied(iommu, bus, devfn)) {
+ spin_unlock(&iommu->lock);
+ return 0;
+ }
+
+ if (context_copied(iommu, bus, devfn)) {
+ context_clear_entry(context);
+ __iommu_flush_cache(iommu, context, sizeof(*context));
+
+ /*
+ * For kdump cases, old valid entries may be cached due to
+ * the in-flight DMA and copied pgtable, but there is no
+ * unmapping behaviour for them, thus we need explicit cache
+ * flushes for all affected domain IDs and PASIDs used in
+ * the copied PASID table. Given that we have no idea about
+ * which domain IDs and PASIDs were used in the copied tables,
+ * upgrade them to global PASID and IOTLB cache invalidation.
+ */
+ iommu->flush.flush_context(iommu, 0,
+ PCI_DEVID(bus, devfn),
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0);
+ iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
+ devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+
+ /*
+ * At this point, the device is supposed to finish reset at
+ * its driver probe stage, so no in-flight DMA will exist,
+ * and we don't need to worry anymore hereafter.
+ */
+ clear_context_copied(iommu, bus, devfn);
+ }
+
+ context_entry_set_pasid_table(context, dev);
+ spin_unlock(&iommu->lock);
+
+ /*
+ * It's a non-present to present mapping. If hardware doesn't cache
+ * non-present entry we don't need to flush the caches. If it does
+ * cache non-present entries, then it does so in the special
+ * domain #0, which we have to flush:
+ */
+ if (cap_caching_mode(iommu->cap)) {
+ iommu->flush.flush_context(iommu, 0,
+ PCI_DEVID(bus, devfn),
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_DSI_FLUSH);
+ }
+
+ return 0;
+}
+
+static int pci_pasid_table_setup(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct device *dev = data;
+
+ if (dev != &pdev->dev)
+ return 0;
+
+ return device_pasid_table_setup(dev, PCI_BUS_NUM(alias), alias & 0xff);
+}
+
+/*
+ * Set the device's PASID table to its context table entry.
+ *
+ * The PASID table is set to the context entries of both device itself
+ * and its alias requester ID for DMA.
+ */
+int intel_pasid_setup_sm_context(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+ if (!dev_is_pci(dev))
+ return device_pasid_table_setup(dev, info->bus, info->devfn);
+
+ return pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_setup, dev);
+}
--
2.34.1
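Editor's note on the PDTS computation in context_get_sm_pds() above: the following is a minimal user-space sketch of the same arithmetic, for illustration only. It assumes PASID_PDE_SHIFT is 6 and MAX_NR_PASID_BITS is 20 (the patch uses both macros without showing their values). first_set_bit() is a hypothetical stand-in for the kernel's find_first_bit(), and pds_to_entries() is a hypothetical helper that is not part of the patch.

```c
/* Assumed values; the patch uses these macros without showing them. */
#define PASID_PDE_SHIFT   6
#define MAX_NR_PASID_BITS 20

/*
 * Hypothetical stand-in for find_first_bit(): index of the lowest set
 * bit, or nbits if no bit below nbits is set.
 */
static unsigned long first_set_bit(unsigned long word, unsigned long nbits)
{
	unsigned long i;

	for (i = 0; i < nbits; i++)
		if (word & (1UL << i))
			return i;
	return nbits;
}

/* Mirrors context_get_sm_pds(): a PDTS value X encodes 2^(X + 7) entries. */
static unsigned long sm_pds(unsigned long max_pasid)
{
	unsigned long max_pde = max_pasid >> PASID_PDE_SHIFT;
	unsigned long pds = first_set_bit(max_pde, MAX_NR_PASID_BITS);

	return pds < 7 ? 0 : pds - 7;
}

/* Hypothetical helper: directory entries implied by a PDTS value. */
static unsigned long pds_to_entries(unsigned long pds)
{
	return 1UL << (pds + 7);
}
```

Under these assumptions, a max_pasid of 1 << 20 gives max_pde = 1 << 14, so the PDTS value is 7 and the directory has 2^14 entries. Any max_pasid whose directory would have fewer than 2^7 entries is clamped to a PDTS value of 0.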
* [PATCH 7/8] iommu/vt-d: Remove scalable mode context entry setup from attach_dev
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
` (5 preceding siblings ...)
2024-03-05 12:21 ` [PATCH 6/8] iommu/vt-d: Setup scalable mode context entry in probe path Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-05 12:21 ` [PATCH 8/8] iommu/vt-d: Remove scalabe mode in domain_context_clear_one() Lu Baolu
2024-03-06 16:36 ` [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Joerg Roedel
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
The scalable mode context entry is now set up in the probe_device path,
eliminating the need to configure it in the attach_dev path. Remove the
redundant code from the attach_dev path to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240305013305.204605-5-baolu.lu@linux.intel.com
---
drivers/iommu/intel/iommu.c | 156 ++++++++++--------------------------
1 file changed, 44 insertions(+), 112 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9b96d36b9d2a..d682eb6ad4d2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1850,34 +1850,17 @@ static void domain_exit(struct dmar_domain *domain)
kfree(domain);
}
-/*
- * Get the PASID directory size for scalable mode context entry.
- * Value of X in the PDTS field of a scalable mode context entry
- * indicates PASID directory with 2^(X + 7) entries.
- */
-static unsigned long context_get_sm_pds(struct pasid_table *table)
-{
- unsigned long pds, max_pde;
-
- max_pde = table->max_pasid >> PASID_PDE_SHIFT;
- pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
- if (pds < 7)
- return 0;
-
- return pds - 7;
-}
-
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
- struct pasid_table *table,
u8 bus, u8 devfn)
{
struct device_domain_info *info =
domain_lookup_dev_info(domain, iommu, bus, devfn);
u16 did = domain_id_iommu(domain, iommu);
int translation = CONTEXT_TT_MULTI_LEVEL;
+ struct dma_pte *pgd = domain->pgd;
struct context_entry *context;
- int ret;
+ int agaw, ret;
if (hw_pass_through && domain_type_is_si(domain))
translation = CONTEXT_TT_PASS_THROUGH;
@@ -1920,65 +1903,37 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
}
context_clear_entry(context);
+ context_set_domain_id(context, did);
- if (sm_supported(iommu)) {
- unsigned long pds;
-
- /* Setup the PASID DIR pointer: */
- pds = context_get_sm_pds(table);
- context->lo = (u64)virt_to_phys(table->table) |
- context_pdts(pds);
-
- /* Setup the RID_PASID field: */
- context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
-
+ if (translation != CONTEXT_TT_PASS_THROUGH) {
/*
- * Setup the Device-TLB enable bit and Page request
- * Enable bit:
+ * Skip top levels of page tables for iommu which has
+ * less agaw than default. Unnecessary for PT mode.
*/
+ for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+ ret = -ENOMEM;
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd))
+ goto out_unlock;
+ }
+
if (info && info->ats_supported)
- context_set_sm_dte(context);
- if (info && info->pri_supported)
- context_set_sm_pre(context);
- if (info && info->pasid_supported)
- context_set_pasid(context);
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, agaw);
} else {
- struct dma_pte *pgd = domain->pgd;
- int agaw;
-
- context_set_domain_id(context, did);
-
- if (translation != CONTEXT_TT_PASS_THROUGH) {
- /*
- * Skip top levels of page tables for iommu which has
- * less agaw than default. Unnecessary for PT mode.
- */
- for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
- ret = -ENOMEM;
- pgd = phys_to_virt(dma_pte_addr(pgd));
- if (!dma_pte_present(pgd))
- goto out_unlock;
- }
-
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
-
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, agaw);
- } else {
- /*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
- */
- context_set_address_width(context, iommu->msagaw);
- }
-
- context_set_translation_type(context, translation);
+ /*
+ * In pass through mode, AW must be programmed to
+ * indicate the largest AGAW value supported by
+ * hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
}
+ context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
if (!ecap_coherent(iommu->ecap))
@@ -2008,43 +1963,29 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
return ret;
}
-struct domain_context_mapping_data {
- struct dmar_domain *domain;
- struct intel_iommu *iommu;
- struct pasid_table *table;
-};
-
static int domain_context_mapping_cb(struct pci_dev *pdev,
u16 alias, void *opaque)
{
- struct domain_context_mapping_data *data = opaque;
+ struct device_domain_info *info = dev_iommu_priv_get(&pdev->dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct dmar_domain *domain = opaque;
- return domain_context_mapping_one(data->domain, data->iommu,
- data->table, PCI_BUS_NUM(alias),
- alias & 0xff);
+ return domain_context_mapping_one(domain, iommu,
+ PCI_BUS_NUM(alias), alias & 0xff);
}
static int
domain_context_mapping(struct dmar_domain *domain, struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct domain_context_mapping_data data;
struct intel_iommu *iommu = info->iommu;
u8 bus = info->bus, devfn = info->devfn;
- struct pasid_table *table;
-
- table = intel_pasid_get_table(dev);
if (!dev_is_pci(dev))
- return domain_context_mapping_one(domain, iommu, table,
- bus, devfn);
-
- data.domain = domain;
- data.iommu = iommu;
- data.table = table;
+ return domain_context_mapping_one(domain, iommu, bus, devfn);
return pci_for_each_dma_alias(to_pci_dev(dev),
- &domain_context_mapping_cb, &data);
+ domain_context_mapping_cb, domain);
}
/* Returns a number of VTD pages, but aligned to MM page size */
@@ -2404,28 +2345,19 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
list_add(&info->link, &domain->devices);
spin_unlock_irqrestore(&domain->lock, flags);
- /* PASID table is mandatory for a PCI device in scalable mode. */
- if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
- /* Setup the PASID entry for requests without PASID: */
- if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu,
- dev, IOMMU_NO_PASID);
- else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- IOMMU_NO_PASID);
- else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, IOMMU_NO_PASID);
- if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
- device_block_translation(dev);
- return ret;
- }
- }
+ if (dev_is_real_dma_subdevice(dev))
+ return 0;
+
+ if (!sm_supported(iommu))
+ ret = domain_context_mapping(domain, dev);
+ else if (hw_pass_through && domain_type_is_si(domain))
+ ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
+ else if (domain->use_first_level)
+ ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID);
+ else
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, IOMMU_NO_PASID);
- ret = domain_context_mapping(domain, dev);
if (ret) {
- dev_err(dev, "Domain context map failed\n");
device_block_translation(dev);
return ret;
}
--
2.34.1
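Editor's note on the reworked dmar_domain_attach_device() above: after this patch, exactly one setup helper runs per attach. The sketch below models only that selection order as a pure function. The enum, the struct and pick_attach_path() are hypothetical names introduced for illustration; they do not exist in the driver, and the sketch calls none of the real helpers.

```c
#include <stdbool.h>

/* Hypothetical labels for the outcomes visible in the diff. */
enum attach_path {
	ATTACH_NONE,           /* real DMA subdevice: return 0 early */
	ATTACH_LEGACY_CONTEXT, /* domain_context_mapping() */
	ATTACH_PASID_PT,       /* intel_pasid_setup_pass_through() */
	ATTACH_PASID_FL,       /* domain_setup_first_level() */
	ATTACH_PASID_SL,       /* intel_pasid_setup_second_level() */
};

/* Hypothetical snapshot of the conditions the patch tests. */
struct attach_inputs {
	bool real_dma_subdevice; /* dev_is_real_dma_subdevice(dev) */
	bool sm_supported;       /* sm_supported(iommu) */
	bool hw_pass_through;    /* hw_pass_through */
	bool domain_is_si;       /* domain_type_is_si(domain) */
	bool use_first_level;    /* domain->use_first_level */
};

/* Same order of checks as the if/else chain added by the patch. */
static enum attach_path pick_attach_path(const struct attach_inputs *in)
{
	if (in->real_dma_subdevice)
		return ATTACH_NONE;
	if (!in->sm_supported)
		return ATTACH_LEGACY_CONTEXT;
	if (in->hw_pass_through && in->domain_is_si)
		return ATTACH_PASID_PT;
	if (in->use_first_level)
		return ATTACH_PASID_FL;
	return ATTACH_PASID_SL;
}
```

The point of the ordering is that the legacy context mapping is reached only when scalable mode is not supported, which is why domain_context_mapping_one() no longer needs a scalable mode branch.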
* [PATCH 8/8] iommu/vt-d: Remove scalabe mode in domain_context_clear_one()
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
` (6 preceding siblings ...)
2024-03-05 12:21 ` [PATCH 7/8] iommu/vt-d: Remove scalable mode context entry setup from attach_dev Lu Baolu
@ 2024-03-05 12:21 ` Lu Baolu
2024-03-06 16:36 ` [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Joerg Roedel
8 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2024-03-05 12:21 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
domain_context_clear_one() only handles the context entry teardown in
legacy mode. Remove the scalable mode check in it to avoid dead code.
Remove the unnecessary NULL check of the iommu pointer as well.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240305013305.204605-6-baolu.lu@linux.intel.com
---
drivers/iommu/intel/iommu.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d682eb6ad4d2..50eb9aed47cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2175,9 +2175,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
struct context_entry *context;
u16 did_old;
- if (!iommu)
- return;
-
spin_lock(&iommu->lock);
context = iommu_context_addr(iommu, bus, devfn, 0);
if (!context) {
@@ -2185,14 +2182,7 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
return;
}
- if (sm_supported(iommu)) {
- if (hw_pass_through && domain_type_is_si(info->domain))
- did_old = FLPT_DEFAULT_DID;
- else
- did_old = domain_id_iommu(info->domain, iommu);
- } else {
- did_old = context_domain_id(context);
- }
+ did_old = context_domain_id(context);
context_clear_entry(context);
__iommu_flush_cache(iommu, context, sizeof(*context));
@@ -2203,9 +2193,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
DMA_CCMD_MASK_NOBIT,
DMA_CCMD_DEVICE_INVL);
- if (sm_supported(iommu))
- qi_flush_pasid_cache(iommu, did_old, QI_PC_ALL_PASIDS, 0);
-
iommu->flush.flush_iotlb(iommu,
did_old,
0,
--
2.34.1
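Editor's note on did_old above: after this patch, the domain ID used for the IOTLB flush is always read back from the legacy context entry. The sketch below shows that kind of field access, assuming the domain ID occupies bits 23:8 of the upper 64-bit word of the entry (the layout behind context_domain_id(), which this patch does not show). struct ctx_entry and both helpers are hypothetical stand-ins, not the driver's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit legacy context entry. */
struct ctx_entry {
	uint64_t lo;
	uint64_t hi;
};

/* Assumed field position: 16-bit domain ID at bits 23:8 of hi. */
#define CTX_DID_SHIFT 8
#define CTX_DID_MASK  0xffffULL

/* Store a domain ID without disturbing the other bits of hi. */
static void ctx_set_domain_id(struct ctx_entry *c, uint16_t did)
{
	c->hi &= ~(CTX_DID_MASK << CTX_DID_SHIFT);
	c->hi |= (uint64_t)did << CTX_DID_SHIFT;
}

/* Read the domain ID back, as the teardown path does before clearing. */
static uint16_t ctx_get_domain_id(const struct ctx_entry *c)
{
	return (uint16_t)((c->hi >> CTX_DID_SHIFT) & CTX_DID_MASK);
}
```

Reading the ID from the entry must happen before context_clear_entry() zeroes it, which matches the order in the patched function.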
* Re: [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next
2024-03-05 12:21 [PATCH 0/8] [PULL REQUEST] iommu/vt-d: Fixes for iommu next Lu Baolu
` (7 preceding siblings ...)
2024-03-05 12:21 ` [PATCH 8/8] iommu/vt-d: Remove scalabe mode in domain_context_clear_one() Lu Baolu
@ 2024-03-06 16:36 ` Joerg Roedel
8 siblings, 0 replies; 10+ messages in thread
From: Joerg Roedel @ 2024-03-06 16:36 UTC (permalink / raw)
To: Lu Baolu; +Cc: Ethan Zhao, Eric Badger, iommu, linux-kernel
On Tue, Mar 05, 2024 at 08:21:13PM +0800, Lu Baolu wrote:
> The following fixes have been queued for iommu next:
>
> - Fix hard lockup on hotplug of an ATS-capable device
> - Fix NULL domain pointer dereference on device release
Applied, thanks Baolu.