From: Matthew Auld <matthew.auld@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
Raag Jadav <raag.jadav@intel.com>
Cc: intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com,
himal.prasad.ghimiray@intel.com, matthew.d.roper@intel.com
Subject: Re: [PATCH v2] drm/xe: Drop all mappings for wedged device
Date: Fri, 27 Mar 2026 10:18:01 +0000
Message-ID: <9099f0ef-87a9-42f6-888f-57bb73f6d6ae@intel.com>
In-Reply-To: <acWi1gL2SI+KLWgP@gsse-cloud1.jf.intel.com>

On 26/03/2026 21:19, Matthew Brost wrote:
> On Thu, Mar 26, 2026 at 06:58:16PM +0530, Raag Jadav wrote:
>> As per uapi documentation[1], the prerequisite for wedged device is to
>> drop all memory mappings. Follow it.
>>
>> [1] Documentation/gpu/drm-uapi.rst
>>
>> v2: Also drop CPU mappings (Matthew Auld)
>>
>> Fixes: 7bc00751f877 ("drm/xe: Use device wedged event")
>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_bo_evict.c | 8 +++++++-
>> drivers/gpu/drm/xe/xe_bo_evict.h | 1 +
>> drivers/gpu/drm/xe/xe_device.c | 5 +++++
>> 3 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
>> index 7661fca7f278..f741cda50b2d 100644
>> --- a/drivers/gpu/drm/xe/xe_bo_evict.c
>> +++ b/drivers/gpu/drm/xe/xe_bo_evict.c
>> @@ -270,7 +270,13 @@ int xe_bo_restore_late(struct xe_device *xe)
>> return ret;
>> }
>>
>> -static void xe_bo_pci_dev_remove_pinned(struct xe_device *xe)
>> +/**
>> + * xe_bo_pci_dev_remove_pinned() - Unmap external bos
>> + * @xe: xe device
>> + *
>> + * Drop dma mappings of all external pinned bos.
>> + */
>> +void xe_bo_pci_dev_remove_pinned(struct xe_device *xe)
>> {
>> struct xe_tile *tile;
>> unsigned int id;
>> diff --git a/drivers/gpu/drm/xe/xe_bo_evict.h b/drivers/gpu/drm/xe/xe_bo_evict.h
>> index e8385cb7f5e9..6ce27e272780 100644
>> --- a/drivers/gpu/drm/xe/xe_bo_evict.h
>> +++ b/drivers/gpu/drm/xe/xe_bo_evict.h
>> @@ -15,6 +15,7 @@ void xe_bo_notifier_unprepare_all_pinned(struct xe_device *xe);
>> int xe_bo_restore_early(struct xe_device *xe);
>> int xe_bo_restore_late(struct xe_device *xe);
>>
>> +void xe_bo_pci_dev_remove_pinned(struct xe_device *xe);
>> void xe_bo_pci_dev_remove_all(struct xe_device *xe);
>>
>> int xe_bo_pinned_init(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index b17d4a878686..4c0097f3aefb 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -1347,6 +1347,11 @@ void xe_device_declare_wedged(struct xe_device *xe)
>> for_each_gt(gt, xe, id)
>> xe_gt_declare_wedged(gt);
>>
>> + /* Drop dma mappings of external bos */
>> + xe_bo_pci_dev_remove_pinned(xe);
>
> Do we even need the part above? unmap_mapping_range() should drop all
> DMA mappings for the device being wedged, right? In other words, the device
> should no longer be able to access system memory or other devices’ memory
> via PCIe P2P. I'm not 100% sure about this, though.

AFAIK unmap_mapping_range() is just for the CPU mmap side. It should
ensure ~everything is refaulted on the next CPU access, so we can point
to a dummy page.
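
As a rough illustration of the zap-then-refault behaviour described
above, here is a small user-space model in C. It is only a sketch: every
name in it (model_device, model_mapping, model_unmap, model_fault) is
hypothetical and none of it is xe, TTM or core mm code. It only shows
that dropping the CPU translation forces the next access through the
fault path, where a wedged device can be resolved to a dummy page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are xe, TTM or mm types. */
struct model_device {
	bool wedged;
	int real_page;	/* stands in for the BO backing memory */
	int dummy_page;	/* stands in for a shared dummy page */
};

struct model_mapping {
	int *pte;	/* NULL means "not mapped, next access faults" */
};

/* Model of zapping a CPU mapping: only the CPU translation goes away. */
void model_unmap(struct model_mapping *map)
{
	assert(map);
	map->pte = NULL;
}

/*
 * Model of a CPU access: an existing translation is reused as-is, a
 * missing one is re-established by the "fault handler", which picks
 * the dummy page once the device is wedged.
 */
int *model_fault(struct model_device *dev, struct model_mapping *map)
{
	assert(dev && map);
	if (!map->pte)
		map->pte = dev->wedged ? &dev->dummy_page : &dev->real_page;
	return map->pte;
}
```

In this model, marking the device wedged is not enough by itself: a
mapping established earlier keeps pointing at the real page until
model_unmap() is called, which is the role unmap_mapping_range() plays
for the CPU side in the patch.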

For the dma mapping side, I'm still not completely sure what the best
approach is. On the one hand, the device is wedged so we should not
really be doing new GPU access? Ioctls are all blocked, and with the
below, CPU access will be redirected to a dummy page. So perhaps doing
nothing for the dma mapping side is OK? If we want to actually remove
all dma mappings for extra safety, I think the closest thing is maybe to
purge all BOs? Similar to what we do for an unplug.

So perhaps xe_bo_pci_dev_remove_all() is better here? Also I guess we
would need:

@@ -349,7 +349,8 @@ static void xe_evict_flags(struct ttm_buffer_object *tbo,
 		return;
 	}

-	if (device_unplugged && !tbo->base.dma_buf) {
+	if ((device_unplugged || xe_device_wedged(xe)) &&
+	    !tbo->base.dma_buf) {
 		*placement = purge_placement;
 		return;
 	}
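
To make the effect of the changed condition in the hunk above easier to
see, here is a minimal user-space sketch in C. The enum, struct and
function names are hypothetical and heavily reduced; the real code works
on struct ttm_buffer_object and struct ttm_placement, and the non-purge
branches of xe_evict_flags() are collapsed into a single "other" value
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced placement choice. */
enum model_placement {
	MODEL_PLACEMENT_OTHER,	/* any non-purge choice, not modelled here */
	MODEL_PLACEMENT_PURGE,	/* drop the backing store instead of moving it */
};

/* Hypothetical, reduced view of the state the condition inspects. */
struct model_bo_state {
	bool device_unplugged;
	bool device_wedged;	/* models xe_device_wedged(xe) */
	bool is_dma_buf;	/* models tbo->base.dma_buf being non-NULL */
};

/* Mirrors only the condition from the hunk: unplugged OR wedged, and not a dma-buf. */
enum model_placement model_evict_placement(const struct model_bo_state *s)
{
	assert(s != NULL);
	if ((s->device_unplugged || s->device_wedged) && !s->is_dma_buf)
		return MODEL_PLACEMENT_PURGE;
	return MODEL_PLACEMENT_OTHER;
}
```

With this condition a wedged device is treated like an unplugged one for
BOs that have no dma_buf attached, while BOs with a dma_buf still fall
through to the other branches, exactly as they did before the change.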
>
> Matt
>
>> + /* Drop all CPU mappings pointing to this device */
>> + unmap_mapping_range(xe->drm.anon_inode->i_mapping, 0, 0, 1);
>> +
>> if (xe_device_wedged(xe)) {
>> /*
>> * XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging
>> --
>> 2.43.0
>>