From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH 4/5] drm/xe: Prevent evicting for page tables
Date: Mon, 29 May 2023 17:11:10 +0200
Message-ID: <30734f17-58d0-bf50-ac83-dc4e7d65550d@linux.intel.com>
In-Reply-To: <5fece445-89eb-0e97-e934-1c896fb27bd7@linux.intel.com>
Hey,
On 2023-05-29 17:02, Thomas Hellström wrote:
>
> On 5/29/23 15:44, Maarten Lankhorst wrote:
>> On 2023-05-26 14:35, Thomas Hellström wrote:
>>> On 5/26/23 14:11, Maarten Lankhorst wrote:
>>>> When creating page tables from xe_exec_ioctl, we may end up freeing
>>>> memory we just validated. To be certain this does not happen, do not
>>>> allow the current reservation to be evicted from the ioctl.
>>>>
>>>> Callchain:
>>>> [ 109.008522] xe_bo_move_notify+0x5c/0xf0 [xe]
>>>> [ 109.008548] xe_bo_move+0x90/0x510 [xe]
>>>> [ 109.008573] ttm_bo_handle_move_mem+0xb7/0x170 [ttm]
>>>> [ 109.008581] ttm_bo_swapout+0x15e/0x360 [ttm]
>>>> [ 109.008586] ttm_device_swapout+0xc2/0x110 [ttm]
>>>> [ 109.008592] ttm_global_swapout+0x47/0xc0 [ttm]
>>>> [ 109.008598] ttm_tt_populate+0x7a/0x130 [ttm]
>>>> [ 109.008603] ttm_bo_handle_move_mem+0x160/0x170 [ttm]
>>>> [ 109.008609] ttm_bo_validate+0xe5/0x1d0 [ttm]
>>>> [ 109.008614] ttm_bo_init_reserved+0xac/0x190 [ttm]
>>>> [ 109.008620] __xe_bo_create_locked+0x153/0x260 [xe]
>>>> [ 109.008645] xe_bo_create_locked_range+0x77/0x360 [xe]
>>>> [ 109.008671] xe_bo_create_pin_map_at+0x33/0x1f0 [xe]
>>>> [ 109.008695] xe_bo_create_pin_map+0x11/0x20 [xe]
>>>> [ 109.008721] xe_pt_create+0x69/0xf0 [xe]
>>>> [ 109.008749] xe_pt_stage_bind_entry+0x208/0x430 [xe]
>>>> [ 109.008776] xe_pt_walk_range+0xe9/0x2a0 [xe]
>>>> [ 109.008802] xe_pt_walk_range+0x223/0x2a0 [xe]
>>>> [ 109.008828] xe_pt_walk_range+0x223/0x2a0 [xe]
>>>> [ 109.008853] __xe_pt_bind_vma+0x28d/0xbd0 [xe]
>>>> [ 109.008878] xe_vm_bind_vma+0xc7/0x2f0 [xe]
>>>> [ 109.008904] xe_vm_rebind+0x72/0x160 [xe]
>>>> [ 109.008930] xe_exec_ioctl+0x22b/0xa70 [xe]
>>>> [ 109.008955] drm_ioctl_kernel+0xb9/0x150 [drm]
>>>> [ 109.008972] drm_ioctl+0x210/0x430 [drm]
>>>> [ 109.008988] __x64_sys_ioctl+0x85/0xb0
>>>> [ 109.008990] do_syscall_64+0x38/0x90
>>>> [ 109.008991] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>
>>>> Original warning:
>>>> [ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe]
>>>> ...
>>>> [ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe]
>>>> [ 5613.316098] Call Trace:
>>>> [ 5613.318595] <TASK>
>>>> [ 5613.320743] xe_exec_ioctl+0x383/0x8a0 [xe]
>>>> [ 5613.325278] ? __is_insn_slot_addr+0x8e/0x110
>>>> [ 5613.329719] ? __is_insn_slot_addr+0x8e/0x110
>>>> [ 5613.334116] ? kernel_text_address+0x75/0xf0
>>>> [ 5613.338429] ? __pfx_stack_trace_consume_entry+0x10/0x10
>>>> [ 5613.343778] ? __kernel_text_address+0x9/0x40
>>>> [ 5613.348181] ? unwind_get_return_address+0x1a/0x30
>>>> [ 5613.353013] ? __pfx_stack_trace_consume_entry+0x10/0x10
>>>> [ 5613.358362] ? arch_stack_walk+0x99/0xf0
>>>> [ 5613.362329] ? rcu_read_lock_sched_held+0xb/0x70
>>>> [ 5613.366996] ? lock_acquire+0x287/0x2f0
>>>> [ 5613.370873] ? rcu_read_lock_sched_held+0xb/0x70
>>>> [ 5613.375530] ? rcu_read_lock_sched_held+0xb/0x70
>>>> [ 5613.380181] ? lock_release+0x225/0x2e0
>>>> [ 5613.384059] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
>>>> [ 5613.389092] drm_ioctl_kernel+0xc0/0x170
>>>> [ 5613.393068] drm_ioctl+0x1b7/0x490
>>>> [ 5613.396519] ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
>>>> [ 5613.401547] ? lock_release+0x225/0x2e0
>>>> [ 5613.405432] __x64_sys_ioctl+0x8a/0xb0
>>>> [ 5613.409232] do_syscall_64+0x37/0x90
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/239
>>> Did you look at passing around the ttm_operation_ctx, or an "allow_res_evict" bool?
>>> In any case would be good to have this fixed asap, so
>>>
>>> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>
>> I considered it, but the original callchain was too long. I don't think there is any use case
>> in which we want to evict from the current context to make room for new page tables for VM_BIND.
>> Anything locked is most likely in use; making room by evicting from the current VM (or its bound
>> extobjs) will likely lead to ENOSPC anyway.
>
> Well, I think the use case where this will cause problems is if we're doing a single VM_BIND on a brand new VRAM BO, and need to evict other VRAM BOs from the same VM to make room.
>
> This will then ofc ENOSPC on the next exec, but if we were to introduce a two-pass validation scheme, where we explicitly move suitable BOs with multiple placement options to TT on the first ENOSPC, we could avoid that...
Allowing same-reservation eviction would let you evict the BO's own VM_BIND page table, leaving no entries to write. :-)
~Maarten
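
For readers unfamiliar with the knob being discussed: struct ttm_operation_ctx carries a resv pointer and an allow_res_evict flag, and TTM consults both when deciding whether a BO may be evicted. The following is a minimal userspace sketch of that decision, not the kernel code. The toy_* types and toy_evict_allowable() are hypothetical names made up for illustration, and the trylock on foreign reservations is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a reservation object (hypothetical, not the kernel's dma_resv). */
struct toy_resv {
    bool locked; /* true while some context holds the lock */
};

/* Toy stand-in for a buffer object; page tables share their VM's reservation. */
struct toy_bo {
    struct toy_resv *resv;
};

/* Models only the two ttm_operation_ctx fields relevant to this thread. */
struct toy_ctx {
    struct toy_resv *resv;  /* reservation the caller already holds */
    bool allow_res_evict;   /* may BOs sharing ctx->resv be evicted? */
};

/*
 * Simplified eviction check (an assumption-laden model of the TTM logic):
 * a BO sharing the caller's reservation is a candidate only when
 * allow_res_evict is set. Any other BO needs a successful trylock,
 * modelled here as "not currently locked".
 */
static bool toy_evict_allowable(const struct toy_bo *bo, const struct toy_ctx *ctx)
{
    if (bo->resv == ctx->resv) {
        assert(bo->resv->locked); /* caller must hold the shared lock */
        return ctx->allow_res_evict;
    }
    return !bo->resv->locked;
}
```

With allow_res_evict cleared, a page-table BO that shares the VM's reservation is never picked, while unrelated unlocked BOs remain candidates. That is the behaviour the patch relies on during xe_exec_ioctl.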