From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/debugfs: add noreclaim annotations
Date: Mon, 13 Dec 2021 13:58:51 +0100
Message-ID: <4469b0ff-076e-86ee-cb2a-fd9f780fc106@linux.intel.com>
In-Reply-To: <20211213125530.3960007-1-matthew.auld@intel.com>
On 12/13/21 13:55, Matthew Auld wrote:
> We have a debugfs hook to directly call into i915_gem_shrink() with the
> fs_reclaim acquire annotations to simulate hitting direct reclaim.
> However we should also annotate this with memalloc_noreclaim, which will
> set PF_MEMALLOC for us on the current context, to ensure we can't
> re-enter direct reclaim (just like "real" direct reclaim does). This is
> an issue now that ttm_bo_validate could potentially be called here,
> which might try to allocate a tiny amount of memory to hold the new
> ttm_resource struct, as per the below splat:
>
> [ 2507.913844] WARNING: possible recursive locking detected
> [ 2507.913848] 5.16.0-rc4+ #5 Tainted: G U
> [ 2507.913853] --------------------------------------------
> [ 2507.913856] gem_exec_captur/1825 is trying to acquire lock:
> [ 2507.913861] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: kmem_cache_alloc_trace+0x30/0x390
> [ 2507.913875]
> but task is already holding lock:
> [ 2507.913879] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915]
> [ 2507.913962]
> other info that might help us debug this:
> [ 2507.913966] Possible unsafe locking scenario:
>
> [ 2507.913970] CPU0
> [ 2507.913973] ----
> [ 2507.913975] lock(fs_reclaim);
> [ 2507.913979] lock(fs_reclaim);
> [ 2507.913983]
>
> *** DEADLOCK ***
>
> [ 2507.913988] May be due to missing lock nesting notation
>
> [ 2507.913992] 4 locks held by gem_exec_captur/1825:
> [ 2507.913997] #0: ffff888101f6e460 (sb_writers#17){..}-{0:0}, at: ksys_write+0xe9/0x1b0
> [ 2507.914009] #1: ffff88812d99e2b8 (&attr->mutex){..}-{3:3}, at: simple_attr_write+0xbb/0x220
> [ 2507.914019] #2: ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915]
> [ 2507.914085] #3: ffff8881b4a11b20 (reservation_ww_class_mutex){..}-{3:3}, at: ww_mutex_trylock+0x43f/0xcb0
> [ 2507.914097]
> stack backtrace:
> [ 2507.914102] CPU: 0 PID: 1825 Comm: gem_exec_captur Tainted: G U 5.16.0-rc4+ #5
> [ 2507.914109] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
> [ 2507.914115] Call Trace:
> [ 2507.914118] <TASK>
> [ 2507.914121] dump_stack_lvl+0x59/0x73
> [ 2507.914128] __lock_acquire.cold+0x227/0x3b0
> [ 2507.914135] ? lockdep_hardirqs_on_prepare+0x410/0x410
> [ 2507.914141] ? __lock_acquire+0x23ca/0x5000
> [ 2507.914147] lock_acquire+0x19c/0x4b0
> [ 2507.914152] ? kmem_cache_alloc_trace+0x30/0x390
> [ 2507.914157] ? lock_release+0x690/0x690
> [ 2507.914163] ? lock_is_held_type+0xe4/0x140
> [ 2507.914170] ? ttm_sys_man_alloc+0x47/0xb0 [ttm]
> [ 2507.914178] fs_reclaim_acquire+0x11a/0x160
> [ 2507.914183] ? kmem_cache_alloc_trace+0x30/0x390
> [ 2507.914188] kmem_cache_alloc_trace+0x30/0x390
> [ 2507.914192] ? lock_release+0x37f/0x690
> [ 2507.914198] ttm_sys_man_alloc+0x47/0xb0 [ttm]
> [ 2507.914206] ttm_bo_pipeline_gutting+0x70/0x440 [ttm]
> [ 2507.914214] ? ttm_mem_io_free+0x150/0x150 [ttm]
> [ 2507.914221] ? lock_is_held_type+0xe4/0x140
> [ 2507.914227] ttm_bo_validate+0x2fb/0x370 [ttm]
> [ 2507.914234] ? lock_acquire+0x19c/0x4b0
> [ 2507.914239] ? ttm_bo_bounce_temp_buffer.constprop.0+0xf0/0xf0 [ttm]
> [ 2507.914246] ? lock_acquire+0x131/0x4b0
> [ 2507.914251] ? lock_is_held_type+0xe4/0x140
> [ 2507.914257] i915_ttm_shrinker_release_pages+0x2bc/0x490 [i915]
> [ 2507.914339] ? i915_ttm_swap_notify+0x130/0x130 [i915]
> [ 2507.914429] ? i915_gem_object_release_mmap_offset+0x32/0x250 [i915]
> [ 2507.914529] i915_gem_shrink+0xb14/0x1290 [i915]
> [ 2507.914616] ? ___i915_gem_object_make_shrinkable+0x3e0/0x3e0 [i915]
> [ 2507.914698] ? _raw_spin_unlock_irqrestore+0x2d/0x60
> [ 2507.914705] ? track_intel_runtime_pm_wakeref+0x180/0x230 [i915]
> [ 2507.914777] i915_gem_shrink_all+0x4b/0x70 [i915]
> [ 2507.914857] i915_drop_caches_set+0x227/0x2c0 [i915]
>
> Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index bafb902269de..359d8ffc6e36 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -666,6 +666,7 @@ static int
>  i915_drop_caches_set(void *data, u64 val)
>  {
>  	struct drm_i915_private *i915 = data;
> +	unsigned int flags;
>  	int ret;
>
>  	DRM_DEBUG("Dropping caches: 0x%08llx [0x%08llx]\n",
> @@ -676,6 +677,7 @@ i915_drop_caches_set(void *data, u64 val)
>  		return ret;
>
>  	fs_reclaim_acquire(GFP_KERNEL);
> +	flags = memalloc_noreclaim_save();
>  	if (val & DROP_BOUND)
>  		i915_gem_shrink(NULL, i915, LONG_MAX, NULL, I915_SHRINK_BOUND);
>
> @@ -684,6 +686,7 @@ i915_drop_caches_set(void *data, u64 val)
>
>  	if (val & DROP_SHRINK_ALL)
>  		i915_gem_shrink_all(i915);
> +	memalloc_noreclaim_restore(flags);
>  	fs_reclaim_release(GFP_KERNEL);
>
>  	if (val & DROP_RCU)
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
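
The save/restore pairing that the patch adds can be modelled in user space. The sketch below is only an illustration of the idea described in the commit message, not kernel code: struct model_task, MODEL_PF_MEMALLOC and every model_* function are hypothetical names invented for this example, and the "reclaim" here is just a counter. It shows that an allocation made while the flag is set skips reclaim, so the recursion that lockdep reported cannot happen, and that restoring the saved value leaves the task flags as they were.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only. The value is a stand-in for the kernel's PF_MEMALLOC. */
#define MODEL_PF_MEMALLOC 0x0800u

/* Hypothetical stand-in for the parts of "current" this example cares about. */
struct model_task {
	unsigned int flags;      /* task flags, only one bit is used here */
	int reclaim_depth;       /* >0 while "inside" the reclaim annotation */
	int recursive_entries;   /* how often reclaim was re-entered */
};

/* Set the flag and return its previous state so that nesting works. */
static unsigned int model_noreclaim_save(struct model_task *t)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC;

	t->flags |= MODEL_PF_MEMALLOC;
	return old;
}

/* Put the flag back to whatever the matching save observed. */
static void model_noreclaim_restore(struct model_task *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC) | old;
}

/* Model of an allocation that would normally be allowed to enter reclaim. */
static void model_alloc(struct model_task *t)
{
	if (t->flags & MODEL_PF_MEMALLOC)
		return;                  /* flag set: reclaim is skipped */
	if (t->reclaim_depth > 0)
		t->recursive_entries++;  /* the recursion lockdep complained about */
}

/* Model of the debugfs hook, with or without the noreclaim annotation. */
static int model_drop_caches(bool annotate)
{
	struct model_task t = { 0 };
	unsigned int saved = 0;

	t.reclaim_depth++;               /* stands in for fs_reclaim_acquire() */
	if (annotate)
		saved = model_noreclaim_save(&t);

	model_alloc(&t);                 /* stands in for the small ttm_resource allocation */

	if (annotate)
		model_noreclaim_restore(&t, saved);
	t.reclaim_depth--;               /* stands in for fs_reclaim_release() */

	assert(t.flags == 0);            /* flags must be back to their initial state */
	return t.recursive_entries;
}
```

In this model, model_drop_caches(false) counts one recursive entry and model_drop_caches(true) counts none. Returning the old flag state from the save helper, rather than unconditionally clearing the bit on restore, is what keeps nested sections correct.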
Thread overview: 11+ messages
2021-12-13 12:55 [Intel-gfx] [PATCH] drm/i915/debugfs: add noreclaim annotations Matthew Auld
2021-12-13 12:55 ` Matthew Auld
2021-12-13 12:58 ` Thomas Hellström [this message]
2021-12-13 12:58 ` Thomas Hellström
2021-12-13 18:15 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for " Patchwork
2021-12-14 9:24 ` Matthew Auld
2021-12-13 21:32 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/debugfs: add noreclaim annotations (rev2) Patchwork
2021-12-14 4:55 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-12-14 9:25 ` Matthew Auld
2021-12-14 17:13 ` Vudum, Lakshminarayana
2021-12-14 17:12 ` [Intel-gfx] ✓ Fi.CI.IGT: success " Patchwork