From: Jani Nikula <jani.nikula@linux.intel.com>
To: Bruce Chang <yu.bruce.chang@intel.com>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
Date: Mon, 18 Nov 2019 17:59:24 +0200 [thread overview]
Message-ID: <875zjhyy3n.fsf@intel.com> (raw)
In-Reply-To: <20191113231104.24208-1-yu.bruce.chang@intel.com>
On Wed, 13 Nov 2019, Bruce Chang <yu.bruce.chang@intel.com> wrote:
> below is the call trace when this issue is hit
>
> <3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
> <3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
> <4> [113.319900] no locks held by debugfs_test/678.
> <3> [113.321002] Preemption disabled at:
> <4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
> <4> [113.327259] Call Trace:
> <4> [113.327871] dump_stack+0x67/0x9b
> <4> [113.328683] ___might_sleep+0x167/0x250
> <4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
> <4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
> <4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
> <4> [113.335951] compress_page+0x7c/0x100 [i915]
> <4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
> <4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
> <4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
> <4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
> <4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
> <4> [113.343243] full_proxy_open+0x139/0x1b0
> <4> [113.344196] ? open_proxy_open+0xc0/0xc0
> <4> [113.345149] do_dentry_open+0x1ce/0x3a0
> <4> [113.346084] path_openat+0x4c9/0xac0
> <4> [113.346967] do_filp_open+0x96/0x110
> <4> [113.347848] ? __alloc_fd+0xe0/0x1f0
> <4> [113.348736] ? do_sys_open+0x1b8/0x250
> <4> [113.349647] do_sys_open+0x1b8/0x250
> <4> [113.350526] do_syscall_64+0x55/0x1c0
> <4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> After the io_mapping_map_atomic_wc/kmap_atomic, the kernel enters atomic context
> but after that, compress_page calls pool_alloc with GFP_KERNEL flag which can
> potentially go to sleep. When the kernel is in atomic context, sleeping is not
> allowed. This is why this bug got triggered.
>
> In order to fix this issue, we either
> 1) not enter into atomic context, i.e., to use non atomic version of
> functions like io_mapping_map_wc/kmap,
> or
> 2) make compress_page run in atomic context.
>
> But it is not a good idea to run slow compression in atomic context, so,
> 1) above is preferred solution which is the implementation of this patch.
>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> Reviewed-by: Brian Welty <brian.welty@intel.com>
> Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1f2f266f26af..7118ecb7f144 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
> compress->wc = i915_gem_object_is_lmem(vma->obj) ||
> drm_mm_node_allocated(&ggtt->error_capture);
>
> ret = -EINVAL;
> if (drm_mm_node_allocated(&ggtt->error_capture)) {
> void __iomem *s;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> I915_CACHE_NONE, 0);
>
> s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else if (i915_gem_object_is_lmem(vma->obj)) {
> struct intel_memory_region *mem = vma->obj->mm.region;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> void __iomem *s;
>
> - s = io_mapping_map_atomic_wc(&mem->iomap, dma);
> + s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> - io_mapping_unmap_atomic(s);
> + io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else {
> struct page *page;
>
> for_each_sgt_page(page, iter, vma->pages) {
> void *s;
>
> drm_clflush_pages(&page, 1);
>
> - s = kmap_atomic(page);
> + s = kmap(page);
> ret = compress_page(compress, s, dst);
> - kunmap_atomic(s);
> + kunmap(s);
>
> drm_clflush_pages(&page, 1);
>
> if (ret)
> break;
> }
> }
>
> if (ret || compress_flush(compress, dst)) {
> while (dst->page_count--)
> pool_free(&compress->pool, dst->pages[dst->page_count]);
> kfree(dst);
> dst = NULL;
> }
> compress_finish(compress);
>
> return dst;
> }
>
> /*
> * Generate a semi-unique error code. The code is not meant to have meaning, The
> * code's only purpose is to try to prevent false duplicated bug reports by
> * grossly estimating a GPU error state.
> *
> * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
For future reference, please just use the default patch context size.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
WARNING: multiple messages have this Message-ID (diff)
From: Jani Nikula <jani.nikula@linux.intel.com>
To: Bruce Chang <yu.bruce.chang@intel.com>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
Date: Mon, 18 Nov 2019 17:59:24 +0200 [thread overview]
Message-ID: <875zjhyy3n.fsf@intel.com> (raw)
Message-ID: <20191118155924.uNdQRPdvoN5uPf12ZiHfjDpDyWshnml4VdlwNMVaYG8@z> (raw)
In-Reply-To: <20191113231104.24208-1-yu.bruce.chang@intel.com>
On Wed, 13 Nov 2019, Bruce Chang <yu.bruce.chang@intel.com> wrote:
> below is the call trace when this issue is hit
>
> <3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
> <3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
> <4> [113.319900] no locks held by debugfs_test/678.
> <3> [113.321002] Preemption disabled at:
> <4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
> <4> [113.327259] Call Trace:
> <4> [113.327871] dump_stack+0x67/0x9b
> <4> [113.328683] ___might_sleep+0x167/0x250
> <4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
> <4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
> <4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
> <4> [113.335951] compress_page+0x7c/0x100 [i915]
> <4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
> <4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
> <4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
> <4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
> <4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
> <4> [113.343243] full_proxy_open+0x139/0x1b0
> <4> [113.344196] ? open_proxy_open+0xc0/0xc0
> <4> [113.345149] do_dentry_open+0x1ce/0x3a0
> <4> [113.346084] path_openat+0x4c9/0xac0
> <4> [113.346967] do_filp_open+0x96/0x110
> <4> [113.347848] ? __alloc_fd+0xe0/0x1f0
> <4> [113.348736] ? do_sys_open+0x1b8/0x250
> <4> [113.349647] do_sys_open+0x1b8/0x250
> <4> [113.350526] do_syscall_64+0x55/0x1c0
> <4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> After the io_mapping_map_atomic_wc/kmap_atomic, the kernel enters atomic context
> but after that, compress_page calls pool_alloc with GFP_KERNEL flag which can
> potentially go to sleep. When the kernel is in atomic context, sleeping is not
> allowed. This is why this bug got triggered.
>
> In order to fix this issue, we either
> 1) not enter into atomic context, i.e., to use non atomic version of
> functions like io_mapping_map_wc/kmap,
> or
> 2) make compress_page run in atomic context.
>
> But it is not a good idea to run slow compression in atomic context, so,
> 1) above is preferred solution which is the implementation of this patch.
>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> Reviewed-by: Brian Welty <brian.welty@intel.com>
> Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1f2f266f26af..7118ecb7f144 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
> compress->wc = i915_gem_object_is_lmem(vma->obj) ||
> drm_mm_node_allocated(&ggtt->error_capture);
>
> ret = -EINVAL;
> if (drm_mm_node_allocated(&ggtt->error_capture)) {
> void __iomem *s;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> I915_CACHE_NONE, 0);
>
> s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else if (i915_gem_object_is_lmem(vma->obj)) {
> struct intel_memory_region *mem = vma->obj->mm.region;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> void __iomem *s;
>
> - s = io_mapping_map_atomic_wc(&mem->iomap, dma);
> + s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> - io_mapping_unmap_atomic(s);
> + io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else {
> struct page *page;
>
> for_each_sgt_page(page, iter, vma->pages) {
> void *s;
>
> drm_clflush_pages(&page, 1);
>
> - s = kmap_atomic(page);
> + s = kmap(page);
> ret = compress_page(compress, s, dst);
> - kunmap_atomic(s);
> + kunmap(s);
>
> drm_clflush_pages(&page, 1);
>
> if (ret)
> break;
> }
> }
>
> if (ret || compress_flush(compress, dst)) {
> while (dst->page_count--)
> pool_free(&compress->pool, dst->pages[dst->page_count]);
> kfree(dst);
> dst = NULL;
> }
> compress_finish(compress);
>
> return dst;
> }
>
> /*
> * Generate a semi-unique error code. The code is not meant to have meaning, The
> * code's only purpose is to try to prevent false duplicated bug reports by
> * grossly estimating a GPU error state.
> *
> * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
For future reference, please just use the default patch context size.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-11-18 15:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-13 23:11 [PATCH] drm/i915: Fix a bug calling sleep function in atomic context Bruce Chang
2019-11-13 23:11 ` [Intel-gfx] " Bruce Chang
2019-11-13 23:20 ` Chris Wilson
2019-11-13 23:20 ` [Intel-gfx] " Chris Wilson
2019-11-18 15:59 ` Jani Nikula [this message]
2019-11-18 15:59 ` Jani Nikula
-- strict thread matches above, loose matches on Subject: below --
2019-11-13 23:25 Bruce Chang
2019-11-13 19:52 Bruce Chang
2019-11-13 20:05 ` Chris Wilson
2019-11-13 0:28 Bruce Chang
2019-11-13 19:32 ` Brian Welty
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875zjhyy3n.fsf@intel.com \
--to=jani.nikula@linux.intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=yu.bruce.chang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.