* [PATCH] drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED
@ 2026-05-04 20:19 Mikhail Gavrilov
2026-05-04 20:57 ` Aurabindo Pillai
2026-05-04 21:01 ` Alex Deucher
0 siblings, 2 replies; 3+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 20:19 UTC (permalink / raw)
To: harry.wentland, sunpeng.li, alexander.deucher, christian.koenig
Cc: siqueira, airlied, simona, ardb, hamza.mahfooz, aurabindo.pillai,
Roman.Li, amd-gfx, dri-devel, linux-kernel, stable,
Mikhail Gavrilov
dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with
DC_FP_START()/DC_FP_END(). On x86 non-RT, DC_FP_START expands into
kernel_fpu_begin() which takes fpregs_lock(), i.e. local_bh_disable().
Allocations done inside this region must therefore not sleep.
The legacy DML1 path through dcn32_full_validate_bw_helper() ->
dcn32_add_phantom_pipes() -> dcn32_enable_phantom_plane() unconditionally
calls dc_state_create_phantom_plane() -> dc_create_plane_state(), which
performs kvzalloc(sizeof(struct dc_plane_state)). On a recent kernel
sizeof(struct dc_plane_state) is 343736 bytes (335 KiB), well above the
PAGE_ALLOC_COSTLY_ORDER threshold, so __kvmalloc_node() takes the vmalloc
path. __get_vm_area_node() then trips its BUG_ON(in_interrupt()) because
SOFTIRQ_DISABLE_OFFSET is set in preempt_count:
kernel BUG at mm/vmalloc.c:3206!
RIP: __get_vm_area_node+0x257/0x2d0
Workqueue: events_unbound commit_work
Call Trace:
__vmalloc_node_range_noprof+0x22b/0x570
__kvmalloc_node_noprof+0x3d0/0xb40
dc_create_plane_state+0x35/0x290 [amdgpu]
dc_state_create_phantom_plane+0x1a/0x120 [amdgpu]
dcn32_enable_phantom_plane+0x101/0x780 [amdgpu]
dcn32_add_phantom_pipes+0x47/0x460 [amdgpu]
dcn32_full_validate_bw_helper.constprop.0+0xa46/0x1d70 [amdgpu]
dcn32_internal_validate_bw+0x49c/0x1600 [amdgpu]
dml1_validate+0x20f/0x800 [amdgpu]
dcn32_validate_bandwidth+0x317/0x540 [amdgpu]
dc_validate_with_context+0xd34/0x1d30 [amdgpu]
dc_commit_streams+0x7ca/0x1810 [amdgpu]
amdgpu_dm_commit_streams+0xfd4/0x1e60 [amdgpu]
amdgpu_dm_atomic_commit_tail+0x29e/0x3520 [amdgpu]
commit_tail+0x204/0x4b0
process_one_work+0x8fd/0x16a0
Per-CPU __preempt_count on the crashing CPU at panic time was 0x202:
SOFTIRQ_DISABLE_OFFSET (0x200) from fpregs_lock() plus two preempt holds
from dc_fpu_begin() and kernel_fpu_begin().
The DML2 paths already wrap their large vzalloc()s in
DC_RUN_WITH_PREEMPTION_ENABLED() to handle this case (see
drivers/gpu/drm/amd/display/dc/dml2_0/dml21/dml21_wrapper.c:26 and
drivers/gpu/drm/amd/display/dc/dml2_0/dml2_wrapper.c:24). Apply the same
guard to the DML1 phantom-plane allocation in dcn32_enable_phantom_plane().
This is a separate class of issue from "drm/amd/display: Fix unsafe uses
of kernel mode FPU" by Ard Biesheuvel, which addressed callers entering
DC FP compilation units without DC_FP_START. The bug fixed here is the
inverse: a sleeping allocator invoked from within an active DC_FP_START
region.
Reproducer (RX 7900 XTX, single 4K HDMI display, DCN 3.2): launch any
workload that produces rapid atomic modeset commits. The most reliable
trigger observed is launching Rise of the Tomb Raider via Proton and
repeatedly pressing the Super key during the level loading screen;
crash occurs within ~4 minutes uptime. Random crashes are also observed
during routine fullscreen toggles (image viewers, chat applications).
Hardware verified clean: memtest86+ 4 passes, stressapptest -W -m 32
4 hours, both pass with 0 errors. KASAN active, no reports under load.
Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for Display Core")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
.../drm/amd/display/dc/resource/dcn32/dcn32_resource.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
index 82f81b586986..3751f7a94a05 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
@@ -92,9 +92,14 @@
#include "dml/dcn32/dcn32_fpu.h"
#include "dc_state_priv.h"
+#include "dc_fpu.h"
#include "dml2_0/dml2_wrapper.h"
+#if !defined(DC_RUN_WITH_PREEMPTION_ENABLED)
+#define DC_RUN_WITH_PREEMPTION_ENABLED(code) code
+#endif
+
#define DC_LOGGER_INIT(logger)
enum dcn32_clk_src_array_id {
@@ -1684,7 +1689,8 @@ static void dcn32_enable_phantom_plane(struct dc *dc,
if (curr_pipe->top_pipe && curr_pipe->top_pipe->plane_state == curr_pipe->plane_state)
phantom_plane = prev_phantom_plane;
else
- phantom_plane = dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state);
+ DC_RUN_WITH_PREEMPTION_ENABLED(phantom_plane =
+ dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state));
if (!phantom_plane)
continue;
--
2.54.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED
2026-05-04 20:19 [PATCH] drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED Mikhail Gavrilov
@ 2026-05-04 20:57 ` Aurabindo Pillai
2026-05-04 21:01 ` Alex Deucher
1 sibling, 0 replies; 3+ messages in thread
From: Aurabindo Pillai @ 2026-05-04 20:57 UTC (permalink / raw)
To: Mikhail Gavrilov, harry.wentland, sunpeng.li, alexander.deucher,
christian.koenig
Cc: siqueira, airlied, simona, ardb, hamza.mahfooz, Roman.Li, amd-gfx,
dri-devel, linux-kernel, stable
On 5/4/26 4:19 PM, Mikhail Gavrilov wrote:
> dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with
> DC_FP_START()/DC_FP_END(). On x86 non-RT, DC_FP_START expands into
> kernel_fpu_begin() which takes fpregs_lock(), i.e. local_bh_disable().
> Allocations done inside this region must therefore not sleep.
>
> The legacy DML1 path through dcn32_full_validate_bw_helper() ->
> dcn32_add_phantom_pipes() -> dcn32_enable_phantom_plane() unconditionally
> calls dc_state_create_phantom_plane() -> dc_create_plane_state(), which
> performs kvzalloc(sizeof(struct dc_plane_state)). On a recent kernel
> sizeof(struct dc_plane_state) is 343736 bytes (335 KiB), well above the
> PAGE_ALLOC_COSTLY_ORDER threshold, so __kvmalloc_node() takes the vmalloc
> path. __get_vm_area_node() then trips its BUG_ON(in_interrupt()) because
> SOFTIRQ_DISABLE_OFFSET is set in preempt_count:
>
> kernel BUG at mm/vmalloc.c:3206!
> RIP: __get_vm_area_node+0x257/0x2d0
> Workqueue: events_unbound commit_work
> Call Trace:
> __vmalloc_node_range_noprof+0x22b/0x570
> __kvmalloc_node_noprof+0x3d0/0xb40
> dc_create_plane_state+0x35/0x290 [amdgpu]
> dc_state_create_phantom_plane+0x1a/0x120 [amdgpu]
> dcn32_enable_phantom_plane+0x101/0x780 [amdgpu]
> dcn32_add_phantom_pipes+0x47/0x460 [amdgpu]
> dcn32_full_validate_bw_helper.constprop.0+0xa46/0x1d70 [amdgpu]
> dcn32_internal_validate_bw+0x49c/0x1600 [amdgpu]
> dml1_validate+0x20f/0x800 [amdgpu]
> dcn32_validate_bandwidth+0x317/0x540 [amdgpu]
> dc_validate_with_context+0xd34/0x1d30 [amdgpu]
> dc_commit_streams+0x7ca/0x1810 [amdgpu]
> amdgpu_dm_commit_streams+0xfd4/0x1e60 [amdgpu]
> amdgpu_dm_atomic_commit_tail+0x29e/0x3520 [amdgpu]
> commit_tail+0x204/0x4b0
> process_one_work+0x8fd/0x16a0
>
> Per-CPU __preempt_count on the crashing CPU at panic time was 0x202:
> SOFTIRQ_DISABLE_OFFSET (0x200) from fpregs_lock() plus two preempt holds
> from dc_fpu_begin() and kernel_fpu_begin().
>
> The DML2 paths already wrap their large vzalloc()s in
> DC_RUN_WITH_PREEMPTION_ENABLED() to handle this case (see
> drivers/gpu/drm/amd/display/dc/dml2_0/dml21/dml21_wrapper.c:26 and
> drivers/gpu/drm/amd/display/dc/dml2_0/dml2_wrapper.c:24). Apply the same
> guard to the DML1 phantom-plane allocation in dcn32_enable_phantom_plane().
>
> This is a separate class of issue from "drm/amd/display: Fix unsafe uses
> of kernel mode FPU" by Ard Biesheuvel, which addressed callers entering
> DC FP compilation units without DC_FP_START. The bug fixed here is the
> inverse: a sleeping allocator invoked from within an active DC_FP_START
> region.
>
> Reproducer (RX 7900 XTX, single 4K HDMI display, DCN 3.2): launch any
> workload that produces rapid atomic modeset commits. The most reliable
> trigger observed is launching Rise of the Tomb Raider via Proton and
> repeatedly pressing the Super key during the level loading screen;
> crash occurs within ~4 minutes uptime. Random crashes are also observed
> during routine fullscreen toggles (image viewers, chat applications).
>
> Hardware verified clean: memtest86+ 4 passes, stressapptest -W -m 32
> 4 hours, both pass with 0 errors. KASAN active, no reports under load.
>
> Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for Display Core")
> Cc: stable@vger.kernel.org # v6.0+
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
> .../drm/amd/display/dc/resource/dcn32/dcn32_resource.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> index 82f81b586986..3751f7a94a05 100644
> --- a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> @@ -92,9 +92,14 @@
> #include "dml/dcn32/dcn32_fpu.h"
>
> #include "dc_state_priv.h"
> +#include "dc_fpu.h"
>
> #include "dml2_0/dml2_wrapper.h"
>
> +#if !defined(DC_RUN_WITH_PREEMPTION_ENABLED)
> +#define DC_RUN_WITH_PREEMPTION_ENABLED(code) code
> +#endif
> +
> #define DC_LOGGER_INIT(logger)
>
> enum dcn32_clk_src_array_id {
> @@ -1684,7 +1689,8 @@ static void dcn32_enable_phantom_plane(struct dc *dc,
> if (curr_pipe->top_pipe && curr_pipe->top_pipe->plane_state == curr_pipe->plane_state)
> phantom_plane = prev_phantom_plane;
> else
> - phantom_plane = dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state);
> + DC_RUN_WITH_PREEMPTION_ENABLED(phantom_plane =
> + dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state));
>
> if (!phantom_plane)
> continue;
Thank you very much! I'll add this to our weekly testing before merging.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED
2026-05-04 20:19 [PATCH] drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED Mikhail Gavrilov
2026-05-04 20:57 ` Aurabindo Pillai
@ 2026-05-04 21:01 ` Alex Deucher
1 sibling, 0 replies; 3+ messages in thread
From: Alex Deucher @ 2026-05-04 21:01 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: harry.wentland, sunpeng.li, alexander.deucher, christian.koenig,
siqueira, airlied, simona, ardb, hamza.mahfooz, aurabindo.pillai,
Roman.Li, amd-gfx, dri-devel, linux-kernel, stable
On Mon, May 4, 2026 at 4:29 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with
> DC_FP_START()/DC_FP_END(). On x86 non-RT, DC_FP_START expands into
> kernel_fpu_begin() which takes fpregs_lock(), i.e. local_bh_disable().
> Allocations done inside this region must therefore not sleep.
>
> The legacy DML1 path through dcn32_full_validate_bw_helper() ->
> dcn32_add_phantom_pipes() -> dcn32_enable_phantom_plane() unconditionally
> calls dc_state_create_phantom_plane() -> dc_create_plane_state(), which
> performs kvzalloc(sizeof(struct dc_plane_state)). On a recent kernel
> sizeof(struct dc_plane_state) is 343736 bytes (335 KiB), well above the
> PAGE_ALLOC_COSTLY_ORDER threshold, so __kvmalloc_node() takes the vmalloc
> path. __get_vm_area_node() then trips its BUG_ON(in_interrupt()) because
> SOFTIRQ_DISABLE_OFFSET is set in preempt_count:
>
> kernel BUG at mm/vmalloc.c:3206!
> RIP: __get_vm_area_node+0x257/0x2d0
> Workqueue: events_unbound commit_work
> Call Trace:
> __vmalloc_node_range_noprof+0x22b/0x570
> __kvmalloc_node_noprof+0x3d0/0xb40
> dc_create_plane_state+0x35/0x290 [amdgpu]
> dc_state_create_phantom_plane+0x1a/0x120 [amdgpu]
> dcn32_enable_phantom_plane+0x101/0x780 [amdgpu]
> dcn32_add_phantom_pipes+0x47/0x460 [amdgpu]
> dcn32_full_validate_bw_helper.constprop.0+0xa46/0x1d70 [amdgpu]
> dcn32_internal_validate_bw+0x49c/0x1600 [amdgpu]
> dml1_validate+0x20f/0x800 [amdgpu]
> dcn32_validate_bandwidth+0x317/0x540 [amdgpu]
> dc_validate_with_context+0xd34/0x1d30 [amdgpu]
> dc_commit_streams+0x7ca/0x1810 [amdgpu]
> amdgpu_dm_commit_streams+0xfd4/0x1e60 [amdgpu]
> amdgpu_dm_atomic_commit_tail+0x29e/0x3520 [amdgpu]
> commit_tail+0x204/0x4b0
> process_one_work+0x8fd/0x16a0
>
> Per-CPU __preempt_count on the crashing CPU at panic time was 0x202:
> SOFTIRQ_DISABLE_OFFSET (0x200) from fpregs_lock() plus two preempt holds
> from dc_fpu_begin() and kernel_fpu_begin().
>
> The DML2 paths already wrap their large vzalloc()s in
> DC_RUN_WITH_PREEMPTION_ENABLED() to handle this case (see
> drivers/gpu/drm/amd/display/dc/dml2_0/dml21/dml21_wrapper.c:26 and
> drivers/gpu/drm/amd/display/dc/dml2_0/dml2_wrapper.c:24). Apply the same
> guard to the DML1 phantom-plane allocation in dcn32_enable_phantom_plane().
>
> This is a separate class of issue from "drm/amd/display: Fix unsafe uses
> of kernel mode FPU" by Ard Biesheuvel, which addressed callers entering
> DC FP compilation units without DC_FP_START. The bug fixed here is the
> inverse: a sleeping allocator invoked from within an active DC_FP_START
> region.
>
> Reproducer (RX 7900 XTX, single 4K HDMI display, DCN 3.2): launch any
> workload that produces rapid atomic modeset commits. The most reliable
> trigger observed is launching Rise of the Tomb Raider via Proton and
> repeatedly pressing the Super key during the level loading screen;
> crash occurs within ~4 minutes uptime. Random crashes are also observed
> during routine fullscreen toggles (image viewers, chat applications).
>
> Hardware verified clean: memtest86+ 4 passes, stressapptest -W -m 32
> 4 hours, both pass with 0 errors. KASAN active, no reports under load.
>
> Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for Display Core")
> Cc: stable@vger.kernel.org # v6.0+
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4470
Alex
> ---
> .../drm/amd/display/dc/resource/dcn32/dcn32_resource.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> index 82f81b586986..3751f7a94a05 100644
> --- a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
> @@ -92,9 +92,14 @@
> #include "dml/dcn32/dcn32_fpu.h"
>
> #include "dc_state_priv.h"
> +#include "dc_fpu.h"
>
> #include "dml2_0/dml2_wrapper.h"
>
> +#if !defined(DC_RUN_WITH_PREEMPTION_ENABLED)
> +#define DC_RUN_WITH_PREEMPTION_ENABLED(code) code
> +#endif
> +
> #define DC_LOGGER_INIT(logger)
>
> enum dcn32_clk_src_array_id {
> @@ -1684,7 +1689,8 @@ static void dcn32_enable_phantom_plane(struct dc *dc,
> if (curr_pipe->top_pipe && curr_pipe->top_pipe->plane_state == curr_pipe->plane_state)
> phantom_plane = prev_phantom_plane;
> else
> - phantom_plane = dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state);
> + DC_RUN_WITH_PREEMPTION_ENABLED(phantom_plane =
> + dc_state_create_phantom_plane(dc, context, curr_pipe->plane_state));
>
> if (!phantom_plane)
> continue;
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-04 21:01 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-04 20:19 [PATCH] drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED Mikhail Gavrilov
2026-05-04 20:57 ` Aurabindo Pillai
2026-05-04 21:01 ` Alex Deucher
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox