Re: [RFC PATCH] drm/amd/display: Pin native scanout to VRAM on large-carveout APUs


From: Harry Wentland <harry.wentland@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Matthew Schwartz" <matthew.schwartz@linux.dev>,
	"Melissa Wen" <mwen@igalia.com>, "Leo Li" <sunpeng.li@amd.com>,
	"Rodrigo Siqueira" <siqueira@igalia.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	natalie.vock@gmx.de
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Pierre-Loup A . Griffais" <pgriffais@valvesoftware.com>
Subject: Re: [RFC PATCH] drm/amd/display: Pin native scanout to VRAM on large-carveout APUs
Date: Wed, 24 Jun 2026 13:55:26 -0400
Message-ID: <faa1c424-2282-4e70-9934-e29f437f8bbb@amd.com>
In-Reply-To: <9b96d6a5-7c3c-4bd5-8785-76c9642bc933@amd.com>



On 2026-06-24 11:52, Christian König wrote:
> On 6/24/26 17:30, Harry Wentland wrote:
>> On 2026-06-16 03:31, Christian König wrote:
>>> On 6/16/26 09:10, Matthew Schwartz wrote:
>>>> Native scanout buffers on APUs are pinned with the VRAM|GTT domain, so
>>>> under VRAM carveout pressure a swapchain can end up split across VRAM and
>>>> GTT. The scanout buffer's memory type then changes from one flip to the
>>>> next, and amdgpu_dm_crtc_mem_type_changed() rejects an async page flip
>>>> across the change. The result is repeated async page flip failures,
>>>> observed as choppy updates under carveout pressure, until the buffers
>>>> reconverge to a single domain.
>>>
>>> That's intentional behavior.
>>>
>>>> Pin native scanout buffers in VRAM only so the swapchain stays in one
>>>> memory domain. Restrict this to APUs whose carveout is larger than
>>
>> Above you mention that under VRAM pressure a swapchain can end up split
>> across VRAM and GTT. Wouldn't restricting the swapchain to VRAM now mean
>> that in those cases you fail to allocate the swapchain entirely?
> 
> Yes, exactly that.
> 
> My educated guess is that the display server then falls back to using a copy instead of a flip and that helps saving memory somehow (e.g. fewer scanout buffers allocated concurrently).
> 
> Would it somehow be possible to get DC to dynamically switch between VRAM and GTT?
> 

DCN can't switch between mapped and unmapped memory. I'm not a memory
management expert but wouldn't GTT be in GART (mapped) and VRAM in
the (unmapped) FB aperture?

From DCHUB HW doc:
"No change from mapped to unmapped or unmapped to mapped is
allowed for immediate flip"

If so, we can't async flip between them.
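To make the failure mode concrete, here is a rough userspace model of the
per-flip check. It is a sketch, not the driver code: it assumes, as above,
that GTT scanout goes through GART (mapped) and VRAM through the FB aperture
(unmapped), and it reduces amdgpu_dm_crtc_mem_type_changed() to a plain
comparison of two placements. enum placement, mem_type_changed() and
count_rejected_async_flips() are hypothetical names, and buffers are assumed
to be presented strictly round-robin.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of where a scanout buffer lives. Assumption: GTT is
 * scanned out through GART (mapped), VRAM through the FB aperture (unmapped).
 */
enum placement { PLACE_VRAM, PLACE_GTT };

/*
 * Simplified stand-in for amdgpu_dm_crtc_mem_type_changed(): an async
 * (immediate) flip is rejected whenever the old and new buffers differ in
 * placement. This is not the kernel implementation.
 */
static bool mem_type_changed(enum placement old_fb, enum placement new_fb)
{
	return old_fb != new_fb;
}

/*
 * Present `presents` frames from a swapchain of `nbufs` buffers in
 * round-robin order and count how many of the (presents - 1) flips would
 * be rejected by the check above.
 */
static int count_rejected_async_flips(const enum placement *bufs, int nbufs,
				      int presents)
{
	int rejected = 0;

	assert(nbufs > 0); /* avoid modulo by zero */

	for (int i = 1; i < presents; i++) {
		enum placement prev = bufs[(i - 1) % nbufs];
		enum placement next = bufs[i % nbufs];

		if (mem_type_changed(prev, next))
			rejected++;
	}
	return rejected;
}
```

With the "Before" mix from the quoted amdgpu_gem_info dump (two GTT buffers,
one VRAM), count_rejected_async_flips() over nine presents returns 5, i.e.
five of eight flips cross the mapped/unmapped boundary; with all three
buffers in VRAM it returns 0.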

Harry

> Regards,
> Christian
> 
>>
>> Harry
>>
>>>> AMDGPU_SG_THRESHOLD, so small-carveout parts keep their existing VRAM|GTT
>>>> placement, and fall back to GTT when the buffer does not fit in VRAM, so
>>>> the flip still succeeds and the swapchain stays in one domain. Imported
>>>> buffers may only be pinnable in GTT, so leave those on the default
>>>> domains.
>>>
>>> The display guys need to take a closer look at that, but it sounds like what we used to have before and that caused problems.
>>>
>>> We somehow need to change the DC stuff to allow switching between VRAM and GTT frame buffers to fully fix this.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev>
>>>> ---
>>>> Hi,
>>>>
>>>> This came up while testing my kernel patch to fix mem_type detection for
>>>> async flips here: https://lore.kernel.org/amd-gfx/20260611154438.571685-1-matthew.schwartz@linux.dev/
>>>>
>>>> I found a new issue where splitting a swapchain between VRAM and GTT
>>>> causes a noticeable stutter in gameplay if gamescope is using direct
>>>> scanout and tearing is enabled while a game is already running.
>>>>
>>>> Once a swapchain is split across the VRAM carveout and GTT, the scanout
>>>> buffer's mem_type changes from one flip to the next, so
>>>> amdgpu_dm_crtc_mem_type_changed() rejects the async flip. Under direct
>>>> scanout with tearing that rejection recurs every time the displayed buffer
>>>> crosses domains, which is what surfaces as the choppiness. 
>>>>
>>>> With this patch, I can enable tearing on top of an already-disabled frame
>>>> limit mid-game and no longer reproduce the choppiness.
>>>>
>>>> amdgpu_gem_info confirms the swapchain converges to a single domain
>>>> instead of splitting across VRAM and GTT.
>>>>
>>>> Before:
>>>> 0x00000f81:      3981312 byte GTT exported as ino:275 NO_CPU_ACCESS CPU_GTT_USWC VRAM_CLEARED VRAM_CONTIGUOUS EXPLICIT_SYNC     write fence:drm_sched gfx_0.0.0 seq 88248 signalled
>>>> 0x00000f82:      3981312 byte GTT exported as ino:276 NO_CPU_ACCESS CPU_GTT_USWC VRAM_CLEARED VRAM_CONTIGUOUS EXPLICIT_SYNC     write fence:drm_sched gfx_0.0.0 seq 88224 signalled
>>>> 0x00000f83:      3981312 byte VRAM VISIBLE pin count 1 exported as ino:277 NO_CPU_ACCESS CPU_GTT_USWC VRAM_CLEARED VRAM_CONTIGUOUS EXPLICIT_SYNC        write fence:drm_sched gfx_0.0.0 seq 88236 signalled
>>>>
>>>> After:
>>>> 0x00000f82:      3981312 byte VRAM VISIBLE pin count 1 exported as ino:548 NO_CPU_ACCESS CPU_GTT_USWC VRAM_CLEARED VRAM_CONTIGUOUS EXPLICIT_SYNC        write fence:drm_sched gfx_0.0.0 seq 822258 signalled
>>>> 0x00000f83:      3981312 byte VRAM VISIBLE exported as ino:549 NO_CPU_ACCESS CPU_GTT_USWC VRAM_CLEARED VRAM_CONTIGUOUS EXPLICIT_SYNC    write fence:drm_sched gfx_0.0.0 seq 822255 signalled
>>>> 0x00000f84:      3981312 byte VRAM VISIBLE exported as ino:550 NO_CPU_ACCESS CPU_GTT_USWC VRAM_CLEARED VRAM_CONTIGUOUS EXPLICIT_SYNC    write fence:drm_sched gfx_0.0.0 seq 822261 signalled
>>>>
>>>> Does this seem like the correct approach to take for fixing the observed
>>>> issue? I wanted to start with an RFC to make sure I didn't overlook
>>>> anything obvious or miss any better methods of fixing this.
>>>>
>>>> Thanks,
>>>> Matt
>>>> ---
>>>>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 29 +++++++++++++++++--
>>>>  1 file changed, 26 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>>>> index 23a9faa2ea89..b99f938e58ec 100644
>>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
>>>> @@ -932,6 +932,7 @@ static int amdgpu_dm_plane_helper_prepare_fb(struct drm_plane *plane,
>>>>  	struct amdgpu_bo *rbo;
>>>>  	struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
>>>>  	uint32_t domain;
>>>> +	bool pin_vram_only;
>>>>  	int r;
>>>>  
>>>>  	if (!new_state->fb) {
>>>> @@ -958,13 +959,35 @@ static int amdgpu_dm_plane_helper_prepare_fb(struct drm_plane *plane,
>>>>  	if (r)
>>>>  		goto error_unlock;
>>>>  
>>>> -	if (plane->type != DRM_PLANE_TYPE_CURSOR)
>>>> -		domain = amdgpu_display_supported_domains(adev, rbo->flags);
>>>> -	else
>>>> +	/*
>>>> +	 * Pin native scanout in VRAM on APUs so a swapchain stays in one
>>>> +	 * memory domain. A VRAM/GTT split changes its mem_type between flips
>>>> +	 * and amdgpu_dm_crtc_mem_type_changed() rejects the async flip. Skip
>>>> +	 * small carveouts that may not fit, and imported buffers.
>>>> +	 */
>>>> +	pin_vram_only = plane->type != DRM_PLANE_TYPE_CURSOR &&
>>>> +			(adev->flags & AMD_IS_APU) &&
>>>> +			!rbo->tbo.base.import_attach &&
>>>> +			adev->gmc.real_vram_size > AMDGPU_SG_THRESHOLD;
>>>> +
>>>> +	if (plane->type == DRM_PLANE_TYPE_CURSOR || pin_vram_only)
>>>>  		domain = AMDGPU_GEM_DOMAIN_VRAM;
>>>> +	else
>>>> +		domain = amdgpu_display_supported_domains(adev, rbo->flags);
>>>>  
>>>>  	rbo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
>>>>  	r = amdgpu_bo_pin(rbo, domain);
>>>> +	if (r == -ENOMEM && pin_vram_only) {
>>>> +		/*
>>>> +		 * VRAM could not fit the buffer. Fall back to GTT where
>>>> +		 * allowed so the swapchain stays in one domain.
>>>> +		 */
>>>> +		domain = amdgpu_display_supported_domains(adev, rbo->flags);
>>>> +		if (domain & AMDGPU_GEM_DOMAIN_GTT) {
>>>> +			domain = AMDGPU_GEM_DOMAIN_GTT;
>>>> +			r = amdgpu_bo_pin(rbo, domain);
>>>> +		}
>>>> +	}
>>>>  	if (unlikely(r != 0)) {
>>>>  		if (r != -ERESTARTSYS)
>>>>  			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
>>>
>>
>
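The pin policy proposed in the patch can be modeled the same way. This is a
minimal sketch assuming a toy allocator: fake_pin() stands in for
amdgpu_bo_pin() and simply prefers VRAM over GTT when both are allowed.
DOMAIN_VRAM, DOMAIN_GTT, struct fake_pin_ctx and pin_scanout() are
hypothetical and do not match the real amdgpu constants or TTM placement
logic; only the control flow follows the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, not the real AMDGPU_GEM_DOMAIN_* constants. */
#define DOMAIN_VRAM 0x1u
#define DOMAIN_GTT  0x2u

struct fake_pin_ctx {
	uint64_t vram_free;	/* bytes still free in the carveout */
	uint64_t bo_size;	/* size of each framebuffer BO */
};

/*
 * Hypothetical stand-in for amdgpu_bo_pin(): use VRAM if allowed and the
 * BO fits, else GTT if allowed, else -ENOMEM. The chosen domain is
 * returned through *placed.
 */
static int fake_pin(struct fake_pin_ctx *ctx, uint32_t domain,
		    uint32_t *placed)
{
	if ((domain & DOMAIN_VRAM) && ctx->bo_size <= ctx->vram_free) {
		ctx->vram_free -= ctx->bo_size;
		*placed = DOMAIN_VRAM;
		return 0;
	}
	if (domain & DOMAIN_GTT) {
		*placed = DOMAIN_GTT;
		return 0;
	}
	return -ENOMEM;
}

/*
 * Mirrors the patch: pin VRAM-only first, and on -ENOMEM retry in GTT
 * when the supported domains include GTT.
 */
static int pin_scanout(struct fake_pin_ctx *ctx, bool pin_vram_only,
		       uint32_t supported, uint32_t *placed)
{
	uint32_t domain = pin_vram_only ? DOMAIN_VRAM : supported;
	int r = fake_pin(ctx, domain, placed);

	if (r == -ENOMEM && pin_vram_only && (supported & DOMAIN_GTT))
		r = fake_pin(ctx, DOMAIN_GTT, placed);
	return r;
}
```

For example, with a carveout that has room for two 3981312-byte buffers,
three pin_scanout(&ctx, true, DOMAIN_VRAM | DOMAIN_GTT, &placed) calls put
the first two buffers in VRAM and the third in GTT. In this simplified model
the GTT fallback avoids a failed pin, but it can still leave a three-buffer
swapchain split across domains.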

Thread overview: 6+ messages
2026-06-16  7:10 [RFC PATCH] drm/amd/display: Pin native scanout to VRAM on large-carveout APUs Matthew Schwartz
2026-06-16  7:21 ` sashiko-bot
2026-06-16  7:31 ` Christian König
2026-06-24 15:30   ` Harry Wentland
2026-06-24 15:52     ` Christian König
2026-06-24 17:55       ` Harry Wentland [this message]