amd-gfx.lists.freedesktop.org archive mirror
* (no subject)
@ 2025-08-20 14:33 Christian König
  2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Christian König @ 2025-08-20 14:33 UTC (permalink / raw)
  To: intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, david, dave.hansen,
	luto, peterz

Hi everyone,

sorry for CCing so many people, but that rabbit hole turned out to be
deeper than originally thought.

TTM always had problems with UC/WC mappings on 32-bit systems, and drivers
often had to resort to hacks like using GFP_DMA32 to get things working
while having no rational explanation for why that helped (see the TTM AGP,
radeon and nouveau driver code for that).

It turned out that the PAT implementation we use on x86 not only enforces
the same caching attributes for pages in the linear kernel mapping, but
also for highmem pages through a separate R/B tree.

That was unexpected and TTM never updated that R/B tree for highmem pages,
so the function pgprot_set_cachemode() just overwrote the caching
attributes drivers passed in to vmf_insert_pfn_prot(), which essentially
caused all kinds of random trouble.

An R/B tree is potentially not a good data structure for holding thousands,
if not millions, of different per-page attributes, so updating it is
probably not the way to solve this issue.
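For a sense of scale, assume 4 KiB base pages and, as the worst case above implies, one tree entry per page. A single GiB of TTM memory then gives the entry count below, together with the standard red-black tree height bound:

```latex
\[
n = \frac{1\,\mathrm{GiB}}{4\,\mathrm{KiB}} = \frac{2^{30}}{2^{12}} = 2^{18} = 262\,144 \text{ entries},
\qquad
h_{\mathrm{RB}} \le 2\log_2(n+1) \approx 36 .
\]
```

So each lookup may walk up to about 36 levels, and 4 GiB already means $2^{20}$, just over a million entries.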

Thomas pointed out that the i915 driver uses apply_page_range()
instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and
just fill in the page tables with whatever the driver thinks is the right
caching attribute.

This patch set implements this approach, and it turns out to be much
*faster* than the old implementation. Together with another change,
mapping 1GiB of memory through TTM on my test system improved by nearly
a factor of 10 (197ms -> 20ms)!

Please review the general idea and/or comment on the patches.

Thanks,
Christian.


* (no subject)
@ 2025-01-08 13:59 Jiang Liu
  2025-01-08 14:10 ` Christian König
  2025-01-08 16:33 ` Re: Mario Limonciello
  0 siblings, 2 replies; 50+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
	sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
	Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
  Cc: Jiang Liu

Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume

While testing suspend/resume functionality with AMD GPUs recently, we
encountered several resource-tracking bugs, such as double buffer frees,
use-after-free and unbalanced IRQ reference counts.

We tried to solve these issues case by case, but found that this may
not be the right approach. For the unbalanced IRQ reference count in
particular, new issues keep appearing once the currently known ones are
fixed. After analyzing the related source code, we found that there may
be some fundamental implementation flaws behind these resource-tracking
issues.

The amdgpu driver has two major state machines that drive the device
management flow: one for IP blocks and the other for RAS blocks.
The hook points defined in struct amd_ip_funcs for device setup/teardown
are symmetric, but the implementation is asymmetric and sometimes even
ambiguous. The two most obvious issues we noticed are:
1) amdgpu_irq_get() is called from .late_init() but amdgpu_irq_put()
   is called from .hw_fini() instead of .early_fini().
2) the way ip_block.status.valid/sw/hw/late_initialized are reset doesn't
   match the way those flags are set.

Things get much more complex once device suspend/resume is taken into
account in addition to device probe/remove. Some issues arise because
many suspend/resume implementations directly reuse the .hw_init/.hw_fini/
.late_init hook points.

So we propose to fix those issues with two refinements to the current
device management state machines.

The first change is to make the IP block state machine and its
associated status flags work in a stack-like way, as below:
Callback        Status Flags
early_init:     valid = true
sw_init:        sw = true
hw_init:        hw = true
late_init:      late_initialized = true
early_fini:     late_initialized = false
hw_fini:        hw = false
sw_fini:        sw = false
late_fini:      valid = false

We do the same for the RAS block state machine, though it is much
simpler.

The second change fine-tunes the overall device management workflow
as below:
1. amdgpu_driver_load_kms()
	amdgpu_device_init()
		amdgpu_device_ip_early_init()
			ip_blocks[i].early_init()
			ip_blocks[i].status.valid = true
		amdgpu_device_ip_init()
			amdgpu_ras_init()
			ip_blocks[i].sw_init()
			ip_blocks[i].status.sw = true
			ip_blocks[i].hw_init()
			ip_blocks[i].status.hw = true
		amdgpu_device_ip_late_init()
			ip_blocks[i].late_init()
			ip_blocks[i].status.late_initialized = true
			amdgpu_ras_late_init()
				ras_blocks[i].ras_late_init()
					amdgpu_ras_feature_enable_on_boot()

2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
	amdgpu_device_suspend()
		amdgpu_ras_early_fini()
			ras_blocks[i].ras_early_fini()
				amdgpu_ras_feature_disable()
		amdgpu_ras_suspend()
			amdgpu_ras_disable_all_features()
+++		ip_blocks[i].early_fini()
+++		ip_blocks[i].status.late_initialized = false
		ip_blocks[i].suspend()

3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
	amdgpu_device_resume()
		amdgpu_device_ip_resume()
			ip_blocks[i].resume()
		amdgpu_device_ip_late_init()
			ip_blocks[i].late_init()
			ip_blocks[i].status.late_initialized = true
			amdgpu_ras_late_init()
				ras_blocks[i].ras_late_init()
					amdgpu_ras_feature_enable_on_boot()
		amdgpu_ras_resume()
			amdgpu_ras_enable_all_features()

4. amdgpu_driver_unload_kms()
	amdgpu_device_fini_hw()
		amdgpu_ras_early_fini()
			ras_blocks[i].ras_early_fini()
+++		ip_blocks[i].early_fini()
+++		ip_blocks[i].status.late_initialized = false
		ip_blocks[i].hw_fini()
		ip_blocks[i].status.hw = false

5. amdgpu_driver_release_kms()
	amdgpu_device_fini_sw()
		amdgpu_device_ip_fini()
			ip_blocks[i].sw_fini()
			ip_blocks[i].status.sw = false
---			ip_blocks[i].status.valid = false
+++			amdgpu_ras_fini()
			ip_blocks[i].late_fini()
+++			ip_blocks[i].status.valid = false
---			ip_blocks[i].status.late_initialized = false
---			amdgpu_ras_fini()

The main changes include:
1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
   Currently only one IP block provides an `early_fini` callback. We
   have added a check of `in_s3` in amdgpu_dm_early_fini() to keep the
   current behavior, so there should be no functional changes.
2) set ip_blocks[i].status.late_initialized to false after calling
   the `early_fini` callback. We have audited all usages of the
   late_initialized flag and found no functional changes.
3) only set ip_blocks[i].status.valid = false after calling the
   `late_fini` callback.
4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.

We then plan to refine each subsystem, such as nbio, asic, gfx, gmc,
ras etc., to follow the new design. So far we have only taken nbio and
asic as examples to show the proposed changes. Once we have confirmed
this is the right way to go, we will handle the remaining subsystems.

This is at an early stage and we are requesting comments; any comments
and suggestions are welcome!
Jiang Liu (13):
  amdgpu: wrong array index to get ip block for PSP
  drm/admgpu: add helper functions to track status for ras manager
  drm/amdgpu: add a flag to track ras debugfs creation status
  drm/amdgpu: free all resources on error recovery path of
    amdgpu_ras_init()
  drm/amdgpu: introduce a flag to track refcount held for features
  drm/amdgpu: enhance amdgpu_ras_block_late_fini()
  drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR
  drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini()
  drm/amdgpu: make IP block state machine works in stack like way
  drm/admgpu: make device state machine work in stack like way
  drm/amdgpu/sdma: improve the way to manage irq reference count
  drm/amdgpu/nbio: improve the way to manage irq reference count
  drm/amdgpu/asic: make ip block operations symmetric by .early_fini()

 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  40 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  37 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c      |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c      |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h      |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       | 144 +++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h       |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  26 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c      |   2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c        |   1 +
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c        |   1 +
 drivers/gpu/drm/amd/amdgpu/nv.c               |  14 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |   8 -
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c      |  23 +--
 drivers/gpu/drm/amd/amdgpu/soc15.c            |  38 ++---
 drivers/gpu/drm/amd/amdgpu/soc21.c            |  35 +++--
 drivers/gpu/drm/amd/amdgpu/soc24.c            |  17 ++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |   3 +
 25 files changed, 326 insertions(+), 118 deletions(-)

-- 
2.43.5


* (no subject)
@ 2022-09-12 12:36 Christian König
  2022-09-13  2:04 ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2022-09-12 12:36 UTC (permalink / raw)
  To: alexander.deucher, amd-gfx

Hey Alex,

I've decided to split this patch set into two because we still can't
figure out where the VCN regressions come from.

Ruijing tested them and confirmed that they don't regress VCN.

Can you and maybe Felix take a look and review them?

Thanks,
Christian.



* (no subject)
@ 2020-07-16 21:22 Mauro Rossi
  2020-07-20  9:00 ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Mauro Rossi @ 2020-07-16 21:22 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Mauro Rossi, harry.wentland

The series adds SI support to AMD DC.

Changelog:

[RFC]
Preliminary proof of concept, with DCE8 headers still used in dce60_resources.c

[PATCH v2]
Rebase on amd-staging-drm-next dated 17-Oct-2018

[PATCH v3]
Add support for DCE6 specific headers,
ad hoc DCE6 macros, functions and fixes,
rebase on current amd-staging-drm-next


Commits [01/27]..[08/27] SI support added in various DC components

[PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6)
[PATCH v3 02/27] drm/amd/display: add asics info for SI parts
[PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b)
[PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2)
[PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6
[PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2)
[PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4)
[PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4)

Commits [09/27]..[24/27] DCE6 specific code adaptions

[PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2)
[PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64
[PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions
[PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros
[PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions
[PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions
[PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions
[PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions
[PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions
[PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions
[PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7)
[PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init
[PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions
[PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock
[PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions
[PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6)


Commits [25/27]..[27/27] SI support final enablements

[PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later
[PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2)
[PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2)


Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


end of thread, other threads:[~2025-08-30 16:16 UTC | newest]

Thread overview: 50+ messages
2025-08-20 14:33 Christian König
2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
2025-08-20 14:33 ` [PATCH 2/3] drm/ttm: reapply increase ttm pre-fault value to PMD size" Christian König
2025-08-20 14:33 ` [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2 Christian König
2025-08-20 15:12   ` Borislav Petkov
2025-08-20 15:23 ` David Hildenbrand
2025-08-21  8:10   ` Re: Christian König
2025-08-25 19:10     ` Re: David Hildenbrand
2025-08-26  8:38       ` Re: Christian König
2025-08-26  8:46         ` Re: David Hildenbrand
2025-08-26  9:00           ` Re: Christian König
2025-08-26  9:17             ` Re: David Hildenbrand
2025-08-26  9:56               ` Re: Christian König
2025-08-26 12:07                 ` Re: David Hildenbrand
2025-08-26 16:09                   ` Re: Christian König
2025-08-27  9:13                     ` [PATCH 0/3] drm/ttm: Michel Dänzer
2025-08-28 21:18                     ` stupid and complicated PAT :) David Hildenbrand
2025-08-28 21:28                       ` David Hildenbrand
2025-08-28 21:32                         ` David Hildenbrand
2025-08-29 10:50                           ` Christian König
2025-08-29 19:52                             ` David Hildenbrand
2025-08-29 19:58                               ` David Hildenbrand
2025-08-26 14:27                 ` Thomas Hellström
2025-08-28 21:01                   ` stupid PAT :) David Hildenbrand
2025-08-26 12:37         ` David Hildenbrand
2025-08-21  9:16   ` your mail Lorenzo Stoakes
2025-08-21  9:30     ` David Hildenbrand
2025-08-21 10:05       ` Lorenzo Stoakes
2025-08-21 10:16         ` David Hildenbrand
2025-08-25 18:35         ` Christian König
2025-08-25 19:20           ` David Hildenbrand
  -- strict thread matches above, loose matches on Subject: below --
2025-01-08 13:59 Jiang Liu
2025-01-08 14:10 ` Christian König
2025-01-08 16:33 ` Re: Mario Limonciello
2025-01-09  5:34   ` Re: Gerry Liu
2025-01-09 17:10     ` Re: Mario Limonciello
2025-01-13  1:19       ` Re: Gerry Liu
2025-01-13 21:59         ` Re: Mario Limonciello
2022-09-12 12:36 Christian König
2022-09-13  2:04 ` Alex Deucher
2020-07-16 21:22 Mauro Rossi
2020-07-20  9:00 ` Christian König
2020-07-20  9:59   ` Re: Mauro Rossi
2020-07-22  2:51     ` Re: Alex Deucher
2020-07-22  7:56       ` Re: Mauro Rossi
2020-07-24 18:31         ` Re: Alex Deucher
2020-07-26 15:31           ` Re: Mauro Rossi
2020-07-27 18:31             ` Re: Alex Deucher
2020-07-27 19:46               ` Re: Mauro Rossi
2020-07-27 19:54                 ` Re: Alex Deucher
     [not found] <20191205030032.GA26925@ray.huang@amd.com>
2019-12-09  1:26 ` Quan, Evan
     [not found] <[PATCH xf86-video-amdgpu 0/3] Add non-desktop and leasing support>
2018-03-03  4:49 ` (unknown), Keith Packard
     [not found]   ` <20180303044931.6902-1-keithp-aN4HjG94KOLQT0dZR+AlfA@public.gmane.org>
2018-03-05 10:02     ` Michel Dänzer
     [not found]       ` <82fc592b-f680-c663-1a0f-7b522ca932d2-otUistvHUpPR7s880joybQ@public.gmane.org>
2018-03-05 16:41         ` Re: Keith Packard
