From: "Michel Dänzer" <michel@daenzer.net>
To: Philip Yang <yangp@amd.com>, Philip Yang <Philip.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>,
Felix Kuehling <Felix.Kuehling@amd.com>,
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 28/29] drm/amdkfd: Refactor migrate init to support partition switch
Date: Fri, 21 Jul 2023 10:55:45 +0200
Message-ID: <d515206e-ab58-a8c4-ef3a-e93fc61ba37d@daenzer.net>
In-Reply-To: <f8c83922-f3d4-34d8-6ae1-3112b52bcdf3@amd.com>
[-- Attachment #1: Type: text/plain, Size: 3138 bytes --]
On 7/20/23 22:48, Philip Yang wrote:
> On 2023-07-20 06:46, Michel Dänzer wrote:
>> On 7/17/23 15:09, Michel Dänzer wrote:
>>> On 5/10/23 23:23, Alex Deucher wrote:
>>>> From: Philip Yang <Philip.Yang@amd.com>
>>>>
>>>> Rename svm_migrate_init to a better name, kgd2kfd_init_zone_device,
>>>> because it sets up the zone device pgmap for page migration, and keep it in
>>>> kfd_migrate.c so it can access the static svm_migrate_pgmap_ops. Call it
>>>> only once in amdgpu_device_ip_init after the adev IP blocks are initialized,
>>>> but before amdgpu_amdkfd_device_init initializes the kfd nodes, which enable
>>>> SVM support based on pgmap.
>>>>
>>>> svm_range_set_max_pages is called by kgd2kfd_device_init every time after
>>>> switching compute partition mode.
>>>>
>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> I bisected a regression to this commit, which broke HW acceleration on this ThinkPad E595 with Picasso APU.
>> Actually, it doesn't seem to break HW acceleration completely. GDM eventually comes up with HW acceleration, but it takes a long time (~30s or so) to start up.
>>
>> Later, the same messages as described in https://gitlab.freedesktop.org/drm/amd/-/issues/2659 appear.
>>
>> Reverting this commit fixes all of the above symptoms.
>>
>>
>> I reproduced all of the above symptoms with amd-staging-drm-next commit 75515acf4b60 ("i2c: nvidia-gpu: Add ACPI property to align with device-tree") as well.
>>
>>
>> For full disclosure, I use these kernel command line arguments:
>>
>> fbcon=font:10x18 drm_kms_helper.drm_fbdev_overalloc=112 amdgpu.noretry=1 amdgpu.mcbp=1
>
> Thanks for the issue report and full disclosure, but I am not able to reproduce this issue with the tip of either the drm-next branch or the amd-staging-drm-next branch on gitlab. The test system has the same device ID and the same BIOS version, and runs Ubuntu 22.04 with the latest linux-firmware-20230625.tar.gz.
FWIW, your system has PCI revision ID 0xC2, while mine has 0xC1.
Also, I'm currently using linux-firmware 20230515. AFAICT there are no relevant changes in 20230625, but I'm attaching the contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info just in case.
> I attached the full dmesg log. Could you help check whether there are other differences, maybe kernel config, gcc version...? It is hard to guess what could cause the basic driver gfx ring IB test timeout.
I suspect the IOMMU page faults logged in my dmesg might be relevant:
amdgpu: Topology: Add APU node [0x15d8:0x1002]
amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x122201800 flags=0x0070]
amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1125fe380 flags=0x0070]
kfd kfd: amdgpu: added device 1002:15d8
There are no such page faults with the commit reverted.
Other than that and the IB test failure messages, our dmesg outputs are mostly identical indeed.
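For anyone who wants to compare dmesg logs for these faults, here is a rough sketch of how the IO_PAGE_FAULT lines quoted above could be pulled out of a log with Python. The function name parse_io_page_faults and the regular expression are my own assumptions, matched only against the two lines shown in this mail; other kernels may format the event differently.

```python
import re

# Assumed pattern, derived only from the two AMD-Vi lines quoted in this mail.
FAULT_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: "
    r"Event logged \[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def parse_io_page_faults(dmesg_text):
    """Return one dict per IO_PAGE_FAULT event found in dmesg_text."""
    faults = []
    for match in FAULT_RE.finditer(dmesg_text):
        faults.append(
            {
                "device": match.group("bdf"),
                "domain": int(match.group("domain"), 16),
                "address": int(match.group("address"), 16),
                "flags": int(match.group("flags"), 16),
            }
        )
    return faults


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 this_script.py
    for fault in parse_io_page_faults(sys.stdin.read()):
        print(f"{fault['device']} address={fault['address']:#x} flags={fault['flags']:#06x}")
```

Running it on the logs from a kernel with and without the commit reverted would make it easy to confirm whether the faults go away, and whether the faulting addresses are stable across boots.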
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
[-- Attachment #2: amdgpu_firmware_info.txt --]
[-- Type: text/plain, Size: 1810 bytes --]
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 53, firmware version: 0x000000a6
PFP feature version: 53, firmware version: 0x000000c2
CE feature version: 53, firmware version: 0x00000050
RLC feature version: 1, firmware version: 0x0000006f
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
RLCP feature version: 0, firmware version: 0x00000000
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 53, firmware version: 0x000001d3
MEC2 feature version: 53, firmware version: 0x000001d3
IMU feature version: 0, firmware version: 0x00000000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x21000090
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700002e
TA DTM feature version: 0x00000000, firmware version: 0x12000012
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x27000005
SMC feature version: 0, program: 0, firmware version: 0x00041e2a (4.30.42)
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x0210d004
DMCU feature version: 0, firmware version: 0x00000001
DMCUB feature version: 0, firmware version: 0x00000000
TOC feature version: 0, firmware version: 0x00000000
MES_KIQ feature version: 0, firmware version: 0x00000000
MES feature version: 0, firmware version: 0x00000000
VBIOS version: 113-PICASSO-114