* Re: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
From: James Alseth @ 2024-10-02 4:14 UTC
To: Christian Koenig
Cc: Xinhui Pan, Alexander Deucher, Amd Gfx, Regressions,
Sigmaepsilon92
Hello,
I have a new laptop with a 7840U and am running into the same problem on kernel 6.10.11 and 6.11.0. My symptoms are slightly different, in that video played through Firefox works for some period before eventually having a GPU crash. This can be anywhere from seconds to 10+ minutes, though I don't think it has ever passed 20min of playback.
Please let me know what info I can provide to help with root cause analysis.
Regards,
James
* RE: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
From: Deucher, Alexander @ 2024-10-02 17:04 UTC
To: James Alseth, Koenig, Christian, Rosca, David
Cc: Pan, Xinhui, Amd Gfx, Regressions, Sigmaepsilon92
[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: James Alseth <james@jalseth.me>
> Sent: Wednesday, October 2, 2024 12:14 AM
> To: Koenig, Christian <Christian.Koenig@amd.com>
> Cc: Pan, Xinhui <Xinhui.Pan@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Amd Gfx <amd-gfx@lists.freedesktop.org>;
> Regressions <regressions@lists.linux.dev>; Sigmaepsilon92
> <sigmaepsilon92@gmail.com>
> Subject: Re: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
>
> Hello,
>
> I have a new laptop with a 7840U and am running into the same problem on kernel
> 6.10.11 and 6.11.0. My symptoms are slightly different, in that video played through
> Firefox works for some period before eventually having a GPU crash. This can be
> anywhere from seconds to 10+ minutes, though I don't think it has ever passed
> 20min of playback.
>
> Please let me know what info I can provide to help with root cause analysis.
+ David
This sounds like a mesa bug which was recently fixed. Can you try a newer version of mesa?
Alex
>
> Regards,
> James
* 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
From: Michael Zimmermann @ 2024-02-15 15:59 UTC
To: stable; +Cc: regressions, Alex Deucher, Christian König, Pan, Xinhui
I have a Framework 13 with a 7840U and started having massive GPU
driver issues a few weeks ago (including system freezes).
Unfortunately the information of when exactly this started to happen
is gone, but it should be somewhere between 6.6.0 and 6.7.4.
I got many different and random dmesg-errors and system behaviors, but
I currently can only reproduce one, so let's focus on that for now.
First some basic info:
I'm on Arch Linux using the `linux` kernel package (currently at 6.7.4).
I have an external monitor connected via a ThinkPad Thunderbolt 4 dock.
I am using amdgpu.sg_display=0 and VRAM sharing is configured to
UMA_GAME_OPTIMIZED in the firmware settings.
If I start playing a YouTube video in Firefox with hardware
acceleration enabled, it stutters until it stops playing after a few
seconds. The kernel log shows the following, repeated many times
for many different addresses:
[ 5641.070540] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32786, for process RDD Process pid 3680 thread firefox-bi:cs0 pid 3852)
[ 5641.070549] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x0000000000020000 from client 18
[ 5641.070553] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00143A51
[ 5641.070556] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
[ 5641.070559] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
[ 5641.070561] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
[ 5641.070563] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 5641.070565] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 5641.070567] amdgpu 0000:c1:00.0: amdgpu: RW: 0x1
Thanks
Michael
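Since the fault repeats for many different addresses, it may help to tally the faulting pages before reporting. What follows is only a minimal Python sketch, assuming the kernel log has been saved as text in the format quoted above. The function name `count_fault_addresses` and the regular expression are illustrative assumptions, not part of any amdgpu tooling.

```python
import re
from collections import Counter

# Hypothetical pattern, written against the log excerpt in this thread.
# \s+ is used between words so that lines re-wrapped by a mail client still match.
FAULT_ADDR_RE = re.compile(
    r"in page starting at\s+address\s+(0x[0-9a-fA-F]+)\s+from client\s+(\d+)"
)


def count_fault_addresses(log_text):
    """Count how often each faulting page address appears in the log text."""
    return Counter(int(m.group(1), 16) for m in FAULT_ADDR_RE.finditer(log_text))


# Two lines copied from the report above.
sample = (
    "[ 5641.070540] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault "
    "(src_id:0 ring:40 vmid:1 pasid:32786, for process RDD Process pid 3680 "
    "thread firefox-bi:cs0 pid 3852)\n"
    "[ 5641.070549] amdgpu 0000:c1:00.0: amdgpu: in page starting at "
    "address 0x0000000000020000 from client 18\n"
)

for address, hits in count_fault_addresses(sample).most_common():
    print(f"{address:#018x}: {hits} fault(s)")
```

Feeding it the full saved log instead of `sample` would show whether the faults cluster on a few pages or are spread out.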
* RE: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
From: Deucher, Alexander @ 2024-02-15 18:27 UTC
To: Michael Zimmermann, stable@vger.kernel.org
Cc: regressions@lists.linux.dev, Koenig, Christian, Pan, Xinhui
[Public]
> -----Original Message-----
> From: Michael Zimmermann <sigmaepsilon92@gmail.com>
> Sent: Thursday, February 15, 2024 11:00 AM
> To: stable@vger.kernel.org
> Cc: regressions@lists.linux.dev; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
> Subject: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
>
> I have a Framework 13 with a 7840U and started having massive GPU driver
> issues a few weeks ago (including system freezes).
> Unfortunately the information of when exactly this started to happen is gone,
> but it should be somewhere between 6.6.0 and 6.7.4.
> I got many different and random dmesg-errors and system behaviors, but I
> currently can only reproduce one, so let's focus on that for now.
>
> First some basic info:
> I'm on Arch Linux using the `linux` kernel package (currently at 6.7.4).
> I have an external monitor connected via a ThinkPad Thunderbolt 4 dock.
> I am using amdgpu.sg_display=0 and VRAM sharing is configured to
> UMA_GAME_OPTIMIZED in the firmware settings.
>
> If I start playing a YouTube video in Firefox with hardware acceleration enabled,
> it stutters until it stops playing after a few seconds. The kernel log shows the
> following, repeated many times for many different addresses:
> [ 5641.070540] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32786, for process RDD Process pid 3680 thread firefox-bi:cs0 pid 3852)
> [ 5641.070549] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x0000000000020000 from client 18
> [ 5641.070553] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00143A51
> [ 5641.070556] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
> [ 5641.070559] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
> [ 5641.070561] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
> [ 5641.070563] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x5
> [ 5641.070565] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
> [ 5641.070567] amdgpu 0000:c1:00.0: amdgpu: RW: 0x1
This is a GPU page fault, i.e., the GPU accessed something that was not mapped into its virtual address space. In this case it's GPU work from Firefox. Did you update mesa? Most often that is the cause of GPU page faults, e.g., a bug in the user mode driver that causes the GPU to read past the end of a buffer. If you could narrow down which components you changed (kernel, mesa, firmware) and which one causes the issue, that would be helpful. If only the kernel has changed, can you bisect?
Thanks,
Alex
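The decoded fields in the report can be reproduced from the raw MMVM_L2_PROTECTION_FAULT_STATUS value. The following is only a minimal Python sketch. The bit positions are an assumption, chosen because they reproduce the values the driver printed for 0x00143A51 in this thread (with the CID field matching the "Faulty UTCL2 client ID" value 0x1d). The authoritative layout lives in the amdgpu register headers and may differ between MMHUB versions. The names `FIELDS` and `decode_fault_status` are hypothetical.

```python
# ASSUMED layout: (shift, width) per field, inferred from the log in this
# thread and not copied from the amdgpu register headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
}


def decode_fault_status(status):
    """Split a raw status word into named fields using the assumed layout."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# Raw value taken from the kernel log quoted in the report.
for name, value in decode_fault_status(0x00143A51).items():
    print(f"{name}: {value:#x}")
```

Under this assumed layout the sketch yields MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x5, MAPPING_ERROR 0x0, CID 0x1d and RW 0x1, which agrees with the lines the driver logged.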
* Re: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
From: Christian König @ 2024-02-16 9:33 UTC
To: Michael Zimmermann; +Cc: regressions, Alex Deucher, Pan, Xinhui, amd-gfx list
Can you bisect where exactly between 6.6.0 and 6.7.4 the problems started?
Thanks,
Christian.
Am 15.02.24 um 16:59 schrieb Michael Zimmermann:
> I have a Framework 13 with a 7840U and started having massive GPU
> driver issues a few weeks ago (including system freezes).
> Unfortunately the information of when exactly this started to happen
> is gone, but it should be somewhere between 6.6.0 and 6.7.4.
> I got many different and random dmesg-errors and system behaviors, but
> I currently can only reproduce one, so let's focus on that for now.
>
> First some basic info:
> I'm on Arch Linux using the `linux` kernel package (currently at 6.7.4).
> I have an external monitor connected via a ThinkPad Thunderbolt 4 dock.
> I am using amdgpu.sg_display=0 and VRAM sharing is configured to
> UMA_GAME_OPTIMIZED in the firmware settings.
>
> If I start playing a YouTube video in Firefox with hardware
> acceleration enabled, it stutters until it stops playing after a few
> seconds. The kernel log shows the following, repeated many times
> for many different addresses:
> [ 5641.070540] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32786, for process RDD Process pid 3680 thread firefox-bi:cs0 pid 3852)
> [ 5641.070549] amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x0000000000020000 from client 18
> [ 5641.070553] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00143A51
> [ 5641.070556] amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
> [ 5641.070559] amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
> [ 5641.070561] amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
> [ 5641.070563] amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x5
> [ 5641.070565] amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
> [ 5641.070567] amdgpu 0000:c1:00.0: amdgpu: RW: 0x1
>
> Thanks
> Michael