* Kernel Oops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected].
@ 2019-09-01 15:44 Przemek Socha
  2019-09-01 17:40 ` Alex Deucher
  0 siblings, 1 reply; 3+ messages in thread
From: Przemek Socha @ 2019-09-01 15:44 UTC (permalink / raw)
  To: amd-gfx list, Hawking Zhang, Alex Deucher


Hello everyone,

After today's sync with the amd-staging-drm-next repo, my machine was hit by a
kernel Oops. Maybe my google-fu is weak, but I could not find any fix on
patchwork for this, either already applied or planned.

The machine is a Lenovo netbook with an A6-6310 APU, R4 graphics (CIK).

I have bisected it, and here are the results:


1.  dmesg output from pstore after kernel panic:

<6>[   13.133880] [drm] amdgpu kernel modesetting enabled.
<6>[   13.133923] amdgpu 0000:00:01.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
<6>[   13.133927] amdgpu 0000:00:01.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf07fffff
<6>[   13.133930] amdgpu 0000:00:01.0: remove_conflicting_pci_framebuffers: bar 5: 0xf0c00000 -> 0xf0c3ffff
<7>[   13.133933] checking generic (e0000000 420000) vs hw (e0000000 10000000)
<6>[   13.133935] fb0: switching to amdgpudrmfb from EFI VGA
<6>[   13.133999] Console: switching to colour dummy device 80x25
<6>[   13.136463] [drm] initializing kernel modesetting (MULLINS 0x1002:0x9851 0x17AA:0x3801 0x00).
<6>[   13.136826] [drm] register mmio base: 0xF0C00000
<6>[   13.136827] [drm] register mmio size: 262144
<6>[   13.136837] [drm] add ip block number 0 <cik_common>
<6>[   13.136839] [drm] add ip block number 1 <gmc_v7_0>
<6>[   13.136840] [drm] add ip block number 2 <cik_ih>
<6>[   13.136842] [drm] add ip block number 3 <gfx_v7_0>
<6>[   13.136844] [drm] add ip block number 4 <cik_sdma>
<6>[   13.136845] [drm] add ip block number 5 <kv_dpm>
<6>[   13.136847] [drm] add ip block number 6 <dm>
<6>[   13.136849] [drm] add ip block number 7 <uvd_v4_2>
<6>[   13.136850] [drm] add ip block number 8 <vce_v2_0>
<6>[   13.136857] amdgpu 0000:00:01.0: kfd not supported on this ASIC
<6>[   13.136916] ATOM BIOS: BR45787.ts5
<6>[   13.137031] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
<6>[   13.137042] amdgpu 0000:00:01.0: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
<6>[   13.137046] amdgpu 0000:00:01.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
<6>[   13.137056] [drm] Detected VRAM RAM=1024M, BAR=1024M
<6>[   13.137057] [drm] RAM width 64bits UNKNOWN
<6>[   13.138102] sdhci: Secure Digital Host Controller Interface driver
<6>[   13.138105] sdhci: Copyright(c) Pierre Ossman
<6>[   13.138741] [TTM] Zone  kernel: Available graphics memory: 3541568 KiB
<6>[   13.138744] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
<6>[   13.138745] [TTM] Initializing pool allocator
<6>[   13.138754] [TTM] Initializing DMA pool allocator
<6>[   13.138882] [drm] amdgpu: 1024M of VRAM memory ready
<6>[   13.138891] [drm] amdgpu: 3072M of GTT memory ready.
<6>[   13.138932] [drm] GART: num cpu pages 262144, num gpu pages 262144
<6>[   13.138970] [drm] PCIE GART of 1024M enabled (table at 0x000000F400401000).
<6>[   13.176861] [drm] Internal thermal controller without fan control
<6>[   13.176865] [drm] amdgpu: dpm initialized
<6>[   13.176872] [drm] Found UVD firmware Version: 1.64 Family ID: 9
<6>[   13.178133] sdhci-pci 0000:00:14.7: SDHCI controller found [1022:7813] (rev 1)
<6>[   13.180552] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
<6>[   13.186202] kvm: Nested Virtualization enabled
<6>[   13.186205] kvm: Nested Paging enabled
<6>[   13.191378] mmc0: SDHCI controller on PCI [0000:00:14.7] using ADMA
<3>[   13.196258] [drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
<4>[   13.196308] [drm] Unsupported Connector type:5!
<6>[   13.213496] [drm] Display Core initialized with v3.2.48!
<6>[   13.221850] [drm] SADs count is: -2, don't need to read it
<6>[   13.230392] ath: phy0: WB335 2-ANT card detected
<6>[   13.230395] ath: phy0: Set BT/WLAN RX diversity capability
<6>[   13.247472] ath: phy0: Enable LNA combining
<6>[   13.248570] ath: phy0: ASPM enabled: 0x43
<7>[   13.248574] ath: EEPROM regdomain: 0x6a
<7>[   13.248575] ath: EEPROM indicates we should expect a direct regpair map
<7>[   13.248579] ath: Country alpha2 being used: 00
<7>[   13.248580] ath: Regpair used: 0x6a
<7>[   13.261552] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
<6>[   13.261857] ieee80211 phy0: Atheros AR9565 Rev:1 mem=0xffffa9f1c0400000, irq=43
<6>[   13.296215] ath9k 0000:01:00.0 wlp1s0: renamed from wlan0
<6>[   13.304323] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
<6>[   13.304325] [drm] Driver supports precise vblank timestamp query.
<6>[   13.321092] [drm] UVD initialized successfully.
<6>[   13.373473] usb 1-1: new high-speed USB device number 2 using ehci-pci
<6>[   13.386794] usb 4-1: new high-speed USB device number 2 using ehci-pci
<6>[   13.442287] [drm] VCE initialized successfully.
<1>[   13.444174] BUG: kernel NULL pointer dereference, address: 00000000000000a8
<1>[   13.444191] #PF: supervisor read access in kernel mode
<1>[   13.444197] #PF: error_code(0x0000) - not-present page
<6>[   13.444202] PGD 0 P4D 0
<4>[   13.444210] Oops: 0000 [#1] PREEMPT SMP
<4>[   13.444218] CPU: 1 PID: 3311 Comm: laptop_mode Not tainted 5.2.0-rc1+ #94
<4>[   13.444224] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
<4>[   13.444392] RIP: 0010:amdgpu_irq_handler+0x28/0x78 [amdgpu]
<4>[   13.444401] Code: 00 00 41 54 55 53 48 8b 6e 28 48 89 f3 48 89 ef 48 8d b5 88 5f 00 00 e8 0e 0a 00 00 41 89 c4 ff c8 74 3e 48 8b 85 d0 70 00 00 <48> 8b 90 a8 00 00 00 48 85 d2 74 0f 48 89 ef e8 1c 75 f7 cb 48 8b
<4>[   13.444414] RSP: 0000:ffffa9f1c00ecf00 EFLAGS: 00010012
<4>[   13.444420] RAX: 0000000000000000 RBX: ffff947b96a700b0 RCX: 0000000000000018
<4>[   13.444427] RDX: 00000000008e7d30 RSI: 001a351391f4b553 RDI: ffffffff8ca17720
<4>[   13.444433] RBP: ffff947b8fb80000 R08: ffffffff8c6077e0 R09: ffff947b97ba4af8
<4>[   13.444440] R10: ffff947b969cd2b8 R11: ffff947b969cd2a8 R12: 0000000000000001
<4>[   13.444446] R13: 0000000000000000 R14: ffffa9f1c00ecf64 R15: 0000000000000000
<4>[   13.444453] FS:  00007f40b84aa740(0000) GS:ffff947b97a80000(0000) knlGS:0000000000000000
<4>[   13.444461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   13.444466] CR2: 00000000000000a8 CR3: 000000020e627000 CR4: 00000000000406e0
<4>[   13.444472] Call Trace:
<4>[   13.444481]  <IRQ>
<4>[   13.444492]  __handle_irq_event_percpu+0x3d/0x1a0
<4>[   13.444501]  handle_irq_event_percpu+0x2c/0x78
<4>[   13.444508]  handle_irq_event+0x2f/0x4c
<4>[   13.444515]  handle_edge_irq+0x95/0x1c0
<4>[   13.444523]  handle_irq+0x17/0x20
<4>[   13.444531]  do_IRQ+0x4a/0xe0
<4>[   13.444539]  common_interrupt+0xf/0xf
<4>[   13.444545]  </IRQ>
<4>[   13.444550] RIP: 0033:0x56277e2c6830
<4>[   13.444556] Code: 68 7d 00 00 00 e9 10 f8 ff ff ff 25 22 52 0d 00 68 7e 00 00 00 e9 00 f8 ff ff ff 25 1a 52 0d 00 68 7f 00 00 00 e9 f0 f7 ff ff <ff> 25 12 52 0d 00 68 80 00 00 00 e9 e0 f7 ff ff ff 25 0a 52 0d 00
<4>[   13.444568] RSP: 002b:00007ffddc457328 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
<4>[   13.444576] RAX: 0000562780370044 RBX: 0000000000000056 RCX: 0000000000000045
<4>[   13.444582] RDX: 0000000000000001 RSI: 000056278037f670 RDI: 00005627804390c0
<4>[   13.444588] RBP: 0000562780443150 R08: 0000000000000000 R09: 0000000000003cff
<4>[   13.444595] R10: 0000000000100000 R11: 0000000000000098 R12: 00005627804390c0
<4>[   13.444601] R13: 0000562780440044 R14: 0000562780457a00 R15: 00000000000000bc
<4>[   13.444609] Modules linked in: ath9k ath9k_common ath9k_hw kvm_amd sdhci_pci iosf_mbi mac80211 cqhci kvm sdhci irqbypass crc32_pclmul ghash_clmulni_intel serio_raw mmc_core ath amdgpu(+) cfg80211 gpu_sched mfd_core ttm xhci_pci ehci_pci xhci_hcd sp5100_tco ehci_hcd
<4>[   13.444645] CR2: 00000000000000a8
<4>[   13.444654] ---[ end trace cd97c823583992aa ]---
<6>[   13.446294] [drm] fb mappable at 0xA07ED000
<6>[   13.446305] [drm] vram apper at 0xA0000000
<6>[   13.446310] [drm] size 5767168
<6>[   13.446315] [drm] fb depth is 24
<6>[   13.446319] [drm]    pitch is 5632
<6>[   13.446480] fbcon: amdgpudrmfb (fb0) is primary device
<4>[   13.486378] hpet1: lost 1 rtc interrupts
<4>[   13.531123] hpet1: lost 1 rtc interrupts
<4>[   13.572920] hpet1: lost 1 rtc interrupts
<6>[   13.573579] usb 1-1: New USB device found, idVendor=0438, idProduct=7900, bcdDevice= 0.18
<6>[   13.573583] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
<6>[   13.573689] usb 4-1: New USB device found, idVendor=0438, idProduct=7900, bcdDevice= 0.18
<6>[   13.573692] usb 4-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
<6>[   13.573994] hub 4-1:1.0: USB hub found
<6>[   13.574107] hub 1-1:1.0: USB hub found
<6>[   13.574120] hub 4-1:1.0: 4 ports detected
<6>[   13.574182] hub 1-1:1.0: 4 ports detected
<4>[   13.612313] hpet1: lost 1 rtc interrupts
<4>[   13.651976] hpet1: lost 1 rtc interrupts
<4>[   13.690645] hpet1: lost 1 rtc interrupts
<4>[   13.732985] hpet1: lost 1 rtc interrupts
<4>[   13.773804] hpet1: lost 1 rtc interrupts
<4>[   13.815399] hpet1: lost 1 rtc interrupts
<4>[   13.857053] hpet1: lost 1 rtc interrupts
<6>[   13.943198] usb 4-1.2: new high-speed USB device number 3 using ehci-pci
<6>[   13.943227] usb 1-1.3: new high-speed USB device number 3 using ehci-pci
<4>[   14.017533] RIP: 0010:amdgpu_irq_handler+0x28/0x78 [amdgpu]
<4>[   14.017538] Code: 00 00 41 54 55 53 48 8b 6e 28 48 89 f3 48 89 ef 48 8d b5 88 5f 00 00 e8 0e 0a 00 00 41 89 c4 ff c8 74 3e 48 8b 85 d0 70 00 00 <48> 8b 90 a8 00 00 00 48 85 d2 74 0f 48 89 ef e8 1c 75 f7 cb 48 8b
<4>[   14.017540] RSP: 0000:ffffa9f1c00ecf00 EFLAGS: 00010012
<4>[   14.017544] RAX: 0000000000000000 RBX: ffff947b96a700b0 RCX: 0000000000000018
<4>[   14.017546] RDX: 00000000008e7d30 RSI: 001a351391f4b553 RDI: ffffffff8ca17720
<4>[   14.017547] RBP: ffff947b8fb80000 R08: ffffffff8c6077e0 R09: ffff947b97ba4af8
<4>[   14.017549] R10: ffff947b969cd2b8 R11: ffff947b969cd2a8 R12: 0000000000000001
<4>[   14.017551] R13: 0000000000000000 R14: ffffa9f1c00ecf64 R15: 0000000000000000
<4>[   14.017553] FS:  00007f40b84aa740(0000) GS:ffff947b97a80000(0000) knlGS:0000000000000000
<4>[   14.017555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   14.017557] CR2: 00000000000000a8 CR3: 000000020e627000 CR4: 00000000000406e0
<0>[   14.017559] Kernel panic - not syncing: Fatal exception in interrupt
<0>[   14.017575] Kernel Offset: 0xa800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)


2. full git bisect log:


git bisect start
# good: [f1f7ad1b3b98a22229e71d51a1b983049e8bae6b] drm/amd/display: fix calc_pll_max_vco_construct
git bisect good f1f7ad1b3b98a22229e71d51a1b983049e8bae6b
# bad: [3913cc8cdcf3e27d5ffd31b70779f189e61e6c71] drm/amdgpu: Move null pointer dereference check
git bisect bad 3913cc8cdcf3e27d5ffd31b70779f189e61e6c71
# good: [f7ffd234bc4acc41612fd6aac83408a1aceffceb] drm/amd/display: Add hubp block for Renoir (v2)
git bisect good f7ffd234bc4acc41612fd6aac83408a1aceffceb
# good: [0460fba0adac1c0e6211ec5308cfb58941cf26b8] drm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recover
git bisect good 0460fba0adac1c0e6211ec5308cfb58941cf26b8
# bad: [62c64055ab6d618b1afb28dd4b119cfc1e5d59cb] drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2)
git bisect bad 62c64055ab6d618b1afb28dd4b119cfc1e5d59cb
# good: [1b64dd1871d952c3f999aac8176ba2afbd5ff661] drm/amdgpu: add nbif v7_4 irq source header for vega20
git bisect good 1b64dd1871d952c3f999aac8176ba2afbd5ff661
# good: [82e6cc2843fc844e5164c0618e6ec133f405a25f] drm/amdgpu: add ras_controller and err_event_athub interrupt support
git bisect good 82e6cc2843fc844e5164c0618e6ec133f405a25f
# bad: [598de6e65a1c1cbd36decb09d190071c99f100f8] drm/amdgpu: add helper function to do common ras_late_init/fini (v3)
git bisect bad 598de6e65a1c1cbd36decb09d190071c99f100f8
# bad: [ab2d6f7463d1f6eaf0529c163754feadc353469b] drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status
git bisect bad ab2d6f7463d1f6eaf0529c163754feadc353469b
# first bad commit: [ab2d6f7463d1f6eaf0529c163754feadc353469b] drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status



commit ab2d6f7463d1f6eaf0529c163754feadc353469b
Author: Hawking Zhang <Hawking.Zhang@amd.com>
Date:   Wed Jun 5 14:40:57 2019 +0800

    drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status
    
    For the hardware that can not enable BIF ring for IH cookies for both
    ras_controller_irq and err_event_athub_irq, the driver has to poll the
    status register in irq handling and ack the hardware properly when there
    is interrupt triggered
    
    Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

Any help is appreciated.

Thanks,
Przemek.


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Kernel Oops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected].
  2019-09-01 15:44 Kernel Oops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected] Przemek Socha
@ 2019-09-01 17:40 ` Alex Deucher
       [not found]   ` <CADnq5_OMdehS65YE3R5HcVstS20z1brnB37JidJQYM4Ck5isCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Alex Deucher @ 2019-09-01 17:40 UTC (permalink / raw)
  To: soprwa@gmail.com; +Cc: Alex Deucher, amd-gfx list, Hawking Zhang

On Sun, Sep 1, 2019 at 1:09 PM Przemek Socha <soprwa@gmail.com> wrote:
> Any help is appreciated.

This patch should fix it:
https://patchwork.freedesktop.org/patch/328558/

Alex

* Re: Kernel Oops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected].
       [not found]   ` <CADnq5_OMdehS65YE3R5HcVstS20z1brnB37JidJQYM4Ck5isCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-01 18:43     ` Przemek Socha
  0 siblings, 0 replies; 3+ messages in thread
From: Przemek Socha @ 2019-09-01 18:43 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, amd-gfx list, Hawking Zhang


On Sunday, 1 September 2019 19:40:25 CEST you wrote:
> On Sun, Sep 1, 2019 at 1:09 PM Przemek Socha <soprwa@gmail.com> wrote:
> > Hello everyone,
> > 
> > after today sync with amd-staging-drm-next repo my machine was hit by
> > Ooops
> > bug.
> > Maybe my google-foo is weak, but I could not find any fix on patchwork for
> > this that will/was implemented or planned.
> > 
> > Machine is a Lenovo netbook with a6-6310 APU, R4 (CIK).
> > 
> > I have done bisection and here are the results:
> > 
> > 
> > 1.  dmesg output from pstore after kernel panic:
> > 
> > <6>[   13.133880] [drm] amdgpu kernel modesetting enabled.
> > <6>[   13.133923] amdgpu 0000:00:01.0:
> > remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
> > <6>[   13.133927] amdgpu 0000:00:01.0:
> > remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf07fffff
> > <6>[   13.133930] amdgpu 0000:00:01.0:
> > remove_conflicting_pci_framebuffers: bar 5: 0xf0c00000 -> 0xf0c3ffff
> > <7>[   13.133933] checking generic (e0000000 420000) vs hw (e0000000
> > 10000000) <6>[   13.133935] fb0: switching to amdgpudrmfb from EFI VGA
> > <6>[   13.133999] Console: switching to colour dummy device 80x25
> > <6>[   13.136463] [drm] initializing kernel modesetting (MULLINS
> > 0x1002:0x9851 0x17AA:0x3801 0x00).
> > <6>[   13.136826] [drm] register mmio base: 0xF0C00000
> > <6>[   13.136827] [drm] register mmio size: 262144
> > <6>[   13.136837] [drm] add ip block number 0 <cik_common>
> > <6>[   13.136839] [drm] add ip block number 1 <gmc_v7_0>
> > <6>[   13.136840] [drm] add ip block number 2 <cik_ih>
> > <6>[   13.136842] [drm] add ip block number 3 <gfx_v7_0>
> > <6>[   13.136844] [drm] add ip block number 4 <cik_sdma>
> > <6>[   13.136845] [drm] add ip block number 5 <kv_dpm>
> > <6>[   13.136847] [drm] add ip block number 6 <dm>
> > <6>[   13.136849] [drm] add ip block number 7 <uvd_v4_2>
> > <6>[   13.136850] [drm] add ip block number 8 <vce_v2_0>
> > <6>[   13.136857] amdgpu 0000:00:01.0: kfd not supported on this ASIC
> > <6>[   13.136916] ATOM BIOS: BR45787.ts5
> > <6>[   13.137031] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
> > fragment size is 9-bit
> > <6>[   13.137042] amdgpu 0000:00:01.0: VRAM: 1024M 0x000000F400000000 -
> > 0x000000F43FFFFFFF (1024M used)
> > <6>[   13.137046] amdgpu 0000:00:01.0: GART: 1024M 0x000000FF00000000 -
> > 0x000000FF3FFFFFFF
> > <6>[   13.137056] [drm] Detected VRAM RAM=1024M, BAR=1024M
> > <6>[   13.137057] [drm] RAM width 64bits UNKNOWN
> > <6>[   13.138102] sdhci: Secure Digital Host Controller Interface driver
> > <6>[   13.138105] sdhci: Copyright(c) Pierre Ossman
> > <6>[   13.138741] [TTM] Zone  kernel: Available graphics memory: 3541568
> > KiB <6>[   13.138744] [TTM] Zone   dma32: Available graphics memory:
> > 2097152 KiB <6>[   13.138745] [TTM] Initializing pool allocator
> > <6>[   13.138754] [TTM] Initializing DMA pool allocator
> > <6>[   13.138882] [drm] amdgpu: 1024M of VRAM memory ready
> > <6>[   13.138891] [drm] amdgpu: 3072M of GTT memory ready.
> > <6>[   13.138932] [drm] GART: num cpu pages 262144, num gpu pages 262144
> > <6>[   13.138970] [drm] PCIE GART of 1024M enabled (table at
> > 0x000000F400401000).
> > <6>[   13.176861] [drm] Internal thermal controller without fan control
> > <6>[   13.176865] [drm] amdgpu: dpm initialized
> > <6>[   13.176872] [drm] Found UVD firmware Version: 1.64 Family ID: 9
> > <6>[   13.178133] sdhci-pci 0000:00:14.7: SDHCI controller found
> > [1022:7813] (rev 1)
> > <6>[   13.180552] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
> > <6>[   13.186202] kvm: Nested Virtualization enabled
> > <6>[   13.186205] kvm: Nested Paging enabled
> > <6>[   13.191378] mmc0: SDHCI controller on PCI [0000:00:14.7] using ADMA
> > <3>[   13.196258] [drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB:
> > invalid powerlevel state: 0!
> > <4>[   13.196308] [drm] Unsupported Connector type:5!
> > <6>[   13.213496] [drm] Display Core initialized with v3.2.48!
> > <6>[   13.221850] [drm] SADs count is: -2, don't need to read it
> > <6>[   13.230392] ath: phy0: WB335 2-ANT card detected
> > <6>[   13.230395] ath: phy0: Set BT/WLAN RX diversity capability
> > <6>[   13.247472] ath: phy0: Enable LNA combining
> > <6>[   13.248570] ath: phy0: ASPM enabled: 0x43
> > <7>[   13.248574] ath: EEPROM regdomain: 0x6a
> > <7>[   13.248575] ath: EEPROM indicates we should expect a direct regpair map
> > <7>[   13.248579] ath: Country alpha2 being used: 00
> > <7>[   13.248580] ath: Regpair used: 0x6a
> > <7>[   13.261552] ieee80211 phy0: Selected rate control algorithm
> > 'minstrel_ht'
> > <6>[   13.261857] ieee80211 phy0: Atheros AR9565 Rev:1
> > mem=0xffffa9f1c0400000, irq=43
> > <6>[   13.296215] ath9k 0000:01:00.0 wlp1s0: renamed from wlan0
> > <6>[   13.304323] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > <6>[   13.304325] [drm] Driver supports precise vblank timestamp query.
> > <6>[   13.321092] [drm] UVD initialized successfully.
> > <6>[   13.373473] usb 1-1: new high-speed USB device number 2 using ehci-pci
> > <6>[   13.386794] usb 4-1: new high-speed USB device number 2 using ehci-pci
> > <6>[   13.442287] [drm] VCE initialized successfully.
> > <1>[   13.444174] BUG: kernel NULL pointer dereference, address:
> > 00000000000000a8
> > <1>[   13.444191] #PF: supervisor read access in kernel mode
> > <1>[   13.444197] #PF: error_code(0x0000) - not-present page
> > <6>[   13.444202] PGD 0 P4D 0
> > <4>[   13.444210] Oops: 0000 [#1] PREEMPT SMP
> > <4>[   13.444218] CPU: 1 PID: 3311 Comm: laptop_mode Not tainted 5.2.0-rc1+ #94
> > <4>[   13.444224] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS
> > A2CN45WW(V2.13) 08/04/2016
> > <4>[   13.444392] RIP: 0010:amdgpu_irq_handler+0x28/0x78 [amdgpu]
> > <4>[   13.444401] Code: 00 00 41 54 55 53 48 8b 6e 28 48 89 f3 48 89 ef 48
> > 8d b5 88 5f 00 00 e8 0e 0a 00 00 41 89 c4 ff c8 74 3e 48 8b 85 d0 70 00
> > 00 <48> 8b 90 a8 00 00 00 48 85 d2 74 0f 48 89 ef e8 1c 75 f7 cb 48 8b
> > <4>[   13.444414] RSP: 0000:ffffa9f1c00ecf00 EFLAGS: 00010012
> > <4>[   13.444420] RAX: 0000000000000000 RBX: ffff947b96a700b0 RCX:
> > 0000000000000018
> > <4>[   13.444427] RDX: 00000000008e7d30 RSI: 001a351391f4b553 RDI:
> > ffffffff8ca17720
> > <4>[   13.444433] RBP: ffff947b8fb80000 R08: ffffffff8c6077e0 R09: ffff947b97ba4af8
> > <4>[   13.444440] R10: ffff947b969cd2b8 R11: ffff947b969cd2a8 R12: 0000000000000001
> > <4>[   13.444446] R13: 0000000000000000 R14: ffffa9f1c00ecf64 R15:
> > 0000000000000000
> > <4>[   13.444453] FS:  00007f40b84aa740(0000) GS:ffff947b97a80000(0000)
> > knlGS: 0000000000000000
> > <4>[   13.444461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[   13.444466] CR2: 00000000000000a8 CR3: 000000020e627000 CR4:
> > 00000000000406e0
> > <4>[   13.444472] Call Trace:
> > <4>[   13.444481]  <IRQ>
> > <4>[   13.444492]  __handle_irq_event_percpu+0x3d/0x1a0
> > <4>[   13.444501]  handle_irq_event_percpu+0x2c/0x78
> > <4>[   13.444508]  handle_irq_event+0x2f/0x4c
> > <4>[   13.444515]  handle_edge_irq+0x95/0x1c0
> > <4>[   13.444523]  handle_irq+0x17/0x20
> > <4>[   13.444531]  do_IRQ+0x4a/0xe0
> > <4>[   13.444539]  common_interrupt+0xf/0xf
> > <4>[   13.444545]  </IRQ>
> > <4>[   13.444550] RIP: 0033:0x56277e2c6830
> > <4>[   13.444556] Code: 68 7d 00 00 00 e9 10 f8 ff ff ff 25 22 52 0d 00 68
> > 7e 00 00 00 e9 00 f8 ff ff ff 25 1a 52 0d 00 68 7f 00 00 00 e9 f0 f7 ff
> > ff <ff> 25 12 52 0d 00 68 80 00 00 00 e9 e0 f7 ff ff ff 25 0a 52 0d 00
> > <4>[   13.444568] RSP: 002b:00007ffddc457328 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffffda
> > <4>[   13.444576] RAX: 0000562780370044 RBX: 0000000000000056 RCX:
> > 0000000000000045
> > <4>[   13.444582] RDX: 0000000000000001 RSI: 000056278037f670 RDI:
> > 00005627804390c0
> > <4>[   13.444588] RBP: 0000562780443150 R08: 0000000000000000 R09:
> > 0000000000003cff
> > <4>[   13.444595] R10: 0000000000100000 R11: 0000000000000098 R12:
> > 00005627804390c0
> > <4>[   13.444601] R13: 0000562780440044 R14: 0000562780457a00 R15:
> > 00000000000000bc
> > <4>[   13.444609] Modules linked in: ath9k ath9k_common ath9k_hw kvm_amd
> > sdhci_pci iosf_mbi mac80211 cqhci kvm sdhci irqbypass crc32_pclmul
> > ghash_clmulni_intel serio_raw mmc_core ath amdgpu(+) cfg80211 gpu_sched
> > mfd_core ttm xhci_pci ehci_pci xhci_hcd sp5100_tco ehci_hcd
> > <4>[   13.444645] CR2: 00000000000000a8
> > <4>[   13.444654] ---[ end trace cd97c823583992aa ]---
> > <6>[   13.446294] [drm] fb mappable at 0xA07ED000
> > <6>[   13.446305] [drm] vram apper at 0xA0000000
> > <6>[   13.446310] [drm] size 5767168
> > <6>[   13.446315] [drm] fb depth is 24
> > <6>[   13.446319] [drm]    pitch is 5632
> > <6>[   13.446480] fbcon: amdgpudrmfb (fb0) is primary device
> > <4>[   13.486378] hpet1: lost 1 rtc interrupts
> > <4>[   13.531123] hpet1: lost 1 rtc interrupts
> > <4>[   13.572920] hpet1: lost 1 rtc interrupts
> > <6>[   13.573579] usb 1-1: New USB device found, idVendor=0438,
> > idProduct=7900, bcdDevice= 0.18
> > <6>[   13.573583] usb 1-1: New USB device strings: Mfr=0, Product=0,
> > SerialNumber=0
> > <6>[   13.573689] usb 4-1: New USB device found, idVendor=0438,
> > idProduct=7900, bcdDevice= 0.18
> > <6>[   13.573692] usb 4-1: New USB device strings: Mfr=0, Product=0,
> > SerialNumber=0
> > <6>[   13.573994] hub 4-1:1.0: USB hub found
> > <6>[   13.574107] hub 1-1:1.0: USB hub found
> > <6>[   13.574120] hub 4-1:1.0: 4 ports detected
> > <6>[   13.574182] hub 1-1:1.0: 4 ports detected
> > <4>[   13.612313] hpet1: lost 1 rtc interrupts
> > <4>[   13.651976] hpet1: lost 1 rtc interrupts
> > <4>[   13.690645] hpet1: lost 1 rtc interrupts
> > <4>[   13.732985] hpet1: lost 1 rtc interrupts
> > <4>[   13.773804] hpet1: lost 1 rtc interrupts
> > <4>[   13.815399] hpet1: lost 1 rtc interrupts
> > <4>[   13.857053] hpet1: lost 1 rtc interrupts
> > <6>[   13.943198] usb 4-1.2: new high-speed USB device number 3 using ehci-pci
> > <6>[   13.943227] usb 1-1.3: new high-speed USB device number 3 using ehci-pci
> > <4>[   14.017533] RIP: 0010:amdgpu_irq_handler+0x28/0x78 [amdgpu]
> > <4>[   14.017538] Code: 00 00 41 54 55 53 48 8b 6e 28 48 89 f3 48 89 ef 48
> > 8d b5 88 5f 00 00 e8 0e 0a 00 00 41 89 c4 ff c8 74 3e 48 8b 85 d0 70 00
> > 00 <48> 8b 90 a8 00 00 00 48 85 d2 74 0f 48 89 ef e8 1c 75 f7 cb 48 8b
> > <4>[   14.017540] RSP: 0000:ffffa9f1c00ecf00 EFLAGS: 00010012
> > <4>[   14.017544] RAX: 0000000000000000 RBX: ffff947b96a700b0 RCX:
> > 0000000000000018
> > <4>[   14.017546] RDX: 00000000008e7d30 RSI: 001a351391f4b553 RDI:
> > ffffffff8ca17720
> > <4>[   14.017547] RBP: ffff947b8fb80000 R08: ffffffff8c6077e0 R09: ffff947b97ba4af8
> > <4>[   14.017549] R10: ffff947b969cd2b8 R11: ffff947b969cd2a8 R12: 0000000000000001
> > <4>[   14.017551] R13: 0000000000000000 R14: ffffa9f1c00ecf64 R15:
> > 0000000000000000
> > <4>[   14.017553] FS:  00007f40b84aa740(0000) GS:ffff947b97a80000(0000)
> > knlGS: 0000000000000000
> > <4>[   14.017555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[   14.017557] CR2: 00000000000000a8 CR3: 000000020e627000 CR4:
> > 00000000000406e0
> > <0>[   14.017559] Kernel panic - not syncing: Fatal exception in interrupt
> > <0>[   14.017575] Kernel Offset: 0xa800000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > 
> > 
> > 2. full git bisect log:
> > 
> > 
> > git bisect start
> > # good: [f1f7ad1b3b98a22229e71d51a1b983049e8bae6b] drm/amd/display: fix calc_pll_max_vco_construct
> > git bisect good f1f7ad1b3b98a22229e71d51a1b983049e8bae6b
> > # bad: [3913cc8cdcf3e27d5ffd31b70779f189e61e6c71] drm/amdgpu: Move null pointer dereference check
> > git bisect bad 3913cc8cdcf3e27d5ffd31b70779f189e61e6c71
> > # good: [f7ffd234bc4acc41612fd6aac83408a1aceffceb] drm/amd/display: Add hubp block for Renoir (v2)
> > git bisect good f7ffd234bc4acc41612fd6aac83408a1aceffceb
> > # good: [0460fba0adac1c0e6211ec5308cfb58941cf26b8] drm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recover
> > git bisect good 0460fba0adac1c0e6211ec5308cfb58941cf26b8
> > # bad: [62c64055ab6d618b1afb28dd4b119cfc1e5d59cb] drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2)
> > git bisect bad 62c64055ab6d618b1afb28dd4b119cfc1e5d59cb
> > # good: [1b64dd1871d952c3f999aac8176ba2afbd5ff661] drm/amdgpu: add nbif v7_4 irq source header for vega20
> > git bisect good 1b64dd1871d952c3f999aac8176ba2afbd5ff661
> > # good: [82e6cc2843fc844e5164c0618e6ec133f405a25f] drm/amdgpu: add ras_controller and err_event_athub interrupt support
> > git bisect good 82e6cc2843fc844e5164c0618e6ec133f405a25f
> > # bad: [598de6e65a1c1cbd36decb09d190071c99f100f8] drm/amdgpu: add helper function to do common ras_late_init/fini (v3)
> > git bisect bad 598de6e65a1c1cbd36decb09d190071c99f100f8
> > # bad: [ab2d6f7463d1f6eaf0529c163754feadc353469b] drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status
> > git bisect bad ab2d6f7463d1f6eaf0529c163754feadc353469b
> > # first bad commit: [ab2d6f7463d1f6eaf0529c163754feadc353469b] drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status
> > 
> > 
> > 
> > commit ab2d6f7463d1f6eaf0529c163754feadc353469b
> > Author: Hawking Zhang <Hawking.Zhang-5C7GfCeVMHo@public.gmane.org>
> > Date:   Wed Jun 5 14:40:57 2019 +0800
> > 
> >     drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status
> >     
> >     For the hardware that can not enable BIF ring for IH cookies for both
> >     ras_controller_irq and err_event_athub_irq, the driver has to poll the
> >     status register in irq handling and ack the hardware properly when there
> >     is interrupt triggered
> >     
> >     Signed-off-by: Hawking Zhang <Hawking.Zhang-5C7GfCeVMHo@public.gmane.org>
> >     Reviewed-by: Alex Deucher <alexander.deucher-5C7GfCeVMHo@public.gmane.org>
> > 
> > Any help is appreciated.
> 
> This patch should fix it:
> https://patchwork.freedesktop.org/patch/328558/
> 
> Alex
Thanks.
Works like a charm.
Przemek.
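
A small sketch of how the register dump in the quoted oops lines up with the reported fault address. It takes the `Code:` bytes, RAX and CR2 exactly as they appear in the log, and hand-decodes only the one instruction form at the `<48>` marker (`48 8b 90` followed by a 32-bit displacement, i.e. a 64-bit load from `disp32(%rax)` into `%rdx`). The helper names are made up for illustration, and this is not a general x86 decoder.

```python
# Values copied from the oops in the quoted dmesg output.
CODE_LINE = (
    "00 00 41 54 55 53 48 8b 6e 28 48 89 f3 48 89 ef 48 "
    "8d b5 88 5f 00 00 e8 0e 0a 00 00 41 89 c4 ff c8 74 3e 48 8b 85 d0 70 00 "
    "00 <48> 8b 90 a8 00 00 00 48 85 d2 74 0f 48 89 ef e8 1c 75 f7 cb 48 8b"
)
RAX = 0x0000000000000000
CR2 = 0x00000000000000A8


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx>-marked faulting instruction."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_load_from_rax(insn):
    """Hand-decode only `48 8b 90 <disp32>`; return the displacement."""
    if insn[:3] != bytes([0x48, 0x8B, 0x90]):
        raise ValueError("not the instruction form this sketch understands")
    return int.from_bytes(insn[3:7], "little")


insn = faulting_bytes(CODE_LINE)
disp = decode_load_from_rax(insn)
fault_addr = RAX + disp

print(f"displacement: {disp:#x}")
print(f"RAX + displacement: {fault_addr:#x}")
print(f"matches CR2: {fault_addr == CR2}")
```

With RAX at zero, the computed address equals CR2, which is consistent with a NULL pointer being dereferenced at offset 0xa8 inside amdgpu_irq_handler. Which structure member that pointer corresponds to cannot be read off the log alone; the linked patch is the authoritative fix.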


end of thread, other threads:[~2019-09-01 18:43 UTC | newest]

Thread overview: 3+ messages
2019-09-01 15:44 Kenel Ooops with: "drm/amdgpu: poll ras_controller_irq and err_event_athub_irq status" [bisected] Przemek Socha
2019-09-01 17:40 ` Alex Deucher
     [not found]   ` <CADnq5_OMdehS65YE3R5HcVstS20z1brnB37JidJQYM4Ck5isCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-01 18:43     ` Przemek Socha
