From: "José Pekkarinen" <koalinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Xiangliang.Yu-5C7GfCeVMHo@public.gmane.org
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Topaz mistakenly reported as vf
Date: Sun, 17 Dec 2017 21:20:49 +0200
Message-ID: <2405754.qoBN6c2ta5@bee>

	Hi,

	I hit an issue where a Topaz discrete GPU seems to be reported as a 
virtual function when my laptop is running on battery. I received the 
following backtrace:

Dec 17 11:17:28 bee kernel: [   31.976810] kernel BUG at drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c:310!
Dec 17 11:17:28 bee kernel: [   31.976815] invalid opcode: 0000 [#1] SMP
Dec 17 11:17:28 bee kernel: [   31.976831] Modules linked in: vfio_pci vfio_virqfd udl loop bfq arc4 iwlmvm mac80211 kvmgt vfio_mdev amdgpu(+) mdev vfio_iommu_type1 vfio i915 uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 intel_powerclamp videobuf2_core coretemp videodev kvm_intel kvm i2c_algo_bit rtsx_pci_sdmmc drm_kms_helper joydev mmc_core media mousedev rtsx_pci_ms btusb btrtl btbcm memstick ttm drm wmi_bmof hci_uart btintel bluetooth iwlwifi snd_hda_intel snd_hda_codec cfg80211 irqbypass crc32c_intel ghash_clmulni_intel intel_cstate snd_hwdep intel_uncore snd_hda_core psmouse intel_rapl_perf rtsx_pci snd_pcm efi_pstore evdev ideapad_laptop ac input_leds serio_raw efivars sparse_keymap intel_lpss_acpi battery thermal ecdh_generic wmi fan syscopyarea snd_timer sysfillrect snd rfkill intel_lpss
Dec 17 11:17:28 bee kernel: [   31.977023]  video sysimgblt tpm_crb soundcore button mfd_core i2c_hid i2c_i801 fb_sys_fops backlight acpi_pad efivarfs unix dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time dm_round_robin dm_queue_length dm_multipath dm_log_userspace cn dm_flakey dm_delay xts aesni_intel crypto_simd cryptd glue_helper aes_x86_64 cbc sha256_generic scsi_transport_iscsi r8169 mii fuse xfs nfs lockd grace sunrpc fscache ext4 mbcache jbd2 multipath linear raid10 raid1 raid0 dm_raid raid456 md_mod async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod dax hid_generic usbhid xhci_pci xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common scsi_transport_fc sr_mod cdrom sg sd_mod ata_piix
Dec 17 11:17:28 bee kernel: [   31.977223]  ahci libahci sata_sx4 pata_oldpiix
Dec 17 11:17:28 bee kernel: [   31.977239] CPU: 0 PID: 3698 Comm: udevd Not tainted 4.14.5 #10
Dec 17 11:17:28 bee kernel: [   31.977255] Hardware name: LENOVO 80UV/Lenovo ideapad 510S-14IKB, BIOS 2SCN21WW(V2.01) 12/20/2016
Dec 17 11:17:28 bee kernel: [   31.977278] task: ffff880358b54280 task.stack: ffffc900014dc000
Dec 17 11:17:28 bee kernel: [   31.977323] RIP: 0010:xgpu_vi_init_golden_registers+0x56/0xa0 [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.977341] RSP: 0018:ffffc900014dfa08 EFLAGS: 00010293
Dec 17 11:17:28 bee kernel: [   31.977356] RAX: 000000000000000a RBX: ffff880340040000 RCX: 0000000000000000
Dec 17 11:17:28 bee kernel: [   31.977375] RDX: ffff880358b54280 RSI: 0000000000000100 RDI: ffff880340040000
Dec 17 11:17:28 bee kernel: [   31.977394] RBP: ffffc900014dfa10 R08: ffff88033c6dd198 R09: 0000000000000000
Dec 17 11:17:28 bee kernel: [   31.977413] R10: ffff880352c0aaa0 R11: 0000000000000008 R12: ffff880340040458
Dec 17 11:17:28 bee kernel: [   31.977432] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880340040000
Dec 17 11:17:28 bee kernel: [   31.977452] FS:  00007fbfdd8c0780(0000) GS:ffff88046ec00000(0000) knlGS:0000000000000000
Dec 17 11:17:28 bee kernel: [   31.977474] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 17 11:17:28 bee kernel: [   31.977490] CR2: 000055c3b48c1408 CR3: 0000000358307003 CR4: 00000000003606f0
Dec 17 11:17:28 bee kernel: [   31.977527] Call Trace:
Dec 17 11:17:28 bee kernel: [   31.977555]  vi_common_hw_init+0x77/0xe0 [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.977584]  amdgpu_device_init+0xc4b/0x14b0 [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.977601]  ? kmem_cache_alloc_trace+0x208/0x250
Dec 17 11:17:28 bee kernel: [   31.977629]  ? amdgpu_driver_load_kms+0x2a/0x1b0 [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.977658]  amdgpu_driver_load_kms+0x4f/0x1b0 [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.977682]  drm_dev_register+0x146/0x1d0 [drm]
Dec 17 11:17:28 bee kernel: [   31.977710]  amdgpu_pci_probe+0x118/0x140 [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.977725]  pci_device_probe+0xcf/0x150
Dec 17 11:17:28 bee kernel: [   31.977739]  driver_probe_device+0x29c/0x450
Dec 17 11:17:28 bee kernel: [   31.977753]  __driver_attach+0xdf/0xf0
Dec 17 11:17:28 bee kernel: [   31.978775]  ? driver_probe_device+0x450/0x450
Dec 17 11:17:28 bee kernel: [   31.979815]  bus_for_each_dev+0x60/0xa0
Dec 17 11:17:28 bee kernel: [   31.980882]  driver_attach+0x1e/0x20
Dec 17 11:17:28 bee kernel: [   31.981931]  bus_add_driver+0x170/0x260
Dec 17 11:17:28 bee kernel: [   31.982977]  driver_register+0x60/0xe0
Dec 17 11:17:28 bee kernel: [   31.984033]  __pci_register_driver+0x5a/0x60
Dec 17 11:17:28 bee kernel: [   31.985089]  amdgpu_init+0x88/0x9b [amdgpu]
Dec 17 11:17:28 bee kernel: [   31.986146]  ? 0xffffffffa0c51000
Dec 17 11:17:28 bee kernel: [   31.987192]  do_one_initcall+0x52/0x190
Dec 17 11:17:28 bee kernel: [   31.988229]  ? kmem_cache_alloc_trace+0x208/0x250
Dec 17 11:17:28 bee kernel: [   31.989270]  ? do_init_module+0x27/0x202
Dec 17 11:17:28 bee kernel: [   31.990308]  ? do_init_module+0x27/0x202
Dec 17 11:17:28 bee kernel: [   31.991383]  do_init_module+0x5f/0x202
Dec 17 11:17:28 bee kernel: [   31.992396]  load_module+0x1511/0x1740
Dec 17 11:17:28 bee kernel: [   31.993433]  SyS_finit_module+0xc1/0x100
Dec 17 11:17:28 bee kernel: [   31.994478]  ? SyS_finit_module+0xc1/0x100
Dec 17 11:17:28 bee kernel: [   31.995505]  do_syscall_64+0x66/0x1a0
Dec 17 11:17:28 bee kernel: [   31.996556]  entry_SYSCALL64_slow_path+0x25/0x25
Dec 17 11:17:28 bee kernel: [   31.997616] RIP: 0033:0x7fbfdcfd68f9
Dec 17 11:17:28 bee kernel: [   31.998643] RSP: 002b:00007ffd31e4f848 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Dec 17 11:17:28 bee kernel: [   31.999659] RAX: ffffffffffffffda RBX: 000055e4a76c8430 RCX: 00007fbfdcfd68f9
Dec 17 11:17:28 bee kernel: [   32.000689] RDX: 0000000000000000 RSI: 00007fbfdd2a4565 RDI: 000000000000000e
Dec 17 11:17:28 bee kernel: [   32.001736] RBP: 00007fbfdd2a4565 R08: 0000000000000000 R09: 00007ffd31e4f9c0
Dec 17 11:17:28 bee kernel: [   32.002813] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
Dec 17 11:17:28 bee kernel: [   32.003862] R13: 000055e4a76d6710 R14: 0000000000020000 R15: 000055e4a741b8e9
Dec 17 11:17:28 bee kernel: [   32.004906] Code: 48 89 df ba 4b 00 00 00 48 c7 c6 60 62 13 a1 e8 11 b7 fc ff 48 89 df ba 1e 00 00 00 48 c7 c6 e0 61 13 a1 e8 fd b6 fc ff 5b 5d c3 <0f> 0b ba 05 01 00 00 48 c7 c6 c0 5d 13 a1 e8 e7 b6 fc ff 48 89
Dec 17 11:17:28 bee kernel: [   32.006061] RIP: xgpu_vi_init_golden_registers+0x56/0xa0 [amdgpu] RSP: ffffc900014dfa08
Dec 17 11:17:28 bee kernel: [   32.007226] ---[ end trace eb52a49a747a04be ]---

	This means we hit the following BUG_ON in 
xgpu_vi_init_golden_registers:

	BUG_ON("Doesn't support chip type.\n");
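
	As a rough illustration of why that line always fires once it is 
reached: BUG_ON() triggers when its argument is non-zero, and a string 
literal is a non-NULL pointer. The sketch below is a user-space model, not 
the driver code. The names model_chip, MODEL_BUG_ON and model_hits_bug are 
hypothetical, and the assumption that only Fiji and Tonga are handled is 
an illustration of "a fixed list of supported chips", not a quote from 
mxgpu_vi.c.

```c
/* Hypothetical stand-ins for the driver's chip types (illustration only). */
enum model_chip { MODEL_CHIP_TOPAZ, MODEL_CHIP_TONGA, MODEL_CHIP_FIJI };

/*
 * Model of BUG_ON(): evaluates to 1 when the condition is true.
 * The real macro would stop the kernel instead of returning a value.
 */
#define MODEL_BUG_ON(cond) ((cond) ? 1 : 0)

/* Returns 1 if this chip would end up in the BUG_ON branch, 0 otherwise. */
static int model_hits_bug(enum model_chip chip)
{
	switch (chip) {
	case MODEL_CHIP_FIJI:
	case MODEL_CHIP_TONGA:
		/* Assumed supported: golden registers would be programmed here. */
		return 0;
	default:
		/* A string literal is a non-NULL pointer, so this is always true. */
		return MODEL_BUG_ON("Doesn't support chip type.\n");
	}
}
```

	In this model, model_hits_bug(MODEL_CHIP_TOPAZ) returns 1 while the 
Tonga and Fiji cases return 0, which matches a Topaz reaching the default 
branch.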

	We get there through this branch in vi_init_golden_registers:

        if (amdgpu_sriov_vf(adev)) {
                xgpu_vi_init_golden_registers(adev);
                mutex_unlock(&adev->grbm_idx_mutex);
                return;
        }
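
	The branch above can be modelled in user space to show how a wrongly 
set VF flag routes a Topaz into the VF-only path. This is a sketch under 
stated assumptions: that amdgpu_sriov_vf() boils down to a single bit test 
on a capability field, and that the VF path only knows a fixed set of 
chips. All names and the bit value (MODEL_CAPS_IS_VF, model_device, 
model_pick_path, ...) are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the real driver defines its own capability bits. */
#define MODEL_CAPS_IS_VF (1u << 2)

enum model_chip { MODEL_CHIP_TOPAZ, MODEL_CHIP_TONGA, MODEL_CHIP_FIJI };

struct model_device {
	enum model_chip chip;
	uint32_t virt_caps;	/* capability bits filled in at detection time */
};

/* Assumed shape of the VF test: one bit in the capability field. */
static bool model_is_vf(const struct model_device *dev)
{
	return (dev->virt_caps & MODEL_CAPS_IS_VF) != 0;
}

/* True when the VF-only init path is assumed to know this chip. */
static bool model_vf_path_supported(const struct model_device *dev)
{
	return dev->chip == MODEL_CHIP_TONGA || dev->chip == MODEL_CHIP_FIJI;
}

enum model_path { PATH_BARE_METAL, PATH_VF, PATH_VF_UNSUPPORTED };

/* Picks the init path the way the quoted branch does, plus a support check. */
static enum model_path model_pick_path(const struct model_device *dev)
{
	if (!model_is_vf(dev))
		return PATH_BARE_METAL;
	return model_vf_path_supported(dev) ? PATH_VF : PATH_VF_UNSUPPORTED;
}
```

	A Topaz with the VF bit clear takes PATH_BARE_METAL. The same chip 
with the bit set lands in PATH_VF_UNSUPPORTED, which corresponds to the 
crash reported here.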

	The system is running the following kernel and CPU:

$ uname -a
Linux bee 4.14.5 #10 SMP Wed Dec 13 12:07:06 EET 2017 x86_64 Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz GenuineIntel GNU/Linux

	The graphics card is the following:

# lspci -vvvs 01:00.0
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] (rev 81)
        Subsystem: Lenovo Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 128
        Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at b0000000 (64-bit, prefetchable) [size=2M]
        Region 4: I/O ports at 4000 [size=256]
        Region 5: Memory at b2300000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at b2340000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00338  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [270 v1] #19
        Capabilities: [2b0 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable-, Smallest Translation Unit: 00
        Capabilities: [2c0 v1] Page Request Interface (PRI)
                PRICtl: Enable- Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00000020, Page Request Allocation: 00000000
        Capabilities: [2d0 v1] Process Address Space ID (PASID)
                PASIDCap: Exec+ Priv+, Max PASID Width: 10
                PASIDCtl: Enable- Exec- Priv-
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

	The funny thing is that the machine boots properly when it is not 
running on battery, so this seems to be a problem either in the firmware or 
in the way ACPI interacts with the driver.

	Any help or ideas are appreciated.

	José.
