* "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
@ 2017-10-19 10:24 Bjørn Mork
2017-10-19 10:31 ` Bjørn Mork
2017-10-19 10:36 ` Chris Wilson
0 siblings, 2 replies; 7+ messages in thread
From: Bjørn Mork @ 2017-10-19 10:24 UTC
To: intel-gfx
Hello,
I get these Oopses from time to time, but unfortunately(?) not often
enough to be anywhere near reproducible. They seem to be related to
whatever activities my laptop/X server/driver/GPU/screen is doing while
I'm not present: the oops happens when I've been away for a while. So I
guess it might be related to screensaver and/or power-saving actions.
There is always a GPU HANG prior to the Oops, so these events are
probably related.
<6>[ 3925.798843] [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [850], reason: Hang on render ring, action: reset
<6>[ 3925.798851] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[ 3925.798854] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[ 3925.798857] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[ 3925.798860] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[ 3925.798863] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5>[ 3925.798923] drm/i915: Resetting chip after gpu hang
<6>[ 3925.798995] [drm] RC6 on
<6>[ 3925.816730] [drm] GuC firmware load skipped
<5>[ 3945.765299] drm/i915: Resetting chip after gpu hang
<6>[ 3945.765773] [drm] RC6 on
<6>[ 3945.782092] [drm] GuC firmware load skipped
<4>[ 3950.942974] e1000e 0000:00:1f.6: Failed to restore TIMINCA clock rate delta: -22
<5>[ 3967.781348] drm/i915: Resetting chip after gpu hang
<6>[ 3967.784013] [drm] RC6 on
<6>[ 3967.801547] [drm] GuC firmware load skipped
<5>[ 3987.781060] drm/i915: Resetting chip after gpu hang
<6>[ 3987.781148] [drm] RC6 on
<6>[ 3987.797332] [drm] GuC firmware load skipped
<5>[ 4005.796949] drm/i915: Resetting chip after gpu hang
<6>[ 4005.797031] [drm] RC6 on
<6>[ 4005.813929] [drm] GuC firmware load skipped
<5>[ 4023.780914] drm/i915: Resetting chip after gpu hang
<6>[ 4023.782354] [drm] RC6 on
<6>[ 4023.795459] [drm] GuC firmware load skipped
<5>[ 4046.788711] drm/i915: Resetting chip after gpu hang
<6>[ 4046.788806] [drm] RC6 on
<6>[ 4046.805294] [drm] GuC firmware load skipped
<5>[ 4064.772580] drm/i915: Resetting chip after gpu hang
<6>[ 4064.772670] [drm] RC6 on
<6>[ 4064.789342] [drm] GuC firmware load skipped
<5>[ 4080.772471] drm/i915: Resetting chip after gpu hang
<6>[ 4080.772563] [drm] RC6 on
<6>[ 4080.789200] [drm] GuC firmware load skipped
<5>[ 4095.780392] drm/i915: Resetting chip after gpu hang
<6>[ 4095.780501] [drm] RC6 on
<6>[ 4095.794800] [drm] GuC firmware load skipped
<5>[ 4109.796310] drm/i915: Resetting chip after gpu hang
<6>[ 4109.796401] [drm] RC6 on
<6>[ 4109.813305] [drm] GuC firmware load skipped
<5>[ 4126.788181] drm/i915: Resetting chip after gpu hang
<6>[ 4126.788276] [drm] RC6 on
<6>[ 4126.804593] [drm] GuC firmware load skipped
<5>[ 4143.780147] drm/i915: Resetting chip after gpu hang
<6>[ 4143.782293] [drm] RC6 on
<6>[ 4143.799046] [drm] GuC firmware load skipped
<5>[ 4162.787931] drm/i915: Resetting chip after gpu hang
<6>[ 4162.788409] [drm] RC6 on
<6>[ 4162.804360] [drm] GuC firmware load skipped
<5>[ 4175.779781] drm/i915: Resetting chip after gpu hang
<6>[ 4175.779865] [drm] RC6 on
<6>[ 4175.796174] [drm] GuC firmware load skipped
<5>[ 4196.771643] drm/i915: Resetting chip after gpu hang
<6>[ 4196.773680] [drm] RC6 on
<6>[ 4196.785992] [drm] GuC firmware load skipped
<5>[ 4226.787780] drm/i915: Resetting chip after gpu hang
<6>[ 4226.788233] [drm] RC6 on
<6>[ 4226.804266] [drm] GuC firmware load skipped
<5>[ 4241.795725] drm/i915: Resetting chip after gpu hang
<6>[ 4241.796153] [drm] RC6 on
<6>[ 4241.810190] [drm] GuC firmware load skipped
<5>[ 4261.795634] drm/i915: Resetting chip after gpu hang
<6>[ 4261.798342] [drm] RC6 on
<6>[ 4261.816858] [drm] GuC firmware load skipped
<5>[ 4284.803333] drm/i915: Resetting chip after gpu hang
<6>[ 4284.803784] [drm] RC6 on
<6>[ 4284.817656] [drm] GuC firmware load skipped
<5>[ 4296.803264] drm/i915: Resetting chip after gpu hang
<6>[ 4296.803717] [drm] RC6 on
<6>[ 4296.822146] [drm] GuC firmware load skipped
<5>[ 4313.794990] drm/i915: Resetting chip after gpu hang
<6>[ 4313.795068] [drm] RC6 on
<6>[ 4313.811487] [drm] GuC firmware load skipped
<5>[ 4322.787613] drm/i915: Resetting chip after gpu hang
<1>[ 4322.787674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
<1>[ 4322.787759] IP: [<ffffffffc08c41a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[ 4322.787875] PGD 0
<4>[ 4322.787895]
<4>[ 4322.787916] Oops: 0000 [#1] SMP
<4>[ 4322.787947] Modules linked in: rfcomm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_conexant kvm_intel snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp sparse_keymap snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress efi_pstore btusb btrtl btbcm cdc_mbim btintel cdc_wdm(O) bluetooth cdc_ncm iwlmvm usbnet qcserial mii usb_wwan usbserial snd_hda_intel mac80211 snd_hda_codec intel_uncore intel_rapl_perf evdev joydev serio_raw snd_hda_core efivars snd_hwdep snd_pcm iTCO_wdt snd_timer
intel_ish_ipc usb_common intel_ishtp thermal
<4>[ 4322.789722] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G O 4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4322.789806] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
<4>[ 4322.789906] Workqueue: events_long i915_hangcheck_elapsed [i915]
<4>[ 4322.789963] task: ffff9162993970c0 task.stack: ffffa952082e4000
<4>[ 4322.790037] RIP: 0010:[<ffffffffc08c41a3>] [<ffffffffc08c41a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[ 4322.790245] RSP: 0018:ffffa952082e7b70 EFLAGS: 00010202
<4>[ 4322.790334] RAX: 0000000000000202 RBX: ffff9162988b9a80 RCX: 0000000000000001
<4>[ 4322.790399] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
<4>[ 4322.790461] RBP: ffffa952082e7b90 R08: ffff9162eb9f08c8 R09: 0000000000000000
<4>[ 4322.790523] R10: ffff9162f0880800 R11: ffffffff968ad46d R12: ffff9162eb9f2860
<4>[ 4322.790586] R13: 0000000000000000 R14: ffff9162eb9f0000 R15: ffff9162d77d7000
<4>[ 4322.790653] FS: 0000000000000000(0000) GS:ffff916301480000(0000) knlGS:0000000000000000
<4>[ 4322.790723] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4322.790775] CR2: 0000000000000070 CR3: 0000000429207000 CR4: 00000000003406e0
<4>[ 4322.790837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4322.790900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 4322.790961] Stack:
<4>[ 4322.790982] ffff9162eb9f2860 ffff9162eb9f2ad8 ffff9162eb9f8760 ffff9162eb9f0000
<4>[ 4322.791063] ffff9162988b9a80 ffffffffc08afacc 0000000000000001 ffff9162eb9f0000
<4>[ 4322.791202] ffff9162eb9fa780 ffff9162edf68410 ffffffff960045b0 ffff9162eb9fa780
<4>[ 4322.791326] Call Trace:
<4>[ 4322.791395] [<ffffffffc08afacc>] ? i915_gem_reset+0x14c/0x240 [i915]
<4>[ 4322.791457] [<ffffffff960045b0>] ? bit_wait_io+0x60/0x60
<4>[ 4322.791532] [<ffffffffc08750f6>] ? i915_reset+0x86/0xd0 [i915]
<4>[ 4322.791616] [<ffffffffc0879f95>] ? i915_reset_and_wakeup+0x165/0x180 [i915]
<4>[ 4322.791707] [<ffffffffc087deda>] ? i915_handle_error+0x10a/0x5f0 [i915]
<4>[ 4322.791794] [<ffffffffc087e60a>] ? i915_hangcheck_elapsed+0x24a/0x520 [i915]
<4>[ 4322.791838] [<ffffffff95a90444>] ? process_one_work+0x184/0x410
<4>[ 4322.791859] [<ffffffff95a9071d>] ? worker_thread+0x4d/0x480
<4>[ 4322.791877] [<ffffffff95a906d0>] ? process_one_work+0x410/0x410
<4>[ 4322.791896] [<ffffffff95a7bb2a>] ? do_group_exit+0x3a/0xa0
<4>[ 4322.791915] [<ffffffff95a96697>] ? kthread+0xd7/0xf0
<4>[ 4322.791932] [<ffffffff95a965c0>] ? kthread_park+0x60/0x60
<4>[ 4322.791950] [<ffffffff96008835>] ? ret_from_fork+0x25/0x30
<4>[ 4322.791967] Code: 41 5e 5d c3 41 8b 44 24 28 b9 01 00 00 00 ba 00 00 ff ff 4c 89 f7 8d b0 a0 03 00 00 41 ff 96 68 07 00 00 4d 8b ac 24 38 02 00 00 <49> 8b 45 70 48 39 43 70 74 51 4d 85 ed 74 14 48 c7 c0 e0 a3 e9
<1>[ 4322.792103] RIP [<ffffffffc08c41a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[ 4322.794181] RSP <ffffa952082e7b70>
<4>[ 4322.796111] CR2: 0000000000000070
<4>[ 4322.810060] ---[ end trace 11d170a6d0542763 ]---
<4>[ 4323.017002] general protection fault: 0000 [#2] SMP
<4>[ 4323.019218] Modules linked in: rfcomm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_conexant kvm_intel snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp sparse_keymap snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress efi_pstore btusb btrtl btbcm cdc_mbim btintel cdc_wdm(O) bluetooth cdc_ncm iwlmvm usbnet qcserial mii usb_wwan usbserial snd_hda_intel mac80211 snd_hda_codec intel_uncore intel_rapl_perf evdev joydev serio_raw snd_hda_core efivars snd_hwdep snd_pcm iTCO_wdt snd_timer
intel_ish_ipc usb_common intel_ishtp thermal
<4>[ 4323.029380] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G D O 4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4323.031153] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
<4>[ 4323.032875] task: ffff9162993970c0 task.stack: ffffa952082e4000
<4>[ 4323.034578] RIP: 0010:[<ffffffff95ab88a5>] [<ffffffff95ab88a5>] __wake_up_common+0x25/0x80
<4>[ 4323.036325] RSP: 0018:ffffa952082e7e70 EFLAGS: 00010002
<4>[ 4323.038024] RAX: 0000000000000282 RBX: ffffa952082e7f10 RCX: 0000000000000000
<4>[ 4323.039784] RDX: 954099b4e34ffa70 RSI: 0000000000000003 RDI: ffffa952082e7f10
<4>[ 4323.041497] RBP: ffffa952082e7f18 R08: 0000000000000000 R09: ffff9162ea4ff100
<4>[ 4323.043241] R10: 0000000002f41000 R11: 00000000c5672a10 R12: 0000000000000282
<4>[ 4323.044945] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000046
<4>[ 4323.046637] FS: 0000000000000000(0000) GS:ffff916301480000(0000) knlGS:0000000000000000
<4>[ 4323.048379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4323.050071] CR2: 0000000000000028 CR3: 000000042d9e9000 CR4: 00000000003406e0
<4>[ 4323.051807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4323.053495] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 4323.055238] Stack:
<4>[ 4323.056927] 00000001e34ffa70 ffffa952082e7f10 ffffa952082e7f08 0000000000000282
<4>[ 4323.058600] 0000000000000000 0000000000000001 0000000000000046 ffffffff95ab92f1
<4>[ 4323.060307] ffff9162993977d8 ffff9162993970c0 0000000000000000 ffffffff95a74230
<4>[ 4323.061979] Call Trace:
<4>[ 4323.063658] [<ffffffff95ab92f1>] ? complete+0x31/0x40
<4>[ 4323.065310] [<ffffffff95a74230>] ? mm_release+0xb0/0x130
<4>[ 4323.067011] [<ffffffff95a7b120>] ? do_exit+0x150/0xae0
<4>[ 4323.068671] [<ffffffff96009d97>] ? rewind_stack_do_exit+0x17/0x20
<4>[ 4323.070303] Code: 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 8d 6f 08 48 83 ec 08 89 54 24 04 48 8b 57 08 48 39 d5 <48> 8b 32 74 43 48 8d 42 e8 4c 8d 7e e8 41 89 cd 4d 89 c4 8b 18
<1>[ 4323.072102] RIP [<ffffffff95ab88a5>] __wake_up_common+0x25/0x80
<4>[ 4323.073792] RSP <ffffa952082e7e70>
<4>[ 4323.075494] ---[ end trace 11d170a6d0542764 ]---
<1>[ 4323.262525] Fixing recursive fault but reboot is needed!
<1>[ 4323.262555] BUG: unable to handle kernel paging request at ffffffffffffffd8
<1>[ 4323.264262] IP: [<ffffffff95a970fc>] kthread_data+0xc/0x20
<4>[ 4323.265945] PGD 42920a067
<4>[ 4323.265951] PUD 42920c067
<4>[ 4323.267651] PMD 0
<4>[ 4323.267653]
<4>[ 4323.269343] Oops: 0000 [#3] SMP
<4>[ 4323.271053] Modules linked in: rfcomm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_conexant kvm_intel snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp sparse_keymap snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress efi_pstore btusb btrtl btbcm cdc_mbim btintel cdc_wdm(O) bluetooth cdc_ncm iwlmvm usbnet qcserial mii usb_wwan usbserial snd_hda_intel mac80211 snd_hda_codec intel_uncore intel_rapl_perf evdev joydev serio_raw snd_hda_core efivars snd_hwdep snd_pcm iTCO_wdt snd_timer
<4>[ 4323.274686] iTCO_vendor_support i915 rtsx_pci_ms memstick drm_kms_helper thinkpad_acpi drm iwlwifi hid_sensor_accel_3d hid_sensor_trigger mei_me hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf uvcvideo cfg80211 mei intel_pch_thermal industrialio i2c_algo_bit wmi videobuf2_vmalloc videobuf2_memops nvram videobuf2_v4l2 shpchp snd videobuf2_core soundcore rfkill tpm_crb videodev battery ac media video button parport_pc ppdev lp parport sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache crc32c_generic hid_sensor_custom hid_sensor_hub intel_ishtp_hid hid rtsx_pci_sdmmc mmc_core crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd rtsx_pci psmouse mfd_core e1000e ptp pps_core i2c_i801 i2c_smbus nvme xhci_pci nvme_core xhci_hcd usbcore intel_ish_ipc usb_common intel_ishtp thermal
<4>[ 4323.282175] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G D O 4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4323.284056] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
<4>[ 4323.285873] task: ffff9162993970c0 task.stack: ffffa952082e4000
<4>[ 4323.287699] RIP: 0010:[<ffffffff95a970fc>] [<ffffffff95a970fc>] kthread_data+0xc/0x20
<4>[ 4323.289438] RSP: 0018:ffffa952082e7e70 EFLAGS: 00010002
<4>[ 4323.291164] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
<4>[ 4323.292827] RDX: ffff9162f0806700 RSI: ffff916299397140 RDI: ffff9162993970c0
<4>[ 4323.294485] RBP: ffffa952082e7ec8 R08: ffff916299397168 R09: 000000000000cc00
<4>[ 4323.296157] R10: 0000000000000000 R11: ffff916299397140 R12: ffff916301498240
<4>[ 4323.297789] R13: ffff9162993970c0 R14: ffff916299397678 R15: 0000000000000046
<4>[ 4323.299457] FS: 0000000000000000(0000) GS:ffff916301480000(0000) knlGS:0000000000000000
<4>[ 4323.301085] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4323.302745] CR2: 0000000000000028 CR3: 000000042d9e9000 CR4: 00000000003406e0
<4>[ 4323.304390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4323.306006] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 4323.307661] Stack:
<4>[ 4323.309279] ffffffff95a9162a ffffffff96003b48 ffffa95200000008 00ffa952082e7ee8
<4>[ 4323.310939] ffff916301498240 954099b4e34ffa70 ffff9162993970c0 ffffa952082e7dc8
<4>[ 4323.312577] 0000000000000000 0000000000000003 0000000000000046 000000000000000b
<4>[ 4323.314203] Call Trace:
<4>[ 4323.315857] [<ffffffff95a9162a>] ? wq_worker_sleeping+0xa/0x80
<4>[ 4323.317499] [<ffffffff96003b48>] ? __schedule+0x498/0x6d0
<4>[ 4323.319165] [<ffffffff96003db2>] ? schedule+0x32/0x80
<4>[ 4323.320787] [<ffffffff95a7b889>] ? do_exit+0x8b9/0xae0
<4>[ 4323.322406] [<ffffffff96009d97>] ? rewind_stack_do_exit+0x17/0x20
<4>[ 4323.324070] Code: c0 0f 85 50 ff ff ff eb ab e8 d1 9f 04 00 e9 a3 fe ff ff 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 58 05 00 00 <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f
<1>[ 4323.325805] RIP [<ffffffff95a970fc>] kthread_data+0xc/0x20
<4>[ 4323.327507] RSP <ffffa952082e7e70>
<4>[ 4323.329176] CR2: ffffffffffffffd8
<4>[ 4323.330878] ---[ end trace 11d170a6d0542765 ]---
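The pattern in the log above (a run of "Resetting chip after gpu hang" messages ending in the NULL pointer dereference) can be pulled out of a saved log with a short script. This is only a sketch: the `summarize` helper and its regexes are hypothetical and assume the `<level>[ seconds] message` layout of this particular pstore dump, not any general dmesg format.

```python
import re

# Assumed line layout (taken from this report): "<level>[ seconds] message".
LINE_RE = re.compile(r"^<\d>\[\s*(\d+\.\d+)\]\s*(.*)$")
RESET_MSG = "drm/i915: Resetting chip after gpu hang"
NULL_DEREF_RE = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference at ([0-9a-f]+)"
)


def summarize(log_text):
    """Collect reset timestamps up to the first NULL-deref BUG line.

    Returns (reset_timestamps, oops_timestamp, fault_address); the last
    two are None if no such BUG line is found.
    """
    resets = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # wrapped module lists etc. carry no prefix
        ts, msg = float(m.group(1)), m.group(2)
        if msg == RESET_MSG:
            resets.append(ts)
            continue
        bug = NULL_DEREF_RE.match(msg)
        if bug:
            return resets, ts, int(bug.group(1), 16)
    return resets, None, None


# A few lines copied from the log above.
sample = """\
<5>[ 4296.803264] drm/i915: Resetting chip after gpu hang
<6>[ 4296.803717] [drm] RC6 on
<5>[ 4313.794990] drm/i915: Resetting chip after gpu hang
<5>[ 4322.787613] drm/i915: Resetting chip after gpu hang
<1>[ 4322.787674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
"""

resets, oops_ts, addr = summarize(sample)
print(len(resets), "resets before the oops; fault address", hex(addr))
```

Run over a full dump, the gaps between consecutive entries in `resets` give the reset interval, and the last entry shows how soon after a reset the oops fired.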
Bjørn
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
From: Bjørn Mork @ 2017-10-19 10:31 UTC
To: intel-gfx
Here's another one I had lying around in pstore. Note that this one has a
slightly older kernel and BIOS version, possibly eliminating a couple of
variables.
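When comparing dumps like these, the kernel and BIOS builds can be read off the `CPU:` and `Hardware name:` lines of each oops. A minimal sketch, assuming the line layout of the Debian 4.9 kernels in this thread; `build_info` and both regexes are hypothetical helpers and would likely need adjusting for other distributions or kernel versions.

```python
import re

# Assumed layout: "... <uname release> #<n> Debian <package version>" at end of the CPU: line.
KERNEL_RE = re.compile(r"(\S+) #\d+ Debian (\S+)\s*$")
# Assumed layout: "BIOS <id> (<version> ) <date>" on the Hardware name: line.
BIOS_RE = re.compile(r"BIOS (\S+) \(([^)]*)\) (\S+)")


def build_info(log_text):
    """Return kernel/BIOS details from the first matching lines of an oops."""
    info = {}
    for line in log_text.splitlines():
        if "CPU:" in line and "kernel" not in info:
            m = KERNEL_RE.search(line)
            if m:
                info["kernel"], info["package"] = m.group(1), m.group(2)
        elif "Hardware name:" in line and "bios" not in info:
            m = BIOS_RE.search(line)
            if m:
                info["bios"] = m.group(1)
                info["bios_version"] = m.group(2).strip()
                info["bios_date"] = m.group(3)
    return info


# Lines copied from the two dumps in this thread.
newer = """\
<4>[ 4322.789722] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G O 4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4322.789806] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
"""
older = """\
<4>[18811.746662] CPU: 1 PID: 7742 Comm: kworker/1:2 Tainted: G W 4.9.0-4-amd64 #1 Debian 4.9.47-1
<4>[18811.746801] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET55W (1.29 ) 09/08/2017
"""

a, b = build_info(newer), build_info(older)
# Print only the fields that differ between the two dumps.
for key in sorted(a):
    if a[key] != b.get(key):
        print(key, ":", b.get(key), "->", a[key])
```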
<6>[18285.732012] [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [842], reason: Hang on render ring, action: reset
<6>[18285.732024] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[18285.732028] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[18285.732032] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[18285.732036] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[18285.732040] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5>[18285.732126] drm/i915: Resetting chip after gpu hang
<6>[18285.734955] [drm] RC6 on
<6>[18285.757377] [drm] GuC firmware load skipped
<5>[18305.729576] drm/i915: Resetting chip after gpu hang
<6>[18305.729657] [drm] RC6 on
<6>[18305.749353] [drm] GuC firmware load skipped
<5>[18323.745560] drm/i915: Resetting chip after gpu hang
<6>[18323.745668] [drm] RC6 on
<6>[18323.763981] [drm] GuC firmware load skipped
<5>[18344.769430] drm/i915: Resetting chip after gpu hang
<6>[18344.771606] [drm] RC6 on
<6>[18344.788075] [drm] GuC firmware load skipped
<5>[18367.746011] drm/i915: Resetting chip after gpu hang
<6>[18367.746094] [drm] RC6 on
<6>[18367.762999] [drm] GuC firmware load skipped
<5>[18381.729133] drm/i915: Resetting chip after gpu hang
<6>[18381.729245] [drm] RC6 on
<6>[18381.745539] [drm] GuC firmware load skipped
<5>[18405.761079] drm/i915: Resetting chip after gpu hang
<6>[18405.762607] [drm] RC6 on
<6>[18405.777474] [drm] GuC firmware load skipped
<5>[18427.776756] drm/i915: Resetting chip after gpu hang
<6>[18427.776850] [drm] RC6 on
<6>[18427.797496] [drm] GuC firmware load skipped
<5>[18440.768699] drm/i915: Resetting chip after gpu hang
<6>[18440.768781] [drm] RC6 on
<6>[18440.790409] [drm] GuC firmware load skipped
<5>[18464.768597] drm/i915: Resetting chip after gpu hang
<6>[18464.770785] [drm] RC6 on
<6>[18464.785094] [drm] GuC firmware load skipped
<5>[18486.752455] drm/i915: Resetting chip after gpu hang
<6>[18486.752551] [drm] RC6 on
<6>[18486.769592] [drm] GuC firmware load skipped
<5>[18500.768308] drm/i915: Resetting chip after gpu hang
<6>[18500.768404] [drm] RC6 on
<6>[18500.787102] [drm] GuC firmware load skipped
<5>[18519.744175] drm/i915: Resetting chip after gpu hang
<6>[18519.744274] [drm] RC6 on
<6>[18519.760581] [drm] GuC firmware load skipped
<5>[18545.760078] drm/i915: Resetting chip after gpu hang
<6>[18545.760181] [drm] RC6 on
<6>[18545.777935] [drm] GuC firmware load skipped
<5>[18563.743956] drm/i915: Resetting chip after gpu hang
<6>[18563.746147] [drm] RC6 on
<6>[18563.764567] [drm] GuC firmware load skipped
<5>[18587.743747] drm/i915: Resetting chip after gpu hang
<6>[18587.745436] [drm] RC6 on
<6>[18587.764232] [drm] GuC firmware load skipped
<5>[18607.743605] drm/i915: Resetting chip after gpu hang
<6>[18607.743689] [drm] RC6 on
<6>[18607.760602] [drm] GuC firmware load skipped
<5>[18622.751509] drm/i915: Resetting chip after gpu hang
<6>[18622.751597] [drm] RC6 on
<6>[18622.766647] [drm] GuC firmware load skipped
<5>[18641.759427] drm/i915: Resetting chip after gpu hang
<6>[18641.759909] [drm] RC6 on
<6>[18641.779665] [drm] GuC firmware load skipped
<5>[18657.759315] drm/i915: Resetting chip after gpu hang
<6>[18657.759415] [drm] RC6 on
<6>[18657.775636] [drm] GuC firmware load skipped
<5>[18672.767168] drm/i915: Resetting chip after gpu hang
<6>[18672.767263] [drm] RC6 on
<6>[18672.785638] [drm] GuC firmware load skipped
<5>[18700.767010] drm/i915: Resetting chip after gpu hang
<6>[18700.769198] [drm] RC6 on
<6>[18700.785483] [drm] GuC firmware load skipped
<5>[18726.782784] drm/i915: Resetting chip after gpu hang
<6>[18726.782869] [drm] RC6 on
<6>[18726.799174] [drm] GuC firmware load skipped
<5>[18743.742708] drm/i915: Resetting chip after gpu hang
<6>[18743.742806] [drm] RC6 on
<6>[18743.759822] [drm] GuC firmware load skipped
<5>[18753.758594] drm/i915: Resetting chip after gpu hang
<6>[18753.759089] [drm] RC6 on
<6>[18753.776971] [drm] GuC firmware load skipped
<4>[18755.118569] ------------[ cut here ]------------
<4>[18755.118601] WARNING: CPU: 2 PID: 883 at /build/linux-VffmcQ/linux-4.9.47/drivers/gpu/drm/i915/intel_display.c:14192 intel_atomic_commit_tail+0xf2b/0xf50 [i915]
<4>[18755.118602] pipe A vblank wait timed out
<4>[18755.118603] Modules linked in: uinput nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace fscache rfcomm ctr ccm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl uvcvideo x86_pkg_temp_thermal intel_powerclamp coretemp videobuf2_vmalloc kvm_intel videobuf2_memops snd_hda_codec_conexant kvm snd_hda_codec_generic irqbypass videobuf2_v4l2 videobuf2_core crct10dif_pclmul videodev media crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress sparse_keymap efi_pstore iwlmvm snd_hda_intel intel_uncore snd_hda_codec snd_hda_core
psmouse e1000e ptp i2c_i801 pps_core i2c_smbus nvme nvme_core xhci_pci rtsx_pci xhci_hcd mfd_core usbcore intel_ish_ipc usb_common intel_ishtp thermal
<4>[18755.118685] CPU: 2 PID: 883 Comm: InputThread Not tainted 4.9.0-4-amd64 #1 Debian 4.9.47-1
<4>[18755.118686] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET55W (1.29 ) 09/08/2017
<4>[18755.118688] 0000000000000000 ffffffff9a5299e4 ffffb4e1825179f0 0000000000000000
<4>[18755.118690] ffffffff9a276e9e 0000000000000000 ffffb4e182517a48 ffff987daab40000
<4>[18755.118692] 0000000000000000 ffff987da92b2000 0000000000000001 ffffffff9a276f1f
<4>[18755.118694] Call Trace:
<4>[18755.118699] [<ffffffff9a5299e4>] ? dump_stack+0x5c/0x78
<4>[18755.118702] [<ffffffff9a276e9e>] ? __warn+0xbe/0xe0
<4>[18755.118704] [<ffffffff9a276f1f>] ? warn_slowpath_fmt+0x5f/0x80
<4>[18755.118706] [<ffffffff9a2b8c6c>] ? finish_wait+0x3c/0x70
<4>[18755.118723] [<ffffffffc08d4bfb>] ? intel_atomic_commit_tail+0xf2b/0xf50 [i915]
<4>[18755.118725] [<ffffffff9a2b8e70>] ? prepare_to_wait_event+0xf0/0xf0
<4>[18755.118739] [<ffffffffc08d4f75>] ? intel_atomic_commit+0x355/0x4c0 [i915]
<4>[18755.118747] [<ffffffffc07db7ec>] ? restore_fbdev_mode+0x14c/0x270 [drm_kms_helper]
<4>[18755.118762] [<ffffffffc06a0b60>] ? drm_modeset_lock_all_ctx+0xa0/0xb0 [drm]
<4>[18755.118766] [<ffffffffc07dd32e>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper]
<4>[18755.118782] [<ffffffffc08efcc4>] ? intel_fbdev_restore_mode+0x34/0xb0 [i915]
<4>[18755.118792] [<ffffffffc085804a>] ? i915_driver_lastclose+0xa/0x10 [i915]
<4>[18755.118800] [<ffffffffc068c094>] ? drm_lastclose+0x34/0xf0 [drm]
<4>[18755.118806] [<ffffffffc068c39e>] ? drm_release+0x24e/0x300 [drm]
<4>[18755.118823] [<ffffffff9a404e05>] ? __fput+0xd5/0x220
<4>[18755.118843] [<ffffffff9a294ba9>] ? task_work_run+0x79/0xa0
<4>[18755.118845] [<ffffffff9a27b29a>] ? do_exit+0x2da/0xae0
<4>[18755.118847] [<ffffffff9a27bb1a>] ? do_group_exit+0x3a/0xa0
<4>[18755.118848] [<ffffffff9a286a57>] ? get_signal+0x297/0x640
<4>[18755.118851] [<ffffffff9a225446>] ? do_signal+0x36/0x6a0
<4>[18755.118853] [<ffffffff9a548639>] ? list_del+0x9/0x30
<4>[18755.118855] [<ffffffff9a2a18b0>] ? wake_up_q+0x70/0x70
<4>[18755.118857] [<ffffffff9a203251>] ? exit_to_usermode_loop+0x71/0xb0
<4>[18755.118859] [<ffffffff9a203a94>] ? syscall_return_slowpath+0x54/0x60
<4>[18755.118862] [<ffffffff9a808688>] ? system_call_fast_compare_end+0x99/0x9b
<4>[18755.118864] ---[ end trace 8449c2edfd44e7de ]---
<3>[18765.214447] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
<3>[18775.454357] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
<3>[18785.694286] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
<3>[18796.446269] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
<5>[18801.758292] drm/i915: Resetting chip after gpu hang
<6>[18801.760213] [drm] RC6 on
<6>[18801.774388] [drm] GuC firmware load skipped
<5>[18811.742979] drm/i915: Resetting chip after gpu hang
<1>[18811.743071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
<1>[18811.743211] IP: [<ffffffffc08a91a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[18811.743393] PGD 0
<4>[18811.743427]
<4>[18811.743461] Oops: 0000 [#1] SMP
<4>[18811.743511] Modules linked in: uinput nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace fscache rfcomm ctr ccm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl uvcvideo x86_pkg_temp_thermal intel_powerclamp coretemp videobuf2_vmalloc kvm_intel videobuf2_memops snd_hda_codec_conexant kvm snd_hda_codec_generic irqbypass videobuf2_v4l2 videobuf2_core crct10dif_pclmul videodev media crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress sparse_keymap efi_pstore iwlmvm snd_hda_intel intel_uncore snd_hda_codec snd_hda_core
psmouse e1000e ptp i2c_i801 pps_core i2c_smbus nvme nvme_core xhci_pci rtsx_pci xhci_hcd mfd_core usbcore intel_ish_ipc usb_common intel_ishtp thermal
<4>[18811.746662] CPU: 1 PID: 7742 Comm: kworker/1:2 Tainted: G W 4.9.0-4-amd64 #1 Debian 4.9.47-1
<4>[18811.746801] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET55W (1.29 ) 09/08/2017
<4>[18811.746967] Workqueue: events_long i915_hangcheck_elapsed [i915]
<4>[18811.747061] task: ffff987d7dd6c000 task.stack: ffffb4e188598000
<4>[18811.747149] RIP: 0010:[<ffffffffc08a91a3>] [<ffffffffc08a91a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[18811.747240] RSP: 0018:ffffb4e18859bb70 EFLAGS: 00010202
<4>[18811.747263] RAX: 0000000000000202 RBX: ffff987cd5af9700 RCX: 0000000000000001
<4>[18811.747291] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
<4>[18811.747319] RBP: ffffb4e18859bb90 R08: ffff987daab408c8 R09: ffff987daab40790
<4>[18811.747348] R10: ffffffff9ac24240 R11: ffffffff9b0ac46d R12: ffff987daab42860
<4>[18811.747376] R13: 0000000000000000 R14: ffff987daab40000 R15: ffff987dac428000
<4>[18811.747405] FS: 0000000000000000(0000) GS:ffff987dc1480000(0000) knlGS:0000000000000000
<4>[18811.747437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[18811.747460] CR2: 0000000000000070 CR3: 0000000330807000 CR4: 00000000003406e0
<4>[18811.747489] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[18811.747518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[18811.747545] Stack:
<4>[18811.747580] ffff987daab42860 ffff987daab42ad8 ffff987daab48760 ffff987daab40000
<4>[18811.747617] ffff987cd5af9700 ffffffffc0894acc 0000000000000001 ffff987daab40000
<4>[18811.747652] ffff987daab4a780 ffff987dab73dc10 ffffffff9a8045e0 ffff987daab4a780
<4>[18811.747689] Call Trace:
<4>[18811.747718] [<ffffffffc0894acc>] ? i915_gem_reset+0x14c/0x240 [i915]
<4>[18811.747746] [<ffffffff9a8045e0>] ? bit_wait_io+0x60/0x60
<4>[18811.747781] [<ffffffffc085a0f6>] ? i915_reset+0x86/0xd0 [i915]
<4>[18811.747816] [<ffffffffc085ef95>] ? i915_reset_and_wakeup+0x165/0x180 [i915]
<4>[18811.747856] [<ffffffffc0862eda>] ? i915_handle_error+0x10a/0x5f0 [i915]
<4>[18811.747896] [<ffffffffc086360a>] ? i915_hangcheck_elapsed+0x24a/0x520 [i915]
<4>[18811.747927] [<ffffffff9a290434>] ? process_one_work+0x184/0x410
<4>[18811.747952] [<ffffffff9a29070d>] ? worker_thread+0x4d/0x480
<4>[18811.747977] [<ffffffff9a2906c0>] ? process_one_work+0x410/0x410
<4>[18811.748003] [<ffffffff9a296687>] ? kthread+0xd7/0xf0
<4>[18811.748025] [<ffffffff9a2965b0>] ? kthread_park+0x60/0x60
<4>[18811.748048] [<ffffffff9a2965b0>] ? kthread_park+0x60/0x60
<4>[18811.748081] [<ffffffff9a808875>] ? ret_from_fork+0x25/0x30
<4>[18811.748127] Code: 41 5e 5d c3 41 8b 44 24 28 b9 01 00 00 00 ba 00 00 ff ff 4c 89 f7 8d b0 a0 03 00 00 41 ff 96 68 07 00 00 4d 8b ac 24 38 02 00 00 <49> 8b 45 70 48 39 43 70 74 51 4d 85 ed 74 14 48 c7 c0 50 a4 69
<1>[18811.748301] RIP [<ffffffffc08a91a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[18811.748348] RSP <ffffb4e18859bb70>
<4>[18811.748363] CR2: 0000000000000070
<4>[18811.763419] ---[ end trace 8449c2edfd44e7df ]---
<4>[18811.926397] general protection fault: 0000 [#2] SMP
<4>[18811.926422] Modules linked in: uinput nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace fscache rfcomm ctr ccm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl uvcvideo x86_pkg_temp_thermal intel_powerclamp coretemp videobuf2_vmalloc kvm_intel videobuf2_memops snd_hda_codec_conexant kvm snd_hda_codec_generic irqbypass videobuf2_v4l2 videobuf2_core crct10dif_pclmul videodev media crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress sparse_keymap efi_pstore iwlmvm snd_hda_intel intel_uncore snd_hda_codec snd_hda_core
psmouse e1000e ptp i2c_i801 pps_core i2c_smbus nvme nvme_core xhci_pci rtsx_pci xhci_hcd mfd_core usbcore intel_ish_ipc usb_common intel_ishtp thermal
<4>[18811.937354] CPU: 1 PID: 7742 Comm: kworker/1:2 Tainted: G D W 4.9.0-4-amd64 #1 Debian 4.9.47-1
<4>[18811.939264] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET55W (1.29 ) 09/08/2017
<4>[18811.941035] task: ffff987d7dd6c000 task.stack: ffffb4e188598000
<4>[18811.942844] RIP: 0010:[<ffffffff9a2b8895>] [<ffffffff9a2b8895>] __wake_up_common+0x25/0x80
<4>[18811.944640] RSP: 0018:ffffb4e18859be70 EFLAGS: 00010006
<4>[18811.946456] RAX: 0000000000000282 RBX: ffffb4e18859bf10 RCX: 0000000000000000
<4>[18811.948259] RDX: cfbbfc963cd9c904 RSI: 0000000000000003 RDI: ffffb4e18859bf10
<4>[18811.950063] RBP: ffffb4e18859bf18 R08: 0000000000000000 R09: 0000000000000001
<4>[18811.951830] R10: 00000000004aa2de R11: 00000000009212e5 R12: 0000000000000282
<4>[18811.953610] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000046
<4>[18811.955417] FS: 0000000000000000(0000) GS:ffff987dc1480000(0000) knlGS:0000000000000000
<4>[18811.957203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[18811.959000] CR2: 0000000000000028 CR3: 000000042d87d000 CR4: 00000000003406e0
<4>[18811.960760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[18811.962554] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[18811.964333] Stack:
<4>[18811.966066] 000000013cd9c904 ffffb4e18859bf10 ffffb4e18859bf08 0000000000000282
<4>[18811.967804] 0000000000000000 0000000000000001 0000000000000046 ffffffff9a2b92e1
<4>[18811.969519] ffff987d7dd6c718 ffff987d7dd6c000 0000000000000000 ffffffff9a274220
<4>[18811.971288] Call Trace:
<4>[18811.972993] [<ffffffff9a2b92e1>] ? complete+0x31/0x40
<4>[18811.974738] [<ffffffff9a274220>] ? mm_release+0xb0/0x130
<4>[18811.976474] [<ffffffff9a27b110>] ? do_exit+0x150/0xae0
<4>[18811.978243] [<ffffffff9a809dd7>] ? rewind_stack_do_exit+0x17/0x20
<4>[18811.979918] Code: 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 8d 6f 08 48 83 ec 08 89 54 24 04 48 8b 57 08 48 39 d5 <48> 8b 32 74 43 48 8d 42 e8 4c 8d 7e e8 41 89 cd 4d 89 c4 8b 18
<1>[18811.981693] RIP [<ffffffff9a2b8895>] __wake_up_common+0x25/0x80
<4>[18811.983446] RSP <ffffb4e18859be70>
<4>[18811.985180] ---[ end trace 8449c2edfd44e7e0 ]---
<1>[18812.145808] Fixing recursive fault but reboot is needed!
<1>[18812.145841] BUG: unable to handle kernel paging request at ffffffffffffffd8
<1>[18812.147611] IP: [<ffffffff9a2970ec>] kthread_data+0xc/0x20
<4>[18812.149348] PGD 33080a067
<4>[18812.149355] PUD 33080c067
<4>[18812.151114] PMD 0
<4>[18812.151116]
<4>[18812.152811] Oops: 0000 [#3] SMP
<4>[18812.154521] Modules linked in: uinput nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace fscache rfcomm ctr ccm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl uvcvideo x86_pkg_temp_thermal intel_powerclamp coretemp videobuf2_vmalloc kvm_intel videobuf2_memops snd_hda_codec_conexant kvm snd_hda_codec_generic irqbypass videobuf2_v4l2 videobuf2_core crct10dif_pclmul videodev media crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress sparse_keymap efi_pstore iwlmvm snd_hda_intel intel_uncore snd_hda_codec snd_hda_core
<4>[18812.158273] joydev evdev mac80211 intel_rapl_perf serio_raw efivars snd_hwdep hid_sensor_accel_3d hid_sensor_trigger i915 hid_sensor_iio_common iTCO_wdt snd_pcm industrialio_triggered_buffer iTCO_vendor_support snd_timer cdc_mbim cdc_wdm rtsx_pci_ms drm_kms_helper mei_me iwlwifi memstick mei thinkpad_acpi cfg80211 btusb cdc_ncm btrtl nvram btbcm usbnet btintel snd kfifo_buf shpchp qcserial drm usb_wwan bluetooth soundcore industrialio intel_pch_thermal mii usbserial i2c_algo_bit wmi ac battery rfkill video tpm_crb button parport_pc ppdev lp parport sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache crc32c_generic hid_sensor_custom hid_sensor_hub intel_ishtp_hid hid rtsx_pci_sdmmc mmc_core crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse e1000e ptp i2c_i801 pps_core i2c_smbus nvme nvme_core xhci_pci rtsx_pci xhci_hcd mfd_core usbcore intel_ish_ipc usb_common intel_ishtp thermal
<4>[18812.166145] CPU: 1 PID: 7742 Comm: kworker/1:2 Tainted: G D W 4.9.0-4-amd64 #1 Debian 4.9.47-1
<4>[18812.168081] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET55W (1.29 ) 09/08/2017
<4>[18812.169973] task: ffff987d7dd6c000 task.stack: ffffb4e188598000
<4>[18812.171878] RIP: 0010:[<ffffffff9a2970ec>] [<ffffffff9a2970ec>] kthread_data+0xc/0x20
<4>[18812.173688] RSP: 0018:ffffb4e18859be70 EFLAGS: 00010002
<4>[18812.175553] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
<4>[18812.177271] RDX: ffff987db0806640 RSI: ffff987d7dd6c080 RDI: ffff987d7dd6c000
<4>[18812.179042] RBP: ffffb4e18859bec8 R08: ffff987d7dd6c0a8 R09: 000000000000d400
<4>[18812.180743] R10: 0000000000000000 R11: ffff987d7dd6c080 R12: ffff987dc1498240
<4>[18812.182466] R13: ffff987d7dd6c000 R14: ffff987d7dd6c5b8 R15: 0000000000000046
<4>[18812.184156] FS: 0000000000000000(0000) GS:ffff987dc1480000(0000) knlGS:0000000000000000
<4>[18812.185838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[18812.187603] CR2: 0000000000000028 CR3: 000000042d87d000 CR4: 00000000003406e0
<4>[18812.189316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[18812.191062] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[18812.192739] Stack:
<4>[18812.194437] ffffffff9a29161a ffffffff9a803b78 ffffb4e100000008 00ffb4e18859bee8
<4>[18812.196148] ffff987dc1498240 cfbbfc963cd9c904 ffff987d7dd6c000 ffffb4e18859bdc8
<4>[18812.197829] 0000000000000000 0000000000000003 0000000000000046 000000000000000b
<4>[18812.199602] Call Trace:
<4>[18812.201278] [<ffffffff9a29161a>] ? wq_worker_sleeping+0xa/0x80
<4>[18812.202984] [<ffffffff9a803b78>] ? __schedule+0x498/0x6d0
<4>[18812.204685] [<ffffffff9a803de2>] ? schedule+0x32/0x80
<4>[18812.206394] [<ffffffff9a27b879>] ? do_exit+0x8b9/0xae0
<4>[18812.208069] [<ffffffff9a809dd7>] ? rewind_stack_do_exit+0x17/0x20
<4>[18812.209774] Code: c0 0f 85 50 ff ff ff eb ab e8 d1 9f 04 00 e9 a3 fe ff ff 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 58 05 00 00 <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f
<1>[18812.211593] RIP [<ffffffff9a2970ec>] kthread_data+0xc/0x20
<4>[18812.213327] RSP <ffffb4e18859be70>
<4>[18812.215103] CR2: ffffffffffffffd8
<4>[18812.216846] ---[ end trace 8449c2edfd44e7e1 ]---
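The fault address in this last oops can be cross-checked against the register dump. The marked bytes in the Code: line, `<48> 8b 40 d8`, decode as a 64-bit load from RAX plus a signed 8-bit displacement of 0xd8, and RAX is 0 in the dump. A minimal sketch of that arithmetic, where the helper names are illustrative and not taken from the kernel:

```python
MASK64 = (1 << 64) - 1  # x86-64 addresses wrap at 64 bits


def sign_extend8(byte):
    """Interpret an 8-bit displacement as a signed value."""
    return byte - 0x100 if byte & 0x80 else byte


def effective_address(base, disp8):
    """Address of a [base + disp8] memory operand, wrapped to 64 bits."""
    return (base + sign_extend8(disp8)) & MASK64


# Values read from the oops above: RAX = 0, displacement byte = 0xd8 (-0x28).
rax = 0x0000000000000000
print(hex(effective_address(rax, 0xd8)))  # 0xffffffffffffffd8
```

The result matches the address in the "unable to handle kernel paging request" line, which is consistent with kthread_data() dereferencing a NULL pointer at a negative offset.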
Bjørn
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
2017-10-19 10:24 "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring Bjørn Mork
2017-10-19 10:31 ` Bjørn Mork
@ 2017-10-19 10:36 ` Chris Wilson
2017-10-19 10:43 ` Bjørn Mork
1 sibling, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2017-10-19 10:36 UTC (permalink / raw)
To: Bjørn Mork, intel-gfx
Quoting Bjørn Mork (2017-10-19 11:24:57)
> Hello,
>
> I get these Oopses from time to time, but unfortunately(?) not often
> enough to be anywhere near reproducible. But they seem to be related to
> whatever activites my laptop/X-server/driver/gpu/screen is doing while
> I'm not present. The oops happens when I'm away for a while. So I guess
> it might be something related to screensaver and/or power saving
> actions.
>
> There is always a GPU HANG prior to the Oops, so these events are
> probably related.
They are, but the oops implies that the hw got lost along the way. You need a
more recent kernel to avoid that trap and, at a guess,
intel_iommu=igfx_off to avoid the hangs in the first place.
-Chris
* Re: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
2017-10-19 10:36 ` Chris Wilson
@ 2017-10-19 10:43 ` Bjørn Mork
2017-10-19 10:55 ` Bjørn Mork
2017-10-19 10:56 ` Chris Wilson
0 siblings, 2 replies; 7+ messages in thread
From: Bjørn Mork @ 2017-10-19 10:43 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Bjørn Mork (2017-10-19 11:24:57)
>> Hello,
>>
>> I get these Oopses from time to time, but unfortunately(?) not often
>> enough to be anywhere near reproducible. But they seem to be related to
>> whatever activites my laptop/X-server/driver/gpu/screen is doing while
>> I'm not present. The oops happens when I'm away for a while. So I guess
>> it might be something related to screensaver and/or power saving
>> actions.
>>
>> There is always a GPU HANG prior to the Oops, so these events are
>> probably related.
>
> It is, but the oops implies that hw got lost along the way. You need a
> more recent kernel to avoid that trap,
OK. Would it be possible to backport those fixes to linux-4.9.y as a
service to Debian users and other stable kernel users? Or did you mean a
newer 4.9.y kernel?
> and at a guess
> intel_iommu=igfx_off to avoid the hangs in the first place.
Thanks for the tip. I'll try that.
Bjørn
* Re: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
2017-10-19 10:43 ` Bjørn Mork
@ 2017-10-19 10:55 ` Bjørn Mork
2017-10-19 11:08 ` Chris Wilson
2017-10-19 10:56 ` Chris Wilson
1 sibling, 1 reply; 7+ messages in thread
From: Bjørn Mork @ 2017-10-19 10:55 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
Bjørn Mork <bjorn@mork.no> writes:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
>> and at a guess
>> intel_iommu=igfx_off to avoid the hangs in the first place.
>
> Thanks for the tip. I'll try that
My memory is more than a bit flaky, but this did eventually ring a
bell... And after googling, I see that I have tried that tip before without
success:
https://bugs.freedesktop.org/show_bug.cgi?id=101288
If you look at the logs attached to that bug, you'll see that the PC was
running with "intel_iommu=igfx_off" at the time of the hang:
[ 54047.182] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-3-amd64 root=UUID=71507198-90f4-4c25-be41-efc47d2dedd1 ro intel_iommu=igfx_off
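A quick way to confirm that the option was actually passed is to parse the command line string itself. The sketch below is an illustration only: the function names are made up here, it splits on whitespace (so it ignores quoted values), and it assumes `intel_iommu=` takes a comma-separated option list. On a running system the same string can be read from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: [values]}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


def igfx_off_requested(cmdline):
    """True if any intel_iommu= parameter lists the igfx_off option."""
    values = parse_cmdline(cmdline).get("intel_iommu", [])
    return any(v is not None and "igfx_off" in v.split(",") for v in values)


# Command line copied from the log quoted above.
logged = ("BOOT_IMAGE=/boot/vmlinuz-4.9.0-3-amd64 "
          "root=UUID=71507198-90f4-4c25-be41-efc47d2dedd1 ro intel_iommu=igfx_off")
print(igfx_off_requested(logged))
```

This only shows that the option was requested at boot; whether the IOMMU was active for the GPU has to be read from the driver's own state.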
I believe the GPU hang reported in that bug is the same? The driver
just did not Oops while trying to reset.
Bjørn
* Re: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
2017-10-19 10:55 ` Bjørn Mork
@ 2017-10-19 11:08 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-10-19 11:08 UTC (permalink / raw)
To: Bjørn Mork; +Cc: intel-gfx
Quoting Bjørn Mork (2017-10-19 11:55:18)
> Bjørn Mork <bjorn@mork.no> writes:
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> >
> >> and at a guess
> >> intel_iommu=igfx_off to avoid the hangs in the first place.
> >
> > Thanks for the tip. I'll try that
>
> My memory is more than a bit flaky, but this did eventually ring a
> bell... And after googling, I see that I have tried that tip before without
> success:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=101288
>
>
> If you look at the logs attached to that bug, you'll see that the PC was
> running with "intel_iommu=igfx_off" at the time of the hang:
>
> [ 54047.182] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-3-amd64 root=UUID=71507198-90f4-4c25-be41-efc47d2dedd1 ro intel_iommu=igfx_off
IOMMU enabled?: 0
So no, not that; just a regular bug.
> I believe the GPU hang reported in that bug is the same? The driver
> just did not Oops while trying to reset.
That oops should only happen when the context switches are out of kilter
with the breadcrumbs (i.e. the context switched away before the request
was completed; that's something we try to catch during CI as the hw
behaving unexpectedly). The iommu is easy to suspect, as we know it can
introduce memory latencies that cause reordering of events, which is a
nightmare.
We've also fixed a number of races around reset, but I do not recall if
they landed before or after 4.9. The simplest way is to grab drm-tip, run
your existing userspace (hoping it hangs) and check that it no longer
oopses. Then reverse bisect to find the fix to backport.
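A reverse bisect is an ordinary bisection with the roles swapped: the old end of the history is known to oops, the new end is known to be fixed, and the search looks for the first fixed commit. The sketch below shows only the search logic, with a hypothetical commit list and test predicate. In practice git can do the bookkeeping; `git bisect start --term-old=broken --term-new=fixed` is one way to label the two ends.

```python
def first_fixed(commits, is_fixed):
    """Return the first commit for which is_fixed() holds.

    Assumes commits are ordered oldest to newest, the oldest is broken,
    the newest is fixed, and the fix is never reverted in between.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: the oops stops reproducing from commit "c5" onwards.
history = ["c%d" % i for i in range(10)]
fixed_from = history.index("c5")
print(first_fixed(history, lambda c: history.index(c) >= fixed_from))
```

Because the hang is intermittent, a "no oops" result at a given commit is weaker evidence than an oops, so each step may need several reproduction attempts before it is marked as fixed.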
-Chris
* Re: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
2017-10-19 10:43 ` Bjørn Mork
2017-10-19 10:55 ` Bjørn Mork
@ 2017-10-19 10:56 ` Chris Wilson
1 sibling, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2017-10-19 10:56 UTC (permalink / raw)
To: Bjørn Mork; +Cc: intel-gfx
Quoting Bjørn Mork (2017-10-19 11:43:17)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Quoting Bjørn Mork (2017-10-19 11:24:57)
> >> Hello,
> >>
> >> I get these Oopses from time to time, but unfortunately(?) not often
> >> enough to be anywhere near reproducible. But they seem to be related to
> >> whatever activites my laptop/X-server/driver/gpu/screen is doing while
> >> I'm not present. The oops happens when I'm away for a while. So I guess
> >> it might be something related to screensaver and/or power saving
> >> actions.
> >>
> >> There is always a GPU HANG prior to the Oops, so these events are
> >> probably related.
> >
> > It is, but the oops implies that hw got lost along the way. You need a
> > more recent kernel to avoid that trap,
>
> OK. Would it be possible to backport those fixes to linux-4.9.y as a
> service to Debian users and other stable kernel users? Or did you mean a
> newer 4.9.y kernel?
Take 4.14 and patch the Makefile. ;) Hmm, now that I think about it, the
preventative rewrite may not land until 4.15.
-Chris