"BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring

* "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring
@ 2017-10-19 10:24 Bjørn Mork
  2017-10-19 10:31 ` Bjørn Mork
  2017-10-19 10:36 ` Chris Wilson
  0 siblings, 2 replies; 7+ messages in thread
From: Bjørn Mork @ 2017-10-19 10:24 UTC
  To: intel-gfx

Hello,

I get these Oopses from time to time, but unfortunately(?) not often
enough to be anywhere near reproducible.  They seem to be related to
whatever activities my laptop/X-server/driver/GPU/screen is doing while
I'm not present: the Oops happens when I have been away for a while.  So
I guess it might be related to screensaver and/or power-saving actions.

There is always a GPU HANG prior to the Oops, so the two events are
probably related.  The log below shows the hang, a series of chip
resets, and then the Oops in reset_common_ring:

<6>[ 3925.798843] [drm] GPU HANG: ecode 9:0:0xfffffffe, in Xorg [850], reason: Hang on render ring, action: reset
<6>[ 3925.798851] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[ 3925.798854] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[ 3925.798857] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[ 3925.798860] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[ 3925.798863] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5>[ 3925.798923] drm/i915: Resetting chip after gpu hang
<6>[ 3925.798995] [drm] RC6 on
<6>[ 3925.816730] [drm] GuC firmware load skipped
<5>[ 3945.765299] drm/i915: Resetting chip after gpu hang
<6>[ 3945.765773] [drm] RC6 on
<6>[ 3945.782092] [drm] GuC firmware load skipped
<4>[ 3950.942974] e1000e 0000:00:1f.6: Failed to restore TIMINCA clock rate delta: -22
<5>[ 3967.781348] drm/i915: Resetting chip after gpu hang
<6>[ 3967.784013] [drm] RC6 on
<6>[ 3967.801547] [drm] GuC firmware load skipped
<5>[ 3987.781060] drm/i915: Resetting chip after gpu hang
<6>[ 3987.781148] [drm] RC6 on
<6>[ 3987.797332] [drm] GuC firmware load skipped
<5>[ 4005.796949] drm/i915: Resetting chip after gpu hang
<6>[ 4005.797031] [drm] RC6 on
<6>[ 4005.813929] [drm] GuC firmware load skipped
<5>[ 4023.780914] drm/i915: Resetting chip after gpu hang
<6>[ 4023.782354] [drm] RC6 on
<6>[ 4023.795459] [drm] GuC firmware load skipped
<5>[ 4046.788711] drm/i915: Resetting chip after gpu hang
<6>[ 4046.788806] [drm] RC6 on
<6>[ 4046.805294] [drm] GuC firmware load skipped
<5>[ 4064.772580] drm/i915: Resetting chip after gpu hang
<6>[ 4064.772670] [drm] RC6 on
<6>[ 4064.789342] [drm] GuC firmware load skipped
<5>[ 4080.772471] drm/i915: Resetting chip after gpu hang
<6>[ 4080.772563] [drm] RC6 on
<6>[ 4080.789200] [drm] GuC firmware load skipped
<5>[ 4095.780392] drm/i915: Resetting chip after gpu hang
<6>[ 4095.780501] [drm] RC6 on
<6>[ 4095.794800] [drm] GuC firmware load skipped
<5>[ 4109.796310] drm/i915: Resetting chip after gpu hang
<6>[ 4109.796401] [drm] RC6 on
<6>[ 4109.813305] [drm] GuC firmware load skipped
<5>[ 4126.788181] drm/i915: Resetting chip after gpu hang
<6>[ 4126.788276] [drm] RC6 on
<6>[ 4126.804593] [drm] GuC firmware load skipped
<5>[ 4143.780147] drm/i915: Resetting chip after gpu hang
<6>[ 4143.782293] [drm] RC6 on
<6>[ 4143.799046] [drm] GuC firmware load skipped
<5>[ 4162.787931] drm/i915: Resetting chip after gpu hang
<6>[ 4162.788409] [drm] RC6 on
<6>[ 4162.804360] [drm] GuC firmware load skipped
<5>[ 4175.779781] drm/i915: Resetting chip after gpu hang
<6>[ 4175.779865] [drm] RC6 on
<6>[ 4175.796174] [drm] GuC firmware load skipped
<5>[ 4196.771643] drm/i915: Resetting chip after gpu hang
<6>[ 4196.773680] [drm] RC6 on
<6>[ 4196.785992] [drm] GuC firmware load skipped
<5>[ 4226.787780] drm/i915: Resetting chip after gpu hang
<6>[ 4226.788233] [drm] RC6 on
<6>[ 4226.804266] [drm] GuC firmware load skipped
<5>[ 4241.795725] drm/i915: Resetting chip after gpu hang
<6>[ 4241.796153] [drm] RC6 on
<6>[ 4241.810190] [drm] GuC firmware load skipped
<5>[ 4261.795634] drm/i915: Resetting chip after gpu hang
<6>[ 4261.798342] [drm] RC6 on
<6>[ 4261.816858] [drm] GuC firmware load skipped
<5>[ 4284.803333] drm/i915: Resetting chip after gpu hang
<6>[ 4284.803784] [drm] RC6 on
<6>[ 4284.817656] [drm] GuC firmware load skipped
<5>[ 4296.803264] drm/i915: Resetting chip after gpu hang
<6>[ 4296.803717] [drm] RC6 on
<6>[ 4296.822146] [drm] GuC firmware load skipped
<5>[ 4313.794990] drm/i915: Resetting chip after gpu hang
<6>[ 4313.795068] [drm] RC6 on
<6>[ 4313.811487] [drm] GuC firmware load skipped
<5>[ 4322.787613] drm/i915: Resetting chip after gpu hang
<1>[ 4322.787674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
<1>[ 4322.787759] IP: [<ffffffffc08c41a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[ 4322.787875] PGD 0 
<4>[ 4322.787895] 
<4>[ 4322.787916] Oops: 0000 [#1] SMP
<4>[ 4322.787947] Modules linked in: rfcomm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_conexant kvm_intel snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp sparse_keymap snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress efi_pstore btusb btrtl btbcm cdc_mbim btintel cdc_wdm(O) bluetooth cdc_ncm iwlmvm usbnet qcserial mii usb_wwan usbserial snd_hda_intel mac80211 snd_hda_codec intel_uncore intel_rapl_perf evdev joydev serio_raw snd_hda_core efivars snd_hwdep snd_pcm iTCO_wdt snd_timer
 intel_ish_ipc usb_common intel_ishtp thermal
<4>[ 4322.789722] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G           O    4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4322.789806] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
<4>[ 4322.789906] Workqueue: events_long i915_hangcheck_elapsed [i915]
<4>[ 4322.789963] task: ffff9162993970c0 task.stack: ffffa952082e4000
<4>[ 4322.790037] RIP: 0010:[<ffffffffc08c41a3>]  [<ffffffffc08c41a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[ 4322.790245] RSP: 0018:ffffa952082e7b70  EFLAGS: 00010202
<4>[ 4322.790334] RAX: 0000000000000202 RBX: ffff9162988b9a80 RCX: 0000000000000001
<4>[ 4322.790399] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
<4>[ 4322.790461] RBP: ffffa952082e7b90 R08: ffff9162eb9f08c8 R09: 0000000000000000
<4>[ 4322.790523] R10: ffff9162f0880800 R11: ffffffff968ad46d R12: ffff9162eb9f2860
<4>[ 4322.790586] R13: 0000000000000000 R14: ffff9162eb9f0000 R15: ffff9162d77d7000
<4>[ 4322.790653] FS:  0000000000000000(0000) GS:ffff916301480000(0000) knlGS:0000000000000000
<4>[ 4322.790723] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4322.790775] CR2: 0000000000000070 CR3: 0000000429207000 CR4: 00000000003406e0
<4>[ 4322.790837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4322.790900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 4322.790961] Stack:
<4>[ 4322.790982]  ffff9162eb9f2860 ffff9162eb9f2ad8 ffff9162eb9f8760 ffff9162eb9f0000
<4>[ 4322.791063]  ffff9162988b9a80 ffffffffc08afacc 0000000000000001 ffff9162eb9f0000
<4>[ 4322.791202]  ffff9162eb9fa780 ffff9162edf68410 ffffffff960045b0 ffff9162eb9fa780
<4>[ 4322.791326] Call Trace:
<4>[ 4322.791395]  [<ffffffffc08afacc>] ? i915_gem_reset+0x14c/0x240 [i915]
<4>[ 4322.791457]  [<ffffffff960045b0>] ? bit_wait_io+0x60/0x60
<4>[ 4322.791532]  [<ffffffffc08750f6>] ? i915_reset+0x86/0xd0 [i915]
<4>[ 4322.791616]  [<ffffffffc0879f95>] ? i915_reset_and_wakeup+0x165/0x180 [i915]
<4>[ 4322.791707]  [<ffffffffc087deda>] ? i915_handle_error+0x10a/0x5f0 [i915]
<4>[ 4322.791794]  [<ffffffffc087e60a>] ? i915_hangcheck_elapsed+0x24a/0x520 [i915]
<4>[ 4322.791838]  [<ffffffff95a90444>] ? process_one_work+0x184/0x410
<4>[ 4322.791859]  [<ffffffff95a9071d>] ? worker_thread+0x4d/0x480
<4>[ 4322.791877]  [<ffffffff95a906d0>] ? process_one_work+0x410/0x410
<4>[ 4322.791896]  [<ffffffff95a7bb2a>] ? do_group_exit+0x3a/0xa0
<4>[ 4322.791915]  [<ffffffff95a96697>] ? kthread+0xd7/0xf0
<4>[ 4322.791932]  [<ffffffff95a965c0>] ? kthread_park+0x60/0x60
<4>[ 4322.791950]  [<ffffffff96008835>] ? ret_from_fork+0x25/0x30
<4>[ 4322.791967] Code: 41 5e 5d c3 41 8b 44 24 28 b9 01 00 00 00 ba 00 00 ff ff 4c 89 f7 8d b0 a0 03 00 00 41 ff 96 68 07 00 00 4d 8b ac 24 38 02 00 00 <49> 8b 45 70 48 39 43 70 74 51 4d 85 ed 74 14 48 c7 c0 e0 a3 e9 
<1>[ 4322.792103] RIP  [<ffffffffc08c41a3>] reset_common_ring+0xc3/0x170 [i915]
<4>[ 4322.794181]  RSP <ffffa952082e7b70>
<4>[ 4322.796111] CR2: 0000000000000070
<4>[ 4322.810060] ---[ end trace 11d170a6d0542763 ]---
<4>[ 4323.017002] general protection fault: 0000 [#2] SMP
<4>[ 4323.019218] Modules linked in: rfcomm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_conexant kvm_intel snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp sparse_keymap snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress efi_pstore btusb btrtl btbcm cdc_mbim btintel cdc_wdm(O) bluetooth cdc_ncm iwlmvm usbnet qcserial mii usb_wwan usbserial snd_hda_intel mac80211 snd_hda_codec intel_uncore intel_rapl_perf evdev joydev serio_raw snd_hda_core efivars snd_hwdep snd_pcm iTCO_wdt snd_timer
 intel_ish_ipc usb_common intel_ishtp thermal
<4>[ 4323.029380] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G      D    O    4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4323.031153] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
<4>[ 4323.032875] task: ffff9162993970c0 task.stack: ffffa952082e4000
<4>[ 4323.034578] RIP: 0010:[<ffffffff95ab88a5>]  [<ffffffff95ab88a5>] __wake_up_common+0x25/0x80
<4>[ 4323.036325] RSP: 0018:ffffa952082e7e70  EFLAGS: 00010002
<4>[ 4323.038024] RAX: 0000000000000282 RBX: ffffa952082e7f10 RCX: 0000000000000000
<4>[ 4323.039784] RDX: 954099b4e34ffa70 RSI: 0000000000000003 RDI: ffffa952082e7f10
<4>[ 4323.041497] RBP: ffffa952082e7f18 R08: 0000000000000000 R09: ffff9162ea4ff100
<4>[ 4323.043241] R10: 0000000002f41000 R11: 00000000c5672a10 R12: 0000000000000282
<4>[ 4323.044945] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000046
<4>[ 4323.046637] FS:  0000000000000000(0000) GS:ffff916301480000(0000) knlGS:0000000000000000
<4>[ 4323.048379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4323.050071] CR2: 0000000000000028 CR3: 000000042d9e9000 CR4: 00000000003406e0
<4>[ 4323.051807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4323.053495] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 4323.055238] Stack:
<4>[ 4323.056927]  00000001e34ffa70 ffffa952082e7f10 ffffa952082e7f08 0000000000000282
<4>[ 4323.058600]  0000000000000000 0000000000000001 0000000000000046 ffffffff95ab92f1
<4>[ 4323.060307]  ffff9162993977d8 ffff9162993970c0 0000000000000000 ffffffff95a74230
<4>[ 4323.061979] Call Trace:
<4>[ 4323.063658]  [<ffffffff95ab92f1>] ? complete+0x31/0x40
<4>[ 4323.065310]  [<ffffffff95a74230>] ? mm_release+0xb0/0x130
<4>[ 4323.067011]  [<ffffffff95a7b120>] ? do_exit+0x150/0xae0
<4>[ 4323.068671]  [<ffffffff96009d97>] ? rewind_stack_do_exit+0x17/0x20
<4>[ 4323.070303] Code: 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 8d 6f 08 48 83 ec 08 89 54 24 04 48 8b 57 08 48 39 d5 <48> 8b 32 74 43 48 8d 42 e8 4c 8d 7e e8 41 89 cd 4d 89 c4 8b 18 
<1>[ 4323.072102] RIP  [<ffffffff95ab88a5>] __wake_up_common+0x25/0x80
<4>[ 4323.073792]  RSP <ffffa952082e7e70>
<4>[ 4323.075494] ---[ end trace 11d170a6d0542764 ]---
<1>[ 4323.262525] Fixing recursive fault but reboot is needed!
<1>[ 4323.262555] BUG: unable to handle kernel paging request at ffffffffffffffd8
<1>[ 4323.264262] IP: [<ffffffff95a970fc>] kthread_data+0xc/0x20
<4>[ 4323.265945] PGD 42920a067 
<4>[ 4323.265951] PUD 42920c067 
<4>[ 4323.267651] PMD 0 
<4>[ 4323.267653] 
<4>[ 4323.269343] Oops: 0000 [#3] SMP
<4>[ 4323.271053] Modules linked in: rfcomm ipt_REJECT nf_reject_ipv4 iptable_filter xt_set ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc cmac tun bnep binfmt_misc nls_ascii nls_cp437 vfat fat arc4 snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_conexant kvm_intel snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp sparse_keymap snd_hda_ext_core snd_soc_sst_match ghash_clmulni_intel snd_soc_core intel_cstate snd_compress efi_pstore btusb btrtl btbcm cdc_mbim btintel cdc_wdm(O) bluetooth cdc_ncm iwlmvm usbnet qcserial mii usb_wwan usbserial snd_hda_intel mac80211 snd_hda_codec intel_uncore intel_rapl_perf evdev joydev serio_raw snd_hda_core efivars snd_hwdep snd_pcm iTCO_wdt snd_timer
<4>[ 4323.274686]  iTCO_vendor_support i915 rtsx_pci_ms memstick drm_kms_helper thinkpad_acpi drm iwlwifi hid_sensor_accel_3d hid_sensor_trigger mei_me hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf uvcvideo cfg80211 mei intel_pch_thermal industrialio i2c_algo_bit wmi videobuf2_vmalloc videobuf2_memops nvram videobuf2_v4l2 shpchp snd videobuf2_core soundcore rfkill tpm_crb videodev battery ac media video button parport_pc ppdev lp parport sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache crc32c_generic hid_sensor_custom hid_sensor_hub intel_ishtp_hid hid rtsx_pci_sdmmc mmc_core crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd rtsx_pci psmouse mfd_core e1000e ptp pps_core i2c_i801 i2c_smbus nvme xhci_pci nvme_core xhci_hcd usbcore intel_ish_ipc usb_common intel_ishtp thermal
<4>[ 4323.282175] CPU: 1 PID: 3039 Comm: kworker/1:0 Tainted: G      D    O    4.9.0-4-amd64 #1 Debian 4.9.51-1
<4>[ 4323.284056] Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET57W (1.31 ) 09/29/2017
<4>[ 4323.285873] task: ffff9162993970c0 task.stack: ffffa952082e4000
<4>[ 4323.287699] RIP: 0010:[<ffffffff95a970fc>]  [<ffffffff95a970fc>] kthread_data+0xc/0x20
<4>[ 4323.289438] RSP: 0018:ffffa952082e7e70  EFLAGS: 00010002
<4>[ 4323.291164] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
<4>[ 4323.292827] RDX: ffff9162f0806700 RSI: ffff916299397140 RDI: ffff9162993970c0
<4>[ 4323.294485] RBP: ffffa952082e7ec8 R08: ffff916299397168 R09: 000000000000cc00
<4>[ 4323.296157] R10: 0000000000000000 R11: ffff916299397140 R12: ffff916301498240
<4>[ 4323.297789] R13: ffff9162993970c0 R14: ffff916299397678 R15: 0000000000000046
<4>[ 4323.299457] FS:  0000000000000000(0000) GS:ffff916301480000(0000) knlGS:0000000000000000
<4>[ 4323.301085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4323.302745] CR2: 0000000000000028 CR3: 000000042d9e9000 CR4: 00000000003406e0
<4>[ 4323.304390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4323.306006] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 4323.307661] Stack:
<4>[ 4323.309279]  ffffffff95a9162a ffffffff96003b48 ffffa95200000008 00ffa952082e7ee8
<4>[ 4323.310939]  ffff916301498240 954099b4e34ffa70 ffff9162993970c0 ffffa952082e7dc8
<4>[ 4323.312577]  0000000000000000 0000000000000003 0000000000000046 000000000000000b
<4>[ 4323.314203] Call Trace:
<4>[ 4323.315857]  [<ffffffff95a9162a>] ? wq_worker_sleeping+0xa/0x80
<4>[ 4323.317499]  [<ffffffff96003b48>] ? __schedule+0x498/0x6d0
<4>[ 4323.319165]  [<ffffffff96003db2>] ? schedule+0x32/0x80
<4>[ 4323.320787]  [<ffffffff95a7b889>] ? do_exit+0x8b9/0xae0
<4>[ 4323.322406]  [<ffffffff96009d97>] ? rewind_stack_do_exit+0x17/0x20
<4>[ 4323.324070] Code: c0 0f 85 50 ff ff ff eb ab e8 d1 9f 04 00 e9 a3 fe ff ff 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 58 05 00 00 <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 
<1>[ 4323.325805] RIP  [<ffffffff95a970fc>] kthread_data+0xc/0x20
<4>[ 4323.327507]  RSP <ffffa952082e7e70>
<4>[ 4323.329176] CR2: ffffffffffffffd8
<4>[ 4323.330878] ---[ end trace 11d170a6d0542765 ]---
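
To see how the resets are spaced ahead of the Oops, a log like the one
above can be summarised with a short script.  This is only a minimal
sketch: it assumes the log is saved in the raw "<level>[ timestamp] message"
form shown here, and parse_kmsg() and reset_summary() are hypothetical
helper names, not part of any kernel tooling.

```python
import re

# Matches raw kernel log lines of the form "<5>[ 3925.798923] message".
LINE_RE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_kmsg(text):
    """Return (level, seconds, message) tuples for each recognised line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return events


def reset_summary(events):
    """Collect reset timestamps, the gaps between them, and the first BUG."""
    resets = [t for _, t, msg in events
              if "Resetting chip after gpu hang" in msg]
    gaps = [round(b - a, 3) for a, b in zip(resets, resets[1:])]
    oops = next((t for _, t, msg in events if msg.startswith("BUG:")), None)
    return resets, gaps, oops


# A few lines copied from the log above, used as sample input.
sample = """\
<5>[ 4296.803264] drm/i915: Resetting chip after gpu hang
<6>[ 4296.803717] [drm] RC6 on
<5>[ 4313.794990] drm/i915: Resetting chip after gpu hang
<5>[ 4322.787613] drm/i915: Resetting chip after gpu hang
<1>[ 4322.787674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
"""

resets, gaps, oops = reset_summary(parse_kmsg(sample))
print("resets:", len(resets), "gaps (s):", gaps, "first BUG at:", oops)
```

Feeding it the complete log instead of the sample gives the interval
between every pair of resets and the timestamp of the first BUG line.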

Bjørn
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
