* Re: kernel 3.11.6 general protection fault
@ 2013-11-13 19:58 MPhil. Emanoil Kotsev
  2013-11-13 20:09 ` [Intel-gfx] " Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-13 19:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, intel-gfx

(Sorry, it replies automatically only to the sender; I have now added the list.)

What do the intel-gfx people think?

====== original mail follows =======

Hi,

sorry for bothering you once again.

I noticed most of the issues are coming from drm (I have the stupid "Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)").

So I checked the logs again today and found that it crashed in the morning when I turned on the notebook in the office.

Is there something you can conclude from the trace below? And another question: why is it checking CRTC when I have LVDS, VGA1 and DVI1, and am actually using only the LVDS and DVI outputs?

Thanks again for taking your time.

Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut here ]------------
Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: 4142 at drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915] ()
Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't match!
Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: snd_hrtimer acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet ipv6 firewire_sbp2 snd _hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_eve nt snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy mac8 0211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket pcmcia_rsrc irda 8250 evdev wmi processor dcdbas rtc_cmos battery crc_ccitt ac joydev sha256_ ssse3 sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg b44 sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core uhci_ hcd i2c_algo_bit cfbcopyarea firewire_ohci video backlight firewire_core crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys freq_table usbcore usb_common button intel_agp intel_gtt agpgart Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg Tainted: P 3.11.6eko2 #3 Nov 13 09:36:21 maistor kernel: [ 40.447386] Hardware name: Dell Inc. Latitude D520 /0NF743, BIOS A04 12/18/2006 Nov 13 09:36:21 maistor kernel: [ 40.447388] 0000000000000000 0000000000000009 ffffffff813ce8ab ffff880079c8f888 Nov 13 09:36:21 maistor kernel: [ 40.447392] ffffffff81038001 ffff88007a2596d8 ffff880079c8f900 ffff880037f3a000 Nov 13 09:36:21 maistor kernel: [ 40.447395] 0000000000000001 ffff880037f3a488 ffffffff810380e5 ffffffffa0295531 Nov 13 09:36:21 maistor kernel: [ 40.447398] Call Trace: Nov 13 09:36:21 maistor kernel: [ 40.447407] [<ffffffff813ce8ab>] ? dump_stack+0x41/0x51 Nov 13 09:36:21 maistor kernel: [ 40.447412] [<ffffffff81038001>] ? warn_slowpath_common+0x81/0xb0 Nov 13 09:36:21 maistor kernel: [ 40.447415] [<ffffffff810380e5>] ? warn_slowpath_fmt+0x45/0x50 Nov 13 09:36:21 maistor kernel: [ 40.447427] [<ffffffffa024338f>] ? 
check_crtc_state+0x5cf/0xa60 [i915] Nov 13 09:36:21 maistor kernel: [ 40.447440] [<ffffffffa024db7d>] ? intel_modeset_check_state+0x2bd/0x730 [i915] Nov 13 09:36:21 maistor kernel: [ 40.447445] [<ffffffff811ec219>] ? snprintf+0x39/0x40 Nov 13 09:36:21 maistor kernel: [ 40.447456] [<ffffffffa024e05d>] ? intel_set_mode+0x1d/0x30 [i915] Nov 13 09:36:21 maistor kernel: [ 40.447467] [<ffffffffa024e81a>] ? intel_crtc_set_config+0x7aa/0x980 [i915] Nov 13 09:36:21 maistor kernel: [ 40.447481] [<ffffffffa00f9155>] ? drm_mode_set_config_internal+0x55/0xd0 [drm] Nov 13 09:36:21 maistor kernel: [ 40.447490] [<ffffffffa00fb118>] ? drm_mode_setcrtc+0x118/0x640 [drm] Nov 13 09:36:21 maistor kernel: [ 40.447497] [<ffffffffa00ec11d>] ? drm_ioctl+0x4ed/0x5f0 [drm] Nov 13 09:36:21 maistor kernel: [ 40.447507] [<ffffffffa00fb000>] ? drm_mode_setplane+0x3a0/0x3a0 [drm] Nov 13 09:36:21 maistor kernel: [ 40.447512] [<ffffffff8111428b>] ? do_vfs_ioctl+0x8b/0x520 Nov 13 09:36:21 maistor kernel: [ 40.447515] [<ffffffff8111476d>] ? SyS_ioctl+0x4d/0xa0 Nov 13 09:36:21 maistor kernel: [ 40.447519] [<ffffffff813d4c56>] ? system_call_fastpath+0x1a/0x1f Nov 13 09:36:21 maistor kernel: [ 40.447521] ---[ end trace 307df46ce6dc8ed1 ]--- ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault 2013-11-13 19:58 kernel 3.11.6 general protection fault MPhil. Emanoil Kotsev @ 2013-11-13 20:09 ` Daniel Vetter 2013-11-13 20:33 ` Borislav Petkov 0 siblings, 1 reply; 13+ messages in thread From: Daniel Vetter @ 2013-11-13 20:09 UTC (permalink / raw) To: MPhil. Emanoil Kotsev; +Cc: Borislav Petkov, intel-gfx, linux-kernel On Wed, Nov 13, 2013 at 08:58:29PM +0100, MPhil. Emanoil Kotsev wrote: > (sorry it replys automaticaly only to the sender - now added the list) > > What do the intel-gfx people think? > > ====== original mail follows ======= > Hi sorry for bothering you once again. > > I noticed most of the issues are coming from drm (I have the stupid "Intel > Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics > Controller (rev 03)") > > So I checked today the logs again and found out it crashed in the mornign when > turning on the notebook in the office. > > Is there something you can conclude from the trace below and another > question - why is it checking CRTC as I have LVDS, VGA1 and DVI1 - actually > using only the LVDS and DVI outputs > > Thanks again for taking your time Testing on latest drm-intel-nightly from http://cgit.freedesktop.org/~danvet/drm-intel/ If that doesn't help then please boot with drm.debug=0xe, reproduce the issue and then attach the complete dmesg. Please make sure everything starting from boot messages is in there, increase the dmesg buffer with log_buf_len=4M or so if that isn't the case. -Daniel > > > > Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut > here ]------------ > Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: 4142 at > drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915] > () > Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't match! 
> Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: snd_hrtimer > acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet ipv6 firewire_sbp2 > snd > _hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss > snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi > snd_seq_midi_eve > nt snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich > iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy > mac8 > 0211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket pcmcia_rsrc > irda 8250 evdev wmi processor dcdbas rtc_cmos battery crc_ccitt ac joydev > sha256_ > ssse3 sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg b44 > sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core > uhci_ > hcd i2c_algo_bit cfbcopyarea firewire_ohci video backlight firewire_core > crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys > freq_table > usbcore usb_common button intel_agp intel_gtt agpgart > Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg > Tainted: P 3.11.6eko2 #3 > Nov 13 09:36:21 maistor kernel: [ 40.447386] Hardware name: Dell Inc. > Latitude D520 /0NF743, BIOS A04 12/18/2006 > Nov 13 09:36:21 maistor kernel: [ 40.447388] 0000000000000000 > 0000000000000009 ffffffff813ce8ab ffff880079c8f888 > Nov 13 09:36:21 maistor kernel: [ 40.447392] ffffffff81038001 > ffff88007a2596d8 ffff880079c8f900 ffff880037f3a000 > Nov 13 09:36:21 maistor kernel: [ 40.447395] 0000000000000001 > ffff880037f3a488 ffffffff810380e5 ffffffffa0295531 > Nov 13 09:36:21 maistor kernel: [ 40.447398] Call Trace: > Nov 13 09:36:21 maistor kernel: [ 40.447407] [<ffffffff813ce8ab>] ? > dump_stack+0x41/0x51 > Nov 13 09:36:21 maistor kernel: [ 40.447412] [<ffffffff81038001>] ? > warn_slowpath_common+0x81/0xb0 > Nov 13 09:36:21 maistor kernel: [ 40.447415] [<ffffffff810380e5>] ? 
> warn_slowpath_fmt+0x45/0x50 > Nov 13 09:36:21 maistor kernel: [ 40.447427] [<ffffffffa024338f>] ? > check_crtc_state+0x5cf/0xa60 [i915] > Nov 13 09:36:21 maistor kernel: [ 40.447440] [<ffffffffa024db7d>] ? > intel_modeset_check_state+0x2bd/0x730 [i915] > Nov 13 09:36:21 maistor kernel: [ 40.447445] [<ffffffff811ec219>] ? > snprintf+0x39/0x40 > Nov 13 09:36:21 maistor kernel: [ 40.447456] [<ffffffffa024e05d>] ? > intel_set_mode+0x1d/0x30 [i915] > Nov 13 09:36:21 maistor kernel: [ 40.447467] [<ffffffffa024e81a>] ? > intel_crtc_set_config+0x7aa/0x980 [i915] > Nov 13 09:36:21 maistor kernel: [ 40.447481] [<ffffffffa00f9155>] ? > drm_mode_set_config_internal+0x55/0xd0 [drm] > Nov 13 09:36:21 maistor kernel: [ 40.447490] [<ffffffffa00fb118>] ? > drm_mode_setcrtc+0x118/0x640 [drm] > Nov 13 09:36:21 maistor kernel: [ 40.447497] [<ffffffffa00ec11d>] ? > drm_ioctl+0x4ed/0x5f0 [drm] > Nov 13 09:36:21 maistor kernel: [ 40.447507] [<ffffffffa00fb000>] ? > drm_mode_setplane+0x3a0/0x3a0 [drm] > Nov 13 09:36:21 maistor kernel: [ 40.447512] [<ffffffff8111428b>] ? > do_vfs_ioctl+0x8b/0x520 > Nov 13 09:36:21 maistor kernel: [ 40.447515] [<ffffffff8111476d>] ? > SyS_ioctl+0x4d/0xa0 > Nov 13 09:36:21 maistor kernel: [ 40.447519] [<ffffffff813d4c56>] ? > system_call_fastpath+0x1a/0x1f > Nov 13 09:36:21 maistor kernel: [ 40.447521] ---[ end trace > 307df46ce6dc8ed1 ]--- > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault 2013-11-13 20:09 ` [Intel-gfx] " Daniel Vetter @ 2013-11-13 20:33 ` Borislav Petkov 2013-11-13 21:19 ` MPhil. Emanoil Kotsev 2013-11-17 11:35 ` MPhil. Emanoil Kotsev 0 siblings, 2 replies; 13+ messages in thread From: Borislav Petkov @ 2013-11-13 20:33 UTC (permalink / raw) To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter Some more suggestions, in addition to Daniel's: On Wed, Nov 13, 2013 at 09:09:14PM +0100, Daniel Vetter wrote: > > Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut > > here ]------------ > > Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: 4142 at > > drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915] > > () > > Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't match! That's if (active && !intel_pipe_config_compare(dev, &crtc->config, &pipe_config)) { WARN(1, "pipe state doesn't match!\n"); <--- intel_dump_pipe_config(crtc, &pipe_config, "[hw state]"); intel_dump_pipe_config(crtc, &crtc->config, "[sw state]"); } > > Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: snd_hrtimer > > acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet ipv6 firewire_sbp2 > > snd > > _hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss > > snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi > > snd_seq_midi_eve > > nt snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich > > iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy > > mac8 > > 0211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket pcmcia_rsrc > > irda 8250 evdev wmi processor dcdbas rtc_cmos battery crc_ccitt ac joydev > > sha256_ > > ssse3 sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg b44 > > sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core > > uhci_ > > hcd i2c_algo_bit cfbcopyarea firewire_ohci video 
backlight firewire_core > > crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys > > freq_table > > usbcore usb_common button intel_agp intel_gtt agpgart > > Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg > > Tainted: P 3.11.6eko2 #3 And there's that taint P again due to the vmware modules. I know that you tried without the vmware modules where your kernel wasn't tainted but then you got a #GP which could be something entirely different. But now you're hitting some sanity-checking code which could mean there's some corruption happening. So, can you reproduce that exact same warning, i.e. this one: WARNING: CPU: 1 PID: 4142 at drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915]() pipe state doesn't match! *without* the vmware modules installed? Also, it wouldn't hurt to try the shiny new 3.12. HTH. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 13+ messages in thread
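Editor's note: the "Tainted:" field discussed above can also be read as a bitmask from /proc/sys/kernel/tainted. The following is a minimal sketch, not part of the original thread, that decodes only the four flags appearing in these logs (P, D, W, O). The bit positions are taken from the kernel's tainted-kernels documentation; the function name decode_taint is hypothetical.

```shell
#!/bin/sh
# decode_taint MASK: print the letters for the taint bits seen in this
# thread. All other bits are deliberately ignored in this sketch.
#   bit 0  -> P (proprietary module loaded)
#   bit 7  -> D (kernel died recently: oops or BUG)
#   bit 9  -> W (a warning was issued)
#   bit 12 -> O (out-of-tree module loaded)
decode_taint() {
    mask=$1
    out=""
    for pair in 0:P 7:D 9:W 12:O; do
        bit=${pair%%:*}
        letter=${pair##*:}
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$letter"
        fi
    done
    printf '%s\n' "$out"
}

# Usage on a live system:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```

A value of 0 prints an empty line, meaning none of the four flags is set, which is the state Boris asks for when he suggests reproducing the warning without the vmware modules.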
* Re: [Intel-gfx] kernel 3.11.6 general protection fault 2013-11-13 20:33 ` Borislav Petkov @ 2013-11-13 21:19 ` MPhil. Emanoil Kotsev 2013-11-17 11:35 ` MPhil. Emanoil Kotsev 1 sibling, 0 replies; 13+ messages in thread From: MPhil. Emanoil Kotsev @ 2013-11-13 21:19 UTC (permalink / raw) To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter Hi On Wednesday 13 November 2013 21:33:19 Borislav Petkov wrote: > Some more suggestions, in addition to Daniel's: > > On Wed, Nov 13, 2013 at 09:09:14PM +0100, Daniel Vetter wrote: > > > Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut > > > here ]------------ > > > Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: > > > 4142 at drivers/gpu/drm/i915/intel_display.c:8292 > > > check_crtc_state+0x5cf/0xa60 [i915] () > > > Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't > > > match! > > That's > > if (active && > !intel_pipe_config_compare(dev, &crtc->config, &pipe_config)) { > WARN(1, "pipe state doesn't match!\n"); <--- > intel_dump_pipe_config(crtc, &pipe_config, > "[hw state]"); > intel_dump_pipe_config(crtc, &crtc->config, > "[sw state]"); > } > I looked there, but it would have taken more time then available to get an idea on what it is exactly trying to do > > > Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: > > > snd_hrtimer acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet > > > ipv6 firewire_sbp2 snd > > > _hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss > > > snd_mixer_oss snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss > > > snd_seq_midi > > > snd_seq_midi_eve > > > nt snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd > > > gpio_ich iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support > > > ehci_pci iwlegacy mac8 > > > 0211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket > > > pcmcia_rsrc irda 8250 evdev wmi processor dcdbas rtc_cmos battery > > > crc_ccitt ac joydev sha256_ > > > ssse3 
sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg
> > > b44 sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia
> > > pcmcia_core uhci_
> > > hcd i2c_algo_bit cfbcopyarea firewire_ohci video backlight
> > > firewire_core crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core
> > > thermal thermal_sys freq_table
> > > usbcore usb_common button intel_agp intel_gtt agpgart
> > > Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm:
> > > Xorg Tainted: P 3.11.6eko2 #3
>
> And there's that taint P again due to the vmware modules.
>
> I know that you tried without the vmware modules where your kernel
> wasn't tainted but then you got a #GP which could be something entirely
> different. But now you're hitting some sanity-checking code which could
> mean there's some corruption happening.

Yes, with the #GP the machine locks up, and this time it didn't.

>
> So, can you reproduce that exact same warning, i.e. this one:
>
> WARNING: CPU: 1 PID: 4142 at drivers/gpu/drm/i915/intel_display.c:8292
> check_crtc_state+0x5cf/0xa60 [i915]() pipe state doesn't match!
>
> *without* the vmware modules installed?

I'm not sure; as you know, it happens randomly.

>
> Also, it wouldn't hurt to try the shiny new 3.12.

I was thinking of doing so, but let's be honest: I would save everybody's time if I were 100% sure it is a hardware issue, and then I would buy a new notebook. This one has served me well for the past 7 years and paid for itself a long time ago.

I could try 3.12, also in combination with the git drm-intel tree, as Daniel suggested. I still think that this has something to do with the graphics, but I am rather guessing from intuition.
I'm not sure if it helps somehow but when I grep as following I find only the tainted erros - it's not visible which of them were GP, but still it shows where it hit the issue zgrep 'Comm:' messages* | more messages:Nov 11 10:52:42 maistor kernel: [ 43.961984] CPU: 1 PID: 4103 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages:Nov 11 10:52:54 maistor kernel: [ 55.759687] CPU: 1 PID: 4103 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages:Nov 12 10:35:42 maistor kernel: [ 28.626271] CPU: 0 PID: 3895 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages:Nov 12 10:35:55 maistor kernel: [ 41.618447] CPU: 1 PID: 3895 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages:Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages:Nov 13 09:36:34 maistor kernel: [ 53.624754] CPU: 0 PID: 4142 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages.1:Nov 4 11:21:19 maistor kernel: [ 38.497643] CPU: 1 PID: 4104 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages.1:Nov 4 11:21:31 maistor kernel: [ 50.844193] CPU: 0 PID: 4104 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages.1:Nov 5 10:28:49 maistor kernel: [ 39.545474] CPU: 1 PID: 4253 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages.1:Nov 5 10:29:02 maistor kernel: [ 52.078761] CPU: 0 PID: 4253 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages.1:Nov 6 10:33:01 maistor kernel: [ 38.876587] CPU: 0 PID: 4128 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages.1:Nov 6 10:33:12 maistor kernel: [ 49.777082] CPU: 0 PID: 4128 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages.1:Nov 7 10:27:40 maistor kernel: [ 38.771546] CPU: 0 PID: 4110 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages.1:Nov 7 10:27:53 maistor kernel: [ 51.896606] CPU: 1 PID: 4110 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages.1:Nov 8 10:35:13 maistor kernel: [ 42.021333] CPU: 1 PID: 4224 Comm: Xorg Tainted: P 3.11.6eko2 #3 messages.1:Nov 8 10:35:22 maistor kernel: [ 51.699993] CPU: 0 PID: 4224 Comm: Xorg Tainted: P W 3.11.6eko2 #3 messages.2.gz:Oct 27 19:01:29 maistor kernel: CPU: 1 
PID: 6111 Comm: plugin-containe Tainted: P O 3.11.6eko2 #1 messages.2.gz:Oct 27 22:15:14 maistor kernel: CPU: 1 PID: 9024 Comm: plugin-containe Tainted: P O 3.11.6eko2 #1 messages.2.gz:Oct 28 10:33:34 maistor kernel: CPU: 0 PID: 4195 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.2.gz:Oct 28 10:33:43 maistor kernel: CPU: 0 PID: 4195 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.2.gz:Oct 29 10:34:29 maistor kernel: CPU: 1 PID: 4633 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.2.gz:Oct 29 10:34:41 maistor kernel: CPU: 1 PID: 4633 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.2.gz:Oct 30 10:30:55 maistor kernel: CPU: 1 PID: 4030 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.2.gz:Oct 30 10:31:06 maistor kernel: CPU: 0 PID: 4030 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.2.gz:Oct 31 10:51:01 maistor kernel: CPU: 1 PID: 6441 Comm: Xorg Tainted: P O 3.11.6eko2 #1 messages.2.gz:Oct 31 10:51:08 maistor kernel: CPU: 1 PID: 6441 Comm: Xorg Tainted: P W O 3.11.6eko2 #1 messages.2.gz:Nov 2 06:32:48 maistor kernel: CPU: 0 PID: 5925 Comm: Socket Thread Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 20 02:37:01 maistor kernel: CPU: 0 PID: 5952 Comm: Socket Thread Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 20 23:34:00 maistor kernel: CPU: 0 PID: 14534 Comm: plugin-containe Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 20 23:34:24 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 20 23:34:52 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 20 23:35:20 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 20 23:35:48 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 20 23:36:16 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 20 23:36:44 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 20 
23:37:12 maistor kernel: CPU: 1 PID: 14535 Comm: plugin-containe Tainted: P D 3.11.6eko2 #1 messages.3.gz:Oct 21 10:42:33 maistor kernel: CPU: 0 PID: 4002 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 21 10:42:45 maistor kernel: CPU: 1 PID: 4002 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.3.gz:Oct 22 11:24:20 maistor kernel: CPU: 1 PID: 4129 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 22 11:24:30 maistor kernel: CPU: 1 PID: 4129 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.3.gz:Oct 23 11:09:10 maistor kernel: CPU: 1 PID: 4197 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 23 11:09:18 maistor kernel: CPU: 1 PID: 4197 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.3.gz:Oct 24 11:00:12 maistor kernel: CPU: 0 PID: 3981 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 24 11:00:23 maistor kernel: CPU: 1 PID: 3981 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.3.gz:Oct 25 12:55:49 maistor kernel: CPU: 0 PID: 4564 Comm: Xorg Tainted: P 3.11.6eko2 #1 messages.3.gz:Oct 25 12:56:01 maistor kernel: CPU: 1 PID: 4564 Comm: Xorg Tainted: P W 3.11.6eko2 #1 messages.3.gz:Oct 26 21:57:03 maistor kernel: CPU: 0 PID: 30118 Comm: plugin-containe Tainted: P O 3.11.6eko2 #1 messages.3.gz:Oct 26 21:57:27 maistor kernel: CPU: 1 PID: 30117 Comm: plugin-containe Tainted: P D O 3.11.6eko2 #1 messages.3.gz:Oct 26 21:57:55 maistor kernel: CPU: 1 PID: 30117 Comm: plugin-containe Tainted: P D O 3.11.6eko2 #1 messages.3.gz:Oct 26 21:58:23 maistor kernel: CPU: 1 PID: 30117 Comm: plugin-containe Tainted: P D O 3.11.6eko2 #1 messages.3.gz:Oct 26 21:58:51 maistor kernel: CPU: 1 PID: 30117 Comm: plugin-containe Tainted: P D O 3.11.6eko2 #1 messages.3.gz:Oct 26 21:59:19 maistor kernel: CPU: 1 PID: 30117 Comm: plugin-containe Tainted: P D O 3.11.6eko2 #1 messages.4.gz:Oct 14 19:40:28 maistor kernel: CPU: 1 PID: 19163 Comm: konsole Tainted: P O 3.10.9eko2 #4 messages.4.gz:Oct 14 19:42:04 maistor kernel: CPU: 0 PID: 26225 Comm: wfica Tainted: P D O 3.10.9eko2 #4 
messages.4.gz:Oct 14 20:17:55 maistor kernel: CPU: 1 PID: 390 Comm: kswapd0 Tainted: P D O 3.10.9eko2 #4 messages.4.gz:Oct 15 20:16:45 maistor kernel: CPU: 0 PID: 4058 Comm: Xorg Tainted: P O 3.10.9eko2 #4 messages.4.gz:Oct 17 10:40:44 maistor kernel: CPU: 1 PID: 6417 Comm: plugin-containe Tainted: P O 3.10.9eko2 #4 messages.4.gz:Oct 17 12:42:09 maistor kernel: CPU: 1 PID: 390 Comm: kswapd0 Tainted: P D O 3.10.9eko2 #4 messages.4.gz:Oct 17 13:16:14 maistor kernel: CPU: 0 PID: 6108 Comm: kmix Tainted: P O 3.10.9eko2 #4 messages.4.gz:Oct 17 13:17:33 maistor kernel: CPU: 1 PID: 20690 Comm: udisks-daemon Tainted: P D O 3.10.9eko2 #4 messages.4.gz:Oct 17 13:17:33 maistor kernel: CPU: 1 PID: 20690 Comm: udisks-daemon Tainted: P D W O 3.10.9eko2 #4 messages.4.gz:Oct 17 13:56:58 maistor kernel: CPU: 1 PID: 13731 Comm: plugin-containe Tainted: P O 3.10.9eko2 #4 messages.4.gz:Oct 17 13:57:04 maistor kernel: CPU: 0 PID: 4494 Comm: Xorg Tainted: P W O 3.10.9eko2 #4 ^ permalink raw reply [flat|nested] 13+ messages in thread
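Editor's note: the zgrep listing above is easier to read once it is grouped by the process named in the "Comm:" field. The following is a minimal sketch, not part of the original thread; the function name count_by_comm is hypothetical, and it assumes every line of interest contains both "Comm:" and "Tainted:", as all lines in the listing do.

```shell
#!/bin/sh
# count_by_comm: read kernel log lines on stdin and print "<count> <comm>",
# most frequent first. The comm may contain spaces (e.g. "Socket Thread"),
# so it is taken as everything between "Comm: " and " Tainted:".
count_by_comm() {
    sed -n 's/.*Comm: \(.*\) Tainted:.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n, $0 }'
}

# Usage against the rotated logs from the message above:
#   zgrep -h 'Comm:' messages* | count_by_comm
```

Lines that lack a "Tainted:" field are silently skipped, so on an untainted kernel the pattern would need adjusting.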
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-13 20:33     ` Borislav Petkov
  2013-11-13 21:19       ` MPhil. Emanoil Kotsev
@ 2013-11-17 11:35       ` MPhil. Emanoil Kotsev
  2013-11-17 12:07         ` Borislav Petkov
  1 sibling, 1 reply; 13+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-17 11:35 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter

Hi,

I followed your advice and installed the 3.12 kernel (with no other modules on top that would taint the kernel, such as vmware/player).

It turned out that I have to enable /proc/acpi (deprecated) and acpi_cpufreq to get proper support for cooling and frequency scaling.

$ acpi -t
Thermal 0: ok, 50.5 degrees C

$ acpi -c
Cooling 0: Processor 0 of 10
Cooling 1: Processor 0 of 10
Cooling 2: LCD 3 of 7

$ lsmod | grep cpu
cpufreq_ondemand 8085 2
cpufreq_powersave 926 0
cpufreq_performance 930 0
cpufreq_conservative 6305 0
acpi_cpufreq 6955 0
processor 23167 3 acpi_cpufreq

After doing all of this I was able to reproduce the issue by overloading the system with the following simple steps:
1. start a compilation of something (e.g. the kernel)
2. run another processor-hungry application (flashplayer in firefox)
=> the system locks up in about 3-5 minutes

I also noticed that the board gets pretty hot, so in my opinion it locks up because of a thermal issue.

I think this would also explain why I see errors in different processes (mostly Xorg), but with 3.12 I do not get any trace message in the log files. Could you advise which option should be enabled in the kernel, or how I could log/trace when the system locks up?

How can I make sure that the cooling/temp works properly?

Perhaps after the upgrade in September the system is working under heavier load and therefore I started having the issue, or something broke in software or hardware and it cannot cool down properly. I don't think the kernel is the issue, because I had the same with older kernels that were working fine before.

The fan looks clean and there is no dust or anything else in the cooling area that would prevent cooling. The physical position of the notebook (docking station) also did not change.

I don't know where to look or where to start, so any advice is appreciated.

Thanks in advance and kind regards

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 11:35       ` MPhil. Emanoil Kotsev
@ 2013-11-17 12:07         ` Borislav Petkov
  2013-11-17 14:45           ` MPhil. Emanoil Kotsev
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2013-11-17 12:07 UTC (permalink / raw)
  To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter

On Sun, Nov 17, 2013 at 12:35:16PM +0100, MPhil. Emanoil Kotsev wrote:
> After doing all of this I was able to reproduce the issue by
> overloading the system with following simple steps:
> 1. start a compilation of something (ex. kernel)
> 2. run another process hungry application (flashplayer in firefox)
> => system locks in about 3-5mins

Ha, so we're getting somewhere :)

> I also noticed that the board gets pretty hot, so in my opinion it
> locks because of thermal issue.

The symptoms we're seeing so far are very much consistent with a thermal
issue.

> I think this also would explain why I see errors at different
> processes (mostly Xorg), but with 3.12 I do not get any trace message
> in the log files. Could you advise which option should be enabled in
> the kernel or how I could log/trace if system locks.

Try enabling CONFIG_LOCKUP_DETECTOR, that could tell us where we're
hanging.

But, make sure to be on a console and not in X in order to get a chance
to see the message. What I do is reroute all log messages to /dev/tty8,
i.e. have

*.* |/dev/tty8

in syslog.conf and switch to it with Ctrl-Alt-F8.

> How can I make sure that the cooling/temp works properly?
>
> Perhaps after upgrading in september the system is working under

What kind of upgrade exactly did you do to a laptop?

> heavier load and therefore I started having the issue, or something
> broke in software or hardware and it can not cool down properly. I
> don't think the kernel is the issue, because I had the same with older
> kernels that were working fine before.
>
> The fan looks clean and there is no dust or whatever in the cooling
> area, that would prevent colling. The physical position of the
> notebook (docking station) also did not change.

Does the issue happen if the laptop is not in the docking station?

In any case, you need to follow your steps back of the upgrade to have
at least a clue what causes the overheating.

Can you revert the upgrade and see whether it still happens?

Also, do you have sensors support for your hardware? IOW, can you
monitor the temperature of some hardware elements by running

$ sensors

?

For example, I see this on my box here:

$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       45.64 W  (crit = 125.19 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +19.2°C  (high = +70.0°C)
                       (crit = +90.0°C, hyst = +87.0°C)

radeon-pci-0100
Adapter: PCI adapter
temp1:        +80.0°C

so when something overheats, running "watch -n 1 sensors" could give
some hints.

Also, what does

$ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq

give?

Also, can you connect your laptop to a serial or netconsole to collect
dmesg before and while the lockup happens?

Basically, we're looking for a hint about which part of the hw causes
the overheating...

HTH.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply [flat|nested] 13+ messages in thread
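Editor's note: "watch -n 1 sensors" only helps while the screen is still readable, so for a hard lockup it may be more useful to append readings to a file and sync after every sample. The following is a minimal sketch, not part of the original thread. The two sysfs paths are assumptions: thermal_zone0 and cpu0 exist on most ACPI laptops, but the zone numbering on this Latitude D520 is not confirmed anywhere in the thread. The function names sample and log_thermal are hypothetical.

```shell
#!/bin/sh
# Assumed default paths; override them via the environment if the
# machine exposes its sensor under a different thermal zone.
TEMP_FILE=${TEMP_FILE:-/sys/class/thermal/thermal_zone0/temp}
FREQ_FILE=${FREQ_FILE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}

# sample: print one line "<epoch seconds> <temp in C> <cpu freq in kHz>".
# The thermal sysfs file reports millidegrees Celsius; one decimal is kept.
sample() {
    milli=$(cat "$TEMP_FILE")
    freq=$(cat "$FREQ_FILE")
    printf '%s %d.%d %s\n' "$(date +%s)" \
        $(( milli / 1000 )) $(( (milli % 1000) / 100 )) "$freq"
}

# log_thermal FILE [INTERVAL]: append a sample every INTERVAL seconds
# (default 1) and sync after each write, so the last readings before a
# lockup have a chance of reaching the disk.
log_thermal() {
    logfile=$1
    interval=${2:-1}
    while :; do
        sample >> "$logfile"
        sync
        sleep "$interval"
    done
}

# Usage, started before reproducing the load:
#   log_thermal /var/tmp/thermal.log 1 &
```

After a reboot, the tail of the log shows the last temperature and frequency recorded before the hang, which can be compared against the critical trip point that sensors reports.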
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 12:07 ` Borislav Petkov
@ 2013-11-17 14:45 ` MPhil. Emanoil Kotsev
  2013-11-17 15:06 ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-17 14:45 UTC
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter

Hi,

On Sunday 17 November 2013 13:07:34 Borislav Petkov wrote:
> On Sun, Nov 17, 2013 at 12:35:16PM +0100, MPhil. Emanoil Kotsev wrote:
> > After doing all of this I was able to reproduce the issue by
> > overloading the system with the following simple steps:
> > 1. start a compilation of something (e.g. the kernel)
> > 2. run another process-hungry application (flashplayer in firefox)
> > => system locks up in about 3-5 minutes
>
> Ha, so we're getting somewhere :)

Yes, looks like it :)

> > I also noticed that the board gets pretty hot, so in my opinion it
> > locks up because of a thermal issue.
>
> The symptoms we're seeing so far are very much consistent with a thermal
> issue.

This is also true, which makes me sad, as the notebook has been working
great for the past 7 years.

> > I think this would also explain why I see errors in different
> > processes (mostly Xorg), but with 3.12 I do not get any trace message
> > in the log files. Could you advise which option should be enabled in
> > the kernel, or how I could log/trace when the system locks up?
>
> Try enabling CONFIG_LOCKUP_DETECTOR, that could tell us where we're
> hanging.
>
> But, make sure to be on a console and not in X in order to get a chance
> to see the message. What I do is reroute all log messages to /dev/tty8,
> i.e. have
>
> *.*     |/dev/tty8
>
> in syslog.conf and switch to it with Ctrl-Alt-F8.

Thanks for the advice. I'll do so.

> > How can I make sure that the cooling/temp works properly?
> >
> > Perhaps after upgrading in September the system is working under
>
> What kind of upgrade exactly did you do to a laptop?

I was using Debian squeeze with the Trinity desktop (KDE 3.5.10) and
upgraded to Debian wheezy with TDE (3.5.13).

> > heavier load and therefore I started having the issue, or something
> > broke in software or hardware and it can not cool down properly. I
> > don't think the kernel is the issue, because I had the same with older
> > kernels that were working fine before.
> >
> > The fan looks clean and there is no dust or whatever in the cooling
> > area that would prevent cooling. The physical position of the
> > notebook (docking station) also did not change.
>
> Does the issue happen if the laptop is not in the docking station?

I wanted to test this, but as I would have to replug a lot, I haven't
done it so far, also because it has been working with this docking
station for the past 2 years.

> In any case, you need to follow your steps back of the upgrade to have
> at least a clue what causes the overheating.
>
> Can you revert the upgrade and see whether it still happens?

This would be hard. Not impossible, as I have a backup, but it would be
time-consuming.

> Also, do you have sensors support for your hardware? IOW, can you
> monitor the temperature of some hardware elements by running
>
> $ sensors

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +47.5°C  (crit = +126.0°C)

> ?
>
> For example, I see this on my box here:
>
> $ sensors
> fam15h_power-pci-00c4
> Adapter: PCI adapter
> power1:       45.64 W  (crit = 125.19 W)
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> temp1:        +19.2°C  (high = +70.0°C)
>                        (crit = +90.0°C, hyst = +87.0°C)
>
> radeon-pci-0100
> Adapter: PCI adapter
> temp1:        +80.0°C
>
> so when something overheats, running "watch -n 1 sensors" could give
> some hints.
>
> Also, what does
>
> $ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
>
> give?

$ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/bios_limit:1:2000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:1:ondemand
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:1:10000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:1:2000000 1667000 1333000 1000000
/sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:1:0 1
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:1:acpi-cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:1:ondemand powersave performance conservative userspace
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1:2000000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1:2000000
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:1:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:1:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:1:<unsupported>

> Also, can you connect your laptop to a serial or netconsole to collect
> dmesg before and while the lockup happens?

I could try this. I guess this assumes I have another machine running in
parallel, but this can be arranged with a little effort.

> Basically, we're looking for a hint about which part of the hw causes
> the overheating...
>
> HTH.

Thanks for the hints. As I have never had to deal with overheating or
similar issues, your help is very precious to me. Unfortunately we have
a little child on board and my time at home is limited :) to a couple of
hours daily, which means even less time for debugging. But I never give
up. I just want to be sure that it is not a hardware issue.

Thanks again and kind regards. I'll post when I have some useful input.
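[Editor's note] "watch -n 1 sensors" only helps while the screen is still readable; after a hard lockup the last reading is gone. A minimal sketch of an alternative is to append each reading to a file and flush it to disk. It assumes the acpitz sensor shown above is exposed as /sys/class/thermal/thermal_zone0/temp (a value in millidegrees Celsius); the zone number, the log path and the function names are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Sketch: log one thermal-zone reading per call so the last value
# survives a hard lockup. Zone path and log path are assumptions.

# Convert a sysfs millidegree value (e.g. 47500) to degrees Celsius.
millideg_to_c() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Append "<timestamp> <temp>" to the log file and flush it to disk.
log_temp() {
    zone_file=$1
    log_file=$2
    raw=$(cat "$zone_file") || return 1
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(millideg_to_c "$raw")" >> "$log_file"
    sync
}

# Example (runs until interrupted or until the machine locks up):
#   while log_temp /sys/class/thermal/thermal_zone0/temp "$HOME/temp.log"; do
#       sleep 1
#   done
```

After a reboot, the last lines of the log show how hot the zone was shortly before the hang. The sync after every write costs some I/O, which seems an acceptable trade for not losing the final samples.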
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 14:45 ` MPhil. Emanoil Kotsev
@ 2013-11-17 15:06 ` Borislav Petkov
  2013-11-17 16:45 ` MPhil. Emanoil Kotsev
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2013-11-17 15:06 UTC
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter

On Sun, Nov 17, 2013 at 03:45:34PM +0100, MPhil. Emanoil Kotsev wrote:
> This is also true, which makes me sad, as the notebook has been working
> great for the past 7 years.

Hmm, maybe it is heading slowly for the eternal hunting fields... :-)

> > What kind of upgrade exactly did you do to a laptop?
>
> I was using Debian squeeze with the Trinity desktop (KDE 3.5.10) and
> upgraded to Debian wheezy with TDE (3.5.13).

Oh ok, so I thought you were talking about a hw upgrade, like adding
more RAM, new hdd, etc.

Ok, can you try this: boot without X and try overloading the machine on
the console, i.e. do

  while true; do make clean && make -j64; done

or similar in your kernel repository. Does it trigger then?

Although I can't imagine how a software upgrade would cause the
overheating... :-\.

> > Can you revert the upgrade and see whether it still happens?
>
> This would be hard. Not impossible, as I have a backup, but it would be
> time-consuming.

You could try booting a distro from a livecd and see any change there...

> $ sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +47.5°C  (crit = +126.0°C)

That's the ACPI thermal zone (acpitz). So what happens if you do

$ watch -n 1 sensors

and you incur the load? Do you hit the critical temperature?

> $ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
> /sys/devices/system/cpu/cpu0/cpufreq/bios_limit:1:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:1:ondemand
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:1:10000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:1:2000000 1667000 1333000 1000000
> /sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:1:0 1
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:1:acpi-cpufreq
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:1:ondemand powersave performance conservative userspace
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:1:0
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:1:0
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:1:<unsupported>

Yeah, I don't see anything wrong with that output.

> I could try this. I guess this assumes I have another machine running in
> parallel, but this can be arranged with a little effort.

Yep.

> Thanks for the hints. As I have never had to deal with overheating or
> similar issues, your help is very precious to me. Unfortunately we have
> a little child on board and my time at home is limited :) to a couple of
> hours daily, which means even less time for debugging. But I never give
> up. I just want to be sure that it is not a hardware issue.

No worries, take care of the child first - the laptop and everyone else
can wait :-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
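[Editor's note] For the netconsole setup with a second machine, the fiddly part is usually the module parameter, whose documented form is <src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>. The sketch below only assembles that string. The helper name is hypothetical and every address in the example is a placeholder; it also assumes a wired interface (eth0 here), since netconsole generally needs a driver with netpoll support.

```shell
#!/bin/sh
# Sketch: build the netconsole= module parameter
#   <src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
# Ports default to the documented 6665 (source) and 6666 (target).
netconsole_param() {
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_mac=$4
    src_port=${5:-6665}
    tgt_port=${6:-6666}
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Example, as root on the laptop (all addresses are placeholders):
#   modprobe netconsole \
#     netconsole="$(netconsole_param 192.168.1.10 eth0 192.168.1.20 00:11:22:33:44:55)"
#
# On the receiving machine, a UDP listener such as netcat can capture the
# messages; the exact flags differ between netcat flavours, e.g.:
#   nc -u -l -p 6666 | tee netconsole.log
```

Loading the module before starting the load test means that any oops or lockup-detector output printed just before the hang has a chance to reach the second machine even if the local disk never sees it.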
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 15:06 ` Borislav Petkov
@ 2013-11-17 16:45 ` MPhil. Emanoil Kotsev
  2013-11-17 20:05 ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-17 16:45 UTC
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter

Hi

On Sunday 17 November 2013 16:06:07 you wrote:
> On Sun, Nov 17, 2013 at 03:45:34PM +0100, MPhil. Emanoil Kotsev wrote:
> > This is also true, which makes me sad, as the notebook has been
> > working great for the past 7 years.
>
> Hmm, maybe it is heading slowly for the eternal hunting fields... :-)

Maybe, but I am a bit of an academic, so until it is 100% proven I have
my doubts, which does not mean that I cannot purchase a new one :)

> > > What kind of upgrade exactly did you do to a laptop?
> >
> > I was using Debian squeeze with the Trinity desktop (KDE 3.5.10) and
> > upgraded to Debian wheezy with TDE (3.5.13).
>
> Oh ok, so I thought you were talking about a hw upgrade, like adding
> more RAM, new hdd, etc.
>
> Ok, can you try this: boot without X and try overloading the machine on
> the console, i.e. do
>
>   while true; do make clean && make -j64; done
>
> or similar in your kernel repository. Does it trigger then?

I'll try - I'm also curious what will happen!

> Although I can't imagine how a software upgrade would cause the
> overheating... :-\.

How? New libraries, more exhaustive algorithms, higher CPU usage, etc.
That is one of the things M$ does on purpose to force you to upgrade
your hardware every 2-3 years.

> > > Can you revert the upgrade and see whether it still happens?
> >
> > This would be hard. Not impossible, as I have a backup, but it would
> > be time-consuming.
>
> You could try booting a distro from a livecd and see any change there...
>
> > $ sensors
> > acpitz-virtual-0
> > Adapter: Virtual device
> > temp1:        +47.5°C  (crit = +126.0°C)
>
> That's the ACPI thermal zone (acpitz). So what happens if you do
>
> $ watch -n 1 sensors
>
> and you incur the load? Do you hit the critical temperature?

I wanted to first compile the kernel with the debug option you
mentioned, but while compiling it went to about 75°C.

> > $ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
> > /sys/devices/system/cpu/cpu0/cpufreq/bios_limit:1:2000000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:1:ondemand
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:1:10000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:1:2000000 1667000 1333000 1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:1:0 1
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:1:acpi-cpufreq
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:1:ondemand powersave performance conservative userspace
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1:2000000
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1:2000000
> > /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:1:0
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:1:0
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:1:<unsupported>
>
> Yeah, I don't see anything wrong with that output.

Yes, looks fine.

> > I could try this. I guess this assumes I have another machine running
> > in parallel, but this can be arranged with a little effort.
>
> Yep.
>
> > Thanks for the hints. As I have never had to deal with overheating or
> > similar issues, your help is very precious to me. Unfortunately we
> > have a little child on board and my time at home is limited :) to a
> > couple of hours daily, which means even less time for debugging. But
> > I never give up. I just want to be sure that it is not a hardware
> > issue.
>
> No worries, take care of the child first - the laptop and everyone else
> can wait :-)

Yes - my wife and I do load balancing :)

I'll post back with some data (I hope).

regards
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 16:45 ` MPhil. Emanoil Kotsev
@ 2013-11-17 20:05 ` Borislav Petkov
  2013-11-19  9:21 ` MPhil. Emanoil Kotsev
  2013-12-18 20:59 ` MPhil. Emanoil Kotsev
  0 siblings, 2 replies; 13+ messages in thread
From: Borislav Petkov @ 2013-11-17 20:05 UTC
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter

On Sun, Nov 17, 2013 at 05:45:18PM +0100, MPhil. Emanoil Kotsev wrote:
> How? New libraries, more exhaustive algorithms, higher CPU usage, etc.
> That is one of the things M$ does on purpose to force you to upgrade
> your hardware every 2-3 years.

That would be too easy and machines would be dying left and right of
overheating. Actually, sane hardware is much more robust than that and
it throttles itself in case of critical temperature levels. And, IMHO
your Dell Latitude D520 should be fine, in that respect. But we'll see.

:-)

> I wanted to first compile the kernel with the debug option you
> mentioned, but while compiling it went to about 75°C.

Yeah, that's still ok if we trust the output saying that 126°C is the
critical temp.

It would be interesting to see what this sensor says right before the
machine locks up.

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
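[Editor's note] The "75°C versus a 126°C critical temperature" comparison can be made explicit as remaining headroom. The sketch below does only the arithmetic on two sysfs-style millidegree values. The function name is hypothetical, and the commented example assumes that the zone is thermal_zone0 and that its critical trip point is trip_point_0_temp, which should be checked against trip_point_0_type on the actual machine.

```shell
#!/bin/sh
# Sketch: print how many degrees Celsius remain between a current reading
# and the critical trip point. Both arguments are millidegree values as
# found in sysfs thermal-zone files.
thermal_headroom_c() {
    cur=$1
    crit=$2
    awk -v c="$cur" -v k="$crit" 'BEGIN { printf "%.1f\n", (k - c) / 1000 }'
}

# Example with live values (zone and trip-point index are assumptions):
#   z=/sys/class/thermal/thermal_zone0
#   thermal_headroom_c "$(cat "$z/temp")" "$(cat "$z/trip_point_0_temp")"
```

With the numbers from this thread, 75000 and 126000, the result is 51.0, i.e. the zone was still far from the reported critical point during the kernel build.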
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 20:05 ` Borislav Petkov
@ 2013-11-19  9:21 ` MPhil. Emanoil Kotsev
  2013-12-18 20:59 ` MPhil. Emanoil Kotsev
  1 sibling, 0 replies; 13+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-19 9:21 UTC
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter

Hi

On Sunday 17 November 2013 21:05:46 Borislav Petkov wrote:
> On Sun, Nov 17, 2013 at 05:45:18PM +0100, MPhil. Emanoil Kotsev wrote:
> > How? New libraries, more exhaustive algorithms, higher CPU usage,
> > etc. That is one of the things M$ does on purpose to force you to
> > upgrade your hardware every 2-3 years.
>
> That would be too easy and machines would be dying left and right of
> overheating. Actually, sane hardware is much more robust than that and
> it throttles itself in case of critical temperature levels. And, IMHO
> your Dell Latitude D520 should be fine, in that respect. But we'll see.

I was thinking the same - but started to despair.

> :-)
>
> > I wanted to first compile the kernel with the debug option you
> > mentioned, but while compiling it went to about 75°C.
>
> Yeah, that's still ok if we trust the output saying that 126°C is the
> critical temp.
>
> It would be interesting to see what this sensor says right before the
> machine locks up.

This test is still outstanding; I'll do it when I have more free time to
reproduce and log everything.

I did something else: yesterday evening before going to bed (~00:30) I
closed the notebook lid, just so that it would switch off the LCD
display. In the morning I opened it up and found the notebook with
blinking LED lights.

http://www.dell.com/support/troubleshooting/us/en/19/KCS/KcsArticles/ArticleView?c=us&l=en&s=dhs&docid=DSN_DBECF64CFEDA449398CB9E859D4944A5

Unfortunately I can't find the pattern in the link above: the left LED
was on and the other two were blinking.

After shutting down (keeping the power button pressed) and turning it on
again, only two LEDs (middle and right) were blinking, which according
to the link above means "Configuring PCI bridges" / "Replacing the
system board".

After waiting for about 1-2 minutes the notebook starts normally, which
again points to a heating issue.

At the moment I have a lot to do at all levels, so I can not test any
further. This is just an update. I'll post again when more results are
available.

I'm thinking of opening it up and inspecting it from the inside -
perhaps the cooling system is clogged somewhere or something.

regards
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-11-17 20:05 ` Borislav Petkov
  2013-11-19  9:21 ` MPhil. Emanoil Kotsev
@ 2013-12-18 20:59 ` MPhil. Emanoil Kotsev
  2013-12-18 21:22 ` Borislav Petkov
  1 sibling, 1 reply; 13+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-12-18 20:59 UTC
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter

Hi again,

sorry for writing after such a long time of silence, but I was busy with
a project (and family as well).

On Sunday 17 November 2013 21:05:46 you wrote:
> On Sun, Nov 17, 2013 at 05:45:18PM +0100, MPhil. Emanoil Kotsev wrote:
> > How? New libraries, more exhaustive algorithms, higher CPU usage,
> > etc. That is one of the things M$ does on purpose to force you to
> > upgrade your hardware every 2-3 years.
>
> That would be too easy and machines would be dying left and right of
> overheating. Actually, sane hardware is much more robust than that and
> it throttles itself in case of critical temperature levels. And, IMHO
> your Dell Latitude D520 should be fine, in that respect. But we'll see.

I was able to solve the issue by removing some of the modules I had in
xorg.conf. I noticed that it is not the CPU that is overheating, but
rather the video/graphics card. The area around the "Dell" logo on the
front of the display is still pretty hot, but the system seems to be
working fine now and I can not reproduce the issue any more.

Someone might ask why I'm using an xorg.conf at all. The reason is that
without it X automatically loads the GL driver for 3D support and I am
not able to use a second display.

Perhaps it is worth trying the latest intel driver, as suggested before.
However, with the current one it is working fine, so I would consider
the issue solved.

I would like to thank you once again for your precious support and
ideas.

regards
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
  2013-12-18 20:59 ` MPhil. Emanoil Kotsev
@ 2013-12-18 21:22 ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2013-12-18 21:22 UTC
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter

On Wed, Dec 18, 2013 at 09:59:22PM +0100, MPhil. Emanoil Kotsev wrote:
> I was able to solve the issue by removing some of the modules I had in
> xorg.conf. I noticed that it is not the CPU that is overheating, but
> rather the video/graphics card. The area around the "Dell" logo on the
> front of the display is still pretty hot, but the system seems to be
> working fine now and I can not reproduce the issue any more.

Interesting. Which module was that? It was probably making your GPU go
nuts. The more interesting question is whether this module would behave
on your machine normally and only some buggy incarnation of it would
cause the overheating... I.e., it could be you upgraded X and with the
new version the issue started appearing.

Fun.

> I would like to thank you once again for your precious support and
> ideas.

Sure, you're welcome!

:-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
end of thread, other threads: [~2013-12-18 21:22 UTC]

Thread overview: 13+ messages
2013-11-13 19:58 kernel 3.11.6 general protection fault MPhil. Emanoil Kotsev
2013-11-13 20:09 ` [Intel-gfx] " Daniel Vetter
2013-11-13 20:33 ` Borislav Petkov
2013-11-13 21:19 ` MPhil. Emanoil Kotsev
2013-11-17 11:35 ` MPhil. Emanoil Kotsev
2013-11-17 12:07 ` Borislav Petkov
2013-11-17 14:45 ` MPhil. Emanoil Kotsev
2013-11-17 15:06 ` Borislav Petkov
2013-11-17 16:45 ` MPhil. Emanoil Kotsev
2013-11-17 20:05 ` Borislav Petkov
2013-11-19  9:21 ` MPhil. Emanoil Kotsev
2013-12-18 20:59 ` MPhil. Emanoil Kotsev
2013-12-18 21:22 ` Borislav Petkov