From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753692Ab3KQMIN (ORCPT ); Sun, 17 Nov 2013 07:08:13 -0500
Received: from mail.skyhub.de ([78.46.96.112]:56971 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751918Ab3KQMII (ORCPT ); Sun, 17 Nov 2013 07:08:08 -0500
Date: Sun, 17 Nov 2013 13:07:34 +0100
From: Borislav Petkov
To: "MPhil.
 Emanoil Kotsev"
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, Daniel Vetter
Subject: Re: [Intel-gfx] kernel 3.11.6 general protection fault
Message-ID: <20131117120734.GE27323@pd.tnic>
References: <201311132058.30310.Emanoil.Kotsev@fincom.at>
	<20131113200914.GG7251@phenom.ffwll.local>
	<20131113203319.GB23962@pd.tnic>
	<201311171235.17602.Emanoil.Kotsev@fincom.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <201311171235.17602.Emanoil.Kotsev@fincom.at>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Nov 17, 2013 at 12:35:16PM +0100, MPhil. Emanoil Kotsev wrote:
> After doing all of this I was able to reproduce the issue by
> overloading the system with following simple steps:
> 1. start a compilation of something (ex. kernel)
> 2. run another process hungry application (flashplayer in firefox)
> => system locks in about 3-5mins

Ha, so we're getting somewhere :)

> I also noticed that the board gets pretty hot, so in my opinion it
> locks because of thermal issue.

The symptoms we're seeing so far are very much consistent with a thermal
issue.

> I think this also would explain why I see errors at different
> processes (mostly Xorg), but with 3.12 I do not get any trace message
> in the log files. Could you advise which option should be enabled in
> the kernel or how I could log/trace if system locks.

Try enabling CONFIG_LOCKUP_DETECTOR, that could tell us where we're
hanging.

But, make sure to be on a console and not in X in order to get a chance
to see the message. What I do is reroute all log messages to /dev/tty8,
i.e. have

*.*		|/dev/tty8

in syslog.conf and switch to it with Ctrl-Alt-F8.

> How can I make sure that the cooling/temp works properly?
>
> Perhaps after upgrading in september the system is working under

What kind of upgrade exactly did you do to a laptop?

> heavier load and therefore I started having the issue, or something
> broke in software or hardware and it can not cool down properly. I
> don't think the kernel is the issue, because I had the same with older
> kernels that were working fine before.
>
> The fan looks clean and there is no dust or whatever in the cooling
> area, that would prevent colling. The physical position of the
> notebook (docking station) also did not change.

Does the issue happen if the laptop is not in the docking station?

In any case, you need to follow your steps back of the upgrade to have
at least a clue what causes the overheating.

Can you revert the upgrade and see whether it still happens?

Also, do you have sensors support for your hardware? IOW, can you
monitor the temperature of some hardware elements by running

$ sensors

?

For example, I see this on my box here:

$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       45.64 W  (crit = 125.19 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +19.2°C  (high = +70.0°C)
                       (crit = +90.0°C, hyst = +87.0°C)

radeon-pci-0100
Adapter: PCI adapter
temp1:        +80.0°C

so when something overheats, running "watch -n 1 sensors" could give
some hints.

Also, what does

$ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq

give?

Also, can you connect your laptop to a serial or netconsole to collect
dmesg before and while the lockup happens?

Basically, we're looking for a hint about which part of the hw causes
the overheating...

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
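
Since "watch -n 1 sensors" only shows the readings on screen, they are
gone once the box locks up. Below is a minimal sketch of one way to keep
them, assuming the text layout of the "sensors" output quoted above. The
names parse_temps, log_temps and the temps.log path are made up for
illustration and are not part of lm-sensors; the parser only picks up
lines whose reading ends in "°C".

```python
import os
import re
import subprocess
import time

# Matches reading lines such as "temp1:        +19.2°C  (high = +70.0°C)".
# Indented continuation lines and non-temperature readings do not match.
TEMP_LINE = re.compile(
    r"^(?P<label>[^:\s][^:]*):\s+\+?(?P<value>-?\d+(?:\.\d+)?)°C"
)


def parse_temps(sensors_output):
    """Return {(chip, label): degrees_celsius} from `sensors` text output."""
    temps = {}
    chip = None
    previous_blank = True  # the first non-blank line is a chip name
    for line in sensors_output.splitlines():
        if not line.strip():
            previous_blank = True
            continue
        match = TEMP_LINE.match(line)
        if match:
            temps[(chip, match.group("label"))] = float(match.group("value"))
        elif previous_blank:
            # A chip block starts after a blank line, e.g. "k10temp-pci-00c3".
            chip = line.strip()
        previous_blank = False
    return temps


def log_temps(path="temps.log", interval=1.0):
    """Append timestamped readings to `path` until interrupted (hypothetical helper)."""
    with open(path, "a") as log:
        while True:
            result = subprocess.run(
                ["sensors"], capture_output=True, text=True, check=True
            )
            stamp = time.strftime("%H:%M:%S")
            for (chip, label), value in parse_temps(result.stdout).items():
                log.write(f"{stamp} {chip} {label} {value:.1f}\n")
            # Push the data to disk so the last reading survives a hard lockup.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling log_temps() from a console before starting the compile job should
leave the last readings before the hang in temps.log, provided the
filesystem gets the chance to write them out.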