From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bojan Smojver
Subject: Re: Memory corruption on hibernate/thaw with KMS
Date: Wed, 26 Oct 2011 14:44:46 +1100
Message-ID: <1319600686.1980.4.camel@shrek.rexursive.com>
References: <1318230907.2019.4.camel@shrek.rexursive.com>
 <20111010075306.GC3021@phenom.ffwll.local>
 <1318234738.2010.8.camel@shrek.rexursive.com>
 <1318241119.1901.1.camel@shrek.rexursive.com>
 <1318241565.2051.1.camel@shrek.rexursive.com>
 <1318243076.1899.1.camel@shrek.rexursive.com>
 <1318245839.1936.2.camel@shrek.rexursive.com>
 <9067341376.1002832946@rexursive.com>
 <1318330938.1990.2.camel@shrek.rexursive.com>
 <20111011112209.GD2962@phenom.ffwll.local>
 <1318388966.3020.10.camel@shrek.rexursive.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from beauty.rexursive.com (beauty.rexursive.com [150.101.121.179])
 by gabe.freedesktop.org (Postfix) with ESMTP id C679CA0A6F
 for ; Tue, 25 Oct 2011 20:44:48 -0700 (PDT)
In-Reply-To: <1318388966.3020.10.camel@shrek.rexursive.com>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Daniel Vetter
Cc: intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Wed, 2011-10-12 at 14:09 +1100, Bojan Smojver wrote:
> Bug #41705.

Just to follow up on this a bit, I cloned Linus' tree as of today (i.e.
currently staged stuff for 3.2) then pulled Keith's tree
(git://people.freedesktop.org/~keithp/linux drm-intel-next) over the top
and compiled. Did 26 hibernate/thaw cycles and then went to check the
machine.

Unfortunately, I then got:

---------------------------
[ 729.195407] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 729.199345] IP: [] __list_add+0x14/0x7f
[ 729.203288] PGD 0
[ 729.207051] Oops: 0000 [#1] SMP
[ 729.210874] CPU 0
[ 729.210901] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_timer e1000e uvcvideo rfkill snd videodev media v4l2_compat_ioctl32 qcserial usb_wwan mxm_wmi wmi snd_page_alloc microcode i2c_i801 iTCO_wdt iTCO_vendor_support pcspkr intel_ips soundcore joydev ipv6 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[ 729.231392]
[ 729.235351] Pid: 897, comm: dbus-daemon Tainted: G W 3.1.0+ #2 LENOVO 4313CTO/4313CTO
[ 729.239316] RIP: 0010:[] [] __list_add+0x14/0x7f
[ 729.243143] RSP: 0018:ffff88022ce57d60 EFLAGS: 00010286
[ 729.246941] RAX: ffff8801ab1454d0 RBX: 0000000000000000 RCX: 0000000000000054
[ 729.250770] RDX: 0000000000000000 RSI: ffff880229777100 RDI: ffff8801ab145520
[ 729.254604] RBP: ffff88022ce57d80 R08: ffff88020c7f28e8 R09: 00007f0aaeda4000
[ 729.258294] R10: 0000000000015ff8 R11: 0000000000015fa8 R12: ffff880229777100
[ 729.261820] R13: ffff8801ab145520 R14: ffff8802297770b0 R15: ffff8801ab1450b0
[ 729.265401] FS: 00007f0aaed80800(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[ 729.269032] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 729.272674] CR2: 0000000000000008 CR3: 000000022d275000 CR4: 00000000000006f0
[ 729.276338] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 729.279988] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 729.283616] Process dbus-daemon (pid: 897, threadinfo ffff88022ce56000, task ffff88022ca55c80)
[ 729.287278] Stack:
[ 729.290885] ffff88020c7f28e8 ffff88022c192d80 ffff88022a0bd500 ffff8801ab1454d0
[ 729.294593] ffff88022ce57d90 ffffffff810f1fe5 ffff88022ce57e10 ffffffff81055b6b
[ 729.298289] 0000000000000000 ffff8801ab1450e8 ffff8801ab1450f0 ffff8801ab1450c8
[ 729.301966] Call Trace:
[ 729.305664] [] vma_prio_tree_add+0x81/0x95
[ 729.309405] [] dup_mm+0x2f3/0x488
[ 729.313143] [] copy_process+0x9b1/0x119c
[ 729.316888] [] ? security_file_alloc+0x16/0x18
[ 729.320631] [] ? get_empty_filp+0xa4/0x133
[ 729.324351] [] do_fork+0xef/0x22d
[ 729.328029] [] ? sock_alloc_file+0xb3/0x114
[ 729.331672] [] ? should_resched+0xe/0x2d
[ 729.335300] [] ? _cond_resched+0xe/0x22
[ 729.338914] [] ? might_fault+0xe/0x10
[ 729.342532] [] sys_clone+0x28/0x2a
[ 729.346128] [] stub_clone+0x13/0x20
[ 729.349661] [] ? system_call_fastpath+0x16/0x1b
[ 729.353249] Code: ad de 48 b9 00 02 20 00 00 00 ad de 48 89 13 48 89 4b 08 5e 5b 5d c3 55 48 89 e5 41 55 49 89 fd 41 54 49 89 f4 53 48 89 d3 41 50 <4c> 8b 42 08 49 39 f0 74 20 49 89 d1 48 89 f1 48 c7 c2 98 15 7e
[ 729.361220] RIP [] __list_add+0x14/0x7f
[ 729.365218] RSP
[ 729.369206] CR2: 0000000000000008
[ 729.439968] ---[ end trace a0f13f2533f6746a ]---
---------------------------

Followed by machine becoming weird and throwing a whole lot more kernel
errors, which I could not capture any more.

The only other error was unrelated.
It looked like this:

---------------------------
[ 272.029435] ------------[ cut here ]------------
[ 272.029441] WARNING: at drivers/net/ethernet/intel/e1000e/ich8lan.c:870 e1000_acquire_swflag_ich8lan+0x4f/0x143 [e1000e]()
[ 272.029443] Hardware name: 4313CTO
[ 272.029445] e1000e: eth0: contention for Phy access
[ 272.029446] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_timer e1000e uvcvideo rfkill snd videodev media v4l2_compat_ioctl32 qcserial usb_wwan mxm_wmi wmi snd_page_alloc microcode i2c_i801 iTCO_wdt iTCO_vendor_support pcspkr intel_ips soundcore joydev ipv6 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[ 272.029480] Pid: 5712, comm: kworker/1:4 Tainted: G W 3.1.0+ #2
[ 272.029482] Call Trace:
[ 272.029485] [] warn_slowpath_common+0x83/0x9b
[ 272.029488] [] warn_slowpath_fmt+0x46/0x48
[ 272.029492] [] ? smp_call_function_single+0x97/0xfd
[ 272.029499] [] e1000_acquire_swflag_ich8lan+0x4f/0x143 [e1000e]
[ 272.029508] [] __e1000_read_phy_reg_hv+0x4d/0x157 [e1000e]
[ 272.029518] [] e1000_read_phy_reg_hv+0x13/0x15 [e1000e]
[ 272.029527] [] e1000_phy_read_status+0xf6/0x163 [e1000e]
[ 272.029537] [] e1000_watchdog_task+0x104/0x5d2 [e1000e]
[ 272.029540] [] ? __schedule+0x63b/0x669
[ 272.029550] [] ? e1000_update_mng_vlan+0x68/0x68 [e1000e]
[ 272.029554] [] process_one_work+0x176/0x2a9
[ 272.029559] [] worker_thread+0xda/0x15d
[ 272.029562] [] ? manage_workers+0x176/0x176
[ 272.029565] [] kthread+0x84/0x8c
[ 272.029568] [] kernel_thread_helper+0x4/0x10
[ 272.029572] [] ? kthread_worker_fn+0x148/0x148
[ 272.029575] [] ? gs_change+0x13/0x13
[ 272.029577] ---[ end trace a0f13f2533f67469 ]---
---------------------------

So, yeah, still there with the latest code.

PS. I will post this comment into the bug too.

--
Bojan