* Memory corruption on hibernate/thaw with KMS
@ 2011-09-27 6:12 Bojan Smojver
2011-09-29 23:38 ` Bojan Smojver
` (4 more replies)
0 siblings, 5 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-09-27 6:12 UTC (permalink / raw)
To: intel-gfx
This problem is covered by various bugs, one of them being:
https://bugzilla.kernel.org/show_bug.cgi?id=37142
At some point there was a "solution" to essentially the same bug (I
believe http://bugzilla.kernel.org/show_bug.cgi?id=13811), but the
problem quickly resurfaced.
There are also similar bugs in the Free Desktop bug database. For
instance:
https://bugs.freedesktop.org/show_bug.cgi?id=40241
Does anyone have any idea what's causing this? Or better, how to fix it?
I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-09-27 6:12 Memory corruption on hibernate/thaw with KMS Bojan Smojver
@ 2011-09-29 23:38 ` Bojan Smojver
2011-10-03 6:44 ` Bojan Smojver
` (3 subsequent siblings)
4 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-09-29 23:38 UTC (permalink / raw)
To: intel-gfx
On Tue, 2011-09-27 at 16:12 +1000, Bojan Smojver wrote:
> Does anyone have any idea what's causing this? Or better, how to fix
> it? I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
Anyone?
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-09-27 6:12 Memory corruption on hibernate/thaw with KMS Bojan Smojver
2011-09-29 23:38 ` Bojan Smojver
@ 2011-10-03 6:44 ` Bojan Smojver
2011-10-04 2:32 ` Eugeni Dodonov
2011-10-10 7:15 ` Bojan Smojver
` (2 subsequent siblings)
4 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-03 6:44 UTC (permalink / raw)
To: intel-gfx
On Tue, 2011-09-27 at 16:12 +1000, Bojan Smojver wrote:
> This problem is covered by various bugs, one of them being:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=37142
>
> At some point there was a "solution" to essentially the same bug (I
> believe http://bugzilla.kernel.org/show_bug.cgi?id=13811), but the
> problem quickly resurfaced.
>
> There are also similar bugs in the Free Desktop bug database. For
> instance:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=40241
>
> Does anyone have any idea what's causing this? Or better, how to fix it?
> I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
OK, given that nobody knows what the cause/fix is for this, can anyone
confirm that they can reproduce this on their system with Intel
graphics?
For me, this occurs after several hibernate/thaw cycles on my ThinkPad
T510 when KMS is in use.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-03 6:44 ` Bojan Smojver
@ 2011-10-04 2:32 ` Eugeni Dodonov
2011-10-04 2:46 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Eugeni Dodonov @ 2011-10-04 2:32 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Oct 3, 2011 3:44 AM, "Bojan Smojver" <bojan@rexursive.com> wrote:
>
> On Tue, 2011-09-27 at 16:12 +1000, Bojan Smojver wrote:
> > This problem is covered by various bugs, one of them being:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=37142
> >
> > At some point there was a "solution" to essentially the same bug (I
> > believe http://bugzilla.kernel.org/show_bug.cgi?id=13811), but the
> > problem quickly resurfaced.
> >
> > There are also similar bugs in the Free Desktop bug database. For
> > instance:
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=40241
> >
> > Does anyone have any idea what's causing this? Or better, how to fix it?
> > I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
>
> OK, given that nobody knows what the cause/fix is for this, can anyone
> confirm that they can reproduce this on their system with Intel
> graphics?
>
> For me, this occurs after several hibernate/thaw cycles on my ThinkPad
> T510 when KMS is in use.
>
Hi,
I was investigating the issue for a couple of weeks, but I was unable to
reproduce it on any of my machines. Does it happen only on this
ThinkPad, or can you reproduce it consistently on other machines as well?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-04 2:32 ` Eugeni Dodonov
@ 2011-10-04 2:46 ` Bojan Smojver
2011-10-04 3:21 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-04 2:46 UTC (permalink / raw)
To: Eugeni Dodonov; +Cc: intel-gfx
On Mon, 2011-10-03 at 23:32 -0300, Eugeni Dodonov wrote:
> I was investigating the issue for a couple of weeks, but I was unable
> to reproduce it on any of my machines. Does it happen only on this
> ThinkPad, or can you reproduce it consistently on other machines
> as well?
My other machines are either VMs or servers, so hibernation with Intel
graphics and KMS doesn't come into play. So, yeah, for all intents and
purposes, I can only reproduce it on this system.
I believe folks from kernel bug #37142 may have similar hardware. There
are others who can see the same. For instance:
https://bugzilla.redhat.com/show_bug.cgi?id=603897#c31
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-04 2:46 ` Bojan Smojver
@ 2011-10-04 3:21 ` Bojan Smojver
0 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-04 3:21 UTC (permalink / raw)
To: Eugeni Dodonov; +Cc: intel-gfx
On Tue, 2011-10-04 at 13:46 +1100, Bojan Smojver wrote:
> So, yeah, for all intents and purposes, I can only reproduce it on
> this system.
If it matters, before I switched to ThinkPad T510, I had a Dell Inspiron
6400, also with Intel graphics. I was then hitting:
https://bugzilla.redhat.com/show_bug.cgi?id=537494
Which is:
http://bugzilla.kernel.org/show_bug.cgi?id=13811
Which is more or less the same bug. Essentially, pages get corrupted on
thaw after the hibernate/thaw cycle is repeated several times.
If you look at this comment of that Fedora bug:
https://bugzilla.redhat.com/show_bug.cgi?id=537494#c67
You will see that the problem has been fixed at some point.
But then the code regressed again. My guess is that this happened when
the code was rewritten in such a way that the patch that fixed kernel
bug #13811 did not apply any more.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-09-27 6:12 Memory corruption on hibernate/thaw with KMS Bojan Smojver
2011-09-29 23:38 ` Bojan Smojver
2011-10-03 6:44 ` Bojan Smojver
@ 2011-10-10 7:15 ` Bojan Smojver
2011-10-10 7:53 ` Daniel Vetter
2011-10-10 8:01 ` Bojan Smojver
2011-10-14 1:02 ` Bojan Smojver
4 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 7:15 UTC (permalink / raw)
To: intel-gfx
On Tue, 2011-09-27 at 16:12 +1000, Bojan Smojver wrote:
> I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
Just tried some hibernation loops with 3.1.0-rc9+ and got a hang caused
by one of the GPU threads being busy (claimed that GPU was busy). When I
tried the same with nomodeset, I could hibernate/thaw 5 times in a row
just fine.
Will try some more tests, just to confirm whether memory corruption
thing is still present.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 7:15 ` Bojan Smojver
@ 2011-10-10 7:53 ` Daniel Vetter
2011-10-10 8:18 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Daniel Vetter @ 2011-10-10 7:53 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Mon, Oct 10, 2011 at 06:15:06PM +1100, Bojan Smojver wrote:
> On Tue, 2011-09-27 at 16:12 +1000, Bojan Smojver wrote:
> > I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
>
> Just tried some hibernation loops with 3.1.0-rc9+ and got a hang caused
> by one of the GPU threads being busy (claimed that GPU was busy). When I
> tried the same with nomodeset, I could hibernate/thaw 5 times in a row
> just fine.
>
> Will try some more tests, just to confirm whether memory corruption
> thing is still present.
Can you try the patch attached to fdo #40241, i.e.
https://bugs.freedesktop.org/attachment.cgi?id=50648
I think I have an idea what's going wrong here.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-09-27 6:12 Memory corruption on hibernate/thaw with KMS Bojan Smojver
` (2 preceding siblings ...)
2011-10-10 7:15 ` Bojan Smojver
@ 2011-10-10 8:01 ` Bojan Smojver
2011-10-14 1:02 ` Bojan Smojver
4 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 8:01 UTC (permalink / raw)
To: intel-gfx
On Mon, 2011-10-10 at 18:15 +1100, Bojan Smojver wrote:
> Will try some more tests, just to confirm whether memory corruption
> thing is still present.
Yes, still present. I did this on my Fedora 15 system with 3.1.0-rc9+
(git pull as of half an hour ago):
echo -n reboot > /sys/power/disk
for (( i=0; i<15; i++)); do pm-hibernate; sleep 2; done
With nomodeset passed to the kernel, the cycle finished successfully.
One program (modem-manager) segfaulted, but in only one of the cycles;
I'm guessing this is a bug in the program itself.
Without nomodeset (i.e. KMS), I got corruption and "unable to handle
paging request", followed by kernel hang on second thaw. It's pretty
much what happens when corruption occurs.
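The two shell commands above can also be written as a small C harness. This is a sketch under assumptions: `set_disk_mode()` and `run_cycles()` are hypothetical helper names, and the harness assumes `pm-hibernate` (from pm-utils) exits with status 0 once the machine has thawed. Writing `reboot` to /sys/power/disk makes the machine reboot instead of powering off after the image is saved, so the loop can run unattended. A kernel that oopses or hangs on thaw never returns to the loop at all, so the count only reflects failures that `pm-hibernate` itself reports.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write a hibernation mode such as "reboot" to path (normally
 * /sys/power/disk), like `echo -n reboot > /sys/power/disk`.
 * Returns 0 on success, -1 on error. */
static int set_disk_mode(const char *path, const char *mode)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(mode, f) >= 0;
    if (fclose(f) != 0)   /* stdio buffers the write, so errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Run cmd up to n times with pause_s seconds between runs, stopping at
 * the first run that fails. Returns the number of runs that succeeded. */
static int run_cycles(const char *cmd, int n, unsigned pause_s)
{
    for (int i = 0; i < n; i++) {
        int rc = system(cmd);
        if (rc == -1 || !WIFEXITED(rc) || WEXITSTATUS(rc) != 0)
            return i;
        sleep(pause_s);
    }
    return n;
}
```

Usage would be `set_disk_mode("/sys/power/disk", "reboot")` (as root) followed by `run_cycles("pm-hibernate", 15, 2)`, mirroring the 15 cycles and 2-second sleep in the shell loop.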
PS. Kernel 3.1.0-rc9+ was patched with my own hibernation patch that
calculates CRC32 of image pages on hibernate/thaw, so the chances that
what was read in was not what was hibernated were minimal. You can find
it here: http://marc.info/?l=linux-kernel&m=131820444524522&w=2. I'm
experiencing the same with Fedora supplied kernels (i.e.
2.6.40.6-0.fc15.x86_64, which is really 3.0.6).
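The per-page CRC32 check mentioned in the PS works roughly like the sketch below. This is not the patch at the marc.info link; it is a userspace illustration under assumptions: the image is a flat buffer of 4096-byte pages, and `crc32_ieee()`, `checksum_pages()` and `first_bad_page()` are hypothetical names. One checksum per page is recorded when the image is written and recomputed when it is read back. If every page matches, the image itself came back intact, which is what makes storage or I/O an unlikely source of the corruption.

```c
#include <stddef.h>
#include <stdint.h>

#define IMG_PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320), the same
 * value zlib's crc32() produces. Table-free, so slow but short. */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* At "hibernate" time: record one checksum per page of the image. */
static void checksum_pages(const unsigned char *img, size_t npages,
                           uint32_t *sums)
{
    for (size_t i = 0; i < npages; i++)
        sums[i] = crc32_ieee(img + i * IMG_PAGE_SIZE, IMG_PAGE_SIZE);
}

/* At "thaw" time: recompute and return the index of the first page whose
 * checksum differs, or -1 if every page matches. */
static long first_bad_page(const unsigned char *img, size_t npages,
                           const uint32_t *sums)
{
    for (size_t i = 0; i < npages; i++)
        if (crc32_ieee(img + i * IMG_PAGE_SIZE, IMG_PAGE_SIZE) != sums[i])
            return (long)i;
    return -1;
}
```

A single flipped bit in any page is enough to change that page's CRC32, so `first_bad_page()` points at the exact page that differs.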
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 7:53 ` Daniel Vetter
@ 2011-10-10 8:18 ` Bojan Smojver
2011-10-10 10:05 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 8:18 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Mon, 2011-10-10 at 09:53 +0200, Daniel Vetter wrote:
> Can you try the patch attached to fdo #40241, i.e.
> https://bugs.freedesktop.org/attachment.cgi?id=50648
>
> I think I have an idea what's going wrong here.
I will try and let you know. From what I've seen in some other bug
reports, it helped some people, but not others.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 8:18 ` Bojan Smojver
@ 2011-10-10 10:05 ` Bojan Smojver
2011-10-10 10:12 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 10:05 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Mon, 2011-10-10 at 19:18 +1100, Bojan Smojver wrote:
> I will try and let you know. From what I've seen at some other bugs,
> it helped some people, but not others.
After about 20 cycles, got:
-----------------------------------------------
[ 173.834896] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 173.837583] IP: [<ffffffff810f0c73>] shmem_evict_inode+0x8b/0xd5
[ 173.840247] PGD 225d9f067 PUD 22c3ca067 PMD 0
[ 173.842880] Oops: 0000 [#1] SMP
[ 173.845474] CPU 1
[ 173.845491] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf arc4 iwlagn mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant qcserial snd_hda_intel snd_hda_codec usb_wwan snd_hwdep snd_seq cfg80211 uvcvideo videodev snd_seq_device media snd_pcm thinkpad_acpi e1000e snd_timer mxm_wmi pcspkr v4l2_compat_ioctl32 rfkill iTCO_wdt iTCO_vendor_support snd i2c_i801 intel_ips joydev snd_page_alloc soundcore wmi microcode ipv6 sdhci_pci sdhci mmc_core firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[ 173.858814]
[ 173.861482] Pid: 4826, comm: rm Not tainted 3.1.0-rc9+ #105 LENOVO 4313CTO/4313CTO
[ 173.864162] RIP: 0010:[<ffffffff810f0c73>] [<ffffffff810f0c73>] shmem_evict_inode+0x8b/0xd5
[ 173.866713] RSP: 0018:ffff88022c47de08 EFLAGS: 00010246
[ 173.869182] RAX: 000000004e92c20b RBX: ffff880228765f10 RCX: 0000000000038478
[ 173.871654] RDX: 0000000012e1ecfd RSI: ffffffffffffffff RDI: ffffffff81d64300
[ 173.874125] RBP: ffff88022c47de28 R08: 000000000000000e R09: 0000000000000000
[ 173.876540] R10: 0000000000000000 R11: ffffea0007afaf40 R12: 0000000000000000
[ 173.878834] R13: ffffffff81613c70 R14: ffff880228765f00 R15: 0000000000000000
[ 173.881140] FS: 00007fbef7ac4720(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
[ 173.883483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 173.885822] CR2: 0000000000000000 CR3: 000000022c337000 CR4: 00000000000006e0
[ 173.888202] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 173.890595] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 173.892968] Process rm (pid: 4826, threadinfo ffff88022c47c000, task ffff880229aadc80)
[ 173.895369] Stack:
[ 173.897756] ffff880228765f10 ffff880228766010 ffffffff81613c70 00007fffdf431330
[ 173.900193] ffff88022c47de58 ffffffff8113b944 ffff880228766000 ffff880228765f10
[ 173.902662] ffff88022aae9c00 ffffffff81613c70 ffff88022c47de88 ffffffff8113bb74
[ 173.905117] Call Trace:
[ 173.907532] [<ffffffff8113b944>] evict+0x93/0x149
[ 173.909943] [<ffffffff8113bb74>] iput+0x17a/0x182
[ 173.912345] [<ffffffff8113327d>] do_unlinkat+0x110/0x15f
[ 173.914742] [<ffffffff811309b6>] ? path_put+0x1f/0x23
[ 173.917132] [<ffffffff810a2328>] ? audit_syscall_entry+0x145/0x171
[ 173.919509] [<ffffffff811344d1>] sys_unlinkat+0x29/0x2b
[ 173.921882] [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[ 173.924223] Code: 44 f1 15 00 4c 89 63 e0 4c 89 63 e8 48 c7 c7 80 55 a3 81 e8 92 5a 3a 00 eb 09 48 8b 7f c8 e8 47 67 02 00 4c 8b 63 f0 4c 8d 73 f0 <4d> 8b 2c 24 eb 19 49 8b 7c 24 10 e8 2f 67 02 00 4c 89 e7 4d 89
[ 173.929579] RIP [<ffffffff810f0c73>] shmem_evict_inode+0x8b/0xd5
[ 173.932226] RSP <ffff88022c47de08>
[ 173.934866] CR2: 0000000000000000
[ 173.951595] ---[ end trace b2877a78c7f8d86f ]---
-----------------------------------------------
Which may or may not be corruption. Will have to test more.
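For reading traces like the one above: a location such as `shmem_evict_inode+0x8b/0xd5` means the faulting instruction is 0x8b bytes into `shmem_evict_inode`, a function 0xd5 bytes long, so the function starts at RIP minus the offset. The hypothetical helper `parse_oops_sym()` below only splits that notation; with a vmlinux built with debug info, gdb's `list *(shmem_evict_inode+0x8b)` can then map the location to a source line.

```c
#include <stdio.h>
#include <string.h>

/* Split an oops location such as "shmem_evict_inode+0x8b/0xd5" into the
 * function name, the offset of the faulting instruction within it, and the
 * function's size. Returns 0 on success, -1 if the text does not match. */
static int parse_oops_sym(const char *s, char *name, size_t name_len,
                          unsigned long *off, unsigned long *size)
{
    const char *plus = strrchr(s, '+');
    if (plus == NULL || (size_t)(plus - s) >= name_len)
        return -1;
    /* %lx accepts an optional 0x prefix, as strtoul() does with base 16 */
    if (sscanf(plus + 1, "%lx/%lx", off, size) != 2)
        return -1;
    memcpy(name, s, (size_t)(plus - s));
    name[plus - s] = '\0';
    return 0;
}
```

Applied to the RIP above, 0xffffffff810f0c73 minus 0x8b puts the start of `shmem_evict_inode` at 0xffffffff810f0be8.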
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 10:05 ` Bojan Smojver
@ 2011-10-10 10:12 ` Bojan Smojver
2011-10-10 10:37 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 10:12 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Mon, 2011-10-10 at 21:05 +1100, Bojan Smojver wrote:
> Which may or may not be corruption. Will have to test more.
When I attempted to shut the machine down, it hung and I could see more
kernel traces on the console.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 10:12 ` Bojan Smojver
@ 2011-10-10 10:37 ` Bojan Smojver
2011-10-10 11:23 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 10:37 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Mon, 2011-10-10 at 21:12 +1100, Bojan Smojver wrote:
> When I attempted to shut the machine down, it hung and I could see
> more kernel traces on the console.
Tried again, but this time I used chvt 2; sleep 1; chvt 1 in the
sequence (this seems to agitate the bug faster). Indeed, got problems
quickly:
---------------------------
[ 175.770300] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 175.774934] IP: [<ffffffff81243038>] prio_tree_replace+0x4b/0x66
[ 175.779296] PGD 1f88d0067 PUD 1f88af067 PMD 0
[ 175.783593] Oops: 0002 [#1] SMP
[ 175.788025] CPU 2
[ 175.788055] Modules linked in: fuse ppdev parport_pc lp parport sunrpc bnep bluetooth cpufreq_ondemand acpi_cpufreq freq_table mperf arc4 iwlagn snd_hda_codec_hdmi mac80211 snd_hda_codec_conexant uvcvideo snd_hda_intel snd_hda_codec videodev snd_hwdep media snd_seq qcserial v4l2_compat_ioctl32 usb_wwan snd_seq_device snd_pcm cfg80211 thinkpad_acpi snd_timer e1000e intel_ips iTCO_wdt iTCO_vendor_support joydev mxm_wmi snd_page_alloc snd i2c_i801 wmi rfkill pcspkr microcode soundcore ipv6 firewire_ohci sdhci_pci sdhci mmc_core firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[ 175.810238]
[ 175.814616] Pid: 3763, comm: gcm-apply Not tainted 3.1.0-rc9+ #105 LENOVO 4313CTO/4313CTO
[ 175.819031] RIP: 0010:[<ffffffff81243038>] [<ffffffff81243038>] prio_tree_replace+0x4b/0x66
[ 175.823327] RSP: 0018:ffff8801f8bbfce8 EFLAGS: 00010207
[ 175.827099] RAX: ffff880229b84100 RBX: ffff8801f8bfb100 RCX: 0000000000000000
[ 175.829260] RDX: ffff880229b84050 RSI: ffff880229b84100 RDI: ffff88022c12b318
[ 175.831374] RBP: ffff8801f8bbfce8 R08: ffff880229b84100 R09: 0000000000000000
[ 175.833440] R10: ffff8801f8bfbd48 R11: ffff8801f8bfbd10 R12: ffff880229b84050
[ 175.835388] R13: ffff88022c12b318 R14: 0000000000000080 R15: 0000000000000000
[ 175.837373] FS: 0000000000000000(0000) GS:ffff88023bd00000(0000) knlGS:0000000000000000
[ 175.839407] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 175.841446] CR2: 0000000000000010 CR3: 00000002085a1000 CR4: 00000000000006e0
[ 175.843471] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 175.845556] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 175.847616] Process gcm-apply (pid: 3763, threadinfo ffff8801f8bbe000, task ffff8801a864ae40)
[ 175.849738] Stack:
[ 175.851809] ffff8801f8bbfd48 ffffffff8124327a ffff880229b84100 000000000000003d
[ 175.853961] 000000000000003f 0000000000000000 000000000000003d ffff8801f8bfb0b0
[ 175.856109] ffff8801f8bfb100 ffff8801f8bfbd50 ffff88022c12b2f8 ffff8801f8bfbd48
[ 175.858296] Call Trace:
[ 175.860435] [<ffffffff8124327a>] prio_tree_insert+0x16b/0x216
[ 175.862599] [<ffffffff810f12bf>] vma_prio_tree_insert+0x26/0x3c
[ 175.864759] [<ffffffff810fdc3f>] __vma_link_file+0x64/0x66
[ 175.866899] [<ffffffff810fe32e>] vma_link+0x75/0x95
[ 175.869014] [<ffffffff810ffd9a>] mmap_region+0x30a/0x46b
[ 175.871114] [<ffffffff81100194>] do_mmap_pgoff+0x299/0x2f3
[ 175.873205] [<ffffffff81100303>] sys_mmap_pgoff+0x115/0x164
[ 175.875334] [<ffffffff810126d0>] sys_mmap+0x22/0x24
[ 175.877409] [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[ 175.879523] Code: 0f 0b 48 89 17 eb 16 48 89 4a 10 48 8b 4e 10 48 39 31 75 05 48 89 11 eb 04 48 89 51 08 48 8b 08 48 39 c1 74 0a 48 89 0a 48 8b 08
[ 175.879735] 89 51 10 48 8b 48 08 48 39 c1 74 0c 48 89 4a 08 48 8b 48 08
[ 175.884169] RIP [<ffffffff81243038>] prio_tree_replace+0x4b/0x66
[ 175.886469] RSP <ffff8801f8bbfce8>
[ 175.888702] CR2: 0000000000000010
[ 176.196577] ---[ end trace 75df9d9a11de8acd ]---
[ 178.928408] PM: Marking nosave pages: 000000000009e000 - 0000000000100000
[ 178.928418] PM: Marking nosave pages: 00000000bb27c000 - 00000000bb282000
[ 178.928422] PM: Marking nosave pages: 00000000bb35f000 - 00000000bb40f000
[ 178.928430] PM: Marking nosave pages: 00000000bb46f000 - 00000000bb70f000
[ 178.928446] PM: Marking nosave pages: 00000000bb717000 - 00000000bb71f000
[ 178.928451] PM: Marking nosave pages: 00000000bb76c000 - 00000000bb7ff000
[ 178.928457] PM: Marking nosave pages: 00000000bb800000 - 0000000100000000
[ 178.930551] PM: Marking nosave pages: 00000001fc000000 - 0000000200000000
[ 178.930855] PM: Basic memory bitmaps created
[ 178.930859] PM: Syncing filesystems ... done.
[ 179.006362] Freezing user space processes ...
[ 198.991114] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 198.991165] gcm-apply D 0000000000000000 0 3763 1 0x00800084
[ 198.991175] ffff8801f8bbf860 0000000000000086 0000000000000000 ffff880100000000
[ 198.991185] ffff8801a864ae40 ffff8801f8bbffd8 ffff8801f8bbffd8 0000000000012d00
[ 198.991194] ffff88022e5d1720 ffff8801a864ae40 ffff8801a864b2c0 0000000100000002
[ 198.991203] Call Trace:
[ 198.991213] [<ffffffff8149612f>] schedule+0x5a/0x5c
[ 198.991218] [<ffffffff81497514>] rwsem_down_failed_common+0xd3/0x105
[ 198.991222] [<ffffffff814975fc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[ 198.991225] [<ffffffff8149756d>] rwsem_down_read_failed+0x12/0x14
[ 198.991231] [<ffffffff8124a844>] call_rwsem_down_read_failed+0x14/0x30
[ 198.991235] [<ffffffff81496ca4>] ? down_read+0x21/0x25
[ 198.991240] [<ffffffff81091d21>] acct_collect+0x4a/0x182
[ 198.991246] [<ffffffff8105af7c>] do_exit+0x21e/0x722
[ 198.991249] [<ffffffff810591ac>] ? kmsg_dump+0x4b/0xd7
[ 198.991253] [<ffffffff8149876e>] oops_end+0xbc/0xc5
[ 198.991256] [<ffffffff8148ddad>] no_context+0x203/0x212
[ 198.991259] [<ffffffff8148df87>] __bad_area_nosemaphore+0x1cb/0x1ec
[ 198.991263] [<ffffffff8108a8cf>] ? search_module_extables+0x3f/0x69
[ 198.991266] [<ffffffff8148dfbb>] bad_area_nosemaphore+0x13/0x15
[ 198.991270] [<ffffffff8149a716>] do_page_fault+0x1b8/0x37e
[ 198.991276] [<ffffffff811242ae>] ? lookup_page_cgroup+0x28/0x3e
[ 198.991282] [<ffffffff8116ed27>] ? dquot_file_open+0x1b/0x3e
[ 198.991285] [<ffffffff81497c75>] page_fault+0x25/0x30
[ 198.991289] [<ffffffff81243038>] ? prio_tree_replace+0x4b/0x66
[ 198.991292] [<ffffffff8124327a>] prio_tree_insert+0x16b/0x216
[ 198.991297] [<ffffffff810f12bf>] vma_prio_tree_insert+0x26/0x3c
[ 198.991302] [<ffffffff810fdc3f>] __vma_link_file+0x64/0x66
[ 198.991305] [<ffffffff810fe32e>] vma_link+0x75/0x95
[ 198.991309] [<ffffffff810ffd9a>] mmap_region+0x30a/0x46b
[ 198.991312] [<ffffffff81100194>] do_mmap_pgoff+0x299/0x2f3
[ 198.991315] [<ffffffff81100303>] sys_mmap_pgoff+0x115/0x164
[ 198.991322] [<ffffffff810126d0>] sys_mmap+0x22/0x24
[ 198.991326] [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[ 198.991330]
[ 198.991331] Restarting tasks ... done.
[ 198.992966] PM: Basic memory bitmaps freed
[ 198.996762] video LNXVIDEO:00: Restoring backlight state
[ 204.166989] PM: Marking nosave pages: 000000000009e000 - 0000000000100000
[ 204.166994] PM: Marking nosave pages: 00000000bb27c000 - 00000000bb282000
[ 204.166997] PM: Marking nosave pages: 00000000bb35f000 - 00000000bb40f000
[ 204.167001] PM: Marking nosave pages: 00000000bb46f000 - 00000000bb70f000
[ 204.167014] PM: Marking nosave pages: 00000000bb717000 - 00000000bb71f000
[ 204.167016] PM: Marking nosave pages: 00000000bb76c000 - 00000000bb7ff000
[ 204.167021] PM: Marking nosave pages: 00000000bb800000 - 0000000100000000
[ 204.168816] PM: Marking nosave pages: 00000001fc000000 - 0000000200000000
[ 204.169060] PM: Basic memory bitmaps created
[ 204.169061] PM: Syncing filesystems ... done.
[ 204.242358] Freezing user space processes ...
[ 224.228897] Freezing of tasks failed after 20.00 seconds (3 tasks refusing to freeze, wq_busy=0):
[ 224.230914] gnome-settings- D 0000000000000000 0 1716 1553 0x00800084
[ 224.232935] ffff88022a76bcd0 0000000000000086 0000000008100073 ffffea0000000000
[ 224.234996] ffff88022d059720 ffff88022a76bfd8 ffff88022a76bfd8 0000000000012d00
[ 224.237043] ffffffff81a0d020 ffff88022d059720 00000000817c1372 0000000100000001
[ 224.239067] Call Trace:
[ 224.241040] [<ffffffff810f7b97>] ? pmd_offset+0x19/0x3f
[ 224.243014] [<ffffffff8149612f>] schedule+0x5a/0x5c
[ 224.244981] [<ffffffff81496843>] __mutex_lock_common+0x102/0x163
[ 224.246946] [<ffffffff814969dc>] __mutex_lock_slowpath+0x1b/0x1d
[ 224.248883] [<ffffffff81496970>] mutex_lock+0x23/0x37
[ 224.250799] [<ffffffff81055b52>] dup_mm+0x2da/0x488
[ 224.252704] [<ffffffff810566db>] copy_process+0x9b1/0x119c
[ 224.254592] [<ffffffff811fc8dd>] ? security_file_alloc+0x16/0x18
[ 224.256473] [<ffffffff81056ff0>] do_fork+0xef/0x22d
[ 224.258329] [<ffffffff81497596>] ? _raw_spin_lock+0xe/0x10
[ 224.260184] [<ffffffff811309b6>] ? path_put+0x1f/0x23
[ 224.262024] [<ffffffff81016336>] sys_clone+0x28/0x2a
[ 224.263836] [<ffffffff8149e063>] stub_clone+0x13/0x20
[ 224.265631] [<ffffffff8149dd42>] ? system_call_fastpath+0x16/0x1b
[ 224.267423] gnome-settings- D ffff88022ab2c700 0 1721 1553 0x00800084
[ 224.269227] ffff8802286c3890 0000000000000086 0000000000000000 0000000000000000
[ 224.271071] ffff88022a6a2e40 ffff8802286c3fd8 ffff8802286c3fd8 0000000000012d00
[ 224.272912] ffff880219774560 ffff88022a6a2e40 0000000000000000 0000000000000000
[ 224.274740] Call Trace:
[ 224.276530] [<ffffffff8149612f>] schedule+0x5a/0x5c
[ 224.278320] [<ffffffff81497514>] rwsem_down_failed_common+0xd3/0x105
[ 224.280105] [<ffffffff8149756d>] rwsem_down_read_failed+0x12/0x14
[ 224.281874] [<ffffffff8124a844>] call_rwsem_down_read_failed+0x14/0x30
[ 224.283615] [<ffffffff81497a2d>] ? restore_args+0x30/0x30
[ 224.285331] [<ffffffff81496ca4>] ? down_read+0x21/0x25
[ 224.287026] [<ffffffff81497a2d>] ? restore_args+0x30/0x30
[ 224.288707] [<ffffffff8149a723>] do_page_fault+0x1c5/0x37e
[ 224.290389] [<ffffffff81495e9e>] ? __schedule+0x63b/0x669
[ 224.292060] [<ffffffff81075d75>] ? __remove_hrtimer+0x5c/0x83
[ 224.293718] [<ffffffff8149612f>] ? schedule+0x5a/0x5c
[ 224.295365] [<ffffffff81497c75>] page_fault+0x25/0x30
[ 224.297025] [<ffffffff811377e4>] ? do_sys_poll+0x32c/0x389
[ 224.298694] [<ffffffff811377cf>] ? do_sys_poll+0x317/0x389
[ 224.300340] [<ffffffff811368f4>] ? poll_freewait+0xaa/0xaa
[ 224.301971] [<ffffffff811369c0>] ? __pollwait+0xcc/0xcc
[ 224.303588] [<ffffffff814975fc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[ 224.305195] [<ffffffff8105137e>] ? select_task_rq_fair+0x3cc/0x658
[ 224.306785] [<ffffffff8102ab9c>] ? _flat_send_IPI_mask+0x7b/0x84
[ 224.308363] [<ffffffff8105c41f>] ? current_fs_time+0x37/0x3e
[ 224.309931] [<ffffffff8113afec>] ? touch_atime+0xf8/0x113
[ 224.311483] [<ffffffff81082322>] ? get_futex_key+0x8e/0x274
[ 224.313019] [<ffffffff81082a62>] ? futex_wake+0xfe/0x110
[ 224.314539] [<ffffffff811559d4>] ? fsnotify+0x1eb/0x217
[ 224.316046] [<ffffffff811378e4>] sys_poll+0x51/0xbb
[ 224.317618] [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[ 224.319148] gcm-apply D 0000000000000000 0 3763 1 0x00800084
[ 224.320644] ffff8801f8bbf860 0000000000000086 0000000000000000 ffff880100000000
[ 224.322166] ffff8801a864ae40 ffff8801f8bbffd8 ffff8801f8bbffd8 0000000000012d00
[ 224.323680] ffff88022e5d1720 ffff8801a864ae40 ffff8801a864b2c0 0000000100000002
[ 224.325181] Call Trace:
[ 224.326640] [<ffffffff8149612f>] schedule+0x5a/0x5c
[ 224.328098] [<ffffffff81497514>] rwsem_down_failed_common+0xd3/0x105
[ 224.329552] [<ffffffff814975fc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[ 224.330990] [<ffffffff8149756d>] rwsem_down_read_failed+0x12/0x14
[ 224.332418] [<ffffffff8124a844>] call_rwsem_down_read_failed+0x14/0x30
[ 224.333831] [<ffffffff81496ca4>] ? down_read+0x21/0x25
[ 224.335243] [<ffffffff81091d21>] acct_collect+0x4a/0x182
[ 224.336650] [<ffffffff8105af7c>] do_exit+0x21e/0x722
[ 224.338052] [<ffffffff810591ac>] ? kmsg_dump+0x4b/0xd7
[ 224.339455] [<ffffffff8149876e>] oops_end+0xbc/0xc5
[ 224.340861] [<ffffffff8148ddad>] no_context+0x203/0x212
[ 224.342356] [<ffffffff8148df87>] __bad_area_nosemaphore+0x1cb/0x1ec
[ 224.342361] [<ffffffff8108a8cf>] ? search_module_extables+0x3f/0x69
[ 224.342366] [<ffffffff8148dfbb>] bad_area_nosemaphore+0x13/0x15
[ 224.342373] [<ffffffff8149a716>] do_page_fault+0x1b8/0x37e
[ 224.342378] [<ffffffff811242ae>] ? lookup_page_cgroup+0x28/0x3e
[ 224.342382] [<ffffffff8116ed27>] ? dquot_file_open+0x1b/0x3e
[ 224.342385] [<ffffffff81497c75>] page_fault+0x25/0x30
[ 224.342387] [<ffffffff81243038>] ? prio_tree_replace+0x4b/0x66
[ 224.342389] [<ffffffff8124327a>] prio_tree_insert+0x16b/0x216
[ 224.342392] [<ffffffff810f12bf>] vma_prio_tree_insert+0x26/0x3c
[ 224.342395] [<ffffffff810fdc3f>] __vma_link_file+0x64/0x66
[ 224.342397] [<ffffffff810fe32e>] vma_link+0x75/0x95
[ 224.342399] [<ffffffff810ffd9a>] mmap_region+0x30a/0x46b
[ 224.342402] [<ffffffff81100194>] do_mmap_pgoff+0x299/0x2f3
[ 224.342404] [<ffffffff81100303>] sys_mmap_pgoff+0x115/0x164
[ 224.342409] [<ffffffff810126d0>] sys_mmap+0x22/0x24
[ 224.342411] [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[ 224.342414]
[ 224.342414] Restarting tasks ... done.
[ 224.344753] PM: Basic memory bitmaps freed
[ 224.349953] video LNXVIDEO:00: Restoring backlight state
---------------------------
So, at first glance, I would say the patch didn't fix it. Let me repeat
my tests using nomodeset.
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 10:37 ` Bojan Smojver
@ 2011-10-10 11:23 ` Bojan Smojver
2011-10-11 9:29 ` Daniel Vetter
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-10 11:23 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Mon, 2011-10-10 at 21:37 +1100, Bojan Smojver wrote:
> Let me repeat my tests using nomodeset.
I just repeated 67 hibernate/thaw cycles with nomodeset, which took
almost an hour. I'm writing this e-mail from that thawed session. No
trouble whatsoever.
So, two conclusions:
- the problem is in i915 KMS code
- that patch does not fix it
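A rough way to weigh the 67 clean nomodeset cycles, assuming (this is not stated in the thread) that KMS failures strike each cycle independently with a probability p of about 1 in 20, the rate suggested by the run that failed after about 20 cycles:

$$
P(\text{no failure in } n \text{ cycles}) = (1-p)^{n},
\qquad
\left(1-\tfrac{1}{20}\right)^{67} = 0.95^{67} \approx 0.032
$$

So if nomodeset made no difference, a clean run of 67 cycles would be expected only about 3% of the time. One KMS run failed on the second thaw, which would point to a higher p and an even smaller probability, so 1 in 20 is the conservative choice.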
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-10 11:23 ` Bojan Smojver
@ 2011-10-11 9:29 ` Daniel Vetter
2011-10-11 9:42 ` Bojan Smojver
2011-10-11 9:55 ` Bojan Smojver
0 siblings, 2 replies; 35+ messages in thread
From: Daniel Vetter @ 2011-10-11 9:29 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Mon, Oct 10, 2011 at 13:23, Bojan Smojver <bojan@rexursive.com> wrote:
> On Mon, 2011-10-10 at 21:37 +1100, Bojan Smojver wrote:
>> Let me repeat my tests using nomodeset.
>
> I just repeated 67 hibernate/thaw cycles with nomodeset, which took
> almost an hour. I'm writing this e-mail from that thawed session. No
> trouble whatsoever.
>
> So, two conclusions:
>
> - the problem is in i915 KMS code
> - that patch does not fix it
Ok, we have a new idea. Can you boot your kernel with
memmap=2M#512M memmap=2M#1024M
added to your boot cmdline? Also please attach the full dmesg after
boot (only the boot messages matter) both with and without these
options.
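According to the kernel's boot parameter documentation, `memmap=nn[KMG]#ss[KMG]` marks the region from ss to ss+nn as ACPI data, so the kernel does not use it as ordinary RAM. `memmap=2M#512M memmap=2M#1024M` therefore covers 0x20000000 to 0x201fffff and 0x40000000 to 0x401fffff. The thread does not say why those two regions were picked. The parser below is a hypothetical illustration of the syntax (`parse_size()` and `parse_memmap_acpi()` are made-up names), loosely modelled on the suffix handling of the kernel's `memparse()` but limited to K/M/G; it is not the kernel's own code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix, e.g. "2M" -> 2097152.
 * Stores the value in *out and the first unparsed char in *rest. */
static int parse_size(const char *s, uint64_t *out, const char **rest)
{
    char *end;
    uint64_t v = strtoull(s, &end, 0);
    if (end == s)
        return -1;                       /* no digits at all */
    switch (*end) {
    case 'G': case 'g': v <<= 10;        /* fall through */
    case 'M': case 'm': v <<= 10;        /* fall through */
    case 'K': case 'k': v <<= 10; end++; break;
    default: break;
    }
    *out = v;
    *rest = end;
    return 0;
}

/* Parse one "nn[KMG]#ss[KMG]" value (without the "memmap=" prefix).
 * On success the region is [*start, *start + *len) and 0 is returned. */
static int parse_memmap_acpi(const char *arg, uint64_t *start, uint64_t *len)
{
    const char *rest;
    if (parse_size(arg, len, &rest) != 0 || *rest != '#')
        return -1;
    if (parse_size(rest + 1, start, &rest) != 0 || *rest != '\0')
        return -1;
    return 0;
}
```

For "2M#512M" this yields a start of 0x20000000 and a length of 0x200000; forms using other separators (such as `@` or `$`) are rejected by this sketch.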
Thanks, Daniel
--
Daniel Vetter
daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 9:29 ` Daniel Vetter
@ 2011-10-11 9:42 ` Bojan Smojver
2011-10-11 11:02 ` Bojan Smojver
2011-10-11 9:55 ` Bojan Smojver
1 sibling, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-11 9:42 UTC (permalink / raw)
To: daniel; +Cc: intel-gfx
------- Original message -------
> From: Daniel Vetter
> Ok, we have a new idea. Can you boot your kernel with
>
> memmap=2M#512M memmap=2M#1024M
>
> added to your boot cmdline? Also please attach the full dmesg after
> boot (only the boot messages matter) both with and without these
> options.
Shall do. And thanks for looking into it!
--
Bojan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 9:29 ` Daniel Vetter
2011-10-11 9:42 ` Bojan Smojver
@ 2011-10-11 9:55 ` Bojan Smojver
2011-10-11 10:39 ` Daniel Vetter
1 sibling, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-11 9:55 UTC (permalink / raw)
To: daniel; +Cc: intel-gfx
------- Original message -------
> From: Daniel Vetter
> memmap=2M#512M memmap=2M#1024M
>
> added to your boot cmdline?
One question: should I keep the patch that replaces freeze/thaw with
suspend/resume when testing or not?
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 9:55 ` Bojan Smojver
@ 2011-10-11 10:39 ` Daniel Vetter
2011-10-11 10:41 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Daniel Vetter @ 2011-10-11 10:39 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Tue, Oct 11, 2011 at 08:55:26PM +1100, Bojan Smojver wrote:
> ------- Original message -------
> >From: Daniel Vetter
>
> >memmap=2M#512M memmap=2M#1024M
> >
> >added to your boot cmdline?
>
> One question: should I keep the patch that replaces freeze/thaw with
> suspend/resume when testing or not?
Probably drop it. This checks for a rather different kind of bug.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 10:39 ` Daniel Vetter
@ 2011-10-11 10:41 ` Bojan Smojver
0 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-11 10:41 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Tue, 2011-10-11 at 12:39 +0200, Daniel Vetter wrote:
> Probably drop it. This checks for a rather different kind of bug.
OK.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 9:42 ` Bojan Smojver
@ 2011-10-11 11:02 ` Bojan Smojver
2011-10-11 11:22 ` Daniel Vetter
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-11 11:02 UTC (permalink / raw)
To: intel-gfx
On Tue, 2011-10-11 at 20:42 +1100, Bojan Smojver wrote:
> Shall do.
Sent the files to your e-mail address. Ended in a crash.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:02 ` Bojan Smojver
@ 2011-10-11 11:22 ` Daniel Vetter
2011-10-11 11:31 ` Daniel Vetter
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Daniel Vetter @ 2011-10-11 11:22 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Tue, Oct 11, 2011 at 10:02:18PM +1100, Bojan Smojver wrote:
> On Tue, 2011-10-11 at 20:42 +1100, Bojan Smojver wrote:
> > Shall do.
>
> Sent the files to your e-mail address. Ended in a crash.
OK, this bug thread is getting a bit unwieldy, and I am running low on
ideas. Can you open a bug at fdo with the usual details (don't forget lspci
-nn) and all the things we've already gathered/tried?
Thanks, Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:22 ` Daniel Vetter
@ 2011-10-11 11:31 ` Daniel Vetter
2011-10-11 12:25 ` Bojan Smojver
` (3 more replies)
2011-10-11 12:24 ` Bojan Smojver
2011-10-12 3:09 ` Bojan Smojver
2 siblings, 4 replies; 35+ messages in thread
From: Daniel Vetter @ 2011-10-11 11:31 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Tue, Oct 11, 2011 at 01:22:09PM +0200, Daniel Vetter wrote:
> On Tue, Oct 11, 2011 at 10:02:18PM +1100, Bojan Smojver wrote:
> > On Tue, 2011-10-11 at 20:42 +1100, Bojan Smojver wrote:
> > > Shall do.
> >
> > Sent the files to your e-mail address. Ended in a crash.
>
> OK, this bug thread is getting a bit unwieldy, and I am running low on
> ideas. Can you open a bug at fdo with the usual details (don't forget lspci
> -nn) and all the things we've already gathered/tried?
Another thing to check: Do you have the intel_ips module loaded? If so,
does the corruption still occur if you never ever load it?
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:22 ` Daniel Vetter
2011-10-11 11:31 ` Daniel Vetter
@ 2011-10-11 12:24 ` Bojan Smojver
2011-10-11 15:51 ` Eugeni Dodonov
2011-10-12 3:09 ` Bojan Smojver
2 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-11 12:24 UTC (permalink / raw)
To: daniel; +Cc: intel-gfx
------- Original message -------
> From: Daniel Vetter
> OK, this bug thread is getting a bit unwieldy, and I am running low on
> ideas. Can you open a bug at fdo with the usual details (don't forget
> lspci -nn) and all the things we've already gathered/tried?
Shall do.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:31 ` Daniel Vetter
@ 2011-10-11 12:25 ` Bojan Smojver
2011-10-12 1:38 ` Bojan Smojver
` (2 subsequent siblings)
3 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-11 12:25 UTC (permalink / raw)
To: daniel; +Cc: intel-gfx
------- Original message -------
> From: Daniel Vetter
> Another thing to check: Do you have the intel_ips module loaded? If so,
> does the corruption still occur if you never ever load it?
Will check in the morning.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 12:24 ` Bojan Smojver
@ 2011-10-11 15:51 ` Eugeni Dodonov
2011-10-11 16:13 ` Daniel Vetter
0 siblings, 1 reply; 35+ messages in thread
From: Eugeni Dodonov @ 2011-10-11 15:51 UTC (permalink / raw)
To: Bojan Smojver; +Cc: intel-gfx
On Tue, Oct 11, 2011 at 09:24, Bojan Smojver <bojan@rexursive.com> wrote:
> ------- Original message -------
>
>> From: Daniel Vetter
>>
>> OK, this bug thread is getting a bit unwieldy, and I am running low on
>> ideas. Can you open a bug at fdo with the usual details (don't forget lspci
>> -nn) and all the things we've already gathered/tried?
>>
>
> Shall do.
>
Or we could centralize this investigation on
https://bugs.freedesktop.org/show_bug.cgi?id=40241 perhaps?
--
Eugeni Dodonov
<http://eugeni.dodonov.net/>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 15:51 ` Eugeni Dodonov
@ 2011-10-11 16:13 ` Daniel Vetter
0 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2011-10-11 16:13 UTC (permalink / raw)
To: Eugeni Dodonov; +Cc: intel-gfx
On Tue, Oct 11, 2011 at 12:51:02PM -0300, Eugeni Dodonov wrote:
> On Tue, Oct 11, 2011 at 09:24, Bojan Smojver <bojan@rexursive.com> wrote:
>
> > ------- Original message -------
> >
> >> From: Daniel Vetter
> >>
> >> OK, this bug thread is getting a bit unwieldy, and I am running low on
> >> ideas. Can you open a bug at fdo with the usual details (don't forget lspci
> >> -nn) and all the things we've already gathered/tried?
> >>
> >
> > Shall do.
> >
>
> Or we could centralize this investigation on
> https://bugs.freedesktop.org/show_bug.cgi?id=40241 perhaps?
For intractable problems like these, I prefer one bug report per reporter
(especially when the reporter has such a quick turnaround time, as here).
Obviously cross-linking bugs is good, but only tag them as duplicates when
we have proof in the form of a patch (that fixes both bugs).
Merging these hard issues too early just results in an utter mess and
confusion.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:31 ` Daniel Vetter
2011-10-11 12:25 ` Bojan Smojver
@ 2011-10-12 1:38 ` Bojan Smojver
2011-10-14 1:14 ` Bojan Smojver
2011-10-14 1:29 ` Bojan Smojver
3 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-12 1:38 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Tue, 2011-10-11 at 13:31 +0200, Daniel Vetter wrote:
> Do you have the intel_ips module loaded?
I do.
> If so, does the corruption still occur if you never ever load it?
Will have to test.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:22 ` Daniel Vetter
2011-10-11 11:31 ` Daniel Vetter
2011-10-11 12:24 ` Bojan Smojver
@ 2011-10-12 3:09 ` Bojan Smojver
2011-10-12 22:16 ` Bojan Smojver
2011-10-26 3:44 ` Bojan Smojver
2 siblings, 2 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-12 3:09 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Tue, 2011-10-11 at 13:22 +0200, Daniel Vetter wrote:
> Can you open a bug at fdo with the usual details (dont forget lspci
> -nn) and all the things we've already gathered/tried.
Bug #41705.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-12 3:09 ` Bojan Smojver
@ 2011-10-12 22:16 ` Bojan Smojver
2011-10-26 3:44 ` Bojan Smojver
1 sibling, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-12 22:16 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Wed, 2011-10-12 at 14:09 +1100, Bojan Smojver wrote:
> Bug #41705.
Going off list for a while (way too much info I don't need/cannot grok).
If anything new develops, please let me know through the bug. Thanks.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-09-27 6:12 Memory corruption on hibernate/thaw with KMS Bojan Smojver
` (3 preceding siblings ...)
2011-10-10 8:01 ` Bojan Smojver
@ 2011-10-14 1:02 ` Bojan Smojver
2011-10-14 1:39 ` Bojan Smojver
4 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-14 1:02 UTC (permalink / raw)
To: intel-gfx
On Tue, 2011-09-27 at 16:12 +1000, Bojan Smojver wrote:
> Does anyone have any idea what's causing this? Or better, how to fix
> it? I can confirm that as of kernel 3.1.0-rc7 this is still a problem.
Just one more note here. I tested hibernation/thaw cycles on one of my
old machines, an HP Pavilion ZE4201 notebook, which has integrated
Radeon graphics (IGP 340M, which is RS200 chip). This box is a 32-bit
machine, as opposed to my ThinkPad T510, which is running 64-bit stuff.
Similar behaviour on hibernate/thaw: with nomodeset, no trouble; with
KMS, trouble after a few hibernate/thaw cycles (NULL pointer dereferences
and other kernel dumps on the console). Interesting. Maybe the problem is
not Intel-specific after all.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:31 ` Daniel Vetter
2011-10-11 12:25 ` Bojan Smojver
2011-10-12 1:38 ` Bojan Smojver
@ 2011-10-14 1:14 ` Bojan Smojver
2011-10-14 1:29 ` Bojan Smojver
3 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-14 1:14 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Tue, 2011-10-11 at 13:31 +0200, Daniel Vetter wrote:
> Another thing to check: Do you have the intel_ips module loaded? If
> so, does the corruption still occur if you never ever load it?
I still have to test this, but intel_ips is loaded with nomodeset as
well, and then there is no trouble.
Or is it a particular interaction of intel_ips and i915 that you are
worried about?
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-11 11:31 ` Daniel Vetter
` (2 preceding siblings ...)
2011-10-14 1:14 ` Bojan Smojver
@ 2011-10-14 1:29 ` Bojan Smojver
3 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-14 1:29 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Fri, 2011-10-14 at 12:14 +1100, Bojan Smojver wrote:
> I still have to test this
With intel_ips removed but with KMS kept, the machine hung on thaw
number three or four, so pretty soon.
I think the best bet is to look into general KMS code. Of course, take
all I say with lots and lots of salt. :-)
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-14 1:02 ` Bojan Smojver
@ 2011-10-14 1:39 ` Bojan Smojver
2011-10-14 2:01 ` Bojan Smojver
0 siblings, 1 reply; 35+ messages in thread
From: Bojan Smojver @ 2011-10-14 1:39 UTC (permalink / raw)
To: intel-gfx
On Fri, 2011-10-14 at 12:02 +1100, Bojan Smojver wrote:
> Just one more note here.
Forgot to mention: Rafael said that the patch below may help on 64-bit
boxes, so I'm going to test it next.
---------------------
arch/x86/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/arch/x86/mm/init.c
===================================================================
--- linux.orig/arch/x86/mm/init.c
+++ linux/arch/x86/mm/init.c
@@ -63,9 +63,9 @@ static void __init find_early_table_spac
 #ifdef CONFIG_X86_32
 	/* for fixmap */
 	tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
+#endif
 
 	good_end = max_pfn_mapped << PAGE_SHIFT;
-#endif
 
 	base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
 	if (base == MEMBLOCK_ERROR)
---------------------
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-14 1:39 ` Bojan Smojver
@ 2011-10-14 2:01 ` Bojan Smojver
0 siblings, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-14 2:01 UTC (permalink / raw)
To: intel-gfx
On Fri, 2011-10-14 at 12:39 +1100, Bojan Smojver wrote:
> So, I'm going to test this next.
That patch didn't help. Again, the box hung.
--
Bojan
* Re: Memory corruption on hibernate/thaw with KMS
2011-10-12 3:09 ` Bojan Smojver
2011-10-12 22:16 ` Bojan Smojver
@ 2011-10-26 3:44 ` Bojan Smojver
1 sibling, 0 replies; 35+ messages in thread
From: Bojan Smojver @ 2011-10-26 3:44 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Wed, 2011-10-12 at 14:09 +1100, Bojan Smojver wrote:
> Bug #41705.
Just to follow up on this a bit: I cloned Linus' tree as of today (i.e.
what is currently staged for 3.2), then pulled Keith's tree
(git://people.freedesktop.org/~keithp/linux drm-intel-next) over the top
and compiled. I did 26 hibernate/thaw cycles and then went to check the
machine.
Unfortunately, I then got:
---------------------------
[ 729.195407] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 729.199345] IP: [<ffffffff8125121d>] __list_add+0x14/0x7f
[ 729.203288] PGD 0
[ 729.207051] Oops: 0000 [#1] SMP
[ 729.210874] CPU 0
[ 729.210901] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_timer e1000e uvcvideo rfkill snd videodev media v4l2_compat_ioctl32 qcserial usb_wwan mxm_wmi wmi snd_page_alloc microcode i2c_i801 iTCO_wdt iTCO_vendor_support pcspkr intel_ips soundcore joydev ipv6 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[ 729.231392]
[ 729.235351] Pid: 897, comm: dbus-daemon Tainted: G W 3.1.0+ #2 LENOVO 4313CTO/4313CTO
[ 729.239316] RIP: 0010:[<ffffffff8125121d>] [<ffffffff8125121d>] __list_add+0x14/0x7f
[ 729.243143] RSP: 0018:ffff88022ce57d60 EFLAGS: 00010286
[ 729.246941] RAX: ffff8801ab1454d0 RBX: 0000000000000000 RCX: 0000000000000054
[ 729.250770] RDX: 0000000000000000 RSI: ffff880229777100 RDI: ffff8801ab145520
[ 729.254604] RBP: ffff88022ce57d80 R08: ffff88020c7f28e8 R09: 00007f0aaeda4000
[ 729.258294] R10: 0000000000015ff8 R11: 0000000000015fa8 R12: ffff880229777100
[ 729.261820] R13: ffff8801ab145520 R14: ffff8802297770b0 R15: ffff8801ab1450b0
[ 729.265401] FS: 00007f0aaed80800(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[ 729.269032] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 729.272674] CR2: 0000000000000008 CR3: 000000022d275000 CR4: 00000000000006f0
[ 729.276338] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 729.279988] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 729.283616] Process dbus-daemon (pid: 897, threadinfo ffff88022ce56000, task ffff88022ca55c80)
[ 729.287278] Stack:
[ 729.290885] ffff88020c7f28e8 ffff88022c192d80 ffff88022a0bd500 ffff8801ab1454d0
[ 729.294593] ffff88022ce57d90 ffffffff810f1fe5 ffff88022ce57e10 ffffffff81055b6b
[ 729.298289] 0000000000000000 ffff8801ab1450e8 ffff8801ab1450f0 ffff8801ab1450c8
[ 729.301966] Call Trace:
[ 729.305664] [<ffffffff810f1fe5>] vma_prio_tree_add+0x81/0x95
[ 729.309405] [<ffffffff81055b6b>] dup_mm+0x2f3/0x488
[ 729.313143] [<ffffffff810566db>] copy_process+0x9b1/0x119c
[ 729.316888] [<ffffffff811fdca6>] ? security_file_alloc+0x16/0x18
[ 729.320631] [<ffffffff81129db5>] ? get_empty_filp+0xa4/0x133
[ 729.324351] [<ffffffff81056ff0>] do_fork+0xef/0x22d
[ 729.328029] [<ffffffff813daf9e>] ? sock_alloc_file+0xb3/0x114
[ 729.331672] [<ffffffff810440eb>] ? should_resched+0xe/0x2d
[ 729.335300] [<ffffffff8149a9bd>] ? _cond_resched+0xe/0x22
[ 729.338914] [<ffffffff813d9388>] ? might_fault+0xe/0x10
[ 729.342532] [<ffffffff81016336>] sys_clone+0x28/0x2a
[ 729.346128] [<ffffffff814a2ae3>] stub_clone+0x13/0x20
[ 729.349661] [<ffffffff814a27c2>] ? system_call_fastpath+0x16/0x1b
[ 729.353249] Code: ad de 48 b9 00 02 20 00 00 00 ad de 48 89 13 48 89 4b 08 5e 5b 5d c3 55 48 89 e5 41 55 49 89 fd 41 54 49 89 f4 53 48 89 d3 41 50 <4c> 8b 42 08 49 39 f0 74 20 49 89 d1 48 89 f1 48 c7 c2 98 15 7e
[ 729.361220] RIP [<ffffffff8125121d>] __list_add+0x14/0x7f
[ 729.365218] RSP <ffff88022ce57d60>
[ 729.369206] CR2: 0000000000000008
[ 729.439968] ---[ end trace a0f13f2533f6746a ]---
---------------------------
This was followed by the machine becoming weird and throwing a whole lot
more kernel errors, which I could no longer capture.
The only other error was unrelated. It looked like this:
---------------------------
[ 272.029435] ------------[ cut here ]------------
[ 272.029441] WARNING: at drivers/net/ethernet/intel/e1000e/ich8lan.c:870 e1000_acquire_swflag_ich8lan+0x4f/0x143 [e1000e]()
[ 272.029443] Hardware name: 4313CTO
[ 272.029445] e1000e: eth0: contention for Phy access
[ 272.029446] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_timer e1000e uvcvideo rfkill snd videodev media v4l2_compat_ioctl32 qcserial usb_wwan mxm_wmi wmi snd_page_alloc microcode i2c_i801 iTCO_wdt iTCO_vendor_support pcspkr intel_ips soundcore joydev ipv6 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[ 272.029480] Pid: 5712, comm: kworker/1:4 Tainted: G W 3.1.0+ #2
[ 272.029482] Call Trace:
[ 272.029485] [<ffffffff81057a36>] warn_slowpath_common+0x83/0x9b
[ 272.029488] [<ffffffff81057af1>] warn_slowpath_fmt+0x46/0x48
[ 272.029492] [<ffffffff81084f41>] ? smp_call_function_single+0x97/0xfd
[ 272.029499] [<ffffffffa0225767>] e1000_acquire_swflag_ich8lan+0x4f/0x143 [e1000e]
[ 272.029508] [<ffffffffa022e5d0>] __e1000_read_phy_reg_hv+0x4d/0x157 [e1000e]
[ 272.029518] [<ffffffffa022ee8a>] e1000_read_phy_reg_hv+0x13/0x15 [e1000e]
[ 272.029527] [<ffffffffa02327b3>] e1000_phy_read_status+0xf6/0x163 [e1000e]
[ 272.029537] [<ffffffffa0236df2>] e1000_watchdog_task+0x104/0x5d2 [e1000e]
[ 272.029540] [<ffffffff8149a93e>] ? __schedule+0x63b/0x669
[ 272.029550] [<ffffffffa0236cee>] ? e1000_update_mng_vlan+0x68/0x68 [e1000e]
[ 272.029554] [<ffffffff8106eab0>] process_one_work+0x176/0x2a9
[ 272.029559] [<ffffffff8106f5be>] worker_thread+0xda/0x15d
[ 272.029562] [<ffffffff8106f4e4>] ? manage_workers+0x176/0x176
[ 272.029565] [<ffffffff81072a0b>] kthread+0x84/0x8c
[ 272.029568] [<ffffffff814a4934>] kernel_thread_helper+0x4/0x10
[ 272.029572] [<ffffffff81072987>] ? kthread_worker_fn+0x148/0x148
[ 272.029575] [<ffffffff814a4930>] ? gs_change+0x13/0x13
[ 272.029577] ---[ end trace a0f13f2533f67469 ]---
---------------------------
So, yeah, still there with the latest code.
PS. I will post this comment into the bug too.
--
Bojan
end of thread
Thread overview: 35+ messages
2011-09-27 6:12 Memory corruption on hibernate/thaw with KMS Bojan Smojver
2011-09-29 23:38 ` Bojan Smojver
2011-10-03 6:44 ` Bojan Smojver
2011-10-04 2:32 ` Eugeni Dodonov
2011-10-04 2:46 ` Bojan Smojver
2011-10-04 3:21 ` Bojan Smojver
2011-10-10 7:15 ` Bojan Smojver
2011-10-10 7:53 ` Daniel Vetter
2011-10-10 8:18 ` Bojan Smojver
2011-10-10 10:05 ` Bojan Smojver
2011-10-10 10:12 ` Bojan Smojver
2011-10-10 10:37 ` Bojan Smojver
2011-10-10 11:23 ` Bojan Smojver
2011-10-11 9:29 ` Daniel Vetter
2011-10-11 9:42 ` Bojan Smojver
2011-10-11 11:02 ` Bojan Smojver
2011-10-11 11:22 ` Daniel Vetter
2011-10-11 11:31 ` Daniel Vetter
2011-10-11 12:25 ` Bojan Smojver
2011-10-12 1:38 ` Bojan Smojver
2011-10-14 1:14 ` Bojan Smojver
2011-10-14 1:29 ` Bojan Smojver
2011-10-11 12:24 ` Bojan Smojver
2011-10-11 15:51 ` Eugeni Dodonov
2011-10-11 16:13 ` Daniel Vetter
2011-10-12 3:09 ` Bojan Smojver
2011-10-12 22:16 ` Bojan Smojver
2011-10-26 3:44 ` Bojan Smojver
2011-10-11 9:55 ` Bojan Smojver
2011-10-11 10:39 ` Daniel Vetter
2011-10-11 10:41 ` Bojan Smojver
2011-10-10 8:01 ` Bojan Smojver
2011-10-14 1:02 ` Bojan Smojver
2011-10-14 1:39 ` Bojan Smojver
2011-10-14 2:01 ` Bojan Smojver