public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd
@ 2009-11-09 11:50 Rafael J. Wysocki
  2009-11-09 12:02 ` Ingo Molnar
       [not found] ` <20091109120217.GB18399@elte.hu>
  0 siblings, 2 replies; 56+ messages in thread
From: Rafael J. Wysocki @ 2009-11-09 11:50 UTC (permalink / raw)
  To: LKML
  Cc: Greg KH, Jesse Barnes, Ingo Molnar, pm list, Linus Torvalds,
	Thomas Gleixner

Hi,

For the last couple of days I've been trying to debug resume problems on
Toshiba Portege R500 that seem to be to a leakage of preempt_count in the
"events" thread.

To start with, more-or-less since 2.6.32-rc4 I've been observing a
resume-from-hibernation issue causing the box to hang solid, apparently during
the final switch from a VT to X (it ends up with the X cursor in the black
background).  Initially I thought it was related to the graphics driver (i915),
but I didn't use KMS on this box and the non-KMS suspend-resume handling
hasn't really changed in there since 2.6.31.

However, after 2.6.32-rc6 I started to observe similar symptoms during resume
from suspend to RAM.  Then, I thought it was a separate issue and tried to
narrow it down.  I thought it might be related to the VGA framebuffer, so I
started to boot with vga=0 and it turned out that it apparently was a NULL
pointer dereference in worker_thread() of the "events" thread (with the
graphics framebuffer the box just hanged with the X cursor in the black
backgroung, sometimes it managed to display the windows and then hanged).

I'm saying "apparently", because I've never been able to see the entire oops
message.  I only saw a call trace pointing to autoremove_wake_function() and
saying that RIP was somewhere in worker_thread(), so I used gdb to find the
code this address corresponded to and it turned out to be the first instruction
of __list_del().  This, in turn, appeared to mean that the list_del_init() in
run_workqueue(), which was inlined, hit the NULL pointer and that seemed quite
strange to me.

Nevertheless, I started to search for commits that might have caused this
to happen and I first found out that the problem was not reproducible without
commit 9905d1b411946fb3fb228e8c6529fd94afda8a92 (PM / yenta: Split resume into
early and late parts (rev. 4)) from myself, but this commit didn't do anything
that might result in the observed behavior.  Moreover, it didn't cause any
problems to happen when applied on top of 2.6.31.5 on the very same box with
a very similar .config.  So, I thought that the commit just made the issue more
likely to happen and I started to use it as a "trigger" in further debugging.

Next, I started to look for kernel versions in which I was able to reproduce
the problem and I found that it was already present in 2.6.32-rc1 (or
2.6.32-rc2), so apparently it had to be introduced during the merge window.
So, I thought I'd carry out a bisection of the merge window commits and
I verified that the problem was not reproducible with the kernel where
commit 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3 was the head
(approximately in the middle of the merge window).  However, in the next
step of bisection, in which commit c720f5655df159a630fa0290a0bd67c93e92b0bf was
the head, I got the following call trace:

[ 2016.865041] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/29920                                                                                         
[ 2016.865344] caller is vmstat_update+0x13/0x48                                          
[ 2016.865522] Pid: 29920, comm: events/1 Not tainted 2.6.31-tst #158                     
[ 2016.865700] Call Trace:                                                                
[ 2016.865877]  [<ffffffff811608e8>] debug_smp_processor_id+0xc4/0xd4                     
[ 2016.866052]  [<ffffffff810a9ae1>] vmstat_update+0x13/0x48                              
[ 2016.866232]  [<ffffffff81051ee6>] worker_thread+0x18b/0x22a                            
[ 2016.866409]  [<ffffffff810a9ace>] ? vmstat_update+0x0/0x48                             
[ 2016.866578]  [<ffffffff810556a5>] ? autoremove_wake_function+0x0/0x38                  
[ 2016.866749]  [<ffffffff81288803>] ? _spin_unlock_irqrestore+0x35/0x37                  
[ 2016.866935]  [<ffffffff81051d5b>] ? worker_thread+0x0/0x22a                            
[ 2016.867113]  [<ffffffff8105547d>] kthread+0x69/0x71                                    
[ 2016.867278]  [<ffffffff8100c1aa>] child_rip+0xa/0x20                                   
[ 2016.867450]  [<ffffffff81055414>] ? kthread+0x0/0x71                                   
[ 2016.867618]  [<ffffffff8100c1a0>] ? child_rip+0x0/0x20                                 
[ 2016.867781] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/29920                                                                                         
[ 2016.868077] caller is vmstat_update+0x2e/0x48                                          
[ 2016.868239] Pid: 29920, comm: events/1 Not tainted 2.6.31-tst #158                     
[ 2016.868406] Call Trace:                                                                
[ 2016.868576]  [<ffffffff811608e8>] debug_smp_processor_id+0xc4/0xd4                     
[ 2016.868756]  [<ffffffff810a9afc>] vmstat_update+0x2e/0x48                              
[ 2016.868938]  [<ffffffff81051ee6>] worker_thread+0x18b/0x22a                            
[ 2016.869121]  [<ffffffff810a9ace>] ? vmstat_update+0x0/0x48                             
[ 2016.869292]  [<ffffffff810556a5>] ? autoremove_wake_function+0x0/0x38                  
[ 2016.869466]  [<ffffffff81288803>] ? _spin_unlock_irqrestore+0x35/0x37                  
[ 2016.869635]  [<ffffffff81051d5b>] ? worker_thread+0x0/0x22a                            
[ 2016.869808]  [<ffffffff8105547d>] kthread+0x69/0x71                                    
[ 2016.869979]  [<ffffffff8100c1aa>] child_rip+0xa/0x20                                   
[ 2016.870149]  [<ffffffff81055414>] ? kthread+0x0/0x71                                   
[ 2016.870317]  [<ffffffff8100c1a0>] ? child_rip+0x0/0x20

where, according to gdb, worker_thread+0x18b corresponded to the

if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {

condition in run_workqueue().  That didn't kill the box, but I wonder if the
leakage of preempt_count may lead to the solid hang observed before.

I also am not sure if I should mark this commit as "bad" or "good" in the
bisection process.

Please advise.

Rafael

^ permalink raw reply	[flat|nested] 56+ messages in thread
[parent not found: <Pine.LNX.4.44L0.0911091527390.2858-100000@iolanthe.rowland.org>]
[parent not found: <200911092148.03815.rjw@sisk.pl>]

end of thread, other threads:[~2009-11-16 11:01 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-09 11:50 Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd Rafael J. Wysocki
2009-11-09 12:02 ` Ingo Molnar
     [not found] ` <20091109120217.GB18399@elte.hu>
2009-11-09 12:24   ` Rafael J. Wysocki
     [not found]   ` <200911091324.37955.rjw@sisk.pl>
2009-11-09 12:49     ` Ingo Molnar
2009-11-09 14:02       ` Thomas Gleixner
     [not found]       ` <alpine.LFD.2.00.0911091452300.2725@localhost.localdomain>
2009-11-09 14:16         ` Mike Galbraith
2009-11-09 14:26         ` Rafael J. Wysocki
     [not found]         ` <1257776176.6365.8.camel@marge.simson.net>
2009-11-09 14:27           ` Rafael J. Wysocki
2009-11-09 14:30             ` Mike Galbraith
     [not found]             ` <1257777040.6365.15.camel@marge.simson.net>
2009-11-09 15:47               ` Rafael J. Wysocki
     [not found]               ` <200911091647.54171.rjw@sisk.pl>
2009-11-09 16:19                 ` Mike Galbraith
     [not found]                 ` <1257783553.6408.6.camel@marge.simson.net>
2009-11-09 17:36                   ` Rafael J. Wysocki
     [not found]                   ` <200911091836.30349.rjw@sisk.pl>
2009-11-09 18:50                     ` Thomas Gleixner
2009-11-09 19:13                     ` Thomas Gleixner
     [not found]                     ` <alpine.LFD.2.00.0911091949500.2725@localhost.localdomain>
2009-11-09 20:00                       ` Rafael J. Wysocki
     [not found]                       ` <200911092100.58187.rjw@sisk.pl>
2009-11-09 20:31                         ` Alan Stern
2009-11-09 20:45                         ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Rafael J. Wysocki
     [not found]                         ` <200911092145.27485.rjw@sisk.pl>
2009-11-09 21:42                           ` Linus Torvalds
2009-11-10  0:19                             ` Rafael J. Wysocki
     [not found]                             ` <200911100119.38019.rjw@sisk.pl>
2009-11-10 22:02                               ` Linus Torvalds
2009-11-11  8:08                                 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Tejun Heo
2009-11-11 11:52                                 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Rafael J. Wysocki
2009-11-11 16:13                                 ` Oleg Nesterov
2009-11-11 17:17                                 ` Oleg Nesterov
     [not found]                                 ` <4AFA70FB.60706@kernel.org>
2009-11-11 18:13                                   ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Oleg Nesterov
     [not found]                                   ` <20091111181349.GA30638@redhat.com>
2009-11-12  4:56                                     ` Tejun Heo
     [not found]                                     ` <4AFB9595.1030002@kernel.org>
2009-11-12 18:35                                       ` Oleg Nesterov
     [not found]                                       ` <20091112183508.GA14661@redhat.com>
2009-11-12 19:14                                         ` Tejun Heo
2009-11-16 11:01                                           ` Tejun Heo
     [not found]                                 ` <200911111252.48214.rjw@sisk.pl>
2009-11-11 19:52                                   ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Linus Torvalds
2009-11-11 20:18                                     ` Marcel Holtmann
     [not found]                                     ` <1257970719.21467.4.camel@violet>
2009-11-11 20:25                                       ` Linus Torvalds
2009-11-11 21:18                                         ` Rafael J. Wysocki
2009-11-11 21:13                                       ` Oliver Neukum
     [not found]                                       ` <200911112213.30673.oliver@neukum.org>
2009-11-11 21:38                                         ` Linus Torvalds
2009-11-11 21:44                                           ` Oliver Neukum
     [not found]                                 ` <20091111161348.GA27394@redhat.com>
2009-11-11 20:00                                   ` Rafael J. Wysocki
     [not found]                                   ` <200911112100.16561.rjw@sisk.pl>
2009-11-11 20:11                                     ` Linus Torvalds
     [not found]                                     ` <alpine.LFD.2.01.0911111202490.31845@localhost.localdomain>
2009-11-11 20:20                                       ` Marcel Holtmann
2009-11-11 20:24                                     ` Oleg Nesterov
     [not found]                                     ` <20091111202433.GA5714@redhat.com>
2009-11-11 21:15                                       ` Oliver Neukum
     [not found]                                 ` <20091111171703.GA28506@redhat.com>
2009-11-12 17:33                                   ` Thomas Gleixner
     [not found]                                   ` <alpine.LFD.2.00.0911121821370.24119@localhost.localdomain>
2009-11-12 19:17                                     ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Tejun Heo
2009-11-12 20:53                                     ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Rafael J. Wysocki
     [not found]                                     ` <4AFC5F2E.9040709@kernel.org>
2009-11-12 20:53                                       ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Thomas Gleixner
     [not found]                                     ` <200911122153.12419.rjw@sisk.pl>
2009-11-12 20:55                                       ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Thomas Gleixner
     [not found]                                       ` <alpine.LFD.2.00.0911122153320.24119@localhost.localdomain>
2009-11-12 22:55                                         ` Rafael J. Wysocki
     [not found]                                         ` <200911122355.11675.rjw@sisk.pl>
2009-11-12 23:08                                           ` Thomas Gleixner
2009-11-15 23:37                                     ` Frederic Weisbecker
     [not found]                                     ` <20091115233703.GA6090@nowhere>
2009-11-15 23:40                                       ` Frederic Weisbecker
     [not found]                     ` <alpine.LFD.2.00.0911092012340.2725@localhost.localdomain>
2009-11-09 20:03                       ` Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd Rafael J. Wysocki
     [not found]         ` <200911091526.02147.rjw@sisk.pl>
2009-11-09 14:44           ` Mike Galbraith
     [not found]           ` <1257777896.6365.21.camel@marge.simson.net>
2009-11-09 15:47             ` Rafael J. Wysocki
2009-11-09 15:57         ` Linus Torvalds
     [not found] <Pine.LNX.4.44L0.0911091527390.2858-100000@iolanthe.rowland.org>
2009-11-09 20:48 ` Rafael J. Wysocki
     [not found] <200911092148.03815.rjw@sisk.pl>
2009-11-09 21:24 ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox