* hang in pvfb resulting from save/restore?
From: Jeremy Fitzhardinge @ 2008-05-24 7:08 UTC
To: Markus Armbruster; +Cc: Xen-devel
I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
If I run X on pvfb, running a couple of instances of something busy like
glxgears, and then do a few rounds of save/restore, one of my "events"
kernel threads goes into 100% CPU spin and X stops responding. I'm not
sure what it's doing, but after a while the softlockup detector triggers:
INFO: task X:3408 blocked for more than 240 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
X D cf5f7da0 0 3408 3403
cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
Call Trace:
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149d2c>] ? lock_contended+0x15a/0x16f
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c045e9de>] mutex_lock_nested+0x17d/0x296
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
[<c0164bb9>] do_wp_page+0xdc/0x6bc
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149ebc>] ? lock_acquired+0x17b/0x194
[<c016826b>] handle_mm_fault+0xa2b/0xb36
[<c010c9a4>] ? restore_i387+0xeb/0x138
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c014bb2b>] ? lock_acquire+0x99/0xa6
[<c011d1eb>] ? do_page_fault+0x433/0x934
[<c0142089>] ? down_read_trylock+0x37/0x41
[<c011d29f>] do_page_fault+0x4e7/0x934
[<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
[<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
[<c011cdb8>] ? do_page_fault+0x0/0x934
[<c04607c2>] error_code+0x72/0x78
=======================
INFO: lockdep is turned off.
The rest of the system is working OK (though I expect things are getting
queued up on events/1).
I haven't dug in to really see what the problem is, but given that I
just implemented save/restore, it seems likely that pvfb's save/restore
handling will be a bit untested ;)
Other info:
X's wchan is fb_deferred_io_mkwrite
I can't work out where events/1 is spinning. xenctx shows the eip is
xen_irq_disable, apparently unchanging; xenctx doesn't seem to be able
to read the stack, so I don't have any context.
...and now the whole vm has locked up, so I can't investigate more.
J
* Re: hang in pvfb resulting from save/restore?
From: Markus Armbruster @ 2008-05-27 7:59 UTC
To: Jeremy Fitzhardinge; +Cc: Xen-devel, jayakumar.lkml
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
>
> If I run X on pvfb, running a couple of instances of something busy
> like glxgears, and then do a few rounds of save/restore, one of my
> "events" kernel threads goes into 100% CPU spin and X stops
> responding. I'm not sure what it's doing, but after a while the
> softlockup detector triggers:
>
> INFO: task X:3408 blocked for more than 240 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> X D cf5f7da0 0 3408 3403
> cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
> ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
> c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
> Call Trace:
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c0149d2c>] ? lock_contended+0x15a/0x16f
> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
> [<c045e9de>] mutex_lock_nested+0x17d/0x296
> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
> [<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
> [<c0164bb9>] do_wp_page+0xdc/0x6bc
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c0149ebc>] ? lock_acquired+0x17b/0x194
> [<c016826b>] handle_mm_fault+0xa2b/0xb36
> [<c010c9a4>] ? restore_i387+0xeb/0x138
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c014bb2b>] ? lock_acquire+0x99/0xa6
> [<c011d1eb>] ? do_page_fault+0x433/0x934
> [<c0142089>] ? down_read_trylock+0x37/0x41
> [<c011d29f>] do_page_fault+0x4e7/0x934
> [<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
> [<c011cdb8>] ? do_page_fault+0x0/0x934
> [<c04607c2>] error_code+0x72/0x78
Implicates drivers/video/fb_defio.c (author cc'ed). Jaya, any ideas?
> =======================
> INFO: lockdep is turned off.
>
> The rest of the system is working OK (though I expect things are
> getting queued up on events/1).
>
> I haven't dug in to really see what the problem is, but given that I
> just implemented save/restore, it seems likely that pvfb's
> save/restore handling will be a bit untested ;)
Ha, so it's your bug then! ;->
> Other info:
>
> X's wchan is fb_deferred_io_mkwrite
>
> I can't work out where events/1 is spinning. xenctx shows the eip is
> xen_irq_disable, apparently unchanging; xenctx doesn't seem to be able
> to read the stack, so I don't have any context.
>
> ...and now the whole vm has locked up, so I can't investigate more.
>
> J
Okay, thanks!
* Re: Re: hang in pvfb resulting from save/restore?
From: Jeremy Fitzhardinge @ 2008-05-27 10:16 UTC
To: Markus Armbruster; +Cc: Xen-devel, jayakumar.lkml
Markus Armbruster wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
>
>> I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
>>
>> If I run X on pvfb, running a couple of instances of something busy
>> like glxgears, and then do a few rounds of save/restore, one of my
>> "events" kernel threads goes into 100% CPU spin and X stops
>> responding. I'm not sure what it's doing, but after a while the
>> softlockup detector triggers:
>>
>> INFO: task X:3408 blocked for more than 240 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> X D cf5f7da0 0 3408 3403
>> cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
>> ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
>> c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
>> Call Trace:
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0149d2c>] ? lock_contended+0x15a/0x16f
>> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
>> [<c045e9de>] mutex_lock_nested+0x17d/0x296
>> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
>> [<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
>> [<c0164bb9>] do_wp_page+0xdc/0x6bc
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0149ebc>] ? lock_acquired+0x17b/0x194
>> [<c016826b>] handle_mm_fault+0xa2b/0xb36
>> [<c010c9a4>] ? restore_i387+0xeb/0x138
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c014bb2b>] ? lock_acquire+0x99/0xa6
>> [<c011d1eb>] ? do_page_fault+0x433/0x934
>> [<c0142089>] ? down_read_trylock+0x37/0x41
>> [<c011d29f>] do_page_fault+0x4e7/0x934
>> [<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
>> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
>> [<c011cdb8>] ? do_page_fault+0x0/0x934
>> [<c04607c2>] error_code+0x72/0x78
>>
>
> Implicates drivers/video/fb_defio.c (author cc'ed). Jaya, any ideas?
>
Well, looking at that code, it would seem that fb_deferred_io_work() is
spinning indefinitely while holding the mutex, which suggests that the
page list has got corrupted somehow. Not obvious where though...
J