* hang in pvfb resulting from save/restore?
From: Jeremy Fitzhardinge @ 2008-05-24 7:08 UTC (permalink / raw)
To: Markus Armbruster; +Cc: Xen-devel
I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
If I run X on pvfb, running a couple of instances of something busy like
glxgears, and then do a few rounds of save/restore, one of my "events"
kernel threads goes into 100% CPU spin and X stops responding. I'm not
sure what it's doing, but after a while the softlockup detector triggers:
INFO: task X:3408 blocked for more than 240 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
X D cf5f7da0 0 3408 3403
cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
Call Trace:
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149d2c>] ? lock_contended+0x15a/0x16f
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c045e9de>] mutex_lock_nested+0x17d/0x296
[<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
[<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
[<c0164bb9>] do_wp_page+0xdc/0x6bc
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0149ebc>] ? lock_acquired+0x17b/0x194
[<c016826b>] handle_mm_fault+0xa2b/0xb36
[<c010c9a4>] ? restore_i387+0xeb/0x138
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c014bb2b>] ? lock_acquire+0x99/0xa6
[<c011d1eb>] ? do_page_fault+0x433/0x934
[<c0142089>] ? down_read_trylock+0x37/0x41
[<c011d29f>] do_page_fault+0x4e7/0x934
[<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
[<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
[<c01022b3>] ? xen_restore_fl+0x2e/0x52
[<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
[<c011cdb8>] ? do_page_fault+0x0/0x934
[<c04607c2>] error_code+0x72/0x78
=======================
INFO: lockdep is turned off.
The rest of the system is working OK (though I expect things are getting
queued up on events/1).
I haven't dug in to really see what the problem is, but given that I
just implemented save/restore, it seems likely that pvfb's save/restore
handling will be a bit untested ;)
Other info:
X's wchan is fb_deferred_io_mkwrite
I can't work out where events/1 is spinning. xenctx shows the eip is
xen_irq_disable, apparently unchanging; xenctx doesn't seem to be able
to read the stack, so I don't have any context.
...and now the whole vm has locked up, so I can't investigate more.
J
* Re: hang in pvfb resulting from save/restore?
From: Markus Armbruster @ 2008-05-27 7:59 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Xen-devel, jayakumar.lkml
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
>
> If I run X on pvfb, running a couple of instances of something busy
> like glxgears, and then do a few rounds of save/restore, one of my
> "events" kernel threads goes into 100% CPU spin and X stops
> responding. I'm not sure what it's doing, but after a while the
> softlockup detector triggers:
>
> INFO: task X:3408 blocked for more than 240 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> X D cf5f7da0 0 3408 3403
> cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
> ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
> c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
> Call Trace:
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c0149d2c>] ? lock_contended+0x15a/0x16f
> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
> [<c045e9de>] mutex_lock_nested+0x17d/0x296
> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
> [<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
> [<c0164bb9>] do_wp_page+0xdc/0x6bc
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c0149ebc>] ? lock_acquired+0x17b/0x194
> [<c016826b>] handle_mm_fault+0xa2b/0xb36
> [<c010c9a4>] ? restore_i387+0xeb/0x138
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c014bb2b>] ? lock_acquire+0x99/0xa6
> [<c011d1eb>] ? do_page_fault+0x433/0x934
> [<c0142089>] ? down_read_trylock+0x37/0x41
> [<c011d29f>] do_page_fault+0x4e7/0x934
> [<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
> [<c011cdb8>] ? do_page_fault+0x0/0x934
> [<c04607c2>] error_code+0x72/0x78
Implicates drivers/video/fb_defio.c (author cc'ed). Jaya, any ideas?
> =======================
> INFO: lockdep is turned off.
>
> The rest of the system is working OK (though I expect things are
> getting queued up on events/1).
>
> I haven't dug in to really see what the problem is, but given that I
> just implemented save/restore, it seems likely that pvfb's
> save/restore handling will be a bit untested ;)
Ha, so it's your bug then! ;->
> Other info:
>
> X's wchan is fb_deferred_io_mkwrite
>
> I can't work out where events/1 is spinning. xenctx shows the eip is
> xen_irq_disable, apparently unchanging; xenctx doesn't seem to be able
> to read the stack, so I don't have any context.
>
> ...and now the whole vm has locked up, so I can't investigate more.
>
> J
Okay, thanks!
* Re: Re: hang in pvfb resulting from save/restore?
From: Jeremy Fitzhardinge @ 2008-05-27 10:16 UTC (permalink / raw)
To: Markus Armbruster; +Cc: Xen-devel, jayakumar.lkml
Markus Armbruster wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
>
>> I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
>>
>> If I run X on pvfb, running a couple of instances of something busy
>> like glxgears, and then do a few rounds of save/restore, one of my
>> "events" kernel threads goes into 100% CPU spin and X stops
>> responding. I'm not sure what it's doing, but after a while the
>> softlockup detector triggers:
>>
>> INFO: task X:3408 blocked for more than 240 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> X D cf5f7da0 0 3408 3403
>> cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3 ca838000
>> ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000 0003909e c0149d2c
>> c025a8c3 00000000 00000000 00000000 ffffffff c05d8288 c05d8284 00000200
>> Call Trace:
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0149d2c>] ? lock_contended+0x15a/0x16f
>> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
>> [<c045e9de>] mutex_lock_nested+0x17d/0x296
>> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
>> [<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
>> [<c0164bb9>] do_wp_page+0xdc/0x6bc
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0149ebc>] ? lock_acquired+0x17b/0x194
>> [<c016826b>] handle_mm_fault+0xa2b/0xb36
>> [<c010c9a4>] ? restore_i387+0xeb/0x138
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c014bb2b>] ? lock_acquire+0x99/0xa6
>> [<c011d1eb>] ? do_page_fault+0x433/0x934
>> [<c0142089>] ? down_read_trylock+0x37/0x41
>> [<c011d29f>] do_page_fault+0x4e7/0x934
>> [<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
>> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
>> [<c011cdb8>] ? do_page_fault+0x0/0x934
>> [<c04607c2>] error_code+0x72/0x78
>>
>
> Implicates drivers/video/fb_defio.c (author cc'ed). Jaya, any ideas?
>
Well, looking at that code, it would seem that fb_deferred_io_work() is
spinning indefinitely while holding the mutex, which suggests that the
page list has got corrupted somehow. Not obvious where though...
J
* Re: Re: hang in pvfb resulting from save/restore?
From: Jeremy Fitzhardinge @ 2008-05-27 18:51 UTC (permalink / raw)
To: Jaya Kumar; +Cc: Xen-devel, Markus Armbruster
Jaya Kumar wrote:
> Is this a scenario where multiple processes are mapping identical
> framebuffer areas simultaneously? If so, there is a bug in defio that
> corrupts the pagelist. I posted a patch for that here:
>
> http://marc.info/?l=linux-fbdev-devel&m=120935156922859&w=2
>
> I hope that helps.
>
Not sure. It's X using its /dev/fb driver. I don't think anything else
will have mapped the framebuffer in that case.
I'll do some more experiments with list debugging enabled, to see if
that shakes out any problems.
I'll try your patch, since it doesn't seem to be in current -git. The
spinning around the list is a common symptom of adding something to a
list more than once.
J
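
[Editorial sketch, not part of the original message.] To illustrate why a double add can make a list walk spin: the user-space C below imitates the pointer updates of the kernel's list_add() on a circular doubly linked list. Adding a node that is already first on the list leaves it pointing at itself, so a walk that waits to get back to the head never finishes. This is only a sketch under assumptions: list_init(), list_add_once() and walk_bounded() are hypothetical helper names invented here, and the guard shown is not claimed to be what the defio patch referenced above does.

```c
#include <stddef.h>

/* Minimal user-space imitation of a circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical helper: make head an empty list (points at itself). */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_node right after head, using the same four pointer
 * updates as the kernel's list_add(). No check for double adds. */
static void list_add(struct list_head *new_node, struct list_head *head)
{
    struct list_head *prev = head;
    struct list_head *next = head->next;

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
}

/* Hypothetical guard: add node only if it is not already on the list.
 * Uses a linear scan, so it assumes the list is still well formed.
 * Returns 1 if the node was added, 0 if it was already present. */
static int list_add_once(struct list_head *node, struct list_head *head)
{
    struct list_head *pos;

    for (pos = head->next; pos != head; pos = pos->next)
        if (pos == node)
            return 0;
    list_add(node, head);
    return 1;
}

/* Hypothetical helper: walk the list, giving up after limit steps.
 * Returns the number of entries, or -1 if head was not reached
 * within limit steps (the list is probably cyclic). */
static int walk_bounded(const struct list_head *head, int limit)
{
    const struct list_head *pos;
    int n = 0;

    for (pos = head->next; pos != head; pos = pos->next)
        if (++n > limit)
            return -1;
    return n;
}
```

With nodes a and then b added to an empty list, walk_bounded() counts two entries. Calling list_add() on b a second time sets b.next to b itself, and the bounded walk gives up instead of returning to the head. An unbounded walk in the same state would loop forever, which would be consistent with a worker spinning while holding the mutex and X blocking on that mutex in its fault path.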