From: Jeremy Fitzhardinge
Subject: Re: Re: hang in pvfb resulting from save/restore?
Date: Tue, 27 May 2008 11:16:18 +0100
Message-ID: <483BDF72.408@goop.org>
References: <4837BED0.7020005@goop.org> <87ve10kwj8.fsf@pike.pond.sub.org>
In-Reply-To: <87ve10kwj8.fsf@pike.pond.sub.org>
To: Markus Armbruster
Cc: Xen-devel, jayakumar.lkml@gmail.com
List-Id: xen-devel@lists.xenproject.org

Markus Armbruster wrote:
> Jeremy Fitzhardinge writes:
>
>> I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
>>
>> If I run X on pvfb, running a couple of instances of something busy
>> like glxgears, and then do a few rounds of save/restore, one of my
>> "events" kernel threads goes into 100% CPU spin and X stops
>> responding. I'm not sure what it's doing, but after a while the
>> softlockup detector triggers:
>>
>> INFO: task X:3408 blocked for more than 240 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> X D cf5f7da0 0 3408 3403
>> cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3
>> ca838000 ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000
>> 0003909e c0149d2c c025a8c3 00000000 00000000 00000000 ffffffff
>> c05d8288 c05d8284 00000200
>> Call Trace:
>> [] ? xen_restore_fl+0x2e/0x52
>> [] ? lock_contended+0x15a/0x16f
>> [] ? fb_deferred_io_mkwrite+0x23/0x56
>> [] mutex_lock_nested+0x17d/0x296
>> [] ? fb_deferred_io_mkwrite+0x23/0x56
>> [] fb_deferred_io_mkwrite+0x23/0x56
>> [] do_wp_page+0xdc/0x6bc
>> [] ? xen_restore_fl+0x2e/0x52
>> [] ? lock_acquired+0x17b/0x194
>> [] handle_mm_fault+0xa2b/0xb36
>> [] ? restore_i387+0xeb/0x138
>> [] ? xen_restore_fl+0x2e/0x52
>> [] ? lock_acquire+0x99/0xa6
>> [] ? do_page_fault+0x433/0x934
>> [] ? down_read_trylock+0x37/0x41
>> [] do_page_fault+0x4e7/0x934
>> [] ? restore_sigcontext+0x14d/0x1cb
>> [] ? trace_hardirqs_off+0xb/0xd
>> [] ? xen_restore_fl+0x2e/0x52
>> [] ? trace_hardirqs_off+0xb/0xd
>> [] ? do_page_fault+0x0/0x934
>> [] error_code+0x72/0x78
>>
>
> Implicates drivers/video/fb_defio.c (author cc'ed). Jaya, any ideas?
>

Well, looking at that code, it would seem that fb_deferred_io_work() is
spinning indefinitely while holding the mutex, which suggests that the
page list has got corrupted somehow. Not obvious where though...

    J