From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: Re: hang in pvfb resulting from save/restore?
Date: Tue, 27 May 2008 11:16:18 +0100
Message-ID: <483BDF72.408@goop.org>
References: <4837BED0.7020005@goop.org> <87ve10kwj8.fsf@pike.pond.sub.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <87ve10kwj8.fsf@pike.pond.sub.org>
List-Unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Markus Armbruster <armbru@redhat.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>, jayakumar.lkml@gmail.com
List-Id: xen-devel@lists.xenproject.org

Markus Armbruster wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
>   
>> I suspect there's a bug in xen-pvfb, possibly triggered by save/restore.
>>
>> If I run X on pvfb, running a couple of instances of something busy
>> like glxgears, and then do a few rounds of save/restore, one of my
>> "events" kernel threads goes into 100% CPU spin and X stops
>> responding.  I'm not sure what it's doing, but after a while the
>> softlockup detector triggers:
>>
>> INFO: task X:3408 blocked for more than 240 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> X             D cf5f7da0     0  3408   3403
>>       cf5f7de4 00000202 00000000 cf5f7da0 c05c3400 cf5f7da8 c01022b3
>> ca838000     ca838268 c184fa80 00000000 00000040 cf5f6000 ca838000
>> 0003909e c0149d2c     c025a8c3 00000000 00000000 00000000 ffffffff
>> c05d8288 c05d8284 00000200
>> Call Trace:
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0149d2c>] ? lock_contended+0x15a/0x16f
>> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
>> [<c045e9de>] mutex_lock_nested+0x17d/0x296
>> [<c025a8c3>] ? fb_deferred_io_mkwrite+0x23/0x56
>> [<c025a8c3>] fb_deferred_io_mkwrite+0x23/0x56
>> [<c0164bb9>] do_wp_page+0xdc/0x6bc
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0149ebc>] ? lock_acquired+0x17b/0x194
>> [<c016826b>] handle_mm_fault+0xa2b/0xb36
>> [<c010c9a4>] ? restore_i387+0xeb/0x138
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c014bb2b>] ? lock_acquire+0x99/0xa6
>> [<c011d1eb>] ? do_page_fault+0x433/0x934
>> [<c0142089>] ? down_read_trylock+0x37/0x41
>> [<c011d29f>] do_page_fault+0x4e7/0x934
>> [<c0105ac1>] ? restore_sigcontext+0x14d/0x1cb
>> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
>> [<c01022b3>] ? xen_restore_fl+0x2e/0x52
>> [<c0148ca7>] ? trace_hardirqs_off+0xb/0xd
>> [<c011cdb8>] ? do_page_fault+0x0/0x934
>> [<c04607c2>] error_code+0x72/0x78
>>     
>
> Implicates drivers/video/fb_defio.c (author cc'ed).  Jaya, any ideas?
>   

Well, looking at that code, it would seem that fb_deferred_io_work() is 
spinning indefinitely while holding the mutex, which suggests that the 
page list has got corrupted somehow.  Not obvious where though...

    J