From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeremy Fitzhardinge Subject: Re: Prepping a bugfix push Date: Thu, 03 Dec 2009 17:10:42 -0800 Message-ID: <4B186192.8000201@goop.org> References: <4B1810DF.40309@goop.org> <20091203193540.GB4228@kremvax.cs.ubc.ca> <4B184830.7070107@goop.org> <20091204002406.GB5897@kremvax.cs.ubc.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20091204002406.GB5897@kremvax.cs.ubc.ca> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Ian.Campbell@eu.citrix.com, konrad.wilk@oracle.com, pbonzini@redhat.com, xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org On 12/03/09 16:24, Brendan Cully wrote: > On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote: > >> On 12/03/09 11:35, Brendan Cully wrote: >> >>> Not a patch, but I've just tried out xm save -c again with the latest >>> xen changes, and while I no longer see the grant table version panic, >>> the guest's devices (aside from the console) appear to be wedged on >>> resume. Is anyone else seeing this? >>> >>> After a while on the console I see messages like this: >>> >>> INFO: task syslogd:2219 blocked for more than 120 seconds. >>> >>> which I assume is trouble with the block device. >>> >> Is there any backtrace or other info associated with this? >> > I take it you couldn't reproduce? Let me know what you want > captured. In the meantime, here's the kernel log: > Nope, works for me :) Ian has been doing testing with hundreds or thousands of suspend/resumes, so it *can* be reliable. That said, I am seeing some odd hangs after suspend/resume with the bugfix branch merged into Linux mainline... Can you ping the domain when its in that state? J > Dec 4 00:16:12 p32 kernel: suspending xenstore... > Dec 4 00:16:12 p32 kernel: Grant tables using version 2 layout. > > Here's a backtrace after the block message: > > INFO: task kjournald:880 blocked for more than 120 seconds. > "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kjournald D 00000000 0 880 2 0x00000000 > cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001 > cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380 > c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854 > Call Trace: > [] ? check_events+0x8/0xc > [] io_schedule+0x1e/0x28 > [] sync_buffer+0x33/0x37 > [] __wait_on_bit+0x45/0x62 > [] ? sync_buffer+0x0/0x37 > [] out_of_line_wait_on_bit+0x5f/0x72 > [] ? sync_buffer+0x0/0x37 > [] ? wake_bit_function+0x0/0x47 > [] __wait_on_buffer+0x23/0x25 > [] journal_commit_transaction+0x50b/0xdeb > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? _spin_unlock_irqrestore+0x17/0x1d > [] ? try_to_del_timer_sync+0x49/0x52 > [] kjournald+0xab/0x1f0 > [] ? autoremove_wake_function+0x0/0x38 > [] ? kjournald+0x0/0x1f0 > [] kthread+0x74/0x7f > [] ? kthread+0x0/0x7f > [] kernel_thread_helper+0x7/0x10 > > and here's the result of xm sysrq l > SysRq : Show backtrace of all active CPUs > > and xm sysrq w > > SysRq : Show Blocked State > task PC stack pid father > pdflush D cf40297c 0 150 2 0x00000000 > cf95dec8 00000246 00000000 cf40297c c1550280 c1553680 c10067df c1550280 > 00000000 c152f380 c152f380 c152f380 c1550280 cf8b3130 cf8b32bc d0808380 > c1553680 ffff995d cf95dec8 c1006f04 c1550280 c1553a6c c1550280 c1006efb > Call Trace: > [] ? xen_force_evtchn_callback+0xf/0x14 > [] ? check_events+0x8/0xc > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] schedule_timeout+0xd6/0x13e > [] ? xen_force_evtchn_callback+0xf/0x14 > [] ? process_timeout+0x0/0xa > [] io_schedule_timeout+0x1e/0x28 > [] congestion_wait+0x51/0x66 > [] ? autoremove_wake_function+0x0/0x38 > [] wb_kupdate+0xb3/0xfa > [] ? pdflush+0x0/0x205 > [] pdflush+0x100/0x205 > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? _spin_unlock_irqrestore+0x17/0x1d > [] ? wb_kupdate+0x0/0xfa > [] kthread+0x74/0x7f > [] ? kthread+0x0/0x7f > [] kernel_thread_helper+0x7/0x10 > kjournald D 00000000 0 880 2 0x00000000 > cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001 > cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380 > c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854 > Call Trace: > [] ? check_events+0x8/0xc > [] io_schedule+0x1e/0x28 > [] sync_buffer+0x33/0x37 > [] __wait_on_bit+0x45/0x62 > [] ? sync_buffer+0x0/0x37 > [] out_of_line_wait_on_bit+0x5f/0x72 > [] ? sync_buffer+0x0/0x37 > [] ? wake_bit_function+0x0/0x47 > [] __wait_on_buffer+0x23/0x25 > [] journal_commit_transaction+0x50b/0xdeb > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? _spin_unlock_irqrestore+0x17/0x1d > [] ? try_to_del_timer_sync+0x49/0x52 > [] kjournald+0xab/0x1f0 > [] ? autoremove_wake_function+0x0/0x38 > [] ? kjournald+0x0/0x1f0 > [] kthread+0x74/0x7f > [] ? kthread+0x0/0x7f > [] kernel_thread_helper+0x7/0x10 > syslogd D 00000000 0 2219 1 0x00000000 > cfb33e40 00000286 00000000 00000000 0000aa70 00000000 c10067df cfb7026c > 00000001 c152f380 c152f380 c152f380 00000000 cfa2a130 cfa2a2bc d0808380 > cfb33e64 cfb70254 c1006efb c137c4f4 c10282ff cfb33e58 cfb33e40 c1049d05 > Call Trace: > [] ? xen_force_evtchn_callback+0xf/0x14 > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? _spin_unlock_irqrestore+0x17/0x1d > [] ? __wake_up+0x3a/0x42 > [] ? prepare_to_wait+0x41/0x46 > [] log_wait_commit+0x87/0xef > [] ? autoremove_wake_function+0x0/0x38 > [] journal_stop+0x1dc/0x29f > [] ? journal_start+0x88/0xb2 > [] journal_force_commit+0x1d/0x1f > [] ext3_force_commit+0x1c/0x22 > [] ext3_write_inode+0x2d/0x3b > [] writeback_single_inode+0x1f5/0x2be > [] ? do_writepages+0x37/0x39 > [] sync_inode+0x1f/0x37 > [] ext3_sync_file+0x99/0xb0 > [] vfs_fsync+0x73/0xb1 > [] do_fsync+0x22/0x32 > [] sys_fsync+0xd/0xf > [] sysenter_do_call+0x12/0x2c > getty D cf402840 0 2330 1 0x00000004 > cfb23cc0 00000286 00000000 cf402840 00000000 00000000 00000000 00000000 > 00000000 c152f380 c152f380 c152f380 00000000 cf8740f0 cf87427c d0808380 > cfb23d14 c19202c8 c1006efb c137c4f4 00000002 cfb23d08 cfb23cc0 c1049d05 > Call Trace: > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? _spin_unlock_irqrestore+0x17/0x1d > [] ? prepare_to_wait+0x41/0x46 > [] do_get_write_access+0x2ab/0x3cd > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? wake_bit_function+0x0/0x47 > [] journal_get_write_access+0x1b/0x2a > [] __ext3_journal_get_write_access+0x19/0x3f > [] ext3_reserve_inode_write+0x34/0x6a > [] ext3_mark_inode_dirty+0x17/0x2e > [] ext3_dirty_inode+0x6b/0x6d > [] ? file_remove_suid+0x1b/0x59 > [] __mark_inode_dirty+0x23/0xd2 > [] file_update_time+0x76/0xb3 > [] __generic_file_aio_write_nolock+0x235/0x4b3 > [] ? filemap_fault+0x76/0x3b1 > [] generic_file_aio_write+0x5d/0xd1 > [] ext3_file_write+0x27/0xa5 > [] do_sync_write+0xcd/0x103 > [] ? autoremove_wake_function+0x0/0x38 > [] ? xen_restore_fl_direct_end+0x0/0x1 > [] ? locks_free_lock+0x31/0x40 > [] ? fcntl_setlk+0x3e/0x185 > [] vfs_write+0x8b/0x13f > [] ? do_page_fault+0x1b4/0x3a0 > [] ? do_sync_write+0x0/0x103 > [] sys_write+0x3d/0x64 > [] sysenter_do_call+0x12/0x2c > Sched Debug Version: v0.09, 2.6.31.6 #8 > now at 194118.700060 msecs > .jiffies : 4294940775 > .sysctl_sched_latency : 20.000000 > .sysctl_sched_min_granularity : 4.000000 > .sysctl_sched_wakeup_granularity : 5.000000 > .sysctl_sched_child_runs_first : 0.000001 > .sysctl_sched_features : 113917 > > cpu#0, 2500.152 MHz > .nr_running : 1 > .load : 3121 > .nr_switches : 17143 > .nr_load_updates : 48354 > .nr_uninterruptible : 4 > .next_balance : 4294.940896 > .curr->pid : 14 > .clock : 193919.954084 > .cpu_load[0] : 0 > .cpu_load[1] : 0 > .cpu_load[2] : 0 > .cpu_load[3] : 0 > .cpu_load[4] : 0 > > cfs_rq[0]: > .exec_clock : 0.000000 > .MIN_vruntime : 0.000001 > .min_vruntime : 13296.631519 > .max_vruntime : 0.000001 > .spread : 0.000000 > .spread0 : 0.000000 > .nr_running : 1 > .load : 3121 > .nr_spread_over : 0 > > rt_rq[0]: > .rt_nr_running : 0 > .rt_throttled : 0 > .rt_time : 0.000000 > .rt_runtime : 950.000000 > > runnable tasks: > task PID tree-key switches prio exec-runtime sum-exec sum-sleep > ---------------------------------------------------------------------------------------------------------- > R xenwatch 14 13276.942024 146 115 0 0 0.000000 0.000000 0.000000 > >