From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Haigh Subject: Re: rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU Date: Sat, 19 Mar 2016 08:46:41 +1100 Message-ID: <56EC7741.9090802@crc.id.au> References: <56E9D28A.8080302@crc.id.au> <56EC75E7.4090803@crc.id.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5986850900760651435==" Return-path: In-Reply-To: <56EC75E7.4090803@crc.id.au> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --===============5986850900760651435== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="pg47vPGs4nJsXpDINueesEP07xgtOIaeP" This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --pg47vPGs4nJsXpDINueesEP07xgtOIaeP Content-Type: multipart/mixed; boundary="XqpvITJj219Dqdo3Ex6GP2oJKKeil6M6j" From: Steven Haigh To: xen-devel@lists.xen.org Message-ID: <56EC7741.9090802@crc.id.au> Subject: Re: rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU References: <56E9D28A.8080302@crc.id.au> <56EC75E7.4090803@crc.id.au> In-Reply-To: <56EC75E7.4090803@crc.id.au> --XqpvITJj219Dqdo3Ex6GP2oJKKeil6M6j Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 19/03/2016 8:40 AM, Steven Haigh wrote: > Hi all, >=20 > So I'd just like to give this a prod. I'm still getting DomU's randomly= > go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as > the DomU does not induce these problems. Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't show these errors. Only in the DomU. >=20 > Latest crash message from today: > INFO: rcu_sched self-detected stall on CPU > 0-...: (20869552 ticks this GP) idle=3D9c9/140000000000001/0 > softirq=3D1440865/1440865 fqs=3D15068 > (t=3D20874993 jiffies g=3D1354899 c=3D1354898 q=3D798) > rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0 > s3 ->state=3D0x0 > Task dump for CPU 0: > kworker/u4:1 R running task 0 5853 2 0x00000088 > Workqueue: writeback wb_workfn (flush-202:0) > ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000 > ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001 > ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00 > Call Trace: > [] sched_show_task+0xaf/0x110 > [] dump_cpu_task+0x39/0x40 > [] rcu_dump_cpu_stacks+0x8a/0xc0 > [] rcu_check_callbacks+0x424/0x7a0 > [] ? account_system_time+0x81/0x110 > [] ? account_process_tick+0x61/0x160 > [] ? tick_sched_do_timer+0x30/0x30 > [] update_process_times+0x39/0x60 > [] tick_sched_handle.isra.15+0x36/0x50 > [] tick_sched_timer+0x3d/0x70 > [] __hrtimer_run_queues+0xf2/0x250 > [] hrtimer_interrupt+0xa8/0x190 > [] xen_timer_interrupt+0x2e/0x140 > [] handle_irq_event_percpu+0x55/0x1e0 > [] handle_percpu_irq+0x3a/0x50 > [] generic_handle_irq+0x22/0x30 > [] __evtchn_fifo_handle_events+0x15f/0x180 > [] evtchn_fifo_handle_events+0x10/0x20 > [] __xen_evtchn_do_upcall+0x43/0x80 > [] xen_evtchn_do_upcall+0x30/0x50 > [] xen_hvm_callback_vector+0x82/0x90 > [] ? queued_spin_lock_slowpath+0x22/0x170 > [] _raw_spin_lock+0x20/0x30 > [] writeback_sb_inodes+0x124/0x560 > [] ? _raw_spin_unlock_irqrestore+0x16/0x20 > [] __writeback_inodes_wb+0x86/0xc0 > [] wb_writeback+0x1d6/0x2d0 > [] wb_workfn+0x284/0x3e0 > [] process_one_work+0x151/0x400 > [] worker_thread+0x11a/0x460 > [] ? __schedule+0x2bf/0x880 > [] ? rescuer_thread+0x2f0/0x2f0 > [] kthread+0xc9/0xe0 > [] ? kthread_park+0x60/0x60 > [] ret_from_fork+0x3f/0x70 > [] ? kthread_park+0x60/0x60 >=20 > This repeats over and over causing 100% CPU usage - eventually on all > vcpus assigned to the DomU and the only recovery is 'xl destroy'. >=20 > I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in > both the DomU and Dom0. >=20 > On 17/03/2016 8:39 AM, Steven Haigh wrote: >> Hi all, >> >> I've noticed the following problem that ends up with a non-repsonsive = PV >> DomU using kernel 4.4.5 under heavy disk IO: >> >> INFO: rcu_sched self-detected stall on CPU >> 0-...: (6759098 ticks this GP) idle=3Dcb3/140000000000001/0 >> softirq=3D3244615/3244615 fqs=3D4 >> (t=3D6762321 jiffies g=3D2275626 c=3D2275625 q=3D54) >> rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 = s3 >> ->state=3D0x0 >> Task dump for CPU 0: >> updatedb R running task 0 6027 6021 0x00000088 >> ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000 >> ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001 >> ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00 >> Call Trace: >> [] sched_show_task+0xaf/0x110 >> [] dump_cpu_task+0x39/0x40 >> [] rcu_dump_cpu_stacks+0x8a/0xc0 >> [] rcu_check_callbacks+0x424/0x7a0 >> [] ? account_system_time+0x81/0x110 >> [] ? account_process_tick+0x61/0x160 >> [] ? tick_sched_do_timer+0x30/0x30 >> [] update_process_times+0x39/0x60 >> [] tick_sched_handle.isra.15+0x36/0x50 >> [] tick_sched_timer+0x3d/0x70 >> [] __hrtimer_run_queues+0xf2/0x250 >> [] hrtimer_interrupt+0xa8/0x190 >> [] xen_timer_interrupt+0x2e/0x140 >> [] handle_irq_event_percpu+0x55/0x1e0 >> [] handle_percpu_irq+0x3a/0x50 >> [] generic_handle_irq+0x22/0x30 >> [] __evtchn_fifo_handle_events+0x15f/0x180 >> [] evtchn_fifo_handle_events+0x10/0x20 >> [] __xen_evtchn_do_upcall+0x43/0x80 >> [] xen_evtchn_do_upcall+0x30/0x50 >> [] xen_hvm_callback_vector+0x82/0x90 >> [] ? queued_spin_lock_slowpath+0x10/0x170 >> [] _raw_spin_lock+0x20/0x30 >> [] find_inode_fast+0x61/0xa0 >> [] iget_locked+0x6e/0x170 >> [] ext4_iget+0x33/0xae0 >> [] ? out_of_line_wait_on_bit+0x72/0x80 >> [] ext4_iget_normal+0x30/0x40 >> [] ext4_lookup+0xd5/0x140 >> [] lookup_real+0x1d/0x50 >> [] __lookup_hash+0x33/0x40 >> [] walk_component+0x177/0x280 >> [] path_lookupat+0x60/0x110 >> [] filename_lookup+0x9c/0x150 >> [] ? kfree+0x10d/0x290 >> [] ? call_filldir+0x9c/0x130 >> [] ? getname_flags+0x4f/0x1f0 >> [] user_path_at_empty+0x36/0x40 >> [] vfs_fstatat+0x53/0xa0 >> [] ? __fput+0x169/0x1d0 >> [] SYSC_newlstat+0x22/0x40 >> [] ? __audit_syscall_exit+0x1f0/0x270 >> [] ? syscall_slow_exit_work+0x3f/0xc0 >> [] ? __audit_syscall_entry+0xaf/0x100 >> [] SyS_newlstat+0xe/0x10 >> [] entry_SYSCALL_64_fastpath+0x12/0x71 >> >> This ends up with the system not responding at 100% CPU usage. >> >> Has anyone else seen this using kernel 4.4.5 in a DomU? >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >> >=20 >=20 >=20 > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >=20 --=20 Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 --XqpvITJj219Dqdo3Ex6GP2oJKKeil6M6j-- --pg47vPGs4nJsXpDINueesEP07xgtOIaeP Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJW7HdBAAoJEEGvNdV6fTHc8E4P/ifIfIquj1CiWvlbWaOQNS2e pgF9uNzGVkpe/xSMM+fT8Ug764FCUUImgf2GK5XaNZM2dA/CIuHUqVTTpqLBWWmp 5jkwb0GoeDqaSj1q9Jc0VEapAJ1O7ALrhqUutKknlkJhP+remSWCAx3lKmduhPLp WqyoktjvP+VJwp1oxOMBlf1FDt7d+ritTwglZFl5kBuLXvQLUWEwCQSz5YJdQlYT BasEfHsOOpwoefJiYwR4GsYVpTPEs272z4mVdy6loyYoGfmKXWuFtqm6FICOP0Fc Lw9syizownFtv/oBDcubKM6ah/m1rZzGB/SYslB5mC9CLY7fLPlIp4N/ee/jFob/ wUtVroI9yut9MHdT1Ns4XbISXeW8RNv2nGETvCrR9xc0cQjQxsQmrcr8LJy/qId3 bu/3TTZOfaKbXne7rq31bdM7bBFjSRSBwL3sP72jADjdyViBgxdvhq5k6Ju4ArB9 XwvTWXYBleGkARwTvYcdQY/Eo9f5TmNA7uYYncY/0v3+RTmvb+gpvA6YbDfyAWoL HPfQZBb0EBIC+kOVQqNb8MNV6vUdzWBJNRKHHChQUvEwUUtAnNkLRL3qHDw93S2i Zx1Vi6d6Knaak36GV4Cxuc1nWaxfImuTFwPT3Ii/idxhhDq279mzIaUhYZoBRLEw ouvHSUZG64kEIp75ag1c =7XDb -----END PGP SIGNATURE----- --pg47vPGs4nJsXpDINueesEP07xgtOIaeP-- --===============5986850900760651435== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y Zy94ZW4tZGV2ZWwK --===============5986850900760651435==--