Re: rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Steven Haigh <netwiz@crc.id.au>
To: xen-devel@lists.xen.org
Subject: Re: rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
Date: Sat, 19 Mar 2016 08:46:41 +1100	[thread overview]
Message-ID: <56EC7741.9090802@crc.id.au> (raw)
In-Reply-To: <56EC75E7.4090803@crc.id.au>


[-- Attachment #1.1.1: Type: text/plain, Size: 7263 bytes --]

On 19/03/2016 8:40 AM, Steven Haigh wrote:
> Hi all,
> 
> So I'd just like to give this a prod. I'm still getting DomU's randomly
> go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as
> the DomU does not induce these problems.

Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't show
these errors. Only in the DomU.

> 
> Latest crash message from today:
> INFO: rcu_sched self-detected stall on CPU
>         0-...: (20869552 ticks this GP) idle=9c9/140000000000001/0
> softirq=1440865/1440865 fqs=15068
>          (t=20874993 jiffies g=1354899 c=1354898 q=798)
> rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0
> s3 ->state=0x0
> Task dump for CPU 0:
> kworker/u4:1    R  running task        0  5853      2 0x00000088
> Workqueue: writeback wb_workfn (flush-202:0)
>  ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
>  ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
>  ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
> Call Trace:
>  <IRQ>  [<ffffffff810a625f>] sched_show_task+0xaf/0x110
>  [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
>  [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
>  [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
>  [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
>  [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
>  [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
>  [<ffffffff810d9749>] update_process_times+0x39/0x60
>  [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
>  [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
>  [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
>  [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
>  [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
>  [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
>  [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
>  [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
>  [<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
>  [<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
>  [<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
>  [<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
>  [<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
>  <EOI>  [<ffffffff810b9ce2>] ? queued_spin_lock_slowpath+0x22/0x170
>  [<ffffffff8165bc70>] _raw_spin_lock+0x20/0x30
>  [<ffffffff811f1f34>] writeback_sb_inodes+0x124/0x560
>  [<ffffffff8165ba36>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>  [<ffffffff811f23f6>] __writeback_inodes_wb+0x86/0xc0
>  [<ffffffff811f2606>] wb_writeback+0x1d6/0x2d0
>  [<ffffffff811f2e94>] wb_workfn+0x284/0x3e0
>  [<ffffffff81093831>] process_one_work+0x151/0x400
>  [<ffffffff8109437a>] worker_thread+0x11a/0x460
>  [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
>  [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
>  [<ffffffff810993d9>] kthread+0xc9/0xe0
>  [<ffffffff81099310>] ? kthread_park+0x60/0x60
>  [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
>  [<ffffffff81099310>] ? kthread_park+0x60/0x60
> 
> This repeats over and over causing 100% CPU usage - eventually on all
> vcpus assigned to the DomU and the only recovery is 'xl destroy'.
> 
> I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in
> both the DomU and Dom0.
> 
> On 17/03/2016 8:39 AM, Steven Haigh wrote:
>> Hi all,
>>
>> I've noticed the following problem that ends up with a non-repsonsive PV
>> DomU using kernel 4.4.5 under heavy disk IO:
>>
>> INFO: rcu_sched self-detected stall on CPU
>>         0-...: (6759098 ticks this GP) idle=cb3/140000000000001/0
>> softirq=3244615/3244615 fqs=4
>>          (t=6762321 jiffies g=2275626 c=2275625 q=54)
>> rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3
>> ->state=0x0
>> Task dump for CPU 0:
>> updatedb        R  running task        0  6027   6021 0x00000088
>>  ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
>>  ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
>>  ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
>> Call Trace:
>>  <IRQ>  [<ffffffff810a625f>] sched_show_task+0xaf/0x110
>>  [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
>>  [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
>>  [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
>>  [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
>>  [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
>>  [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
>>  [<ffffffff810d9749>] update_process_times+0x39/0x60
>>  [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
>>  [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
>>  [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
>>  [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
>>  [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
>>  [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
>>  [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
>>  [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
>>  [<ffffffff813e51af>] __evtchn_fifo_handle_events+0x15f/0x180
>>  [<ffffffff813e51e0>] evtchn_fifo_handle_events+0x10/0x20
>>  [<ffffffff813e22e3>] __xen_evtchn_do_upcall+0x43/0x80
>>  [<ffffffff813e3f50>] xen_evtchn_do_upcall+0x30/0x50
>>  [<ffffffff8165def2>] xen_hvm_callback_vector+0x82/0x90
>>  <EOI>  [<ffffffff810b9cd0>] ? queued_spin_lock_slowpath+0x10/0x170
>>  [<ffffffff8165bcc0>] _raw_spin_lock+0x20/0x30
>>  [<ffffffff811df221>] find_inode_fast+0x61/0xa0
>>  [<ffffffff811df7ce>] iget_locked+0x6e/0x170
>>  [<ffffffff81249203>] ext4_iget+0x33/0xae0
>>  [<ffffffff81658f02>] ? out_of_line_wait_on_bit+0x72/0x80
>>  [<ffffffff81249ce0>] ext4_iget_normal+0x30/0x40
>>  [<ffffffff81253415>] ext4_lookup+0xd5/0x140
>>  [<ffffffff811ceffd>] lookup_real+0x1d/0x50
>>  [<ffffffff811cf3b3>] __lookup_hash+0x33/0x40
>>  [<ffffffff811d1af7>] walk_component+0x177/0x280
>>  [<ffffffff811d2c50>] path_lookupat+0x60/0x110
>>  [<ffffffff811d47dc>] filename_lookup+0x9c/0x150
>>  [<ffffffff811a982d>] ? kfree+0x10d/0x290
>>  [<ffffffff8124122c>] ? call_filldir+0x9c/0x130
>>  [<ffffffff811d441f>] ? getname_flags+0x4f/0x1f0
>>  [<ffffffff811d4946>] user_path_at_empty+0x36/0x40
>>  [<ffffffff811ca6f3>] vfs_fstatat+0x53/0xa0
>>  [<ffffffff811c7919>] ? __fput+0x169/0x1d0
>>  [<ffffffff811cab42>] SYSC_newlstat+0x22/0x40
>>  [<ffffffff8110cad0>] ? __audit_syscall_exit+0x1f0/0x270
>>  [<ffffffff81078115>] ? syscall_slow_exit_work+0x3f/0xc0
>>  [<ffffffff8110c88f>] ? __audit_syscall_entry+0xaf/0x100
>>  [<ffffffff811cad4e>] SyS_newlstat+0xe/0x10
>>  [<ffffffff8165c06e>] entry_SYSCALL_64_fastpath+0x12/0x71
>>
>> This ends up with the system not responding at 100% CPU usage.
>>
>> Has anyone else seen this using kernel 4.4.5 in a DomU?
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>>
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2016-03-18 21:46 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-16 21:39 rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU Steven Haigh
2016-03-18 21:40 ` Steven Haigh
2016-03-18 21:46   ` Steven Haigh [this message]
2016-03-24  3:32     ` rcu_sched self-detected stall on CPU on kernel 4.4.5/6 " Steven Haigh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56EC7741.9090802@crc.id.au \
    --to=netwiz@crc.id.au \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.