Re: 4.4: INFO: rcu_sched self-detected stall on CPU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Steven Haigh <netwiz@crc.id.au>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
Date: Sat, 2 Apr 2016 12:50:07 +1100	[thread overview]
Message-ID: <56FF254F.3050103@crc.id.au> (raw)
In-Reply-To: <56FA8DDD.7070406@oracle.com>


[-- Attachment #1.1: Type: text/plain, Size: 10386 bytes --]

On 30/03/2016 1:14 AM, Boris Ostrovsky wrote:
> On 03/29/2016 04:56 AM, Steven Haigh wrote:
>>
>> Interestingly enough, this just happened again - but on a different
>> virtual machine. I'm starting to wonder if this may have something to do
>> with the uptime of the machine - as the system that this seems to happen
>> to is always different.
>>
>> Destroying it and monitoring it again has so far come up blank.
>>
>> I've thrown the latest lot of kernel messages here:
>>      http://paste.fedoraproject.org/346802/59241532
> 
> Would be good to see full console log. The one that you posted starts
> with an error so I wonder what was before that.

Ok, so I had a virtual machine do this again today. Both vcpus went to
100% usage and essentially hung. I attached to the screen console that
was connected via 'xl console' and copied the entire buffer to paste below:

yum-cron[30740]: segfault at 1781ab8 ip 00007f2a7fcd282f sp
00007ffe8655fe90 error 5 in libpython2.7.so.1.0[7f2a7fbf5000+178000]
swap_free: Bad swap file entry 2a2b7d5bb69515d8
BUG: Bad page map in process yum-cron  pte:56fab76d2a2bb06a pmd:0309e067
addr:0000000001780000 vm_flags:00100073 anon_vma:ffff88007b974c08
mapping:          (null) index:1780
file:          (null) fault:          (null) mmap:          (null)
readpage:          (null)
CPU: 0 PID: 30740 Comm: yum-cron Tainted: G    B
4.4.6-4.el7xen.x86_64 #1
 0000000000000000 ffff88004176bac0 ffffffff81323d17 0000000001780000
 00003ffffffff000 ffff88004176bb08 ffffffff8117e574 ffffffff81193d6e
 0000000000001780 ffff88000309ec00 0000000001780000 56fab76d2a2bb06a
Call Trace:
 [<ffffffff81323d17>] dump_stack+0x63/0x8c
 [<ffffffff8117e574>] print_bad_pte+0x1e4/0x290
 [<ffffffff81193d6e>] ? swap_info_get+0x7e/0xe0
 [<ffffffff8117fdef>] unmap_single_vma+0x4ff/0x840
 [<ffffffff81180837>] unmap_vmas+0x47/0x90
 [<ffffffff811891f8>] exit_mmap+0x98/0x150
 [<ffffffff8107a557>] mmput+0x47/0x100
 [<ffffffff8107f81e>] do_exit+0x24e/0xad0
 [<ffffffff8108011f>] do_group_exit+0x3f/0xa0
 [<ffffffff8108aa03>] get_signal+0x1c3/0x5e0
 [<ffffffff81017208>] do_signal+0x28/0x630
 [<ffffffff81152d7d>] ? printk+0x4d/0x4f
 [<ffffffff810c735f>] ? vprintk_default+0x1f/0x30
 [<ffffffff8107938c>] ? bad_area_access_error+0x43/0x4a
 [<ffffffff810603ac>] ? __do_page_fault+0x22c/0x3f0
 [<ffffffff8107808d>] exit_to_usermode_loop+0x4c/0x95
 [<ffffffff81003878>] prepare_exit_to_usermode+0x18/0x20
 [<ffffffff8165cae5>] retint_user+0x8/0x13
BUG: Bad page state in process yum-cron  pfn:0f3bf
page:ffffea00003cefc0 count:0 mapcount:7 mapping:ffff88000f3bf008
index:0xffff88000f3bf000
flags: 0x10000000000094(referenced|dirty|slab)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x80(slab)
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache
x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 0 PID: 30740 Comm: yum-cron Tainted: G    B
4.4.6-4.el7xen.x86_64 #1
 0000000000000000 ffff88004176b958 ffffffff81323d17 ffffea00003cefc0
 ffffffff817ab348 ffff88004176b980 ffffffff811c1ab5 ffffea00003cefc0
 0000000000000001 0000000000000000 ffff88004176b9d0 ffffffff81159584
Call Trace:
 [<ffffffff81323d17>] dump_stack+0x63/0x8c
 [<ffffffff811c1ab5>] bad_page.part.69+0xdf/0xfc
 [<ffffffff81159584>] free_pages_prepare+0x294/0x2a0
 [<ffffffff8115b6c1>] free_hot_cold_page+0x31/0x160
 [<ffffffff8115b839>] free_hot_cold_page_list+0x49/0xb0
 [<ffffffff811621b5>] release_pages+0xc5/0x260
 [<ffffffff8119380d>] free_pages_and_swap_cache+0x7d/0x90
 [<ffffffff8117ddd6>] tlb_flush_mmu_free+0x36/0x60
 [<ffffffff8117ff54>] unmap_single_vma+0x664/0x840
 [<ffffffff81180837>] unmap_vmas+0x47/0x90
 [<ffffffff811891f8>] exit_mmap+0x98/0x150
 [<ffffffff8107a557>] mmput+0x47/0x100
 [<ffffffff8107f81e>] do_exit+0x24e/0xad0
 [<ffffffff8108011f>] do_group_exit+0x3f/0xa0
 [<ffffffff8108aa03>] get_signal+0x1c3/0x5e0
 [<ffffffff81017208>] do_signal+0x28/0x630
 [<ffffffff81152d7d>] ? printk+0x4d/0x4f
 [<ffffffff810c735f>] ? vprintk_default+0x1f/0x30
 [<ffffffff8107938c>] ? bad_area_access_error+0x43/0x4a
 [<ffffffff810603ac>] ? __do_page_fault+0x22c/0x3f0
 [<ffffffff8107808d>] exit_to_usermode_loop+0x4c/0x95
 [<ffffffff81003878>] prepare_exit_to_usermode+0x18/0x20
 [<ffffffff8165cae5>] retint_user+0x8/0x13
BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:0 val:-1
BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:1 val:2
BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:2 val:-1
yum-cron[4197]: segfault at 32947fcb ip 00007ff0fa1bf8bd sp
00007ffdb1c54990 error 4 in libpython2.7.so.1.0[7ff0fa13a000+178000]
BUG: unable to handle kernel paging request at ffff88010f3beffe
IP: [<ffffffff811a7fb9>] free_block+0x119/0x190
PGD 188b063 PUD 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache
x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G    B
4.4.6-4.el7xen.x86_64 #1
Workqueue: events cache_reap
task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000
RIP: 0010:[<ffffffff811a7fb9>]  [<ffffffff811a7fb9>] free_block+0x119/0x190
RSP: 0018:ffff880051703d40  EFLAGS: 00010082
RAX: ffffea00003cefc0 RBX: ffffea0000000000 RCX: ffff88000f3bf000
RDX: 00000000fffffffe RSI: ffff88007fd19c40 RDI: ffff88007d012100
RBP: ffff880051703d68 R08: ffff880051703d88 R09: 0000000000000006
R10: ffff88007d01a940 R11: 0000000080000000 R12: 000077ff80000000
R13: ffff88007fd19c58 R14: ffff88007d01a948 R15: ffff88007d01a968
FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88010f3beffe CR3: 000000007bc8d000 CR4: 0000000000040620
Stack:
 ffff88007fd19bf0 ffff880051703d88 ffff88007d01a940 000000000000000b
 ffff88007d012100 ffff880051703dc0 ffffffff811a871b 0000000b51703dd8
 ffff88007fd19c00 ffff880051703d88 ffff880051703d88 ffff88007d012100
Call Trace:
 [<ffffffff811a871b>] drain_array+0xab/0x130
 [<ffffffff811a955c>] cache_reap+0x6c/0x230
 [<ffffffff81093831>] process_one_work+0x151/0x400
 [<ffffffff8109437a>] worker_thread+0x11a/0x460
 [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
 [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
 [<ffffffff810993d9>] kthread+0xc9/0xe0
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
 [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
Code: 41 89 d1 4c 0f af c9 0f b6 4f 1c 49 c1 e9 20 44 29 ca d3 ea 0f b6
4f 1d 41 01 d1 8b 50 18 41 d3 e9 48 8b 48 10 83 ea 01 89 50 18 <44> 88
0c 11 49 8b 52 38 48 83 c2 01 49 89 52 38 8b 48 18 85 c9
RIP  [<ffffffff811a7fb9>] free_block+0x119/0x190
 RSP <ffff880051703d40>
CR2: ffff88010f3beffe
---[ end trace 99d22a980c39d4db ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff81099b40>] kthread_data+0x10/0x20
PGD 188d067 PUD 188f067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache
x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G    B D
4.4.6-4.el7xen.x86_64 #1
task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000
RIP: 0010:[<ffffffff81099b40>]  [<ffffffff81099b40>] kthread_data+0x10/0x20
RSP: 0018:ffff880051703a60  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
RDX: ffffffff81bd4080 RSI: 0000000000000001 RDI: ffff8800346bf1c0
RBP: ffff880051703a60 R08: ffff88007c8e19d0 R09: 0000c88ed0b2b602
R10: 000000000000001f R11: 0000000000000000 R12: ffff8800346bf1c0
R13: 0000000000000000 R14: 00000000000163c0 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 000000007bc8d000 CR4: 0000000000040620
Stack:
 ffff880051703a78 ffffffff81094741 ffff88007fd163c0 ffff880051703ac0
 ffffffff8165858f ffffffff810d2e27 ffff8800346bf1c0 ffff880051704000
 ffff8800346bf510 0000000000000046 ffff8800346bf1c0 ffff88007d3facc0
Call Trace:
 [<ffffffff81094741>] wq_worker_sleeping+0x11/0x90
 [<ffffffff8165858f>] __schedule+0x3bf/0x880
 [<ffffffff810d2e27>] ? call_rcu_sched+0x17/0x20
 [<ffffffff81658a85>] schedule+0x35/0x80
 [<ffffffff8107fc29>] do_exit+0x659/0xad0
 [<ffffffff8101a60a>] oops_end+0x9a/0xd0
 [<ffffffff8105f7ad>] no_context+0x10d/0x360
 [<ffffffff8105fb09>] __bad_area_nosemaphore+0x109/0x210
 [<ffffffff8105fc23>] bad_area_nosemaphore+0x13/0x20
 [<ffffffff81060200>] __do_page_fault+0x80/0x3f0
 [<ffffffff81060592>] do_page_fault+0x22/0x30
 [<ffffffff8165e0e8>] page_fault+0x28/0x30
 [<ffffffff811a7fb9>] ? free_block+0x119/0x190
 [<ffffffff811a871b>] drain_array+0xab/0x130
 [<ffffffff811a955c>] cache_reap+0x6c/0x230
 [<ffffffff81093831>] process_one_work+0x151/0x400
 [<ffffffff8109437a>] worker_thread+0x11a/0x460
 [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
 [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
 [<ffffffff810993d9>] kthread+0xc9/0xe0
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
 [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
Code: ff ff eb 92 be 3a 02 00 00 48 c7 c7 0b 5e 7a 81 e8 e6 34 fe ff e9
d5 fe ff ff 90 66 66 66 66 90 55 48 8b 87 00 04 00 00 48 89 e5 <48> 8b
40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffff81099b40>] kthread_data+0x10/0x20
 RSP <ffff880051703a60>
CR2: ffffffffffffffd8
---[ end trace 99d22a980c39d4dc ]---
Fixing recursive fault but reboot is needed!

There is nothing further logged on the console after, and any previous
messages are concerning an NFS server.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

next prev parent reply	other threads:[~2016-04-02  1:50 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-25  2:53 4.4: INFO: rcu_sched self-detected stall on CPU Steven Haigh
2016-03-25  2:53 ` Steven Haigh
2016-03-25 12:23 ` Boris Ostrovsky
2016-03-25 12:23 ` [Xen-devel] " Boris Ostrovsky
2016-03-25 14:05   ` Steven Haigh
2016-03-25 14:05   ` [Xen-devel] " Steven Haigh
2016-03-25 14:44     ` Boris Ostrovsky
2016-03-25 14:44     ` [Xen-devel] " Boris Ostrovsky
2016-03-25 16:04       ` Steven Haigh
2016-03-25 16:20         ` Boris Ostrovsky
2016-03-25 16:20         ` [Xen-devel] " Boris Ostrovsky
2016-03-25 21:07           ` Steven Haigh
2016-03-25 21:07           ` Steven Haigh
2016-03-29  8:56             ` Steven Haigh
2016-03-29  8:56             ` Steven Haigh
2016-03-29 14:14               ` Boris Ostrovsky
2016-03-29 17:44                 ` Steven Haigh
2016-03-29 17:44                 ` Steven Haigh
2016-03-29 18:04                   ` Steven Haigh
2016-03-29 18:04                   ` Steven Haigh
2016-03-29 18:32                     ` Steven Haigh
2016-03-30 13:44                     ` Boris Ostrovsky
2016-03-30 13:44                     ` Boris Ostrovsky
2016-05-02 20:54                     ` gregkh
2016-05-03 15:11                       ` Steven Haigh
2016-05-03 15:46                         ` gregkh
2016-05-02 20:54                     ` gregkh
2016-04-02  1:50                 ` Steven Haigh
2016-04-02  1:50                 ` Steven Haigh [this message]
2016-03-29 14:14               ` Boris Ostrovsky
2016-03-25 16:04       ` Steven Haigh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FF254F.3050103@crc.id.au \
    --to=netwiz@crc.id.au \
    --cc=boris.ostrovsky@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.