From: CAI Qian <caiqian@redhat.com>
To: stable@vger.kernel.org
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>
Subject: Re: oom caused disk corruption on 3.7.1
Date: Wed, 30 Jan 2013 01:57:05 -0500 (EST)
Message-ID: <561898288.11015388.1359529025256.JavaMail.root@redhat.com>
In-Reply-To: <1022938540.1925160.1357725053304.JavaMail.root@redhat.com>
----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "linux-mm" <linux-mm@kvack.org>
> Cc: stable@vger.kernel.org, "linux-kernel" <linux-kernel@vger.kernel.org>
> Sent: Wednesday, January 9, 2013 5:50:53 PM
> Subject: oom caused disk corruption on 3.7.1
>
> While doing oom testing on a power7 system with swapping,
> it hit the panic below on v3.7.1. Without a swap device,
> it runs fine. v3.0 has the same problem.
Oddly, with these options turned on,
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
it runs fine apart from some warnings, which look
better than a panic.
INFO: task (tmpfiles):5456 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
(tmpfiles) D 00003fff877fb508 0 5456 1 0x00000080
Call Trace:
[c00000001cf76a30] [c0000000010a2180] jiffies+0x0/0x80 (unreliable)
[c00000001cf76c00] [c000000000014960] .__switch_to+0x110/0x240
[c00000001cf76cb0] [c0000000006b1cc0] .__schedule+0x3c0/0x8b0
[c00000001cf76f30] [c0000000006affb4] .schedule_timeout+0x1e4/0x2d0
[c00000001cf77030] [c0000000006b23fc] .wait_for_common+0x18c/0x200
[c00000001cf77110] [c0000000002863a8] .xfs_buf_iowait+0x88/0x150
[c00000001cf771a0] [c000000000286700] .xfs_buf_read_map+0xd0/0x170
[c00000001cf77240] [c0000000002f4074] .xfs_trans_read_buf_map+0x204/0x570
[c00000001cf77300] [c0000000002c5940] .xfs_da_read_buf+0x100/0x250
[c00000001cf773f0] [c0000000002c7098] .xfs_da_node_lookup_int+0xc8/0x440
[c00000001cf774c0] [c0000000002d0c60] .xfs_dir2_node_lookup+0x70/0x1d0
[c00000001cf77570] [c0000000002c8fe4] .xfs_dir_lookup+0x214/0x230
[c00000001cf776a0] [c00000000029f068] .xfs_lookup+0xb8/0x1a0
[c00000001cf77760] [c000000000293f50] .xfs_vn_lookup+0x60/0xd0
[c00000001cf77800] [c0000000001db454] .lookup_real+0x44/0xa0
[c00000001cf77890] [c0000000001e16e8] .do_last+0xad8/0xe00
[c00000001cf779c0] [c0000000001e1afc] .path_openat+0xec/0x5f0
[c00000001cf77ae0] [c0000000001e2450] .do_filp_open+0x40/0xb0
[c00000001cf77c10] [c0000000001d6308] .open_exec+0x48/0x170
[c00000001cf77cc0] [c0000000001d7ae0] .do_execve_common.isra.19+0x240/0x4e0
[c00000001cf77da0] [c0000000001d8100] .SyS_execve+0x50/0x90
[c00000001cf77e30] [c0000000000097d4] syscall_exit+0x0/0x94
>
> Test case is here,
> http://tinyurl.com/bzzmrb8
>
> ...
> [ 763.781571] Write-error on swap-device (253:0:7545984)
> [ 763.781573] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781574] Write-error on swap-device (253:0:7546240)
> [ 763.781576] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781577] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [ 763.781578] Write-error on swap-device (253:0:7546496)
> [ 763.781579] Call Trace:
> [ 763.781580] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781590] [c0000002eac83870] [c000000000015884]
> .show_stack+0x74/0x1b0 (unreliable)
> [ 763.781595] [c0000002eac83920] [c000000000721d28]
> .panic+0xe4/0x264
> [ 763.781598] [c0000002eac839c0] [c0000000000886e4]
> .do_exit+0x954/0x960
> [ 763.781601] [c0000002eac83ac0] [c0000000000889d4]
> .do_group_exit+0x54/0xf0
> [ 763.781604] [c0000002eac83b50] [c00000000009be28]
> .get_signal_to_deliver+0x1f8/0x730
> [ 763.781606] [c0000002eac83c60] [c000000000017924]
> .do_signal+0x54/0x320
> [ 763.781608] [c0000002eac83da0] [c000000000017d74]
> .do_notify_resume+0xb4/0xd0
> [ 763.781611] [c0000002eac83e30] [c000000000009e1c]
> .ret_from_except_lite+0x48/0x4c
> [ 763.781612] Write-error on swap-device (253:0:7546752)
> [ 763.781613] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781615] Write-error on swap-device (253:0:7547008)
> [ 763.781616] Sending IPI to other CPUs
> [ 763.781616] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781618] Write-error on swap-device (253:0:7547392)
> [ 763.781619] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781620] Write-error on swap-device (253:0:7547648)
> [ 763.781622] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781623] Write-error on swap-device (253:0:7547904)
> [ 763.781625] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781627] Write-error on swap-device (253:0:7548160)
> [ 763.781628] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781630] Write-error on swap-device (253:0:7548416)
> [ 763.781631] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781632] Write-error on swap-device (253:0:7548672)
> [ 763.781634] sd 0:0:1:0: rejecting I/O to offline device
> [ 763.781635] Write-error on swap-device (253:0:7548928)
> [ 773.781972] ERROR: 1 cpu(s) not responding
>
> KERNEL: /boot/vmlinux-3.7.1+
> DUMPFILE: /var/crash/127.0.0.1-2013.01.09-19:12:02/vmcore
> CPUS: 28
> DATE: Tue Jan 8 23:11:35 2013
> UPTIME: 00:12:43
> LOAD AVERAGE: 5.88, 4.82, 2.51
> TASKS: 278
> RELEASE: 3.7.1+
> VERSION: #0 SMP Tue Jan 8 06:59:49 EST 2013
> MACHINE: ppc64 (3550 Mhz)
> MEMORY: 12 GB
> PANIC: "Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b"
> PID: 1
> COMMAND: "systemd"
> TASK: c0000002eac00000 [THREAD_INFO: c0000002eac80000]
> CPU: 18
> STATE: TASK_INTERRUPTIBLE|TASK_UNINTERRUPTIBLE|TASK_TRACED
> (PANIC)
>
> crash> bt
> PID: 1 TASK: c0000002eac00000 CPU: 18 COMMAND: "systemd"
>
> R0: c000000000721d34 R1: c0000002eac83920 R2:
> c000000001157098
> R3: c0000002eac83790 R4: c0000002eac00000 R5:
> 0000000000000070
> R6: 0000000000000000 R7: c0000002fff584a0 R8:
> 0000000000000000
> R9: c0000002e7909000 R10: 0000000000000001 R11:
> 6578636570745f6c
> R12: 0000000022004884 R13: c000000007f23f00 R14:
> 0000000000040006
> R15: 00000000279b056c R16: c0000002eac83ea0 R17:
> c000000001398ab8
> R18: c0000002eac00000 R19: c0000002eac00000 R20:
> 00000000003c0000
> R21: c0000002eac00a14 R22: c0000000011b2080 R23:
> c0000002eac83a30
> R24: c000000018d90000 R25: 0000000000000140 R26:
> 0000000000106001
> R27: c0000002eac83790 R28: c0000000013ba848 R29:
> 0000000000000000
> R30: c0000000010d4d18 R31: c00000000101e4b0
> NIP: c000000000721d34 MSR: 8000000000009032 OR3:
> c0000002eac83920
> CTR: 0000000000000000 LR: c000000000721d34 XER:
> 0000000000000001
> CCR: 0000000022004882 MQ: 3030303030303030 DAR:
> 0000000000000000
> DSISR: c000000018d90000 Syscall Result: 0000000000000140
> NIP [c000000000721d34] .panic
>
> #0 [c0000002eac83920] .panic at c000000000721d34
> #1 [c0000002eac839c0] .do_exit at c0000000000886e4
> #2 [c0000002eac83ac0] .do_group_exit at c0000000000889d4
> #3 [c0000002eac83b50] .get_signal_to_deliver at c00000000009be28
> #4 [c0000002eac83c60] .do_signal at c000000000017924
> #5 [c0000002eac83da0] .do_notify_resume at c000000000017d74
> #6 [c0000002eac83e30] .ret_from_except_lite at c000000000009e1c
>
> CAI Qian