All of lore.kernel.org
 help / color / mirror / Atom feed
From: CAI Qian <caiqian@redhat.com>
To: stable@vger.kernel.org
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: oom caused disk corruption on 3.7.1
Date: Wed, 30 Jan 2013 01:57:05 -0500 (EST)	[thread overview]
Message-ID: <561898288.11015388.1359529025256.JavaMail.root@redhat.com> (raw)
In-Reply-To: <1022938540.1925160.1357725053304.JavaMail.root@redhat.com>



----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "linux-mm" kvack.org>
> Cc: stable@vger.kernel.org, "linux-kernel" vger.kernel.org>
> Sent: Wednesday, January 9, 2013 5:50:53 PM
> Subject: oom caused disk corruption on 3.7.1
> 
> While doing oom testing on a power7 system with swapping,
> it was swallowed a panic on v3.7.1 below. Without a swap device,
> it is running fine. v3.0 has the same problem.
This is weird that if turned on those options,
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y

it turns out to be fine except some warnings which looks like
better than a panic.
INFO: task (tmpfiles):5456 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
(tmpfiles)      D 00003fff877fb508     0  5456      1 0x00000080
Call Trace:
[c00000001cf76a30] [c0000000010a2180] jiffies+0x0/0x80 (unreliable)
[c00000001cf76c00] [c000000000014960] .__switch_to+0x110/0x240
[c00000001cf76cb0] [c0000000006b1cc0] .__schedule+0x3c0/0x8b0
[c00000001cf76f30] [c0000000006affb4] .schedule_timeout+0x1e4/0x2d0
[c00000001cf77030] [c0000000006b23fc] .wait_for_common+0x18c/0x200
[c00000001cf77110] [c0000000002863a8] .xfs_buf_iowait+0x88/0x150
[c00000001cf771a0] [c000000000286700] .xfs_buf_read_map+0xd0/0x170
[c00000001cf77240] [c0000000002f4074] .xfs_trans_read_buf_map+0x204/0x570
[c00000001cf77300] [c0000000002c5940] .xfs_da_read_buf+0x100/0x250
[c00000001cf773f0] [c0000000002c7098] .xfs_da_node_lookup_int+0xc8/0x440
[c00000001cf774c0] [c0000000002d0c60] .xfs_dir2_node_lookup+0x70/0x1d0
[c00000001cf77570] [c0000000002c8fe4] .xfs_dir_lookup+0x214/0x230
[c00000001cf776a0] [c00000000029f068] .xfs_lookup+0xb8/0x1a0
[c00000001cf77760] [c000000000293f50] .xfs_vn_lookup+0x60/0xd0
[c00000001cf77800] [c0000000001db454] .lookup_real+0x44/0xa0
[c00000001cf77890] [c0000000001e16e8] .do_last+0xad8/0xe00
[c00000001cf779c0] [c0000000001e1afc] .path_openat+0xec/0x5f0
[c00000001cf77ae0] [c0000000001e2450] .do_filp_open+0x40/0xb0
[c00000001cf77c10] [c0000000001d6308] .open_exec+0x48/0x170
[c00000001cf77cc0] [c0000000001d7ae0] .do_execve_common.isra.19+0x240/0x4e0
[c00000001cf77da0] [c0000000001d8100] .SyS_execve+0x50/0x90
[c00000001cf77e30] [c0000000000097d4] syscall_exit+0x0/0x94
> 
> Test case is here,
> http://tinyurl.com/bzzmrb8
> 
> ...
> [  763.781571] Write-error on swap-device (253:0:7545984)
> [  763.781573] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781574] Write-error on swap-device (253:0:7546240)
> [  763.781576] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781577] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [  763.781578] Write-error on swap-device (253:0:7546496)
> [  763.781579] Call Trace:
> [  763.781580] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781590] [c0000002eac83870] [c000000000015884]
> .show_stack+0x74/0x1b0 (unreliable)
> [  763.781595] [c0000002eac83920] [c000000000721d28]
> .panic+0xe4/0x264
> [  763.781598] [c0000002eac839c0] [c0000000000886e4]
> .do_exit+0x954/0x960
> [  763.781601] [c0000002eac83ac0] [c0000000000889d4]
> .do_group_exit+0x54/0xf0
> [  763.781604] [c0000002eac83b50] [c00000000009be28]
> .get_signal_to_deliver+0x1f8/0x730
> [  763.781606] [c0000002eac83c60] [c000000000017924]
> .do_signal+0x54/0x320
> [  763.781608] [c0000002eac83da0] [c000000000017d74]
> .do_notify_resume+0xb4/0xd0
> [  763.781611] [c0000002eac83e30] [c000000000009e1c]
> .ret_from_except_lite+0x48/0x4c
> [  763.781612] Write-error on swap-device (253:0:7546752)
> [  763.781613] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781615] Write-error on swap-device (253:0:7547008)
> [  763.781616] Sending IPI to other CPUs
> [  763.781616] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781618] Write-error on swap-device (253:0:7547392)
> [  763.781619] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781620] Write-error on swap-device (253:0:7547648)
> [  763.781622] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781623] Write-error on swap-device (253:0:7547904)
> [  763.781625] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781627] Write-error on swap-device (253:0:7548160)
> [  763.781628] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781630] Write-error on swap-device (253:0:7548416)
> [  763.781631] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781632] Write-error on swap-device (253:0:7548672)
> [  763.781634] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781635] Write-error on swap-device (253:0:7548928)
> [  773.781972] ERROR: 1 cpu(s) not responding
> 
>       KERNEL: /boot/vmlinux-3.7.1+
>     DUMPFILE: /var/crash/127.0.0.1-2013.01.09-19:12:02/vmcore
>         CPUS: 28
>         DATE: Tue Jan  8 23:11:35 2013
>       UPTIME: 00:12:43
> LOAD AVERAGE: 5.88, 4.82, 2.51
>        TASKS: 278
>      RELEASE: 3.7.1+
>      VERSION: #0 SMP Tue Jan 8 06:59:49 EST 2013
>      MACHINE: ppc64  (3550 Mhz)
>       MEMORY: 12 GB
>        PANIC: "Kernel panic - not syncing: Attempted to kill init!
>        exitcode=0x0000000b"
>          PID: 1
>      COMMAND: "systemd"
>         TASK: c0000002eac00000  [THREAD_INFO: c0000002eac80000]
>          CPU: 18
>        STATE: TASK_INTERRUPTIBLE|TASK_UNINTERRUPTIBLE|TASK_TRACED
>        (PANIC)
> 
> crash> bt
> PID: 1      TASK: c0000002eac00000  CPU: 18  COMMAND: "systemd"
> 
>  R0:  c000000000721d34    R1:  c0000002eac83920    R2:
>   c000000001157098
>  R3:  c0000002eac83790    R4:  c0000002eac00000    R5:
>   0000000000000070
>  R6:  0000000000000000    R7:  c0000002fff584a0    R8:
>   0000000000000000
>  R9:  c0000002e7909000    R10: 0000000000000001    R11:
>  6578636570745f6c
>  R12: 0000000022004884    R13: c000000007f23f00    R14:
>  0000000000040006
>  R15: 00000000279b056c    R16: c0000002eac83ea0    R17:
>  c000000001398ab8
>  R18: c0000002eac00000    R19: c0000002eac00000    R20:
>  00000000003c0000
>  R21: c0000002eac00a14    R22: c0000000011b2080    R23:
>  c0000002eac83a30
>  R24: c000000018d90000    R25: 0000000000000140    R26:
>  0000000000106001
>  R27: c0000002eac83790    R28: c0000000013ba848    R29:
>  0000000000000000
>  R30: c0000000010d4d18    R31: c00000000101e4b0
>  NIP: c000000000721d34    MSR: 8000000000009032    OR3:
>  c0000002eac83920
>  CTR: 0000000000000000    LR:  c000000000721d34    XER:
>  0000000000000001
>  CCR: 0000000022004882    MQ:  3030303030303030    DAR:
>  0000000000000000
>  DSISR: c000000018d90000     Syscall Result: 0000000000000140
>  NIP [c000000000721d34] .panic
> 
>  #0 [c0000002eac83920] .panic at c000000000721d34
>  #1 [c0000002eac839c0] .do_exit at c0000000000886e4
>  #2 [c0000002eac83ac0] .do_group_exit at c0000000000889d4
>  #3 [c0000002eac83b50] .get_signal_to_deliver at c00000000009be28
>  #4 [c0000002eac83c60] .do_signal at c000000000017924
>  #5 [c0000002eac83da0] .do_notify_resume at c000000000017d74
>  #6 [c0000002eac83e30] .ret_from_except_lite at c000000000009e1c
> 
> CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: CAI Qian <caiqian@redhat.com>
To: stable@vger.kernel.org
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: oom caused disk corruption on 3.7.1
Date: Wed, 30 Jan 2013 01:57:05 -0500 (EST)	[thread overview]
Message-ID: <561898288.11015388.1359529025256.JavaMail.root@redhat.com> (raw)
In-Reply-To: <1022938540.1925160.1357725053304.JavaMail.root@redhat.com>



----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "linux-mm" kvack.org>
> Cc: stable@vger.kernel.org, "linux-kernel" vger.kernel.org>
> Sent: Wednesday, January 9, 2013 5:50:53 PM
> Subject: oom caused disk corruption on 3.7.1
> 
> While doing oom testing on a power7 system with swapping,
> it was swallowed a panic on v3.7.1 below. Without a swap device,
> it is running fine. v3.0 has the same problem.
This is weird that if turned on those options,
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y

it turns out to be fine except some warnings which looks like
better than a panic.
INFO: task (tmpfiles):5456 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
(tmpfiles)      D 00003fff877fb508     0  5456      1 0x00000080
Call Trace:
[c00000001cf76a30] [c0000000010a2180] jiffies+0x0/0x80 (unreliable)
[c00000001cf76c00] [c000000000014960] .__switch_to+0x110/0x240
[c00000001cf76cb0] [c0000000006b1cc0] .__schedule+0x3c0/0x8b0
[c00000001cf76f30] [c0000000006affb4] .schedule_timeout+0x1e4/0x2d0
[c00000001cf77030] [c0000000006b23fc] .wait_for_common+0x18c/0x200
[c00000001cf77110] [c0000000002863a8] .xfs_buf_iowait+0x88/0x150
[c00000001cf771a0] [c000000000286700] .xfs_buf_read_map+0xd0/0x170
[c00000001cf77240] [c0000000002f4074] .xfs_trans_read_buf_map+0x204/0x570
[c00000001cf77300] [c0000000002c5940] .xfs_da_read_buf+0x100/0x250
[c00000001cf773f0] [c0000000002c7098] .xfs_da_node_lookup_int+0xc8/0x440
[c00000001cf774c0] [c0000000002d0c60] .xfs_dir2_node_lookup+0x70/0x1d0
[c00000001cf77570] [c0000000002c8fe4] .xfs_dir_lookup+0x214/0x230
[c00000001cf776a0] [c00000000029f068] .xfs_lookup+0xb8/0x1a0
[c00000001cf77760] [c000000000293f50] .xfs_vn_lookup+0x60/0xd0
[c00000001cf77800] [c0000000001db454] .lookup_real+0x44/0xa0
[c00000001cf77890] [c0000000001e16e8] .do_last+0xad8/0xe00
[c00000001cf779c0] [c0000000001e1afc] .path_openat+0xec/0x5f0
[c00000001cf77ae0] [c0000000001e2450] .do_filp_open+0x40/0xb0
[c00000001cf77c10] [c0000000001d6308] .open_exec+0x48/0x170
[c00000001cf77cc0] [c0000000001d7ae0] .do_execve_common.isra.19+0x240/0x4e0
[c00000001cf77da0] [c0000000001d8100] .SyS_execve+0x50/0x90
[c00000001cf77e30] [c0000000000097d4] syscall_exit+0x0/0x94
> 
> Test case is here,
> http://tinyurl.com/bzzmrb8
> 
> ...
> [  763.781571] Write-error on swap-device (253:0:7545984)
> [  763.781573] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781574] Write-error on swap-device (253:0:7546240)
> [  763.781576] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781577] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [  763.781578] Write-error on swap-device (253:0:7546496)
> [  763.781579] Call Trace:
> [  763.781580] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781590] [c0000002eac83870] [c000000000015884]
> .show_stack+0x74/0x1b0 (unreliable)
> [  763.781595] [c0000002eac83920] [c000000000721d28]
> .panic+0xe4/0x264
> [  763.781598] [c0000002eac839c0] [c0000000000886e4]
> .do_exit+0x954/0x960
> [  763.781601] [c0000002eac83ac0] [c0000000000889d4]
> .do_group_exit+0x54/0xf0
> [  763.781604] [c0000002eac83b50] [c00000000009be28]
> .get_signal_to_deliver+0x1f8/0x730
> [  763.781606] [c0000002eac83c60] [c000000000017924]
> .do_signal+0x54/0x320
> [  763.781608] [c0000002eac83da0] [c000000000017d74]
> .do_notify_resume+0xb4/0xd0
> [  763.781611] [c0000002eac83e30] [c000000000009e1c]
> .ret_from_except_lite+0x48/0x4c
> [  763.781612] Write-error on swap-device (253:0:7546752)
> [  763.781613] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781615] Write-error on swap-device (253:0:7547008)
> [  763.781616] Sending IPI to other CPUs
> [  763.781616] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781618] Write-error on swap-device (253:0:7547392)
> [  763.781619] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781620] Write-error on swap-device (253:0:7547648)
> [  763.781622] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781623] Write-error on swap-device (253:0:7547904)
> [  763.781625] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781627] Write-error on swap-device (253:0:7548160)
> [  763.781628] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781630] Write-error on swap-device (253:0:7548416)
> [  763.781631] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781632] Write-error on swap-device (253:0:7548672)
> [  763.781634] sd 0:0:1:0: rejecting I/O to offline device
> [  763.781635] Write-error on swap-device (253:0:7548928)
> [  773.781972] ERROR: 1 cpu(s) not responding
> 
>       KERNEL: /boot/vmlinux-3.7.1+
>     DUMPFILE: /var/crash/127.0.0.1-2013.01.09-19:12:02/vmcore
>         CPUS: 28
>         DATE: Tue Jan  8 23:11:35 2013
>       UPTIME: 00:12:43
> LOAD AVERAGE: 5.88, 4.82, 2.51
>        TASKS: 278
>      RELEASE: 3.7.1+
>      VERSION: #0 SMP Tue Jan 8 06:59:49 EST 2013
>      MACHINE: ppc64  (3550 Mhz)
>       MEMORY: 12 GB
>        PANIC: "Kernel panic - not syncing: Attempted to kill init!
>        exitcode=0x0000000b"
>          PID: 1
>      COMMAND: "systemd"
>         TASK: c0000002eac00000  [THREAD_INFO: c0000002eac80000]
>          CPU: 18
>        STATE: TASK_INTERRUPTIBLE|TASK_UNINTERRUPTIBLE|TASK_TRACED
>        (PANIC)
> 
> crash> bt
> PID: 1      TASK: c0000002eac00000  CPU: 18  COMMAND: "systemd"
> 
>  R0:  c000000000721d34    R1:  c0000002eac83920    R2:
>   c000000001157098
>  R3:  c0000002eac83790    R4:  c0000002eac00000    R5:
>   0000000000000070
>  R6:  0000000000000000    R7:  c0000002fff584a0    R8:
>   0000000000000000
>  R9:  c0000002e7909000    R10: 0000000000000001    R11:
>  6578636570745f6c
>  R12: 0000000022004884    R13: c000000007f23f00    R14:
>  0000000000040006
>  R15: 00000000279b056c    R16: c0000002eac83ea0    R17:
>  c000000001398ab8
>  R18: c0000002eac00000    R19: c0000002eac00000    R20:
>  00000000003c0000
>  R21: c0000002eac00a14    R22: c0000000011b2080    R23:
>  c0000002eac83a30
>  R24: c000000018d90000    R25: 0000000000000140    R26:
>  0000000000106001
>  R27: c0000002eac83790    R28: c0000000013ba848    R29:
>  0000000000000000
>  R30: c0000000010d4d18    R31: c00000000101e4b0
>  NIP: c000000000721d34    MSR: 8000000000009032    OR3:
>  c0000002eac83920
>  CTR: 0000000000000000    LR:  c000000000721d34    XER:
>  0000000000000001
>  CCR: 0000000022004882    MQ:  3030303030303030    DAR:
>  0000000000000000
>  DSISR: c000000018d90000     Syscall Result: 0000000000000140
>  NIP [c000000000721d34] .panic
> 
>  #0 [c0000002eac83920] .panic at c000000000721d34
>  #1 [c0000002eac839c0] .do_exit at c0000000000886e4
>  #2 [c0000002eac83ac0] .do_group_exit at c0000000000889d4
>  #3 [c0000002eac83b50] .get_signal_to_deliver at c00000000009be28
>  #4 [c0000002eac83c60] .do_signal at c000000000017924
>  #5 [c0000002eac83da0] .do_notify_resume at c000000000017d74
>  #6 [c0000002eac83e30] .ret_from_except_lite at c000000000009e1c
> 
> CAI Qian

  reply	other threads:[~2013-01-30  6:57 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <767713684.1922924.1357724589680.JavaMail.root@redhat.com>
2013-01-09  9:50 ` oom caused disk corruption on 3.7.1 CAI Qian
2013-01-30  6:57   ` CAI Qian [this message]
2013-01-30  6:57     ` CAI Qian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=561898288.11015388.1359529025256.JavaMail.root@redhat.com \
    --to=caiqian@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.