From: CAI Qian <caiqian@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
stable@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: 3.9.2: xfstests triggered panic
Date: Wed, 22 May 2013 23:16:56 -0400 (EDT)
Message-ID: <1483868349.4996990.1369279016162.JavaMail.root@redhat.com>
In-Reply-To: <20130522095300.GK29466@dastard>
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: "LKML" <linux-kernel@vger.kernel.org>, stable@vger.kernel.org, xfs@oss.sgi.com
> Sent: Wednesday, May 22, 2013 5:53:00 PM
> Subject: Re: 3.9.2: xfstests triggered panic
>
> On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > Reproduced on almost all s390x guests by running xfstests.
> >
> > [14634.396658] XFS (dm-1): Mounting Filesystem
> > [14634.525522] XFS (dm-1): Ending clean mount
> > [14640.413007] [<000000000017c6d4>] idle_balance+0x1a0/0x340
> > [14640.413010] [<000000000063303e>] __schedule+0xa22/0xaf0
> > [14640.428279] [<0000000000630da6>] schedule_timeout+0x186/0x2c0
> > [14640.428289] [<00000000001cf864>] rcu_gp_kthread+0x1bc/0x298
> > [14640.428300] [<0000000000158c5a>] kthread+0xe6/0xec
> > [14640.428304] [<0000000000634de6>] kernel_thread_starter+0x6/0xc
> > [14640.428308] [<0000000000634de0>] kernel_thread_starter+0x0/0xc
> > [14640.428311] Last Breaking-Event-Address:
> > [14640.428314] [<000000000016bd76>] walk_tg_tree_from+0x3a/0xf4
> > [14640.428319] list_add corruption. next->prev should be prev (0000000000000918), but was (null). (next= (null)).
>
> Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> code. This kind of implies a stack corruption....
>
> > Sometimes, this pops up,
> > [16907.275002] WARNING: at kernel/rcutree.c:1960
> >
> > or this,
> > [15316.154171] XFS (dm-1): Mounting Filesystem
> > [15316.255796] XFS (dm-1): Ending clean mount
> > [15320.364246] 00000000006367a2: e310b0080004 lg %r1,8(%r11)
> > [15320.364249] 00000000006367a8: 41101010 la %r1,16(%r1)
> > [15320.364251] 00000000006367ac: e33010000004 lg %r3,0(%r1)
> > [15320.364252] Call Trace:
> > [15320.364252] Last Breaking-Event-Address:
> > [15320.364253] [<0000000000000000>] Kernel stack overflow.
> > [15320.364308] CPU: 0 Tainted: GF W 3.9.2 #1
> > [15320.364309] Process rhts-test-runne (pid: 625, task: 000000003dccc890, ksp: 0
>
> .... and there you go - a stack overflow. Your kernel stack size is
> too small.
>
> I'd suggest that you need 16k stacks on s390 - IIRC every function
> call has 128 byte stack frame, and there are call chains 70-80
> functions deep in the storage stack...
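
To make the arithmetic in the estimate above concrete, here is a rough sketch. The 128-byte frame and the 70-80 call depth are the "IIRC" figures quoted above; the 8k baseline is only an assumed value for comparison, since the actual stack size of this s390x build is not stated anywhere in this thread.

```python
# Rough stack budget from the figures quoted above (both are estimates).
FRAME_BYTES = 128                     # per-call frame size quoted above
CALL_DEPTHS = (70, 80)                # call-chain depths quoted above
STACK_SIZES = (8 * 1024, 16 * 1024)   # 8k is an ASSUMED baseline; 16k is the suggestion


def stack_needed(depth, frame_bytes=FRAME_BYTES):
    """Minimum bytes consumed by `depth` nested calls, ignoring local variables."""
    return depth * frame_bytes


for depth in CALL_DEPTHS:
    need = stack_needed(depth)
    for size in STACK_SIZES:
        verdict = "fits" if need <= size else "overflows"
        print(f"depth {depth}: {need} bytes vs {size // 1024}k stack -> {verdict}")
```

This is a lower bound only: real frames also hold locals and saved registers, and interrupts can land on top of the deepest chain.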
Hmm, I am unsure how to set a 16k stack there, and POWER7 looks like
it has the same problem.
[14927.117017] XFS (dm-0): Mounting Filesystem
[14927.299854] XFS (dm-0): Ending clean mount
[14927.668909] Unable to handle kernel paging request for data at address 0x00000040
[14927.668913] Unable to handle kernel paging request for data at address 0x000000f8
[14927.668914] Unable to handle kernel paging request for data at address 0x000000bb
[14927.668915] Faulting instruction address: 0xc0000000000d1bd8
[14927.668916] Faulting instruction address: 0xc0000000000d1bd8
[14927.668919] Unable to handle kernel paging request for data at address 0x00000018
[14927.668920] Faulting instruction address: 0xc0000000003d34b8
[14927.668922] Oops: Kernel access of bad area, sig: 11 [#1]
[14927.668924] SMP NR_CPUS=1024 NUMA pSeries
[14927.668927] Modules linked in: binfmt_misc(F) tun(F) ipt_ULOG(F) rds(F) scsi_transport_iscsi(F) atm(F) nfc(F) pppoe(F) pppox(F) ppp_generic(F) slhc(F) af_802154(F) af_key(F) sctp(F) btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) arc4(F) md4(F) nls_utf8(F) cifs(F) dns_resolver(F) nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F)[14927.668955] Faulting instruction address: 0xc0000000000d1bd8
fuse(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) ehea(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F) ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: brd]
[14927.669041] NIP: c0000000000d1bd8 LR: c0000000000d1b94 CTR: c0000000000d7e30
[14927.669048] REGS: c0000001fbfb3120 TRAP: 0300 Tainted: GF (3.9.3)
[14927.669053] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 28000028 XER: 00000000
[14927.669069] SOFTE: 0
[14927.669072] CFAR: c00000000000908c
[14927.669076] DAR: 00000000000000f8, DSISR: 40000000
[14927.669080] TASK = c0000001fbf14880[0] 'swapper/2' THREAD: c0000001fbfb0000 CPU: 2
GPR00: c0000000000d1b94 c0000001fbfb33a0 c0000000010f3038 00000d939e66add6
GPR04: 0000000000000000 00000001001651f2 0000000000000099 c000000000af3038
GPR08: c000000001163038 0000000000000002 00000000000000b8 000c3420953d115d
GPR12: 0000000048000022 c00000000ed90800 c0000001fbfb3f90 000000000eee7bc0
GPR16: 0000000010200040 00000001001651f2 c000000001152100 0000000000000000
GPR20: c000000000af3f80 c000000001152180 0000000000000000 0000000000000000
GPR24: c0000000007801e8 0000000000000001 0000000000200200 c0000000015550d0
GPR28: c000000001554880 0000000000000000 c0000001f5564200 0000000000000000
[14927.669159] NIP [c0000000000d1bd8] .update_blocked_averages+0xc8/0x5c0
[14927.669165] LR [c0000000000d1b94] .update_blocked_averages+0x84/0x5c0
[14927.669170] Call Trace:
[14927.669174] [c0000001fbfb33a0] [c0000000000d1b94] .update_blocked_averages+0x84/0x5c0 (unreliable)
[14927.669183] [c0000001fbfb3490] [c0000000000d7c54] .rebalance_domains+0x84/0x260
[14927.669190] [c0000001fbfb3570] [c0000000000d7eb4] .run_rebalance_domains+0x84/0x230
[14927.669198] [c0000001fbfb3650] [c000000000091228] .__do_softirq+0x148/0x310
[14927.669205] [c0000001fbfb3740] [c000000000091608] .irq_exit+0xc8/0xe0
[14927.669212] [c0000001fbfb37c0] [c00000000001d214] .timer_interrupt+0x154/0x2e0
[14927.669220] [c0000001fbfb3870] [c0000000000024d4] decrementer_common+0x154/0x180
[14927.669230] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4
[14927.669230] LR = .check_and_cede_processor+0x24/0x40
[14927.669240] [c0000001fbfb3b60] [0000000000000001] 0x1 (unreliable)
[14927.669247] [c0000001fbfb3bd0] [c00000000006d070] .shared_cede_loop+0x50/0xe0
[14927.669256] [c0000001fbfb3c90] [c0000000005b818c] .cpuidle_enter+0x2c/0x40
[14927.669263] [c0000001fbfb3d00] [c0000000005b8ad0] .cpuidle_idle_call+0xf0/0x300
[14927.669270] [c0000001fbfb3db0] [c00000000005dab0] .pSeries_idle+0x10/0x40
[14927.669278] [c0000001fbfb3e20] [c0000000000171b8] .cpu_idle+0x158/0x2a0
[14927.669285] [c0000001fbfb3ed0] [c00000000074c030] .start_secondary+0x3a4/0x3ac
[14927.669293] [c0000001fbfb3f90] [c00000000000976c] .start_secondary_prolog+0x10/0x14
[14927.669299] Instruction dump:
[14927.669303] 7fbbf040 3bdeff50 419e01f0 3f400020 3f02ff69 3ae00000 3ac00000 3b18d1b0
[14927.669314] 635a0200 60000000 e93c0912 e95e00c0 <e90a0040> e94a0048 79291f24 7fe8482a
[14927.669334] ---[ end trace ac4936baffc8b47b ]---
[14927.671261]
[14927.671266] Oops: Kernel access of bad area, sig: 11 [#2]
[14927.671272] SMP NR_CPUS=1024 NUMA pSeries
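
One way to check how deep the stack was at the time of this oops is to diff the stack-pointer column of the call trace. The sketch below does that for the frames listed above. It assumes that the first bracketed value on each trace line is the frame's stack pointer, that the stack top is the THREAD address plus the stack size, and that this ppc64 build uses 16k stacks; none of that is confirmed in this thread, and the helper names are made up for illustration.

```python
import re

# Frame lines copied from the ppc64 call trace above:
# [stack pointer] [return address] symbol
TRACE = """\
[c0000001fbfb33a0] [c0000000000d1b94] .update_blocked_averages+0x84/0x5c0 (unreliable)
[c0000001fbfb3490] [c0000000000d7c54] .rebalance_domains+0x84/0x260
[c0000001fbfb3570] [c0000000000d7eb4] .run_rebalance_domains+0x84/0x230
[c0000001fbfb3650] [c000000000091228] .__do_softirq+0x148/0x310
[c0000001fbfb3740] [c000000000091608] .irq_exit+0xc8/0xe0
[c0000001fbfb37c0] [c00000000001d214] .timer_interrupt+0x154/0x2e0
[c0000001fbfb3870] [c0000000000024d4] decrementer_common+0x154/0x180
[c0000001fbfb3b60] [0000000000000001] 0x1 (unreliable)
[c0000001fbfb3bd0] [c00000000006d070] .shared_cede_loop+0x50/0xe0
[c0000001fbfb3c90] [c0000000005b818c] .cpuidle_enter+0x2c/0x40
[c0000001fbfb3d00] [c0000000005b8ad0] .cpuidle_idle_call+0xf0/0x300
[c0000001fbfb3db0] [c00000000005dab0] .pSeries_idle+0x10/0x40
[c0000001fbfb3e20] [c0000000000171b8] .cpu_idle+0x158/0x2a0
[c0000001fbfb3ed0] [c00000000074c030] .start_secondary+0x3a4/0x3ac
[c0000001fbfb3f90] [c00000000000976c] .start_secondary_prolog+0x10/0x14
"""

THREAD_BASE = 0xC0000001FBFB0000   # "THREAD:" value from the oops above
THREAD_SIZE = 16 * 1024            # ASSUMPTION: 16k kernel stacks on this build

FRAME_RE = re.compile(r"\[([0-9a-f]{16})\] \[[0-9a-f]{16}\] (\S+)")


def parse_frames(text):
    """Return (stack_pointer, symbol) pairs, innermost frame first."""
    return [(int(sp, 16), sym) for sp, sym in FRAME_RE.findall(text)]


def frame_sizes(frames):
    """Distance from each frame's stack pointer to its caller's stack pointer."""
    return [(sym, caller_sp - sp)
            for (sp, sym), (caller_sp, _) in zip(frames, frames[1:])]


def stack_used(frames, base=THREAD_BASE, size=THREAD_SIZE):
    """Bytes between the assumed stack top and the innermost frame."""
    return base + size - frames[0][0]


frames = parse_frames(TRACE)
for sym, nbytes in frame_sizes(frames):
    print(f"{nbytes:5d}  {sym}")
print("used at time of oops:", stack_used(frames), "of", THREAD_SIZE)
```

Note that this only measures the depth of this one trace at the moment it was printed; it says nothing about how deep the stack went earlier.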
CAI Qian
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com