From: CAI Qian <caiqian@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: 3.9-rc2 xfs panic
Date: Tue, 12 Mar 2013 02:34:07 -0400 (EDT)
Message-ID: <1032405745.12626044.1363070047355.JavaMail.root@redhat.com>
In-Reply-To: <20130312060701.GI21651@dastard>



----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: xfs@oss.sgi.com
> Sent: Tuesday, March 12, 2013 2:07:01 PM
> Subject: Re: 3.9-rc2 xfs panic
> 
> On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > Just came across when running xfstests using 3.9-rc2 kernel on a
> > power7
> > box with addition of this patch which fixed a known issue,
> > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> > 
> > The log shows it happened around test case 370 with
> > TEST_PARAM_BLKSIZE = 2048
> 
> That doesn't sound like xfstests. it only has 305 tests, and no
> parameters like TEST_PARAM_BLKSIZE....
Sorry, that was a typo: it is test case 270, not 370. TEST_PARAM_BLKSIZE
comes from an internal wrapper that is used to create the new filesystem;
it is not part of the original xfstests. Apologies for that, Dave.
> 
> > Some more information:
> > xfsprogs version = 3.1.10
> > number of CPUs = 32
> > Swap Size = 4047 MB
> > Mem Size = 4046 M
> > 
> > Still reproducing and bisecting, so this is just a heads-up to see
> > if it
> > helps.
> > 
> > CAI Qian
> > 
> > [31797.113368] XFS (loop1): xfs_trans_ail_delete_bulk: attempting
> > to delete a log item that is not in the AIL
> > [31797.113383] XFS (loop1): xfs_do_force_shutdown(0x2) called from
> > line 743 of file fs/xfs/xfs_trans_ail.c.  Return address =
> > 0xd000000000f22838
> 
> Shutdown for an in-memory problem of some kind....
> 
> > [31817.508411] XFS (loop0): Mounting Filesystem
> > [31817.566235] XFS (loop0): Ending clean mount
> > [31819.094713] XFS (loop0): Mounting Filesystem
> > [31819.152248] XFS (loop0): Ending clean mount
> > [31819.348238] XFS (loop1): Mounting Filesystem
> > [31819.349879] XFS (loop1): Ending clean mount
> > [31819.561366] XFS (loop0): Mounting Filesystem
> > [31819.616607] XFS (loop0): Ending clean mount
> > [31819.990833] XFS (loop1): Mounting Filesystem
> > [31819.992652] XFS (loop1): Ending clean mount
> > [31819.992768] XFS (loop1): Quotacheck needed: Please wait.
> > [31820.051134] XFS (loop1): Quotacheck: Done.
> > [31832.534868] Unable to handle kernel paging request for data at
> > address 0x5841474900000001
> 
> And after remounting the filesystem a couple of times, it's tried
> to follow an AGI buffer header (magic # XAGI, seqno = 1) as though
> it was a pointer. I can't think of why that would be
> executed....
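
For anyone following along, the reading of the faulting address above can
be checked by hand. This is only a minimal sketch: the helper name
decode_fault_address is made up for illustration, and it assumes the
64-bit value is laid out big-endian, as it would be if on-disk XFS
metadata were picked up as a pointer on this power7 box. It splits the
address into its upper four bytes (the ASCII magic) and the 32-bit word
that follows.

```python
import struct


def decode_fault_address(addr):
    """Split a 64-bit value into a 4-byte ASCII tag and the next 32-bit word.

    Assumption: the value is interpreted big-endian.
    """
    raw = addr.to_bytes(8, "big")
    # ">4sI" = big-endian, 4 raw bytes followed by an unsigned 32-bit integer
    magic, next_word = struct.unpack(">4sI", raw)
    return magic.decode("ascii", errors="replace"), next_word


if __name__ == "__main__":
    # Address taken from the "Unable to handle kernel paging request" line
    magic, next_word = decode_fault_address(0x5841474900000001)
    print(magic, next_word)
```

The same value shows up in DAR and in GPR30 of the register dump, so the
bad pointer was already sitting in a register when kmem_cache_alloc
dereferenced it.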
> 
> > [31832.534881] Faulting instruction address: 0xc0000000001f8070
> > [31832.534888] Oops: Kernel access of bad area, sig: 11 [#1]
> > [31832.534891] SMP NR_CPUS=1024 NUMA pSeries
> > [31832.534899] Modules linked in: tun(F) binfmt_misc(F) hidp(F)
> > cmtp(F) kernelcapi(F) rfcomm(F) l2tp_ppp(F) l2tp_netlink(F)
> > l2tp_core(F) bnep(F) nfc(F) af_802154(F) pppoe(F) pppox(F)
> > ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) sctp(F)
> > ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F)
> > btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F)
> > nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nfsv2(F)
> > nfs(F) dns_resolver(F) lockd(F) sunrpc(F) fscache(F)
> > nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) fuse(F)
> > sg(F) ibmveth(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F)
> > ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F)
> > dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: ipt_REJECT]
> > [31832.534978] NIP: c0000000001f8070 LR: c000000000192f6c CTR:
> > c000000000192f50
> > [31832.534984] REGS: c0000000f1c125f0 TRAP: 0300   Tainted: GF
> >       W     (3.9.0-rc2+)
> > [31832.534989] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR:
> > 24022024  XER: 20000001
> > [31832.535003] SOFTE: 0
> > [31832.535006] CFAR: c000000000005f1c
> > [31832.535009] DAR: 5841474900000001, DSISR: 40000000
> > [31832.535013] TASK = c00000003f0111c0[16795] 'loop1' THREAD:
> > c0000000f1c10000 CPU: 30
> > GPR00: c000000000192f6c c0000000f1c12870 c0000000010f3a48
> > c0000000fe015a00
> > GPR04: 0000000000011220 0000000000000080 00000000000f3aaf
> > c0000000018d5840
> > GPR08: 0000000000000000 0000000000000000 0000000000000000
> > c0000000004e3300
> > GPR12: 0000000044024024 c00000000f247800 c0000000010d01b0
> > 0000000000000000
> > GPR16: 0000000000000001 0000000000000000 c0000000009d9020
> > c0000000009d9060
> > GPR20: c0000000009d9048 0000000000000020 000000000000007f
> > 0000000000000000
> > GPR24: 0000000000000fe0 c0000000010d1020 c0000000fe015a00
> > 0000000000000000
> > GPR28: c000000000192f6c 0000000000011220 5841474900000001
> > c0000000fe015a00
> > [31832.535086] NIP [c0000000001f8070] .kmem_cache_alloc+0xb0/0x2d0
> > [31832.535092] LR [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
> > [31832.535096] Call Trace:
> > [31832.535101] [c0000000f1c12870] [0000000000016ac3] 0x16ac3
> > (unreliable)
> > [31832.535108] [c0000000f1c12920] [c000000000192f6c]
> > .mempool_alloc_slab+0x1c/0x30
> > [31832.535114] [c0000000f1c12990] [c000000000193108]
> > .mempool_alloc+0x88/0x1c0
> > [31832.535122] [c0000000f1c12a80] [c0000000004e1824]
> > .scsi_sg_alloc+0x64/0xc0
> > [31832.535129] [c0000000f1c12af0] [c0000000003e09f8]
> > .__sg_alloc_table+0xa8/0x190
> > [31832.535135] [c0000000f1c12bc0] [c0000000004e15f0]
> > .scsi_alloc_sgtable+0x40/0x90
> > [31832.535142] [c0000000f1c12c40] [c0000000004e1668]
> > .scsi_init_sgtable+0x28/0x90
> > [31832.535148] [c0000000f1c12cc0] [c0000000004e19e0]
> > .scsi_init_io+0x40/0x1a0
> > [31832.535157] [c0000000f1c12d60] [d000000000c02e78]
> > .sd_prep_fn+0x128/0xac0 [sd_mod]
> > [31832.535164] [c0000000f1c12e20] [c0000000003a611c]
> > .blk_peek_request+0xfc/0x2d0
> > [31832.535171] [c0000000f1c12eb0] [c0000000004e2c08]
> > .scsi_request_fn+0xb8/0x6d0
> > [31832.535178] [c0000000f1c12fa0] [c00000000039d7c0]
> > .__blk_run_queue+0x50/0x80
> > [31832.535184] [c0000000f1c13020] [c0000000003a2184]
> > .queue_unplugged+0xe4/0x100
> > [31832.535190] [c0000000f1c130c0] [c0000000003a67d8]
> > .blk_flush_plug_list+0x248/0x2e0
> > [31832.535197] [c0000000f1c13180] [c0000000003a6bcc]
> > .blk_queue_bio+0x2fc/0x490
> > [31832.535203] [c0000000f1c13230] [c0000000003a436c]
> > .generic_make_request+0x11c/0x180
> > [31832.535210] [c0000000f1c132c0] [c0000000003a4484]
> > .submit_bio+0xb4/0x1e0
> > [31832.535245] [c0000000f1c13380] [d000000000eaffa0]
> > .xfs_submit_ioend_bio.isra.10+0x70/0x90 [xfs]
> > [31832.535286] [c0000000f1c133f0] [d000000000eb00f0]
> > .xfs_submit_ioend+0x130/0x190 [xfs]
> > [31832.535343] [c0000000f1c134a0] [d000000000eb045c]
> > .xfs_vm_writepage+0x30c/0x670 [xfs]
> > [31832.535349] [c0000000f1c135d0] [c00000000019d050]
> > .__writepage+0x30/0x90
> > [31832.535356] [c0000000f1c13650] [c00000000019d728]
> > .write_cache_pages+0x208/0x4f0
> > [31832.535362] [c0000000f1c137e0] [c00000000019da5c]
> > .generic_writepages+0x4c/0xa0
> > [31832.535395] [c0000000f1c138a0] [d000000000eaea10]
> > .xfs_vm_writepages+0x60/0x90 [xfs]
> > [31832.535411] [c0000000f1c13930] [c00000000019ee7c]
> > .do_writepages+0x3c/0x70
> > [31832.535424] [c0000000f1c139a0] [c0000000001914b8]
> > .__filemap_fdatawrite_range+0x68/0x80
> > [31832.535430] [c0000000f1c13a40] [c000000000191610]
> > .filemap_write_and_wait_range+0x70/0xc0
> > [31832.535463] [c0000000f1c13ad0] [d000000000eb7970]
> > .xfs_file_fsync+0x60/0x250 [xfs]
> > [31832.535479] [c0000000f1c13b90] [c00000000024c278]
> > .vfs_fsync+0x48/0x70
> > [31832.535497] [c0000000f1c13c00] [c0000000004d299c]
> > .loop_thread+0x3ec/0x5b0
> > [31832.535503] [c0000000f1c13d30] [c0000000000b58c8]
> > .kthread+0xe8/0xf0
> > [31832.535510] [c0000000f1c13e30] [c000000000009f64]
> > .ret_from_kernel_thread+0x64/0x80
> 
> So, looks like memory corruption - a corrupted slab, perhaps? Can
> you turn on memory poisoning, debugging, etc?
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> 

