* 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
From: Justin Piszcz @ 2010-03-15 9:32 UTC
To: linux-kernel; +Cc: xfs

Specifications:

2.6.33
32bit
Debian Testing

On Mar 14th, my server hung up:

Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here ]------------
Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000 [#1] SMP
Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file: /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid: 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56 89 c6 b8 80 b0 52 c1 83 e6 07 53
Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14

Is this a bug related to AMD (CPU) or XFS?

Justin.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
From: Eric Sandeen @ 2010-03-15 14:40 UTC
To: Justin Piszcz; +Cc: linux-kernel, xfs

Justin Piszcz wrote:
> Specifications:
>
> 2.6.33
> 32bit
> Debian Testing
>
> On Mar 14th, my server hung up:
>
> Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here
> ]------------
> Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000
> [#1] SMP
> Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file:
> /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
>
> Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid:
> 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
> Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
> Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
> Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00
> 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24
> 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56
> 89 c6 b8 80 b0 52 c1 83 e6 07 53
> Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>]
> assfail+0x1b/0x20 SS:ESP 0068:f687bf14
>
> Is this a bug related to AMD (CPU) or XFS?

Hard to know without seeing the rest of the stack trace, is this all you got?

You hit an assertion failure in xfs.

Are you running with CONFIG_XFS_DEBUG?

-Eric

> Justin.

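A note on reading the oops above: the byte in angle brackets on the `Code:` line is where the CPU faulted. Here it is `<0f> 0b`, the x86 `ud2` instruction, which is what the kernel's `BUG()` emits on x86. An "invalid opcode" with EIP inside `assfail` therefore suggests a deliberate assertion trap rather than a CPU problem. The sketch below is a minimal user-space check of that pattern; `demo_faulting_insn_is_ud2` is a hypothetical helper written for this note, not a kernel or XFS function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel): look at the byte marked
 * <xx> in an oops "Code:" dump and the byte that follows it, and report
 * whether together they form the two-byte x86 ud2 instruction (0f 0b).
 *
 * Returns 1 for ud2, 0 for any other byte pair, and -1 if no marked
 * byte followed by a second byte can be parsed.
 */
static int demo_faulting_insn_is_ud2(const char *code)
{
	unsigned int first, second;
	const char *mark;

	assert(code != NULL);		/* caller must pass a string */

	mark = strchr(code, '<');	/* start of the "<xx>" marker */
	if (mark == NULL)
		return -1;

	/* Parse "<xx> yy": the marked byte and the byte after it. */
	if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
		return -1;

	return (first == 0x0f && second == 0x0b) ? 1 : 0;
}
```

Passing the tail of the dump from this report, `"e8 7c 11 1c 00 <0f> 0b eb fe 90"`, to the helper reports a `ud2` at the faulting address.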
* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
From: Justin Piszcz @ 2010-03-15 14:44 UTC
To: Eric Sandeen; +Cc: linux-kernel, xfs

On Mon, 15 Mar 2010, Eric Sandeen wrote:

> Justin Piszcz wrote:
>> Specifications:
>>
>> 2.6.33
>> 32bit
>> Debian Testing
>>
>> On Mar 14th, my server hung up:
>>
>> Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here
>> ]------------
>> Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000
>> [#1] SMP
>> Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file:
>> /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
>>
>> Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid:
>> 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
>> Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
>> Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
>> Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00
>> 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24
>> 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56
>> 89 c6 b8 80 b0 52 c1 83 e6 07 53
>> Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>]
>> assfail+0x1b/0x20 SS:ESP 0068:f687bf14
>>
>> Is this a bug related to AMD (CPU) or XFS?
>
> Hard to know without seeing the rest of the stack trace, is this all you got?

Yes, that was the only entry in the log file, I could not wake up the
monitor to get any more information.

>
> You hit an assertion failure in xfs.
>
> Are you running with CONFIG_XFS_DEBUG?

Nope.

CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_DEBUG is not set

>
> -Eric

* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
From: Eric Sandeen @ 2010-03-15 14:46 UTC
To: Justin Piszcz; +Cc: linux-kernel, xfs

Justin Piszcz wrote:
>
>
> On Mon, 15 Mar 2010, Eric Sandeen wrote:
>
>> Justin Piszcz wrote:
>>> Specifications:
>>>
>>> 2.6.33
>>> 32bit
>>> Debian Testing
>>>
>>> On Mar 14th, my server hung up:
>>>
>>> Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here
>>> ]------------
>>> Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000
>>> [#1] SMP
>>> Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file:
>>> /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
>>>
>>>
>>> Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid:
>>> 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
>>> Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
>>> Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
>>> Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00
>>> 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24
>>> 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56
>>> 89 c6 b8 80 b0 52 c1 83 e6 07 53
>>> Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>]
>>> assfail+0x1b/0x20 SS:ESP 0068:f687bf14
>>>
>>> Is this a bug related to AMD (CPU) or XFS?
>>
>> Hard to know without seeing the rest of the stack trace, is this all
>> you got?
> Yes, that was the only entry in the log file, I could not wake up the
> monitor to get any more information.
>
>>
>> You hit an assertion failure in xfs.
>>
>> Are you running with CONFIG_XFS_DEBUG?
> Nope.
>
> CONFIG_XFS_FS=y
> # CONFIG_XFS_QUOTA is not set
> # CONFIG_XFS_POSIX_ACL is not set
> # CONFIG_XFS_RT is not set
> # CONFIG_XFS_DEBUG is not set

Ok, then you hit an ASSERT_ALWAYS

There are only a few:

0 xfs/linux-2.6/xfs_super.c xfs_fs_destroy_inode  953 ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE));
1 xfs/linux-2.6/xfs_super.c xfs_fs_destroy_inode  954 ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
2 xfs/linux-2.6/xfs_sync.c  xfs_reclaim_inode     727 ASSERT_ALWAYS(__xfs_iflags_test(ip, XFS_IRECLAIMABLE));
3 fs/xfs/xfs_log.c          xfs_log_notify        376 ASSERT_ALWAYS((iclog->ic_state == XLOG_STATE_ACTIVE) ||
4 fs/xfs/xfs_log.c          xlog_commit_record   1273 ASSERT_ALWAYS(iclog);

but I'm not sure which one you hit since there is no backtrace provided.

-Eric

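The reasoning in the message above (no CONFIG_XFS_DEBUG, so it must have been an ASSERT_ALWAYS) rests on the difference between a check that is compiled out of non-debug builds and one that is always present. The following is a minimal user-space sketch of that split, not the XFS source: every `demo_`/`DEMO_` name is hypothetical, `DEMO_DEBUG` stands in for CONFIG_XFS_DEBUG, and the failure handler only logs and counts where the kernel's `assfail` would go on to trap.

```c
#include <assert.h>
#include <stdio.h>

static int demo_failures;	/* number of checks that have fired */

/*
 * Stand-in for assfail(). Assumption: the kernel version logs the failed
 * expression and then traps; this model logs and counts instead, so the
 * program keeps running and the result can be inspected.
 */
static void demo_assfail(const char *expr, const char *file, int line)
{
	fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n",
		expr, file, line);
	demo_failures++;
}

/* Present in every build, like ASSERT_ALWAYS. */
#define DEMO_ASSERT_ALWAYS(expr) \
	((expr) ? (void)0 : demo_assfail(#expr, __FILE__, __LINE__))

/* Present only in debug builds, like ASSERT under CONFIG_XFS_DEBUG. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT(expr)	DEMO_ASSERT_ALWAYS(expr)
#else
#define DEMO_ASSERT(expr)	((void)0)
#endif

/* Apply one debug-only and one unconditional check to the same flag. */
static int demo_run_checks(int flag_is_set)
{
	demo_failures = 0;
	DEMO_ASSERT(flag_is_set);
	DEMO_ASSERT_ALWAYS(flag_is_set);
	return demo_failures;
}

/* Usage: without DEMO_DEBUG, only the unconditional check can fire. */
static void demo_self_check(void)
{
	assert(demo_run_checks(1) == 0);
	assert(demo_run_checks(0) == 1);
}
```

In a build without the debug option, any report that reaches the failure handler must come from one of the unconditional checks, which is why the list of candidates in the message above is so short.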
* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
From: Christoph Hellwig @ 2010-03-15 15:11 UTC
To: Eric Sandeen; +Cc: xfs, Justin Piszcz, linux-kernel

On Mon, Mar 15, 2010 at 09:46:59AM -0500, Eric Sandeen wrote:
> Ok, then you hit an ASSERT_ALWAYS
>
> There are only a few:

If the box is a nfs server I suspect it's a reclaimable state one and
the patch below should help:

---

From: Christoph Hellwig <hch@lst.de>
Subject: xfs: fix locking for inode cache radix tree tag updates

The radix-tree code requires its users to serialize tag updates against
other updates to the tree. While XFS protects tag updates against each
other it does not serialize them against updates of the tree contents,
which can lead to tag corruption. Fix the inode cache to always take
pag_ici_lock in exclusive mode when updating radix tree tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2010-02-10 13:08:41.000000000 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2010-02-10 15:53:28.739570272 +0100
@@ -687,12 +687,12 @@ xfs_inode_set_reclaim_tag(
 	struct xfs_perag *pag;
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	read_lock(&pag->pag_ici_lock);
+	write_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
-	read_unlock(&pag->pag_ici_lock);
+	write_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
 }
 
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c	2010-02-04 17:28:35.000000000 +0100
+++ xfs/fs/xfs/xfs_iget.c	2010-02-10 15:53:55.504284758 +0100
@@ -190,13 +190,12 @@ xfs_iget_cache_hit(
 		trace_xfs_iget_reclaim(ip);
 
 		/*
-		 * We need to set XFS_INEW atomically with clearing the
-		 * reclaimable tag so that we do have an indicator of the
-		 * inode still being initialized.
+		 * We need to set XFS_IRECLAIM to prevent xfs_reclaim_inode
+		 * from stomping over us while we recycle the inode. We can't
+		 * clear the radix tree reclaimable tag yet as it requires
+		 * pag_ici_lock to be held exclusive.
 		 */
-		ip->i_flags |= XFS_INEW;
-		ip->i_flags &= ~XFS_IRECLAIMABLE;
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+		ip->i_flags |= XFS_IRECLAIM;
 
 		spin_unlock(&ip->i_flags_lock);
 		read_unlock(&pag->pag_ici_lock);
@@ -216,7 +215,15 @@ xfs_iget_cache_hit(
 			trace_xfs_iget_reclaim(ip);
 			goto out_error;
 		}
+
+		write_lock(&pag->pag_ici_lock);
+		spin_lock(&ip->i_flags_lock);
+		ip->i_flags &= ~(XFS_IRECLAIMABLE | XFS_IRECLAIM);
+		ip->i_flags |= XFS_INEW;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 		inode->i_state = I_NEW;
+		spin_unlock(&ip->i_flags_lock);
+		write_unlock(&pag->pag_ici_lock);
 	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
 		if (!igrab(inode)) {

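The rule the patch above enforces (tag updates only while the per-AG lock is held exclusively, lookups under either mode) can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the kernel code: `demo_perag`, the lock-mode enum, and the bitmap standing in for the radix tree tags are all hypothetical, and no real locking or concurrency is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the caller currently holds the per-AG lock. */
enum demo_lock_mode { DEMO_UNLOCKED, DEMO_SHARED, DEMO_EXCLUSIVE };

/* Hypothetical stand-in for the per-AG inode cache state. */
struct demo_perag {
	enum demo_lock_mode mode;	/* current lock mode */
	unsigned long reclaim_tags;	/* one tag bit per slot, slots 0..31 */
};

/* Lookups only read the tags, so shared or exclusive mode is enough. */
static bool demo_tag_is_set(const struct demo_perag *pag, unsigned int slot)
{
	assert(pag->mode != DEMO_UNLOCKED && slot < 32);
	return ((pag->reclaim_tags >> slot) & 1UL) != 0;
}

/* Tag updates change shared state: refuse unless held exclusively. */
static int demo_set_reclaim_tag(struct demo_perag *pag, unsigned int slot)
{
	if (pag->mode != DEMO_EXCLUSIVE || slot >= 32)
		return -1;
	pag->reclaim_tags |= 1UL << slot;
	return 0;
}

/* Clearing a tag is also an update and follows the same rule. */
static int demo_clear_reclaim_tag(struct demo_perag *pag, unsigned int slot)
{
	if (pag->mode != DEMO_EXCLUSIVE || slot >= 32)
		return -1;
	pag->reclaim_tags &= ~(1UL << slot);
	return 0;
}
```

In this model, the pre-patch behaviour corresponds to calling `demo_set_reclaim_tag` while `mode` is `DEMO_SHARED`, which the model rejects; the patch's switch from `read_lock` to `write_lock` corresponds to setting `mode` to `DEMO_EXCLUSIVE` first.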
* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
From: Justin Piszcz @ 2010-03-15 15:21 UTC
To: Christoph Hellwig; +Cc: Eric Sandeen, linux-kernel, xfs

On Mon, 15 Mar 2010, Christoph Hellwig wrote:

> On Mon, Mar 15, 2010 at 09:46:59AM -0500, Eric Sandeen wrote:
>> Ok, then you hit an ASSERT_ALWAYS
>>
>> There are only a few:
>
> If the box is a nfs server I suspect it's a reclaimable state one and
> the patch below should help:

Hello,

The host is not actively running an NFS server. The option may be compiled
in, but the host is not acting as an NFS server and has no shares exported.

I need stability for this specific host, and I cannot run netconsole due to
restrictions, etc. I will switch to ext4, since if the problem recurs, all
I/O to the disk stops and I will not be able to assist by reporting the bug
or providing the full output.

If I could reproduce the bug in another environment, I would be able to
troubleshoot the problem further.

This bug is very similar to the one I reported in the past (that host was
an active NFS server):
http://lists.openwall.net/linux-kernel/2009/10/19/118

Justin.