2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14

public inbox for linux-xfs@vger.kernel.org

* 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
@ 2010-03-15  9:32 Justin Piszcz
  2010-03-15 14:40 ` Eric Sandeen
  0 siblings, 1 reply; 6+ messages in thread
From: Justin Piszcz @ 2010-03-15  9:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: xfs

Specifications:

2.6.33
32bit
Debian Testing

On Mar 14th, my server hung up:

Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here ]------------
Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000 [#1] SMP
Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file: /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid: 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56 89 c6 b8 80 b0 52 c1 83 e6 07 53
Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14

Is this a bug related to AMD (CPU) or XFS?

Justin.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
  2010-03-15  9:32 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14 Justin Piszcz
@ 2010-03-15 14:40 ` Eric Sandeen
  2010-03-15 14:44   ` Justin Piszcz
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2010-03-15 14:40 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, xfs

Justin Piszcz wrote:
> Specifications:
> 
> 2.6.33
> 32bit
> Debian Testing
> 
> On Mar 14th, my server hung up:
> 
> Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here
> ]------------
> Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000
> [#1] SMP
> Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file:
> /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
> 
> Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid:
> 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
> Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
> Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
> Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00
> 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24
> 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56
> 89 c6 b8 80 b0 52 c1 83 e6 07 53
> Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>]
> assfail+0x1b/0x20 SS:ESP 0068:f687bf14
> 
> Is this a bug related to AMD (CPU) or XFS?

Hard to know without seeing the rest of the stack trace; is this all you got?

You hit an assertion failure in xfs.
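
For readers decoding the oops themselves: in the "Code:" line, the byte in angle brackets is the one at the faulting EIP. Here it is 0f, followed by 0b, which is the x86 ud2 instruction that BUG() emits, so "invalid opcode" in assfail is what a deliberate assertion trap looks like. A minimal sketch of that check, with a hypothetical helper name that is not part of any kernel tool:

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the byte marked <xx> in an oops "Code:" line, together with
 * the byte after it, encodes ud2 (0f 0b); 0 if it encodes something else;
 * -1 if no marked byte pair can be parsed.
 * The function name is hypothetical and used for illustration only.
 */
int oops_code_is_ud2(const char *code_line)
{
	const char *mark = strchr(code_line, '<');
	unsigned int first, second;

	if (!mark)
		return -1;
	/* Expect "<0f> 0b": the marked byte, then the next byte in the dump. */
	if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
		return -1;
	return first == 0x0f && second == 0x0b;
}
```

Pass it only the "Code:" line; the EIP line also contains a '<' and would not parse as a byte pair.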

Are you running with CONFIG_XFS_DEBUG?

-Eric

> Justin.
> 


* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
  2010-03-15 14:40 ` Eric Sandeen
@ 2010-03-15 14:44   ` Justin Piszcz
  2010-03-15 14:46     ` Eric Sandeen
  0 siblings, 1 reply; 6+ messages in thread
From: Justin Piszcz @ 2010-03-15 14:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-kernel, xfs



On Mon, 15 Mar 2010, Eric Sandeen wrote:

> Justin Piszcz wrote:
>> Specifications:
>>
>> 2.6.33
>> 32bit
>> Debian Testing
>>
>> On Mar 14th, my server hung up:
>>
>> Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here
>> ]------------
>> Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000
>> [#1] SMP
>> Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file:
>> /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
>>
>> Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid:
>> 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
>> Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
>> Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
>> Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00
>> 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24
>> 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56
>> 89 c6 b8 80 b0 52 c1 83 e6 07 53
>> Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>]
>> assfail+0x1b/0x20 SS:ESP 0068:f687bf14
>>
>> Is this a bug related to AMD (CPU) or XFS?
>
> Hard to know without seeing the rest of the stack trace, is this all you got?
Yes, that was the only entry in the log file; I could not wake up the
monitor to get any more information.

>
> You hit an assertion failure in xfs.
>
> Are you running with CONFIG_XFS_DEBUG?
Nope.

CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_DEBUG is not set

>
> -Eric
>



* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
  2010-03-15 14:44   ` Justin Piszcz
@ 2010-03-15 14:46     ` Eric Sandeen
  2010-03-15 15:11       ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2010-03-15 14:46 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, xfs

Justin Piszcz wrote:
> 
> 
> On Mon, 15 Mar 2010, Eric Sandeen wrote:
> 
>> Justin Piszcz wrote:
>>> Specifications:
>>>
>>> 2.6.33
>>> 32bit
>>> Debian Testing
>>>
>>> On Mar 14th, my server hung up:
>>>
>>> Mar 14 00:00:22 server1 kernel: [488470.189675] ------------[ cut here
>>> ]------------
>>> Mar 14 00:00:22 server1 kernel: [488470.189679] invalid opcode: 0000
>>> [#1] SMP
>>> Mar 14 00:00:22 server1 kernel: [488470.189681] last sysfs file:
>>> /sys/devices/pci0000:00/0000:00:0e.0/host0/target0:0:0/0:0:0:0/block/sda/uevent
>>>
>>>
>>> Mar 14 00:00:22 server1 kernel: [488470.189704] Process xfssyncd (pid:
>>> 584, ti=f687a000 task=f70fa400 task.ti=f687a000)
>>> Mar 14 00:00:22 server1 kernel: [488470.189705] Stack:
>>> Mar 14 00:00:22 server1 kernel: [488470.189721] Call Trace:
>>> Mar 14 00:00:22 server1 kernel: [488470.189739] Code: 00 e8 ea e5 01 00
>>> 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89 54 24 08 89 44 24
>>> 04 c7 04 24 30 ab 42 c1 e8 7c 11 1c 00 <0f> 0b eb fe 90 55 57 89 cf 56
>>> 89 c6 b8 80 b0 52 c1 83 e6 07 53
>>> Mar 14 00:00:22 server1 kernel: [488470.189739] EIP: [<c11a018b>]
>>> assfail+0x1b/0x20 SS:ESP 0068:f687bf14
>>>
>>> Is this a bug related to AMD (CPU) or XFS?
>>
>> Hard to know without seeing the rest of the stack trace, is this all
>> you got?
> Yes, that was the only entry in the log file, I could not wake up the
> monitor to get any more information.
> 
>>
>> You hit an assertion failure in xfs.
>>
>> Are you running with CONFIG_XFS_DEBUG?
> Nope.
> 
> CONFIG_XFS_FS=y
> # CONFIG_XFS_QUOTA is not set
> # CONFIG_XFS_POSIX_ACL is not set
> # CONFIG_XFS_RT is not set
> # CONFIG_XFS_DEBUG is not set

Ok, then you hit an ASSERT_ALWAYS

There are only a few:

0 xfs/linux-2.6/xfs_super.c xfs_fs_destroy_inode  953 ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE));
1 xfs/linux-2.6/xfs_super.c xfs_fs_destroy_inode  954 ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
2 xfs/linux-2.6/xfs_sync.c  xfs_reclaim_inode     727 ASSERT_ALWAYS(__xfs_iflags_test(ip, XFS_IRECLAIMABLE));
3 fs/xfs/xfs_log.c          xfs_log_notify        376 ASSERT_ALWAYS((iclog->ic_state == XLOG_STATE_ACTIVE) ||
4 fs/xfs/xfs_log.c          xlog_commit_record   1273 ASSERT_ALWAYS(iclog);

but I'm not sure which one you hit since there is no backtrace provided.
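
The distinction above can be sketched in user space. This is a simplified model, not the kernel's definitions: all DEMO_/demo_ names are hypothetical, DEMO_DEBUG stands in for CONFIG_XFS_DEBUG, and the failure handler only counts and prints where the real assfail() ends in BUG().

```c
#include <stdio.h>

/* Counts failed checks; the kernel's assfail() would end in BUG() instead. */
static int demo_failures;

static void demo_assfail(const char *expr, const char *file, int line)
{
	fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n",
		expr, file, line);
	demo_failures++;
}

/* Always compiled in, like ASSERT_ALWAYS. */
#define DEMO_ASSERT_ALWAYS(expr) \
	((expr) ? (void)0 : demo_assfail(#expr, __FILE__, __LINE__))

/* Compiled in only for debug builds, like ASSERT with CONFIG_XFS_DEBUG. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT(expr) DEMO_ASSERT_ALWAYS(expr)
#else
#define DEMO_ASSERT(expr) ((void)0)
#endif

/* Run one failing check of each kind and report how many fired. */
int demo_count_failures(void)
{
	int reclaimable = 0;

	demo_failures = 0;
	DEMO_ASSERT(reclaimable);        /* no-op unless DEMO_DEBUG is defined */
	DEMO_ASSERT_ALWAYS(reclaimable); /* fires in every build */
	return demo_failures;
}
```

Built without DEMO_DEBUG, only the always-on check fires, which is why a non-debug kernel can still die in assfail.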

-Eric


* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
  2010-03-15 14:46     ` Eric Sandeen
@ 2010-03-15 15:11       ` Christoph Hellwig
  2010-03-15 15:21         ` Justin Piszcz
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2010-03-15 15:11 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs, Justin Piszcz, linux-kernel

On Mon, Mar 15, 2010 at 09:46:59AM -0500, Eric Sandeen wrote:
> Ok, then you hit an ASSERT_ALWAYS
> 
> There are only a few:

If the box is an NFS server, I suspect it's one of the reclaimable-state
ones, and the patch below should help:

---

From: Christoph Hellwig <hch@lst.de>
Subject: xfs: fix locking for inode cache radix tree tag updates

The radix-tree code requires its users to serialize tag updates against
other updates to the tree.  While XFS protects tag updates against each
other it does not serialize them against updates of the tree contents,
which can lead to tag corruption.  Fix the inode cache to always take
pag_ici_lock in exclusive mode when updating radix tree tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2010-02-10 13:08:41.000000000 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2010-02-10 15:53:28.739570272 +0100
@@ -687,12 +687,12 @@ xfs_inode_set_reclaim_tag(
 	struct xfs_perag *pag;
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	read_lock(&pag->pag_ici_lock);
+	write_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
-	read_unlock(&pag->pag_ici_lock);
+	write_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
 }
 
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c	2010-02-04 17:28:35.000000000 +0100
+++ xfs/fs/xfs/xfs_iget.c	2010-02-10 15:53:55.504284758 +0100
@@ -190,13 +190,12 @@ xfs_iget_cache_hit(
 		trace_xfs_iget_reclaim(ip);
 
 		/*
-		 * We need to set XFS_INEW atomically with clearing the
-		 * reclaimable tag so that we do have an indicator of the
-		 * inode still being initialized.
+		 * We need to set XFS_IRECLAIM to prevent xfs_reclaim_inode
+		 * from stomping over us while we recycle the inode.  We can't
+		 * clear the radix tree reclaimable tag yet as it requires
+		 * pag_ici_lock to be held exclusive.
 		 */
-		ip->i_flags |= XFS_INEW;
-		ip->i_flags &= ~XFS_IRECLAIMABLE;
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+		ip->i_flags |= XFS_IRECLAIM;
 
 		spin_unlock(&ip->i_flags_lock);
 		read_unlock(&pag->pag_ici_lock);
@@ -216,7 +215,15 @@ xfs_iget_cache_hit(
 			trace_xfs_iget_reclaim(ip);
 			goto out_error;
 		}
+
+		write_lock(&pag->pag_ici_lock);
+		spin_lock(&ip->i_flags_lock);
+		ip->i_flags &= ~(XFS_IRECLAIMABLE | XFS_IRECLAIM);
+		ip->i_flags |= XFS_INEW;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 		inode->i_state = I_NEW;
+		spin_unlock(&ip->i_flags_lock);
+		write_unlock(&pag->pag_ici_lock);
 	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
 		if (!igrab(inode)) {
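
The two-step pattern in the xfs_iget_cache_hit hunks can be modelled outside the kernel. The sketch below is single-threaded and only tracks which lock mode the caller claims to hold, so it illustrates the rule (tag updates only under the exclusive lock) rather than real locking. All demo_ and DEMO_ names are hypothetical, and the "skip if already claimed" test in step 1 is an assumption about the surrounding code, not something shown in the patch.

```c
#include <stdbool.h>

enum demo_lock_mode { DEMO_UNLOCKED, DEMO_SHARED, DEMO_EXCLUSIVE };

#define DEMO_RECLAIMABLE 0x1u /* item is waiting to be reclaimed */
#define DEMO_RECLAIM     0x2u /* item is claimed; reclaim must skip it */
#define DEMO_NEW         0x4u /* item is being reinitialised */

struct demo_cache {
	enum demo_lock_mode mode; /* how the caller currently holds the lock */
	bool reclaim_tag;         /* stand-in for the tree's reclaim tag */
};

struct demo_item {
	unsigned int flags;
};

/* Tag updates are refused unless the lock is held exclusively. */
static bool demo_clear_tag(struct demo_cache *c)
{
	if (c->mode != DEMO_EXCLUSIVE)
		return false;
	c->reclaim_tag = false;
	return true;
}

/* Step 1, under the shared lock: claim the item, leave the tag alone. */
bool demo_claim(struct demo_cache *c, struct demo_item *it)
{
	bool ok;

	c->mode = DEMO_SHARED;
	ok = (it->flags & DEMO_RECLAIMABLE) && !(it->flags & DEMO_RECLAIM);
	if (ok)
		it->flags |= DEMO_RECLAIM;
	c->mode = DEMO_UNLOCKED;
	return ok;
}

/* Step 2, under the exclusive lock: clear the flags and the tag together. */
bool demo_finish_recycle(struct demo_cache *c, struct demo_item *it)
{
	bool ok;

	c->mode = DEMO_EXCLUSIVE;
	it->flags &= ~(DEMO_RECLAIMABLE | DEMO_RECLAIM);
	it->flags |= DEMO_NEW;
	ok = demo_clear_tag(c);
	c->mode = DEMO_UNLOCKED;
	return ok;
}
```

Calling demo_claim() twice on the same item succeeds only the first time, and the tag stays set until demo_finish_recycle() runs in exclusive mode.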


* Re: 2.6.33 crash: invalid opcode: 0000 [#1] SMP: EIP: [<c11a018b>] assfail+0x1b/0x20 SS:ESP 0068:f687bf14
  2010-03-15 15:11       ` Christoph Hellwig
@ 2010-03-15 15:21         ` Justin Piszcz
  0 siblings, 0 replies; 6+ messages in thread
From: Justin Piszcz @ 2010-03-15 15:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, linux-kernel, xfs

On Mon, 15 Mar 2010, Christoph Hellwig wrote:

> On Mon, Mar 15, 2010 at 09:46:59AM -0500, Eric Sandeen wrote:
>> Ok, then you hit an ASSERT_ALWAYS
>>
>> There are only a few:
>
> If the box is a nfs server I suspect it's a reclaimable state one and
> the patch below should help:

Hello,

The host is not actively running an NFS server; the option may be
compiled in, but it is not acting as an NFS server and has no shares
exported. I need stability for this specific host and cannot run
netconsole due to restrictions, so I will switch to ext4: if the problem
recurs, all I/O to the disk stops and I will not be able to help by
reporting the bug with the full output. If I could reproduce the bug in
another environment, I would be able to troubleshoot it further. This bug
is very similar to one I reported in the past (that host was an active
NFS server):
http://lists.openwall.net/linux-kernel/2009/10/19/118

Justin.
