public inbox for linux-xfs@vger.kernel.org
* XFS internal error when NFS client accesses nonexistent inode
@ 2008-12-31 14:09 Mario Becroft
  2009-01-01 17:14 ` Christoph Hellwig
  2009-01-01 17:17 ` Feature requests, was " Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Mario Becroft @ 2008-12-31 14:09 UTC
  To: xfs

I hit a seemingly strange problem today when I xfsdump/restored some
filesystems from one volume onto another. When I exported the new
volumes, errors like the following started to occur:

Dec 31 09:12:46 nfs1 kernel: nfsd: non-standard errno: -117

Bumping up the XFS debug level revealed the following (full details at
the end):

Dec 31 09:12:46 nfs1 kernel: Filesystem "dm-17": XFS internal error xfs_imap_to_bp at line 186 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff80374c48

xfs_check reported no error in the filesystem.

After wasting quite a lot of time, I finally realised that this was
probably caused by NFS clients accessing nonexistent file handles that
they had open from when the filesystem was previously exported, prior to
the dump/restore.

Is my analysis correct? Is an internal error the expected behaviour in
this case? And can this cause any harm?

While I am writing, two things I wish XFS could do, and two more that
would be jolly nice to have:

1. shrink filesystems
2. dump/restore preserving inode numbers
3. high-performance dump with multi-threaded reading to fully utilise
disk throughput
4. on-line xfs_check/repair

I wonder if anyone is working on those?

Full log output follows:

--8<---------------cut here---------------start------------->8---
Dec 31 09:12:46 nfs1 kernel: 00000000: f5 83 23 77 26 c5 70 43 bb bd 3d 44 c9 63 e2 b1  ..#w&.pC..=D.c..
Dec 31 09:12:46 nfs1 kernel: Filesystem "dm-17": XFS internal error xfs_imap_to_bp at line 186 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff80374c48
Dec 31 09:12:46 nfs1 kernel: Pid: 4969, comm: nfsd Not tainted 2.6.27.4 #3
Dec 31 09:12:46 nfs1 kernel:
Dec 31 09:12:46 nfs1 kernel: Call Trace:
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374c48>] xfs_itobp+0xa0/0xe7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374b82>] xfs_imap_to_bp+0xd6/0xfc
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374c48>] xfs_itobp+0xa0/0xe7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8037439f>] xfs_imap+0x6a/0x135
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374c48>] xfs_itobp+0xa0/0xe7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80376eec>] xfs_iread+0x79/0x1ed
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80372687>] xfs_iget_core+0x2ea/0x54d
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff803729cc>] xfs_iget+0xe2/0x18a
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80390e1d>] xfs_nfs_get_inode+0x39/0x88
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80390f80>] xfs_fs_fh_to_dentry+0x64/0x97
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa031f53c>] exportfs_decode_fh+0x30/0x1dc [exportfs]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0341449>] nfsd_acceptable+0x0/0xc7 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa03456a7>] exp_find_key+0x96/0xa9 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80228c88>] place_entity+0x9e/0xc7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8022b04d>] enqueue_task_fair+0x17e/0x1a7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa034571d>] exp_find+0x63/0x6f [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0340cf9>] fh_verify+0x278/0x546 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff802285c3>] __wake_up_common+0x41/0x74
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0341a0e>] nfsd_access+0x29/0xff [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa03495f4>] nfsd3_proc_access+0x99/0xa6 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa033e23c>] nfsd_dispatch+0xde/0x1c2 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0202371>] svc_process+0x408/0x6eb [sunrpc]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80531407>] __down_read+0x12/0x8b
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa033e874>] nfsd+0x1ae/0x27a [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa033e6c6>] nfsd+0x0/0x27a [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff802463a6>] kthread+0x47/0x75
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8020cd59>] child_rip+0xa/0x11
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8024635f>] kthread+0x0/0x75
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8020cd4f>] child_rip+0x0/0x11
Dec 31 09:12:46 nfs1 kernel:
Dec 31 09:12:46 nfs1 kernel: nfsd: non-standard errno: -117
Dec 31 09:12:46 nfs1 kernel: 00000000: f5 83 23 77 26 c5 70 43 bb bd 3d 44 c9 63 e2 b1  ..#w&.pC..=D.c..
Dec 31 09:12:46 nfs1 kernel: Filesystem "dm-17": XFS internal error xfs_imap_to_bp at line 186 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff80374c48
Dec 31 09:12:46 nfs1 kernel: Pid: 4969, comm: nfsd Not tainted 2.6.27.4 #3
Dec 31 09:12:46 nfs1 kernel:
Dec 31 09:12:46 nfs1 kernel: Call Trace:
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374c48>] xfs_itobp+0xa0/0xe7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374b82>] xfs_imap_to_bp+0xd6/0xfc
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374c48>] xfs_itobp+0xa0/0xe7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8037439f>] xfs_imap+0x6a/0x135
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80374c48>] xfs_itobp+0xa0/0xe7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80376eec>] xfs_iread+0x79/0x1ed
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80372687>] xfs_iget_core+0x2ea/0x54d
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff803729cc>] xfs_iget+0xe2/0x18a
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80390e1d>] xfs_nfs_get_inode+0x39/0x88
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80390f80>] xfs_fs_fh_to_dentry+0x64/0x97
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa031f53c>] exportfs_decode_fh+0x30/0x1dc [exportfs]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0341449>] nfsd_acceptable+0x0/0xc7 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa03456a7>] exp_find_key+0x96/0xa9 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80228c88>] place_entity+0x9e/0xc7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8022b04d>] enqueue_task_fair+0x17e/0x1a7
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa034571d>] exp_find+0x63/0x6f [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0340cf9>] fh_verify+0x278/0x546 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff802285c3>] __wake_up_common+0x41/0x74
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0341a0e>] nfsd_access+0x29/0xff [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa03495f4>] nfsd3_proc_access+0x99/0xa6 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa033e23c>] nfsd_dispatch+0xde/0x1c2 [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa0202371>] svc_process+0x408/0x6eb [sunrpc]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff80531407>] __down_read+0x12/0x8b
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa033e874>] nfsd+0x1ae/0x27a [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffffa033e6c6>] nfsd+0x0/0x27a [nfsd]
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff802463a6>] kthread+0x47/0x75
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8020cd59>] child_rip+0xa/0x11
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8024635f>] kthread+0x0/0x75
Dec 31 09:12:46 nfs1 kernel:  [<ffffffff8020cd4f>] child_rip+0x0/0x11
Dec 31 09:12:46 nfs1 kernel:
Dec 31 09:12:46 nfs1 kernel: nfsd: non-standard errno: -117
--8<---------------cut here---------------end--------------->8---

-- 
Mario Becroft <mb@gem.win.co.nz>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: XFS internal error when NFS client accesses nonexistent inode
  2008-12-31 14:09 XFS internal error when NFS client accesses nonexistent inode Mario Becroft
@ 2009-01-01 17:14 ` Christoph Hellwig
  2009-01-01 17:37   ` Christoph Hellwig
  2009-01-01 19:00   ` Christoph Hellwig
  2009-01-01 17:17 ` Feature requests, was " Christoph Hellwig
  1 sibling, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2009-01-01 17:14 UTC
  To: Mario Becroft; +Cc: xfs

On Thu, Jan 01, 2009 at 03:09:08AM +1300, Mario Becroft wrote:
> I hit a seemingly strange problem today when I xfsdump/restored some
> filesystems from one volume onto another. When I exported the new
> volumes, errors like the following started to occur:
> 
> Dec 31 09:12:46 nfs1 kernel: nfsd: non-standard errno: -117
> 
> Bumping up the XFS debug level revealed the following (full details at
> the end):
> 
> Dec 31 09:12:46 nfs1 kernel: Filesystem "dm-17": XFS internal error xfs_imap_to_bp at line 186 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff80374c48

This is:


		di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC &&
			    XFS_DINODE_GOOD_VERSION(dip->di_core.di_version);
		if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
						XFS_ERRTAG_ITOBP_INOTOBP,
						XFS_RANDOM_ITOBP_INOTOBP))) {
			if (imap_flags & XFS_IMAP_BULKSTAT) {
				xfs_trans_brelse(tp, bp);
				return XFS_ERROR(EINVAL);
			}
 here -->		XFS_CORRUPTION_ERROR("xfs_imap_to_bp",
						XFS_ERRLEVEL_HIGH, mp, dip);

> After wasting quite a lot of time, I finally realised that this was
> probably caused by NFS clients accessing nonexistent file handles that
> they had open from when the filesystem was previously exported, prior to
> the dump/restore.
>
> Is my analysis correct? Is an internal error the expected behaviour in
> this case? And can this cause any harm?

That explanation makes a lot of sense.  As seen in the snippet above,
we actually have checks for bulkstat, which might hand in invalid inode
numbers, and I think we need to extend this check to NFS export and
the handle ioctls too, as we can get arbitrary inode numbers passed
from a client / user space.  In addition, we should probably translate
the error number into something more useful.
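
For illustration, here is a minimal userspace sketch of what such an
error translation might look like. It is not the patch from this thread.
The helper name nfs_lookup_errno is hypothetical, and the sketch assumes
that the -117 in the nfsd log is EUCLEAN and that a file handle pointing
at a nonexistent inode is best reported to NFS clients as ESTALE.

```c
#include <errno.h>

/* EUCLEAN is Linux-specific; 117 matches the value in the nfsd log line. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/*
 * Hypothetical helper: map a positive errno from an inode lookup to the
 * errno an NFS export path would hand back for a file handle.
 */
static int nfs_lookup_errno(int lookup_error)
{
	switch (lookup_error) {
	case EINVAL:	/* returned by the bulkstat-style check for a bad inode */
	case EUCLEAN:	/* corruption error raised when the inode looks invalid */
		return ESTALE;
	default:	/* anything else, including 0, passes through unchanged */
		return lookup_error;
	}
}
```

Calling nfs_lookup_errno(EINVAL) yields ESTALE, while an unrelated error
such as EIO is returned as-is. Mapping EUCLEAN unconditionally could hide
genuine corruption, so a real fix would more likely avoid raising it on
this path in the first place.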

I will create a testcase using the handle ioctls for this and provide
a fix to handle this issue more gracefully.

Except for shutting down a perfectly fine filesystem this should
not cause additional damage.

* Feature requests, was Re: XFS internal error when NFS client accesses nonexistent inode
  2008-12-31 14:09 XFS internal error when NFS client accesses nonexistent inode Mario Becroft
  2009-01-01 17:14 ` Christoph Hellwig
@ 2009-01-01 17:17 ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2009-01-01 17:17 UTC
  To: Mario Becroft; +Cc: xfs

On Thu, Jan 01, 2009 at 03:09:08AM +1300, Mario Becroft wrote:
> While I am writing, two things I wish XFS could do, and two more that
> would be jolly nice to have:
> 
> 1. shrink filesystems

http://xfs.org/index.php/Shrinking_Support

> 2. dump/restore preserving inode numbers

That's very hard to do, given that inode numbers encode the location
on disk.  To support your NFS exporting scenario you would also have
to preserve the generation number, which also forms part of the nfs file
handle.
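
As a rough sketch of why this is hard: an XFS inode number can be read
as three bit fields (allocation group, block within the group, inode
slot within the block), so the number changes whenever the inode lands
somewhere else on disk. The struct and function names below are
hypothetical; the field widths are named after the superblock fields
sb_agblklog and sb_inopblog, and the geometry values in the usage note
are made up.

```c
#include <stdint.h>

/* Bit widths, named after the superblock fields sb_agblklog/sb_inopblog. */
struct ino_geometry {
	unsigned agblklog;	/* log2 of blocks per allocation group, rounded up */
	unsigned inopblog;	/* log2 of inodes per filesystem block */
};

/* Where an inode number points on disk. */
struct ino_location {
	uint32_t agno;		/* allocation group number */
	uint32_t agbno;		/* block number within that group */
	uint32_t offset;	/* inode slot within that block */
};

/* Mask with the low n bits set (n is well below 64 here). */
static uint64_t low_bits(unsigned n)
{
	return ((uint64_t)1 << n) - 1;
}

/* Split an inode number into its on-disk location. */
static struct ino_location decode_ino(uint64_t ino, struct ino_geometry g)
{
	struct ino_location loc;

	loc.offset = (uint32_t)(ino & low_bits(g.inopblog));
	loc.agbno = (uint32_t)((ino >> g.inopblog) & low_bits(g.agblklog));
	loc.agno = (uint32_t)(ino >> (g.inopblog + g.agblklog));
	return loc;
}

/* A handle only stays valid if both inode and generation still match. */
static int handle_is_current(uint64_t fh_ino, uint32_t fh_gen,
			     uint64_t ino, uint32_t gen)
{
	return fh_ino == ino && fh_gen == gen;
}
```

With agblklog = 16 and inopblog = 4, inode number 2097235 decodes to
allocation group 2, block 5, slot 3. A restore that places the same file
in a different block yields a different inode number, and even with the
same number a different generation makes handle_is_current() return 0.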

> 3. high-performance dump with multi-threaded reading to fully utilise
> disk throughput

Didn't xfsdump in IRIX have some sort of multi-stream support?  Bill, do
you remember anything like that?

> 4. on-line xfs_check/repair

Well, you can check online, it's just not going to give good results.
Using snapshots you can easily check online, and with a little hack
even repair, but you'd still have to reboot to then use the repaired
filesystem.

* Re: XFS internal error when NFS client accesses nonexistent inode
  2009-01-01 17:14 ` Christoph Hellwig
@ 2009-01-01 17:37   ` Christoph Hellwig
  2009-01-01 19:00   ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2009-01-01 17:37 UTC
  To: Mario Becroft; +Cc: xfs

This should cure your shutdowns on a 2.6.27-ish codebase:



Index: btrfs-unstable/fs/xfs/linux-2.6/xfs_export.c
===================================================================
--- btrfs-unstable.orig/fs/xfs/linux-2.6/xfs_export.c	2009-01-01 18:34:39.868671500 +0100
+++ btrfs-unstable/fs/xfs/linux-2.6/xfs_export.c	2009-01-01 18:35:37.224782654 +0100
@@ -127,8 +127,8 @@ xfs_nfs_get_inode(
 	if (ino == 0)
 		return ERR_PTR(-ESTALE);
 
-	error = xfs_iget(mp, NULL, ino, 0, XFS_ILOCK_SHARED, &ip, 0);
+	error = xfs_iget(mp, NULL, ino, XFS_IGET_BULKSTAT, XFS_ILOCK_SHARED, &ip, 0);
 	if (error)
 		return ERR_PTR(-error);
 	if (!ip)
 		return ERR_PTR(-EIO);

* Re: XFS internal error when NFS client accesses nonexistent inode
  2009-01-01 17:14 ` Christoph Hellwig
  2009-01-01 17:37   ` Christoph Hellwig
@ 2009-01-01 19:00   ` Christoph Hellwig
  2009-01-01 23:15     ` Mario Becroft
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2009-01-01 19:00 UTC
  To: Mario Becroft; +Cc: xfs

Btw, did you update /proc/sys/fs/xfs/error_level manually?  The corruption
test only triggers from a value of 5, but 3 is the default.
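
A small sketch of that threshold, assuming the report is emitted only
when its level is at or below the configured error level. The names are
hypothetical; the values 5 and 3 are the ones quoted above.

```c
/* Level of the xfs_imap_to_bp corruption report, per the value quoted above. */
#define REPORT_LEVEL_HIGH	5
/* Default setting of /proc/sys/fs/xfs/error_level, per the value quoted above. */
#define ERROR_LEVEL_DEFAULT	3

/*
 * Hypothetical helper: returns 1 if a report at report_level would be
 * logged with the given error_level setting, 0 otherwise.
 */
static int report_is_logged(int report_level, int error_level)
{
	return report_level <= error_level;
}
```

Under this assumption, report_is_logged(REPORT_LEVEL_HIGH,
ERROR_LEVEL_DEFAULT) is 0, so the stack trace stays out of the log by
default, and any setting of 5 or higher makes it appear.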

* Re: XFS internal error when NFS client accesses nonexistent inode
  2009-01-01 19:00   ` Christoph Hellwig
@ 2009-01-01 23:15     ` Mario Becroft
  2009-01-01 23:20       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Mario Becroft @ 2009-01-01 23:15 UTC
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig <hch@infradead.org> writes:

> Btw, did you update /proc/sys/fs/xfs/error_level manually?  The corruption
> test only triggers from a value of 5, but 3 is the default.

I was getting:

Dec 31 09:12:46 nfs1 kernel: nfsd: non-standard errno: -117

and in trying to figure out what it meant, I bumped up the XFS debug
level to 6, which enabled me to see the errors from XFS. Maybe I should
have just left it alone?

I should have pointed out that when this happened, the filesystem did
not actually shut down. So it did not cause any real problems. Should it
have been shutting down?

I was mainly just worried that depending on what data it happened to hit
when accessing the nonexistent inode, it might screw things up. If I do
encounter any shutdowns, I will apply the patch you sent through. Thanks
for the ultra-fast response.

I realise preserving inode/generation numbers on dump/restore is
probably hard and never going to happen. None of the other Linux
filesystems I have looked at do it either. It would be very, very nice
though... This is a feature I have wanted for ages.

-- 
Mario Becroft <mb@gem.win.co.nz>

* Re: XFS internal error when NFS client accesses nonexistent inode
  2009-01-01 23:15     ` Mario Becroft
@ 2009-01-01 23:20       ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2009-01-01 23:20 UTC
  To: Mario Becroft; +Cc: Christoph Hellwig, xfs

On Fri, Jan 02, 2009 at 12:15:46PM +1300, Mario Becroft wrote:
> Christoph Hellwig <hch@infradead.org> writes:
> 
> > Btw, did you update /proc/sys/fs/xfs/error_level manually?  The corruption
> > test only triggers from a value of 5, but 3 is the default.
> 
> I was getting:
> 
> Dec 31 09:12:46 nfs1 kernel: nfsd: non-standard errno: -117
> 
> and in trying to figure out what it meant, I bumped up the XFS debug
> level to 6, which enabled me to see the errors from XFS. Maybe I should
> have just left it alone?
>
> I should have pointed out that when this happened, the filesystem did
> not actually shut down. So it did not cause any real problems. Should it
> have been shutting down?
> 
> I was mainly just worried that depending on what data it happened to hit

Looking at the code again, there indeed aren't shutdowns, just
stack traces.  So yes, the stack traces are caused by the higher error
level.  With debug kernels it's still a kernel crash, though no one
should run debug kernels on their production machines.

> when accessing the nonexistent inode, it might screw things up. If I do
> encounter any shutdowns, I will apply the patch you sent through. Thanks
> for the ultra-fast response.

Please try the second patch which I cc'ed you on as it gives back
the correct error code to the nfs clients.