2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117

public inbox for linux-xfs@vger.kernel.org

* 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117
@ 2008-03-17 23:27 Stuart Rowan
  2008-03-18  0:28 ` Timothy Shimmin
  0 siblings, 1 reply; 6+ messages in thread
From: Stuart Rowan @ 2008-03-17 23:27 UTC (permalink / raw)
  To: xfs

Hi,

Firstly thanks for the great filesystem and apologies if this ends up being 
NFS rather than XFS being weird! I'm not subscribed so please do keep me CC'd.

My kernel log (dmesg) contains *millions* of copies of the following line 
(>200k per minute according to syslog):
nfsd: non-standard errno: -117

Now errno 117 is
#define EUCLEAN         117     /* Structure needs cleaning */
which, from a quick grep, seems to be used only by XFS, JFFS and smbfs.
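
For anyone wanting to confirm the mapping from user space, here is a minimal Python sketch. The helper name describe_errno is made up for illustration, and the result depends on the platform's errno table; on a Linux system 117 is expected to resolve to EUCLEAN.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a kernel-style errno such as -117."""
    number = abs(code)  # kernel code hands back negated errno values
    # errno.errorcode only knows the names defined on the running platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    print(describe_errno(-117))
```

The value is passed negated because that is how it appears in the nfsd message.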

My NFS server exports two locations
/home
/home/archive
both of these are XFS partitions, hence my suspicion that the -117 is 
coming from XFS.

xfs_repair -n says the filesystems are clean
xfs_repair has been run multiple times to completion on the filesystems, 
all is fine.

The XFS partitions are lvm volumes as follows
data/home 900G
data/archive 400G
The volume group, data, is sda3
sda3 is a 6 drive 3ware 9550SXU-8LP RAID10 array

The NFS server is currently in use (indeed the message only starts once 
clients connect) and works absolutely fine.

How do I find out what (if anything) is wrong with my filesystem / 
appropriately silence this message?

Many thanks,
Stu.


* Re: 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117
  2008-03-17 23:27 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117 Stuart Rowan
@ 2008-03-18  0:28 ` Timothy Shimmin
  2008-03-18  1:07   ` Stuart Rowan
  2008-03-18 13:49   ` Stuart Rowan
  0 siblings, 2 replies; 6+ messages in thread
From: Timothy Shimmin @ 2008-03-18  0:28 UTC (permalink / raw)
  To: strr-debian; +Cc: xfs

Hi Stuart,

Stuart Rowan wrote:
> Hi,
> 
> Firstly thanks for the great filesystem and apologies if this ends up 
> being NFS rather than XFS being weird! I'm not subscribed so please do 
> keep me CC'd.
> 
> I have *millions* of lines of (>200k per minute according to syslog):
> nfsd: non-standard errno: -117
> being sent out of dmesg
> 
> Now errno 117 is
> #define EUCLEAN         117     /* Structure needs cleaning */
> which seems to be only used from a quick grep by XFS and JFFS and smbfs.
> 
> 
In XFS we mapped EFSCORRUPTED to EUCLEAN as EFSCORRUPTED
didn't exist on Linux.
However, normally if this error is encountered in XFS then
we output an appropriate msg to the syslog.
Our default error level is 3 and most reports are rated at 1
so should show up I would have thought.

--Tim
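
A minimal sketch of the gating described above, in Python for illustration only. The numeric levels (LOW = 1, HIGH = 5) are recalled from the kernel's XFS headers and should be treated as assumptions, as should the function name; the rule modelled is simply that a report is printed when its level does not exceed fs.xfs.error_level.

```python
# Report levels as recalled from the XFS kernel headers (assumption).
XFS_ERRLEVEL_OFF = 0
XFS_ERRLEVEL_LOW = 1
XFS_ERRLEVEL_HIGH = 5


def report_is_logged(report_level, error_level=3):
    """Model of the gate: a report reaches syslog only if its level
    does not exceed the configured fs.xfs.error_level (default 3)."""
    return report_level <= error_level


if __name__ == "__main__":
    for configured in (3, 6):
        print(configured,
              report_is_logged(XFS_ERRLEVEL_LOW, configured),
              report_is_logged(XFS_ERRLEVEL_HIGH, configured))
```

Under these assumptions a level-1 report shows up at the default setting, while a report rated higher than 3 stays silent until the sysctl is raised.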

> My NFS server exports two locations
> /home
> /home/archive
> both of these are XFS partitions, hence my suspicion that the -117 is 
> coming from XFS.
> 
> xfs_repair -n says the filesystems are clean
> xfs_repair has been run multiple times to completion on the filesystems, 
> all is fine.
> 
> The XFS partitions are lvm volumes as follows
> data/home 900G
> data/archive 400G
> The volume group, data, is sda3
> sda3 is a 6 drive 3ware 9550SXU-8LP RAID10 array
> 
> The NFS server is currently in use (indeed the message only starts once 
> clients connect) and works absolutely fine.
> 
> How do I find out what (if anything) is wrong with my filesystem / 
> appropriately silence this message?
> 
> Many thanks,
> Stu.
> 


* Re: 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117
  2008-03-18  0:28 ` Timothy Shimmin
@ 2008-03-18  1:07   ` Stuart Rowan
  2008-03-18 13:49   ` Stuart Rowan
  1 sibling, 0 replies; 6+ messages in thread
From: Stuart Rowan @ 2008-03-18  1:07 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: xfs

Hi Tim,

Timothy Shimmin wrote:
> Hi Stuart,
> 
> Stuart Rowan wrote:
>> Hi,
>>
>> Firstly thanks for the great filesystem and apologies if this ends up 
>> being NFS rather than XFS being weird! I'm not subscribed so please do 
>> keep me CC'd.
>>
>> I have *millions* of lines of (>200k per minute according to syslog):
>> nfsd: non-standard errno: -117
>> being sent out of dmesg
>>
>> Now errno 117 is
>> #define EUCLEAN         117     /* Structure needs cleaning */
>> which seems to be only used from a quick grep by XFS and JFFS and smbfs.
>>
>>
> In XFS we mapped EFSCORRUPTED to EUCLEAN as EFSCORRUPTED
> didn't exist on Linux.
> However, normally if this error is encountered in XFS then
> we output an appropriate msg to the syslog.
> Our default error level is 3 and most reports are rated at 1
> so should show up I would have thought.
> 
> --Tim
>

Thanks for the swift reply -- reading previous mailing list posts, that 
was my expectation too!
I've attached the relevant section of /var/log/kern.log inline below -- 
there is no error message! Is there another way of inspecting what (if 
anything) a given XFS file-system thinks is wrong with it.

Thanks,
Stu.

Mar 17 23:01:50 evenlode kernel: SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Mar 17 23:01:50 evenlode kernel: SGI XFS Quota Management subsystem
Mar 17 23:01:50 evenlode kernel: Filesystem "dm-0": Disabling barriers, not supported by the underlying device
Mar 17 23:01:50 evenlode kernel: XFS mounting filesystem dm-0
Mar 17 23:01:50 evenlode kernel: Ending clean XFS mount for filesystem: dm-0
Mar 17 23:01:50 evenlode kernel: Filesystem "dm-1": Disabling barriers, not supported by the underlying device
Mar 17 23:01:50 evenlode kernel: XFS mounting filesystem dm-1
Mar 17 23:01:50 evenlode kernel: Ending clean XFS mount for filesystem: dm-1
Mar 17 23:01:50 evenlode kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Mar 17 23:01:50 evenlode kernel: RPC: Registered udp transport module.
Mar 17 23:01:50 evenlode kernel: RPC: Registered tcp transport module.
Mar 17 23:01:50 evenlode kernel: NET: Registered protocol family 10
Mar 17 23:01:50 evenlode kernel: lo: Disabled Privacy Extensions
Mar 17 23:01:50 evenlode kernel: IA-32 Microcode Update Driver: v1.14a <tigran@aivazian.fsnet.co.uk>
Mar 17 23:01:51 evenlode kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Mar 17 23:01:51 evenlode kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Mar 17 23:01:51 evenlode kernel: NFSD: starting 90-second grace period
Mar 17 23:01:57 evenlode kernel: eth0: no IPv6 routers present
Mar 17 23:03:04 evenlode kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000C): Initialize started:unit=0, subunit=0.
Mar 17 23:03:04 evenlode kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000C): Initialize started:unit=0, subunit=2.
Mar 17 23:04:40 evenlode kernel: nfsd: non-standard errno: -117
Mar 17 23:05:11 evenlode last message repeated 93970 times
Mar 17 23:06:12 evenlode last message repeated 188363 times
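
The message rate can be estimated from the "last message repeated N times" lines. The following Python sketch assumes the classic syslog line shape seen in this log; the function name repeat_rates and the year argument are illustrative, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")
REPEAT = re.compile(r"last message repeated (\d+) times")


def repeat_rates(lines, year=2008):
    """Return (timestamp, messages per second) for each 'repeated N times' line.

    Each rate is the repeat count divided by the seconds elapsed since the
    previous timestamped line, so it is an average over that interval.
    """
    rates = []
    previous = None
    for line in lines:
        stamp = STAMP.match(line)
        if not stamp:
            continue
        # syslog omits the year, so the caller supplies it.
        now = datetime.strptime(f"{year} {stamp.group(1)}", "%Y %b %d %H:%M:%S")
        repeat = REPEAT.search(line)
        if repeat and previous is not None:
            seconds = (now - previous).total_seconds()
            if seconds > 0:
                rates.append((now, int(repeat.group(1)) / seconds))
        previous = now
    return rates


SAMPLE = [
    "Mar 17 23:04:40 evenlode kernel: nfsd: non-standard errno: -117",
    "Mar 17 23:05:11 evenlode last message repeated 93970 times",
    "Mar 17 23:06:12 evenlode last message repeated 188363 times",
]

if __name__ == "__main__":
    for when, per_second in repeat_rates(SAMPLE):
        print(when.time(), round(per_second * 60), "messages per minute")
```

For the three lines above this works out to roughly 3,000 messages per second in both intervals.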

mount:
/dev/mapper/data-home on /home type xfs (rw,logbufs=8,usrquota)
/dev/mapper/data-archive on /home/archive type xfs (rw,logbufs=8)

evenlode:~# xfs_info /home
meta-data=/dev/data/home         isize=256    agcount=32, agsize=7372784 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=235929088, imaxpct=25
         =                       sunit=16     swidth=48 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0

evenlode:~# xfs_info /home/archive
meta-data=/dev/data/archive      isize=256    agcount=16, agsize=6553600 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=104857600, imaxpct=25
         =                       sunit=16     swidth=48 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
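
As a sanity check on the geometry, the xfs_info figures can be compared with the 900G and 400G logical volumes. This is only arithmetic on the numbers printed above; the helper name xfs_size_gib is made up for the sketch.

```python
def xfs_size_gib(blocks, bsize):
    """Size of the data section in GiB, from the 'data' line of xfs_info."""
    return blocks * bsize / 2**30


# Values copied from the xfs_info output for /home and /home/archive.
home = xfs_size_gib(blocks=235929088, bsize=4096)
archive = xfs_size_gib(blocks=104857600, bsize=4096)

# agcount * agsize should account for every data block.
home_ags_match = 32 * 7372784 == 235929088
archive_ags_match = 16 * 6553600 == 104857600

if __name__ == "__main__":
    print(f"/home         {home:.3f} GiB, AGs consistent: {home_ags_match}")
    print(f"/home/archive {archive:.3f} GiB, AGs consistent: {archive_ags_match}")
```

Both filesystems come out at the size of their logical volume (just under 900 GiB and exactly 400 GiB), so the reported geometry itself looks consistent.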


* Re: 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117
  2008-03-18  0:28 ` Timothy Shimmin
  2008-03-18  1:07   ` Stuart Rowan
@ 2008-03-18 13:49   ` Stuart Rowan
  2008-03-20  1:09     ` Timothy Shimmin
  1 sibling, 1 reply; 6+ messages in thread
From: Stuart Rowan @ 2008-03-18 13:49 UTC (permalink / raw)
  To: xfs; +Cc: Timothy Shimmin

Timothy Shimmin wrote, on 18/03/08 00:28:
> Hi Stuart,
> 
> Stuart Rowan wrote:
>> Hi,
>>
>> Firstly thanks for the great filesystem and apologies if this ends up 
>> being NFS rather than XFS being weird! I'm not subscribed so please do 
>> keep me CC'd.
>>
>> I have *millions* of lines of (>200k per minute according to syslog):
>> nfsd: non-standard errno: -117
>> being sent out of dmesg
>>
>> Now errno 117 is
>> #define EUCLEAN         117     /* Structure needs cleaning */
>> which seems to be only used from a quick grep by XFS and JFFS and smbfs.
>>
>>
> In XFS we mapped EFSCORRUPTED to EUCLEAN as EFSCORRUPTED
> didn't exist on Linux.
> However, normally if this error is encountered in XFS then
> we output an appropriate msg to the syslog.
> Our default error level is 3 and most reports are rated at 1
> so should show up I would have thought.
> 
> --Tim
> 
>> My NFS server exports two locations
>> /home
>> /home/archive
>> both of these are XFS partitions, hence my suspicion that the -117 is 
>> coming from XFS.
>>
>> xfs_repair -n says the filesystems are clean
>> xfs_repair has been run multiple times to completion on the 
>> filesystems, all is fine.
>>
>> The XFS partitions are lvm volumes as follows
>> data/home 900G
>> data/archive 400G
>> The volume group, data, is sda3
>> sda3 is a 6 drive 3ware 9550SXU-8LP RAID10 array
>>
>> The NFS server is currently in use (indeed the message only starts 
>> once clients connect) and works absolutely fine.
>>
>> How do I find out what (if anything) is wrong with my filesystem / 
>> appropriately silence this message?
>>
>> Many thanks,
>> Stu.
>>
> 
> 
I briefly changed the sysctl fs.xfs.error_level to 6 and then back to 3

It gives the following message and backtrace

> Mar 18 13:35:15 evenlode kernel: nfsd: non-standard errno: -117
> Mar 18 13:35:15 evenlode kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> Mar 18 13:35:15 evenlode kernel: Filesystem "dm-0": XFS internal error xfs_itobp at line 360 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff8821224d
> Mar 18 13:35:15 evenlode kernel: Pid: 2791, comm: nfsd Not tainted 2.6.24.3-generic #1
> Mar 18 13:35:15 evenlode kernel: 
> Mar 18 13:35:15 evenlode kernel: Call Trace:
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820f784>] :xfs:xfs_itobp+0x141/0x17b
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820d7c9>] :xfs:xfs_iget_core+0x352/0x63a
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8029095f>] alloc_inode+0x152/0x1c2
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820db4c>] :xfs:xfs_iget+0x9b/0x13f
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff882243d1>] :xfs:xfs_vget+0x4d/0xbb
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8822f0d2>] :xfs:xfs_nfs_get_inode+0x2e/0x42
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8822f1f9>] :xfs:xfs_fs_fh_to_dentry+0x64/0x97
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88332554>] :exportfs:exportfs_decode_fh+0x30/0x1dc
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88347c16>] :nfsd:nfsd_acceptable+0x0/0xca
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8023b11a>] set_current_groups+0x148/0x153
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8834df5f>] :nfsd:nfsd_setuser+0x11c/0x171
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88347762>] :nfsd:nfsd_setuser_and_check_port+0x52/0x57
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88347edb>] :nfsd:fh_verify+0x1fb/0x4a4
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88261024>] :sunrpc:svc_tcp_recvfrom+0x7ab/0x843
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88349141>] :nfsd:nfsd_open+0x1f/0x170
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88349538>] :nfsd:nfsd_read+0x7f/0xc4
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff88350321>] :nfsd:nfsd3_proc_read+0x117/0x15a
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8834524e>] :nfsd:nfsd_dispatch+0xde/0x1c2
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8825e192>] :sunrpc:svc_process+0x3f7/0x6e9
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff803fba8d>] __down_read+0x12/0x9a
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8834582f>] :nfsd:nfsd+0x191/0x2ae
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8020cbe8>] child_rip+0xa/0x12
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8834569e>] :nfsd:nfsd+0x0/0x2ae
> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8020cbde>] child_rip+0x0/0x12
> Mar 18 13:35:15 evenlode kernel: 

Does that help?

Thanks,
Stu.


* Re: 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117
  2008-03-18 13:49   ` Stuart Rowan
@ 2008-03-20  1:09     ` Timothy Shimmin
  2008-03-20  8:25       ` XFS internal error xfs_itobp at line 360 of file fs/xfs/xfs_inode.c. (was Re: 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117) Stuart Rowan
  0 siblings, 1 reply; 6+ messages in thread
From: Timothy Shimmin @ 2008-03-20  1:09 UTC (permalink / raw)
  To: strr-debian; +Cc: xfs

Stuart Rowan wrote:
> Timothy Shimmin wrote, on 18/03/08 00:28:
>> Hi Stuart,
>>
>> Stuart Rowan wrote:
>>>
>>> I have *millions* of lines of (>200k per minute according to syslog):
>>> nfsd: non-standard errno: -117
>>> being sent out of dmesg
>>>
>>> Now errno 117 is
>>> #define EUCLEAN         117     /* Structure needs cleaning */
>>>
>> In XFS we mapped EFSCORRUPTED to EUCLEAN as EFSCORRUPTED
>> didn't exist on Linux.
>> However, normally if this error is encountered in XFS then
>> we output an appropriate msg to the syslog.
>> Our default error level is 3 and most reports are rated at 1
>> so should show up I would have thought.
>>
>> --Tim
>>
>>>
>>> xfs_repair -n says the filesystems are clean
>>> xfs_repair has been run multiple times to completion on the 
>>> filesystems, all is fine.
>>>
>>> The NFS server is currently in use (indeed the message only starts 
>>> once clients connect) and works absolutely fine.
>>>
>>> How do I find out what (if anything) is wrong with my filesystem / 
>>> appropriately silence this message?
>>>
>>
> I briefly changed the sysctl fs.xfs.error_level to 6 and then back to 3
> 
Good idea (I was thinking about that :-).

Somehow, your subject line referring to 2.6.24 didn't stick in
my brain (that's pretty old).
So I was looking at recent code which I can't see has this error
case from xfs_itobp() (it is now in xfs_imap_to_bp()).

Looking at old code, I see 2 EFSCORRUPTED paths with the following
one triggering at XFS_ERRLEVEL_HIGH (and presumably why you didn't
see it until now) ...

montep    |1.198|                            |  /*
montep    |1.198|                            |   * Validate the magic number and version of every inode in the buffer
montep    |1.198|                            |   * (if DEBUG kernel) or the first inode in the buffer, otherwise.
montep    |1.198|                            |   */
nathans   |1.303|2.4.x-xfs:slinx:74929a      |#ifdef DEBUG
montep    |1.198|                            |  ni = BBTOB(imap.im_len) >> mp->m_sb.sb_inodelog;
montep    |1.198|                            |#else
montep    |1.198|                            |  ni = 1;
montep    |1.198|                            |#endif
montep    |1.198|                            |  for (i = 0; i < ni; i++) {
doucette  |1.245|irix6.5f:irix:09146b        |          int             di_ok;
doucette  |1.245|irix6.5f:irix:09146b        |          xfs_dinode_t    *dip;
doucette  |1.245|irix6.5f:irix:09146b        |
lord      |1.292|2.4.0-test1-xfs:slinx:65571a|          dip = (xfs_dinode_t *)xfs_buf_offset(bp,
montep    |1.198|                            |                                  (i << mp->m_sb.sb_inodelog));
dxm       |1.285|2.4.0-test1-xfs:slinx:62350a|          di_ok = INT_GET(dip->di_core.di_magic, ARCH_CONVERT) == XFS_DINODE_MAGIC &&
dxm       |1.285|2.4.0-test1-xfs:slinx:62350a|                      XFS_DINODE_GOOD_VERSION(INT_GET(dip->di_core.di_version, ARCH_CONVERT));
overby    |1.362|2.4.x-xfs:slinx:136445a     |          if (unlikely(XFS_TEST_ERROR(!di_ok, mp, XFS_ERRTAG_ITOBP_INOTOBP,
overby    |1.362|2.4.x-xfs:slinx:136445a     |                           XFS_RANDOM_ITOBP_INOTOBP))) {
montep    |1.198|                            |#ifdef DEBUG
nathans   |1.337|2.4.x-xfs:slinx:119399a     |                  prdev("bad inode magic/vsn daddr 0x%llx #%d (magic=%x)",
nathans   |1.337|2.4.x-xfs:slinx:119399a     |                          mp->m_dev, (unsigned long long)imap.im_blkno, i,
nathans   |1.303|2.4.x-xfs:slinx:74929a      |                          INT_GET(dip->di_core.di_magic, ARCH_CONVERT));
montep    |1.198|                            |#endif
lord      |1.376|2.4.x-xfs:slinx:150747a     |                  XFS_CORRUPTION_ERROR("xfs_itobp", XFS_ERRLEVEL_HIGH,
overby    |1.362|2.4.x-xfs:slinx:136445a     |                                       mp, dip);
montep    |1.198|                            |                  xfs_trans_brelse(tp, bp);
sup       |1.216|                            |                  return XFS_ERROR(EFSCORRUPTED);
montep    |1.198|                            |          }
ajs       |1.143|                            |  }

So the first inode in the buffer has the wrong magic# or version#.
I'm surprised that this wasn't picked up by repair or check.
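
The same test can be mirrored in user space against the 16 bytes dumped in the log. This is a Python sketch, not the kernel code: it assumes the on-disk inode starts with a big-endian 16-bit magic (0x494e, ASCII "IN"), then a 16-bit mode, then a one-byte version, and that versions 1 and 2 are the acceptable ones for this kernel era. The function name is illustrative.

```python
import struct

XFS_DINODE_MAGIC = 0x494E  # ASCII "IN"


def dinode_looks_valid(raw):
    """Approximate the di_ok test on the first bytes of an on-disk inode."""
    # Assumed layout: magic (u16, big-endian), mode (u16), version (s8).
    magic, _mode, version = struct.unpack_from(">HHb", raw, 0)
    return magic == XFS_DINODE_MAGIC and 1 <= version <= 2


# The "0x0:" line from the log: sixteen zero bytes.
dumped = bytes.fromhex("00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")

if __name__ == "__main__":
    print(dinode_looks_valid(dumped))
```

An all-zero buffer fails on the magic number alone, which matches the corruption report being raised for that buffer.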

--Tim

> It gives the following message and backtrace
> 
>> Mar 18 13:35:15 evenlode kernel: nfsd: non-standard errno: -117
>> Mar 18 13:35:15 evenlode kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> Mar 18 13:35:15 evenlode kernel: Filesystem "dm-0": XFS internal error xfs_itobp at line 360 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff8821224d
>> Mar 18 13:35:15 evenlode kernel: Pid: 2791, comm: nfsd Not tainted 2.6.24.3-generic #1
>> Mar 18 13:35:15 evenlode kernel:
>> Mar 18 13:35:15 evenlode kernel: Call Trace:
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820f784>] :xfs:xfs_itobp+0x141/0x17b
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820d7c9>] :xfs:xfs_iget_core+0x352/0x63a
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8029095f>] alloc_inode+0x152/0x1c2
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820db4c>] :xfs:xfs_iget+0x9b/0x13f
>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff882243d1>] :xfs:xfs_vget+0x4d/0xbb
>
> 
> Does that help?
> 
> Thanks,
> Stu.


* XFS internal error xfs_itobp at line 360 of file fs/xfs/xfs_inode.c. (was Re: 2.6.24.3 nfs server on xfs keeps producing nfsd: non-standard errno: -117)
  2008-03-20  1:09     ` Timothy Shimmin
@ 2008-03-20  8:25       ` Stuart Rowan
  0 siblings, 0 replies; 6+ messages in thread
From: Stuart Rowan @ 2008-03-20  8:25 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: strr-debian, xfs



On Thu, 20 Mar 2008, Timothy Shimmin wrote:

> Stuart Rowan wrote:
>> Timothy Shimmin wrote, on 18/03/08 00:28:
>>> Hi Stuart,
>>> 
>>> Stuart Rowan wrote:
>>>> 
>>>> I have *millions* of lines of (>200k per minute according to syslog):
>>>> nfsd: non-standard errno: -117
>>>> being sent out of dmesg
>>>> 
>>>> Now errno 117 is
>>>> #define EUCLEAN         117     /* Structure needs cleaning */
>>>> 
>>> In XFS we mapped EFSCORRUPTED to EUCLEAN as EFSCORRUPTED
>>> didn't exist on Linux.
>>> However, normally if this error is encountered in XFS then
>>> we output an appropriate msg to the syslog.
>>> Our default error level is 3 and most reports are rated at 1
>>> so should show up I would have thought.
>>> 
>>> --Tim
>>> 
>>>> 
>>>> xfs_repair -n says the filesystems are clean
>>>> xfs_repair has been run multiple times to completion on the filesystems, 
>>>> all is fine.
>>>> 
>>>> The NFS server is currently in use (indeed the message only starts once 
>>>> clients connect) and works absolutely fine.
>>>> 
>>>> How do I find out what (if anything) is wrong with my filesystem / 
>>>> appropriately silence this message?
>>>> 
>>> 
>> I briefly changed the sysctl fs.xfs.error_level to 6 and then back to 3
>> 
> Good idea (I was thinking about that :-).
>
> Somehow, your subject line referring to 2.6.24 didn't stick in
> my brain (that's pretty old).
> So I was looking at recent code which I can't see has this error
> case from xfs_itobp() (it is now in xfs_imap_to_bp()).
>
Pretty old for you, latest released Linux kernel to me :-P

> Looking at old code, I see 2 EFSCORRUPTED paths with the following
> one triggering at XFS_ERRLEVEL_HIGH (and presumably why you didn't
> see it until now) ...
>
> montep    |1.198|                            |  /*
> montep    |1.198|                            |   * Validate the magic number and version of every inode in the buffer
> montep    |1.198|                            |   * (if DEBUG kernel) or the first inode in the buffer, otherwise.
> montep    |1.198|                            |   */
> nathans   |1.303|2.4.x-xfs:slinx:74929a      |#ifdef DEBUG
> montep    |1.198|                            |  ni = BBTOB(imap.im_len) >> mp->m_sb.sb_inodelog;
> montep    |1.198|                            |#else
> montep    |1.198|                            |  ni = 1;
> montep    |1.198|                            |#endif
> montep    |1.198|                            |  for (i = 0; i < ni; i++) {
> doucette  |1.245|irix6.5f:irix:09146b        |          int             di_ok;
> doucette  |1.245|irix6.5f:irix:09146b        |          xfs_dinode_t    *dip;
> doucette  |1.245|irix6.5f:irix:09146b        |
> lord      |1.292|2.4.0-test1-xfs:slinx:65571a|          dip = (xfs_dinode_t *)xfs_buf_offset(bp,
> montep    |1.198|                            |                                  (i << mp->m_sb.sb_inodelog));
> dxm       |1.285|2.4.0-test1-xfs:slinx:62350a|          di_ok = INT_GET(dip->di_core.di_magic, ARCH_CONVERT) == XFS_DINODE_MAGIC &&
> dxm       |1.285|2.4.0-test1-xfs:slinx:62350a|                      XFS_DINODE_GOOD_VERSION(INT_GET(dip->di_core.di_version, ARCH_CONVERT));
> overby    |1.362|2.4.x-xfs:slinx:136445a     |          if (unlikely(XFS_TEST_ERROR(!di_ok, mp, XFS_ERRTAG_ITOBP_INOTOBP,
> overby    |1.362|2.4.x-xfs:slinx:136445a     |                           XFS_RANDOM_ITOBP_INOTOBP))) {
> montep    |1.198|                            |#ifdef DEBUG
> nathans   |1.337|2.4.x-xfs:slinx:119399a     |                  prdev("bad inode magic/vsn daddr 0x%llx #%d (magic=%x)",
> nathans   |1.337|2.4.x-xfs:slinx:119399a     |                          mp->m_dev, (unsigned long long)imap.im_blkno, i,
> nathans   |1.303|2.4.x-xfs:slinx:74929a      |                          INT_GET(dip->di_core.di_magic, ARCH_CONVERT));
> montep    |1.198|                            |#endif
> lord      |1.376|2.4.x-xfs:slinx:150747a     |                  XFS_CORRUPTION_ERROR("xfs_itobp", XFS_ERRLEVEL_HIGH,
> overby    |1.362|2.4.x-xfs:slinx:136445a     |                                       mp, dip);
> montep    |1.198|                            |                  xfs_trans_brelse(tp, bp);
> sup       |1.216|                            |                  return XFS_ERROR(EFSCORRUPTED);
> montep    |1.198|                            |          }
> ajs       |1.143|                            |  }
>
> So the first inode in the buffer has the wrong magic# or version#.
> I'm surprised that this wasn't picked up by repair or check.
>
> --Tim
>
I have some more information! The server, evenlode, was previously serving 
NFS exports of ext3 filesystems. Last week we rsynced the data to the new 
server running XFS.

Eventually I spotted the high error rate was linked to the volume of NFS 
read calls (200k / minute). A quick tcpdump gave me a couple of likely 
looking hosts. I logged into one (bonny) and found gnome-panel using 100% 
CPU. I killed that and these messages have now reduced to a handful an 
hour. That gnome-panel will have had the NFS server and underlying NFS 
backing filesystem (ext3-> XFS) changed underneath it.
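
A rough way to rank clients from a text capture is sketched below in Python. It assumes tcpdump's usual one-line-per-packet text output of the form "time IP src.port > dst.port: ..."; the function name top_talkers and the host "clientb" are hypothetical, and only bonny and evenlode come from the account above.

```python
import re
from collections import Counter

# Assumed line shape: "HH:MM:SS.ffffff IP src.port > dst.port: ..."
PACKET = re.compile(r"^\S+ IP (\S+)\.[^.\s]+ > (\S+)\.[^.\s]+:")


def top_talkers(lines, server, limit=5):
    """Count packets sent to `server`, grouped by source host."""
    counts = Counter()
    for line in lines:
        match = PACKET.match(line)
        if match and match.group(2) == server:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


# Illustrative capture lines, not taken from the real trace.
SAMPLE = [
    "09:01:00.000001 IP bonny.1023 > evenlode.nfs: 112 read fh 0,1/2",
    "09:01:00.000002 IP bonny.1023 > evenlode.nfs: 112 read fh 0,1/2",
    "09:01:00.000003 IP evenlode.nfs > bonny.1023: reply ok 100",
    "09:01:00.000004 IP clientb.1001 > evenlode.nfs: 112 read fh 0,1/3",
    "09:01:00.000005 IP bonny.1023 > evenlode.nfs: 112 read fh 0,1/2",
]

if __name__ == "__main__":
    for host, packets in top_talkers(SAMPLE, "evenlode"):
        print(host, packets)
```

Replies from the server are ignored, so the ranking reflects only what each client sends.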

So my questions ...

Is it possible that the errors are related to duff request data being sent 
by the NFS clients because they are still referencing e.g. inodes as they 
were when the NFS server was ext3 backed?

Is it also possible that the rather high request rate (200k/minute, 
although that has reduced now) made a race in e.g. the XFS code 
triggerable?

As you say, it's rather surprising that this sort of issue is not being 
caught by xfs_repair (-n), and that's what leads me to suspect something 
else is at play ...

Cheers,
Stu.

>> It gives the following message and backtrace
>> 
>>> Mar 18 13:35:15 evenlode kernel: nfsd: non-standard errno: -117
>>> Mar 18 13:35:15 evenlode kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> Mar 18 13:35:15 evenlode kernel: Filesystem "dm-0": XFS internal error xfs_itobp at line 360 of file fs/xfs/xfs_inode.c.  Caller 0xffffffff8821224d
>>> Mar 18 13:35:15 evenlode kernel: Pid: 2791, comm: nfsd Not tainted 2.6.24.3-generic #1
>>> Mar 18 13:35:15 evenlode kernel:
>>> Mar 18 13:35:15 evenlode kernel: Call Trace:
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820f784>] :xfs:xfs_itobp+0x141/0x17b
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8821224d>] :xfs:xfs_iread+0x71/0x1e8
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820d7c9>] :xfs:xfs_iget_core+0x352/0x63a
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8029095f>] alloc_inode+0x152/0x1c2
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff8820db4c>] :xfs:xfs_iget+0x9b/0x13f
>>> Mar 18 13:35:15 evenlode kernel:  [<ffffffff882243d1>] :xfs:xfs_vget+0x4d/0xbb
>> 
>> 
>> Does that help?
>> 
>> Thanks,
>> Stu.
>
>
>

