All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shyam Kaushik <shyam@zadarastorage.com>
To: dchinner@redhat.com, Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Metadata corruption at xfs_attr3_leaf_write_verify()
Date: Wed, 26 Jul 2017 10:52:50 +0530	[thread overview]
Message-ID: <d05c5fec57e551e85e7351dee11bb4d4@mail.gmail.com> (raw)
In-Reply-To: 4381c2b04c11a81f9203d5595651c64a@mail.gmail.com

Hi Dave,

>From time to time we hit an xfs corruption when xattr leaf is attempted to
be written when it has ichdr.count==0.

We are on an old kernel 3.18.19. We managed to capture xfs trace buffer
closer to the problem time.

This is the corruption report
	Jul 20 05:22:27.189727 node1 kernel: [466290.765299] XFS (dm-78): Metadata
corruption detected at xfs_attr3_leaf_write_verify+0xe5/0x100 [xfs], block
0x9698734d8
	Jul 20 05:22:27.189741 node1 kernel: [466290.767279] XFS (dm-78): Unmount
and run xfs_repair
	Jul 20 05:22:27.189743 node1 kernel: [466290.768022] XFS (dm-78): First 64
bytes of corrupted metadata buffer:
	Jul 20 05:22:27.191635 node1 kernel: [466290.768832] ffff88002b4bf000: 00
00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00  ................
	Jul 20 05:22:27.192949 node1 kernel: [466290.770163] ffff88002b4bf010: 10
00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
	Jul 20 05:22:27.192953 node1 kernel: [466290.771264] ffff88002b4bf020: 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
	Jul 20 05:22:27.195921 node1 kernel: [466290.772815] ffff88002b4bf030: 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

>From xfs xattr debug trace, it appears that a file is created within a
directory having default ACL set. This causes file creation to insert xattr
leaf during file creation. This completes queuing an AIL. Immediately after
this an explicit xfs_setattr is being done on the same file which alters the
default ACL & this causes default ACL to be removed/added. It looks like
this 2nd set attr is somehow influencing the first AIL entry that it is
being seen with ichr.count==0 during the write.

I can pass the full xfs debug trace if required. Here is the snippet of
debug trace causing the issue

           <...>-18953 [004] ...1 466290.749358: xfs_attr_sf_create: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749359: xfs_attr_sf_addname: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749361: xfs_attr_sf_lookup: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749362: xfs_attr_sf_to_leaf: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749430: xfs_attr_leaf_create: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749454: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749456: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749457: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749458: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.749649: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.749908: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags
           <...>-18953 [004] ...1 466290.749910: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags
           <...>-18953 [004] ...1 466290.749918: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.749931: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.749932: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.749984: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.749990: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750063: xfs_setattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750257: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750260: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750261: xfs_attr_leaf_replace: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750262: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750262: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750270: xfs_attr_leaf_flipflags:
dev 253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100
hashval 0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750275: xfs_attr_leaf_remove: dev
253:78 ino 0x1749833903 name SGI_ACL_FILE namelen 12 valuelen 100 hashval
0x275b19c4 op_flags RENAME|ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750461: xfs_attr_leaf_removename:
dev 253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 0 hashval
0xea9061c8 op_flags OKNOENT
           <...>-18953 [004] ...1 466290.750463: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 0 hashval 0xea9061c8
op_flags OKNOENT
           <...>-18953 [004] ...1 466290.750474: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750475: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750476: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750476: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 49 hashval
0x487716dd op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750645: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750648: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750645: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750648: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750659: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.750661: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name SAMBA_PAI namelen 9 valuelen 1024 hashval
0x487716dd op_flags
           <...>-18953 [004] ...1 466290.750744: xfs_getattr: dev 253:78 ino
0x1749833903
           <...>-18953 [004] ...1 466290.750791: xfs_attr_leaf_addname: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750793: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750794: xfs_attr_leaf_add: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.750794: xfs_attr_leaf_add_work: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 444 hashval 0xea9061c8
op_flags ADDNAME|OKNOENT
           <...>-18953 [004] ...1 466290.751029: xfs_attr_leaf_get: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags
           <...>-18953 [004] ...1 466290.751031: xfs_attr_leaf_lookup: dev
253:78 ino 0x1749833903 name NTACL namelen 5 valuelen 1024 hashval
0xea9061c8 op_flags


Any pointers/input to check this further is greatly appreciated! Thanks.

--Shyam

On Tue, Apr 26, 2016 at 4:10 PM, Shyam Kaushik <shyam@zadarastorage.com>
wrote:
Hi Eric,

Yes this is a non-CRC FS. The workload is we a custom application that
does several (16 threads) file create/read/writes/update xattrs on a XFS
mounted over raw disk.

There is no messages prior to this & we hit this once in a few days (like
every 4-5 days). I will try your suggestion of running with xfs_attr_*
tracepoints all the time so that when we hit this error we have better
info. Please let me know if you prefer to add further debug prints.
Thanks.

--Shyam

> Eric Sandeen sandeen at sandeen.net wrote:
> On Mon Apr 25 16:57:11 CDT 2016
>
> Is this a non-crc filesystem?
>
>
> XFS_ATTR_LEAF_MAGIC is ok (if it's a non-crc filesystem)
>
> Looks the same as the other report, tripping on:
>
>        if (ichdr.count == 0)
>                return false;
>
> A reproducer would be super here.  At least maybe a description of the
> workload that hits it?
>
> Count is manipulated in things like attr leaf compaction...
> Any other messages prior to this?
>
> How often do you hit it?
>
> You could also turn on xfs_attr_* tracepoints, maybe.
>
> -Eric

-----Original Message-----
From: Shyam Kaushik [mailto:shyam@zadarastorage.com]
Sent: 25 April 2016 22:22
To: 'xfs@oss.sgi.com'
Cc: Alex Lyakas
Subject: Metadata corruption detected at
xfs_attr3_leaf_write_verify+0xe5/0x100

Hi Dave et al,

We are periodically hitting the below metadata corruption with XFS over a
raw disk running several file copies with xattr operations on kernel
3.18.19. Unmounting & running xfs_repair doesn't report any corruption. I
see that this was last reported here
http://oss.sgi.com/archives/xfs/2015-12/msg00224.html

Unfortunately we dont have a reproducer, but this issue happens
periodically. We can add more debug prints & allow this issue to happen
again. Can you pls suggest any options to debug this further? Thanks

Apr 20 21:58:03 node1 kernel: [16736.286370] XFS (dm-26): Metadata
corruption detected at xfs_attr3_leaf_write_verify+0xe5/0x100 [xfs], block
0x19c5c728
Apr 20 21:58:03 node1 kernel: [16736.289084] XFS (dm-26): Unmount and run
xfs_repair
Apr 20 21:58:03 node1 kernel: [16736.290257] XFS (dm-26): First 64 bytes
of corrupted metadata buffer:
Apr 20 21:58:03 node1 kernel: [16736.291797] ffff880123668000: 00 00 00 00
00 00 00 00 fb ee 00 00 00 00 00 00  ................
Apr 20 21:58:03 node1 kernel: [16736.293823] ffff880123668010: 10 00 00 00
00 20 0f e0 00 00 00 00 00 00 00 00  ..... ..........
Apr 20 21:58:03 node1 kernel: [16736.297504] ffff880123668020: 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 20 21:58:03 node1 kernel: [16736.299343] ffff880123668030: 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 20 21:58:03 node1 kernel: [16736.301465] XFS (dm-26):
xfs_do_force_shutdown(0x8) called from line 1244 of file fs/xfs/xfs_buf.c.
Return address = 0xffffffffc095cee0
Apr 20 21:58:03 node1 kernel: [16736.301469] ------------[ cut here
]------------
Apr 20 21:58:03 node1 kernel: [16736.301551] XFS(dm-26): SHUTDOWN!!!
old_flags=0x0 new_flags=0x8
Apr 20 21:58:03 node1 kernel: [16736.301703] CPU: 1 PID: 7857 Comm:
xfsaild/dm-26 Tainted: G           OE  3.18.19 #1
Apr 20 21:58:03 node1 kernel: [16736.301705] Hardware name: Bochs Bochs,
BIOS Bochs 01/01/2011
Apr 20 21:58:03 node1 kernel: [16736.301707]  0000000000000009
ffff88020c5ffb38 ffffffff81710c85 0000000000000000
Apr 20 21:58:03 node1 kernel: [16736.301711]  ffff88020c5ffb88
ffff88020c5ffb78 ffffffff81072df1 2e2e202030302030
Apr 20 21:58:03 node1 kernel: [16736.301715]  0000000000000000
0000000000000008 ffff88020c127000 0000000000000000
Apr 20 21:58:03 node1 kernel: [16736.301718] Call Trace:
Apr 20 21:58:03 node1 kernel: [16736.301769]  [<ffffffff81710c85>]
dump_stack+0x4e/0x71
Apr 20 21:58:03 node1 kernel: [16736.301780]  [<ffffffff81072df1>]
warn_slowpath_common+0x81/0xa0
Apr 20 21:58:03 node1 kernel: [16736.301784]  [<ffffffff81072e56>]
warn_slowpath_fmt+0x46/0x50
Apr 20 21:58:03 node1 kernel: [16736.301860]  [<ffffffffc09693f3>]
xfs_do_force_shutdown+0x33/0x170 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.301921]  [<ffffffffc095cee0>] ?
_xfs_buf_ioapply+0xa0/0x430 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.301951]  [<ffffffffc095ee4b>] ?
__xfs_buf_delwri_submit+0x22b/0x290 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302066]  [<ffffffffc095cee0>]
_xfs_buf_ioapply+0xa0/0x430 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302137]  [<ffffffff8109e260>] ?
wake_up_state+0x20/0x20
Apr 20 21:58:03 node1 kernel: [16736.302162]  [<ffffffffc095ee4b>] ?
__xfs_buf_delwri_submit+0x22b/0x290 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302183]  [<ffffffffc095ea78>]
xfs_buf_submit+0x68/0x210 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302241]  [<ffffffffc095ee4b>]
__xfs_buf_delwri_submit+0x22b/0x290 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302267]  [<ffffffffc095fc60>] ?
xfs_buf_delwri_submit_nowait+0x20/0x30 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302291]  [<ffffffffc098f440>] ?
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302308]  [<ffffffffc095fc60>]
xfs_buf_delwri_submit_nowait+0x20/0x30 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302333]  [<ffffffffc098f66b>]
xfsaild+0x22b/0x630 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302513]  [<ffffffffc098f440>] ?
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr 20 21:58:03 node1 kernel: [16736.302518]  [<ffffffff810911b9>]
kthread+0xc9/0xe0
Apr 20 21:58:03 node1 kernel: [16736.302522]  [<ffffffff810910f0>] ?
kthread_create_on_node+0x180/0x180
Apr 20 21:58:03 node1 kernel: [16736.302530]  [<ffffffff81717918>]
ret_from_fork+0x58/0x90
Apr 20 21:58:03 node1 kernel: [16736.302549]  [<ffffffff810910f0>] ?
kthread_create_on_node+0x180/0x180
Apr 20 21:58:03 node1 kernel: [16736.302551] ---[ end trace
0bb81b88fdd6a298 ]---

--Shyam

  parent reply	other threads:[~2017-07-26  5:22 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAPh1sj5oU6QRyH_cnzrkGJb6ed3XO4fGABJ4yJLPnb-ppqVJeg@mail.gmail.com>
2017-07-21 13:31 ` Metadata corruption detected at xfs_attr3_leaf_write_verify Shyam Kaushik
2017-07-26  5:22 ` Shyam Kaushik [this message]
2017-07-26 12:15   ` Metadata corruption at xfs_attr3_leaf_write_verify() Brian Foster
     [not found] <d161b07cad6536b0baf285328cf99500@mail.gmail.com>
2017-08-01 17:30 ` Alex Lyakas
2017-08-01 18:22   ` Eric Sandeen
2017-08-01 18:57     ` Alex Lyakas
2017-08-01 19:02       ` Eric Sandeen
2017-08-01 23:18   ` Dave Chinner
2017-08-02  8:38     ` Alex Lyakas
2017-08-02 11:50       ` Dave Chinner
2017-08-07 14:31         ` Alex Lyakas
2017-08-07 13:55       ` Libor Klepáč
2017-08-07 14:32         ` Alex Lyakas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d05c5fec57e551e85e7351dee11bb4d4@mail.gmail.com \
    --to=shyam@zadarastorage.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.