linux-xfs.vger.kernel.org archive mirror
* xfs/181 trigger xfs corruption on ppc64le
@ 2016-09-21  3:08 Zorro Lang
  2017-01-05  4:03 ` Eryu Guan
  0 siblings, 1 reply; 3+ messages in thread
From: Zorro Lang @ 2016-09-21  3:08 UTC (permalink / raw)
  To: linux-xfs

Hi,

xfs/181 triggers a corruption on XFS (both v4 and v5). Running xfs/181 on
ppc64le roughly 10 to 100 times with a 1k or 4k block size triggers the
following corruption:
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
attribute entry #0 in attr block 0, inode 25194 is INCOMPLETE
problem with attribute contents in inode 25194
would clear attr fork
bad nblocks 33 for inode 25194, would reset to 0
bad anextents 1 for inode 25194, would reset to 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output

The ppc64le machine has a 64k page size. So far the above corruption has only
been reproduced on ppc64le. The full output (on a v4 XFS with 1k block size) is
here:
http://paste.fedoraproject.org/431761/14744268/
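
Since the corruption only shows up after many iterations, a small driver that
reruns the test until it fails can help. The following is a minimal sketch, not
part of the original report. It assumes it is started from an xfstests checkout
with the test and scratch devices already configured, and that `./check`
returns a non-zero exit status when the test fails. The helper names
`run_xfs_181` and `first_failing_run` are hypothetical.

```python
import subprocess
from typing import Callable, Optional


def run_xfs_181() -> bool:
    """Run xfs/181 once; True means the harness reported success.

    Assumption: the current directory is an xfstests checkout and
    ./check exits non-zero when the test fails.
    """
    result = subprocess.run(["./check", "xfs/181"])
    return result.returncode == 0


def first_failing_run(
    run_once: Callable[[], bool] = run_xfs_181,
    max_runs: int = 100,
) -> Optional[int]:
    """Return the 1-based index of the first failing run, or None."""
    for attempt in range(1, max_runs + 1):
        if not run_once():
            return attempt
    return None

# Usage (on a configured test machine):
#     print(first_failing_run(max_runs=100))
```

Passing the runner as a parameter keeps the loop testable without a real
filesystem; the default runner is the only part that touches xfstests.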



Then I tested with a 64k block size and hit another problem, which is easier
to reproduce. ppc64 and aarch64, which also have a 64k page size, can trigger
it too:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
agi unlinked bucket 0 is 1088 in ag 0 (inode=1088)
agi unlinked bucket 1 is 1089 in ag 0 (inode=1089)
agi unlinked bucket 2 is 1090 in ag 0 (inode=1090)
agi unlinked bucket 3 is 1091 in ag 0 (inode=1091)
...
...
agi unlinked bucket 60 is 1084 in ag 0 (inode=1084)
agi unlinked bucket 61 is 1085 in ag 0 (inode=1085)
agi unlinked bucket 62 is 1086 in ag 0 (inode=1086)
agi unlinked bucket 63 is 1087 in ag 0 (inode=1087)
sb_ifree 124, counted 6
sb_fdblocks 237036, counted 245112
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata corruption detected at xfs_attr3_leaf block 0x3f80/0x10000
wrong FS UUID, inode 1145 attr block 16256
problem with attribute contents in inode 1145
would clear attr fork
bad nblocks 1 for inode 1145, would reset to 0
bad anextents 1 for inode 1145, would reset to 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 1027, would move to lost+found
disconnected inode 1028, would move to lost+found
disconnected inode 1029, would move to lost+found
disconnected inode 1030, would move to lost+found
...
...
disconnected inode 1140, would move to lost+found
disconnected inode 1141, would move to lost+found
disconnected inode 1142, would move to lost+found
disconnected inode 1143, would move to lost+found
disconnected inode 1144, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 1027 nlinks from 0 to 1
would have reset inode 1028 nlinks from 0 to 1
would have reset inode 1029 nlinks from 0 to 1
would have reset inode 1030 nlinks from 0 to 1
...
...
would have reset inode 1141 nlinks from 0 to 1
would have reset inode 1142 nlinks from 0 to 1
would have reset inode 1143 nlinks from 0 to 1
would have reset inode 1144 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
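
A few of the numbers in the 64k output above can be cross-checked with simple
arithmetic. The sketch below is an illustration only, not from the original
report. It uses values copied from the log and relies on XFS placing an
unlinked inode in bucket (AG inode number % 64). It draws no conclusion about
the root cause.

```python
# Values copied from the xfs_repair -n output above.
AGI_UNLINKED_BUCKETS = 64  # number of unlinked buckets in an AGI


def unlinked_bucket(agino: int) -> int:
    """Bucket an AG-relative inode number hashes to on the unlinked list."""
    return agino % AGI_UNLINKED_BUCKETS


# bucket index -> inode reported in that bucket (all in AG 0)
reported_buckets = {
    0: 1088, 1: 1089, 2: 1090, 3: 1091,
    60: 1084, 61: 1085, 62: 1086, 63: 1087,
}
buckets_consistent = all(
    unlinked_bucket(ino) == bucket for bucket, ino in reported_buckets.items()
)

# "xfs_attr3_leaf block 0x3f80/0x10000" versus "attr block 16256"
attr_block = int("0x3f80", 16)
buffer_len = int("0x10000", 16)

# Superblock counters: the stored value versus what xfs_repair counted.
ifree_delta = 124 - 6
fdblocks_delta = 245112 - 237036

print("buckets consistent:", buckets_consistent)
print("attr block:", attr_block, "buffer length:", buffer_len)
print("sb_ifree off by", ifree_delta, "sb_fdblocks off by", fdblocks_delta)
```

The hex block number equals the decimal attr block 16256 in the next log line,
and the buffer length equals the 64k block size used for this run.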

Thanks,
Zorro


* Re: xfs/181 trigger xfs corruption on ppc64le
  2016-09-21  3:08 xfs/181 trigger xfs corruption on ppc64le Zorro Lang
@ 2017-01-05  4:03 ` Eryu Guan
  2017-01-05 14:02   ` Brian Foster
  0 siblings, 1 reply; 3+ messages in thread
From: Eryu Guan @ 2017-01-05  4:03 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs


On Wed, Sep 21, 2016 at 11:08:41AM +0800, Zorro Lang wrote:
> Hi,
> 
> There's a XFS (v4/v5) corruption from xfs/181. If run xfs/181 on ppc64le
> 10~100 times (more or less) with 1k or 4k block size, it'll trigger a
> corruption:
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> attribute entry #0 in attr block 0, inode 25194 is INCOMPLETE
> problem with attribute contents in inode 25194
> would clear attr fork
> bad nblocks 33 for inode 25194, would reset to 0
> bad anextents 1 for inode 25194, would reset to 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> *** end xfs_repair output

I hit this corruption again today with a 4.10-rc2 kernel and the latest master
branch of xfsprogs, still on a ppc64le host.

*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
attribute entry #0 in attr block 0, inode 10236 is INCOMPLETE
problem with attribute contents in inode 10236
would clear attr fork
bad nblocks 10 for inode 10236, would reset to 0
bad anextents 1 for inode 10236, would reset to 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output

I attached the compressed xfs-181.full file, in case someone is interested in
looking into it.

Thanks,
Eryu

[-- Attachment #2: xfs-181.full.gz --]
[-- Type: application/gzip, Size: 22997 bytes --]


* Re: xfs/181 trigger xfs corruption on ppc64le
  2017-01-05  4:03 ` Eryu Guan
@ 2017-01-05 14:02   ` Brian Foster
  0 siblings, 0 replies; 3+ messages in thread
From: Brian Foster @ 2017-01-05 14:02 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Zorro Lang, linux-xfs

On Thu, Jan 05, 2017 at 12:03:13PM +0800, Eryu Guan wrote:
> On Wed, Sep 21, 2016 at 11:08:41AM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > There's a XFS (v4/v5) corruption from xfs/181. If run xfs/181 on ppc64le
> > 10~100 times (more or less) with 1k or 4k block size, it'll trigger a
> > corruption:
> > *** xfs_repair -n output ***
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> >         - found root inode chunk
> > Phase 3 - for each AG...
> >         - scan (but don't clear) agi unlinked lists...
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> > attribute entry #0 in attr block 0, inode 25194 is INCOMPLETE
> > problem with attribute contents in inode 25194
> > would clear attr fork
> > bad nblocks 33 for inode 25194, would reset to 0
> > bad anextents 1 for inode 25194, would reset to 0
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> >         - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> >         - setting up duplicate extent list...
> >         - check for inodes claiming duplicate blocks...
> >         - agno = 0
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> > No modify flag set, skipping phase 5
> > Phase 6 - check inode connectivity...
> >         - traversing filesystem ...
> >         - traversal finished ...
> >         - moving disconnected inodes to lost+found ...
> > Phase 7 - verify link counts...
> > No modify flag set, skipping filesystem flush and exiting.
> > *** end xfs_repair output
> 
> I hit this corruption again today with 4.10-rc2 kernel & latest master
> branch of xfsprogs, still ppc64le host.
> 
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> attribute entry #0 in attr block 0, inode 10236 is INCOMPLETE
> problem with attribute contents in inode 10236
> would clear attr fork
> bad nblocks 10 for inode 10236, would reset to 0
> bad anextents 1 for inode 10236, would reset to 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 2
>         - agno = 3
>         - agno = 1
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> *** end xfs_repair output
> 

Isn't this the same as rhbz 1377163 (not sure why that bug appears to be
locked)? E.g., this is basically due to the fact that remote attribute
block allocation occurs in a separate transaction. The existence of the
incomplete flag means that by design, logging doesn't guarantee
consistency for such attributes.
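
To illustrate the point above, here is a toy model, not the kernel code. It
assumes a remote attribute set is split into three separately committed steps
(add the entry flagged INCOMPLETE, allocate the remote blocks, clear the flag)
and that a crash can land between any two of them. The step breakdown, the
block count and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyInode:
    attr_entry: bool = False   # leaf entry for the attribute exists
    incomplete: bool = False   # entry still carries the INCOMPLETE flag
    remote_blocks: int = 0     # blocks allocated for the remote value


def add_incomplete_entry(inode: ToyInode) -> None:
    inode.attr_entry = True
    inode.incomplete = True


def allocate_remote_blocks(inode: ToyInode) -> None:
    inode.remote_blocks = 8    # arbitrary count, for illustration only


def clear_incomplete(inode: ToyInode) -> None:
    inode.incomplete = False


# Each step stands for one independently committed transaction.
STEPS = [add_incomplete_entry, allocate_remote_blocks, clear_incomplete]


def state_after_crash(committed: int) -> ToyInode:
    """State seen after recovery when only `committed` steps reached the log."""
    inode = ToyInode()
    for step in STEPS[:committed]:
        step(inode)
    return inode


def repair_verdict(inode: ToyInode) -> str:
    """What a checker that rejects INCOMPLETE entries would report."""
    return "INCOMPLETE" if inode.attr_entry and inode.incomplete else "clean"


verdicts = [repair_verdict(state_after_crash(n)) for n in range(len(STEPS) + 1)]
print(verdicts)
```

In this model every crash point between the first and last step leaves an
entry that a checker flags, even though each individual step was logged
atomically.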

Brian

> I attached compressed xfs-181.full file, in case someone has interest to
> look into it.
> 
> Thanks,
> Eryu


