* xfs/181 trigger xfs corruption on ppc64le
@ 2016-09-21 3:08 Zorro Lang
2017-01-05 4:03 ` Eryu Guan
0 siblings, 1 reply; 3+ messages in thread
From: Zorro Lang @ 2016-09-21 3:08 UTC (permalink / raw)
To: linux-xfs
Hi,
There's an XFS (v4/v5) corruption triggered by xfs/181. If you run xfs/181 on
ppc64le roughly 10~100 times with a 1k or 4k block size, it will trigger a
corruption:
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
attribute entry #0 in attr block 0, inode 25194 is INCOMPLETE
problem with attribute contents in inode 25194
would clear attr fork
bad nblocks 33 for inode 25194, would reset to 0
bad anextents 1 for inode 25194, would reset to 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output
The ppc64le machine has a 64k page size. So far, the above corruption can only
be reproduced on ppc64le machines. The full output (on a v4 XFS with 1k block
size) is here:
http://paste.fedoraproject.org/431761/14744268/
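Since the failure only shows up after tens of runs, the repetition can be scripted. The following is a minimal sketch, not part of the original report: `run_until_failure` is a hypothetical helper, and it assumes the test runner exits non-zero when a run fails. The `./check xfs/181` invocation in the comment assumes the script is started from an xfstests checkout.

```python
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns the 1-based iteration on which cmd first exited non-zero,
    or None if all max_runs iterations succeeded.
    """
    for iteration in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return iteration
    return None


# Example use (assumption: current directory is an xfstests checkout):
#   failed_at = run_until_failure(["./check", "xfs/181"], max_runs=100)
#   print("first failure on run", failed_at)
```

Stopping at the first failure keeps the scratch device in the failed state, so xfs_repair -n can be run against it afterwards.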
Then I tested with a 64k block size and hit another problem that is easier to
reproduce. ppc64 and aarch64 machines with a 64k page size can trigger this
problem too:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
agi unlinked bucket 0 is 1088 in ag 0 (inode=1088)
agi unlinked bucket 1 is 1089 in ag 0 (inode=1089)
agi unlinked bucket 2 is 1090 in ag 0 (inode=1090)
agi unlinked bucket 3 is 1091 in ag 0 (inode=1091)
...
...
agi unlinked bucket 60 is 1084 in ag 0 (inode=1084)
agi unlinked bucket 61 is 1085 in ag 0 (inode=1085)
agi unlinked bucket 62 is 1086 in ag 0 (inode=1086)
agi unlinked bucket 63 is 1087 in ag 0 (inode=1087)
sb_ifree 124, counted 6
sb_fdblocks 237036, counted 245112
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
Metadata corruption detected at xfs_attr3_leaf block 0x3f80/0x10000
wrong FS UUID, inode 1145 attr block 16256
problem with attribute contents in inode 1145
would clear attr fork
bad nblocks 1 for inode 1145, would reset to 0
bad anextents 1 for inode 1145, would reset to 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 2
- agno = 3
- agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 1027, would move to lost+found
disconnected inode 1028, would move to lost+found
disconnected inode 1029, would move to lost+found
disconnected inode 1030, would move to lost+found
...
...
disconnected inode 1140, would move to lost+found
disconnected inode 1141, would move to lost+found
disconnected inode 1142, would move to lost+found
disconnected inode 1143, would move to lost+found
disconnected inode 1144, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 1027 nlinks from 0 to 1
would have reset inode 1028 nlinks from 0 to 1
would have reset inode 1029 nlinks from 0 to 1
would have reset inode 1030 nlinks from 0 to 1
...
...
would have reset inode 1141 nlinks from 0 to 1
would have reset inode 1142 nlinks from 0 to 1
would have reset inode 1143 nlinks from 0 to 1
would have reset inode 1144 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
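When comparing many runs, it can help to pull only the interesting lines out of the xfs_repair -n output. The sketch below is an illustration only: the function name and return shape are made up here, and the regular expressions simply match the message formats visible in the two outputs above. They are not taken from the xfsprogs source.

```python
import re

# Patterns modelled on the messages shown in the reports above.
INCOMPLETE_RE = re.compile(
    r"attribute entry #(\d+) in attr block (\d+), inode (\d+) is INCOMPLETE"
)
RESET_RE = re.compile(
    r"bad (\w+) (\d+) for inode (\d+), would reset to (\d+)"
)


def summarize_repair_output(text):
    """Collect inodes with INCOMPLETE attr entries and 'would reset' fields."""
    incomplete_inodes = []
    resets = []
    for line in text.splitlines():
        match = INCOMPLETE_RE.search(line)
        if match:
            incomplete_inodes.append(int(match.group(3)))
            continue
        match = RESET_RE.search(line)
        if match:
            field, value, inode, new_value = match.groups()
            resets.append((field, int(value), int(inode), int(new_value)))
    return {"incomplete_inodes": incomplete_inodes, "resets": resets}
```

Because the patterns use `search`, the same function also works on output quoted with leading `> ` markers.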
Thanks,
Zorro
* Re: xfs/181 trigger xfs corruption on ppc64le
2016-09-21 3:08 xfs/181 trigger xfs corruption on ppc64le Zorro Lang
@ 2017-01-05 4:03 ` Eryu Guan
2017-01-05 14:02 ` Brian Foster
0 siblings, 1 reply; 3+ messages in thread
From: Eryu Guan @ 2017-01-05 4:03 UTC (permalink / raw)
To: Zorro Lang; +Cc: linux-xfs
On Wed, Sep 21, 2016 at 11:08:41AM +0800, Zorro Lang wrote:
> Hi,
>
> There's a XFS (v4/v5) corruption from xfs/181. If run xfs/181 on ppc64le
> 10~100 times (more or less) with 1k or 4k block size, it'll trigger a
> corruption:
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> attribute entry #0 in attr block 0, inode 25194 is INCOMPLETE
> problem with attribute contents in inode 25194
> would clear attr fork
> bad nblocks 33 for inode 25194, would reset to 0
> bad anextents 1 for inode 25194, would reset to 0
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> *** end xfs_repair output
I hit this corruption again today with a 4.10-rc2 kernel and the latest master
branch of xfsprogs, still on a ppc64le host.
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
attribute entry #0 in attr block 0, inode 10236 is INCOMPLETE
problem with attribute contents in inode 10236
would clear attr fork
bad nblocks 10 for inode 10236, would reset to 0
bad anextents 1 for inode 10236, would reset to 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 2
- agno = 3
- agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output
I've attached the compressed xfs-181.full file in case anyone is interested in
looking into it.
Thanks,
Eryu
[-- Attachment #2: xfs-181.full.gz --]
[-- Type: application/gzip, Size: 22997 bytes --]
* Re: xfs/181 trigger xfs corruption on ppc64le
2017-01-05 4:03 ` Eryu Guan
@ 2017-01-05 14:02 ` Brian Foster
0 siblings, 0 replies; 3+ messages in thread
From: Brian Foster @ 2017-01-05 14:02 UTC (permalink / raw)
To: Eryu Guan; +Cc: Zorro Lang, linux-xfs
On Thu, Jan 05, 2017 at 12:03:13PM +0800, Eryu Guan wrote:
> On Wed, Sep 21, 2016 at 11:08:41AM +0800, Zorro Lang wrote:
> > Hi,
> >
> > There's a XFS (v4/v5) corruption from xfs/181. If run xfs/181 on ppc64le
> > 10~100 times (more or less) with 1k or 4k block size, it'll trigger a
> > corruption:
> > *** xfs_repair -n output ***
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> > - zero log...
> > - scan filesystem freespace and inode maps...
> > - found root inode chunk
> > Phase 3 - for each AG...
> > - scan (but don't clear) agi unlinked lists...
> > - process known inodes and perform inode discovery...
> > - agno = 0
> > attribute entry #0 in attr block 0, inode 25194 is INCOMPLETE
> > problem with attribute contents in inode 25194
> > would clear attr fork
> > bad nblocks 33 for inode 25194, would reset to 0
> > bad anextents 1 for inode 25194, would reset to 0
> > - agno = 1
> > - agno = 2
> > - agno = 3
> > - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> > - setting up duplicate extent list...
> > - check for inodes claiming duplicate blocks...
> > - agno = 0
> > - agno = 1
> > - agno = 2
> > - agno = 3
> > No modify flag set, skipping phase 5
> > Phase 6 - check inode connectivity...
> > - traversing filesystem ...
> > - traversal finished ...
> > - moving disconnected inodes to lost+found ...
> > Phase 7 - verify link counts...
> > No modify flag set, skipping filesystem flush and exiting.
> > *** end xfs_repair output
>
> I hit this corruption again today with 4.10-rc2 kernel & latest master
> branch of xfsprogs, still ppc64le host.
>
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> attribute entry #0 in attr block 0, inode 10236 is INCOMPLETE
> problem with attribute contents in inode 10236
> would clear attr fork
> bad nblocks 10 for inode 10236, would reset to 0
> bad anextents 1 for inode 10236, would reset to 0
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 2
> - agno = 3
> - agno = 1
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> *** end xfs_repair output
>
Isn't this the same as rhbz 1377163 (not sure why that bug appears to be
locked)? E.g., this is basically due to the fact that remote attribute
block allocation occurs in a separate transaction. The existence of the
incomplete flag means that by design, logging doesn't guarantee
consistency for such attributes.
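The point about separate transactions can be illustrated with a toy model. This is a sketch only and not kernel code: the dict-based "disk", the three steps, the 4096-byte block size, and all names are assumptions made for illustration. Each step stands in for its own transaction, and a crash between steps leaves the entry marked incomplete.

```python
BLOCK_SIZE = 4096  # assumed block size for the toy model


def set_remote_attr(disk, name, value, crash_after=None):
    """Set an attribute in three separate steps ("transactions").

    crash_after=N simulates a crash after N steps have completed;
    None means every step runs.
    """
    def add_entry():
        # Step 1: the entry is added, flagged incomplete.
        disk[name] = {"incomplete": True, "remote_blocks": 0, "value": None}

    def alloc_and_write():
        # Step 2: remote blocks are allocated and the value is written.
        disk[name]["remote_blocks"] = -(-len(value) // BLOCK_SIZE)
        disk[name]["value"] = value

    def clear_flag():
        # Step 3: the incomplete flag is cleared; the attr becomes visible.
        disk[name]["incomplete"] = False

    for index, step in enumerate((add_entry, alloc_and_write, clear_flag)):
        if crash_after is not None and index >= crash_after:
            return  # simulated crash: remaining steps never happen
        step()


def lookup(disk, name):
    """Return the value, treating incomplete entries as absent."""
    entry = disk.get(name)
    if entry is None or entry["incomplete"]:
        return None
    return entry["value"]
```

In this model a crash after the second step leaves an entry that owns blocks but is still flagged incomplete, which resembles the "is INCOMPLETE" plus non-zero nblocks state that xfs_repair reports above.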
Brian
> I attached compressed xfs-181.full file, in case someone has interest to
> look into it.
>
> Thanks,
> Eryu