* [PATCH 0/2] xfs: swapext replay fixes
@ 2018-12-18 18:59 Eric Sandeen
2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 18:59 UTC (permalink / raw)
To: linux-xfs
We've seen 2 reports of an oops on log replay after a swapext, which
look something like:
[ 63.188907] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 63.196774] IP: [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs]
...
[ 63.309769] RIP: 0010:[<ffffffffc093e5e6>] [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs]
[ 63.319132] RSP: 0018:ffff951b4d2977d0 EFLAGS: 00010282
[ 63.324443] RAX: ffff951b4df63440 RBX: ffff951ba8a48000 RCX: 0000000000000000
[ 63.331570] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff951b4df63518
[ 63.338689] RBP: ffff951b4d2977f8 R08: 000000000001e2f0 R09: ffff951b4df63440
[ 63.345818] R10: ffff951afa403800 R11: ffffffffffffffe0 R12: ffff951b4d0a3000
[ 63.352949] R13: 0000000000000000 R14: 0000000000000000 R15: ffff951ba8a48040
[ 63.360079] FS: 00007f72bcd1f880(0000) GS:ffff951bafc80000(0000) knlGS:0000000000000000
[ 63.368160] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.373893] CR2: 0000000000000004 CR3: 0000000128d8a000 CR4: 00000000000007e0
[ 63.381029] Call Trace:
[ 63.383541] [<ffffffffc093e7f7>] xfs_bmbt_change_owner+0x27/0x70 [xfs]
[ 63.390225] [<ffffffffc09941a3>] xfs_recover_inode_owner_change.isra.26+0xa3/0xc0 [xfs]
[ 63.398351] [<ffffffffc0995b9c>] xlog_recover_inode_pass2+0x4cc/0x9d0 [xfs]
[ 63.405460] [<ffffffffc0990018>] ? xfs_efi_release+0x58/0x80 [xfs]
[ 63.411763] [<ffffffffc0996192>] xlog_recover_commit_pass2+0xf2/0x1a0 [xfs]
[ 63.418848] [<ffffffffc0996289>] xlog_recover_items_pass2+0x49/0x70 [xfs]
[ 63.425769] [<ffffffffc09964c5>] xlog_recover_commit_trans+0x215/0x250 [xfs]
...
I can reproduce this with a hacky script that can be cleaned up and turned into
an xfstest:
#!/bin/bash
DEV=/dev/sdZ1
umount $DEV
mkfs.xfs -f $DEV
mkdir -p /mnt/test
mount $DEV /mnt/test
for I in `seq 4194304 -4096 0`; do
((I % 12288)) || continue
xfs_io -f -d -c "pwrite $I 4096" /mnt/test/fragfile
done
xfs_fsr -v /mnt/test/fragfile
xfs_io -c "truncate 0" /mnt/test/fragfile
for I in `seq 32768 -4096 0`; do
xfs_io -f -c "pwrite $I 4096" /mnt/test/fragfile
done
sync
xfs_io -x -c shutdown /mnt/test
umount /mnt/test
mount $DEV /mnt/test
====
The upshot is that if we successfully fsr a file and it remains in btree format,
then it changes to non-btree format and we crash, log replay will still try to
do xfs_bmbt_change_owner which will oops if we're not in btree format.
Patches to fix this follow. Lightly tested. Thanks to dave for talking
through this one with me.
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change 2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen @ 2018-12-18 19:02 ` Eric Sandeen 2018-12-18 19:15 ` Darrick J. Wong 2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen 2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong 2 siblings, 1 reply; 10+ messages in thread From: Eric Sandeen @ 2018-12-18 19:02 UTC (permalink / raw) To: Eric Sandeen, linux-xfs Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER is set, XFS_ILOG_DBROOT must be as well, via an assert. However, this could fail to be true due to fuzzing or corruption, so we really should handle it gracefully rather than calling ASSERT() and crashing, or blowing past it on a non-debug build and BUGging later. Return -EFSCORRUPTED and fail the log replay if we find this inconsistent state. Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 1fc9e9042e0e..56148a3083b8 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change( } if (in_f->ilf_fields & XFS_ILOG_DOWNER) { - ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT); + if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) { + error = -EFSCORRUPTED; + goto out_free_ip; + } error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK, ip->i_ino, buffer_list); if (error) @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change( } if (in_f->ilf_fields & XFS_ILOG_AOWNER) { - ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT); + if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) { + error = -EFSCORRUPTED; + goto out_free_ip; + } error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK, ip->i_ino, buffer_list); if (error) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change 2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen @ 2018-12-18 19:15 ` Darrick J. Wong 2018-12-18 19:23 ` Eric Sandeen 0 siblings, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2018-12-18 19:15 UTC (permalink / raw) To: Eric Sandeen; +Cc: Eric Sandeen, linux-xfs On Tue, Dec 18, 2018 at 01:02:56PM -0600, Eric Sandeen wrote: > Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER > is set, XFS_ILOG_DBROOT must be as well, via an assert. However, this > could fail to be true due to fuzzing or corruption, so we really > should handle it gracefully rather than calling ASSERT() and crashing, > or blowing past it on a non-debug build and BUGging later. > > Return -EFSCORRUPTED and fail the log replay if we find this > inconsistent state. > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index 1fc9e9042e0e..56148a3083b8 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change( > } > > if (in_f->ilf_fields & XFS_ILOG_DOWNER) { > - ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT); > + if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) { > + error = -EFSCORRUPTED; > + goto out_free_ip; > + } > error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK, > ip->i_ino, buffer_list); > if (error) > @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change( > } > > if (in_f->ilf_fields & XFS_ILOG_AOWNER) { > - ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT); > + if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) { > + error = -EFSCORRUPTED; > + goto out_free_ip; Are there any downsides to changing the data fork owner and bailing out afterwards if we encounter the combination of (DOWNER | DBROOT | AOWNER)? Thinking this through, the log won't continue recovering, so you have to run xfs_repair -L which zaps the log and checks everything. We already finished the data fork bmbt update so (barring other problems) it should be fine. The attr fork won't have been updated, but its log entries were unrecoverable, so at worst we lose the attr fork, right? And we don't want the ABROOT check any earlier, because we don't want to forego a data fork owner update that might have succeeded anyway and we'll definitely lose it if we don't update it and xfs_repair encounters it. Right? If so, then, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > + } > error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK, > ip->i_ino, buffer_list); > if (error) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change 2018-12-18 19:15 ` Darrick J. Wong @ 2018-12-18 19:23 ` Eric Sandeen 2018-12-18 19:30 ` Darrick J. Wong 0 siblings, 1 reply; 10+ messages in thread From: Eric Sandeen @ 2018-12-18 19:23 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Eric Sandeen, linux-xfs On 12/18/18 1:15 PM, Darrick J. Wong wrote: > On Tue, Dec 18, 2018 at 01:02:56PM -0600, Eric Sandeen wrote: >> Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER >> is set, XFS_ILOG_DBROOT must be as well, via an assert. However, this >> could fail to be true due to fuzzing or corruption, so we really >> should handle it gracefully rather than calling ASSERT() and crashing, >> or blowing past it on a non-debug build and BUGging later. >> >> Return -EFSCORRUPTED and fail the log replay if we find this >> inconsistent state. >> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >> --- >> >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c >> index 1fc9e9042e0e..56148a3083b8 100644 >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change( >> } >> >> if (in_f->ilf_fields & XFS_ILOG_DOWNER) { >> - ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT); >> + if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) { >> + error = -EFSCORRUPTED; >> + goto out_free_ip; >> + } >> error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK, >> ip->i_ino, buffer_list); >> if (error) >> @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change( >> } >> >> if (in_f->ilf_fields & XFS_ILOG_AOWNER) { >> - ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT); >> + if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) { >> + error = -EFSCORRUPTED; >> + goto out_free_ip; > > Are there any downsides to changing the data fork owner and bailing out > afterwards if we encounter the combination of (DOWNER | DBROOT | AOWNER)? Not sure I understand the Q. (Maybe you mean DOWNER && !DBROOT?) > Thinking this through, the log won't continue recovering, so you have to > run xfs_repair -L which zaps the log and checks everything. We already > finished the data fork bmbt update so (barring other problems) it should > be fine. The attr fork won't have been updated, but its log entries > were unrecoverable, so at worst we lose the attr fork, right? TBH, I hadn't really thought through "recover as much as we can before deciding we have a problem" - if we encounter this, it's an inconsistent state in the log for whatever, and we should stop. I don't ... think we're in the business of trying to second guess or fix on the fly here, right? > And we don't want the ABROOT check any earlier, because we don't want to > forego a data fork owner update that might have succeeded anyway and > we'll definitely lose it if we don't update it and xfs_repair encounters > it. Right? Again, my caveman coder brain just said "inconsistent state -> stop now." Should we be doing more? -Eric > If so, then, > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > --D > >> + } >> error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK, >> ip->i_ino, buffer_list); >> if (error) >> > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change 2018-12-18 19:23 ` Eric Sandeen @ 2018-12-18 19:30 ` Darrick J. Wong 0 siblings, 0 replies; 10+ messages in thread From: Darrick J. Wong @ 2018-12-18 19:30 UTC (permalink / raw) To: Eric Sandeen; +Cc: Eric Sandeen, linux-xfs On Tue, Dec 18, 2018 at 01:23:22PM -0600, Eric Sandeen wrote: > On 12/18/18 1:15 PM, Darrick J. Wong wrote: > > On Tue, Dec 18, 2018 at 01:02:56PM -0600, Eric Sandeen wrote: > >> Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER > >> is set, XFS_ILOG_DBROOT must be as well, via an assert. However, this > >> could fail to be true due to fuzzing or corruption, so we really > >> should handle it gracefully rather than calling ASSERT() and crashing, > >> or blowing past it on a non-debug build and BUGging later. > >> > >> Return -EFSCORRUPTED and fail the log replay if we find this > >> inconsistent state. > >> > >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> > >> --- > >> > >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > >> index 1fc9e9042e0e..56148a3083b8 100644 > >> --- a/fs/xfs/xfs_log_recover.c > >> +++ b/fs/xfs/xfs_log_recover.c > >> @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change( > >> } > >> > >> if (in_f->ilf_fields & XFS_ILOG_DOWNER) { > >> - ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT); > >> + if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) { > >> + error = -EFSCORRUPTED; > >> + goto out_free_ip; > >> + } > >> error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK, > >> ip->i_ino, buffer_list); > >> if (error) > >> @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change( > >> } > >> > >> if (in_f->ilf_fields & XFS_ILOG_AOWNER) { > >> - ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT); > >> + if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) { > >> + error = -EFSCORRUPTED; > >> + goto out_free_ip; > > > > Are there any downsides to changing the data fork owner and bailing out > > afterwards if we encounter the combination of (DOWNER | DBROOT | AOWNER)? > > Not sure I understand the Q. > > (Maybe you mean DOWNER && !DBROOT?) No, I really did mean the case where DOWNER and DBROOT are set properly, but it's the AOWNER/ABROOT flags that aren't set properly. I was wondering why not check DOWNER/DBROOT and AOWNER/ABROOT before touching *anything* and was typing my way through it. > > Thinking this through, the log won't continue recovering, so you > > have to > > run xfs_repair -L which zaps the log and checks everything. We already > > finished the data fork bmbt update so (barring other problems) it should > > be fine. The attr fork won't have been updated, but its log entries > > were unrecoverable, so at worst we lose the attr fork, right? > > TBH, I hadn't really thought through "recover as much as we can before > deciding we have a problem" - if we encounter this, it's an inconsistent > state in the log for whatever, and we should stop. I don't ... think > we're in the business of trying to second guess or fix on the fly here, > right? If that's true then we ought to validate all four flags before calling xfs_bmbt_change_owner(), right? > > And we don't want the ABROOT check any earlier, because we don't want to > > forego a data fork owner update that might have succeeded anyway and > > we'll definitely lose it if we don't update it and xfs_repair encounters > > it. Right? > > Again, my caveman coder brain just said "inconsistent state -> stop now." > > Should we be doing more? See my reply to the second patch, sorry. :/ --D > -Eric > > > If so, then, > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > --D > > > >> + } > >> error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK, > >> ip->i_ino, buffer_list); > >> if (error) > >> > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats 2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen 2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen @ 2018-12-18 19:09 ` Eric Sandeen 2018-12-18 19:28 ` Darrick J. Wong 2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong 2 siblings, 1 reply; 10+ messages in thread From: Eric Sandeen @ 2018-12-18 19:09 UTC (permalink / raw) To: Eric Sandeen, linux-xfs If an inode had been in btree format and had a data fork owner change logged, that flag must be cleared if the inode changes format to non-btree. Otherwise we hit the old ASSERT (now changed to corruption detection) in xfs_recover_inode_owner_change() which enforces that if XFS_ILOG_[AD]OWNER is set, XFS_ILOG_[AD]BROOT must be as well. (Or, put more plainly, changing the fork owner is nonsensical for inodes which have no such fork.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index fa1c4fe2ffbf..82a13b7ad321 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -144,7 +144,8 @@ xfs_inode_item_format_data_fork( switch (ip->i_d.di_format) { case XFS_DINODE_FMT_EXTENTS: iip->ili_fields &= - ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEV); + ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | + XFS_ILOG_DEV | XFS_ILOG_DOWNER); if ((iip->ili_fields & XFS_ILOG_DEXT) && ip->i_d.di_nextents > 0 && @@ -185,7 +186,8 @@ xfs_inode_item_format_data_fork( break; case XFS_DINODE_FMT_LOCAL: iip->ili_fields &= - ~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV); + ~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | + XFS_ILOG_DEV | XFS_ILOG_DOWNER); if ((iip->ili_fields & XFS_ILOG_DDATA) && ip->i_df.if_bytes > 0) { /* @@ -206,7 +208,8 @@ xfs_inode_item_format_data_fork( break; case XFS_DINODE_FMT_DEV: iip->ili_fields &= - ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEXT); + ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | + XFS_ILOG_DEXT | XFS_ILOG_DOWNER); if (iip->ili_fields & XFS_ILOG_DEV) ilf->ilf_u.ilfu_rdev = sysv_encode_dev(VFS_I(ip)->i_rdev); break; @@ -229,7 +232,7 @@ xfs_inode_item_format_attr_fork( switch (ip->i_d.di_aformat) { case XFS_DINODE_FMT_EXTENTS: iip->ili_fields &= - ~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT); + ~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER); if ((iip->ili_fields & XFS_ILOG_AEXT) && ip->i_d.di_anextents > 0 && @@ -268,7 +271,7 @@ xfs_inode_item_format_attr_fork( break; case XFS_DINODE_FMT_LOCAL: iip->ili_fields &= - ~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT); + ~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER); if ((iip->ili_fields & XFS_ILOG_ADATA) && ip->i_afp->if_bytes > 0) { ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats 2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen @ 2018-12-18 19:28 ` Darrick J. Wong 2018-12-18 19:42 ` Eric Sandeen 0 siblings, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2018-12-18 19:28 UTC (permalink / raw) To: Eric Sandeen; +Cc: linux-xfs On Tue, Dec 18, 2018 at 01:09:40PM -0600, Eric Sandeen wrote: > If an inode had been in btree format and had a data fork owner change > logged, that flag must be cleared if the inode changes format to > non-btree. Otherwise we hit the old ASSERT (now changed to > corruption detection) in xfs_recover_inode_owner_change() which > enforces that if XFS_ILOG_[AD]OWNER is set, XFS_ILOG_[AD]BROOT > must be as well. > > (Or, put more plainly, changing the fork owner is nonsensical > for inodes which have no such fork.) ...and now I learn why not to review a series until all the patches show up. :( > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > index fa1c4fe2ffbf..82a13b7ad321 100644 > --- a/fs/xfs/xfs_inode_item.c > +++ b/fs/xfs/xfs_inode_item.c > @@ -144,7 +144,8 @@ xfs_inode_item_format_data_fork( > switch (ip->i_d.di_format) { > case XFS_DINODE_FMT_EXTENTS: > iip->ili_fields &= > - ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEV); > + ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | > + XFS_ILOG_DEV | XFS_ILOG_DOWNER); So ... existing kernels could accidentally log an inode with an extents format data fork and (DOWNER | DBROOT). If a kernel encounters such a log item, it would assert and kaboom, and now it'll fail the rest of log recovery, which forces the admin to zap the log. However, if the data fork is in extents format there isn't any work required to change the owner anyway, so why do we have to abort the log recovery in that case? IOWs, why doesn't the previous patch do: /* * Some old kernels mistakenly log inode items with DOWNER set on a * inode with a non-btree format data fork. DBROOT won't be set in this * non-btree case, but since the work that DOWNER does is only required * for the btree case we can safely ignore this one weird combination. */ if ((ili_fields & DOWNER) && di_format == FMT_BTREE) { if (!(ili_fields & DBROOT)) return -EFSCORRUPTED; error = xfs_bmbt_change_owner(...); } --D > > if ((iip->ili_fields & XFS_ILOG_DEXT) && > ip->i_d.di_nextents > 0 && > @@ -185,7 +186,8 @@ xfs_inode_item_format_data_fork( > break; > case XFS_DINODE_FMT_LOCAL: > iip->ili_fields &= > - ~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV); > + ~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | > + XFS_ILOG_DEV | XFS_ILOG_DOWNER); > if ((iip->ili_fields & XFS_ILOG_DDATA) && > ip->i_df.if_bytes > 0) { > /* > @@ -206,7 +208,8 @@ xfs_inode_item_format_data_fork( > break; > case XFS_DINODE_FMT_DEV: > iip->ili_fields &= > - ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEXT); > + ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | > + XFS_ILOG_DEXT | XFS_ILOG_DOWNER); > if (iip->ili_fields & XFS_ILOG_DEV) > ilf->ilf_u.ilfu_rdev = sysv_encode_dev(VFS_I(ip)->i_rdev); > break; > @@ -229,7 +232,7 @@ xfs_inode_item_format_attr_fork( > switch (ip->i_d.di_aformat) { > case XFS_DINODE_FMT_EXTENTS: > iip->ili_fields &= > - ~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT); > + ~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER); > > if ((iip->ili_fields & XFS_ILOG_AEXT) && > ip->i_d.di_anextents > 0 && > @@ -268,7 +271,7 @@ xfs_inode_item_format_attr_fork( > break; > case XFS_DINODE_FMT_LOCAL: > iip->ili_fields &= > - ~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT); > + ~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER); > > if ((iip->ili_fields & XFS_ILOG_ADATA) && > ip->i_afp->if_bytes > 0) { > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats 2018-12-18 19:28 ` Darrick J. Wong @ 2018-12-18 19:42 ` Eric Sandeen 0 siblings, 0 replies; 10+ messages in thread From: Eric Sandeen @ 2018-12-18 19:42 UTC (permalink / raw) To: Darrick J. Wong, Eric Sandeen; +Cc: linux-xfs On 12/18/18 1:28 PM, Darrick J. Wong wrote: > On Tue, Dec 18, 2018 at 01:09:40PM -0600, Eric Sandeen wrote: >> If an inode had been in btree format and had a data fork owner change >> logged, that flag must be cleared if the inode changes format to >> non-btree. Otherwise we hit the old ASSERT (now changed to >> corruption detection) in xfs_recover_inode_owner_change() which >> enforces that if XFS_ILOG_[AD]OWNER is set, XFS_ILOG_[AD]BROOT >> must be as well. >> >> (Or, put more plainly, changing the fork owner is nonsensical >> for inodes which have no such fork.) > > ...and now I learn why not to review a series until all the patches show > up. :( > >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >> --- >> >> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c >> index fa1c4fe2ffbf..82a13b7ad321 100644 >> --- a/fs/xfs/xfs_inode_item.c >> +++ b/fs/xfs/xfs_inode_item.c >> @@ -144,7 +144,8 @@ xfs_inode_item_format_data_fork( >> switch (ip->i_d.di_format) { >> case XFS_DINODE_FMT_EXTENTS: >> iip->ili_fields &= >> - ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEV); >> + ~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | >> + XFS_ILOG_DEV | XFS_ILOG_DOWNER); > > So ... existing kernels could accidentally log an inode with an extents > format data fork and (DOWNER | DBROOT). If a kernel encounters such a > log item, it would assert and kaboom, and now it'll fail the rest of log > recovery, which forces the admin to zap the log. > > However, if the data fork is in extents format there isn't any work > required to change the owner anyway, so why do we have to abort the log > recovery in that case? Because I didn't think we were in the business of guessing which bits of inconsistency indicate corruption and which bits are "probably ok" during log recovery... We have a mismatch between the logged fields and the format. Your suggestion seems to put a lot of faith in the format being correct, and therefore deciding to ignore the inconsistency in the fields, no? i.e. if we don't see FMT_BTREE, what justification do we have for completely ignoring the flag inconsistency? What if the format was wrong? I don't really see how we can make that prioritization. Maybe I'm not thinking creatively enough and you can convince me. :) (Granted, the admin recovery path of "throw it all away" does suck, but such is life.) -Eric > IOWs, why doesn't the previous patch do: > > /* > * Some old kernels mistakenly log inode items with DOWNER set on a > * inode with a non-btree format data fork. DBROOT won't be set in this > * non-btree case, but since the work that DOWNER does is only required > * for the btree case we can safely ignore this one weird combination. > */ > if ((ili_fields & DOWNER) && di_format == FMT_BTREE) { > if (!(ili_fields & DBROOT)) > return -EFSCORRUPTED; > error = xfs_bmbt_change_owner(...); > } > > --D ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] xfs: swapext replay fixes 2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen 2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen 2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen @ 2018-12-18 19:31 ` Darrick J. Wong 2018-12-18 19:43 ` Eric Sandeen 2 siblings, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2018-12-18 19:31 UTC (permalink / raw) To: Eric Sandeen; +Cc: linux-xfs On Tue, Dec 18, 2018 at 12:59:15PM -0600, Eric Sandeen wrote: > We've seen 2 reports of an oops on log replay after a swapext, which > look something like: > > [ 63.188907] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 > [ 63.196774] IP: [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] > ... > [ 63.309769] RIP: 0010:[<ffffffffc093e5e6>] [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] > [ 63.319132] RSP: 0018:ffff951b4d2977d0 EFLAGS: 00010282 > [ 63.324443] RAX: ffff951b4df63440 RBX: ffff951ba8a48000 RCX: 0000000000000000 > [ 63.331570] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff951b4df63518 > [ 63.338689] RBP: ffff951b4d2977f8 R08: 000000000001e2f0 R09: ffff951b4df63440 > [ 63.345818] R10: ffff951afa403800 R11: ffffffffffffffe0 R12: ffff951b4d0a3000 > [ 63.352949] R13: 0000000000000000 R14: 0000000000000000 R15: ffff951ba8a48040 > [ 63.360079] FS: 00007f72bcd1f880(0000) GS:ffff951bafc80000(0000) knlGS:0000000000000000 > [ 63.368160] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 63.373893] CR2: 0000000000000004 CR3: 0000000128d8a000 CR4: 00000000000007e0 > [ 63.381029] Call Trace: > [ 63.383541] [<ffffffffc093e7f7>] xfs_bmbt_change_owner+0x27/0x70 [xfs] > [ 63.390225] [<ffffffffc09941a3>] xfs_recover_inode_owner_change.isra.26+0xa3/0xc0 [xfs] > [ 63.398351] [<ffffffffc0995b9c>] xlog_recover_inode_pass2+0x4cc/0x9d0 [xfs] > [ 63.405460] [<ffffffffc0990018>] ? xfs_efi_release+0x58/0x80 [xfs] > [ 63.411763] [<ffffffffc0996192>] xlog_recover_commit_pass2+0xf2/0x1a0 [xfs] > [ 63.418848] [<ffffffffc0996289>] xlog_recover_items_pass2+0x49/0x70 [xfs] > [ 63.425769] [<ffffffffc09964c5>] xlog_recover_commit_trans+0x215/0x250 [xfs] > ... > > I can reproduce this with a hacky script that can be cleaned up and turned into > an xfstest: > > #!/bin/bash > > DEV=/dev/sdZ1 > > umount $DEV > mkfs.xfs -f $DEV > > mkdir -p /mnt/test > mount $DEV /mnt/test > > for I in `seq 4194304 -4096 0`; do > ((I % 12288)) || continue > xfs_io -f -d -c "pwrite $I 4096" /mnt/test/fragfile > done > > xfs_fsr -v /mnt/test/fragfile > > > xfs_io -c "truncate 0" /mnt/test/fragfile > > for I in `seq 32768 -4096 0`; do > xfs_io -f -c "pwrite $I 4096" /mnt/test/fragfile > done > > sync > > xfs_io -x -c shutdown /mnt/test > umount /mnt/test > mount $DEV /mnt/test ...which I assume is going to show up on fstests shortly, right? :) --D > ==== > > The upshot is that if we successfully fsr a file and it remains in btree format, > then it changes to non-btree format and we crash, log replay will still try to > do xfs_bmbt_change_owner which will oops if we're not in btree format. > > Patches to fix this follow. Lightly tested. Thanks to dave for talking > through this one with me. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] xfs: swapext replay fixes 2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong @ 2018-12-18 19:43 ` Eric Sandeen 0 siblings, 0 replies; 10+ messages in thread From: Eric Sandeen @ 2018-12-18 19:43 UTC (permalink / raw) To: Darrick J. Wong, Eric Sandeen; +Cc: linux-xfs On 12/18/18 1:31 PM, Darrick J. Wong wrote: > On Tue, Dec 18, 2018 at 12:59:15PM -0600, Eric Sandeen wrote: >> We've seen 2 reports of an oops on log replay after a swapext, which >> look something like: >> >> [ 63.188907] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 >> [ 63.196774] IP: [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] >> ... >> [ 63.309769] RIP: 0010:[<ffffffffc093e5e6>] [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] >> [ 63.319132] RSP: 0018:ffff951b4d2977d0 EFLAGS: 00010282 >> [ 63.324443] RAX: ffff951b4df63440 RBX: ffff951ba8a48000 RCX: 0000000000000000 >> [ 63.331570] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff951b4df63518 >> [ 63.338689] RBP: ffff951b4d2977f8 R08: 000000000001e2f0 R09: ffff951b4df63440 >> [ 63.345818] R10: ffff951afa403800 R11: ffffffffffffffe0 R12: ffff951b4d0a3000 >> [ 63.352949] R13: 0000000000000000 R14: 0000000000000000 R15: ffff951ba8a48040 >> [ 63.360079] FS: 00007f72bcd1f880(0000) GS:ffff951bafc80000(0000) knlGS:0000000000000000 >> [ 63.368160] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> [ 63.373893] CR2: 0000000000000004 CR3: 0000000128d8a000 CR4: 00000000000007e0 >> [ 63.381029] Call Trace: >> [ 63.383541] [<ffffffffc093e7f7>] xfs_bmbt_change_owner+0x27/0x70 [xfs] >> [ 63.390225] [<ffffffffc09941a3>] xfs_recover_inode_owner_change.isra.26+0xa3/0xc0 [xfs] >> [ 63.398351] [<ffffffffc0995b9c>] xlog_recover_inode_pass2+0x4cc/0x9d0 [xfs] >> [ 63.405460] [<ffffffffc0990018>] ? xfs_efi_release+0x58/0x80 [xfs] >> [ 63.411763] [<ffffffffc0996192>] xlog_recover_commit_pass2+0xf2/0x1a0 [xfs] >> [ 63.418848] [<ffffffffc0996289>] xlog_recover_items_pass2+0x49/0x70 [xfs] >> [ 63.425769] [<ffffffffc09964c5>] xlog_recover_commit_trans+0x215/0x250 [xfs] >> ... >> >> I can reproduce this with a hacky script that can be cleaned up and turned into >> an xfstest: ... > ...which I assume is going to show up on fstests shortly, right? :) yes, either zorro or I will do it. -Eric ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-12-18 19:43 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen 2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen 2018-12-18 19:15 ` Darrick J. Wong 2018-12-18 19:23 ` Eric Sandeen 2018-12-18 19:30 ` Darrick J. Wong 2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen 2018-12-18 19:28 ` Darrick J. Wong 2018-12-18 19:42 ` Eric Sandeen 2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong 2018-12-18 19:43 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox