[PATCH 0/2] xfs: swapext replay fixes

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] xfs: swapext replay fixes
@ 2018-12-18 18:59 Eric Sandeen
  2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 18:59 UTC (permalink / raw)
  To: linux-xfs

We've seen 2 reports of an oops on log replay after a swapext, which
look something like:

[   63.188907] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 
[   63.196774] IP: [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] 
...
[   63.309769] RIP: 0010:[<ffffffffc093e5e6>]  [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] 
[   63.319132] RSP: 0018:ffff951b4d2977d0  EFLAGS: 00010282 
[   63.324443] RAX: ffff951b4df63440 RBX: ffff951ba8a48000 RCX: 0000000000000000 
[   63.331570] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff951b4df63518 
[   63.338689] RBP: ffff951b4d2977f8 R08: 000000000001e2f0 R09: ffff951b4df63440 
[   63.345818] R10: ffff951afa403800 R11: ffffffffffffffe0 R12: ffff951b4d0a3000 
[   63.352949] R13: 0000000000000000 R14: 0000000000000000 R15: ffff951ba8a48040 
[   63.360079] FS:  00007f72bcd1f880(0000) GS:ffff951bafc80000(0000) knlGS:0000000000000000 
[   63.368160] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
[   63.373893] CR2: 0000000000000004 CR3: 0000000128d8a000 CR4: 00000000000007e0 
[   63.381029] Call Trace: 
[   63.383541]  [<ffffffffc093e7f7>] xfs_bmbt_change_owner+0x27/0x70 [xfs] 
[   63.390225]  [<ffffffffc09941a3>] xfs_recover_inode_owner_change.isra.26+0xa3/0xc0 [xfs] 
[   63.398351]  [<ffffffffc0995b9c>] xlog_recover_inode_pass2+0x4cc/0x9d0 [xfs] 
[   63.405460]  [<ffffffffc0990018>] ? xfs_efi_release+0x58/0x80 [xfs] 
[   63.411763]  [<ffffffffc0996192>] xlog_recover_commit_pass2+0xf2/0x1a0 [xfs] 
[   63.418848]  [<ffffffffc0996289>] xlog_recover_items_pass2+0x49/0x70 [xfs] 
[   63.425769]  [<ffffffffc09964c5>] xlog_recover_commit_trans+0x215/0x250 [xfs] 
...

I can reproduce this with a hacky script that can be cleaned up and turned into
an xfstest:

#!/bin/bash

DEV=/dev/sdZ1

umount $DEV
mkfs.xfs -f $DEV

mkdir -p /mnt/test
mount $DEV /mnt/test

for I in `seq 4194304 -4096 0`; do
	((I % 12288)) || continue
	xfs_io -f -d -c "pwrite $I 4096" /mnt/test/fragfile
done

xfs_fsr -v /mnt/test/fragfile


xfs_io -c "truncate 0" /mnt/test/fragfile

for I in `seq 32768 -4096 0`; do
	xfs_io -f -c "pwrite $I 4096" /mnt/test/fragfile
done

sync

xfs_io -x -c shutdown /mnt/test
umount /mnt/test
mount $DEV /mnt/test

====

The upshot is that if we successfully fsr a file and it remains in btree format,
then it changes to non-btree format and we crash, log replay will still try to
do xfs_bmbt_change_owner which will oops if we're not in btree format.

Patches to fix this follow.  Lightly tested.  Thanks to dave for talking
through this one with me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change
  2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen
@ 2018-12-18 19:02 ` Eric Sandeen
  2018-12-18 19:15   ` Darrick J. Wong
  2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen
  2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong
  2 siblings, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 19:02 UTC (permalink / raw)
  To: Eric Sandeen, linux-xfs

Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER
is set, XFS_ILOG_DBROOT must be as well, via an assert.  However, this 
could fail to be true due to fuzzing or corruption, so we really
should handle it gracefully rather than calling ASSERT() and crashing,
or blowing past it on a non-debug build and BUGging later.

Return -EFSCORRUPTED and fail the log replay if we find this
inconsistent state.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1fc9e9042e0e..56148a3083b8 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change(
 	}
 
 	if (in_f->ilf_fields & XFS_ILOG_DOWNER) {
-		ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT);
+		if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) {
+			error = -EFSCORRUPTED;
+			goto out_free_ip;
+		}
 		error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK,
 					      ip->i_ino, buffer_list);
 		if (error)
@@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change(
 	}
 
 	if (in_f->ilf_fields & XFS_ILOG_AOWNER) {
-		ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT);
+		if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) {
+			error = -EFSCORRUPTED;
+			goto out_free_ip;
+		}
 		error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK,
 					      ip->i_ino, buffer_list);
 		if (error)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change
  2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen
@ 2018-12-18 19:15   ` Darrick J. Wong
  2018-12-18 19:23     ` Eric Sandeen
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-12-18 19:15 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Eric Sandeen, linux-xfs

On Tue, Dec 18, 2018 at 01:02:56PM -0600, Eric Sandeen wrote:
> Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER
> is set, XFS_ILOG_DBROOT must be as well, via an assert.  However, this 
> could fail to be true due to fuzzing or corruption, so we really
> should handle it gracefully rather than calling ASSERT() and crashing,
> or blowing past it on a non-debug build and BUGging later.
> 
> Return -EFSCORRUPTED and fail the log replay if we find this
> inconsistent state.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1fc9e9042e0e..56148a3083b8 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change(
>  	}
>  
>  	if (in_f->ilf_fields & XFS_ILOG_DOWNER) {
> -		ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT);
> +		if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) {
> +			error = -EFSCORRUPTED;
> +			goto out_free_ip;
> +		}
>  		error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK,
>  					      ip->i_ino, buffer_list);
>  		if (error)
> @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change(
>  	}
>  
>  	if (in_f->ilf_fields & XFS_ILOG_AOWNER) {
> -		ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT);
> +		if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) {
> +			error = -EFSCORRUPTED;
> +			goto out_free_ip;

Are there any downsides to changing the data fork owner and bailing out
afterwards if we encounter the combination of (DOWNER | DBROOT | AOWNER)?

Thinking this through, the log won't continue recovering, so you have to
run xfs_repair -L which zaps the log and checks everything.  We already
finished the data fork bmbt update so (barring other problems) it should
be fine.  The attr fork won't have been updated, but its log entries
were unrecoverable, so at worst we lose the attr fork, right?

And we don't want the ABROOT check any earlier, because we don't want to
forego a data fork owner update that might have succeeded anyway and
we'll definitely lose it if we don't update it and xfs_repair encounters
it.  Right?

If so, then,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +		}
>  		error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK,
>  					      ip->i_ino, buffer_list);
>  		if (error)
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change
  2018-12-18 19:15   ` Darrick J. Wong
@ 2018-12-18 19:23     ` Eric Sandeen
  2018-12-18 19:30       ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 19:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eric Sandeen, linux-xfs

On 12/18/18 1:15 PM, Darrick J. Wong wrote:
> On Tue, Dec 18, 2018 at 01:02:56PM -0600, Eric Sandeen wrote:
>> Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER
>> is set, XFS_ILOG_DBROOT must be as well, via an assert.  However, this 
>> could fail to be true due to fuzzing or corruption, so we really
>> should handle it gracefully rather than calling ASSERT() and crashing,
>> or blowing past it on a non-debug build and BUGging later.
>>
>> Return -EFSCORRUPTED and fail the log replay if we find this
>> inconsistent state.
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index 1fc9e9042e0e..56148a3083b8 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change(
>>  	}
>>  
>>  	if (in_f->ilf_fields & XFS_ILOG_DOWNER) {
>> -		ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT);
>> +		if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) {
>> +			error = -EFSCORRUPTED;
>> +			goto out_free_ip;
>> +		}
>>  		error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK,
>>  					      ip->i_ino, buffer_list);
>>  		if (error)
>> @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change(
>>  	}
>>  
>>  	if (in_f->ilf_fields & XFS_ILOG_AOWNER) {
>> -		ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT);
>> +		if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) {
>> +			error = -EFSCORRUPTED;
>> +			goto out_free_ip;
> 
> Are there any downsides to changing the data fork owner and bailing out
> afterwards if we encounter the combination of (DOWNER | DBROOT | AOWNER)?

Not sure I understand the Q.

(Maybe you mean DOWNER && !DBROOT?)
 
> Thinking this through, the log won't continue recovering, so you have to
> run xfs_repair -L which zaps the log and checks everything.  We already
> finished the data fork bmbt update so (barring other problems) it should
> be fine.  The attr fork won't have been updated, but its log entries
> were unrecoverable, so at worst we lose the attr fork, right?

TBH, I hadn't really thought through "recover as much as we can before
deciding we have a problem" - if we encounter this, it's an inconsistent
state in the log for whatever, and we should stop.  I don't ... think
we're in the business of trying to second guess or fix on the fly here,
right?

> And we don't want the ABROOT check any earlier, because we don't want to
> forego a data fork owner update that might have succeeded anyway and
> we'll definitely lose it if we don't update it and xfs_repair encounters
> it.  Right?

Again, my caveman coder brain just said "inconsistent state -> stop now."

Should we be doing more?

-Eric

> If so, then,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> --D
> 
>> +		}
>>  		error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK,
>>  					      ip->i_ino, buffer_list);
>>  		if (error)
>>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change
  2018-12-18 19:23     ` Eric Sandeen
@ 2018-12-18 19:30       ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2018-12-18 19:30 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Eric Sandeen, linux-xfs

On Tue, Dec 18, 2018 at 01:23:22PM -0600, Eric Sandeen wrote:
> On 12/18/18 1:15 PM, Darrick J. Wong wrote:
> > On Tue, Dec 18, 2018 at 01:02:56PM -0600, Eric Sandeen wrote:
> >> Today, xfs_recover_inode_owner_change() indicates that if XFS_ILOG_DOWNER
> >> is set, XFS_ILOG_DBROOT must be as well, via an assert.  However, this 
> >> could fail to be true due to fuzzing or corruption, so we really
> >> should handle it gracefully rather than calling ASSERT() and crashing,
> >> or blowing past it on a non-debug build and BUGging later.
> >>
> >> Return -EFSCORRUPTED and fail the log replay if we find this
> >> inconsistent state.
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >> ---
> >>
> >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> >> index 1fc9e9042e0e..56148a3083b8 100644
> >> --- a/fs/xfs/xfs_log_recover.c
> >> +++ b/fs/xfs/xfs_log_recover.c
> >> @@ -2964,7 +2964,10 @@ xfs_recover_inode_owner_change(
> >>  	}
> >>  
> >>  	if (in_f->ilf_fields & XFS_ILOG_DOWNER) {
> >> -		ASSERT(in_f->ilf_fields & XFS_ILOG_DBROOT);
> >> +		if (!(in_f->ilf_fields & XFS_ILOG_DBROOT)) {
> >> +			error = -EFSCORRUPTED;
> >> +			goto out_free_ip;
> >> +		}
> >>  		error = xfs_bmbt_change_owner(NULL, ip, XFS_DATA_FORK,
> >>  					      ip->i_ino, buffer_list);
> >>  		if (error)
> >> @@ -2972,7 +2975,10 @@ xfs_recover_inode_owner_change(
> >>  	}
> >>  
> >>  	if (in_f->ilf_fields & XFS_ILOG_AOWNER) {
> >> -		ASSERT(in_f->ilf_fields & XFS_ILOG_ABROOT);
> >> +		if (!(in_f->ilf_fields & XFS_ILOG_ABROOT)) {
> >> +			error = -EFSCORRUPTED;
> >> +			goto out_free_ip;
> > 
> > Are there any downsides to changing the data fork owner and bailing out
> > afterwards if we encounter the combination of (DOWNER | DBROOT | AOWNER)?
> 
> Not sure I understand the Q.
> 
> (Maybe you mean DOWNER && !DBROOT?)

No, I really did mean the case where DOWNER and DBROOT are set properly,
but it's the AOWNER/ABROOT flags that aren't set properly.  I was
wondering why not check DOWNER/DBROOT and AOWNER/ABROOT before touching
*anything* and was typing my way through it.

> > Thinking this through, the log won't continue recovering, so you
> > have to
> > run xfs_repair -L which zaps the log and checks everything.  We already
> > finished the data fork bmbt update so (barring other problems) it should
> > be fine.  The attr fork won't have been updated, but its log entries
> > were unrecoverable, so at worst we lose the attr fork, right?
> 
> TBH, I hadn't really thought through "recover as much as we can before
> deciding we have a problem" - if we encounter this, it's an inconsistent
> state in the log for whatever, and we should stop.  I don't ... think
> we're in the business of trying to second guess or fix on the fly here,
> right?

If that's true then we ought to validate all four flags before calling
xfs_bmbt_change_owner(), right?

> > And we don't want the ABROOT check any earlier, because we don't want to
> > forego a data fork owner update that might have succeeded anyway and
> > we'll definitely lose it if we don't update it and xfs_repair encounters
> > it.  Right?
> 
> Again, my caveman coder brain just said "inconsistent state -> stop now."
> 
> Should we be doing more?

See my reply to the second patch, sorry. :/

--D

> -Eric
> 
> > If so, then,
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > --D
> > 
> >> +		}
> >>  		error = xfs_bmbt_change_owner(NULL, ip, XFS_ATTR_FORK,
> >>  					      ip->i_ino, buffer_list);
> >>  		if (error)
> >>
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats
  2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen
  2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen
@ 2018-12-18 19:09 ` Eric Sandeen
  2018-12-18 19:28   ` Darrick J. Wong
  2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong
  2 siblings, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 19:09 UTC (permalink / raw)
  To: Eric Sandeen, linux-xfs

If an inode had been in btree format and had a data fork owner change
logged, that flag must be cleared if the inode changes format to
non-btree.  Otherwise we hit the old ASSERT (now changed to
corruption detection) in xfs_recover_inode_owner_change() which
enforces that if XFS_ILOG_[AD]OWNER is set, XFS_ILOG_[AD]BROOT
must be as well.

(Or, put more plainly, changing the fork owner is nonsensical
for inodes which have no such fork.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index fa1c4fe2ffbf..82a13b7ad321 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -144,7 +144,8 @@ xfs_inode_item_format_data_fork(
 	switch (ip->i_d.di_format) {
 	case XFS_DINODE_FMT_EXTENTS:
 		iip->ili_fields &=
-			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
+			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT |
+			  XFS_ILOG_DEV | XFS_ILOG_DOWNER);
 
 		if ((iip->ili_fields & XFS_ILOG_DEXT) &&
 		    ip->i_d.di_nextents > 0 &&
@@ -185,7 +186,8 @@ xfs_inode_item_format_data_fork(
 		break;
 	case XFS_DINODE_FMT_LOCAL:
 		iip->ili_fields &=
-			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
+			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT |
+			  XFS_ILOG_DEV | XFS_ILOG_DOWNER);
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
 			/*
@@ -206,7 +208,8 @@ xfs_inode_item_format_data_fork(
 		break;
 	case XFS_DINODE_FMT_DEV:
 		iip->ili_fields &=
-			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEXT);
+			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT |
+			  XFS_ILOG_DEXT | XFS_ILOG_DOWNER);
 		if (iip->ili_fields & XFS_ILOG_DEV)
 			ilf->ilf_u.ilfu_rdev = sysv_encode_dev(VFS_I(ip)->i_rdev);
 		break;
@@ -229,7 +232,7 @@ xfs_inode_item_format_attr_fork(
 	switch (ip->i_d.di_aformat) {
 	case XFS_DINODE_FMT_EXTENTS:
 		iip->ili_fields &=
-			~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT);
+			~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER);
 
 		if ((iip->ili_fields & XFS_ILOG_AEXT) &&
 		    ip->i_d.di_anextents > 0 &&
@@ -268,7 +271,7 @@ xfs_inode_item_format_attr_fork(
 		break;
 	case XFS_DINODE_FMT_LOCAL:
 		iip->ili_fields &=
-			~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT);
+			~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER);
 
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats
  2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen
@ 2018-12-18 19:28   ` Darrick J. Wong
  2018-12-18 19:42     ` Eric Sandeen
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-12-18 19:28 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Tue, Dec 18, 2018 at 01:09:40PM -0600, Eric Sandeen wrote:
> If an inode had been in btree format and had a data fork owner change
> logged, that flag must be cleared if the inode changes format to
> non-btree.  Otherwise we hit the old ASSERT (now changed to
> corruption detection) in xfs_recover_inode_owner_change() which
> enforces that if XFS_ILOG_[AD]OWNER is set, XFS_ILOG_[AD]BROOT
> must be as well.
> 
> (Or, put more plainly, changing the fork owner is nonsensical
> for inodes which have no such fork.)

...and now I learn why not to review a series until all the patches show
up. :(

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index fa1c4fe2ffbf..82a13b7ad321 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -144,7 +144,8 @@ xfs_inode_item_format_data_fork(
>  	switch (ip->i_d.di_format) {
>  	case XFS_DINODE_FMT_EXTENTS:
>  		iip->ili_fields &=
> -			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
> +			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT |
> +			  XFS_ILOG_DEV | XFS_ILOG_DOWNER);

So ... existing kernels could accidentally log an inode with an extents
format data fork and (DOWNER | DBROOT).  If a kernel encounters such a
log item, it would assert and kaboom, and now it'll fail the rest of log
recovery, which forces the admin to zap the log.

However, if the data fork is in extents format there isn't any work
required to change the owner anyway, so why do we have to abort the log
recovery in that case?

IOWs, why doesn't the previous patch do:

/*
 * Some old kernels mistakenly log inode items with DOWNER set on a
 * inode with a non-btree format data fork.  DBROOT won't be set in this
 * non-btree case, but since the work that DOWNER does is only required
 * for the btree case we can safely ignore this one weird combination.
 */
if ((ili_fields & DOWNER) && di_format == FMT_BTREE) {
	if (!(ili_fields & DBROOT))
		return -EFSCORRUPTED;
	error = xfs_bmbt_change_owner(...);
}

--D

>  
>  		if ((iip->ili_fields & XFS_ILOG_DEXT) &&
>  		    ip->i_d.di_nextents > 0 &&
> @@ -185,7 +186,8 @@ xfs_inode_item_format_data_fork(
>  		break;
>  	case XFS_DINODE_FMT_LOCAL:
>  		iip->ili_fields &=
> -			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
> +			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT |
> +			  XFS_ILOG_DEV | XFS_ILOG_DOWNER);
>  		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
>  		    ip->i_df.if_bytes > 0) {
>  			/*
> @@ -206,7 +208,8 @@ xfs_inode_item_format_data_fork(
>  		break;
>  	case XFS_DINODE_FMT_DEV:
>  		iip->ili_fields &=
> -			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEXT);
> +			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT |
> +			  XFS_ILOG_DEXT | XFS_ILOG_DOWNER);
>  		if (iip->ili_fields & XFS_ILOG_DEV)
>  			ilf->ilf_u.ilfu_rdev = sysv_encode_dev(VFS_I(ip)->i_rdev);
>  		break;
> @@ -229,7 +232,7 @@ xfs_inode_item_format_attr_fork(
>  	switch (ip->i_d.di_aformat) {
>  	case XFS_DINODE_FMT_EXTENTS:
>  		iip->ili_fields &=
> -			~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT);
> +			~(XFS_ILOG_ADATA | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER);
>  
>  		if ((iip->ili_fields & XFS_ILOG_AEXT) &&
>  		    ip->i_d.di_anextents > 0 &&
> @@ -268,7 +271,7 @@ xfs_inode_item_format_attr_fork(
>  		break;
>  	case XFS_DINODE_FMT_LOCAL:
>  		iip->ili_fields &=
> -			~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT);
> +			~(XFS_ILOG_AEXT | XFS_ILOG_ABROOT | XFS_ILOG_AOWNER);
>  
>  		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
>  		    ip->i_afp->if_bytes > 0) {
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats
  2018-12-18 19:28   ` Darrick J. Wong
@ 2018-12-18 19:42     ` Eric Sandeen
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 19:42 UTC (permalink / raw)
  To: Darrick J. Wong, Eric Sandeen; +Cc: linux-xfs



On 12/18/18 1:28 PM, Darrick J. Wong wrote:
> On Tue, Dec 18, 2018 at 01:09:40PM -0600, Eric Sandeen wrote:
>> If an inode had been in btree format and had a data fork owner change
>> logged, that flag must be cleared if the inode changes format to
>> non-btree.  Otherwise we hit the old ASSERT (now changed to
>> corruption detection) in xfs_recover_inode_owner_change() which
>> enforces that if XFS_ILOG_[AD]OWNER is set, XFS_ILOG_[AD]BROOT
>> must be as well.
>>
>> (Or, put more plainly, changing the fork owner is nonsensical
>> for inodes which have no such fork.)
> 
> ...and now I learn why not to review a series until all the patches show
> up. :(
> 
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
>> index fa1c4fe2ffbf..82a13b7ad321 100644
>> --- a/fs/xfs/xfs_inode_item.c
>> +++ b/fs/xfs/xfs_inode_item.c
>> @@ -144,7 +144,8 @@ xfs_inode_item_format_data_fork(
>>  	switch (ip->i_d.di_format) {
>>  	case XFS_DINODE_FMT_EXTENTS:
>>  		iip->ili_fields &=
>> -			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
>> +			~(XFS_ILOG_DDATA | XFS_ILOG_DBROOT |
>> +			  XFS_ILOG_DEV | XFS_ILOG_DOWNER);
> 
> So ... existing kernels could accidentally log an inode with an extents
> format data fork and (DOWNER | DBROOT).  If a kernel encounters such a
> log item, it would assert and kaboom, and now it'll fail the rest of log
> recovery, which forces the admin to zap the log.
> 
> However, if the data fork is in extents format there isn't any work
> required to change the owner anyway, so why do we have to abort the log
> recovery in that case?

Because I didn't think we were in the business of guessing which bits of
inconsistency indicate corruption and which bits are "probably ok"
during log recovery...

We have a mismatch between the logged fields and the format.  Your
suggestion seems to put a lot of faith in the format being correct,
and therefore deciding to ignore the inconsistency in the fields, no?

i.e. if we don't see FMT_BTREE, what justification do we have for
completely ignoring the flag inconsistency?  What if the format was
wrong? 

I don't really see how we can make that prioritization.  Maybe I'm
not thinking creatively enough and you can convince me.  :)

(Granted, the admin recovery path of "throw it all away" does suck,
but such is life.)

-Eric

> IOWs, why doesn't the previous patch do:
> 
> /*
>  * Some old kernels mistakenly log inode items with DOWNER set on a
>  * inode with a non-btree format data fork.  DBROOT won't be set in this
>  * non-btree case, but since the work that DOWNER does is only required
>  * for the btree case we can safely ignore this one weird combination.
>  */
> if ((ili_fields & DOWNER) && di_format == FMT_BTREE) {
> 	if (!(ili_fields & DBROOT))
> 		return -EFSCORRUPTED;
> 	error = xfs_bmbt_change_owner(...);
> }
> 
> --D

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] xfs: swapext replay fixes
  2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen
  2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen
  2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen
@ 2018-12-18 19:31 ` Darrick J. Wong
  2018-12-18 19:43   ` Eric Sandeen
  2 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-12-18 19:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Tue, Dec 18, 2018 at 12:59:15PM -0600, Eric Sandeen wrote:
> We've seen 2 reports of an oops on log replay after a swapext, which
> look something like:
> 
> [   63.188907] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 
> [   63.196774] IP: [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] 
> ...
> [   63.309769] RIP: 0010:[<ffffffffc093e5e6>]  [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] 
> [   63.319132] RSP: 0018:ffff951b4d2977d0  EFLAGS: 00010282 
> [   63.324443] RAX: ffff951b4df63440 RBX: ffff951ba8a48000 RCX: 0000000000000000 
> [   63.331570] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff951b4df63518 
> [   63.338689] RBP: ffff951b4d2977f8 R08: 000000000001e2f0 R09: ffff951b4df63440 
> [   63.345818] R10: ffff951afa403800 R11: ffffffffffffffe0 R12: ffff951b4d0a3000 
> [   63.352949] R13: 0000000000000000 R14: 0000000000000000 R15: ffff951ba8a48040 
> [   63.360079] FS:  00007f72bcd1f880(0000) GS:ffff951bafc80000(0000) knlGS:0000000000000000 
> [   63.368160] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
> [   63.373893] CR2: 0000000000000004 CR3: 0000000128d8a000 CR4: 00000000000007e0 
> [   63.381029] Call Trace: 
> [   63.383541]  [<ffffffffc093e7f7>] xfs_bmbt_change_owner+0x27/0x70 [xfs] 
> [   63.390225]  [<ffffffffc09941a3>] xfs_recover_inode_owner_change.isra.26+0xa3/0xc0 [xfs] 
> [   63.398351]  [<ffffffffc0995b9c>] xlog_recover_inode_pass2+0x4cc/0x9d0 [xfs] 
> [   63.405460]  [<ffffffffc0990018>] ? xfs_efi_release+0x58/0x80 [xfs] 
> [   63.411763]  [<ffffffffc0996192>] xlog_recover_commit_pass2+0xf2/0x1a0 [xfs] 
> [   63.418848]  [<ffffffffc0996289>] xlog_recover_items_pass2+0x49/0x70 [xfs] 
> [   63.425769]  [<ffffffffc09964c5>] xlog_recover_commit_trans+0x215/0x250 [xfs] 
> ...
> 
> I can reproduce this with a hacky script that can be cleaned up and turned into
> an xfstest:
> 
> #!/bin/bash
> 
> DEV=/dev/sdZ1
> 
> umount $DEV
> mkfs.xfs -f $DEV
> 
> mkdir -p /mnt/test
> mount $DEV /mnt/test
> 
> for I in `seq 4194304 -4096 0`; do
> 	((I % 12288)) || continue
> 	xfs_io -f -d -c "pwrite $I 4096" /mnt/test/fragfile
> done
> 
> xfs_fsr -v /mnt/test/fragfile
> 
> 
> xfs_io -c "truncate 0" /mnt/test/fragfile
> 
> for I in `seq 32768 -4096 0`; do
> 	xfs_io -f -c "pwrite $I 4096" /mnt/test/fragfile
> done
> 
> sync
> 
> xfs_io -x -c shutdown /mnt/test
> umount /mnt/test
> mount $DEV /mnt/test

...which I assume is going to show up on fstests shortly, right? :)

--D

> ====
> 
> The upshot is that if we successfully fsr a file and it remains in btree format,
> then it changes to non-btree format and we crash, log replay will still try to
> do xfs_bmbt_change_owner which will oops if we're not in btree format.
> 
> Patches to fix this follow.  Lightly tested.  Thanks to dave for talking
> through this one with me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] xfs: swapext replay fixes
  2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong
@ 2018-12-18 19:43   ` Eric Sandeen
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2018-12-18 19:43 UTC (permalink / raw)
  To: Darrick J. Wong, Eric Sandeen; +Cc: linux-xfs



On 12/18/18 1:31 PM, Darrick J. Wong wrote:
> On Tue, Dec 18, 2018 at 12:59:15PM -0600, Eric Sandeen wrote:
>> We've seen 2 reports of an oops on log replay after a swapext, which
>> look something like:
>>
>> [   63.188907] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 
>> [   63.196774] IP: [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] 
>> ...
>> [   63.309769] RIP: 0010:[<ffffffffc093e5e6>]  [<ffffffffc093e5e6>] xfs_bmbt_init_cursor+0x46/0x180 [xfs] 
>> [   63.319132] RSP: 0018:ffff951b4d2977d0  EFLAGS: 00010282 
>> [   63.324443] RAX: ffff951b4df63440 RBX: ffff951ba8a48000 RCX: 0000000000000000 
>> [   63.331570] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff951b4df63518 
>> [   63.338689] RBP: ffff951b4d2977f8 R08: 000000000001e2f0 R09: ffff951b4df63440 
>> [   63.345818] R10: ffff951afa403800 R11: ffffffffffffffe0 R12: ffff951b4d0a3000 
>> [   63.352949] R13: 0000000000000000 R14: 0000000000000000 R15: ffff951ba8a48040 
>> [   63.360079] FS:  00007f72bcd1f880(0000) GS:ffff951bafc80000(0000) knlGS:0000000000000000 
>> [   63.368160] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
>> [   63.373893] CR2: 0000000000000004 CR3: 0000000128d8a000 CR4: 00000000000007e0 
>> [   63.381029] Call Trace: 
>> [   63.383541]  [<ffffffffc093e7f7>] xfs_bmbt_change_owner+0x27/0x70 [xfs] 
>> [   63.390225]  [<ffffffffc09941a3>] xfs_recover_inode_owner_change.isra.26+0xa3/0xc0 [xfs] 
>> [   63.398351]  [<ffffffffc0995b9c>] xlog_recover_inode_pass2+0x4cc/0x9d0 [xfs] 
>> [   63.405460]  [<ffffffffc0990018>] ? xfs_efi_release+0x58/0x80 [xfs] 
>> [   63.411763]  [<ffffffffc0996192>] xlog_recover_commit_pass2+0xf2/0x1a0 [xfs] 
>> [   63.418848]  [<ffffffffc0996289>] xlog_recover_items_pass2+0x49/0x70 [xfs] 
>> [   63.425769]  [<ffffffffc09964c5>] xlog_recover_commit_trans+0x215/0x250 [xfs] 
>> ...
>>
>> I can reproduce this with a hacky script that can be cleaned up and turned into
>> an xfstest:

...

> ...which I assume is going to show up on fstests shortly, right? :)

yes, either zorro or I will do it.

-Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-12-18 19:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-18 18:59 [PATCH 0/2] xfs: swapext replay fixes Eric Sandeen
2018-12-18 19:02 ` [PATCH 1/2] xfs: handle bad flags in xfs_recover_inode_owner_change Eric Sandeen
2018-12-18 19:15   ` Darrick J. Wong
2018-12-18 19:23     ` Eric Sandeen
2018-12-18 19:30       ` Darrick J. Wong
2018-12-18 19:09 ` [PATCH 2/2] xfs: clear XFS_ILOG_[AD]OWNER for non-btree formats Eric Sandeen
2018-12-18 19:28   ` Darrick J. Wong
2018-12-18 19:42     ` Eric Sandeen
2018-12-18 19:31 ` [PATCH 0/2] xfs: swapext replay fixes Darrick J. Wong
2018-12-18 19:43   ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox