All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
Date: Wed, 21 Mar 2018 10:06:40 -0700	[thread overview]
Message-ID: <20180321170640.GC4818@magnolia> (raw)
In-Reply-To: <20180321075248.7677-2-david@fromorbit.com>

On Wed, Mar 21, 2018 at 06:52:47PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We recently came across a V4 filesystem causing memory corruption
> due to a newly allocated inode being setup twice and being added to
> the superblock inode list twice. From code inspection, the only way
> this could happen is if a newly allocated inode was not marked as
> free on disk (i.e. di_mode wasn't zero).
> 
> Running the metadump on an upstream debug kernel fails during inode
> allocation like so:
> 
> XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:114!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:assfail+0x28/0x30
> RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
> RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
> RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
> R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
> FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
> Call Trace:
>  xfs_ialloc+0x383/0x570
>  xfs_dir_ialloc+0x6a/0x2a0
>  xfs_create+0x412/0x670
>  xfs_generic_create+0x1f7/0x2c0
>  ? capable_wrt_inode_uidgid+0x3f/0x50
>  vfs_mkdir+0xfb/0x1b0
>  SyS_mkdir+0xcf/0xf0
>  do_syscall_64+0x73/0x1a0
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> Extracting the inode number we crashed on from an event trace and
> looking at it with xfs_db:
> 
> xfs_db> inode 184452204
> xfs_db> p
> core.magic = 0x494e
> core.mode = 0100644
> core.version = 2
> core.format = 2 (extents)
> core.nlinkv2 = 1
> core.onlink = 0
> .....
> 
> Confirms that it is not a free inode on disk. xfs_repair
> also trips over this inode:
> 
> .....
> zero length extent (off = 0, fsbno = 0) in ino 184452204
> correcting nextents for inode 184452204
> bad attribute fork in inode 184452204, would clear attr fork
> bad nblocks 1 for inode 184452204, would reset to 0
> bad anextents 1 for inode 184452204, would reset to 0
> imap claims in-use inode 184452204 is free, would correct imap
> would have cleared inode 184452204
> .....
> disconnected inode 184452204, would move to lost+found
> 
> And so we have a situation where the directory structure and the
> inobt thinks the inode is free, but the inode on disk thinks it is
> still in use. Where this corruption came from is not possible to
> diagnose, but we can detect it and prevent the kernel from oopsing
> on lookup. The reproducer now results in:
> 
> $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
> mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
> mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
> mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
> mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
> mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
> ....
> 
> And this corruption shutdown:
> 
> [   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
> [   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
> [   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
> [   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [   54.852859] Call Trace:
> [   54.853531]  dump_stack+0x85/0xc5
> [   54.854385]  xfs_trans_cancel+0x197/0x1c0
> [   54.855421]  xfs_create+0x425/0x670
> [   54.856314]  xfs_generic_create+0x1f7/0x2c0
> [   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
> [   54.858586]  vfs_mkdir+0xfb/0x1b0
> [   54.859458]  SyS_mkdir+0xcf/0xf0
> [   54.860254]  do_syscall_64+0x73/0x1a0
> [   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [   54.862492] RIP: 0033:0x7fb73bddf547
> [   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
> [   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
> [   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
> [   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
> [   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
> [   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
> [   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
> [   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
> [   54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)
> 
> Note that this crash is only possible on v4 filesystemsi or v5
> filesystems mounted with the ikeep mount option. For all other V5
> filesystems, this problem cannot occur because we don't read inodes
> we are allocating from disk - we simply overwrite them with the new
> inode information.

Got a test case for this scenario? :)

The patch looks ok, but I'd rather have some sort of reproducer so I can
verify that it works (and that scrub will notice the broken state) even
if we have to mkfs + db to check it.

> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 1dc37b72b6ea..98b7a4ae15e4 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -484,7 +484,28 @@ xfs_iget_cache_miss(
>  
>  	trace_xfs_iget_miss(ip);
>  
> -	if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> +
> +	/*
> +	 * If we are allocating a new inode, then check what was returned is
> +	 * actually a free, empty inode. If we are not allocating an inode,
> +	 * the check we didn't find a free inode.
> +	 */
> +	if (flags & XFS_IGET_CREATE) {
> +		if (VFS_I(ip)->i_mode != 0) {
> +			xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx not marked free on disk",
> +				ino);
> +			error = -EFSCORRUPTED;
> +			goto out_destroy;
> +		}
> +		if (ip->i_d.di_nblocks != 0) {
> +			xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> +				ino);
> +			error = -EFSCORRUPTED;
> +			goto out_destroy;

I've a patch out for review that adds a xfs_inode_verifier_error
function that spits out a standardized corruption warning, a hex dump of
the bad dinode, and tells the user to run repair.  This seems like a
good candidate for that.

--D

> +		}
> +	} else if (VFS_I(ip)->i_mode == 0) {
>  		error = -ENOENT;
>  		goto out_destroy;
>  	}
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-03-21 17:06 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-21  7:52 [PATCH 0/2] xfs: fixes for inode allocation issues Dave Chinner
2018-03-21  7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
2018-03-21 17:06   ` Darrick J. Wong [this message]
2018-03-21 21:10     ` Dave Chinner
2018-03-23 13:07       ` Carlos Maiolino
2018-03-23 12:55   ` Carlos Maiolino
2018-03-21  7:52 ` [PATCH 2/2] xfs: remove dead inode version setting code Dave Chinner
2018-03-21 17:07   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180321170640.GC4818@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.