From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
Date: Wed, 21 Mar 2018 10:06:40 -0700 [thread overview]
Message-ID: <20180321170640.GC4818@magnolia> (raw)
In-Reply-To: <20180321075248.7677-2-david@fromorbit.com>
On Wed, Mar 21, 2018 at 06:52:47PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> We recently came across a V4 filesystem causing memory corruption
> due to a newly allocated inode being setup twice and being added to
> the superblock inode list twice. From code inspection, the only way
> this could happen is if a newly allocated inode was not marked as
> free on disk (i.e. di_mode wasn't zero).
>
> Running the metadump on an upstream debug kernel fails during inode
> allocation like so:
>
> XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:114!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:assfail+0x28/0x30
> RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
> RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
> RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
> R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
> FS: 00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
> Call Trace:
> xfs_ialloc+0x383/0x570
> xfs_dir_ialloc+0x6a/0x2a0
> xfs_create+0x412/0x670
> xfs_generic_create+0x1f7/0x2c0
> ? capable_wrt_inode_uidgid+0x3f/0x50
> vfs_mkdir+0xfb/0x1b0
> SyS_mkdir+0xcf/0xf0
> do_syscall_64+0x73/0x1a0
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> Extracting the inode number we crashed on from an event trace and
> looking at it with xfs_db:
>
> xfs_db> inode 184452204
> xfs_db> p
> core.magic = 0x494e
> core.mode = 0100644
> core.version = 2
> core.format = 2 (extents)
> core.nlinkv2 = 1
> core.onlink = 0
> .....
>
> Confirms that it is not a free inode on disk. xfs_repair
> also trips over this inode:
>
> .....
> zero length extent (off = 0, fsbno = 0) in ino 184452204
> correcting nextents for inode 184452204
> bad attribute fork in inode 184452204, would clear attr fork
> bad nblocks 1 for inode 184452204, would reset to 0
> bad anextents 1 for inode 184452204, would reset to 0
> imap claims in-use inode 184452204 is free, would correct imap
> would have cleared inode 184452204
> .....
> disconnected inode 184452204, would move to lost+found
>
> And so we have a situation where the directory structure and the
> inobt thinks the inode is free, but the inode on disk thinks it is
> still in use. Where this corruption came from is not possible to
> diagnose, but we can detect it and prevent the kernel from oopsing
> on lookup. The reproducer now results in:
>
> $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
> mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
> mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
> mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
> mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
> mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
> ....
>
> And this corruption shutdown:
>
> [ 54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
> [ 54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x425/0x670
> [ 54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
> [ 54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 54.852859] Call Trace:
> [ 54.853531] dump_stack+0x85/0xc5
> [ 54.854385] xfs_trans_cancel+0x197/0x1c0
> [ 54.855421] xfs_create+0x425/0x670
> [ 54.856314] xfs_generic_create+0x1f7/0x2c0
> [ 54.857390] ? capable_wrt_inode_uidgid+0x3f/0x50
> [ 54.858586] vfs_mkdir+0xfb/0x1b0
> [ 54.859458] SyS_mkdir+0xcf/0xf0
> [ 54.860254] do_syscall_64+0x73/0x1a0
> [ 54.861193] entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [ 54.862492] RIP: 0033:0x7fb73bddf547
> [ 54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
> [ 54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
> [ 54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
> [ 54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
> [ 54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
> [ 54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
> [ 54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c. Return address = ffffffff814cd050
> [ 54.882790] XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem
> [ 54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)
>
> Note that this crash is only possible on v4 filesystemsi or v5
> filesystems mounted with the ikeep mount option. For all other V5
> filesystems, this problem cannot occur because we don't read inodes
> we are allocating from disk - we simply overwrite them with the new
> inode information.
Got a test case for this scenario? :)
The patch looks ok, but I'd rather have some sort of reproducer so I can
verify that it works (and that scrub will notice the broken state) even
if we have to mkfs + db to check it.
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_icache.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 1dc37b72b6ea..98b7a4ae15e4 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -484,7 +484,28 @@ xfs_iget_cache_miss(
>
> trace_xfs_iget_miss(ip);
>
> - if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> +
> + /*
> + * If we are allocating a new inode, then check what was returned is
> + * actually a free, empty inode. If we are not allocating an inode,
> + * the check we didn't find a free inode.
> + */
> + if (flags & XFS_IGET_CREATE) {
> + if (VFS_I(ip)->i_mode != 0) {
> + xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx not marked free on disk",
> + ino);
> + error = -EFSCORRUPTED;
> + goto out_destroy;
> + }
> + if (ip->i_d.di_nblocks != 0) {
> + xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> + ino);
> + error = -EFSCORRUPTED;
> + goto out_destroy;
I've a patch out for review that adds a xfs_inode_verifier_error
function that spits out a standardized corruption warning, a hex dump of
the bad dinode, and tells the user to run repair. This seems like a
good candidate for that.
--D
> + }
> + } else if (VFS_I(ip)->i_mode == 0) {
> error = -ENOENT;
> goto out_destroy;
> }
> --
> 2.16.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-03-21 17:06 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-21 7:52 [PATCH 0/2] xfs: fixes for inode allocation issues Dave Chinner
2018-03-21 7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
2018-03-21 17:06 ` Darrick J. Wong [this message]
2018-03-21 21:10 ` Dave Chinner
2018-03-23 13:07 ` Carlos Maiolino
2018-03-23 12:55 ` Carlos Maiolino
2018-03-21 7:52 ` [PATCH 2/2] xfs: remove dead inode version setting code Dave Chinner
2018-03-21 17:07 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180321170640.GC4818@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).