From: Brian Foster <bfoster@redhat.com>
To: Alex Lyakas <alex@zadarastorage.com>
Cc: xfs@oss.sgi.com
Subject: Re: use-after-free on log replay failure
Date: Wed, 6 Aug 2014 11:20:42 -0400 [thread overview]
Message-ID: <20140806152042.GB39990@bfoster.bfoster> (raw)
In-Reply-To: <AC10852F403846A182491ED8071135ED@alyakaslap>
On Wed, Aug 06, 2014 at 03:52:03PM +0300, Alex Lyakas wrote:
> Hello Dave and Brian,
>
> Dave, I tried the patch you suggested, but it does not fix the issue. I did
> some further digging, and it appears that _xfs_buf_ioend(schedule=1) can be
> called from xfs_buf_iorequest(), which the patch fixes, but also from
> xfs_buf_bio_end_io() which is my case. I am reproducing the issue pretty
> easily. The flow that I have is like this:
> - xlog_recover() calls xlog_find_tail(). This works alright.
What's the purpose of a sleep here?
> - Now I add a small sleep before calling xlog_do_recover(), and meanwhile I
> instruct my block device to return ENOSPC for any WRITE from now on.
>
> What seems to happen is that several WRITE bios are submitted and they all
> fail. When they do, they reach xfs_buf_ioend() through a stack like this:
>
> Aug 6 15:23:07 dev kernel: [ 304.410528] [56]xfs*[xfs_buf_ioend:1056]
> XFS(dm-19): Scheduling xfs_buf_iodone_work on error
> Aug 6 15:23:07 dev kernel: [ 304.410534] Pid: 56, comm: kworker/u:1
> Tainted: G W O 3.8.13-557-generic #1382000791
> Aug 6 15:23:07 dev kernel: [ 304.410537] Call Trace:
> Aug 6 15:23:07 dev kernel: [ 304.410587] [<ffffffffa04d6654>]
> xfs_buf_ioend+0x1a4/0x1b0 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.410621] [<ffffffffa04d6685>]
> _xfs_buf_ioend+0x25/0x30 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.410643] [<ffffffffa04d6b3d>]
> xfs_buf_bio_end_io+0x3d/0x50 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.410652] [<ffffffff811c3d8d>]
> bio_endio+0x1d/0x40
> ...
>
> At this point, they are scheduled to run xlog_recover_iodone through
> xfslogd_workqueue.
> The first callback that gets called, calls xfs_do_force_shutdown in stack
> like this:
>
> Aug 6 15:23:07 dev kernel: [ 304.411791] XFS (dm-19): metadata I/O error:
> block 0x3780001 ("xlog_recover_iodone") error 28 numblks 1
> Aug 6 15:23:07 dev kernel: [ 304.413493] XFS (dm-19):
> xfs_do_force_shutdown(0x1) called from line 377 of file
> /mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
> 0xffffffffa0526848
> Aug 6 15:23:07 dev kernel: [ 304.413837] [<ffffffffa04e0b60>]
> xfs_do_force_shutdown+0x40/0x180 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.413870] [<ffffffffa0526848>] ?
> xlog_recover_iodone+0x48/0x70 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.413902] [<ffffffffa0526848>]
> xlog_recover_iodone+0x48/0x70 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.413923] [<ffffffffa04d645d>]
> xfs_buf_iodone_work+0x4d/0xa0 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.413930] [<ffffffff81077a11>]
> process_one_work+0x141/0x4a0
> Aug 6 15:23:07 dev kernel: [ 304.413937] [<ffffffff810789e8>]
> worker_thread+0x168/0x410
> Aug 6 15:23:07 dev kernel: [ 304.413943] [<ffffffff81078880>] ?
> manage_workers+0x120/0x120
> Aug 6 15:23:07 dev kernel: [ 304.413949] [<ffffffff8107df10>]
> kthread+0xc0/0xd0
> Aug 6 15:23:07 dev kernel: [ 304.413954] [<ffffffff8107de50>] ?
> flush_kthread_worker+0xb0/0xb0
> Aug 6 15:23:07 dev kernel: [ 304.413976] [<ffffffff816ab86c>]
> ret_from_fork+0x7c/0xb0
> Aug 6 15:23:07 dev kernel: [ 304.413986] [<ffffffff8107de50>] ?
> flush_kthread_worker+0xb0/0xb0
> Aug 6 15:23:07 dev kernel: [ 304.413990] ---[ end trace 988d698520e1fa81
> ]---
> Aug 6 15:23:07 dev kernel: [ 304.414012] XFS (dm-19): I/O Error Detected.
> Shutting down filesystem
> Aug 6 15:23:07 dev kernel: [ 304.415936] XFS (dm-19): Please umount the
> filesystem and rectify the problem(s)
>
> But the rest of the callbacks also arrive:
> Aug 6 15:23:07 dev kernel: [ 304.417812] XFS (dm-19): metadata I/O error:
> block 0x3780002 ("xlog_recover_iodone") error 28 numblks 1
> Aug 6 15:23:07 dev kernel: [ 304.420420] XFS (dm-19):
> xfs_do_force_shutdown(0x1) called from line 377 of file
> /mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
> 0xffffffffa0526848
> Aug 6 15:23:07 dev kernel: [ 304.420427] XFS (dm-19): metadata I/O error:
> block 0x3780008 ("xlog_recover_iodone") error 28 numblks 8
> Aug 6 15:23:07 dev kernel: [ 304.422708] XFS (dm-19):
> xfs_do_force_shutdown(0x1) called from line 377 of file
> /mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
> 0xffffffffa0526848
> Aug 6 15:23:07 dev kernel: [ 304.422738] XFS (dm-19): metadata I/O error:
> block 0x3780010 ("xlog_recover_iodone") error 28 numblks 8
>
> The mount sequence fails and goes back to the caller:
> Aug 6 15:23:07 dev kernel: [ 304.423438] XFS (dm-19): log mount/recovery
> failed: error 28
> Aug 6 15:23:07 dev kernel: [ 304.423757] XFS (dm-19): log mount failed
>
> But there are still additional callbacks to deliver, which the mount
> sequence did not wait for!
> Aug 6 15:23:07 dev kernel: [ 304.425717] XFS (\x10@dR):
> xfs_do_force_shutdown(0x1) called from line 377 of file
> /mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
> 0xffffffffa0526848
> Aug 6 15:23:07 dev kernel: [ 304.425723] XFS (\x10@dR): metadata I/O error:
> block 0x3780018 ("xlog_recover_iodone") error 28 numblks 8
> Aug 6 15:23:07 dev kernel: [ 304.428239] XFS (\x10@dR):
> xfs_do_force_shutdown(0x1) called from line 377 of file
> /mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
> 0xffffffffa0526848
> Aug 6 15:23:07 dev kernel: [ 304.428246] XFS (\x10@dR): metadata I/O error:
> block 0x37800a0 ("xlog_recover_iodone") error 28 numblks 16
>
> Notice the junk that they are printing! Naturally, because xfs_mount
> structure has been kfreed.
>
> Finally the kernel crashes (instead of printing junk), because the xfs_mount
> structure is gone, but the callback tries to access it (printing the name):
>
> Aug 6 15:23:07 dev kernel: [ 304.430796] general protection fault: 0000
> [#1] SMP
> Aug 6 15:23:07 dev kernel: [ 304.432035] Modules linked in: xfrm_user
> xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 iscsi_scst_tcp(O)
> scst_vdisk(O) scst(O) dm_zcache(O) dm_btrfs(O) xfs(O) btrfs(O) libcrc32c
> raid456(O) async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq
> async_tx raid1(O) md_mod deflate zlib_deflate ctr twofish_generic
> twofish_x86_64_3way glue_helper lrw xts gf128mul twofish_x86_64
> twofish_common camellia_generic serpent_generic blowfish_generic
> blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic xcbc
> rmd160 sha512_generic crypto_null af_key xfrm_algo dm_round_robin kvm vfat
> fat ppdev psmouse microcode nfsd nfs_acl dm_multipath(O) serio_raw
> parport_pc nfsv4 dm_iostat(O) mac_hid i2c_piix4 auth_rpcgss nfs fscache
> lockd sunrpc lp parport floppy
> Aug 6 15:23:07 dev kernel: [ 304.432035] CPU 1
> Aug 6 15:23:07 dev kernel: [ 304.432035] Pid: 133, comm: kworker/1:1H
> Tainted: G W O 3.8.13-557-generic #1382000791 Bochs Bochs
> Aug 6 15:23:07 dev kernel: [ 304.432035] RIP: 0010:[<ffffffff8133c2cb>]
> [<ffffffff8133c2cb>] strnlen+0xb/0x30
> Aug 6 15:23:07 dev kernel: [ 304.432035] RSP: 0018:ffff880035461b08
> EFLAGS: 00010086
> Aug 6 15:23:07 dev kernel: [ 304.432035] RAX: 0000000000000000 RBX:
> ffffffff81e6a4e7 RCX: 0000000000000000
> Aug 6 15:23:07 dev kernel: [ 304.432035] RDX: e4e8390a265c0000 RSI:
> ffffffffffffffff RDI: e4e8390a265c0000
> Aug 6 15:23:07 dev kernel: [ 304.432035] RBP: ffff880035461b08 R08:
> 000000000000ffff R09: 000000000000ffff
> Aug 6 15:23:07 dev kernel: [ 304.432035] R10: 0000000000000000 R11:
> 00000000000004cd R12: e4e8390a265c0000
> Aug 6 15:23:07 dev kernel: [ 304.432035] R13: ffffffff81e6a8c0 R14:
> 0000000000000000 R15: 000000000000ffff
> Aug 6 15:23:07 dev kernel: [ 304.432035] FS: 0000000000000000(0000)
> GS:ffff88007fc80000(0000) knlGS:0000000000000000
> Aug 6 15:23:07 dev kernel: [ 304.432035] CS: 0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Aug 6 15:23:07 dev kernel: [ 304.432035] CR2: 00007fc902ffbfd8 CR3:
> 000000007702a000 CR4: 00000000000006e0
> Aug 6 15:23:07 dev kernel: [ 304.432035] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Aug 6 15:23:07 dev kernel: [ 304.432035] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Aug 6 15:23:07 dev kernel: [ 304.432035] Process kworker/1:1H (pid: 133,
> threadinfo ffff880035460000, task ffff880035412e00)
> Aug 6 15:23:07 dev kernel: [ 304.432035] Stack:
> Aug 6 15:23:07 dev kernel: [ 304.432035] ffff880035461b48
> ffffffff8133dd5e 0000000000000000 ffffffff81e6a4e7
> Aug 6 15:23:07 dev kernel: [ 304.432035] ffffffffa0566cba
> ffff880035461c80 ffffffffa0566cba ffffffff81e6a8c0
> Aug 6 15:23:07 dev kernel: [ 304.432035] ffff880035461bc8
> ffffffff8133ef59 ffff880035461bc8 ffffffff81c84040
> Aug 6 15:23:07 dev kernel: [ 304.432035] Call Trace:
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8133dd5e>]
> string.isra.4+0x3e/0xd0
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8133ef59>]
> vsnprintf+0x219/0x640
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8133f441>]
> vscnprintf+0x11/0x30
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8105a971>]
> vprintk_emit+0xc1/0x490
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8105aa20>] ?
> vprintk_emit+0x170/0x490
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8168b992>]
> printk+0x61/0x63
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffffa04e9bf1>]
> __xfs_printk+0x31/0x50 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffffa04e9e43>]
> xfs_notice+0x53/0x60 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffffa04e0c15>]
> xfs_do_force_shutdown+0xf5/0x180 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffffa0526848>] ?
> xlog_recover_iodone+0x48/0x70 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffffa0526848>]
> xlog_recover_iodone+0x48/0x70 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffffa04d645d>]
> xfs_buf_iodone_work+0x4d/0xa0 [xfs]
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff81077a11>]
> process_one_work+0x141/0x4a0
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff810789e8>]
> worker_thread+0x168/0x410
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff81078880>] ?
> manage_workers+0x120/0x120
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8107df10>]
> kthread+0xc0/0xd0
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8107de50>] ?
> flush_kthread_worker+0xb0/0xb0
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff816ab86c>]
> ret_from_fork+0x7c/0xb0
> Aug 6 15:23:07 dev kernel: [ 304.432035] [<ffffffff8107de50>] ?
> flush_kthread_worker+0xb0/0xb0
> Aug 6 15:23:07 dev kernel: [ 304.432035] Code: 31 c0 80 3f 00 55 48 89 e5
> 74 11 48 89 f8 66 90 48 83 c0 01 80 38 00 75 f7 48 29 f8 5d c3 66 90 55 31
> c0 48 85 f6 48 89 e5 74 23 <80> 3f 00 74 1e 48 89 f8 eb 0c 0f 1f 00 48 83 ee
> 01 80 38 00 74
> Aug 6 15:23:07 dev kernel: [ 304.432035] RIP [<ffffffff8133c2cb>]
> strnlen+0xb/0x30
> Aug 6 15:23:07 dev kernel: [ 304.432035] RSP <ffff880035461b08>
>
>
> So previously you said: "So, something is corrupting memory and stamping all
> over the XFS structures." and also "given you have a bunch of out of tree
> modules loaded (and some which are experiemental) suggests that you have a
> problem with your storage...".
>
> But I believe, my analysis shows that during the mount sequence XFS does not
> wait properly for all the bios to complete, before failing the mount
> sequence back to the caller.
>
As an experiment, what about the following? Compile tested only and not
safe for general use.
What might help more is to see if you can create a reproducer on a
recent, clean kernel. Perhaps a metadump of your reproducer fs combined
with whatever block device ENOSPC hack you're using would do it.
Brian
---8<---
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index cd7b8ca..fbcf524 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1409,19 +1409,27 @@ xfs_buf_iorequest(
* case nothing will ever complete. It returns the I/O error code, if any, or
* 0 if there was no error.
*/
-int
-xfs_buf_iowait(
- xfs_buf_t *bp)
+static int
+__xfs_buf_iowait(
+ struct xfs_buf *bp,
+ bool skip_error)
{
trace_xfs_buf_iowait(bp, _RET_IP_);
- if (!bp->b_error)
+ if (skip_error || !bp->b_error)
wait_for_completion(&bp->b_iowait);
trace_xfs_buf_iowait_done(bp, _RET_IP_);
return bp->b_error;
}
+int
+xfs_buf_iowait(
+ struct xfs_buf *bp)
+{
+ return __xfs_buf_iowait(bp, false);
+}
+
xfs_caddr_t
xfs_buf_offset(
xfs_buf_t *bp,
@@ -1866,7 +1874,7 @@ xfs_buf_delwri_submit(
bp = list_first_entry(&io_list, struct xfs_buf, b_list);
list_del_init(&bp->b_list);
- error2 = xfs_buf_iowait(bp);
+ error2 = __xfs_buf_iowait(bp, true);
xfs_buf_relse(bp);
if (!error)
error = error2;
---
> Thanks,
> Alex.
>
>
>
> -----Original Message----- From: Dave Chinner
> Sent: 05 August, 2014 2:07 AM
> To: Alex Lyakas
> Cc: xfs@oss.sgi.com
> Subject: Re: use-after-free on log replay failure
>
> On Mon, Aug 04, 2014 at 02:00:05PM +0300, Alex Lyakas wrote:
> >Greetings,
> >
> >we had a log replay failure due to some errors that the underlying
> >block device returned:
> >[49133.801406] XFS (dm-95): metadata I/O error: block 0x270e8c180
> >("xlog_recover_iodone") error 28 numblks 16
> >[49133.802495] XFS (dm-95): log mount/recovery failed: error 28
> >[49133.802644] XFS (dm-95): log mount failed
>
> #define ENOSPC 28 /* No space left on device */
>
> You're getting an ENOSPC as a metadata IO error during log recovery?
> Thin provisioning problem, perhaps, and the error is occurring on
> submission rather than completion? If so:
>
> 8d6c121 xfs: fix buffer use after free on IO error
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-08-06 15:20 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-18 18:37 Questions about XFS discard and xfs_free_extent() code (newbie) Alex Lyakas
2013-12-18 23:06 ` Dave Chinner
2013-12-19 9:24 ` Alex Lyakas
2013-12-19 10:55 ` Dave Chinner
2013-12-19 19:24 ` Alex Lyakas
2013-12-21 17:03 ` Chris Murphy
2013-12-24 18:21 ` Alex Lyakas
2013-12-26 23:00 ` Dave Chinner
2014-01-08 18:13 ` Alex Lyakas
2014-01-13 3:02 ` Dave Chinner
2014-01-13 17:44 ` Alex Lyakas
2014-01-13 20:43 ` Dave Chinner
2014-01-14 13:48 ` Alex Lyakas
2014-01-15 1:45 ` Dave Chinner
2014-01-19 9:38 ` Alex Lyakas
2014-01-19 23:17 ` Dave Chinner
2014-07-01 15:06 ` xfs_growfs_data_private memory leak Alex Lyakas
2014-07-01 21:56 ` Dave Chinner
2014-07-02 12:27 ` Alex Lyakas
2014-08-04 18:15 ` Eric Sandeen
2014-08-06 8:56 ` Alex Lyakas
2014-08-04 11:00 ` use-after-free on log replay failure Alex Lyakas
2014-08-04 14:12 ` Brian Foster
2014-08-04 23:07 ` Dave Chinner
2014-08-06 10:05 ` Alex Lyakas
2014-08-06 12:32 ` Dave Chinner
2014-08-06 14:43 ` Alex Lyakas
2014-08-10 16:26 ` Alex Lyakas
2014-08-06 12:52 ` Alex Lyakas
2014-08-06 15:20 ` Brian Foster [this message]
2014-08-06 15:28 ` Alex Lyakas
2014-08-10 12:20 ` Alex Lyakas
2014-08-11 13:20 ` Brian Foster
2014-08-11 21:52 ` Dave Chinner
2014-08-12 12:03 ` Brian Foster
2014-08-12 12:39 ` Alex Lyakas
2014-08-12 19:31 ` Brian Foster
2014-08-12 23:56 ` Dave Chinner
2014-08-13 12:59 ` Brian Foster
2014-08-13 20:59 ` Dave Chinner
2014-08-13 23:21 ` Brian Foster
2014-08-14 6:14 ` Dave Chinner
2014-08-14 19:05 ` Brian Foster
2014-08-14 22:27 ` Dave Chinner
2014-08-13 17:07 ` Alex Lyakas
2014-08-13 0:03 ` Dave Chinner
2014-08-13 13:11 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140806152042.GB39990@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=alex@zadarastorage.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox