From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, git.user@gmail.com
Subject: Re: [PATCH] xfs: fix transaction leak in xfs_reflink_allocate_cow()
Date: Thu, 6 Sep 2018 19:40:47 -0700 [thread overview]
Message-ID: <20180907024047.GA4607@magnolia> (raw)
In-Reply-To: <20180907015113.32448-1-david@fromorbit.com>
On Fri, Sep 07, 2018 at 11:51:13AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When xfs_reflink_allocate_cow() allocates a transaction, it drops
> the ILOCK to perform the operation. This Introduces a race condition
> where another thread modifying the file can perform the COW
> allocation operation underneath us. This result in the retry loop
> finding an allocated block and jumping straight to the conversion
> code. It does not, however, cancel the transaction it holds and so
> this gets leaked. This results in a lockdep warning:
>
> ================================================
> WARNING: lock held when returning to user space!
> 4.18.5 #1 Not tainted
> ------------------------------------------------
> worker/6123 is leaving the kernel with locks still held!
> 1 lock held by worker/6123:
> #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
>
> And eventually the filesystem deadlocks because it runs out of log
> space that is reserved by the leaked transaction and never gets
> released.
>
> The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
> gotos - it's no surprise that it has bug where the flow through
> several goto jumps then fails to clean up context from a non-obvious
> logic path. CLean up the logic flow and make sure every path does
> the right thing.
Ugh, sorry about leaving that mess.
I haven't tested this at all, but it looks right. If the original
reporter tests it and the problem goes away I'm willing to attach a:
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_reflink.c | 98 +++++++++++++++++++++++++++++---------------
> 1 file changed, 66 insertions(+), 32 deletions(-)
>
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 38f405415b88..b39f5afa57aa 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -352,6 +352,47 @@ xfs_reflink_convert_cow(
> return error;
> }
>
> +/*
> + * Find the extent that maps the given range in the COW fork. Even if the extent
> + * is not shared we might have a preallocation for it in the COW fork. If so we
> + * use it that rather than trigger a new allocation.
> + */
> +static int
> +find_trim_cow_extent(
> + struct xfs_inode *ip,
> + struct xfs_bmbt_irec *imap,
> + bool *shared,
> + bool *found)
> +{
> + xfs_fileoff_t offset_fsb = imap->br_startoff;
> + xfs_filblks_t count_fsb = imap->br_blockcount;
> + struct xfs_iext_cursor icur;
> + struct xfs_bmbt_irec got;
> + bool trimmed;
> +
> + *found = false;
> +
> + /*
> + * if we don't find an overlapping extent, trim the range we need to
> + * allocate to fit the hole we found.
> + */
> + if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) ||
> + got.br_startoff > offset_fsb)
> + return xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> +
> + *shared = true;
> + if (isnullstartblock(got.br_startblock)) {
> + xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> + return 0;
> + }
> +
> + /* real extent found - no need to allocate */
> + xfs_trim_extent(&got, offset_fsb, count_fsb);
> + *imap = got;
> + *found = true;
> + return 0;
> +}
> +
> /* Allocate all CoW reservations covering a range of blocks in a file. */
> int
> xfs_reflink_allocate_cow(
> @@ -363,41 +404,37 @@ xfs_reflink_allocate_cow(
> struct xfs_mount *mp = ip->i_mount;
> xfs_fileoff_t offset_fsb = imap->br_startoff;
> xfs_filblks_t count_fsb = imap->br_blockcount;
> - struct xfs_bmbt_irec got;
> struct xfs_trans *tp = NULL;
> int nimaps, error = 0;
> - bool trimmed;
> + bool found;
> xfs_filblks_t resaligned;
> xfs_extlen_t resblks = 0;
> - struct xfs_iext_cursor icur;
>
> -retry:
> - ASSERT(xfs_is_reflink_inode(ip));
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>
> /*
> - * Even if the extent is not shared we might have a preallocation for
> - * it in the COW fork. If so use it.
> + * do-while to run a single lookup retry after transaction allocation.
> + * All exits from the loop need to check if the transaction needs
> + * cancelling. Dropping the ILOCK to allocate the transaction allows
> + * races with other COW fork allocations, so we need to start the extent
> + * lookup from scratch once we have the ILOCK again.
> */
> - if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) &&
> - got.br_startoff <= offset_fsb) {
> - *shared = true;
> + do {
> + ASSERT(xfs_is_reflink_inode(ip));
>
> - /* If we have a real allocation in the COW fork we're done. */
> - if (!isnullstartblock(got.br_startblock)) {
> - xfs_trim_extent(&got, offset_fsb, count_fsb);
> - *imap = got;
> + error = find_trim_cow_extent(ip, imap, shared, &found);
> + if (error || !*shared)
> + goto out_trans_cancel;
> + if (found) {
> + if (tp)
> + xfs_trans_cancel(tp);
> goto convert;
> }
>
> - xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> - } else {
> - error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> - if (error || !*shared)
> - goto out;
> - }
> + /* if we have a transaction, it's time to allocate */
> + if (tp)
> + break;
>
> - if (!tp) {
> resaligned = xfs_aligned_fsb_count(imap->br_startoff,
> imap->br_blockcount, xfs_get_cowextsz_hint(ip));
> resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> @@ -412,29 +449,25 @@ xfs_reflink_allocate_cow(
>
> error = xfs_qm_dqattach_locked(ip, false);
> if (error)
> - goto out;
> - goto retry;
> - }
> + goto out_trans_cancel;
> + } while (tp);
>
> error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
> XFS_QMOPT_RES_REGBLKS);
> if (error)
> - goto out;
> + goto out_trans_cancel;
>
> xfs_trans_ijoin(tp, ip, 0);
>
> - nimaps = 1;
> -
> /* Allocate the entire reservation as unwritten blocks. */
> + nimaps = 1;
> error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
> XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
> resblks, imap, &nimaps);
> if (error)
> - goto out_trans_cancel;
> + goto out_unreserve;
>
> xfs_inode_set_cowblocks_tag(ip);
> -
> - /* Finish up. */
> error = xfs_trans_commit(tp);
> if (error)
> return error;
> @@ -447,10 +480,11 @@ xfs_reflink_allocate_cow(
> return -ENOSPC;
> convert:
> return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
> -out_trans_cancel:
> +
> +out_unreserve:
> xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
> XFS_QMOPT_RES_REGBLKS);
> -out:
> +out_trans_cancel:
> if (tp)
> xfs_trans_cancel(tp);
> return error;
> --
> 2.17.0
>
next prev parent reply other threads:[~2018-09-07 7:19 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-07 1:51 [PATCH] xfs: fix transaction leak in xfs_reflink_allocate_cow() Dave Chinner
2018-09-07 2:40 ` Darrick J. Wong [this message]
2018-09-07 13:11 ` Alexander Y. Fomichev
2018-09-07 14:03 ` Brian Foster
2018-09-10 7:02 ` Christoph Hellwig
2018-09-12 9:24 ` Alexander Y. Fomichev
2018-09-17 16:29 ` Darrick J. Wong
2018-09-17 16:30 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180907024047.GA4607@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=git.user@gmail.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).