From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow()
Date: Mon, 17 Sep 2018 16:51:10 -0700
Message-ID: <20180917235110.GA20086@magnolia>
In-Reply-To: <20180917205354.15401-2-hch@lst.de>

On Mon, Sep 17, 2018 at 10:53:45PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When xfs_reflink_allocate_cow() allocates a transaction, it drops
> the ILOCK to perform the operation. This introduces a race condition
> where another thread modifying the file can perform the COW
> allocation operation underneath us. This results in the retry loop
> finding an allocated block and jumping straight to the conversion
> code. It does not, however, cancel the transaction it holds and so
> this gets leaked. This results in a lockdep warning:
>
> ================================================
> WARNING: lock held when returning to user space!
> 4.18.5 #1 Not tainted
> ------------------------------------------------
> worker/6123 is leaving the kernel with locks still held!
> 1 lock held by worker/6123:
> #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
>
> And eventually the filesystem deadlocks because it runs out of log
> space that is reserved by the leaked transaction and never gets
> released.
>
> The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
> gotos - it's no surprise that it has a bug where the flow through
> several goto jumps then fails to clean up context from a non-obvious
> logic path. Clean up the logic flow and make sure every path does
> the right thing.
>
> Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
> Tested-by: Alexander Y. Fomichev <git.user@gmail.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [hch: slight refactor]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
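
For anyone reading the archive later, here is a minimal sketch of the
pattern the commit message describes: drop the lock, take a reservation,
retake the lock, look again, and give the reservation back on every path
that no longer needs it. It is an illustration only, not the XFS code.
struct sketch_inode, sketch_racer_allocates() and sketch_allocate_cow()
are hypothetical names, the "lock" is just a bool, and open_trans counts
transactions that were allocated but neither committed nor cancelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inode, its ILOCK and the log reservation. */
struct sketch_inode {
	bool	locked;		/* stands in for the ILOCK */
	bool	cow_allocated;	/* a real COW extent covers the range */
	int	open_trans;	/* transactions allocated but not finished */
};

/* Models another thread that runs while the lock is dropped. */
typedef void (*sketch_racer_t)(struct sketch_inode *ip);

void sketch_racer_allocates(struct sketch_inode *ip)
{
	ip->cow_allocated = true;
}

/*
 * Returns 0 on success. On every return path ip->open_trans is back to
 * the value it had on entry, which is the property the leak violated.
 */
int sketch_allocate_cow(struct sketch_inode *ip, sketch_racer_t racer)
{
	assert(ip->locked);

	/* First lookup: nothing to clean up yet, go straight to convert. */
	if (ip->cow_allocated)
		goto convert;

	/* Drop the lock to reserve the transaction, then take it back. */
	ip->locked = false;
	ip->open_trans++;
	if (racer != NULL)
		racer(ip);
	ip->locked = true;

	/* Second lookup: the state may have changed while unlocked. */
	if (ip->cow_allocated) {
		ip->open_trans--;	/* cancel: we hold a transaction now */
		goto convert;
	}

	ip->cow_allocated = true;	/* do the allocation ourselves */
	ip->open_trans--;		/* commit */
convert:
	return 0;
}
```

Calling sketch_allocate_cow() with sketch_racer_allocates as the racer
exercises the path that used to leak: the second lookup finds the extent
and the transaction has to be cancelled before jumping to convert.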
> ---
> fs/xfs/xfs_reflink.c | 127 ++++++++++++++++++++++++++-----------------
> 1 file changed, 77 insertions(+), 50 deletions(-)
>
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 38f405415b88..d60d0eeed7b9 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -352,6 +352,47 @@ xfs_reflink_convert_cow(
>  	return error;
>  }
>
> +/*
> + * Find the extent that maps the given range in the COW fork. Even if the extent
> + * is not shared we might have a preallocation for it in the COW fork. If so we
> + * use that rather than trigger a new allocation.
> + */
> +static int
> +xfs_find_trim_cow_extent(
> +	struct xfs_inode	*ip,
> +	struct xfs_bmbt_irec	*imap,
> +	bool			*shared,
> +	bool			*found)
> +{
> +	xfs_fileoff_t		offset_fsb = imap->br_startoff;
> +	xfs_filblks_t		count_fsb = imap->br_blockcount;
> +	struct xfs_iext_cursor	icur;
> +	struct xfs_bmbt_irec	got;
> +	bool			trimmed;
> +
> +	*found = false;
> +
> +	/*
> +	 * If we don't find an overlapping extent, trim the range we need to
> +	 * allocate to fit the hole we found.
> +	 */
> +	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) ||
> +	    got.br_startoff > offset_fsb)
> +		return xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> +
> +	*shared = true;
> +	if (isnullstartblock(got.br_startblock)) {
> +		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> +		return 0;
> +	}
> +
> +	/* real extent found - no need to allocate */
> +	xfs_trim_extent(&got, offset_fsb, count_fsb);
> +	*imap = got;
> +	*found = true;
> +	return 0;
> +}
> +
>  /* Allocate all CoW reservations covering a range of blocks in a file. */
>  int
>  xfs_reflink_allocate_cow(
> @@ -363,78 +404,64 @@ xfs_reflink_allocate_cow(
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = imap->br_startoff;
>  	xfs_filblks_t		count_fsb = imap->br_blockcount;
> -	struct xfs_bmbt_irec	got;
> -	struct xfs_trans	*tp = NULL;
> +	struct xfs_trans	*tp;
>  	int			nimaps, error = 0;
> -	bool			trimmed;
> +	bool			found;
>  	xfs_filblks_t		resaligned;
>  	xfs_extlen_t		resblks = 0;
> -	struct xfs_iext_cursor	icur;
>
> -retry:
> -	ASSERT(xfs_is_reflink_inode(ip));
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +	ASSERT(xfs_is_reflink_inode(ip));
>
> -	/*
> -	 * Even if the extent is not shared we might have a preallocation for
> -	 * it in the COW fork. If so use it.
> -	 */
> -	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) &&
> -	    got.br_startoff <= offset_fsb) {
> -		*shared = true;
> -
> -		/* If we have a real allocation in the COW fork we're done. */
> -		if (!isnullstartblock(got.br_startblock)) {
> -			xfs_trim_extent(&got, offset_fsb, count_fsb);
> -			*imap = got;
> -			goto convert;
> -		}
> +	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
> +	if (error || !*shared)
> +		return error;
> +	if (found)
> +		goto convert;
>
> -		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> -	} else {
> -		error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> -		if (error || !*shared)
> -			goto out;
> -	}
> +	resaligned = xfs_aligned_fsb_count(imap->br_startoff,
> +		imap->br_blockcount, xfs_get_cowextsz_hint(ip));
> +	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
>
> -	if (!tp) {
> -		resaligned = xfs_aligned_fsb_count(imap->br_startoff,
> -			imap->br_blockcount, xfs_get_cowextsz_hint(ip));
> -		resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> +	xfs_iunlock(ip, *lockmode);
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> +	*lockmode = XFS_ILOCK_EXCL;
> +	xfs_ilock(ip, *lockmode);
>
> -		xfs_iunlock(ip, *lockmode);
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> -		*lockmode = XFS_ILOCK_EXCL;
> -		xfs_ilock(ip, *lockmode);
> +	if (error)
> +		return error;
>
> -		if (error)
> -			return error;
> +	error = xfs_qm_dqattach_locked(ip, false);
> +	if (error)
> +		goto out_trans_cancel;
>
> -		error = xfs_qm_dqattach_locked(ip, false);
> -		if (error)
> -			goto out;
> -		goto retry;
> +	/*
> +	 * Check for an overlapping extent again now that we dropped the ilock.
> +	 */
> +	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
> +	if (error || !*shared)
> +		goto out_trans_cancel;
> +	if (found) {
> +		xfs_trans_cancel(tp);
> +		goto convert;
>  	}
>
>  	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
>  			XFS_QMOPT_RES_REGBLKS);
>  	if (error)
> -		goto out;
> +		goto out_trans_cancel;
>
>  	xfs_trans_ijoin(tp, ip, 0);
>
> -	nimaps = 1;
> -
>  	/* Allocate the entire reservation as unwritten blocks. */
> +	nimaps = 1;
>  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
>  			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
>  			resblks, imap, &nimaps);
>  	if (error)
> -		goto out_trans_cancel;
> +		goto out_unreserve;
>
>  	xfs_inode_set_cowblocks_tag(ip);
> -
> -	/* Finish up. */
>  	error = xfs_trans_commit(tp);
>  	if (error)
>  		return error;
> @@ -447,12 +474,12 @@ xfs_reflink_allocate_cow(
>  	return -ENOSPC;
>  convert:
>  	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
> -out_trans_cancel:
> +
> +out_unreserve:
>  	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
>  			XFS_QMOPT_RES_REGBLKS);
> -out:
> -	if (tp)
> -		xfs_trans_cancel(tp);
> +out_trans_cancel:
> +	xfs_trans_cancel(tp);
>  	return error;
>  }
>
> --
> 2.18.0
>
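
The lookup that xfs_find_trim_cow_extent() factors out has three
outcomes: no overlapping COW extent (a hole), a delalloc reservation,
or a real extent. The sketch below models only that three-way decision
in plain C. Everything in it is a hypothetical stand-in: struct
sketch_extent replaces struct xfs_bmbt_irec, the delalloc flag replaces
the isnullstartblock() test, sketch_trim() is a simplified clamp rather
than xfs_trim_extent(), and the hole case simply returns without
modelling xfs_reflink_trim_around_shared(). It also assumes that a
non-NULL got ends after the requested offset, which is what a lookup
for "the extent at or after this offset" would hand back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_extent {
	uint64_t startoff;	/* first file block */
	uint64_t count;		/* number of blocks */
	bool	 delalloc;	/* reservation only, no real blocks yet */
};

enum sketch_result { SKETCH_HOLE, SKETCH_DELALLOC, SKETCH_REAL };

/* Clamp *ext to [bno, bno + len); assumes the two ranges overlap. */
void sketch_trim(struct sketch_extent *ext, uint64_t bno, uint64_t len)
{
	uint64_t start = ext->startoff > bno ? ext->startoff : bno;
	uint64_t ext_end = ext->startoff + ext->count;
	uint64_t end = ext_end < bno + len ? ext_end : bno + len;

	ext->startoff = start;
	ext->count = end - start;
}

enum sketch_result
sketch_find_trim(struct sketch_extent *req, const struct sketch_extent *got)
{
	struct sketch_extent real;

	/* Nothing found, or the extent found starts past our offset. */
	if (got == NULL || got->startoff > req->startoff)
		return SKETCH_HOLE;

	/* Delalloc: shrink the request to the reservation, still allocate. */
	if (got->delalloc) {
		sketch_trim(req, got->startoff, got->count);
		return SKETCH_DELALLOC;
	}

	/* Real extent: hand back the overlapping part, no allocation. */
	real = *got;
	sketch_trim(&real, req->startoff, req->count);
	*req = real;
	return SKETCH_REAL;
}
```

Only SKETCH_REAL corresponds to *found = true in the patch; the other
two results still need a transaction, which is why the caller runs the
lookup a second time after retaking the lock.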