From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks
Date: Mon, 6 Mar 2017 13:23:07 -0500 [thread overview]
Message-ID: <20170306182307.GF3223@bfoster.bfoster> (raw)
In-Reply-To: <20170306180044.GB5280@birch.djwong.org>
On Mon, Mar 06, 2017 at 10:00:44AM -0800, Darrick J. Wong wrote:
> On Fri, Mar 03, 2017 at 12:23:18PM -0500, Brian Foster wrote:
> > Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
> > failure") fixed one regression in the iomap error handling code and
> > exposed another. The fundamental problem is that if a buffered write
> > is a rewrite of preexisting delalloc blocks and the write fails, the
> > failure handling code can punch out preexisting blocks with valid
> > file data.
> >
> > This was reproduced directly by sub-block writes in the LTP
> > kernel/syscalls/write/write03 test. A first 100 byte write allocates
> > a single block in a file. A subsequent 100 byte write fails and
> > punches out the block, including the data successfully written by
> > the previous write.
> >
> > To address this problem, update the ->iomap_begin() handler to
> > distinguish newly allocated delalloc blocks from preexisting
> > delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
> > ->iomap_end() handler to decide when a failed or short write should
> > punch out delalloc blocks.
> >
> > This introduces the subtle requirement that ->iomap_begin() should
> > never combine newly allocated delalloc blocks with existing blocks
> > in the resulting iomap descriptor. This can occur when a new
> > delalloc reservation merges with a neighboring extent that is part
> > of the current write, for example. Therefore, update
> > xfs_bmapi_reserve_delalloc() to conditionally return the allocated
> > record along with the updated 'got' record and map the former into
> > the iomap. This ensures that preexisting delalloc blocks are always
> > handled as "found" blocks and not punched out on a failed rewrite.
> >
> > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > This version simply changes the interface to
> > xfs_bmapi_reserve_delalloc() for using the allocated range rather than
> > the fully up-to-date record after the allocation. Thoughts? If this is
> > still too ugly I suppose we could just do as Christoph suggested
> > originally and drop the update of got entirely. I'm just not that fond
> > of the landmine that leaves around should some future caller expect
> > 'got' to be updated, or worse, for somebody to re-add it to
> > bmapi_reserve_delalloc() without us remembering the requirement for
> > IOMAP_F_NEW and quietly breaking write failure handling again.
>
> Could you add a comment above xfs_bmapi_reserve_delalloc summarizing
> what the function is supposed to do and capturing the justification for
> for arec? I concede it's a little funny to mention iomap requirements
> in libxfs, but I'd rather it get written down than left as a potential
> landmine.
>
Sure... I'm not sure we need to describe the iomap case here (the
comment at the callsite should explain what's going on there), but we
can certainly describe got and arec here.
> > Brian
> >
> > v2:
> > - Use arec param rather than bmapi flag in xfs_bmapi_reserve_delalloc().
> > v1: http://www.spinics.net/lists/linux-xfs/msg04604.html
> >
> > fs/xfs/libxfs/xfs_bmap.c | 11 ++++++++---
> > fs/xfs/libxfs/xfs_bmap.h | 3 ++-
> > fs/xfs/xfs_iomap.c | 34 +++++++++++++++++++++++++---------
> > fs/xfs/xfs_reflink.c | 2 +-
> > 4 files changed, 36 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index a9c66d4..dcc6c13 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -4159,7 +4159,8 @@ xfs_bmapi_reserve_delalloc(
> > xfs_filblks_t prealloc,
> > struct xfs_bmbt_irec *got,
> > xfs_extnum_t *lastx,
> > - int eof)
> > + int eof,
> > + struct xfs_bmbt_irec *arec)
> > {
> > struct xfs_mount *mp = ip->i_mount;
> > struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
> > @@ -4236,11 +4237,15 @@ xfs_bmapi_reserve_delalloc(
> > got->br_startblock = nullstartblock(indlen);
> > got->br_blockcount = alen;
> > got->br_state = XFS_EXT_NORM;
> > + if (arec)
> > + *arec = *got;
> > +
> > xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
> >
> > /*
> > - * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
> > - * might have merged it into one of the neighbouring ones.
> > + * Update our extent pointer if the caller asked for entire extents.
> > + * xfs_bmap_add_extent_hole_delay() might have merged it into one of
> > + * the neighbouring ones.
> > */
> > xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> > index cdef87d..c61f2a7 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.h
> > +++ b/fs/xfs/libxfs/xfs_bmap.h
> > @@ -243,7 +243,8 @@ int xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
> > int xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
> > int xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
> > xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
> > - struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof);
> > + struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof,
> > + struct xfs_bmbt_irec *arec);
> >
> > enum xfs_bmap_intent_type {
> > XFS_BMAP_MAP = 1,
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 41662fb..bdbab0f 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -531,7 +531,7 @@ xfs_file_iomap_begin_delay(
> > XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > xfs_fileoff_t end_fsb;
> > int error = 0, eof = 0;
> > - struct xfs_bmbt_irec got;
> > + struct xfs_bmbt_irec got, arec;
> > xfs_extnum_t idx;
> > xfs_fsblock_t prealloc_blocks = 0;
> >
> > @@ -613,7 +613,8 @@ xfs_file_iomap_begin_delay(
> >
> > retry:
> > error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
> > - end_fsb - offset_fsb, prealloc_blocks, &got, &idx, eof);
> > + end_fsb - offset_fsb, prealloc_blocks, &got, &idx,
> > + eof, &arec);
> > switch (error) {
> > case 0:
> > break;
> > @@ -630,6 +631,15 @@ xfs_file_iomap_begin_delay(
> > goto out_unlock;
> > }
> >
> > + /*
> > + * IOMAP_F_NEW controls whether we punch out delalloc blocks if the
> > + * write happens to fail, which means we can't combine newly allocated
> > + * blocks with preexisting delalloc blocks in the same iomap. got may
> > + * have been merged with contiguous extents, so use arec to map only the
> > + * newly allocated blocks.
> > + */
> > + got = arec;
> > + iomap->flags = IOMAP_F_NEW;
> > trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
> > done:
> > if (isnullstartblock(got.br_startblock))
> > @@ -1071,16 +1081,22 @@ xfs_file_iomap_end_delalloc(
> > struct xfs_inode *ip,
> > loff_t offset,
> > loff_t length,
> > - ssize_t written)
> > + ssize_t written,
> > + struct iomap *iomap)
> > {
> > struct xfs_mount *mp = ip->i_mount;
> > xfs_fileoff_t start_fsb;
> > xfs_fileoff_t end_fsb;
> > int error = 0;
> >
> > - /* behave as if the write failed if drop writes is enabled */
> > - if (xfs_mp_drop_writes(mp))
> > + /*
> > + * Behave as if the write failed if drop writes is enabled. Set the NEW
> > + * flag to force delalloc cleanup.
> > + */
> > + if (xfs_mp_drop_writes(mp)) {
> > + iomap->flags |= IOMAP_F_NEW;
> > written = 0;
> > + }
> >
> > /*
> > * start_fsb refers to the first unused block after a short write. If
> > @@ -1094,14 +1110,14 @@ xfs_file_iomap_end_delalloc(
> > end_fsb = XFS_B_TO_FSB(mp, offset + length);
> >
> > /*
> > - * Trim back delalloc blocks if we didn't manage to write the whole
> > - * range reserved.
> > + * Trim delalloc blocks if they were allocated by this write and we
> > + * didn't manage to write the whole range.
> > *
> > * We don't need to care about racing delalloc as we hold i_mutex
> > * across the reserve/allocate/unreserve calls. If there are delalloc
> > * blocks in the range, they are ours.
> > */
> > - if (start_fsb < end_fsb) {
> > + if (iomap->flags & IOMAP_F_NEW && start_fsb < end_fsb) {
>
> Doesn't the IOMAP_F_NEW check here need parentheses?
>
I didn't think so, but I can add parens for clarity (which I prefer
anyways, not sure why I didn't use them here) if nothing else. Thanks.
Brian
> --D
>
> > truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> > XFS_FSB_TO_B(mp, end_fsb) - 1);
> >
> > @@ -1131,7 +1147,7 @@ xfs_file_iomap_end(
> > {
> > if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
> > return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
> > - length, written);
> > + length, written, iomap);
> > return 0;
> > }
> >
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index da6d08f..c29cc08 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -313,7 +313,7 @@ xfs_reflink_reserve_cow(
> > return error;
> >
> > error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
> > - imap->br_blockcount, 0, &got, &idx, eof);
> > + imap->br_blockcount, 0, &got, &idx, eof, NULL);
> > if (error == -ENOSPC || error == -EDQUOT)
> > trace_xfs_reflink_cow_enospc(ip, imap);
> > if (error)
> > --
> > 2.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2017-03-06 18:24 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-03 17:23 [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks Brian Foster
2017-03-06 18:00 ` Darrick J. Wong
2017-03-06 18:23 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170306182307.GF3223@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).