From: Brian Foster <bfoster@redhat.com>
To: Alex Lyakas <alex@zadarastorage.com>
Cc: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
Shyam Kaushik <shyam@zadarastorage.com>,
Yair Hershko <yair@zadarastorage.com>
Subject: Re: [RFC - PATCH] xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of the first attribute
Date: Mon, 7 Aug 2017 13:59:29 -0400 [thread overview]
Message-ID: <20170807175928.GB49072@bfoster.bfoster> (raw)
In-Reply-To: <CAOcd+r2=tfvHuH=EE3j9y2w9KT42k9_HXdjUzB+4jCnd4KQx+Q@mail.gmail.com>
On Mon, Aug 07, 2017 at 08:00:46PM +0300, Alex Lyakas wrote:
> Hello Brian,
>
> Thanks for the review.
>
> On Mon, Aug 7, 2017 at 6:16 PM, Brian Foster <bfoster@redhat.com> wrote:
> > On Mon, Aug 07, 2017 at 03:54:03PM +0300, Alex Lyakas wrote:
> >> The new attribute leaf buffer is not held locked across the transaction roll
> >> between the shortform->leaf modification and the addition of the new entry.
> >> As a result, the attribute buffer modification being made is not atomic
> >> from an operational perspective.
> >> Hence the AIL push can grab it in the transient state of "just created"
> >> after the initial transaction is rolled, because the buffer has been
> >> released.
> >> This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero,
> >> treating this as in-memory corruption, and shutting down the filesystem.
> >>
> >> More info at:
> >> https://www.spinics.net/lists/linux-xfs/msg08778.html
> >>
> >
> > FYI.. I'm not sure how appropriate it is to use URLs in commit log
> > descriptions. Perhaps somebody else can chime in on that.
> I will get rid of it.
>
> >
> >> This patch is based on kernel 3.18.19, which is a long-term (although EOL)
> >> kernel.
> >> This is the reason it is marked as RFC.
> >>
> >
> > It's probably best to develop against the latest kernel, get the fix
> > right, then worry about v3.18 for a backport.
> Unfortunately, for us this is exactly the opposite. We are running
> kernel 3.18 in production, and that's where we need the fix. Moving to
> a different kernel is a significant effort for us. We do it from time
> to time, but it's always to a long-term stable kernel and not to one
> of the latests. Still, below I provide a patch for 4.13-rc4.
>
This is the same as for most people. Nobody is running the latest
for-next kernel in a critical production environment. We debug/diagnose
issues against older kernels all the time, fix in the latest development
tree and backport from there as needed. Once the fix is backported and
picked up in a stable kernel, you can update your production systems at
that point.
It sounds like you have a reliable reproducer (albeit with injection of
an artificial delay), so you should be able to reproduce and test/verify
against a development kernel. This doesn't require moving any of your
infrastructure to the latest kernel (use a vm, for example).
> >
> >> Reproducing the issue is achieved by adding "msleep(1000)" after the
> >> xfs_trans_roll() call.
> >> From the limited testing, this patch seems to fix the issue.
> >> Once/if the community approves this patch, we will do a formal testing.
> >>
> >> Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
> >> ---
> >
> > Note that your patch doesn't apply for me:
> >
> > ...
> > patching file fs/xfs/libxfs/xfs_attr.c
> > Hunk #1 FAILED at 285.
> > patch: **** malformed patch at line 70: commits */
> >
> > Perhaps it has been damaged by your mailer? Otherwise, a few initial
> > comments...
> Trying a different mailer now.
>
Same problem. It looks like everything is converted to spaces. Are you
copy+pasting the patch into an email? Note that usually will not work.
You can use something like 'git send-email' to post correctly formatted
patches over mail. Also note you can send it to yourself initially and
test apply it (and/or run scripts/checkpatch.pl against it) to make sure
it works.
> >
> >> fs/xfs/libxfs/xfs_attr.c | 24 +++++++++++++++++++++---
> >> fs/xfs/libxfs/xfs_attr_leaf.c | 5 ++++-
> >> fs/xfs/libxfs/xfs_attr_leaf.h | 2 +-
> >> 3 files changed, 26 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> >> index 353fb42..c0ced12 100644
> >> --- a/fs/xfs/libxfs/xfs_attr.c
> >> +++ b/fs/xfs/libxfs/xfs_attr.c
...
> >> @@ -328,48 +329,65 @@ xfs_attr_set(
...
> > Hmm, I don't think this is right. We don't want to release the buffer
> > while the transaction still has a reference. Perhaps we should move this
> > to the "out:" patch after the transaction cancel (and make sure the
> > pointer is reset at the appropriate place)..?
> I think I understand what you are saying. After we call
> xfs_trans_bhold(), we need first the original transaction to commit or
> to abort. Then we must either rejoin the buffer to a different
> transaction or release it. I believe the below patch addresses this
> issue.
>
As noted in the last mail, it may not matter since the tx is committed
or aborted by the time an error is returned here. That said, I still
prefer the code you have below. :)
...
> >> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> >> index b7cd0a0..2e03c32 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> >> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> >> @@ -649,23 +649,24 @@ xfs_attr_shortform_getvalue(xfs_da_args_t *args)
> >> args->valuelen = sfe->valuelen;
> >> memcpy(args->value, &sfe->nameval[args->namelen],
> >> args->valuelen);
> >> return -EEXIST;
> >> }
> >> return -ENOATTR;
> >> }
> >>
> >> /*
> >> * Convert from using the shortform to the leaf.
> >> + * Upon success, return the leaf buffer.
> >> */
> >> int
> >> -xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
> >> +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, xfs_buf_t **bpp)
> >> {
> >> xfs_inode_t *dp;
> >> xfs_attr_shortform_t *sf;
> >> xfs_attr_sf_entry_t *sfe;
> >> xfs_da_args_t nargs;
> >> char *tmpbuffer;
> >> int error, i, size;
> >> xfs_dablk_t blkno;
> >> struct xfs_buf *bp;
> >> xfs_ifork_t *ifp;
> >> @@ -733,20 +734,22 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
> >> ASSERT(error == -ENOATTR);
> >> error = xfs_attr3_leaf_add(bp, &nargs);
> >> ASSERT(error != -ENOSPC);
> >> if (error)
> >> goto out;
> >> sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
> >> }
> >> error = 0;
> >>
> >> out:
> >> + if (error == 0)
> >> + *bpp = bp;
> >
> > It looks like the only way error == 0 is where it is set just above, so
> > we could probably move this before the label.
> I am a bit confused by this code in xfs_attr_shortform_to_leaf():
> ...
> error = xfs_attr3_leaf_create(args, blkno, &bp);
> if (error) {
> error = xfs_da_shrink_inode(args, 0, bp);
> bp = NULL;
> if (error)
> goto out;
> xfs_idata_realloc(dp, size, XFS_ATTR_FORK); /* try to put */
> memcpy(ifp->if_u1.if_data, tmpbuffer, size); /* it back */
> goto out;
> }
> Here we fail to convert to a leaf. But if xfs_da_shrink_inode()
> succeeds, we return error==0 back to the caller. But we actually
> failed to convert. This, however, is not directly related to your
> comment.
>
Hm, I'm not aware of any reason to intentionally clear the error in that
case. That may be a bug.
Brian
> >
> > I'm also wondering whether it might be cleaner to 1.) conditionally
> > return the buffer when sf->hdr.count == 0 because that is the only case
> > where the problem occurs and 2.) do the trans_bhold() here in
> > shortform_to_leaf() when we do actually return it.
> >
> > Brian
> >
> >> kmem_free(tmpbuffer);
> >> return error;
> >> }
> >>
>
> Here is the patch based on kernel 4.13-rc4.
>
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index de7b9bd..982e322 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -218,6 +218,7 @@
> xfs_fsblock_t firstblock;
> int rsvd = (flags & ATTR_ROOT) != 0;
> int error, err2, local;
> + struct xfs_buf *leaf_bp = NULL;
>
> XFS_STATS_INC(mp, xs_attr_set);
>
> @@ -327,7 +328,13 @@
> * GROT: another possible req'mt for a double-split btree op.
> */
> xfs_defer_init(args.dfops, args.firstblock);
> - error = xfs_attr_shortform_to_leaf(&args);
> + error = xfs_attr_shortform_to_leaf(&args, &leaf_bp);
> + /*
> + * Prevent the leaf buffer from being unlocked
> + * when "args.trans" transaction commits.
> + */
> + if (leaf_bp)
> + xfs_trans_bhold(args.trans, leaf_bp);
> if (!error)
> error = xfs_defer_finish(&args.trans, args.dfops, dp);
> if (error) {
> @@ -345,6 +352,14 @@
> if (error)
> goto out;
>
> + /*
> + * Rejoin the leaf buffer to the new transaction.
> + * This allows a subsequent read to find the buffer in the
> + * transaction (and avoid a deadlock).
> + */
> + xfs_trans_bjoin(args.trans, leaf_bp);
> + /* Prevent from being released at the end of the function */
> + leaf_bp = NULL;
> }
>
> if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> @@ -376,6 +391,8 @@
> out:
> if (args.trans)
> xfs_trans_cancel(args.trans);
> + if (leaf_bp)
> + xfs_buf_relse(leaf_bp);
> xfs_iunlock(dp, XFS_ILOCK_EXCL);
> return error;
> }
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index c6c15e5..ab73e4b 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -740,9 +740,10 @@ STATIC void xfs_attr3_leaf_moveents(struct
> xfs_da_args *args,
>
> /*
> * Convert from using the shortform to the leaf.
> + * Upon success, return the leaf buffer.
> */
> int
> -xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
> +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, struct xfs_buf **bpp)
> {
> xfs_inode_t *dp;
> xfs_attr_shortform_t *sf;
> @@ -822,6 +823,7 @@ STATIC void xfs_attr3_leaf_moveents(struct
> xfs_da_args *args,
> sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
> }
> error = 0;
> + *bpp = bp;
>
> out:
> kmem_free(tmpbuffer);
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
> index f7dda0c..7382d4e 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.h
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.h
> @@ -48,7 +48,7 @@
> void xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff);
> int xfs_attr_shortform_lookup(struct xfs_da_args *args);
> int xfs_attr_shortform_getvalue(struct xfs_da_args *args);
> -int xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
> +int xfs_attr_shortform_to_leaf(struct xfs_da_args *args, struct
> xfs_buf **bpp);
> int xfs_attr_shortform_remove(struct xfs_da_args *args);
> int xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
> int xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2017-08-07 17:59 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-07 12:54 [RFC - PATCH] xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of the first attribute Alex Lyakas
2017-08-07 15:16 ` Brian Foster
2017-08-07 16:37 ` Darrick J. Wong
2017-08-07 17:11 ` Alex Lyakas
2017-08-07 17:26 ` Brian Foster
2017-08-07 18:10 ` Darrick J. Wong
2017-08-08 7:51 ` Alex Lyakas
2017-08-07 17:00 ` Alex Lyakas
2017-08-07 17:59 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170807175928.GB49072@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=alex@zadarastorage.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
--cc=shyam@zadarastorage.com \
--cc=yair@zadarastorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox