From: Brian Foster <bfoster@redhat.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: linux-xfs@vger.kernel.org, "Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [PATCH] xfs/log: protect the logging content under xc_ctx_lock
Date: Thu, 31 Oct 2019 07:36:40 -0400 [thread overview]
Message-ID: <20191031113640.GA54006@bfoster> (raw)
In-Reply-To: <1572442631-4472-1-git-send-email-kernelfans@gmail.com>
Dropped linux-fsdevel from cc. There's no reason to spam -fsdevel with
low level XFS patches.
On Wed, Oct 30, 2019 at 09:37:11PM +0800, Pingfan Liu wrote:
> xc_cil_lock is not enough to protect the integrity of a trans logging.
> Taking the scenario:
> cpuA cpuB cpuC
>
> xlog_cil_insert_format_items()
>
> spin_lock(&cil->xc_cil_lock)
> link transA's items to xc_cil,
> including item1
> spin_unlock(&cil->xc_cil_lock)
So you commit a transaction, item1 ends up on the CIL.
> xlog_cil_push() fetches transA's item under xc_cil_lock
xlog_cil_push() doesn't use ->xc_cil_lock, so I'm not sure what this
means. This sequence executes under ->xc_ctx_lock in write mode, which
locks out all transaction committers.
> issue transB, modify item1
So presumably transB joins item1 while it is on the CIL from trans A and
commits.
> xlog_write(), but now, item1 contains content from transB and we have a broken transA
I'm not following how this is possible. The CIL push above, under
exclusive lock, removes each log item from ->xc_cil and pulls the log
vectors off of the log items to form the lv chain on the CIL context.
This means that the transB commit either updates the lv attached to the
log item from transA with the latest in-core version or uses the new
shadow buffer allocated in the commit path of transB. Either way is fine
because there is no guarantee of per-transaction granularity in the
on-disk log. The purpose of the on-disk log is to guarantee filesystem
consistency after a crash.
All in all, I can't really tell what problem you're describing here. If
you believe there's an issue in this code, I'd suggest to either try and
instrument it manually to reproduce a demonstrable problem and/or
provide far more detailed of a description to explain it.
>
> Survive this race issue by putting under the protection of xc_ctx_lock.
> Meanwhile the xc_cil_lock can be dropped as xc_ctx_lock does it against
> xlog_cil_insert_items()
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Cc: Brian Foster <bfoster@redhat.com>
> To: linux-xfs@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> ---
FYI, this patch also doesn't apply to for-next. I'm guessing because
it's based on your previous patch to add the spinlock around the loop.
> fs/xfs/xfs_log_cil.c | 35 +++++++++++++++++++----------------
> 1 file changed, 19 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 004af09..f8df3b5 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -723,22 +723,6 @@ xlog_cil_push(
> */
> lv = NULL;
> num_iovecs = 0;
> - spin_lock(&cil->xc_cil_lock);
> - while (!list_empty(&cil->xc_cil)) {
There's a comment just above that documents this loop that isn't
moved/modified.
> - struct xfs_log_item *item;
> -
> - item = list_first_entry(&cil->xc_cil,
> - struct xfs_log_item, li_cil);
> - list_del_init(&item->li_cil);
> - if (!ctx->lv_chain)
> - ctx->lv_chain = item->li_lv;
> - else
> - lv->lv_next = item->li_lv;
> - lv = item->li_lv;
> - item->li_lv = NULL;
> - num_iovecs += lv->lv_niovecs;
> - }
> - spin_unlock(&cil->xc_cil_lock);
>
> /*
> * initialise the new context and attach it to the CIL. Then attach
> @@ -783,6 +767,25 @@ xlog_cil_push(
> up_write(&cil->xc_ctx_lock);
>
> /*
> + * cil->xc_cil_lock around this loop can be dropped, since xc_ctx_lock
> + * protects us against xlog_cil_insert_items().
> + */
> + while (!list_empty(&cil->xc_cil)) {
> + struct xfs_log_item *item;
> +
> + item = list_first_entry(&cil->xc_cil,
> + struct xfs_log_item, li_cil);
> + list_del_init(&item->li_cil);
> + if (!ctx->lv_chain)
> + ctx->lv_chain = item->li_lv;
> + else
> + lv->lv_next = item->li_lv;
> + lv = item->li_lv;
> + item->li_lv = NULL;
> + num_iovecs += lv->lv_niovecs;
> + }
> +
This places the associated loop outside of ->xc_ctx_lock, which means we
can now race modifying ->xc_cil during a CIL push and a transaction
commit. Have you tested this?
Brian
> + /*
> * Build a checkpoint transaction header and write it to the log to
> * begin the transaction. We need to account for the space used by the
> * transaction header here as it is not accounted for in xlog_write().
> --
> 2.7.5
>
next prev parent reply other threads:[~2019-10-31 11:36 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-30 6:29 [PATCH] xfs/log: protect xc_cil in xlog_cil_push() Pingfan Liu
2019-10-30 12:53 ` Brian Foster
2019-10-30 13:33 ` Pingfan Liu
2019-10-30 13:37 ` [PATCH] xfs/log: protect the logging content under xc_ctx_lock Pingfan Liu
2019-10-30 16:48 ` Darrick J. Wong
2019-10-31 3:48 ` Pingfan Liu
2019-10-31 11:36 ` Brian Foster [this message]
2019-11-01 4:02 ` Pingfan Liu
2019-10-31 21:40 ` Dave Chinner
2019-11-01 3:39 ` Pingfan Liu
2019-10-31 21:25 ` [PATCH] xfs/log: protect xc_cil in xlog_cil_push() Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191031113640.GA54006@bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=kernelfans@gmail.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).