From: Mark Tinguely <tinguely@sgi.com>
To: Jeff Liu <jeff.liu@oracle.com>
Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: [PATCH 1/4] xfs: wake up cil->xc_commit_wait while removing ctx from cil->xc_committing
Date: Mon, 30 Dec 2013 09:20:54 -0600
Message-ID: <52C18F56.70709@sgi.com>
In-Reply-To: <52B98292.5040002@oracle.com>

On 12/24/13 06:48, Jeff Liu wrote:
> From: Jie Liu <jeff.liu@oracle.com>
>
> I can easily hit a hang while running fsstress and shutting down
> XFS on an SSD via the test below:
>
> for ((i=0;i<10;i++))
> do
>      echo "[$i] Fire up..."
>      mount /dev/sda7 /xfs
>      fsstress -d /xfs -n 1000 -p 100 >/dev/null 2>&1 &
>      sleep 10
>      godown /xfs
>      wait
>      killall -q fsstress
>      umount /xfs
>      echo "[$i] Done...."
>      echo
> done
>
> which yields a backtrace as below:
>
> [  246.268987] INFO: task fsstress:3347 blocked for more than 120 seconds.
> [  246.268992]       Tainted: PF          O 3.13.0-rc2+ #4
> [  246.268994] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  246.268996] fsstress        D ffff88026f254440     0  3347   3284
> <snip>
> [  246.269013] Call Trace:
> [  246.269022]  [<ffffffff816f3829>] schedule+0x29/0x70
> [  246.269054]  [<ffffffffa0c4546b>] xlog_cil_force_lsn+0x1cb/0x220 [xfs]
> [  246.269059]  [<ffffffff81097210>] ? wake_up_state+0x20/0x20
> [  246.269064]  [<ffffffff811e9110>] ? do_fsync+0x80/0x80
> [  246.269087]  [<ffffffffa0c43881>] _xfs_log_force+0x61/0x270 [xfs]
> [  246.269091]  [<ffffffff8128b490>] ? jbd2_log_wait_commit+0x110/0x180
> [  246.269095]  [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
> [  246.269098]  [<ffffffff811e9110>] ? do_fsync+0x80/0x80
> [  246.269120]  [<ffffffffa0c43ab6>] xfs_log_force+0x26/0x80 [xfs]
> [  246.269139]  [<ffffffffa0bea31d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
> [  246.269143]  [<ffffffff811e9130>] sync_fs_one_sb+0x20/0x30
> [  246.269147]  [<ffffffff811bd5d2>] iterate_supers+0xb2/0x110
> [  246.269150]  [<ffffffff811e9262>] sys_sync+0x62/0xa0
> [  246.269156]  [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f
> [  266.335154] XFS (sda7): xfs_log_force: error 5 returned.
> [  296.400515] XFS (sda7): xfs_log_force: error 5 returned.
>
> In xlog_cil_force_lsn(), if the task finds a previous sequence still
> committing, it needs to wait until all of those previous sequence commits
> complete, i.e., it blocks on the cil->xc_commit_wait wait queue.  In normal
> situations, the ctx with a previous sequence will eventually commit and
> wake up tasks on cil->xc_commit_wait after getting a valid commit_lsn
> (see xlog_cil_push()).  However, if something goes wrong during commit,
> e.g., XLOG_STATE_IOERROR is detected, the commit is aborted and the ctx is
> simply removed from the cil->xc_committing list, but the waiting tasks
> are not woken up in this case.  Hence, a race condition can happen as
> below:
>
> Task1                          Task2
>
>                                list_add(&ctx->committing, &cil->xc_committing);
>
> xlog_wait(&cil->xc_commit_wait..)
> schedule()...
>
>                                Aborting!! list_del(&ctx->committing);
>                                wake_up_all(&cil->xc_commit_wait); <-- MISSING!
>
> As a result, we should handle this situation in xlog_cil_committed().
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> ---
>   fs/xfs/xfs_log_cil.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 5eb51fc..8c7e9c7 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -406,6 +406,8 @@ xlog_cil_committed(
>
>   	spin_lock(&ctx->cil->xc_push_lock);
>   	list_del(&ctx->committing);
> +	if (abort)
> +		wake_up_all(&ctx->cil->xc_commit_wait);
>   	spin_unlock(&ctx->cil->xc_push_lock);
>
>   	xlog_cil_free_logvec(ctx->lv_chain);

Hi Jeff, I hope you had a good break,

So you are saying the wakeup in the CIL push error path is missing?
I agree with that. But I don't like adding a new wakeup to
xlog_cil_committed(), which runs after the log buffer is written.

Thanks.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 7+ messages
2013-12-24 12:48 [PATCH 1/4] xfs: wake up cil->xc_commit_wait while removing ctx from cil->xc_committing Jeff Liu
2013-12-30 15:20 ` Mark Tinguely [this message]
2014-01-01 14:38   ` Jeff Liu
2014-01-02  0:45     ` Dave Chinner
2014-01-03 10:25       ` Jeff Liu
2014-01-03 13:17         ` Jeff Liu
2014-01-03 15:30           ` Mark Tinguely
