From: Sage Weil <sage@newdream.net>
To: liubo <liubo2009@cn.fujitsu.com>
Cc: Linux Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 1/6] Btrfs: fix deadlock in btrfs_commit_transaction
Date: Tue, 26 Oct 2010 09:36:26 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.1010260931310.7660@cobra.newdream.net>
In-Reply-To: <4CC67930.4030406@cn.fujitsu.com>
On Tue, 26 Oct 2010, liubo wrote:
> On 10/26/2010 03:07 AM, Sage Weil wrote:
> > We calculate timeout (either 1 or MAX_SCHEDULE_TIMEOUT) based on whether
> > num_writers > 1 or should_grow at the top of the loop. Then, much much
> > later, we wait for that timeout if either num_writers or should_grow is
> > true. However, it's possible for a racing process (calling
> > btrfs_end_transaction()) to decrement num_writers such that we wait
> > forever instead of for 1.
> >
>
> IMO, there still exists a deadlock with your patch.
> ===
> with your patch:
>
> thread 1:                           thread 2:
>
> btrfs_commit_transaction()
>    if (num_writers > 1)
>       timeout = MAX_TIMEOUT;
(This bit goes away, btw.)
>                         --------->
>                                     __btrfs_end_transaction()
>                                        num_writers--;
>                                        if (wq)
>                                           wake_up();
>                         <---------
>    smp_mb();
>    prepare_wait();
>    if (num_writers > 1)
>       schedule_timeout(MAX);
>    else if (should_grow)
>       schedule_timeout(1);
> ===
What's the problem above? The wake_up() doesn't get called, and thread1
doesn't sleep.
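To spell out why the first interleaving is harmless with the patch applied, here is a minimal userspace sketch (not the kernel code). The helper names timeout_at_wait() and timeout_precomputed() and the MAX_TIMEOUT macro are made up for illustration; a return value of 0 stands for "don't sleep".

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (assumption: userspace sketch). */
#define MAX_TIMEOUT LONG_MAX

/*
 * Patched logic: pick the sleep length from the values seen at wait time.
 * Returns 0 when the committer should not sleep at all.
 */
static long timeout_at_wait(int num_writers, bool should_grow)
{
	if (num_writers > 1)
		return MAX_TIMEOUT;
	if (should_grow)
		return 1;
	return 0;
}

/*
 * Old logic: the timeout comes from an early snapshot of num_writers,
 * but the decision to sleep is made with the later value.
 */
static long timeout_precomputed(int early_writers, int late_writers,
				bool should_grow)
{
	long timeout = 1;

	if (early_writers > 1)
		timeout = MAX_TIMEOUT;
	else if (should_grow)
		timeout = 1;

	if (late_writers > 1 || should_grow)
		return timeout;	/* may be a stale MAX_TIMEOUT */
	return 0;
}
```

With num_writers dropping from 2 to 1 in between and should_grow set, the old logic sleeps for MAX_TIMEOUT while the patched logic sleeps for 1.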
> thread2 also needs a memory_barrier, for without memory_barrier,
> on some CPUs, "if (wq)" may be executed before num_writers--, like
> ===
> thread 1:                           thread 2:
>
> btrfs_commit_transaction()
>    if (num_writers > 1)
>       timeout = MAX_TIMEOUT;
(This bit is gone)
>                         --------->
>                                     __btrfs_end_transaction()
>                                        if (wq)
>                                           wake_up();
>                         <---------
>    smp_mb();
>    prepare_wait();
>    if (num_writers > 1)
>       schedule_timeout(MAX);
>    else if (should_grow)
>       schedule_timeout(1);
>                         ---------->
>                                        num_writers--;
> ===
> then, thread1 may wait forever.
>
> Since wake_up() itself provides an implied wmb, and a wq active check,
> it is better to drop "if (wq)" in __btrfs_end_transaction().
I see. It could also be
	smp_mb();
	if (wq)
		wake_up();
but just calling wake_up() unconditionally is simpler, and means fewer
barriers in the wake_up case. I'm not attached to the if (wq); I just kept
it because it was there already.
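For comparison, a rough userspace model of the two waker-side options, using C11 atomics as stand-ins for the kernel primitives. Everything here is hypothetical naming for illustration: struct txn, waiter_queued (playing the role of the wq active check), fake_wake_up(), and the two end_transaction_*() helpers are not btrfs or kernel APIs, and the seq_cst fence only approximates smp_mb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the transaction state shared by committer and writers. */
struct txn {
	atomic_int num_writers;
	atomic_bool waiter_queued;	/* stand-in for the "if (wq)" check */
	atomic_int wakeups;		/* counts calls to fake_wake_up() */
};

static void txn_init(struct txn *t, int writers)
{
	atomic_init(&t->num_writers, writers);
	atomic_init(&t->waiter_queued, false);
	atomic_init(&t->wakeups, 0);
}

/* Stand-in for wake_up(); just records that it was called. */
static void fake_wake_up(struct txn *t)
{
	atomic_fetch_add(&t->wakeups, 1);
}

/*
 * Option 1: keep the check, but put a full fence between the decrement
 * and the check so the check cannot be ordered before the decrement.
 */
static void end_transaction_checked(struct txn *t)
{
	atomic_fetch_sub_explicit(&t->num_writers, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* approximates smp_mb() */
	if (atomic_load_explicit(&t->waiter_queued, memory_order_relaxed))
		fake_wake_up(t);
}

/* Option 2: drop the check and always call the wake-up. */
static void end_transaction_unconditional(struct txn *t)
{
	atomic_fetch_sub_explicit(&t->num_writers, 1, memory_order_relaxed);
	fake_wake_up(t);
}
```

Option 2 has less to get wrong; option 1 skips the wake-up call when nobody is waiting, at the cost of an explicit barrier on every call.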
Chris?
Thanks!
sage
>
>
> thanks,
> liubo
>
> > Fix this by deciding how long to wait when we wait.
> >
> > Signed-off-by: Sage Weil <sage@newdream.net>
> > ---
> > fs/btrfs/transaction.c | 12 ++++--------
> > 1 files changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> > index 66e4c66..bf399ea 100644
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -992,7 +992,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
> >  			     struct btrfs_root *root)
> >  {
> >  	unsigned long joined = 0;
> > -	unsigned long timeout = 1;
> >  	struct btrfs_transaction *cur_trans;
> >  	struct btrfs_transaction *prev_trans = NULL;
> >  	DEFINE_WAIT(wait);
> > @@ -1063,11 +1062,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
> >  			snap_pending = 1;
> >
> >  		WARN_ON(cur_trans != trans->transaction);
> > -		if (cur_trans->num_writers > 1)
> > -			timeout = MAX_SCHEDULE_TIMEOUT;
> > -		else if (should_grow)
> > -			timeout = 1;
> > -
> >  		mutex_unlock(&root->fs_info->trans_mutex);
> >
> >  		if (flush_on_commit || snap_pending) {
> > @@ -1089,8 +1083,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
> >  				TASK_UNINTERRUPTIBLE);
> >
> >  		smp_mb();
> > -		if (cur_trans->num_writers > 1 || should_grow)
> > -			schedule_timeout(timeout);
> > +		if (cur_trans->num_writers > 1)
> > +			schedule_timeout(MAX_SCHEDULE_TIMEOUT);
> > +		else if (should_grow)
> > +			schedule_timeout(1);
> >
> >  		mutex_lock(&root->fs_info->trans_mutex);
> >  		finish_wait(&cur_trans->writer_wait, &wait);
>
>
>
>