* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Eric Ren @ 2016-01-19 16:46 UTC (permalink / raw)
To: ocfs2-devel
This problem was introduced by commit a19128260107f951d1b4c421cf98b92f8092b069.
OCFS2_LOCK_UPCONVERT_FINISHING is set just before OCFS2_LOCK_BUSY is cleared. This
prevents the dc thread from downconverting immediately, and lets the mask-waiters
on the ->l_mask_waiters list whose requested level is compatible with ->l_level
take the lock. But suppose there are two waiters on the mw list: the first wants an
EX lock and the second wants a PR lock. The first may fail to get the lock and then
clear UPCONVERT_FINISHING. That is too early to clear the flag, because the second
waiter will then also be queued again even though ->l_level is PR. As a result,
nobody kicks the dc thread, leaving dlmglue deadlocked until another thread
related to this lockres wakes it up.
More specifically, for example:
On node1, thread W1 keeps writing; on node2, threads R1 and R2 keep reading; all
three threads do I/O on the same shared file. At some point, node2 receives
ast(0=>3), followed immediately by a bast requesting an EX lock on behalf of node1.
Then this may happen:
node2: node1:
l_level==3; R1(3); R2(3) l_level==3
R1(unlock); R1(3=>5, update atime) W1(3=>5)
BAST
R2(unlock); AST(3=>0)
R2(0=>3)
BAST
AST(0=>3)
set OCFS2_LOCK_UPCONVERT_FINISHING
clear OCFS2_LOCK_BUSY
W1(3=>5)
BAST
dc thread requeue=yes
R1(clear OCFS2_LOCK_UPCONVERT_FINISHING,wait)
R2(wait)
...
dlmglue deadlock until dc thread woken up by others
This fix clears OCFS2_LOCK_UPCONVERT_FINISHING only after OCFS2_LOCK_BUSY has
been cleared and every waiter has been looped over.
Signed-off-by: Eric Ren <zren@suse.com>
---
fs/ocfs2/dlmglue.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index f92612e..72f8b6c 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -824,6 +824,8 @@ static void lockres_clear_flags(struct ocfs2_lock_res *lockres,
 				unsigned long clear)
 {
 	lockres_set_flags(lockres, lockres->l_flags & ~clear);
+	if(clear & OCFS2_LOCK_BUSY)
+		lockres->l_flags &= ~OCFS2_LOCK_UPCONVERT_FINISHING;
 }

 static inline void ocfs2_generic_handle_downconvert_action(struct ocfs2_lock_res *lockres)
@@ -1522,8 +1524,6 @@ update_holders:

 	ret = 0;
 unlock:
-	lockres_clear_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
-
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 out:
 	/*
--
2.6.2
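For readers following the trace in this message, the race can be modeled in a few lines of C. This is only a sketch under simplifying assumptions, not the kernel code: the `sk_` names, the flag bit values, and the single-pass `sk_try_lock()` function are hypothetical, upconverts are not modeled, and levels use the numbers from the trace (3 = PR, 5 = EX).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
enum {
	SK_BLOCKED             = 1u << 0,
	SK_UPCONVERT_FINISHING = 1u << 1,
};

/* Lock levels as numbered in the trace: 3 = PR, 5 = EX. */
enum { SK_PR = 3, SK_EX = 5 };

struct sk_lockres {
	unsigned int flags;
	int level;	/* level currently granted to this node */
};

/*
 * One pass of a waiter through a simplified retry path.
 * Returns true if the waiter may take the lock, false if it must wait.
 * clear_on_exit models the behaviour described as the bug: every pass
 * drops UPCONVERT_FINISHING on its way out, whether or not it got the lock.
 */
bool sk_try_lock(struct sk_lockres *res, int want, bool clear_on_exit)
{
	bool granted;

	assert(want == SK_PR || want == SK_EX);

	if ((res->flags & SK_UPCONVERT_FINISHING) && want <= res->level)
		granted = true;		/* compatible with the level just granted */
	else if (res->flags & SK_BLOCKED)
		granted = false;	/* must wait for the downconvert */
	else
		granted = want <= res->level;	/* an upconvert is not modeled */

	if (clear_on_exit)
		res->flags &= ~(unsigned int)SK_UPCONVERT_FINISHING;
	return granted;
}
```

Starting from a lockres at PR with both flags set, calling `sk_try_lock()` for EX and then for PR with `clear_on_exit` true leaves both callers waiting, which is the situation in the trace. With `clear_on_exit` false, the EX caller still waits but the PR caller gets the lock.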
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Eric Ren @ 2016-01-20 2:16 UTC (permalink / raw)
To: ocfs2-devel
Hi,
This fix is wrong, because it can ensure that every waiter is woken up, but cannot
guarantee that every waiter finishes trying its "again" path in __ocfs2_cluster_lock().
Other solutions I have in mind are:
1. Give every waiter an ID. When clearing OCFS2_LOCK_BUSY, record those IDs
in an array. As each waiter on the mask-waiter list is processed, remove its ID
from the array if it is there; once the array is empty, we can clear
OCFS2_LOCK_UPCONVERT_FINISHING.
I think it's a bad idea. Handling the array is inefficient, and managing the IDs is
another problem.
2. Split the mask-waiter list into two lists: one for OCFS2_LOCK_BUSY and another for
OCFS2_LOCK_BLOCKED. When OCFS2_LOCK_BUSY is cleared and OCFS2_LOCK_BLOCKED is
set, process the waiters on the BUSY list and move those who cannot get the lock onto
the BLOCKED list. And when OCFS2_LOCK_BLOCKED is cleared and OCFS2_LOCK_BUSY is
set, do the same in the other direction.
But is there any chance that both OCFS2_LOCK_BUSY and OCFS2_LOCK_BLOCKED are set at the
same time? If not, I prefer this one.
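To make option 2 a little more concrete, here is a rough C sketch of the BUSY-side half. Everything in it is an assumption for illustration: the `sk_` names, the fixed-size array standing in for a list, and the simple "requested level <= granted level" compatibility test. It is not how dlmglue stores mask-waiters.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_WAITERS 8	/* arbitrary bound for the sketch */

/* Hypothetical stand-in for one of the two waiter lists. */
struct sk_waiter_list {
	int want[SK_MAX_WAITERS];	/* requested level of each waiter */
	size_t n;
};

void sk_list_push(struct sk_waiter_list *l, int want)
{
	assert(l->n < SK_MAX_WAITERS);
	l->want[l->n++] = want;
}

/*
 * Models "BUSY cleared while BLOCKED is set": every waiter on the BUSY
 * list gets one look at the granted level. Compatible waiters are counted
 * as granted; the rest are moved to the BLOCKED list. The BUSY list is
 * empty afterwards. Returns the number of waiters granted.
 */
size_t sk_drain_busy(struct sk_waiter_list *busy,
		     struct sk_waiter_list *blocked, int level)
{
	size_t granted = 0;

	for (size_t i = 0; i < busy->n; i++) {
		if (busy->want[i] <= level)
			granted++;
		else
			sk_list_push(blocked, busy->want[i]);
	}
	busy->n = 0;
	return granted;
}
```

With an EX waiter (5) and a PR waiter (3) on the BUSY list and a granted level of PR, the PR waiter is granted and the EX waiter ends up on the BLOCKED list, so no waiter's outcome depends on what an earlier waiter did to a shared flag.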
What do you think? Any comment would be appreciated.
Thanks,
Eric
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Zhen Ren @ 2016-01-20 2:35 UTC (permalink / raw)
To: ocfs2-devel
Hi,
Very sorry, this fix is wrong, because it can ensure that every waiter is woken up, but cannot
guarantee that every waiter finishes trying its "again" path in __ocfs2_cluster_lock().
Other solutions I have in mind are:
1. Give every waiter an ID. When clearing OCFS2_LOCK_BUSY, record those IDs
in an array. As each waiter on the mask-waiter list is processed, remove its ID
from the array if it is there; once the array is empty, we can clear
OCFS2_LOCK_UPCONVERT_FINISHING.
I think it's a bad idea. Handling the array is inefficient, and managing the IDs is
another problem.
2. Split the mask-waiter list into two lists: one for OCFS2_LOCK_BUSY and another for
OCFS2_LOCK_BLOCKED. When OCFS2_LOCK_BUSY is cleared and OCFS2_LOCK_BLOCKED is
set, process the waiters on the BUSY list and move those who cannot get the lock onto
the BLOCKED list. And when OCFS2_LOCK_BLOCKED is cleared and OCFS2_LOCK_BUSY is
set, do the same in the other direction.
But is there any chance that both OCFS2_LOCK_BUSY and OCFS2_LOCK_BLOCKED are set at the
same time? If not, I prefer this one.
What do you think? Any comment would be appreciated.
Thanks,
Eric
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Junxiao Bi @ 2016-01-21 7:10 UTC (permalink / raw)
To: ocfs2-devel
Hi Eric,
This patch should fix your issue.
"NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
Thanks,
Junxiao.
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Eric Ren @ 2016-01-21 8:10 UTC (permalink / raw)
To: ocfs2-devel
Hi Junxiao,
On Thu, Jan 21, 2016 at 03:10:20PM +0800, Junxiao Bi wrote:
> Hi Eric,
>
> This patch should fix your issue.
> "NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
Thanks a lot for bringing up this patch! It hasn't been merged into mainline
(at least not as of 4.4), right?
I found this patch on the mailing list and it looks good! I'd like to test it right
now and give feedback!
Thanks again,
Eric
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Junxiao Bi @ 2016-01-21 8:18 UTC (permalink / raw)
To: ocfs2-devel
On 01/21/2016 04:10 PM, Eric Ren wrote:
> Hi Junxiao,
>
> On Thu, Jan 21, 2016 at 03:10:20PM +0800, Junxiao Bi wrote:
>> Hi Eric,
>>
>> This patch should fix your issue.
>> "NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
>
> Thanks a lot for bringing up this patch! It hasn't been merged into mainline(
> at least 4.4), right?
Right, it is still in linux-next.
Thanks,
Junxiao.
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Andrew Morton @ 2016-01-21 23:05 UTC (permalink / raw)
To: ocfs2-devel
On Thu, 21 Jan 2016 16:18:38 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
> On 01/21/2016 04:10 PM, Eric Ren wrote:
> > Hi Junxiao,
> >
> > On Thu, Jan 21, 2016 at 03:10:20PM +0800, Junxiao Bi wrote:
> >> Hi Eric,
> >>
> >> This patch should fix your issue.
> >> "NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
> >
> > Thanks a lot for bringing up this patch! It hasn't been merged into mainline(
> > at least 4.4), right?
> Right, it is still in linux-next.
I'll be sending it to Linus today.
* [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early
From: Eric Ren @ 2016-01-22 2:32 UTC (permalink / raw)
To: ocfs2-devel
Hi all,
On Thu, Jan 21, 2016 at 03:05:58PM -0800, Andrew Morton wrote:
> On Thu, 21 Jan 2016 16:18:38 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
>
> > On 01/21/2016 04:10 PM, Eric Ren wrote:
> > > Hi Junxiao,
> > >
> > > On Thu, Jan 21, 2016 at 03:10:20PM +0800, Junxiao Bi wrote:
> > >> Hi Eric,
> > >>
> > >> This patch should fix your issue.
> > >> "NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
> > >
> > > Thanks a lot for bringing up this patch! It hasn't been merged into mainline(
> > > at least 4.4), right?
> > Right, it is still in linux-next.
>
> I'll be sending it to Linus today.
Thanks! This patch also avoids the deadlock in my case. It makes sense to merge it into mainline now.
But another problem remains: it may not be fair enough to a node that has more than one
thread waiting on the mask-waiter list while OCFS2_LOCK_BUSY was set. Currently, only one
of those waiters gets the chance to retry; then UPCONVERT_FINISHING is cleared and the
downconvert thread starts.
I think it would be better to let every waiter retry before the down conversion. However,
that is more complex. We could discuss and fix it in a new thread later ;-)
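Just to illustrate what "let every waiter retry first" could mean, here is one possible shape in C: count the waiters woken when BUSY is cleared, and only drop the flag when the last of them has finished its retry. This is purely a hypothetical sketch; the `sk_` names and the counter are my assumptions, not anything in dlmglue, and real code would need the lockres spinlock around all of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gate that holds back the downconvert thread. */
struct sk_upconvert_gate {
	bool finishing;		/* models UPCONVERT_FINISHING */
	unsigned int pending;	/* woken waiters that have not retried yet */
};

/* Called when BUSY is cleared and `woken` waiters are released. */
void sk_gate_open(struct sk_upconvert_gate *g, unsigned int woken)
{
	g->pending = woken;
	g->finishing = woken > 0;
}

/*
 * Called by each woken waiter after its retry, whether or not it got the
 * lock. Returns true once the last waiter has retried, i.e. when the
 * downconvert thread could be allowed to run.
 */
bool sk_gate_retry_done(struct sk_upconvert_gate *g)
{
	assert(g->pending > 0);
	if (--g->pending == 0)
		g->finishing = false;
	return !g->finishing;
}
```

With two woken waiters, the first call to `sk_gate_retry_done()` leaves the flag set and the second one clears it, so neither waiter loses its chance because of the other.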
THX,
Eric