* [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.
@ 2017-08-09 5:51 tsutomu.owa
2017-08-09 16:41 ` David Teigland
0 siblings, 1 reply; 4+ messages in thread
From: tsutomu.owa @ 2017-08-09 5:51 UTC (permalink / raw)
To: cluster-devel.redhat.com
If lock requests from multiple nodes conflict on a lock resource, a lock
on the convert queue may never be granted.

Example:
grant queue:
node0 grmode NL / rqmode IV
node1 grmode NL / rqmode IV

convert queue:
node2 grmode NL / rqmode EX
node3 grmode PR / rqmode EX

wait queue:
node4 grmode IV / rqmode PR
node5 grmode IV / rqmode PR

When the lock conversion (PR -> NL) of node 0 is completed, the lock
of node 2 should be grantable. However, _can_be_granted() returns 0
because the grmode of the lock on node 3 in the convert queue is PR.

When checking the lock at the head of the convert queue, skip the
queue_conflict() check against the convert queue.
Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
---
fs/dlm/lock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..dcdd26d 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2336,7 +2336,8 @@ static int _can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now,
 	 * locks
 	 */
 
-	if (queue_conflict(&r->res_convertqueue, lkb))
+	if (!first_in_list(lkb, &r->res_convertqueue) &&
+	    queue_conflict(&r->res_convertqueue, lkb))
 		return 0;
 
 	/*
--
2.7.4
* [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.
2017-08-09 5:51 [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue tsutomu.owa
@ 2017-08-09 16:41 ` David Teigland
2017-08-09 18:48 ` David Teigland
0 siblings, 1 reply; 4+ messages in thread
From: David Teigland @ 2017-08-09 16:41 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, Aug 09, 2017 at 05:51:37AM +0000, tsutomu.owa at toshiba.co.jp wrote:
> If there is a lock resource conflict on multiple nodes, the lock on
> convert queue may not be granted forever.
>
> EX.)
> grant queue:
> node0 grmode NL / rqmode IV
> node1 grmode NL / rqmode IV
>
> convert queue:
> node2 grmode NL / rqmode EX
> node3 grmode PR / rqmode EX
>
> wait queue:
> node4 grmode IV / rqmode PR
> node5 grmode IV / rqmode PR
>
> When the lock conversion (node PR -> NL) of node 0 is completed, the lock
> of node 2 should be grantable. However, __can_be_granted() returns 0
> because the grmode of the lock on node 3 in convert queue is PR.
>
> When checking the lock at the head of convert queue, exclude
> queue_conflict() targeting convert queue.
This example doesn't look right. node2's NL->EX cannot be granted because
it conflicts with the PR lock held by node3. (The grmode is still valid
when a lock is on the convert queue.)
There are two valid outcomes in the example above, either 1) node3 PR->EX
is granted, or 2) node4 and node5 PR requests are granted. What have you
seen the dlm do in this state? If it does not grant anything, that would
be a bug.
Based on the sequence of events you describe, I think that the correct
outcome would be 1 (granting node3's PR->EX), based on this rule:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c#n2429
> - if (queue_conflict(&r->res_convertqueue, lkb))
> + if (!first_in_list(lkb, &r->res_convertqueue) &&
> + queue_conflict(&r->res_convertqueue, lkb))
> return 0;
* [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.
2017-08-09 16:41 ` David Teigland
@ 2017-08-09 18:48 ` David Teigland
2017-08-17 23:40 ` tsutomu.owa
0 siblings, 1 reply; 4+ messages in thread
From: David Teigland @ 2017-08-09 18:48 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, Aug 09, 2017 at 11:41:44AM -0500, David Teigland wrote:
> On Wed, Aug 09, 2017 at 05:51:37AM +0000, tsutomu.owa at toshiba.co.jp wrote:
> > If there is a lock resource conflict on multiple nodes, the lock on
> > convert queue may not be granted forever.
> >
> > EX.)
> > grant queue:
> > node0 grmode NL / rqmode IV
> > node1 grmode NL / rqmode IV
> >
> > convert queue:
> > node2 grmode NL / rqmode EX
> > node3 grmode PR / rqmode EX
> >
> > wait queue:
> > node4 grmode IV / rqmode PR
> > node5 grmode IV / rqmode PR
> >
> > When the lock conversion (node PR -> NL) of node 0 is completed, the lock
> > of node 2 should be grantable. However, __can_be_granted() returns 0
> > because the grmode of the lock on node 3 in convert queue is PR.
> >
> > When checking the lock at the head of convert queue, exclude
> > queue_conflict() targeting convert queue.
>
> This example doesn't look right. node2's NL->EX cannot be granted because
> it conflicts with the PR lock held by node3. (The grmode is still valid
> when a lock is on the convert queue.)
After looking more closely, this is a subtle form of conversion deadlock,
and this exact case is described in the comment here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c#n2218
This should be handled by the dlm canceling one of the converting locks
(returning it to the grant queue with IV rqmode) and returning -EDEADLK to
the application. There is a FIXME in the code highlighting a case you
could be hitting:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c#n2504
If you are running into that FIXME, you should see these log messages:
	if (deadlk) {
		log_print("WARN: pending deadlock %x node %d %s",
			  lkb->lkb_id, lkb->lkb_nodeid, r->res_name);
		dlm_dump_rsb(r);
		continue;
	}
* [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.
2017-08-09 18:48 ` David Teigland
@ 2017-08-17 23:40 ` tsutomu.owa
0 siblings, 0 replies; 4+ messages in thread
From: tsutomu.owa @ 2017-08-17 23:40 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
>> After looking more closely, this is a subtle form of conversion deadlock,
>> and this exact case is described in the comment here:
Thanks, we are withdrawing this patch for now since we need to look into this further.
-- owa