[Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery

* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
@ 2012-05-25  5:53 xiaowei.hu at oracle.com
  2012-05-25 22:17 ` srinivas eeda
  2012-05-29 22:09 ` Sunil Mushran
  0 siblings, 2 replies; 7+ messages in thread
From: xiaowei.hu at oracle.com @ 2012-05-25  5:53 UTC (permalink / raw)
  To: ocfs2-devel

From: "Xiaowei.Hu" <xiaowei.hu@oracle.com>

When the recovery master has requested locks and one or more of the live
nodes die after receiving the request msg but before sending out their lock
packages, recovery falls into an endless loop, waiting forever for the
status to change to finalize.

NodeA                                     NodeB
selected as recovery master
dlm_remaster_locks
  -> dlm_request_all_locks
  this sends the lock request msg to B
                                          receives the msg from A,
                                          queues worker dlm_request_all_locks_worker,
                                          returns 0
goes on to set the state to requested
waits for the state to become done
                                          NodeB loses its connection due to the
                                          network before the worker begins,
                                          or it dies.
NodeA is still waiting for the
reco state to change.
The wait won't end unless NodeA gets the data done msg.
At this point NodeB does not realize this (or it has just died), so it will
never send the msg, and NodeA is left in the recovery process forever.

This patch makes the recovery master check whether the node is still in the
live node map while it stays in the REQUESTED state.

Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com>
---
 fs/ocfs2/dlm/dlmrecovery.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 01ebfd0..62659e8 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -555,6 +555,7 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
 	int all_nodes_done;
 	int destroy = 0;
 	int pass = 0;
+	int dying = 0;
 
 	do {
 		/* we have become recovery master.  there is no escaping
@@ -659,6 +660,7 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
 		list_for_each_entry(ndata, &dlm->reco.node_data, list) {
 			mlog(0, "checking recovery state of node %u\n",
 			     ndata->node_num);
+			dying = 0;
 			switch (ndata->state) {
 				case DLM_RECO_NODE_DATA_INIT:
 				case DLM_RECO_NODE_DATA_REQUESTING:
@@ -679,6 +681,13 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
 					     dlm->name, ndata->node_num,
 					     ndata->state==DLM_RECO_NODE_DATA_RECEIVING ?
 					     "receiving" : "requested");
+					spin_lock(&dlm->spinlock);
+					dying = !test_bit(ndata->node_num, dlm->live_nodes_map);
+					spin_unlock(&dlm->spinlock);
+					if (dying) {
+						ndata->state = DLM_RECO_NODE_DATA_DEAD;
+						break;
+					}
 					all_nodes_done = 0;
 					break;
 				case DLM_RECO_NODE_DATA_DONE:
-- 
1.7.7.6
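
For readers who want to see the effect of the check outside the kernel, here is a minimal userspace sketch of the polling pass. It is only an illustration under stated assumptions: the enum, `struct node_data`, `reco_pass` and the `live[]` array are hypothetical stand-ins for the DLM_RECO_NODE_DATA_* states, the entries on dlm->reco.node_data and dlm->live_nodes_map, and the locking done in the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Simplified stand-ins for the DLM_RECO_NODE_DATA_* states (illustrative only). */
enum reco_state { RECO_REQUESTED, RECO_RECEIVING, RECO_DONE, RECO_DEAD };

struct node_data {
	unsigned node_num;
	enum reco_state state;
};

/*
 * One polling pass over the per-node recovery data.
 * live[] plays the role of dlm->live_nodes_map in this sketch.
 * Returns true when no node is still pending, so the master may stop waiting.
 */
bool reco_pass(struct node_data *nodes, size_t n, const bool live[MAX_NODES],
	       bool check_live_map)
{
	bool all_nodes_done = true;

	for (size_t i = 0; i < n; i++) {
		struct node_data *nd = &nodes[i];

		assert(nd->node_num < MAX_NODES);
		switch (nd->state) {
		case RECO_REQUESTED:
		case RECO_RECEIVING:
			if (check_live_map && !live[nd->node_num]) {
				/* patched behaviour: stop waiting for a node that is gone */
				nd->state = RECO_DEAD;
				break;
			}
			/* still waiting for this node's data done msg */
			all_nodes_done = false;
			break;
		case RECO_DONE:
		case RECO_DEAD:
			break;
		}
	}
	return all_nodes_done;
}
```

With `check_live_map` set to false, a node that vanished while in the requested state keeps every pass returning false, which is the endless wait described above. With it set to true, the entry is marked dead and the pass can complete.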

* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
  2012-05-25  5:53 [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery xiaowei.hu at oracle.com
@ 2012-05-25 22:17 ` srinivas eeda
  2012-05-26  2:05   ` Xiaowei
  2012-05-29 22:09 ` Sunil Mushran
  1 sibling, 1 reply; 7+ messages in thread
From: srinivas eeda @ 2012-05-25 22:17 UTC (permalink / raw)
  To: ocfs2-devel

comments inline

On 5/24/2012 10:53 PM, xiaowei.hu at oracle.com wrote:
> @@ -679,6 +681,13 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>   					     dlm->name, ndata->node_num,
>   					     ndata->state==DLM_RECO_NODE_DATA_RECEIVING ?
>   					     "receiving" : "requested");
> +					spin_lock(&dlm->spinlock);
> +					dying = !test_bit(ndata->node_num, dlm->live_nodes_map);
> +					spin_unlock(&dlm->spinlock);
> +					if (dying) {
> +						ndata->state = DLM_RECO_NODE_DATA_DEAD;
> +						break;
> +					}
>   					all_nodes_done = 0;
>   					break;
>   				case DLM_RECO_NODE_DATA_DONE:
The fix seems to address the issue, but can you please add a function
dlm_is_node_in_livemap, similar to dlm_is_node_dead, so that it improves
readability? You can then add the following to check whether the node is
still alive:
+        if (!dlm_is_node_in_livemap(dlm, ndata->node_num))
+            ndata->state = DLM_RECO_NODE_DATA_DEAD;
+        else
+            all_nodes_done = 0;
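
As a rough illustration of what such a helper could look like, here is a userspace sketch. Everything except the proposed name dlm_is_node_in_livemap is an assumption: `struct sketch_ctxt` and the `sketch_*_bit` helpers are hypothetical stand-ins for struct dlm_ctxt and the kernel bitops, and the spinlock that the patch takes around the test is only indicated by a comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 255
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS ((SKETCH_MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the part of struct dlm_ctxt used here. */
struct sketch_ctxt {
	unsigned long live_nodes_map[MAP_WORDS];
};

/* Userspace replacements for the kernel bit helpers (illustrative only). */
bool sketch_test_bit(unsigned nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

void sketch_set_bit(unsigned nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

void sketch_clear_bit(unsigned nr, unsigned long *map)
{
	map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

/* Returns true if the node is still set in the live node map. */
bool dlm_is_node_in_livemap(const struct sketch_ctxt *dlm, unsigned char node)
{
	bool alive;

	assert(node < SKETCH_MAX_NODES);
	/* the in-kernel version would hold dlm->spinlock around this test */
	alive = sketch_test_bit(node, dlm->live_nodes_map);
	return alive;
}
```

Keeping the lock and the bit test inside one helper would let the switch statement in dlm_remaster_locks stay as short as the four lines suggested above.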

* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
  2012-05-25 22:17 ` srinivas eeda
@ 2012-05-26  2:05   ` Xiaowei
  0 siblings, 0 replies; 7+ messages in thread
From: Xiaowei @ 2012-05-26  2:05 UTC (permalink / raw)
  To: ocfs2-devel

Thanks Srini,

This sounds good. I tried to use dlm_is_node_dead in this patch, but that
function can't report that another node is dead if this node is already in
the recovery process: the update of the bit in domain_map is blocked,
whereas live_nodes_map always reflects the nodes that are really alive.

I will reformat the patch.

Thanks,
Xiaowei


* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
  2012-05-25  5:53 [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery xiaowei.hu at oracle.com
  2012-05-25 22:17 ` srinivas eeda
@ 2012-05-29 22:09 ` Sunil Mushran
  2012-05-30  0:41   ` Xiaowei
  1 sibling, 1 reply; 7+ messages in thread
From: Sunil Mushran @ 2012-05-29 22:09 UTC (permalink / raw)
  To: ocfs2-devel

On Thu, May 24, 2012 at 10:53 PM, <xiaowei.hu@oracle.com> wrote:

> @@ -679,6 +681,13 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>                                             dlm->name, ndata->node_num,
>                                             ndata->state==DLM_RECO_NODE_DATA_RECEIVING ?
>                                             "receiving" : "requested");
> +                                       spin_lock(&dlm->spinlock);
> +                                       dying = !test_bit(ndata->node_num, dlm->live_nodes_map);
> +                                       spin_unlock(&dlm->spinlock);
> +                                       if (dying) {
> +                                               ndata->state = DLM_RECO_NODE_DATA_DEAD;
> +                                               break;
> +                                       }
>

I would suggest exploring adding this in dlm hb down event. Checking live map
all over the place is hacky. We do it more than we should right now. Let's not
add to the mess.

>                                        all_nodes_done = 0;
>                                        break;
>                                case DLM_RECO_NODE_DATA_DONE:
> --
> 1.7.7.6

* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
  2012-05-29 22:09 ` Sunil Mushran
@ 2012-05-30  0:41   ` Xiaowei
  2012-05-31  1:18     ` Sunil Mushran
  0 siblings, 1 reply; 7+ messages in thread
From: Xiaowei @ 2012-05-30  0:41 UTC (permalink / raw)
  To: ocfs2-devel

On 05/30/2012 06:09 AM, Sunil Mushran wrote:
> I would suggest exploring adding this in dlm hb down event. Checking live map
> all over the place is hacky. We do it more than we should right now. Let's not
> add to the mess.

Hi Sunil,

Do you mean we should clear the bit in the domain map directly in the dlm hb
down event when the node goes down, and check with dlm_is_node_dead here?
Otherwise, how could we make sure the node stays alive during the whole
migration process? A node could die even after it sends out one lock package
and before the next, if there are too many locks on that lockres.

Thanks,
Xiaowei

* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
  2012-05-30  0:41   ` Xiaowei
@ 2012-05-31  1:18     ` Sunil Mushran
  2012-07-26  6:52       ` Xiaowei
  0 siblings, 1 reply; 7+ messages in thread
From: Sunil Mushran @ 2012-05-31  1:18 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, May 29, 2012 at 5:41 PM, Xiaowei <xiaowei.hu@oracle.com> wrote:
> On 05/30/2012 06:09 AM, Sunil Mushran wrote:
>> I would suggest exploring adding this in dlm hb down event. Checking live
>> map all over the place is hacky. We do it more than we should right now.
>> Let's not add to the mess.
>
> Hi Sunil,
>
> Do you mean we should clear the bit in the domain map directly in the dlm hb
> down event when the node goes down, and check with dlm_is_node_dead here?
> Otherwise, how could we make sure the node stays alive during the whole
> migration process? A node could die even after it sends out one lock package
> and before the next, if there are too many locks on that lockres.

dlm hb down event is triggered when a node is declared dead. That's where we
clean up pending mles, etc. You can add a check for recovery and add logic to
change the reco state for that node there.
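
To make the idea concrete, here is a rough userspace sketch of a down-event handler that updates the reco state itself. It only models the suggestion: `struct hb_reco_ctx`, `struct hb_reco_node`, the `recovering` flag and `hb_node_down` are hypothetical names, not the kernel's, and the locking a real handler would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the DLM_RECO_NODE_DATA_* states (illustrative only). */
enum hb_reco_state {
	HB_RECO_REQUESTED,
	HB_RECO_RECEIVING,
	HB_RECO_DONE,
	HB_RECO_DEAD
};

struct hb_reco_node {
	unsigned node_num;
	enum hb_reco_state state;
	struct hb_reco_node *next;
};

struct hb_reco_ctx {
	bool recovering;            /* true while a remaster pass owns the list */
	struct hb_reco_node *nodes; /* per-node recovery data */
};

/*
 * Hypothetical handler run when a node is declared dead.
 * If a recovery is in progress and the dead node still owes lock data,
 * mark its entry dead so the recovery master stops waiting for it.
 * Returns the number of entries changed.
 */
int hb_node_down(struct hb_reco_ctx *reco, unsigned dead_node)
{
	int changed = 0;

	if (!reco->recovering)
		return 0; /* no recovery running, nothing to update */

	for (struct hb_reco_node *nd = reco->nodes; nd; nd = nd->next) {
		if (nd->node_num != dead_node)
			continue;
		if (nd->state == HB_RECO_REQUESTED ||
		    nd->state == HB_RECO_RECEIVING) {
			nd->state = HB_RECO_DEAD;
			changed++;
		}
	}
	return changed;
}
```

The `recovering` guard is the part that matters for the design: the handler touches the per-node list only while a remaster pass is active, so the list's setup and teardown would have to be coordinated with the handler.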

* [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
  2012-05-31  1:18     ` Sunil Mushran
@ 2012-07-26  6:52       ` Xiaowei
  0 siblings, 0 replies; 7+ messages in thread
From: Xiaowei @ 2012-07-26  6:52 UTC (permalink / raw)
  To: ocfs2-devel

Hi Sunil,

I considered your suggestion about this patch. It is possible to change the
status in the dlm hb down event, but what needs to change are the
dlm_reco_node_data structures in the dlm->reco.node_data list. This list is
initialized in dlm_remaster_locks when it begins the lock remaster and is
destroyed before the function exits. So it is not proper to check data in
such a list from the dlm hb down event, am I right?

If we change the status from the dlm hb down event, the recovery thread comes
to rely on more information from that event. The event already marks
dlm->live_nodes_map for others to check, right?

This race condition only happens when the cluster is already in recovery and
a node dies during recovery. The recovery thread blocks the update of
dlm->domain_map, so I fall back to checking live_nodes_map, which won't be
blocked.

Please reconsider this patch.

Thanks,
Xiaowei
