* [PATCH] clvmd: closedown the cluster after finishing of lvm_thread
From: dongmao zhang @ 2013-11-27  8:56 UTC
  To: lvm-devel

When lvm_thread is processing a remote request and clvmd receives
SIGTERM, clvmd frees the cluster resources before the real work of
lvm_thread is done. If the cluster resources are freed before
send_message runs, the reply is never sent and the remote command
hangs waiting for it.

This patch moves the closedown to after the worker thread has been joined.
---
 daemons/clvmd/clvmd.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/daemons/clvmd/clvmd.c b/daemons/clvmd/clvmd.c
index d57c0fd..b2f7dd5 100644
--- a/daemons/clvmd/clvmd.c
+++ b/daemons/clvmd/clvmd.c
@@ -621,6 +621,8 @@ int main(int argc, char *argv[])
 	if ((errno = pthread_join(lvm_thread, NULL)))
 		log_sys_error("pthread_join", "");
 
+	clops->cluster_closedown();
+
 	close_local_sock(local_sock);
 	destroy_lvm();
 
@@ -979,7 +981,6 @@ static void main_loop(int local_sock, int cmd_timeout)
 	}
 
       closedown:
-	clops->cluster_closedown();
 	if (quit)
 		DEBUGLOG("SIGTERM received\n");
 }
-- 
1.7.3.4




* [PATCH] clvmd: closedown the cluster after finishing of lvm_thread
From: Zdenek Kabelac @ 2013-11-28 13:57 UTC
  To: lvm-devel

On 27.11.2013 09:56, dongmao zhang wrote:
> when lvm_thread is processing remote request, the clvmd
> received a SIG_TERM, it will free cluster resource before
> the realwork of lvm_thread is done. If freeing the cluster
> resource happens before send_message, it would cause the
> remote command hangs forever.
>
> this patch move closedown after the closing the working thread.
> ---
>   daemons/clvmd/clvmd.c |    3 ++-
>   1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/daemons/clvmd/clvmd.c b/daemons/clvmd/clvmd.c
> index d57c0fd..b2f7dd5 100644
> --- a/daemons/clvmd/clvmd.c
> +++ b/daemons/clvmd/clvmd.c
> @@ -621,6 +621,8 @@ int main(int argc, char *argv[])
>   	if ((errno = pthread_join(lvm_thread, NULL)))
>   		log_sys_error("pthread_join", "");
>
> +	clops->cluster_closedown();
> +
>   	close_local_sock(local_sock);
>   	destroy_lvm();
>
> @@ -979,7 +981,6 @@ static void main_loop(int local_sock, int cmd_timeout)
>   	}
>
>         closedown:
> -	clops->cluster_closedown();
>   	if (quit)
>   		DEBUGLOG("SIGTERM received\n");
>   }


It's not clear to me how this code move helps.

You just moved the call to clops->cluster_closedown() to after joining the thread?

Which code path does this patch change?

Zdenek




* [PATCH] clvmd: closedown the cluster after finishing of lvm_thread
From: dongmao zhang @ 2013-11-29  6:06 UTC
  To: lvm-devel

On 2013-11-28 21:57, Zdenek Kabelac wrote:
> On 27.11.2013 09:56, dongmao zhang wrote:
>> when lvm_thread is processing remote request, the clvmd
>> received a SIG_TERM, it will free cluster resource before
>> the realwork of lvm_thread is done. If freeing the cluster
>> resource happens before send_message, it would cause the
>> remote command hangs forever.
>>
>> this patch move closedown after the closing the working thread.
>> ---
>> daemons/clvmd/clvmd.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/daemons/clvmd/clvmd.c b/daemons/clvmd/clvmd.c
>> index d57c0fd..b2f7dd5 100644
>> --- a/daemons/clvmd/clvmd.c
>> +++ b/daemons/clvmd/clvmd.c
>> @@ -621,6 +621,8 @@ int main(int argc, char *argv[])
>> if ((errno = pthread_join(lvm_thread, NULL)))
>> log_sys_error("pthread_join", "");
>>
>> + clops->cluster_closedown();
>> +
>> close_local_sock(local_sock);
>> destroy_lvm();
>>
>> @@ -979,7 +981,6 @@ static void main_loop(int local_sock, int 
>> cmd_timeout)
>> }
>>
>> closedown:
>> - clops->cluster_closedown();
>> if (quit)
>> DEBUGLOG("SIGTERM received\n");
>> }
>
>
> It's not clear to me how this code move helps to anything.
>
> You just moved call of clops->cluster_closedown(); after joining thread?
>
> In which code path this patch is changing something ?
>
> Zdenek
>
>

Hi Zdenek,
Thank you for your reply. The main idea is that lvm_thread_fn uses
cluster resources (for example, cpg_handler in send_message), so we
must not free the cluster resources until lvm_thread_fn finishes.

The 'lvm_thread_fn' thread runs 'process_work_item', which sends the
reply message (cluster_send_message) back to the remote nodes.
cluster_send_message uses the cluster resources, so we cannot free
them before lvm_thread_fn has really finished. Without the patch,
cluster_closedown in the main thread can run before the
lvm_thread_fn thread calls send_message.

If that happens, sending the message fails, the remote node never
gets the response, and it has to wait for a timeout before it can finish.

I hit a bug like this with two nodes sharing a VG resource:
1. NodeA runs 'rcopenais stop'
2. NodeB runs 'vgscan'

Sometimes vgscan hangs for a while waiting for responses from all
cluster nodes, because clvmd on NodeA cannot send its reply:
cluster_closedown happened before send_message.
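
To illustrate the ordering, here is a minimal standalone sketch. It is
not clvmd code: every demo_* name is hypothetical, demo_send_reply
stands in for cluster_send_message, demo_closedown for
cluster_closedown, and demo_worker_fn for lvm_thread_fn. It only shows
that joining the worker before closing the cluster guarantees the
worker's reply is sent while the cluster handle is still valid.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cluster handle; not a clvmd type. */
struct demo_cluster {
	pthread_mutex_t lock;
	bool open;
};

struct demo_worker {
	struct demo_cluster *cluster;
	bool reply_sent;
};

/* Stand-in for cluster_send_message: fails once the cluster is closed. */
static bool demo_send_reply(struct demo_cluster *c)
{
	bool ok;

	pthread_mutex_lock(&c->lock);
	ok = c->open;
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Stand-in for cluster_closedown: marks the handle as unusable. */
static void demo_closedown(struct demo_cluster *c)
{
	pthread_mutex_lock(&c->lock);
	c->open = false;
	pthread_mutex_unlock(&c->lock);
}

/* Stand-in for lvm_thread_fn: finish the work item, then send the reply. */
static void *demo_worker_fn(void *arg)
{
	struct demo_worker *w = arg;

	w->reply_sent = demo_send_reply(w->cluster);
	return NULL;
}

/*
 * Join the worker first, then close the cluster.
 * Returns true if the worker's reply was sent before closedown.
 */
bool demo_shutdown_after_join(void)
{
	struct demo_cluster c;
	struct demo_worker w = { &c, false };
	pthread_t tid;

	c.open = true;
	if (pthread_mutex_init(&c.lock, NULL))
		return false;
	if (pthread_create(&tid, NULL, demo_worker_fn, &w)) {
		pthread_mutex_destroy(&c.lock);
		return false;
	}

	pthread_join(tid, NULL);	/* worker has sent its reply by now */
	demo_closedown(&c);		/* safe: nobody uses the handle any more */
	pthread_mutex_destroy(&c.lock);
	return w.reply_sent;
}
```

If demo_closedown were called before pthread_join, the result would
depend on thread scheduling, which is the race described above.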


Dongmao Zhang