* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
@ 2013-05-18  6:27 Joseph Qi
  2013-05-18 13:26 ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2013-05-18  6:27 UTC (permalink / raw)
  To: ocfs2-devel

Hi,
Once a node goes down in the cluster, ocfs2_recovery_thread is
triggered on each node. These threads then recover the down node by
taking the super lock.
I have several questions about this:
1) Why does each node have to run such a thread? We know that in the end
only one node gets the super lock and does the actual recovery.
2) If this thread is running and an error occurs (for example,
ocfs2_super_lock fails), the thread exits without clearing the
recovery map. Will that leave other threads waiting for recovery in
ocfs2_wait_for_recovery?


* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
  2013-05-18  6:27 [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread Joseph Qi
@ 2013-05-18 13:26 ` Sunil Mushran
  2013-05-19  2:25   ` Joseph Qi
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2013-05-18 13:26 UTC (permalink / raw)
  To: ocfs2-devel

The first node that gets the lock will do the actual recovery. The others will get the lock and see a clean journal and skip the recovery. A thread should never error out if it fails to get the lock. It should try and try again.
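
The behaviour described above can be sketched as a small single-process model in C. This is only an illustration under stated assumptions, not the ocfs2 implementation: every name here (model_cluster, model_super_lock, model_recover_dead_node) is hypothetical, the cluster lock is reduced to a counter of simulated failures, and "replaying the journal" is just flipping a flag.

```c
#include <stdbool.h>

/* Hypothetical model of the shared state; not the kernel's structures. */
struct model_cluster {
	int  pending_lock_failures; /* next N lock attempts fail (simulated) */
	bool journal_dirty;         /* dead node's journal still needs replay */
	int  replay_count;          /* how many replays actually ran */
	int  lock_attempts;         /* total lock attempts, for inspection */
};

/* Simulated lock call: returns 0 on success, -1 on a transient failure. */
int model_super_lock(struct model_cluster *c)
{
	c->lock_attempts++;
	if (c->pending_lock_failures > 0) {
		c->pending_lock_failures--;
		return -1;
	}
	return 0;
}

/*
 * One node's recovery pass. Returns true if this node replayed the
 * journal, false if it found the journal already clean and skipped.
 */
bool model_recover_dead_node(struct model_cluster *c)
{
	bool did_replay = false;

	/* Never give up on a lock failure: try and try again. */
	while (model_super_lock(c) < 0)
		;

	if (c->journal_dirty) {
		c->replay_count++;
		c->journal_dirty = false;
		did_replay = true;
	}
	/* Unlocking is a no-op in this single-process model. */
	return did_replay;
}
```

Calling model_recover_dead_node() once per node on the same model_cluster shows the pattern: the first caller replays the journal, and every later caller takes the lock, sees a clean journal, and returns without doing any work.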


* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
  2013-05-18 13:26 ` Sunil Mushran
@ 2013-05-19  2:25   ` Joseph Qi
  2013-05-20  2:49     ` Joseph Qi
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2013-05-19  2:25 UTC (permalink / raw)
  To: ocfs2-devel

On 2013/5/18 21:26, Sunil Mushran wrote:
> The first node that gets the lock will do the actual recovery. The others will get the lock and see a clean journal and skip the recovery. A thread should never error out if it fails to get the lock. It should try and try again.
> 
But when an error occurs, the thread jumps to bail and the restart
logic does not run. The code looks like this:
...
	status = ocfs2_wait_on_mount(osb);
	if (status < 0) {
		goto bail;
	}

	rm_quota = kzalloc(osb->max_slots * sizeof(int), GFP_NOFS);
	if (!rm_quota) {
		status = -ENOMEM;
		goto bail;
	}
restart:
	status = ocfs2_super_lock(osb, 1);
	if (status < 0) {
		mlog_errno(status);
		goto bail;
	}
...
	if (!status && !ocfs2_recovery_completed(osb)) {
		mutex_unlock(&osb->recovery_lock);
		goto restart;
	}
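
The concern can be illustrated with a simplified, hypothetical model in C (none of these names exist in ocfs2, and the lock is reduced to a count of simulated failures). It contrasts a thread that bails on a lock error, leaving the recovery map populated, with one that retries until the map is empty. Treat it as a sketch of the control flow only, not as a proposed patch.

```c
#include <stdbool.h>

#define MODEL_MAX_NODES 8

/* Hypothetical stand-in for the recovery map: dead nodes still pending. */
struct model_recovery_map {
	unsigned int used;
	unsigned int entries[MODEL_MAX_NODES];
};

/* What a waiter would check: recovery is done once the map is empty. */
bool model_recovery_done(const struct model_recovery_map *rm)
{
	return rm->used == 0;
}

/*
 * lock_errors:    number of consecutive simulated lock failures.
 * retry_on_error: false mimics the "goto bail" path, true retries.
 * Returns 0 when the map was drained, -1 if the thread bailed early.
 */
int model_recovery_thread(struct model_recovery_map *rm,
			  int lock_errors, bool retry_on_error)
{
	for (;;) {
		if (lock_errors > 0) {
			lock_errors--;
			if (!retry_on_error)
				return -1; /* bail: map left untouched */
			continue;          /* try the lock again */
		}

		/* Lock held: recover one node and drop it from the map. */
		if (rm->used > 0)
			rm->used--;

		if (rm->used == 0)
			return 0;
		/* More nodes pending: loop again, like "goto restart". */
	}
}
```

With retry_on_error set to false and a single simulated lock failure, the function returns -1 and model_recovery_done() stays false, which is the state in which a waiter would keep waiting. With retry_on_error set to true the same input drains the map and returns 0.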


* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
  2013-05-19  2:25   ` Joseph Qi
@ 2013-05-20  2:49     ` Joseph Qi
  2013-05-22 23:00       ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph Qi @ 2013-05-20  2:49 UTC (permalink / raw)
  To: ocfs2-devel

One more question: do we make sure that dlm_recovery_thread always runs
prior to ocfs2_recovery_thread?


* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
  2013-05-20  2:49     ` Joseph Qi
@ 2013-05-22 23:00       ` Sunil Mushran
  2013-05-23  9:37         ` shencanquan
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2013-05-22 23:00 UTC (permalink / raw)
  To: ocfs2-devel

True. The function could do with a little bit of cleanup. Feel free to send
a patch.



* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
  2013-05-22 23:00       ` Sunil Mushran
@ 2013-05-23  9:37         ` shencanquan
  0 siblings, 0 replies; 6+ messages in thread
From: shencanquan @ 2013-05-23  9:37 UTC (permalink / raw)
  To: ocfs2-devel

On 2013/5/23 7:00, Sunil Mushran wrote:
> True. The function could do with a little bit of cleanup. Feel free to 
> send a patch.

From the ocfs2 code, I could not find anything that makes
dlm_recovery_thread always run prior to ocfs2_recovery_thread. Could
you please explain? Thanks.
