* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
From: Joseph Qi @ 2013-05-18 6:27 UTC
To: ocfs2-devel

Hi,

Once a node goes down in the cluster, ocfs2_recovery_thread is triggered on each node. These threads then recover the down node after taking the super lock.

I have several questions on this:

1) Why does each node have to run such a thread? We know that, in the end, one node gets the super lock and does the actual recovery.
2) If this thread is running and an error occurs, for example ocfs2_super_lock fails, the thread exits without clearing the recovery map. Will that leave other threads waiting for recovery in ocfs2_wait_for_recovery?
* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
From: Sunil Mushran @ 2013-05-18 13:26 UTC
To: ocfs2-devel

On May 17, 2013, at 11:27 PM, Joseph Qi <joseph.qi@huawei.com> wrote:
> 1) Why each node has to run such a thread?
> 2) If this thread is running but something error occurred, [...] will it
> cause other threads still waiting for recovery in ocfs2_wait_for_recovery?

The first node that gets the lock will do the actual recovery. The others will get the lock, see a clean journal, and skip the recovery.

A thread should never error out if it fails to get the lock. It should try and try again.
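The behaviour described in this reply (every node runs the recovery pass, but only the first one to take the lock finds a dirty journal) can be illustrated with a small user-space sketch. This is not the ocfs2 code: the `toy_*` names, the boolean lock, and the single-threaded loop over nodes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dead node's journal state. */
struct toy_journal {
    bool dirty;       /* true until some node replays it */
    int  replayed_by; /* node that did the replay, -1 if none yet */
};

/* Hypothetical cluster-wide lock; only one holder at a time. */
struct toy_super_lock {
    bool held;
};

static bool toy_lock(struct toy_super_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_super_lock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * One node's recovery pass in this sketch: take the lock, replay the
 * journal only if it is still dirty, then drop the lock.
 * Returns true if this node did the replay.
 */
static bool toy_recover(int node, struct toy_super_lock *l,
                        struct toy_journal *j)
{
    bool did_replay = false;

    if (!toy_lock(l))
        return false; /* a real thread would wait and retry here */

    if (j->dirty) {
        j->dirty = false;
        j->replayed_by = node;
        did_replay = true;
    }
    toy_unlock(l);
    return did_replay;
}

/* Run the recovery pass on nodes 0..n-1 in order; count the replays. */
static int toy_run_cluster(int n, struct toy_journal *j)
{
    struct toy_super_lock lock = { .held = false };
    int replays = 0;

    for (int node = 0; node < n; node++)
        if (toy_recover(node, &lock, j))
            replays++;
    return replays;
}
```

In this sketch, however many nodes run the pass, the journal is replayed once; later nodes take the lock, find the journal clean, and release the lock without doing any work.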
* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
From: Joseph Qi @ 2013-05-19 2:25 UTC
To: ocfs2-devel

On 2013/5/18 21:26, Sunil Mushran wrote:
> The first node that gets the lock will do the actual recovery. The others
> will get the lock and see a clean journal and skip the recovery. A thread
> should never error out if it fails to get the lock. It should try and try
> again.

But when an error occurs, the code jumps to bail and the restart logic never runs. The code looks like this:

    ...
    status = ocfs2_wait_on_mount(osb);
    if (status < 0) {
        goto bail;
    }

    rm_quota = kzalloc(osb->max_slots * sizeof(int), GFP_NOFS);
    if (!rm_quota) {
        status = -ENOMEM;
        goto bail;
    }
restart:
    status = ocfs2_super_lock(osb, 1);
    if (status < 0) {
        mlog_errno(status);
        goto bail;
    }
    ...
    if (!status && !ocfs2_recovery_completed(osb)) {
        mutex_unlock(&osb->recovery_lock);
        goto restart;
    }
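To make the concern concrete, here is a toy user-space comparison of "bail on the first lock failure" with "try again". It is only a sketch under stated assumptions: `flaky_lock`, `toy_recovery`, the `pending` flag standing in for the recovery map, and the bounded `max_attempts` are all hypothetical and do not come from the ocfs2 source or from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock whose first `failures_left` attempts fail. */
struct flaky_lock {
    int failures_left;
    int attempts;
};

static int flaky_lock_take(struct flaky_lock *l)
{
    l->attempts++;
    if (l->failures_left > 0) {
        l->failures_left--;
        return -1; /* stand-in for a negative errno */
    }
    return 0;
}

/* Hypothetical recovery state: waiters block while `pending` is true. */
struct toy_recovery {
    bool pending;
};

/* Bail on error: one attempt; a failure leaves `pending` set. */
static int recover_bail(struct flaky_lock *l, struct toy_recovery *r)
{
    int status = flaky_lock_take(l);

    if (status < 0)
        return status; /* waiters on `pending` are never released */
    r->pending = false;
    return 0;
}

/* Retry: keep trying, up to max_attempts, before giving up. */
static int recover_retry(struct flaky_lock *l, struct toy_recovery *r,
                         int max_attempts)
{
    int status = -1;

    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        status = flaky_lock_take(l);
        if (status == 0) {
            r->pending = false;
            break;
        }
        /* a real thread would sleep or back off here */
    }
    return status;
}
```

With a lock that fails twice, `recover_bail` returns an error and leaves `pending` set, while `recover_retry` succeeds on the third attempt and clears it. Whether a retry should be bounded, and how it should back off, is a design choice this sketch does not settle.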
* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
From: Joseph Qi @ 2013-05-20 2:49 UTC
To: ocfs2-devel

On 2013/5/19 10:25, Joseph Qi wrote:
> But when error occurs and goes to bail, and the restart logic will not
> run.

One more question: do we make sure that dlm_recovery_thread always runs before ocfs2_recovery_thread?
* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
From: Sunil Mushran @ 2013-05-22 23:00 UTC
To: ocfs2-devel

On Sun, May 19, 2013 at 7:49 PM, Joseph Qi <joseph.qi@huawei.com> wrote:
> But when error occurs and goes to bail, and the restart logic will not
> run.

True. The function could do with a little bit of cleanup. Feel free to send a patch.
* [Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread
From: shencanquan @ 2013-05-23 9:37 UTC
To: ocfs2-devel

On 2013/5/23 7:00, Sunil Mushran wrote:
> True. The function could do with a little bit of cleanup. Feel free to
> send a patch.

From the ocfs2 code, I could not find anything that makes dlm_recovery_thread always run before ocfs2_recovery_thread. Could you please tell me where this is ensured? Thanks.