* [Cluster-devel] [PATCH] dlm: schedule during recovery loops
From: David Teigland @ 2007-09-25 16:23 UTC
To: cluster-devel.redhat.com
Call schedule() in a bunch of places where the recovery code loops
through lists of locks. The theory is that these lists become so
long that looping through them triggers the softlockup watchdog
(usually on ia64; it doesn't seem to happen often on other arches).
Signed-off-by: David Teigland <teigland@redhat.com>
Index: linux-quilt/fs/dlm/lock.c
===================================================================
--- linux-quilt.orig/fs/dlm/lock.c
+++ linux-quilt/fs/dlm/lock.c
@@ -3997,6 +3997,7 @@ int dlm_recover_waiters_post(struct dlm_
unlock_rsb(r);
put_rsb(r);
dlm_put_lkb(lkb);
+ schedule();
}
return error;
Index: linux-quilt/fs/dlm/recover.c
===================================================================
--- linux-quilt.orig/fs/dlm/recover.c
+++ linux-quilt/fs/dlm/recover.c
@@ -533,6 +533,7 @@ int dlm_recover_locks(struct dlm_ls *ls)
}
count += r->res_recover_locks_count;
+ schedule();
}
up_read(&ls->ls_root_sem);
@@ -705,6 +706,7 @@ void dlm_recover_rsbs(struct dlm_ls *ls)
rsb_clear_flag(r, RSB_RECOVER_CONVERT);
rsb_clear_flag(r, RSB_NEW_MASTER2);
unlock_rsb(r);
+ schedule();
}
up_read(&ls->ls_root_sem);
@@ -732,6 +734,7 @@ int dlm_create_root_list(struct dlm_ls *
dlm_hold_rsb(r);
}
read_unlock(&ls->ls_rsbtbl[i].lock);
+ schedule();
}
out:
up_write(&ls->ls_root_sem);
@@ -741,11 +744,15 @@ int dlm_create_root_list(struct dlm_ls *
void dlm_release_root_list(struct dlm_ls *ls)
{
struct dlm_rsb *r, *safe;
+ unsigned int count = 0;
down_write(&ls->ls_root_sem);
list_for_each_entry_safe(r, safe, &ls->ls_root_list, res_root_list) {
list_del_init(&r->res_root_list);
dlm_put_rsb(r);
+
+ if (!(++count % 100))
+ schedule();
}
up_write(&ls->ls_root_sem);
}
@@ -763,6 +770,7 @@ void dlm_clear_toss_list(struct dlm_ls *
free_rsb(r);
}
write_unlock(&ls->ls_rsbtbl[i].lock);
+ schedule();
}
}
Index: linux-quilt/fs/dlm/requestqueue.c
===================================================================
--- linux-quilt.orig/fs/dlm/requestqueue.c
+++ linux-quilt/fs/dlm/requestqueue.c
@@ -192,6 +192,7 @@ void dlm_purge_requestqueue(struct dlm_l
list_del(&e->list);
kfree(e);
}
+ schedule();
}
mutex_unlock(&ls->ls_requestqueue_mutex);
}
* [Cluster-devel] [PATCH] dlm: schedule during recovery loops
From: Patrick Caulfield @ 2007-09-26 7:18 UTC
To: cluster-devel.redhat.com
David Teigland wrote:
> Call schedule() in a bunch of places where the recovery code loops
> through lists of locks. The theory is that these lists become so
> long that looping through them triggers the softlockup watchdog
> (usually on ia64; it doesn't seem to happen often on other arches).
>
> Signed-off-by: David Teigland <teigland@redhat.com>
I think we're encouraged to use cond_resched() instead these days. It has the
same effect but doesn't force a schedule if there is nothing else to run.
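For illustration, each hunk would just substitute the call; the
dlm_recover_locks() loop would end up looking something like this
(an untested sketch, reconstructing the loop body the diff context implies):

	down_read(&ls->ls_root_sem);
	list_for_each_entry(r, &ls->ls_root_list, res_root_list) {
		/* ... recover the locks held on this rsb ... */
		count += r->res_recover_locks_count;
		cond_resched();	/* yields only if another task is runnable */
	}
	up_read(&ls->ls_root_sem);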
--
Patrick
* [Cluster-devel] [PATCH] dlm: schedule during recovery loops
From: David Teigland @ 2007-09-26 13:25 UTC
To: cluster-devel.redhat.com
On Wed, Sep 26, 2007 at 08:18:55AM +0100, Patrick Caulfield wrote:
> David Teigland wrote:
> > Call schedule() in a bunch of places where the recovery code loops
> > through lists of locks. The theory is that these lists become so
> > long that looping through them triggers the softlockup watchdog
> > (usually on ia64; it doesn't seem to happen often on other arches).
> >
> > Signed-off-by: David Teigland <teigland@redhat.com>
>
>
> I think we're encouraged to use cond_resched() instead these days. It has the
> same effect but doesn't force a schedule if there is nothing else to run.
OK, I'd like to try cond_resched() instead; how certain are we that it's
just as effective at avoiding the softlockup watchdog? Testing it is
going to be difficult since it's largely unreproducible outside of some
single-cpu ia64 machines in the QE dept...
* [Cluster-devel] [PATCH] dlm: schedule during recovery loops
From: Patrick Caulfield @ 2007-09-26 13:52 UTC
To: cluster-devel.redhat.com
David Teigland wrote:
> On Wed, Sep 26, 2007 at 08:18:55AM +0100, Patrick Caulfield wrote:
>> David Teigland wrote:
>>> Call schedule() in a bunch of places where the recovery code loops
>>> through lists of locks. The theory is that these lists become so
>>> long that looping through them triggers the softlockup watchdog
>>> (usually on ia64; it doesn't seem to happen often on other arches).
>>>
>>> Signed-off-by: David Teigland <teigland@redhat.com>
>>
>> I think we're encouraged to use cond_resched() instead these days. It has the
>> same effect but doesn't force a schedule if there is nothing else to run.
>
> OK, I'd like to try cond_resched() instead; how certain are we that it's
> just as effective at avoiding the softlockup watchdog? Testing it is
> going to be difficult since it's largely unreproducible outside of some
> single-cpu ia64 machines in the QE dept...
I can't see it making any real difference. If there is nothing to schedule
then the process will continue. With cond_resched() it continues cheaply;
with schedule() it will re-enter the scheduler and /then/ get rescheduled.
If anything it will help, because there's less time spent in the scheduler,
I suspect (though I doubt it's measurable).
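For the record, cond_resched() boils down to roughly this simplified
sketch; the real implementation has a couple of extra guards, and the
function name here is just illustrative:

	static inline int cond_resched_sketch(void)
	{
		if (need_resched()) {	/* another task wants the cpu */
			schedule();
			return 1;
		}
		return 0;
	}

So when nothing else is runnable it costs little more than a flag test
per loop iteration.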
--
Patrick