All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [DLM PATCH] dlm: Don't swamp the CPU with callbacks queued during recovery
       [not found] <369491237.32086764.1541703782413.JavaMail.zimbra@redhat.com>
@ 2018-11-08 19:04 ` Bob Peterson
  2018-11-09 10:14   ` Steven Whitehouse
  0 siblings, 1 reply; 2+ messages in thread
From: Bob Peterson @ 2018-11-08 19:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Before this patch, recovery would cause all callbacks to be delayed,
put on a queue, and afterward they were all queued to the callback
work queue. This patch does the same thing, but occasionally takes
a break after 25 of them so it won't swamp the CPU at the expense
of other RT processes like corosync.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/dlm/ast.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 562fa8c3edff..47ee66d70109 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -292,6 +292,8 @@ void dlm_callback_suspend(struct dlm_ls *ls)
 		flush_workqueue(ls->ls_callback_wq);
 }
 
+#define MAX_CB_QUEUE 25
+
 void dlm_callback_resume(struct dlm_ls *ls)
 {
 	struct dlm_lkb *lkb, *safe;
@@ -302,15 +304,23 @@ void dlm_callback_resume(struct dlm_ls *ls)
 	if (!ls->ls_callback_wq)
 		return;
 
+more:
 	mutex_lock(&ls->ls_cb_mutex);
 	list_for_each_entry_safe(lkb, safe, &ls->ls_cb_delay, lkb_cb_list) {
 		list_del_init(&lkb->lkb_cb_list);
 		queue_work(ls->ls_callback_wq, &lkb->lkb_cb_work);
 		count++;
+		if (count == MAX_CB_QUEUE)
+			break;
 	}
 	mutex_unlock(&ls->ls_cb_mutex);
 
 	if (count)
 		log_rinfo(ls, "dlm_callback_resume %d", count);
+	if (count == MAX_CB_QUEUE) {
+		count = 0;
+		cond_resched();
+		goto more;
+	}
 }
 



^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [Cluster-devel] [DLM PATCH] dlm: Don't swamp the CPU with callbacks queued during recovery
  2018-11-08 19:04 ` [Cluster-devel] [DLM PATCH] dlm: Don't swamp the CPU with callbacks queued during recovery Bob Peterson
@ 2018-11-09 10:14   ` Steven Whitehouse
  0 siblings, 0 replies; 2+ messages in thread
From: Steven Whitehouse @ 2018-11-09 10:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 08/11/18 19:04, Bob Peterson wrote:
> Hi,
>
> Before this patch, recovery would cause all callbacks to be delayed,
> put on a queue, and afterward they were all queued to the callback
> work queue. This patch does the same thing, but occasionally takes
> a break after 25 of them so it won't swamp the CPU at the expense
> of other RT processes like corosync.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>   fs/dlm/ast.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
> index 562fa8c3edff..47ee66d70109 100644
> --- a/fs/dlm/ast.c
> +++ b/fs/dlm/ast.c
> @@ -292,6 +292,8 @@ void dlm_callback_suspend(struct dlm_ls *ls)
>   		flush_workqueue(ls->ls_callback_wq);
>   }
>   
> +#define MAX_CB_QUEUE 25
> +
>   void dlm_callback_resume(struct dlm_ls *ls)
>   {
>   	struct dlm_lkb *lkb, *safe;
> @@ -302,15 +304,23 @@ void dlm_callback_resume(struct dlm_ls *ls)
>   	if (!ls->ls_callback_wq)
>   		return;
>   
> +more:
>   	mutex_lock(&ls->ls_cb_mutex);
>   	list_for_each_entry_safe(lkb, safe, &ls->ls_cb_delay, lkb_cb_list) {
>   		list_del_init(&lkb->lkb_cb_list);
>   		queue_work(ls->ls_callback_wq, &lkb->lkb_cb_work);
>   		count++;
> +		if (count == MAX_CB_QUEUE)
> +			break;
>   	}
>   	mutex_unlock(&ls->ls_cb_mutex);
>   
>   	if (count)
>   		log_rinfo(ls, "dlm_callback_resume %d", count);
> +	if (count == MAX_CB_QUEUE) {
> +		count = 0;
> +		cond_resched();
> +		goto more;
> +	}
>   }
>   
>

While that is a good thing to do, it looks like the real culprit here 
might be elsewhere. Look at what this is doing... adding a large number 
of work items under the ls_cb_mutex, and then look at what the work item 
does... first thing is to lock the lkb_cb_mutex, so if we have a 
multi-core system then this is creating a large number of work items all 
of which will be fighting each other (and the thread that is trying to 
add new items) for the lock so no wonder it doesn't work efficiently.

If we called the callbacks directly here, then we would avoid all that 
fighting for the mutex and also remove the need for scheduling the work 
item in the first place too. That should greatly decrease the amount of 
cpu time required and reduce latency and contention on the mutex,

Steve.



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-11-09 10:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <369491237.32086764.1541703782413.JavaMail.zimbra@redhat.com>
2018-11-08 19:04 ` [Cluster-devel] [DLM PATCH] dlm: Don't swamp the CPU with callbacks queued during recovery Bob Peterson
2018-11-09 10:14   ` Steven Whitehouse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.