From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Dongsu Park <dongsu.park@profitbricks.com>
Cc: Tejun Heo <tj@kernel.org>, NeilBrown <neilb@suse.de>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH/RFC] workqueue: allow rescuer thread to do more work.
Date: Fri, 7 Nov 2014 11:03:40 +0800 [thread overview]
Message-ID: <545C368C.5040704@cn.fujitsu.com> (raw)
In-Reply-To: <20141106165811.GA2338@gmail.com>
On 11/07/2014 12:58 AM, Dongsu Park wrote:
> Hi Tejun & Neil,
>
> On 04.11.2014 09:22, Tejun Heo wrote:
>> On Thu, Oct 30, 2014 at 10:19:32AM +1100, NeilBrown wrote:
>>>> Given that workder depletion is pool-wide
>>>> event, maybe it'd make sense to trigger rescuers immediately while
>>>> workers are in short supply? e.g. while there's a manager stuck in
>>>> maybe_create_worker() with the mayday timer already triggered?
>>>
>>> So what if I change "need_more_worker" to "need_to_create_worker" ?
>>> Then it will stop as soon as there in an idle worker thread.
>>> That is the condition that keeps maybe_create_worker() looping.
>>> ??
>>
>> Yeah, that'd be a better condition and can work out. Can you please
>> write up a patch to do that and do some synthetic tests excercising
>> the code path? Also please cc Lai Jiangshan <laijs@cn.fujitsu.com>
>> when posting the patch.
>
> This issue looks exactly like what I've encountered occasionally in our test
> setup. (with a custom kernel based on 3.12, MD/raid1, dm-multipath, etc.)
> When a system suffers from high memory pressure, and at the same time
> underlying devices of RAID arrays are repeatedly removed and re-added,
> then sometimes the whole system gets locked up on a worker pool's lock.
> So I had to fix our custom MD code to allocate a separate ordered workqueue
> with WQ_MEM_RECLAIM, apart from md_wq or md_misc_wq.
> Then the lockup seemed to have disappeared.
>
> Now that I read the Neil's patch, which looks like an ultimate solution
> to the problem I have seen. I'm really looking forward to seeing this
> change in mainline.
>
> How about the attached patch? Based on the Neil's patch, I replaced
> need_more_worker() with need_to_create_worker() as Tejun suggested.
>
> Test is running with this patch, which seems to be working for now.
> But I'm going to observe the test result carefully for a few more days.
>
> Regards,
> Dongsu
>
> ----
>>From de9aadd6fb742ea8acce4245a27946d3f233ab7f Mon Sep 17 00:00:00 2001
> From: Dongsu Park <dongsu.park@profitbricks.com>
> Date: Wed, 5 Nov 2014 17:28:07 +0100
> Subject: [RFC PATCH] workqueue: allow rescuer thread to do more work
>
> Original commit message from NeilBrown <neilb@suse.de>:
> ====
> When there is serious memory pressure, all workers in a pool could be
> blocked, and a new thread cannot be created because it requires memory
> allocation.
>
> In this situation a WQ_MEM_RECLAIM workqueue will wake up the rescuer
> thread to do some work.
>
> The rescuer will only handle requests that are already on ->worklist.
> If max_requests is 1, that means it will handle a single request.
>
> The rescuer will be woken again in 100ms to handle another max_requests
> requests.
I also observed this problem by review when I was developing
the per-pwq-worklist patchset which has a side-affect that it also naturally
fix the problem.
However, it is nothing about correctness and I made promise to Frederic Weisbecker
for working on unbound pool for power-saving, then the per-pwq-worklist patchset
is put off. So I have to ack it.
>
> I've seen a machine (running a 3.0 based "enterprise" kernel) with
> thousands of requests queued for xfslogd, which has a max_requests of 1,
> and is needed for retiring all 'xfs' write requests. When one of the
> worker pools gets into this state, it progresses extremely slowly and
> possibly never recovers (only waited an hour or two).
>
> So if, after handling everything on worklist, there is again something
> on worklist (counted in nr_active), and if the queue is still congested,
> keep processing instead of waiting for the next wake-up.
> ====
>
> Dongsu Park: replaced need_more_worker() with need_to_create_worker(),
> as suggested by Tejun.
>
> Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
> Link: https://lkml.org/lkml/2014/10/29/55
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Original-by: NeilBrown <neilb@suse.de>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> kernel/workqueue.c | 23 +++++++++++++----------
> 1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 09b685d..4d20225 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2244,16 +2244,19 @@ repeat:
> spin_lock_irq(&pool->lock);
> rescuer->pool = pool;
>
> - /*
> - * Slurp in all works issued via this workqueue and
> - * process'em.
> - */
> - WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
> - list_for_each_entry_safe(work, n, &pool->worklist, entry)
> - if (get_work_pwq(work) == pwq)
> - move_linked_works(work, scheduled, &n);
> -
> - process_scheduled_works(rescuer);
> + do {
> + /*
> + * Slurp in all works issued via this workqueue and
> + * process'em.
> + */
> + WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
> + list_for_each_entry_safe(work, n, &pool->worklist,
> + entry)
> + if (get_work_pwq(work) == pwq)
> + move_linked_works(work, scheduled, &n);
> +
> + process_scheduled_works(rescuer);
> + } while (need_to_create_worker(pool) && pwq->nr_active);
>
> /*
> * Put the reference grabbed by send_mayday(). @pool won't
next prev parent reply other threads:[~2014-11-07 3:00 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-29 6:26 [PATCH/RFC] workqueue: allow rescuer thread to do more work NeilBrown
2014-10-29 14:32 ` Tejun Heo
2014-10-29 23:19 ` NeilBrown
2014-11-04 14:22 ` Tejun Heo
2014-11-06 16:58 ` Dongsu Park
2014-11-07 3:03 ` Lai Jiangshan [this message]
2014-11-10 5:28 ` NeilBrown
2014-11-10 8:52 ` Jan Kara
2014-11-10 22:04 ` NeilBrown
2014-11-14 17:21 ` Tejun Heo
2014-11-18 4:27 ` [PATCH - v3?] " NeilBrown
2014-11-18 6:01 ` Lai Jiangshan
2014-11-18 6:11 ` NeilBrown
2014-12-02 20:43 ` Tejun Heo
2014-12-03 0:40 ` NeilBrown
2014-12-03 17:20 ` Tejun Heo
2014-12-03 18:02 ` Tejun Heo
2014-12-03 22:31 ` Dongsu Park
2014-12-04 1:19 ` NeilBrown
2014-12-04 1:01 ` Lai Jiangshan
2014-12-04 14:57 ` Tejun Heo
2014-12-04 15:11 ` [PATCH workqueue/for-3.18-fixes 1/2] workqueue: invert the order between pool->lock and wq_mayday_lock Tejun Heo
2014-12-04 15:12 ` [PATCH workqueue/for-3.18-fixes 2/2] workqueue: allow rescuer thread to do more work Tejun Heo
2014-12-08 17:40 ` Tejun Heo
2014-12-08 22:47 ` NeilBrown
2014-12-05 2:09 ` [PATCH workqueue/for-3.18-fixes 1/2] workqueue: invert the order between pool->lock and wq_mayday_lock Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=545C368C.5040704@cn.fujitsu.com \
--to=laijs@cn.fujitsu.com \
--cc=dongsu.park@profitbricks.com \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.de \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.