From: NeilBrown <neilb@suse.de>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>, Jan Kara <jack@suse.cz>,
Dongsu Park <dongsu.park@profitbricks.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH - v3?] workqueue: allow rescuer thread to do more work.
Date: Tue, 18 Nov 2014 17:11:04 +1100 [thread overview]
Message-ID: <20141118171104.631f4c1f@notabene.brown> (raw)
In-Reply-To: <546AE0BC.5020606@cn.fujitsu.com>
[-- Attachment #1: Type: text/plain, Size: 5094 bytes --]
On Tue, 18 Nov 2014 14:01:32 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> On 11/18/2014 12:27 PM, NeilBrown wrote:
> >
> > When there is serious memory pressure, all workers in a pool could be
> > blocked, and a new thread cannot be created because it requires memory
> > allocation.
> >
> > In this situation a WQ_MEM_RECLAIM workqueue will wake up the
> > rescuer thread to do some work.
> >
> > The rescuer will only handle requests that are already on ->worklist.
> > If max_requests is 1, that means it will handle a single request.
> >
> > The rescuer will be woken again in 100ms to handle another max_requests
> > requests.
> >
> > I've seen a machine (running a 3.0 based "enterprise" kernel) with
> > thousands of requests queued for xfslogd, which has a max_requests of
> > 1, and is needed for retiring all 'xfs' write requests. When one of
> > the worker pools gets into this state, it progresses extremely slowly
> > and possibly never recovers (only waited an hour or two).
> >
> > With this patch we leave a pool_workqueue on mayday list
> > until it is clearly no longer in need of assistance. This allows
> > all requests to be handled in a timely fashion.
> >
> > The code is a bit awkward due to the need to hold both wq_mayday_lock
> > and pool->lock at the same time, and due to the lock ordering imposed
> > on them. In particular we move work items from the ->worklist to the
> > rescuer list while holding both locks because we need pool->lock
> > to do the move, and need wq_mayday_lock to manipulate the mayday list
> > after we have found out if there was anything to do.
> >
> > 'need_to_create_worker()' is called *before* moving work items off
> > pool->worklist as an empty worklist will make it return false, but
> > after the move_linked_works() calls and before the
> > process_scheduled_works() call, an empty worklist does not indicate
> > that there is no work to do.
> >
> > We keep each pool_workqueue on the mayday list until
> > need_to_create_worker() is false, and no work for this workqueue is
> > found in the pool.
> >
> > As the main rescuer loop now iterates an arbitrary number of time,
> > cond_resched() was inserted to avoid imposing excessive latencies.
> >
> > I have tested this in combination with a (hackish) patch which forces
> > all work items to be handled by the rescuer thread. In that context
> > it significantly improves performance. A similar patch for a 3.0
> > kernel significantly improved performance on a heavy work load.
> >
> > Thanks to Jan Kara for some design ideas, and to Dongsu Park for
> > some comments and testing.
> >
> > Cc: Dongsu Park <dongsu.park@profitbricks.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index caedde34ee7f..4baa7b8b7e0f 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -2253,26 +2253,36 @@ repeat:
> > struct pool_workqueue, mayday_node);
> > struct worker_pool *pool = pwq->pool;
> > struct work_struct *work, *n;
> > + int still_needed;
> >
> > __set_current_state(TASK_RUNNING);
> > - list_del_init(&pwq->mayday_node);
> > -
> > - spin_unlock_irq(&wq_mayday_lock);
> > -
> > - worker_attach_to_pool(rescuer, pool);
> > -
> > - spin_lock_irq(&pool->lock);
> > - rescuer->pool = pool;
> > -
> > + spin_lock(&pool->lock);
> > /*
> > * Slurp in all works issued via this workqueue and
> > * process'em.
> > */
> > WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
>
> > + still_needed = need_to_create_worker(pool);
>
> This line of code will cause the rescuer busy-loop even no work-item pending
> on the workqueue.
Thanks for the review...
>
> > list_for_each_entry_safe(work, n, &pool->worklist, entry)
> > if (get_work_pwq(work) == pwq)
> > move_linked_works(work, scheduled, &n);
> >
> > + if (!list_empty(scheduled))
> > + still_needed = 1;
This should have been
if (list_empty(scheduled))
still_needed = 0;
which would address your concern. My original code effectively did that, but
when I restructured it a little to make it more readable, I broke it :-(
I'll repost tomorrow if there are no further comments.
Thanks.
NeilBrown
> > + if (still_needed) {
> > + list_move_tail(&pwq->mayday_node, &wq->maydays);
> > + get_pwq(pwq);
> > + } else
> > + /* We can let go of this one now */
> > + list_del_init(&pwq->mayday_node);
> > +
> > + spin_unlock(&pool->lock);
> > + spin_unlock_irq(&wq_mayday_lock);
> > +
> > + worker_attach_to_pool(rescuer, pool);
> > +
> > + spin_lock_irq(&pool->lock);
> > + rescuer->pool = pool;
> > process_scheduled_works(rescuer);
> >
> > /*
> > @@ -2293,7 +2303,7 @@ repeat:
> > spin_unlock_irq(&pool->lock);
> >
> > worker_detach_from_pool(rescuer, pool);
> > -
> > + cond_resched();
> > spin_lock_irq(&wq_mayday_lock);
> > }
> >
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
next prev parent reply other threads:[~2014-11-18 6:11 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-29 6:26 [PATCH/RFC] workqueue: allow rescuer thread to do more work NeilBrown
2014-10-29 14:32 ` Tejun Heo
2014-10-29 23:19 ` NeilBrown
2014-11-04 14:22 ` Tejun Heo
2014-11-06 16:58 ` Dongsu Park
2014-11-07 3:03 ` Lai Jiangshan
2014-11-10 5:28 ` NeilBrown
2014-11-10 8:52 ` Jan Kara
2014-11-10 22:04 ` NeilBrown
2014-11-14 17:21 ` Tejun Heo
2014-11-18 4:27 ` [PATCH - v3?] " NeilBrown
2014-11-18 6:01 ` Lai Jiangshan
2014-11-18 6:11 ` NeilBrown [this message]
2014-12-02 20:43 ` Tejun Heo
2014-12-03 0:40 ` NeilBrown
2014-12-03 17:20 ` Tejun Heo
2014-12-03 18:02 ` Tejun Heo
2014-12-03 22:31 ` Dongsu Park
2014-12-04 1:19 ` NeilBrown
2014-12-04 1:01 ` Lai Jiangshan
2014-12-04 14:57 ` Tejun Heo
2014-12-04 15:11 ` [PATCH workqueue/for-3.18-fixes 1/2] workqueue: invert the order between pool->lock and wq_mayday_lock Tejun Heo
2014-12-04 15:12 ` [PATCH workqueue/for-3.18-fixes 2/2] workqueue: allow rescuer thread to do more work Tejun Heo
2014-12-08 17:40 ` Tejun Heo
2014-12-08 22:47 ` NeilBrown
2014-12-05 2:09 ` [PATCH workqueue/for-3.18-fixes 1/2] workqueue: invert the order between pool->lock and wq_mayday_lock Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141118171104.631f4c1f@notabene.brown \
--to=neilb@suse.de \
--cc=dongsu.park@profitbricks.com \
--cc=jack@suse.cz \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.