From: NeilBrown <neilb@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Battersby <tonyb@cybernetics.com>,
linux-raid@vger.kernel.org, lkml <linux-kernel@vger.kernel.org>,
axboe@kernel.dk, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: RAID1 might_sleep() warning on 3.19-rc7
Date: Fri, 13 Feb 2015 16:26:00 +1100 [thread overview]
Message-ID: <20150213162600.059fffb2@notabene.brown> (raw)
In-Reply-To: <20150210092936.GW21418@twins.programming.kicks-ass.net>
[-- Attachment #1: Type: text/plain, Size: 4838 bytes --]
On Tue, 10 Feb 2015 10:29:36 +0100 Peter Zijlstra <peterz@infradead.org>
wrote:
> On Tue, Feb 10, 2015 at 01:50:17PM +1100, NeilBrown wrote:
> > On Mon, 9 Feb 2015 10:10:00 +0100 Peter Zijlstra <peterz@infradead.org> wrote:
> > > > However, when io_schedule() explicitly calls blk_flush_plug(), then
> > > > @from_schedule=false variant is used, and the unplug functions are allowed to
> > > > allocate memory and block and maybe even call mempool_alloc() which might
> > > > call io_schedule().
> > > >
> > > > This shouldn't be a problem as blk_flush_plug() spliced out the plug list, so
> > > > any recursive call will find an empty list and do nothing.
> > >
> > > Unless, something along the way stuck something back on, right? So
> > > should we stick an:
> > >
> > > WARN_ON(current->in_iowait);
> > >
> > > somewhere near where things are added to this plug list? (and move the
> > > blk_flush_plug() call inside of where that's actually true of course).
> >
> > No, I don't think so.
> >
> > It is certainly possible that some request on plug->cb_list could add
> > something to plug->list - which is processed after ->cb_list.
> >
> > I think the best way to think about this is that the *problem* was that a
> > wait_event loop could spin without making any progress. So any time that
> > clear forward progress is made it is safe sleep without necessitating the
> > warning. Hence sched_annotate_sleep() is reasonable.
> > blk_flush_plug() with definitely have dispatched some requests if it
> > might_sleep(), so the sleep is OK.
>
> Well, yes, but you forget that this gets us back into recursion land.
> io_schedule() calling io_schedule() calling io_schedule() and *boom*
> stack overflow -> dead machine.
>
> We must either guarantee io_schedule() will never call io_schedule() or
> that io_schedule() itself will not add new work to the current plug such
> that calling io_schedule() itself will not recurse on the blk stuff.
>
> Pick either option, but pick one.
I choose ... Buzz Lightyear !!!
Sorry, go carried away there. Uhhmm. I think I pick a/ (But I expect I'll
find a goat... ho hum).
Does this look credible?
Thanks,
NeilBrown
From: NeilBrown <neilb@suse.de>
Date: Fri, 13 Feb 2015 15:49:17 +1100
Subject: [PATCH] sched: prevent recursion in io_schedule()
io_schedule() calls blk_flush_plug() which, depending on the
contents of current->plug, can initiate arbitrary blk-io requests.
Note that this contrasts with blk_schedule_flush_plug() which requires
all non-trivial work to be handed off to a separate thread.
This makes it possible for io_schedule() to recurse, and initiating
block requests could possibly call mempool_alloc() which, in times of
memory pressure, uses io_schedule().
Apart from any stack usage issues, io_schedule() will not behave
correctly when called recursively as delayacct_blkio_start() does
not allow for repeated calls.
So:
- use in_iowait to detect recursion. Set it earlier, and restore
it to the old value.
- move the call to "raw_rq" after the call to blk_flush_plug().
As this is some sort of per-cpu thing, we want some chance that
we are on the right CPU
- When io_schedule() is called recurively, use blk_schedule_flush_plug()
which cannot further recurse.
- as this makes io_schedule() a lot more complex and as io_schedule()
must match io_schedule_timeout(), but all the changes in io_schedule_timeout()
and make io_schedule a simple wrapper for that.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f37fe7f77a4..90f3de8bc7ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4420,30 +4420,27 @@ EXPORT_SYMBOL_GPL(yield_to);
*/
void __sched io_schedule(void)
{
- struct rq *rq = raw_rq();
-
- delayacct_blkio_start();
- atomic_inc(&rq->nr_iowait);
- blk_flush_plug(current);
- current->in_iowait = 1;
- schedule();
- current->in_iowait = 0;
- atomic_dec(&rq->nr_iowait);
- delayacct_blkio_end();
+ io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
}
EXPORT_SYMBOL(io_schedule);
long __sched io_schedule_timeout(long timeout)
{
- struct rq *rq = raw_rq();
+ struct rq *rq;
long ret;
+ int old_iowait = current->in_iowait;
+
+ current->in_iowait = 1;
+ if (old_iowait)
+ blk_schedule_flush_plug(current);
+ else
+ blk_flush_plug(current);
delayacct_blkio_start();
+ rq = raw_rq();
atomic_inc(&rq->nr_iowait);
- blk_flush_plug(current);
- current->in_iowait = 1;
ret = schedule_timeout(timeout);
- current->in_iowait = 0;
+ current->in_iowait = old_iowait;
atomic_dec(&rq->nr_iowait);
delayacct_blkio_end();
return ret;
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
next prev parent reply other threads:[~2015-02-13 5:26 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-05 20:27 RAID1 might_sleep() warning on 3.19-rc7 Tony Battersby
2015-02-05 21:51 ` NeilBrown
2015-02-06 11:39 ` Peter Zijlstra
2015-02-09 1:13 ` NeilBrown
2015-02-09 9:10 ` Peter Zijlstra
2015-02-10 2:50 ` NeilBrown
2015-02-10 9:29 ` Peter Zijlstra
2015-02-10 11:01 ` Peter Zijlstra
2015-02-13 5:26 ` NeilBrown [this message]
2015-02-13 8:32 ` Peter Zijlstra
2015-02-13 8:49 ` NeilBrown
2015-02-13 10:27 ` Peter Zijlstra
2015-02-13 14:48 ` Peter Zijlstra
2015-02-18 1:09 ` NeilBrown
2015-02-18 13:47 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150213162600.059fffb2@notabene.brown \
--to=neilb@suse.de \
--cc=axboe@kernel.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tonyb@cybernetics.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).