linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jens Axboe <axboe@fb.com>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Mason <clm@fb.com>, Dave Chinner <david@fromorbit.com>,
	Jan Kara <jack@suse.cz>, Josef Bacik <jbacik@fb.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Neil Brown <neilb@suse.de>, Christoph Hellwig <hch@lst.de>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
Date: Fri, 18 Sep 2015 08:32:15 -0700	[thread overview]
Message-ID: <CA+55aFznZFMHzGZr1ybY9wu_FuRJ-ZiJipuHM3brvHO7GybUpQ@mail.gmail.com> (raw)
In-Reply-To: <55FC1E72.3040500@fb.com>

On Fri, Sep 18, 2015 at 7:23 AM, Jens Axboe <axboe@fb.com> wrote:
>
> It makes no sense for preemption schedule to NOT unplug, the fact that it
> doesn't is news to me as well. It was never the intent of the
> unplug-on-schedule to NOT unplug for certain schedule out events, that seems
> like very odd behavior.

Actually, even a *full* schedule doesn't unplug, unless the process is
going to sleep. See sched_submit_work(), which  will only call the
unplugging if the process is actually going to sleep (ok, so it's a
bit subtle if you don't know the state rules, but it's the
"!tsk->state" check there)

So preemption and cond_resched() isn't _that_ odd. We've basically
treated a non-sleeping schedule as a no-op for the task work.

The thinking was probably that it might be better to delay starting
the IO in case we get scheduled back quickly, and we're obviously not
actually _sleeping_, so it's likely not too bad.

Now, that's probably bogus, and I think that we should perhaps just
make the rule be that "if we actually switch to another task, we run
blk_schedule_flush_plug()".

But it should be noted that that really *does* introduce a lot of new
potential races. Traditionally, our block layer plugging has been
entirely thread-synchronous, and would never happen asynchronously.
But with preemption, that "switch to another thread" really *does*
happen asynchronously.

So making things always happen on task switch is actually fairly
dangerous, and potentially adds the need for much more synchronization
for the IO submission.

What we possibly *could* make the scheduler rule be:

 - if it's not an actual PREEMPT_ACTIVE (ie in a random place)

 - _and_ we actually switch to another thread

 - _then_ do the whole blk_schedule_flush_plug(tsk) thing.

adding some scheduler people to the explicit cc list.

That said, the "cond_resched[_lock]()" functions currently always set
PREEMPT_ACTIVE (indirectly - they use preempt_schedule_common()), so
even though those are synchronous, right now they *look* asynchronous
to the scheduler, so we'd still have to sort that out.

Ingo/Peter/Frederic? Comments?

                        Linus

  reply	other threads:[~2015-09-18 15:32 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-11 19:37 [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug() Chris Mason
2015-09-11 20:02 ` Linus Torvalds
2015-09-11 20:37   ` Linus Torvalds
2015-09-11 20:40     ` Josef Bacik
2015-09-11 21:04       ` Linus Torvalds
2015-09-11 22:06         ` Linus Torvalds
2015-09-11 23:16           ` Chris Mason
2015-09-11 23:36             ` Linus Torvalds
2015-09-12  0:52               ` Linus Torvalds
2015-09-12  2:15                 ` Chris Mason
2015-09-12  2:27                   ` Linus Torvalds
2015-09-12 23:00               ` Chris Mason
2015-09-12 23:29                 ` Linus Torvalds
2015-09-12 23:46                   ` Chris Mason
2015-09-13 13:12                     ` Chris Mason
2015-09-13 22:56                       ` Dave Chinner
2015-09-13 23:12                 ` Dave Chinner
2015-09-14 20:06                   ` Linus Torvalds
2015-09-16 15:16                     ` Chris Mason
2015-09-16 19:58                       ` Jan Kara
2015-09-16 20:00                         ` Chris Mason
2015-09-16 22:07                           ` Dave Chinner
2015-09-17  0:37                             ` Dave Chinner
2015-09-17  1:12                               ` Linus Torvalds
2015-09-17  2:14                                 ` Dave Chinner
2015-09-17 19:39                                   ` Linus Torvalds
2015-09-17 22:42                                     ` Chris Mason
2015-09-17 23:08                                       ` Linus Torvalds
2015-09-17 23:56                                         ` Chris Mason
2015-09-18  0:37                                           ` Dave Chinner
2015-09-18  1:50                                             ` Linus Torvalds
2015-09-18  5:40                                               ` Dave Chinner
2015-09-18  6:04                                                 ` Linus Torvalds
2015-09-18  6:06                                                   ` Linus Torvalds
2015-09-18 14:21                                                     ` Jens Axboe
2015-09-18 13:16                                                   ` Chris Mason
2015-09-18 14:23                                                     ` Jens Axboe
2015-09-18 15:32                                                       ` Linus Torvalds [this message]
2015-09-18 15:59                                                         ` Peter Zijlstra
2015-09-18 16:02                                                           ` Peter Zijlstra
2015-09-18 16:12                                                           ` Linus Torvalds
2015-09-28 14:47                                                             ` Peter Zijlstra
2015-09-28 16:08                                                               ` Linus Torvalds
2015-09-29  7:55                                                                 ` Ingo Molnar
2015-09-18 22:17                                                   ` Dave Chinner
2015-09-21  9:24                                                     ` Jan Kara
2015-09-21 20:21                                                       ` Andrew Morton
2015-09-17 23:03                                   ` Dave Chinner
2015-09-17 23:13                                     ` Linus Torvalds
2015-09-17  3:48                               ` Chris Mason
2015-09-17  4:30                                 ` Dave Chinner
2015-09-17 12:13                                   ` Chris Mason
2015-09-11 23:06         ` Chris Mason
2015-09-11 23:13           ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2015-09-09 15:23 Chris Mason
2015-09-11 18:49 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFznZFMHzGZr1ybY9wu_FuRJ-ZiJipuHM3brvHO7GybUpQ@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=axboe@fb.com \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=fweisbec@gmail.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jbacik@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=neilb@suse.de \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).