From: Kent Overstreet <kmo@daterainc.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Sam Fulcomer <enotty@brown.edu>,
Francis Moreau <francis.moro@gmail.com>,
Daniel J Blueman <daniel@quora.org>,
linux-bcache@vger.kernel.org
Subject: Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
Date: Thu, 10 Apr 2014 17:46:49 -0700 [thread overview]
Message-ID: <20140411004649.GA4817@kmo> (raw)
In-Reply-To: <20140411002636.GB9176@birch.djwong.org>
On Thu, Apr 10, 2014 at 05:26:36PM -0700, Darrick J. Wong wrote:
> Hi all,
>
> The attached patch fixes both the "writeback blocked for XXX seconds"
> complaints from the kernel and the oddly high load averages on idle systems
> problems for me. Can you give it a try to see if it fixes your problem too?
>
> --D
> ---
> Currently, the writeback thread performs uninterruptible sleep while
> it waits for enough dirty data to accumulate to start writeback.
> Unfortunately, uninterruptible sleep counts towards load average,
> which artificially inflates it. Since the wb thread is a kernel
> thread and kthreads don't receive signals, we can use the
> interruptible sleep call, which eliminates the high load average
> symptom.
>
> A second symptom is that if we mount a non-writeback cache, the
> writeback thread will be woken up. If the cache later accumulates
> dirty data and writeback_running=1 (this seems to be a default) then
> the writeback thread will enter uninterruptible sleep waiting for
> dirty data. This is unnecessary and (I think) results in the
> "bcache_writebac:155 blocked for more than XXX seconds" complaints
> that people have been talking about. The fix for this is simple -- if
> we're not in writeback mode, just go to (interruptible) sleep for a
> long time. Alternately, we could use wait_event until the cache mode
> changes.
>
> Finally, change bch_cached_dev_attach() to always wake up the
> writeback thread, because the newly created wb thread remains in
> uninterruptible sleep state until something explicitly wakes it up.
> This wakeup allows the thread to call bch_writeback_thread(),
> whereupon it will most likely end up in interruptible sleep. In
> theory we could just let the first write take care of this, but
> there's really no reason not to do the transition quickly.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> drivers/md/bcache/super.c | 2 +-
> drivers/md/bcache/writeback.c | 16 ++++++++++++++--
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 24a3a15..3ffe970 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
> bch_sectors_dirty_init(dc);
> atomic_set(&dc->has_dirty, 1);
> atomic_inc(&dc->count);
> - bch_writeback_queue(dc);
> }
> + bch_writeback_queue(dc);
>
> bch_cached_dev_run(dc);
> bcache_device_link(&dc->disk, c, "bdev");
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index f4300e4..f49e6b1 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -239,7 +239,7 @@ static void read_dirty(struct cached_dev *dc)
> if (KEY_START(&w->key) != dc->last_read ||
> jiffies_to_msecs(delay) > 50)
> while (!kthread_should_stop() && delay)
> - delay = schedule_timeout_uninterruptible(delay);
> + delay = schedule_timeout_interruptible(delay);
>
> dc->last_read = KEY_OFFSET(&w->key);
>
> @@ -401,6 +401,18 @@ static int bch_writeback_thread(void *arg)
>
> while (!kthread_should_stop()) {
> down_write(&dc->writeback_lock);
> + if (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK) {
> + up_write(&dc->writeback_lock);
> + set_current_state(TASK_INTERRUPTIBLE);
> +
> + if (kthread_should_stop())
> + return 0;
> +
> + try_to_freeze();
> + schedule_timeout_interruptible(10 * HZ);
> + continue;
> + }
> +
So this addition isn't correct - cache mode might've been flipped to
writethrough, but we might still have dirty data we need to flush: that's why
the line below is checking dc->has_dirty.
I don't think your schedule_timeout_interruptible() is correct either, and I'm
not seeing what's wrong with the code right below - where it's doing the
set_current_state() then the schedule() - but from your report of high idle load
average (what about cpu?) it sounds like something is wrong.
Can you try and diagnose further? Also if you want to split the rest of the
changes out into a separate patch I'll definitely take that. Thanks!
> if (!atomic_read(&dc->has_dirty) ||
> (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
> !dc->writeback_running)) {
> @@ -436,7 +448,7 @@ static int bch_writeback_thread(void *arg)
> while (delay &&
> !kthread_should_stop() &&
> !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
> - delay = schedule_timeout_uninterruptible(delay);
> + delay = schedule_timeout_interruptible(delay);
> }
> }
>
next prev parent reply other threads:[~2014-04-11 0:46 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-11 0:26 [PATCH] bcache: fix writeback thread to sleep less intrusively Darrick J. Wong
2014-04-11 0:46 ` Kent Overstreet [this message]
2014-04-11 1:41 ` Darrick J. Wong
2014-04-12 2:31 ` Darrick J. Wong
2014-04-12 2:33 ` [PATCH] bcache: let the writeback thread run at least once at startup Darrick J. Wong
2014-04-30 11:51 ` [PATCH] bcache: fix writeback thread to sleep less intrusively Daniel Smedegaard Buus
2014-04-30 17:24 ` Darrick J. Wong
2014-05-01 9:38 ` Daniel Smedegaard Buus
2014-05-01 21:54 ` Slava Pestov
2014-05-20 7:07 ` Daniel J Blueman
[not found] ` <CACHGV4+mNu_KV7JazT-34D++3S2NKhDkOmc_wo0QfrfdqpccoQ@mail.gmail.com>
2014-07-09 10:27 ` Daniel Smedegaard Buus
2014-07-09 15:53 ` Peter Kieser
2014-07-09 18:02 ` Daniel Smedegaard Buus
2014-07-09 18:19 ` Slava Pestov
2014-07-09 19:18 ` Daniel Smedegaard Buus
2014-04-11 7:26 ` Francis Moreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140411004649.GA4817@kmo \
--to=kmo@daterainc.com \
--cc=daniel@quora.org \
--cc=darrick.wong@oracle.com \
--cc=enotty@brown.edu \
--cc=francis.moro@gmail.com \
--cc=linux-bcache@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).