[PATCH] bcache: fix writeback thread to sleep less intrusively

linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] bcache: fix writeback thread to sleep less intrusively
@ 2014-04-11  0:26 Darrick J. Wong
  2014-04-11  0:46 ` Kent Overstreet
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2014-04-11  0:26 UTC (permalink / raw)
  To: Kent Overstreet, Sam Fulcomer, Francis Moreau, Daniel J Blueman
  Cc: linux-bcache

Hi all,

The attached patch fixes both the "writeback blocked for XXX seconds"
complaints from the kernel and the oddly high load averages on idle systems
problems for me.  Can you give it a try to see if it fixes your problem too?

--D
---
Currently, the writeback thread performs uninterruptible sleep while
it waits for enough dirty data to accumulate to start writeback.
Unfortunately, uninterruptible sleep counts towards load average,
which artificially inflates it.  Since the wb thread is a kernel
thread and kthreads don't receive signals, we can use the
interruptible sleep call, which eliminates the high load average
symptom.

A second symptom is that if we mount a non-writeback cache, the
writeback thread will be woken up.  If the cache later accumulates
dirty data and writeback_running=1 (this seems to be a default) then
the writeback thread will enter uninterruptible sleep waiting for
dirty data.  This is unnecessary and (I think) results in the
"bcache_writebac:155 blocked for more than XXX seconds" complaints
that people have been talking about.  The fix for this is simple -- if
we're not in writeback mode, just go to (interruptible) sleep for a
long time.  Alternately, we could use wait_event until the cache mode
changes.

Finally, change bch_cached_dev_attach() to always wake up the
writeback thread, because the newly created wb thread remains in
uninterruptible sleep state until something explicitly wakes it up.
This wakeup allows the thread to call bch_writeback_thread(),
whereupon it will most likely end up in interruptible sleep.  In
theory we could just let the first write take care of this, but
there's really no reason not to do the transition quickly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 drivers/md/bcache/super.c     |    2 +-
 drivers/md/bcache/writeback.c |   16 ++++++++++++++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 24a3a15..3ffe970 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
 		bch_sectors_dirty_init(dc);
 		atomic_set(&dc->has_dirty, 1);
 		atomic_inc(&dc->count);
-		bch_writeback_queue(dc);
 	}
+	bch_writeback_queue(dc);

 	bch_cached_dev_run(dc);
 	bcache_device_link(&dc->disk, c, "bdev");
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index f4300e4..f49e6b1 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -239,7 +239,7 @@ static void read_dirty(struct cached_dev *dc)
 		if (KEY_START(&w->key) != dc->last_read ||
 		    jiffies_to_msecs(delay) > 50)
 			while (!kthread_should_stop() && delay)
-				delay = schedule_timeout_uninterruptible(delay);
+				delay = schedule_timeout_interruptible(delay);

 		dc->last_read	= KEY_OFFSET(&w->key);

@@ -401,6 +401,18 @@ static int bch_writeback_thread(void *arg)

 	while (!kthread_should_stop()) {
 		down_write(&dc->writeback_lock);
+		if (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK) {
+			up_write(&dc->writeback_lock);
+			set_current_state(TASK_INTERRUPTIBLE);
+
+			if (kthread_should_stop())
+				return 0;
+
+			try_to_freeze();
+			schedule_timeout_interruptible(10 * HZ);
+			continue;
+		}
+
 		if (!atomic_read(&dc->has_dirty) ||
 		    (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
 		     !dc->writeback_running)) {
@@ -436,7 +448,7 @@ static int bch_writeback_thread(void *arg)
 			while (delay &&
 			       !kthread_should_stop() &&
 			       !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
-				delay = schedule_timeout_uninterruptible(delay);
+				delay = schedule_timeout_interruptible(delay);
 		}
 	}

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-11  0:26 [PATCH] bcache: fix writeback thread to sleep less intrusively Darrick J. Wong
@ 2014-04-11  0:46 ` Kent Overstreet
  2014-04-11  1:41   ` Darrick J. Wong
  2014-04-11  7:26   ` Francis Moreau
  0 siblings, 2 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-04-11  0:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Sam Fulcomer, Francis Moreau, Daniel J Blueman, linux-bcache

On Thu, Apr 10, 2014 at 05:26:36PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> The attached patch fixes both the "writeback blocked for XXX seconds"
> complaints from the kernel and the oddly high load averages on idle systems
> problems for me.  Can you give it a try to see if it fixes your problem too?
> 
> --D
> ---
> Currently, the writeback thread performs uninterruptible sleep while
> it waits for enough dirty data to accumulate to start writeback.
> Unfortunately, uninterruptible sleep counts towards load average,
> which artificially inflates it.  Since the wb thread is a kernel
> thread and kthreads don't receive signals, we can use the
> interruptible sleep call, which eliminates the high load average
> symptom.
> 
> A second symptom is that if we mount a non-writeback cache, the
> writeback thread will be woken up.  If the cache later accumulates
> dirty data and writeback_running=1 (this seems to be a default) then
> the writeback thread will enter uninterruptible sleep waiting for
> dirty data.  This is unnecessary and (I think) results in the
> "bcache_writebac:155 blocked for more than XXX seconds" complaints
> that people have been talking about.  The fix for this is simple -- if
> we're not in writeback mode, just go to (interruptible) sleep for a
> long time.  Alternately, we could use wait_event until the cache mode
> changes.
> 
> Finally, change bch_cached_dev_attach() to always wake up the
> writeback thread, because the newly created wb thread remains in
> uninterruptible sleep state until something explicitly wakes it up.
> This wakeup allows the thread to call bch_writeback_thread(),
> whereupon it will most likely end up in interruptible sleep.  In
> theory we could just let the first write take care of this, but
> there's really no reason not to do the transition quickly.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  drivers/md/bcache/super.c     |    2 +-
>  drivers/md/bcache/writeback.c |   16 ++++++++++++++--
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 24a3a15..3ffe970 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
>  		bch_sectors_dirty_init(dc);
>  		atomic_set(&dc->has_dirty, 1);
>  		atomic_inc(&dc->count);
> -		bch_writeback_queue(dc);
>  	}
> +	bch_writeback_queue(dc);
>  
>  	bch_cached_dev_run(dc);
>  	bcache_device_link(&dc->disk, c, "bdev");
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index f4300e4..f49e6b1 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -239,7 +239,7 @@ static void read_dirty(struct cached_dev *dc)
>  		if (KEY_START(&w->key) != dc->last_read ||
>  		    jiffies_to_msecs(delay) > 50)
>  			while (!kthread_should_stop() && delay)
> -				delay = schedule_timeout_uninterruptible(delay);
> +				delay = schedule_timeout_interruptible(delay);
>  
>  		dc->last_read	= KEY_OFFSET(&w->key);
>  
> @@ -401,6 +401,18 @@ static int bch_writeback_thread(void *arg)
>  
>  	while (!kthread_should_stop()) {
>  		down_write(&dc->writeback_lock);
> +		if (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK) {
> +			up_write(&dc->writeback_lock);
> +			set_current_state(TASK_INTERRUPTIBLE);
> +
> +			if (kthread_should_stop())
> +				return 0;
> +
> +			try_to_freeze();
> +			schedule_timeout_interruptible(10 * HZ);
> +			continue;
> +		}
> +

So this addition isn't correct - cache mode might've been flipped to
writethrough, but we might still have dirty data we need to flush: that's why
the line below is checking dc->has_dirty.

I don't think your schedule_timeout_interruptible() is correct either, and I'm
not seeing what's wrong with the code right below - where it's doing the
set_current_state() then the schedule() - but from your report of high idle load
average (what about cpu?) it sounds like something is wrong.

Can you try and diagnose further? Also if you want to split the rest of the
changes out into a separate patch I'll definitely take that. Thanks!


>  		if (!atomic_read(&dc->has_dirty) ||
>  		    (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
>  		     !dc->writeback_running)) {
> @@ -436,7 +448,7 @@ static int bch_writeback_thread(void *arg)
>  			while (delay &&
>  			       !kthread_should_stop() &&
>  			       !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
> -				delay = schedule_timeout_uninterruptible(delay);
> +				delay = schedule_timeout_interruptible(delay);
>  		}
>  	}
>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-11  0:46 ` Kent Overstreet
@ 2014-04-11  1:41   ` Darrick J. Wong
  2014-04-12  2:31     ` Darrick J. Wong
  2014-04-11  7:26   ` Francis Moreau
  1 sibling, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2014-04-11  1:41 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Sam Fulcomer, Francis Moreau, Daniel J Blueman, linux-bcache

On Thu, Apr 10, 2014 at 05:46:49PM -0700, Kent Overstreet wrote:
> On Thu, Apr 10, 2014 at 05:26:36PM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > The attached patch fixes both the "writeback blocked for XXX seconds"
> > complaints from the kernel and the oddly high load averages on idle systems
> > problems for me.  Can you give it a try to see if it fixes your problem too?
> > 
> > --D
> > ---
> > Currently, the writeback thread performs uninterruptible sleep while
> > it waits for enough dirty data to accumulate to start writeback.
> > Unfortunately, uninterruptible sleep counts towards load average,
> > which artificially inflates it.  Since the wb thread is a kernel
> > thread and kthreads don't receive signals, we can use the
> > interruptible sleep call, which eliminates the high load average
> > symptom.
> > 
> > A second symptom is that if we mount a non-writeback cache, the
> > writeback thread will be woken up.  If the cache later accumulates
> > dirty data and writeback_running=1 (this seems to be a default) then
> > the writeback thread will enter uninterruptible sleep waiting for
> > dirty data.  This is unnecessary and (I think) results in the
> > "bcache_writebac:155 blocked for more than XXX seconds" complaints
> > that people have been talking about.  The fix for this is simple -- if
> > we're not in writeback mode, just go to (interruptible) sleep for a
> > long time.  Alternately, we could use wait_event until the cache mode
> > changes.
> > 
> > Finally, change bch_cached_dev_attach() to always wake up the
> > writeback thread, because the newly created wb thread remains in
> > uninterruptible sleep state until something explicitly wakes it up.
> > This wakeup allows the thread to call bch_writeback_thread(),
> > whereupon it will most likely end up in interruptible sleep.  In
> > theory we could just let the first write take care of this, but
> > there's really no reason not to do the transition quickly.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  drivers/md/bcache/super.c     |    2 +-
> >  drivers/md/bcache/writeback.c |   16 ++++++++++++++--
> >  2 files changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index 24a3a15..3ffe970 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
> >  		bch_sectors_dirty_init(dc);
> >  		atomic_set(&dc->has_dirty, 1);
> >  		atomic_inc(&dc->count);
> > -		bch_writeback_queue(dc);
> >  	}
> > +	bch_writeback_queue(dc);
> >  
> >  	bch_cached_dev_run(dc);
> >  	bcache_device_link(&dc->disk, c, "bdev");
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index f4300e4..f49e6b1 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -239,7 +239,7 @@ static void read_dirty(struct cached_dev *dc)
> >  		if (KEY_START(&w->key) != dc->last_read ||
> >  		    jiffies_to_msecs(delay) > 50)
> >  			while (!kthread_should_stop() && delay)
> > -				delay = schedule_timeout_uninterruptible(delay);
> > +				delay = schedule_timeout_interruptible(delay);
> >  
> >  		dc->last_read	= KEY_OFFSET(&w->key);
> >  
> > @@ -401,6 +401,18 @@ static int bch_writeback_thread(void *arg)
> >  
> >  	while (!kthread_should_stop()) {
> >  		down_write(&dc->writeback_lock);
> > +		if (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK) {
> > +			up_write(&dc->writeback_lock);
> > +			set_current_state(TASK_INTERRUPTIBLE);
> > +
> > +			if (kthread_should_stop())
> > +				return 0;
> > +
> > +			try_to_freeze();
> > +			schedule_timeout_interruptible(10 * HZ);
> > +			continue;
> > +		}
> > +
> 
> So this addition isn't correct - cache mode might've been flipped to
> writethrough, but we might still have dirty data we need to flush: that's why
> the line below is checking dc->has_dirty.

Good point.

> I don't think your schedule_timeout_interruptible() is correct either, and I'm
> not seeing what's wrong with the code right below - where it's doing the
> set_current_state() then the schedule() - but from your report of high idle load
> average (what about cpu?) it sounds like something is wrong.

The load average will settle at around 2.0 or so; powertop reports idleness of
90% or more for all cores, and the disks aren't doing anything either.

On a freshly booted VM, it calls schedule and goes to sleep for quite a while.
As soon as I start running some fs write tests, I see that has_dirty becomes 1.
Once in a while, searched_full_index==1, but RB_EMPTY_ROOT(&dc->writeback_keys.keys)
never hits 0, so has_dirty stays 1.  When I kill the tests and go back to idle,
we end up looping in read_dirty for a long time, I guess because ... aha!  It's
slowly trickling the dirty data out to the backing device, and cranking up
writeback_rate makes it finish (and go back to that schedule() sleep) faster.

Hmm.  I'm wondering about the choice of _interruptible vs.
_uninterruptible -- what are you trying to prevent from happening?

<shrug> will diagnose further (it's dinnertime).

> Can you try and diagnose further? Also if you want to split the rest of the
> changes out into a separate patch I'll definitely take that. Thanks!

Do you mean the first chunk, which moves the bch_writeback_queue() in super.c?

--D
> 
> 
> >  		if (!atomic_read(&dc->has_dirty) ||
> >  		    (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
> >  		     !dc->writeback_running)) {
> > @@ -436,7 +448,7 @@ static int bch_writeback_thread(void *arg)
> >  			while (delay &&
> >  			       !kthread_should_stop() &&
> >  			       !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
> > -				delay = schedule_timeout_uninterruptible(delay);
> > +				delay = schedule_timeout_interruptible(delay);
> >  		}
> >  	}
> >  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-11  1:41   ` Darrick J. Wong
@ 2014-04-12  2:31     ` Darrick J. Wong
  2014-04-12  2:33       ` [PATCH] bcache: let the writeback thread run at least once at startup Darrick J. Wong
  2014-04-30 11:51       ` [PATCH] bcache: fix writeback thread to sleep less intrusively Daniel Smedegaard Buus
  0 siblings, 2 replies; 16+ messages in thread
From: Darrick J. Wong @ 2014-04-12  2:31 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Sam Fulcomer, Francis Moreau, Daniel J Blueman, linux-bcache

On Thu, Apr 10, 2014 at 06:41:35PM -0700, Darrick J. Wong wrote:
> On Thu, Apr 10, 2014 at 05:46:49PM -0700, Kent Overstreet wrote:
> > On Thu, Apr 10, 2014 at 05:26:36PM -0700, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > > The attached patch fixes both the "writeback blocked for XXX seconds"
> > > complaints from the kernel and the oddly high load averages on idle systems
> > > problems for me.  Can you give it a try to see if it fixes your problem too?
> > > 
> > > --D
> > > ---
> > > Currently, the writeback thread performs uninterruptible sleep while
> > > it waits for enough dirty data to accumulate to start writeback.
> > > Unfortunately, uninterruptible sleep counts towards load average,
> > > which artificially inflates it.  Since the wb thread is a kernel
> > > thread and kthreads don't receive signals, we can use the
> > > interruptible sleep call, which eliminates the high load average
> > > symptom.
> > > 
> > > A second symptom is that if we mount a non-writeback cache, the
> > > writeback thread will be woken up.  If the cache later accumulates
> > > dirty data and writeback_running=1 (this seems to be a default) then
> > > the writeback thread will enter uninterruptible sleep waiting for
> > > dirty data.  This is unnecessary and (I think) results in the
> > > "bcache_writebac:155 blocked for more than XXX seconds" complaints
> > > that people have been talking about.  The fix for this is simple -- if
> > > we're not in writeback mode, just go to (interruptible) sleep for a
> > > long time.  Alternately, we could use wait_event until the cache mode
> > > changes.
> > > 
> > > Finally, change bch_cached_dev_attach() to always wake up the
> > > writeback thread, because the newly created wb thread remains in
> > > uninterruptible sleep state until something explicitly wakes it up.
> > > This wakeup allows the thread to call bch_writeback_thread(),
> > > whereupon it will most likely end up in interruptible sleep.  In
> > > theory we could just let the first write take care of this, but
> > > there's really no reason not to do the transition quickly.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  drivers/md/bcache/super.c     |    2 +-
> > >  drivers/md/bcache/writeback.c |   16 ++++++++++++++--
> > >  2 files changed, 15 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > > index 24a3a15..3ffe970 100644
> > > --- a/drivers/md/bcache/super.c
> > > +++ b/drivers/md/bcache/super.c
> > > @@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
> > >  		bch_sectors_dirty_init(dc);
> > >  		atomic_set(&dc->has_dirty, 1);
> > >  		atomic_inc(&dc->count);
> > > -		bch_writeback_queue(dc);
> > >  	}
> > > +	bch_writeback_queue(dc);
> > >  
> > >  	bch_cached_dev_run(dc);
> > >  	bcache_device_link(&dc->disk, c, "bdev");
> > > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > > index f4300e4..f49e6b1 100644
> > > --- a/drivers/md/bcache/writeback.c
> > > +++ b/drivers/md/bcache/writeback.c
> > > @@ -239,7 +239,7 @@ static void read_dirty(struct cached_dev *dc)
> > >  		if (KEY_START(&w->key) != dc->last_read ||
> > >  		    jiffies_to_msecs(delay) > 50)
> > >  			while (!kthread_should_stop() && delay)
> > > -				delay = schedule_timeout_uninterruptible(delay);
> > > +				delay = schedule_timeout_interruptible(delay);
> > >  
> > >  		dc->last_read	= KEY_OFFSET(&w->key);
> > >  
> > > @@ -401,6 +401,18 @@ static int bch_writeback_thread(void *arg)
> > >  
> > >  	while (!kthread_should_stop()) {
> > >  		down_write(&dc->writeback_lock);
> > > +		if (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK) {
> > > +			up_write(&dc->writeback_lock);
> > > +			set_current_state(TASK_INTERRUPTIBLE);
> > > +
> > > +			if (kthread_should_stop())
> > > +				return 0;
> > > +
> > > +			try_to_freeze();
> > > +			schedule_timeout_interruptible(10 * HZ);
> > > +			continue;
> > > +		}
> > > +
> > 
> > So this addition isn't correct - cache mode might've been flipped to
> > writethrough, but we might still have dirty data we need to flush: that's why
> > the line below is checking dc->has_dirty.
> 
> Good point.
> 
> > I don't think your schedule_timeout_interruptible() is correct either, and I'm
> > not seeing what's wrong with the code right below - where it's doing the
> > set_current_state() then the schedule() - but from your report of high idle load
> > average (what about cpu?) it sounds like something is wrong.
> 
> The load average will settle at around 2.0 or so; powertop reports idleness of
> 90% or more for all cores, and the disks aren't doing anything either.
> 
> On a freshly booted VM, it calls schedule and goes to sleep for quite a while.
> As soon as I start running some fs write tests, I see that has_dirty becomes 1.
> Once in a while, searched_full_index==1, but RB_EMPTY_ROOT(&dc->writeback_keys.keys)
> never hits 0, so has_dirty stays 1.  When I kill the tests and go back to idle,
> we end up looping in read_dirty for a long time, I guess because ... aha!  It's
> slowly trickling the dirty data out to the backing device, and cranking up
> writeback_rate makes it finish (and go back to that schedule() sleep) faster.

Ok, I have a little more to share about this -- the PD controller in charge of
WB tries to maintain (by default) 10% of the cache as dirty.  On my relatively
idle laptop, it is frequently the case that < 10% of the cache is dirty.  When
this happens, the writeback_rate falls to 512b/s.  Meanwhile, read_dirty writes
out the dirty data at this relatively slow rate, apparently using
schedule_timeout_uninterruptible to throttle the writeout.

There's enough activity on my laptop to ensure that there's always _some_ dirty
data, hence the writeback thread is constantly in uninterruptible sleep.  I
could probably set writeback_percent = 0 to mitigate this since in my case I
probably want as little dirty data as possible, but ... ugh.

So that brings me back to this question:

> Hmm.  I'm wondering about the choice of _interruptible vs.
> _uninterruptible -- what are you trying to prevent from happening?

I changed it to _interruptible, and so far I haven't seen any problems running
xfstests.

> > Can you try and diagnose further? Also if you want to split the rest of the
> > changes out into a separate patch I'll definitely take that. Thanks!
> 
> Do you mean the first chunk, which moves the bch_writeback_queue() in super.c?

Assuming you do, I'll issue a patch shortly.

--D
> 
> --D
> > 
> > 
> > >  		if (!atomic_read(&dc->has_dirty) ||
> > >  		    (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
> > >  		     !dc->writeback_running)) {
> > > @@ -436,7 +448,7 @@ static int bch_writeback_thread(void *arg)
> > >  			while (delay &&
> > >  			       !kthread_should_stop() &&
> > >  			       !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
> > > -				delay = schedule_timeout_uninterruptible(delay);
> > > +				delay = schedule_timeout_interruptible(delay);
> > >  		}
> > >  	}
> > >  
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] bcache: let the writeback thread run at least once at startup
  2014-04-12  2:31     ` Darrick J. Wong
@ 2014-04-12  2:33       ` Darrick J. Wong
  2014-04-30 11:51       ` [PATCH] bcache: fix writeback thread to sleep less intrusively Daniel Smedegaard Buus
  1 sibling, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2014-04-12  2:33 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

Change bch_cached_dev_attach() to always wake up the writeback thread,
because the newly created wb thread remains in uninterruptible sleep
state until something explicitly wakes it up.  This wakeup allows the
thread to call bch_writeback_thread(), whereupon it will most likely
end up in interruptible sleep.  In theory we could just let the first
write take care of this, but there's really no reason not to do the
transition quickly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 drivers/md/bcache/super.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 24a3a15..3ffe970 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
 		bch_sectors_dirty_init(dc);
 		atomic_set(&dc->has_dirty, 1);
 		atomic_inc(&dc->count);
-		bch_writeback_queue(dc);
 	}
+	bch_writeback_queue(dc);
 
 	bch_cached_dev_run(dc);
 	bcache_device_link(&dc->disk, c, "bdev");

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-12  2:31     ` Darrick J. Wong
  2014-04-12  2:33       ` [PATCH] bcache: let the writeback thread run at least once at startup Darrick J. Wong
@ 2014-04-30 11:51       ` Daniel Smedegaard Buus
  2014-04-30 17:24         ` Darrick J. Wong
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Smedegaard Buus @ 2014-04-30 11:51 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Kent Overstreet, Sam Fulcomer, Francis Moreau, Daniel J Blueman,
	linux-bcache

Hi :)

What's the state of this?

I have this issue on a number of EC2 instances that are using bcache
to speed up EBS access by caching on the ephemeral SSDs (writethrough
mode).

"echo 0 > /sys/block/bcache0/bcache/writeback_running" as suggested by
others stops the flood of "blocked" messages in dmesg, but it only
partly fixes the 2.0 minimum load average issue (instead, it's now a
1.0 minimum load average).

I just tried the latest kernel I could get my hands on for Ubuntu
Trusty, the mainline v3.15-rc2 (built April 20th), and it exhibits the
same behavior.

Thanks :)

Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-30 11:51       ` [PATCH] bcache: fix writeback thread to sleep less intrusively Daniel Smedegaard Buus
@ 2014-04-30 17:24         ` Darrick J. Wong
  2014-05-01  9:38           ` Daniel Smedegaard Buus
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2014-04-30 17:24 UTC (permalink / raw)
  To: Daniel Smedegaard Buus
  Cc: Kent Overstreet, Sam Fulcomer, Francis Moreau, Daniel J Blueman,
	linux-bcache

On Wed, Apr 30, 2014 at 01:51:14PM +0200, Daniel Smedegaard Buus wrote:
> Hi :)
> 
> What's the state of this?
> 
> I have this issue on a number of EC2 instances that are using bcache
> to speed up EBS access by caching on the ephemeral SSDs (writethrough
> mode).
> 
> "echo 0 > /sys/block/bcache0/bcache/writeback_running" as suggested by
> others stops the flood of "blocked" messages in dmesg, but it only
> partly fixes the 2.0 minimum load average issue (instead, it's now a
> 1.0 minimum load average).

I haven't spent time on figuring out the other source of load average.  Kent
didn't seem to like the patch to convert the bcache_writeback thread to
interruptible sleep (I recall he said it was 'wrong', but didn't elaborate).

--D
> 
> I just tried the latest kernel I could get my hands on for Ubuntu
> Trusty, the mainline v3.15-rc2 (built April 20th), and it exhibits the
> same behavior.
> 
> Thanks :)
> 
> Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-30 17:24         ` Darrick J. Wong
@ 2014-05-01  9:38           ` Daniel Smedegaard Buus
  2014-05-01 21:54             ` Slava Pestov
       [not found]             ` <CACHGV4+mNu_KV7JazT-34D++3S2NKhDkOmc_wo0QfrfdqpccoQ@mail.gmail.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Smedegaard Buus @ 2014-05-01  9:38 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Kent Overstreet, Sam Fulcomer, Francis Moreau, Daniel J Blueman,
	linux-bcache

On Wed, Apr 30, 2014 at 7:24 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>
> I haven't spent time on figuring out the other source of load average.  Kent
> didn't seem to like the patch to convert the bcache_writeback thread to
> interruptible sleep (I recall he said it was 'wrong', but didn't elaborate).
>

Sorry to hear that... Would be really nice to be able to go back to
normal load. And I cannot revert to an older kernel, as I need
3.15-rc2 or greater to fix a different problem concerning Oracle Java
:/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-05-01  9:38           ` Daniel Smedegaard Buus
@ 2014-05-01 21:54             ` Slava Pestov
  2014-05-20  7:07               ` Daniel J Blueman
       [not found]             ` <CACHGV4+mNu_KV7JazT-34D++3S2NKhDkOmc_wo0QfrfdqpccoQ@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Slava Pestov @ 2014-05-01 21:54 UTC (permalink / raw)
  To: Daniel Smedegaard Buus
  Cc: Darrick J. Wong, Kent Overstreet, Sam Fulcomer, Francis Moreau,
	Daniel J Blueman, linux-bcache

Hi Daniel and Darrick,

I mailed a patch that attempts to fix the uninterruptible issue while
taking Kent's feedback regarding your earlier patch into account.
Please test it out and let me know what you think.

On Thu, May 1, 2014 at 2:38 AM, Daniel Smedegaard Buus
<danielbuus@gmail.com> wrote:
> On Wed, Apr 30, 2014 at 7:24 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
>>
>> I haven't spent time on figuring out the other source of load average.  Kent
>> didn't seem to like the patch to convert the bcache_writeback thread to
>> interruptible sleep (I recall he said it was 'wrong', but didn't elaborate).
>>
>
> Sorry to hear that... Would be really nice to be able to go back to
> normal load. And I cannot revert to an older kernel, as I need
> 3.15-rc2 or greater to fix a different problem concerning Oracle Java
> :/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-05-01 21:54             ` Slava Pestov
@ 2014-05-20  7:07               ` Daniel J Blueman
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel J Blueman @ 2014-05-20  7:07 UTC (permalink / raw)
  To: Slava Pestov
  Cc: Daniel Smedegaard Buus, Darrick J. Wong, Kent Overstreet,
	Sam Fulcomer, Francis Moreau, linux-bcache

On 2 May 2014 05:54, Slava Pestov <sp@datera.io> wrote:
> On Thu, May 1, 2014 at 2:38 AM, Daniel Smedegaard Buus
> <danielbuus@gmail.com> wrote:
>> On Wed, Apr 30, 2014 at 7:24 PM, Darrick J. Wong
>> <darrick.wong@oracle.com> wrote:
>>>
>>> I haven't spent time on figuring out the other source of load average.  Kent
>>> didn't seem to like the patch to convert the bcache_writeback thread to
>>> interruptible sleep (I recall he said it was 'wrong', but didn't elaborate).
>>
>> Sorry to hear that... Would be really nice to be able to go back to
>> normal load. And I cannot revert to an older kernel, as I need
>> 3.15-rc2 or greater to fix a different problem concerning Oracle Java

> Hi Daniel and Darrick,
>
> I mailed a patch that attempts to fix the uninterruptible issue while
> taking Kent's feedback regarding your earlier patch into account.
> Please test it out and let me know what you think.

Hi Slava,

Apologies for the delays. I rebuilt 3.14.4 with your bcache patch [1]
and the 'bcache_writeback blocked for more than 120 seconds' don't
occur, though when the bcache threads are torn down during reboot, we
crash [2] at:

static void cached_dev_free(struct closure *cl)
{
    struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);

    cancel_delayed_work_sync(&dc->writeback_rate_update);
    kthread_stop(dc->writeback_thread);

dc->writeback_thread is clearly zero, as likely the struct cached_dev
was freed already.

Many thanks,
  Daniel

[1] http://www.spinics.net/lists/linux-bcache/msg02464.html

-- [2]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8108bff6>] kthread_stop+0x16/0xe0
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: microcode(F) nfsd(F) nfs_acl(F) auth_rpcgss(F)
nfs(F) fscache(F) lockd(F) sunrpc(F) joydev(F) ipmi_si(F)
ipmi_msghandler(F) psmouse(F) serio_raw(F) video(F) mac_hid(F)
lpc_ich(F) lp(F) parport(F) btrfs(F) raid6_pq(F) bcache(F) xor(F)
hid_generic(F) usbhid(F) hid(F) e1000e(F) ptp(F) pps_core(F) ahci(F)
CPU: 2 PID: 27 Comm: kworker/2:0 Tainted: GF      3.14.4-bcachefix+ #3
Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
Workqueue: events cached_dev_free [bcache]
task: ffff8804097363c0 ti: ffff8804097d2000 task.ti: ffff8804097d2000
RIP: 0010:[<ffffffff8108bff6>] [<ffffffff8108bff6>] kthread_stop+0x16/0xe0
RSP: 0018:ffff8804097d3df0 EFLAGS: 00010292
RAX: 0000000fffffff00 RBX: ffff8804034e0010 RCX: 000000007fffffff
RDX: 0000000000000296 RSI: 000000007fffffff RDI: 0000000000000000
RBP: ffff8804097d3e08 R08: 20100d3800400000 R09: 0080000000000000
R10: dfef7acc030e0010 R11: 0000000000000400 R12: 0000000000000000
R13: ffff8804034e0010 R14: 0000000000000000 R15: 0000000000000080
FS: 0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001c0e000 CR4: 00000000000407e0
Stack:
 ffff8804034e0010 ffff88041fd13d80 ffff8804034e0010 ffff8804097d3e20
 ffffffffa00b2dd5 ffff880409503580 ffff8804097d3e68 ffffffff81084482
 000000001fd13d98 ffff88041fd17f00 ffff88041fd13d98 ffff8804095035b0
Call Trace:
 [<ffffffffa00b2dd5>] cached_dev_free+0x25/0x100 [bcache]
 [<ffffffff81084482>] process_one_work+0x182/0x450
 [<ffffffff81085241>] worker_thread+0x121/0x410
 [<ffffffff81085120>] ? rescuer_thread+0x3e0/0x3e0
 [<ffffffff8108bdc2>] kthread+0xd2/0xf0
 [<ffffffff8108bcf0>] ? kthread_create_on_node+0x190/0x190
 [<ffffffff8173c67c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108bcf0>] ? kthread_create_on_node+0x190/0x190
Code: e8 20 ff ff ff 48 89 df be 00 02 00 00 e8 63 10 01 00 5b 5d c3
66 66 66 66 90 55 48 89 e5 41 55 41 54 49 89 fc 53 66 66 66 66 90 <f0>
41 ff 44 24 10 49 8b 9c 24 a0 04 00 00 48 85 db 74 21 f0 80
RIP [<ffffffff8108bff6>] kthread_stop+0x16/0xe0
 RSP <ffff8804097d3df0>
CR2: 0000000000000010
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 16+ messages in thread

[parent not found: <CACHGV4+mNu_KV7JazT-34D++3S2NKhDkOmc_wo0QfrfdqpccoQ@mail.gmail.com>]

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
       [not found]             ` <CACHGV4+mNu_KV7JazT-34D++3S2NKhDkOmc_wo0QfrfdqpccoQ@mail.gmail.com>
@ 2014-07-09 10:27               ` Daniel Smedegaard Buus
  2014-07-09 15:53                 ` Peter Kieser
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Smedegaard Buus @ 2014-07-09 10:27 UTC (permalink / raw)
  Cc: linux-bcache

On Thu, May 1, 2014 at 11:53 PM, Slava Pestov <sp@datera.io> wrote:
> Hi Daniel and Darrick,
>
> I mailed a patch that attempts to fix the uninterruptible issue while taking
> Kent's feedback regarding your earlier patch into account. Please test it
> out and let me know what you think.
>

Curious about what ever happened to this bug. This patch fixed the
load bug, but with new problems added...

Did anyone ever fix this properly? Or is it being attended to?

Thanks, and thanks for the great work so far :)

Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-07-09 10:27               ` Daniel Smedegaard Buus
@ 2014-07-09 15:53                 ` Peter Kieser
  2014-07-09 18:02                   ` Daniel Smedegaard Buus
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Kieser @ 2014-07-09 15:53 UTC (permalink / raw)
  Cc: Daniel Smedegaard Buus, linux-bcache

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]


On 2014-07-09 3:27 AM, Daniel Smedegaard Buus wrote:
> On Thu, May 1, 2014 at 11:53 PM, Slava Pestov<sp@datera.io>  wrote:
>> >Hi Daniel and Darrick,
>> >
>> >I mailed a patch that attempts to fix the uninterruptible issue while taking
>> >Kent's feedback regarding your earlier patch into account. Please test it
>> >out and let me know what you think.
>> >
> Curious about what ever happened to this bug. This patch fixed the
> load bug, but with new problems added...
What other bugs did it introduce?

-Peter


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4504 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-07-09 15:53                 ` Peter Kieser
@ 2014-07-09 18:02                   ` Daniel Smedegaard Buus
  2014-07-09 18:19                     ` Slava Pestov
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Smedegaard Buus @ 2014-07-09 18:02 UTC (permalink / raw)
  To: Peter Kieser; +Cc: linux-bcache

On Wed, Jul 9, 2014 at 5:53 PM, Peter Kieser <peter@kieser.ca> wrote:
>
> What other bugs did it introduce?
>

Well, the one mentioned in this thread, and the one where losing (or
disabling? can't remember which) the cache device leaves you with a
system that won't boot (can't find that message ATM).

> -Peter
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-07-09 18:02                   ` Daniel Smedegaard Buus
@ 2014-07-09 18:19                     ` Slava Pestov
  2014-07-09 19:18                       ` Daniel Smedegaard Buus
  0 siblings, 1 reply; 16+ messages in thread
From: Slava Pestov @ 2014-07-09 18:19 UTC (permalink / raw)
  To: Daniel Smedegaard Buus; +Cc: Peter Kieser, linux-bcache

I sent two patches to fix the writeback thread -- if you apply both of
them, everything should work fine.

On Wed, Jul 9, 2014 at 11:02 AM, Daniel Smedegaard Buus
<danielbuus@gmail.com> wrote:
> On Wed, Jul 9, 2014 at 5:53 PM, Peter Kieser <peter@kieser.ca> wrote:
>>
>> What other bugs did it introduce?
>>
>
> Well, the one mentioned in this thread, and the one where losing (or
> disabling? can't remember which) the cache device leaves you with a
> system that won't boot (can't find that message ATM).
>
>> -Peter
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-07-09 18:19                     ` Slava Pestov
@ 2014-07-09 19:18                       ` Daniel Smedegaard Buus
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Smedegaard Buus @ 2014-07-09 19:18 UTC (permalink / raw)
  To: Slava Pestov; +Cc: Peter Kieser, linux-bcache

On Wed, Jul 9, 2014 at 8:19 PM, Slava Pestov <sp@datera.io> wrote:
> I sent two patches to fix the writeback thread -- if you apply both of
> them, everything should work fine.
>

That's fantastic! :D Where is the right place to look for these patches?

Thanks!

Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bcache: fix writeback thread to sleep less intrusively
  2014-04-11  0:46 ` Kent Overstreet
  2014-04-11  1:41   ` Darrick J. Wong
@ 2014-04-11  7:26   ` Francis Moreau
  1 sibling, 0 replies; 16+ messages in thread
From: Francis Moreau @ 2014-04-11  7:26 UTC (permalink / raw)
  To: Kent Overstreet, Darrick J. Wong
  Cc: Sam Fulcomer, Daniel J Blueman, linux-bcache

On 04/11/2014 02:46 AM, Kent Overstreet wrote:
> On Thu, Apr 10, 2014 at 05:26:36PM -0700, Darrick J. Wong wrote:
>> Hi all,
>>
>> The attached patch fixes both the "writeback blocked for XXX seconds"
>> complaints from the kernel and the oddly high load averages on idle systems
>> problems for me.  Can you give it a try to see if it fixes your problem too?
>>
>> --D
>> ---
>> Currently, the writeback thread performs uninterruptible sleep while
>> it waits for enough dirty data to accumulate to start writeback.
>> Unfortunately, uninterruptible sleep counts towards load average,
>> which artificially inflates it.  Since the wb thread is a kernel
>> thread and kthreads don't receive signals, we can use the
>> interruptible sleep call, which eliminates the high load average
>> symptom.
>>
>> A second symptom is that if we mount a non-writeback cache, the
>> writeback thread will be woken up.  If the cache later accumulates
>> dirty data and writeback_running=1 (this seems to be a default) then
>> the writeback thread will enter uninterruptible sleep waiting for
>> dirty data.  This is unnecessary and (I think) results in the
>> "bcache_writebac:155 blocked for more than XXX seconds" complaints
>> that people have been talking about.  The fix for this is simple -- if
>> we're not in writeback mode, just go to (interruptible) sleep for a
>> long time.  Alternately, we could use wait_event until the cache mode
>> changes.
>>
>> Finally, change bch_cached_dev_attach() to always wake up the
>> writeback thread, because the newly created wb thread remains in
>> uninterruptible sleep state until something explicitly wakes it up.
>> This wakeup allows the thread to call bch_writeback_thread(),
>> whereupon it will most likely end up in interruptible sleep.  In
>> theory we could just let the first write take care of this, but
>> there's really no reason not to do the transition quickly.
>>
>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>> ---
>>  drivers/md/bcache/super.c     |    2 +-
>>  drivers/md/bcache/writeback.c |   16 ++++++++++++++--
>>  2 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index 24a3a15..3ffe970 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -1048,8 +1048,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
>>  		bch_sectors_dirty_init(dc);
>>  		atomic_set(&dc->has_dirty, 1);
>>  		atomic_inc(&dc->count);
>> -		bch_writeback_queue(dc);
>>  	}
>> +	bch_writeback_queue(dc);
>>  
>>  	bch_cached_dev_run(dc);
>>  	bcache_device_link(&dc->disk, c, "bdev");
>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>> index f4300e4..f49e6b1 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -239,7 +239,7 @@ static void read_dirty(struct cached_dev *dc)
>>  		if (KEY_START(&w->key) != dc->last_read ||
>>  		    jiffies_to_msecs(delay) > 50)
>>  			while (!kthread_should_stop() && delay)
>> -				delay = schedule_timeout_uninterruptible(delay);
>> +				delay = schedule_timeout_interruptible(delay);
>>  
>>  		dc->last_read	= KEY_OFFSET(&w->key);
>>  
>> @@ -401,6 +401,18 @@ static int bch_writeback_thread(void *arg)
>>  
>>  	while (!kthread_should_stop()) {
>>  		down_write(&dc->writeback_lock);
>> +		if (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK) {
>> +			up_write(&dc->writeback_lock);
>> +			set_current_state(TASK_INTERRUPTIBLE);
>> +
>> +			if (kthread_should_stop())
>> +				return 0;
>> +
>> +			try_to_freeze();
>> +			schedule_timeout_interruptible(10 * HZ);
>> +			continue;
>> +		}
>> +
> 
> So this addition isn't correct - cache mode might've been flipped to
> writethrough, but we might still have dirty data we need to flush: that's why
> the line below is checking dc->has_dirty.
> 
> I don't think your schedule_timeout_interruptible() is correct either, and I'm
> not seeing what's wrong with the code right below - where it's doing the
> set_current_state() then the schedule() - but from your report of high idle load
> average (what about cpu?) it sounds like something is wrong.

could you then try to explain then why people is currently facing this
issue (3.13 and 3.14 at least):

[Apr11 09:15] INFO: task bcache_writebac:166 blocked for more than 120
seconds.
[  +0.000009]       Not tainted 3.14.0-4-ARCH #1
[  +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  +0.000003] bcache_writebac D 0000000000000000     0   166      2
0x00000000
[  +0.000007]  ffff880405223eb8 0000000000000046 ffff8800d8939d70
ffff880405223fd8
[  +0.000006]  00000000000142c0 00000000000142c0 ffff8800d8939d70
ffff880405223e38
[  +0.000004]  ffffffff81097c1a 0000000000000000 0000000000000000
0000000000000046
[  +0.000005] Call Trace:
[  +0.000015]  [<ffffffff81097c1a>] ? try_to_wake_up+0x1fa/0x2c0
[  +0.000005]  [<ffffffff81097d32>] ? default_wake_function+0x12/0x20
[  +0.000007]  [<ffffffff810a9b48>] ? __wake_up_common+0x58/0x90
[  +0.000035]  [<ffffffffa0428ff0>] ? read_dirty_endio+0x60/0x60 [bcache]
[  +0.000007]  [<ffffffff814d7d89>] schedule+0x29/0x70
[  +0.000005]  [<ffffffff8108715d>] kthread+0xad/0xf0
[  +0.000005]  [<ffffffff810870b0>] ? kthread_create_on_node+0x180/0x180
[  +0.000006]  [<ffffffff814e2ebc>] ret_from_fork+0x7c/0xb0
[  +0.000004]  [<ffffffff810870b0>] ? kthread_create_on_node+0x180/0x180

This has been reported several times but it seems that you ignored it
each time.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-07-09 19:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-11  0:26 [PATCH] bcache: fix writeback thread to sleep less intrusively Darrick J. Wong
2014-04-11  0:46 ` Kent Overstreet
2014-04-11  1:41   ` Darrick J. Wong
2014-04-12  2:31     ` Darrick J. Wong
2014-04-12  2:33       ` [PATCH] bcache: let the writeback thread run at least once at startup Darrick J. Wong
2014-04-30 11:51       ` [PATCH] bcache: fix writeback thread to sleep less intrusively Daniel Smedegaard Buus
2014-04-30 17:24         ` Darrick J. Wong
2014-05-01  9:38           ` Daniel Smedegaard Buus
2014-05-01 21:54             ` Slava Pestov
2014-05-20  7:07               ` Daniel J Blueman
     [not found]             ` <CACHGV4+mNu_KV7JazT-34D++3S2NKhDkOmc_wo0QfrfdqpccoQ@mail.gmail.com>
2014-07-09 10:27               ` Daniel Smedegaard Buus
2014-07-09 15:53                 ` Peter Kieser
2014-07-09 18:02                   ` Daniel Smedegaard Buus
2014-07-09 18:19                     ` Slava Pestov
2014-07-09 19:18                       ` Daniel Smedegaard Buus
2014-04-11  7:26   ` Francis Moreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).