* ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)
[not found] ` <20161019173403.GB3142@twins.programming.kicks-ass.net>
@ 2016-10-24 1:57 ` Davidlohr Bueso
2016-10-24 13:26 ` Kent Overstreet
2016-10-24 14:27 ` Kent Overstreet
0 siblings, 2 replies; 5+ messages in thread
From: Davidlohr Bueso @ 2016-10-24 1:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Will Deacon, Linus Torvalds, Waiman Long, Jason Low,
Ding Tianhong, Thomas Gleixner, Ingo Molnar, Imre Deak,
Linux Kernel Mailing List, Tim Chen, Terry Rudd, Paul E. McKenney,
Jason Low, Chris Wilson, Daniel Vetter, kent.overstreet, axboe,
linux-bcache
On Wed, 19 Oct 2016, Peter Zijlstra wrote:
>Subject: sched: Better explain sleep/wakeup
>From: Peter Zijlstra <peterz@infradead.org>
>Date: Wed Oct 19 15:45:27 CEST 2016
>
>There were a few questions wrt how sleep-wakeup works. Try and explain
>it more.
>
>Requested-by: Will Deacon <will.deacon@arm.com>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> include/linux/sched.h | 52 ++++++++++++++++++++++++++++++++++----------------
> kernel/sched/core.c | 15 +++++++-------
> 2 files changed, 44 insertions(+), 23 deletions(-)
>
>--- a/include/linux/sched.h
>+++ b/include/linux/sched.h
>@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> #define set_task_state(tsk, state_value) \
> do { \
> (tsk)->task_state_change = _THIS_IP_; \
>- smp_store_mb((tsk)->state, (state_value)); \
>+ smp_store_mb((tsk)->state, (state_value)); \
> } while (0)
>
>-/*
>- * set_current_state() includes a barrier so that the write of current->state
>- * is correctly serialised wrt the caller's subsequent test of whether to
>- * actually sleep:
>- *
>- * set_current_state(TASK_UNINTERRUPTIBLE);
>- * if (do_i_need_to_sleep())
>- * schedule();
>- *
>- * If the caller does not need such serialisation then use __set_current_state()
>- */
> #define __set_current_state(state_value) \
> do { \
> current->task_state_change = _THIS_IP_; \
>@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> #define set_current_state(state_value) \
> do { \
> current->task_state_change = _THIS_IP_; \
>- smp_store_mb(current->state, (state_value)); \
>+ smp_store_mb(current->state, (state_value)); \
> } while (0)
>
> #else
>
>+/*
>+ * @tsk had better be current, or you get to keep the pieces.
That reminds me we were getting rid of the set_task_state() calls. Bcache was
pending, being the only user in the kernel that doesn't actually operate on
current; it instead sets the state of the newly created (yet
blocked/uninterruptible) garbage collection kthread. I cannot figure out why
this is done (ie purposely for load avg accounting). Furthermore, gc kicks in
only in very specific scenarios obviously, such as when triggered by the
allocator task, so I don't see why bcache gc should want to be interruptible.
Kent, Jens, can we get rid of this?
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 76f7534d1dd1..6e3c358b5759 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
if (IS_ERR(c->gc_thread))
return PTR_ERR(c->gc_thread);
- set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
return 0;
}
Thanks,
Davidlohr
>+ *
>+ * The only reason is that computing current can be more expensive than
>+ * using a pointer that's already available.
>+ *
>+ * Therefore, see set_current_state().
>+ */
> #define __set_task_state(tsk, state_value) \
> do { (tsk)->state = (state_value); } while (0)
> #define set_task_state(tsk, state_value) \
>@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
> * is correctly serialised wrt the caller's subsequent test of whether to
> * actually sleep:
> *
>+ * for (;;) {
> * set_current_state(TASK_UNINTERRUPTIBLE);
>- * if (do_i_need_to_sleep())
>- * schedule();
>+ * if (!need_sleep)
>+ * break;
>+ *
>+ * schedule();
>+ * }
>+ * __set_current_state(TASK_RUNNING);
>+ *
>+ * If the caller does not need such serialisation (because, for instance, the
>+ * condition test and condition change and wakeup are under the same lock) then
>+ * use __set_current_state().
>+ *
>+ * The above is typically ordered against the wakeup, which does:
>+ *
>+ * need_sleep = false;
>+ * wake_up_state(p, TASK_UNINTERRUPTIBLE);
>+ *
>+ * Where wake_up_state() (and all other wakeup primitives) imply enough
>+ * barriers to order the store of the variable against wakeup.
>+ *
>+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
>+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
>+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
>+ *
>+ * This is obviously fine, since they both store the exact same value.
> *
>- * If the caller does not need such serialisation then use __set_current_state()
>+ * Also see the comments of try_to_wake_up().
> */
> #define __set_current_state(state_value) \
> do { current->state = (state_value); } while (0)
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
> * @state: the mask of task states that can be woken
> * @wake_flags: wake modifier flags (WF_*)
> *
>- * Put it on the run-queue if it's not already there. The "current"
>- * thread is always on the run-queue (except when the actual
>- * re-schedule is in progress), and as such you're allowed to do
>- * the simpler "current->state = TASK_RUNNING" to mark yourself
>- * runnable without the overhead of this.
>+ * If (@state & @p->state) @p->state = TASK_RUNNING.
> *
>- * Return: %true if @p was woken up, %false if it was already running.
>- * or @state didn't match @p's state.
>+ * If the task was not queued/runnable, also place it back on a runqueue.
>+ *
>+ * Atomic against schedule() which would dequeue a task, also see
>+ * set_current_state().
>+ *
>+ * Return: %true if @p->state changes (an actual wakeup was done),
>+ * %false otherwise.
> */
> static int
> try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)
2016-10-24 1:57 ` ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop) Davidlohr Bueso
@ 2016-10-24 13:26 ` Kent Overstreet
2016-10-24 14:27 ` Kent Overstreet
1 sibling, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2016-10-24 13:26 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Peter Zijlstra, Will Deacon, Linus Torvalds, Waiman Long,
Jason Low, Ding Tianhong, Thomas Gleixner, Ingo Molnar, Imre Deak,
Linux Kernel Mailing List, Tim Chen, Terry Rudd, Paul E. McKenney,
Jason Low, Chris Wilson, Daniel Vetter, axboe, linux-bcache
On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> On Wed, 19 Oct 2016, Peter Zijlstra wrote:
>
> > [...]
> > +/*
> > + * @tsk had better be current, or you get to keep the pieces.
>
> That reminds me we were getting rid of the set_task_state() calls. Bcache was
> pending, being the only user in the kernel that doesn't actually operate on
> current; it instead sets the state of the newly created (yet
> blocked/uninterruptible) garbage collection kthread. I cannot figure out why
> this is done (ie purposely for load avg accounting). Furthermore, gc kicks in
> only in very specific scenarios obviously, such as when triggered by the
> allocator task, so I don't see why bcache gc should want to be interruptible.
>
> Kent, Jens, can we get rid of this?
>
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 76f7534d1dd1..6e3c358b5759 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
> if (IS_ERR(c->gc_thread))
> return PTR_ERR(c->gc_thread);
>
> - set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
> return 0;
> }
Actually, that code looks broken, or at least stupid. Let me do a proper fix...
* Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)
2016-10-24 1:57 ` ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop) Davidlohr Bueso
2016-10-24 13:26 ` Kent Overstreet
@ 2016-10-24 14:27 ` Kent Overstreet
2016-10-25 16:55 ` Eric Wheeler
1 sibling, 1 reply; 5+ messages in thread
From: Kent Overstreet @ 2016-10-24 14:27 UTC (permalink / raw)
To: Davidlohr Bueso, Eric Wheeler; +Cc: Linux Kernel Mailing List, linux-bcache
On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> On Wed, 19 Oct 2016, Peter Zijlstra wrote:
>
> > [...]
> > +/*
> > + * @tsk had better be current, or you get to keep the pieces.
>
> That reminds me we were getting rid of the set_task_state() calls. Bcache was
> pending, being the only user in the kernel that doesn't actually operate on
> current; it instead sets the state of the newly created (yet
> blocked/uninterruptible) garbage collection kthread. I cannot figure out why
> this is done (ie purposely for load avg accounting). Furthermore, gc kicks in
> only in very specific scenarios obviously, such as when triggered by the
> allocator task, so I don't see why bcache gc should want to be interruptible.
>
> Kent, Jens, can we get rid of this?
Here's a patch that just fixes the way gc gets woken up. Eric, you want to take
this?
-- >8 --
Subject: [PATCH] bcache: Make gc wakeup saner
This lets us ditch a set_task_state() call.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
---
drivers/md/bcache/bcache.h | 4 ++--
drivers/md/bcache/btree.c | 39 ++++++++++++++++++++-------------------
drivers/md/bcache/btree.h | 3 +--
drivers/md/bcache/request.c | 4 +---
drivers/md/bcache/super.c | 2 ++
5 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 6b420a55c7..c3ea03c9a1 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -425,7 +425,7 @@ struct cache {
* until a gc finishes - otherwise we could pointlessly burn a ton of
* cpu
*/
- unsigned invalidate_needs_gc:1;
+ unsigned invalidate_needs_gc;
bool discard; /* Get rid of? */
@@ -593,8 +593,8 @@ struct cache_set {
/* Counts how many sectors bio_insert has added to the cache */
atomic_t sectors_to_gc;
+ wait_queue_head_t gc_wait;
- wait_queue_head_t moving_gc_wait;
struct keybuf moving_gc_keys;
/* Number of moving GC bios in flight */
struct semaphore moving_in_flight;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81d3db40cd..2efdce0724 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1757,32 +1757,34 @@ static void bch_btree_gc(struct cache_set *c)
bch_moving_gc(c);
}
-static int bch_gc_thread(void *arg)
+static bool gc_should_run(struct cache_set *c)
{
- struct cache_set *c = arg;
struct cache *ca;
unsigned i;
- while (1) {
-again:
- bch_btree_gc(c);
+ for_each_cache(ca, c, i)
+ if (ca->invalidate_needs_gc)
+ return true;
- set_current_state(TASK_INTERRUPTIBLE);
- if (kthread_should_stop())
- break;
+ if (atomic_read(&c->sectors_to_gc) < 0)
+ return true;
- mutex_lock(&c->bucket_lock);
+ return false;
+}
- for_each_cache(ca, c, i)
- if (ca->invalidate_needs_gc) {
- mutex_unlock(&c->bucket_lock);
- set_current_state(TASK_RUNNING);
- goto again;
- }
+static int bch_gc_thread(void *arg)
+{
+ struct cache_set *c = arg;
- mutex_unlock(&c->bucket_lock);
+ while (1) {
+ wait_event_interruptible(c->gc_wait,
+ kthread_should_stop() || gc_should_run(c));
- schedule();
+ if (kthread_should_stop())
+ break;
+
+ set_gc_sectors(c);
+ bch_btree_gc(c);
}
return 0;
@@ -1790,11 +1792,10 @@ static int bch_gc_thread(void *arg)
int bch_gc_thread_start(struct cache_set *c)
{
- c->gc_thread = kthread_create(bch_gc_thread, c, "bcache_gc");
+ c->gc_thread = kthread_run(bch_gc_thread, c, "bcache_gc");
if (IS_ERR(c->gc_thread))
return PTR_ERR(c->gc_thread);
- set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
return 0;
}
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index 5c391fa01b..9b80417cd5 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -260,8 +260,7 @@ void bch_initial_mark_key(struct cache_set *, int, struct bkey *);
static inline void wake_up_gc(struct cache_set *c)
{
- if (c->gc_thread)
- wake_up_process(c->gc_thread);
+ wake_up(&c->gc_wait);
}
#define MAP_DONE 0
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 40ffe5e424..a37c1776f2 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -196,10 +196,8 @@ static void bch_data_insert_start(struct closure *cl)
struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
struct bio *bio = op->bio, *n;
- if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0) {
- set_gc_sectors(op->c);
+ if (atomic_sub_return(bio_sectors(bio), &op->c->sectors_to_gc) < 0)
wake_up_gc(op->c);
- }
if (op->bypass)
return bch_data_invalidate(cl);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 849ad441cd..66669c8f41 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1491,6 +1491,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
mutex_init(&c->bucket_lock);
init_waitqueue_head(&c->btree_cache_wait);
init_waitqueue_head(&c->bucket_wait);
+ init_waitqueue_head(&c->gc_wait);
sema_init(&c->uuid_write_mutex, 1);
spin_lock_init(&c->btree_gc_time.lock);
@@ -1550,6 +1551,7 @@ static void run_cache_set(struct cache_set *c)
for_each_cache(ca, c, i)
c->nbuckets += ca->sb.nbuckets;
+ set_gc_sectors(c);
if (CACHE_SYNC(&c->sb)) {
LIST_HEAD(journal);
--
2.9.3
* Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)
2016-10-24 14:27 ` Kent Overstreet
@ 2016-10-25 16:55 ` Eric Wheeler
2016-10-25 17:45 ` Kent Overstreet
0 siblings, 1 reply; 5+ messages in thread
From: Eric Wheeler @ 2016-10-25 16:55 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Davidlohr Bueso, Linux Kernel Mailing List, linux-bcache
On Mon, 24 Oct 2016, Kent Overstreet wrote:
> [...]
>
> Here's a patch that just fixes the way gc gets woken up. Eric, you want to take
> this?
Sure, I'll put it up with my -rc2 pull request to Jens.
A couple of sanity checks (for my understanding at least):
Why does bch_data_insert_start() no longer need to call
set_gc_sectors(op->c) now that bch_cache_set_alloc and bch_gc_thread do?
Does bch_cache_set_alloc() even need to call set_gc_sectors since
bch_gc_thread() does before calling bch_btree_gc?
Also I'm curious, why change invalidate_needs_gc from a bitfield?
-Eric
> [...]
* Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)
2016-10-25 16:55 ` Eric Wheeler
@ 2016-10-25 17:45 ` Kent Overstreet
0 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2016-10-25 17:45 UTC (permalink / raw)
To: Eric Wheeler; +Cc: Davidlohr Bueso, Linux Kernel Mailing List, linux-bcache
On Tue, Oct 25, 2016 at 09:55:21AM -0700, Eric Wheeler wrote:
> Sure, I'll put it up with my -rc2 pull request to Jens.
>
> A couple of sanity checks (for my understanding at least):
>
> Why does bch_data_insert_start() no longer need to call
> set_gc_sectors(op->c) now that bch_cache_set_alloc and bch_gc_thread do?
Before, the gc thread wasn't being explicitly signalled that it was time to run
- it was just being woken up, meaning that a spurious wakeup would cause gc to
run. At best it was sketchy and fragile, for multiple reasons - wait_event() is
a much better mechanism.
So with wait_event() gc is explicitly checking if it's time to run - the wakeup
doesn't really make anything happen by itself, it just pokes the gc thread so
it's able to notice that it should run.
When you're signalling a thread this way, you're in effect setting a global
variable that says "gc should run now", then kicking the gc thread so it can
check the "gc should run" variable. But wakeups aren't synchronous - our call to
wake_up() doesn't make the gc thread check that variable before it returns; all
we know when the wake_up() call returns is that the gc thread is going to check
that variable at some point in the future. So we can't set the "gc should run"
variable, wake up the gc thread, and then set it back to "gc shouldn't run yet" -
what'll happen most of the time is that the gc thread won't run before we set
the variable back to "gc shouldn't run yet", so it'll never see that it was
supposed to run and it'll go back to sleep.
So the way you make this work is that the gc thread itself sets the variable
back to "gc shouldn't run yet" _after_ it's seen it and decided to run.
> Does bch_cache_set_alloc() even need to call set_gc_sectors since
> bch_gc_thread() does before calling bch_btree_gc?
Yes, because the gc thread only resets the counter when it's decided to run - we
don't want it to run right away at startup.
> Also I'm curious, why change invalidate_needs_gc from a bitfield?
Bitfields are particularly unsafe for multiple threads to access - the compiler
has to emit a read/modify/write of the whole containing word, which will clobber
adjacent data. A bare int is also not safe in _general_ for multiple threads to
access without a lock, but for what it's doing here it's fine.
end of thread, newest: ~2016-10-25 17:45 UTC
Thread overview: 5+ messages
[not found] <20161007145243.361481786@infradead.org>
[not found] ` <20161007150211.271490994@infradead.org>
[not found] ` <20161013151720.GB13138@arm.com>
[not found] ` <20161019173403.GB3142@twins.programming.kicks-ass.net>
2016-10-24 1:57 ` ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop) Davidlohr Bueso
2016-10-24 13:26 ` Kent Overstreet
2016-10-24 14:27 ` Kent Overstreet
2016-10-25 16:55 ` Eric Wheeler
2016-10-25 17:45 ` Kent Overstreet