[PATCH] bcache: treat stale && dirty keys as bad keys

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] bcache: treat stale && dirty keys as bad keys
@ 2018-12-18  6:37 Junhui Tang
  2018-12-18 14:01 ` Kent Overstreet
  2018-12-22 13:02 ` Coly Li
  0 siblings, 2 replies; 7+ messages in thread
From: Junhui Tang @ 2018-12-18  6:37 UTC (permalink / raw)
  To: colyli; +Cc: linux-bcache, linux-block

From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001
From: Tang Junhui <tang.junhui.linux@gmail.com>
Date: Tue, 18 Dec 2018 10:38:26 +0800
Subject: [PATCH] bcache: treat stale && dirty keys as bad keys

Stale && dirty keys can be produced in the follow way:
After writeback in write_dirty_finish(), dirty keys k1 will
replace by clean keys k2
==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
==>static int bch_btree_insert_node(struct btree *b,
       struct btree_op *op,
       struct keylist *insert_keys,
       atomic_t *journal_ref,
Then two steps:
A) update k1 to k2 in btree node memory;
   bch_btree_insert_keys(b, op, insert_keys, replace_key)
B) Write the bset(contains k2) to cache disk by a 30s delay work
   bch_btree_leaf_dirty(b, journal_ref).
But before the 30s delay work write the bset to cache device,
these things happend:
A) GC works, and reclaim the bucket k2 point to;
B) Allocator works, and invalidate the bucket k2 point to,
   and increase the gen of the bucket, and place it into free_inc
   fifo;
C) Until now, the 30s delay work still does not finish work,
   so in the disk, the key still is k1, it is dirty and stale
   (its gen is smaller than the gen of the bucket). and then the
   machine power off suddenly happens;
D) When the machine power on again, after the btree reconstruction,
   the stale dirty key appear.

In bch_extent_bad(), when expensive_debug_checks is off, it would
treat the dirty key as good even it is stale keys, and it would
cause bellow probelms:
A) In read_dirty() it would cause machine crash:
   BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
B) It could be worse when reads hits stale dirty keys, it would
   read old incorrect data.

This patch tolerate the existence of these stale && dirty keys,
and treat them as bad key in bch_extent_bad().

Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
---
 drivers/md/bcache/extents.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
index 1d09674..f9eb03c 100644
--- a/drivers/md/bcache/extents.c
+++ b/drivers/md/bcache/extents.c
@@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk,
const struct bkey *k)
 {
  struct btree *b = container_of(bk, struct btree, keys);
  unsigned i, stale;
+ char buf[80];

  if (!KEY_PTRS(k) ||
      bch_extent_invalid(bk, k))
@@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys
*bk, const struct bkey *k)
  if (!ptr_available(b->c, k, i))
  return true;

- if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
- return false;
-
  for (i = 0; i < KEY_PTRS(k); i++) {
  stale = ptr_stale(b->c, k, i);

+ if (stale && KEY_DIRTY(k)) {
+ bch_extent_to_text(buf, sizeof(buf), k);
+ pr_info("stale dirty pointer, stale %u, key: %s",
+     stale,
+     buf);
+ }
+
  btree_bug_on(stale > 96, b,
       "key too stale: %i, need_gc %u",
       stale, b->c->need_gc);

- btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
-      b, "stale dirty pointer");
-
  if (stale)
  return true;

-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] bcache: treat stale && dirty keys as bad keys
  2018-12-18  6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang
@ 2018-12-18 14:01 ` Kent Overstreet
  2018-12-19  1:30   ` Junhui Tang
  2018-12-19  1:32   ` Junhui Tang
  2018-12-22 13:02 ` Coly Li
  1 sibling, 2 replies; 7+ messages in thread
From: Kent Overstreet @ 2018-12-18 14:01 UTC (permalink / raw)
  To: Junhui Tang; +Cc: colyli, linux-bcache, linux-block

On Tue, Dec 18, 2018 at 02:37:14PM +0800, Junhui Tang wrote:
> From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001
> From: Tang Junhui <tang.junhui.linux@gmail.com>
> Date: Tue, 18 Dec 2018 10:38:26 +0800
> Subject: [PATCH] bcache: treat stale && dirty keys as bad keys
> 
> Stale && dirty keys can be produced in the follow way:
> After writeback in write_dirty_finish(), dirty keys k1 will
> replace by clean keys k2
> ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
> ==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
> ==>static int bch_btree_insert_node(struct btree *b,
>        struct btree_op *op,
>        struct keylist *insert_keys,
>        atomic_t *journal_ref,
> Then two steps:
> A) update k1 to k2 in btree node memory;
>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
> B) Write the bset(contains k2) to cache disk by a 30s delay work
>    bch_btree_leaf_dirty(b, journal_ref).
> But before the 30s delay work write the bset to cache device,
> these things happend:
> A) GC works, and reclaim the bucket k2 point to;
> B) Allocator works, and invalidate the bucket k2 point to,
>    and increase the gen of the bucket, and place it into free_inc
>    fifo;
> C) Until now, the 30s delay work still does not finish work,
>    so in the disk, the key still is k1, it is dirty and stale
>    (its gen is smaller than the gen of the bucket). and then the
>    machine power off suddenly happens;
> D) When the machine power on again, after the btree reconstruction,
>    the stale dirty key appear.

Only prior to journal replay, right? Or did you uncover something more severe?

> In bch_extent_bad(), when expensive_debug_checks is off, it would
> treat the dirty key as good even it is stale keys, and it would
> cause bellow probelms:
> A) In read_dirty() it would cause machine crash:
>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> B) It could be worse when reads hits stale dirty keys, it would
>    read old incorrect data.

Neither of these can happen until after journal replay is finished. Prior to
journal replay we expect to find stale dirty keys - if we find any after journal
replay then it's indicative of a real bug.

> 
> This patch tolerate the existence of these stale && dirty keys,
> and treat them as bad key in bch_extent_bad().
> 
> Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
> ---
>  drivers/md/bcache/extents.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
> index 1d09674..f9eb03c 100644
> --- a/drivers/md/bcache/extents.c
> +++ b/drivers/md/bcache/extents.c
> @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk,
> const struct bkey *k)
>  {
>   struct btree *b = container_of(bk, struct btree, keys);
>   unsigned i, stale;
> + char buf[80];
> 
>   if (!KEY_PTRS(k) ||
>       bch_extent_invalid(bk, k))
> @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys
> *bk, const struct bkey *k)
>   if (!ptr_available(b->c, k, i))
>   return true;
> 
> - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
> - return false;
> -
>   for (i = 0; i < KEY_PTRS(k); i++) {
>   stale = ptr_stale(b->c, k, i);
> 
> + if (stale && KEY_DIRTY(k)) {
> + bch_extent_to_text(buf, sizeof(buf), k);
> + pr_info("stale dirty pointer, stale %u, key: %s",
> +     stale,
> +     buf);
> + }
> +
>   btree_bug_on(stale > 96, b,
>        "key too stale: %i, need_gc %u",
>        stale, b->c->need_gc);
> 
> - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
> -      b, "stale dirty pointer");
> -
>   if (stale)
>   return true;
> 
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bcache: treat stale && dirty keys as bad keys
  2018-12-18 14:01 ` Kent Overstreet
@ 2018-12-19  1:30   ` Junhui Tang
  2018-12-19  1:32   ` Junhui Tang
  1 sibling, 0 replies; 7+ messages in thread
From: Junhui Tang @ 2018-12-19  1:30 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: colyli, linux-bcache, linux-block

hello Kent
long long no see, glad to hear you again.
>> Then two steps:
>> A) update k1 to k2 in btree node memory;
>>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
>> B) Write the bset(contains k2) to cache disk by a 30s delay work
>>    bch_btree_leaf_dirty(b, journal_ref).
>> But before the 30s delay work write the bset to cache device,
>> these things happend:
>> A) GC works, and reclaim the bucket k2 point to;
>> B) Allocator works, and invalidate the bucket k2 point to,
>>    and increase the gen of the bucket, and place it into free_inc
>>    fifo;
>> C) Until now, the 30s delay work still does not finish work,
>>    so in the disk, the key still is k1, it is dirty and stale
>>    (its gen is smaller than the gen of the bucket). and then the
>>    machine power off suddenly happens;
>> D) When the machine power on again, after the btree reconstruction,
>>    the stale dirty key appear.

> Only prior to journal replay, right? Or did you uncover something more severe?
No, it's after the journal replay, and in write_dirty_finish(), when
replace a dirty key with
a clean key by calling bch_btree_insert(), no journal will write.

>> In bch_extent_bad(), when expensive_debug_checks is off, it would
>> treat the dirty key as good even it is stale keys, and it would
>> cause bellow probelms:
>> A) In read_dirty() it would cause machine crash:
>>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
>> B) It could be worse when reads hits stale dirty keys, it would
>>    read old incorrect data.

>Neither of these can happen until after journal replay is finished. Prior to
>journal replay we expect to find stale dirty keys - if we find any after journal
>replay then it's indicative of a real bug.
As I said previous, since no journal writes after inserting a replace key in
writeback, so this issue has nothing to do with journal.

This is a real problem in my environment, after running IO sometimes, I turn off
the power suddenly,  then turn on the power, and the machine crash in
read_dirty()
due to the stale && dirty keys.

Kent Overstreet <kent.overstreet@gmail.com> 于2018年12月18日周二 下午10:02写道：
>
> On Tue, Dec 18, 2018 at 02:37:14PM +0800, Junhui Tang wrote:
> > From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001
> > From: Tang Junhui <tang.junhui.linux@gmail.com>
> > Date: Tue, 18 Dec 2018 10:38:26 +0800
> > Subject: [PATCH] bcache: treat stale && dirty keys as bad keys
> >
> > Stale && dirty keys can be produced in the follow way:
> > After writeback in write_dirty_finish(), dirty keys k1 will
> > replace by clean keys k2
> > ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
> > ==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
> > ==>static int bch_btree_insert_node(struct btree *b,
> >        struct btree_op *op,
> >        struct keylist *insert_keys,
> >        atomic_t *journal_ref,
> > Then two steps:
> > A) update k1 to k2 in btree node memory;
> >    bch_btree_insert_keys(b, op, insert_keys, replace_key)
> > B) Write the bset(contains k2) to cache disk by a 30s delay work
> >    bch_btree_leaf_dirty(b, journal_ref).
> > But before the 30s delay work write the bset to cache device,
> > these things happend:
> > A) GC works, and reclaim the bucket k2 point to;
> > B) Allocator works, and invalidate the bucket k2 point to,
> >    and increase the gen of the bucket, and place it into free_inc
> >    fifo;
> > C) Until now, the 30s delay work still does not finish work,
> >    so in the disk, the key still is k1, it is dirty and stale
> >    (its gen is smaller than the gen of the bucket). and then the
> >    machine power off suddenly happens;
> > D) When the machine power on again, after the btree reconstruction,
> >    the stale dirty key appear.
>
> Only prior to journal replay, right? Or did you uncover something more severe?
>
> > In bch_extent_bad(), when expensive_debug_checks is off, it would
> > treat the dirty key as good even it is stale keys, and it would
> > cause bellow probelms:
> > A) In read_dirty() it would cause machine crash:
> >    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > B) It could be worse when reads hits stale dirty keys, it would
> >    read old incorrect data.
>
> Neither of these can happen until after journal replay is finished. Prior to
> journal replay we expect to find stale dirty keys - if we find any after journal
> replay then it's indicative of a real bug.
>
> >
> > This patch tolerate the existence of these stale && dirty keys,
> > and treat them as bad key in bch_extent_bad().
> >
> > Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
> > ---
> >  drivers/md/bcache/extents.c | 14 ++++++++------
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
> > index 1d09674..f9eb03c 100644
> > --- a/drivers/md/bcache/extents.c
> > +++ b/drivers/md/bcache/extents.c
> > @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk,
> > const struct bkey *k)
> >  {
> >   struct btree *b = container_of(bk, struct btree, keys);
> >   unsigned i, stale;
> > + char buf[80];
> >
> >   if (!KEY_PTRS(k) ||
> >       bch_extent_invalid(bk, k))
> > @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys
> > *bk, const struct bkey *k)
> >   if (!ptr_available(b->c, k, i))
> >   return true;
> >
> > - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
> > - return false;
> > -
> >   for (i = 0; i < KEY_PTRS(k); i++) {
> >   stale = ptr_stale(b->c, k, i);
> >
> > + if (stale && KEY_DIRTY(k)) {
> > + bch_extent_to_text(buf, sizeof(buf), k);
> > + pr_info("stale dirty pointer, stale %u, key: %s",
> > +     stale,
> > +     buf);
> > + }
> > +
> >   btree_bug_on(stale > 96, b,
> >        "key too stale: %i, need_gc %u",
> >        stale, b->c->need_gc);
> >
> > - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
> > -      b, "stale dirty pointer");
> > -
> >   if (stale)
> >   return true;
> >
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bcache: treat stale && dirty keys as bad keys
  2018-12-18 14:01 ` Kent Overstreet
  2018-12-19  1:30   ` Junhui Tang
@ 2018-12-19  1:32   ` Junhui Tang
  2018-12-19 11:32     ` Kent Overstreet
  1 sibling, 1 reply; 7+ messages in thread
From: Junhui Tang @ 2018-12-19  1:32 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: colyli, linux-bcache, linux-block

hello Kent

Long long no see, glad to hear you again.

>> Then two steps:
>> A) update k1 to k2 in btree node memory;
>>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
>> B) Write the bset(contains k2) to cache disk by a 30s delay work
>>    bch_btree_leaf_dirty(b, journal_ref).
>> But before the 30s delay work write the bset to cache device,
>> these things happend:
>> A) GC works, and reclaim the bucket k2 point to;
>> B) Allocator works, and invalidate the bucket k2 point to,
>>    and increase the gen of the bucket, and place it into free_inc
>>    fifo;
>> C) Until now, the 30s delay work still does not finish work,
>>    so in the disk, the key still is k1, it is dirty and stale
>>    (its gen is smaller than the gen of the bucket). and then the
>>    machine power off suddenly happens;
>> D) When the machine power on again, after the btree reconstruction,
>>    the stale dirty key appear.

> Only prior to journal replay, right? Or did you uncover something more severe?
No, it's after the journal replay, and in write_dirty_finish(), when
replace a dirty key with a clean key by calling bch_btree_insert(),
no journal will write.

>> In bch_extent_bad(), when expensive_debug_checks is off, it would
>> treat the dirty key as good even it is stale keys, and it would
>> cause bellow probelms:
>> A) In read_dirty() it would cause machine crash:
>>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
>> B) It could be worse when reads hits stale dirty keys, it would
>>    read old incorrect data.

>Neither of these can happen until after journal replay is finished. Prior to
>journal replay we expect to find stale dirty keys - if we find any after journal
>replay then it's indicative of a real bug.
As I said previous, since no journal writes after inserting a replace key in
writeback, so this issue has nothing to do with journal.

This is a real problem in my environment, after running IO sometimes, I turn off
the power suddenly,  then turn on the power, and the machine crash in
read_dirty() due to the stale && dirty keys.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bcache: treat stale && dirty keys as bad keys
  2018-12-19  1:32   ` Junhui Tang
@ 2018-12-19 11:32     ` Kent Overstreet
  2018-12-20  8:40       ` Junhui Tang
  0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2018-12-19 11:32 UTC (permalink / raw)
  To: Junhui Tang; +Cc: colyli, linux-bcache, linux-block

On Wed, Dec 19, 2018 at 09:32:55AM +0800, Junhui Tang wrote:
> hello Kent
> 
> Long long no see, glad to hear you again.
> 
> >> Then two steps:
> >> A) update k1 to k2 in btree node memory;
> >>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
> >> B) Write the bset(contains k2) to cache disk by a 30s delay work
> >>    bch_btree_leaf_dirty(b, journal_ref).
> >> But before the 30s delay work write the bset to cache device,
> >> these things happend:
> >> A) GC works, and reclaim the bucket k2 point to;
> >> B) Allocator works, and invalidate the bucket k2 point to,
> >>    and increase the gen of the bucket, and place it into free_inc
> >>    fifo;
> >> C) Until now, the 30s delay work still does not finish work,
> >>    so in the disk, the key still is k1, it is dirty and stale
> >>    (its gen is smaller than the gen of the bucket). and then the
> >>    machine power off suddenly happens;
> >> D) When the machine power on again, after the btree reconstruction,
> >>    the stale dirty key appear.
> 
> > Only prior to journal replay, right? Or did you uncover something more severe?
> No, it's after the journal replay, and in write_dirty_finish(), when
> replace a dirty key with a clean key by calling bch_btree_insert(),
> no journal will write.

Holy crap you're right, this was from before I moved journalling to be driven by
the btree update path.

I think a better fix here would be to journal the btree updates writeback does,
but given that we haven't been journalling those updates all this time your fix
does make sense too.

> 
> >> In bch_extent_bad(), when expensive_debug_checks is off, it would
> >> treat the dirty key as good even it is stale keys, and it would
> >> cause bellow probelms:
> >> A) In read_dirty() it would cause machine crash:
> >>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> >> B) It could be worse when reads hits stale dirty keys, it would
> >>    read old incorrect data.
> 
> >Neither of these can happen until after journal replay is finished. Prior to
> >journal replay we expect to find stale dirty keys - if we find any after journal
> >replay then it's indicative of a real bug.
> As I said previous, since no journal writes after inserting a replace key in
> writeback, so this issue has nothing to do with journal.
> 
> This is a real problem in my environment, after running IO sometimes, I turn off
> the power suddenly,  then turn on the power, and the machine crash in
> read_dirty() due to the stale && dirty keys.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bcache: treat stale && dirty keys as bad keys
  2018-12-19 11:32     ` Kent Overstreet
@ 2018-12-20  8:40       ` Junhui Tang
  0 siblings, 0 replies; 7+ messages in thread
From: Junhui Tang @ 2018-12-20  8:40 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: colyli, linux-bcache, linux-block

Kent Overstreet <kent.overstreet@gmail.com> 于2018年12月19日周三 下午7:32写道：
>
> On Wed, Dec 19, 2018 at 09:32:55AM +0800, Junhui Tang wrote:
> > hello Kent
> >
> > Long long no see, glad to hear you again.
> >
> > >> Then two steps:
> > >> A) update k1 to k2 in btree node memory;
> > >>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
> > >> B) Write the bset(contains k2) to cache disk by a 30s delay work
> > >>    bch_btree_leaf_dirty(b, journal_ref).
> > >> But before the 30s delay work write the bset to cache device,
> > >> these things happend:
> > >> A) GC works, and reclaim the bucket k2 point to;
> > >> B) Allocator works, and invalidate the bucket k2 point to,
> > >>    and increase the gen of the bucket, and place it into free_inc
> > >>    fifo;
> > >> C) Until now, the 30s delay work still does not finish work,
> > >>    so in the disk, the key still is k1, it is dirty and stale
> > >>    (its gen is smaller than the gen of the bucket). and then the
> > >>    machine power off suddenly happens;
> > >> D) When the machine power on again, after the btree reconstruction,
> > >>    the stale dirty key appear.
> >
> > > Only prior to journal replay, right? Or did you uncover something more severe?
> > No, it's after the journal replay, and in write_dirty_finish(), when
> > replace a dirty key with a clean key by calling bch_btree_insert(),
> > no journal will write.
>
> Holy crap you're right, this was from before I moved journalling to be driven by
> the btree update path.
>
> I think a better fix here would be to journal the btree updates writeback does,
> but given that we haven't been journalling those updates all this time your fix
> does make sense too.

Journal those updates would consume lots of resources, so I don't think it a
good way. And this fix is very simple.

Kent, Could you add an Acked-by or Reviewed-by for this patch? So Coly
can put it to upstream.
>
> >
> > >> In bch_extent_bad(), when expensive_debug_checks is off, it would
> > >> treat the dirty key as good even it is stale keys, and it would
> > >> cause bellow probelms:
> > >> A) In read_dirty() it would cause machine crash:
> > >>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > >> B) It could be worse when reads hits stale dirty keys, it would
> > >>    read old incorrect data.
> >
> > >Neither of these can happen until after journal replay is finished. Prior to
> > >journal replay we expect to find stale dirty keys - if we find any after journal
> > >replay then it's indicative of a real bug.
> > As I said previous, since no journal writes after inserting a replace key in
> > writeback, so this issue has nothing to do with journal.
> >
> > This is a real problem in my environment, after running IO sometimes, I turn off
> > the power suddenly,  then turn on the power, and the machine crash in
> > read_dirty() due to the stale && dirty keys.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bcache: treat stale && dirty keys as bad keys
  2018-12-18  6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang
  2018-12-18 14:01 ` Kent Overstreet
@ 2018-12-22 13:02 ` Coly Li
  1 sibling, 0 replies; 7+ messages in thread
From: Coly Li @ 2018-12-22 13:02 UTC (permalink / raw)
  To: Junhui Tang; +Cc: linux-bcache, linux-block

On 12/18/18 2:37 下午, Junhui Tang wrote:
> From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001
> From: Tang Junhui <tang.junhui.linux@gmail.com>
> Date: Tue, 18 Dec 2018 10:38:26 +0800
> Subject: [PATCH] bcache: treat stale && dirty keys as bad keys
> 
> Stale && dirty keys can be produced in the follow way:
> After writeback in write_dirty_finish(), dirty keys k1 will
> replace by clean keys k2
> ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
> ==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
> ==>static int bch_btree_insert_node(struct btree *b,
>        struct btree_op *op,
>        struct keylist *insert_keys,
>        atomic_t *journal_ref,
> Then two steps:
> A) update k1 to k2 in btree node memory;
>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
> B) Write the bset(contains k2) to cache disk by a 30s delay work
>    bch_btree_leaf_dirty(b, journal_ref).
> But before the 30s delay work write the bset to cache device,
> these things happend:
> A) GC works, and reclaim the bucket k2 point to;
> B) Allocator works, and invalidate the bucket k2 point to,
>    and increase the gen of the bucket, and place it into free_inc
>    fifo;
> C) Until now, the 30s delay work still does not finish work,
>    so in the disk, the key still is k1, it is dirty and stale
>    (its gen is smaller than the gen of the bucket). and then the
>    machine power off suddenly happens;
> D) When the machine power on again, after the btree reconstruction,
>    the stale dirty key appear.
> 
> In bch_extent_bad(), when expensive_debug_checks is off, it would
> treat the dirty key as good even it is stale keys, and it would
> cause bellow probelms:
> A) In read_dirty() it would cause machine crash:
>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> B) It could be worse when reads hits stale dirty keys, it would
>    read old incorrect data.
> 
> This patch tolerate the existence of these stale && dirty keys,
> and treat them as bad key in bch_extent_bad().
> 
> Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>

Hi Junhui,

As a quick fix, this patch is good to me. Although there is indent issue
in this patch, I will fix it, don't worry.

Now I am also testing this patch on my hardware, if the addressed
problem does not reproduce next Monday, I will submit it to Jens in 4.21
wave.

Thanks.

Coly Li


> ---
>  drivers/md/bcache/extents.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
> index 1d09674..f9eb03c 100644
> --- a/drivers/md/bcache/extents.c
> +++ b/drivers/md/bcache/extents.c
> @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk,
> const struct bkey *k)
>  {
>   struct btree *b = container_of(bk, struct btree, keys);
>   unsigned i, stale;
> + char buf[80];
> 
>   if (!KEY_PTRS(k) ||
>       bch_extent_invalid(bk, k))
> @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys
> *bk, const struct bkey *k)
>   if (!ptr_available(b->c, k, i))
>   return true;
> 
> - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
> - return false;
> -
>   for (i = 0; i < KEY_PTRS(k); i++) {
>   stale = ptr_stale(b->c, k, i);
> 
> + if (stale && KEY_DIRTY(k)) {
> + bch_extent_to_text(buf, sizeof(buf), k);
> + pr_info("stale dirty pointer, stale %u, key: %s",
> +     stale,
> +     buf);
> + }
> +
>   btree_bug_on(stale > 96, b,
>        "key too stale: %i, need_gc %u",
>        stale, b->c->need_gc);
> 
> - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
> -      b, "stale dirty pointer");
> -
>   if (stale)
>   return true;
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-12-22 17:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-18  6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang
2018-12-18 14:01 ` Kent Overstreet
2018-12-19  1:30   ` Junhui Tang
2018-12-19  1:32   ` Junhui Tang
2018-12-19 11:32     ` Kent Overstreet
2018-12-20  8:40       ` Junhui Tang
2018-12-22 13:02 ` Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).