* [PATCH] bcache: treat stale && dirty keys as bad keys
@ 2018-12-18 6:37 Junhui Tang
2018-12-18 14:01 ` Kent Overstreet
2018-12-22 13:02 ` Coly Li
0 siblings, 2 replies; 7+ messages in thread
From: Junhui Tang @ 2018-12-18 6:37 UTC (permalink / raw)
To: colyli; +Cc: linux-bcache, linux-block
From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001
From: Tang Junhui <tang.junhui.linux@gmail.com>
Date: Tue, 18 Dec 2018 10:38:26 +0800
Subject: [PATCH] bcache: treat stale && dirty keys as bad keys
Stale && dirty keys can be produced in the follow way:
After writeback in write_dirty_finish(), dirty keys k1 will
replace by clean keys k2
==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
==>static int bch_btree_insert_node(struct btree *b,
struct btree_op *op,
struct keylist *insert_keys,
atomic_t *journal_ref,
Then two steps:
A) update k1 to k2 in btree node memory;
bch_btree_insert_keys(b, op, insert_keys, replace_key)
B) Write the bset(contains k2) to cache disk by a 30s delay work
bch_btree_leaf_dirty(b, journal_ref).
But before the 30s delay work write the bset to cache device,
these things happend:
A) GC works, and reclaim the bucket k2 point to;
B) Allocator works, and invalidate the bucket k2 point to,
and increase the gen of the bucket, and place it into free_inc
fifo;
C) Until now, the 30s delay work still does not finish work,
so in the disk, the key still is k1, it is dirty and stale
(its gen is smaller than the gen of the bucket). and then the
machine power off suddenly happens;
D) When the machine power on again, after the btree reconstruction,
the stale dirty key appear.
In bch_extent_bad(), when expensive_debug_checks is off, it would
treat the dirty key as good even it is stale keys, and it would
cause bellow probelms:
A) In read_dirty() it would cause machine crash:
BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
B) It could be worse when reads hits stale dirty keys, it would
read old incorrect data.
This patch tolerate the existence of these stale && dirty keys,
and treat them as bad key in bch_extent_bad().
Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
---
drivers/md/bcache/extents.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
index 1d09674..f9eb03c 100644
--- a/drivers/md/bcache/extents.c
+++ b/drivers/md/bcache/extents.c
@@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk,
const struct bkey *k)
{
struct btree *b = container_of(bk, struct btree, keys);
unsigned i, stale;
+ char buf[80];
if (!KEY_PTRS(k) ||
bch_extent_invalid(bk, k))
@@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys
*bk, const struct bkey *k)
if (!ptr_available(b->c, k, i))
return true;
- if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
- return false;
-
for (i = 0; i < KEY_PTRS(k); i++) {
stale = ptr_stale(b->c, k, i);
+ if (stale && KEY_DIRTY(k)) {
+ bch_extent_to_text(buf, sizeof(buf), k);
+ pr_info("stale dirty pointer, stale %u, key: %s",
+ stale,
+ buf);
+ }
+
btree_bug_on(stale > 96, b,
"key too stale: %i, need_gc %u",
stale, b->c->need_gc);
- btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
- b, "stale dirty pointer");
-
if (stale)
return true;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] bcache: treat stale && dirty keys as bad keys 2018-12-18 6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang @ 2018-12-18 14:01 ` Kent Overstreet 2018-12-19 1:30 ` Junhui Tang 2018-12-19 1:32 ` Junhui Tang 2018-12-22 13:02 ` Coly Li 1 sibling, 2 replies; 7+ messages in thread From: Kent Overstreet @ 2018-12-18 14:01 UTC (permalink / raw) To: Junhui Tang; +Cc: colyli, linux-bcache, linux-block On Tue, Dec 18, 2018 at 02:37:14PM +0800, Junhui Tang wrote: > From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001 > From: Tang Junhui <tang.junhui.linux@gmail.com> > Date: Tue, 18 Dec 2018 10:38:26 +0800 > Subject: [PATCH] bcache: treat stale && dirty keys as bad keys > > Stale && dirty keys can be produced in the follow way: > After writeback in write_dirty_finish(), dirty keys k1 will > replace by clean keys k2 > ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key); > ==>btree_insert_fn(struct btree_op *b_op, struct btree *b) > ==>static int bch_btree_insert_node(struct btree *b, > struct btree_op *op, > struct keylist *insert_keys, > atomic_t *journal_ref, > Then two steps: > A) update k1 to k2 in btree node memory; > bch_btree_insert_keys(b, op, insert_keys, replace_key) > B) Write the bset(contains k2) to cache disk by a 30s delay work > bch_btree_leaf_dirty(b, journal_ref). > But before the 30s delay work write the bset to cache device, > these things happend: > A) GC works, and reclaim the bucket k2 point to; > B) Allocator works, and invalidate the bucket k2 point to, > and increase the gen of the bucket, and place it into free_inc > fifo; > C) Until now, the 30s delay work still does not finish work, > so in the disk, the key still is k1, it is dirty and stale > (its gen is smaller than the gen of the bucket). and then the > machine power off suddenly happens; > D) When the machine power on again, after the btree reconstruction, > the stale dirty key appear. Only prior to journal replay, right? Or did you uncover something more severe? > In bch_extent_bad(), when expensive_debug_checks is off, it would > treat the dirty key as good even it is stale keys, and it would > cause bellow probelms: > A) In read_dirty() it would cause machine crash: > BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > B) It could be worse when reads hits stale dirty keys, it would > read old incorrect data. Neither of these can happen until after journal replay is finished. Prior to journal replay we expect to find stale dirty keys - if we find any after journal replay then it's indicative of a real bug. > > This patch tolerate the existence of these stale && dirty keys, > and treat them as bad key in bch_extent_bad(). > > Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com> > --- > drivers/md/bcache/extents.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c > index 1d09674..f9eb03c 100644 > --- a/drivers/md/bcache/extents.c > +++ b/drivers/md/bcache/extents.c > @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk, > const struct bkey *k) > { > struct btree *b = container_of(bk, struct btree, keys); > unsigned i, stale; > + char buf[80]; > > if (!KEY_PTRS(k) || > bch_extent_invalid(bk, k)) > @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys > *bk, const struct bkey *k) > if (!ptr_available(b->c, k, i)) > return true; > > - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k)) > - return false; > - > for (i = 0; i < KEY_PTRS(k); i++) { > stale = ptr_stale(b->c, k, i); > > + if (stale && KEY_DIRTY(k)) { > + bch_extent_to_text(buf, sizeof(buf), k); > + pr_info("stale dirty pointer, stale %u, key: %s", > + stale, > + buf); > + } > + > btree_bug_on(stale > 96, b, > "key too stale: %i, need_gc %u", > stale, b->c->need_gc); > > - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k), > - b, "stale dirty pointer"); > - > if (stale) > return true; > > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bcache: treat stale && dirty keys as bad keys 2018-12-18 14:01 ` Kent Overstreet @ 2018-12-19 1:30 ` Junhui Tang 2018-12-19 1:32 ` Junhui Tang 1 sibling, 0 replies; 7+ messages in thread From: Junhui Tang @ 2018-12-19 1:30 UTC (permalink / raw) To: Kent Overstreet; +Cc: colyli, linux-bcache, linux-block hello Kent long long no see, glad to hear you again. >> Then two steps: >> A) update k1 to k2 in btree node memory; >> bch_btree_insert_keys(b, op, insert_keys, replace_key) >> B) Write the bset(contains k2) to cache disk by a 30s delay work >> bch_btree_leaf_dirty(b, journal_ref). >> But before the 30s delay work write the bset to cache device, >> these things happend: >> A) GC works, and reclaim the bucket k2 point to; >> B) Allocator works, and invalidate the bucket k2 point to, >> and increase the gen of the bucket, and place it into free_inc >> fifo; >> C) Until now, the 30s delay work still does not finish work, >> so in the disk, the key still is k1, it is dirty and stale >> (its gen is smaller than the gen of the bucket). and then the >> machine power off suddenly happens; >> D) When the machine power on again, after the btree reconstruction, >> the stale dirty key appear. > Only prior to journal replay, right? Or did you uncover something more severe? No, it's after the journal replay, and in write_dirty_finish(), when replace a dirty key with a clean key by calling bch_btree_insert(), no journal will write. >> In bch_extent_bad(), when expensive_debug_checks is off, it would >> treat the dirty key as good even it is stale keys, and it would >> cause bellow probelms: >> A) In read_dirty() it would cause machine crash: >> BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); >> B) It could be worse when reads hits stale dirty keys, it would >> read old incorrect data. >Neither of these can happen until after journal replay is finished. Prior to >journal replay we expect to find stale dirty keys - if we find any after journal >replay then it's indicative of a real bug. As I said previous, since no journal writes after inserting a replace key in writeback, so this issue has nothing to do with journal. This is a real problem in my environment, after running IO sometimes, I turn off the power suddenly, then turn on the power, and the machine crash in read_dirty() due to the stale && dirty keys. Kent Overstreet <kent.overstreet@gmail.com> 于2018年12月18日周二 下午10:02写道: > > On Tue, Dec 18, 2018 at 02:37:14PM +0800, Junhui Tang wrote: > > From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001 > > From: Tang Junhui <tang.junhui.linux@gmail.com> > > Date: Tue, 18 Dec 2018 10:38:26 +0800 > > Subject: [PATCH] bcache: treat stale && dirty keys as bad keys > > > > Stale && dirty keys can be produced in the follow way: > > After writeback in write_dirty_finish(), dirty keys k1 will > > replace by clean keys k2 > > ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key); > > ==>btree_insert_fn(struct btree_op *b_op, struct btree *b) > > ==>static int bch_btree_insert_node(struct btree *b, > > struct btree_op *op, > > struct keylist *insert_keys, > > atomic_t *journal_ref, > > Then two steps: > > A) update k1 to k2 in btree node memory; > > bch_btree_insert_keys(b, op, insert_keys, replace_key) > > B) Write the bset(contains k2) to cache disk by a 30s delay work > > bch_btree_leaf_dirty(b, journal_ref). > > But before the 30s delay work write the bset to cache device, > > these things happend: > > A) GC works, and reclaim the bucket k2 point to; > > B) Allocator works, and invalidate the bucket k2 point to, > > and increase the gen of the bucket, and place it into free_inc > > fifo; > > C) Until now, the 30s delay work still does not finish work, > > so in the disk, the key still is k1, it is dirty and stale > > (its gen is smaller than the gen of the bucket). and then the > > machine power off suddenly happens; > > D) When the machine power on again, after the btree reconstruction, > > the stale dirty key appear. > > Only prior to journal replay, right? Or did you uncover something more severe? > > > In bch_extent_bad(), when expensive_debug_checks is off, it would > > treat the dirty key as good even it is stale keys, and it would > > cause bellow probelms: > > A) In read_dirty() it would cause machine crash: > > BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > > B) It could be worse when reads hits stale dirty keys, it would > > read old incorrect data. > > Neither of these can happen until after journal replay is finished. Prior to > journal replay we expect to find stale dirty keys - if we find any after journal > replay then it's indicative of a real bug. > > > > > This patch tolerate the existence of these stale && dirty keys, > > and treat them as bad key in bch_extent_bad(). > > > > Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com> > > --- > > drivers/md/bcache/extents.c | 14 ++++++++------ > > 1 file changed, 8 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c > > index 1d09674..f9eb03c 100644 > > --- a/drivers/md/bcache/extents.c > > +++ b/drivers/md/bcache/extents.c > > @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk, > > const struct bkey *k) > > { > > struct btree *b = container_of(bk, struct btree, keys); > > unsigned i, stale; > > + char buf[80]; > > > > if (!KEY_PTRS(k) || > > bch_extent_invalid(bk, k)) > > @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys > > *bk, const struct bkey *k) > > if (!ptr_available(b->c, k, i)) > > return true; > > > > - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k)) > > - return false; > > - > > for (i = 0; i < KEY_PTRS(k); i++) { > > stale = ptr_stale(b->c, k, i); > > > > + if (stale && KEY_DIRTY(k)) { > > + bch_extent_to_text(buf, sizeof(buf), k); > > + pr_info("stale dirty pointer, stale %u, key: %s", > > + stale, > > + buf); > > + } > > + > > btree_bug_on(stale > 96, b, > > "key too stale: %i, need_gc %u", > > stale, b->c->need_gc); > > > > - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k), > > - b, "stale dirty pointer"); > > - > > if (stale) > > return true; > > > > -- > > 1.8.3.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bcache: treat stale && dirty keys as bad keys 2018-12-18 14:01 ` Kent Overstreet 2018-12-19 1:30 ` Junhui Tang @ 2018-12-19 1:32 ` Junhui Tang 2018-12-19 11:32 ` Kent Overstreet 1 sibling, 1 reply; 7+ messages in thread From: Junhui Tang @ 2018-12-19 1:32 UTC (permalink / raw) To: Kent Overstreet; +Cc: colyli, linux-bcache, linux-block hello Kent Long long no see, glad to hear you again. >> Then two steps: >> A) update k1 to k2 in btree node memory; >> bch_btree_insert_keys(b, op, insert_keys, replace_key) >> B) Write the bset(contains k2) to cache disk by a 30s delay work >> bch_btree_leaf_dirty(b, journal_ref). >> But before the 30s delay work write the bset to cache device, >> these things happend: >> A) GC works, and reclaim the bucket k2 point to; >> B) Allocator works, and invalidate the bucket k2 point to, >> and increase the gen of the bucket, and place it into free_inc >> fifo; >> C) Until now, the 30s delay work still does not finish work, >> so in the disk, the key still is k1, it is dirty and stale >> (its gen is smaller than the gen of the bucket). and then the >> machine power off suddenly happens; >> D) When the machine power on again, after the btree reconstruction, >> the stale dirty key appear. > Only prior to journal replay, right? Or did you uncover something more severe? No, it's after the journal replay, and in write_dirty_finish(), when replace a dirty key with a clean key by calling bch_btree_insert(), no journal will write. >> In bch_extent_bad(), when expensive_debug_checks is off, it would >> treat the dirty key as good even it is stale keys, and it would >> cause bellow probelms: >> A) In read_dirty() it would cause machine crash: >> BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); >> B) It could be worse when reads hits stale dirty keys, it would >> read old incorrect data. >Neither of these can happen until after journal replay is finished. Prior to >journal replay we expect to find stale dirty keys - if we find any after journal >replay then it's indicative of a real bug. As I said previous, since no journal writes after inserting a replace key in writeback, so this issue has nothing to do with journal. This is a real problem in my environment, after running IO sometimes, I turn off the power suddenly, then turn on the power, and the machine crash in read_dirty() due to the stale && dirty keys. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bcache: treat stale && dirty keys as bad keys 2018-12-19 1:32 ` Junhui Tang @ 2018-12-19 11:32 ` Kent Overstreet 2018-12-20 8:40 ` Junhui Tang 0 siblings, 1 reply; 7+ messages in thread From: Kent Overstreet @ 2018-12-19 11:32 UTC (permalink / raw) To: Junhui Tang; +Cc: colyli, linux-bcache, linux-block On Wed, Dec 19, 2018 at 09:32:55AM +0800, Junhui Tang wrote: > hello Kent > > Long long no see, glad to hear you again. > > >> Then two steps: > >> A) update k1 to k2 in btree node memory; > >> bch_btree_insert_keys(b, op, insert_keys, replace_key) > >> B) Write the bset(contains k2) to cache disk by a 30s delay work > >> bch_btree_leaf_dirty(b, journal_ref). > >> But before the 30s delay work write the bset to cache device, > >> these things happend: > >> A) GC works, and reclaim the bucket k2 point to; > >> B) Allocator works, and invalidate the bucket k2 point to, > >> and increase the gen of the bucket, and place it into free_inc > >> fifo; > >> C) Until now, the 30s delay work still does not finish work, > >> so in the disk, the key still is k1, it is dirty and stale > >> (its gen is smaller than the gen of the bucket). and then the > >> machine power off suddenly happens; > >> D) When the machine power on again, after the btree reconstruction, > >> the stale dirty key appear. > > > Only prior to journal replay, right? Or did you uncover something more severe? > No, it's after the journal replay, and in write_dirty_finish(), when > replace a dirty key with a clean key by calling bch_btree_insert(), > no journal will write. Holy crap you're right, this was from before I moved journalling to be driven by the btree update path. I think a better fix here would be to journal the btree updates writeback does, but given that we haven't been journalling those updates all this time your fix does make sense too. > > >> In bch_extent_bad(), when expensive_debug_checks is off, it would > >> treat the dirty key as good even it is stale keys, and it would > >> cause bellow probelms: > >> A) In read_dirty() it would cause machine crash: > >> BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > >> B) It could be worse when reads hits stale dirty keys, it would > >> read old incorrect data. > > >Neither of these can happen until after journal replay is finished. Prior to > >journal replay we expect to find stale dirty keys - if we find any after journal > >replay then it's indicative of a real bug. > As I said previous, since no journal writes after inserting a replace key in > writeback, so this issue has nothing to do with journal. > > This is a real problem in my environment, after running IO sometimes, I turn off > the power suddenly, then turn on the power, and the machine crash in > read_dirty() due to the stale && dirty keys. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bcache: treat stale && dirty keys as bad keys 2018-12-19 11:32 ` Kent Overstreet @ 2018-12-20 8:40 ` Junhui Tang 0 siblings, 0 replies; 7+ messages in thread From: Junhui Tang @ 2018-12-20 8:40 UTC (permalink / raw) To: Kent Overstreet; +Cc: colyli, linux-bcache, linux-block Kent Overstreet <kent.overstreet@gmail.com> 于2018年12月19日周三 下午7:32写道: > > On Wed, Dec 19, 2018 at 09:32:55AM +0800, Junhui Tang wrote: > > hello Kent > > > > Long long no see, glad to hear you again. > > > > >> Then two steps: > > >> A) update k1 to k2 in btree node memory; > > >> bch_btree_insert_keys(b, op, insert_keys, replace_key) > > >> B) Write the bset(contains k2) to cache disk by a 30s delay work > > >> bch_btree_leaf_dirty(b, journal_ref). > > >> But before the 30s delay work write the bset to cache device, > > >> these things happend: > > >> A) GC works, and reclaim the bucket k2 point to; > > >> B) Allocator works, and invalidate the bucket k2 point to, > > >> and increase the gen of the bucket, and place it into free_inc > > >> fifo; > > >> C) Until now, the 30s delay work still does not finish work, > > >> so in the disk, the key still is k1, it is dirty and stale > > >> (its gen is smaller than the gen of the bucket). and then the > > >> machine power off suddenly happens; > > >> D) When the machine power on again, after the btree reconstruction, > > >> the stale dirty key appear. > > > > > Only prior to journal replay, right? Or did you uncover something more severe? > > No, it's after the journal replay, and in write_dirty_finish(), when > > replace a dirty key with a clean key by calling bch_btree_insert(), > > no journal will write. > > Holy crap you're right, this was from before I moved journalling to be driven by > the btree update path. > > I think a better fix here would be to journal the btree updates writeback does, > but given that we haven't been journalling those updates all this time your fix > does make sense too. Journal those updates would consume lots of resources, so I don't think it a good way. And this fix is very simple. Kent, Could you add an Acked-by or Reviewed-by for this patch? So Coly can put it to upstream. > > > > > >> In bch_extent_bad(), when expensive_debug_checks is off, it would > > >> treat the dirty key as good even it is stale keys, and it would > > >> cause bellow probelms: > > >> A) In read_dirty() it would cause machine crash: > > >> BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > > >> B) It could be worse when reads hits stale dirty keys, it would > > >> read old incorrect data. > > > > >Neither of these can happen until after journal replay is finished. Prior to > > >journal replay we expect to find stale dirty keys - if we find any after journal > > >replay then it's indicative of a real bug. > > As I said previous, since no journal writes after inserting a replace key in > > writeback, so this issue has nothing to do with journal. > > > > This is a real problem in my environment, after running IO sometimes, I turn off > > the power suddenly, then turn on the power, and the machine crash in > > read_dirty() due to the stale && dirty keys. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bcache: treat stale && dirty keys as bad keys 2018-12-18 6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang 2018-12-18 14:01 ` Kent Overstreet @ 2018-12-22 13:02 ` Coly Li 1 sibling, 0 replies; 7+ messages in thread From: Coly Li @ 2018-12-22 13:02 UTC (permalink / raw) To: Junhui Tang; +Cc: linux-bcache, linux-block On 12/18/18 2:37 下午, Junhui Tang wrote: > From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001 > From: Tang Junhui <tang.junhui.linux@gmail.com> > Date: Tue, 18 Dec 2018 10:38:26 +0800 > Subject: [PATCH] bcache: treat stale && dirty keys as bad keys > > Stale && dirty keys can be produced in the follow way: > After writeback in write_dirty_finish(), dirty keys k1 will > replace by clean keys k2 > ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key); > ==>btree_insert_fn(struct btree_op *b_op, struct btree *b) > ==>static int bch_btree_insert_node(struct btree *b, > struct btree_op *op, > struct keylist *insert_keys, > atomic_t *journal_ref, > Then two steps: > A) update k1 to k2 in btree node memory; > bch_btree_insert_keys(b, op, insert_keys, replace_key) > B) Write the bset(contains k2) to cache disk by a 30s delay work > bch_btree_leaf_dirty(b, journal_ref). > But before the 30s delay work write the bset to cache device, > these things happend: > A) GC works, and reclaim the bucket k2 point to; > B) Allocator works, and invalidate the bucket k2 point to, > and increase the gen of the bucket, and place it into free_inc > fifo; > C) Until now, the 30s delay work still does not finish work, > so in the disk, the key still is k1, it is dirty and stale > (its gen is smaller than the gen of the bucket). and then the > machine power off suddenly happens; > D) When the machine power on again, after the btree reconstruction, > the stale dirty key appear. > > In bch_extent_bad(), when expensive_debug_checks is off, it would > treat the dirty key as good even it is stale keys, and it would > cause bellow probelms: > A) In read_dirty() it would cause machine crash: > BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > B) It could be worse when reads hits stale dirty keys, it would > read old incorrect data. > > This patch tolerate the existence of these stale && dirty keys, > and treat them as bad key in bch_extent_bad(). > > Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com> Hi Junhui, As a quick fix, this patch is good to me. Although there is indent issue in this patch, I will fix it, don't worry. Now I am also testing this patch on my hardware, if the addressed problem does not reproduce next Monday, I will submit it to Jens in 4.21 wave. Thanks. Coly Li > --- > drivers/md/bcache/extents.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c > index 1d09674..f9eb03c 100644 > --- a/drivers/md/bcache/extents.c > +++ b/drivers/md/bcache/extents.c > @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk, > const struct bkey *k) > { > struct btree *b = container_of(bk, struct btree, keys); > unsigned i, stale; > + char buf[80]; > > if (!KEY_PTRS(k) || > bch_extent_invalid(bk, k)) > @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys > *bk, const struct bkey *k) > if (!ptr_available(b->c, k, i)) > return true; > > - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k)) > - return false; > - > for (i = 0; i < KEY_PTRS(k); i++) { > stale = ptr_stale(b->c, k, i); > > + if (stale && KEY_DIRTY(k)) { > + bch_extent_to_text(buf, sizeof(buf), k); > + pr_info("stale dirty pointer, stale %u, key: %s", > + stale, > + buf); > + } > + > btree_bug_on(stale > 96, b, > "key too stale: %i, need_gc %u", > stale, b->c->need_gc); > > - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k), > - b, "stale dirty pointer"); > - > if (stale) > return true; > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-12-22 17:23 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-12-18 6:37 [PATCH] bcache: treat stale && dirty keys as bad keys Junhui Tang 2018-12-18 14:01 ` Kent Overstreet 2018-12-19 1:30 ` Junhui Tang 2018-12-19 1:32 ` Junhui Tang 2018-12-19 11:32 ` Kent Overstreet 2018-12-20 8:40 ` Junhui Tang 2018-12-22 13:02 ` Coly Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).