* [PATCH 0/3] bcache-6.10 20240528
@ 2024-05-28 12:09 Coly Li
2024-05-28 12:09 ` [PATCH 1/3] bcache: allow allocator to invalidate bucket in gc Coly Li
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Coly Li @ 2024-05-28 12:09 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-bcache, Coly Li
Hi Jens,
This series is mainly for the patch from Dongsheng Yang; because it
needed more testing, it comes after the first wave.

Dongsheng's patch improves the latency of cache space allocation. His
team has shipped this patch in production for more than a year. Robert
Pang from Google reports that it has been tested inside Google on quite
a wide range of hardware configurations, and Eric Wheeler has also been
running it in his production environment for over a month.

So far no issue has been reported against this patch, except for a bug
in existing code that is easier to trigger once Dongsheng's patch is
applied. This is why my first patch is composed. My other patch is just
a code cleanup that caught my eye during the fix.

Please consider taking them, and thank you in advance.
Coly Li
---
Coly Li (2):
bcache: call force_wake_up_gc() if necessary in check_should_bypass()
bcache: code cleanup in __bch_bucket_alloc_set()
Dongsheng Yang (1):
bcache: allow allocator to invalidate bucket in gc
drivers/md/bcache/alloc.c | 21 +++++++--------------
drivers/md/bcache/bcache.h | 1 +
drivers/md/bcache/btree.c | 7 ++++++-
drivers/md/bcache/request.c | 16 +++++++++++++++-
4 files changed, 29 insertions(+), 16 deletions(-)
--
2.35.3
* [PATCH 1/3] bcache: allow allocator to invalidate bucket in gc
2024-05-28 12:09 [PATCH 0/3] bcache-6.10 20240528 Coly Li
@ 2024-05-28 12:09 ` Coly Li
2024-05-28 12:09 ` [PATCH 2/3] bcache: call force_wake_up_gc() if necessary in check_should_bypass() Coly Li
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2024-05-28 12:09 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-bcache, Dongsheng Yang, Mingzhe Zou, Coly Li
From: Dongsheng Yang <dongsheng.yang@easystack.cn>
Currently, while gc is running, if the allocator finds free_inc empty,
it has to wait until gc finishes, and until then I/O is blocked.

But in fact, some buckets are already reclaimable before gc starts, and
gc will never mark this kind of bucket unreclaimable. So we can put
these buckets into free_inc while gc is running, to avoid blocking I/O.
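The idea behind the changed condition can be modeled outside the kernel
roughly as follows. This is a simplified sketch, not the real bcache
types: can_inc_bucket_gen() is omitted, and the struct and helper names
are stand-ins for illustration only.

```c
/*
 * Minimal model of the new bch_can_invalidate_bucket() condition.
 * gc_mark_valid is false while gc runs; the reclaimable_in_gc flag is
 * a snapshot taken in btree_gc_start() before the marks are reset.
 */
#include <assert.h>
#include <stdbool.h>

#define GC_MARK_RECLAIMABLE 1

struct bucket {
	int gc_mark;            /* 0 (untouched) or GC_MARK_RECLAIMABLE */
	int pin;                /* in-flight references */
	bool reclaimable_in_gc; /* was reclaimable before gc started */
};

static bool can_invalidate(const struct bucket *b, bool gc_mark_valid)
{
	/* While gc runs (gc_mark_valid == false), only buckets that were
	 * already reclaimable before gc started may be invalidated. */
	return (gc_mark_valid || b->reclaimable_in_gc) &&
	       (b->gc_mark == 0 || b->gc_mark == GC_MARK_RECLAIMABLE) &&
	       b->pin == 0;
}
```

With this shape, the allocator no longer has to stall on gc for buckets
that gc could not make unreclaimable anyway.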
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
---
drivers/md/bcache/alloc.c | 13 +++++--------
drivers/md/bcache/bcache.h | 1 +
drivers/md/bcache/btree.c | 7 ++++++-
3 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index ce13c272c387..32a46343097d 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -129,12 +129,9 @@ static inline bool can_inc_bucket_gen(struct bucket *b)
bool bch_can_invalidate_bucket(struct cache *ca, struct bucket *b)
{
- BUG_ON(!ca->set->gc_mark_valid);
-
- return (!GC_MARK(b) ||
- GC_MARK(b) == GC_MARK_RECLAIMABLE) &&
- !atomic_read(&b->pin) &&
- can_inc_bucket_gen(b);
+ return (ca->set->gc_mark_valid || b->reclaimable_in_gc) &&
+ ((!GC_MARK(b) || GC_MARK(b) == GC_MARK_RECLAIMABLE) &&
+ !atomic_read(&b->pin) && can_inc_bucket_gen(b));
}
void __bch_invalidate_one_bucket(struct cache *ca, struct bucket *b)
@@ -148,6 +145,7 @@ void __bch_invalidate_one_bucket(struct cache *ca, struct bucket *b)
bch_inc_gen(ca, b);
b->prio = INITIAL_PRIO;
atomic_inc(&b->pin);
+ b->reclaimable_in_gc = 0;
}
static void bch_invalidate_one_bucket(struct cache *ca, struct bucket *b)
@@ -352,8 +350,7 @@ static int bch_allocator_thread(void *arg)
*/
retry_invalidate:
- allocator_wait(ca, ca->set->gc_mark_valid &&
- !ca->invalidate_needs_gc);
+ allocator_wait(ca, !ca->invalidate_needs_gc);
invalidate_buckets(ca);
/*
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 4e6afa89921f..1d33e40d26ea 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -200,6 +200,7 @@ struct bucket {
uint8_t gen;
uint8_t last_gc; /* Most out of date gen in the btree */
uint16_t gc_mark; /* Bitfield used by GC. See below for field */
+ uint16_t reclaimable_in_gc:1;
};
/*
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index d011a7154d33..4e6ccf2c8a0b 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1741,18 +1741,20 @@ static void btree_gc_start(struct cache_set *c)
mutex_lock(&c->bucket_lock);
- c->gc_mark_valid = 0;
c->gc_done = ZERO_KEY;
ca = c->cache;
for_each_bucket(b, ca) {
b->last_gc = b->gen;
+ if (bch_can_invalidate_bucket(ca, b))
+ b->reclaimable_in_gc = 1;
if (!atomic_read(&b->pin)) {
SET_GC_MARK(b, 0);
SET_GC_SECTORS_USED(b, 0);
}
}
+ c->gc_mark_valid = 0;
mutex_unlock(&c->bucket_lock);
}
@@ -1809,6 +1811,9 @@ static void bch_btree_gc_finish(struct cache_set *c)
for_each_bucket(b, ca) {
c->need_gc = max(c->need_gc, bucket_gc_gen(b));
+ if (b->reclaimable_in_gc)
+ b->reclaimable_in_gc = 0;
+
if (atomic_read(&b->pin))
continue;
--
2.35.3
* [PATCH 2/3] bcache: call force_wake_up_gc() if necessary in check_should_bypass()
2024-05-28 12:09 [PATCH 0/3] bcache-6.10 20240528 Coly Li
2024-05-28 12:09 ` [PATCH 1/3] bcache: allow allocator to invalidate bucket in gc Coly Li
@ 2024-05-28 12:09 ` Coly Li
2024-05-28 12:09 ` [PATCH 3/3] bcache: code cleanup in __bch_bucket_alloc_set() Coly Li
2024-05-28 12:57 ` [PATCH 0/3] bcache-6.10 20240528 Jens Axboe
3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2024-05-28 12:09 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-bcache, Coly Li
If extremely heavy write I/O continuously hits a relatively small cache
device (512GB in my testing), the counter c->gc_stats.in_use can keep
increasing until it exceeds CUTOFF_CACHE_ADD.

Once 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all subsequent
write requests bypass the cache device because check_should_bypass()
returns 'true'. Since all writes bypass the cache, the counter
c->sectors_to_gc never has a chance to go negative, so the garbage
collection thread is not woken up even after writeback completes and
the whole cache becomes clean. The aftermath is that all write I/O goes
directly to the backing device even though the cache device is clean.

To avoid this situation, this patch takes a quite conservative
approach: when 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, the
garbage collection thread is woken up only if the whole cache device is
clean.

Before the fix, the writes-always-bypass situation occurred after 10+
hours of write I/O pressure on 512GB Intel Optane memory acting as the
cache device. With this fix, the situation did not occur after 36+
hours of testing.
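The changed decision can be sketched as a small model like the one
below. The struct and the gc_woken flag are hypothetical stand-ins for
the real cache_set state and force_wake_up_gc(); only the control flow
mirrors the patch.

```c
/*
 * Simplified model of the changed bypass decision in
 * check_should_bypass(): above the cutoff every write bypasses, so gc
 * must be woken explicitly once the cached data is all clean.
 */
#include <assert.h>
#include <stdbool.h>

#define CUTOFF_CACHE_ADD 95 /* percent in use; illustrative value */

struct cache_state {
	int in_use;         /* models c->gc_stats.in_use */
	bool bdev_clean;    /* models BDEV_STATE(&dc->sb) == BDEV_STATE_CLEAN */
	bool gc_mark_valid; /* gc is not currently running */
	bool gc_woken;      /* records whether force_wake_up_gc() ran */
};

/* Returns true when the write should bypass the cache device. */
static bool should_bypass(struct cache_state *c)
{
	if (c->in_use > CUTOFF_CACHE_ADD) {
		/* All writes bypass, so c->sectors_to_gc can never go
		 * negative: wake gc once the cache is clean. */
		if (c->bdev_clean && c->gc_mark_valid)
			c->gc_woken = true; /* stands in for force_wake_up_gc() */
		return true;
	}
	return false;
}
```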
Signed-off-by: Coly Li <colyli@suse.de>
---
drivers/md/bcache/request.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 83d112bd2b1c..af345dc6fde1 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -369,10 +369,24 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
struct io *i;
if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
- c->gc_stats.in_use > CUTOFF_CACHE_ADD ||
(bio_op(bio) == REQ_OP_DISCARD))
goto skip;
+ if (c->gc_stats.in_use > CUTOFF_CACHE_ADD) {
+ /*
+ * If cached buckets are all clean now, 'true' will be
+ * returned and all requests will bypass the cache device.
+ * Then c->sectors_to_gc has no chance to be negative, and
+ * gc thread won't wake up and caching won't work forever.
+ * Here call force_wake_up_gc() to avoid such aftermath.
+ */
+ if (BDEV_STATE(&dc->sb) == BDEV_STATE_CLEAN &&
+ c->gc_mark_valid)
+ force_wake_up_gc(c);
+
+ goto skip;
+ }
+
if (mode == CACHE_MODE_NONE ||
(mode == CACHE_MODE_WRITEAROUND &&
op_is_write(bio_op(bio))))
--
2.35.3
* [PATCH 3/3] bcache: code cleanup in __bch_bucket_alloc_set()
2024-05-28 12:09 [PATCH 0/3] bcache-6.10 20240528 Coly Li
2024-05-28 12:09 ` [PATCH 1/3] bcache: allow allocator to invalidate bucket in gc Coly Li
2024-05-28 12:09 ` [PATCH 2/3] bcache: call force_wake_up_gc() if necessary in check_should_bypass() Coly Li
@ 2024-05-28 12:09 ` Coly Li
2024-05-28 12:57 ` [PATCH 0/3] bcache-6.10 20240528 Jens Axboe
3 siblings, 0 replies; 5+ messages in thread
From: Coly Li @ 2024-05-28 12:09 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-bcache, Coly Li
In __bch_bucket_alloc_set(), the lines after label 'err:' no longer do
anything useful since multiple-cache-device support was removed from
the bcache code. This cleanup patch drops the useless code to save a
few CPU cycles.
Signed-off-by: Coly Li <colyli@suse.de>
---
drivers/md/bcache/alloc.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 32a46343097d..48ce750bf70a 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -498,8 +498,8 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned int reserve,
ca = c->cache;
b = bch_bucket_alloc(ca, reserve, wait);
- if (b == -1)
- goto err;
+ if (b < 0)
+ return -1;
k->ptr[0] = MAKE_PTR(ca->buckets[b].gen,
bucket_to_sector(c, b),
@@ -508,10 +508,6 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned int reserve,
SET_KEY_PTRS(k, 1);
return 0;
-err:
- bch_bucket_free(c, k);
- bkey_put(c, k);
- return -1;
}
int bch_bucket_alloc_set(struct cache_set *c, unsigned int reserve,
--
2.35.3
* Re: [PATCH 0/3] bcache-6.10 20240528
2024-05-28 12:09 [PATCH 0/3] bcache-6.10 20240528 Coly Li
` (2 preceding siblings ...)
2024-05-28 12:09 ` [PATCH 3/3] bcache: code cleanup in __bch_bucket_alloc_set() Coly Li
@ 2024-05-28 12:57 ` Jens Axboe
3 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2024-05-28 12:57 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, linux-bcache
On Tue, 28 May 2024 20:09:11 +0800, Coly Li wrote:
> This series is just for the patch from Dongsheng Yang, due to more
> testing needed, it comes after the first wave.
>
> Dongsheng's patch helps to improve the latency of cache space
> allocation. This patch has been shipped in product more than one year
> by his team. Robert Pang from Google introduces this patch has been
> tested inside Google with quite extended hardware configurations. Eric
> Wheeler also deploys this patch in his production for 1+ months.
>
> [...]
Applied, thanks!
[1/3] bcache: allow allocator to invalidate bucket in gc
commit: a14a68b76954e73031ca6399abace17dcb77c17a
[2/3] bcache: call force_wake_up_gc() if necessary in check_should_bypass()
commit: 05356938a4be356adde4eab4425c6822f3c7d706
[3/3] bcache: code cleanup in __bch_bucket_alloc_set()
commit: 74d4ce92e08d5669d66fd890403724faa4286c21
Best regards,
--
Jens Axboe