* [PATCH 1/3] bcache: fix NULL pointer in cache_set_flush()
2025-05-27 5:15 [PATCH 0/3] bcache-6.16-20250527 colyli
@ 2025-05-27 5:15 ` colyli
2025-05-27 5:16 ` [PATCH 2/3] bcache: remove unused constants colyli
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: colyli @ 2025-05-27 5:15 UTC (permalink / raw)
To: axboe; +Cc: linux-bcache, linux-block, Linggang Zeng, Mingzhe Zou, Coly Li
From: Linggang Zeng <linggang.zeng@easystack.cn>
1. LINE#1794 - LINE#1887 is some codes about function of
bch_cache_set_alloc().
2. LINE#2078 - LINE#2142 is some codes about function of
register_cache_set().
3. register_cache_set() will call bch_cache_set_alloc() in LINE#2098.
1794 struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
1795 {
...
1860 if (!(c->devices = kcalloc(c->nr_uuids, sizeof(void *), GFP_KERNEL)) ||
1861 mempool_init_slab_pool(&c->search, 32, bch_search_cache) ||
1862 mempool_init_kmalloc_pool(&c->bio_meta, 2,
1863 sizeof(struct bbio) + sizeof(struct bio_vec) *
1864 bucket_pages(c)) ||
1865 mempool_init_kmalloc_pool(&c->fill_iter, 1, iter_size) ||
1866 bioset_init(&c->bio_split, 4, offsetof(struct bbio, bio),
1867 BIOSET_NEED_BVECS|BIOSET_NEED_RESCUER) ||
1868 !(c->uuids = alloc_bucket_pages(GFP_KERNEL, c)) ||
1869 !(c->moving_gc_wq = alloc_workqueue("bcache_gc",
1870 WQ_MEM_RECLAIM, 0)) ||
1871 bch_journal_alloc(c) ||
1872 bch_btree_cache_alloc(c) ||
1873 bch_open_buckets_alloc(c) ||
1874 bch_bset_sort_state_init(&c->sort, ilog2(c->btree_pages)))
1875 goto err;
^^^^^^^^
1876
...
1883 return c;
1884 err:
1885 bch_cache_set_unregister(c);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
1886 return NULL;
1887 }
...
2078 static const char *register_cache_set(struct cache *ca)
2079 {
...
2098 c = bch_cache_set_alloc(&ca->sb);
2099 if (!c)
2100 return err;
^^^^^^^^^^
...
2128 ca->set = c;
2129 ca->set->cache[ca->sb.nr_this_dev] = ca;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
2138 return NULL;
2139 err:
2140 bch_cache_set_unregister(c);
2141 return err;
2142 }
(1) If LINE#1860 - LINE#1874 is true, then do 'goto err'(LINE#1875) and
call bch_cache_set_unregister()(LINE#1885).
(2) As (1) return NULL(LINE#1886), LINE#2098 - LINE#2100 would return.
(3) As (2) has returned, LINE#2128 - LINE#2129 would do *not* give the
value to c->cache[], it means that c->cache[] is NULL.
LINE#1624 - LINE#1665 is some codes about function of cache_set_flush().
As (1), in LINE#1885 call
bch_cache_set_unregister()
---> bch_cache_set_stop()
---> closure_queue()
-.-> cache_set_flush() (as below LINE#1624)
1624 static void cache_set_flush(struct closure *cl)
1625 {
...
1654 for_each_cache(ca, c, i)
1655 if (ca->alloc_thread)
^^
1656 kthread_stop(ca->alloc_thread);
...
1665 }
(4) In LINE#1655 ca is NULL(see (3)) in cache_set_flush() then the
kernel crash occurred as below:
[ 846.712887] bcache: register_cache() error drbd6: cannot allocate memory
[ 846.713242] bcache: register_bcache() error : failed to register device
[ 846.713336] bcache: cache_set_free() Cache set 2f84bdc1-498a-4f2f-98a7-01946bf54287 unregistered
[ 846.713768] BUG: unable to handle kernel NULL pointer dereference at 00000000000009f8
[ 846.714790] PGD 0 P4D 0
[ 846.715129] Oops: 0000 [#1] SMP PTI
[ 846.715472] CPU: 19 PID: 5057 Comm: kworker/19:16 Kdump: loaded Tainted: G OE --------- - - 4.18.0-147.5.1.el8_1.5es.3.x86_64 #1
[ 846.716082] Hardware name: ESPAN GI-25212/X11DPL-i, BIOS 2.1 06/15/2018
[ 846.716451] Workqueue: events cache_set_flush [bcache]
[ 846.716808] RIP: 0010:cache_set_flush+0xc9/0x1b0 [bcache]
[ 846.717155] Code: 00 4c 89 a5 b0 03 00 00 48 8b 85 68 f6 ff ff a8 08 0f 84 88 00 00 00 31 db 66 83 bd 3c f7 ff ff 00 48 8b 85 48 ff ff ff 74 28 <48> 8b b8 f8 09 00 00 48 85 ff 74 05 e8 b6 58 a2 e1 0f b7 95 3c f7
[ 846.718026] RSP: 0018:ffffb56dcf85fe70 EFLAGS: 00010202
[ 846.718372] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 846.718725] RDX: 0000000000000001 RSI: 0000000040000001 RDI: 0000000000000000
[ 846.719076] RBP: ffffa0ccc0f20df8 R08: ffffa0ce1fedb118 R09: 000073746e657665
[ 846.719428] R10: 8080808080808080 R11: 0000000000000000 R12: ffffa0ce1fee8700
[ 846.719779] R13: ffffa0ccc0f211a8 R14: ffffa0cd1b902840 R15: ffffa0ccc0f20e00
[ 846.720132] FS: 0000000000000000(0000) GS:ffffa0ce1fec0000(0000) knlGS:0000000000000000
[ 846.720726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 846.721073] CR2: 00000000000009f8 CR3: 00000008ba00a005 CR4: 00000000007606e0
[ 846.721426] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 846.721778] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 846.722131] PKRU: 55555554
[ 846.722467] Call Trace:
[ 846.722814] process_one_work+0x1a7/0x3b0
[ 846.723157] worker_thread+0x30/0x390
[ 846.723501] ? create_worker+0x1a0/0x1a0
[ 846.723844] kthread+0x112/0x130
[ 846.724184] ? kthread_flush_work_fn+0x10/0x10
[ 846.724535] ret_from_fork+0x35/0x40
Now, check whether that ca is NULL in LINE#1655 to fix the issue.
Signed-off-by: Linggang Zeng <linggang.zeng@easystack.cn>
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@kernel.org>
---
drivers/md/bcache/super.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index c40db9c161c1..9e6dfe2ec147 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1732,7 +1732,12 @@ static CLOSURE_CALLBACK(cache_set_flush)
mutex_unlock(&b->write_lock);
}
- if (ca->alloc_thread)
+ /*
+ * If the register_cache_set() call to bch_cache_set_alloc() failed,
+ * ca has not been assigned a value and return error.
+ * So we need check ca is not NULL during bch_cache_set_unregister().
+ */
+ if (ca && ca->alloc_thread)
kthread_stop(ca->alloc_thread);
if (c->journal.cur) {
--
2.39.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH 2/3] bcache: remove unused constants
2025-05-27 5:15 [PATCH 0/3] bcache-6.16-20250527 colyli
2025-05-27 5:15 ` [PATCH 1/3] bcache: fix NULL pointer in cache_set_flush() colyli
@ 2025-05-27 5:16 ` colyli
2025-05-27 5:16 ` [PATCH 3/3] bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang colyli
2025-05-27 13:38 ` [PATCH 0/3] bcache-6.16-20250527 Jens Axboe
3 siblings, 0 replies; 5+ messages in thread
From: colyli @ 2025-05-27 5:16 UTC (permalink / raw)
To: axboe; +Cc: linux-bcache, linux-block, Robert Pang, Coly Li
From: Robert Pang <robertpang@google.com>
Remove constants MAX_NEED_GC and MAX_SAVE_PRIO in btree.c that have been unused
since initial commit.
Signed-off-by: Robert Pang <robertpang@google.com>
Signed-off-by: Coly Li <colyli@kernel.org>
---
drivers/md/bcache/btree.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ed40d8600656..f991be2bc44e 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -88,8 +88,6 @@
* Test module load/unload
*/
-#define MAX_NEED_GC 64
-#define MAX_SAVE_PRIO 72
#define MAX_GC_TIMES 100
#define MIN_GC_NODES 100
#define GC_SLEEP_MS 100
--
2.39.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/3] bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang
2025-05-27 5:15 [PATCH 0/3] bcache-6.16-20250527 colyli
2025-05-27 5:15 ` [PATCH 1/3] bcache: fix NULL pointer in cache_set_flush() colyli
2025-05-27 5:16 ` [PATCH 2/3] bcache: remove unused constants colyli
@ 2025-05-27 5:16 ` colyli
2025-05-27 13:38 ` [PATCH 0/3] bcache-6.16-20250527 Jens Axboe
3 siblings, 0 replies; 5+ messages in thread
From: colyli @ 2025-05-27 5:16 UTC (permalink / raw)
To: axboe; +Cc: linux-bcache, linux-block, Mingzhe Zou, Coly Li
From: Mingzhe Zou <mingzhe.zou@easystack.cn>
Reported an IO hang and unrecoverable error in our testing environment.
After careful research, we found that bch_allocator_thread is stuck,
the call stack is as follows:
[<0>] __switch_to+0xbc/0x108
[<0>] __closure_sync+0x7c/0xbc [bcache]
[<0>] bch_prio_write+0x430/0x448 [bcache]
[<0>] bch_allocator_thread+0xb44/0xb70 [bcache]
[<0>] kthread+0x124/0x130
[<0>] ret_from_fork+0x10/0x18
Moreover, the RESERVE_BTREE type bucket slot are empty and journal_full
occurs at the same time.
When the cache disk is first used, the sb.nJournal_buckets defaults to 0.
So, only 8 RESERVE_BTREE type buckets are reserved. If RESERVE_BTREE type
buckets used up or btree_check_reserve() failed when request handle btree
split, the request will be repeatedly retried and wait for alloc thread to
fill in.
After the alloc thread fills the buckets, it will call bch_prio_write().
If journal_full occurs simultaneously at this time, journal_reclaim() and
btree_flush_write() will be called sequentially, journal_write cannot be
completed.
This is a low probability event, we believe that reserve more RESERVE_BTREE
buckets can avoid the worst situation.
Fixes: 682811b3ce1a ("bcache: fix for allocator and register thread race")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@kernel.org>
---
drivers/md/bcache/super.c | 48 ++++++++++++++++++++++++++++++++-------
1 file changed, 40 insertions(+), 8 deletions(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 9e6dfe2ec147..12fb3e557fb1 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2237,15 +2237,47 @@ static int cache_alloc(struct cache *ca)
bio_init(&ca->journal.bio, NULL, ca->journal.bio.bi_inline_vecs, 8, 0);
/*
- * when ca->sb.njournal_buckets is not zero, journal exists,
- * and in bch_journal_replay(), tree node may split,
- * so bucket of RESERVE_BTREE type is needed,
- * the worst situation is all journal buckets are valid journal,
- * and all the keys need to replay,
- * so the number of RESERVE_BTREE type buckets should be as much
- * as journal buckets
+ * When the cache disk is first registered, ca->sb.njournal_buckets
+ * is zero, and it is assigned in run_cache_set().
+ *
+ * When ca->sb.njournal_buckets is not zero, journal exists,
+ * and in bch_journal_replay(), tree node may split.
+ * The worst situation is all journal buckets are valid journal,
+ * and all the keys need to replay, so the number of RESERVE_BTREE
+ * type buckets should be as much as journal buckets.
+ *
+ * If the number of RESERVE_BTREE type buckets is too few, the
+ * bch_allocator_thread() may hang up and unable to allocate
+ * bucket. The situation is roughly as follows:
+ *
+ * 1. In bch_data_insert_keys(), if the operation is not op->replace,
+ * it will call the bch_journal(), which increments the journal_ref
+ * counter. This counter is only decremented after bch_btree_insert
+ * completes.
+ *
+ * 2. When calling bch_btree_insert, if the btree needs to split,
+ * it will call btree_split() and btree_check_reserve() to check
+ * whether there are enough reserved buckets in the RESERVE_BTREE
+ * slot. If not enough, bcache_btree_root() will repeatedly retry.
+ *
+ * 3. Normally, the bch_allocator_thread is responsible for filling
+ * the reservation slots from the free_inc bucket list. When the
+ * free_inc bucket list is exhausted, the bch_allocator_thread
+ * will call invalidate_buckets() until free_inc is refilled.
+ * Then bch_allocator_thread calls bch_prio_write() once. and
+ * bch_prio_write() will call bch_journal_meta() and waits for
+ * the journal write to complete.
+ *
+ * 4. During journal_write, journal_write_unlocked() is be called.
+ * If journal full occurs, journal_reclaim() and btree_flush_write()
+ * will be called sequentially, then retry journal_write.
+ *
+ * 5. When 2 and 4 occur together, IO will hung up and cannot recover.
+ *
+ * Therefore, reserve more RESERVE_BTREE type buckets.
*/
- btree_buckets = ca->sb.njournal_buckets ?: 8;
+ btree_buckets = clamp_t(size_t, ca->sb.nbuckets >> 7,
+ 32, SB_JOURNAL_BUCKETS);
free = roundup_pow_of_two(ca->sb.nbuckets) >> 10;
if (!free) {
ret = -EPERM;
--
2.39.5
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH 0/3] bcache-6.16-20250527
2025-05-27 5:15 [PATCH 0/3] bcache-6.16-20250527 colyli
` (2 preceding siblings ...)
2025-05-27 5:16 ` [PATCH 3/3] bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang colyli
@ 2025-05-27 13:38 ` Jens Axboe
3 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2025-05-27 13:38 UTC (permalink / raw)
To: colyli; +Cc: linux-bcache, linux-block
On Tue, 27 May 2025 13:15:58 +0800, colyli@kernel.org wrote:
> Please consider to take these patches for 6.16. They are generated
> against linux-block tree for-6.16/block branch.
>
> Linggang and Mingzhe from Easystack contribute two important fixes, the
> patches are verified in their production environment for quite long
> time. This time we have a new contributor Robert Pang from Google who
> posts a code clean patch.
>
> [...]
Applied, thanks!
[1/3] bcache: fix NULL pointer in cache_set_flush()
commit: 1e46ed947ec658f89f1a910d880cd05e42d3763e
[2/3] bcache: remove unused constants
commit: 5a08e49f2359a14629f27da99aaf0f1c3a68b850
[3/3] bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang
commit: 208c1559c5b18894e3380b3807b6364bd14f7584
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 5+ messages in thread