* gen wraparound warning: is this a problem?
@ 2025-09-07 14:37 Nix
       [not found] ` <C155231F-E439-46E5-8AFE-502CB75F183C@coly.li>
  0 siblings, 1 reply; 3+ messages in thread
From: Nix @ 2025-09-07 14:37 UTC (permalink / raw)
  To: linux-bcache

So, out of the blue, I just got this for my long-standing writearound
bcache setup (which covers my rootfs and $HOME, so I kind of care that
it keeps working):

[3997629.262213] WARNING: CPU: 0 PID: 222478 at drivers/md/bcache/alloc.c:81 bch_inc_gen+0x3c/0x40
[3997629.278998] Modules linked in: vfat fat intel_uncore_frequency intel_uncore_frequency_common
[3997629.295763] CPU: 0 UID: 0 PID: 222478 Comm: kworker/0:9 Tainted: G        W           6.15.6-00002-g0d4613ee4427-dirty #2 NONE 
[3997629.329334] Tainted: [W]=WARN
[3997629.345701] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.1029.090220201031 09/02/2020
[3997629.362413] Workqueue: bcache bch_data_insert_keys
[3997629.378983] RIP: 0010:bch_inc_gen+0x3c/0x40
[3997629.395389] Code: 0f 89 c2 48 89 e5 2a 56 07 0f b6 b1 1a 06 00 00 40 38 f2 0f 42 d6 88 91 1a 06 00 00 48 8b 17 80 ba 1a 06 00 00 60 77 02 5d c3 <0f> 0b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 8b 87 30 03 00
[3997629.428514] RSP: 0018:ffffa7dba31b7a00 EFLAGS: 00010202
[3997629.444857] RAX: 0000000000000045 RBX: 000007ffffffffff RCX: ffff9e5e0d140000
[3997629.461276] RDX: ffff9e5e0d140000 RSI: 0000000000000060 RDI: ffff9e5e00ee4000
[3997629.477509] RBP: ffffa7dba31b7a00 R08: 0000000000000001 R09: 0000000000000000
[3997629.493394] R10: 000e33d2971e5c02 R11: 0000000000000004 R12: 0000000000000002
[3997629.509021] R13: 0000000000000001 R14: ffffa7dba31b7ac8 R15: ffff9e5e149b5000
[3997629.524515] FS:  0000000000000000(0000) GS:ffff9e7d6d275000(0000) knlGS:0000000000000000
[3997629.539853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3997629.555063] CR2: 00000000328da018 CR3: 00000007d6c41002 CR4: 00000000003726f0
[3997629.570252] Call Trace:
[3997629.585190]  <TASK>
[3997629.599881]  make_btree_freeing_key+0xf0/0x130
[3997629.614450]  btree_split+0x4c9/0x6b0
[3997629.628797]  ? cpu_rmap_get+0x40/0x40
[3997629.642981]  bch_btree_insert_node+0x2c2/0x390
[3997629.657004]  btree_insert_fn+0x24/0x50
[3997629.670750]  bch_btree_map_nodes_recurse+0x3c/0x170
[3997629.684371]  ? bch_btree_insert_check_key+0x1b0/0x1b0
[3997629.697940]  ? bch_btree_ptr_bad+0x49/0xe0
[3997629.711187]  ? bch_btree_node_get.part.0+0x79/0x2c0
[3997629.724089]  ? rwsem_wake.isra.0+0x70/0x70
[3997629.736874]  bch_btree_map_nodes_recurse+0xf3/0x170
[3997629.749516]  ? bch_btree_insert_check_key+0x1b0/0x1b0
[3997629.762012]  __bch_btree_map_nodes+0x189/0x1a0
[3997629.774179]  ? bch_btree_insert_check_key+0x1b0/0x1b0
[3997629.786202]  bch_btree_insert+0xca/0x130
[3997629.797987]  ? ipi_sync_rq_state+0x40/0x40
[3997629.809575]  bch_data_insert_keys+0x36/0xe0
[3997629.820919]  process_one_work+0x14b/0x300
[3997629.831940]  worker_thread+0x2c3/0x400
[3997629.842820]  ? flush_rcu_work+0x40/0x40
[3997629.853452]  kthread+0xe8/0x1e0
[3997629.863718]  ? kthread_cancel_delayed_work_sync+0x20/0x20
[3997629.873923]  ret_from_fork+0x36/0x50
[3997629.883866]  ? kthread_cancel_delayed_work_sync+0x20/0x20
[3997629.893642]  ret_from_fork_asm+0x11/0x20
[3997629.903209]  </TASK>

[4001476.794864] WARNING: CPU: 10 PID: 408 at drivers/md/bcache/alloc.c:81 __bch_invalidate_one_bucket+0xba/0xc0
[4001476.804157] Modules linked in: vfat fat intel_uncore_frequency intel_uncore_frequency_common
[4001476.813375] CPU: 10 UID: 0 PID: 408 Comm: bcache_allocato Tainted: G        W           6.15.6-00002-g0d4613ee4427-dirty #2 NONE 
[4001476.832090] Tainted: [W]=WARN
[4001476.841130] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.1029.090220201031 09/02/2020
[4001476.850542] RIP: 0010:__bch_invalidate_one_bucket+0xba/0xc0
[4001476.859914] Code: 24 48 89 f2 48 8b 78 08 4c 89 e6 48 29 ca 48 b9 ab aa aa aa aa aa aa aa 48 c1 fa 02 48 0f af d1 e8 fb c6 01 00 e9 6a ff ff ff <0f> 0b eb 96 0f 0b 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 f4 53 48
[4001476.879592] RSP: 0018:ffffa7db80a2fe50 EFLAGS: 00010202
[4001476.889576] RAX: ffff9e5e0d140000 RBX: ffffa7db80976af0 RCX: 0000000000000061
[4001476.899666] RDX: ffff9e5e0d140000 RSI: ffffa7db80976af0 RDI: ffff9e5e00ee4000
[4001476.909659] RBP: ffffa7db80a2fe60 R08: ffffa7db808e82a8 R09: 0000000000000102
[4001476.919519] R10: 000000000000040b R11: ffff9e5e00ee0000 R12: ffff9e5e00ee4000
[4001476.929322] R13: 00000000000003ff R14: ffffa7db808ff30c R15: ffff9e5e00ee4000
[4001476.938978] FS:  0000000000000000(0000) GS:ffff9e7d6d4f5000(0000) knlGS:0000000000000000
[4001476.948860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4001476.958735] CR2: 00007f8d3fbff6d8 CR3: 00000007d6c41006 CR4: 00000000003726f0
[4001476.968744] Call Trace:
[4001476.978687]  <TASK>
[4001476.988541]  bch_invalidate_one_bucket+0x17/0x80
[4001476.998509]  bch_allocator_thread+0xb05/0xc90
[4001477.008447]  ? bch_invalidate_one_bucket+0x80/0x80
[4001477.018391]  kthread+0xe8/0x1e0
[4001477.028309]  ? kthread_cancel_delayed_work_sync+0x20/0x20
[4001477.038348]  ret_from_fork+0x36/0x50
[4001477.048346]  ? kthread_cancel_delayed_work_sync+0x20/0x20
[4001477.058448]  ret_from_fork_asm+0x11/0x20
[4001477.068493]  </TASK>

These both map to this in bch_inc_gen():

        WARN_ON_ONCE(ca->set->need_gc > BUCKET_GC_GEN_MAX);
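
A minimal sketch of what this check measures, assuming 8-bit bucket generations and a threshold of 96 standing in for BUCKET_GC_GEN_MAX (the `cmp ..., 0x60` in the first oops's Code: line appears consistent with that value). The struct and function names below are hypothetical and simplified; this is a model, not the driver's own code. Each invalidation bumps a bucket's gen, and if no GC pass has recorded the bucket since, the gap keeps growing until it crosses the threshold.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the bcache generation check.
 * Assumptions: 8-bit bucket generations and a threshold of 96
 * standing in for BUCKET_GC_GEN_MAX; the real code lives in
 * drivers/md/bcache and differs in detail. */
#define MODEL_GC_GEN_MAX 96U

struct model_bucket {
	uint8_t gen;     /* bumped on every invalidation; wraps at 256 */
	uint8_t last_gc; /* generation recorded by the last GC pass */
};

/* Invalidations since the last GC pass. The 8-bit cast keeps the
 * result correct across wraparound (gen 4 after last_gc 250 -> 10). */
static uint8_t model_gc_gen(const struct model_bucket *b)
{
	return (uint8_t)(b->gen - b->last_gc);
}

/* Roughly the shape of bch_inc_gen(): bump gen, fold the distance
 * into a cache-set-wide need_gc, and return nonzero where the kernel
 * would hit WARN_ON_ONCE(need_gc > BUCKET_GC_GEN_MAX). */
static int model_inc_gen(struct model_bucket *b, uint8_t *need_gc)
{
	uint8_t dist;

	b->gen++;
	dist = model_gc_gen(b);
	if (dist > *need_gc)
		*need_gc = dist;
	return *need_gc > MODEL_GC_GEN_MAX;
}

/* Count invalidations of one bucket, with no GC pass in between,
 * until the warning condition first becomes true. */
static int model_increments_until_warning(uint8_t start_gen)
{
	struct model_bucket b = { .gen = start_gen, .last_gc = start_gen };
	uint8_t need_gc = 0;
	int n;

	for (n = 1; n < 256; n++)
		if (model_inc_gen(&b, &need_gc))
			return n;
	return -1;
}
```

Under these assumptions model_increments_until_warning() returns 97 whatever the starting gen, so in this model the warning tracks how long GC has gone without revisiting a bucket, not the absolute gen value.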

Is this something the admin needs to do something about? (And, if it's
not and bcache recovers smoothly, as so far it seems to -- though I
haven't tried to remount it since the warning -- why do we warn about
it at all?)


* Re: gen wraparound warning: is this a problem?
       [not found] ` <C155231F-E439-46E5-8AFE-502CB75F183C@coly.li>
@ 2025-09-10 15:54   ` Nix
  2025-09-24 16:04     ` Coly Li
  0 siblings, 1 reply; 3+ messages in thread
From: Nix @ 2025-09-10 15:54 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

On 8 Sep 2025, Coly Li verbalised:

>> On 7 Sep 2025, at 22:37, Nix <nix@esperi.org.uk> wrote:
>> 
>> So, out of the blue, I just got this for my long-standing writearound
>> bcache setup (which covers my rootfs and $HOME, so I kind of care that
>> it keeps working):

(Oh, this is kernel 6.15.6 -- but that's only since Jul 20th. Before
that, I was running 5.16.19 right back to April 2022, yes, I know... so
it's possible this wasn't touched by *5.16* and thus this is a bug that
was fixed long ago.)

>> These both map to this in bch_inc_gen():
>> 
>>     WARN_ON_ONCE(ca->set->need_gc > BUCKET_GC_GEN_MAX);
>
> It seems a bucket has not been touched by garbage collection for a long time.

Not too surprising: half-terabyte cache, and the xfs filesystems it
backs only have 1.5TiB of data on them and not all of it is accessed
frequently, and some is bypassed... so it can take a long time to gc
through the entire cache :) it took months just to fill it.

>> Is this something the admin needs to do something about? (And, if it's
>> not and bcache recovers smoothly, as so far it seems to -- though I
>> haven't tried to remount it since the warning -- why do we warn about
>> it at all?)
>
> I don’t know why this bucket has not been touched by GC for such a long
> time. I would not expect that to happen.

It's possible that *no* buckets were touched for a long time.

> To make sure everything is safe, I would suggest writing back all the
> dirty data to the backing device, detaching the cache device, re-making
> the cache device and attaching the backing device to it again.

There is no dirty data (writethrough cache)... and this is backing the
rootfs, among other things, so IIRC detaching is quite difficult and
panic-prone to do (it's been many years, but I believe you can't do it
while mounted?). I'll schedule it for the next reboot...


* Re: gen wraparound warning: is this a problem?
  2025-09-10 15:54   ` Nix
@ 2025-09-24 16:04     ` Coly Li
  0 siblings, 0 replies; 3+ messages in thread
From: Coly Li @ 2025-09-24 16:04 UTC (permalink / raw)
  To: Nix; +Cc: linux-bcache

> On 10 Sep 2025, at 23:54, Nix <nix@esperi.org.uk> wrote:
> 
> On 8 Sep 2025, Coly Li verbalised:
> 
>>> On 7 Sep 2025, at 22:37, Nix <nix@esperi.org.uk> wrote:
>>> 
>>> So, out of the blue, I just got this for my long-standing writearound
>>> bcache setup (which covers my rootfs and $HOME, so I kind of care that
>>> it keeps working):
> 
> (Oh, this is kernel 6.15.6 -- but that's only since Jul 20th. Before
> that, I was running 5.16.19 right back to April 2022, yes, I know... so
> it's possible this wasn't touched by *5.16* and thus this is a bug that
> was fixed long ago.)

IMHO this might not be a kernel bug, just a matter of time.

> 
>>> These both map to this in bch_inc_gen():
>>> 
>>>    WARN_ON_ONCE(ca->set->need_gc > BUCKET_GC_GEN_MAX);
>> 
>> It seems a bucket has not been touched by garbage collection for a long time.
> 
> Not too surprising: half-terabyte cache, and the xfs filesystems it
> backs only have 1.5TiB of data on them and not all of it is accessed
> frequently, and some is bypassed... so it can take a long time to gc
> through the entire cache :) it took months just to fill it.
> 

Copied.



>>> Is this something the admin needs to do something about? (And, if it's
>>> not and bcache recovers smoothly, as so far it seems to -- though I
>>> haven't tried to remount it since the warning -- why do we warn about
>>> it at all?)
>> 
>> I don’t know why this bucket has not been touched by GC for such a long
>> time. I would not expect that to happen.
> 
> It's possible that *no* buckets were touched for a long time.
> 

Yeah, I assumed that.


>> To make sure everything is safe, I would suggest writing back all the
>> dirty data to the backing device, detaching the cache device, re-making
>> the cache device and attaching the backing device to it again.
> 
> There is no dirty data (writethrough cache)... and this is backing the
> rootfs, among other things, so IIRC detaching is quite difficult and
> panic-prone to do (it's been many years, but I believe you can't do it
> while mounted?). I'll schedule it for the next reboot…

If there is no dirty data on the cache, the cache device can be safely detached
from the backing device while the bcache device is mounted.

echo 1 > /sys/block/<backing dev>/bcache/detach
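
The same check-then-detach step can be scripted. A minimal sketch in C, assuming the backing device's sysfs directory exposes the `state` file described in the bcache admin guide ("no cache", "clean", "dirty", "inconsistent") alongside `detach`. The function name `detach_if_clean` and its return codes are hypothetical; the directory is a parameter only so the logic can be exercised outside sysfs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: detach the cache from a bcache backing device
 * only if the device reports "clean" (no dirty data cached).
 * Returns 0 if detach was written, 1 if skipped, -1 on I/O error. */
static int detach_if_clean(const char *sysfs_dir)
{
	char path[512];
	char state[64] = "";
	FILE *f;

	/* Read the backing device state (assumed sysfs "state" file). */
	snprintf(path, sizeof(path), "%s/state", sysfs_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(state, sizeof(state), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	state[strcspn(state, "\n")] = '\0';

	/* Dirty, inconsistent or uncached: leave it alone. */
	if (strcmp(state, "clean") != 0)
		return 1;

	/* Equivalent of: echo 1 > <sysfs_dir>/detach */
	snprintf(path, sizeof(path), "%s/detach", sysfs_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* sysfs write errors may only surface when the stream is flushed. */
	return fclose(f) == 0 ? 0 : -1;
}
```

On a live system this would be called as root with "/sys/block/<backing dev>/bcache"; refusing anything but "clean" mirrors the condition above that there be no dirty data.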

Hope it works.

Thanks.

Coly Li

