public inbox for linux-bcachefs@vger.kernel.org
* Another bcachefs version downgrade bug
@ 2024-10-16  4:51 Carl E. Thompson
  2024-10-17  0:09 ` Kent Overstreet
  0 siblings, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-16  4:51 UTC
  To: linux-bcachefs@vger.kernel.org

Hi,

     I believe there is another version downgrade bug in bcachefs, triggered when going from a newer kernel back to an older one (tested versions: 6.9.4 <--> 6.11.3).

     My laptop normally runs kernel 6.9.4 with 4 bcachefs filesystems mounted, including the root filesystem, all on LVM2 logical volumes. I needed to test something under 6.11, so I booted kernel 6.11.3 and used the system normally from the console (bcachefs worked fine under 6.11.3). Since then, booting back into 6.9.4 fails: the laptop hangs when it tries to mount and use the root filesystem. The kernel log shows hung-task stack traces for the copygc thread (see dmesg output below). This happens every time I try to start 6.9.4 now. According to the kernel log, bcachefs seems to complete the version downgrade and initial mount successfully, but it starts hanging as soon as the filesystem is used. Booting back into the 6.11.3 kernel makes the filesystems work again, but I can't run 6.11 on my laptop normally because 6.11 (and 6.10) have amdgpu issues that cause irrecoverable graphical desktop lockups. So right now I can choose to boot either with filesystems that don't work or with periodic hard graphical desktop crashes, neither of which is ideal.

     On my laptop and some of my other computers I boot multiple Linux distributions, which usually run different kernels and mount the same filesystems on all of them (except root). So I do need to be able to switch back and forth between kernels as needed on all of my systems, and these types of issues give me some pause. I will disable bcachefs use on my dev systems and servers for now until I am more confident that there is a solid testing plan in place to make sure there can be no more of these kinds of issues in the future when booting multiple kernels. I will keep bcachefs on my laptop for testing. A fix for my laptop isn't urgent for me personally, as I can recreate the filesystems under 6.9.4 and restore from backups. Of course, other people might need a fix more quickly. Next time I need to boot a different kernel I'll make sure to create LVM snapshots of the devices first, so that I can revert if needed.
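
A minimal sketch of the snapshot-before-boot precaution mentioned above, assuming classic (non-thin) LVM snapshots. The helper name `snapshot_commands`, the snapshot suffix and the 2G copy-on-write size are illustrative assumptions; the volume group `clip` and logical volume `root-alpine` come from the device path in the show-super output. The script only builds and prints the `lvcreate` / `lvconvert` command lines, it does not run them.

```python
def snapshot_commands(vg, lvs, tag, size="2G"):
    """Build (create, revert) command lists for LVM snapshots; nothing is executed.

    vg   -- volume group name
    lvs  -- logical volumes holding the filesystems
    tag  -- suffix appended to each LV name to form the snapshot name (assumption)
    size -- copy-on-write space reserved per snapshot (assumption)
    """
    create, revert = [], []
    for lv in lvs:
        snap = f"{lv}-{tag}"
        # Take a snapshot of the origin LV before booting the other kernel.
        create.append(["lvcreate", "--snapshot", "--size", size,
                       "--name", snap, f"{vg}/{lv}"])
        # Roll the origin back to the snapshot contents if the downgrade fails.
        revert.append(["lvconvert", "--merge", f"{vg}/{snap}"])
    return create, revert


create, revert = snapshot_commands("clip", ["root-alpine"], "pre-6.11")
for argv in create + revert:
    print(" ".join(argv))
```

If the origin volume is in use when `lvconvert --merge` is issued (as a mounted root filesystem would be), LVM defers the merge until the volume is next activated, so the revert would typically be run from the working kernel followed by a reboot.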

Thanks,
Carl

show-super from one affected filesystem:
---
[clip carl]# bcachefs show-super /dev/clip/root-alpine 
Device:                                     (unknown device)
External UUID:                             c992a5de-c9b3-4fd1-82ed-4d2f66bc11cb
Internal UUID:                             43b4fe97-f5a4-48b3-8d99-3a3dda25211a
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              0
Label:                                     (none)
Version:                                   1.12: (unknown version)
Version upgrade complete:                  1.12: (unknown version)
Oldest version on disk:                    1.4: member_seq
Created:                                   Fri Mar 22 19:19:01 2024
Sequence number:                           249
Time of last write:                        Tue Oct 15 20:23:34 2024
Superblock size:                           4.45 KiB/1.00 MiB
Clean:                                     0
Devices:                                   1
Sections:                                  members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  lz4,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro 
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             lz4
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash] 
  metadata_target:                         none
  foreground_target:                       none
  background_target:                       none
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   0
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 160):
Device:                                    0
  Label:                                   (none)
  UUID:                                    352e33b9-dde4-48da-8fe2-255ae78c6320
  Size:                                    24.0 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         2
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 98304
  Last mount:                              Tue Oct 15 20:23:32 2024
  Last superblock write:                   249
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Btree allocated bitmap blocksize:        1.00 MiB
  Btree allocated bitmap:                  0000000000000000000000000000011111111111111111111111111111111111
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1

errors (size 24):
bset_bad_csum                               1               Sat Jul  6 07:43:37 2024


dmesg output:
---

...

[  230.456893] bcachefs (dm-7): mounting version 1.12: (unknown version) opts=compression=lz4
[  230.456911] bcachefs (dm-7): recovering from clean shutdown, journal seq 4901
[  230.456915] bcachefs (dm-7): Version downgrade required:
[  230.469098] bcachefs (dm-7): alloc_read... done
[  230.469111] bcachefs (dm-7): stripes_read... done
[  230.469115] bcachefs (dm-7): snapshots_read... done
[  230.469436] bcachefs (dm-7): journal_replay... done
[  230.469441] bcachefs (dm-7): resume_logged_ops... done
[  230.469450] bcachefs (dm-7): going read-write
[  368.351326] INFO: task bch-copygc/dm-7:547 blocked for more than 122 seconds.
[  368.351336]       Not tainted 6.9.4-arch1-1 #1
[  368.351338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  368.351340] task:bch-copygc/dm-7 state:D stack:0     pid:547   tgid:547   ppid:2      flags:0x00004000
[  368.351345] Call Trace:
[  368.351348]  <TASK>
[  368.351354]  __schedule+0x3c7/0x1510
[  368.351368]  schedule+0x27/0xf0
[  368.351372]  __closure_sync+0x7e/0x140
[  368.351382]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351436]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351440]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351441]  ? __kmalloc+0x1a7/0x440
[  368.351446]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351448]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351452]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351454]  ? local_clock_noinstr+0xd/0xd0
[  368.351456]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351457]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351460]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351489]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351511]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351512]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351539]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351573]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351602]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351631]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351652]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351681]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351702]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351732]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351775]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351828]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351868]  kthread+0xcf/0x100
[  368.351876]  ? __pfx_kthread+0x10/0x10
[  368.351882]  ret_from_fork+0x31/0x50
[  368.351889]  ? __pfx_kthread+0x10/0x10
[  368.351894]  ret_from_fork_asm+0x1a/0x30
[  368.351905]  </TASK>
[  491.230894] INFO: task bch-copygc/dm-7:547 blocked for more than 245 seconds.
[  491.230914]       Not tainted 6.9.4-arch1-1 #1
[  491.230920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  491.230924] task:bch-copygc/dm-7 state:D stack:0     pid:547   tgid:547   ppid:2      flags:0x00004000
[  491.230939] Call Trace:
[  491.230944]  <TASK>
[  491.230955]  __schedule+0x3c7/0x1510
[  491.230984]  schedule+0x27/0xf0
[  491.230993]  __closure_sync+0x7e/0x140
[  491.231011]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231160]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231169]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231174]  ? __kmalloc+0x1a7/0x440
[  491.231186]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231192]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231206]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231211]  ? local_clock_noinstr+0xd/0xd0
[  491.231218]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231223]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231232]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231340]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231412]  ? srso_alias_return_thunk+0x5/0xfbef5
[  491.231418]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231509]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231625]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231732]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231823]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231883]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.231963]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.232022]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.232089]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.232148]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.232217]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  491.232271]  kthread+0xcf/0x100
[  491.232282]  ? __pfx_kthread+0x10/0x10
[  491.232289]  ret_from_fork+0x31/0x50
[  491.232298]  ? __pfx_kthread+0x10/0x10
[  491.232304]  ret_from_fork_asm+0x1a/0x30
[  491.232319]  </TASK>

...
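
When reading hung-task traces like the ones above, the frames prefixed with `?` are unreliable guesses from the stack unwinder; the remaining frames give the actual blocked call chain. A small sketch that filters them out, assuming dmesg lines in the timestamped format shown here (the function name `reliable_frames` is made up for illustration):

```python
import re

# "[  368.351354]  symbol+0xOFF/0xSIZE", optionally preceded by "? " for unreliable frames.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(dmesg_text):
    """Return symbol names of stack frames that are not marked '?'."""
    frames = []
    for line in dmesg_text.splitlines():
        m = FRAME.match(line)
        if m and not m.group(1):
            frames.append(m.group(2))
    return frames


sample = """\
[  368.351345] Call Trace:
[  368.351348]  <TASK>
[  368.351354]  __schedule+0x3c7/0x1510
[  368.351368]  schedule+0x27/0xf0
[  368.351372]  __closure_sync+0x7e/0x140
[  368.351382]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[  368.351436]  ? srso_alias_return_thunk+0x5/0xfbef5
[  368.351441]  ? __kmalloc+0x1a7/0x440
[  368.351489]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
"""

frames = reliable_frames(sample)
print(" <- ".join(frames))
```

Applied to the full trace, this leaves the chain from `bch2_copygc_thread` down to `__bch2_write` waiting in `__closure_sync`, which is the part relevant to the hang.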


* Re: Another bcachefs version downgrade bug
  2024-10-16  4:51 Another bcachefs version downgrade bug Carl E. Thompson
@ 2024-10-17  0:09 ` Kent Overstreet
  2024-10-17  8:29   ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-17  0:09 UTC
  To: Carl E. Thompson; +Cc: linux-bcachefs@vger.kernel.org

On Tue, Oct 15, 2024 at 09:51:16PM -0700, Carl E. Thompson wrote:
> Hi,
> 
>      I believe there is another newer version downgrade bug in bcachefs (tested versions: 6.9.4 <--> 6.11.3).
> 
>      My laptop runs kernel 6.9.4 normally with 4 bcachefs filesystems
>      on LVM2 logical volumes mounted including the root filesystem. I
>      needed to test something under 6.11 so I booted kernel 6.11.3 and
>      used the system normally from the console (bcachefs worked fine
>      under 6.11.3). After attempting to boot back into 6.9.4 my laptop
>      no longer starts and hangs when trying to mount and manipulate
>      the root filesystem. The kernel log shows kernel traces due to
>      hung copygc tasks (see dmesg output below). This happens every
>      time I try to start 6.9.4 now. The kernel log reveals that the
>      bcachefs filesystem seems to complete the version downgrade and
>      initial mount successfully but it starts hanging as soon as the
>      filesystem is used. Booting back into the 6.11.3 kernel causes
>      the filesystems to work again but I can't run 6.11 on my laptop
>      normally because 6.11 (and 6.10) have amdgpu issues that cause
>      irrecoverable graphical desktop lockups. So right now I can
>      either choose to boot with filesystems that don't work or with
>      periodic hard graphical desktop crashes neither of which is
>      ideal.

Yeah, it looks like 6.9 isn't running the recovery passes specified in
the superblock downgrade section, meaning we start running without
correct accounting counters - 6.10 works, though.

6.10 is an LTS release and 6.9 is not - is 6.10 an option?
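
One way to picture the mechanism described above is the simplified model below. It is only a sketch, not the kernel's code: the `DowngradeEntry` type, the pass names and the version numbers in the table are all hypothetical. Only the idea is taken from the message: the superblock downgrade section lists recovery passes that an older kernel has to run before using a filesystem last written by a newer version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DowngradeEntry:
    minor: int         # on-disk minor version that introduced a change (hypothetical)
    passes: frozenset  # recovery passes needed when mounting with an older kernel


def passes_required(entries, disk_minor, kernel_minor):
    """Union of passes for entries newer than the kernel but present on disk."""
    required = set()
    for e in entries:
        if kernel_minor < e.minor <= disk_minor:
            required |= e.passes
    return required


# Hypothetical downgrade table; names and versions are illustrative only.
table = [
    DowngradeEntry(10, frozenset({"rebuild_counters"})),
    DowngradeEntry(11, frozenset({"rebuild_counters", "recheck_alloc"})),
]

# Filesystem at 1.12 (as in the show-super output), kernel assumed to support up to 1.9.
needed = passes_required(table, disk_minor=12, kernel_minor=9)
print(sorted(needed))
```

In this model, a kernel that ignores the table mounts with `needed` effectively empty and goes read-write on stale counters, which matches the symptom reported: the mount succeeds and the hang only appears once the filesystem is used.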




* Re: Another bcachefs version downgrade bug
  2024-10-17  0:09 ` Kent Overstreet
@ 2024-10-17  8:29   ` Carl E. Thompson
  2024-10-17  8:39     ` Kent Overstreet
  0 siblings, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-17  8:29 UTC
  To: Kent Overstreet; +Cc: linux-bcachefs@vger.kernel.org

Unfortunately, 6.10 and 6.11 aren't options for the normal use of my laptop. But I was easily able to recover from backups, so no harm done, and I'm back on 6.9.

I do think problems with being able to switch between different kernel versions are a pretty big deal, though. At least they are in my workflows. 

Would an approach similar to the one ZFS takes be better, where the filesystem's on-disk format is never upgraded automatically and the admin has to run the upgrade manually?

Can tests be added to your test suite to make sure previous kernels can still access and use filesystems after they've been upgraded by newer kernels?
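
As a rough illustration of what such a test matrix could cover, the sketch below enumerates every (newer, older) kernel pair from a list of releases; each pair stands for "mount and write with the newer kernel, then mount and use with the older one". The function name `downgrade_pairs` and the release list are assumptions for illustration and say nothing about how the existing test suite is organised.

```python
from itertools import combinations


def downgrade_pairs(kernels):
    """Return (newer, older) pairs: write with newer first, then mount with older."""
    ordered = sorted(kernels, key=lambda v: tuple(int(p) for p in v.split(".")))
    return [(new, old) for old, new in combinations(ordered, 2)]


pairs = downgrade_pairs(["6.9", "6.10", "6.11"])
for new, old in pairs:
    print(f"write with {new}, then mount and use with {old}")
```

The pair (6.11, 6.9) is the case reported in this thread.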

Being able to revert to an earlier kernel if a newer one has problems is a big deal for me.

Thanks,
Carl


> On 2024-10-16 5:09 PM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
>  
> On Tue, Oct 15, 2024 at 09:51:16PM -0700, Carl E. Thompson wrote:
> > Hi,
> > 
> >      I believe there is another newer version downgrade bug in bcachefs (tested versions: 6.9.4 <--> 6.11.3).
> > 
> >      My laptop runs kernel 6.9.4 normally with 4 bcachefs filesystems
> >      on LVM2 logical volumes mounted including the root filesystem. I
> >      needed to test something under 6.11 so I booted kernel 6.11.3 and
> >      used the system normally from the console (bcachefs worked fine
> >      under 6.11.3). After attempting to boot back into 6.9.4 my laptop
> >      no longer starts and hangs when trying to mount and manipulate
> >      the root filesystem. The kernel log shows kernel traces due to
> >      hung copygc tasks (see dmesg output below). This happens every
> >      time I try to start 6.9.4 now. The kernel log reveals that the
> >      bcachefs filesystem seems to complete the version downgrade and
> >      initial mount successfully but it starts hanging as soon as the
> >      filesystem is used. Booting back into the 6.11.3 kernel causes
> >      the filesystems to work again but I can't run 6.11 on my laptop
> >      normally because 6.11 (and 6.10) have amdgpu issues that cause
> >      irrecoverable graphical desktop lockups. So right now I can
> >      either choose to boot with filesystems that don't work or with
> >      periodic hard graphical desktop crashes neither of which is
> >      ideal.
> 
> Yeah, it looks like 6.9 isn't running the recovery passes specified in
> the superblock downgrade section, meaning we start running without
> correct accounting counters - 6.10 works, though.
> 
> 6.10 is an LTS release and 6.9 is not - is 6.10 an option?
> 
> 
> > 
> >      On my laptop and some of my other computers I boot multiple Linux distributions, which usually run different kernels and mount the same filesystems on all of them (except root). So I do need to be able to switch back and forth between kernels as needed on all of my systems, and these types of issues give me some pause. I will disable bcachefs use on my dev systems and servers for now until I am more confident that there is a solid testing plan in place to make sure there can be no more of these kinds of issues in the future when booting multiple kernels. I will keep bcachefs on my laptop for testing. A fix for my laptop isn't urgent for me personally, as I can recreate the filesystems under 6.9.4 and restore from backups. Of course, other people might need a fix more quickly. Next time I need to boot a different kernel I'll make sure to create LVM snapshots of the devices first, so that I can revert if needed.
> > 
> > Thanks,
> > Carl
> > 
> > show-super from one affected filesystem:
> > ---
> > [clip carl]# bcachefs show-super /dev/clip/root-alpine 
> > Device:                                     (unknown device)
> > External UUID:                             c992a5de-c9b3-4fd1-82ed-4d2f66bc11cb
> > Internal UUID:                             43b4fe97-f5a4-48b3-8d99-3a3dda25211a
> > Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
> > Device index:                              0
> > Label:                                     (none)
> > Version:                                   1.12: (unknown version)
> > Version upgrade complete:                  1.12: (unknown version)
> > Oldest version on disk:                    1.4: member_seq
> > Created:                                   Fri Mar 22 19:19:01 2024
> > Sequence number:                           249
> > Time of last write:                        Tue Oct 15 20:23:34 2024
> > Superblock size:                           4.45 KiB/1.00 MiB
> > Clean:                                     0
> > Devices:                                   1
> > Sections:                                  members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
> > Features:                                  lz4,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
> > Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
> > 
> > Options:
> >   block_size:                              4.00 KiB
> >   btree_node_size:                         256 KiB
> >   errors:                                  continue [fix_safe] panic ro 
> >   metadata_replicas:                       1
> >   data_replicas:                           1
> >   metadata_replicas_required:              1
> >   data_replicas_required:                  1
> >   encoded_extent_max:                      64.0 KiB
> >   metadata_checksum:                       none [crc32c] crc64 xxhash 
> >   data_checksum:                           none [crc32c] crc64 xxhash 
> >   compression:                             lz4
> >   background_compression:                  none
> >   str_hash:                                crc32c crc64 [siphash] 
> >   metadata_target:                         none
> >   foreground_target:                       none
> >   background_target:                       none
> >   promote_target:                          none
> >   erasure_code:                            0
> >   inodes_32bit:                            1
> >   shard_inode_numbers:                     1
> >   inodes_use_key_cache:                    1
> >   gc_reserve_percent:                      8
> >   gc_reserve_bytes:                        0 B
> >   root_reserve_percent:                    0
> >   wide_macs:                               0
> >   promote_whole_extents:                   0
> >   acl:                                     1
> >   usrquota:                                0
> >   grpquota:                                0
> >   prjquota:                                0
> >   journal_flush_delay:                     1000
> >   journal_flush_disabled:                  0
> >   journal_reclaim_delay:                   100
> >   journal_transaction_names:               1
> >   allocator_stuck_timeout:                 30
> >   version_upgrade:                         [compatible] incompatible none 
> >   nocow:                                   0
> > 
> > members_v2 (size 160):
> > Device:                                    0
> >   Label:                                   (none)
> >   UUID:                                    352e33b9-dde4-48da-8fe2-255ae78c6320
> >   Size:                                    24.0 GiB
> >   read errors:                             0
> >   write errors:                            0
> >   checksum errors:                         2
> >   seqread iops:                            0
> >   seqwrite iops:                           0
> >   randread iops:                           0
> >   randwrite iops:                          0
> >   Bucket size:                             256 KiB
> >   First bucket:                            0
> >   Buckets:                                 98304
> >   Last mount:                              Tue Oct 15 20:23:32 2024
> >   Last superblock write:                   249
> >   State:                                   rw
> >   Data allowed:                            journal,btree,user
> >   Has data:                                journal,btree,user
> >   Btree allocated bitmap blocksize:        1.00 MiB
> >   Btree allocated bitmap:                  0000000000000000000000000000011111111111111111111111111111111111
> >   Durability:                              1
> >   Discard:                                 1
> >   Freespace initialized:                   1
> > 
> > errors (size 24):
> > bset_bad_csum                               1               Sat Jul  6 07:43:37 2024
> > 
> > 
> > dmesg output:
> > ---
> > 
> > ...
> > 
> > [  230.456893] bcachefs (dm-7): mounting version 1.12: (unknown version) opts=compression=lz4
> > [  230.456911] bcachefs (dm-7): recovering from clean shutdown, journal seq 4901
> > [  230.456915] bcachefs (dm-7): Version downgrade required:
> > [  230.469098] bcachefs (dm-7): alloc_read... done
> > [  230.469111] bcachefs (dm-7): stripes_read... done
> > [  230.469115] bcachefs (dm-7): snapshots_read... done
> > [  230.469436] bcachefs (dm-7): journal_replay... done
> > [  230.469441] bcachefs (dm-7): resume_logged_ops... done
> > [  230.469450] bcachefs (dm-7): going read-write
> > [  368.351326] INFO: task bch-copygc/dm-7:547 blocked for more than 122 seconds.
> > [  368.351336]       Not tainted 6.9.4-arch1-1 #1
> > [  368.351338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  368.351340] task:bch-copygc/dm-7 state:D stack:0     pid:547   tgid:547   ppid:2      flags:0x00004000
> > [  368.351345] Call Trace:
> > [  368.351348]  <TASK>
> > [  368.351354]  __schedule+0x3c7/0x1510
> > [  368.351368]  schedule+0x27/0xf0
> > [  368.351372]  __closure_sync+0x7e/0x140
> > [  368.351382]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351436]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351440]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351441]  ? __kmalloc+0x1a7/0x440
> > [  368.351446]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351448]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351452]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351454]  ? local_clock_noinstr+0xd/0xd0
> > [  368.351456]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351457]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351460]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351489]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351511]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  368.351512]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351539]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351573]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351602]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351631]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351652]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351681]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351702]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351732]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351775]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351828]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  368.351868]  kthread+0xcf/0x100
> > [  368.351876]  ? __pfx_kthread+0x10/0x10
> > [  368.351882]  ret_from_fork+0x31/0x50
> > [  368.351889]  ? __pfx_kthread+0x10/0x10
> > [  368.351894]  ret_from_fork_asm+0x1a/0x30
> > [  368.351905]  </TASK>
> > [  491.230894] INFO: task bch-copygc/dm-7:547 blocked for more than 245 seconds.
> > [  491.230914]       Not tainted 6.9.4-arch1-1 #1
> > [  491.230920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  491.230924] task:bch-copygc/dm-7 state:D stack:0     pid:547   tgid:547   ppid:2      flags:0x00004000
> > [  491.230939] Call Trace:
> > [  491.230944]  <TASK>
> > [  491.230955]  __schedule+0x3c7/0x1510
> > [  491.230984]  schedule+0x27/0xf0
> > [  491.230993]  __closure_sync+0x7e/0x140
> > [  491.231011]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231160]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231169]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231174]  ? __kmalloc+0x1a7/0x440
> > [  491.231186]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231192]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231206]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231211]  ? local_clock_noinstr+0xd/0xd0
> > [  491.231218]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231223]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231232]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231340]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231412]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  491.231418]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231509]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231625]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231732]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231823]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231883]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.231963]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.232022]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.232089]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.232148]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.232217]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
> > [  491.232271]  kthread+0xcf/0x100
> > [  491.232282]  ? __pfx_kthread+0x10/0x10
> > [  491.232289]  ret_from_fork+0x31/0x50
> > [  491.232298]  ? __pfx_kthread+0x10/0x10
> > [  491.232304]  ret_from_fork_asm+0x1a/0x30
> > [  491.232319]  </TASK>
> > 
> > ...

* Re: Another bcachefs version downgrade bug
  2024-10-17  8:29   ` Carl E. Thompson
@ 2024-10-17  8:39     ` Kent Overstreet
  2024-10-17  9:15       ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-17  8:39 UTC (permalink / raw)
  To: Carl E. Thompson; +Cc: linux-bcachefs@vger.kernel.org

On Thu, Oct 17, 2024 at 01:29:07AM -0700, Carl E. Thompson wrote:
> Unfortunately 6.10 and 6.11 aren't options for the normal use of my laptop. But I was easily able to recover from backups so no harm done and I'm back on 6.9.
> 
> I do think problems with being able to switch between different kernel
> versions are a pretty big deal, though. At least they are in my
> workflows. 

It absolutely is - that was a bug, and it was fixed, a year ago.

But 6.9 hasn't been getting updates for some time, and 6.10 has been, so
there's not really much I can do at this point. And you should be
running a kernel that's still getting updates.

> Would an approach similar to the one ZFS takes be better where the
> filesystem's on-disk format is never upgraded automatically but
> requires the admin to manually run an upgrade?

At some point we'll likely be switching to that model. But for right
now, while it's still marked experimental, it would be adding a lot of
overhead to add conditionals to all the code for new features, so that's
why I'm not doing it while we're still rapidly iterating - it wasn't
realistically possible with the disk accounting rewrite.

> Can tests be added to your test suite to make sure previous kernels
> can still access and use filesystems after they've been upgraded by
> newer kernels?

I have those tests now. They're not run automatically yet, and they
could use some improvement, but they're there.

Again - bcachefs was only merged in 6.7, clearly marked experimental,
and you're running 6.9; this kind of bug is exactly the sort of thing we
try to shake out in the experimental phase.

Also, a fsck would have sufficed, if you haven't run that already.

* Re: Another bcachefs version downgrade bug
  2024-10-17  8:39     ` Kent Overstreet
@ 2024-10-17  9:15       ` Carl E. Thompson
  2024-10-17  9:30         ` Kent Overstreet
  0 siblings, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-17  9:15 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs@vger.kernel.org


> On 2024-10-17 1:39 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:

> ...

> Again - bcachefs was only merged in 6.7, clearly marked experimental,
> and you're running 6.9; this kind of bug is exactly the sort of thing we
> try to shake out in the experimental phase.

Not a bcachefs problem but as a distribution user I would have no idea that bcachefs was experimental. Every major distribution I've looked at recently includes the bcachefs module and tools and there is nothing to tell the user it's experimental. Only the person who actually configured the kernel (or people who read mailing lists) would know that it's experimental.

Perhaps if this is to be expected right now the bcachefs command line tool should output a big warning letting users know that bcachefs is experimental and might eat their data?

> Also, a fsck would have sufficed, if you haven't run that already.

That must be a different bug because that doesn't work. I still have the old filesystem images and I just tried fsck again, then mounted, then tried to unmount and immediately got the same filesystem lockup. Time to reboot. See below.

Carl


fsck test:
---

[clip carl]# fsck.bcachefs /dev/clip/local.old
Running userspace offline fsck
starting version 1.12: (unknown version) opts=ro,compression=lz4,nopromote_whole_extents,degraded,fsck,fix_errors=ask,read_only
recovering from clean shutdown, journal seq 5004
Version downgrade required:
  running recovery passes: check_allocations
accounting_read... done
alloc_read... done
stripes_read... done
snapshots_read... done
check_allocations... done
going read-write
journal_replay... done
check_alloc_info... done
check_lrus... done
check_btree_backpointers... done
check_backpointers_to_extents... done
check_extents_to_backpointers... done
check_alloc_to_lru_refs... done
check_snapshot_trees... done
check_snapshots... done
check_subvols... done
check_subvol_children... done
delete_dead_snapshots... done
check_inodes... done
check_extents... done
check_indirect_extents... done
check_dirents... done
check_xattrs... done
check_root... done
check_subvolume_structure... done
check_directory_structure... done
check_nlinks... done
resume_logged_ops... done
delete_dead_inodes... done
shutdown complete, journal seq 5005


[clip carl]# mount /dev/clip/local.old /cet

[clip carl]# umount /cet



^C^C^C^C^C



From dmesg:
---

[10679.819992] bcachefs (dm-7): mounting version 1.11: (unknown version) opts=compression=lz4
[10679.820007] bcachefs (dm-7): recovering from clean shutdown, journal seq 5005
[10679.820009] bcachefs (dm-7): Version downgrade required:
[10679.828985] bcachefs (dm-7): alloc_read... done
[10679.828999] bcachefs (dm-7): stripes_read... done
[10679.829004] bcachefs (dm-7): snapshots_read... done
[10679.829439] bcachefs (dm-7): journal_replay... done
[10679.829445] bcachefs (dm-7): resume_logged_ops... done
[10679.829454] bcachefs (dm-7): going read-write
[10929.797197] Process accounting resumed
[10930.010868] r8169 0000:02:00.0 eth0: Link is Down
[10935.961246] INFO: task bch-copygc/dm-7:28532 blocked for more than 122 seconds.
[10935.961252]       Not tainted 6.9.4-arch1-1 #1
[10935.961253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10935.961255] task:bch-copygc/dm-7 state:D stack:0     pid:28532 tgid:28532 ppid:2      flags:0x00024000
[10935.961259] Call Trace:
[10935.961261]  <TASK>
[10935.961265]  __schedule+0x3c7/0x1510
[10935.961275]  schedule+0x27/0xf0
[10935.961278]  __closure_sync+0x7e/0x140
[10935.961283]  __bch2_write+0x136b/0x1660 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961336]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961339]  ? __kmalloc+0x1a7/0x440
[10935.961343]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961346]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961351]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961353]  ? local_clock_noinstr+0xd/0xd0
[10935.961355]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961357]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961360]  ? bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961397]  bch2_moving_ctxt_do_pending_writes+0x11a/0x220 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961426]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.961428]  ? bch2_btree_path_traverse_one+0x958/0xcf0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961467]  bch2_data_update_init+0x68b/0x1420 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961512]  ? bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961549]  bch2_move_extent+0x3da/0xed0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961584]  ? bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961615]  bch2_evacuate_bucket+0x9d4/0xc00 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961653]  ? bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961684]  bch2_copygc+0x210/0x880 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961718]  bch2_copygc_thread+0x152/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961749]  ? bch2_copygc_thread+0xcf/0x3d0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961784]  ? __pfx_bch2_copygc_thread+0x10/0x10 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961813]  kthread+0xcf/0x100
[10935.961818]  ? __pfx_kthread+0x10/0x10
[10935.961821]  ret_from_fork+0x31/0x50
[10935.961825]  ? __pfx_kthread+0x10/0x10
[10935.961828]  ret_from_fork_asm+0x1a/0x30
[10935.961834]  </TASK>
[10935.961835] INFO: task umount:28561 blocked for more than 122 seconds.
[10935.961837]       Not tainted 6.9.4-arch1-1 #1
[10935.961838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10935.961839] task:umount          state:D stack:0     pid:28561 tgid:28561 ppid:27681  flags:0x00004004
[10935.961842] Call Trace:
[10935.961844]  <TASK>
[10935.961846]  __schedule+0x3c7/0x1510
[10935.961850]  ? schedule+0x27/0xf0
[10935.961855]  schedule+0x27/0xf0
[10935.961857]  schedule_timeout+0x12f/0x160
[10935.961862]  wait_for_completion+0x86/0x170
[10935.961866]  kthread_stop+0x6a/0x180
[10935.961869]  bch2_copygc_stop+0x1e/0x80 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961900]  __bch2_fs_read_only+0x3b/0x210 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961938]  bch2_fs_read_only+0x140/0x3f0 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.961972]  ? __pfx_autoremove_wake_function+0x10/0x10
[10935.961976]  __bch2_fs_stop+0x5a/0x380 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.962009]  generic_shutdown_super+0x77/0x170
[10935.962014]  bch2_kill_sb+0x16/0x20 [bcachefs 8d6f6bf430dcfbb124cd4a016333997e24e1fc8a]
[10935.962053]  deactivate_locked_super+0x30/0xb0
[10935.962057]  cleanup_mnt+0xba/0x150
[10935.962061]  task_work_run+0x59/0x90
[10935.962065]  syscall_exit_to_user_mode+0x1fe/0x210
[10935.962067]  do_syscall_64+0x8f/0x190
[10935.962071]  ? syscall_exit_to_user_mode+0x75/0x210
[10935.962073]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.962075]  ? do_syscall_64+0x8f/0x190
[10935.962077]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.962079]  ? srso_alias_return_thunk+0x5/0xfbef5
[10935.962081]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[10935.962085] RIP: 0033:0x79801ee197a9
[10935.962127] RSP: 002b:00007ffd0559c690 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[10935.962130] RAX: 0000000000000000 RBX: 000079801edcfb50 RCX: 000079801ee197a9
[10935.962132] RDX: 000000000000034a RSI: 0000000000000000 RDI: 000056a4fc1f53b0
[10935.962133] RBP: 000056a4fc1f53b0 R08: 00000000ffffff9c R09: 0000000000000000
[10935.962135] R10: 00000000fffffffe R11: 0000000000000246 R12: 0000000000000000
[10935.962136] R13: 000079801edcfc58 R14: 000056a4fc1f5740 R15: 0000000000000000
[10935.962140]  </TASK>
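
The logs above differ in one visible respect: the userspace fsck prints a `running recovery passes: check_allocations` line directly after `Version downgrade required:`, while the 6.9.4 kernel prints nothing after that line before moving on to `alloc_read`. A minimal sketch for spotting this in a saved log, assuming the log format shown above; `downgrade_passes` is a hypothetical helper name and not part of bcachefs-tools:

```shell
# Hypothetical helper (not a bcachefs-tools command): reads a kernel or fsck
# log on stdin and reports, for each "Version downgrade required:" line,
# whether the very next line lists recovery passes.
downgrade_passes() {
    awk '
        # Remember that we just saw the downgrade notice.
        /Version downgrade required:/ { pending = 1; next }

        # The following line names the passes: print just the pass list.
        pending && /running recovery passes:/ {
            sub(/.*running recovery passes: */, "")
            print "passes: " $0
            pending = 0
            next
        }

        # Any other line right after the notice means no passes were listed.
        pending { print "no recovery passes listed"; pending = 0 }

        # The log ended immediately after the notice.
        END { if (pending) print "no recovery passes listed" }
    '
}
```

Typical use would be `dmesg | downgrade_passes`, or feeding it the captured output of an fsck run. It only inspects log text, so it says nothing about whether the passes that were listed actually repaired anything.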

* Re: Another bcachefs version downgrade bug
  2024-10-17  9:15       ` Carl E. Thompson
@ 2024-10-17  9:30         ` Kent Overstreet
  2024-10-17  9:45           ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-17  9:30 UTC (permalink / raw)
  To: Carl E. Thompson; +Cc: linux-bcachefs@vger.kernel.org

On Thu, Oct 17, 2024 at 02:15:27AM -0700, Carl E. Thompson wrote:
> 
> > On 2024-10-17 1:39 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > ...
> 
> > Again - bcachefs was only merged in 6.7, clearly marked experimental,
> > and you're running 6.9; this kind of bug is exactly the sort of thing we
> > try to shake out in the experimental phase.
> 
> Not a bcachefs problem but as a distribution user I would have no idea
> that bcachefs was experimental. Every major distribution I've looked
> at recently includes the bcachefs module and tools and there is
> nothing to tell the user it's experimental. Only the person who
> actually configured the kernel (or people who read mailing lists)
> would know that it's experimental.

I'm honestly not surprised, when I met with the Fedora people just prior
to 6.7 I spent most of the meeting telling them to _slow down_.

> Perhaps if this is to be expected right now the bcachefs command line
> tool should output a big warning letting users know that bcachefs is
> experimental and might eat their data?

Honestly not warranted at this point, things have been stabilizing fast
and I'm likely 6 months or so from taking the experimental label off.

> > Also, a fsck would have sufficed, if you haven't run that already.
> 
> That must be a different bug because that doesn't work. I still have
> the old filesystem images and I just tried fsck again, then mounted,
> then tried to unmount and immediately got the same filesystem lockup.
> Time to reboot. See below.

Please try 6.10 - I can't get any fixes into 6.9 anymore, but if you
still run into issues on 6.10 I can fix that.

And check 'bcachefs fs usage' first, that will tell us if disk
accounting is screwed up or if we're looking for something else.

* Re: Another bcachefs version downgrade bug
  2024-10-17  9:30         ` Kent Overstreet
@ 2024-10-17  9:45           ` Carl E. Thompson
  2024-10-17 10:13             ` Kent Overstreet
  0 siblings, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-17  9:45 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs@vger.kernel.org


> On 2024-10-17 2:30 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:

> ...

> Honestly not warranted at this point, things have been stabilizing fast
> and I'm likely 6 months or so from taking the experimental label off.

Remember the 80/20 rule. I suspect that may be (should be) further away than we might like.

Carl

* Re: Another bcachefs version downgrade bug
  2024-10-17  9:45           ` Carl E. Thompson
@ 2024-10-17 10:13             ` Kent Overstreet
  2024-10-17 16:49               ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-17 10:13 UTC (permalink / raw)
  To: Carl E. Thompson; +Cc: linux-bcachefs@vger.kernel.org

On Thu, Oct 17, 2024 at 02:45:19AM -0700, Carl E. Thompson wrote:
> 
> > On 2024-10-17 2:30 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > ...
> 
> > Honestly not warranted at this point, things have been stabilizing fast
> > and I'm likely 6 months or so from taking the experimental label off.
> 
> Remember the 80/20 rule. I suspect that may be (should be) further away than we might like.

Carl, I'm happy to fix bugs that you find, but the attitude and the
unnecessary advice - that I don't enjoy.

* Re: Another bcachefs version downgrade bug
  2024-10-17 10:13             ` Kent Overstreet
@ 2024-10-17 16:49               ` Carl E. Thompson
  2024-10-18  8:17                 ` Christopher Snowhill
  0 siblings, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-17 16:49 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs@vger.kernel.org



> On 2024-10-17 3:13 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:

> ...

> Carl, I'm happy to fix bugs that you find, but the attitude and the
> unnecessary advice - that I don't enjoy.

Seriously, Kent? **You're** talking to **me** about attitude?

**You're** giving **me** passive-aggressive attitude about how I'm reporting a bug when using a kernel which is only 4 months old. And how about that crack about how I should have just done an fsck when you knew (should have known) that a fsck doesn't fix the problem? I don't doubt that in your mind those are subtle insults that would fly right over my head and I wouldn't notice but I might suggest that other people aren't quite the idiots you seem to always assume we are.

I'm trying to **help** you, Kent. And that includes the reminder about how long it takes to put mostly-done software into a usable state. That's not attitude, that's just reality.

Carl

* Re: Another bcachefs version downgrade bug
  2024-10-17 16:49               ` Carl E. Thompson
@ 2024-10-18  8:17                 ` Christopher Snowhill
  2024-10-18 17:37                   ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Christopher Snowhill @ 2024-10-18  8:17 UTC (permalink / raw)
  To: Carl E. Thompson, Kent Overstreet; +Cc: linux-bcachefs@vger.kernel.org

On Thu Oct 17, 2024 at 9:49 AM PDT, Carl E. Thompson wrote:
>
>
> > On 2024-10-17 3:13 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> > ...
>
> > Carl, I'm happy to fix bugs that you find, but the attitude and the
> > unnecessary advice - that I don't enjoy.
>
> Seriously, Kent? **You're** talking to **me** about attitude?
>
> **You're** giving **me** passive-aggressive attitude about how I'm reporting a bug when using a kernel which is only 4 months old. And how about that crack about how I should have just done an fsck when you knew (should have known) that a fsck doesn't fix the problem? I don't doubt that in your mind those are subtle insults that would fly right over my head and I wouldn't notice but I might suggest that other people aren't quite the idiots you seem to always assume we are.
>
> I'm trying to **help** you, Kent. And that includes the reminder about how long it takes to put mostly-done software into a usable state. That's not attitude, that's just reality.
>
> Carl

FYI, 6.9 was first released in May, and is already EOL now. In fact, so
is 6.10, at 6.10.14. There are no LTS kernels with bcachefs support yet
at this point. All further development is going into 6.12 now, with bug
fixes being backported into 6.11.

The only distributions still shipping 6.9 or 6.10 are backporting things
themselves. Actually, last I recall, there's at least one distribution
that decided to slap 6.8 into an LTS as well. I guess big distributions
just *love* doing their own backport work.

- Christopher

* Re: Another bcachefs version downgrade bug
  2024-10-18  8:17                 ` Christopher Snowhill
@ 2024-10-18 17:37                   ` Carl E. Thompson
  2024-10-18 19:12                     ` Kent Overstreet
  0 siblings, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-18 17:37 UTC (permalink / raw)
  To: Christopher Snowhill, Kent Overstreet; +Cc: linux-bcachefs@vger.kernel.org


> On 2024-10-18 1:17 AM PDT Christopher Snowhill <chris@kode54.net> wrote:

> ...

> FYI, 6.9 was first released in May, and is already EOL now. In fact, so
> is 6.10, at 6.10.14.

I'm on 6.9.4 which was released in June. It's not really by choice; there are some nasty stability issues in the amdgpu driver in 6.10 and 6.11 which cause my laptop to hang multiple times per day if I use a graphical environment under them.

> There are no LTS kernels with bcachefs support yet
> at this point. All further development is going into 6.12 now, with bug
> fixes being backported into 6.11.

True. I understand that and accept it. I am not at all suggesting that Kent should make fixes for unsupported kernels. I'm just reporting a bug. But I also think that this sort of issue is one that really, **REALLY** should not happen in the first place. A filesystem silently modifying itself so that it no longer works on earlier kernels without warning is a very bad thing in my opinion. There should be processes in place to catch that sort of problem **before** a new kernel is released. And remember this isn't a one-time thing. It's happened before.

I'd also say that a filesystem design that requires that an older driver be able to successfully automatically discover and undo random on-disk modifications automatically made by a newer driver is probably a bad design (my personal opinion). It sounds great until it doesn't work (as in this case). And I mean that as constructive criticism and not something that Kent should take personally. I think that particular design decision should be reevaluated.

It's just dumb luck that in this case the older kernel is no longer supported so it doesn't need to be fixed.

As for my particular case, I should probably see if I can get the older, stable amdgpu driver to compile under the current kernel. Barring that, I should see if I can get the current bcachefs driver to compile under the older kernel. I guess I'll find time to do that this evening.

> ...

You make some good points.

Thanks,
Carl

* Re: Another bcachefs version downgrade bug
  2024-10-18 17:37                   ` Carl E. Thompson
@ 2024-10-18 19:12                     ` Kent Overstreet
  2024-10-19  0:15                       ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-18 19:12 UTC (permalink / raw)
  To: Carl E. Thompson; +Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org

On Fri, Oct 18, 2024 at 10:37:54AM -0700, Carl E. Thompson wrote:
> 
> > On 2024-10-18 1:17 AM PDT Christopher Snowhill <chris@kode54.net> wrote:
> 
> > ...
> 
> > FYI, 6.9 was first released in May, and is already EOL now. In fact, so
> > is 6.10, at 6.10.14.
> 
> I'm on 6.9.4 which was released in June. It's not really by choice; there are some nasty stability issues in the amdgpu driver in 6.10 and 6.11 which cause my laptop to hang multiple times per day if I use a graphical environment under them.
> 
> > There are no LTS kernels with bcachefs support yet
> > at this point. All further development is going into 6.12 now, with bug
> > fixes being backported into 6.11.
> 
> True. I understand that and accept it. I am not at all suggesting that Kent should make fixes for unsupported kernels. I'm just reporting a bug. But I also think that this sort of issue is one that really, **REALLY** should not happen in the first place. A filesystem silently modifying itself so that it no longer works on earlier kernels without warning is a very bad thing in my opinion. There should be processes in place to catch that sort of problem **before** a new kernel is released. And remember this isn't a one-time thing. It's happened before.
> 
> I'd also say that a filesystem design that requires that an older
> driver be able to successfully automatically discover and undo random
> on-disk modifications automatically made by a newer driver is probably
> a bad design (my personal opinion). It sounds great until it doesn't
> work (as in this case). And I mean that as constructive criticism and
> not something that Kent should take personally. I think that
> particular design decision should be reevaluated.

It sounds like you should go back to ZFS then.

> It's just dumb luck that in this case the older kernel is no longer
> supported so it doesn't need to be fixed.
> 
> As for my particular case, I should probably see if I can get the
> older, stable amdgpu driver to compile under the current kernel.
> Barring that, I should see if I can get the current bcachefs driver to
> compile under the older kernel. I guess I'll find time to do that this
> evening.

I have been hearing a lot of people complain about regressions in
amdgpu. Have you taken that up with them?

The capability of doing seamless, automatic upgrades and downgrades, to
the extent that bcachefs can, is something new that other filesystems
don't have.

And it's been incredibly useful. Without it, rolling out the disk
accounting rewrite wouldn't have been possible - and that got us per
snapshot ID accounting, compression type accounting, per-btree
accounting, and per-inode fragmentation accounting (not all of these are
exposed to the user yet, but they're there).

While bcachefs is still marked experimental, doing these upgrades
automatically makes sense because it gets us better test coverage of
codepaths that will need to be rock solid later, and it drastically
reduces the amount of compat code I have to carry around. I'm hoping to
get several more on disk format features done while we're still in the
experimental phase - for inode allocation improvements, fsck performance
improvements, and assorted other odds and ends.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-18 19:12                     ` Kent Overstreet
@ 2024-10-19  0:15                       ` Carl E. Thompson
  2024-10-19  8:13                         ` Malte Schröder
  2024-10-20 16:59                         ` Kent Overstreet
  0 siblings, 2 replies; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-19  0:15 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org


> On 2024-10-18 12:12 PM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:

> ...

> It sounds like you should go back to ZFS then.

If you tell me you don't want me testing bcachefs anymore it won't hurt my feelings and I'll respect your wishes. There are plenty of quality filesystems for me to use where I'll have less hassle. But I'd suggest to you that pushing out testers who point out bugs and try to offer constructive criticism isn't the best way to make quality software.

> I have been hearing a lot of people complain about regressions in
> amdgpu. Have you taken that up with them?

I personally have not because other people already have. They're aware of the issues and I'm confident they'll fix them.

> The capability of doing seamless, automatic upgrades and downgrades, to
> the extent that bcachefs can, is something new that other filesystems
> don't have.
> 
> And it's been incredibly useful. Without it, rolling out the disk
> accounting rewrite wouldn't have been possible - and that got us per
> snapshot ID accounting, compression type accounting, per-btree
> accounting, and per-inode fragmentation accounting (not all of these are
> exposed to the user yet, but they're there).
> 
> While bcachefs is still marked experimental, doing these upgrades
> automatically makes sense because it gets us better test coverage of
> codepaths that will need to be rock solid later, and it drastically
> reduces the amount of compat code I have to carry around. I'm hoping to
> get several more on disk format features done while we're still in the
> experimental phase - for inode allocation improvements, fsck performance
> improvements, and assorted other odds and ends.

What you say here makes perfect sense. But I also think it's risky. That's the point I was trying to make.

Carl

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19  0:15                       ` Carl E. Thompson
@ 2024-10-19  8:13                         ` Malte Schröder
  2024-10-19  8:31                           ` Martin Steigerwald
  2024-10-20 16:59                         ` Kent Overstreet
  1 sibling, 1 reply; 26+ messages in thread
From: Malte Schröder @ 2024-10-19  8:13 UTC (permalink / raw)
  To: Carl E. Thompson
  Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org,
	Kent Overstreet

On 19/10/2024 02:15, Carl E. Thompson wrote:
> If you tell me you don't want me testing bcachefs anymore it won't hurt my feelings and I'll respect your wishes. There are plenty of quality filesystems for me to use where I'll have less hassle. But I'd suggest to you that pushing out testers who point out bugs and try to offer constructive criticism isn't the best way to make quality software.

I think in your case the developer of the fs is the wrong person to 
complain to. The issues you are reporting have looong been fixed but 
apparently your distro neglected to provide these fixes to its users. So 
if you are stuck with a 6.9 series kernel, well, bcachefs was really not 
ready for daily use back then. 6.11 is fine, 6.12 seems to fix the last 
issue I was seeing. So I think the options you have are: get a newer 
kernel and/or choose a different fs.


/Malte


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19  8:13                         ` Malte Schröder
@ 2024-10-19  8:31                           ` Martin Steigerwald
  2024-10-19  9:29                             ` Carl E. Thompson
  2024-10-19 20:18                             ` Jani Partanen
  0 siblings, 2 replies; 26+ messages in thread
From: Martin Steigerwald @ 2024-10-19  8:31 UTC (permalink / raw)
  To: Carl E. Thompson, Malte Schröder
  Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org,
	Kent Overstreet

Hi Carl, hi Malte, hi,

Malte Schröder - 19.10.24, 10:13:08 MESZ:
> On 19/10/2024 02:15, Carl E. Thompson wrote:
> > If you tell me you don't want me testing bcachefs anymore it won't
> > hurt my feelings and I'll respect your wishes. There are plenty of
> > quality filesystems for me to use where I'll have less hassle. But
> > I'd suggest to you that pushing out testers who point out bugs and
> > try to offer constructive criticism isn't the best way to make
> > quality software.
>
> I think in your case the developer of the fs is the wrong person to
> complain to. The issues you are reporting have looong been fixed but
> apparently your distro neglected to provide these fixes to its users. So
> if you are stuck with a 6.9 series kernel, well, bcachefs was really
> not ready for daily use back then. 6.11 is fine, 6.12 seems to fix the
> last issue I was seeing. So I think the options you have are: get a
> newer kernel and/or choose a different fs.

While I certainly do not agree with Kent on everything – and also not with 
the tone of some conversations –, I agree here about the basic situation:

BCacheFS is marked experimental. My take with that is: As long as it is 
marked experimental and you like to test it and give feedback, it is 
important to move quickly enough to new kernel versions. It was and partly 
still is the same with BTRFS. Developers often asked users to use a newer 
kernel. Feedback on BCacheFS on 6.9 is quite likely not very useful to 
Kent and other BCacheFS developers while they already work on what to 
bring in for 6.13.

It reminds me of an annoying issue with appointment reminders in KDE's 
Plasma and one frustrated bug reporter expecting the issue to be fixed in the 
version of the software it occurred in. Due to the nature of the 
implementation of restoring lost functionality the fix had some familiarity 
with a new feature and was more than 100 lines changed in different files. 
While I certainly get that it has been frustrating for the user, cause the 
issue was annoying for me as well… I would not make demands 
on how developers use their free time. Of course, Carl, in case you 
support Kent financially regarding BCacheFS development… then that may be a 
bit of a different story, but once kernels are out of stable support… I'd 
still agree with Kent.

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19  8:31                           ` Martin Steigerwald
@ 2024-10-19  9:29                             ` Carl E. Thompson
  2024-10-20  9:29                               ` Kent Overstreet
  2024-10-19 20:18                             ` Jani Partanen
  1 sibling, 1 reply; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-19  9:29 UTC (permalink / raw)
  To: Martin Steigerwald, Malte Schröder
  Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org,
	Kent Overstreet

I reported the bug because it's a bug that I hadn't seen reported or referenced before. I didn't know at the time I reported it that it had already been fixed. 

I actually **still** don't know that the bug is fixed in current versions. What Kent said is that "6.10 works" which isn't quite the same thing as saying that he previously found and fixed the bug. He could have meant that he knew about and fixed it or he could have meant that there have not (yet) been any on-disk layout changes since 6.10 that would trigger the bug. Looking at the 6.10 pull requests I don't see anything in Kent's description of the changes there that suggests that this bug was known or fixed then unless it's the bch2_sb_downgrade_update() fix that went into rc5 but wasn't described. Maybe Kent could clarify?

If the bug **was** discovered and fixed previously, I think a note to the list warning users about the problem would have been helpful. It certainly would have saved me a bunch of time.

Thanks,
Carl


> On 2024-10-19 1:31 AM PDT Martin Steigerwald <martin@lichtvoll.de> wrote:
> 
>  
> Hi Carl, hi Malte, hi,
> 
> Malte Schröder - 19.10.24, 10:13:08 MESZ:
> > On 19/10/2024 02:15, Carl E. Thompson wrote:
> > > If you tell me you don't want me testing bcachefs anymore it won't
> > > hurt my feelings and I'll respect your wishes. There are plenty of
> > > quality filesystems for me to use where I'll have less hassle. But
> > > I'd suggest to you that pushing out testers who point out bugs and
> > > try to offer constructive criticism isn't the best way to make
> > > quality software.
> >
> > I think in your case the developer of the fs is the wrong person to
> > complain to. The issues you are reporting have looong been fixed but
> > apparently your distro neglected to provide these fixes to its users. So
> > if you are stuck with a 6.9 series kernel, well, bcachefs was really
> > not ready for daily use back then. 6.11 is fine, 6.12 seems to fix the
> > last issue I was seeing. So I think the options you have are: get a
> > newer kernel and/or choose a different fs.
> 
> While I certainly do not agree with Kent on everything – and also not with 
> the tone of some conversations –, I agree here about the basic situation:
> 
> BCacheFS is marked experimental. My take with that is: As long as it is 
> marked experimental and you like to test it and give feedback, it is 
> important to move quickly enough to new kernel versions. It was and partly 
> still is the same with BTRFS. Developers often asked users to use a newer 
> kernel. Feedback on BCacheFS on 6.9 is quite likely not very useful to 
> Kent and other BCacheFS developers while they already work on what to 
> bring in for 6.13.
> 
> It reminds me of an annoying issue with appointment reminders in KDE's 
> Plasma and one frustrated bug reporter expecting to fix the issue in the 
> version of the software it occurred in. Due to the nature of the 
> implementation of restoring lost functionality the fix had some familiarity 
> with a new feature and was more than 100 lines changed in different files. 
> While I certainly get that it has been frustrating for the user, cause the 
> issue was annoying for me as well… I would not expect and basically demand 
> on how developers use their free time. Of course, Carl, in case you 
> support Kent financially regarding BCacheFS development… then that may be a 
> bit of a different story, but once kernels are out of stable support… I'd 
> still agree with Kent.
> 
> Best,
> -- 
> Martin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19  8:31                           ` Martin Steigerwald
  2024-10-19  9:29                             ` Carl E. Thompson
@ 2024-10-19 20:18                             ` Jani Partanen
  2024-10-20  8:04                               ` Malte Schröder
  1 sibling, 1 reply; 26+ messages in thread
From: Jani Partanen @ 2024-10-19 20:18 UTC (permalink / raw)
  To: linux-bcachefs


On 19/10/2024 11.31, Martin Steigerwald wrote:
> BCacheFS is marked experimental. My take with that is: As long as it is
> marked experimental and you like to test it and give feedback, it is
> important to move quickly enough to new kernel versions. It was and partly
> still is the same with BTRFS. Developers often asked users to use a newer
> kernel. Feedback on BCacheFS on 6.9 is quite likely not very useful to
> Kent and other BCacheFS developers while they already work on what to
> bring in for 6.13.

The issue is that normal users have no idea that bcachefs is experimental, 
because it is only marked as experimental if you are building the 
kernel yourself.

- modinfo does not say anything about experimental

- bcachefs user tool does not say anything about experimental. iirc 
btrfs user tool warns that you are going to use experimental stuff when 
you select raid5


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19 20:18                             ` Jani Partanen
@ 2024-10-20  8:04                               ` Malte Schröder
  2024-10-21  3:49                                 ` Jani Partanen
  0 siblings, 1 reply; 26+ messages in thread
From: Malte Schröder @ 2024-10-20  8:04 UTC (permalink / raw)
  To: Jani Partanen, linux-bcachefs

On 19/10/2024 22:18, Jani Partanen wrote:
>
> On 19/10/2024 11.31, Martin Steigerwald wrote:
>> BCacheFS is marked experimental. My take with that is: As long as it is
>> marked experimental and you like to test it and give feedback, it is
>> important to move quickly enough to new kernel versions. It was and 
>> partly
>> still is the same with BTRFS. Developers often asked users to use a 
>> newer
>> kernel. Feedback on BCacheFS on 6.9 is quite likely not very useful to
>> Kent and other BCacheFS developers while they already work on what to
>> bring in for 6.13.
>
> Issue is that normal user have no idea that bcachefs is experimental 
> because it is only told to be experimental IF you are building 
> kernel yourself.
>
> - modinfo does not say anything about experimental
>
> - bcachefs user tool does not say anything about experimental. iirc 
> btrfs user tool warns that you are going to use experimental stuff 
> when you select raid5
>
>
I think the only place where that can be done sensibly (even 
retroactively) is in the distro's package manager when one installs the 
tools. Might have been something that could have been in the format 
sub-command, but it's way too late for that now.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19  9:29                             ` Carl E. Thompson
@ 2024-10-20  9:29                               ` Kent Overstreet
  0 siblings, 0 replies; 26+ messages in thread
From: Kent Overstreet @ 2024-10-20  9:29 UTC (permalink / raw)
  To: Carl E. Thompson
  Cc: Martin Steigerwald, Malte Schröder, Christopher Snowhill,
	linux-bcachefs@vger.kernel.org

On Sat, Oct 19, 2024 at 02:29:53AM -0700, Carl E. Thompson wrote:
> I reported the bug because it's a bug that I hadn't seen reported or referenced before. I didn't know at the time I reported it that it had already been fixed. 
> 
> I actually **still** don't know that the bug is fixed in current versions. What Kent said is that "6.10 works" which isn't quite the same thing as saying that he previously found and fixed the bug. He could have meant that he knew about and fixed it or he could have meant that there have not (yet) been any on-disk layout changes since 6.10 that would trigger the bug. Looking at the 6.10 pull requests I don't see anything in Kent's description of the changes there that suggests that this bug was known or fixed then unless it's the bch2_sb_downgrade_update() fix that went into rc5 but wasn't described. Maybe Kent could clarify?

Yes, the bug has been fixed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-19  0:15                       ` Carl E. Thompson
  2024-10-19  8:13                         ` Malte Schröder
@ 2024-10-20 16:59                         ` Kent Overstreet
  2024-10-21  0:34                           ` Carl E. Thompson
  1 sibling, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-20 16:59 UTC (permalink / raw)
  To: Carl E. Thompson; +Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org

On Fri, Oct 18, 2024 at 05:15:06PM -0700, Carl E. Thompson wrote:
> 
> > On 2024-10-18 12:12 PM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > ...
> 
> > It sounds like you should go back to ZFS then.
> 
> If you tell me you don't want me testing bcachefs anymore it won't
> hurt my feelings and I'll respect your wishes. There are plenty of
> quality filesystems for me to use where I'll have less hassle. But I'd
> suggest to you that pushing out testers who point out bugs and try to
> offer constructive criticism isn't the best way to make quality
> software.

Testing is fine.

I have a lot of users and testers that I work with, and will bend over
backwards for if something is broken, and to make sure data isn't lost.

But I do draw a line at being demanding, or all the "constructive
criticism" that comes with an attitude of "I know your job better than
you do". You wouldn't walk into someone's office and immediately start
telling them how to do things, would you?

I don't subscribe to the modern theories about how we all have to be nice
to each other and always use nice words. If you're legitimately pissed
off because I screwed up and shipped something broken and it ate your
data and now you're screwed if it's not recovered - that's one thing.

Making sure your data is safe is why I'm here, and if you want help with
recovering a filesystem, I'll always help with that.

But you're complaining about an EOL kernel, where the issues have long
been fixed, so in this case there's just not a lot I can do. I could ask
Greg to cut a new 6.9 release, but all the distros have been told that
6.9 is EOL so it's not likely it would be widely deployed.

> > I have been hearing a lot of people complain about regressions in
> > amdgpu. Have you taken that up with them?
> 
> I personally have not because other people already have. They're aware of the issues and I'm confident they'll fix them.
> 
> > The capability of doing seamless, automatic upgrades and downgrades, to
> > the extent that bcachefs can, is something new that other filesystems
> > don't have.
> > 
> > And it's been incredibly useful. Without it, rolling out the disk
> > accounting rewrite wouldn't have been possible - and that got us per
> > snapshot ID accounting, compression type accounting, per-btree
> > accounting, and per-inode fragmentation accounting (not all of these are
> > exposed to the user yet, but they're there).
> > 
> > While bcachefs is still marked experimental, doing these upgrades
> > automatically makes sense because it gets us better test coverage of
> > codepaths that will need to be rock solid later, and it drastically
> > reduces the amount of compat code I have to carry around. I'm hoping to
> > get several more on disk format features done while we're still in the
> > experimental phase - for inode allocation improvements, fsck performance
> > improvements, and assorted other odds and ends.
> 
> What you say here makes perfect sense. But I also think it's risky.
> That's the point I was trying to make.

Anyone can build a bridge that doesn't fall down, it takes an engineer
to build one that just barely doesn't fall down.

There is always some level of risk to developing new capabilities and
deploying new code, and we have to balance those risks against the worth
of those capabilities, along with all of our priorities.

The ability to upgrade to a new on disk format, and then seamlessly
downgrade - that hasn't been done before, keep that in mind, and that's
an enormously useful capability that makes it dramatically _less_ risky
to deploy new features.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-20 16:59                         ` Kent Overstreet
@ 2024-10-21  0:34                           ` Carl E. Thompson
  2024-10-21  1:15                             ` Kent Overstreet
  2024-10-21  7:26                             ` Another bcachefs version downgrade bug Martin Steigerwald
  0 siblings, 2 replies; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-21  0:34 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org


> On 2024-10-20 9:59 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:

> ...

> I have a lot of users and testers that I work with, and will bend over
> backwards for if something is broken, and to make sure data isn't lost.

Absolutely. We've all seen you do that over the years. That's a big part of the reason why users like me spend so much of our own time and resources testing and supporting your work.

> But I do draw a line at being demanding, or all the "constructive
> criticism" that comes with an attitude of "I know your job better than
> you do". 

And I'm not doing that. I'm simply reporting bugs and offering perspectives that you may not have thought of for your consideration. I don't demand that you do anything because I **can't** demand that you do anything. 

I'll give you an example. More than a year ago I raised the issue that bcachefs does not allow multiple versions/images of a filesystem to be mounted at the same time. I pointed out that there are several professional workflows that require this (forensics, auditing, etc) and non-professional workflows too (retrieving a previous version of a file from an LVM snapshot, etc). I pointed out that other modern filesystems allow it and argued that it is a needed feature. You disagreed that it is that important and have never (to date) implemented the feature. This causes me to have to create special-case workarounds specific to bcachefs which operate differently than what I do for every other filesystem. Despite this extra work and inconvenience for me I have never brought it up again, never nagged you and never demanded that you listen to me and change your mind. I made my pitch, you made your decision and I accept your decision even if I don't like it.

I give you my opinion and the logic behind it and you can choose to consider what I say or not. You have your decades of experience. I have my decades of experience. Despite my own considerable experience I've personally found that sometimes other people **can** make persuasive points that change my mind and can make a project better **if I'm open to them**.

It's really, really hard to tell what someone's attitude is over the internet. I'm sometimes guilty of trying to do that too and so in the cases where I've done that to you I apologize. But I assure you my intent when I make a comment is to **persuade** you and not to demand or force you to do anything.

> You wouldn't walk into someone's office and immediately start
> telling them how to do things, would you?

No, but that's an orthogonal argument. This isn't an office. This (the Linux kernel) is an open-source project that not only encourages open collaboration among many types of people **it absolutely requires it**. By most any objective measure Linux has been, still is and will continue to be by far the most successful, influential and dominant software project in the history of humanity and it's not even close. In my opinion that's a core reason why it rubs a lot of people the wrong way when you over and over seem to tell Linus and the other kernel developers "you're doing it wrong." (If I'm misinterpreting I apologize.) Now, you've definitely earned the right to voice your opinion but I'd advise taking a page from my book and making your point as eloquently as you can **once** and then let it go if they don't go for all of it. If you must, bring it up again periodically (I'd suggest annually) but of course that's just free advice (and worth every penny!) and you should do what you think is best in the way you think is best.

I'll also make the self-serving point that Linux isn't successful only because of kernel developers like you; it's just as much due to users like me. Back in the 90s I was once **literally** laughed out of a boardroom for presenting rationale for why my company should pivot from developing products on traditional Unices and start focusing our efforts on the new Linux thing. They disagreed; I left the company and within months they were out of business so I guess I got the last laugh. I didn't write fancy papers and become famous the way some people did back then but I have ever since relentlessly done my part by pushing every company I've worked for to be more successful by more fully leveraging the benefits of Linux. Now, of course, pretty much every company gets it. I advocate for Linux in my personal life too. And there are countless millions of other users who also contribute in their own ways big and small most of whom will never be known or even have conversations on kernel mailing lists but who are just as important and integral to the success of Linux.

So there's a whole big picture thing to keep in mind.

> I don't subscribe to the modern theories about how we all have to be nice
> to each other and always use nice words

Well those theories are not at all modern concepts. Whether you call it karma, or the Golden Rule, or being neighborly, or modern political correctness, the idea that human societies work better when people work together and politely respect others' differences and opinions is ancient.

Now as a practical matter all human beings have egos and feelings. You could make an interesting theoretical logical argument that those things **shouldn't** matter in social interactions such as the development of software but my experience is that in reality those things matter **more than everything else** in **every** type of voluntary social interaction. 

One of the first hard lessons I learned in independent life is that it doesn't matter how smart you are if you can't get people to like you. If a large enough portion of the people around you don't like you they will keep sniping at you until you are (socially or professionally) dead. That's just a fact of life that's also true professionally and it's probably true here too. Should it be that way? The answer doesn't matter. For me, dealing with it professionally was initially an intellectual challenge; how do I get my coworkers to like me so that I (and my ideas) can be more successful? But the funny thing is I found out that purposefully expending effort to be nice so I can further my own success doesn't just make **me** more successful, it makes everyone else more successful too. And it has the unexpected benefit of making me **happier** too, which is something I need.

So while I understand your opinion it's not one I share.

> ...

> But you're complaining about an EOL kernel, where the issues have long
> been fixed, so in this case there's just not a lot I can do. I could ask
> Greg to cut a new 6.9 release, but all the distros have been told that
> 6.9 is EOL so it's not likely it would be widely deployed.

I am **not** complaining about an EOL kernel. I explicitly stated that I **didn't** need a fix. I simply reported a bug that affected me which I was unaware had been fixed.

I do think that if you find and fix a bug like that it would be nice for you to document it and let users know about it so we can possibly avoid wasting a lot of our time and effort unnecessarily.

> ...

> The ability to upgrade to a new on disk format, and then seamlessly
> downgrade - that hasn't been done before, keep that in mind, and that's
> an enormously useful capability that makes it dramatically _less_ risky
> to deploy new features.

As I said previously I understand and accept your reasoning here. I think it's risky to believe you can ahead of time code in the ability to understand and correctly revert any possible format change you'll come up with in the future especially since the only reason we're talking about it is because it's already failed at least once. It's also a system that I believe can't be meaningfully tested without a time machine.

But it's your project and I accept your judgement and I have no need to talk about that feature any further (unless you yourself find it helpful to continue the discussion).

Thanks,
Carl

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-21  0:34                           ` Carl E. Thompson
@ 2024-10-21  1:15                             ` Kent Overstreet
  2024-10-21  7:43                               ` Mounting multiple versions/snapshots/images at the same time (was: Re: Another bcachefs version downgrade bug) Martin Steigerwald
  2024-10-21  7:26                             ` Another bcachefs version downgrade bug Martin Steigerwald
  1 sibling, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2024-10-21  1:15 UTC (permalink / raw)
  To: Carl E. Thompson; +Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org

On Sun, Oct 20, 2024 at 05:34:53PM -0700, Carl E. Thompson wrote:
> 
> > On 2024-10-20 9:59 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > ...
> 
> > I have a lot of users and testers that I work with, and will bend over
> > backwards for if something is broken, and to make sure data isn't lost.
> 
> Absolutely. We've all seen you do that over the years. That's a big part of the reason why users like me spend so much of our own time and resources testing and supporting your work.
> 
> > But I do draw a line at being demanding, or all the "constructive
> > criticism" that comes with an attitude of "I know your job better than
> > you do". 
> 
> And I'm not doing that. I'm simply reporting bugs and offering perspectives that you may not have thought of for your consideration. I don't demand that you do anything because I **can't** demand that you do anything. 
> 
> I'll give you an example. More than a year ago I raised the issue that
> bcachefs does not allow multiple versions/images of a filesystem to be
> mounted at the same time. I pointed out that there are several
> professional workflows that require this (forensics, auditing, etc)
> and non-professional workflows too (retrieving a previous version of a
> file from an LVM snapshot, etc). I pointed out that other modern
> filesystems allow it and argued that it is a needed feature. You
> disagreed that it is that important and have never (to date)
> implemented the feature. This causes me to have to create special-case
> workarounds specific to bcachefs which operate differently than what I
> do for every other filesystem. Despite this extra work and
> inconvenience for me I have never brought it up again, never nagged
> you and never demanded that you listen to me and change your mind. I
> made my pitch, you made your decision and I accept your decision even
> if I don't like it.

If I came across as strongly disagreeing on that issue, I apologize.

On that issue, it's just that it's a really problematic feature for a
multi device filesystem - we have to have a unique identifier for
identifying the filesystem, separate from the block device, and if it's
not actually unique - what do we do?

Even for single device filesystems it's a problem, because some of our
userspace interfaces use that same unique identifier (sysfs, debugfs)
and a single device filesystem can become a multidevice filesystem at
any time.

So it's not impossible, but genuinely messy, and it's the kind of thing
I could see leading to landmines later, which makes me not particularly
eager to touch it. But if it keeps coming up, I may end up giving it
more study in the future.
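
A minimal Python sketch of the uniqueness problem described above. This is
not bcachefs code: `FsRegistry`, `register_device`, the device paths and the
UUID string are all hypothetical names made up for illustration. The only
assumption taken from the discussion is that member devices are grouped into
a filesystem by a shared identifier rather than by block device.

```python
from dataclasses import dataclass, field


@dataclass
class Filesystem:
    uuid: str
    devices: list = field(default_factory=list)


class FsRegistry:
    """Hypothetical registry that groups member devices by filesystem UUID."""

    def __init__(self):
        self.by_uuid = {}

    def register_device(self, dev_path, fs_uuid):
        # Devices are matched to a filesystem purely by UUID, so a second
        # device carrying the same UUID looks like another member device.
        fs = self.by_uuid.setdefault(fs_uuid, Filesystem(fs_uuid))
        fs.devices.append(dev_path)
        return fs


registry = FsRegistry()
registry.register_device("/dev/vg/root", "uuid-1234")
# A block-for-block clone (for example an LVM snapshot) has the same UUID.
fs = registry.register_device("/dev/vg/root-snap", "uuid-1234")
print(fs.devices)
```

In this toy model the clone is indistinguishable from a second member of the
same filesystem, which is the ambiguity the paragraph above is pointing at.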

> I'll also make the self-serving point that Linux isn't successful
> only because of kernel developers like you; it's just as much due to
> users like me. Back in the 90s I was once **literally** laughed out of
> a boardroom for presenting rationale for why my company should pivot
> from developing products on traditional Unices and start focusing our
> efforts on the new Linux thing. They disagreed; I left the company and
> within months they were out of business so I guess I got the last
> laugh. I didn't write fancy papers and become famous the way some
> people did back then but I have ever since relentlessly done my part
> by pushing every company I've worked for to be more successful by more
> fully leveraging the benefits of Linux. Now, of course, pretty much
> every company gets it. I advocate for Linux in my personal life too.
> And there are countless millions of other users who also contribute in
> their own ways big and small most of whom will never be known or even
> have conversations on kernel mailing lists but who are just as
> important and integral to the success of Linux.

Oh, absolutely. And if you come to the IRC channel, you'll see me
interacting with those users every day.

I could not have accomplished all this without their help.

But I'm the one writing the code, and I have a lot of demands on my
time, so sometimes I do ask people to chill out or be patient.

> > But you're complaining about an EOL kernel, where the issues have long
> > been fixed, so in this case there's just not a lot I can do. I could ask
> > Greg to cut a new 6.9 release, but all the distros have been told that
> > 6.9 is EOL so it's not likely it would be widely deployed.
> 
> I am **not** complaining about an EOL kernel. I explicitly stated that
> I **didn't** need a fix. I simply reported a bug that affected me
> which I was unaware had been fixed.
> 
> I do think that if you find and fix a bug like that it would be nice
> for you to document it and let users know about it so we can possibly
> avoid wasting a lot of our time and effort unnecessarily.

The best practice for users, while it's still experimental, is to stay
on Linus's tree. A lot of users are running the latest rc kernel these
days.

Not because of upgrade/downgrade shenanigans, aside from 6.9 that's been
generally working (I haven't tried 6.7 - perhaps someone could do that)
- but just because of all the fixes and improvements that are rapidly
happening. If you're testing you want to be testing and reporting bugs
on the latest, right? And because the rate of bugfixing makes
backporting anything but the most critical stuff infeasible for now.

(Seriously, the best way to support older kernels for now would be to
just periodically backport /all/ of fs/bcachefs/ to older kernels, and
if I thought Greg would be ok with that that's what I'd be doing).

Note that I do explicitly support users with old, even ancient
filesystems (someone just popped up with a 0.11 filesystem today! with
some seriously crazy corruption), it's just that adding old kernels to
that just adds a bit much to the matrix of things to keep track of for
now.

> > The ability to upgrade to a new on disk format, and then seamlessly
> > downgrade - that hasn't been done before, keep that in mind, and that's
> > an enormously useful capability that makes it dramatically _less_ risky
> > to deploy new features.
> 
> As I said previously I understand and accept your reasoning here. I
> think it's risky to believe you can ahead of time code in the ability
> to understand and correctly revert any possible format change you'll
> come up with in the future especially since the only reason we're
> talking about it is because it's already failed at least once. It's
> also a system that I believe can't be meaningfully tested without a
> time machine.

I'm not writing dedicated upgrade/downgrade code for every new feature -
that would be crazy. The upgrade/downgrade process just works by noting
which fsck passes have to be run, and which errors should be silently
corrected, which means it's quite robust (fsck has to work, after all).

The 6.9 bug was that the superblock downgrade section was being read
incorrectly and the latest entry was skipped - there was some trickiness
with the superblock section due to alignment, kasan popped up some
things and then the fixes were incorrect, whoops. 6.7/6.8/6.9 was a bit
crazy.
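
A rough Python sketch of the table-driven idea described above, under stated
assumptions. It is not the actual bcachefs implementation or on-disk layout:
the table contents, the version numbers, `plan_downgrade` and the pass/error
names are hypothetical. It only illustrates "note which fsck passes have to
be run and which errors should be silently corrected" per version.

```python
# Hypothetical downgrade table: one entry per on-disk format version, listing
# the fsck passes to run and the error classes to fix silently when an older
# kernel mounts a filesystem that has been at that version.
DOWNGRADE_TABLE = [
    # (version, passes to run, errors to correct silently)
    (2, {"pass_a"}, {"err_x"}),
    (3, {"pass_b"}, {"err_y", "err_z"}),
]


def plan_downgrade(table, fs_version, kernel_version):
    """Collect passes and silent errors for versions newer than the kernel."""
    passes, errors = set(), set()
    for version, entry_passes, entry_errors in table:
        if kernel_version < version <= fs_version:
            passes |= entry_passes
            errors |= entry_errors
    return passes, errors


# Filesystem last written at version 3, now mounted by a kernel at version 1.
full_plan = plan_downgrade(DOWNGRADE_TABLE, fs_version=3, kernel_version=1)

# If the reader skips the newest entry, that version's passes are never
# scheduled. This mimics the effect of the bug described above in toy form.
truncated_plan = plan_downgrade(DOWNGRADE_TABLE[:-1], fs_version=3, kernel_version=1)

print(full_plan)
print(truncated_plan)
```

The point of the design is that no per-feature revert code is needed: the
downgrade reuses repair logic that fsck needs anyway.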

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-20  8:04                               ` Malte Schröder
@ 2024-10-21  3:49                                 ` Jani Partanen
  0 siblings, 0 replies; 26+ messages in thread
From: Jani Partanen @ 2024-10-21  3:49 UTC (permalink / raw)
  To: Malte Schröder, linux-bcachefs


On 20/10/2024 11.04, Malte Schröder wrote:
>>
>> The issue is that a normal user has no idea that bcachefs is experimental 
>> because they are only told it is experimental IF they are 
>> building the kernel themselves.
>>
>> - modinfo does not say anything about experimental
>>
>> - bcachefs user tool does not say anything about experimental. iirc 
>> btrfs user tool warns that you are going to use experimental stuff 
>> when you select raid5
>>
>>
> I think the only place where that can be done sensibly (even 
> retroactively) is in the distro's package manager when one installs 
> the tools. Might have been something that could have been in the 
> format sub-command, but it's way too late for that now.


I don't think it's the distro maintainers' job. The bcachefs user tools 
should have reported it as experimental, just like btrfs has done. An 
experimental tag could also have been put into the module description, but 
that would have been a rather passive method.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Another bcachefs version downgrade bug
  2024-10-21  0:34                           ` Carl E. Thompson
  2024-10-21  1:15                             ` Kent Overstreet
@ 2024-10-21  7:26                             ` Martin Steigerwald
  1 sibling, 0 replies; 26+ messages in thread
From: Martin Steigerwald @ 2024-10-21  7:26 UTC (permalink / raw)
  To: Kent Overstreet, Carl E. Thompson
  Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org

Hi Carl, hi Kent, hi,

Carl E. Thompson - 21.10.24, 02:34:53 MESZ:
> > But you're complaining about an EOL kernel, where the issues have long
> > been fixed, so in this case there's just not a lot I can do. I could
> > ask Greg to cut a new 6.9 release, but all the distros have been told
> > that 6.9 is EOL so it's not likely it would be widely deployed.
> 
> I am **not** complaining about an EOL kernel. I explicitly stated that I
> **didn't** need a fix. I simply reported a bug that affected me which I
> was unaware had been fixed.
> 
> I do think that if you find and fix a bug like that it would be nice for
> you to document it and let users know about it so we can possibly avoid
> wasting a lot of our time and effort unnecessarily.

Apparently there has been quite the misunderstanding here?

Kent understood "user demands a fix in EOL kernel" and you, the user, 
thought you said "user does not demand a fix in EOL kernel"?

If so… it might be important to understand misunderstanding each other can 
happen, especially in written language without body language and voice 
tonality. And that there (often or even always) is a difference between 
what one sends out and the other one receives… and how they receive it. 
And no one is at fault about that. It is just the way we are all different, 
have different internal maps of what we call reality and can understand the 
very same sentence in a different way.

-- 
Martin



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Mounting multiple versions/snapshots/images at the same time (was: Re: Another bcachefs version downgrade bug)
  2024-10-21  1:15                             ` Kent Overstreet
@ 2024-10-21  7:43                               ` Martin Steigerwald
  2024-10-21 20:15                                 ` Carl E. Thompson
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Steigerwald @ 2024-10-21  7:43 UTC (permalink / raw)
  To: Carl E. Thompson, Kent Overstreet
  Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org

Hi Carl and Kent,

It is a topic change, so please ping if you would like to be dropped from CC.

Thanks for the constructive discussion here. I pick out an aspect I am 
quite interested in.

I am extending this aspect to mounting different snapshots at the same time 
in case you, Carl, did not mean this.

Kent Overstreet - 21.10.24, 03:15:47 MESZ:
> > I'll give you an example. More than a year ago I raised the issue that
> > bcachefs does not allow multiple versions/images of a filesystem to be
> > mounted at the same time. I pointed out that there are several
> > professional workflows that require this (forensics, auditing, etc)
> > and non-professional workflows too (retrieving a previous version of a
> > file from an LVM snapshot, etc). I pointed out that other modern
> > filesystems allow it and argued that it is a needed feature. You
> > disagreed that it is that important and have never (to date)
> > implemented the feature. This causes me to have to create special-case
> > workarounds specific to bcachefs which operate differently than what I
> > do for every other filesystem. Despite this extra work and
> > inconvenience for me I have never brought it up again, never nagged
> > you and never demanded that you listen to me and change your mind. I
> > made my pitch, you made your decision and I accept your decision even
> > if I don't like it.
> 
> If I came across as strongly disagreeing on that issue, I apologize.
> 
> On that issue, it's just that it's a really problematic feature for a
> multi device filesystem - we have to have a unique identifier for
> identifying the filesystem, separate from the block device, and if it's
> not actually unique - what do we do?

There are several different variants of this and I bet it is important to 
clarify exactly what is meant here. At least these two come to my mind:

1) Mount a block-for-block clone of the same filesystem another time.

2) Mount a snapshot of one filesystem on one block device to a different 
location in filesystem tree.

Carl, which one are you referring to? If you mean a third thing, please 
elaborate.

I do the second one with BTRFS in combination with the default subvolume 
feature in order to hide away snapshots.

First I create a subvolume where I can create snapshots and one for the 
actual filesystem contents. Then I tell the filesystem to use the subvolume 
for the filesystem contents as default for mounting. And then I additionally 
mount the subvolume that contains the snapshots to a different location. 
Something like this:

/dev/nvme/system / btrfs lazytime,compress=zstd,discard=async   0       0

/dev/nvme/system /snap/system  btrfs   subvol=snap   0       0

(in this case BCacheFS on LVM as I wanted to have the flexibility to test 
out different filesystems)

This makes it easy for me to exclude all snapshots from backup operations 
as I can do top level snapshots of the filesystem contents and "hide" them 
away in a subvolume (means sub directory in filesystem tree) of my choice.

I still use rsync for backups as it has stood the test of time. I could 
probably switch to BTRFS send/receive or a similar functionality in 
BCacheFS. With the added benefit of way better handling of renames.

> Even for single device filesystems it's a problem, because some of our
> userspace interfaces use that same unique identifier (sysfs, debugfs)
> and a single device filesystem can become a multidevice filesystem at
> any time.

I am not sure whether case 2 would already be possible with BCacheFS. If 
not, this would hold me back from switching over completely. Of course I 
can try to rework my setup, but I'd prefer not to. Unless someone can 
convince me of a good technical reason to do it.

> So it's not impossible, but genuinely messy, and it's the kind of thing
> I could see leading to landmines later, which makes me not particularly
> eager to touch it. But if it keeps coming up, I may end up giving it
> more study in the future.

Well it would be important to find a somewhat clean approach about that. 
I'd rather not have a feature than a feature that could be unreliable due 
to it being messy.

I do not know how other filesystems do either 1 or 2 of the above variants. 
All I can say is that with variant 2 on BTRFS I did not see an issue so 
far.

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mounting multiple versions/snapshots/images at the same time (was: Re: Another bcachefs version downgrade bug)
  2024-10-21  7:43                               ` Mounting multiple versions/snapshots/images at the same time (was: Re: Another bcachefs version downgrade bug) Martin Steigerwald
@ 2024-10-21 20:15                                 ` Carl E. Thompson
  0 siblings, 0 replies; 26+ messages in thread
From: Carl E. Thompson @ 2024-10-21 20:15 UTC (permalink / raw)
  To: Martin Steigerwald, Kent Overstreet
  Cc: Christopher Snowhill, linux-bcachefs@vger.kernel.org


> On 2024-10-21 12:43 AM PDT Martin Steigerwald <martin@lichtvoll.de> wrote:

> ...

> There are several different variants of this and I bet it is important to 
> clarify exactly what is meant here. At least these two come to my mind:
> 
> 1) Mount a block-for-block clone of the same filesystem another time.
> 
> 2) Mount a snapshot of one filesystem on one block device to a different 
> location in filesystem tree.
> 
> Carl, which one are you referring to? If you mean a third thing, please 
> elaborate.

I am talking about both 2) and 1). However, I'm talking about thin **LVM** snapshots, not bcachefs' native snapshots. I'm also talking about file images of entire filesystems (what you'd get if you ran something like "cat /dev/nvme0n1p1 > /fs.bak"). 

I have several computers on which I use several different filesystems, and on almost all of those computers I use (thin) LVM. Since I'm using LVM anyway it's easier, more reliable and more consistent for me to use LVM's snapshots rather than the filesystems' native snapshots (if any). For similar reasons I also tend to use MDRAID instead of the filesystems' native multiple device support and LUKS instead of the filesystems' native encryption support. I'm also thinking about adding checksums to every filesystem using dm-integrity but I haven't gotten around to planning that out yet. So if I need to build a server that absolutely needs to run reliably and consistently my current default is:

  Drives -> MDRAID -> LUKS -> Thin LVM -> XFS

I know that's really old-school and really un-sexy these days but it's essentially bulletproof when managed properly.

> ...

> (in this case BCacheFS on LVM as I wanted to have the flexibility to test 
> out different filesystems)
> 
> This makes it easy for me to exclude all snapshots from backup operations 
> as I can do top level snapshots of the filesystem contents and "hide" them 
> away in a subvolume (means sub directory in filesystem tree) of my choice.

I'm not sure I understand what you mean but I don't think I do it that way.

> I still use rsync for backups as it has stood the test of time.

Yeah, I've fought with rsync for a couple of decades now and it's the transport used by my backup system. For me rsync has always been problematic. It semi-regularly hangs despite being run on completely quiescent snapshots, it has atrocious performance on large images and it has some security weak spots that probably don't matter much on a secure network but still bother me. So I'm going to switch out rsync for BorgBackup at some point which should allow me to scrap most of my current backup system except for the front-end. If you've ever had any issues with rsync you might want to check it out. There are actually a bunch of other newer alternatives to rsync I've tested but for me BorgBackup was the winner.

> I could probably switch to BTRFS send/receive or a similar functionality 
> in BCacheFS. With the added benefit of way better handling of renames.

There are actually projects that you can find on GitHub and elsewhere that allow you to do the same sort of send/receive that xfs/zfs/btrfs can do on **any** filesystem by working at the thin LVM level. (The tools require that you are using **thin** LVs which you should be anyway.) I use this method when I need to send an efficient, incremental **exact** copy backup of an **entire** filesystem somewhere else.

These tools are nice because they don't require reading or sending every block on the source device, just the changed ones, so they're efficient, which is crucial for large filesystems or slow networks. I use this technique instead of the native filesystem send/receive so I can have a consistent interface across any filesystem. I won't point you at any particular project as what I'm using has been heavily modified for my particular use case.
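
To give a rough idea of the "changed blocks only" approach, here is a small Python sketch. It is not taken from any particular project: `changed_blocks` and the sample mappings are hypothetical, and it assumes that each thin snapshot can be described as a mapping from virtual block number to a physical block in the pool, where a block rewritten after the snapshot gets a new physical location.

```python
def changed_blocks(old_map, new_map):
    """Return the virtual block numbers whose mapping differs between snapshots."""
    changed = []
    for vblock in sorted(set(old_map) | set(new_map)):
        # A block is "changed" if it was remapped, newly allocated or discarded.
        if old_map.get(vblock) != new_map.get(vblock):
            changed.append(vblock)
    return changed


# Hypothetical mappings: virtual block -> physical block in the thin pool.
old_snapshot = {0: 100, 1: 101, 2: 102}
new_snapshot = {0: 100, 1: 250, 2: 102, 3: 251}

delta = changed_blocks(old_snapshot, new_snapshot)
print(delta)
```

Only the blocks listed in `delta` would need to be read from the newer snapshot and sent to the other side; everything else is already present in the previous copy.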

> ...

Thanks,
Carl

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-10-21 20:15 UTC | newest]

Thread overview: 26+ messages
2024-10-16  4:51 Another bcachefs version downgrade bug Carl E. Thompson
2024-10-17  0:09 ` Kent Overstreet
2024-10-17  8:29   ` Carl E. Thompson
2024-10-17  8:39     ` Kent Overstreet
2024-10-17  9:15       ` Carl E. Thompson
2024-10-17  9:30         ` Kent Overstreet
2024-10-17  9:45           ` Carl E. Thompson
2024-10-17 10:13             ` Kent Overstreet
2024-10-17 16:49               ` Carl E. Thompson
2024-10-18  8:17                 ` Christopher Snowhill
2024-10-18 17:37                   ` Carl E. Thompson
2024-10-18 19:12                     ` Kent Overstreet
2024-10-19  0:15                       ` Carl E. Thompson
2024-10-19  8:13                         ` Malte Schröder
2024-10-19  8:31                           ` Martin Steigerwald
2024-10-19  9:29                             ` Carl E. Thompson
2024-10-20  9:29                               ` Kent Overstreet
2024-10-19 20:18                             ` Jani Partanen
2024-10-20  8:04                               ` Malte Schröder
2024-10-21  3:49                                 ` Jani Partanen
2024-10-20 16:59                         ` Kent Overstreet
2024-10-21  0:34                           ` Carl E. Thompson
2024-10-21  1:15                             ` Kent Overstreet
2024-10-21  7:43                               ` Mounting multiple versions/snapshots/images at the same time (was: Re: Another bcachefs version downgrade bug) Martin Steigerwald
2024-10-21 20:15                                 ` Carl E. Thompson
2024-10-21  7:26                             ` Another bcachefs version downgrade bug Martin Steigerwald
