* [GIT PULL] bcachefs fixes for 6.16-rc4
@ 2025-06-27 2:22 Kent Overstreet
2025-06-27 3:21 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-06-27 2:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl
per the maintainer thread discussion and precedent in xfs and btrfs
for repair code in RCs, journal_rewind is again included
The following changes since commit e04c78d86a9699d136910cfc0bdcf01087e3267e:
Linux 6.16-rc2 (2025-06-15 13:49:41 -0700)
are available in the Git repository at:
git://evilpiepirate.org/bcachefs.git tags/bcachefs-2025-06-26
for you to fetch changes up to ef6fac0f9e5d0695cee1d820c727fe753eca52d5:
bcachefs: Plumb correct ip to trans_relock_fail tracepoint (2025-06-26 00:01:16 -0400)
----------------------------------------------------------------
bcachefs fixes for 6.16-rc4
----------------------------------------------------------------
Alan Huang (7):
bcachefs: Don't allocate new memory when mempool is exhausted
bcachefs: Fix alloc_req use after free
bcachefs: Add missing EBUG_ON
bcachefs: Delay calculation of trans->journal_u64s
bcachefs: Move bset size check before csum check
bcachefs: Fix pool->alloc NULL pointer dereference
bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked
Bharadwaj Raju (1):
bcachefs: don't return fsck_fix for unfixable node errors in __btree_err
Kent Overstreet (43):
bcachefs: trace_extent_trim_atomic
bcachefs: btree iter tracepoints
bcachefs: Fix bch2_journal_keys_peek_prev_min()
bcachefs: btree_iter: fix updates, journal overlay
bcachefs: better __bch2_snapshot_is_ancestor() assert
bcachefs: pass last_seq into fs_journal_start()
bcachefs: Fix "now allowing incompatible features" message
bcachefs: Fix snapshot_key_missing_inode_snapshot repair
bcachefs: fsck: fix add_inode()
bcachefs: fsck: fix extent past end of inode repair
bcachefs: opts.journal_rewind
bcachefs: Kill unused tracepoints
bcachefs: mark more errors autofix
bcachefs: fsck: Improve check_key_has_inode()
bcachefs: Call bch2_fs_init_rw() early if we'll be going rw
bcachefs: Fix __bch2_inum_to_path() when crossing subvol boundaries
bcachefs: fsck: Print path when we find a subvol loop
bcachefs: fsck: Fix remove_backpointer() for subvol roots
bcachefs: fsck: Fix reattach_inode() for subvol roots
bcachefs: fsck: check_directory_structure runs in reverse order
bcachefs: fsck: additional diagnostics for reattach_inode()
bcachefs: fsck: check_subdir_count logs path
bcachefs: fsck: Fix check_path_loop() + snapshots
bcachefs: Fix bch2_read_bio_to_text()
bcachefs: Fix restart handling in btree_node_scrub_work()
bcachefs: fsck: Fix check_directory_structure when no check_dirents
bcachefs: fsck: fix unhandled restart in topology repair
bcachefs: fsck: Fix oops in key_visible_in_snapshot()
bcachefs: fix spurious error in read_btree_roots()
bcachefs: Fix missing newlines before ero
bcachefs: Fix *__bch2_trans_subbuf_alloc() error path
bcachefs: Don't log fsck err in the journal if doing repair elsewhere
bcachefs: Add missing key type checks to check_snapshot_exists()
bcachefs: Add missing bch2_err_class() to fileattr_set()
bcachefs: fix spurious error_throw
bcachefs: Fix range in bch2_lookup_indirect_extent() error path
bcachefs: Check for bad write buffer key when moving from journal
bcachefs: Use wait_on_allocator() when allocating journal
bcachefs: fix bch2_journal_keys_peek_prev_min() underflow
bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix
bcachefs: Ensure btree node scan runs before checking for scanned nodes
bcachefs: Ensure we rewind to run recovery passes
bcachefs: Plumb correct ip to trans_relock_fail tracepoint
fs/bcachefs/alloc_background.c | 13 +-
fs/bcachefs/backpointers.c | 2 +-
fs/bcachefs/bcachefs.h | 3 +-
fs/bcachefs/btree_gc.c | 37 ++--
fs/bcachefs/btree_io.c | 74 ++++----
fs/bcachefs/btree_iter.c | 173 ++++++++++++------
fs/bcachefs/btree_journal_iter.c | 82 ++++++---
fs/bcachefs/btree_journal_iter_types.h | 5 +-
fs/bcachefs/btree_locking.c | 12 +-
fs/bcachefs/btree_node_scan.c | 6 +-
fs/bcachefs/btree_node_scan.h | 2 +-
fs/bcachefs/btree_trans_commit.c | 18 +-
fs/bcachefs/btree_types.h | 1 +
fs/bcachefs/btree_update.c | 16 +-
fs/bcachefs/btree_update.h | 5 +-
fs/bcachefs/btree_update_interior.c | 16 +-
fs/bcachefs/btree_update_interior.h | 3 +
fs/bcachefs/btree_write_buffer.c | 8 +-
fs/bcachefs/btree_write_buffer.h | 6 +
fs/bcachefs/chardev.c | 29 ++-
fs/bcachefs/data_update.c | 1 +
fs/bcachefs/errcode.h | 5 -
fs/bcachefs/error.c | 4 +-
fs/bcachefs/extent_update.c | 13 +-
fs/bcachefs/fs.c | 3 +-
fs/bcachefs/fsck.c | 317 +++++++++++++++++++++++----------
fs/bcachefs/inode.h | 5 +
fs/bcachefs/io_read.c | 7 +-
fs/bcachefs/journal.c | 20 +--
fs/bcachefs/journal.h | 2 +-
fs/bcachefs/journal_io.c | 26 ++-
fs/bcachefs/namei.c | 30 +++-
fs/bcachefs/opts.h | 5 +
fs/bcachefs/recovery.c | 24 ++-
fs/bcachefs/recovery_passes.c | 19 +-
fs/bcachefs/recovery_passes.h | 9 +
fs/bcachefs/reflink.c | 12 +-
fs/bcachefs/sb-errors_format.h | 19 +-
fs/bcachefs/snapshot.c | 14 +-
fs/bcachefs/super.c | 13 +-
fs/bcachefs/super.h | 1 +
fs/bcachefs/trace.h | 125 +++----------
42 files changed, 734 insertions(+), 451 deletions(-)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
@ 2025-06-27 3:21 ` Linus Torvalds
2025-06-27 3:34 ` Kent Overstreet
` (2 more replies)
2025-06-27 3:33 ` pr-tracker-bot
2025-06-27 14:46 ` Josef Bacik
2 siblings, 3 replies; 19+ messages in thread
From: Linus Torvalds @ 2025-06-27 3:21 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl
On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> per the maintainer thread discussion and precedent in xfs and btrfs
> for repair code in RCs, journal_rewind is again included
I have pulled this, but also as per that discussion, I think we'll be
parting ways in the 6.17 merge window.
You made it very clear that I can't even question any bug-fixes and I
should just pull anything and everything.
Honestly, at that point, I don't really feel comfortable being
involved at all, and the only thing we both seemed to really
fundamentally agree on in that discussion was "we're done".
Linus
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
2025-06-27 3:21 ` Linus Torvalds
@ 2025-06-27 3:33 ` pr-tracker-bot
2025-06-27 14:46 ` Josef Bacik
2 siblings, 0 replies; 19+ messages in thread
From: pr-tracker-bot @ 2025-06-27 3:33 UTC (permalink / raw)
To: Kent Overstreet
Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
The pull request you sent on Thu, 26 Jun 2025 22:22:52 -0400:
> git://evilpiepirate.org/bcachefs.git tags/bcachefs-2025-06-26
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6f2a71a99ebd5dfaa7948a2e9c59eae94b741bd8
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 3:21 ` Linus Torvalds
@ 2025-06-27 3:34 ` Kent Overstreet
2025-07-01 14:43 ` John Stoffel
2025-07-04 7:02 ` Hillf Danton
2025-06-27 19:07 ` Kyle Sanderson
2025-06-28 8:06 ` Gerhard Wiesinger
2 siblings, 2 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-06-27 3:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl
On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > per the maintainer thread discussion and precedent in xfs and btrfs
> > for repair code in RCs, journal_rewind is again included
>
> I have pulled this, but also as per that discussion, I think we'll be
> parting ways in the 6.17 merge window.
>
> You made it very clear that I can't even question any bug-fixes and I
> should just pull anything and everything.
Linus, I'm not trying to say you can't have any say in bcachefs. Not at
all.
I positively enjoy working with you - when you're not being a dick, but
you can be genuinely impossible sometimes. A lot of times...
When bcachefs was getting merged, I got comments from another filesystem
maintainer that were pretty much "great! we finally have a filesystem
maintainer who can stand up to Linus!".
And having been on the receiving end of a lot of venting from them about
what was going on... And more that I won't get into...
I don't want to be in that position.
I'm just not going to have any sense of humour where user data integrity
is concerned or making sure users have the bugfixes they need.
Like I said - all I've been wanting is for you to tone it down and stop
holding pull requests over my head as THE place to have that discussion.
You have genuinely good ideas, and you're bloody sharp. It is FUN
getting shit done with you when we're not battling.
But you have to understand the constraints people are under. Not just
myself.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
2025-06-27 3:21 ` Linus Torvalds
2025-06-27 3:33 ` pr-tracker-bot
@ 2025-06-27 14:46 ` Josef Bacik
2025-06-28 1:59 ` Theodore Ts'o
2 siblings, 1 reply; 19+ messages in thread
From: Josef Bacik @ 2025-06-27 14:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl
On Thu, Jun 26, 2025 at 10:22:52PM -0400, Kent Overstreet wrote:
> per the maintainer thread discussion and precedent in xfs and btrfs
> for repair code in RCs, journal_rewind is again included
>
I'm replying to set the record straight. This is not the start of the
discussion. However, I am not going to let false statements stand
unchallenged.
Sterba has never sent large pull requests in RCs, certainly not with features in
them. Even when Chris was the maintainer and we were a little faster and looser
and were pushing the envelope to see what Linus would accept we didn't ship
anything near this volume of patches past rc1.
And the numbers don't lie.
josef@fedora:~/linux$ git tag --contains 1c6fdbd8f2465ddfb73a01ec620cbf3d14044e1a | grep -v rc > bcachefs-tags
josef@fedora:~/linux$ git tag --contains be0e5c097fc206b863ce9fe6b3cfd6974b0110f4 | grep -v rc > tags
josef@fedora:~/linux$ for i in $(cat tags); do git log --no-merges --oneline $i-rc2..$i fs/btrfs | wc -l; done > btrfs-counts.txt
josef@fedora:~/linux$ for i in $(cat bcachefs-tags); do git log --no-merges --oneline $i-rc2..$i fs/bcachefs | wc -l; done > bcachefs-counts.txt
josef@fedora:~/linux$ R -q -e "x <- read.csv('btrfs-counts.txt', header = F); summary(x); sd(x[ , 1 ])"
> x <- read.csv('btrfs-counts.txt', header = F); summary(x); sd(x[ , 1 ])
V1
Min. : 0.00
1st Qu.:10.25
Median :19.00
Mean :20.48
3rd Qu.:27.50
Max. :55.00
[1] 11.77108
>
josef@fedora:~/linux$ R -q -e "x <- read.csv('bcachefs-counts.txt', header = F); summary(x); sd(x[ , 1 ])"
> x <- read.csv('bcachefs-counts.txt', header = F); summary(x); sd(x[ , 1 ])
V1
Min. : 0.00
1st Qu.: 38.50
Median : 70.00
Mean : 63.86
3rd Qu.: 81.50
Max. :137.00
[1] 45.28218
>
So even including the wilder times of kernel development in general and btrfs's
specifically, our worst window was 55 patches, less than your mean.
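For readers who want to recompute the summary figures above without R, here is
a minimal Python sketch. It assumes a counts file with one integer per line, as
the shell loops above produce. The names `summarize` and `read_counts` are
hypothetical helpers for illustration and are not part of the thread. R's
default quantile method (type 7) corresponds to method="inclusive" in Python's
statistics module, and R's sd() is the sample standard deviation.

```python
import statistics
from pathlib import Path


def summarize(counts):
    """Return the figures R's summary() and sd() print for one column."""
    data = sorted(counts)
    # method="inclusive" matches R's default quantile algorithm (type 7).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(data),
        "q3": q3,
        "max": data[-1],
        # Sample standard deviation (n - 1 denominator), like R's sd().
        "sd": statistics.stdev(data),
    }


def read_counts(path):
    """Read one integer per line (hypothetical file layout), skipping blanks."""
    return [int(token) for token in Path(path).read_text().split()]
```

Calling summarize(read_counts("btrfs-counts.txt")) should then give the same
columns as the R output, provided the file was generated as shown above. At
least two counts are needed, since the sample standard deviation is undefined
for a single value.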
These are not the same thing. Do not equate the two. Sterba is a phenomenal
maintainer who does his job well and manages to work with Linus just fine. We are
not the same, we do not work the same, and we absolutely do follow the rules, as
do 99.99% of the kernel community.
If xfs has done this then good for them. Those developers have a track record of
doing the right thing over a long period of time. Btrfs for sure hasn't. Thanks,
Josef
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 3:21 ` Linus Torvalds
2025-06-27 3:34 ` Kent Overstreet
@ 2025-06-27 19:07 ` Kyle Sanderson
2025-06-27 19:16 ` Kyle Sanderson
2025-06-28 8:06 ` Gerhard Wiesinger
2 siblings, 1 reply; 19+ messages in thread
From: Kyle Sanderson @ 2025-06-27 19:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-bcachefs, linux-fsdevel, linux-kerenl, Kent Overstreet
On 6/26/2025 8:21 PM, Linus Torvalds wrote:
> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>
>> per the maintainer thread discussion and precedent in xfs and btrfs
>> for repair code in RCs, journal_rewind is again included
>
> I have pulled this, but also as per that discussion, I think we'll be
> parting ways in the 6.17 merge window.
>
> You made it very clear that I can't even question any bug-fixes and I
> should just pull anything and everything.
>
> Honestly, at that point, I don't really feel comfortable being
> involved at all, and the only thing we both seemed to really
> fundamentally agree on in that discussion was "we're done".
>
> Linus
Linus,
The pushback on rewind makes sense: it wasn’t fully integrated and was
fsck code written to fix the problems with the retail 6.15 release.
This looks like it slipped through Kent's CI, and there were indeed
multiple people hit by it (myself included).
Quoting someone back to themselves is not cool, however I believe it
highlights what has gone on here which is why I am breaking my own rule:
"One of the things I liked about the Rust side of the kernel was that
there was one maintainer who was clearly much younger than most of the
maintainers and that was the Rust maintainer.
We can clearly see that certain areas in the kernel bring in more young
people.
At the Maintainer Summit, we had this clear division between the
filesystem people, who were very careful and very staid, and cared
deeply about their code being 100% correct - because if you have a bug
in a filesystem, the data on your disk may be gone - so these people
take themselves and their code very seriously.
And then you have the driver people who are a bit more 'okay',
especially the GPU folks, 'where anything goes'.
You notice that on the driver side it’s much easier to find young
people, and that is traditionally how we’ve grown a lot of maintainers.
" (1)
Kent is moving like the older days of rapid development - fast and
driven - and this style clashes with the mature, stable filesystem
culture that demands extreme caution today. Almost every single patch
has been in response to reported issues; the primary issue here is
that those reports are on IRC, where his younger users are (not so young
anymore - it is not TikTok), and not on lkml. The pace of development
has kept up, and the "new feature" part of it, like changing out the
entire hash table in rc6, seems to have stopped. This is still
experimental, and he's moving that way now with care and continuing to
improve his testing coverage with each bug.
Kent has deep technical experience here; as noted much earlier in the
interview (1) regarding the 6.7 merge window, this filesystem has been in
the works for a decade. Maintainership means adapting to kernel process
as much as code quality, and that may be closer to the issue here.
If direct pulls aren’t working, maybe a co-maintainer or routing changes
through a senior fs maintainer can help. If you're open to it, maybe
that is even you.
Dropping bcachefs now would be a monumental step backward from the
filesystems we have today. Enterprises simply do not use them for true
storage at scale which is why vendors have largely taken over this
space. The question is how to balance rigor with supporting new
maintainers in the ecosystem. Everything Kent has written around
supporting users is true, and publicly visible, if only to the 260 users
on irc, and however many more are on matrix. There are plenty more that
are offline, and while this is experimental there are a number of public
sector agencies testing this now (I have seen reference to a number of
emergency service providers, which isn’t great, but for whatever reason
they are doing that).
(1) https://youtu.be/OvuEYtkOH88?t=1044
Kyle.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 19:07 ` Kyle Sanderson
@ 2025-06-27 19:16 ` Kyle Sanderson
2025-06-27 19:42 ` Kent Overstreet
0 siblings, 1 reply; 19+ messages in thread
From: Kyle Sanderson @ 2025-06-27 19:16 UTC (permalink / raw)
To: linux-kernel
Cc: linux-bcachefs, linux-fsdevel, Kent Overstreet, Linus Torvalds
On 6/27/2025 12:07 PM, Kyle Sanderson wrote:
> On 6/26/2025 8:21 PM, Linus Torvalds wrote:
>> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet
>> <kent.overstreet@linux.dev> wrote:
>>>
>>> per the maintainer thread discussion and precedent in xfs and btrfs
>>> for repair code in RCs, journal_rewind is again included
>>
>> I have pulled this, but also as per that discussion, I think we'll be
>> parting ways in the 6.17 merge window.
>>
>> You made it very clear that I can't even question any bug-fixes and I
>> should just pull anything and everything.
>>
>> Honestly, at that point, I don't really feel comfortable being
>> involved at all, and the only thing we both seemed to really
>> fundamentally agree on in that discussion was "we're done".
>>
>> Linus
>
> Linus,
>
> The pushback on rewind makes sense, it wasn’t fully integrated and was
> fsck code written to fix the problems with the retail 6.15 release -
> this looks like it slipped through Kent's CI and there were indeed
> multiple people hit by it (myself included).
>
> Quoting someone back to themselves is not cool, however I believe it
> highlights what has gone on here which is why I am breaking my own rule:
>
> "One of the things I liked about the Rust side of the kernel was that
> there was one maintainer who was clearly much younger than most of the
> maintainers and that was the Rust maintainer.
>
> We can clearly see that certain areas in the kernel bring in more young
> people.
>
> At the Maintainer Summit, we had this clear division between the
> filesystem people, who were very careful and very staid, and cared
> deeply about their code being 100% correct - because if you have a bug
> in a filesystem, the data on your disk may be gone - so these people
> take themselves and their code very seriously.
>
> And then you have the driver people who are a bit more 'okay',
> especially the GPU folks, 'where anything goes'.
> You notice that on the driver side it’s much easier to find young
> people, and that is traditionally how we’ve grown a lot of maintainers.
> " (1)
>
> Kent is moving like the older days of rapid development - fast and
> driven - and this style clashes with the mature stable filesystem
> culture that demands extreme caution today. Almost every single patch
> has been in response to reported issues, the primary issue here is
> that’s on IRC where his younger users are (not so young, anymore - it is
> not tiktok), and not on lkml. The pace of development has kept up, and
> the "new feature" part of it like changing out the entire hash table in
> rc6 seems to have stopped. This is still experimental, and he's moving
> that way now with care and continuing to improve his testing coverage
> with each bug.
>
> Kent has deep technical experience here, much earlier in the
> interview(1) regarding the 6.7 merge window this filesystem has been in
> the works for a decade. Maintainership means adapting to kernel process
> as much as code quality, that may be closer to the issue here.
>
> If direct pulls aren’t working, maybe a co-maintainer or routing changes
> through a senior fs maintainer can help. If you're open to it, maybe
> that is even you.
>
> Dropping bcachefs now would be a monumental step backward from the
> filesystems we have today. Enterprises simply do not use them for true
> storage at scale which is why vendors have largely taken over this
> space. The question is how to balance rigor with supporting new
> maintainers in the ecosystem. Everything Kent has written around
> supporting users is true, and publicly visible, if only to the 260 users
> on irc, and however many more are on matrix. There are plenty more that
> are offline, and while this is experimental there are a number of public
> sector agencies testing this now (I have seen reference to a number of
> emergency service providers, which isn’t great, but for whatever reason
> they are doing that).
>
> (1) https://youtu.be/OvuEYtkOH88?t=1044
>
> Kyle.
Re-sending as this thread seems to have typo'd lkml (removing the bad
entry).
Kyle.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 19:16 ` Kyle Sanderson
@ 2025-06-27 19:42 ` Kent Overstreet
0 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-06-27 19:42 UTC (permalink / raw)
To: Kyle Sanderson
Cc: linux-kernel, linux-bcachefs, linux-fsdevel, Linus Torvalds
On Fri, Jun 27, 2025 at 12:16:09PM -0700, Kyle Sanderson wrote:
> On 6/27/2025 12:07 PM, Kyle Sanderson wrote:
> > On 6/26/2025 8:21 PM, Linus Torvalds wrote:
> > > On Thu, 26 Jun 2025 at 19:23, Kent Overstreet
> > > <kent.overstreet@linux.dev> wrote:
> > > >
> > > > per the maintainer thread discussion and precedent in xfs and btrfs
> > > > for repair code in RCs, journal_rewind is again included
> > >
> > > I have pulled this, but also as per that discussion, I think we'll be
> > > parting ways in the 6.17 merge window.
> > >
> > > You made it very clear that I can't even question any bug-fixes and I
> > > should just pull anything and everything.
> > >
> > > Honestly, at that point, I don't really feel comfortable being
> > > involved at all, and the only thing we both seemed to really
> > > fundamentally agree on in that discussion was "we're done".
> > >
> > > Linus
> >
> > Linus,
> >
> > The pushback on rewind makes sense, it wasn’t fully integrated and was
> > fsck code written to fix the problems with the retail 6.15 release -
> > this looks like it slipped through Kent's CI and there were indeed
> > multiple people hit by it (myself included).
> >
> > Quoting someone back to themselves is not cool, however I believe it
> > highlights what has gone on here which is why I am breaking my own rule:
> >
> > "One of the things I liked about the Rust side of the kernel was that
> > there was one maintainer who was clearly much younger than most of the
> > maintainers and that was the Rust maintainer.
> >
> > We can clearly see that certain areas in the kernel bring in more young
> > people.
> >
> > At the Maintainer Summit, we had this clear division between the
> > filesystem people, who were very careful and very staid, and cared
> > deeply about their code being 100% correct - because if you have a bug
> > in a filesystem, the data on your disk may be gone - so these people
> > take themselves and their code very seriously.
> >
> > And then you have the driver people who are a bit more 'okay',
> > especially the GPU folks, 'where anything goes'.
> > You notice that on the driver side it’s much easier to find young
> > people, and that is traditionally how we’ve grown a lot of maintainers.
> > " (1)
> >
> > Kent is moving like the older days of rapid development - fast and
> > driven - and this style clashes with the mature stable filesystem
> > culture that demands extreme caution today. Almost every single patch
> > has been in response to reported issues, the primary issue here is
> > that’s on IRC where his younger users are (not so young, anymore - it is
> > not tiktok), and not on lkml. The pace of development has kept up, and
> > the "new feature" part of it like changing out the entire hash table in
> > rc6 seems to have stopped. This is still experimental, and he's moving
> > that way now with care and continuing to improve his testing coverage
> > with each bug.
> >
> > Kent has deep technical experience here, much earlier in the
> > interview(1) regarding the 6.7 merge window this filesystem has been in
> > the works for a decade. Maintainership means adapting to kernel process
> > as much as code quality, that may be closer to the issue here.
> >
> > If direct pulls aren’t working, maybe a co-maintainer or routing changes
> > through a senior fs maintainer can help. If you're open to it, maybe
> > that is even you.
> >
> > Dropping bcachefs now would be a monumental step backward from the
> > filesystems we have today. Enterprises simply do not use them for true
> > storage at scale which is why vendors have largely taken over this
> > space. The question is how to balance rigor with supporting new
> > maintainers in the ecosystem. Everything Kent has written around
> > supporting users is true, and publicly visible, if only to the 260 users
> > on irc, and however many more are on matrix. There are plenty more that
> > are offline, and while this is experimental there are a number of public
> > sector agencies testing this now (I have seen reference to a number of
> > emergency service providers, which isn’t great, but for whatever reason
> > they are doing that).
> >
> > (1) https://youtu.be/OvuEYtkOH88?t=1044
> >
> > Kyle.
>
> Re-sending as this thread seems to have typo'd lkml (removing the bad
> entry).
Thanks.
Also, I think I should add, in case my words in the private conversation
were misinterpreted:
I don't think bcachefs should be dropped from the kernel, I think it
would be better for this to be worked out.
I firstly want to reassure people that: if bcachefs has to be shipped as
a DKMS module, that will not kill the project. It will be a giant hassle
(especially if distributions have to scramble), but life will continue.
I remain committed as ever to getting this done - one way or the other.
And I think it is safe to say that going that route would be the better
option for the sanity of myself and Linus, but it wouldn't be the better
option for the users or the rest of the development community.
With that, I am going to take a breather.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 14:46 ` Josef Bacik
@ 2025-06-28 1:59 ` Theodore Ts'o
0 siblings, 0 replies; 19+ messages in thread
From: Theodore Ts'o @ 2025-06-28 1:59 UTC (permalink / raw)
To: Josef Bacik; +Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
On Fri, Jun 27, 2025 at 10:46:04AM -0400, Josef Bacik wrote:
> On Thu, Jun 26, 2025 at 10:22:52PM -0400, Kent Overstreet wrote:
> > per the maintainer thread discussion and precedent in xfs and btrfs
> > for repair code in RCs, journal_rewind is again included
>
> I'm replying to set the record straight. This is not the start of the
> discussion. I am not going to let false statements stand by unchallenged
> however.
>
> Sterba has never sent large pull requests in RCs, certainly not with
> features in them. Even when Chris was the maintainer and we were a
> little faster and looser and were pushing the envelope to see what
> Linus would accept we didn't ship anything near this volume of
> patches past rc1.
And as far as XFS is concerned, "citation needed". Dave Chinner (who
is not the current XFS maintainer) has asserted that there might be a
time when XFS *might* want to send repair code post merge window.
However, I'm not aware of any time when Darrick Wong was working on
XFS online repair that he sent changes outside of the merge window as
the XFS maintainer.
And now that the XFS online repair feature is upstream, *bug fixes* can be
sent at any time. So (a) I am not aware of any time that XFS *has*
sent online repair changes upstream outside of a merge window --- this
is just an assertion by Kent --- and (b) I am not sure when XFS would
need to send some kind of new feature involving online repair
upstream, given that online repair is *already* upstream.
- Ted
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 3:21 ` Linus Torvalds
2025-06-27 3:34 ` Kent Overstreet
2025-06-27 19:07 ` Kyle Sanderson
@ 2025-06-28 8:06 ` Gerhard Wiesinger
2 siblings, 0 replies; 19+ messages in thread
From: Gerhard Wiesinger @ 2025-06-28 8:06 UTC (permalink / raw)
To: Linus Torvalds, Kent Overstreet
Cc: linux-bcachefs, linux-fsdevel, linux-kerenl
On 27.06.2025 05:21, Linus Torvalds wrote:
> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>> per the maintainer thread discussion and precedent in xfs and btrfs
>> for repair code in RCs, journal_rewind is again included
> I have pulled this, but also as per that discussion, I think we'll be
> parting ways in the 6.17 merge window.
>
> You made it very clear that I can't even question any bug-fixes and I
> should just pull anything and everything.
>
> Honestly, at that point, I don't really feel comfortable being
> involved at all, and the only thing we both seemed to really
> fundamentally agree on in that discussion was "we're done".
>
Hello Linus,
Do you think the "hard rules" for "no features" in the "fixing merge
window" also apply to modules in the Linux kernel which are marked as
experimental (as long as no other code outside of the module itself is
changed)?
I understand your points fully for non-experimental code, but maybe a
solution is to have different rules for code marked as experimental.
Every user who uses experimental features should be aware that
potentially unstable code is used.
Maybe you can think about it.
Thanks.
Ciao,
Gerhard
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 3:34 ` Kent Overstreet
@ 2025-07-01 14:43 ` John Stoffel
2025-07-02 16:34 ` Kent Overstreet
2025-07-04 7:02 ` Hillf Danton
1 sibling, 1 reply; 19+ messages in thread
From: John Stoffel @ 2025-07-01 14:43 UTC (permalink / raw)
To: Kent Overstreet
Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
>>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
I wasn't sure if I wanted to chime in here, or even if it would be
worth it. But whatever.
> On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
>> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>> >
>> > per the maintainer thread discussion and precedent in xfs and btrfs
>> > for repair code in RCs, journal_rewind is again included
>>
>> I have pulled this, but also as per that discussion, I think we'll be
>> parting ways in the 6.17 merge window.
>>
>> You made it very clear that I can't even question any bug-fixes and I
>> should just pull anything and everything.
> Linus, I'm not trying to say you can't have any say in bcachefs. Not at
> all.
> I positively enjoy working with you - when you're not being a dick,
> but you can be genuinely impossible sometimes. A lot of times...
Kent, you can be a dick too. Prime example, the lines above. And
how you've treated me and others who gave feedback on bcachefs in the
past. I'm not a programmer, I'm in IT and follow this because it's
interesting, and I've been doing data management all my career. So
new filesystems are interesting.
But I've also been bitten by data loss, so I'd never ever trust my
production data to something labeled "experimental". It's wonderful
that you have stepped up and managed to get back people's data when
bugs in the code have caused them to lose data.
But for god's sake, just because you can find and fix this type of bug
during the -rc series, doesn't mean you need to try and patch it NOW.
Queue it up for the next release. Tell people how they can pull the
patch early if they want, but don't push it late in the release
cycle.
I've been watching this list since the early 2.x days, and I've seen
how the workflow has evolved over time. I've watched people burn out
and leave, flame wars and all kinds of crap. And the people who have
stayed around are generally the nice people. The flexible people.
The people who know when to back the f*ck off and take their time.
> When bcachefs was getting merged, I got comments from another
> filesystem maintainer that were pretty much "great! we finally have
> a filesystem maintainer who can stand up to Linus!".
Is that in terms of being dicks, or in terms of technical ability? Or
in terms of being super productive and focussed and able to get work
done. Standing up doesn't mean you're right. Or wrong.
> And having been on the receiving end of a lot of venting from them
> about what was going on... And more that I won't get into...
> I don't want to be in that position.
So don't! Just step back a second. Go back and read and re-read all
the comments Linus has made about the workflow and release process
over the years, not to mention the decades of kernel development. I'm not
sure you realize how much work it is to have people blasting patches
at you all day long, 365 days a year, and who think their patches are
the most important thing in the entire world bar none.
Just reflect on this for a second. Take your hands off your keyboard,
and don't type anything. And think about how many other people also
think their patches are the most important.
And about the users who _need_ _that_ _patch_ _right_ _now_ to fix a
problem. Why doesn't Linus see that I'm important and my part of the
kernel is the most important!
Just let that sink in a bit.
Then think about how many people do not care about bcachefs at all,
who don't even know it exists. And haven't used it or want to use
it. Are they less important? What about the graphics driver they
need to get _their_ work done right now? Is that more or less
important?
> I'm just not going to have any sense of humour where user data integrity
> is concerned or making sure users have the bugfixes they need.
So release your own patches in your own tree! No one is stopping you!
Have your '6.17-next' branch with the big re-working to fix this
horrible issue. But send in just the minimal patch _now_. The
absolute smallest patch.
Or just send in a revert for all you have done in the current series
which is breaking people, because it wasn't quite baked enough for
stability. Fall back, re-group, re-submit it all on the next release.
Slow down.
> Like I said - all I've been wanting is for you to tone it down and stop
> holding pull requests over my head as THE place to have that discussion.
And you need to stop thinking you are the most important thing and
only you can decide when bcachefs needs to be updated or not in the
kernel tree.
> You have genuinely good ideas, and you're bloody sharp. It is FUN
> getting shit done with you when we're not battling.
I'm honestly amazed at your abilities here Kent, even though you can
be an abrasive person too.
> But you have to understand the constraints people are under. Not
> just myself.
Dude, you need to listen to Linus saying this exact same line back to
you.
John
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-01 14:43 ` John Stoffel
@ 2025-07-02 16:34 ` Kent Overstreet
2025-07-02 17:41 ` Carl E. Thompson
2025-07-07 20:03 ` John Stoffel
0 siblings, 2 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-07-02 16:34 UTC (permalink / raw)
To: John Stoffel; +Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote:
> >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
>
> I wasn't sure if I wanted to chime in here, or even if it would be
> worth it. But whatever.
>
> > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
> >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >> >
> >> > per the maintainer thread discussion and precedent in xfs and btrfs
> >> > for repair code in RCs, journal_rewind is again included
> >>
> >> I have pulled this, but also as per that discussion, I think we'll be
> >> parting ways in the 6.17 merge window.
> >>
> >> You made it very clear that I can't even question any bug-fixes and I
> >> should just pull anything and everything.
>
> > Linus, I'm not trying to say you can't have any say in bcachefs. Not at
> > all.
>
> > I positively enjoy working with you - when you're not being a dick,
> > but you can be genuinely impossible sometimes. A lot of times...
>
> Kent, you can be a dick too. Prime example, the lines above. And
> how you've treated me and others who gave feedback on bcachefs in the
> past. I'm not a programmer, I'm in IT and follow this because it's
> interesting, and I've been doing data management all my career. So
> new filesystems are interesting.
Oh yes, I can be. I apologize if I've been a dick to you personally, I
try to be nice to my users and build good working relationships. But
kernel development is a high stakes, high pressure, stressful job, as I
often remind people. I don't ever take it personally, although sometimes
we do need to cool off before we drive each other completely mad :)
If there was something that was unresolved, and you'd like me to look at
it again, I'd be more than happy to. If you want to share what you were
hitting here, I'll tell you what I know - and if it was from a year or
more ago it's most likely been fixed.
> Slow down.
This is the most critical phase in the 10+ year process of shipping a
new filesystem.
We're seeing continually increasing usage (hopefully by users who are
prepared to accept that risk, but not always!), but we're not yet ready
for true widespread deployment.
Shipping a project as large and complex as a filesystem must be done
incrementally, in stages where we're deploying to gradually increasing
numbers of users, fixing everything they find and assessing where we're
at before opening it up to more users.
Working with users, supporting them, checking in on how it's doing,
and getting them the fixes for what they find is how we iterate and
improve. The job is not done until it's working well for everyone.
Right now, everyone is concerned because this is a hotly anticipated
project, and everyone wants to see it done right.
And in 6.16, we had two massive pull requests (30+ patches in a week,
twice in a row); that also generates concern when people are wondering
"is this thing stabilizing?".
6.16 was largely a case of a few particularly interesting bug reports
generating a bunch of fixes (and relatively simple and localized fixes,
which is what we like to see) for repair corner cases, the biggest
culprit (again) being snapshots.
If you look at the bug tracker, especially rate of incoming bugs and the
severity of bug reports (and also other sources of bug reports, like
reddit and IRC) - yes, we are stabilizing fast.
There is still a lot of work to be done, but we're on the right track.
"Slowing down" is not something you do without a concrete reason. Right
now we need to be getting those fixes out to users so they can keep
testing and finding the next bug. When someone has invested time and
effort learning how the system works and how to report bugs, we don't
want them getting frustrated and leaving - we want to work with them, so
they can keep testing and finding new bugs.
The signals that would tell me it's time to slow down are:
- Regressions getting through (quantity, severity, time spent on fixing
them)
- Bugs getting through that show that something fundamental is
missing (testing, hardening), or broken in our design.
- Frequency of bug reports going up to where I can't keep up (it's been
in steady, gradual decline)
We actually do not want this to be 100% perfect before it sees users.
That would result in a filesystem that's brittle - a glass cannon. We
might get it to the point where it works 99% of the time, but then when
it breaks we'd be in a panic - and if you discover it then, when it's in
the wild, it's too late.
The processes for how we debug and recover from failures, in the wild,
is a huge part (perhaps the majority) of what we're working on now. That
stuff has to be baked into the design on a deep level, and like all
other complex design it requires continual iteration.
That is how we'll get the reliability and robustness we hope to achieve.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-02 16:34 ` Kent Overstreet
@ 2025-07-02 17:41 ` Carl E. Thompson
2025-07-02 17:53 ` Kent Overstreet
2025-07-07 20:03 ` John Stoffel
1 sibling, 1 reply; 19+ messages in thread
From: Carl E. Thompson @ 2025-07-02 17:41 UTC (permalink / raw)
To: Kent Overstreet, John Stoffel
Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
Kent, at this point in bcachefs' development you want complete control over your development processes and timetable that you simply can't get in the mainline kernel. It's in your own best interest for you to develop out-of-tree for now.
It's in your users' best interests too. It's much faster, easier, less invasive and less risky to compile and install a single module than it is to replace the entire kernel. Developing out-of-tree will help users track down bugs faster because you'll be able to iterate faster and because testing multiple bcachefs versions in the same kernel eliminates the possibility of other kernel changes clouding the tests.
And it seems to me to be in the other kernel developers' best interests. They need to be able to do their work and I suspect the constant drama and distraction you bring could make that harder. You've already damaged your own reputation considerably but your continued drama is also damaging _their_ reputations (and the kernel's) and that's not fair.
I don't know what arrangement you have with your corporate sponsor but if they have incentivized you to have your development happen in the mainline kernel tree then I would ask that you not put their interests above everyone else's.
Carl Thompson
PS: There is a typo in the linux-kernel mailing list email address in this chain. Not fixing it as I don't think there's anything in this discussion that is of value to a larger audience.
> On 2025-07-02 9:34 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
>
> On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote:
> > >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
> >
> > I wasn't sure if I wanted to chime in here, or even if it would be
> > worth it. But whatever.
> >
> > > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
> > >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > >> >
> > >> > per the maintainer thread discussion and precedent in xfs and btrfs
> > >> > for repair code in RCs, journal_rewind is again included
> > >>
> > >> I have pulled this, but also as per that discussion, I think we'll be
> > >> parting ways in the 6.17 merge window.
> > >>
> > >> You made it very clear that I can't even question any bug-fixes and I
> > >> should just pull anything and everything.
> >
> > > Linus, I'm not trying to say you can't have any say in bcachefs. Not at
> > > all.
> >
> > > I positively enjoy working with you - when you're not being a dick,
> > > but you can be genuinely impossible sometimes. A lot of times...
> >
> > Kent, you can be a dick too. Prime example, the lines above. And
> > how you've treated me and others who gave feedback on bcachefs in the
> > past. I'm not a programmer, I'm in IT and follow this because it's
> > interesting, and I've been doing data management all my career. So
> > new filesystems are interesting.
>
> Oh yes, I can be. I apologize if I've been a dick to you personally, I
> try to be nice to my users and build good working relationships. But
> kernel development is a high stakes, high pressure, stressful job, as I
> often remind people. I don't ever take it personally, although sometimes
> we do need to cool off before we drive each other completely mad :)
>
> If there was something that was unresolved, and you'd like me to look at
> it again, I'd be more than happy to. If you want to share what you were
> hitting here, I'll tell you what I know - and if it was from a year or
> more ago it's most likely been fixed.
>
> > Slow down.
>
> This is the most critical phase in the 10+ year process of shipping a
> new filesystem.
>
> We're seeing continually increasing usage (hopefully by users who are
> prepared to accept that risk, but not always!), but we're not yet ready
> for true widespread deployment.
>
> Shipping a project as large and complex as a filesystem must be done
> incrementally, in stages where we're deploying to gradually increasing
> numbers of users, fixing everything they find and assessing where we're
> at before opening it up to more users.
>
> Working with users, supporting them, checking in on how it's doing,
> and getting them the fixes for what they find is how we iterate and
> improve. The job is not done until it's working well for everyone.
>
> Right now, everyone is concerned because this is a hotly anticipated
> project, and everyone wants to see it done right.
>
> And in 6.16, we had two massive pull requests (30+ patches in a week,
> twice in a row); that also generates concern when people are wondering
> "is this thing stabilizing?".
>
> 6.16 was largely a case of a few particularly interesting bug reports
> generating a bunch of fixes (and relatively simple and localized fixes,
> which is what we like to see) for repair corner cases, the biggest
> culprit (again) being snapshots.
>
> If you look at the bug tracker, especially rate of incoming bugs and the
> severity of bug reports (and also other sources of bug reports, like
> reddit and IRC) - yes, we are stabilizing fast.
>
> There is still a lot of work to be done, but we're on the right track.
>
> "Slowing down" is not something you do without a concrete reason. Right
> now we need to be getting those fixes out to users so they can keep
> testing and finding the next bug. When someone has invested time and
> effort learning how the system works and how to report bugs, we don't
> want them getting frustrated and leaving - we want to work with them, so
> they can keep testing and finding new bugs.
>
> The signals that would tell me it's time to slow down are:
>
> - Regressions getting through (quantity, severity, time spent on fixing
> them)
> - Bugs getting through that show that something fundamental is
> missing (testing, hardening), or broken in our design.
> - Frequency of bug reports going up to where I can't keep up (it's been
> in steady, gradual decline)
>
> We actually do not want this to be 100% perfect before it sees users.
> That would result in a filesystem that's brittle - a glass cannon. We
> might get it to the point where it works 99% of the time, but then when
> it breaks we'd be in a panic - and if you discover it then, when it's in
> the wild, it's too late.
>
> The processes for how we debug and recover from failures, in the wild,
> is a huge part (perhaps the majority) of what we're working on now. That
> stuff has to be baked into the design on a deep level, and like all
> other complex design it requires continual iteration.
>
> That is how we'll get the reliability and robustness we hope to achieve.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-02 17:41 ` Carl E. Thompson
@ 2025-07-02 17:53 ` Kent Overstreet
2025-07-02 18:49 ` Malte Schröder
0 siblings, 1 reply; 19+ messages in thread
From: Kent Overstreet @ 2025-07-02 17:53 UTC (permalink / raw)
To: Carl E. Thompson
Cc: John Stoffel, Linus Torvalds, linux-bcachefs, linux-fsdevel,
linux-kernel
On Wed, Jul 02, 2025 at 10:41:34AM -0700, Carl E. Thompson wrote:
> Kent, at this point in bcachefs' development you want complete control
> over your development processes and timetable that you simply can't
> get in the mainline kernel. It's in your own best interest for you to
> develop out-of-tree for now.
Carl, all I'm doing is stating up front what it's going to take to get
this done right.
I'm not particularly pushing one way or the other for bcachefs to stay
in; there are pros and cons either way. It'll be disruptive for it to be
out, but if the alternative is disrupting process too much and driving
Linus and I completely nuts, that's ok.
Everyone please be patient. This is a 10+ year process, no one thing is
make or break.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-02 17:53 ` Kent Overstreet
@ 2025-07-02 18:49 ` Malte Schröder
0 siblings, 0 replies; 19+ messages in thread
From: Malte Schröder @ 2025-07-02 18:49 UTC (permalink / raw)
To: Kent Overstreet, Carl E. Thompson
Cc: John Stoffel, Linus Torvalds, linux-bcachefs, linux-fsdevel,
linux-kernel
On 02.07.25 19:53, Kent Overstreet wrote:
> On Wed, Jul 02, 2025 at 10:41:34AM -0700, Carl E. Thompson wrote:
>> Kent, at this point in bcachefs' development you want complete control
>> over your development processes and timetable that you simply can't
>> get in the mainline kernel. It's in your own best interest for you to
>> develop out-of-tree for now.
> Carl, all I'm doing is stating up front what it's going to take to get
> this done right.
>
> I'm not particularly pushing one way or the other for bcachefs to stay
> in; there are pros and cons either way. It'll be disruptive for it to be
> out, but if the alternative is disrupting process too much and driving
> Linus and I completely nuts, that's ok.
>
> Everyone please be patient. This is a 10+ year process, no one thing is
> make or break.
>
So as a user usually hanging out on IRC and running Kent's trees:
I think most of those people actually testing bcachefs are either
running bcachefs-master, -rc or some outdated distro kernel. From my
perspective I'd think it would be good enough to push for-upstream
during the merge window and then only provide further patches if there
were regressions or some really bad bug appears that actually eats data
(like the one that bit me). If it's "just" stability fixes, well, if
people running a distro kernel hit those bugs they'll need to build a
-rc kernel anyways to get fixes, those could just build bcachefs-master.
When running Linus' tree I am aware and accept that I am not running the
absolutely latest code.
I've had some pretty bad experience with amdgpu requiring out of tree
patches to get my system running free of glitches, which took months to
get into upstream. It's annoying, but I accept that.
I'd rather have a slightly outdated bcachefs in kernel than not at all.
It is good to have a distro kernel I can fall back to if I mess up my
own kernel building ;)
my 0.02€
/Malte
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-06-27 3:34 ` Kent Overstreet
2025-07-01 14:43 ` John Stoffel
@ 2025-07-04 7:02 ` Hillf Danton
1 sibling, 0 replies; 19+ messages in thread
From: Hillf Danton @ 2025-07-04 7:02 UTC (permalink / raw)
To: Kent Overstreet
Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
On Thu, 26 Jun 2025 23:34:11 -0400 Kent Overstreet wrote:
> On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
> > On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > >
> > > per the maintainer thread discussion and precedent in xfs and btrfs
> > > for repair code in RCs, journal_rewind is again included
> >
> > I have pulled this, but also as per that discussion, I think we'll be
> > parting ways in the 6.17 merge window.
> >
> > You made it very clear that I can't even question any bug-fixes and I
> > should just pull anything and everything.
>
> Linus, I'm not trying to say you can't have any say in bcachefs. Not at
> all.
>
> I positively enjoy working with you - when you're not being a dick, but
> you can be genuinely impossible sometimes. A lot of times...
>
Now I see why your nostrils are so up, dude, because like Linux dancing
out of the Unix box, you are not so tame. Is it wasting minutes for Elon
to think he is richer than Zucker?
BTW I have some difficulty replying offline mails.
Hillf Danton
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-02 16:34 ` Kent Overstreet
2025-07-02 17:41 ` Carl E. Thompson
@ 2025-07-07 20:03 ` John Stoffel
2025-07-07 20:39 ` Kent Overstreet
2025-07-07 21:32 ` Carl E. Thompson
1 sibling, 2 replies; 19+ messages in thread
From: John Stoffel @ 2025-07-07 20:03 UTC (permalink / raw)
To: Kent Overstreet
Cc: John Stoffel, Linus Torvalds, linux-bcachefs, linux-fsdevel,
linux-kerenl
>>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
> On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote:
>> >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
>>
>> I wasn't sure if I wanted to chime in here, or even if it would be
>> worth it. But whatever.
>>
>> > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
>> >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>> >> >
>> >> > per the maintainer thread discussion and precedent in xfs and btrfs
>> >> > for repair code in RCs, journal_rewind is again included
>> >>
>> >> I have pulled this, but also as per that discussion, I think we'll be
>> >> parting ways in the 6.17 merge window.
>> >>
>> >> You made it very clear that I can't even question any bug-fixes and I
>> >> should just pull anything and everything.
>>
>> > Linus, I'm not trying to say you can't have any say in bcachefs. Not at
>> > all.
>>
>> > I positively enjoy working with you - when you're not being a dick,
>> > but you can be genuinely impossible sometimes. A lot of times...
>>
>> Kent, you can be a dick too. Prime example, the lines above. And
>> how you've treated me and others who gave feedback on bcachefs in the
>> past. I'm not a programmer, I'm in IT and follow this because it's
>> interesting, and I've been doing data management all my career. So
>> new filesystems are interesting.
> Oh yes, I can be. I apologize if I've been a dick to you personally, I
> try to be nice to my users and build good working relationships. But
> kernel development is a high stakes, high pressure, stressful job, as I
> often remind people. I don't ever take it personally, although sometimes
> we do need to cool off before we drive each other completely mad :)
I appreciate this, but honestly I'll withhold judgement until I see
how it goes more long term. But I'm also NOT a kernel developer, I'm
an IT professional who does storage and backups and managing data. So
my perspective is very definitely one of your users, or users-to-be.
But I've also got a CS degree and understand programming issues and
such.
> If there was something that was unresolved, and you'd like me to
> look at it again, I'd be more than happy to. If you want to share
> what you were hitting here, I'll tell you what I know - and if it
> was from a year or more ago it's most likely been fixed.
Nope, it was over a year ago and it's behind me. I was trying to
build the tools on Debian distro when the bcachefs-tools were a real
pain to build. It's better now.
>> Slow down.
> This is the most critical phase in the 10+ year process of shipping a
> new filesystem.
Sure, but that's not what I'm trying to say here. The kernel has, as
you most certainly know, a standard process for quickly deploying new
versions. Linus's entire problem is that you dropped in a big chunk
of code into the late release process.
And none of that is critical, because if you have people running 100tb
of bcachefs right now, they certainly understand that they can lose
data at any time. Or at least they should if they have any sort of
understanding of reliable data. bcachefs isn't there yet. It's
getting close, but Linux has an amazingly complicated VFS and supports
all kinds of weird edge cases. Which sucks from the filesystem
perspective.
But you know this.
So when you run into a major bug in the code, or potential data loss
when -rc2 or later is coming out, just revert. Pull that code out
because it's obviously not ready. So you wait a few months, big deal!
It gives you and the code time to stabilize.
If someone is losing data and you want to give them a patch to try and
fix it, great, but they can take a patch from you directly. And post
it to your mailing list. Put it on a git branch somewhere.
But revert from the main Linus tree. For now. In two months, you'll
be back with better code. bcachefs is still listed as experimental,
so don't feel like you have to keep pushing the absolutely latest code
into the kernel. Just slow it down a little to make sure you push
good code.
> We're seeing continually increasing usage (hopefully by users who are
> prepared to accept that risk, but not always!), but we're not yet ready
> for true widespread deployment.
If those users are not prepared to accept the risk of an experimental
filesystem, then screw them! They're idiots and should be treated as
such.
I would expect to be fired from my job if I bet my company's data on
bcachefs currently. Sure, play around and test it if you like, but if
it breaks, you get to keep both pieces.
Same with bleeding edge kernel development! I might run pretty
bleeding edge kernels at home, but only for my own data that I realize
I might lose. But I also do backups, have the data on XFS and ext4
filesystems, which are stable, and I'm not trying to do crazy things
with it.
Do I have some test bcachefs volumes? Sure do. And I treat them like
lepers, if they break, I either toss them away, or I file a report,
but I certainly don't keep ANY data on there I don't want to lose.
I'm being blunt here.
> Shipping a project as large and complex as a filesystem must be done
> incrementally, in stages where we're deploying to gradually increasing
> numbers of users, fixing everything they find and assessing where we're
> at before opening it up to more users.
Yes! But that process also has to include rollbacks, which git has
made so so so easy. Just accept that _if_ 6.x-rc[12345] is buggy,
then it needs to be rolled back and submitted to 6.x+1-rc1 for the
next cycle after it's been baked.
Anyone running such a bleeding edge kernel and finding problems isn't
going to care about having to hand apply patches, they're already
doing crazy things! *grin*
> Working with users, supporting them, checking in on how it's doing,
> and getting them the fixes for what they find is how we iterate and
> improve. The job is not done until it's working well for everyone.
Yes, I agree 100% with all this.
> Right now, everyone is concerned because this is a hotly anticipated
> project, and everyone wants to see it done right.
So which is more important? Ship super fast and break things? Or be
willing to revert and ship just a bit slower?
> And in 6.16, we had two massive pull requests (30+ patches in a
> week, twice in a row); that also generates concern when people are
> wondering "is this thing stabilizing?".
Correct!
> 6.16 was largely a case of a few particularly interesting bug
> reports generating a bunch of fixes (and relatively simple and
> localized fixes, which is what we like to see) for repair corner
> cases, the biggest culprit (again) being snapshots.
Sure, fixes are great. But why did you have to drop them into -rc2 in
a big bundle? Why not just roll back what you had submitted and say
"it's not baked enough, it needs to wait a release"?
> If you look at the bug tracker, especially rate of incoming bugs and the
> severity of bug reports (and also other sources of bug reports, like
> reddit and IRC) - yes, we are stabilizing fast.
Sure, and I'm happy for this. And so are a bunch of other people!
> There is still a lot of work to be done, but we're on the right track.
No argument there.
> "Slowing down" is not something you do without a concrete
> reason.
And this is where you and Linus are butting heads in my opinion. You
want to release big patches at any time. Linus wants to stabilize
releases and development for the entire kernel. You're concentrating
on your small area which is vitally important to you. But not
everyone is as invested. Others want the latest DRM drivers. Or the
latest i2c code, or some other subsystem which they care about. Linus
(and the process) is about the entire kernel.
> Right now we need to be getting those fixes out to users so
> they can keep testing and finding the next bug. When someone has
> invested time and effort learning how the system works and how to
> report bugs, we don't want them getting frustrated and leaving - we
> want to work with them, so they can keep testing and finding new
> bugs.
So post patches on your own tree that they can use, nothing stops you!
> The signals that would tell me it's time to slow down are:
> - Regressions getting through (quantity, severity, time spent on fixing
> them)
> - Bugs getting through that show that something fundamental is
> missing (testing, hardening), or broken in our design.
> - Frequency of bug reports going up to where I can't keep up (it's been
> in steady, gradual decline)
> We actually do not want this to be 100% perfect before it sees users.
> That would result in a filesystem that's brittle - a glass cannon. We
> might get it to the point where it works 99% of the time, but then when
> it breaks we'd be in a panic - and if you discover it then, when it's in
> the wild, it's too late.
> The processes for how we debug and recover from failures, in the wild,
> is a huge part (perhaps the majority) of what we're working on now. That
> stuff has to be baked into the design on a deep level, and like all
> other complex design it requires continual iteration.
> That is how we'll get the reliability and robustness we hope to achieve.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-07 20:03 ` John Stoffel
@ 2025-07-07 20:39 ` Kent Overstreet
2025-07-07 21:32 ` Carl E. Thompson
1 sibling, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-07-07 20:39 UTC (permalink / raw)
To: John Stoffel; +Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
On Mon, Jul 07, 2025 at 04:03:27PM -0400, John Stoffel wrote:
> If those users are not prepared to accept the risk of an experimental
> filesystem, then screw them! They're idiots and should be treated as
> such.
No, that's not the attitude of a responsible filesystem developer.
It doesn't matter what stage of development the code is in, if it's got
users and they're hitting bugs, those bugs take priority. "Screw the
user" is no way to develop a filesystem that people will actually want
to run.
If we want to attain rock solid, bulletproof reliability, then
reliability, top to bottom, has to be the priority at every stage of
development. The job is not done until it is working well and verified
for everyone.
> > Shipping a project as large and complex as a filesystem must be done
> > incrementally, in stages where we're deploying to gradually increasing
> > numbers of users, fixing everything they find and assessing where we're
> > at before opening it up to more users.
>
> Yes! But that process also has to include rollbacks, which git has
> made so so so easy. Just accept that _if_ 6.x-rc[12345] is buggy,
> then it needs to be rolled back and submitted to 6.x+1-rc1 for the
> next cycle after it's been baked.
I'm very quick to kick out buggy patches; simple regressions where a
revert is the correct solution almost never hit Linus's tree. This isn't
the issue we're talking about.
> > Right now we need to be getting those fixes out to users so
> > they can keep testing and finding the next bug. When someone has
> > invested time and effort learning how the system works and how to
> > report bugs, we don't want them getting frustrated and leaving - we
> > want to work with them, so they can keep testing and finding new
> > bugs.
>
> So post patches on your own tree that they can use, nothing stops you!
That is indeed what it comes down to, isn't it?
I support my code. I triage every bug report, prioritizing the critical
bugs, and I spend most of my day working with users to track bugs down
and make sure the fixes work.
That's my job.
If fixes aren't going into Linus's tree, that means the working,
supported bcachefs tree is no longer his tree, it's mine.
We've been down that road with other subsystems in the past (e.g.
Lustre), and it doesn't work.
Going down that road means the first thing I have to do with every bug
report is ask "which tree are you running? stock mainline or mine?" -
and then we might as well go all the way and go the DKMS route, for the
sake of everyone's sanity.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
2025-07-07 20:03 ` John Stoffel
2025-07-07 20:39 ` Kent Overstreet
@ 2025-07-07 21:32 ` Carl E. Thompson
1 sibling, 0 replies; 19+ messages in thread
From: Carl E. Thompson @ 2025-07-07 21:32 UTC (permalink / raw)
To: John Stoffel, Kent Overstreet
Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl
Don't bother. You can't reason with him and you can't "fix" him. He'll keep sucking up as much of our collective time and energy as we allow, so it's time to move on and stop feeding him.
Carl
> On 2025-07-07 1:03 PM PDT John Stoffel <john@stoffel.org> wrote:
>
>
> >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
>
> > On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote:
> >> >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes:
> >>
> >> I wasn't sure if I wanted to chime in here, or even if it would be
> >> worth it. But whatever.
> >>
> >> > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote:
> >> >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >> >> >
> >> >> > per the maintainer thread discussion and precedent in xfs and btrfs
> >> >> > for repair code in RCs, journal_rewind is again included
> >> >>
> >> >> I have pulled this, but also as per that discussion, I think we'll be
> >> >> parting ways in the 6.17 merge window.
> >> >>
> >> >> You made it very clear that I can't even question any bug-fixes and I
> >> >> should just pull anything and everything.
> >>
> >> > Linus, I'm not trying to say you can't have any say in bcachefs. Not at
> >> > all.
> >>
> >> > I positively enjoy working with you - when you're not being a dick,
> >> > but you can be genuinely impossible sometimes. A lot of times...
> >>
> >> Kent, you can be a dick too. Prime example, the lines above. And
> >> how you've treated me and others who gave feedback on bcachefs in the
> >> past. I'm not a programmer, I'm in IT and follow this because it's
> >> interesting, and I've been doing data management all my career. So
> >> new filesystems are interesting.
>
> > Oh yes, I can be. I apologize if I've been a dick to you personally, I
> > try to be nice to my users and build good working relationships. But
> > kernel development is a high stakes, high pressure, stressful job, as I
> > often remind people. I don't ever take it personally, although sometimes
> > we do need to cool off before we drive each other completely mad :)
>
> I appreciate this, but honestly I'll withhold judgement until I see
> how it goes more long term. But I'm also NOT a kernel developer, I'm
> an IT professional who does storage and backups and managing data. So
> my perspective is very definitely one of your users, or users-to-be.
> But I've also got a CS degree and understand programming issues and
> such.
>
> > If there was something that was unresolved, and you'd like me to
> > look at it again, I'd be more than happy to. If you want to share
> > what you were hitting here, I'll tell you what I know - and if it
> > was from a year or more ago it's most likely been fixed.
>
> Nope, it was over a year ago and it's behind me. I was trying to
> build the tools on Debian distro when the bcachefs-tools were a real
> pain to build. It's better now.
>
> >> Slow down.
>
> > This is the most critical phase in the 10+ year process of shipping a
> > new filesystem.
>
> Sure, but that's not what I'm trying to say here. The kernel has, as
> you most certainly know, a standard process for quickly deploying new
> versions. Linus's entire problem is that you dropped in a big chunk
> of code into the late release process.
>
> And none of that is critical, because if you have people running 100tb
> of bcachefs right now, they certainly understand that they can lose
> data at any time. Or at least they should if they have any sort of
> understanding of reliable data. bcachefs isn't there yet. It's
> getting close, but Linux has an amazingly complicated VFS and supports
> all kinds of weird edge cases. Which sucks from the filesystem
> perspective.
>
> But you know this.
>
> So when you run into a major bug in the code, or potential data loss
> when -rc2 or later is coming out, just revert. Pull that code out
> because it's obviously not ready. So you wait a few months, big deal!
> It gives you and the code time to stabilize.
>
> If someone is losing data and you want to give them a patch to try and
> fix it, great, but they can take a patch from you directly. And post
> it to your mailing list. Put it on a git branch somewhere.
>
> But revert from the main Linus tree. For now. In two months, you'll
> be back with better code. bcachefs is still listed as experimental,
> so don't feel like you have to keep pushing the absolutely latest code
> into the kernel. Just slow it down a little to make sure you push
> good code.
>
> > We're seeing continually increasing usage (hopefully by users who are
> > prepared to accept that risk, but not always!), but we're not yet ready
> > for true widespread deployment.
>
> If those users are not prepared to accept the risk of an experimental
> filesystem, then screw them! They're idiots and should be treated as
> such.
>
> I would expect to be fired from my job if I bet my company's data on
> bcachefs currently. Sure, play around and test it if you like, but if
> it breaks, you get to keep both pieces.
>
> Same with bleeding edge kernel development! I might run pretty
> bleeding edge kernels at home, but only for my own data that I realize
> I might lose. But I also do backups, have the data on XFS and ext4
> filesystems, which are stable, and I'm not trying to do crazy things
> with it.
>
> Do I have some test bcachefs volumes? Sure do. And I treat them like
> lepers, if they break, I either toss them away, or I file a report,
> but I certainly don't keep ANY data on there I don't want to lose.
>
> I'm being blunt here.
>
> > Shipping a project as large and complex as a filesystem must be done
> > incrementally, in stages where we're deploying to gradually increasing
> > numbers of users, fixing everything they find and assessing where we're
> > at before opening it up to more users.
>
> Yes! But that process also has to include rollbacks, which git has
> made so so so easy. Just accept that _if_ 6.x-rc[12345] is buggy,
> then it needs to be rolled back and submitted to 6.x+1-rc1 for the
> next cycle after it's been baked.
>
> Anyone running such a bleeding edge kernel and finding problems isn't
> going to care about having to hand apply patches, they're already
> doing crazy things! *grin*
>
> > Working with users, supporting them, checking in on how it's doing,
> > and getting them the fixes for what they find is how we iterate and
> > improve. The job is not done until it's working well for everyone.
>
> Yes, I agree 100% with all this.
>
> > Right now, everyone is concerned because this is a hotly anticipated
> > project, and everyone wants to see it done right.
>
> So which is more important? Ship super fast and break things? Or be
> willing to revert and ship just a bit slower?
>
> > And in 6.16, we had two massive pull requests (30+ patches in a
> > week, twice in a row); that also generates concern when people are
> > wondering "is this thing stabilizing?".
>
> Correct!
>
> > 6.16 was largely a case of a few particularly interesting bug
> > reports generating a bunch of fixes (and relatively simple and
> > localized fixes, which is what we like to see) for repair corner
> > cases, the biggest culprit (again) being snapshots.
>
> Sure, fixes are great. But why did you have to drop them into -rc2 in
> a big bundle? Why not just roll back what you had submitted and say
> "it's not baked enough, it needs to wait a release"?
>
> > If you look at the bug tracker, especially rate of incoming bugs and the
> > severity of bug reports (and also other sources of bug reports, like
> > reddit and IRC) - yes, we are stabilizing fast.
>
> Sure, and I'm happy for this. And so are a bunch of other people!
>
> > There is still a lot of work to be done, but we're on the right track.
>
> No argument there.
>
> > "Slowing down" is not something you do without a concrete
> > reason.
>
> And this is where you and Linus are butting heads in my opinion. You
> want to release big patches at any time. Linus wants to stabilize
> releases and development for the entire kernel. You're concentrating
> on your small area which is vitally important to you. But not
> everyone is as invested. Others want the latest DRM drivers. Or the
> latest i2c code, or some other subsystem which they care about. Linus
> (and the process) is about the entire kernel.
>
> > Right now we need to be getting those fixes out to users so
> > they can keep testing and finding the next bug. When someone has
> > invested time and effort learning how the system works and how to
> > report bugs, we don't want them getting frustrated and leaving - we
> > want to work with them, so they can keep testing and finding new
> > bugs.
>
> So post patches on your own tree that they can use, nothing stops you!
>
> > The signals that would tell me it's time to slow down are:
>
> > - Regressions getting through (quantity, severity, time spent on fixing
> > them)
> > - Bugs getting through that show that something fundamental is
> > missing (testing, hardening), or broken in our design.
> > - Frequency of bug reports going up to where I can't keep up (it's been
> > in steady, gradual decline)
>
> > We actually do not want this to be 100% perfect before it sees users.
> > That would result in a filesystem that's brittle - a glass cannon. We
> > might get it to the point where it works 99% of the time, but then when
> > it breaks we'd be in a panic - and if you discover it then, when it's in
> > the wild, it's too late.
>
> > The processes for how we debug and recover from failures, in the wild,
> > is a huge part (perhaps the majority) of what we're working on now. That
> > stuff has to be baked into the design on a deep level, and like all
> > other complex design it requires continual iteration.
>
> > That is how we'll get the reliability and robustness we hope to achieve.
Thread overview: 19+ messages
2025-06-27 2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
2025-06-27 3:21 ` Linus Torvalds
2025-06-27 3:34 ` Kent Overstreet
2025-07-01 14:43 ` John Stoffel
2025-07-02 16:34 ` Kent Overstreet
2025-07-02 17:41 ` Carl E. Thompson
2025-07-02 17:53 ` Kent Overstreet
2025-07-02 18:49 ` Malte Schröder
2025-07-07 20:03 ` John Stoffel
2025-07-07 20:39 ` Kent Overstreet
2025-07-07 21:32 ` Carl E. Thompson
2025-07-04 7:02 ` Hillf Danton
2025-06-27 19:07 ` Kyle Sanderson
2025-06-27 19:16 ` Kyle Sanderson
2025-06-27 19:42 ` Kent Overstreet
2025-06-28 8:06 ` Gerhard Wiesinger
2025-06-27 3:33 ` pr-tracker-bot
2025-06-27 14:46 ` Josef Bacik
2025-06-28 1:59 ` Theodore Ts'o