* [GIT PULL] bcachefs fixes for 6.16-rc4
@ 2025-06-27  2:22 Kent Overstreet
  2025-06-27  3:21 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-06-27  2:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl

per the maintainer thread discussion and precedent in xfs and btrfs
for repair code in RCs, journal_rewind is again included

The following changes since commit e04c78d86a9699d136910cfc0bdcf01087e3267e:

  Linux 6.16-rc2 (2025-06-15 13:49:41 -0700)

are available in the Git repository at:

  git://evilpiepirate.org/bcachefs.git tags/bcachefs-2025-06-26

for you to fetch changes up to ef6fac0f9e5d0695cee1d820c727fe753eca52d5:

  bcachefs: Plumb correct ip to trans_relock_fail tracepoint (2025-06-26 00:01:16 -0400)

----------------------------------------------------------------
bcachefs fixes for 6.16-rc4

----------------------------------------------------------------
Alan Huang (7):
      bcachefs: Don't allocate new memory when mempool is exhausted
      bcachefs: Fix alloc_req use after free
      bcachefs: Add missing EBUG_ON
      bcachefs: Delay calculation of trans->journal_u64s
      bcachefs: Move bset size check before csum check
      bcachefs: Fix pool->alloc NULL pointer dereference
      bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked

Bharadwaj Raju (1):
      bcachefs: don't return fsck_fix for unfixable node errors in __btree_err

Kent Overstreet (43):
      bcachefs: trace_extent_trim_atomic
      bcachefs: btree iter tracepoints
      bcachefs: Fix bch2_journal_keys_peek_prev_min()
      bcachefs: btree_iter: fix updates, journal overlay
      bcachefs: better __bch2_snapshot_is_ancestor() assert
      bcachefs: pass last_seq into fs_journal_start()
      bcachefs: Fix "now allowing incompatible features" message
      bcachefs: Fix snapshot_key_missing_inode_snapshot repair
      bcachefs: fsck: fix add_inode()
      bcachefs: fsck: fix extent past end of inode repair
      bcachefs: opts.journal_rewind
      bcachefs: Kill unused tracepoints
      bcachefs: mark more errors autofix
      bcachefs: fsck: Improve check_key_has_inode()
      bcachefs: Call bch2_fs_init_rw() early if we'll be going rw
      bcachefs: Fix __bch2_inum_to_path() when crossing subvol boundaries
      bcachefs: fsck: Print path when we find a subvol loop
      bcachefs: fsck: Fix remove_backpointer() for subvol roots
      bcachefs: fsck: Fix reattach_inode() for subvol roots
      bcachefs: fsck: check_directory_structure runs in reverse order
      bcachefs: fsck: additional diagnostics for reattach_inode()
      bcachefs: fsck: check_subdir_count logs path
      bcachefs: fsck: Fix check_path_loop() + snapshots
      bcachefs: Fix bch2_read_bio_to_text()
      bcachefs: Fix restart handling in btree_node_scrub_work()
      bcachefs: fsck: Fix check_directory_structure when no check_dirents
      bcachefs: fsck: fix unhandled restart in topology repair
      bcachefs: fsck: Fix oops in key_visible_in_snapshot()
      bcachefs: fix spurious error in read_btree_roots()
      bcachefs: Fix missing newlines before ero
      bcachefs: Fix *__bch2_trans_subbuf_alloc() error path
      bcachefs: Don't log fsck err in the journal if doing repair elsewhere
      bcachefs: Add missing key type checks to check_snapshot_exists()
      bcachefs: Add missing bch2_err_class() to fileattr_set()
      bcachefs: fix spurious error_throw
      bcachefs: Fix range in bch2_lookup_indirect_extent() error path
      bcachefs: Check for bad write buffer key when moving from journal
      bcachefs: Use wait_on_allocator() when allocating journal
      bcachefs: fix bch2_journal_keys_peek_prev_min() underflow
      bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix
      bcachefs: Ensure btree node scan runs before checking for scanned nodes
      bcachefs: Ensure we rewind to run recovery passes
      bcachefs: Plumb correct ip to trans_relock_fail tracepoint

 fs/bcachefs/alloc_background.c         |  13 +-
 fs/bcachefs/backpointers.c             |   2 +-
 fs/bcachefs/bcachefs.h                 |   3 +-
 fs/bcachefs/btree_gc.c                 |  37 ++--
 fs/bcachefs/btree_io.c                 |  74 ++++----
 fs/bcachefs/btree_iter.c               | 173 ++++++++++++------
 fs/bcachefs/btree_journal_iter.c       |  82 ++++++---
 fs/bcachefs/btree_journal_iter_types.h |   5 +-
 fs/bcachefs/btree_locking.c            |  12 +-
 fs/bcachefs/btree_node_scan.c          |   6 +-
 fs/bcachefs/btree_node_scan.h          |   2 +-
 fs/bcachefs/btree_trans_commit.c       |  18 +-
 fs/bcachefs/btree_types.h              |   1 +
 fs/bcachefs/btree_update.c             |  16 +-
 fs/bcachefs/btree_update.h             |   5 +-
 fs/bcachefs/btree_update_interior.c    |  16 +-
 fs/bcachefs/btree_update_interior.h    |   3 +
 fs/bcachefs/btree_write_buffer.c       |   8 +-
 fs/bcachefs/btree_write_buffer.h       |   6 +
 fs/bcachefs/chardev.c                  |  29 ++-
 fs/bcachefs/data_update.c              |   1 +
 fs/bcachefs/errcode.h                  |   5 -
 fs/bcachefs/error.c                    |   4 +-
 fs/bcachefs/extent_update.c            |  13 +-
 fs/bcachefs/fs.c                       |   3 +-
 fs/bcachefs/fsck.c                     | 317 +++++++++++++++++++++++----------
 fs/bcachefs/inode.h                    |   5 +
 fs/bcachefs/io_read.c                  |   7 +-
 fs/bcachefs/journal.c                  |  20 +--
 fs/bcachefs/journal.h                  |   2 +-
 fs/bcachefs/journal_io.c               |  26 ++-
 fs/bcachefs/namei.c                    |  30 +++-
 fs/bcachefs/opts.h                     |   5 +
 fs/bcachefs/recovery.c                 |  24 ++-
 fs/bcachefs/recovery_passes.c          |  19 +-
 fs/bcachefs/recovery_passes.h          |   9 +
 fs/bcachefs/reflink.c                  |  12 +-
 fs/bcachefs/sb-errors_format.h         |  19 +-
 fs/bcachefs/snapshot.c                 |  14 +-
 fs/bcachefs/super.c                    |  13 +-
 fs/bcachefs/super.h                    |   1 +
 fs/bcachefs/trace.h                    | 125 +++----------
 42 files changed, 734 insertions(+), 451 deletions(-)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-06-27 2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet @ 2025-06-27 3:21 ` Linus Torvalds 2025-06-27 3:34 ` Kent Overstreet ` (2 more replies) 2025-06-27 3:33 ` pr-tracker-bot 2025-06-27 14:46 ` Josef Bacik 2 siblings, 3 replies; 19+ messages in thread From: Linus Torvalds @ 2025-06-27 3:21 UTC (permalink / raw) To: Kent Overstreet; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > per the maintainer thread discussion and precedent in xfs and btrfs > for repair code in RCs, journal_rewind is again included I have pulled this, but also as per that discussion, I think we'll be parting ways in the 6.17 merge window. You made it very clear that I can't even question any bug-fixes and I should just pull anything and everything. Honestly, at that point, I don't really feel comfortable being involved at all, and the only thing we both seemed to really fundamentally agree on in that discussion was "we're done". Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-06-27 3:21 ` Linus Torvalds @ 2025-06-27 3:34 ` Kent Overstreet 2025-07-01 14:43 ` John Stoffel 2025-07-04 7:02 ` Hillf Danton 2025-06-27 19:07 ` Kyle Sanderson 2025-06-28 8:06 ` Gerhard Wiesinger 2 siblings, 2 replies; 19+ messages in thread From: Kent Overstreet @ 2025-06-27 3:34 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: > On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > per the maintainer thread discussion and precedent in xfs and btrfs > > for repair code in RCs, journal_rewind is again included > > I have pulled this, but also as per that discussion, I think we'll be > parting ways in the 6.17 merge window. > > You made it very clear that I can't even question any bug-fixes and I > should just pull anything and everything. Linus, I'm not trying to say you can't have any say in bcachefs. Not at all. I positively enjoy working with you - when you're not being a dick, but you can be genuinely impossible sometimes. A lot of times... When bcachefs was getting merged, I got comments from another filesystem maintainer that were pretty much "great! we finally have a filesystem maintainer who can stand up to Linus!". And having been on the receiving end of a lot of venting from them about what was going on... And more that I won't get into... I don't want to be in that position. I'm just not going to have any sense of humour where user data integrity is concerned or making sure users have the bugfixes they need. Like I said - all I've been wanting is for you to tone it down and stop holding pull requests over my head as THE place to have that discussion. You have genuinely good ideas, and you're bloody sharp. It is FUN getting shit done with you when we're not battling. But you have to understand the constraints people are under. Not just myself. 
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-06-27 3:34 ` Kent Overstreet @ 2025-07-01 14:43 ` John Stoffel 2025-07-02 16:34 ` Kent Overstreet 2025-07-04 7:02 ` Hillf Danton 1 sibling, 1 reply; 19+ messages in thread From: John Stoffel @ 2025-07-01 14:43 UTC (permalink / raw) To: Kent Overstreet Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: I wasn't sure if I wanted to chime in here, or even if it would be worth it. But whatever. > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: >> > >> > per the maintainer thread discussion and precedent in xfs and btrfs >> > for repair code in RCs, journal_rewind is again included >> >> I have pulled this, but also as per that discussion, I think we'll be >> parting ways in the 6.17 merge window. >> >> You made it very clear that I can't even question any bug-fixes and I >> should just pull anything and everything. > Linus, I'm not trying to say you can't have any say in bcachefs. Not at > all. > I positively enjoy working with you - when you're not being a dick, > but you can be genuinely impossible sometimes. A lot of times... Kent, you can be a dick too. Prime example, the lines above. And how you've treated me and others who gave feedback on bcachefs in the past. I'm not a programmer, I'm in IT and follow this because it's interesting, and I've been doing data management all my career. So new filesystems are interesting. But I've also been bitten by data loss, so I'd never ever trust my production data to something labeled "experimental". It's wonderful that you have stepped up and managed to get back people's data when bugs in the code have caused them to lose data. But for god's sake, just because you can find and fix this type of bug during the -rc series, doesn't mean you need to try and patch it NOW. Queue it up for the next release. 
Tell people how they can pull the patch early if they want, but don't push it late in the release cycle. I've been watching this list since the early 2.x days, and I've seen how the workflow has evolved over time. I've watched people burn out and leave, flame wars and all kinds of crap. And the people who have stayed around are generally the nice people. The flexible people. The people who know when to back the f*ck off and take their time. > When bcachefs was getting merged, I got comments from another > filesystem maintainer that were pretty much "great! we finally have > a filesystem maintainer who can stand up to Linus!". Is that in terms of being dicks, or in terms of technical ability? Or in terms of being super productive and focussed and able to get work done. Standing up doesn't mean you're right. Or wrong. > And having been on the receiving end of a lot of venting from them > about what was going on... And more that I won't get into... > I don't want to be in that position. So don't! Just step back a second. Go back and read and re-read all the comments Linus had made about the workflow and release process over the years, much less decades of the kernel development. I'm not sure you realize how much work it is to have people blasting patches at you all day long, 365 days a year, and who think their patches are the most important thing in the entire world bar none. Just reflect on this for a second. Take your hands off your keyboard, and don't type anything. And think about how many other people also think their patches are the most important. And about the users who _need_ _that_ _patch_ _right_ _now_ to fix a problem. Why doesn't Linus see that I'm important and my part of the kernel is the most important! Just let that sink in a bit. Then think about how many people do not care about bcachefs at all, who don't even know it exists. And haven't used it or want to use it. Are they less important? 
What about the graphics driver they need to get _their_ work done right now? Is that more or less important? > I'm just not going to have any sense of humour where user data integrity > is concerned or making sure users have the bugfixes they need. So release your own patches in your own tree! No one is stopping you! Have your '6.17-next' branch with the big re-working to fix this horrible issue. But send in just the minimal patch _now_. The absolute smallest patch. Or just send in a revert for all you have done in the current series which is breaking people, because it wasn't quite baked enough for stability. Fall back, re-group, re-submit it all on the next release. Slow down. > Like I said - all I've been wanting is for you to tone it down and stop > holding pull requests over my head as THE place to have that discussion. And you need to stop thinking you are the most important thing and only you can decide when bcachefs needs to be updated or not in the kernel tree. > You have genuinely good ideas, and you're bloody sharp. It is FUN > getting shit done with you when we're not battling. I'm honestly amazed at your abilities here Kent, even though you can be an abrasive person too. > But you have to understand the constraints people are under. Not > just myself. Dude, you need to listen to Linus saying this exact same line back to you. John ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-01 14:43 ` John Stoffel @ 2025-07-02 16:34 ` Kent Overstreet 2025-07-02 17:41 ` Carl E. Thompson 2025-07-07 20:03 ` John Stoffel 0 siblings, 2 replies; 19+ messages in thread From: Kent Overstreet @ 2025-07-02 16:34 UTC (permalink / raw) To: John Stoffel; +Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote: > >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: > > I wasn't sure if I wanted to chime in here, or even if it would be > worth it. But whatever. > > > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: > >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: > >> > > >> > per the maintainer thread discussion and precedent in xfs and btrfs > >> > for repair code in RCs, journal_rewind is again included > >> > >> I have pulled this, but also as per that discussion, I think we'll be > >> parting ways in the 6.17 merge window. > >> > >> You made it very clear that I can't even question any bug-fixes and I > >> should just pull anything and everything. > > > Linus, I'm not trying to say you can't have any say in bcachefs. Not at > > all. > > > I positively enjoy working with you - when you're not being a dick, > > but you can be genuinely impossible sometimes. A lot of times... > > Kent, you can be a dick too. Prime example, the lines above. And > how you've treated me and others who gave feedback on bcachefs in the > past. I'm not a programmer, I'm in IT and follow this because it's > interesting, and I've been doing data management all my career. So > new filesystems are interesting. Oh yes, I can be. I apologize if I've been a dick to you personally, I try to be nice to my users and build good working relationships. But kernel development is a high stakes, high pressure, stressful job, as I often remind people. 
I don't ever take it personally, although sometimes we do need to cool off before we drive each other completely mad :) If there was something that was unresolved, and you'd like me to look at it again, I'd be more than happy to. If you want to share what you were hitting here, I'll tell you what I know - and if it was from a year or more ago it's most likely been fixed. > Slow down. This is the most critical phase in the 10+ year process of shipping a new filesystem. We're seeing continually increasing usage (hopefully by users who are prepared to accept that risk, but not always!), but we're not yet ready for true widespread deployment. Shipping a project as large and complex as a filesystem must be done incrementally, in stages where we're deploying to gradually increasing numbers of users, fixing everything they find and assessing where we're at before opening it up to more users. Working with users, supporting with them, checking in on how it's doing, and getting them the fixes for what they find is how we iterate and improve. The job is not done until it's working well for everyone. Right now, everyone is concerned because this is a hotly anticipated project, and everyone wants to see it done right. And in 6.16, we had two massive pull requests (30+ patches in a week, twice in a row); that also generates concern when people are wondering "is this thing stabilizing?". 6.16 was largely a case of a few particularly interesting bug reports generating a bunch of fixes (and relatively simple and localized fixes, which is what we like to see) for repair corner cases, the biggest culprit (again) being snapshots. If you look at the bug tracker, especially rate of incoming bugs and the severity of bug reports (and also other sources of bug reports, like reddit and IRC) - yes, we are stabilizing fast. There is still a lot of work to be done, but we're on the right track. "Slowing down" is not something you do without a concrete reason. 
Right now we need to be getting those fixes out to users so they can keep testing and finding the next bug. When someone has invested time and effort learning how the system works and how to report bugs, we don't want them getting frustrated and leaving - we want to work with them, so they can keep testing and finding new bugs. The signals that would tell me it's time to slow down are: - Regressions getting through (quantity, severity, time spent on fixing them) - Bugs getting through that show that something fundamental is missing (testing, hardening), or broken in our design. - Frequency of bug reports going up to where I can't keep up (it's been in steady, gradual decline) We actually do not want this to be 100% perfect before it sees users. That would result in a filesystem that's brittle - a glass cannon. We might get it to the point where it works 99% of the time, but then when it breaks we'd be in a panic - and if you discover it then, when it's in the wild, it's too late. The processes for how we debug and recover from failures, in the wild, are a huge part (perhaps the majority) of what we're working on now. That stuff has to be baked into the design on a deep level, and like all other complex design it requires continual iteration. That is how we'll get the reliability and robustness we hope to achieve. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-02 16:34 ` Kent Overstreet @ 2025-07-02 17:41 ` Carl E. Thompson 2025-07-02 17:53 ` Kent Overstreet 2025-07-07 20:03 ` John Stoffel 1 sibling, 1 reply; 19+ messages in thread From: Carl E. Thompson @ 2025-07-02 17:41 UTC (permalink / raw) To: Kent Overstreet, John Stoffel Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl Kent, at this point in bcachefs' development you want complete control over your development processes and timetable that you simply can't get in the mainline kernel. It's in your own best interest for you to develop out-of-tree for now. It's in your users' best interests too. It's much faster, easier, less invasive and less risky to compile and install a single module than it is to replace the entire kernel. Developing out-of-tree will help users track down bugs faster because you'll be able to iterate faster and because testing multiple bcachefs versions in the same kernel eliminates the possibility of other kernel changes clouding the tests. And it seems to me to be in the other kernel developers' best interests. They need to be able to do their work and I suspect the constant drama and distraction you bring could make that harder. You've already damaged your own reputation considerably but your continued drama is also damaging _their_ reputations (and the kernel's) and that's not fair. I don't know what arrangement you have with your corporate sponsor but if they have incentivized you to have your development happen in the mainline kernel tree then I would ask that you not put their interests above everyone else's. Carl Thompson PS: There is a typo in the linux-kernel mailing list email address in this chain. Not fixing it as I don't think there's anything in this discussion that is of value to a larger audience. 
> On 2025-07-02 9:34 AM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote: > > >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: > > > > I wasn't sure if I wanted to chime in here, or even if it would be > > worth it. But whatever. > > > > > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: > > >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > >> > > > >> > per the maintainer thread discussion and precedent in xfs and btrfs > > >> > for repair code in RCs, journal_rewind is again included > > >> > > >> I have pulled this, but also as per that discussion, I think we'll be > > >> parting ways in the 6.17 merge window. > > >> > > >> You made it very clear that I can't even question any bug-fixes and I > > >> should just pull anything and everything. > > > > > Linus, I'm not trying to say you can't have any say in bcachefs. Not at > > > all. > > > > > I positively enjoy working with you - when you're not being a dick, > > > but you can be genuinely impossible sometimes. A lot of times... > > > > Kent, you can be a dick too. Prime example, the lines above. And > > how you've treated me and others who gave feedback on bcachefs in the > > past. I'm not a programmer, I'm in IT and follow this because it's > > interesting, and I've been doing data management all my career. So > > new filesystems are interesting. > > Oh yes, I can be. I apologize if I've been a dick to you personally, I > try to be nice to my users and build good working relationships. But > kernel development is a high stakes, high pressure, stressful job, as I > often remind people. I don't ever take it personally, although sometimes > we do need to cool off before we drive each other completely mad :) > > If there was something that was unresolved, and you'd like me to look at > it again, I'd be more than happy to. 
If you want to share what you were > hitting here, I'll tell you what I know - and if it was from a year or > more ago it's most likely been fixed. > > > Slow down. > > This is the most critical phase in the 10+ year process of shipping a > new filesystem. > > We're seeing continually increasing usage (hopefully by users who are > prepared to accept that risk, but not always!), but we're not yet ready > for true widespread deployment. > > Shipping a project as large and complex as a filesystem must be done > incrementally, in stages where we're deploying to gradually increasing > numbers of users, fixing everything they find and assessing where we're > at before opening it up to more users. > > Working with users, supporting with them, checking in on how it's doing, > and getting them the fixes for what they find is how we iterate and > improve. The job is not done until it's working well for everyone. > > Right now, everyone is concerned because this is a hotly anticipated > project, and everyone wants to see it done right. > > And in 6.16, we had two massive pull requests (30+ patches in a week, > twice in a row); that also generates concern when people are wondering > "is this thing stabilizing?". > > 6.16 was largely a case of a few particularly interesting bug reports > generating a bunch of fixes (and relatively simple and localized fixes, > which is what we like to see) for repair corner cases, the biggest > culprit (again) being snapshots. > > If you look at the bug tracker, especially rate of incoming bugs and the > severity of bug reports (and also other sources of bug reports, like > reddit and IRC) - yes, we are stabilizing fast. > > There is still a lot of work to be done, but we're on the right track. > > "Slowing down" is not something you do without a concrete reason. Right > now we need to be getting those fixes out to users so they can keep > testing and finding the next bug. 
When someone has invested time and > effort learning how the system works and how to report bugs, we don't > want them getting frustrated and leaving - we want to work with them, so > they can keep testing and finding new bugs. > > The signals that would tell me it's time to slow down are: > > - Regressions getting through (quantity, severity, time spent on fixing > them) > - Bugs getting through that show that something fundamental is > missing (testing, hardening), or broken in our design. > - Frequency of bug reports going up to where I can't keep up (it's been > in steady, gradual decline) > > We actually do not want this to be 100% perfect before it sees users. > That would result in a filesystem that's brittle - a glass cannon. We > might get it to the point where it works 99% of the time, but then when > it breaks we'd be in a panic - and if you discover it then, when it's in > the wild, it's too late. > > The processes for how we debug and recover from failures, in the wild, > are a huge part (perhaps the majority) of what we're working on now. That > stuff has to be baked into the design on a deep level, and like all > other complex design it requires continual iteration. > > That is how we'll get the reliability and robustness we hope to achieve. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-02 17:41 ` Carl E. Thompson @ 2025-07-02 17:53 ` Kent Overstreet 2025-07-02 18:49 ` Malte Schröder 0 siblings, 1 reply; 19+ messages in thread From: Kent Overstreet @ 2025-07-02 17:53 UTC (permalink / raw) To: Carl E. Thompson Cc: John Stoffel, Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kernel On Wed, Jul 02, 2025 at 10:41:34AM -0700, Carl E. Thompson wrote: > Kent, at this point in bcachefs' development you want complete control > over your development processes and timetable that you simply can't > get in the mainline kernel. It's in your own best interest for you to > develop out-of-tree for now. Carl, all I'm doing is stating up front what it's going to take to get this done right. I'm not particularly pushing one way or the other for bcachefs to stay in; there are pros and cons either way. It'll be disruptive for it to be out, but if the alternative is disrupting process too much and driving Linus and me completely nuts, that's ok. Everyone please be patient. This is a 10+ year process, no one thing is make or break. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-02 17:53 ` Kent Overstreet @ 2025-07-02 18:49 ` Malte Schröder 0 siblings, 0 replies; 19+ messages in thread From: Malte Schröder @ 2025-07-02 18:49 UTC (permalink / raw) To: Kent Overstreet, Carl E. Thompson Cc: John Stoffel, Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kernel On 02.07.25 19:53, Kent Overstreet wrote: > On Wed, Jul 02, 2025 at 10:41:34AM -0700, Carl E. Thompson wrote: >> Kent, at this point in bcachefs' development you want complete control >> over your development processes and timetable that you simply can't >> get in the mainline kernel. It's in your own best interest for you to >> develop out-of-tree for now. > Carl, all I'm doing is stating up front what it's going to take to get > this done right. > > I'm not particularly pushing one way or the other for bcachefs to stay > in; there are pros and cons either way. It'll be disruptive for it to be > out, but if the alternative is disrupting process too much and driving > Linus and me completely nuts, that's ok. > > Everyone please be patient. This is a 10+ year process, no one thing is > make or break. > So as a user usually hanging out on IRC and running Kent's trees: I think most of those people actually testing bcachefs are either running bcachefs-master, -rc or some outdated distro kernel. From my perspective I'd think it would be good enough to push for-upstream during the merge window and then only provide further patches if there were regressions or some really bad bug appears that actually eats data (like the one that bit me). If it's "just" stability fixes, well, if people running a distro kernel hit those bugs they'll need to build a -rc kernel anyways to get fixes, those could just build bcachefs-master. When running Linus' tree I am aware and accept that I am not running the absolutely latest code. 
I've had some pretty bad experience with amdgpu requiring out of tree patches to get my system running free of glitches, which took months to get into upstream. It's annoying, but I accept that. I'd rather have a slightly outdated bcachefs in kernel than not at all. It is good to have a distro kernel I can fall back to if I mess up my own kernel building ;) my 0.02€ /Malte ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-02 16:34 ` Kent Overstreet 2025-07-02 17:41 ` Carl E. Thompson @ 2025-07-07 20:03 ` John Stoffel 2025-07-07 20:39 ` Kent Overstreet 2025-07-07 21:32 ` Carl E. Thompson 1 sibling, 2 replies; 19+ messages in thread From: John Stoffel @ 2025-07-07 20:03 UTC (permalink / raw) To: Kent Overstreet Cc: John Stoffel, Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: > On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote: >> >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: >> >> I wasn't sure if I wanted to chime in here, or even if it would be >> worth it. But whatever. >> >> > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: >> >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: >> >> > >> >> > per the maintainer thread discussion and precedent in xfs and btrfs >> >> > for repair code in RCs, journal_rewind is again included >> >> >> >> I have pulled this, but also as per that discussion, I think we'll be >> >> parting ways in the 6.17 merge window. >> >> >> >> You made it very clear that I can't even question any bug-fixes and I >> >> should just pull anything and everything. >> >> > Linus, I'm not trying to say you can't have any say in bcachefs. Not at >> > all. >> >> > I positively enjoy working with you - when you're not being a dick, >> > but you can be genuinely impossible sometimes. A lot of times... >> >> Kent, you can be a dick too. Prime example, the lines above. And >> how you've treated me and others who gave feedback on bcachefs in the >> past. I'm not a programmer, I'm in IT and follow this because it's >> interesting, and I've been doing data management all my career. So >> new filesystems are interesting. > Oh yes, I can be. I apologize if I've been a dick to you personally, I > try to be nice to my users and build good working relationships. 
But > kernel development is a high stakes, high pressure, stressful job, as I > often remind people. I don't ever take it personally, although sometimes > we do need to cool off before we drive each other completely mad :) I appreciate this, but honestly I'll withhold judgement until I see how it goes more long term. But I'm also NOT a kernel developer, I'm an IT professional who does storage and backups and managing data. So my perspective is very definitely one of your users, or users-to-be. But I've also got a CS degree and understand programming issues and such. > If there was something that was unresolved, and you'd like me to > look at it again, I'd be more than happy to. If you want to share > what you were hitting here, I'll tell you what I know - and if it > was from a year or more ago it's most likely been fixed. Nope, it was over a year ago and it's behind me. I was trying to build the tools on Debian distro when the bcachefs-tools were a real pain to build. It's better now. >> Slow down. > This is the most critical phase in the 10+ year process of shipping a > new filesystem. Sure, but that's not what I'm trying to say here. The kernel has, as you most certainly know, a standard process for quickly deploying new versions. Linus's entire problem is that you dropped in a big chunk of code into the late release process. And none of that is critical, because if you have people running 100tb of bcachefs right now, they certainly understand that they can lose data at any time. Or at least they should if they have any sort of understanding of reliable data. bcachefs isn't there yet. It's getting close, but Linux has an amazingly complicated VFS and supports all kinds of weird edge cases. Which sucks from the filesystem perspective. But you know this. So when you run into a major bug in the code, or potential data loss when -rc2 or later is coming out, just revert. Pull that code out because it's obviously not ready. So you wait a few months, big deal! 
It gives you and the code time to stabilize. If someone is losing data and you want to give them a patch to try and fix it, great, but they can take a patch from you directly. And post it to your mailing list. Put it on a git branch somewhere. But revert from the main Linus tree. For now. In two months, you'll be back with better code. bcachefs is still listed as experimental, so don't feel like you have to keep pushing the absolutely latest code into the kernel. Just slow it down a little to make sure you push good code. > We're seeing continually increasing usage (hopefully by users who are > prepared to accept that risk, but not always!), but we're not yet ready > for true widespread deployment. If those users are not prepared to accept the risk of an experimental filesystem, then screw them! They're idiots and should be treated as such. I would expect to be fired from my job if I bet my company's data on bcachefs currently. Sure, play around and test it if you like, but if it breaks, you get to keep both pieces. Same with bleeding edge kernel development! I might run pretty bleeding edge kernels at home, but only for my own data that I realize I might lose. But I also do backups, have the data on XFS and ext4 filesystems, which are stable, and I'm not trying to do crazy things with it. Do I have some test bcachefs volumes? Sure do. And I treat them like lepers, if they break, I either toss them away, or I file a report, but I certainly don't keep ANY data on there I don't want to lose. I'm being blunt here. > Shipping a project as large and complex as a filesystem must be done > incrementally, in stages where we're deploying to gradually increasing > numbers of users, fixing everything they find and assessing where we're > at before opening it up to more users. Yes! But that process also has to include rollbacks, which git has made so so so easy.
Just accept that _if_ 6.x-rc[12345] is buggy, then it needs to be rolled back and submitted to 6.x+1-rc1 for the next cycle after it's been baked. Anyone running such a bleeding edge kernel and finding problems isn't going to care about having to hand apply patches, they're already doing crazy things! *grin* > Working with users, supporting them, checking in on how it's doing, > and getting them the fixes for what they find is how we iterate and > improve. The job is not done until it's working well for everyone. Yes, I agree 100% with all this. > Right now, everyone is concerned because this is a hotly anticipated > project, and everyone wants to see it done right. So which is more important? Ship super fast and break things? Or be willing to revert and ship just a bit slower? > And in 6.16, we had two massive pull requests (30+ patches in a > week, twice in a row); that also generates concern when people are > wondering "is this thing stabilizing?". Correct! > 6.16 was largely a case of a few particularly interesting bug > reports generating a bunch of fixes (and relatively simple and > localized fixes, which is what we like to see) for repair corner > cases, the biggest culprit (again) being snapshots. Sure, fixes are great. But why did you have to drop them into -rc2 in a big bundle? Why not just roll back what you had submitted and say "it's not baked enough, it needs to wait a release"? > If you look at the bug tracker, especially rate of incoming bugs and the > severity of bug reports (and also other sources of bug reports, like > reddit and IRC) - yes, we are stabilizing fast. Sure, and I'm happy for this. And so are a bunch of other people! > There is still a lot of work to be done, but we're on the right track. No argument there. > "Slowing down" is not something you do without a concrete > reason. And this is where you and Linus are butting heads in my opinion. You want to release big patches at any time.
Linus wants to stabilize releases and development for the entire kernel. You're concentrating on your small area which is vitally important to you. But not everyone is as invested. Others want the latest DRM drivers. Or the latest i2c code, or some other subsystem which they care about. Linus (and the process) is about the entire kernel. > Right now we need to be getting those fixes out to users so > they can keep testing and finding the next bug. When someone has > invested time and effort learning how the system works and how to > report bugs, we don't want them getting frustrated and leaving - we > want to work with them, so they can keep testing and finding new > bugs. So post patches on your own tree that they can use, nothing stops you! > The signals that would tell me it's time to slow down are: > - Regressions getting through (quantity, severity, time spent on fixing > them) > - Bugs getting through that show that something fundamental is > missing (testing, hardening), or broken in our design. > - Frequency of bug reports going up to where I can't keep up (it's been > in steady, gradual decline) > We actually do not want this to be 100% perfect before it sees users. > That would result in a filesystem that's brittle - a glass cannon. We > might get it to the point where it works 99% of the time, but then when > it breaks we'd be in a panic - and if you discover it then, when it's in > the wild, it's too late. > The processes for how we debug and recover from failures, in the wild, > is a huge part (perhaps the majority) of what we're working on now. That > stuff has to be baked into the design on a deep level, and like all > other complex design it requires continual iteration. > That is how we'll get the reliability and robustness we hope to achieve. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-07 20:03 ` John Stoffel @ 2025-07-07 20:39 ` Kent Overstreet 2025-07-07 21:32 ` Carl E. Thompson 1 sibling, 0 replies; 19+ messages in thread From: Kent Overstreet @ 2025-07-07 20:39 UTC (permalink / raw) To: John Stoffel; +Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl On Mon, Jul 07, 2025 at 04:03:27PM -0400, John Stoffel wrote: > If those users are not prepared to accept the risk of an experimental > filesystem, then screw them! They're idiots and should be treated as > such. No, that's not the attitude of a responsible filesystem developer. It doesn't matter what stage of development the code is in, if it's got users and they're hitting bugs, those bugs take priority. "Screw the user" is no way to develop a filesystem that people will actually want to run. If we want to attain rock solid, bulletproof reliability, then reliability, top to bottom, has to be the priority at every stage of development. The job is not done until it is working well and verified for everyone. > > Shipping a project as large and complex as a filesystem must be done > > incrementally, in stages where we're deploying to gradually increasing > > numbers of users, fixing everything they find and assessing where we're > > at before opening it up to more users. > > Yes! But that process also has to include rollbacks, which git has > made so so so easy. Just accept that _if_ 6.x-rc[12345] is buggy, > then it needs to be rolled back and subbmitted to 6.x+1-rc1 for the > next cycle after it's been baked. I'm very quick to kick out buggy patches; simple regressions where a revert is the correct solution almost never hit Linus's tree. This isn't the issue we're talking about. > > Right now we need to be getting those fixes out to users so > > they can keep testing and finding the next bug. 
When someone has > > invested time and effort learning how the system works and how to > > report bugs, we don't want them getting frustrated and leaving - we > > want to work with them, so they can keep testing and finding new > > bugs. > > So post patches on your own tree that they can use, nothing stops you! That is indeed what it comes down to, isn't it? I support my code. I triage every bug report, prioritizing the critical bugs, and I spend most of my day working with users to track bugs down and make sure the fixes work. That's my job. If fixes aren't going into Linus's tree, that means the working, supported bcachefs tree is no longer his tree, it's mine. We've been down that road with other subsystems in the past (e.g. Lustre), and it doesn't work. Going down that road means the first thing I have to do with every bug report is ask "which tree are you running? stock mainline or mine?" - and then we might as well go all the way and go the DKMS route, for the sake of everyone's sanity.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-07-07 20:03 ` John Stoffel 2025-07-07 20:39 ` Kent Overstreet @ 2025-07-07 21:32 ` Carl E. Thompson 1 sibling, 0 replies; 19+ messages in thread From: Carl E. Thompson @ 2025-07-07 21:32 UTC (permalink / raw) To: John Stoffel, Kent Overstreet Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl Don't bother. You can't reason with him and you can't "fix" him. He'll keeping sucking up as much of our collective time and energy as we allow so it's time to move on and stop feeding him. Carl > On 2025-07-07 1:03 PM PDT John Stoffel <john@stoffel.org> wrote: > > > >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: > > > On Tue, Jul 01, 2025 at 10:43:11AM -0400, John Stoffel wrote: > >> >>>>> "Kent" == Kent Overstreet <kent.overstreet@linux.dev> writes: > >> > >> I wasn't sure if I wanted to chime in here, or even if it would be > >> worth it. But whatever. > >> > >> > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: > >> >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: > >> >> > > >> >> > per the maintainer thread discussion and precedent in xfs and btrfs > >> >> > for repair code in RCs, journal_rewind is again included > >> >> > >> >> I have pulled this, but also as per that discussion, I think we'll be > >> >> parting ways in the 6.17 merge window. > >> >> > >> >> You made it very clear that I can't even question any bug-fixes and I > >> >> should just pull anything and everything. > >> > >> > Linus, I'm not trying to say you can't have any say in bcachefs. Not at > >> > all. > >> > >> > I positively enjoy working with you - when you're not being a dick, > >> > but you can be genuinely impossible sometimes. A lot of times... > >> > >> Kent, you can be a dick too. Prime example, the lines above. And > >> how you've treated me and others who gave feedback on bcachefs in the > >> past. 
I'm not a programmer, I'm in IT and follow this because it's > >> interesting, and I've been doing data management all my career. So > >> new filesystems are interesting. > > > Oh yes, I can be. I apologize if I've been a dick to you personally, I > > try to be nice to my users and build good working relationships. But > > kernel development is a high stakes, high pressure, stressful job, as I > > often remind people. I don't ever take it personally, although sometimes > > we do need to cool off before we drive each other completely mad :) > > I appreciate this, but honestly I'll withhold judgement until I see > how it goes more long term. But I'm also NOT a kernel developer, I'm > an IT professional who does storage and backups and managing data. So > my perspective is very definitely one of your users, or users-to-be. > But I've also got a CS degree and understand programming issues and > such. > > > If there was something that was unresolved, and you'd like me to > > look at it again, I'd be more than happy to. If you want to share > > what you were hitting here, I'll tell you what I know - and if it > > was from a year or more ago it's most likely been fixed. > > Nope, it was over a year ago and it's behind me. I was trying to > build the tools on Debian distro when the bcachefs-tools were a real > pain to build. It's better now. > > >> Slow down. > > > This is the most critical phase in the 10+ year process of shipping a > > new filesystem. > > Sure, but that's not what I'm trying to say here. The kernel has, as > you most certainly know, a standard process for quickly deploying new > versions. Linus's entire problem is that you dropped in a big chunk > of code into the late release process. > > And none of that is critical, because if you have people running 100tb > of bcachefs right now, they certainly understand that they can lose > data at any time. Or at least they should if they have any sort of > understanding of reliable data. bcachefs isn't there yet. 
It's > getting close, but Linux has an amazingly complicated VFS and supports > all kinds of wierd edge cases. Which sucks from the filesystem > perspective. > > But you know this. > > So when you run into a major bug in the code, or potential data loss > when -rc2 or later is coming out, just revert. Pull that code out > because it's obviously not ready. So you wait a few months, big deal! > IT gives you and the code time to stabilize. > > If someone is losing data and you want to give them a patch to try and > fix it, great, but they can take a patch from you directly. And post > it to your mailing list. Put it on a git branch somewhere. > > But revery from the main linus tree. For now. In two months, you'll > be back with better code. bcachefs is still listed as experimental, > so don't feel like you have to keep pushing the absolutely latest code > into the kernel. Just slow it down a little to make sure you push > good code. > > > We're seeing continually increasing usage (hopefully by users who are > > prepared to accept that risk, but not always!), but we're not yet ready > > for true widespread deployment. > > If those users are not prepared to accept the risk of an experimental > filesystem, then screw them! They're idiots and should be treated as > such. > > I would expect to be fired from my job if I bet my company's data on > bcachefs currently. Sure, play around and test it if you like, but if > it breaks, you get to keep both pieces. > > Same with bleeding edge kernel developement! I might run pretty > bleeding edge kernels at home, but only for my own data that I realize > I might lose. But I also do backups, have the data on XFS and ext4 > filesystems, which are stable, and I'm not trying to do crazy things > with it. > > Do I have some test bcachefs volumes? Sure do. And I treat them like > lepers, if they break, I either toss them away, or I file a report, > but I certainly don't keep ANY data on there I don't want to lose. 
> > I'm being blunt here. > > > Shipping a project as large and complex as a filesystem must be done > > incrementally, in stages where we're deploying to gradually increasing > > numbers of users, fixing everything they find and assessing where we're > > at before opening it up to more users. > > Yes! But that process also has to include rollbacks, which git has > made so so so easy. Just accept that _if_ 6.x-rc[12345] is buggy, > then it needs to be rolled back and subbmitted to 6.x+1-rc1 for the > next cycle after it's been baked. > > Anyone running such a bleeding edge kernel and finding problems isn't > going to care about having to hand apply patches, they're already > doing crazy things! *grin* > > > Working with users, supporting with them, checking in on how it's doing, > > and getting them the fixes for what they find is how we iterate and > > improve. The job is not done until it's working well for everyone. > > Yes, I agree 100% with all this. > > > Right now, everyone is concerned because this is a hotly anticipated > > project, and everyone wants to see it done right. > > So which is more important? Ship super fast and break things? Or be > willing to revert and ship just a bit slower? > > > And in 6.16, we had two massive pull requests (30+ patches in a > > week, twice in a row); that also generates concern when people are > > wondering "is this thing stabilizing?". > > Correct! > > > 6.16 was largely a case of a few particularly interesting bug > > reports generating a bunch of fixes (and relatively simple and > > localized fixes, which is what we like to see) for repair corner > > cases, the biggest culprit (again) being snapshots. > > Sure, fixes are great. But why did you have to drop them into -rc2 in > a big bundle? Why not just roll back what you had submitted and say > "it's not baked enough, it needs to wait a release"? 
> > > If you look at the bug tracker, especially rate of incoming bugs and the > > severity of bug reports (and also other sources of bug reports, like > > reddit and IRC) - yes, we are stabilizing fast. > > Sure, and I'm happy for this. And so are a bunch of other people! > > > There is still a lot of work to be done, but we're on the right track. > > No arguement there. > > > "Slowing down" is not something you do without a concrete > > reason. > > And this is where you and Linus are butting heads in my opinion. You > want to release big patches at any time. Linus wants to stabilize > releases and development for the entire kernel. You're concentrating > on your small area which is vitally important to you. But not > everyone is as invested. Others want the latest DRM drivers. Or the > latest i2c code, or some other subsystem which they care about. Linus > (and the process) is about the entire kernel. > > > Right now we need to be getting those fixes out to users so > > they can keep testing and finding the next bug. When someone has > > invested time and effort learning how the system works and how to > > report bugs, we don't watn them getting frustrated and leaving - we > > want to work with them, so they can keep testing and finding new > > bugs. > > So post patches on your own tree that they can use, nothing stops you! > > > The signals that would tell me it's time to slow down are: > > > - Regressions getting through (quantity, severity, time spent on fixing > > them) > > - Bugs getting through that show that show that something fundamental is > > missing (testing, hardening), or broken in our our design. > > - Frequency of bug reports going up to where I can't keep up (it's been > > in steady, gradual decline) > > > We actually do not want this to be 100% perfect before it sees users. > > That would result in a filesystem that's brittle - a glass cannon. 
We > > might get it to the point where it works 99% of the time, but then when > > it breaks we'd be in a panic - and if you discover it then, when it's in > > the wild, it's too late. > > > The processes for how we debug and recover from failures, in the wild, > > is a huge part (perhaps the majority) of what we're working on now. That > > stuff has to be baked into the design on a deep level, and like all > > other complex design it requires continual iteration. > > > That is how we'll get the reliability and robustness we hope to achieve. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-06-27 3:34 ` Kent Overstreet 2025-07-01 14:43 ` John Stoffel @ 2025-07-04 7:02 ` Hillf Danton 1 sibling, 0 replies; 19+ messages in thread From: Hillf Danton @ 2025-07-04 7:02 UTC (permalink / raw) To: Kent Overstreet Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl On Thu, 26 Jun 2025 23:34:11 -0400 Kent Overstreet wrote: > On Thu, Jun 26, 2025 at 08:21:23PM -0700, Linus Torvalds wrote: > > On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > per the maintainer thread discussion and precedent in xfs and btrfs > > > for repair code in RCs, journal_rewind is again included > > > > I have pulled this, but also as per that discussion, I think we'll be > > parting ways in the 6.17 merge window. > > > > You made it very clear that I can't even question any bug-fixes and I > > should just pull anything and everything. > > Linus, I'm not trying to say you can't have any say in bcachefs. Not at > all. > > I positively enjoy working with you - when you're not being a dick, but > you can be genuinely impossible sometimes. A lot of times... > Now I see why your nostrils are so up, dude, because like Linux dancing out of the Unix box, you are not so tame. Is it wasting minutes for Elon to think he is richer than Zucker? BTW I have some difficulty replying offline mails. Hillf Danton ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-06-27 3:21 ` Linus Torvalds 2025-06-27 3:34 ` Kent Overstreet @ 2025-06-27 19:07 ` Kyle Sanderson 2025-06-27 19:16 ` Kyle Sanderson 2025-06-28 8:06 ` Gerhard Wiesinger 2 siblings, 1 reply; 19+ messages in thread From: Kyle Sanderson @ 2025-06-27 19:07 UTC (permalink / raw) To: Linus Torvalds Cc: linux-bcachefs, linux-fsdevel, linux-kerenl, Kent Overstreet On 6/26/2025 8:21 PM, Linus Torvalds wrote: > On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote: >> >> per the maintainer thread discussion and precedent in xfs and btrfs >> for repair code in RCs, journal_rewind is again included > > I have pulled this, but also as per that discussion, I think we'll be > parting ways in the 6.17 merge window. > > You made it very clear that I can't even question any bug-fixes and I > should just pull anything and everything. > > Honestly, at that point, I don't really feel comfortable being > involved at all, and the only thing we both seemed to really > fundamentally agree on in that discussion was "we're done". > > Linus Linus, The pushback on rewind makes sense, it wasn’t fully integrated and was fsck code written to fix the problems with the retail 6.15 release - this looks like it slipped through Kents CI and there were indeed multiple people hit by it (myself included). Quoting someone back to themselves is not cool, however I believe it highlights what has gone on here which is why I am breaking my own rule: "One of the things I liked about the Rust side of the kernel was that there was one maintainer who was clearly much younger than most of the maintainers and that was the Rust maintainer. We can clearly see that certain areas in the kernel bring in more young people. 
At the Maintainer Summit, we had this clear division between the filesystem people, who were very careful and very staid, and cared deeply about their code being 100% correct - because if you have a bug in a filesystem, the data on your disk may be gone - so these people take themselves and their code very seriously. And then you have the driver people who are a bit more 'okay', especially the GPU folks, 'where anything goes'. You notice that on the driver side it’s much easier to find young people, and that is traditionally how we’ve grown a lot of maintainers. " (1) Kent is moving like the older days of rapid development - fast and driven - and this style clashes with the mature stable filesystem culture that demands extreme caution today. Almost every single patch has been in response to reported issues, the primary issue here is that’s on IRC where his younger users are (not so young, anymore - it is not tiktok), and not on lkml. The pace of development has kept up, and the "new feature" part of it like changing out the entire hash table in rc6 seems to have stopped. This is still experimental, and he's moving that way now with care and continuing to improve his testing coverage with each bug. Kent has deep technical experience here, much earlier in the interview(1) regarding the 6.7 merge window this filesystem has been in the works for a decade. Maintainership means adapting to kernel process as much as code quality, that may be closer to the issue here. If direct pulls aren’t working, maybe a co-maintainer or routing changes through a senior fs maintainer can help. If you're open to it, maybe that is even you. Dropping bcachefs now would be a monumental step backward from the filesystems we have today. Enterprises simply do not use them for true storage at scale which is why vendors have largely taken over this space. The question is how to balance rigor with supporting new maintainers in the ecosystem. 
Everything Kent has written around supporting users is true, and publicly visible, if only to the 260 users on irc, and however many more are on matrix. There are plenty more that are offline, and while this is experimental there are a number of public sector agencies testing this now (I have seen reference to a number of emergency service providers, which isn’t great, but for whatever reason they are doing that). (1) https://youtu.be/OvuEYtkOH88?t=1044 Kyle.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4 2025-06-27 19:07 ` Kyle Sanderson @ 2025-06-27 19:16 ` Kyle Sanderson 2025-06-27 19:42 ` Kent Overstreet 0 siblings, 1 reply; 19+ messages in thread From: Kyle Sanderson @ 2025-06-27 19:16 UTC (permalink / raw) To: linux-kernel Cc: linux-bcachefs, linux-fsdevel, Kent Overstreet, Linus Torvalds On 6/27/2025 12:07 PM, Kyle Sanderson wrote: > On 6/26/2025 8:21 PM, Linus Torvalds wrote: >> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet >> <kent.overstreet@linux.dev> wrote: >>> >>> per the maintainer thread discussion and precedent in xfs and btrfs >>> for repair code in RCs, journal_rewind is again included >> >> I have pulled this, but also as per that discussion, I think we'll be >> parting ways in the 6.17 merge window. >> >> You made it very clear that I can't even question any bug-fixes and I >> should just pull anything and everything. >> >> Honestly, at that point, I don't really feel comfortable being >> involved at all, and the only thing we both seemed to really >> fundamentally agree on in that discussion was "we're done". >> >> Linus > > Linus, > > The pushback on rewind makes sense, it wasn’t fully integrated and was > fsck code written to fix the problems with the retail 6.15 release - > this looks like it slipped through Kents CI and there were indeed > multiple people hit by it (myself included). > > Quoting someone back to themselves is not cool, however I believe it > highlights what has gone on here which is why I am breaking my own rule: > > "One of the things I liked about the Rust side of the kernel was that > there was one maintainer who was clearly much younger than most of the > maintainers and that was the Rust maintainer. > > We can clearly see that certain areas in the kernel bring in more young > people. 
> > At the Maintainer Summit, we had this clear division between the > filesystem people, who were very careful and very staid, and cared > deeply about their code being 100% correct - because if you have a bug > in a filesystem, the data on your disk may be gone - so these people > take themselves and their code very seriously. > > And then you have the driver people who are a bit more 'okay', > especially the GPU folks, 'where anything goes'. > You notice that on the driver side it’s much easier to find young > people, and that is traditionally how we’ve grown a lot of maintainers. > " (1) > > Kent is moving like the older days of rapid development - fast and > driven - and this style clashes with the mature stable filesystem > culture that demands extreme caution today. Almost every single patch > has been in response to reported issues, the primary issue here is > that’s on IRC where his younger users are (not so young, anymore - it is > not tiktok), and not on lkml. The pace of development has kept up, and > the "new feature" part of it like changing out the entire hash table in > rc6 seems to have stopped. This is still experimental, and he's moving > that way now with care and continuing to improve his testing coverage > with each bug. > > Kent has deep technical experience here, much earlier in the > interview(1) regarding the 6.7 merge window this filesystem has been in > the works for a decade. Maintainership means adapting to kernel process > as much as code quality, that may be closer to the issue here. > > If direct pulls aren’t working, maybe a co-maintainer or routing changes > through a senior fs maintainer can help. If you're open to it, maybe > that is even you. > > Dropping bcachefs now would be a monumental step backward from the > filesystems we have today. Enterprises simply do not use them for true > storage at scale which is why vendors have largely taken over this > space. 
The question is how to balance rigor with supporting new > maintainers in the ecosystem. Everything Kent has written around > supporting users is true, and publicly visible, if only to the 260 users > on irc, and however many more are on matrix. There are plenty more that > are offline, and while this is experimental there are a number of public > sector agencies testing this now (I have seen reference to a number of > emergency service providers, which isn’t great, but for whatever reason > they are doing that). > > (1) https://youtu.be/OvuEYtkOH88?t=1044 > > Kyle. Re-sending as this thread seems to have typo'd lkml (removing the bad entry). Kyle.
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
  2025-06-27 19:16 ` Kyle Sanderson
@ 2025-06-27 19:42 ` Kent Overstreet
  0 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2025-06-27 19:42 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: linux-kernel, linux-bcachefs, linux-fsdevel, Linus Torvalds

On Fri, Jun 27, 2025 at 12:16:09PM -0700, Kyle Sanderson wrote:
> On 6/27/2025 12:07 PM, Kyle Sanderson wrote:
> > On 6/26/2025 8:21 PM, Linus Torvalds wrote:
> > > On Thu, 26 Jun 2025 at 19:23, Kent Overstreet
> > > <kent.overstreet@linux.dev> wrote:
> > > >
> > > > per the maintainer thread discussion and precedent in xfs and btrfs
> > > > for repair code in RCs, journal_rewind is again included
> > >
> > > I have pulled this, but also as per that discussion, I think we'll be
> > > parting ways in the 6.17 merge window.
> > >
> > > You made it very clear that I can't even question any bug-fixes and I
> > > should just pull anything and everything.
> > >
> > > Honestly, at that point, I don't really feel comfortable being
> > > involved at all, and the only thing we both seemed to really
> > > fundamentally agree on in that discussion was "we're done".
> > >
> > >              Linus
> >
> > Linus,
> >
> > The pushback on rewind makes sense, it wasn’t fully integrated and was
> > fsck code written to fix the problems with the retail 6.15 release -
> > this looks like it slipped through Kents CI and there were indeed
> > multiple people hit by it (myself included).
> >
> > Quoting someone back to themselves is not cool, however I believe it
> > highlights what has gone on here which is why I am breaking my own rule:
> >
> > "One of the things I liked about the Rust side of the kernel was that
> > there was one maintainer who was clearly much younger than most of the
> > maintainers and that was the Rust maintainer.
> >
> > We can clearly see that certain areas in the kernel bring in more young
> > people.
> >
> > At the Maintainer Summit, we had this clear division between the
> > filesystem people, who were very careful and very staid, and cared
> > deeply about their code being 100% correct - because if you have a bug
> > in a filesystem, the data on your disk may be gone - so these people
> > take themselves and their code very seriously.
> >
> > And then you have the driver people who are a bit more 'okay',
> > especially the GPU folks, 'where anything goes'.
> > You notice that on the driver side it’s much easier to find young
> > people, and that is traditionally how we’ve grown a lot of maintainers.
> > " (1)
> >
> > Kent is moving like the older days of rapid development - fast and
> > driven - and this style clashes with the mature stable filesystem
> > culture that demands extreme caution today. Almost every single patch
> > has been in response to reported issues, the primary issue here is
> > that’s on IRC where his younger users are (not so young, anymore - it is
> > not tiktok), and not on lkml. The pace of development has kept up, and
> > the "new feature" part of it like changing out the entire hash table in
> > rc6 seems to have stopped. This is still experimental, and he's moving
> > that way now with care and continuing to improve his testing coverage
> > with each bug.
> >
> > Kent has deep technical experience here, much earlier in the
> > interview(1) regarding the 6.7 merge window this filesystem has been in
> > the works for a decade. Maintainership means adapting to kernel process
> > as much as code quality, that may be closer to the issue here.
> >
> > If direct pulls aren’t working, maybe a co-maintainer or routing changes
> > through a senior fs maintainer can help. If you're open to it, maybe
> > that is even you.
> >
> > Dropping bcachefs now would be a monumental step backward from the
> > filesystems we have today. Enterprises simply do not use them for true
> > storage at scale which is why vendors have largely taken over this
> > space. The question is how to balance rigor with supporting new
> > maintainers in the ecosystem. Everything Kent has written around
> > supporting users is true, and publicly visible, if only to the 260 users
> > on irc, and however many more are on matrix. There are plenty more that
> > are offline, and while this is experimental there are a number of public
> > sector agencies testing this now (I have seen reference to a number of
> > emergency service providers, which isn’t great, but for whatever reason
> > they are doing that).
> >
> > (1) https://youtu.be/OvuEYtkOH88?t=1044
> >
> > Kyle.
>
> Re-sending as this thread seems to have typo'd lkml (removing the bad
> entry).

Thanks.

Also, I think I should add, in case my words in the private conversation
were misinterpreted: I don't think bcachefs should be dropped from the
kernel, I think it would be better for this to be worked out.

I firstly want to reassure people that: if bcachefs has to be shipped as
a DKMS module, that will not kill the project. It will be a giant hassle
(especially if distributions have to scramble), but life will continue.
I remain committed as ever to getting this done - one way or the other.

And I think it is safe to say that going that route would be the better
option for the sanity of myself and Linus, but it wouldn't be the better
option for the users or the rest of the development community.

With that, I am going to take a breather.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
  2025-06-27  3:21 ` Linus Torvalds
  2025-06-27  3:34 ` Kent Overstreet
  2025-06-27 19:07 ` Kyle Sanderson
@ 2025-06-28  8:06 ` Gerhard Wiesinger
  2 siblings, 0 replies; 19+ messages in thread
From: Gerhard Wiesinger @ 2025-06-28 8:06 UTC (permalink / raw)
  To: Linus Torvalds, Kent Overstreet
  Cc: linux-bcachefs, linux-fsdevel, linux-kerenl

On 27.06.2025 05:21, Linus Torvalds wrote:
> On Thu, 26 Jun 2025 at 19:23, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>> per the maintainer thread discussion and precedent in xfs and btrfs
>> for repair code in RCs, journal_rewind is again included
> I have pulled this, but also as per that discussion, I think we'll be
> parting ways in the 6.17 merge window.
>
> You made it very clear that I can't even question any bug-fixes and I
> should just pull anything and everything.
>
> Honestly, at that point, I don't really feel comfortable being
> involved at all, and the only thing we both seemed to really
> fundamentally agree on in that discussion was "we're done".
>

Hello Linus,

Do you think the "hard rules" for "no features" in the "fixing merge
window" also apply to modules in the Linux kernel which are marked as
experimental (as long as no other code outside of the module itself is
changed)?

I fully understand your points for non-experimental code, but maybe a
solution is to have different rules for code marked as experimental.
Every user who uses experimental features should be aware that
potentially unstable code is used.

Maybe you can think about it.

Thnx.

Ciao,
Gerhard

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
  2025-06-27  2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
  2025-06-27  3:21 ` Linus Torvalds
@ 2025-06-27  3:33 ` pr-tracker-bot
  2025-06-27 14:46 ` Josef Bacik
  2 siblings, 0 replies; 19+ messages in thread
From: pr-tracker-bot @ 2025-06-27 3:33 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl

The pull request you sent on Thu, 26 Jun 2025 22:22:52 -0400:

> git://evilpiepirate.org/bcachefs.git tags/bcachefs-2025-06-26

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6f2a71a99ebd5dfaa7948a2e9c59eae94b741bd8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
  2025-06-27  2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
  2025-06-27  3:21 ` Linus Torvalds
  2025-06-27  3:33 ` pr-tracker-bot
@ 2025-06-27 14:46 ` Josef Bacik
  2025-06-28  1:59 ` Theodore Ts'o
  2 siblings, 1 reply; 19+ messages in thread
From: Josef Bacik @ 2025-06-27 14:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-bcachefs, linux-fsdevel, linux-kerenl

On Thu, Jun 26, 2025 at 10:22:52PM -0400, Kent Overstreet wrote:
> per the maintainer thread discussion and precedent in xfs and btrfs
> for repair code in RCs, journal_rewind is again included
>

I'm replying to set the record straight. This is not the start of the
discussion. I am not going to let false statements stand by unchallenged
however.

Sterba has never sent large pull requests in RCs, certainly not with
features in them. Even when Chris was the maintainer and we were a
little faster and looser and were pushing the envelope to see what Linus
would accept we didn't ship anything near this volume of patches past
rc1.

And the numbers don't lie.

josef@fedora:~/linux$ git tag --contains 1c6fdbd8f2465ddfb73a01ec620cbf3d14044e1a | grep -v rc > bcachefs-tags
josef@fedora:~/linux$ git tag --contains be0e5c097fc206b863ce9fe6b3cfd6974b0110f4 | grep -v rc > tags
josef@fedora:~/linux$ for i in $(cat tags); do git log --no-merges --oneline $i-rc2..$i fs/btrfs | wc -l; done > btrfs-counts.txt
josef@fedora:~/linux$ for i in $(cat bcachefs-tags); do git log --no-merges --oneline $i-rc2..$i fs/bcachefs | wc -l; done > bcachefs-counts.txt
josef@fedora:~/linux$ R -q -e "x <- read.csv('btrfs-counts.txt', header = F); summary(x); sd(x[ , 1 ])"
> x <- read.csv('btrfs-counts.txt', header = F); summary(x); sd(x[ , 1 ])
       V1
 Min.   : 0.00
 1st Qu.:10.25
 Median :19.00
 Mean   :20.48
 3rd Qu.:27.50
 Max.   :55.00
[1] 11.77108
>
josef@fedora:~/linux$ R -q -e "x <- read.csv('bcachefs-counts.txt', header = F); summary(x); sd(x[ , 1 ])"
> x <- read.csv('bcachefs-counts.txt', header = F); summary(x); sd(x[ , 1 ])
       V1
 Min.   :  0.00
 1st Qu.: 38.50
 Median : 70.00
 Mean   : 63.86
 3rd Qu.: 81.50
 Max.   :137.00
[1] 45.28218
>

So even including the wilder times of kernel development in general and
btrfs's specifically, our worst window was 55 patches, less than your
mean.

These are not the same thing. Do not equivocate the two. Sterba is a
phenomenal maintainer who does his job well, manages to work with Linus
just fine. We are not the same, we do not work the same, and we
absolutely do follow the rules, as do 99.99% of the kernel community.

If xfs has done this then good for them. Those developers have a track
record of doing the right thing over a long period of time. Btrfs for
sure hasn't. Thanks,

Josef

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [GIT PULL] bcachefs fixes for 6.16-rc4
  2025-06-27 14:46 ` Josef Bacik
@ 2025-06-28  1:59 ` Theodore Ts'o
  0 siblings, 0 replies; 19+ messages in thread
From: Theodore Ts'o @ 2025-06-28 1:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Linus Torvalds, linux-bcachefs, linux-fsdevel, linux-kerenl

On Fri, Jun 27, 2025 at 10:46:04AM -0400, Josef Bacik wrote:
> On Thu, Jun 26, 2025 at 10:22:52PM -0400, Kent Overstreet wrote:
> > per the maintainer thread discussion and precedent in xfs and btrfs
> > for repair code in RCs, journal_rewind is again included
>
> I'm replying to set the record straight. This is not the start of the
> discussion. I am not going to let false statements stand by unchallenged
> however.
>
> Sterba has never sent large pull requests in RCs, certainly not with
> features in them. Even when Chris was the maintainer and we were a
> little faster and looser and were pushing the envelope to see what
> Linus would accept we didn't ship anything near this volume of
> patches past rc1.

And as far as XFS is concerned, "citation needed". Dave Chinner (who is
not the current XFS maintainer) has asserted that there might be a time
when XFS *might* want to send repair code post merge window. However,
I'm not aware of any time when Darrick Wong was working on XFS online
repair that he sent changes outside of the merge window as the XFS
maintainer. And now that XFS online repair feature is upstream, *bug
fixes* can be sent at any time.

So (a) I am not aware of any time that XFS *has* sent online repair
changes upstream outside of a merge window --- this is just an assertion
by Kent --- and (b) I am not sure when XFS would need to send some kind
of new feature involving online repair upstream, given that online
repair is *already* upstream.

                                        - Ted

^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-07-07 21:32 UTC | newest]

Thread overview: 19+ messages
2025-06-27  2:22 [GIT PULL] bcachefs fixes for 6.16-rc4 Kent Overstreet
2025-06-27  3:21 ` Linus Torvalds
2025-06-27  3:34 ` Kent Overstreet
2025-07-01 14:43 ` John Stoffel
2025-07-02 16:34 ` Kent Overstreet
2025-07-02 17:41 ` Carl E. Thompson
2025-07-02 17:53 ` Kent Overstreet
2025-07-02 18:49 ` Malte Schröder
2025-07-07 20:03 ` John Stoffel
2025-07-07 20:39 ` Kent Overstreet
2025-07-07 21:32 ` Carl E. Thompson
2025-07-04  7:02 ` Hillf Danton
2025-06-27 19:07 ` Kyle Sanderson
2025-06-27 19:16 ` Kyle Sanderson
2025-06-27 19:42 ` Kent Overstreet
2025-06-28  8:06 ` Gerhard Wiesinger
2025-06-27  3:33 ` pr-tracker-bot
2025-06-27 14:46 ` Josef Bacik
2025-06-28  1:59 ` Theodore Ts'o