* Re: [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method [not found] ` <Z6-xg-p_mi3I1aMq@casper.infradead.org> @ 2025-08-26 22:06 ` Max Kellermann 2025-08-26 22:33 ` Viacheslav Dubeyko 0 siblings, 1 reply; 5+ messages in thread From: Max Kellermann @ 2025-08-26 22:06 UTC (permalink / raw) To: Matthew Wilcox, brauner Cc: Viacheslav Dubeyko, ceph-devel, idryomov, dhowells, linux-fsdevel, pdonnell, amarkuze, Slava.Dubeyko, linux-kernel On 2025/02/14 22:11, Matthew Wilcox <willy@infradead.org> wrote: > On Tue, Feb 04, 2025 at 04:02:48PM -0800, Viacheslav Dubeyko wrote: > > This patch implements refactoring of ceph_submit_write() > > and also it solves the second issue. > > > > Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> > > This kind of giant refactoring to solve a bug is a really bad idea. > First, it's going to need to be backported to older kernels. How far > back? You need to identify that with a Fixes: line. > > It's also really hard to review and know whether it's right. You might > have introduced several new bugs while doing it. In general, bugfixes > first, refactor later. I *think* this means we can do without 1/7 of > the patches I resent earlier today, but it's really hard to be sure. I'm very disappointed that nobody has listened to Matthew's complaint. Viacheslav has only hand-waved it away with a lazy non-argument. From Documentation/process/submitting-patches.rst: "you should not modify the moved code at all in the same patch which moves it" Obviously, this patch set violates this rule. There are lots of semantic/behavior changes in the patches that move code around. In the end, Christian Brauner has merged this into Linux 6.15 and that merge has wreaked havoc in our production clusters. We have been testing 6.15 for a month with no problems (after David Howells had fixed yet-another netfs regression that was stable-backported to 6.15, ugh!), but when we updated a production clusters, all servers had crashed after a few hours, and our ops team had a very bad night. This patch set is obviously bad. It pretends to fix a bug, but really rewrites almost everything in two patches documented as "introduce XY method" with no real explanation for why Viacheslav has decided to do it, instead of just fixing the bug (as Matthew asked him to). Look at this line modified by patch "ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring": > + ceph_wbc.fbatch.folios[i] = NULL; This sets a folio_batch element to NULL, which will, of course, crash in folios_put_refs() (but only if the global huge zero page has already been created). Fortunately, there's code that removes all NULL elements from the folio_batch array. That is code that already existed before Viacheslav's patch set (code which I already dislike because it's a fragile mess that is just waiting to crash), and the code was only being moved around. Did I mention that I think this is a fragile mess? Fast-forward to Viacheslav's patch "ceph: introduce ceph_process_folio_batch() method" which moves the NULL-setting loop to ceph_process_folio_batch(). Look at this (untouched) piece of code after the ceph_process_folio_batch() call: > if (i) { > unsigned j, n = 0; > /* shift unused page to beginning of fbatch */ Shifting only happens if at least one folio has been processed ("i" was incremented). But does it really happen? No, the loop was moved to ceph_process_folio_batch(), and nobody ever increments "i" anymore. Voila, crash due to NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023 Workqueue: writeback wb_workfn (flush-ceph-1) RIP: 0010:folios_put_refs+0x85/0x140 Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003 RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0 RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0 R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000 FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ceph_writepages_start+0xeb9/0x1410 Viacheslav's third patch "ceph: introduce ceph_submit_write() method" messes up the logic a bit more and makes it more fragile by hiding the shifting code behind more conditions: - if ceph_process_folio_batch() fails, shifting never happens - if ceph_move_dirty_page_in_page_array() was never called (because ceph_process_folio_batch() has returned early for some of various reasons), shifting never happens - if processed_in_fbatch is zero (because ceph_process_folio_batch() has returned early for some of the reasons mentioned above or because ceph_move_dirty_page_in_page_array() has failed), shifting never happens If shifting doesn't happen, then the kernel crashes (unless huge-zero-page doesn't exist, see above). Obviously, nobody has ever looked closely enough at the code. I'm still new to Linux memory management and file systems, but these problems were obvious when I first saw this patch set (which was my candidate for the other 6.15 crashes which then turned out to be netfs regressions, not Ceph). This whole patch set is a huge mess and has caused my team a good amount of pain. This could and should have been avoided, had only somebody listened to Matthew. (Also look at all those "checkpatch.pl" complaints on all patches in this patch set. There are many coding style violations.) Can we please revert the whole patch set? I don't think it's possible to fix all the weird undocumented side effects that may cause more crashes once we fix this one. Refactoring the Ceph code sure is necessary, it's not in a good shape, but it should be done more carefully. Some people (like me) depend on Ceph's stability, and this mess is doing a disservice to Ceph's reputation. Max ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method 2025-08-26 22:06 ` [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method Max Kellermann @ 2025-08-26 22:33 ` Viacheslav Dubeyko 2025-08-26 22:53 ` Max Kellermann 0 siblings, 1 reply; 5+ messages in thread From: Viacheslav Dubeyko @ 2025-08-26 22:33 UTC (permalink / raw) To: brauner@kernel.org, Patrick Donnelly, linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org, slava@dubeyko.com, David Howells, Alex Markuze, willy@infradead.org, idryomov@gmail.com On Wed, 2025-08-27 at 00:06 +0200, Max Kellermann wrote: > On 2025/02/14 22:11, Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, Feb 04, 2025 at 04:02:48PM -0800, Viacheslav Dubeyko wrote: > > > This patch implements refactoring of ceph_submit_write() > > > and also it solves the second issue. > > > > > > Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> > > > > This kind of giant refactoring to solve a bug is a really bad idea. > > First, it's going to need to be backported to older kernels. How far > > back? You need to identify that with a Fixes: line. > > > > It's also really hard to review and know whether it's right. You might > > have introduced several new bugs while doing it. In general, bugfixes > > first, refactor later. I *think* this means we can do without 1/7 of > > the patches I resent earlier today, but it's really hard to be sure. > > I'm very disappointed that nobody has listened to Matthew's complaint. > Viacheslav has only hand-waved it away with a lazy non-argument. > > From Documentation/process/submitting-patches.rst: > > "you should not modify the moved code at all in the same patch which > moves it" > > Obviously, this patch set violates this rule. There are lots of > semantic/behavior changes in the patches that move code around. > > In the end, Christian Brauner has merged this into Linux 6.15 and that > merge has wreaked havoc in our production clusters. We have been > testing 6.15 for a month with no problems (after David Howells had > fixed yet-another netfs regression that was stable-backported to 6.15, > ugh!), but when we updated a production clusters, all servers had > crashed after a few hours, and our ops team had a very bad night. > > This patch set is obviously bad. It pretends to fix a bug, but really > rewrites almost everything in two patches documented as "introduce XY > method" with no real explanation for why Viacheslav has decided to do > it, instead of just fixing the bug (as Matthew asked him to). > > Look at this line modified by patch "ceph: extend ceph_writeback_ctl > for ceph_writepages_start() refactoring": > > > + ceph_wbc.fbatch.folios[i] = NULL; > > This sets a folio_batch element to NULL, which will, of course, crash > in folios_put_refs() (but only if the global huge zero page has > already been created). Fortunately, there's code that removes all > NULL elements from the folio_batch array. That is code that already > existed before Viacheslav's patch set (code which I already dislike > because it's a fragile mess that is just waiting to crash), and the > code was only being moved around. > > Did I mention that I think this is a fragile mess? Fast-forward to > Viacheslav's patch "ceph: introduce ceph_process_folio_batch() method" > which moves the NULL-setting loop to ceph_process_folio_batch(). Look > at this (untouched) piece of code after the ceph_process_folio_batch() > call: > > > if (i) { > > unsigned j, n = 0; > > /* shift unused page to beginning of fbatch */ > > Shifting only happens if at least one folio has been processed ("i" > was incremented). But does it really happen? > > No, the loop was moved to ceph_process_folio_batch(), and nobody ever > increments "i" anymore. Voila, crash due to NULL pointer dereference: > > BUG: kernel NULL pointer dereference, address: 0000000000000034 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: Oops: 0002 [#1] SMP NOPTI > CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE > Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023 > Workqueue: writeback wb_workfn (flush-ceph-1) > RIP: 0010:folios_put_refs+0x85/0x140 > Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 > > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207 > RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003 > RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0 > RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f > R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0 > R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000 > FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0 > PKRU: 55555554 > Call Trace: > <TASK> > ceph_writepages_start+0xeb9/0x1410 > > Viacheslav's third patch "ceph: introduce ceph_submit_write() method" > messes up the logic a bit more and makes it more fragile by hiding the > shifting code behind more conditions: > > - if ceph_process_folio_batch() fails, shifting never happens > > - if ceph_move_dirty_page_in_page_array() was never called (because > ceph_process_folio_batch() has returned early for some of various > reasons), shifting never happens > > - if processed_in_fbatch is zero (because ceph_process_folio_batch() > has returned early for some of the reasons mentioned above or > because ceph_move_dirty_page_in_page_array() has failed), shifting > never happens > > If shifting doesn't happen, then the kernel crashes (unless > huge-zero-page doesn't exist, see above). Obviously, nobody has ever > looked closely enough at the code. I'm still new to Linux memory > management and file systems, but these problems were obvious when I > first saw this patch set (which was my candidate for the other 6.15 > crashes which then turned out to be netfs regressions, not Ceph). > > This whole patch set is a huge mess and has caused my team a good > amount of pain. This could and should have been avoided, had only > somebody listened to Matthew. > > (Also look at all those "checkpatch.pl" complaints on all patches in > this patch set. There are many coding style violations.) > > Can we please revert the whole patch set? I don't think it's possible > to fix all the weird undocumented side effects that may cause more > crashes once we fix this one. Refactoring the Ceph code sure is > necessary, it's not in a good shape, but it should be done more > carefully. Some people (like me) depend on Ceph's stability, and this > mess is doing a disservice to Ceph's reputation. Of course, we can revert any patch. This patchset has been sent not with the goal of pure refactoring but it fixes several bugs. Reverting means returning these bugs back. This patchset was available for review for a long time. So, anybody was able to review it and to share any comments. I am always happy to rework and to make any patch more better. From my point of view, reverting is not answer and it makes sense to continue fix bugs and to make CephFS code more stable. Thanks, Slava. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method 2025-08-26 22:33 ` Viacheslav Dubeyko @ 2025-08-26 22:53 ` Max Kellermann 2025-08-27 3:06 ` Viacheslav Dubeyko 0 siblings, 1 reply; 5+ messages in thread From: Max Kellermann @ 2025-08-26 22:53 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: brauner@kernel.org, Patrick Donnelly, linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org, slava@dubeyko.com, David Howells, Alex Markuze, willy@infradead.org, idryomov@gmail.com On 2025/08/27 00:33, Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote: > Of course, we can revert any patch. This patchset has been sent not with the > goal of pure refactoring but it fixes several bugs. Reverting means returning > these bugs back. You should have listened of Matthew and submit separate minimal bug-fixing patches instead of posting huge patches which move code around, change semantics and hidden somewhere deep within fix some bug (and then introduce new bugs). > This patchset was available for review for a long time. There was exactly one review, and no, you were not "happy to rework and to make any patch more better" - you openly rejected Matthew's review. > From my point of view, reverting is not answer and it makes sense to > continue fix bugs and to make CephFS code more stable. Your argument only appears to sound right, but it is detached from the reality I'm living in. Your patches made Ceph less stable. 6.14 had one Ceph-related crash every other week, but 6.15 with your patches made all servers crash within hours. The point is: the Linux kernel was better without your patches. Your patches may have fixed a bug, but have introduced a dozen new bugs, including one that very quickly crashes the whole kernel, one that was really obvious enough, just nobody cared enough to read deeply enough after you rejected Matthew's review. Too bad no maintainer stopped you! Of course, the bug that was fixed by your patch set should be fixed - but not the way you did it. Every aspect of your approach to fixing the bug was bad. The best way forward for you would be to revert this patch set and write a minimal patch that only fixes the bug. If you want to be helpful here, please give this a try. Max ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method 2025-08-26 22:53 ` Max Kellermann @ 2025-08-27 3:06 ` Viacheslav Dubeyko 2025-08-27 4:57 ` Max Kellermann 0 siblings, 1 reply; 5+ messages in thread From: Viacheslav Dubeyko @ 2025-08-27 3:06 UTC (permalink / raw) To: David Howells, brauner@kernel.org, slava@dubeyko.com, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org, Alex Markuze, Patrick Donnelly, willy@infradead.org, idryomov@gmail.com, linux-fsdevel@vger.kernel.org On Wed, 2025-08-27 at 00:53 +0200, Max Kellermann wrote: > On 2025/08/27 00:33, Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote: > > Of course, we can revert any patch. This patchset has been sent not with the > > goal of pure refactoring but it fixes several bugs. Reverting means returning > > these bugs back. > > You should have listened of Matthew and submit separate minimal > bug-fixing patches instead of posting huge patches which move code > around, change semantics and hidden somewhere deep within fix some bug > (and then introduce new bugs). > > > This patchset was available for review for a long time. > > There was exactly one review, and no, you were not "happy to rework > and to make any patch more better" - you openly rejected Matthew's > review. > > > From my point of view, reverting is not answer and it makes sense to > > continue fix bugs and to make CephFS code more stable. > > Your argument only appears to sound right, but it is detached from the > reality I'm living in. > > Your patches made Ceph less stable. 6.14 had one Ceph-related crash > every other week, but 6.15 with your patches made all servers crash > within hours. > > The point is: the Linux kernel was better without your patches. Your > patches may have fixed a bug, but have introduced a dozen new bugs, > including one that very quickly crashes the whole kernel, one that was > really obvious enough, just nobody cared enough to read deeply enough > after you rejected Matthew's review. Too bad no maintainer stopped > you! > > Of course, the bug that was fixed by your patch set should be fixed - > but not the way you did it. Every aspect of your approach to fixing > the bug was bad. > > The best way forward for you would be to revert this patch set and > write a minimal patch that only fixes the bug. If you want to be > helpful here, please give this a try. > > This patchset had been tested by xfstests and no issue had been triggered. We have not enough Ceph related test-cases. So, you are welcome to add a test-case for the issue. This issue had been reported more than a month ago. I tried to reproduce it but I had no luck. So, if you are lucky enough, then simply share the patch or the way to reproduce the issue. The patchset simply placed old code into dedicated functions with the goal to manage complexity. So, the old code is still there and the patch cannot introduce "dozen new bugs". Otherwise, xfstests can easily reproduce these gazilion of bugs. :) Currently, we have only one issue reported. The open-source is space for collaboration and respect of each other. It is not space for offensive or bullying behavior. Thanks, Slava. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method 2025-08-27 3:06 ` Viacheslav Dubeyko @ 2025-08-27 4:57 ` Max Kellermann 0 siblings, 0 replies; 5+ messages in thread From: Max Kellermann @ 2025-08-27 4:57 UTC (permalink / raw) To: Viacheslav Dubeyko Cc: David Howells, brauner@kernel.org, slava@dubeyko.com, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org, Alex Markuze, Patrick Donnelly, willy@infradead.org, idryomov@gmail.com, linux-fsdevel@vger.kernel.org On 2025/08/27 05:06, Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote: > The open-source is space for collaboration and respect of each other. It is not > space for offensive or bullying behavior. This is now your third reply to a comment on your patch set (one to Matthew and two to me), and you are still sidestepping the valid criticism. This time, instead of talking how to fix it, you are explaining how well you tested this, and then this meta-comment. No, for the bug discussion, it does not matter how well you tested this. If it's buggy, it's buggy. For me, as an amateur, the bug was obvious enough to see in the source code, and after my long explanation, it should be easy enough for everybody else to see, isn't it? And no, I don't owe you a test case. I explained the exact conditions that need to occur for the crash to happen, and I have to trust that you are capable of easily injecting such an error, aren't you? This is not what I call "collaboration" neither "respect", to me it rather seems like ignorance. You were not collaborating with the reviewer and you are not respecting the bug report. You just divert the discussion instead of talking about the actual problems. And, note: in both of your replies to me, you removed me from the recipient list. Everybody else you kept, you only removed me. I had to look up your replies on lore.kernel.org. (You did not do that with your replies to Matthew and David.) What kind of funny behavior is that? Max ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-08-27 4:57 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20250205000249.123054-1-slava@dubeyko.com> [not found] ` <20250205000249.123054-4-slava@dubeyko.com> [not found] ` <Z6-xg-p_mi3I1aMq@casper.infradead.org> 2025-08-26 22:06 ` [RFC PATCH 3/4] ceph: introduce ceph_submit_write() method Max Kellermann 2025-08-26 22:33 ` Viacheslav Dubeyko 2025-08-26 22:53 ` Max Kellermann 2025-08-27 3:06 ` Viacheslav Dubeyko 2025-08-27 4:57 ` Max Kellermann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).