* [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
       [not found] <20260126022715.404984-1-CFSworks@gmail.com>
@ 2026-01-26  2:27 ` Sam Edwards
  2026-01-26 22:51   ` Viacheslav Dubeyko
  2026-02-11 14:10   ` Ilya Dryomov
  0 siblings, 2 replies; 5+ messages in thread
From: Sam Edwards @ 2026-01-26  2:27 UTC (permalink / raw)
  To: Xiubo Li, Ilya Dryomov
  Cc: Viacheslav Dubeyko, Christian Brauner, Milind Changire,
	Jeff Layton, ceph-devel, linux-kernel, Sam Edwards, stable

If `locked_pages` is zero, the page array must not be allocated:
ceph_process_folio_batch() uses `locked_pages` to decide when to
allocate `pages`, and redundant allocations trigger
ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
writeback stall) or even a kernel panic. Consequently, the main loop in
ceph_writepages_start() assumes that the lifetime of `pages` is confined
to a single iteration.

The ceph_submit_write() function claims ownership of the page array on
success (it is later freed when the write concludes). But failures only
redirty/unlock the pages and fail to free the array, making the failure
case in ceph_submit_write() fatal.

Free the page array (and reset locked_pages) in ceph_submit_write()'s
error-handling 'if' block so that the caller's invariant (that the array
does not remain in ceph_wbc) is maintained unconditionally, making
failures in ceph_submit_write() recoverable as originally intended.

Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
 fs/ceph/addr.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 63b75d214210..c3e0b5b429ea 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
 		unlock_page(page);
 	}
 
+	if (ceph_wbc->from_pool) {
+		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+		ceph_wbc->from_pool = false;
+	} else
+		kfree(ceph_wbc->pages);
+	ceph_wbc->pages = NULL;
+	ceph_wbc->locked_pages = 0;
+
 	ceph_osdc_put_request(req);
 	return -EIO;
 }
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-01-26  2:27 ` [PATCH 1/2] ceph: free page array when ceph_submit_write() fails Sam Edwards
@ 2026-01-26 22:51   ` Viacheslav Dubeyko
  2026-01-29  0:10     ` Sam Edwards
  1 sibling, 1 reply; 5+ messages in thread
From: Viacheslav Dubeyko @ 2026-01-26 22:51 UTC (permalink / raw)
  To: Xiubo Li, idryomov@gmail.com, cfsworks@gmail.com
  Cc: Milind Changire, stable@vger.kernel.org, ceph-devel@vger.kernel.org,
	brauner@kernel.org, jlayton@kernel.org, linux-kernel@vger.kernel.org

On Sun, 2026-01-25 at 18:27 -0800, Sam Edwards wrote:
> If `locked_pages` is zero, the page array must not be allocated:
> ceph_process_folio_batch() uses `locked_pages` to decide when to
> allocate `pages`, and redundant allocations trigger
> ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> writeback stall) or even a kernel panic. Consequently, the main loop in
> ceph_writepages_start() assumes that the lifetime of `pages` is confined
> to a single iteration.
> 
> The ceph_submit_write() function claims ownership of the page array on
> success (it is later freed when the write concludes). But failures only
> redirty/unlock the pages and fail to free the array, making the failure
> case in ceph_submit_write() fatal.
> 
> Free the page array (and reset locked_pages) in ceph_submit_write()'s
> error-handling 'if' block so that the caller's invariant (that the array
> does not remain in ceph_wbc) is maintained unconditionally, making
> failures in ceph_submit_write() recoverable as originally intended.
> 
> Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>  fs/ceph/addr.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 63b75d214210..c3e0b5b429ea 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
>  		unlock_page(page);
>  	}
>  
> +	if (ceph_wbc->from_pool) {
> +		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> +		ceph_wbc->from_pool = false;
> +	} else
> +		kfree(ceph_wbc->pages);
> +	ceph_wbc->pages = NULL;
> +	ceph_wbc->locked_pages = 0;
> +

I see the completely identical code pattern in two patches:

+	if (ceph_wbc->from_pool) {
+		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+		ceph_wbc->from_pool = false;
+	} else
+		kfree(ceph_wbc->pages);
+	ceph_wbc->pages = NULL;
+	ceph_wbc->locked_pages = 0;

I believe we need to introduce the inline function that can be reused in
two places.

Thanks,
Slava.

> 	ceph_osdc_put_request(req);
> 	return -EIO;
> }

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-01-26 22:51 ` Viacheslav Dubeyko
@ 2026-01-29  0:10   ` Sam Edwards
  0 siblings, 0 replies; 5+ messages in thread
From: Sam Edwards @ 2026-01-29  0:10 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: Xiubo Li, idryomov@gmail.com, Milind Changire, stable@vger.kernel.org,
	ceph-devel@vger.kernel.org, brauner@kernel.org, jlayton@kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Jan 26, 2026 at 2:51 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>
> On Sun, 2026-01-25 at 18:27 -0800, Sam Edwards wrote:
> > If `locked_pages` is zero, the page array must not be allocated:
> > ceph_process_folio_batch() uses `locked_pages` to decide when to
> > allocate `pages`, and redundant allocations trigger
> > ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> > writeback stall) or even a kernel panic. Consequently, the main loop in
> > ceph_writepages_start() assumes that the lifetime of `pages` is confined
> > to a single iteration.
> >
> > The ceph_submit_write() function claims ownership of the page array on
> > success (it is later freed when the write concludes). But failures only
> > redirty/unlock the pages and fail to free the array, making the failure
> > case in ceph_submit_write() fatal.
> >
> > Free the page array (and reset locked_pages) in ceph_submit_write()'s
> > error-handling 'if' block so that the caller's invariant (that the array
> > does not remain in ceph_wbc) is maintained unconditionally, making
> > failures in ceph_submit_write() recoverable as originally intended.
> >
> > Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > ---
> > fs/ceph/addr.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 63b75d214210..c3e0b5b429ea 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
> > unlock_page(page);
> > }
> >
> > + if (ceph_wbc->from_pool) {
> > + mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> > + ceph_wbc->from_pool = false;
> > + } else
> > + kfree(ceph_wbc->pages);
> > + ceph_wbc->pages = NULL;
> > + ceph_wbc->locked_pages = 0;
> > +
>
>
> I see the completely identical code pattern in two patches:

The second patch only contains that pattern because it is moving it to a
separate function; patch 2 isn't introducing any *new* code.

>
> + if (ceph_wbc->from_pool) {
> + mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> + ceph_wbc->from_pool = false;
> + } else
> + kfree(ceph_wbc->pages);
> + ceph_wbc->pages = NULL;
> + ceph_wbc->locked_pages = 0;
>
> I believe we need to introduce the inline function that can be reused in two
> places.

Patch 2 is introducing that inline function as requested -- but that
function is not actually used in two places: for now (in this series),
it is only split out for better readability.

These patches are organized like this because of kernel development
norms: bugfixes intended for stable (such as this patch) should consist
of minimal, backport-friendly and correctness-focused changes. Moving
existing code to a new function is a separate change and does not
constitute a bugfix, so it needs to go in its own patch that isn't
Cc: stable.

Cheers,
Sam

> Thanks,
> Slava.
>
> > ceph_osdc_put_request(req);
> > return -EIO;
> > }

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-01-26  2:27 ` [PATCH 1/2] ceph: free page array when ceph_submit_write() fails Sam Edwards
  2026-01-26 22:51   ` Viacheslav Dubeyko
@ 2026-02-11 14:10   ` Ilya Dryomov
  2026-02-11 14:52     ` Ilya Dryomov
  1 sibling, 1 reply; 5+ messages in thread
From: Ilya Dryomov @ 2026-02-11 14:10 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Xiubo Li, Viacheslav Dubeyko, Christian Brauner, Milind Changire,
	Jeff Layton, ceph-devel, linux-kernel, stable

On Mon, Jan 26, 2026 at 3:27 AM Sam Edwards <cfsworks@gmail.com> wrote:
>
> If `locked_pages` is zero, the page array must not be allocated:
> ceph_process_folio_batch() uses `locked_pages` to decide when to
> allocate `pages`, and redundant allocations trigger
> ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> writeback stall) or even a kernel panic. Consequently, the main loop in
> ceph_writepages_start() assumes that the lifetime of `pages` is confined
> to a single iteration.
>
> The ceph_submit_write() function claims ownership of the page array on
> success (it is later freed when the write concludes). But failures only
> redirty/unlock the pages and fail to free the array, making the failure
> case in ceph_submit_write() fatal.
>
> Free the page array (and reset locked_pages) in ceph_submit_write()'s
> error-handling 'if' block so that the caller's invariant (that the array
> does not remain in ceph_wbc) is maintained unconditionally, making
> failures in ceph_submit_write() recoverable as originally intended.
>
> Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
> fs/ceph/addr.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 63b75d214210..c3e0b5b429ea 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
> unlock_page(page);
> }
>
> + if (ceph_wbc->from_pool) {
> + mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> + ceph_wbc->from_pool = false;
> + } else
> + kfree(ceph_wbc->pages);
> + ceph_wbc->pages = NULL;
> + ceph_wbc->locked_pages = 0;

Hi Sam,

While I don't see anything wrong with the patch per se, I can't help
but question the existence of this entire branch along with the meaning
of the error.

ceph_writepages_start() is the only caller of ceph_submit_write() and
it already calls ceph_inc_osd_stopping_blocker() at the top where the
error can be handled naturally -- nothing needs to be unlocked or freed
at that point. Since mdsc->stopping_blockers is just a counter, all
calls made by ceph_submit_write() invocations in a loop would be
"contained" within that ceph_writepages_start() call. The only benefit
achieved is potentially faster response to the MDS client moving to
CEPH_MDSC_STOPPING_FLUSHING state, but it's rather dubious because
sneaking in/having to wait for some more OSD requests isn't really the
end of the world.

Rather than patching the error path, I wonder if instead of calling
ceph_inc_osd_stopping_blocker() the counter could just be incremented
unconditionally, with the check for CEPH_MDSC_STOPPING_FLUSHING bypassed
there? This could be wrapped into a new helper that could also assert
that the counter is already elevated before the increment.

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-02-11 14:10 ` Ilya Dryomov
@ 2026-02-11 14:52   ` Ilya Dryomov
  0 siblings, 0 replies; 5+ messages in thread
From: Ilya Dryomov @ 2026-02-11 14:52 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Xiubo Li, Viacheslav Dubeyko, Christian Brauner, Milind Changire,
	Jeff Layton, ceph-devel, linux-kernel, stable

On Wed, Feb 11, 2026 at 3:10 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Mon, Jan 26, 2026 at 3:27 AM Sam Edwards <cfsworks@gmail.com> wrote:
> >
> > If `locked_pages` is zero, the page array must not be allocated:
> > ceph_process_folio_batch() uses `locked_pages` to decide when to
> > allocate `pages`, and redundant allocations trigger
> > ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> > writeback stall) or even a kernel panic. Consequently, the main loop in
> > ceph_writepages_start() assumes that the lifetime of `pages` is confined
> > to a single iteration.
> >
> > The ceph_submit_write() function claims ownership of the page array on
> > success (it is later freed when the write concludes). But failures only
> > redirty/unlock the pages and fail to free the array, making the failure
> > case in ceph_submit_write() fatal.
> >
> > Free the page array (and reset locked_pages) in ceph_submit_write()'s
> > error-handling 'if' block so that the caller's invariant (that the array
> > does not remain in ceph_wbc) is maintained unconditionally, making
> > failures in ceph_submit_write() recoverable as originally intended.
> >
> > Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > ---
> > fs/ceph/addr.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 63b75d214210..c3e0b5b429ea 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
> > unlock_page(page);
> > }
> >
> > + if (ceph_wbc->from_pool) {
> > + mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> > + ceph_wbc->from_pool = false;
> > + } else
> > + kfree(ceph_wbc->pages);
> > + ceph_wbc->pages = NULL;
> > + ceph_wbc->locked_pages = 0;
>
> Hi Sam,
>
> While I don't see anything wrong with the patch per se, I can't help
> but question the existence of this entire branch along with the meaning
> of the error.
>
> ceph_writepages_start() is the only caller of ceph_submit_write() and
> it already calls ceph_inc_osd_stopping_blocker() at the top where the
> error can be handled naturally -- nothing needs to be unlocked or freed
> at that point. Since mdsc->stopping_blockers is just a counter, all
> calls made by ceph_submit_write() invocations in a loop would be
> "contained" within that ceph_writepages_start() call. The only benefit
> achieved is potentially faster response to the MDS client moving to
> CEPH_MDSC_STOPPING_FLUSHING state, but it's rather dubious because
> sneaking in/having to wait for some more OSD requests isn't really the
> end of the world.
>
> Rather than patching the error path, I wonder if instead of calling
> ceph_inc_osd_stopping_blocker() the counter could just be incremented
> unconditionally, with the check for CEPH_MDSC_STOPPING_FLUSHING bypassed
> there? This could be wrapped into a new helper that could also assert
> that the counter is already elevated before the increment.

Something along the lines of

void __ceph_inc_osd_stopping_blocker(struct ceph_mds_client *mdsc)
{
	BUG_ON(!atomic_inc_not_zero(&mdsc->stopping_blockers));
}

and switching to __ceph_inc_osd_stopping_blocker() in place of
ceph_inc_osd_stopping_blocker() in ceph_submit_write().

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-02-11 14:52 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
[not found] <20260126022715.404984-1-CFSworks@gmail.com>
2026-01-26 2:27 ` [PATCH 1/2] ceph: free page array when ceph_submit_write() fails Sam Edwards
2026-01-26 22:51 ` Viacheslav Dubeyko
2026-01-29 0:10 ` Sam Edwards
2026-02-11 14:10 ` Ilya Dryomov
2026-02-11 14:52 ` Ilya Dryomov