[PATCH 1/2] ceph: free page array when ceph_submit

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
       [not found] <20260126022715.404984-1-CFSworks@gmail.com>
@ 2026-01-26  2:27 ` Sam Edwards
  2026-01-26 22:51   ` Viacheslav Dubeyko
  2026-02-11 14:10   ` Ilya Dryomov
  0 siblings, 2 replies; 5+ messages in thread
From: Sam Edwards @ 2026-01-26  2:27 UTC (permalink / raw)
  To: Xiubo Li, Ilya Dryomov
  Cc: Viacheslav Dubeyko, Christian Brauner, Milind Changire,
	Jeff Layton, ceph-devel, linux-kernel, Sam Edwards, stable

If `locked_pages` is zero, the page array must not be allocated:
ceph_process_folio_batch() uses `locked_pages` to decide when to
allocate `pages`, and redundant allocations trigger
ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
writeback stall) or even a kernel panic. Consequently, the main loop in
ceph_writepages_start() assumes that the lifetime of `pages` is confined
to a single iteration.

The ceph_submit_write() function claims ownership of the page array on
success (it is later freed when the write concludes). But failures only
redirty/unlock the pages and fail to free the array, making the failure
case in ceph_submit_write() fatal.

Free the page array (and reset locked_pages) in ceph_submit_write()'s
error-handling 'if' block so that the caller's invariant (that the array
does not remain in ceph_wbc) is maintained unconditionally, making
failures in ceph_submit_write() recoverable as originally intended.

Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
 fs/ceph/addr.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 63b75d214210..c3e0b5b429ea 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
 			unlock_page(page);
 		}

+		if (ceph_wbc->from_pool) {
+			mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+			ceph_wbc->from_pool = false;
+		} else
+			kfree(ceph_wbc->pages);
+		ceph_wbc->pages = NULL;
+		ceph_wbc->locked_pages = 0;
+
 		ceph_osdc_put_request(req);
 		return -EIO;
 	}
-- 
2.52.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re:  [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-01-26  2:27 ` [PATCH 1/2] ceph: free page array when ceph_submit_write() fails Sam Edwards
@ 2026-01-26 22:51   ` Viacheslav Dubeyko
  2026-01-29  0:10     ` Sam Edwards
  2026-02-11 14:10   ` Ilya Dryomov
  1 sibling, 1 reply; 5+ messages in thread
From: Viacheslav Dubeyko @ 2026-01-26 22:51 UTC (permalink / raw)
  To: Xiubo Li, idryomov@gmail.com, cfsworks@gmail.com
  Cc: Milind Changire, stable@vger.kernel.org,
	ceph-devel@vger.kernel.org, brauner@kernel.org,
	jlayton@kernel.org, linux-kernel@vger.kernel.org

On Sun, 2026-01-25 at 18:27 -0800, Sam Edwards wrote:
> If `locked_pages` is zero, the page array must not be allocated:
> ceph_process_folio_batch() uses `locked_pages` to decide when to
> allocate `pages`, and redundant allocations trigger
> ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> writeback stall) or even a kernel panic. Consequently, the main loop in
> ceph_writepages_start() assumes that the lifetime of `pages` is confined
> to a single iteration.
> 
> The ceph_submit_write() function claims ownership of the page array on
> success (it is later freed when the write concludes). But failures only
> redirty/unlock the pages and fail to free the array, making the failure
> case in ceph_submit_write() fatal.
> 
> Free the page array (and reset locked_pages) in ceph_submit_write()'s
> error-handling 'if' block so that the caller's invariant (that the array
> does not remain in ceph_wbc) is maintained unconditionally, making
> failures in ceph_submit_write() recoverable as originally intended.
> 
> Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>  fs/ceph/addr.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 63b75d214210..c3e0b5b429ea 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
>  			unlock_page(page);
>  		}
>  
> +		if (ceph_wbc->from_pool) {
> +			mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> +			ceph_wbc->from_pool = false;
> +		} else
> +			kfree(ceph_wbc->pages);
> +		ceph_wbc->pages = NULL;
> +		ceph_wbc->locked_pages = 0;
> +


I see the completely identical code pattern in two patches:

+	if (ceph_wbc->from_pool) {
+		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+		ceph_wbc->from_pool = false;
+	} else
+		kfree(ceph_wbc->pages);
+	ceph_wbc->pages = NULL;
+	ceph_wbc->locked_pages = 0;

I believe we need to introduce the inline function that can be reused in two
places.

Thanks,
Slava.

>  		ceph_osdc_put_request(req);
>  		return -EIO;
>  	}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-01-26 22:51   ` Viacheslav Dubeyko
@ 2026-01-29  0:10     ` Sam Edwards
  0 siblings, 0 replies; 5+ messages in thread
From: Sam Edwards @ 2026-01-29  0:10 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: Xiubo Li, idryomov@gmail.com, Milind Changire,
	stable@vger.kernel.org, ceph-devel@vger.kernel.org,
	brauner@kernel.org, jlayton@kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Jan 26, 2026 at 2:51 PM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
>
> On Sun, 2026-01-25 at 18:27 -0800, Sam Edwards wrote:
> > If `locked_pages` is zero, the page array must not be allocated:
> > ceph_process_folio_batch() uses `locked_pages` to decide when to
> > allocate `pages`, and redundant allocations trigger
> > ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> > writeback stall) or even a kernel panic. Consequently, the main loop in
> > ceph_writepages_start() assumes that the lifetime of `pages` is confined
> > to a single iteration.
> >
> > The ceph_submit_write() function claims ownership of the page array on
> > success (it is later freed when the write concludes). But failures only
> > redirty/unlock the pages and fail to free the array, making the failure
> > case in ceph_submit_write() fatal.
> >
> > Free the page array (and reset locked_pages) in ceph_submit_write()'s
> > error-handling 'if' block so that the caller's invariant (that the array
> > does not remain in ceph_wbc) is maintained unconditionally, making
> > failures in ceph_submit_write() recoverable as originally intended.
> >
> > Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > ---
> >  fs/ceph/addr.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 63b75d214210..c3e0b5b429ea 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
> >                       unlock_page(page);
> >               }
> >
> > +             if (ceph_wbc->from_pool) {
> > +                     mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> > +                     ceph_wbc->from_pool = false;
> > +             } else
> > +                     kfree(ceph_wbc->pages);
> > +             ceph_wbc->pages = NULL;
> > +             ceph_wbc->locked_pages = 0;
> > +
>
>
> I see the completely identical code pattern in two patches:

The second patch only contains that pattern because it is moving it to
a separate function, patch 2 isn't introducing any *new* code.

>
> +       if (ceph_wbc->from_pool) {
> +               mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> +               ceph_wbc->from_pool = false;
> +       } else
> +               kfree(ceph_wbc->pages);
> +       ceph_wbc->pages = NULL;
> +       ceph_wbc->locked_pages = 0;
>
> I believe we need to introduce the inline function that can be reused in two
> places.

Patch 2 is introducing that inline function as requested -- but that
function is not actually used in two places: for now (in this series),
it is only split out for better readability.

These patches are organized like this because of kernel development
norms: bugfixes intended for stable (such as this patch) should
consist of minimal, backport-friendly and correctness-focused changes.
Moving existing code to a new function is a separate change and does
not constitute a bugfix, so it needs to go in its own patch that isn't
Cc: stable.

Cheers,
Sam

>
> Thanks,
> Slava.
>
> >               ceph_osdc_put_request(req);
> >               return -EIO;
> >       }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-01-26  2:27 ` [PATCH 1/2] ceph: free page array when ceph_submit_write() fails Sam Edwards
  2026-01-26 22:51   ` Viacheslav Dubeyko
@ 2026-02-11 14:10   ` Ilya Dryomov
  2026-02-11 14:52     ` Ilya Dryomov
  1 sibling, 1 reply; 5+ messages in thread
From: Ilya Dryomov @ 2026-02-11 14:10 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Xiubo Li, Viacheslav Dubeyko, Christian Brauner, Milind Changire,
	Jeff Layton, ceph-devel, linux-kernel, stable

On Mon, Jan 26, 2026 at 3:27 AM Sam Edwards <cfsworks@gmail.com> wrote:
>
> If `locked_pages` is zero, the page array must not be allocated:
> ceph_process_folio_batch() uses `locked_pages` to decide when to
> allocate `pages`, and redundant allocations trigger
> ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> writeback stall) or even a kernel panic. Consequently, the main loop in
> ceph_writepages_start() assumes that the lifetime of `pages` is confined
> to a single iteration.
>
> The ceph_submit_write() function claims ownership of the page array on
> success (it is later freed when the write concludes). But failures only
> redirty/unlock the pages and fail to free the array, making the failure
> case in ceph_submit_write() fatal.
>
> Free the page array (and reset locked_pages) in ceph_submit_write()'s
> error-handling 'if' block so that the caller's invariant (that the array
> does not remain in ceph_wbc) is maintained unconditionally, making
> failures in ceph_submit_write() recoverable as originally intended.
>
> Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>  fs/ceph/addr.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 63b75d214210..c3e0b5b429ea 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
>                         unlock_page(page);
>                 }
>
> +               if (ceph_wbc->from_pool) {
> +                       mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> +                       ceph_wbc->from_pool = false;
> +               } else
> +                       kfree(ceph_wbc->pages);
> +               ceph_wbc->pages = NULL;
> +               ceph_wbc->locked_pages = 0;

Hi Sam,

While I don't see anything wrong with the patch per se, I can't help
but question the existence of this entire branch along with the meaning
of the error.

ceph_writepages_start() is the only caller of ceph_submit_write() and
it already calls ceph_inc_osd_stopping_blocker() at the top where the
error can be handled naturally -- nothing needs to be unlocked or freed
at that point.  Since mdsc->stopping_blockers is just a counter, all
calls made by ceph_submit_write() invocations in a loop would be
"contained" within that ceph_writepages_start() call.  The only benefit
achieved is potentially faster response to the MDS client moving to
CEPH_MDSC_STOPPING_FLUSHING state, but it's rather dubious because
sneaking in/having to wait for some more OSD requests isn't really the
end of the world.

Rather than patching the error path, I wonder if instead of calling
ceph_inc_osd_stopping_blocker() the counter could just be incremented
unconditionally, with the check for CEPH_MDSC_STOPPING_FLUSHING bypassed
there?  This could be wrapped into a new helper that could also assert
that the counter is already elevated before the increment.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
  2026-02-11 14:10   ` Ilya Dryomov
@ 2026-02-11 14:52     ` Ilya Dryomov
  0 siblings, 0 replies; 5+ messages in thread
From: Ilya Dryomov @ 2026-02-11 14:52 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Xiubo Li, Viacheslav Dubeyko, Christian Brauner, Milind Changire,
	Jeff Layton, ceph-devel, linux-kernel, stable

On Wed, Feb 11, 2026 at 3:10 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Mon, Jan 26, 2026 at 3:27 AM Sam Edwards <cfsworks@gmail.com> wrote:
> >
> > If `locked_pages` is zero, the page array must not be allocated:
> > ceph_process_folio_batch() uses `locked_pages` to decide when to
> > allocate `pages`, and redundant allocations trigger
> > ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> > writeback stall) or even a kernel panic. Consequently, the main loop in
> > ceph_writepages_start() assumes that the lifetime of `pages` is confined
> > to a single iteration.
> >
> > The ceph_submit_write() function claims ownership of the page array on
> > success (it is later freed when the write concludes). But failures only
> > redirty/unlock the pages and fail to free the array, making the failure
> > case in ceph_submit_write() fatal.
> >
> > Free the page array (and reset locked_pages) in ceph_submit_write()'s
> > error-handling 'if' block so that the caller's invariant (that the array
> > does not remain in ceph_wbc) is maintained unconditionally, making
> > failures in ceph_submit_write() recoverable as originally intended.
> >
> > Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > ---
> >  fs/ceph/addr.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 63b75d214210..c3e0b5b429ea 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
> >                         unlock_page(page);
> >                 }
> >
> > +               if (ceph_wbc->from_pool) {
> > +                       mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> > +                       ceph_wbc->from_pool = false;
> > +               } else
> > +                       kfree(ceph_wbc->pages);
> > +               ceph_wbc->pages = NULL;
> > +               ceph_wbc->locked_pages = 0;
>
> Hi Sam,
>
> While I don't see anything wrong with the patch per se, I can't help
> but question the existence of this entire branch along with the meaning
> of the error.
>
> ceph_writepages_start() is the only caller of ceph_submit_write() and
> it already calls ceph_inc_osd_stopping_blocker() at the top where the
> error can be handled naturally -- nothing needs to be unlocked or freed
> at that point.  Since mdsc->stopping_blockers is just a counter, all
> calls made by ceph_submit_write() invocations in a loop would be
> "contained" within that ceph_writepages_start() call.  The only benefit
> achieved is potentially faster response to the MDS client moving to
> CEPH_MDSC_STOPPING_FLUSHING state, but it's rather dubious because
> sneaking in/having to wait for some more OSD requests isn't really the
> end of the world.
>
> Rather than patching the error path, I wonder if instead of calling
> ceph_inc_osd_stopping_blocker() the counter could just be incremented
> unconditionally, with the check for CEPH_MDSC_STOPPING_FLUSHING bypassed
> there?  This could be wrapped into a new helper that could also assert
> that the counter is already elevated before the increment.

Something along the lines of

void __ceph_inc_osd_stopping_blocker(struct ceph_mds_client *mdsc)
{
        BUG_ON(!atomic_inc_not_zero(&mdsc->stopping_blockers));
}

and switching to __ceph_inc_osd_stopping_blocker() in place of
ceph_inc_osd_stopping_blocker() in ceph_submit_write().

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-02-11 14:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260126022715.404984-1-CFSworks@gmail.com>
2026-01-26  2:27 ` [PATCH 1/2] ceph: free page array when ceph_submit_write() fails Sam Edwards
2026-01-26 22:51   ` Viacheslav Dubeyko
2026-01-29  0:10     ` Sam Edwards
2026-02-11 14:10   ` Ilya Dryomov
2026-02-11 14:52     ` Ilya Dryomov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox