* [REGRESSION] [PATCH v2] ceph: fix num_ops OBOE when crypto allocation fails
@ 2026-03-18 2:37 Sam Edwards
2026-03-18 19:41 ` Viacheslav Dubeyko
From: Sam Edwards @ 2026-03-18 2:37 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
Cc: Milind Changire, Xiubo Li, Jeff Layton, ceph-devel, linux-kernel,
regressions, Sam Edwards, stable
move_dirty_folio_in_page_array() may fail if the file is encrypted, the
dirty folio is not the first in the batch, and it fails to allocate a
bounce buffer to hold the ciphertext. When that happens,
ceph_process_folio_batch() simply redirties the folio and flushes the
current batch -- it can retry that folio in a future batch.
However, if this failed folio is not contiguous with the last folio that
did make it into the batch, then ceph_process_folio_batch() has already
incremented `ceph_wbc->num_ops`; because it doesn't follow through and
add the discontiguous folio to the array, ceph_submit_write() -- which
expects that `ceph_wbc->num_ops` accurately reflects the number of
contiguous ranges (and therefore the required number of "write extent"
ops) in the writeback -- will panic the kernel:
BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
This issue can be reproduced on affected kernels by writing to
fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat
pattern (total filesize should not matter) and gradually increasing the
system's memory pressure until a bounce buffer allocation fails.
Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct
value when move_dirty_folio_in_page_array() fails, but the folio already
started counting a new (i.e. still-empty) extent.
The defect corrected by this patch has existed since 2022 (see first
`Fixes:`), but another bug blocked multi-folio encrypted writeback until
recently (see second `Fixes:`). The second commit made it into 6.18.16,
6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch
therefore fixes a regression (panic) introduced by cac190c7674f.
Cc: stable@vger.kernel.org # v6.18+
Fixes: d55207717ded ("ceph: add encryption support to writepage and writepages")
Fixes: cac190c7674f ("ceph: fix write storm on fscrypted files")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
Changes v1->v2:
- Added a paragraph to the commit log briefly explaining the I/O pattern to
reproduce the issue (thanks Slava)
- Additionally Cc'd regressions@lists.linux.dev as required when handling
regressions
Feedback not addressed:
- "Commit message should link to the mentioned BUG_ON line in a source listing"
(link would not really help anyone, and the line is a moving target anyway)
- "Commit message should indicate that ceph_wbc->num_ops is passed to
ceph_osdc_new_request() to explain why ceph_wbc->num_ops == req->r_num_ops"
(ceph_wbc->num_ops is easy enough to search; and the cause->effect of the
BUG_ON() is secondary to the central point that ceph_process_folio_batch()
is responsible for ensuring ceph_wbc->num_ops is correct before returning)
- "An issue should be filed in the Ceph Redmine, linked via Closes:"
(thanks Ilya for clarifying this is unnecessary)
---
fs/ceph/addr.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e87b3bb94ee8..f366e159ffa6 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1366,6 +1366,10 @@ void ceph_process_folio_batch(struct address_space *mapping,
rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
folio);
if (rc) {
+ /* Did we just begin a new contiguous op? Nevermind! */
+ if (ceph_wbc->len == 0)
+ ceph_wbc->num_ops--;
+
folio_redirty_for_writepage(wbc, folio);
folio_unlock(folio);
break;
--
2.52.0
* Re: [REGRESSION] [PATCH v2] ceph: fix num_ops OBOE when crypto allocation fails
2026-03-18 2:37 [REGRESSION] [PATCH v2] ceph: fix num_ops OBOE when crypto allocation fails Sam Edwards
@ 2026-03-18 19:41 ` Viacheslav Dubeyko
2026-03-19 19:14 ` Viacheslav Dubeyko
2026-03-25 2:56 ` Sam Edwards
From: Viacheslav Dubeyko @ 2026-03-18 19:41 UTC (permalink / raw)
To: idryomov@gmail.com, Alex Markuze, cfsworks@gmail.com,
slava@dubeyko.com
Cc: Milind Changire, stable@vger.kernel.org, Xiubo Li,
jlayton@kernel.org, linux-kernel@vger.kernel.org,
ceph-devel@vger.kernel.org, regressions@lists.linux.dev
On Tue, 2026-03-17 at 19:37 -0700, Sam Edwards wrote:
> move_dirty_folio_in_page_array() may fail if the file is encrypted, the
> dirty folio is not the first in the batch, and it fails to allocate a
> bounce buffer to hold the ciphertext. When that happens,
> ceph_process_folio_batch() simply redirties the folio and flushes the
> current batch -- it can retry that folio in a future batch.
>
> However, if this failed folio is not contiguous with the last folio that
> did make it into the batch, then ceph_process_folio_batch() has already
> incremented `ceph_wbc->num_ops`; because it doesn't follow through and
> add the discontiguous folio to the array, ceph_submit_write() -- which
> expects that `ceph_wbc->num_ops` accurately reflects the number of
> contiguous ranges (and therefore the required number of "write extent"
> ops) in the writeback -- will panic the kernel:
>
> BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
>
> This issue can be reproduced on affected kernels by writing to
> fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat
> pattern (total filesize should not matter) and gradually increasing the
> system's memory pressure until a bounce buffer allocation fails.
>
> Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct
> value when move_dirty_folio_in_page_array() fails, but the folio already
> started counting a new (i.e. still-empty) extent.
>
> The defect corrected by this patch has existed since 2022 (see first
> `Fixes:`), but another bug blocked multi-folio encrypted writeback until
> recently (see second `Fixes:`). The second commit made it into 6.18.16,
> 6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch
> therefore fixes a regression (panic) introduced by cac190c7674f.
>
> Cc: stable@vger.kernel.org # v6.18+
> Fixes: d55207717ded ("ceph: add encryption support to writepage and writepages")
> Fixes: cac190c7674f ("ceph: fix write storm on fscrypted files")
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>
> Changes v1->v2:
> - Added a paragraph to the commit log briefly explaining the I/O pattern to
> reproduce the issue (thanks Slava)
>
> - Additionally Cc'd regressions@lists.linux.dev as required when handling
> regressions
>
> Feedback not addressed:
> - "Commit message should link to the mentioned BUG_ON line in a source listing"
> (link would not really help anyone, and the line is a moving target anyway)
My request was to identify the location of:
BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
The location of this code pattern is not at all clear from the commit message.
There are two possible ways:
(1) Link https://elixir.bootlin.com/linux/v7.0-rc4/source/fs/ceph/addr.c#L1555.
Note that the link includes the kernel version, so even if the line number
changes over time, this link will always identify the position of the code
pattern in v7.0-rc4, for example.
(2) You can show the function that contains this code pattern:
static
int ceph_submit_write(struct address_space *mapping,
struct writeback_control *wbc,
struct ceph_writeback_ctl *ceph_wbc)
{
<skipped>
BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
<skipped>
}
>
> - "Commit message should indicate that ceph_wbc->num_ops is passed to
> ceph_osdc_new_request() to explain why ceph_wbc->num_ops == req->r_num_ops"
> (ceph_wbc->num_ops is easy enough to search; and the cause->effect of the
> BUG_ON() is secondary to the central point that ceph_process_folio_batch()
> is responsible for ensuring ceph_wbc->num_ops is correct before returning)
>
> - "An issue should be filed in the Ceph Redmine, linked via Closes:"
> (thanks Ilya for clarifying this is unnecessary)
>
> ---
> fs/ceph/addr.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index e87b3bb94ee8..f366e159ffa6 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1366,6 +1366,10 @@ void ceph_process_folio_batch(struct address_space *mapping,
> rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
> folio);
> if (rc) {
> + /* Did we just begin a new contiguous op? Nevermind! */
> + if (ceph_wbc->len == 0)
> + ceph_wbc->num_ops--;
> +
> folio_redirty_for_writepage(wbc, folio);
> folio_unlock(folio);
> break;
Let me run the xfstests for the patch. I'll be back with the result ASAP.
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Thanks,
Slava.
* Re: [REGRESSION] [PATCH v2] ceph: fix num_ops OBOE when crypto allocation fails
2026-03-18 19:41 ` Viacheslav Dubeyko
@ 2026-03-19 19:14 ` Viacheslav Dubeyko
2026-03-25 2:56 ` Sam Edwards
From: Viacheslav Dubeyko @ 2026-03-19 19:14 UTC (permalink / raw)
To: Viacheslav Dubeyko, idryomov@gmail.com, Alex Markuze,
cfsworks@gmail.com
Cc: Milind Changire, stable@vger.kernel.org, Xiubo Li,
jlayton@kernel.org, linux-kernel@vger.kernel.org,
ceph-devel@vger.kernel.org, regressions@lists.linux.dev
On Wed, 2026-03-18 at 19:41 +0000, Viacheslav Dubeyko wrote:
> ...
>
> Let me run the xfstests for the patch. I'll be back with the result ASAP.
>
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>
I don't see any new issues during the xfstests run.
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Thanks,
Slava.
* Re: [REGRESSION] [PATCH v2] ceph: fix num_ops OBOE when crypto allocation fails
2026-03-18 19:41 ` Viacheslav Dubeyko
2026-03-19 19:14 ` Viacheslav Dubeyko
@ 2026-03-25 2:56 ` Sam Edwards
2026-03-25 11:55 ` Ilya Dryomov
1 sibling, 1 reply; 5+ messages in thread
From: Sam Edwards @ 2026-03-25 2:56 UTC (permalink / raw)
To: Viacheslav Dubeyko
Cc: idryomov@gmail.com, Alex Markuze, slava@dubeyko.com,
Milind Changire, stable@vger.kernel.org, Xiubo Li,
jlayton@kernel.org, linux-kernel@vger.kernel.org,
ceph-devel@vger.kernel.org, regressions@lists.linux.dev
On Wed, Mar 18, 2026 at 12:42 PM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
> ...
Hi Slava,
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
This looks like you gave "for future reference" feedback and provided
an R-b tag for the current version of the patch; is that it? Or is this
a tag to roll forward to a v3 with your feedback applied?
If necessary to pass review, I can do something like your (2) and
amend the commit message:
<excerpt>
contiguous ranges (and therefore the required number of "write extent"
ops) in the writeback -- will panic the kernel:
/* in ceph_submit_write() */
req = ceph_osdc_new_request(/* ... */, ceph_wbc->num_ops, /* ... */);
/* ... */
BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
This issue can be reproduced on affected kernels by writing to
</excerpt>
But I fear adding even that much sacrifices clarity: the central point
is that num_ops needs to be correct when ceph_process_folio_batch()
returns; I understand that documenting the symptom of the problem
(where it panics) is an important secondary goal for helping affected
users/stable/downstreams understand the impact and/or discover the
commit, but I'm also trying to be respectful of their time by not
reiterating code to someone who would just CTRL+F the source file if
they wanted this level of detail.
Best,
Sam
* Re: [REGRESSION] [PATCH v2] ceph: fix num_ops OBOE when crypto allocation fails
2026-03-25 2:56 ` Sam Edwards
@ 2026-03-25 11:55 ` Ilya Dryomov
From: Ilya Dryomov @ 2026-03-25 11:55 UTC (permalink / raw)
To: Sam Edwards
Cc: Viacheslav Dubeyko, Alex Markuze, slava@dubeyko.com,
Milind Changire, stable@vger.kernel.org, Xiubo Li,
jlayton@kernel.org, linux-kernel@vger.kernel.org,
ceph-devel@vger.kernel.org, regressions@lists.linux.dev
On Wed, Mar 25, 2026 at 3:56 AM Sam Edwards <cfsworks@gmail.com> wrote:
>
> On Wed, Mar 18, 2026 at 12:42 PM Viacheslav Dubeyko
> <Slava.Dubeyko@ibm.com> wrote:
> > ...
>
> Hi Slava,
>
> > Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>
> This looks like you gave "for future reference" feedback and provided
> a R-b tag for the current version of the patch; is that it? Or is this
> a tag to roll forward to a v3 with your feedback applied?
Hi Sam,
The patch was applied as is last week:
https://github.com/ceph/ceph-client/commit/681a6d350eff104294bf8aceebb627a08c037298
Thanks,
Ilya