* [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()
@ 2025-08-27 18:17 Max Kellermann
2025-08-27 19:07 ` Viacheslav Dubeyko
2025-08-28 18:54 ` Viacheslav Dubeyko
0 siblings, 2 replies; 6+ messages in thread
From: Max Kellermann @ 2025-08-27 18:17 UTC (permalink / raw)
To: Slava.Dubeyko, xiubli, idryomov, amarkuze, ceph-devel,
linux-kernel
Cc: Max Kellermann, stable
The function ceph_process_folio_batch() sets folio_batch entries to
NULL, which is an illegal state. Before folio_batch_release() crashes
due to this API violation, the function
ceph_shift_unused_folios_left() is supposed to remove those NULLs from
the array.
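To illustrate what "removing those NULLs" means, here is a minimal userspace sketch of the compaction step. It is a model under stated assumptions, not the kernel function: `struct batch`, `BATCH_SIZE` and `shift_unused_left()` are hypothetical names standing in for `struct folio_batch` and ceph_shift_unused_folios_left().

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 31	/* hypothetical capacity, not taken from the kernel */

/* Simplified stand-in for a folio batch: a count plus an array of pointers. */
struct batch {
	unsigned int nr;
	void *entries[BATCH_SIZE];
};

/*
 * Move all non-NULL entries to the front, preserving their order, and
 * shrink the count so that no NULL slot remains within [0, nr).
 */
static void shift_unused_left(struct batch *b)
{
	unsigned int src, dst = 0;

	assert(b->nr <= BATCH_SIZE);

	for (src = 0; src < b->nr; src++) {
		if (b->entries[src] != NULL)
			b->entries[dst++] = b->entries[src];
	}
	b->nr = dst;
}
```

After the call, every slot below `nr` holds a valid pointer, which is the state a release function that dereferences each entry relies on.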
However, since commit ce80b76dd327 ("ceph: introduce
ceph_process_folio_batch() method"), this shifting doesn't happen
anymore because the "for" loop got moved to
ceph_process_folio_batch(), and now the `i` variable that remains in
ceph_writepages_start() doesn't get incremented anymore, making the
shifting effectively unreachable much of the time.
Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
method") added more preconditions for doing the shift, replacing the
`i` check (with something that is still just as broken):
- if ceph_process_folio_batch() fails, shifting never happens
- if ceph_move_dirty_page_in_page_array() was never called (because
ceph_process_folio_batch() has returned early for any of several
reasons), shifting never happens
- if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
has returned early for some of the reasons mentioned above or
because ceph_move_dirty_page_in_page_array() has failed), shifting
never happens
Since those two commits, any problem in ceph_process_folio_batch()
could crash the kernel, e.g. this way:
BUG: kernel NULL pointer dereference, address: 0000000000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
Workqueue: writeback wb_workfn (flush-ceph-1)
RIP: 0010:folios_put_refs+0x85/0x140
Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
ceph_writepages_start+0xeb9/0x1410
The crash can be reproduced easily by changing the
ceph_check_page_before_write() return value to `-E2BIG`.
(Interestingly, the crash happens only if `huge_zero_folio` has
already been allocated; without `huge_zero_folio`,
is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
entries instead of dereferencing them. That makes reproducing the bug
somewhat unreliable. See
https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
for a discussion of this detail.)
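The effect described in the parenthesis above can be modelled in userspace. This is only a sketch of the reasoning, assuming the check is a plain pointer comparison against a lazily allocated global; `struct item`, `special_item`, `is_special()` and `put_refs()` are hypothetical stand-ins for the folio, `huge_zero_folio`, is_huge_zero_folio() and folios_put_refs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	int refs;
};

/* Stand-in for the lazily allocated huge zero folio; NULL until allocated. */
static struct item *special_item = NULL;

/* Pointer comparison only: true for NULL while special_item is still NULL. */
static bool is_special(const struct item *it)
{
	return it == special_item;
}

/*
 * Drop one reference on each entry, skipping the special item.
 * Returns the number of entries whose refcount was touched, or -1 at the
 * first NULL entry that would have been dereferenced (where the kernel
 * would oops instead).
 */
static int put_refs(struct item **items, unsigned int nr)
{
	unsigned int i;
	int touched = 0;

	assert(items != NULL);

	for (i = 0; i < nr; i++) {
		if (is_special(items[i]))
			continue;	/* also skips NULL if special_item is NULL */
		if (items[i] == NULL)
			return -1;
		items[i]->refs--;
		touched++;
	}
	return touched;
}
```

With `special_item` still NULL, a NULL entry compares equal and is silently skipped; once `special_item` points at a real object, the same NULL entry reaches the dereference path.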
My suggestion is to move the ceph_shift_unused_folios_left() call to
right after ceph_process_folio_batch() to ensure it always gets called
to fix up the illegal folio_batch state.
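The difference between the two orderings can be sketched with a small userspace model. Everything here is an assumption made for illustration: `process()` is a hypothetical stand-in for ceph_process_folio_batch() that takes over the first entry and then returns the given `rc`, and `processed` models `processed_in_fbatch`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 4	/* hypothetical capacity for this model */

struct batch {
	unsigned int nr;
	void *entries[BATCH_SIZE];
};

/* Compact non-NULL entries to the front and shrink the count. */
static void shift_unused_left(struct batch *b)
{
	unsigned int src, dst = 0;

	assert(b->nr <= BATCH_SIZE);
	for (src = 0; src < b->nr; src++)
		if (b->entries[src] != NULL)
			b->entries[dst++] = b->entries[src];
	b->nr = dst;
}

/* Hypothetical processing step: takes over the first entry, then returns rc. */
static int process(struct batch *b, int rc)
{
	if (b->nr > 0)
		b->entries[0] = NULL;
	return rc;
}

/* True if any slot below the count is NULL. */
static bool has_null(const struct batch *b)
{
	unsigned int i;

	for (i = 0; i < b->nr; i++)
		if (b->entries[i] == NULL)
			return true;
	return false;
}

/* Old ordering: the shift is only reached on success with processed != 0. */
static bool old_flow_leaves_null(struct batch *b, int rc, unsigned int processed)
{
	if (process(b, rc) == 0 && processed != 0)
		shift_unused_left(b);
	return has_null(b);
}

/* Patched ordering: shift unconditionally, before looking at rc. */
static bool new_flow_leaves_null(struct batch *b, int rc)
{
	(void)process(b, rc);
	shift_unused_left(b);
	return has_null(b);
}
```

In this model, an error return under the old ordering leaves a NULL slot in the batch for the release path to trip over, while the patched ordering always hands over a compacted batch.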
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
fs/ceph/addr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8b202d789e93..8bc66b45dade 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1687,6 +1687,7 @@ static int ceph_writepages_start(struct address_space *mapping,
process_folio_batch:
rc = ceph_process_folio_batch(mapping, wbc, &ceph_wbc);
+ ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
if (rc)
goto release_folios;
@@ -1695,8 +1696,6 @@ static int ceph_writepages_start(struct address_space *mapping,
goto release_folios;
if (ceph_wbc.processed_in_fbatch) {
- ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
-
if (folio_batch_count(&ceph_wbc.fbatch) == 0 &&
ceph_wbc.locked_pages < ceph_wbc.max_pages) {
doutc(cl, "reached end fbatch, trying for more\n");
--
2.47.2
* Re: [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()
2025-08-27 18:17 [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left() Max Kellermann
@ 2025-08-27 19:07 ` Viacheslav Dubeyko
2025-08-28 18:54 ` Viacheslav Dubeyko
1 sibling, 0 replies; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-08-27 19:07 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org, max.kellermann@ionos.com, Xiubo Li,
idryomov@gmail.com, linux-kernel@vger.kernel.org, Alex Markuze
Cc: stable@vger.kernel.org
On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
> The function ceph_process_folio_batch() sets folio_batch entries to
> NULL, which is an illegal state. Before folio_batch_release() crashes
> due to this API violation, the function
> ceph_shift_unused_folios_left() is supposed to remove those NULLs from
> the array.
>
> However, since commit ce80b76dd327 ("ceph: introduce
> ceph_process_folio_batch() method"), this shifting doesn't happen
> anymore because the "for" loop got moved to
> ceph_process_folio_batch(), and now the `i` variable that remains in
> ceph_writepages_start() doesn't get incremented anymore, making the
> shifting effectively unreachable much of the time.
>
> Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
> method") added more preconditions for doing the shift, replacing the
> `i` check (with something that is still just as broken):
>
> - if ceph_process_folio_batch() fails, shifting never happens
>
> - if ceph_move_dirty_page_in_page_array() was never called (because
> ceph_process_folio_batch() has returned early for some of various
> reasons), shifting never happens
>
> - if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
> has returned early for some of the reasons mentioned above or
> because ceph_move_dirty_page_in_page_array() has failed), shifting
> never happens
>
> Since those two commits, any problem in ceph_process_folio_batch()
> could crash the kernel, e.g. this way:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000034
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0002 [#1] SMP NOPTI
> CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
> Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
> Workqueue: writeback wb_workfn (flush-ceph-1)
> RIP: 0010:folios_put_refs+0x85/0x140
> Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
> RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
> RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
> R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
> FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ceph_writepages_start+0xeb9/0x1410
>
> The crash can be reproduced easily by changing the
> ceph_check_page_before_write() return value to `-E2BIG`.
>
> (Interestingly, the crash happens only if `huge_zero_folio` has
> already been allocated; without `huge_zero_folio`,
> is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
> entries instead of dereferencing them. That makes reproducing the bug
> somewhat unreliable. See
> https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
> for a discussion of this detail.)
>
> My suggestion is to move the ceph_shift_unused_folios_left() to right
> after ceph_process_folio_batch() to ensure it always gets called to
> fix up the illegal folio_batch state.
>
> Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
> Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
> fs/ceph/addr.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 8b202d789e93..8bc66b45dade 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1687,6 +1687,7 @@ static int ceph_writepages_start(struct address_space *mapping,
>
> process_folio_batch:
> rc = ceph_process_folio_batch(mapping, wbc, &ceph_wbc);
> + ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
> if (rc)
> goto release_folios;
>
> @@ -1695,8 +1696,6 @@ static int ceph_writepages_start(struct address_space *mapping,
> goto release_folios;
>
> if (ceph_wbc.processed_in_fbatch) {
> - ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
> -
> if (folio_batch_count(&ceph_wbc.fbatch) == 0 &&
> ceph_wbc.locked_pages < ceph_wbc.max_pages) {
> doutc(cl, "reached end fbatch, trying for more\n");
Let us try to reproduce the issue and to test the patch.
Thanks,
Slava.
* Re: [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()
2025-08-27 18:17 [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left() Max Kellermann
2025-08-27 19:07 ` Viacheslav Dubeyko
@ 2025-08-28 18:54 ` Viacheslav Dubeyko
2025-08-28 19:05 ` Ilya Dryomov
1 sibling, 1 reply; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-08-28 18:54 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org, max.kellermann@ionos.com, Xiubo Li,
idryomov@gmail.com, linux-kernel@vger.kernel.org, Alex Markuze
Cc: stable@vger.kernel.org
On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
> The function ceph_process_folio_batch() sets folio_batch entries to
> NULL, which is an illegal state. Before folio_batch_release() crashes
> due to this API violation, the function
> ceph_shift_unused_folios_left() is supposed to remove those NULLs from
> the array.
>
> However, since commit ce80b76dd327 ("ceph: introduce
> ceph_process_folio_batch() method"), this shifting doesn't happen
> anymore because the "for" loop got moved to
> ceph_process_folio_batch(), and now the `i` variable that remains in
> ceph_writepages_start() doesn't get incremented anymore, making the
> shifting effectively unreachable much of the time.
>
> Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
> method") added more preconditions for doing the shift, replacing the
> `i` check (with something that is still just as broken):
>
> - if ceph_process_folio_batch() fails, shifting never happens
>
> - if ceph_move_dirty_page_in_page_array() was never called (because
> ceph_process_folio_batch() has returned early for some of various
> reasons), shifting never happens
>
> - if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
> has returned early for some of the reasons mentioned above or
> because ceph_move_dirty_page_in_page_array() has failed), shifting
> never happens
>
> Since those two commits, any problem in ceph_process_folio_batch()
> could crash the kernel, e.g. this way:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000034
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0002 [#1] SMP NOPTI
> CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
> Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
> Workqueue: writeback wb_workfn (flush-ceph-1)
> RIP: 0010:folios_put_refs+0x85/0x140
> Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
> RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
> RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
> R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
> FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ceph_writepages_start+0xeb9/0x1410
>
> The crash can be reproduced easily by changing the
> ceph_check_page_before_write() return value to `-E2BIG`.
>
I cannot reproduce the crash. If ceph_check_page_before_write() returns
`-E2BIG`, nothing happens: there is no crash, but the file system driver can
no longer process any write operations. So this does not look like a recipe
for reproducing the issue, and I cannot confirm that the patch fixes it
without a clear way to reproduce it.
Could you please provide a clearer explanation of the reproduction path?
Thanks,
Slava.
> (Interestingly, the crash happens only if `huge_zero_folio` has
> already been allocated; without `huge_zero_folio`,
> is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
> entries instead of dereferencing them. That makes reproducing the bug
> somewhat unreliable. See
> https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
> for a discussion of this detail.)
>
> My suggestion is to move the ceph_shift_unused_folios_left() to right
> after ceph_process_folio_batch() to ensure it always gets called to
> fix up the illegal folio_batch state.
>
> Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
> Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
> fs/ceph/addr.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 8b202d789e93..8bc66b45dade 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1687,6 +1687,7 @@ static int ceph_writepages_start(struct address_space *mapping,
>
> process_folio_batch:
> rc = ceph_process_folio_batch(mapping, wbc, &ceph_wbc);
> + ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
> if (rc)
> goto release_folios;
>
> @@ -1695,8 +1696,6 @@ static int ceph_writepages_start(struct address_space *mapping,
> goto release_folios;
>
> if (ceph_wbc.processed_in_fbatch) {
> - ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
> -
> if (folio_batch_count(&ceph_wbc.fbatch) == 0 &&
> ceph_wbc.locked_pages < ceph_wbc.max_pages) {
> doutc(cl, "reached end fbatch, trying for more\n");
* Re: [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()
2025-08-28 18:54 ` Viacheslav Dubeyko
@ 2025-08-28 19:05 ` Ilya Dryomov
2025-08-28 19:08 ` Viacheslav Dubeyko
0 siblings, 1 reply; 6+ messages in thread
From: Ilya Dryomov @ 2025-08-28 19:05 UTC (permalink / raw)
To: Viacheslav Dubeyko
Cc: ceph-devel@vger.kernel.org, max.kellermann@ionos.com, Xiubo Li,
linux-kernel@vger.kernel.org, Alex Markuze,
stable@vger.kernel.org
On Thu, Aug 28, 2025 at 8:55 PM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
>
> On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
> > The function ceph_process_folio_batch() sets folio_batch entries to
> > NULL, which is an illegal state. Before folio_batch_release() crashes
> > due to this API violation, the function
> > ceph_shift_unused_folios_left() is supposed to remove those NULLs from
> > the array.
> >
> > However, since commit ce80b76dd327 ("ceph: introduce
> > ceph_process_folio_batch() method"), this shifting doesn't happen
> > anymore because the "for" loop got moved to
> > ceph_process_folio_batch(), and now the `i` variable that remains in
> > ceph_writepages_start() doesn't get incremented anymore, making the
> > shifting effectively unreachable much of the time.
> >
> > Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
> > method") added more preconditions for doing the shift, replacing the
> > `i` check (with something that is still just as broken):
> >
> > - if ceph_process_folio_batch() fails, shifting never happens
> >
> > - if ceph_move_dirty_page_in_page_array() was never called (because
> > ceph_process_folio_batch() has returned early for some of various
> > reasons), shifting never happens
> >
> > - if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
> > has returned early for some of the reasons mentioned above or
> > because ceph_move_dirty_page_in_page_array() has failed), shifting
> > never happens
> >
> > Since those two commits, any problem in ceph_process_folio_batch()
> > could crash the kernel, e.g. this way:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000034
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 0 P4D 0
> > Oops: Oops: 0002 [#1] SMP NOPTI
> > CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
> > Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
> > Workqueue: writeback wb_workfn (flush-ceph-1)
> > RIP: 0010:folios_put_refs+0x85/0x140
> > Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
> > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
> > RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
> > RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
> > RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
> > R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
> > FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > ceph_writepages_start+0xeb9/0x1410
> >
> > The crash can be reproduced easily by changing the
> > ceph_check_page_before_write() return value to `-E2BIG`.
> >
>
> I cannot reproduce the crash. If ceph_check_page_before_write() returns
> `-E2BIG`, nothing happens: there is no crash, but the file system driver can
> no longer process any write operations. So this does not look like a recipe
> for reproducing the issue, and I cannot confirm that the patch fixes it
> without a clear way to reproduce it.
>
> Could you please provide a clearer explanation of the reproduction path?
Hi Slava,
Was this bit taken into account?
(Interestingly, the crash happens only if `huge_zero_folio` has
already been allocated; without `huge_zero_folio`,
is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
entries instead of dereferencing them. That makes reproducing the bug
somewhat unreliable. See
https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
for a discussion of this detail.)
Thanks,
Ilya
* RE: [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()
2025-08-28 19:05 ` Ilya Dryomov
@ 2025-08-28 19:08 ` Viacheslav Dubeyko
2025-08-28 21:37 ` Max Kellermann
0 siblings, 1 reply; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-08-28 19:08 UTC (permalink / raw)
To: idryomov@gmail.com
Cc: stable@vger.kernel.org, max.kellermann@ionos.com,
ceph-devel@vger.kernel.org, Xiubo Li,
linux-kernel@vger.kernel.org, Alex Markuze
On Thu, 2025-08-28 at 21:05 +0200, Ilya Dryomov wrote:
> On Thu, Aug 28, 2025 at 8:55 PM Viacheslav Dubeyko
> <Slava.Dubeyko@ibm.com> wrote:
> >
> > On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
> > > The function ceph_process_folio_batch() sets folio_batch entries to
> > > NULL, which is an illegal state. Before folio_batch_release() crashes
> > > due to this API violation, the function
> > > ceph_shift_unused_folios_left() is supposed to remove those NULLs from
> > > the array.
> > >
> > > However, since commit ce80b76dd327 ("ceph: introduce
> > > ceph_process_folio_batch() method"), this shifting doesn't happen
> > > anymore because the "for" loop got moved to
> > > ceph_process_folio_batch(), and now the `i` variable that remains in
> > > ceph_writepages_start() doesn't get incremented anymore, making the
> > > shifting effectively unreachable much of the time.
> > >
> > > Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
> > > method") added more preconditions for doing the shift, replacing the
> > > `i` check (with something that is still just as broken):
> > >
> > > - if ceph_process_folio_batch() fails, shifting never happens
> > >
> > > - if ceph_move_dirty_page_in_page_array() was never called (because
> > > ceph_process_folio_batch() has returned early for some of various
> > > reasons), shifting never happens
> > >
> > > - if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
> > > has returned early for some of the reasons mentioned above or
> > > because ceph_move_dirty_page_in_page_array() has failed), shifting
> > > never happens
> > >
> > > Since those two commits, any problem in ceph_process_folio_batch()
> > > could crash the kernel, e.g. this way:
> > >
> > > BUG: kernel NULL pointer dereference, address: 0000000000000034
> > > #PF: supervisor write access in kernel mode
> > > #PF: error_code(0x0002) - not-present page
> > > PGD 0 P4D 0
> > > Oops: Oops: 0002 [#1] SMP NOPTI
> > > CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
> > > Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
> > > Workqueue: writeback wb_workfn (flush-ceph-1)
> > > RIP: 0010:folios_put_refs+0x85/0x140
> > > Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
> > > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
> > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
> > > RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
> > > RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
> > > R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
> > > FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
> > > PKRU: 55555554
> > > Call Trace:
> > > <TASK>
> > > ceph_writepages_start+0xeb9/0x1410
> > >
> > > The crash can be reproduced easily by changing the
> > > ceph_check_page_before_write() return value to `-E2BIG`.
> > >
> >
> > I cannot reproduce the crash. If ceph_check_page_before_write() returns
> > `-E2BIG`, nothing happens: there is no crash, but the file system driver can
> > no longer process any write operations. So this does not look like a recipe
> > for reproducing the issue, and I cannot confirm that the patch fixes it
> > without a clear way to reproduce it.
> >
> > Could you please provide a clearer explanation of the reproduction path?
>
> Hi Slava,
>
> Was this bit taken into account?
>
> (Interestingly, the crash happens only if `huge_zero_folio` has
> already been allocated; without `huge_zero_folio`,
> is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
> entries instead of dereferencing them. That makes reproducing the bug
> somewhat unreliable. See
> https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
> for a discussion of this detail.)
>
>
Hi Ilya,
And which practical steps do you see to reproduce it? :)
Thanks,
Slava.
* Re: [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()
2025-08-28 19:08 ` Viacheslav Dubeyko
@ 2025-08-28 21:37 ` Max Kellermann
0 siblings, 0 replies; 6+ messages in thread
From: Max Kellermann @ 2025-08-28 21:37 UTC (permalink / raw)
To: Viacheslav Dubeyko
Cc: idryomov@gmail.com, stable@vger.kernel.org,
ceph-devel@vger.kernel.org, Xiubo Li,
linux-kernel@vger.kernel.org, Alex Markuze
On Thu, Aug 28, 2025 at 9:08 PM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
> And which practical steps do you see to reproduce it? :)
Apply the patch in the link. Did you read that thread/patch?