[PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders
@ 2026-06-04 18:08 Michael Bommarito
  2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

The CephFS client decodes several variable-length fields from MDS and
monitor replies without checking the declared length against the bytes
actually present in the message. Each of the four sites below trusts a
32-bit or 64-bit length from the wire and then copies, advances over, or
loops on it. A malicious or compromised server (or, for the monitor map,
an on-path attacker on an unsigned messenger session) can drive an
out-of-bounds read or an unbounded loop in the client kernel. The fixes
add the missing ceph_decode_need()/ceph_decode_copy_safe() bound or an
explicit count cap, matching the idioms already used elsewhere in the
same decoders.

Patch 1 is the most serious: the over-read value is returned to user
space through getxattr(2), so it is a kernel-heap information leak
(AC:H, requires a user to read the attribute). Patches 2-4 are
out-of-bounds reads / an unbounded loop that crash or wedge the client
(denial of service, no information disclosure). For patch 3 the supplier
is the monitor, a more privileged cluster role than the MDS; it is
included as a hardening fix for the info_v 2/3 compatibility path.

All four were reproduced on x86_64 QEMU with CONFIG_KASAN=y by calling
the real decoder via an in-tree debugfs harness, and re-run with the
patch applied:

  patch 1: stock - __build_xattrs() stores val_len=304 from a 256-byte
           blob, then the getxattr memcpy is a slab-out-of-bounds read of
           304 bytes; patched - decode returns -EIO, no value stored.
  patch 2: stock - slab-out-of-bounds read of 512 bytes in
           handle_session(); patched - the bound rejects the record.
  patch 3: stock - slab-out-of-bounds read in ceph_mdsmap_decode();
           patched - decode returns -EINVAL.
  patch 4: stock - ceph_parse_deleg_inos() runs 1048640 iterations on a
           crafted len; patched - returns -EIO before the loop.

Well-formed replies still decode in every case (valid xattr value,
in-range delegation, in-bounds export-target array). These bugs were
found with AI assistance and are reported on the public list
accordingly, especially since they are mostly about a malicious
trusted server.

Michael Bommarito (4):
  ceph: bound xattr value length in __build_xattrs()
  ceph: bound MDSCapAuth path and fs_name decode in handle_session()
  ceph: bound num_export_targets array for mds info v2/v3
  ceph: cap delegated inode count in ceph_parse_deleg_inos()

 fs/ceph/mds_client.c | 22 ++++++++++++++++++++--
 fs/ceph/mdsmap.c     |  2 ++
 fs/ceph/super.h      |  9 +++++++++
 fs/ceph/xattr.c      |  1 +
 4 files changed, 32 insertions(+), 2 deletions(-)

base-commit: f72c95f3a516d87483e225ae081a402a09fd0127
-- 
2.53.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] ceph: bound xattr value length in __build_xattrs()
  2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
@ 2026-06-04 18:08 ` Michael Bommarito
  2026-06-04 19:50   ` Viacheslav Dubeyko
  2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

__build_xattrs() decodes the MDS-supplied xattr blob one attribute at a
time. For each attribute it reads a 32-bit name length, advances past the
name bytes, reads a 32-bit value length, records the value pointer, and
advances past the value bytes. The two length fields are read with
ceph_decode_32_safe(), but the value bytes themselves are advanced over
with a bare "p += len" and no ceph_decode_need() check that "len" bytes
remain in the blob.

For every attribute except the last, the next iteration's
ceph_decode_32_safe() on the following name length implicitly verifies
that the previous value did not run past the blob end. The final
attribute has no successor, so its decoded value length is never checked
against the blob bounds. A malicious or compromised metadata server can
set the last attribute's value length larger than the bytes actually
present in the blob.

The blob is a dedicated kvmalloc() allocation sized to the wire length
(ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the
oversized length in xattr->val_len verbatim, and a later getxattr(2) runs
memcpy(value, xattr->val, xattr->val_len) into a user-supplied buffer,
copying bytes past the end of the allocation back to user space.

Impact: a malicious metadata server discloses adjacent kernel heap bytes
to a local user via getxattr(2) on a CephFS file. Add the missing
ceph_decode_need() so an out-of-bounds value length on the final
attribute fails the decode and returns -EIO instead of being stored.

Fixes: 355da1eb7a1f ("ceph: inode operations")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
 fs/ceph/xattr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index e773be07f7674..d488bb8fc00ba 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode)
 			name = p;
 			p += len;
 			ceph_decode_32_safe(&p, end, len, bad);
+			ceph_decode_need(&p, end, len, bad);
 			val = p;
 			p += len;

-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] ceph: bound xattr value length in __build_xattrs()
  2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
@ 2026-06-04 19:50   ` Viacheslav Dubeyko
  0 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 19:50 UTC (permalink / raw)
  To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel

On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote:
> __build_xattrs() decodes the MDS-supplied xattr blob one attribute at
> a
> time. For each attribute it reads a 32-bit name length, advances past
> the
> name bytes, reads a 32-bit value length, records the value pointer,
> and
> advances past the value bytes. The two length fields are read with
> ceph_decode_32_safe(), but the value bytes themselves are advanced
> over
> with a bare "p += len" and no ceph_decode_need() check that "len"
> bytes
> remain in the blob.
> 
> For every attribute except the last, the next iteration's
> ceph_decode_32_safe() on the following name length implicitly
> verifies
> that the previous value did not run past the blob end. The final
> attribute has no successor, so its decoded value length is never
> checked
> against the blob bounds. A malicious or compromised metadata server
> can
> set the last attribute's value length larger than the bytes actually
> present in the blob.
> 
> The blob is a dedicated kvmalloc() allocation sized to the wire
> length
> (ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the
> oversized length in xattr->val_len verbatim, and a later getxattr(2)
> runs
> memcpy(value, xattr->val, xattr->val_len) into a user-supplied
> buffer,
> copying bytes past the end of the allocation back to user space.
> 
> Impact: a malicious metadata server discloses adjacent kernel heap
> bytes
> to a local user via getxattr(2) on a CephFS file. Add the missing
> ceph_decode_need() so an out-of-bounds value length on the final
> attribute fails the decode and returns -EIO instead of being stored.
> 
> Fixes: 355da1eb7a1f ("ceph: inode operations")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> ---
>  fs/ceph/xattr.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index e773be07f7674..d488bb8fc00ba 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode)
>  			name = p;
>  			p += len;
>  			ceph_decode_32_safe(&p, end, len, bad);
> +			ceph_decode_need(&p, end, len, bad);
>  			val = p;
>  			p += len;
>  

Makes sense.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session()
  2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
  2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
@ 2026-06-04 18:08 ` Michael Bommarito
  2026-06-04 19:54   ` Viacheslav Dubeyko
  2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

handle_session() decodes the MDSCapAuth records carried by a
CEPH_SESSION_OPEN message (msg_version >= 6). For each record the
match.path and match.fs_name byte strings are read by first decoding a
32-bit length and then copying that many bytes with the bare
ceph_decode_copy(). Unlike the surrounding fields, which all use the
_safe decode variants, these two copies are not preceded by a
ceph_decode_need() bounds check, and the enclosing MDSCapAuth and
MDSCapMatch struct_len fields are skipped rather than enforced as an
upper bound. A length larger than the bytes remaining in the message
front makes ceph_decode_copy() read past the end of the front buffer.

The message front is a dedicated allocation (ceph_msg_new2() ->
kvmalloc), so the over-read runs off that object. A malicious or
compromised MDS can trigger this with the first post-connect message on
mount, with no client-side user interaction; under KASAN it is reported
as a slab-out-of-bounds read in handle_session().

Impact: a malicious MDS can force the kernel client to read up to 4 GiB
past the message front allocation during session setup, crashing the
client (out-of-bounds read).

Switch both copies to ceph_decode_copy_safe(), which performs the
ceph_decode_need() bounds check before the copy and branches to the
existing bad label, matching the rest of the decoder and the error path
that frees the partially decoded cap_auths array.

Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
 fs/ceph/mds_client.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ed17e0023705e..4f36ac73305dc 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4318,7 +4318,9 @@ static void handle_session(struct ceph_mds_session *session,
 					pr_err_client(cl, "No memory for path\n");
 					goto fail;
 				}
-				ceph_decode_copy(&p, cap_auths[i].match.path, _len);
+				ceph_decode_copy_safe(&p, end,
+						      cap_auths[i].match.path,
+						      _len, bad);

 				/* Remove the tailing '/' */
 				while (_len && cap_auths[i].match.path[_len - 1] == '/') {
@@ -4335,7 +4337,9 @@ static void handle_session(struct ceph_mds_session *session,
 					pr_err_client(cl, "No memory for fs_name\n");
 					goto fail;
 				}
-				ceph_decode_copy(&p, cap_auths[i].match.fs_name, _len);
+				ceph_decode_copy_safe(&p, end,
+						      cap_auths[i].match.fs_name,
+						      _len, bad);
 			}

 			ceph_decode_8_safe(&p, end, cap_auths[i].match.root_squash, bad);
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session()
  2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
@ 2026-06-04 19:54   ` Viacheslav Dubeyko
  0 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 19:54 UTC (permalink / raw)
  To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel

On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote:
> handle_session() decodes the MDSCapAuth records carried by a
> CEPH_SESSION_OPEN message (msg_version >= 6). For each record the
> match.path and match.fs_name byte strings are read by first decoding
> a
> 32-bit length and then copying that many bytes with the bare
> ceph_decode_copy(). Unlike the surrounding fields, which all use the
> _safe decode variants, these two copies are not preceded by a
> ceph_decode_need() bounds check, and the enclosing MDSCapAuth and
> MDSCapMatch struct_len fields are skipped rather than enforced as an
> upper bound. A length larger than the bytes remaining in the message
> front makes ceph_decode_copy() read past the end of the front buffer.
> 
> The message front is a dedicated allocation (ceph_msg_new2() ->
> kvmalloc), so the over-read runs off that object. A malicious or
> compromised MDS can trigger this with the first post-connect message
> on
> mount, with no client-side user interaction; under KASAN it is
> reported
> as a slab-out-of-bounds read in handle_session().
> 
> Impact: a malicious MDS can force the kernel client to read up to 4
> GiB
> past the message front allocation during session setup, crashing the
> client (out-of-bounds read).
> 
> Switch both copies to ceph_decode_copy_safe(), which performs the
> ceph_decode_need() bounds check before the copy and branches to the
> existing bad label, matching the rest of the decoder and the error
> path
> that frees the partially decoded cap_auths array.
> 
> Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session
> is opened")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> ---
>  fs/ceph/mds_client.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index ed17e0023705e..4f36ac73305dc 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4318,7 +4318,9 @@ static void handle_session(struct
> ceph_mds_session *session,
>  					pr_err_client(cl, "No memory
> for path\n");
>  					goto fail;
>  				}
> -				ceph_decode_copy(&p,
> cap_auths[i].match.path, _len);
> +				ceph_decode_copy_safe(&p, end,
> +						     
> cap_auths[i].match.path,
> +						      _len, bad);
>  
>  				/* Remove the tailing '/' */
>  				while (_len &&
> cap_auths[i].match.path[_len - 1] == '/') {
> @@ -4335,7 +4337,9 @@ static void handle_session(struct
> ceph_mds_session *session,
>  					pr_err_client(cl, "No memory
> for fs_name\n");
>  					goto fail;
>  				}
> -				ceph_decode_copy(&p,
> cap_auths[i].match.fs_name, _len);
> +				ceph_decode_copy_safe(&p, end,
> +						     
> cap_auths[i].match.fs_name,
> +						      _len, bad);
>  			}
>  
>  			ceph_decode_8_safe(&p, end,
> cap_auths[i].match.root_squash, bad);

Looks good.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3
  2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
  2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
  2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
@ 2026-06-04 18:08 ` Michael Bommarito
  2026-06-04 20:04   ` Viacheslav Dubeyko
  2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
  2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets from
each per-mds info record and advances the decode cursor by
num_export_targets * sizeof(u32) without first checking that many bytes
remain. The only upper-bound check that catches a runaway cursor
(*p > info_end) is gated on info_v >= 4, because info_end is left NULL
for info_v 2 and 3. When the monitor sends an MDS map whose per-mds
info version is 2 or 3 with an oversized num_export_targets, the cursor
moves past the message front buffer and the later export-targets loop
calls the unchecked ceph_decode_32() on out-of-bounds memory.

A kernel client processes CEPH_MSG_MDS_MAP from its monitor session
(net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to
ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound and
calls ceph_mdsmap_decode()). A malicious or compromised monitor, or an
on-path attacker on an unsigned/unencrypted messenger session, can
therefore drive an out-of-bounds read in the client kernel; on x86_64
with KASAN it is reported as a slab-out-of-bounds read in
ceph_mdsmap_decode(). The decoded values land in the internal
info->export_targets[] array, so the consequence is a kernel
out-of-bounds read, not an information leak to the attacker.

Impact: a malicious or compromised Ceph monitor sending an MDS map with
a per-mds info version of 2 or 3 and an oversized num_export_targets
field triggers an out-of-bounds read in the CephFS client kernel.

Add a ceph_decode_need() for num_export_targets * sizeof(u32) before
advancing the cursor, so the bound is enforced for every info_v >= 2,
not only info_v >= 4. This mirrors the count-then-need idiom already
used for m_data_pg_pools later in the same function.

Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
 fs/ceph/mdsmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index d8e46eb7e5eb5..7cb65f9f4783c 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -224,6 +224,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p,
 		*p += namelen;
 		if (info_v >= 2) {
 			ceph_decode_32_safe(p, end, num_export_targets, bad);
+			ceph_decode_need(p, end,
+					 num_export_targets * sizeof(u32), bad);
 			pexport_targets = *p;
 			*p += num_export_targets * sizeof(u32);
 		} else {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3
  2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
@ 2026-06-04 20:04   ` Viacheslav Dubeyko
  2026-06-04 20:23     ` Michael Bommarito
  0 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 20:04 UTC (permalink / raw)
  To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel

On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote:
> ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets
> from
> each per-mds info record and advances the decode cursor by
> num_export_targets * sizeof(u32) without first checking that many
> bytes
> remain. The only upper-bound check that catches a runaway cursor
> (*p > info_end) is gated on info_v >= 4, because info_end is left
> NULL
> for info_v 2 and 3. When the monitor sends an MDS map whose per-mds
> info version is 2 or 3 with an oversized num_export_targets, the
> cursor
> moves past the message front buffer and the later export-targets loop
> calls the unchecked ceph_decode_32() on out-of-bounds memory.
> 
> A kernel client processes CEPH_MSG_MDS_MAP from its monitor session
> (net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to
> ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound
> and
> calls ceph_mdsmap_decode()). A malicious or compromised monitor, or
> an
> on-path attacker on an unsigned/unencrypted messenger session, can
> therefore drive an out-of-bounds read in the client kernel; on x86_64
> with KASAN it is reported as a slab-out-of-bounds read in
> ceph_mdsmap_decode(). The decoded values land in the internal
> info->export_targets[] array, so the consequence is a kernel
> out-of-bounds read, not an information leak to the attacker.
> 
> Impact: a malicious or compromised Ceph monitor sending an MDS map
> with
> a per-mds info version of 2 or 3 and an oversized num_export_targets
> field triggers an out-of-bounds read in the CephFS client kernel.
> 
> Add a ceph_decode_need() for num_export_targets * sizeof(u32) before
> advancing the cursor, so the bound is enforced for every info_v >= 2,
> not only info_v >= 4. This mirrors the count-then-need idiom already
> used for m_data_pg_pools later in the same function.
> 
> Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support")
> Cc: stable@vger.kernel.org
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
>  fs/ceph/mdsmap.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
> index d8e46eb7e5eb5..7cb65f9f4783c 100644
> --- a/fs/ceph/mdsmap.c
> +++ b/fs/ceph/mdsmap.c
> @@ -224,6 +224,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct
> ceph_mds_client *mdsc, void **p,
>  		*p += namelen;
>  		if (info_v >= 2) {
>  			ceph_decode_32_safe(p, end,
> num_export_targets, bad);
> +			ceph_decode_need(p, end,
> +					 num_export_targets *
> sizeof(u32), bad);

We extract the num_export_targets from the message. What if the
num_export_targets is huge enough? I assume we could potentially have
overflow during the multiplication. What do you think?

Thanks,
Slava.

>  			pexport_targets = *p;
>  			*p += num_export_targets * sizeof(u32);
>  		} else {

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3
  2026-06-04 20:04   ` Viacheslav Dubeyko
@ 2026-06-04 20:23     ` Michael Bommarito
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 20:23 UTC (permalink / raw)
  To: Viacheslav Dubeyko; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel

> We extract the num_export_targets from the message. What if the
> num_export_targets is huge enough? I assume we could potentially have
> overflow during the multiplication. What do you think?

Good catch, we definitely need a v2 to address that with size_mul

Thanks,
Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
  2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
                   ` (2 preceding siblings ...)
  2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
@ 2026-06-04 18:09 ` Michael Bommarito
  2026-06-04 21:06   ` Viacheslav Dubeyko
  2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:09 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

ceph_parse_deleg_inos() decodes interval sets of delegated inode numbers
from an MDS create-with-delegation reply. For each set it reads a 64-bit
start and a 64-bit len with ceph_decode_64_safe(), which only validates
that the eight bytes are present in the message, not the value, and then
loops:

	while (len--) {
		xa_insert(&s->s_delegated_inos, start++,
			  DELEGATED_INO_AVAILABLE, GFP_KERNEL);
		...
	}

len is fully attacker controlled. A malicious or compromised MDS can send
a create reply whose interval set declares len near 2^63, driving an
effectively unbounded loop that performs a GFP_KERNEL xarray insert on
each iteration. This spins a kernel thread in the reply dispatch path and
exhausts memory; on a client that has negotiated CEPHFS_FEATURE_DELEG_INO
(enabled by default) a single reply is enough to wedge the mount.

A legitimate MDS delegates only a small range per set (the userspace MDS
prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject any
set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any legitimate
value) by treating the reply as malformed and returning -EIO, consistent
with the existing decode error handling. Normal delegations are well under
the cap and are unaffected.

Fixes: d4846487870897 ("ceph: decode interval_sets for delegated inos")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
 fs/ceph/mds_client.c | 14 ++++++++++++++
 fs/ceph/super.h      |  9 +++++++++
 2 files changed, 23 insertions(+)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4f36ac73305dc..0a084c4f3aae2 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void *end,
 				start, len);
 			continue;
 		}
+
+		/*
+		 * A legitimate MDS delegates a small range per set. Treat a
+		 * count larger than any plausible delegation window as a
+		 * malformed reply rather than spinning while(len--) and
+		 * inserting unbounded xarray entries.
+		 */
+		if (len > CEPH_MAX_DELEG_INOS) {
+			pr_warn_ratelimited_client(cl,
+				"rejecting oversized inode range delegation (start=0x%llx len=0x%llx)\n",
+				start, len);
+			return -EIO;
+		}
+
 		while (len--) {
 			int err = xa_insert(&s->s_delegated_inos, start++,
 					    DELEGATED_INO_AVAILABLE,
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index afc89ce91804e..43a9b075f344c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode *inode, void *data)
 #define CEPH_MDS_INO_LOG_OFFSET		(2 * CEPH_MAX_MDS)
 #define CEPH_INO_SYSTEM_BASE		((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY))
 
+/*
+ * Upper bound on the number of inodes the MDS may delegate to a client in a
+ * single interval set. The userspace MDS hands out at most a few thousand
+ * (mds_client_prealloc_inos, default 1000); 1M is far above any legitimate
+ * value and guards ceph_parse_deleg_inos() against an unbounded loop driven
+ * by an attacker controlled 64-bit length.
+ */
+#define CEPH_MAX_DELEG_INOS		(1024 * 1024)
+
 static inline bool ceph_vino_is_reserved(const struct ceph_vino vino)
 {
 	if (vino.ino >= CEPH_INO_SYSTEM_BASE ||
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
  2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
@ 2026-06-04 21:06   ` Viacheslav Dubeyko
  2026-06-04 21:41     ` Michael Bommarito
  0 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 21:06 UTC (permalink / raw)
  To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel

On Thu, 2026-06-04 at 14:09 -0400, Michael Bommarito wrote:
> ceph_parse_deleg_inos() decodes interval sets of delegated inode
> numbers
> from an MDS create-with-delegation reply. For each set it reads a 64-
> bit
> start and a 64-bit len with ceph_decode_64_safe(), which only
> validates
> that the eight bytes are present in the message, not the value, and
> then
> loops:
> 
> 	while (len--) {
> 		xa_insert(&s->s_delegated_inos, start++,
> 			  DELEGATED_INO_AVAILABLE, GFP_KERNEL);
> 		...
> 	}
> 
> len is fully attacker controlled. A malicious or compromised MDS can
> send
> a create reply whose interval set declares len near 2^63, driving an
> effectively unbounded loop that performs a GFP_KERNEL xarray insert
> on
> each iteration. This spins a kernel thread in the reply dispatch path
> and
> exhausts memory; on a client that has negotiated
> CEPHFS_FEATURE_DELEG_INO
> (enabled by default) a single reply is enough to wedge the mount.
> 
> A legitimate MDS delegates only a small range per set (the userspace
> MDS
> prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject
> any
> set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any
> legitimate
> value) by treating the reply as malformed and returning -EIO,
> consistent
> with the existing decode error handling. Normal delegations are well
> under
> the cap and are unaffected.
> 
> Fixes: d4846487870897 ("ceph: decode interval_sets for delegated
> inos")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> ---
>  fs/ceph/mds_client.c | 14 ++++++++++++++
>  fs/ceph/super.h      |  9 +++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 4f36ac73305dc..0a084c4f3aae2 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void
> *end,
>  				start, len);
>  			continue;
>  		}
> +
> +		/*
> +		 * A legitimate MDS delegates a small range per set.
> Treat a
> +		 * count larger than any plausible delegation window
> as a
> +		 * malformed reply rather than spinning while(len--)
> and
> +		 * inserting unbounded xarray entries.
> +		 */
> +		if (len > CEPH_MAX_DELEG_INOS) {
> +			pr_warn_ratelimited_client(cl,
> +				"rejecting oversized inode range
> delegation (start=0x%llx len=0x%llx)\n",
> +				start, len);
> +			return -EIO;
> +		}
> +
>  		while (len--) {
>  			int err = xa_insert(&s->s_delegated_inos,
> start++,
>  					    DELEGATED_INO_AVAILABLE,
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index afc89ce91804e..43a9b075f344c 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode
> *inode, void *data)
>  #define CEPH_MDS_INO_LOG_OFFSET		(2 * CEPH_MAX_MDS)
>  #define CEPH_INO_SYSTEM_BASE		((6*CEPH_MAX_MDS) +
> (CEPH_MAX_MDS * CEPH_NUM_STRAY))
>  
> +/*
> + * Upper bound on the number of inodes the MDS may delegate to a
> client in a
> + * single interval set. The userspace MDS hands out at most a few
> thousand
> + * (mds_client_prealloc_inos, default 1000); 1M is far above any
> legitimate
> + * value and guards ceph_parse_deleg_inos() against an unbounded
> loop driven
> + * by an attacker controlled 64-bit length.
> + */
> +#define CEPH_MAX_DELEG_INOS		(1024 * 1024)

Technically speaking, I am not completely convinced by this limit.
Also, this limit is for one single interval. But if attacker tries
multiple times, then we can still have memory pressure or run out of
memory. I think we need to consider more robust solution. Could we
improve the fix?

Thanks,
Slava.

> +
>  static inline bool ceph_vino_is_reserved(const struct ceph_vino
> vino)
>  {
>  	if (vino.ino >= CEPH_INO_SYSTEM_BASE ||

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
  2026-06-04 21:06   ` Viacheslav Dubeyko
@ 2026-06-04 21:41     ` Michael Bommarito
  2026-06-05 19:10       ` Viacheslav Dubeyko
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 21:41 UTC (permalink / raw)
  To: Viacheslav Dubeyko; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel

On Thu, Jun 4, 2026 at 5:06 PM Viacheslav Dubeyko <slava@dubeyko.com> wrote:
> Technically speaking, I am not completely convinced by this limit.
> Also, this limit is for one single interval. But if attacker tries
> multiple times, then we can still have memory pressure or run out of
> memory. I think we need to consider more robust solution. Could we
> improve the fix?

I looked at a per session limit but the lock management gets to be a
bit more complicated.  If you think that's the right direction, I can
implement v2 based on that.

About the count, if we move to per session, then do you agree that the
right number is based on some factor of mds_client_prealloc_inos?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
  2026-06-04 21:41     ` Michael Bommarito
@ 2026-06-05 19:10       ` Viacheslav Dubeyko
  0 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-05 19:10 UTC (permalink / raw)
  To: Michael Bommarito; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel

On Thu, 2026-06-04 at 17:41 -0400, Michael Bommarito wrote:
> On Thu, Jun 4, 2026 at 5:06 PM Viacheslav Dubeyko <slava@dubeyko.com>
> wrote:
> > Technically speaking, I am not completely convinced by this limit.
> > Also, this limit is for one single interval. But if attacker tries
> > multiple times, then we can still have memory pressure or run out
> > of
> > memory. I think we need to consider more robust solution. Could we
> > improve the fix?
> 
> I looked at a per session limit but the lock management gets to be a
> bit more complicated.  If you think that's the right direction, I can
> implement v2 based on that.
> 
> About the count, if we move to per session, then do you agree that
> the
> right number is based on some factor of mds_client_prealloc_inos?

I believe that it is right direction. Anyway, it's better than to
operate by assumption or hardcode the constant into the code.

Thanks,
Slava.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders
  2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
                   ` (3 preceding siblings ...)
  2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
@ 2026-06-06 19:00 ` Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
                     ` (3 more replies)
  4 siblings, 4 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

This is v2 of the CephFS decoder-bound series. The first two patches are
unchanged code-wise and now carry Slava's Reviewed-by. Patches 3 and 4
address the review feedback on overflow-safe sizing and aggregate delegated
inode bounds.

The four bugs are still independent:

  1/4 rejects a final xattr value length that runs past the xattr blob.
  2/4 bounds MDSCapAuth path and fs_name copies in handle_session().
  3/4 bounds the mdsmap export_targets array for info_v 2/3.
  4/4 bounds delegated-inode parsing by session population and by one
      reply's aggregate interval length.

Changes in v2:

  - Add Reviewed-by: Viacheslav Dubeyko to patches 1 and 2.
  - Patch 3 computes the export-targets byte count with size_mul() and
    reuses the checked length for the cursor advance.
  - Patch 4 replaces the per-interval cap with a per-session population
    counter and a per-reply interval budget, so repeated replies and
    duplicate ranges are bounded too. The cap stays a fixed client-side
    constant because the kernel client never sees the userspace
    mds_client_prealloc_inos option; it is sized as a generous multiple of
    that option's documented default of 1000.

Michael Bommarito (4):
  ceph: bound xattr value length in __build_xattrs()
  ceph: bound MDSCapAuth path and fs_name decode in handle_session()
  ceph: bound num_export_targets array for mds info v2/v3
  ceph: cap delegated inode count in ceph_parse_deleg_inos()

 fs/ceph/mds_client.c | 59 ++++++++++++++++++++++++++++++++++++++------
 fs/ceph/mds_client.h |  1 +
 fs/ceph/mdsmap.c     |  7 +++++-
 fs/ceph/super.h      |  9 +++++++
 fs/ceph/xattr.c      |  1 +
 5 files changed, 68 insertions(+), 9 deletions(-)


base-commit: f72c95f3a516d87483e225ae081a402a09fd0127
-- 
2.53.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs()
  2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
@ 2026-06-06 19:00   ` Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

__build_xattrs() decodes the MDS-supplied xattr blob one attribute at a
time. For each attribute it reads a 32-bit name length, advances past the
name bytes, reads a 32-bit value length, records the value pointer, and
advances past the value bytes. The two length fields are read with
ceph_decode_32_safe(), but the value bytes themselves are advanced over
with a bare "p += len" and no ceph_decode_need() check that "len" bytes
remain in the blob.

For every attribute except the last, the next iteration's
ceph_decode_32_safe() on the following name length implicitly verifies
that the previous value did not run past the blob end. The final
attribute has no successor, so its decoded value length is never checked
against the blob bounds. A malicious or compromised metadata server can
set the last attribute's value length larger than the bytes actually
present in the blob.

The blob is a dedicated kvmalloc() allocation sized to the wire length
(ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the
oversized length in xattr->val_len verbatim, and a later getxattr(2) runs
memcpy(value, xattr->val, xattr->val_len) into a user-supplied buffer,
copying bytes past the end of the allocation back to user space.

Impact: a malicious metadata server discloses adjacent kernel heap bytes
to a local user via getxattr(2) on a CephFS file. Add the missing
ceph_decode_need() so an out-of-bounds value length on the final
attribute fails the decode and returns -EIO instead of being stored.

Fixes: 355da1eb7a1f ("ceph: inode operations")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
---
v2:
- Add Reviewed-by from Viacheslav Dubeyko.

 fs/ceph/xattr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index e773be07f7674..d488bb8fc00ba 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode)
 			name = p;
 			p += len;
 			ceph_decode_32_safe(&p, end, len, bad);
+			ceph_decode_need(&p, end, len, bad);
 			val = p;
 			p += len;

-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session()
  2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
@ 2026-06-06 19:00   ` Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
  3 siblings, 0 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

handle_session() decodes the MDSCapAuth records carried by a
CEPH_SESSION_OPEN message (msg_version >= 6). For each record the
match.path and match.fs_name byte strings are read by first decoding a
32-bit length and then copying that many bytes with the bare
ceph_decode_copy(). Unlike the surrounding fields, which all use the
_safe decode variants, these two copies are not preceded by a
ceph_decode_need() bounds check, and the enclosing MDSCapAuth and
MDSCapMatch struct_len fields are skipped rather than enforced as an
upper bound. A length larger than the bytes remaining in the message
front makes ceph_decode_copy() read past the end of the front buffer.

The message front is a dedicated allocation (ceph_msg_new2() ->
kvmalloc), so the over-read runs off that object. A malicious or
compromised MDS can trigger this with the first post-connect message on
mount, with no client-side user interaction; under KASAN it is reported
as a slab-out-of-bounds read in handle_session().

Impact: a malicious MDS can force the kernel client to read up to 4 GiB
past the message front allocation during session setup, crashing the
client (out-of-bounds read).

Switch both copies to ceph_decode_copy_safe(), which performs the
ceph_decode_need() bounds check before the copy and branches to the
existing bad label, matching the rest of the decoder and the error path
that frees the partially decoded cap_auths array.

Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
---
v2:
- Add Reviewed-by from Viacheslav Dubeyko.

 fs/ceph/mds_client.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ed17e0023705e..4f36ac73305dc 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4318,7 +4318,9 @@ static void handle_session(struct ceph_mds_session *session,
 					pr_err_client(cl, "No memory for path\n");
 					goto fail;
 				}
-				ceph_decode_copy(&p, cap_auths[i].match.path, _len);
+				ceph_decode_copy_safe(&p, end,
+						      cap_auths[i].match.path,
+						      _len, bad);

 				/* Remove the tailing '/' */
 				while (_len && cap_auths[i].match.path[_len - 1] == '/') {
@@ -4335,7 +4337,9 @@ static void handle_session(struct ceph_mds_session *session,
 					pr_err_client(cl, "No memory for fs_name\n");
 					goto fail;
 				}
-				ceph_decode_copy(&p, cap_auths[i].match.fs_name, _len);
+				ceph_decode_copy_safe(&p, end,
+						      cap_auths[i].match.fs_name,
+						      _len, bad);
 			}

 			ceph_decode_8_safe(&p, end, cap_auths[i].match.root_squash, bad);
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3
  2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
@ 2026-06-06 19:00   ` Michael Bommarito
  2026-06-06 19:00   ` [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
  3 siblings, 0 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets from
each per-mds info record and advances the decode cursor by
num_export_targets * sizeof(u32) without first checking that many bytes
remain. The only upper-bound check that catches a runaway cursor
(*p > info_end) is gated on info_v >= 4, because info_end is left NULL
for info_v 2 and 3. When the monitor sends an MDS map whose per-mds
info version is 2 or 3 with an oversized num_export_targets, the cursor
moves past the message front buffer and the later export-targets loop
calls the unchecked ceph_decode_32() on out-of-bounds memory.

A kernel client processes CEPH_MSG_MDS_MAP from its monitor session
(net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to
ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound and
calls ceph_mdsmap_decode()). A malicious or compromised monitor, or an
on-path attacker on an unsigned/unencrypted messenger session, can
therefore drive an out-of-bounds read in the client kernel; on x86_64
with KASAN it is reported as a slab-out-of-bounds read in
ceph_mdsmap_decode(). The decoded values land in the internal
info->export_targets[] array, so the consequence is a kernel
out-of-bounds read, not an information leak to the attacker.

Impact: a malicious or compromised Ceph monitor sending an MDS map with
a per-mds info version of 2 or 3 and an oversized num_export_targets
field triggers an out-of-bounds read in the CephFS client kernel.

Add a ceph_decode_need() for the export-targets array before advancing
the cursor, so the bound is enforced for every info_v >= 2, not only
info_v >= 4. This mirrors the count-then-need idiom already used for
m_data_pg_pools later in the same function.

Compute the export-targets byte count with size_mul() and reuse that
checked length when advancing the cursor, so the attacker-controlled
num_export_targets multiplication fails closed on overflow rather than
relying on the later kcalloc() guard.

Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
v2:
- Compute the export-targets byte count with size_mul() and reuse the
  checked length when advancing the cursor.

 fs/ceph/mdsmap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index d8e46eb7e5eb5..c0f63f0460f67 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -3,6 +3,7 @@

 #include <linux/bug.h>
 #include <linux/err.h>
+#include <linux/overflow.h>
 #include <linux/random.h>
 #include <linux/slab.h>
 #include <linux/types.h>
@@ -126,6 +127,7 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p,
 	u8 mdsmap_v;
 	u16 mdsmap_ev;
 	u32 target;
+	size_t export_targets_len;

 	m = kzalloc_obj(*m, GFP_NOFS);
 	if (!m)
@@ -224,8 +226,11 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p,
 		*p += namelen;
 		if (info_v >= 2) {
 			ceph_decode_32_safe(p, end, num_export_targets, bad);
+			export_targets_len = size_mul(num_export_targets,
+						      sizeof(u32));
+			ceph_decode_need(p, end, export_targets_len, bad);
 			pexport_targets = *p;
-			*p += num_export_targets * sizeof(u32);
+			*p += export_targets_len;
 		} else {
 			num_export_targets = 0;
 		}
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
  2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
                     ` (2 preceding siblings ...)
  2026-06-06 19:00   ` [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
@ 2026-06-06 19:00   ` Michael Bommarito
  3 siblings, 0 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel

ceph_parse_deleg_inos() decodes interval sets of delegated inode numbers
from an MDS create-with-delegation reply. For each set it reads a 64-bit
start and a 64-bit len with ceph_decode_64_safe(), which only validates
that the eight bytes are present in the message, not the value, and then
loops over len while inserting entries into s_delegated_inos.

len is fully attacker controlled. A malicious or compromised MDS can send
one huge interval, many intervals in one reply, duplicate intervals, or
repeated replies that accumulate delegated inodes on the same session.
The v1 fix bounded only one interval and still allowed aggregate CPU and
memory pressure.

Bound both dimensions. Track the number of delegated inodes currently
held by each MDS session, reject replies that would exceed that
population cap, and also cap the total interval length accepted from one
reply so duplicate ranges cannot spin through an attacker-controlled len
while making no successful xarray inserts. The counter is decremented
when async create consumes a delegated inode, incremented when a
delegated inode is restored, initialized with the session xarray, and
reset when reconnect destroys the xarray contents.

The cap is a fixed, client-chosen constant rather than a value derived
from the MDS. mds_client_prealloc_inos is a userspace MDS configuration
option; it is never sent to the kernel client on the wire, and a
server-supplied bound could not be trusted for a defensive limit in any
case. The constant is set well above that option's documented default of
1000 (a generous multiple), so legitimate refill behavior is unaffected
while the CPU and xarray memory a malformed delegation stream can consume
stays bounded.

Impact: a malicious or compromised Ceph MDS can no longer make a client
spin through an unbounded delegated-inode interval or grow one session's
delegated-inode xarray without limit.

Fixes: d48464878708 ("ceph: decode interval_sets for delegated inos")
Cc: stable@vger.kernel.org
Suggested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
v2:
- Replace the per-interval cap with a per-session delegated-inode
  population counter.
- Add a per-reply interval budget so duplicate ranges cannot run the
  insertion loop without increasing the population counter.
- Reset the counter when reconnect destroys the delegated-inode xarray.
- Document why the cap is a fixed client-side constant (the kernel client
  cannot read the userspace mds_client_prealloc_inos option).

 fs/ceph/mds_client.c | 51 ++++++++++++++++++++++++++++++++++++++------
 fs/ceph/mds_client.h |  1 +
 fs/ceph/super.h      |  9 ++++++++
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4f36ac73305dc..ec5a80c9f86a7 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -612,16 +612,34 @@ static int parse_reply_info_filelock(void **p, void *end,
 
 #define DELEGATED_INO_AVAILABLE		xa_mk_value(1)
 
+static int ceph_insert_deleg_ino(struct ceph_mds_session *s, u64 ino)
+{
+	int err;
+
+	if (atomic_inc_return(&s->s_num_deleg_inos) > CEPH_MAX_DELEG_INOS) {
+		atomic_dec(&s->s_num_deleg_inos);
+		return -EOVERFLOW;
+	}
+
+	err = xa_insert(&s->s_delegated_inos, ino, DELEGATED_INO_AVAILABLE,
+			GFP_KERNEL);
+	if (err)
+		atomic_dec(&s->s_num_deleg_inos);
+	return err;
+}
+
 static int ceph_parse_deleg_inos(void **p, void *end,
 				 struct ceph_mds_session *s)
 {
 	struct ceph_client *cl = s->s_mdsc->fsc->client;
+	u64 msg_deleg_inos = 0;
 	u32 sets;
 
 	ceph_decode_32_safe(p, end, sets, bad);
 	doutc(cl, "got %u sets of delegated inodes\n", sets);
 	while (sets--) {
 		u64 start, len;
+		int session_deleg_inos;
 
 		ceph_decode_64_safe(p, end, start, bad);
 		ceph_decode_64_safe(p, end, len, bad);
@@ -633,16 +651,34 @@ static int ceph_parse_deleg_inos(void **p, void *end,
 				start, len);
 			continue;
 		}
+
+		if (len > CEPH_MAX_DELEG_INOS ||
+		    msg_deleg_inos > CEPH_MAX_DELEG_INOS - len) {
+			pr_warn_ratelimited_client(cl, "too many delegated inodes in reply\n");
+			return -EIO;
+		}
+
+		session_deleg_inos = atomic_read(&s->s_num_deleg_inos);
+		if (session_deleg_inos > CEPH_MAX_DELEG_INOS ||
+		    len > CEPH_MAX_DELEG_INOS - session_deleg_inos) {
+			pr_warn_ratelimited_client(cl, "too many delegated inodes for session\n");
+			return -EIO;
+		}
+
+		msg_deleg_inos += len;
+
 		while (len--) {
-			int err = xa_insert(&s->s_delegated_inos, start++,
-					    DELEGATED_INO_AVAILABLE,
-					    GFP_KERNEL);
+			int err = ceph_insert_deleg_ino(s, start++);
+
 			if (!err) {
 				doutc(cl, "added delegated inode 0x%llx\n", start - 1);
 			} else if (err == -EBUSY) {
 				pr_warn_client(cl,
 					"MDS delegated inode 0x%llx more than once.\n",
 					start - 1);
+			} else if (err == -EOVERFLOW) {
+				pr_warn_ratelimited_client(cl, "too many delegated inodes\n");
+				return -EIO;
 			} else {
 				return err;
 			}
@@ -660,16 +696,17 @@ u64 ceph_get_deleg_ino(struct ceph_mds_session *s)
 
 	xa_for_each(&s->s_delegated_inos, ino, val) {
 		val = xa_erase(&s->s_delegated_inos, ino);
-		if (val == DELEGATED_INO_AVAILABLE)
+		if (val == DELEGATED_INO_AVAILABLE) {
+			atomic_dec(&s->s_num_deleg_inos);
 			return ino;
+		}
 	}
 	return 0;
 }
 
 int ceph_restore_deleg_ino(struct ceph_mds_session *s, u64 ino)
 {
-	return xa_insert(&s->s_delegated_inos, ino, DELEGATED_INO_AVAILABLE,
-			 GFP_KERNEL);
+	return ceph_insert_deleg_ino(s, ino);
 }
 #else /* BITS_PER_LONG == 64 */
 /*
@@ -1056,6 +1093,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 	INIT_LIST_HEAD(&s->s_waiting);
 	INIT_LIST_HEAD(&s->s_unsafe);
 	xa_init(&s->s_delegated_inos);
+	atomic_set(&s->s_num_deleg_inos, 0);
 	INIT_LIST_HEAD(&s->s_cap_releases);
 	INIT_WORK(&s->s_cap_release_work, ceph_cap_release_work);
 
@@ -4971,6 +5009,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 		goto fail_nomsg;
 
 	xa_destroy(&session->s_delegated_inos);
+	atomic_set(&session->s_num_deleg_inos, 0);
 
 	mutex_lock(&session->s_mutex);
 	session->s_state = CEPH_MDS_SESSION_RECONNECTING;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 4e6c87f8414cc..3810b51ece2e6 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -255,6 +255,7 @@ struct ceph_mds_session {
 	struct list_head  s_waiting;  /* waiting requests */
 	struct list_head  s_unsafe;   /* unsafe requests */
 	struct xarray	  s_delegated_inos;
+	atomic_t	  s_num_deleg_inos;
 };
 
 /*
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index afc89ce91804e..bfc26e4d83bb4 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode *inode, void *data)
 #define CEPH_MDS_INO_LOG_OFFSET		(2 * CEPH_MAX_MDS)
 #define CEPH_INO_SYSTEM_BASE		((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY))
 
+/*
+ * Upper bound on the number of delegated inodes a single MDS session may
+ * hold. The MDS normally hands out a small preallocation window (the
+ * userspace mds_client_prealloc_inos option defaults to 1000) and refills
+ * it as the client consumes entries. This leaves generous headroom while
+ * bounding the CPU and memory a malformed delegation interval can consume.
+ */
+#define CEPH_MAX_DELEG_INOS		8192
+
 static inline bool ceph_vino_is_reserved(const struct ceph_vino vino)
 {
 	if (vino.ino >= CEPH_INO_SYSTEM_BASE ||
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-06-06 19:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
2026-06-04 19:50   ` Viacheslav Dubeyko
2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
2026-06-04 19:54   ` Viacheslav Dubeyko
2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
2026-06-04 20:04   ` Viacheslav Dubeyko
2026-06-04 20:23     ` Michael Bommarito
2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
2026-06-04 21:06   ` Viacheslav Dubeyko
2026-06-04 21:41     ` Michael Bommarito
2026-06-05 19:10       ` Viacheslav Dubeyko
2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
2026-06-06 19:00   ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
2026-06-06 19:00   ` [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
2026-06-06 19:00   ` [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
2026-06-06 19:00   ` [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox