* [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders
@ 2026-06-04 18:08 Michael Bommarito
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel
The CephFS client decodes several variable-length fields from MDS and
monitor replies without checking the declared length against the bytes
actually present in the message. Each of the four sites below trusts a
32-bit or 64-bit length from the wire and then copies, advances over, or
loops on it. A malicious or compromised server (or, for the monitor map,
an on-path attacker on an unsigned messenger session) can drive an
out-of-bounds read or an unbounded loop in the client kernel. The fixes
add the missing ceph_decode_need()/ceph_decode_copy_safe() bound or an
explicit count cap, matching the idioms already used elsewhere in the
same decoders.
Patch 1 is the most serious: the over-read value is returned to user
space through getxattr(2), so it is a kernel-heap information leak
(AC:H, requires a user to read the attribute). Patches 2-4 are
out-of-bounds reads / an unbounded loop that crash or wedge the client
(denial of service, no information disclosure). For patch 3 the supplier
is the monitor, a more privileged cluster role than the MDS; it is
included as a hardening fix for the info_v 2/3 compatibility path.
All four were reproduced on x86_64 QEMU with CONFIG_KASAN=y by calling
the real decoder via an in-tree debugfs harness, and re-run with the
patch applied:
patch 1: stock - __build_xattrs() stores val_len=304 from a 256-byte
blob, then the getxattr memcpy is a slab-out-of-bounds read of
304 bytes; patched - decode returns -EIO, no value stored.
patch 2: stock - slab-out-of-bounds read of 512 bytes in
handle_session(); patched - the bound rejects the record.
patch 3: stock - slab-out-of-bounds read in ceph_mdsmap_decode();
patched - decode returns -EINVAL.
patch 4: stock - ceph_parse_deleg_inos() runs 1048640 iterations on a
crafted len; patched - returns -EIO before the loop.
Well-formed replies still decode in every case (valid xattr value,
in-range delegation, in-bounds export-target array). These bugs were
found with AI assistance and are reported on the public list
accordingly, especially since they are mostly about a malicious
trusted server.
Michael Bommarito (4):
ceph: bound xattr value length in __build_xattrs()
ceph: bound MDSCapAuth path and fs_name decode in handle_session()
ceph: bound num_export_targets array for mds info v2/v3
ceph: cap delegated inode count in ceph_parse_deleg_inos()
fs/ceph/mds_client.c | 22 ++++++++++++++++++++--
fs/ceph/mdsmap.c | 2 ++
fs/ceph/super.h | 9 +++++++++
fs/ceph/xattr.c | 1 +
4 files changed, 32 insertions(+), 2 deletions(-)
base-commit: f72c95f3a516d87483e225ae081a402a09fd0127
--
2.53.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] ceph: bound xattr value length in __build_xattrs()
2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
@ 2026-06-04 18:08 ` Michael Bommarito
2026-06-04 19:50 ` Viacheslav Dubeyko
2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel
__build_xattrs() decodes the MDS-supplied xattr blob one attribute at a
time. For each attribute it reads a 32-bit name length, advances past the
name bytes, reads a 32-bit value length, records the value pointer, and
advances past the value bytes. The two length fields are read with
ceph_decode_32_safe(), but the value bytes themselves are advanced over
with a bare "p += len" and no ceph_decode_need() check that "len" bytes
remain in the blob.
For every attribute except the last, the next iteration's
ceph_decode_32_safe() on the following name length implicitly verifies
that the previous value did not run past the blob end. The final
attribute has no successor, so its decoded value length is never checked
against the blob bounds. A malicious or compromised metadata server can
set the last attribute's value length larger than the bytes actually
present in the blob.
The blob is a dedicated kvmalloc() allocation sized to the wire length
(ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the
oversized length in xattr->val_len verbatim, and a later getxattr(2) runs
memcpy(value, xattr->val, xattr->val_len) into a user-supplied buffer,
copying bytes past the end of the allocation back to user space.
Impact: a malicious metadata server discloses adjacent kernel heap bytes
to a local user via getxattr(2) on a CephFS file. Add the missing
ceph_decode_need() so an out-of-bounds value length on the final
attribute fails the decode and returns -EIO instead of being stored.
Fixes: 355da1eb7a1f ("ceph: inode operations")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
fs/ceph/xattr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index e773be07f7674..d488bb8fc00ba 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode)
name = p;
p += len;
ceph_decode_32_safe(&p, end, len, bad);
+ ceph_decode_need(&p, end, len, bad);
val = p;
p += len;
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session()
2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
@ 2026-06-04 18:08 ` Michael Bommarito
2026-06-04 19:54 ` Viacheslav Dubeyko
2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
3 siblings, 1 reply; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel
handle_session() decodes the MDSCapAuth records carried by a
CEPH_SESSION_OPEN message (msg_version >= 6). For each record the
match.path and match.fs_name byte strings are read by first decoding a
32-bit length and then copying that many bytes with the bare
ceph_decode_copy(). Unlike the surrounding fields, which all use the
_safe decode variants, these two copies are not preceded by a
ceph_decode_need() bounds check, and the enclosing MDSCapAuth and
MDSCapMatch struct_len fields are skipped rather than enforced as an
upper bound. A length larger than the bytes remaining in the message
front makes ceph_decode_copy() read past the end of the front buffer.
The message front is a dedicated allocation (ceph_msg_new2() ->
kvmalloc), so the over-read runs off that object. A malicious or
compromised MDS can trigger this with the first post-connect message on
mount, with no client-side user interaction; under KASAN it is reported
as a slab-out-of-bounds read in handle_session().
Impact: a malicious MDS can force the kernel client to read up to 4 GiB
past the message front allocation during session setup, crashing the
client (out-of-bounds read).
Switch both copies to ceph_decode_copy_safe(), which performs the
ceph_decode_need() bounds check before the copy and branches to the
existing bad label, matching the rest of the decoder and the error path
that frees the partially decoded cap_auths array.
Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
fs/ceph/mds_client.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ed17e0023705e..4f36ac73305dc 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4318,7 +4318,9 @@ static void handle_session(struct ceph_mds_session *session,
pr_err_client(cl, "No memory for path\n");
goto fail;
}
- ceph_decode_copy(&p, cap_auths[i].match.path, _len);
+ ceph_decode_copy_safe(&p, end,
+ cap_auths[i].match.path,
+ _len, bad);
/* Remove the tailing '/' */
while (_len && cap_auths[i].match.path[_len - 1] == '/') {
@@ -4335,7 +4337,9 @@ static void handle_session(struct ceph_mds_session *session,
pr_err_client(cl, "No memory for fs_name\n");
goto fail;
}
- ceph_decode_copy(&p, cap_auths[i].match.fs_name, _len);
+ ceph_decode_copy_safe(&p, end,
+ cap_auths[i].match.fs_name,
+ _len, bad);
}
ceph_decode_8_safe(&p, end, cap_auths[i].match.root_squash, bad);
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3
2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
@ 2026-06-04 18:08 ` Michael Bommarito
2026-06-04 20:04 ` Viacheslav Dubeyko
2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
3 siblings, 1 reply; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel
ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets from
each per-mds info record and advances the decode cursor by
num_export_targets * sizeof(u32) without first checking that many bytes
remain. The only upper-bound check that catches a runaway cursor
(*p > info_end) is gated on info_v >= 4, because info_end is left NULL
for info_v 2 and 3. When the monitor sends an MDS map whose per-mds
info version is 2 or 3 with an oversized num_export_targets, the cursor
moves past the message front buffer and the later export-targets loop
calls the unchecked ceph_decode_32() on out-of-bounds memory.
A kernel client processes CEPH_MSG_MDS_MAP from its monitor session
(net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to
ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound and
calls ceph_mdsmap_decode()). A malicious or compromised monitor, or an
on-path attacker on an unsigned/unencrypted messenger session, can
therefore drive an out-of-bounds read in the client kernel; on x86_64
with KASAN it is reported as a slab-out-of-bounds read in
ceph_mdsmap_decode(). The decoded values land in the internal
info->export_targets[] array, so the consequence is a kernel
out-of-bounds read, not an information leak to the attacker.
Impact: a malicious or compromised Ceph monitor sending an MDS map with
a per-mds info version of 2 or 3 and an oversized num_export_targets
field triggers an out-of-bounds read in the CephFS client kernel.
Add a ceph_decode_need() for num_export_targets * sizeof(u32) before
advancing the cursor, so the bound is enforced for every info_v >= 2,
not only info_v >= 4. This mirrors the count-then-need idiom already
used for m_data_pg_pools later in the same function.
Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
fs/ceph/mdsmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index d8e46eb7e5eb5..7cb65f9f4783c 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -224,6 +224,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p,
*p += namelen;
if (info_v >= 2) {
ceph_decode_32_safe(p, end, num_export_targets, bad);
+ ceph_decode_need(p, end,
+ num_export_targets * sizeof(u32), bad);
pexport_targets = *p;
*p += num_export_targets * sizeof(u32);
} else {
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
` (2 preceding siblings ...)
2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
@ 2026-06-04 18:09 ` Michael Bommarito
2026-06-04 21:06 ` Viacheslav Dubeyko
3 siblings, 1 reply; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:09 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel
ceph_parse_deleg_inos() decodes interval sets of delegated inode numbers
from an MDS create-with-delegation reply. For each set it reads a 64-bit
start and a 64-bit len with ceph_decode_64_safe(), which only validates
that the eight bytes are present in the message, not the value, and then
loops:
while (len--) {
xa_insert(&s->s_delegated_inos, start++,
DELEGATED_INO_AVAILABLE, GFP_KERNEL);
...
}
len is fully attacker controlled. A malicious or compromised MDS can send
a create reply whose interval set declares len near 2^63, driving an
effectively unbounded loop that performs a GFP_KERNEL xarray insert on
each iteration. This spins a kernel thread in the reply dispatch path and
exhausts memory; on a client that has negotiated CEPHFS_FEATURE_DELEG_INO
(enabled by default) a single reply is enough to wedge the mount.
A legitimate MDS delegates only a small range per set (the userspace MDS
prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject any
set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any legitimate
value) by treating the reply as malformed and returning -EIO, consistent
with the existing decode error handling. Normal delegations are well under
the cap and are unaffected.
Fixes: d4846487870897 ("ceph: decode interval_sets for delegated inos")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
fs/ceph/mds_client.c | 14 ++++++++++++++
fs/ceph/super.h | 9 +++++++++
2 files changed, 23 insertions(+)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4f36ac73305dc..0a084c4f3aae2 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void *end,
start, len);
continue;
}
+
+ /*
+ * A legitimate MDS delegates a small range per set. Treat a
+ * count larger than any plausible delegation window as a
+ * malformed reply rather than spinning while(len--) and
+ * inserting unbounded xarray entries.
+ */
+ if (len > CEPH_MAX_DELEG_INOS) {
+ pr_warn_ratelimited_client(cl,
+ "rejecting oversized inode range delegation (start=0x%llx len=0x%llx)\n",
+ start, len);
+ return -EIO;
+ }
+
while (len--) {
int err = xa_insert(&s->s_delegated_inos, start++,
DELEGATED_INO_AVAILABLE,
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index afc89ce91804e..43a9b075f344c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode *inode, void *data)
#define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS)
#define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY))
+/*
+ * Upper bound on the number of inodes the MDS may delegate to a client in a
+ * single interval set. The userspace MDS hands out at most a few thousand
+ * (mds_client_prealloc_inos, default 1000); 1M is far above any legitimate
+ * value and guards ceph_parse_deleg_inos() against an unbounded loop driven
+ * by an attacker controlled 64-bit length.
+ */
+#define CEPH_MAX_DELEG_INOS (1024 * 1024)
+
static inline bool ceph_vino_is_reserved(const struct ceph_vino vino)
{
if (vino.ino >= CEPH_INO_SYSTEM_BASE ||
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] ceph: bound xattr value length in __build_xattrs()
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
@ 2026-06-04 19:50 ` Viacheslav Dubeyko
0 siblings, 0 replies; 11+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 19:50 UTC (permalink / raw)
To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel
On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote:
> __build_xattrs() decodes the MDS-supplied xattr blob one attribute at
> a
> time. For each attribute it reads a 32-bit name length, advances past
> the
> name bytes, reads a 32-bit value length, records the value pointer,
> and
> advances past the value bytes. The two length fields are read with
> ceph_decode_32_safe(), but the value bytes themselves are advanced
> over
> with a bare "p += len" and no ceph_decode_need() check that "len"
> bytes
> remain in the blob.
>
> For every attribute except the last, the next iteration's
> ceph_decode_32_safe() on the following name length implicitly
> verifies
> that the previous value did not run past the blob end. The final
> attribute has no successor, so its decoded value length is never
> checked
> against the blob bounds. A malicious or compromised metadata server
> can
> set the last attribute's value length larger than the bytes actually
> present in the blob.
>
> The blob is a dedicated kvmalloc() allocation sized to the wire
> length
> (ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the
> oversized length in xattr->val_len verbatim, and a later getxattr(2)
> runs
> memcpy(value, xattr->val, xattr->val_len) into a user-supplied
> buffer,
> copying bytes past the end of the allocation back to user space.
>
> Impact: a malicious metadata server discloses adjacent kernel heap
> bytes
> to a local user via getxattr(2) on a CephFS file. Add the missing
> ceph_decode_need() so an out-of-bounds value length on the final
> attribute fails the decode and returns -EIO instead of being stored.
>
> Fixes: 355da1eb7a1f ("ceph: inode operations")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> ---
> fs/ceph/xattr.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index e773be07f7674..d488bb8fc00ba 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode)
> name = p;
> p += len;
> ceph_decode_32_safe(&p, end, len, bad);
> + ceph_decode_need(&p, end, len, bad);
> val = p;
> p += len;
>
Makes sense.
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Thanks,
Slava.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session()
2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
@ 2026-06-04 19:54 ` Viacheslav Dubeyko
0 siblings, 0 replies; 11+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 19:54 UTC (permalink / raw)
To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel
On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote:
> handle_session() decodes the MDSCapAuth records carried by a
> CEPH_SESSION_OPEN message (msg_version >= 6). For each record the
> match.path and match.fs_name byte strings are read by first decoding
> a
> 32-bit length and then copying that many bytes with the bare
> ceph_decode_copy(). Unlike the surrounding fields, which all use the
> _safe decode variants, these two copies are not preceded by a
> ceph_decode_need() bounds check, and the enclosing MDSCapAuth and
> MDSCapMatch struct_len fields are skipped rather than enforced as an
> upper bound. A length larger than the bytes remaining in the message
> front makes ceph_decode_copy() read past the end of the front buffer.
>
> The message front is a dedicated allocation (ceph_msg_new2() ->
> kvmalloc), so the over-read runs off that object. A malicious or
> compromised MDS can trigger this with the first post-connect message
> on
> mount, with no client-side user interaction; under KASAN it is
> reported
> as a slab-out-of-bounds read in handle_session().
>
> Impact: a malicious MDS can force the kernel client to read up to 4
> GiB
> past the message front allocation during session setup, crashing the
> client (out-of-bounds read).
>
> Switch both copies to ceph_decode_copy_safe(), which performs the
> ceph_decode_need() bounds check before the copy and branches to the
> existing bad label, matching the rest of the decoder and the error
> path
> that frees the partially decoded cap_auths array.
>
> Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session
> is opened")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> ---
> fs/ceph/mds_client.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index ed17e0023705e..4f36ac73305dc 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4318,7 +4318,9 @@ static void handle_session(struct
> ceph_mds_session *session,
> pr_err_client(cl, "No memory
> for path\n");
> goto fail;
> }
> - ceph_decode_copy(&p,
> cap_auths[i].match.path, _len);
> + ceph_decode_copy_safe(&p, end,
> +
> cap_auths[i].match.path,
> + _len, bad);
>
> /* Remove the tailing '/' */
> while (_len &&
> cap_auths[i].match.path[_len - 1] == '/') {
> @@ -4335,7 +4337,9 @@ static void handle_session(struct
> ceph_mds_session *session,
> pr_err_client(cl, "No memory
> for fs_name\n");
> goto fail;
> }
> - ceph_decode_copy(&p,
> cap_auths[i].match.fs_name, _len);
> + ceph_decode_copy_safe(&p, end,
> +
> cap_auths[i].match.fs_name,
> + _len, bad);
> }
>
> ceph_decode_8_safe(&p, end,
> cap_auths[i].match.root_squash, bad);
Looks good.
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Thanks,
Slava.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3
2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
@ 2026-06-04 20:04 ` Viacheslav Dubeyko
2026-06-04 20:23 ` Michael Bommarito
0 siblings, 1 reply; 11+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 20:04 UTC (permalink / raw)
To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel
On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote:
> ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets
> from
> each per-mds info record and advances the decode cursor by
> num_export_targets * sizeof(u32) without first checking that many
> bytes
> remain. The only upper-bound check that catches a runaway cursor
> (*p > info_end) is gated on info_v >= 4, because info_end is left
> NULL
> for info_v 2 and 3. When the monitor sends an MDS map whose per-mds
> info version is 2 or 3 with an oversized num_export_targets, the
> cursor
> moves past the message front buffer and the later export-targets loop
> calls the unchecked ceph_decode_32() on out-of-bounds memory.
>
> A kernel client processes CEPH_MSG_MDS_MAP from its monitor session
> (net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to
> ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound
> and
> calls ceph_mdsmap_decode()). A malicious or compromised monitor, or
> an
> on-path attacker on an unsigned/unencrypted messenger session, can
> therefore drive an out-of-bounds read in the client kernel; on x86_64
> with KASAN it is reported as a slab-out-of-bounds read in
> ceph_mdsmap_decode(). The decoded values land in the internal
> info->export_targets[] array, so the consequence is a kernel
> out-of-bounds read, not an information leak to the attacker.
>
> Impact: a malicious or compromised Ceph monitor sending an MDS map
> with
> a per-mds info version of 2 or 3 and an oversized num_export_targets
> field triggers an out-of-bounds read in the CephFS client kernel.
>
> Add a ceph_decode_need() for num_export_targets * sizeof(u32) before
> advancing the cursor, so the bound is enforced for every info_v >= 2,
> not only info_v >= 4. This mirrors the count-then-need idiom already
> used for m_data_pg_pools later in the same function.
>
> Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support")
> Cc: stable@vger.kernel.org
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> fs/ceph/mdsmap.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
> index d8e46eb7e5eb5..7cb65f9f4783c 100644
> --- a/fs/ceph/mdsmap.c
> +++ b/fs/ceph/mdsmap.c
> @@ -224,6 +224,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct
> ceph_mds_client *mdsc, void **p,
> *p += namelen;
> if (info_v >= 2) {
> ceph_decode_32_safe(p, end,
> num_export_targets, bad);
> + ceph_decode_need(p, end,
> + num_export_targets *
> sizeof(u32), bad);
We extract the num_export_targets from the message. What if the
num_export_targets is huge enough? I assume we could potentially have
overflow during the multiplication. What do you think?
Thanks,
Slava.
> pexport_targets = *p;
> *p += num_export_targets * sizeof(u32);
> } else {
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3
2026-06-04 20:04 ` Viacheslav Dubeyko
@ 2026-06-04 20:23 ` Michael Bommarito
0 siblings, 0 replies; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 20:23 UTC (permalink / raw)
To: Viacheslav Dubeyko; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel
> We extract the num_export_targets from the message. What if the
> num_export_targets is huge enough? I assume we could potentially have
> overflow during the multiplication. What do you think?
Good catch, we definitely need a v2 to address that with size_mul
Thanks,
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
@ 2026-06-04 21:06 ` Viacheslav Dubeyko
2026-06-04 21:41 ` Michael Bommarito
0 siblings, 1 reply; 11+ messages in thread
From: Viacheslav Dubeyko @ 2026-06-04 21:06 UTC (permalink / raw)
To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel
On Thu, 2026-06-04 at 14:09 -0400, Michael Bommarito wrote:
> ceph_parse_deleg_inos() decodes interval sets of delegated inode
> numbers
> from an MDS create-with-delegation reply. For each set it reads a 64-
> bit
> start and a 64-bit len with ceph_decode_64_safe(), which only
> validates
> that the eight bytes are present in the message, not the value, and
> then
> loops:
>
> while (len--) {
> xa_insert(&s->s_delegated_inos, start++,
> DELEGATED_INO_AVAILABLE, GFP_KERNEL);
> ...
> }
>
> len is fully attacker controlled. A malicious or compromised MDS can
> send
> a create reply whose interval set declares len near 2^63, driving an
> effectively unbounded loop that performs a GFP_KERNEL xarray insert
> on
> each iteration. This spins a kernel thread in the reply dispatch path
> and
> exhausts memory; on a client that has negotiated
> CEPHFS_FEATURE_DELEG_INO
> (enabled by default) a single reply is enough to wedge the mount.
>
> A legitimate MDS delegates only a small range per set (the userspace
> MDS
> prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject
> any
> set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any
> legitimate
> value) by treating the reply as malformed and returning -EIO,
> consistent
> with the existing decode error handling. Normal delegations are well
> under
> the cap and are unaffected.
>
> Fixes: d4846487870897 ("ceph: decode interval_sets for delegated
> inos")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> ---
> fs/ceph/mds_client.c | 14 ++++++++++++++
> fs/ceph/super.h | 9 +++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 4f36ac73305dc..0a084c4f3aae2 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void
> *end,
> start, len);
> continue;
> }
> +
> + /*
> + * A legitimate MDS delegates a small range per set.
> Treat a
> + * count larger than any plausible delegation window
> as a
> + * malformed reply rather than spinning while(len--)
> and
> + * inserting unbounded xarray entries.
> + */
> + if (len > CEPH_MAX_DELEG_INOS) {
> + pr_warn_ratelimited_client(cl,
> + "rejecting oversized inode range
> delegation (start=0x%llx len=0x%llx)\n",
> + start, len);
> + return -EIO;
> + }
> +
> while (len--) {
> int err = xa_insert(&s->s_delegated_inos,
> start++,
> DELEGATED_INO_AVAILABLE,
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index afc89ce91804e..43a9b075f344c 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode
> *inode, void *data)
> #define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS)
> #define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) +
> (CEPH_MAX_MDS * CEPH_NUM_STRAY))
>
> +/*
> + * Upper bound on the number of inodes the MDS may delegate to a
> client in a
> + * single interval set. The userspace MDS hands out at most a few
> thousand
> + * (mds_client_prealloc_inos, default 1000); 1M is far above any
> legitimate
> + * value and guards ceph_parse_deleg_inos() against an unbounded
> loop driven
> + * by an attacker controlled 64-bit length.
> + */
> +#define CEPH_MAX_DELEG_INOS (1024 * 1024)
Technically speaking, I am not completely convinced by this limit.
Also, this limit is for one single interval. But if attacker tries
multiple times, then we can still have memory pressure or run out of
memory. I think we need to consider more robust solution. Could we
improve the fix?
Thanks,
Slava.
> +
> static inline bool ceph_vino_is_reserved(const struct ceph_vino
> vino)
> {
> if (vino.ino >= CEPH_INO_SYSTEM_BASE ||
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
2026-06-04 21:06 ` Viacheslav Dubeyko
@ 2026-06-04 21:41 ` Michael Bommarito
0 siblings, 0 replies; 11+ messages in thread
From: Michael Bommarito @ 2026-06-04 21:41 UTC (permalink / raw)
To: Viacheslav Dubeyko; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel
On Thu, Jun 4, 2026 at 5:06 PM Viacheslav Dubeyko <slava@dubeyko.com> wrote:
> Technically speaking, I am not completely convinced by this limit.
> Also, this limit is for one single interval. But if attacker tries
> multiple times, then we can still have memory pressure or run out of
> memory. I think we need to consider more robust solution. Could we
> improve the fix?
I looked at a per session limit but the lock management gets to be a
bit more complicated. If you think that's the right direction, I can
implement v2 based on that.
About the count, if we move to per session, then do you agree that the
right number is based on some factor of mds_client_prealloc_inos?
Thanks,
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-06-04 21:41 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
2026-06-04 19:50 ` Viacheslav Dubeyko
2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito
2026-06-04 19:54 ` Viacheslav Dubeyko
2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito
2026-06-04 20:04 ` Viacheslav Dubeyko
2026-06-04 20:23 ` Michael Bommarito
2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
2026-06-04 21:06 ` Viacheslav Dubeyko
2026-06-04 21:41 ` Michael Bommarito
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox