* [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders
@ 2026-06-04 18:08 Michael Bommarito
2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel
The CephFS client decodes several variable-length fields from MDS and
monitor replies without checking the declared length against the bytes
actually present in the message. Each of the four sites below trusts a
32-bit or 64-bit length from the wire and then copies, advances over, or
loops on it. A malicious or compromised server (or, for the monitor map,
an on-path attacker on an unsigned messenger session) can drive an
out-of-bounds read or an unbounded loop in the client kernel. The fixes
add the missing ceph_decode_need()/ceph_decode_copy_safe() bound or an
explicit count cap, matching the idioms already used elsewhere in the
same decoders.
Patch 1 is the most serious: the over-read value is returned to user
space through getxattr(2), so it is a kernel-heap information leak
(AC:H, requires a user to read the attribute). Patches 2-4 are
out-of-bounds reads / an unbounded loop that crash or wedge the client
(denial of service, no information disclosure). For patch 3 the supplier
is the monitor, a more privileged cluster role than the MDS; it is
included as a hardening fix for the info_v 2/3 compatibility path.
All four were reproduced on x86_64 QEMU with CONFIG_KASAN=y by calling
the real decoder via an in-tree debugfs harness, and re-run with the
patch applied:
patch 1: stock - __build_xattrs() stores val_len=304 from a 256-byte
blob, then the getxattr memcpy is a slab-out-of-bounds read of
304 bytes; patched - decode returns -EIO, no value stored.
patch 2: stock - slab-out-of-bounds read of 512 bytes in
handle_session(); patched - the bound rejects the record.
patch 3: stock - slab-out-of-bounds read in ceph_mdsmap_decode();
patched - decode returns -EINVAL.
patch 4: stock - ceph_parse_deleg_inos() runs 1048640 iterations on a
crafted len; patched - returns -EIO before the loop.
Well-formed replies still decode in every case (valid xattr value,
in-range delegation, in-bounds export-target array). These bugs were
found with AI assistance and are reported on the public list
accordingly, especially since they are mostly about a malicious
trusted server.
Michael Bommarito (4):
ceph: bound xattr value length in __build_xattrs()
ceph: bound MDSCapAuth path and fs_name decode in handle_session()
ceph: bound num_export_targets array for mds info v2/v3
ceph: cap delegated inode count in ceph_parse_deleg_inos()
fs/ceph/mds_client.c | 22 ++++++++++++++++++++--
fs/ceph/mdsmap.c | 2 ++
fs/ceph/super.h | 9 +++++++++
fs/ceph/xattr.c | 1 +
4 files changed, 32 insertions(+), 2 deletions(-)
base-commit: f72c95f3a516d87483e225ae081a402a09fd0127
--
2.53.0
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() 2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito @ 2026-06-04 18:08 ` Michael Bommarito 2026-06-04 19:50 ` Viacheslav Dubeyko 2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito ` (3 subsequent siblings) 4 siblings, 1 reply; 17+ messages in thread From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel __build_xattrs() decodes the MDS-supplied xattr blob one attribute at a time. For each attribute it reads a 32-bit name length, advances past the name bytes, reads a 32-bit value length, records the value pointer, and advances past the value bytes. The two length fields are read with ceph_decode_32_safe(), but the value bytes themselves are advanced over with a bare "p += len" and no ceph_decode_need() check that "len" bytes remain in the blob. For every attribute except the last, the next iteration's ceph_decode_32_safe() on the following name length implicitly verifies that the previous value did not run past the blob end. The final attribute has no successor, so its decoded value length is never checked against the blob bounds. A malicious or compromised metadata server can set the last attribute's value length larger than the bytes actually present in the blob. The blob is a dedicated kvmalloc() allocation sized to the wire length (ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the oversized length in xattr->val_len verbatim, and a later getxattr(2) runs memcpy(value, xattr->val, xattr->val_len) into a user-supplied buffer, copying bytes past the end of the allocation back to user space. Impact: a malicious metadata server discloses adjacent kernel heap bytes to a local user via getxattr(2) on a CephFS file. Add the missing ceph_decode_need() so an out-of-bounds value length on the final attribute fails the decode and returns -EIO instead of being stored. Fixes: 355da1eb7a1f ("ceph: inode operations") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-8 --- fs/ceph/xattr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index e773be07f7674..d488bb8fc00ba 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode) name = p; p += len; ceph_decode_32_safe(&p, end, len, bad); + ceph_decode_need(&p, end, len, bad); val = p; p += len; -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() 2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito @ 2026-06-04 19:50 ` Viacheslav Dubeyko 0 siblings, 0 replies; 17+ messages in thread From: Viacheslav Dubeyko @ 2026-06-04 19:50 UTC (permalink / raw) To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote: > __build_xattrs() decodes the MDS-supplied xattr blob one attribute at > a > time. For each attribute it reads a 32-bit name length, advances past > the > name bytes, reads a 32-bit value length, records the value pointer, > and > advances past the value bytes. The two length fields are read with > ceph_decode_32_safe(), but the value bytes themselves are advanced > over > with a bare "p += len" and no ceph_decode_need() check that "len" > bytes > remain in the blob. > > For every attribute except the last, the next iteration's > ceph_decode_32_safe() on the following name length implicitly > verifies > that the previous value did not run past the blob end. The final > attribute has no successor, so its decoded value length is never > checked > against the blob bounds. A malicious or compromised metadata server > can > set the last attribute's value length larger than the bytes actually > present in the blob. > > The blob is a dedicated kvmalloc() allocation sized to the wire > length > (ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the > oversized length in xattr->val_len verbatim, and a later getxattr(2) > runs > memcpy(value, xattr->val, xattr->val_len) into a user-supplied > buffer, > copying bytes past the end of the allocation back to user space. > > Impact: a malicious metadata server discloses adjacent kernel heap > bytes > to a local user via getxattr(2) on a CephFS file. Add the missing > ceph_decode_need() so an out-of-bounds value length on the final > attribute fails the decode and returns -EIO instead of being stored. > > Fixes: 355da1eb7a1f ("ceph: inode operations") > Cc: stable@vger.kernel.org > Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> > Assisted-by: Claude:claude-opus-4-8 > --- > fs/ceph/xattr.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c > index e773be07f7674..d488bb8fc00ba 100644 > --- a/fs/ceph/xattr.c > +++ b/fs/ceph/xattr.c > @@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode) > name = p; > p += len; > ceph_decode_32_safe(&p, end, len, bad); > + ceph_decode_need(&p, end, len, bad); > val = p; > p += len; > Makes sense. Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Thanks, Slava. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() 2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito @ 2026-06-04 18:08 ` Michael Bommarito 2026-06-04 19:54 ` Viacheslav Dubeyko 2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito ` (2 subsequent siblings) 4 siblings, 1 reply; 17+ messages in thread From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel handle_session() decodes the MDSCapAuth records carried by a CEPH_SESSION_OPEN message (msg_version >= 6). For each record the match.path and match.fs_name byte strings are read by first decoding a 32-bit length and then copying that many bytes with the bare ceph_decode_copy(). Unlike the surrounding fields, which all use the _safe decode variants, these two copies are not preceded by a ceph_decode_need() bounds check, and the enclosing MDSCapAuth and MDSCapMatch struct_len fields are skipped rather than enforced as an upper bound. A length larger than the bytes remaining in the message front makes ceph_decode_copy() read past the end of the front buffer. The message front is a dedicated allocation (ceph_msg_new2() -> kvmalloc), so the over-read runs off that object. A malicious or compromised MDS can trigger this with the first post-connect message on mount, with no client-side user interaction; under KASAN it is reported as a slab-out-of-bounds read in handle_session(). Impact: a malicious MDS can force the kernel client to read up to 4 GiB past the message front allocation during session setup, crashing the client (out-of-bounds read). Switch both copies to ceph_decode_copy_safe(), which performs the ceph_decode_need() bounds check before the copy and branches to the existing bad label, matching the rest of the decoder and the error path that frees the partially decoded cap_auths array. Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-8 --- fs/ceph/mds_client.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index ed17e0023705e..4f36ac73305dc 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4318,7 +4318,9 @@ static void handle_session(struct ceph_mds_session *session, pr_err_client(cl, "No memory for path\n"); goto fail; } - ceph_decode_copy(&p, cap_auths[i].match.path, _len); + ceph_decode_copy_safe(&p, end, + cap_auths[i].match.path, + _len, bad); /* Remove the tailing '/' */ while (_len && cap_auths[i].match.path[_len - 1] == '/') { @@ -4335,7 +4337,9 @@ static void handle_session(struct ceph_mds_session *session, pr_err_client(cl, "No memory for fs_name\n"); goto fail; } - ceph_decode_copy(&p, cap_auths[i].match.fs_name, _len); + ceph_decode_copy_safe(&p, end, + cap_auths[i].match.fs_name, + _len, bad); } ceph_decode_8_safe(&p, end, cap_auths[i].match.root_squash, bad); -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() 2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito @ 2026-06-04 19:54 ` Viacheslav Dubeyko 0 siblings, 0 replies; 17+ messages in thread From: Viacheslav Dubeyko @ 2026-06-04 19:54 UTC (permalink / raw) To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote: > handle_session() decodes the MDSCapAuth records carried by a > CEPH_SESSION_OPEN message (msg_version >= 6). For each record the > match.path and match.fs_name byte strings are read by first decoding > a > 32-bit length and then copying that many bytes with the bare > ceph_decode_copy(). Unlike the surrounding fields, which all use the > _safe decode variants, these two copies are not preceded by a > ceph_decode_need() bounds check, and the enclosing MDSCapAuth and > MDSCapMatch struct_len fields are skipped rather than enforced as an > upper bound. A length larger than the bytes remaining in the message > front makes ceph_decode_copy() read past the end of the front buffer. > > The message front is a dedicated allocation (ceph_msg_new2() -> > kvmalloc), so the over-read runs off that object. A malicious or > compromised MDS can trigger this with the first post-connect message > on > mount, with no client-side user interaction; under KASAN it is > reported > as a slab-out-of-bounds read in handle_session(). > > Impact: a malicious MDS can force the kernel client to read up to 4 > GiB > past the message front allocation during session setup, crashing the > client (out-of-bounds read). > > Switch both copies to ceph_decode_copy_safe(), which performs the > ceph_decode_need() bounds check before the copy and branches to the > existing bad label, matching the rest of the decoder and the error > path > that frees the partially decoded cap_auths array. > > Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session > is opened") > Cc: stable@vger.kernel.org > Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> > Assisted-by: Claude:claude-opus-4-8 > --- > fs/ceph/mds_client.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > index ed17e0023705e..4f36ac73305dc 100644 > --- a/fs/ceph/mds_client.c > +++ b/fs/ceph/mds_client.c > @@ -4318,7 +4318,9 @@ static void handle_session(struct > ceph_mds_session *session, > pr_err_client(cl, "No memory > for path\n"); > goto fail; > } > - ceph_decode_copy(&p, > cap_auths[i].match.path, _len); > + ceph_decode_copy_safe(&p, end, > + > cap_auths[i].match.path, > + _len, bad); > > /* Remove the tailing '/' */ > while (_len && > cap_auths[i].match.path[_len - 1] == '/') { > @@ -4335,7 +4337,9 @@ static void handle_session(struct > ceph_mds_session *session, > pr_err_client(cl, "No memory > for fs_name\n"); > goto fail; > } > - ceph_decode_copy(&p, > cap_auths[i].match.fs_name, _len); > + ceph_decode_copy_safe(&p, end, > + > cap_auths[i].match.fs_name, > + _len, bad); > } > > ceph_decode_8_safe(&p, end, > cap_auths[i].match.root_squash, bad); Looks good. Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Thanks, Slava. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito 2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito @ 2026-06-04 18:08 ` Michael Bommarito 2026-06-04 20:04 ` Viacheslav Dubeyko 2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 4 siblings, 1 reply; 17+ messages in thread From: Michael Bommarito @ 2026-06-04 18:08 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets from each per-mds info record and advances the decode cursor by num_export_targets * sizeof(u32) without first checking that many bytes remain. The only upper-bound check that catches a runaway cursor (*p > info_end) is gated on info_v >= 4, because info_end is left NULL for info_v 2 and 3. When the monitor sends an MDS map whose per-mds info version is 2 or 3 with an oversized num_export_targets, the cursor moves past the message front buffer and the later export-targets loop calls the unchecked ceph_decode_32() on out-of-bounds memory. A kernel client processes CEPH_MSG_MDS_MAP from its monitor session (net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound and calls ceph_mdsmap_decode()). A malicious or compromised monitor, or an on-path attacker on an unsigned/unencrypted messenger session, can therefore drive an out-of-bounds read in the client kernel; on x86_64 with KASAN it is reported as a slab-out-of-bounds read in ceph_mdsmap_decode(). The decoded values land in the internal info->export_targets[] array, so the consequence is a kernel out-of-bounds read, not an information leak to the attacker. Impact: a malicious or compromised Ceph monitor sending an MDS map with a per-mds info version of 2 or 3 and an oversized num_export_targets field triggers an out-of-bounds read in the CephFS client kernel. Add a ceph_decode_need() for num_export_targets * sizeof(u32) before advancing the cursor, so the bound is enforced for every info_v >= 2, not only info_v >= 4. This mirrors the count-then-need idiom already used for m_data_pg_pools later in the same function. Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> --- fs/ceph/mdsmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c index d8e46eb7e5eb5..7cb65f9f4783c 100644 --- a/fs/ceph/mdsmap.c +++ b/fs/ceph/mdsmap.c @@ -224,6 +224,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p, *p += namelen; if (info_v >= 2) { ceph_decode_32_safe(p, end, num_export_targets, bad); + ceph_decode_need(p, end, + num_export_targets * sizeof(u32), bad); pexport_targets = *p; *p += num_export_targets * sizeof(u32); } else { -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito @ 2026-06-04 20:04 ` Viacheslav Dubeyko 2026-06-04 20:23 ` Michael Bommarito 0 siblings, 1 reply; 17+ messages in thread From: Viacheslav Dubeyko @ 2026-06-04 20:04 UTC (permalink / raw) To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel On Thu, 2026-06-04 at 14:08 -0400, Michael Bommarito wrote: > ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets > from > each per-mds info record and advances the decode cursor by > num_export_targets * sizeof(u32) without first checking that many > bytes > remain. The only upper-bound check that catches a runaway cursor > (*p > info_end) is gated on info_v >= 4, because info_end is left > NULL > for info_v 2 and 3. When the monitor sends an MDS map whose per-mds > info version is 2 or 3 with an oversized num_export_targets, the > cursor > moves past the message front buffer and the later export-targets loop > calls the unchecked ceph_decode_32() on out-of-bounds memory. > > A kernel client processes CEPH_MSG_MDS_MAP from its monitor session > (net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to > ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound > and > calls ceph_mdsmap_decode()). A malicious or compromised monitor, or > an > on-path attacker on an unsigned/unencrypted messenger session, can > therefore drive an out-of-bounds read in the client kernel; on x86_64 > with KASAN it is reported as a slab-out-of-bounds read in > ceph_mdsmap_decode(). The decoded values land in the internal > info->export_targets[] array, so the consequence is a kernel > out-of-bounds read, not an information leak to the attacker. > > Impact: a malicious or compromised Ceph monitor sending an MDS map > with > a per-mds info version of 2 or 3 and an oversized num_export_targets > field triggers an out-of-bounds read in the CephFS client kernel. > > Add a ceph_decode_need() for num_export_targets * sizeof(u32) before > advancing the cursor, so the bound is enforced for every info_v >= 2, > not only info_v >= 4. This mirrors the count-then-need idiom already > used for m_data_pg_pools later in the same function. > > Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support") > Cc: stable@vger.kernel.org > Assisted-by: Claude:claude-opus-4-8 > Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> > --- > fs/ceph/mdsmap.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c > index d8e46eb7e5eb5..7cb65f9f4783c 100644 > --- a/fs/ceph/mdsmap.c > +++ b/fs/ceph/mdsmap.c > @@ -224,6 +224,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct > ceph_mds_client *mdsc, void **p, > *p += namelen; > if (info_v >= 2) { > ceph_decode_32_safe(p, end, > num_export_targets, bad); > + ceph_decode_need(p, end, > + num_export_targets * > sizeof(u32), bad); We extract the num_export_targets from the message. What if the num_export_targets is huge enough? I assume we could potentially have overflow during the multiplication. What do you think? Thanks, Slava. > pexport_targets = *p; > *p += num_export_targets * sizeof(u32); > } else { ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 2026-06-04 20:04 ` Viacheslav Dubeyko @ 2026-06-04 20:23 ` Michael Bommarito 0 siblings, 0 replies; 17+ messages in thread From: Michael Bommarito @ 2026-06-04 20:23 UTC (permalink / raw) To: Viacheslav Dubeyko; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel > We extract the num_export_targets from the message. What if the > num_export_targets is huge enough? I assume we could potentially have > overflow during the multiplication. What do you think? Good catch, we definitely need a v2 to address that with size_mul Thanks, Mike ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() 2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito ` (2 preceding siblings ...) 2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito @ 2026-06-04 18:09 ` Michael Bommarito 2026-06-04 21:06 ` Viacheslav Dubeyko 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 4 siblings, 1 reply; 17+ messages in thread From: Michael Bommarito @ 2026-06-04 18:09 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel ceph_parse_deleg_inos() decodes interval sets of delegated inode numbers from an MDS create-with-delegation reply. For each set it reads a 64-bit start and a 64-bit len with ceph_decode_64_safe(), which only validates that the eight bytes are present in the message, not the value, and then loops: while (len--) { xa_insert(&s->s_delegated_inos, start++, DELEGATED_INO_AVAILABLE, GFP_KERNEL); ... } len is fully attacker controlled. A malicious or compromised MDS can send a create reply whose interval set declares len near 2^63, driving an effectively unbounded loop that performs a GFP_KERNEL xarray insert on each iteration. This spins a kernel thread in the reply dispatch path and exhausts memory; on a client that has negotiated CEPHFS_FEATURE_DELEG_INO (enabled by default) a single reply is enough to wedge the mount. A legitimate MDS delegates only a small range per set (the userspace MDS prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject any set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any legitimate value) by treating the reply as malformed and returning -EIO, consistent with the existing decode error handling. Normal delegations are well under the cap and are unaffected. Fixes: d4846487870897 ("ceph: decode interval_sets for delegated inos") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-8 --- fs/ceph/mds_client.c | 14 ++++++++++++++ fs/ceph/super.h | 9 +++++++++ 2 files changed, 23 insertions(+) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 4f36ac73305dc..0a084c4f3aae2 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void *end, start, len); continue; } + + /* + * A legitimate MDS delegates a small range per set. Treat a + * count larger than any plausible delegation window as a + * malformed reply rather than spinning while(len--) and + * inserting unbounded xarray entries. + */ + if (len > CEPH_MAX_DELEG_INOS) { + pr_warn_ratelimited_client(cl, + "rejecting oversized inode range delegation (start=0x%llx len=0x%llx)\n", + start, len); + return -EIO; + } + while (len--) { int err = xa_insert(&s->s_delegated_inos, start++, DELEGATED_INO_AVAILABLE, diff --git a/fs/ceph/super.h b/fs/ceph/super.h index afc89ce91804e..43a9b075f344c 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode *inode, void *data) #define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS) #define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY)) +/* + * Upper bound on the number of inodes the MDS may delegate to a client in a + * single interval set. The userspace MDS hands out at most a few thousand + * (mds_client_prealloc_inos, default 1000); 1M is far above any legitimate + * value and guards ceph_parse_deleg_inos() against an unbounded loop driven + * by an attacker controlled 64-bit length. + */ +#define CEPH_MAX_DELEG_INOS (1024 * 1024) + static inline bool ceph_vino_is_reserved(const struct ceph_vino vino) { if (vino.ino >= CEPH_INO_SYSTEM_BASE || -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() 2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito @ 2026-06-04 21:06 ` Viacheslav Dubeyko 2026-06-04 21:41 ` Michael Bommarito 0 siblings, 1 reply; 17+ messages in thread From: Viacheslav Dubeyko @ 2026-06-04 21:06 UTC (permalink / raw) To: Michael Bommarito, Ilya Dryomov, Alex Markuze; +Cc: ceph-devel, linux-kernel On Thu, 2026-06-04 at 14:09 -0400, Michael Bommarito wrote: > ceph_parse_deleg_inos() decodes interval sets of delegated inode > numbers > from an MDS create-with-delegation reply. For each set it reads a 64- > bit > start and a 64-bit len with ceph_decode_64_safe(), which only > validates > that the eight bytes are present in the message, not the value, and > then > loops: > > while (len--) { > xa_insert(&s->s_delegated_inos, start++, > DELEGATED_INO_AVAILABLE, GFP_KERNEL); > ... > } > > len is fully attacker controlled. A malicious or compromised MDS can > send > a create reply whose interval set declares len near 2^63, driving an > effectively unbounded loop that performs a GFP_KERNEL xarray insert > on > each iteration. This spins a kernel thread in the reply dispatch path > and > exhausts memory; on a client that has negotiated > CEPHFS_FEATURE_DELEG_INO > (enabled by default) a single reply is enough to wedge the mount. > > A legitimate MDS delegates only a small range per set (the userspace > MDS > prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject > any > set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any > legitimate > value) by treating the reply as malformed and returning -EIO, > consistent > with the existing decode error handling. Normal delegations are well > under > the cap and are unaffected. > > Fixes: d4846487870897 ("ceph: decode interval_sets for delegated > inos") > Cc: stable@vger.kernel.org > Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> > Assisted-by: Claude:claude-opus-4-8 > --- > fs/ceph/mds_client.c | 14 ++++++++++++++ > fs/ceph/super.h | 9 +++++++++ > 2 files changed, 23 insertions(+) > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > index 4f36ac73305dc..0a084c4f3aae2 100644 > --- a/fs/ceph/mds_client.c > +++ b/fs/ceph/mds_client.c > @@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void > *end, > start, len); > continue; > } > + > + /* > + * A legitimate MDS delegates a small range per set. > Treat a > + * count larger than any plausible delegation window > as a > + * malformed reply rather than spinning while(len--) > and > + * inserting unbounded xarray entries. > + */ > + if (len > CEPH_MAX_DELEG_INOS) { > + pr_warn_ratelimited_client(cl, > + "rejecting oversized inode range > delegation (start=0x%llx len=0x%llx)\n", > + start, len); > + return -EIO; > + } > + > while (len--) { > int err = xa_insert(&s->s_delegated_inos, > start++, > DELEGATED_INO_AVAILABLE, > diff --git a/fs/ceph/super.h b/fs/ceph/super.h > index afc89ce91804e..43a9b075f344c 100644 > --- a/fs/ceph/super.h > +++ b/fs/ceph/super.h > @@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode > *inode, void *data) > #define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS) > #define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) + > (CEPH_MAX_MDS * CEPH_NUM_STRAY)) > > +/* > + * Upper bound on the number of inodes the MDS may delegate to a > client in a > + * single interval set. The userspace MDS hands out at most a few > thousand > + * (mds_client_prealloc_inos, default 1000); 1M is far above any > legitimate > + * value and guards ceph_parse_deleg_inos() against an unbounded > loop driven > + * by an attacker controlled 64-bit length. > + */ > +#define CEPH_MAX_DELEG_INOS (1024 * 1024) Technically speaking, I am not completely convinced by this limit. Also, this limit is for one single interval. But if attacker tries multiple times, then we can still have memory pressure or run out of memory. I think we need to consider more robust solution. Could we improve the fix? Thanks, Slava. > + > static inline bool ceph_vino_is_reserved(const struct ceph_vino > vino) > { > if (vino.ino >= CEPH_INO_SYSTEM_BASE || ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() 2026-06-04 21:06 ` Viacheslav Dubeyko @ 2026-06-04 21:41 ` Michael Bommarito 2026-06-05 19:10 ` Viacheslav Dubeyko 0 siblings, 1 reply; 17+ messages in thread From: Michael Bommarito @ 2026-06-04 21:41 UTC (permalink / raw) To: Viacheslav Dubeyko; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel On Thu, Jun 4, 2026 at 5:06 PM Viacheslav Dubeyko <slava@dubeyko.com> wrote: > Technically speaking, I am not completely convinced by this limit. > Also, this limit is for one single interval. But if attacker tries > multiple times, then we can still have memory pressure or run out of > memory. I think we need to consider more robust solution. Could we > improve the fix? I looked at a per session limit but the lock management gets to be a bit more complicated. If you think that's the right direction, I can implement v2 based on that. About the count, if we move to per session, then do you agree that the right number is based on some factor of mds_client_prealloc_inos? Thanks, Mike ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() 2026-06-04 21:41 ` Michael Bommarito @ 2026-06-05 19:10 ` Viacheslav Dubeyko 0 siblings, 0 replies; 17+ messages in thread From: Viacheslav Dubeyko @ 2026-06-05 19:10 UTC (permalink / raw) To: Michael Bommarito; +Cc: Ilya Dryomov, Alex Markuze, ceph-devel, linux-kernel On Thu, 2026-06-04 at 17:41 -0400, Michael Bommarito wrote: > On Thu, Jun 4, 2026 at 5:06 PM Viacheslav Dubeyko <slava@dubeyko.com> > wrote: > > Technically speaking, I am not completely convinced by this limit. > > Also, this limit is for one single interval. But if attacker tries > > multiple times, then we can still have memory pressure or run out > > of > > memory. I think we need to consider more robust solution. Could we > > improve the fix? > > I looked at a per session limit but the lock management gets to be a > bit more complicated. If you think that's the right direction, I can > implement v2 based on that. > > About the count, if we move to per session, then do you agree that > the > right number is based on some factor of mds_client_prealloc_inos? I believe that it is right direction. Anyway, it's better than to operate by assumption or hardcode the constant into the code. Thanks, Slava. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders 2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito ` (3 preceding siblings ...) 2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito @ 2026-06-06 19:00 ` Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito ` (3 more replies) 4 siblings, 4 replies; 17+ messages in thread From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel This is v2 of the CephFS decoder-bound series. The first two patches are unchanged code-wise and now carry Slava's Reviewed-by. Patches 3 and 4 address the review feedback on overflow-safe sizing and aggregate delegated inode bounds. The four bugs are still independent: 1/4 rejects a final xattr value length that runs past the xattr blob. 2/4 bounds MDSCapAuth path and fs_name copies in handle_session(). 3/4 bounds the mdsmap export_targets array for info_v 2/3. 4/4 bounds delegated-inode parsing by session population and by one reply's aggregate interval length. Changes in v2: - Add Reviewed-by: Viacheslav Dubeyko to patches 1 and 2. - Patch 3 computes the export-targets byte count with size_mul() and reuses the checked length for the cursor advance. - Patch 4 replaces the per-interval cap with a per-session population counter and a per-reply interval budget, so repeated replies and duplicate ranges are bounded too. The cap stays a fixed client-side constant because the kernel client never sees the userspace mds_client_prealloc_inos option; it is sized as a generous multiple of that option's documented default of 1000. Michael Bommarito (4): ceph: bound xattr value length in __build_xattrs() ceph: bound MDSCapAuth path and fs_name decode in handle_session() ceph: bound num_export_targets array for mds info v2/v3 ceph: cap delegated inode count in ceph_parse_deleg_inos() fs/ceph/mds_client.c | 59 ++++++++++++++++++++++++++++++++++++++------ fs/ceph/mds_client.h | 1 + fs/ceph/mdsmap.c | 7 +++++- fs/ceph/super.h | 9 +++++++ fs/ceph/xattr.c | 1 + 5 files changed, 68 insertions(+), 9 deletions(-) base-commit: f72c95f3a516d87483e225ae081a402a09fd0127 -- 2.53.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito @ 2026-06-06 19:00 ` Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito ` (2 subsequent siblings) 3 siblings, 0 replies; 17+ messages in thread From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel __build_xattrs() decodes the MDS-supplied xattr blob one attribute at a time. For each attribute it reads a 32-bit name length, advances past the name bytes, reads a 32-bit value length, records the value pointer, and advances past the value bytes. The two length fields are read with ceph_decode_32_safe(), but the value bytes themselves are advanced over with a bare "p += len" and no ceph_decode_need() check that "len" bytes remain in the blob. For every attribute except the last, the next iteration's ceph_decode_32_safe() on the following name length implicitly verifies that the previous value did not run past the blob end. The final attribute has no successor, so its decoded value length is never checked against the blob bounds. A malicious or compromised metadata server can set the last attribute's value length larger than the bytes actually present in the blob. The blob is a dedicated kvmalloc() allocation sized to the wire length (ceph_buffer_new() in ceph_fill_inode()). __set_xattr() records the oversized length in xattr->val_len verbatim, and a later getxattr(2) runs memcpy(value, xattr->val, xattr->val_len) into a user-supplied buffer, copying bytes past the end of the allocation back to user space. Impact: a malicious metadata server discloses adjacent kernel heap bytes to a local user via getxattr(2) on a CephFS file. Add the missing ceph_decode_need() so an out-of-bounds value length on the final attribute fails the decode and returns -EIO instead of being stored. Fixes: 355da1eb7a1f ("ceph: inode operations") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-8 Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> --- v2: - Add Reviewed-by from Viacheslav Dubeyko. fs/ceph/xattr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index e773be07f7674..d488bb8fc00ba 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -848,6 +848,7 @@ static int __build_xattrs(struct inode *inode) name = p; p += len; ceph_decode_32_safe(&p, end, len, bad); + ceph_decode_need(&p, end, len, bad); val = p; p += len; -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito @ 2026-06-06 19:00 ` Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito 3 siblings, 0 replies; 17+ messages in thread From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel handle_session() decodes the MDSCapAuth records carried by a CEPH_SESSION_OPEN message (msg_version >= 6). For each record the match.path and match.fs_name byte strings are read by first decoding a 32-bit length and then copying that many bytes with the bare ceph_decode_copy(). Unlike the surrounding fields, which all use the _safe decode variants, these two copies are not preceded by a ceph_decode_need() bounds check, and the enclosing MDSCapAuth and MDSCapMatch struct_len fields are skipped rather than enforced as an upper bound. A length larger than the bytes remaining in the message front makes ceph_decode_copy() read past the end of the front buffer. The message front is a dedicated allocation (ceph_msg_new2() -> kvmalloc), so the over-read runs off that object. A malicious or compromised MDS can trigger this with the first post-connect message on mount, with no client-side user interaction; under KASAN it is reported as a slab-out-of-bounds read in handle_session(). Impact: a malicious MDS can force the kernel client to read up to 4 GiB past the message front allocation during session setup, crashing the client (out-of-bounds read). Switch both copies to ceph_decode_copy_safe(), which performs the ceph_decode_need() bounds check before the copy and branches to the existing bad label, matching the rest of the decoder and the error path that frees the partially decoded cap_auths array. Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-8 Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> --- v2: - Add Reviewed-by from Viacheslav Dubeyko. fs/ceph/mds_client.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index ed17e0023705e..4f36ac73305dc 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4318,7 +4318,9 @@ static void handle_session(struct ceph_mds_session *session, pr_err_client(cl, "No memory for path\n"); goto fail; } - ceph_decode_copy(&p, cap_auths[i].match.path, _len); + ceph_decode_copy_safe(&p, end, + cap_auths[i].match.path, + _len, bad); /* Remove the tailing '/' */ while (_len && cap_auths[i].match.path[_len - 1] == '/') { @@ -4335,7 +4337,9 @@ static void handle_session(struct ceph_mds_session *session, pr_err_client(cl, "No memory for fs_name\n"); goto fail; } - ceph_decode_copy(&p, cap_auths[i].match.fs_name, _len); + ceph_decode_copy_safe(&p, end, + cap_auths[i].match.fs_name, + _len, bad); } ceph_decode_8_safe(&p, end, cap_auths[i].match.root_squash, bad); -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito @ 2026-06-06 19:00 ` Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito 3 siblings, 0 replies; 17+ messages in thread From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel ceph_mdsmap_decode() in fs/ceph/mdsmap.c reads num_export_targets from each per-mds info record and advances the decode cursor by num_export_targets * sizeof(u32) without first checking that many bytes remain. The only upper-bound check that catches a runaway cursor (*p > info_end) is gated on info_v >= 4, because info_end is left NULL for info_v 2 and 3. When the monitor sends an MDS map whose per-mds info version is 2 or 3 with an oversized num_export_targets, the cursor moves past the message front buffer and the later export-targets loop calls the unchecked ceph_decode_32() on out-of-bounds memory. A kernel client processes CEPH_MSG_MDS_MAP from its monitor session (net/ceph/mon_client.c dispatches it; fs/ceph/super.c routes it to ceph_mdsc_handle_mdsmap(), which sets end to the front buffer bound and calls ceph_mdsmap_decode()). A malicious or compromised monitor, or an on-path attacker on an unsigned/unencrypted messenger session, can therefore drive an out-of-bounds read in the client kernel; on x86_64 with KASAN it is reported as a slab-out-of-bounds read in ceph_mdsmap_decode(). The decoded values land in the internal info->export_targets[] array, so the consequence is a kernel out-of-bounds read, not an information leak to the attacker. Impact: a malicious or compromised Ceph monitor sending an MDS map with a per-mds info version of 2 or 3 and an oversized num_export_targets field triggers an out-of-bounds read in the CephFS client kernel. Add a ceph_decode_need() for the export-targets array before advancing the cursor, so the bound is enforced for every info_v >= 2, not only info_v >= 4. This mirrors the count-then-need idiom already used for m_data_pg_pools later in the same function. Compute the export-targets byte count with size_mul() and reuse that checked length when advancing the cursor, so the attacker-controlled num_export_targets multiplication fails closed on overflow rather than relying on the later kcalloc() guard. Fixes: d463a43d69f4 ("ceph: CEPH_FEATURE_MDSENC support") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> --- v2: - Compute the export-targets byte count with size_mul() and reuse the checked length when advancing the cursor. fs/ceph/mdsmap.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c index d8e46eb7e5eb5..c0f63f0460f67 100644 --- a/fs/ceph/mdsmap.c +++ b/fs/ceph/mdsmap.c @@ -3,6 +3,7 @@ #include <linux/bug.h> #include <linux/err.h> +#include <linux/overflow.h> #include <linux/random.h> #include <linux/slab.h> #include <linux/types.h> @@ -126,6 +127,7 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p, u8 mdsmap_v; u16 mdsmap_ev; u32 target; + size_t export_targets_len; m = kzalloc_obj(*m, GFP_NOFS); if (!m) @@ -224,8 +226,11 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p, *p += namelen; if (info_v >= 2) { ceph_decode_32_safe(p, end, num_export_targets, bad); + export_targets_len = size_mul(num_export_targets, + sizeof(u32)); + ceph_decode_need(p, end, export_targets_len, bad); pexport_targets = *p; - *p += num_export_targets * sizeof(u32); + *p += export_targets_len; } else { num_export_targets = 0; } -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito ` (2 preceding siblings ...) 2026-06-06 19:00 ` [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito @ 2026-06-06 19:00 ` Michael Bommarito 3 siblings, 0 replies; 17+ messages in thread From: Michael Bommarito @ 2026-06-06 19:00 UTC (permalink / raw) To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko; +Cc: ceph-devel, linux-kernel ceph_parse_deleg_inos() decodes interval sets of delegated inode numbers from an MDS create-with-delegation reply. For each set it reads a 64-bit start and a 64-bit len with ceph_decode_64_safe(), which only validates that the eight bytes are present in the message, not the value, and then loops over len while inserting entries into s_delegated_inos. len is fully attacker controlled. A malicious or compromised MDS can send one huge interval, many intervals in one reply, duplicate intervals, or repeated replies that accumulate delegated inodes on the same session. The v1 fix bounded only one interval and still allowed aggregate CPU and memory pressure. Bound both dimensions. Track the number of delegated inodes currently held by each MDS session, reject replies that would exceed that population cap, and also cap the total interval length accepted from one reply so duplicate ranges cannot spin through an attacker-controlled len while making no successful xarray inserts. The counter is decremented when async create consumes a delegated inode, incremented when a delegated inode is restored, initialized with the session xarray, and reset when reconnect destroys the xarray contents. The cap is a fixed, client-chosen constant rather than a value derived from the MDS. mds_client_prealloc_inos is a userspace MDS configuration option; it is never sent to the kernel client on the wire, and a server-supplied bound could not be trusted for a defensive limit in any case. The constant is set well above that option's documented default of 1000 (a generous multiple), so legitimate refill behavior is unaffected while the CPU and xarray memory a malformed delegation stream can consume stays bounded. Impact: a malicious or compromised Ceph MDS can no longer make a client spin through an unbounded delegated-inode interval or grow one session's delegated-inode xarray without limit. Fixes: d48464878708 ("ceph: decode interval_sets for delegated inos") Cc: stable@vger.kernel.org Suggested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> --- v2: - Replace the per-interval cap with a per-session delegated-inode population counter. - Add a per-reply interval budget so duplicate ranges cannot run the insertion loop without increasing the population counter. - Reset the counter when reconnect destroys the delegated-inode xarray. - Document why the cap is a fixed client-side constant (the kernel client cannot read the userspace mds_client_prealloc_inos option). fs/ceph/mds_client.c | 51 ++++++++++++++++++++++++++++++++++++++------ fs/ceph/mds_client.h | 1 + fs/ceph/super.h | 9 ++++++++ 3 files changed, 55 insertions(+), 6 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 4f36ac73305dc..ec5a80c9f86a7 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -612,16 +612,34 @@ static int parse_reply_info_filelock(void **p, void *end, #define DELEGATED_INO_AVAILABLE xa_mk_value(1) +static int ceph_insert_deleg_ino(struct ceph_mds_session *s, u64 ino) +{ + int err; + + if (atomic_inc_return(&s->s_num_deleg_inos) > CEPH_MAX_DELEG_INOS) { + atomic_dec(&s->s_num_deleg_inos); + return -EOVERFLOW; + } + + err = xa_insert(&s->s_delegated_inos, ino, DELEGATED_INO_AVAILABLE, + GFP_KERNEL); + if (err) + atomic_dec(&s->s_num_deleg_inos); + return err; +} + static int ceph_parse_deleg_inos(void **p, void *end, struct ceph_mds_session *s) { struct ceph_client *cl = s->s_mdsc->fsc->client; + u64 msg_deleg_inos = 0; u32 sets; ceph_decode_32_safe(p, end, sets, bad); doutc(cl, "got %u sets of delegated inodes\n", sets); while (sets--) { u64 start, len; + int session_deleg_inos; ceph_decode_64_safe(p, end, start, bad); ceph_decode_64_safe(p, end, len, bad); @@ -633,16 +651,34 @@ static int ceph_parse_deleg_inos(void **p, void *end, start, len); continue; } + + if (len > CEPH_MAX_DELEG_INOS || + msg_deleg_inos > CEPH_MAX_DELEG_INOS - len) { + pr_warn_ratelimited_client(cl, "too many delegated inodes in reply\n"); + return -EIO; + } + + session_deleg_inos = atomic_read(&s->s_num_deleg_inos); + if (session_deleg_inos > CEPH_MAX_DELEG_INOS || + len > CEPH_MAX_DELEG_INOS - session_deleg_inos) { + pr_warn_ratelimited_client(cl, "too many delegated inodes for session\n"); + return -EIO; + } + + msg_deleg_inos += len; + while (len--) { - int err = xa_insert(&s->s_delegated_inos, start++, - DELEGATED_INO_AVAILABLE, - GFP_KERNEL); + int err = ceph_insert_deleg_ino(s, start++); + if (!err) { doutc(cl, "added delegated inode 0x%llx\n", start - 1); } else if (err == -EBUSY) { pr_warn_client(cl, "MDS delegated inode 0x%llx more than once.\n", start - 1); + } else if (err == -EOVERFLOW) { + pr_warn_ratelimited_client(cl, "too many delegated inodes\n"); + return -EIO; } else { return err; } @@ -660,16 +696,17 @@ u64 ceph_get_deleg_ino(struct ceph_mds_session *s) xa_for_each(&s->s_delegated_inos, ino, val) { val = xa_erase(&s->s_delegated_inos, ino); - if (val == DELEGATED_INO_AVAILABLE) + if (val == DELEGATED_INO_AVAILABLE) { + atomic_dec(&s->s_num_deleg_inos); return ino; + } } return 0; } int ceph_restore_deleg_ino(struct ceph_mds_session *s, u64 ino) { - return xa_insert(&s->s_delegated_inos, ino, DELEGATED_INO_AVAILABLE, - GFP_KERNEL); + return ceph_insert_deleg_ino(s, ino); } #else /* BITS_PER_LONG == 64 */ /* @@ -1056,6 +1093,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc, INIT_LIST_HEAD(&s->s_waiting); INIT_LIST_HEAD(&s->s_unsafe); xa_init(&s->s_delegated_inos); + atomic_set(&s->s_num_deleg_inos, 0); INIT_LIST_HEAD(&s->s_cap_releases); INIT_WORK(&s->s_cap_release_work, ceph_cap_release_work); @@ -4971,6 +5009,7 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc, goto fail_nomsg; xa_destroy(&session->s_delegated_inos); + atomic_set(&session->s_num_deleg_inos, 0); mutex_lock(&session->s_mutex); session->s_state = CEPH_MDS_SESSION_RECONNECTING; diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 4e6c87f8414cc..3810b51ece2e6 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -255,6 +255,7 @@ struct ceph_mds_session { struct list_head s_waiting; /* waiting requests */ struct list_head s_unsafe; /* unsafe requests */ struct xarray s_delegated_inos; + atomic_t s_num_deleg_inos; }; /* diff --git a/fs/ceph/super.h b/fs/ceph/super.h index afc89ce91804e..bfc26e4d83bb4 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode *inode, void *data) #define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS) #define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY)) +/* + * Upper bound on the number of delegated inodes a single MDS session may + * hold. The MDS normally hands out a small preallocation window (the + * userspace mds_client_prealloc_inos option defaults to 1000) and refills + * it as the client consumes entries. This leaves generous headroom while + * bounding the CPU and memory a malformed delegation interval can consume. + */ +#define CEPH_MAX_DELEG_INOS 8192 + static inline bool ceph_vino_is_reserved(const struct ceph_vino vino) { if (vino.ino >= CEPH_INO_SYSTEM_BASE || -- 2.53.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-06-06 19:01 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-04 18:08 [PATCH 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 2026-06-04 18:08 ` [PATCH 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito 2026-06-04 19:50 ` Viacheslav Dubeyko 2026-06-04 18:08 ` [PATCH 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito 2026-06-04 19:54 ` Viacheslav Dubeyko 2026-06-04 18:08 ` [PATCH 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito 2026-06-04 20:04 ` Viacheslav Dubeyko 2026-06-04 20:23 ` Michael Bommarito 2026-06-04 18:09 ` [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito 2026-06-04 21:06 ` Viacheslav Dubeyko 2026-06-04 21:41 ` Michael Bommarito 2026-06-05 19:10 ` Viacheslav Dubeyko 2026-06-06 19:00 ` [PATCH v2 0/4] ceph: bound untrusted MDS and monitor reply decoders Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 1/4] ceph: bound xattr value length in __build_xattrs() Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 2/4] ceph: bound MDSCapAuth path and fs_name decode in handle_session() Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 3/4] ceph: bound num_export_targets array for mds info v2/v3 Michael Bommarito 2026-06-06 19:00 ` [PATCH v2 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos() Michael Bommarito
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox