* [PATCH v2 0/2] Bounce buffer for mds client decryption when vmalloc()
@ 2026-05-30 3:06 Sam Edwards
2026-05-30 3:06 ` [PATCH v2 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
2026-05-30 3:06 ` [PATCH v2 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
0 siblings, 2 replies; 3+ messages in thread
From: Sam Edwards @ 2026-05-30 3:06 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
Cc: Jeff Layton, Xiubo Li, ceph-devel, linux-kernel, Sam Edwards
Hi Ceph maintainers,
This is version 2 of my previous patchset to resolve fscrypt oops/panic when
decrypting filenames from a MDS message that spilled into the vmalloc area. [1]
Cheers,
Sam
Changes v1->v2:
- As Slava pointed out, v1 was performing `snprintf(..., oname->name)` *before*
the memcpy() that makes `oname->name` valid. He also raised a great point
about the if/else-if block's complexity. This version simplifies the control
flow structure.
- Reformat some whitespace and remove unnecessary linebreaks, going a little
over the 80-column soft limit, but improving readability.
Feedback not addressed:
- David shared some discomfort about fscrypt_fname_alloc_buffer() sometimes
changing the allocation length, but it only ever lengthens the allocation and
is therefore always at least NAME_MAX.
- David also began some discussion around passing an explicit `tname` length;
I'm still open to the idea, but would like to get a consensus around that
first, because enforcing the buffer size will likely require changes to the
base64_{de,en}code() function prototypes to accept buffer sizes.
[1] https://lore.kernel.org/ceph-devel/20260527025828.5966-1-CFSworks@gmail.com/
Sam Edwards (2):
ceph: pass fscrypt `tname` buffers directly
ceph: properly decrypt filenames in vmalloc() buffers
fs/ceph/crypto.c | 50 ++++++++++++++++++++++++++++++++------------
fs/ceph/crypto.h | 4 ++--
fs/ceph/mds_client.c | 12 +++++++----
3 files changed, 47 insertions(+), 19 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v2 1/2] ceph: pass fscrypt `tname` buffers directly
2026-05-30 3:06 [PATCH v2 0/2] Bounce buffer for mds client decryption when vmalloc() Sam Edwards
@ 2026-05-30 3:06 ` Sam Edwards
2026-05-30 3:06 ` [PATCH v2 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
1 sibling, 0 replies; 3+ messages in thread
From: Sam Edwards @ 2026-05-30 3:06 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
Cc: Jeff Layton, Xiubo Li, ceph-devel, linux-kernel, Sam Edwards
ceph_fname_to_usr() needs a temporary buffer for some operations
(currently only base64-decoding ciphertext) and it is convenient to
allow the caller to specify this buffer to avoid a heap allocation, so
it has a (nullable) `tname` argument. Until now, this argument was a
`struct fscrypt_str`; however, this is unnecessary for two reasons:
1. `tname->len` isn't used anywhere: ceph_fname_to_usr() assumes a
buffer large enough to hold the ciphertext, and
parse_reply_info_readdir() -- the only caller to use tname -- doesn't
set it.
2. While the `tname` parameter is documented "may be NULL,"
parse_reply_info_readdir() always passes it but with `tname->name`
sometimes NULL in violation of the contract, indicating that the
unnecessary container creates actual confusion.
Therefore, change the type to `unsigned char *` and pass the buffer
directly.
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
fs/ceph/crypto.c | 9 ++++-----
fs/ceph/crypto.h | 4 ++--
fs/ceph/mds_client.c | 6 +++---
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 64d240759277..7493a3acd7d0 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -300,7 +300,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
*
* Returns 0 on success or negative error code on error.
*/
-int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
struct fscrypt_str *oname, bool *is_nokey)
{
struct inode *dir = fname->dir;
@@ -357,16 +357,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
if (ret)
goto out_inode;
- tname = &_tname;
+ tname = _tname.name;
}
- declen = base64_decode(name, name_len,
- tname->name, false, BASE64_IMAP);
+ declen = base64_decode(name, name_len, tname, false, BASE64_IMAP);
if (declen <= 0) {
ret = -EIO;
goto out;
}
- iname.name = tname->name;
+ iname.name = tname;
iname.len = declen;
} else {
iname.name = fname->ctext;
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index b748e2060bc9..79cb563fd887 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -115,7 +115,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
fscrypt_fname_free_buffer(fname);
}
-int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
struct fscrypt_str *oname, bool *is_nokey);
int ceph_fscrypt_prepare_readdir(struct inode *dir);
@@ -204,7 +204,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
}
static inline int ceph_fname_to_usr(const struct ceph_fname *fname,
- struct fscrypt_str *tname,
+ unsigned char *tname,
struct fscrypt_str *oname, bool *is_nokey)
{
oname->name = fname->name;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ed17e0023705..aa6730b48e97 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -488,11 +488,11 @@ static int parse_reply_info_readdir(void **p, void *end,
struct inode *inode = d_inode(req->r_dentry);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
- struct fscrypt_str tname = FSTR_INIT(NULL, 0);
struct fscrypt_str oname = FSTR_INIT(NULL, 0);
struct ceph_fname fname;
u32 altname_len, _name_len;
u8 *altname, *_name;
+ u8 *tname = NULL;
/* dentry */
ceph_decode_32_safe(p, end, _name_len, bad);
@@ -540,7 +540,7 @@ static int parse_reply_info_readdir(void **p, void *end,
* always be shorter, which is 3/4 of origin
* string.
*/
- tname.name = _name;
+ tname = _name;
/*
* Set oname to _name too, and this will be
@@ -557,7 +557,7 @@ static int parse_reply_info_readdir(void **p, void *end,
oname.len = altname_len;
}
rde->is_nokey = false;
- err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
+ err = ceph_fname_to_usr(&fname, tname, &oname, &rde->is_nokey);
if (err) {
pr_err_client(cl, "unable to decode %.*s, got %d\n",
_name_len, _name, err);
--
2.53.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v2 2/2] ceph: properly decrypt filenames in vmalloc() buffers
2026-05-30 3:06 [PATCH v2 0/2] Bounce buffer for mds client decryption when vmalloc() Sam Edwards
2026-05-30 3:06 ` [PATCH v2 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
@ 2026-05-30 3:06 ` Sam Edwards
1 sibling, 0 replies; 3+ messages in thread
From: Sam Edwards @ 2026-05-30 3:06 UTC (permalink / raw)
To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
Cc: Jeff Layton, Xiubo Li, ceph-devel, linux-kernel, Sam Edwards,
stable
The fscrypt subsystem uses the scatterlist crypto API, inheriting its
requirement that any buffers are in the linear mapping region. However,
the messenger client uses kvmalloc() to create buffers for messages,
which will occasionally place those buffers in the vmalloc() region when
physical memory fragmentation doesn't permit a large enough kmalloc().
The various callers of ceph_fname_to_usr() directly pass (slices of) raw
messages from the MDS without considering that the messages may be in
vmalloc() buffers, resulting in oopses especially on non-x86 platforms
(see 'Closes:' for more details and a reproducer).
Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
fname->ctext, fname->name, and/or oname->name buffers, using `tname`
(which, when non-null, must be a linear address; when null, is briefly
allocated as necessary) as a bounce buffer to avoid passing any
inappropriate addresses to fscrypt_fname_disk_to_usr().
Additionally change parse_reply_info_readdir() -- the only function to
supply its own `tname` -- to follow the new "tname must never come from
vmalloc()" rule by passing NULL when the message is not in the linear
region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
exists only when processing the minority of messages that spill into
vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
messages. Still, if the overhead proves unreasonable in the future, it
is easy enough to mitigate: a future change could allocate a bounce
buffer in parse_reply_info_readdir() and use that as `tname` instead.
Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
Closes: https://lore.kernel.org/ceph-devel/CAH5Ym4ga7miUQE0K-cJA93Ya7w62P69MAN27R5cBiYnudoOHdA@mail.gmail.com/T/
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
fs/ceph/crypto.c | 43 ++++++++++++++++++++++++++++++++++---------
fs/ceph/mds_client.c | 8 ++++++--
2 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 7493a3acd7d0..bc0a097a4cea 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
* Otherwise, base64 decode the string, and then ask fscrypt to format it
* for userland presentation.
*
+ * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
+ * linear-mapped region, this function slightly relaxes those requirements:
+ * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
+ *
* Returns 0 on success or negative error code on error.
*/
int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
@@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
{
struct inode *dir = fname->dir;
struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
+ struct fscrypt_str _oname;
struct fscrypt_str iname;
char *name = fname->name;
int name_len = fname->name_len;
int ret;
+ if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
+ return -EIO;
+
/* Sanity check that the resulting name will fit in the buffer */
if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
return -EIO;
@@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
goto out_inode;
}
+ if (!tname && (fname->ctext_len == 0 ||
+ unlikely(is_vmalloc_addr(fname->ctext)) ||
+ unlikely(is_vmalloc_addr(oname->name)))) {
+ ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
+ if (ret)
+ goto out_inode;
+ tname = _tname.name;
+ }
+
if (fname->ctext_len == 0) {
int declen;
- if (!tname) {
- ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
- if (ret)
- goto out_inode;
- tname = _tname.name;
- }
-
declen = base64_decode(name, name_len, tname, false, BASE64_IMAP);
if (declen <= 0) {
ret = -EIO;
@@ -367,13 +377,28 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
}
iname.name = tname;
iname.len = declen;
+ } else if (unlikely(is_vmalloc_addr(fname->ctext))) {
+ memcpy(tname, fname->ctext, fname->ctext_len);
+
+ iname.name = tname;
+ iname.len = fname->ctext_len;
} else {
iname.name = fname->ctext;
iname.len = fname->ctext_len;
}
- ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
- if (!ret && (dir != fname->dir)) {
+ _oname.name = unlikely(is_vmalloc_addr(oname->name)) ? tname : oname->name;
+ _oname.len = oname->len;
+
+ ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
+ if (ret)
+ goto out;
+
+ if (unlikely(is_vmalloc_addr(oname->name)))
+ memcpy(oname->name, _oname.name, _oname.len);
+ oname->len = _oname.len;
+
+ if (dir != fname->dir) {
char tmp_buf[BASE64_CHARS(NAME_MAX)];
name_len = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%llu",
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index aa6730b48e97..8fcf185e3a82 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
* to do the base64_decode in-place. It's
* safe because the decoded string should
* always be shorter, which is 3/4 of origin
- * string.
+ * string. If this message was allocated with
+ * vmalloc() (happens, but rarely), leave it
+ * NULL and let ceph_fname_to_usr() allocate
+ * suitable temporary working space instead.
*/
- tname = _name;
+ if (likely(!is_vmalloc_addr(_name)))
+ tname = _name;
/*
* Set oname to _name too, and this will be
--
2.53.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-30 3:07 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-30 3:06 [PATCH v2 0/2] Bounce buffer for mds client decryption when vmalloc() Sam Edwards
2026-05-30 3:06 ` [PATCH v2 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
2026-05-30 3:06 ` [PATCH v2 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox