All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Bounce buffer for mds client decryption when vmalloc()
@ 2026-05-27  2:58 Sam Edwards
  2026-05-27  2:58 ` [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
  2026-05-27  2:58 ` [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
  0 siblings, 2 replies; 9+ messages in thread
From: Sam Edwards @ 2026-05-27  2:58 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
  Cc: Jeff Layton, Xiubo Li, Milind Changire, ceph-devel, linux-kernel,
	Sam Edwards

Hi Ceph maintainers,

This is my proposed patchset after my previous RFC [1]. Apologies for the long
delay, life gets busy (and my Linux hacking time was eaten by trying to stay
patched in the face of the recent flood of LPE vulns), so it took me longer
than usual to get patches written, tested, and cleaned up. Glad to be back! :)

Since Alex pointed out last time that parse_reply_info_trace() has the same
latent bug, and Slava disliked the already-high complexity of
parse_reply_info_readdir(), I opted to resolve the issue in ceph_fname_to_usr()
instead. The purpose of patch #2 is to make ceph_fname_to_usr() explicitly
responsible for handling vmalloc()-backed filename buffers (for base64,
ciphertext, and plaintext), using the `tname` parameter as a bounce buffer
where appropriate (and preserving the allocate-when-needed behavior).

Patch #1 simplifies `struct fscrypt_str *tname` to `unsigned char *tname`, both
in recognition of `tname->len` being unused/unnecessary, and to eliminate the
`tname == NULL` / `tname->name == NULL` confusion.

Cheers,
Sam

[1] https://lore.kernel.org/ceph-devel/CAH5Ym4ga7miUQE0K-cJA93Ya7w62P69MAN27R5cBiYnudoOHdA@mail.gmail.com/T/

Sam Edwards (2):
  ceph: pass fscrypt `tname` buffers directly
  ceph: properly decrypt filenames in vmalloc() buffers

 fs/ceph/crypto.c     | 45 ++++++++++++++++++++++++++++++++------------
 fs/ceph/crypto.h     |  4 ++--
 fs/ceph/mds_client.c | 12 ++++++++----
 3 files changed, 43 insertions(+), 18 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly
  2026-05-27  2:58 [PATCH 0/2] Bounce buffer for mds client decryption when vmalloc() Sam Edwards
@ 2026-05-27  2:58 ` Sam Edwards
  2026-05-27 12:00   ` David Laight
  2026-05-27  2:58 ` [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
  1 sibling, 1 reply; 9+ messages in thread
From: Sam Edwards @ 2026-05-27  2:58 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
  Cc: Jeff Layton, Xiubo Li, Milind Changire, ceph-devel, linux-kernel,
	Sam Edwards

ceph_fname_to_usr() needs a temporary buffer for some operations
(currently only base64-decoding ciphertext) and it is convenient to
allow the caller to specify this buffer to avoid a heap allocation, so
it has a (nullable) `tname` argument. Until now, this argument was a
`struct fscrypt_str`; however, this is unnecessary for two reasons:

1. `tname->len` isn't used anywhere: ceph_fname_to_usr() assumes a
   buffer large enough to hold the ciphertext, and
   parse_reply_info_readdir() -- the only caller to use tname -- doesn't
   set it.
2. While the `tname` parameter is documented "may be NULL,"
   parse_reply_info_readdir() always passes it but with `tname->name`
   sometimes NULL in violation of the contract, indicating that the
   unnecessary container creates actual confusion.

Therefore, change the type to `unsigned char *` and pass the buffer
directly.

Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
 fs/ceph/crypto.c     | 10 +++++-----
 fs/ceph/crypto.h     |  4 ++--
 fs/ceph/mds_client.c |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 64d240759277..7515cb251226 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -300,7 +300,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
  *
  * Returns 0 on success or negative error code on error.
  */
-int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
 		      struct fscrypt_str *oname, bool *is_nokey)
 {
 	struct inode *dir = fname->dir;
@@ -357,16 +357,16 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
 			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
 			if (ret)
 				goto out_inode;
-			tname = &_tname;
+			tname = _tname.name;
 		}
 
-		declen = base64_decode(name, name_len,
-				       tname->name, false, BASE64_IMAP);
+		declen = base64_decode(name, name_len, tname, false,
+				       BASE64_IMAP);
 		if (declen <= 0) {
 			ret = -EIO;
 			goto out;
 		}
-		iname.name = tname->name;
+		iname.name = tname;
 		iname.len = declen;
 	} else {
 		iname.name = fname->ctext;
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index b748e2060bc9..79cb563fd887 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -115,7 +115,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
 		fscrypt_fname_free_buffer(fname);
 }
 
-int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
 		      struct fscrypt_str *oname, bool *is_nokey);
 int ceph_fscrypt_prepare_readdir(struct inode *dir);
 
@@ -204,7 +204,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
 }
 
 static inline int ceph_fname_to_usr(const struct ceph_fname *fname,
-				    struct fscrypt_str *tname,
+				    unsigned char *tname,
 				    struct fscrypt_str *oname, bool *is_nokey)
 {
 	oname->name = fname->name;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ed17e0023705..aa6730b48e97 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -488,11 +488,11 @@ static int parse_reply_info_readdir(void **p, void *end,
 		struct inode *inode = d_inode(req->r_dentry);
 		struct ceph_inode_info *ci = ceph_inode(inode);
 		struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
-		struct fscrypt_str tname = FSTR_INIT(NULL, 0);
 		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 		struct ceph_fname fname;
 		u32 altname_len, _name_len;
 		u8 *altname, *_name;
+		u8 *tname = NULL;
 
 		/* dentry */
 		ceph_decode_32_safe(p, end, _name_len, bad);
@@ -540,7 +540,7 @@ static int parse_reply_info_readdir(void **p, void *end,
 			 * always be shorter, which is 3/4 of origin
 			 * string.
 			 */
-			tname.name = _name;
+			tname = _name;
 
 			/*
 			 * Set oname to _name too, and this will be
@@ -557,7 +557,7 @@ static int parse_reply_info_readdir(void **p, void *end,
 			oname.len = altname_len;
 		}
 		rde->is_nokey = false;
-		err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
+		err = ceph_fname_to_usr(&fname, tname, &oname, &rde->is_nokey);
 		if (err) {
 			pr_err_client(cl, "unable to decode %.*s, got %d\n",
 				      _name_len, _name, err);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers
  2026-05-27  2:58 [PATCH 0/2] Bounce buffer for mds client decryption when vmalloc() Sam Edwards
  2026-05-27  2:58 ` [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
@ 2026-05-27  2:58 ` Sam Edwards
  2026-05-29 20:50   ` Viacheslav Dubeyko
  1 sibling, 1 reply; 9+ messages in thread
From: Sam Edwards @ 2026-05-27  2:58 UTC (permalink / raw)
  To: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko
  Cc: Jeff Layton, Xiubo Li, Milind Changire, ceph-devel, linux-kernel,
	Sam Edwards, stable

The fscrypt subsystem uses the scatterlist crypto API, inheriting its
requirement that any buffers are in the linear mapping region. However,
the messenger client uses kvmalloc() to create buffers for messages,
which will occasionally place those buffers in the vmalloc() region when
physical memory fragmentation doesn't permit a large enough kmalloc().
The various callers of ceph_fname_to_usr() directly pass (slices of) raw
messages from the MDS without considering that the messages may be in
vmalloc() buffers, resulting in oopses especially on non-x86 platforms
(see 'Closes:' for more details and a reproducer).

Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
fname->ctext, fname->name, and/or oname->name buffers, using `tname`
(which, when non-null, must be a linear address; when null, is briefly
allocated as necessary) as a bounce buffer to avoid passing any
inappropriate addresses to fscrypt_fname_disk_to_usr().

Additionally change parse_reply_info_readdir() -- the only function to
supply its own `tname` -- to follow the new "tname must never come from
vmalloc()" rule by passing NULL when the message is not in the linear
region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
exists only when processing the minority of messages that spill into
vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
messages. Still, if the overhead proves unreasonable in the future, it
is easy enough to mitigate: a future change could allocate a bounce
buffer in parse_reply_info_readdir() and use that as `tname` instead.

Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
Closes: https://lore.kernel.org/ceph-devel/CAH5Ym4ga7miUQE0K-cJA93Ya7w62P69MAN27R5cBiYnudoOHdA@mail.gmail.com/T/
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
 fs/ceph/crypto.c     | 37 +++++++++++++++++++++++++++++--------
 fs/ceph/mds_client.c |  8 ++++++--
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 7515cb251226..61d6830d16bc 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
  * Otherwise, base64 decode the string, and then ask fscrypt to format it
  * for userland presentation.
  *
+ * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
+ * linear-mapped region, this function slightly relaxes those requirements:
+ * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
+ *
  * Returns 0 on success or negative error code on error.
  */
 int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
@@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
 {
 	struct inode *dir = fname->dir;
 	struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str _oname;
 	struct fscrypt_str iname;
 	char *name = fname->name;
 	int name_len = fname->name_len;
 	int ret;
 
+	if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
+		return -EIO;
+
 	/* Sanity check that the resulting name will fit in the buffer */
 	if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
 		return -EIO;
@@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
 		goto out_inode;
 	}
 
+	if (!tname && (fname->ctext_len == 0 ||
+		       unlikely(is_vmalloc_addr(fname->ctext)) ||
+		       unlikely(is_vmalloc_addr(oname->name)))) {
+		ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
+		if (ret)
+			goto out_inode;
+		tname = _tname.name;
+	}
+
 	if (fname->ctext_len == 0) {
 		int declen;
 
-		if (!tname) {
-			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
-			if (ret)
-				goto out_inode;
-			tname = _tname.name;
-		}
-
 		declen = base64_decode(name, name_len, tname, false,
 				       BASE64_IMAP);
 		if (declen <= 0) {
@@ -368,12 +378,21 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
 		}
 		iname.name = tname;
 		iname.len = declen;
+	} else if (unlikely(is_vmalloc_addr(fname->ctext))) {
+		memcpy(tname, fname->ctext, fname->ctext_len);
+
+		iname.name = tname;
+		iname.len = fname->ctext_len;
 	} else {
 		iname.name = fname->ctext;
 		iname.len = fname->ctext_len;
 	}
 
-	ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
+	_oname.name = unlikely(is_vmalloc_addr(oname->name)) ?
+		tname : oname->name;
+	_oname.len = oname->len;
+	ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
+	oname->len = _oname.len;
 	if (!ret && (dir != fname->dir)) {
 		char tmp_buf[BASE64_CHARS(NAME_MAX)];
 
@@ -381,6 +400,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
 				    oname->len, oname->name, dir->i_ino);
 		memcpy(oname->name, tmp_buf, name_len);
 		oname->len = name_len;
+	} else if (!ret && unlikely(is_vmalloc_addr(oname->name))) {
+		memcpy(oname->name, _oname.name, _oname.len);
 	}
 
 out:
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index aa6730b48e97..8fcf185e3a82 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
 			 * to do the base64_decode in-place. It's
 			 * safe because the decoded string should
 			 * always be shorter, which is 3/4 of origin
-			 * string.
+			 * string. If this message was allocated with
+			 * vmalloc() (happens, but rarely), leave it
+			 * NULL and let ceph_fname_to_usr() allocate
+			 * suitable temporary working space instead.
 			 */
-			tname = _name;
+			if (likely(!is_vmalloc_addr(_name)))
+				tname = _name;
 
 			/*
 			 * Set oname to _name too, and this will be
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly
  2026-05-27  2:58 ` [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
@ 2026-05-27 12:00   ` David Laight
  2026-05-27 18:06     ` Sam Edwards
  0 siblings, 1 reply; 9+ messages in thread
From: David Laight @ 2026-05-27 12:00 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko, Jeff Layton,
	Xiubo Li, Milind Changire, ceph-devel, linux-kernel

On Tue, 26 May 2026 19:58:27 -0700
Sam Edwards <cfsworks@gmail.com> wrote:

> ceph_fname_to_usr() needs a temporary buffer for some operations
> (currently only base64-decoding ciphertext) and it is convenient to
> allow the caller to specify this buffer to avoid a heap allocation, so
> it has a (nullable) `tname` argument. Until now, this argument was a
> `struct fscrypt_str`; however, this is unnecessary for two reasons:
> 
> 1. `tname->len` isn't used anywhere: ceph_fname_to_usr() assumes a
>    buffer large enough to hold the ciphertext, and
>    parse_reply_info_readdir() -- the only caller to use tname -- doesn't
>    set it.
> 2. While the `tname` parameter is documented "may be NULL,"
>    parse_reply_info_readdir() always passes it but with `tname->name`
>    sometimes NULL in violation of the contract, indicating that the
>    unnecessary container creates actual confusion.
> 
> Therefore, change the type to `unsigned char *` and pass the buffer
> directly.
> 
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>  fs/ceph/crypto.c     | 10 +++++-----
>  fs/ceph/crypto.h     |  4 ++--
>  fs/ceph/mds_client.c |  6 +++---
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 64d240759277..7515cb251226 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -300,7 +300,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
>   *
>   * Returns 0 on success or negative error code on error.
>   */
> -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,

I can't help feeling that the buffer length should also be passed.
Either explicitly or, if constant, implicitly by embedding the array
in a structure.

-- David


>  		      struct fscrypt_str *oname, bool *is_nokey)
>  {
>  	struct inode *dir = fname->dir;
> @@ -357,16 +357,16 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
>  			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
>  			if (ret)
>  				goto out_inode;
> -			tname = &_tname;
> +			tname = _tname.name;
>  		}
>  
> -		declen = base64_decode(name, name_len,
> -				       tname->name, false, BASE64_IMAP);
> +		declen = base64_decode(name, name_len, tname, false,
> +				       BASE64_IMAP);
>  		if (declen <= 0) {
>  			ret = -EIO;
>  			goto out;
>  		}
> -		iname.name = tname->name;
> +		iname.name = tname;
>  		iname.len = declen;
>  	} else {
>  		iname.name = fname->ctext;
> diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
> index b748e2060bc9..79cb563fd887 100644
> --- a/fs/ceph/crypto.h
> +++ b/fs/ceph/crypto.h
> @@ -115,7 +115,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
>  		fscrypt_fname_free_buffer(fname);
>  }
>  
> -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  		      struct fscrypt_str *oname, bool *is_nokey);
>  int ceph_fscrypt_prepare_readdir(struct inode *dir);
>  
> @@ -204,7 +204,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
>  }
>  
>  static inline int ceph_fname_to_usr(const struct ceph_fname *fname,
> -				    struct fscrypt_str *tname,
> +				    unsigned char *tname,
>  				    struct fscrypt_str *oname, bool *is_nokey)
>  {
>  	oname->name = fname->name;
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index ed17e0023705..aa6730b48e97 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -488,11 +488,11 @@ static int parse_reply_info_readdir(void **p, void *end,
>  		struct inode *inode = d_inode(req->r_dentry);
>  		struct ceph_inode_info *ci = ceph_inode(inode);
>  		struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> -		struct fscrypt_str tname = FSTR_INIT(NULL, 0);
>  		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
>  		struct ceph_fname fname;
>  		u32 altname_len, _name_len;
>  		u8 *altname, *_name;
> +		u8 *tname = NULL;
>  
>  		/* dentry */
>  		ceph_decode_32_safe(p, end, _name_len, bad);
> @@ -540,7 +540,7 @@ static int parse_reply_info_readdir(void **p, void *end,
>  			 * always be shorter, which is 3/4 of origin
>  			 * string.
>  			 */
> -			tname.name = _name;
> +			tname = _name;
>  
>  			/*
>  			 * Set oname to _name too, and this will be
> @@ -557,7 +557,7 @@ static int parse_reply_info_readdir(void **p, void *end,
>  			oname.len = altname_len;
>  		}
>  		rde->is_nokey = false;
> -		err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
> +		err = ceph_fname_to_usr(&fname, tname, &oname, &rde->is_nokey);
>  		if (err) {
>  			pr_err_client(cl, "unable to decode %.*s, got %d\n",
>  				      _name_len, _name, err);


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly
  2026-05-27 12:00   ` David Laight
@ 2026-05-27 18:06     ` Sam Edwards
  2026-05-27 22:13       ` David Laight
  0 siblings, 1 reply; 9+ messages in thread
From: Sam Edwards @ 2026-05-27 18:06 UTC (permalink / raw)
  To: David Laight
  Cc: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko, Jeff Layton,
	Xiubo Li, ceph-devel, linux-kernel

On Wed, May 27, 2026 at 5:00 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Tue, 26 May 2026 19:58:27 -0700
> Sam Edwards <cfsworks@gmail.com> wrote:
>
> > ceph_fname_to_usr() needs a temporary buffer for some operations
> > (currently only base64-decoding ciphertext) and it is convenient to
> > allow the caller to specify this buffer to avoid a heap allocation, so
> > it has a (nullable) `tname` argument. Until now, this argument was a
> > `struct fscrypt_str`; however, this is unnecessary for two reasons:
> >
> > 1. `tname->len` isn't used anywhere: ceph_fname_to_usr() assumes a
> >    buffer large enough to hold the ciphertext, and
> >    parse_reply_info_readdir() -- the only caller to use tname -- doesn't
> >    set it.
> > 2. While the `tname` parameter is documented "may be NULL,"
> >    parse_reply_info_readdir() always passes it but with `tname->name`
> >    sometimes NULL in violation of the contract, indicating that the
> >    unnecessary container creates actual confusion.
> >
> > Therefore, change the type to `unsigned char *` and pass the buffer
> > directly.
> >
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > ---
> >  fs/ceph/crypto.c     | 10 +++++-----
> >  fs/ceph/crypto.h     |  4 ++--
> >  fs/ceph/mds_client.c |  6 +++---
> >  3 files changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > index 64d240759277..7515cb251226 100644
> > --- a/fs/ceph/crypto.c
> > +++ b/fs/ceph/crypto.c
> > @@ -300,7 +300,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
> >   *
> >   * Returns 0 on success or negative error code on error.
> >   */
> > -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>
> I can't help feeling that the buffer length should also be passed.
> Either explicitly or, if constant, implicitly by embedding the array
> in a structure.

It isn't constant; the specific requirement (unchanged in patch 2) is
that the buffer be at least large enough to hold the ciphertext. The
only caller to pass tname has a comment explaining how it meets the
size requirement, so this is currently safe.

Or is your feeling more about general robustness, ensuring that the
function prototype of ceph_fname_to_usr() makes it hard for future
patches to ignore the length requirement? If so, the issue is
ultimately that the base64_*() functions don't accept a `dstlen` that
ceph_fname_to_usr() could use to (meaningfully) enforce the size
requirement.

Cheers,
Sam

>
> -- David
>
>
> >                     struct fscrypt_str *oname, bool *is_nokey)
> >  {
> >       struct inode *dir = fname->dir;
> > @@ -357,16 +357,16 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> >                       ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> >                       if (ret)
> >                               goto out_inode;
> > -                     tname = &_tname;
> > +                     tname = _tname.name;
> >               }
> >
> > -             declen = base64_decode(name, name_len,
> > -                                    tname->name, false, BASE64_IMAP);
> > +             declen = base64_decode(name, name_len, tname, false,
> > +                                    BASE64_IMAP);
> >               if (declen <= 0) {
> >                       ret = -EIO;
> >                       goto out;
> >               }
> > -             iname.name = tname->name;
> > +             iname.name = tname;
> >               iname.len = declen;
> >       } else {
> >               iname.name = fname->ctext;
> > diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
> > index b748e2060bc9..79cb563fd887 100644
> > --- a/fs/ceph/crypto.h
> > +++ b/fs/ceph/crypto.h
> > @@ -115,7 +115,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
> >               fscrypt_fname_free_buffer(fname);
> >  }
> >
> > -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> >                     struct fscrypt_str *oname, bool *is_nokey);
> >  int ceph_fscrypt_prepare_readdir(struct inode *dir);
> >
> > @@ -204,7 +204,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
> >  }
> >
> >  static inline int ceph_fname_to_usr(const struct ceph_fname *fname,
> > -                                 struct fscrypt_str *tname,
> > +                                 unsigned char *tname,
> >                                   struct fscrypt_str *oname, bool *is_nokey)
> >  {
> >       oname->name = fname->name;
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index ed17e0023705..aa6730b48e97 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -488,11 +488,11 @@ static int parse_reply_info_readdir(void **p, void *end,
> >               struct inode *inode = d_inode(req->r_dentry);
> >               struct ceph_inode_info *ci = ceph_inode(inode);
> >               struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> > -             struct fscrypt_str tname = FSTR_INIT(NULL, 0);
> >               struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> >               struct ceph_fname fname;
> >               u32 altname_len, _name_len;
> >               u8 *altname, *_name;
> > +             u8 *tname = NULL;
> >
> >               /* dentry */
> >               ceph_decode_32_safe(p, end, _name_len, bad);
> > @@ -540,7 +540,7 @@ static int parse_reply_info_readdir(void **p, void *end,
> >                        * always be shorter, which is 3/4 of origin
> >                        * string.
> >                        */
> > -                     tname.name = _name;
> > +                     tname = _name;
> >
> >                       /*
> >                        * Set oname to _name too, and this will be
> > @@ -557,7 +557,7 @@ static int parse_reply_info_readdir(void **p, void *end,
> >                       oname.len = altname_len;
> >               }
> >               rde->is_nokey = false;
> > -             err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
> > +             err = ceph_fname_to_usr(&fname, tname, &oname, &rde->is_nokey);
> >               if (err) {
> >                       pr_err_client(cl, "unable to decode %.*s, got %d\n",
> >                                     _name_len, _name, err);
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly
  2026-05-27 18:06     ` Sam Edwards
@ 2026-05-27 22:13       ` David Laight
  2026-05-27 22:44         ` Sam Edwards
  0 siblings, 1 reply; 9+ messages in thread
From: David Laight @ 2026-05-27 22:13 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko, Jeff Layton,
	Xiubo Li, ceph-devel, linux-kernel

On Wed, 27 May 2026 11:06:08 -0700
Sam Edwards <cfsworks@gmail.com> wrote:

> On Wed, May 27, 2026 at 5:00 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Tue, 26 May 2026 19:58:27 -0700
> > Sam Edwards <cfsworks@gmail.com> wrote:
> >  
> > > ceph_fname_to_usr() needs a temporary buffer for some operations
> > > (currently only base64-decoding ciphertext) and it is convenient to
> > > allow the caller to specify this buffer to avoid a heap allocation, so
> > > it has a (nullable) `tname` argument. Until now, this argument was a
> > > `struct fscrypt_str`; however, this is unnecessary for two reasons:
> > >
> > > 1. `tname->len` isn't used anywhere: ceph_fname_to_usr() assumes a
> > >    buffer large enough to hold the ciphertext, and
> > >    parse_reply_info_readdir() -- the only caller to use tname -- doesn't
> > >    set it.
> > > 2. While the `tname` parameter is documented "may be NULL,"
> > >    parse_reply_info_readdir() always passes it but with `tname->name`
> > >    sometimes NULL in violation of the contract, indicating that the
> > >    unnecessary container creates actual confusion.
> > >
> > > Therefore, change the type to `unsigned char *` and pass the buffer
> > > directly.
> > >
> > > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > > ---
> > >  fs/ceph/crypto.c     | 10 +++++-----
> > >  fs/ceph/crypto.h     |  4 ++--
> > >  fs/ceph/mds_client.c |  6 +++---
> > >  3 files changed, 10 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > > index 64d240759277..7515cb251226 100644
> > > --- a/fs/ceph/crypto.c
> > > +++ b/fs/ceph/crypto.c
> > > @@ -300,7 +300,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
> > >   *
> > >   * Returns 0 on success or negative error code on error.
> > >   */
> > > -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > > +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,  
> >
> > I can't help feeling that the buffer length should also be passed.
> > Either explicitly or, if constant, implicitly by embedding the array
> > in a structure.  
> 
> It isn't constant; the specific requirement (unchanged in patch 2) is
> that the buffer be at least large enough to hold the ciphertext. The
> only caller to pass tname has a comment explaining how it meets the
> size requirement, so this is currently safe.

Ugg...
That is just an accident waiting to happen.

> Or is your feeling more about general robustness, ensuring that the
> function prototype of ceph_fname_to_usr() makes it hard for future
> patches to ignore the length requirement? If so, the issue is
> ultimately that the base64_*() functions don't accept a `dstlen` that
> ceph_fname_to_usr() could use to (meaningfully) enforce the size
> requirement.

The output for the base64 functions depends only on the size of the
input - so is easy to get right.
And for decode is always shorter.

I've just looked at what happens when tname is NULL - that looks
broken as well - why not just kmalloc() a buffer that is the right
size instead of using a wrapper function that might return a
different length entirely.
Maybe it should be too long, but bugs happen.

There are seem to be random overwrites of buffers of pointers
to buffers - more code that is badly fragile.

-- David

> 
> Cheers,
> Sam
> 
> >
> > -- David
> >
> >  
> > >                     struct fscrypt_str *oname, bool *is_nokey)
> > >  {
> > >       struct inode *dir = fname->dir;
> > > @@ -357,16 +357,16 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > >                       ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> > >                       if (ret)
> > >                               goto out_inode;
> > > -                     tname = &_tname;
> > > +                     tname = _tname.name;
> > >               }
> > >
> > > -             declen = base64_decode(name, name_len,
> > > -                                    tname->name, false, BASE64_IMAP);
> > > +             declen = base64_decode(name, name_len, tname, false,
> > > +                                    BASE64_IMAP);
> > >               if (declen <= 0) {
> > >                       ret = -EIO;
> > >                       goto out;
> > >               }
> > > -             iname.name = tname->name;
> > > +             iname.name = tname;
> > >               iname.len = declen;
> > >       } else {
> > >               iname.name = fname->ctext;
> > > diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
> > > index b748e2060bc9..79cb563fd887 100644
> > > --- a/fs/ceph/crypto.h
> > > +++ b/fs/ceph/crypto.h
> > > @@ -115,7 +115,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
> > >               fscrypt_fname_free_buffer(fname);
> > >  }
> > >
> > > -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > > +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > >                     struct fscrypt_str *oname, bool *is_nokey);
> > >  int ceph_fscrypt_prepare_readdir(struct inode *dir);
> > >
> > > @@ -204,7 +204,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
> > >  }
> > >
> > >  static inline int ceph_fname_to_usr(const struct ceph_fname *fname,
> > > -                                 struct fscrypt_str *tname,
> > > +                                 unsigned char *tname,
> > >                                   struct fscrypt_str *oname, bool *is_nokey)
> > >  {
> > >       oname->name = fname->name;
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index ed17e0023705..aa6730b48e97 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -488,11 +488,11 @@ static int parse_reply_info_readdir(void **p, void *end,
> > >               struct inode *inode = d_inode(req->r_dentry);
> > >               struct ceph_inode_info *ci = ceph_inode(inode);
> > >               struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> > > -             struct fscrypt_str tname = FSTR_INIT(NULL, 0);
> > >               struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > >               struct ceph_fname fname;
> > >               u32 altname_len, _name_len;
> > >               u8 *altname, *_name;
> > > +             u8 *tname = NULL;
> > >
> > >               /* dentry */
> > >               ceph_decode_32_safe(p, end, _name_len, bad);
> > > @@ -540,7 +540,7 @@ static int parse_reply_info_readdir(void **p, void *end,
> > >                        * always be shorter, which is 3/4 of origin
> > >                        * string.
> > >                        */
> > > -                     tname.name = _name;
> > > +                     tname = _name;
> > >
> > >                       /*
> > >                        * Set oname to _name too, and this will be
> > > @@ -557,7 +557,7 @@ static int parse_reply_info_readdir(void **p, void *end,
> > >                       oname.len = altname_len;
> > >               }
> > >               rde->is_nokey = false;
> > > -             err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
> > > +             err = ceph_fname_to_usr(&fname, tname, &oname, &rde->is_nokey);
> > >               if (err) {
> > >                       pr_err_client(cl, "unable to decode %.*s, got %d\n",
> > >                                     _name_len, _name, err);  
> >  


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly
  2026-05-27 22:13       ` David Laight
@ 2026-05-27 22:44         ` Sam Edwards
  0 siblings, 0 replies; 9+ messages in thread
From: Sam Edwards @ 2026-05-27 22:44 UTC (permalink / raw)
  To: David Laight
  Cc: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko, Jeff Layton,
	Xiubo Li, ceph-devel, linux-kernel

On Wed, May 27, 2026 at 3:13 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 27 May 2026 11:06:08 -0700
> Sam Edwards <cfsworks@gmail.com> wrote:
>
> > On Wed, May 27, 2026 at 5:00 AM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Tue, 26 May 2026 19:58:27 -0700
> > > Sam Edwards <cfsworks@gmail.com> wrote:
> > >
> > > > ceph_fname_to_usr() needs a temporary buffer for some operations
> > > > (currently only base64-decoding ciphertext) and it is convenient to
> > > > allow the caller to specify this buffer to avoid a heap allocation, so
> > > > it has a (nullable) `tname` argument. Until now, this argument was a
> > > > `struct fscrypt_str`; however, this is unnecessary for two reasons:
> > > >
> > > > 1. `tname->len` isn't used anywhere: ceph_fname_to_usr() assumes a
> > > >    buffer large enough to hold the ciphertext, and
> > > >    parse_reply_info_readdir() -- the only caller to use tname -- doesn't
> > > >    set it.
> > > > 2. While the `tname` parameter is documented "may be NULL,"
> > > >    parse_reply_info_readdir() always passes it but with `tname->name`
> > > >    sometimes NULL in violation of the contract, indicating that the
> > > >    unnecessary container creates actual confusion.
> > > >
> > > > Therefore, change the type to `unsigned char *` and pass the buffer
> > > > directly.
> > > >
> > > > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > > > ---
> > > >  fs/ceph/crypto.c     | 10 +++++-----
> > > >  fs/ceph/crypto.h     |  4 ++--
> > > >  fs/ceph/mds_client.c |  6 +++---
> > > >  3 files changed, 10 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > > > index 64d240759277..7515cb251226 100644
> > > > --- a/fs/ceph/crypto.c
> > > > +++ b/fs/ceph/crypto.c
> > > > @@ -300,7 +300,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
> > > >   *
> > > >   * Returns 0 on success or negative error code on error.
> > > >   */
> > > > -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > > > +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > >
> > > I can't help feeling that the buffer length should also be passed.
> > > Either explicitly or, if constant, implicitly by embedding the array
> > > in a structure.
> >
> > It isn't constant; the specific requirement (unchanged in patch 2) is
> > that the buffer be at least large enough to hold the ciphertext. The
> > only caller to pass tname has a comment explaining how it meets the
> > size requirement, so this is currently safe.
>
> Ugg...
> That is just an accident waiting to happen.
>
> > Or is your feeling more about general robustness, ensuring that the
> > function prototype of ceph_fname_to_usr() makes it hard for future
> > patches to ignore the length requirement? If so, the issue is
> > ultimately that the base64_*() functions don't accept a `dstlen` that
> > ceph_fname_to_usr() could use to (meaningfully) enforce the size
> > requirement.
>
> The output for the base64 functions depends only on the size of the
> input - so is easy to get right.
> And for decode is always shorter.

That's the same brand of "easy to get right" as passing a big-enough
tname buffer. In fact, you just paraphrased the second sentence of the
safety comment in parse_reply_info_readdir().

But my concern is that if tname->len were to be made significant --
which I'm receptive to doing, to be clear -- I need to be able to
enforce that length meaningfully and proactively, which
base64_decode() currently doesn't allow. And something like this is a
non-starter:

declen = base64_decode(name, name_len,
                       tname->name, false, BASE64_IMAP);
if (declen <= 0 || declen > tname->len) {
    ret = -EIO;
    goto out;
}

And I'm firmly against mixing the "explicit buffer bounds check" and
"easy enough to plan the size ahead of time" approaches; that's the
real "accident waiting to happen" territory. Any argument for an
explicit check here is also an argument for an explicit check in
base64_{de,en}code().

> I've just looked at what happens when tname is NULL - that looks
> broken as well - why not just kmalloc() a buffer that is the right
> size instead of using a wrapper function that might return a
> different length entirely.
> Maybe it should be too long, but bugs happen.

The wrapper function guarantees a minimum length, and it's being
called with NAME_MAX, so it's (at least) big enough to hold the
longest possible value it needs to hold. What kind of bug are you
expecting, should the buffer be too long?

> There are seem to be random overwrites of buffers of pointers
> to buffers - more code that is badly fragile.

I need more specificity: Where? Which buffers are being overwritten?
Under which circumstances? Which code?

Best,
Sam

>
> -- David
>
> >
> > Cheers,
> > Sam
> >
> > >
> > > -- David
> > >
> > >
> > > >                     struct fscrypt_str *oname, bool *is_nokey)
> > > >  {
> > > >       struct inode *dir = fname->dir;
> > > > @@ -357,16 +357,16 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > > >                       ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> > > >                       if (ret)
> > > >                               goto out_inode;
> > > > -                     tname = &_tname;
> > > > +                     tname = _tname.name;
> > > >               }
> > > >
> > > > -             declen = base64_decode(name, name_len,
> > > > -                                    tname->name, false, BASE64_IMAP);
> > > > +             declen = base64_decode(name, name_len, tname, false,
> > > > +                                    BASE64_IMAP);
> > > >               if (declen <= 0) {
> > > >                       ret = -EIO;
> > > >                       goto out;
> > > >               }
> > > > -             iname.name = tname->name;
> > > > +             iname.name = tname;
> > > >               iname.len = declen;
> > > >       } else {
> > > >               iname.name = fname->ctext;
> > > > diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
> > > > index b748e2060bc9..79cb563fd887 100644
> > > > --- a/fs/ceph/crypto.h
> > > > +++ b/fs/ceph/crypto.h
> > > > @@ -115,7 +115,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
> > > >               fscrypt_fname_free_buffer(fname);
> > > >  }
> > > >
> > > > -int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
> > > > +int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > > >                     struct fscrypt_str *oname, bool *is_nokey);
> > > >  int ceph_fscrypt_prepare_readdir(struct inode *dir);
> > > >
> > > > @@ -204,7 +204,7 @@ static inline void ceph_fname_free_buffer(struct inode *parent,
> > > >  }
> > > >
> > > >  static inline int ceph_fname_to_usr(const struct ceph_fname *fname,
> > > > -                                 struct fscrypt_str *tname,
> > > > +                                 unsigned char *tname,
> > > >                                   struct fscrypt_str *oname, bool *is_nokey)
> > > >  {
> > > >       oname->name = fname->name;
> > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > index ed17e0023705..aa6730b48e97 100644
> > > > --- a/fs/ceph/mds_client.c
> > > > +++ b/fs/ceph/mds_client.c
> > > > @@ -488,11 +488,11 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > >               struct inode *inode = d_inode(req->r_dentry);
> > > >               struct ceph_inode_info *ci = ceph_inode(inode);
> > > >               struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> > > > -             struct fscrypt_str tname = FSTR_INIT(NULL, 0);
> > > >               struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > > >               struct ceph_fname fname;
> > > >               u32 altname_len, _name_len;
> > > >               u8 *altname, *_name;
> > > > +             u8 *tname = NULL;
> > > >
> > > >               /* dentry */
> > > >               ceph_decode_32_safe(p, end, _name_len, bad);
> > > > @@ -540,7 +540,7 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > >                        * always be shorter, which is 3/4 of origin
> > > >                        * string.
> > > >                        */
> > > > -                     tname.name = _name;
> > > > +                     tname = _name;
> > > >
> > > >                       /*
> > > >                        * Set oname to _name too, and this will be
> > > > @@ -557,7 +557,7 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > >                       oname.len = altname_len;
> > > >               }
> > > >               rde->is_nokey = false;
> > > > -             err = ceph_fname_to_usr(&fname, &tname, &oname, &rde->is_nokey);
> > > > +             err = ceph_fname_to_usr(&fname, tname, &oname, &rde->is_nokey);
> > > >               if (err) {
> > > >                       pr_err_client(cl, "unable to decode %.*s, got %d\n",
> > > >                                     _name_len, _name, err);
> > >
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:  [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers
  2026-05-27  2:58 ` [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
@ 2026-05-29 20:50   ` Viacheslav Dubeyko
  2026-05-30  0:59     ` Sam Edwards
  0 siblings, 1 reply; 9+ messages in thread
From: Viacheslav Dubeyko @ 2026-05-29 20:50 UTC (permalink / raw)
  To: idryomov@gmail.com, Alex Markuze, cfsworks@gmail.com,
	slava@dubeyko.com
  Cc: mchangir@redhat.com, stable@vger.kernel.org, Xiubo Li,
	jlayton@kernel.org, linux-kernel@vger.kernel.org,
	ceph-devel@vger.kernel.org

On Tue, 2026-05-26 at 19:58 -0700, Sam Edwards wrote:
> The fscrypt subsystem uses the scatterlist crypto API, inheriting its
> requirement that any buffers are in the linear mapping region. However,
> the messenger client uses kvmalloc() to create buffers for messages,
> which will occasionally place those buffers in the vmalloc() region when
> physical memory fragmentation doesn't permit a large enough kmalloc().
> The various callers of ceph_fname_to_usr() directly pass (slices of) raw
> messages from the MDS without considering that the messages may be in
> vmalloc() buffers, resulting in oopses especially on non-x86 platforms
> (see 'Closes:' for more details and a reproducer).
> 
> Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
> fname->ctext, fname->name, and/or oname->name buffers, using `tname`
> (which, when non-null, must be a linear address; when null, is briefly
> allocated as necessary) as a bounce buffer to avoid passing any
> inappropriate addresses to fscrypt_fname_disk_to_usr().
> 
> Additionally change parse_reply_info_readdir() -- the only function to
> supply its own `tname` -- to follow the new "tname must never come from
> vmalloc()" rule by passing NULL when the message is not in the linear
> region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
> exists only when processing the minority of messages that spill into
> vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
> messages. Still, if the overhead proves unreasonable in the future, it
> is easy enough to mitigate: a future change could allocate a bounce
> buffer in parse_reply_info_readdir() and use that as `tname` instead.
> 
> Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
> Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_ceph-2Ddevel_CAH5Ym4ga7miUQE0K-2DcJA93Ya7w62P69MAN27R5cBiYnudoOHdA-40mail.gmail.com_T_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=bcX0FhBD6jzWXGOsw2LoJOl_TqgobmwNBmqNIhj2K0qBn2krB8IUrIhcUs8LmJWM&s=2uInY5Ys7xQ_57Ifo4uovP7_e8SN0Q_wnzBDX-uj0hE&e= 
> Cc: stable@vger.kernel.org # v6.6+
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>  fs/ceph/crypto.c     | 37 +++++++++++++++++++++++++++++--------
>  fs/ceph/mds_client.c |  8 ++++++--
>  2 files changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 7515cb251226..61d6830d16bc 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
>   * Otherwise, base64 decode the string, and then ask fscrypt to format it
>   * for userland presentation.
>   *
> + * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
> + * linear-mapped region, this function slightly relaxes those requirements:
> + * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
> + *
>   * Returns 0 on success or negative error code on error.
>   */
>  int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> @@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  {
>  	struct inode *dir = fname->dir;
>  	struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
> +	struct fscrypt_str _oname;
>  	struct fscrypt_str iname;
>  	char *name = fname->name;
>  	int name_len = fname->name_len;
>  	int ret;
>  
> +	if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
> +		return -EIO;
> +
>  	/* Sanity check that the resulting name will fit in the buffer */
>  	if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
>  		return -EIO;
> @@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  		goto out_inode;
>  	}
>  
> +	if (!tname && (fname->ctext_len == 0 ||
> +		       unlikely(is_vmalloc_addr(fname->ctext)) ||
> +		       unlikely(is_vmalloc_addr(oname->name)))) {
> +		ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> +		if (ret)
> +			goto out_inode;
> +		tname = _tname.name;
> +	}
> +
>  	if (fname->ctext_len == 0) {
>  		int declen;
>  
> -		if (!tname) {
> -			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> -			if (ret)
> -				goto out_inode;
> -			tname = _tname.name;
> -		}
> -
>  		declen = base64_decode(name, name_len, tname, false,
>  				       BASE64_IMAP);
>  		if (declen <= 0) {
> @@ -368,12 +378,21 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  		}
>  		iname.name = tname;
>  		iname.len = declen;
> +	} else if (unlikely(is_vmalloc_addr(fname->ctext))) {
> +		memcpy(tname, fname->ctext, fname->ctext_len);
> +
> +		iname.name = tname;
> +		iname.len = fname->ctext_len;
>  	} else {
>  		iname.name = fname->ctext;
>  		iname.len = fname->ctext_len;
>  	}
>  
> -	ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
> +	_oname.name = unlikely(is_vmalloc_addr(oname->name)) ?
> +		tname : oname->name;
> +	_oname.len = oname->len;
> +	ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
> +	oname->len = _oname.len;
>  	if (!ret && (dir != fname->dir)) {
>  		char tmp_buf[BASE64_CHARS(NAME_MAX)];
>  
> @@ -381,6 +400,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  				    oname->len, oname->name, dir->i_ino);
>  		memcpy(oname->name, tmp_buf, name_len);
>  		oname->len = name_len;
> +	} else if (!ret && unlikely(is_vmalloc_addr(oname->name))) {
> +		memcpy(oname->name, _oname.name, _oname.len);
>  	}

When both dir != fname->dir (longname snapshot) and is_vmalloc_addr(oname->name)
are true:

(1) The if branch is taken — NOT the else if.
(2) _oname.name = tname holds the decrypted result (fscrypt wrote there).
(3) oname->name is the stale vmalloc buffer — the copy-back in the else if was
never executed.
(4) The snprintf reads oname->name and formats a snapshot name from garbage.

Am I right?

This part of logic needs to be reworked carefully. This if - else construction
becomes really complicated to understand.

Thanks,
Slava.

>  
>  out:
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index aa6730b48e97..8fcf185e3a82 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
>  			 * to do the base64_decode in-place. It's
>  			 * safe because the decoded string should
>  			 * always be shorter, which is 3/4 of origin
> -			 * string.
> +			 * string. If this message was allocated with
> +			 * vmalloc() (happens, but rarely), leave it
> +			 * NULL and let ceph_fname_to_usr() allocate
> +			 * suitable temporary working space instead.
>  			 */
> -			tname = _name;
> +			if (likely(!is_vmalloc_addr(_name)))
> +				tname = _name;
>  
>  			/*
>  			 * Set oname to _name too, and this will be

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers
  2026-05-29 20:50   ` Viacheslav Dubeyko
@ 2026-05-30  0:59     ` Sam Edwards
  0 siblings, 0 replies; 9+ messages in thread
From: Sam Edwards @ 2026-05-30  0:59 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: idryomov@gmail.com, Alex Markuze, slava@dubeyko.com,
	mchangir@redhat.com, stable@vger.kernel.org, Xiubo Li,
	jlayton@kernel.org, linux-kernel@vger.kernel.org,
	ceph-devel@vger.kernel.org

On Fri, May 29, 2026 at 1:50 PM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
>
> On Tue, 2026-05-26 at 19:58 -0700, Sam Edwards wrote:
> > The fscrypt subsystem uses the scatterlist crypto API, inheriting its
> > requirement that any buffers are in the linear mapping region. However,
> > the messenger client uses kvmalloc() to create buffers for messages,
> > which will occasionally place those buffers in the vmalloc() region when
> > physical memory fragmentation doesn't permit a large enough kmalloc().
> > The various callers of ceph_fname_to_usr() directly pass (slices of) raw
> > messages from the MDS without considering that the messages may be in
> > vmalloc() buffers, resulting in oopses especially on non-x86 platforms
> > (see 'Closes:' for more details and a reproducer).
> >
> > Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
> > fname->ctext, fname->name, and/or oname->name buffers, using `tname`
> > (which, when non-null, must be a linear address; when null, is briefly
> > allocated as necessary) as a bounce buffer to avoid passing any
> > inappropriate addresses to fscrypt_fname_disk_to_usr().
> >
> > Additionally change parse_reply_info_readdir() -- the only function to
> > supply its own `tname` -- to follow the new "tname must never come from
> > vmalloc()" rule by passing NULL when the message is not in the linear
> > region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
> > exists only when processing the minority of messages that spill into
> > vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
> > messages. Still, if the overhead proves unreasonable in the future, it
> > is easy enough to mitigate: a future change could allocate a bounce
> > buffer in parse_reply_info_readdir() and use that as `tname` instead.
> >
> > Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
> > Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_ceph-2Ddevel_CAH5Ym4ga7miUQE0K-2DcJA93Ya7w62P69MAN27R5cBiYnudoOHdA-40mail.gmail.com_T_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=bcX0FhBD6jzWXGOsw2LoJOl_TqgobmwNBmqNIhj2K0qBn2krB8IUrIhcUs8LmJWM&s=2uInY5Ys7xQ_57Ifo4uovP7_e8SN0Q_wnzBDX-uj0hE&e=
> > Cc: stable@vger.kernel.org # v6.6+
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> > ---
> >  fs/ceph/crypto.c     | 37 +++++++++++++++++++++++++++++--------
> >  fs/ceph/mds_client.c |  8 ++++++--
> >  2 files changed, 35 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > index 7515cb251226..61d6830d16bc 100644
> > --- a/fs/ceph/crypto.c
> > +++ b/fs/ceph/crypto.c
> > @@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
> >   * Otherwise, base64 decode the string, and then ask fscrypt to format it
> >   * for userland presentation.
> >   *
> > + * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
> > + * linear-mapped region, this function slightly relaxes those requirements:
> > + * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
> > + *
> >   * Returns 0 on success or negative error code on error.
> >   */
> >  int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > @@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> >  {
> >       struct inode *dir = fname->dir;
> >       struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
> > +     struct fscrypt_str _oname;
> >       struct fscrypt_str iname;
> >       char *name = fname->name;
> >       int name_len = fname->name_len;
> >       int ret;
> >
> > +     if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
> > +             return -EIO;
> > +
> >       /* Sanity check that the resulting name will fit in the buffer */
> >       if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
> >               return -EIO;
> > @@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> >               goto out_inode;
> >       }
> >
> > +     if (!tname && (fname->ctext_len == 0 ||
> > +                    unlikely(is_vmalloc_addr(fname->ctext)) ||
> > +                    unlikely(is_vmalloc_addr(oname->name)))) {
> > +             ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> > +             if (ret)
> > +                     goto out_inode;
> > +             tname = _tname.name;
> > +     }
> > +
> >       if (fname->ctext_len == 0) {
> >               int declen;
> >
> > -             if (!tname) {
> > -                     ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> > -                     if (ret)
> > -                             goto out_inode;
> > -                     tname = _tname.name;
> > -             }
> > -
> >               declen = base64_decode(name, name_len, tname, false,
> >                                      BASE64_IMAP);
> >               if (declen <= 0) {
> > @@ -368,12 +378,21 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> >               }
> >               iname.name = tname;
> >               iname.len = declen;
> > +     } else if (unlikely(is_vmalloc_addr(fname->ctext))) {
> > +             memcpy(tname, fname->ctext, fname->ctext_len);
> > +
> > +             iname.name = tname;
> > +             iname.len = fname->ctext_len;
> >       } else {
> >               iname.name = fname->ctext;
> >               iname.len = fname->ctext_len;
> >       }
> >
> > -     ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
> > +     _oname.name = unlikely(is_vmalloc_addr(oname->name)) ?
> > +             tname : oname->name;
> > +     _oname.len = oname->len;
> > +     ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
> > +     oname->len = _oname.len;
> >       if (!ret && (dir != fname->dir)) {
> >               char tmp_buf[BASE64_CHARS(NAME_MAX)];
> >
> > @@ -381,6 +400,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> >                                   oname->len, oname->name, dir->i_ino);
> >               memcpy(oname->name, tmp_buf, name_len);
> >               oname->len = name_len;
> > +     } else if (!ret && unlikely(is_vmalloc_addr(oname->name))) {
> > +             memcpy(oname->name, _oname.name, _oname.len);
> >       }
>
> When both dir != fname->dir (longname snapshot) and is_vmalloc_addr(oname->name)
> are true:
>
> (1) The if branch is taken — NOT the else if.
> (2) _oname.name = tname holds the decrypted result (fscrypt wrote there).
> (3) oname->name is the stale vmalloc buffer — the copy-back in the else if was
> never executed.
> (4) The snprintf reads oname->name and formats a snapshot name from garbage.
>
> Am I right?

Hi Slava,

Indeed you are. Well spotted, I should have examined that snprintf a
little more closely.

> This part of logic needs to be reworked carefully. This if - else construction
> becomes really complicated to understand.

ACK - I was originally just going to amend the patch with that fixed:

                char tmp_buf[BASE64_CHARS(NAME_MAX)];

                name_len = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%llu",
-                                   oname->len, oname->name, dir->i_ino);
+                                   _oname.len, _oname.name, dir->i_ino);
                memcpy(oname->name, tmp_buf, name_len);
                oname->len = name_len;

...but, looking at the if/else-if block more carefully, I realized we
only care about ensuring `!ret`, so I'll simplify the control flow for
v2.

Thank you for the feedback,
Sam

> Thanks,
> Slava.
>
> >
> >  out:
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index aa6730b48e97..8fcf185e3a82 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
> >                        * to do the base64_decode in-place. It's
> >                        * safe because the decoded string should
> >                        * always be shorter, which is 3/4 of origin
> > -                      * string.
> > +                      * string. If this message was allocated with
> > +                      * vmalloc() (happens, but rarely), leave it
> > +                      * NULL and let ceph_fname_to_usr() allocate
> > +                      * suitable temporary working space instead.
> >                        */
> > -                     tname = _name;
> > +                     if (likely(!is_vmalloc_addr(_name)))
> > +                             tname = _name;
> >
> >                       /*
> >                        * Set oname to _name too, and this will be

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-05-30  1:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-27  2:58 [PATCH 0/2] Bounce buffer for mds client decryption when vmalloc() Sam Edwards
2026-05-27  2:58 ` [PATCH 1/2] ceph: pass fscrypt `tname` buffers directly Sam Edwards
2026-05-27 12:00   ` David Laight
2026-05-27 18:06     ` Sam Edwards
2026-05-27 22:13       ` David Laight
2026-05-27 22:44         ` Sam Edwards
2026-05-27  2:58 ` [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
2026-05-29 20:50   ` Viacheslav Dubeyko
2026-05-30  0:59     ` Sam Edwards

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.