Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "idryomov@gmail.com" <idryomov@gmail.com>,
	Alex Markuze <amarkuze@redhat.com>,
	"cfsworks@gmail.com" <cfsworks@gmail.com>,
	"slava@dubeyko.com" <slava@dubeyko.com>
Cc: "mchangir@redhat.com" <mchangir@redhat.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	Xiubo Li <xiubli@redhat.com>,
	"jlayton@kernel.org" <jlayton@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re:  [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers
Date: Fri, 29 May 2026 20:50:16 +0000	[thread overview]
Message-ID: <937c1dbbb0300a8113d62ca1dbaffe1493bd10e5.camel@ibm.com> (raw)
In-Reply-To: <20260527025828.5966-3-CFSworks@gmail.com>

On Tue, 2026-05-26 at 19:58 -0700, Sam Edwards wrote:
> The fscrypt subsystem uses the scatterlist crypto API, inheriting its
> requirement that any buffers are in the linear mapping region. However,
> the messenger client uses kvmalloc() to create buffers for messages,
> which will occasionally place those buffers in the vmalloc() region when
> physical memory fragmentation doesn't permit a large enough kmalloc().
> The various callers of ceph_fname_to_usr() directly pass (slices of) raw
> messages from the MDS without considering that the messages may be in
> vmalloc() buffers, resulting in oopses especially on non-x86 platforms
> (see 'Closes:' for more details and a reproducer).
> 
> Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
> fname->ctext, fname->name, and/or oname->name buffers, using `tname`
> (which, when non-null, must be a linear address; when null, is briefly
> allocated as necessary) as a bounce buffer to avoid passing any
> inappropriate addresses to fscrypt_fname_disk_to_usr().
> 
> Additionally change parse_reply_info_readdir() -- the only function to
> supply its own `tname` -- to follow the new "tname must never come from
> vmalloc()" rule by passing NULL when the message is not in the linear
> region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
> exists only when processing the minority of messages that spill into
> vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
> messages. Still, if the overhead proves unreasonable in the future, it
> is easy enough to mitigate: a future change could allocate a bounce
> buffer in parse_reply_info_readdir() and use that as `tname` instead.
> 
> Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
> Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_ceph-2Ddevel_CAH5Ym4ga7miUQE0K-2DcJA93Ya7w62P69MAN27R5cBiYnudoOHdA-40mail.gmail.com_T_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=bcX0FhBD6jzWXGOsw2LoJOl_TqgobmwNBmqNIhj2K0qBn2krB8IUrIhcUs8LmJWM&s=2uInY5Ys7xQ_57Ifo4uovP7_e8SN0Q_wnzBDX-uj0hE&e= 
> Cc: stable@vger.kernel.org # v6.6+
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
> ---
>  fs/ceph/crypto.c     | 37 +++++++++++++++++++++++++++++--------
>  fs/ceph/mds_client.c |  8 ++++++--
>  2 files changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 7515cb251226..61d6830d16bc 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
>   * Otherwise, base64 decode the string, and then ask fscrypt to format it
>   * for userland presentation.
>   *
> + * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
> + * linear-mapped region, this function slightly relaxes those requirements:
> + * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
> + *
>   * Returns 0 on success or negative error code on error.
>   */
>  int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> @@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  {
>  	struct inode *dir = fname->dir;
>  	struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
> +	struct fscrypt_str _oname;
>  	struct fscrypt_str iname;
>  	char *name = fname->name;
>  	int name_len = fname->name_len;
>  	int ret;
>  
> +	if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
> +		return -EIO;
> +
>  	/* Sanity check that the resulting name will fit in the buffer */
>  	if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
>  		return -EIO;
> @@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  		goto out_inode;
>  	}
>  
> +	if (!tname && (fname->ctext_len == 0 ||
> +		       unlikely(is_vmalloc_addr(fname->ctext)) ||
> +		       unlikely(is_vmalloc_addr(oname->name)))) {
> +		ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> +		if (ret)
> +			goto out_inode;
> +		tname = _tname.name;
> +	}
> +
>  	if (fname->ctext_len == 0) {
>  		int declen;
>  
> -		if (!tname) {
> -			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> -			if (ret)
> -				goto out_inode;
> -			tname = _tname.name;
> -		}
> -
>  		declen = base64_decode(name, name_len, tname, false,
>  				       BASE64_IMAP);
>  		if (declen <= 0) {
> @@ -368,12 +378,21 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  		}
>  		iname.name = tname;
>  		iname.len = declen;
> +	} else if (unlikely(is_vmalloc_addr(fname->ctext))) {
> +		memcpy(tname, fname->ctext, fname->ctext_len);
> +
> +		iname.name = tname;
> +		iname.len = fname->ctext_len;
>  	} else {
>  		iname.name = fname->ctext;
>  		iname.len = fname->ctext_len;
>  	}
>  
> -	ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
> +	_oname.name = unlikely(is_vmalloc_addr(oname->name)) ?
> +		tname : oname->name;
> +	_oname.len = oname->len;
> +	ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
> +	oname->len = _oname.len;
>  	if (!ret && (dir != fname->dir)) {
>  		char tmp_buf[BASE64_CHARS(NAME_MAX)];
>  
> @@ -381,6 +400,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
>  				    oname->len, oname->name, dir->i_ino);
>  		memcpy(oname->name, tmp_buf, name_len);
>  		oname->len = name_len;
> +	} else if (!ret && unlikely(is_vmalloc_addr(oname->name))) {
> +		memcpy(oname->name, _oname.name, _oname.len);
>  	}

When both dir != fname->dir (longname snapshot) and is_vmalloc_addr(oname->name)
are true:

(1) The if branch is taken — NOT the else if.
(2) _oname.name = tname holds the decrypted result (fscrypt wrote there).
(3) oname->name is the stale vmalloc buffer — the copy-back in the else if was
never executed.
(4) The snprintf reads oname->name and formats a snapshot name from garbage.

Am I right?

This part of logic needs to be reworked carefully. This if - else construction
becomes really complicated to understand.

Thanks,
Slava.

>  
>  out:
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index aa6730b48e97..8fcf185e3a82 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
>  			 * to do the base64_decode in-place. It's
>  			 * safe because the decoded string should
>  			 * always be shorter, which is 3/4 of origin
> -			 * string.
> +			 * string. If this message was allocated with
> +			 * vmalloc() (happens, but rarely), leave it
> +			 * NULL and let ceph_fname_to_usr() allocate
> +			 * suitable temporary working space instead.
>  			 */
> -			tname = _name;
> +			if (likely(!is_vmalloc_addr(_name)))
> +				tname = _name;
>  
>  			/*
>  			 * Set oname to _name too, and this will be

      reply	other threads:[~2026-05-29 20:50 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260527025828.5966-1-CFSworks@gmail.com>
2026-05-27  2:58 ` [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers Sam Edwards
2026-05-29 20:50   ` Viacheslav Dubeyko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=937c1dbbb0300a8113d62ca1dbaffe1493bd10e5.camel@ibm.com \
    --to=slava.dubeyko@ibm.com \
    --cc=amarkuze@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=cfsworks@gmail.com \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchangir@redhat.com \
    --cc=slava@dubeyko.com \
    --cc=stable@vger.kernel.org \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox