linux-ext4.vger.kernel.org archive mirror
From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: kernel@collabora.com, linux-ext4@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Subject: Re: [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization
Date: Fri, 30 Nov 2018 11:12:51 -0500
Message-ID: <20181130161251.GA3512@thunk.org>
In-Reply-To: <20181126221949.12172-9-krisman@collabora.com>

On Mon, Nov 26, 2018 at 05:19:45PM -0500, Gabriel Krisman Bertazi wrote:
> +static int utf8_casefold(const struct nls_table *table,
> +			  const unsigned char *str, size_t len,
> +			  unsigned char *dest, size_t dlen)
> +{
> +	const struct utf8data *data = utf8nfkdicf(UNICODE_AGE(10,0,0));
> +	struct utf8cursor cur;
> +	size_t nlen = 0;
> +
> +	if (utf8ncursor(&cur, data, str, len) < 0)
> +		goto invalid_seq;
> +
> +	for (nlen = 0; nlen < dlen; nlen++) {
> +		dest[nlen] = utf8byte(&cur);
> +		if (!dest[nlen])
> +			return nlen;
> +		if (dest[nlen] == -1)
> +			break;
> +	}
> +invalid_seq:
> +	/* Treat the sequence as a binary blob. */
> +	memcpy(dest, str, len);
> +	return len;
> +
> +}

So it looks like the interface is that if the destination buffer is
too small OR if the string is not a valid UTF-8 string, we treat it as
a binary blob.  I wonder if we would be better off if this function
actually signalled that there is a problem?  (Buffer too small,
invalid UTF-8 string).

It's fine to treat it as a binary blob, and copy it out to the
destination buffer, but I can imagine there being use cases where
knowing this would be useful.  *Especially* the destination buffer too
small case; I'm actually a little nervous about having it silently
ignore that error condition and just copy the binary blob.

Also, there *really* needs to be a check before dlen is assumed to
be >= len in the memcpy after the invalid_seq label.
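
To make the suggestion concrete, here is a rough sketch of what a
signalling interface might look like.  It is not taken from the patch:
fold_next() is a made-up, ASCII-only stand-in for the
utf8ncursor()/utf8byte() pair (assumed to return 0 at end of input and
a negative value on an invalid sequence), and the names
casefold_checked() and casefold_or_blob() as well as the choice of
-EINVAL and -ERANGE are assumptions for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor; stands in for struct utf8cursor. */
struct fold_cursor {
	const unsigned char *s;
	size_t left;
};

/*
 * Hypothetical stand-in for utf8byte(): lowercases ASCII, returns 0 at
 * end of input and -1 for any byte >= 0x80 ("invalid sequence").
 */
static int fold_next(struct fold_cursor *cur)
{
	unsigned char c;

	if (cur->left == 0)
		return 0;
	c = *cur->s++;
	cur->left--;
	if (c >= 0x80)
		return -1;
	if (c >= 'A' && c <= 'Z')
		c += 'a' - 'A';
	return c;
}

/*
 * Returns the folded length, -EINVAL if str cannot be decoded, or
 * -ERANGE if dest is too small.  Nothing is copied on error.
 */
static int casefold_checked(const unsigned char *str, size_t len,
			    unsigned char *dest, size_t dlen)
{
	struct fold_cursor cur = { str, len };
	size_t nlen;
	int c;	/* int, so a negative error value survives the test */

	for (nlen = 0; nlen < dlen; nlen++) {
		c = fold_next(&cur);
		if (c < 0)
			return -EINVAL;
		if (c == 0)
			return (int)nlen;
		dest[nlen] = (unsigned char)c;
	}
	/* dest is full; that is only fine if the input is exhausted too. */
	c = fold_next(&cur);
	if (c < 0)
		return -EINVAL;
	return c == 0 ? (int)nlen : -ERANGE;
}

/* Caller-side fallback: binary blob, but only if it actually fits. */
static int casefold_or_blob(const unsigned char *str, size_t len,
			    unsigned char *dest, size_t dlen)
{
	int ret = casefold_checked(str, len, dest, dlen);

	if (ret != -EINVAL)
		return ret;
	if (len > dlen)
		return -ERANGE;
	memcpy(dest, str, len);
	return (int)len;
}
```

With this split, a caller that wants the binary-blob behaviour can
still get it via casefold_or_blob(), while a caller that needs to know
about the failure can call casefold_checked() directly and look at the
return value.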

							- Ted
