From: Junio C Hamano <gitster@pobox.com>
To: "Mark Ruvald Pedersen via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Mark Ruvald Pedersen <mped@oticon.com>,
	Mark Ruvald Pedersen <mped@demant.com>
Subject: Re: [PATCH 1/2] sequencer: truncate labels to accommodate loose refs
Date: Thu, 10 Aug 2023 10:12:12 -0700
Message-ID: <xmqqr0oastxv.fsf@gitster.g>
In-Reply-To: <4971e3c52504bf965aa754c9a5d31abddbcc1466.1691685300.git.gitgitgadget@gmail.com> (Mark Ruvald Pedersen via GitGitGadget's message of "Thu, 10 Aug 2023 16:34:59 +0000")

"Mark Ruvald Pedersen via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +/*
> + * To accommodate common filesystem limitations, where the loose refs' file
> + * names must not exceed `NAME_MAX`, the labels generated by `git rebase
> + * --rebase-merges` need to be truncated if the corresponding commit subjects
> + * are too long.
> + * Add some margin to stay clear from reaching `NAME_MAX`.
> + */
> +#define GIT_MAX_LABEL_LENGTH ((NAME_MAX) - (LOCK_SUFFIX_LEN) - 16)

OK.  Hopefully no systems define NAME_MAX shorter than 20 bytes ;-).

We may suffix "-%d" to make it unique after this truncation, so
there definitely is a need for some slop, and 16 bytes should
be sufficiently long.
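
To make the arithmetic concrete, here is a rough sketch, not part of
the patch.  The EXAMPLE_* values are assumptions for illustration only
(NAME_MAX is 255 on many systems, and ".lock" is 5 bytes long), and
suffix_len() and check_margin() are hypothetical helpers.  It assumes a
32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Illustrative values only; real code takes these from system/git headers. */
#define EXAMPLE_NAME_MAX 255
#define EXAMPLE_LOCK_SUFFIX_LEN 5
#define EXAMPLE_MAX_LABEL_LENGTH \
	((EXAMPLE_NAME_MAX) - (EXAMPLE_LOCK_SUFFIX_LEN) - 16)

/* Number of bytes a "-%d" uniquifying suffix would occupy for a given n. */
static int suffix_len(int n)
{
	/* With a NULL buffer and size 0, snprintf() only reports the length. */
	return snprintf(NULL, 0, "-%d", n);
}

/* The largest possible suffix must fit in the 16-byte margin. */
static void check_margin(void)
{
	assert(suffix_len(INT_MAX) <= 16);
}
```

With these assumed values the label limit comes out at 234 bytes, and
even the longest "-%d" suffix stays inside the 16-byte margin.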

> @@ -5404,14 +5415,34 @@ static const char *label_oid(struct object_id *oid, const char *label,
>  		 *
>  		 * Note that we retain non-ASCII UTF-8 characters (identified
>  		 * via the most significant bit). They should be all acceptable
> -		 * in file names. We do not validate the UTF-8 here, that's not
> -		 * the job of this function.
> +		 * in file names.
> +		 *
> +		 * As we will use the labels as names of (loose) refs, it is
> +		 * vital that the name not be longer than the maximum component
> +		 * size of the file system (`NAME_MAX`). We are careful to
> +		 * truncate the label accordingly, allowing for the `.lock`
> +		 * suffix and for the label to be UTF-8 encoded (i.e. we avoid
> +		 * truncating in the middle of a character).
>  		 */
> -		for (; *label; label++)
> -			if ((*label & 0x80) || isalnum(*label))
> +		for (; *label && buf->len + 1 < max_len; label++)
> +			if (isalnum(*label) ||
> +			    (!label_is_utf8 && (*label & 0x80)))
>  				strbuf_addch(buf, *label);
> +			else if (*label & 0x80) {
> +				const char *p = label;
> +
> +				utf8_width(&p, NULL);
> +				if (p) {
> +					if (buf->len + (p - label) > max_len)
> +						break;
> +					strbuf_add(buf, label, p - label);
> +					label = p - 1;
> +				} else {
> +					label_is_utf8 = 0;
> +					strbuf_addch(buf, *label);
> +				}

utf8_width() does let you advance one Unicode character at a time as
a side effect, but it may be a bit overkill, as its primary
function is to compute the display width of that character.

We could take advantage of the fact that the first byte of a UTF-8
character has two high-bits set (i.e. 11xxxxxx) while the second and
subsequent bytes have only the top-bit set and the second highest
bit clear (i.e. 10xxxxxx) to simplify/optimize it.  If this were in
a performance sensitive codepath, that is.
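
The byte-pattern test described above could be sketched roughly as
follows.  This is only an illustration, not what the patch does; the
helper names is_utf8_continuation() and utf8_truncate_len() are
hypothetical, and the sketch assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
static int is_utf8_continuation(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/*
 * Return the largest length <= max_len at which the first `len` bytes
 * of `s` can be cut without splitting a multi-byte character.
 */
static size_t utf8_truncate_len(const char *s, size_t len, size_t max_len)
{
	size_t n = len < max_len ? len : max_len;

	assert(s);
	if (n == len)
		return n; /* nothing is cut off */
	/*
	 * s[n] is the first byte that would be dropped.  While it is a
	 * continuation byte, the cut falls inside a character, so back up.
	 */
	while (n > 0 && is_utf8_continuation((unsigned char)s[n]))
		n--;
	return n;
}
```

For example, cutting the three-byte string "a\xc3\xa9" ("a" followed by
a two-byte character) at two bytes backs up to one byte, so the
two-byte character is dropped whole rather than split.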

I'll queue it as-is for now, as we are in the "regression fix only"
phase of the cycle, and we have enough time to polish it later.

Thanks.

>  			/* avoid leading dash and double-dashes */
> -			else if (buf->len && buf->buf[buf->len - 1] != '-')
> +			} else if (buf->len && buf->buf[buf->len - 1] != '-')
>  				strbuf_addch(buf, '-');
>  		if (!buf->len) {
>  			strbuf_addstr(buf, "rev-");


Thread overview: 7+ messages
2023-08-10 16:34 [PATCH 0/2] sequencer: truncate lockfile and ref to NAME_MAX Mark Ruvald Pedersen via GitGitGadget
2023-08-10 16:34 ` [PATCH 1/2] sequencer: truncate labels to accommodate loose refs Mark Ruvald Pedersen via GitGitGadget
2023-08-10 17:12   ` Junio C Hamano [this message]
2023-08-16  8:36     ` Johannes Schindelin
2023-08-16 16:28       ` Junio C Hamano
2023-08-10 16:35 ` [PATCH 2/2] rebase: allow overriding the maximal length of the generated labels Johannes Schindelin via GitGitGadget
2023-08-10 17:15   ` Junio C Hamano
