From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F58419EED3 for ; Tue, 10 Feb 2026 07:44:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.130.231.41 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770709485; cv=none; b=eqtYxnn7CsHAJ6N+tfOpt9ziNdR9V332rW5+Ba55TyZhZAXX7Df/Hq19v1xwy1Xe/MIoLr4/6i1PpaLDcQRsR+lo8RtuiP7PsXU06ksWekCuAuYVd99r3WEeSWUSSL7Ma8TUxdbx0kM7Hm5rxEr88qkgENZmy5eVPjXlhQgwJ1U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770709485; c=relaxed/simple; bh=lHXTqgtPGn+YGMxmBFwloBWnfEaeywI1eIkEJ4oMDtM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=uo3yg98w+/lWNTBLCRZFuNXSnOhAtKQXD75vuggZ3YUpsOxVoZX9pHtjTKMDAgSnEbNH5bDxxFn80ht26IJKMyno2TFw1niLQ7G21wycqhJ1t81vF7R7dP3raqCO2vJTPQcRkSXlKRWfrUkwRUeWvqzwxnjKZitaMxl+YTgSkg4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=fjggU1Sr; arc=none smtp.client-ip=104.130.231.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="fjggU1Sr" Received: (qmail 353568 invoked by uid 109); 10 Feb 2026 07:44:42 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:content-transfer-encoding:in-reply-to; s=20240930; bh=lHXTqgtPGn+YGMxmBFwloBWnfEaeywI1eIkEJ4oMDtM=; b=fjggU1Sr94w7G15lzXNi5SDPadeIaADQmGSazmlMNpCwkT2HvqXqgOb9T07pPVnAouFvdat2ovrpGZ1V6OZHa/HmLXfwlHJxz8vFVqmbC2b73ZUF/uscvJQb9i35aaYZEYpwOi+IUHyFv6TdnrRmTl4Rp69GANH4x9gYIR++txva6bmntEsfLMKEWbBREMosZ96nh+e8hTd/0UJu4UVihHwmu26D0uuL+wDf2WqFjg+u5sRdXKDdlwp32/W+ZfT51cmK6wq5+RnE0bllNYiExaQDlN1TavcwgBPhudwjyUixq5qLQxHEs2rGHhFEhOjYDU6zkk9bJXTPv0pgm918Ow== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 10 Feb 2026 07:44:42 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 698382 invoked by uid 111); 10 Feb 2026 07:44:46 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 10 Feb 2026 02:44:46 -0500 Authentication-Results: peff.net; auth=none Date: Tue, 10 Feb 2026 02:44:41 -0500 From: Jeff King To: Jonatan Holmgren Cc: git@vger.kernel.org, gitster@pobox.com, "D . Ben Knoble" , "brian m . carlson" Subject: Re: [PATCH v1] alias: support UTF-8 characters via subsection syntax Message-ID: <20260210074441.GE1756549@coredump.intra.peff.net> References: <3124b359-2929-4f3f-9ac6-793277fe422b@jontes.page> <20260209220115.461109-1-jonatan@jontes.page> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20260209220115.461109-1-jonatan@jontes.page> On Mon, Feb 09, 2026 at 11:01:15PM +0100, Jonatan Holmgren wrote: > Git aliases are currently restricted to ASCII characters due to > config key syntax limitations. This prevents non-English speakers > from creating aliases in their native languages. > > Add support for UTF-8 alias names using config subsections: > > [alias "förgrena"] > command = branch > > The subsection name is matched verbatim (case-sensitive), while the > existing flat syntax (alias.name) remains case-insensitive for > backward compatibility. This approach uses existing config > infrastructure and avoids complex Unicode normalization. Thanks, this mostly looks quite good. I have a few comments below that range from a few small bugs to project-specific gotchas to style/readability suggestions. > diff --git a/Documentation/RelNotes/2.54.0.adoc b/Documentation/RelNotes/2.54.0.adoc > [...] Usually patches do not fill out their own release notes, as it would create many conflicts when merging topics in arbitrary order. Instead, the maintainer generally writes up a blurb that goes in the merge commits and is eventually placed in the release notes by script. Suggesting a blurb does save work for the maintainer, but it should go in the cover letter (so for a single patch, after the "---" lines). There's a bit on this in Documentation/SubmittingPatches, under the section "the-topic-summary". > diff --git a/Documentation/config/alias.adoc b/Documentation/config/alias.adoc > index 80ce17d2de..feba1e2022 100644 > --- a/Documentation/config/alias.adoc > +++ b/Documentation/config/alias.adoca Yay, I was happy to see nice comprehensive documentation here. And you do not seem to have gotten caught by any of the (many) formatting gotchas. > +After defining these, you can run `git co`, `git hämta`, `git gömma`, > +and `git "my alias with whitespace"` (note the quotes for aliases with spaces). The use of literal backticks here makes sense, and if you haven't disabled it with a build-time knob, they should be shown in bold. Curiously in my build of the docs this doesn't quite work, and I get "git h" in bold, but "ämta" unbolded. Weird, and presumably a bug in asciidoc (the generated docbook xml shows the same thing, as does html). I get the same with asciidoctor, too. I'm not sure if it is worth working around it (or even how we would do so). > diff --git a/alias.c b/alias.c > index 1a1a141a0a..f0c5f12fdd 100644 > --- a/alias.c > +++ b/alias.c > @@ -17,19 +17,45 @@ static int config_alias_cb(const char *key, const char *value, > const struct config_context *ctx UNUSED, void *d) > { > struct config_alias_data *data = d; > - const char *p; > + const char *cmd, *subkey; > + size_t cmd_len; > + int is_subsection; > > - if (!skip_prefix(key, "alias.", &p)) > + /* Use parse_config_key() to handle both 2-level and 3-level keys */ > + if (parse_config_key(key, "alias", &cmd, &cmd_len, &subkey) < 0) > return 0; > > + /* > + * Support two syntaxes: > + * 1. alias.name = value (simple, 2-level key) > + * 2. [alias "name"] command = value (new, 3-level key) > + */ > + if (cmd) { > + if (strcmp(subkey, "command")) > + return 0; > + is_subsection = 1; > + } else { > + cmd = subkey; > + cmd_len = strlen(cmd); > + is_subsection = 0; > + } OK, so versus what I showed earlier, we are keeping an is_subsection flag in order to differentiate the cases. And then we use that below... > if (data->alias) { > - if (!strcasecmp(p, data->alias)) { > + int match; > + if (is_subsection) { > + match = (strlen(data->alias) == cmd_len && > + !strncmp(data->alias, cmd, cmd_len)); > + } else { > + match = (strlen(data->alias) == cmd_len && > + !strncasecmp(data->alias, cmd, cmd_len)); > + } ...to decide whether to be case-insensitive or not. Makes sense. I was going to suggest some simplifications here: if is_subsection is false, then we know that cmd is a NUL-terminated string, and we could just use strcasecmp() as before. And in fact, with that conditional we do not need the "cmd" aliasing at all. Which also means we don't need a separate flag to distinguish the cases. You could just write: if (cmd) { match = (strlen(data->alias) == cmd_len && !strncmp(data->alias, cmd, cmd_len)); } else { match = !strcasecmp(data->alias, subkey); } But I guess we do use the redirection in the listing path later anyway: > } else if (data->list) { > - string_list_append(data->list, p); > + string_list_append_nodup(data->list, xmemdupz(cmd, cmd_len)); > } So maybe it is better as-is. I dunno. I have a feeling this would all need even more refactoring even if we ever added any more alias.foo.* config entries. So we can probably just leave it until then for more rearranging. One thing I did find a little funny is the use of the word "subkey" for one variable. Usually we just call this the "key", but that name is annoyingly already taken by the function parameter (well before your patch). Usually we call that "var" instead. I don't know if it's worth changing or not. It's probably only confusing to me because I've stared at so many other Git config callback functions. > diff --git a/t/t0014-alias-utf8.sh b/t/t0014-alias-utf8.sh Is there any reason to add a new script here, and not just put these in the existing t0014? If you were bailing from the script when the prereq was not met, we'd need a separate script. But since you are using per-test prereqs, they can just exist alongside the other alias tests. And also, you can't have two scripts with the same number. ;) Running "make test" should complain about that (I'll assume you ran your tests, but manually as ./t0014). > +# Skip if filesystem/locale doesn't support UTF-8 > +test_lazy_prereq UTF8_LOCALE ' > + test_have_prereq !MINGW && > + test_set_prereq UTF8_LOCALE > +' This is a slight mis-use of set_prereq. You should just need to return success from the lazy-prereq function, and it will be set automatically. So more like: test_lazy_prereq UTF8_LOCALE ' test_have_prereq !MINGW ' At which point I wonder if there is much value in having UTF8_LOCALE at all, and not just putting "!MINGW" in each of the tests. Beyond that, my larger question is: what are we asking of the platform and why does MINGW not work? We do not care about actual utf8 filesystem support, since these are just aliases. As long as the config file can store them and we can pass them on the command-line, that should be enough. > +test_expect_success UTF8_LOCALE 'setup UTF-8 aliases' ' > + git config alias."förgrena".command branch && > + git config alias."分支".command "branch --list" && > + git config alias."test name".command status > +' Usually we'd just define these in the tests that use them, which keeps each individual test a bit more self-contained. So more like: test_expect_success 'UTF-8 alias with Swedish characters' ' test_config alias."förgrena".command branch && git förgrena >output && ... ' But we use them in the listing test, too, so they are each used twice. I dunno. It might make sense for the listing test to just define its own expected set. > +test_expect_success UTF8_LOCALE 'UTF-8 alias with Swedish characters' ' > + git förgrena >output && > + test_grep -E "^(\* )?(main|master)" output > +' So I see we're aliasing "branch" here, which gives us this complicated regex for checking the output. Would something like "!echo ran my alias" make the test easier to understand? > +test_expect_success 'list UTF-8 aliases' ' > + git config --get-regexp "^alias\\..*\\.command" >output && > + test_line_count -ge 3 output > +' This one needs the UTF8_LOCALE prereq, too (since we'd only have set up those aliases if we had it). Is doing --get-regexp here all that interesting? It's not testing your new code at all, and would have passed already. I expected us to look at the output of "git help -a" to see that the entries are listed. And trying that with your patch: ./git -c alias.föo.command=branch help -a shows some possible issues: - we show it as "föo.command" - the alignment with other aliases is not quite right. Probably it is using a byte-count instead of utf8_strwidth() I was puzzled that we would not parse it as föo correctly, since your patch modified the listing code. It looks like help.c has its own separate parser for this. Yuck. :( I think this also means that "git help föo" will fail. So probably we want a preparatory patch to teach help.c to rely on list_aliases() and alias_lookup() from alias.[ch]. > +test_expect_success 'flat syntax still works' ' > + git config alias.testlegacy status && > + git testlegacy >output && > + test_grep "On branch" output > +' This probably is covered elsewhere already, but OK. :) > +test_expect_success 'new subsection syntax works' ' > + git config alias.testnew.command status && > + git testnew >output && > + test_grep "On branch" output > +' Nice basic test without utf8. Good. > +test_expect_success 'subsection syntax only accepts command key' ' > + git config alias.invalid.notcommand "value" && > + test_must_fail git invalid 2>error && > + test_grep -i "not a git command" error > +' Good thinking. > +test_expect_success 'simple syntax is case-insensitive' ' > + git config alias.LegacyCase status && > + git legacycase >output 2>&1 && > + test_grep "On branch" output > +' > + > +test_expect_success 'subsection syntax is case-sensitive' ' > + test_commit case-test && > + git config alias.SubCase.command "log --oneline" && > + git config alias.subcase.command status && > + git SubCase >upper.out 2>&1 && > + git subcase >lower.out 2>&1 && > + ! test_cmp upper.out lower.out > +' And these are both very nice to see. -Peff