git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Adrian Ratiu <adrian.ratiu@collabora.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Aaron Schrab <aaron@schrab.com>,
	git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>,
	Rodrigo Damazio Bovendorp <rdamazio@google.com>,
	Jeff King <peff@peff.net>, Jonathan Nieder <jrnieder@gmail.com>,
	Patrick Steinhardt <ps@pks.im>,
	Josh Steadmon <steadmon@google.com>,
	Ben Knoble <ben.knoble@gmail.com>,
	Phillip Wood <phillip.wood123@gmail.com>
Subject: Re: [PATCH v4 4/4] submodule: fix case-folding gitdir filesystem colisions
Date: Tue, 11 Nov 2025 01:01:27 +0200	[thread overview]
Message-ID: <878qgdjxvc.fsf@collabora.com> (raw)
In-Reply-To: <xmqqwm3xzots.fsf@gitster.g>

Hi Junio,

On Mon, 10 Nov 2025, Junio C Hamano <gitster@pobox.com> wrote:
> Adrian Ratiu <adrian.ratiu@collabora.com> writes: 
> 
>> On Sat, 08 Nov 2025, Aaron Schrab <aaron@schrab.com> wrote: 
>>> At 17:05 +0200 07 Nov 2025, Adrian Ratiu 
>>> <adrian.ratiu@collabora.com> wrote:  
>>>>Add a new check in validate_submodule_git_dir() to detect and 
>>>>prevent case-folding filesystem colisions. When this new check 
>>>>is triggered, a stricter casefolding aware URI encoding is 
>>>>used  to percent-encode uppercase characters, e.g. Foo becomes 
>>>>%46oo.  By using this check/retry mechanism the uppercase 
>>>>encoding is  only applied when necessary, so case-sensitive 
>>>>filesystems are  not affected.  
> 
> The .gitdir name munging is a local thing, so it makes sense to 
> do the casefold mitigation only the filesystem is case folding 
> one,

That is correct. Case-sensitive filesystems will stop before 
reaching the "Case 2.2" attempt, because their gitdirs will pass 
validation in the earlier steps.
 
> 
> Your code seems to compare directory names textually, and 
> downcasing the proposed name for some reason, but I am not sure 
> why we need any of these complexity.  Wouldn't it be the matter 
> of actually trying to mkdir(2) the name presented (either "foo" 
> or "Foo") and see if that fails?  If it fails (most likely with 
> EEXIST if case folding is getting in the way, but for any 
> reason), the name is unusable and we need to "tweak" the name to 
> a usable one at that point by retrying.  Once we find a usable 
> name, we can remember the fact that we already created a 
> directory for it and reuse that empty directory in the code 
> where we used to do mkdir(2), no?

I tried the mkdir approach. Unfortunately it does not work because 
submodule_name_to_gitdir() is called on all submodule paths: both 
existing or new, valid or non-valid.

So if a dir already exists, we do not know if there is a 
conflict. It might just be a normal valid path verification of an 
existing module.

This is why I had to come up with the double-test using to_lower() 
to compute a common path, canonicalization and checking for 
existence when there is a difference via is_git_directory(). In 
this case there is always a conflict and is the only reliable way 
I colud think of.

I am very much in favor for finding a simpler check, however we 
need something at least as reliable as the current double-test, 
which passes all cases I could think of and all CI tests.
 
> 
>> Maybe we could derive a new path automatically (eg foo2 or 
>> foo_,  suggestions welcome) and use it if valid. This way, 
>> there is no  user intervention. 
>> 
>> Do you have any preference? 
> 
> If adding 'foo' and then an attempt to add 'Foo' will 
> automatically assign a name that does not conflict with 'foo' to 
> the newly added submodule, then the users would expect the same 
> to happen if the order to add them are swapped, wouldn't they?

Yes and it's a valid expectation. :)

I just missed this corner-case which Aaron brought up (many thanks 
again Aaron!).

It's a simple fix which I'll do in v5: if validation fails, we 
just need to come up with another name and retry.

The probability of conflicts decreases exponentially with each 
retry. By the 4-5 retry the probability is so small, at that 
point, if we're that desperate, we might just hash the name or use 
a random string.

> 
> IOW, I do not see why the code wants to treat uppercase and 
> lowercase letters any differently, and suspect that it might be 
> the source of additional complication.  Also, if there is an 
> existing module with a funny path "%46oo", you cannot just 
> encode "Foo" into "%46oo" to avoid crashes with 'foo' and be 
> done anyway, so it feels like we are inviting more bugs by 
> special casing certain paths (and not encoding or checking 
> others).  Don't we have an issue similar to "case folding" in 
> macOS wrt UTF-8 canonicalization, too?  An identical Unicode 
> string may be canonicalized in two ways, so in a presence of a 
> submodule named one way, the other submodule named in the other 
> canonicalization, while their names may be with different byte 
> sequences, cannot co-exist in the same directory next to each 
> other.  "Try to mkdir(2) the new name, and see if it succeeds, 
> and if so use the resulting empty directory" approach would 
> cover that case with the same mechanism as you need to use for 
> case folding filesystems, I would imagine. 

Foo and "%46oo" is another possible conflict, yes, but only when 
both "foo" and "%46oo" already exist. The deeper you go in the 
retry cycles, the more unlikely conflicts become.

We just need to detect this conflict, come up with another name 
and retry. Retry until one sticks.

Again, the probability of conflicts decreases exponentially with 
each retry. :) If we're desperate: hash the name or try a random 
string before giving up and throwing an error, which is "Case 3" 
in the current patch.

(I wrote above why mkdir cannot solve the detection problem, hope 
it makes sense).

I think it's also worth mentioning a key idea of this design: we 
don't need to detect & fix all corner cases. We can iterate and 
improve both the conflict detection and path generation over time, 
since they are opaque to users, who rely only on the resulting 
submodule.name.gitdir config we publish.

I know it seems like a game of "whack-a-mole", but even if we fix 
just half of the possible conflicts, the likeliest ones, we still 
are in a much better situation than before. At some point we're 
approaching diminishing returns, so extremely rare conflicts might 
not even be worth fixing.

I will add a test for this "Foo" + "foo" + "%46oo" in v5 as 
well. :)

  reply	other threads:[~2025-11-10 23:01 UTC|newest]

Thread overview: 179+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-16 21:36 [PATCH 0/9] Encode submodule gitdir names to avoid conflicts Adrian Ratiu
2025-08-16 21:36 ` [PATCH 1/9] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-08-20 19:04   ` Josh Steadmon
2025-08-21 11:26     ` Adrian Ratiu
2025-08-16 21:36 ` [PATCH 2/9] submodule: create new gitdirs under submodules path Adrian Ratiu
2025-09-08 14:24   ` Phillip Wood
2025-09-08 15:46     ` Adrian Ratiu
2025-09-09  8:53       ` Phillip Wood
2025-09-09 10:57         ` Adrian Ratiu
2025-08-16 21:36 ` [PATCH 3/9] submodule: add gitdir path config override Adrian Ratiu
2025-08-20 19:37   ` Josh Steadmon
2025-08-21 12:18     ` Adrian Ratiu
2025-08-20 21:38   ` Josh Steadmon
2025-08-21 13:04     ` Adrian Ratiu
2025-08-20 21:50   ` Josh Steadmon
2025-08-21 13:05     ` Adrian Ratiu
2025-09-08 14:23   ` Phillip Wood
2025-09-09 12:02     ` Adrian Ratiu
2025-08-16 21:36 ` [PATCH 4/9] t: submodules: add basic mixed gitdir path tests Adrian Ratiu
2025-08-20 22:07   ` Josh Steadmon
2025-09-02 23:02   ` Junio C Hamano
2025-08-16 21:36 ` [PATCH 5/9] strbuf: bring back is_rfc3986_unreserved Adrian Ratiu
2025-08-16 21:56   ` Ben Knoble
2025-08-21 13:08     ` Adrian Ratiu
2025-08-16 21:36 ` [PATCH 6/9] submodule: encode gitdir paths to avoid conflicts Adrian Ratiu
2025-08-20 19:29   ` Jeff King
2025-08-21 13:14     ` Adrian Ratiu
2025-08-16 21:36 ` [PATCH 7/9] submodule: remove validate_submodule_git_dir() Adrian Ratiu
2025-09-08 14:23   ` Phillip Wood
2025-08-16 21:36 ` [PATCH 8/9] t: move nested gitdir tests to proper location Adrian Ratiu
2025-08-16 21:36 ` [PATCH 9/9] t: add gitdir encoding tests Adrian Ratiu
2025-08-18 22:06   ` Junio C Hamano
2025-08-21 13:17     ` Adrian Ratiu
2025-08-17 13:01 ` [PATCH 0/9] Encode submodule gitdir names to avoid conflicts Adrian Ratiu
2025-09-08 14:01 ` [PATCH v2 00/10] " Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 01/10] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-09-30 13:37     ` Kristoffer Haugsbakk
2025-09-08 14:01   ` [PATCH v2 02/10] submodule: create new gitdirs under submodules path Adrian Ratiu
2025-09-09  7:40     ` Patrick Steinhardt
2025-09-09 16:17       ` Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 03/10] submodule: add gitdir path config override Adrian Ratiu
2025-09-09  7:40     ` Patrick Steinhardt
2025-09-09 17:46       ` Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 04/10] t7425: add basic mixed submodule gitdir path tests Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 05/10] strbuf: bring back is_rfc3986_unreserved Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 06/10] submodule: encode gitdir paths to avoid conflicts Adrian Ratiu
2025-09-10 18:15     ` SZEDER Gábor
2025-09-10 19:30       ` Adrian Ratiu
2025-09-10 20:18     ` Kristoffer Haugsbakk
2025-09-30 13:36     ` Kristoffer Haugsbakk
2025-09-08 14:01   ` [PATCH v2 07/10] submodule: error out if gitdir name is too long Adrian Ratiu
2025-09-08 15:51     ` Jeff King
2025-09-08 17:15       ` Adrian Ratiu
2025-09-30 13:35     ` Kristoffer Haugsbakk
2025-09-08 14:01   ` [PATCH v2 08/10] submodule: remove validate_submodule_git_dir() Adrian Ratiu
2025-09-30 13:35     ` Kristoffer Haugsbakk
2025-10-03  7:56       ` Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 09/10] t7450: move nested gitdir tests to t7425 Adrian Ratiu
2025-09-08 14:01   ` [PATCH v2 10/10] t7425: add gitdir encoding tests Adrian Ratiu
2025-10-06 11:25 ` [PATCH v3 0/5] Encode submodule gitdir names to avoid conflicts Adrian Ratiu
2025-10-06 11:25   ` [PATCH v3 1/5] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-10-06 16:37     ` Junio C Hamano
2025-10-07  9:23       ` Adrian Ratiu
2025-10-06 11:25   ` [PATCH v3 2/5] submodule: add gitdir path config override Adrian Ratiu
2025-10-06 16:47     ` Junio C Hamano
2025-10-07 15:41       ` Junio C Hamano
2025-10-21  8:06         ` Patrick Steinhardt
2025-10-21 11:50           ` Adrian Ratiu
2025-10-21  8:05     ` Patrick Steinhardt
2025-10-21 11:57       ` Adrian Ratiu
2025-10-06 11:25   ` [PATCH v3 3/5] strbuf: bring back is_rfc3986_unreserved Adrian Ratiu
2025-10-06 16:51     ` Junio C Hamano
2025-10-06 17:47       ` Junio C Hamano
2025-10-07  9:43       ` Adrian Ratiu
2025-10-21  8:06     ` Patrick Steinhardt
2025-10-06 11:25   ` [PATCH v3 4/5] submodule: encode gitdir paths to avoid conflicts Adrian Ratiu
2025-10-06 16:57     ` Junio C Hamano
2025-10-07 14:10       ` Adrian Ratiu
2025-10-07 17:20         ` Junio C Hamano
2025-10-07 17:41           ` Adrian Ratiu
2025-10-07 19:55             ` Junio C Hamano
2025-10-06 11:25   ` [PATCH v3 5/5] submodule: error out if gitdir name is too long Adrian Ratiu
2025-10-06 17:06     ` Junio C Hamano
2025-10-07 10:17       ` Adrian Ratiu
2025-10-07 15:58         ` Junio C Hamano
2025-10-21  8:06     ` Patrick Steinhardt
2025-10-21 13:13       ` Adrian Ratiu
2025-10-06 16:21   ` [PATCH v3 0/5] Encode submodule gitdir names to avoid conflicts Junio C Hamano
2025-10-07 11:13     ` Adrian Ratiu
2025-10-07 15:36       ` Junio C Hamano
2025-10-07 16:58         ` Adrian Ratiu
2025-10-07 17:27         ` Junio C Hamano
2025-10-07 16:21       ` Junio C Hamano
2025-10-07 17:21         ` Adrian Ratiu
2025-11-07 15:05 ` [PATCH v4 0/4] " Adrian Ratiu
2025-11-07 15:05   ` [PATCH v4 1/4] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-11-07 15:05   ` [PATCH v4 2/4] builtin/credential-store: move is_rfc3986_unreserved to url.[ch] Adrian Ratiu
2025-11-07 15:05   ` [PATCH v4 3/4] submodule: add extension to encode gitdir paths Adrian Ratiu
2025-11-07 15:05   ` [PATCH v4 4/4] submodule: fix case-folding gitdir filesystem colisions Adrian Ratiu
2025-11-08 18:20     ` Aaron Schrab
2025-11-10 17:11       ` Adrian Ratiu
2025-11-10 17:31         ` Aaron Schrab
2025-11-10 18:27           ` Adrian Ratiu
2025-11-10 19:10         ` Junio C Hamano
2025-11-10 23:01           ` Adrian Ratiu [this message]
2025-11-10 23:17             ` Junio C Hamano
2025-11-11 12:41               ` Adrian Ratiu
2025-11-12 15:28     ` Adrian Ratiu
2025-11-14 23:03   ` [PATCH v4 0/4] Encode submodule gitdir names to avoid conflicts Josh Steadmon
2025-11-17 15:22     ` Adrian Ratiu
2025-11-19 21:10 ` [PATCH v5 0/7] " Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 1/7] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 2/7] builtin/credential-store: move is_rfc3986_unreserved to url.[ch] Adrian Ratiu
2025-12-05 12:16     ` Patrick Steinhardt
2025-12-05 17:25       ` Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 3/7] submodule: always validate gitdirs inside submodule_name_to_gitdir Adrian Ratiu
2025-12-05 12:17     ` Patrick Steinhardt
2025-12-05 18:17       ` Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 4/7] submodule: add extension to encode gitdir paths Adrian Ratiu
2025-12-05 12:19     ` Patrick Steinhardt
2025-12-05 19:30       ` Adrian Ratiu
2025-12-05 22:47         ` Junio C Hamano
2025-12-06 11:59           ` Patrick Steinhardt
2025-12-06 16:38             ` Junio C Hamano
2025-12-08  9:01               ` Adrian Ratiu
2025-12-08 11:46                 ` Patrick Steinhardt
2025-12-08 15:48                   ` Adrian Ratiu
2025-12-08  9:10             ` Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 5/7] submodule: fix case-folding gitdir filesystem colisions Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 6/7] submodule: use hashed name for gitdir Adrian Ratiu
2025-11-19 21:10   ` [PATCH v5 7/7] meson/Makefile: allow setting submodule encoding at build time Adrian Ratiu
2025-12-05 12:19     ` Patrick Steinhardt
2025-12-05 19:42       ` Adrian Ratiu
2025-12-05 22:52         ` Junio C Hamano
2025-12-06 12:02           ` Patrick Steinhardt
2025-12-06 16:48             ` Junio C Hamano
2025-12-08  9:23             ` Adrian Ratiu
2025-12-08  9:42           ` Adrian Ratiu
2025-12-13  8:08 ` [PATCH v6 00/10] Add submodulePathConfig extension and gitdir encoding Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 01/10] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 02/10] submodule: always validate gitdirs inside submodule_name_to_gitdir Adrian Ratiu
2025-12-16  9:09     ` Patrick Steinhardt
2025-12-13  8:08   ` [PATCH v6 03/10] builtin/submodule--helper: add gitdir command Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 04/10] submodule: introduce extensions.submodulePathConfig Adrian Ratiu
2025-12-16  9:09     ` Patrick Steinhardt
2025-12-16  9:45       ` Adrian Ratiu
2025-12-16 23:22     ` Josh Steadmon
2025-12-17  7:30       ` Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 05/10] submodule: allow runtime enabling extensions.submodulePathConfig Adrian Ratiu
2025-12-16  9:09     ` Patrick Steinhardt
2025-12-16 10:01       ` Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 06/10] submodule--helper: add gitdir migration command Adrian Ratiu
2025-12-16  9:09     ` Patrick Steinhardt
2025-12-16 10:17       ` Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 07/10] builtin/credential-store: move is_rfc3986_unreserved to url.[ch] Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 08/10] submodule--helper: fix filesystem collisions by encoding gitdir paths Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 09/10] submodule: fix case-folding gitdir filesystem collisions Adrian Ratiu
2025-12-13  8:08   ` [PATCH v6 10/10] submodule: hash the submodule name for the gitdir path Adrian Ratiu
2025-12-13 14:03   ` [PATCH v6 00/10] Add submodulePathConfig extension and gitdir encoding Ben Knoble
2025-12-15 16:28     ` Adrian Ratiu
2025-12-16  0:53       ` Junio C Hamano
2025-12-18  3:43       ` Ben Knoble
2025-12-16 23:20   ` Josh Steadmon
2025-12-17  8:17     ` Adrian Ratiu
2025-12-20 10:15 ` [PATCH v7 00/11] " Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 01/11] submodule--helper: use submodule_name_to_gitdir in add_submodule Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 02/11] submodule: always validate gitdirs inside submodule_name_to_gitdir Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 03/11] builtin/submodule--helper: add gitdir command Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 04/11] submodule: introduce extensions.submodulePathConfig Adrian Ratiu
2025-12-21  3:27     ` Junio C Hamano
2025-12-23 13:35       ` Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 05/11] submodule: allow runtime enabling extensions.submodulePathConfig Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 06/11] submodule--helper: add gitdir migration command Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 07/11] builtin/credential-store: move is_rfc3986_unreserved to url.[ch] Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 08/11] submodule--helper: fix filesystem collisions by encoding gitdir paths Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 09/11] submodule: fix case-folding gitdir filesystem collisions Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 10/11] submodule: hash the submodule name for the gitdir path Adrian Ratiu
2025-12-20 10:15   ` [PATCH v7 11/11] submodule: detect conflicts with existing gitdir configs Adrian Ratiu
2025-12-21  2:39   ` [PATCH v7 00/11] Add submodulePathConfig extension and gitdir encoding Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878qgdjxvc.fsf@collabora.com \
    --to=adrian.ratiu@collabora.com \
    --cc=aaron@schrab.com \
    --cc=ben.knoble@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=ps@pks.im \
    --cc=rdamazio@google.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).