git.vger.kernel.org archive mirror
* Re: Possible bug in .gitignore
       [not found] <CADLV-7+fX7jrC8e_nPBHZfg8yXKpjLfPL3MgxS8peUrr8pqQoA@mail.gmail.com>
@ 2024-07-25  4:01 ` KwonHyun Kim
  2024-07-26  5:26   ` Jeff King
  0 siblings, 1 reply; 2+ messages in thread
From: KwonHyun Kim @ 2024-07-25  4:01 UTC (permalink / raw)
  To: git

Hello,

I am experimenting with git and I found there is something not working
as explained in the documentation.

When I place `text_[가나].txt` in `.gitignore` it does not ignore
text_가.txt nor text_나.txt

I experimented with `text_[ab].txt` and it works fine.

So I thought it might work bytewise so I put
`text_[\200-\352][\200-\352][\200-\352].txt` with no effect. (가 is
"\352\260\200" when core.quotepath is set to true)

So I think it must be a bug: a pattern such as [abc] or [a-z] does
not handle non-ASCII characters. But I am not sure.

Thank you for reading. I hope to hear from you soon.

KwH Kim.

# ====
Here is my spec

PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

git version
git version 2.43.0

LANG=ko_KR.UTF-8
LANGUAGE=ko:en
LC_CTYPE="ko_KR.UTF-8"
LC_NUMERIC=ko_KR.UTF-8
LC_TIME=ko_KR.UTF-8
LC_COLLATE="ko_KR.UTF-8"
LC_MONETARY=ko_KR.UTF-8
LC_MESSAGES="ko_KR.UTF-8"
LC_PAPER=ko_KR.UTF-8
LC_NAME=ko_KR.UTF-8
LC_ADDRESS=ko_KR.UTF-8
LC_TELEPHONE=ko_KR.UTF-8
LC_MEASUREMENT=ko_KR.UTF-8
LC_IDENTIFICATION=ko_KR.UTF-8
LC_ALL=


* Re: Possible bug in .gitignore
  2024-07-25  4:01 ` Possible bug in .gitignore KwonHyun Kim
@ 2024-07-26  5:26   ` Jeff King
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff King @ 2024-07-26  5:26 UTC (permalink / raw)
  To: KwonHyun Kim; +Cc: git

On Thu, Jul 25, 2024 at 01:01:45PM +0900, KwonHyun Kim wrote:

> I am experimenting with git and I found there is something not working
> as explained in the documentation.
> 
> When I place `text_[가나].txt` in `.gitignore` it does not ignore
> text_가.txt nor text_나.txt
> 
> I experimented with `text_[ab].txt` and it works fine.
> 
> So I thought it might work bytewise so I put
> `text_[\200-\352][\200-\352][\200-\352].txt` with no effect. (가 is
> "\352\260\200" when core.quotepath is set to true)
> 
> So I think it must be a bug: a pattern such as [abc] or [a-z] does
> not handle non-ASCII characters. But I am not sure.

The globbing in git is generally done by wildmatch.c, which was imported
from rsync. Looking in that file, it looks like it does not support
multi-byte characters at all inside brackets.
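
To illustrate why a byte-oriented bracket match fails here, the following is
a minimal Python sketch. It is not wildmatch.c and does not reproduce its
behavior in general; it is a simplified model that handles only literal bytes
and plain [...] sets (no ranges, negation, escapes, or wildcards), and the
function names are made up for the example. A bracket set in this model
consumes exactly one byte, so [가나] becomes a set of six bytes that can
never cover the three bytes of 가 in the filename.

```python
def parse(pattern: bytes) -> list[frozenset[int]]:
    """Split a pattern into per-byte tokens.

    Each token is the set of byte values allowed at one position.
    Assumes every '[' has a closing ']' (a sketch, not a full parser).
    """
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i:i + 1] == b"[":
            end = pattern.index(b"]", i + 1)
            tokens.append(frozenset(pattern[i + 1:end]))
            i = end + 1
        else:
            tokens.append(frozenset(pattern[i:i + 1]))
            i += 1
    return tokens


def match_bytewise(pattern: bytes, name: bytes) -> bool:
    """Match name against pattern, one byte per token."""
    tokens = parse(pattern)
    return len(tokens) == len(name) and all(
        byte in token for token, byte in zip(tokens, name)
    )


ga = "text_가.txt".encode("utf-8")   # 가 is the 3 bytes EA B0 80
na = "text_나.txt".encode("utf-8")   # 나 is the 3 bytes EB 82 98

# ASCIIbracket set: one byte in the set, one byte in the name.
print(match_bytewise(b"text_[ab].txt", b"text_a.txt"))

# Multi-byte bracket set: one token, but three bytes in the name.
print(match_bytewise("text_[가나].txt".encode("utf-8"), ga))

# One single-byte set per UTF-8 byte position.
per_byte = b"text_[\xea\xeb][\xb0\x82][\x80\x98].txt"
print(match_bytewise(per_byte, ga), match_bytewise(per_byte, na))
```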

So I don't see a way to make it work except to place the _literal_ bytes
making up the utf8 sequence, each inside its own single-byte match.
Like:

  printf 'text_[\352\353][\260\202][\200\230].txt\n' >.gitignore

But then your .gitignore file is itself invalid utf8 (not to mention
that this is obviously something a user shouldn't have to do).
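
As a sketch of how such a per-byte pattern can be derived, and why the
resulting file is not valid UTF-8, the Python below builds one bracket set
per byte position from a list of characters. The helper names are
hypothetical, and the sketch assumes all characters encode to the same number
of bytes. Note that a pattern built this way also accepts mixed combinations
(the first byte of 가 followed by the later bytes of 나, for example), so it
matches more than the two intended names.

```python
def per_byte_pattern(prefix: str, chars: str, suffix: str) -> bytes:
    """Build prefix + one [..] set per UTF-8 byte position + suffix."""
    encoded = [c.encode("utf-8") for c in chars]
    width = len(encoded[0])
    if any(len(e) != width for e in encoded):
        raise ValueError("all characters must encode to the same byte length")
    classes = b""
    for i in range(width):
        # Collect the i-th byte of every character, dropping duplicates
        # while keeping first-seen order.
        members = bytes(list(dict.fromkeys(e[i] for e in encoded)))
        classes += b"[" + members + b"]"
    return prefix.encode("utf-8") + classes + suffix.encode("utf-8")


def as_printf_arg(data: bytes) -> str:
    """Render non-ASCII bytes as octal escapes (enough for this example)."""
    return "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in data)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


pattern = per_byte_pattern("text_", "가나", ".txt")
print(as_printf_arg(pattern))
print(is_valid_utf8(pattern))
```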

So I guess the fix would be to teach wildmatch.c to recognize and match
multi-byte sequences inside []. That probably requires that we assume
the pattern and the path are utf8, which will usually be true, but not
always. So we might need some kind of config switch there.
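
What such a character-aware bracket match could look like is sketched below
in Python. This is only an illustration of the idea, not a proposed patch:
the function names and the assume_utf8 parameter (standing in for the config
switch mentioned above) are hypothetical, ranges and negation are left out,
and the code falls back to byte-wise matching whenever either side fails to
decode as UTF-8.

```python
def utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence from its lead byte; 0 if not a lead byte."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    return 0  # continuation byte or invalid lead


def decode_at(data: bytes, pos: int):
    """Return (char, byte_length) for the sequence at pos, or None."""
    n = utf8_len(data[pos])
    if n == 0:
        return None
    try:
        return data[pos:pos + n].decode("utf-8"), n
    except UnicodeDecodeError:
        return None  # truncated or otherwise malformed sequence


def class_match_len(members: bytes, name: bytes, pos: int,
                    assume_utf8: bool = True) -> int:
    """Bytes of name consumed if the set matches at pos, else 0.

    members is the raw content between '[' and ']'.
    """
    if assume_utf8:
        try:
            chars = set(members.decode("utf-8"))
        except UnicodeDecodeError:
            chars = None
        decoded = decode_at(name, pos)
        if chars is not None and decoded is not None:
            ch, n = decoded
            return n if ch in chars else 0
    # Byte-wise fallback: the set matches a single byte.
    return 1 if name[pos] in members else 0


members = "가나".encode("utf-8")
name = "text_나.txt".encode("utf-8")
print(class_match_len(members, name, 5))                     # whole character
print(class_match_len(members, name, 5, assume_utf8=False))  # one byte only
```

With assume_utf8 enabled, the set consumes the full three-byte character or
nothing; with it disabled, the set consumes a single byte, which also lets an
unrelated character such as 다 (lead byte EB, shared with 나) partially
match.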

There is also probably a deep rabbit hole of corner cases there (e.g.,
NFD vs NFC, matching é versus "e" + combining accent). But I suspect
that even recognizing multi-byte sequences as a single char to match
would be a big improvement.
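
The NFC/NFD corner case can be seen with Python's standard unicodedata
module. This short sketch only shows that the two forms of é are different
code point (and byte) sequences until they are normalized to a common form;
the helper name is made up, and nothing here says how git does or should
handle it. Precomposed Hangul such as 가 has the same property: its NFD form
is a sequence of two jamo.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "e\u0301")   # precomposed é, U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")    # "e" + combining acute, U+0301


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(nfc == nfd)                                 # raw comparison
print(nfc.encode("utf-8"), nfd.encode("utf-8"))   # different byte sequences
print(same_after_normalization(nfc, nfd))         # equal once normalized

# 가 (U+AC00) decomposes into two jamo under NFD.
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "가")])
```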

-Peff
