* Git *accepts* a branch name, it can't identify in the future?
@ 2017-08-20 7:51 Kaartic Sivaraam
2017-08-20 8:20 ` Johannes Sixt
2017-08-20 8:33 ` Jeff King
0 siblings, 2 replies; 5+ messages in thread
From: Kaartic Sivaraam @ 2017-08-20 7:51 UTC
To: git
Hello all,
First of all, I would like to say that this happened completely by
accident and it's partly my mistake. Here's what happened.
I recently started creating 'feature branches' a lot for the few
patches that I sent to this mailing list. To identify the status of the
patch corresponding to that branch I prefixed them with special unicode
characters like ✓, ˅ etc. instead of using conventional hierarchical
names like, 'done/', 'archived/'.
Then I started finding it difficult to distinguish these unicode-
prefixed names probably because they had only one unicode character in
common. So, I thought of switching to the conventional way of using
scoped branch names (old is gold, you see). I wrote a tiny script to
rename the branches by replacing a specific unicode prefix with a
corresponding hierarchy. For example, the script would convert a branch
named '✓doc-fix' to 'done/doc-fix'.
I made a small assumption in the script which turned out to be false. I
thought the unicode prefixes I used corresponded to only two bytes.
This led to the issue. The unicode character '✓' corresponds to three
bytes and as a result instead of removing it, my script replaced
it with the unknown character '�'. So, the branch named '✓doc-fix'
became 'done/�doc-fix'. Here's the issue. I couldn't use
$ git branch -m done/�doc-fix done/dic-fix
to rename the branch. Nor could I refer to it in any way. Git simply
says,
error: pathspec 'done/�doc-fix' did not match any file(s) known to git.
It's not a big issue as I haven't lost anything out of it. The branches
have been merged into 'master'.
I just wanted to know why git accepted a branch name which it can't
identify later?
If it had rejected that name in the first place it would have been
better. In case you would like to know how I got that weird name,
here's a way to get that
$ echo '✓doc-fix' | cut -c3-100
--
Kaartic
* Re: Git *accepts* a branch name, it can't identify in the future?
From: Johannes Sixt @ 2017-08-20 8:20 UTC
To: Kaartic Sivaraam, git
On 20.08.2017 at 09:51, Kaartic Sivaraam wrote:
> I made a small assumption in the script which turned out to be false. I
> thought the unicode prefixes I used corresponded to only two bytes.
> This led to the issue. The unicode character '✓' corresponds to three
> bytes and as a result instead of removing it, my script replaced
> it with the unknown character '�'. So, the branch named '✓doc-fix'
> became 'done/�doc-fix'. Here's the issue. I couldn't use
>
> $ git branch -m done/�doc-fix done/dic-fix
>
> to rename the branch. Nor could I refer to it in any way. Git simply
> says,
>
> error: pathspec 'done/�doc-fix' did not match any file(s) known to git.
>
> It's not a big issue as I haven't lost anything out of it. The branches
> have been merged into 'master'.
>
> I just wanted to know why git accepted a branch name which it can't
> identify later?
>
> If it had rejected that name in the first place it would have been
> better. In case you would like to know how I got that weird name,
> here's a way to get that
>
> $ echo '✓doc-fix' | cut -c3-100
>
See, these two are different:
$ echo '✓doc-fix' | cut -c3-100 | od -t x1
0000000 93 64 6f 63 2d 66 69 78 0a
0000011
$ echo '�doc-fix' | od -t x1
0000000 ef bf bd 64 6f 63 2d 66 69 78 0a
0000013
It is not Git's fault that your terminal converts an invalid UTF-8
sequence (that your script produces) to �. Nor is it Git's fault that,
when you paste that character onto the command line, it is passed as a
(correct) UTF-8 character.
Perhaps this helps (untested):
$ git branch -m done/$(printf '\x93')doc-fix done/dic-fix
In Git's database, branch names are just sequences of bytes. It is
outside the scope to verify that all input is encoded correctly.
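A minimal sketch of a byte-safe way to strip such a prefix, assuming a
POSIX shell. The helper name strip_prefix is hypothetical (it is not
from the script discussed in this thread); the point is that matching
the whole prefix string avoids having to count its bytes:

```shell
# The three UTF-8 bytes of the check mark (U+2713), written in octal.
check=$(printf '\342\234\223')
old_name="${check}doc-fix"

# Hypothetical helper: remove prefix $2 from name $1 if it is present,
# otherwise print the name unchanged. The inner quotes make the prefix
# a literal string rather than a glob pattern.
strip_prefix() {
    printf '%s\n' "${1#"$2"}"
}

new_name="done/$(strip_prefix "$old_name" "$check")"
printf '%s\n' "$new_name"    # prints: done/doc-fix
```

Because the prefix is removed as a whole string, no partial multi-byte
sequence can be left behind, whatever the length of the prefix.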
-- Hannes
* Re: Git *accepts* a branch name, it can't identify in the future?
From: Kaartic Sivaraam @ 2017-08-20 9:11 UTC
To: Johannes Sixt, git
On Sun, 2017-08-20 at 10:20 +0200, Johannes Sixt wrote:
> It is not Git's fault that your terminal converts an invalid UTF-8
> sequence (that your script produces) to �. Nor is it Git's fault that,
> when you paste that character onto the command line, it is passed as a
> (correct) UTF-8 character.
>
You're right. I just now realise how I missed the line between "what's
seen by us" and "what's seen by the program".
> Perhaps this helps (untested):
>
> $ git branch -m done/$(printf '\x93')doc-fix done/dic-fix
>
This one helped, thanks.
--
Kaartic
* Re: Git *accepts* a branch name, it can't identify in the future?
From: Jeff King @ 2017-08-20 8:33 UTC
To: Kaartic Sivaraam; +Cc: git
On Sun, Aug 20, 2017 at 01:21:29PM +0530, Kaartic Sivaraam wrote:
> I made a small assumption in the script which turned out to be false. I
> thought the unicode prefixes I used corresponded to only two bytes.
> This led to the issue. The unicode character '✓' corresponds to three
> bytes and as a result instead of removing it, my script replaced
> it with the unknown character '�'. So, the branch named '✓doc-fix'
> became 'done/�doc-fix'. Here's the issue. I couldn't use
>
> $ git branch -m done/�doc-fix done/dic-fix
>
> to rename the branch. Nor could I refer to it in any way. Git simply
> says,
>
> error: pathspec 'done/�doc-fix' did not match any file(s) known to git.
What does "git for-each-ref" say about which branches you _do_ have?
Also, what platform are you on?
I'm wondering specifically if you have a filesystem (like HFS+ on MacOS)
that silently rewrites invalid unicode in filenames we create. That
would mean your branches are still there, but probably with some funny
filename like "done/%xxdoc-fix". Git wouldn't know that name because the
filesystem rewriting happened behind its back (though I'd think that a
further open() call would find the same file, so maybe this is barking
up the wrong tree).
Another line of thinking: are you sure the � you are writing on the
command line is identical to the one generated by the corruption (and if
you cut and paste, is perhaps a generic glyph placed in the buffer by
your terminal to replace an invalid codepoint, rather than the actual
bytes)?
> I just wanted to know why git accepted a branch name which it can't
> identify later?
>
> If it had rejected that name in the first place it would have been
> better. In case you would like to know how I got that weird name,
> here's a way to get that
>
> $ echo '✓doc-fix' | cut -c3-100
[a few defines to make it easy to prod git]
$ check=$(printf '\342\234\223')
$ broken=$(printf '\223')
[this is your starting state, a branch with the unicode name]
$ git branch ${check}doc-fix
[you didn't say how your script works, so let's use git to rename]
$ git branch -m ${check}doc-fix ${broken}doc-fix
[my terminal doesn't show the unknown-character glyph, but we
can see the funny character with "cat -A"]:
$ git for-each-ref --format='%(refname)' | cat -A
refs/heads/master$
refs/heads/M-^Sdoc-fix$
[and we can rename it using that knowledge]
$ git branch -m ${broken}doc-fix doc-fix
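
A rough sketch of how one might spot such names in general, assuming a
POSIX shell and an iconv that exits non-zero on malformed input (an
assumption about the local iconv, not something checked in this
thread). The helper names is_valid_utf8 and list_suspect_refs are made
up for illustration:

```shell
# Succeeds only if the argument is a well-formed UTF-8 byte sequence.
# Assumption: iconv returns a non-zero status on invalid input.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Reads one ref name per line on stdin and prints, unchanged, every
# name that is not valid UTF-8.
list_suspect_refs() {
    while IFS= read -r ref; do
        is_valid_utf8 "$ref" || printf '%s\n' "$ref"
    done
}
```

It could be fed with "git for-each-ref --format='%(refname)' |
list_suspect_refs | cat -A", so that the offending bytes show up in a
readable form such as M-^S.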
-Peff
* Re: Git *accepts* a branch name, it can't identify in the future?
From: Kaartic Sivaraam @ 2017-08-20 10:00 UTC
To: Jeff King; +Cc: git
Thanks, but Johannes has already found the issue and given a solution.
Regardless, replying to the questions just for the record.
On Sun, 2017-08-20 at 04:33 -0400, Jeff King wrote:
> What does "git for-each-ref" say about which branches you _do_ have?
>
> Also, what platform are you on?
>
I use "Debian GNU/Linux buster/sid 64-bit".
> I'm wondering specifically if you have a filesystem (like HFS+ on MacOS)
> that silently rewrites invalid unicode in filenames we create. That
> would mean your branches are still there, but probably with some funny
> filename like "done/%xxdoc-fix". Git wouldn't know that name because the
> filesystem rewriting happened behind its back (though I'd think that a
> further open() call would find the same file, so maybe this is barking
> up the wrong tree).
>
That sounds dangerous!
> Another line of thinking: are you sure the � you are writing on the
> command line is identical to the one generated by the corruption (and if
> you cut and paste, is perhaps a generic glyph placed in the buffer by
> your terminal to replace an invalid codepoint, rather than the actual
> bytes)?
>
This was the issue. I wasn't providing git with the actual bytes that
my sloppy script produced.
> [you didn't say how your script works, so let's use git to rename]
I know of no other way to rename a branch, so I didn't mention it :)
> $ broken=$(printf '\223')
>
> [and we can rename it using that knowledge]
> $ git branch -m ${broken}doc-fix doc-fix
>
Johannes has already given a solution, this one works too.
--
Kaartic