git.vger.kernel.org archive mirror
* Git *accepts* a branch name, it can't identity in the future?
@ 2017-08-20  7:51 Kaartic Sivaraam
  2017-08-20  8:20 ` Johannes Sixt
  2017-08-20  8:33 ` Jeff King
  0 siblings, 2 replies; 5+ messages in thread
From: Kaartic Sivaraam @ 2017-08-20  7:51 UTC (permalink / raw)
  To: git

Hello all,

First of all, I would like to say that this happened completely by
accident and it's partly my mistake. Here's what happened.

I recently started creating 'feature branches' a lot for the few
patches that I sent to this mailing list. To identify the status of the
patch corresponding to that branch I prefixed them with special unicode
characters like ✓, ˅ etc. instead of using conventional hierarchical
names like, 'done/', 'archived/'.

Then I started finding it difficult to distinguish these unicode-
prefixed names probably because they had only one unicode character in
common. So, I thought of switching to the conventional way of using
scoped branch names (old is gold, you see). I wrote a tiny script to
rename the branches by replacing a specific unicode prefix with a
corresponding hierarchy. For example, the script would convert a branch
named '✓doc-fix' to 'done/doc-fix'.

I made a small assumption in the script which turned out to be false. I
thought the unicode prefixes I used corresponded to only two bytes.
This led to the issue. The unicode character '✓' corresponds to three
bytes and as a result instead of removing it, my script replaced
it with the unknown character '�'. So, the branch named '✓doc-fix'
became 'done/�doc-fix'. Here's the issue. I couldn't use 

    $ git branch -m done/�doc-fix done/dic-fix 

to rename the branch. Nor could I refer to it in any way. Git simply
says,

    error: pathspec 'done/�doc-fix' did not match any file(s) known to git.

It's not a big issue as I haven't lost anything out of it. The branches
have been merged into 'master'.

I just wanted to know why git accepted a branch name which it can't
identify later?

If it had rejected that name in the first place it would have been
better. In case you would like to know how I got that weird name,
here's a way to get that

    $ echo '✓doc-fix' | cut -c3-100
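
A minimal sketch of a rename that does not depend on how many bytes the prefix occupies, assuming a POSIX shell. The variable names (old_prefix, new_prefix, old_name) are illustrative and are not taken from the original script, which was not posted. The prefix is removed as a string rather than by cutting a fixed offset:

```shell
#!/bin/sh
# Hypothetical names for illustration only; the original script is unknown.
old_prefix='✓'
new_prefix='done/'
old_name='✓doc-fix'

# The check mark is a single character but three bytes in UTF-8 (e2 9c 93).
prefix_bytes=$(printf '%s' "$old_prefix" | wc -c | tr -d ' ')

# Strip the prefix as a string instead of cutting a fixed number of bytes.
new_name="${new_prefix}${old_name#"$old_prefix"}"

printf 'prefix length in bytes: %s\n' "$prefix_bytes"
printf '%s -> %s\n' "$old_name" "$new_name"
```

The resulting pair could then be passed to `git branch -m "$old_name" "$new_name"`.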

-- 
Kaartic


* Re: Git *accepts* a branch name, it can't identity in the future?
  2017-08-20  7:51 Git *accepts* a branch name, it can't identity in the future? Kaartic Sivaraam
@ 2017-08-20  8:20 ` Johannes Sixt
  2017-08-20  9:11   ` Kaartic Sivaraam
  2017-08-20  8:33 ` Jeff King
  1 sibling, 1 reply; 5+ messages in thread
From: Johannes Sixt @ 2017-08-20  8:20 UTC (permalink / raw)
  To: Kaartic Sivaraam, git

Am 20.08.2017 um 09:51 schrieb Kaartic Sivaraam:
> I made a small assumption in the script which turned out to be false. I
> thought the unicode prefixes I used corresponded to only two bytes.
> This led to the issue. The unicode character '✓' corresponds to three
> bytes and as a result instead of removing it, my script replaced
> it with the unknown character '�'. So, the branch named '✓doc-fix'
> became 'done/�doc-fix'. Here's the issue. I couldn't use
> 
>      $ git branch -m done/�doc-fix done/dic-fix
> 
> to rename the branch. Nor could I refer to it in any way. Git simply
> says,
> 
>      error: pathspec 'done/�doc-fix' did not match any file(s) known to git.
> 
> It's not a big issue as I haven't lost anything out of it. The branches
> have been merged into 'master'.
> 
> I just wanted to know why git accepted a branch name which it can't
> identify later?
> 
> If it had rejected that name in the first place it would have been
> better. In case you would like to know how I got that weird name,
> here's a way to get that
> 
>      $ echo '✓doc-fix' | cut -c3-100
> 

See, these two are different:

$ echo '✓doc-fix' | cut -c3-100 | od -t x1
0000000 93 64 6f 63 2d 66 69 78 0a
0000011
$ echo '�doc-fix' | od -t x1
0000000 ef bf bd 64 6f 63 2d 66 69 78 0a
0000013

It is not Git's fault that your terminal displays the invalid UTF-8
sequence (that your script produces) as �. Nor is it Git's fault that,
when you paste that character onto the command line, it is passed as a
(correct) UTF-8 character.

Perhaps this helps (untested):

$ git branch -m done/$(printf '\x93')doc-fix done/dic-fix

In Git's database, branch names are just sequences of bytes. It is
outside Git's scope to verify that all input is encoded correctly.
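
Since the encoding check is left to the caller, a script could validate a name before handing it to Git. The following is only a sketch under assumptions: is_valid_utf8 is a hypothetical helper (not a Git command), and it relies on an iconv that exits non-zero on undecodable input, as GNU iconv does:

```shell
#!/bin/sh
# is_valid_utf8 is a hypothetical helper, not part of Git.
# It succeeds only if iconv can decode the argument as UTF-8
# (assumes iconv fails on invalid input, as GNU iconv does).
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

good='✓doc-fix'
bad=$(printf '\223doc-fix')    # stray continuation byte 0x93 (octal 223)

for name in "$good" "$bad"; do
    if is_valid_utf8 "$name"; then
        printf 'valid\n'
    else
        printf 'invalid\n'
    fi
done
```

A rename script could refuse to call `git branch -m` whenever the check fails for the new name.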

-- Hannes


* Re: Git *accepts* a branch name, it can't identity in the future?
  2017-08-20  7:51 Git *accepts* a branch name, it can't identity in the future? Kaartic Sivaraam
  2017-08-20  8:20 ` Johannes Sixt
@ 2017-08-20  8:33 ` Jeff King
  2017-08-20 10:00   ` Kaartic Sivaraam
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff King @ 2017-08-20  8:33 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: git

On Sun, Aug 20, 2017 at 01:21:29PM +0530, Kaartic Sivaraam wrote:

> I made a small assumption in the script which turned out to be false. I
> thought the unicode prefixes I used corresponded to only two bytes.
> This led to the issue. The unicode character '✓' corresponds to three
> bytes and as a result instead of removing it, my script replaced
> it with the unknown character '�'. So, the branch named '✓doc-fix'
> became 'done/�doc-fix'. Here's the issue. I couldn't use 
> 
>     $ git branch -m done/�doc-fix done/dic-fix 
> 
> to rename the branch. Nor could I refer to it in any way. Git simply
> says,
> 
>     error: pathspec 'done/�doc-fix' did not match any file(s) known to git.

What does "git for-each-ref" say about which branches you _do_ have?

Also, what platform are you on?

I'm wondering specifically if you have a filesystem (like HFS+ on MacOS)
that silently rewrites invalid unicode in filenames we create. That
would mean your branches are still there, but probably with some funny
filename like "done/%xxdoc-fix". Git wouldn't know that name because the
filesystem rewriting happened behind its back (though I'd think that a
further open() call would find the same file, so maybe this is barking
up the wrong tree).

Another line of thinking: are you sure the � you are writing on the
command line is identical to the one generated by the corruption (and if
you cut and paste, is perhaps a generic glyph placed in the buffer by
your terminal to replace an invalid codepoint, rather than the actual
bytes)?

> I just wanted to know why git accepted a branch name which it can't
> identify later?
> 
> If it had rejected that name in the first place it would have been
> better. In case you would like to know how I got that weird name,
> here's a way to get that
> 
>     $ echo '✓doc-fix' | cut -c3-100

  [a few defines to make it easy to prod git]
  $ check=$(printf '\342\234\223')
  $ broken=$(printf '\223')

  [this is your starting state, a branch with the unicode name]
  $ git branch ${check}doc-fix

  [you didn't say how your script works, so let's use git to rename]
  $ git branch -m ${check}doc-fix ${broken}doc-fix

  [my terminal doesn't show the unknown-character glyph, but we
   can see the funny character with "cat -A"]:
  $ git for-each-ref --format='%(refname)' | cat -A
  refs/heads/master$
  refs/heads/M-^Sdoc-fix$

  [and we can rename it using that knowledge]
  $ git branch -m ${broken}doc-fix doc-fix
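
Where "cat -A" is not available (it is a GNU extension), the "l" command of sed gives a similar view and prints non-printable bytes as octal escapes that can be fed straight back to printf. This is a sketch, not part of the recipe above; show_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# show_bytes is a hypothetical helper: it prints each input line with
# non-printable bytes as octal escapes and "$" marking the end of line.
show_bytes() {
    LC_ALL=C sed -n l
}

# Example on a literal name containing the stray byte 0x93 (octal 223).
escaped=$(printf '\223doc-fix\n' | show_bytes)
printf '%s\n' "$escaped"

# Inside a repository, the same helper can be applied to the ref list.
if git rev-parse --git-dir >/dev/null 2>&1; then
    git for-each-ref --format='%(refname)' refs/heads | show_bytes
fi
```

Note that sed folds long output lines, so very long ref names may be split across lines.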

-Peff


* Re: Git *accepts* a branch name, it can't identity in the future?
  2017-08-20  8:20 ` Johannes Sixt
@ 2017-08-20  9:11   ` Kaartic Sivaraam
  0 siblings, 0 replies; 5+ messages in thread
From: Kaartic Sivaraam @ 2017-08-20  9:11 UTC (permalink / raw)
  To: Johannes Sixt, git

On Sun, 2017-08-20 at 10:20 +0200, Johannes Sixt wrote:
> It is not Git's fault that your terminal displays the invalid UTF-8
> sequence (that your script produces) as �. Nor is it Git's fault that,
> when you paste that character onto the command line, it is passed as a
> (correct) UTF-8 character.
> 

You're right. I just now realise how I missed the line between "what's
seen by us" and "what's seen by the program".


> Perhaps this helps (untested):
> 
> $ git branch -m done/$(printf '\x93')doc-fix done/dic-fix
> 

This one helped, thanks.

-- 
Kaartic


* Re: Git *accepts* a branch name, it can't identity in the future?
  2017-08-20  8:33 ` Jeff King
@ 2017-08-20 10:00   ` Kaartic Sivaraam
  0 siblings, 0 replies; 5+ messages in thread
From: Kaartic Sivaraam @ 2017-08-20 10:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Thanks, but Johannes has already found the issue and given a solution.
Regardless, replying to the questions just for the record.

On Sun, 2017-08-20 at 04:33 -0400, Jeff King wrote:
> What does "git for-each-ref" say about which branches you _do_ have?
> 
> Also, what platform are you on?
> 

I use "Debian GNU/Linux buster/sid 64-bit".

> I'm wondering specifically if you have a filesystem (like HFS+ on MacOS)
> that silently rewrites invalid unicode in filenames we create. That
> would mean your branches are still there, but probably with some funny
> filename like "done/%xxdoc-fix". Git wouldn't know that name because the
> filesystem rewriting happened behind its back (though I'd think that a
> further open() call would find the same file, so maybe this is barking
> up the wrong tree).
> 

That sounds dangerous!


> Another line of thinking: are you sure the � you are writing on the
> command line is identical to the one generated by the corruption (and if
> you cut and paste, is perhaps a generic glyph placed in the buffer by
> your terminal to replace an invalid codepoint, rather than the actual
> bytes)?
> 

This was the issue. I wasn't providing git with the actual bytes that
resulted as a consequence of the sloppy script.


>   [you didn't say how your script works, so let's use git to rename]

I know of no other way to rename a branch, so I didn't mention it :)


>   $ broken=$(printf '\223')
> 
>   [and we can rename it using that knowledge]
>   $ git branch -m ${broken}doc-fix doc-fix
> 

Johannes has already given a solution, this one works too.


-- 
Kaartic
