git.vger.kernel.org archive mirror
* i18n, alternative solution
@ 2007-01-01 14:47 Robin Rosenberg
  2007-01-01 19:44 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Robin Rosenberg @ 2007-01-01 14:47 UTC (permalink / raw)
  To: git

Hi and Happy New Year

Things are happening with respect to support for working with multiple
locales, which is nice to see. I've not had time to follow the development,
so these comments (and code) may be considered somewhat late in the process.

What disturbs me is the complexity that results from storing multiple
encodings in the same repository. I was hacking on another solution
earlier this autumn that takes another approach, i.e. UTF-8 is the internal
encoding, period. If you want to work with anything else, you convert on
input and output. The code is very simple since it has few cases to consider
and I don't need to store the encoding anywhere. What is missing is a flag to
disable (or enable) i18n.
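
A minimal sketch of the "UTF-8 internally, convert on input and output"
idea described above, written in Python for illustration and not taken from
the i18n branch. The function names to_internal and to_external and the
ISO-8859-1 example are assumptions.

```python
def to_internal(raw: bytes, external_encoding: str) -> bytes:
    """Convert bytes from the user's encoding to UTF-8 for storage."""
    return raw.decode(external_encoding).encode("utf-8")


def to_external(stored: bytes, external_encoding: str) -> bytes:
    """Convert stored UTF-8 bytes back to the user's encoding for output."""
    return stored.decode("utf-8").encode(external_encoding)


# A file name as typed in an ISO-8859-1 locale (assumed example).
name_latin1 = "måndag.txt".encode("iso-8859-1")
stored = to_internal(name_latin1, "iso-8859-1")

# Only UTF-8 is ever stored; the original bytes come back on output.
assert stored == "måndag.txt".encode("utf-8")
assert to_external(stored, "iso-8859-1") == name_latin1
```

With this shape there is only one conversion point in each direction, which
is why no per-object encoding needs to be recorded.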

The ugliness is in the filename parts, since I catch all filename operations
and not only the ones I need. The goal wasn't final code, but to prove that
it could be done.

The focus is on file names, but commit messages were included since it was so 
simple. You can get the code at http://rosenberg.homelinux.net/repos/GIT.git 
on the branch i18n. I modded the testcases to verify that GIT works after the 
patches, but one would probably prefer to write new test cases for UTF-8. 

5d73e28397f7ec0f85fcb8e31e91326afbcfea19 is the last commit that executes all 
test cases successfully. 

I could send the code as patches if you like, though not on top of the latest
version, but on 1.4.2-something.

-- robin


* Re: i18n, alternative solution
  2007-01-01 14:47 i18n, alternative solution Robin Rosenberg
@ 2007-01-01 19:44 ` Junio C Hamano
  2007-01-07 16:55   ` Robin Rosenberg
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2007-01-01 19:44 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> What disturbs me is the complexity that the approach of storing multiple 
> encodings in the same repository results in.

We are not encouraging mixed encodings, mind you.

Even though we check and warn about commits that do not have a valid
UTF-8 string, users can make mistakes and people need to be
able to look at them later.  That is what we are solving.
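
A minimal sketch of a "check and warn" step of the kind mentioned above,
in Python for illustration only. It is not git's implementation; the
function names and the warning text are assumptions.

```python
from typing import Optional


def is_valid_utf8(message: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def utf8_warning(message: bytes) -> Optional[str]:
    """Return a warning string for a non-UTF-8 message, else None.

    The message is only warned about, never rejected or rewritten,
    so it can still be looked at later.
    """
    if is_valid_utf8(message):
        return None
    return "warning: commit message is not valid UTF-8"


assert utf8_warning("måndag".encode("utf-8")) is None
assert utf8_warning("måndag".encode("iso-8859-1")) is not None
```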

At the same time we do NOT force inconvenience on projects that
want to use legacy encoding for whatever reason.  The world is
not UTF-8 only, and encoding to UTF-8 is non-reversible at times
(positive return value from iconv(3)).  Always re-coding to
UTF-8 will NOT be accepted into git for now.  We can revisit this
perhaps in 5 years.
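
One way to probe the reversibility concern above is a round-trip test:
convert legacy bytes to UTF-8 and back, then compare with the input. The
sketch below uses Python's codecs rather than iconv(3), so it only
approximates what a positive iconv return value reports; the function name
roundtrips_via_utf8 is an assumption.

```python
def roundtrips_via_utf8(raw: bytes, legacy_encoding: str) -> bool:
    """True if legacy -> UTF-8 -> legacy gives back the original bytes."""
    try:
        text = raw.decode(legacy_encoding)
        as_utf8 = text.encode("utf-8")
        back = as_utf8.decode("utf-8").encode(legacy_encoding)
    except UnicodeError:
        # Input that cannot be decoded or re-encoded cannot round-trip.
        return False
    return back == raw


# ISO-8859-1 text survives the trip unchanged.
assert roundtrips_via_utf8("måndag".encode("iso-8859-1"), "iso-8859-1")
```

A legacy encoding that assigns two different byte sequences to the same
character would make this function return False for one of them, because
the way back can only pick one sequence.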


* Re: i18n, alternative solution
  2007-01-01 19:44 ` Junio C Hamano
@ 2007-01-07 16:55   ` Robin Rosenberg
  0 siblings, 0 replies; 3+ messages in thread
From: Robin Rosenberg @ 2007-01-07 16:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Monday 01 January 2007 20:44, Junio C Hamano wrote:
> Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:
> > What disturbs me is the complexity that the approach of storing multiple
> > encodings in the same repository results in.
>
> We are not encouraging mixed encodings, mind you.
>
> Even though we check and warn about commits that do not have a valid
> UTF-8 string, users can make mistakes and people need to be
> able to look at them later.  That is what we are solving.

My approach is to not convert at all if the user has locale=UTF-8. Mostly it's an
optimization, but it also has the effect of getting the raw message.
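
A minimal sketch of that shortcut, in Python for illustration and not taken
from the i18n branch. The names is_utf8 and message_for_output are
assumptions.

```python
import codecs


def is_utf8(encoding_name: str) -> bool:
    """True if the name is any alias of UTF-8 ("UTF-8", "utf8", ...)."""
    return codecs.lookup(encoding_name).name == "utf-8"


def message_for_output(stored: bytes, locale_encoding: str) -> bytes:
    """Return a stored message in the user's locale encoding."""
    if is_utf8(locale_encoding):
        # No conversion at all: the raw bytes are passed through,
        # even if they happen not to be valid UTF-8.
        return stored
    return stored.decode("utf-8").encode(locale_encoding)


# In a UTF-8 locale even a malformed message comes back untouched.
assert message_for_output(b"\xff\xfe", "UTF-8") == b"\xff\xfe"
# In another locale the stored UTF-8 is converted on output.
assert message_for_output("å".encode("utf-8"), "iso-8859-1") == b"\xe5"
```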

> At the same time we do NOT force inconvenience on projects that
> want to use legacy encoding for whatever reason.  The world is
> not UTF-8 only, and encoding to UTF-8 is non-reversible at times
> (positive return value from iconv(3)).  Always re-coding to
> UTF-8 will NOT be accepted to git for now.  We can revisit this
> perhaps in 5 years.

According to the Unicode FAQ, Unicode is a superset of all local
encodings, so why would the conversion between a local encoding
and UTF-8 be non-reversible?

In five years there will be so much legacy that fixing it in a simple way
will be infeasible (just like CVS, FTP, etc).

-- robin
