git.vger.kernel.org archive mirror
* [PATCH] [RFC] Design for pathname encoding gitattribute [RESEND]
@ 2008-01-22  4:41 Sam Vilain
From: Sam Vilain @ 2008-01-22  4:41 UTC
  To: git
  Cc: Peter Karlsson, Mark Junker, Pedro Melo, Martin Langhoff,
	Johannes Schindelin, Dmitry Potapov, Kevin Ballard

Some projects may want to enforce that a particular encoding is used
for all filenames in the repository.  Within Unicode (and hence
UTF-8), there are four normalization forms (see
http://unicode.org/reports/tr15/), any of which may be a reasonable
choice of repository format.  Additionally, some filesystems support
only a single encoding when writing local filenames.  To support
this, iconv and a normalization library must be given the information
they need to perform the correct conversion.

This is a configuration design proposal; it changes only the
documentation and does not implement the described behaviour.
---
   Hi all, I think that restating the problem in these terms might be
   more productive than the previous discussion.  Design critiques?

   The intent is that this has no impact at all on users with "C"
   filesystems unless they configure something explicitly, while
   allowing projects to specify a Unicode normalisation (so that,
   e.g., "Märchen" spelled with a precomposed "ä" ends up the same as
   "Märchen" spelled with "a" plus a combining diaeresis).

   [apologies if this hits the list twice; I sent the first with a bad
    content encoding header and assume it got dropped]

 Documentation/config.txt        |   16 ++++++++++++++++
 Documentation/gitattributes.txt |   19 +++++++++++++++++++
 Documentation/i18n.txt          |    9 ++++++---
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ee08845..9d2567d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -146,6 +146,22 @@ core.symlinks::
 	file. Useful on filesystems like FAT that do not support
 	symbolic links. True by default.
 
+core.repositoryPathEncoding::
+	Specify the default encoding assumed for repository paths
+	whose encoding is not set by the `encoding` attribute in
+	gitlink:gitattributes[5].  The default value of this is "C".
+
+core.checkoutPathEncoding::
+	Specify the encoding of local filenames.  The default value of
+	this depends on the platform and filesystem, but for most users
+	will be "C", meaning that no pathname conversion is required.
+
+core.checkoutPathEncodingFromLocale::
+	Specify whether the checkout path encoding should be taken
+	from the locale environment variables.  This can have surprising
+	side effects if you switch locales while working in a checkout,
+	because existing filenames would be reinterpreted.  False by default.
+
 core.gitProxy::
 	A "proxy command" to execute (as 'command host port') instead
 	of establishing direct connection to the remote server when
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index cc9c7c5..4136528 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -170,6 +170,25 @@ intent is that if someone unsets the filter driver definition,
 or does not have the appropriate filter program, the project
 should still be usable.
 
+`encoding`
+^^^^^^^^^^
+Specifies the encoding that file names (not file contents) must use
+for the matching paths.  Git enforces that all such filenames are
+valid in this encoding and, where applicable and possible, translates
+them from the configured (or, on some platform and filesystem
+combinations, detected) checkout encoding to this encoding.
+
+The default value of this is "C", which leaves behaviour undefined on
+filesystems that do not support "C" semantics until it is set.  For
+instance, if your filesystem supports only UTF-8 and you are trying to
+check out a repository whose filenames are in Latin-1, then you will
+need to set the `encoding` attribute in `.git/info/attributes`
+before you can check files out on that system.
+
+Apart from the default "C", the valid encodings are currently
+'ISO-8859-1' and 'UTF-8'.  'UTF-8' may be followed by '+NFC', '+NFD',
+'+NFKC' or '+NFKD' to enforce a particular normalization of filenames.
+
 
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
index b95f99b..fba0407 100644
--- a/Documentation/i18n.txt
+++ b/Documentation/i18n.txt
@@ -1,11 +1,14 @@
 At the core level, git is character encoding agnostic.
 
  - The pathnames recorded in the index and in the tree objects
-   are treated as uninterpreted sequences of non-NUL bytes.
+   are normally treated as uninterpreted sequences of non-NUL bytes.
    What readdir(2) returns are what are recorded and compared
    with the data git keeps track of, which in turn are expected
-   to be what lstat(2) and creat(2) accepts.  There is no such
-   thing as pathname encoding translation.
+   to be what lstat(2) and creat(2) accept.  However, if path
+   encodings are configured for the checkout and/or the repository,
+   the corresponding conversions are applied between what readdir(2)
+   returns and the index, in both directions (when adding paths to
+   the index and when checking them out).
 
  - The contents of the blob objects are uninterpreted sequence
    of bytes.  There is no encoding translation at the core
-- 
1.5.3.5

Thread overview: 11+ messages
2008-01-22  4:41 [PATCH] [RFC] Design for pathname encoding gitattribute [RESEND] Sam Vilain
2008-01-22  5:35 ` Johannes Schindelin
2008-01-22  6:37   ` Junio C Hamano
2008-01-22  6:26 ` Junio C Hamano
2008-01-22  7:43   ` Junio C Hamano
2008-01-22  8:09     ` Mark Junker
2008-01-22  9:16       ` Junio C Hamano
2008-01-22  9:13     ` Rafael Garcia-Suarez
2008-01-22  9:57     ` Sam Vilain
2008-01-22 10:36       ` Junio C Hamano
2008-01-22 10:44         ` Sam Vilain
