git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
Cc: Sven Strickroth <sven.strickroth@tu-clausthal.de>, git@vger.kernel.org
Subject: Re: git archive --format zip utf-8 issues
Date: Sat, 11 Aug 2012 21:27:42 -0700
Message-ID: <7vhas8hgzl.fsf@alter.siamese.dyndns.org>
In-Reply-To: <5026C649.2090700@lsrfire.ath.cx> ("René Scharfe"'s message of "Sat, 11 Aug 2012 22:53:29 +0200")

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> ... A more interesting question is: What's supported by
> which programs?

Yes, that is the most interesting question.

>> Of course, "git archive --format=zip --path-reencode=utf8-to-latin1"
>> would be the most generic way to do this.
>
> I really hope we can make do without additional options.

We need to at least know the path encoding used in the tree objects,
and I'd be OK with a solution that assumes a single encoding is used
for the entire tree.

We would eventually also need to know the encoding used in the local
working tree (i.e. the encoding of the paths returned from readdir()
and of the pathspec the user gives us on the command line), iconv
those paths to the tree objects' encoding for the project when
creating a cache_entry object to be fed to add_to_index(), and iconv
them back from the tree objects' encoding to the working tree
encoding in write_entry(), but that is a longer term direction.  For
now, in order to address the immediate issue, we only need the tree
object encoding, which should default to UTF-8 for interoperability.
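
To make the longer term direction concrete, here is a minimal sketch
of the two conversions, in Python rather than git's C. The function
names are hypothetical, and Python's codecs stand in for iconv. It
only illustrates the direction of each conversion, not how
add_to_index() or write_entry() would actually be changed.

```python
def worktree_to_tree_path(path: bytes, worktree_enc: str,
                          tree_enc: str = "utf-8") -> bytes:
    """Convert a path as returned by readdir() (or given as a pathspec)
    into the encoding used in tree objects.  Hypothetical helper."""
    return path.decode(worktree_enc).encode(tree_enc)


def tree_to_worktree_path(path: bytes, worktree_enc: str,
                          tree_enc: str = "utf-8") -> bytes:
    """Convert a tree-object path back into the working tree encoding,
    as would be needed when checking a file out.  Hypothetical helper."""
    return path.decode(tree_enc).encode(worktree_enc)


# A Latin-1 working tree talking to a project that records UTF-8 paths.
on_disk = "caf\u00e9.txt".encode("latin-1")
in_tree = worktree_to_tree_path(on_disk, "latin-1")
restored = tree_to_worktree_path(in_tree, "latin-1")
```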

So "git archive --format=zip --in-object-path-encoding=big5" for a
project whose tree object pathnames are in that encoding (and we
always record paths in UTF-8 when writing zipfiles) should be the
minimum we need for now.
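
As a rough illustration of that minimum, the following Python sketch
decodes raw tree-object paths from a given encoding and always stores
them in the zip file as UTF-8. It uses Python's zipfile module, not
git's archive-zip code. The function name and the parameter
in_object_path_encoding are assumptions made for the example. For
non-ASCII names, zipfile stores the name as UTF-8 and sets general
purpose bit 11 (the "name is UTF-8" flag) in the entry header.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def write_zip(entries, in_object_path_encoding="utf-8"):
    """Build a zip archive in memory.

    entries: iterable of (raw_path_bytes, content_bytes), where
    raw_path_bytes is the path exactly as recorded in the tree object.
    Hypothetical helper, for illustration only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for raw_path, data in entries:
            # Decode from the project's in-object encoding; zipfile then
            # writes non-ASCII names as UTF-8 and marks them as such.
            name = raw_path.decode(in_object_path_encoding)
            zf.writestr(name, data)
    return buf.getvalue()


raw = "\u4e2d\u6587.txt".encode("big5")  # path as stored in a big5 project
blob = write_zip([(raw, b"hello")], in_object_path_encoding="big5")
info = zipfile.ZipFile(io.BytesIO(blob)).infolist()[0]
```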

Optionally, with a configuration variable i18n.inObjectPathEncoding
(as opposed to the eventual i18n.worktreePathEncoding) set to big5,
users of such a project can say "git archive --format=zip" without
the "--in-object-path-encoding" option.
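
The intended precedence could look like the sketch below, again in
Python for brevity. The function name and the dict standing in for
the parsed configuration are hypothetical: the command line option
wins, then the configuration variable, then the UTF-8 default.

```python
def resolve_in_object_path_encoding(cli_value, config):
    """Pick the tree-object path encoding.

    cli_value: value of --in-object-path-encoding, or None if not given.
    config: dict of configuration values keyed by lowercased name
            (a stand-in for git's real config machinery).
    """
    if cli_value:
        return cli_value
    return config.get("i18n.inobjectpathencoding", "utf-8")


big5_project = {"i18n.inobjectpathencoding": "big5"}
from_config = resolve_in_object_path_encoding(None, big5_project)
from_option = resolve_in_object_path_encoding("euc-jp", big5_project)
from_default = resolve_in_object_path_encoding(None, {})
```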

Considering that zip is a format meant for exchange, I'd think we
would be fine always writing in UTF-8 and leaving the readers
responsible for converting the pathname while extracting.  If a
major zip extractor is incapable of handling UTF-8 (or is capable
but cumbersome to use that way, for that matter), we may end up
having to add an "--in-archive-path-encoding=UTF-8" option to
"git archive", with an associated "zip.archivePathEncoding"
variable, though.

