From: Jeff King <peff@peff.net>
To: Sven Strickroth <sven.strickroth@tu-clausthal.de>
Cc: git@vger.kernel.org, "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: git archive --format zip utf-8 issues
Date: Thu, 30 Aug 2012 18:26:03 -0400
Message-ID: <20120830222603.GA20289@sigill.intra.peff.net>
In-Reply-To: <5026D081.2040906@tu-clausthal.de>
On Sat, Aug 11, 2012 at 11:37:05PM +0200, Sven Strickroth wrote:
> On 11.08.2012 22:53, René Scharfe wrote:
> > The standard says we need to convert to CP437, or to UTF-8, or provide
> > both versions. A more interesting question is: What's supported by which
> > programs?
> >
> > The ZIP functionality built into Windows 7 doesn't seem to work with
> > UTF-8 encoded filenames (except for those that only use the ASCII
> > subset), and it seems to ignore the UTF-8 part if both are given.
>
> I played a bit with the git source code and found out that
>
> diff --git a/archive-zip.c b/archive-zip.c
> index f5af81f..e0ccb4f 100644
> --- a/archive-zip.c
> +++ b/archive-zip.c
> @@ -257,7 +257,7 @@ static int write_zip_entry(struct archiver_args *args,
> copy_le16(dirent.creator_version,
> S_ISLNK(mode) || (S_ISREG(mode) && (mode & 0111)) ? 0x0317 : 0);
> copy_le16(dirent.version, 10);
> - copy_le16(dirent.flags, flags);
> + copy_le16(dirent.flags, flags+2048);
> copy_le16(dirent.compression_method, method);
> copy_le16(dirent.mtime, zip_time);
> copy_le16(dirent.mdate, zip_date);
> --
> works with 7-Zip, but not with the zip support built into Windows 7.
>
> If I create a zip file with 7-Zip which contains umlauts and other
> Unicode characters (like 國立1-кккк.txt), the Windows 7 built-in zip
> displays them correctly, too.
Ping on this stalled discussion.
It seems like there are two separate issues here:
1. Knowing the encoding of pathnames in the repository.
2. Setting the right flags in zip output.
A full solution would handle both parts, but let's ignore (1) for a
moment and assume we have utf-8 (or can convert to utf-8 from an
encoding specified by the user).
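If the user names the repository's path encoding, that conversion could be done with POSIX iconv(3). A minimal sketch, assuming the encoding name comes from the user; reencode_path_to_utf8 is a hypothetical helper, not an existing git function:

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): convert a NUL-terminated path
 * from the encoding named by `from` (e.g. "ISO-8859-1") into a newly
 * allocated UTF-8 string. Returns NULL if iconv does not know the
 * encoding or the input is not valid in it; the caller frees the result.
 */
static char *reencode_path_to_utf8(const char *path, const char *from)
{
	iconv_t cd = iconv_open("UTF-8", from);
	if (cd == (iconv_t)-1)
		return NULL;

	size_t in_left = strlen(path);
	/* Assumption: 4 output bytes per input byte is enough headroom. */
	size_t out_size = in_left * 4 + 1;
	char *out = malloc(out_size);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}

	char *in_ptr = (char *)path;    /* iconv() takes char **, not const */
	char *out_ptr = out;
	size_t out_left = out_size - 1; /* reserve room for the final NUL */

	/* Convert the whole path, then flush any pending shift state. */
	if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1 ||
	    iconv(cd, NULL, NULL, &out_ptr, &out_left) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}

	*out_ptr = '\0';
	iconv_close(cd);
	return out;
}
```

For example, converting "caf\xe9.txt" from "ISO-8859-1" should give the UTF-8 bytes "caf\xc3\xa9.txt". When the conversion fails, a caller would still have to decide whether to fall back to writing the raw bytes without the utf-8 flag.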
According to the standard, setting the magic utf-8 flag (bit 11 of the
general purpose flags, i.e. the 2048 in your patch) should be the only
thing we need to do. But according to discussions referenced elsewhere
in this thread, that flag was invented only in 2007, so we may be
dealing with older implementations that don't know about it (I have no
idea how common they are; that may be the problem you are seeing with
Windows 7's built-in zip). We could re-encode to cp437, which the
standard specifies as the default, but apparently some implementations
do not respect that and use the local code page instead. And cp437
cannot represent all utf-8 characters anyway (none of the CJK or
Cyrillic characters in your example, for instance).
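One way to set the flag only where it means something, sketched under the assumption that names are already utf-8 by the time they reach the zip writer. ZIP_FLAG_UTF8, is_ascii, is_utf8, zip_name_flags and put_le16 are hypothetical names for illustration, not code from archive-zip.c:

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 11 of the general purpose flags: file name is UTF-8 (0x0800 == 2048). */
#define ZIP_FLAG_UTF8 0x0800

/* Return nonzero if every byte is 7-bit ASCII. */
static int is_ascii(const unsigned char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (s[i] & 0x80)
			return 0;
	return 1;
}

/*
 * Return nonzero if s[0..len) is well-formed UTF-8: no truncated
 * sequences, overlong forms, UTF-16 surrogates or values past U+10FFFF.
 */
static int is_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;
	while (i < len) {
		unsigned char c = s[i];
		size_t n;       /* continuation bytes that must follow */
		uint32_t cp;    /* decoded code point */

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {
			n = 1; cp = c & 0x1f;
		} else if ((c & 0xf0) == 0xe0) {
			n = 2; cp = c & 0x0f;
		} else if ((c & 0xf8) == 0xf0) {
			n = 3; cp = c & 0x07;
		} else {
			return 0;   /* stray continuation byte or invalid lead */
		}
		if (len - i <= n)
			return 0;   /* sequence runs past the end */
		for (size_t k = 1; k <= n; k++) {
			if ((s[i + k] & 0xc0) != 0x80)
				return 0;
			cp = (cp << 6) | (s[i + k] & 0x3f);
		}
		if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
		    (n == 3 && cp < 0x10000) ||
		    (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
			return 0;
		i += n + 1;
	}
	return 1;
}

/* Add the utf-8 flag only for non-ASCII names that really are UTF-8. */
static uint16_t zip_name_flags(const char *path, size_t len, uint16_t flags)
{
	const unsigned char *p = (const unsigned char *)path;
	if (!is_ascii(p, len) && is_utf8(p, len))
		flags |= ZIP_FLAG_UTF8;
	return flags;
}

/* Store a 16-bit field little-endian, as ZIP header fields are stored. */
static void put_le16(unsigned char *dst, uint16_t v)
{
	dst[0] = v & 0xff;
	dst[1] = v >> 8;
}
```

Leaving the flag off for pure-ASCII names means those entries come out exactly as before, and leaving it off for bytes that are not valid utf-8 avoids claiming an encoding we don't actually have.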
It sounds like 7-Zip has figured out a more portable solution. Can you
show us a sample of 7-Zip's output with utf-8 characters to compare
with what git generates? I wonder if it is using a combination of
methods.
-Peff