From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: unlisted-recipients:; (no To-header on input)
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
Sven Strickroth <sven.strickroth@tu-clausthal.de>,
git@vger.kernel.org
Subject: Re: git archive --format zip utf-8 issues
Date: Mon, 24 Sep 2012 17:56:57 +0200
Message-ID: <506082C9.6050604@lsrfire.ath.cx>
In-Reply-To: <505B91E9.7060208@lsrfire.ath.cx>
Hi,
I found a way to make unzip respect the UTF-8 flag in ZIP files.
Judging from its source, unzip only looks at general purpose flag 11 if
an extra field is present. I sent a patch that adds an extended
timestamp field, which fits the bill.
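To see whether a given archive carries both pieces, the first local file header can be inspected with od. This is only a sketch: the function names are made up for illustration, it assumes the archive starts with a local file header, and it relies on the ZIP layout that places the general purpose flags at byte offset 6 and the extra field length at offset 28, both little-endian.

```sh
#!/bin/sh
# Hypothetical helper: print one unsigned byte of file $1 at offset $2.
byte_at() {
	od -An -tu1 -j "$2" -N 1 "$1" 2>/dev/null | tr -d ' \n'
}

# Hypothetical helper: report the UTF-8 flag and the extra field length
# of the first local file header. Prints "utf8=<0|1> extra_len=<n>".
zip_first_entry_info() {
	zip=$1
	flag_hi=$(byte_at "$zip" 7)    # high byte of the general purpose flags
	extra_lo=$(byte_at "$zip" 28)  # extra field length, low byte
	extra_hi=$(byte_at "$zip" 29)  # extra field length, high byte
	# Flag 11 of the 16-bit value is bit 3 (value 8) of the high byte.
	utf8=$(( (${flag_hi:-0} & 8) != 0 ))
	echo "utf8=$utf8 extra_len=$(( ${extra_lo:-0} + 256 * ${extra_hi:-0} ))"
}
```

Running `zip_first_entry_info some.zip` on an archive with the flag set and an extra field present would print `utf8=1` together with a non-zero `extra_len`. Only the first entry is examined, so it is a spot check rather than a validation of the whole archive.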
Here are new numbers on ZIP international filename compatibility:
                 7-Zip    PeaZip   builtin  unzip  unzip  unzip    7z
Packer           Windows  Windows  Windows  Linux  mingw  Windows  Linux
git     Linux          1        1        1      7      1        1      1
git 1   Linux         37       37        1      7      1        1     37
git 2   Linux         37       37        1     37      1        1     37
git 3   Linux         37       37        1     37     15       15     37
git     mingw          1        1        1      7      1        1      1
git 1   mingw         37       37        1      7      1        1     37
git 2   mingw         37       37        1     37      1        1     37
git 3   mingw         37       37        1     37     15       15     37
7-Zip   Windows       37       37       14     24     15       15     24
PeaZip  Windows       37       37       14     24     15       15     24
zip     Linux         37       37        1     37     15       15     37
zip     Windows       14       14        0     37     15       15      1
builtin Windows       14       14       14      1     14       14      1
The test corpus still consists of 37 files based on the pangrams on the
following web page:
http://www.columbia.edu/~fdc/utf8/index.html#quickbrownfox
The files can be created using the attached script. It also provides a
check command to count the files with correct names, and the results of
that for different ZIP extractors are given in the table. The built-in
ZIP functionality on Windows was only able to pack 14 of the 37 files,
which explains the low score across the board for this packer.
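When an archive is extracted somewhere other than the current directory, the same count can be taken against that directory. The following is a sketch only: `count_extracted` is a hypothetical helper, not part of the attached script, and it expects the list of names on standard input.

```sh
#!/bin/sh
# Hypothetical helper: count how many of the names read from stdin
# exist as regular files below the directory given as $1.
count_extracted() {
	dir=$1
	n=0
	while IFS= read -r name
	do
		test -f "$dir/$name" && n=$((n + 1))
	done
	echo "$n"
}
```

With the `files` function from the attached script in scope, `files | count_extracted /path/to/extracted` would print a number between 0 and 37.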
"git 1" is git with the patch "archive-zip: support UTF-8 paths" added,
which lets archive-zip make use of the UTF-8 flag. "git 2" is "git 1"
plus the patch "archive-zip: declare creator to be Unix for UTF-8 paths".
Both have been posted before. "git 3" is "git 1" plus the new patch
"archive-zip: write extended timestamp".
Let's drop patch 2 (Unix as creator) and keep patches 1 (UTF-8 flag) and
3 (mtime field) to make archive-zip record non-ASCII filenames in a
portable way. It's not perfect, but I don't know how to do any better:
Windows' built-in ZIP functionality expects filenames in the local code
page, while projects distributing ZIP files have an international
audience.
René
[-- Attachment #2: pangrams.sh --]
[-- Type: text/plain; charset=windows-1252; name="pangrams.sh", Size: 2367 bytes --]
#!/bin/sh
files() {
cat <<EOF
pangrams/わがよたれぞ つねならむ
pangrams/うゐのおくやま けふこえて
pangrams/いろはにほへど ちりぬるを
pangrams/あさきゆめみじ ゑひもせず
pangrams/An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall
pangrams/Árvíztűrő tükörfúrógép
pangrams/Blåbærsyltetøy
pangrams/D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór
pangrams/d'œufs abîmés
pangrams/Éava agus Áḋaiṁ
pangrams/Eĥoŝanĝo ĉiuĵaŭde
pangrams/El pingüino Wenceslao hizo kilómetros bajo exhaustiva
pangrams/Falsches Üben von Xylophonmusik quält
pangrams/Flygande bäckasiner söka strax hwila på mjuka tuvor
pangrams/Høj bly gom vandt fræk sexquiz på wc
pangrams/jeden größeren Zwerg
pangrams/lena ṗóg éada ó ṡlí do leasa ṫú
pangrams/Les naïfs ægithales hâtifs pondant à Noël où il gèle
pangrams/lluvia y frío añoraba a su querido cachorro
pangrams/na stĺpe sa ďateľ učí kvákať novú ódu o živote
pangrams/O próximo vôo à noite sobre o Atlântico
pangrams/Pa's wijze lynx bezag vroom het fikse aquaduct
pangrams/Pchnąć w tę łódź jeża lub osiem skrzyń fig
pangrams/põe freqüentemente o único médico
pangrams/Příliš žluťoučký kůň úpěl ďábelské kódy
pangrams/Sævör grét áðan því úlpan var ónýt
pangrams/sont sûrs d'être déçus en voyant leurs drôles
pangrams/Starý kôň na hŕbe kníh žuje tíško povädnuté ruže
pangrams/The quick brown fox jumps over the lazy dog
pangrams/Törkylempijävongahdus
pangrams/Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža
pangrams/זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן
pangrams/ξεσκεπάζω την ψυχοφθόρα βδελυγμία
pangrams/ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία
pangrams/Жълтата дюля беше щастлива
pangrams/Съешь же ещё этих мягких французских булок да выпей чаю
pangrams/че пухът, който цъфна, замръзна като гьон
EOF
}
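case "$1" in
create)
	# Create one empty file per pangram name.
	mkdir -p pangrams
	files | while IFS= read -r file
	do
		touch "$file"
	done
	;;
check)
	# Count the listed files that exist under their exact names.
	files | while IFS= read -r file
	do
		test -f "$file" && echo "$file"
	done | wc -l
	;;
*)
	echo "Usage: $0 create | check" >&2
	exit 1
	;;
esac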