From: "Torsten Bögershausen" <tboegi@web.de>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
git@vger.kernel.org, "Torsten Bögershausen" <tboegi@web.de>
Subject: Re: [PATCH V4] git on Mac OS and precomposed unicode
Date: Sun, 29 Jan 2012 11:29:53 +0100
Message-ID: <4F251FA1.80400@web.de>
In-Reply-To: <CACsJy8BKQHLdoXfSKsULkWWbWjWEuZgr=bVNKmgCSArvwbf2UA@mail.gmail.com>
On 22.01.12 10:58, Nguyen Thai Ngoc Duy wrote:
> On Sun, Jan 22, 2012 at 5:56 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> [Pinging Nguyen who has worked rather extensively on the start-up sequence
>> for ideas.]
>>
[snip]
>
> I just have a quick look, you reencode opendir, readdir, and
> closedir() to precomposed form. But files are still in decomposed
> form, does open(<precomposed file>) work when only <decomposed file>
> exists?
Yes. All functions like stat(), lstat(), open(), fopen(), unlink() behave the same
for precomposed or decomposed names. This is similar to the ignore-case feature.
And because the default HFS+ is case-preserving, case-insensitive and Unicode-decomposing
all at the same time, a file name "Ä" can be reached under 4 different names
(upper or lower case, each either precomposed or decomposed).
Please see the output of the test script:
(which is at the end of this email)
tests/Darwin_i386/NFC file name created as nfc is readable as nfd
tests/Darwin_i386/NFC readdir returns nfd but expected is nfc
tests/Darwin_i386/NFD file name created as nfd is readable as nfc
tests/Darwin_i386/NFCNFD 1 file found in directory, but there should be 2
tests/Darwin_i386/NFCNFD nfc is missing, nfd is present
tests/Darwin_i386/NFCNFD nfc File content overwritten by nfd
tests/Darwin_i386/NFDNFC 1 file found in directory, but there should be 2
tests/Darwin_i386/NFDNFC nfc is missing, nfd is present
tests/Darwin_i386/NFDNFC nfd File content overwritten by nfc
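
For readers who want to see what "nfc" and "nfd" mean at the byte level, here is
a minimal sketch. It uses the same two spellings of "Ä" as the test script; the
helpers hexbytes and classify and the temporary directory name are made up for
this illustration and are not part of the patch. What the last step prints
depends on the file system: the output above suggests nfd on HFS+, while a file
system that does not normalize should list the name as it was created.

```shell
#!/bin/sh
aumlnfc=$(printf '\303\204')      # U+00C4, precomposed
aumlnfd=$(printf '\101\314\210')  # U+0041 U+0308, decomposed

# hexbytes (hypothetical helper): print the bytes of $1 as hex digits
hexbytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# classify (hypothetical helper): tell which spelling of "Ä" a name is
classify() {
    case "$(hexbytes "$1")" in
    c384)   echo nfc ;;
    41cc88) echo nfd ;;
    *)      echo other ;;
    esac
}

echo "nfc bytes: $(hexbytes "$aumlnfc")"
echo "nfd bytes: $(hexbytes "$aumlnfd")"

# Create a file under its nfc name and look at what the directory listing returns.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/nfctest.XXXXXX") || exit 1
: > "$tmp/$aumlnfc"
for f in "$tmp"/*; do
    echo "listed as: $(classify "${f##*/}")"
done
rm -rf "$tmp"
```

Both names look identical in a terminal, which is why the script compares file
contents and listings instead of relying on what is displayed.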
>
>>> In order to prevent that ever a file name in decomposed unicode is
>>> entering the index, a "brute force" attempt is taken: all arguments into
>>> git (argv[1]..argv[n]) are converted into precomposed unicode. This is
>>> done in git.c by calling precompose_argv(). This function is actually a
>>> #define, and it is only defined under Mac OS. Nothing is converted on
>>> any other platforms.
>
> This is not entirely safe. Filenames can be taken from a file for
> example (--stdin option or similar). Unless I'm mistaken, all file
> names must enter git through the index, the conversion at read-cache.c
> may be a better option.
Good point, thanks.
I added some code to read-cache.c, and it works for files, but not for directories.
I looked through the code that handles "case-ignoring" directory names, and couldn't
find anything obvious. More work needs to be done.
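
To illustrate the gap in the argv[] approach, here is a small sketch in plain
shell. It does not call git at all. The function to_nfc is a hypothetical
stand-in for precompose_argv() that only knows how to convert the single
character "Ä", and the file name is made up. It shows that a name converted on
the command-line path is precomposed, while the same name read from a stream
(as with --stdin) passes through unchanged.

```shell
#!/bin/sh
nfc=$(printf '\303\204')      # "Ä" precomposed
nfd=$(printf '\101\314\210')  # "Ä" decomposed

# to_nfc (hypothetical): toy conversion that handles only "Ä"
to_nfc() {
    printf '%s\n' "$1" | LC_ALL=C sed "s/$nfd/$nfc/g"
}

# Path 1: the name arrives as a command-line argument and is converted.
from_argv=$(to_nfc "$nfd.txt")

# Path 2: the name is read from a stream and never sees the conversion.
from_stdin=$(printf '%s\n' "$nfd.txt" | { read -r name; printf '%s\n' "$name"; })

if test "$from_argv" = "$nfc.txt"; then
    echo "argv path: precomposed"
fi
if test "$from_stdin" = "$nfd.txt"; then
    echo "stdin path: still decomposed"
fi
```

A conversion at the point where names enter the index would cover both paths,
which is the reason for looking at read-cache.c.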
[snip]
> I'd rather encode at index level and read_directory() than at argv[].
>But if reencoding argv is the only feasible way, perhaps put the
>conversion in parse_options()?
I tried that, and found that git-lsfiles.c doesn't use parse_options.
[snip]
In the long run I want to get rid of the argv[] conversion completely,
but I'm not there yet.
Thanks for all comments and inspiration!
(and apologies for my usual long response times)
/Torsten
PS:
Here is the script.
Mac OS writes decomposed unicode to HFS+, precomposed unicode to VFAT and SAMBA.
In any case readdir() returns decomposed.
=================
#!/bin/sh
# Abort: the shell (or the setup below) could not handle the nfd file name.
errorandout() {
    echo Error: The shell cannot handle nfd
    echo try to run /bin/bash $0
    rm -rf "$DIR"
    exit 1
}
# Check a directory that holds one file, created as either nfc or nfd.
# $1: directory, $2: the name readdir is expected to return
checkDirNfcOrNfd() {
    DDNFCNFD=$1
    readdirexp=$2
    if test -r $DDNFCNFD/$aumlnfc; then
        x=`cat $DDNFCNFD/$aumlnfc`
        if test "$x" = nfd; then
            echo $DDNFCNFD file name created as nfd is readable as nfc
        fi
    fi
    if test -r $DDNFCNFD/$aumlnfd; then
        x=`cat $DDNFCNFD/$aumlnfd 2>/dev/null` || {
            echo $DDNFCNFD nfd is not readable, but readdir says that it exists
        }
        if test "$x" = nfc; then
            echo $DDNFCNFD file name created as nfc is readable as nfd
        fi
    fi
    # The glob shows which form readdir() returns
    readdirres=`echo $DDNFCNFD/*`
    if test "$readdirres" != "$DDNFCNFD/$readdirexp"; then
        if test "$readdirres" = $DDNFCNFD/$aumlnfd; then
            echo $DDNFCNFD readdir returns nfd but expected is nfc
        fi
        if test "$readdirres" = $DDNFCNFD/$aumlnfc; then
            echo $DDNFCNFD readdir returns nfc but expected is nfd
        fi
    fi
}
# Check a directory in which both the nfc and the nfd name were created.
# $1: directory
checkdirnfcnfd() {
    DDNFCNFD=$1
    # "=" instead of "==": the latter is not understood by every /bin/sh
    if test `ls -1 $DDNFCNFD | wc -l` != 2; then
        if test `ls -1 $DDNFCNFD | wc -l` = 1; then
            echo $DDNFCNFD 1 file found in directory, but there should be 2
        else
            echo $DDNFCNFD 2 files should be in directory
        fi
    fi
    # Compare the directory listing with every possible combination
    x=`echo $DDNFCNFD/*`
    a=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfc`
    b=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfd`
    c=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfc`
    d=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfd`
    e=`echo $DDNFCNFD/$aumlnfc`
    f=`echo $DDNFCNFD/$aumlnfd`
    case "$x" in
    $a)
        ;;
    $b)
        ;;
    $c)
        echo $DDNFCNFD nfd is hidden, nfc is listed twice
        ;;
    $d)
        echo $DDNFCNFD nfc is hidden, nfd is listed twice
        ;;
    $e)
        echo $DDNFCNFD nfd is missing, nfc is present
        ;;
    $f)
        echo $DDNFCNFD nfc is missing, nfd is present
        ;;
    *)
        echo $DDNFCNFD x`echo $x | xxd`
        ;;
    esac
    # Each file must still contain what was written under its own name
    if ! test -r $DDNFCNFD/$aumlnfc; then
        echo $DDNFCNFD/nfc File does not exist
    else
        x=`cat $DDNFCNFD/$aumlnfc`
        if test "$x" != nfc; then
            echo $DDNFCNFD nfc File content overwritten by $x
        fi
    fi
    if ! test -r $DDNFCNFD/$aumlnfd; then
        echo $DDNFCNFD/nfd File does not exist
    else
        x=`cat $DDNFCNFD/$aumlnfd`
        if test "$x" != nfd; then
            echo $DDNFCNFD nfd File content overwritten by $x
        fi
    fi
}
# "Ä" as precomposed (U+00C4) and as decomposed (U+0041 U+0308) UTF-8
aumlnfc=$(printf '\303\204')
aumlnfd=$(printf '\101\314\210')
DIR=tests/`uname -s`_`uname -m`
echo "DIR=$DIR"
# Create every directory from scratch; in NFDNFC the nfd name is written
# first, in NFCNFD the nfc name is written first.
rm -rf $DIR/NFC &&
rm -rf $DIR/NFD &&
rm -rf $DIR/NFCNFD &&
rm -rf $DIR/NFDNFC &&
mkdir -p $DIR/NFC &&
mkdir -p $DIR/NFD &&
mkdir -p $DIR/NFDNFC &&
mkdir -p $DIR/NFCNFD &&
echo nfc > $DIR/NFC/$aumlnfc &&
echo nfd > $DIR/NFD/$aumlnfd &&
echo nfd > $DIR/NFDNFC/$aumlnfd &&
echo nfc > $DIR/NFDNFC/$aumlnfc &&
echo nfc > $DIR/NFCNFD/$aumlnfc &&
echo nfd > $DIR/NFCNFD/$aumlnfd && {
    # test 1: basic check that the shell handles nfd
    if ! test -r $DIR/NFD/$aumlnfd; then
        errorandout
    fi
    # Check the results of every platform found under tests/
    for DD in tests/*; do
        checkDirNfcOrNfd $DD/NFC $aumlnfc
        checkDirNfcOrNfd $DD/NFD $aumlnfd
        checkdirnfcnfd $DD/NFCNFD
        checkdirnfcnfd $DD/NFDNFC
    done
} || errorandout