From: "Torsten Bögershausen" <tboegi@web.de>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	git@vger.kernel.org, "Torsten Bögershausen" <tboegi@web.de>
Subject: Re: [PATCH V4] git on Mac OS and precomposed unicode
Date: Sun, 29 Jan 2012 11:29:53 +0100
Message-ID: <4F251FA1.80400@web.de>
In-Reply-To: <CACsJy8BKQHLdoXfSKsULkWWbWjWEuZgr=bVNKmgCSArvwbf2UA@mail.gmail.com>

On 22.01.12 10:58, Nguyen Thai Ngoc Duy wrote:
> On Sun, Jan 22, 2012 at 5:56 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> [Pinging Nguyen who has worked rather extensively on the start-up sequence
>> for ideas.]
>>
[snip]
> 
> I just have a quick look, you reencode opendir, readdir, and
> closedir() to precomposed form. But files are still in decomposed
> form, does open(<precomposed file>) work when only <decomposed file>
> exists?

Yes. All functions like stat(), lstat(), open(), fopen() and unlink() behave the same
for precomposed and decomposed names. This is similar to the ignore-case feature.
And because the default HFS+ is case preserving, case insensitive and Unicode decomposing
all at the same time, a file name "Ä" can be reached under 4 different names.
Please see the output of the test script (which is at the end of this email):

tests/Darwin_i386/NFC file name created as nfc is readable as nfd
tests/Darwin_i386/NFC readdir returns nfd but expected is nfc
tests/Darwin_i386/NFD file name created as nfd is readable as nfc
tests/Darwin_i386/NFCNFD 1 file found in directory, but there should be 2
tests/Darwin_i386/NFCNFD nfc is missing, nfd is present
tests/Darwin_i386/NFCNFD nfc File content overwritten by nfd
tests/Darwin_i386/NFDNFC 1 file found in directory, but there should be 2
tests/Darwin_i386/NFDNFC nfc is missing, nfd is present
tests/Darwin_i386/NFDNFC nfd File content overwritten by nfc
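
To make the "4 different names" concrete, here is a minimal sketch that prints the UTF-8 bytes of the four spellings (upper and lower case, each precomposed and decomposed). The variable names and the hexbytes helper are illustrative only and are not part of the test script or of git.

```shell
#!/bin/sh
# Four byte sequences that a case-insensitive, decomposing file system
# treats as the same file name. The octal escapes are UTF-8 bytes.
upper_nfc=$(printf '\303\204')      # U+00C4, precomposed
upper_nfd=$(printf '\101\314\210')  # U+0041 + U+0308, decomposed
lower_nfc=$(printf '\303\244')      # U+00E4, precomposed
lower_nfd=$(printf '\141\314\210')  # U+0061 + U+0308, decomposed

# Illustrative helper: print the bytes of a string as hex, no separators.
hexbytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n'
}

for name in "$upper_nfc" "$upper_nfd" "$lower_nfc" "$lower_nfd"; do
  echo "$name = $(hexbytes "$name")"
done
```

All four strings differ byte for byte, which is why a plain string comparison does not see them as one name.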


> 
>>> In order to prevent that ever a file name in decomposed unicode is
>>> entering the index, a "brute force" attempt is taken: all arguments into
>>> git (argv[1]..argv[n]) are converted into precomposed unicode.  This is
>>> done in git.c by calling precompose_argv().  This function is actually a
>>> #define, and it is only defined under Mac OS.  Nothing is converted on
>>> any other platforms.
> 
> This is not entirely safe. Filenames can be taken from a file for
> example (--stdin option or similar). Unless I'm mistaken, all file
> names must enter git through the index, the conversion at read-cache.c
> may be a better option.
Good point, thanks.
I added some code to read-cache.c, and it works for files, but not for directories.
I looked through the code for the "case-ignoring" handling of directory names, and couldn't
find anything obvious. More work needs to be done.
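
The gap with path names that arrive on stdin can be sketched in shell. This is only a toy under stated assumptions: precompose_toy is a hypothetical helper that rewrites just the single decomposed sequence "A + combining diaeresis", whereas a real conversion would need full Unicode normalization.

```shell
#!/bin/sh
# Toy stand-in for a real NFD -> NFC conversion: it rewrites only the
# decomposed "A + combining diaeresis" to the precomposed form.
aumlnfc=$(printf '\303\204')
aumlnfd=$(printf '\101\314\210')

# Hypothetical helper: filters path names, one per line, stdin to stdout.
# LC_ALL=C makes sed work on raw bytes.
precompose_toy() {
  LC_ALL=C sed "s/$aumlnfd/$aumlnfc/g"
}

# A path that arrives on stdin never passes through argv[], so it has to
# be converted at the point where it is read.
printf '%s\n' "dir/$aumlnfd.txt" | precompose_toy
```

Converting argv[] alone would leave such a path in decomposed form, which is the point made above about --stdin.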
 

[snip]
> I'd rather encode at index level and read_directory() than at argv[].
> But if reencoding argv is the only feasible way, perhaps put the
> conversion in parse_options()?

I tried that, and found that git-lsfiles.c doesn't use parse_options.

[snip]

In the long run I want to get rid of the argv[] conversion completely,
but I'm not there yet.

Thanks for all comments and inspiration!

(and apologies for my usual long response times)
/Torsten



PS:
Here is the script.
Mac OS writes decomposed Unicode to HFS+, and precomposed Unicode to VFAT and Samba.
In every case, readdir() returns decomposed names.

=================
#!/bin/sh
errorandout() {
  echo Error: The shell can not handle nfd
  echo try to run /bin/bash $0
  rm -rf $DIR
  exit 1
}

checkDirNfcOrNfd() {
  DDNFCNFD=$1
  readdirexp=$2
  if test -r $DDNFCNFD/$aumlnfc; then
    x=`cat $DDNFCNFD/$aumlnfc`
    if test "$x" = nfd; then
      echo $DDNFCNFD file name created as nfd is readable as nfc
    fi
  fi
  if test -r $DDNFCNFD/$aumlnfd; then
    x=`cat $DDNFCNFD/$aumlnfd 2>/dev/null` || {
      echo $DDNFCNFD nfd is not readable, but readdir says that it exist
    }
    if test "$x" = nfc; then
      echo $DDNFCNFD file name created as nfc is readable as nfd
    fi
  fi
  readdirres=`echo $DDNFCNFD/*`
  if test "$readdirres" != "$DDNFCNFD/$readdirexp"; then
    if test "$readdirres" = $DDNFCNFD/$aumlnfd; then
      echo $DDNFCNFD readdir returns nfd but expected is nfc
    fi
    if test "$readdirres" = $DDNFCNFD/$aumlnfc; then
      echo $DDNFCNFD readdir returns nfc but expected is nfd
    fi
  fi
}

checkdirnfcnfd() {
  DDNFCNFD=$1
  if test `ls -1 $DDNFCNFD | wc -l` != 2; then
    if test `ls -1 $DDNFCNFD | wc -l` = 1; then
      echo $DDNFCNFD 1 file found in directory, but there should be 2
    else
      echo $DDNFCNFD 2 files should be in directory
    fi  
  fi

  x=`echo $DDNFCNFD/*`
  a=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfc`
  b=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfd`
  c=`echo $DDNFCNFD/$aumlnfc $DDNFCNFD/$aumlnfc`
  d=`echo $DDNFCNFD/$aumlnfd $DDNFCNFD/$aumlnfd`
  e=`echo $DDNFCNFD/$aumlnfc`
  f=`echo $DDNFCNFD/$aumlnfd`
  case "$x" in
    $a)
    ;;      
    $b)
    ;;
    $c)
    echo $DDNFCNFD nfd is hidden, nfc is listed twice
    ;;
    $d)
    echo $DDNFCNFD nfc is hidden, nfd is listed twice
    ;;
    $e)
    echo $DDNFCNFD nfd is missing, nfc is present
    ;;      
    $f)
    echo $DDNFCNFD nfc is missing, nfd is present
    ;;      
    *)
    echo $DDNFCNFD x`echo $x | xxd`
    ;;
  esac

  if ! test -r $DDNFCNFD/$aumlnfc; then
    echo $DDNFCNFD/nfc File does not exist
  else
    x=`cat $DDNFCNFD/$aumlnfc`
    if test "$x" != nfc; then
      echo $DDNFCNFD nfc File content overwritten by $x
    fi
  fi
  
  if ! test -r $DDNFCNFD/$aumlnfd; then
    echo $DDNFCNFD/nfd File does not exist
  else
    x=`cat $DDNFCNFD/$aumlnfd`
    if test "$x" != nfd; then
      echo $DDNFCNFD nfd File content overwritten by $x
    fi
  fi
}


aumlnfc=$(printf '\303\204')
aumlnfd=$(printf '\101\314\210')

DIR=tests/`uname -s`_`uname -m`
echo "DIR=$DIR"

rm -rf $DIR/NFC &&
rm -rf $DIR/NFD &&
rm -rf $DIR/NFCNFD &&
rm -rf $DIR/NFDNFC &&
mkdir -p $DIR/NFC &&
mkdir -p $DIR/NFD &&
mkdir -p $DIR/NFDNFC &&
mkdir -p $DIR/NFCNFD &&
echo nfc > $DIR/NFC/$aumlnfc &&
echo nfd > $DIR/NFD/$aumlnfd &&
echo nfd > $DIR/NFDNFC/$aumlnfd &&
echo nfc > $DIR/NFDNFC/$aumlnfc &&
echo nfc > $DIR/NFCNFD/$aumlnfc &&
echo nfd > $DIR/NFCNFD/$aumlnfd && {
    # test 1: basic if the shell handles nfd
    if ! test -r $DIR/NFD/$aumlnfd; then
      errorandout
    fi

  for DD in tests/*; do
    checkDirNfcOrNfd $DD/NFC  $aumlnfc
    checkDirNfcOrNfd $DD/NFD  $aumlnfd

    checkdirnfcnfd $DD/NFCNFD
    checkdirnfcnfd $DD/NFDNFC
  done
} || errorandout
