From: Junio C Hamano <gitster@pobox.com>
To: Mark Junker <mjscod@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8
Date: Mon, 21 Jan 2008 03:04:33 -0800
Message-ID: <7vejcbzbge.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <fn1sk4$uh4$1@ger.gmane.org> (Mark Junker's message of "Mon, 21 Jan 2008 11:36:55 +0100")
Mark Junker <mjscod@web.de> writes:
> Junio C Hamano wrote:
>
>> I do not know how Macintosh libc implements "struct dirent", but
>> this approach does not work in general.
>
> IMHO there is no need that this approach works in general because this
> is a fix for MacOSX systems only. I also use d_namlen which might not
> be available on other systems. But on MacOSX this works as expected.
>
>> yet you can obtain a path component longer than 256 bytes.
>> Apparently the library allocates longer d_name[] field than what
>> is shown to the user.
>
> This is not a problem either because on MacOSX we get decomposed UTF8
> and we always convert to composed UTF8. This means that the string
> returned from reencode_string will always be smaller than the original
> filename that had to be reencoded.
It is not quite enough that this works OK on MacOS if you make
FIX_UTF8_MAC definable in the Makefile. After all, some friendly
and helpful Linux folks might want to enable it in their build
while trying to help with debugging, right?

In the short term, as long as it runs safely without overrunning
the buffer on MacOS, that is fine, even though we will need
some protection to prevent this code from getting compiled and
used on Linux with glibc, which does have the issue.

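A minimal sketch of such protection, assuming the build passes FIX_UTF8_MAC as a preprocessor define and relying on the __APPLE__ macro that compilers targeting Mac OS X predefine. Where the guard would live (git-compat-util.h, for example) is a guess, not something the patch specifies:

```c
/*
 * Refuse to build the UTF8-MAC readdir() wrapper anywhere but on
 * Mac OS X. On other platforms FIX_UTF8_MAC stays harmlessly
 * undefined and this block compiles to nothing.
 */
#if defined(FIX_UTF8_MAC) && !defined(__APPLE__)
#error "FIX_UTF8_MAC is only meant to be enabled on Mac OS X"
#endif
```

A Makefile-level check would work as well; the preprocessor guard just has the advantage of catching a stray -DFIX_UTF8_MAC in CFLAGS too.
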
I was specifically talking about this "static" thing.
+static struct dirent temp;
+struct dirent *gitreaddir(DIR *dirp)
+{
+	size_t utf8_len;
+	char *utf8;
+	struct dirent *result;
+	result = readdir(dirp);
+	if (result != NULL) {
+		memcpy(&temp, result, sizeof(struct dirent));
+		utf8 = reencode_string(temp.d_name, "UTF8", "UTF8-MAC");
+		if (utf8 != NULL) {
+			utf8_len = strlen(utf8);
+			temp.d_namlen = (u_int8_t) utf8_len;
+			memcpy(temp.d_name, utf8, utf8_len + 1);
+			free(utf8);
+			result = &temp;
+		}
+	}
+	return result;
+}
You memcpy() what the library gave you in *result to the
statically allocated "temp". The size of d_name[] in "temp" comes
from the structure definition in the user-visible include file,
which could be much shorter than what the library gave you in
*result.

The structure definition I showed in the message you are
responding to illustrates the issue. If MacOS uses a similar
trick to declare d_name[256] and sometimes returns a much longer
name in *result, you are truncating the name by copying only the
first part of the structure and the first 256 bytes of d_name[].

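One way to sidestep the truncation is to size the copy by the name the library actually returned rather than by sizeof(struct dirent). The following is only a sketch under that assumption; copy_dirent() is a hypothetical helper, not part of the patch, and it leaves length fields such as d_reclen or d_namlen exactly as the library set them:

```c
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate an entry returned by readdir()
 * into heap memory large enough for the whole name, however long
 * the library made it. The caller frees the result.
 */
struct dirent *copy_dirent(const struct dirent *src)
{
	const size_t header = offsetof(struct dirent, d_name);
	/* Use a char pointer so the declared size of d_name[] is irrelevant. */
	const char *name = (const char *)src + header;
	size_t name_len = strlen(name);
	size_t need = header + name_len + 1;
	struct dirent *dst;

	/* Never allocate less than the declared structure. */
	if (need < sizeof(struct dirent))
		need = sizeof(struct dirent);

	dst = calloc(1, need);
	if (!dst)
		return NULL;

	memcpy(dst, src, header);                         /* fixed-size fields */
	memcpy((char *)dst + header, name, name_len + 1); /* name plus NUL */
	return dst;
}
```

This trades the static buffer for an allocation per entry, so the caller has to free() each result, and it changes the lifetime rules compared to plain readdir().
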
But you have a Mac and I don't, so as long as you have verified
that their header leaves enough room in the statically allocated
"temp" to store the longest possible name that readdir() can
return, the code is OK. I was just being cautious, as I know
the above code has a problem on one platform.
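
For that verification, a small probe along these lines might help. It is only a sketch: both function names are made up for illustration, and it assumes the platform declares d_name as a fixed-size array (it will not compile where d_name is a flexible array member). It only reports what the header declares; how long a name readdir() can really return still has to be checked against the filesystem.

```c
#include <dirent.h>
#include <stddef.h>

/* Bytes the public header reserves for d_name[], including the NUL. */
size_t declared_d_name_size(void)
{
	return sizeof(((struct dirent *)0)->d_name);
}

/*
 * Nonzero if a name of name_len bytes (not counting the NUL) fits
 * into a struct dirent as declared by the header.
 */
int fits_in_declared_dirent(size_t name_len)
{
	return name_len < declared_d_name_size();
}
```

Feeding fits_in_declared_dirent() the longest name you can create on the filesystem, measured in bytes of decomposed UTF-8, would show whether the memcpy() into "temp" can ever fall short.
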