git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] macos: lazily initialize iconv
@ 2012-07-31 17:52 Junio C Hamano
  2012-07-31 18:07 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-07-31 17:52 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Linus Torvalds

In practice, the majority of paths do not have any utf8 character
that needs the canonicalization.  Lazily call iconv_open() and
iconv_close() to avoid unnecessary overhead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is not even compile tested, so it needs testing and
   benchmarking, as I do not even know how costly the calls to
   open/close are when we do not have to call iconv() itself.

   This was brought up by Linus (Cc'ed) in http://goo.gl/INWVc

 compat/precompose_utf8.c | 24 ++++++++++++++++++------
 compat/precompose_utf8.h |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index d40d1b3..63ce89f 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -67,7 +67,7 @@ void probe_utf8_pathname_composition(char *path, int len)
 
 void precompose_argv(int argc, const char **argv)
 {
-	int i = 0;
+	int i;
 	const char *oldarg;
 	char *newarg;
 	iconv_t ic_precompose;
@@ -75,11 +75,19 @@ void precompose_argv(int argc, const char **argv)
 	if (precomposed_unicode != 1)
 		return;
 
+	/* Avoid iconv_open()/iconv_close() if there is nothing to convert */
+	for (i = 0; i < argc; i++) {
+		if (has_utf8(argv[i], (size_t)-1, NULL))
+			break;
+	}
+	if (i < argc)
+		return;
+
 	ic_precompose = iconv_open(repo_encoding, path_encoding);
 	if (ic_precompose == (iconv_t) -1)
 		return;
 
-	while (i < argc) {
+	for (i = 0; i < argc; i++) {
 		size_t namelen;
 		oldarg = argv[i];
 		if (has_utf8(oldarg, (size_t)-1, &namelen)) {
@@ -87,7 +95,6 @@ void precompose_argv(int argc, const char **argv)
 			if (newarg)
 				argv[i] = newarg;
 		}
-		i++;
 	}
 	iconv_close(ic_precompose);
 }
@@ -106,8 +113,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname)
 		return NULL;
 	} else {
 		int ret_errno = errno;
-		prec_dir->ic_precompose = iconv_open(repo_encoding, path_encoding);
-		/* if iconv_open() fails, die() in readdir() if needed */
+		prec_dir->iconv_opened = 0;
 		errno = ret_errno;
 	}
 
@@ -136,6 +142,11 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR *prec_dir)
 		prec_dir->dirent_nfc->d_type = res->d_type;
 
 		if ((precomposed_unicode == 1) && has_utf8(res->d_name, (size_t)-1, NULL)) {
+			if (!prec_dir->iconv_opened) {
+				prec_dir->ic_precompose =
+					iconv_open(repo_encoding, path_encoding);
+				prec_dir->iconv_opened = 1;
+			}
 			if (prec_dir->ic_precompose == (iconv_t)-1) {
 				die("iconv_open(%s,%s) failed, but needed:\n"
 						"    precomposed unicode is not supported.\n"
@@ -181,7 +192,8 @@ int precompose_utf8_closedir(PREC_DIR *prec_dir)
 	int ret_errno;
 	ret_value = closedir(prec_dir->dirp);
 	ret_errno = errno;
-	if (prec_dir->ic_precompose != (iconv_t)-1)
+	if (prec->dir->iconv_opened &&
+	    (prec_dir->ic_precompose != (iconv_t)-1))
 		iconv_close(prec_dir->ic_precompose);
 	free(prec_dir->dirent_nfc);
 	free(prec_dir);
diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index 3b73585..8de485e 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -22,6 +22,7 @@ typedef struct dirent_prec_psx {
 
 typedef struct {
 	iconv_t ic_precompose;
+	int iconv_opened;
 	DIR *dirp;
 	struct dirent_prec_psx *dirent_nfc;
 } PREC_DIR;
-- 
1.7.12.rc1.43.g3fa3e7e

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] macos: lazily initialize iconv
  2012-07-31 17:52 [PATCH] macos: lazily initialize iconv Junio C Hamano
@ 2012-07-31 18:07 ` Junio C Hamano
  2012-07-31 18:10 ` Ralf Thielow
  2012-07-31 18:37 ` [PATCH v2] " Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-07-31 18:07 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> In practice, the majority of paths do not have any utf8 character
> that needs the canonicalization.  Lazily call iconv_open() and
> iconv_close() to avoid unnecessary overhead.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * This is not even compile tested, so it needs testing and
>    benchmarking, as I do not even know how costly the calls to
>    open/close are when we do not have to call iconv() itself.
>
>    This was brought up by Linus (Cc'ed) in http://goo.gl/INWVc

Even though I also think that per-DIR iconv may not be the optimal
way to organize this (I think iconv_t should be a per-thread thing
at most), it would be a more involved change that needs to be done
by somebody who actually works on Mac, so the patch I sent is kept
deliberately minimum.

Just FYI.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] macos: lazily initialize iconv
  2012-07-31 17:52 [PATCH] macos: lazily initialize iconv Junio C Hamano
  2012-07-31 18:07 ` Junio C Hamano
@ 2012-07-31 18:10 ` Ralf Thielow
  2012-07-31 18:19   ` Junio C Hamano
  2012-07-31 18:37 ` [PATCH v2] " Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Ralf Thielow @ 2012-07-31 18:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git, Linus Torvalds

On Tue, Jul 31, 2012 at 7:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
> +       /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
> +       for (i = 0; i < argc; i++) {
> +               if (has_utf8(argv[i], (size_t)-1, NULL))
> +                       break;
> +       }
> +       if (i < argc)
> +               return;
> +

I'm not very familiar with this code but:

before: it reencodes everything which is utf-8 in argv
after: it reencodes *nothing* if one string in argv is not in utf-8

am i wrong?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] macos: lazily initialize iconv
  2012-07-31 18:10 ` Ralf Thielow
@ 2012-07-31 18:19   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-07-31 18:19 UTC (permalink / raw)
  To: Ralf Thielow; +Cc: Torsten Bögershausen, git, Linus Torvalds

Ralf Thielow <ralf.thielow@gmail.com> writes:

> On Tue, Jul 31, 2012 at 7:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +       /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
>> +       for (i = 0; i < argc; i++) {
>> +               if (has_utf8(argv[i], (size_t)-1, NULL))
>> +                       break;
>> +       }
>> +       if (i < argc)
>> +               return;
>> +
>
> I'm not very familiar with this code but:
>
> before: it reencodes everything which is utf-8 in argv
> after: it reencodes *nothing* if one string in argv is not in utf-8
>
> am i wrong?

You are right.  It should avoid the whole iconv thing if we saw _no_
utf8, i.e. the last two should be:

	if (argc <= i)
        	return;

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] macos: lazily initialize iconv
  2012-07-31 17:52 [PATCH] macos: lazily initialize iconv Junio C Hamano
  2012-07-31 18:07 ` Junio C Hamano
  2012-07-31 18:10 ` Ralf Thielow
@ 2012-07-31 18:37 ` Junio C Hamano
  2012-07-31 18:55   ` Ralf Thielow
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-07-31 18:37 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Linus Torvalds, Ralf Thielow

In practice, the majority of paths do not have utf8 that needs
the canonicalization. Lazily call iconv_open()/iconv_close() to
avoid unnecessary overhead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ralf Thielow <ralf.thielow@gmail.com>
Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
---

 * This is not even compile tested, so it needs testing and
   benchmarking, as I do not even know how costly the calls to
   open/close are when we do not have to call iconv() itself.

   This reroll corrects an obvious thinko pointed out by Ralf, and
   gets rid of an extra iconv_opened field added unnecessarily in
   the previous round.

   This was brought up by Linus in http://goo.gl/INWVc

 compat/precompose_utf8.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index d40d1b3..79b5528 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -67,7 +67,7 @@ void probe_utf8_pathname_composition(char *path, int len)
 
 void precompose_argv(int argc, const char **argv)
 {
-	int i = 0;
+	int i;
 	const char *oldarg;
 	char *newarg;
 	iconv_t ic_precompose;
@@ -75,11 +75,19 @@ void precompose_argv(int argc, const char **argv)
 	if (precomposed_unicode != 1)
 		return;
 
+	/* Avoid iconv_open()/iconv_close() if there is nothing to convert */
+	for (i = 0; i < argc; i++) {
+		if (has_utf8(argv[i], (size_t)-1, NULL))
+			break;
+	}
+	if (argc <= i)
+		return; /* no utf8 found */
+
 	ic_precompose = iconv_open(repo_encoding, path_encoding);
 	if (ic_precompose == (iconv_t) -1)
 		return;
 
-	while (i < argc) {
+	for (i = 0; i < argc; i++) {
 		size_t namelen;
 		oldarg = argv[i];
 		if (has_utf8(oldarg, (size_t)-1, &namelen)) {
@@ -87,7 +95,6 @@ void precompose_argv(int argc, const char **argv)
 			if (newarg)
 				argv[i] = newarg;
 		}
-		i++;
 	}
 	iconv_close(ic_precompose);
 }
@@ -106,8 +113,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname)
 		return NULL;
 	} else {
 		int ret_errno = errno;
-		prec_dir->ic_precompose = iconv_open(repo_encoding, path_encoding);
-		/* if iconv_open() fails, die() in readdir() if needed */
+		prec_dir->ic_precompose = (iconv_t)-1;
 		errno = ret_errno;
 	}
 
@@ -136,6 +142,9 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR *prec_dir)
 		prec_dir->dirent_nfc->d_type = res->d_type;
 
 		if ((precomposed_unicode == 1) && has_utf8(res->d_name, (size_t)-1, NULL)) {
+			if (prec_dir->ic_precompose == (iconv_t)-1)
+				prec_dir->ic_precompose =
+					iconv_open(repo_encoding, path_encoding);
 			if (prec_dir->ic_precompose == (iconv_t)-1) {
 				die("iconv_open(%s,%s) failed, but needed:\n"
 						"    precomposed unicode is not supported.\n"
-- 
1.7.12.rc1.43.g3fa3e7e

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 18:37 ` [PATCH v2] " Junio C Hamano
@ 2012-07-31 18:55   ` Ralf Thielow
  2012-07-31 19:00     ` Junio C Hamano
  2012-07-31 19:51   ` Linus Torvalds
  2012-08-10 16:43   ` Torsten Bögershausen
  2 siblings, 1 reply; 15+ messages in thread
From: Ralf Thielow @ 2012-07-31 18:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git, Linus Torvalds

On Tue, Jul 31, 2012 at 8:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> +       /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
> +       for (i = 0; i < argc; i++) {
> +               if (has_utf8(argv[i], (size_t)-1, NULL))
> +                       break;
> +       }
> +       if (argc <= i)
> +               return; /* no utf8 found */

sorry, but "argc" can never be smaller than "i", right?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 18:55   ` Ralf Thielow
@ 2012-07-31 19:00     ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-07-31 19:00 UTC (permalink / raw)
  To: Ralf Thielow; +Cc: Torsten Bögershausen, git, Linus Torvalds

Ralf Thielow <ralf.thielow@gmail.com> writes:

> On Tue, Jul 31, 2012 at 8:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +       /* Avoid iconv_open()/iconv_close() if there is nothing to convert */
>> +       for (i = 0; i < argc; i++) {
>> +               if (has_utf8(argv[i], (size_t)-1, NULL))
>> +                       break;
>> +       }
>> +       if (argc <= i)
>> +               return; /* no utf8 found */
>
> sorry, but "argc" can never be smaller than "i", right?

Yeah, but it is idiomatic to have an inverse of the exit condition
of the preceding for loop here to catch an early exit, and writing
it as "if (i == argc)", while technically correct, would break the
pattern.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 18:37 ` [PATCH v2] " Junio C Hamano
  2012-07-31 18:55   ` Ralf Thielow
@ 2012-07-31 19:51   ` Linus Torvalds
  2012-07-31 20:16     ` Junio C Hamano
  2012-08-10 16:43   ` Torsten Bögershausen
  2 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2012-07-31 19:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git, Ralf Thielow

On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
>  * This is not even compile tested, so it needs testing and
>    benchmarking, as I do not even know how costly the calls to
>    open/close are when we do not have to call iconv() itself.

Ok, so it's easily compile-tested: just add

+       COMPAT_OBJS += compat/precompose_utf8.o
+       BASIC_CFLAGS += -DPRECOMPOSE_UNICODE

to the makefile for Linux too.

Actually testing how well it *works* is hard, since I don't really
have a mac (well, I do, but it no longer has OS X on it ;), and the
whole "utf-8-mac" thing does not make sense.

HOWEVER. I actually tested it with the conversion being from Latin1 to
UTF-8 instead, and it does interesting things, and kind of works. I
say "kind of", because for the case of the filesystem being in Latin1,
we actually have to convert things back to the filesystem character
set when doing "open()" and "lstat()", and this patch obviously
doesn't do that, because OS X does the conversion back to NFD on its
own.

But ACK on the patch.

If I had more time, I'd actually be interested to do the generic case
of namespace conversion, and we could make this a generic git feature
- it's something I wanted to do long ago. However, right now I'm in
the merge window and will go on a vacation to Finland after that, so I
probably won't get around to it.

I do have one suggestion: please rename the "has_utf8()" function to
"has_nonascii()" too. Why? Because that's what the function actually
does. It has nothing to do with testing for UTF-8 (the utf-8 rules are
more complex than just "high bit set"), and *if* I ever get around to
doing a more generic character set conversion for the filenames, the
decision really would be about non-ASCII, not about non-UTF8.

              Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 19:51   ` Linus Torvalds
@ 2012-07-31 20:16     ` Junio C Hamano
  2012-07-31 20:39       ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2012-07-31 20:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Torsten Bögershausen, git, Ralf Thielow

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Jul 31, 2012 at 11:37 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>  * This is not even compile tested, so it needs testing and
>>    benchmarking, as I do not even know how costly the calls to
>>    open/close are when we do not have to call iconv() itself.
>
> Ok, so it's easily compile-tested: just add
>
> +       COMPAT_OBJS += compat/precompose_utf8.o
> +       BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
>
> to the makefile for Linux too.
>
> Actually testing how well it *works* is hard, since I don't really
> have a mac (well, I do, but it no longer has OS X on it ;), and the
> whole "utf-8-mac" thing does not make sense.

Also the motivation for this change (not the original utf-8-mac one,
which is not my code) is about not paying unnecessary iconv_open()
overhead when we do not have to, so the measurement has to happen on
Mac, not on Linux.

> HOWEVER. I actually tested it with the conversion being from Latin1 to
> UTF-8 instead, and it does interesting things, and kind of works. I
> say "kind of", because for the case of the filesystem being in Latin1,
> we actually have to convert things back to the filesystem character
> set ...

Eek.

Not just write_entry() codepath that creates the final paths on the
filesystem, you would need to touch lstat() calls that check the
existence and freshness of the path, once you go that route.  I am
sure such a change can be made to work, but I am not sure how much
we would gain from one.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 20:16     ` Junio C Hamano
@ 2012-07-31 20:39       ` Linus Torvalds
  2012-08-01 19:25         ` Torsten Bögershausen
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2012-07-31 20:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git, Ralf Thielow

On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Eek.

Oh, I agree. Doing a full character set conversion both ways is quite
a bit more work.

> Not just write_entry() codepath that creates the final paths on the
> filesystem, you would need to touch lstat() calls that check the
> existence and freshness of the path, once you go that route.  I am
> sure such a change can be made to work, but I am not sure how much
> we would gain from one.

I think it might be interesting. I doubt it matters all that much any
more in Western Europe (Unicode really does seem to have largely taken
over), but I think Japan still uses Shift-JIS a lot.

Although maybe that is starting to fade too.

And it really is just a generalization of the OS X filesystem damage.

            Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 20:39       ` Linus Torvalds
@ 2012-08-01 19:25         ` Torsten Bögershausen
  0 siblings, 0 replies; 15+ messages in thread
From: Torsten Bögershausen @ 2012-08-01 19:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Ralf Thielow, tboegi

Am 2012-07-31 22:39, schrieb Linus Torvalds:
> On Tue, Jul 31, 2012 at 1:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Eek.
>
> Oh, I agree. Doing a full character set conversion both ways is quite
> a bit more work.
>
>> Not just write_entry() codepath that creates the final paths on the
>> filesystem, you would need to touch lstat() calls that check the
>> existence and freshness of the path, once you go that route.  I am
>> sure such a change can be made to work, but I am not sure how much
>> we would gain from one.
>
> I think it might be interesting. I doubt it matters all that much any
> more in Western Europe (Unicode really does seem to have largely taken
> over), but I think Japan still uses Shift-JIS a lot.
>
> Although maybe that is starting to fade too.
>
> And it really is just a generalization of the OS X filesystem damage.
>
>              Linus
>
Hi,
(I'm on vacation myself, so I migth apologize for answering very late.)

I had done a fully fledges conversion back-and-force of file names.

Having e.g. ISO-8859-1 on disk and UTF-8 in the repo.

The basic idea came from Linus, and it needs conversion for
open() fopen(), unlink(), readdir(), stat(), lstat(), rename()
(and may be one or two more, I even added support in readlink()).

After doing the whole thing ready, I realized that UTF-8 became more and 
more standard.

At the same time people started to enhance msysgit to use UTF-8 as well. 
(Thanks to the contributors)

My feeling was that the "market window" for such a generic file name 
conversion might be closed.

So, is there still a need for such a feature in git?

I wouldn't mind to re-base my pathch to master and post it within a 
couple of days/weeks.

Thanks for git and all enhancements.
/Torsten

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-07-31 18:37 ` [PATCH v2] " Junio C Hamano
  2012-07-31 18:55   ` Ralf Thielow
  2012-07-31 19:51   ` Linus Torvalds
@ 2012-08-10 16:43   ` Torsten Bögershausen
  2012-08-10 18:18     ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Torsten Bögershausen @ 2012-08-10 16:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Torsten Bögershausen, git, Linus Torvalds, Ralf Thielow

On 31.07.12 20:37, Junio C Hamano wrote:
> In practice, the majority of paths do not have utf8 that needs
> the canonicalization. Lazily call iconv_open()/iconv_close() to
> avoid unnecessary overhead.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Helped-by: Ralf Thielow <ralf.thielow@gmail.com>
> Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>
>  * This is not even compile tested, so it needs testing and
>    benchmarking, as I do not even know how costly the calls to
>    open/close are when we do not have to call iconv() itself.
>
>    This reroll corrects an obvious thinko pointed out by Ralf, and
>    gets rid of an extra iconv_opened field added unnecessarily in
>    the previous round.
>
>    This was brought up by Linus in http://goo.gl/INWVc
>
>  compat/precompose_utf8.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
> index d40d1b3..79b5528 100644
> --- a/compat/precompose_utf8.c
> +++ b/compat/precompose_utf8.c
> @@ -67,7 +67,7 @@ void probe_utf8_pathname_composition(char *path, int len)
>  
>  void precompose_argv(int argc, const char **argv)
>  {
> -	int i = 0;
> +	int i;
>  	const char *oldarg;
>  	char *newarg;
>  	iconv_t ic_precompose;
> @@ -75,11 +75,19 @@ void precompose_argv(int argc, const char **argv)
>  	if (precomposed_unicode != 1)
>  		return;
>  
> +	/* Avoid iconv_open()/iconv_close() if there is nothing to convert */
> +	for (i = 0; i < argc; i++) {
> +		if (has_utf8(argv[i], (size_t)-1, NULL))
> +			break;
> +	}
> +	if (argc <= i)
> +		return; /* no utf8 found */
> +
>  	ic_precompose = iconv_open(repo_encoding, path_encoding);
>  	if (ic_precompose == (iconv_t) -1)
>  		return;
>  
> -	while (i < argc) {
> +	for (i = 0; i < argc; i++) {
>  		size_t namelen;
>  		oldarg = argv[i];
>  		if (has_utf8(oldarg, (size_t)-1, &namelen)) {
> @@ -87,7 +95,6 @@ void precompose_argv(int argc, const char **argv)
>  			if (newarg)
>  				argv[i] = newarg;
>  		}
> -		i++;
>  	}
>  	iconv_close(ic_precompose);
>  }
> @@ -106,8 +113,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname)
>  		return NULL;
>  	} else {
>  		int ret_errno = errno;
> -		prec_dir->ic_precompose = iconv_open(repo_encoding, path_encoding);
> -		/* if iconv_open() fails, die() in readdir() if needed */
> +		prec_dir->ic_precompose = (iconv_t)-1;
>  		errno = ret_errno;
>  	}
>  
> @@ -136,6 +142,9 @@ struct dirent_prec_psx *precompose_utf8_readdir(PREC_DIR *prec_dir)
>  		prec_dir->dirent_nfc->d_type = res->d_type;
>  
>  		if ((precomposed_unicode == 1) && has_utf8(res->d_name, (size_t)-1, NULL)) {
> +			if (prec_dir->ic_precompose == (iconv_t)-1)
> +				prec_dir->ic_precompose =
> +					iconv_open(repo_encoding, path_encoding);
>  			if (prec_dir->ic_precompose == (iconv_t)-1) {
>  				die("iconv_open(%s,%s) failed, but needed:\n"
>  						"    precomposed unicode is not supported.\n"
Hi Junio,

thanks for the optimization.
Tested-by: Torsten Bögershausen <tboegi@web.de>

We can optimize the optimization with another 0.01 %  ;-)

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 79b5528..93ae5de 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -112,9 +112,7 @@ PREC_DIR *precompose_utf8_opendir(const char *dirname)
                free(prec_dir);
                return NULL;
        } else {
-               int ret_errno = errno;
                prec_dir->ic_precompose = (iconv_t)-1;
-               errno = ret_errno;
        }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-08-10 16:43   ` Torsten Bögershausen
@ 2012-08-10 18:18     ` Junio C Hamano
  2012-08-10 21:11       ` Torsten Bögershausen
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2012-08-10 18:18 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Linus Torvalds, Ralf Thielow

Torsten Bögershausen <tboegi@web.de> writes:

> On 31.07.12 20:37, Junio C Hamano wrote:
>> In practice, the majority of paths do not have utf8 that needs
>> the canonicalization. Lazily call iconv_open()/iconv_close() to
>> avoid unnecessary overhead.
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> Helped-by: Ralf Thielow <ralf.thielow@gmail.com>
>> Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
>> ---
>>
>>  * This is not even compile tested, so it needs testing and
>>    benchmarking, as I do not even know how costly the calls to
>>    open/close are when we do not have to call iconv() itself.
>> ...
> Hi Junio,
>
> thanks for the optimization.
> Tested-by: Torsten Bögershausen <tboegi@web.de>

Well, I didn't mean the correctness testing without numbers.  The
correctness of the patch after a couple of people eyeballed it was
no longer a question.

If the patch does not give any measuable performance difference to
people who exercise this codepath, it is not worth merging.  And
that is not something I can't do myself without a Mac (nor I wish to
have one to be able to do so myself).

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-08-10 18:18     ` Junio C Hamano
@ 2012-08-10 21:11       ` Torsten Bögershausen
  2012-08-10 21:47         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Torsten Bögershausen @ 2012-08-10 21:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Torsten Bögershausen, git, Linus Torvalds, Ralf Thielow

On 10.08.12 20:18, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>> On 31.07.12 20:37, Junio C Hamano wrote:
>>> In practice, the majority of paths do not have utf8 that needs
>>> the canonicalization. Lazily call iconv_open()/iconv_close() to
>>> avoid unnecessary overhead.
>>>
>>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>>> Helped-by: Ralf Thielow <ralf.thielow@gmail.com>
>>> Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> ---
>>>
>>>  * This is not even compile tested, so it needs testing and
>>>    benchmarking, as I do not even know how costly the calls to
>>>    open/close are when we do not have to call iconv() itself.
>>> ...
>> Hi Junio,
>>
>> thanks for the optimization.
>> Tested-by: Torsten Bögershausen <tboegi@web.de>
> Well, I didn't mean the correctness testing without numbers.  The
> correctness of the patch after a couple of people eyeballed it was
> no longer a question.
>
> If the patch does not give any measuable performance difference to
> people who exercise this codepath, it is not worth merging.  And
> that is not something I can't do myself without a Mac (nor I wish to
> have one to be able to do so myself).
>
> Thanks.
Really sorry for the confusion with the percentage, I should have written:
The patch is tested OK, and we can even remove 2 lines of code,
where we save and restore errno.
They are no longer needed.

Here some benchmarks on my system
(Intel Core 2 Duo @   3,06 GHz, L2-Cache:    3 MB,  Memory:    4 GB)
running git status on the linux source tree.

After cloning, I set "git config core.precomposeunicode true".

git is v1.7.12.rc2,
git_git/git is with your patch.

git v1.7.12.rc2:
tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time git status  ; done 2>&1 | egrep "^user|^real|^sys"
real    0m0.469s
user    0m0.283s
sys     0m0.175s

real    0m0.461s
user    0m0.283s
sys     0m0.170s

real    0m0.460s
user    0m0.283s
sys     0m0.170s

real    0m0.461s
user    0m0.283s
sys     0m0.177s

real    0m0.460s
user    0m0.283s
sys     0m0.176s

With lazy init of iconv:
tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time ~/projects/git/git_git/git status  ; done 2>&1 | egrep "^user|^real|^sys"
real    0m0.463s
user    0m0.282s
sys     0m0.173s

real    0m0.460s
user    0m0.281s
sys     0m0.172s

real    0m0.460s
user    0m0.281s
sys     0m0.172s

real    0m0.463s
user    0m0.281s
sys     0m0.175s

real    0m0.462s
user    0m0.281s
sys     0m0.177s

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] macos: lazily initialize iconv
  2012-08-10 21:11       ` Torsten Bögershausen
@ 2012-08-10 21:47         ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-08-10 21:47 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Linus Torvalds, Ralf Thielow

Torsten Bögershausen <tboegi@web.de> writes:

> After cloning, I set "git config core.precomposeunicode true".
>
> git is v1.7.12.rc2,
> git_git/git is with your patch.
> 
> git v1.7.12.rc2:
> tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time git status  ; done 2>&1 | egrep "^user|^real|^sys"
> real    0m0.460s
> user    0m0.283s
> sys     0m0.176s
>
> With lazy init of iconv:
> tb@birne:~/projects/git/linux-2.6> for f in 1 2 3 4 5; do time ~/projects/git/git_git/git status  ; done 2>&1 | egrep "^user|^real|^sys"
>
> real    0m0.462s
> user    0m0.281s
> sys     0m0.177s

Thanks.

So in short, even in a tree that theoretically should benefit from
lazy initialization, it does not make things better in any measuable
way.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-08-10 21:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-31 17:52 [PATCH] macos: lazily initialize iconv Junio C Hamano
2012-07-31 18:07 ` Junio C Hamano
2012-07-31 18:10 ` Ralf Thielow
2012-07-31 18:19   ` Junio C Hamano
2012-07-31 18:37 ` [PATCH v2] " Junio C Hamano
2012-07-31 18:55   ` Ralf Thielow
2012-07-31 19:00     ` Junio C Hamano
2012-07-31 19:51   ` Linus Torvalds
2012-07-31 20:16     ` Junio C Hamano
2012-07-31 20:39       ` Linus Torvalds
2012-08-01 19:25         ` Torsten Bögershausen
2012-08-10 16:43   ` Torsten Bögershausen
2012-08-10 18:18     ` Junio C Hamano
2012-08-10 21:11       ` Torsten Bögershausen
2012-08-10 21:47         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).