* Problem listing GIT repository with accents
@ 2010-02-01 10:48 Ellié Computing Open Source Program
2010-02-01 11:32 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Ellié Computing Open Source Program @ 2010-02-01 10:48 UTC (permalink / raw)
To: git
Dear all,
I'm writing a tool (ECMerge) which use the git command line to list folders
content (as of specific commits, tags and so on). This way our users can
browse a GIT repository in our GUI.
We are in front of the following problem: when a GIT repository contains a
folder with accents, the files names for its content are often prepended
with random characters. Here is a transcript of a list of operation
exhibiting the problem:
C:\temp\scc-tests\git>git ls-tree HEAD .
100644 blob 443d8625f771c421efd86c129483f9a139a4e85f "p\351p\351.txt"
C:\temp\scc-tests\git>mkdir "caractère spécial"
C:\temp\scc-tests\git>echo plouf > "caractère spécial\plouf.txt"
C:\temp\scc-tests\git>git add "caractère spécial"
C:\temp\scc-tests\git>git commit -m plouf
[master b94d9cb] plouf
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 "caract\350re sp\351cial/plouf.txt"
C:\temp\scc-tests\git>git ls-tree HEAD .
040000 tree d2b614bbfb4c5f39a32eb1309654262df113f605 "caract\350re
sp\351cial"
100644 blob 443d8625f771c421efd86c129483f9a139a4e85f "p\351p\351.txt"
C:\temp\scc-tests\git>git ls-tree -r HEAD .
100644 blob bf10a8b39e72c754ee1872fcdb13662cba6a8880 "caract\350re
sp\351cial/\272plouf.txt"
100644 blob 443d8625f771c421efd86c129483f9a139a4e85f "p\351p\351.txt"
C:\temp\scc-tests\git>git ls-tree -r HEAD "caractère spécial"
100644 blob bf10a8b39e72c754ee1872fcdb13662cba6a8880 "caract\350re
sp\351cial/\272plouf.txt"
Note the spurious \272 which comes in the listing :(
Trying again the same commands may give other spurious characters (each time
we tried we get different _bad_ responses)
Hope you can do somehing for that,
Best regards
Armel Asselin - from Ellié Computing
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 10:48 Problem listing GIT repository with accents Ellié Computing Open Source Program
@ 2010-02-01 11:32 ` Jeff King
2010-02-01 12:19 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2010-02-01 11:32 UTC (permalink / raw)
To: Ellié Computing Open Source Program
Cc: Pierre Habouzit, Junio C Hamano, git
On Mon, Feb 01, 2010 at 11:48:29AM +0100, Ellié Computing Open Source Program wrote:
> C:\temp\scc-tests\git>git ls-tree -r HEAD "caractère spécial"
> 100644 blob bf10a8b39e72c754ee1872fcdb13662cba6a8880 "caract\350re
> sp\351cial/\272plouf.txt"
>
> Note the spurious \272 which comes in the listing :(
> Trying again the same commands may give other spurious characters
> (each time we tried we get different _bad_ responses)
Looks like a bug. I was able to replicate it here, and valgrind notices
it, too:
==22720== Invalid read of size 1
==22720== at 0x80E77FF: next_quote_pos (quote.c:174)
==22720== by 0x80E783A: quote_c_style_counted (quote.c:215)
==22720== by 0x80E7D14: write_name_quotedpfx (quote.c:286)
==22720== by 0x80808F3: show_tree (builtin-ls-tree.c:114)
==22720== by 0x811000D: read_tree_recursive (tree.c:114)
==22720== by 0x81100E7: read_tree_recursive (tree.c:131)
==22720== by 0x8080CC2: cmd_ls_tree (builtin-ls-tree.c:173)
==22720== by 0x804B7FA: run_builtin (git.c:257)
==22720== by 0x804B958: handle_internal_command (git.c:412)
==22720== by 0x804BA2F: run_argv (git.c:454)
==22720== by 0x804BB97: main (git.c:525)
==22720== Address 0x43405b4 is 0 bytes after a block of size 20 alloc'd
==22720== at 0x4024C4C: malloc (vg_replace_malloc.c:195)
==22720== by 0x8115739: xmalloc (wrapper.c:20)
==22720== by 0x811005E: read_tree_recursive (tree.c:127)
==22720== by 0x8080CC2: cmd_ls_tree (builtin-ls-tree.c:173)
==22720== by 0x804B7FA: run_builtin (git.c:257)
==22720== by 0x804B958: handle_internal_command (git.c:412)
==22720== by 0x804BA2F: run_argv (git.c:454)
==22720== by 0x804BB97: main (git.c:525)
The patch below fixes it for me. This is the first time I've ever looked
at this code, though, so an extra set of eyes is appreciated. I'm also
not sure of the "!p[len]" termination that the loop uses (quoted in the
context below). The string is explicitly not NUL-terminated, so why
would that matter? I think that may have been covering up the bug in
some cases.
-- >8 --
Subject: [PATCH] fix invalid read in quote_c_style_counted
We progress through a length-bounded string, looking for
characters in need of quoting. After each character is
found, we output everything up until that character
literally, then the quoted character. We then advance our
string and look again. However, we never actually
decremented the length, meaning we ended up looking at
whatever random junk was stored after the string.
Signed-off-by: Jeff King <peff@peff.net>
---
quote.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/quote.c b/quote.c
index acb6bf9..392006d 100644
--- a/quote.c
+++ b/quote.c
@@ -216,20 +216,21 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
if (len == maxlen || !p[len])
break;
if (!no_dq && p == name)
EMIT('"');
EMITBUF(p, len);
EMIT('\\');
p += len;
ch = (unsigned char)*p++;
+ maxlen -= len + 1;
if (sq_lookup[ch] >= ' ') {
EMIT(sq_lookup[ch]);
} else {
EMIT(((ch >> 6) & 03) + '0');
EMIT(((ch >> 3) & 07) + '0');
EMIT(((ch >> 0) & 07) + '0');
}
}
EMITBUF(p, len);
--
1.7.0.rc1.16.g21332.dirty
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 11:32 ` Jeff King
@ 2010-02-01 12:19 ` Jeff King
2010-02-01 12:48 ` Johannes Sixt
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2010-02-01 12:19 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ellié Computing Open Source Program, Pierre Habouzit, git
On Mon, Feb 01, 2010 at 06:32:13AM -0500, Jeff King wrote:
> The patch below fixes it for me. This is the first time I've ever looked
> at this code, though, so an extra set of eyes is appreciated. I'm also
> not sure of the "!p[len]" termination that the loop uses (quoted in the
> context below). The string is explicitly not NUL-terminated, so why
> would that matter? I think that may have been covering up the bug in
> some cases.
OK, I see now. Callers can pass "-1" for the length to indicate that the
name is NUL-terminated. So my initial patch does work, but it ends up
decrementing the "-1", which happens to work because next_quote_pos just
checks that it is negative.
Here is an updated patch. The two changes are:
1. A test in t3902.
2. It converts the "-1" into strlen(name) before doing any work, which
means we can lose the test for a NUL-terminator. I think this makes
it much easier to see what is going on. And arguably we are fixing
a "bug" where embedded NULs in a length-bounded string erroneously
marked the end-of-string. In practice, I don't think it matters as
we are quoting paths, and NUL is not a valid character there.
It does come at the slight expense of running an extra strlen()
instead of discovering the length as we progress through the loop.
It's possible to fix that, but it makes the code a bit harder to
read.
Junio, I think this bug goes back to v1.5.4, which makes the fix
appropriate for 'maint'.
-- >8 --
Subject: [PATCH] fix invalid read in quote_c_style_counted
This function did not work on strings that were not
NUL-terminated. It reads through a length-bounded string,
searching for characters in need of quoting. After we find
one, we output the quoted character, then advance our
pointer to find the next one. However, we never decremented
the length, meaning we ended up looking at whatever random
junk was stored after the string.
On top of this, callers can pass in "-1" as the length to
indicate a NUL-terminated string, and we would also end our
loop upon seeing a NUL. To keep the code simple, let's just
convert "-1" cases into "strlen(name)".
This bug was not found by the existing tests because most
code paths feed a NUL-terminated string. The notable
exception is a directory name being fed by ls-tree.
Signed-off-by: Jeff King <peff@peff.net>
---
quote.c | 6 +++++-
t/t3902-quoted.sh | 19 ++++++++++++++++++-
2 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/quote.c b/quote.c
index acb6bf9..e35ae50 100644
--- a/quote.c
+++ b/quote.c
@@ -209,11 +209,14 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
size_t len, count = 0;
const char *p = name;
+ if (maxlen < 0)
+ maxlen = strlen(name);
+
for (;;) {
int ch;
len = next_quote_pos(p, maxlen);
- if (len == maxlen || !p[len])
+ if (len == maxlen)
break;
if (!no_dq && p == name)
@@ -223,6 +226,7 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
EMIT('\\');
p += len;
ch = (unsigned char)*p++;
+ maxlen -= len + 1;
if (sq_lookup[ch] >= ' ') {
EMIT(sq_lookup[ch]);
} else {
diff --git a/t/t3902-quoted.sh b/t/t3902-quoted.sh
index 5868052..14da45f 100755
--- a/t/t3902-quoted.sh
+++ b/t/t3902-quoted.sh
@@ -25,7 +25,7 @@ for_each_name () {
for name in \
Name "Name and a${LF}LF" "Name and an${HT}HT" "Name${DQ}" \
"$FN$HT$GN" "$FN$LF$GN" "$FN $GN" "$FN$GN" "$FN$DQ$GN" \
- "With SP in it"
+ "With SP in it" "caractère spécial/file"
do
eval "$1"
done
@@ -33,6 +33,7 @@ for_each_name () {
test_expect_success setup '
+ mkdir "caractère spécial" &&
for_each_name "echo initial >\"\$name\""
git add . &&
git commit -q -m Initial &&
@@ -50,6 +51,7 @@ Name
"Name and an\tHT"
"Name\""
With SP in it
+"caract\303\250re sp\303\251cial/file"
"\346\277\261\351\207\216\t\347\264\224"
"\346\277\261\351\207\216\n\347\264\224"
"\346\277\261\351\207\216 \347\264\224"
@@ -63,6 +65,7 @@ Name
"Name and an\tHT"
"Name\""
With SP in it
+caractère spécial/file
"濱野\t純"
"濱野\n純"
濱野 純
@@ -97,6 +100,13 @@ test_expect_success 'check fully quoted output from diff-tree' '
'
+test_expect_success 'check fully quoted output from ls-tree' '
+
+ git ls-tree --name-only -r HEAD >current &&
+ test_cmp expect.quoted current
+
+'
+
test_expect_success 'setting core.quotepath' '
git config --bool core.quotepath false
@@ -130,4 +140,11 @@ test_expect_success 'check fully quoted output from diff-tree' '
'
+test_expect_success 'check fully quoted output from ls-tree' '
+
+ git ls-tree --name-only -r HEAD >current &&
+ test_cmp expect.raw current
+
+'
+
test_done
--
1.7.0.rc1.16.g21332.dirty
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 12:19 ` Jeff King
@ 2010-02-01 12:48 ` Johannes Sixt
2010-02-01 13:39 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Johannes Sixt @ 2010-02-01 12:48 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, Ellié Computing Open Source Program,
Pierre Habouzit, git
Jeff King schrieb:
> @@ -209,11 +209,14 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
> size_t len, count = 0;
> const char *p = name;
>
> + if (maxlen < 0)
> + maxlen = strlen(name);
> +
> for (;;) {
> int ch;
>
> len = next_quote_pos(p, maxlen);
> - if (len == maxlen || !p[len])
> + if (len == maxlen)
> break;
>
> if (!no_dq && p == name)
> @@ -223,6 +226,7 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
> EMIT('\\');
> p += len;
> ch = (unsigned char)*p++;
> + maxlen -= len + 1;
Couldn't you just write
if (0 <= maxlen)
maxlen -= len + 1;
to avoid the strlen(), because the rest of the loop is actually OK when
maxlen == -1, isn't it?
-- Hannes
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 12:48 ` Johannes Sixt
@ 2010-02-01 13:39 ` Jeff King
2010-02-01 13:44 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2010-02-01 13:39 UTC (permalink / raw)
To: Johannes Sixt
Cc: Junio C Hamano, Ellié Computing Open Source Program,
Pierre Habouzit, git
On Mon, Feb 01, 2010 at 01:48:01PM +0100, Johannes Sixt wrote:
> Jeff King schrieb:
> > @@ -209,11 +209,14 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
> > size_t len, count = 0;
> > const char *p = name;
> >
> > + if (maxlen < 0)
> > + maxlen = strlen(name);
> > +
> > for (;;) {
> > int ch;
> >
> > len = next_quote_pos(p, maxlen);
> > - if (len == maxlen || !p[len])
> > + if (len == maxlen)
> > break;
> >
> > if (!no_dq && p == name)
> > @@ -223,6 +226,7 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
> > EMIT('\\');
> > p += len;
> > ch = (unsigned char)*p++;
> > + maxlen -= len + 1;
>
> Couldn't you just write
>
> if (0 <= maxlen)
> maxlen -= len + 1;
>
> to avoid the strlen(), because the rest of the loop is actually OK when
> maxlen == -1, isn't it?
If you want to keep the "!p[len]" condition, then yes. If we want to
properly quote internal NULs (which again, I am not sure is of practical
use), then you would also have to make that condition "(maxlen >= 0 &&
!p[len])". Which is really not that bad, but I was trying to make it
easier to read. I am OK with any of the three combinations of fixes.
And the fact that I am using the word "combination" probably means it
should be a separate patch anyway.
So here is the minimal patch to go on 'maint', for Junio's convenience.
-- >8 --
Subject: [PATCH] fix invalid read in quote_c_style_counted
This function did not work on strings that were not
NUL-terminated. It reads through a length-bounded string,
searching for characters in need of quoting. After we find
one, we output the quoted character, then advance our
pointer to find the next one. However, we never decremented
the length, meaning we ended up looking at whatever random
junk was stored after the string.
This bug was not found by the existing tests because most
code paths feed a NUL-terminated string. The notable
exception is a directory name being fed by ls-tree.
Signed-off-by: Jeff King <peff@peff.net>
---
quote.c | 2 ++
t/t3902-quoted.sh | 19 ++++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/quote.c b/quote.c
index acb6bf9..723bb4f 100644
--- a/quote.c
+++ b/quote.c
@@ -223,6 +223,8 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
EMIT('\\');
p += len;
ch = (unsigned char)*p++;
+ if (maxlen >= 0)
+ maxlen -= len + 1;
if (sq_lookup[ch] >= ' ') {
EMIT(sq_lookup[ch]);
} else {
diff --git a/t/t3902-quoted.sh b/t/t3902-quoted.sh
index 5868052..14da45f 100755
--- a/t/t3902-quoted.sh
+++ b/t/t3902-quoted.sh
@@ -25,7 +25,7 @@ for_each_name () {
for name in \
Name "Name and a${LF}LF" "Name and an${HT}HT" "Name${DQ}" \
"$FN$HT$GN" "$FN$LF$GN" "$FN $GN" "$FN$GN" "$FN$DQ$GN" \
- "With SP in it"
+ "With SP in it" "caractère spécial/file"
do
eval "$1"
done
@@ -33,6 +33,7 @@ for_each_name () {
test_expect_success setup '
+ mkdir "caractère spécial" &&
for_each_name "echo initial >\"\$name\""
git add . &&
git commit -q -m Initial &&
@@ -50,6 +51,7 @@ Name
"Name and an\tHT"
"Name\""
With SP in it
+"caract\303\250re sp\303\251cial/file"
"\346\277\261\351\207\216\t\347\264\224"
"\346\277\261\351\207\216\n\347\264\224"
"\346\277\261\351\207\216 \347\264\224"
@@ -63,6 +65,7 @@ Name
"Name and an\tHT"
"Name\""
With SP in it
+caractère spécial/file
"濱野\t純"
"濱野\n純"
濱野 純
@@ -97,6 +100,13 @@ test_expect_success 'check fully quoted output from diff-tree' '
'
+test_expect_success 'check fully quoted output from ls-tree' '
+
+ git ls-tree --name-only -r HEAD >current &&
+ test_cmp expect.quoted current
+
+'
+
test_expect_success 'setting core.quotepath' '
git config --bool core.quotepath false
@@ -130,4 +140,11 @@ test_expect_success 'check fully quoted output from diff-tree' '
'
+test_expect_success 'check fully quoted output from ls-tree' '
+
+ git ls-tree --name-only -r HEAD >current &&
+ test_cmp expect.raw current
+
+'
+
test_done
--
1.7.0.rc1.16.g21332.dirty
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 13:39 ` Jeff King
@ 2010-02-01 13:44 ` Jeff King
2010-02-01 17:21 ` Junio C Hamano
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2010-02-01 13:44 UTC (permalink / raw)
To: Johannes Sixt
Cc: Junio C Hamano, Ellié Computing Open Source Program,
Pierre Habouzit, git
On Mon, Feb 01, 2010 at 08:39:03AM -0500, Jeff King wrote:
> If you want to keep the "!p[len]" condition, then yes. If we want to
> properly quote internal NULs (which again, I am not sure is of practical
> use), then you would also have to make that condition "(maxlen >= 0 &&
> !p[len])". Which is really not that bad, but I was trying to make it
> easier to read. I am OK with any of the three combinations of fixes.
>
> And the fact that I am using the word "combination" probably means it
> should be a separate patch anyway.
>
> So here is the minimal patch to go on 'maint', for Junio's convenience.
And here is what the "quote embedded NULs" patch would look like on top.
It's actually pretty straightforward, but the more I think of it, the
more I think it is probably not worthwhile. Not only are we quoting
paths, which should not have embedded NULs, but it requires that the
caller always pass the length explicitly, and clearly we are not doing
that all or even most of the time. So while this would fix the low-level
"this function quotes an arbitrary string" case, for it to be of any use
all of the code paths leading to it would need to be audited to handle
NUL-embedded strings.
---
diff --git a/quote.c b/quote.c
index 723bb4f..fc93435 100644
--- a/quote.c
+++ b/quote.c
@@ -213,7 +213,7 @@ static size_t quote_c_style_counted(const char *name, ssize_t maxlen,
int ch;
len = next_quote_pos(p, maxlen);
- if (len == maxlen || !p[len])
+ if (len == maxlen || (maxlen < 0 && !p[len]))
break;
if (!no_dq && p == name)
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 13:44 ` Jeff King
@ 2010-02-01 17:21 ` Junio C Hamano
2010-02-01 17:40 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2010-02-01 17:21 UTC (permalink / raw)
To: Jeff King
Cc: Johannes Sixt, Ellié Computing Open Source Program,
Pierre Habouzit, git
Jeff King <peff@peff.net> writes:
> And here is what the "quote embedded NULs" patch would look like on top.
> It's actually pretty straightforward, but the more I think of it, the
> more I think it is probably not worthwhile. Not only are we quoting
> paths, which should not have embedded NULs, but it requires that the
> caller always pass the length explicitly, and clearly we are not doing
> that all or even most of the time. So while this would fix the low-level
> "this function quotes an arbitrary string" case, for it to be of any use
> all of the code paths leading to it would need to be audited to handle
> NUL-embedded strings.
Thanks; I think your analysis is very sound.
The current callers do not care (for a good reason). They are dealing
with a pathname, and either they are feeding a string on which there is a
pathname followed by something else and they know where that something
else begins (they give maxlen because they don't want to or cannot NUL
terminate the string in place while calling this function), or they know
they want to quote to the end of the string but they haven't counted how
long it is (they say "I don't care---just quote to the end"). In either
way, they don't expect giving too long a maxlen will go beyond the end of
the string past the terminating NUL.
Unless we document "this function is to C-quote a (portion of a) string,
either to the end or up to the given length", however, future callers may
incorrectly assume that with length the function can be fed anything and
would C-quote that piece of memory. The argument name "const char *name"
already suggests that is not an arbitrary binary rubbish, changing that to
"str" would probably make that a bit stronger documentation, or we could
explicitly say "this is a (early part of a) NUL-terminated string" in a
comment.
But your one-liner patch would actually be a smaller change than any of
them and makes the whole problem disappear; wouldn't it be a far better
solution?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Problem listing GIT repository with accents
2010-02-01 17:21 ` Junio C Hamano
@ 2010-02-01 17:40 ` Jeff King
0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2010-02-01 17:40 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Sixt, Ellié Computing Open Source Program,
Pierre Habouzit, git
On Mon, Feb 01, 2010 at 09:21:51AM -0800, Junio C Hamano wrote:
> Unless we document "this function is to C-quote a (portion of a) string,
> either to the end or up to the given length", however, future callers may
> incorrectly assume that with length the function can be fed anything and
> would C-quote that piece of memory. The argument name "const char *name"
> already suggests that is not an arbitrary binary rubbish, changing that to
> "str" would probably make that a bit stronger documentation, or we could
> explicitly say "this is a (early part of a) NUL-terminated string" in a
> comment.
>
> But your one-liner patch would actually be a smaller change than any of
> them and makes the whole problem disappear; wouldn't it be a far better
> solution?
Sure, if you are going to bother to document it to future-proof against
new callers, you might as well just make it more flexible with my
one-liner. I don't think it will impact the behavior of any existing
callers either way.
I was just going to not bother, but perhaps while we have spent some
brain cycles on it, it is better to just fix it. Either way is fine with
me.
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-02-01 17:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-01 10:48 Problem listing GIT repository with accents Ellié Computing Open Source Program
2010-02-01 11:32 ` Jeff King
2010-02-01 12:19 ` Jeff King
2010-02-01 12:48 ` Johannes Sixt
2010-02-01 13:39 ` Jeff King
2010-02-01 13:44 ` Jeff King
2010-02-01 17:21 ` Junio C Hamano
2010-02-01 17:40 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).