* [PATCHv2] pathspec: allow escaped query values
@ 2016-06-02 21:30 Stefan Beller
2016-06-02 21:54 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Beller @ 2016-06-02 21:30 UTC (permalink / raw)
To: gitster; +Cc: git, pclouds, ramsay, Stefan Beller
In our own .gitattributes file we have attributes such as:
*.[ch] whitespace=indent,trail,space
When querying for attributes we want to be able to ask for the exact
value, i.e.
git ls-files :(attr:whitespace=indent,trail,space)
should work, but the commas are used in the attr magic to introduce
the next attr, such that this query currently fails with
fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'
This change allows escaping characters by a backslash, such that the query
git ls-files :(attr:whitespace=indent\,trail\,space)
will match all path that have the value "indent,trail,space" for the
whitespace attribute. To accomplish this, we need to modify two places.
First `eat_long_magic` needs to not stop early upon seeing a comma or
closing paren that is escaped. As a second step we need to remove any
escaping from the attr value.
Signed-off-by: Stefan Beller <sbeller@google.com>
---
So here is the "escaping only, but escaping done right" version.
(It goes on top of sb/pathspec-label)
Thanks,
Stefan
pathspec.c | 44 +++++++++++++++++++++++++++++++++++++----
t/t6134-pathspec-with-labels.sh | 23 +++++++++++++++++++++
2 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/pathspec.c b/pathspec.c
index 326863a..45bd775 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -89,6 +89,43 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
strbuf_addf(sb, ",prefix:%d)", prefixlen);
}
+static size_t strcspn_escaped(const char *s, const char *reject)
+{
+ const char *i, *j;
+
+ for (i = s; *i; i++) {
+ /* skip escaped the character */
+ if (i[0] == '\\' && i[1]) {
+ i++;
+ continue;
+ }
+ /* see if any of the chars matches the current character */
+ for (j = reject; *j; j++)
+ if (!*i || *i == *j)
+ return i - s;
+ }
+ return i - s;
+}
+
+static char *attr_value_unescape(const char *value)
+{
+ char *i, *ret = xstrdup(value);
+
+ for (i = ret; *i; i++) {
+ if (i[0] == '\\') {
+ if (!i[1])
+ die(_("Escape character '\\' not allowed as "
+ "last character in attr value"));
+
+ /* remove the backslash */
+ memmove(i, i + 1, strlen(i));
+ /* and ignore the character after that */
+ i++;
+ }
+ }
+ return ret;
+}
+
static void parse_pathspec_attr_match(struct pathspec_item *item, const char *value)
{
struct string_list_item *si;
@@ -131,10 +168,9 @@ static void parse_pathspec_attr_match(struct pathspec_item *item, const char *va
if (attr[attr_len] != '=')
am->match_mode = MATCH_SET;
else {
+ const char *v = &attr[attr_len + 1];
am->match_mode = MATCH_VALUE;
- am->value = xstrdup(&attr[attr_len + 1]);
- if (strchr(am->value, '\\'))
- die(_("attr spec values must not contain backslashes"));
+ am->value = attr_value_unescape(v);
}
break;
}
@@ -166,7 +202,7 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
for (copyfrom = elt + 2;
*copyfrom && *copyfrom != ')';
copyfrom = nextat) {
- size_t len = strcspn(copyfrom, ",)");
+ size_t len = strcspn_escaped(copyfrom, ",)");
if (copyfrom[len] == ',')
nextat = copyfrom + len + 1;
else
diff --git a/t/t6134-pathspec-with-labels.sh b/t/t6134-pathspec-with-labels.sh
index a5c9632..cbea858 100755
--- a/t/t6134-pathspec-with-labels.sh
+++ b/t/t6134-pathspec-with-labels.sh
@@ -163,4 +163,27 @@ test_expect_success 'abort on asking for wrong magic' '
test_must_fail git ls-files . ":(attr:!label=foo)"
'
+test_expect_success 'check attribute list' '
+ cat <<-EOF >>.gitattributes &&
+ * whitespace=indent,trail,space
+ EOF
+ cat .gitattributes &&
+ git ls-files ":(attr:whitespace=indent\,trail\,space)" >actual &&
+ git ls-files >expect &&
+ test_cmp expect actual
+'
+test_expect_success 'wrong escaping caught' '
+ # Pass one backslash to git to fail with a missing closing paren
+ test_must_fail git ls-files ":(attr:marked-with-backslash=\\)" 2>actual &&
+ test_i18ngrep Missing actual
+'
+test_expect_success 'check escaped backslash' '
+ cat <<-EOF >>.gitattributes &&
+ /sub/* marked-with-backslash=\\
+ EOF
+ git ls-files ":(attr:marked-with-backslash=\\\\)" >actual &&
+ git ls-files sub/ >expect &&
+ test_cmp expect actual
+'
+
test_done
--
2.8.2.126.gaa5c87d.dirty
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCHv2] pathspec: allow escaped query values
2016-06-02 21:30 [PATCHv2] pathspec: allow escaped query values Stefan Beller
@ 2016-06-02 21:54 ` Junio C Hamano
2016-06-02 22:10 ` Stefan Beller
2016-06-02 22:53 ` Stefan Beller
0 siblings, 2 replies; 5+ messages in thread
From: Junio C Hamano @ 2016-06-02 21:54 UTC (permalink / raw)
To: Stefan Beller; +Cc: git, pclouds, ramsay
Stefan Beller <sbeller@google.com> writes:
> In our own .gitattributes file we have attributes such as:
>
> *.[ch] whitespace=indent,trail,space
>
> When querying for attributes we want to be able to ask for the exact
> value, i.e.
>
> git ls-files :(attr:whitespace=indent,trail,space)
>
> should work, but the commas are used in the attr magic to introduce
> the next attr, ...
> ...
> So here is the "escaping only, but escaping done right" version.
> (It goes on top of sb/pathspec-label)
The phrase "should work" is probably a bit too strong (I'd have said
"it would be nice if this worked"), as we do not have to even
support comma for our immediately expected use cases. Allowing it
merely makes a casual test using our own .gitattributes easier.
> +static size_t strcspn_escaped(const char *s, const char *reject)
Perhaps s/reject/stop/?
> +{
> + const char *i, *j;
> +
> + for (i = s; *i; i++) {
> + /* skip escaped the character */
> + if (i[0] == '\\' && i[1]) {
> + i++;
> + continue;
> + }
> + /* see if any of the chars matches the current character */
> + for (j = reject; *j; j++)
> + if (!*i || *i == *j)
> + return i - s;
I somehow doubt that *i can be NUL here. In any case, this looks
more like
/* is *i is one of the stop codon? */
if (strchr(stop, *i))
break;
> + }
> + return i - s;
> +}
> +static char *attr_value_unescape(const char *value)
> +{
> + char *i, *ret = xstrdup(value);
> +
> + for (i = ret; *i; i++) {
> + if (i[0] == '\\') {
> + if (!i[1])
> + die(_("Escape character '\\' not allowed as "
> + "last character in attr value"));
> +
> + /* remove the backslash */
> + memmove(i, i + 1, strlen(i));
> + /* and ignore the character after that */
> + i++;
> + }
> + }
> + return ret;
> +}
> +
Repeated memmove() and strlen() somehow bothers me. Would there be
a more efficient and straight-forward way to do this, perhaps along
the lines of this instead?
const char *src;
char *dst, *ret;
ret = xmalloc(strlen(value));
for (src = value, dst = ret; *src; src++, dst++) {
if (*src == '\\') {
if (!src[1])
die();
src++;
}
if (*src && invalid_value_char(*src))
die("cannot use '%c' for value matching", *src)
*dst = *src;
}
*dst = '\0'
return ret;
Even though I earlier said "Now we have an unquote mechanism, we can
open the floodgate by lifting the "no backslash in value" check, I
now realize that we do want to keep some escape hatch for us to
extend the quoting syntax even more later, so perhaps with something
like this:
static inline int invalid_value_char(const char ch)
{
if (isalnum(ch) || strchr(",-_", ch))
return 0;
return -1;
}
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCHv2] pathspec: allow escaped query values
2016-06-02 21:54 ` Junio C Hamano
@ 2016-06-02 22:10 ` Stefan Beller
2016-06-02 22:53 ` Stefan Beller
1 sibling, 0 replies; 5+ messages in thread
From: Stefan Beller @ 2016-06-02 22:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git@vger.kernel.org, Duy Nguyen, Ramsay Jones
On Thu, Jun 2, 2016 at 2:54 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> In our own .gitattributes file we have attributes such as:
>>
>> *.[ch] whitespace=indent,trail,space
>>
>> When querying for attributes we want to be able to ask for the exact
>> value, i.e.
>>
>> git ls-files :(attr:whitespace=indent,trail,space)
>>
>> should work, but the commas are used in the attr magic to introduce
>> the next attr, ...
>> ...
>> So here is the "escaping only, but escaping done right" version.
>> (It goes on top of sb/pathspec-label)
>
> The phrase "should work" is probably a bit too strong (I'd have said
> "it would be nice if this worked"), as we do not have to even
> support comma for our immediately expected use cases. Allowing it
> merely makes a casual test using our own .gitattributes easier.
>
>> +static size_t strcspn_escaped(const char *s, const char *reject)
>
> Perhaps s/reject/stop/?
I took the signature from the strcspn man page, and my version of the
man page has `reject` there, `stop` certainly works too
>
>> +{
>> + const char *i, *j;
>> +
>> + for (i = s; *i; i++) {
>> + /* skip escaped the character */
>> + if (i[0] == '\\' && i[1]) {
>> + i++;
>> + continue;
>> + }
>> + /* see if any of the chars matches the current character */
>> + for (j = reject; *j; j++)
>> + if (!*i || *i == *j)
>> + return i - s;
>
> I somehow doubt that *i can be NUL here. In any case, this looks
> more like
>
> /* is *i is one of the stop codon? */
> if (strchr(stop, *i))
> break;
right, that seems easier.
>
>> + }
>> + return i - s;
>> +}
>
>> +static char *attr_value_unescape(const char *value)
>> +{
>> + char *i, *ret = xstrdup(value);
>> +
>> + for (i = ret; *i; i++) {
>> + if (i[0] == '\\') {
>> + if (!i[1])
>> + die(_("Escape character '\\' not allowed as "
>> + "last character in attr value"));
>> +
>> + /* remove the backslash */
>> + memmove(i, i + 1, strlen(i));
>> + /* and ignore the character after that */
>> + i++;
>> + }
>> + }
>> + return ret;
>> +}
>> +
>
> Repeated memmove() and strlen() somehow bothers me. Would there be
> a more efficient and straight-forward way to do this, perhaps along
> the lines of this instead?
>
> const char *src;
> char *dst, *ret;
>
> ret = xmalloc(strlen(value));
> for (src = value, dst = ret; *src; src++, dst++) {
> if (*src == '\\') {
> if (!src[1])
> die();
> src++;
> }
> if (*src && invalid_value_char(*src))
> die("cannot use '%c' for value matching", *src)
> *dst = *src;
> }
> *dst = '\0'
> return ret;
>
> Even though I earlier said "Now we have an unquote mechanism, we can
> open the floodgate by lifting the "no backslash in value" check, I
> now realize that we do want to keep some escape hatch for us to
> extend the quoting syntax even more later, so perhaps with something
> like this:
>
> static inline int invalid_value_char(const char ch)
> {
> if (isalnum(ch) || strchr(",-_", ch))
> return 0;
> return -1;
> }
Makes sense.
Later on we could have : or ; for an and/or of the values and
parens and such, so the invalid_value_char makes sense.
>
> Thanks.
Thanks,
Stefan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCHv2] pathspec: allow escaped query values
2016-06-02 21:54 ` Junio C Hamano
2016-06-02 22:10 ` Stefan Beller
@ 2016-06-02 22:53 ` Stefan Beller
2016-06-02 23:01 ` Junio C Hamano
1 sibling, 1 reply; 5+ messages in thread
From: Stefan Beller @ 2016-06-02 22:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git@vger.kernel.org, Duy Nguyen, Ramsay Jones
On Thu, Jun 2, 2016 at 2:54 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> In our own .gitattributes file we have attributes such as:
>>
>> *.[ch] whitespace=indent,trail,space
>>
>> When querying for attributes we want to be able to ask for the exact
>> value, i.e.
>>
>> git ls-files :(attr:whitespace=indent,trail,space)
>>
>> should work, but the commas are used in the attr magic to introduce
>> the next attr, ...
>> ...
>> So here is the "escaping only, but escaping done right" version.
>> (It goes on top of sb/pathspec-label)
>
> The phrase "should work" is probably a bit too strong (I'd have said
> "it would be nice if this worked"), as we do not have to even
> support comma for our immediately expected use cases. Allowing it
> merely makes a casual test using our own .gitattributes easier.
>
>> +static size_t strcspn_escaped(const char *s, const char *reject)
>
> Perhaps s/reject/stop/?
>
>> +{
>> + const char *i, *j;
>> +
>> + for (i = s; *i; i++) {
>> + /* skip escaped the character */
>> + if (i[0] == '\\' && i[1]) {
>> + i++;
>> + continue;
>> + }
>> + /* see if any of the chars matches the current character */
>> + for (j = reject; *j; j++)
>> + if (!*i || *i == *j)
>> + return i - s;
>
> I somehow doubt that *i can be NUL here. In any case, this looks
> more like
>
> /* is *i is one of the stop codon? */
> if (strchr(stop, *i))
> break;
>
>> + }
>> + return i - s;
>> +}
>
>> +static char *attr_value_unescape(const char *value)
>> +{
>> + char *i, *ret = xstrdup(value);
>> +
>> + for (i = ret; *i; i++) {
>> + if (i[0] == '\\') {
>> + if (!i[1])
>> + die(_("Escape character '\\' not allowed as "
>> + "last character in attr value"));
>> +
>> + /* remove the backslash */
>> + memmove(i, i + 1, strlen(i));
>> + /* and ignore the character after that */
>> + i++;
>> + }
>> + }
>> + return ret;
>> +}
>> +
>
> Repeated memmove() and strlen() somehow bothers me. Would there be
> a more efficient and straight-forward way to do this, perhaps along
> the lines of this instead?
Thinking about efficiency, I have the believe that memmove can be faster
than a `*src=*dst` thing we do ourselves as it may have access to specialized
assembly instructions to move larger chunks of memory or such.
So I think ideally we would do a block copy between the escape characters
(sketched as:)
last = input
while input not ended:
current = find next escape character in input
memcpy from input value in the range of last to current
last = current + 1
copy remaining parts if no further escape is found
It doesn't seem worth the effort to get it right though.
>
> const char *src;
> char *dst, *ret;
>
> ret = xmalloc(strlen(value));
xmallocz at least?
> for (src = value, dst = ret; *src; src++, dst++) {
> if (*src == '\\') {
> if (!src[1])
> die();
> src++;
> }
> if (*src && invalid_value_char(*src))
> die("cannot use '%c' for value matching", *src)
> *dst = *src;
> }
> *dst = '\0'
> return ret;
>
Thanks,
Stefan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCHv2] pathspec: allow escaped query values
2016-06-02 22:53 ` Stefan Beller
@ 2016-06-02 23:01 ` Junio C Hamano
0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2016-06-02 23:01 UTC (permalink / raw)
To: Stefan Beller; +Cc: git@vger.kernel.org, Duy Nguyen, Ramsay Jones
Stefan Beller <sbeller@google.com> writes:
> Thinking about efficiency, I have the believe that memmove can be faster
> than a `*src=*dst` thing we do ourselves as it may have access to specialized
> assembly instructions to move larger chunks of memory or such.
> So I think ideally we would do a block copy between the escape characters
> (sketched as:)
>
> last = input
> while input not ended:
> current = find next escape character in input
> memcpy from input value in the range of last to current
> last = current + 1
> copy remaining parts if no further escape is found
That would be true _only_ when "find next escape and copy up to that
byte" aka "scanning once with optimized strchr(), and copying once
with optimized memmove()" is faster than "scanning once and copying"
loop.
I was merely reacting to your use of memmove() in a very different
loop, where if you unescape "a\b\c\defghijk", your memmove() would
move "efghijk" many times.
>> ret = xmalloc(strlen(value));
>
> xmallocz at least?
Yes. Also the handling of the terminating NUL may need to be
updated. That is why I did not say "replace yours with this", but
merely "along the lines of this" ;-)
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-06-02 23:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-02 21:30 [PATCHv2] pathspec: allow escaped query values Stefan Beller
2016-06-02 21:54 ` Junio C Hamano
2016-06-02 22:10 ` Stefan Beller
2016-06-02 22:53 ` Stefan Beller
2016-06-02 23:01 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).