* [PATCHv3] pathspec: allow escaped query values
@ 2016-06-02 23:14 Stefan Beller
2016-06-02 23:23 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Stefan Beller @ 2016-06-02 23:14 UTC (permalink / raw)
To: gitster; +Cc: git, pclouds, ramsay, Stefan Beller
In our own .gitattributes file we have attributes such as:
*.[ch] whitespace=indent,trail,space
When querying for attributes we want to be able to ask for the exact
value, i.e.
git ls-files :(attr:whitespace=indent,trail,space)
should work, but the commas are used in the attr magic to introduce
the next attr, such that this query currently fails with
fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'
This change allows escaping characters by a backslash, such that the query
git ls-files :(attr:whitespace=indent\,trail\,space)
will match all path that have the value "indent,trail,space" for the
whitespace attribute. To accomplish this, we need to modify two places.
First `eat_long_magic` needs to not stop early upon seeing a comma or
closing paren that is escaped. As a second step we need to remove any
escaping from the attr value.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
> That would be true _only_ when "find next escape and copy up to that
> byte" aka "scanning once with optimized strchr(), and copying once
> with optimized memmove()" is faster than "scanning once and copying"
> loop.
Oh right. that would work for larger strings that don't fit into a cache
very well, but in our expected case this will do.
> I was merely reacting to your use of memmove() in a very different
> loop, where if you unescape "a\b\c\defghijk", your memmove() would
> move "efghijk" many times.
Right, at the time of writing I didn't like it, but was ok with it as
we only have a few escapes.
> Also the handling of the terminating NUL may need to be
> updated.
I don't think so.
> That is why I did not say "replace yours with this", but
> merely "along the lines of this" ;-)
This is pretty much your code modulo nits (xmalloz, semicolons) now,
and I am convinced it works by the test.
However if we add a value restriction here, we need to be as strict in the
.gitattributes parsing as well and put a warning there (similar to
invalid_attr_name_message) I would think.
Thanks,
Stefan
pathspec.c | 52 +++++++++++++++++++++++++++++++++++++----
t/t6134-pathspec-with-labels.sh | 10 ++++++++
2 files changed, 58 insertions(+), 4 deletions(-)
diff --git a/pathspec.c b/pathspec.c
index 326863a..fe53ddf 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -89,6 +89,51 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
strbuf_addf(sb, ",prefix:%d)", prefixlen);
}
+static size_t strcspn_escaped(const char *s, const char *stop)
+{
+ const char *i;
+
+ for (i = s; *i; i++) {
+ /* skip the escaped character */
+ if (i[0] == '\\' && i[1]) {
+ i++;
+ continue;
+ }
+
+ if (strchr(stop, *i))
+ break;
+ }
+ return i - s;
+}
+
+static inline int invalid_value_char(const char ch)
+{
+ if (isalnum(ch) || strchr(",-_", ch))
+ return 0;
+ return -1;
+}
+
+static char *attr_value_unescape(const char *value)
+{
+ const char *src;
+ char *dst, *ret;
+
+ ret = xmallocz(strlen(value));
+ for (src = value, dst = ret; *src; src++, dst++) {
+ if (*src == '\\') {
+ if (!src[1])
+ die(_("Escape character '\\' not allowed as "
+ "last character in attr value"));
+ src++;
+ }
+ if (*src && invalid_value_char(*src))
+ die("cannot use '%c' for value matching", *src);
+ *dst = *src;
+ }
+ *dst = '\0';
+ return ret;
+}
+
static void parse_pathspec_attr_match(struct pathspec_item *item, const char *value)
{
struct string_list_item *si;
@@ -131,10 +176,9 @@ static void parse_pathspec_attr_match(struct pathspec_item *item, const char *va
if (attr[attr_len] != '=')
am->match_mode = MATCH_SET;
else {
+ const char *v = &attr[attr_len + 1];
am->match_mode = MATCH_VALUE;
- am->value = xstrdup(&attr[attr_len + 1]);
- if (strchr(am->value, '\\'))
- die(_("attr spec values must not contain backslashes"));
+ am->value = attr_value_unescape(v);
}
break;
}
@@ -166,7 +210,7 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
for (copyfrom = elt + 2;
*copyfrom && *copyfrom != ')';
copyfrom = nextat) {
- size_t len = strcspn(copyfrom, ",)");
+ size_t len = strcspn_escaped(copyfrom, ",)");
if (copyfrom[len] == ',')
nextat = copyfrom + len + 1;
else
diff --git a/t/t6134-pathspec-with-labels.sh b/t/t6134-pathspec-with-labels.sh
index a5c9632..f1e355f 100755
--- a/t/t6134-pathspec-with-labels.sh
+++ b/t/t6134-pathspec-with-labels.sh
@@ -163,4 +163,14 @@ test_expect_success 'abort on asking for wrong magic' '
test_must_fail git ls-files . ":(attr:!label=foo)"
'
+test_expect_success 'check attribute list' '
+ cat <<-EOF >>.gitattributes &&
+ * whitespace=indent,trail,space
+ EOF
+ cat .gitattributes &&
+ git ls-files ":(attr:whitespace=indent\,trail\,space)" >actual &&
+ git ls-files >expect &&
+ test_cmp expect actual
+'
+
test_done
--
2.8.2.126.g9068a9d.dirty
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCHv3] pathspec: allow escaped query values
2016-06-02 23:14 [PATCHv3] pathspec: allow escaped query values Stefan Beller
@ 2016-06-02 23:23 ` Junio C Hamano
2016-06-02 23:41 ` Stefan Beller
0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2016-06-02 23:23 UTC (permalink / raw)
To: Stefan Beller; +Cc: git, pclouds, ramsay
Stefan Beller <sbeller@google.com> writes:
> However if we add a value restriction here, we need to be as strict in the
> .gitattributes parsing as well and put a warning there (similar to
> invalid_attr_name_message) I would think.
Remember, the attribute system is used for many purposes other than
this new "further limit pathspec". While the (hopefully temporary)
limitation hurts here, users will limit the attributes they use for
":(attr:VAR=VAL)" pathspec magic, and it is unlikely that either the
VAR part or its VALue are something the core Git currently uses
anyway.
So I do not think it is necessary or even beneficial to add such a
warning.
> +static char *attr_value_unescape(const char *value)
> +{
> + const char *src;
> + char *dst, *ret;
> +
> + ret = xmallocz(strlen(value));
> + for (src = value, dst = ret; *src; src++, dst++) {
> + if (*src == '\\') {
> + if (!src[1])
> + die(_("Escape character '\\' not allowed as "
> + "last character in attr value"));
> + src++;
> + }
> + if (*src && invalid_value_char(*src))
> + die("cannot use '%c' for value matching", *src);
> + *dst = *src;
> + }
> + *dst = '\0';
> + return ret;
> +}
Please sanity-check me. Just like I said to your original "I doubt
*i could be NUL here", I now doubt *src could be NUL there where
invalid_value_char() gets called.
If *src could be NUL there, then *dst gets NUL once, and then after
loop exits (presumably after incrementing dst), *dst gets another
NUL, which was the terminating NUL condition being iffy I mentioned,
but as you said, I do not think it would happen, so we can lose the
"*src && " before invalid_value_char() is called.
Thanks.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCHv3] pathspec: allow escaped query values
2016-06-02 23:23 ` Junio C Hamano
@ 2016-06-02 23:41 ` Stefan Beller
0 siblings, 0 replies; 3+ messages in thread
From: Stefan Beller @ 2016-06-02 23:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git@vger.kernel.org, Duy Nguyen, Ramsay Jones
On Thu, Jun 2, 2016 at 4:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> However if we add a value restriction here, we need to be as strict in the
>> .gitattributes parsing as well and put a warning there (similar to
>> invalid_attr_name_message) I would think.
>
> Remember, the attribute system is used for many purposes other than
> this new "further limit pathspec".
Right, and in the past we followed a rigid pattern of non-crazy values.
The crazy values will show up once users realize they can
put anything they want into the .gitattributes to query for files.
Before this labeling change, there was no need to put anything
other the officially documented values into the attribute files.
> So I do not think it is necessary or even beneficial to add such a
> warning.
Ok.
>
>> +static char *attr_value_unescape(const char *value)
>> +{
>> + const char *src;
>> + char *dst, *ret;
>> +
>> + ret = xmallocz(strlen(value));
>> + for (src = value, dst = ret; *src; src++, dst++) {
>> + if (*src == '\\') {
>> + if (!src[1])
>> + die(_("Escape character '\\' not allowed as "
>> + "last character in attr value"));
>> + src++;
>> + }
>> + if (*src && invalid_value_char(*src))
>> + die("cannot use '%c' for value matching", *src);
>> + *dst = *src;
>> + }
>> + *dst = '\0';
>> + return ret;
>> +}
>
> Please sanity-check me. Just like I said to your original "I doubt
> *i could be NUL here", I now doubt *src could be NUL there where
> invalid_value_char() gets called.
>
> If *src could be NUL there, then *dst gets NUL once, and then after
> loop exits (presumably after incrementing dst), *dst gets another
> NUL, which was the terminating NUL condition being iffy I mentioned,
> but as you said, I do not think it would happen, so we can lose the
> "*src && " before invalid_value_char() is called.
Right, we can lose the *src check before invalid_value_char
as that ought to be caught in if (!src[1]) die(...) before.
Thanks,
Stefan
>
> Thanks.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-06-02 23:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-02 23:14 [PATCHv3] pathspec: allow escaped query values Stefan Beller
2016-06-02 23:23 ` Junio C Hamano
2016-06-02 23:41 ` Stefan Beller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).