git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Duy Nguyen <pclouds@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCHv7 5/5] pathspec: allow querying for attributes
Date: Wed, 18 May 2016 15:31:45 -0700	[thread overview]
Message-ID: <CAGZ79ka0Of6S3AGx24jrvso=AwMsxQOnWVFZA-XWy3590JbVgA@mail.gmail.com> (raw)
In-Reply-To: <xmqq7ferrvvd.fsf@gitster.mtv.corp.google.com>

On Wed, May 18, 2016 at 12:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> + - "`+`" the attribute must be set
>> + - "`-`" the attribute must be unset
>> + - "`~`" the attribute must be unspecified

Well the '-' is in line, at least.
So you're proposing empty string for set?

gitattributes documentation:
> Sometimes you would need to override an setting of an attribute for a
> path to Unspecified state. This can be done by listing the name of the
> attribute prefixed with an exclamation point !.

So the unspecified should be '!' to match Git consistency.

[I dislike the exclamation mark because of bash.
    git ls-files ":(attr:!text)"
    bash: !text: event not found
    git ls-files ":(attr:\!text)"
    fatal: attr spec '\!text': attrs must not start with '-' and be
composed of [-A-Za-z0-9_.].
    # we error out because of the \ in there
    git ls-files :\(attr:\!text\)
    ...
    # that actually works.
]

>
> I think using these three, not the way how .gitattributes specify
> them, is highly confusing to the users.

Ok I'll align those.

>
>> + - "`?`" the attribute must not be unspecified, i.e. set, unset or value matches
>
> The choice of '?' is OK, as there is no reason .gitattributes wants
> to say something fuzzy like "this is set or unset or has value".
> The last part "set, unset or value matches" does not make sense,
> though.  Did you mean "set, unset or set to value"?

Actually I meant "this is set or unset or has [any] value".

>
>> + - an empty "`[mode]`" matches if the attribute is set or has a value
>
> The same comment as +/-/~ above applies.

So consistency would ask for

 - an empty "`[mode]`" matches if the attribute is set
 - "`-`" the attribute must be unset
 - "`!`" the attribute must be unspecified
 - an empty "`[mode]`" combined with "`[=value]`" matches if the attribute has
   the given value.

and those are not yet codified, but for discussion then:

 - "`?`" the attribute must not be unspecified, i.e. set, unset or has any value
 - "`+`" the attribute must be set or has any value

>> +
>> +             match_mode = item->attr_match[i].match_mode;
>
> Mental note: there is a one-to-one correspondence between
> attr_check->check[] and attr_match[].

Right. I guess I should add a comment for that. Or better yet a test
with some weird pathspec making use of that

    git ls-files :(attr:labelA !labelB +labelA)

That way we would see if there will be confusion between the modes
and attr names in the future.

>
>> +             if (!matched)
>> +                     return 0;
>
> So this ANDs together.  OK.

Sure, (it's clear to me, maybe I should document that as well),
because multiple pathspecs are OR.

    git ls-files :(attr:labelA) :(attr:labelB)

>
> Mental note after skiming the caller:
>
> The "value" here is like "VAR1=VAL1 VAR2 !VAR3" in a pathspec
> ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit", i.e. the source string
> to be compiled into a list of conditionals that will be evaluated by
> match_attr() in dir.c

Right.

>> +     string_list_split(&list, value, ' ', -1);
>> +     string_list_remove_empty_items(&list, 0);

Right.

>
> At this point, we got "VAR1=VAL1", "VAR2", "!VAR3" as elements in
> this list.

Right.

>
>> +     if (!item->attr_check)
>> +             item->attr_check = git_attr_check_alloc();
>
> Given more than one "attr" magic, e.g. ":(attr:A=a,attr:B)/path",
> the check may not be empty when we process the second one; we just
> extend it without losing the existing contents.

That is why I am not super happy with it though.

    ":(attr:A=a,attr:B)/path",
    ":(attr:A=a B)/path",

are the same for the user as well as in the internal data structures.
This "wastes" the white space as a possible convenient separator
character, e.g. for multiple values. On the other hand it will be easier
to type, specially for many attrs (read submodule groups).

>
>> +     ALLOC_GROW(item->attr_match, item->attr_match_nr + list.nr, item->attr_match_alloc);
>
> Likewise, we extend attr_match[] array.

>> +
>> +             val_len = strcspn(val, "=,)");
>
> I understand "=", but can "," and ")" appear here?

This was overly caution from some intermediate state, where the caller
handed in more than required.

    if (skip_prefix(copyfrom, "attr:", &body)) {
        char *pass = xmemdupz(body, len - strlen("attr:"));
        parse_pathspec_attr_match(item, pass);
        free(pass);
        continue;
    }

When given ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit"
this only passes in "VAR1=VAL1 VAR2 !VAR3".

I contemplated write the caller as

    if (skip_prefix(copyfrom, "attr:", &body)) {
        parse_pathspec_attr_match(item, body);
        continue;
    }

but that would include further characters, i.e. ',' or ')'.
(but not a lot more)

I wanted to pass in an uncounted string though, as
string_list_separate doesn't offer a variant for counted
strings.

>
>> +             if (val[val_len] == '=') {
>> +                     am->match_mode = MATCH_VALUE;
>> +                     am->value = xstrdup(&val[val_len + 1]);
>> +                     /*
>> +                      * NEEDSWORK:
>> +                      * Do we want to allow escaped commas to search
>> +                      * for comma separated values?
>> +                      */
>
> No, we don't ;-).

ok will drop comment, ...

>
>> +                     if (strchr(am->value, '\\'))
>> +                             die(_("attr spec values must not contain backslashes"));
>
> But this is a good escape hatch to reserve for our own use, until we
> decide what the quoting mechanism will be (or if it is necessary).

but keep the code here

>
>> +             } else
>> +                     am->value = NULL;
>> +
>> +             if (invalid_attr_name(val, val_len)) {
>> +                     am->match_mode = INVALID_ATTR;
>> +                     goto err;
>> +             }
>> +
>> +             am->attr = git_attr(xmemdupz(val, val_len));
>> +             git_attr_check_append(item->attr_check, am->attr);
>
> GOOD!

except for the memory leak of xmemdupz. git_attr makes another
internal copy with FLEX_ALLOC_MEM. That ill be fixed once we use
git_attr_counted though.



>
> I think val and val_len should be renamed to attr and attr_len (in
> the VARIABLE=VALUE context, these two identifiers are about parsing
> the variable side, not the value side).

ok

>
>> +     }
>> +
>> +     string_list_clear(&list, 0);
>> +     return;
>> +err:
>> +     die(_("attr spec '%s': attrs must not start with '-' and "
>> +           "be composed of [-A-Za-z0-9_.]."), value);
>
> What is "value" at this point?  If you failed to parse the second
> element in "VAR1=VAL1 FR*TZ=VAL2 !VAR3", I think you would want to
> say "'FR*TZ' is malformed".
>
> Existing users of the function seems to say this:
>
>                 if (invalid_attr_name(cp, len)) {
>                         fprintf(stderr,
>                                 "%.*s is not a valid attribute name: %s:%d\n",
>                                 len, cp, src, lineno);
>                         return NULL;
>                 }
>
> when parsing .gitattribute file.  The source file:line reference
> does not apply to this context, but it would be better to unify the
> message.  I do not mind spelling the rules out explicitly like you
> did, but I do not want to see it spread across many places (which
> forces us to update them when we have to change the rule later).
>
> Perhaps we want a helper function in attr.c side that takes the
> attribute name string (val, val_len in your code above, which I
> suggest to be renamed to attr, attr_len) and formats the error
> message into a strbuf, or something?

That's why you added the reporting function to attrs, I see.

>
> Makes me wonder what "pass" stands for.  From the look of xmemdupz(),
> given ":(attr:VAR1=VAL1 VAR2 !VAR3)path/to/limit" to parse, len
> points at ")" and xmemdupz() gives "VAR1=VAL1 VAR2 !VAR3" to it.

"what we _pass_ to the parse_pathspec_attr_match"... Maybe attr_body?

Thanks,
Stefan

  parent reply	other threads:[~2016-05-18 22:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-18 19:02 [PATCHv7 0/5] pathspec magic extension to search for attributes Stefan Beller
2016-05-18 19:02 ` [PATCHv7 1/5] string list: improve comment Stefan Beller
2016-05-18 19:02 ` [PATCHv7 2/5] Documentation: fix a typo Stefan Beller
2016-05-18 19:02 ` [PATCHv7 3/5] pathspec: move long magic parsing out of prefix_pathspec Stefan Beller
2016-05-18 19:02 ` [PATCHv7 4/5] pathspec: move prefix check out of the inner loop Stefan Beller
2016-05-18 19:02 ` [PATCHv7 5/5] pathspec: allow querying for attributes Stefan Beller
2016-05-18 19:42   ` Stefan Beller
2016-05-18 19:47   ` Junio C Hamano
2016-05-18 20:00     ` Junio C Hamano
2016-05-18 20:21     ` Junio C Hamano
2016-05-18 21:01       ` [PATCH] attr: add counted string version of git_check_attr() Junio C Hamano
2016-05-18 21:03       ` [PATCH] attr: add counted string version of git_attr() Junio C Hamano
2016-05-18 21:36         ` Stefan Beller
2016-05-18 21:05       ` [PATCH] attr: expose validity check for attribute names Junio C Hamano
2016-05-18 21:42         ` Stefan Beller
2016-05-18 22:31     ` Stefan Beller [this message]
2016-05-18 22:59       ` [PATCHv7 5/5] pathspec: allow querying for attributes Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79ka0Of6S3AGx24jrvso=AwMsxQOnWVFZA-XWy3590JbVgA@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).