* Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Tim Friske @ 2015-04-17 10:00 UTC
To: git

Hi,

I wonder why "git log -G<regexp>" works with the "regexp-ignore-case"
option but not with the other regexp-related options? Wouldn't it be
useful to make the "G<regex>" option support the following options?

* basic-regexp
* extended-regexp
* fixed-strings
* perl-regexp

Similarly, I think it is not very consistent that one cannot combine any of
the above options with "S<string>" but instead has to use yet another option
called "pickaxe-regex" to toggle between "fixed-string" and
"extended-regexp" semantics for the argument passed to option "S".

The description of the above options in the git-log(1) manpage of Git
version 2.1 does not explicitly say that they do not support the "G<regex>"
and "S<string>" options.

Wouldn't it be nice to have all of the above options collaborate with each
other?
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Michael J Gruber @ 2015-04-17 14:26 UTC
To: Tim Friske, git

Tim Friske venit, vidit, dixit 17.04.2015 12:00:
> Hi,
>
> I wonder why "git log -G<regexp>" works with the "regexp-ignore-case"
> option but not with the other regexp-related options? Wouldn't it be
> useful to make the "G<regex>" option support the following options?
>
> * basic-regexp
> * extended-regexp
> * fixed-strings
> * perl-regexp
>
> Similarly I think it is not very consistent that one cannot combine any of
> the above options with the "S<string>" but instead have yet another option
> called "pickaxe-regex" to toggle between "fixed-string" and
> "extended-regexp" semantics for the argument passed to option "S".

The defaults are different, and it is likely that users want to switch
one without switching the other.

E.g., with -S you often use strings that you'd rather not have to quote
to guard them against the regexp engine.

> The description of the above options in the git-log(1) manpage of Git
> version 2.1 do not explicitly say that they do not support the "G<regex>"
> and "S<string>" option.

They are in different sections, since --grep etc. are log options
pertaining to matching the commit header and log message (commit
object), while S and G match in the diff and are described in the diff
section (although they are commit limiting as well).

> Wouldn't it be nice to have all of the above options collaborate with each
> other?

I'm afraid it's important to keep the different defaults.

Personally, I found it surprising that --regexp-ignore-case applies to
-G at all. It turns out that it was "bolted on" retroactively - it used
to apply to commit object greps only, and was made to switch also diff
grep behaviour later, as a convenience matter.

The reason probably is that "-S" originally was directed at script usage
and turned out to be used by end users quite a bit. I'd say most of our
inconsistencies are due to convenience...

If you want to work on this, I suggest you introduce the missing long
option names such as "--grep-diff" (-G) and maybe "--grep-log" (--grep)
first and then find consistent and convenient names and defaults for the
regexp options.

Michael
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Junio C Hamano @ 2015-04-17 16:18 UTC
To: Michael J Gruber; +Cc: Tim Friske, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

>> Similarly I think it is not very consistent that one cannot combine any of
>> the above options with the "S<string>" but instead have yet another option
>> called "pickaxe-regex" to toggle between "fixed-string" and
>> "extended-regexp" semantics for the argument passed to option "S".
>
> The defaults are different, and it is likely that users want to switch
> one without switching the other.
>
> E.g., with -S you often use strings that you'd rather not have to quote
> to guard them against the regexp engine.

But the hypothetical -G that would look for a fixed string would be
vastly different from -S, wouldn't it?

The -S<string> option was invented to find a commit where one side
of the comparison has that string in the blob and the other side
does not; it shows commits where <string> appears different number
of times in the before- and the after- blobs, because doing so does
not hurt its primary use case to find commits where one side has one
instance of <string> and the other side has zero.

But -G<regexp> shows commits whose "git show $that_commit" output
would have lines matching <regexp> as added or deleted. So you get
different results from this history:

    (before)    (after)
    a           b
    b           a
    c           c

As "git show" for such a commit looks like this:

    diff --git a/one b/one
    index de98044..0c81c28 100644
    --- a/one
    +++ b/one
    @@ -1,3 +1,3 @@
    -a
     b
    +a
     c

"git log -Ga" would say it is a match. But from "git log -Sa"'s
point of view, it is not a match; both sides have the same number of
'a' [*1*].

I think it would make sense to teach --fixed-strings or whatever
option to -G just like it pays attention to ignore-case, but "-G
--fixed-strings" cannot be "-S". They have different semantics.

[Footnote]

*1* This is because -S was envisioned as (and its behaviour has been
    maintained as such) a building block for Porcelain that does
    more than "git blame". You feed a _unique_ block of lines taken
    from the current contents as the <string> to quickly find the
    last commit that touched that area, and iteratively dig deeper.
    The -S option was meant to be used for that single step of
    digging, as a part of much more grand vision in $gmane/217,
    which I would still consider one of the most important messages
    on the mailing list, posted 10 years ago ;-)
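
[Editorial illustration] The difference between the two options described
in the message above can be sketched in C. This is only a minimal model
under stated assumptions, not Git's pickaxe implementation: the function
names count_occurrences(), pickaxe_S_matches() and pickaxe_G_matches() are
hypothetical, the diff is passed in as already-computed hunk lines whose
first byte is '+', '-' or ' ', and the -G side uses a POSIX basic regexp.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in haystack. */
size_t count_occurrences(const char *haystack, const char *needle)
{
	size_t n = 0;
	size_t len = strlen(needle);
	const char *p;

	if (!len)
		return 0;
	for (p = strstr(haystack, needle); p; p = strstr(p + len, needle))
		n++;
	return n;
}

/* -S style: a match means the occurrence count differs between the blobs. */
int pickaxe_S_matches(const char *before, const char *after, const char *string)
{
	return count_occurrences(before, string) !=
	       count_occurrences(after, string);
}

/* -G style: a match means some added or removed line matches the regexp. */
int pickaxe_G_matches(const char *const *diff_lines, size_t nr,
		      const char *pattern)
{
	regex_t re;
	int found = 0;
	size_t i;

	if (regcomp(&re, pattern, REG_NOSUB))
		return 0; /* this sketch treats an invalid pattern as "no match" */
	for (i = 0; i < nr && !found; i++) {
		const char *line = diff_lines[i];

		if (line[0] != '+' && line[0] != '-')
			continue; /* context lines do not count */
		if (!regexec(&re, line + 1, 0, NULL, 0))
			found = 1;
	}
	regfree(&re);
	return found;
}
```

Fed the before/after blobs and the hunk from the example above, the -S
style check reports no match for "a" (one occurrence on each side), while
the -G style check reports a match (the lines "-a" and "+a" both match).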
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Michael J Gruber @ 2015-04-20 8:49 UTC
To: Junio C Hamano; +Cc: Tim Friske, git

Junio C Hamano venit, vidit, dixit 17.04.2015 19:45:
> On Fri, Apr 17, 2015 at 7:26 AM, Michael J Gruber
> <git@drmicha.warpmail.net> wrote:
>>>
>>> Similarly I think it is not very consistent that one cannot combine any of
>>> the above options with the "S<string>" but instead have yet another option
>>> called "pickaxe-regex" to toggle between "fixed-string" and
>>> "extended-regexp" semantics for the argument passed to option "S".
>>
>> The defaults are different, and it is likely that users want to switch
>> one without switching the other.
>>
>> E.g., with -S you often use strings that you'd rather not have to quote
>> to guard them against the regexp engine.
>
> But the hypothetical -G that would look for a fixed string would be
> vastly different from -S, wouldn't it?
>
> The -S<string> option was invented to find a commit where one side
> of the comparison has that string in the blob and the other side
> does not; it shows commits where <string> appears different number
> of times in the before- and the after- blobs, because doing so does
> not hurt its primary use case to find commits where one side has one
> instance of <string> and the other side has zero.
>
> But -G<regexp> shows commits whose "git show $that_commit" output
> would have lines matching <regexp> as added or deleted. So you get
> different results from this history:
>
>     (before)    (after)
>     a           b
>     b           a
>     c           c
>
> As "git show" for such a commit looks like this:
>
>     diff --git a/one b/one
>     index de98044..0c81c28 100644
>     --- a/one
>     +++ b/one
>     @@ -1,3 +1,3 @@
>     -a
>      b
>     +a
>      c
>
> "git log -Ga" would say it is a match. But from "git log -Sa"'s
> point of view, it is not a match; both sides have the same number of
> 'a' [*1*].
>
> I think it would make sense to teach --fixed-strings or whatever
> option to -G just like it pays attention to ignore-case, but "-G
> --fixed-strings" cannot be "-S". They have different semantics.

Of course they cannot, that's not what I meant.

They have different semantics, and *therefore* they have different
defaults, and *therefore* a user may want to switch one of them (or
--grep or --author or...) to --fixed-strings and keep the other to
--regexp.

One idea would be to make

    --regexp -S --fixed-strings -G

work the obvious way (match option affects following grep options), but
we have position independent options for most commands.

Alternatively, we could distinguish at least between two groups of
greppish operations and let them have independent modifying arguments
and defaults:

- commit header/object (--grep, --grep-reflog, --author, ...)
- diff (-S, -G)

But that would require some changes to current behavior.

> [Footnote]
>
> *1* This is because -S was envisioned as (and its behaviour has been
>     maintained as such) a building block for Porcelain that does
>     more than "git blame". You feed a _unique_ block of lines taken
>     from the current contents as the <string> to quickly find the
>     last commit that touched that area, and iteratively dig deeper.
>     The -S option was meant to be used for that single step of
>     digging, as a part of much more grand vision in $gmane/217,
>     which I would still consider one of the most important messages
>     on the mailing list, posted 10 years ago ;-)
>
>
>
> [jc: My mail provider seem to be queuing but not sending out SMTP
> outgoing traffic, so I am trying to (re)send this in an alternate route.
> If you got a duplicate of this message, my apologies.]
>
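
[Editorial illustration] The position-dependent reading floated in the
message above ("match option affects following grep options") can be
sketched as a small left-to-right parser. This is purely hypothetical: Git
does not parse its options this way, "--regexp" is used only because the
message spells it that way, and struct pickaxe_opt, enum match_mode and
parse_positional() are names invented for this sketch. The assumed
defaults are the ones stated in the thread: fixed string for -S, regexp
for -G.

```c
#include <stddef.h>
#include <string.h>

enum match_mode { MODE_DEFAULT, MODE_REGEXP, MODE_FIXED };

struct pickaxe_opt {
	char kind;             /* 'S' or 'G' */
	const char *pattern;   /* text following the option letter */
	enum match_mode mode;  /* resolved to MODE_REGEXP or MODE_FIXED */
};

/*
 * Walk the arguments left to right. A mode option applies to every
 * -S/-G that follows it until another mode option is seen; a -S/-G
 * that comes before any mode option keeps its built-in default.
 * Returns the number of entries stored in out[].
 */
size_t parse_positional(const char *const *argv, size_t argc,
			struct pickaxe_opt *out, size_t max_out)
{
	enum match_mode current = MODE_DEFAULT;
	size_t nr = 0;
	size_t i;

	for (i = 0; i < argc; i++) {
		const char *arg = argv[i];

		if (!strcmp(arg, "--regexp")) {
			current = MODE_REGEXP;
		} else if (!strcmp(arg, "--fixed-strings")) {
			current = MODE_FIXED;
		} else if (arg[0] == '-' && (arg[1] == 'S' || arg[1] == 'G') &&
			   nr < max_out) {
			out[nr].kind = arg[1];
			out[nr].pattern = arg + 2;
			if (current != MODE_DEFAULT)
				out[nr].mode = current;
			else /* built-in defaults: -S is fixed, -G is regexp */
				out[nr].mode = arg[1] == 'S' ? MODE_FIXED
							     : MODE_REGEXP;
			nr++;
		}
	}
	return nr;
}
```

With the arguments "--regexp -Sfoo --fixed-strings -Gbar" this resolves -S
to regexp matching and -G to fixed-string matching, i.e. both defaults are
flipped purely by where the mode options sit on the command line.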
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Junio C Hamano @ 2015-04-20 17:41 UTC
To: Michael J Gruber; +Cc: Tim Friske, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

> They [jc: -S and -G] have different semantics, and *therefore*
> they have different defaults, and *therefore* a user may want to
> switch one of them (or --grep or --author or...) to
> --fixed-strings and keep the other to --regexp.

Ahh, OK. And not just -S and -G, the "fields in headers" may be
something user may want to switch independently?

> One idea would be to make
>
>     --regexp -S --fixed-strings -G
>
> work the obvious way (match option affects following grep
> options),...

I understand that your idea is for options to accumulate up to what
consumes them, e.g. -S, -G, --author,..., and then get reset for the
next consumer.

I would think it is very much debatable if that way of working is
"the obvious" one, though. If I had no prior Git experience, I
would imagine that I would find it more intuitive if

    $ git log --regexp-ignore-case --author=tiM --grep=wip

showed a commit authored by Tim that is labelled with "[WIP]".

It may be tempting to expose that our underlying machinery could use
3 different regexp matching settings for header fields (i.e. author,
committer), log messages and the patch bodies somehow to the end
users, and either interpreting options position-dependently or
having separate options may be possible ways to do so. That would
give the end users full flexibility the underlying machinery offers.

I am however not yet convinced that additional complexity at the UI
level that would burden the end users is a reasonable price to pay
for such a flexibility. When was the last time you wanted to grep
for log messages case insensitively for commits authored by Tim but
wanted to hide commits authored by tim when you used the above "log"
command line or similar?
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Linus Torvalds @ 2015-04-20 18:33 UTC
To: Junio C Hamano; +Cc: Michael J Gruber, Tim Friske, git

On Mon, Apr 20, 2015 at 10:41 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Ahh, OK. And not just -S and -G, the "fields in headers" may be
> something user may want to switch independently?

So personally, I hate extra command line flags for this. I'd much
rather see us use something in the regular expression itself, and make
*that* be the way you do it, and make it be the preferred format.
Otherwise, you'll always have the issue that you want *part* to be
case-ignoring, and another entry not, and then it's just messy with
the "ignore case" being some other thing.

And we support that with perl regexps, but those are only enabled with
libpcre. I wonder if we could just make some simple pattern extension
that we make work even *without* libpcre.

IOW, instead of making people use "--regexp-ignore-case", could we just
say that we *always* support the syntax of putting "(?i)" in front
of the regexp. So that your

    git log --regexp-ignore-case --author=tiM --grep=wip

example would be

    git log --author="(?i)tiM" --grep=wip

and it would match the _author_ with ignoring case, but the
"--grep=wip" part would be an exact grep.

Right now the above already works (I think) if you:

 - build with USE_LIBPCRE

 - add that "--perl-regexp" switch.

but what I'm suggesting is that we'd make a special case for the
magical perl modifier pattern at the beginning for "(?i)", and make it
work even without USE_LIBPCRE, and without specifying "--perl-regexp".

We'd just special-case that pattern (and perhaps _only_ that special
four-byte sequence of "(?i)" at the beginning of the search string),
but perhaps we could support '(?s)' too?

Hmm? I realize that this would be theoretically an incompatible
change, but it would be very convenient and if we document it well it
might be ok. I doubt people really search for "(?i)" at the beginning
of strings _except_ if they already know about the perl syntax and
want it.

And to clarify: I don't suggest always building with libpcre. I
literally suggest having something like

        /* hacky mac-hack hack */
        if (!strncmp("(?i)", p->pattern, 4)) {
                p->pattern += 4;
                p->ignore_case = true;
        }

just in front of the regcomp() call, and nothing more fancy than that.

                             Linus
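
[Editorial illustration] A self-contained version of the prefix check
proposed above, written against POSIX <regex.h> rather than Git's grep
machinery. This is a sketch under assumptions: compile_with_inline_icase()
and matches() are hypothetical names, the pattern is compiled as a POSIX
basic regexp, and only the four-byte "(?i)" prefix is recognised.

```c
#include <regex.h>
#include <string.h>

/*
 * Compile `pattern` as a POSIX basic regexp. A leading "(?i)" is
 * stripped and turned into REG_ICASE. Returns regcomp()'s result,
 * which is 0 on success.
 */
int compile_with_inline_icase(regex_t *re, const char *pattern)
{
	int flags = REG_NOSUB;

	/* strncmp() returns 0 when the first four bytes are equal */
	if (!strncmp(pattern, "(?i)", 4)) {
		pattern += 4;
		flags |= REG_ICASE;
	}
	return regcomp(re, pattern, flags);
}

/* Return 1 if `text` matches `pattern`, 0 otherwise or on compile error. */
int matches(const char *pattern, const char *text)
{
	regex_t re;
	int hit;

	if (compile_with_inline_icase(&re, pattern))
		return 0;
	hit = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

Because the flag is derived per pattern, matches("(?i)tiM", ...) can ignore
case while matches("wip", ...) stays case sensitive in the same run, which
is the per-pattern behaviour the message argues for.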
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Junio C Hamano @ 2015-04-20 18:44 UTC
To: Linus Torvalds; +Cc: Michael J Gruber, Tim Friske, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And to clarify: I don't suggest always building with libpcre. I
> literally suggest having something like
>
>         /* hacky mac-hack hack */
>         if (!strncmp("(?i)", p->pattern, 4)) {
>                 p->pattern += 4;
>                 p->ignore_case = true;
>         }
>
> just in front of the regcomp() call, and nothing more fancy than that.

Yeah, looking at the way grep.c:compile_regexp() is structured, we
are already prepared to allow

    $ git log --grep='(?i)torvalds' --grep='Linus'

that wants to find one piece of text case insensitively while
another case sensitively in the same text (i.e. the log message
part), so per-pattern customization may be a good way to do this.
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Michael J Gruber @ 2015-04-21 8:41 UTC
To: Junio C Hamano, Linus Torvalds; +Cc: Tim Friske, git

Junio C Hamano venit, vidit, dixit 20.04.2015 20:44:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> And to clarify: I don't suggest always building with libpcre. I
>> literally suggest having something like
>>
>>         /* hacky mac-hack hack */
>>         if (!strncmp("(?i)", p->pattern, 4)) {
>>                 p->pattern += 4;
>>                 p->ignore_case = true;
>>         }
>>
>> just in front of the regcomp() call, and nothing more fancy than that.
>
> Yeah, looking at the way grep.c:compile_regexp() is structured, we
> are already prepared to allow
>
>     $ git log --grep='(?i)torvalds' --grep='Linus'
>
> that wants to find one piece of text case insensitively while
> another case sensitively in the same text (i.e. the log message
> part), so per-pattern customization may be a good way to do this.
>

And '(?f)foo' switches to fixed strings ;)

We have engine-switching options and engine-modification options. The
latter are certainly good in the expression itself. Maybe even the
former, though I don't know how to switch away from fixed-strings in
that way...

I had forgotten about pcre. Maybe switching options independently is so
unusual that "use pcre" is good enough as a solution to suggest to those
few users?

In any case, that leaves us with:

- resolve the existing inconsistencies around --regexp-ignore-case
- allow to switch the engine for all greppy operations

Maybe have all command line options apply to all greppy ops as a first
step, which allows pcre and thus '(?i)' for all fields?

Michael
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Junio C Hamano @ 2015-04-21 16:59 UTC
To: Michael J Gruber; +Cc: Linus Torvalds, Tim Friske, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

> We have engine-switching options and engine-modification options. The
> latter are certainly good in the expression itself. Maybe even the
> former, though I don't know how to switch away from fixed-strings in
> that way...

I do not think mixing matching engines in a single request makes
much sense. As the internal machinery is not even prepared to do
that, even though it is prepared to apply engine-modifications ones
to each grep term AFAIK, let's not go there.
* Re: Why does "git log -G<regex>" works with "regexp-ignore-case" but not with other regexp-related options?
From: Michael J Gruber @ 2015-04-22 9:08 UTC
To: Junio C Hamano; +Cc: Linus Torvalds, Tim Friske, git

Junio C Hamano venit, vidit, dixit 21.04.2015 18:59:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
>
>> We have engine-switching options and engine-modification options. The
>> latter are certainly good in the expression itself. Maybe even the
>> former, though I don't know how to switch away from fixed-strings in
>> that way...
>
> I do not think mixing matching engines in a single request makes
> much sense. As the internal machinery is not even prepared to do
> that, even though it is prepared to apply engine-modifications ones
> to each grep term AFAIK, let's not go there.
>

From a user perspective, we mix engines already: fixed strings for -S,
regexp for the rest (by default). The user can switch one, but not the
other. And there are options that modify both engines at the same time.
That is the kind of confusion that (triggered OP's request and that) I
would like to resolve.

Michael