* Making git grep ignore binary the default
From: El_Hoy @ 2025-10-17 15:00 UTC
To: git

I've found that there is a flag (`git grep -I`) to ignore binary
files, it works great, but I've found no way to make it the default.

It would be great to have a config for this. This way a possible
implementation implies:

- Adding a config `grep.ignoreBinary` that defaults to false, keeping
  the current default.

- Adding a flag `git grep --include-binary` to revert the default. But
  maybe the `-a, --text` flag already does that.

Also, maybe the next git version (3.0) can default to ignore-binary as
a better default.

Finally, if this makes sense, I can do my best to implement the change
in the code.

Regards.
- Eloy

* Re: Making git grep ignore binary the default
From: Junio C Hamano @ 2025-10-17 21:29 UTC
To: El_Hoy; +Cc: git

El_Hoy <eloyesp@gmail.com> writes:

> I've found that there is a flag (`git grep -I`) to ignore binary
> files, it works great, but I've found no way to make it the default.
>
> It would be great to have a config for this. This way a possible
> implementation implies:
>
> - Adding a config `grep.ignoreBinary` that defaults to false, keeping
>   the current default.
>
> - Adding a flag `git grep --include-binary` to revert the default. But
>   maybe the `-a, --text` flag already does that.
>
> Also, maybe the next git version (3.0) can default to ignore-binary as
> a better default.

I am tempted to suggest not to do any of the above.

Simply because we have never needed to do something similar to "-a"
and "-I" that we added in early 2006 for the past nearly 20 years.
Also because GNU does not have any such thing to force "-a" or "-I"
as default. The biggest reason is that it would be surprising if
such a change does not break existing scripts that have been written
by people over the years.

* Re: Making git grep ignore binary the default
From: Thomas Braun @ 2025-10-17 23:29 UTC
To: Junio C Hamano, El_Hoy; +Cc: git

On 17.10.2025 at 23:29, Junio C Hamano wrote:
> El_Hoy <eloyesp@gmail.com> writes:
>
>> I've found that there is a flag (`git grep -I`) to ignore binary
>> files, it works great, but I've found no way to make it the default.
>>
>> It would be great to have a config for this. This way a possible
>> implementation implies:
>>
>> - Adding a config `grep.ignoreBinary` that defaults to false, keeping
>>   the current default.
>>
>> - Adding a flag `git grep --include-binary` to revert the default. But
>>   maybe the `-a, --text` flag already does that.
>>
>> Also, maybe the next git version (3.0) can default to ignore-binary as
>> a better default.
> I am tempted to suggest not to do any of the above.
>
> Simply because we have never needed to do something similar to "-a"
> and "-I" that we added in early 2006 for the past nearly 20 years.
> Also because GNU does not have any such thing to force "-a" or "-I"
> as default. The biggest reason is that it would be surprising if
> such a change does not break existing scripts that have been written
> by people over the years.

And if we only would have the config option "grep.ignoreBinary" defaulting
to false with no default change whatsoever? I always want to ignore binaries
when grepping and find it a bit tedious that I have to spell it out all over
again. And yes I do have an alias as well but usually don't remember to use
it.

I'm also curious what people are looking for in binary files with git grep.

* Re: Making git grep ignore binary the default
From: brian m. carlson @ 2025-10-18 0:52 UTC
To: Thomas Braun; +Cc: Junio C Hamano, El_Hoy, git

On 2025-10-17 at 23:29:22, Thomas Braun wrote:
> On 17.10.2025 at 23:29, Junio C Hamano wrote:
> > Simply because we have never needed to do something similar to "-a"
> > and "-I" that we added in early 2006 for the past nearly 20 years.
> > Also because GNU does not have any such thing to force "-a" or "-I"
> > as default. The biggest reason is that it would be surprising if
> > such a change does not break existing scripts that have been written
> > by people over the years.
>
> And if we only would have the config option "grep.ignoreBinary" defaulting
> to false with no default change whatsoever? I always want to ignore binaries
> when grepping and find it a bit tedious that I have to spell it out all over
> again. And yes I do have an alias as well but usually don't remember to use
> it.

As Junio said, this could break existing scripts. If I write a command
which uses `git grep` and expects to find all matching files, it would
not work on your system with `grep.ignoreBinary` set to true.

For instance, if I am working on a project for a company and must
exclude source code with a certain vendor's copyright (because we don't
have permission to distribute their code), then it would be very bad if
I accidentally distributed that company's binary files due to `git grep
-l PATTERN | xargs rm -f` not matching them since it would violate the
license.

This is just an example, but there are lots of cases where people do
really want to search every file.

> I'm also curious what people are looking for in binary files with git grep.

It's common to mark PDFs or PostScript files as binary because they
often contain embedded binary fonts, but they are actually mostly text
and can be usefully searched with grep. For instance, I once created
some awards for a non-profit based on combining standalone text-based
PostScript code along with output from groff, so those independent
pieces could end up being source that you might store in Git and search,
even if many configurations would use `*.ps -text` in a system
gitattributes file.

Sometimes you also have images or such for a website, which contain XMP
metadata (a form of XML-serialized RDF). Finding those images which
have certain author metadata or a certain license URL embedded in them
could be valuable.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

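To make the concern about scripts concrete, here is a minimal sketch, assuming a POSIX shell with `git` and `mktemp` available. It builds a throwaway repository (the file names `notice.txt` and `blob.bin` are made up for illustration) and compares what `git grep -l` lists by default with what it lists under `-I`. The `grep.ignoreBinary` setting discussed in this thread is only a proposal and is not used here.

```shell
# Throwaway repository with one text file and one file containing NUL bytes.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf 'Copyright Vendor\n' > notice.txt
printf 'Copyright Vendor\n\000\001\002\n' > blob.bin
git add notice.txt blob.bin

# Default behaviour: -l lists every tracked file with a match, binary or not.
all=$(git grep -l -e 'Copyright Vendor')

# With -I, files that Git considers binary are treated as non-matching.
text_only=$(git grep -I -l -e 'Copyright Vendor')

printf 'default:\n%s\n' "$all"
printf 'with -I:\n%s\n' "$text_only"
```

In this sketch the default run lists both files while the `-I` run lists only `notice.txt`, which is why a pipeline such as `git grep -l PATTERN | xargs rm -f` would leave the binary file behind if `-I` were silently in effect.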
* RE: Making git grep ignore binary the default
From: rsbecker @ 2025-10-18 14:16 UTC
To: 'brian m. carlson', 'Thomas Braun'
Cc: 'Junio C Hamano', 'El_Hoy', git

On October 17, 2025 8:52 PM, brian m. carlson wrote:
>On 2025-10-17 at 23:29:22, Thomas Braun wrote:
>> On 17.10.2025 at 23:29, Junio C Hamano wrote:
>> > Simply because we have never needed to do something similar to "-a"
>> > and "-I" that we added in early 2006 for the past nearly 20 years.
>> > Also because GNU does not have any such thing to force "-a" or "-I"
>> > as default. The biggest reason is that it would be surprising if
>> > such a change does not break existing scripts that have been written
>> > by people over the years.
>>
>> And if we only would have the config option "grep.ignoreBinary"
>> defaulting to false with no default change whatsoever? I always want
>> to ignore binaries when grepping and find it a bit tedious that I have
>> to spell it out all over again. And yes I do have an alias as well but
>> usually don't remember to use it.
>
>As Junio said, this could break existing scripts. If I write a command which uses `git
>grep` and expects to find all matching files, it would not work on your system with
>`grep.ignoreBinary` set to true.
>
>For instance, if I am working on a project for a company and must exclude source
>code with a certain vendor's copyright (because we don't have permission to
>distribute their code), then it would be very bad if I accidentally distributed that
>company's binary files due to `git grep -l PATTERN | xargs rm -f` not matching them
>since it would violate the license.
>
>This is just an example, but there are lots of cases where people do really want to
>search every file.
>
>> I'm also curious what people are looking for in binary files with git grep.
>
>It's common to mark PDFs or PostScript files as binary because they often contain
>embedded binary fonts, but they are actually mostly text and can be usefully
>searched with grep. For instance, I once created some awards for a non-profit
>based on combining standalone text-based PostScript code along with output from
>groff, so those independent pieces could end up being source that you might store
>in Git and search, even if many configurations would use `*.ps -text` in a system
>gitattributes file.
>
>Sometimes you also have images or such for a website, which contain XMP
>metadata (a form of XML-serialized RDF). Finding those images which have certain
>author metadata or a certain license URL embedded in them could be valuable.

I agree that this will break scripts. There are quasi-binary files in some SQL
spaces that really benefit from git grep working. Please do not make this the
default.

* Re: Making git grep ignore binary the default
From: Thomas Braun @ 2025-10-20 15:24 UTC
To: brian m. carlson; +Cc: Junio C Hamano, El_Hoy, git, Jeff King

On Saturday, 18.10.2025 at 00:52 +0000, brian m. carlson wrote:
> On 2025-10-17 at 23:29:22, Thomas Braun wrote:
> > On 17.10.2025 at 23:29, Junio C Hamano wrote:
> > > Simply because we have never needed to do something similar to "-a"
> > > and "-I" that we added in early 2006 for the past nearly 20 years.
> > > Also because GNU does not have any such thing to force "-a" or "-I"
> > > as default. The biggest reason is that it would be surprising if
> > > such a change does not break existing scripts that have been written
> > > by people over the years.
> >
> > And if we only would have the config option "grep.ignoreBinary"
> > defaulting to false with no default change whatsoever? I always want
> > to ignore binaries when grepping and find it a bit tedious that I
> > have to spell it out all over again. And yes I do have an alias as
> > well but usually don't remember to use it.
>
> As Junio said, this could break existing scripts. If I write a
> command which uses `git grep` and expects to find all matching files,
> it would not work on your system with `grep.ignoreBinary` set to
> true.
>
> For instance, if I am working on a project for a company and must
> exclude source code with a certain vendor's copyright (because we
> don't have permission to distribute their code), then it would be
> very bad if I accidentally distributed that company's binary files
> due to `git grep -l PATTERN | xargs rm -f` not matching them since it
> would violate the license.
>
> This is just an example, but there are lots of cases where people do
> really want to search every file.

I understand your use case. But if you don't control the environment
(git config settings among others) your task of finding things reliably
will just very easily break.

Also in your use case, I either opted in to ignoring binary files, so I
should be wary of scripts assuming binary files are searched, or I did
not and then nothing changes.

> > I'm also curious what people are looking for in binary files with
> > git grep.
>
> It's common to mark PDFs or PostScript files as binary because they
> often contain embedded binary fonts, but they are actually mostly
> text and can be usefully searched with grep. For instance, I once
> created some awards for a non-profit based on combining standalone
> text-based PostScript code along with output from groff, so those
> independent pieces could end up being source that you might store in
> Git and search, even if many configurations would use `*.ps -text` in
> a system gitattributes file.
>
> Sometimes you also have images or such for a website, which contain
> XMP metadata (a form of XML-serialized RDF). Finding those images
> which have certain author metadata or a certain license URL embedded
> in them could be valuable.

Thanks for the examples.

The previous discussion dug up by Junio and Peff was an interesting
read. But from my understanding adding a git attribute like grep, which
allows ignoring "uninteresting" files for grep, does not solve your
backward compatibility concerns. Changing that looks like it would have
been easier in 2012 compared to 2025 ;)

* Re: Making git grep ignore binary the default
From: El_Hoy @ 2025-10-20 17:20 UTC
To: Thomas Braun; +Cc: brian m. carlson, Junio C Hamano, git, Jeff King

Ok, so if I understand correctly:

1. Changing the default grep behaviour is not acceptable because it
   might break existing scripts.
2. Adding a config option might break a shared script on specific
   computers but that seems more reasonable.
3. There may be better ways to implement the setting that allow more
   flexibility.

Regarding point 1, I thought about this as a possible idea for a
distant future with proper warnings, because I think this is a better
default, but if the cost of making such a change is too big, we can
omit this. If someone wants to come back to this, it might make sense
to "grep" public code and check how much code would be affected to have
more clarity about the costs.

On point 2, as Thomas points out, there are many factors that might
break a script that relies on 'git grep' directly for a dangerous task.
This makes me think that we could add a `--porcelain` option to `git
grep` to be used in scripts and be reliable, and it might ignore the
config.

On point 3, the configuration could be made with more flexibility in
mind, making it possible to ignore different files that are not binary
(for example linguist-generated files). The downside of that approach
is that it requires more configuration, while a single boolean for
skipping binaries might be simpler. I'm ok with any approach.

That said, it seems important to add a flag to negate that setting for
a single run, so if I have the setting to skip from grep some files,
there should be a way to run grep on all the files, ignoring this
setting, as it is also needed from time to time.

Regards.
--- Eloy

* Re: Making git grep ignore binary the default
From: Jeff King @ 2025-10-21 7:27 UTC
To: El_Hoy; +Cc: Thomas Braun, brian m. carlson, Junio C Hamano, git

On Mon, Oct 20, 2025 at 02:20:06PM -0300, El_Hoy wrote:

> On point 2, as Thomas points out, there are many factors that might
> break a script that relies on 'git grep' directly for a dangerous task.
> This makes me think that we could add a `--porcelain` option to `git
> grep` to be used in scripts and be reliable, and it might ignore the
> config.

Another option here is to provide a way for scripts to override the
ignore mechanism specifically (which would depend on how it is
implemented). For an example, see below.

> On point 3, the configuration could be made with more flexibility in
> mind, making it possible to ignore different files that are not binary
> (for example linguist-generated files). The downside of that approach
> is that it requires more configuration, while a single boolean for
> skipping binaries might be simpler. I'm ok with any approach.

One way to do this would be to provide a default pathspec for git-grep
when one is not defined. Something like:

diff --git a/builtin/grep.c b/builtin/grep.c
index 13841fbf00..7b6a6ba9c6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -42,6 +42,7 @@ static char const * const grep_usage[] = {
 };
 
 static int recurse_submodules;
+static struct strvec default_pathspec = STRVEC_INIT;
 
 static int num_threads;
 
@@ -320,6 +321,15 @@ static int grep_cmd_config(const char *var, const char *value,
 	if (!strcmp(var, "submodule.recurse"))
 		recurse_submodules = git_config_bool(var, value);
 
+	if (!strcmp(var, "grep.defaultpathspec")) {
+		if (!value)
+			return config_error_nonbool(var);
+		else if (*value)
+			strvec_push(&default_pathspec, value);
+		else
+			strvec_clear(&default_pathspec);
+	}
+
 	return st;
 }
 
@@ -1169,7 +1179,7 @@ int cmd_grep(int argc,
 	parse_pathspec(&pathspec, 0,
 		       PATHSPEC_PREFER_CWD |
 		       (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
-		       prefix, argv + i);
+		       prefix, i < argc ? argv + i : default_pathspec.v);
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 	pathspec.recurse_submodules = !!recurse_submodules;

Building with that lets you do something like this in git.git:

  $ ./git grep 'added by us:'
  po/bg.po:msgid "added by us:"
  po/ca.po:msgid "added by us:"
  po/de.po:msgid "added by us:"
  po/el.po:msgid "added by us:"
  po/es.po:msgid "added by us:"
  po/fr.po:msgid "added by us:"
  po/ga.po:msgid "added by us:"
  po/id.po:msgid "added by us:"
  po/it.po:msgid "added by us:"
  po/ko.po:msgid "added by us:"
  po/pl.po:msgid "added by us:"
  po/pt_PT.po:msgid "added by us:"
  po/ru.po:msgid "added by us:"
  po/sv.po:msgid "added by us:"
  po/tr.po:msgid "added by us:"
  po/uk.po:msgid "added by us:"
  po/vi.po:msgid "added by us:"
  po/zh_CN.po:msgid "added by us:"
  po/zh_TW.po:msgid "added by us:"
  t/t7060-wtstatus.sh:	added by us: sub_second.txt
  wt-status.c:		return _("added by us:");

  $ git config grep.defaultPathspec :^po
  $ ./git grep 'added by us:'
  t/t7060-wtstatus.sh:	added by us: sub_second.txt
  wt-status.c:		return _("added by us:");

And then scripts override it by providing a pathspec (like "." if they
want to see everything, which conveniently also works on old versions
of Git).

It isn't _quite_ the same as an option to ignore certain paths, as it's
a default replacement, and not additive (so as soon as I ask for
everything in "foo/", then "foo/bar" will be included even if I have
"^foo/bar" in my default pathspec). I'm not sure if that is a drawback
or a feature.

There may be other rough edges. It's not something I've thought that
carefully about yet. But it just gives an idea of a possible direction.

Of course you can already do the same thing with an alias right now[1].
You just need to remember to type the alias instead of "grep". That
requires some finger retraining, but it would eliminate any script /
compatibility questions.

-Peff

[1] The alias isn't quite trivial because we want to add our pathspecs
    at the _end_ of the command-line. But I think something like:

      [alias]
      gr = "!f() { exec git grep \"$@\" :^po; }; f"

    works.

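The "default replacement, not additive" behaviour described in the message above can be sketched in plain shell. This is only an illustration under stated assumptions: `effective_pathspec` and the `GREP_DEFAULT_PATHSPEC` variable are hypothetical names standing in for the `grep.defaultPathspec` key from the patch, which is a proposal and not part of released Git. Using "." when no default is set is a rough stand-in for passing no pathspec at all.

```shell
# Hypothetical stand-in for the proposed grep.defaultPathspec setting.
GREP_DEFAULT_PATHSPEC=':^po'

# Print the pathspec that would be used, one element per line.
# Mirrors `i < argc ? argv + i : default_pathspec.v` from the patch:
# the default applies only when the caller passes no pathspec at all.
effective_pathspec() {
    if [ "$#" -gt 0 ]; then
        # Any explicit pathspec replaces the default entirely.
        printf '%s\n' "$@"
    else
        # No pathspec given: fall back to the default, or "." if unset/empty.
        printf '%s\n' "${GREP_DEFAULT_PATHSPEC:-.}"
    fi
}

effective_pathspec        # no pathspec given, so the default is used
effective_pathspec po/    # explicit pathspec, default is dropped
effective_pathspec .      # a script asking for everything
```

By contrast, the alias in footnote [1] always appends `:^po`, so its exclusion stays in force even when other pathspecs are given on the command line.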
* Re: Making git grep ignore binary the default
From: Jeff King @ 2025-10-18 10:22 UTC
To: Junio C Hamano; +Cc: El_Hoy, git

On Fri, Oct 17, 2025 at 02:29:46PM -0700, Junio C Hamano wrote:

> Simply because we have never needed to do something similar to "-a"
> and "-I" that we added in early 2006 for the past nearly 20 years.
> Also because GNU does not have any such thing to force "-a" or "-I"
> as default. The biggest reason is that it would be surprising if
> such a change does not break existing scripts that have been written
> by people over the years.

I do think there is one difference between git-grep and regular grep
here: the input file selection.

In "grep", the default set of files to search is nothing, and you have
to tell it which files to look at. So aside from overly broad globs,
the problem solves itself when you just don't pass in the binary paths.

But in git-grep, the default set of files to search is everything in
the repository! So it is very easy to get noisy hits from uninteresting
files.

I think binary-ness of the files is a red herring, though. There are
plenty of text files that are not interesting to grep either. I almost
never want to see hits from po/ in git.git, for example. I get by with
"^po/", or even "'*.c'" (extra single-quotes so that Git expands the
glob). But I'd be happy if I could set a configuration knob to say that
files with attribute X should be omitted from grep results (whether
binary, or some custom attribute that I assign in
.git/info/attributes).

I think we've discussed this before, and digging in the archive found
this thread from 2012:

  https://lore.kernel.org/git/4f1d2a8b.a2d8320a.50ec.576d@mx.google.com/

I think some of those ideas came to fruition. You can do:

  git grep ':(attr:!binary)'

now (which obviously is harder than "-I", but the point is that it
extends to any attribute if you want). But I still think it would be
nice if there was a way to make it the default (without using an alias).

-Peff

* Re: Making git grep ignore binary the default
From: Junio C Hamano @ 2025-10-18 16:01 UTC
To: Jeff King; +Cc: El_Hoy, git

Jeff King <peff@peff.net> writes:

> I think we've discussed this before, and digging in the archive found
> this thread from 2012:
>
>   https://lore.kernel.org/git/4f1d2a8b.a2d8320a.50ec.576d@mx.google.com/
>
> I think some of those ideas came to fruition. You can do:
>
>   git grep ':(attr:!binary)'
>
> now (which obviously is harder than "-I", but the point is that it
> extends to any attribute if you want). But I still think it would be
> nice if there was a way to make it the default (without using an alias).

Yeah, after I re-read the thread, I specially liked the "filetype"
idea that you floated in

  https://lore.kernel.org/git/20120125214625.GA4666@sigill.intra.peff.net/

;-).