git.vger.kernel.org archive mirror
From: <rsbecker@nexbridge.com>
To: "'brian m. carlson'" <sandals@crustytoothpaste.net>,
	"'Thomas Braun'" <thomas.braun@virtuell-zuhause.de>
Cc: "'Junio C Hamano'" <gitster@pobox.com>,
	"'El_Hoy'" <eloyesp@gmail.com>, <git@vger.kernel.org>
Subject: RE: Making git grep ignore binary the default
Date: Sat, 18 Oct 2025 10:16:52 -0400
Message-ID: <00af01dc4039$dd45e090$97d1a1b0$@nexbridge.com>
In-Reply-To: <aPLkuPgirAVHkERr@fruit.crustytoothpaste.net>

On October 17, 2025 8:52 PM, brian m. carlson wrote:
>On 2025-10-17 at 23:29:22, Thomas Braun wrote:
>> Am 17.10.2025 um 23:29 schrieb Junio C Hamano:
>> > Simply because, in the nearly 20 years since we added "-a" and "-I"
>> > in early 2006, we have never needed to do anything similar.
>> > Also because GNU grep does not have any such thing to force "-a" or "-I"
>> > as default.  The biggest reason is that it would be surprising if
>> > such a change does not break existing scripts that have been written
>> > by people over the years.
>>
>> What if we only added a config option "grep.ignoreBinary",
>> defaulting to false, with no change to the default? I always want
>> to ignore binaries when grepping and find it a bit tedious that I have
>> to spell it out all over again. And yes I do have an alias as well but
>> usually don't remember to use it.
>
>As Junio said, this could break existing scripts.  If I write a command which uses `git
>grep` and expects to find all matching files, it would not work on your system with
>`grep.ignoreBinary` set to true.
>
>For instance, if I am working on a project for a company and must exclude source
>code with a certain vendor's copyright (because we don't have permission to
>distribute their code), then it would be very bad if I accidentally distributed that
>company's binary files due to `git grep -l PATTERN | xargs rm -f` not matching them
>since it would violate the license.
>
>This is just an example, but there are lots of cases where people do really want to
>search every file.
>
>> I'm also curious what people are looking for in binary files with git grep.
>
>It's common to mark PDFs or PostScript files as binary because they often contain
>embedded binary fonts, but they are actually mostly text and can be usefully
>searched with grep.  For instance, I once created some awards for a non-profit
>based on combining standalone text-based PostScript code along with output from
>groff, so those independent pieces could end up being source that you might store
>in Git and search, even if many configurations would use `*.ps -text` in a system
>gitattributes file.
>
>Sometimes you also have images or such for a website, which contain XMP
>metadata (a form of XML-serialized RDF).  Finding those images which have certain
>author metadata or a certain license URL embedded in them could be valuable.

I agree that this would break scripts. There are quasi-binary files in some SQL
environments where it really helps that git grep searches them. Please do not
make this the default.
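
To make the script-breakage concern concrete, here is a minimal sketch showing
how `-I` changes what a `git grep -l` pipeline sees. It assumes a POSIX shell
with git and mktemp on PATH; the throwaway repository, the file names, and the
"VendorCo" pattern are hypothetical. With no gitattributes set, git falls back
to content sniffing, and a NUL byte is enough for it to treat a file as binary.

```shell
#!/bin/sh
# Sketch: compare which files `git grep -l` reports with and without -I.
# Hypothetical repo/file names; nothing here touches an existing repository.
set -e
repo=$(mktemp -d)   # throwaway directory, left in place for inspection
cd "$repo"
git init -q .

# A plain text file containing the pattern.
printf 'LICENSE: VendorCo\n' > notes.txt
# Same pattern, but the NUL byte makes git classify the file as binary.
printf 'LICENSE: VendorCo\000\001\002' > logo.bin
git add notes.txt logo.bin   # git grep searches tracked (indexed) files

echo "default (binary files included):"
git grep -l VendorCo
echo "with -I (binary files skipped):"
git grep -I -l VendorCo
```

A script such as `git grep -l PATTERN | xargs rm -f` sees both files by
default, but would silently skip logo.bin if a config option made `-I` the
implicit behavior.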

