git.vger.kernel.org archive mirror
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Cc: Junio C Hamano <gitster@pobox.com>, El_Hoy <eloyesp@gmail.com>,
	git@vger.kernel.org
Subject: Re: Making git grep ignore binary the default
Date: Sat, 18 Oct 2025 00:52:08 +0000
Message-ID: <aPLkuPgirAVHkERr@fruit.crustytoothpaste.net>
In-Reply-To: <0de410fa-22ef-4495-a6a9-dcd33a329201@virtuell-zuhause.de>

On 2025-10-17 at 23:29:22, Thomas Braun wrote:
> On 17.10.2025 at 23:29, Junio C Hamano wrote:
> > Simply because we have never needed to do something similar to "-a"
> > and "-I" that we added in early 2006 for the past nearly 20 years.
> > Also because GNU does not have any such thing to force "-a" or "-I"
> > as default.  The biggest reason is that it would be surprising if
> > such a change does not break existing scripts that have been written
> > by people over the years.
> 
> And if we only would have the config option "grep.ignoreBinary" defaulting
> to false with no default change whatsoever? I always want to ignore binaries
> when grepping and find it a bit tedious that I have to spell it out all over
> again. And yes I do have an alias as well but usually don't remember to use
> it.

As Junio said, this could break existing scripts.  If I write a command
which uses `git grep` and expects to find all matching files, it would
not work on your system with `grep.ignoreBinary` set to true.

For instance, suppose I am working on a project for a company and must
exclude source code with a certain vendor's copyright (because we don't
have permission to distribute their code).  It would be very bad if I
accidentally distributed that vendor's binary files because `git grep -l
PATTERN | xargs rm -f` did not match them, since that would violate the
license.

This is just an example, but there are lots of cases where people do
really want to search every file.
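
The difference can be reproduced in a throwaway repository.  This is
only a sketch: the file names and the `ExampleVendor` pattern are made
up, and the explicit `-I` stands in for what a hypothetical
`grep.ignoreBinary=true` setting would do implicitly.

```shell
#!/bin/sh
set -eu

# Scratch repository; nothing here touches an existing checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# One plain-text file and one file that Git treats as binary
# (it contains a NUL byte), both carrying the same notice.
printf 'Copyright ExampleVendor\n' > notice.txt
printf 'Copyright ExampleVendor\000\001\002\n' > blob.bin
git add notice.txt blob.bin

# Default behaviour: -l lists every tracked file with a match,
# binary or not.
default_hits=$(git grep -l 'ExampleVendor' | sort | paste -sd ' ' -)

# With -I, binary files are treated as non-matching and drop out.
text_only_hits=$(git grep -I -l 'ExampleVendor' | sort | paste -sd ' ' -)

echo "default: $default_hits"
echo "with -I: $text_only_hits"
```

A cleanup pipeline built on the second listing would leave `blob.bin`
in place, which is the situation described above.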

> I'm also curious what people are looking for in binary files with git grep.

It's common to mark PDFs or PostScript files as binary because they
often contain embedded binary fonts, but they are actually mostly text
and can be usefully searched with grep.  For instance, I once created
some awards for a non-profit based on combining standalone text-based
PostScript code along with output from groff, so those independent
pieces could end up being source that you might store in Git and search,
even if many configurations would use `*.ps -text` in a system
gitattributes file.

Sometimes you also have images or such for a website, which contain XMP
metadata (a form of XML-serialized RDF).  Finding those images which
have certain author metadata or a certain license URL embedded in them
could be valuable.
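
As a rough illustration of that kind of search, the sketch below builds
two fake "images" in a scratch repository and lists the one whose
embedded metadata mentions a license URL.  The file names, the byte
sequences, and the `example.org` URL are all invented for the example;
real XMP packets are larger and vary by tool.

```shell
#!/bin/sh
set -eu

# Scratch repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Binary-looking header bytes followed by an XMP-style packet.
# The content is a stand-in, not a valid JPEG.
printf '\377\330\377\340\000\020JFIF\000\n<x:xmpmeta><cc:license rdf:resource="https://example.org/licenses/by/4.0/"/></x:xmpmeta>\n' > logo.jpg

# A second file with the same header but no metadata.
printf '\377\330\377\340\000\020JFIF\000\nno metadata here\n' > plain.jpg
git add logo.jpg plain.jpg

# -F searches for a fixed string; the pathspec limits the search
# to *.jpg files.  Binary files are still searched by default.
licensed=$(git grep -l -F 'example.org/licenses/by/4.0' -- '*.jpg')
echo "$licensed"
```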
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA