From: Junio C Hamano <gitster@pobox.com>
To: Hamza Mahfooz <someguy@effective-light.com>
Cc: git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v11 3/3] grep: fix an edge case concerning ascii patterns and UTF-8 data
Date: Fri, 08 Oct 2021 14:26:28 -0700
Message-ID: <xmqq1r4vjji3.fsf@gitster.g>
In-Reply-To: <20211007203148.23888-3-someguy@effective-light.com> (Hamza Mahfooz's message of "Thu, 7 Oct 2021 16:31:48 -0400")

Hamza Mahfooz <someguy@effective-light.com> writes:

> If we attempt to grep non-ASCII log message text with an ASCII pattern, we
> run into the following issue:
>
>     $ git log --color --author='.var.*Bjar' -1 origin/master | grep ^Author
>     grep: (standard input): binary file matches
>
> So, to fix this, teach the grep code to mark the pattern as UTF-8 (even if
> the pattern is composed of only ASCII characters), so long as the log
> output is encoded using UTF-8.

We'd need this only if we are using the pcre2 backend, no?  If that is
the case, that fact needs to be recorded in the proposed log message
to help later developers when they wonder why this "all-the-things"
knob exists.

And if it is the case that this bit is needed only to work around a
glitch while using the pcre2 backend, I'd rather see a solution
that does not contaminate the more generic "struct grep_opt"
data and the "setup_revisions()" codepath.

In other words, can't the function compile_pcre2_pattern() make the
"is log output encoding utf8?" decision locally and act accordingly?

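A minimal, standalone sketch of what such a local decision could look
like.  It is not the actual grep.c code: every name below
(SKETCH_UTF_FLAG, sketch_has_non_ascii(), sketch_is_encoding_utf8(),
sketch_pcre2_options()) is hypothetical, the flag is only a stand-in
for PCRE2_UTF, and treating a missing encoding name as "not UTF-8" is
an assumption made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

/* Hypothetical stand-in for PCRE2_UTF; the value is illustrative only. */
#define SKETCH_UTF_FLAG 0x1u

/* Returns true if any byte in the pattern has its high bit set. */
static bool sketch_has_non_ascii(const char *s)
{
	for (; *s; s++)
		if ((unsigned char)*s & 0x80)
			return true;
	return false;
}

/*
 * Returns true if the encoding name spells UTF-8.
 * Assumption for this sketch: a NULL name counts as "not UTF-8".
 */
static bool sketch_is_encoding_utf8(const char *name)
{
	if (!name)
		return false;
	return !strcasecmp(name, "UTF-8") || !strcasecmp(name, "utf8");
}

/*
 * Decide the compile options next to the pattern compilation itself,
 * so that no extra bit has to travel through a generic options struct.
 * The pattern is treated as UTF-8 when it contains non-ASCII bytes, or
 * when the log output encoding is UTF-8 (even for an all-ASCII pattern).
 */
static uint32_t sketch_pcre2_options(const char *pattern,
				     const char *log_output_encoding)
{
	uint32_t options = 0;

	if (sketch_has_non_ascii(pattern) ||
	    sketch_is_encoding_utf8(log_output_encoding))
		options |= SKETCH_UTF_FLAG;
	return options;
}
```

With this shape, an all-ASCII pattern such as ".var.*Bjar" gets the
flag when the encoding name is "UTF-8" and does not get it for
"ISO-8859-1", and the caller never has to carry the decision around.
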
Thanks.

