git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Joel Holdsworth <jholdsworth@nvidia.com>
Cc: git@vger.kernel.org,
	Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com>,
	Dorgon Chang <dorgonman@hotmail.com>,
	Joachim Kuebart <joachim.kuebart@gmail.com>,
	Daniel Levin <dendy.ua@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>,
	Luke Diamand <luke@diamand.org>, Ben Keene <seraphire@gmail.com>,
	Andrew Oakley <andrew@adoakley.name>
Subject: Re: [PATCH 4/4] git-p4: resolve RCS keywords in binary
Date: Mon, 13 Dec 2021 15:34:56 -0800
Message-ID: <xmqqzgp484f3.fsf@gitster.g>
In-Reply-To: <20211213225441.1865782-5-jholdsworth@nvidia.com> (Joel Holdsworth's message of "Mon, 13 Dec 2021 22:54:41 +0000")

Joel Holdsworth <jholdsworth@nvidia.com> writes:

> RCS keywords are strings that are replaced with information from
> Perforce. Examples include $Date$, $Author$, $File$, $Change$ etc.
>
> Perforce resolves these by expanding them with their expanded values
> when files are synced, but Git's data model requires these expanded
> values to be converted back into their unexpanded form.
>
> Previously, git-p4.py would implement this behaviour through the use of
> regular expressions. However, the regular expression substitution was
> applied using decoded strings i.e. the content of incoming commit diffs
> was first decoded from bytes into UTF-8, processed with regular
> expressions, then converted back to bytes.
>
> Not only is this behaviour inefficient, but it is also a cause of a
> common issue caused by text files containing invalid UTF-8 data. For
> files created in Windows, CP1252 Smart Quote Characters (0x93 and 0x94)
> are seen fairly frequently. These codes are invalid in UTF-8, so if the
> script encountered any file containing them, on Python 2 the symbols
> will be corrupted, and on Python 3 the script will fail with an
> exception.

Makes sense, and I agree with others who commented in the previous
discussion thread that the right approach is to take the stuff
coming from Perforce as byte strings, process them as such, and
write them out as byte strings, UNLESS we positively know what the
source and destination encodings are.

And this change we see here, matching with patterns, is perfectly in
line with that direction.  Very nice.
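
To illustrate the direction, here is a minimal sketch of collapsing
expanded keywords with a bytes pattern. The pattern and the name
collapse_keywords are hypothetical, made up for this example, and are
not the ones used in git-p4.py; the point is only that nothing gets
decoded, so bytes such as 0x93 and 0x94 pass through untouched.

```python
import re

# Hypothetical pattern for illustration only (not taken from git-p4.py).
# It matches "$Keyword$" or "$Keyword: value $" directly in bytes.
KEYWORD_RE = re.compile(rb'\$(Date|Author|File|Change)(?::[^$\n]*)?\$')


def collapse_keywords(data: bytes) -> bytes:
    """Turn expanded keywords back into their unexpanded form."""
    # No decode/encode round trip: both input and output are bytes.
    return KEYWORD_RE.sub(rb'$\1$', data)


# CP1252 smart quotes (0x93, 0x94) are not valid UTF-8 here, yet they
# survive because the data is never decoded.
line = b'\x93quoted\x94 $Date: 2021/12/13 $\n'
result = collapse_keywords(line)
```

Calling line.decode('utf-8') on the same input would raise
UnicodeDecodeError on Python 3, which is the failure the patch
description mentions.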

>          try:
> -            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
> +            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:

We seem to have lost "w+" and now it is "wb".  I do not see a reason
to make outFile anything but write-only, so the end result looks
good to me, but is it an unrelated "bug"fix that should be explained
as such (e.g. "there is no reason to make outFile read-write, so
instead of using 'w+' just use 'wb' while we make it unencoded
output by adding 'b' to it")?
