From: Junio C Hamano <gitster@pobox.com>
To: "Moritz Baumann via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Tao Klerks <tao@klerks.biz>,
	Moritz Baumann <moritz.baumann@sap.com>
Subject: Re: [PATCH] git-p4: fix crlf handling for utf16 files on Windows
Date: Wed, 20 Jul 2022 09:08:52 -0700
Message-ID: <xmqqilnr4vhn.fsf@gitster.g>
In-Reply-To: <pull.1294.git.git.1658294873702.gitgitgadget@gmail.com> (Moritz Baumann via GitGitGadget's message of "Wed, 20 Jul 2022 05:27:53 +0000")

"Moritz Baumann via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Moritz Baumann <moritz.baumann@sap.com>

Can you briefly describe, in this place above your Sign-off, what
problem is being solved and how the change solves it?  The title says
"fix" without saying how the behaviour of the current code is
"broken", so that is one thing you can describe.  It talks about
"UTF-16 files on Windows", but does it mean git-p4 running on
Windows, or git-p4 running anywhere that (over the wire) talks with
P4 running on Windows?  IOW, would the same problem trigger if you
are on macOS but the contents of the file you exchange with P4
happen to be in UTF-16?

These are the things you can describe to help those who are not you
(i.e. without access to an environment similar to the one you saw the
problem on) understand the issue and convince themselves
that the patch they are seeing is a sensible solution.  Without such
a description, the patch is hard to evaluate.

> Signed-off-by: Moritz Baumann <moritz.baumann@sap.com>
> ---

> diff --git a/git-p4.py b/git-p4.py
> index 8fbf6eb1fe3..0a9d7e2ed7c 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -3148,7 +3148,7 @@ class P4Sync(Command, P4UserMap):
>                      raise e
>              else:
>                  if p4_version_string().find('/NT') >= 0:
> -                    text = text.replace(b'\r\n', b'\n')
> +                    text = text.replace(b'\x0d\x00\x0a\x00', b'\x0a\x00')
>                  contents = [text]
>  
>          if type_base == "apple":

OK, the part being touched is inside this context:

        if type_base == "utf16":
            # ...
            # But ascii text saved as -t utf16 is completely mangled.
            # Invoke print -o to get the real contents.
            #
            # On windows, the newlines will always be mangled by print, so put
            # them back too.  This is not needed to the cygwin windows version,
            # just the native "NT" type.
            #

            try:
                text = ...
            except Exception as e:
                ...
            else:
                if p4_version_string().find('/NT') >= 0:
                    text = text.replace(b'\r\n', b'\n')
                contents = [text]

So the intent of the existing code is "we know we are dealing with
UTF-16 text, and after successfully reading 'text' without an
exception, we need to convert CRLF back to LF if we are on 'the
native NT type'".  Presumably 'text' that came from
p4_read_pipe(... raw=True) is not a unicode string but just a bunch of
bytes, so each "char" is represented as a two-byte sequence in UTF-16?

With that (speculative) understanding, I can guess that the patch
makes sense, but the patch should not make readers guess.
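
To make that guess concrete, here is a minimal sketch of the two
replacements applied to raw bytes.  It assumes the payload is
little-endian UTF-16 without a BOM, and the sample string is made up
for illustration; neither has been checked against what P4 actually
sends.

```python
# Hypothetical sample: two CRLF-terminated lines, encoded as
# UTF-16LE (an assumption about what "print -o" hands back).
text = "a\r\nb\r\n".encode("utf-16-le")

# Existing code: looks for the byte pair 0d 0a.  In UTF-16LE every
# ASCII char is followed by a NUL byte, so CR and LF are never adjacent
# and nothing gets replaced.
old = text.replace(b'\r\n', b'\n')

# Patched code: looks for CR and LF as two-byte code units.
new = text.replace(b'\x0d\x00\x0a\x00', b'\x0a\x00')

assert old == text                          # untouched, CRLF survives
assert new.decode("utf-16-le") == "a\nb\n"  # CRLF became LF
```

If that is indeed what happens, then the old replacement was a no-op
on such files, which would be worth spelling out in the log message.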

Thanks.

Thread overview: 6+ messages
2022-07-20  5:27 [PATCH] git-p4: fix crlf handling for utf16 files on Windows Moritz Baumann via GitGitGadget
2022-07-20 16:08 ` Junio C Hamano [this message]
2022-07-20 16:32   ` Baumann, Moritz
2022-07-20 17:18     ` Junio C Hamano
2022-07-20 18:17 ` [PATCH v2] git-p4: fix CR LF handling for utf16 files Moritz Baumann via GitGitGadget
2022-07-20 18:42   ` Junio C Hamano
