git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Shustov <nikolay.shustov@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] git p4 fix for failure to decode p4 errors
Date: Sat, 22 Mar 2025 07:48:05 -0400	[thread overview]
Message-ID: <32b401c3-de0e-427b-83b7-eb5a5b315db1@gmail.com> (raw)
In-Reply-To: <pull.1926.git.git.1742440852765.gitgitgadget@gmail.com>

ping, pretty please? :-)

On 3/19/25 23:20, Nikolay Shustov via GitGitGadget wrote:
> From: Nikolay Shustov <Nikolay.Shustov@gmail.com>
>
> Fixes the git p4 failure happening when Perforce command returns error
> containing byte stream of characters with high bit set. In such situations
> git p4 implementatino fails to decode this byte stream into utf-8 string.
>
> Design:
> Make use of existing decoding fallback strategy, described by
> git-p4.metadataDecodingStrategy and git-p4.metadataFallbackEncoding
> settings in the logic that decodes the Perforce command error bytes.
>
> Details:
> - Moved p4 metadata transcoding logic from
>    metadata_stream_to_writable_bytes(..) into a new MetadataTranscoder class.
> - Enhcanced the implementation to use git-p4.metadataDecodingStrategy and
>    git-p4.metadataFallbackEncoding settings for p4 errors decoding.
> - Added test.
>
> Signed-off-by: Nikolay Shustov <Nikolay.Shustov@gmail.com>
> ---
>      git p4 fix for failure to decode p4 errors
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1926%2Fnshustov%2Fgit-p4-error-decoding-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1926/nshustov/git-p4-error-decoding-v1
> Pull-Request: https://github.com/git/git/pull/1926
>
>   git-p4.py                        | 135 ++++++++++++++++++-------------
>   t/meson.build                    |   1 +
>   t/t9837-git-p4-error-encoding.sh |  53 ++++++++++++
>   t/t9837/git-p4-error-python3.py  |  15 ++++
>   4 files changed, 149 insertions(+), 55 deletions(-)
>   create mode 100755 t/t9837-git-p4-error-encoding.sh
>   create mode 100644 t/t9837/git-p4-error-python3.py
>
> diff --git a/git-p4.py b/git-p4.py
> index c0ca7becaf4..72a4c55f99e 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -234,67 +234,91 @@ else:
>   
>   
>   class MetadataDecodingException(Exception):
> -    def __init__(self, input_string):
> +    def __init__(self, input_string, error=None):
>           self.input_string = input_string
> +        self.error = error
>   
>       def __str__(self):
> -        return """Decoding perforce metadata failed!
> +        message = """Decoding perforce metadata failed!
>   The failing string was:
>   ---
>   {}
>   ---
>   Consider setting the git-p4.metadataDecodingStrategy config option to
>   'fallback', to allow metadata to be decoded using a fallback encoding,
> -defaulting to cp1252.""".format(self.input_string)
> +defaulting to cp1252."""
> +        if verbose and self.error is not None:
> +            message += """
> +---
> +Error:
> +---
> +{}"""
> +        return message.format(self.input_string, self.error)
>   
>   
> -encoding_fallback_warning_issued = False
> -encoding_escape_warning_issued = False
> -def metadata_stream_to_writable_bytes(s):
> -    encodingStrategy = gitConfig('git-p4.metadataDecodingStrategy') or defaultMetadataDecodingStrategy
> -    fallbackEncoding = gitConfig('git-p4.metadataFallbackEncoding') or defaultFallbackMetadataEncoding
> -    if not isinstance(s, bytes):
> -        return s.encode('utf_8')
> -    if encodingStrategy == 'passthrough':
> -        return s
> -    try:
> -        s.decode('utf_8')
> -        return s
> -    except UnicodeDecodeError:
> -        if encodingStrategy == 'fallback' and fallbackEncoding:
> -            global encoding_fallback_warning_issued
> -            global encoding_escape_warning_issued
> -            try:
> -                if not encoding_fallback_warning_issued:
> -                    print("\nCould not decode value as utf-8; using configured fallback encoding %s: %s" % (fallbackEncoding, s))
> -                    print("\n(this warning is only displayed once during an import)")
> -                    encoding_fallback_warning_issued = True
> -                return s.decode(fallbackEncoding).encode('utf_8')
> -            except Exception as exc:
> -                if not encoding_escape_warning_issued:
> -                    print("\nCould not decode value with configured fallback encoding %s; escaping bytes over 127: %s" % (fallbackEncoding, s))
> -                    print("\n(this warning is only displayed once during an import)")
> -                    encoding_escape_warning_issued = True
> -                escaped_bytes = b''
> -                # bytes and strings work very differently in python2 vs python3...
> -                if str is bytes:
> -                    for byte in s:
> -                        byte_number = struct.unpack('>B', byte)[0]
> -                        if byte_number > 127:
> -                            escaped_bytes += b'%'
> -                            escaped_bytes += hex(byte_number)[2:].upper()
> -                        else:
> -                            escaped_bytes += byte
> -                else:
> -                    for byte_number in s:
> -                        if byte_number > 127:
> -                            escaped_bytes += b'%'
> -                            escaped_bytes += hex(byte_number).upper().encode()[2:]
> -                        else:
> -                            escaped_bytes += bytes([byte_number])
> -                return escaped_bytes
> +class MetadataTranscoder:
> +    def __init__(self, default_metadata_decoding_strategy, default_fallback_metadata_encoding):
> +        self.decoding_fallback_warning_issued = False
> +        self.decoding_escape_warning_issued = False
> +        self.decodingStrategy = gitConfig('git-p4.metadataDecodingStrategy') or default_metadata_decoding_strategy
> +        self.fallbackEncoding = gitConfig('git-p4.metadataFallbackEncoding') or default_fallback_metadata_encoding
> +
> +    def decode_metadata(self, s, error_from_fallback=True):
> +        try:
> +            return [s.decode('utf_8'), 'utf_8']
> +        except UnicodeDecodeError as decode_exception:
> +            error = decode_exception
> +            if self.decodingStrategy == 'fallback' and self.fallbackEncoding:
> +                try:
> +                    if not self.decoding_fallback_warning_issued:
> +                        print("\nCould not decode value as utf-8; using configured fallback encoding %s: %s" % (self.fallbackEncoding, s))
> +                        print("\n(this warning is only displayed once during an import)")
> +                        self.decoding_fallback_warning_issued = True
> +                    return [s.decode(self.fallbackEncoding), self.fallbackEncoding]
> +                except Exception as decode_exception:
> +                    if not error_from_fallback:
> +                        return [s, None]
> +                    error = decode_exception
> +            raise MetadataDecodingException(s, error)
> +
> +    def metadata_stream_to_writable_bytes(self, s):
> +        if not isinstance(s, bytes):
> +            return s.encode('utf_8')
> +        if self.decodingStrategy == 'passthrough':
> +            return s
> +
> +        [text, encoding] = self.decode_metadata(s, False)
> +        if encoding == 'utf_8':
> +            # s is of utf-8 already
> +            return s
> +
> +        if encoding is None:
> +            # could not decode s, even with fallback encoding
> +            if not self.decoding_escape_warning_issued:
> +                print("\nCould not decode value with configured fallback encoding %s; escaping bytes over 127: %s" % (self.fallbackEncoding, s))
> +                print("\n(this warning is only displayed once during an import)")
> +                self.decoding_escape_warning_issued = True
> +            escaped_bytes = b''
> +            # bytes and strings work very differently in python2 vs python3...
> +            if str is bytes:
> +                for byte in s:
> +                    byte_number = struct.unpack('>B', byte)[0]
> +                    if byte_number > 127:
> +                        escaped_bytes += b'%'
> +                        escaped_bytes += hex(byte_number)[2:].upper()
> +                    else:
> +                        escaped_bytes += byte
> +            else:
> +                for byte_number in s:
> +                    if byte_number > 127:
> +                        escaped_bytes += b'%'
> +                        escaped_bytes += hex(byte_number).upper().encode()[2:]
> +                    else:
> +                        escaped_bytes += bytes([byte_number])
> +            return escaped_bytes
>   
> -        raise MetadataDecodingException(s)
> +        # were able to decode but not to utf-8
> +        return text.encode('utf_8')
>   
>   
>   def decode_path(path):
> @@ -898,14 +922,14 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>                       decoded_entry[key] = value
>                   # Parse out data if it's an error response
>                   if decoded_entry.get('code') == 'error' and 'data' in decoded_entry:
> -                    decoded_entry['data'] = decoded_entry['data'].decode()
> +                    decoded_entry['data'] = metadataTranscoder.decode_metadata(decoded_entry['data'])
>                   entry = decoded_entry
>               if skip_info:
>                   if 'code' in entry and entry['code'] == 'info':
>                       continue
>               for key in p4KeysContainingNonUtf8Chars():
>                   if key in entry:
> -                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
> +                    entry[key] = metadataTranscoder.metadata_stream_to_writable_bytes(entry[key])
>               if cb is not None:
>                   cb(entry)
>               else:
> @@ -1718,7 +1742,7 @@ class P4UserMap:
>               # python2 or python3. To support
>               # git-p4.metadataDecodingStrategy=fallback, self.users dict values
>               # are always bytes, ready to be written to git.
> -            emailbytes = metadata_stream_to_writable_bytes(output["Email"])
> +            emailbytes = metadataTranscoder.metadata_stream_to_writable_bytes(output["Email"])
>               self.users[output["User"]] = output["FullName"] + b" <" + emailbytes + b">"
>               self.emails[output["Email"]] = output["User"]
>   
> @@ -1730,12 +1754,12 @@ class P4UserMap:
>                   fullname = mapUser[0][1]
>                   email = mapUser[0][2]
>                   fulluser = fullname + " <" + email + ">"
> -                self.users[user] = metadata_stream_to_writable_bytes(fulluser)
> +                self.users[user] = metadataTranscoder.metadata_stream_to_writable_bytes(fulluser)
>                   self.emails[email] = user
>   
>           s = b''
>           for (key, val) in self.users.items():
> -            keybytes = metadata_stream_to_writable_bytes(key)
> +            keybytes = metadataTranscoder.metadata_stream_to_writable_bytes(key)
>               s += b"%s\t%s\n" % (keybytes.expandtabs(1), val.expandtabs(1))
>   
>           open(self.getUserCacheFilename(), 'wb').write(s)
> @@ -3349,7 +3373,7 @@ class P4Sync(Command, P4UserMap):
>           if userid in self.users:
>               return self.users[userid]
>           else:
> -            userid_bytes = metadata_stream_to_writable_bytes(userid)
> +            userid_bytes = metadataTranscoder.metadata_stream_to_writable_bytes(userid)
>               return b"%s <a@b>" % userid_bytes
>   
>       def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
> @@ -4561,6 +4585,7 @@ commands = {
>       "unshelve": P4Unshelve,
>   }
>   
> +metadataTranscoder = MetadataTranscoder(defaultMetadataDecodingStrategy, defaultFallbackMetadataEncoding)
>   
>   def main():
>       if len(sys.argv[1:]) == 0:
> diff --git a/t/meson.build b/t/meson.build
> index a59da26be3f..656424fdff3 100644
> --- a/t/meson.build
> +++ b/t/meson.build
> @@ -1090,6 +1090,7 @@ integration_tests = [
>     't9834-git-p4-file-dir-bug.sh',
>     't9835-git-p4-metadata-encoding-python2.sh',
>     't9836-git-p4-metadata-encoding-python3.sh',
> +  't9837-git-p4-error-encoding.sh',
>     't9850-shell.sh',
>     't9901-git-web--browse.sh',
>     't9902-completion.sh',
> diff --git a/t/t9837-git-p4-error-encoding.sh b/t/t9837-git-p4-error-encoding.sh
> new file mode 100755
> index 00000000000..1ea774afb1b
> --- /dev/null
> +++ b/t/t9837-git-p4-error-encoding.sh
> @@ -0,0 +1,53 @@
> +#!/bin/sh
> +
> +test_description='git p4 error encoding
> +
> +This test checks that the import process handles inconsistent text
> +encoding in p4 error messages without failing'
> +
> +. ./lib-git-p4.sh
> +
> +###############################
> +## SECTION REPEATED IN t9835 ##
> +###############################
> +
> +# These tests require Perforce with non-unicode setup.
> +out=$(2>&1 P4CHARSET=utf8 p4 client -o)
> +if test $? -eq 0
> +then
> +	skip_all="skipping git p4 error encoding tests; Perforce is setup with unicode"
> +	test_done
> +fi
> +
> +# These tests are specific to Python 3. Write a custom script that executes
> +# git-p4 directly with the Python 3 interpreter to ensure that we use that
> +# version even if Git was compiled with Python 2.
> +python_target_binary=$(which python3)
> +if test -n "$python_target_binary"
> +then
> +	mkdir temp_python
> +	PATH="$(pwd)/temp_python:$PATH"
> +	export PATH
> +
> +	write_script temp_python/git-p4-python3 <<-EOF
> +	exec "$python_target_binary" "$(git --exec-path)/git-p4" "\$@"
> +	EOF
> +fi
> +
> +git p4-python3 >err
> +if ! grep 'valid commands' err
> +then
> +	skip_all="skipping python3 git p4 tests; python3 not available"
> +	test_done
> +fi
> +
> +test_expect_success 'start p4d' '
> +	start_p4d
> +'
> +
> +test_expect_success 'see if Perforce error with characters not convertable to utf-8 will be processed correctly' '
> +	test_when_finished cleanup_git &&
> +	$python_target_binary "$TEST_DIRECTORY"/t9837/git-p4-error-python3.py "$TEST_DIRECTORY"
> +'
> +
> +test_done
> diff --git a/t/t9837/git-p4-error-python3.py b/t/t9837/git-p4-error-python3.py
> new file mode 100644
> index 00000000000..fb65aee386e
> --- /dev/null
> +++ b/t/t9837/git-p4-error-python3.py
> @@ -0,0 +1,15 @@
> +import os
> +import sys
> +from  importlib.machinery import SourceFileLoader
> +
> +def main():
> +    if len(sys.argv[1:]) != 1:
> +        print("Expected test directory name")
> +
> +    gitp4_path = sys.argv[1] + "/../git-p4.py"
> +    gitp4 = SourceFileLoader("gitp4", gitp4_path).load_module()
> +    gitp4.p4CmdList(["edit", b'\xFEfile'])
> +
> +if __name__ == '__main__':
> +    main()
> +
>
> base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e

  reply	other threads:[~2025-03-22 11:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-20  3:20 [PATCH] git p4 fix for failure to decode p4 errors Nikolay Shustov via GitGitGadget
2025-03-22 11:48 ` Nikolay Shustov [this message]
2025-03-25 23:09   ` Nikolay Shustov
2025-03-26 15:09     ` Phillip Wood
2025-03-30 20:06       ` Nikolay Shustov
2025-03-31  1:21         ` Nikolay Shustov
2025-03-31  9:01           ` Fahad Al-Rashed
2025-03-31 23:37             ` Nikolay Shustov
2025-04-05 18:46               ` Nikolay Shustov
2025-04-08 12:13                 ` Nikolay Shustov
     [not found]                   ` <CAFd+s4USsHPaepvfNtjm5VGieuH89zbW5Yj+OSXD8THxkj6tTw@mail.gmail.com>
2025-04-09 12:20                     ` Nikolay Shustov
2025-04-08 13:18         ` Phillip Wood
2025-04-08 13:28           ` Nikolay Shustov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=32b401c3-de0e-427b-83b7-eb5a5b315db1@gmail.com \
    --to=nikolay.shustov@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).