git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Shustov <nikolay.shustov@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] git p4 fix for failure to decode p4 errors
Date: Tue, 25 Mar 2025 19:09:28 -0400	[thread overview]
Message-ID: <fdbb3f88-7321-4dc0-9ead-7ed9ef0fc995@gmail.com> (raw)
In-Reply-To: <32b401c3-de0e-427b-83b7-eb5a5b315db1@gmail.com>

I think this fix is important.
git-p4 is used in the companies where there is an intent to migrate from 
Perforce to Git and having the issue that this change fixes is a real 
roadblock.
The better we can make git-p4, the more adoption Git would get in the 
commercial world.

On 3/22/25 07:48, Nikolay Shustov wrote:
> ping, pretty please? :-)
>
> On 3/19/25 23:20, Nikolay Shustov via GitGitGadget wrote:
>> From: Nikolay Shustov <Nikolay.Shustov@gmail.com>
>>
>> Fixes the git p4 failure happening when Perforce command returns error
>> containing byte stream of characters with high bit set. In such 
>> situations
>> git p4 implementatino fails to decode this byte stream into utf-8 
>> string.
>>
>> Design:
>> Make use of existing decoding fallback strategy, described by
>> git-p4.metadataDecodingStrategy and git-p4.metadataFallbackEncoding
>> settings in the logic that decodes the Perforce command error bytes.
>>
>> Details:
>> - Moved p4 metadata transcoding logic from
>>    metadata_stream_to_writable_bytes(..) into a new 
>> MetadataTranscoder class.
>> - Enhcanced the implementation to use git-p4.metadataDecodingStrategy 
>> and
>>    git-p4.metadataFallbackEncoding settings for p4 errors decoding.
>> - Added test.
>>
>> Signed-off-by: Nikolay Shustov <Nikolay.Shustov@gmail.com>
>> ---
>>      git p4 fix for failure to decode p4 errors
>>
>> Published-As: 
>> https://github.com/gitgitgadget/git/releases/tag/pr-git-1926%2Fnshustov%2Fgit-p4-error-decoding-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
>> pr-git-1926/nshustov/git-p4-error-decoding-v1
>> Pull-Request: https://github.com/git/git/pull/1926
>>
>>   git-p4.py                        | 135 ++++++++++++++++++-------------
>>   t/meson.build                    |   1 +
>>   t/t9837-git-p4-error-encoding.sh |  53 ++++++++++++
>>   t/t9837/git-p4-error-python3.py  |  15 ++++
>>   4 files changed, 149 insertions(+), 55 deletions(-)
>>   create mode 100755 t/t9837-git-p4-error-encoding.sh
>>   create mode 100644 t/t9837/git-p4-error-python3.py
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index c0ca7becaf4..72a4c55f99e 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -234,67 +234,91 @@ else:
>>       class MetadataDecodingException(Exception):
>> -    def __init__(self, input_string):
>> +    def __init__(self, input_string, error=None):
>>           self.input_string = input_string
>> +        self.error = error
>>         def __str__(self):
>> -        return """Decoding perforce metadata failed!
>> +        message = """Decoding perforce metadata failed!
>>   The failing string was:
>>   ---
>>   {}
>>   ---
>>   Consider setting the git-p4.metadataDecodingStrategy config option to
>>   'fallback', to allow metadata to be decoded using a fallback encoding,
>> -defaulting to cp1252.""".format(self.input_string)
>> +defaulting to cp1252."""
>> +        if verbose and self.error is not None:
>> +            message += """
>> +---
>> +Error:
>> +---
>> +{}"""
>> +        return message.format(self.input_string, self.error)
>>     -encoding_fallback_warning_issued = False
>> -encoding_escape_warning_issued = False
>> -def metadata_stream_to_writable_bytes(s):
>> -    encodingStrategy = gitConfig('git-p4.metadataDecodingStrategy') 
>> or defaultMetadataDecodingStrategy
>> -    fallbackEncoding = gitConfig('git-p4.metadataFallbackEncoding') 
>> or defaultFallbackMetadataEncoding
>> -    if not isinstance(s, bytes):
>> -        return s.encode('utf_8')
>> -    if encodingStrategy == 'passthrough':
>> -        return s
>> -    try:
>> -        s.decode('utf_8')
>> -        return s
>> -    except UnicodeDecodeError:
>> -        if encodingStrategy == 'fallback' and fallbackEncoding:
>> -            global encoding_fallback_warning_issued
>> -            global encoding_escape_warning_issued
>> -            try:
>> -                if not encoding_fallback_warning_issued:
>> -                    print("\nCould not decode value as utf-8; using 
>> configured fallback encoding %s: %s" % (fallbackEncoding, s))
>> -                    print("\n(this warning is only displayed once 
>> during an import)")
>> -                    encoding_fallback_warning_issued = True
>> -                return s.decode(fallbackEncoding).encode('utf_8')
>> -            except Exception as exc:
>> -                if not encoding_escape_warning_issued:
>> -                    print("\nCould not decode value with configured 
>> fallback encoding %s; escaping bytes over 127: %s" % 
>> (fallbackEncoding, s))
>> -                    print("\n(this warning is only displayed once 
>> during an import)")
>> -                    encoding_escape_warning_issued = True
>> -                escaped_bytes = b''
>> -                # bytes and strings work very differently in python2 
>> vs python3...
>> -                if str is bytes:
>> -                    for byte in s:
>> -                        byte_number = struct.unpack('>B', byte)[0]
>> -                        if byte_number > 127:
>> -                            escaped_bytes += b'%'
>> -                            escaped_bytes += 
>> hex(byte_number)[2:].upper()
>> -                        else:
>> -                            escaped_bytes += byte
>> -                else:
>> -                    for byte_number in s:
>> -                        if byte_number > 127:
>> -                            escaped_bytes += b'%'
>> -                            escaped_bytes += 
>> hex(byte_number).upper().encode()[2:]
>> -                        else:
>> -                            escaped_bytes += bytes([byte_number])
>> -                return escaped_bytes
>> +class MetadataTranscoder:
>> +    def __init__(self, default_metadata_decoding_strategy, 
>> default_fallback_metadata_encoding):
>> +        self.decoding_fallback_warning_issued = False
>> +        self.decoding_escape_warning_issued = False
>> +        self.decodingStrategy = 
>> gitConfig('git-p4.metadataDecodingStrategy') or 
>> default_metadata_decoding_strategy
>> +        self.fallbackEncoding = 
>> gitConfig('git-p4.metadataFallbackEncoding') or 
>> default_fallback_metadata_encoding
>> +
>> +    def decode_metadata(self, s, error_from_fallback=True):
>> +        try:
>> +            return [s.decode('utf_8'), 'utf_8']
>> +        except UnicodeDecodeError as decode_exception:
>> +            error = decode_exception
>> +            if self.decodingStrategy == 'fallback' and 
>> self.fallbackEncoding:
>> +                try:
>> +                    if not self.decoding_fallback_warning_issued:
>> +                        print("\nCould not decode value as utf-8; 
>> using configured fallback encoding %s: %s" % (self.fallbackEncoding, s))
>> +                        print("\n(this warning is only displayed 
>> once during an import)")
>> +                        self.decoding_fallback_warning_issued = True
>> +                    return [s.decode(self.fallbackEncoding), 
>> self.fallbackEncoding]
>> +                except Exception as decode_exception:
>> +                    if not error_from_fallback:
>> +                        return [s, None]
>> +                    error = decode_exception
>> +            raise MetadataDecodingException(s, error)
>> +
>> +    def metadata_stream_to_writable_bytes(self, s):
>> +        if not isinstance(s, bytes):
>> +            return s.encode('utf_8')
>> +        if self.decodingStrategy == 'passthrough':
>> +            return s
>> +
>> +        [text, encoding] = self.decode_metadata(s, False)
>> +        if encoding == 'utf_8':
>> +            # s is of utf-8 already
>> +            return s
>> +
>> +        if encoding is None:
>> +            # could not decode s, even with fallback encoding
>> +            if not self.decoding_escape_warning_issued:
>> +                print("\nCould not decode value with configured 
>> fallback encoding %s; escaping bytes over 127: %s" % 
>> (self.fallbackEncoding, s))
>> +                print("\n(this warning is only displayed once during 
>> an import)")
>> +                self.decoding_escape_warning_issued = True
>> +            escaped_bytes = b''
>> +            # bytes and strings work very differently in python2 vs 
>> python3...
>> +            if str is bytes:
>> +                for byte in s:
>> +                    byte_number = struct.unpack('>B', byte)[0]
>> +                    if byte_number > 127:
>> +                        escaped_bytes += b'%'
>> +                        escaped_bytes += hex(byte_number)[2:].upper()
>> +                    else:
>> +                        escaped_bytes += byte
>> +            else:
>> +                for byte_number in s:
>> +                    if byte_number > 127:
>> +                        escaped_bytes += b'%'
>> +                        escaped_bytes += 
>> hex(byte_number).upper().encode()[2:]
>> +                    else:
>> +                        escaped_bytes += bytes([byte_number])
>> +            return escaped_bytes
>>   -        raise MetadataDecodingException(s)
>> +        # were able to decode but not to utf-8
>> +        return text.encode('utf_8')
>>       def decode_path(path):
>> @@ -898,14 +922,14 @@ def p4CmdList(cmd, stdin=None, 
>> stdin_mode='w+b', cb=None, skip_info=False,
>>                       decoded_entry[key] = value
>>                   # Parse out data if it's an error response
>>                   if decoded_entry.get('code') == 'error' and 'data' 
>> in decoded_entry:
>> -                    decoded_entry['data'] = 
>> decoded_entry['data'].decode()
>> +                    decoded_entry['data'] = 
>> metadataTranscoder.decode_metadata(decoded_entry['data'])
>>                   entry = decoded_entry
>>               if skip_info:
>>                   if 'code' in entry and entry['code'] == 'info':
>>                       continue
>>               for key in p4KeysContainingNonUtf8Chars():
>>                   if key in entry:
>> -                    entry[key] = 
>> metadata_stream_to_writable_bytes(entry[key])
>> +                    entry[key] = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(entry[key])
>>               if cb is not None:
>>                   cb(entry)
>>               else:
>> @@ -1718,7 +1742,7 @@ class P4UserMap:
>>               # python2 or python3. To support
>>               # git-p4.metadataDecodingStrategy=fallback, self.users 
>> dict values
>>               # are always bytes, ready to be written to git.
>> -            emailbytes = 
>> metadata_stream_to_writable_bytes(output["Email"])
>> +            emailbytes = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(output["Email"])
>>               self.users[output["User"]] = output["FullName"] + b" <" 
>> + emailbytes + b">"
>>               self.emails[output["Email"]] = output["User"]
>>   @@ -1730,12 +1754,12 @@ class P4UserMap:
>>                   fullname = mapUser[0][1]
>>                   email = mapUser[0][2]
>>                   fulluser = fullname + " <" + email + ">"
>> -                self.users[user] = 
>> metadata_stream_to_writable_bytes(fulluser)
>> +                self.users[user] = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(fulluser)
>>                   self.emails[email] = user
>>             s = b''
>>           for (key, val) in self.users.items():
>> -            keybytes = metadata_stream_to_writable_bytes(key)
>> +            keybytes = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(key)
>>               s += b"%s\t%s\n" % (keybytes.expandtabs(1), 
>> val.expandtabs(1))
>>             open(self.getUserCacheFilename(), 'wb').write(s)
>> @@ -3349,7 +3373,7 @@ class P4Sync(Command, P4UserMap):
>>           if userid in self.users:
>>               return self.users[userid]
>>           else:
>> -            userid_bytes = metadata_stream_to_writable_bytes(userid)
>> +            userid_bytes = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(userid)
>>               return b"%s <a@b>" % userid_bytes
>>         def streamTag(self, gitStream, labelName, labelDetails, 
>> commit, epoch):
>> @@ -4561,6 +4585,7 @@ commands = {
>>       "unshelve": P4Unshelve,
>>   }
>>   +metadataTranscoder = 
>> MetadataTranscoder(defaultMetadataDecodingStrategy, 
>> defaultFallbackMetadataEncoding)
>>     def main():
>>       if len(sys.argv[1:]) == 0:
>> diff --git a/t/meson.build b/t/meson.build
>> index a59da26be3f..656424fdff3 100644
>> --- a/t/meson.build
>> +++ b/t/meson.build
>> @@ -1090,6 +1090,7 @@ integration_tests = [
>>     't9834-git-p4-file-dir-bug.sh',
>>     't9835-git-p4-metadata-encoding-python2.sh',
>>     't9836-git-p4-metadata-encoding-python3.sh',
>> +  't9837-git-p4-error-encoding.sh',
>>     't9850-shell.sh',
>>     't9901-git-web--browse.sh',
>>     't9902-completion.sh',
>> diff --git a/t/t9837-git-p4-error-encoding.sh 
>> b/t/t9837-git-p4-error-encoding.sh
>> new file mode 100755
>> index 00000000000..1ea774afb1b
>> --- /dev/null
>> +++ b/t/t9837-git-p4-error-encoding.sh
>> @@ -0,0 +1,53 @@
>> +#!/bin/sh
>> +
>> +test_description='git p4 error encoding
>> +
>> +This test checks that the import process handles inconsistent text
>> +encoding in p4 error messages without failing'
>> +
>> +. ./lib-git-p4.sh
>> +
>> +###############################
>> +## SECTION REPEATED IN t9835 ##
>> +###############################
>> +
>> +# These tests require Perforce with non-unicode setup.
>> +out=$(2>&1 P4CHARSET=utf8 p4 client -o)
>> +if test $? -eq 0
>> +then
>> +    skip_all="skipping git p4 error encoding tests; Perforce is 
>> setup with unicode"
>> +    test_done
>> +fi
>> +
>> +# These tests are specific to Python 3. Write a custom script that 
>> executes
>> +# git-p4 directly with the Python 3 interpreter to ensure that we 
>> use that
>> +# version even if Git was compiled with Python 2.
>> +python_target_binary=$(which python3)
>> +if test -n "$python_target_binary"
>> +then
>> +    mkdir temp_python
>> +    PATH="$(pwd)/temp_python:$PATH"
>> +    export PATH
>> +
>> +    write_script temp_python/git-p4-python3 <<-EOF
>> +    exec "$python_target_binary" "$(git --exec-path)/git-p4" "\$@"
>> +    EOF
>> +fi
>> +
>> +git p4-python3 >err
>> +if ! grep 'valid commands' err
>> +then
>> +    skip_all="skipping python3 git p4 tests; python3 not available"
>> +    test_done
>> +fi
>> +
>> +test_expect_success 'start p4d' '
>> +    start_p4d
>> +'
>> +
>> +test_expect_success 'see if Perforce error with characters not 
>> convertable to utf-8 will be processed correctly' '
>> +    test_when_finished cleanup_git &&
>> +    $python_target_binary 
>> "$TEST_DIRECTORY"/t9837/git-p4-error-python3.py "$TEST_DIRECTORY"
>> +'
>> +
>> +test_done
>> diff --git a/t/t9837/git-p4-error-python3.py 
>> b/t/t9837/git-p4-error-python3.py
>> new file mode 100644
>> index 00000000000..fb65aee386e
>> --- /dev/null
>> +++ b/t/t9837/git-p4-error-python3.py
>> @@ -0,0 +1,15 @@
>> +import os
>> +import sys
>> +from  importlib.machinery import SourceFileLoader
>> +
>> +def main():
>> +    if len(sys.argv[1:]) != 1:
>> +        print("Expected test directory name")
>> +
>> +    gitp4_path = sys.argv[1] + "/../git-p4.py"
>> +    gitp4 = SourceFileLoader("gitp4", gitp4_path).load_module()
>> +    gitp4.p4CmdList(["edit", b'\xFEfile'])
>> +
>> +if __name__ == '__main__':
>> +    main()
>> +
>>
>> base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e

  reply	other threads:[~2025-03-25 23:09 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-20  3:20 [PATCH] git p4 fix for failure to decode p4 errors Nikolay Shustov via GitGitGadget
2025-03-22 11:48 ` Nikolay Shustov
2025-03-25 23:09   ` Nikolay Shustov [this message]
2025-03-26 15:09     ` Phillip Wood
2025-03-30 20:06       ` Nikolay Shustov
2025-03-31  1:21         ` Nikolay Shustov
2025-03-31  9:01           ` Fahad Al-Rashed
2025-03-31 23:37             ` Nikolay Shustov
2025-04-05 18:46               ` Nikolay Shustov
2025-04-08 12:13                 ` Nikolay Shustov
     [not found]                   ` <CAFd+s4USsHPaepvfNtjm5VGieuH89zbW5Yj+OSXD8THxkj6tTw@mail.gmail.com>
2025-04-09 12:20                     ` Nikolay Shustov
2025-04-08 13:18         ` Phillip Wood
2025-04-08 13:28           ` Nikolay Shustov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fdbb3f88-7321-4dc0-9ead-7ed9ef0fc995@gmail.com \
    --to=nikolay.shustov@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).