All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Shustov <nikolay.shustov@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] git p4 fix for failure to decode p4 errors
Date: Tue, 25 Mar 2025 19:09:28 -0400	[thread overview]
Message-ID: <fdbb3f88-7321-4dc0-9ead-7ed9ef0fc995@gmail.com> (raw)
In-Reply-To: <32b401c3-de0e-427b-83b7-eb5a5b315db1@gmail.com>

I think this fix is important.
git-p4 is used in the companies where there is an intent to migrate from 
Perforce to Git and having the issue that this change fixes is a real 
roadblock.
The better we can make git-p4, the more adoption Git would get in the 
commercial world.

On 3/22/25 07:48, Nikolay Shustov wrote:
> ping, pretty please? :-)
>
> On 3/19/25 23:20, Nikolay Shustov via GitGitGadget wrote:
>> From: Nikolay Shustov <Nikolay.Shustov@gmail.com>
>>
>> Fixes the git p4 failure happening when Perforce command returns error
>> containing byte stream of characters with high bit set. In such 
>> situations
>> git p4 implementatino fails to decode this byte stream into utf-8 
>> string.
>>
>> Design:
>> Make use of existing decoding fallback strategy, described by
>> git-p4.metadataDecodingStrategy and git-p4.metadataFallbackEncoding
>> settings in the logic that decodes the Perforce command error bytes.
>>
>> Details:
>> - Moved p4 metadata transcoding logic from
>>    metadata_stream_to_writable_bytes(..) into a new 
>> MetadataTranscoder class.
>> - Enhcanced the implementation to use git-p4.metadataDecodingStrategy 
>> and
>>    git-p4.metadataFallbackEncoding settings for p4 errors decoding.
>> - Added test.
>>
>> Signed-off-by: Nikolay Shustov <Nikolay.Shustov@gmail.com>
>> ---
>>      git p4 fix for failure to decode p4 errors
>>
>> Published-As: 
>> https://github.com/gitgitgadget/git/releases/tag/pr-git-1926%2Fnshustov%2Fgit-p4-error-decoding-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
>> pr-git-1926/nshustov/git-p4-error-decoding-v1
>> Pull-Request: https://github.com/git/git/pull/1926
>>
>>   git-p4.py                        | 135 ++++++++++++++++++-------------
>>   t/meson.build                    |   1 +
>>   t/t9837-git-p4-error-encoding.sh |  53 ++++++++++++
>>   t/t9837/git-p4-error-python3.py  |  15 ++++
>>   4 files changed, 149 insertions(+), 55 deletions(-)
>>   create mode 100755 t/t9837-git-p4-error-encoding.sh
>>   create mode 100644 t/t9837/git-p4-error-python3.py
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index c0ca7becaf4..72a4c55f99e 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -234,67 +234,91 @@ else:
>>       class MetadataDecodingException(Exception):
>> -    def __init__(self, input_string):
>> +    def __init__(self, input_string, error=None):
>>           self.input_string = input_string
>> +        self.error = error
>>         def __str__(self):
>> -        return """Decoding perforce metadata failed!
>> +        message = """Decoding perforce metadata failed!
>>   The failing string was:
>>   ---
>>   {}
>>   ---
>>   Consider setting the git-p4.metadataDecodingStrategy config option to
>>   'fallback', to allow metadata to be decoded using a fallback encoding,
>> -defaulting to cp1252.""".format(self.input_string)
>> +defaulting to cp1252."""
>> +        if verbose and self.error is not None:
>> +            message += """
>> +---
>> +Error:
>> +---
>> +{}"""
>> +        return message.format(self.input_string, self.error)
>>     -encoding_fallback_warning_issued = False
>> -encoding_escape_warning_issued = False
>> -def metadata_stream_to_writable_bytes(s):
>> -    encodingStrategy = gitConfig('git-p4.metadataDecodingStrategy') 
>> or defaultMetadataDecodingStrategy
>> -    fallbackEncoding = gitConfig('git-p4.metadataFallbackEncoding') 
>> or defaultFallbackMetadataEncoding
>> -    if not isinstance(s, bytes):
>> -        return s.encode('utf_8')
>> -    if encodingStrategy == 'passthrough':
>> -        return s
>> -    try:
>> -        s.decode('utf_8')
>> -        return s
>> -    except UnicodeDecodeError:
>> -        if encodingStrategy == 'fallback' and fallbackEncoding:
>> -            global encoding_fallback_warning_issued
>> -            global encoding_escape_warning_issued
>> -            try:
>> -                if not encoding_fallback_warning_issued:
>> -                    print("\nCould not decode value as utf-8; using 
>> configured fallback encoding %s: %s" % (fallbackEncoding, s))
>> -                    print("\n(this warning is only displayed once 
>> during an import)")
>> -                    encoding_fallback_warning_issued = True
>> -                return s.decode(fallbackEncoding).encode('utf_8')
>> -            except Exception as exc:
>> -                if not encoding_escape_warning_issued:
>> -                    print("\nCould not decode value with configured 
>> fallback encoding %s; escaping bytes over 127: %s" % 
>> (fallbackEncoding, s))
>> -                    print("\n(this warning is only displayed once 
>> during an import)")
>> -                    encoding_escape_warning_issued = True
>> -                escaped_bytes = b''
>> -                # bytes and strings work very differently in python2 
>> vs python3...
>> -                if str is bytes:
>> -                    for byte in s:
>> -                        byte_number = struct.unpack('>B', byte)[0]
>> -                        if byte_number > 127:
>> -                            escaped_bytes += b'%'
>> -                            escaped_bytes += 
>> hex(byte_number)[2:].upper()
>> -                        else:
>> -                            escaped_bytes += byte
>> -                else:
>> -                    for byte_number in s:
>> -                        if byte_number > 127:
>> -                            escaped_bytes += b'%'
>> -                            escaped_bytes += 
>> hex(byte_number).upper().encode()[2:]
>> -                        else:
>> -                            escaped_bytes += bytes([byte_number])
>> -                return escaped_bytes
>> +class MetadataTranscoder:
>> +    def __init__(self, default_metadata_decoding_strategy, 
>> default_fallback_metadata_encoding):
>> +        self.decoding_fallback_warning_issued = False
>> +        self.decoding_escape_warning_issued = False
>> +        self.decodingStrategy = 
>> gitConfig('git-p4.metadataDecodingStrategy') or 
>> default_metadata_decoding_strategy
>> +        self.fallbackEncoding = 
>> gitConfig('git-p4.metadataFallbackEncoding') or 
>> default_fallback_metadata_encoding
>> +
>> +    def decode_metadata(self, s, error_from_fallback=True):
>> +        try:
>> +            return [s.decode('utf_8'), 'utf_8']
>> +        except UnicodeDecodeError as decode_exception:
>> +            error = decode_exception
>> +            if self.decodingStrategy == 'fallback' and 
>> self.fallbackEncoding:
>> +                try:
>> +                    if not self.decoding_fallback_warning_issued:
>> +                        print("\nCould not decode value as utf-8; 
>> using configured fallback encoding %s: %s" % (self.fallbackEncoding, s))
>> +                        print("\n(this warning is only displayed 
>> once during an import)")
>> +                        self.decoding_fallback_warning_issued = True
>> +                    return [s.decode(self.fallbackEncoding), 
>> self.fallbackEncoding]
>> +                except Exception as decode_exception:
>> +                    if not error_from_fallback:
>> +                        return [s, None]
>> +                    error = decode_exception
>> +            raise MetadataDecodingException(s, error)
>> +
>> +    def metadata_stream_to_writable_bytes(self, s):
>> +        if not isinstance(s, bytes):
>> +            return s.encode('utf_8')
>> +        if self.decodingStrategy == 'passthrough':
>> +            return s
>> +
>> +        [text, encoding] = self.decode_metadata(s, False)
>> +        if encoding == 'utf_8':
>> +            # s is of utf-8 already
>> +            return s
>> +
>> +        if encoding is None:
>> +            # could not decode s, even with fallback encoding
>> +            if not self.decoding_escape_warning_issued:
>> +                print("\nCould not decode value with configured 
>> fallback encoding %s; escaping bytes over 127: %s" % 
>> (self.fallbackEncoding, s))
>> +                print("\n(this warning is only displayed once during 
>> an import)")
>> +                self.decoding_escape_warning_issued = True
>> +            escaped_bytes = b''
>> +            # bytes and strings work very differently in python2 vs 
>> python3...
>> +            if str is bytes:
>> +                for byte in s:
>> +                    byte_number = struct.unpack('>B', byte)[0]
>> +                    if byte_number > 127:
>> +                        escaped_bytes += b'%'
>> +                        escaped_bytes += hex(byte_number)[2:].upper()
>> +                    else:
>> +                        escaped_bytes += byte
>> +            else:
>> +                for byte_number in s:
>> +                    if byte_number > 127:
>> +                        escaped_bytes += b'%'
>> +                        escaped_bytes += 
>> hex(byte_number).upper().encode()[2:]
>> +                    else:
>> +                        escaped_bytes += bytes([byte_number])
>> +            return escaped_bytes
>>   -        raise MetadataDecodingException(s)
>> +        # were able to decode but not to utf-8
>> +        return text.encode('utf_8')
>>       def decode_path(path):
>> @@ -898,14 +922,14 @@ def p4CmdList(cmd, stdin=None, 
>> stdin_mode='w+b', cb=None, skip_info=False,
>>                       decoded_entry[key] = value
>>                   # Parse out data if it's an error response
>>                   if decoded_entry.get('code') == 'error' and 'data' 
>> in decoded_entry:
>> -                    decoded_entry['data'] = 
>> decoded_entry['data'].decode()
>> +                    decoded_entry['data'] = 
>> metadataTranscoder.decode_metadata(decoded_entry['data'])
>>                   entry = decoded_entry
>>               if skip_info:
>>                   if 'code' in entry and entry['code'] == 'info':
>>                       continue
>>               for key in p4KeysContainingNonUtf8Chars():
>>                   if key in entry:
>> -                    entry[key] = 
>> metadata_stream_to_writable_bytes(entry[key])
>> +                    entry[key] = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(entry[key])
>>               if cb is not None:
>>                   cb(entry)
>>               else:
>> @@ -1718,7 +1742,7 @@ class P4UserMap:
>>               # python2 or python3. To support
>>               # git-p4.metadataDecodingStrategy=fallback, self.users 
>> dict values
>>               # are always bytes, ready to be written to git.
>> -            emailbytes = 
>> metadata_stream_to_writable_bytes(output["Email"])
>> +            emailbytes = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(output["Email"])
>>               self.users[output["User"]] = output["FullName"] + b" <" 
>> + emailbytes + b">"
>>               self.emails[output["Email"]] = output["User"]
>>   @@ -1730,12 +1754,12 @@ class P4UserMap:
>>                   fullname = mapUser[0][1]
>>                   email = mapUser[0][2]
>>                   fulluser = fullname + " <" + email + ">"
>> -                self.users[user] = 
>> metadata_stream_to_writable_bytes(fulluser)
>> +                self.users[user] = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(fulluser)
>>                   self.emails[email] = user
>>             s = b''
>>           for (key, val) in self.users.items():
>> -            keybytes = metadata_stream_to_writable_bytes(key)
>> +            keybytes = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(key)
>>               s += b"%s\t%s\n" % (keybytes.expandtabs(1), 
>> val.expandtabs(1))
>>             open(self.getUserCacheFilename(), 'wb').write(s)
>> @@ -3349,7 +3373,7 @@ class P4Sync(Command, P4UserMap):
>>           if userid in self.users:
>>               return self.users[userid]
>>           else:
>> -            userid_bytes = metadata_stream_to_writable_bytes(userid)
>> +            userid_bytes = 
>> metadataTranscoder.metadata_stream_to_writable_bytes(userid)
>>               return b"%s <a@b>" % userid_bytes
>>         def streamTag(self, gitStream, labelName, labelDetails, 
>> commit, epoch):
>> @@ -4561,6 +4585,7 @@ commands = {
>>       "unshelve": P4Unshelve,
>>   }
>>   +metadataTranscoder = 
>> MetadataTranscoder(defaultMetadataDecodingStrategy, 
>> defaultFallbackMetadataEncoding)
>>     def main():
>>       if len(sys.argv[1:]) == 0:
>> diff --git a/t/meson.build b/t/meson.build
>> index a59da26be3f..656424fdff3 100644
>> --- a/t/meson.build
>> +++ b/t/meson.build
>> @@ -1090,6 +1090,7 @@ integration_tests = [
>>     't9834-git-p4-file-dir-bug.sh',
>>     't9835-git-p4-metadata-encoding-python2.sh',
>>     't9836-git-p4-metadata-encoding-python3.sh',
>> +  't9837-git-p4-error-encoding.sh',
>>     't9850-shell.sh',
>>     't9901-git-web--browse.sh',
>>     't9902-completion.sh',
>> diff --git a/t/t9837-git-p4-error-encoding.sh 
>> b/t/t9837-git-p4-error-encoding.sh
>> new file mode 100755
>> index 00000000000..1ea774afb1b
>> --- /dev/null
>> +++ b/t/t9837-git-p4-error-encoding.sh
>> @@ -0,0 +1,53 @@
>> +#!/bin/sh
>> +
>> +test_description='git p4 error encoding
>> +
>> +This test checks that the import process handles inconsistent text
>> +encoding in p4 error messages without failing'
>> +
>> +. ./lib-git-p4.sh
>> +
>> +###############################
>> +## SECTION REPEATED IN t9835 ##
>> +###############################
>> +
>> +# These tests require Perforce with non-unicode setup.
>> +out=$(2>&1 P4CHARSET=utf8 p4 client -o)
>> +if test $? -eq 0
>> +then
>> +    skip_all="skipping git p4 error encoding tests; Perforce is 
>> setup with unicode"
>> +    test_done
>> +fi
>> +
>> +# These tests are specific to Python 3. Write a custom script that 
>> executes
>> +# git-p4 directly with the Python 3 interpreter to ensure that we 
>> use that
>> +# version even if Git was compiled with Python 2.
>> +python_target_binary=$(which python3)
>> +if test -n "$python_target_binary"
>> +then
>> +    mkdir temp_python
>> +    PATH="$(pwd)/temp_python:$PATH"
>> +    export PATH
>> +
>> +    write_script temp_python/git-p4-python3 <<-EOF
>> +    exec "$python_target_binary" "$(git --exec-path)/git-p4" "\$@"
>> +    EOF
>> +fi
>> +
>> +git p4-python3 >err
>> +if ! grep 'valid commands' err
>> +then
>> +    skip_all="skipping python3 git p4 tests; python3 not available"
>> +    test_done
>> +fi
>> +
>> +test_expect_success 'start p4d' '
>> +    start_p4d
>> +'
>> +
>> +test_expect_success 'see if Perforce error with characters not 
>> convertable to utf-8 will be processed correctly' '
>> +    test_when_finished cleanup_git &&
>> +    $python_target_binary 
>> "$TEST_DIRECTORY"/t9837/git-p4-error-python3.py "$TEST_DIRECTORY"
>> +'
>> +
>> +test_done
>> diff --git a/t/t9837/git-p4-error-python3.py 
>> b/t/t9837/git-p4-error-python3.py
>> new file mode 100644
>> index 00000000000..fb65aee386e
>> --- /dev/null
>> +++ b/t/t9837/git-p4-error-python3.py
>> @@ -0,0 +1,15 @@
>> +import os
>> +import sys
>> +from  importlib.machinery import SourceFileLoader
>> +
>> +def main():
>> +    if len(sys.argv[1:]) != 1:
>> +        print("Expected test directory name")
>> +
>> +    gitp4_path = sys.argv[1] + "/../git-p4.py"
>> +    gitp4 = SourceFileLoader("gitp4", gitp4_path).load_module()
>> +    gitp4.p4CmdList(["edit", b'\xFEfile'])
>> +
>> +if __name__ == '__main__':
>> +    main()
>> +
>>
>> base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e

  reply	other threads:[~2025-03-25 23:09 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-20  3:20 [PATCH] git p4 fix for failure to decode p4 errors Nikolay Shustov via GitGitGadget
2025-03-22 11:48 ` Nikolay Shustov
2025-03-25 23:09   ` Nikolay Shustov [this message]
2025-03-26 15:09     ` Phillip Wood
2025-03-30 20:06       ` Nikolay Shustov
2025-03-31  1:21         ` Nikolay Shustov
2025-03-31  9:01           ` Fahad Al-Rashed
2025-03-31 23:37             ` Nikolay Shustov
2025-04-05 18:46               ` Nikolay Shustov
2025-04-08 12:13                 ` Nikolay Shustov
     [not found]                   ` <CAFd+s4USsHPaepvfNtjm5VGieuH89zbW5Yj+OSXD8THxkj6tTw@mail.gmail.com>
2025-04-09 12:20                     ` Nikolay Shustov
2025-04-08 13:18         ` Phillip Wood
2025-04-08 13:28           ` Nikolay Shustov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fdbb3f88-7321-4dc0-9ead-7ed9ef0fc995@gmail.com \
    --to=nikolay.shustov@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.