* [PATCH] git-p4: fix bug with encoding of p4 client name @ 2022-07-08 8:01 Kilian Kilger via GitGitGadget 2022-07-08 11:28 ` Tao Klerks 2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget 0 siblings, 2 replies; 9+ messages in thread From: Kilian Kilger via GitGitGadget @ 2022-07-08 8:01 UTC (permalink / raw) To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger From: Kilian Kilger <kilian.kilger@sap.com> The Perforce client name can contain arbitrary characters which do not decode to UTF-8. Use the fallback strategy implemented in metadata_stream_to_writable_bytes() also for the client name. Signed-off-by: Kilian Kilger <kkilger@gmail.com> --- git-p4: Fix bug with encoding of P4 client name Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v1 Pull-Request: https://github.com/git/git/pull/1285 git-p4.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/git-p4.py b/git-p4.py index 8fbf6eb1fe3..e65d6a2b0e1 100755 --- a/git-p4.py +++ b/git-p4.py @@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, if bytes is not str: # Decode unmarshalled dict to use str keys and values, except for: # - `data` which may contain arbitrary binary data - # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes + # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() decoded_entry = {} for key, value in entry.items(): key = key.decode() - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')): + if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')): value = value.decode() decoded_entry[key] = value # Parse out data if it's an error response @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, continue if 'desc' in entry: entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) + if 'client' in entry: + entry['client'] = metadata_stream_to_writable_bytes(entry['client']) if 'FullName' in entry: entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) if cb is not None: base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5 -- gitgitgadget ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] git-p4: fix bug with encoding of p4 client name 2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget @ 2022-07-08 11:28 ` Tao Klerks 2022-07-08 15:05 ` Junio C Hamano 2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget 1 sibling, 1 reply; 9+ messages in thread From: Tao Klerks @ 2022-07-08 11:28 UTC (permalink / raw) To: Kilian Kilger via GitGitGadget; +Cc: git, Kilian Kilger, Kilian Kilger This makes sense to me, and I don't see anything wrong with the "form" (and nor does GitGitGadget). Not sure whether formal review sign-off is used on this list, I don't tend to see it, but do I see "Reviewed-by" on patches, so FWIW: Reviewed-by: Tao Klerks <tao@klerks.biz> On Fri, Jul 8, 2022 at 10:01 AM Kilian Kilger via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Kilian Kilger <kilian.kilger@sap.com> > > The Perforce client name can contain arbitrary characters > which do not decode to UTF-8. Use the fallback strategy > implemented in metadata_stream_to_writable_bytes() also > for the client name. > > Signed-off-by: Kilian Kilger <kkilger@gmail.com> > --- > git-p4: Fix bug with encoding of P4 client name > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v1 > Pull-Request: https://github.com/git/git/pull/1285 > > git-p4.py | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/git-p4.py b/git-p4.py > index 8fbf6eb1fe3..e65d6a2b0e1 100755 > --- a/git-p4.py > +++ b/git-p4.py > @@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, > if bytes is not str: > # Decode unmarshalled dict to use str keys and values, except for: > # - `data` which may contain arbitrary binary data > - # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes > + # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes > # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() > decoded_entry = {} > for key, value in entry.items(): > key = key.decode() > - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')): > + if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')): > value = value.decode() > decoded_entry[key] = value > # Parse out data if it's an error response > @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, > continue > if 'desc' in entry: > entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) > + if 'client' in entry: > + entry['client'] = metadata_stream_to_writable_bytes(entry['client']) > if 'FullName' in entry: > entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) > if cb is not None: > > base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5 > -- > gitgitgadget ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] git-p4: fix bug with encoding of p4 client name 2022-07-08 11:28 ` Tao Klerks @ 2022-07-08 15:05 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2022-07-08 15:05 UTC (permalink / raw) To: Tao Klerks Cc: Kilian Kilger via GitGitGadget, git, Kilian Kilger, Kilian Kilger Tao Klerks <tao@klerks.biz> writes: > > > On Fri, Jul 8, 2022 at 10:01 AM Kilian Kilger via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Kilian Kilger <kilian.kilger@sap.com> >> >> The Perforce client name can contain arbitrary characters >> which do not decode to UTF-8. Use the fallback strategy >> implemented in metadata_stream_to_writable_bytes() also >> for the client name. >> >> Signed-off-by: Kilian Kilger <kkilger@gmail.com> >> --- >> ... >> >> @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, >> continue >> if 'desc' in entry: >> entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) >> + if 'client' in entry: >> + entry['client'] = metadata_stream_to_writable_bytes(entry['client']) >> if 'FullName' in entry: >> entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'] We had two repetitions and now we have three, which is a good time to see if it makes sense to reduce the temptation for future developers to add the fourth repetition in the next round, e.g. for e in ["client", "desc", "FullName"]: if e in entry: entry[e] = metadata_stream_to_writable_bytes(entry[e]) or something like that? > This makes sense to me, and I don't see anything wrong with the "form" > (and nor does GitGitGadget). One thing that is a bit problematic is that in-body From does not match the sign-off. Kilian, which identity do you want to use in your contribution to this project? > Not sure whether formal review sign-off is used on this list, I don't > tend to see it, but do I see "Reviewed-by" on patches, so FWIW: > > Reviewed-by: Tao Klerks <tao@klerks.biz> Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] git-p4: fix bug with encoding of p4 client name 2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget 2022-07-08 11:28 ` Tao Klerks @ 2022-07-18 8:57 ` Kilian Kilger via GitGitGadget 2022-07-18 16:36 ` Junio C Hamano 2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget 1 sibling, 2 replies; 9+ messages in thread From: Kilian Kilger via GitGitGadget @ 2022-07-18 8:57 UTC (permalink / raw) To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger From: Kilian Kilger <kkilger@gmail.com> The Perforce client name can contain arbitrary characters which do not decode to UTF-8. Use the fallback strategy implemented in metadata_stream_to_writable_bytes() also for the client name. Signed-off-by: Kilian Kilger <kkilger@gmail.com> --- git-p4: Fix bug with encoding of P4 client name Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v2 Pull-Request: https://github.com/git/git/pull/1285 Range-diff vs v1: 1: 7393b59c642 ! 1: 3280a9579bc git-p4: fix bug with encoding of p4 client name @@ ## Metadata ## -Author: Kilian Kilger <kilian.kilger@sap.com> +Author: Kilian Kilger <kkilger@gmail.com> ## Commit message ## git-p4: fix bug with encoding of p4 client name @@ Commit message Signed-off-by: Kilian Kilger <kkilger@gmail.com> ## git-p4.py ## +@@ git-p4.py: def isModeExecChanged(src_mode, dst_mode): + return isModeExec(src_mode) != isModeExec(dst_mode) + + ++def p4KeysContainingNonUtf8Chars(): ++ """Returns all keys which may contain non UTF-8 encoded strings ++ for which a fallback strategy has to be applied. ++ """ ++ return ['desc', 'client', 'FullName'] ++ ++ ++def p4KeysContainingBinaryData(): ++ """Returns all keys which may contain arbitrary binary data ++ """ ++ return ['data'] ++ ++ ++def p4KeyContainsFilePaths(key): ++ """Returns True if the key contains file paths. These are handled by decode_path(). ++ Otherwise False. ++ """ ++ return key.startswith('depotFile') or key in ['path', 'clientFile'] ++ ++ ++def p4KeyWhichCanBeDirectlyDecoded(key): ++ """Returns True if the key can be directly decoded as UTF-8 string ++ Otherwise False. ++ ++ Keys which can not be encoded directly: ++ - `data` which may contain arbitrary binary data ++ - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text ++ - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() ++ """ ++ if key in p4KeysContainingNonUtf8Chars() or \ ++ key in p4KeysContainingBinaryData() or \ ++ p4KeyContainsFilePaths(key): ++ return False ++ return True ++ ++ + def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, + errors_as_exceptions=False, *k, **kw): + @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, + try: + while True: + entry = marshal.load(p4.stdout) ++ if bytes is not str: - # Decode unmarshalled dict to use str keys and values, except for: - # - `data` which may contain arbitrary binary data +- # Decode unmarshalled dict to use str keys and values, except for: +- # - `data` which may contain arbitrary binary data - # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes -+ # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes - # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() +- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() ++ # Decode unmarshalled dict to use str keys and values. Special cases are handled below. decoded_entry = {} for key, value in entry.items(): key = key.decode() - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')): -+ if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')): ++ if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key): value = value.decode() decoded_entry[key] = value # Parse out data if it's an error response @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, + if skip_info: + if 'code' in entry and entry['code'] == 'info': continue - if 'desc' in entry: - entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) -+ if 'client' in entry: -+ entry['client'] = metadata_stream_to_writable_bytes(entry['client']) - if 'FullName' in entry: - entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) +- if 'desc' in entry: +- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) +- if 'FullName' in entry: +- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) ++ for key in p4KeysContainingNonUtf8Chars(): ++ if key in entry: ++ entry[key] = metadata_stream_to_writable_bytes(entry[key]) if cb is not None: + cb(entry) + else: git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 42 insertions(+), 9 deletions(-) diff --git a/git-p4.py b/git-p4.py index 8fbf6eb1fe3..9323b943c68 100755 --- a/git-p4.py +++ b/git-p4.py @@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode): return isModeExec(src_mode) != isModeExec(dst_mode) +def p4KeysContainingNonUtf8Chars(): + """Returns all keys which may contain non UTF-8 encoded strings + for which a fallback strategy has to be applied. + """ + return ['desc', 'client', 'FullName'] + + +def p4KeysContainingBinaryData(): + """Returns all keys which may contain arbitrary binary data + """ + return ['data'] + + +def p4KeyContainsFilePaths(key): + """Returns True if the key contains file paths. These are handled by decode_path(). + Otherwise False. + """ + return key.startswith('depotFile') or key in ['path', 'clientFile'] + + +def p4KeyWhichCanBeDirectlyDecoded(key): + """Returns True if the key can be directly decoded as UTF-8 string + Otherwise False. + + Keys which can not be encoded directly: + - `data` which may contain arbitrary binary data + - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text + - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() + """ + if key in p4KeysContainingNonUtf8Chars() or \ + key in p4KeysContainingBinaryData() or \ + p4KeyContainsFilePaths(key): + return False + return True + + def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, errors_as_exceptions=False, *k, **kw): @@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, try: while True: entry = marshal.load(p4.stdout) + if bytes is not str: - # Decode unmarshalled dict to use str keys and values, except for: - # - `data` which may contain arbitrary binary data - # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes - # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() + # Decode unmarshalled dict to use str keys and values. Special cases are handled below. decoded_entry = {} for key, value in entry.items(): key = key.decode() - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')): + if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key): value = value.decode() decoded_entry[key] = value # Parse out data if it's an error response @@ -869,10 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, if skip_info: if 'code' in entry and entry['code'] == 'info': continue - if 'desc' in entry: - entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) - if 'FullName' in entry: - entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) + for key in p4KeysContainingNonUtf8Chars(): + if key in entry: + entry[key] = metadata_stream_to_writable_bytes(entry[key]) if cb is not None: cb(entry) else: base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5 -- gitgitgadget ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] git-p4: fix bug with encoding of p4 client name 2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget @ 2022-07-18 16:36 ` Junio C Hamano 2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget 1 sibling, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2022-07-18 16:36 UTC (permalink / raw) To: Kilian Kilger via GitGitGadget; +Cc: git, Tao Klerks, Kilian Kilger "Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kilian Kilger <kkilger@gmail.com> > > The Perforce client name can contain arbitrary characters > which do not decode to UTF-8. Use the fallback strategy > implemented in metadata_stream_to_writable_bytes() also > for the client name. > > Signed-off-by: Kilian Kilger <kkilger@gmail.com> > --- > git-p4: Fix bug with encoding of P4 client name Sorry, is this to improve the topic that has already last Monday? If so, it is way too late to update with a replacement patch, like this. Instead, please send in a follow-up patch on top of what has already been merged. The description above seems to be identical to what was in the previous round, but the patch seems to use a lot more code, so such a follow-up patch would have plenty to explain how the new strategy improves over the previous one. Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name 2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget 2022-07-18 16:36 ` Junio C Hamano @ 2022-07-21 9:07 ` Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget ` (2 more replies) 1 sibling, 3 replies; 9+ messages in thread From: Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 UTC (permalink / raw) To: git; +Cc: Tao Klerks, Kilian Kilger Kilian Kilger (2): git-p4: fix bug with encoding of p4 client name git-p4: refactoring of p4CmdList() git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 42 insertions(+), 9 deletions(-) base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v3 Pull-Request: https://github.com/git/git/pull/1285 Range-diff vs v2: -: ----------- > 1: 87e7809b75a git-p4: fix bug with encoding of p4 client name 1: 3280a9579bc ! 2: 4a81423f0e8 git-p4: fix bug with encoding of p4 client name @@ Metadata Author: Kilian Kilger <kkilger@gmail.com> ## Commit message ## - git-p4: fix bug with encoding of p4 client name + git-p4: refactoring of p4CmdList() - The Perforce client name can contain arbitrary characters - which do not decode to UTF-8. Use the fallback strategy - implemented in metadata_stream_to_writable_bytes() also - for the client name. + The function p4CmdList executes a Perforce command and + decodes the marshalled python dictionary. Special care has to be + taken for certain dictionary values which contain non-unicode characters. + The old handling contained separate hacks for each of the corresponding + dictionary keys. This commit tries to refactor the coding to handle the + special cases uniformely. Signed-off-by: Kilian Kilger <kkilger@gmail.com> @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=F if bytes is not str: - # Decode unmarshalled dict to use str keys and values, except for: - # - `data` which may contain arbitrary binary data -- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes +- # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes - # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() + # Decode unmarshalled dict to use str keys and values. Special cases are handled below. decoded_entry = {} for key, value in entry.items(): key = key.decode() -- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')): +- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')): + if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key): value = value.decode() decoded_entry[key] = value @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=F continue - if 'desc' in entry: - entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) +- if 'client' in entry: +- entry['client'] = metadata_stream_to_writable_bytes(entry['client']) - if 'FullName' in entry: - entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) + for key in p4KeysContainingNonUtf8Chars(): -- gitgitgadget ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/2] git-p4: fix bug with encoding of p4 client name 2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 ` Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget 2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano 2 siblings, 0 replies; 9+ messages in thread From: Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 UTC (permalink / raw) To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger From: Kilian Kilger <kilian.kilger@sap.com> The Perforce client name can contain arbitrary characters which do not decode to UTF-8. Use the fallback strategy implemented in metadata_stream_to_writable_bytes() also for the client name. Signed-off-by: Kilian Kilger <kkilger@gmail.com> Reviewed-by: Tao Klerks <tao@klerks.biz> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- git-p4.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/git-p4.py b/git-p4.py index 8fbf6eb1fe3..e65d6a2b0e1 100755 --- a/git-p4.py +++ b/git-p4.py @@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, if bytes is not str: # Decode unmarshalled dict to use str keys and values, except for: # - `data` which may contain arbitrary binary data - # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes + # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() decoded_entry = {} for key, value in entry.items(): key = key.decode() - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')): + if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')): value = value.decode() decoded_entry[key] = value # Parse out data if it's an error response @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, continue if 'desc' in entry: entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) + if 'client' in entry: + entry['client'] = metadata_stream_to_writable_bytes(entry['client']) if 'FullName' in entry: entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) if cb is not None: -- gitgitgadget ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v3 2/2] git-p4: refactoring of p4CmdList() 2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 ` Kilian Kilger via GitGitGadget 2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano 2 siblings, 0 replies; 9+ messages in thread From: Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 UTC (permalink / raw) To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger From: Kilian Kilger <kkilger@gmail.com> The function p4CmdList executes a Perforce command and decodes the marshalled python dictionary. Special care has to be taken for certain dictionary values which contain non-unicode characters. The old handling contained separate hacks for each of the corresponding dictionary keys. This commit tries to refactor the coding to handle the special cases uniformely. Signed-off-by: Kilian Kilger <kkilger@gmail.com> --- git-p4.py | 53 ++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 42 insertions(+), 11 deletions(-) diff --git a/git-p4.py b/git-p4.py index e65d6a2b0e1..9323b943c68 100755 --- a/git-p4.py +++ b/git-p4.py @@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode): return isModeExec(src_mode) != isModeExec(dst_mode) +def p4KeysContainingNonUtf8Chars(): + """Returns all keys which may contain non UTF-8 encoded strings + for which a fallback strategy has to be applied. + """ + return ['desc', 'client', 'FullName'] + + +def p4KeysContainingBinaryData(): + """Returns all keys which may contain arbitrary binary data + """ + return ['data'] + + +def p4KeyContainsFilePaths(key): + """Returns True if the key contains file paths. These are handled by decode_path(). + Otherwise False. + """ + return key.startswith('depotFile') or key in ['path', 'clientFile'] + + +def p4KeyWhichCanBeDirectlyDecoded(key): + """Returns True if the key can be directly decoded as UTF-8 string + Otherwise False. + + Keys which can not be encoded directly: + - `data` which may contain arbitrary binary data + - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text + - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() + """ + if key in p4KeysContainingNonUtf8Chars() or \ + key in p4KeysContainingBinaryData() or \ + p4KeyContainsFilePaths(key): + return False + return True + + def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, errors_as_exceptions=False, *k, **kw): @@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, try: while True: entry = marshal.load(p4.stdout) + if bytes is not str: - # Decode unmarshalled dict to use str keys and values, except for: - # - `data` which may contain arbitrary binary data - # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes - # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path() + # Decode unmarshalled dict to use str keys and values. Special cases are handled below. decoded_entry = {} for key, value in entry.items(): key = key.decode() - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')): + if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key): value = value.decode() decoded_entry[key] = value # Parse out data if it's an error response @@ -869,12 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, if skip_info: if 'code' in entry and entry['code'] == 'info': continue - if 'desc' in entry: - entry['desc'] = metadata_stream_to_writable_bytes(entry['desc']) - if 'client' in entry: - entry['client'] = metadata_stream_to_writable_bytes(entry['client']) - if 'FullName' in entry: - entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']) + for key in p4KeysContainingNonUtf8Chars(): + if key in entry: + entry[key] = metadata_stream_to_writable_bytes(entry[key]) if cb is not None: cb(entry) else: -- gitgitgadget ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name 2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget @ 2022-07-21 16:46 ` Junio C Hamano 2 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2022-07-21 16:46 UTC (permalink / raw) To: Kilian Kilger via GitGitGadget; +Cc: git, Tao Klerks, Kilian Kilger "Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com> writes: > Kilian Kilger (2): > git-p4: fix bug with encoding of p4 client name > git-p4: refactoring of p4CmdList() [PATCH v3 1/2] seems to be identical to 34f67c96 (git-p4: fix bug with encoding of p4 client name, 2022-07-08) that was already merged to 'next' yesterday, so I can safely ignore it and take [PATCH v3 2/2] as if it were a single follow-up patch on the same topic and queue it on top. Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-07-21 16:46 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget 2022-07-08 11:28 ` Tao Klerks 2022-07-08 15:05 ` Junio C Hamano 2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget 2022-07-18 16:36 ` Junio C Hamano 2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget 2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget 2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).