From: Jan-Philip Gehrcke <jgehrcke@googlemail.com>
To: "Torsten Bögershausen" <tboegi@web.de>, git@vger.kernel.org
Subject: Re: Should the --encoding argument to log/show commands make any guarantees about their output?
Date: Tue, 16 Jun 2015 11:38:45 +0200
Message-ID: <557FEEA5.2080006@googlemail.com>
In-Reply-To: <557EFB94.3040104@web.de>

On 15.06.2015 18:21, Torsten Bögershausen wrote:
> On 2015-06-15 10.50, Jan-Philip Gehrcke wrote:
>> Let me describe what I think it currently does:
>>
>> The program attempts to re-code a log message, so it follows the chain
>>
>>      raw input -> unicode -> raw output
> Not sure what "raw input/output" means.
> But there is only one reencode step involved, e.g.
> input(8859) -> output(UTF-8)

We surely agree. With "raw" I meant a sequence of bytes, and with 
"unicode" I meant the intermediate state in the process of re-encoding 
(which can be thought of as decoding and encoding with a transient 
intermediate state).
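
To make that chain concrete, here is a minimal Python sketch of re-encoding as "decode, then encode". It is only an illustration of the idea, not Git's code (Git is written in C and, as far as I know, hands the conversion to iconv in a single step); the function name reencode is made up for this example.

```python
def reencode(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode a byte sequence from codec `src` to codec `dst`."""
    text = raw.decode(src)    # raw input -> unicode (transient intermediate state)
    return text.encode(dst)   # unicode -> raw output


# A name stored as ISO-8859-1 bytes, re-encoded to UTF-8.
latin1 = "Bögershausen".encode("iso-8859-1")
utf8 = reencode(latin1, "iso-8859-1", "utf-8")
```

Both steps can fail: the first if the input bytes are invalid in the assumed input codec, the second if the output codec cannot represent a character.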

> If the user ignores this warning, how should Git guess the encoding?

I entirely appreciate that there is no satisfying solution to this very 
problem.

>> If this step fails (if the entry contains a byte sequence that is invalid in the specified/assumed input codec),
>> the procedure is aborted and the data is dumped as is (obviously without applying the requested output encoding).
>>
>> Is that correct?
> Yes, see above.

Thanks!
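
For the record, the behavior confirmed above can be modeled roughly like this. Again, this is a Python sketch under my own assumptions, with a hypothetical function name, and not a rendering of the actual implementation:

```python
def reencode_or_passthrough(raw: bytes, src: str, dst: str) -> bytes:
    """Try to re-encode `raw`; on any failure, return the bytes untouched."""
    try:
        return raw.decode(src).encode(dst)
    except UnicodeError:
        # Covers both decode and encode errors: the data is dumped as is,
        # without applying the requested output encoding.
        return raw


# b"caf\xe9" is valid ISO-8859-1 but invalid UTF-8. If the input is
# assumed to be UTF-8, the conversion fails and the bytes pass through.
out = reencode_or_passthrough(b"caf\xe9", "utf-8", "utf-8")
```

The consequence is that the output is not guaranteed to be valid in the requested codec: in the example, out still contains the byte 0xe9, which no UTF-8 decoder will accept.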

>> Hence, from my point of view, the rationale that git show/log should be able to output *text* information means
>> that they should not emit byte sequences that are invalid in the codec specified via the --encoding argument.
>> In the current situation, the work of dealing with invalid byte sequences is just outsourced to software
>> further below in the tool chain
>> (at some point a replacement character � should be displayed to the user instead of the invalid raw bytes).
>>
>> I am not entirely sure where this discussion should lead.
> Yes, until someone writes a patch to improve either the documentation or the code,
> nothing will be changed.
>> However, I think that if the behavior of the software will not be changed,
>> then the documentation for the --encoding option should be more precise and
>> clarify what actually happens behind the scenes. What do you think?
> Patches are more than welcome.

I'd be willing to contribute, but of course there must first be a
discussion and an agreement on whether anything needs to change at all,
and if so, what exactly.

My contribution to this discussion: I think there should be a command
line option that makes git show/log/friends emit a byte stream that is
guaranteed to be valid in a given codec.

That would require detection and treatment of those cases where 
corrupted text resides in the repository (we cannot prevent it from 
entering the repository, as discussed). In these cases, one could emit a 
replacement symbol (e.g. '?') per invalid byte subsequence (this seems 
to be more established than just swallowing the invalid byte sequence).
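
A rough Python sketch of what I have in mind follows. The function name and the marker parameter are hypothetical, and this says nothing about how it would be implemented in Git itself:

```python
def reencode_lossy(raw: bytes, src: str, dst: str, marker: str = "?") -> bytes:
    """Re-encode `raw`, substituting a marker for anything that cannot be converted."""
    # Invalid input bytes are decoded to U+FFFD instead of raising an error.
    text = raw.decode(src, errors="replace")
    # Use the chosen marker in place of U+FFFD. Note that a U+FFFD that was
    # legitimately present in the input is replaced as well.
    text = text.replace("\ufffd", marker)
    # Characters the output codec cannot represent are encoded as b"?".
    return text.encode(dst, errors="replace")


# The stray byte 0xff is invalid in UTF-8 and becomes a single '?'.
clean = reencode_lossy(b"abc\xffdef", "utf-8", "utf-8")
```

Unlike the pass-through behavior, the result here always decodes cleanly in the requested output codec, at the price of losing the original bytes.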

What do you think?

I think the --encoding option would have ideal semantics for the
described behavior.

However, I guess maintaining backwards compatibility is an issue here. 
On the other hand, I realize that the --encoding option undergoes 
changes: the docs for git log in release 2.4.3 do not even list the 
--encoding option anymore. Why is that? I haven't found a corresponding 
changelog/release notes entry.


Thanks,


Jan-Philip
