git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole
Date: Tue, 21 Feb 2012 13:25:59 -0500
Message-ID: <20120221182559.GB32668@sigill.intra.peff.net>
In-Reply-To: <1329834292-2511-4-git-send-email-pclouds@gmail.com>

On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote:

> Commit object has its own format, which happens to be in ascii, but
> not really subject to re-encoding.
> 
> There are only four areas that may be re-encoded: author line,
> committer line, mergetag lines and commit body.  Encoding of tags
> embedded in mergetag lines is not decided by commit encoding, so leave
> it out and consider it binary.

Is this worth the effort? Yes, re-encoding the ASCII bits of the commit
object is unnecessary. But do we actually handle encodings that are not
ASCII supersets? IOW, I could see the point if this is making it
possible to hold utf-16 names and messages in your commits (though why
you would want to do so is beyond me...). But my understanding is that
this is horribly broken anyway by other parts of the code. And even
looking at your code below:

> +static char *reencode_commit(const char *buffer,
> +			     const char *out_enc, const char *in_enc)
> +{
> +	struct strbuf out = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	char *reencoded, *s, *e;
> +
> +	strbuf_addstr(&buf, buffer);
> +
> +	s = strstr(buf.buf, "\nauthor ");
> +	assert(s != NULL);

Wouldn't this assert trigger in the presence of encodings which
contain ASCII NUL (e.g., wide encodings like utf-16)?
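
To make the NUL concern concrete, here is a minimal sketch in C. It
assumes a made-up buffer (utf16_sample) and a hypothetical helper
(find_bytes); neither comes from the patch, and the buffer is not a
complete commit object. It only shows that NUL-terminated string
functions stop at the first zero byte of UTF-16 text, while a
length-aware search does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sample, not a real commit object: ASCII header keywords
 * followed by the name "Duy" stored as UTF-16BE. Each UTF-16 code unit
 * for an ASCII letter has a 0x00 high byte, so the buffer contains
 * embedded NUL bytes.
 */
const char utf16_sample[] =
	"author " "\0D\0u\0y" "\n"
	"committer " "\0D\0u\0y" "\n";

/*
 * Hypothetical helper: search a buffer of known length for a
 * NUL-terminated needle without treating NUL bytes in the haystack
 * as a terminator.
 */
const char *find_bytes(const char *hay, size_t hay_len, const char *needle)
{
	size_t needle_len;
	size_t i;

	assert(hay != NULL && needle != NULL);
	needle_len = strlen(needle);

	if (needle_len == 0)
		return hay;
	if (needle_len > hay_len)
		return NULL;

	for (i = 0; i + needle_len <= hay_len; i++) {
		if (memcmp(hay + i, needle, needle_len) == 0)
			return hay + i;
	}
	return NULL;
}
```

With this sample, strlen(utf16_sample) sees only the bytes before the
first NUL, and strstr(utf16_sample, "\ncommitter ") returns NULL even
though the bytes are present. find_bytes(utf16_sample,
sizeof(utf16_sample) - 1, "\ncommitter ") locates them. Any code that
copies or scans such a buffer with str*() functions is limited to the
part before the first NUL.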

Is there an encoding you have in mind which would be helped by this?

-Peff

