All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Geoffrey Thomas <geofft@MIT.EDU>,
	git@vger.kernel.org
Subject: Re: [PATCH] builtin-blame.c: Use utf8_strwidth for author's names
Date: Mon, 2 Feb 2009 07:41:40 -0500	[thread overview]
Message-ID: <20090202124139.GA8325@sigio.peff.net> (raw)
In-Reply-To: <7v8wopmizw.fsf@gitster.siamese.dyndns.org>

On Sun, Feb 01, 2009 at 10:48:51PM -0800, Junio C Hamano wrote:

> > I do not know what encoding the author is at that point, but if you cannot 
> > be sure that it is UTF-8, using utf8_strwidth() is just as wrong as the 
> > current code, IMHO.
> 
> That is true, but then we are not losing anything.
> 
> This codepath is not about the payload (the contents of the files) but the
> author name part of the commit log message, and UTF-8 would probably be
> the only sensible encoding to standardize on.
> 
> If your project uses UTF-8 for everybody, great, we will align them better
> than we did before.  If not, sorry, you will get a different misaligned
> names.
> 
> That assumes utf8_width() does not barf when fed an invalid byte sequence,
> but I did not think it is that fragile (I didn't actually audit the
> codepath, though).

We should be able to know the encoding (we call reencode_commit_message,
but we don't bother to save the result). It should be trivial to do:

int strwidth(const char *s, const char *encoding)
{
  if (!strcmp(encoding, "utf-8"))
    return utf8_strwidth(s);
  /* ideally, else if (some_other_encoding_family) */
  else
    return strlen(s);
}

Then utf-8 is fixed, and other encodings keep identical behavior (and
don't even waste cycles on utf-8 decoding). And it should be obvious to
anyone who wants to add a width detector for their pet encoding where it
should go.

-Peff

  parent reply	other threads:[~2009-02-02 12:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-30  9:41 [PATCH] utf8: add utf8_strwidth() Geoffrey Thomas
2009-01-30  9:41 ` [PATCH] builtin-blame.c: Use utf8_strwidth for author's names Geoffrey Thomas
2009-01-30 17:12   ` Johannes Schindelin
2009-01-30 22:22     ` Geoffrey Thomas
2009-01-31  7:24       ` Jeff King
2009-02-01 22:34       ` Johannes Schindelin
2009-02-02  6:48         ` Junio C Hamano
2009-02-02 12:40           ` Johannes Schindelin
2009-02-03  4:30             ` Junio C Hamano
2009-02-02 12:41           ` Jeff King [this message]
2009-01-31  7:17 ` [PATCH] utf8: add utf8_strwidth() Jeff King
2009-01-31  8:51   ` Geoffrey Thomas
2009-01-31  8:56     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090202124139.GA8325@sigio.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=geofft@MIT.EDU \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.