From: linux@horizon.com
To: junkio@cox.net
Cc: git@vger.kernel.org
Subject: Re: [PATCH] git-blame: Make the output human readable
Date: 6 Mar 2006 14:33:26 -0500
Message-ID: <20060306193326.19262.qmail@science.horizon.com>

Well, getting 15 characters in UTF-8 is easy (just stop before the 16th
byte for which ((b & 0xc0) != 0x80)), but what about combining characters?
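To make the "stop before the 16th non-continuation byte" rule concrete, here is a minimal sketch. The name utf8_prefix_bytes() is hypothetical (it is not from any patch in this thread), and the sketch assumes the input is already valid UTF-8. It counts code points only, so it has exactly the combining-character problem raised above.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the number of bytes occupied by the
 * first max_chars code points of the nul-terminated string s.
 * Assumes s is valid UTF-8; no validation is attempted.
 */
size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
	size_t i = 0;
	size_t chars = 0;

	while (s[i] != '\0') {
		unsigned char b = (unsigned char)s[i];

		/* Any byte that is not 10xxxxxx starts a new code point */
		if ((b & 0xc0) != 0x80) {
			if (chars == max_chars)
				break;	/* This would be one code point too many */
			chars++;
		}
		i++;
	}
	return i;
}
```

Called with max_chars = 1 on "e" followed by U+0301 (combining acute), it returns 1, which cuts the accent off its base letter.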

You've got accents and stuff to worry about.  And the annoying fact that
Unicode defined accents as suffixes, so you have to go past the 15th
column to include all of the accents that follow.

And then there's the fact that many characters are traditionally
represented as double-wide forms, even on character terminals.

See http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c for details
and an example implementation of wcwidth().
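To illustrate the kinds of answers a width lookup gives, here is a deliberately tiny stand-in. The names toy_width() and toy_string_width() are hypothetical, and the function covers only three Unicode blocks; it is a sketch of the idea, not a substitute for the table-driven implementation linked above.

```c
/*
 * Toy column-width lookup (hypothetical, far from complete):
 *   -1  control characters
 *    0  nul and the Combining Diacritical Marks block
 *    2  CJK Unified Ideographs and Hangul Syllables
 *    1  everything else (an assumption made only for this sketch)
 */
int toy_width(unsigned long cp)
{
	if (cp == 0)
		return 0;
	if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0))
		return -1;	/* C0 controls, DEL, C1 controls */
	if (cp >= 0x0300 && cp <= 0x036f)
		return 0;	/* Combining Diacritical Marks */
	if ((cp >= 0x4e00 && cp <= 0x9fff) ||	/* CJK Unified Ideographs */
	    (cp >= 0xac00 && cp <= 0xd7a3))	/* Hangul Syllables */
		return 2;
	return 1;
}

/* Sum the widths of n code points; -1 if any one is unprintable */
int toy_string_width(const unsigned long *cps, unsigned n)
{
	int total = 0;
	unsigned i;

	for (i = 0; i < n; i++) {
		int w = toy_width(cps[i]);

		if (w < 0)
			return -1;
		total += w;
	}
	return total;
}
```

With this, the three code points 'e', U+0301, U+4E2D occupy three columns: one for the accented letter and two for the ideograph.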

Using that, it would be something like (compiles but untested):

/*
 * Return the number of bytes from the nul-terminated utf8 string
 * that can be printed in at most max columns using a monospaced
 * font.  *actual returns the number of columns actually occupied,
 * which may be less than max.
 *
 * Output is truncated before any control characters or illegal
 * UTF-8 sequences.
 */
unsigned
fit_columns(char const *utf8, unsigned max, unsigned *actual)
{
	char const * const origin = utf8;
	unsigned width = 0;
	unsigned pos = 0;

	for (;;) {
		unsigned w;
		/* Cast first: plain char may be signed, and bytes >= 0x80 must stay positive */
		unsigned c = (unsigned char)*utf8++;

		/* Part 1: Parse the next Unicode code point */
		if (c < 0x20) {
			break;	/* Control character - stop */
		} else if (c < 0x7F) {
			w = 1;	/* Standard ASCII */
		} else if (c < 0xC2 || c > 0xF4) {
			break;	/* DEL or illegal Unicode */
		} else {
			/* Multi-byte UTF-8 sequence */
			unsigned n;
			unsigned char byte = *utf8++;

			if (c < 0xE0) {
				/* 2-byte sequence: U+0080..U+07FF */
				n = 1;
				c &= 0x1F;
			} else if (c < 0xF0) {
				/* 3-byte sequence: U+0800..U+FFFF */
				if (c == 0xE0 && byte < 0xA0)
					break;	/* Overlong: < U+0800 */
				if (c == 0xED && byte >= 0xA0)
					break;	/* Surrogate: U+D800..U+DFFF */
				n = 2;
				c &= 0x0F;
			} else {
				/* 4-byte sequence: U+10000..U+10FFFF */
				if (byte < 0x90 ? c == 0xF0 : c == 0xF4)
					break; /* < 10000 or > 10FFFF */
				n = 3;
				c &= 0x07;
			}

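			for (;;) {
				/* Parenthesized: != binds tighter than & */
				if ((byte & 0xc0) != 0x80)
					goto done;	/* Double break */
				c = (c << 6) | (byte & 0x3f);
				if (--n == 0)
					break;
				/* Fetch another byte only if one is still expected */
				byte = *utf8++;
			}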
			/* Now find the width of it */
			w = wcwidth(c);
			if (w == (unsigned)-1)
				break;	/* wcwidth() returned -1: not printable */
		}

		/* Part 2: Figure out if it will fit */
		if (width + w > max)
			break;	/* Would exceed space - stop */
		/* Part 3: It fits; update our statistics */
		width += w;
		pos = (unsigned)(utf8 - origin);
	}

done:
	if (actual)
		*actual = width;
	return pos;
}
