All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Git Mailing List <git@vger.kernel.org>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: [PATCH 4/4] utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()
Date: Fri, 19 Feb 2010 23:20:44 +0100	[thread overview]
Message-ID: <4B7F0EBC.4060209@lsrfire.ath.cx> (raw)
In-Reply-To: <4B7F0D08.6040608@lsrfire.ath.cx>

is_utf8() works by calling utf8_width() for each character at the
supplied location.  In strbuf_add_wrapped_text(), we do that anyway
while wrapping the lines.  So instead of checking the encoding
beforehand, optimistically assume that it's utf-8 and wrap along
until an invalid character is hit, and when that happens start over.

This pays off if the text consists only of valid utf-8 characters.
The following command was run against the Linux kernel repo with
git 1.7.0:

	$ time git log --format='%b' v2.6.32 >/dev/null

	real	0m2.679s
	user	0m2.580s
	sys	0m0.100s

	$ time git log --format='%w(60,4,8)%b' >/dev/null

	real	0m4.342s
	user	0m4.230s
	sys	0m0.110s

And with this patch series:

	$ time git log --format='%w(60,4,8)%b' >/dev/null

	real	0m3.741s
	user	0m3.630s
	sys	0m0.110s

So the cost of wrapping is reduced to 70% in this case.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Missing: numbers for a non-utf-8 repo.

 utf8.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/utf8.c b/utf8.c
index 87437b0..84cfc72 100644
--- a/utf8.c
+++ b/utf8.c
@@ -324,16 +324,21 @@ static size_t display_mode_esc_sequence_len(const char *s)
  * consumed (and no extra indent is necessary for the first line).
  */
 int strbuf_add_wrapped_text(struct strbuf *buf,
-		const char *text, int indent, int indent2, int width)
+		const char *text, int indent1, int indent2, int width)
 {
-	int w = indent, assume_utf8 = is_utf8(text);
-	const char *bol = text, *space = NULL;
+	int indent, w, assume_utf8 = 1;
+	const char *bol, *space, *start = text;
+	size_t orig_len = buf->len;
 
 	if (width <= 0) {
-		strbuf_add_indented_text(buf, text, indent, indent2);
+		strbuf_add_indented_text(buf, text, indent1, indent2);
 		return 1;
 	}
 
+retry:
+	bol = text;
+	w = indent = indent1;
+	space = NULL;
 	if (indent < 0) {
 		w = -indent;
 		space = text;
@@ -385,9 +390,15 @@ new_line:
 			}
 			continue;
 		}
-		if (assume_utf8)
+		if (assume_utf8) {
 			w += utf8_width(&text, NULL);
-		else {
+			if (!text) {
+				assume_utf8 = 0;
+				text = start;
+				strbuf_setlen(buf, orig_len);
+				goto retry;
+			}
+		} else {
 			w++;
 			text++;
 		}
-- 
1.7.0

  parent reply	other threads:[~2010-02-19 22:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-19 22:13 [PATCH 0/4] utf8.c: strbuf'ify strbuf_add_wrapped_text() René Scharfe
2010-02-19 22:15 ` [PATCH 1/4] utf8.c: remove print_wrapped_text() René Scharfe
2010-02-19 22:15 ` [PATCH 2/4] utf8.c: remove print_spaces() René Scharfe
2010-02-19 22:16 ` [PATCH 3/4] utf8.c: remove strbuf_write() René Scharfe
2010-02-19 22:20 ` René Scharfe [this message]
2010-02-20  9:14   ` [PATCH 4/4] utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text() Johannes Schindelin
2010-02-20 19:22     ` Junio C Hamano
2010-02-20  9:03 ` [PATCH 0/4] utf8.c: strbuf'ify strbuf_add_wrapped_text() Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B7F0EBC.4060209@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.