From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Git Mailing List <git@vger.kernel.org>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: [PATCH 4/4] utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()
Date: Fri, 19 Feb 2010 23:20:44 +0100 [thread overview]
Message-ID: <4B7F0EBC.4060209@lsrfire.ath.cx> (raw)
In-Reply-To: <4B7F0D08.6040608@lsrfire.ath.cx>
is_utf8() works by calling utf8_width() for each character at the
supplied location. In strbuf_add_wrapped_text(), we do that anyway
while wrapping the lines. So instead of checking the encoding
beforehand, optimistically assume that it's utf-8 and wrap along
until an invalid character is hit, and when that happens start over.
This pays off if the text consists only of valid utf-8 characters.
The following command was run against the Linux kernel repo with
git 1.7.0:
$ time git log --format='%b' v2.6.32 >/dev/null
real 0m2.679s
user 0m2.580s
sys 0m0.100s
$ time git log --format='%w(60,4,8)%b' >/dev/null
real 0m4.342s
user 0m4.230s
sys 0m0.110s
And with this patch series:
$ time git log --format='%w(60,4,8)%b' >/dev/null
real 0m3.741s
user 0m3.630s
sys 0m0.110s
So the cost of wrapping is reduced to 70% in this case.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Missing: numbers for a non-utf-8 repo.
utf8.c | 23 +++++++++++++++++------
1 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/utf8.c b/utf8.c
index 87437b0..84cfc72 100644
--- a/utf8.c
+++ b/utf8.c
@@ -324,16 +324,21 @@ static size_t display_mode_esc_sequence_len(const char *s)
* consumed (and no extra indent is necessary for the first line).
*/
int strbuf_add_wrapped_text(struct strbuf *buf,
- const char *text, int indent, int indent2, int width)
+ const char *text, int indent1, int indent2, int width)
{
- int w = indent, assume_utf8 = is_utf8(text);
- const char *bol = text, *space = NULL;
+ int indent, w, assume_utf8 = 1;
+ const char *bol, *space, *start = text;
+ size_t orig_len = buf->len;
if (width <= 0) {
- strbuf_add_indented_text(buf, text, indent, indent2);
+ strbuf_add_indented_text(buf, text, indent1, indent2);
return 1;
}
+retry:
+ bol = text;
+ w = indent = indent1;
+ space = NULL;
if (indent < 0) {
w = -indent;
space = text;
@@ -385,9 +390,15 @@ new_line:
}
continue;
}
- if (assume_utf8)
+ if (assume_utf8) {
w += utf8_width(&text, NULL);
- else {
+ if (!text) {
+ assume_utf8 = 0;
+ text = start;
+ strbuf_setlen(buf, orig_len);
+ goto retry;
+ }
+ } else {
w++;
text++;
}
--
1.7.0
next prev parent reply other threads:[~2010-02-19 22:20 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-19 22:13 [PATCH 0/4] utf8.c: strbuf'ify strbuf_add_wrapped_text() René Scharfe
2010-02-19 22:15 ` [PATCH 1/4] utf8.c: remove print_wrapped_text() René Scharfe
2010-02-19 22:15 ` [PATCH 2/4] utf8.c: remove print_spaces() René Scharfe
2010-02-19 22:16 ` [PATCH 3/4] utf8.c: remove strbuf_write() René Scharfe
2010-02-19 22:20 ` René Scharfe [this message]
2010-02-20 9:14 ` [PATCH 4/4] utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text() Johannes Schindelin
2010-02-20 19:22 ` Junio C Hamano
2010-02-20 9:03 ` [PATCH 0/4] utf8.c: strbuf'ify strbuf_add_wrapped_text() Johannes Schindelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B7F0EBC.4060209@lsrfire.ath.cx \
--to=rene.scharfe@lsrfire.ath.cx \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).