* [PATCH] log --format: document %w
@ 2009-11-22 16:15 René Scharfe
2009-11-22 17:10 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: René Scharfe @ 2009-11-22 16:15 UTC (permalink / raw)
To: Git Mailing List; +Cc: Junio C Hamano
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
I'm not especially proud of the triple negative in that note. How to say it
better, yet concise?
Documentation/pretty-formats.txt | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 0946202..7ff6a6c 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -135,6 +135,10 @@ The placeholders are:
- '%m': left, right or boundary mark
- '%n': newline
- '%x00': print a byte from a hex code
+- '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of
+ linkgit:git-shortlog[1]. NOTE: Color placeholders (`%C*`) are not
+ recognized as having no width, so they should not be put into wrapped
+ sections.
NOTE: Some placeholders may depend on other options given to the
revision traversal engine. For example, the `%g*` reflog options will
--
1.6.5.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] log --format: document %w
2009-11-22 16:15 [PATCH] log --format: document %w René Scharfe
@ 2009-11-22 17:10 ` Junio C Hamano
2009-11-22 23:46 ` René Scharfe
0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2009-11-22 17:10 UTC (permalink / raw)
To: René Scharfe; +Cc: Git Mailing List, Junio C Hamano
René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> I'm not especially proud of the triple negative in that note. How to say it
> better, yet concise?
> +- '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of
> + linkgit:git-shortlog[1]. NOTE: Color placeholders (`%C*`) are not
> + recognized as having no width, so they should not be put into wrapped
> + sections.
"The code miscounts the width of '%C*' color placeholders"?
Perhaps somebody in the codepath leading to pick_one_utf8_char() in utf8.c
can be made aware of them?
utf8_width() is called from many places (has one caller outside utf8.c as
well). It is given a pointer to a pointer that points at the current
position in a string, and is responsible for picking up one logical letter
advancing the given pointer to skip over that letter, and returning the
display width of that one letter. The function wants the string to be
encoded in utf-8 and signals by putting NULL in the pointer when it
detects the input string is not.
Picking up one logical letter is done by pick_one_utf8_char(), which is a
nicely written generic "We are at the character boundary of a potentially
multi-byte utf-8 string; pick the first character" implementation, and we
wouldn't want to contaminate that with escape sequence logic---we might
want to reuse it in other codepaths where we have no reason to expect any
escape sequences.
So perhaps we can introduce is_esc_sequence(s, r, w) that
- returns true if we are at the beginning of an esc-sequence;
- skips the sequence just like utf8_width() does with s and r; and
- counts the width of the sequence and returns it in *w
The implementation of the is_esc_sequence() could be to only detect the
color sequence (if the sequence has things like cursor-position control
then we are already lost, as calling "utf8_width()" on such a string does
not make much sense anyway) and report zero-width.
I dunno.
diff --git a/utf8.c b/utf8.c
index 5c18f0c..d45e75f 100644
--- a/utf8.c
+++ b/utf8.c
@@ -241,7 +241,12 @@ invalid:
*/
int utf8_width(const char **start, size_t *remainder_p)
{
- ucs_char_t ch = pick_one_utf8_char(start, remainder_p);
+ ucs_char_t ch;
+ int w;
+
+ if (is_esc_sequence(start, remainder_p, &w))
+ return w;
+ ch = pick_one_utf8_char(start, remainder_p);
if (!*start)
return 0;
return git_wcwidth(ch);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] log --format: document %w
2009-11-22 17:10 ` Junio C Hamano
@ 2009-11-22 23:46 ` René Scharfe
2009-11-23 0:06 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: René Scharfe @ 2009-11-22 23:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
Junio C Hamano schrieb:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
>> I'm not especially proud of the triple negative in that note. How to say it
>> better, yet concise?
>> +- '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of
>> + linkgit:git-shortlog[1]. NOTE: Color placeholders (`%C*`) are not
>> + recognized as having no width, so they should not be put into wrapped
>> + sections.
>
> "The code miscounts the width of '%C*' color placeholders"?
>
> Perhaps somebody in the codepath leading to pick_one_utf8_char() in utf8.c
> can be made aware of them?
>
> utf8_width() is called from many places (has one caller outside utf8.c as
> well). It is given a pointer to a pointer that points at the current
> position in a string, and is responsible for picking up one logical letter
> advancing the given pointer to skip over that letter, and returning the
> display width of that one letter. The function wants the string to be
> encoded in utf-8 and signals by putting NULL in the pointer when it
> detects the input string is not.
>
> Picking up one logical letter is done by pick_one_utf8_char(), which is a
> nicely written generic "We are at the character boundary of a potentially
> multi-byte utf-8 string; pick the first character" implementation, and we
> wouldn't want to contaminate that with escape sequence logic---we might
> want to reuse it in other codepaths where we have no reason to expect any
> escape sequences.
>
> So perhaps we can introduce is_esc_sequence(s, r, w) that
>
> - returns true if we are at the beginning of an esc-sequence;
> - skips the sequence just like utf8_width() does with s and r; and
> - counts the width of the sequence and returns it in *w
>
> The implementation of the is_esc_sequence() could be to only detect the
> color sequence (if the sequence has things like cursor-position control
> then we are already lost, as calling "utf8_width()" on such a string does
> not make much sense anyway) and report zero-width.
>
> I dunno.
>
> diff --git a/utf8.c b/utf8.c
> index 5c18f0c..d45e75f 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -241,7 +241,12 @@ invalid:
> */
> int utf8_width(const char **start, size_t *remainder_p)
> {
> - ucs_char_t ch = pick_one_utf8_char(start, remainder_p);
> + ucs_char_t ch;
> + int w;
> +
> + if (is_esc_sequence(start, remainder_p, &w))
> + return w;
> + ch = pick_one_utf8_char(start, remainder_p);
> if (!*start)
> return 0;
> return git_wcwidth(ch);
I think utf8_width() is too generic for that; we shouldn't teach it terminal
control details. Something like this? It keeps it all local to
strbuf_add_wrapped_text(); ignoring display mode escape codes in there can be
justified with its purpose.
utf8.c | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/utf8.c b/utf8.c
index 5c18f0c..fcc0aeb 100644
--- a/utf8.c
+++ b/utf8.c
@@ -298,6 +298,21 @@ static void print_spaces(struct strbuf *buf, int count)
strbuf_write(buf, s, count);
}
+/* XXX: this handles display mode sequences, only. Do we need more? */
+static size_t esc_sequence_len(const char *s)
+{
+ const char *p = s;
+ if (*p++ != '\033')
+ return 0;
+ if (*p++ != '[')
+ return 0;
+ while (isdigit(*p) || *p == ';')
+ p++;
+ if (*p++ != 'm')
+ return 0;
+ return p - s;
+}
+
/*
* Wrap the text, if necessary. The variable indent is the indent for the
* first line, indent2 is the indent for all other lines.
@@ -329,7 +344,13 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
}
for (;;) {
- char c = *text;
+ char c;
+ size_t skip;
+
+ while ((skip = esc_sequence_len(text)))
+ text += skip;
+
+ c = *text;
if (!c || isspace(c)) {
if (w < width || !space) {
const char *start = bol;
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] log --format: document %w
2009-11-22 23:46 ` René Scharfe
@ 2009-11-23 0:06 ` Junio C Hamano
2009-11-23 22:40 ` [PATCH] strbuf_add_wrapped_text(): skip over colour codes René Scharfe
0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2009-11-23 0:06 UTC (permalink / raw)
To: René Scharfe; +Cc: Git Mailing List
René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> I think utf8_width() is too generic for that; we shouldn't teach it terminal
> control details.
I agree that the function whose purpose is to compute display width should
not be called utf8_width().
The outside caller of utf8_width() in diff.c uses it to truncate the
function name hint on the hunk header line at character boundary. The
input shouldn't have color escapes _we_ add (it might contain such
sequences from the user data, though), so I agree that we shouldn't
contaminate this function with color escapes.
> strbuf_add_wrapped_text(); ignoring display mode escape codes in there can be
> justified with its purpose.
Surely, and thanks. The patch looks good.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] strbuf_add_wrapped_text(): skip over colour codes
2009-11-23 0:06 ` Junio C Hamano
@ 2009-11-23 22:40 ` René Scharfe
0 siblings, 0 replies; 5+ messages in thread
From: René Scharfe @ 2009-11-23 22:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
Ignore display mode escape sequences (colour codes) for the purpose of
text wrapping because they don't have a visible width.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Renamed function to document its purpose and limitation, and remove
the note that is obsoleted by this patch.
Documentation/pretty-formats.txt | 4 +---
utf8.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 7ff6a6c..0683fb3 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -136,9 +136,7 @@ The placeholders are:
- '%n': newline
- '%x00': print a byte from a hex code
- '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of
- linkgit:git-shortlog[1]. NOTE: Color placeholders (`%C*`) are not
- recognized as having no width, so they should not be put into wrapped
- sections.
+ linkgit:git-shortlog[1].
NOTE: Some placeholders may depend on other options given to the
revision traversal engine. For example, the `%g*` reflog options will
diff --git a/utf8.c b/utf8.c
index 01d1869..7ddff23 100644
--- a/utf8.c
+++ b/utf8.c
@@ -314,6 +314,20 @@ static void strbuf_add_indented_text(struct strbuf *buf, const char *text,
}
}
+static size_t display_mode_esc_sequence_len(const char *s)
+{
+ const char *p = s;
+ if (*p++ != '\033')
+ return 0;
+ if (*p++ != '[')
+ return 0;
+ while (isdigit(*p) || *p == ';')
+ p++;
+ if (*p++ != 'm')
+ return 0;
+ return p - s;
+}
+
/*
* Wrap the text, if necessary. The variable indent is the indent for the
* first line, indent2 is the indent for all other lines.
@@ -337,7 +351,13 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
}
for (;;) {
- char c = *text;
+ char c;
+ size_t skip;
+
+ while ((skip = display_mode_esc_sequence_len(text)))
+ text += skip;
+
+ c = *text;
if (!c || isspace(c)) {
if (w < width || !space) {
const char *start = bol;
--
1.6.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-11-23 22:40 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-22 16:15 [PATCH] log --format: document %w René Scharfe
2009-11-22 17:10 ` Junio C Hamano
2009-11-22 23:46 ` René Scharfe
2009-11-23 0:06 ` Junio C Hamano
2009-11-23 22:40 ` [PATCH] strbuf_add_wrapped_text(): skip over colour codes René Scharfe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox