[PATCH] client: handle wide chars for table rows

Wireless Daemon for Linux
 help / color / mirror / Atom feed

* [PATCH] client: handle wide chars for table rows
@ 2022-10-08  7:55 Pinghao Wu
  2022-10-10 15:57 ` James Prestwood
  2022-10-10 17:00 ` James Prestwood
  0 siblings, 2 replies; 5+ messages in thread
From: Pinghao Wu @ 2022-10-08  7:55 UTC (permalink / raw)
  To: iwd; +Cc: Pinghao Wu

Try to find out printing width of wide chars using current locale and
its byte length before falling back to non-codepoint UTF-8 bytes method.
---
 client/display.c | 16 +++++++++++++++-
 client/main.c    |  3 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/client/display.c b/client/display.c
index b729ad4c..dcbe11f8 100644
--- a/client/display.c
+++ b/client/display.c
@@ -26,10 +26,13 @@
 
 #define _GNU_SOURCE
 #include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
 #include <signal.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
 #include <unistd.h>
+#include <wchar.h>
 
 #include <readline/history.h>
 #include <readline/readline.h>
@@ -401,15 +404,26 @@ static char* next_line(char *s, unsigned int *max, char **color_out)
 	unsigned int i;
 	int last_space = -1;
 	int last_color = -1;
+	int s_len = strlen(s);
 
 	/* Find the last space before 'max', as well as any color */
-	for (i = 0; i <= *max && s[i] != '\0'; i++) {
+	for (i = 0; i <= *max && i != s_len; i++) {
 		if (s[i] == ' ')
 			last_space = i;
 		else if (s[i] == 0x1b) {
 			/* color escape won't count for column width */
 			*max += color_end(s + i);
 			last_color = i;
+		/* Non-ASCII, try wchar */
+		} else if (s[i] & 0x80) {
+			wchar_t w;
+			int w_mblen;
+			if ((w_mblen = mbtowc(&w, &s[i], s_len - i)) > 0) {
+				/* Compensate max bytes for wide char */
+				*max += w_mblen - wcwidth(w);
+				/* Skip other bytes for this wchar */
+				i += w_mblen - 1;
+			}
 		/* Add width for non-codepoint UTF-8 bytes */
 		} else if (((uint8_t)s[i] >> 6) == 2)
 			*max += 1;
diff --git a/client/main.c b/client/main.c
index df5c0a61..7e8dead4 100644
--- a/client/main.c
+++ b/client/main.c
@@ -25,6 +25,7 @@
 #endif
 
 #include <errno.h>
+#include <locale.h>
 #include <signal.h>
 #include <ell/ell.h>
 
@@ -50,6 +51,8 @@ int main(int argc, char *argv[])
 {
 	bool all_done;
 
+	setlocale(LC_CTYPE, "");
+
 	if (!l_main_init())
 		return EXIT_FAILURE;
 
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] client: handle wide chars for table rows
  2022-10-08  7:55 [PATCH] client: handle wide chars for table rows Pinghao Wu
@ 2022-10-10 15:57 ` James Prestwood
  2022-10-10 16:09   ` Pinghao Wu
  2022-10-10 17:00 ` James Prestwood
  1 sibling, 1 reply; 5+ messages in thread
From: James Prestwood @ 2022-10-10 15:57 UTC (permalink / raw)
  To: Pinghao Wu, iwd

Hi Pinghao,

On Sat, 2022-10-08 at 15:55 +0800, Pinghao Wu wrote:
> Try to find out printing width of wide chars using current locale and
> its byte length before falling back to non-codepoint UTF-8 bytes
> method.

So IWD itself only supports utf-8 so I'm not sure adding support for
wide chars in iwctl is really useful.

This function is for printing table rows and any data in these rows
comes from DBus, which will always be utf8.

Unless you have some other use case I hadn't thought of?

Thanks,
James

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] client: handle wide chars for table rows
  2022-10-10 15:57 ` James Prestwood
@ 2022-10-10 16:09   ` Pinghao Wu
  2022-10-10 16:55     ` James Prestwood
  0 siblings, 1 reply; 5+ messages in thread
From: Pinghao Wu @ 2022-10-10 16:09 UTC (permalink / raw)
  To: James Prestwood, iwd

The current non-codepoint UTF-8 method assumes that a codepoint occupies
one column while printing. This is not true for some characters like
Chinese characters, which occupies two columns.

The motivation of this patch is to handle these "wide" (in printing
columns) chars properly via wcwidth(). It's not actually about
non-UTF-8, but it is also useful for that case.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] client: handle wide chars for table rows
  2022-10-10 16:09   ` Pinghao Wu
@ 2022-10-10 16:55     ` James Prestwood
  0 siblings, 0 replies; 5+ messages in thread
From: James Prestwood @ 2022-10-10 16:55 UTC (permalink / raw)
  To: Pinghao Wu, iwd

On Tue, 2022-10-11 at 00:09 +0800, Pinghao Wu wrote:
> The current non-codepoint UTF-8 method assumes that a codepoint
> occupies
> one column while printing. This is not true for some characters like
> Chinese characters, which occupies two columns.

Ah ok, in this case it may be best to combine you're changes with the
non-codepoint check since you're changes are more complete for handling
utf-8. I'll review the original patch.

> 
> The motivation of this patch is to handle these "wide" (in printing
> columns) chars properly via wcwidth(). It's not actually about
> non-UTF-8, but it is also useful for that case.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] client: handle wide chars for table rows
  2022-10-08  7:55 [PATCH] client: handle wide chars for table rows Pinghao Wu
  2022-10-10 15:57 ` James Prestwood
@ 2022-10-10 17:00 ` James Prestwood
  1 sibling, 0 replies; 5+ messages in thread
From: James Prestwood @ 2022-10-10 17:00 UTC (permalink / raw)
  To: Pinghao Wu, iwd

On Sat, 2022-10-08 at 15:55 +0800, Pinghao Wu wrote:
> Try to find out printing width of wide chars using current locale and
> its byte length before falling back to non-codepoint UTF-8 bytes
> method.
> ---
>  client/display.c | 16 +++++++++++++++-
>  client/main.c    |  3 +++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/client/display.c b/client/display.c
> index b729ad4c..dcbe11f8 100644
> --- a/client/display.c
> +++ b/client/display.c
> @@ -26,10 +26,13 @@
>  
>  #define _GNU_SOURCE
>  #include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
>  #include <signal.h>
>  #include <sys/stat.h>
>  #include <sys/ioctl.h>
>  #include <unistd.h>
> +#include <wchar.h>
>  
>  #include <readline/history.h>
>  #include <readline/readline.h>
> @@ -401,15 +404,26 @@ static char* next_line(char *s, unsigned int
> *max, char **color_out)
>         unsigned int i;
>         int last_space = -1;
>         int last_color = -1;
> +       int s_len = strlen(s);
>  
>         /* Find the last space before 'max', as well as any color */
> -       for (i = 0; i <= *max && s[i] != '\0'; i++) {
> +       for (i = 0; i <= *max && i != s_len; i++) {
>                 if (s[i] == ' ')
>                         last_space = i;
>                 else if (s[i] == 0x1b) {
>                         /* color escape won't count for column width
> */
>                         *max += color_end(s + i);
>                         last_color = i;
> +               /* Non-ASCII, try wchar */
> +               } else if (s[i] & 0x80) {
> +                       wchar_t w;
> +                       int w_mblen;
> +                       if ((w_mblen = mbtowc(&w, &s[i], s_len - i))

Instead of mbtowc() use l_utf8_get_codepoint() (an ELL utility). Since
we assume the input is going to be utf-8 always.

> > 0) {
> +                               /* Compensate max bytes for wide char
> */
> +                               *max += w_mblen - wcwidth(w);
> +                               /* Skip other bytes for this wchar */
> +                               i += w_mblen - 1;
> +                       }
>                 /* Add width for non-codepoint UTF-8 bytes */
>                 } else if (((uint8_t)s[i] >> 6) == 2)
>                         *max += 1;

And you can go ahead and remove this if-clause entirely since the above
case should now cover it. Plus the s[i] & 0x80 check above actually
prevents this branch from ever being taken.

> diff --git a/client/main.c b/client/main.c
> index df5c0a61..7e8dead4 100644
> --- a/client/main.c
> +++ b/client/main.c
> @@ -25,6 +25,7 @@
>  #endif
>  
>  #include <errno.h>
> +#include <locale.h>
>  #include <signal.h>
>  #include <ell/ell.h>
>  
> @@ -50,6 +51,8 @@ int main(int argc, char *argv[])
>  {
>         bool all_done;
>  
> +       setlocale(LC_CTYPE, "");
> +
>         if (!l_main_init())
>                 return EXIT_FAILURE;
>  

Thanks,
James



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-10-10 17:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-08  7:55 [PATCH] client: handle wide chars for table rows Pinghao Wu
2022-10-10 15:57 ` James Prestwood
2022-10-10 16:09   ` Pinghao Wu
2022-10-10 16:55     ` James Prestwood
2022-10-10 17:00 ` James Prestwood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox