DASH Shell discussions
* The Greek letter "rho" is considered as two letters
From: Alkis Georgopoulos @ 2010-08-07 19:37 UTC (permalink / raw)
  To: dash

$ touch ρ
$ ls ?
ls: cannot access ?: No such file or directory
$ ls ??
ρ

This happens with some UTF-8 characters, but not all of them.
This might be related:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=532302

Please CC me if possible, I'm not on the list.

Kind regards,
Alkis Georgopoulos



* Re: The Greek letter "rho" is considered as two letters
From: Alkis Georgopoulos @ 2010-08-07 19:57 UTC (permalink / raw)
  To: dash

Erm, actually this problem happens with all non-ASCII UTF-8 characters,
i.e. dash does not take UTF-8 characters into account when expanding "?".

$ touch appétit
$ ls app?tit
ls: cannot access app?tit: No such file or directory
$ ls app??tit
appétit
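
For anyone who wants to check their own shell without creating files,
here is a minimal sketch that uses a case pattern instead of pathname
expansion. The helper name qmarks_needed is made up for illustration,
and the result depends on the shell and, for locale-aware shells, on
the current locale.

```shell
# qmarks_needed (hypothetical helper): print how many "?" this shell
# needs in a case pattern to match the whole argument (tries up to 8).
qmarks_needed() {
    pat=
    n=0
    while [ "$n" -lt 8 ]; do
        pat="$pat?"
        n=$((n + 1))
        case $1 in
            $pat) echo "$n"; return 0 ;;
        esac
    done
    return 1
}

e_acute=$(printf '\303\251')   # é, U+00E9, UTF-8 bytes C3 A9

qmarks_needed a            # prints 1
qmarks_needed "$e_acute"   # 2 if "?" matches a byte, 1 if it matches a character
```

A result of 2 for é corresponds to the behaviour shown in the transcript
above.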


I'll send another mail about the Greek rho problem, which occurs only
with redirections.



* Re: ? doesn't match non-ascii characters
From: Alkis Georgopoulos @ 2010-08-08 12:55 UTC (permalink / raw)
  To: dash

I've changed the title because it was misleading
(was: "The Greek letter "rho" is considered as two letters").

To restate the problem:

$ touch appétit
$ ls app?tit
ls: cannot access app?tit: No such file or directory
$ ls app??tit
appétit

I.e. two-byte UTF-8 characters need two "?" to be matched,
three-byte UTF-8 characters (e.g. ἀ) need three "?", and so on.
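
The byte counts can be checked directly. This is only a sketch:
byte_len is a hypothetical helper name, and the characters are written
as octal escapes so the result does not depend on how the script file
itself is encoded.

```shell
# byte_len (hypothetical helper): number of bytes in a string.
# wc -c counts bytes regardless of locale; tr strips the padding
# that some wc implementations print.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

e_acute=$(printf '\303\251')          # é, U+00E9, UTF-8 bytes C3 A9
alpha_psili=$(printf '\341\274\200')  # ἀ, U+1F00, UTF-8 bytes E1 BC 80

byte_len "$e_acute"           # prints 2
byte_len "$alpha_psili"       # prints 3
byte_len "app${e_acute}tit"   # prints 8: 7 characters, but é takes 2 bytes
```

The number of "?" needed for each character equals the number that
byte_len reports for it.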



* Re: The Greek letter "rho" is considered as two letters
From: Jilles Tjoelker @ 2010-08-08 12:56 UTC (permalink / raw)
  To: Alkis Georgopoulos; +Cc: dash

On Sat, Aug 07, 2010 at 10:57:12PM +0300, Alkis Georgopoulos wrote:
> Erm actually this problem happens with all utf8 characters, i.e. dash
> does not properly take utf8 characters into account when expanding "?".

> $ touch appétit
> $ ls app?tit
> ls: cannot access app?tit: No such file or directory
> $ ls app??tit
> appétit

Yes, it seems that dash has zero support for locales. In some ways this
is an advantage, as locale support can make things considerably slower
and configure/startup scripts don't need it. However, it leads to
inconsistent behaviour with other utilities that do support locales.

For FreeBSD's /bin/sh, which is another ash variant, I think some degree
of locale support (at least for utf-8) is desirable at some point. This
would include changing pattern matching and ${#var}.
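
To illustrate the ${#var} side of this, here is a rough sketch that
counts UTF-8 characters without relying on the shell's own locale
support. char_len is a hypothetical helper, not something dash or
FreeBSD's sh provides, and it assumes the input is valid UTF-8.

```shell
# char_len (hypothetical helper): count UTF-8 characters by deleting
# continuation bytes (0x80-0xBF, octal 200-277) and counting what is left.
# LC_ALL=C makes tr treat its input as plain bytes.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

name=$(printf 'app\303\251tit')   # appétit, with é as bytes C3 A9

echo "${#name}"    # 8 where ${#var} counts bytes; 7 where it counts characters
char_len "$name"   # prints 7
```

A difference between the two numbers indicates that ${#var} is counting
bytes in the shell being used.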

I don't know what Herbert Xu thinks about this.

-- 
Jilles Tjoelker
