* regex.7 manpage is awful
From: Simon Oosthoek @ 2009-01-09 10:22 UTC
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Jan Christiaan van Winkel, linux-man-u79uwXL29TY76Z2rM5mHXA
Hi
The manpage regex(7) is explicitly mentioned in the new LPI objectives,
but in preparation to make study material for this subject, I feel a
strong sense of repulsion to even mention this page, as I cannot
understand what anyone would learn from it...
Probably the source of the problem lies with POSIX.2, as this
apparently has an ambiguous definition of the "extended" regular
expressions and (IMO erroneously) defines the basic regular
expressions as obsolete, though they are used in many many programs,
including grep, vim, sed, etc.
I may be wrong, but when I see so many (!)'s in the manpage,
indicating that this is not implemented equally, the whole definition
is broken and there is no such thing as "extended" or "modern" REs,
but a bunch of varying implementations of roughly similar extended RE
rules.
Also, if I interpret the page correctly, "modern" REs don't have
backreferences? They are mentioned as a new item for BASIC REs. While
they are bothersome to use, backreferences are useful and would be
missed in an extended RE implementation without them. (certainly,
egrep /did/ implement them)
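For anyone who wants to check the back-reference question on their own system, here is a minimal C sketch using regcomp(3) and regexec(3). The helper names (re_matches, bre_doubled, ere_doubled) and the test pattern are made up for illustration. As far as I can tell, POSIX describes back-references for BREs only, and what an ERE containing \1 does is left to the implementation, so the ERE helper may report a match, no match, or a compile error (-1) depending on the C library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): compile `pattern` with the
 * given regcomp() flags and test it against `text`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 */
static int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;              /* pattern rejected by this implementation */

    /* nmatch = 0: only a yes/no answer is wanted, no offsets. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* BRE: groups are written \( \) and \1 refers back to the first group. */
static int bre_doubled(const char *text)
{
    return re_matches("^\\(ab*\\)\\1$", 0, text);
}

/*
 * ERE: groups are written ( ) and the same back-reference is tried.
 * Whether this compiles, and what it matches, depends on the C library.
 */
static int ere_doubled(const char *text)
{
    return re_matches("^(ab*)\\1$", REG_EXTENDED, text);
}
```

Comparing bre_doubled("abbabb") with ere_doubled("abbabb") shows whether the local library accepts the ERE form at all.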
To me, the "modern" REs are a simplification of the syntax with bounds
and () enclosures without preceding '\'s and adding a bunch of weirdly
long character class notations that I doubt anyone uses frequently,
since they are too long to type and remember ([. [= [:)
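The difference in notation can be put side by side in a short C sketch. match_span and the two pattern constants are hypothetical names chosen for this example. The patterns stick to constructs that POSIX defines for both flavours (groups, bounds, and the [:digit:] and [:alpha:] classes), so both are expected to find the same span; only the backslashes differ.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: find the first match of `pattern` in `text` and store
 * its byte offsets in *start and *end (end is one past the last byte).
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 */
static int match_span(const char *pattern, int cflags, const char *text,
                      int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    assert(pattern != NULL && text != NULL && start != NULL && end != NULL);

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    *start = (int)m[0].rm_so;
    *end = (int)m[0].rm_eo;
    return 1;
}

/* Three digits, a dash, one letter: the same pattern in both notations. */
static const char *const BRE_PATTERN = "\\([[:digit:]]\\{3\\}\\)-[[:alpha:]]";
static const char *const ERE_PATTERN = "([[:digit:]]{3})-[[:alpha:]]";
```

Pass 0 as cflags for BRE_PATTERN and REG_EXTENDED for ERE_PATTERN; swapping the flags makes the braces and parentheses literal characters instead.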
If I were on the POSIX committee, I'd propose perl REs as the next
"modern" RE, but I'd not obsolete the basic RE at all, because in 95%
of the uses they are sufficient.
Perl REs are easy to type, well documented and powerful. Also
implementations already exist and are very well tested on probably all
Unix platforms.
Anyway, the point of this e-mail:
- the current page is awful, hard to read and ambiguous
- the implied POSIX decision to obsolete basic REs is bad
- I would not recommend this manpage to anyone trying to understand
REs
If possible I'd like to change the world ;-) but failing that, a less
ambiguous (and easier to read) version of the regex(7) manpage would
be greatly appreciated!
Cheers
Simon
PS, speaking on personal title, not for my employer AT Computing
PPS, I'd love to be proved wrong on any of my statements, so please
correct me if I'm wrong.
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: regex.7 manpage is awful
From: Petr Baudis @ 2009-01-09 11:38 UTC
To: Simon Oosthoek
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Jan Christiaan van Winkel,
linux-man-u79uwXL29TY76Z2rM5mHXA
Hi,
On Fri, Jan 09, 2009 at 11:22:09AM +0100, Simon Oosthoek wrote:
> If I were on the POSIX committee, I'd propose perl REs as the next
> "modern" RE, but I'd not obsolete the basic RE at all, because in 95%
> of the uses they are sufficient.
> Perl REs are easy to type, well documented and powerful. Also
> implementations already exist and are very well tested on probably all
> Unix platforms.
I doubt that you will succeed in adding a third regexp mode at this
point, but the POSIX development process seems very open, so feel free
to propose this: http://www.opengroup.org/austin/
> Anyway, the point of this e-mail:
> - the current page is awful, hard to read and ambiguous
> - the implied POSIX decision to obsolete basic REs is bad
> - I would not recommend this manpage to anyone trying to understand
> REs
I agree with most of your points, but I'm sure Michael is aware of the
issues as well - what would probably help were actual patches. ;-)
A random set of working item ideas:
* Don't use modern/obsolete terms since they are unwarranted and
confusing - they seem to push an agenda that has nothing to do
with the reality
* Avoid (!) since they disturb the text severely, IMHO - discuss
extensions at the end; this is something that's even very
difficult to do for me since I don't actually understand many
of the (!)s
* The page should be divided into subsections, with examples
at the end of each subsection
* Atom should be explained before bound
* Back reference should be mentioned before basic regexes
(glibc supports it for ERE too)
* | is supported in BRE too in glibc
* SEE ALSO should have perlre(1) (sic) reference
* The AUTHOR paragraph hidden might violate the page licence?
* Word boundary syntax is commented out, but a real one
is supported, using \b, \B, \< and \>
* Plenty of other extensions available too, e.g. \w and \s
(see regcomp.c:peek_token())
* On a related note, re_set_syntax() should be documented
(grep(1) has a nice, concise, if incomplete, regex description.)
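Several items in the list above are claims about what a particular regcomp() accepts, which is easy to probe. The following C sketch is only that: the probe table, re_matches and report_probes are hypothetical names, and the patterns are small test cases made up here, not taken from glibc. It reports what the local C library does and makes no claim about what POSIX requires.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 on a match, 0 on no match, -1 if the pattern does not compile. */
static int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* One probe per extension; each text is chosen so that a library which
 * supports the extension should report a match. */
struct probe {
    const char *label;
    const char *pattern;
    int cflags;
    const char *text;
};

static const struct probe probes[] = {
    { "BRE alternation \\|",     "^\\(cat\\|dog\\)$", 0,            "dog"     },
    { "ERE back-reference \\1",  "^(ab)\\1$",         REG_EXTENDED, "abab"    },
    { "word boundary \\b",       "\\bfoo\\b",         REG_EXTENDED, "a foo b" },
    { "non-boundary \\B",        "o\\Bo",             REG_EXTENDED, "foo"     },
    { "word start/end \\< \\>",  "\\<foo\\>",         REG_EXTENDED, "a foo b" },
    { "word character \\w",      "^\\w+$",            REG_EXTENDED, "foo_1"   },
    { "whitespace \\s",          "^a\\sb$",           REG_EXTENDED, "a b"     },
};

/*
 * Run every probe, optionally print one line per probe to `out` (may be
 * NULL), and return how many probes matched.
 */
static size_t report_probes(FILE *out)
{
    size_t i, supported = 0;

    for (i = 0; i < sizeof probes / sizeof probes[0]; i++) {
        int rc = re_matches(probes[i].pattern, probes[i].cflags,
                            probes[i].text);
        if (rc == 1)
            supported++;
        if (out != NULL)
            fprintf(out, "%-26s %s\n", probes[i].label,
                    rc == 1 ? "matches" :
                    rc == 0 ? "compiles, no match" : "rejected");
    }
    return supported;
}
```

Calling report_probes(stdout) from a main() prints one line per probe. A table like this, run against a few C libraries, could feed the kind of extensions section suggested above.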
--
Petr "Pasky" Baudis
The average, healthy, well-adjusted adult gets up at seven-thirty
in the morning feeling just terrible. -- Jean Kerr
* Re: regex.7 manpage is awful
From: Michael Kerrisk @ 2009-01-12 10:08 UTC
To: Simon Oosthoek
Cc: Jan Christiaan van Winkel, linux-man-u79uwXL29TY76Z2rM5mHXA
Simon,
[...]
> Anyway, the point of this e-mail:
> - the current page is awful, hard to read and ambiguous
> - the implied POSIX decision to obsolete basic REs is bad
> - I would not recommend this manpage to anyone trying to understand
> REs
>
> If possible I'd like to change the world ;-) but failing that, a less
> ambiguous (and easier to read) version of the regex(7) manpage would
> be greatly appreciated!
You are not the first to note that the regex(7) page could be better.
But so far, no one has done the work to make it better. My time is
limited, so I'm unlikely to get to it in a hurry. Help would be
greatly appreciated...
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
* Re: regex.7 manpage is awful
From: Michael Kerrisk @ 2009-01-12 10:17 UTC
To: Petr Baudis
Cc: Simon Oosthoek, Jan Christiaan van Winkel,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Sat, Jan 10, 2009 at 12:38 AM, Petr Baudis <pasky-AlSwsSmVLrQ@public.gmane.org> wrote:
> Hi,
>
> On Fri, Jan 09, 2009 at 11:22:09AM +0100, Simon Oosthoek wrote:
>> If I were on the POSIX committee, I'd propose perl REs as the next
>> "modern" RE, but I'd not obsolete the basic RE at all, because in 95%
>> of the uses they are sufficient.
>> Perl REs are easy to type, well documented and powerful. Also
>> implementations already exist and are very well tested on probably all
>> Unix platforms.
>
> I doubt that you will succeed in adding a third regexp mode at this
> point, but the POSIX development process seems very open, so feel free
> to propose this: http://www.opengroup.org/austin/
>
>> Anyway, the point of this e-mail:
>> - the current page is awful, hard to read and ambiguous
>> - the implied POSIX decision to obsolete basic REs is bad
>> - I would not recommend this manpage to anyone trying to understand
>> REs
>
> I agree with most of your points, but I'm sure Michael is aware of the
> issues as well - what would probably help were actual patches. ;-)
Yes.
> A random set of working item ideas:
>
> * Don't use modern/obsolete terms since they are unwarranted and
> confusing - they seem to push an agenda that has nothing to do
> with the reality
> * Avoid (!) since they disturb the text severely, IMHO - discuss
> extensions at the end; this is something that's even very
> difficult to do for me since I don't actually understand many
> of the (!)s
> * The page should be divided into subsections, with examples
> at the end of each subsection
> * Atom should be explained before bound
> * Back reference should be mentioned before basic regexes
> (glibc supports it for ERE too)
> * | is supported in BRE too in glibc
> * SEE ALSO should have perlre(1) (sic) reference
> * The AUTHOR paragraph hidden might violate the page licence?
Yes, that's probably true. I reinstated that text.
[...]
Cheers,
Michael
PS Petr: Many other good ideas for fixes; maybe someone will pick this
up. Not me, at the moment.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
* Re: regex.7 manpage is awful
From: Simon Oosthoek @ 2009-01-12 10:29 UTC
To: Petr Baudis
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Jan Christiaan van Winkel,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Friday 09 January 2009 12:38, Petr Baudis wrote:
> I doubt that you will succeed in adding a third regexp mode at this
> point, but the POSIX development process seems very open, so feel free
> to propose this: http://www.opengroup.org/austin/
Thanks for the link, but I'd have to think longer about it before I would
propose something like this ;-)
[snip - good suggestions!]
>
> (grep(1) has a nice, concise, if incomplete, regex description.)
I guess that would be a good place to start, and then to verify everything
mentioned there against some common GNU implementations (since these are the
Linux/GNU manpages).
It might be worthwhile to create a section called PORTABILITY and list
all known exceptions to the description above it and whether the
behaviour is POSIX.2 compliant.
Cheers
Simon
* Re: regex.7 manpage is awful
From: Simon Oosthoek @ 2009-01-12 10:30 UTC
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Jan Christiaan van Winkel, linux-man-u79uwXL29TY76Z2rM5mHXA
On Monday 12 January 2009 11:08, Michael Kerrisk wrote:
> Simon,
>
> [...]
>
> > Anyway, the point of this e-mail:
> > - the current page is awful, hard to read and ambiguous
> > - the implied POSIX decision to obsolete basic REs is bad
> > - I would not recommend this manpage to anyone trying to understand
> > REs
> >
> > If possible I'd like to change the world ;-) but failing that, a
> > less ambiguous (and easier to read) version of the regex(7) manpage
> > would be greatly appreciated!
>
> You are not the first to note that the regex(7) page could be better.
> But so far, no one has done the work to make it better. My time is
> limited, so I'm unlikely to get to it in a hurry. Help would be
> greatly appreciated...
I understand, the same goes for me, but if I can get some time in, I'll
try to get something going...
Petr's suggestions are a good start I think.
I'll keep it in the back of my mind for when I get some time... (I'll try
to get to it sooner rather than later)
Cheers
Simon
Thread overview: 6+ messages
2009-01-09 10:22 regex.7 manpage is awful Simon Oosthoek
[not found] ` <20090109102208.GA22747-earCsCjlB1dYz1uS2RbbqIS2ikGnqaxS@public.gmane.org>
2009-01-09 11:38 ` Petr Baudis
[not found] ` <20090109113852.GB21648-DDGJ70k9y3lX+M3pkMnKjw@public.gmane.org>
2009-01-12 10:17 ` Michael Kerrisk
2009-01-12 10:29 ` Simon Oosthoek
2009-01-12 10:08 ` Michael Kerrisk
[not found] ` <cfd18e0f0901120208v48551ce7i2268226bd9fbb1bd-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-01-12 10:30 ` Simon Oosthoek