* [PATCH 01/11] cmd: test: add support for =~ operator
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 16:49 ` Tom Rini
2025-05-06 14:10 ` [PATCH 02/11] slre: add myself as maintainer Rasmus Villemoes
` (11 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
Currently, the only way to make use of regex matching in the shell is
by using "setexpr [g]sub" command. That's rather awkward for asking
whether a string matches a regex. At the very least, it requires
providing setexpr with a dummy target variable, but also, the return
value of setexpr doesn't say whether any substitutions were done, so
one would have to do some roundabout thing like
env set dummy "${string_to_test}"
setexpr sub dummy '<some regex>' ''
if test "${dummy}" != "${string_to_test}" ; then ...
When CONFIG_REGEX is set, teach the test command a new operator, =~,
which will allow one to more naturally write
if test "${string_to_test}" =~ '<some regex>' ; then ...
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
cmd/test.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/cmd/test.c b/cmd/test.c
index b4c3eabf9f6..a42a523d33d 100644
--- a/cmd/test.c
+++ b/cmd/test.c
@@ -7,6 +7,7 @@
#include <command.h>
#include <fs.h>
#include <log.h>
+#include <slre.h>
#include <vsprintf.h>
#define OP_INVALID 0
@@ -26,6 +27,7 @@
#define OP_INT_GT 14
#define OP_INT_GE 15
#define OP_FILE_EXISTS 16
+#define OP_REGEX 17
const struct {
int arg;
@@ -49,6 +51,9 @@ const struct {
{0, "-z", OP_STR_EMPTY, 2},
{0, "-n", OP_STR_NEMPTY, 2},
{0, "-e", OP_FILE_EXISTS, 4},
+#ifdef CONFIG_REGEX
+ {1, "=~", OP_REGEX, 3},
+#endif
};
static int do_test(struct cmd_tbl *cmdtp, int flag, int argc,
@@ -141,6 +146,20 @@ static int do_test(struct cmd_tbl *cmdtp, int flag, int argc,
case OP_FILE_EXISTS:
expr = file_exists(ap[1], ap[2], ap[3], FS_TYPE_ANY);
break;
+#ifdef CONFIG_REGEX
+ case OP_REGEX: {
+ struct slre slre;
+
+ if (slre_compile(&slre, ap[2]) == 0) {
+ printf("Error compiling regex: %s\n", slre.err_str);
+ expr = 0;
+ break;
+ }
+
+ expr = slre_match(&slre, ap[0], strlen(ap[0]), NULL);
+ break;
+ }
+#endif
}
switch (op) {
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 01/11] cmd: test: add support for =~ operator
2025-05-06 14:10 ` [PATCH 01/11] cmd: test: add support for =~ operator Rasmus Villemoes
@ 2025-05-06 16:49 ` Tom Rini
2025-05-06 16:52 ` Tom Rini
0 siblings, 1 reply; 19+ messages in thread
From: Tom Rini @ 2025-05-06 16:49 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: u-boot
[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]
On Tue, May 06, 2025 at 04:10:25PM +0200, Rasmus Villemoes wrote:
> Currently, the only way to make use of regex matching in the shell is
> by using "setexpr [g]sub" command. That's rather awkward for asking
> whether a string matches a regex. At the very least, it requires
> providing setexpr with a dummy target variable, but also, the return
> value of setexpr doesn't say whether any substitutions were done, so
> one would have to do some roundabout thing like
>
> env set dummy "${string_to_test}"
> setexpr sub dummy '<some regex>' ''
> if test "${dummy}" != "${string_to_test}" ; then ...
>
> When CONFIG_REGEX is set, teach the test command a new operator, =~,
> which will allow one to more naturally write
>
> if test "${string_to_test}" =~ '<some regex>' ; then ...
>
> Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
We should also mention here (and then in docs) that this the same as the
=~ operator in bash, which I only learned about now as part of answering
my own question of "Are people going to expect =~ to do something
else?".
With that,
Reviewed-by: Tom Rini <trini@konsulko.com>
--
Tom
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 01/11] cmd: test: add support for =~ operator
2025-05-06 16:49 ` Tom Rini
@ 2025-05-06 16:52 ` Tom Rini
2025-05-06 19:07 ` Rasmus Villemoes
0 siblings, 1 reply; 19+ messages in thread
From: Tom Rini @ 2025-05-06 16:52 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: u-boot
[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]
On Tue, May 06, 2025 at 10:49:31AM -0600, Tom Rini wrote:
> On Tue, May 06, 2025 at 04:10:25PM +0200, Rasmus Villemoes wrote:
>
> > Currently, the only way to make use of regex matching in the shell is
> > by using "setexpr [g]sub" command. That's rather awkward for asking
> > whether a string matches a regex. At the very least, it requires
> > providing setexpr with a dummy target variable, but also, the return
> > value of setexpr doesn't say whether any substitutions were done, so
> > one would have to do some roundabout thing like
> >
> > env set dummy "${string_to_test}"
> > setexpr sub dummy '<some regex>' ''
> > if test "${dummy}" != "${string_to_test}" ; then ...
> >
> > When CONFIG_REGEX is set, teach the test command a new operator, =~,
> > which will allow one to more naturally write
> >
> > if test "${string_to_test}" =~ '<some regex>' ; then ...
> >
> > Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
>
> We should also mention here (and then in docs) that this the same as the
> =~ operator in bash, which I only learned about now as part of answering
> my own question of "Are people going to expect =~ to do something
> else?".
>
> With that,
>
> Reviewed-by: Tom Rini <trini@konsulko.com>
Oh, and we should update doc/usage/cmd/setexpr.rst at least and we
should have one for test as well, but don't, yet.
--
Tom
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 01/11] cmd: test: add support for =~ operator
2025-05-06 16:52 ` Tom Rini
@ 2025-05-06 19:07 ` Rasmus Villemoes
2025-05-06 19:24 ` Tom Rini
0 siblings, 1 reply; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 19:07 UTC (permalink / raw)
To: Tom Rini; +Cc: u-boot
On Tue, May 06 2025, Tom Rini <trini@konsulko.com> wrote:
> On Tue, May 06, 2025 at 10:49:31AM -0600, Tom Rini wrote:
>> On Tue, May 06, 2025 at 04:10:25PM +0200, Rasmus Villemoes wrote:
>>
>> > Currently, the only way to make use of regex matching in the shell is
>> > by using "setexpr [g]sub" command. That's rather awkward for asking
>> > whether a string matches a regex. At the very least, it requires
>> > providing setexpr with a dummy target variable, but also, the return
>> > value of setexpr doesn't say whether any substitutions were done, so
>> > one would have to do some roundabout thing like
>> >
>> > env set dummy "${string_to_test}"
>> > setexpr sub dummy '<some regex>' ''
>> > if test "${dummy}" != "${string_to_test}" ; then ...
>> >
>> > When CONFIG_REGEX is set, teach the test command a new operator, =~,
>> > which will allow one to more naturally write
>> >
>> > if test "${string_to_test}" =~ '<some regex>' ; then ...
>> >
>> > Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
>>
>> We should also mention here (and then in docs) that this the same as the
>> =~ operator in bash, which I only learned about now as part of answering
>> my own question of "Are people going to expect =~ to do something
>> else?".
>>
>> With that,
>>
>> Reviewed-by: Tom Rini <trini@konsulko.com>
>
> Oh, and we should update doc/usage/cmd/setexpr.rst at least and we
> should have one for test as well, but don't, yet.
Yes, if test.rst had already existed I would have updated it, but I
didn't feel like writing the whole thing from scratch until I had a
sense of whether these were going anywhere. I'll take a stab at it,
including the bash ref, but I'm not sure what you want to put in
setexpr.rst? Perhaps a reference to test.rst for an overview of what
regex features are available so they don't need to be repeated?
Another thing I considered was to have the 'test foo =~ ...' thing
populate shell variables $1, $2, ... with the capture groups, if any;
and possibly even $0 with "what matched the whole thing", as U-Boot
doesn't really have anything sensible for that.
Rasmus
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 01/11] cmd: test: add support for =~ operator
2025-05-06 19:07 ` Rasmus Villemoes
@ 2025-05-06 19:24 ` Tom Rini
2025-05-06 21:10 ` Rasmus Villemoes
0 siblings, 1 reply; 19+ messages in thread
From: Tom Rini @ 2025-05-06 19:24 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: u-boot
[-- Attachment #1: Type: text/plain, Size: 2756 bytes --]
On Tue, May 06, 2025 at 09:07:05PM +0200, Rasmus Villemoes wrote:
> On Tue, May 06 2025, Tom Rini <trini@konsulko.com> wrote:
>
> > On Tue, May 06, 2025 at 10:49:31AM -0600, Tom Rini wrote:
> >> On Tue, May 06, 2025 at 04:10:25PM +0200, Rasmus Villemoes wrote:
> >>
> >> > Currently, the only way to make use of regex matching in the shell is
> >> > by using "setexpr [g]sub" command. That's rather awkward for asking
> >> > whether a string matches a regex. At the very least, it requires
> >> > providing setexpr with a dummy target variable, but also, the return
> >> > value of setexpr doesn't say whether any substitutions were done, so
> >> > one would have to do some roundabout thing like
> >> >
> >> > env set dummy "${string_to_test}"
> >> > setexpr sub dummy '<some regex>' ''
> >> > if test "${dummy}" != "${string_to_test}" ; then ...
> >> >
> >> > When CONFIG_REGEX is set, teach the test command a new operator, =~,
> >> > which will allow one to more naturally write
> >> >
> >> > if test "${string_to_test}" =~ '<some regex>' ; then ...
> >> >
> >> > Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
> >>
> >> We should also mention here (and then in docs) that this the same as the
> >> =~ operator in bash, which I only learned about now as part of answering
> >> my own question of "Are people going to expect =~ to do something
> >> else?".
> >>
> >> With that,
> >>
> >> Reviewed-by: Tom Rini <trini@konsulko.com>
> >
> > Oh, and we should update doc/usage/cmd/setexpr.rst at least and we
> > should have one for test as well, but don't, yet.
>
> Yes, if test.rst had already existed I would have updated it, but I
> didn't feel like writing the whole thing from scratch until I had a
> sense of whether these were going anywhere. I'll take a stab at it,
> including the bash ref, but I'm not sure what you want to put in
> setexpr.rst? Perhaps a reference to test.rst for an overview of what
> regex features are available so they don't need to be repeated?
Yeah, for setexpr.rst I would go for (a) making sure it's correct and
without subtle bugs (as you just fixed a number of them) and then (b)
noting that the =~ operator exists and link to the new test.rst
> Another thing I considered was to have the 'test foo =~ ...' thing
> populate shell variables $1, $2, ... with the capture groups, if any;
> and possibly even $0 with "what matched the whole thing", as U-Boot
> doesn't really have anything sensible for that.
I would ask what bash does here first (are things saved anywhere?) as I
assume you use this feature there. And then a second of how useful the
captured output is likely to be outside of debugging a script you're
writing.
--
Tom
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 01/11] cmd: test: add support for =~ operator
2025-05-06 19:24 ` Tom Rini
@ 2025-05-06 21:10 ` Rasmus Villemoes
0 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 21:10 UTC (permalink / raw)
To: Tom Rini; +Cc: u-boot
On Tue, May 06 2025, Tom Rini <trini@konsulko.com> wrote:
> On Tue, May 06, 2025 at 09:07:05PM +0200, Rasmus Villemoes wrote:
>> On Tue, May 06 2025, Tom Rini <trini@konsulko.com> wrote:
>>
>> > On Tue, May 06, 2025 at 10:49:31AM -0600, Tom Rini wrote:
>> >> On Tue, May 06, 2025 at 04:10:25PM +0200, Rasmus Villemoes wrote:
>> >>
>> >> > Currently, the only way to make use of regex matching in the shell is
>> >> > by using "setexpr [g]sub" command. That's rather awkward for asking
>> >> > whether a string matches a regex. At the very least, it requires
>> >> > providing setexpr with a dummy target variable, but also, the return
>> >> > value of setexpr doesn't say whether any substitutions were done, so
>> >> > one would have to do some roundabout thing like
>> >> >
>> >> > env set dummy "${string_to_test}"
>> >> > setexpr sub dummy '<some regex>' ''
>> >> > if test "${dummy}" != "${string_to_test}" ; then ...
>> >> >
>> >> > When CONFIG_REGEX is set, teach the test command a new operator, =~,
>> >> > which will allow one to more naturally write
>> >> >
>> >> > if test "${string_to_test}" =~ '<some regex>' ; then ...
>> >> >
>> >> > Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
>> >>
>> >> We should also mention here (and then in docs) that this the same as the
>> >> =~ operator in bash, which I only learned about now as part of answering
>> >> my own question of "Are people going to expect =~ to do something
>> >> else?".
>> >>
>> >> With that,
>> >>
>> >> Reviewed-by: Tom Rini <trini@konsulko.com>
>> >
>> > Oh, and we should update doc/usage/cmd/setexpr.rst at least and we
>> > should have one for test as well, but don't, yet.
>>
>> Yes, if test.rst had already existed I would have updated it, but I
>> didn't feel like writing the whole thing from scratch until I had a
>> sense of whether these were going anywhere. I'll take a stab at it,
>> including the bash ref, but I'm not sure what you want to put in
>> setexpr.rst? Perhaps a reference to test.rst for an overview of what
>> regex features are available so they don't need to be repeated?
>
> Yeah, for setexpr.rst I would go for (a) making sure it's correct and
> without subtle bugs (as you just fixed a number of them) and then (b)
> noting that the =~ operator exists and link to the new test.rst
See the patch I just sent. As for subtle bugs, IDK, I've made sure that
the setexpr test suite passes throughout my series, so I don't think
I've changed anything that affects those tests, but I also don't know
how comprehensive those tests are or if they even exercise
e.g. character classes.
>> Another thing I considered was to have the 'test foo =~ ...' thing
>> populate shell variables $1, $2, ... with the capture groups, if any;
>> and possibly even $0 with "what matched the whole thing", as U-Boot
>> doesn't really have anything sensible for that.
>
> I would ask what bash does here first (are things saved anywhere?)
Yes, but bash also has array variables, and it stores the capture groups
in the array variable BASH_REMATCH, so capture group 2 would be
"${BASH_REMATCH[2]}". If $n is too subtle, we could certainly make it
REMATCHn instead for analogy with bash.
> as I assume you use this feature there.
Yes, I do occasinally use [[ =~ ]], but no, I rarely find myself needing
to extract capture groups in that setting. But that's probably partly
due to bash's string expansion features that make a lot of string
munging quite easy; e.g. to get the basename of a file one simply does
${file##*/} ; there's really never any reason to do "$(basename
"$file")".
> And then a second of how useful the captured output is likely to be
> outside of debugging a script you're writing.
Well, I had half a use case some time ago when I first started thinking
about having regex matching in U-Boot shell in the first place, but then
I got distracted, did the whole thing some other way, and now I can't
even remember what it was. I'm just floating the idea, and I won't push
it until there is a real use case.
As for the use case for the regex matching itself, this is at first for
some sanity checking/verification of data related a secure boot setup;
if it doesn't match some strict checks, I'll throw it away and use sane
defaults.
Rasmus
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 02/11] slre: add myself as maintainer
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 01/11] cmd: test: add support for =~ operator Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 03/11] test: slre: add tests for regex library Rasmus Villemoes
` (10 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
I guess that touching these files means "tag, you're it". That's fine
with me.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
MAINTAINERS | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index d62dd35a385..94c62daf834 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1628,6 +1628,12 @@ F: drivers/gpio/sl28cpld-gpio.c
F: drivers/misc/sl28cpld.c
F: drivers/watchdog/sl28cpld-wdt.c
+SLRE
+M: Rasmus Villemoes <ravi@prevas.dk>
+S: Maintained
+F: include/slre.h
+F: lib/slre.c
+
SMCCC TRNG
M: Etienne Carriere <etienne.carriere@linaro.org>
S: Maintained
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 03/11] test: slre: add tests for regex library
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 01/11] cmd: test: add support for =~ operator Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 02/11] slre: add myself as maintainer Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 04/11] slre: drop wrong "anchored" optimization Rasmus Villemoes
` (9 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
Inspecting the slre.c code reveals a few bugs; those are easy to
demonstrate with the new '=~' test operator. Before fixing them, let's
add a place to add test cases.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
MAINTAINERS | 1 +
test/lib/Makefile | 1 +
test/lib/slre.c | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 38 insertions(+)
create mode 100644 test/lib/slre.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 94c62daf834..111e2767917 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1633,6 +1633,7 @@ M: Rasmus Villemoes <ravi@prevas.dk>
S: Maintained
F: include/slre.h
F: lib/slre.c
+F: test/lib/slre.c
SMCCC TRNG
M: Etienne Carriere <etienne.carriere@linaro.org>
diff --git a/test/lib/Makefile b/test/lib/Makefile
index d620510f998..ff4ff63270d 100644
--- a/test/lib/Makefile
+++ b/test/lib/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_SHA256) += test_sha256_hmac.o
obj-$(CONFIG_HKDF_MBEDTLS) += test_sha256_hkdf.o
obj-$(CONFIG_GETOPT) += getopt.o
obj-$(CONFIG_CRC8) += test_crc8.o
+obj-$(CONFIG_REGEX) += slre.o
obj-$(CONFIG_UT_LIB_CRYPT) += test_crypt.o
obj-$(CONFIG_UT_TIME) += time.o
obj-$(CONFIG_$(PHASE_)UT_UNICODE) += unicode.o
diff --git a/test/lib/slre.c b/test/lib/slre.c
new file mode 100644
index 00000000000..51a50b269aa
--- /dev/null
+++ b/test/lib/slre.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+
+#include <test/lib.h>
+#include <test/ut.h>
+#include <slre.h>
+
+struct re_test {
+ const char *str;
+ const char *re;
+ int match;
+};
+
+static const struct re_test re_test[] = {
+ { "123", "^\\d+$", 1},
+ { "x23", "^\\d+$", 0},
+ { "banana", "^([bn]a)*$", 1},
+ { "panama", "^([bn]a)*$", 0},
+ {}
+};
+
+static int lib_slre(struct unit_test_state *uts)
+{
+ const struct re_test *t;
+
+ for (t = re_test; t->str; t++) {
+ struct slre slre;
+
+ ut_assert(slre_compile(&slre, t->re));
+ ut_assertf(!!slre_match(&slre, t->str, strlen(t->str), NULL) == t->match,
+ "'%s' unexpectedly %s '%s'\n", t->str,
+ t->match ? "didn't match" : "matched", t->re);
+ }
+
+ return 0;
+}
+LIB_TEST(lib_slre, 0);
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 04/11] slre: drop wrong "anchored" optimization
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (2 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 03/11] test: slre: add tests for regex library Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 05/11] test: slre: add more test cases Rasmus Villemoes
` (8 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
The regex '^a|b' means "does the string start with a, or does it have
a b anywhere", not "does the string start with a or b" (the latter
should be spelled '^[ab]' or '^(a|b)'). It should match exactly the
same strings as 'b|^a'. But the current implementation hard-codes an
assumption that when the regex starts with a ^, the whole regex must
match from the beginning, i.e. it only attempts at offset 0.
It really should be completely symmetrical to 'b|c$' ("does it have a
b anywhere or end with c?"), which is treated correctly.
Another quirk is that currently the regex 'x*$', which should match
all strings (because it just means "does the string end
with 0 or more x'es"), does not, because in the unanchored case we
never attempt to match at ofs==len. In the anchored case, '^x*$', this
works correctly and matches exactly strings (including the empty
string) consisting entirely of x'es.
Fix both of these issues by dropping all use of the slre->anchored
member and always test at all possible offsets. If the regex does have
a ^ somewhere (including after a | branch character), that is
correctly handled by the match engine by only matching when *ofs is 0.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
include/slre.h | 1 -
lib/slre.c | 13 +++----------
2 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/include/slre.h b/include/slre.h
index 4b41a4b276f..af5b1302d9c 100644
--- a/include/slre.h
+++ b/include/slre.h
@@ -63,7 +63,6 @@ struct slre {
int code_size;
int data_size;
int num_caps; /* Number of bracket pairs */
- int anchored; /* Must match from string start */
const char *err_str; /* Error string */
};
diff --git a/lib/slre.c b/lib/slre.c
index 277a59a03a7..4f455400d3a 100644
--- a/lib/slre.c
+++ b/lib/slre.c
@@ -413,10 +413,7 @@ int
slre_compile(struct slre *r, const char *re)
{
r->err_str = NULL;
- r->code_size = r->data_size = r->num_caps = r->anchored = 0;
-
- if (*re == '^')
- r->anchored++;
+ r->code_size = r->data_size = r->num_caps = 0;
emit(r, OPEN); /* This will capture what matches full RE */
emit(r, 0);
@@ -650,13 +647,9 @@ slre_match(const struct slre *r, const char *buf, int len,
{
int i, ofs = 0, res = 0;
- if (r->anchored) {
+ for (i = 0; i <= len && res == 0; i++) {
+ ofs = i;
res = match(r, 0, buf, len, &ofs, caps);
- } else {
- for (i = 0; i < len && res == 0; i++) {
- ofs = i;
- res = match(r, 0, buf, len, &ofs, caps);
- }
}
return res;
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 05/11] test: slre: add more test cases
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (3 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 04/11] slre: drop wrong "anchored" optimization Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 06/11] test: slre: add some (negative) character class tests Rasmus Villemoes
` (7 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
Add some tests for the "drop wrong anchored optimization". Without
the previous commit, the first, fifth and seventh of these would fail,
i.e. those:
{ "xby", "^a|b", 1},
{ "", "x*$", 1},
{ "yy", "x*$", 1},
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
test/lib/slre.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/test/lib/slre.c b/test/lib/slre.c
index 51a50b269aa..b76d33475dd 100644
--- a/test/lib/slre.c
+++ b/test/lib/slre.c
@@ -15,6 +15,14 @@ static const struct re_test re_test[] = {
{ "x23", "^\\d+$", 0},
{ "banana", "^([bn]a)*$", 1},
{ "panama", "^([bn]a)*$", 0},
+ { "xby", "^a|b", 1},
+ { "xby", "b|^a", 1},
+ { "xby", "b|c$", 1},
+ { "xby", "c$|b", 1},
+ { "", "x*$", 1},
+ { "", "^x*$", 1},
+ { "yy", "x*$", 1},
+ { "yy", "^x*$", 0},
{}
};
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 06/11] test: slre: add some (negative) character class tests
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (4 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 05/11] test: slre: add more test cases Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 07/11] slre: refactor is_any_but() Rasmus Villemoes
` (6 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
test/lib/slre.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/test/lib/slre.c b/test/lib/slre.c
index b76d33475dd..9b41ea92f38 100644
--- a/test/lib/slre.c
+++ b/test/lib/slre.c
@@ -23,6 +23,9 @@ static const struct re_test re_test[] = {
{ "", "^x*$", 1},
{ "yy", "x*$", 1},
{ "yy", "^x*$", 0},
+ { "Gadsby", "^[^eE]*$", 1},
+ { "Ernest", "^[^eE]*$", 0},
+ { "6d41f0a39d6", "^[0123456789abcdef]*$", 1 },
{}
};
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 07/11] slre: refactor is_any_but()
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (5 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 06/11] test: slre: add some (negative) character class tests Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 08/11] slre: fix matching of escape sequence used inside character class Rasmus Villemoes
` (5 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
As preparation for fixing the handling of backslash-escapes used
inside a character class, refactor is_any_but() to be defined in terms
of is_any_of() so we don't have to repeat the same logic in two places.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
lib/slre.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/lib/slre.c b/lib/slre.c
index 4f455400d3a..5cb0d3ec7fa 100644
--- a/lib/slre.c
+++ b/lib/slre.c
@@ -484,17 +484,14 @@ is_any_of(const unsigned char *p, int len, const char *s, int *ofs)
static int
is_any_but(const unsigned char *p, int len, const char *s, int *ofs)
{
- int i, ch;
-
- ch = s[*ofs];
+ int dummy = *ofs;
- for (i = 0; i < len; i++) {
- if (p[i] == ch)
- return 0;
+ if (is_any_of(p, len, s, &dummy)) {
+ return 0;
+ } else {
+ (*ofs)++;
+ return 1;
}
-
- (*ofs)++;
- return 1;
}
static int
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 08/11] slre: fix matching of escape sequence used inside character class
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (6 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 07/11] slre: refactor is_any_but() Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 09/11] test: slre: add test cases for escape char in " Rasmus Villemoes
` (4 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
At the compile stage, the anyof() function clearly intends to handle escape
sequences like \d (for digits) inside square brackets, since the logic
emits a 0 byte followed by the code representing the character
class (NONSPACE, SPACE or DIGIT).
However, this is not handled in the corresponding match helper
is_any_of(); it just naively loops over all the bytes in the ->data
array emitted by anyof() and compares those directly to the current
character. For example, this means that the string "\x11" (containing
the single character with value 17) is matched by the regex "[#%\d]",
because DIGIT happens to be 17.
Fix that by recognizing a zero byte as indicating something special
and act accordingly. In order not to repeat the "increment *ofs and
return 1" in all places, put those two lines after a new match: label.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
lib/slre.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/lib/slre.c b/lib/slre.c
index 5cb0d3ec7fa..87dfde720e9 100644
--- a/lib/slre.c
+++ b/lib/slre.c
@@ -472,13 +472,33 @@ is_any_of(const unsigned char *p, int len, const char *s, int *ofs)
ch = s[*ofs];
- for (i = 0; i < len; i++)
- if (p[i] == ch) {
- (*ofs)++;
- return 1;
+ for (i = 0; i < len; i++) {
+ if (p[i] == '\0') {
+ switch (p[++i]) {
+ case NONSPACE:
+ if (!isspace(ch))
+ goto match;
+ break;
+ case SPACE:
+ if (isspace(ch))
+ goto match;
+ break;
+ case DIGIT:
+ if (isdigit(ch))
+ goto match;
+ break;
+ }
+ continue;
}
+ if (p[i] == ch)
+ goto match;
+ }
return 0;
+
+match:
+ (*ofs)++;
+ return 1;
}
static int
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 09/11] test: slre: add test cases for escape char in character class
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (7 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 08/11] slre: fix matching of escape sequence used inside character class Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 10/11] slre: implement support for ranges in character classes Rasmus Villemoes
` (3 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
test/lib/slre.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/test/lib/slre.c b/test/lib/slre.c
index 9b41ea92f38..053d046075e 100644
--- a/test/lib/slre.c
+++ b/test/lib/slre.c
@@ -26,6 +26,9 @@ static const struct re_test re_test[] = {
{ "Gadsby", "^[^eE]*$", 1},
{ "Ernest", "^[^eE]*$", 0},
{ "6d41f0a39d6", "^[0123456789abcdef]*$", 1 },
+ /* DIGIT is 17 */
+ { "##\x11%%\x11", "^[#%\\d]*$", 0 },
+ { "##23%%45", "^[#%\\d]*$", 1 },
{}
};
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 10/11] slre: implement support for ranges in character classes
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (8 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 09/11] test: slre: add test cases for escape char in " Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 14:10 ` [PATCH 11/11] test: slre: add tests for character ranges Rasmus Villemoes
` (2 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
When trying to use U-Boot's regex facility, it is a rather large
gotcha that [a-z] range syntax is not supported. It doesn't require a
lot of extra code to implement that; we just let the regular parsing
emit the start and end literal symbols as usual, and add a new
"escape" code RANGE.
At match time, this means the code will first just see an 'a' and try
to match that, and only then recognize that it's actually part of a
range and then do the 'a' <= ch <= 'z' test.
Of course, this means that a - in the middle of a [] pair no longer
matches a literal dash, but I highly doubt anybody relies on
that. Putting it first or last, or escaping it with \, as in most
other RE engines, continues to work.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
lib/slre.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/lib/slre.c b/lib/slre.c
index 87dfde720e9..117815a6d60 100644
--- a/lib/slre.c
+++ b/lib/slre.c
@@ -30,7 +30,7 @@
#include <slre.h>
enum {END, BRANCH, ANY, EXACT, ANYOF, ANYBUT, OPEN, CLOSE, BOL, EOL,
- STAR, PLUS, STARQ, PLUSQ, QUEST, SPACE, NONSPACE, DIGIT};
+ STAR, PLUS, STARQ, PLUSQ, QUEST, SPACE, NONSPACE, DIGIT, RANGE};
#ifdef SLRE_TEST
static struct {
@@ -55,7 +55,8 @@ static struct {
{"QUEST", 1, "o"}, /* Match zero or one time, "?" */
{"SPACE", 0, ""}, /* Match whitespace, "\s" */
{"NONSPACE", 0, ""}, /* Match non-space, "\S" */
- {"DIGIT", 0, ""} /* Match digit, "\d" */
+ {"DIGIT", 0, ""}, /* Match digit, "\d" */
+ {"RANGE", 0, ""}, /* Range separator - */
};
#endif /* SLRE_TEST */
@@ -260,6 +261,15 @@ anyof(struct slre *r, const char **re)
return;
/* NOTREACHED */
break;
+ case '-':
+ if (r->data_size == old_data_size || **re == ']') {
+ /* First or last character, just match - itself. */
+ store_char_in_data(r, '-');
+ break;
+ }
+ store_char_in_data(r, 0);
+ store_char_in_data(r, RANGE);
+ break;
case '\\':
esc = get_escape_char(re);
if ((esc & 0xff) == 0) {
@@ -487,6 +497,14 @@ is_any_of(const unsigned char *p, int len, const char *s, int *ofs)
if (isdigit(ch))
goto match;
break;
+ case RANGE:
+ /*
+ * a-z is represented in the data array as {'a', \0, RANGE, 'z'}
+ */
+ ++i;
+ if (p[i - 3] <= (unsigned char)ch && (unsigned char)ch <= p[i])
+ goto match;
+ break;
}
continue;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 11/11] test: slre: add tests for character ranges
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (9 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 10/11] slre: implement support for ranges in character classes Rasmus Villemoes
@ 2025-05-06 14:10 ` Rasmus Villemoes
2025-05-06 16:44 ` [PATCH 00/11] regex patches Tom Rini
2025-05-10 11:25 ` Simon Glass
12 siblings, 0 replies; 19+ messages in thread
From: Rasmus Villemoes @ 2025-05-06 14:10 UTC (permalink / raw)
To: u-boot; +Cc: Tom Rini, Rasmus Villemoes
The first of these, { "U-Boot", "^[B-Uo-t]*$", 0 }, would match
previously when the - and the letters were all interpreted literally.
Signed-off-by: Rasmus Villemoes <ravi@prevas.dk>
---
test/lib/slre.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/test/lib/slre.c b/test/lib/slre.c
index 053d046075e..ff2386d614a 100644
--- a/test/lib/slre.c
+++ b/test/lib/slre.c
@@ -29,6 +29,14 @@ static const struct re_test re_test[] = {
/* DIGIT is 17 */
{ "##\x11%%\x11", "^[#%\\d]*$", 0 },
{ "##23%%45", "^[#%\\d]*$", 1 },
+ { "U-Boot", "^[B-Uo-t]*$", 0 },
+ { "U-Boot", "^[A-Zm-v-]*$", 1 },
+ { "U-Boot", "^[-A-Za-z]*$", 1 },
+ /* The range --C covers both - and B. */
+ { "U-Boot", "^[--CUot]*$", 1 },
+ { "U-Boot", "^[^0-9]*$", 1 },
+ { "U-Boot", "^[^0-9<->]*$", 1 },
+ { "U-Boot", "^[^0-9<\\->]*$", 0 },
{}
};
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 00/11] regex patches
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (10 preceding siblings ...)
2025-05-06 14:10 ` [PATCH 11/11] test: slre: add tests for character ranges Rasmus Villemoes
@ 2025-05-06 16:44 ` Tom Rini
2025-05-10 11:25 ` Simon Glass
12 siblings, 0 replies; 19+ messages in thread
From: Tom Rini @ 2025-05-06 16:44 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: u-boot
[-- Attachment #1: Type: text/plain, Size: 837 bytes --]
On Tue, May 06, 2025 at 04:10:24PM +0200, Rasmus Villemoes wrote:
> This started as a rather simple patch, 1/11, adding the ability to
> more conveniently do regex matching in shell.
>
> But with that, it became very easy to see what the slre library can
> and especially what it cannot do, and that way I found both outright
> bugs and a "wow, doesn't it support that syntax" gotcha. I couldn't
> find any tests ('git grep slre -- test/' was empty), so I added a
> small test suite and tweaked slre.c.
I will say I worry there might be users depending on the bugs you've
fixed here, but I don't think that's a reason to not fix the bugs. We
can if needed deal with bug compatibility some opt-in way if required in
the future. And yes, this is a note to anyone in the future that has hit
upon the problem.
--
Tom
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 00/11] regex patches
2025-05-06 14:10 [PATCH 00/11] regex patches Rasmus Villemoes
` (11 preceding siblings ...)
2025-05-06 16:44 ` [PATCH 00/11] regex patches Tom Rini
@ 2025-05-10 11:25 ` Simon Glass
12 siblings, 0 replies; 19+ messages in thread
From: Simon Glass @ 2025-05-10 11:25 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: u-boot, Tom Rini
On Tue, 6 May 2025 at 16:15, Rasmus Villemoes <ravi@prevas.dk> wrote:
>
> This started as a rather simple patch, 1/11, adding the ability to
> more conveniently do regex matching in shell.
>
> But with that, it became very easy to see what the slre library can
> and especially what it cannot do, and that way I found both outright
> bugs and a "wow, doesn't it support that syntax" gotcha. I couldn't
> find any tests ('git grep slre -- test/' was empty), so I added a
> small test suite and tweaked slre.c.
>
> Rasmus Villemoes (11):
> cmd: test: add support for =~ operator
> slre: add myself as maintainer
> test: slre: add tests for regex library
> slre: drop wrong "anchored" optimization
> test: slre: add more test cases
> test: slre: add some (negative) character class tests
> slre: refactor is_any_but()
> slre: fix matching of escape sequence used inside character class
> test: slre: add test cases for escape char in character class
> slre: implement support for ranges in character classes
> test: slre: add tests for character ranges
>
> MAINTAINERS | 7 +++++
> cmd/test.c | 19 ++++++++++++
> include/slre.h | 1 -
> lib/slre.c | 78 ++++++++++++++++++++++++++++++++---------------
> test/lib/Makefile | 1 +
> test/lib/slre.c | 58 +++++++++++++++++++++++++++++++++++
> 6 files changed, 138 insertions(+), 26 deletions(-)
> create mode 100644 test/lib/slre.c
>
> --
> 2.49.0
>
For the series:
Reviewed-by: Simon Glass <sjg@chromium.org>
Thank you for doing this.
^ permalink raw reply [flat|nested] 19+ messages in thread