From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Alex Riesen <raa.lkml@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] grep: don't call regexec() for fixed strings
Date: Mon, 12 Jan 2009 20:18:24 +0100
Message-ID: <496B9780.3030000@lsrfire.ath.cx>
In-Reply-To: <81b0412b0901120732t1bd1978awdc4be47767e02863@mail.gmail.com>
Alex Riesen wrote:
> 2009/1/10 René Scharfe <rene.scharfe@lsrfire.ath.cx>:
>> +static int isregexspecial(int c)
>> +{
>> + return isspecial(c) || c == '$' || c == '(' || c == ')' || c == '+' ||
>> + c == '.' || c == '^' || c == '{' || c == '|';
>> +}
>> +
>> +static int is_fixed(const char *s)
>> +{
>> + while (!isregexspecial(*s))
>> + s++;
>> + return !*s;
>> +}
>
> strchr?
Oh, yes, that would look nicer.
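Taken literally, the strchr() hint could look like the following untested sketch. It keeps the names isregexspecial() and is_fixed() from the patch but replaces the ctype macro with a strchr() lookup. The character string is assembled from the isspecial() set and the extra regex metacharacters listed in the ctype.c comments below, so treat it as an assumption rather than code from this thread.

```c
#include <string.h>

/*
 * Special characters: the isspecial() set (*, ?, [, \) plus the extra
 * regex metacharacters ($, (, ), +, ., ^, {, |).  strchr() treats the
 * terminating NUL as part of the string, so strchr(set, '\0') is
 * non-NULL and NUL counts as special too, just like isspecial('\0').
 * That is what stops the loop in is_fixed() at the end of the string.
 */
static int isregexspecial(int c)
{
	return strchr("*?[\\$()+.^{|", c) != NULL;
}

/* Return 1 if s contains no regex metacharacters, else 0. */
static int is_fixed(const char *s)
{
	while (!isregexspecial(*s))
		s++;
	return !*s;
}
```

An equivalent one-liner for is_fixed() would be `return !strpbrk(s, "*?[\\$()+.^{|");`, since strpbrk() does not match the terminating NUL.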
Another option is to extend ctype.c and implement isregexspecial() as a
table lookup. While we're at it, we could do the same for islowerxdigit()
(cf. builtin-name-rev.c::ishex()) and iswordchar() (cf. config.c::iskeychar()
and grep.c::word_char()). Something like the following (untested).
Which of the mentioned functions are really worthy of this promotion?
The isregexspecial() char class has more members than isspecial(), but
it's not performance-critical (unless you have a lot of patterns and
only a small amount of data to grep :).
Are there more candidates for ctype-ification?
René
ctype.c | 14 ++++++++++----
git-compat-util.h | 6 ++++++
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/ctype.c b/ctype.c
index 9208d67..1a76586 100644
--- a/ctype.c
+++ b/ctype.c
@@ -10,20 +10,26 @@
#undef AA
#undef DD
#undef GS
+#undef RR
+#undef US
+#undef Ah
#define SS GIT_SPACE
#define AA GIT_ALPHA
#define DD GIT_DIGIT
#define GS GIT_SPECIAL /* \0, *, ?, [, \\ */
+#define RR GIT_REGEX_SPECIAL /* $, (, ), +, ., ^, {, | */
+#define US GIT_UNDERSCORE
+#define Ah (GIT_ALPHA | GIT_LOWER_XDIGIT)
unsigned char sane_ctype[256] = {
GS, 0, 0, 0, 0, 0, 0, 0, 0, SS, SS, 0, 0, SS, 0, 0, /* 0-15 */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 16-15 */
- SS, 0, 0, 0, 0, 0, 0, 0, 0, 0, GS, 0, 0, 0, 0, 0, /* 32-15 */
+ SS, 0, 0, 0, RR, 0, 0, 0, RR, RR, GS, RR, 0, 0, RR, 0, /* 32-15 */
DD, DD, DD, DD, DD, DD, DD, DD, DD, DD, 0, 0, 0, 0, 0, GS, /* 48-15 */
0, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, /* 64-15 */
- AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, GS, GS, 0, 0, 0, /* 80-15 */
- 0, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, /* 96-15 */
- AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, 0, 0, 0, 0, 0, /* 112-15 */
+ AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, GS, GS, 0, RR, US, /* 80-15 */
+ 0, Ah, Ah, Ah, Ah, Ah, Ah, AA, AA, AA, AA, AA, AA, AA, AA, AA, /* 96-15 */
+ AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, AA, RR, RR, 0, 0, 0, /* 112-15 */
/* Nothing in the 128.. range */
};
diff --git a/git-compat-util.h b/git-compat-util.h
index e20b1e8..5eaa662 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -328,12 +328,18 @@ extern unsigned char sane_ctype[256];
#define GIT_DIGIT 0x02
#define GIT_ALPHA 0x04
#define GIT_SPECIAL 0x08
+#define GIT_REGEX_SPECIAL 0x10
+#define GIT_UNDERSCORE 0x20
+#define GIT_LOWER_XDIGIT 0x40
#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
#define isspace(x) sane_istest(x,GIT_SPACE)
#define isdigit(x) sane_istest(x,GIT_DIGIT)
#define isalpha(x) sane_istest(x,GIT_ALPHA)
#define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
#define isspecial(x) sane_istest(x,GIT_SPECIAL)
+#define isregexspecial(x) sane_istest(x,GIT_SPECIAL | GIT_REGEX_SPECIAL)
+#define iswordchar(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT | GIT_UNDERSCORE)
+#define islowerxdigit(x) sane_istest(x,GIT_DIGIT | GIT_LOWER_XDIGIT)
#define tolower(x) sane_case((unsigned char)(x), 0x20)
#define toupper(x) sane_case((unsigned char)(x), 0)
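To illustrate how the flag bits in the patch combine, here is a self-contained sketch of the same table-lookup technique. It is not git code: the demo_ names are hypothetical, and the table is filled at startup from character lists instead of the literal 256-entry initializer used in ctype.c. Each query is one table load plus one AND; OR-ing masks, as isregexspecial(), iswordchar() and islowerxdigit() do above, tests membership in the union of classes.

```c
#include <string.h>

/* Hypothetical flag bits mirroring the GIT_* values in the patch. */
enum {
	DEMO_SPACE         = 0x01,
	DEMO_DIGIT         = 0x02,
	DEMO_ALPHA         = 0x04,
	DEMO_SPECIAL       = 0x08, /* \0 * ? [ \\ */
	DEMO_REGEX_SPECIAL = 0x10, /* $ ( ) + . ^ { | */
	DEMO_UNDERSCORE    = 0x20,
	DEMO_LOWER_XDIGIT  = 0x40  /* a-f */
};

static unsigned char demo_ctype[256];

/* Set flag for every character in the NUL-terminated list chars. */
static void demo_mark(const char *chars, unsigned char flag)
{
	for (; *chars; chars++)
		demo_ctype[(unsigned char)*chars] |= flag;
}

static void demo_ctype_init(void)
{
	memset(demo_ctype, 0, sizeof(demo_ctype));
	demo_mark(" \t\n\r", DEMO_SPACE);
	demo_mark("0123456789", DEMO_DIGIT);
	demo_mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		  "abcdefghijklmnopqrstuvwxyz", DEMO_ALPHA);
	demo_mark("*?[\\", DEMO_SPECIAL);
	demo_ctype[0] |= DEMO_SPECIAL; /* NUL is special, as in sane_ctype */
	demo_mark("$()+.^{|", DEMO_REGEX_SPECIAL);
	demo_mark("_", DEMO_UNDERSCORE);
	demo_mark("abcdef", DEMO_LOWER_XDIGIT);
}

/* A character matches if it has at least one of the bits in mask. */
#define demo_istest(x, mask) ((demo_ctype[(unsigned char)(x)] & (mask)) != 0)
#define demo_isregexspecial(x) demo_istest(x, DEMO_SPECIAL | DEMO_REGEX_SPECIAL)
#define demo_iswordchar(x) demo_istest(x, DEMO_ALPHA | DEMO_DIGIT | DEMO_UNDERSCORE)
#define demo_islowerxdigit(x) demo_istest(x, DEMO_DIGIT | DEMO_LOWER_XDIGIT)
```

Filling the table at runtime only keeps the sketch short; a static initializer, as in the patch, needs no init call. Note also that with 0x40 taken, only one flag bit (0x80) is left in an unsigned char table for further classes.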