From: Eric Bambach <eric@cisu.net>
To: James Colannino <james@colannino.org>
Cc: linux-c-programming@vger.kernel.org
Subject: Re: comparing char to other known char's
Date: Thu, 23 Jun 2005 22:34:55 -0500
Message-ID: <200506232234.55312.eric@cisu.net>
In-Reply-To: <42BB52E4.5090504@colannino.org>
On Thursday 23 June 2005 07:25 pm, James Colannino wrote:
> Eric Bambach wrote:
> > Generally speaking (in terms of input validation), it's better practice to
> > check against a LEGAL set of characters rather than an illegal set. That
> > way you can get all the characters you need, but everything else is
> > blocked. If you block illegal ones you're bound to miss a few, or even
> > ones from extended charsets and input methods that you might not have
> > thought of, that could wreak havoc in your program.
>
> Here's what I've whipped up based on your suggestion that I should look
> for legal characters instead of the other way around:
>
> <CODE>
>
> /* This function returns 1 if the character being checked is legal and 0
>    if it isn't. */
>
> int legal_characters(char character_to_check) {
>
>     int index;
>     char legal_characters[] =
>         "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";
>
>     /* sizeof counts the terminating NUL too, so subtract 1 */
>     int number_of_legal_chars = sizeof(legal_characters) / sizeof(char) - 1;
>
>     for (index = 0; index < number_of_legal_chars; ++index) {
>         if (character_to_check == legal_characters[index])
>             return 1;
>     }
>
>     return 0;
> }
>
> </CODE>
>
> How does this function look?
It's a solution at best, inefficient at worst. I suppose it will get you by if
you are only comparing a few characters or don't need them processed fast.
The problem is that each character you validate costs a minimum of 1 and a
maximum of 65ish loop iterations. That could easily add up on a long string.
Even a sentence with 30 characters is a minimum of 30 iterations and a maximum
of about 1950 (30 x 65), with the average probably being a few to several
hundred.
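
To make the arithmetic above concrete, here is a rough C sketch that counts the comparisons a linear scan performs. The names `legal_chars`, `legal_linear` and `count_comparisons` are made up for illustration, and the scan covers the 64 legal characters only (no terminating NUL), so the worst case here is 64 comparisons per character rather than 65.

```c
#include <stddef.h>

/* The 64 legal characters (letters, digits, '_' and '-'). */
static const char legal_chars[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";

/* Linear scan that also tallies every comparison it makes.
   Hypothetical helper, written only to count the work done. */
static int legal_linear(char c, unsigned long *comparisons)
{
    size_t i;

    for (i = 0; i < sizeof(legal_chars) - 1; ++i) {
        ++*comparisons;
        if (c == legal_chars[i])
            return 1;
    }
    return 0;
}

/* Total number of comparisons needed to check every character of s. */
unsigned long count_comparisons(const char *s)
{
    unsigned long total = 0;

    for (; *s != '\0'; ++s)
        (void)legal_linear(*s, &total);
    return total;
}
```

A string of 'a' characters costs one comparison each (the best case), while every illegal character costs the full 64, so a 30-character string lands somewhere between 30 and 30 x 64 = 1920 comparisons.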
It does do the job nicely, though, of returning true if the character is legal,
and you did implement my second suggestion well. Try my code below (not
tested, might not compile, but the general idea is there. It's C++ code because
of the bool type and the comments, but it could easily be C99 with a little
editing). See how the whole function collapses to a single line after it is
first run? You pay for it with a little extra memory (a 256-byte table), but it
will be fast no matter how many characters you throw at it. Each character is
checked with a single array lookup. The computer only has to execute a few
instructions per character you pump into the function, as opposed to tens to
hundreds of instructions in your function (compare, jump, add, compare, jump,
add). There is still a branch at entry to my function each time, though,
because I did the lazy initialization. If you want to optimize further,
hand-code the table as static and you should see extra performance.
Perhaps someone can tell me if the compiler is smart enough to optimize out my
run-time initialization of the array and collapse it into a single static
initialization without all the code. I would be very interested in knowing
that.
e.g. table[256] = { 0,0,1,1,0... };
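
One way to hand-code such a table without typing 256 zeros and ones is C99 designated initializers. This is only a sketch: `legal_table` and `legal_static` are illustrative names, it assumes 8-bit chars, and it says nothing about whether a compiler would perform the same collapse on its own.

```c
#include <stdbool.h>

/* Filled in at compile time; every entry not listed defaults to 0.
   Indexing by character literal avoids hard-coding ASCII numbers. */
static const unsigned char legal_table[256] = {
    ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
    ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
    ['A'] = 1, ['B'] = 1, ['C'] = 1, ['D'] = 1, ['E'] = 1, ['F'] = 1, ['G'] = 1,
    ['H'] = 1, ['I'] = 1, ['J'] = 1, ['K'] = 1, ['L'] = 1, ['M'] = 1, ['N'] = 1,
    ['O'] = 1, ['P'] = 1, ['Q'] = 1, ['R'] = 1, ['S'] = 1, ['T'] = 1, ['U'] = 1,
    ['V'] = 1, ['W'] = 1, ['X'] = 1, ['Y'] = 1, ['Z'] = 1,
    ['a'] = 1, ['b'] = 1, ['c'] = 1, ['d'] = 1, ['e'] = 1, ['f'] = 1, ['g'] = 1,
    ['h'] = 1, ['i'] = 1, ['j'] = 1, ['k'] = 1, ['l'] = 1, ['m'] = 1, ['n'] = 1,
    ['o'] = 1, ['p'] = 1, ['q'] = 1, ['r'] = 1, ['s'] = 1, ['t'] = 1, ['u'] = 1,
    ['v'] = 1, ['w'] = 1, ['x'] = 1, ['y'] = 1, ['z'] = 1,
    ['_'] = 1, ['-'] = 1
};

/* One array lookup per character; no initialization check at run time. */
bool legal_static(unsigned char c)
{
    return legal_table[c] != 0;
}
```

Because the table is a constant, there is no "initialized" flag to test on each call, which removes the entry branch mentioned above.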
But maybe I'm being too pedantic... your function DOES do the job ;)
I just don't know how you plan to use it.
HTH!
#include <string.h>  // memset

bool legal(const unsigned char *character) {
    // legal_characters[] =
    //"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";
    static unsigned char table[256];
    static bool initialized = false;

    if (!initialized) {
        // Set up the whole table as false. This wants memset, not memcpy.
        // (A static array already starts out zeroed, so it is only explicit.)
        memset(table, 0, sizeof(table));

        // Set up 0-9 as true (ASCII 48-57; 58 would be ':')
        for (int j = 48; j <= 57; j++) {
            table[j] = 1;
        }
        // A-Z is true (ASCII 65-90)
        for (int j = 65; j <= 90; j++) {
            table[j] = 1;
        }
        // a-z is true (ASCII 97-122)
        for (int j = 97; j <= 122; j++) {
            table[j] = 1;
        }
        // The stragglers
        table[95] = 1;  // '_'
        table[45] = 1;  // '-'
        initialized = true;
    }
    // After the first call, only the flag test and this lookup run
    return table[*character];
}
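
If the eventual use is validating a whole string rather than one character at a time, the standard `strspn` function can do the scan against the same legal set. This is a different technique from the lookup table above, shown only as a sketch; `legal_set` and `legal_string` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* The same set of legal characters discussed in this thread. */
static const char legal_set[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";

/* Returns true if every character of s is in legal_set.
   strspn() gives the length of the leading run of legal characters;
   if that run ends at the terminating NUL, the whole string is legal.
   An empty string is reported as legal. */
bool legal_string(const char *s)
{
    return s[strspn(s, legal_set)] == '\0';
}
```

How fast `strspn` is depends on the C library, so it is worth measuring against a table lookup if speed matters.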
> James
--
----------------------------------------
--EB
> All is fine except that I can reliably "oops" it simply by trying to read
> from /proc/apm (e.g. cat /proc/apm).
> oops output and ksymoops-2.3.4 output is attached.
> Is there anything else I can contribute?
The latitude and longtitude of the bios writers current position, and
a ballistic missile.
--Alan Cox LKML-December 08,2000
----------------------------------------