linux-c-programming.vger.kernel.org archive mirror
From: Eric Bambach <eric@cisu.net>
To: James Colannino <james@colannino.org>
Cc: linux-c-programming@vger.kernel.org
Subject: Re: comparing char to other known char's
Date: Thu, 23 Jun 2005 22:34:55 -0500
Message-ID: <200506232234.55312.eric@cisu.net>
In-Reply-To: <42BB52E4.5090504@colannino.org>

On Thursday 23 June 2005 07:25 pm, James Colannino wrote:
> Eric Bambach wrote:
> > Generally speaking (in terms of input validation), it's better practice to
> > check against a LEGAL set of characters rather than an illegal set. That
> > way you can get all the characters you need, but everything else is
> > blocked. If you block illegal ones you're bound to miss a few, or even
> > ones from extended charsets and input methods that you might not have
> > thought of, that could wreak havoc in your program.
>
> Here's what I've whipped up based on your suggestion that I should look
> for legal characters instead of the other way around:
>
> <CODE>
>
> /* This function returns 1 if the character being checked is legal and 0
> if it isn't. */
>
> int legal_characters(char character_to_check) {
>
>  int index;
>  char legal_characters[] =
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";
>
>  /* Subtract 1 so the terminating '\0' is not counted as a legal character. */
>  int number_of_legal_chars = sizeof(legal_characters) / sizeof(char) - 1;
>
>  for (index = 0; index < number_of_legal_chars; ++index) {
>   if (character_to_check == legal_characters[index])
>    return 1;
>  }
>
>  return 0;
> }
>
> </CODE>
>
> How does this function look?

A solution at best, inefficient at worst. I suppose if you are only comparing
a few characters, or if you don't need them processed fast, it will get you
by. The problem is that each character you want to validate costs a minimum
of 1 and a maximum of 65ish loop iterations. That could easily add up on a
long string. Even a sentence with 30 characters is a minimum of 30
iterations and a maximum of about 1950 (30 x 65), with the average probably
being several hundred.
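
A rough sketch of that arithmetic, in case it helps. The helper names
scan_cost and total_scan_cost are made up for illustration, and the sketch
assumes the scan stops at the first match and covers exactly the 64 legal
characters from the quoted string:

```c
#include <stddef.h>
#include <string.h>

/* The 64 legal characters from the thread (letters, digits, '_' and '-'). */
static const char LEGAL[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";

/* Hypothetical helper: how many loop iterations a linear scan of LEGAL
   spends on one character. The scan stops at the first match. */
size_t scan_cost(char c)
{
    size_t n = strlen(LEGAL);
    for (size_t i = 0; i < n; ++i) {
        if (LEGAL[i] == c)
            return i + 1;   /* matched on comparison number i + 1 */
    }
    return n;               /* no match: every entry was compared */
}

/* Hypothetical helper: total iterations needed to check a whole string. */
size_t total_scan_cost(const char *s)
{
    size_t total = 0;
    for (; *s != '\0'; ++s)
        total += scan_cost(*s);
    return total;
}
```

An illegal character is the worst case, since the scan has to run off the
end of the list before it can say no.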

It does do the job nicely, though, of returning true if the character is
legal, and you did implement my second suggestion well. Try my code below
(not tested, might not compile, but the general idea is there. It's C++ code
because of the bool type and the // comments, but it could easily be C99 with
a little editing). See how the whole function collapses to a single table
lookup after it is first run? You pay for it with a little extra memory (256
bytes for the table), but it will be fast no matter how many characters you
throw at it. Each character is checked by one indexed memory read. The
computer only has to execute a few instructions per character you pump into
the function, as opposed to tens to hundreds of instructions in your function
(compare, jump, add, compare, jump, add). There is still a branch at entry to
my function on every call, though, because I did the lazy initialization. If
you want to optimize further, hand code the table as a static initializer and
that branch goes away, so you should see a little extra performance.

Perhaps someone can tell me if the compiler is smart enough to optimize out
my run-time initialization of the array and collapse it into a single static
initializer without all the code. I would be very interested in knowing
that.

e.g. table[256] = { 0,0,1,1,0... };
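
For what it's worth, here is one way the hand-coded table might look in C99,
using designated initializers so nobody has to count 256 zeros and ones by
hand. This is only a sketch: legal_table and legal_static are names invented
for this example, and entries that are not listed are zero-initialized by
the language.

```c
#include <stdbool.h>

/* Hand-written lookup table. Every entry not named here starts out as 0. */
static const unsigned char legal_table[256] = {
    ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
    ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
    ['A'] = 1, ['B'] = 1, ['C'] = 1, ['D'] = 1, ['E'] = 1, ['F'] = 1,
    ['G'] = 1, ['H'] = 1, ['I'] = 1, ['J'] = 1, ['K'] = 1, ['L'] = 1,
    ['M'] = 1, ['N'] = 1, ['O'] = 1, ['P'] = 1, ['Q'] = 1, ['R'] = 1,
    ['S'] = 1, ['T'] = 1, ['U'] = 1, ['V'] = 1, ['W'] = 1, ['X'] = 1,
    ['Y'] = 1, ['Z'] = 1,
    ['a'] = 1, ['b'] = 1, ['c'] = 1, ['d'] = 1, ['e'] = 1, ['f'] = 1,
    ['g'] = 1, ['h'] = 1, ['i'] = 1, ['j'] = 1, ['k'] = 1, ['l'] = 1,
    ['m'] = 1, ['n'] = 1, ['o'] = 1, ['p'] = 1, ['q'] = 1, ['r'] = 1,
    ['s'] = 1, ['t'] = 1, ['u'] = 1, ['v'] = 1, ['w'] = 1, ['x'] = 1,
    ['y'] = 1, ['z'] = 1,
    ['_'] = 1, ['-'] = 1,
};

/* Hypothetical name. One indexed load per call, no "initialized" check. */
bool legal_static(unsigned char c)
{
    return legal_table[c] != 0;
}
```

Because the table is built at compile time, there is no first-call setup
and nothing left to branch on at entry.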

But maybe I'm being too pedantic... your function DOES do the job ;)
I just don't know how you plan to use it.


HTH!

#include <string.h>  // memset

bool legal(unsigned char *character) {

 // legal_characters[] =
 //"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";
 static unsigned char table[256];
 static bool initialized = false;
 if (!initialized) {
  // Set up the whole table as false. Filling memory with a byte value is
  // memset(dest, value, size), not memcpy. (A static array already starts
  // out zeroed, so this is only for clarity.)
  memset(table, 0, sizeof(table));
  // Set up 0-9 as true (ASCII 48-57; 58 would be ':')
  for (int j = 48; j <= 57; j++) {
   table[j] = 1;
  }
  // A-Z is true
  for (int j = 65; j <= 90; j++) {
   table[j] = 1;
  }
  // a-z is true
  for (int j = 97; j <= 122; j++) {
   table[j] = 1;
  }
  // The stragglers: '_' (95) and '-' (45)
  table[95] = 1;
  table[45] = 1;
  initialized = true;
 }
 return table[*character];
}
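
If the plan is to validate whole strings rather than single characters, the
standard library's strspn() is another option. To be clear, this is a
different technique from the lookup table above, not a variation of it. A
minimal sketch, where string_is_legal is a name invented for this example:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if every character of s is in the legal set. */
bool string_is_legal(const char *s)
{
    static const char legal[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-";

    /* strspn() returns the length of the leading run of s made up only of
       characters from legal. If that run reaches the terminator, the
       whole string is legal. */
    return s[strspn(s, legal)] == '\0';
}
```

Note that an empty string counts as legal here; whether that is what you
want depends on how the input is used.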

> James

-- 
----------------------------------------
--EB

> All is fine except that I can reliably "oops" it simply by trying to read
> from /proc/apm (e.g. cat /proc/apm).
> oops output and ksymoops-2.3.4 output is attached.
> Is there anything else I can contribute?

The latitude and longtitude of the bios writers current position, and
a ballistic missile.

                --Alan Cox LKML-December 08,2000 

----------------------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
