From: "J." <mailing-lists@xs4all.nl>
To: linux-c-programming@vger.kernel.org
Subject: RE: Newbie - Perl Equivalent Split - Seg Faults
Date: Tue, 16 Nov 2004 22:20:46 +0100 (CET)
Message-ID: <Pine.LNX.4.21.0411162214450.16227-100000@hestia>
In-Reply-To: <D2AA47A6FB2C1A48AF0526440C0F245CAA3621@monm207.monmouth.army.mil>
On Mon, 13 Dec 2004, Huber, George K RDECOM CERDEC STCD SRI wrote:
> Darren wrote:
>
> >Oh.. I almost forgot, if I place the strtok in main instead of calling it as
> >a function in split_char - it works.
>
> > Here is a super simple program that (tries) to split a charvar based on a
> > delimiter. I get no compile errors. If I remove the strtok line, then
> > split_var returns the string passed to it from main just fine.
> >
> > I tried changing the char *delim from *delim to delim[50] - same problem.
> >
> > This is something stupid, and probably super simple. Coming from the Perl
> > world, I'm trying to write some equivalent string manipulation functions
> > that I can use throughout my programs to avoid repetition and make the code
> > cleaner and easier to read.
> >
> > #include <stdio.h>
> > #include <string.h>
> >
> > char *split_char(char *string, char *delim) {
> > fprintf( stderr, "\tString = %s \n", string);
> > fprintf( stderr, "\tDelimiter = %s \n", delim);
> > string = strtok(string, delim);
> > return string;
> > }
> >
> > int main()
> > {
> > char *testvar;
> > testvar = split_char("test-hello", "-");
> > fprintf( stderr, "\tArray = %s \n", testvar);
> > return(0);
> > }
> >
>
> First of all, this is not a silly question. You are going to find that string processing
> in C is very painful when compared to Perl. After all, Perl was initially designed as
> a text processing language, while C was designed as a general purpose language.
>
> You probably want to take a look at how strtok works. It is a `destructive' function
> call in that it actually modifies the string that is being tokenized.
>
> As an example, consider tokenizing the string `test-case-hello'. After the first
> call to strtok (i.e. strtok(string, delim), with string containing "test-case-hello"
> and delim containing "-") we have the following situation: strtok has replaced the
> first occurrence of the delimiter with a null ('\0') character, returned a pointer to the
> first token and moved its internal pointer to the character after the old delimiter.
>
> return value
> |        internal pointer
> |        |
> \/       \/
> t e s t\0case-hello\0
>
> On the next call to strtok (i.e. strtok(NULL, delim) -- note the value of the first
> parameter: NULL signifies that we are continuing to tokenize the first string),
> we start from the internal pointer and search forward to find the next delimiter. This
> is replaced with a null, the address of the token is returned, and the internal
> pointer is moved to the first character after the old delimiter. This process continues
> until no more tokens are found, at which point strtok returns NULL.
>
> At this point, the original string now looks like this:
>
> t e s t\0c a s e\0h e l l o\0
>
> When I need to split a line using strtok I typically do something like this. Note, I
> have not attempted to compile this; it should work (barring typos) - but no guarantees.
>
> char string[] = "test-case-hello"; /* an array, because strtok writes into it */
> const char* delim = "-";
> int idx = 0;
>
> these create my string, my delimiter and an index value
>
> char* tokens[MAX_TOKENS];
>
> this creates an array of pointers to characters. In production code you would need
> to use some sort of dynamic data structure since you could not know the number of
> tokens in a line in advance
>
> char* token;
>
> this is a pointer to a character (which in C can also be a pointer to a string).
>
> for(idx = 0; idx < MAX_TOKENS; idx++)
> tokens[idx] = NULL;
>
> this is used to initialize each pointer in the array to NULL.
>
> token = strtok(string, delim);
> idx = 0;
>
> perform the first tokenization and reset the index value.
>
> while((idx < MAX_TOKENS) && (token != NULL)
> {
> tokens[idx] = token;
> idx++;
> token = strtok(NULL, delim); /* without the assignment the loop never sees a new token */
> }
>
> this loops through the string, pulling out each token until the string is exhausted or
> the max number of tokens has been found (do not want to overflow a buffer now do we?).
> At this point the array `tokens' contains the various tokens.
>
> idx = 0;
> while((idx < MAX_TOKENS) && (tokens[idx] != 0)
> {
> printf("%s ", tokens[idx]);
> idx++;
> }
>
> printf("\n");
>
> this loops through the array, printing each token in turn and stopping when the index
> reaches MAX_TOKENS or a token is NULL.
>
> So now, armed with this, to write an equivalent `split' function in C you need to create
> a function that takes a string (or a character array in C-speak) and another string as
> arguments and returns an array of pointers to strings. A first pass might be:
>
> char** split(char* string, char* delim)
> {
> char** tokens = malloc(sizeof(char*) * MAX_TOKENS);
> char* working = malloc((strlen(string) + 1) * sizeof(char));
> char* token;
> int idx;
>
> /* ensure that malloc worked.... */
> if((NULL != working) && (NULL != tokens))
> {
> strcpy(working, string); /* make a working copy of the string */
>
> for(idx = 0; idx < MAX_TOKENS; idx++)
> tokens[idx] = NULL;
>
> token = strtok(working, delim);
> idx = 0;
>
> while((idx < MAX_TOKENS) && (token != NULL)
> {
> tokens[idx] = malloc(sizeof(char) * (strlen(token) + 1)); /* +1 for the '\0' */
> strcpy(tokens[idx], token);
> idx++;
> token = strtok(NULL, delim);
> }
>
> free(working);
> }
>
> return tokens;
> }
>
> You would use this function like,
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> char** split(char*, char*);
>
> int main(int argc, char** argv)
> {
> char* text="this-is-a-testing";
> char* delim="-";
> char** tokens;
> int i = 0;
>
> tokens = split(text,delim);
>
> while(NULL != tokens[i])
> {
> printf("token %d is %s\n", i, tokens[i]);
> i++;
> }
>
> /* NOTE : missing clean up of tokens. This program leaks
> memory like a sieve */
>
> return 0;
> }
>
> The program can be compiled as (assuming everything is in a file called split.c):
>
> gcc -ansi -pedantic -Wall split.c -o my_split
>
> and produces the following output:
Error, error.. because you missed a couple ')' and a define...
Sorry.. Just couldn't resist the reply.. ;-) After reading all your
keyboard bashing ;-) Anywaysss.. May tha peace be with ya..
> George
J.