* File handling
From: Marius Nita @ 2002-04-29 22:46 UTC (permalink / raw)
To: linux-c-programming
I have a large text file (45 MB) which needs to be searched through in a decent
amount of time (4-5 seconds). So far I have some Perl and shell scripts that
search it, but either they are way too slow, or, if I try to tweak them to make
them faster, I usually run into low-memory problems.
So I was thinking that if I write the program in C I would get better results.
Do you have any suggestions about a way to approach this, pointers about
libraries, etc?
Thanks.
marius
PS: The search is very simple. It just tries to match the exact keywords, case
insensitive, so that's not an issue. It's file reading performance that I'm
worried about.
* Re: File handling
From: Glynn Clements @ 2002-04-29 22:57 UTC (permalink / raw)
To: Marius Nita; +Cc: linux-c-programming
Marius Nita wrote:
> I have a large text file (45 MB) which needs to be searched through in a decent
> amount of time (4-5 seconds). So far I have some Perl and shell scripts that
> search it, but either they are way too slow, or, if I try to tweak them to make
> them faster, I usually run into low-memory problems.
>
> So I was thinking that if I write the program in C I would get better results.
> Do you have any suggestions about a way to approach this, pointers about
> libraries, etc?
>
> PS: The search is very simple. It just tries to match the exact keywords, case
> insensitive, so that's not an issue. It's file reading performance that I'm
> worried about.
The issue isn't really the search; it's what you want to do once
you've found a match. Performing a regexp search should only impose a
minor overhead.
If you just need to extract text near the match point, sed should be
adequate, and is likely to be faster than perl.
If you need to perform more involved operations on the data around the
match, then it might be worth using C, in which case the fastest way
to read the file is to use mmap() to map it into memory.
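
As a rough illustration of the mmap() approach, here is a minimal sketch,
not a tuned implementation. The helper names count_matches() and
count_in_file() are made up for this example, the scan is a naive
byte-by-byte comparison, and it assumes plain ASCII keywords. It maps the
whole file read-only and counts case-insensitive occurrences of a keyword
(overlapping ones included).

```c
#include <ctype.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count case-insensitive occurrences of `key` in buf[0..len).
 * Naive scan, ASCII only; overlapping matches are counted. */
static long count_matches(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    long count = 0;

    if (klen == 0 || klen > len)
        return 0;
    for (size_t i = 0; i + klen <= len; i++) {
        size_t j = 0;
        while (j < klen &&
               tolower((unsigned char)buf[i + j]) ==
               tolower((unsigned char)key[j]))
            j++;
        if (j == klen)
            count++;
    }
    return count;
}

/* Map `path` read-only and count matches of `key`.
 * Returns the count, or -1 if the file cannot be opened or mapped. */
static long count_in_file(const char *path, const char *key)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (buf == MAP_FAILED)
        return -1;

    long n = count_matches(buf, len, key);
    munmap(buf, len);
    return n;
}
```

Note that the mapped data is not NUL-terminated, so the scan works from the
length reported by fstat() rather than relying on string functions such as
strstr().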
The standard interface for performing regexp searches in C is the set
of functions documented in the regcomp(3) manpage: regcomp(),
regexec(), regerror() and regfree().
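
For the regexp route, a small sketch of the POSIX calls might look like
the following. The function name count_matching_lines() and the choice of
flags are assumptions made for this example: REG_ICASE gives the
case-insensitive matching asked for, and REG_NOSUB skips recording match
offsets when only a yes/no answer is needed. The pattern is compiled once
and reused for every line.

```c
#include <regex.h>
#include <stddef.h>

/* Count how many of the NUL-terminated strings in `lines` match
 * `pattern` (POSIX extended syntax, case-insensitive).
 * Returns the count, or -1 if the pattern does not compile. */
static int count_matching_lines(const char *pattern,
                                const char *const lines[], size_t nlines)
{
    regex_t re;
    int count = 0;

    /* Compile once; REG_NOSUB because only match/no-match is needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    for (size_t i = 0; i < nlines; i++) {
        /* regexec() returns 0 on a match, REG_NOMATCH otherwise. */
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;
    }

    regfree(&re);
    return count;
}
```

regexec() expects NUL-terminated strings, so when the data comes from an
mmap()ed file each line would first have to be copied into a terminated
buffer (or otherwise terminated) before being passed in.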
--
Glynn Clements <glynn.clements@virgin.net>