From: Hareesh Nagarajan <hareesh.nagarajan@gmail.com>
To: fabio@crearium.com
Cc: linux-c-programming@vger.kernel.org
Subject: Re: Pattern matching programming
Date: Thu, 19 May 2005 20:55:25 -0500
Message-ID: <7728232c050519185517fc01f9@mail.gmail.com>
In-Reply-To: <200505182009.29868.lain@neotes.org>
On 5/18/05, Fabrizio Sestito <lain@neotes.org> wrote:
> On Wednesday 18 May 2005 17:36, fabio@crearium.com wrote:
> > Hello,
> >
> > I am trying to code a small C program that basically takes a long text
> > file with data that comes from a mysql server.
If you know the exact syntax of the incoming text, you could hand
write a parser. Essentially, you need to know all the states you can
be in.
E.g., you cannot encounter a </p> before you have seen a <p>, and so on.
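A minimal sketch of such a hand-written parser, assuming the rule stated in the original post is taken literally: a record starts at '=' followed by digits, and the wanted text is the first <p>...</p> after it. The names (extract_records, put, the state names) are hypothetical, and so are the assumptions that tags are written exactly as "<p>" and "</p>" and that "=<digit>" never appears inside other tags. Note that on the sample line as posted, record 1 would yield "blah", since that is the first <p> after "=1".

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parser states (hypothetical names). */
enum state {
    SEEK_RECORD,  /* looking for '=' followed by a digit */
    SEEK_P,       /* record number copied, looking for the first <p> */
    IN_P          /* inside that <p>, copying text until </p> */
};

/* Append one character to out, always leaving room for the terminator. */
static void put(char *out, size_t outsz, size_t *len, char c)
{
    if (*len + 1 < outsz) {
        out[(*len)++] = c;
        out[*len] = '\0';
    }
}

/*
 * Scan `in` and append one "N text\n" line to `out` per record.
 * Returns the number of records emitted. Output that does not fit
 * in `outsz` bytes is silently truncated.
 */
static int extract_records(const char *in, char *out, size_t outsz)
{
    enum state st = SEEK_RECORD;
    size_t len = 0;         /* bytes used in out */
    size_t line_start = 0;  /* where the current output line begins */
    size_t text_start = 0;  /* where the <p> text begins in out */
    int count = 0;
    const char *p = in;

    assert(in != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    while (*p != '\0') {
        int new_record = (p[0] == '=' && isdigit((unsigned char)p[1]));

        if (st != IN_P && new_record) {
            /* Record marker: drop any unfinished line, copy the number. */
            len = line_start;
            for (p++; isdigit((unsigned char)*p); p++)
                put(out, outsz, &len, *p);
            put(out, outsz, &len, ' ');
            text_start = len;
            st = SEEK_P;
        } else if (st == SEEK_P && strncmp(p, "<p>", 3) == 0) {
            /* First <p> of this record: skip the tag and leading blanks. */
            p += 3;
            while (isspace((unsigned char)*p))
                p++;
            st = IN_P;
        } else if (st == IN_P && strncmp(p, "</p>", 4) == 0) {
            /* End of the paragraph: trim trailing blanks, finish the line. */
            while (len > text_start && isspace((unsigned char)out[len - 1]))
                len--;
            put(out, outsz, &len, '\n');
            line_start = len;
            count++;
            p += 4;
            st = SEEK_RECORD;
        } else {
            /* Copy text only while inside the wanted <p>; skip the rest. */
            if (st == IN_P)
                put(out, outsz, &len, *p);
            p++;
        }
    }
    out[line_start] = '\0';  /* discard a line left unfinished at EOF */
    return count;
}
```

Calling extract_records() on a buffer holding the whole file and then printing `out` should give one "N text" line per record. Anything outside the wanted <p> (including <div> blocks and later paragraphs) is skipped simply because no state copies it.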
HTH,
Hareesh
PS: But you should use an existing library which Fabrizio mentions,
instead of reinventing the wheel.
> >
> > But I realize it is better to use regular expressions. This is an example
> > of the text:
> >
> > =1 <p> blah </p> <div foo>{$foobar}</div>blah.... <p>linux rulez</p>
> > misc characters.... =2 blah blah <p> linux rulez again</p>....
> > <p>foo</p?blah
> >
> >
> > And so on.
> >
> > The patterns are:
> >
> > Each record is marked by an equals sign. E.g., record 1 is "=1", record 2 is
> > "=2" and so on.
> >
> > The desired text is the one containing "linux rulez": it is the FIRST <p>
> > </p> AFTER a record.
> >
> > So I see that programming this makes no sense, because it is better to use
> > sed and awk.
> >
> > The result I want to have is something like:
> >
> > 1 linux rulez
> > 2 linux rulez again
> > 3 linux rulez so far
> > ...etc
> >
> > The idea is to eliminate all <div> tags, then get the numbers (maybe with awk
> > -F"="), then get the next <p> tag, remove the tags themselves and the
> > numbers, keep the text, and do the same procedure for all the 65230
> > records.
> >
> > Thanks a lot for any comment, sorry for the 'offtopic'
> >
> > Kind regards,
> >
> > fabio
> >
> Why don't you use an XML parser library?
>
> Fabrizio
Thread overview: 4+ messages
2005-05-18 17:36 Pattern matching programming fabio
2005-05-18 20:09 ` Fabrizio Sestito
2005-05-20 1:55 ` Hareesh Nagarajan [this message]
2005-05-20 20:36 ` Glynn Clements