From: Hareesh Nagarajan <hareesh.nagarajan@gmail.com>
To: fabio@crearium.com
Cc: linux-c-programming@vger.kernel.org
Subject: Re: Pattern matching programming
Date: Thu, 19 May 2005 20:55:25 -0500
Message-ID: <7728232c050519185517fc01f9@mail.gmail.com>
In-Reply-To: <200505182009.29868.lain@neotes.org>

On 5/18/05, Fabrizio Sestito <lain@neotes.org> wrote:
> On Wednesday 18 May 2005 17:36, fabio@crearium.com wrote:
> > Hello,
> >
> > I am trying to code a small C program that basically takes a long text
> > file with data that comes from a mysql server.

If you know the exact syntax of the incoming text, you could hand
write a parser. Essentially, you need to know all the states you can
be in.

For example, you cannot encounter a </p> before you have seen a <p>.

HTH,

Hareesh

PS: But you should use an existing library, as Fabrizio suggests,
instead of reinventing the wheel.

> >
> > But I realize it is better to use regular expressions. This is an example
> > of the text:
> >
> > =1 <p> blah </p> <div foo>{$foobar}</div>blah.... <p>linux rulez</p>
> > misc characters.... =2 blah blah <p> linux rulez again</p>....
> > <p>foo</p?blah
> >
> >
> > And so on.
> >
> > The patterns are:
> >
> > Each record is marked by an equals sign. E.g., record 1 is "=1", record 2 is
> > "=2" and so on.
> >
> > The desired text is the one containing "linux rulez"; it is in the FIRST
> > <p> </p> AFTER a record marker.
> >
> > So I see that programming this makes no sense, because it is better to use
> > sed and awk.
> >
> > The result I want to have is something like:
> >
> > 1 linux rulez
> > 2 linux rulez again
> > 3 linux rulez so far
> > ...etc
> >
> > The idea is to eliminate all <div> tags, then get the numbers (maybe with awk
> > -F"="), and then get the next <p> tag, remove the tags themselves and
> > numbers and then the text, and do the same procedure for all the 65230
> > records.
> >
> > Thanks a lot for any comments, and sorry for the off-topic post.
> >
> > Kind regards,
> >
> > fabio
> >
> Why don't you use an XML parser library?
> 
> Fabrizio

