From: "Philipp Gühring" <p.guehring@futureware.at>
To: Oleg Drokin <green@namesys.com>, reiserfs-list@namesys.com
Subject: Re: Performance question
Date: Sun, 5 May 2002 18:43:45 +0200
Message-ID: <200205051644.g45GijA03908@linux1.futureware.at>
In-Reply-To: <20020505190739.A13452@namesys.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello!
Thank you Oleg for your answers.
> *glob functions are implemented by various library functions, that do full
> readdir scans at least once, I believe.
I thought I had heard of a syscall that lets an application pass the glob to
the filesystem, so that the filesystem can optimize the globbing as it likes
and pass the result back to the application. But OK.
> > Or should I do 2 opendir-readdir loops, one to read over the first 39
> > results, that I do not need, and the second one to get the results 40 to
> > 49?
>
> In fact I do not see why you need to do 2 opendir-readdir loops.
> One loop should be enough.
Yeah, sure, my mistake: one opendir and two readdir loops. The first one skips
over the unneeded results and the second one serves the data.
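A minimal sketch of that single scan, written in Python for brevity, where
os.scandir stands in for opendir/readdir. The function name page_of_matches
and its parameters are hypothetical and not from the thread. The sketch assumes
that the directory does not change between page requests, because readdir order
is filesystem-dependent and not sorted.

```python
import fnmatch
import os
from itertools import islice


def page_of_matches(directory, pattern, offset, limit):
    """Return matches number offset .. offset+limit-1 from one directory scan.

    Hypothetical helper: the skip phase and the serve phase share a single
    iterator, so the directory is opened only once.
    """
    with os.scandir(directory) as entries:
        # Lazily filter names against the glob; nothing is stored yet.
        matches = (
            entry.name
            for entry in entries
            if fnmatch.fnmatchcase(entry.name, pattern)
        )
        # islice drops the first `offset` matches (the skip loop), keeps the
        # next `limit` (the serve loop), and then stops reading entries.
        return list(islice(matches, offset, offset + limit))
```

Calling page_of_matches(path, "001_*_1212_1", 40, 10) would correspond to
results 40-49. Only the 10 names of the requested page are kept in memory, and
the scan stops as soon as the page is full.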
> You just compare each filename returned against your query and if it
> matched remember it in separate list. So at the end of readdir loop you
> have a list of all names in a directory that match your query. And you can
> apply any additional check in place just not to remember unnecessary files.
>
> > The problem here is that I have to readdir about 50000 files (40000 to
> > get through the unneeded results, and 10000 to get the 10 results I need).
> > But on the other hand, I do not have to remember 100 files, from which I
> > only need 10.
>
> I am completely missing the idea on where these numbers are from. Can you
> explain in more detail.
I will try.
I have a directory with 100000 files. A complete search would return, for
example, 100 files, which are spread across the whole directory:
about one file in every thousand matches the query.
Since the client does not want to get 100 files at once, I return only
10 results for the first page, and the user can navigate page by page.
So I set up a scenario in which the user now wants to see results 40-49
from the query "001_*_1212_1",
which I assume is normal behaviour for my application.
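The 40000 and 10000 figures quoted above follow from these numbers. A small
sketch of the arithmetic, assuming the matches are spread perfectly evenly
(the function name expected_entries_scanned is hypothetical):

```python
def expected_entries_scanned(total_files, total_matches, offset, limit):
    """Estimate the readdir work for one result page.

    Assumption for illustration: exactly one match per
    total_files / total_matches directory entries.
    """
    files_per_match = total_files // total_matches
    skipped = offset * files_per_match  # entries read to pass the unneeded matches
    served = limit * files_per_match    # entries read to collect the wanted page
    return skipped, served, skipped + served


# 100000 files, 100 matches, page starting at result 40, 10 results per page
print(expected_entries_scanned(100000, 100, 40, 10))  # (40000, 10000, 50000)
```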
> A binary tree only helps when you know the filename, I believe.
Ok.
> Readdir would require less iterations through 001/*, because number of
> entries will be only 100 as you described above.
> You get all these 100 entries and then loop 100 times trying to open
> 001/${next_name}/1212/1 and deciding whenever you need this file or not.
> (If it exists of course, or you might get -ENOENT and proceed to next
> directory).
> Also deleting directories would be an overkill.
So the question is how big that overkill is.
Is there perhaps a benchmark that has already tested it?
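For reference, the nested layout described in the quote above could be
sketched as follows. This is only an illustration under assumptions: the
function name find_nested and the mapping of the query "001_*_1212_1" to the
path 001/<name>/1212/1 are hypothetical, and Python's os.scandir stands in for
readdir.

```python
import os


def find_nested(root, first, third, fourth):
    """Collect existing files of the form root/first/<name>/third/fourth.

    Hypothetical helper: only the entries of root/first are read (about 100
    in the scenario above), and each candidate is probed with open().
    """
    base = os.path.join(root, first)
    hits = []
    with os.scandir(base) as entries:
        for entry in entries:
            candidate = os.path.join(base, entry.name, third, fourth)
            try:
                # Probe the candidate; a missing path raises ENOENT.
                with open(candidate, "rb"):
                    hits.append(candidate)
            except (FileNotFoundError, NotADirectoryError):
                # No such file below this entry: proceed to the next one.
                continue
    return hits
```

With this layout the wildcard component is the only level that has to be
scanned. The price is one open attempt per entry, plus the cost of creating
and deleting the intermediate directories.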
> I think this might be faster in many circumstances.
> Also what you've described looks very much like what squid does. And squid
> people went to reiserfs-raw interface and are quite happy with it.
I think the difference from squid is that squid only needs one result per
lookup, not a page out of a search with more than one result.
But I am thinking about using reiserfs-raw too ...
(At the moment, flexibility still has higher priority for me than raw
performance.)
Many greetings,
- --
~ Philipp Gühring p.guehring@futureware.at
~ http://www.livingxml.net/ ICQ UIN: 6588261
~ <xsl:value-of select="file:/home/philipp/.sig"/>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org
iD8DBQE81WFGlqQ+F+0wB3oRAtYSAJsGgaHnsohasbrjnJEQWAhi4tatSwCfQXDB
dGlKoxKq0vcB0jHMOV6AEWQ=
=heIa
-----END PGP SIGNATURE-----