Re: Performance question


From: Oleg Drokin <green@namesys.com>
To: Philipp Gühring <p.guehring@futureware.at>
Cc: reiserfs-list@namesys.com
Subject: Re: Performance question
Date: Mon, 6 May 2002 17:01:03 +0400
Message-ID: <20020506170103.A954@namesys.com>
In-Reply-To: <200205051644.g45GijA03908@linux1.futureware.at>

Hello!

On Sun, May 05, 2002 at 06:43:45PM +0200, Philipp Gühring wrote:

> > *glob functions are implemented by various library functions that do full
> > readdir scans at least once, I believe.
> I thought I had heard of a syscall that makes it possible to pass the glob to
> the filesystem, so that the filesystem can optimize globbing as it likes
> and pass the result back to the application, but OK.

I do not think anything like that exists in Linux. But if you
can come up with a man page from section 2...
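For comparison, what does exist: the glob functions are C library calls such as POSIX glob(3), documented in man section 3 rather than section 2, and they scan the directory in user space. A minimal sketch, where count_glob_matches() is an illustrative name and the pattern is only an example:

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stddef.h>

/* Count the paths matching a shell-style pattern via glob(3).
 * glob(3) is a C library routine: it opens the directory and compares each
 * name against the pattern in user space, so a pattern like
 * "dir/001_*_1212_1" still costs a full readdir pass over "dir". */
size_t count_glob_matches(const char *pattern)
{
	glob_t g;
	size_t n = 0;

	int rc = glob(pattern, 0, NULL, &g);
	if (rc == 0) {
		n = g.gl_pathc;	/* number of matched paths */
		globfree(&g);
	}
	/* GLOB_NOMATCH (nothing matched) and errors both report 0 here. */
	return n;
}
```

Called as count_glob_matches("001_*_1212_1") from inside the data directory, it would return the number of names matching the query from this thread, but the filesystem still sees nothing more than an ordinary directory read.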

> > > Or should I do 2 opendir-readdir loops, one to read over the first 39
> > > results that I do not need, and the second one to get results 40 to
> > > 49?
> > In fact I do not see why you need to do 2 opendir-readdir loops.
> > One loop should be enough.
> Yeah. Sure. My mistake. One opendir, and 2 readdir loops. The first one skips 
> over unneeded results and the second one serves the data.

No, I still think you need only one loop, like this:
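<pseudocode>
DIR *dir = opendir(name);
struct dirent *result;
/* the only readdir loop: remember the names that match */
while ((result = readdir(dir)) != NULL) {
	if (check_filename_criteria(result->d_name)) {
		/* copy the name: d_name is overwritten by the next readdir() */
		add_to_list_of_files_to_process(result->d_name);
	}
}
closedir(dir);
/* this loop only serves data; it does not read the directory again */
for i in list_of_files_to_process {
	process_file(i);
}
</pseudocode>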

So there is only one readdir loop; the second loop does not count, because it
only serves the actual data.
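The single-pass approach above can be written out in C. This is a minimal sketch, assuming a simple prefix test as a stand-in for the hypothetical check_filename_criteria(); collect_matches() and matches() are illustrative names. The directory is read exactly once and copies of the matching names are kept for the serving step:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for check_filename_criteria(): accept names that
 * start with `prefix`. */
static int matches(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* One readdir pass over `path`: copy every matching name into a growing
 * array. Returns the number of names stored in *out, or -1 on error.
 * The caller frees each name and then the array. */
long collect_matches(const char *path, const char *prefix, char ***out)
{
	DIR *dir = opendir(path);
	if (dir == NULL)
		return -1;

	char **names = NULL;
	size_t n = 0, cap = 0;
	struct dirent *entry;

	while ((entry = readdir(dir)) != NULL) {
		if (!matches(entry->d_name, prefix))
			continue;
		if (n == cap) {
			cap = cap ? cap * 2 : 16;
			char **tmp = realloc(names, cap * sizeof *tmp);
			if (tmp == NULL)
				goto fail;
			names = tmp;
		}
		/* d_name is only valid until the next readdir(), so copy it. */
		names[n] = strdup(entry->d_name);
		if (names[n] == NULL)
			goto fail;
		n++;
	}
	closedir(dir);
	*out = names;
	return (long)n;

fail:
	for (size_t i = 0; i < n; i++)
		free(names[i]);
	free(names);
	closedir(dir);
	return -1;
}
```

The second loop from the pseudocode then simply walks names[0] .. names[n-1]; it never touches the directory again.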

> > > The problem here is that I have to readdir about 50000 files (40000 to
> > > get through the unneeded results, and 10000 to get the 10 results I need).
> > > But on the other hand, I do not have to remember 100 files, of which I
> > > only need 10.
> > I am completely missing where these numbers come from. Can you
> > explain in more detail?
> I will try so.
> I have a table with 100000 files. A complete search would return, for
> example, 100 files, which are spread across the whole directory.
> About one file in every thousand matches the query.
> Since the client does not want to get 100 files at once, I first return
> only 10 results for the first page, and the user can navigate page by page.
> So I built up the scenario where the user now wants to see results 40-49
> from the query "001_*_1212_1",
> which I consider normal behaviour for my application.
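The figures in this scenario can be checked with a small model. It assumes, as a simplification, that exactly every 1000th entry of a 100000-entry directory matches, that matches are numbered from 0 (so "results 40-49" means skipping 40 matches), and that the scan stops as soon as the page is full; entries_scanned() is a hypothetical helper, not part of any library:

```c
#include <stddef.h>

/* Model of the scenario: a directory of `total` entries in which every
 * `stride`-th entry matches the query. Returns how many entries one
 * readdir-style pass must examine to collect matches first .. first+count-1
 * (0-based), stopping as soon as the page is complete. */
size_t entries_scanned(size_t total, size_t stride, size_t first, size_t count)
{
	size_t matched = 0;

	for (size_t i = 0; i < total; i++) {
		/* entries 999, 1999, 2999, ... are the matches */
		if ((i + 1) % stride == 0) {
			matched++;
			if (matched == first + count)
				return i + 1;	/* page complete: stop early */
		}
	}
	return total;	/* fewer matches than requested: full scan */
}
```

Under those assumptions, entries_scanned(100000, 1000, 40, 10) returns 50000, which agrees with the estimate of about 50000 readdir calls, and the first page (first = 0) costs 10000. Every later page costs another 10000 entries, which is why caching the results between requests can be attractive.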

Ah, I see what you mean. If you have a lot of resources, you can set up a session
and store all the search results for that session on the server side.
So when the second request comes in, you just read the search results from the
session. You also kill the session after 5 minutes of inactivity or so.
Hm... This requires cookies to be enabled, though. ;)
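A minimal sketch of the session idea, assuming the results of one search are cached in memory per session and a 300-second inactivity timeout; the struct and function names are hypothetical, and the transport side (cookies, session IDs) is left out:

```c
#include <stddef.h>
#include <time.h>

/* Inactivity limit in seconds ("kill the session after ~5 minutes"). */
#define SESSION_TIMEOUT 300.0

/* Hypothetical per-user session holding the results of one finished search. */
struct search_session {
	time_t last_access;	/* time of the last page request */
	size_t nresults;	/* number of cached matches */
	const char **results;	/* names collected by the single directory scan */
};

/* Nonzero while the session has been idle for at most SESSION_TIMEOUT. */
int session_alive(const struct search_session *s, time_t now)
{
	return difftime(now, s->last_access) <= SESSION_TIMEOUT;
}

/* Serve results first .. first+count-1 from the cache without re-reading the
 * directory. Copies at most `count` pointers into `page` and returns how many
 * were copied, or 0 if the session has expired (the caller must search again). */
size_t session_page(struct search_session *s, time_t now,
		    size_t first, size_t count, const char **page)
{
	if (!session_alive(s, now))
		return 0;
	s->last_access = now;	/* any request resets the inactivity timer */

	size_t n = 0;
	for (size_t i = first; i < s->nresults && n < count; i++)
		page[n++] = s->results[i];
	return n;
}
```

Only the first request pays for the directory scan; later pages are plain array lookups until the session times out.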

> > Readdir would require fewer iterations through 001/*, because the number of
> > entries will be only 100, as you described above.
> > You get all these 100 entries and then loop 100 times, trying to open
> > 001/${next_name}/1212/1 and deciding whether you need this file or not.
> > (If it exists, of course; otherwise you might get -ENOENT and proceed to
> > the next directory.)
> > Also, deleting directories would be overkill.
> So the question is, how big that overkill is.

I mean that you do not need to delete directories when they become empty.
You only need to create the directory structure once.
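A sketch of the directory layout discussed above, assuming paths of the form top/<name>/<a>/<b> as in 001/${next_name}/1212/1: list only the top directory (about 100 entries), probe each candidate with open(), and move on when the file is not there. Directories are created once with mkdir() and left in place even when empty. probe_layout() and ensure_dir() are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a directory and never remove it; EEXIST only means an earlier run
 * already created it (this sketch does not check that it is a directory). */
int ensure_dir(const char *path)
{
	if (mkdir(path, 0755) == 0 || errno == EEXIST)
		return 0;
	return -1;
}

/* Read only `top`, then probe top/<entry>/<a>/<b> for every entry.
 * Returns how many candidate files exist, or -1 if `top` can't be opened. */
long probe_layout(const char *top, const char *a, const char *b)
{
	DIR *dir = opendir(top);
	if (dir == NULL)
		return -1;

	long found = 0;
	struct dirent *entry;
	char path[4096];

	while ((entry = readdir(dir)) != NULL) {
		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
			continue;
		if (snprintf(path, sizeof path, "%s/%s/%s/%s",
			     top, entry->d_name, a, b) >= (int)sizeof path)
			continue;	/* path too long: skip it */
		int fd = open(path, O_RDONLY);
		if (fd < 0)
			continue;	/* usually ENOENT: try the next directory */
		found++;	/* a real program would read the file here */
		close(fd);
	}
	closedir(dir);
	return found;
}
```

The cost is one small readdir plus one failed or successful path lookup per subdirectory, instead of a pass over every file.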

> Is there perhaps a benchmark that tested it already?

No, I do not think so, but feel free to compose and run your own benchmark.
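If you do want numbers, a timing harness can be quite small. This sketch assumes a POSIX system with clock_gettime(CLOCK_MONOTONIC) and measures only a plain readdir pass over one directory; time_readdir_scan() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <time.h>

/* Time one full readdir pass over `path` with a monotonic clock.
 * Returns elapsed seconds and stores the number of entries in *entries,
 * or returns -1.0 if the directory cannot be opened. */
double time_readdir_scan(const char *path, long *entries)
{
	struct timespec start, end;
	struct dirent *entry;
	long n = 0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	DIR *dir = opendir(path);
	if (dir == NULL)
		return -1.0;
	while ((entry = readdir(dir)) != NULL)
		n++;	/* count every entry, including "." and ".." */
	closedir(dir);
	clock_gettime(CLOCK_MONOTONIC, &end);

	*entries = n;
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running it against both layouts (one flat directory vs. the 001/<name>/1212/1 tree) gives a rough comparison. Repeated runs are usually faster because the directory data is then cached, so cold and warm runs are best compared separately.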

> > I think this might be faster in many circumstances.
> > Also, what you've described looks very much like what squid does. And the squid
> > people moved to the reiserfs-raw interface and are quite happy with it.
> I think the difference from squid is that they only need one result, not part
> of a search with more than one result.

Hm. This is true.

Bye,
    Oleg


Thread overview:
2002-05-05 14:20 Performance question Philipp Gühring
2002-05-05 15:07 ` Oleg Drokin
2002-05-05 16:43   ` Philipp Gühring
2002-05-06 13:01     ` Oleg Drokin [this message]
2002-05-06 11:06   ` Hans Reiser
