All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hans Reiser <reiser@namesys.com>
To: Oleg Drokin <green@namesys.com>
Cc: Philipp G?hring <p.guehring@futureware.at>, reiserfs-list@namesys.com
Subject: Re: Performance question
Date: Mon, 06 May 2002 15:06:51 +0400	[thread overview]
Message-ID: <3CD663CB.20703@namesys.com> (raw)
In-Reply-To: 20020505190739.A13452@namesys.com

glob is implemented by the shell not the filesystem.  This is not for 
good reason, it just is.  We could write something for you to do it in 
the filesystem and it would be faster.  Is your need for speed critical 
enough to justify writing something special for it?

Hans


Oleg Drokin wrote:

>Hello!
>
>On Sun, May 05, 2002 at 04:20:13PM +0200, Philipp G?hring wrote:
>
>  
>
>>Let's say I have a directory with 100.000 files in it.
>>The filenames look like
>>name1_name2_name3_id
>>So I have
>>001_41052_50125_1
>>001_63216_1212_1
>>I have to create a search engine, that serves for example the 4th Block of 10 
>>files that match the query "001_*_1212_1". The how query would result to 100 
>>files, that are spread across the directory.
>>Now my question:
>>Is it faster with ReiserFS to do a bsd_glob("001_*_1212_1") first, which 
>>should result to about 100 entries, and then take the entries 40 to 49 from 
>>the resulting array? 
>>(Is ReiserFS able to directly return 100 files out of 100000 with the 
>>globbing function, or is it an iteration over all files in the directory?)
>>    
>>
>
>*glob functions are implemented by various library functions, that do full
>readdir scans at least once, I believe.
>
>  
>
>>Or should I do 2 opendir-readdir loops, one to read over the first 39 
>>results, that I do not need, and the second one to geht the results 40 to 49?
>>    
>>
>
>In fact I do not see why do you need to do 2 opendir-readdir loops.
>One loop should be enough.
>You just compare each filename returned against your query and and if it matched
>remember it in separate list. So at the end of readdir loop you have a list of
>all names in a directory that match your query. And you can apply any additional
>check in place just not to remember unnecesary files.
>
>  
>
>>The problem here is that I have to readdir about 50000 files (40000 to get 
>>through the unneeded results, and 10000 to get the 10 results i need)
>>But on the other hand, I do not have to remember 100 files, from which I only 
>>need 10.
>>    
>>
>
>I am completely missing the idea on where these numbers are from. Can you
>explain in more details.
>
>  
>
>>If ReiserFS has to iterate over 100000 files (the whole directory) to do a 
>>"001_*_1212_1" glob, because the binary tree only speeds up known files, but 
>>not patterns, then opendir-readdir should be faster, I guess.
>>    
>>
>
>Binary tree is only helps when you know filename, I believe. You calculate
>a hash and out of that hash you can quickly find desired location.
>You you come up with a hash that places all filenames like your one near one,
>this will help, then.
>
>  
>
>>Another option would be to use subdirectories like
>>name1/name2/name3/id
>>So the glob would be "001/*/1212/1", which should be faster, anyway.
>>But on the other hand, I would have to do a lot more directory management, 
>>creating and deleting directories ...
>>And implementing an opendir-readdir search through "001/*/1212/1" will be 
>>more work too.
>>    
>>
>
>Readdir would require less iterations through 001/*, because number of
>entries will be only 100 as you described above.
>You get all these 100 entries and then loop 100 times trying to open
>001/${next_name}/1212/1 and deciding whenever you need this file or not.
>(If it exists of course, or you might get -ENOENT and proceed to next
>directory).
>Also deleting directories would be an overkill.
>I think this might be faster in many circumfstances.
>Also what you've descrived looks very like to what squid does. And squid people
>went to reiserfs-raw interface and are quite happy with it.
>
>
>Bye,
>    Oleg
>
>
>  
>




  parent reply	other threads:[~2002-05-06 11:06 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-05-05 14:20 Performance question Philipp Gühring
2002-05-05 15:07 ` Oleg Drokin
2002-05-05 16:43   ` Philipp G?hring
2002-05-06 13:01     ` Oleg Drokin
2002-05-06 11:06   ` Hans Reiser [this message]
  -- strict thread matches above, loose matches on Subject: below --
2003-03-31 21:37 performance question jp
2003-04-01  5:40 ` Trond Myklebust
2003-03-31 21:45 Lever, Charles
     [not found] <1049188686.19334.20.camel@deskpro02>
2003-04-01 15:39 ` jp
2003-04-01 16:06   ` Philippe Gramoullé
2003-04-01 16:22     ` Matt Heaton
2003-04-01 17:08       ` Philippe Gramoullé
2003-04-01 18:45   ` Bogdan Costescu
2005-09-12 19:06 Moritz Gartenmeister
2008-02-14 15:40 Performance question Font Bella
     [not found] ` <90d010000802140740y3ff2706ybc169728fbafbfb4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-14 16:27   ` Marcelo Leal
     [not found]     ` <42996ba90802140827p533779c6o8ab404400be51fdc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-14 16:56       ` Chuck Lever
2008-02-15 15:37         ` Font Bella
     [not found]           ` <90d010000802150737x2ad0739dmeaaa24dc2845e81a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-15 16:13             ` Trond Myklebust
     [not found]               ` <1203092030.11333.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-02-18  9:39                 ` Font Bella
     [not found]                   ` <90d010000802180139x49ac1f49x976f11cec0e01fdf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-18 16:59                     ` Chuck Lever
2008-02-15 16:18             ` Chuck Lever
2008-03-20 18:01 performance question david ahern
2009-01-17 17:18 Performance question Piergiorgio Sartor
2009-01-17 18:37 ` Bill Davidsen
2009-01-17 22:08 ` Keld Jørn Simonsen
2009-01-19 18:12   ` Piergiorgio Sartor
2009-01-21  0:15     ` Keld Jørn Simonsen
2009-01-21  1:05       ` Richard Scobie
2009-01-21 19:14       ` Piergiorgio Sartor
2009-01-21 20:15         ` Keld Jørn Simonsen
2009-01-21 20:26           ` Piergiorgio Sartor
2009-01-17 18:11 David Lethe
2009-01-17 18:20 ` Piergiorgio Sartor
2011-09-15 19:43 Performance Question --[ UxBoD ]--

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3CD663CB.20703@namesys.com \
    --to=reiser@namesys.com \
    --cc=green@namesys.com \
    --cc=p.guehring@futureware.at \
    --cc=reiserfs-list@namesys.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.