public inbox for linux-xfs@vger.kernel.org
* readdir() ordering guarantees on XFS
From: dizzy @ 2008-06-06 13:34 UTC
  To: xfs

Hello

POSIX leaves unspecified the order of getting the entries with readdir(). This 
is normal since different filesystems may implement their own techniques to 
organize entries in a directory (linear, hash, various search trees, etc).

But if I can make sure that several Linux machines have the same FS (i.e.
XFS), the same mount options and the same kernel, and that they hold the same
file hierarchy (that is, a file structure with the exact same directories
and files in terms of names, structure and attributes, except maybe "ctime",
which we can't really control in Linux), can I expect that traversing it with
readdir() will give me the entries in the exact same order? Or are there any
other conditions that I have to check for that would guarantee it? (For
example, with a linear directory approach one would have to create the
directory entries in the same order on all the machines to expect the same
order from readdir().)

Currently, we work around this issue by reading all the directory entries,
sorting them and traversing the entries in the sorted order. This, however,
has proven to make the (depth-first) traversal about 30% slower than doing it
in the order received from readdir(). (I even had the test code read the whole
directory contents and sort them in memory, but still traverse in the order
readdir() reported, and that is 30% faster than traversing in the sorted
order.)
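
For reference, the sort-then-traverse workaround described above could look
roughly like the following. This is only a minimal sketch in Python, not our
actual code; the function name walk_sorted is made up for illustration, and
it assumes that "sorted order" means byte-wise ordering of the entry names.

```python
import os

def walk_sorted(root):
    """Yield every path under root depth-first, visiting the entries of
    each directory in byte-wise sorted name order (hypothetical helper)."""
    with os.scandir(root) as it:
        # Read the whole directory first, then sort by the raw name bytes so
        # the order does not depend on the locale or on readdir() order.
        entries = sorted(it, key=lambda e: os.fsencode(e.name))
    for entry in entries:
        yield entry.path
        # Descend into real directories only; symlinks are not followed.
        if entry.is_dir(follow_symlinks=False):
            yield from walk_sorted(entry.path)
```

Two machines holding the same names should then produce the same sequence,
regardless of the order in which the entries were created.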

Any idea?

Thanks!

PS: please Cc: me as I'm not subscribed to the list, thank you

-- 
Mihai RUSU					Email: dizzy@roedu.net
			"Linux is obsolete" -- AST

* Re: readdir() ordering guarantees on XFS
From: Dave Chinner @ 2008-06-10  3:55 UTC
  To: dizzy; +Cc: xfs

On Fri, Jun 06, 2008 at 04:34:13PM +0300, dizzy wrote:
> Hello
> 
> POSIX leaves unspecified the order of getting the entries with readdir(). This 
> is normal since different filesystems may implement their own techniques to 
> organize entries in a directory (linear, hash, various search trees, etc).
> 
> But if I can make sure that several Linux machines have the same FS (i.e.
> XFS), the same mount options and the same kernel, and that they hold the same
> file hierarchy (that is, a file structure with the exact same directories
> and files in terms of names, structure and attributes, except maybe "ctime",
> which we can't really control in Linux), can I expect that traversing it with
> readdir() will give me the entries in the exact same order?

No. For speed I suggest sorting the inode stat() calls in ascending
inode number order before issuing them. Also, perhaps you should
look at:

http://oss.oracle.com/~mason/acp/

To see if you can use similar techniques to speed directory
traversal.
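
A minimal sketch of the inode-order idea, written in Python for brevity.
The function name stat_in_inode_order is hypothetical, and the sketch
assumes a Linux system where the inode number of each entry is already
available from the directory read itself:

```python
import os

def stat_in_inode_order(directory):
    """Return {name: stat_result} for the entries of directory, issuing
    the lstat() calls in ascending inode number order (hypothetical helper)."""
    with os.scandir(directory) as it:
        # DirEntry.inode() is filled in from the directory read on Linux,
        # so sorting on it does not need a stat() call per entry.
        entries = sorted(it, key=lambda e: e.inode())
    # The lstat() calls happen here, one per entry, in inode order.
    return {e.name: os.lstat(e.path) for e in entries}
```

Whether this helps depends on how the inodes are laid out on disk, so it
is worth measuring on the real data set.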

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: readdir() ordering guarantees on XFS
From: dizzy @ 2008-06-10  8:20 UTC
  To: xfs

On Tuesday 10 June 2008 06:55:47 Dave Chinner wrote:
> On Fri, Jun 06, 2008 at 04:34:13PM +0300, dizzy wrote:
> > Hello
> >
> > POSIX leaves unspecified the order of getting the entries with readdir().
> > This is normal since different filesystems may implement their own
> > techniques to organize entries in a directory (linear, hash, various
> > search trees, etc).
> >
> > But if I can make sure that several Linux machines have the same FS
> > (i.e. XFS), the same mount options and the same kernel, and that they
> > hold the same file hierarchy (that is, a file structure with the exact
> > same directories and files in terms of names, structure and attributes,
> > except maybe "ctime", which we can't really control in Linux), can I
> > expect that traversing it with readdir() will give me the entries in the
> > exact same order?
>
> No. For speed I suggest sorting the inode stat() calls in ascending
> inode number order before issuing them. 

But this does not solve the main requirement, which is that the files
traversed on the multiple Linux machines have to be sent in the same order
(not sure if I specified this in the original message, sorry if not). For now
I'm sorting them lexicographically, which is pretty slow. Sorting them by
inode number would not give the same order on every machine.
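
One possible way to reconcile the two requirements, which has not been
proposed or measured in this thread, would be to decouple the order of the
stat() calls from the order in which the entries are sent. The Python sketch
below is only an illustration under that assumption; the function name
list_dir_reproducibly is made up, and byte-wise name order stands in for
"lexicographic".

```python
import os

def list_dir_reproducibly(directory):
    """Return [(name, stat_result), ...] sorted by name, while issuing
    the lstat() calls in ascending inode order (hypothetical helper)."""
    with os.scandir(directory) as it:
        entries = list(it)
    stats = {}
    # Disk access order: ascending inode number, which is machine-specific.
    for entry in sorted(entries, key=lambda e: e.inode()):
        stats[entry.name] = os.lstat(entry.path)
    # Output order: byte-wise name order, identical wherever the names match.
    return [(name, stats[name]) for name in sorted(stats, key=os.fsencode)]
```

This only covers a single directory; reading file contents or recursing
would need the same separation between access order and send order, and
whether it recovers the lost 30% is an open question.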

> Also, perhaps you should 
> look at:
>
> http://oss.oracle.com/~mason/acp/
>
> To see if you can use similar techniques to speed directory
> traversal.

Funny that you mention acp. We have benchmarked reading directory structures
with plain "tar" and with "acp", and on XFS "tar" is faster (but not on
ext3). Here are some results (reading a Linux kernel tree after flushing the
cache by "tar"-ing a huge amount of data, double the memory size):
- xfs: acp: 1m32s, tar: 1m12s
- ext3: acp: 0m1.5s, tar: 0m2.8s

Although in this test ext3 seems to be much faster than XFS overall at
reading, it isn't so at writing, so we will stick with XFS as it's fast enough
for reading and fast for writing. Anyway, that is another topic.

We still have the ordering issue from the original message, though :)

-- 
Mihai RUSU					Email: dizzy@roedu.net
			"Linux is obsolete" -- AST
