From: Hans Reiser <reiser@namesys.com>
To: Steve Lord <lord@sgi.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>,
Alexander Viro <viro@math.psu.edu>,
"Peter J. Braam" <braam@clusterfs.com>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: BIG files & file systems
Date: Fri, 02 Aug 2002 19:10:30 +0400
Message-ID: <3D4AA0E6.9000904@namesys.com>
In-Reply-To: 1028297194.30192.25.camel@jen.americas.sgi.com
There are a number of interfaces that need expansion in 2.5. Telldir
and seekdir would be much better if they took as an argument some
filesystem-specific opaque cookie (e.g. the filename). Using a byte offset
to reference a directory entry that was found with a filename is an
implementation-specific artifact that obviously only works for a
ufs/s5fs/ext2 type of filesystem, and is just wrong.
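For reference, here is a minimal sketch of how the existing interface is used from userspace. It assumes a POSIX system and a directory that is not modified during the call. The helper name revisit_first_entry is made up for illustration. The point is that the only handle an application gets on a directory position is a long, which is natural for a filesystem that keeps entries at byte offsets and awkward for one that finds entries by name.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for telldir()/seekdir() declarations */
#endif
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper: remember the position of the first directory
 * entry with telldir(), read to the end, seek back with seekdir(),
 * and check that the same name comes back.
 * Returns 1 if it does, 0 if it does not, -1 on error.
 */
int revisit_first_entry(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    /* The "cookie" is a plain long, valid only for this DIR stream. */
    long cookie = telldir(dir);

    struct dirent *ent = readdir(dir);
    if (ent == NULL) {
        closedir(dir);
        return -1;
    }

    /* Copy the name: the dirent may be overwritten by later readdir() calls. */
    char name[256];
    strncpy(name, ent->d_name, sizeof name - 1);
    name[sizeof name - 1] = '\0';

    /* Walk the rest of the directory. */
    while (readdir(dir) != NULL)
        ;

    /* Go back to the remembered position and read again. */
    seekdir(dir, cookie);
    ent = readdir(dir);
    int same = (ent != NULL && strcmp(ent->d_name, name) == 0);

    closedir(dir);
    return same;
}
```

Calling revisit_first_entry(".") exercises the whole round trip. A filesystem that looks entries up by name has to invent some mapping from that long back to an entry, which is the problem described above.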
4 billion files is not enough to store the government's XML databases in.
Hans
Steve Lord wrote:
>On Fri, 2002-08-02 at 08:56, Jan Harkes wrote:
>
>
>>I was simply assuming that any filesystem that is using iget5 and
>>doesn't use the simpler iget helper has some reason why it cannot find
>>an inode given just the 32-bit ino_t.
>>
>>
>
>In XFS's case (remember, the iget5 code is based on XFS changes) it is
>more a matter of the code to read the inode sometimes needing to pass
>other info down to the read_inode part of the filesystem, so we want to
>do that internally. XFS can have 64 bit inode numbers, but you need more
>than 1 Tbyte in an fs to get that big (inode numbers are a disk
>address). We also have code which keeps them in the bottom 1 Tbyte
>which is turned on by default on Linux.
>
>
>
>>This is definitely true for Coda, we have 96-bit file identifiers.
>>Actually my development tree currently uses 128-bit, it is aware of
>>multiple administrative realms and distinguishes between objects with
>>FID 0x7f000001.0x1.0x1 in different administrative domains. There is a
>>hash-function that tries to map these large FIDs into the 32-bit ino_t
>>space with as few collisions as possible.
>>
>>NFS has a >32-bit filehandle. ReiserFS might have unique inodes, but
>>seems to need access to the directory to find them. So I don't quickly
>>see how it would guarantee uniqueness. NTFS actually doesn't seem to use
>>iget5 yet, but it has multiple streams per object which would probably
>>end up using the same ino_t.
>>
>>Userspace applications should either have an option to ignore hardlinks.
>>Very large filesystems either don't care because there is plenty of
>>space, don't support them across boundaries that are not visible to the
>>application, or could be dealing with them automatically (COW
>>links). Besides, if I really have a trillion files, I don't want 'tar
>>and friends' to try to keep track of all those inode numbers (and device
>>numbers) in memory.
>>
>>The other solution is that applications can actually use more of the
>>information from the inode to avoid confusion, like st_nlink and
>>st_mtime, which are useful when the filesystem is still mounted rw as
>>well. And to make it even better, st_uid, st_gid, st_size, st_blocks and
>>st_ctime, and a MD5/SHA checksum. Although this obviously would become
>>even worse for the trillion file backup case.
>>
>>
>
>If apps would have to change then I would vote for allowing larger
>inodes out of the kernel in an extended version of stat and getdents.
>I was going to say 64 bit versions, but if even 64 is not enough for
>you, it is getting a little hard to handle.
>
>Steve
>
>
>
>>Jan
>>
>>
--
Hans