From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Dilger
Subject: Re: Reiserfs with Samba vs NetApp Filer
Date: Sun, 13 Oct 2002 09:46:45 -0600
Message-ID: <20021013154645.GM3045@clusterfs.com>
References: <200210121052.22603.bofh@coker.com.au>
 <20021012150028.G14731@vestdata.no>
 <200210121600.39712.bofh@coker.com.au>
 <3DA89A37.2070801@namesys.com>
 <20021012222950.GK3045@clusterfs.com>
 <3DA8CAF5.7050203@namesys.com>
 <20021013063809.GL3045@clusterfs.com>
 <3DA97993.4090403@namesys.com>
Mime-Version: 1.0
Return-path:
list-help:
list-unsubscribe:
list-post:
Errors-To: flx@namesys.com
Content-Disposition: inline
In-Reply-To: <3DA97993.4090403@namesys.com>
List-Id:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Hans Reiser
Cc: Russell Coker, Ragnar ?, reiserfs-list@namesys.com

On Oct 13, 2002 17:48 +0400, Hans Reiser wrote:
> Andreas Dilger wrote:
> >While it is _possible_ that you have object based storage on a drive,
> >the reality is that the object storage targets (OSTs henceforth) are
> >in practise really large storage systems, like an IBM Shark, or a Linux
> >box with a few TB of disk and RAID and LVM, and most importantly have a
> >regular filesystem like ext3 or reiserfs on top of all that storage to
> >do all of the real storage management.
> >
> >The benefit of an object based network protocol like Lustre is that the
> >client is free from all of the details of file and block allocation, and
> >the OST filesystem can do all of this.  Since each OST has an independent
> >filesystem, it can handle all of the locking/threading for block and
> >inode allocation locally.  It can also do this in any way it sees fit,
> >so it actually allows for MORE innovation at the OST filesystem level
> >than other distributed filesystems.
> >
> >The Lustre network protocol could be considered akin to a network
> >version of the Linux VFS - the Lustre client (like a Linux process)
> >is doing I/O on a file, but the Lustre OST (like a Linux filesystem)
> >is free to implement the details of storing data within that file as
> >it sees fit.  Similarly, the metadata server (MDS) is free to store
> >filenames, EA data, etc. however it wants.
> >
> >Lustre, like the VFS, needs locking to ensure multiple processes do
> >not do conflicting things.  The Lustre locking code actually is only
> >doing per-node locking, and trusts the Linux VFS to do the right thing
> >internally, so we leverage as much of Al Viro's work in this complex
> >area as we possibly can ;-).
>
> Ok, this makes some sense.  How does Seagate feel about that view, I am
> curious?

It is irrelevant how they feel, since they stopped funding our project
in 2000 after we had barely completed the initial prototype.

Even back when we were looking at "smart" individual disks that
presented an OBD interface over Ethernet, the disks themselves would
have been running Linux internally, with the block and inode allocation
half of the filesystem running there.

Since it was a lot of additional effort to maintain the split filesystem
code ourselves, and also less flexible, we implemented an "obdfilter"
OBD driver, which stacks on top of a regular Linux filesystem at the VFS
layer, and basically replaces the file I/O syscall interface with a
network RPC layer.

Cheers, Andreas
-- 
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
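
To make the obdfilter idea above a little more concrete, here is a
minimal sketch in Python. It is not Lustre code and does not reflect the
real obdfilter interfaces: the class name ObjectFilter, the request
dictionaries, and the "create"/"write"/"read" operation names are all
made up for illustration, and the RPC transport is replaced by a plain
function call. The only point it tries to show is that the caller names
objects by ID and offset, while the backing filesystem under the root
directory does all of the block and inode allocation.

```python
import os
import tempfile


class ObjectFilter:
    """Toy object store stacked on a regular filesystem (hypothetical API).

    Each object is an ordinary file under `root`; the backing filesystem
    decides where the blocks and inodes actually live.
    """

    def __init__(self, root):
        self.root = root
        self.next_id = 1

    def _path(self, object_id):
        # Object IDs map to file names; callers never see these paths.
        return os.path.join(self.root, "%d" % object_id)

    def handle(self, request):
        """Dispatch one request dict, as an RPC handler might."""
        op = request["op"]
        if op == "create":
            object_id = self.next_id
            self.next_id += 1
            open(self._path(object_id), "wb").close()
            return {"status": 0, "id": object_id}
        if op == "write":
            with open(self._path(request["id"]), "r+b") as f:
                f.seek(request["offset"])
                f.write(request["data"])
            return {"status": 0, "count": len(request["data"])}
        if op == "read":
            with open(self._path(request["id"]), "rb") as f:
                f.seek(request["offset"])
                return {"status": 0, "data": f.read(request["count"])}
        # Unknown operation: report failure instead of raising.
        return {"status": -1}


def demo():
    """Create an object, write to it, and read the bytes back."""
    with tempfile.TemporaryDirectory() as root:
        ost = ObjectFilter(root)
        oid = ost.handle({"op": "create"})["id"]
        ost.handle({"op": "write", "id": oid, "offset": 0, "data": b"hello"})
        reply = ost.handle({"op": "read", "id": oid, "offset": 0, "count": 5})
        return reply["data"]
```

Because the store only ever goes through ordinary file operations,
pointing root at a directory on a different filesystem needs no change
on the calling side, which is the flexibility argument made above.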