From: Andreas Dilger <adilger@clusterfs.com>
To: Hans Reiser <reiser@namesys.com>
Cc: Russell Coker <bofh@coker.com.au>,
Ragnar ? <reiserfs@ragnark.vestdata.no>,
reiserfs-list@namesys.com
Subject: Re: Reiserfs with Samba vs NetApp Filer
Date: Sun, 13 Oct 2002 00:38:09 -0600
Message-ID: <20021013063809.GL3045@clusterfs.com>
In-Reply-To: <3DA8CAF5.7050203@namesys.com>

On Oct 13, 2002 05:23 +0400, Hans Reiser wrote:
> Andreas Dilger wrote:
> >On Oct 13, 2002 01:55 +0400, Hans Reiser wrote:
> >>Someday not too long from now, it will look like one filesystem even
> >>though it is in multiple cases. Whether that is in reiser5 or reiser6
> >>depends on what sponsors fund first.
> >
> >you should take a look at Lustre - www.lustre.org. We are basically
> >already developing what you are suggesting - a distributed filesystem
> >which is built atop two or more local filesystems. The aggregate
> >throughput of N lustre storage servers is basically N times the
> >throughput of a single server (clients communicate directly with the
> >storage targets, so the cross-sectional bandwidth is perfectly
> >scalable on a switched network).
> >
> >Like Intermezzo, Lustre can be stacked on top of journaling local
> >filesystems, so it would be possible to use reiserfs for both the
> >metadata and storage targets.
> >
> >We are deploying on a 1000-node cluster early next year, and expect
> >total throughput around 4GB/s (we have already made a limited test
> >at 1.4GB/s) with 90TB of storage - on a 2.4 kernel. Because we are
> >using multiple separate filesystems, we are not hampered by the
> >2TB block device limit, and we get all sorts of parallelisms that
> >are not possible with a single large server.
>
> I really don't understand what is the advantage of object based disk
> storage. It seems like its main effect is to prevent people from coming
> up with optimizations the drive manufacturer did not think of. I don't
> at all understand these supposed metadata advantages. We are lucky that
> we don't have in disk drives the sort of innovation inhibiting
> separation of compilers and CPUs that our compatriots in the language
> design business suffer from. The more smarts that go into the drive,
> the more our field will ossify, unless they work closely with FS authors.

While it is _possible_ to have object based storage on a drive, the
reality is that the object storage targets (OSTs henceforth) are in
practice really large storage systems, like an IBM Shark, or a Linux
box with a few TB of disk and RAID and LVM, and most importantly they
have a regular filesystem like ext3 or reiserfs on top of all that
storage to do all of the real storage management.

The benefit of an object based network protocol like Lustre is that the
client is free from all of the details of file and block allocation, and
the OST filesystem can do all of this. Since each OST has an independent
filesystem, it can handle all of the locking/threading for block and
inode allocation locally. It can also do this in any way it sees fit,
so it actually allows for MORE innovation at the OST filesystem level
than other distributed filesystems.
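
As a rough illustration of the point above (not Lustre code), the
sketch below models an OST as an object that hands out its own blocks
under its own lock. The class name ObjectStorageTarget, its write/read
methods, and the 4096-byte block size are all assumptions made for the
example. The caller only ever names an object id, an offset, and some
data; block numbers never leave the OST.

```python
import threading


class ObjectStorageTarget:
    """Hypothetical OST: block allocation and its locking stay local."""

    BLOCK_SIZE = 4096  # assumed block size, for illustration only

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # private to this OST, not cluster-wide
        self._next_block = 0           # trivial bump allocator
        self._extents = {}             # object id -> {file block index: local block}
        self._blocks = {}              # local block number -> bytearray

    def write(self, object_id, offset, data):
        """Store data at a byte offset of an object; returns bytes written."""
        with self._lock:
            mapping = self._extents.setdefault(object_id, {})
            pos = 0
            while pos < len(data):
                index, within = divmod(offset + pos, self.BLOCK_SIZE)
                if index not in mapping:
                    # The allocation decision is made here, invisible to the client.
                    mapping[index] = self._next_block
                    self._blocks[self._next_block] = bytearray(self.BLOCK_SIZE)
                    self._next_block += 1
                chunk = data[pos:pos + self.BLOCK_SIZE - within]
                block = self._blocks[mapping[index]]
                block[within:within + len(chunk)] = chunk
                pos += len(chunk)
            return len(data)

    def read(self, object_id, offset, length):
        """Return length bytes; unallocated ranges read back as zeros."""
        with self._lock:
            mapping = self._extents.get(object_id, {})
            out = bytearray()
            pos = 0
            while pos < length:
                index, within = divmod(offset + pos, self.BLOCK_SIZE)
                take = min(self.BLOCK_SIZE - within, length - pos)
                if index in mapping:
                    out += self._blocks[mapping[index]][within:within + take]
                else:
                    out += bytes(take)
                pos += take
            return bytes(out)
```

Two instances share no allocator state and no lock, so in this model
adding a second OST adds no contention to the first. A real OST would
delegate all of this to its local filesystem rather than keep blocks
in a dictionary.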

The Lustre network protocol could be considered akin to a network
version of the Linux VFS - the Lustre client (like a Linux process)
is doing I/O on a file, but the Lustre OST (like a Linux filesystem)
is free to implement the details of storing data within that file as
it sees fit. Similarly, the metadata server (MDS) is free to store
filenames, EA data, etc. however it wants.
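
The VFS analogy can be sketched as one interface with interchangeable
backends. Everything named here (ObjectBackend, FlatBackend,
LogBackend, client_io) is hypothetical and chosen only to make the
idea concrete; neither backend resembles how ext3 or reiserfs actually
store data.

```python
from abc import ABC, abstractmethod


class ObjectBackend(ABC):
    """Hypothetical stand-in for the protocol seen by the client."""

    @abstractmethod
    def write(self, object_id, offset, data):
        ...

    @abstractmethod
    def read(self, object_id, offset, length):
        ...


class FlatBackend(ObjectBackend):
    """Keeps each object as one contiguous buffer."""

    def __init__(self):
        self._objects = {}

    def write(self, object_id, offset, data):
        buf = self._objects.setdefault(object_id, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))  # zero-fill up to the new end
        buf[offset:end] = data

    def read(self, object_id, offset, length):
        buf = self._objects.get(object_id, bytearray())
        return bytes(buf[offset:offset + length])


class LogBackend(ObjectBackend):
    """Appends every write to a log and replays the log on read."""

    def __init__(self):
        self._log = []

    def write(self, object_id, offset, data):
        self._log.append((object_id, offset, bytes(data)))

    def read(self, object_id, offset, length):
        records = [(o, d) for i, o, d in self._log if i == object_id]
        size = max((o + len(d) for o, d in records), default=0)
        buf = bytearray(size)
        for o, d in records:  # later records overwrite earlier ones
            buf[o:o + len(d)] = d
        return bytes(buf[offset:offset + length])


def client_io(backend):
    """The 'client': same calls regardless of how the backend stores data."""
    backend.write(1, 0, b"hello world")
    backend.write(1, 6, b"Linux")
    return backend.read(1, 0, 11)
```

Calling client_io(FlatBackend()) and client_io(LogBackend()) gives the
same bytes back, although the two backends keep completely different
on-"disk" state. That is the freedom the paragraph above describes for
OSTs and for the MDS.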

Lustre, like the VFS, needs locking to ensure multiple processes do
not do conflicting things. The Lustre locking code actually is only
doing per-node locking, and trusts the Linux VFS to do the right thing
internally, so we leverage as much of Al Viro's work in this complex
area as we possibly can ;-).
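
One way to picture per-node locking is a two-level scheme: a cluster
table records which node holds a resource, and processes on that node
are serialized by the node's own lock. This is only a sketch under
that reading of the paragraph above, not the Lustre lock manager;
ClusterLockTable, Node, and their methods are invented for the
example, and a plain threading.Lock stands in for the VFS's internal
locking.

```python
import threading


class ClusterLockTable:
    """Hypothetical table granting each resource to one node at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._holder = {}  # resource name -> name of the holding node

    def acquire(self, resource, node):
        """Grant the resource to node if it is free or already held by node."""
        with self._guard:
            return self._holder.setdefault(resource, node) == node

    def release(self, resource, node):
        with self._guard:
            if self._holder.get(resource) == node:
                del self._holder[resource]


class Node:
    """A client node; its local lock stands in for the VFS's own locking."""

    def __init__(self, name, table):
        self.name = name
        self._table = table
        self._local = threading.Lock()

    def update(self, resource, files, value):
        """Write value if this node holds (or can get) the cluster lock."""
        if not self._table.acquire(resource, self.name):
            return False  # another node holds the resource
        with self._local:  # processes on this node are ordered locally
            files[resource] = value
        return True
```

The cluster table never learns about individual processes. Once a node
holds a resource, repeated updates from that node need no further
cluster-level decision until the lock is released.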

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/