From: Andreas Dilger <adilger@clusterfs.com>
To: Anton Altaparmakov <aia21@cantab.net>
Cc: ptb@it.uc3m.es, Lars Marowsky-Bree <lmb@suse.de>,
	root@chaos.analogic.com, Rik van Riel <riel@conectiva.com.br>,
	linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] mount flag "direct" (fwd)
Date: Tue, 3 Sep 2002 16:46:43 -0600
Message-ID: <20020903224643.GU32468@clusterfs.com>
In-Reply-To: <5.1.0.14.2.20020903221201.00ac5770@pop.cus.cam.ac.uk>

On Sep 03, 2002  22:54 +0100, Anton Altaparmakov wrote:
> In my understanding a DFS offers exactly what you need: each node has disks 
> and all disks on all nodes are part of the very same file system. Of course 
> each node maintains the local disks, i.e. the local part of the file system 
> and certain operations require that the nodes communicate with the "DFS 
> master node(s)" in order for example to reserve blocks of disks or to 
> create/rename files (need to make sure no duplicate filenames are 
> instantiated for example). -- Sound familiar so far? You wanted to do 
> exactly the same things but at the block layer and the VFS layer levels 
> instead of the FS layer...
> 
> The difference between a DFS and your proposal is that a DFS maintains all 
> the caching benefits of a normal FS at the local node level, while your 
> proposal completely and entirely disables caching, which is debatably 
> impossible (due to need to load things into ram to read them and to modify 
> them and then write them back) and certainly no FS author will accept their 
> FS driver to be crippled in such a way. The performance loss incurred by 
> removing caching completely is going to make sure you will only be dreaming 
> of those 50GiB/sec. More likely you will be getting a few bytes/sec... (OK, 
> I exaggerate a bit.) The seek times on the disks together with the 
> read/write timings are going to completely annihilate performance. A DFS 
> maintains caching at local node level, so you can still keep open inodes in 
> memory for example (just don't allow any other node to open the same file 
> at the same time or you need to do some juggling via the "Master DFS node").
> 
> Your time would be much better spent in creating the _one_ true DFS, or 
> helping improve one of the existing ones instead of trying to hack up the 
> VFS/block layers to pieces. It almost certainly will be a hell of a lot 
> less work to implement a decent DFS in comparison to changing the block 
> layer, the VFS, _and_ every single FS driver out there to comply with the 
> block layer and VFS changes. And at the same time you get exactly the same 
> features you wanted to have but with hugely boosted performance.

This is exactly what Lustre is supposed to be: many nodes, each with
local storage, with clients able to do I/O directly to the storage
nodes (for non-local storage, or if they have no local storage at all).

There is (currently) a single metadata server (MDS) which controls the
directory tree locking, and the storage nodes control the locking of
inodes (objects) local to their storage.
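
As a rough illustration of that split, here is a toy model in Python.
It is not Lustre code: the class and method names are made up for this
sketch, the round-robin object placement is an assumption, and ordinary
in-process locks stand in for whatever distributed locking a real
system would use. It only shows the division of labour: one metadata
server serialises namespace changes, while each storage node locks its
own objects and serves client I/O directly.

```python
import threading


class StorageNode:
    """Hypothetical storage node: owns its objects and their locks."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # object id -> bytearray holding the data
        self.locks = {}     # object id -> lock, managed by this node only
        self._table_lock = threading.Lock()

    def create_object(self, oid):
        with self._table_lock:
            self.objects[oid] = bytearray()
            self.locks[oid] = threading.Lock()

    def write(self, oid, data):
        # Object locking happens here, not on the metadata server.
        with self.locks[oid]:
            self.objects[oid] += data

    def read(self, oid):
        with self.locks[oid]:
            return bytes(self.objects[oid])


class MetadataServer:
    """Hypothetical single MDS: owns the directory tree and its lock."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.tree = {}      # path -> (storage node, object id)
        self.tree_lock = threading.Lock()
        self.next_oid = 0

    def create(self, path):
        # One lock over the namespace, so duplicate names cannot appear.
        with self.tree_lock:
            if path in self.tree:
                raise FileExistsError(path)
            oid = self.next_oid
            self.next_oid += 1
            # Assumed placement policy: simple round-robin over nodes.
            node = self.nodes[oid % len(self.nodes)]
            node.create_object(oid)
            self.tree[path] = (node, oid)
            return node, oid

    def lookup(self, path):
        with self.tree_lock:
            return self.tree[path]


if __name__ == "__main__":
    nodes = [StorageNode("s0"), StorageNode("s1")]
    mds = MetadataServer(nodes)
    node, oid = mds.create("/data/file")   # metadata goes via the MDS
    node.write(oid, b"hello")              # data goes straight to the node
    print(node.name, node.read(oid))
```

Once a client has looked up which node holds an object, reads and
writes in this model never touch the metadata server again.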

It's not quite in a robust state yet, but we're working on it.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

