From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Jeff Garzik <jeff@garzik.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: Distributed storage. Move away from char device ioctls.
Date: Sat, 15 Sep 2007 16:29:57 +0400 [thread overview]
Message-ID: <20070915122957.GA25752@2ka.mipt.ru> (raw)
In-Reply-To: <46EADC02.9070409@garzik.org>
Hi Jeff.
On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik (jeff@garzik.org) wrote:
> >Further TODO list includes:
> >* implement optional saving of mirroring/linear information on the remote
> > nodes (simple)
> >* new redundancy algorithm (complex)
> >* some thoughts about distributed filesystem tightly connected to DST
> > (far-far planes so far)
> >
> >Homepage:
> >http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> >
> >Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
>
> My thoughts. But first a disclaimer: Perhaps you will recall me as
> one of the people who really reads all your patches, and examines your
> code and proposals closely. So, with that in mind...
:)
> I question the value of distributed block services (DBS), whether its
> your version or the others out there. DBS are not very useful, because
> it still relies on a useful filesystem sitting on top of the DBS. It
> devolves into one of two cases: (1) multi-path much like today's SCSI,
> with distributed filesystem arbitrarion to ensure coherency, or (2) the
> filesystem running on top of the DBS is on a single host, and thus, a
> single point of failure (SPOF).
>
> It is quite logical to extend the concepts of RAID across the network,
> but ultimately you are still bound by the inflexibility and simplicity
> of the block device.
Yes, block device itself is not able to scale well, but it is the place
for redundancy, since filesystem will just fail if underlying device
does not work correctly and FS actually does not know about where it
should place redundancy bits - it might happen to be the same broken
disk, so I created a low-level device which distribute requests itself.
It is not allowed to mount it via multiple points, that is where
distributed filesystem must enter the show - multiple remote nodes
export its devices via network, each client gets address of the remote
node to work with, connect to it and process requests. All those bits
are already in the DST, next logical step is to connect it with
higher-layer filesystem.
> In contrast, a distributed filesystem offers far more scalability,
> eliminates single points of failure, and offers more room for
> optimization and redundancy across the cluster.
>
> A distributed filesystem is also much more complex, which is why
> distributed block devices are so appealing :)
>
> With a redundant, distributed filesystem, you simply do not need any
> complexity at all at the block device level. You don't even need RAID.
>
> It is my hope that you will put your skills towards a distributed
> filesystem :) Of the current solutions, GFS (currently in kernel)
> scales poorly, and NFS v4.1 is amazingly bloated and overly complex.
>
> I've been waiting for years for a smart person to come along and write a
> POSIX-only distributed filesystem.
Well, originally (about half a year ago) I started to draft a generic
filesystem which would be just superior to existing designs, not
overbloated like zfs, and just faster.
I do believe it can be implemented.
Further I added network capabilities (since what I saw that time
(AFS was proposed) I did not like - I'm not saying it is bad or
something like that at all, but I would implement things differently)
into design drafts.
When Chris Mason announced btrfs, I found that quite a few new ideas
are already implemented there, so I postponed project (although
direction of the developement of the btrfs seems to move to the zfs side
with some questionable imho points, so I think I can jump to the wagon
of new filesystems right now).
DST is low level for my (theoretical so far) filesystem (actually its
network part) like kevent was a low level system for network AIO (originally).
No matter what filesystem works with network it implements some kind
of logic completed in DST.
Sometimes it is very simple, sometimes a bit more complex, but
eventually it is a network entity with parts of stuff I put into DST.
Since I postponed the project (looking at btrfs and its results), I
completed DST as a standalone block device.
So, essentially, a filesystem with simple distributed facilities is on
(my) radar, but so far you are first who requested it :)
--
Evgeniy Polyakov
next prev parent reply other threads:[~2007-09-15 12:30 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-14 18:54 Distributed storage. Move away from char device ioctls Evgeniy Polyakov
2007-09-14 19:07 ` Jeff Garzik
2007-09-14 20:46 ` Al Boldi
2007-09-14 21:12 ` J. Bruce Fields
2007-09-14 21:14 ` Jeff Garzik
2007-09-14 21:18 ` J. Bruce Fields
2007-09-14 22:32 ` Jeff Garzik
2007-09-14 22:42 ` J. Bruce Fields
2007-09-15 4:08 ` Jeff Garzik
2007-09-15 4:40 ` J. Bruce Fields
2007-09-15 2:54 ` Mike Snitzer
2007-09-15 12:34 ` Evgeniy Polyakov
2007-09-15 12:29 ` Evgeniy Polyakov [this message]
2007-09-15 17:24 ` Andreas Dilger
2007-09-16 7:07 ` Kyle Moffett
2007-10-26 10:44 ` Evgeniy Polyakov
2007-09-16 13:43 ` Evgeniy Polyakov
2007-09-15 13:56 ` Robin Humble
2007-09-15 14:35 ` Jeff Garzik
2007-09-15 16:20 ` Robin Humble
2007-09-15 17:51 ` Andreas Dilger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070915122957.GA25752@2ka.mipt.ru \
--to=johnpol@2ka.mipt.ru \
--cc=jeff@garzik.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).