From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Jeff Garzik <jeff@garzik.org>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: POHMELFS high performance network filesystem. Transactions, failover, performance.
Date: Wed, 14 May 2008 00:51:14 +0400	[thread overview]
Message-ID: <20080513205114.GA16489@2ka.mipt.ru> (raw)
In-Reply-To: <4829E752.8030104@garzik.org>

Hi.

On Tue, May 13, 2008 at 03:09:06PM -0400, Jeff Garzik (jeff@garzik.org) wrote:
> This continues to be a neat and interesting project :)

Thanks :)

> Where is the best place to look at client<->server protocol?

Hmm, in the sources, I think. I need to kick myself into writing a
reasonably good spec for the next release.

Basically the protocol consists of a fixed-size header (struct netfs_cmd)
plus attached data, whose size is embedded in that header. Simple commands
end there (essentially everything except the write/create commands); you
can check them in the appropriate address space/inode operations.
Transactions follow the netlink style (which is very ugly but
exceptionally extensible): there is a main header (the structure above)
holding the size of the embedded data, and that data can in turn be
dereferenced as header/data pairs, where each inner header corresponds to
any command (except a transaction header). So one can pack the requested
number of commands into a single 'frame' (up to 90 pages of data or
different commands on x86, which is the limit of the page size devoted to
headers) and submit it to the system, which takes care of the atomicity
of that request: it is either fully processed by one of the servers or
dropped.

> Are you planning to support the case where the server filesystem dataset 
> does not fit entirely on one server?

Sure. First by allowing whole objects to be placed on different servers
(i.e. one subdir on server1 and another on server2); in the future,
support will probably be added for a single object being distributed
across servers (i.e. one half of a big file on server1 and the other
half on server2).

> What is your opinion of the Paxos algorithm?

It is slow, but it does solve failure cases.
So far POHMELFS does not work as a distributed filesystem, so it does
not need to care about this at all; at most, in the very near future it
will just have a number of acceptors (in Paxos terminology; metadata
servers in other terms) without the need for active dynamic
reconfiguration, so the protocol will be greatly reduced. With the
addition of dynamic metadata-cluster extension, the protocol will have
to be extended.

As practice shows, the smaller and simpler the initial steps are, the
better the results eventually become :)

-- 
	Evgeniy Polyakov

Thread overview: 50+ messages
2008-05-13 17:45 POHMELFS high performance network filesystem. Transactions, failover, performance Evgeniy Polyakov
2008-05-13 19:09 ` Jeff Garzik
2008-05-13 20:51   ` Evgeniy Polyakov [this message]
2008-05-14  0:52     ` Jamie Lokier
2008-05-14  1:16       ` Florian Wiessner
2008-05-14  8:10         ` Evgeniy Polyakov
2008-05-14  7:57       ` Evgeniy Polyakov
2008-05-14 13:35     ` Sage Weil
2008-05-14 13:52       ` Evgeniy Polyakov
2008-05-14 14:31         ` Jamie Lokier
2008-05-14 15:00           ` Evgeniy Polyakov
2008-05-14 19:08             ` Jeff Garzik
2008-05-14 19:32               ` Evgeniy Polyakov
2008-05-14 20:37                 ` Jeff Garzik
2008-05-14 21:19                   ` Evgeniy Polyakov
2008-05-14 21:34                     ` Jeff Garzik
2008-05-14 21:32             ` Jamie Lokier
2008-05-14 21:37               ` Jeff Garzik
2008-05-14 21:43                 ` Jamie Lokier
2008-05-14 22:02               ` Evgeniy Polyakov
2008-05-14 22:28                 ` Jamie Lokier
2008-05-14 22:45                   ` Evgeniy Polyakov
2008-05-15  1:10                     ` Jamie Lokier
2008-05-15  7:34                       ` Evgeniy Polyakov
2008-05-14 19:05           ` Jeff Garzik
2008-05-14 21:38             ` Jamie Lokier
2008-05-14 19:03         ` Jeff Garzik
2008-05-14 19:38           ` Evgeniy Polyakov
2008-05-14 21:57             ` Jamie Lokier
2008-05-14 22:06               ` Jeff Garzik
2008-05-14 22:41                 ` Evgeniy Polyakov
2008-05-14 22:50                   ` Evgeniy Polyakov
2008-05-14 22:32               ` Evgeniy Polyakov
2008-05-14 14:09       ` Jamie Lokier
2008-05-14 16:09         ` Sage Weil
2008-05-14 19:11           ` Jeff Garzik
2008-05-14 21:19           ` Jamie Lokier
2008-05-14 18:24       ` Jeff Garzik
2008-05-14 20:00         ` Sage Weil
2008-05-14 21:49           ` Jeff Garzik
2008-05-14 22:26             ` Sage Weil
2008-05-14 22:35               ` Jamie Lokier
2008-05-14  6:33 ` Andrew Morton
2008-05-14  7:40   ` Evgeniy Polyakov
2008-05-14  8:01     ` Andrew Morton
2008-05-14  8:31       ` Evgeniy Polyakov
2008-05-14  8:08     ` Evgeniy Polyakov
2008-05-14 13:41       ` Sage Weil
2008-05-14 13:56         ` Evgeniy Polyakov
2008-05-14 17:56         ` Andrew Morton
