linux-fsdevel.vger.kernel.org archive mirror
From: Jeff Garzik <jeff@garzik.org>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	avishay@gmail.com, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, osd-dev@open-osd.org,
	linux-kernel@vger.kernel.org,
	James.Bottomley@HansenPartnership.com, jens.axboe@oracle.com,
	linux-scsi@vger.kernel.org
Subject: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils)
Date: Mon, 16 Feb 2009 06:05:04 -0500
Message-ID: <49994860.9060109@garzik.org>
In-Reply-To: <49993DAD.40407@panasas.com>

Boaz Harrosh wrote:
> No can do. exofs is meant to be a reference implementation of a pNFS-objects
> file serving system. Have you read the spec of pNFS-objects layout? They define
> RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
> for its clients as NFS, so it needs to have all the infrastructure and knowledge
> of a client pNFS-objects layout driver.

Yes, I have studied pNFS!  I plan to add v4.1 and pNFS support to my NFS 
server, once v4.0 support is working well.


pNFS The Theory:   is wise and necessary:  permit clients to directly 
connect to data storage, rather than copying through the metadata 
server(s).  This is what every distributed filesystem is doing these 
days -- direct to data server for bulk data read/write.

pNFS The Specification:   is an utter piece of shit.  I can only presume 
some shady backroom deal in a smoke-filled room was the reason this saw 
the light of day.


In a sane world, NFS clients would speak... NFS.

In the crazy world of pNFS, NFS clients are now forced to speak NFS, 
SCSI, RAID, and any number of proprietary layout types.  When will HTTP 
be added to the list?  :)

But anything beyond the NFS protocol for talking client <-> data servers 
is code bloat complexity madness for an NFS client that wishes to be 
compatible with "most of the NFS 4.1 world".

An ideal NFS client for pNFS should be asked to do these two things, and 
nothing more:

1) send metadata transactions to one or more metadata servers, using 
the well-known NFS protocol

2) send data to one or more data servers, using the well-known NFS 
protocol subset designed for storage (v4.1, section 13.6)
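
The two-item split above can be sketched as a simple routing rule. This is 
only an illustration: the function name, the server labels and the stripe 
index are hypothetical, and the grouping of operations into "metadata" and 
"data" is a simplifying assumption, not text from the spec.

```python
# Hypothetical sketch: route() and the server labels are illustrative only.
# Assumption: operations split cleanly into a metadata set and a data set.
METADATA_OPS = {"OPEN", "CLOSE", "LOOKUP", "GETATTR", "SETATTR", "LAYOUTGET"}
DATA_OPS = {"READ", "WRITE", "COMMIT"}


def route(op, metadata_servers, data_servers, stripe_index=0):
    """Pick the server an operation goes to; everything speaks NFS."""
    if op in METADATA_OPS:
        # 1) metadata transactions go to a metadata server
        return metadata_servers[0]
    if op in DATA_OPS:
        # 2) bulk data goes straight to a data server (round-robin assumed)
        return data_servers[stripe_index % len(data_servers)]
    raise ValueError("unknown operation: " + op)


getattr_target = route("GETATTR", ["mds0"], ["ds0", "ds1"])
read_target = route("READ", ["mds0"], ["ds0", "ds1"], stripe_index=3)
```

In this model the client needs no protocol other than NFS, whichever 
server it ends up talking to.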

But no.

pNFS has forced a huge complexity on the NFS client, by permitting an 
unbounded number of network protocols.  A "layout plugin" layer is 
required.  SCSI and OSD support are REQUIRED for any reasonably 
compatible setup going forward.

But even more than the technical complexity, this is the first time in 
NFS history that NFS has required a protocol besides... NFS.

pNFS means that a useful, compatible NFS client must know all these 
storage protocols, in addition to NFS.

Furthermore, enabling proprietary layout types means that it is easy for 
a "compatible" v4.1 client to be denied parallel access to data 
available to other "compatible" v4.1 clients:

	Client A: Linux, fully open source

	Client B: Linux, with closed source module for
		  layout type SuperWhizBang storage

	Both Client A and Client B can claim to be NFS v4.1 and pNFS
	compatible, yet Client A must read data through the metadata
	server because it lacks the SuperWhizBang storage plugin.
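
A minimal sketch of that scenario, assuming a client simply picks the 
first layout type the server offers that it has a driver for, and 
otherwise falls back to I/O through the metadata server. The function and 
the layout labels are hypothetical; "superwhizbang" is the made-up 
proprietary type from the example above.

```python
# Hypothetical sketch: names are illustrative, not from any real NFS client.
def choose_io_path(client_layouts, server_layouts):
    """Return the first server-offered layout the client has a driver for.

    Returns "mds" when there is no match, meaning all data must be read
    and written through the metadata server.
    """
    for layout in server_layouts:
        if layout in client_layouts:
            return layout
    return "mds"


client_a = {"files"}                   # Linux, fully open source
client_b = {"files", "superwhizbang"}  # Linux plus a closed source module
server = ["superwhizbang"]             # assumed: server offers only this type

path_a = choose_io_path(client_a, server)  # falls back to the MDS
path_b = choose_io_path(client_b, server)  # gets parallel access
```

Both clients are "compatible", but only Client B gets direct access to 
the data servers.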

pNFS means a never-ending arms race for the best storage layout, where 
NFS clients are inevitably compatible with a __random subset__ of total 
available layout types.  pNFS will be a continuing train wreck of 
fly-by-night storage companies, and their pet layout types & storage 
protocols.

It is a support nightmare, an admin nightmare, a firewall nightmare, a 
client implementor's nightmare, but a storage vendor's wet dream.

NFS was never beautiful, but at least until v4.0 it was well known and 
widely cross-compatible.  And it required only one network protocol: NFS.

	Jeff





Thread overview: 32+ messages
2009-02-09 13:07 [PATCHSET 0/8 version 3] exofs Boaz Harrosh
2009-02-09 13:12 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils Boaz Harrosh
2009-02-16  4:18   ` FUJITA Tomonori
2009-02-16  8:49     ` Boaz Harrosh
2009-02-16  9:00       ` FUJITA Tomonori
2009-02-16  9:19         ` Boaz Harrosh
2009-02-16  9:27           ` Jeff Garzik
2009-02-16 10:19             ` Boaz Harrosh
2009-02-16 11:05               ` Jeff Garzik [this message]
2009-02-16 12:45                 ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Boaz Harrosh
2009-02-16 15:50                 ` James Bottomley
2009-02-16 16:27                   ` Benny Halevy
2009-02-16 16:23                 ` Benny Halevy
2009-02-16  9:38           ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
2009-02-16 10:29             ` Boaz Harrosh
2009-02-17  0:20               ` FUJITA Tomonori
2009-02-17  8:10                 ` [osd-dev] " Boaz Harrosh
2009-02-27  8:09                   ` FUJITA Tomonori
2009-03-01 10:43                     ` Boaz Harrosh
2009-02-09 13:18 ` [PATCH 2/8] exofs: file and file_inode operations Boaz Harrosh
2009-02-09 13:20 ` [PATCH 3/8] exofs: symlink_inode and fast_symlink_inode operations Boaz Harrosh
2009-02-09 13:22 ` [PATCH 4/8] exofs: address_space_operations Boaz Harrosh
2009-02-09 13:24 ` [PATCH 5/8] exofs: dir_inode and directory operations Boaz Harrosh
2009-02-15 17:08   ` Evgeniy Polyakov
2009-02-16  9:31     ` Boaz Harrosh
2009-03-15 18:10       ` Boaz Harrosh
2009-03-15 18:37         ` Evgeniy Polyakov
2009-02-09 13:25 ` [PATCH 6/8] exofs: super_operations and file_system_type Boaz Harrosh
2009-02-15 17:24   ` Evgeniy Polyakov
2009-02-16  9:59     ` Boaz Harrosh
2009-02-09 13:29 ` [PATCH 7/8] exofs: Documentation Boaz Harrosh
2009-02-09 13:31 ` [PATCH 8/8] fs: Add exofs to Kernel build Boaz Harrosh
