From: Benny Halevy <bhalevy@panasas.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Greg Banks
<gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>,
NFS list <linux-nfs@vger.kernel.org>,
nfsv4@linux-nfs.org
Subject: Re: A new NFSv4 server...
Date: Fri, 04 Jan 2008 21:51:24 +0200
Message-ID: <477E8E3C.4030509@panasas.com>
In-Reply-To: <477E557C.3000104@garzik.org>
Jeff Garzik wrote:
> Benny Halevy wrote:
>> Jeff, taking into account the amount of effort people and different
>> organizations have already put into NFSv4 and NFSv4.1 I wish you could
>> tunnel your inventive energy into making NFSv4.1 better rather than
>> trying to reinvent NFS/RPC/XDR.
>>
>> Although it's rather late in the process, since the NFSv4 working group
>> is close to putting the NFSv4.1 Internet-draft up for last
>> call, we would certainly appreciate more implementation feedback.
>
>
> I am more than happy to give feedback, though (as you say) it is
> probably too late for substantial feedback to have any large effect.
>
> My general engineering opinions of pNFS:
>
> * Fills an obvious need: eliminating the need to copy data through
> the metadata interface to backend storage. Many clear, tangible
> benefits here.
>
>
> * pNFS major issue #1: client storage protocol
>
> Storing and retrieving blobs over the network, with strong
> authentication/integrity/security, is a solved problem.
>
> Pick ONE client storage protocol (HTTP? iSCSI OSD2?), and stick to it.
> Or maybe HTTP|SCSI but nothing more. Heck, even BitTorrent w/ auth
> extensions would be better than yet another protocol for similar
> purposes (not that I'm advocating BT, just saying...).
Well, two things about that. First, had pNFS started from scratch, this
might have been the approach to take, but one of the motivating factors
for pNFS (and one that I believe will help make it successful) was the
desire to replace the existing proprietary file system protocols of
several vendors, such as EMC, IBM, and Panasas, which used different
storage protocols. Second, supporting several kinds of storage is better
for customers who have existing storage they want to harness together
with pNFS.
>
> Maximize reuse of existing software and mindshare.
That's always good.
>
>
> * pNFS major issue #2: abandons NFS's "one true generic" path
>
> I believe pNFS violates the "spirit of NFS" by deviating from a
> de facto assumption found in earlier versions: data transfer is
> simple, arbitrary blobs, addressed in the same manner, and sent via
> the same protocol.
>
> Pick ONE layout type, and stick to it. Banish all other layout types
> to other software layers.
I agree wholeheartedly that we should have had one layout data structure
("layout type" is a loaded term in the spec...). This was my position
from day one (and even before), but unfortunately it wasn't accepted, and
each "layout type" got to define its own layout data structure. Instead,
we could have defined one generic data structure for mapping files onto
all kinds of storage devices, while keeping private to the "layout type"
(== storage protocol class) only the device addressing information, plus
some other data that's internal to the layout type, e.g. OSD
capabilities.
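To make the proposal above concrete, here is a minimal Python sketch of what such a generic layout might look like. Everything in it is hypothetical (the class names, the fields, and the assumption of simple round-robin striping); it is not the layout encoding from the NFSv4.1 draft, only an illustration of sharing the file-to-device mapping while keeping device addressing and type-private data (such as OSD capabilities) opaque.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceRef:
    # Hypothetical: the storage protocol class this device belongs to.
    layout_type: str
    # Device addressing, interpreted only by that layout type; opaque here.
    opaque_addr: bytes


@dataclass
class GenericLayout:
    stripe_unit: int            # bytes per stripe unit
    devices: List[DeviceRef]    # devices in stripe order
    type_private: bytes = b""   # e.g. OSD capabilities; opaque to the generic layer

    def __post_init__(self) -> None:
        if self.stripe_unit <= 0 or not self.devices:
            raise ValueError("need a positive stripe unit and at least one device")

    def map_offset(self, offset: int) -> Tuple[int, int]:
        """Map a file offset to (device index, offset on that device),
        assuming simple round-robin (RAID-0 style) striping."""
        stripe_no, within = divmod(offset, self.stripe_unit)
        dev_index = stripe_no % len(self.devices)
        # Each full pass across all devices advances the per-device offset by one unit.
        dev_offset = (stripe_no // len(self.devices)) * self.stripe_unit + within
        return dev_index, dev_offset
```

Under this scheme a new storage protocol class would only define the contents of `opaque_addr` and `type_private`; the striping math stays common to all layout types.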
>
> Protocol conversion servers, firmware, and other software can easily
> convert from a generic layout to something more exotic like OSD or
> [insert site specific protocol here].
>
> NFS itself should not be delving into low-level storage details like
> this. Clients should not need to know low-level details (like stripe
> sizes). In Linux, we call this a layering violation.
This is the essence of the layout type concept, implemented as a layout
driver in the Linux NFSv4.1 client implementation. That the files layout
type definition is part of the NFSv4.1 protocol merely reflects the fact
that the files-based layout type uses NFSv4.1 as its storage protocol.
It could just as well be defined in a separate RFC, exactly like the
blocks and objects layout type specifications, and it could keep its
internal data structures opaque to the generic NFSv4.1 protocol (which
behaves as a transport protocol for the layout-type-specific data).
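The layering described above can be sketched as a small dispatch table: the generic layer carries the layout body as opaque bytes and hands it to whichever driver registered for that layout type. This is a hypothetical Python illustration, not the Linux client's actual layout driver interface; the function names, the integer type keys, and the toy wire format are all assumptions.

```python
import struct
from typing import Callable, Dict, Optional

# A layout driver turns an opaque layout body into something its own I/O path can use.
LayoutDecoder = Callable[[bytes], dict]

# Hypothetical registry keyed by layout type number.
_drivers: Dict[int, LayoutDecoder] = {}


def register_layout_driver(layout_type: int, decode: LayoutDecoder) -> None:
    if layout_type in _drivers:
        raise ValueError(f"layout type {layout_type} already registered")
    _drivers[layout_type] = decode


def handle_layout(layout_type: int, body: bytes) -> Optional[dict]:
    """Generic layer: never parses `body` itself. Returns None when no driver
    handles this layout type, so the caller can do I/O through the MDS instead."""
    decode = _drivers.get(layout_type)
    if decode is None:
        return None
    return decode(body)


def decode_toy_files_layout(body: bytes) -> dict:
    # Toy format (hypothetical): 4-byte big-endian stripe unit, then one byte per device id.
    (stripe_unit,) = struct.unpack_from(">I", body, 0)
    return {"stripe_unit": stripe_unit, "device_ids": list(body[4:])}
```

Adding a new layout type then means registering another decoder; the generic code path does not change.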
>
> Working on kernel storage drivers as I do, I can see the attraction of
> wanting to do things this way... but we invented layering and
> abstraction in computer science for good reasons :)
Yup. I think that the layout driver is the software layer you're
looking for.
>
>
> * pNFS major issue #3: no longer a "closed loop" protocol
>
> By permitting multiple layout types, and in particular undefined
> (site-specific) layout types, it is by definition _impossible_ for
> anyone to claim full protocol interoperability with other
> implementations.
>
> The number of possible combinations approaches infinity, with obvious
> consequences on testing, and production software quality.
Non-standard layout types are defined as experimental. To claim
interoperability, one would need to publish a suitable specification of
the new layout type (see "Defining new layout types", section 22.4 of
http://www.nfsv4-editor.org/draft-18/draft-ietf-nfsv4-minorversion1-18.html#pnfsiana).
Though I agree that allowing a single file to be accessed with multiple
layout types (in theory) complicates testing, typically there will be at
most one layout type per file system.
>
> And when a marketing department advertises "fully NFSv4.1 compliant!"
> on their company's appliance, it is trivial for any engineer to
> construct another "fully NFSv4.1 compliant" setup -- with equivalent
> authentication, metadata and data sets -- that is not interoperable
> except via the fallback case (copy through the metadata server).
Compliance with NFSv4.1 is indeed tied to the legacy I/O path and, for
pNFS, to the files layout type, since it is part of NFSv4.1. Other
implementations of NFSv4.1 with pNFS over non-files layout types will
have to claim compliance with their respective standards. For example,
Panasas's implementation will need to comply with the OSD standard,
iSCSI (or FC), the Object-based pNFS RFC, and finally NFSv4.1.
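The interoperability concern can be illustrated with a hypothetical selection routine: a client gets a pNFS data path only if it has a driver for one of the layout types the server offers for that file system; otherwise it falls back to I/O through the metadata server. This is a minimal Python sketch; the function name and the idea of trying types in the server's listed order are assumptions, not taken from the spec.

```python
from typing import Iterable, Optional, Set


def choose_data_path(server_layout_types: Iterable[int],
                     client_drivers: Set[int]) -> Optional[int]:
    """Return the first server-offered layout type the client can drive,
    or None, meaning all I/O goes through the metadata server."""
    for layout_type in server_layout_types:
        if layout_type in client_drivers:
            return layout_type
    return None
```

With at most one layout type per file system, the server's list usually has a single entry, so two implementations either share that layout type or meet only on the fallback path.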
> Such interoperability breakdowns are IMO not in the spirit of NFS.
That's a part of making progress IMO...
Benny
>
> Jeff
>
>