linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Ts'o <tytso@mit.edu>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-ext4@vger.kernel.org, sandeen@redhat.com,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	gluster-devel@nongnu.org
Subject: Re: regressions due to 64-bit ext4 directory cookies
Date: Wed, 13 Feb 2013 10:36:54 -0500	[thread overview]
Message-ID: <20130213153654.GC17431@thunk.org> (raw)
In-Reply-To: <20130213151953.GJ14195@fieldses.org>

On Wed, Feb 13, 2013 at 10:19:53AM -0500, J. Bruce Fields wrote:
> > > (In more detail: they're spreading a single directory across multiple
> > > nodes, and encoding a node ID into the cookie they return, so they can
> > > tell which node the cookie came from when they get it back.)
> > > 
> > > That works if you assume the cookie is an "offset" bounded above by some
> > > measure of the directory size, hence unlikely to ever use the high
> > > bits....
> > 
> > Right, but why wouldn't a nfs export option solave the problem for
> > gluster?
> 
> No, gluster is running on ext4 directly.

OK, so let me see if I can get this straight.  Each local gluster node
is running a userspace NFS server, right?  Because if it were running
a kernel-side NFS server, it would be sufficient to use an nfs export
option.

A client which mounts a "gluster file system" is also doing this via
NFSv3, right?  Or are they using their own protocol?  If they are
using their own protocol, why can't they encode the node ID somewhere
else?

So this a correct picture of what is going on:

                                                  /------ GFS Storage
                                                 /        Server #1
  GFS Cluster     NFS V3      GFS Cluster      -- NFS v3
  Client        <--------->   Frontend Server  ---------- GFS Storage
                                               --         Server #2
                                                 \
                                                  \------ GFS Storage
                                                          Server #3


And the reason why it needs to use the high bits is because when it
needs to coalesce the results from each GFS Storage Server to the GFS
Cluster client?


The other thing that I'd note is that the readdir cookie has been
64-bit since NFSv3, which was released in June ***1995***.  And the
explicit, stated purpose of making it be a 64-bit value (as stated in
RFC 1813) was to reduce interoperability problems.  If that were the
case, are you telling me that Sun (who has traditionally been pretty
good worrying about interoperability concerns, and in fact employed
the editors of RFC 1813) didn't get this right?  This seems
quite.... surprising to me.

I thought this was the whole point of the various NFS interoperability
testing done at Connectathon, for which Sun was a major sponsor?!?  No
one noticed?!?

	     		      	    	- Ted

  reply	other threads:[~2013-02-13 15:37 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
2013-02-12 20:56 ` Bernd Schubert
2013-02-12 21:00   ` J. Bruce Fields
2013-02-13  8:17     ` Bernd Schubert
2013-02-13 22:18       ` J. Bruce Fields
2013-02-13 13:31     ` [Gluster-devel] " Niels de Vos
2013-02-13 15:40       ` Bernd Schubert
2013-02-14  5:32         ` Dave Chinner
2013-02-13  4:00 ` Theodore Ts'o
2013-02-13 13:31   ` J. Bruce Fields
2013-02-13 15:14     ` Theodore Ts'o
2013-02-13 15:19       ` J. Bruce Fields
2013-02-13 15:36         ` Theodore Ts'o [this message]
     [not found]           ` <20130213153654.GC17431-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 16:20             ` J. Bruce Fields
     [not found]               ` <20130213162059.GL14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 16:43                 ` Myklebust, Trond
2013-02-13 21:33                   ` J. Bruce Fields
2013-02-14  3:59                     ` Myklebust, Trond
     [not found]                       ` <4FA345DA4F4AE44899BD2B03EEEC2FA91F3D6BAB-UCI0kNdgLrHLJmV3vhxcH3OR4cbS7gtM96Bgd4bDwmQ@public.gmane.org>
2013-02-14  5:45                         ` Dave Chinner
2013-02-13 21:21                 ` Anand Avati
     [not found]                   ` <CAFboF2wXvP+vttiff8iRE9rAgvV8UWGbFprgVp8p7kE43TU=PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 22:20                     ` [Gluster-devel] " Theodore Ts'o
2013-02-13 22:41                       ` J. Bruce Fields
     [not found]                         ` <20130213224141.GU14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 22:47                           ` Theodore Ts'o
     [not found]                             ` <20130213224720.GE5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 22:57                               ` Anand Avati
     [not found]                                 ` <CAFboF2z1akN_edrY_fT915xfehfHGioA2M=PSHv0Fp3rD-5v5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 23:05                                   ` [Gluster-devel] " J. Bruce Fields
     [not found]                                     ` <20130213230511.GW14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 23:44                                       ` Theodore Ts'o
     [not found]                                         ` <20130213234430.GF5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14  0:05                                           ` Anand Avati
     [not found]                                             ` <CAFboF2zS+YAa0uUxMFUAbqgPh3Kb4xZu40WUjLyGn8qPoP+Oyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-14 21:47                                               ` [Gluster-devel] " J. Bruce Fields
2013-03-26 15:23                                               ` Bernd Schubert
     [not found]                                                 ` <5151BD5F.30607-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
2013-03-26 15:48                                                   ` [Gluster-devel] " Eric Sandeen
2013-03-28 14:07                                                     ` Theodore Ts'o
2013-03-28 16:26                                                       ` Eric Sandeen
2013-03-28 17:52                                                       ` Zach Brown
     [not found]                                                         ` <20130328175205.GD16651-fypN+1c5dIyjpB87vu3CluTW4wlIGRCZ@public.gmane.org>
2013-03-28 18:05                                                           ` Anand Avati
     [not found]                                                             ` <CAFboF2ztc06G00z8ga35NrxgnT2YgBiDECgU_9kvVA_Go1_Bww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 18:31                                                               ` [Gluster-devel] " J. Bruce Fields
     [not found]                                                                 ` <20130328183153.GG7080-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-03-28 18:49                                                                   ` Anand Avati
     [not found]                                                                     ` <CAFboF2w49Lc0vM0SerbJfL9_RuSHgEU+y_Yk7F4pLxeiqu+KRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 19:43                                                                       ` [Gluster-devel] " Jeff Darcy
     [not found]                                                                         ` <51549D74.1060703-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-28 22:14                                                                           ` Anand Avati
     [not found]                                                                             ` <CAFboF2xkvXx9YFYxBXupwg=s=3MaeQYm2KK2m8MFtEBPsxwQ7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 22:20                                                                               ` Anand Avati
2013-02-14 21:46                                           ` [Gluster-devel] " J. Bruce Fields
     [not found]                       ` <20130213222052.GD5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14  6:10                         ` Dave Chinner
2013-02-14 22:01                           ` J. Bruce Fields
2013-02-15  2:27                             ` Dave Chinner
2013-02-13  6:56 ` Andreas Dilger
2013-02-13 13:40   ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130213153654.GC17431@thunk.org \
    --to=tytso@mit.edu \
    --cc=bernd.schubert@itwm.fraunhofer.de \
    --cc=bfields@fieldses.org \
    --cc=gluster-devel@nongnu.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).