All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Anand Avati <anand.avati@gmail.com>,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	sandeen@redhat.com, linux-nfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, gluster-devel@nongnu.org
Subject: Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
Date: Wed, 13 Feb 2013 17:41:41 -0500	[thread overview]
Message-ID: <20130213224141.GU14195@fieldses.org> (raw)
In-Reply-To: <20130213222052.GD5938@thunk.org>

On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> On Wed, Feb 13, 2013 at 01:21:06PM -0800, Anand Avati wrote:
> > 
> > NFS uses the term cookies, while man pages of readdir/seekdir/telldir calls
> > them "offsets". 
> 
> Unfortunately, telldir and seekdir are part of the "unspeakable Unix
> design horrors" which has been with us for 25+ years.  To quote from
> the rationale section from the Single Unix Specification v3 (there is
> similar language in the Posix spec).
> 
>     The original standard developers perceived that there were
>     restrictions on the use of the seekdir() and telldir() functions
>     related to implementation details, and for that reason these
>     functions need not be supported on all POSIX-conforming
>     systems. They are required on implementations supporting the XSI
>     extension.
> 
>     One of the perceived problems of implementation is that returning
>     to a given point in a directory is quite difficult to describe
>     formally, in spite of its intuitive appeal, when systems that use
>     B-trees, hashing functions, or other similar mechanisms to order
>     their directories are considered. The definition of seekdir() and
>     telldir() does not specify whether, when using these interfaces, a
>     given directory entry will be seen at all, or more than once.
> 
>     On systems not supporting these functions, their capability can
>     sometimes be accomplished by saving a filename found by readdir()
>     and later using rewinddir() and a loop on readdir() to relocate
>     the position from which the filename was saved.
> 
> 
> Telldir() and seekdir() are basically implementation horrors for any
> file system that is using anything other than a simple array of
> directory entries ala the V7 Unix file system or the BSD FFS.  For any
> file system which is using a more advanced data structure, like
> b-trees hash trees, etc, there **can't** possibly be a "offset" into a
> readdir stream.  This is why ext3/ext4 uses a telldir cookie, and it's
> why the NFS specifications refer to it as a cookie.  If you are using
> a modern file system, it can't possibly be an offset.
> 
> > You can always say "this is your fault" for interpreting the man pages
> > differently and punish us by leaving things as they are (and unfortunately
> > a big chunk of users who want both ext4 and gluster jeapordized). Or you
> > can be kind, generous and be considerate to the legacy apps and users (of
> > which gluster is only a subset) and only provide a mount option to control
> > the large d_off behavior.
> 
> The problem is that we made this change to fix real problems that take
> place when you have hash collisions.  And if you are using a 31-bit
> cookie, the birthday paradox means that by the time you have a
> directory with 2**16 entries, the chances of hash collisions are very
> real.  This could result in NFS readdir getting stuck in loops where
> it constantly gets the file "foo.c", and then when it passes the
> 31-bit cookie for "bar.c", since there is a hash collision, it gets
> "foo.c" again, and the readdir never terminates.
> 
> So the problem is that you are effectively asking me to penalize
> well-behaved programs that don't try to steel bits from the top of the
> telldir cookie, just for the benefit of gluster.
> 
> What if we have an ioctl or a process personality flag where a broken
> application can tell the file system "I'm broken, please give me a
> degraded telldir/seekdir cookie"?  That way we don't penalize programs
> that are doing the right thing, while providing some accomodation for
> programs who are abusing the telldir cookie.

Yeah, if there's a simple way to do that, maybe it would be worth it.

--b.

  reply	other threads:[~2013-02-13 22:41 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
2013-02-12 20:56 ` Bernd Schubert
2013-02-12 21:00   ` J. Bruce Fields
2013-02-13  8:17     ` Bernd Schubert
2013-02-13 22:18       ` J. Bruce Fields
2013-02-13 13:31     ` [Gluster-devel] " Niels de Vos
2013-02-13 15:40       ` Bernd Schubert
2013-02-14  5:32         ` Dave Chinner
2013-02-13  4:00 ` Theodore Ts'o
2013-02-13 13:31   ` J. Bruce Fields
2013-02-13 15:14     ` Theodore Ts'o
2013-02-13 15:19       ` J. Bruce Fields
2013-02-13 15:36         ` Theodore Ts'o
     [not found]           ` <20130213153654.GC17431-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 16:20             ` J. Bruce Fields
2013-02-13 16:20               ` J. Bruce Fields
     [not found]               ` <20130213162059.GL14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 16:43                 ` Myklebust, Trond
2013-02-13 16:43                   ` Myklebust, Trond
2013-02-13 21:33                   ` J. Bruce Fields
2013-02-14  3:59                     ` Myklebust, Trond
     [not found]                       ` <4FA345DA4F4AE44899BD2B03EEEC2FA91F3D6BAB-UCI0kNdgLrHLJmV3vhxcH3OR4cbS7gtM96Bgd4bDwmQ@public.gmane.org>
2013-02-14  5:45                         ` Dave Chinner
2013-02-14  5:45                           ` Dave Chinner
2013-02-13 21:21                 ` Anand Avati
2013-02-13 21:21                   ` Anand Avati
     [not found]                   ` <CAFboF2wXvP+vttiff8iRE9rAgvV8UWGbFprgVp8p7kE43TU=PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 22:20                     ` [Gluster-devel] " Theodore Ts'o
2013-02-13 22:20                       ` Theodore Ts'o
2013-02-13 22:41                       ` J. Bruce Fields [this message]
     [not found]                         ` <20130213224141.GU14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 22:47                           ` Theodore Ts'o
2013-02-13 22:47                             ` Theodore Ts'o
     [not found]                             ` <20130213224720.GE5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 22:57                               ` Anand Avati
2013-02-13 22:57                                 ` Anand Avati
     [not found]                                 ` <CAFboF2z1akN_edrY_fT915xfehfHGioA2M=PSHv0Fp3rD-5v5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 23:05                                   ` [Gluster-devel] " J. Bruce Fields
2013-02-13 23:05                                     ` J. Bruce Fields
     [not found]                                     ` <20130213230511.GW14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 23:44                                       ` Theodore Ts'o
2013-02-13 23:44                                         ` Theodore Ts'o
     [not found]                                         ` <20130213234430.GF5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14  0:05                                           ` Anand Avati
2013-02-14  0:05                                             ` Anand Avati
     [not found]                                             ` <CAFboF2zS+YAa0uUxMFUAbqgPh3Kb4xZu40WUjLyGn8qPoP+Oyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-14 21:47                                               ` [Gluster-devel] " J. Bruce Fields
2013-02-14 21:47                                                 ` J. Bruce Fields
2013-03-26 15:23                                               ` Bernd Schubert
2013-03-26 15:23                                                 ` [Gluster-devel] " Bernd Schubert
     [not found]                                                 ` <5151BD5F.30607-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
2013-03-26 15:48                                                   ` Eric Sandeen
2013-03-26 15:48                                                     ` Eric Sandeen
2013-03-28 14:07                                                     ` Theodore Ts'o
2013-03-28 16:26                                                       ` Eric Sandeen
2013-03-28 17:52                                                       ` Zach Brown
     [not found]                                                         ` <20130328175205.GD16651-fypN+1c5dIyjpB87vu3CluTW4wlIGRCZ@public.gmane.org>
2013-03-28 18:05                                                           ` Anand Avati
2013-03-28 18:05                                                             ` Anand Avati
     [not found]                                                             ` <CAFboF2ztc06G00z8ga35NrxgnT2YgBiDECgU_9kvVA_Go1_Bww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 18:31                                                               ` [Gluster-devel] " J. Bruce Fields
2013-03-28 18:31                                                                 ` J. Bruce Fields
     [not found]                                                                 ` <20130328183153.GG7080-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-03-28 18:49                                                                   ` Anand Avati
2013-03-28 18:49                                                                     ` Anand Avati
     [not found]                                                                     ` <CAFboF2w49Lc0vM0SerbJfL9_RuSHgEU+y_Yk7F4pLxeiqu+KRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 19:43                                                                       ` [Gluster-devel] " Jeff Darcy
2013-03-28 19:43                                                                         ` Jeff Darcy
     [not found]                                                                         ` <51549D74.1060703-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-28 22:14                                                                           ` Anand Avati
2013-03-28 22:14                                                                             ` Anand Avati
     [not found]                                                                             ` <CAFboF2xkvXx9YFYxBXupwg=s=3MaeQYm2KK2m8MFtEBPsxwQ7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 22:20                                                                               ` Anand Avati
2013-03-28 22:20                                                                                 ` Anand Avati
2013-02-14 21:46                                           ` [Gluster-devel] " J. Bruce Fields
2013-02-14 21:46                                             ` J. Bruce Fields
     [not found]                       ` <20130213222052.GD5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14  6:10                         ` Dave Chinner
2013-02-14  6:10                           ` Dave Chinner
2013-02-14 22:01                           ` J. Bruce Fields
2013-02-15  2:27                             ` Dave Chinner
2013-02-13  6:56 ` Andreas Dilger
2013-02-13 13:40   ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130213224141.GU14195@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=anand.avati@gmail.com \
    --cc=bernd.schubert@itwm.fraunhofer.de \
    --cc=gluster-devel@nongnu.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.