linux-ext4.vger.kernel.org archive mirror
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Cc: linux-ext4@vger.kernel.org, sandeen@redhat.com,
	Theodore Ts'o <tytso@mit.edu>,
	gluster-devel@nongnu.org,
	Andreas Dilger <adilger.kernel@dilger.ca>
Subject: Re: regressions due to 64-bit ext4 directory cookies
Date: Wed, 13 Feb 2013 17:18:01 -0500
Message-ID: <20130213221800.GS14195@fieldses.org>
In-Reply-To: <511B4C18.8030300@itwm.fraunhofer.de>

On Wed, Feb 13, 2013 at 09:17:28AM +0100, Bernd Schubert wrote:
> On 02/12/2013 10:00 PM, J. Bruce Fields wrote:
> >On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
> >>On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
> >>>06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
> >>>and previous patches solved problems with hash collisions in large
> >>>directories by using 64- instead of 32- bit directory hashes in some
> >>>cases.  But it caused problems for users who assume directory offsets
> >>>are "small".  Two cases we've run across:
> >>>
> >>>	- older NFS clients: 64-bit cookies cause applications on many
> >>>	  older clients to fail.
> >>>	- gluster: gluster assumed that it could take the top bits of
> >>>	  the offset for its own use.
> >>>
> >>>In both cases we could argue we're in the right: the nfs protocol
> >>>defines cookies to be 64 bits, so clients should be prepared to handle
> >>>them (remapping to smaller integers if necessary to placate applications
> >>>using older system interfaces).  And gluster was incorrect to assume
> >>>that the "offset" was really an "offset" as opposed to just an opaque
> >>>value.
> >>>
> >>>But in practice things that worked fine for a long time break on a
> >>>kernel upgrade.
> >>>
> >>>So at a minimum I think we owe people a workaround, and turning off
> >>>dir_index may not be practical for everyone.
> >>>
> >>>A "no_64bit_cookies" export option would provide a workaround for NFS
> >>>servers with older NFS clients, but not for applications like gluster.
> >>>
> >>>For that reason I'd rather have a way to turn this off on a given ext4
> >>>filesystem.  Is that practical?
> >>
> >>I think Ted needs to answer whether he would accept another mount option. But
> >>before we go this way, what is gluster doing if there are hash
> >>collisions?
> >
> >They probably just haven't tested NFS with large enough directories.
> 
> Is it only related to NFS or generic readdir over gluster?
> 
> >The birthday paradox says you'd need about 2^16 entries to have a 50-50
> >chance of hitting the problem.
> 
> We are frequently running into it with 50000 files per directory.
> 
> >
> >I don't know enough about ext4 directory performance.  But unfortunately
> >I suspect there's a range of directory sizes that are too small to have
> >a significant chance of having directory collisions, but still large
> >enough to need dir_index?
> 
> Here is a link to the initial benchmark:
> http://search.luky.org/linux-kernel.2001/msg00117.html
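The two failure modes in the quoted report (older clients choking on large cookies, and gluster reusing the top bits of the offset) can be illustrated with a small sketch. It assumes, loosely following ext4's hash-to-offset mapping (our reading, not quoted kernel code), that a 32-bit cookie is the 32-bit major hash shifted right by one, leaving 31 significant bits, and that a 64-bit cookie puts that value in the upper half with the 32-bit minor hash below it. The `pack_subvol`/`unpack_subvol` helpers and the 4-bit `SUBVOL_BITS` field are hypothetical stand-ins for any scheme that borrows the top bits of the offset; they are not gluster's actual code.

```python
MASK32 = 0xFFFFFFFF


def hash2pos(major: int, minor: int, use_64bit: bool) -> int:
    """Turn an htree (major, minor) hash pair into a readdir cookie.

    Assumption: loosely modeled on ext4. The low bit of the major hash is
    dropped, leaving 31 significant bits; in 64-bit mode that value moves
    to the upper half and the minor hash fills the lower 32 bits.
    """
    major &= MASK32
    minor &= MASK32
    if not use_64bit:
        return major >> 1
    return ((major >> 1) << 32) | minor


# Hypothetical scheme (not gluster's real code): steal the top
# SUBVOL_BITS bits of a 64-bit offset to record which subvolume
# (brick) an entry came from.
SUBVOL_BITS = 4
FREE_BITS = 64 - SUBVOL_BITS


def fits_with_subvol(cookie: int) -> bool:
    """True if the cookie leaves the reserved top bits untouched."""
    return cookie >> FREE_BITS == 0


def pack_subvol(cookie: int, subvol: int) -> int:
    """Store a subvolume index in the reserved top bits of the cookie."""
    if not 0 <= subvol < (1 << SUBVOL_BITS):
        raise ValueError("subvolume index out of range")
    if not fits_with_subvol(cookie):
        # With 64-bit cookies this is the common case: the filesystem
        # already uses the bits the overlay wanted for itself.
        raise ValueError("cookie overlaps the reserved top bits")
    return (subvol << FREE_BITS) | cookie


def unpack_subvol(packed: int) -> tuple[int, int]:
    """Split a packed offset back into (subvol, original cookie)."""
    return packed >> FREE_BITS, packed & ((1 << FREE_BITS) - 1)


if __name__ == "__main__":
    old = hash2pos(0xDEADBEEE, 0x12345678, use_64bit=False)
    new = hash2pos(0xDEADBEEE, 0x12345678, use_64bit=True)
    print(hex(old), fits_with_subvol(old))  # 31-bit cookie: top bits free
    print(hex(new), fits_with_subvol(new))  # 63-bit cookie: they are not
```

With the 31-bit cookie the reserved bits are always zero, so packing round-trips; with the 63-bit cookie the hash itself usually occupies them, so a scheme that assumed "small" offsets either has to reject the cookie or silently corrupt it.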

Hm, so I still don't have a good feeling for when dir_index is likely to
start winning.

For comparison, assuming the probability of seeing a failure due to hash
collisions in an n-entry directory is the probability of a collision
among n numbers chosen uniformly at random from 2^31 possible values, that's about:

	 0.0002% for n=  100
	 0.006 % for n=  500
	 0.02  % for n= 1000
	 0.6   % for n= 5000
	 2     % for n=10000
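The percentages above follow from the usual birthday-problem approximation, P ~= 1 - exp(-n(n-1) / (2N)) with N = 2^31. A short Python sketch, assuming that approximation rather than the exact product formula, reproduces the table and inverts the formula to estimate the directory size at which a collision becomes a coin flip, which matches the "about 2^16 entries" figure quoted earlier.

```python
import math

N = 2 ** 31  # assumed cookie space: 31-bit hashes


def collision_probability(n: int, space: int = N) -> float:
    """Birthday approximation: P(at least one duplicate among n uniform
    draws from `space` values) ~= 1 - exp(-n(n-1) / (2 * space))."""
    return -math.expm1(-n * (n - 1) / (2 * space))


def entries_for_probability(p: float, space: int = N) -> float:
    """Invert the approximation (using n^2 in place of n(n-1)):
    n ~= sqrt(2 * space * ln(1 / (1 - p)))."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for n in (100, 500, 1000, 5000, 10000):
        print(f"n={n:6d}: {100 * collision_probability(n):.4f} %")
    print(f"50% chance at about {entries_for_probability(0.5):,.0f} entries")
```

The 50% point comes out a little under 2^16 (roughly 55,000 entries), and at 50,000 entries the same model gives a bit over 40%, consistent with collisions being hit "frequently" at that size.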

So if we could tell anyone with directories smaller than 10,000 entries:
"hey, you don't need dir_index anyway, just turn it off"--good, the only
people still forced to deal with 64-bit cookies will be the ones that
have probably already found that ext4 isn't reliable for their purposes.

If there are people with only a few hundred entries who still need
dir_index--well, we may be making them unhappy by forcing them to
suffer to fix a bug that they've never actually seen.

--b.
