From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
To: Niels de Vos <ndevos@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
sandeen@redhat.com, Andreas Dilger <adilger.kernel@dilger.ca>,
linux-ext4@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
gluster-devel@nongnu.org
Subject: Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
Date: Wed, 13 Feb 2013 16:40:35 +0100
Message-ID: <511BB3F3.7010608@itwm.fraunhofer.de>
In-Reply-To: <20130213133133.GB23233@ndevos-laptop.usersys.redhat.com>
On 02/13/2013 02:31 PM, Niels de Vos wrote:
> On Tue, Feb 12, 2013 at 04:00:54PM -0500, J. Bruce Fields wrote:
>> On Tue, Feb 12, 2013 at 09:56:41PM +0100, Bernd Schubert wrote:
>>> On 02/12/2013 09:28 PM, J. Bruce Fields wrote:
>>>> 06effdbb49af5f6c "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)"
>>>> and previous patches solved problems with hash collisions in large
>>>> directories by using 64- instead of 32- bit directory hashes in some
>>>> cases. But it caused problems for users who assume directory offsets
>>>> are "small". Two cases we've run across:
>>>>
>>>> - older NFS clients: 64-bit cookies cause applications on many
>>>> older clients to fail.
>>>> - gluster: gluster assumed that it could take the top bits of
>>>> the offset for its own use.
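The short Python sketch below shows why a scheme that reserves the top bits of a directory offset breaks once 64-bit cookies appear. `ext4_style_cookie()` is loosely modeled on ext4's `hash2pos()` and assumes its behavior as follows: 32-bit callers get `major >> 1`, and 64-bit callers get `major >> 1` in the high word with the minor hash in the low word. `SUBVOL_BITS`, `pack()` and `unpack()` are a hypothetical "steal the top bits" overlay for illustration only. They are not gluster's actual encoding.

```python
SUBVOL_BITS = 4  # hypothetical: top 4 bits reserved for a subvolume index
SHIFT = 64 - SUBVOL_BITS
LOW_MASK = (1 << SHIFT) - 1


def ext4_style_cookie(major, minor, want_64bit):
    """Loosely modeled on ext4's hash2pos() (assumed behavior).

    A 32-bit cookie keeps only major >> 1; a 64-bit cookie packs
    major >> 1 into the high word and the minor hash into the low word.
    """
    if not want_64bit:
        return major >> 1
    return ((major >> 1) << 32) | minor


def pack(offset, subvol):
    """Hypothetical overlay: put subvol in the top SUBVOL_BITS bits,
    assuming the filesystem offset never uses those bits."""
    if not 0 <= subvol < (1 << SUBVOL_BITS):
        raise ValueError("subvolume index does not fit")
    return (subvol << SHIFT) | (offset & LOW_MASK)


def unpack(cookie):
    """Inverse of pack(): returns (offset, subvol)."""
    return cookie & LOW_MASK, cookie >> SHIFT


c32 = ext4_style_cookie(0xFFFFFFFE, 0x12345678, want_64bit=False)
c64 = ext4_style_cookie(0xFFFFFFFE, 0x12345678, want_64bit=True)
print(unpack(pack(c32, 3)) == (c32, 3))  # round trip survives
print(unpack(pack(c64, 3)) == (c64, 3))  # high hash bits were overwritten
```

The 31-bit cookie leaves the top bits free, so the round trip succeeds. The 64-bit cookie already occupies bits 32 through 62, so the overlay silently discards part of the hash, and the offset handed back to the filesystem is no longer the one it issued.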
>>>>
>>>> In both cases we could argue we're in the right: the nfs protocol
>>>> defines cookies to be 64 bits, so clients should be prepared to handle
>>>> them (remapping to smaller integers if necessary to placate applications
>>>> using older system interfaces). And gluster was incorrect to assume
>>>> that the "offset" was really an "offset" as opposed to just an opaque
>>>> value.
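The client-side remedy mentioned above, remapping opaque 64-bit cookies to smaller integers for applications that use older interfaces, can be sketched as a per-directory translation table. `CookieMap` and its methods are hypothetical names for illustration and do not describe any existing NFS client.

```python
class CookieMap:
    """Hypothetical per-directory table mapping opaque 64-bit server
    cookies to small sequential integers that fit a 32-bit telldir()."""

    def __init__(self):
        self._to_small = {}   # server cookie -> small index
        self._to_cookie = []  # small index -> server cookie

    def small(self, cookie):
        """Return the small integer for a server cookie, allocating one
        the first time the cookie is seen."""
        if cookie not in self._to_small:
            self._to_small[cookie] = len(self._to_cookie)
            self._to_cookie.append(cookie)
        return self._to_small[cookie]

    def cookie(self, small):
        """Translate a small integer back to the server's cookie."""
        return self._to_cookie[small]


m = CookieMap()
print(m.small(0x7FFFFFFF12345678), m.small(0x1234))
```

The catch is state: the table grows with every cookie handed out and must live as long as any application might `seekdir()` back, which is one reason clients have not done this by default.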
>>>>
>>>> But in practice things that worked fine for a long time break on a
>>>> kernel upgrade.
>>>>
>>>> So at a minimum I think we owe people a workaround, and turning off
>>>> dir_index may not be practical for everyone.
>>>>
>>>> A "no_64bit_cookies" export option would provide a workaround for NFS
>>>> servers with older NFS clients, but not for applications like gluster.
>>>>
>>>> For that reason I'd rather have a way to turn this off on a given ext4
>>>> filesystem. Is that practical?
>>>
>>> I think Ted needs to answer whether he would accept another mount option.
>>> But before we go this way, what does gluster do if there are hash
>>> collisions?
>>
>> They probably just haven't tested NFS with large enough directories.
>> The birthday paradox says you'd need about 2^16 entries to have a 50-50
>> chance of hitting the problem.
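The birthday-paradox estimate can be checked directly. This sketch assumes uniformly distributed hash values and multiplies out the exact probability that n values are all distinct. It models ideal hashes, not ext4's hash functions specifically.

```python
def entries_for_even_odds(hash_bits):
    """Smallest n such that n uniformly random hash values of the given
    width contain at least one collision with probability >= 0.5.

    Uses P(all distinct) = prod_{k=0}^{n-1} (1 - k / 2**hash_bits).
    """
    space = 2 ** hash_bits
    p_unique = 1.0
    n = 0
    while p_unique > 0.5:
        p_unique *= 1 - n / space  # adding value number n+1
        n += 1
    return n


print(entries_for_even_odds(32))
```

For 32-bit hashes this lands a little above 2^16 (roughly 77,000 entries), consistent with the estimate. Any width with fewer effective bits reaches even odds with proportionally fewer entries.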
>
> The Gluster NFS-server gets into an infinite loop:
> - https://bugzilla.redhat.com/show_bug.cgi?id=838784

Hmm, this bugzilla is not entirely what I meant, as it refers to 64-bit
hashes.

My question actually was: what is gluster going to do if there is a
32-bit hash collision and ext4 seeks back to a random entry?
That might end in an endless loop, but it might also simply list
entries multiple times on readdir().

Of course, something that only happens rarely is better than something
that happens all the time, but it would still be better to fix it
properly, wouldn't it?
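Both outcomes can be reproduced with a deliberately simplified model of readdir() resuming from a hash cookie. This is not ext4's or gluster's code. It assumes each call returns up to `batch_size` entries starting at the first entry whose hash is >= the cookie, and hands back the hash of the next unreturned entry as the new cookie. `list_dir` and `max_calls` are hypothetical names.

```python
import bisect


def list_dir(entries, batch_size, max_calls=100):
    """Simulate repeated readdir() calls that resume from a hash cookie.

    entries: list of (hash, name) pairs sorted by hash.
    Returns (names_seen, finished); finished is False if the cookie
    stopped making progress within max_calls calls.
    """
    hashes = [h for h, _ in entries]
    seen = []
    cookie = 0
    for _ in range(max_calls):
        # Resume at the first entry whose hash is >= the cookie.
        start = bisect.bisect_left(hashes, cookie)
        batch = entries[start:start + batch_size]
        if not batch:
            return seen, True
        seen.extend(name for _, name in batch)
        nxt = start + batch_size
        if nxt >= len(entries):
            return seen, True
        # The cookie is only the hash, so entries sharing it are ambiguous.
        cookie = hashes[nxt]
    return seen, False


print(list_dir([(1, "a"), (2, "b"), (2, "c"), (3, "d")], 2))
print(list_dir([(1, "a"), (1, "b"), (1, "c")], 2)[1])
```

With distinct hashes every name appears once. A collision that straddles a batch boundary repeats an entry, and a run of identical hashes longer than one batch never makes progress: the duplicate listing and the endless loop described above.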
> The general advice (even before this bug) is that XFS should be used,
> which is not affected by this problem (yet?).

Hmm, well, that always depends on the workload.
Cheers,
Bernd