From: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
To: "Myklebust,
Trond" <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Cc: "J. Bruce Fields"
<bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>,
Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
"linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
<sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Bernd Schubert
<bernd.schubert-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>,
"gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org"
<gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org>,
"linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: regressions due to 64-bit ext4 directory cookies
Date: Thu, 14 Feb 2013 16:45:36 +1100 [thread overview]
Message-ID: <20130214054536.GL26694@dastard> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA91F3D6BAB-UCI0kNdgLrHLJmV3vhxcH3OR4cbS7gtM96Bgd4bDwmQ@public.gmane.org>
On Thu, Feb 14, 2013 at 03:59:17AM +0000, Myklebust, Trond wrote:
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org]
> > Sent: Wednesday, February 13, 2013 4:34 PM
> > To: Myklebust, Trond
> > Cc: Theodore Ts'o; linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> > Bernd Schubert; gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: regressions due to 64-bit ext4 directory cookies
> >
> > On Wed, Feb 13, 2013 at 04:43:05PM +0000, Myklebust, Trond wrote:
> > > On Wed, 2013-02-13 at 11:20 -0500, J. Bruce Fields wrote:
> > > > Oops, probably should have cc'd linux-nfs.
> > > >
> > > > On Wed, Feb 13, 2013 at 10:36:54AM -0500, Theodore Ts'o wrote:
> > > > > The other thing that I'd note is that the readdir cookie has been
> > > > > 64-bit since NFSv3, which was released in June ***1995***. And
> > > > > the explicit, stated purpose of making it be a 64-bit value (as
> > > > > stated in RFC 1813) was to reduce interoperability problems. If
> > > > > that were the case, are you telling me that Sun (who has
> > > > > traditionally been pretty good worrying about interoperability
> > > > > concerns, and in fact employed the editors of RFC 1813) didn't get
> > > > > this right? This seems quite.... surprising to me.
> > > > >
> > > > > I thought this was the whole point of the various NFS
> > > > > interoperability testing done at Connectathon, for which Sun was a
> > > > > major sponsor?!? No one noticed?!?
> > > >
> > > > Beats me. But it's not necessarily easy to replace clients running
> > > > legacy applications, so we're stuck working with the clients we have....
> > > >
> > > > The linux client does remap the server-provided cookies to small
> > > > integers, I believe exactly because older applications had trouble
> > > > with servers returning "large" cookies. So presumably
> > > > ext4-exporting-Linux servers aren't the first to do this.
> > > >
> > > > I don't know which client versions are affected--Connectathon's next
> > > > week and I'll talk to people and make sure there's an ext4 export
> > > > with this turned on to test against.
> > >
> > > Actually, one of the main reasons for the Linux client not exporting
> > > raw readdir cookies is because the glibc-2 folks in their infinite
> > > wisdom declared that telldir()/seekdir() use an off_t. They then went
> > > yet one further and decided to declare negative offsets to be illegal
> > > so that they could use the negative values internally in their syscall
> > wrappers.
> > >
> > > The POSIX definition has none of the above rubbish
> > > (http://pubs.opengroup.org/onlinepubs/009695399/functions/telldir.html
> > > ) and so glibc brilliantly saddled Linux with a crippled readdir
> > > implementation that is _not_ POSIX compatible.
> > >
> > > No, I'm not at all bitter...
> >
> > Oh, right, I knew I'd forgotten part of the story....
> >
> > But then you must have actually been testing against servers that were using
> > that 32nd bit?
> >
> > I think ext4 actually only uses 31 bits even in the 32-bit case. And for a server
> > that was literally using an offset inside a directory file, that would be a
> > colossal directory.
That's exactly what XFS directory cookies are - a direct encoding of
the dirent offset into the directory file. Which means a overflow
would occur at 16GB of directory data for XFS. That is in the realm
of several hundreds of millions of files in a single directory,
which I have seen done before....
> > So I'm wondering how you ran across it.
> >
> > Partly just pure curiosity.
>
> IIRC, XFS on IRIX used 0xFFFFF as the readdir eof marker, which caused us to generate an EIO...
And this discussion explains the magic 0x7fffffff offset mask in the
linux XFS readdir code. I've been trying to find out for years
exactly why that was necessary, and now I know.
I probably should write a patch that makes it a "non-magic" number
and remove it completely for 64 bit platforms before I forget again...
Cheers,
Dave.
--
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-02-14 5:45 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-12 20:28 regressions due to 64-bit ext4 directory cookies J. Bruce Fields
2013-02-12 20:56 ` Bernd Schubert
2013-02-12 21:00 ` J. Bruce Fields
2013-02-13 8:17 ` Bernd Schubert
2013-02-13 22:18 ` J. Bruce Fields
2013-02-13 13:31 ` [Gluster-devel] " Niels de Vos
2013-02-13 15:40 ` Bernd Schubert
2013-02-14 5:32 ` Dave Chinner
2013-02-13 4:00 ` Theodore Ts'o
2013-02-13 13:31 ` J. Bruce Fields
2013-02-13 15:14 ` Theodore Ts'o
2013-02-13 15:19 ` J. Bruce Fields
2013-02-13 15:36 ` Theodore Ts'o
[not found] ` <20130213153654.GC17431-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 16:20 ` J. Bruce Fields
[not found] ` <20130213162059.GL14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 16:43 ` Myklebust, Trond
2013-02-13 21:33 ` J. Bruce Fields
2013-02-14 3:59 ` Myklebust, Trond
[not found] ` <4FA345DA4F4AE44899BD2B03EEEC2FA91F3D6BAB-UCI0kNdgLrHLJmV3vhxcH3OR4cbS7gtM96Bgd4bDwmQ@public.gmane.org>
2013-02-14 5:45 ` Dave Chinner [this message]
2013-02-13 21:21 ` Anand Avati
[not found] ` <CAFboF2wXvP+vttiff8iRE9rAgvV8UWGbFprgVp8p7kE43TU=PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 22:20 ` [Gluster-devel] " Theodore Ts'o
2013-02-13 22:41 ` J. Bruce Fields
[not found] ` <20130213224141.GU14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 22:47 ` Theodore Ts'o
[not found] ` <20130213224720.GE5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-13 22:57 ` Anand Avati
[not found] ` <CAFboF2z1akN_edrY_fT915xfehfHGioA2M=PSHv0Fp3rD-5v5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-13 23:05 ` [Gluster-devel] " J. Bruce Fields
[not found] ` <20130213230511.GW14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-02-13 23:44 ` Theodore Ts'o
[not found] ` <20130213234430.GF5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14 0:05 ` Anand Avati
[not found] ` <CAFboF2zS+YAa0uUxMFUAbqgPh3Kb4xZu40WUjLyGn8qPoP+Oyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-02-14 21:47 ` [Gluster-devel] " J. Bruce Fields
2013-03-26 15:23 ` Bernd Schubert
[not found] ` <5151BD5F.30607-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
2013-03-26 15:48 ` [Gluster-devel] " Eric Sandeen
2013-03-28 14:07 ` Theodore Ts'o
2013-03-28 16:26 ` Eric Sandeen
2013-03-28 17:52 ` Zach Brown
[not found] ` <20130328175205.GD16651-fypN+1c5dIyjpB87vu3CluTW4wlIGRCZ@public.gmane.org>
2013-03-28 18:05 ` Anand Avati
[not found] ` <CAFboF2ztc06G00z8ga35NrxgnT2YgBiDECgU_9kvVA_Go1_Bww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 18:31 ` [Gluster-devel] " J. Bruce Fields
[not found] ` <20130328183153.GG7080-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2013-03-28 18:49 ` Anand Avati
[not found] ` <CAFboF2w49Lc0vM0SerbJfL9_RuSHgEU+y_Yk7F4pLxeiqu+KRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 19:43 ` [Gluster-devel] " Jeff Darcy
[not found] ` <51549D74.1060703-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-28 22:14 ` Anand Avati
[not found] ` <CAFboF2xkvXx9YFYxBXupwg=s=3MaeQYm2KK2m8MFtEBPsxwQ7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-28 22:20 ` Anand Avati
2013-02-14 21:46 ` [Gluster-devel] " J. Bruce Fields
[not found] ` <20130213222052.GD5938-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2013-02-14 6:10 ` Dave Chinner
2013-02-14 22:01 ` J. Bruce Fields
2013-02-15 2:27 ` Dave Chinner
2013-02-13 6:56 ` Andreas Dilger
2013-02-13 13:40 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130214054536.GL26694@dastard \
--to=david-fqsqvqoi3ljby3ivrkzq2a@public.gmane.org \
--cc=Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
--cc=bernd.schubert-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org \
--cc=bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org \
--cc=gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org \
--cc=linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=tytso-3s7WtUTddSA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).