From: Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>
To: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
Cc: Anand Avati <anand.avati-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Bernd Schubert
<bernd.schubert-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>,
sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
gluster-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org
Subject: Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
Date: Wed, 13 Feb 2013 18:44:30 -0500
Message-ID: <20130213234430.GF5938@thunk.org>
In-Reply-To: <20130213230511.GW14195-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
On Wed, Feb 13, 2013 at 06:05:11PM -0500, J. Bruce Fields wrote:
>
> Would it be possible to make something work like, for example, a 31-bit
> hash plus an offset into a hash bucket?
>
> I have trouble thinking about this, partly because I can't remember
> where to find the requirements for readdir on concurrently modified
> directories....
The requirements are that for a directory entry which has not been
modified since the last opendir() or rewinddir(), readdir() must return
that directory entry exactly once.
For a directory entry which has been added or removed since the last
opendir() or rewinddir() call, it is undefined whether the directory
entry is returned once or not at all. And a rename is defined as an
add/remove, so it's OK for both the old filename and the new filename
to appear in the readdir() stream; it would also be OK if neither
appeared in the readdir() stream.
The SUSv3 definition of readdir() can be found here:
http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html
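
To make the rules above concrete, here is a rough sketch of a checker for
one pass of readdir() output. The function name and its three arguments are
hypothetical, made up for illustration; the final check (that no unknown
names appear) is my own assumption and is not stated in the rules above.

```python
def readdir_result_is_legal(returned, unchanged, changed):
    """Check one readdir() pass against the rules described above.

    returned  -- list of names produced by the readdir() pass
    unchanged -- set of names not modified since opendir()/rewinddir()
    changed   -- set of names added or removed since then
                 (a rename contributes both the old and the new name)
    """
    counts = {}
    for name in returned:
        counts[name] = counts.get(name, 0) + 1

    # Unmodified entries must be returned exactly once.
    if any(counts.get(name, 0) != 1 for name in unchanged):
        return False

    # Added or removed entries may appear once or not at all, never twice.
    if any(counts.get(name, 0) > 1 for name in changed):
        return False

    # Assumption: nothing outside those two sets should show up.
    return all(name in unchanged or name in changed for name in counts)
```

With a rename of "old" to "new" in progress, both
["a", "b", "old", "new"] and ["a", "b"] pass for unchanged = {"a", "b"},
while a stream that drops or repeats "a" does not.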
Note also that if you look at the SUSv3 definition of seekdir(), it
explicitly states that the value returned by telldir() is not
guaranteed to be valid after a rewinddir() or across another opendir():

    If the value of loc was not obtained from an earlier call to
    telldir(), or if a call to rewinddir() occurred between the call to
    telldir() and the call to seekdir(), the results of subsequent
    calls to readdir() are unspecified.

Hence, it would be legal, and arguably more correct, if we created an
internal array of pointers into the directory structure, where the
first call to telldir() returned 1, and the second call to telldir()
returned 2, and the third call to telldir() returned 3, regardless of
the position in the directory, and this number was used by seekdir()
to index into the array of pointers to return the exact location in
the b-tree. This would completely eliminate the possibility of hash
collisions, and guarantee that readdir() would never drop or return a
directory entry multiple times after seekdir().
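
The scheme above could be modelled roughly as follows. This is only a toy
userspace sketch, not kernel code: the DirStream class is hypothetical, a
plain list stands in for the b-tree, and a list index stands in for an exact
b-tree location. Dropping the table in rewinddir() is an assumption based on
the seekdir() text quoted earlier, which leaves old cookies unspecified
after a rewind.

```python
class DirStream:
    """Toy model of a directory stream with table-backed telldir()."""

    def __init__(self, entries):
        self._entries = list(entries)  # stand-in for the b-tree contents
        self._pos = 0                  # stand-in for an exact b-tree location
        self._cookies = []             # cookie n is stored at index n - 1

    def readdir(self):
        """Return the next name, or None at end of directory."""
        if self._pos >= len(self._entries):
            return None
        name = self._entries[self._pos]
        self._pos += 1
        return name

    def telldir(self):
        # Record the current location; the cookie is just the call count,
        # so it says nothing about where in the directory we are.
        self._cookies.append(self._pos)
        return len(self._cookies)

    def seekdir(self, cookie):
        # The cookie indexes the table, which yields the saved location.
        if not 1 <= cookie <= len(self._cookies):
            raise ValueError("cookie not obtained from telldir() on this stream")
        self._pos = self._cookies[cookie - 1]

    def rewinddir(self):
        # Assumption: old cookies need not survive a rewind, so drop them.
        self._pos = 0
        self._cookies.clear()
```

Since the cookie never encodes a hash value, two entries that hash alike
cannot be confused by seekdir(). The table does grow by one slot for every
telldir() call.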
This implementation approach would open up a potential denial of
service attack, since each call to telldir() could allocate kernel
memory, but as long as we make sure the OOM killer kills the nasty
process which is calling telldir() a lot, this would probably be OK.
It would also be legal to throw away this array after a call to
rewinddir() or closedir(), since telldir() cookies are not guaranteed
to be valid indefinitely. See:
http://pubs.opengroup.org/onlinepubs/009695399/functions/seekdir.html
I suspect this would seriously screw over Gluster, though, and this
wouldn't be a solution for NFSv3, since NFS needs long-lived directory
cookies, and not the short-lived cookies which are all that
POSIX/SUSv3 guarantees.
Regards,
- Ted