From: Boaz Harrosh <bharrosh@panasas.com>
To: Ted Ts'o <tytso@mit.edu>
Cc: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: infinite getdents64 loop
Date: Tue, 31 May 2011 20:13:29 +0300
Message-ID: <4DE521B9.5050603@panasas.com>
In-Reply-To: <20110531123518.GB4215@thunk.org>
On 05/31/2011 03:35 PM, Ted Ts'o wrote:
> On Tue, May 31, 2011 at 12:18:11PM +0200, Bernd Schubert wrote:
>>
>> Out of interest, did anyone ever benchmark if dirindex provides any
>> advantages to readdir? And did those benchmarks include the
>> disadvantages of the present implementation (non-linear inode
>> numbers from readdir, so disk seeks on stat() (e.g. from 'ls -l') or
>> 'rm -fr $dir')?
>
> The problem is that seekdir/telldir is terminally broken (and so is
> NFSv2 for using such a tiny cookie) in that it fundamentally assumes
> a linear data structure. If you're going to use any kind of
> tree-based data structure, a 32-bit "offset" for seekdir/telldir just
> doesn't cut it. We actually play games where we memoize the low
> 32-bits of the hash and keep track of which cookies we hand out via
> seekdir/telldir so that things mostly work --- except for NFSv2, where
> with the 32-bit cookie, you're just hosed.
>
> The reason why we have to iterate over the directory in hash tree
> order is that if we have a leaf node split, half the directory
> entries get copied to another directory block. The promises made
> by seekdir() and telldir() about directory entries appearing exactly
> once during a readdir() stream, even if you hold the fd open for days
> or weeks, mean that you really have to iterate over things in hash
> order.
An open fd means that it does not survive a server reboot. Why don't you
keep an array per open fd, and hand out the array index? In the array
you can keep a pointer to any info you want to keep (that's the meaning of
a cookie).
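
A minimal userspace sketch of the per-open-fd array idea, in C. All names
here (struct dir_pos, struct cookie_table, cookie_alloc, cookie_lookup) are
hypothetical and chosen for illustration; none of this is existing ext4 or
nfsd code. The cookie handed out is the array index plus one, so that 0 can
still mean "start of directory".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-position state; it can be as wide as needed. */
struct dir_pos {
    uint64_t hash;   /* full-width hash of the entry to resume from */
    uint32_t block;  /* illustrative extra state, not a real ext4 field */
};

/* Hypothetical table owned by exactly one open directory fd. */
struct cookie_table {
    struct dir_pos *slots;
    uint32_t used;
    uint32_t cap;
};

static void cookie_table_init(struct cookie_table *t)
{
    assert(t != NULL);
    t->slots = NULL;
    t->used = 0;
    t->cap = 0;
}

/* Remember pos and return a small cookie (index + 1); 0 means failure. */
static uint32_t cookie_alloc(struct cookie_table *t, struct dir_pos pos)
{
    assert(t != NULL);
    if (t->used == t->cap) {
        uint32_t ncap = t->cap ? t->cap * 2 : 8;
        struct dir_pos *n = realloc(t->slots, (size_t)ncap * sizeof(*n));
        if (n == NULL)
            return 0;
        t->slots = n;
        t->cap = ncap;
    }
    t->slots[t->used] = pos;
    return ++t->used;
}

/* Map a cookie back to its saved position; NULL if it was never handed out. */
static const struct dir_pos *cookie_lookup(const struct cookie_table *t,
                                           uint32_t cookie)
{
    assert(t != NULL);
    if (cookie == 0 || cookie > t->used)
        return NULL;
    return &t->slots[cookie - 1];
}

/* Called when the fd is closed; every cookie from this table dies with it. */
static void cookie_table_destroy(struct cookie_table *t)
{
    assert(t != NULL);
    free(t->slots);
    cookie_table_init(t);
}
```

The index fits in 32 bits regardless of how much state each slot holds, but
the table lives only as long as the open fd, so the cookies do not survive a
server reboot.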
>
> I'd have to look, since it's been too many years, but as I recall the
> problem was that there is a common path for NFSv2 and NFSv3/v4, so we
> don't know whether we can hand back a 32-bit cookie or a 64-bit
> cookie, so we're always handing the NFS server a 32-bit "offset", even
> though we could do better.
Please fix that. In the 64-bit case of NFSv3/v4 you can give out a pointer
instead of an array index. In NFSv2 on 64-bit arches you are stuck with an index.
> Actually, if we had an interface where we
> could give you a 128-bit "offset" into the directory, we could
> probably eliminate the duplicate cookie problem entirely. We just
> send 64-bits worth of hash, plus the first two bytes of the file
> name.
>
If you hand out a pointer or index per fd, you could keep in memory
any info you want, as big as you need it.
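
As an illustration of "as big as you need it", the in-memory state behind a
cookie could hold the wide position Ted describes above: 64 bits of hash
plus the first two bytes of the file name. The C sketch below uses
hypothetical names (struct wide_pos, wide_pos_make, wide_pos_cmp) and
assumes that entries are ordered by hash first and by the two name bytes
second; it is not taken from ext4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wide directory position: 64-bit hash plus a name prefix. */
struct wide_pos {
    uint64_t hash;
    unsigned char name2[2];  /* first two bytes of the name, zero padded */
};

/* Build a position from a hash and a NUL-terminated file name. */
static struct wide_pos wide_pos_make(uint64_t hash, const char *name)
{
    struct wide_pos p = { .hash = hash, .name2 = { 0, 0 } };
    assert(name != NULL);
    size_t n = strlen(name);
    if (n > 0)
        p.name2[0] = (unsigned char)name[0];
    if (n > 1)
        p.name2[1] = (unsigned char)name[1];
    return p;
}

/*
 * Order positions by hash, then by name prefix. Returns a negative value,
 * zero, or a positive value, like memcmp(). Two entries compare equal only
 * if both the hash and the first two name bytes collide.
 */
static int wide_pos_cmp(const struct wide_pos *a, const struct wide_pos *b)
{
    if (a->hash != b->hash)
        return a->hash < b->hash ? -1 : 1;
    return memcmp(a->name2, b->name2, sizeof a->name2);
}
```

A readdir resuming from a saved position would skip every entry that
compares less than or equal to it. Names that share both the hash and the
two-byte prefix would still collide, so this narrows the duplicate cookie
problem rather than provably removing it.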
>> 3) Disable dirindexing for readdirs
>
> That won't work, since it will break POSIX compliance. Once again,
> we're tied by the decisions made decades ago...
>
> - Ted
Thanks
Boaz