public inbox for linux-ext4@vger.kernel.org
From: Bernd Schubert <bernd.schubert-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
To: Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
Cc: "Ted Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List"
	<linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Fan Yong <yong.fan-KloliPT79xf2eFz/2MeuCQ@public.gmane.org>
Subject: Re: infinite getdents64 loop
Date: Tue, 31 May 2011 19:43:58 +0200
Message-ID: <4DE528DE.5020908@itwm.fraunhofer.de>
In-Reply-To: <D598829B-FB36-4DA8-978E-8C689940D0FA-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>

On 05/31/2011 07:26 PM, Andreas Dilger wrote:
> On 2011-05-31, at 6:35 AM, Ted Ts'o wrote:
>> On Tue, May 31, 2011 at 12:18:11PM +0200, Bernd Schubert wrote:
>>>
>>> Out of interest, did anyone ever benchmark if dirindex provides any
>>> advantages to readdir?  And did those benchmarks include the
>>> disadvantages of the present implementation (non-linear inode
>>> numbers from readdir, so disk seeks on stat() (e.g. from 'ls -l') or
>>> 'rm -fr $dir')?
>>
>> The problem is that seekdir/telldir is terminally broken (and so is
>> NFSv2 for using such a tiny cookie) in that it fundamentally assumes
>> a linear data structure.  If you're going to use any kind of
>> tree-based data structure, a 32-bit "offset" for seekdir/telldir just
>> doesn't cut it.  We actually play games where we memoize the low
>> 32-bits of the hash and keep track of which cookies we hand out via
>> seekdir/telldir so that things mostly work --- except for NFSv2, where
>> with the 32-bit cookie, you're just hosed.
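
To make the cookie-width problem concrete, here is a minimal sketch,
assuming a 64-bit directory hash is cut down to its low 32 bits before
being handed out as a cookie. The helper names cookie32_from_hash() and
cookies_collide() are hypothetical and not taken from ext4; the sketch
only shows that two distinct hashes can collapse to the same cookie.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit hash. */
static uint32_t cookie32_from_hash(uint64_t hash)
{
    return (uint32_t)(hash & 0xffffffffu);
}

/*
 * Returns 1 when two different 64-bit hashes end up with the same
 * 32-bit cookie, i.e. when a resume from that cookie is ambiguous.
 */
static int cookies_collide(uint64_t a, uint64_t b)
{
    return a != b && cookie32_from_hash(a) == cookie32_from_hash(b);
}
```

Any two hashes that differ only in their upper 32 bits collide under
this scheme, which is why a wider cookie helps.
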
>>
>> The reason why we have to iterate over the directory in hash tree
>> order is that if we have a leaf node split, half the directory
>> entries get copied to another directory block.  Given the promises
>> made by seekdir() and telldir() about directory entries appearing
>> exactly once during a readdir() stream, even if you hold the fd open
>> for days or weeks, you really have to iterate over things in hash
>> order.
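
The argument above can be sketched with a toy model. This is not how
ext4 walks its tree: the array, the linear scan, and the function name
next_in_hash_order() are all assumptions made for illustration, and
entries with identical hashes are not handled. The point is that a
cursor holding "last hash returned" stays valid even when entries move
to a different physical position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: "hashes" is the physical layout of a directory,
 * in arbitrary order. "cursor" is the last hash handed to the caller,
 * or -1 before the first call. Returns 1 and stores the next hash in
 * *out, or returns 0 at end of directory.
 */
static int next_in_hash_order(const uint32_t *hashes, size_t n,
                              int64_t cursor, uint32_t *out)
{
    int found = 0;
    uint32_t best = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((int64_t)hashes[i] <= cursor)
            continue;           /* already returned earlier */
        if (!found || hashes[i] < best) {
            best = hashes[i];   /* smallest hash above the cursor */
            found = 1;
        }
    }
    if (found)
        *out = best;
    return found;
}
```

If the layout changes from {10, 20, 30, 40} to {10, 30, 40, 20} after
the caller has already seen 10 and 20, resuming with cursor 20 still
yields 30 and then 40, each exactly once. A cursor that stored a
physical index instead would not survive the move.
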
>>
>> I'd have to look, since it's been too many years, but as I recall the
>> problem was that there is a common path for NFSv2 and NFSv3/v4, so we
>> don't know whether we can hand back a 32-bit cookie or a 64-bit
>> cookie, so we're always handing the NFS server a 32-bit "offset", even
>> though we could do better.  Actually, if we had an interface where we
>> could give you a 128-bit "offset" into the directory, we could
>> probably eliminate the duplicate cookie problem entirely.  We would
>> just send 64 bits' worth of hash, plus the first two bytes of the file
>> name.
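
A rough sketch of what such a wide "offset" could look like, assuming
it is simply a 64-bit hash followed by the first two bytes of the file
name. The struct dir_cookie, make_cookie() and cookie_cmp() are
hypothetical names, not an existing kernel interface, and the field
layout is a guess.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wide directory cookie: hash plus a short name prefix. */
struct dir_cookie {
    uint64_t hash;
    uint8_t name_prefix[2];   /* zero-padded for names under 2 bytes */
};

static struct dir_cookie make_cookie(uint64_t hash, const char *name)
{
    struct dir_cookie c = { .hash = hash, .name_prefix = { 0, 0 } };
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len > 0)
        c.name_prefix[0] = (uint8_t)name[0];
    if (len > 1)
        c.name_prefix[1] = (uint8_t)name[1];
    return c;
}

/* Order by hash first, then by name prefix to break hash ties. */
static int cookie_cmp(const struct dir_cookie *a,
                      const struct dir_cookie *b)
{
    if (a->hash != b->hash)
        return a->hash < b->hash ? -1 : 1;
    return memcmp(a->name_prefix, b->name_prefix,
                  sizeof(a->name_prefix));
}
```

In this sketch two names with the same hash and the same first two
bytes would still produce equal cookies, which fits the "probably" in
the quoted text.
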
>
> If it's of interest, we've implemented a 64-bit hash mode for ext4 to
> solve just this problem for Lustre.  The llseek() code will return a
> 64-bit hash value on 64-bit systems, unless it is running for some
> process that needs a 32-bit hash value (only NFSv2, AFAIK).
>
> The attached patch can at least form the basis for being able to return
> 64-bit hash values for userspace/NFSv3/v4 when usable.  The patch
> is NOT usable as it stands now, since I've had to modify it from the
> version that we are currently using for Lustre (this version hasn't
> actually been compiled), but it at least shows the outline of what needs
> to be done to get this working.  None of the NFS side is implemented.
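
As I read the description, the idea is to choose the cookie width per
caller. A minimal sketch of that selection follows. It is not the
attached patch: hash_to_pos(), the major/minor split and the bit layout
are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical position encoding. Callers that can only store 32 bits
 * (need_32bit != 0) get the major hash alone; everyone else gets the
 * major hash in the upper half and the minor hash in the lower half.
 */
static uint64_t hash_to_pos(uint32_t major, uint32_t minor,
                            int need_32bit)
{
    if (need_32bit)
        return major;
    return ((uint64_t)major << 32) | minor;
}
```

With this layout the 64-bit positions sort in the same order as the
(major, minor) pairs, and the 32-bit form simply drops the minor hash.
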

Thanks Andreas! I haven't tested it yet, but the general idea looks
good. I guess the lower part of the patch (the netfilter stuff) got in
accidentally?


Cheers,
Bernd
